Pull requests: modelscope/data-juicer

Pull requests list

Fix OOM issue in unittest/dist (labels: bug, dj:ci/cd)
#793 opened Sep 29, 2025 by HYLcool
Fix Auto Prompt pipeline in sandbox (labels: bug)
#791 opened Sep 28, 2025 by HYLcool
Add notebook detection and auto-redirect in logger setup
#790 opened Sep 26, 2025 by cmgzn
Add Operator-Level Parallel Data Processing with Ray Actors (labels: dj:dist, dj:efficiency, enhancement)
#761 opened Aug 19, 2025 by Cccccc0630
Support of partitioning/checkpointing/event-logging
#748 opened Jul 24, 2025 by cyruszhang
[NewOp] Add group_diversity_filter op
#745 opened Jul 22, 2025 by lingzhq
Update video_split_by_scene_mapper.py
#744 opened Jul 21, 2025 by liuyuhanalex
Add lidar object segmentation op
#736 opened Jul 14, 2025 by Qirui-jiao
Evalscope evaluator & MedEval evaluator for dj-sandbox
#722 opened Jun 26, 2025 by lingzhq
[WIP] add lidar object detection op
#721 opened Jun 26, 2025 by Cathy0908
[WIP] Optimization framework (labels: dj:core, dj:efficiency)
#702 opened Jun 13, 2025 by cyruszhang
[WIP] deduping benchmark suite
#607 opened Mar 4, 2025 by cyruszhang
Optimize dedup to avoid oom (labels: dj:dist, dj:efficiency, dj:tools, enhancement, good first issue)
#568 opened Feb 7, 2025 by coolderli
Add humanvbench operators (labels: dj:multimodal, dj:op, good first issue)
#553 opened Jan 17, 2025 by SYSUzhouting
Add minhash deduplicator based on RAY and Redis (labels: dj:dist, dj:efficiency, dj:op)
#489 opened Nov 15, 2024 by pan-x-c
Automatically split input dataset in ray mode
#415 opened Sep 4, 2024 by pan-x-c
[WIP] Add text tagging by prompt mapper op (labels: dj:op)
#408 opened Aug 30, 2024 by garyzhang99
1 task
Add GPT-4V as evaluator (labels: dj:multimodal, enhancement, stale-pr)
#276 opened Mar 22, 2024 by drcege Draft DJ-SORA