Pull requests: modelscope/data-juicer
Pull requests list
Fix OOM issue in unittest/dist
bug
Something isn't working
dj:ci/cd
issues/PRs about CI/CD of Data-Juicer
#793
opened Sep 29, 2025 by
HYLcool
Fix Auto Prompt pipeline in sandbox
bug
Something isn't working
#791
opened Sep 28, 2025 by
HYLcool
Add notebook detection and auto-redirect in logger setup
#790
opened Sep 26, 2025 by
cmgzn
Optimize the auto num_proc calculation of operators in ray mode
#789
opened Sep 19, 2025 by
Cathy0908
Add Operator-Level Parallel Data Processing with Ray Actors
dj:dist
issues/PRs about distributed data processing
dj:efficiency
regarding efficiency issues and enhancements
enhancement
New feature or request
#761
opened Aug 19, 2025 by
Cccccc0630
Support of partitioning/checkpointing/event-logging
#748
opened Jul 24, 2025 by
cyruszhang
Evalscope evaluator & MedEval evaluator for dj-sandbox
#722
opened Jun 26, 2025 by
lingzhq
[NewOp] Add generate_challenging_qa_mapper based on MindGYM principles
#703
opened Jun 14, 2025 by
Bat-Reality
[WIP] Optimization framework
dj:core
issues/PRs about the core functions of Data-Juicer
dj:efficiency
regarding efficiency issues and enhancements
#702
opened Jun 13, 2025 by
cyruszhang
[NewOp] Add domain_diversity_selector based on DaaR principles
#699
opened Jun 12, 2025 by
lingzhq
Add RayBTSMinhashDeduplicatorWithUid and DocumentMinhashDeduplicatorWithUid.
#677
opened May 22, 2025 by
chenyushuo
Optimize dedup to avoid oom
dj:dist
issues/PRs about distributed data processing
dj:efficiency
regarding efficiency issues and enhancements
dj:tools
issues/PRs about specific tools
enhancement
New feature or request
good first issue
Good for newcomers
#568
opened Feb 7, 2025 by
coolderli
Add humanvbench operators
dj:multimodal
issues/PRs about multimodal data processing
dj:op
issues/PRs about some specific OPs
good first issue
Good for newcomers
#553
opened Jan 17, 2025 by
SYSUzhouting
Add minhash deduplicator based on RAY and Redis
dj:dist
issues/PRs about distributed data processing
dj:efficiency
regarding efficiency issues and enhancements
dj:op
issues/PRs about some specific OPs
#489
opened Nov 15, 2024 by
pan-x-c
Support RangeSpecifiedFieldSelector for selecting data by the value range of a specified field
#432
opened Sep 21, 2024 by
2108038773
Draft
[WIP]Add text tagging by prompt mapper op
dj:op
issues/PRs about some specific OPs
#408
opened Aug 30, 2024 by
garyzhang99
1 task
Add GPT-4V as evaluator
dj:multimodal
issues/PRs about multimodal data processing
enhancement
New feature or request
stale-pr