ML Engineer @huggingface |
Maintainer of 🤗 Datasets
- Hugging Face
- Paris
Pinned
- huggingface/datasets (Public)
  🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
- huggingface/transformers (Public)
  🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
1,464 contributions in the last year
Activity overview
Contributed to huggingface/datasets, huggingface/huggingface_hub, huggingface/transformers and 5 other repositories
Contribution activity
February 2022
Created 10 commits in 1 repository
Created a pull request in huggingface/datasets that received 2 comments
Set base path to hub url for canonical datasets
This should allow canonical datasets to use relative paths to download data files from the Hub cc @polinaeterna this will be useful if we have audi…
+7 −2 • 2 comments
Opened 4 other pull requests in 1 repository
huggingface/datasets: 1 open, 3 merged
Reviewed 28 pull requests in 1 repository
huggingface/datasets
28 pull requests
- Multilingual Spoken Words
- added electricity load diagram dataset
- Fix bugs in NewsQA dataset
- Make RedCaps streamable
- Support streaming in size estimation function in push_to_hub
- Add more compression types for to_json
- Add FrugalScore metric
- WIP: update docs to new frontend/UI
- Patch all module attributes in its namespace
- Fix flatten of complex feature types
- Check if indices values in Dataset.select are within bounds
- Add support for Audio and Image feature in push_to_hub
- Fix ClassLabel to/from dict when passed names_file
- Raise informative error when loading a save_to_disk dataset
- Add dev-only config to Natural Questions dataset
- Fix streaming for servers not supporting HTTP range requests
- Upgrade black to version ~=22.0
- Common voice validated partition
- PR for the CFPB Consumer Complaints dataset
- added told-br (brazilian hate speech) dataset
- Remove unnecessary 'r' arg in
- Fix TestCommand to copy dataset_infos to local dir with only data files
- feat: 🎸 generate info if dataset_infos.json does not exist
- Fix sem_eval_2018_task_1 download location
- Process .opus files with torchaudio
- Some pull request reviews not shown.
Created an issue in huggingface/datasets that received 6 comments
[Audio] MP3 resampling is incorrect when dataset's audio files have different sampling rates
The Audio feature resampler for MP3 gets stuck with the first original frequencies it meets, which leads to subsequent decoding to be incorrect. He…
6 comments
Opened 1 other issue in 1 repository
huggingface/datasets: 1 open
1 contribution in private repositories (Feb 16)