Google Research Datasets
- Mountain View, CA
- http://research.google.com
Repositories
- Video-Timeline-Tags-ViTT
A collection of videos annotated with timelines, where each video is divided into segments and each segment is labelled with a short free-text description.
- poem-sentiment
Poem Sentiment is a sentiment dataset of poem verses from Project Gutenberg. This dataset can be used for tasks such as sentiment classification or style transfer for poems.
- ToTTo
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. We hope it can serve as a useful research benchmark for high-precision conditional text generation.
- NatGenMT
This dataset is intended as an evaluation benchmark for gender issues in Machine Translation. We consider the challenges in modeling and handling gendered language in the context of machine translation and extend previous work that identifies issues using synthetic examples. We focus on the class of issues which surface when a neutral refer…
- C4_200M-synthetic-dataset-for-grammatical-error-correction
This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021) (https://www.aclweb.org/anthology/2021.bea-1.4/).