Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


GYAFC

Grammarly’s Yahoo Answers Formality Corpus

Texts | Custom (research-only) | Introduced 2018-03-17

Grammarly’s Yahoo Answers Formality Corpus (GYAFC) is described by its authors as the largest dataset for any style, containing a total of 110K informal / formal sentence pairs.

Yahoo Answers, a question answering forum, contains a large number of informal sentences and allows redistribution of data. The authors used the Yahoo Answers L6 corpus to create the GYAFC dataset of informal and formal sentence pairs. To ensure a uniform distribution of data, they removed sentences that are questions, contain URLs, or are shorter than 5 words or longer than 25 words. After these preprocessing steps, 40 million sentences remain.
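The filtering rules described above can be sketched as a small Python predicate. This is only an illustration under stated assumptions, not the authors' code: the source does not say how questions, URLs, or words were detected, so the trailing "?" check, the `URL_PATTERN` regex, whitespace tokenization, and the function name `keep_sentence` are all hypothetical choices.

```python
import re

# Assumption: a URL is anything starting with http(s):// or www.
# The source does not specify how URLs were detected.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def keep_sentence(sentence: str, min_words: int = 5, max_words: int = 25) -> bool:
    """Return True if the sentence passes the three filters described above."""
    text = sentence.strip()
    # Assumption: a sentence counts as a question if it ends with "?".
    if text.endswith("?"):
        return False
    if URL_PATTERN.search(text):
        return False
    # Assumption: words are whitespace-separated tokens.
    n_words = len(text.split())
    return min_words <= n_words <= max_words


# Made-up example sentences, not taken from the corpus.
candidates = [
    "i dont know what u mean by that lol",
    "what do you mean by that exactly?",
    "too short",
    "check out www.example.com for more info on this",
]
kept = [s for s in candidates if keep_sentence(s)]
```

With these toy inputs only the first sentence is kept. The other three are dropped as a question, a too-short sentence, and a sentence containing a URL.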

The Yahoo Answers corpus spans several domains, such as Business, Entertainment & Music, Travel, and Food. The formality classifier of Pavlick and Tetreault (PT16) shows that the formality level varies significantly across genres. To control for this variation, the authors work with the two domains that contain the most informal sentences and report results for training and testing within each of those domains. To identify informal sentences, they use the PT16 formality classifier, trained on the Answers genre of the PT16 corpus, which consists of nearly 5,000 randomly selected Yahoo Answers sentences manually annotated on a scale from -3 (very informal) to 3 (very formal). They find that Entertainment & Music and Family & Relationships contain the most informal sentences, and they build the GYAFC dataset from these two domains.
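The domain selection step can be illustrated with a short Python sketch that counts informal sentences per domain and keeps the top two. It assumes each sentence already has a classifier score on the -3 to 3 scale, and that a score below 0 marks a sentence as informal. That threshold, the function name `most_informal_domains`, and the toy scores are assumptions for illustration, not values taken from this page.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_informal_domains(
    scored_sentences: Iterable[Tuple[str, float]],
    top_k: int = 2,
    threshold: float = 0.0,
) -> List[str]:
    """Return the top_k domains with the most sentences scored below threshold."""
    # Count only sentences the classifier scores as informal (assumed: score < 0).
    informal_counts = Counter(
        domain for domain, score in scored_sentences if score < threshold
    )
    return [domain for domain, _ in informal_counts.most_common(top_k)]


# Toy (domain, formality score) pairs, invented for this example.
toy_scores = [
    ("Entertainment & Music", -1.2),
    ("Entertainment & Music", -0.4),
    ("Entertainment & Music", -2.0),
    ("Family & Relationships", -0.9),
    ("Family & Relationships", -1.5),
    ("Business", 1.1),
    ("Business", -0.3),
    ("Travel", 0.8),
]
selected = most_informal_domains(toy_scores)
```

The sketch ranks by the raw count of informal sentences, following the "contain the most informal sentences" wording above. Ranking by the informal fraction per domain would be a different design choice.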

Source: Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer

Benchmarks

1 Image, 2*2 Stitchi/Accuracy
1 Image, 2*2 Stitchi/BLEU-4
1 Image, 2*2 Stitchi/Harmonic mean
2D Classification/Accuracy
2D Classification/BLEU-4
2D Classification/Harmonic mean
2D Human Pose Estimation/Accuracy
2D Human Pose Estimation/BLEU-4
2D Human Pose Estimation/Harmonic mean
2D Semantic Segmentation/BLEU
Drawing Pictures/Accuracy
Drawing Pictures/BLEU-4
Drawing Pictures/Harmonic mean
Sketch/Accuracy
Sketch/BLEU-4
Sketch/Harmonic mean
Style Transfer/Accuracy
Style Transfer/BLEU-4
Style Transfer/Harmonic mean
Text Generation/BLEU
Text Style Transfer/BLEU

Statistics

Papers: 103
Benchmarks: 21

Links

Homepage

Tasks

1 Image, 2*2 Stitchi
2D Classification
2D Human Pose Estimation
2D Semantic Segmentation
Drawing Pictures
Formality Style Transfer
Sketch
Style Transfer
Text Generation
Text Style Transfer
Unsupervised Text Style Transfer