Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Penn Treebank

Texts | Custom | Introduced 1993-01-01

The English Penn Treebank (PTB) corpus, and in particular its Wall Street Journal (WSJ) section, is one of the best-known and most widely used corpora for evaluating sequence labelling models. The task consists of annotating each word with its part-of-speech tag. In the most common split of this corpus, sections 0 to 18 are used for training (38,219 sentences, 912,344 tokens), sections 19 to 21 for validation (5,527 sentences, 131,768 tokens), and sections 22 to 24 for testing (5,462 sentences, 129,654 tokens). The corpus is also commonly used for character-level and word-level language modelling.

Source: Seq2Biseq: Bidirectional Output-wise Recurrent Neural Networks for Sequence Modelling
Image Source: https://dl.acm.org/doi/10.5555/972470.972475
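
The split described above can be sketched as a small lookup. This is only an illustration: the names `SPLIT_RANGES`, `SPLIT_STATS`, `split_for_section`, and `token_share` are hypothetical and not part of any PTB distribution or library. The section ranges and counts are copied from the description above.

```python
# Hypothetical helper names; the section ranges and counts come from the
# dataset description above, not from any official PTB tooling.
SPLIT_RANGES = {
    "train": range(0, 19),        # WSJ sections 0-18
    "validation": range(19, 22),  # WSJ sections 19-21
    "test": range(22, 25),        # WSJ sections 22-24
}

# Sentence and token counts as quoted in the description.
SPLIT_STATS = {
    "train": {"sentences": 38_219, "tokens": 912_344},
    "validation": {"sentences": 5_527, "tokens": 131_768},
    "test": {"sentences": 5_462, "tokens": 129_654},
}


def split_for_section(section: int) -> str:
    """Return the split name for a WSJ section number (0-24)."""
    for name, sections in SPLIT_RANGES.items():
        if section in sections:
            return name
    raise ValueError(f"WSJ section {section} is outside the 0-24 range")


def token_share(name: str) -> float:
    """Return the fraction of all tokens that fall in the given split."""
    total = sum(stats["tokens"] for stats in SPLIT_STATS.values())
    return SPLIT_STATS[name]["tokens"] / total


if __name__ == "__main__":
    for section in (0, 18, 19, 22, 24):
        print(section, split_for_section(section))
    for name in SPLIT_STATS:
        print(name, f"{token_share(name):.1%}")
```

How section numbers map to files on disk depends on the distribution in use, so the sketch works on section numbers only and does not assume any directory layout.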

Benchmarks

Chunking/F1 score
Constituency Parsing/F1 score
Dependency Parsing/LAS
Dependency Parsing/UAS
Dependency Parsing/POS
Open Information Extraction/F1
Open Information Extraction/AUC
Part-Of-Speech Tagging/Accuracy
Shallow Syntax/F1 score

Related Benchmarks

Penn Treebank (Character Level)/Language Modelling/Bit per Character (BPC)
Penn Treebank (Character Level)/Language Modelling/Number of params
Penn Treebank (Character Level) 3x1000 LSTM - 500 Epochs/Stochastic Optimization/Bit per Character (BPC)
Penn Treebank (Word Level)/Language Modelling/Params
Penn Treebank (Word Level)/Language Modelling/Test perplexity
Penn Treebank (Word Level)/Language Modelling/Validation perplexity

Statistics

Papers: 1,006
Benchmarks: 9

Tasks

Chunking
Constituency Grammar Induction
Constituency Parsing
Dependency Parsing
Language Modelling
Missing Elements
Open Information Extraction
Part-Of-Speech Tagging
Shallow Syntax
Stochastic Optimization
Unsupervised Dependency Parsing