Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CoNLL 2003

Texts | Unknown | Introduced 2003-06-12

CoNLL-2003 is a named entity recognition dataset released as part of the CoNLL-2003 shared task on language-independent named entity recognition. The data consists of eight files covering two languages, English and German. For each language there is a training file, a development file, a test file, and a large file of unannotated data.

The English data was taken from the Reuters Corpus, which consists of Reuters news stories published between August 1996 and August 1997. For the training and development sets, ten days' worth of data were taken from the files representing the end of August 1996. The test set texts were from December 1996. The preprocessed raw data covers the month of September 1996.

The text for the German data was taken from the ECI Multilingual Text Corpus, which consists of texts in many languages. The portion used for this task was extracted from the German newspaper Frankfurter Rundschau. The training, development, and test sets were all taken from articles written in one week at the end of August 1992. The raw data were taken from the months of September to December 1992.

| English data    | Articles | Sentences | Tokens  | LOC   | MISC  | ORG   | PER   |
|-----------------|----------|-----------|---------|-------|-------|-------|-------|
| Training set    | 946      | 14,987    | 203,621 | 7,140 | 3,438 | 6,321 | 6,600 |
| Development set | 216      | 3,466     | 51,362  | 1,837 | 922   | 1,341 | 1,842 |
| Test set        | 231      | 3,684     | 46,435  | 1,668 | 702   | 1,661 | 1,617 |

Number of articles, sentences, tokens and entities (locations, miscellaneous, organizations, and persons) in English data files.

| German data     | Articles | Sentences | Tokens  | LOC   | MISC  | ORG   | PER   |
|-----------------|----------|-----------|---------|-------|-------|-------|-------|
| Training set    | 553      | 12,705    | 206,931 | 4,363 | 2,288 | 2,427 | 2,773 |
| Development set | 201      | 3,068     | 51,444  | 1,181 | 1,010 | 1,241 | 1,401 |
| Test set        | 155      | 3,160     | 51,943  | 1,035 | 670   | 773   | 1,195 |

Number of articles, sentences, tokens and entities (locations, miscellaneous, organizations, and persons) in German data files.
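The per-split figures in the two tables can be combined into a few derived quantities, such as the total number of annotated entities, the average sentence length, and the entity density. The following is a minimal sketch: the counts are copied from the tables above, while the names `STATS` and `summarize` are illustrative and not part of the dataset or any library.

```python
# Counts copied from the English and German tables:
# (articles, sentences, tokens, LOC, MISC, ORG, PER)
STATS = {
    "English": {
        "train": (946, 14987, 203621, 7140, 3438, 6321, 6600),
        "dev": (216, 3466, 51362, 1837, 922, 1341, 1842),
        "test": (231, 3684, 46435, 1668, 702, 1661, 1617),
    },
    "German": {
        "train": (553, 12705, 206931, 4363, 2288, 2427, 2773),
        "dev": (201, 3068, 51444, 1181, 1010, 1241, 1401),
        "test": (155, 3160, 51943, 1035, 670, 773, 1195),
    },
}


def summarize(row):
    """Derive simple statistics from one table row (hypothetical helper)."""
    _articles, sentences, tokens, loc, misc, org, per = row
    entities = loc + misc + org + per
    return {
        "entities": entities,
        "tokens_per_sentence": tokens / sentences,
        "entities_per_1000_tokens": 1000 * entities / tokens,
    }


for language, splits in STATS.items():
    for split, row in splits.items():
        s = summarize(row)
        print(
            f"{language:8s}{split:6s}"
            f"entities={s['entities']:6d}  "
            f"tokens/sentence={s['tokens_per_sentence']:.1f}  "
            f"entities/1000 tokens={s['entities_per_1000_tokens']:.1f}"
        )
```

Under these counts, the English training set contains 23,499 annotated entities and the German training set 11,851, so the English data is roughly twice as entity-dense despite a similar token count.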

Benchmarks

Chunking: AUC, Accuracy, F1, Precision, Recall
Cross-Lingual: Spanish, German, Dutch
Cross-Lingual Transfer: Spanish, German, Dutch
Event Extraction: AUC, Accuracy, F1, Precision, Recall
Image Enhancement: F1 score
Information Extraction: AUC, Accuracy, F1, Precision, Recall
Named Entity Recognition (NER): AUC, Accuracy, F1, Precision, Recall
Open Information Extraction: AUC, Accuracy, F1, Precision, Recall
Shallow Syntax: AUC, Accuracy, F1, Precision, Recall

Related Benchmarks

CONLL 2003 Dutch/Information Extraction/F1 score
CONLL 2003 German/Information Extraction/F1 score
CoNLL 2003 (English)/Chunking/F1
CoNLL 2003 (English)/Named Entity Recognition (NER)/F1
CoNLL 2003 (English)/Shallow Syntax/F1
CoNLL 2003 (German)/Chunking/F1
CoNLL 2003 (German)/Named Entity Recognition (NER)/F1
CoNLL 2003 (German)/Shallow Syntax/F1
CoNLL 2003 (German) Revised/Named Entity Recognition (NER)/F1
Conll 2003 Spanish/Information Extraction/F1 score

Statistics

Papers: 755
Benchmarks: 37

Tasks

Chunking, Col BERTTriplet, Cross-Lingual, Cross-Lingual NER, Cross-Lingual Transfer, Event Extraction, FG-1-PG-1, Image Enhancement, Information Extraction, Information Retrieval, Low Resource Named Entity Recognition, NER, Named Entity Recognition, Named Entity Recognition (NER), Open Information Extraction, POS, Semantic Similarity, Shallow Syntax, Sparse Information Retrieval, Token Classification, UIE, Weakly-Supervised Named Entity Recognition