LitBank
Texts | Introduced 2019-06-01
LitBank is an annotated dataset of 100 works of English-language fiction to support tasks in natural language processing and the computational humanities, described in more detail in the following publications:
- David Bamman, Sejal Popat and Sheng Shen (2019), "An Annotated Dataset of Literary Entities," NAACL 2019.
- Matthew Sims, Jong Ho Park and David Bamman (2019), "Literary Event Detection," ACL 2019.
- David Bamman, Olivia Lewke and Anya Mansoor (2020), "An Annotated Dataset of Coreference in English Literature," LREC 2020.
LitBank currently contains annotations for entities, events, entity coreference, and quotation attribution in a sample of approximately 2,000 words from each of the 100 texts, totaling 210,532 tokens.
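As a quick sanity check on these figures, the short Python sketch below computes the average number of tokens per text from the two totals quoted above. The constant and function names are illustrative only and are not part of LitBank. The result sits slightly above 2,000, presumably because token counts include punctuation as well as words; that explanation is an assumption, not something the description states.

```python
# Figures taken from the dataset description; the names are illustrative.
NUM_TEXTS = 100
TOTAL_TOKENS = 210_532


def average_tokens_per_text(total_tokens: int, num_texts: int) -> float:
    """Return the mean number of annotated tokens per text."""
    return total_tokens / num_texts


avg = average_tokens_per_text(TOTAL_TOKENS, NUM_TEXTS)
print(f"{avg:.2f} tokens per text on average")  # prints "2105.32 tokens per text on average"
```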
LitBank is licensed under a Creative Commons Attribution 4.0 International License.