Artificial Relationships in Fiction
Artificial Relationships in Fiction (ARF) is a synthetically annotated dataset for Relation Extraction (RE) in fiction, created from a curated selection of literary texts sourced from Project Gutenberg. The dataset captures the rich, implicit relationships within fictional narratives using a novel ontology and GPT-4o for annotation. ARF is the first large-scale RE resource designed specifically for literary texts, advancing both NLP model training and computational literary analysis.
- fiction_books: Metadata-rich corpus of 6,322 public domain fiction books (1850–1950) with inferred author gender and thematic categorization.
- fiction_books_in_chunks: Books segmented into 5-sentence chunks (5.96M total), preserving narrative coherence via 1-sentence overlap.
- fiction_books_with_relations: A subset of 95,475 text chunks annotated with 128,000+ relationships using GPT-4o and a fiction-specific ontology.

fiction_books
- book_id: Unique Project Gutenberg ID.
- title: Title of the book.
- author: Author name.
- author_birth_year / author_death_year: Author lifespan.
- release_date: PG release date.
- subjects: List of thematic topics (mapped to 51 standardized themes).
- gender: Inferred author gender (via GPT-4o).
- text: Cleaned full book text.

fiction_books_in_chunks
- book_id, chunk_index: Book and chunk identifiers.
- text_chunk: Five-sentence excerpt from the book.

synthetic_relations_in_fiction_books (ARF)
- book_id, chunk_index: Identifiers.
- text_chunk: Five-sentence text segment.
- relations: A list of structured relation annotations, each containing:
  - entity1, entity2: Text spans.
  - entity1Type, entity2Type: Entity types based on ontology.
  - relation: Relationship type.

Each annotated relation is formatted as:
{
"entity1": "Head Entity text",
"entity2": "Tail Entity text",
"entity1Type": "Head entity type",
"entity2Type": "Tail entity type",
"relation": "Relation type"
}
Example:
{
"entity1": "Vortigern",
"entity2": "castle",
"entity1Type": "PER",
"entity2Type": "FAC",
"relation": "owns"
}
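
The chunking scheme described for fiction_books_in_chunks (five-sentence windows with a one-sentence overlap) can be illustrated with a minimal sketch. The function name `chunk_sentences` and its parameters are hypothetical, the input is assumed to be already split into sentences, and the handling of a short trailing chunk is an assumption; the authors' actual sentence splitter and edge-case behavior are not specified here.

```python
from typing import List


def chunk_sentences(sentences: List[str], size: int = 5, overlap: int = 1) -> List[List[str]]:
    """Split a sentence list into windows of `size` sentences that share `overlap` sentences.

    Hypothetical helper for illustration only, not the dataset's own code.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each new window starts `step` sentences later
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(sentences[start:start + size])
        if start + size >= len(sentences):
            break  # this window already reaches the end of the text
    return chunks


# Toy input: ten placeholder sentences instead of a real book.
sentences = [f"Sentence {i}." for i in range(1, 11)]
chunks = chunk_sentences(sentences)
for index, chunk in enumerate(chunks):
    print(index, " ".join(chunk))
```

In this sketch the last sentence of each chunk is repeated as the first sentence of the next one, and the final chunk may hold fewer than five sentences.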
Entity types:

| Entity Type | Description |
|-------------|-------------|
| PER | Person or group of people |
| FAC | Facility – man-made structures for human use |
| LOC | Location – natural or loosely defined geographic regions |
| WTHR | Weather – atmospheric or celestial phenomena |
| VEH | Vehicle – transport devices (e.g., ship, carriage) |
| ORG | Organization – formal groups or institutions |
| EVNT | Event – significant occurrences in narrative |
| TIME | Time – chronological or historical expressions |
| OBJ | Object – tangible items in the text |
| SENT | Sentiment – emotional states or feelings |
| CNCP | Concept – abstract ideas or motifs |
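
As a rough illustration of how the entity types above might be used when reading annotations, the following sketch maps the eleven type codes to an enum and parses one relation record into a typed object. The names `EntityType`, `Relation`, and `from_record` are hypothetical and not part of the dataset; only the JSON keys and type codes come from the description above.

```python
import json
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """Entity type codes from the table above, with shortened descriptions."""
    PER = "Person or group of people"
    FAC = "Facility"
    LOC = "Location"
    WTHR = "Weather"
    VEH = "Vehicle"
    ORG = "Organization"
    EVNT = "Event"
    TIME = "Time"
    OBJ = "Object"
    SENT = "Sentiment"
    CNCP = "Concept"


@dataclass(frozen=True)
class Relation:
    """Hypothetical typed view of one annotated relation."""
    entity1: str
    entity2: str
    entity1_type: EntityType
    entity2_type: EntityType
    relation: str

    @classmethod
    def from_record(cls, record: dict) -> "Relation":
        # EntityType[...] looks up by code and raises KeyError on an unknown code.
        return cls(
            entity1=record["entity1"],
            entity2=record["entity2"],
            entity1_type=EntityType[record["entity1Type"]],
            entity2_type=EntityType[record["entity2Type"]],
            relation=record["relation"],
        )


raw = """
{
  "entity1": "Vortigern",
  "entity2": "castle",
  "entity1Type": "PER",
  "entity2Type": "FAC",
  "relation": "owns"
}
"""
relation = Relation.from_record(json.loads(raw))
print(relation.entity1, relation.entity1_type.name, relation.relation,
      relation.entity2, relation.entity2_type.name)
```

Looking up the type by code means that a record with a code outside the ontology fails loudly instead of being silently accepted.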

Relation types:

| Relation Type | Entity 1 Type | Entity 2 Type | Description |
|----------------------|------------------|-------------------|-------------------------------------------|
| parent_father_of | PER | PER | Father relationship |
| parent_mother_of | PER | PER | Mother relationship |
| child_of | PER | PER | Child to parent |
| sibling_of | PER | PER | Sibling relationship |
| spouse_of | PER | PER | Spousal relationship |
| relative_of | PER | PER | Extended family relationship |
| adopted_by | PER | PER | Adopted by another person |
| companion_of | PER | PER | Companionship or ally |
| friend_of | PER | PER | Friendship |
| lover_of | PER | PER | Romantic relationship |
| rival_of | PER | PER | Rivalry |
| enemy_of | PER/ORG | PER/ORG | Hostile or antagonistic relationship |
| inspires | PER | PER | Inspires or motivates |
| sacrifices_for | PER | PER | Makes a sacrifice for |
| mentor_of | PER | PER | Mentorship or guidance |
| teacher_of | PER | PER | Formal teaching relationship |
| protector_of | PER | PER | Provides protection to |
| employer_of | PER | PER | Employment relationship |
| leader_of | PER | ORG | Leader of an organization |
| member_of | PER | ORG | Membership in an organization |
| lives_in | PER | FAC/LOC | Lives in a location |
| lived_in | PER | TIME | Historically lived in |
| visits | PER | FAC | Visits a facility |
| travel_to | PER | LOC | Travels to a location |
| born_in | PER | LOC | Birthplace |
| travels_by | PER | VEH | Travels by a vehicle |
| participates_in | PER | EVNT | Participates in an event |
| causes | PER | EVNT | Causes an event |
| owns | PER | OBJ | Owns an object |
| believes_in | PER | CNCP | Believes in a concept |
| embodies | PER | CNCP | Embodies a concept |
| located_in | FAC | LOC | Located in a place |
| part_of | FAC/LOC/ORG | FAC/LOC/ORG | Part of a larger entity |
| owned_by | FAC/VEH | PER | Owned by someone |
| occupied_by | FAC | PER | Occupied by someone |
| used_by | FAC | ORG | Used by an organization |
| affects | WTHR | LOC/EVNT | Weather affects location or event |
| experienced_by | WTHR | PER | Weather experienced by someone |
| travels_in | VEH | LOC | Vehicle travels in a location |
| based_in | ORG | LOC | Organization based in a location |
| attended_by | EVNT | PER | Event attended by person |
| ends_in | EVNT | TIME | Event ends at a time |
| occurs_in | EVNT | LOC/TIME | Event occurs in a place or time |
| features | EVNT | OBJ | Event features an object |
| stored_in | OBJ | LOC/FAC | Object stored in a place |
| expressed_by | SENT | PER | Sentiment expressed by person |
| used_by | OBJ | PER | Object used by person |
| associated_with | CNCP | EVNT | Concept associated with event |
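
The type constraints in the relation table can be checked programmatically. The sketch below encodes only a handful of rows from the table as an illustration; the names `ALLOWED_SIGNATURES` and `matches_ontology` are hypothetical, and a real check would need every row. Because `used_by` appears twice in the table with different types, each relation maps to a list of allowed signatures.

```python
from typing import Dict, List, Set, Tuple

# Subset of the relation table: relation -> list of (allowed head types, allowed tail types).
# Only a few rows are included here for illustration.
ALLOWED_SIGNATURES: Dict[str, List[Tuple[Set[str], Set[str]]]] = {
    "spouse_of": [({"PER"}, {"PER"})],
    "enemy_of": [({"PER", "ORG"}, {"PER", "ORG"})],
    "lives_in": [({"PER"}, {"FAC", "LOC"})],
    "owns": [({"PER"}, {"OBJ"})],
    "owned_by": [({"FAC", "VEH"}, {"PER"})],
    "used_by": [({"FAC"}, {"ORG"}), ({"OBJ"}, {"PER"})],
}


def matches_ontology(record: dict) -> bool:
    """Return True if the record's relation and entity types match a listed signature."""
    signatures = ALLOWED_SIGNATURES.get(record["relation"], [])
    return any(
        record["entity1Type"] in heads and record["entity2Type"] in tails
        for heads, tails in signatures
    )


# Made-up records for demonstration, plus the Vortigern example from above.
records = [
    {"entity1": "Anne", "entity2": "cottage", "entity1Type": "PER",
     "entity2Type": "FAC", "relation": "lives_in"},
    {"entity1": "lantern", "entity2": "Anne", "entity1Type": "OBJ",
     "entity2Type": "PER", "relation": "used_by"},
    {"entity1": "Vortigern", "entity2": "castle", "entity1Type": "PER",
     "entity2Type": "FAC", "relation": "owns"},
]
for record in records:
    print(record["relation"], record["entity1Type"], record["entity2Type"],
          matches_ontology(record))
```

Under the table as listed, `owns` expects an OBJ as its second entity, so the Vortigern example (PER owns FAC) does not pass this strict check. Annotations produced by a language model may deviate from the table in this way, so whether to filter, relax, or keep such records is a choice left to the user.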

Dataset statistics:

| Metric | Value |
|----------------------------|------------|
| Books | 96 |
| Authors | 91 |
| Gender Ratio (M/F) | 55% / 45% |
| Subgenres | 51 |
| Annotated Chunks | 95,475 |
| Relations per Chunk | 1.34 avg |
| Chunks with No Relations | 35,230 |
| Total Relations | ~128,000 |

If you use this dataset in your research, please cite:
@inproceedings{christou-tsoumakas-2025-artificial,
title = "Artificial Relationships in Fiction: A Dataset for Advancing {NLP} in Literary Domains",
author = "Christou, Despina and Tsoumakas, Grigorios",
editor = "Kazantseva, Anna and Szpakowicz, Stan and Degaetano-Ortlieb, Stefania and Bizzoni, Yuri and Pagel, Janis",
booktitle = "Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)",
month = may,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.latechclfl-1.13/",
pages = "130--147",
ISBN = "979-8-89176-241-1"
}