ReviewRobot Dataset
ReviewRobot Dataset
Overview
This repository contains data for paper ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis. [Dataset]
Dataset
There are three folders: Raw_data, IE_result, and KGs.
Raw_data folder
The Raw_data has two parts: Background Corpus and Paper-review Corpus.
We create the Background Corpus by selecting machine learning related pappers from the Semantic Scholar Open Research Corpus. It contains papers with their titles and abstracts published from the year of 1965 to 2019 (included).
The Paper-review Corpus contains parsed paper pdfs and their corresponding reviews. The paper-review pairs of acl_2017 and iclr_2017 folders come from PeerRead dataset. We fetched the rest from OpenReview and NeruIPS. We parsed those pdfs using GROBID. In each folder, metadata.txt contains all human reviews, and the txt/ folder contains all processed papers.
IE_result folder
The IE_result folder contains information extraction results from SciIE. In each group, the *_json/ contains tokenized texts, and the *_output/ contains IE results of tokenized texts.
The Background_IE contains two folders from one group for all paper abstracts from 1965 to 2019.
The Paper-review_IE contains four folders from two groups. The first group: iclrnipsabs_json and iclrnipsabs_output contain IE results for abstracts of Paper-review Corpus. The second group: iclrnips_json and iclrnips_output contain IE results for rest of papers in Paper-review Corpus.
KGs
The KGs folder contains the knowledge graphs built on the IE_result.
back_kg
The back_kg contains the background KGs built up to a certain year. For each year, there are three files.
Take 2012 as an example:
2012.pklcontains the background knowledge graph up to (include) 2012. It contains a dictionary of 6 fields:num_docis the number of papers up to that year,cluster2entityis a mapping from the entity to its mentions,entity2clusteris a mapping from the mention to its corresponding entity,cluster2typeis a mapping from the entity to its type,entityrefers to all mentions in current KG, andrelationsrefers to all relations in current KG.2012_key.pklcontains the mappings from knowledge elements to paper ids. It has two fields:clusteris the mapping from an entity to its corresponding paper ids, andrelationis the mapping from a relation to the corresponding paper ids.2012_papercontains the mappings from paper id to its paper title.
idea_kg
The idea_kg folder contains idea KGs constructed from paper abstracts and conclusions. Each line is a paper in the venue and has the following fields: id for the paper id, abs_num for the number of abstract sentences, sent for all sentences related to idea_kg, entity for all mentions in current KG, cluster2sent for the corresponding sentence ids for a specific entity, entity2num for the occurence of a specific mention, relation2num for the occurence of a specific relation, cluster2entity for a mapping from the entity to its mentions, entity2type conains a mapping from the mention to the type, relations for all relations in current KG, relation2sent for corresponding sentence ids for a specific relation, and entity2cluster for a mapping from the mention to its corresponding entity.
related_kg
The related_kg contains related KGs constructed from related work for each venue. It is of the same structure as idea_kg.
contribute_kg
The contribute_kg contains contribute KGs constructed from paper contribution section (under introduction section) and experiment section. It contains a dictionary of 4 fields: id for the paper id, total for the number of entities covered in the contribution section, covered for the number of entities covered in the experiment section, sents related sentences that covered those entities from both sections.
future_kg
The future_kg contains future KGs constructed from future work for each venue. It is of the same structure as idea_kg.
Review-annotation
The Review-annotation folder contains human annotations for review category and paper-review sentence pairs. The review.txt contains annotation for review category including 236 sentences for "SUMMARY", 33 sentences for "NOVELTY", 174 sentences for "SOUNDNESS_CORRECTNESS", 16 sentences for "MEANINGFUL_COMPARISON", and 14 sentences for "IMPACT". The pair.txt contains 2,535 review-paper pairs. For each pair, the first slot is the review sentence; the second slot is the paper sentence, the third slot is the label where 0 indicates two sentences are not related and 1 indicates they are related.
License
Creative Commons — Attribution 4.0 International — CC BY 4.0