ACL-RelAcS is a corpus designed for semantic RELation ACquiSition (extraction and classification) in the scientific domain.
It comprises the abstracts and introductions of about 11,000 papers from the
ACL Anthology Corpus. Domain concepts are annotated automatically throughout the corpus, and semantic relations are annotated manually in about 500 abstracts.
The corpus was developed at LIPN (Université Paris 13) and LATTICE (CNRS), with funding from LABEX-EFL.
SemEval 2018 Dataset
Together with the ACL RD-TEC corpus, ACL-RelAcS was used in SemEval 2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers. Note that the SemEval task and dataset use the same abstracts but a different relation typology.
If you are interested in the SemEval dataset, please go to the dedicated website.
Annotation
Concepts were identified and annotated fully automatically, based on a combination of terminology extraction and available ontological resources.
The annotation relies on the Saffron Knowledge Extraction Framework (domain models for computer
science and natural language processing), BabelNet and the terminology extractor TermSuite.
A typology of semantic relations between concepts is also proposed. This typology, consisting of
18 domain-specific and 3 generic relations, is the result of a corpus-based investigation of the
text sequences occurring between concepts in sentences.
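To illustrate what a corpus-based look at the text sequences between concepts might involve, here is a minimal Python sketch. It is not the procedure used to build the typology. The inline markup `<c>...</c>` and the function names `between_sequences` and `sequence_counts` are assumptions made for this example; the actual corpus format may differ.

```python
import re
from collections import Counter

# Hypothetical inline markup for an annotated concept: <c>concept text</c>.
# This is an assumption for illustration, not the corpus's documented format.
CONCEPT = re.compile(r"<c>(.+?)</c>")


def between_sequences(sentence):
    """Return (left concept, text in between, right concept) for each
    pair of adjacent concepts in one sentence."""
    spans = list(CONCEPT.finditer(sentence))
    return [
        (left.group(1),
         sentence[left.end():right.start()].strip().lower(),
         right.group(1))
        for left, right in zip(spans, spans[1:])
    ]


def sequence_counts(sentences):
    """Count how often each in-between text sequence occurs."""
    counts = Counter()
    for sentence in sentences:
        for _, sequence, _ in between_sequences(sentence):
            counts[sequence] += 1
    return counts
```

Sorting the resulting counts by frequency gives a rough view of the recurring connecting phrases that a relation typology could be grounded in.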
A sample of about 500 abstracts from the corpus has been manually
annotated with semantic relations. Only explicit relations are taken into account,
so that the data can serve to train or evaluate pattern-based semantic relation classification systems.
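As a rough idea of what a pattern-based relation classifier could look like, here is a minimal Python sketch. It assumes the text between two concepts has already been extracted. The patterns and the labels `USED_FOR`, `BASED_ON` and `IS_A` are hypothetical placeholders and are not taken from the ACL-RelAcS typology.

```python
import re

# Hypothetical patterns and relation labels, for illustration only.
# They do not reproduce the relation typology of the corpus.
PATTERNS = [
    (re.compile(r"^(is|are) used (for|in)$"), "USED_FOR"),
    (re.compile(r"^(is|are) based on$"), "BASED_ON"),
    (re.compile(r"^(is|are) (a|an) (kind|type) of$"), "IS_A"),
]


def classify(between_text):
    """Return the label of the first pattern matching the text found
    between two concepts, or None if no pattern matches."""
    # Lowercase and collapse whitespace so patterns can stay simple.
    text = " ".join(between_text.lower().split())
    for pattern, label in PATTERNS:
        if pattern.match(text):
            return label
    return None
```

A system of this kind can only detect relations that are signalled explicitly in the text, which is why the manual annotation is restricted to explicit relations.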