ACL-RelAcS Corpus

ACL-RelAcS

ACL-RelAcS is a corpus designed for semantic RELation ACquiSition (extraction and classification) in the scientific domain. It comprises the abstracts and introductions of about 11,000 papers from the ACL Anthology Corpus, with domain concepts annotated automatically throughout the corpus and semantic relations annotated manually in 500 abstracts.
The corpus was developed at LIPN Université Paris 13 and at LATTICE, CNRS with funding from LABEX-EFL.

SemEval 2018 Dataset

Together with the ACL RD-TEC corpus, ACL-RelAcS was used in SemEval 2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers. Note that the SemEval task and dataset use the same abstracts but a different relation typology. If you are interested in the SemEval dataset, please see the dedicated website.

Annotation

Concepts were identified and annotated fully automatically, based on a combination of terminology extraction and available ontological resources. The annotation relies on the Saffron Knowledge Extraction Framework (domain models for computer science and natural language processing), BabelNet and the terminology extractor TermSuite.
A typology of semantic relations between concepts is also proposed. This typology, consisting of 18 domain-specific and 3 generic relations, is the result of a corpus-based investigation of the text sequences occurring between concepts in sentences.
A sample of about 500 abstracts from the corpus was manually annotated with semantic relations. Only explicit relations were taken into account, so that the data can be used to train or evaluate pattern-based semantic relation classification systems.
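As an illustration of what "text sequences occurring between concepts" means in practice, the following minimal sketch collects the connecting text between adjacent concept mentions in a sentence. The inline `<entity id="...">` markup, the function name `between_sequences` and the example sentence are assumptions made for this example only; the actual file format of the corpus may differ and should be checked against the downloaded data.

```python
import re

# Hypothetical inline markup for concept mentions; the real corpus format
# may differ and should be checked against the downloaded files.
ENTITY = re.compile(r'<entity id="(?P<id>[^"]+)">(?P<text>.*?)</entity>')


def between_sequences(sentence):
    """Return (left_id, right_id, connecting_text) for each pair of adjacent concepts."""
    matches = list(ENTITY.finditer(sentence))
    triples = []
    for left, right in zip(matches, matches[1:]):
        # The text between the end of one mention and the start of the next
        # is the kind of sequence a pattern-based classifier would inspect.
        connector = sentence[left.end():right.start()].strip()
        triples.append((left.group("id"), right.group("id"), connector))
    return triples


# Invented example sentence, not taken from the corpus.
sentence = (
    'We present a <entity id="e1">parser</entity> based on '
    '<entity id="e2">conditional random fields</entity> for '
    '<entity id="e3">Hungarian</entity>.'
)

for left_id, right_id, connector in between_sequences(sentence):
    print(left_id, right_id, repr(connector))
```

Only adjacent concept pairs are considered here, which keeps the sketch short; a real system would likely also examine non-adjacent pairs within the same sentence.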

If you are using ACL-RelAcS for academic research, please cite:

Kata Gábor, Haïfa Zargayouna, Isabelle Tellier, Davide Buscaldi, Thierry Charnois: A Typology of Semantic Relations Dedicated to Scientific Literature Analysis. SAVE-SD Workshop at the 25th World Wide Web Conference, 2016.
Kata Gábor, Haïfa Zargayouna, Davide Buscaldi, Isabelle Tellier, Thierry Charnois: Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature. Proceedings of the LREC 2016 Conference, Portorož, Slovenia, May 2016.

If you are using the SemEval Task 7 dataset for academic research, please cite:

Kata Gábor, Davide Buscaldi, Anne-Kathrin Schumann, Behrang QasemiZadeh, Haïfa Zargayouna, Thierry Charnois: SemEval-2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA, June 2018.

Download

Full ACL-RelAcS corpus: 11,000 abstracts with automatic concept annotation can be downloaded here.
Manually annotated semantic relations: 350 (training) + 150 (test) abstracts can be downloaded here.
Fine-grained semantic relation typology used for ACL-RelAcS.