ACL-RelAcS

ACL-RelAcS is a corpus designed for semantic RELation ACquiSition (extraction and classification) in the scientific domain. It is annotated with domain-relevant concepts and semantic relations. The corpus is composed of the abstracts and introductions of about 11,000 papers from the ACL Anthology Corpus.

The corpus is being developed at LIPN Université Paris 13 and at LATTICE, CNRS.

Concepts were identified and annotated fully automatically, based on a combination of terminology extraction and available ontological resources. The annotation relies on the Saffron Knowledge Extraction Framework (domain models for computer science and natural language processing), BabelNet and the terminology extractor TermSuite.
A typology of semantic relations between concepts is also proposed. This typology, consisting of 18 domain-specific and 3 generic relations, is the result of a corpus-based investigation of the text sequences occurring between concepts in sentences.
A sample of ~500 abstracts from the corpus has been manually annotated with semantic relations. Only explicit relations are taken into account, so that the data can serve to train or evaluate pattern-based semantic relation classification systems.
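
To illustrate what "pattern-based" classification over the text sequence between two concepts can look like, here is a minimal Python sketch. It is not the method used to build or evaluate ACL-RelAcS: the relation labels, the trigger phrases, and the helper names (`span`, `between`, `classify`) are hypothetical and chosen only for illustration, and the sketch assumes that concepts are given as character offsets in a sentence.

```python
import re

# Hypothetical patterns: labels and trigger phrases are illustrative
# assumptions, NOT the ACL-RelAcS relation typology.
PATTERNS = [
    (re.compile(r"(is|are) used (for|in)"), "USED_FOR"),
    (re.compile(r"(is|are) based on"), "BASED_ON"),
    (re.compile(r"(is|are) (a|an) (kind|type) of"), "IS_A"),
]


def span(sentence, term):
    """Return the (start, end) character offsets of the first occurrence of term."""
    start = sentence.index(term)
    return (start, start + len(term))


def between(sentence, span_a, span_b):
    """Return the text located between two concept spans, in sentence order."""
    first, second = sorted([span_a, span_b])
    return sentence[first[1]:second[0]]


def classify(sentence, span_a, span_b):
    """Return the label of the first pattern matching the whole in-between text, else None."""
    middle = between(sentence, span_a, span_b).strip().lower()
    for pattern, label in PATTERNS:
        if pattern.fullmatch(middle):
            return label
    return None


sentence = "Conditional random fields are used for named entity recognition."
label = classify(
    sentence,
    span(sentence, "Conditional random fields"),
    span(sentence, "named entity recognition"),
)
print(label)
```

Because only the sequence between the two concepts is inspected, a relation that is merely implied by the wider context would not be detected by such a system, which is consistent with restricting the manual annotation to explicit relations.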

Together with the ACL RD-TEC corpus, ACL-RelAcS was used in SemEval 2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers. Note that the SemEval task and dataset use the same abstracts but a different relation typology. If you are interested in the SemEval dataset, please go to the dedicated website.

If you use ACL-RelAcS or the SemEval 2018 Task 7 data for academic research, please cite:

Download

  • Full ACL-RelAcS corpus: 11,000 abstracts with automatic concept annotation can be downloaded here.
  • Manually annotated semantic relations: 350 (training) + 150 (test) abstracts can be downloaded here.
  • Fine-grained semantic relation typology used for ACL-RelAcS.
  • Creative Commons License
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.