ACL-RelAcS is a corpus designed for semantic RELation ACquiSition (extraction and classification) in the scientific domain. It is annotated with domain-relevant concepts and semantic relations. The corpus is composed of abstracts and introductions of about 11,000 papers from the ACL Anthology Corpus.
Concepts were identified and annotated fully automatically, based on a combination of terminology extraction and available ontological resources. The annotation relies on the Saffron Knowledge Extraction Framework (domain models for computer science and natural language processing), BabelNet and the terminology extractor TermSuite.
A typology of semantic relations between concepts is also proposed. This typology, consisting of 18 domain-specific and 3 generic relations, is the result of a corpus-based investigation of the text sequences occurring between concepts in sentences. A sample of 500 abstracts from the corpus is currently being manually annotated with these semantic relations. Only explicit relations are taken into account, so that the data can serve to train or evaluate pattern-based semantic relation classification systems.
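
The idea of looking at the text sequences that occur between concepts can be illustrated with a minimal Python sketch. It assumes a hypothetical inline markup in which each concept is wrapped in `<c>...</c>` tags; the actual annotation format of the corpus may differ, and the function name `between_concepts` and the example sentence are invented for illustration only.

```python
import re
from typing import List, Tuple

# Hypothetical inline markup: <c>concept text</c>.
# This is an assumption, not the documented format of the corpus.
CONCEPT = re.compile(r"<c>(.*?)</c>")


def between_concepts(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (left concept, intervening text, right concept)
    for each pair of adjacent concepts in the sentence."""
    matches = list(CONCEPT.finditer(sentence))
    triples = []
    for left, right in zip(matches, matches[1:]):
        # Text strictly between the closing tag of the left concept
        # and the opening tag of the right concept.
        middle = sentence[left.end():right.start()].strip()
        triples.append((left.group(1), middle, right.group(1)))
    return triples


# Invented example sentence, not taken from the corpus.
example = (
    "We apply <c>conditional random fields</c> to "
    "<c>named entity recognition</c> in <c>biomedical text</c>."
)
triples = between_concepts(example)
for left, middle, right in triples:
    print(f"{left!r} --[{middle}]--> {right!r}")
```

Intervening sequences collected this way are the kind of material a pattern-based classifier could be trained or evaluated on, under the assumption that the relation is stated explicitly between the two concepts.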

The corpus is being developed at LIPN, Université Paris 13, and at LATTICE, CNRS.



  • The corpus with concept annotation can be downloaded here.
