The team brings together expertise in Natural Language Processing (NLP), corpus linguistics, and the Semantic Web, as well as data mining and machine learning. This combination of complementary skills gives the team a distinctive position, enabling it to conduct innovative work in the analysis and exploration of text corpora, as well as in knowledge acquisition from texts or knowledge graphs.
The team develops approaches to tackle complex problems in texts and knowledge graphs, such as deep syntactic parsing, joint extraction of semantic relations and named entities, and knowledge graph completion. These efforts draw on methods from deep learning, combinatorial optimization, data mining, and inductive logic programming. The team is particularly active in the processing of low-resource or specialized languages, neologism detection, and microblog analysis.
The RCLN team plays an active role in the leadership and research of the Labex "Empirical Foundations of Linguistics" (EFL), where it coordinates the Computational Semantic Analysis axis. At the local level, it is a member of the MathSTIC federation, in which it co-leads the "Optimization and learning applied to digital content" axis.
The team is structured around three closely interconnected research axes.
These three axes are complementary: the syntactico-semantic analysis of corpora serves as a foundation for text exploration and knowledge acquisition; conversely, analysis algorithms can benefit significantly from the acquired knowledge.
Scientific literature mining is a cross-cutting axis for the team. Research publications and shared datasets could be better leveraged by intelligent systems to support and accelerate scientific efforts. This includes facilitating expert extraction, generating state-of-the-art text summaries, formulating scientific hypotheses, and providing evidence to support or refute existing claims.
Link to the 2017–2022 Activity Report and 2023–2028 Project of the RCLN team