Natural Language Engineering, Semantic Annotation, Content Management, Knowledge Engineering, Semantic Web
Adeline Nazarenko (LIPN, Université Paris 13 – Sorbonne Paris Cité & CNRS)
The semantic annotation of documents plays a key role in many applications of textual content management (e.g. navigation, semantic information retrieval, publication). Semantic annotation consists in enriching a text with metadata whose semantics is given by a formal semantic model (e.g. indexing language, thesaurus, ontology) [(B. Popov, A. Kiryakov, D. Ognyanoff, D. Manov, A. Kirilov. « Kim – a semantic platform for information extraction and retrieval ». //Natural Language Engineering//, 10(3-4):375–392, 2004.)] [(kirkayov>A. Kiryakov, B. Popov, I. Terziev, D. Manov, and D. Ognyanoff. « Semantic annotation, indexing, and retrieval ». //Journal of Web Semantics//, 2(1):49–79, 2004.)] 1). A formal semantic representation is thus associated with the text so that search engines or software agents can jointly exploit the textual content (plain text search, distributional measures) and the formal semantics associated with it.
First-generation annotation tools are quite simple. They often merely bind references to named entities identified in the texts to existing or new instances of concepts in an ontology [(magnini>B. Magnini, E. Pianta, O. Popescu, M. Speranza. « Ontology population from textual mentions: Task definition and benchmark ». In //Proceedings of the OLP2 workshop on Ontology Population and Learning//, Sydney, Australia, 2006.)] [(giuliano>C. Giuliano, A. Gliozzo. « Instance-based ontology population exploiting named-entity substitution ». In //Proceedings of the 22nd International Conference on Computational Linguistics// (Coling 2008), pages 265–272, Manchester, August 2008.)]. However, the development of specialized content management applications and of linked data calls for renewed methods of semantic annotation: we need methods and tools that offer richer annotation expressiveness (e.g. annotation with respect to concepts and relations, and not only instances) while being robust, generic and adaptable to different domains and use cases.
The goal is to design a semantic annotation method incorporating annotation quality measures and enabling the dynamic revision of annotations, assuming that the semantic model is ontological.
If we consider that an annotation system S = <O,T,A> consists of an ontology O, a text T and a set A of annotations, or links, associating segments of T with entities of O, then the system S must be revised whenever one of its components is updated (the text is modified, the ontology is enriched or restructured) or when inconsistencies or gaps in coverage are detected.
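As an illustration only, the triple S = <O,T,A> can be sketched as a small data structure with two consistency checks and one coverage measure. Everything below is an assumption made for the example: the ontology is reduced to a set of entity identifiers, segments are character offsets, and the names `Annotation`, `AnnotationSystem`, `dangling`, `out_of_bounds`, `coverage` and `needs_revision` are hypothetical, not part of any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # offset in T where the annotated segment begins
    end: int     # offset in T where the segment ends (exclusive)
    entity: str  # identifier of the entity of O the segment is linked to


@dataclass
class AnnotationSystem:
    ontology: set[str]             # O, reduced here to entity identifiers
    text: str                      # T
    annotations: list[Annotation]  # A

    def dangling(self) -> list[Annotation]:
        """Annotations whose target entity is not (or no longer) in O."""
        return [a for a in self.annotations if a.entity not in self.ontology]

    def out_of_bounds(self) -> list[Annotation]:
        """Annotations whose segment does not fit inside T."""
        return [a for a in self.annotations
                if not (0 <= a.start < a.end <= len(self.text))]

    def coverage(self) -> float:
        """Fraction of the characters of T covered by at least one annotation."""
        if not self.text:
            return 0.0
        covered: set[int] = set()
        for a in self.annotations:
            covered.update(range(max(a.start, 0), min(a.end, len(self.text))))
        return len(covered) / len(self.text)

    def needs_revision(self, min_coverage: float = 0.0) -> bool:
        """True if an inconsistency or a coverage gap is detected."""
        return bool(self.dangling() or self.out_of_bounds()
                    or self.coverage() < min_coverage)


system = AnnotationSystem(
    ontology={"Person", "City"},
    text="Ada lives in Paris.",
    annotations=[Annotation(0, 3, "Person"), Annotation(13, 18, "City")],
)
print(system.needs_revision())   # False
system.ontology.discard("City")  # the ontology is restructured
print(system.needs_revision())   # True: one annotation now points outside O
```

A real ontology carries concepts, instances and relations, so the membership test used here stands in for whatever consistency criteria the actual method would define.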
The PhD student will study the different scenarios requiring the revision of such an annotation system and propose a method of dynamic annotation integrating such a revision process. The dynamic annotation method must 1) integrate consistency criteria and coverage metrics to identify when the revision of an annotation system is necessary, 2) propose revision procedures adapted to different use scenarios and 3) control the convergence of the overall revision process.
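The three requirements above (detecting when revision is needed, applying a revision procedure, controlling convergence) can be pictured as a bounded fixpoint loop. The sketch below is a minimal illustration under stated assumptions: `revise_until_stable`, `detect` and `repair` are hypothetical names, the toy annotation system is a plain dictionary, and bounding the number of rounds is only one simple way of controlling convergence, not the method the thesis is expected to produce.

```python
from __future__ import annotations

from typing import Callable, TypeVar

S = TypeVar("S")


def revise_until_stable(system: S,
                        detect: Callable[[S], list],
                        repair: Callable[[S, list], S],
                        max_rounds: int = 10) -> tuple[S, bool]:
    """Repair the system until no issue is detected or max_rounds is reached.

    Returns the revised system and a flag telling whether it converged.
    """
    for _ in range(max_rounds):
        issues = detect(system)
        if not issues:
            return system, True
        system = repair(system, issues)
    # Out of rounds: report whether the last repair left the system clean.
    return system, not detect(system)


# Toy scenario (assumption): annotations are (segment, entity) pairs and the
# only inconsistency is a link to an entity missing from the ontology.
def detect(system: dict) -> list:
    return [a for a in system["annotations"]
            if a[1] not in system["ontology"]]


def repair(system: dict, issues: list) -> dict:
    kept = [a for a in system["annotations"] if a not in issues]
    return {**system, "annotations": kept}


system = {"ontology": {"Person"},
          "annotations": [("Ada", "Person"), ("Paris", "City")]}
revised, converged = revise_until_stable(system, detect, repair)
print(converged, revised["annotations"])  # True [('Ada', 'Person')]
```

Dropping the faulty link is the crudest possible repair; a scenario-specific procedure could instead re-annotate the segment or enrich the ontology, which is precisely where convergence becomes a real concern.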
Starting with the simplest types of annotation (e.g. a text annotated with instances and concepts of an ontology), the student will provide a method for dynamic annotation. It will rely on existing semantic annotation tools, on the expertise of RCLN team members and on real use cases to assess the contribution of this dynamic annotation.
The proposed method will be directly integrated into an existing annotation tool, or tested through simulation if integration proves too complex.
In contrast to sequential approaches, the goal of this PhD is to design a semantic annotation method that not only populates semantic models but also dynamically updates these models while the text is being annotated. In practical terms, this means integrating the processes of annotation and knowledge acquisition.
In order to design a dynamic semantic annotation method, we propose the following plan:
While the PhD student can rely on existing work on ontology population [(magnini)] [(giuliano)], semantic annotation [(Borislav Popov, Atanas Kiryakov, Damyan Ognyanoff, Dimitar Manov, and Angel Kirilov. //Kim – a semantic platform for information extraction and retrieval//. Nat. Lang. Eng., 10(3-4):375–392, 2004)] [(kirkayov)] 2) and the evolution of semantic reference resources, especially ontologies 3) 4) 5), she/he should extend and structure this work according to the project goals.
The student will also rely on the tools for knowledge acquisition from text developed by the RCLN team (Terminae 6) and SemEx 7)) and on the team's experience in building semantic corpora, whether automatically 8) 9) or manually 10). She/he could also make use of related work on semantic information retrieval 11).
At first, it will be necessary to work on classic ontology and thesaurus models, with the traditional standards (SKOS, OWL-DL), and on well-established technologies, but other semantic models could eventually be proposed.
The problem of dynamic semantic annotation arises in both automatic and manual annotation processes. The PhD work could focus on automatic annotation or treat both approaches in parallel.
The work will be supervised by Prof. Adeline Nazarenko and Prof. François Lévy.
The student will join the RCLN team and benefit from its expertise in natural language processing, knowledge engineering and semantic web. In particular, RCLN has solid experience in semantic annotation (manual annotation [(K. Fort. //Les ressources annotées, un enjeu pour l’analyse de contenu : vers une méthodologie de l’annotation manuelle de corpus. Thèse d'informatique//, Université Paris 13 – Sorbonne Paris Cité, Villetaneuse, France, 2012.)] [(fort1)] or annotation based on machine learning 12), formalisms and resources for annotation [(Y. Ma, A. Nazarenko, L. Audibert. //Formal description of resources for ontology-based semantic annotation//. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), Malta, May 2010. ELRA.)] [(N. Omrane, A. Nazarenko, P. Rosina, S. Szulman, C. Westphal. //Lexicalized ontology for a business rules management platform: An automotive use case//. In Proceedings of the 5th International Symposium on Rules, International Business Rules Forum (RuleMF@BRF), Ft Lauderdale, Florida, USA, November 2011.)]) and text-based ontology design 13). The team also has experience integrating these acquisition and annotation methods into content analysis tools [(A. Guissé, F. Lévy, A. Nazarenko. //Un moteur sémantique pour explorer des textes réglementaires//. In Actes des 22èmes journées francophones d'Ingénierie des Connaissances, Chambéry, 2011.)] [(F. Lévy, A. Nazarenko, A. Guissé. //Annotation, indexation et parcours de documents numériques//. Revue des Sciences et Technologies de l'Information, 13(3/2010):121–152, 2010.)] 14).
The student will work at LIPN (University Paris 13 - Sorbonne Paris Cité & CNRS) where he/she will be assigned a desk. He/she will have access to local facilities and data resources.
RCLN is a member of the laboratory of excellence "Empirical Foundations of Language" (Labex EFL, research strand “computational semantic analysis”), where dynamic semantic annotation is a major scientific concern.
Applications should be addressed to Adeline Nazarenko before May 26, 2014. Please send a cover letter and a CV.