
Dynamic semantic annotation

Natural Language Engineering, Semantic Annotation, Content Management, Knowledge Engineering, Semantic Web

Adeline Nazarenko (LIPN, Université Paris 13 – Sorbonne Paris Cité & CNRS)

The semantic annotation of documents plays a key role in many textual content management applications (e.g. navigation, semantic information retrieval, publication). Semantic annotation consists of enriching a text with metadata whose semantics is given by a formal semantic model (e.g. an indexing language, a thesaurus or an ontology) [(B. Popov, A. Kiryakov, D. Ognyanoff, D. Manov, A. Kirilov. « Kim – a semantic platform for information extraction and retrieval ». //Natural Language Engineering//, 10(3-4):375–392, 2004.)] [(kirkayov>A. Kiryakov, B. Popov, I. Terziev, D. Manov, and D. Ognyanoff. « Semantic annotation, indexing, and retrieval ». //Journal of Web Semantics//, 2(1):49–79, 2004.)] 1). A formal semantic representation is thus associated with the text, so that search engines or software agents can jointly exploit the textual content (plain text search, distributional measures) and the formal semantics associated with it.

First-generation annotation tools are quite simple. They often merely link named entities identified in the texts to existing or new instances of ontology concepts [(magnini>B. Magnini, E. Pianta, O. Popescu, M. Speranza. « Ontology population from textual mentions: Task definition and benchmark ». In //Proceedings of the OLP2 workshop on Ontology Population and Learning//, Sydney, Australia, 2006.)] [(giuliano>C. Giuliano, A. Gliozzo. « Instance-based ontology population exploiting named-entity substitution ». In //Proceedings of the 22nd International Conference on Computational Linguistics// (Coling 2008), pages 265–272, Manchester, August 2008.)]. However, the development of specialized applications for content management and linked data calls for renewed semantic annotation methods: we need methods and tools that offer more expressive annotations (e.g. annotation with respect to concepts and relations, not only instances) while being robust, generic and adaptable to different domains and use cases.

The goal is to design a semantic annotation method that incorporates annotation quality measures and enables the dynamic revision of annotations, assuming that the semantic model is an ontology.

If we consider that an annotation system S = <O,T,A> consists of an ontology O, a text T and a set A of annotations (links) associating segments of T with entities of O, the system S must be revised whenever one of its components is updated (the text is modified, the ontology is enriched or restructured) or when inconsistencies or coverage gaps are detected.
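To make the definition of S concrete, here is a minimal Python sketch under simplifying assumptions that are not part of the proposal: O is reduced to a set of entity identifiers, T to a plain string, and each annotation in A records the character offsets of a segment, the surface text it covered when it was created, and the entity it points to. The class and method names are hypothetical. The `stale_annotations` check only detects the two simplest signs that S must be revised: a link to an entity that has been removed from O, or a segment of T whose text no longer matches the annotated one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """Hypothetical link between a text segment [start, end) and an ontology entity."""
    start: int
    end: int
    surface: str  # text of the segment when the annotation was created
    entity: str   # identifier of an ontology entity (concept, instance, ...)


@dataclass
class AnnotationSystem:
    """Hypothetical encoding of S = <O, T, A>."""
    ontology: set[str]  # O, reduced to entity identifiers
    text: str           # T
    annotations: list[Annotation] = field(default_factory=list)  # A

    def stale_annotations(self) -> list[Annotation]:
        """Return the annotations invalidated by an update of O or T."""
        stale = []
        for a in self.annotations:
            entity_removed = a.entity not in self.ontology
            segment_changed = self.text[a.start:a.end] != a.surface
            if entity_removed or segment_changed:
                stale.append(a)
        return stale
```

For instance, with T = "The car has an engine." annotated with `Car` (offsets 4 to 7) and `Engine` (offsets 15 to 21), removing `Engine` from the ontology makes the second annotation stale. A real ontology would call for richer checks (subsumption, relations, axioms) than this set-membership test.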

The PhD student will study the different scenarios requiring the revision of such an annotation system and propose a method of dynamic annotation integrating such a revision process. The dynamic annotation method must 1) integrate consistency criteria and coverage metrics to identify when the revision of an annotation system is necessary, 2) propose revision procedures adapted to different use scenarios and 3) control the convergence of the overall revision process.
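The first requirement can be illustrated with a small, hypothetical trigger function. In this sketch, coverage is the share of candidate segments (for instance, term occurrences proposed by an extraction tool) that carry an annotation, which is a recall-like measure, and a consistency violation is a segment annotated with two entities that the ontology declares disjoint. The representation of annotations as `(start, end, entity)` tuples, the `disjoint` set of entity pairs and the 0.8 threshold are illustrative assumptions, not choices made in the proposal.

```python
from typing import Iterable

Span = tuple[int, int]
Ann = tuple[int, int, str]  # (start, end, entity identifier)


def coverage(candidates: Iterable[Span], annotated: Iterable[Span]) -> float:
    """Recall-like measure: share of candidate segments that carry an annotation."""
    cands = set(candidates)
    if not cands:
        return 1.0  # nothing to cover
    return len(cands & set(annotated)) / len(cands)


def inconsistencies(annotations: Iterable[Ann], disjoint: set[frozenset[str]]) -> list[Span]:
    """Segments annotated with two entities declared disjoint in the ontology."""
    by_span: dict[Span, set[str]] = {}
    for start, end, entity in annotations:
        by_span.setdefault((start, end), set()).add(entity)
    return [span for span in sorted(by_span)
            if any(pair <= by_span[span] for pair in disjoint)]


def revision_needed(candidates: list[Span], annotations: list[Ann],
                    disjoint: set[frozenset[str]], min_coverage: float = 0.8) -> bool:
    """Trigger a revision if coverage is too low or an inconsistency is found."""
    spans = [(start, end) for start, end, _ in annotations]
    too_sparse = coverage(candidates, spans) < min_coverage
    return too_sparse or bool(inconsistencies(annotations, disjoint))
```

Keeping the two criteria as separate functions makes it easy to report why a revision was triggered, which matters if different revision procedures are to be chosen depending on the cause.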

Starting with the simplest types of annotation (e.g. a text annotated with instances and concepts of an ontology), the student will propose a method for dynamic annotation. The work will rely on existing semantic annotation tools, on the expertise of RCLN team members and on real use cases to assess the contribution of this dynamic annotation.

The proposed method will be integrated directly into an existing annotation tool, or tested through simulation if integration proves too complex.

Unlike sequential approaches, the goal of this PhD is to design a semantic annotation method that not only populates semantic models but also updates them dynamically while the text is being annotated. In practical terms, this means integrating the annotation and knowledge acquisition processes.
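The difference with a sequential approach can be illustrated by a toy loop that alternates annotation and model enrichment. Everything here is a simplifying assumption rather than the proposed method: the model is a lexicon mapping terms to entity identifiers, annotation is whole-word string matching, the candidate terms are supposed to come from some term extraction tool, and `acquire` is a hypothetical stand-in for a real knowledge acquisition step. The `max_rounds` bound is one simple way to keep the process from looping forever when a gap cannot be closed.

```python
import re


def annotate(text: str, lexicon: dict[str, str]) -> list[tuple[int, int, str]]:
    """Project the lexicalized model onto the text (whole-word, case-insensitive)."""
    anns = []
    for term, entity in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            anns.append((m.start(), m.end(), entity))
    return sorted(anns)


def acquire(term: str) -> str:
    """Hypothetical acquisition step: create a new entity for an unknown term."""
    return "New_" + term.replace(" ", "_")


def dynamic_annotation(text: str, lexicon: dict[str, str], candidates: list[str],
                       max_rounds: int = 5):
    """Alternate annotation and model update until no gap remains or max_rounds."""
    lexicon = dict(lexicon)  # the model is enriched during annotation
    anns = annotate(text, lexicon)
    for _ in range(max_rounds):
        covered = {text[s:e].lower() for s, e, _ in anns}
        gaps = [t for t in candidates if t.lower() not in covered]
        if not gaps:
            break  # fixed point: every candidate term is annotated
        for term in gaps:
            lexicon.setdefault(term, acquire(term))
        anns = annotate(text, lexicon)
    return lexicon, anns
```

With the text "The car has an engine and a gearbox.", the lexicon `{"car": "Car", "engine": "Engine"}` and the candidates `["car", "engine", "gearbox"]`, the first round leaves "gearbox" uncovered; the loop adds a `New_gearbox` entity, re-annotates, and stops once every candidate is annotated.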

In order to design a dynamic semantic annotation method, we propose the following plan:

  1. To define the target annotation types and identify the tools needed to implement them (a new annotation tool may have to be developed).
  2. To identify and model the triggering conditions that would justify updating the model during the annotation process (for instance, a recall measure or the detection of an inconsistency).
  3. To define and formalize the update procedures of the semantic model and the operations that implement them: the update may concern the model itself (adding, removing or modifying a given element, or a wider update of the model), but the annotation rules that project the model onto the text may be affected as well.
  4. To re-verify, when necessary, annotations that have already been assigned, in order to take a model update into account.
  5. To extend the scope of the update to the case where aligned semantic resources are being used in parallel for the annotation process.
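Steps 3 and 4 of this plan can be sketched as follows, assuming (for illustration only) that the model is a set of entity identifiers, that annotations are `(start, end, entity)` tuples, and that updates are restricted to three hypothetical elementary operations: adding, removing and renaming an entity. After each operation, the annotations already assigned are re-checked: links to a renamed entity are redirected, and links to a removed entity are dropped. Wider restructurings of the model, changes to annotation rules and aligned resources (step 5) are outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Union

Ann = tuple[int, int, str]  # (start, end, entity identifier)


@dataclass(frozen=True)
class Add:
    entity: str


@dataclass(frozen=True)
class Remove:
    entity: str


@dataclass(frozen=True)
class Rename:
    old: str
    new: str


Update = Union[Add, Remove, Rename]


def apply_update(ontology: set[str], op: Update) -> set[str]:
    """Step 3: return a new model after one elementary update operation."""
    if isinstance(op, Add):
        return ontology | {op.entity}
    if isinstance(op, Remove):
        return ontology - {op.entity}
    if isinstance(op, Rename):
        return (ontology - {op.old}) | {op.new}
    raise TypeError(f"unknown update operation: {op!r}")


def reverify(annotations: list[Ann], op: Update, ontology: set[str]) -> list[Ann]:
    """Step 4: re-check annotations already assigned, given the updated model."""
    revised = []
    for start, end, entity in annotations:
        if isinstance(op, Rename) and entity == op.old:
            entity = op.new  # redirect links to the renamed entity
        if entity in ontology:  # drop links to entities that no longer exist
            revised.append((start, end, entity))
    return revised
```

For example, applying `Rename("Car", "Vehicle")` to the model `{"Car", "Engine"}` and then re-verifying `[(4, 7, "Car"), (15, 21, "Engine")]` yields `[(4, 7, "Vehicle"), (15, 21, "Engine")]`.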

While the PhD student can rely on existing work on ontology population [(magnini)] [(giuliano)], semantic annotation [(Borislav Popov, Atanas Kiryakov, Damyan Ognyanoff, Dimitar Manov, and Angel Kirilov. //Kim – a semantic platform for information extraction and retrieval//. Nat. Lang. Eng., 10(3-4):375–392, 2004)] [(kirkayov)] 2) and the evolution of semantic resources, especially ontologies 3) 4) 5), she/he should extend and structure this work according to the project goals.

The student will also rely on the tools for knowledge acquisition from text developed by the RCLN team (Terminae 6) and SemEx 7)) and on the team's experience in building semantically annotated corpora, whether automatically 8) 9) or manually 10). She/he could also make use of related work on semantic information retrieval 11).

At first, it will be necessary to work with classic ontology and thesaurus models based on the traditional standards (SKOS, OWL-DL) and on well-established technologies, but other semantic models could be proposed later.

The problem of dynamic semantic annotation arises in both automatic and manual annotation processes. The PhD work could focus on automatic annotation or address both approaches in parallel.

The work will be supervised by Prof. Adeline Nazarenko and Prof. François Lévy.

The student will be integrated into the RCLN team and benefit from its expertise in natural language processing, knowledge engineering and the semantic web. In particular, RCLN has solid experience in semantic annotation (manual annotation [(K. Fort. //Les ressources annotées, un enjeu pour l’analyse de contenu : vers une méthodologie de l’annotation manuelle de corpus//. PhD thesis in computer science, Université Paris 13 – Sorbonne Paris Cité, Villetaneuse, France, 2012.)] [(fort1)] or annotation based on machine learning 12)), in formalisms and resources for annotation [(Y. Ma, A. Nazarenko, L. Audibert. //Formal description of resources for ontology-based semantic annotation//. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), Malta, May 2010. ELRA.)] [(N. Omrane, A. Nazarenko, P. Rosina, S. Szulman, C. Westphal. //Lexicalized ontology for a business rules management platform: An automotive use case//. In Proceedings of the 5th International Symposium on Rules, International Business Rules Forum (RuleMF@BRF), Ft Lauderdale, Florida, USA, November 2011.)] and in text-based ontology design 13). The team also has experience in integrating these acquisition and annotation methods into content analysis tools [(A. Guissé, F. Lévy, A. Nazarenko. //Un moteur sémantique pour explorer des textes réglementaires//. In Actes des 22èmes journées francophones d'Ingénierie des Connaissances, Chambéry, 2011.)] [(F. Lévy, A. Nazarenko, A. Guissé. //Annotation, indexation et parcours de documents numériques//. Revue des Sciences et Technologies de l'Information, 13(3/2010):121–152, 2010.)] 14).

The student will work at LIPN (University Paris 13 - Sorbonne Paris Cité & CNRS) where he/she will be assigned a desk. He/she will have access to local facilities and data resources.

RCLN is a member of the laboratory of excellence "Empirical Foundations of Language" (Labex EFL, research strand "computational semantic analysis"), in which dynamic semantic annotation is a main scientific concern.

Applications should be addressed to Adeline Nazarenko before May 26, 2014: send a cover letter and a CV.


1), 2) V. Uren, P. Cimiano, J. Iria, S. Handschuh, M. Vargas-Vera, E. Motta, F. Ciravegna. « Semantic annotation for knowledge management: Requirements and a survey of the state of the art ». Journal of Web Semantics, 4, 2006.
3) Pieter De Leenheer and Tom Mens. Ontology evolution: State of the art and future directions. In Martin Hepp, Pieter De Leenheer, Aldo de Moor, and York Sure, editors, Ontology Management: Semantic Web, Semantic Web Services, and Business Applications, pages 131–176. Springer, 2007.
4) Zied Sellami, Valérie Camps, and Nathalie Aussenac-Gilles. Dynamo-mas: a multi-agent system for ontology evolution from text. J. Data Semantics, 2(2-3):145–161, 2013.
5) Rim Djedidi and Marie-Aude Aufaure. Ontology change management. In A. Paschke, H. Weigand, W. Behrendt, K. Tochtermann, and T. Pellegrini, editors, 5th International Conference on Semantic Systems (I-Semantics 09), Proceedings of I-KNOW'09 and I-SEMANTICS'09, pages 611–621, Graz, Austria, September 2009. Verlag der Technischen Universität Graz.
6) Nathalie Aussenac-Gilles, Sylvie Despres, and Sylvie Szulman. The TERMINAE Method and Platform for Ontology Engineering from texts. In Paul Buitelaar and Philipp Cimiano, editors, Bridging the Gap between Text and Knowledge - Selected Contributions to Ontology Learning and Population from Text, pages 199–223. IOS Press, January 2008.
7) François Lévy, Adeline Nazarenko, Abdoulaye Guissé, Nouha Omrane, and Sylvie Szulman. An environment for the joint management of written policies and business rules. In Proceedings of the International Conference on Tools with Artificial Intelligence (IEEE-ICTAI 10), pages 142–149, 2010.
8) Yue Ma, Adeline Nazarenko, and Laurent Audibert. Formal description of resources for ontology-based semantic annotation. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), Malta, May 2010. ELRA.
9), 12) Yue Ma, François Lévy, and Sudeep Ghimire. Reasoning with Annotations of Texts. In The 24th Florida Artificial Intelligence Research Society Conference (FLAIRS-24), pages 192–197, USA, May 2011.
10) Karën Fort, Adeline Nazarenko, and Sophie Rosset. Modeling the Complexity of Manual Annotation Tasks: a Grid of Analysis. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, December 2012.
11) Haïfa Zargayouna. Indexation sémantique de documents XML. PhD thesis, Université Paris-Sud, December 2005.
13) TERMINAE.
14) A. Nazarenko, A. Guissé, F. Lévy, N. Omrane, S. Szulman. Integrating Written Policies in Business Rule Management Systems. In Rule-Based Reasoning, Programming, and Applications, volume 6826 of Lecture Notes in Computer Science, pages 99–113, Barcelona, Spain, 2011.