===== Keywords =====
Natural Language Engineering, Semantic Annotation, Content Management, Knowledge Engineering, Semantic Web
===== Advisor =====
Adeline Nazarenko (LIPN, Université Paris 13 – Sorbonne Paris Cité & CNRS)
===== Abstract =====
The semantic annotation of documents plays a key role in many textual content management applications (e.g. navigation, semantic information retrieval, publication). Semantic annotation consists of enriching a text with metadata whose semantics is given by a formal semantic model (e.g. indexing language, thesaurus, ontology) <ref>B. Popov, A. Kiryakov, D. Ognyanoff, D. Manov, A. Kirilov. « KIM – a semantic platform for information extraction and retrieval ». //Natural Language Engineering//, 10(3-4):375–392, 2004.</ref> <ref name=“kirkayov”>A. Kiryakov, B. Popov, I. Terziev, D. Manov, and D. Ognyanoff. « Semantic annotation, indexing, and retrieval ». //Journal of Web Semantics//, 2(1):49–79, 2004.</ref> <ref name="uren">V. Uren, P. Cimiano, J. Iria, S. Handschuh, M. Vargas-Vera, E. Motta, F. Ciravegna. « Semantic annotation for knowledge management: Requirements and a survey of the state of the art ». //Journal of Web Semantics//, 4, 2006.</ref>. A formal semantic representation is thus associated with the text, so that search engines or software agents can jointly exploit the textual content (plain-text search, distributional measures) and its formal semantics.
First-generation annotation tools are quite simple: they often merely link mentions of named entities identified in texts to existing or new instances of ontology concepts <ref name=“magnini”>B. Magnini, E. Pianta, O. Popescu, M. Speranza. « Ontology population from textual mentions: Task definition and benchmark ». In //Proceedings of the OLP2 workshop on Ontology Population and Learning//, Sydney, Australia, 2006.</ref> <ref name="giuliano">C. Giuliano, A. Gliozzo. « Instance-based ontology population exploiting named-entity substitution ». In //Proceedings of the 22nd International Conference on Computational Linguistics// (Coling 2008), pages 265–272, Manchester, August 2008.</ref>. However, the development of specialized content management and linked data applications calls for renewed semantic annotation methods: we need methods and tools that offer richer annotation expressiveness (e.g. annotation with respect to concepts and relations, not only instances) while being robust, generic and adaptable to different domains and use cases.
===== Goal =====
The goal is to design a semantic annotation method incorporating annotation quality measures and enabling the dynamic revision of annotations, assuming that the semantic model is ontological.
Consider an annotation system S = <O,T,A> made of an ontology O, a text T and a set A of annotations, i.e. links associating segments of T with entities of O. S must be revised whenever one of its components is updated (the text is modified, the ontology is enriched or restructured) or when inconsistencies or coverage gaps are detected.
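To make the S = <O,T,A> formulation concrete, the following Python sketch shows one possible encoding. It is an illustration under simplifying assumptions, not the method the thesis will propose: O is reduced to the set of its entity identifiers, each link of A records the offsets and surface form of its segment in T, and the names Annotation, AnnotationSystem and stale_annotations are hypothetical. The method flags the links that no longer hold after O or T has been updated, which is one simple signal that S needs revising.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One link of A: a segment of T associated with an entity of O (hypothetical structure)."""
    start: int    # offset of the segment in T (inclusive)
    end: int      # offset of the segment in T (exclusive)
    surface: str  # text of the segment at annotation time
    entity: str   # identifier of a concept, instance or relation of O


@dataclass
class AnnotationSystem:
    """S = <O,T,A>, with O simplified to the set of its entity identifiers."""
    ontology: set[str]
    text: str
    annotations: set[Annotation] = field(default_factory=set)

    def stale_annotations(self) -> set[Annotation]:
        """Return the links of A invalidated by an update of O or T."""
        stale = set()
        for a in self.annotations:
            entity_removed = a.entity not in self.ontology
            segment_changed = self.text[a.start:a.end] != a.surface
            if entity_removed or segment_changed:
                stale.add(a)
        return stale
```

For instance, if T = "The car is a vehicle." is annotated with Car (offsets 4 to 7) and Vehicle (offsets 13 to 20), removing Car from O makes only the first link stale. Editing T without updating offsets invalidates every link whose segment moved, which is why a real system would also need to re-anchor annotations rather than merely detect broken ones.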
The PhD student will study the different scenarios requiring the revision of such an annotation system and propose a method of dynamic annotation integrating such a revision process. The dynamic annotation method must 1) integrate consistency criteria and coverage metrics to identify when the revision of an annotation system is necessary, 2) propose revision procedures adapted to different use scenarios and 3) control the convergence of the overall revision process.
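The first requirement, detecting when a revision is necessary, could for instance combine a coverage metric with a consistency criterion. The sketch below is one hypothetical instantiation of that idea, with assumed names and definitions: coverage is the share of candidate segments (e.g. domain terms spotted in T by some term extractor, assumed to be given) that carry at least one annotation; an inconsistency is a segment linked to two concepts declared disjoint in the ontology (as OWL allows with owl:disjointWith); and the default threshold of 0.8 is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Link:
    segment: str  # annotated segment of T (simplified: identified by its surface form)
    concept: str  # concept of O it is linked to


def coverage(links: Iterable[Link], candidates: Iterable[str]) -> float:
    """Share of candidate segments that carry at least one annotation."""
    candidate_set = set(candidates)
    if not candidate_set:
        return 1.0  # nothing to cover
    annotated = {link.segment for link in links}
    return len(annotated & candidate_set) / len(candidate_set)


def inconsistent_segments(links: Iterable[Link],
                          disjoint: Iterable[tuple[str, str]]) -> set[str]:
    """Segments annotated with two concepts declared disjoint in O."""
    concepts_of: dict[str, set[str]] = {}
    for link in links:
        concepts_of.setdefault(link.segment, set()).add(link.concept)
    disjoint_pairs = list(disjoint)
    return {
        seg for seg, concepts in concepts_of.items()
        if any(a in concepts and b in concepts for a, b in disjoint_pairs)
    }


def revision_needed(links, candidates, disjoint, min_coverage: float = 0.8) -> bool:
    """Trigger a revision when coverage is too low or an inconsistency is found."""
    links = list(links)  # iterated twice below
    return (coverage(links, candidates) < min_coverage
            or bool(inconsistent_segments(links, disjoint)))
```

In practice, the candidate segments, the disjointness axioms and the threshold would depend on the use scenario, which is what requirement 2) asks to study.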
Starting with the simplest types of annotation (e.g. a text annotated with instances and concepts of an ontology), the student will develop a dynamic annotation method. The work will rely on existing semantic annotation tools, on the expertise of RCLN team members and on real use cases to assess the benefits of this dynamic annotation.
The proposed method will be integrated directly into an existing annotation tool, or tested through simulation if integration proves too complex.
Unlike sequential approaches, this PhD aims to design a semantic annotation method that not only populates semantic models but also updates them dynamically as annotation proceeds. In practical terms, this means integrating the annotation and knowledge acquisition processes.
===== Perspective and plan =====
In order to design a dynamic semantic annotation method, we propose the following plan:
- To define the target annotation types and identify the tools needed to implement them (possibly by developing a new annotation tool).
- To identify and model the conditions that trigger a model update during the annotation process (for instance, a recall measure or the detection of an inconsistency).
- To define and formalize the procedures for updating the semantic model and the operations that implement them: an update may affect the model itself (adding, removing or modifying an element, or a broader restructuring of the model), and it may also affect the annotation rules that project the model onto the text.
- In some cases, annotations that have already been assigned will need to be re-verified to take a model update into account.
- To extend the update procedures to the case where several aligned semantic resources are used in parallel for annotation.
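Interleaving model updates with annotation raises the convergence question mentioned in the goal: a model update can change the annotations, which can in turn trigger new model updates. A minimal way to control this, assuming the revision step is a deterministic function of the whole system state, is to iterate that step until a fixpoint is reached or a round budget runs out. In the Python sketch below, the state encoding and all function names are hypothetical, and acquire_missing_concepts is a toy revision step that adds to O every concept used by a link but absent from O.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical state of S: (concepts of O, links of A as (segment, concept) pairs).
State = Tuple[FrozenSet[str], FrozenSet[Tuple[str, str]]]


def acquire_missing_concepts(state: State) -> State:
    """Toy revision step: add to O every concept referenced by a link but absent from O."""
    concepts, links = state
    return concepts | {concept for _, concept in links}, links


def revise_until_stable(state: State,
                        revise: Callable[[State], State],
                        max_rounds: int = 10) -> Tuple[State, int, bool]:
    """Apply `revise` until the state no longer changes or `max_rounds` is reached.

    Returns (final state, number of rounds performed, converged?).
    """
    for round_no in range(1, max_rounds + 1):
        new_state = revise(state)
        if new_state == state:  # fixpoint: this round changed nothing
            return state, round_no, True
        state = new_state
    return state, max_rounds, False
```

With two links (car, Car) and (truck, Vehicle) and O = {Vehicle}, the toy step adds Car in the first round and changes nothing in the second, so the loop reports convergence after two rounds. A step that keeps undoing itself (say, a concept added by acquisition and then removed by a consistency check) would exhaust the budget and be reported as non-convergent instead of looping forever.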
While the PhD student can build on existing work on ontology population <ref name=“magnini”/> <ref name="giuliano"/>, semantic annotation <ref>Borislav Popov, Atanas Kiryakov, Damyan Ognyanoff, Dimitar Manov, and Angel Kirilov. KIM – a semantic platform for information extraction and retrieval. Nat. Lang. Eng., 10(3-4):375–392, 2004.</ref> <ref name=“kirkayov”/> <ref name="uren"/> and the evolution of semantic resources, especially ontologies <ref>Pieter De Leenheer and Tom Mens. Ontology evolution: state of the art and future directions. In Martin Hepp, Pieter De Leenheer, Aldo de Moor, and York Sure, editors, Ontology Management: Semantic Web, Semantic Web Services, and Business Applications, pages 131–176. Springer, 2007.</ref> <ref>Zied Sellami, Valérie Camps, and Nathalie Aussenac-Gilles. DYNAMO-MAS: a multi-agent system for ontology evolution from text. J. Data Semantics, 2(2-3):145–161, 2013.</ref> <ref>Rim Djedidi and Marie-Aude Aufaure. Ontology change management. In A. Paschke, H. Weigand, W. Behrendt, K. Tochtermann, and T. Pellegrini, editors, 5th International Conference on Semantic Systems (I-Semantics 09), Proceedings of I-KNOW'09 and I-SEMANTICS'09, pages 611–621, Graz, Austria, September 2009. Verlag der Technischen Universität Graz.</ref>, she/he should extend and structure this work according to the project goals.
The student will also rely on the tools for knowledge acquisition from text developed by the RCLN team (Terminae <ref name=“terminae”>Nathalie Aussenac-Gilles, Sylvie Despres, and Sylvie Szulman. The TERMINAE Method and Platform for Ontology Engineering from texts. In Paul Buitelaar and Philipp Cimiano, editors, Bridging the Gap between Text and Knowledge - Selected Contributions to Ontology Learning and Population from Text, pages 199–223. IOS Press, January 2008.</ref> and SemEx <ref>François Lévy, Adeline Nazarenko, Abdoulaye Guissé, Nouha Omrane, and Sylvie Szulman. An environment for the joint management of written policies and business rules. In Proceedings of the International Conference on Tools with Artificial Intelligence (IEEE-ICTAI 10), pages 142–149, 2010.</ref>) and on the team's experience in building semantically annotated corpora, either automatically <ref name=“yue2”>Yue Ma, Adeline Nazarenko, and Laurent Audibert. //Formal description of resources for ontology-based semantic annotation//. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), Malta, May 2010. ELRA.</ref><ref name=“yue1”>Yue Ma, François Lévy, and Sudeep Ghimire. Reasoning with Annotations of Texts. In The 24th Florida Artificial Intelligence Research Society Conference (FLAIRS-24), pages 192–197, USA, May 2011.</ref> or manually <ref name=“fort1”>Karën Fort, Adeline Nazarenko, and Sophie Rosset. Modeling the Complexity of Manual Annotation Tasks: a Grid of Analysis. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, December 2012.</ref>. She/he could also make use of related work on semantic information retrieval <ref>Haïfa Zargayouna. Indexation sémantique de documents XML. PhD thesis, Université Paris-Sud, December 2005.</ref>. Initially, the work will focus on classic ontology and thesaurus models, using the traditional standards (SKOS, OWL-DL) and well-established technologies, but other semantic models could be proposed later on. The problem of dynamic semantic annotation arises in both automatic and manual annotation processes. The PhD work could focus on automatic annotation or address both approaches in parallel.

===== Context =====

The work will be supervised by Prof. Adeline Nazarenko and Prof. François Lévy. The student will be integrated into the RCLN team and will benefit from its expertise in natural language processing, knowledge engineering and the Semantic Web. In particular, RCLN has solid experience in semantic annotation (manual annotation <ref>K. Fort. Les ressources annotées, un enjeu pour l’analyse de contenu : vers une méthodologie de l’annotation manuelle de corpus. PhD thesis in computer science, Université Paris 13 – Sorbonne Paris Cité, Villetaneuse, France, 2012.</ref> <ref name=“fort1”></ref> or annotation based on machine learning <ref name=“yue1”></ref>, formalisms and resources for annotation <ref>Y.
Ma, A. Nazarenko, L. Audibert. Formal description of resources for ontology-based semantic annotation. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), Malta, May 2010. ELRA.</ref> <ref>N. Omrane, A. Nazarenko, P. Rosina, S. Szulman, C. Westphal. Lexicalized ontology for a business rules management platform: An automotive use case. In Proceedings of the 5th International Symposium on Rules, International Business Rules Forum (RuleML@BRF), Ft Lauderdale, Florida, USA, November 2011.</ref>) and text-based ontology design <ref name=“terminae”></ref>. It also has experience integrating these acquisition and annotation methods into content analysis tools <ref>A. Guissé, F. Lévy, A. Nazarenko. Un moteur sémantique pour explorer des textes réglementaires. In Proceedings of the 22nd Journées francophones d'Ingénierie des Connaissances, Chambéry, 2011.</ref> <ref>F. Lévy, A. Nazarenko, A. Guissé. Annotation, indexation et parcours de documents numériques. Revue des Sciences et Technologies de l'Information, 13(3/2010):121–152, 2010.</ref> <ref>A. Nazarenko, A. Guissé, F. Lévy, N. Omrane, S. Szulman. Integrating Written Policies in Business Rule Management Systems. In Rule-Based Reasoning, Programming, and Applications, volume 6826 of Lecture Notes in Computer Science, pages 99–113, Barcelona, Spain, 2011.</ref>.

The student will work at LIPN (Université Paris 13 – Sorbonne Paris Cité & CNRS), where he/she will be assigned a desk and will have access to local facilities and data resources. RCLN is a member of the laboratory of excellence "Empirical Foundations of Language" (Labex EFL, research strand “computational semantic analysis”), in which dynamic semantic annotation is a major scientific concern.

Applications (a cover letter and a CV) should be sent to Adeline Nazarenko before May 26, 2014.
===== References =====

<references/>

===== Conferences and summer schools =====

  * LDQ 2015: 2nd Workshop on Linked Data Quality, co-located with ESWC 2015, Portorož, Slovenia (deadline: March 6, 2015)
  * TOTh 2015 conference, training course: Building ontologies for terminological purposes (Construction d'ontologies à des fins terminologiques), June 2–3, 2015, Chambéry
  * ATALA study day: Opinion mining and sentiment analysis (Fouille d'opinions et analyse de sentiments), March 21, 2015
  * EUROLAN 2015: Summer School on Linguistic Linked Open Data, July 13–25, 2015, Sibiu, Romania
  * Call for papers: 4th Workshop on the Multilingual Semantic Web, June 1, 2015, Portorož, Slovenia
  * 1st Summer Datathon on Linguistic Linked Open Data (SD-LLOD'15), June 15–19, 2015, Universidad Politécnica de Madrid
  * 9th International Web Rule Symposium (RuleML)
  * 27th European Summer School in Logic, Language and Information (ESSLLI 2015), which includes a part on ontologies

===== PhD topic (English) =====

Dynamic semantic annotation: analysis, modeling and implementation

===== PhD topic (French) =====

Dynamique de l’annotation sémantique : analyse, modélisation et mise en œuvre