An information extraction method for assisting a citizen count of violent events in Mexico

From wikiRcln
Revision of 20 March 2015, 23:41, by Jgflores


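Spanish title: "Un método de extracción de información para cuantificar eventos violentos en México con organizaciones civiles" (An information extraction method for quantifying violent events in Mexico with civil organizations)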


This project aims to assist, by means of an information extraction method, the citizen counters of the @menosdias project, who since 2010 have counted more than 50,000 victims of violence in Mexico. Counters are volunteers who must each read the Mexican online press for one week in order to register violent events on the @menosdias blog and Twitter account. The goal is to extract violent events from online sources and propose violent event candidates to the counter. The main outputs would be a blog post, a tweet and a record in a violent events database.
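As a minimal sketch of the three outputs, assuming a hypothetical event record with a date, a place, a victim count, optional victim names and a source URL (none of these fields are specified by the plan), one validated event could be rendered both as a tweet and as a database row. The class, field and method names are illustrative only; the 140-character limit is Twitter's limit in 2015.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple

TWEET_LIMIT = 140  # Twitter's per-tweet character limit in 2015


@dataclass
class ViolentEvent:
    """Hypothetical record for one violent event candidate."""
    day: date
    place: str
    victims: int
    source_url: str
    names: List[str] = field(default_factory=list)
    validated: bool = False  # becomes True once the human counter confirms it

    def to_tweet(self) -> str:
        """Render the event as a tweet, shortening the names if needed."""
        who = ", ".join(self.names) if self.names else f"{self.victims} victim(s)"
        head = f"{self.day.isoformat()} {self.place}: "
        tail = f" {self.source_url}"
        room = TWEET_LIMIT - len(head) - len(tail)
        if len(who) > room:
            # Keep date, place and source intact; truncate only the victim part.
            who = who[: room - 3] + "..." if room >= 3 else ""
        return head + who + tail

    def to_db_row(self) -> Tuple[str, str, int, str, str]:
        """Flatten the event into a row for a violent events database table."""
        return (self.day.isoformat(), self.place, self.victims,
                "; ".join(self.names), self.source_url)
```

A blog post could be rendered from the same record in the same way, so that the three outputs never disagree on date, place or source.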


  1. Parallel corpus construction: one corpus would be built from all the blog posts since 2015, the other from the tweets.
  2. Corpus alignment by means of semantic similarity between blog posts and tweets.
  3. Named entity annotation of places, person names and dates on the parallel corpus.
  4. Part-of-speech (POS) tagging and syntactic parsing.
  5. Semantic parsing and violent event extraction.
  6. Validation of violent event candidates by the human counter.
  7. Creation of training and testing datasets.
  8. First evaluation on the testing datasets.
  9. Second evaluation on @menosdias blank weeks (weeks for which no human volunteer was found to count).
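For the first evaluation (step 8), one simple option is to compute precision, recall and F1 of the extracted events against the human count. This sketch assumes, hypothetically, that a system event and a human-counted event match when they share the same date and the same normalized place name; the key and function names are illustrative. Keying on (date, place) merges several events that happen on the same day in the same place, so a finer key would likely be needed in practice.

```python
from typing import Iterable, Set, Tuple

EventKey = Tuple[str, str]  # hypothetical matching key: (ISO date, place name)


def normalize_place(place: str) -> str:
    # Case-insensitive comparison with collapsed whitespace.
    return " ".join(place.lower().split())


def evaluate(predicted: Iterable[EventKey],
             gold: Iterable[EventKey]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted event keys against the human count."""
    pred: Set[EventKey] = {(d, normalize_place(p)) for d, p in predicted}
    ref: Set[EventKey] = {(d, normalize_place(p)) for d, p in gold}
    tp = len(pred & ref)  # events found both by the system and by the counter
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```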


First Iteration (EMNLP)

  1. Extract blog posts and tweets
    • Assigned to: Iván and Jorge
    • Due date: March 27
  2. Corpus alignment
    1. Calculate SOPA semantic similarity between blog posts and tweets in chronological order
    2. Align blog posts and tweets
    • Assigned to: Iván, Jorge, Davide
    • Due date: April 17
  3. Alignment evaluation
    • Assigned to: Iván and Jorge
  4. EMNLP paper writing
    • Deadlines
      • (long papers): May 30
      • (short papers): June 15
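SOPA itself is not reproduced here. As a stand-in, the following sketch uses plain bag-of-words cosine similarity and processes blog posts in chronological order, pairing each with its most similar not-yet-aligned tweet published within a small time window. The window size, the similarity threshold and all names are hypothetical choices, not values taken from the plan.

```python
import math
import re
from collections import Counter
from datetime import date
from typing import Dict, List, Optional, Set, Tuple

Doc = Tuple[str, date, str]  # (identifier, publication date, text)


def bag_of_words(text: str) -> Counter:
    # Lower-cased word tokens; \w also matches accented letters such as "á".
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align(posts: List[Doc], tweets: List[Doc], window_days: int = 1,
          threshold: float = 0.2) -> List[Tuple[str, str, float]]:
    """Greedily pair each blog post, oldest first, with its most similar unused tweet."""
    tweet_bags: Dict[str, Counter] = {tid: bag_of_words(text) for tid, _, text in tweets}
    used: Set[str] = set()
    pairs: List[Tuple[str, str, float]] = []
    for pid, pday, ptext in sorted(posts, key=lambda post: post[1]):
        post_bag = bag_of_words(ptext)
        best_id: Optional[str] = None
        best_score = 0.0
        for tid, tday, _ in tweets:
            # Only consider tweets not yet aligned and published close to the post.
            if tid in used or abs((tday - pday).days) > window_days:
                continue
            score = cosine(post_bag, tweet_bags[tid])
            if score >= threshold and score > best_score:
                best_id, best_score = tid, score
        if best_id is not None:
            used.add(best_id)
            pairs.append((pid, best_id, best_score))
    return pairs
```

Replacing `cosine()` with a call to a SOPA model would leave the alignment loop unchanged; the time window keeps an identical tweet from a different week from being aligned with the wrong post.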

Second Iteration

  1. Syntactic and semantic parsing and information extraction...
  2. Web application for @menosdias counters
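Until parsing-based extraction is available, a rule-based baseline could already propose candidate sentences to counters through the web application. This sketch swaps in a much simpler technique than syntactic or semantic parsing: keyword matching against tiny hypothetical lists of Spanish trigger words and single-word place names. Both lists and all names are illustrative assumptions, and multi-word places such as "Ciudad Juárez" are not handled.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical, deliberately tiny lexicons (lower-cased Spanish words).
TRIGGERS = {"asesinan", "asesinado", "asesinados", "ejecutan", "ejecutado",
            "matan", "hallan", "balacera"}
PLACES = {"acapulco", "culiacán", "monterrey", "tijuana", "chilpancingo"}


class Candidate(NamedTuple):
    sentence: str
    trigger: str
    place: Optional[str]  # None when no known place name appears in the sentence


def propose_candidates(article: str) -> List[Candidate]:
    """Return sentences containing a trigger word, as candidates for human validation."""
    candidates: List[Candidate] = []
    # Naive sentence split on end punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", article.strip()):
        tokens = re.findall(r"\w+", sentence.lower())
        trigger = next((t for t in tokens if t in TRIGGERS), None)
        if trigger is None:
            continue
        place = next((t for t in tokens if t in PLACES), None)
        candidates.append(Candidate(sentence, trigger, place))
    return candidates
```

Each candidate would still go through the counter's validation (step 6 of the plan) before becoming a blog post, a tweet or a database record.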



LIPN - Université Paris 13




On @menosdias

  1. Menos Días Aquí: Twitter account and blog
  2. “Menos Días Aquí: Civilian Casualties, the Archive, and Naming Violent Murders in Mexico”. E-misférica, 9.1-9.2, summer 2012.
  3. Menos días aquí: Conteo, archivo y nombramiento civil de muertes por violencia en México [Menos días aquí: Counting, archive and civilian naming of violent deaths in Mexico]
  4. Latest battlefield in Mexico's drug war: Social media
  5. Un blog ciudadano pone rostro a los muertos de la lucha contra el narco [A citizen blog puts a face on the dead of the fight against drug trafficking]
  6. Messico, una mattanza senza fine. In un blog la conta dei morti per mantenere viva la memoria [Mexico, an endless slaughter: a blog counts the dead to keep their memory alive]

On violent event extraction

  1. The New War Correspondents: The Rise of Civic Media Curation in Urban Warfare, by Andres Monroy-Hernandez, Danah Boyd, Emre Kıcıman, Munmun De Choudhury, and Scott Counts, 23 February 2013.
  2. Knowing Where and How Criminal Organizations Operate Using Web Content, by Viridiana Rios and Michele Coscia, 24 November 2012.
  3. Iraq Body Count
  4. Egypt's death toll
  5. Every casualty


  • May 30: EMNLP 2015
  • August 2015: Violent event extraction into a database
  • November 2015: A web application for @menosdias counters
  • 2016: A journal publication, automatic extraction for blank weeks (and maybe some funding)