An information extraction method for assisting a citizen count of violent events in Mexico
The aim is to provide an information extraction method to assist the citizen counters of the @menosdias project, who since 2010 have counted more than 50,000 victims of violence in Mexico. Counters are volunteers who each read the Mexican online press for one week and register violent events on the @menosdias blog and Twitter account. The goal of the project is to extract violent events from online sources and propose violent event candidates to the counter. The main output would be a blog post, a tweet and a record in a violent events database.
- Parallel corpus construction: one corpus would be built from all the blog posts since 2015, the other from the tweets
- Corpus alignment by means of semantic similarity between blog posts and tweets
- Named entity annotation of places, person names and dates on the parallel corpus
- POS annotation and syntactic parsing
- Semantic parsing and violent event extraction
- Validation of violent event candidates by the human counter
- Creation of training and testing datasets
- First evaluation on the testing datasets
- Second evaluation on @menosdias blank weeks (weeks for which no human volunteer was found to count)
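The corpus alignment step above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: it substitutes a plain bag-of-words cosine similarity for the semantic similarity measure, and the function names (`tokenize`, `cosine_similarity`, `align`) and the `threshold` value are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity; a stand-in for a real semantic similarity score."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[tok] * vec_b[tok] for tok in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align(posts, tweets, threshold=0.2):
    """Map each tweet index to the index of its most similar blog post.

    A tweet is mapped to None when no post reaches the (assumed) threshold.
    """
    alignment = {}
    for tweet_idx, tweet in enumerate(tweets):
        best_idx, best_score = None, 0.0
        for post_idx, post in enumerate(posts):
            score = cosine_similarity(tweet, post)
            if score > best_score:
                best_idx, best_score = post_idx, score
        alignment[tweet_idx] = best_idx if best_score >= threshold else None
    return alignment
```

Tweets whose best score falls below the threshold are left unaligned (`None`) so that they can be reviewed by hand. A real run would replace `cosine_similarity` with the semantic similarity measure chosen for the project.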
First Iteration (EMNLP)
- Extract blog posts and tweets
  - Assigned to: Iván and Jorge
  - Due date: March 27
- Corpus alignment
  - Calculate SOPA semantic similarity between blog posts in chronological order
  - Align blog posts and tweets
  - Assigned to: Iván, Jorge, Davide
  - Due date: April 17
- Alignment evaluation
  - Assigned to: Iván and Jorge
- EMNLP paper writing
  - Deadline (long papers): May 30
  - Deadline (short papers): June 15
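One possible way to carry out the alignment evaluation is to compare the automatic tweet-to-post pairs against a manually checked sample and report precision, recall and F1. The sketch below assumes that both alignments are given as collections of `(tweet_id, post_id)` pairs. The function name `evaluate_alignment` and this representation are hypothetical, since the plan does not specify an evaluation protocol.

```python
def evaluate_alignment(predicted_pairs, gold_pairs):
    """Return (precision, recall, f1) of predicted pairs against gold pairs.

    Both arguments are iterables of (tweet_id, post_id) tuples (assumed format).
    """
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Precision here is the share of proposed pairs that are correct, and recall is the share of gold pairs that were found, so a system that aligns very few tweets can score high precision but low recall.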
Second Iteration
- Syntactic, semantic parsing and information extraction…
- Web application for @menosdias counters
LIPN - Université Paris 13
- Davide Buscaldi, LIPN, Université Paris 13
- Jorge García Flores, LIPN, Université Paris 13
- Thierry Charnois, LIPN, Université Paris 13
- Alejandro Vélez
- Jaimie (UK)
On @menosdias
- Menos Días Aquí: Twitter account and blog
On violent event extraction
- The New War Correspondents: The Rise of Civic Media Curation in Urban Warfare, by Andres Monroy-Hernandez, Danah Boyd, Emre Kıcıman, Munmun De Choudhury, and Scott Counts, 23 February 2013.
- Knowing Where and How Criminal Organizations Operate Using Web Content, by Viridiana Rios and Michele Coscia, 24 November 2012.
- May 30: EMNLP 2015
- August 2015: Violent event extraction to a database
- November 2015: A web application for @menosdias counters
- 2016: A journal publication, automatic extraction for blank weeks (and maybe some funding)