Information extraction from Spanish tweets hackathon (2015)


We are seeking funding to organize a hackathon on information extraction from Spanish tweets, to be held in Mexico in 2015. The goal of the hackathon is to give post-graduate students hands-on experience with natural language processing techniques and, possibly, to attract them to master's or PhD studies in our field.

The event would be held in Mexico City (UNAM) or Puebla (INAOE) in April 2015. Students would be offered a list of information extraction and classification tasks of increasing complexity, based on the RepLab task from CLEF 2014 [reference needed] and on a Spanish tweet corpus provided by BUAP.

  • Named entity extraction: simple extraction of places, person names, institution names, and explicit dates from tweets.
  • News sentiment analysis: classification of tweets according to the positive or negative opinion they express about breaking news.
  • Violent events: extraction of information from tweets reporting violent events, including geographical and temporal information where it can be extracted.
  • Migration movements: extraction of information about people on the move, specifically Latin American emigrants.
  • Disease spread: classification of, and information extraction from, tweets in which people talk about symptoms, diseases, and self-protective behavior.

The hackathon would follow a 48-hour format split over two days. The first day would be dedicated to an introduction to NLP methods and to talks about the task set and the available corpus and tools. The second day would follow a more traditional hackathon approach: a 24-hour non-stop programming challenge in which participants bring their own laptops and the organizers provide shelter, data, Wi-Fi, beverages, and food. If NAACL funding is granted, we will seek additional funding sources for prizes for the best applications.


The raw material for all the challenges is a corpus of N Spanish tweets and a computational linguistics toolkit.

Sentiment analysis

Classification of tweets according to the polarity (positive or negative) of the opinion expressed about the tweet's topic.

Entity extraction

Extract places, people, institutions, and dates specified in the tweets.

Violent event detection

Extraction of information about violent events reported in the tweets, preferably with geographic and temporal coordinates (if these are specified in the tweet).

Health event detection

Extraction of information about disease outbreaks or general health conditions expressed in the tweets.

Author characterization

Define a profile characterizing the author of the tweets (sex, age, occupation, place of origin, place of residence, likes, dislikes).

Twitter analytics

An open challenge consisting of providing statistical analyses


  • What will this funding enable you to achieve?

See the proposal above.

  • Budget for total amount of NAACL funding requested (in US dollars)

$1,500 to support the hackathon activities

  • Indicate how you will distribute this funding.
 * Travel expenses: $500 (bus fares for 10 students)
 * Accommodation: $500 (7 double rooms for students and organizers)
 * Food: $500 (meals for one day; the rest to be funded by other means)
  • Are you willing to provide us with a post-conference/workshop report?


  • Contact information for the individual or organization making the request.

To be decided

  • Include URL links to any additional information.

To be decided

Work plan


Official page of the 2015 hackathon