Notes on the SemEval 2015 conference
Wednesday, June 3 2015
- On the plane I read Eneko's paper on the STS task. I noted the following quotations…
- The top 10 systems [English task] did not show statistically significant variation among them.
- Aligning words between sentences has been the most popular approach for the top three participants (DLS@CU, ExBThemis, Samsung). They use WordNet (Miller, 1995), Mikolov Embeddings (Mikolov et al., 2013; Baroni et al., 2014) and PPDB (Ganitkevitch et al., 2013).
- Most teams add a machine learning algorithm to learn the output scores, but note that the Samsung team did not use one in their best run.
- Only about one fifth of the systems were unsupervised, among which the top performing system, UMDuluth-BlueTeam-run1, was able to come within 0.1 correlation points of the top performing system on Wikipedia and within 0.03 on the Newswire dataset. This relatively narrow gap suggests that unsupervised semantic textual similarity is a viable option for languages with limited resources
- Systems worth studying (and papers worth reading)
Thursday, June 4 2015
- I had a burrito for breakfast in front of the hotel; I didn't know there was this nice continental breakfast for the conference participants!
- I said hi to Greg Grefenstette (INRIA) and Mariana Apidianaki (LIMSI)… Vive la France!
- I had a cheesesteak for lunch (very fast, lots of visitors at the poster)
- I had a chat with Eneko Agirre (he sends greetings to Davide) and Daniel Cer from Google (I still have to ask him about querying for the immigrants project)
Marco Baroni's keynote on distributional semantic models
- He starts with some words about Adam Kilgarriff:
- “He was totally allergic to bullshit”
- “He wrote a great paper in 1997: I don't believe in word senses”
- “He never got a paper accepted in the main ACL conference (which is a scandal, given his contribution to the field)”
- “Recently we were talking about all these young people doing deep learning: they don't make any distinction between language and vision”
- On the multimodal skip-gram model
- Inspired by language learning by children
- Using the Frank corpus
- Training the model for standard distributional semantics with 20K words extracted from a baby language learning corpus… they try to predict the objects the babies are learning (hat, ring) with a skip-gram model. The corpus is composed of words and object images (I don't totally understand it). The task is called Matching words with objects
- “Look at the kitty! Look at the oink!”
- The model tries to predict the right cute animal image from the word kitty
- Concept learning, word learning, synonym learning inspired by the human cognition process
- Concentration lost
- I lost concentration writing this chronicle while Marco speaks… I just pick up some isolated words, like…
- How MMSkipGram visualizes new concepts
- As far as I understand, he's trying to train an MMSkipGram model to learn unknown words and associate them with images, inspired by the baby language learning process… looks interesting, but…
- Models of language acquisition
SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval001.pdf
- A task worth participating in… two subtasks: paraphrase identification (binary) and semantic similarity between tweets. In contrast to STS, there is much more spelling variation and much more street language. Very interesting indeed, especially if we are thinking about processing @menosdias tweets.
MITRE: Seven Systems for Semantic Similarity in Tweets
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval002.pdf
- 352 features combined with logistic regression
- a lot of metrics: machine translation, biological metrics, etc.!
- word and phrase embeddings (unless you're living under a rock you must have heard about word and phrase embeddings!)
- Tweet alignments with embeddings (each atom to a corresponding atom on the other side)
- we compute cosine similarity between vectors to score a candidate aligned pair (see the sketch after this list)
- Linear programming
- Recurrent neural networks (RNN) had the best score of all their systems
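- A minimal sketch of that cosine scoring step, assuming embeddings sit in a plain dict from token to vector (names are hypothetical; this is not MITRE's actual code):

    import numpy as np

    def cosine_similarity(u, v):
        # cosine similarity between two embedding vectors
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def score_candidate_pair(embeddings, w1, w2):
        # score a candidate aligned pair: w1 from one tweet, w2 from the other
        return cosine_similarity(embeddings[w1], embeddings[w2])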
Poster session
- 4 visitors in half an hour (Greg Grefenstette included)… not bad
- Most of the questions were related to why random forest worked better and how we explain the score we give to a sentence pair…
- Next year we should also participate in Interpretable STS, in order to explain why we assign a score (and to improve our alignment algorithms)
- Lots of visitors at the afternoon session. Most of the interest concentrated on the geographical similarity feature and on the question: why does random forest work better?
- Word embeddings are the new buzzword
ExB Themis: Extensive Feature Extraction from Word Alignments for Semantic Textual Similarity
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval046.pdf
- They were the best in Spanish and one of the best in English
- Cherry-picking the best approaches
- Alignment, word embeddings, SVR
- Preprocessing
- Tokenization, case correction, unsupervised POS tagging, lemmatization, detection of dataset-specific stop words, identification of measurement & temporal expressions, state-of-the-art NER (Hänig et al 2014, winner of GermEval-2014)
- Non-alignment features
- Character n-grams, path-length similarity, number overlaps, word n-gram similarity, sentence length, average word length
- Alignment features
- Direction-dependent m:n alignments of types EQUI, OPPO, SPE, SIMI, REL, NOALI
- Align in strict order: NEs, normalized temporal expressions, normalized measurements, arbitrary token n-grams 1-5, negations, remaining content words
- Proportion features for EQUI, OPPO, SPE, REL
- Binned frequency features for OPPO, SPE, REL, NOALI
- Han et al 2013 align-and-penalize features “good alignment vs bad alignment”
- A robust system across all corpora: 2nd in English, 1st in Spanish with a huge gap over the next performing system
- SVR using 40 alignment features and 51 non-alignment features
- Questions:
- Eneko asked if there were ablation tests to estimate which features were better: there weren't.
- “If two sentences have similar NE they should be aligned first”
SemEval-2015 Task 3: Answer Selection in Community Question Answering
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval047.pdf
- To guess the right answer to a question from several options. Two corpora: Qatar expats (English) and religious questions (Arabic)
VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval048.pdf
- UIMA pipeline!
SemEval-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval049.pdf
- BabelNet; tokenized, POS-tagged documents in four languages
- Example: the concept of medicine as a drug (varying specificity according to the source: Wikipedia, WordNet, etc.)
- Very interesting dataset in English, Spanish and Italian with lots of ambiguous terms
- Resources used by the participants:
- DBpedia Spotlight, Wikipedia Miner, evolutionary game theory using a non-cooperative multiplayer game setting, TagMe, EL services, BabelNet
- optimizing multiple objective functions, document monosemy plus personalized PageRank
- The winning approach: content words tagged by exploiting their translations in other languages
- The winning approach comes from the French lab LIMSI, in particular from the charming Marianita Apidianaki
LIMSI: Translations as Source of Indirect Supervision for Multilingual All-Words Sense Disambiguation and Entity Linking
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval050.pdf
- The LIMSI system exploits the parallelism of the multilingual test data
- assumption of sense correspondence between a word and its translation in context (Diab and Resnik, 2002)
- sentence- and word (lemma)-level alignments (Hunalign, GIZA++)
- keep the Spanish translation for English words, the English translation for Spanish and Italian words
- sense selection for a word w in context (sketch after these steps):
- the synsets of w in BabelNet are found
- the synset set is filtered to keep only synsets that contain both w and its aligned translation t in this context
- if there's more than one sense left, the synsets are ranked using the default sense comparator in the BabelNet API and the highest-ranked synset is kept
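- A hedged sketch of that selection step as I understood it; get_synsets and default_sense_rank are hypothetical placeholders, not the real BabelNet API:

    def select_sense(w, t, get_synsets, default_sense_rank):
        # all BabelNet synsets of w
        synsets = get_synsets(w)
        # keep only synsets containing both w and its aligned translation t
        filtered = [s for s in synsets if w in s and t in s]
        if not filtered:
            # assumption: fall back to ranking over all synsets (BFS)
            filtered = synsets
        # rank with the default sense comparator; keep the highest-ranked
        return min(filtered, key=default_sense_rank, default=None)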
- BFS also picks up wrong senses…
- The LIMSI system needs no training; it relies only on alignment and sense ranking
- weaker performance for Spanish and Italian due to the problematic sense ranking in these languages (performed by BabelNet)
- when multiple senses are retained after filtering by alignment
- BFS is needed
- alignment-based filtering remains beneficial as the translation might occur in only one synset
- BFS = BabelNet First Sense
- BFS predictions are often wrong, especially in Spanish and Italian
- Perspectives: experiment with alignments provided by MT systems, train a WSD system on data annotated by the alignment-based method
- check out the METEOR-WSD and RATATOUILLE metrics from the WMT shared task
SemEval-2015 Task 14: Analysis of Clinical Text
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval051.pdf
- Corpus: 100 annotated notes (109K words)
- 400K unannotated notes
- Annotations: subject, course, severity, generic, body location
- Task 1: identify the disorder span + CUI (concept unique identifier) normalization
- Task 2: disorder slot filling
- # 2a: gold-standard disorder spans are provided
- # 2b: no gold standard, just raw text
- The hardest part: entity linking (body part identification and CUI)
- Approaches:
- # CRF-based span recognition, bag of words, bigrams, POS, chunks, dependencies, specialized lexicons, trigger terms, distance to disorder spans, dependency parse information
UTH-CCB: The Participation of the SemEval 2015 Challenge – Task 14
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval052.pdf
- Disorder Entity recognition
- Vector space model based, word embeddings (MIMIC II corpus), CRF, SSVM and MetaMap
- Disorder slot filling
- SVM, n-gram features, lexicon features, dependency relation features
SemEval-2015 Task 15: A CPA dictionary-entry-building task
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval053.pdf
- CPA: Corpus pattern analysis.
- corpus driven technique for mapping meaning onto words in text
- tools and resources to identify and represent unambiguously the main semantic patterns in which words are used
- Sense Discriminative Patterns
- CPA Parsing, CPA Clustering, CPA lexicography
- Input: plain text with the target verb highlighted; output: specific to subtask, similar to the PDEV (Pattern Dictionary of English Verbs) entries
- Corpus
- MICROCHECK (29 verbs, 378 patterns, 4529 annotated sentences)
- WINGSPREAD (93 verbs, 856 patterns, 12440 annotated sentences: ~10K training, ~400 testing)
- ACL-2015 tutorial: Patterns for semantic processing
BLCUNLP: Corpus Pattern Analysis for Verbs Based on Dependency Chain
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval054.pdf
- Nothing to report
SemEval-2015 Task 9: CLIPEval Implicit Polarity of Events
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval077.pdf
- Nothing to report
SemEval-2015 Task 10: Sentiment Analysis in Twitter
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval078.pdf
- Subtasks:
- Phrase level sentiment
- Message level sentiment
- Topic level sentiment
[…]
- I had to go to the toilet, so I missed most of the presentation, which looked very good: Trento was the winner in subtask A (phrases) using a deep convolutional NN with additional input for phrases; the second used message level sentiment + character n-grams, the third used model iteration (tetrai)
- For subtask B (message polarity, the most popular task of SemEval) the winner put together four top-performing classifiers from previous editions of the task; the second used a deep convolutional NN and the third used logistic regression with special weighting for positives and negatives
- For subtask C (topic extraction) both systems reuse their subtask B system
- General resources: tokenization, stemming, lemmatization, stopword removal, POS tagging
- Twitter-specific resources: (I missed them)
- Classifiers: SVM, MaxEnt, Naive Bayes, …
- Deep learning, embeddings (unitn, INESC-ID)
- integration of ensemble methods (Webis)
Subtask 10 E: strength of prior polarity of terms
- input: a list of terms; output: the same list of terms with a polarity score.
- MaxDiff method of annotation: which term is the most positive and which is the least positive? (see the sketch below)
- e.g. rotten is less positive than …; #happiness is more positive than …
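- A sketch of the usual way MaxDiff trials become real-valued scores (my assumption about the counting procedure, not the organizers' code): each term scores the fraction of trials where it was picked most positive minus the fraction where it was picked least positive.

    from collections import Counter

    def maxdiff_scores(trials):
        # trials: iterable of (terms_shown, best_term, worst_term)
        best, worst, shown = Counter(), Counter(), Counter()
        for terms, b, w in trials:
            shown.update(terms)   # every term displayed in this trial
            best[b] += 1          # chosen as most positive
            worst[w] += 1         # chosen as least positive
        return {t: (best[t] - worst[t]) / shown[t] for t in shown}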
UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval079.pdf
- Very interesting and didactic paper on applying deep learning to NLP
- The key to success is the initialization of the NN
- Deep Learning models in NLP
- model words as vectors
- learn compositional rules to represent sentences
- ConvNet architecture
- sentence matrix, word embeddings, phrase indicator features, convolutional feature map, pooled representation, softmax
- For the message classification task you need to add more features for each word
- Models for twitter sentiment analysis
- SVM with various n-gram, char-gram, lexicon features
- State of the art model (NRC) in Semeval 13 and 14
- Deep learning models have shown excellent results on many NLP sentence classification tasks but so far failed to beat carefully engineered methods
- Three-step pre-training process
- Pre-train word vectors using an unsupervised language model (word2vec) on 50M tweets
- Train the network on a large distantly supervised corpus of 10M tweets
- Fine tune the network on the supervised dataset (about 10k tweets)
- Major novelty: initializing the network with pre-trained weights
- ConvNet params (see the sketch after this list)
- wide convolution, max-pooling, filter width 5
- word embeddings dimensionality 100
- number of feature maps 300
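- A plausible reconstruction of such a convnet with the parameters noted above (PyTorch; a sketch, not the authors' code):

    import torch
    import torch.nn as nn

    class TweetConvNet(nn.Module):
        def __init__(self, vocab_size, n_classes=3,
                     emb_dim=100, n_maps=300, width=5):
            super().__init__()
            # embeddings initialized from word2vec, then fine-tuned
            self.emb = nn.Embedding(vocab_size, emb_dim)
            # wide convolution: pad so partial windows at the edges are seen
            self.conv = nn.Conv1d(emb_dim, n_maps,
                                  kernel_size=width, padding=width - 1)
            self.out = nn.Linear(n_maps, n_classes)

        def forward(self, token_ids):                 # (batch, seq_len)
            x = self.emb(token_ids).transpose(1, 2)   # (batch, emb, seq)
            x = torch.relu(self.conv(x))              # convolutional feature map
            x = x.max(dim=2).values                   # max-pooled representation
            return self.out(x)                        # logits; softmax in the loss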
- Importance of pre-training: Three different experiments: random, unsupervised, distant
- careful weight initialization is the key
SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval080.pdf
- Use CrowdFlower for training/annotation
- 8000 training tweets
- 4000 testing tweets
- output categories: sarcasm, irony, metaphor and others
CLaC-SentiPipe: SemEval2015 Subtasks 10 B,E, and Task 11
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval081.pdf
- Negation and modality
- They created a resource for irony called Gezi (and they used a resource called NRC)
- 95 features!
- primary features: polarity class, lexical resource, linguistic context
- secondary features: emoticons, highest and lowest sentiment scores, POS counts, named entities
- no bag of words but context aware polarity classes
SemEval-2015 Task 12: Aspect Based Sentiment Analysis
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval082.pdf
- restaurant and laptops
- intended to capture contradictory polarity sentences
NLANGP: Supervised Machine Learning System for Aspect Category Classification and Opinion Target Extraction
Bloody looooong (but very interesting) research day! #ContradictoryPolarity
Friday, June 5
- Tornado warning yesterday evening!
- I talked with Daniel Cer from Google, who helped me get (at least) 100 Google queries per day for unoporuno with a tweak to the Google API dedicated to websites
- Nice talk with Georgeta Bordea, who built the Saffron expert finding system; she would be glad to collaborate or do things related to our highly qualified immigration project
- Had lunch with Greg Grefenstette: invited him to work on the Quijote project: he's already counting Don Quixote's words!
SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering (or Newsreader project)
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval132.pdf
- TempEval-3 corpus and evaluation methodology proposed by UzZaman (2011)
- Relations represented as timegraph
- Interesting task!
- 4 teams participating with 13 unique runs
- Three corpora: Airbus, GM, stock market. Two tracks and two subtracks
- First task focusing on cross-document ordering of events
- If we're thinking about AGESS we should participate in this task
SPINOZA_VU: An NLP Pipeline for Cross Document TimeLines
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval133.pdf
- based on the NewsReader pipeline system (Eneko Agirre)
- subtask first addressed at document level and then aggregated at corpus level
- entity driven instead of event-driven
- time-lines are obtained in the post-processing
- NER: CoNLL; NED (named entity disambiguation): DBpedia Spotlight
- Entity coreference: Stanford Multi-Sieve Pass; event coreference: ?
- Event detection, timex detection and normalization, TLINK detection and classification
- System trained on TempEval-2 corpus
- Timex Detection and normalization: TimePro system
- TLINK: TimeProRel system
- TimeLine aggregation module:
- Very low F values… the most difficult was time ordering (low recall for the temporal relations available)… “we were missing temporal relations for anchoring”
SemEval-2015 Task 5: QA TempEval - Evaluating Temporal Information Understanding with Question Answering
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval134.pdf
- QA TempEval: TimeML was originally developed to support research in complex temporal QA. TempEval mainly focused on more straightforward temporal information extraction. QA TempEval focuses on an end-user QA task, as opposed to earlier corpus-based evaluation.
- This evaluation is about the accuracy for answering targeted questions
- It's easier to evaluate
- Task description: plain documents with DCT (document creation time; TempEval-3 format). The plain documents are fed into participating systems, which annotate timexes, events and temporal relations: the output is TimeML-annotated documents.
- Test dataset creation: question sets and key documents.
- example question: is event21 after event19? (the answers order the events). The questions are yes/no temporal questions regarding any of the 13 Allen interval relations holding between two designated temporal entities.
- Corpus: WikiNews, WSJ, NYT, Wikipedia articles, informal blog posts
- each system's annotation represents its temporal knowledge of the documents
- the annotation of each system is fed into a temporal QA system (UzZaman et al 2012) that answers questions on behalf of the systems
- Given a system's TimeML annotated documents, the TimeML QA process consists of three main steps: (lost)
- Participants: rule-based timex module, SVM (liblinear) for event and relation detection and classification, a separate SVM for event detection
- very low recall on the results
- main finding: using event co-reference may help
- systems are still far from deeply understanding the temporal aspects of NL (recall: 30%)
HLT-FBK: a Complete Temporal Processing System for QA TempEval
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval135.pdf
- ML-based (SVM in YamCha)
- Training: TimeBank and AQUAINT data from the TempEval-3 task
- NewsReader pipeline: tokenization, POS, constituency parser, dependency parser, named entity recognition, SRL
- Timex identification: classification of all tokens into 9 classes (B-DATE, I-DATE, B-TIME… etc)
- Timex normalization: time expression normalizer for English: timenorm (Bethard, 2013)
- Two classifiers: event detection and event classification
- Features: lemma, pos, chunk, entity type (NE or Timex), verb tense and polarity, etc.
- All predicates identified by the SRL (semantic role labelling)
- System described in Paramita Mirza and Sara Tonelli. 2014. Classifying Temporal Relations with Simple Features.
SemEval-2015 Task 6: Clinical TempEval
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval136.pdf
- Clinical event identification (April 23: the patient did not have any postoperative bleeding)
- Detection of events in relation to the time when the document was written (narrative container relation).
- Annotated with the THYME extension of ISO-TimeML
- Corpus: ~300 documents; ~40000 events (see the illustration after this list)
- event/time spans: begin, end
- event/time attributes: begin, end, value
- document time relations: begin, end, relation
- narrative container relations: begin1, end1, begin2, end2
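- A hypothetical illustration of an event annotation for the example above (field names are my guesses, not the THYME schema verbatim):

    # "April 23: the patient did not have any postoperative bleeding"
    event = {
        "span": (39, 61),               # begin/end character offsets (illustrative)
        "text": "postoperative bleeding",
        "doc_time_rel": "BEFORE",       # relation to the document creation time
        "polarity": "NEG",              # the bleeding did NOT happen
    }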
- ML systems had better recall, rule-based systems had better precision (accuracy)
BluLab: Temporal Information Extraction for the 2015 Clinical TempEval Challenge
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval137.pdf
- Tools: PyConText and Moonstone
- initiate work on end-to-end temporal reasoning
- approach: UIMA/ClearTK (liblinear): BIO-representations
- cTAKES; pyConText
- Features: lexical, section, HeidelTime lexicon
- CRF++, cTAKES, lexical, semantic type, context window
SemEval 2015, Task 7: Diachronic Text Evaluation
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval147.pdf
- interesting task: temporally dating text snippets according to their style
- linear models to extend pairwise decision
- linking to Wikipedia and Google n-gram
- stylistic classification problem
- a crawler to crawl text snippets
- interesting for AGESS
UCD : Diachronic Text Classification with Character, Word, and Syntactic N-grams
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval148.pdf
- stylometric text classification
- word epoch disambiguation (Mihalcea and Nastase, 2012)
- temporal text ranking (Niculae et al, 2014): temporal text ranking and automatic dating of texts
- identifying period-specific language
- direct lookup
- focus on language style
- treat it as a multiclass classification (Weka SMO 1-vs-1 polynomial)
- label each text using non overlapping year ranges
- CPWS features
- character n-grams (worked well)
- Google syntactic n-grams
- Naive Bayes estimates p(y|w) for each year (the probability of a year given a word)
- Multiclass classification seems to work better (see the sketch after this list)
- character n-grams are highly effective features for diachronic classification (but not very satisfying)
- the prior distribution over date labels has a significant domain-specific effect
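- A sketch of that character n-gram multiclass setup with scikit-learn in place of Weka's SMO (so one-vs-rest instead of 1-vs-1; the data here are placeholders):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["The motor car arrived...", "The app crashed..."]  # snippets
    epochs = ["1900-1924", "2000-2024"]   # non-overlapping year-range labels
    clf = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),  # char n-grams
        LinearSVC(),                      # linear SVM, one-vs-rest multiclass
    )
    clf.fit(texts, epochs)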
SemEval-2015 Task 8: SpaceEval
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval149.pdf
- Question answering about location of objects, events
- Text to scene conversion/visualization
- Generating textual description of images
- Navigational instructions to a robot
- Adopts ISO-Space encoding for spatial information (and the ISOspace metamodel)
- qualitative spatial link: RCC8 relations for the topological relations between elements (QLink)
- QLinks: qualitative spatial links
- Uses SpatialML relation types based on RCC8
- Example: the book is on the table
- spatialsignal(s1, cluster=“on-1”, semantictype=topological, directional)
- qslink(qsl1, trajector=se1, landmark=se2, signal=s1, relType=EC)
- Very few participants on this task
- Standard machine learning models
- Lexical, syntactic, open-source features
SpRL-CWW: Spatial Relation Classification with Independent Multi-class Models
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval150.pdf
- Spatial role labeling
- I don't understand anything
- Sequential labeler, generate candidate relation tuples, multi-class classifiers
- spatial elements and signal
- the ball is in the backyard of the house: detect signals (in) with a lemmatizer, POS tagger, etc.
- Spatial element: the ball; spatial signal: in; place: the backyard; spatial signal: of; spatial element: the house
- classify candidate spatial relations and label arguments with multi-class classifiers for each relation type
- dependency path to spatial signal, lemma, pos, direction from spatial signal.
- best features: raw string in a 5 word window, 300-dimension GloVe word vector; POS bigrams for a 5-word window (best feature)
SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval151.pdf
- Taxonomy extraction: given a list of domain-specific terms, structure them into a taxonomy
- Subtasks: term extraction, relation discovery, taxonomy construction
- Domains: chemical, equipment, food, science.
- combined gold standards:
- Wikipedia Bitaxonomy (WiBi)
- The Google product taxonomy (food)
- material handling equipment (equipment)
- taxonomy of fields and their subfields (science)
- baselines: all the nodes connected to the root concept, string inclusion (science and network science)
- structural evaluation: presence of cycles and intermediate nodes
- Evaluation: cumulative Fowlkes & Mallows (a measure for comparing clusterings; see the snippet below)
- Generalised F&M and cumulative F&M
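- The plain (non-cumulative) Fowlkes & Mallows index is available in scikit-learn; shown here only to recall the measure, FM = TP / sqrt((TP + FP) * (TP + FN)) over pairs of items:

    from sklearn.metrics import fowlkes_mallows_score

    gold = [0, 0, 1, 1, 2]   # cluster labels under the gold taxonomy cut
    pred = [0, 0, 1, 2, 2]   # cluster labels under the predicted cut
    print(fowlkes_mallows_score(gold, pred))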
- The task didn't provide the corpus, just the terms: each participant had to find their own corpus
- the baseline is closer to the base system
- Taxonomy visualisation
- Relation discovery: lexico-syntactic patterns have high precision but low recall
- co-occurrence-based approaches improve results
- taxonomy construction: approaches are less known or difficult to reimplement
- no corpus was provided and participants had no gold standard
INRIASAC: Simple Hypernym Extraction Methods
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval152.pdf
- terms: one to nine words
- substring inclusion: bicycle helmet < helmet (suffix)
- licorice < rice (error)
- fruit 'n fibre < fruit (error: it's a cereal)
- main intuition: hypernyms and hyponyms often occur together
- hypernyms are more common than hyponyms
- strategy: co-occurrence statistics and term frequencies in a collection of documents
- Wikipedia (only text; no categories, redirects, titles)
- sentencized: 125 million sentences
- counts of term cooccurrence in the same sentence (document frequency of terms)
- method: consider all domain terms B co-occurring with A in the same Wikipedia sentences, eliminate any candidate B that appears in fewer documents than A, retain N=3 (see the sketch below)
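- My reconstruction of that heuristic as a sketch (not INRIASAC's code); cooccur and doc_freq are assumed precomputed from the sentencized Wikipedia:

    def hypernym_candidates(a, cooccur, doc_freq, n=3):
        # cooccur[a]: dict of domain terms co-occurring with a, with counts
        # doc_freq[t]: number of sentences/documents containing term t
        cands = [b for b in cooccur.get(a, {})
                 if doc_freq[b] > doc_freq[a]]   # hypernyms are more common
        # prefer the candidates that co-occur with a most often
        cands.sort(key=lambda b: cooccur[a][b], reverse=True)
        return cands[:n]                         # retain N = 3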
SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing
SemEval-2016 Task Announcements and closing session
STS
- Compute the degree of semantic similarity between paired sentences (as usual)
- Annotated data
- English 14250 pairs (2012-2015)
- Spanish 1620 pairs (2014,2015)
- Evaluation: Pearson correlation with the mean of human scores (see the snippet after this list)
- Applications: deep QA, distillation, generation, machine reading, MT, plagiarism detection, paraphrasing, textual inference, summarization and many more
- Datasets 2016
- Plagiarism detection
- QA question-question
- Post-edited MT
- Q&A Answer-answer
- Headlines
- Data selection will target weakness of existing techniques
- Pilot task: crosslingual STS
- John said he is considered a witness but not a suspect
- “Él ya no es un sospechoso” (“He is no longer a suspect”), John said
- semeval@googlegroups.com
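- The evaluation metric in one line (scipy shown here; the official script may differ):

    from scipy.stats import pearsonr

    system = [4.2, 1.0, 3.5]   # system similarity scores per pair
    gold = [4.5, 0.8, 3.0]     # mean of the human scores per pair
    r, _ = pearsonr(system, gold)
    print(f"Pearson r = {r:.3f}")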
Interpretable STS
- Full task on its own
- Student grading scenario
- Reference: 12 killed in bus accident in Pakistan
- Student: 10 killed in road accident in NW Pakistan
- Grade 3.2 out of 5
- Explanation:
- They are quite similar but Pakistan is more general than NW Pakistan and in bus accident is more specific than in road accident
- Given a pair of sentences
- chunk the sentences / gold chunks provided
- systems align chunks across both sentences
- score similarity for each chunk pair
- classify the type of relation: EQUI, OPPO, SPE, SIMI, REL, FACT, OPI (an illustrative record follows)
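- A hypothetical illustration of one chunk alignment from the grading example above (field names and exact format are my assumption):

    alignment = {
        "chunk1": "in bus accident",
        "chunk2": "in road accident",
        "relation": "SPE",   # chunk1 is more specific than chunk2
        "score": 4,          # chunk-level similarity on the 0-5 scale
    }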
- Annotation guidelines public, high-quality annotation (75 F1), high participation
- Novelties: allow for N:M alignments
- New test data: same datasets and education-related dataset
- Question: e.agirre@ehu.es
Community QA
- Given a question Q, find a good answer A from a collection of CQA threads
- Given a question, find a similar question
- English and Arabic tasks
Sentiment Analysis in Twitter
- tweet level polarity
- topic level polarity (pos/neg/neu, 5 stars)
- topic trend detection (pos/neg/neu, 5 stars)
Aspect-based sentiment analysis (ABSA)
- contradictory polarity on restaurants, laptops, hotels and consumer electronics
- English and Chinese
Detecting stance in tweets
- input: a target and a tweet pair
- output: determine whether the author of the text is in favor of, against or neutral towards the target
- two subtasks: labeled training data and no labeled training data
Sentiment intensity of English and Arabic
- Nothing to report
Meaning representation parsing
- Input: the soldier was not afraid of dying
- Output: (f / fear-01 :arg0 (s / soldier) :arg2 (d / die-01 :arg0 s) :polarity -)
- resources: 15000 training pairs (LDC/DEFT), tokenizer, aligner (Pourdamghani et al 14), AMR manipulation library, baseline parser (Flanigan et al 14), scorer (Cai & Knight '13)
Chinese semantic dependency parsing
Nothing to report
Semantic Analysis Track
- Detection of minimal semantic units and their meanings (DimSUM)
- Lexical semantic tasks
- units (single/multiword expressions) and classes (nouns + verbs)
- I googled restaurants in the area and Fuji Sushi came up and reviews were great so I made a carry out order.
- I googled restaurants in the area and Fuji_Sushi came_up … carry_out
- googled (V:COMMUNICATION) restaurants (N:GROUP) area (N:LOCATION)
- V: BODY, CHANGE, COGNITION, COMPETITION, COMMUNICATION, CONSUMPTION
- Tag the English sentence for MWEs and supersenses
- Domains: online reviews and tweets (Copenhagen supersense dataset, Johannsen et al 2014)
Complex Word Identification
- The cat perched on the mat
- complex: perched
- simple: cat, mat
- Format: 2247 training instances, 88000 testing instances
- Corpus comes from 400 non-native speakers of English, 42 distinct native languages; ~4000 complex words were found
Clinical TempEval
- Extract timelines from text with events and temporal relations
Taxonomy extraction
- Task definition: organise domain-specific terms in a taxonomy
- Datasets: chemicals, equipment, food, science
- gold standards: WordNet, Wikipedia, online taxonomies
- multilingual settings: English, French, Italian and Dutch
- evaluation: structural evaluation
- comparison against gold standards
- the corpus will be provided to participants this year.
Semantic taxonomy enrichment
- task objective: given a word and a gloss, identify the WordNet synset that is its synonym or hyponym
- task oriented to words missing in WordNet
Closing session
- all future task proposals will be peer-reviewed
- organized 14 tasks in 5 tracks
- new paper reviewing guidelines
- improve replicability (possibly introducing the chance of paper rejection)
- the semeval experience
- grading the reviews!!!
- release initial versions of submitted papers, plus anonymized reviews and ratings, after SemEval as a corpus for analysis!!!
- what makes a good review?
- SemEval 2017 task: predicting reviewing quality (???)
Ideas
- Use Sultan to align @menosdias tweets
- While drinking coffee with the charming Houda Bouamor (from Carnegie Mellon Qatar) we had the idea of a multilingual summarizer (French: LIPN, Spanish: IIMAS, Arabic: Qatar) based on alignment and moderate generation (LORIA). She said she could get some funding for a one-year project from Qatar.
- Suddenly I think that @menosdías might make for a SemEval task
- the more I think about it, the more I tell myself: let's do chunking and alignment on @menosdías using pure semantic similarity and we get a paper out of it (and entity extraction, at least from the tweets)
- thinking about AGESS (Automatic Generation of States of the Art)… I think we should put a PhD student to work on time (and have them participate in SemEval's temporal tasks)
- Using coreference significantly reduces the number of unknown relations
- SpaceEval might be useful for the GolemGenFred project
- Orientation Link (OLink) describes non-topological relationships between spatial elements (the chair is in front of the couch)
- Data sources:
- Degree Confluence Project (travelers web log) DCP
- CLEF
- SE (spatial entities), SS (spatial signals), MI (motion signals)
- Three configurations: un-annotated text, manually annotated spatial elements, manually annotated spatial elements with attributes
- word2vec… and word embeddings
- Can Selectional Preferences Help Automatic Semantic Role Labeling? Shumin Wu and Martha Palmer (interesting poster, semantic role labelling with LDA)