Notes on the SemEval 2015 conference
Wednesday, June 3 2015
- On the plane I read Eneko's paper on the STS task. I noted the following quotations…
- The top 10 systems [English task] did not show statistically significant variation among them.
- Aligning words between sentences has been the most popular approach for the top three participants (DLS@CU, ExBThemis, Samsung). They use WordNet (Miller, 1995), Mikolov Embeddings (Mikolov et al., 2013; Baroni et al., 2014) and PPDB (Ganitkevitch et al., 2013).
- Most teams add a machine learning algorithm to learn the output scores, but note that the Samsung team did not use one in their best run.
- Only about one fifth of the systems were unsupervised, among which the top performing system, UMDuluth-BlueTeam-run1, was able to come within 0.1 correlation points of the top performing system on Wikipedia and within 0.03 on the Newswire dataset. This relatively narrow gap suggests that unsupervised semantic textual similarity is a viable option for languages with limited resources
- Systems worth studying (and papers worth reading)
Thursday, June 4 2015
- I had a burrito for breakfast in front of the hotel; I didn't know there was this nice continental breakfast for the conference participants!
- I said hi to Greg Grefenstette (INRIA) and Mariana Apidianaki (LIMSI)… Vive la France!
- I had a cheesesteak for lunch (very fast, lots of visitors at the poster)
- I had a chat with Eneko Agirre (he sends greetings to Davide) and Daniel Cer from Google (I still have to ask him about querying for the immigrants project)
Marco Baroni's keynote on distributional semantic models
- He starts with some words about Adam Kilgarriff:
- “He was totally allergic to bullshit”
- “He wrote a great paper in 1997: I don't believe in word senses”
- “He never got a paper accepted in the main ACL conference (which is a scandal, given his contribution to the field)”
- “Recently we were talking about all these young people doing deep learning: they don't make any distinction between language and vision”
- On the multimodal skip-gram model
- Inspired by language learning by children
- Using the Frank corpus
- Training the model for standard distributional semantics with 20K words extracted from a baby language learning corpus… they try to predict the objects the babies are learning (hat, ring) with a skip-gram model. The corpus is composed of words and object images (I don't totally understand it). The task is called Matching words with objects
- “Look at the kitty! Look at the oink!”
- The model tries to predict the right cute animal image from the word kitty
- Concept learning, word learning, synonym learning inspired by the human cognition process
- Concentration lost
- I lost concentration writing this chronicle while Marco speaks… I just pick up some isolated words, like…
- How MMSkipGram visualizes new concepts
- As far as I understand, he's trying to train an MMSkipGram model to learn unknown words and associate them with images, inspired by the baby language learning process… looks interesting, but…
- Models of language acquisition
SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval001.pdf
- A task worth participating in… two subtasks: paraphrase identification (binary) and semantic similarity between tweets. In contrast to STS, there is much more spelling variation and much more street language. Very interesting indeed, especially if we are thinking about processing @menosdias tweets.
MITRE: Seven Systems for Semantic Similarity in Tweets
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval002.pdf
- 352 features combined with logistic regression
- a lot of metrics: machine translation, biological metrics, etc.!
- word and phrase embeddings (unless you're living under a rock you must have heard about word and phrase embeddings!)
- Tweet alignments with embeddings (each atom to a corresponding atom on the other side)
- we compute cosine similarity between vectors to score a candidate aligned pair (see the sketch after this list)
- Linear programming
- Recurrent neural networks (RNN) had the best score of all their systems
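- A minimal sketch of that cosine scoring step, assuming embeddings sit in a plain dict from token to vector (names are hypothetical; this is not MITRE's actual code):

    import numpy as np

    def cosine_similarity(u, v):
        # cosine similarity between two embedding vectors
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def score_candidate_pair(embeddings, w1, w2):
        # score a candidate aligned pair: w1 from one tweet, w2 from the other
        return cosine_similarity(embeddings[w1], embeddings[w2])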
Poster session
- 4 visitors in half an hour (Greg Grefenstette included)… not bad
- Most of the questions were related to why random forest worked better and how we explain the score we give to a sentence pair…
- Next year we should also participate in Interpretable STS, in order to explain why we assign a score (and to improve our alignment algorithms)
- Lots of visitors at the afternoon session. Most of the interest concentrated on the geographical similarity feature and on the question: why does random forest work better?
- Word embeddings are the new buzzword
ExB Themis: Extensive Feature Extraction from Word Alignments for Semantic Textual Similarity
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval046.pdf
- They were the best in Spanish and one of the best in English
- Cherry-picking the best approaches
- Alignment, word embeddings, SVR
- Preprocessing
- Tokenization, case correction, unsupervised POS tagging, lemmatization, detection of dataset-specific stop words, identification of measurement & temporal expressions, state-of-the-art NER (Hänig et al 2014, winner of GermEval-2014)
- Non-alignment features
- Character n-grams, path-length similarity, number overlaps, word n-gram similarity, sentence length, average word length
- Alignment features
- Direction-dependent m:n alignments of types EQUI, OPPO, SPE, SIMI, REL, NOALI
- Align in strict order: NEs, normalized temporal expressions, normalized measurements, arbitrary token n-grams 1-5, negations, remaining content words
- Proportion features for EQUI, OPPO, SPE, REL
- Binned frequency features for OPPO, SPE, REL, NOALI
- Han et al 2013 align-and-penalize features “good alignment vs bad alignment”
- A robust system across all corpora: 2nd in English, 1st in Spanish with a huge gap over the next performing system
- SVR using 40 alignment features and 51 non-alignment features
- Questions:
- Eneko asked if there were ablation tests to estimate which features were better: there weren't.
- “If two sentences have similar NE they should be aligned first”
SemEval-2015 Task 3: Answer Selection in Community Question Answering
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval047.pdf
- To guess the right answer to a question from several options. Two corpora: Qatar expats (English) and religious questions (Arabic)
VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval048.pdf
- UIMA pipeline!
SemEval-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval049.pdf
- BabelNet; tokenized, POS-tagged documents in four languages
- Example: the concept of medicine as a drug (varying specificity according to the source: Wikipedia, WordNet, etc.)
- Very interesting dataset in English, Spanish and Italian with lots of ambiguous terms
- Resources used by the participants:
- DBpedia Spotlight, Wikipedia Miner, evolutionary game theory using a non-cooperative multiplayer game setting, TagMe, EL services, BabelNet
- optimizing multiple objective functions, document monosemy plus personalized PageRank
- The winning approach: content words tagged by exploiting their translations in other languages
- The winning approach comes from the French lab LIMSI, in particular from the charming Marianita Apidianaki
LIMSI: Translations as Source of Indirect Supervision for Multilingual All-Words Sense Disambiguation and Entity Linking
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval050.pdf
- The LIMSI system exploits the parallelism of the multilingual test data
- assumption of sense correspondence between a word and its translation in context (Diab and Resnik, 2002)
- sentence- and word (lemma)-level alignments (Hunalign, GIZA++)
- keep the Spanish translation for English words, the English translation for Spanish and Italian words
- sense selection for a word w in context (sketch after these steps):
- the synsets of w in BabelNet are found
- the synset set is filtered to keep only synsets that contain both w and its aligned translation t in this context
- if there's more than one sense left, the synsets are ranked using the default sense comparator in the BabelNet API and the highest-ranked synset is kept
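- A hedged sketch of that selection step as I understood it; get_synsets and default_sense_rank are hypothetical placeholders, not the real BabelNet API:

    def select_sense(w, t, get_synsets, default_sense_rank):
        # all BabelNet synsets of w
        synsets = get_synsets(w)
        # keep only synsets containing both w and its aligned translation t
        filtered = [s for s in synsets if w in s and t in s]
        if not filtered:
            # assumption: fall back to ranking over all synsets (BFS)
            filtered = synsets
        # rank with the default sense comparator; keep the highest-ranked
        return min(filtered, key=default_sense_rank, default=None)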
- BFS also picks up wrong senses…
- The LIMSI system needs no training; it relies only on alignment and sense ranking
- weaker performance for Spanish and Italian due to the problematic sense ranking in these languages (performed by BabelNet)
- when multiple senses are retained after filtering by alignment
- BFS is needed
- alignment-based filtering remains beneficial as the translation might occur in only one synset
- BFS = BabelNet First Sense
- BFS predictions are often wrong, especially in Spanish and Italian
- Perspectives: experiment with alignments provided by MT systems, train a WSD system on data annotated by the alignment-based method
- check out the METEOR-WSD and RATATOUILLE metrics from the WMT shared task
SemEval-2015 Task 14: Analysis of Clinical Text
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval051.pdf
- Corpus: 100 annotated notes (109K words)
- 400K unannotated notes
- Annotations: subject, course, severity, generic, body location
- Task 1: identify the disorder span + CUI (concept unique identifier) normalization
- Task 2: disorder slot filling
- # 2a: gold-standard disorder spans are provided
- # 2b: no gold standard, just raw text
- The hardest part: entity linking (body part identification and CUI)
- Approaches:
- # CRF-based span recognition, bag of words, bigrams, POS, chunks, dependencies, specialized lexicons, trigger terms, distance to disorder spans, dependency parse information
UTH-CCB: The Participation of the SemEval 2015 Challenge – Task 14
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval052.pdf
- Disorder Entity recognition
- Vector space model based, word embeddings (MIMIC II corpus), CRF, SSVM and MetaMap
- Disorder slot filling
- SVM, n-gram features, lexicon features, dependency relation features
SemEval-2015 Task 15: A CPA dictionary-entry-building task
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval053.pdf
- CPA: Corpus pattern analysis.
- corpus driven technique for mapping meaning onto words in text
- tools and resources to identify and represent unambiguously the main semantic patterns in which words are used
- Sense Discriminative Patterns
- CPA Parsing, CPA Clustering, CPA lexicography
- Input: plain text with the target verb highlighted; output: specific to subtask, similar to the PDEV (Pattern Dictionary of English Verbs) entries
- Corpus
- MICROCHECK (29 verbs, 378 patterns, 4529 annotated sentences)
- WINGSPREAD (93 verbs, 856 patterns, 12440 annotated sentences: ~10K training, ~400 testing)
- ACL-2015 tutorial: Patterns for semantic processing
BLCUNLP: Corpus Pattern Analysis for Verbs Based on Dependency Chain
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval054.pdf
- Nothing to report
SemEval-2015 Task 9: CLIPEval Implicit Polarity of Events
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval077.pdf
- Nothing to report
SemEval-2015 Task 10: Sentiment Analysis in Twitter
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval078.pdf
- Subtasks:
- Phrase level sentiment
- Message level sentiment
- Topic level sentiment
[…]
- I had to go to the toilet, so I missed most of the presentation, which looked very good: Trento was the winner in subtask A (phrases) using a deep convolutional NN with additional input for phrases; the second used message level sentiment + character n-grams, the third used model iteration (tetrai)
- For subtask B (message polarity, the most popular task of SemEval) the winner put together four top-performing classifiers from previous editions of the task; the second used a deep convolutional NN and the third used logistic regression with special weighting for positives and negatives
- For subtask C (topic extraction) both systems reuse their subtask B system
- General resources: tokenization, stemming, lemmatization, stopword removal, POS tagging
- Twitter-specific resources: (I missed them)
- Classifiers: SVM, MaxEnt, Naive Bayes, …
- Deep learning, embeddings (unitn, INESC-ID)
- integration of ensemble methods (Webis)
Subtask 10 E: strength of prior polarity of terms
- input: a list of terms; output: the same list of terms with a polarity score.
- MaxDiff method of annotation: which term is the most positive and which is the least positive? (see the sketch below)
- e.g. rotten is less positive than …; #happiness is more positive than …
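- A sketch of the usual way MaxDiff trials become real-valued scores (my assumption about the counting procedure, not the organizers' code): each term scores the fraction of trials where it was picked most positive minus the fraction where it was picked least positive.

    from collections import Counter

    def maxdiff_scores(trials):
        # trials: iterable of (terms_shown, best_term, worst_term)
        best, worst, shown = Counter(), Counter(), Counter()
        for terms, b, w in trials:
            shown.update(terms)   # every term displayed in this trial
            best[b] += 1          # chosen as most positive
            worst[w] += 1         # chosen as least positive
        return {t: (best[t] - worst[t]) / shown[t] for t in shown}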
UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval079.pdf
- Very interesting and didactic paper on applying deep learning to NLP
- The key to success is the initialization of the NN
- Deep Learning models in NLP
- model words as vectors
- learn compositional rules to represent sentences
- ConvNet architecture
- sentence matrix, word embeddings, phrase indicator features, convolutional feature map, pooled representation, softmax
- For the message classification task you need to add more features for each word
- Models for twitter sentiment analysis
- SVM with various n-gram, char-gram, lexicon features
- State of the art model (NRC) in Semeval 13 and 14
- Deep learning models have shown excellent results on many NLP sentence classification tasks but so far failed to beat carefully engineered methods
- Three-step pre-training process
- Pre-train word vectors using an unsupervised language model (word2vec) on 50M tweets
- Train the network on a large distantly supervised corpus of 10M tweets
- Fine tune the network on the supervised dataset (about 10k tweets)
- Major novelty: initializing the network with pre-trained weights
- ConvNet params (see the sketch after this list)
- wide convolution, max-pooling, filter width 5
- word embeddings dimensionality 100
- number of feature maps 300
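- A plausible reconstruction of such a convnet with the parameters noted above (PyTorch; a sketch, not the authors' code):

    import torch
    import torch.nn as nn

    class TweetConvNet(nn.Module):
        def __init__(self, vocab_size, n_classes=3,
                     emb_dim=100, n_maps=300, width=5):
            super().__init__()
            # embeddings initialized from word2vec, then fine-tuned
            self.emb = nn.Embedding(vocab_size, emb_dim)
            # wide convolution: pad so partial windows at the edges are seen
            self.conv = nn.Conv1d(emb_dim, n_maps,
                                  kernel_size=width, padding=width - 1)
            self.out = nn.Linear(n_maps, n_classes)

        def forward(self, token_ids):                 # (batch, seq_len)
            x = self.emb(token_ids).transpose(1, 2)   # (batch, emb, seq)
            x = torch.relu(self.conv(x))              # convolutional feature map
            x = x.max(dim=2).values                   # max-pooled representation
            return self.out(x)                        # logits; softmax in the loss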
- Importance of pre-training: Three different experiments: random, unsupervised, distant
- careful weight initialization is the key
SemEval-2015 Task 11: Sentiment Analysis of Figurative Language in Twitter
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval080.pdf
- Use CrowdFlower for training/annotation
- 8000 training tweets
- 4000 testing tweets
- output categories: sarcasm, irony, metaphor and others
CLaC-SentiPipe: SemEval2015 Subtasks 10 B,E, and Task 11
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval081.pdf
- Negation and modality
- They created a resource for irony called Gezi (and they used a resource called NRC)
- 95 features!
- primary features: polarity class, lexical resource, linguistic context
- secondary features: emoticons, highest and lowest sentiment scores, POS counts, named entities
- no bag of words but context aware polarity classes
SemEval-2015 Task 12: Aspect Based Sentiment Analysis
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval082.pdf
- restaurant and laptops
- intended to capture contradictory polarity sentences
NLANGP: Supervised Machine Learning System for Aspect Category Classification and Opinion Target Extraction
Bloody looooong (but very interesting) research day! #ContradictoryPolarity
Friday, June 5
- Tornado warning yesterday evening!
- I talked with Daniel Cer from Google, who helped me get (at least) 100 Google queries per day for unoporuno with a tweak to the Google API dedicated to websites
- Nice talk with Georgeta Bordea, who built the Saffron expert finding system; she would be glad to collaborate or do things related to our highly qualified immigration project
- Had lunch with Greg Grefenstette: invited him to work on the Quijote project: he's already counting Don Quixote's words!
SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering (or Newsreader project)
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval132.pdf
- TempEval-3 corpus and evaluation methodology proposed by UzZaman (2011)
- Relations represented as timegraph
- Interesting task!
- 4 teams participating with 13 unique runs
- Three corpora: Airbus, GM, stock market. Two tracks and two subtracks
- First task focusing on cross-document ordering of events
- If we're thinking about AGESS we should participate in this task
SPINOZA_VU: An NLP Pipeline for Cross Document TimeLines
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval133.pdf
- based on the NewsReader pipeline system (Eneko Agirre)
- subtask first addressed at document level and then aggregated at corpus level
- entity driven instead of event-driven
- time-lines are obtained in the post-processing
- NER: CoNLL; NED (named entity disambiguation): DBpedia Spotlight
- Entity coreference: Stanford Multi-Sieve Pass; event coreference: ?
- Event detection, timex detection and normalization, TLINK detection and classification
- System trained on TempEval-2 corpus
- Timex Detection and normalization: TimePro system
- TLINK: TimeProRel system
- TimeLine aggregation module:
- Very low F values… the most difficult was time ordering (low recall for the temporal relations available)… “we were missing temporal relations for anchoring”
SemEval-2015 Task 5: QA TempEval - Evaluating Temporal Information Understanding with Question Answering
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval134.pdf
- QA TempEval: TimeML was originally developed to support research in complex temporal QA. TempEval mainly focused on more straightforward temporal information extraction. QA TempEval focuses on an end-user QA task, as opposed to earlier corpus-based evaluation.
- This evaluation is about the accuracy for answering targeted questions
- It's easier to evaluate
- Task description: plain documents with DCT (document creation time; TempEval-3 format). The plain documents are fed into participating systems, which annotate timexes, events and temporal relations: the output is TimeML-annotated documents.
- Test dataset creation: question sets and key documents.
- example question: is event21 after event19? (the answers order the events). The questions are yes/no temporal questions regarding any of the 13 Allen interval relations holding between two designated temporal entities.
- Corpus: WikiNews, WSJ, NYT, Wikipedia articles, informal blog posts
- each system's annotation represents its temporal knowledge of the documents
- the annotation of each system is fed into a temporal QA system (UzZaman et al 2012) that answers questions on behalf of the systems
- Given a system's TimeML annotated documents, the TimeML QA process consists of three main steps: (lost)
- Participants: rule-based timex module, SVM (liblinear) for event and relation detection and classification, a separate SVM for event detection
- very low recall on the results
- main finding: using event co-reference may help
- systems are still far from deeply understanding the temporal aspects of NL (recall: 30%)
HLT-FBK: a Complete Temporal Processing System for QA TempEval
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval135.pdf
- ML-based (SVM in YamCha)
- Training: TimeBank and AQUAINT data from the TempEval-3 task
- NewsReader pipeline: tokenization, POS, constituency parser, dependency parser, named entity recognition, SRL
- Timex identification: classification of all tokens into 9 classes (B-DATE, I-DATE, B-TIME… etc)
- Timex normalization: time expression normalizer for English: timenorm (Bethard, 2013)
- Two classifiers: event detection and event classification
- Features: lemma, pos, chunk, entity type (NE or Timex), verb tense and polarity, etc.
- All predicates identified by the SRL (semantic role labelling)
- System described in Paramita Mirza and Sara Tonelli. 2014. Classifying Temporal Relations with Simple Features.
SemEval-2015 Task 6: Clinical TempEval
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval136.pdf
- Clinical event identification (April 23: the patient did not have any postoperative bleeding)
- Detection of events in relation to the time when the document was written (narrative container relation).
- Annotated with the THYME extension of ISO-TimeML
- Corpus: ~300 documents; ~40000 events (see the illustration after this list)
- event/time spans: begin, end
- event/time attributes: begin, end, value
- document time relations: begin, end, relation
- narrative container relations: begin1, end1, begin2, end2
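- A hypothetical illustration of an event annotation for the example above (field names are my guesses, not the THYME schema verbatim):

    # "April 23: the patient did not have any postoperative bleeding"
    event = {
        "span": (39, 61),               # begin/end character offsets (illustrative)
        "text": "postoperative bleeding",
        "doc_time_rel": "BEFORE",       # relation to the document creation time
        "polarity": "NEG",              # the bleeding did NOT happen
    }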
- ML systems had better recall, rule-based systems had better precision (accuracy)
BluLab: Temporal Information Extraction for the 2015 Clinical TempEval Challenge
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval137.pdf
- Tools: PyConText and Moonstone
- initiate work on end-to-end temporal reasoning
- approach: UIMA/ClearTK (liblinear): BIO-representations
- cTAKES; pyConText
- Features: lexical, section, HeidelTime lexicon
- CRF++, cTAKES, lexical, semantic type, context window
SemEval 2015, Task 7: Diachronic Text Evaluation
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval147.pdf
- interesting task: temporally dating text snippets according to their style
- linear models to extend pairwise decision
- linking to Wikipedia and Google n-gram
- stylistic classification problem
- a crawler to crawl text snippets
- interesting for AGESS
UCD : Diachronic Text Classification with Character, Word, and Syntactic N-grams
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval148.pdf
- stylometric text classification
- word epoch disambiguation (Mihalcea and Nastase, 2012)
- temporal text ranking (Niculae et al, 2014): temporal text ranking and automatic dating of texts
- identifying period-specific language
- direct lookup
- focus on language style
- treat it as a multiclass classification (Weka SMO 1-vs-1 polynomial)
- label each text using non overlapping year ranges
- CPWS features
- character n-grams (worked well)
- Google syntactic n-grams
- Naive Bayes estimates p(y|w) for each year (the probability of a year given a word)
- Multiclass classification seems to work better (see the sketch after this list)
- character n-grams are highly effective features for diachronic classification (but not very satisfying)
- the prior distribution over date labels has a significant domain-specific effect
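- A sketch of that character n-gram multiclass setup with scikit-learn in place of Weka's SMO (so one-vs-rest instead of 1-vs-1; the data here are placeholders):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["The motor car arrived...", "The app crashed..."]  # snippets
    epochs = ["1900-1924", "2000-2024"]   # non-overlapping year-range labels
    clf = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),  # char n-grams
        LinearSVC(),                      # linear SVM, one-vs-rest multiclass
    )
    clf.fit(texts, epochs)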
SemEval-2015 Task 8: SpaceEval
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval149.pdf
- Question answering about location of objects, events
- Text to scene conversion/visualization
- Generating textual description of images
- Navigational instructions to a robot
- Adopts ISO-Space encoding for spatial information (and the ISOspace metamodel)
- qualitative spatial link: RCC8 relations for the topological relations between elements (QLink)
- QLinks: qualitative spatial links
- Uses SpatialML relation types based on RCC8
- Example: the book is on the table
- spatialsignal(s1, cluster=“on-1”, semantictype=topological, directional)
- qslink(qsl1, trajector=se1, landmark=se2, signal=s1, relType=EC)
- Very few participants on this task
- Standard machine learning models
- Lexical, syntactic, open-source features
SpRL-CWW: Spatial Relation Classification with Independent Multi-class Models
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval150.pdf
- Spatial role labeling
- I don't understand anything
- Sequential labeler, generate candidate relation tuples, multi-class classifiers
- spatial elements and signal
- the ball is in the backyard of the house: detect signals (in) with a lemmatizer, POS tagger, etc.
- Spatial element: the ball; spatial signal: in; place: the backyard; spatial signal: of; spatial element: the house
- classify candidate spatial relations and label arguments with multi-class classifiers for each relation type
- dependency path to spatial signal, lemma, pos, direction from spatial signal.
- best features: raw string in a 5 word window, 300-dimension GloVe word vector; POS bigrams for a 5-word window (best feature)
SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval151.pdf
- Taxonomy extraction: given a list of domain-specific terms, structure them into a taxonomy
- Subtasks: term extraction, relation discovery, taxonomy construction
- Domains: chemical, equipment, food, science.
- combined gold standards:
- Wikipedia Bitaxonomy (WiBi)
- The Google product taxonomy (food)
- material handling equipment (equipment)
- taxonomy of fields and their subfields (science)
- baselines: all the nodes connected to the root concept, string inclusion (science and network science)
- structural evaluation: presence of cycles and intermediate nodes
- Evaluation: cumulative Fowlkes & Mallows (a measure for comparing clusterings; see the snippet below)
- Generalised F&M and cumulative F&M
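- The plain (non-cumulative) Fowlkes & Mallows index is available in scikit-learn; shown here only to recall the measure, FM = TP / sqrt((TP + FP) * (TP + FN)) over pairs of items:

    from sklearn.metrics import fowlkes_mallows_score

    gold = [0, 0, 1, 1, 2]   # cluster labels under the gold taxonomy cut
    pred = [0, 0, 1, 2, 2]   # cluster labels under the predicted cut
    print(fowlkes_mallows_score(gold, pred))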
- The task didn't provide the corpus, just the terms: each participant had to find their own corpus
- the baseline is closer to the base system
- Taxonomy visualisation
- Relation discovery: lexico-syntactic patterns have high precision but low recall
- co-occurrence-based approaches improve results
- taxonomy construction: approaches are less known or difficult to reimplement
- no corpus was provided and participants had no gold standard
INRIASAC: Simple Hypernym Extraction Methods
http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval152.pdf
- terms: one to nine words
- substring inclusion: bicycle helmet < helmet (suffix)
- licorice < rice (error)
- fruit 'n fibre < fruit (error: it's a cereal)
- main intuition: hypernyms and hyponyms often occur together
- hypernyms are more common than hyponyms
- strategy: co-occurrence statistics and term frequencies in a collection of documents
- Wikipedia (only text; no categories, redirects, titles)
- sentencized: 125 million sentences
- counts of term cooccurrence in the same sentence (document frequency of terms)
- method: consider all domain terms B co-occurring with A in the same Wikipedia sentences, eliminate any candidate B that appears in fewer documents than A, retain N=3 (see the sketch below)
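- My reconstruction of that heuristic as a sketch (not INRIASAC's code); cooccur and doc_freq are assumed precomputed from the sentencized Wikipedia:

    def hypernym_candidates(a, cooccur, doc_freq, n=3):
        # cooccur[a]: dict of domain terms co-occurring with a, with counts
        # doc_freq[t]: number of sentences/documents containing term t
        cands = [b for b in cooccur.get(a, {})
                 if doc_freq[b] > doc_freq[a]]   # hypernyms are more common
        # prefer the candidates that co-occur with a most often
        cands.sort(key=lambda b: cooccur[a][b], reverse=True)
        return cands[:n]                         # retain N = 3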
SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing
SemEval-2016 Task Announcements and closing session
STS
- Compute the degree of semantic similarity between paired sentences (as usual)
- Annotated data
- English 14250 pairs (2012-2015)
- Spanish 1620 pairs (2014,2015)
- Evaluation: Pearson correlation with the mean of human scores (see the snippet after this list)
- Applications: deep QA, distillation, generation, machine reading, MT, plagiarism detection, paraphrasing, textual inference, summarization and many more
- Datasets 2016
- Plagiarism detection
- QA question-question
- Post-edited MT
- Q&A Answer-answer
- Headlines
- Data selection will target weakness of existing techniques
- Pilot task: crosslingual STS
- John said he is considered a witness but not a suspect
- “Él ya no es un sospechoso” (“He is no longer a suspect”), John said
- semeval@googlegroups.com
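- The evaluation metric in one line (scipy shown here; the official script may differ):

    from scipy.stats import pearsonr

    system = [4.2, 1.0, 3.5]   # system similarity scores per pair
    gold = [4.5, 0.8, 3.0]     # mean of the human scores per pair
    r, _ = pearsonr(system, gold)
    print(f"Pearson r = {r:.3f}")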
Interpretable STS
- Full task on its own
- Student grading scenario
- Reference: 12 killed in bus accident in Pakistan
- Student: 10 killed in road accident in NW Pakistan
- Grade 3.2 out of 5
- Explanation:
- They are quite similar but Pakistan is more general than NW Pakistan and in bus accident is more specific than in road accident
- Given a pair of sentences
- chunk the sentences / gold chunks provided
- systems align chunks across both sentences
- score similarity for each chunk pair
- classify the type of relation: EQUI, OPPO, SPE, SIMI, REL, FACT, OPI (an illustrative record follows)
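- A hypothetical illustration of one chunk alignment from the grading example above (field names and exact format are my assumption):

    alignment = {
        "chunk1": "in bus accident",
        "chunk2": "in road accident",
        "relation": "SPE",   # chunk1 is more specific than chunk2
        "score": 4,          # chunk-level similarity on the 0-5 scale
    }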
- Annotation guidelines public, high-quality annotation (75 F1), high participation
- Novelties: allow for N:M alignments
- New test data: same datasets and education-related dataset
- Question: e.agirre@ehu.es
Community QA
- Given a question Q, find a good answer A from a collection of CQA threads
- Given a question, find a similar question
- English and Arabic tasks
Sentiment Analysis in Twitter
- tweet level polarity
- topic level polarity (pos/neg/neu, 5 stars)
- topic trend detection (pos/neg/neu, 5 stars)
Aspect-based sentiment analysis (ABSA)
- contradictory polarity on restaurants, laptops, hotels and consumer electronics
- English and Chinese
Detecting stance in tweets
- input: a target and a tweet pair
- output: determine whether the author of the text is in favor of, against or neutral towards the target
- two subtasks: labeled training data and no labeled training data
Sentiment intensity of English and Arabic
- Nothing to report
Meaning representation parsing
- Input: the soldier was not afraid of dying
- Output: (f / fear-01 :arg0 (s / soldier) :arg2 (d / die-01 :arg0 s) :polarity -)
- resources: 15000 training pairs (LDC/DEFT), tokenizer, aligner (Pourdamghani et al 14), AMR manipulation library, baseline parser (Flanigan et al 14), scorer (Cai & Knight '13)
Chinese semantic dependency parsing
Nothing to report
Semantic Analysis Track
- Detection of minimal semantic units and their meanings (DimSUM)
- Lexical semantic tasks
- units (single/multiword expressions) and classes (nouns + verbs)
- I googled restaurants in the area and Fuji Sushi came up and reviews were great so I made a carry out order.
- I googled restaurants in the area and Fuji_Sushi came_up … carry_out
- googled (V:COMMUNICATION) restaurants (N:GROUP) area (N:LOCATION)
- V: BODY, CHANGE, COGNITION, COMPETITION, COMMUNICATION, CONSUMPTION
- Tag the English sentence for MWEs and supersenses
- Domains: online reviews and tweets (Copenhagen supersense dataset, Johannsen et al 2014)
Complex Word Identification
- The cat perched on the mat
- complex: perched
- simple: cat, mat
- Format: 2247 training instances, 88000 testing instances
- Corpus comes from 400 non-native speakers of English, 42 distinct native languages; ~4000 complex words were found
Clinical TempEval
- Extract timelines from text with events and temporal relations
Taxonomy extraction
- Task definition: organise domain-specific terms in a taxonomy
- Datasets: chemicals, equipment, food, science
- gold standards: WordNet, Wikipedia, online taxonomies
- multilingual settings: English, French, Italian and Dutch
- evaluation: structural evaluation
- comparison against gold standards
- the corpus will be provided to participants this year.
Semantic taxonomy enrichment
- task objective: given a word and a gloss, identify the WordNet synset that is its synonym or hyponym
- task oriented to words missing in WordNet
Closing session
- all future task proposals will be peer-reviewed
- organized 14 tasks in 5 tracks
- new paper reviewing guidelines
- improve replicability (possibly introducing the chance of paper rejection)
- the semeval experience
- grading the reviews!!!
- release initial versions of submitted papers, plus anonymized reviews and ratings, after SemEval as a corpus for analysis!!!
- what makes a good review?
- SemEval 2017 task: predicting reviewing quality (???)
Ideas
- Use Sultan to align @menosdias tweets
- While drinking coffee with the charming Houda Bouamor (from Carnegie Mellon Qatar) we had the idea of a multilingual summarizer (French: LIPN, Spanish: IIMAS, Arabic: Qatar) based on alignment and moderate generation (LORIA). She said she could get some funding for a one-year project from Qatar.
- Suddenly I think that @menosdías might make for a SemEval task
- the more I think about it, the more I tell myself: let's do chunking and alignment on @menosdías using pure semantic similarity and we get a paper out of it (and entity extraction, at least from the tweets)
- thinking about AGESS (Automatic Generation of States of the Art)… I think we should put a PhD student to work on time (and have them participate in SemEval's temporal tasks)
- Using coreference significantly reduces the number of unknown relations
- SpaceEval might be useful for the GolemGenFred project
- Orientation Link (OLink) describes non-topological relationships between spatial elements (the chair is in front of the couch)
- Data sources:
- Degree Confluence Project (travelers web log) DCP
- CLEF
- SE (spatial entities), SS (spatial signals), MI (motion signals)
- Three configurations: un-annotated text, manually annotated spatial elements, manually annotated spatial elements with attributes
- word2vec… and word embeddings
- Can Selectional Preferences Help Automatic Semantic Role Labeling? Shumin Wu and Martha Palmer (interesting poster, semantic role labelling with LDA)