A3, AOC, CALIN, LCR, MERCRED, RCLN

Time:	14:00 - 15:00
Location:	Room B107, Building B, Université de Villetaneuse
Summary:	Automatic Deception Detection in Text Applying Topic Modeling Algorithms
Description:	Hiram Calvo. We address deceptive text identification using several kinds of features: a continuous semantic space model based on latent Dirichlet allocation (LDA) topics, one-hot representation (OHR), syntactic information from syntactic n-grams (SN), and lexicon-based features from the Linguistic Inquiry and Word Count (LIWC) dictionary. We will present experiments in which several combinations of these features were tested to identify the best source(s) for deceptive text identification, aiming at state-of-the-art performance. We conducted our tests on three available corpora: a corpus of 800 hotel reviews, a corpus of 600 reviews about controversial topics, and a corpus of 236 book reviews. Additionally, we present an analysis of which features point to either deceptive or truthful texts, finding that certain words can play different roles (sometimes even opposing ones) depending on the task being evaluated. We will present results of experiments in three settings: a one-domain setting, training and testing our models separately on each dataset (with fivefold cross-validation); a mixed-domain setting, merging all datasets into one large corpus (again with fivefold cross-validation); and a cross-domain setting, using one dataset for testing and the concatenation of all the other datasets for training.

Monday, March 6