Thursday, February 18


Time: 14:00 - 15:00
Location: Room B107, Building B, Université de Villetaneuse
Summary: Sampled Weighted Min-Hashing for Large-Scale Topic Mining
Description: Ivan Vladimir MEZA. Sampled Weighted Min-Hashing (SWMH) is a randomized approach to automatically mining topics from large-scale corpora. SWMH generates multiple random partitions of the corpus vocabulary based on term co-occurrence and agglomerates highly overlapping inter-partition cells to produce the mined topics. While alternative approaches define a topic as a probability distribution over the complete vocabulary, SWMH topics are subsets of that vocabulary. Interestingly, the topics mined by SWMH underlie themes from the corpus at different levels of granularity. We extensively evaluate the meaningfulness of the mined topics both qualitatively and quantitatively on the NIPS (1.7K documents), 20 Newsgroups (20K), Reuters (800K) and Wikipedia (4M) corpora.
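The two steps named in the description (random partitions of the vocabulary from term co-occurrence, then agglomeration of overlapping cells) can be illustrated with a toy sketch. This is not the speaker's implementation: it uses ordinary unweighted min-hashing with random permutations instead of the sampled weighted scheme, and every name in it (`minhash_partition`, `agglomerate`, `mine_topics`, `tuple_size`, `threshold`) is hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def minhash_partition(term_docs, num_docs, tuple_size, rng):
    """Partition terms into cells by a tuple of min-hash values.

    Assumption: plain (unweighted) min-hashing, where each hash function
    is a random permutation of the document ids.
    """
    perms = []
    for _ in range(tuple_size):
        perm = list(range(num_docs))
        rng.shuffle(perm)
        perms.append(perm)

    cells = defaultdict(set)
    for term, docs in term_docs.items():
        # Terms that occur in similar sets of documents are likely
        # to share the same tuple of minimum permuted ids.
        key = tuple(min(perm[d] for d in docs) for perm in perms)
        cells[key].add(term)

    # Singleton cells carry no co-occurrence information, so drop them.
    return [frozenset(c) for c in cells.values() if len(c) > 1]


def overlap(a, b):
    """Overlap coefficient between two non-empty term sets."""
    return len(a & b) / min(len(a), len(b))


def agglomerate(cells, threshold):
    """Greedily merge cells whose overlap reaches the threshold."""
    topics = []
    for cell in cells:
        merged = set(cell)
        rest = []
        for topic in topics:
            if overlap(merged, topic) >= threshold:
                merged |= topic
            else:
                rest.append(topic)
        topics = rest + [merged]
    return topics


def mine_topics(docs, num_partitions=20, tuple_size=2, threshold=0.5, seed=0):
    """Toy pipeline: inverted index -> random partitions -> merged cells."""
    rng = random.Random(seed)
    term_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):
            term_docs[term].add(doc_id)

    cells = []
    for _ in range(num_partitions):
        cells.extend(minhash_partition(term_docs, len(docs), tuple_size, rng))
    return agglomerate(cells, threshold)


corpus = [
    "neuron synapse cortex",
    "neuron synapse cortex",
    "goal striker penalty",
    "goal striker penalty",
]
topics = mine_topics(corpus)
```

In this sketch each topic is a plain set of terms, which matches the description's point that SWMH topics are subsets of the vocabulary and not distributions over it.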