Neoveille
Detection, analysis, and tracking of neologisms in corpora
Objectives
- Platform for detecting, analyzing, and tracking neologisms in corpora (LIPN)
- Study of loanwords in corpora (LDI, CLILLAC-ARP, Ieda, EMPNEO)
- Study of semantic neology (ERTIM, LIPN, LDI)
Consortium
- Paris 13 (LIPN, LDI)
- Paris 7 (CLILLAC-ARP)
- INALCO (ERTIM)
- University of São Paulo (Ieda Alves)
- EMPNEO group
Hiring of a research engineer
ToDo
- Hire an engineer to develop the platform (DONE)
- Choose an architecture suited to the needs and the multilingual nature of the application (DONE)
- POS Tagging
- Greek POS tagging web service (DONE)
- We explore
- Tokenization problem for Tree Tagger (DONE)
- Tree Tagger installation (Katia: DONE)
- Emmanuel will test the TAL server installation, especially the POS tagging part (Emmanuel)
- Indexing
- IMS CWB web interface and Tree Tagger (Emmanuel, Katia, Jorge; Due date: November 13)
- Katia will install IMS CWB on her computer from scratch (Katia and Jorge)
- Once a localhost connection works, we will index a corpus from Neoveille and test it (Katia and Jorge)
- Fix or reinstall CQPweb on the TAL server (Katia, Jorge, Emmanuel)
- Infrastructure
- Redmine migration (Jorge)
- GitHub for Neoveille (Jorge)
- Document TAL Cluster (Emmanuel)
- Project web site (Katia)
- Python migration (Katia)
- Iteration 1
- Which architecture for Neoveille's functional interface (Emmanuel, Jorge, Katia)
- Next meeting: Friday, November 17th, 14h
Schedule
Iteration 0: seven languages with POS tagging on IMS CWB
Scheduled date: November 5
RSS processing
POS Tagging
- Milestone: produce the same output for the seven languages on the TAL server every month from the RSS input
- POS tagging in the seven languages
- Greek
- Chinese
- Russian
- Portuguese
- Polish
- Czech
- French
- POS Tagging with the RSS input
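As a rough illustration of the RSS input step, here is a minimal sketch that pulls the title, link, and description out of each item of an RSS 2.0 feed using only the Python standard library. The function name `extract_items`, the output keys, and the sample feed are assumptions made for this example; they are not taken from the Neoveille code base, and real feeds may need extra handling (HTML in descriptions, Atom feeds, encodings).

```python
import xml.etree.ElementTree as ET


def extract_items(rss_xml):
    """Return one dict per <item> element of an RSS 2.0 document.

    Hypothetical helper: the keys "title", "link" and "text" are
    illustrative choices, not a format defined by the project.
    """
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child element is missing
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
    return items


# Made-up sample feed, used only to exercise the function.
SAMPLE = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Premier article</title><link>http://example.org/1</link>
<description>Un texte court.</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in extract_items(SAMPLE):
        print(entry["title"], "|", entry["text"])
```

The `text` field of each entry is what would be handed to the tokenizer and POS tagger for the language of the feed.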
Indexing of the POS Tagging RSS output for IMS CWB
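IMS CWB reads corpora in a "verticalized" text format: one token per line with tab-separated annotations, and XML-style tags for structural units. The sketch below assumes the tagger output is already available as lines of the form token, POS tag, lemma separated by tabs (the usual Tree Tagger layout when tokens and lemmas are printed). The function name `to_vrt` and the choice of a single `<text id="...">` wrapper per RSS item are assumptions for this example, not the project's actual conventions.

```python
from xml.sax.saxutils import quoteattr


def to_vrt(text_id, tagged_lines):
    """Wrap token<TAB>pos<TAB>lemma lines in one <text> element.

    Hypothetical helper: lines that do not have exactly three
    tab-separated fields are skipped rather than repaired.
    """
    out = ["<text id=%s>" % quoteattr(text_id)]
    for line in tagged_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # malformed tagger output, ignore it
        out.append("\t".join(parts))
    out.append("</text>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    tagged = ["Le\tDET:ART\tle", "chat\tNOM\tchat", "dort\tVER:pres\tdormir"]
    print(to_vrt("item-1", tagged), end="")
```

Files produced this way can then be passed to the CWB encoding tools, with the POS and lemma columns declared as positional attributes and `text` as a structural attribute.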
Dependency Analysis
Neoveille Web interface
Project web site
Iteration 1: seven languages with POS tagging and dependency analysis on D3 or IMS CWB
Iteration 2: neologism detection
Questions
- Can we freely distribute data that we have collected from an RSS feed?