I have focused my research on information
extraction, retrieval and organization from text documents, using several machine learning (probabilistic
and statistical) approaches.
Over the years, I have analyzed different
aspects of text and web mining, moving from classical text problems such as
supervised and unsupervised learning or new representations for
text documents to more computational-linguistic tasks such as language
evolution and statistical machine translation (SMT).
I have been part of the European project SMART,
where I have applied machine learning techniques to statistical machine translation
problems, and I have also been involved in a media analysis project aimed at
modeling the mediasphere based on text mining and cross-language
analysis techniques.
My current research is centered on thesaurus indexing, machine
learning, multilingual multi-class document classification,
news analysis, and SMT techniques applied to the news domain.
My most recent work involves learning-curve analysis of an SMT system, confidence
estimation in machine translation, and self-learning for SMT systems.