Marco Turchi
Research Interests

My Research

My research focuses on information extraction, retrieval, and organization from text documents, using several machine learning (probabilistic and statistical) approaches.

Over the years, I have analyzed different aspects of text/web mining, moving from classical text problems, such as supervised and unsupervised learning or new representations for text documents, to more computational-linguistic tasks, such as statistical machine translation and document summarization.

I have been part of the European project SMART, where I applied machine learning techniques to statistical machine translation (SMT) problems, and I have also been involved in a media analysis project aimed at modeling the mediasphere using text mining and cross-language analysis techniques.

My current research is centered on SMT techniques applied to the news domain, in particular on the use of translated documents in NLP tasks such as document summarization, event extraction, and sentiment analysis.
I am also interested in multilingual multi-label document classification, robust approaches for outlier detection in text mining, multilingual pattern learning, and news analysis.

My most recent work involves the development of a news translation service, learning-curve analysis of an SMT system, and the use of machine translation to directly and indirectly address multilingual text mining and document summarization.


Statistical Machine Translation (SMT)

  • Improving translation quality in the news domain.
  • Learning Curves of SMT systems.
  • Confidence Estimation.
  • Parallel Sentences Extraction from the Web.
  • Self-Learning for SMT systems.
  • Hybrid systems.
  • ...

Natural Language Processing and Text/Web Mining:

  • Feature selection.
  • Text classification.
  • Integration of semantics in text classification.
  • Detection of content in text documents.
  • Clustering techniques.
  • Semi-supervised clustering algorithms.
  • Soft-clustering.
  • Active learning algorithms for text classification. 
  • Time series analysis of textual data.
  • Detection of text patterns in text documents.
  • Extraction and analysis of multilingual text content.
  • Document Summarization.
  • ...

Machine Learning:

  • Smoothing techniques for parameter estimation.
  • Regression and classification algorithms.
  • Learning curves.
  • Kernel methods - String Kernels.
  • Graphical Models.
  • Robust approaches for outlier detection.
  • ...

Language evolution.

High Performance Computing (HPC), large-scale datasets.

Some links:

- SVM - Support Vector Machines

- Kernel Methods for Pattern Analysis (here)

- PLS - Partial Least Squares

- Graphical Model (here)

- LDA - Latent Dirichlet Allocation

- String Kernels (here)

NLP world:

- ACL - Association for Computational Linguistics

- SMT - Statistical Machine Translation

- Machine Translation Archive (here)

- EAMT - European Association for Machine Translation
