Marco Turchi

Miscellaneous

- Co-organizer of:

  1. Intelligent Analysis and Processing of Web News Content workshop at WI-IAT - Milan, 15 September 2009
  2. Statistical Multilingual Analysis for Retrieval and Translation workshop, associated with EAMT - Barcelona, 13 May 2009
  3. European Project SMART meeting - Bristol, May 2008

- Coordinator and head coach of basketball teams since September 1993

- Student co-advisor for Master's and degree theses on text analysis

Talked About My Work

- ONTS, the "Optima" News Translation System, has been mentioned here

- Our PLoS ONE paper "The Structure of the EU Mediasphere" has been mentioned in the following media

NLP/Text Mining Libraries

- GATE: a General Architecture for Text Engineering

- Weka: data mining software in Java

- Apache Lucene: information retrieval library

- LingPipe: Java libraries for the linguistic analysis of human language

SMT Tools

- Moses: statistical machine translation system

- SRILM: toolkit for building and applying statistical language models (LMs)

- IRSTLM: LM toolkit

- GIZA++: training of statistical translation models

- Multi-thread GIZA: multi-threaded extension of the GIZA++ word alignment tool

General-Purpose Libraries

- SVMlight: an implementation of Support Vector Machines (SVMs) in C

- Apache Cayenne: persistence framework providing object-relational mapping (ORM) and remoting services

- SciPy: software for mathematics, science, and engineering in Python

- MySQL++: C++ wrapper for MySQL's C API

Corpora

- Europarl: parallel corpus for SMT in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek, and Finnish.

- JRC-Acquis: parallel corpus for SMT in 22 languages.

- SETimes: parallel corpus for SMT in Balkan languages (Turkish, Croatian, Albanian, Serbian, Macedonian, Bulgarian, Greek, Romanian) and English.

- EMEA: parallel corpus from the European Medicines Agency in 22 languages.

- CzEng: Czech-English parallel corpus.

- EPPS: word alignment documents

- Spanish-Dutch NER human annotated data

My Extended CV

- Download here
