Translation project management at the Universitat Autònoma de Barcelona

Pangeanic PangeaMT: Machine Translation Project Management


Why Machine Translation?

MT Usage

•  MT for assimilation: “gisting” or “understanding”
   –  Practically unlimited demand, but free web-based services reduce the incentive to improve the technology
   –  Coverage + important. Instant quality

•  MT for dissemination: “publication”
   –  Publishable quality that can only be achieved by humans. MT & tools are a productivity booster

•  MT for direct communication
   –  Current R&D; the military uses systems for spoken MT; first applications for smartphones

Machine translation

[Figure: Basic Architecture for Statistical MT]

•  Language Model (Fluency): Monolingual Corpus → Counting, Smoothing → n-gram Model

•  Translation Model (Adequacy): Parallel Corpus → Alignment, Phrase Extraction → Phrase Table

•  Decoder (translator): Source Text → N-best Lists → Target Text
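The “Counting, Smoothing” box of the language model can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular engine: it assumes a bigram model with add-one (Laplace) smoothing, and the function names and the toy corpus are invented for the example.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Counting: collect unigram and bigram counts from a monolingual corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Smoothing: add-one estimate of P(word | prev), so unseen pairs are never zero."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def fluency_score(sentence, unigrams, bigrams):
    """Product of smoothed bigram probabilities; higher means more fluent."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    score = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        score *= bigram_prob(prev, word, unigrams, bigrams)
    return score

# Toy monolingual corpus (hypothetical data, for illustration only).
corpus = [
    "the company plans to offer a service",
    "the company plans a meeting",
]
unigrams, bigrams = train_bigram_lm(corpus)
fluent = fluency_score("the company plans to offer a meeting", unigrams, bigrams)
scrambled = fluency_score("company the offer to plans meeting a", unigrams, bigrams)
print(fluent > scrambled)
```

The decoder uses a score of this kind to prefer target-language word orders it has seen in real text over scrambled ones.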

MT@DGT

A selection of 23 out of 3897 ways to translate “operations” from EN to FR:

operations ||| action ||| (0) ||| (0) ||| 0.00338724 0.0017156 0.00316685 0.0034059 2.718
operations ||| actions ||| (0) ||| (0) ||| 0.0575431 0.0534003 0.0731052 0.0958526 2.718
operations ||| activité ||| (0) ||| (0) ||| 0.0102038 0.0079204 0.00744879 0.0084917 2.718
operations ||| activités ||| (0) ||| (0) ||| 0.019962 0.0194538 0.0366753 0.0451576 2.718
operations ||| des actions ||| (0,1) ||| (0) (0) ||| 0.0304499 0.0269505 0.00973472 0.00438066 2.718
operations ||| des activités ||| (0,1) ||| (0) (0) ||| 0.00877089 0.00997725 0.00246435 0.00206379 2.718
operations ||| des opérations ||| (0,1) ||| (0) (0) ||| 0.294821 0.281318 0.0406896 0.0238681 2.718
operations ||| exploitation ||| (0) ||| (0) ||| 0.0437821 0.0365346 0.0208856 0.029298 2.718
operations ||| fonctionnement ||| (0) ||| (0) ||| 0.0141471 0.01165 0.00919948 0.0099513 2.718
operations ||| gestion ||| (0) ||| (0) ||| 0.00141338 0.0013098 0.00286578 0.0032669 2.718
operations ||| intervention ||| (0) ||| (0) ||| 0.00561479 0.0026006 0.00110394 0.0013554 2.718
operations ||| interventions ||| (0) ||| (0) ||| 0.0830237 0.0778631 0.0102142 0.0149096 2.718
operations ||| les actions ||| (0,1) ||| (0) (0) ||| 0.0339458 0.0271478 0.00931099 0.00712787 2.718
operations ||| les activités ||| (0,1) ||| (0) (0) ||| 0.00915348 0.0101746 0.00296613 0.00335805 2.718
operations ||| les interventions ||| (0,1) ||| (0) (0) ||| 0.0565693 0.0393793 0.00207406 0.00110872 2.718
operations ||| les opérations ||| (0,1) ||| (0) (0) ||| 0.413399 0.281515 0.0564235 0.0388363 2.718
operations ||| manipulations ||| (0) ||| (0) ||| 0.0985325 0.183951 0.00104818 0.0034523 2.718
operations ||| operations ||| (0) ||| (0) ||| 0.786026 0.557952 0.00200716 0.0023981 2.718
operations ||| opération ||| (0) ||| (0) ||| 0.0245776 0.021675 0.00785022 0.0085959 2.718
operations ||| opérationnel ||| (0) ||| (0) ||| 0.00656403 0.0069192 0.0012266 0.0013902 2.718
operations ||| opérations effectuées ||| (0,1) ||| (0) (0) ||| 0.110801 0.285316 0.00132696 0.00229301 2.718
operations ||| opérations ||| (0) ||| (0) ||| 0.636821 0.562135 0.409237 0.522254 2.718
operations ||| travaux ||| (0) ||| (0) ||| 0.00273044 0.0024213 0.00194025 0.0023517 2.718
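Entries in this `|||`-separated layout are easy to read programmatically. The sketch below parses a few of the lines shown and ranks the candidate translations. It assumes, following the usual Moses convention, that the third score is the direct phrase translation probability; the slide itself does not label the columns, so treat that reading as an assumption. The function names are invented for the example.

```python
def parse_phrase_table_line(line):
    """Split one 'source ||| target ||| ... ||| scores' entry into its parts."""
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(value) for value in fields[-1].split()]
    return source, target, scores

def rank_candidates(lines, score_index=2):
    """Sort candidate translations by one score column, best first.

    score_index=2 is ASSUMED to be the direct phrase translation
    probability (Moses-style column order); adjust if the table differs.
    """
    entries = [parse_phrase_table_line(line) for line in lines if line.strip()]
    return sorted(entries, key=lambda entry: entry[2][score_index], reverse=True)

# Four of the 23 entries shown on the slide.
sample = [
    "operations ||| actions ||| (0) ||| (0) ||| 0.0575431 0.0534003 0.0731052 0.0958526 2.718",
    "operations ||| des opérations ||| (0,1) ||| (0) (0) ||| 0.294821 0.281318 0.0406896 0.0238681 2.718",
    "operations ||| opérations ||| (0) ||| (0) ||| 0.636821 0.562135 0.409237 0.522254 2.718",
    "operations ||| travaux ||| (0) ||| (0) ||| 0.00273044 0.0024213 0.00194025 0.0023517 2.718",
]
for source, target, scores in rank_candidates(sample):
    print(f"{source} -> {target}: {scores[2]}")
```

Under that assumption, “opérations” comes out on top for this sample, which matches intuition.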

Examples of Statistical MT

Unrest is continuing in Cairo as protesters set up their demand for Egypt’s military rulers to resign

How does Statistical MT work?

•  SMT can be seen as a generalisation of Translation Memory to sub-segmental level

•  The phrases are text snippets (n-grams) taken from real-world translations (= as good as the source you entered)

•  Re-combination of those phrases for out-of-context translation may lead to significant problems:
   –  Alignment errors → spurious/lost meaning
   –  Bad morphology (declensions)
   –  Grammatical errors
   –  Wrong disambiguation

•  SMT will not recover implicit information from the source text, nor handle structural mismatches; it needs more information.
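The idea of re-combining sub-segmental phrases, and why it can go wrong, can be shown with a toy example. This is only a sketch: a real SMT decoder scores many competing hypotheses with the translation and language models, whereas this one greedily takes the longest matching phrase from left to right. The EN→ES phrase table is invented for illustration.

```python
def translate(sentence, phrase_table, max_len=3):
    """Greedy, monotone phrase lookup: longest known phrase first."""
    tokens = sentence.lower().split()
    output, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in phrase_table:
                output.append(phrase_table[chunk])
                i += n
                break
        else:
            # No phrase covers this token: copy it through untranslated.
            output.append(tokens[i])
            i += 1
    return " ".join(output)

# Hypothetical phrase table, as if extracted from real translations.
PHRASES = {
    ("the", "company"): "la empresa",
    ("plans", "to", "offer"): "tiene previsto ofrecer",
    ("the", "following"): "lo siguiente",
    ("the",): "el",
    ("offer",): "oferta",
}

print(translate("the company plans to offer the following", PHRASES))
# -> la empresa tiene previsto ofrecer lo siguiente
print(translate("the offer", PHRASES))
# -> el oferta   (wrong gender: each snippet is fine, the combination is not)
```

The second output is the “bad morphology” case: each snippet came from a correct translation, but nothing in the lookup checks that they agree once combined.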


SMT from a translator’s perspective?

Strengths and Weaknesses of pure SMT (RBMT: translate pro; SMT: Koehn 2005; examples from EuroParl)

EN: I wish the negotiators continued success with their work in this important area.

RBMT: Ich wünsche, dass die Unterhändler Erfolg mit ihrer Arbeit in diesem wichtigen Bereich fortsetzten.
→ continued: verb instead of adjective

SMT: Ich wünsche der Verhandlungsführer fortgesetzte Erfolg bei ihrer Arbeit in diesem wichtigen Bereich.
→ three wrong inflectional endings

This has consequences for a fair post-editing (PE) payment model

Strengths and Weaknesses of pure SMT

In the early 90s and early 00s there was strong support and advocacy for one or the other of the two technologies: SMT vs RBMT.

But we have learned to combine the best of both worlds. Google ditched Systran (Och, 2005), went all statistical and now uses SMT plus linguistic models. Microsoft (Chris Wendt) also uses two technologies.

Examples of Hybrid MT architectures

[Figure legend: SMT Module, RBMT Module]

The search for integrated (hybrid) methods is now seen as a natural extension of both approaches

How does PangeaMT work?

It can bring together syntactically distant languages from different families


SYNTAX (LINGUISTIC)-BASED HYBRID SMT

[Figure: a source sentence is tagged and reordered by the “Nipponization module”, then passed to “Translation & Cleaning”]

Source: When available, the company plans to offer the following:

Reordered: available When , the company the following : plans to offer :

Labels in the figure: (Cond clause), (Subject), (Predicate); (ADV) (ADJ) (Punct); (DET) (NNSing); (VBPt) (to); (VBPt3) (to) (VBinf) (DET) (NN)

How do you manage a translation project?

How do you manage a translation project?

Typical order workflow at an LSP

http://pangeanic.es/norma-calidad-en15038-gestion-de-ofertas-pedidos-y-contratos/

How do you manage a translation project?

EN15038: typical translation job project management at a translation company

How do you manage a translation project?

•  Main tool for language professionals
•  TM-based retrieval
•  Shows similar (high-percentage match) translations stored in the TM for each segment to translate
•  100% matches: perfect, identical segments from the TM
•  Fuzzy matches: there are no 100% matches, but there are similar segments. Generally, matches below 70% require full human “PE”.
•  Mean productivity: 2000-3000 words/day

E.g.:
•  Proprietary software: SDL Trados, MemoQ, Deja Vu, Memsource
•  Semi-free: WordFast
•  Open-source software: OmegaT

Advantages:
•  Higher productivity in professional translation
•  Very popular, low entry threshold, not so expensive
•  Friendly GUI
•  Easy Quality Check (QC)
•  Latest versions can integrate MT as a suggestion

Disadvantages:
•  Translation is not scalable: production is directly related to the available translators per hour/day
•  If there isn't any TM available, there is no productivity improvement
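The 100% / fuzzy / below-70% distinction used by TM-based tools can be illustrated with a rough similarity score. CAT tools compute match percentages with their own (often proprietary) algorithms; the sketch below simply assumes a word-level ratio from Python's standard `difflib`, and the memory contents and function names are invented for the example.

```python
from difflib import SequenceMatcher

def match_percent(segment, tm_source):
    """Word-level similarity between a new segment and a TM source, 0-100."""
    ratio = SequenceMatcher(None, segment.lower().split(), tm_source.lower().split()).ratio()
    return round(ratio * 100)

def best_tm_match(segment, memory):
    """Return (score, tm_source, tm_target) for the closest TM entry."""
    scored = [(match_percent(segment, src), src, tgt) for src, tgt in memory.items()]
    return max(scored, key=lambda item: item[0], default=(0, None, None))

def classify(score, fuzzy_threshold=70):
    """Bucket a score the way the slide describes (the 70% cut-off is from the slide)."""
    if score == 100:
        return "100% match"
    if score >= fuzzy_threshold:
        return "fuzzy match"
    return "no usable match"

# Hypothetical one-entry translation memory.
memory = {
    "Press the red button to start the engine.": "Pulse el botón rojo para arrancar el motor.",
}
for segment in [
    "Press the red button to start the engine.",
    "Press the green button to start the engine.",
    "Thank you for your purchase.",
]:
    score, _, suggestion = best_tm_match(segment, memory)
    print(classify(score), "|", segment, "|", suggestion)
```

With an empty memory every segment falls into “no usable match”, which is the “no TM, no productivity improvement” disadvantage listed above.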

How do you manage a translation project?

Advantages:
•  Translation becomes scalable: millions of words can be translated per day
•  More productivity in non-professional translations

Disadvantages:
•  A lot of data is needed to create an engine (SMT)
•  A lot of time is needed to create an engine (RBMT)
•  MT alone is not useful for professional translation, only for gisting

•  Production enhancement tools
•  Full, complex systems
•  Multilanguage
•  Full translation
•  “Gisting”
•  Translate content which otherwise would be ignored, not accessible or not understood.

How do you manage a translation project?

•  Already available in the 90s between Trados and Systran
•  The CAT tool shows similar translations from the TM and one or more suggestions from MT, and the user chooses what to post-edit. E.g.:
   –  translate segments with matches above 70-75% with TM
   –  translate segments with matches below 70-75% with MT
•  If custom-built, very good results, but professional translators are reluctant to become post-editors (price pressures)
•  Average productivity: +5000 words/day

Advantages:
•  Post-editing is faster than translating from scratch
•  The future of the translation industry?
•  Enables faster, high-quality (HQ) translations
•  Fast post-editing if the MT output reaches certain quality levels

Disadvantages:
•  If the MT output is bad, it is better to translate from scratch


How do you manage a translation project?

+ : Direct
- : PE slow (can't use CAT tools); QC difficult (cannot use XBench or QA Distiller)


How do you manage a translation project?

+ : Fast if we have no TM
- : Slightly slower (minutes) if we have a TM, as file translation is a checking/retrieval exercise; QC difficult (cannot use XBench or QA Distiller)

How do you manage a translation project?

How it works:
- During pre-analysis, export unknown segments (segments under XX percent match) into a bilingual file
- Translate the bilingual file with MT
- Import the MT-translated bilingual file into the TM
- Select a penalty for MT
- We can choose TM or MT (like a “bad TM”)

+ : Fast post-editing
- : A few more steps to do when preparing the project
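The steps above can be sketched as a small pipeline. Everything here is illustrative: the slide leaves the match threshold unspecified (“XX percent”), so the values of `threshold` and `penalty` below are placeholders, and `mt_engine` stands in for whatever MT system is called; it is not a real API.

```python
def export_unknown(analysis, threshold):
    """Step 1: collect segments whose best TM match is under the threshold."""
    return [segment for segment, score in analysis.items() if score < threshold]

def machine_translate(segments, mt_engine):
    """Step 2: fill the bilingual file (source -> target) with MT output."""
    return {segment: mt_engine(segment) for segment in segments}

def import_with_penalty(memory, bilingual, penalty):
    """Steps 3-4: add MT output to the TM, scored below a true 100% match."""
    for source, target in bilingual.items():
        memory[source] = {"target": target, "origin": "MT", "score": 100 - penalty}
    return memory

# Hypothetical pre-analysis report: segment -> best TM match percentage.
analysis = {"Segment A": 100, "Segment B": 82, "Segment C": 40}

def fake_mt(text):
    """Placeholder for a call to an MT engine."""
    return "[MT] " + text

unknown = export_unknown(analysis, threshold=75)            # placeholder threshold
bilingual = machine_translate(unknown, fake_mt)
memory = import_with_penalty({}, bilingual, penalty=15)     # placeholder penalty
print(memory)
```

Because of the penalty, the MT entry never outranks a genuine 100% match, which is what lets the translator treat it like a “bad TM” suggestion inside the CAT tool.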

How do you manage a translation project?

How it works:
- […]
- Import the MT-translated bilingual file into the TM
- Select a penalty for MT
- We can choose TM or MT suggestion

+ : Fast post-editing
- : If internet-based, latency depends on the Internet

How do you manage a translation project?

Automated engine training

Statistics and data are paramount for PMs

The post-editor’s dilemma

CLIENT SEES

NEEDS

OMG!!!! BUT I AM A PROFESSIONAL

EXERCISE

Work in pairs, one is PM, one is “expert translator”:

PM tries to convince the translator to accept a lower rate for a large volume of work (say 5000 words) to be done in one day as post-editor of an already pre-translated text.

PM sees the offer on a “day salary basis”, offering a likelihood of the text being OK and spending half of the time the translator would spend, so the translator earns the same in terms of time.

TRANSLATOR says MT still requires reading, understanding, correcting, and sometimes needs to be done from scratch.

Use your own experience with online tools

SHARE THE RESULTS WITH THE TEAM / WHO WINS? / WHAT LESSONS HAVE YOU LEARNT?
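Both sides of the exercise can be backed with simple arithmetic. The sketch below is only a starting point for the discussion: the per-word rate and the share of text that must be redone from scratch are hypothetical, while the words-per-day figures come from the productivity numbers quoted earlier in the deck.

```python
def break_even_rate(translation_rate, translation_words_per_day, postediting_words_per_day):
    """PM's argument: the per-word PE rate that keeps the daily income unchanged."""
    day_income = translation_rate * translation_words_per_day
    return day_income / postediting_words_per_day

def days_needed(volume, postediting_words_per_day, translation_words_per_day, share_from_scratch):
    """Translator's argument: time needed if part of the MT output is unusable."""
    postedited = (1 - share_from_scratch) * volume / postediting_words_per_day
    retranslated = share_from_scratch * volume / translation_words_per_day
    return postedited + retranslated

# Hypothetical rate of 0.08 per word; 2500 and 5000 words/day from the slides.
pe_rate = break_even_rate(0.08, 2500, 5000)
print(f"Break-even PE rate: {pe_rate:.3f} per word")

# If 20% of the 5000 words has to be translated from scratch (assumed share):
print(f"Days needed: {days_needed(5000, 5000, 2500, 0.20):.2f}")
```

At twice the daily throughput the break-even rate is half the translation rate, which is the PM's case; as soon as some share of the output has to be redone from scratch, the job no longer fits in one day, which is the translator's case.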

PangeaMT

Thanks!!

Questions?