Gestión proyectos traducción en la Universitat Autònoma de Barcelona
-
Upload
manuel-herranz -
Category
Business
-
view
130 -
download
1
Transcript of Gestión proyectos traducción en la Universitat Autònoma de Barcelona
Pangeanic PangeaMT �������������
Gestión de proyectos de
Traducción Automática
������������� �������� ������������ ������������ ���������
Why machine Translation?
������!�����#�����������������#����%������)���������������� �����������������$�����������������������������������&���� ��������������
�(����������� ������������������������� )��*��&���#�����%������' �������������$�"�����������&��������$�"���������������%�#���-+,.��
� ������.,-/)����� ����� ���� �������� �����������$����������0���"��&����*-�'�"��&���2�-,.-+�
� ������ ������1,��&�.,.,�
Why machine Translation?
������������������� ��������������������������� ���������� �������������������������� ������
����������
MT Usage �������������� ��������� ���������������������������� ��
� MT for assimilation: “gisting” or “understanding“
� ����� ������
����� ����
��� ����������
• Practically unlimited demand; but free web-based services reduce incentive to improve technology
• Coverage + important. Instant quality� MT for dissemination: “publication“
� MT for direct communication
��� ����������
� ����� ������
����� ����
• Publishable quality that can only be achieved by humans. MT & tools a productivity booster
��� ����������
��� ����������
� ����� ������
����� ����• Current R&D, Military uses systems
for spoken MT, first applications for smartphones
Machine translation
Language Model (Fluency) Translation Model (Adequacy)
Monolingual
Corpus
Phrase Table
Parallel Corpus
nGram- Model
Alignment, Phrase
Extraction
Counting, Smoothing
Decoder (translator)
Source Text
Target Text
N-bestLists
Basic Architecture for Statistical MT
MT@DGT
A selection of 23 out of 3897 ways to translate operations from EN to FR operations ||| action ||| (0) ||| (0) ||| 0.00338724 0.0017156 0.00316685 0.0034059 2.718 operations ||| actions ||| (0) ||| (0) ||| 0.0575431 0.0534003 0.0731052 0.0958526 2.718 operations ||| activité ||| (0) ||| (0) ||| 0.0102038 0.0079204 0.00744879 0.0084917 2.718 operations ||| activités ||| (0) ||| (0) ||| 0.019962 0.0194538 0.0366753 0.0451576 2.718 operations ||| des actions ||| (0,1) ||| (0) (0) ||| 0.0304499 0.0269505 0.00973472 0.00438066 2.718 operations ||| des activités ||| (0,1) ||| (0) (0) ||| 0.00877089 0.00997725 0.00246435 0.00206379 2.718 operations ||| des opérations ||| (0,1) ||| (0) (0) ||| 0.294821 0.281318 0.0406896 0.0238681 2.718 operations ||| exploitation ||| (0) ||| (0) ||| 0.0437821 0.0365346 0.0208856 0.029298 2.718 operations ||| fonctionnement ||| (0) ||| (0) ||| 0.0141471 0.01165 0.00919948 0.0099513 2.718 operations ||| gestion ||| (0) ||| (0) ||| 0.00141338 0.0013098 0.00286578 0.0032669 2.718 operations ||| intervention ||| (0) ||| (0) ||| 0.00561479 0.0026006 0.00110394 0.0013554 2.718 operations ||| interventions ||| (0) ||| (0) ||| 0.0830237 0.0778631 0.0102142 0.0149096 2.718 operations ||| les actions ||| (0,1) ||| (0) (0) ||| 0.0339458 0.0271478 0.00931099 0.00712787 2.718 operations ||| les activités ||| (0,1) ||| (0) (0) ||| 0.00915348 0.0101746 0.00296613 0.00335805 2.718 operations ||| les interventions ||| (0,1) ||| (0) (0) ||| 0.0565693 0.0393793 0.00207406 0.00110872 2.718 operations ||| les opérations ||| (0,1) ||| (0) (0) ||| 0.413399 0.281515 0.0564235 0.0388363 2.718 operations ||| manipulations ||| (0) ||| (0) ||| 0.0985325 0.183951 0.00104818 0.0034523 2.718 operations ||| operations ||| (0) ||| (0) ||| 0.786026 0.557952 0.00200716 0.0023981 2.718 operations ||| opération ||| (0) ||| (0) ||| 0.0245776 0.021675 0.00785022 0.0085959 2.718 operations ||| opérationnel ||| (0) ||| (0) ||| 0.00656403 0.0069192 0.0012266 0.0013902 2.718 operations ||| opérations effectuées ||| (0,1) ||| (0) (0) ||| 0.110801 0.285316 0.00132696 0.00229301 2.718 operations ||| opérations ||| (0) ||| (0) ||| 0.636821 0.562135 0.409237 0.522254 2.718 operations ||| travaux ||| (0) ||| (0) ||| 0.00273044 0.0024213 0.00194025 0.0023517 2.718
Examples of Statistical MT
Unrest is continuing in Cairo as protesters set up their demand for Egypt’s military rulers to resign
How does Statistical MT work?
• SMT can be seen as a generalisation of Translation Memory to sub-segmental level
• The phrases are text snippets (n-grams) taken from real-world translations (= as good as the source you entered)
• Re-combination of those phrases for out-of-context translation may lead to significant problems : – Alignment errors � spurious/lost meaning – Bad morphology (declensions) – Grammatical errors – Wrong disambiguation
• SMT will not recover implicit information from source text nor handle structural mismatches, it needs more info.
} ������������������� � ������������� ��������������� �������������� ��������
SMT from a translator’s perspective?
Strengths and Weaknesses of pure SMT(RBMT:translate pro � SMT:Koehn 2005, examples from EuroParl) EN: I wish the negotiators continued success with their work in this important area. RBMT: Ich wünsche, dass die Unterhändler Erfolg mit ihrer Arbeit in diesem wichtigen Bereich fortsetzten.
continued: Verb instead of adjective SMT: Ich wünsche der Verhandlungsführer fortgesetzte Erfolg bei ihrer Arbeit in diesem wichtigen Bereich.
three wrong inflectional endings
This has consequences in the fair PE payment model
Strengths and Weaknesses of pure SMT
In the early 90s and early 00’s, strong support and advocacy for either of the two technologies: SMT vs RBMT.
But we have learn to combine the best of both worlds. Google ditched Systran (Och, 2005), went all statistical and now SMT plus linguistic models. Microsoft (Chris Wendt) also uses 2 technologies
Examples of Hybrid MT architectures = SMT Module = RBMT Module
The search for integrated (hybrid) methods is now seen as natural extension of both approaches
How does PangeaMT work? It can bring together syntactically distant languages from different families
This has consequences in the fair PE payment model
SYNTAX (LINGUISTIC)-BASED HYBRID SMT
When available, the company plans to offer the following:
available When , the company the following : plans to offer :
����������������������� ������ ����
(VBPt3) (to) (VBinf) (DET) (NN)
(Predicate) Nipponization module
Translation & Cleaning
(Subject)
(VBPt) (to)
(ADV) (ADJ) (Punct)
(DET) (NNSing)
(Cond clause),
How does PangeaMT work?
How do you manage a translation project? Typical order workflow at an LSP
http://pangeanic.es/norma-calidad-en15038-gestion-de-ofertas-pedidos-y-contratos/
How do you manage a translation project? EN15038 - Typical translation job project management at a translation company
How do you manage a translation project?
��� �������������������������
• Main tool for language professionals • TMs-based retrieval • Shows similar (hight percent match)
translations stored in the TM for each segment of translation
• 100% matches: perfect, identical segments from the TM
• Fuzzy matches: there are no 100% matches, but there are similar segments. Generally, matches below 70% require full human “PE”.
• Mean productivity: 2000-3000 words/day. E.g.: • Proprietary software: SDL Trados,
MemoQ, Deja Vu, Memsource • Semi-free: WordFast • Opensource software: OmegaT
Advantages: � Higher productivity in professional
translation � Very popular, low-entry threshold, not
so expensive � Friendly GUI � Easy Quality Check (QC) � Latest versions: can integrate MT as
suggestion Disadvantages: � Translation is not scalable = production
is directly related to available translators/hour/day
� If there isn't any TM available, there is no productivity improvement
How do you manage a translation project?
��������� ��������������������
Advantages:� Translation becomes scalable, millions
of words can be translated per day� More productivity in non-professional
translations
Disadvantages:� A lot of data is needed to create a
engine (SMT)� A lot of time is needed to create an
engine (RBMT)� MT alone is not useful for professional
translation, only for gisting
• Production enhancement tools • Full, complex systems • Multilanguage • Full translation • “Gisting” • Translate content which otherwise
would be ignored, not accessible or not understood.
How do you manage a translation project?
�������������• Already available in the 90s between
Trados and Systran• The CAT tool shows similar translations
from TM and a suggestion or suggestions from MT, and the user chooses what to postedit. E.g.:
translate segments with matches above 70-%75% with TMtranslate segments with matches below 70-75% with MT
• If custom-built, very good results, but professional translators are reluctant to become post-editors (price pressures)
• Average productivity: +5000 words/day
Advantages:� Postediting is faster than translating
from scratch� The future of the translation industry?� Enable faster, HQ translations� Fast postediting if the MT output
reaches certain Q levels
Disadvantages:� If the MT output is bad, better to
translate from scratch
How do you manage a translation project?
��������� ��������������
+: Direct- : PE slow (can't use CAT tools) QC difficult (cannot use XBench or QA Distiller)
How do you manage a translation project?
������� �������������������
+: Fast if we have no TM QC difficult (cannot use XBench or QA Distiller)- : Slightly slower (minutes) if we have TM as file translation is a checking/retrieval exercise
How do you manage a translation project?
����������� ����
How it works:- During pre-analysis, export unknown segment (segments under XX percent match) into bilingual file- Translate bilingual file with MT- Import MT translated bilingual le into the TM- Select a penalty for MT- We can choose TM or MT (like a “bad TM”)
+ :Fast postediting- :A few steps more to do preparing the project
How do you manage a translation project?
���������������������
How it works:- ���������������������������� ����������������������������������������������������- Import MT translated bilingual le into the TM- Select a penalty for MT- We can choose TM or MT suggestion
+ :Fast post-editing- :If internet-based, latency depends on the Internet
������������ ���������������������� ��
������������������������������� ������������������������������� �� �������������� ���������� ��� ���
�������� ����������������������������
����������������� ����������������������� ���������
�������� ��������������� �� ������ ����
��������������� �!����������� � ����
������������������ ������������� �����$ �������
����� ����� ��#� ��"���� � ��
The post-editor’s dilemma �� ������ �� ������� ���� ���� ���
��� ���� � ������� ���� ����� ��
CLIENT SEES
NEEDS
OMG!!!! BUT I AM A PROFESSIONAL
EXERCISE
Work in pairs, one is PM, one is “expert translator”:
PM tries to convince the translator to accept a lower rate for large volume of work (say 5000 words) to be done in one day as post-editor of an already pre-translated text.
PM sees the aver on a “day salary basis”, offering a likelihood of the text being OK and spending half of the time the translator would spend, so the translator earns the same in terms of timeTRANSLATOR says MT still requires reading, understanding, correcting, sometimes needs to be done from scratch.
Use your own experience with online tools
SHARE THE RESULTS WIT THE TEAM / WHO WINS ? / WHAT LESSONS HAVE YOU LEARNT?