17 December 2009
Introduction to translation technologies
I have just finished another round of my "Introduction to translation technologies" course at Lessius Hogeschool in Antwerp. Here's a subset of the slides that I used...
15 October 2009
AutoSuggest: translating in a higher gear
SDL has just released Service Pack 1 for SDL Trados Studio 2009. The update will be welcomed with a sigh of relief by Studio's early adopters, who suffered from some stability issues. I've been testing the service pack's beta, and can only confirm that these issues have now been fixed.
After playing (and, er, working) with Studio for a few months, I can barely conceal my enthusiasm for the innovative AutoSuggest technology. Hmmm, "AutoSuggest" - sounds more like some esoteric method to quit smoking, doesn't it? Anyway, this feature turns out to be a tremendous productivity enhancer for the translator. In fact, this may be the single most useful CAT feature since the invention of fuzzy matching.
So how does this work? Well, the simplicity is almost too good to be true. While you translate in Studio's editor, tooltip-style suggestions will appear as soon as you type the first character(s) of a word. If the suggested term or phrase is appropriate, you simply press the Enter key to insert it. If you don't like the suggestion, you ignore it and keep typing. No buttons to click, no Alt+Ctrl+whatever. Just Enter.
The best part about AutoSuggest is its intelligence: you won't get hundreds of suggestions, but only those that make sense within the context of the current sentence. This distinguishes Studio's technology from the predictive "dictionary" option for texting on mobile phones. Just have a look at the examples below.
Figure 1: In this sentence, as soon as I type "r", Trados Studio suggests réchauffement planétaire as a French translation for the source term "global warming".
Figure 2: In the next sentence, when I type "r" once more, a different suggestion appears. Because of the new source-sentence context, Trados Studio now suggests réduction des émissions (emission reduction).
The secret behind this intelligence is my so-called "AutoSuggest dictionary". Studio allows me to generate such a dictionary from any translation memory or TMX file, provided that it contains at least 25,000 translation units (like the EU's Acquis Communautaire in TMX format). This approach ensures that AutoSuggest will only propose translations that are relevant for my subject field. Additionally, termbase terms and AutoText entries can be inserted in the same way (type the first characters and press Enter).
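To make the idea of context-sensitive suggestions a bit more tangible, here is a minimal sketch in Python. It is not how Studio works internally (SDL has not published that); it only illustrates the principle of filtering suggestions by both the typed prefix and the current source sentence. The tiny dictionary and the function name `suggest` are my own assumptions, with entries taken from the two figures above.

```python
from typing import Dict, List

# Hypothetical miniature "AutoSuggest dictionary": source phrase -> target phrase.
# The two entries mirror the examples in Figures 1 and 2.
PHRASE_PAIRS: Dict[str, str] = {
    "global warming": "réchauffement planétaire",
    "emission reduction": "réduction des émissions",
}


def suggest(source_sentence: str, typed_prefix: str) -> List[str]:
    """Return target phrases that start with the typed prefix and whose
    source phrase occurs in the current source sentence."""
    source = source_sentence.lower()
    prefix = typed_prefix.lower()
    if not prefix:
        return []
    return [
        target
        for src, target in PHRASE_PAIRS.items()
        if src in source and target.lower().startswith(prefix)
    ]


# Same keystroke, different source sentence, different suggestion.
print(suggest("Global warming is accelerating.", "r"))
print(suggest("We need a plan for emission reduction.", "r"))
```

Both French phrases start with "r", yet each call returns only the phrase whose source term appears in the sentence being translated. A predictive texting dictionary has only the prefix to work with, which is why it cannot make this distinction.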
AutoSuggest is simply the cleverest implementation of sub-segment matching I've seen so far. The translator stays in full control, but benefits from a true time saver and quality enhancer. Want to try this yourself? You can find freely downloadable AutoSuggest dictionaries on TranslationZone and Xenotext.
9 July 2009
Tips for translating PDF files
In almost every Trados training session I give, a trainee will at some point ask me "And what about PDFs?" Indeed, it's fine to learn how to translate word processor documents or web pages with Trados or other CAT tools, but most translators receive some (or all) of their source documents in PDF format.
Usually this question leads to an animated discussion. Some trainees say they use third-party tools to convert PDF files into Word documents. "It works perfectly", some say. "I tried it and it's useless - the layout becomes a mess", others argue.
The new SDL Trados Studio 2009 offers support for PDF files, but I have some mixed feelings about that. It creates the (false) impression that PDF is a format like any other. In reality, a PDF is, well, like a box of chocolates. You never know what you're gonna get.
Here are a few tips that may be helpful when you need to translate PDF documents with CAT tools.
1. Ask for the original files
Let's say that I have written a manual in Adobe FrameMaker, and I want to distribute it on my website. Only people who have FrameMaker on their computer would be able to open that manual. So in order to make this document accessible, I decide to publish it in the Portable Document Format (PDF). Now it can be opened on most computers, even across different platforms (Windows, Mac, Linux, ...). That's what PDFs are all about.
But while the distribution of PDFs is easy, extraction of translatable content from a PDF document is a much bigger challenge. So as a translator, always try to convince your customer to send you the original files as well. Professional CAT tools are much better at processing the underlying formats, such as Word, PowerPoint, FrameMaker, InDesign etc. This is by far the best workflow.
Sometimes, your customer may not have the files in the original format, though. In that case, continue reading the tips below.
2. Choose a reliable PDF converter
Acrobat Reader and Foxit Reader are free tools that enable you to open a PDF document. You can even save the content as a text file. But the saved text will have a hard return at the end of each line, which will cause incorrect segmentation in your translation editor. So you'll need a more sophisticated solution instead.
SDL Trados Studio 2009 and some other CAT tools include a third-party PDF converter. If your CAT tool doesn't support PDF, try ReadIris, Nuance PDF Converter, Solid Converter or Abbyy Transformer, or a free online service like PDFtoWord.com.
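If you are stuck with a plain-text export anyway, the hard returns can sometimes be repaired with a few lines of script. The Python sketch below is only a rough illustration, and it rests on one big assumption: that real paragraph breaks show up as blank lines in the exported text. That is not always the case, so check the result before feeding it to your CAT tool. The function name `unwrap_hard_returns` is my own.

```python
import re


def unwrap_hard_returns(text: str) -> str:
    """Join lines that were split by a hard return at the end of each line.

    Assumption: a blank line marks a genuine paragraph break. Words that
    were hyphenated across a line break are NOT rejoined, because the
    hyphen may be genuine.
    """
    # Split into paragraphs on one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    unwrapped = []
    for paragraph in paragraphs:
        # Replace each remaining line break by a single space.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        unwrapped.append(" ".join(lines))
    return "\n\n".join(unwrapped)


sample = "This sentence was\nbroken across\nthree lines.\n\nSecond paragraph\nhere."
print(unwrap_hard_returns(sample))
```

With the line breaks gone, the segmentation rules of the translation editor can again split the text at sentence endings instead of at every line ending.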
3. Manage your customer's expectations
Inform your customer about the challenges of converting PDFs. For instance, it may be possible to extract the text, but the original layout may be (partly) lost, especially when the document consists of multiple columns or text boxes. If the customer expects to receive the translated document with an identical layout, extra work may be needed. Is your customer prepared to pay extra for this?
4. Ask for a sample file before accepting the job
Suppose I had a paper document, took a picture of it, and pasted that picture in an empty Word document. Would that qualify as the Word version of my document?
Your customer's "PDF document" may have been created in a similar way. Imagine a hard-to-read fax, printed on thermal paper, that was scanned as a picture with a flatbed scanner and then saved as PDF. You may even see coffee stains or other dirt on the document. Technically speaking, it's a PDF file, but from a translation automation point of view, it's about as useless as a handwritten document. It will be virtually impossible to extract any text from such PDFs, so you may have to retype the source document before you can even start translating it.
5. Test the conversion - again and again
Even the best PDF converter may not succeed in extracting all text properly. Your solution may for instance work fine for English or other Western languages... but can it handle Russian, Korean or Amharic?
I tried converting a mixed English and Slovak PDF with Zamzar, and all characters with Slovak diacritics were corrupted. If you want to know whether it's your converter or the PDF itself that's causing the problem, simply try another conversion solution. PDFtoWord.com converted my Slovak text without problems.
But my next PDF may have been generated from a legacy DTP format on an old Mac, with text saved as pictures, and with strict security settings. You never know what you're gonna get...
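A quick way to compare two converters on the same file is to count suspicious characters in the converted text. The Python sketch below counts Unicode replacement characters (U+FFFD, which often appears where a converter could not decode a character) and Slovak diacritics. It is a crude check under my own assumptions: a corrupted file may show other symptoms, and the function name `check_conversion` is purely illustrative.

```python
import unicodedata

# Letters with diacritics used in Slovak, normalized to precomposed form.
SLOVAK_DIACRITICS = set(
    unicodedata.normalize("NFC", "áäčďéíĺľňóôŕšťúýžÁÄČĎÉÍĹĽŇÓÔŔŠŤÚÝŽ")
)
REPLACEMENT_CHAR = "\ufffd"


def check_conversion(text: str) -> dict:
    """Count replacement characters and Slovak diacritics in converted text."""
    normalized = unicodedata.normalize("NFC", text)
    return {
        "replacement_chars": normalized.count(REPLACEMENT_CHAR),
        "slovak_diacritics": sum(1 for ch in normalized if ch in SLOVAK_DIACRITICS),
    }


print(check_conversion("Ďakujem veľmi pekne"))
print(check_conversion("\ufffdakujem ve\ufffdmi pekne"))
```

If a Slovak text comes back with zero diacritics or with a handful of replacement characters, that is a strong hint to try another converter before you start translating.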
Labels: PDF, SDL Trados Studio, Trados, translation, translation technology
29 May 2009
Exit Translator's Workbench, enter SDL Trados Studio
Today SDL Trados Technologies presented its new flagship product SDL Trados Studio 2009 in Antwerp. The imminent release of this new CAT solution marks the end of an era. Fifteen years after the launch of TRADOS Translator's Workbench, the market-leading translation memory tool is about to be replaced by a radically new productivity enhancement solution for translators and project managers. Will SDL succeed in continuing the success story?
After SDL acquired Trados in 2005, it was obvious that the SDLX and Trados translation solutions would not continue to exist side by side. Still, it has taken SDL four years to combine the best of both products and to add innovative ideas to create a new, integrated environment - no longer a "workbench" (with my two left hands, I've always found the DIY connotations of this term rather frightening), but a more sophisticated "studio".
In the classic Trados universe, Microsoft Word was the preferred translation environment. Yet, this approach had many weaknesses, such as the fragility of the Trados segment delimiters (or "purple thingies") and the frequent occurrence of unintentional formatting changes. TagEditor, although a stable and technologically advanced alternative, seemed to lack visual appeal and usability, making it one of the most controversial tools in the translator community.
The new SDL Trados Studio 2009, as demonstrated today by SDL's Tracey Byrne, offers a very different approach. The new translation editor displays source and target segments in a spreadsheet-like column view, with an impressive real-time preview pane as a bonus. Tags are hidden whenever possible, but are still visible when this is needed for localization purposes. Formatting can be copied from source to target segment in a variety of ways, all of which seem faster and easier than the "get placeable" concept in older Trados versions. Predictive sub-segment suggestions may considerably speed up typing.
Reviewers and project managers will appreciate the more refined quality assurance checks, filtered views, status flags and tons of additional features. In fact, with so many options, commands, menus and buttons, the screen could easily become cluttered, but customizable views and auto-hiding panes enable the user to create a surprisingly clear working environment.
The importance of some innovative (and patented) features of SDL Trados Studio 2009 is highlighted by the use of fancy names like RevleX (the new XML-based translation memory format), QuickPlace (a feature to quickly copy formatting, tags, variables etc.) and AutoSuggest (suggested terms and phrases to speed up typing).
In the coming weeks, I'll discuss some of these features in more detail. For now, one thing is certain: we're about to witness some interesting times. How eager will the traditional Trados users be to adopt this new platform? Will they feel disoriented and become nostalgic about the good old purple thingies and bizarre command names like Set/Close Next no 100% Open/Get? Or will SDL Trados Studio 2009 represent the quantum leap that wipes out these old memories for good?
12 May 2009
Choosing the right translation tools
When translation agencies or freelance translators are trying to decide which translation memory tools they should buy, they are easily misled. Marketing messages like "boost your productivity" or "maximum efficiency" may sound tempting, but usually conceal a blind spot in the tool vendors' claims.
A translation service provider's first concern should be compatibility. You need to be compatible with your client's formats, tools and workflow. If most of your clients produce documents with Microsoft Office, then by all means make sure that you have access to one or more Office licences, preferably in all popular flavours (2007, 2003, XP).
Likewise, if 80 percent of your (potential) clients use Trados, then investing in the same Trados version should be your number one priority. If other CAT tools like Star Transit, MemoQ or Across are leading in your market niche, then don't hesitate to choose the best match.
Most translation tools claim compatibility with one another, usually through standards such as TMX or XLIFF. But in my experience, the average translation project is already complex enough, even if the same version of the same tool is used by all parties involved in the process. Any additional conversion steps from one tool or format to another may cause unnecessary trouble, delays, or data corruption.
If you're a service provider, the inherent qualities or price of a tool only represent secondary reasons to adopt a solution. Whether or not you can seamlessly step into your client's process, that's the key to success.
Moreover, in an ever-changing market, it may be wise not to put all your eggs in one basket. Invest in multiple translation memory systems if that is what is needed to cover your customer base. It will be money well spent.
23 March 2009
Low-cost terminology management
Every translator, technical writer, international organization or global company sooner or later has a need for efficient terminology management. The use of the wrong term in the wrong place may cause anything from a small nuisance or misunderstanding to multi-million dollar lawsuits, diplomatic incidents, or serious - if not fatal - accidents.
Professional terminology management systems are expensive, but the first step towards an efficient solution is collecting the content. This doesn't necessarily require high-tech applications. Common office tools like Microsoft Excel or its free open-source equivalent Calc, part of the OpenOffice.org suite, are perfect for the job.
I've recently created a sample spreadsheet termbase in Google Docs. It's free and it's web-based, so it allows me to easily share my dictionary. Moreover, this is a perfect input file for a conversion to SDL MultiTerm.
Although spreadsheets are primarily intended for numbers, formulas, charts and financial data, they include a few features that can be very convenient for glossaries: you can easily move columns around, insert new columns or hide existing ones, sort your data alphabetically, export the content into tab-delimited exchange files etc.
In spreadsheet format, you just need to respect the basic principles of database design. One such principle is that each column represents a field (English, French, Definition, Gender, ...), and each row represents a concept.
Furthermore, it is not a good idea to mix different types of information within the same field. If, for instance, you enter both a term and its gender in the same field, it becomes hard to retrieve the term without the extra information. This is a critical issue once you want to migrate your data to a more sophisticated termbase solution.
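The difference between clean and mixed fields is easy to show with a tab-delimited export. The Python sketch below uses a made-up three-column glossary (English, French, Gender) and a hypothetical helper, `split_term_and_gender`, that tries to untangle entries like "réduction des émissions (f)". The column names and the "(m)/(f)/(n)" notation are assumptions for the sake of illustration; your own glossary may look different.

```python
import csv
import io
import re

# Hypothetical tab-delimited export: one concept per row, one field per column.
CLEAN_EXPORT = (
    "English\tFrench\tGender\n"
    "global warming\tréchauffement planétaire\tm\n"
    "emission reduction\tréduction des émissions\tf\n"
)


def load_termbase(tsv_text: str) -> list:
    """Read a tab-delimited glossary into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


# Pattern for a mixed field such as "réduction des émissions (f)".
MIXED_FIELD = re.compile(r"^(?P<term>.+?)\s*\((?P<gender>[mfn])\)$")


def split_term_and_gender(value: str):
    """Separate a term from a trailing gender marker, if there is one."""
    match = MIXED_FIELD.match(value.strip())
    if match:
        return match.group("term"), match.group("gender")
    return value.strip(), None


for row in load_termbase(CLEAN_EXPORT):
    print(row["French"], row["Gender"])

print(split_term_and_gender("réduction des émissions (f)"))
```

With separate columns, reading the term is a simple lookup. With a mixed field, you need a pattern that covers every notation ever typed into that column, and any entry that deviates from it slips through unnoticed.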
As your termbase grows, you will at some point have a need for more power and flexibility. Once you require advanced search and filtering functions, integration in translation memory tools, collaborative workflows and dynamic publishing solutions, professional terminology management systems like MultiTerm are the next step forward.
4 March 2009
Cutting costs: fewer words... or more?
In the current economic crisis, language service providers are often focusing their marketing message on the cost-saving effects of their services or expertise. VistaTEC's 15 Top Tips explain how optimized source content will indeed result in significantly lower translation costs.
Since translation rates are often based on the number of words, one simple but effective tip is: reduce the word count.
For instance, don't write
In this dialog box, click the Save button in order to save your changes.
if you can replace it with
Click Save to save your changes.
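The saving is easy to put a number on. The Python sketch below counts whitespace-separated words in both versions of the sentence and multiplies the difference by a per-word rate. The rate of 0.10 is a purely hypothetical figure, and real CAT tools count words with more refined rules than a simple split on spaces.

```python
def word_count(sentence: str) -> int:
    """Count whitespace-separated words (a simplification of CAT tool counts)."""
    return len(sentence.split())


long_version = "In this dialog box, click the Save button in order to save your changes."
short_version = "Click Save to save your changes."

RATE_PER_WORD = 0.10  # hypothetical rate per source word

saved_words = word_count(long_version) - word_count(short_version)
print(word_count(long_version), word_count(short_version), saved_words)
print(f"Saved per target language: {saved_words * RATE_PER_WORD:.2f}")
```

The long version has 14 words and the short one 6, so this single sentence saves 8 words for every language it is translated into.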
However, a major software company has recently modified its user documentation in a way that actually results in a higher number of words.
Old style:
If you select text formatted with the properties you want to use...
New style:
If you select text that is formatted with the properties that you want to use...
Does the new style result in higher translation costs? Probably not in the long run. By applying stricter style guides (more consistent and explicit style, less ambiguity), translation memories and machine translation will produce better pretranslations, and the costs of post-editing by human translators will go down.
For more information about controlled language and machine translation, visit Uwe Muegge's website.