What are recognized tokens aka placeables in SDL Trados Studio?

Today, I want to share some info (which can also be found in the SDL Trados Studio Online Help) about recognized tokens, also known as placeables.


When I started to use SDL Trados 2007, my reaction was something like "Placeables... Tokens... What???".

It took me some time at the beginning to understand them, but over the past 8 years or so I managed to get a grip on the tokens :) and the Online Help, as it is today, explains everything really well.

Therefore, I have gathered the information from the Online Help regarding recognized tokens in this blog post, and I hope that veterans, newbies and everyone in between can still find useful information.


Cheers, Richard

About Recognized Tokens


Recognized tokens are source document content that has been recognized as:

  • Content not requiring translation.
  • Content which can be automatically localized by applying a translation memory. For example, some dates can automatically be converted to the correct format by applying a translation memory.

Markup tags, placeholder tags, numbers, variables, dates, acronyms and alphanumeric strings are all examples of recognized tokens. They are identified in the Editor window by a blue square-bracket underline.

Note: The underline is only displayed when you have a translation memory open and the cursor is in the segment where the recognized token is located.

About Recognized Tokens


A recognized token is a short piece of text, enclosed in a segment, that a TM treats as a single word because it is a defined format. For example, if dates are enabled as a recognized token in a TM, the TM recognizes Monday 1 January, 1900 as one word. Tokens that can be recognized are:

  • Dates
  • Times
  • Numbers (in numerals)
  • Measurements
  • Acronyms and URLs
  • Alphanumeric strings
  • Variables
  • Inline tags

The TM settings determine which tokens are recognized. The TM scans the TU for recognized tokens only when the TU is added to the TM. If you later enable token recognition, the TM does not immediately analyze existing TUs for tokens: you need to re-index the TM. When you do so, the TM uses the current token settings and scans existing TUs for tokens.

Recognize dates

If you select this option, dates in source segments are automatically converted to the correct format to be placed in target segments. If the source segment in a translation memory matches a segment in the presented text in all content apart from the value of the date that occurs in the segment, the TM still registers a 100% match. The TM returns the translation and uses the value of the date from the presented text (rather than from the TM) to complete the translation. This is also the case for matching segments that contain numbers, variables, alphanumeric strings, times, measurements and acronyms.

 

In addition, the value of the date from the presented text will replace the value of the date in the target segment of the TM if the same date is used in both the target and source segment of the TM.
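To make the matching idea a bit more concrete, here is a minimal Python sketch. It is not how Studio is implemented; it only illustrates the principle of ignoring the date value when comparing segments and then carrying the new date over into the translation. The names `DATE`, `skeleton` and `lookup` are made up for this example, the TM is just a list of (source, target) pairs, and only ISO-style dates (YYYY-MM-DD) are treated as dates. No auto-localization of the date format is attempted.

```python
import re

# Assumption: only ISO-style dates (YYYY-MM-DD) count as date tokens here.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def skeleton(segment):
    """Replace every date with a fixed placeholder so date values are ignored."""
    return DATE.sub("{DATE}", segment)


def lookup(tm, new_source):
    """Return a translation for new_source, or None if no TU matches.

    tm is a list of (source, target) pairs standing in for translation units.
    """
    for tm_source, tm_target in tm:
        if skeleton(tm_source) != skeleton(new_source):
            continue
        target = tm_target
        # Swap each stored date for the date from the new text, but only
        # where the stored date literally appears in the target.
        for old, new in zip(DATE.findall(tm_source), DATE.findall(new_source)):
            target = target.replace(old, new)
        return target
    return None


tm = [("Delivery is due on 2020-01-15.", "Die Lieferung ist am 2020-01-15 fällig.")]
print(lookup(tm, "Delivery is due on 2021-03-02."))
```

The new segment differs from the stored one only in the date, so the sketch treats it as a full match and returns the stored translation with the new date inserted.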

 

If you have defined some auto-localization settings for dates, these will be applied to dates in target segments. You define auto-localization settings for a language pair in the Options dialog box under Language Pairs.

 

Recognize times

Select this to recognize a time of day occurring in a source segment and convert the time to the correct format in target segments.

Times can also be automatically converted to the correct target language format.

 

Recognize numbers

Select this to recognize a number occurring in a source segment and convert it to the correct format in target segments.

 

For example, suppose the following segment occurs in your English-German translation: Today, the DAX was down 11.98 points (= 0.55%) to 4,312.79. The numbers would appear in the German translation as: 11,98, 0,55%, and 4.312,79.
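The English-to-German conversion in this example boils down to swapping the thousands and decimal separators. Here is a small Python sketch of that idea. It is only an illustration, not Studio's own logic: the function name `localize_numbers_en_to_de` and the number pattern are assumptions made for this post, and the pattern only covers plain numerals with optional comma-grouped thousands and a decimal part.

```python
import re

# Assumption: an English-formatted number is digits, optional ",ddd" groups,
# and an optional decimal part introduced by a full stop.
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")

# Translation table that swaps "," and "." inside a matched number.
SWAP = str.maketrans({",": ".", ".": ","})


def localize_numbers_en_to_de(segment):
    """Rewrite every English-formatted number in the segment in German format."""
    return NUMBER.sub(lambda m: m.group(0).translate(SWAP), segment)


source = "Today, the DAX was down 11.98 points (= 0.55%) to 4,312.79."
print(localize_numbers_en_to_de(source))
```

Only the separators inside the numbers change. The comma after "Today" and the full stop at the end of the sentence are left alone because the pattern has to start and end on a digit.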

 

Recognize alphanumeric strings

Select this to recognize any alphanumeric strings found in the source segments. These are codes made up of combinations of:

  • Letters (lowercase and/or UPPERCASE) + numbers, optionally with underscores (NAME_4001_co) or full stops (BV0.mxm.072.531)
  • UPPERCASE letters + dashes + numbers, optionally with underscores (17620-ZY8_003)

Conditions

To be recognized as alphanumeric strings, the strings:

  • must not start or end with underscores, hyphens or full stops
  • must not contain dashes and full stops at the same time
  • must contain at least one number and one letter
  • must not contain lowercase characters and dashes at the same time
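The conditions above can be written down as a small check. The following Python sketch is my own reading of the rules, not Studio's internal implementation. The function name `looks_like_alphanumeric_token` is made up, and it assumes that only ASCII letters, digits, underscores, full stops and hyphens may appear in such a string.

```python
import re

# Assumption: only these characters may appear in an alphanumeric string.
ALLOWED = re.compile(r"[A-Za-z0-9_.\-]+")


def looks_like_alphanumeric_token(s):
    """Return True if s satisfies the four conditions listed above."""
    if not ALLOWED.fullmatch(s):
        return False
    # Must not start or end with an underscore, hyphen or full stop.
    if s[0] in "_-." or s[-1] in "_-.":
        return False
    # Must not contain dashes and full stops at the same time.
    if "-" in s and "." in s:
        return False
    # Must contain at least one number and one letter.
    if not (any(c.isdigit() for c in s) and any(c.isalpha() for c in s)):
        return False
    # Must not contain lowercase characters and dashes at the same time.
    if "-" in s and any(c.islower() for c in s):
        return False
    return True


for code in ["NAME_4001_co", "BV0.mxm.072.531", "17620-ZY8_003", "abc-123", "_ABC1"]:
    print(code, looks_like_alphanumeric_token(code))
```

The three example codes from the Online Help pass the check, while "abc-123" (lowercase together with a dash) and "_ABC1" (leading underscore) do not.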

Instead of trying to translate such strings, Studio automatically transfers the alphanumeric string from the source segment into the target segment. Then it completes the segment translation by inserting the rest of the text from the translation unit.

 

Server-based TMs

The option to Recognize alphanumeric strings is only compatible with GroupShare 2014 SP2 and later. This means that TMs available on previous GroupShare servers cannot recognize alphanumerics. For file-based translation memories, this option is always available, regardless of the Studio version in which the file-based TM was created.

 

Recognize acronyms

If you choose to recognize acronyms, they become available interactively in the recognized tokens drop-down list in the Editor view. Acronyms are treated literally (like words) but are still recognized text; this enables you to select them from the recognized tokens drop-down list and insert the acronym quickly into the target segment.

An internal regular expression in Studio identifies words as being acronyms when it detects:

    • a word in uppercase containing at least two letters, or

    • a word in uppercase containing at least one letter, followed by & and then by at least one more uppercase letter.

Items containing a full stop are not recognized as an acronym.

If acronym recognition is activated, URLs (hyperlinks) and IP addresses are also treated as recognized tokens.

 

  Note: If more than 66% of the letters in a segment are in uppercase and the segment contains at least two words, Studio considers that there are no acronyms in the text and treats the words literally. For example, Studio will detect two acronyms in "ABCD abcd ABCE", as 8 of its 12 letters (about 66%) are in uppercase, and none in "ABCD abcd ABCEF", as 9 of its 13 letters (about 69%) are in uppercase.
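Here is a rough Python sketch of the acronym rules and the uppercase note. Studio's real regular expression is internal and not published in the Online Help, so the pattern below is only my approximation of the description. The name `find_acronyms` is made up, words are simply split on whitespace, and the 66% cut-off is read as "more than two thirds", which is an assumption that happens to agree with both examples in the note.

```python
import re

# Approximation of the described rule: two or more uppercase letters, or
# uppercase letters joined by "&". Words with a full stop never match.
ACRONYM = re.compile(r"[A-Z]{2,}|[A-Z]+&[A-Z]+")


def find_acronyms(segment):
    """Return the words in the segment that would count as acronyms."""
    words = segment.split()
    letters = [c for c in segment if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    # Assumed threshold: more than two thirds of the letters are uppercase
    # and the segment has at least two words, so nothing is an acronym.
    if len(words) >= 2 and 3 * upper > 2 * len(letters):
        return []
    return [w for w in words if ACRONYM.fullmatch(w)]


print(find_acronyms("ABCD abcd ABCE"))
print(find_acronyms("ABCD abcd ABCEF"))
print(find_acronyms("R&D budget"))
```

The comparison uses whole numbers (`3 * upper > 2 * len(letters)`) rather than a floating-point percentage, so the borderline case of exactly two thirds is handled without rounding surprises.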

 

Recognize variables

Select this to recognize variables. Variables are items which should not be translated. You create a list of variables in the Language Resources. This list appears in the New Translation Memory wizard, the New Server-based Translation Memory wizard, and also in language resource templates.

 

Variables are recognized tokens, and are copied verbatim into the target segment.

 

Recognize measurements

If you choose to recognize numbers (this includes numbers which are part of a measurement), SDL Trados Studio will place the number in the target segment and format it correctly for the target language.

 
Word Count Settings: you can control how the word count engine reports words that are separated by hyphens, dashes or formatting tags. These settings are disabled when creating a new server-based translation memory.
 
Count as one if words:

Are hyphenated

Select this to make words that contain hyphens count as a single word. For example, with this enabled, "two-wheeled vehicles" counts as two words; with this disabled, it counts as three words. By default, this is enabled.

 

Are joined by dashes

Select this to make words that are separated by dashes count as a single word. For example, with this enabled, "two—wheeled vehicles" counts as two words, with this disabled, it counts as three words. By default, this is enabled.
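The effect of these two settings on the count can be illustrated with a few lines of Python. This is a simplified sketch, not the actual word count engine: the function `count_words` and its two parameters are invented for this example, words are split on whitespace only, and "dashes" is assumed to mean en dashes and em dashes.

```python
import re


def count_words(text, hyphenated_as_one=True, dashed_as_one=True):
    """Count whitespace-separated words, optionally splitting at hyphens or dashes."""
    if not hyphenated_as_one:
        # Treat every hyphen as a word boundary.
        text = text.replace("-", " ")
    if not dashed_as_one:
        # Assumption: "dashes" means en dash (U+2013) and em dash (U+2014).
        text = re.sub("[\u2013\u2014]", " ", text)
    return len(text.split())


print(count_words("two-wheeled vehicles"))
print(count_words("two-wheeled vehicles", hyphenated_as_one=False))
print(count_words("two\u2014wheeled vehicles", dashed_as_one=False))
```

With the default settings the hyphenated example counts as two words. Switching the relevant option off gives three, which matches the description above.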

 

Contain formatting tags

Select this and the word count engine will not break words that contain formatting tags.

How recognized tokens are handled in translation

For some recognized tokens, the suggested target is the same as the source:

  • Inline tags
  • Acronyms
  • URLs
  • Alphanumeric strings
  • Variables

For the remaining recognized tokens, the translation can be automatically localized from the source. These tokens are:

  • Dates
  • Times
  • Numbers (in numerals)
  • Measurements

The project settings determine if the software does or does not auto-localize these tokens.