How are 'Words' and 'Characters' counted for Asian (Chinese) languages?

This is a very simple and very important question, but I can't find a clear answer anywhere.

In my project, 'Use word-based tokenization for Asian languages' is NOT ticked. It would be good to know how this affects the word count. How are Asian words 'tokenised', and what does that mean?

My language is Chinese. I don't see how 'words' are defined here. 

Thanks

  • Hello,

    The following Gateway article explains this well (since Studio 2017):

    SDL Trados Studio Application

    For users to get more complete information when translating from Asian languages in the light of the new Asian tokenization option, there is now a Word column in the Analysis report for Asian source languages:

    >if the character-based tokenization (active by default) is used, the word column reports a single Asian-language character as one word and a Western-language word as one word.

    >if the new word-based tokenization is used, the word column reports Asian-language words as words identified by the new tokenization engine and Western-language words also as one word. This typically results in a lower word count.
