When an Asian language such as Thai, Japanese, or Chinese is the source language in SDL TMS or SDL Trados Studio, the word count is based on the number of characters: when Trados Studio filters are used for file processing, the system counts 'one character = one word'. These languages do not put spaces between words, so the traditional word count calculation, which relies on spaces to separate words, cannot be applied. The character count is therefore used as the word count for these languages.
For mixed content, that is, source content that contains both CJK (Chinese, Japanese, and Korean) characters and non-CJK characters, TMS iterates through the characters of a segment one by one when counting "words". Whitespace and some punctuation characters are treated as word breaks. When TMS encounters a CJK character, it counts each character individually; when it returns to non-CJK text, it switches back to counting "words".
Here are some examples of how TMS would count:
Segment | TMS "Word" Count |
---|---|
Hello world | 2 Words |
こんにちは世界 | 7 Words |
Hello 世界 | 3 Words |
Hello 世界 world | 4 Words |
This shows that TMS does not report characters as a separate unit: every counted item, whether a space-delimited token or a single CJK character, is reported as a word.
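The counting behaviour described above can be approximated with a short sketch. This is not the TMS implementation: the function names, the Unicode ranges treated as CJK, and the rule that any Unicode punctuation acts as a word break are all assumptions made for illustration.

```python
import unicodedata

# Assumed ranges; the exact set of characters TMS treats as CJK is not stated.
CJK_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def is_cjk(ch):
    """Return True if the character falls in one of the assumed CJK ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def is_word_break(ch):
    """Assumption: whitespace and any Unicode punctuation end a word."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def count_words(segment):
    """Count each CJK character as one word and each non-CJK token as one word."""
    count = 0
    in_word = False  # True while inside a run of non-CJK, non-break characters
    for ch in segment:
        if is_cjk(ch):
            count += 1       # every CJK character counts on its own
            in_word = False  # a following non-CJK character starts a new word
        elif is_word_break(ch):
            in_word = False
        elif not in_word:
            count += 1       # first character of a new non-CJK word
            in_word = True
    return count


# The four segments from the table above.
for segment in ("Hello world", "こんにちは世界", "Hello 世界", "Hello 世界 world"):
    print(segment, "->", count_words(segment), "Words")
```

Under these assumptions the sketch returns 2, 7, 3, and 4 for the four segments in the table. Results for other input, in particular text with punctuation or Thai characters, depend on rules that are not specified here and may differ from what TMS reports.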