SDL TMS: How is the word count calculated in TMS for Asian languages as source languages?

Article Number:000012974 | Last Updated:8/3/2022 11:49 PM

Scope/Environment

SDL TMS All versions

Question

How is the word count calculated in TMS for Asian languages as source languages?

Answer

For Asian source languages such as Thai, Japanese and Chinese, SDL TMS and SDL Trados Studio calculate the word count from the number of characters. That is, when Trados Studio filters are used for file processing, the system counts words as 'one character = one word'.

This is because these languages do not put spaces between words, so the traditional word count calculation, which relies on spaces to separate words, cannot be used. The character count is therefore used as the word count for these languages.
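As a small illustration of why splitting on spaces does not work here, the Python snippet below (not TMS code, just a sketch) splits a Japanese segment on whitespace and compares the result with its character count. The variable names are made up for this example.

```python
# Illustration only: whitespace splitting finds no word boundaries in Japanese text.
segment = "こんにちは世界"

# str.split() with no arguments splits on runs of whitespace.
space_delimited = segment.split()

print(len(space_delimited))  # 1: the whole segment looks like a single "word"
print(len(segment))          # 7: counting characters gives a usable figure
```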

For mixed content, that is, source content that contains both CJK (Chinese, Japanese, and Korean) characters and non-CJK characters, TMS iterates through the characters of a segment one by one when counting "words". Whitespace and some punctuation characters are treated as word breaks. When TMS encounters a CJK character, it counts each character individually; when it returns to non-CJK text, it switches back to counting "words".

Here are some examples of how TMS would count:

Segment	TMS "Word" Count
Hello world	2 Words
こんにちは世界	7 Words
Hello 世界	3 Words
Hello 世界 world	4 Words

This shows that TMS has no separate concept of characters: every counted unit, including each individual CJK character, is reported as a word.
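The counting rule described above can be approximated with a short Python sketch. This is not the TMS implementation. The function names `is_cjk` and `count_words` are hypothetical, the Unicode ranges used to detect CJK characters are an assumption, and only whitespace is treated as a word break, whereas TMS also breaks on some punctuation characters that this article does not list.

```python
def is_cjk(ch: str) -> bool:
    """Assumption: a character is CJK if it falls in one of these common blocks."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def count_words(segment: str) -> int:
    """Count each CJK character as one word and each whitespace-delimited
    run of non-CJK characters as one word."""
    count = 0
    in_word = False  # True while inside a run of non-CJK, non-space characters
    for ch in segment:
        if is_cjk(ch):
            count += 1       # every CJK character is its own "word"
            in_word = False  # a CJK character also ends any non-CJK word
        elif ch.isspace():
            in_word = False  # whitespace is a word break
        elif not in_word:
            count += 1       # first character of a new non-CJK word
            in_word = True
    return count


for text in ["Hello world", "こんにちは世界", "Hello 世界", "Hello 世界 world"]:
    print(text, count_words(text))
```

Under these assumptions the sketch returns 2, 7, 3 and 4 for the four segments in the table above. Real TMS counts may differ for segments that contain punctuation, numbers or tags.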
