WorldServer: Why does my Translation Memory contain many duplicate TM entries, and how can I overwrite existing TM entries instead of adding new ones when importing TMs that contain corrections?

Information

 
Article Type: Solution Article
Scope/Environment: WorldServer
Symptoms/Context
I see many duplicate TM entries in my Translation Memory. Why is this happening, and how can I overwrite existing TM entries in a WorldServer TM, instead of adding new TM entries, when importing TMs that contain corrections?
Resolution

Resolution 1:
Duplicates can be removed manually. For detailed instructions, refer to the article "How to delete TM Duplicates from WorldServer Translation Memories".

Resolution 2:

Some customers need a simpler definition of duplicates. There are a couple of defined ways to "override" the default AIS context, to an extent, so that context is defined more globally.

One approach is to use path normalization to redefine how the AIS context is determined. However, this does not completely eliminate what the customer considers duplicates, because source text that is repeated within an asset can still result in multiple entries with the same translation.
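The effect of path normalization can be illustrated with a small conceptual model. This is only a sketch and not WorldServer code or configuration: the `normalize_path` rule, the key layout (source text, asset path, position) and the sample paths are all assumptions made for illustration.

```python
import re
from typing import Set, Tuple

ContextKey = Tuple[str, str, int]  # hypothetical: (source text, asset path, position)


def normalize_path(path: str) -> str:
    """Hypothetical normalization rule: treat /v1/, /v2/, ... as one folder."""
    return re.sub(r"/v\d+/", "/v*/", path)


def context_key(source: str, path: str, position: int, normalize: bool) -> ContextKey:
    """Build the key under which a segment would be stored in this model."""
    asset = normalize_path(path) if normalize else path
    return (source, asset, position)


# The same source text in two versions of an asset, plus a repeat inside one asset.
segments = [
    ("Click OK.", "/site/v1/help.xml", 3),
    ("Click OK.", "/site/v2/help.xml", 3),
    ("Click OK.", "/site/v2/help.xml", 9),
]

raw_keys: Set[ContextKey] = {context_key(s, p, i, False) for s, p, i in segments}
normalized_keys: Set[ContextKey] = {context_key(s, p, i, True) for s, p, i in segments}

print(len(raw_keys), len(normalized_keys))  # 3 2
```

In this model, normalization merges the entries from the two asset versions, but the repeated segment at a different position in the same asset still produces its own entry, which is the limitation described above.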

Secondly, the use of segment IDs and enabling SID preferred in-context exact matching (SPICE matching) could work provided that a single SID is used across all assets. The SID implementation rules for TM would enforce the rule that there can only be a single source + SID combination stored in the TM for a single.

Note: these options are not without their costs. Local context information for how a single instance of a source text is used will be overridden by the next occurrence as the SID context and its usage requirements override the standard AIS context.

The SID based implementation would provide the "global uniqueness" that the client is looking for. You will not have multiple 100% matches for the segments because you will not have any alternate translations going forward for the same source text + SID combination. (Note that this solution does not get rid of the legacy TM entries stored in the TM.)
Because duplicate entries make the TM unnecessarily large, this approach also helps to constrain the size of the TM.
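The "global uniqueness" of the SID-based approach can be pictured with a minimal sketch. It is a conceptual model only, not the WorldServer API: the function `save_with_sid`, the SID value "GLOBAL" and the sample translations are hypothetical, and the store is simply keyed by the source + SID combination.

```python
from typing import Dict, Optional, Tuple

SidKey = Tuple[str, str]  # hypothetical: (source text, SID)


def save_with_sid(tm: Dict[SidKey, str], source: str, sid: str, target: str) -> Optional[str]:
    """Store a translation and return the translation it replaced, if any."""
    previous = tm.get((source, sid))
    tm[(source, sid)] = target  # only one entry per source + SID combination
    return previous


sid_tm: Dict[SidKey, str] = {}
save_with_sid(sid_tm, "Save", "GLOBAL", "Enregistrer")
replaced = save_with_sid(sid_tm, "Save", "GLOBAL", "Sauvegarder")

print(len(sid_tm), replaced)  # 1 Enregistrer
```

The second save replaces the first instead of adding an alternate translation. That keeps the TM small, but it is also the cost mentioned in the note above: the earlier translation for that source + SID combination is lost.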

The key to this solution will be how SIDs are defined and applied.

Note: SIDs are currently applicable to certain file types via file type configuration. See also: Support of SIDs for File Types in WorldServer

Note that you can also configure SPICE Matching independently from file types by mapping the relevant AIS property to a TM entry attribute. This article provides exact instructions:

How to set up a SPICE (SID) Matching configuration in WorldServer

 

Root Cause
WorldServer *always* creates duplicates for TM units where the same source segment has different translations. This is the default behavior.

WorldServer does not actually consider the TM entries in question to be duplicates. WorldServer defines duplicates in a way that is not always visible when looking at TM entries. In WorldServer, two TM entries are duplicates only if they have exactly the same AIS context: the source text is the same, it comes from the same asset, and it is in the same location within that asset.

One of the goals of the TM is to allow an unmodified asset to be translated exactly the way it was translated before, provided that the TM entries stored "for the asset" have been neither modified nor deleted. When an asset is translated and saved to the TM, TM entries are stored specifically for that asset. These entries can be leveraged by other assets to which that TM is applied, but each entry contains very specific information that associates it with a particular asset and a location within that asset.

Here are some of the implications of this:

If two assets contain an identical source segment, and the two segments are translated identically, two TM entries are stored in the TM for that source segment. This is not a bug; it is by design. It allows the translation in one asset to change without requiring the translation in the other asset to change when it is re-leveraged.

ICE conditions can be persisted accurately because the two identical segments are not necessarily used in the same way, meaning that they may sit between different segments. This is referred to as their usage context, and you can think of it as the sentences around a segment. Note that even within the same asset, an identical segment can occur more than once, translated the same way or differently, and in both cases two entries are stored.

In short, the TM stores multiple translations in the database because it identifies segments by source text and context.
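The behavior described above can be sketched as a store keyed by source text plus context. This is a conceptual model, not WorldServer's actual data structure: the key layout (source text, asset path, position), the function `save_entry` and the sample data are assumptions, and the model assumes that saving with an identical key replaces the existing entry.

```python
from typing import Dict, Tuple

ContextKey = Tuple[str, str, int]  # hypothetical: (source text, asset path, position)


def save_entry(tm: Dict[ContextKey, str], source: str, asset: str,
               position: int, target: str) -> None:
    """Store a translation; an entry with the same full key is replaced."""
    tm[(source, asset, position)] = target


tm: Dict[ContextKey, str] = {}
save_entry(tm, "Click OK.", "/docs/a.xml", 3, "Cliquez sur OK.")
save_entry(tm, "Click OK.", "/docs/b.xml", 7, "Cliquez sur OK.")   # other asset: new entry
save_entry(tm, "Click OK.", "/docs/a.xml", 3, "Cliquez sur OK !")  # same context: replaced

entries_for_source = [key for key in tm if key[0] == "Click OK."]
print(len(entries_for_source))  # 2
```

Two entries remain for the same source text because they belong to different assets, while the corrected translation for `/docs/a.xml` replaced the earlier one because its context was identical.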