Is fuzzy matching, or partial string matching, supported for terminology hits in native WorldServer TermBases? For example, if a new source segment contains the word "airport" and the project TermBase contains the term entry "port", there should be a match, and that match should be visible when clicking the TD icon in Browser Workbench. It should also be exported as part of the WSXZ package. However, only full-match terms are found and exported. Is there a plan to implement this capability in the near future? |
Fuzzy term searching is a function of the WorldServer Terminology tool when MultiTerm Term Databases are used. This applies to Browser Workbench, the Online Editor, and export as part of a Studio package. The integration between WorldServer and GroupShare/MultiTerm Term Databases is covered in the RWS Documentation Center.

With a MultiTerm Term Database, fuzzy match results are displayed in Browser Workbench. In addition, the WSXZ package exported from a project that uses a MultiTerm Term Database includes a TBX file with the relevant terms (including fuzzy matches), plus some additional files and folders that are not present in packages exported from projects that do not use MultiTerm Term Databases.

However, if native WorldServer Term Databases are in use, there is no out-of-the-box support for fuzzy term look-up. The Online Editor includes TD look-up starting from WorldServer version 11.6, but with native WorldServer Term Databases there is no fuzzy search in the Online Editor either. This is a function of the terminology technology, not of the editor that exercises it.

There is an out-of-the-box mechanism for a sophisticated user to add this support via customization (the SDK Stemmer Component), as detailed in the WorldServer documentation. The stemmer samples are located in the SDK zip file under wssdk_xxx\samples\dist\linguistic_samples.zip (xxx stands for the version number). Although SDK sample stemmers are available, we do not know whether they have ever been used, since stemmers are usually customized by RWS Professional Services on behalf of our customers. A stemmer needs to be deployed per language if fuzzy look-up is wanted for all languages in the TD.

Without stemming, searches are sensitive to any variation in a word. Differences in tense (for example, "laminate" vs. "laminating"), number (for example, "server" vs. "servers") or other inflections will prevent leverage of terms that do not match exactly. With stemming enabled, WorldServer reduces these variations of a word to a common stem for searching purposes. This increases terminology leverage by eliminating inflectional differences in the words of a term.

Most users end up deciding on a better terminology management tool, such as MultiTerm, but if you are interested in such a customization, please engage the RWS Professional Services team. |
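To illustrate the difference between exact and stemmed term look-up described above, here is a minimal Python sketch. It is not the WorldServer SDK stemmer interface, and it does not reflect how WorldServer implements stemming internally: the `stem` function is a deliberately crude suffix stripper, and the names `stem`, `find_terms`, `termbase` and `segment` are hypothetical, chosen only for this example.

```python
import re

# Toy suffix list for illustration only; a real stemmer is language-specific
# and considerably more careful than this.
SUFFIXES = ("ing", "es", "ed", "s", "e")


def stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_terms(segment: str, termbase: list[str], use_stemming: bool) -> list[str]:
    """Return the term entries whose (optionally stemmed) words occur in the segment."""
    def normalize(text: str) -> list[str]:
        tokens = tokenize(text)
        return [stem(t) for t in tokens] if use_stemming else tokens

    seg = normalize(segment)
    hits = []
    for term in termbase:
        key = normalize(term)
        n = len(key)
        # Look for the term's token sequence anywhere in the segment.
        if n and any(seg[i:i + n] == key for i in range(len(seg) - n + 1)):
            hits.append(term)
    return hits


termbase = ["laminate", "server", "port"]
segment = "Laminating the panels protects the servers at the airport."

print(find_terms(segment, termbase, use_stemming=False))  # -> []
print(find_terms(segment, termbase, use_stemming=True))   # -> ['laminate', 'server']
```

In this sketch, stemming recovers "laminate" and "server" from their inflected forms, but "port" is still not found inside "airport", because reducing words to a stem only normalizes inflectional endings and is not a substring search. Whether a customized WorldServer stemmer would cover the "airport"/"port" case from the question depends on how it is written, so that is worth raising with RWS Professional Services.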