When exporting a Term Database from WorldServer in Tab-delimited or CSV format, the translated terms show corrupted characters



Article Type: Solution Article
Scope/Environment: WorldServer (all versions)
After exporting a Term Database from WorldServer in the Delimited File: Simple Format or CSV: Advanced format, when I open the resulting CSV file in Microsoft Excel, the characters of non-ASCII languages are garbled or display as question marks. Here is a screenshot of how Russian terms might look:

[Screenshot: Russian terms displayed as garbled characters in Excel]
There are several workarounds:

A- Use Google Sheets:

1- Log in to your Google account (or create one if you do not have one yet)
2- On the Drive screen, click New and choose Google Sheets > Blank spreadsheet. A new, empty spreadsheet opens.
3- From the File menu choose Import and click on the Upload tab.
4- Drag and drop your exported CSV file into the window.
5- Choose Replace spreadsheet.
6- Choose the character your file uses as the separator (in this case, Comma).
7- Click Import. The file is imported and opens automatically, and the characters display correctly.
8- From the File menu, choose Download as > Microsoft Excel (.xlsx) to download the spreadsheet as a Microsoft Excel (XLSX) file.

B- Use Microsoft Word:

1- Open Microsoft Word and go to File > Open. In the Open dialog, make sure All Files is selected as the file type.
2- Browse to the CSV file and open it.
3- A File Conversion dialog opens. Make sure the Unicode (UTF-8) encoding is selected (it is usually preselected), then click OK.
4- The content of the CSV file opens in Word, and all characters display correctly.
5- Copy all content of the Word document.
6- Open Microsoft Excel and create a new, empty spreadsheet.
7- Paste the content of the Word document into it.
8- Still in Excel, select column A and go to the Data tab.
9- Click Text to Columns. In the Convert Text to Columns Wizard, make sure that the Delimited file type is selected.
10- Click Next.
11- In the Delimiters list, make sure that Tab and Comma are selected.
12- Click Finish.

Your termbase is now in an Excel file, and the characters display correctly.

C- Use an XML editor such as Foxe from Firstobject (download it from here):

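1- Open the CSV file in Foxe.
2- You will see that the characters display correctly.
3- Copy all content.
4- Open Excel and create a new spreadsheet.
5- Paste the copied content into the empty Excel spreadsheet. If the values are not split into columns, use Text to Columns as described in steps 8-12 of workaround B.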
Root Cause
The export to CSV format (with UTF-8 encoding) works correctly; there is no issue on WorldServer's side, and the export procedure is correct. The problem is a general limitation of Microsoft Excel: when you open a CSV file directly, Excel typically reads it using the system's default (ANSI) code page instead of UTF-8, so non-ASCII characters are misinterpreted. This is a well-known and widely documented issue.
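Excel typically recognizes a CSV or Tab-delimited file as UTF-8 only when the file begins with a UTF-8 byte order mark (BOM, the bytes EF BB BF). One further option is therefore to add a BOM to the exported file before opening it in Excel. The following Python script is a minimal sketch, assuming the exported file is already UTF-8 encoded (as described above) and that your Excel version honors the BOM. The function name `add_utf8_bom` and the file names are hypothetical and are not part of WorldServer.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src: Path, dst: Path) -> None:
    """Copy a UTF-8 text file to dst with a UTF-8 byte order mark prepended.

    Works at the byte level, so line endings (CRLF or LF) and quoted
    multi-line CSV fields are preserved exactly.
    """
    data = src.read_bytes()
    # Strip an existing BOM so running the script twice never doubles it.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Raises UnicodeDecodeError (and writes nothing) if the input is not UTF-8.
    data.decode("utf-8")
    dst.write_bytes(codecs.BOM_UTF8 + data)
```

For example, `add_utf8_bom(Path("termbase.csv"), Path("termbase_excel.csv"))` writes a copy that Excel should open with the characters intact. The script only adds the BOM and never re-encodes the content, so the original export stays untouched.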