
WorldServer - How do I convert my glossary from XLSX to CSV without corrupting the characters of non-ASCII languages?


Information

 
URL Name: 000004328
Summary: This issue is not specific to WorldServer; it is a general file conversion problem caused by the way Microsoft Excel handles character encoding in CSV files. This is a well-known and widely documented issue. The easiest workaround is to use Google Sheets to perform the conversion.
Scope/Environment: SDL WorldServer
Question
I have created a simple glossary in Excel (*.XLSX) format and want to convert it to *.CSV format so that I can import it into WorldServer. However, when I do so, the terms in non-ASCII languages such as Russian or Asian languages appear garbled and are corrupted.

How can I convert my Excel Glossary without corrupting the characters?

 
Answer
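This issue is not specific to WorldServer; it is a general file conversion problem. It is caused by a limitation in Microsoft Excel: its plain CSV format uses the system's legacy (ANSI) code page rather than UTF-8, and Excel also assumes that code page when it opens a CSV file that has no UTF-8 byte order mark. Characters outside that code page, such as Cyrillic or Asian scripts, are therefore replaced or misread. This is a well-known and widely documented issue.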
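
The following Python sketch illustrates what happens to such terms. It is a simulation, not Excel's own code, and it assumes for illustration that the legacy code page is Windows-1252 (common on Western-European Windows systems). Characters the code page cannot represent are replaced with question marks, whereas UTF-8 keeps them intact.

```python
def save_as_legacy_csv_cell(term: str, codepage: str = "cp1252") -> bytes:
    """Encode a term the way a legacy-code-page CSV export would (cp1252 assumed),
    replacing every character the code page cannot represent with '?'."""
    return term.encode(codepage, errors="replace")


def save_as_utf8_csv_cell(term: str) -> bytes:
    """Encode the same term as UTF-8, which can represent every Unicode character."""
    return term.encode("utf-8")


if __name__ == "__main__":
    term = "Привет"  # Russian example term
    print(save_as_legacy_csv_cell(term))  # b'??????'
    print(save_as_utf8_csv_cell(term).decode("utf-8") == term)  # True
```

The question marks cannot be turned back into the original letters, which is why the conversion has to happen in a tool that writes UTF-8.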

The easiest method is to use Google Sheets to perform the conversion. Follow these steps:
  1. Log in to your Google account (or create one if you do not have one yet).
  2. On the Google Drive screen, click New and choose Google Sheets > Blank spreadsheet. A new, empty spreadsheet opens.
  3. From the File menu, choose Import and click the Upload tab.
  4. Drag and drop your XLSX file into the window.
  5. The Import File window appears.
  6. Select the Replace spreadsheet option (selected by default).
  7. Click Import. The file is imported and opened automatically, and the non-ASCII characters display correctly.
  8. Go to File > Download > Comma Separated Values (.csv, current sheet).
  9. The file is downloaded in CSV format and is automatically named Untitled spreadsheet - Sheet1.csv. Rename it to your original file name, keeping the .csv extension.

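The CSV file created this way is encoded in UTF-8, so the characters are preserved. If you open it in Microsoft Excel, however, the characters will still appear garbled, because Excel reads the file using the legacy code page. As mentioned, this is an Excel limitation; the file itself is correct.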
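
The following sketch simulates why a correct UTF-8 file looks wrong in Excel: UTF-8 bytes read through a legacy code page (Windows-1252 is assumed here) turn into accented Latin letters and symbols instead of the original script. It also includes a hypothetical helper that prefixes a UTF-8 byte order mark (BOM), a widely used way to make Excel recognize a CSV file as UTF-8. Use such a copy for viewing only: this article does not say whether the WorldServer import accepts a BOM, so import the original file.

```python
import codecs


def as_excel_might_show(csv_bytes: bytes, codepage: str = "cp1252") -> str:
    """Simulate reading UTF-8 bytes with a legacy code page (cp1252 assumed).

    Bytes that are undefined in the code page become U+FFFD instead of raising.
    """
    return csv_bytes.decode(codepage, errors="replace")


def add_utf8_bom(csv_bytes: bytes) -> bytes:
    """Prefix a UTF-8 byte order mark once; intended for a viewing copy only."""
    if csv_bytes.startswith(codecs.BOM_UTF8):
        return csv_bytes
    return codecs.BOM_UTF8 + csv_bytes


if __name__ == "__main__":
    utf8_term = "глоссарий".encode("utf-8")  # Russian for "glossary"
    # ascii() keeps the misread text printable on any console
    print(ascii(as_excel_might_show(utf8_term)))
    print(add_utf8_bom(utf8_term)[:3])  # the three BOM bytes
```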

If you open the file in Notepad++ or in a free XML editor such as Foxe from Firstobject, you will see that the characters display correctly. The CSV file can now be imported into your WorldServer Term Database.
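
As an alternative to checking the file in an editor, a short script can confirm that the CSV decodes as UTF-8 and list the cells that contain non-ASCII terms. This is a minimal sketch with hypothetical function names; it accepts files with or without a UTF-8 BOM and does not check anything WorldServer-specific, such as the expected column layout.

```python
import csv
import sys


def check_utf8_csv(path: str) -> list:
    """Return the rows of a CSV file; raise UnicodeDecodeError if it is not valid UTF-8.

    The utf-8-sig codec also accepts (and strips) an optional UTF-8 BOM.
    """
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f))


def non_ascii_terms(rows: list) -> list:
    """Collect the cells that contain non-ASCII characters (e.g. Russian or Asian terms)."""
    return [cell for row in rows for cell in row if not cell.isascii()]


if __name__ == "__main__" and len(sys.argv) == 2:
    try:
        rows = check_utf8_csv(sys.argv[1])
    except UnicodeDecodeError as err:
        sys.exit(f"Not valid UTF-8: {err}")
    print(f"{len(rows)} rows read; {len(non_ascii_terms(rows))} cells contain non-ASCII text")
```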
 
