Opening tab-delimited text file in Excel mangles special symbols - excel

I have a tab-delimited text file which contains dagger characters (†). When I open this in Excel 2010, they are mangled and replaced with † (I'm not sure if Excel is adding the space, too). Why does this occur and how can I fix it?
Right now I do search and replacing in Excel to replace the †s, but it's inefficient for many files and hacky.

The original file is not using the character encoding that Excel expects. See "Character Encoding and the ’ Issue".
Excel's Import Wizard is better at handling encoding issues and may be able to open your source file properly. See "Microsoft Excel mangles Diacritics in .csv files?".
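To see why the dagger turns into three characters, the mismatch can be reproduced outside Excel. This is only a sketch: it assumes the file is stored as UTF-8 and that Excel falls back to Windows-1252 (the usual "ANSI" code page on Western European Windows installs). The function name is made up for illustration.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


mangled = misread_utf8_as_cp1252("†")
# U+2020 is three bytes in UTF-8: E2 80 A0.
# Windows-1252 maps them to "â", "€" and a no-break space (U+00A0).
print([hex(ord(ch)) for ch in mangled])  # ['0xe2', '0x20ac', '0xa0']

# The damage is reversible as long as every byte survived the round trip.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == "†")  # True
```

Under these assumptions the trailing "space" is the third byte of the dagger shown as a no-break space, not something Excel adds.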

Never mind: after taking another look at "Open .csv file containing special characters in Excel", I realized that if I use File -> Open in Excel instead of right-click -> Open with Excel, it lets me choose the encoding.
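For many files, choosing the encoding by hand in File -> Open gets tedious. One possible alternative, sketched here in Python, is to prepend a UTF-8 byte order mark so that Excel can recognise the encoding when the file is opened directly. This assumes the files really are UTF-8 and that your Excel version honours the mark, so try it on one file first. `add_utf8_bom` is a hypothetical helper name.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src: Path, dst: Path) -> None:
    """Copy src to dst, putting a UTF-8 byte order mark in front if it is missing."""
    data = src.read_bytes()
    # Working on raw bytes leaves tabs and line endings exactly as they were.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    dst.write_bytes(data)
```

Calling it on every file in a folder (for example with `Path.glob("*.txt")`) would cover the "many files" case.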

Related

Changing text file encoding to ANSI

I need to convert some txt files to ANSI encoding, that is, the encoding Notepad++ reports as ANSI.
I compare these files to a sample file which Notepad++ identifies as ANSI encoded. The files must be ANSI encoded so that they can be uploaded to an old application.
I need to do this automatically as part of the VBA code that I also use to create the files from Excel data. Each time, the macro pastes data from Excel into a new workbook and saves it as a text file. Maybe it is possible to just choose the encoding while saving the Excel file, but I didn't find ANSI among the save options in Excel.
I figured that maybe the best way would be to open each text file in Notepad and use Save As with the ANSI encoding. Is it possible to do this with a Shell call in VBA?
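The conversion itself does not need Notepad. As a rough sketch outside VBA (in Python, which a macro could call via Shell), the snippet below re-encodes a text file. It assumes that "ANSI" means Windows-1252 on the machine in question and that the source files are UTF-8; both are assumptions, and the function names are invented for this example.

```python
def to_ansi_bytes(text: str, codepage: str = "cp1252") -> bytes:
    """Encode text in a single-byte Windows code page; anything that does not fit becomes '?'."""
    return text.encode(codepage, errors="replace")


def convert_file_to_ansi(src: str, dst: str, src_encoding: str = "utf-8-sig") -> None:
    # newline="" keeps the original line endings untouched while reading.
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "wb") as fout:
        fout.write(to_ansi_bytes(text))
```

Characters outside the chosen code page are replaced rather than preserved, so it is worth checking the output against the sample file the old application accepts.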

Excel xlsx file saving as CSV file - Korean and Japanese cracking badly

I am trying to make a CSV file from an Excel file. It has English, Korean and Japanese inputs. Right now it's saved as file.xlsx.
But when I try to save-as CSV through Excel as file.csv, all the Korean and Japanese inputs turn into question marks (???????)
I tried importing into Google Spreadsheets and exporting out as csv from there (from reading some other solutions) but it still turns into question marks.
I tried building a CSV file from scratch and just copying/pasting values from the Excel file into the CSV, but after I save it as CSV, the characters always crack.
Does anybody know how to work-around this? Thank you
I don't know that there IS an answer for this. A CSV file does not record which encoding it uses, so that information is lost when you save in that format.
I tried, as a test, saving Chinese characters as a Unicode Text file, and believe it or not, that worked. So you may be able to do that, and simply change the file extension to CSV, assuming for some reason you NEED the extension to be CSV.
EDIT: I just ran additional testing on this. I was able to reimport the TXT file with either the TXT or the CSV extension, and the characters stayed just fine. So I think Unicode Text is your answer.
Simply opening a CSV file in Excel only works when default assumptions hold. You may be writing the CSV correctly but not validating it properly.
It is more reliable to open a blank worksheet and then use Data Import. The encoding of the CSV file is one of the parameters you can specify.
To fully retain the characters while saving in CSV format, and to be able to import or reuse the data in the future, you can follow these steps.
1. In Microsoft Excel, open the *.xlsx file.
2. Select Menu | Save As.
3. Enter any name for your file.
4. Under "Save as type," select Unicode Text.
5. Click Save.
6. Open your saved file in Microsoft Notepad.
7. Replace all tab characters with commas (","): select a tab character (select and copy the space between two column headers), open the "Find and Replace" window (press Ctrl+H) and replace all tab characters with commas.
8. Click Save As.
9. Name the file, and change the Encoding to UTF-8.
10. Change the file extension from .txt to .csv.
11. Click Save.
12. Open the .csv file in Excel to view your data.
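The Notepad part of these steps can also be scripted. The sketch below, in Python, assumes the input is Excel's "Unicode Text" export (UTF-16 with a byte order mark, tab-delimited); `unicode_text_to_csv` is a hypothetical name. Unlike a plain find-and-replace, the csv module quotes any field that itself contains a comma.

```python
import csv


def unicode_text_to_csv(src: str, dst: str) -> None:
    """Convert a UTF-16, tab-delimited text file to a UTF-8, comma-delimited CSV."""
    # The "utf-16" codec reads the byte order mark to pick the byte order.
    with open(src, "r", encoding="utf-16", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout)  # comma-delimited, quotes fields only when needed
        writer.writerows(reader)
```

A call such as `unicode_text_to_csv("file.txt", "file.csv")` would replace steps 6 to 11.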
Had the same issue. The article below shows the workaround in detail:
https://help.salesforce.com/articleView?id=000003837&type=1
However, I decided to go with LibreOffice Calc, as it requires fewer steps to achieve the desired outcome. While exporting, you get to select the character set, field delimiter and text delimiter.
For all other tasks, I prefer Excel.
Download and install the Unicode CSV add-in for Excel.
Save the csv from the new "Unicode CSV" menu.

How to export Excel file with arabic characters to CSV file? [duplicate]

I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also seems to do this with the left and right quotes and long dashes that appear to be coming from the original user creating the Excel file in Mac.
Since CSV is just a text file I'm sure it can handle a UTF8 encoding, so I'm guessing it is an Excel limitation, but I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.
A simple workaround is to use Google Spreadsheet. Paste (values only if you have complex formulas) or import the sheet then download CSV. I just tried a few characters and it works rather well.
NOTE: Google Sheets does have limitations when importing. See here.
NOTE: Be careful of sensitive data with Google Sheets.
EDIT: Another alternative - basically they use VB macro or addins to force the save as UTF8. I have not tried any of these solutions but they sound reasonable.
I've found OpenOffice's spreadsheet application, Calc, is really good at handling CSV data.
In the "Save As..." dialog, click "Format Options" to get different encodings for CSV. LibreOffice works the same way AFAIK.
1. Save the Excel sheet as "Unicode Text (.txt)". The good news is that all the international characters are in UTF-16 (note, not in UTF-8). However, the new "*.txt" file is TAB delimited, not comma delimited, and therefore is not a true CSV.
2. (optional) Unless you can use a TAB delimited file for import, use your favorite text editor and replace the TAB characters with commas ",".
3. Import your *.txt file in the target application. Make sure it can accept UTF-16 format.
If UTF-16 has been properly implemented with support for non-BMP code points, then you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.
I use this procedure to import data from Excel to Moodle.
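The claim that UTF-16 converts to UTF-8 without loss, including non-BMP code points, can be checked with a short round trip. This is an illustrative Python sketch; the sample string and the function name are chosen for the example only.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes (with a byte order mark) as UTF-8."""
    return data.decode("utf-16").encode("utf-8")


sample = "A\u00f1\u4e2d\U0001F600"  # ASCII, Latin-1 range, CJK, and one non-BMP code point
utf16 = sample.encode("utf-16")     # the non-BMP character is stored as a surrogate pair
utf8 = utf16_to_utf8(utf16)
print(utf8.decode("utf-8") == sample)  # True
print(len(utf8))  # 10 (1 + 2 + 3 + 4 bytes)
```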
I know this is an old question but I happened to come upon this question while struggling with the same issues as the OP.
Not having found any of the offered solutions a viable option, I set out to discover if there is a way to do this just using Excel.
Fortunately, I have found that the lost character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.
Please give it a try and see if it works for you. Good luck.
You can use iconv command under Unix (also available on Windows as libiconv).
After saving as CSV under Excel in the command line put:
iconv -f cp1250 -t utf-8 file-encoded-cp1250.csv > file-encoded-utf8.csv
(remember to replace cp1250 with your encoding).
Works fast and great for big files like post codes database, which cannot be imported to GoogleDocs (400.000 cells limit).
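Where iconv is not at hand, roughly the same conversion can be sketched in Python. The function below is an assumption-laden stand-in, not a replacement for iconv: `reencode` is a made-up name, and as with iconv you have to supply the real source encoding yourself. It streams the file in chunks so that large files do not have to fit in memory.

```python
import codecs


def reencode(src: str, dst: str, from_enc: str, to_enc: str = "utf-8",
             chunk_size: int = 1 << 20) -> None:
    """Stream a text file from one encoding to another, one chunk at a time."""
    # An incremental decoder keeps multi-byte sequences intact across chunk borders.
    decoder = codecs.getincrementaldecoder(from_enc)()
    with open(src, "rb") as fin, open(dst, "w", encoding=to_enc, newline="") as fout:
        while chunk := fin.read(chunk_size):
            fout.write(decoder.decode(chunk))
        # Flush whatever the decoder is still holding back.
        fout.write(decoder.decode(b"", final=True))
```

For the example above the call would be `reencode("file-encoded-cp1250.csv", "file-encoded-utf8.csv", "cp1250")`.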
You can do this on a modern Windows machine without third party software. This method is reliable and it will handle data that includes quoted commas, quoted tab characters, CJK characters, etc.
1. Save from Excel
In Excel, save the data to file.txt using the type Unicode Text (*.txt).
2. Start PowerShell
Run powershell from the Start menu.
3. Load the file in PowerShell
$data = Import-Csv C:\path\to\file.txt -Delimiter "`t" -Encoding BigEndianUnicode
4. Save the data as CSV
$data | Export-Csv file.csv -Encoding UTF8 -NoTypeInformation
The only "easy way" of doing this is as follows. First, realize that there is a difference between what is displayed and what is actually stored in the Excel .csv file.
1. Open the Excel file that holds the data (.xls, .xlsx).
2. In Excel, choose "CSV (Comma Delimited) (*.csv)" as the file type and save as that type.
3. Open the saved .csv file in Notepad (found under "Programs" and then "Accessories" in the Start menu).
4. Choose Save As... At the bottom of the "Save As" box there is a select box labelled "Encoding". Select UTF-8 (do NOT use ANSI or you lose all accents etc.), then save the file under a slightly different file name from the original.
This file is in UTF-8, retains all characters and accents, and can be imported, for example, into MySQL and other database programs.
This answer is taken from this forum.
Another one I've found useful:
"Numbers" allows encoding-settings when saving as CSV.
Using Notepad++
This will fix the corrupted CSV file saved by Excel and re-save it in the proper encoding.
Export CSV from Excel
Load into Notepad++
Fix encoding
Save
Excel saves in CP-1252 / Windows-1252. Open the CSV file in Notepad++. Select
Encoding > Character Sets > Western European > Windows-1252
Then
Encoding > Convert to UTF-8
File > Save
First tell Notepad++ the encoding, then convert. Some of these other answers are converting without setting the proper encoding first, mangling the file even more. They would turn what should be ’ into 達. If your character does not fit into CP-1252 then it was already lost when it was saved as CSV. Use another answer for that.
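The difference between "convertible" and "already lost" can be made concrete. The Python sketch below is illustrative only; `fits_in` is a hypothetical helper, and Windows-1252 is assumed as the code page Excel used.

```python
def fits_in(text: str, encoding: str = "cp1252") -> bool:
    """Return True if every character of text can be stored in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


quote = "\u2019"  # right single quotation mark, present in Windows-1252 as byte 0x92
print(fits_in(quote))  # True
# Declare the source encoding first, then convert: the character survives.
print(quote.encode("cp1252").decode("cp1252").encode("utf-8"))  # b'\xe2\x80\x99'

cjk = "漢字"
print(fits_in(cjk))  # False
# A lossy save substitutes "?" and nothing can bring the original back.
print(cjk.encode("cp1252", errors="replace"))  # b'??'
```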
"nevets1219" is right about Google docs, however if you simply "import" the file it often does not convert it to UTF-8.
But if you import the CSV into an existing Google spreadsheet it does convert to UTF-8.
Here's a recipe:
On the main Docs (or Drive) screen click the "Create" button and choose "Spreadsheet"
From the "File" menu choose "Import"
Click "Choose File"
Choose "Replace spreadsheet"
Choose whichever character you are using as a Separator
Click "Import"
From the "File" menu choose "Download as" -> CSV (current sheet)
The resulting file will be in UTF-8
Under Excel 2016 and up (including Office 365), there is a CSV option dedicated to the UTF-8 format.
In Office 365, do Save As; where previously one might have chosen CSV (Comma Delimited), now one of the file types you can save as is CSV UTF-8 (Comma delimited) (*.csv)
What about using PowerShell?
Get-Content 'C:\my.csv' | Out-File 'C:\my_utf8.csv' -Encoding UTF8
For those looking for an entirely programmatic (or at least server-side) solution, I've had great success using catdoc's xls2csv tool.
Install catdoc:
apt-get install catdoc
Do the conversion:
xls2csv -d utf-8 file.xls > file-utf-8.csv
This is blazing fast.
Note that it's important that you include the -d utf-8 flag, otherwise it will encode the output in the default cp1252 encoding, and you run the risk of losing information.
Note that xls2csv also only works with .xls files, it does not work with .xlsx files.
Easiest way (no need for OpenOffice or Google Docs):
1. Save your file as "Unicode text file".
2. Now you have a Unicode text file.
3. Open it with Notepad and use "Save as", selecting "UTF-8" or another code page that you want.
4. Rename the file extension from "txt" to "csv". This will result in a tab-delimited UTF-8 csv file.
5. If you want a comma-delimited file, open the csv file you just renamed and replace all tabs with commas. To do this in Notepad on Windows 10, select one tab character, then press Ctrl+H. In the window that opens, type a comma in the "Replace with" field, then click "Replace All". Save your file. The result will be a comma-delimited UTF-8 csv file.
Don't open it with MS Office anyway!
Now you have a tab-delimited CSV file, or a comma-delimited one if you applied step 5.
As funny as it may seem, the easiest way I found to save my 180MB spreadsheet as a UTF-8 CSV file was to select the cells in Excel, copy them, and paste the content of the clipboard into SublimeText.
I was not able to find a VBA solution for this problem on Mac Excel. There simply seemed to be no way to output UTF-8 text.
So I finally had to give up on VBA, bit the bullet, and learned AppleScript. It wasn't nearly as bad as I had thought.
Solution is described here:
http://talesoftech.blogspot.com/2011/05/excel-on-mac-goodbye-vba-hello.html
Assuming a Windows environment, save and work with the file as usual in Excel, but then open the saved Excel file in Gnome Gnumeric (free). Save Gnumeric's spreadsheet as CSV which, for me anyway, saves it as UTF-8 CSV.
Easy way to do it: download OpenOffice, start the spreadsheet application and open the Excel file (.xls or .xlsx). Then save it as a text CSV file; a window opens asking whether to keep the current format or to save in ODF format. Select "keep the current format" and in the next window select the character set that works best for the language your file is written in. For Spanish, select Western Europe (Windows-1252/WinLatin 1) and the file works just fine. In my experience, selecting Unicode (UTF-8) did not work with the Spanish characters.
Save the xls file (Excel file) as Unicode text; the file will be saved in text format (.txt).
Change the format from .txt to .csv (rename the file from XYX.txt to XYX.csv).
I also came across the same problem, but there is an easy solution for this.
Open your xlsx file in Excel 2016 or higher.
In "Save As" choose this option: "CSV UTF-8 (Comma delimited) (*.csv)".
It works perfectly, and the generated csv file can be imported into any software. I imported this csv file into my SQLite database and all Unicode characters stayed intact.
Came across the same problem and googled out this post. None of the above worked for me. At last I converted my Unicode .xls to .xml (choose Save as ... XML Spreadsheet 2003) and it produced the correct character. Then I wrote code to parse the xml and extracted content for my use.
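The parsing code is not shown in that answer, so the following is only a guess at what such a parser could look like, written in Python. It assumes the standard XML Spreadsheet 2003 layout (Workbook > Worksheet > Table > Row > Cell > Data in the `urn:schemas-microsoft-com:office:spreadsheet` namespace). It ignores the `ss:Index` attribute that Excel uses to skip empty cells, so sparse rows may come out shifted. `rows_from_spreadsheetml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace used by XML Spreadsheet 2003 elements.
SS = "{urn:schemas-microsoft-com:office:spreadsheet}"


def rows_from_spreadsheetml(xml_bytes: bytes) -> list:
    """Return the first worksheet as a list of rows, each a list of cell strings."""
    root = ET.fromstring(xml_bytes)
    table = root.find(f"{SS}Worksheet/{SS}Table")
    rows = []
    for row in table.iter(f"{SS}Row"):
        cells = []
        for cell in row.iter(f"{SS}Cell"):
            data = cell.find(f"{SS}Data")
            # Empty cells have no Data element or no text inside it.
            cells.append("" if data is None or data.text is None else data.text)
        rows.append(cells)
    return rows
```

The resulting rows can be handed to `csv.writer` on a file opened with `encoding="utf-8"` to obtain a UTF-8 CSV.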
I have written a small Python script that can export worksheets in UTF-8.
You just have to provide the Excel file as first parameter followed by the sheets that you would like to export. If you do not provide the sheets, the script will export all worksheets that are present in the Excel file.
    #!/usr/bin/env python3
    # Export worksheets from an .xlsx file to UTF-8 encoded CSV files.
    import csv
    import sys

    from openpyxl import load_workbook


    def get_all_sheets(excel_file):
        # Return the names of every worksheet in the workbook.
        workbook = load_workbook(excel_file, read_only=True, data_only=True)
        return list(workbook.sheetnames)


    def csv_from_excel(excel_file, sheets):
        workbook = load_workbook(excel_file, read_only=True, data_only=True)
        for worksheet_name in sheets:
            print("Export " + worksheet_name + " ...")
            try:
                worksheet = workbook[worksheet_name]
            except KeyError:
                print("Could not find " + worksheet_name)
                sys.exit(1)
            # newline='' lets the csv module control line endings;
            # encoding='utf-8' keeps non-ASCII text intact.
            with open(worksheet_name + '.csv', 'w', newline='', encoding='utf-8') as csv_file:
                wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
                for row in worksheet.iter_rows():
                    wr.writerow([cell.value for cell in row])
            print(" ... done")


    if not 2 <= len(sys.argv) <= 3:
        print("Call with " + sys.argv[0] + " <xlsx file> [comma separated list of sheets to export]")
        sys.exit(1)
    else:
        if len(sys.argv) == 3:
            sheets = sys.argv[2].split(',')
        else:
            sheets = get_all_sheets(sys.argv[1])
        assert sheets  # at least one sheet to export
        csv_from_excel(sys.argv[1], sheets)
Excel typically saves a csv file as ANSI encoding instead of utf8.
One option to correct the file is to use Notepad or Notepad++:
Open the .csv with Notepad or Notepad++.
Copy the contents to your computer clipboard.
Delete the contents from the file.
Change the encoding of the file to utf8.
Paste the contents back from the clipboard.
Save the file.
A second option to "nevets1219" is to open your CSV file in Notepad++ and do a conversion to ANSI.
Choose in the top menu:
Encoding -> Convert to ANSI
Encoding -> Convert to ANSI re-encodes the file in the ANSI code page, not in Unicode. UTF-8 is an encoding of Unicode, not a subset of it. The text may be encoded correctly in ANSI, but here we are talking about UTF-8.
There are faster ways, like exporting as CSV (comma delimited), opening that CSV with Notepad++ (free), and then choosing Encoding > Convert to UTF-8. But that only suits a one-off conversion per file. If you need to change and export frequently, the LibreOffice or Google Docs solution is best.
Microsoft Excel has an option to export a spreadsheet using Unicode encoding.
Open the .csv file with Notepad++. If the encoding looks right (you see all characters as they should be), open the Encoding menu and choose Convert to ANSI. Otherwise, find out what the current encoding is.
Another solution is to open the file in Word, save it as txt, and then reopen it in Excel.
Save Dialog > Tools Button > Web Options > Encoding Tab
I had the same problem and came across this add-in. It works perfectly fine in Excel 2013, in addition to Excel 2007 and 2010, for which it is documented.

How do I export an Excel file with Chinese characters to a CSV?

I have an Excel document with a data table containing Chinese characters. I am trying to export this Excel spreadsheet to a CSV file for importing into a MySQL database.
However, when I save the Excel document as a CSV file, Notepad displays the resulting CSV file's Chinese characters as question marks. Importing into MySQL preserves the question marks, completely ignoring what the original Chinese characters are.
I'm suspecting this may have to do with using Excel with UTF-8 encoding. Thanks for your help!
The following method has been tested and used to import CSV files in MongoDB, so it should work:
1. In your Excel worksheet, go to File > Save As.
2. Name the file and choose Unicode Text (*.txt) from the drop-down list next to "Save as type", and then click Save.
3. Open the Unicode .txt file using your preferred text editor, for example Notepad.
4. Since our Unicode text file is tab-delimited and we want a comma-separated CSV file, we need to replace all tabs with commas.
5. Select a tab character, right-click it and choose Copy from the context menu, or simply press CTRL+C.
6. Press CTRL+H to open the Replace dialog and paste the copied tab (CTRL+V) in the "Find what" field. When you do this, the cursor will move rightwards, indicating that the tab was pasted. Type a comma in the "Replace with" field and click Replace All.
7. Click File > Save As, enter a file name and change the encoding to UTF-8. Then click the Save button.
8. Change the .txt extension to .csv directly in Notepad's Save As dialog and choose All files (*.*) next to "Save as type".
9. Open the CSV file from Excel by clicking File > Open > Text files (*.prn, *.txt, *.csv) and verify that the data is OK.
Source here
As far as I know Excel doesn't save CSV files in any Unicode encoding. I have had similar issues recently trying to export a file as CSV with the £ symbol. I had the benefit of being able to use another tool altogether.
My version of Excel 2010 can export in Unicode format via File > Save As > Unicode Text (.txt), but the output is a tab-delimited, UCS-2 encoded file. I don't know MySQL at all, but from a brief look at the documentation it appears to handle tab-delimited imports and UCS-2. It may be worth trying this output.
Edit: Additionally, you could always open this Unicode output in Notepad++ and convert it to UTF-8 (Encoding > Convert to UTF-8 without BOM), and possibly replace all tab characters with commas too (use the Replace dialogue in Extended search mode, with \t in the Find box and , in the Replace box).
You might want to try notepad++, I doubt notepad will support unicode characters.
http://notepad-plus-plus.org/
For some people this solution may work: https://support.geekseller.com/knowledgebase/utf-8/
When saving csv, go to lower right Tools > Web Options > Encoding > Unicode (UTF-8)
Or this SO answer: just use Google Sheets to save csv as unicode:
Excel to CSV with UTF8 encoding
I tried all of the methods above, but none of them quite worked for my data (Simplified Chinese, over 700 MB). I tried Chinese and English Windows systems, and English and Chinese versions of Excel. Excel on Windows does not seem to be able to save as UTF-8 even when it claims to. I specified UTF-8 CSV in Save As, but when I used 'open sheet' to detect the encoding, the file was neither UTF-8 nor GB*.
Here is my final solution.
(1) Download 'open sheet'.
(2) Open the file properly. You can scroll through the encodings until you see the Chinese characters displayed in the preview window.
(3) Save it as UTF-8 (if you want UTF-8).
PS: You need to figure out the default encoding of your system. As far as I know, Ubuntu deals with UTF-8 fine, but the Windows default for Simplified Chinese starts with GB**. Even if you encode the file as UTF-8, you still might not be able to open it correctly. In my case, R could not open my UTF-8 csv, but could open the GB* encoding.
This method works well even if your file is very large.
Other workarounds are Google Sheets (but the file size can be limited) and Notepad++, which also works for smaller files.
You can detect the encoding by opening your file and scrolling through the encodings until you see the Chinese displayed correctly.
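Scrolling through encodings until the text looks right can be approximated in code by trying candidates in order. The Python sketch below is a crude heuristic, not real detection: the candidate list and the name `guess_encoding` are assumptions for this example, and a clean decode only shows that the bytes are valid in that encoding, not that the result is the intended text.

```python
def guess_encoding(data: bytes,
                   candidates=("utf-8-sig", "gb18030", "big5", "shift_jis")) -> str:
    """Return the first candidate encoding that decodes data without errors."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("none of the candidate encodings decoded the data cleanly")
```

Strict encodings such as UTF-8 belong at the front of the list, because permissive ones accept almost any byte sequence. For a very large file, passing only the first few megabytes is usually enough for a first guess.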
You should save csv file with:
df.to_csv(file_name, encoding = 'utf_8_sig')
instead of:
df.to_csv(file_name, encoding = 'utf-8')
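The only difference between the two calls is the byte order mark that the 'utf_8_sig' codec writes in front of the data. The sketch below uses plain Python strings instead of a pandas DataFrame to show that difference; the sample text is invented.

```python
import codecs

text = "名前,값\n"

plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")  # the same codec that pandas accepts as 'utf_8_sig'

print(with_bom[:3] == codecs.BOM_UTF8)       # True: the bytes EF BB BF come first
print(with_bom[3:] == plain)                 # True: the payload itself is identical
print(with_bom.decode("utf-8-sig") == text)  # True: the mark is dropped again on read
```

Excel generally uses that mark to recognise a CSV as UTF-8 when the file is opened directly, which is why the first form tends to display correctly.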


Resources