What's the best encoding for CSV to be imported by Microsoft Excel?

Microsoft Excel 2000 does not import CSV encoded in UTF-8 properly. Which encoding should I use if Unicode characters (for example, English + Chinese + Japanese) are embedded in the CSV? Thanks

I believe you probably want UTF-16LE.
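As an illustration, here is a minimal Python sketch of that approach (the file name and cell contents are made up). Excel detects the encoding from the UTF-16 BOM; note that a UTF-16 file should be tab-delimited, since Excel does not split UTF-16 text on commas:
import csv
rows = [["English", "中文", "日本語"]]
# Python's "utf-16" codec writes a BOM and uses the platform's native
# byte order (little-endian on x86), which is what Excel expects.
with open("export.csv", "w", encoding="utf-16", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)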

Related

How do I write the £ (GBP) sign in a CSV file from Ruby and read it back correctly in Excel?

When I write a CSV file using Ruby containing the £ sign and I open it using Excel, I see this instead: ¬£.
My understanding is that Ruby uses UTF-8, but Excel interprets this file using a different encoding (ASCII).
I tried to write a US-ASCII encoded CSV file and guessed the £ encoding in ASCII like this:
require 'csv'
csv = CSV.open(filename, 'w:US-ASCII')
csv << ["\xA3"]
csv.close
but it fails with invalid byte sequence in UTF-8 somewhere deep inside the CSV library.
What am I doing wrong?
Thank you
For sure, Excel is not bound to use ASCII. For instance, I can easily input Japanese characters into an Excel cell, and these are certainly not representable in ASCII.
Ruby uses UTF-8 by default for string literals, but every String object carries its own encoding, so you could in theory mix strings with different encodings if you want to. In your case, you want to force a certain encoding when writing a file. This can be done either by using the w: output option, as you did, or by using external_encoding: Encoding::US_ASCII (note the underscore; a hyphen is not valid in a Ruby constant name). See the documentation of the Encoding class for the constant names.
I don't think US-ASCII is a good choice for the encoding, simply because there is no pound symbol in the ASCII chart, so Ruby has to refuse the conversion when you try to write one. If you need an 8-bit encoding, ISO-8859-1 should do the job, but my recommendation would be to write UTF-8 and tell Excel to use this encoding when reading the CSV file. Importing UTF-8 has been possible at least since Excel 2007.
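As a sketch of that recommendation (in Python rather than Ruby; the file name is illustrative), writing UTF-8 with a BOM lets Excel pick the right encoding on its own:
import csv
# "utf-8-sig" prepends the UTF-8 BOM, which Excel uses to detect UTF-8.
with open("prices.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["item", "price"])
    writer.writerow(["tea", "£3.50"])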

JavaScript export CSV encoding UTF-8 and using Excel to open issue

I have been reading quite a few posts, including this one:
Javascript export CSV encoding utf-8 issue
I know many of them mention that it's because of Microsoft Excel, and that using something like this should work:
https://superuser.com/questions/280603/how-to-set-character-encoding-when-opening-excel
I have tried it on Ubuntu (which didn't even have any issue) and on Windows 10, where I had to use the import steps from the second post. The Mac has the biggest problem, because there Excel does not import and does not read the Unicode at all.
Is there any way I can enforce, in the exporting code, that Excel opens the file as UTF-8? Or some other workaround I might be able to try?
Thanks in advance for any help and suggestions.
Many Windows applications, including Excel, assume the localized ANSI encoding (Windows-1252 on US Windows) when opening a file, unless the file starts with a byte-order mark (BOM). While UTF-8 doesn't need a BOM, a UTF-8-encoded BOM at the start of a file clues Excel in that the file is UTF-8. The byte sequence is EF BB BF and the equivalent Unicode code point is U+FEFF.
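A minimal sketch of that fix, shown in Python for brevity (the same three bytes can be prepended from JavaScript before triggering the download; the file name and contents are illustrative):
csv_text = "name,city\n张伟,北京\n"
with open("export.csv", "wb") as f:
    f.write(b"\xef\xbb\xbf")           # UTF-8-encoded BOM (U+FEFF)
    f.write(csv_text.encode("utf-8"))  # then the UTF-8 CSV content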

Exporting from Excel to CSV replaces Japanese characters with ??? even though Windows, Office locale is Japan/Japanese

I am exporting an Excel file (Excel 2016) containing Japanese characters to CSV. (Note: I am not exporting with the provided CSV UTF-8 option.) In the process, all Japanese characters are replaced with '?'.
My Windows/Office locale is Japan/Japanese, and the Windows/Office language and formats are all Japanese.
I understand that Excel uses a code page to save the CSV file in a particular encoding. My understanding was that this should be Shift-JIS (the default encoding for the Japanese locale). If that is so, why the loss of information and the replacement by '?'?
What encoding does Excel try to save the CSV in?
(FYI: if I try to open a CSV, Excel by default attempts to open it in Shift-JIS (code page 932), as expected.)
Note: I am aware of the workaround of using UTF-8. I am interested in understanding the above behavior, more than in a workaround.
Thanks
The character 縺 appears very often when you read a byte stream containing UTF-8-encoded hiragana characters and try to decode it as Shift-JIS (MS932) characters. FYI, CyberChef is handy for this kind of work: re-encode the garbled text as Shift-JIS bytes, then decode those bytes as UTF-8, and you will get the string まとづ…… as output from your string.
So in this situation, Excel 2016 seems to have written the CSV in UTF-8, and your text editor (or Excel 2016; how did you open the CSV?) seems to have read the CSV in Shift-JIS (MS932).
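The round trip is easy to reproduce in Python (the garbled sample string here is illustrative):
# UTF-8 bytes of hiragana, decoded as cp932, show the telltale 縺 pairs;
# reversing the two steps recovers the original text.
garbled = "縺ｾ縺ｨ縺･"
print(garbled.encode("cp932").decode("utf-8"))  # -> まとづ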

Cannot write British pound or euro symbols to CSV file - Node.js

I'm writing a CSV file which contains text with British pound and euro symbols; however, when I open the file in Excel, I see some rather odd behavior: a weird A-looking symbol before the British pound, and quotes instead of the euro symbol. I figured it's probably because Excel doesn't like a file that's UTF-8 encoded.
fs.writeFileAsync("the-file.csv", textContainingForeignCurrency, "utf8");
Does anyone know a way to get around this while creating the file? I don't want the users to have to do anything with excel after downloading the file, I just want them to be able to open the file and see the right symbols.
There shouldn't be any problem with Node writing the symbols to the file; if you open it with a text editor you should see the correct characters.
The problem is with Excel opening UTF-8 CSV files. By default it assumes ANSI encoding, so if the file is in UTF-8, it scrambles the characters. You can open the file correctly with the text import wizard.
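You can reproduce exactly the symbols from the question by decoding UTF-8 bytes as Windows-1252, e.g. in Python:
# £ (C2 A3) and € (E2 82 AC) in UTF-8, misread as cp1252:
print("£ €".encode("utf-8").decode("cp1252"))  # -> Â£ â‚¬
The "weird A-looking symbol" is the Â from the pound's lead byte, and the quote-like character is the ‚ from the euro's middle byte.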
In general this is a limitation of Excel. The best workaround for you will depend on your OS and Excel version. This is a heavily discussed topic; here are some good reads:
Is it possible to force Excel recognize UTF-8 CSV files automatically?
Which encoding opens CSV files correctly with Excel on both Mac and Windows?

What charset does Microsoft Excel use when saving files?

I have a Java app which reads CSV files that were created in Excel (e.g. 2007). Does anyone know what charset MS Excel uses to save these files in?
I would have guessed either:
windows-1255 (Cp1255)
ISO-8859-1
UTF8
but I am unable to decode extended characters (e.g. French accented letters) using any of these charsets.
From memory, Excel uses the machine-specific ANSI encoding. So this would be Windows-1252 for an EN-US installation, 1251 for Russian, etc.
CSV files could be in any format, depending on what encoding option was specified during the export from Excel (Save dialog > Tools button > Web Options item > Encoding tab).
UPDATE: Excel (including Office 2013) doesn't actually respect the web options selected in the "save as..." dialog, so this is a bug of some sort. I just use OpenOffice Calc now to open my XLSX files and export them as CSV files (edit filter settings, choose UTF-8 encoding).
Waking up this old thread... We are now in 2017. And still Excel is unable to save a simple spreadsheet into a CSV format while preserving the original encoding ... Just amazing.
Luckily Google Docs lives in the right century. The solution for me is just to open the spreadsheet using Google Docs, then download it back as CSV. The result is a correctly encoded CSV file (with all strings encoded in UTF-8).
I had a similar problem last week. I received a number of CSV files with varying encodings. Before importing into the database, I used the chardet library to automatically sniff out the correct encoding.
Chardet is a port of Mozilla's character detection engine and works really well, provided the sample size is large enough (one accented character will not do).
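A minimal sketch of that approach in Python (the file name is illustrative):
import chardet
# Sniff the encoding from the raw bytes, then decode before importing.
with open("unknown.csv", "rb") as f:
    raw = f.read()
guess = chardet.detect(raw)  # e.g. {'encoding': 'windows-1252', 'confidence': 0.87, ...}
text = raw.decode(guess["encoding"])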
The Russian edition of Excel offers CSV, CSV (Macintosh) and CSV (DOS) formats.
When saving in plain CSV, it uses windows-1251.
I just tried to save the French word Résumé along with Russian text; it saved it in hex as 52 3F 73 75 6D 3F, 3F being the ASCII code for the question mark.
When I opened the CSV file, the word, of course, had become unreadable (R?sum?).
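That hex dump is exactly what you get when a character has no mapping in the target code page; in Python:
# é does not exist in windows-1251, so it is replaced by "?" (0x3F).
print("Résumé".encode("cp1251", errors="replace"))  # -> b'R?sum?'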
Excel 2010 saves a UTF-16/UCS-2 TSV file if you select File > Save As > Unicode Text (.txt). The ".txt" suffix is forced, but you can change it to ".tsv".
If you need CSV, you can then convert the TSV file in a text editor like Notepad++, UltraEdit, Crimson Editor etc., replacing tabs with semicolons, commas or the like. Note that, e.g. for reading into a DB table, TSV often works fine already (and it is often easier to read manually).
If you need a different encoding like UTF-8, use one of the above-mentioned editors for converting.
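The conversion can also be scripted; here is a sketch in Python (file names are illustrative):
import csv
# Read Excel's UTF-16 "Unicode Text" export and rewrite it as UTF-8 CSV
# (with a BOM, so that Excel will reopen it correctly).
with open("export.txt", encoding="utf-16", newline="") as src, \
     open("export.csv", "w", encoding="utf-8-sig", newline="") as dst:
    csv.writer(dst).writerows(csv.reader(src, delimiter="\t"))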
cp1250 (the ANSI code page for Central European locales) is used extensively in Microsoft Office documents, including Word and Excel 2003.
http://en.wikipedia.org/wiki/Windows-1250
A simple way to confirm this would be to:
Create a spreadsheet with higher order characters, e.g. "Veszprém" in one of the cells;
Use your favourite scripting language to parse and decode the spreadsheet;
Look at what your script produces when you print out the decoded data.
Example Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::ParseExcel::Simple;
use Encode qw( decode );
# Emit decoded text as UTF-8 to avoid "Wide character in print" warnings.
binmode STDOUT, ':encoding(UTF-8)';
my $file  = "my_spreadsheet.xls";
my $xls   = Spreadsheet::ParseExcel::Simple->read($file);
my $sheet = [ $xls->sheets ]->[0];
while ($sheet->has_data) {
    my @data = $sheet->next_row;
    for my $datum (@data) {
        print decode('cp1250', $datum);
    }
}
While it is true that exporting an Excel file that contains special characters to CSV can be a pain in the ass, there is however a simple workaround: simply copy/paste the cells into a Google Docs spreadsheet and then save it as CSV from there.
You could use this Visual Studio VB.Net code to get the encoding:
Dim strEncodingName As String = String.Empty
' "True" turns on byte-order-mark detection; the stream must actually be
' read before CurrentEncoding reports the detected encoding.
Dim myStreamRdr As System.IO.StreamReader = New System.IO.StreamReader(myFileName, True)
Dim myString As String = myStreamRdr.ReadToEnd()
strEncodingName = myStreamRdr.CurrentEncoding.EncodingName
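Note that this detection relies on a byte-order mark: for a BOM-less file, StreamReader falls back to its UTF-8 default, so CurrentEncoding will report UTF-8 regardless of the file's actual encoding.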
You can create the CSV file using the encoding UTF-8 + BOM (https://en.wikipedia.org/wiki/Byte_order_mark).
The first three bytes are the BOM (0xEF, 0xBB, 0xBF), followed by the UTF-8 content.
OOXML files like those that come from Excel 2007 are encoded in UTF-8, according to Wikipedia. I don't know about CSV files, but it stands to reason they would use the same encoding...
