How to prevent Excel from changing csv file encoding after save? [duplicate]

This question already has answers here:
Excel to CSV with UTF8 encoding [closed]
I'm trying to implement a simple data import to my MVC application from .csv file.
To remove the need for the user to save a .csv template on his computer, and to spare him the trouble of getting it in the first place, I created a button on my form that generates the template.
I generate a string from my model object in a writer class:
this.Writer.GetCswTemplate() returns a string built from the view model (e.g. column1;column2\r\n)
As per How to GetBytes() in C# with UTF8 encoding with BOM?
I force excel to open .csv file with UTF-8 encoding:
var templateResult = Encoding.UTF8.GetBytes(this.Writer.GetCswTemplate());
var preamble = Encoding.UTF8.GetPreamble();
var templateBytes = preamble.Concat(templateResult).ToArray();
To send the generated template to the user I use the MVC File() helper:
return this.File(templateBytes, "application/csv", "filename.csv");
It works great: it generates the template, returns it to the user, Excel opens it and shows all the special characters in it. If I open the generated file in Notepad++ I can see that its encoding is UTF-8.
The problem occurs when a user fills the generated template and saves it inside Excel. For some reason Excel decides to change file encoding to ANSI.
Is there any way for me to prevent that? Did I miss something (Add some kind of header or something)?
Interestingly, if I generate the template as UTF-8 without a BOM, modify said file in Excel and save it, Excel does not change its encoding to ANSI. The problem then is that Excel does not recognize the special characters inside the template.

UTF-8 is an encoding that can represent any Unicode character. Unfortunately, not every application encodes files as UTF-8 by default, and Microsoft Excel is one of them: instead of Unicode, Excel saves CSV files using ANSI.
One would either need to ask the user to open the file in Notepad and save it in the correct format (too much work!) or work out some detection/conversion logic, as sketched below.
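A hedged sketch of such detection logic on the import side, assuming Windows-1252 for the ANSI case (adjust for your users' locale; the method name is made up):
using System.IO;
using System.Linq;
using System.Text;

static string ReadCsvTolerantly(Stream upload)
{
    byte[] bytes;
    using (var ms = new MemoryStream())
    {
        upload.CopyTo(ms);
        bytes = ms.ToArray();
    }

    byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
    bool hasBom = bytes.Length >= bom.Length && bytes.Take(bom.Length).SequenceEqual(bom);

    // With the BOM intact, the file is still the UTF-8 template we generated;
    // without it, Excel has most likely re-saved it as ANSI.
    Encoding enc = hasBom ? Encoding.UTF8 : Encoding.GetEncoding(1252);
    int offset = hasBom ? bom.Length : 0;
    return enc.GetString(bytes, offset, bytes.Length - offset);
}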

Related

Go generates incorrect UTF-8 chars like ö

I'm building a Go application where I want to output a CSV string from a buffer via the HTTP server.
I write it into the CSV buffer:
var buffer bytes.Buffer
resp := csv.NewWriter(&buffer)
resp.Write([]string{"Schröder"}) // csv.Writer.Write takes a []string record
Then I output it via the HTTP server:
resp.Flush()
w.Header().Set("Content-Type", "text/csv; charset=utf-8")
w.Write([]byte(buffer.String()))
When I then open my URL, a CSV file is downloaded and opened by Excel. In that Excel sheet the field value shows up as "SchrÃ¶der".
Any idea? I've already been stuck on this for a week.
The problem is not in Go but in Excel. The information that the data is encoded in UTF-8 is lost when saving the file, since there is no such thing as an encoding attribute on saved files.
Thus Excel just sees the plain data and has no information about the encoding. There are several tricks to make Excel make the right guess, like placing the proper byte order mark (BOM) at the start of the file. See Is it possible to force Excel recognize UTF-8 CSV files automatically?. But just specifying charset=utf-8 within the HTTP Content-Type header will not help, since Excel never sees this information.

CSV in UTF-8 and Microsoft Excel

In my application I have a list of items which can be exported to CSV.
For this, I create a Blob as follows:
var BOM = "\ufeff";
var blob = new Blob([csv], {
type: 'text/csv;charset=utf-8'
});
When the data in this list contains special characters, the exported file was not opened correctly in MS Excel. So I added a line to my code (the second line in the following snippet), as I found in many Q&A forums:
var BOM = "\ufeff";
var csv = BOM + csv;
var blob = new Blob([csv], {
type: 'text/csv;charset=utf-8'
});
That works - the CSV is opened correctly in Excel, but then, when saving the file, it is saved in text format and not as CSV, which means I need to "Save As" the file and change the default type if I want it to be saved correctly.
Is it really like this? Do I really have to choose between the two options - see the file or save it correctly?
Yes, it is a shame, but it really is like this. Excel saves CSV as ANSI encoded by default, and there is no direct possibility to save CSV in any Unicode encoding. Microsoft itself suggests using Notepad to change the encoding. See How to save an address book to a CSV file by using the UTF-8 encoding format so that the CSV file can be imported to Windows Mail. See also How can I save a csv with utf-8 encoding using Excel 2013?
The only other possibility is using VBA and creating the CSV file via ADODB.Stream or Scripting.FileSystemObject.
How to use ADODB.Stream to create a Unicode encoded CSV file has been answered multiple times already. For example: how to Export excel to csv file with "|" delimted and utf-8 code. Simply change the delimiter "|" to ",". This is the basic approach; you may have to extend it to provide a text delimiter as well, if the delimiter can be part of the data.
Using the CreateTextFile method of Scripting.FileSystemObject is simpler, but only allows "Unicode", which is UTF-16LE rather than UTF-8.
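For comparison outside VBA, here is a minimal .NET sketch of the UTF-16LE approach (file name and data are illustrative): Encoding.Unicode is UTF-16LE, and a StreamWriter constructed with it writes the FF FE BOM automatically, which is also what CreateTextFile's Unicode mode produces:
using System.IO;
using System.Text;

// Encoding.Unicode is UTF-16LE; StreamWriter emits its FF FE preamble automatically.
using (var writer = new StreamWriter("export.csv", false, Encoding.Unicode))
{
    writer.WriteLine("Column1\tColumn2"); // tab as separator works best for UTF-16 in Excel
    writer.WriteLine("Héllo\tWörld");
}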

Generated .csv file displays Â£ instead of £ symbol in csv file [duplicate]

I am programmatically exporting data (using PHP 5.2) into a .csv test file.
Example data: Numéro 1 (note the accented e).
The data is utf-8 (no prepended BOM).
When I open this file in MS Excel it displays as NumÃ©ro 1.
I am able to open this in a text editor (UltraEdit) which displays it correctly. UE reports the character is decimal 233.
How can I export text data in a .csv file so that MS Excel will correctly render it, preferably without forcing the use of the import wizard, or non-default wizard settings?
A correctly formatted UTF8 file can have a Byte Order Mark as its first three octets. These are the hex values 0xEF, 0xBB, 0xBF. These octets serve to mark the file as UTF8 (since they are not relevant as "byte order" information). If this BOM does not exist, the consumer/reader is left to infer the encoding type of the text. Readers that are not UTF8 capable will read the bytes as some other encoding such as Windows-1252 and display the characters  at the start of the file.
There is a known bug where Excel, upon opening UTF8 CSV files via file association, assumes that they are in a single-byte encoding, disregarding the presence of the UTF8 BOM. This cannot be fixed by any system default codepage or language setting. The BOM will not clue in Excel - it just won't work. (A minority report claims that the BOM sometimes triggers the "Import Text" wizard.) This bug appears to exist in Excel 2003 and earlier. Most reports (amidst the answers here) say that this is fixed in Excel 2007 and newer.
Note that you can always* correctly open UTF8 CSV files in Excel using the "Import Text" wizard, which allows you to specify the encoding of the file you're opening. Of course this is much less convenient.
Readers of this answer are most likely in a situation where they don't particularly support Excel < 2007, but are sending raw UTF8 text to Excel, which is misinterpreting it and sprinkling your text with Ã and other similar Windows-1252 characters. Adding the UTF8 BOM is probably your best and quickest fix.
If you are stuck with users on older Excels, and Excel is the only consumer of your CSVs, you can work around this by exporting UTF16 instead of UTF8. Excel 2000 and 2003 will double-click-open these correctly. (Some other text editors can have issues with UTF16, so you may have to weigh your options carefully.)
* Except when you can't, (at least) Excel 2011 for Mac's Import Wizard does not actually always work with all encodings, regardless of what you tell it. </anecdotal-evidence> :)
Prepending a BOM (\uFEFF) worked for me (Excel 2007), in that Excel recognised the file as UTF-8. Otherwise, saving it and using the import wizard works, but is less ideal.
Below is the PHP code I use in my project when sending Microsoft Excel to user:
/**
 * Export an array as downloadable Excel CSV
 * @param array $header
 * @param array $data
 * @param string $filename
 */
function toCSV($header, $data, $filename) {
    $sep = "\t";
    $eol = "\n";
    $csv = count($header) ? '"'. implode('"'.$sep.'"', $header).'"'.$eol : '';
    foreach ($data as $line) {
        $csv .= '"'. implode('"'.$sep.'"', $line).'"'.$eol;
    }
    $encoded_csv = mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
    header('Content-Description: File Transfer');
    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename="'.$filename.'.csv"');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');
    header('Content-Length: '. (strlen($encoded_csv) + 2)); // + 2 for the FF FE BOM echoed below
    echo chr(255) . chr(254) . $encoded_csv;
    exit;
}
UPDATED: Filename improvement and bug fix for correct length calculation. Thanks to TRiG and @ivanhoe011.
The answer for all combinations of Excel versions (2003 + 2007) and file types
Most other answers here concern their Excel version only and will not necessarily help you, because their answer just might not be true for your version of Excel.
For example, adding the BOM character introduces problems with automatic column separator recognition, but not with every Excel version.
There are 3 variables that determine whether it works in most Excel versions:
Encoding
BOM character presence
Cell separator
Somebody stoic at SAP tried every combination and reported the outcome. End result? Use UTF-16LE with BOM and the tab character as separator to have it work in most Excel versions.
You don't believe me? I wouldn't either, but read here and weep: http://wiki.sdn.sap.com/wiki/display/ABAP/CSV+tests+of+encoding+and+column+separator
Echo the UTF-8 BOM before outputting CSV data. This fixes all character issues in Windows but doesn't work for Mac.
echo "\xEF\xBB\xBF";
It works for me because I need to generate a file which will be used on Windows PCs only.
Select UTF-8 encoding when importing. If you use Office 2007, this is where you choose it:
right after you open the file.
UTF-8 doesn't work for me in Office 2007 without any service pack, with or without BOM (U+FEFF or 0xEF,0xBB,0xBF; neither works). Installing SP3 makes UTF-8 work when the 0xEF,0xBB,0xBF BOM is prepended.
UTF-16 works when encoding in Python using "utf-16-le" with a 0xFF 0xFE BOM prepended, and using tab as separator. I had to manually write out the BOM, and then use "utf-16-le" rather than "utf-16", otherwise each encode() prepended the BOM to every row written out, which appeared as garbage on the first column of the second line and after.
I can't tell whether UTF-16 would work without any SP installed, since I can't go back now. Sigh.
This is on Windows; I don't know about Office for Mac.
For both working cases, the import works when launching a download directly from the browser and the text import wizard doesn't intervene; it works like you would expect.
As Fregal said, \uFEFF is the way to go.
<%@LANGUAGE="JAVASCRIPT" CODEPAGE="65001"%>
<%
Response.Clear();
Response.ContentType = "text/csv";
Response.Charset = "utf-8";
Response.AddHeader("Content-Disposition", "attachment; filename=excelTest.csv");
Response.Write("\uFEFF");
// csv text here
%>
I've also noticed that the question was "answered" some time ago, but I don't understand the stories saying you can't open a UTF-8-encoded CSV file successfully in Excel without using the text wizard.
My reproducible experience:
Type Old MacDonald had a farm,ÈÌÉÍØ into Notepad, hit Enter, then Save As (using the UTF-8 option).
Using Python to show what's actually in there:
>>> open('oldmac.csv', 'rb').read()
'\xef\xbb\xbfOld MacDonald had a farm,\xc3\x88\xc3\x8c\xc3\x89\xc3\x8d\xc3\x98\r\n'
>>> ^Z
Good. Notepad has put a BOM at the front.
Now go into Windows Explorer, double click on the file name, or right click and use "Open with ...", and up pops Excel (2003) with display as expected.
You can save an html file with the extension 'xls' and accents will work (pre 2007 at least).
Example: save this (using Save As utf8 in Notepad) as test.xls:
<html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<table>
<tr>
<th>id</th>
<th>name</th>
</tr>
<tr>
<td>4</td>
<td>Hélène</td>
</tr>
</table>
</html>
This is just a question of character encodings. It looks like you're exporting your data as UTF-8: é in UTF-8 is the two-byte sequence 0xC3 0xA9, which when interpreted in Windows-1252 is Ã©. When you import your data into Excel, make sure to tell it that the character encoding you're using is UTF-8.
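That byte arithmetic is easy to verify with a small sketch (purely illustrative C#; the encodings are standard .NET objects):
using System;
using System.Text;

byte[] utf8 = Encoding.UTF8.GetBytes("é"); // two bytes: 0xC3 0xA9
string mangled = Encoding.GetEncoding(1252).GetString(utf8);
Console.WriteLine(mangled); // prints "Ã©" - the same mangling Excel produces when it guesses ANSI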
The CSV format is implemented as ASCII, not Unicode, in Excel, thus mangling the diacritics. We experienced the same issue, which is how I tracked down that the official CSV standard was defined as being ASCII-based in Excel.
Excel 2007 properly reads UTF-8 with BOM (EF BB BF) encoded csv.
Excel 2003 (and maybe earlier) reads UTF-16LE with BOM (FF FE), but with TABs instead of commas or semicolons.
I can only get CSV to parse properly in Excel 2007 as tab-separated little-endian UTF-16 starting with the proper byte order mark.
Writing a BOM to the output CSV file actually did work for me in Django:
def handlePersoonListExport(request):
    # Retrieve a query_set
    ...
    template = loader.get_template("export.csv")
    context = Context({
        'data': query_set,
    })
    response = HttpResponse()
    response['Content-Disposition'] = 'attachment; filename=export.csv'
    response['Content-Type'] = 'text/csv; charset=utf-8'
    response.write("\xEF\xBB\xBF")
    response.write(template.render(context))
    return response
For more info http://crashcoursing.blogspot.com/2011/05/exporting-csv-with-special-characters.html Thanks guys!
Another solution I found was just to encode the result as Windows Code Page 1252 (Windows-1252 or CP1252). This would be done, for example by setting Content-Type appropriately to something like text/csv; charset=Windows-1252 and setting the character encoding of the response stream similarly.
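A sketch of that approach in classic ASP.NET terms (Response and csvString are assumed context; note that CP1252 cannot represent all characters, so this is lossy outside Western European text):
using System.Text;

// Encode the payload itself as Windows-1252 so Excel's ANSI assumption is actually true.
byte[] body = Encoding.GetEncoding(1252).GetBytes(csvString);
Response.ContentType = "text/csv; charset=Windows-1252";
Response.BinaryWrite(body);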
Note that including the UTF-8 BOM is not necessarily a good idea - Mac versions of Excel ignore it and will actually display the BOM as ASCII… three nasty characters () at the start of the first field in your spreadsheet…
Check the encoding in which you are generating the file; to make Excel display the file correctly you must use the system default codepage.
Which language are you using? If it's .NET, you only need to use Encoding.Default while generating the file.
If you have legacy code in vb.net like I have, the following code worked for me:
Response.Clear()
Response.ClearHeaders()
Response.ContentType = "text/csv"
Response.Expires = 0
Response.AddHeader("Content-Disposition", "attachment; filename=export.csv;")
Using sw As StreamWriter = New StreamWriter(Context.Response.OutputStream, System.Text.Encoding.Unicode)
    sw.Write(csv)
    sw.Close()
End Using
Response.End()
I've found a way to solve the problem. This is a nasty hack but it works: open the doc with OpenOffice, then save it into any Excel format; the resulting .xls or .xlsx will display the accented characters.
With Ruby 1.8.7 I encode every field to UTF-16 and discard BOM (maybe).
The following code is extracted from active_scaffold_export:
<%
  require 'fastercsv'
  fcsv_options = {
    :row_sep => "\n",
    :col_sep => params[:delimiter],
    :force_quotes => @export_config.force_quotes,
    :headers => @export_columns.collect { |column| format_export_column_header_name(column) }
  }
  data = FasterCSV.generate(fcsv_options) do |csv|
    csv << fcsv_options[:headers] unless params[:skip_header] == 'true'
    @records.each do |record|
      csv << @export_columns.collect { |column|
        # Convert to UTF-16 discarding the BOM, required for Excel (> 2003 ?)
        Iconv.conv('UTF-16', 'UTF-8', get_export_column_value(record, column))[2..-1]
      }
    end
  end
-%><%= data -%>
The important line is:
Iconv.conv('UTF-16', 'UTF-8', get_export_column_value(record, column))[2..-1]
Open the .csv file with Notepad++.
Click on Encoding, select "Convert to UTF-8" (not "Convert to UTF-8 (without BOM)").
Save.
Open by double click with Excel.
Hope that helps.

Is it possible to force Excel recognize UTF-8 CSV files automatically?

I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at all levels. But opening such CSV files (containing e.g. diacritics, Cyrillic letters, Greek letters) in Excel does not achieve the expected results, showing something like Г„/Г¤ and Г–/Г¶. And I don't know how to force Excel to understand that the open CSV file is encoded in UTF-8. I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that.
Is there any workaround?
P.S. Which tools may potentially behave like Excel does?
UPDATE
I have to say that I confused the community with the formulation of the question. When I asked this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user, in a fluent and transparent way. However, I used a wrong formulation, asking for doing it automatically. That is very confusing, and it clashes with VBA macro automation. There are two answers to this question that I appreciate the most: the very first answer by Alex https://stackoverflow.com/a/6002338/166589, which I accepted, and the second one by Mark https://stackoverflow.com/a/6488070/166589, which appeared a little later. From the usability point of view, Excel seemed to lack good user-friendly UTF-8 CSV support, so I consider both answers correct, and I accepted Alex's answer first because it really stated that Excel was not able to do that transparently. That is what I confused with automatically here. Mark's answer promotes a more complicated way for more advanced users to achieve the expected result. Both answers are great, but Alex's fits my not clearly specified question a little better.
UPDATE 2
Five months after the last edit, I've noticed that Alex's answer has disappeared for some reason. I really hope it wasn't a technical issue, and I hope there is no more discussion on which answer is greater now. So I'm accepting Mark's answer as the best one.
Alex is correct, but as you have to export to csv, you can give the users this advice when opening the csv files:
Save the exported file as a csv
Open Excel
Import the data using Data-->Import External Data --> Import Data
Select the file type of "csv" and browse to your file
In the import wizard change the File_Origin to "65001 UTF" (or choose correct language character identifier)
Change the Delimiter to comma
Select where to import to and Finish
This way the special characters should show correctly.
The UTF-8 Byte-order mark will clue Excel 2007+ in to the fact that you're using UTF-8. (See this SO post).
In case anybody is having the same issues I was: .NET's UTF8 encoding class does not output a byte-order mark in a GetBytes() call. You need to use streams (or use a workaround) to get the BOM output.
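Both routes, sketched (csv here is an assumed string variable):
using System.IO;
using System.Linq;
using System.Text;

// Workaround: concatenate the preamble yourself...
byte[] withBom = Encoding.UTF8.GetPreamble().Concat(Encoding.UTF8.GetBytes(csv)).ToArray();

// ...or use a stream: a StreamWriter constructed with Encoding.UTF8 writes the BOM itself.
using (var writer = new StreamWriter("out.csv", false, Encoding.UTF8))
{
    writer.Write(csv);
}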
The bug with ignored BOM seems to be fixed for Excel 2013. I had same problem with Cyrillic letters, but adding BOM character \uFEFF did help.
It is incredible that there are so many answers but none answers the question:
"When I was asking this question, I asked for a way of opening a UTF-8
CSV file in Excel without any problems for a user,..."
The answer marked as the accepted answer with 200+ up-votes is useless for me because I don't want to give my users a manual on how to configure Excel.
Apart from that: this manual would apply to one Excel version, but other Excel versions have different menus and configuration dialogs. You would need a manual for each Excel version.
So the question is how to make Excel show UTF8 data with a simple double click?
Well at least in Excel 2007 this is not possible if you use CSV files because the UTF8 BOM is ignored and you will see only garbage. This is already part of the question of Lyubomyr Shaydariv:
"I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."
I had the same experience: writing Russian or Greek data into a UTF-8 CSV file with BOM results in garbage in Excel:
Content of UTF8 CSV file:
Colum1;Column2
Val1;Val2
Авиабилет;Tλληνικ
Result in Excel 2007:
A solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that it depends on the region settings in the control panel whether comma or semicolon is used as separator. So the same CSV file may open correctly on one computer but not on another. "CSV" means "Comma Separated Values", but for example on a German Windows, by default semicolon must be used as separator, while comma does not work. (Here it should be named SSV = Semicolon Separated Values.) CSV files cannot be interchanged between different language versions of Windows. This is an additional problem on top of the UTF-8 problem.
Excel has existed for decades. It is a shame that Microsoft was not able to implement such a basic thing as CSV import in all these years.
However, if you put the same values into a HTML file and save that file as UTF8 file with BOM with the file extension XLS you will get the correct result.
Content of UTF8 XLS file:
<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>
Result in Excel 2007:
You can even use colors in HTML which Excel will show correctly.
<style>
.Head { background-color:gray; color:white; }
.Red { color:red; }
</style>
<table border=1>
<tr><td class=Head>Colum1</td><td class=Head>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td class=Red>Авиабилет</td><td class=Red>Tλληνικ</td></tr>
</table>
Result in Excel 2007:
In this case only the table itself has a black border and lines. If you want ALL cells to display gridlines this is also possible in HTML:
<html xmlns:x="urn:schemas-microsoft-com:office:excel">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>MySuperSheet</x:Name>
<x:WorksheetOptions>
<x:DisplayGridlines/>
</x:WorksheetOptions>
</x:ExcelWorksheet>
</x:ExcelWorksheets>
</x:ExcelWorkbook>
</xml>
</head>
<body>
<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>
</body>
</html>
This code even allows you to specify the name of the worksheet (here "MySuperSheet").
Result in Excel 2007:
We have used this workaround:
Convert CSV to UTF-16 LE
Insert BOM at beginning of file
Use tab as field separator
Had the same problems with PHP-generated CSV files.
Excel ignored the BOM when the Separator was defined via "sep=,\n" at the beginning of the content (but of course after the BOM).
So adding a BOM ("\xEF\xBB\xBF") at the beginning of the content and setting the semicolon as separator via fputcsv($fh, $data_array, ";"); does the trick.
You can convert .csv file to UTF-8 with BOM via Notepad++:
Open the file in Notepad++.
Go to menu Encoding→Convert to UTF-8-BOM.
Go to menu File→Save.
Close Notepad++.
Open the file in Excel .
Worked in Microsoft Excel 2013 (15.0.5093.1000) MSO (15.0.5101.1000) 64-bit from Microsoft Office Professional Plus 2013 on Windows 8.1 with locale for non-Unicode programs set to "German (Germany)".
Old question but heck, the simplest solution is:
Open CSV in Notepad
Save As -> select the right encoding
Open the new file
I have had the same issue in the past (how to produce files that Excel can read, and other tools can also read). I was using TSV rather than CSV, but the same problem with encodings came up.
I failed to find any way to get Excel to recognize UTF-8 automatically, and I was not willing/able to inflict on the consumers of the files complicated instructions how to open them. So I encoded them as UTF-16le (with a BOM) instead of UTF-8. Twice the size, but Excel can recognize the encoding. And they compress well, so the size rarely (but sadly not never) matters.
As I posted on http://thinkinginsoftware.blogspot.com/2017/12/correctly-generate-csv-that-excel-can.html:
Tell the software developer in charge of generating the CSV to correct it. As a quick workaround you can use gsed to insert the UTF-8 BOM at the beginning of the string:
gsed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' file.csv
This command inserts the UTF-8 BOM if not present; therefore it is an idempotent command. Now you should be able to double-click the file and open it in Excel.
In PHP you just prepend $bom to your $csv_string:
$bom = sprintf( "%c%c%c", 239, 187, 191); // EF BB BF
file_put_contents( $file_name, $bom . $csv_string );
Tested with MS Excel 2016, php 7.2.4
Simple VBA macro for opening UTF-8 text and CSV files:
Sub OpenTextFile()
    Dim filetoopen As Variant
    filetoopen = Application.GetOpenFilename("Text Files (*.txt;*.csv), *.txt;*.csv")
    ' GetOpenFilename returns False when the user cancels
    If VarType(filetoopen) = vbBoolean Then Exit Sub
    Workbooks.OpenText Filename:=filetoopen, _
        Origin:=65001, DataType:=xlDelimited, Comma:=True
End Sub
Origin:=65001 is UTF-8. Comma:=True splits .csv files into columns.
Save it in Personal.xlsb to have it always available. Personalise the Excel toolbar by adding a macro call button and open files from there.
You can add more formatting to the macro, like column autofit, alignment, etc.
Just to help users who, like me, reached this thread and are interested in opening the file in Excel:
I have used the wizard below and it worked fine for me, importing a UTF-8 file.
Not transparent, but useful if you already have the file.
Open Microsoft Excel 2007.
Click on the Data menu bar option.
Click on the From Text icon.
Navigate to the location of the file that you want to import. Click on the filename and then click on the Import button. The Text Import Wizard - Step 1 of 3 window will now appear on the screen.
Choose the file type that best describes your data - Delimited or Fixed Width.
Choose 65001: Unicode (UTF-8) from the drop-down list that appears next to File origin.
Click on the Next button to display the Text Import Wizard - Step 2 of 3 window.
Place a checkmark next to the delimiter that was used in the file you wish to import into Microsoft Excel 2007. The Data preview window will show you how your data will appear based on the delimiter that you chose.
Click on the Next button to display the Text Import Wizard - Step 3 of 3.
Choose the appropriate data format for each column of data that you want to import. You also have the option to not import one or more columns of data if you want.
Click on the Finish button to finish importing your data into Microsoft Excel 2007.
Source: https://www.itg.ias.edu/content/how-import-csv-file-uses-utf-8-character-encoding-0
A truly amazing list of answers, but since one pretty good one is still missing, I'll mention it here: open the CSV file with Google Sheets and save it back to your local computer as an Excel file.
In contrast to Microsoft, Google has managed to support UTF-8 CSV files, so it just works to open the file there. And the export to Excel format also just works. So even though this may not be the preferred solution for all, it is pretty fail-safe, and the number of clicks is not as high as it may sound, especially when you're already logged into Google anyway.
This is my working solution:
vbFILEOPEN = "your_utf8_file.csv"
Workbooks.OpenText Filename:=vbFILEOPEN, DataType:=xlDelimited, Semicolon:=True, Local:=True, Origin:=65001
The key is Origin:=65001
Yes it is possible. When writing the stream creating the csv, the first thing to do is this:
myStream.Write(Encoding.UTF8.GetPreamble(), 0, Encoding.UTF8.GetPreamble().Length)
Yes, this is possible. As previously noted by multiple users, there seems to be a problem with excel reading the correct Byte Order Mark when the file is encoded in UTF-8. With UTF-16 it does not seem to have a problem, so it is endemic to UTF-8. The solution I use for this is adding the BOM, TWICE. For this I execute the following sed command twice:
sed -I '1s/^/\xef\xbb\xbf/' *.csv
, where the wildcard can be replaced with any file name. However, this leads to a mutation of the sep= at the beginning of the .csv file. The .csv file will then open normally in Excel, but with an extra row with "sep=" in the first cell.
The "sep=" can also be removed in the source .csv itself, but when opening the file with VBA the delimiter should be specified:
Workbooks.Open(name, Format:=6, Delimiter:=";", Local:=True)
Format 6 is the .csv format. Set Local to True in case there are dates in the file; if Local is not set to True, the dates will be Americanized, which in some cases will corrupt the .csv format.
This is not accurately addressing the question, but since I stumbled across this and the above solutions didn't work for me or had requirements I couldn't meet, here is another way to add the BOM when you have access to vim:
vim -e -s +"set bomb|set encoding=utf-8|wq" filename.csv
Hi, I'm using Ruby on Rails for CSV generation. In our application we plan to go multi-language (I18n), and we faced an issue while viewing I18n content in the CSV file in Windows Excel.
It was fine with Linux (Ubuntu) and Mac.
We identified that Windows Excel needs the data to be imported again to view the actual data. During import we get more options to choose the character set.
But this can't be taught to each and every user, so the solution we were looking for is to open it just by double click.
Then we identified the way of showing data, using the open mode and BOM in Windows Excel, with the help of the aghuddleston gist. Added at reference.
Example I18n content
In Mac and Linux
Swedish : Förnamn
English : First name
In Windows
Swedish : Förnamn
English : First name
def user_information_report(report_file_path, user_id)
  user = User.find(user_id)
  I18n.locale = user.current_lang
  open_mode = "w+:UTF-16LE:UTF-8"
  bom = "\xEF\xBB\xBF"
  body user, open_mode, bom
end

def headers
  headers = [
    "ID", "SDN ID",
    I18n.t('sys_first_name'), I18n.t('sys_last_name'), I18n.t('sys_dob'),
    I18n.t('sys_gender'), I18n.t('sys_email'), I18n.t('sys_address'),
    I18n.t('sys_city'), I18n.t('sys_state'), I18n.t('sys_zip'),
    I18n.t('sys_phone_number')
  ]
end

def body tenant, open_mode, bom
  File.open(report_file_path, open_mode) do |f|
    csv_file = CSV.generate(col_sep: "\t") do |csv|
      csv << headers
      tenant.patients.find_each(batch_size: 10) do |patient|
        csv << [
          patient.id, patient.patientid,
          patient.first_name, patient.last_name, "#{patient.dob}",
          "#{translate_gender(patient.gender)}", patient.email, "#{patient.address_1.to_s} #{patient.address_2.to_s}",
          "#{patient.city}", "#{patient.state}", "#{patient.zip}",
          "#{patient.phone_number}"
        ]
      end
    end
    f.write bom
    f.write(csv_file)
  end
end
The important things to note here are the open mode and the BOM:
open_mode = "w+:UTF-16LE:UTF-8"
bom = "\xEF\xBB\xBF"
Before writing the CSV, insert the BOM:
f.write bom
f.write(csv_file)
Windows and Mac
The file can be opened directly by double clicking.
Linux (Ubuntu)
While opening the file, you are asked for the separator options -> choose "TAB".
Download & install LibreOffice Calc
Open the csv file of your choice in LibreOffice Calc
Thank the heavens that an import text wizard shows up...
...select your delimiter and character encoding options
Select the resulting data in Calc and copy paste to Excel
I faced the same problem a few days ago and could not find any solution, because I cannot use the import-from-CSV feature, since it makes everything be styled as text.
My solution was to first open the file with Notepad++ and change the encoding to ANSI.
Then I just opened the file in Excel and it worked as expected.
Working solution for Office 365
save in UTF-16 (plain "UTF-16", not LE or BE)
use separator \t
Code in PHP
$header = ['číslo', 'vytvořeno', 'ěščřžýáíé'];
$fileName = 'excel365.csv';
$fp = fopen($fileName, 'w');
fputcsv($fp, $header, "\t");
fclose($fp);
$handle = fopen($fileName, "r");
$contents = fread($handle, filesize($fileName));
$contents = iconv('UTF-8', 'UTF-16', $contents);
fclose($handle);
$handle = fopen($fileName, "w");
fwrite($handle, $contents);
fclose($handle);
This is an old question, but I've just encountered a similar problem and the solution may help others:
I had the same issue where writing out CSV text data to a file, then opening the resulting .csv in Excel, shifted all the text into a single column. After having a read of the above answers, I tried the following, which seems to sort the problem out.
Apply an encoding of UTF-8 when you create your StreamWriter. That's it.
Example:
using (StreamWriter output = new StreamWriter(outputFileName, false, Encoding.UTF8, 2 << 22)) {
    /* ... do stuff .... */
    output.Close();
}
If you want to make it fully automatic, one click, or to load automatically into Excel from, say, a web page, but can't generate proper Excel files, then I would suggest looking at the SYLK format as an alternative. OK, it is not as simple as CSV, but it is text based, very easy to implement, and it supports UTF-8 with no issues.
I wrote a PHP class that receives the data and outputs a SYLK file which will open directly in Excel by just clicking the file (or will auto-launch Excel if you write the file to a web page with the correct MIME type). You can even add formatting (like bold, format numbers in particular ways, etc.) and change column sizes, or auto-size columns to the text in the columns, and all in all the code is probably not more than about 100 lines.
It is dead easy to reverse-engineer SYLK by creating a simple spreadsheet, saving it as SYLK, and then reading it with a text editor. The first block contains headers and standard number formats that you will recognise (which you just regurgitate in every file you create); then the data is simply an X/Y coordinate and a value.
I am generating csv files from a simple C# application and had the same problem. My solution was to ensure the file is written with UTF8 encoding, like so:
// Use UTF8 encoding so that Excel is ok with accents and such.
using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
{
    SaveCSV(writer);
}
I originally had the following code, with which accents look fine in Notepad++ but were getting mangled in Excel:
using (StreamWriter writer = new StreamWriter(path))
{
    SaveCSV(writer);
}
Your mileage may vary - I'm using .NET 4 and Excel from Office 365.
I tried everything I could find on this thread and similar; nothing worked fully. However, importing to Google Sheets and simply downloading as CSV worked like a charm. Try it out if you reach my frustration point.
It's March 2022, and it seems we cannot use both a BOM and the sep=... line.
Adding the sep=\t or similar, makes Excel ignore the BOM.
Using a semicolon seems to be a default Excel understands, in which case we can skip the sep=... line and it works.
This is Microsoft 365 with Excel version 2110 build 14527.20276.
Found a solution for ASP.NET Core to download CSVs as UTF-8 with BOM:
byte[] csvBytes = Encoding.Default.GetBytes(csvString); // on .NET Core, Encoding.Default is UTF-8
UTF8Encoding utf8 = new UTF8Encoding(true);
byte[] bom = utf8.GetPreamble();
var result = bom.Concat(csvBytes).ToArray();
return new FileContentResult(result, MediaTypeHeaderValue.Parse("text/csv; charset=utf-8"));
Excel then recognizes the downloaded CSV file as UTF-8.
Just sharing a comprehensive function that might make your life easier working with CSV files... please note the last function argument in relation to this topic.
function array2csv($data, $file = '', $download = true, $mode = 'w+', $delimiter = ',', $enclosure = '"', $escape_char = "\\", $addUnicodeBom = false)
{
    $return = false;
    if ($file == '') {
        $f = fopen('php://memory', 'r+');
    } else {
        $f = fopen($file, $mode);
    }
    if ($addUnicodeBom) {
        $utf8_with_bom = chr(239) . chr(187) . chr(191);
        fwrite($f, $utf8_with_bom);
    }
    foreach ($data as $line => $item) {
        fputcsv($f, $item, $delimiter, $enclosure, $escape_char);
    }
    rewind($f);
    if ($download == true) {
        $return = stream_get_contents($f);
    } else {
        $return = true;
    }
    return $return;
}
First save the Excel spreadsheet as Unicode text. Open the TXT file using Internet Explorer and click "Save as" - TXT encoding - choose the appropriate encoding, i.e. for Win Cyrillic 1251.
