XLS to CSV with Chinese character support - Excel

When I save an Excel file containing Chinese characters as CSV, the characters are converted to ??? (question marks).
Please let me know if you have a solution for this. Saving as Unicode Text worked fine, but saving as .csv does not.
Thanks

I had a similar problem with Japanese characters before. At the time, Excel 2003 only exported CSV as Latin-1 (or maybe Windows-1252). I basically wrote my own Excel macro to iterate over the rows and columns and build up an in-memory string of what the CSV file would look like. Then I used an ADODB.Stream to save it myself. This sample code should get you started.
Dim csvdata As String
Dim CRLF As String
Dim objStream As Object
CRLF = Chr(13) & Chr(10)
csvdata = """key"",""value""" + CRLF
csvdata = csvdata + """a"",""a""" + CRLF
csvdata = csvdata + """aacute"",""á""" + CRLF
Set objStream = CreateObject("ADODB.Stream")
objStream.Open
objStream.Position = 0
objStream.Charset = "UTF-8"
objStream.WriteText csvdata
objStream.SaveToFile "test.csv", 2 ' adSaveCreateOverWrite
objStream.Close

Why do you need a CSV file? What encoding do you need it in? UTF-8? GBK? What software is going to read the CSV file? What version of Excel are you using?
If you know Python, you could use the xlrd module to read the Excel file, format the data, encode it, and write it to a CSV file, or use it to update a database, or whatever.
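As a rough illustration of the Python route, here is a minimal sketch using only the standard library. It assumes the rows have already been read from the workbook (for example with xlrd) into a list of lists; the sample rows, the function name write_csv_utf8 and the output path are made up for the example. The "utf-8-sig" codec writes a UTF-8 BOM before the data, which generally helps Excel recognise the file as UTF-8 when it is opened again.

```python
import csv
import os
import tempfile

def write_csv_utf8(rows, path):
    """Write rows (a list of lists of strings) as UTF-8 CSV with a BOM."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)

# Hypothetical sample data standing in for cells read from the workbook.
rows = [["key", "value"], ["greeting", "你好"]]
csv_path = os.path.join(tempfile.gettempdir(), "chinese_example.csv")
write_csv_utf8(rows, csv_path)
```

If the consuming program expects GBK instead, the same sketch should work with encoding="gbk", provided every character in the data exists in that code page.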

Related

How to change encoding from UTF-8 to UTF-8-BOM of exported *.txt files from Excel?

Text files exported from Excel are encoded as UTF-8.
I need them encoded as UTF-8 with a BOM.
I think a line has to be inserted in the code, written like this in Java:
<?xml version="1.0" encoding="UTF-8"?>
(see "Jasperreport CSV UTF-8 without BOM instead of UTF-8")
or like this in HTML5:
<meta charset="utf-8">
(see "Bad UTF-8 without BOM encoding")
Sub export_data()
Dim row, column, i, j As Integer
Dim fullPath, myFile As String
fullPath = "C:\Workspace"
row = 21
column = 5
For i = 1 To column
myFile = Cells(1, i).Value + ".txt"
myFile = fullPath + "/" + myFile
Open myFile For Output As #1
For j = 2 To row
Print #1, Cells(j, i).Value
Next j
Close #1
Next i
End Sub
Where and how do I add a line that sets the encoding to UTF-8 with a BOM?
Thank you.
Instead of printing the file line by line, it might be more efficient to either save your selected range as CSV UTF-8 (you might need to change the file type after saving) or use ADO to process the file as UTF-8.
Either will add a BOM automatically.
EDIT
If you are unfamiliar, you could perform the save to csv - utf8 process manually with the macro recorder turned on. Then examine what you have recorded and make appropriate edits.
Another way of adding the BOM, in the context of your existing code, would be to write it directly as a byte array to the first line.
For example:
Dim BOM(0 To 2) As Byte 'EF BB BF
BOM(0) = &HEF
BOM(1) = &HBB
BOM(2) = &HBF
Open myFile For Binary Access Write As #1
Put #1, 1, BOM
Close #1
will put the BOM at the beginning of the file.
You should then change the mode in your subsequent Print code to Append.
I suggest you read about the pros and cons of using Print vs Write
You should also read about declaration statements. In yours, only the last variable on each line is being declared as the specified type; the preceding variables are being implicitly declared as being of type Variant.
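For comparison, the same "write the BOM as bytes, then append the text" idea can be sketched in Python. This is only an illustration of the technique, not a replacement for the VBA; the function name write_with_bom, the sample lines and the output path are invented for the example.

```python
import codecs
import os
import tempfile

def write_with_bom(path, lines):
    """Write the UTF-8 BOM as raw bytes, then append the text lines."""
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8)  # the three bytes EF BB BF
    # Reopen in append mode so the BOM written above is preserved.
    # newline="\r\n" turns each "\n" into CRLF, as Print # does on Windows.
    with open(path, "a", encoding="utf-8", newline="\r\n") as f:
        for line in lines:
            f.write(line + "\n")

bom_path = os.path.join(tempfile.gettempdir(), "bom_example.txt")
write_with_bom(bom_path, ["first line", "äöüß"])
```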

Editing a Hex Stream in VBA Before Saving to File

Background: I am extracting a file that is saved in a SQL database using an ADODB connection. One column of the database is the filename, including the file extension, and another is the actual contents of the file as a hex stream. I would like to save this file and open it.
Problem: This works fine with .pdf files. However, .png files are corrupted when I try to open them. I used a hex editor (HxD) and noticed excess bytes at the start of the file. If I remove these, the file opens fine. The hex stream should begin with the "per mille" character (Chr(137) in Excel) for the file to open. I have not found a way to edit the hex stream in Excel without converting it to characters.
The .png file opens with no problem when I take out the first few characters using a hex editor so that the file begins with:
‰PNG
Or the equivalent in hex code:
89 50 4E 47
The excess characters are
ÿþ
Or the equivalent in hex code:
FF FE
(I am trying to remove these). These characters are in the saved file even when I remove 4 characters from the text string using
Content = Right(Content, Len(Content) - 4)
It's almost like they automatically get added before the string when I save the file.
Code:
Calling the save to file function, where Content is the file content and Name is the filename:
Call StringToTextFile("C:\", rst![Content], rst![Name])
The function is below:
Public Sub StringToTextFile(ByVal directory As String, ByVal Content As String, ByVal filename As String)
'Requires reference to scrrun.dll
Dim fso As Scripting.FileSystemObject
Dim ts As Scripting.TextStream
Set fso = New Scripting.FileSystemObject
If Right(filename, 4) = ".png" Then 'recognizing the .png file
'Content = CByte(Chr(137)) & Right(Content, Len(Content)) 'unsuccessful attempt at inserting "per mille" character
'iret = InStr(0, Chr(137), Content, vbBinaryCompare) 'unsuccessful attempt at finding the "per mille" character in the content
End If
Set ts = fso.CreateTextFile(directory & filename, True, True)
ts.Write Content
ts.Close
Dim myShell As Object
Set myShell = CreateObject("WScript.Shell")
myShell.Run directory & filename 'Open the saved file
End Sub
When I try to insert the "per mille" character using Chr(137) it just shows a blank space in the hex editor.
Any help is appreciated!
This seems to be a similar discussion, but I am unsure how to apply this to my case:
excel-vba-hex-to-ascii-error
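For what it is worth, FF FE is the UTF-16 little-endian byte order mark, and the third argument of CreateTextFile (True) requests a Unicode file, so the text stream is the likely source of the extra bytes. Binary content generally needs to be written as bytes rather than as text. The Python sketch below does not fix the VBA above; it only illustrates the effect, and the helper name and sample bytes are invented for the example.

```python
import codecs

# The first eight bytes of every PNG file: 89 50 4E 47 0D 0A 1A 0A.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

def strip_utf16_le_bom(data: bytes) -> bytes:
    """Return data without a leading FF FE marker, if one is present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):]
    return data

# Hypothetical file content: a BOM followed by the start of a PNG file.
damaged = codecs.BOM_UTF16_LE + PNG_SIGNATURE
repaired = strip_utf16_le_bom(damaged)
```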

VBA Excel Macro write file using UTF-8 encoding [duplicate]

I'm creating a macro in Excel that processes a spreadsheet and writes the content (text) to a file. I need this file to be encoded as UTF-8. I've tried opening the file as Unicode using OpenTextFile(... TristateTrue) and StrConv(.. vbUnicode), but those only produce UTF-16. I've searched online and can't find anything. Is this even possible?
Thanks
Dim fsT As Object
Set fsT = CreateObject("ADODB.Stream")
fsT.Type = 2 ' adTypeText: we want to save text/string data
fsT.Charset = "utf-8" ' encoding used when the text is written out
fsT.Open ' open the stream before writing to it
fsT.WriteText "special characters: äöüß"
fsT.SaveToFile sFileName, 2 ' adSaveCreateOverWrite; sFileName holds the target path
fsT.Close
Reference:
Save text file UTF-8 encoded with VBA

Exporting a Microsoft Access table to UTF-16 CSV

I have an Access table with some Chinese characters that I need to export into a CSV file with UTF-16 encoding. If this is not possible, I could also try exporting the table into an XLS or CSV file, and then convert the encoding to UTF-16.
I have a feeling there is no simple way of doing this using Access and/or Excel and/or VBA, but if there is, I would love to hear it! If not, a solution using Java would be helpful.
I'm sure it would be helpful if I knew what encoding the file was already in. The Chinese characters show up correctly when I export the file to Microsoft Excel 2000, but they do not show up correctly in Microsoft Access. They were originally typed into Microsoft Excel. I think that means they are in Unicode rich text, but I'm not sure.
Thanks much!
I use ADO streams to do this sort of thing. I had to do this for a TON of websites where I was helping them with SEO automation.
http://www.nonhostile.com/howto-convert-byte-array-utf8-string-vb6.asp
' accept a string and convert it to a byte array
' note: despite the function name, Charset below is "utf-16", so the bytes returned are UTF-16
Public Function ConvertStringToUtf8Bytes(ByRef strText As String) As Byte()
Dim objStream As ADODB.Stream
Dim data() As Byte
' init stream
Set objStream = New ADODB.Stream
objStream.Charset = "utf-16"
objStream.Mode = adModeReadWrite
objStream.Type = adTypeText
objStream.Open
' write the text into the stream
objStream.WriteText strText
objStream.Flush
' rewind stream and read it back as bytes
objStream.Position = 0
objStream.Type = adTypeBinary
objStream.Read 2 ' skip the first 2 bytes, assuming the stream wrote the UTF-16 byte order mark (FF FE)
data = objStream.Read()
' close up and return
objStream.Close
ConvertStringToUtf8Bytes = data
End Function
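If a step outside Access is acceptable, the conversion to UTF-16 can also be sketched in a few lines of Python. This assumes the table has already been exported and read into a list of rows; the sample rows, the function name write_csv_utf16 and the output path are hypothetical. Python's "utf-16" codec writes the byte order mark itself, so nothing has to be skipped or added by hand.

```python
import csv
import os
import tempfile

def write_csv_utf16(rows, path):
    """Write rows as a UTF-16 CSV file; the codec adds the BOM itself."""
    # newline="" lets the csv module emit its own CRLF line endings.
    with open(path, "w", encoding="utf-16", newline="") as f:
        csv.writer(f).writerows(rows)

# Hypothetical rows standing in for the exported Access table.
table_rows = [["id", "name"], ["1", "中文"]]
utf16_path = os.path.join(tempfile.gettempdir(), "table_utf16.csv")
write_csv_utf16(table_rows, utf16_path)
```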

VBA code for Excel to convert gibberish to Hebrew

I have files that I open with Excel.
When I open a file, the text looks like gibberish.
I have to set the encoding manually (Tools - Internet Options - General - Encoding - Hebrew (ISO-Visual)),
and then the text turns into Hebrew.
Is there VBA code that does that?
Thanks,
Omri
I don't really have a way to test this, so I am just taking a shot:
Excel.ActiveWorkbook.WebOptions.Encoding = msoEncodingHebrew
Use an ADODB.Stream, as in the following function.
Code page 1255 is the Windows Hebrew code page.
You need to add a reference to the latest Microsoft ActiveX Data Objects Library
(Tools/References).
Public Function CorrectHebrew(gibberish As String) As String
Dim inStream As ADODB.stream
Set inStream = New ADODB.stream
inStream.Open
inStream.Charset = "Windows-1255"
inStream.WriteText gibberish
inStream.Position = 0 ' bring it back to start preparing for the ReadText
inStream.Charset = "UTF-8"
CorrectHebrew = inStream.ReadText ' return the corrected text
inStream.Close
End Function
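The same round trip can be written compactly in Python, which may make the logic easier to see: the gibberish is turned back into Windows-1255 bytes, and those bytes are then decoded as UTF-8. It only helps when the gibberish arose from UTF-8 bytes being read as Windows-1255, which is an assumption about the file rather than something the question confirms. The function name and the sample word are made up for the example.

```python
def correct_hebrew(gibberish: str) -> str:
    """Re-encode the text as Windows-1255 bytes, then decode them as UTF-8."""
    return gibberish.encode("cp1255").decode("utf-8")

# Build a sample of the gibberish: UTF-8 bytes misread as Windows-1255.
original = "תודה"
gibberish = original.encode("utf-8").decode("cp1255")
fixed = correct_hebrew(gibberish)
```

Text whose UTF-8 bytes have no Windows-1255 equivalent cannot be recovered this way, because the information was already lost when the file was first decoded.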

Resources