VBA Excel Macro write file using UTF-8 encoding [duplicate] - excel

I'm creating a macro in Excel that processes a spreadsheet and writes the content (text) to a file. I need this file to be encoded as UTF-8. I've tried opening the file as Unicode using OpenTextFile(... TristateTrue) and StrConv(.. vbUnicode), but those only produce UTF-16. I've searched online and can't find anything. Is this even possible?
thanks

Dim fsT As Object
Set fsT = CreateObject("ADODB.Stream")
fsT.Type = 2 '2 = adTypeText: the stream holds text, not binary data
fsT.Charset = "utf-8" 'Encoding used when the text is written out
fsT.Open 'Open the stream before writing to it
fsT.WriteText "special characters: äöüß"
fsT.SaveToFile sFileName, 2 'sFileName holds the target path; 2 = adSaveCreateOverWrite
fsT.Close
Reference:
Save text file UTF-8 encoded with VBA
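ADODB.Stream prefixes UTF-8 text with a three-byte byte order mark (EF BB BF). To confirm what a macro actually wrote, the first bytes of the saved file can be inspected. A minimal sketch; the function name DetectBom and the strings it returns are made up for illustration and are not part of the answer above:

```vba
' Hypothetical helper: reports which byte order mark (if any) starts a file.
Public Function DetectBom(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim b(0 To 2) As Byte   ' buffer for the first three bytes

    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    Get #fileNum, 1, b      ' read from the start of the file
    Close #fileNum

    If b(0) = &HEF And b(1) = &HBB And b(2) = &HBF Then
        DetectBom = "UTF-8 with BOM"
    ElseIf b(0) = &HFF And b(1) = &HFE Then
        DetectBom = "UTF-16 little-endian"
    ElseIf b(0) = &HFE And b(1) = &HFF Then
        DetectBom = "UTF-16 big-endian"
    Else
        DetectBom = "no BOM (could be plain UTF-8 or an ANSI code page)"
    End If
End Function
```

Calling Debug.Print DetectBom(sFileName) after the save shows the result in the Immediate window.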

Related

Editing a Hex Stream in VBA Before Saving to File

Background: I am extracting a file that is saved in a SQL database using an ADODB connection. One column of the database is the filename, including the file extension, and another is the actual contents of the file as a hex stream. I would like to save this file and open it.
Problem: This works fine with .pdf files. However, .png files are corrupted when I try to open them. Using a hex editor (HxD), I noticed excess bytes at the start of the file; if I remove these, the file opens fine. The hex stream should begin with the "per mille" character (Chr(137) in Excel) for the file to open. I have not found a way to edit the hex stream in Excel without converting it to characters.
The .png file opens with no problem when I take out the first few characters using a hex editor so that the file begins with:
‰PNG
Or the equivalent in hex code:
89 50 4E 47
The excess characters are
ÿþ
Or the equivalent in hex code:
FF FE
(I am trying to remove these). These characters are in the saved file even when I remove 4 characters from the text string using
Content = Right(Content, Len(Content) - 4)
It's almost like they automatically get added before the string when I save the file.
Code:
Calling the save to file function, where Content is the file content and Name is the filename:
Call StringToTextFile("C:\", rst![Content], rst![Name])
The function is below:
Public Sub StringToTextFile(ByVal directory As String, ByVal Content As String, ByVal filename As String)
'Requires reference to scrrun.dll
Dim fso As Scripting.FileSystemObject
Dim ts As Scripting.TextStream
Set fso = New Scripting.FileSystemObject
If Right(filename, 4) = ".png" Then 'recognizing the .png file
'Content = CByte(Chr(137)) & Right(Content, Len(Content)) 'unsuccessful attempt at inserting "per mille" character
'iret = InStr(0, Chr(137), Content, vbBinaryCompare) 'unsuccessful attempt at finding the "per mille" character in the content
End If
Set ts = fso.CreateTextFile(directory & filename, True, True)
ts.Write Content
ts.Close
Dim myShell As Object
Set myShell = CreateObject("WScript.Shell")
myShell.Run directory & filename 'Open the saved file
End Sub
When I try to insert the "per mille" character using Chr(137) it just shows a blank space in the hex editor.
Any help is appreciated!
This seems to be a similar discussion, but I am unsure how to apply this to my case:
excel-vba-hex-to-ascii-error
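The FF FE bytes are the UTF-16 little-endian byte order mark. CreateTextFile writes them because its third argument (True) asks for a Unicode file, which also stores every character as two bytes. One way around this is to avoid text files entirely and write the bytes unchanged. A minimal sketch, assuming the Content column is a binary field whose value can be assigned to a Byte array; the name SaveBytesToFile is hypothetical:

```vba
' Hypothetical helper: writes a byte array to disk unchanged
' (no byte order mark, no text conversion).
Public Sub SaveBytesToFile(ByVal filePath As String, ByRef data() As Byte)
    Dim fileNum As Integer

    ' Delete an existing file first so no stale trailing bytes remain
    If Len(Dir$(filePath)) > 0 Then Kill filePath

    fileNum = FreeFile
    Open filePath For Binary Access Write As #fileNum
    Put #fileNum, , data    ' in Binary mode only the array data is written
    Close #fileNum
End Sub

' Usage sketch (assumes rst![Content] is a binary column):
'   Dim bytes() As Byte
'   bytes = rst![Content].Value
'   SaveBytesToFile "C:\" & rst![Name], bytes
```

If the column actually holds the data as text rather than binary, this sketch does not apply as written and the conversion back to bytes would need to be handled first.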

Output from Excel to text file using VBA, "UTF-8 Unix" format needed

I have some data in an Excel file and I need to share it with somebody.
The other side has specific requirements for the format: text file, *.dat name, pipe-separated, etc.
So I wrote VBA code to extract the data into a text file using the examples from this site.
At the moment my file is not being accepted because: "It's not a UTF-8 UNIX format".
I tried multiple solutions from the internet but each time I receive an answer that it's either "UTF-8 PC format" or "Unicode format".
Can somebody help me to solve this? Here's what I have so far:
'create output (this is from my initial version)
Set Fileout = fso.CreateTextFile(myFile, True, True)
For y = 2 To a
Fileout.WriteLine .Cells(y, 59)
Next y
Fileout.Close
'this is from my second conversion attempt, where I received UTF-8 PC format
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Position = 0
stream.Charset = "UTF-8"
stream.LoadFromFile myFile
'This is my latest try. I found something called "UTF-8 without BOM" on the internet. Still doesn't work.
Set StreamUnixUtf = CreateObject("ADODB.Stream")
StreamUnixUtf.Type = 1
StreamUnixUtf.Mode = 3
StreamUnixUtf.Open
stream.Position = 3
stream.CopyTo StreamUnixUtf
stream.Flush
stream.Close
StreamUnixUtf.SaveToFile myFileUnixUTF, 2
StreamUnixUtf.Close
Thank you in advance for your help.
How about opening the file in TextWrangler (Mac) or Programmer's File Editor (PC) and selecting Unix line feeds when saving? I do this when I need to submit a file to a specific system.
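The same result can probably be produced from VBA directly. "UTF-8 Unix" most likely means UTF-8 without a byte order mark and with LF line endings, although that is an assumption about what the receiving system checks. A minimal sketch under that assumption; the names SaveUtf8NoBom and ExportColumnAsUnixUtf8 and the output path are placeholders:

```vba
' Hypothetical helper: saves text as UTF-8 without the 3-byte BOM.
Public Sub SaveUtf8NoBom(ByVal filePath As String, ByVal content As String)
    Dim utfStream As Object
    Dim binStream As Object

    Set utfStream = CreateObject("ADODB.Stream")
    utfStream.Type = 2                  ' adTypeText
    utfStream.Charset = "utf-8"
    utfStream.Open
    utfStream.WriteText content

    ' Skip the BOM (EF BB BF), then copy the remaining bytes
    utfStream.Position = 3
    Set binStream = CreateObject("ADODB.Stream")
    binStream.Type = 1                  ' adTypeBinary
    binStream.Open
    utfStream.CopyTo binStream
    binStream.SaveToFile filePath, 2    ' adSaveCreateOverWrite

    binStream.Close
    utfStream.Close
End Sub

' Usage sketch: join the cell values with vbLf instead of calling
' WriteLine, which ends every line with CR+LF.
Public Sub ExportColumnAsUnixUtf8()
    Dim y As Long, lastRow As Long, buffer As String

    With ActiveSheet
        lastRow = .Cells(.Rows.Count, 59).End(xlUp).Row
        For y = 2 To lastRow
            buffer = buffer & .Cells(y, 59).Value & vbLf
        Next y
    End With
    SaveUtf8NoBom "C:\output.dat", buffer   ' placeholder path
End Sub
```

Building the text in memory avoids the intermediate file from the question, which was created as Unicode (UTF-16) and then read back with a UTF-8 charset.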

Excel export to .txt via script

I need to export a large number of Excel cells to a .txt file, which must be in UTF-8 format. I wrote a VBScript, executed by a batch file, that does what it should, except that it creates a UTF-16 file.
Set file = fso.OpenTextFile(FILE, 2, True, -1)
The documentation says that -1 generates a Unicode file, and I am fairly sure this is limited to UTF-16.
My questions: am I missing something, or is this simply not possible with VBScript? Is there an easier way? Also, is this platform independent?
The FileSystemObject doesn't support UTF-8, but ADODB.Stream objects do.
...
'Note: wb is the variable holding your workbook object
'Save worksheet as Unicode text ...
wb.SaveAs "C:\utf16.txt", 42 '42 = xlUnicodeText
'... read the Unicode text into a variable ...
Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\utf16.txt", 1, False, -1).ReadAll
'... and export it as UTF-8 text.
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = 2 'text
stream.Position = 0
stream.Charset = "utf-8"
stream.WriteText txt
stream.SaveToFile "C:\utf8.txt", 2
stream.Close
...

Exporting a Microsoft Access table to UTF-16 CSV

I have an Access table with some Chinese characters that I need to export into a CSV file with UTF-16 encoding. If this is not possible, I could also try exporting the table into an XLS or CSV file, and then convert the encoding to UTF-16.
I have a feeling there is no simple way of doing this using Access and/or Excel and/or VBA, but if there is, I would love to hear it! If not, a solution using Java would be helpful.
I'm sure it would be helpful if I knew what encoding the file was already in. The Chinese characters show up correctly when I export the file to Microsoft Excel 2000, but they do not show up correctly in Microsoft Access. They were originally typed into Microsoft Excel. I think that means they are in Unicode rich text, but I'm not sure.
Thanks much!
I use ADO streams to do this sort of thing. I had to do this for a TON of websites where I was helping them with SEO automation.
http://www.nonhostile.com/howto-convert-byte-array-utf8-string-vb6.asp
' accept a string and return its UTF-8 encoding as a byte array
' (requires a reference to the Microsoft ActiveX Data Objects library)
Public Function ConvertStringToUtf8Bytes(ByRef strText As String) As Byte()
Dim objStream As ADODB.Stream
Dim data() As Byte
' init stream
Set objStream = New ADODB.Stream
objStream.Charset = "utf-8" ' must match the 3-byte marker skipped below
objStream.Mode = adModeReadWrite
objStream.Type = adTypeText
objStream.Open
' write text into stream
objStream.WriteText strText
objStream.Flush
' rewind stream and read it back as bytes
objStream.Position = 0
objStream.Type = adTypeBinary
objStream.Read 3 ' skip first 3 bytes as this is the utf-8 marker
data = objStream.Read()
' close up and return
objStream.Close
ConvertStringToUtf8Bytes = data
End Function
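Since the question asks for UTF-16 rather than UTF-8, a simpler route may be the FileSystemObject, whose Unicode flag produces UTF-16 little-endian output with an FF FE byte order mark. A minimal sketch, assuming the CSV lines have already been built from the table; the names CsvField and WriteUtf16Csv are hypothetical:

```vba
' Hypothetical helper: quotes one CSV field, doubling embedded quotes.
Public Function CsvField(ByVal fieldText As String) As String
    CsvField = """" & Replace(fieldText, """", """""") & """"
End Function

' Hypothetical helper: writes prepared CSV lines as a UTF-16 text file.
Public Sub WriteUtf16Csv(ByVal filePath As String, ByRef csvLines() As String)
    Dim fso As Object
    Dim ts As Object
    Dim i As Long

    Set fso = CreateObject("Scripting.FileSystemObject")
    ' Arguments: path, overwrite = True, unicode = True
    Set ts = fso.CreateTextFile(filePath, True, True)
    For i = LBound(csvLines) To UBound(csvLines)
        ts.WriteLine csvLines(i)    ' each line ends with CR+LF
    Next i
    ts.Close
End Sub
```

A row could then be assembled as CsvField(nameValue) & "," & CsvField(cityValue), where nameValue and cityValue stand in for values read from the Access recordset.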

xls to csv with Chinese character support

When I save an Excel file containing Chinese characters as CSV, the characters are converted to ??? (question marks).
Please let me know if you have a solution for this. Saving as Unicode text worked fine, but saving as .csv does not.
Thanks
I had a similar problem with Japanese characters before. At the time, Excel 2003 only exported CSV as Latin1 (or maybe Windows-1252). I basically wrote my own Excel macro to iterate over the rows and columns and build up an in-memory string of what the CSV file would look like. Then I used an ADODB.Stream to save it myself. This sample code should get you started.
Dim csvdata As String
Dim CRLF As String
Dim objStream As Object
CRLF = Chr(13) & Chr(10)
csvdata = """key"",""value""" + CRLF
csvdata = csvdata + """a"",""a""" + CRLF
csvdata = csvdata + """aacute"",""á""" + CRLF
Set objStream = CreateObject("ADODB.Stream")
objStream.Open
objStream.Position = 0
objStream.Charset = "UTF-8"
objStream.WriteText csvdata
objStream.SaveToFile "test.csv", 2 ' adSaveCreateOverWrite
objStream.Close
Why do you need a CSV file? What encoding do you need it in? UTF-8? GBK? What software is going to read the CSV file? What version of Excel are you using?
If you know Python, you could use the xlrd module to read the Excel file, format the data, encode it, and write it to a CSV file, or use it to update a database, or whatever.
