Convert UTF-8 string to ISO-8859-1 - string

My Classic ASP application retrieves a UTF-8 string from its database, but I need to convert it to ISO-8859-1. I can't change the HTML page encoding;
I really need to convert just the fetched string. How can I do it?

I found the answer here:
Const adTypeBinary = 1
Const adTypeText = 2
' accept a string and convert it to Bytes array in the selected Charset
Function StringToBytes(Str,Charset)
Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
Stream.Type = adTypeText
Stream.Charset = Charset
Stream.Open
Stream.WriteText Str
Stream.Flush
Stream.Position = 0
' rewind stream and read Bytes
Stream.Type = adTypeBinary
StringToBytes= Stream.Read
Stream.Close
Set Stream = Nothing
End Function
' accept Bytes array and convert it to a string using the selected charset
Function BytesToString(Bytes, Charset)
Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
Stream.Charset = Charset
Stream.Type = adTypeBinary
Stream.Open
Stream.Write Bytes
Stream.Flush
Stream.Position = 0
' rewind stream and read text
Stream.Type = adTypeText
BytesToString= Stream.ReadText
Stream.Close
Set Stream = Nothing
End Function
' This will alter charset of a string from 1-byte charset(as windows-1252)
' to another 1-byte charset(as windows-1251)
Function AlterCharset(Str, FromCharset, ToCharset)
Dim Bytes
Bytes = StringToBytes(Str, FromCharset)
AlterCharset = BytesToString(Bytes, ToCharset)
End Function
So I just did this:
AlterCharset(str, "ISO-8859-1", "UTF-8")
And it worked nicely.
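To see why passing "ISO-8859-1" as the source and "UTF-8" as the target repairs the text, it helps to assume what most likely happened: the UTF-8 bytes were read one byte per character, so "é" (bytes C3 A9) arrived as the two characters "Ã©". Writing the string back out as ISO-8859-1 recovers the original bytes, and reading those bytes as UTF-8 decodes them properly. A minimal sketch of that round trip, written as VBA so it can be tried outside ASP (the name RedecodeLatin1AsUtf8 is made up for illustration; in Classic ASP, drop the As clauses and use Server.CreateObject):

```vba
Option Explicit

' Hypothetical helper: repairs text whose characters each hold one raw UTF-8 byte.
Function RedecodeLatin1AsUtf8(ByVal garbled As String) As String
    Const adTypeText As Long = 2
    With CreateObject("ADODB.Stream")
        .Type = adTypeText
        .Charset = "iso-8859-1"      ' one character -> one byte, values 0-255
        .Open
        .WriteText garbled           ' stream now holds the original bytes
        .Position = 0                ' Charset can only be changed at position 0
        .Charset = "utf-8"           ' reinterpret the same bytes as UTF-8
        RedecodeLatin1AsUtf8 = .ReadText
        .Close
    End With
End Function
```

For example, RedecodeLatin1AsUtf8(ChrW$(195) & ChrW$(169)) is expected to give the single character é (ChrW$(233)).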

To expand on the OP's self-answer: when converting from a single-byte character set (such as ISO-8859-1, Windows-1251, or Windows-1252) to UTF-8, the round trip through ADODB's byte array is redundant. The overhead of the extra function calls and conversions can be eliminated like this:
Const adTypeText = 2
Private Function AsciiStringToUTF8(AsciiString)
Dim objStream: Set objStream = CreateObject("ADODB.Stream")
Call objStream.Open()
objStream.Type = adTypeText
'Any single-byte charset should work in theory
objStream.Charset = "Windows-1252"
Call objStream.WriteText(AsciiString)
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
objStream.Position = 0
objStream.Charset = "UTF-8"
AsciiStringToUTF8 = objStream.ReadText()
Call objStream.Close(): Set objStream = Nothing
End Function

Related

Appending 2 CSV files, but produces garbage characters [duplicate]

How can I write UTF-8 encoded strings to a text file from VBA, like
Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
Print #fnum, "special characters: äöüß" 'latin-1 or something by default
Close fnum
Is there some setting on Application level?
I found the answer on the web:
Dim fsT As Object
Set fsT = CreateObject("ADODB.Stream")
fsT.Type = 2 'Stream type 2 = text (adTypeText)
fsT.Charset = "utf-8" 'Encoding used when the text is written out
fsT.Open 'Open the stream
fsT.WriteText "special characters: äöüß"
fsT.SaveToFile sFileName, 2 'Save to disk; 2 = overwrite if the file exists (adSaveCreateOverWrite)
fsT.Close
Certainly not as I expected...
You can use the CreateTextFile or OpenTextFile method; each takes an optional argument (unicode for CreateTextFile, format for OpenTextFile) that controls the encoding.
object.CreateTextFile(filename[, overwrite[, unicode]])
object.OpenTextFile(filename[, iomode[, create[, format]]])
Example: Overwrite:
CreateTextFile:
fileName = "filename"
Set fso = CreateObject("Scripting.FileSystemObject")
Set out = fso.CreateTextFile(fileName, True, True)
out.WriteLine ("Hello world!")
...
out.close
Example: Append:
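OpenTextFile:
Const ForAppending = 8 'not predefined when the FileSystemObject is late-bound
Set fso = CreateObject("Scripting.FileSystemObject")
Set out = fso.OpenTextFile("filename", ForAppending, True, -1) '-1 = TristateTrue (Unicode)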
out.Write "Hello world!"
...
out.Close
See more on MSDN docs
This writes a Byte Order Mark at the start of the file, which is unnecessary in a UTF-8 file and some applications (in my case, SAP) don't like it.
Solution here: Can I export excel data with UTF-8 without BOM?
Here is another way to do this - using the API function WideCharToMultiByte:
Option Explicit
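' Declaration for 32-bit VBA; 64-bit Office requires PtrSafe and LongPtr for the pointer arguments.
Private Declare Function WideCharToMultiByte Lib "kernel32.dll" ( _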
ByVal CodePage As Long, _
ByVal dwFlags As Long, _
ByVal lpWideCharStr As Long, _
ByVal cchWideChar As Long, _
ByVal lpMultiByteStr As Long, _
ByVal cbMultiByte As Long, _
ByVal lpDefaultChar As Long, _
ByVal lpUsedDefaultChar As Long) As Long
Private Sub getUtf8(ByRef s As String, ByRef b() As Byte)
Const CP_UTF8 As Long = 65001
Dim len_s As Long
Dim ptr_s As Long
Dim size As Long
Erase b
len_s = Len(s)
If len_s = 0 Then _
Err.Raise 30030, , "Len(WideChars) = 0"
ptr_s = StrPtr(s)
size = WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, 0, 0, 0, 0)
If size = 0 Then _
Err.Raise 30030, , "WideCharToMultiByte() = 0"
ReDim b(0 To size - 1)
If WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, VarPtr(b(0)), size, 0, 0) = 0 Then _
Err.Raise 30030, , "WideCharToMultiByte(" & Format$(size) & ") = 0"
End Sub
Public Sub writeUtf()
Dim file As Integer
Dim s As String
Dim b() As Byte
s = "äöüßµ#€|~{}[]²³\ .." & _
" OMEGA" & ChrW$(937) & ", SIGMA" & ChrW$(931) & _
", alpha" & ChrW$(945) & ", beta" & ChrW$(946) & ", pi" & ChrW$(960) & vbCrLf
file = FreeFile
Open "C:\Temp\TestUtf8.txt" For Binary Access Write Lock Read Write As #file
getUtf8 s, b
Put #file, , b
Close #file
End Sub
I looked into the answer from Máťa whose name hints at encoding qualifications and experience. The VBA docs say CreateTextFile(filename, [overwrite [, unicode]]) creates a file "as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed." It's fine that a file stores unicode characters, but in what encoding? Unencoded unicode can't be represented in a file.
The VBA doc page for OpenTextFile(filename[, iomode[, create[, format]]]) offers a third option for the format:
TristateUseDefault -2 "opens the file using the system default."
TristateTrue -1 "opens the file as Unicode."
TristateFalse 0 "opens the file as ASCII."
Máťa passes -1 for this argument.
Judging from the VB.NET documentation (not VBA, but I think it reflects how the underlying Windows OS represents Unicode strings, which carries over into MS Office; I'm not sure), the system default is an encoding that uses 1 byte per character with the ANSI code page for the locale. UnicodeEncoding is UTF-16. The docs also describe UTF-8 as a "Unicode encoding," which makes sense to me. But I don't yet know how to specify UTF-8 for VBA output, nor am I confident that the data I write to disk with OpenTextFile(,,,1) is UTF-16 encoded. Tamalek's post is helpful.
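One way to settle the question of which encoding actually landed on disk is to look at the first bytes of the file. The sketch below is only an inspection aid under stated assumptions: the function name FirstBytesHex and its maxBytes parameter are invented here. A file that starts with FF FE carries the UTF-16 little-endian byte order mark, and one that starts with EF BB BF carries the UTF-8 byte order mark; a file without either may be ANSI or BOM-less UTF-8.

```vba
Option Explicit

' Hypothetical helper: returns the first bytes of a file as hex, e.g. "FF FE 48 00".
Function FirstBytesHex(ByVal filePath As String, _
                       Optional ByVal maxBytes As Long = 8) As String
    Dim fileNum As Integer
    Dim buffer() As Byte
    Dim n As Long, i As Long
    Dim result As String

    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    n = LOF(fileNum)
    If n > maxBytes Then n = maxBytes
    If n > 0 Then
        ReDim buffer(0 To n - 1)
        Get #fileNum, 1, buffer                 ' reads exactly n raw bytes
        For i = 0 To n - 1
            result = result & Right$("0" & Hex$(buffer(i)), 2) & " "
        Next i
    End If
    Close #fileNum

    FirstBytesHex = Trim$(result)
End Function
```

Calling Debug.Print FirstBytesHex("C:\Temp\TestUtf8.txt") in the Immediate window after writing a file shows whether a byte order mark is present.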
I didn't want to change all my code just to support a few UTF-8 strings, so I let my code do its thing, and after the file was saved (in ANSI, as that is Excel's default) I converted the file to UTF-8 using this code:
Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
Dim objFS As Object
Dim iFile As Double
Dim sFileData As String
'Init
iFile = FreeFile
Open sInFilePath For Input As #iFile
sFileData = Input$(LOF(iFile), iFile)
sFileData = sFileData & vbCrLf
Close iFile
'Open & Write
Set objFS = CreateObject("ADODB.Stream")
objFS.Charset = "utf-8"
objFS.Open
objFS.WriteText sFileData
'Save & Close
objFS.SaveToFile sOutFilePath, 2 '2: Create Or Update
objFS.Close
'Completed
Application.StatusBar = "Completed"
End Sub
I use this sub like this (this is an example):
Call convertTxttoUTF("c:\my.json", "c:\my-UTF8.json")
I found this code here: VBA to Change File Encoding ANSI to UTF8 – Text to Unicode
Since this writes the file with a BOM, I changed the Sub as follows to remove it:
Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
Dim objStreamUTF8 As Object
Dim objStreamUTF8NoBOM As Object
Dim iFile As Double
Dim sFileData As String
Const adSaveCreateOverWrite = 2
Const adTypeBinary = 1
Const adTypeText = 2
'Init
iFile = FreeFile
Open sInFilePath For Input As #iFile
sFileData = Input(LOF(iFile), iFile)
Close iFile
'Open files
Set objStreamUTF8 = CreateObject("ADODB.Stream")
Set objStreamUTF8NoBOM = CreateObject("ADODB.Stream")
' write the files
With objStreamUTF8
.Charset = "UTF-8"
.Open
.WriteText sFileData
.Position = 0
.SaveToFile sOutFilePath, adSaveCreateOverWrite
.Type = adTypeText
.Position = 3
End With
With objStreamUTF8NoBOM
.Type = adTypeBinary
.Open
objStreamUTF8.CopyTo objStreamUTF8NoBOM
.SaveToFile sOutFilePath, 2
End With
' close the files
objStreamUTF8.Close
objStreamUTF8NoBOM.Close
End Sub
I used this answer to get rid of the unknown BOM character at the beginning of the file.
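A traditional way to convert a VBA string (UTF-16 internally) to a byte-oriented string is StrConv with vbFromUnicode. Note that this targets the system's default ANSI code page rather than UTF-8, so it only helps when ANSI output is acceptable: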
StrConv("hello world",vbFromUnicode)
So put simply:
Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
Print #fnum, StrConv("special characters: äöüß", vbFromUnicode)
Close fnum
No special COM objects required
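A quick way to check what StrConv with vbFromUnicode really produces is to count the resulting bytes. vbFromUnicode converts to the system's default ANSI code page, so a character such as "ä" becomes a single byte on a system whose ANSI code page is Windows-1252, whereas UTF-8 would need two bytes (C3 A4) for it. The helper name AnsiByteCount below is made up for this sketch, and the result for non-ASCII input depends on the machine's locale.

```vba
Option Explicit

' Hypothetical helper: number of bytes StrConv(..., vbFromUnicode) yields for the text.
Function AnsiByteCount(ByVal text As String) As Long
    Dim ansiBytes() As Byte
    If Len(text) = 0 Then Exit Function          ' empty input -> 0 bytes
    ansiBytes = StrConv(text, vbFromUnicode)     ' UTF-16 -> system ANSI code page
    AnsiByteCount = UBound(ansiBytes) - LBound(ansiBytes) + 1
End Function

Sub ShowAnsiByteCount()
    ' ChrW$(228) is "ä"; compare the count with the two bytes UTF-8 would use.
    Debug.Print AnsiByteCount(ChrW$(228))
End Sub
```

If the count for "ä" is 1, the output is ANSI rather than UTF-8, and one of the ADODB.Stream or WideCharToMultiByte approaches is needed for real UTF-8.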

Problems reading a whole line from textfile

I have a program that reads lines of text (some 3000-4000 lines) from a textfile (saved with UTF from Notes). Each line consists of about 300-900 characters. I used this function:
Function loadVerbs2(fullPathName As String) As String()
Dim strings(0 To 5000) As String
Dim my_file As Integer
Dim text_line As String
Dim stringNr As Integer
my_file = FreeFile()
Open fullPathName For Input As my_file
stringNr = 0
While Not EOF(my_file)
Line Input #my_file, text_line
'Cut preceding "
While ((Asc(Left$(text_line, 1)) < Asc("a")) Or (Asc(Left$(text_line, 1)) > Asc("z")))
text_line = Mid$(text_line, 2)
Wend
' Cut ending " and ,
While ((Right$(text_line, 1) = Chr$(34)) Or (Right$(text_line, 1) = ","))
text_line = Left$(text_line, Len(text_line) - 1)
Wend
strings(stringNr) = latinCharacter(text_line)
stringNr = stringNr + 1
Wend
Close #my_file
loadVerbs2 = strings
End Function
For some reason the function doesn't read the whole lines but cuts them off.
So I changed it to this:
Function loadVerbs(fullPathName As String) As String()
Dim strings(0 To 5000) As String
Dim text_line As String
Dim stringNr As Integer
Dim fso As New FileSystemObject
Dim ts As TextStream
Set ts = fso.OpenTextFile(fullPathName)
Do While Not ts.AtEndOfStream
text_line = ts.ReadLine
'Cut preceding "
While ((Asc(Left$(text_line, 1)) < Asc("a")) Or (Asc(Left$(text_line, 1)) > Asc("z")))
text_line = Mid$(text_line, 2)
Wend
' Cut ending " and ,
While ((Right$(text_line, 1) = Chr$(34)) Or (Right$(text_line, 1) = ","))
text_line = Left$(text_line, Len(text_line) - 1)
Wend
strings(stringNr) = latinCharacter(text_line)
stringNr = stringNr + 1
Loop
ts.Close
loadVerbs = strings
End Function
But with the same result. The text contains characters such as á í é ó à ò ì è â û î ñ ç, which makes the "latinCharacter" function necessary to convert them so I can put them on a spreadsheet. But these characters do not stop ReadLine or Line Input from reading the entire line.
Any suggestions?
I think it's better to use an ADO Stream. One advantage is that you can set the right charset.
For more information on ADO Stream, look here
Here is a demo with the relevant parameters for text imports and what they stand for:
Sub AdoStreamDemo()
Dim importPath As String
Dim importFileName As String
Dim objStream As Object
Dim lineOfTextFile As String
importPath = "D:\Your Folder\" 'Don't forget the backslash (\) at the end
importFileName = "Your File.xxx" 'Filename with extension
'Initialise ADO Stream
Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "utf-8" 'default is Unicode
objStream.Type = 2 '2 = Text, 1 = Binary
objStream.LineSeparator = -1 '-1 = adCrLf (default), 13 = adCr, 10 = adLf
objStream.Open 'Opens the stream
objStream.LoadFromFile importPath & importFileName 'Path with backslash (\) at the end and filename
Do Until objStream.EOS 'EOS = End Of Stream
lineOfTextFile = objStream.ReadText(-2) '-2 = one row, -1 = all
'Do here what you want to do with every line of text from the file
Debug.Print lineOfTextFile
'Next line
Loop
objStream.Close 'Closes the stream
End Sub
OK, problem solved! The easy answer is that there wasn't any problem. When you run the Microsoft Visual Basic for Applications debugger and show the Locals window, the debugger doesn't show the whole length of the strings. So the program works, but it looks like it doesn't in the debugger.
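When a debugger window seems to show a shortened string, printing the length and the tail of the string is a simple way to confirm that nothing was lost. The sketch below uses a generated 900-character string as a stand-in for a line read from the file; the Sub name CheckLineLength is invented for this example.

```vba
Option Explicit

Sub CheckLineLength()
    Dim text_line As String
    text_line = String$(899, "x") & "!"     ' stand-in for a long line from the file

    Debug.Print Len(text_line)              ' full length, regardless of what a watch window displays
    Debug.Print Right$(text_line, 20)       ' the end of the line, to confirm it is present
End Sub
```

In the loading functions above, the same two Debug.Print lines could be placed right after the line is read.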

VBA Function to convert .csv to UTF-8 without BOM

I made a macro that generates a .csv file with data from an Excel sheet.
When I click a button, this csv file is sent to a server and consumed a few minutes later.
It works perfectly.
The only problem is that this csv file is generated in UTF-8 with a BOM, and I need UTF-8 without a BOM.
I took inspiration from this function, which I found simple and easy to read:
Use "ADODB.Stream" to convert ANSI to UTF-8, miss 1-2 character in the first row
I tried to adapt it into this:
Function ConvertToUtf8(myFileIn, myFileOut)
Dim stream, strText
Set stream = CreateObject("ADODB.Stream")
stream.Type = 2 'text
stream.Charset = "_autodetect_all"
stream.Open
stream.LoadFromFile myFileIn
strText = stream.ReadText
stream.Close
stream.Type = 2
stream.Charset = "utf-8"
stream.Open
stream.Position = 3 'without BOM (doesn't work)
stream.WriteText strText
stream.SaveToFile myFileOut, 2
stream.Close
Set stream = Nothing
End Function
I don't know why, but it creates the file in UTF-8 without a BOM and then this error pops up:
(https://imgur.com/a/FhJDAzh)
I'm pretty sure I'm not far from the solution, but I can't find it.
When I click on "Debug", this line is highlighted as the cause:
stream.Position = 3 'without BOM (doesn't work)
EDIT
Finally I found exactly what I was looking for: instead of a function that converts the file, a more efficient function that writes directly to the file in UTF-8 without a BOM:
Function WriteUTF8WithoutBOM(chaine As String, nomfichier As String)
Dim UTFStream As Object, BinaryStream As Object
With CreateObject("adodb.stream")
.Type = 2
.Mode = 3
.Charset = "UTF-8"
.LineSeparator = -1
.Open
.WriteText chaine, 1
.Position = 3 'skip BOM
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = 1
BinaryStream.Mode = 3
BinaryStream.Open
'Strips BOM (first 3 bytes)
.CopyTo BinaryStream
.Flush
.Close
End With
BinaryStream.SaveToFile nomfichier, 2
BinaryStream.Flush
BinaryStream.Close
End Function
Credit: I found it here https://www.excel-downloads.com/threads/question-de-conversion-en-utf-8-pour-une-vba-qui-enregistre-un-txt.20011510/ (answer by Staple1600)
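If the file already exists with a BOM, another option is to strip the three BOM bytes (EF BB BF) at the byte level with plain VBA file I/O, without ADODB. This is a different technique from the one above, shown only as a sketch: the Sub name RemoveUtf8Bom is made up, and it loads the whole file into memory, so it assumes files of modest size.

```vba
Option Explicit

' Hypothetical helper: rewrites the file without its leading UTF-8 BOM, if one is present.
Sub RemoveUtf8Bom(ByVal filePath As String)
    Dim fileNum As Integer
    Dim content() As Byte
    Dim stripped() As Byte
    Dim size As Long, i As Long

    ' Read the whole file as raw bytes.
    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    size = LOF(fileNum)
    If size >= 3 Then
        ReDim content(0 To size - 1)
        Get #fileNum, 1, content
    End If
    Close #fileNum

    If size < 3 Then Exit Sub
    If content(0) <> &HEF Or content(1) <> &HBB Or content(2) <> &HBF Then Exit Sub

    ' Delete and recreate the file so that it is truncated, then write the rest.
    Kill filePath
    fileNum = FreeFile
    Open filePath For Binary Access Write As #fileNum
    If size > 3 Then
        ReDim stripped(0 To size - 4)
        For i = 3 To size - 1
            stripped(i - 3) = content(i)
        Next i
        Put #fileNum, 1, stripped
    End If
    Close #fileNum
End Sub
```

Files without a BOM are left untouched, so the Sub can be called after any export step.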

Convert a hex string to base64 in an Excel function

I have a lengthy string of hex values to convert to base64.
I'm looking for a simple format cell function such as =Hex2b64(Hexstring) that will accept any length of hex characters.
I have been using http://home.paulschou.net/tools/xlate/ to do my conversion manually. The conversion works and the data is received by all relevant databases and parsed appropriately.
The data I am receiving is hex represented binary, which has been converted in multiple blocks and concatenated into long hex strings in accordance with project documentation that I am not privy to.
A typical Input String would be:
Hex= 00014088F6650101393939393939392D30304646463238313030000343332353430342D35353FA10000002805900100002805
and the corresponding output would be:
B64 = AAFAiPZlAQE5OTk5OTk5LTAwRkZGMjgxMDAAA0MzI1NDA0LTU1P6EAAAAoBZABAAAoAF
Function Hex2Base64(ByVal sHex)
Static oNode As Object
Dim a() As Byte
If Len(sHex) Mod 2 <> 0 Then
sHex = Left(sHex, Len(sHex) - 1) & "0" & Right(sHex, 1)
End If
If oNode Is Nothing Then
Set oNode = CreateObject("MSXML2.DOMDocument").createElement("Node")
End If
With oNode
.text = ""
.dataType = "bin.hex"
.text = sHex
a = .nodeTypedValue
.dataType = "bin.base64"
.nodeTypedValue = a
Hex2Base64 = .text
End With
End Function
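For comparison, the same conversion can be sketched in plain VBA without the MSXML object, which makes the underlying arithmetic visible: every three bytes (six hex digits) form a 24-bit group that is split into four 6-bit indexes into the Base64 alphabet, with "=" padding when the last group is short. The function name HexToBase64Pure is invented here, and unlike the function above it raises an error on odd-length input instead of inserting a "0".

```vba
Option Explicit

' Hypothetical helper: hex digits -> Base64, using only string and integer arithmetic.
Function HexToBase64Pure(ByVal hexText As String) As String
    Const ALPHABET As String = _
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    Dim byteCount As Long, i As Long
    Dim b0 As Long, b1 As Long, b2 As Long, chunk As Long
    Dim result As String

    If Len(hexText) Mod 2 <> 0 Then
        Err.Raise 5, , "Hex string needs an even number of digits"
    End If
    byteCount = Len(hexText) \ 2

    For i = 0 To byteCount - 1 Step 3
        ' Collect up to three bytes; missing bytes count as zero.
        b0 = CLng("&H" & Mid$(hexText, 2 * i + 1, 2))
        If i + 1 < byteCount Then b1 = CLng("&H" & Mid$(hexText, 2 * i + 3, 2)) Else b1 = 0
        If i + 2 < byteCount Then b2 = CLng("&H" & Mid$(hexText, 2 * i + 5, 2)) Else b2 = 0
        chunk = b0 * 65536 + b1 * 256 + b2

        ' The first two output characters always exist.
        result = result & Mid$(ALPHABET, (chunk \ 262144) + 1, 1)
        result = result & Mid$(ALPHABET, ((chunk \ 4096) And 63) + 1, 1)

        ' The third and fourth are replaced by padding for a short final group.
        If i + 1 < byteCount Then
            result = result & Mid$(ALPHABET, ((chunk \ 64) And 63) + 1, 1)
        Else
            result = result & "="
        End If
        If i + 2 < byteCount Then
            result = result & Mid$(ALPHABET, (chunk And 63) + 1, 1)
        Else
            result = result & "="
        End If
    Next i

    HexToBase64Pure = result
End Function
```

Either function can be called from a worksheet cell, for example =HexToBase64Pure(A1). For the first six bytes of the sample input, HexToBase64Pure("00014088F665") gives AAFAiPZl, which matches the start of the expected output in the question.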

Need to convert text files to Unicode from UTF8 in vbscript

I have a bunch of text files in a folder whose encoding I need to change to Unicode. Instead of manually opening the files and saving them as Unicode, I would like to have a script do this.
The files are currently UTF-8 encoded, and with my extremely limited scripting abilities I can't figure this one out. I found the code below, which converts from ANSI to Unicode. When I use it, the files do end up as Unicode, but the characters are messed up, so the conversion doesn't actually work. Any thoughts? Thanks in advance.
Set fso = CreateObject("Scripting.FileSystemObject")
Set oFolder = fso.GetFolder("C:\test")
Set oFiles = oFolder.files
For each file in oFiles
If Right(file.Name, 3) = "txt" Then
Set ANSIFile = fso.OpenTextFile(file.path, 1, False, False)
ANSIContent = ANSIFile.ReadAll
Set UNICODEFile = fso.OpenTextFile(file.path, 2, False, True)
UNICODEFile.Write ANSIContent
End If
Next
Unfortunately VBScript doesn't support this kind of conversion by itself. You can use an ADODB.Stream object, though:
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = 2 'text
stream.Charset = "utf-8"
stream.LoadFromFile "C:\input.txt"
text = stream.ReadText
stream.Close
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile("C:\output.txt", 2, True, True)
f.Write text
f.Close
Or a little more streamlined:
Set fso = CreateObject("Scripting.FileSystemObject")
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = 2 'text
stream.Charset = "utf-8"
stream.LoadFromFile "C:\input.txt"
fso.OpenTextFile("C:\output.txt", 2, True, True).Write stream.ReadText
stream.Close
If you want to replace the existing file you'll have to use the first version and use the same file for input and output. Use a loop like this to iterate over all files in a folder:
Set fso = CreateObject("Scripting.FileSystemObject")
Set stream = CreateObject("ADODB.Stream")
For Each f In fso.GetFolder("C:\source\folder").Files
stream.Open
stream.Type = 2 'text
stream.Charset = "utf-8"
stream.LoadFromFile f.Path
text = stream.ReadText
stream.Close
fso.OpenTextFile(f.Path, 2, True, True).Write text
Next

Resources