Can I export Excel data as UTF-8 without a BOM?

I export Microsoft Excel data with an Excel macro (VBScript).
Because the output file is a Lua script, I export it as UTF-8.
The only way I have found to produce UTF-8 from Excel is to use ADODB.Stream, like this:
set fileLua = CreateObject("adodb.stream")
fileLua.Type = 2
fileLua.Mode = 3
fileLua.Charset = "UTF-8"
fileLua.Open
fileLua.WriteText("test")
fileLua.SaveToFile("Test.lua")
fileLua.flush
fileLua.Close
I want to eliminate the BOM from Test.lua, but I don't know how.
(Because Test.lua contains some Unicode text, I have to use UTF-8.)
Do you know how to write a UTF-8 file without a BOM from Excel?
Thanks in advance.

I have the same issue: I have to export data from Excel (Office 2003, VBA 6.5) to a UTF-8 encoded file. I found the answer in your question! Below is my example, in which I also strip the BOM using trick #2 from boost's answer (thanks!). I couldn't get #1 working and never tried #3.
Sub WriteUTF8WithoutBOM()
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.LineSeparator = adLF
UTFStream.Open
UTFStream.WriteText "This is an unicode/UTF-8 test.", adWriteLine
UTFStream.WriteText "First set of special characters: öäåñüûú€", adWriteLine
UTFStream.WriteText "Second set of special characters: qwertzuiopõúasdfghjkléáûyxcvbnm\|Ä€Í÷×äðÐ[]í³£;?¤>#&#{}<;>*~¡^¢°²`ÿ´½¨¸0", adWriteLine
UTFStream.Position = 3 'skip BOM
Dim BinaryStream As Object
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
'Strips BOM (first 3 bytes)
UTFStream.CopyTo BinaryStream
'UTFStream.SaveToFile "d:\adodb-stream1.txt", adSaveCreateOverWrite
UTFStream.Flush
UTFStream.Close
BinaryStream.SaveToFile "d:\adodb-stream2.txt", adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
End Sub
The ADO Stream Object reference I used.
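To confirm that the saved file really starts without a BOM, one option is to read back its first three bytes. The following is only a sketch under stated assumptions: the function name HasUtf8Bom is hypothetical (it is not part of the answer above), and it uses late binding with a local constant so that no ADO reference is required.

```vba
' Hypothetical helper (not from the original answer): returns True when
' the file begins with the UTF-8 BOM bytes EF BB BF.
Function HasUtf8Bom(ByVal PathFileName As String) As Boolean
    Const adTypeBinary = 1          ' local constant, no ADO reference needed
    Dim stm As Object
    Dim head() As Byte

    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeBinary
    stm.Open
    stm.LoadFromFile PathFileName
    If stm.Size >= 3 Then
        head = stm.Read(3)          ' first three bytes of the file
        HasUtf8Bom = (head(0) = &HEF And head(1) = &HBB And head(2) = &HBF)
    End If
    stm.Close
    Set stm = Nothing
End Function
```

After running WriteUTF8WithoutBOM, calling HasUtf8Bom("d:\adodb-stream2.txt") from the Immediate window should show whether the stripping worked on your machine.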

If anyone else is struggling with the adTypeText constant, you need to include "Microsoft ActiveX Data Objects 2.5 Object Library" under Tools->References.

A few possibilities:
1) Put the text into the buffer as UTF-8, Type=2, but then set Type=1 (as binary) and write that out. That might convince ADODB.Stream to skip adding the BOM.
2) Create another buffer, as type binary, and use CopyTo to copy the data to that buffer from a point after the BOM.
3) Read the file in again using Scripting.FileSystemObject, trim off the BOM, and write it out again.

Edit
A comment from rellampec alerted me to a better way of dropping the LF I had discovered was added to the end of the file by user272735's method. I have added a new version of my routine at the end.
Original post
I had been using user272735's method successfully for a year when I discovered it added a LF at the end of the file. I failed to notice this extra LF until I did some very detailed testing so this is not an important error. However, my latest version discards that LF just in case it ever became important.
Public Sub PutTextFileUtf8(ByVal PathFileName As String, ByVal FileBody As String)
' Outputs FileBody as a text file (UTF-8 encoding without leading BOM)
' named PathFileName
' Needs reference to "Microsoft ActiveX Data Objects n.n Library"
' Addition to original code says version 2.5. Tested with version 6.1.
' 1Nov16 Copied from http://stackoverflow.com/a/4461250/973283
' but replaced literals with parameters.
' 15Aug17 Discovered routine was adding an LF to the end of the file.
' Added code to discard that LF.
' References: http://stackoverflow.com/a/4461250/973283
' https://www.w3schools.com/asp/ado_ref_stream.asp
Dim BinaryStream As Object
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
' The LineSeparator will be added to the end of FileBody. It is possible
' to select a different value for LineSeparator but I can find nothing to
' suggest it is possible to not add anything to the end of FileBody
UTFStream.LineSeparator = adLF
UTFStream.Open
UTFStream.WriteText FileBody, adWriteLine
UTFStream.Position = 3 'skip BOM
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
UTFStream.CopyTo BinaryStream
' Originally I planned to use "CopyTo Dest, NumChars" to not copy the last
' byte. However, NumChars is described as an integer whereas Position is
' described as Long. I was concerned that by "integer" they mean 16 bits.
'Debug.Print BinaryStream.Position
BinaryStream.Position = BinaryStream.Position - 1
BinaryStream.SetEOS
'Debug.Print BinaryStream.Position
UTFStream.Flush
UTFStream.Close
Set UTFStream = Nothing
BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
Set BinaryStream = Nothing
End Sub
New version of routine
This version omits the code to discard the unwanted LF added at the end because it avoids adding the LF in the first place. I have retained the original version in case anyone is interested in the technique for deleting trailing characters.
Public Sub PutTextFileUtf8NoBOM(ByVal PathFileName As String, ByVal FileBody As String)
' Outputs FileBody as a text file named PathFileName using
' UTF-8 encoding without leading BOM
' Needs reference to "Microsoft ActiveX Data Objects n.n Library"
' Addition to original code says version 2.5. Tested with version 6.1.
' 1Nov16 Copied from http://stackoverflow.com/a/4461250/973283
' but replaced literals with parameters.
' 15Aug17 Discovered routine was adding an LF to the end of the file.
' Added code to discard that LF.
' 11Oct17 Posted to StackOverflow
' 9Aug18 Comment from rellampec suggested removal of adWriteLine from
' WriteText statement would avoid adding LF.
' 30Sep18 Amended routine to remove adWriteLine from WriteText statement
' and code to remove LF from file. Successfully tested new version.
' References: http://stackoverflow.com/a/4461250/973283
' https://www.w3schools.com/asp/ado_ref_stream.asp
Dim BinaryStream As Object
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.Open
UTFStream.WriteText FileBody
UTFStream.Position = 3 'skip BOM
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
UTFStream.CopyTo BinaryStream
UTFStream.Flush
UTFStream.Close
Set UTFStream = Nothing
BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
Set BinaryStream = Nothing
End Sub
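A possible way to call the routine, as a sketch only: the demo procedure name, the output path under %TEMP% and the sample Lua text are assumptions chosen for illustration. Like the routine itself, it needs the ADO reference for the ad* constants to resolve.

```vba
' Hypothetical demo (names and path are illustrative assumptions).
Sub DemoPutTextFileUtf8NoBOM()
    Dim Body As String

    ' ChrW builds the non-ASCII characters so the demo does not depend on
    ' the code page of the VBA editor: U+00E4 and U+20AC.
    Body = "name = ""Test""" & vbLf & _
           "text = """ & ChrW(&HE4) & ChrW(&H20AC) & """" & vbLf

    ' The routine writes Body as-is, so line endings are whatever Body contains.
    PutTextFileUtf8NoBOM Environ$("TEMP") & "\Test.lua", Body
End Sub
```

Because the new version no longer appends a line separator, any trailing line break has to be part of FileBody, as in the sample above.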

If you prefer native T-SQL to external code:
DECLARE @FILE_NAME VARCHAR(255) = 'd:\utils\test.xml' --drive:\path\filename
DECLARE @FILE_DATA VARCHAR(MAX) = '<?xml version="1.0" encoding="UTF-8"?><xml>test</xml>' --file content as varchar(max)
DECLARE @FILE_NAME_TO VARCHAR(255) --Temp name for text stream
DECLARE @FSO_ID_TXTSTRM INT --Text Stream
DECLARE @FSO_ID_BINSTRM INT --Binary Stream
DECLARE @RC INT
EXEC @RC = sp_OACreate 'ADODB.Stream', @FSO_ID_TXTSTRM OUTPUT
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Type', 2 --1 = binary, 2 = text
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Mode', 3 --0 = not set, 1 read, 2 write, 3 read/write
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Charset', 'UTF-8' --'ISO-8859-1'
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'LineSeparator', 10 --adLF = 10
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Open'
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'WriteText', NULL, @FILE_DATA --text method
--Create binary stream
EXEC @RC = sp_OACreate 'ADODB.Stream', @FSO_ID_BINSTRM OUTPUT
EXEC @RC = sp_OASetProperty @FSO_ID_BINSTRM, 'Type', 1 --1 = binary, 2 = text
--Mode can only be set while the stream is still closed
EXEC @RC = sp_OASetProperty @FSO_ID_BINSTRM, 'Mode', 3 --0 = not set, 1 read, 2 write, 3 read/write
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'Open'
--Move 3 positions forward in text stream (BOM is first 3 positions)
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Position', 3
--Copy text stream to binary stream
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'CopyTo', NULL, @FSO_ID_BINSTRM
--Commit data and close text stream
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Flush'
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Close'
EXEC @RC = sp_OADestroy @FSO_ID_TXTSTRM
--Save binary stream to file and close
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'SaveToFile', NULL, @FILE_NAME, 2 --1 = notexist 2 = overwrite
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'Close'
EXEC @RC = sp_OADestroy @FSO_ID_BINSTRM

Here's another BOM-disposal hack, from an answer that overlaps your question.
Apologies for the late answer - this is more for other people who are encountering Byte Order Markers, and the page views on this question tell me that it is relevant to several related problems: it's surprisingly difficult to write a BOM-free file in VBA, and even some of the common stream libraries deposit a BOM in your output, whether you asked for it or not.
I say my answer 'overlaps' because the code below solves a slightly different problem - its primary purpose is writing a Schema file for a folder with a heterogeneous collection of files - but it's a working example of BOM removal and BOM-free file writing in use, and the relevant segment is clearly marked.
The key functionality is that we iterate through all the '.csv' files in a folder and test each file with a quick nibble of the first few bytes; we only undertake the onerous task of stripping out the marker if we see one.
We're working with low-level file-handling code from the primordial C. We have to, all the way down to using byte arrays, because everything else that you do in VBA will deposit the Byte Order Markers embedded in the structure of a string variable.
So, without further adodb, here's the code:
BOM-Disposal code for text files in a schema.ini file:
Public Sub SetSchema(strFolder As String)
On Error Resume Next
' Write a Schema.ini file to the data folder.
' This is necessary if we do not have the registry privileges to set the
' correct 'ImportMixedTypes=Text' registry value, which overrides IMEX=1
' The code also checks for ANSI or UTF-8 and UTF-16 files, and applies a
' usable setting for CharacterSet ( UNICODE|ANSI ) with a horrible hack.
' OEM codepage-defined text is not supported: further coding is required
' ...And we strip out Byte Order Markers, if we see them - the OLEDB SQL
' provider for textfiles can't deal with a BOM in a UTF-16 or UTF-8 file
' Not implemented: handling tab-delimited files or other delimiters. The
' code assumes a header row with columns, specifies 'scan all rows', and
' imposes 'read the column as text' if the data types are mixed.
Dim strSchema As String
Dim strFile As String
Dim hndFile As Long
Dim arrFile() As Byte
Dim arrBytes(0 To 4) As Byte
If Right(strFolder, 1) <> "\" Then strFolder = strFolder & "\"
' Dir() is an iterator function when you call it with a wildcard:
strFile = VBA.FileSystem.Dir(strFolder & "*.csv")
Do While Len(strFile) > 0
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
Get #hndFile, , arrBytes
Close #hndFile
strSchema = strSchema & "[" & strFile & "]" & vbCrLf
strSchema = strSchema & "Format=CSVDelimited" & vbCrLf
strSchema = strSchema & "ImportMixedTypes=Text" & vbCrLf
strSchema = strSchema & "MaxScanRows=0" & vbCrLf
If arrBytes(2) = 0 Or arrBytes(3) = 0 Then ' this is a hack
strSchema = strSchema & "CharacterSet=UNICODE" & vbCrLf
Else
strSchema = strSchema & "CharacterSet=ANSI" & vbCrLf
End If
strSchema = strSchema & "ColNameHeader = True" & vbCrLf
strSchema = strSchema & vbCrLf
' ***********************************************************
' BOM disposal - Byte order marks break the Access OLEDB text provider.
' The marker is removed by byte offset: 2 bytes for UTF-16, 3 for UTF-8.
Dim lngBom As Long, lngLen As Long, lngPos As Long
Dim arrOut() As Byte
lngBom = 0
If (arrBytes(0) = &HFE And arrBytes(1) = &HFF) _
Or (arrBytes(0) = &HFF And arrBytes(1) = &HFE) Then
lngBom = 2 ' UTF-16 BOM, big- or little-endian
ElseIf arrBytes(0) = &HEF And arrBytes(1) = &HBB And arrBytes(2) = &HBF Then
lngBom = 3 ' UTF-8 BOM
End If
If lngBom > 0 Then
' Read the whole file into a byte array
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
lngLen = LOF(hndFile)
ReDim arrFile(0 To lngLen - 1)
Get #hndFile, , arrFile
Close #hndFile
' Truncate the file: Binary mode does not shorten an existing file
hndFile = FreeFile
Open strFolder & strFile For Output As #hndFile
Close #hndFile
If lngLen > lngBom Then
' Copy everything after the BOM and write it back
ReDim arrOut(0 To lngLen - lngBom - 1)
For lngPos = lngBom To lngLen - 1
arrOut(lngPos - lngBom) = arrFile(lngPos)
Next lngPos
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
Put #hndFile, , arrOut
Close #hndFile
Erase arrOut
End If
Erase arrFile
End If
' ***********************************************************
strFile = ""
strFile = Dir
Loop
If Len(strSchema) > 0 Then
strFile = strFolder & "Schema.ini"
hndFile = FreeFile
Open strFile For Binary As #hndFile
Put #hndFile, , strSchema
Close #hndFile
End If
End Sub
Public Sub BigReplace(ByRef arrBytes() As Byte, _
ByRef SearchFor As String, _
ByRef ReplaceWith As String)
On Error Resume Next
Dim varSplit As Variant
varSplit = Split(arrBytes, SearchFor)
arrBytes = Join$(varSplit, ReplaceWith)
Erase varSplit
End Sub
The code's easier to understand if you know that a Byte Array can be assigned to a VBA.String, and vice versa. The BigReplace() function is a hack that sidesteps some of VBA's inefficient string-handling, especially allocation: you'll find that large files cause serious memory and performance problems if you do it any other way.

Related

How to maintain character set while exporting Excel table to .csv UTF8 without BOM using VBA?

I have read several other answers regarding how to export a table to .csv with UTF8 encoding (no BOM). I found code which almost works for me, see below.
My problem is that the table contains Swedish characters (ÅÄÖ), and when the .csv file is opened these are lost to what looks like an incorrect charset. I found a workaround, which is to open the .csv file in Notepad, save it, and then open it in Excel. The workaround makes Excel display the letters properly, but I would prefer not to have the extra step. Can the code below be modified so that the charset is not lost?
Option Explicit
Sub CSVFileAsUTF8WithoutBOM()
Dim SrcRange As Range
Dim CurrRow As Range
Dim CurrCell As Range
Dim CurrTextStr As String
Dim ListSep As String
Dim FName As Variant
Dim UTFStream As Object
Dim BinaryStream As Object
' ADO Constants
Const adTypeBinary = 1 ' The stream contains binary data
Const adTypeText = 2 ' The stream contains text data (default)
Const adWriteLine = 1 ' write text string and a line separator (as defined by the LineSeparator property) to the stream.
Const adModeReadWrite = 3 ' Read/write
Const adLF = 10 ' Line feed only - default is carriage return line feed (adCRLF)
Const adSaveCreateOverWrite = 2 ' Overwrites the file with the data from the currently open Stream object, if the file already exists
' Open this workbook location
ChDrive Left(ThisWorkbook.Path, 1)
ChDir ThisWorkbook.Path
' ask for file name and path
FName = Application.GetSaveAsFilename("", "CSV File (*.csv), *.csv")
' prepare UTF-8 stream
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.LineSeparator = adLF
UTFStream.Open
'set field separator
ListSep = ";"
'set source range with data for csv file
If Selection.Cells.Count > 1 Then
Set SrcRange = Selection
Else
Set SrcRange = ActiveSheet.UsedRange
End If
For Each CurrRow In SrcRange.Rows
CurrTextStr = ""
For Each CurrCell In CurrRow.Cells
CurrTextStr = CurrTextStr & Replace(CurrCell.Value, """", """""") & ListSep
Next
'remove ListSep after the last value in line
While Right(CurrTextStr, 1) = ListSep
CurrTextStr = Left(CurrTextStr, Len(CurrTextStr) - 1)
Wend
'add line to UTFStream
UTFStream.WriteText CurrTextStr, adWriteLine ' Writes character data to a text Stream object
Next
'skip BOM
UTFStream.Position = 3 ' sets or returns a long value that indicates the current position (in bytes) from the beginning of a Stream object
'copy UTFStream to BinaryStream
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open ' Opens a Stream object
'Strips BOM (first 3 bytes)
UTFStream.CopyTo BinaryStream ' Copies a specified number of characters/bytes from one Stream object into another Stream object
UTFStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
UTFStream.Close ' Closes a Stream object
'save to file
BinaryStream.SaveToFile FName, adSaveCreateOverWrite
BinaryStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
BinaryStream.Close ' Closes a Stream object
End Sub
EDIT:
Based on your comment, I realize that what you initially wanted was to keep the information about the character encoding inside the file without having a BOM.
The problem with this question (as you realized it) is that the BOM is actually what normally contains the information about the character encoding and putting this information anywhere else in the file doesn't really make sense.
So, your code is actually perfect for the task at hand. What needs to be changed is how the CSV file is imported/opened by the software you want to use.
When the file has no BOM, the software reading the file has to guess the character encoding.
In general, if the software you use doesn't support a BOM and doesn't guess correctly, there should at least be a way to customize the behavior of the import/open command so that you can specify the character encoding (it seems you actually found it).
Original answer:
For some reason, Excel has a hard time guessing the character encoding of a UTF-8 encoded CSV file when you just double-click the file. You have to help it a little...
Instead of opening it directly, you could load the CSV content to a new workbook by using the (legacy) Text Import Wizard and selecting the UTF-8 character set (65001) during import if Excel is not able to figure it out by itself.
If you were to record a macro while doing it and make it into a sub procedure, you could have something like this:
Sub OpenCSV(FullFileName As String)
Dim wb As Workbook
Set wb = Workbooks.Add
Dim ws As Worksheet
Set ws = wb.Sheets(1)
With ws.QueryTables.Add(Connection:= _
"TEXT;" & FullFileName, Destination:=ws.Range( _
"$A$1"))
.Name = "CSV_Open"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 65001
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = True ' the exported file uses ";" as field separator
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End Sub
Other suggestion
If you really want to be able to double-click the file instead of using the Text Import Wizard or running a macro, you could always create a VBA event procedure in an add-in or PERSONAL.XLSB that runs every time a workbook is opened.
If it detects that the file that was just opened is a CSV file, it could close it and "reopen" it using the code above.
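A rough sketch of such an event procedure is shown here. It is an assumption-laden illustration rather than tested production code: it is meant for the ThisWorkbook module of PERSONAL.XLSB or an add-in, the variable name App is arbitrary, it assumes the OpenCSV procedure above lives in a standard module of the same project, and closing a workbook from inside its own open event may not behave identically in every Excel version.

```vba
' ThisWorkbook module of PERSONAL.XLSB or an add-in (sketch only).
Private WithEvents App As Application

Private Sub Workbook_Open()
    ' Start listening to application-level events
    Set App = Application
End Sub

Private Sub App_WorkbookOpen(ByVal Wb As Workbook)
    Dim csvPath As String

    ' Only react to files whose name ends in .csv
    If LCase$(Right$(Wb.Name, 4)) = ".csv" Then
        csvPath = Wb.FullName
        Wb.Close SaveChanges:=False   ' discard the wrongly decoded view
        OpenCSV csvPath               ' re-import as UTF-8 (code page 65001)
    End If
End Sub
```

OpenCSV creates its workbook with Workbooks.Add, which raises NewWorkbook rather than WorkbookOpen, so the handler should not call itself again.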
Extra:
Of interest: there is a question here about how to change the default character encoding that Excel uses.

Save .txt locally (UTF-8) [duplicate]

My system is Windows 10 English-US.
I need to write some non-printable ASCII characters to a text file. For example, for the ASCII value 28, I want to write \u001Cw to the file. I don't have to do anything special when this is coded in Java. Below is my code in VBS:
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
objStream.Open
objStream.Type = 2
objStream.Position = 0
objStream.CharSet = "utf-16"
objStream.WriteText ChrW(28) 'Need this to appear as \u001Cw in the output file
objStream.SaveToFile "C:\temp\test.txt", 2
objStream.Close
You need a read-write stream so that writing to it and saving it to file both work.
Const adModeReadWrite = 3
Const adTypeText = 2
Const adSaveCreateOverWrite = 2
Sub SaveToFile(text, filename)
With CreateObject("ADODB.Stream")
.Mode = adModeReadWrite
.Type = adTypeText
.Charset = "UTF-16"
.Open
.WriteText text
.SaveToFile filename, adSaveCreateOverWrite
.Close
End With
End Sub
text = Chr(28) & "Hello" & Chr(28)
SaveToFile text, "C:\temp\test.txt"
Other notes:
I like to explicitly define with Const all the constants in the code. Makes reading so much easier.
A With block saves quite some typing here.
Setting the stream type to adTypeText is not really necessary, that's the default anyway. But explicit is better than implicit, I guess.
Setting the Position to 0 on a new stream is superfluous.
It's unnecessary to use ChrW() for ASCII-range characters. The stream's Charset decides the byte width when you save the stream to file. In RAM, everything is Unicode anyway (yes, even in VBScript).
There are two UTF-16 encodings supported by ADODB.Stream: little-endian UTF-16LE (which is the default and synonymous with UTF-16) and big-endian UTF-16BE, with the byte order reversed.
You can achieve the same result with the FileSystemObject and its CreateTextFile() method:
Set FSO = CreateObject("Scripting.FileSystemObject")
Sub SaveToFile(text, filename)
' CreateTextFile(filename [, Overwrite [, Unicode]])
With FSO.CreateTextFile(filename, True, True)
.Write text
.Close
End With
End Sub
text = Chr(28) & "Hello" & Chr(28)
SaveToFile text, "C:\temp\test.txt"
This is a little bit simpler, but it only offers a Boolean Unicode parameter, which switches between UTF-16 and ANSI (not ASCII, as the documentation incorrectly claims!). The solution with ADODB.Stream gives you fine-grained encoding choices, for example UTF-8, which is impossible with the FileSystemObject.
For the record, there are two ways to create an UTF-8-encoded text file:
The way Microsoft likes to do it, with a 3-byte long Byte Order Mark (BOM) at the start of the file. Most, if not all Microsoft tools do that when they offer "UTF-8" as an option, ADODB.Stream is no exception.
The way everyone else does it - without a BOM. This is correct for most uses.
To create an UTF-8 file with BOM, the first code sample above can be used. To create an UTF-8 file without BOM, we can use two stream objects:
Const adModeReadWrite = 3
Const adTypeBinary = 1
Const adTypeText = 2
Const adSaveCreateOverWrite = 2
Sub SaveToFile(text, filename)
Dim iStr: Set iStr = CreateObject("ADODB.Stream")
Dim oStr: Set oStr = CreateObject("ADODB.Stream")
' one stream for converting the text to UTF-8 bytes
iStr.Mode = adModeReadWrite
iStr.Type = adTypeText
iStr.Charset = "UTF-8"
iStr.Open
iStr.WriteText text
' one stream to write bytes to a file
oStr.Mode = adModeReadWrite
oStr.Type = adTypeBinary
oStr.Open
' switch first stream to binary mode and skip UTF-8 BOM
iStr.Position = 0
iStr.Type = adTypeBinary
iStr.Position = 3
' write remaining bytes to file and clean up
oStr.Write iStr.Read
oStr.SaveToFile filename, adSaveCreateOverWrite
oStr.Close
iStr.Close
End Sub

How to read .txt file with Chinese characters?

I have a subroutine that reads text files and extracts certain data from them. Here is an example:
NamePrefix = "Example"
OutputPath = "C:\Example"
DbSize = 65536
LstStr = ""
Dim Success() As Boolean
Dim Value() As Double
ReDim Success(1 to DbSize)
ReDim Value(1 to DbSize)
For ID = 1 to DbSize
'Read string
FileName = NamePrefix & Format(ID,"000000") & ".lst"
FilePath = OutputPath & "\" & FileName
Open FilePath For Input As 1
LstStr = Input(LOF(1),1)
Close 1
'Extract data
Success(ID) = (InStr(1, LstStr, "SUCCESS") <> 0)
Pos1 = InStr(1, LstStr, "TH 1 value: ") 'Position varies for each file
Value(ID) = Val(Mid(LstStr, Pos1 + 13, 10)) 'Value in scientific notation
Next ID
The use of InStr to locate strings by position works perfectly when there are just alphabets, numbers and symbols. However, sometimes the files contain Chinese characters and the Input function returns an empty string "" to LstStr. I tried to use some other suggested methods but in vain (e.g. Extract text from a text file with Chinese characters using vba). How should I read files with Chinese characters successfully, in a way that I do not need to modify other parts of the code which extract data by position? Thanks!
This would be an alternative way to read the string. Make sure that the .Charset is set to the charset of the file you want to read.
To use ADODB you will need to add the reference Microsoft ActiveX Data Objects 6.1 Library (the version can be different) in the VBA menu under Tools › References.
Dim adoStream As ADODB.Stream
Set adoStream = New ADODB.Stream
adoStream.Charset = "UTF-8" 'set the correct charset
adoStream.Open
adoStream.LoadFromFile FilePath
LstStr = adoStream.ReadText
adoStream.Close
Set adoStream = Nothing
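If you would rather not add the reference, the same read can be wrapped in a late-bound function, so that only the line assigning LstStr in the loop has to change. This is a sketch: the function name ReadTextFile and its parameters are hypothetical, and the right charset name depends on how the .lst files were actually written ("UTF-8" here is an assumption; "gb2312" or "big5" are other names ADODB.Stream typically accepts).

```vba
' Hypothetical helper: reads a whole text file using the given charset.
Function ReadTextFile(ByVal FilePath As String, _
                      ByVal CharsetName As String) As String
    Const adTypeText = 2            ' local constant, no ADO reference needed
    Dim stm As Object

    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = CharsetName       ' must match the file's real encoding
    stm.Open
    stm.LoadFromFile FilePath
    ReadTextFile = stm.ReadText     ' reads to the end of the stream by default
    stm.Close
    Set stm = Nothing
End Function
```

Inside the loop, the three lines from Open to Close would then become LstStr = ReadTextFile(FilePath, "UTF-8"), and the InStr-based extraction can stay as it is.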

Reading International/Special Characters in VBA

I read an Excel spreadsheet row by row and for each row create a textfile including information from the columns.
From time to time there is foreign text in some of the spreadsheet cells. In the debugger the foreign text appears as '?' question marks. It fails when trying to write these question marks to the text file.
This is a snippet of the code that reads the values from a row to a string array
Set oFS = CreateObject("Scripting.Filesystemobject")
For Each rID In oSh.UsedRange.Columns("A").Cells
For Each rValue In oSh.UsedRange.Rows(rowCount).Cells
ReDim Preserve columnValues(columnCount)
columnValues(columnCount) = rValue
columnCount = columnCount + 1
Next
Next
This is the code which writes to a text file
sFNText = sMakeFolder & "\" & rID.Value & ".txt"
Set oTxt = oFS.OpenTextFile(sFNText, 2, True)
For i = 0 To UBound(columnTitles)
oTxt.Write columnTitles(i) & ": " & columnValues(i) & vbNewLine
Next i
oTxt.Close
I have experimented with changing the format of opentextfile and also using AscW and ChrW to convert to and from ansi.
EDIT: In particular I am trying to read in Greek symbols (pi, omega etc.) and write them back out to a textfile. I have used the
StrConv(Cells(1, 1), vbUnicode)
method that was detailed in How can I create text files with special characters in their filenames and have got that example working. It seems now a problem with writing this to a textfile. nixda's example seems to work in isolation when using his Print command, however when I try
otxt.Write
to write my stored variable to a textfile it writes out garbage, as opposed to the print method which produces the correct result. Looking at the debugger both variables are stored identically (print method + write), so I believe it is now down to the output method (otxt.Write) which is converting the stored variable into garbage. I have tried using the -1 & -2 options for OpenTextFile - both producing garbage results.
I have the following sheet:
and the following code:
Sub writeUnicodeText()
Dim arr_Strings() As String
i = 0
For Each oCell In ActiveSheet.Range("A1:A4")
ReDim Preserve arr_Strings(i)
arr_Strings(i) = oCell.Value
i = i + 1
Next
Set oFS = CreateObject("Scripting.Filesystemobject")
Set oTxt = oFS.OpenTextFile("C:\users\axel\documents\test.txt", 2, True, -1)
For i = 0 To UBound(arr_Strings)
oTxt.Write arr_Strings(i) & vbNewLine
Next i
oTxt.Close
End Sub
This produces the following file:
This is the code I use to write to a text. I've tried many methods and this has worked the best.
Sub ProcessX()
FName1 = "Location of File"
txtStrngX = OpenTextFileToString2(FName1)
end sub
Public Function OpenTextFileToString2(ByVal strFile As String) As String
Dim hFile As Long
hFile = FreeFile
Open strFile For Input As #hFile
OpenTextFileToString2 = Input$(LOF(hFile), hFile)
Close #hFile
End Function
As for reading in from rows, just be sure to declare your variable as a String and any method should work fine.
Sorry, that's for reading from a text file. Here is the code for writing:
Public Function RecordsetToText(rs As Object, Optional FullPath _
As String, Optional ValueDelimiter As String = " ") As Boolean
'PURPOSE: EXPORTS DATA FROM AN ADO RECORDSET TO A TEXT FILE
'PARAMETERS:
'RS: Recordset to Export. Open the recordset before
'passing it to this function
'FullPath (Optional): FullPath of text file.
'if not specified, the function uses app.path +
'rs.txt
'ValueDelmiter (Optional): String to delimiter
'values within a row. If not specified, an tab
'is used
'RETURNS: True if successful, false if an error occurs
'COMMENTS: Rows are delimited by a carriage return
Dim sFullPath As String
Dim sDelimiter As String
Dim iFileNum As Integer
Dim lFieldCount As Long
Dim lCtr As Long
Dim oField As ADODB.Field
On Error GoTo ErrorHandler:
If RecordSetReady(rs) = False Then Exit Function
sDelimiter = ValueDelimiter
If FullPath = "" Then
sFullPath = App.Path
If Right(sFullPath, 1) <> "\" Then sFullPath = _
sFullPath & "\"
sFullPath = sFullPath & "rs.txt"
Else
sFullPath = FullPath
End If
iFileNum = FreeFile
Open sFullPath For Output As #iFileNum
With rs
lFieldCount = .Fields.Count - 1
On Error Resume Next
.MoveFirst
On Error GoTo ErrorHandler
For lCtr = 0 To lFieldCount
Set oField = .Fields(lCtr)
If lCtr < lFieldCount Then
Print #iFileNum, oField.Name & sDelimiter;
Else
Print #iFileNum, oField.Name
End If
Next
Do While Not .EOF
For lCtr = 0 To lFieldCount
Set oField = .Fields(lCtr)
If lCtr < lFieldCount Then
Print #iFileNum, oField.Value & sDelimiter;
Else
Print #iFileNum, oField.Value
End If
Next
.MoveNext
Loop
End With
RecordsetToText = True
ErrorHandler:
On Error Resume Next
Close #iFileNum
End Function

Remove a line from a text file if that line contains some string

In VB6, I'm looking for a way to remove a line of text from a text file if that line contains some string. I work mostly with C# and I'm at a loss here. With .NET there are several ways to do this, but I'm the lucky one who has to maintain some old VB code.
Is there a way to do this?
Thanks
Assuming you have the filename in a variable sFileName:
Dim iFile as Integer
Dim sLine as String, sNewText as string
iFile = FreeFile
Open sFileName For Input As #iFile
Do While Not EOF(iFile)
Line Input #iFile, sLine
If sLine Like "*foo*" Then
' skip the line
Else
sNewText = sNewText & sLine & vbCrLf
End If
Loop
Close
iFile = FreeFile
Open sFileName For Output As #iFile
Print #iFile, sNewText; ' trailing semicolon: sNewText already ends with vbCrLf
Close
You may want to output to a different file instead of overwriting the source file, but hopefully this gets you closer.
Text files are a complicated beast from one point of view: you cannot remove a line and shift the following text backward, because the file is a stream.
I suggest considering an input-to-output approach instead:
1) Open the input file (file A) as text.
2) Open a second, temporary file (file B) for output.
3) Iterate through all lines in file A.
4) If the current line contains the string, don't write it. If the current line does not contain the string, write it to file B.
5) Close file A and file B.
Now you can add some steps:
6) Delete file A.
7) Move file B to the previous location of file A.
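The steps above can be sketched in VB6/VBA roughly as follows. The procedure name, the ".tmp" suffix for file B and the case-insensitive comparison are assumptions made for the sketch, not part of the answer.

```vba
' Hypothetical sketch of steps 1) to 7): copy every line that does not
' contain sKey to a temporary file, then replace the original with it.
Sub RemoveLinesContaining(ByVal sFileName As String, ByVal sKey As String)
    Dim iIn As Integer, iOut As Integer
    Dim sTemp As String, sLine As String

    sTemp = sFileName & ".tmp"          ' file B (assumed name)

    iIn = FreeFile
    Open sFileName For Input As #iIn    ' 1) open file A
    iOut = FreeFile
    Open sTemp For Output As #iOut      ' 2) open file B

    Do While Not EOF(iIn)               ' 3) iterate over the lines
        Line Input #iIn, sLine
        ' 4) keep only lines that do not contain the key
        If InStr(1, sLine, sKey, vbTextCompare) = 0 Then
            Print #iOut, sLine
        End If
    Loop

    Close #iIn                          ' 5) close both files
    Close #iOut

    Kill sFileName                      ' 6) delete file A
    Name sTemp As sFileName             ' 7) move file B into its place
End Sub
```

Writing to a separate file keeps memory use flat for large inputs, since no more than one line is held at a time.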
DeleteLine "C:\file.txt", "John Doe", 0, 0
Function DeleteLine(strFile, strKey, LineNumber, CheckCase)
'Use strFile = "c:\file.txt" (Full path to text file)
'Use strKey = "John Doe" (Lines containing this text string to be deleted)
Const ForReading = 1
Const ForWriting = 2
Dim objFSO, objFile, Count, strLine, strLineCase, strNewFile
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(strFile, ForReading)
Do Until objFile.AtEndOfStream
strLine = objFile.Readline
If CheckCase = 0 Then strLineCase = UCase(strLine): strKey = UCase(strKey)
If LineNumber = objFile.Line - 1 Or LineNumber = 0 Then
If InStr(strLine, strKey) Or InStr(strLineCase, strKey) Or strKey = "" Then
strNewFile = strNewFile
Else
strNewFile = strNewFile & strLine & vbCrLf
End If
Else
strNewFile = strNewFile & strLine & vbCrLf
End If
Loop
objFile.Close
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(strFile, ForWriting)
objFile.Write strNewFile
objFile.Close
End Function
