Save .txt locally (UTF-8) [duplicate] - jscript

My system is Windows 10 English-US.
I need to write some non-printable ASCII characters to a text file. For example, for the ASCII value 28, I want to write \u001Cw to the file. I don't have to do anything special when this is coded in Java. Below is my code in VBS:
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
objStream.Open
objStream.Type = 2
objStream.Position = 0
objStream.CharSet = "utf-16"
objStream.WriteText ChrW(28) 'Need this to appear as \u001Cw in the output file
objStream.SaveToFile "C:\temp\test.txt", 2
objStream.Close

You need a read-write stream so that writing to it and saving it to file both work.
Const adModeReadWrite = 3
Const adTypeText = 2
Const adSaveCreateOverWrite = 2
Sub SaveToFile(text, filename)
With CreateObject("ADODB.Stream")
.Mode = adModeReadWrite
.Type = adTypeText
.Charset = "UTF-16"
.Open
.WriteText text
.SaveToFile filename, adSaveCreateOverWrite
.Close
End With
End Sub
text = Chr(28) & "Hello" & Chr(28)
SaveToFile text, "C:\temp\test.txt"
Other notes:
I like to explicitly define with Const all the constants in the code. Makes reading so much easier.
A With block saves quite a bit of typing here.
Setting the stream type to adTypeText is not really necessary, that's the default anyway. But explicit is better than implicit, I guess.
Setting the Position to 0 on a new stream is superfluous.
It's unnecessary to use ChrW() for ASCII-range characters. The stream's Charset decides the byte width when you save the stream to file. In RAM, everything is Unicode anyway (yes, even in VBScript).
There are two UTF-16 encodings supported by ADODB.Stream: little-endian UTF-16LE (which is the default and synonymous with UTF-16) and big-endian UTF-16BE, with the byte order reversed.
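To see what the two byte orders mean in practice, here is a minimal sketch in Python (not part of the original answer; Python is used only because it makes the raw bytes easy to inspect). It encodes the same string as the VBScript sample, Chr(28) & "Hello" & Chr(28), both ways. The BOM values come from Python's codecs module; whether ADODB.Stream writes exactly these bytes is an assumption that is not checked here.

```python
import codecs

# Same text as in the VBScript sample: Chr(28) & "Hello" & Chr(28)
text = "\x1c" + "Hello" + "\x1c"

little = text.encode("utf-16-le")  # low byte first, no BOM added
big = text.encode("utf-16-be")     # high byte first, no BOM added

print(little[:4].hex(" "))  # bytes of the first two characters, little-endian
print(big[:4].hex(" "))     # bytes of the first two characters, big-endian
print(codecs.BOM_UTF16_LE.hex(" "), codecs.BOM_UTF16_BE.hex(" "))
```

Each character takes two bytes in both variants; only the order inside each pair differs.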
You can achieve the same result with the FileSystemObject and its CreateTextFile() method:
Set FSO = CreateObject("Scripting.FileSystemObject")
Sub SaveToFile(text, filename)
' CreateTextFile(filename [, Overwrite [, Unicode]])
With FSO.CreateTextFile(filename, True, True)
.Write text
.Close
End With
End Sub
text = Chr(28) & "Hello" & Chr(28)
SaveToFile text, "C:\temp\test.txt"
This is a little bit simpler, but it only offers a Boolean Unicode parameter, which switches between UTF-16 and ANSI (not ASCII, as the documentation incorrectly claims!). The solution with ADODB.Stream gives you fine-grained encoding choices, for example UTF-8, which is impossible with the FileSystemObject.
For the record, there are two ways to create a UTF-8-encoded text file:
The way Microsoft likes to do it, with a 3-byte Byte Order Mark (BOM) at the start of the file. Most, if not all, Microsoft tools do that when they offer "UTF-8" as an option, and ADODB.Stream is no exception.
The way everyone else does it: without a BOM. This is correct for most uses.
To create a UTF-8 file with BOM, the first code sample above can be used with the Charset set to "UTF-8". To create a UTF-8 file without BOM, we can use two stream objects:
Const adModeReadWrite = 3
Const adTypeBinary = 1
Const adTypeText = 2
Const adSaveCreateOverWrite = 2
Sub SaveToFile(text, filename)
Dim iStr: Set iStr = CreateObject("ADODB.Stream")
Dim oStr: Set oStr = CreateObject("ADODB.Stream")
' one stream for converting the text to UTF-8 bytes
iStr.Mode = adModeReadWrite
iStr.Type = adTypeText
iStr.Charset = "UTF-8"
iStr.Open
iStr.WriteText text
' one stream to write bytes to a file
oStr.Mode = adModeReadWrite
oStr.Type = adTypeBinary
oStr.Open
' switch first stream to binary mode and skip UTF-8 BOM
iStr.Position = 0
iStr.Type = adTypeBinary
iStr.Position = 3
' write remaining bytes to file and clean up
oStr.Write iStr.Read
oStr.SaveToFile filename, adSaveCreateOverWrite
oStr.Close
iStr.Close
End Sub
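The difference between the two UTF-8 flavours is exactly three bytes, which is why skipping to position 3 works. A minimal sketch in Python (an illustration outside the original answer) comparing the two encodings of the sample text; "utf-8-sig" is Python's name for UTF-8 with a BOM, not an ADODB charset name.

```python
import codecs

# Same text as in the VBScript sample: Chr(28) & "Hello" & Chr(28)
text = "\x1c" + "Hello" + "\x1c"

with_bom = text.encode("utf-8-sig")  # Python's codec that prepends the BOM
without_bom = text.encode("utf-8")   # plain UTF-8, no BOM

# The BOM is the three bytes EF BB BF; everything after it is identical.
print(with_bom[:3] == codecs.BOM_UTF8)
print(with_bom[3:] == without_bom)
```

A file produced by the two-stream routine should therefore start directly with the first byte of the text rather than with EF BB BF.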

Related

Removing CrLf in whole csv in VBA - ADODB.Stream (Excel Macro)

I have a VBA macro that adds a BOM to a UTF-8 CSV file, which is needed for the file to open successfully in Excel.
The problem is that when there is a CrLf at the end of a line, Excel starts a new row below that line.
I need to remove all CrLf marks (without adding a Cr or Lf in their place). That should help because regular lines in the CSV end with Cr only; CrLf occurs only in the faulty lines.
Can you please help with my source code? What should I add so that the source CSV is saved as the target CSV with a BOM and without any CrLf?
Dim fsT, tFileToOpen, tFileToSave As String
tFileToOpen = "C:\source_NO_BOM.csv"
tFileToSave = "C:\target_WITH_BOM.csv"
tFileToOpenPath = tFileToOpen
tFileToSavePath = tFileToSave
Set fsT = CreateObject("ADODB.Stream"): 'Create Stream object
fsT.Type = 2: 'Specify stream type – we want To save text/string data.
fsT.Charset = "utf-8": 'Specify charset For the source text data.
fsT.Open: 'Open the stream
fsT.LoadFromFile tFileToOpenPath: 'And write the file to the object stream
fsT.SaveToFile tFileToSavePath, 2: 'Save the data to the named path
You can read the content into a string, strip the line breaks from it, and write the result out through a second stream object.
Dim fsT, tFileToOpen, tFileToSave As String
Dim s As String, Newfst As Object
tFileToOpen = "C:\source_NO_BOM.csv"
tFileToSave = "C:\target_WITH_BOM.csv"
tFileToOpenPath = tFileToOpen
tFileToSavePath = tFileToSave
Set fsT = CreateObject("ADODB.Stream"): 'Create Stream object
With fsT
.Type = 2: 'Specify stream type - we want To save text/string data.
.Charset = "utf-8": 'Specify charset For the source text data.
.Open: 'Open the stream
.LoadFromFile tFileToOpenPath: 'And write the file to the object stream
s = .ReadText
s = Replace(s, vbCrLf, "")
End With
Set Newfst = CreateObject("ADODB.Stream")
With Newfst
.Type = 2
.Charset = "utf-8"
.Open
.WriteText s
.SaveToFile tFileToSave, 2
End With
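The effect of the Replace call can be checked in isolation. A minimal sketch in Python (used only for illustration; the sample data is made up) showing that removing CrLf pairs leaves the lone Cr line endings untouched:

```python
# Hypothetical CSV content: regular rows end with CR, one row is broken by a stray CRLF
raw = "id;text\r1;first half\r\n second half\r2;ok\r"

# Equivalent of Replace(s, vbCrLf, "") in the VBA answer
cleaned = raw.replace("\r\n", "")

print(repr(cleaned))
print(cleaned.count("\r"))  # number of remaining CR line endings
```

Only the two-character CrLf sequence is removed, so the broken row is joined back together while the regular row endings survive.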

How to maintain character set while exporting Excel table to .csv UTF8 without BOM using VBA?

I have read several other answers regarding how to export a table to .csv with UTF8 encoding (no BOM). I found code which almost works for me, see below.
My problem is that the table contains Swedish characters (ÅÄÖ), and when the .csv file is opened these are garbled by what looks like an incorrect charset. I found a workaround, which is to open the .csv file in Notepad, save it, and then open it in Excel. The workaround makes Excel display the letters properly, but I would prefer not to have the extra step. Can the code below be modified so that the charset is not lost?
Option Explicit
Sub CSVFileAsUTF8WithoutBOM()
Dim SrcRange As Range
Dim CurrRow As Range
Dim CurrCell As Range
Dim CurrTextStr As String
Dim ListSep As String
Dim FName As Variant
Dim UTFStream As Object
Dim BinaryStream As Object
' ADO Constants
Const adTypeBinary = 1 ' The stream contains binary data
Const adTypeText = 2 ' The stream contains text data (default)
Const adWriteLine = 1 ' write text string and a line separator (as defined by the LineSeparator property) to the stream.
Const adModeReadWrite = 3 ' Read/write
Const adLF = 10 ' Line feed only - default is carriage return line feed (adCRLF)
Const adSaveCreateOverWrite = 2 ' Overwrites the file with the data from the currently open Stream object, if the file already exists
' Open this workbook location
ChDrive Left(ThisWorkbook.Path, 1)
ChDir ThisWorkbook.Path
' ask for file name and path
FName = Application.GetSaveAsFilename("", "CSV File (*.csv), *.csv")
' prepare UTF-8 stream
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.LineSeparator = adLF
UTFStream.Open
'set field separator
ListSep = ";"
'set source range with data for csv file
If Selection.Cells.Count > 1 Then
Set SrcRange = Selection
Else
Set SrcRange = ActiveSheet.UsedRange
End If
For Each CurrRow In SrcRange.Rows
CurrTextStr = ""
For Each CurrCell In CurrRow.Cells
CurrTextStr = CurrTextStr & Replace(CurrCell.Value, """", """""") & ListSep
Next
'remove ListSep after the last value in line
While Right(CurrTextStr, 1) = ListSep
CurrTextStr = Left(CurrTextStr, Len(CurrTextStr) - 1)
Wend
'add line to UTFStream
UTFStream.WriteText CurrTextStr, adWriteLine ' Writes character data to a text Stream object
Next
'skip BOM
UTFStream.Position = 3 ' sets or returns a long value that indicates the current position (in bytes) from the beginning of a Stream object
'copy UTFStream to BinaryStream
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open ' Opens a Stream object
'Strips BOM (first 3 bytes)
UTFStream.CopyTo BinaryStream ' Copies a specified number of characters/bytes from one Stream object into another Stream object
UTFStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
UTFStream.Close ' Closes a Stream object
'save to file
BinaryStream.SaveToFile FName, adSaveCreateOverWrite
BinaryStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
BinaryStream.Close ' Closes a Stream object
End Sub
EDIT:
Based on your comment, I realize that what you initially wanted was to keep the information about the character encoding inside the file without having a BOM.
The problem with this question (as you realized) is that the BOM is normally what carries the information about the character encoding, and putting this information anywhere else in the file doesn't really make sense.
So, your code is actually perfect for the task at hand. What needs to change is how the CSV file is imported or opened by the software you want to use.
When the file has no BOM, software reading the file has to guess the character encoding.
In general, if the software you use doesn't support BOM and doesn't guess correctly, there should at least be a way to customize the behavior of the import/open command so that you can specify the character encoding (seems like you actually found it).
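What a wrong guess looks like can be reproduced outside Excel. A minimal sketch in Python (an illustration, not from the original answer), assuming the reader falls back to the Windows-1252 code page; which code page Excel actually picks depends on the system and is not verified here.

```python
swedish = "ÅÄÖ"

utf8_bytes = swedish.encode("utf-8")  # what the BOM-less CSV file contains
wrong = utf8_bytes.decode("cp1252")   # reader guesses Windows-1252 (assumed fallback)
right = utf8_bytes.decode("utf-8")    # reader is told the file is UTF-8

print(repr(wrong))
print(repr(right))
```

Each Swedish letter is two bytes in UTF-8, so a single-byte code page turns every letter into two unrelated characters. The bytes in the file are fine; only the interpretation is wrong.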
Original answer:
For some reason, Excel has a hard time guessing the character encoding of a UTF-8 encoded CSV file when you just double-click the file. You have to help it a little...
Instead of opening it directly, you could load the CSV content to a new workbook by using the (legacy) Text Import Wizard and selecting the UTF-8 character set (65001) during import if Excel is not able to figure it out by itself.
If you were to record a macro while doing it and make it into a sub procedure, you could have something like this:
Sub OpenCSV(FullFileName As String)
Dim wb As Workbook
Set wb = Workbooks.Add
Dim ws As Worksheet
Set ws = wb.Sheets(1)
With ws.QueryTables.Add(Connection:= _
"TEXT;" & FullFileName, Destination:=ws.Range( _
"$A$1"))
.Name = "CSV_Open"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 65001
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = True
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End Sub
Other suggestion
If you really want to be able to double-click the file instead of using the Text Import Wizard or running a macro, you could always create a VBA event procedure in an add-in or PERSONAL.XLSB that runs every time a workbook is opened.
If it detects that the file that was just opened is a CSV file, it could close it and "reopen" it using the code above.
Extra:
Of interest: there is a question here about how to change the default character encoding that Excel uses.

Convert Xlsx to CSV UTF-8 format

I want to convert my XLSX file to CSV UTF-8 format using VBScript or macros.
if WScript.Arguments.Count < 2 Then
WScript.Echo "Error! Please specify the source path and the destination. Usage: XlsToCsv SourcePath.xls Destination.csv"
Wscript.Quit
End If
Dim oExcel
Set oExcel = CreateObject("Excel.Application")
Dim oBook
Set oBook = oExcel.Workbooks.Open(Wscript.Arguments.Item(0))
oBook.SaveAs WScript.Arguments.Item(1), 6
oBook.Close False
oExcel.Quit
WScript.Echo "Done"
The above script works fine for normal formats.
Please help me convert the file to UTF-8 format.
I have also tried the code below, but it produces junk characters:
Public Sub convert_UnicodeToUTF8()
Dim parF1, parF2 As String
parF1 = "C:\shrangi\SX_Hospital.xlsx"
parF2 = "C:\shrangi\SX_Hospital.csv"
Const adSaveCreateOverWrite = 2
Const adTypeText = 2
Dim streamSrc, streamDst ' Source / Destination
Set streamSrc = CreateObject("ADODB.Stream")
Set streamDst = CreateObject("ADODB.Stream")
streamDst.Type = adTypeText
streamDst.Charset = "UTF-8"
streamDst.Open
With streamSrc
.Type = adTypeText
.Charset = "UTF-8"
.Open
.LoadFromFile parF1
.copyTo streamDst
.Close
End With
streamDst.SaveToFile parF2, adSaveCreateOverWrite
streamDst.Close
Set streamSrc = Nothing
Set streamDst = Nothing
End Sub
Simply:
ActiveWorkbook.SaveAs Filename:="C:\yourPath\yourFileName.csv", FileFormat:=xlCSVUTF8
More Info:
MSDN: Workbook.SaveAs Method
Since you are converting an external file to an external file, you don't need to do it within Excel with VBA. That opens up some possibilities. With the OpenXML SDK you don't even need Excel.
The OpenXML SDK is a bit hard to use, so there are a few wrappers for it that simplify workbook programming. EPPlus has a PowerShell wrapper around it called PSExcel. It makes this task really easy in PowerShell.
One-time setup, typically as an Administrator:
Install-Module PSExcel
Once per PowerShell session:
Import-Module PSExcel
Then:
Import-XLSX 'C:\shrangi\SX_Hospital.xlsx' | Export-CSV 'C:\shrangi\SX_Hospital.csv' -Encoding UTF8
For a simple workbook, that's all you need.
Side note on CSV: Converting from xlsx to csv throws out almost all the metadata and introduces the need for more metadata. Along with the file, you need to communicate the character encoding, the data types of each column, whether there is a header row, the line terminator, the field separator (not always comma), the culture-specific numeric formatting, the quote character (aka "text qualifier"), and the quote character escape mechanism. You can see all of these questions that Excel has to ask when you use its text import wizard.
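Several of those choices can be made explicit when the file is written. A minimal sketch using Python's standard csv module (an illustration, not part of the original answer; the sample rows are made up) that fixes the field separator, quote character, escape mechanism, line terminator, and encoding in one place:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(
    buf,
    delimiter=";",        # field separator
    quotechar='"',        # text qualifier
    doublequote=True,     # escape a quote inside a field by doubling it
    lineterminator="\n",  # line terminator
)
writer.writerow(["name", "note"])                # header row
writer.writerow(["Åsa", 'says "hi"; waves'])    # field contains quote and separator

csv_text = buf.getvalue()
csv_bytes = csv_text.encode("utf-8")  # encoding chosen explicitly, no BOM
print(csv_text)
```

None of these settings is recorded in the file itself, which is why the reader still has to be told about them.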

How to read .txt file with Chinese characters?

I have a subroutine that reads text files and extracts certain data from them. Here is an example:
NamePrefix = "Example"
OutputPath = "C:\Example"
DbSize = 65536
LstStr = ""
Dim Success() As Boolean
Dim Value() As Double
ReDim Success(1 to DbSize)
ReDim Value(1 to DbSize)
For ID = 1 to DbSize
'Read string
FileName = NamePrefix & Format(ID,"000000") & ".lst"
FilePath = OutputPath & "\" & FileName
Open FilePath For Input As 1
LstStr = Input(LOF(1),1)
Close 1
'Extract data
If InStr(1, LstStr, "SUCCESS") <> 0 Then Success(ID) = True Else Success(ID) = False
Pos1 = InStr(1, LstStr, "TH 1 value: ") 'Position varies for each file
Value(ID) = Val(Mid(LstStr, Pos1 + 13, 10)) 'Value in scientific notation
Next ID
Using InStr to locate strings by position works perfectly when the files contain just letters, digits and symbols. However, sometimes the files contain Chinese characters, and then the Input function returns an empty string "" to LstStr. I tried some other suggested methods, but in vain (e.g. Extract text from a text file with Chinese characters using vba). How can I read files with Chinese characters successfully, in a way that does not require modifying the other parts of the code, which extract data by position? Thanks!
This would be an alternative way to read the string. Make sure that the .Charset is set to the charset of the file you want to read.
To use ADODB you will need to add the reference Microsoft ActiveX Data Objects 6.1 Library (the version can be different) in the VBA menu › Tools › References
Dim adoStream As ADODB.Stream
Set adoStream = New ADODB.Stream
adoStream.Charset = "UTF-8" 'set the correct charset
adoStream.Open
adoStream.LoadFromFile FilePath
LstStr = adoStream.ReadText
adoStream.Close
Set adoStream = Nothing
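The same principle, decoding the file with its real charset so that positions are counted in characters, can be tried outside VBA. A minimal sketch in Python (an illustration only; the file content and file name are made up, and UTF-8 is assumed as the file's encoding):

```python
import os
import tempfile

# Hypothetical file content: a Chinese comment line followed by the data lines
content = "备注: 测试\nSUCCESS\nTH 1 value:  1.2345E-03\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Example000001.lst")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(content)
    # Read the whole file as text, naming the charset explicitly
    with open(path, "r", encoding="utf-8", newline="") as f:
        lst_str = f.read()

marker = "TH 1 value: "
pos = lst_str.find(marker)  # character position, not byte position
value = float(lst_str[pos + len(marker):].split()[0])
print("SUCCESS" in lst_str, value)
```

Once the text is decoded correctly, searching by position works the same whether or not the file contains Chinese characters.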

Can I export excel data with UTF-8 without BOM?

I export Microsoft Excel data with an Excel macro (VBScript).
Because the file is a Lua script, I export it as UTF-8.
The only way I have found to produce UTF-8 from Excel is to use adodb.stream like this:
set fileLua = CreateObject("adodb.stream")
fileLua.Type = 2
fileLua.Mode = 3
fileLua.Charset = "UTF-8"
fileLua.Open
fileLua.WriteText("test")
fileLua.SaveToFile("Test.lua")
fileLua.flush
fileLua.Close
I want to eliminate the BOM from Test.lua, but I don't know how.
(Because Test.lua contains some Unicode text, I have to use UTF-8.)
Do you know how to write a UTF-8 file without a BOM from Excel?
Thanks in advance.
I also have the same issue: I have to export data from Excel (Office 2003, VBA6.5) to a UTF-8 encoded file. I found the answer from your question! Below is my example, where I also strip the BOM using trick #2 from boost's (thanks!) answer. I didn't get #1 working and never tried #3.
Sub WriteUTF8WithoutBOM()
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.LineSeparator = adLF
UTFStream.Open
UTFStream.WriteText "This is an unicode/UTF-8 test.", adWriteLine
UTFStream.WriteText "First set of special characters: öäåñüûú€", adWriteLine
UTFStream.WriteText "Second set of special characters: qwertzuiopõúasdfghjkléáûyxcvbnm\|Ä€Í÷×äðÐ[]í³£;?¤>#&#{}<;>*~¡^¢°²`ÿ´½¨¸0", adWriteLine
UTFStream.Position = 3 'skip BOM
Dim BinaryStream As Object
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
'Strips BOM (first 3 bytes)
UTFStream.CopyTo BinaryStream
'UTFStream.SaveToFile "d:\adodb-stream1.txt", adSaveCreateOverWrite
UTFStream.Flush
UTFStream.Close
BinaryStream.SaveToFile "d:\adodb-stream2.txt", adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
End Sub
The ADO Stream Object reference I used.
If anyone else is struggling with the adTypeText constant, you need to include "Microsoft ActiveX Data Objects 2.5 Object Library" under Tools->References.
A few possibilities:
1. Put the text into the buffer as UTF-8, Type=2, but then set Type=1 (as binary) and write that out. That might convince ADODB.Stream to skip adding the BOM.
2. Create another buffer, as type binary, and use the CopyTo to copy the data to that buffer from a point after the BOM.
3. Read the file in again using Scripting.FileSystemObject, trim off the BOM, write out again.
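The last possibility, rewriting the file without its first three bytes, can be sketched as follows. This is a Python illustration of the idea rather than the Scripting.FileSystemObject version the answer mentions; the function name strip_utf8_bom and the file name are hypothetical.

```python
import codecs
import os
import tempfile

def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, leave the file untouched
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Test.lua")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + 'print("test")\n'.encode("utf-8"))
    removed_first = strip_utf8_bom(path)   # BOM present on the first pass
    removed_second = strip_utf8_bom(path)  # already clean on the second pass
    with open(path, "rb") as f:
        result = f.read()

print(removed_first, removed_second, result)
```

Checking for the BOM before cutting matters: blindly dropping three bytes from a file that has no BOM would destroy the start of the text.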
Edit
A comment from rellampec alerted me to a better way of dropping the LF I had discovered was added to the end of the file by user272735's method. I have added a new version of my routine at the end.
Original post
I had been using user272735's method successfully for a year when I discovered it added a LF at the end of the file. I failed to notice this extra LF until I did some very detailed testing so this is not an important error. However, my latest version discards that LF just in case it ever became important.
Public Sub PutTextFileUtf8(ByVal PathFileName As String, ByVal FileBody As String)
' Outputs FileBody as a text file (UTF-8 encoding without leading BOM)
' named PathFileName
' Needs reference to "Microsoft ActiveX Data Objects n.n Library"
' Addition to original code says version 2.5. Tested with version 6.1.
' 1Nov16 Copied from http://stackoverflow.com/a/4461250/973283
' but replaced literals with parameters.
' 15Aug17 Discovered routine was adding an LF to the end of the file.
' Added code to discard that LF.
' References: http://stackoverflow.com/a/4461250/973283
' https://www.w3schools.com/asp/ado_ref_stream.asp
Dim BinaryStream As Object
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
' The LineSeparator will be added to the end of FileBody. It is possible
' to select a different value for LineSeparator but I can find nothing to
' suggest it is possible to not add anything to the end of FileBody
UTFStream.LineSeparator = adLF
UTFStream.Open
UTFStream.WriteText FileBody, adWriteLine
UTFStream.Position = 3 'skip BOM
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
UTFStream.CopyTo BinaryStream
' Originally I planned to use "CopyTo Dest, NumChars" to not copy the last
' byte. However, NumChars is described as an integer whereas Position is
' described as Long. I was concerned that by "integer" they mean 16 bits.
'Debug.Print BinaryStream.Position
BinaryStream.Position = BinaryStream.Position - 1
BinaryStream.SetEOS
'Debug.Print BinaryStream.Position
UTFStream.Flush
UTFStream.Close
Set UTFStream = Nothing
BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
Set BinaryStream = Nothing
End Sub
New version of routine
This version omits the code to discard the unwanted LF added at the end because it avoids adding the LF in the first place. I have retained the original version in case anyone is interested in the technique for deleting trailing characters.
Public Sub PutTextFileUtf8NoBOM(ByVal PathFileName As String, ByVal FileBody As String)
' Outputs FileBody as a text file named PathFileName using
' UTF-8 encoding without leading BOM
' Needs reference to "Microsoft ActiveX Data Objects n.n Library"
' Addition to original code says version 2.5. Tested with version 6.1.
' 1Nov16 Copied from http://stackoverflow.com/a/4461250/973283
' but replaced literals with parameters.
' 15Aug17 Discovered routine was adding an LF to the end of the file.
' Added code to discard that LF.
' 11Oct17 Posted to StackOverflow
' 9Aug18 Comment from rellampec suggested removal of adWriteLine from
' WriteText statement would avoid adding LF.
' 30Sep18 Amended routine to remove adWriteLine from WriteText statement
' and code to remove LF from file. Successfully tested new version.
' References: http://stackoverflow.com/a/4461250/973283
' https://www.w3schools.com/asp/ado_ref_stream.asp
Dim BinaryStream As Object
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.Open
UTFStream.WriteText FileBody
UTFStream.Position = 3 'skip BOM
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
UTFStream.CopyTo BinaryStream
UTFStream.Flush
UTFStream.Close
Set UTFStream = Nothing
BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
Set BinaryStream = Nothing
End Sub
If you prefer native T-SQL instead of external code:
DECLARE @FILE_NAME VARCHAR(255) = 'd:\utils\test.xml' --drive:\path\filename\
DECLARE @FILE_DATA VARCHAR(MAX) = '<?xml version="1.0" encoding="UTF-8"?>test</xml>' --binary as varchar(max)
DECLARE @FILE_NAME_TO VARCHAR(255) --Temp name for text stream
DECLARE @FSO_ID_TXTSTRM INT --Text Stream
DECLARE @FSO_ID_BINSTRM INT --Binary Stream
DECLARE @RC INT
EXEC @RC = sp_OACreate 'ADODB.Stream', @FSO_ID_TXTSTRM OUTPUT
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Type', 2 --1 = binary, 2 = text
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Mode', 3 --0 = not set, 1 read, 2 write, 3 read/write
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Charset', 'UTF-8' --'ISO-8859-1'
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'LineSeparator', 10 --10 = adLF
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Open'
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'WriteText', NULL, @FILE_DATA --text method
--Create binary stream
EXEC @RC = sp_OACreate 'ADODB.Stream', @FSO_ID_BINSTRM OUTPUT
EXEC @RC = sp_OASetProperty @FSO_ID_BINSTRM, 'Type', 1 --1 = binary, 2 = text
--Mode has to be set before the stream is opened
EXEC @RC = sp_OASetProperty @FSO_ID_BINSTRM, 'Mode', 3 --0 = not set, 1 read, 2 write, 3 read/write
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'Open'
--Move 3 positions forward in text stream (BOM is first 3 positions)
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Position', 3
--Copy text stream to binary stream
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'CopyTo', NULL, @FSO_ID_BINSTRM
--Commit data and close text stream
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Flush'
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Close'
EXEC @RC = sp_OADestroy @FSO_ID_TXTSTRM
--Save binary stream to file and close
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'SaveToFile', NULL, @FILE_NAME, 2 --1 = notexist 2 = overwrite
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'Close'
EXEC @RC = sp_OADestroy @FSO_ID_BINSTRM
Here's another BOM-disposal hack, from an answer that overlaps your question.
Apologies for the late answer - this is more for other people who are encountering Byte Order Markers - and the page views on this question tell me that your question is relevant to several related problems: it's surprisingly difficult to write a BOM-Free file in VBA - even some of the common streams libraries deposit a BOM in your output, whether you asked for it or not.
I say my answer 'overlaps' because the code below is solving a slightly different problem - the primary purpose is writing a Schema file for a folder with a heterogeneous collection of files - but it's a working example of BOM-removal and BOM-free file writing in use, and the relevant segment is clearly marked.
The key functionality is that we iterate through all the '.csv' files in a folder and test each file with a quick nibble of the first four bytes; we only undertake the onerous task of stripping out the marker if we see one.
We're working with low-level file-handling code from the primordial C. We have to, all the way down to using byte arrays, because everything else that you do in VBA will deposit the Byte Order Markers embedded in the structure of a string variable.
So, without further adodb, here's the code:
BOM-Disposal code for text files in a schema.ini file:
Public Sub SetSchema(strFolder As String)
On Error Resume Next
' Write a Schema.ini file to the data folder.
' This is necessary if we do not have the registry privileges to set the
' correct 'ImportMixedTypes=Text' registry value, which overrides IMEX=1
' The code also checks for ANSI or UTF-8 and UTF-16 files, and applies a
' usable setting for CharacterSet ( UNICODE|ANSI ) with a horrible hack.
' OEM codepage-defined text is not supported: further coding is required
' ...And we strip out Byte Order Markers, if we see them - the OLEDB SQL
' provider for textfiles can't deal with a BOM in a UTF-16 or UTF-8 file
' Not implemented: handling tab-delimited files or other delimiters. The
' code assumes a header row with columns, specifies 'scan all rows', and
' imposes 'read the column as text' if the data types are mixed.
Dim strSchema As String
Dim strFile As String
Dim hndFile As Long
Dim arrFile() As Byte
Dim arrBytes(0 To 4) As Byte
If Right(strFolder, 1) <> "\" Then strFolder = strFolder & "\"
' Dir() is an iterator function when you call it with a wildcard:
strFile = VBA.FileSystem.Dir(strFolder & "*.csv")
Do While Len(strFile) > 0
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
Get #hndFile, , arrBytes
Close #hndFile
strSchema = strSchema & "[" & strFile & "]" & vbCrLf
strSchema = strSchema & "Format=CSVDelimited" & vbCrLf
strSchema = strSchema & "ImportMixedTypes=Text" & vbCrLf
strSchema = strSchema & "MaxScanRows=0" & vbCrLf
If arrBytes(2) = 0 Or arrBytes(3) = 0 Then ' this is a hack
strSchema = strSchema & "CharacterSet=UNICODE" & vbCrLf
Else
strSchema = strSchema & "CharacterSet=ANSI" & vbCrLf
End If
strSchema = strSchema & "ColNameHeader = True" & vbCrLf
strSchema = strSchema & vbCrLf
' ***********************************************************
' BOM disposal - Byte order marks break the Access OLEDB text provider:
If arrBytes(0) = &HFE And arrBytes(1) = &HFF _
Or arrBytes(0) = &HFF And arrBytes(1) = &HFE Then
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
ReDim arrFile(0 To LOF(hndFile) - 1)
Get #hndFile, , arrFile
Close #hndFile
BigReplace arrFile, arrBytes(0) & arrBytes(1), ""
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
Put #hndFile, , arrFile
Close #hndFile
Erase arrFile
ElseIf arrBytes(0) = &HEF And arrBytes(1) = &HBB And arrBytes(2) = &HBF Then
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
ReDim arrFile(0 To LOF(hndFile) - 1)
Get #hndFile, , arrFile
Close #hndFile
BigReplace arrFile, arrBytes(0) & arrBytes(1) & arrBytes(2), ""
hndFile = FreeFile
Open strFolder & strFile For Binary As #hndFile
Put #hndFile, , arrFile
Close #hndFile
Erase arrFile
End If
' ***********************************************************
strFile = ""
strFile = Dir
Loop
If Len(strSchema) > 0 Then
strFile = strFolder & "Schema.ini"
hndFile = FreeFile
Open strFile For Binary As #hndFile
Put #hndFile, , strSchema
Close #hndFile
End If
End Sub
Public Sub BigReplace(ByRef arrBytes() As Byte, _
ByRef SearchFor As String, _
ByRef ReplaceWith As String)
On Error Resume Next
Dim varSplit As Variant
varSplit = Split(arrBytes, SearchFor)
arrBytes = Join$(varSplit, ReplaceWith)
Erase varSplit
End Sub
The code's easier to understand if you know that a Byte Array can be assigned to a VBA.String, and vice versa. The BigReplace() function is a hack that sidesteps some of VBA's inefficient string-handling, especially allocation: you'll find that large files cause serious memory and performance problems if you do it any other way.
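The "quick nibble" of the leading bytes can be sketched on its own. This is a minimal Python illustration of the same byte comparisons the VBA code makes (EF BB BF, FF FE, FE FF), not a port of the routine; the function name detect_bom is hypothetical, and UTF-32 BOMs are deliberately not handled.

```python
import codecs

def detect_bom(head):
    """Return (encoding_name, bom_length) for a BOM at the start of head, else (None, 0)."""
    boms = [
        ("utf-8", codecs.BOM_UTF8),         # EF BB BF
        ("utf-16-le", codecs.BOM_UTF16_LE), # FF FE
        ("utf-16-be", codecs.BOM_UTF16_BE), # FE FF
    ]
    for name, bom in boms:
        if head.startswith(bom):
            return name, len(bom)
    return None, 0

# Only the first few bytes of a file are needed for the test
print(detect_bom(b"\xef\xbb\xbfid,name"))
print(detect_bom(b"\xff\xfei\x00d\x00"))
print(detect_bom(b"id,name"))
```

Reading a handful of bytes first keeps the common case cheap: files without a BOM are never loaded in full.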

Resources