How to maintain character set while exporting Excel table to .csv UTF8 without BOM using VBA?

I have read several other answers regarding how to export a table to .csv with UTF8 encoding (no BOM). I found code which almost works for me, see below.
My problem is that the table contains Swedish characters (ÅÄÖ), and when the .csv file is opened these come out garbled, apparently because the file is read with the wrong charset. I found a workaround, which is to open the .csv file in Notepad, save it, and then open it in Excel. The workaround makes Excel display the letters properly, but I would prefer not to have the extra step. Can the code below be modified so that the charset is not lost?
Option Explicit
Sub CSVFileAsUTF8WithoutBOM()
Dim SrcRange As Range
Dim CurrRow As Range
Dim CurrCell As Range
Dim CurrTextStr As String
Dim ListSep As String
Dim FName As Variant
Dim UTFStream As Object
Dim BinaryStream As Object
' ADO Constants
Const adTypeBinary = 1 ' The stream contains binary data
Const adTypeText = 2 ' The stream contains text data (default)
Const adWriteLine = 1 ' write text string and a line separator (as defined by the LineSeparator property) to the stream.
Const adModeReadWrite = 3 ' Read/write
Const adLF = 10 ' Line feed only - default is carriage return line feed (adCRLF)
Const adSaveCreateOverWrite = 2 ' Overwrites the file with the data from the currently open Stream object, if the file already exists
' Open this workbook location
ChDrive Left(ThisWorkbook.Path, 1)
ChDir ThisWorkbook.Path
' ask for file name and path
FName = Application.GetSaveAsFilename("", "CSV File (*.csv), *.csv")
If VarType(FName) = vbBoolean Then Exit Sub ' user cancelled the dialog, nothing to save
' prepare UTF-8 stream
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.LineSeparator = adLF
UTFStream.Open
'set field separator
ListSep = ";"
'set source range with data for csv file
If Selection.Cells.Count > 1 Then
Set SrcRange = Selection
Else
Set SrcRange = ActiveSheet.UsedRange
End If
For Each CurrRow In SrcRange.Rows
CurrTextStr = ""
For Each CurrCell In CurrRow.Cells
CurrTextStr = CurrTextStr & Replace(CurrCell.Value, """", """""") & ListSep
Next
'remove ListSep after the last value in line
While Right(CurrTextStr, 1) = ListSep
CurrTextStr = Left(CurrTextStr, Len(CurrTextStr) - 1)
Wend
'add line to UTFStream
UTFStream.WriteText CurrTextStr, adWriteLine ' Writes character data to a text Stream object
Next
'skip BOM
UTFStream.Position = 3 ' sets or returns a long value that indicates the current position (in bytes) from the beginning of a Stream object
'copy UTFStream to BinaryStream
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open ' Opens a Stream object
'Strips BOM (first 3 bytes)
UTFStream.CopyTo BinaryStream ' Copies a specified number of characters/bytes from one Stream object into another Stream object
UTFStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
UTFStream.Close ' Closes a Stream object
'save to file
BinaryStream.SaveToFile FName, adSaveCreateOverWrite
BinaryStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
BinaryStream.Close ' Closes a Stream object
End Sub

EDIT:
Based on your comment, I realize that what you initially wanted was to keep the information about the character encoding inside the file without having a BOM.
The problem with this question (as you realized) is that the BOM is normally what carries the information about the character encoding, and putting this information anywhere else in the file doesn't really make sense.
So, your code is actually perfect for the task at hand. What needs to change is how the CSV file is imported/opened by the software you want to use.
When the file has no BOM, software reading the file has to guess the character encoding.
In general, if the software you use doesn't support a BOM and doesn't guess correctly, there should at least be a way to customize the import/open command so that you can specify the character encoding (it seems you actually found it).
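To make the BOM point concrete, here is a minimal sketch of how a reader could check whether a file starts with the UTF-8 BOM, which is the three bytes EF BB BF. The function name HasUtf8Bom is made up for illustration and is not part of the code in the question.

```vba
' Returns True if the file begins with the UTF-8 BOM (bytes EF BB BF).
' HasUtf8Bom is a hypothetical helper name used only for this sketch.
Function HasUtf8Bom(ByVal filePath As String) As Boolean
    Dim f As Integer
    Dim b(0 To 2) As Byte

    ' A file shorter than three bytes cannot contain the BOM
    If FileLen(filePath) < 3 Then Exit Function

    f = FreeFile
    Open filePath For Binary Access Read As #f
    Get #f, 1, b          ' read the first three bytes of the file
    Close #f

    HasUtf8Bom = (b(0) = &HEF And b(1) = &HBB And b(2) = &HBF)
End Function
```

A file written by the export macro in the question should make this return False, since the macro skips the first three bytes of the stream before saving.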
Original answer:
For some reason, Excel has a hard time guessing the character encoding of a UTF-8 encoded CSV file when you just double-click the file. You have to help it a little...
Instead of opening it directly, you could load the CSV content to a new workbook by using the (legacy) Text Import Wizard and selecting the UTF-8 character set (65001) during import if Excel is not able to figure it out by itself.
If you were to record a macro while doing it and make it into a sub procedure, you could have something like this:
Sub OpenCSV(FullFileName As String)
Dim wb As Workbook
Set wb = Workbooks.Add
Dim ws As Worksheet
Set ws = wb.Sheets(1)
With ws.QueryTables.Add(Connection:= _
"TEXT;" & FullFileName, Destination:=ws.Range( _
"$A$1"))
.Name = "CSV_Open"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 65001
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = True ' the export code in the question writes ";" as the field separator
.TextFileCommaDelimiter = False
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End Sub
Other suggestion
If you really want to be able to double-click the file instead of using the Text Import Wizard or running a macro, you could always create a VBA event procedure in an add-in or PERSONAL.XLSB that runs every time a workbook is opened.
If it detects that the file that was just opened is a CSV file, it could close it and "reopen" it using the code above.
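A rough sketch of that event procedure follows. It assumes the code is placed in the ThisWorkbook module of PERSONAL.XLSB (or of an add-in) and that the OpenCSV procedure above sits in a standard module of the same project. The helper IsCsvName is a made-up name for this example, and closing a workbook from inside its own open event is something I have not tested across Excel versions, so treat it as a starting point only.

```vba
' Goes in the ThisWorkbook module of PERSONAL.XLSB or of an add-in.
' App receives application-level events once Workbook_Open has run.
Private WithEvents App As Application

Private Sub Workbook_Open()
    Set App = Application
End Sub

' Hypothetical helper: True when the workbook name ends in ".csv"
Function IsCsvName(ByVal wbName As String) As Boolean
    IsCsvName = (LCase$(Right$(wbName, 4)) = ".csv")
End Function

Private Sub App_WorkbookOpen(ByVal Wb As Workbook)
    Dim fullName As String

    If IsCsvName(Wb.Name) Then
        fullName = Wb.FullName       ' remember the path before closing
        Wb.Close SaveChanges:=False  ' drop the wrongly decoded copy
        OpenCSV fullName             ' re-import as UTF-8 (code page 65001)
    End If
End Sub
```

OpenCSV creates its workbook with Workbooks.Add, which raises NewWorkbook rather than WorkbookOpen, so the handler should not call itself again. Note that App is reset to Nothing if the VBA project is reset, for example after pressing Stop in the editor.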
Extra:
Of interest: there is a question here about how to change the default character encoding that Excel uses.

Related

Excel crashing randomly when running macro

I'm having an issue with the following code, which is supposed to sequentially open ~100 CSV files, check the value of one cell (to validate that the file has the correct structure), copy a single line of data, and paste it into ThisWorkbook.Worksheets("2 CSV").Range("B" & row_number).
This solution worked for two years until this month. Now Excel as a whole crashes randomly on any file, without any error message. Sometimes it manages to loop through 20 files, sometimes 5.
The weirdest thing is that I can step through the whole thing manually with F8 without any problem.
The macro:
Sub b_load_csv()
Dim appStatus As Variant
Dim folder_path As String 'folder path to where CSVs are stored
Dim file_name As String 'file name of current CSV file
Dim row_number As Integer 'row number in target sheet
Dim source_sheet_name As String 'name of the source sheet of the CSV = CSV file name
Dim wb_src As Workbook 'variable for opened CSV source workbook
Dim sht_src As Worksheet 'variable for opened CSV source sheet
Dim sht_csv As Worksheet 'variable for target sheet in ThisWorkbook
With Application
.Calculation = xlCalculationManual
.ScreenUpdating = False
.DisplayAlerts = False
If .StatusBar = False Then appStatus = False Else appStatus = .StatusBar 'show currently processing file in status bar
End With
folder_path = "C:\Folder\SubFolder\" 'here are the files stored
file_name = Dir(folder_path & "*.csv") 'using dir to get file names
row_number = 3 'row number for pasting values
Set sht_csv = ThisWorkbook.Worksheets("2 CSV") 'target sheet for data aggregation
Do While file_name <> ""
Workbooks.Open (folder_path & file_name), UpdateLinks:=False, Local:=True 'open csv file
Set wb_src = Workbooks(file_name) 'assign opened csv file to variable
source_sheet_name = Left(file_name, InStr(file_name, ".") - 1) 'sheet name in csv is the same as the file name
Set sht_src = wb_src.Worksheets(source_sheet_name) 'assign source sheet to variable
If sht_src.Range("C1").Value2 = "OJ_POPIS" Then 'checks if the csv has the correct structure
sht_src.Range("A2:FZ2").Copy 'if so copies desired range
sht_csv.Range("B" & row_number).PasteSpecial 'and pastes it into target worksheet column B
End If
sht_csv.Range("A" & row_number).Value2 = file_name 'writes file name into column A
Application.CutCopyMode = False
wb_src.Close SaveChanges:=False
file_name = Dir() 'fetch next file name
row_number = row_number + 1
'the following lines is what I tried to fix the problem of random excel crashing
Set wb_src = Nothing
Set sht_src = Nothing
Application.StatusBar = "Processing file " & file_name
DoEvents
Application.Wait (Now + TimeValue("0:00:02"))
ThisWorkbook.Save 'save after every loaded file to see which files are causing the problem
Loop
MsgBox "Data from CSV files copied", vbOKOnly
Set sht_csv = Nothing
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.DisplayAlerts = True
End Sub
Source CSV files are encoded both in UTF-8 and ANSI (my ACP is ANSI, 1250) and ; delimited.
Group policy restricting macros doesn't apply to me. I can sign my own macros.
What I tried:
Lines of code at the end of the loop
Identifying and deleting files triggering the crash (they have nothing in common, seemingly random, by the time I remove half of them... what is the point)
Simplifying the macro
New workbook
Different machine
VPN On/Off
Thank you for your help!
First thing I'd try is include a proper error handler (not resume next), particularly with x64, and ensure 'Break on all unhandled errors' is selected in Tools / Options / General.
Second thing I'd try is avoid using the clipboard -
With sht_src.Range("A2:FZ2")
sht_csv.Range("B" & row_number).Resize(.Rows.Count, .Columns.Count).Value = .Value
End With
(no need then to clear CutCopyMode)
Third thing I'd try is don't filter with Dir but something like this -
sFilter = "*.csv"
file_name = Dir$(folder_path, 15) ' first call needs the path; add vbDirectory only if subfolders are wanted
Do While Len(file_name) > 0
If LCase$(file_name) Like sFilter Then
' process file
End If
file_name = Dir$() ' next entry from the same listing
Loop
Fourth thing I'd try is a good cup of coffee!
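For the first suggestion, a minimal sketch of what a "proper error handler" could look like is shown below. The procedure name b_load_csv_safe and the label names are made up for illustration, and the loop body is left as a placeholder for the code from the question. The point is only that application settings get restored and the error gets reported on every exit path.

```vba
Sub b_load_csv_safe()
    On Error GoTo CleanFail

    With Application
        .Calculation = xlCalculationManual
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    ' ... the Do While loop over the CSV files goes here ...

CleanExit:
    ' Runs on both the normal path and the error path
    With Application
        .Calculation = xlCalculationAutomatic
        .ScreenUpdating = True
        .DisplayAlerts = True
        .StatusBar = False          ' hand the status bar back to Excel
    End With
    Exit Sub

CleanFail:
    ' Report which error occurred instead of silently swallowing it
    Debug.Print "Error " & Err.Number & ": " & Err.Description
    Resume CleanExit
End Sub
```

This will not catch a hard crash of Excel itself, but it does rule out the case where a trappable error is being hidden by DisplayAlerts = False.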

Removing CrLf in whole csv in VBA - ADODB.Stream (Excel Macro)

I have a VBA macro that adds a BOM to a UTF-8 csv file; the BOM is needed for the file to open successfully in Excel.
The problem is that when there is a CrLf mark at the end of a line, Excel starts a new line below that line.
What I need is to remove all CrLf marks (no Cr or Lf should be added instead). That should help because the regular line ends in the csv are Cr only; CrLf exists only in the faulty lines.
Can you please help with my source code? What should I add to run the replacement on the source csv, so that the target csv is saved with a BOM and without any CrLf?
Dim fsT, tFileToOpen, tFileToSave As String
tFileToOpen = "C:\source_NO_BOM.csv"
tFileToSave = "C:\target_WITH_BOM.csv"
tFileToOpenPath = tFileToOpen
tFileToSavePath = tFileToSave
Set fsT = CreateObject("ADODB.Stream"): 'Create Stream object
fsT.Type = 2: 'Specify stream type – we want To save text/string data.
fsT.Charset = "utf-8": 'Specify charset For the source text data.
fsT.Open: 'Open the stream
fsT.LoadFromFile tFileToOpenPath: 'And write the file to the object stream
fsT.SaveToFile tFileToSavePath, 2: 'Save the data to the named path
You can read the content into a string, remove the CrLf marks from it, then write the result to a second stream object and export that.
Dim fsT As Object, Newfst As Object
Dim tFileToOpen As String, tFileToSave As String
Dim tFileToOpenPath As String, tFileToSavePath As String
Dim s As String
tFileToOpen = "C:\source_NO_BOM.csv"
tFileToSave = "C:\target_WITH_BOM.csv"
tFileToOpenPath = tFileToOpen
tFileToSavePath = tFileToSave
Set fsT = CreateObject("ADODB.Stream"): 'Create Stream object
With fsT
.Type = 2: 'Specify stream type – we want To save text/string data.
.Charset = "utf-8": 'Specify charset For the source text data.
.Open: 'Open the stream
.LoadFromFile tFileToOpenPath: 'And write the file to the object stream
s = .ReadText
s = Replace(s, vbCrLf, "")
End With
Set Newfst = CreateObject("ADODB.Stream")
With Newfst
.Type = 2
.Charset = "utf-8"
.Open
.WriteText s
.SaveToFile tFileToSave, 2
End With

Import Text File with Blank Characters

I'm trying to automate a task in VBA where I click a button to select a text file (which is a generated output file from another program) and the text file then imports itself into the workbook to be parsed. The text file is space delimited with inconsistent breaks (some lines contain 8 characters per value, others contain 10, etc). My problem is that when Excel imports lines that contain blank spaces (" "), it drops the leading/trailing spaces if there's only one non-blank value in that row. My code is given below:
Sub Import_File()
Sheet3.Cells.Clear
Sheet3.Activate
x = Application.GetOpenFilename
If x = False Then
Exit Sub
End If
With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & x, Destination:=Range("A1"))
.FieldNames = True
.PreserveFormatting = True
.RefreshStyle = xlInsertDeleteCells
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileTabDelimiter = True
.TextFileCommaDelimiter = True
.TextFileColumnDataTypes = Array(1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
End Sub
A workaround I read is that I could manually open the text file in Notepad, and find/replace all spaces with a character not in use, such as %, and then import so I can parse the data. I'd prefer to automate this task and avoid using Notepad. I found code below that modifies the text file itself and performs the character swap (the given code searches the text file for string given in cell A1 and replaces it with string in cell B1). The "Replace_Text" code works except I just want to temporarily modify the text file so that I can perform a proper import and wish to preserve the original text file afterward. Is there a clean way to piece together this code to perform a proper import and avoid permanently altering the original text file?
EDIT: Added a snapshot of my text file that isn't importing properly if this helps. The leading spaces on the 10th entry of the 4th row are being ignored when Excel imports the file.
Sub Replace_Text()
Dim strFile As String
Dim i As Integer
Dim strText As String
Dim cell As Range
With Application.FileDialog(msoFileDialogFilePicker)
.InitialFileName = ThisWorkbook.Path
If .Show <> -1 Then Exit Sub
strFile = .SelectedItems(1)
End With
i = FreeFile
strText = Space(FileLen(strFile))
With CreateObject("vbscript.regexp")
.Global = True
Open strFile For Binary Access Read Write As #i
Get #i, , strText
For Each cell In Range("A1:A" & Cells(Rows.Count, "A").End(xlUp).Row)
.Pattern = Replace(Replace(Replace(Replace(cell.Value, "?", "\?"), "*", "\*"), "+", "\+"), ".", "\.")
strText = .Replace(strText, cell.Offset(, 1).Value)
Next cell
Put #i, 1, strText
Close #i
End With
End Sub
Link to Text File

Importing CSV files keeping UTF-8 format

I am importing a batch of csv files from a folder all in separate worksheets, yet when I import the file, my new data loses leading zeroes for numbers and also loses its UTF-8 format. Is there any possible way to import the csv files while keeping leading zeroes and UTF-8 format?
Below is my vba
Option Explicit
Sub ImportCSVs()
Dim fPath As String
Dim fCSV As String
Dim wbCSV As Workbook
Dim wbMST As Workbook
Set wbMST = ThisWorkbook
fPath = "C:\mycsvfiles\Q3 2017\" 'path to CSV files, include the final \
Application.ScreenUpdating = False 'speed up macro
Application.DisplayAlerts = False 'no error messages, take default answers
fCSV = Dir(fPath & "*.csv") 'start the CSV file listing
On Error Resume Next
Do While Len(fCSV) > 0
Set wbCSV = Workbooks.Open(fPath & fCSV) 'open a CSV file
wbMST.Sheets(ActiveSheet.Name).Delete 'delete sheet if it exists
ActiveSheet.Move After:=wbMST.Sheets(wbMST.Sheets.Count) 'move new sheet into Mstr
Columns.AutoFit 'clean up display
fCSV = Dir 'ready next CSV
Loop
Application.ScreenUpdating = True
Set wbCSV = Nothing
End Sub
Thanks a million in advance! Let me know if I can provide additional information

Very large excel file - how to copy data between sheets?

I need to import some csv files into Excel 2010 and create a very simple, but very large database.
The whole story will be - five columns and thousands of rows.
VBA is also simple - copy data from one sheet to another - and vice versa.
But I need to care about memory requirement, because of potentially very large file size.
Dim ws1 As Worksheet
Dim ws2 As Worksheet
Dim r1 As Range
Dim r2 As Range
Set ws1 = Sheets("01")
Set ws2 = Sheets("02")
Set r1 = ws1.Range("A1:B10") ' for example
Set r2 = ws2.Range("C5:D14")
r1.Copy Destination:=r2 'first way
r2.Value = r1.Value ' second way
Is there any difference between these two methods in terms of memory and time consumption?
In the end I will have over 10,000 rows. What will be the size of the file?
You can utilize ADO to query text files as if they were a database table. This allows you to write SQL queries to pull data out of your text files. You can do this with any text file or even .xls files if you wanted to.
The code/process for doing so is fairly simple. You'll need to reference the Microsoft ActiveX Data Objects 2.X Library first and then use something like the following:
Dim cn as New ADODB.Connection
Dim rs as New ADODB.Recordset
Dim i as Integer
With cn
.Provider = "Microsoft.Jet.OLEDB.4.0"
.ConnectionString = "Data Source=C:\SomeFolder;" & _
"Extended Properties=""text; HDR=Yes;FMT=Delimited"""
.Open
With rs
.Open "SELECT * from fileName.txt", cn
'Loop through each row in query
While Not (.EOF Or .BOF)
'Loop through each column in row
For i = 0 to .Fields.Count - 1
Debug.Print .Fields(i).Value 'Print value of field to Immediate Window
Next i
.MoveNext
Wend
.Close
End With
.Close
End With
Set rs = Nothing
Set cn = Nothing
This will loop through your text file and print the value of every field of every row to the VBA Immediate window. It also assumes that your file has a header row. If it does not, then you need to change HDR in your ConnectionString to No.
The code will automatically try to infer types for you, but if you're running into issues with it not discovering the correct type (such as leading zeros) then you can explicitly define a schema for your file. It's important to note that if you go the schema route then your ConnectionString arguments like HDR and FMT WILL BE IGNORED. They will retain their default settings as defined in the Registry unless you override them in the schema definition. More info on schema.ini files can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx.
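As an illustration of the schema route, here is a small sketch that writes a schema.ini next to the data file from VBA. The procedure name WriteSchemaIni and the column names ProductCode and Quantity are invented for this example; replace them with your real columns. Declaring a column as Text is what keeps leading zeros intact.

```vba
' Writes a schema.ini into folderPath describing one delimited text file.
' WriteSchemaIni and the column names below are hypothetical examples.
Sub WriteSchemaIni(ByVal folderPath As String, ByVal dataFileName As String)
    Dim f As Integer

    If Right$(folderPath, 1) <> "\" Then folderPath = folderPath & "\"

    f = FreeFile
    Open folderPath & "schema.ini" For Output As #f
    Print #f, "[" & dataFileName & "]"   ' section name = data file name
    Print #f, "Format=CSVDelimited"      ' use Delimited(;) for semicolon files
    Print #f, "ColNameHeader=True"       ' replaces HDR=Yes from the connection string
    Print #f, "Col1=ProductCode Text"    ' Text keeps values like 00123 as typed
    Print #f, "Col2=Quantity Long"
    Close #f
End Sub
```

Calling WriteSchemaIni "C:\SomeFolder", "fileName.txt" before opening the connection would match the folder and file used in the code above. The schema.ini file has to live in the same folder as the text file it describes.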
Here is another useful link: http://msdn.microsoft.com/en-us/library/ms974559.aspx. It's an article written by the Microsoft Scripting Guys and is how I originally learned about the process.
Lastly, if you ever use this process with .xls files then you should know that you should NEVER query an OPEN .xls file. There's a nasty memory leak bug with OPEN .xls files (more info here: http://support.microsoft.com/default.aspx?scid=kb;en-us;319998&Product=xlw). As long as you query CLOSED .xls documents then you shouldn't have any issues whatsoever =D. The syntax in the SQL FROM clause is a bit different since you have to target particular sheet but IIRC the Scripting Guys article I linked explains how to do so.
This code block had some specifics for a project I was on, but should help get you started on how to import CSV files (somewhat cleaning) through VBA:
Public Sub ImportCSV(strPath As String, strFile As String, strExt As String, wbDestination As Workbook, Optional wsDest As Worksheet, Optional strRange As String, Optional blHeaders As Boolean = True)
'imports given CSV file into given sheet at given range _
defaults to comma separated delimiters
Dim wsDestination As Worksheet
Dim strFileName As String
strFileName = strPath & strFile & ".csv"
If wsDest Is Nothing Then Set wsDestination = wbDestination.Worksheets.Add(, wbDestination.Worksheets(wbDestination.Worksheets.Count)) Else: Set wsDestination = wsDest
If strRange = "" Then strRange = "$A$1"
With wsDestination.QueryTables.Add(Connection:="TEXT;" & strFileName, Destination:=wsDestination.Range(strRange))
.FieldNames = False
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = False
.AdjustColumnWidth = False
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
.Delete
End With
If Not blHeaders Then wsDestination.Range(strRange).EntireRow.Delete
End Sub
