Excel export to .txt via script

I need to export a large number of Excel cells to a .txt file, which must be in UTF-8 format. I wrote a VBScript, executed by a batch file, that does everything it should, except that it creates a UTF-16 file.
Set file = fso.OpenTextFile(FILE, 2, True, -1)
The documentation says that -1 generates a Unicode file, and I am fairly sure that this means UTF-16 only.
My questions are: am I missing something, or is this simply not possible with VBScript? Is there an easier way? And besides that: is this platform independent?

The FileSystemObject doesn't support UTF-8, but ADODB.Stream objects do.
...
'Note: wb is the variable holding your workbook object
'Save worksheet as Unicode text ...
wb.SaveAs "C:\utf16.txt", 42
'... read the Unicode text into a variable ...
Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\utf16.txt", 1, False, -1).ReadAll
'... and export it as UTF-8 text.
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = 2 'text
stream.Position = 0
stream.Charset = "utf-8"
stream.WriteText txt
stream.SaveToFile "C:\utf8.txt", 2
stream.Close
...
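To check which encoding a script actually produced, it can help to look at the first bytes of the output file. The following is a minimal VBScript sketch, not part of the original answer; the function name LeadingBytesHex is made up for illustration. A UTF-16 file written by the FileSystemObject starts with FF FE, and a UTF-8 file written by ADODB.Stream starts with the byte order mark EF BB BF.

```vbscript
' Returns the first byteCount bytes of a file as a hex string, e.g. "EF BB BF".
' LeadingBytesHex is a hypothetical helper name, not a built-in function.
Function LeadingBytesHex(path, byteCount)
    Dim stream, bytes, i, result
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                      ' adTypeBinary: read raw bytes
    stream.Open
    stream.LoadFromFile path
    bytes = stream.Read(byteCount)       ' Null if the file is empty
    stream.Close

    result = ""
    If Not IsNull(bytes) Then
        For i = 1 To LenB(bytes)
            ' Format each byte as two hex digits
            result = result & Right("0" & Hex(AscB(MidB(bytes, i, 1))), 2) & " "
        Next
    End If
    LeadingBytesHex = Trim(result)
End Function
```

For example, LeadingBytesHex("C:\utf8.txt", 3) should return "EF BB BF" for the file saved by the code above.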

Related

Editing a Hex Stream in VBA Before Saving to File

Background: I am extracting a file that is saved in a SQL database using an ADODB connection. One column of the database is the filename, including the file extension, and another is the actual contents of the file as a hex stream. I would like to save this file and open it.
Problem: This works fine with .pdf files. However, .png files are corrupted when I try to open them. I used a hex editor (HxD) and noticed that there were excess values at the start of the file. If I remove these, the file opens fine. The hex stream should begin with the "per mille" character (Chr(137) in Excel) for the file to open. I have not found a way to edit the hex stream in Excel without converting it to characters.
The .png file opens with no problem when I take out the first few characters using a hex editor so that the file begins with:
‰PNG
Or the equivalent in hex code:
89 50 4E 47
The excess characters are
ÿþ
Or the equivalent in hex code:
FF FE
(I am trying to remove these). These characters are in the saved file even when I remove 4 characters from the text string using
Content = Right(Content, Len(Content) - 4)
It's almost like they automatically get added before the string when I save the file.
Code:
Calling the save to file function, where Content is the file content and Name is the filename:
Call StringToTextFile("C:\", rst![Content], rst![Name])
The function is below:
Public Sub StringToTextFile(ByVal directory As String, ByVal Content As String, ByVal filename As String)
'Requires reference to scrrun.dll
Dim fso As Scripting.FileSystemObject
Dim ts As Scripting.TextStream
Set fso = New Scripting.FileSystemObject
If Right(filename, 4) = ".png" Then 'recognizing the .png file
'Content = CByte(Chr(137)) & Right(Content, Len(Content)) 'unsuccessful attempt at inserting "per mille" character
'iret = InStr(0, Chr(137), Content, vbBinaryCompare) 'unsuccessful attempt at finding the "per mille" character in the content
End If
Set ts = fso.CreateTextFile(directory & filename, True, True)
ts.Write Content
ts.Close
Dim myShell As Object
Set myShell = CreateObject("WScript.Shell")
myShell.Run directory & filename 'Open the saved file
End Sub
When I try to insert the "per mille" character using Chr(137) it just shows a blank space in the hex editor.
Any help is appreciated!
This seems to be a similar discussion, but I am unsure how to apply this to my case:
excel-vba-hex-to-ascii-error
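The bytes FF FE are the UTF-16 little-endian byte order mark. CreateTextFile writes them whenever its third argument (unicode) is True, which is why trimming the string beforehand does not remove them. One possible way around this is to skip the text API entirely and write the bytes through a binary ADODB.Stream. The sketch below assumes the field value is available as a byte array (for example from a binary column read through ADO); the name SaveBytesToFile is made up for illustration.

```vbscript
' Writes raw bytes to disk without any text encoding or byte order mark.
' Assumption: data is a Variant holding a byte array, e.g. the value of a
' binary (BLOB) field read through ADO. SaveBytesToFile is a hypothetical name.
Sub SaveBytesToFile(path, data)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                ' adTypeBinary
    stream.Open
    stream.Write data              ' bytes are stored unchanged
    stream.SaveToFile path, 2      ' adSaveCreateOverWrite
    stream.Close
End Sub
```

In the code above, this would mean passing the field value itself rather than a String parameter, since converting the content to a String and back is where the bytes can get altered.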

Output from Excel to text file using VBA, "UTF-8 Unix" format needed

I have some data in an Excel file and I need to share it with somebody.
The other side has specific requirements for the format: text file, *.dat name, pipe-separated, etc.
So I created a VBA-code to extract the data into a text file using the examples from this site.
At the moment my file is not being accepted because: "It's not a UTF-8 UNIX format".
I tried multiple solutions from the internet but each time I receive an answer that it's either "UTF-8 PC format" or "Unicode format".
Can somebody help me to solve this? Here's what I have so far:
'create output (this is from my initial version)
Set Fileout = fso.CreateTextFile(myFile, True, True)
For y = 2 To a
Fileout.WriteLine .Cells(y, 59)
Next y
Fileout.Close
'this is from my second conversion attempt, where I received UTF-8 PC format
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Position = 0
stream.Charset = "UTF-8"
stream.LoadFromFile myFile
'This is my latest try. I found something called "UTF-8 without BOM" on the internet. Still doesn't work.
Set StreamUnixUtf = CreateObject("ADODB.Stream")
StreamUnixUtf.Type = 1
StreamUnixUtf.Mode = 3
StreamUnixUtf.Open
stream.Position = 3
stream.CopyTo StreamUnixUtf
stream.Flush
stream.Close
StreamUnixUtf.SaveToFile myFileUnixUTF, 2
StreamUnixUtf.Close
Thank you in advance for your help.
How about TextWrangler (Mac) or Programmer's File Editor (PC)? Open the file there and select Unix line feeds when saving. I do this when I need to submit a file to a specific system.
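If the conversion has to happen in code, the following VBScript sketch may be a starting point. It assumes that "UTF-8 UNIX format" means UTF-8 without a byte order mark and with LF line endings, which is an interpretation of the receiver's message, not something stated in the question. The name WriteUtf8UnixFile and the lines parameter (an array of strings) are made up for illustration.

```vbscript
' Writes an array of strings as UTF-8 without BOM, one LF-terminated line each.
' WriteUtf8UnixFile is a hypothetical name; lines is assumed to be an array.
Sub WriteUtf8UnixFile(path, lines)
    Dim textStream, binaryStream, i

    ' Step 1: let a text stream do the UTF-8 encoding (this adds a BOM)
    Set textStream = CreateObject("ADODB.Stream")
    textStream.Type = 2                  ' adTypeText
    textStream.Charset = "utf-8"
    textStream.Open
    For i = LBound(lines) To UBound(lines)
        textStream.WriteText lines(i) & vbLf   ' LF only, no CR
    Next

    ' Step 2: skip the 3-byte BOM (EF BB BF) if anything was written
    If textStream.Size >= 3 Then textStream.Position = 3

    ' Step 3: copy the remaining bytes into a binary stream and save that
    Set binaryStream = CreateObject("ADODB.Stream")
    binaryStream.Type = 1                ' adTypeBinary
    binaryStream.Open
    textStream.CopyTo binaryStream
    binaryStream.SaveToFile path, 2      ' adSaveCreateOverWrite
    binaryStream.Close
    textStream.Close
End Sub
```

The text is written directly from strings here instead of being loaded from the intermediate file, because that file was created with CreateTextFile(..., True, True) and is therefore UTF-16, not UTF-8.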

Character changes when importing to Access and exporting to Excel

I'm working on an Access database into which I import CSV files converted from XLS.
Usually this works, but recently one file has some fields where characters change within the field after being imported into Access.
For example:
a dash changes to û
a beginning double quote changes to ô
an end double quote changes to ö
From what I have read, it has something to do with 7- or 8-bit character codes, which is not something I really understand.
My questions are: is there any way to prevent this character change, or is there something better than what I've tried already?
Or are there any potential problems that I haven't come across with what seems to work in my example below?
Here's what I've tried so far that seems to work
From the original Excel file, save as a Unicode text file (something new for me):
ActiveWorkbook.SaveAs Filename:= _
"D:\NewFiles\ReportList.txt", FileFormat:=xlUnicodeText _
, CreateBackup:=False
Then import into the database with the following code
DoCmd.TransferText acImportDelim, "ReportList Import Specification", "tbl_ReportList", "D:\NewFiles\ReportList.txt", True
This seems to import the text into the database correctly.
Other people work with the data and then export a new report from Access to Excel.
That changes the font to MS Sans Serif and changes the characters again but not the same changes as when it was imported.
After the Excel report is exported, and I change the font to Arial the characters are correct again.... at least so far.
I haven't run into this character change in the past and my solution seems to work, but I'm not sure if there are other potential problems or if there's anything I missed. I haven't found the answer to this specific question yet.
Thanks for taking time to help with this.
Here is a method that I have used in the past to circumvent the character encoding issues.
I suspect this method should also work between Excel and Access -- although Access is not really something I am familiar with.
This sub specifies the file's full name & path, and a destination for a new filename & path. These could be the same if you want to overwrite existing.
NOTE On a few simple tests, I can't get this to read a file saved as "Unicode" from Excel, but it works perfectly on files saved as "Tab Delimited TXT" files and CSV/comma-separated files, too.
Sub OpenAndSaveTxtUTF8()
Dim txtFileName as String
Dim newTxtFileName as String
txtFileName = "D:\NewFiles\ReportList.txt"
newTxtFileName = "D:\NewFiles\UTF8_ReportList.txt"
'Call without parentheses: WriteUTF8 is used as a statement here
WriteUTF8 ReadTextFile(txtFileName), newTxtFileName
End Sub
This sub calls on two functions which I borrowed from sources credited in the code comments. The WriteUTF8 creates a proper UTF8 file from the contents of ReadTextFile which returns a string of the full file contents.
Function ReadTextFile(sFileName As String) As String
'http://www.vbaexpress.com/kb/getarticle.php?kb_id=699
Dim iFile As Integer
On Local Error Resume Next
' \\ Use FreeFile to supply a file number that is not already in use
iFile = FreeFile
' \\ ' Open file for input.
Open sFileName For Input As #iFile
' \\ Return (Read) the whole content of the file to the function
ReadTextFile = Input$(LOF(iFile), iFile)
Close #iFile
On Error GoTo 0
End Function
This function requires a reference to the ADODB library, or, you can Dim objStream As Object and the code should still work for you.
Function WriteUTF8(textString$, myFileOut$)
'Modified from http://www.vbaexpress.com/forum/showthread.php?t=42375
'David Zemens - February 12, 2013
'Requires a reference to ADODB?
' UTF8() Version 1.00
' Open a "plain" text file and save it again in UTF-8 encoding
' (overwriting an existing file without asking for confirmation).
'
' Based on a sample script from JTMar:
' http://bytes.com/groups/asp/52959-save-file-utf-8-format-asp-vbscript
'
' Written by Rob van der Woude
' http://www.robvanderwoude.com
Dim objStream As ADODB.Stream
' Valid Charset values for ADODB.Stream
Const CdoBIG5 = "big5"
Const CdoEUC_JP = "euc-jp"
Const CdoEUC_KR = "euc-kr"
Const CdoGB2312 = "gb2312"
Const CdoISO_2022_JP = "iso-2022-jp"
Const CdoISO_2022_KR = "iso-2022-kr"
Const CdoISO_8859_1 = "iso-8859-1"
Const CdoISO_8859_2 = "iso-8859-2"
Const CdoISO_8859_3 = "iso-8859-3"
Const CdoISO_8859_4 = "iso-8859-4"
Const CdoISO_8859_5 = "iso-8859-5"
Const CdoISO_8859_6 = "iso-8859-6"
Const CdoISO_8859_7 = "iso-8859-7"
Const CdoISO_8859_8 = "iso-8859-8"
Const CdoISO_8859_9 = "iso-8859-9"
Const cdoKOI8_R = "koi8-r"
Const cdoShift_JIS = "shift-jis"
Const CdoUS_ASCII = "us-ascii"
Const CdoUTF_7 = "utf-7"
Const CdoUTF_8 = "utf-8"
' ADODB.Stream file I/O constants
Const adTypeBinary = 1
Const adTypeText = 2
Const adSaveCreateNotExist = 1
Const adSaveCreateOverWrite = 2
On Error Resume Next
Set objStream = CreateObject("ADODB.Stream")
objStream.Open
objStream.Type = adTypeText
objStream.Position = 0
objStream.Charset = CdoUTF_8
'We are passing a string to write to file, so omit the following line
' objStream.LoadFromFile myFileIn
'And instead of using LoadFromFile we are writing directly from the COPIED
' text from the unsaved/temp instance of Notepad.exe
objStream.WriteText textString, 1
objStream.SaveToFile myFileOut, adSaveCreateOverWrite
objStream.Close
Set objStream = Nothing
If Err Then
WriteUTF8 = False
Else
WriteUTF8 = True
End If
On Error GoTo 0
End Function
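Regarding the note that ReadTextFile cannot read files saved as "Unicode" from Excel: one option that might work is to read the file through ADODB.Stream as well and name the source encoding explicitly. This is a sketch under that assumption, not a tested drop-in replacement; the function name ReadTextWithCharset is made up for illustration.

```vbscript
' Reads a whole text file, decoding it with the given charset.
' ReadTextWithCharset is a hypothetical name. "unicode" is the ADODB.Stream
' name for UTF-16; "utf-8" and the other values listed above also apply.
Function ReadTextWithCharset(path, charsetName)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 2                      ' adTypeText
    stream.Charset = charsetName
    stream.Open
    stream.LoadFromFile path
    ReadTextWithCharset = stream.ReadText(-1)   ' adReadAll
    stream.Close
End Function
```

The result could then be passed on in the same way, for example WriteUTF8 ReadTextWithCharset(txtFileName, "unicode"), newTxtFileName.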

Decimal separator lost after conversion from CSV to Excel with VBScript

I have a CSV with semicolon separators that I would like to convert to a regular Excel sheet. I managed to do this with the code below, but I must have made a mistake, because numbers with decimals in the original file that don't start with a zero are shown in Excel as numbers without the decimal separator. When I open the CSV manually in Excel the result is fine, so it must be a side effect of doing it with a script.
For example:
In the CSV there is a line:
2013-03-10 17:00:15; idle; 2,272298;; 0,121860
In the Excel sheet this becomes:
2013-03-10 17:00 | idle | 2.272.298| | 0,121860
Opened manually in excel gives:
2013-03-10 17:00 | idle | 2,272298| | 0,121860
Could somebody please tell me what I could/should change to keep the decimals as decimals in Excel? Possibly a way to tell Excel which symbol represents the decimal separator or an argument to force it into using European formats?
Kind regards, Nico
This is the script I currently have, where csvFile is a string with the full path to the original file and excelFile is a string with the full path to the location where I want to store the new excel sheet.
Set objExcel = CreateObject("Excel.Application") 'use excel
objExcel.Visible = true 'visible
objExcel.displayalerts=false 'no warnings
objExcel.Workbooks.Open(csvFile) 'open the file
objExcel.ActiveWorkbook.SaveAs excelFile, -4143, , , False, False 'save as xls
objExcel.Quit 'close excel
Create a schema.ini file in the folder your csvFile lives in and describe it according to the rules given here.
Further reading: import, text files
There are several approaches possible, I will cover one that I favor:
Start Recording a macro
Create a new workbook
From that workbook go to Data > From Text and there you select the CSV file, then you can do all the required settings regarding Value separators, Decimal separators, Thousands separators. Also the specific data type can be selected for each column.
When the CSV content is added go to Data > Connections and Remove the connection. The data will stay in the worksheet, but there is no longer an active connection.
Save the workbook under the xls name
Stop the Recording
Now tweak the script a bit to your liking.
In general Excel honors the system's regional settings. The CSV import, however, sometimes has its own mind about the "correct" format, particularly when the imported file has the extension .csv.
I'd try the following. Rename the file to .txt or .tsv and import it like this:
objExcel.Workbooks.OpenText csvFile, , , 1, 1, False, False, True
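In that call the positional arguments are, in order: Filename, Origin (omitted), StartRow (omitted), DataType (1 = delimited), TextQualifier (1 = double quote), ConsecutiveDelimiter, Tab, and Semicolon. OpenText also has DecimalSeparator and ThousandsSeparator parameters further along in the list, so a variant that states the separators explicitly might look like the sketch below. The paths are placeholders, and whether this resolves the issue on a given system has not been verified here.

```vbscript
' Placeholder paths for illustration only; replace with real ones.
txtFile = "C:\Temp\data.txt"       ' the CSV, renamed to .txt
excelFile = "C:\Temp\data.xls"

Set objExcel = CreateObject("Excel.Application")
objExcel.DisplayAlerts = False

' Positional arguments 1-11: Filename, Origin, StartRow, DataType,
' TextQualifier, ConsecutiveDelimiter, Tab, Semicolon, Comma, Space, Other.
' Arguments 12-14 (OtherChar, FieldInfo, TextVisualLayout) are left empty.
' Argument 15 is DecimalSeparator, argument 16 is ThousandsSeparator.
objExcel.Workbooks.OpenText txtFile, , , 1, 1, False, False, True, _
    False, False, False, , , , ",", "."

objExcel.ActiveWorkbook.SaveAs excelFile, -4143   ' -4143 = xlWorkbookNormal
objExcel.Quit
```

VBScript has no named arguments, which is why the unused positions have to be kept as empty slots between the commas.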
I made a workaround. I now create a copy of the CSV file in which I replace every comma that is followed by a digit with a point. While not very efficient, it does give Excel what it wants, and it is simple enough for an inexperienced programmer like me to use.
While I was at it, a colleague asked me to also remove leading white space and entries with duplicate values in the first column (the timestamp in this case).
The result was this script
'csvFile is a string with the full path to the file. e.g. "C:\Program Files\Program\data.csv"
'tempFile is a string with the full path to the file. e.g. "C:\Temp\temp.csv"
'excelfile is a string with the full path to the file. e.g. "D:\Data\sheet.xls"
Set fs=CreateObject("Scripting.FileSystemObject")
Set writeFile = fs.CreateTextFile(tempFile,True)
Set readFile = fs.OpenTextFile(csvFile)
' regular expression to remove leading whitespaces
Set regular_expression = New RegExp
regular_expression.Pattern = "^\s*"
regular_expression.Multiline = False
' regular expression to change the decimal separator into a point
Set regular_expression2 = New RegExp
regular_expression2.Global = True
regular_expression2.Pattern = ",(?=\d)"
regular_expression2.Multiline = False
'copy the original file to the temp file and apply the changes
Do Until readFile.AtEndOfStream
strLine= readFile.ReadLine
If (StrComp(current_timestamp,Mid(strLine, 1, InStr(strLine,";")),1)<>0) Then
If (Len(previous_line) > 2) Then
previous_line = regular_expression2.replace(previous_line,".")
writeFile.Write regular_expression.Replace(previous_line, "") & vbCrLf
End if
End if
current_timestamp = Mid(strLine, 1, InStr(strLine,";"))
previous_line = strLine
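Loop
' write the last buffered line, which the loop itself never flushes
If (Len(previous_line) > 2) Then
previous_line = regular_expression2.replace(previous_line,".")
writeFile.Write regular_expression.Replace(previous_line, "") & vbCrLf
End if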
readFile.Close
writeFile.Close
Set objExcel = CreateObject("Excel.Application") ' use excel
objExcel.Visible = true ' visible
objExcel.displayalerts=false ' no warning pop-ups
objExcel.Workbooks.Open(tempFile) ' open the file
objExcel.ActiveWorkbook.SaveAs excelfile, -4143, , , False, False 'save as excelfile
fs.DeleteFile tempFile ' clean up the temp file
I hope this will also be useful for someone else.
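To see what the decimal-separator pattern does in isolation, here is a small sketch applied to the sample line from the question. The function name CommaToPoint is made up for illustration; the pattern is the one used in the script above.

```vbscript
' Replaces every comma that is directly followed by a digit with a point.
' CommaToPoint is a hypothetical helper name.
Function CommaToPoint(textLine)
    Dim re
    Set re = New RegExp
    re.Global = True               ' replace all matches, not just the first
    re.Pattern = ",(?=\d)"         ' comma with a digit right after it (lookahead)
    CommaToPoint = re.Replace(textLine, ".")
End Function

sample = "2013-03-10 17:00:15; idle; 2,272298;; 0,121860"
converted = CommaToPoint(sample)
' converted is now "2013-03-10 17:00:15; idle; 2.272298;; 0.121860"
```

Note that the pattern does not know which fields are numeric, so a comma inside a text field would also be changed if a digit follows it.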

Exporting a Microsoft Access table to UTF-16 CSV

I have an Access table with some Chinese characters that I need to export into a CSV file with UTF-16 encoding. If this is not possible, I could also try exporting the table into an XLS or CSV file, and then convert the encoding to UTF-16.
I have a feeling there is no simple way of doing this using Access and/or Excel and/or VBA, but if there is, I would love to hear it! If not, a solution using Java would be helpful.
I'm sure it would be helpful if I knew what encoding the file was already in. The Chinese characters show up correctly when I export the file to Microsoft Excel 2000, but they do not show up correctly in Microsoft Access. They were originally typed into Microsoft Excel. I think that means they are in Unicode rich text, but I'm not sure.
Thanks much!
I use ADO streams to do this sort of thing. I had to do this for a TON of websites where I was helping them with SEO automation.
http://www.nonhostile.com/howto-convert-byte-array-utf8-string-vb6.asp
' accept a string and return its UTF-16 encoding as a byte array,
' without the leading byte order mark
Public Function ConvertStringToUtf16Bytes(ByRef strText As String) As Byte()
Dim objStream As ADODB.Stream
Dim data() As Byte
' init stream
Set objStream = New ADODB.Stream
objStream.Charset = "utf-16"
objStream.Mode = adModeReadWrite
objStream.Type = adTypeText
objStream.Open
' write text into stream
objStream.WriteText strText
objStream.Flush
' rewind stream and read the encoded bytes
objStream.Position = 0
objStream.Type = adTypeBinary
objStream.Read 2 ' skip first 2 bytes (FF FE) as this is the UTF-16 byte order mark
data = objStream.Read()
' close up and return
objStream.Close
ConvertStringToUtf16Bytes = data
End Function

Resources