I have an Access table with some Chinese characters that I need to export into a CSV file with UTF-16 encoding. If this is not possible, I could also try exporting the table into an XLS or CSV file, and then convert the encoding to UTF-16.
I have a feeling there is no simple way of doing this using Access and/or Excel and/or VBA, but if there is, I would love to hear it! If not, a solution using Java would be helpful.
I'm sure it would be helpful if I knew what encoding the file was already in. The Chinese characters show up correctly when I export the file to Microsoft Excel 2000, but they do not show up correctly in Microsoft Access. They were originally typed into Microsoft Excel. I think that means they are in Unicode rich text, but I'm not sure.
Thanks much!
I use ADO streams to do this sort of thing. I had to do this for a TON of websites where I was helping them with SEO automation.
http://www.nonhostile.com/howto-convert-byte-array-utf8-string-vb6.asp
' Accept a string and convert it to a byte array containing UTF-8 data
Public Function ConvertStringToUtf8Bytes(ByRef strText As String) As Byte()
    Dim objStream As ADODB.Stream
    Dim data() As Byte
    ' init stream
    Set objStream = New ADODB.Stream
    objStream.Charset = "utf-8" ' UTF-8, so the marker skipped below is 3 bytes
    objStream.Mode = adModeReadWrite
    objStream.Type = adTypeText
    objStream.Open
    ' write the string into the stream as UTF-8 text
    objStream.WriteText strText
    objStream.Flush
    ' rewind stream and read the raw bytes back
    objStream.Position = 0
    objStream.Type = adTypeBinary
    objStream.Read 3 ' skip first 3 bytes as this is the UTF-8 byte order mark
    data = objStream.Read()
    ' close up and return
    objStream.Close
    ConvertStringToUtf8Bytes = data
End Function
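For the original question (a UTF-16 CSV straight out of Access), the same stream approach can write the file directly. A rough sketch, with several assumptions flagged: a hypothetical table named MyTable, a reference to ADODB set, and naive quote-doubling for the CSV escaping:

' Sketch: export an Access table to a UTF-16 CSV via ADODB.Stream.
' Assumptions: the table name and output path are placeholders.
Public Sub ExportTableToUtf16Csv()
    Dim rs As DAO.Recordset
    Dim stm As ADODB.Stream
    Dim i As Integer
    Dim rowText As String
    Set rs = CurrentDb.OpenRecordset("SELECT * FROM MyTable")
    Set stm = New ADODB.Stream
    stm.Type = adTypeText
    stm.Charset = "unicode" ' UTF-16 LE with BOM
    stm.Open
    Do Until rs.EOF
        rowText = ""
        For i = 0 To rs.Fields.Count - 1
            If i > 0 Then rowText = rowText & ","
            ' quote every field, doubling embedded quotes
            rowText = rowText & """" & Replace(Nz(rs.Fields(i).Value, ""), """", """""") & """"
        Next i
        stm.WriteText rowText, adWriteLine
        rs.MoveNext
    Loop
    stm.SaveToFile "C:\export_utf16.csv", adSaveCreateOverWrite
    stm.Close
    rs.Close
End Sub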
I have a one-dimensional array with more than 3 million items that I would like to transfer to a text file. I tried a FileSystemObject method, which is not fast enough for me. Then I tried writing to cells in a worksheet and exporting it as a .txt file, but I am still searching for a faster way to write an array to a .txt file.
Please also try Put (and maybe later also Get):
Private Sub TestPut(myArray() As String)
    Dim handle As Long
    handle = FreeFile
    Open Application.DefaultFilePath & "\Whatever.txt" For Binary As #handle
    Put #handle, , myArray
    Close #handle
End Sub
You may join your array into a single string to prevent unwanted string descriptors (see the Put documentation above) and to choose CR, CRLF, or any other delimiter,
but only if the resulting string's length does not exceed 2,147,483,647 bytes:
Put #handle, , Join(myArray, vbCrLf)
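Put together, the joined variant might look like this (my sketch, reusing the same hypothetical path):

Private Sub TestPutJoined(myArray() As String)
    Dim handle As Long
    handle = FreeFile
    Open Application.DefaultFilePath & "\Whatever.txt" For Binary As #handle
    ' a single string writes no per-element descriptors
    Put #handle, , Join(myArray, vbCrLf)
    Close #handle
End Sub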
Try something like this:

Dim FilePath As String
Dim FileStream As Object

FilePath = "C:\output.txt"
Set FileStream = CreateObject("ADODB.Stream")
FileStream.Open
FileStream.Type = 2 ' adTypeText
FileStream.Charset = "utf-8"
FileStream.WriteText VBA.Strings.Join(YourArray, vbCrLf)
FileStream.SaveToFile FilePath, 2 ' adSaveCreateOverWrite
FileStream.Close
I need to export a huge number of Excel cells to a .txt file, which has to be in UTF-8 format. So I made a VBScript that is executed by a batch file, and it does exactly what it should, except that it creates a UTF-16 file.
Set file = fso.OpenTextFile(FILE, 2, True, -1)
The documentation mentions that the -1 generates a Unicode file, and I am fairly sure that this means UTF-16.
My questions are: am I missing something, or is it simply not possible to achieve this with VBScript? Is there an easier way? Besides that: is this platform independent?
The FileSystemObject doesn't support UTF-8, but ADODB.Stream objects do.
...
'Note: wb is the variable holding your workbook object
'Save worksheet as Unicode text ...
wb.SaveAs "C:\utf16.txt", 42
'... read the Unicode text into a variable ...
Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\utf16.txt", 1, False, -1).ReadAll
'... and export it as UTF-8 text.
Set stream = CreateObject("ADODB.Stream")
stream.Open
stream.Type = 2 'text
stream.Position = 0
stream.Charset = "utf-8"
stream.WriteText txt
stream.SaveToFile "C:\utf8.txt", 2
stream.Close
...
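A possible variant (my sketch, same hypothetical file names): ADODB.Stream can also handle the reading, so the FileSystemObject step could be dropped entirely; its "unicode" charset reads the UTF-16 file Excel produced.

'Read the UTF-16 file with an ADODB.Stream instead of the FileSystemObject ...
Set inStream = CreateObject("ADODB.Stream")
inStream.Open
inStream.Type = 2 'text
inStream.Charset = "unicode" 'UTF-16 little-endian, as written by Excel
inStream.LoadFromFile "C:\utf16.txt"
txt = inStream.ReadText
inStream.Close
'... then write txt out as UTF-8 exactly as above.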
I'm working on an Access database into which I import CSV files converted from XLS.
Usually this works, but recently one file has some fields where characters change within the field after being imported into Access.
For example:
a dash changes to û
a beginning double quote changes to ô
an end double quote changes to ö
From what I have read, it has something to do with 7- or 8-bit character codes, which is not something I really understand.
My questions are: is there any way to prevent this character change, or is there something better than what I've tried already?
Or are there any potential problems that I haven't come across with what seems to work in my example below?
Here's what I've tried so far that seems to work:
From the original Excel file, save as a Unicode text file (something new for me):
ActiveWorkbook.SaveAs Filename:= _
"D:\NewFiles\ReportList.txt", FileFormat:=xlUnicodeText _
, CreateBackup:=False
Then import into the database with the following code:
DoCmd.TransferText acImportDelim, "ReportList Import Specification", "tbl_ReportList", "D:\NewFiles\ReportList.txt", True
This seems to import the text into the database correctly.
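As an aside, if the intermediate file were saved as UTF-8 instead, the import could state the code page explicitly. A sketch, assuming the same specification and paths, and relying on the optional CodePage argument of DoCmd.TransferText (65001 = UTF-8):

' Hypothetical variant of the import above with an explicit code page.
' CodePage is the last (optional) argument of TransferText.
DoCmd.TransferText acImportDelim, "ReportList Import Specification", _
    "tbl_ReportList", "D:\NewFiles\ReportList.txt", True, , 65001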
Other people work with the data and then export a new report from Access to Excel.
That changes the font to MS Sans Serif and changes the characters again, but not the same changes as when it was imported.
After the Excel report is exported and I change the font to Arial, the characters are correct again... at least so far.
I haven't run into this character change in the past, and my solution seems to work, but I'm not sure whether there are other potential problems or whether there's anything I missed. I haven't found the answer to this specific question yet.
Thanks for taking time to help with this.
Here is a method that I have used in the past to circumvent the character encoding issues.
I suspect this method should also work between Excel and Access -- although Access is not really something I am familiar with.
This sub specifies the file's full name and path, and a destination for a new file name and path. These could be the same if you want to overwrite the existing file.
NOTE: In a few simple tests, I can't get this to read a file saved as "Unicode" from Excel (likely because ReadTextFile reads the file as ANSI, so UTF-16 content gets mangled), but it works perfectly on files saved as tab-delimited TXT files, and on CSV/comma-separated files, too.
Sub OpenAndSaveTxtUTF8()
    Dim txtFileName As String
    Dim newTxtFileName As String
    txtFileName = "D:\NewFiles\ReportList.txt"
    newTxtFileName = "D:\NewFiles\UTF8_ReportList.txt"
    WriteUTF8 ReadTextFile(txtFileName), newTxtFileName
End Sub
This sub calls two functions, which I borrowed from the sources credited in the code comments. WriteUTF8 creates a proper UTF-8 file from the output of ReadTextFile, which returns the full file contents as a string.
Function ReadTextFile(sFileName As String) As String
    'http://www.vbaexpress.com/kb/getarticle.php?kb_id=699
    Dim iFile As Integer
    On Local Error Resume Next
    ' \\ Use FreeFile to supply a file number that is not already in use
    iFile = FreeFile
    ' \\ Open file for input
    Open sFileName For Input As #iFile
    ' \\ Return (read) the whole content of the file to the function
    ReadTextFile = Input$(LOF(iFile), iFile)
    Close #iFile
    On Error GoTo 0
End Function
This function requires a reference to the ADODB library; alternatively, you can Dim objStream As Object and the code should still work for you.
Function WriteUTF8(textString$, myFileOut$)
    'Modified from http://www.vbaexpress.com/forum/showthread.php?t=42375
    'David Zemens - February 12, 2013
    'Requires a reference to ADODB (or late binding: Dim objStream As Object)
    '
    ' UTF8() Version 1.00
    ' Open a "plain" text file and save it again in UTF-8 encoding
    ' (overwriting an existing file without asking for confirmation).
    '
    ' Based on a sample script from JTMar:
    ' http://bytes.com/groups/asp/52959-save-file-utf-8-format-asp-vbscript
    '
    ' Written by Rob van der Woude
    ' http://www.robvanderwoude.com
    Dim objStream As ADODB.Stream

    ' Valid Charset values for ADODB.Stream
    Const CdoBIG5 = "big5"
    Const CdoEUC_JP = "euc-jp"
    Const CdoEUC_KR = "euc-kr"
    Const CdoGB2312 = "gb2312"
    Const CdoISO_2022_JP = "iso-2022-jp"
    Const CdoISO_2022_KR = "iso-2022-kr"
    Const CdoISO_8859_1 = "iso-8859-1"
    Const CdoISO_8859_2 = "iso-8859-2"
    Const CdoISO_8859_3 = "iso-8859-3"
    Const CdoISO_8859_4 = "iso-8859-4"
    Const CdoISO_8859_5 = "iso-8859-5"
    Const CdoISO_8859_6 = "iso-8859-6"
    Const CdoISO_8859_7 = "iso-8859-7"
    Const CdoISO_8859_8 = "iso-8859-8"
    Const CdoISO_8859_9 = "iso-8859-9"
    Const cdoKOI8_R = "koi8-r"
    Const cdoShift_JIS = "shift-jis"
    Const CdoUS_ASCII = "us-ascii"
    Const CdoUTF_7 = "utf-7"
    Const CdoUTF_8 = "utf-8"

    ' ADODB.Stream file I/O constants
    Const adTypeBinary = 1
    Const adTypeText = 2
    Const adSaveCreateNotExist = 1
    Const adSaveCreateOverWrite = 2

    On Error Resume Next
    Set objStream = CreateObject("ADODB.Stream")
    objStream.Open
    objStream.Type = adTypeText
    objStream.Position = 0
    objStream.Charset = CdoUTF_8
    ' We receive the text as a string argument, so there is no need to
    ' load it from a file; write the string straight into the stream.
    ' objStream.LoadFromFile myFileIn
    objStream.WriteText textString, 1 ' 1 = adWriteLine: append a line separator
    objStream.SaveToFile myFileOut, adSaveCreateOverWrite
    objStream.Close
    Set objStream = Nothing
    If Err Then
        WriteUTF8 = False
    Else
        WriteUTF8 = True
    End If
    On Error GoTo 0
End Function
While saving an Excel file with Chinese characters to CSV, these characters are converted to ??? (question mark) junk characters.
Please let me know if any of you have a solution for this. I tried saving it as Unicode text, which worked fine, but when I try saving it as .csv, it doesn't work.
Thanks
I had a similar problem with Japanese characters before. At the time, Excel 2003 only exported CSV as Latin-1 (or maybe Windows-1252). I basically wrote my own Excel macro to iterate over the rows and columns and build up an in-memory string of what the CSV file would look like. Then I used an ADODB.Stream to save it myself. This sample code should get you started.
Dim csvdata As String
Dim CRLF As String
Dim objStream As Object
CRLF = Chr(13) & Chr(10)
csvdata = """key"",""value""" & CRLF
csvdata = csvdata & """a"",""a""" & CRLF
csvdata = csvdata & """aacute"",""á""" & CRLF
Set objStream = CreateObject("ADODB.Stream")
objStream.Open
objStream.Position = 0
objStream.Charset = "UTF-8"
objStream.WriteText csvdata
objStream.SaveToFile "test.csv", 2 ' adSaveCreateOverWrite
objStream.Close
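For the iteration part described above, a rough sketch that builds csvdata from the active sheet (my assumptions: every cell is quoted, and embedded quotes are doubled):

' Sketch: build the in-memory CSV string from the used range
Dim r As Long, c As Long
Dim cellText As String
csvdata = "" ' start fresh
With ActiveSheet.UsedRange
    For r = 1 To .Rows.Count
        For c = 1 To .Columns.Count
            cellText = Replace(CStr(.Cells(r, c).Value), """", """""")
            If c > 1 Then csvdata = csvdata & ","
            csvdata = csvdata & """" & cellText & """"
        Next c
        csvdata = csvdata & CRLF
    Next r
End With
' ... then save csvdata with the ADODB.Stream code above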
Why do you need a CSV file? What encoding do you need it in? UTF-8? GBK? What software is going to read the CSV file? What version of Excel are you using?
If you know Python, you could use the xlrd module to read the Excel file, format the data, encode it, and write it to a CSV file, or use it to update a database, or whatever.
I have files that I open with Excel, and when I open them the text looks like gibberish.
To fix it I have to set the encoding via Tools > Internet Options > General > Encoding > Hebrew (ISO-Visual),
and then the file turns into Hebrew.
Is there VBA code that does that?
Thanks,
omri
I don't really have a way to test this, so I am just taking a shot:
Excel.ActiveWorkbook.WebOptions.Encoding = msoEncodingHebrew
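If the goal is to open the text file itself with a specific encoding, Workbooks.OpenText may be closer to what you need. A sketch, with a hypothetical file path, relying on the Origin argument accepting a code page number (1255 = Hebrew):

' Sketch: open a text file, telling Excel to decode it as Windows-1255
Workbooks.OpenText Filename:="C:\data\report.txt", Origin:=1255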
Use an ADODB.Stream with the following function.
Code page 1255 is the original Hebrew code page.
You also need to reference the latest Microsoft ActiveX Data Objects Library.
(Tools/References)
Public Function CorrectHebrew(gibberish As String) As String
    Dim inStream As ADODB.Stream
    Set inStream = New ADODB.Stream
    inStream.Open
    inStream.Charset = "Windows-1255"
    inStream.WriteText gibberish
    inStream.Position = 0 ' bring it back to the start, preparing for the ReadText
    inStream.Charset = "UTF-8"
    CorrectHebrew = inStream.ReadText ' return the corrected text
    inStream.Close
End Function
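Usage could look like this (a sketch, assuming the garbled text sits in cell A1):

Sub FixA1()
    ' Hypothetical usage: repair the garbled text in A1 in place
    Range("A1").Value = CorrectHebrew(CStr(Range("A1").Value))
End Sub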