How to read .txt file with Chinese characters? - excel

I have a subroutine that reads text files and extracts certain data from them. Here is an example:
NamePrefix = "Example"
OutputPath = "C:\Example"
DbSize = 65536
LstStr = ""
Dim Success() As Boolean
Dim Value() As Double
ReDim Success(1 to DbSize)
ReDim Value(1 to DbSize)
For ID = 1 to DbSize
'Read string
FileName = NamePrefix & Format(ID,"000000") & ".lst"
FilePath = OutputPath & "\" & FileName
Open FilePath For Input As 1
LstStr = Input(LOF(1),1)
Close 1
'Extract data
If InStr(1, LstStr, "SUCCESS") <> 0 Then Success(i) = True Else Success(i) = False
Pos1 = InStr(1, LstStr, "TH 1 value: ") 'Position varies for each file
Value(i) = Val(Mid(LstStr, Pos1 + 13, 10)) 'Value in scientific notation
Next ID
The use of InStr to locate strings by position works perfectly when there are just alphabets, numbers and symbols. However, sometimes the files contain Chinese characters and the Input function returns an empty string "" to LstStr. I tried to use some other suggested methods but in vain (e.g. Extract text from a text file with Chinese characters using vba). How should I read files with Chinese characters successfully, in a way that I do not need to modify other parts of the code which extract data by position? Thanks!

This would be an alternative way to read the string. Make sure that the .Charset is set to the charset of the file you want to read.
To use ADOBD you will need to add the reference Microsoft ActiveX Data Objects 6.1 Library (Version can be different) in VBA Menu › Extras › References
Dim adoStream As ADODB.Stream
Set adoStream = New ADODB.Stream
adoStream.Charset = "UTF-8" 'set the correct charset
adoStream.Open
adoStream.LoadFromFile FilePath
LstStr = adoStream.ReadText
adoStream.Close
Set adoStream = Nothing

Related

How to only open specific excel files by comparing file names?

I am currently working on a project that requires me to compile data from hundreds of spreadsheets in a given directory. My problem is I'm not sure how to handle different sub-revisions of files. For example the files are named:
File Name R1a.xlsx
File Name R1b.xlsx
File Name R1c.xlsx
File Name R2a.xlsx
File Name R2b.xlsx
For the above files I would only need to read from 1c and 2b. Is there a good way of determining which files need read, or could someone at least point me in a direction that I could look into? My initial thoughts were to loop through the characters in the file names and check for the largest letter that follows a number, but that seems like it would be incredibly tricky to code properly.
Thanks in advance!
There are a number of ways to approach this problem. If your filename domain is truly fixed as R{num}{prio}.xlsx, then note that the filenames constitute valid Excel cell addresses when {prio}.xlsx is stripped away. The resultant cell addresses from your example are R1 and R2. You can now use the R column of the current spreadsheet as a sparse vector to store the highest priority observed per cell (aka file). When all filenames have been examined and their highest priorities stored, it's now a simple matter of traversing the R column to pick up the files selected for processing.
Here's code that handles the aforementioned filename domain....
Sub ProcessFilesBasedOnFnamePriority()
Dim filenames, fname As Variant
Dim maxRowNum, nRowNum, i As Long
Dim strFilePrefix, strCellPrio As String
maxRowNum = 1
filenames = listfiles("c:\temp\lots_of_files")
' make an assumption that all filenames begin with the same
' single-char prefix character.
strFilePrefix = Left(filenames(1), 1)
For Each fname In filenames
Dim dotpos, suffixLen As Integer
Dim strCellAddr, strFnamePrio
dotpos = InStr(1, fname, ".")
suffixLen = Len(Mid(fname, dotpos))
' assume priority is specified by a single char in fname
strFnamePrio = LCase(Mid(fname, dotpos - 1, 1))
strCellAddr = Left(fname, Len(fname) - (suffixLen + 1)) ' "+1" to account for priority char
strCellPrio = Range(strCellAddr)
If (Trim(strCellPrio) = "") Then
Range(strCellAddr) = strFnamePrio ' store first prio occurrence
Else
' assume filename prio characters (e.g., [a-z]) sort lexicographically
If strFnamePrio > strCellPrio Then
Range(strCellAddr) = strFnamePrio
End If
End If
nRowNum = CLng(Mid(strCellAddr, 2))
If nRowNum > maxRowNum Then
maxRowNum = nRowNum
End If
Next
For i = 1 To maxRowNum
strCellPrio = Trim(Range(strFilePrefix & CStr(i)))
If strCellPrio <> "" Then
fname = strFilePrefix & CStr(i) & strCellPrio & ".xlsx"
Debug.Print fname ' <-- do analysis on fname
End If
Next i
End Sub
The code for listfiles is here. Note that this solution will not handle stray files that don't follow the assumed naming pattern. You'll need to add checks to weed them out.
You can store your list of partial file names you need to match in an array. Then loop through the partial names in the array and foreach partial name, loop through the directory to look for a match. VBA provides the InStr function to that you can use to test if a file name contains the partial name from your array.
In pseudocode:
myArray = [ 1c, 2b]
ForEach partialName in myArray
ForEach file in myDirectory
If InStr(fileName, partialName) Then
// Do something interesting
End If
Next file
Next partialName
Use a excel sheet or tabular format to express the file desired.
The excel sheet only needs two columns, A for the filename without the sub-revision, and column B for the desired sub-revision.
Compile and compose this information and then your vba implementation use the excel worksheet/tabular format to iterate and read "only" those files.
Use whatever language to compose the tabular format (in my case, python is preferred) and try to use any thing you can "to determine the sub-revision".
This allows you to debug the results more easily and use whatever language to compile a worksheet formatted or tabular delimited file.
This works because .GetFolder returns a sorted list.
Option Explicit
Sub FilesSelecter()
Dim fs As Object
Dim TargetPath As String
Dim DirList As Object
Dim File As Object
Dim BaseName As String
Dim RootFileName As String
Dim SaveRootFileName As String
Dim SaveBaseName As String
Set fs = CreateObject("Scripting.FileSystemObject")
TargetPath = "C:\Users\BeastMstr\Documents\TestFiles"
Set DirList = fs.Getfolder(TargetPath)
SaveRootFileName = ""
For Each File In DirList.Files
BaseName = fs.getbasename(File)
RootFileName = Left(BaseName, Len(BaseName) - 1)
If SaveRootFileName = RootFileName Or SaveRootFileName = "" Then
SaveRootFileName = RootFileName
SaveBaseName = BaseName
Else
'
'Do Somethingwith SaveBaseName
'
Debug.Print SaveBaseName
SaveRootFileName = RootFileName
SaveBaseName = BaseName
End If
Next
'
' Do something with the last file
'
Debug.Print SaveBaseName
End Sub

Special Characters from txt file to excel

I am trying to import special characters from a txt file into excel.
I've tried so many things but the characters BREAK in excel.
example of my string:
in txt: Changjíhuízúzìzhìzhou
converts in excel to: Changjíhuízúzìzhìzhou
so I tried moving values over bit by bit but no luck..
Sub ImportTXTFile()
Dim file As Variant
Dim EXT As String
Dim Direct As String ' directory...
Direct = "C:\FilePath\Here\"
EXT = ".txt"
Dim COL As Long
Dim row As Long
COL = 1
row = 1
file = Dir(Direct)
Do While (file <> "") ' Cycle through files until no more files
If InStr(file, "Data.txt") > 0 Then
'
Open Direct & "Data.txt" For Input As #1
'
While Not EOF(1)
Line Input #1, DataLine ' Read in line
Do While DataLine <> ""
If InStr(DataLine, ",") = 0 Then ' Drop value into excel upto the first ,
Sheets("test").Cells(row, COL).Value = DataLine
DataLine = ""
Else
Sheets("test").Cells(row, COL).Value = Left(DataLine, InStr(DataLine, ",") - 1)
DataLine = Right(DataLine, Len(DataLine) - InStr(DataLine, ",")) ' rebuild array without data upto first ,
End If
COL = COL + 1 ' next column
Loop
COL = 1 ' reset column
row = row + 1 ' write to next row
Wend
'
Close #1 ' Close files straight away
End If
file = Dir
Loop
MsgBox "Data Updated"
End Sub
So I want to cry because all this converting of UTF-8 to ASCII can be avoid simply by:
opening the txt file in Notepad++
going to the encoding tab
clicking convert to ASCII
ran my original code.
BLAM
everything is perfect.
Thank you danieltakeshi for all your help!
Using the first link i gave you, here is a test code, i tested with success. Using the charset: CdoISO_8859_1
Dim objStream As Object
Dim strData As String
Set objStream = CreateObject("ADODB.Stream")
objStream.Charset = "iso-8859-1"
objStream.Open
objStream.LoadFromFile ("C:\Users\user_name\Desktop\test.txt")
strData = objStream.ReadText()
Debug.Print strData & " Compare to: Changjíhuízúzìzhìzhou"
The output was:
EDIT:
Check the encoding type of your .txt file and import to Excel with the same encoding charset, for example, i changed the test.txt to UTF-8 and imported successfully with the .Charset as "utf-8"
You can Save As your .txt file and choose the encoding.

Excel to Text VBA precedes with question mark - how to fix?

I'm very new to VBA in Excel. I'm using this code I cobbled together from example snippets online to convert a column of cells in Excel to a text file:
Private Sub CommandButton1_Click()
Dim myFile As String, rng As Range, cellValue As Variant, i As Integer, j As Integer
Dim FName As String
Dim FPath As String
Set fsT = CreateObject("ADODB.Stream"): 'Create Stream object
fsT.Type = 2: 'Specify stream type – we want To save text/string data.
fsT.Charset = "utf-8": 'Specify charset For the source text data.
FPath = "C:\WHIT\ParamGen"
FName = Sheets("Sheet1").Range("b49").Text
myFile = FPath & "\" & FName
Set rng = Range("B2: B42 ")
Open myFile For Output As #1
For i = 1 To rng.Rows.Count
For j = 1 To rng.Columns.Count
cellValue = rng.Cells(i, j).Value
If j = rng.Columns.Count Then
Print #1, cellValue
Else
Print #1, cellValue,
End If
Next j
Next i
Close #1
End Sub
The problem is that the first cell in my Excel file contains this text:
#!=1
...and it shows up in the generated text file like this:
?#!=1
Everything else in the excel file gets written to the text file without issue, but that question mark messes up the import function in the software this file is being generated for.
Any ideas on getting this question mark to disappear?
Have you tried removing the "?" with code in the file, such as:
If left(cellvalue,1)="?" then
application.substitute(cellvalue,"?","")
end if
I ran this code with the first cell containing #!=1 and it wrote correctly to the text file as #!=1 (no ? added). Did you check to see if cell B2 contains any non-printable characters?
I found a solution. Excel was treating the #!=1 in the first cell as a function, but it wasn't a functional function. My best guess is that it was throwing an invisible character in there as it parsed it into a text file. Overwriting the offending cell with '#!=1 did the trick.

Create text files from excel

Need to create text files from excel rows.
Column 1 should include the file names and column 2 the content of the text files. Each row will have either new file name or new content for that new text file. Also, the content of the text should be split into several lines.
How to accomplish this? Thank you.
Edited solution with text separation to lines.
For the sample the following chars are used:
;:,/|
Add new separators to RegEx pattern as required. Full code is below:
Sub Text2Files()
Dim FileStream As Object
Dim FileContent As String
Dim i As Long
Dim SavePath As String
Dim RegX_Split As Object
Set RegX_Split = CreateObject("VBScript.RegExp")
RegX_Split.Pattern = "[\;\:\,\\\/\|]" 'List of used line seperators as \X
RegX_Split.IgnoreCase = True
RegX_Split.Global = True
Set FileStream = CreateObject("ADODB.Stream")
SavePath = "D:\DOCUMENTS\" 'Set existing folder with trailing "\"
For i = 1 To ThisWorkbook.ActiveSheet.Cells(1, 1).CurrentRegion.Rows.Count
FileContent = RegX_Split.Replace(ThisWorkbook.ActiveSheet.Cells(i, 2).Text, vbNewLine)
FileStream.Open
FileStream.Type = 2 'Text
FileStream.Charset = "UTF-8" 'Change encoding as required
FileStream.WriteText FileContent
FileStream.SaveToFile SavePath & ThisWorkbook.ActiveSheet.Cells(i, 1).Text, 2 'Will overwrite the existing file
FileStream.Close
Next i
End Sub
Read more about ADO Stream Object:
http://www.w3schools.com/ado/ado_ref_stream.asp
RegEx for beginners:
http://www.jose.it-berater.org/scripting/regexp/regular_expression_syntax.htm
Sample file with the above code is here: https://www.dropbox.com/s/kh9cq1gqmg07j20/Text2Files.xlsm

How to extract file name from path?

How do I extract the filename myfile.pdf from C:\Documents\myfile.pdf in VBA?
The best way of working with files and directories in VBA for Office 2000/2003 is using the scripting library.
Create a filesystem object and do all operations using that.
Early binding:
Add a reference to Microsoft Scripting Runtime (Tools > References in the IDE).
Dim fso as new FileSystemObject
Dim fileName As String
fileName = fso.GetFileName("c:\any path\file.txt")
Late binding (see comments for more)
With CreateObject("Scripting.FileSystemObject")
fileName = .GetFileName(FilePath)
extName = .GetExtensionName(FilePath)
baseName = .GetBaseName(FilePath)
parentName = .GetParentFolderName(FilePath)
End With
The FileSystemObject is great. It offers a lot of features such as getting special folders (My documents, etc.), creating, moving, copying, deleting files and directories in an object oriented manner.
Dir("C:\Documents\myfile.pdf")
will return the file name, but only if it exists.
This is taken from snippets.dzone.com:
Function GetFilenameFromPath(ByVal strPath As String) As String
' Returns the rightmost characters of a string upto but not including the rightmost '\'
' e.g. 'c:\winnt\win.ini' returns 'win.ini'
If Right$(strPath, 1) <> "\" And Len(strPath) > 0 Then
GetFilenameFromPath = GetFilenameFromPath(Left$(strPath, Len(strPath) - 1)) + Right$(strPath, 1)
End If
End Function
I've read through all the answers and I'd like to add one more that I think wins out because of its simplicity. Unlike the accepted answer this does not require recursion. It also does not require referencing a FileSystemObject.
Function FileNameFromPath(strFullPath As String) As String
FileNameFromPath = Right(strFullPath, Len(strFullPath) - InStrRev(strFullPath, "\"))
End Function
http://vba-tutorial.com/parsing-a-file-string-into-path-filename-and-extension/ has this code plus other functions for parsing out the file path, extension and even the filename without the extension.
I can't believe how overcomplicated some of these answers are... (no offence!)
Here's a single-line function that will get the job done:
Function getFName(pf)As String:getFName=Mid(pf,InStrRev(pf,"\")+1):End Function
Function getPath(pf)As String:getPath=Left(pf,InStrRev(pf,"\")):End Function
Examples:
Dim sFilePath$, sFileName$
sFileName = Split(sFilePath, "\")(UBound(Split(sFilePath, "\")))
If you want a more robust solution that will give you both the full folder's path AND the filename, here it is:
Dim strFileName As String, strFolderPath As String
Dim lngIndex As Long
Dim strPath() As String
strPath() = Split(OpenArgs, "\") 'Put the Parts of our path into an array
lngIndex = UBound(strPath)
strFileName = strPath(lngIndex) 'Get the File Name from our array
strPath(lngIndex) = "" 'Remove the File Name from our array
strFolderPath = Join(strPath, "\") 'Rebuild our path from our array
Or as a sub/function:
Private Sub SeparatePathAndFile(ByRef io_strFolderPath As String, ByRef o_strFileName As String)
Dim strPath() As String
Dim lngIndex As Long
strPath() = Split(io_strFolderPath, "\") 'Put the Parts of our path into an array
lngIndex = UBound(strPath)
o_strFileName = strPath(lngIndex) 'Get the File Name from our array
strPath(lngIndex) = "" 'Remove the File Name from our array
io_strFolderPath = Join(strPath, "\") 'Rebuild our path from our array
End Sub
You pass the first parameter with the full path of the file and it will be set to the folder's path while the second parameter will be set to the file's name.
Here's a simple VBA solution I wrote that works with Windows, Unix, Mac, and URL paths.
sFileName = Mid(Mid(sPath, InStrRev(sPath, "/") + 1), InStrRev(sPath, "\") + 1)
sFolderName = Left(sPath, Len(sPath) - Len(sFileName))
You can test the output using this code:
'Visual Basic for Applications
http = "https://www.server.com/docs/Letter.txt"
unix = "/home/user/docs/Letter.txt"
dos = "C:\user\docs\Letter.txt"
win = "\\Server01\user\docs\Letter.txt"
blank = ""
sPath = unix
sFileName = Mid(Mid(sPath, InStrRev(sPath, "/") + 1), InStrRev(sPath, "\") + 1)
sFolderName = Left(sPath, Len(sPath) - Len(sFileName))
Debug.print "Folder: " & sFolderName & " File: " & sFileName
Also see: Wikipedia - Path (computing)
The simplest approach if you are sure the file physically exists on the disk:
Dim fileName, filePath As String
filePath = "C:\Documents\myfile.pdf"
fileName = Dir(filePath)
If you are not sure about existence of file or just want to extract filename from a given path then, simplest approach is:
fileName = Mid(filePath, InStrRev(filePath, "\") + 1)
To get the file name in an excel macro is:
filname = Mid(spth, InStrRev(spth, "\", Len(spth)) + 1, Len(spth))
MsgBox Mid(filname, 1, InStr(filname, ".") - 1)
Function file_name_only(file_path As String) As String
Dim temp As Variant
temp = Split(file_path, Application.PathSeparator)
file_name_only = temp(UBound(temp))
End Function
here you give your file name as input of the function
the split function of VBA splits the path in different portion by using "\" as path separator & stores them in an array named "temp"
the UBound() finds the max item number of array and finally assigns the result to "file_name_only" function
Hope this will be helpful.
Here's an alternative solution without code. This VBA works in the Excel Formula Bar:
To extract the file name:
=RIGHT(A1,LEN(A1)-FIND("~",SUBSTITUTE(A1,"\","~",LEN(A1)-LEN(SUBSTITUTE(A1,"\","")))))
To extract the file path:
=MID(A1,1,LEN(A1)-LEN(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1,"\",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,LEN(A1))))
I am using this function...
VBA Function:
Function FunctionGetFileName(FullPath As String) As String
'Update 20140210
Dim splitList As Variant
splitList = VBA.Split(FullPath, "\")
FunctionGetFileName = splitList(UBound(splitList, 1))
End Function
Now enter
=FunctionGetFileName(A1) in youe required cell.
or You can use these...
=MID(A1,FIND("*",SUBSTITUTE(A1,"\","*",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,LEN(A1))
I needed the path, not the filename.
So to extract the file path in code:
JustPath = Left(sFileP, Len(sFileP) - Len(Split(sFileP, "\")(UBound(Split(sFileP, "\")))))
This gleaned from Twiggy # http://archive.atomicmpc.com.au and other places:
'since the file name and path were used several times in code
'variables were made public
Public FName As Variant, Filename As String, Path As String
Sub xxx()
...
If Not GetFileName = 1 Then Exit Sub '
...
End Sub
Private Function GetFileName()
GetFileName = 0 'used for error handling at call point in case user cancels
FName = Application.GetOpenFilename("Ramp log file (*.txt), *.txt")
If Not VarType(FName) = vbBoolean Then GetFileName = 1 'to assure selection was made
Filename = Split(FName, "\")(UBound(Split(FName, "\"))) 'results in file name
Path = Left(FName, InStrRev(FName, "\")) 'results in path
End Function
Dim nme As String = My.Computer.FileSystem.GetFileInfo(pathFicheiro).Name
Dim dirc As String = My.Computer.FileSystem.GetFileInfo(nomeFicheiro).Directory

Resources