Speed up finding txt files in nested folder structure - excel

This code finds a txt file with a variable name in a folder structure with variable names.
The underlined parts are the variable parts that I can't know
and further back the folder structure looks like this:
sub Main()
Dim FileSystem As Object
Dim HostFolder As String
' for loop that loops multiple days omitted for simplicity
HostFolder = "H:\Dokument\Avvikelser\" & Format(dd, "YYYY\\mm\\dd") ' dd is the date it should look at
Set FileSystem = CreateObject("Scripting.FileSystemObject")
DoFolder FileSystem.GetFolder(HostFolder)
End sub
Sub DoFolder(Folder)
Dim SubFolder
For Each SubFolder In Folder.SubFolders
'Debug.Print SubFolder
' since I don't need to look in the folder called 51562 I added the if below
If IsNumeric(Right(SubFolder, 5)) = False Then DoFolder SubFolder
DoEvents
Next
Dim File
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
For Each File In Folder.Files
' only the text file is interesting
If Right(File, 3) = "txt" Then
Debug.Print File
objStream.Charset = "utf-8"
objStream.Open
objStream.LoadFromFile (File)
cont = objStream.ReadText()
objStream.Close
Set objStream = Nothing
' strData is a public string that is appended with multiple textfiles contents
strData = strData & cont
DoEvents
End If
Next
End Sub
The barebone of the code is copied from https://stackoverflow.com/a/22645439/5159168
The issue I have is that even though I have tried to make it only traverse the parts it has to it still takes about 2-2.5 seconds between each file is read.
Is there anything that I can do to make this faster?

Example of how to split your procedures (so each of them does less things): The first one GetAllTextFilesFromNonNumericSubFolders just collects the text files of all non numeric sub folders and collects them in a FileList. The second one ProceedTextFiles then uses the FileList to work with these text files.
Now you can easily check out which one is the bottleneck. Just comment out ProceedTextFiles FileList in your Main() procedure. If this runs fast the bottleneck is not looping through the folders. If it is slow, you can try to find an more optimized way of collecting the text files.
Option Explicit
Public Sub Main()
Dim HostFolder As String
HostFolder = "H:\Dokument\Avvikelser\" & Format(dd, "YYYY\\mm\\dd")
Dim FileSystem As Object
Set FileSystem = CreateObject("Scripting.FileSystemObject")
Dim FileList As New Collection
GetAllTextFilesFromNonNumericSubFolders FileSystem.GetFolder(HostFolder), FileList
If FileList.Count > 0 Then
ProceedTextFiles FileList
Else
MsgBox "No text files found in '" & HostFolder & "'.", vbExclamation
End If
End Sub
Public Sub GetAllTextFilesFromNonNumericSubFolders(ByVal Folder As Folder, ByRef FileList As Collection)
Dim SubFolder As Variant
For Each SubFolder In Folder.SubFolders
'Debug.Print SubFolder
' since I don't need to look in the folder called 51562 I added the if below
If IsNumeric(Right(SubFolder, 5)) = False Then GetAllTextFilesFromNonNumericSubFolders SubFolder, FileList
DoEvents 'ToDo: remove this line to optimize speed
Next
Dim File As Variant
For Each File In Folder.Files
' only the text file is interesting
If Right(File, 3) = "txt" Then
Debug.Print File 'ToDo: remove this line to optimize speed
FileList.Add File
End If
Next
End Sub
Public Sub ProceedTextFiles(ByVal FileList As Collection)
Dim objStream As Variant
Set objStream = CreateObject("ADODB.Stream")
Dim File As Variant
For Each File In FileList
Debug.Print File 'ToDo: remove this line to optimize speed
objStream.Charset = "utf-8"
objStream.Open
objStream.LoadFromFile File
cont = objStream.ReadText()
objStream.Close
Set objStream = Nothing
' strData is a public string that is appended with multiple textfiles contents
strData = strData & cont
DoEvents 'ToDo: remove this line to optimize speed
Next File
End Sub
Note that the ByRef FileList in the GetAllTextFilesFromNonNumericSubFolders procedure is a return variable. Because it is ByRef the changes made in teh procedure are returned back to the Main procedure.
Alternative:
Give the follwing a try. This uses the dir *.txt /A-H-S /B /S command to go through the subfolders and should be much quicker.
Option Explicit
Public Sub Main()
Dim HostFolder As String
HostFolder = "H:\Dokument\Avvikelser\" & Format(dd, "YYYY\\mm\\dd")
Dim FileList() As String
FileList = GetAllTextFilesSubFolders(HostFolder)
If (Not Not FileList) <> 0 Then
ProceedTextFiles FileList
Else
MsgBox "No text files found in '" & HostFolder & "'.", vbExclamation
End If
End Sub
Public Function GetAllTextFilesSubFolders(ByVal Folder As String) As Variant
GetAllTextFilesSubFolders = Split(CreateObject("WScript.Shell").Exec("cmd /c dir """ & Folder & "\""*.txt /A-H-S /B /S").StdOut.ReadAll, vbNewLine)
End Function
Public Sub ProceedTextFiles(ByRef FileList() As String)
Dim objStream
Set objStream = CreateObject("ADODB.Stream")
Dim File As Variant
For Each File In FileList
Debug.Print File
objStream.Charset = "utf-8"
objStream.Open
objStream.LoadFromFile File
cont = objStream.ReadText()
objStream.Close
Set objStream = Nothing
' strData is a public string that is appended with multiple textfiles contents
strData = strData & cont
DoEvents
Next File
End Sub
Added by OP.
UTF-8 version
Since you can't read the cmd window in UTF-8, the easy dirty hack is to replace the wrong characters in the string before you split.
Public Function GetAllTextFilesSubFolders(ByVal Folder As String) As Variant
GetAllTextFilesSubFolders = Split(Replace(Replace(Replace(CreateObject("WScript.Shell").Exec("cmd /c dir """ & Folder & "\""*.txt /A-H-S /B /S").StdOut.ReadAll, "†", "å"), "„", "ä"), "”", "ö"), vbNewLine)
End Function
2 seconds!

Related

Pull data from most recent file in folder

Trying to use the most recent file in folder for data.
My problem is that my master excel file wont use the data from the most recent data file (xlsx) to pull the data. My code currently has the name of the current file (eg. "Network-2019.xlsm") but lets say i insert a file called "network.xlsm, which is posted in the folder later. I want main dataset to recognize this and pull in that data.
Function GetMostRecentExcelFile(ByVal myDirectory As String, ByVal filePattern As String) As String
Dim fso As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Dim myFolder As Object
Set myFolder = fso.getfolder(IIf(Right(myDirectory, 1) = "\", myDirectory, myDirectory & "\"))
Dim currentDate As Date
Dim fname As String
Dim currentFile As Object
For Each currentFile In myFolder.Files
If (currentDate = CDate(0) Or currentFile.DateCreated > currentDate) And currentFile.name Like filePattern _
And InStr(LCase$(currentFile.name), ".xlsx") > 0 And InStr(currentFile.name, "~$") = 0 Then
currentDate = currentFile.DateCreated
fname = currentFile.name
End If
Next currentFile
GetMostRecentExcelFile = fname
End Function
I would suggest something like below, since you are using the FileSystemObject
Note that I used early binding. The associated intellisense is quite useful, and you can always change to late binding if you need to for any reason.
Option Explicit
Function GetMostRecentExcelFile(sFolderPath As String) As String
Dim FSO As FileSystemObject
Dim FO As Folder, FI As File, recentFI As File
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(sFolderPath)
For Each FI In FO.Files
Select Case FI.Name Like "*.xlsx"
Case True
Select Case recentFI Is Nothing
Case True
Set recentFI = FI
Case False
If FI.DateCreated > recentFI.DateCreated Then
Set recentFI = FI
End If
End Select
End Select
Next FI
GetMostRecentExcelFile = recentFI.Path
End Function

VBA : Looping through Filesystem and finding the newest file

In my program, i want to browse through a complex file structure, and display the newest file in it.
The file structure has several folders and subfolders, most of the time empty. So this macro would help to reveal where the latest information is.
Sub newestFile()
Dim FileSystem as object
Dim MostRecentFile as string
Dim MostRecentDate as Date
Dim FileSpec as String
Dim filename as string
'This is where i specify what type of files i would be looking for
FileSpec ="*.*"
'This is where i specify where the master directory is, so that i may look down into it
Directory ="c:\Directory1\"
filename = Dir(Directory & FileSpec)
set Filesystem = CreateObject("Scripting.FileSystemObject")
Do Folder FileSystem.getFolder(Directory)
set ws = Sheets("Events")
ws.cells(2,7).value = MostRecentFile
ws.cells(2,8).value = MostRecentDate
end sub
private Function DoFolder(Directory)
For each subfolder in Directory.SubFolders
DoFolder subfolder
Dim file
For each File in Directory.files
'actions go here
If File <> "" Then
MostRecentFile = File
MostRecentDate = FileDateTime(Directory)
If FileDateTime(File) > MostRecentDate Then
MostRecentFile = File
MostRecentDate = FileDateTime(File)
End if
End If
next
next
End Function
on this code i always loose the variables (MostRecentFile and MostRecentDate) when the code goes to another subfolder.
I expected on having the name of the newest file (of the whole structure), and the date.
As was said, scope is certainly a concern. Here's a loop within the subroutine:
Sub newestFile()
Dim FileSystem As Object ' Needed to get file properties
Set FileSystem = CreateObject("Scripting.FileSystemObject")
Dim MostRecentDate As Date
Dim MostRecentFile As String
Directory = "c:\Directory1\"
FileSpec = "*.txt" '<-- can be "*.xls*" or whatever criteria needed
MyFile = ""
'Loop through text files in Directory finding the most current file
MyFile = Dir(Directory & FileSpec) 'Get first file name in directory
Do While MyFile <> ""
If MostRecentDate < FileSystem.GetFile(Directory & MyFile).DateLastModified Then
MostRecentDate = FileSystem.GetFile(Directory & MyFile).DateLastModified
MostRecentFile = MyFile
End If
MyFile = Dir 'Get next file matching criteria
Loop
set ws = Sheets("Events")
ws.cells(2,7).value = MostRecentFile
ws.cells(2,8).value = MostRecentDate
End Sub
You need to declare the variables at module level
Private MostRecentFile as string
Private MostRecentDate as Date
Sub newestFile()
....
End Sub

how to read a text using condition if

I have an issue and I need your help. here is the problem. I have inside a folder some excel files that I have to open automatically in order to make some operations. Those files have the same name except the number of the files like this:
Folder name : Extraction_Files
Files name : - "System_Extraction_Supplier_1"
- "System_Extraction_Supplier_2"
- "System_Extraction_Supplier_3"
The number of files can change so i used a loop Do While to count the number of files, then the plan is to use a loop for I =1 to ( number of files) to open all of theme.
please read my code. I know that i used a wrong way to read file name using a loop for but I share it because I don't have an other idea.
Here is my code :
Sub OpenFiles ()
Dim MainPath as String
Dim CommonPath as String
Dim Count As Integer
Dim i As Integer
' the main path is " C:\Desktop\Extraction_Files\System_Extraction_Supplier_i"
'with i = 1 to Count ( file number )
CommonPath = "C:\Desktop\Extraction_Files\System_Extraction_Supplier_*"
'counting automatically the file number
Filename = Dir ( CommonPath )
Do While Filename <> ""
Count = Count + 1
Filename = Dir ()
Loop
'the issue is below because this code generate a MsgBox showing a MainPath with the index i like this
'"C:\Desktop\Extraction_Files\System_Extraction_Supplier_i"
' so vba can not find the files
For i = 1 To count
MainPath = "C:\Desktop\Extraction_Files\System_Extraction_Supplier_" & "i"
MsgBox MainPath &
Workbooks.Open MainPath
Next
End Sub
what is the best approach to this?
Why not count as you open them. You're already identifying them so why not open each file as you go:
Sub OpenFiles()
Dim Filename As String
Dim CommonPath As String
Dim Count As Integer
CommonPath = "C:\Desktop\Extraction_Files\"
Filename = Dir(CommonPath & "System_Extraction_Supplier_*")
Do While Filename <> ""
MsgBox Filename
Workbooks.Open CommonPath & Filename
Count = Count + 1
Filename = Dir()
Loop
End Sub
PS. It might be worth adding .xl* or similar to the end of your search pattern to prevent Excel trying to open files that aren't Excel files:
Filename = Dir(CommonPath & "System_Extraction_Supplier_*.xl*")
If you want to open all folders, in a specific folder, which start with "NewFile_", one loop only is needed:
Sub OpenFolders()
Dim path As String: path = ""C:\Desktop\Extraction_Files\""
Dim fileStart As String: fileStart = "System_Extraction_Supplier_"
Dim Fso As Object
Dim objFolder As Object
Set Fso = CreateObject("Scripting.FileSystemObject")
Set objFolder = Fso.GetFolder(path)
For Each objSubFolder In objFolder.subfolders
If InStr(1, objSubFolder.Name, fileStart) Then
Shell "explorer.exe " & objSubFolder, vbNormalFocus
Debug.Print objSubFolder.Name
End If
Next objSubFolder
End Sub
Folders in vba are opened with the Shell "explorer.exe " command. The code opens every folder in "C:\yourFile\", which contains NewFile_ in the name. This check is done with If InStr(1, objSubFolder.Name, fileStart) Then.

Traverse zip file using VBA

I need to traverse a zip files using VBA. In particular I need to, without unzipping the file, locate the xl folder in order to find the media subfolder. I then need to copy the images out of the media subfolder and save them to another folder.
Public Sub Extract_Images()
Dim fso As FileSystemObject
Dim objFile As File
Dim myFolder
Const zipDir As String = "\\...\ZIP FILES"
Const xlFolder As String = "xl"
Const mediaFolder As String = "media"
Dim picname As String
Dim zipname As String
Set fso = New FileSystemObject
Set myFolder = fso.GetFolder(zipDir)
For Each objFile In myFolder.Files
zipname = objFile.Name
Next objFile
End Sub
^That code successfully loops through the folder and gathers the names of the zip files. But I need to get into the files and traverse the structures to get to the Media folder.
Building off: https://www.rondebruin.nl/win/s7/win002.htm
Edit: - this shows how you can incorporate the extraction into your code. Just pass the full zip path and the location to where you want to extract the files. You can do this from within your existing loop.
You may need to account for media files sharing the same name if you're planning on extracting them all to the same location...
Sub Tester()
ExtractMediaFiles "C:\Users\twilliams\Desktop\tempo.zip", _
"C:\Users\twilliams\Desktop\extracted\"
End Sub
Sub ExtractMediaFiles(zipFile As Variant, outFolder As Variant)
Dim oApp As Object
Dim fileNameInZip As Variant, oNS As Object
Set oApp = CreateObject("Shell.Application")
On Error Resume Next
Set oNS = oApp.Namespace(zipFile & "\xl\media")
On Error GoTo 0
If Not oNS Is Nothing Then
For Each fileNameInZip In oNS.items
Debug.Print fileNameInZip
oApp.Namespace(outFolder).copyhere oNS.items.Item(CStr(fileNameInZip))
Next
Else
Debug.Print "No xl\media path for " & zipFile
End If
End Sub

VBA Code to delete Temp files made by word

I have code to delete all files in a folder:
ChDir "C:\test\" 'path
Kill "C:\test*.*" 'type
However, when I open a doc file and save it as a text, it creates a temporary file named ~$*****.doc and these files do not get deleted.
How would I do this?
Sub BatchConvertCSV()
'declarations
Dim i As Integer
Dim j As Integer
Dim NewName As String
Dim objWord As Object
Dim ApplicationFileSearch As New FileSearch
Dim iCnt As Integer
Set objWord = CreateObject("Word.Application")
'search for all.doc files in specified folder
With ApplicationFileSearch
.NewSearch
.LookIn = "C:\test\"
.SearchSubFolders = False
.FileName = "*.doc"
.Execute
j = .FoundFiles.Count
i = 1
MsgBox ("Found files " & j)
'open each document
Do While i < j
Set objWord = Documents.Open(FileName:=.FoundFiles(i))
With ActiveDocument
iCnt = ActiveDocument.Fields.Count
'Somewhere here we need to decide on the placement for an if statement to filter out the doc files for 35 and 39 fields.
'If the doc file does not have that amount of fields
'MsgBox ("Found fields " & iCnt)
If iCnt > 30 And iCnt < 40 Then
.SaveFormsData = True
'save open file as just form data csv file and call it the the vaule of i.txt (i.e 1.txt, 2.txt,...i.txt) and close open file
NewName = i
ChangeFileOpenDirectory "C:\test\Raw Data\"
ActiveDocument.SaveAs FileName:=NewName
objWord.Close False
Else
End If
End With
i = i + 1
Loop
'repeat to the ith .doc file
End With
ChDir "C:\test\" 'path
Kill "C:\test\*.*" 'type
Try this:
With CreateObject("Scripting.FileSystemObject").getfolder("C:\Test")
For Each file In .Files
If Left(file.Name, 2) = "~$" Then
Kill "C:\Test\" & file.Name
End If
Next file
End With
You can, of course, refine that filter as you see fit.
The only problem I can see with that is that you're removing files from .Files while you're looping through them; it might work, but it's probably safer to add each file to a list instead of killing it in the ForEach loop, and then go through and kill everything in the list afterwards.
EDIT:
A little more research. According to this article, you can't use Kill on read-only files. This means you need to use the SetAttr command to remove the "read-only" flag. Here's some code that might help:
Dim strDir, strFile As String
strDir = "C:\Test\" 'Don't forget the trailing backslash
strFile = Dir(strDir & "~$*", vbHidden)
Do Until strFile = ""
If Len(Dir$(strDir & strFile)) > 0 Then
SetAttr strDir & strFile, vbNormal
Kill strDir & strFile
End If
strFile = Dir()
Loop
As you can see, that includes a check that the file actually exists before trying to delete it; as we're pulling that file up with Dir the check shouldn't be necessary, but your experience suggests that extra precautions are needed here. Let me know how that works.

Resources