Search multiple text files for specific lines of data and import into excel using VBA macros - excel

I am very new to VBA and I'm looking to use it to automate some of my processes. I have looked around this website (and others) and although I find very similar queries, I can't seem to find one that fits my needs exactly.
So far the closest thing I've found to what I'm looking to do is this: Wanting to create a search field and button to trigger VBA script to run
I have a source folder with all my data. My data is stored in multiple text files. Here is an example of what the data in the files looks like:
10001,1,205955.00
10001,2,196954.00
10001,3,4.60
10001,4,92353.00
10001,5,85015.00
10001,6,255.90
10001,7,804.79
10001,8,205955.00
10001,9,32465.00
In each row, the first number is a geographic code, second number is a numeric code for a specific indicator (not important for what I'm trying to do), and the third number is the value I want to import into my spreadsheet. Each geographic code is associated with 2247 rows.
I want to use a search box control in Excel that I can type a specific geographic code into, click a button and then the macro would run, searching the files for that specific code and then importing all the values - in the order they are listed in the data file - into my desired range in the workbook.
So far I've gotten this code written. Again, forgive me if this is bad code... I tried to re-purpose the code from the other forum post I mentioned earlier.
I think I setup the import location right... I want it to import into column C, row 3 of the sheet that the search box/button combo will be present on. But now, I am unsure how I would get the import aspect to work. Thanks in advance for anyone who can help on this issue.
Sub SearchFolders()
Dim FSO As Object
Dim Folder As Object
Dim File As Object
Dim TS As Object
Dim SourceFolder As String
Dim Search As String
Dim LineNumber As Long
Dim DataSh As Worksheet
SourceFolder = "C:\Users\MarMar\Desktop\Data\Census2016\DataFiles\"
Search = TextBox1.Value
Set FSO = CreateObject("Scripting.FileSystemObject")
Set Folder = FSO.GetFolder(SourceFolder)
Set DataSh = ActiveSheet.Cells(3, 3)
For Each File In Folder.Files
Set TS = File.OpenAsTextStream()
LineNumber = 0
Do While Not TS.AtEndOfStream
LineNumber = LineNumber + 1
If InStr(TS.ReadLine, Search) Then
'Code to Import Values to DataSh ???
End If
Loop
TS.Close
Next File
End Sub

Maybe something like this:
Dim arr
For Each File In Folder.Files
Set TS = File.OpenAsTextStream()
LineNumber = 0
Do While Not TS.AtEndOfStream
arr = Split(TS.ReadLine, ",") 'split line to array
'check first element in array
If arr(0) = Search Then
datash.Resize(1, UBound(arr) + 1).Value = arr
Set datash = datash.Offset(1, 0)
End If
Loop
TS.Close
Next File

Final result that worked for me!
Sub SearchImportData1()
Dim FSO As Object
Dim SourceFolder As String
Dim Folder As Object
Dim Import As Range
Dim Search As String
Dim TextBox1 As TextBox
Dim File As Object
Dim TS As Object
Dim LineNumber As Integer
Dim Arr As Variant
SourceFolder = "C:\Users\MarMar\Desktop\Data\Census2016\DataFiles\"
Set FSO = CreateObject("Scripting.FileSystemObject")
Set Folder = FSO.GetFolder(SourceFolder)
Set Import = ActiveSheet.Cells(2, 3)
Search = ActiveSheet.TextBox1.Text
For Each File In Folder.Files
Set TS = File.OpenAsTextStream()
LineNumber = 0
Do While Not TS.AtEndOfStream
Arr = Split(TS.ReadLine, ",")
If Arr(0) = Search Then
Import.Resize(1, 1).Value = Arr(2)
Set Import = Import.Offset(1, 0)
End If
Loop
TS.Close
Next File
End Sub

Related

List files from a folder in a listbox with recent files showing first

I have a listbox that displays XLSM files from a folder Archive and PDFs from a folder called PDF in the folder of the main XLSM file, Main.
C:\Main\Archive\, C:\Main\PDF\, the XLSM looking up these items is located in the root folder C:\Main\
I would like to display the most recent modified files in descending order.
So if a file was created today, it would show on top followed by the one created yesterday and so on.
The code I have is just the standard AddItem to ListBox1
MyFile = Dir(MyFolder & "\*.xlsm")
Do While MyFile <> ""
ListBox1.AddItem MyFile
MyFile = Dir
Loop
These files also have names that start with either FSO or PPG followed by the "ticket number" like 1031, company name, job type and simple date.
FSO 10333 Co Name Job Type 042220.xlsm
PPG 10332 Co Name Job Type 042120.xlsm
That's how the names are displayed in the listbox. PDFs are the same name.
Thank you for your time!
What about the following:
Use of FileSystemObject to access file properties like: GetExtensionName, DateCreated and Name.
Use of Dictionary object to create a library and store values in memory.
Use of ArrayList object to create a list to store creation timedate values which we then can Sort ascending and Reverse to create an descending list.
We can then iterate ArrayList to return values from our Dictionary and add them to the ListBox in order.
Dim dict As Object: Set dict = CreateObject("Scripting.Dictionary")
Dim arrList As Object: Set arrList = CreateObject("System.Collections.ArrayList")
Dim FSO: Set FSO = CreateObject("scripting.FileSystemObject")
Dim oFolder, oFile
Set oFolder = FSO.getfolder(MyFolder)
For Each oFile In oFolder.Files
If FSO.GetExtensionName(oFile) = "xlsm" Then
dict(oFile.DateCreated) = oFile.Name
arrList.Add oFile.DateCreated
End If
Next
arrList.Sort
arrList.Reverse
For i = 0 To arrList.Count - 1
ListBox1.AddItem dict(arrList(i))
Next
In support of my comment, omething like this will help you get the date from the file name. You can then use Excel to sort on this.
Sub testing()
Debug.Print get_date("FSO 10333 Co Name Job Type 042220.xlsm")
End Sub
Function get_date(strInput As String) As Date
Dim lngLength As Long
Dim lngLastSpace As Long
Dim strEndSection As String
lngLength = Len(strInput)
lngLastSpace = InStrRev(strInput, " ")
strEndSection = Mid(strInput, lngLastSpace)
strEndSection = Trim(Split(strEndSection, ".")(0)) ' The date bit
get_date = DateSerial(Mid(strEndSection, 5, 2), _
Mid(strEndSection, 1, 2), _
Mid(strEndSection, 3, 2))
End Function

VBA: List of folder paths, return list of excel file paths, then edit excels

I have a user-form that pastes folder-paths into a list. I then have the code below that is supposed to loop through that list and list all the sub-folders (then I'll probably have another code loop through the sub-folders to get the excel workbooks).
I know it's inelegant, because ultimately what I want is have my list of paths be looked in one a time, through each folder and subfolder to find and list the excel files. But there was a question like that and it was taken down. The question was then referred to a different q&a that I did not understand, that had to do with individual FILE NAMES, typed in a single cell not a range, nor as a path. I speak Russian, which some of his code was in, and still couldn't quite understand what his code meant and was referring to, and when I tried it, it kept telling met that "GetData" was undefined? so I've tried to ask a different but similar question in the hope that someone can explain to me what I need to do, as I've gone as far as I can and have tried to adapt both codes from the links in this post as well as many others. I have several modules with broken code that doesn't work, and the closest I've come is the code below. At this point I'd settle simply for a way to list the excel file names from a list of paths.
Option Explicit
Dim i As Long, j As Long
Dim searchfolders As Variant
Dim FileSystemObject
Sub ListOfFolders77()
Dim LookInTheFolder As String
'Dim ws As Worksheet: Set ws = Sheets("Output4")
Dim ws2 As Worksheet: Set ws2 = Sheets("Output3")
Dim rng As Range: Set rng = ws2.Range("A1:A" & Rows.Count).End(xlUp)
Dim mypath As Range
'Dim Region As Range: Set Region = ws.Range("A2")
'Dim district As Range: Set district = ws.Range("B2")
'Dim city As Range: Set city = ws.Range("C2")
'Dim atlas As Range: Set atlas = ws.Range("D2")
i = 1
For Each mypath In rng
LookInTheFolder = mypath.Value
Set FileSystemObject = CreateObject("Scripting.FileSystemObject")
For Each searchfolders In FileSystemObject.GetFolder(LookInTheFolder).subfolders
Sheets("Subfolders").Cells(i, 1) = searchfolders
i = i + 1
SearchWithin searchfolders
Next searchfolders
Next mypath
End Sub
Sub SearchWithin(searchfolders)
On Error GoTo exits
For Each searchfolders In FileSystemObject.GetFolder(searchfolders).subfolders
j = UBound(Split(searchfolders, "\"))
Cells(i, j) = searchfolders
i = i + 1
SearchWithin searchfolders
Next searchfolders
exits:
End Sub
Ideally I want to get all the excel files in the folders and subfolders, and copy paste the data on the first sheet into one long list, but I'm still on step 1. I posted a more detailed explanation here last week and have yet to receive any feedback or potential tips.
I apologize if this doesn't make sense or seems half-hazard. I am self taught in excel VBA and am struggling to understand if what I need is even possible. I attempted using Directory but I've little success putting directory in a for each loop.
I also tried using an array, which almost crashed by computer as it went to list ALL the folders and files in my entire computer.
If I understand correctly, your requirements are as follows:
Begin with a set of root paths
Iterate recursively through all the files in each root path
For each file in the resulting collection, if it's an Excel file, add to final list for further processing
Let's start with the first two points. I would suggest the following code (make sure to add a reference to Microsoft Scripting Runtime via Tools -> References... in the VBA editor menus):
Public Function GetFiles(ByVal roots As Variant) As Collection
Select Case TypeName(roots)
Case "String", "Folder"
roots = Array(roots)
End Select
Dim results As New Collection
Dim fso As New Scripting.FileSystemObject
Dim root As Variant
For Each root In roots
AddFilesFromFolder fso.GetFolder(root), results
Next
Set GetFiles = results
End Function
Private Sub AddFilesFromFolder(folder As Scripting.folder, results As Collection)
Dim file As Scripting.file
For Each file In folder.Files
results.Add file
Next
Dim subfolder As Scripting.folder
For Each subfolder In folder.SubFolders
AddFilesFromFolder subfolder, results
Next
End Sub
The GetFiles function can be called by passing in a single string (or Folder):
Debug.Print GetFiles("c:\users\win8\documents").Count
or anything that can be iterated over with For Each -- an array, collection, Dictionary, or even an Excel Range object:
Dim allFiles As Collection
Set allFiles = GetFiles(ws2.Range("A1:A" & Rows.Count).End(xlUp)) 'from question
GetFiles as it stands is flexible for many use cases, and doesn't use any Excel-specific objects. In order to limit the results to Excel files only, you can create a new collection, and only add the Excel files into the new collection:
'You could filter by the File object's Type property
Sub GetExcelFilesByType()
Dim allFiles As Collection
Set allFiles = GetFiles(ws2.Range("A1:A" & Rows.Count).End(xlUp)) 'from question
Dim excelFiles As New Collection
Dim file As Scripting.File
For Each file In allFiles
If file.Type = "Microsoft Excel Worksheet" Then excelFiles.Add file
Next
End Sub
' Or you could filter by extension, using the FileSystemObject.GetExtensionName method
Sub GetExcelFilesByExtensionName()
Dim allFiles As Collection
Set allFiles = GetFiles(ws2.Range("A1:A" & Rows.Count).End(xlUp)) 'from question
Dim excelFiles As New Collection
Dim fso As New Scripting.FileSystemObject
Dim file As Scripting.File
For Each file In allFiles
Select Case fso.GetExtensionName(file.path)
Case "xls", "xlsb", "xlsm"
excelFiles.Add file
End Select
Next
End Sub
Either will get you a Collection of File objects, of only Excel files, from the set of root folders.
Notes
This code is recursively adding all the files (not just Excel files) into one collection (in GetFiles) and then filtering out the non-Excel files into a new collection. This might be less performant than adding only Excel files into the original collection, but that would limit GetFiles to only this scenario.
If you want to paste the results into an Excel worksheet, you could iterate through excelFiles and paste each path into the sheet. Alternatively, you might convert excelFiles into an array, and use the Excel Range object's Value property to set all the values from the array, without using a For Each.
References
Microsoft Scripting Runtime
FileSystemObject object, GetExtensionName method
File object
Folder object
VBA
Collection object
Here's a quick way, slightly adapted from this answer.
Just add in your folder locations to the path() = ... list and it should work for you. It outputs, in the current excel sheet, the paths of all Excel files in folders you provide.
From there, you can do what you please. (Perhaps throw the file paths in to an array, so you have an array of files you want to open. From there you can do the copying of data).
'Force the explicit delcaration of variables
Option Explicit
Sub ListFiles()
'http://www.xl-central.com/list-the-files-in-a-folder-and-subfolders.html
'Set a reference to Microsoft Scripting Runtime by using
'Tools > References in the Visual Basic Editor (Alt+F11)
'Declare the variables
Dim objFSO As Scripting.FileSystemObject
Dim objTopFolder As Scripting.Folder
Dim strTopFolderName As String
Dim path() As Variant ' EDIT THE BELOW PATH LIST FOR WHATEVER YOU NEED!
path() = Array("C:\Users\USERNAME\Desktop\Stuff\New folder", "C:\Users\USERNAME\Desktop\Other Stuff\")
'Insert the headers for Columns
Range("A1").Value = "File Name"
Range("D1").Value = "File Path"
Dim i As Long
For i = LBound(path) To UBound(path)
strTopFolderName = path(i)
'Create an instance of the FileSystemObject
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get the top folder
Set objTopFolder = objFSO.GetFolder(strTopFolderName)
'Call the RecursiveFolder routine
Call RecursiveFolder(objTopFolder, True)
'Change the width of the columns to achieve the best fit
Columns.AutoFit
Next i
End Sub
Sub RecursiveFolder(objFolder As Scripting.Folder, _
IncludeSubFolders As Boolean)
'Declare the variables
Dim objFile As Scripting.File
Dim objSubFolder As Scripting.Folder
Dim NextRow As Long
'Find the next available row
NextRow = Cells(Rows.Count, "A").End(xlUp).Row + 1
'Loop through each file in the folder
For Each objFile In objFolder.Files
Debug.Print (objFile)
If objFile.Type = "Microsoft Excel Worksheet" Then
Cells(NextRow, "A").Value = objFile.Name
Cells(NextRow, "D").Value = objFile.path
NextRow = NextRow + 1
End If
Next objFile
'Loop through files in the subfolders
If IncludeSubFolders Then
For Each objSubFolder In objFolder.SubFolders
Call RecursiveFolder(objSubFolder, True)
Next objSubFolder
End If
End Sub

Efficiently reading txt file (or other data sources) using VBA

I have a (large) column of data stored in a txt file.
I need to copy the column vector in an Excel sheet. Here is my code:
Dim t As Single
t = Timer
Dim sFile As String
inputFile = "C:\Temp\vector.txt"
Dim rowNum As Long
rowNum = 1
Dim dest As Range
Set dest = Sheet1.Cells(rowNum, 1)
Open inputFile For Input As #1
Do Until EOF(1)
Input #1, ReadData
If Not IsEmpty(ReadData) Then
dest.Cells = ReadData
rowNum = rowNum + 1
Set dest = Sheet1.Cells(rowNum, 1)
End If
Loop
Close #1 'close the opened file
Sheet1.[C2].Value = Timer - t
I wonder whether there is a more efficient/fast way to accomplish the same task.
To this aim, does it make sense to convert the txt file into another format (say .csv, .xlsx or any other file type) instead of reading lines from the .txt file?
Any help is highly appreciated.
S
Following this link I have tried different solutions.
The following code provides a much faster solution to the problem (importing in Excel a column of 500,000 random numbers) as compared to the code proposed in the initial question.
Dim t As Single
t = Timer
Dim inputFile As String
inputFile = "C:\Temp\vector.txt"
Dim wbI As Workbook, wbO As Workbook
Dim wsI As Worksheet
Set wbI = ThisWorkbook
Set wsI = wbI.Sheets("Sheet1")
Set wbO = Workbooks.Open(inputFile)
wbO.Sheets(1).Columns(1).Copy wsI.Columns(1)
wbO.Close SaveChanges:=False
Sheet1.[C2].Value = Timer - t
In particular, after 20 trials, the average computational time was 1.50 seconds, while con the first code it was 10.2 seconds.
Hope this helps!
If you want to use the first approach (which I'd recommend as it doesn't involve opening the files through Excel) then you can reduce the run time by batching the prints.
Also you may want to consider using scripting.filesystemobject rather than the older IO interface.
See example below (Note this code hasn't been tested)
const path as string = ""
const max_print_rows as integer = 10000
dim print_start_cell as range
dim print_arr () as string
dim i as integer,j as long
dim fso as scripting.filesystemobject
dim in_file as scripting.textstream
set print_start_cell=thisworkbook.names("Start_Cell").referstorange
set fso=new scripting.filesystemobject
set in_file=fso.opentextfile(path,forreading)
redim print_arr(1 to max_print_rows,1 to 1)
do until in_file.atendofstream
i=i+1
print_arr(i)=in_file.readline
if I=max_print_rows then
print_start_cell.offset(j).resize(max_print_rows).value=print_arr
j=j+i
erase print_arr
redim print_arr(1 to max_print_rows)
i=1
end if
loop
print_start_cell.offset(j).resize(max_print_rows).value=print_arr
erase print_arr
in_file.close
set in_file=nothing
set print_start_cell=nothing
set fso=nothing

Excel External Data into a Table

I am working on an excel spreadsheet that takes data from a CSV file (produced automatically by an external system).
I have used:
Data->Get External Data->From Text
And it works perfect !
However i am not able to format the imported data as a table :-(
It gives the following message :
Your Selection overlaps one or more external data ranges. Do you want to convert the selection to a table and remove all external connections?
Is there a way to format the imported data as a table wthout breaking the connection ?
Thanks
Martin
This should work for you - make sure you have a tab called Data and you change the public const to the path of the file. I assume you know what to do with this code, if not let me know.
Public Const feedDir = "C:\Program Files\Common Files\System\data.csv" 'change this to the path of the file
Sub loadDataWrapper()
'''check file is in directory before proceding
If Dir(feedDir) <> "" Then
fileToLoad = feedDir
Else
MsgBox "No file available to load. Please check the path and try again."
Exit Sub
End If
Call loadData(fileToLoad)
End Sub
Sub loadData(ByVal fileToLoad As String)
Dim fso As Object, textFile As Object: Set fso = CreateObject("Scripting.FileSystemObject")
Dim textFileStr As String
Dim textFileArr As Variant
Dim outputArr() As Variant
Dim oneRow As Variant
Dim numRows, numColumns As Long
'''open the text file and read into memory as is
Set textFile = fso.OpenTextFile(fileToLoad, 1)
textFileStr = textFile.ReadAll
textFile.Close
Set textFile = Nothing
Set fso = Nothing
'''find number of rows and columns of text file
textFileArr = Split(textFileStr, Chr(10))
numRows = UBound(textFileArr)
numColumns = UBound(Split(textFileArr(0), ","))
ReDim outputArr(numRows, numColumns)
'''go through every line and insert into array
For ii = 0 To (numRows - 1)
oneRow = Split(textFileArr(ii), ",")
For jj = 0 To numColumns
outputArr(ii, jj) = oneRow(jj)
Next jj
Next ii
'''output array to Worksheet
Worksheets("Data").Range("A2:Z1048576").ClearContents
Worksheets("Data").Range("A2").Resize(numRows + 1, numColumns + 1).Value = outputArr
End Sub
Would a Pivot Table satisfy your requirement?
Insert>PivotTable>Use External Data Source Radio Button

I only want to import the last 3 days of text files into Excel

I have a folder with over 16,000 files and I've managed to find some code that won't break Excel when it searches all of the files. Now I need some code that will import the last 3 days worth of text files. Any help would be appreciated.
Current code:
Sub ReadFilesIntoActiveSheet()
Dim fso As FileSystemObject
Dim folder As folder
Dim file As file
Dim FileText As TextStream
Dim TextLine As String
Dim Items() As String
Dim i As Long
Dim cl As Range
Dim dateToCheck As Date
Dim daysBack As Integer
Dim filesCheckd As Integer
filesChecked = 0
daysBack = 5
dateToCheck = DateAdd("d", -daysBack, Date)
Dim StrFile As String
StrFile = Dir("X:\TMS\TRUCK_OUT\")
Do While Len(StrFile) > 0
filesChecked = filesChecked + 1
StrFile = Dir
Loop
MsgBox filesChecked
Set FileText = Nothing
Set file = Nothing
Set folder = Nothing
Set fso = Nothing
End Sub
I told you the function to use and your response was to ask me to explain how to use it. The implication is that you know so little about Excel VBA that even knowing which function to use is not enough to add two extra statements to your code.
You must invest some time in learning the basics of Excel VBA. Trying to get a macro written for you without knowing the basics will probably fail. If you are successful, it will take a long time if you are only going to get a few lines per question.
Search the internet for "Excel VBA tutorial". There are many to choose from. Alternatively, visit a good bookshop or library and look for Excel VBA Primers. Again many to choose from. The time you invest in learning the basics will very quickly repay itself.
Issue 1
The first declaration is Dim fso As FileSystemObject. This will only compile if one of your references is for the Microsoft Scripting RunTime. You do not use fso. Are you planning to use it later? Why have you used Dir rather than the Files property of the folder object you have declared?
Issue 2
What do you mean by: "I've managed to find some code that won't break Excel when it searches all of the files." What code did you have which broke Excel?
Issue 3
Dim daysBack As Integer declares a 16-bit integer. Unless you have an old 16-bit computer, this will require extra processing. Dim i As Long declares a 32-bit integer and it the correct choice.
Issue 4
Dir returns a Variant. Dir$ returns a String and is faster.
None of the above issues are immediately important. I trying to show some of the little things you do not know about Excel VBA and the traps waiting to catch you when you do not know the basics.
For the version of your macro below, I have:
Indented the code to make it easier to read.
Commented out statements you do not current use.
Added two statements so only files with a Last Modified Date after dateToCheck are counted.
This will take you a little further.
Sub ReadFilesIntoActiveSheet()
'Dim fso As FileSystemObject
'Dim folder As folder
'Dim file As file
'Dim FileText As TextStream
'Dim TextLine As String
'Dim Items() As String
'Dim i As Long
'Dim cl As Range
Dim dateToCheck As Date
Dim daysBack As Integer
Dim filesCheckd As Integer
Dim StrFile As String
Dim PathRoot As String
filesChecked = 0
daysBack = 5
dateToCheck = DateAdd("d", -daysBack, Date)
PathRoot = "X:\TMS\TRUCK_OUT\"
StrFile = Dir$(PathRoot)
Do While Len(StrFile) > 0
If FileDateTime(PathRoot & StrFile) < dateToCheck Then
filesChecked = filesChecked + 1
End If
StrFile = Dir$
Loop
MsgBox filesChecked
'Set FileText = Nothing
'Set file = Nothing
'Set folder = Nothing
'Set fso = Nothing
End Sub

Resources