How to only open specific excel files by comparing file names? - excel

I am currently working on a project that requires me to compile data from hundreds of spreadsheets in a given directory. My problem is I'm not sure how to handle different sub-revisions of files. For example the files are named:
File Name R1a.xlsx
File Name R1b.xlsx
File Name R1c.xlsx
File Name R2a.xlsx
File Name R2b.xlsx
For the above files I would only need to read from 1c and 2b. Is there a good way of determining which files need read, or could someone at least point me in a direction that I could look into? My initial thoughts were to loop through the characters in the file names and check for the largest letter that follows a number, but that seems like it would be incredibly tricky to code properly.
Thanks in advance!

There are a number of ways to approach this problem. If your filename domain is truly fixed as R{num}{prio}.xlsx, then note that the filenames constitute valid Excel cell addresses when {prio}.xlsx is stripped away. The resultant cell addresses from your example are R1 and R2. You can now use the R column of the current spreadsheet as a sparse vector to store the highest priority observed per cell (aka file). When all filenames have been examined and their highest priorities stored, it's now a simple matter of traversing the R column to pick up the files selected for processing.
Here's code that handles the aforementioned filename domain....
Sub ProcessFilesBasedOnFnamePriority()
Dim filenames, fname As Variant
Dim maxRowNum, nRowNum, i As Long
Dim strFilePrefix, strCellPrio As String
maxRowNum = 1
filenames = listfiles("c:\temp\lots_of_files")
' make an assumption that all filenames begin with the same
' single-char prefix character.
strFilePrefix = Left(filenames(1), 1)
For Each fname In filenames
Dim dotpos, suffixLen As Integer
Dim strCellAddr, strFnamePrio
dotpos = InStr(1, fname, ".")
suffixLen = Len(Mid(fname, dotpos))
' assume priority is specified by a single char in fname
strFnamePrio = LCase(Mid(fname, dotpos - 1, 1))
strCellAddr = Left(fname, Len(fname) - (suffixLen + 1)) ' "+1" to account for priority char
strCellPrio = Range(strCellAddr)
If (Trim(strCellPrio) = "") Then
Range(strCellAddr) = strFnamePrio ' store first prio occurrence
Else
' assume filename prio characters (e.g., [a-z]) sort lexicographically
If strFnamePrio > strCellPrio Then
Range(strCellAddr) = strFnamePrio
End If
End If
nRowNum = CLng(Mid(strCellAddr, 2))
If nRowNum > maxRowNum Then
maxRowNum = nRowNum
End If
Next
For i = 1 To maxRowNum
strCellPrio = Trim(Range(strFilePrefix & CStr(i)))
If strCellPrio <> "" Then
fname = strFilePrefix & CStr(i) & strCellPrio & ".xlsx"
Debug.Print fname ' <-- do analysis on fname
End If
Next i
End Sub
The code for listfiles is here. Note that this solution will not handle stray files that don't follow the assumed naming pattern. You'll need to add checks to weed them out.

You can store your list of partial file names you need to match in an array. Then loop through the partial names in the array and foreach partial name, loop through the directory to look for a match. VBA provides the InStr function to that you can use to test if a file name contains the partial name from your array.
In pseudocode:
myArray = [ 1c, 2b]
ForEach partialName in myArray
ForEach file in myDirectory
If InStr(fileName, partialName) Then
// Do something interesting
End If
Next file
Next partialName

Use a excel sheet or tabular format to express the file desired.
The excel sheet only needs two columns, A for the filename without the sub-revision, and column B for the desired sub-revision.
Compile and compose this information and then your vba implementation use the excel worksheet/tabular format to iterate and read "only" those files.
Use whatever language to compose the tabular format (in my case, python is preferred) and try to use any thing you can "to determine the sub-revision".
This allows you to debug the results more easily and use whatever language to compile a worksheet formatted or tabular delimited file.

This works because .GetFolder returns a sorted list.
Option Explicit
Sub FilesSelecter()
Dim fs As Object
Dim TargetPath As String
Dim DirList As Object
Dim File As Object
Dim BaseName As String
Dim RootFileName As String
Dim SaveRootFileName As String
Dim SaveBaseName As String
Set fs = CreateObject("Scripting.FileSystemObject")
TargetPath = "C:\Users\BeastMstr\Documents\TestFiles"
Set DirList = fs.Getfolder(TargetPath)
SaveRootFileName = ""
For Each File In DirList.Files
BaseName = fs.getbasename(File)
RootFileName = Left(BaseName, Len(BaseName) - 1)
If SaveRootFileName = RootFileName Or SaveRootFileName = "" Then
SaveRootFileName = RootFileName
SaveBaseName = BaseName
Else
'
'Do Somethingwith SaveBaseName
'
Debug.Print SaveBaseName
SaveRootFileName = RootFileName
SaveBaseName = BaseName
End If
Next
'
' Do something with the last file
'
Debug.Print SaveBaseName
End Sub

Related

Is there a way to obtain the file path of an excel sheet through a fuzzy match with a cell in a Master Excel sheet? [duplicate]

I need to open a file whose full filename I do not know.
I know the file name is something like.
filename*esy
I know definitely that there's only one occurrence of this file in the given directory.
filename*esy is already a "shell ready" wildcard & if thats alway the case you can simply;
const SOME_PATH as string = "c:\rootdir\"
...
Dim file As String
file = Dir$(SOME_PATH & "filename*esy" & ".*")
If (Len(file) > 0) Then
MsgBox "found " & file
End If
Just call (or loop until empty) file = Dir$() to get the next match.
There is an Application.FileSearch you can use (see below). You could use that to search for the files that match your pattern. This information taken from here.
Sub App_FileSearch_Example()
With Application.FileSearch
.NewSearch
.LookIn = "c:\some_folder\"
.FileName = "filename*esy"
If .Execute(SortBy:=msoSortByLastModified, SortOrder:=msoSortOrderDescending) > 0 Then
For i1 = 1 To .FoundFiles.Count
' do something with matched file(s)
Next i1
End If
End With
End Sub
If InStr(sFilename, "filename") > 0 and InStr(sFilename, "esy") > 0 Then
'do somthing
end if
Or you can use RegEx
Dim RE As Object, REMatches As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.MultiLine = False
.Global = False
.IgnoreCase = True
.Pattern = "filename(.*)esy"
End With
Set REMatches = RE.Execute(sFilename)
REMatches(0) 'find match
I was trying this question as a function. This is the solution that ended up working for me.
Function fileName(path As String, sName As String, ext As String) As Variant
'path is Full path from root. Can also use path = ActiveWorkbook.path & "\"
'sName is the string to search. ? and * are wildcards. ? is for single char
'example sName = "book?" or sName ="March_*_2014*"
'ext is file extention ie .pdf .xlsm .xls? .j*
Dim file As Variant 'Store the next result of Dir
Dim fname() As String 'Dynamic Array for result set
ReDim fname(0 To 0)
Dim i As Integer ' Counter
i = 0
' Use dir to search and store first result
fname(i) = path & Dir(path & "\" & sName & ext)
i = i + 1
'Load next result
file = Dir
While file <> "" 'While a file is found store that file in the array
ReDim Preserve fname(0 To i) As String
fname(i) = path & file
file = Dir
Wend
fileName = Application.Transpose(fname) 'Print out array
End Function
This works for me as a single or array function.
If you know that no other file contains "filename" and "esy" in that order then you can simply use
Workbooks.Open Filename:= "Filepath\filename*esy.*"
Or if you know the number of missing characters then (assuming 4 characters unknown)
Workbooks.Open Filename:= "Filepath\filename????esy.*"
I use this method to run code on files which are date & timestamped to ignore the timestamp part.

VBA rename files in folder keeping the right order

I've found this code that renames all the files into a specific folder.
Const FolderLoc = "C:\Users\chf000\Desktop\AAA\"
Dim x As Long
x = 1
Dim s As String
s = Dir(FolderLoc & "*.*")
Do While s <> ""
Name FolderLoc & s As FolderLoc & "ANIMATIC-" & x & ".png"
s = Dir()
x = x + 1
Loop
It works, but I've got an issue during the renaming. Basically, I've got a png's sequence into the folder, like this: SHOT001_00.png and SHOT001_01.png and so one.
the sequence has more than 100 frames.
the script changes the name from SHOT001_00.png to ANIMATIC-0.png and so one.
When I run the script, the files are renamed in the wrong order.
For example, the file named ANIMATIC-12.png contains the image that belongs to frame 101.
I guess is a problem of how the script sort the files in the folder, is sorting in a sort of alphabetical order, rather than numerical order.
Does anyone know how can I edit the script in order to rename and keep the correct order?
Thanks
I think you should first put all the names in an array and then rename them from that list.
I use this code to get the file list into an array:
Private Function GetFileList(FileSpec As String) As Variant
' Returns an array of filenames that match FileSpec
' If no matching files are found, it returns False
Dim FileArray() As Variant
Dim FileCount As Integer
Dim FileName As String
On Error GoTo NoFilesFound
FileCount = 0
FileName = Dir(FileSpec)
If FileName = "" Then GoTo NoFilesFound
'Loop until no more matching files are found
Do While FileName <> ""
FileCount = FileCount + 1
ReDim Preserve FileArray(1 To FileCount)
FileArray(FileCount) = FileName
FileName = Dir()
Loop
GetFileList = FileArray
Exit Function
' Error handler
NoFilesFound:
GetFileList = False
End Function

How to extract specific words from text files into xls spreadsheet

I'm new in VBA. Before posting my question here,I have spent almost 3 days surfing Internet.
I have 300+ text files (text converted from PDF using OCR),from text file. I need to get all words that contain "alphabet" and "digits" (as example KT315A, KT-315-a, etc) along with source reference (txt file name).
What I need is
1.add "smart filter" that will copy only words that contains
"alphabets" and "digits"
paste copied data to column A
add reference file name to column B
I have found code below that can copy all data from text files into excel spreadsheet.
text files look like
"line from 252A-552A to ddddd, ,,, #,#,rrrr, 22 , ....kt3443 , fff,,,etc"
final result in xls should be
A | B
252A-552A | file1
kt3443 | file1
Option Explicit
Const sPath = "C:\outp\" 'remember end backslash
Const delim = "," 'comma delimited text file - EDIT
'Const delim = vbTab 'for TAB delimited text files
Sub ImportMultipleTextFiles()
Dim wb As Workbook
Dim sFile As String
Dim inputRow As Long
RefreshSheet
On Error Resume Next
sFile = Dir(sPath & "*.txt")
Do Until sFile = ""
inputRow = Sheets("Temp").Range("A" & Rows.Count).End(xlUp).Row + 1
'open the text file
'format=6 denotes a text file
Set wb = Workbooks.Open(Filename:=sPath & sFile, _
Format:=6, _
Delimiter:=delim)
'copy and paste
wb.Sheets(1).Range("A1").CurrentRegion.Copy _
Destination:=ThisWorkbook.Sheets("Temp").Range("A" & inputRow)
wb.Close SaveChanges:=False
'get next text file
sFile = Dir()
Loop
Set wb = Nothing
End Sub
Sub RefreshSheet()
'delete old sheet and add a new one
On Error Resume Next
Application.DisplayAlerts = False
Sheets("Temp").Delete
Application.DisplayAlerts = True
Worksheets.Add
ActiveSheet.Name = "Temp"
On Error GoTo 0
End Sub
thanks!
It's a little tough to tell exactly what constitutes a word from your example. It clearly can contain characters other than letters and numbers (eg the dash), but some of the items have dots preceding, so it cannot be defined as being delimited by a space.
I defined a "word" as a string that
Starts with a letter or digit and ends with a letter or digit
Contains both letters and digits
Might also contain any other non-space characters except a comma
To do this, I first replaced all the commas with spaces, and then applied an appropriate regular expression. However, this might accept undesired strings, so you might need to be more specific in defining exactly what is a word.
Also, instead of reading the entire file into an Excel workbook, by using the FileSystemObject we can process one line at a time, without reading 300 files into Excel. The base folder is set, as you did, by a constant in the VBA code.
But there are other ways to do this.
Be sure to set the references for early binding as noted in the code:
Option Explicit
'Set References to:
' Microsoft Scripting Runtime
' Microsoft VBscript Regular Expressions 5.5
Sub SearchMultipleTextFiles()
Dim FSO As FileSystemObject
Dim TS As TextStream, FO As Folder, FI As File, FIs As Files
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim WS As Worksheet, RW As Long
Const sPath As String = "C:\Users\Ron\Desktop"
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(sPath)
Set WS = ActiveSheet
WS.Columns.Clear
Set RE = New RegExp
With RE
.Global = True
.Pattern = "(?:\d(?=\S*[a-z])|[a-z](?=\S*\d))+\S*[a-z\d]"
.IgnoreCase = True
End With
For Each FI In FO.Files
If FI.Name Like "*.txt" Then
Set TS = FI.OpenAsTextStream(ForReading)
Do Until TS.AtEndOfStream
'Change .ReadLine to .ReadAll *might* make this run faster
' but would need to be tested.
Set MC = RE.Execute(Replace(TS.ReadLine, ",", " "))
If MC.Count > 0 Then
For Each M In MC
RW = RW + 1
WS.Cells(RW, 1) = M
WS.Cells(RW, 2) = FI.Name
Next M
End If
Loop
End If
Next FI
End Sub

compare two values and generate a percentage (excel)

I am currently trying to create a spreadsheet which keeps track of how many files have been quality checked against those that haven't and then displays the amount left to be checked as a percentage.
Currently on open the spreadsheet pulls the details from a checked folder and a work to be checked folder as follows:-
Private Sub pdf_loading()
Range("M5").Clear
Dim FolderPath As String, path As String, count As Integer
FolderPath = "C:\path to folder\"
' looks in spercific folder
path = FolderPath & "*.pdf"
' for file type this time it is pdf files, though if you change this is could be word files, or psd's
Filename = Dir(path)
Do While Filename <> ""
' checks for filename <less than or >greater than "filename" as "" is empty does not look for spercific file
count = count + 1
' counts amount of pdf files, add 1 to the last known number
Filename = Dir()
' contiunes count until it reaches the end of the directory
Loop
Range("M5").Value = count
' puts final count value in cell
For Each Cell In [M:M]
If Cell.Value = "0" Then
Cell.ClearContents
ElseIf Range("M5").Value >= 1 Then
End If
Next Cell
End Sub
Then for the checked folder:-
Private Sub checked_loading()
Range("M6").Clear
Dim FolderPath As String, path As String, count As Integer
FolderPath = "C:\path to folder\"
path = FolderPath & "*.pdf"
Filename = Dir(path)
Do While Filename <> ""
count = count + 1
Filename = Dir()
Loop
Range("M6").Value = count
For Each Cell In [M:M]
If Cell.Value = "0" Then
Cell.ClearContents
ElseIf Range("M5").Value >= 1 Then
End If
Next Cell
End Sub
This works fine, though currently the formula I have tried to generate the percentage is as follows:-
=IF(M5=M6,"50%",IF(M5=0,"100%",IF(M6=0,"0%",SUM(M5*M6/100*1))))
This brings back incorrect results like 144.00% when the files to be check result is 9 and the files checked result is 16.
I would prefer to have the percentage calculation to be in vba so that end users could not accidentally delete the underlying formula.
Any help on this issue or if there is a more efficient code structure would be most appreciate.
Not to worry I have found a solution which works a treat. The above code now looks like this for the work to be checked:-
Private Sub pdf_loading()
Range("K5:L6").ClearContents
Range("M5").ClearContents
' Clear cell contents on open
Dim FolderPath As String, path As String, count As Integer
FolderPath = "C:\filepath\folder\"
' looks in spercific folder
path = FolderPath & "*.pdf"
' for file type this time it is pdf files, though if you change this is could be word files, or psd's
Filename = Dir(path)
Do While Filename <> ""
' checks for filename <less than or >greater than "filename" as "" is empty does not look for spercific file
count = count + 1
' counts amount of pdf files, add 1 to the last known number
Filename = Dir()
' contiunes count until it reaches the end of the directory
Loop
Range("M5").Value = count
' puts final count value in cell
End Sub
and the work checked folder is now like this:-
Private Sub checked_loading()
Range("M6").ClearContents
Dim FolderPath As String, path As String, count As Integer
FolderPath = "C:\filepath\folder\"
path = FolderPath & "*.pdf"
Filename = Dir(path)
Do While Filename <> ""
count = count + 1
Filename = Dir()
Loop
Range("M6").Value = count
Range("N5").Formula = "=Sum(M5,M6)"
Range("K5").Formula = "=SUM(M6/N5*1)"
' adds formulas to selected cells to give percentage
End Sub

How to extract file name from path?

How do I extract the filename myfile.pdf from C:\Documents\myfile.pdf in VBA?
The best way of working with files and directories in VBA for Office 2000/2003 is using the scripting library.
Create a filesystem object and do all operations using that.
Early binding:
Add a reference to Microsoft Scripting Runtime (Tools > References in the IDE).
Dim fso as new FileSystemObject
Dim fileName As String
fileName = fso.GetFileName("c:\any path\file.txt")
Late binding (see comments for more)
With CreateObject("Scripting.FileSystemObject")
fileName = .GetFileName(FilePath)
extName = .GetExtensionName(FilePath)
baseName = .GetBaseName(FilePath)
parentName = .GetParentFolderName(FilePath)
End With
The FileSystemObject is great. It offers a lot of features such as getting special folders (My documents, etc.), creating, moving, copying, deleting files and directories in an object oriented manner.
Dir("C:\Documents\myfile.pdf")
will return the file name, but only if it exists.
This is taken from snippets.dzone.com:
Function GetFilenameFromPath(ByVal strPath As String) As String
' Returns the rightmost characters of a string upto but not including the rightmost '\'
' e.g. 'c:\winnt\win.ini' returns 'win.ini'
If Right$(strPath, 1) <> "\" And Len(strPath) > 0 Then
GetFilenameFromPath = GetFilenameFromPath(Left$(strPath, Len(strPath) - 1)) + Right$(strPath, 1)
End If
End Function
I've read through all the answers and I'd like to add one more that I think wins out because of its simplicity. Unlike the accepted answer this does not require recursion. It also does not require referencing a FileSystemObject.
Function FileNameFromPath(strFullPath As String) As String
FileNameFromPath = Right(strFullPath, Len(strFullPath) - InStrRev(strFullPath, "\"))
End Function
http://vba-tutorial.com/parsing-a-file-string-into-path-filename-and-extension/ has this code plus other functions for parsing out the file path, extension and even the filename without the extension.
I can't believe how overcomplicated some of these answers are... (no offence!)
Here's a single-line function that will get the job done:
Function getFName(pf)As String:getFName=Mid(pf,InStrRev(pf,"\")+1):End Function
Function getPath(pf)As String:getPath=Left(pf,InStrRev(pf,"\")):End Function
Examples:
Dim sFilePath$, sFileName$
sFileName = Split(sFilePath, "\")(UBound(Split(sFilePath, "\")))
If you want a more robust solution that will give you both the full folder's path AND the filename, here it is:
Dim strFileName As String, strFolderPath As String
Dim lngIndex As Long
Dim strPath() As String
strPath() = Split(OpenArgs, "\") 'Put the Parts of our path into an array
lngIndex = UBound(strPath)
strFileName = strPath(lngIndex) 'Get the File Name from our array
strPath(lngIndex) = "" 'Remove the File Name from our array
strFolderPath = Join(strPath, "\") 'Rebuild our path from our array
Or as a sub/function:
Private Sub SeparatePathAndFile(ByRef io_strFolderPath As String, ByRef o_strFileName As String)
Dim strPath() As String
Dim lngIndex As Long
strPath() = Split(io_strFolderPath, "\") 'Put the Parts of our path into an array
lngIndex = UBound(strPath)
o_strFileName = strPath(lngIndex) 'Get the File Name from our array
strPath(lngIndex) = "" 'Remove the File Name from our array
io_strFolderPath = Join(strPath, "\") 'Rebuild our path from our array
End Sub
You pass the first parameter with the full path of the file and it will be set to the folder's path while the second parameter will be set to the file's name.
Here's a simple VBA solution I wrote that works with Windows, Unix, Mac, and URL paths.
sFileName = Mid(Mid(sPath, InStrRev(sPath, "/") + 1), InStrRev(sPath, "\") + 1)
sFolderName = Left(sPath, Len(sPath) - Len(sFileName))
You can test the output using this code:
'Visual Basic for Applications
http = "https://www.server.com/docs/Letter.txt"
unix = "/home/user/docs/Letter.txt"
dos = "C:\user\docs\Letter.txt"
win = "\\Server01\user\docs\Letter.txt"
blank = ""
sPath = unix
sFileName = Mid(Mid(sPath, InStrRev(sPath, "/") + 1), InStrRev(sPath, "\") + 1)
sFolderName = Left(sPath, Len(sPath) - Len(sFileName))
Debug.print "Folder: " & sFolderName & " File: " & sFileName
Also see: Wikipedia - Path (computing)
The simplest approach if you are sure the file physically exists on the disk:
Dim fileName, filePath As String
filePath = "C:\Documents\myfile.pdf"
fileName = Dir(filePath)
If you are not sure about existence of file or just want to extract filename from a given path then, simplest approach is:
fileName = Mid(filePath, InStrRev(filePath, "\") + 1)
To get the file name in an excel macro is:
filname = Mid(spth, InStrRev(spth, "\", Len(spth)) + 1, Len(spth))
MsgBox Mid(filname, 1, InStr(filname, ".") - 1)
Function file_name_only(file_path As String) As String
Dim temp As Variant
temp = Split(file_path, Application.PathSeparator)
file_name_only = temp(UBound(temp))
End Function
here you give your file name as input of the function
the split function of VBA splits the path in different portion by using "\" as path separator & stores them in an array named "temp"
the UBound() finds the max item number of array and finally assigns the result to "file_name_only" function
Hope this will be helpful.
Here's an alternative solution without code. This VBA works in the Excel Formula Bar:
To extract the file name:
=RIGHT(A1,LEN(A1)-FIND("~",SUBSTITUTE(A1,"\","~",LEN(A1)-LEN(SUBSTITUTE(A1,"\","")))))
To extract the file path:
=MID(A1,1,LEN(A1)-LEN(MID(A1,FIND(CHAR(1),SUBSTITUTE(A1,"\",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,LEN(A1))))
I am using this function...
VBA Function:
Function FunctionGetFileName(FullPath As String) As String
'Update 20140210
Dim splitList As Variant
splitList = VBA.Split(FullPath, "\")
FunctionGetFileName = splitList(UBound(splitList, 1))
End Function
Now enter
=FunctionGetFileName(A1) in youe required cell.
or You can use these...
=MID(A1,FIND("*",SUBSTITUTE(A1,"\","*",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,LEN(A1))
I needed the path, not the filename.
So to extract the file path in code:
JustPath = Left(sFileP, Len(sFileP) - Len(Split(sFileP, "\")(UBound(Split(sFileP, "\")))))
This gleaned from Twiggy # http://archive.atomicmpc.com.au and other places:
'since the file name and path were used several times in code
'variables were made public
Public FName As Variant, Filename As String, Path As String
Sub xxx()
...
If Not GetFileName = 1 Then Exit Sub '
...
End Sub
Private Function GetFileName()
GetFileName = 0 'used for error handling at call point in case user cancels
FName = Application.GetOpenFilename("Ramp log file (*.txt), *.txt")
If Not VarType(FName) = vbBoolean Then GetFileName = 1 'to assure selection was made
Filename = Split(FName, "\")(UBound(Split(FName, "\"))) 'results in file name
Path = Left(FName, InStrRev(FName, "\")) 'results in path
End Function
Dim nme As String = My.Computer.FileSystem.GetFileInfo(pathFicheiro).Name
Dim dirc As String = My.Computer.FileSystem.GetFileInfo(nomeFicheiro).Directory

Resources