Extracting a row from a CSV file quickly in Excel VBA

I have about 5000 .csv files and I want to search for one row in each file and extract it. I have pasted the key part of the code below. It works, but because I have to open and close each .csv file, the process is slow for 5000 files. Is there any way to read a csv file without opening it? I had considered writing a small script to convert each csv file to Excel first. Thanks.
SP_File_Name = Dir(DN_Path & "*.*")
Count = 1
Set START_CELL_RANGE = TARGET_SP_SHEET.Range("B3")
Set TICKER_CODE_RANGE = TARGET_SP_SHEET.Range("B1")
While (SP_File_Name <> "")
SP_Full_Path = DN_Path & SP_File_Name
Workbooks.OpenText Filename:=SP_Full_Path, DataType:=xlDelimited, comma:=True, Local:=True
Set INPUT_WORKBOOK = ActiveWorkbook
Set INPUT_SHEET = INPUT_WORKBOOK.Worksheets(1)
INPUT_SHEET.Range("$A$1").Select
Set INPUT_RANGE = ActiveCell.CurrentRegion
Set INPUT_FIRST_MATCH_RANGE = INPUT_RANGE.Find(TICKER_CODE_RANGE)
If INPUT_FIRST_MATCH_RANGE Is Nothing Then
GoTo NOT_FOUND
End If
START_CELL = START_CELL_RANGE.Address
TARGET_SP_SHEET.Range(START_CELL_RANGE.Address, START_CELL_RANGE.Offset(0, 6).Address).Value = INPUT_SHEET.Range(INPUT_FIRST_MATCH_RANGE.Address, INPUT_FIRST_MATCH_RANGE.Offset(0, 7).Address).Value
' write diagnostics
Sheet5.Range("K" & Count + 4).Value = START_CELL
Sheet5.Range("L" & Count + 4).Value = "$A$1"
Sheet5.Range("M" & Count + 4).Value = INPUT_FIRST_MATCH_RANGE.Address
Sheet5.Range("N" & Count + 4).Value = INPUT_FIRST_MATCH_RANGE.Offset(0, 7).Address
NOT_FOUND:
Set START_CELL_RANGE = START_CELL_RANGE.Offset(1, 0)
Workbooks(SP_File_Name).Close SaveChanges:=False
SP_File_Name = Dir
Count = Count + 1
Wend

To call a cmd command from VBA, I have used WshShell. For early binding I set a reference to the Windows Script Host Object Model
One problem with the Shell function is that it runs asynchronously. By using the WshShell Run method, you can have it wait until finished before executing subsequent commands.
Sample code might look as follows:
Option Explicit
Sub foo()
Dim WSH As WshShell
Dim lErrCode As Long
Set WSH = New WshShell
lErrCode = WSH.Run("cmd /c findstr /C:""Power"" ""C:\Users\Ron\filelist.txt"" > ""C:\Users\Ron\Results2.txt""", 1, True)
If lErrCode <> 0 Then
MsgBox "Error Code: " & lErrCode
Stop
End If
Set WSH = Nothing
End Sub
With regard to your command that you showed in your comment, I would ensure that VBA is interpreting the string correctly for the cmd prompt. Looking at your code line, I would wonder whether you are missing a space between the search string and the file path.
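As a rough sketch of how the same approach could cover the whole folder in one pass: findstr accepts a wildcard file specification, so a single call can search every .csv file and write the matching lines to one results file. The ticker "ABC", the folder "C:\Data\Csv" and the results path are hypothetical placeholders, and late binding is used here instead of the early-bound WshShell above, so no reference is needed. Building the command in a separate function makes it easy to print the string and check the quoting and spacing.

```vba
Option Explicit

' Build the findstr command line; kept separate so the string can be inspected.
Function BuildFindstrCommand(ByVal SearchText As String, _
                             ByVal FolderPath As String, _
                             ByVal ResultFile As String) As String
    Dim q As String
    q = Chr$(34)   ' double-quote character
    If Right$(FolderPath, 1) <> "\" Then FolderPath = FolderPath & "\"
    BuildFindstrCommand = "cmd /c findstr /C:" & q & SearchText & q & " " & _
                          q & FolderPath & "*.csv" & q & " > " & q & ResultFile & q
End Function

Sub FindTickerInAllCsv()
    Dim WSH As Object
    Dim lErrCode As Long
    Dim sCmd As String

    ' Hypothetical ticker and paths; replace with your own values.
    sCmd = BuildFindstrCommand("ABC", "C:\Data\Csv", "C:\Data\Results.txt")
    Debug.Print sCmd   ' check the quoting before running

    Set WSH = CreateObject("WScript.Shell")   ' late binding
    lErrCode = WSH.Run(sCmd, 0, True)          ' 0 = hidden window, True = wait until finished

    ' findstr returns 0 when a match was found and 1 when there was none.
    If lErrCode > 1 Then MsgBox "Error Code: " & lErrCode
    Set WSH = Nothing
End Sub
```

When findstr searches more than one file, each output line is prefixed with the name of the file it came from, so the results file can be opened once and split on the first colon after the file name.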

I don't think you can read the contents of a file without opening it. Why not just merge all 5000 files into a single file and read that into Excel? That will certainly be much faster. Open a Command Prompt window, change to the folder that contains all 5000 files, and enter this:
copy *.csv merge.csv
See the link below for an example.
http://analystcave.com/merge-csv-files-or-txt-files-in-a-folder/

Related

Excel 2013 (32 bit) VBA Scripting.FileSystemObject.DIR not listing all files

I have created an excel application that listed all files in a selected directory (in Excel 2013 32 bit). Following is the script
Const path_col = 1
Const PDF_File_Col = 2
Sub input_file(zipFile As String)
Dim source As String
Dim FileCount As Integer
Dim FileName As String
Dim fso
Dim currentPDF As String
Dim currentTXT As String
Dim currentrow As Long
Dim first_Date_Created As String
Dim Cur_Date_Created As String
Dim CurSheet As Worksheet
Set CurSheet = ActiveSheet
source = Replace(zipFile, ".zip", "\")
FileCount = 0
currentrow = Sheets("List").Cells(4, 1).Value
FileName = Dir(source, vbReadOnly)
Set fso = CreateObject("Scripting.FileSystemObject")
Application.ScreenUpdating = False
Do While FileName <> ""
If Right(FileName, 3) = "pdf" Then
Cur_Date_Created = Format(fso.getfile(source & FileName).datecreated, "yyyy-Mmm-dd")
currentPDF = FileName
With Sheets("List")
.Cells(currentrow, path_col).Value = source
.Cells(currentrow, PDF_File_Col).Value = currentPDF
End With
' read_file source & currentTXT, currentrow
currentrow = currentrow + 1
End If
FileName = Dir()
Loop
Application.ScreenUpdating = True
End Sub
Issues
The script had been running daily for a few years under Windows 7 without any issue until we moved to Windows 10 a few months ago. Since moving to Windows 10, we have found that it intermittently fails to list all the pdf files (i.e. it somehow stops in the middle) without any error message being shown. (Note: we have not dismissed any error message before running this subroutine.)
The PDF files inside the folder are named with consecutive numbers and always end with ".pdf" (always in lower case). For example, if the folder has 1200 PDF files, they will be arranged as "PDF0001.pdf", "PDF0002.pdf" - "PDF1200.pdf". However, for some reason the subroutine may stop running at "PDF0900.pdf" and the rest ("PDF0901.pdf" - "PDF1200.pdf") will be missing from the list. However, it may work fine if we simply rerun the subroutine.
Note
The user selects the zip file. However, it has already been unzipped before this subroutine is entered.
The folder only has pdf files, but it usually has more than 1000 to 5xxx of them, and the total folder size can be up to 500MB.
Could you please shed some light on what I should do about this problem?
Thanks in advance!
The folder does not necessarily contain only pdf files, since the folder path is derived from the zip file name, but this is not an issue because the files returned by Dir are filtered by extension.
Make sure the first Dir call comes after source = Replace(zipFile, ".zip", "\"). Only then is the folder path correct.
Try changing it to FileName = Dir(source & "*.pdf"). That way it returns only the pdf files in the directory, and you can comment out the line If Right(FileName, 3) = "pdf" Then (together with its matching End If), since it is no longer necessary.
Dir does not belong to FileSystemObject; it is a built-in VBA function.
I can only guess, but I suspect Dir is not the culprit: it correctly retrieves all files until the code crashes. Try using DoEvents after the line FileName = Dir(). And maybe after, too...
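Put together, the suggestions above might look like the following minimal sketch. It assumes the same "List" sheet and start-row cell as the question, and the sub name ListPdfFiles is made up for illustration. Whether DoEvents actually cures the intermittent stop is only a guess, as noted above.

```vba
Option Explicit

Sub ListPdfFiles(ByVal source As String)
    Dim FileName As String
    Dim currentrow As Long

    ' Make sure the folder path ends with a backslash before calling Dir.
    If Right$(source, 1) <> "\" Then source = source & "\"

    currentrow = Sheets("List").Cells(4, 1).Value

    ' Let Dir do the filtering, so no extension check is needed in the loop.
    FileName = Dir(source & "*.pdf")
    Do While FileName <> ""
        With Sheets("List")
            .Cells(currentrow, 1).Value = source     ' path column
            .Cells(currentrow, 2).Value = FileName   ' PDF file column
        End With
        currentrow = currentrow + 1
        FileName = Dir()
        DoEvents   ' yield to the operating system between files
    Loop
End Sub
```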

Open ZipFile, Look for Specific File Type And Save File Name

So I posted a question here:
VBA - Find Specific Sub Folders by Name Identifiers
This question was very broad, but I was facing specific issues I needed help identifying and resolving. Now, I managed to resolve those issues in the original post, however, there is still a good portion of the question unanswered and I would like to close the question only when I am able to post the full result.
Currently, what I still need to do is the last 4 steps:
Open ZipFile
Look for .png extension
Grab the name of the .png file
Put the name in a cell in excel
The issue I am facing is properly opening the zip file. I have been through so many posts on this, but NOTHING seems to work for me.
The closest I have come to accomplishing the task is what I found here:
https://www.ozgrid.com/forum/forum/help-forums/excel-general/109333-how-to-count-number-of-items-in-zip-file-with-vba-2007
I figure, if at the very least, I am able to enter the zip file, I can then work from there. But alas, I am still stuck at simply trying to open the file.
Here is the code I have (Using from the link above):
Sub CountZipContents()
Dim zCount As Double, CountContents As Double
Dim sh As Object, fld As Object, n As Object
Dim FSO As Object
CountContents = 0
zCount = 0
x = "C:\Users\UserName\Desktop\Today\MyFolder\"
Set FSO = CreateObject("Scripting.FileSystemObject")
If FSO.FolderExists(x) Then
For Each FileInFolder In FSO.GetFolder(x).Files
If Right(FileInFolder.Name, 4) = ".png" Then
CountContents = CountContents + 1
ElseIf Right(FileInFolder.Name, 4) = ".Zip" Then
Set sh = CreateObject("Shell.Application")
Set ZipFile = sh.Namespace(CVar(x & "\" & FileInFolder.Name))
Debug.Print FileInFolder.Name
For Each fileInZip In ZipFile.Items
If LCase(fileInZip) Like LCase("*.png") Then
CountContents = CountContents + 1
End If
Next
End If
Next FileInFolder
End If
Set sh = Nothing
End Sub
The issue I get is on this line:
For Each fileInZip In ZipFile.Items
Error Message:
Object variable or With block not set
Whenever I tried to use Shell, like below:
Dim oShell As New Shell
I get this error:
User-defined type not defined
With the below:
Link https://msdn.microsoft.com/en-us/library/windows/desktop/bb776890(v=vs.85).aspx
Dim oApp As Object
Set oApp = CreateObject("WScript.Shell")
'get a shell object
Set oApp = CreateObject("Shell.Application")
If oApp.Namespace(ZipFile).Items.count > 0 Then
I get this error:
Object doesn't support this property or method
On this line:
If oApp.Namespace(ZipFile).Items.count > 0 Then
References to links I have tried:
https://wellsr.com/vba/2015/tutorials/open-and-close-file-with-VBA-Shell/
http://www.vbaexpress.com/forum/showthread.php?38616-quot-shell-quot-not-work-in-Excel
Excel VBA - read .txt from .zip files
I just don't understand why this step is taking so much time to complete.
Your main problem is a really simple one: your path "C:\Users\UserName\Desktop\Today\MyFolder\" already contains a trailing backslash, and when you set your ZipFile variable, you add another one between the path and the filename. This causes the Shell call to fail, so ZipFile is Nothing.
There are some minor problems with the code. I would recommend using the GetExtensionName method of your FileSystemObject to get the extension and converting it to lowercase so that you catch all files, no matter whether they are .PNG, .png or .Png.
For Each FileInFolder In FSO.GetFolder(x).Files
Dim fileExt As String
fileExt = LCase(FSO.GetExtensionName(FileInFolder.Name))
If fileExt = "png" Then
CountContents = CountContents + 1
Debug.Print "unzipped " & FileInFolder.Name
ElseIf fileExt = "zip" Then
Dim ZipFileName As String, ZipFile, fileInZip
Set sh = CreateObject("Shell.Application")
ZipFileName = x & FileInFolder.Name
Set ZipFile = sh.Namespace(CVar(ZipFileName))
For Each fileInZip In ZipFile.Items
If LCase(FSO.GetExtensionName(fileInZip)) = "png" Then
CountContents = CountContents + 1
Debug.Print "zipped in " & FileInFolder.Name & ": " & fileInZip
End If
Next
End If
Next FileInFolder
Additionally, I strongly advise using Option Explicit and declaring all your variables, and splitting commands into smaller pieces. This costs you only a few seconds of typing the extra lines but helps you when debugging your code:
' Instead of
' Set ZipFile = sh.Namespace(CVar(x & "\" & FileInFolder.Name))
' write
Dim fName As String
fName = x & "\" & FileInFolder.Name ' Now you can check fName and see the problem.
Set ZipFile = sh.Namespace(CVar(fName))
Try this:
Option Explicit
' Just to test CheckZipFolder
Sub TestZip()
Dim sZipFold As String: sZipFold = "C:\Temp\MyZip.zip" ' Change this to the path to your zip file
CheckZipFolder sZipFold
End Sub
Sub CheckZipFolder(ByVal sZipFold As String)
Dim oSh As New Shell ' For this, you need to add reference to 'Microsoft Shell Controls and Automation'
Dim oFi As Object
' Loop through all files in the folder
For Each oFi In oSh.Namespace(sZipFold).Items
' Checking for file type (excel file in this case)
If oFi.Type = "Microsoft Excel Worksheet" Then
MsgBox oFi.Name
'..... Add your actions here
End If
' This will make the UDF recursive. Remove this code if not needed
If oFi.IsFolder Then
CheckZipFolder oFi.Path
End If
Next
' Clear object
Set oSh = Nothing
End Sub

Using function to open and update values in external workbooks, but returning source errors

I've been using a function from another StackOverflow question (I'm SO sorry I can't find the original answer!) to go through a number of cells in Column L that contain a formula that spits out a hyperlinked filepath. It is meant to open each workbook, update the values, then save and close the workbook before opening the next one. See below.
Sub List_UpdateAndSave()
Dim lr As Long
Dim i As Integer
Dim WBSsource As Workbook
Dim FileNames As Variant
Dim msg As String
' Update the individual credit models
With ThisWorkbook.Sheets("List")
lr = .Cells(.Rows.Count, "L").End(xlUp).Row
FileNames = .Range("L2:L" & lr).Value
End With
For i = LBound(FileNames, 1) To UBound(FileNames, 1)
On Error Resume Next
If FileNames(i, 1) Like "*.xls*" Then
Set WBSsource = Workbooks.Open(FileNames(i, 1), _
ReadOnly:=False, _
Password:="", _
UpdateLinks:=3)
If Err = 0 Then
With WBSsource
'do stuff here
.Save
.Close True
End With
Else
msg = msg & FileNames(i, 1) & Chr(10)
On Error GoTo 0
End If
End If
Set WBSsource = Nothing
Next i
If Len(msg) > 0 Then
MsgBox "The Following Files Could Not Be Opened" & _
Chr(10) & msg, 48, "Error"
End If
End Sub
The problem is that I am now using this on a network drive, and as a result it causes path issues with the Connections/Edit Links part. Each of the files is stored on S:\..., and when a file is opened via the Hyperlink formula, it cannot find its source data. The example below is from a file that was opened through a hyperlink cell in my original workbook. When I go to the Edit Links section of it, it shows these errors.
If I open that lettered drive in Windows Explorer and find the file, it works with no problems: Open > Update Values > Save > Close. It says unknown...
(but if I click Update values here they update correctly.)
If opened using a Hyperlink formula in a cell (also pointing to S:\..), it says it contains links that cannot be updated. I choose to edit links and they all show "Error: Source not found". The location on them also starts with \\corp\... and not S:\.
Is there any way to fix this? Apologies for the long-winded question.
I'm adding this as an answer as it contains code and is a bit long for a comment.
I'm not sure if it's what you're after though.
The code will take the mapped drive and return the network drive, or vice versa, for Excel files. DriveMap is the variable containing the final string - you may want to adapt it into a function.
Sub UpdatePath()
Dim oFSO As Object
Dim oDrv As Object
Dim FileName As String
Dim DriveMap As String
Set oFSO = CreateObject("Scripting.FileSystemObject")
FileName = Range("A1")
If InStr(oFSO.GetExtensionName(FileName), "xls") > 0 Then
For Each oDrv In oFSO.drives
If oDrv.sharename <> "" Then
'Changes \\corp\.... to S:\
If InStr(FileName, oDrv.sharename) = 1 Then
DriveMap = Replace(FileName, oDrv.sharename, oDrv.Path)
End If
'Changes S:\ to \\corp\....
' If InStr(FileName, oDrv.Path) = 1 Then
' DriveMap = Replace(FileName, oDrv.Path, oDrv.sharename)
' End If
End If
Next oDrv
End If
End Sub
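One possible way to adapt the routine above into a function is sketched below. The function name UncToMappedPath is made up for illustration, and the sketch only covers the \\corp\... to S:\ direction. It returns the input unchanged when no mapped drive matches, and it replaces only the leading share name rather than every occurrence of it.

```vba
Option Explicit

' Returns FileName with a leading UNC share replaced by its mapped drive letter,
' or FileName unchanged if no mapped network drive matches.
Function UncToMappedPath(ByVal FileName As String) As String
    Dim oFSO As Object
    Dim oDrv As Object
    Dim sShare As String

    UncToMappedPath = FileName
    Set oFSO = CreateObject("Scripting.FileSystemObject")

    For Each oDrv In oFSO.Drives
        If oDrv.DriveType = 3 Then              ' 3 = network (remote) drive
            sShare = oDrv.ShareName
            If Len(sShare) > 0 Then
                ' Compare only the start of the path, ignoring case.
                If StrComp(Left$(FileName, Len(sShare)), sShare, vbTextCompare) = 0 Then
                    UncToMappedPath = oDrv.Path & Mid$(FileName, Len(sShare) + 1)
                    Exit Function
                End If
            End If
        End If
    Next oDrv
End Function
```

It could then be called as, for example, Debug.Print UncToMappedPath(Range("A1").Value) before passing the result to Workbooks.Open.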

Executing Excel macro on/from specific open file

I need to open some Excel files, "pause", and then close them. In this process I run one macro on opening and another on closing. The opening one works fine because it is run as each file is opened. But in the closing part of the code I can't get it to run the correct macro. The macros have the same names, but the file contents are different, and what the macro does in each file is different.
This is the gist of what I'm doing now
Set xlApp = CreateObject("Excel.Application")
xlApp.Visible = True
path = "\\Gaalpa1cdfile19\north_sa_staff\Reports\Rpt-ProductionCurves\"
filename2018P1 = "2018 P1.xlsm"
Set xlbook2018P1 = xlApp.WorkBooks.Open(path & filename2018P1)
' Run Macro
xlApp.Run "AutoRefresh"
filename2018P3 = "P3 2018 HRR.xlsm"
Set xlbook2018P3 = xlApp.WorkBooks.Open(path & filename2018P3)
'Run Macro
xlApp.Run "AutoRefresh"
'My "pause"
WScript.Echo ("All Files were" & Chr(013) & _
"opened and refreshed, update ppt before OK" & Chr(013) & _
" DO NOT CLICK OK" & Chr(013))
'==========================
'Below is the trouble spot.
'==========================
xlapp.Run "'" & filename2018P1 & "'" & "!AutoPublish"
xlbook2018P1.Close False
Set xlbook2018P1 = Nothing
xlapp.run "'" & filename2018P3 & "'" & "!AutoPublish"
xlbook2018P3.Close False
Set xlbook2018P3 = Nothing
The first part works fine, but trying to run the file's respective AutoPublish macro does not. The code works fine if I leave out that Run line. (The real file names have spaces and I had to add the single quotes to get it to accept the filename.)
What it appears to be doing is using the macros from the last file opened, not the one it's directed to use in the Run line. I think I need a way to "select" the correct file, or give it focus so the macro can run without an explicit filename argument, which it appears to be ignoring anyway.
EDIT:
Solution was:
xlbook2018P1.Activate ' This fixed it, I think
xlapp.Run "'" & filename2018P1 & "'" & "!AutoPublish"
xlbook2018P1.Close False
Set xlbook2018P1 = Nothing
xlbook2018P3.Activate
xlapp.run "'" & filename2018P3 & "'" & "!AutoPublish"
xlbook2018P3.Close False
Set xlbook2018P3 = Nothing
When tackling similar tasks, I usually work around this by creating a master Excel file first and calling a sub in this master file via VBS. The advantage, as I see it, is that it is much easier to fulfill all the tasks in the VBA of the master file than to code all of that in VBS.
Create a master file, e.g. "Master.xlsm", list all your files you need to open on a sheet named "Files" in column A, starting in row 1.
Insert a module and place the following sub in this module:
Sub Main()
Dim strPath As String
Dim strFile As String
Dim lRow As Long
Dim i As Long
Dim k As Integer
Dim n As Long
Dim wb(1 To 3) As Workbook
Dim wbMaster As Workbook
Set wbMaster = ThisWorkbook
strPath = "\\Gaalpa1cdfile19\north_sa_staff\Reports\Rpt-ProductionCurves\"
'Check how many files you need to open
With Sheets("Files")
lRow = Sheets("Files").Range("A" & .Rows.Count).End(xlUp).Row
End With
'open all available files
For i = 1 To lRow
Workbooks.Open (wbMaster.Sheets("Files").Range("A" & i).Value)
Next
'now run the two macros in each open file
For k = 2 To Workbooks.Count 'this will work only if your master file is the only one open when starting the sub!
Application.Run "'" & Workbooks(k).Name & "'!AutoRefresh" 'Run is a method of Application, not of Workbook
DoEvents
Application.Run "'" & Workbooks(k).Name & "'!AutoPublish"
DoEvents
Next
'and close all files previously opened except for the master file
For n = Workbooks.Count To 2 Step -1
Workbooks(n).Close False
Next
End Sub
It seems like a possible explanation for what you're seeing is that your AutoPublish macro refers to ActiveWorkbook and not the safer ThisWorkbook. If another workbook is active when it's called that could lead to unexpected results.
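To illustrate the difference, here is a hypothetical AutoPublish body. The real macro is not shown in the question, so the recalculation step is only a stand-in for whatever it actually does.

```vba
Option Explicit

' Hypothetical AutoPublish; the actual macro body is not part of the question.
Sub AutoPublish()
    ' ThisWorkbook is always the workbook that contains this code,
    ' whichever workbook happens to be active when the macro is called.
    With ThisWorkbook
        Debug.Print "Publishing from: " & .Name
        .Worksheets(1).Calculate   ' stand-in for the real publishing work
    End With

    ' ActiveWorkbook is whichever workbook currently has focus, which may be
    ' the last file opened rather than the one named in the Run call.
    Debug.Print "Active workbook: " & ActiveWorkbook.Name
End Sub
```

If the two names printed differ when the macro is run from the script, any code in the macro that relies on ActiveWorkbook (or on unqualified Range and Sheets references) is operating on the wrong file, which would also explain why calling Activate first fixed it.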

Pull Document Properties for Directory of DOC files into Excel Sheet

I have a directory full of MS Word .Doc files. I need to generate a list of those files with the page count for each, e.g., "File 1- 50 words, File 2 -100 words" etc. It seems like it'd be easiest to do this in Excel (file name in column A, page count in column B), though I'm not totally committed to that.
Frustratingly, I can view this in Windows Explorer by just adding the "Pages" field, so I know the information is there, but I can't print or otherwise work with it. I can generate a list of files to import into Excel using a command prompt Dir command, but I can't figure out a way to get that list to include page counts.
Does anyone have any ideas?
Update 2 (deleted 1 because I realized I was being an idiot):
I'm trying to execute Noodle's script from a VBS file, but I'm getting a "Subscript out of range" error on line 6. I have not made any changes to what's posted in the reply (I did initially, but they caused different problems), and I can't figure out where the error is coming from. Suggestions?
This code dumps all shell properties for a folder to a file that you can import into Excel with Import Data. It's VBScript, and VBScript is legal VBA, so you can optimise it for speed in normal VBA programming style or use it as is.
Ag(0) is the command line parameter, i.e. the folder to be processed. You can't use this in VBA; replace it with the equivalent method in the host's (Excel/Word) application object.
Set objShell = CreateObject("Shell.Application")
Set Ag=Wscript.Arguments
set WshShell = WScript.CreateObject("WScript.Shell")
'Set Fldr=objShell.NameSpace(32)
Set Fldr=objShell.NameSpace(Ag(0))
Set FldrItems=Fldr.Items
Set fso = CreateObject("Scripting.FileSystemObject")
Set DeskFldr=objShell.Namespace(16)
FName=fso.buildpath(DeskFldr.self.path, "Folder Property List.txt")
Set ts = fso.OpenTextFile(FName, 8, true)
For x = 0 to 100
t1 = t1 & Fldr.GetDetailsOf(vbnull, x) & vbtab
Next
ts.write FLDR.self.path & vbcrlf
ts.Write T1 & vbcrlf
T1=""
For Each FldrItem in FldrItems
For x = 0 to 100
t1 = t1 & Fldr.GetDetailsOf(FldrItem, x) & vbtab
Next
t1=t1 & vbcrlf
ts.Write T1
T1=""
Next
msgbox FName & " has a tab delimited list of all properties"
Note this program has different outputs depending on which version of Windows it is run on, and every version is different. My Vista does not have a page field.
Microsoft has a DSOfile.dll which helps with this. It lets you access properties from closed office documents. You download it, install it by running the .exe file, and then set a reference to it in Tools/References in the VBE.
If this were to be used generally, you probably need to add a bunch of error checking to make sure inputs are valid.
Also, there may be some issues if you try to use it with 64-bit versions of Office. I have Office 2007 32 bit running on Windows 7 Professional x64.
Option Explicit
'Set Reference to Microsoft Scripting Runtime
'Set Reference to DSO OLE Document Properties Reader 2.1
Sub GetWordCountsFromDocs()
Const PathName As String = "c:\users\ron\documents\"
Dim FSO As FileSystemObject
Dim FO As Folder
Dim FIs As Files, FI As File
Dim lWC As Long
Dim ColWC As Collection
Dim V()
Dim I As Long
Set ColWC = New Collection
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(PathName)
Set FIs = FO.Files
ReDim V(1 To 2)
For Each FI In FIs
If FI.Name Like "*.doc*" Then
lWC = GetWordCount(PathName & FI.Name)
V(1) = FI.Name
V(2) = lWC
ColWC.Add V
End If
Next FI
ReDim V(0 To ColWC.Count, 1 To 2)
V(0, 1) = "File Name"
V(0, 2) = "Word Count"
For I = 1 To ColWC.Count
V(I, 1) = ColWC(I)(1)
V(I, 2) = ColWC(I)(2)
Next I
ActiveSheet.Cells.Clear
Range("a1").Resize(UBound(V, 1) + 1, UBound(V, 2)) = V
End Sub
'-------------------------------------------------
Private Function GetWordCount(FilePath As String) As Long
Dim DSO As DSOFile.OleDocumentProperties
Dim Prop As Office.DocumentProperty
Dim V As Variant
Set DSO = New DSOFile.OleDocumentProperties
DSO.Open sFileName:=FilePath, ReadOnly:=True
GetWordCount = CallByName(DSO.SummaryProperties, "wordcount", VbGet)
End Function

Resources