Search using VBScript and populate Options dynamically

Objective
To search through the lines of a text file and, whenever a line matches, add that line to an Options list that is displayed in an HTA.
E.g. if 'Setup' is found in 5 of 10 lines, all 5 lines need to be added as 'Options'.
Code
Set objFSO = CreateObject("Scripting.Filesystemobject")
Set objRegEx = New RegExp
With objRegEx
.Pattern = "(\b" & "setup" & "\b)"
.IgnoreCase = True
.Global = True
End With
Set objOpen = objFSO.OpenTextFile ("FileList.lst", 1)
Contents = objOpen.ReadAll
Set objMatchAll = objRegEx.Execute( Contents )
If objMatchAll.count > 0 Then
Set objOpen = objFSO.OpenTextFile ("FileList.lst", 1)
Do Until objOpen.AtEndOfStream
Line = objOpen.ReadLine
Set objMatchAny = objRegEx.Execute( Line )
If objMatchAny.count > 0 Then
Set objOption = Document.createElement("OPTION")
objOption.Text = Line
objOption.Value = Line
ValuesList.add objOption
'Matched = Matched & vbNewLine & Line
MatchCount = MatchCount + 1
End If
Loop
Else
MsgBox "No results"
End If
Explanation
The code looks for the term 'setup' (in practice this is populated dynamically at run time) in the file 'FileList.lst'. When results are found, an 'Option' object is generated and added to the 'ValuesList' list, which is defined in the HTML body using tags.
Note 1: The reason I generate an 'Option' object instead of just loading the line is so that we can populate the tag. The tag is used so we can select any one of the search results.
Note 2: The 'Contents' variable exists so that, if there are no matches at all, the script does not need to go through each line to find a match, which would take longer just to display that message.
Problem
The code works fine (tested with up to 150 results), but when there is a large number of matches my HTA freezes.
Question
Can the existing code be modified to perform better, for example by using a different method than creating an 'Option' object for each entry of 'ValuesList'?
Instead of running two objRegEx searches, is there a way to return the matched lines from the 'Contents' variable?
Update
OK, I ran my script without the objOption part, that is, without creating and adding options to my ValuesList. With only the regexp parsing 58k lines, and producing 58k matches, it took 3 seconds. So it looks like I need an alternative way to populate my HTA options list, since it cannot handle that many options to select from. Are there any alternatives? I used the same logic in a browser and the entire browser froze.

It seems like you really only care about whether or not the regex matches in a particular line or not. Since you don't need to know how many matches occurred, nor do you need the actual match text, you can use the Test method instead. This should be faster because it will stop after the first match, plus it doesn't have to construct the Matches collection. I'd also leave the Global property at its default value of False for pretty much the same reason, but if you're just using the Test method, I don't think the Global property matters.
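A minimal sketch of the Test-based approach described above. The helper names BuildWordRegex and CountMatchingLines are hypothetical, and the code assumes the search term contains no regex metacharacters. It avoids typed declarations so it should run unchanged as VBScript inside an HTA.

```vba
' Builds the regex once so it can be reused for every line.
' Assumption: searchTerm has no regex metacharacters that need escaping.
Function BuildWordRegex(searchTerm)
    Dim re
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\b" & searchTerm & "\b"
    re.IgnoreCase = True
    ' Global stays at its default (False); Test only needs the first hit.
    Set BuildWordRegex = re
End Function

' Counts the lines that match, using Test instead of Execute so that
' no Matches collection has to be built for each line.
Function CountMatchingLines(re, lines)
    Dim n, i
    n = 0
    For i = LBound(lines) To UBound(lines)
        If re.Test(lines(i)) Then n = n + 1
    Next
    CountMatchingLines = n
End Function
```

In the original loop, the same idea amounts to replacing the Execute call and the Count check with a single `If objRegEx.Test(Line) Then`.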

Thanks to Cheran Shunmugavel, I found out that the best way is to use DocumentFragments. I implemented that concept in my code and the results were great!
New Code
Set objFSO = CreateObject("Scripting.Filesystemobject")
Set objRegEx = New RegExp
With objRegEx
.Pattern = "(\b" & "setup" & "\b)"
.IgnoreCase = True
.Global = True
End With
Set objFragment = Document.createDocumentFragment()
Set objOpen = objFSO.OpenTextFile ("FileList.lst", 1)
Contents = objOpen.ReadAll
Set objMatchAll = objRegEx.Execute( Contents )
If objMatchAll.count > 0 Then
Set objOpen = objFSO.OpenTextFile ("FileList.lst", 1)
Do Until objOpen.AtEndOfStream
Line = objOpen.ReadLine
Set objMatchAny = objRegEx.Execute( Line )
If objMatchAny.count > 0 Then
Set objOption = Document.createElement("OPTION")
objOption.innerHTML = Line
objFragment.appendchild objOption
MatchCount = MatchCount + 1
End If
Loop
ViewList.appendChild objFragment.cloneNode(True)
Else
MsgBox "No results"
End If
Old Code : 53mins 23secs
New Code : 31secs
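On the second question (returning the matched lines from 'Contents' without a second pass over the file), one possible approach is a multiline pattern that captures each whole line containing the term. This is only a sketch: MatchingLines is a hypothetical helper, the search term is assumed to contain no regex metacharacters, and it has not been timed against the 58k-line file.

```vba
' Returns a MatchCollection in which every Match.Value is one complete
' line of "contents" that contains searchTerm as a whole word.
Function MatchingLines(contents, searchTerm)
    Dim re
    Set re = CreateObject("VBScript.RegExp")
    ' [^\r\n]* keeps each match on a single line and excludes the line break.
    re.Pattern = "^[^\r\n]*\b" & searchTerm & "\b[^\r\n]*"
    re.IgnoreCase = True
    re.Global = True      ' find every matching line, not just the first
    re.MultiLine = True   ' let ^ match at the start of each line
    Set MatchingLines = re.Execute(contents)
End Function
```

Each item of the returned collection could then be turned into an OPTION element and appended to the document fragment, so the file only has to be read once with ReadAll.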

Related

How to loop through XML-nodes and validate if values exists?

I have through an API fetched my data as an XML, and I wish to cycle through nodes (there are several of the same type) and add them to certain fields/a table.
Example from the XML-file:
<HistRating
xmlns="">
<EndrAr>2020</EndrAr>
<EndrMnd>7</EndrMnd>
<Rating>A</Rating>
</HistRating>
<HistRating
xmlns="">
<EndrAr>2019</EndrAr>
<EndrMnd>6</EndrMnd>
<Rating>A</Rating>
</HistRating>
I have tried the following (at this point the XML I need is loaded in xmlDoc, which was created with CreateObject("MSXML2.DOMDocument.6.0")). I am fully aware that this is not an elegant way to write it, but I'm new to this:
Set nodeXML = xmlDoc.getElementsByTagName("EndrAr")
Range("G1").Value = nodeXML(1).Text
Range("H1").Value = nodeXML(2).Text
Range("I1").Value = nodeXML(3).Text
Set nodeXML = xmlDoc.getElementsByTagName("EndrMnd")
Range("G2").Value = nodeXML(1).Text
Range("H2").Value = nodeXML(2).Text
Range("I2").Value = nodeXML(3).Text
Set nodeXML = xmlDoc.getElementsByTagName("Rating")
Range("G3").Value = nodeXML(1).Text
Range("H3").Value = nodeXML(2).Text
Range("I3").Value = nodeXML(3).Text
This works great as long as all three items are there. Unfortunately that is not guaranteed. For a new company, for example, item (3) won't exist (there is one entry per year above), and I would like to set the cell to blank, "No value" or something similar.
The result from when I run the above code:
But if I try to add a line 4 to test what happens if value does not exists I get the following (for obvious reasons)
What I would love some help with is:
Can I by some "magic" add an "if missing" check (I tried, but could not get it to work)?
Are there other ways to say "if the value is not found, put the following into the cell"?
Or is there a completely different way I should have solved this?
This is to add accounting data from the last X available years (where X is e.g. 4, or fewer if 4 are not available) from 30 nodes.
You could use an error-trapping function. Note that in the code below we choose not to use the returned Boolean.
Dim myText As String
.
.
TryReadingXmlNode nodeXML, 1, myText
Range("G1").Value = myText
.
.
Public Function TryReadingXmlNode(ByVal ipNode As Object, ByVal ipIndex As Long, ByRef opText As String) As Boolean
    On Error Resume Next
    opText = ipNode.Item(ipIndex).Text
    If Err.Number <> 0 Then
        'The node does not exist, so hand back a placeholder instead
        opText = "NoValue"
        TryReadingXmlNode = False
    Else
        TryReadingXmlNode = Len(opText) > 0
    End If
    On Error GoTo 0
End Function
Start by querying all of the HistRating elements, then loop over that collection:
Const MAX_YEARS As Long = 4
Dim ratings, rating, c As Range, i as Long
Set c= Range("A1")
Set ratings = xmlDoc.getElementsByTagName("HistRating")
For Each rating in ratings
c.offset(0, i) = rating.getElementsByTagName("EndrAr")(0).Text
c.offset(1, i) = rating.getElementsByTagName("EndrMnd")(0).Text
c.offset(2, i) = rating.getElementsByTagName("Rating")(0).Text
i = i + 1
If i >= MAX_YEARS Then Exit For 'exit if processed enough nodes
Next rating
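The loop above still assumes that every HistRating element contains all three child elements. A small helper along the following lines could cover a missing child. This is a sketch under assumptions: ChildTextOrDefault is a hypothetical name, and it relies on MSXML's selectSingleNode returning Nothing when no child with that tag exists.

```vba
' Returns the text of the first child element named tagName,
' or defaultText when the parent has no such child.
Function ChildTextOrDefault(ByVal parentNode As Object, ByVal tagName As String, _
                            Optional ByVal defaultText As String = "") As String
    Dim child As Object
    Set child = parentNode.selectSingleNode(tagName)
    If child Is Nothing Then
        ChildTextOrDefault = defaultText
    Else
        ChildTextOrDefault = child.Text
    End If
End Function
```

Inside the loop it would be used as, for example, `c.Offset(0, i).Value = ChildTextOrDefault(rating, "EndrAr", "No value")`.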

Extract file names from a File Explorer search into Excel

This has been bugging me for a while, as I feel I have a few pieces of the puzzle but I can't put them all together.
So my goal is to be able to search all .pdfs in a given location for a keyword or phrase within the content of the files, not the filename, and then use the results of the search to populate an Excel spreadsheet.
Before we start, I know that this is easy to do using the Acrobat Pro API, but my company is not going to pay for licences for everyone just so that this one macro will work.
The Windows File Explorer search accepts Advanced Query Syntax and will search inside the contents of files, assuming that the correct iFilters are enabled. E.g. if you have a Word document called doc1.docx whose text reads "blahblahblah", and you search for "blah", doc1.docx will appear as a result.
As far as I know, this cannot be achieved using the FileSystemObject, but it would be really useful if someone could confirm either way.
I have some simple code that opens an Explorer window and searches for a string within the contents of all files in the given location. Once the search has completed, I have an Explorer window listing all the required files. How do I take this list and populate an Excel sheet with the filenames of these files?
dim eSearch As String
eSearch = "explorer " & Chr$(34) & "search-ms://query=System.Generic.String:" & [search term here] & "&crumb=location:" & [Directory Here] & Chr$(34)
Call Shell (eSearch)
Assuming the location is indexed you can access the catalog directly with ADO (add a reference to Microsoft ActiveX Data Objects 2.x):
Dim cn As New ADODB.Connection
Dim rs As New ADODB.Recordset
Dim sql As String
cn.Open "Provider=Search.CollatorDSO;Extended Properties='Application=Windows'"
sql = "SELECT System.ItemNameDisplay, System.ItemPathDisplay FROM SystemIndex WHERE SCOPE='file:C:\look\here' AND System.Kind <> 'folder' AND CONTAINS(System.FileName, '""*.PDF""') AND CONTAINS ('""find this text""')"
rs.Open sql, cn, adOpenForwardOnly, adLockReadOnly
If Not rs.EOF Then
Do While Not rs.EOF
Debug.Print "File: "; rs.Collect(0)
Debug.Print "Path: "; rs.Collect(1)
rs.MoveNext
Loop
End If
Try using the next function, please:
Function GetFilteredFiles(foldPath As String) As Collection
'If using a reference to `Microsoft Internet Controls (ShDocVW.dll)_____________________
'uncomment the next 2 lines and comment the following three (without any reference part)
'Dim ExpWin As SHDocVw.ShellWindows, CurrWin As SHDocVw.InternetExplorer
'Set ExpWin = New SHDocVw.ShellWindows
'_______________________________________________________________________________________
'Without any reference:_____________________________________
Dim ExpWin As Object, CurrWin As Object, objshell As Object
Set objshell = CreateObject("Shell.Application")
Set ExpWin = objshell.Windows
'___________________________________________________________
Dim Result As New Collection, oFolderItems As Object, i As Long
Dim CurrSelFile As String
For Each CurrWin In ExpWin
If Not CurrWin.Document Is Nothing Then
If Not CurrWin.Document.FocusedItem Is Nothing Then
If left(CurrWin.Document.FocusedItem.Path, _
InStrRev(CurrWin.Document.FocusedItem.Path, "\")) = foldPath Then
Set oFolderItems = CurrWin.Document.folder.Items
'Items are zero-based, so the last valid index is Count - 1
For i = 0 To oFolderItems.Count - 1
Result.Add oFolderItems.Item(CLng(i)).Name
Next
End If
End If
End If
Next CurrWin
Set GetFilteredFiles = Result
End Function
Like it is, the function works without any reference...
The above function must be called after you executed the search query in your existing code. It can be called in the next (testing) way:
Sub testGetFilteredFiles()
Dim C As Collection, El As Variant
Set C = GetFilteredFiles("C:\Teste VBA Excel\")'use here the folder path you used for searching
For Each El In C
Debug.Print El
Next
End Sub
The above solution iterates over all open Explorer windows and returns what is visible there (after filtering) for the folder you initially searched.
You can test it manually by searching for something in a specific folder and then calling the function with that folder path as the argument (with a trailing "\" backslash).
I've forgotten everything I ever knew about VBA, but recently stumbled across an easy way to execute Explorer searches using the Shell.Application COM object. My code is PowerShell, but the COM objects & methods are what's critical. Surely someone here can translate.
This has what I think are several advantages:
The query text is identical to what you would type in the Search Bar in Explorer, e.g. 'Ext:pdf Content:compressor'
It's easily launched from code and results are easily extracted with code, but SearchResults window is available for visual inspection/review.
With looping & pauses, you can execute a series of searches in the same window.
I think this ability has been sitting there forever, but the MS documentation of the Document object & FilterView method make no mention of how they apply to File Explorer.
I hope others find this useful.
$FolderToSearch = 'c:\Path\To\Folder'
$SearchBoxText = 'ext:pdf Content:compressor'
$Shell = New-Object -ComObject shell.application
### Get handles of currently open Explorer windows
$CurrentWindows = ( $Shell.Windows() | Where FullName -match 'explorer.exe$' ).HWND
$WinCount = $Shell.Windows().Count
$Shell.Open( $FolderToSearch )
Do { Sleep -m 50 } Until ( $Shell.Windows().Count -gt $WinCount )
$WindowToSearch = ( $Shell.Windows() | Where FullName -match 'explorer.exe$' ) | Where { $_.HWND -notIn $CurrentWindows }
$WindowToSearch.Document.FilterView( $SearchBoxText )
Do { Sleep -m 50 } Until ( $WindowToSearch.ReadyState -eq 4 )
### Fully-qualified name:
$FoundFiles = ( $WindowToSearch.Document.Folder.Items() ).Path
### or just the filename:
$FoundFiles = ( $WindowToSearch.Document.Folder.Items() ).Name
### $FoundFiles is an array of strings containing the names.
### The Excel portion I leave to you! :D
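Since the answer invites a translation, here is a rough VBA sketch of the same steps. It uses only the COM members that appear in the PowerShell above (Windows, Open, HWND, FullName, Document.FilterView, ReadyState, Folder.Items). The function name ExplorerSearch, the timeout parameter and the DoEvents polling are assumptions added for illustration, and the timing behaviour has not been verified on every Windows version.

```vba
' Opens folderPath in a new Explorer window, applies searchText as if it had
' been typed into the search box, and returns the full paths that are listed.
Function ExplorerSearch(ByVal folderPath As String, ByVal searchText As String, _
                        Optional ByVal timeoutSeconds As Single = 30) As Collection
    Dim shellApp As Object, win As Object, target As Object, item As Object
    Dim known As Object
    Dim results As New Collection
    Dim started As Single

    Set ExplorerSearch = results            ' default result: empty collection
    Set shellApp = CreateObject("Shell.Application")
    Set known = CreateObject("Scripting.Dictionary")

    ' Remember the handles of the windows that are already open.
    For Each win In shellApp.Windows
        known(CStr(win.HWND)) = True
    Next win

    ' Open the folder and wait for its new Explorer window to show up.
    shellApp.Open folderPath
    started = Timer
    Do
        DoEvents
        For Each win In shellApp.Windows
            If Not known.Exists(CStr(win.HWND)) Then
                If LCase$(win.FullName) Like "*explorer.exe" Then Set target = win
            End If
        Next win
    Loop While target Is Nothing And Timer - started < timeoutSeconds
    If target Is Nothing Then Exit Function

    ' Run the search and wait until the window reports it is ready (4).
    target.Document.FilterView searchText
    started = Timer
    Do Until target.ReadyState = 4 Or Timer - started >= timeoutSeconds
        DoEvents
    Loop

    For Each item In target.Document.Folder.Items
        results.Add item.Path
    Next item
End Function
```

The returned collection can then be written to a worksheet one row per entry, for example with a simple For Each loop over the result.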

replace multiple cell value from a group of strings

Hi, I want to replace multiple values in one go, for example:
sunny 91878656 rere
vicky 91864567 gfgf
honey 91941561 ytyt
monika 98887888 hjhj
Now, if I want to replace the following two values together with a space:
91941561
98887888
How can I do it ?
I don't want to do a simple find and replace, as this is just an example. I have a list of over 12000 records, and more than 900 numbers need to be replaced.
The reason I want to replace them is that they are not valid anymore.
Also, is it possible to remove the whole record? For example, if 91941561 is found, the whole record should be deleted or replaced with a space, like:
honey 91941561 ytyt
monika 98887888 hjhj
thanks
You may use a regular expression. Below is some sample code:
Sub test()
Dim str_demo As String
str_demo = "monika 98887888 hjhj"
MsgBox getString(str_demo)
End Sub
Function getString(ByVal str As String) As String
Dim objRegEx As Object
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True
objRegEx.Global = True
objRegEx.Pattern = "[a-zA-Z]"
Set allMatches = objRegEx.Execute(str)
For i = 0 To allMatches.Count - 1
result = result & allMatches.Item(i)
Next
getString = result
End Function
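For the whole-record case raised in the question, a dictionary lookup may be simpler than a regular expression when there are around 900 numbers to check. The sketch below is an assumption-laden illustration: BlankInvalidRecords is a hypothetical name, records are assumed to be strings with single-space separators, and the number is assumed to be the second field, as in the sample data.

```vba
' Returns a copy of records in which every record whose second field
' appears in invalidNumbers has been replaced by an empty string.
Function BlankInvalidRecords(ByVal records As Variant, ByVal invalidNumbers As Variant) As Variant
    Dim lookup As Object, i As Long, parts() As String
    Set lookup = CreateObject("Scripting.Dictionary")

    ' Load the invalid numbers once so that each check is a fast key lookup.
    For i = LBound(invalidNumbers) To UBound(invalidNumbers)
        lookup(CStr(invalidNumbers(i))) = True
    Next i

    For i = LBound(records) To UBound(records)
        parts = Split(records(i), " ")
        If UBound(parts) >= 1 Then
            If lookup.Exists(parts(1)) Then records(i) = ""
        End If
    Next i
    BlankInvalidRecords = records
End Function
```

To blank only the number rather than the whole record, the matching branch could rebuild the line from parts(0) and the remaining fields instead of assigning an empty string.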

Extract tables from pdf (to excel), pref. w/ vba

I am trying to extract tables from PDF files with VBA and export them to Excel. If everything works out the way it should, it will run fully automatically. The problem is that the tables are not standardized.
This is what I have so far.
VBA (Excel) runs XPDF, and converts all .pdf files found in current folder to a text file.
VBA (Excel) reads through each text file line by line.
And the code:
With New Scripting.FileSystemObject
With .OpenTextFile(strFileName, 1, False, 0)
If Not .AtEndOfStream Then .SkipLine
Do Until .AtEndOfStream
'do something
Loop
End With
End With
This all works great. But now I am getting to the issue of extracting the tables from the text files.
What I am trying to do is VBA to find a string e.g. "Year's Income", and then output the data, after it, into columns. (Until the table ends.)
The first part is not very difficult (find a certain string), but how would I go about the second part? The text file will look like this Pastebin. The problem is that the text is not standardized. For example, some tables have three year columns (2010 2011 2012) and some only two (or one), some tables have more spaces between the columns, and some do not include certain rows (such as Capital Asset, net).
I was thinking about doing something like this but not sure how to go about it in VBA.
Find user defined string. eg. "Table 1: Years' Return."
a. Next line find years; if there are two we will need three columns in output (titles +, 2x year), if there are three we will need four (titles +, 3x year).. etc
b. Create title column + column for each year.
When reaching end of line, go to next line
a. Read text -> output to column 1.
b. Recognize spaces (Are spaces > 3?) as start of column 2. Read numbers -> output to column 2.
c. (if column = 3) Recognize spaces as start of column 3. Read numbers -> output to column 3.
d. (if column = 4) Recognize spaces as start of column 4. Read numbers -> output to column 4.
Each line, loop 4.
Next line does not include any numbers - End table. (Probably the easiest is just a user-defined number: after 15 characters no number? End table.)
I based my first version on Pdf to excel, but reading online people do not recommend OpenFile but rather FileSystemObject (even though it seems to be a lot slower).
Any pointers to get me started, mainly on step 2?
You have a number of ways to dissect a text file and depending on how complex it is might cause you to lean one way or another. I started this and it got a bit out of hand... enjoy.
Based on the sample you've provided and the additional comments, I noted the following. Some of these may work well for simple files but can get unwieldy with bigger, more complex files. Furthermore, there may be slightly more efficient methods or tricks than what I have used here, but this will definitely get you going and achieve the desired outcome. Hopefully this makes sense in conjunction with the code provided:
You can use booleans to help you determine what 'section' of the text file you are in. Ie use InStr on the current line to
determine you are in a Table by looking for the text 'Table' and then
once you know you are in the 'Table' section of the file start
looking for the 'Assets' section etc
You can use a few methods to determine the number of years (or columns) you have. The Split function along with a loop will do
the job.
If your files always have constant formatting, even only in certain parts, you can take advantage of this. For example, if you know your
file line will always have a dollar sign in front of the them, then
you know this will define the column widths and you can use this on
subsequent lines of text.
The following code will extract the Assets details from the text file; you can modify it to extract other sections. It should handle multiple rows. Hopefully I've commented it sufficiently. Have a look and I'll edit if needed to help out further.
Sub ReadInTextFile()
Dim fs As Scripting.FileSystemObject, fsFile As Scripting.TextStream
Dim sFileName As String, sLine As String, vYears As Variant
Dim iNoColumns As Integer, ii As Integer, iCount As Integer
Dim bIsTable As Boolean, bIsAssets As Boolean, bIsLiabilities As Boolean, bIsNetAssets As Boolean
Set fs = CreateObject("Scripting.FileSystemObject")
sFileName = "G:\Sample.txt"
Set fsFile = fs.OpenTextFile(sFileName, 1, False)
'Loop through the file as you've already done
Do While fsFile.AtEndOfStream <> True
'Determine flag positions in text file
sLine = fsFile.Readline
Debug.Print VBA.Len(sLine)
'Always skip empty lines (including single spaceS)
If VBA.Len(sLine) > 1 Then
'We've found a new table so we can reset the booleans
If VBA.InStr(1, sLine, "Table") > 0 Then
bIsTable = True
bIsAssets = False
bIsNetAssets = False
bIsLiabilities = False
iNoColumns = 0
End If
'Perhaps you want to also have some sort of way to designate that a table has finished. Like so
If VBA.Instr(1, sLine, "Some text that designates the end of the table") Then
bIsTable = False
End If
'If we're in the table section then we want to read in the data
If bIsTable Then
'Check for your different sections. You could make this constant if your text file allowed it.
If VBA.InStr(1, sLine, "Assets") > 0 And VBA.InStr(1, sLine, "Net") = 0 Then bIsAssets = True: bIsLiabilities = False: bIsNetAssets = False
If VBA.InStr(1, sLine, "Liabilities") > 0 Then bIsAssets = False: bIsLiabilities = True: bIsNetAssets = False
If VBA.InStr(1, sLine, "Net Assets") > 0 Then bIsAssets = True: bIsLiabilities = False: bIsNetAssets = True
'If we haven't triggered any of these booleans then we're at the column headings
If Not bIsAssets And Not bIsLiabilities And Not bIsNetAssets And VBA.InStr(1, sLine, "Table") = 0 Then
'Trim the current line to remove leading and trailing spaces then use the split function to determine the number of years
vYears = VBA.Split(VBA.Trim$(sLine), " ")
For ii = LBound(vYears) To UBound(vYears)
If VBA.Len(vYears(ii)) > 0 Then iNoColumns = iNoColumns + 1
Next ii
'Now we can redefine some variables to hold the information (you'll want to redim after you've collected the info)
ReDim sAssets(1 To iNoColumns + 1, 1 To 100) As String
ReDim iColumns(1 To iNoColumns) As Integer
Else
If bIsAssets Then
'Skip the heading line
If Not VBA.Trim$(sLine) = "Assets" Then
'Increment the counter
iCount = iCount + 1
'If iCount reaches it's limit you'll have to redim preseve you sAssets array (I'll leave this to you)
If iCount > 99 Then
'You'll find other posts on stackoverflow to do this
End If
'This will happen on the first row, it'll happen everytime you
'hit a $ sign but you could code to only do so the first time
If VBA.InStr(1, sLine, "$") > 0 Then
iColumns(1) = VBA.InStr(1, sLine, "$")
For ii = 2 To iNoColumns
'We need to start at the next character across
iColumns(ii) = VBA.InStr(iColumns(ii - 1) + 1, sLine, "$")
Next ii
End If
'The first part (the name) is simply up to the $ sign (trimmed of spaces)
sAssets(1, iCount) = VBA.Trim$(VBA.Mid$(sLine, 1, iColumns(1) - 1))
For ii = 2 To iNoColumns
'Then we can loop around for the rest
'Column ii holds the value between the previous $ sign and the current one
sAssets(ii, iCount) = VBA.Trim$(VBA.Mid$(sLine, iColumns(ii - 1) + 1, iColumns(ii) - iColumns(ii - 1) - 1))
Next ii
'Now do the last column
If VBA.Len(sLine) > iColumns(iNoColumns) Then
sAssets(iNoColumns + 1, iCount) = VBA.Trim$(VBA.Right$(sLine, VBA.Len(sLine) - iColumns(iNoColumns)))
End If
Else
'Reset the counter
iCount = 0
End If
End If
End If
End If
End If
Loop
'Clean up
fsFile.Close
Set fsFile = Nothing
Set fs = Nothing
End Sub
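When a file has no dollar signs to anchor the columns, the "runs of spaces" idea from the question can be sketched with a regular expression. SplitColumns and its minSpaces parameter are hypothetical, and the threshold of three spaces is only a guess that would need tuning against the real files.

```vba
' Splits a line into columns wherever at least minSpaces spaces occur in a row.
' Single spaces inside a label such as "Capital Asset, net" are left alone.
Function SplitColumns(ByVal textLine As String, Optional ByVal minSpaces As Long = 3) As Variant
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = " {" & minSpaces & ",}"
    re.Global = True
    ' Turn each run of spaces into one tab, then split on the tabs.
    SplitColumns = Split(re.Replace(Trim$(textLine), vbTab), vbTab)
End Function
```

The first element of the returned array is the row title and the remaining elements are the year values, so UBound of the result also gives the number of value columns on that line.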
I cannot examine the sample data as the PasteBin has been removed. Based on what I can glean from the problem description, it seems to me that using Regular Expressions would make parsing the data much easier.
Add a reference to the Scripting Runtime scrrun.dll for the FileSystemObject.
Add a reference to the Microsoft VBScript Regular Expressions 5.5. library for the RegExp object.
Instantiate a RegEx object with
Dim objRE As New RegExp
Set the Pattern property to "(\b\d{4}\b){1,3}"
The above pattern should match on lines containing strings like:
2010
2010 2011
2010 2011 2012
The number of spaces between the year strings is irrelevant, as long as there is at least one (since we're not expecting to encounter strings like 201020112012 for example)
Set the Global property to True
The captured groups will be found in the individual Match objects from the MatchCollection returned by the Execute method of the RegEx object objRE. So declare the appropriate objects:
Dim objMatches as MatchCollection
Dim objMatch as Match
Dim intMatchCount 'tells you how many year strings were found, if any
Assuming you've set up a FileSystemObject object and are scanning the text file, reading each line into a variable strLine
First test to see if the current line contains the pattern sought:
If objRE.Test(strLine) Then
    'The line contains at least one year string, so collect every match
    Set objMatches = objRE.Execute(strLine)
    intMatchCount = objMatches.Count
    For i = 0 To intMatchCount - 1
        'processing code such as writing the years as column headings in Excel
        Set objMatch = objMatches(i)
        'e.g. ActiveCell.Value = objMatch.Value
        'subsequent lines beneath the line containing the year strings should
        'have the amounts, which may be captured in a similar fashion using an
        'additional RegExp object and a Pattern such as "(\b\d+\b){1,3}" for
        'whole numbers or "(\b\d+\.\d+\b){1,3}" for floats. For currency, you
        'can use "(\b\$\d+\.\d{2}\b){1,3}"
    Next i
Else
    'skip over this line
End If
This is just a rough outline of how I would approach this challenge. I hope there is something in this code outline that will be of help to you.
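As a compact version of the year-detection step in this outline, the following late-bound sketch needs no library references. ExtractYears is a hypothetical name, and it uses the simpler pattern \b\d{4}\b with Global set to True so that each year becomes its own match.

```vba
' Returns the four-digit numbers found in a line, in order of appearance.
' An empty collection means the line is not a year heading.
Function ExtractYears(ByVal textLine As String) As Collection
    Dim re As Object, m As Object
    Dim years As New Collection
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\b\d{4}\b"
    re.Global = True
    For Each m In re.Execute(textLine)
        years.Add m.Value
    Next m
    Set ExtractYears = years
End Function
```

The Count of the returned collection gives the number of year columns, which tells you how many output columns the table needs in addition to the title column.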
Another way to do this I have some success with is to use VBA to convert to a .doc or .docx file and then search for and pull tables from the Word file. They can be easily extracted into Excel sheets. The conversion seems to handle tables nicely. Note however that it works on a page by page basis so tables extending over a page end up as separate tables in the word doc.

Multiple Range.Find() in VBA

I ran into an interesting problem today. I have a loop inside another loop, and both use Find for different purposes. Using Find in the inner loop disrupts the Find in the outer loop. I'm guessing Excel only keeps track of one search at a time. Is there some way to work around this, or is this by design?
Here's some shortened version of my code.
Sub Main()
'Some boring stuff
Set lst_rapports = Worksheets("mappingTRANSIT").range("lst_rapports")
Set first_result = lst_rapports.Find(rap_choisi)
Set active_result = first_result
Sheets("req01").Unprotect "shoobidoowap"
If Not first_result Is Nothing Then
' ...
Do
Sheets("req01").Select
' ...
For i = 0 To 4
Set rubrique_cell = range("E:E").Find(rub(i))
If Not rubrique_cell Is Nothing Then
' ...
End If
Next i
' Yet more boring stuff...
Set active_result = lst_rapports.FindNext(active_result)
Loop Until active_result.Address = first_result.Address
Else
MsgBox "Impossible de trouver """ & rap_choisi & """ !"
End If
Sheets("req01").Protect "shoobidoowap"
End Sub
Notice the second use of .Find in the for loop.
Is there some way I can preserve the first search in some kind of temporary variable and restore it back after that ?
Many thanks.
FindNext (see the MSDN documentation for FindNext) reuses the What argument from the most recent call to Find, even if that call was made on a different range.
To correct for this, instead of using
Set active_result = lst_rapports.FindNext(active_result)
use
Set active_result = lst_rapports.Find(rap_choisi,active_result)
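To show the pattern in full, here is a sketch of an outer loop that never calls FindNext and therefore is not affected by other Find calls made in between. FindAllCells is a hypothetical helper, and passing LookIn and LookAt explicitly is an added assumption (Find otherwise reuses whatever settings were used last).

```vba
' Collects every cell in searchRange that matches searchValue by calling
' Find repeatedly with an explicit After cell instead of using FindNext.
Function FindAllCells(ByVal searchRange As Range, ByVal searchValue As Variant) As Collection
    Dim hits As New Collection
    Dim firstHit As Range, hit As Range

    ' Start after the last cell so the search wraps around to the first cell.
    Set firstHit = searchRange.Find(What:=searchValue, _
                                    After:=searchRange.Cells(searchRange.Cells.Count), _
                                    LookIn:=xlValues, LookAt:=xlWhole)
    Set hit = firstHit
    Do Until hit Is Nothing
        hits.Add hit
        Set hit = searchRange.Find(What:=searchValue, After:=hit, _
                                   LookIn:=xlValues, LookAt:=xlWhole)
        If Not hit Is Nothing Then
            ' Back at the first result means every match has been visited.
            If hit.Address = firstHit.Address Then Exit Do
        End If
    Loop
    Set FindAllCells = hits
End Function
```

Because each call passes its own search value, an inner loop can run its own Find on another range without changing what the outer search looks for.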
