Search number of times a string occurs in text file - string

I am trying to go read a text file and count the number of times a phrase/string (not word) occurs in the text file, but so far what I have is this:
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("D:\VBscript project\testing.txt", ForReading)
strContents = objFile.ReadAll
objFile.Close
i = 0
arrLines = Split(strContents, "")
For Each strLine in arrLines
If InStr(strLine, "hi there") Then
i = i + 1
End If
Next
WScript.Echo "Number of times word occurs: " & i
This will only allow me to count the number of times a word occurs, which does not work when I try to tweak it to count phrases.

Consider the below example:
strPath = "D:\VBscript project\testing.txt"
strPhrase = "hi there"
strContent = ReadTextFile(strPath, 0)
arrContent = Split(strContent, strPhrase)
MsgBox "Number of times phrase occurs: " & UBound(arrContent)
Function ReadTextFile(strPath, lngFormat)
' lngFormat -2 - System default, -1 - Unicode, 0 - ASCII
With CreateObject("Scripting.FileSystemObject").OpenTextFile(strPath, 1, False, lngFormat)
ReadTextFile = ""
If Not .AtEndOfStream Then ReadTextFile = .ReadAll
.Close
End With
End Function
Note that Split-based method is case-sensitive.

strPath = "D:\VBscript project\testing.txt"
strPhrase = "hi there"
strContent = ReadTextFile(strPath, 0)
arrContent = Split(strContent, strPhrase)
MsgBox "Number of times phrase occurs: " & UBound(arrContent)
Function ReadTextFile(strPath, lngFormat)
' lngFormat -2 - System default, -1 - Unicode, 0 - ASCII
With CreateObject("Scripting.FileSystemObject").OpenTextFile(strPath, 1, False, lngFormat)
ReadTextFile = ""
If Not .AtEndOfStream Then ReadTextFile = .ReadAll
.Close
End With
End Function

If I understood you correctly and what you are asking for is really as simple as it looks, you could just change the "hi there" string to a parameter. This way you can dynamically tell your function what to look for.
EDIT: Thanks to #omegastripes I noticed a flaw in my previous code, so this is one that would work.
The code would be like this:
Sub yourSubName (pstrTextToCount)
Const ForReading = 1
Dim objFSO : Set objFSO = CreateObject("Scripting.FileSystemObject")
Dim objFile : Set objFile = objFSO.OpenTextFile("D:\VBscript project\testing.txt", ForReading)
Dim strContents : strContents = objFile.ReadAll
objFile.Close
' You don't need these objects anymore, so release them
Set objFile = Nothing
Set objFSO = Nothing
Dim intTextPosition : intTextPosition = 0
Dim i : i = -1
Do
i = i + 1
intTextPosition = InStr(intTextPosition + 1, strContents, pstrTextToCount)
Loop While (intTextPosition > 0)
Wscript.Echo "Number of times '" & pstrTextToCount & "' occurs: " & i
End Sub
I am assuming your Sub will only do that and this is why I enclosed it into the Sub, End Sub statements. You can add any other coding that you need, but only remember to add your required parameter on the signature of the Sub for it to work.
PS: As a good practice, always Dim your variables and release memory of objects that are no longer needed with Set objName = Nothing

Here a version using Regular Expressions so you can specify if the search needs to be case sensitive.
For testpurpose I use the contents of the script itself as input.
Dim path, phrase, content
path = Wscript.ScriptFullName
phrase = "hi there\^$*+?{}.()|[]"
content = CreateObject("Scripting.FileSystemObject").OpenTextFile(path).ReadAll
Function NumberOfPhrasesInString(phrase, text, IgnoreCase)
Dim regexpr, matches
Set regexpr = New RegExp
phrase = RegExEscape(phrase)
With regexpr
.Pattern = phrase
.Global = True
.IgnoreCase = IgnoreCase
Set matches = .Execute(text)
End With
NumberOfPhrasesInString = matches.count
End Function
Function RegExEscape(str)
Dim special
RegExEscape = str
special = "\^$*+?{.()|[]"
For i=1 To Len(special)
RegExEscape = replace(RegExEscape, Mid(special, i, 1), "\" & Mid(special, i, 1))
Next
End Function
Wscript.Echo "Number of times phrase occurs: " & NumberOfPhrasesInString(phrase, content, false)
As a bonus, since I'm switched to Ruby here also that version
path = __FILE__ # the path to this script for test purposes
phrase = 'HI THERE \ ^ $ * + ? { . ( | ['
puts phrase
content = File.read path
def number_of_phrases_in_string(phrase, text, ignoreCase=false )
escaped = Regexp.escape(phrase)
text.scan(Regexp.new(escaped, ignoreCase)).count.to_s
end
puts "Number of times phrase occurs: " + number_of_phrases_in_string(phrase, content, true)
Or in a single line
puts File.read(__FILE__).scan(Regexp.new(Regexp.escape(phrase), true)).count
The true in the last line defines casesensitivity

Related

Unpredictable errors VBA microsoft word copying comments and text to excel

I tried to make a macro that takes all the comments in a word document, filters based on the comment text and then inserts them in excel with the associated text in a note.
I tried each step iteratively and I managed to copy the comments and pasting the wanted results in the same word document. Then I managed to manipulate excel by adding columns and notes.
Everything broke when I integrated the excel part with the comment extraction part. The errors were invalid procedure call for the line with rightParPos = InStr(leftParPos, comment, ")") which I hadn't touched in a while, so I tried outputting the parameters... That lead to a completely different error - an indexing error for the categories array when categoryCount was 0, which also was very strange. After that I tried removing a strange character in a string and then I suddenly got some kind of "can't connect to excel" at Set xlWB = xlApp.Workbooks.Open(FileName:=xlPath).
It seems completely random to me. I'm thinking that it might be some kind of limit or bug in the Microsoft Word environment that is causing these problems. Anyone knowing what could be a cause of these strange errors?
I couldn't find anything out of the ordinary with my code, but maybe someone on S.O. sees something that immediately looks strange. Sorry for the very messy code.
Sub Test()
Dim comment, text As String
Dim pageNr As Integer
Dim codePrefix, fileName As String
Dim newLinePos, leftParPos, rightParPos As Integer
Dim commentNr As Integer
Dim codeWorksheetIndex As Integer
Dim xlFile, xlDir, xlPath As String
'Excel'
Dim xlApp As Object
Dim xlWB As Object
On Error Resume Next
Set xlApp = GetObject(, "Excel.Application")
If Err Then
Set xlApp = CreateObject("Excel.Application")
End If
On Error GoTo 0
xlFile = "TEST"
xlDir = "My\Directory\path\" 'censored
xlPath = xlDir & xlFile
Set xlWB = xlApp.Workbooks.Open(FileName:=xlPath)
codePrefix = "a-code" 'censored
fileName = Left(ActiveDocument.Name, Len(ActiveDocument.Name)-5)
'insert a column as second column in each spreadsheet'
For sheet_index = 1 to 3
With xlWB.Worksheets(sheet_index)
.Range("B:B").Insert
.Cells(1, 2).Formula = fileName
End With
Next sheet_index
For commentNr = 1 To ActiveDocument.Comments.Count
Dim category As String
Dim categories(1 to 2) As String
Dim categoryCount As Integer
Dim numLeft, numRight as Integer
'Dim j As Integer
comment = LCase(ActiveDocument.Comments(commentNr).Range)
text = ActiveDocument.Comments(commentNr).Scope
pageNr = ActiveDocument.Comments(commentNr).Scope.Information(wdActiveEndPageNumber)
'find newline'
newLinePos = InStr(comment, vbCr)
If newLinePos = 0 Then
newLinePos = InStr(comment, vbLf)
If newLinePos = 0 Then
newLinePos = InStr(comment, vbCrLf)
if newLinePos = 0 then
newLinePos = InStr(comment, Chr(10))
if newLinePos = 0 then
ActiveDocument.Content.InsertAfter Text:="ERROR: comment " & commentNr & " misses newline!" & vbNewLine
End If
End If
End If
End If
'set to initial index for leftpar instr'
rightParPos = 1
categoryCount = 0
Do
leftParPos = InStr(rightParPos, comment, "(")
rightParPos = InStr(leftParPos, comment, ")")
If leftParPos > 0 and rightParPos > 0 Then
numLeft = rightParPos-1
numRight = numLeft - leftParPos
category = Trim(Right(Left(comment, numLeft), numRight))
categories(categoryCount) = category
categoryCount = categoryCount + 1
End If
Loop While leftParPos > 0 And rightParPos > 0
comment = fileName & " (s. " & pageNr & ")" & vbNewLine & Trim(Right(comment, Len(comment)-newLinePos))
If Instr(LCase(comment), codePrefix) = 1 Then
For categoryIndex = 0 To categoryCount-1
category = categories(categoryIndex)
If category = "category1" Then
codeWorksheetIndex = 1
ElseIf category = "category2" Then
codeWorksheetIndex = 2
ElseIf category = "category3" Then
codeWorksheetIndex = 3
End If
With xlWB.Worksheets(codeWorksheetIndex)
.Cells(commentNr+1, 2).Formula = text
.Cells(commentNr+1, 2).NoteText comment 'this only worked without =
End With
Next categoryIndex
End If
Next commentNr
End Sub
There are two critical problems with the code that were overlooked and then there was one third problem that wasn't due to the code but which also resulted in errors.
As #TimWilliams mentioned, one case where leftParPos = 0 was unhandled.
The indexing of categories was entirely wrong and faulty in the code.
The strangest error was due to having the excel file on an external harddrive that disconnected and therefore making excel not responding.

How to finda text and get the page no. for acrobat using vba

I want to find the text and get the page number of text found in acrobat using VBA, I am able to find the text but not able to get the page number. for that
Sub Main()
Dim acrApp, acrAVDoc
Set acrApp = CreateObject("AcroExch.app")
Set acrAVDoc = CreateObject("AcroExch.AVDoc")
acrApp.Show
If acrAVDoc.Open("FileName", "") Then
Ok = acrAVDoc.FindText("Text to search", 0, 1, 1)
MsgBox (Ok)
End If
Set acrAVDoc = Nothing
Set acrApp = Nothing
End Sub
I am not able to set the object for
Set acrPDDoc = CreateObject("Acrobat.AV_PAGE_VIEW")
I know this is an old question, but it was one of the top search results when I was looking for the same info. I never found anything that truly met my needs so I made something up by combining several different resources.
The function below is acceptably fast, even on very large documents. It searches page by page, not word by word, so it will find multi-word matches and words with dashes (case insensitive). It returns the matches for all pages separated by commas.
Hope this is helpful to someone in the future.
Sub Demo()
Dim SearchResult As String
SearchResult = AdobePdfSearch("my search string", "C:\Demo\Demo.pdf")
MsgBox SearchResult
End Sub
Function AdobePdfSearch(SearchString As String, strFileName As String) As String
'Note: A Reference to the Adobe Library must be set in Tools|References!
'Note! This only works with Acrobat Pro installed on your PC, will not work with Reader
Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
Dim PageNumber, PageContent, Content, i, j, iNumPages
Dim strResult As String
Set AcroApp = CreateObject("AcroExch.App")
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
If AcroAVDoc.Open(strFileName, vbNull) <> True Then Exit Function
Set AcroPDDoc = AcroAVDoc.GetPDDoc
iNumPages = AcroPDDoc.GetNumPages
For i = 0 To iNumPages - 1
Set PageNumber = AcroPDDoc.AcquirePage(i)
Set PageContent = CreateObject("AcroExch.HiliteList")
If PageContent.Add(0, 9000) <> True Then Exit Function
Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
' The next line is needed to avoid errors with protected PDFs that can't be read
On Error Resume Next
For j = 0 To AcroTextSelect.GetNumText - 1
Content = Content & AcroTextSelect.GetText(j)
Next j
If InStr(1, LCase(Content), LCase(SearchString)) > 0 Then
strResult = IIf(strResult = "", i + 1, strResult & "," & i + 1)
End If
Content = ""
Next i
AdobePdfSearch = strResult
'Uncomment the lines below if you want to close the PDF when done.
'AcroAVDoc.Close True
'AcroApp.Exit
'Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
sub checks each page of pdf, word by word
Sub FindtextandPageNumber()
Dim FindWord 'Word you want to search
Dim acroAppObj As Object
Dim PDFDocObj As Object
Dim myPDFPageHiliteObj As Object
Dim iword As Integer, iTotalWords As Integer
Dim numOfPage As Integer, Nthpage As Integer
Dim word As String, sPath As String
Set acroAppObj = CreateObject("AcroExch.App")
Set PDFDocObj = CreateObject("AcroExch.PDDoc")
Set myPDFPageHiliteObj = CreateObject("AcroExch.HiliteList")
Check3 = myPDFPageHiliteObj.Add(0, 32767)
FindWord = "Hello"
acroAppObj.Show
sPath = "Test.pdf" 'Path of pdf where you want to search
PDFDocObj.Open (sPath)
numOfPage = PDFDocObj.GetNumPages
word = vbNullString
Set PDFJScriptObj = Nothing
For Nthpage = 0 To numOfPage - 1
Set pAcroPDPage = PDFDocObj.AcquirePage(Nthpage)
Set wordHilite = pAcroPDPage.CreateWordHilite(myPDFPageHiliteObj)
Set PDFJScriptObj = PDFDocObj.GetJSObject
iTotalWords = wordHilite.GetNumText
iTotalWords = PDFJScriptObj.getPageNumWords(Nthpage)
''check the each word
For iword = 0 To iTotalWords - 1
word = Trim(CStr(PDFJScriptObj.getPageNthWord(Nthpage, iword)))
If word <> "" Then
If word = FindWord Then
PageNumber = Nthpage
msgbox PageNumber
End If
word = ""
End If
Next iword
Next Nthpage
End Sub

Search a string from text file & Return the Line Number using VBA

I have one text file that contains around 100K lines. Now I would like to search a string from the text file. If that string is present then I want to get the line number at which it's present. At the end I need all the occurrence of that string with line numbers from the text file.
* Ordinary Method Tried *
We can read the whole text file line by line. Keep a counter variable that increases after every read. If I found my string then I will return the Counter Variable. The limitation of this method is, I have to traverse through all the 100K lines one by one to search the string. This will decrease the performance.
* Quick Method (HELP REQUIRED)*
Is there any way that will directly take me to the line where my searchstring is present and if found I can return the line number where it's present.
* Example *
Consider below data is present in text file. (say only 5 lines are present)
Now I would like to search a string say "Pune". Now after search, it should return me Line number where string "pune" is present. Here in this case it's present in line 2. I should get "2" as an output. I would like to search all the occurrence of "pune" with their line numbers
I used a spin off of Me How's code example to go through a list of ~10,000 files searching for a string. Plus, since my html files have the potential to contain the string on several lines, and I wanted a staggered output, I changed it up a bit and added the cell insertion piece. I'm just learning, but this did exactly what I needed and I hope it can help others.
Public Sub ReadTxtFile()
Dim start As Date
start = Now
Dim oFSO As Object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Dim oFS As Object
Dim filePath As String
Dim a, b, c, d, e As Integer
a = 2
b = 2
c = 3
d = 2
e = 1
Dim arr() As String
Do While Cells(d, e) <> vbNullString
filePath = Cells(d, e)
ReDim arr(5000) As String
Dim i As Long
i = 0
If oFSO.FileExists(filePath) Then
On Error GoTo Err
Set oFS = oFSO.OpenTextFile(filePath)
Do While Not oFS.AtEndOfStream
arr(i) = oFS.ReadLine
i = i + 1
Loop
oFS.Close
Else
MsgBox "The file path is invalid.", vbCritical, vbNullString
Exit Sub
End If
For i = LBound(arr) To UBound(arr)
If InStr(1, arr(i), "Clipboard", vbTextCompare) Then
Debug.Print i + 1, arr(i)
Cells(a + 1, b - 1).Select
Selection.Insert Shift:=xlDown
Cells(a, b).Value = i + 1
Cells(a, c).Value = arr(i)
a = a + 1
d = d + 1
End If
Next
a = a + 1
d = d + 1
Loop
Debug.Print DateDiff("s", start, Now)
Exit Sub
Err:
MsgBox "Error while reading the file.", vbCritical, vbNullString
oFS.Close
Exit Sub
End Sub
the following fragment could be repalaced like:
Dim arr() As String
Dim i As Long
i = 0
If oFSO.FileExists(filePath) Then
On Error GoTo Err
Set oFS = oFSO.OpenTextFile(filePath)
Do While Not oFS.AtEndOfStream
ReDim Preserve arr(0 To i)
arr(i) = oFS.ReadLine 'to save line's content to array
'If Len(oFSfile.ReadLine) = 0 Then Exit Do 'to get number of lines only
i = i + 1
Loop
oFS.Close
Else
MsgBox "The file path is invalid.", vbCritical, vbNullString
Exit Sub
End If
Here's another method that should work fairly quickly. It uses the shell to execute the FINDSTR command. If you find the cmd box flickers, do an internet search for how to disable it. There are two options provided: one will return the line number followed by a colon and the text of the line containing the keyword. The other will just return the line number.
Not sure what you want to do with the results, so I just have them in a message box.
Option Explicit
'Set reference to Windows Script Host Object Model
Sub FindStrings()
Const FindStr As String = "Pune"
Const FN As String = "C:\users\ron\desktop\LineNumTest.txt"
Dim WSH As WshShell
Dim StdOut As Object
Dim S As String
Set WSH = New WshShell
Set StdOut = WSH.Exec("cmd /c findstr /N " & FindStr & Space(1) & FN).StdOut
Do Until StdOut.AtEndOfStream
S = S & vbCrLf & StdOut.ReadLine
'If you want ONLY the line number, then
'S = S & vbCrLf & Split(StdOut.ReadLine, ":")(0)
Loop
S = Mid(S, 2)
MsgBox (S)
End Sub

Insert spaces in particular places on particular lines in .txt file

I've been trying to write a program that opens a text file and inserts spaces after the 5th, 6th and 7th word for every line beginning with C_PIN.
I feel that I have it almost complete but I've benn getting run-time errors 5 and 438.
Example text being read:
COMP C48 66250110810 cap sc_cap.0603_H9 43.3959 74.3331 1 0
C_PROP (PB_FREE,"Y") (VALUE,"10nF") (TOLER,"10%") (PART_NAME,"06035C103K4T2A")
C_PIN C48-1 43.3959 75.0951 1 1 0 sp.0603_H9.1 /N$1567
C_PIN C48-2 43.3959 73.5711 1 1 0 sp.0603_H9.2 GN
An extra space is required after the 1 1 0
Here's where I think the problem in my code lies:
x = " "
Do While Not EOF(infilenum%)
Line Input #infilenum%, a$
If Left$(a$, 5) = "C_PIN" Then
For Each x In InStr
'If InStr(strText, " ") Then
w = w + 1
'w = strText.Split
For w = 5 To w = 7
My.Computer.FileSystem.WriteAllText (infilename$)
strText = My.Computer.FileSystem.ReadAllText(infilename$).Replace(w, x + w)
vb.newline
Any help is much appreciated!
For Each x In InStr
isn't valid in any way!!!
You can use Split and Join:
If Left(a, 5) = "C_PIN" Then
va = Split(a, " ")
va(4) = va(4) & " " 'Add extra space
va(5) = va(4) & " "
va(6) = va(4) & " "
a = Join(va, " ") 'Join with extra spaces added
End If
Now you can write the string.
I played with an alternative version that uses a regexp to make a single shot replacement
While it looks a little complicated, the plus - other than a single shot replacement - is that it only will alter lines that
start with C_PIN
have (at least) a further 6 words
This sample take your initial file, and saves a second version with the padded spacing.
Upated for additional requirement, using two separate regexp replacements
Sub ReDo()
Dim objFso As Object
Dim objFil As Object
Dim objFil2 As Object
Dim objRegex As Object
Dim strFile As String
Dim strAll As String
strFil = "c:\temp\REnglish.txt"
strFil2 = "c:\temp\REnglish2.txt"
Set objFso = CreateObject("Scripting.FileSystemObject")
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "(\nC_PIN\s)((\b[^\s]+\b\s){3})(\b[^\s]+\b\s)(\b[^\s]+\b\s)(\b\d\b\s)"
.Global = True
Set objFil = objFso.OpenTextFile(strFil)
strAll = objFil.ReadAll
Set objFil2 = objFso.createtextfile(strFil2)
strAll = .Replace(strAll, "$1$2$3$4 $5 $6 ")
.Pattern = "(\nC_PIN\s)((\b[^\s]+\b\s){3})(\b[^\s]+\b\s)(\b[^\s]+\b\s)(\b\d{2,}\b\s)"
objFil2.write .Replace(strAll, "$1$2$3$4 $5 $6")
End With
objFil.Close
objFil2.Close
End Sub

read folders and any document properties from excel?

I am wanting to try something and I'm fairly sure it's possible, but not really sure!!
In MS Excel (2003) can I write a VBA script which will open a location (eg: s://public/marketing/documents/) and list all the documents located within there (filename)?
The ultimate goal would be to have the document name, date last modified, date created and modified by name.
Is this possible? I'd like to return any found values in rows on a sheet. eg: type: FOLDER, type: Word Doc etc.
Thanks for any info!
Done that recently. Use the DSOFile object. In Excel-VBA you first need to create a reference to Dsofile.dll ("DSO OLE Document Properties Reader 2.1" or similar). Also check you have a reference to the Office library
First you may want to select the file path which you want to examine
Sub MainGetProps()
Dim MyPath As String
MyPath = GetDirectoryDialog()
If MyPath = "" Then Exit Sub
GetFileProps MyPath, "*.*"
End Sub
Let's have a nice Path selection window
Function GetDirectoryDialog() As String
Dim MyFD As FileDialog
Set MyFD = Application.FileDialog(msoFileDialogFolderPicker)
With MyFD
.AllowMultiSelect = False
.Show
If .SelectedItems.Count <> 0 Then
GetDirectoryDialog = .SelectedItems(1)
End If
End With
End Function
Now let's use the DSO object to read out info ... I reduced the code to the bare necessary
Private Sub GetFileProps(MyPath As String, Arg As String)
Dim Idx As Integer, Jdx As Integer, MyFSO As FileSearch, MyRange As Range, MyRow As Integer
Dim DSOProp As DSOFile.OleDocumentProperties
Set DSOProp = New DSOFile.OleDocumentProperties
Set MyRange = ActiveSheet.[A2] ' your output is nailed here and overwrites anything
Set MyFSO = Application.FileSearch
With MyFSO
.NewSearch
.LookIn = MyPath
.SearchSubFolders = True ' or false as you like
.Filename = Arg
.FileType = msoFileTypeAllFiles
If .Execute() > 0 Then
MsgBox .FoundFiles.Count & " file(s) found." ' to see what you will get
For Idx = 1 To .FoundFiles.Count
DSOProp.Open .FoundFiles(Idx) ' examine the DSOProp element in debugger to find all summary property names; not all may be filled though
Debug.Print .FoundFiles(Idx)
Debug.Print "Title: "; DSOProp.SummaryProperties.Title
Debug.Print "Subject: "; DSOProp.SummaryProperties.Subject
' etc. etc. write it into MyRange(Idx,...) whatever
' now hunt down the custom properties
For Jdx = 0 To DSOProp.CustomProperties.Count - 1
Debug.Print "Custom #"; Jdx; " ";
Debug.Print " Name="; DSOProp.CustomProperties(Jdx).Name;
If DSOProp.CustomProperties(Jdx).Type <> dsoPropertyTypeUnknown Then
Debug.Print " Value="; DSOProp.CustomProperties(Jdx).Value
Else
Debug.Print " Type=unknowwn; don't know how to print";
End If
MyRow = MyRow + 1
Next Jdx
DSOProp.Close
Next Idx
Else
MsgBox "There were no files found."
End If
End With
End Sub
and that should be it
good luck MikeD

Resources