Search a file based on a text file's words

Search a file based on a text file's words - search

I want to search several words at the same time in a text file.
For example think I want to search these 3 words: Majid,superuser,device
Normally I should search for them one by one, and I can not search all of them at the same time. So I want to search these words at the same time in a text file.
I want to enter those 3 words in a text file, one word per line. Let's name it SearchText. Now I have a Target Text which I want to search those words in it. Let's name it TargetText.
I want to tell an app or something similar to get words from SearchText and find them in TargetText and Highlights them or gives me the find result.
I hope I'm clear. So can anyone hep me?

You're clear. I think the best option would be Regex.
Try this:
Option Explicit
Dim oFso : Set oFso = CreateObject("Scripting.FileSystemObject")
Dim srcPath : SrcPath = oFso.GetParentFolderName(WScript.ScriptFullName)
Dim sWordList : sWordList = ofso.OpenTextFile(oFso.BuildPath(srcPath, "search.txt")).ReadAll()
Dim sTargFile : sTargFile = ofso.OpenTextFile(oFso.BuildPath(srcPath, "target.txt")).ReadAll()
Dim strWords : strWords = Join(Split(sWordList, vbCrLf), "|")
Dim oReg : Set oReg = New RegExp
Dim oDict : Set oDict = CreateObject("Scripting.Dictionary") 'for found words
oDict.CompareMode = vbTextCompare 'case insensitive
With oReg
.Global = True
.IgnoreCase = True
.Pattern = "("& strWords &")" ' (word1|word2|word3|etc..)
'if the words contain some special chars for regex, you may need to escape with \ char
'see the information below: http://goo.gl/cqGVp
Dim collFND : Set collFND = oReg.Execute(sTargFile)
Dim Fnd
'WScript.Echo String(50, "-")
For Each Fnd In collFND
With Fnd
'WScript.Echo "Word : "& .Value, ", Position: ("& .FirstIndex &","& .Length &")"
oDict(.Value) = ""
End With
Next
Set collFND = Nothing
'WScript.Echo String(50, "-"), vbCrLf, "Higlighted Output:", oReg.Replace(sTargFile, "[$1]")
ofso.CreateTextFile(oFso.BuildPath(srcPath, "foundwords.txt"),True).Write(Join(oDict.Keys, vbCrLf)) 'found words saved
End With

Related

Find unknown name and surname in opened Word document, copy it and paste into the cell A12 in excel .activesheet with excel VBA

Hello Stackoverflow community.
My goal is to write a macro that finds unknown name (or both names written like so "Firstname Secondname") and surname (or both surnames written like so "Firstsurname-Secondsurname") in previously opened/active Word document - there will be only one Word document opened on the computer at the time. I want to find and copy the name and surname from point 2.
Next the macro should copy this name and paste it into the cell A12 in excel"s .activesheet. Only one excel workbook will be opened on the computer at the time.
The structure of the word document is quite consistent and apart from names and personal/id numbers everything stays the same, but no word bookmarks are created. I've found the text that never changes in point 1. = "REGON 364061169, NIP 951-24-09-783,".
It's before the name+surname I want to find and copy - I hope it helps.
But also the text "2. " is directly before the name+surname I want to copy and although in the whole contract the string "2. " appears over 20 times, this is the 1st "2. " occurence that precedes name+surname I want to copy and paste into excel's cell.
Name+surname changes all the time, is unknown and has different number of words/characters every time.
Sub FindNames()
'Variables declaration
Dim WordApp As Word.Application
Dim WordDoc As Word.Document
Dim ExcelApp As Excel.Application
Dim MySheet As Worksheet
Dim TextToFind As String
Dim FirstName As String
Dim Rng As Word.Range
Dim StartPos As Long
Dim EndPos As Long
Application.ScreenUpdating = False
TextToFind = "REGON 364061169, NIP 951-24-09-783," 'this text length is 21 caracters
'Assigning object variables
Set WordApp = GetObject(, "Word.Application")
Set ExcelApp = GetObject(, "Excel.Application")
Set WordDoc = WordApp.ActiveDocument
Set MySheet = Application.ActiveWorkbook.ActiveSheet
'Set MySheet = ExcelApp.ActiveWorkbook.ActiveSheet
Set Rng = WordApp.ActiveDocument.Content
'InStr function returns a Variant (Long) specifying the position of the _
first occurrence of one string within another.
StartPos = InStr(1, Rng, TextToFind) 'here we get 1420, we're looking 4 "TextToFind"
EndPos = InStr(StartPos, Rng, "§ 1. ") 'here we get 2742, we're looking 4 ",00zł"
If StartPos = 0 Or EndPos = 0 Then
MsgBox ("Client's names were not found!")
Else
StartPos = StartPos + Len(TextToFind) 'now start position is reassigned at 1455;
FirstName = Mid(Rng, StartPos, EndPos - StartPos)
End If
'len(Firstname)
End Sub
This is the best I can write, but I cannot isolate only name+surname from the bigger variable = FirstName.
My version of the code provided by #PeterT, which is not working for me.
Rng.SetRange Start:=StartPos, End:=EndPos
Debug.Print Rng.Paragraphs.Count
If StartPos = 0 Or EndPos = 0 Then
MsgBox ("Client's names were not found!")
'finding the paragraphs that follow the TextToFind1
Else
For Each Para In Rng.Paragraphs
'how to identify the second paragraph?
'these are not .ListParagraphs, they're normal paragraphs
'If Para.Range.ListParagraphs.Count = 1 Then
If Para.Range.Paragraphs.Count = 2 Then
'how to access the second paragraph?
'If Para.Range.ListFormat.ListValue = 2 Then
'Para.Range.Paragraphs(1).Next(Count:=1).Range
'If Para.Range.Paragraphs.Count = 2 Then
Debug.Print "Name = " & Para.Range.Words(1) & _
", Surname = " & Para.Range.Words(2)
End If
Next Para
End If
I can't access second paragraph and extract the "Michał Łukasz ROESLER" string.
I'd also like to extract "Katarzyna Paula STANISZKIS-KRAWCZYK" from the third paragraph in the Rng. Both of them are on the first page of the document.

This answer is deliberately separate from my previous example. That other example
is based on finding paragraphs formatted as ListParagraphs, and
remains valid if your search must include that formatting style.
This answer assumes the numbered paragraphs are simply regular paragraphs (albeit >indented and numbered). No error checking is performed in this example, e.g. if the >paragraph is not numbered or the names are located elsewhere in the paragraph.
By setting up the searchRange in the manner below, you are assured that the first paragraph is the one containing your search term. In this case, it's the paragraph for Item 1. Since the searchRange is defined using the search term, you're assured that the name is in the next paragraph. No loop is necessary.
Option Explicit
Sub FindNames2()
Dim textToFind As String
textToFind = "REGON 364061169, NIP 951-24-09-783,"
Dim searchArea As Word.Range
Set searchArea = ThisDocument.Content
Dim startPos As Long
Dim endPos As Long
startPos = InStr(1, searchArea, textToFind)
If (startPos = 0) Then Exit Sub
'--- adjust the area to start from where we found the text
' until the end of the document
searchArea.SetRange Start:=startPos, End:=searchArea.End
'--- we want the name at the start of the very next paragraph
' (the current paragraph with the text to find is paragraph 1)
Dim theParagraph As Word.Paragraph
Set theParagraph = searchArea.Paragraphs(2)
Dim itemNumber As Long
Dim firstName As String
Dim lastName As String
itemNumber = CLng(Trim(theParagraph.Range.Words(1)))
firstName = Trim$(theParagraph.Range.Words(3))
lastName = Trim$(theParagraph.Range.Words(4))
Debug.Print "Name = " & firstName & " " & lastName & " in Item #" & itemNumber
End Sub
A couple things to note from additional example in the OP.
The endPos may be zero, even if the search text is found. My testing showed that checking the startPos was sufficient.
When accessing a Word(3), for example, the returned text may have whitespace on one or both sides of the word. Using the Trim$ function removes that whitespace.
You can access the name in the paragraph below by incrementing from Paragraphs(2) to Paragraphs(3).

This example code assumes you are executing the macro from the MS Word document.
Option Explicit
Sub FindNames()
Dim textToFind As String
textToFind = "REGON 364061169, NIP 951-24-09-783,"
Dim searchArea As Word.Range
Set searchArea = ThisDocument.Content
Dim startPos As Long
Dim endPos As Long
startPos = InStr(1, searchArea, textToFind)
If startPos = 0 Then Exit Sub
'--- adjust the area to start from where we found the text
' until the end of the document
searchArea.SetRange Start:=startPos, End:=searchArea.End
'--- now find the list paragraphs that follow the text
Dim para As Word.Paragraph
For Each para In searchArea.Paragraphs
'--- identify the list paragraph
If para.Range.ListParagraphs.Count = 1 Then
'--- find the second item in the list
If para.Range.ListFormat.ListValue = 2 Then
Debug.Print "Name = " & para.Range.Words(1) & _
", Surname = " & para.Range.Words(2)
End If
End If
Next para
End Sub

The best way to do this is create a Word.Range, search the range, then adjust it to capture the names.
Dim srchRng as Word.Range
Dim thisDoc as Word.Document: Set thisDoc = Word.ActiveDocument
Set srchRange = thisDoc.Content
With srchRange.Find
.Text = "REGON 364061169, NIP 951-24-09-783,"
.Execute
If .Found = True Then
srchRange.MoveEndUntil Cset:="."
srchRange.MoveEnd wdWord, 3
If srchRange.Words.Last.Next.Text = "-" Then
srchRange.MoveEnd wdWord, 2
End If
Dim nameStart As Long
nameStart = InStr(1, srchRange.Text, "2. ")
Dim fullName As String
fullName = Mid(srchRange.Text, nameStart + 3)
End If
End With
Debug.Print fullName

Removing particular string from a cell

I have text in a range of cells like
Manufacturer#||#Coaster#|#|Width (side to side)#||#20" W####Height (bottom to top)#||#35" H#|#|Depth (front to back)#||#20.5" D####Seat Depth#||#14.25"**#|#|Material & Finish####**Composition#||#Wood Veneers & Solids#|#|Composition#||#Metal#|#|Style Elements####Style#||#Contemporary#|#|Style#||#Casual
From this cell i need to remove strings between #|#|"needtoremove"#### only without affecting other strings.
I have tried find and replace, finding #|#|*#### and replacing it with #|#|. However its not giving the exact result.
Can anyone help me?

The other solution will remove anything between the first #|#| and ####, event the #||# etc.
In case you only need to remove the text between #|#| and #### only if there is no other ##|| inbetween, I think the simplest way is to use a regex.
You will need to activate the Microsoft VBScript Regular Expressions 5.5 library in Tools->References from the VBA editor.
Change range("D166") to wherever your cell is. The expression as it is right now ("#\|#\|[A-Za-z0-9& ]*####")matches any text that starts with #|#|, ends with #### and has any number of alphanumerical character, & or space. You can add other caracters between the brakets if needed.
Sub remove()
Dim reg As New RegExp
Dim pattern As String
Dim replace As String
Dim strInput As String
strInput = Range("D166").Value
replace = ""
pattern = "#\|#\|[A-Za-z0-9& ]*####"
With reg
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If reg.test(strInput) Then Range("D166").Value = reg.replace(strInput, replace)
End Sub

Something like this.
If that value is in cell A1
Dim str As String
Dim i As Integer
Dim i2 As Integer
Dim ws As Excel.Worksheet
Set ws = Application.ActiveSheet
str = ws.Range("A1").Value
i = InStr(str, "#|#|")
i2 = InStr(str, "####")
str = Left(str, i) & Right(str, Len(str) - i2)
ws.Range("A1").Value = str

Read a Text file which has huge line (more than 1024 characters) using VBA macro

I need to pick a particular data from a text file. But this text file has data which is more than 1024 characters on a single line.
For eg: I need data between string text1 and text 2. My code takes only the first data between text1 & text2 in the huge line, and moves to next line. But previous huge line has multiple instances of text1 & text2. I am not able to get those data. Please help. Find below my code:
Sub Macro1()
Dim dat As String
Dim fn As String
fn = "C:\Users\SAMUEL\Desktop\123\Source1.TXT" '<---- change here
With CreateObject("Scripting.FileSystemObject").OpenTextFile(fn)
Do While Not .AtEndOfStream
dat = .Readline
If InStr(1, dat, "text1", vbTextCompare) > 0 Then
x = InStr(dat, "text1") + 8
y = InStr(dat, "text2")
Z = y - x
MsgBox Mid(dat, x, Z)
End If
Loop
.Close
End With
End Sub
I want to pick the data between Text1 and Text2 to a specific cell.
The data looks like "This is an Text1 awesome Text2 website. I like this Text1 website Text2."
This is a huge data which I copied from a website. When I save in a Text file, one line of this web data is more than 4000 characters. So the line in text file ends at 1024 characters and data moves to next line that becomes 3 lines. But My macro takes first 1024 in string "dat" and moves to second line of web data, that means it skips all data after 1024 characters to 4000 characters. The data I want which exists between Text1 and Text2 could be anywhere in whole 4000 characters, But It will be in same pattern. It will never be like Text1...Text1...Text2..

Using a regexp is a useful way to quickly replace all matches in a single shot, or work through each match (including multiple matches per line) as this sample does below.
Sub DisappearingSwannie()
Dim objFSO As Object
Dim objFil As Object
Dim objRegex As Object
Dim objRegMC As Object
Dim objRegM As Object
Dim strIn As String
Dim X
Dim lngCnt As Long
Dim fn As String
fn = "C:\temp\test.TXT" '<---- change here
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objRegex = CreateObject("vbscript.regexp")
Set objFil = objFSO.OpenTextFile(fn)
X = Split(objFil.readall, vbNewLine)
With objRegex
.Global = True
.Pattern = "text1(.+?)text2"
End With
For lngCnt = 1 To UBound(X)
If objRegex.test(X(lngCnt)) Then
Set objRegMC = objRegex.Execute(X(lngCnt))
For Each objRegM In objRegMC
Debug.Print "line " & lngCnt & " position:" & objRegM.firstindex
Next
End If
Next
End Sub

Here is a macro that looks in A1 and B1 for Text1 and Text2. It then allows you to pick a file to process and parses out the substrings from text1 to text2 inclusive. Finally, it splits them into chunks of no more than 1024 characters (ensuring that each chunk ends with a space, so as not to split words), and writes them into a series of rows in column A starting in A2.
Both the parsing of the substrings, and the breaking them up into 1024 character chunks, are accomplished using regular expressions. The "work" is done in VBA arrays as this is faster than going back and forth to the worksheet.
Since the length of a string variable can be approximately 2^31 characters, I doubt you will have any problem reading the entire document into a single variable and then processing it, instead of going line by line.
Since the macro has arguments, you will need to call it from another macro; or it should be trivial for you to change the code to allow different methods of input for text1 and text2.
There is no error checking.
If you do not want to include Text1 and Text2 in the results, a minor change in the regular expression pattern is all that would be required.
I used early binding so as to take advantage of the "hints" while writing the macro. This requires setting references as noted in the macro. However, it should be simple for you to change this to late binding if you wish.
You might also consider a modification so that the multi-row chunks are somehow differentiated from the single row chunks.
Enjoy
Option Explicit
'Set Reference to Microsoft Scripting Runtime
'Set Reference ot Microsoft VBScript Regular Expressions 5.5
Sub ExtractPhrases(Text1 As String, Text2 As String)
Dim FSO As FileSystemObject
Dim TS As TextStream
Dim FN As File, sFN As String
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim RE2 As RegExp, MC2 As MatchCollection, M2 As Match
Dim sPat As String
Dim S As String, sTemp As String
Dim V() As Variant, vRes() As Variant
Dim I As Long, J As Long, K As Long
Dim C As Range
Dim rRes As Range
'Get File path
With Application.FileDialog(msoFileDialogFilePicker)
.AllowMultiSelect = False
.ButtonName = "Process File"
.Filters.Add "Text", "*.txt", 1
.FilterIndex = 1
.InitialView = msoFileDialogViewDetails
If .Show = -1 Then sFN = .SelectedItems(1)
End With
'Read File into String variable
Set FSO = New FileSystemObject
Set TS = FSO.OpenTextFile(FileName:=sFN, IOMode:=ForReading, Create:=False)
S = TS.ReadAll
'Get results
Set RE = New RegExp
Set RE2 = New RegExp
With RE2
.Global = True
.MultiLine = False
.Pattern = "(\S[\s\S]{1,1023})(?:\s+|$)"
End With
With RE
.Global = True
.IgnoreCase = True
.Pattern = "\b" & Text1 & "\b([\s\S]+?)\b" & Text2 & "\b"
If .Test(S) = True Then
ReDim vRes(0)
Set MC = RE.Execute(S)
For I = 1 To MC.Count
Set MC2 = RE2.Execute(MC(I - 1))
ReDim V(1 To MC2.Count)
For J = 1 To MC2.Count
V(J) = MC2(J - 1).SubMatches(0)
Next J
ReDim Preserve vRes(UBound(vRes) + J - 1)
For J = 1 To MC2.Count
K = K + 1
vRes(K) = V(J)
Next J
Next I
End If
End With
vRes(0) = "Phrases"
'transpose vRes
ReDim V(1 To UBound(vRes) + 1, 1 To 1)
For I = 0 To UBound(vRes)
V(I + 1, 1) = vRes(I)
Next I
Set rRes = Range("a2").Resize(rowsize:=UBound(V))
Range(rRes(1), Cells(Rows.Count, rRes.Column)).Clear
rRes = V
End Sub

Use Dir to find file without "AAA"

I have some files that are called Team-(Random Number).txt.
Dir("Team-" & "*" & ".txt")
But when there have been changes there could be text files called Team-(Random Number)-AAA.txt, Team-(Random Number)-AAB.txt and so but the most recent file is always called Team-(Random Number).txt.
Since Dir only returns 1 file and does this randomly is there a way to get the file Team-(Random Number).txt?
It should be no problem if dir returned the result in a normal order but apparently it does it randomly.
I've thought of excluding the -AAA part but don't know what the syntax should. Or in a less efficient way to get all files and sort it in an array but with 10 - 200 files it's not very efficient.
Now I'm hoping could give me the syntax of excluding the part or other workaround for my problem thanks!

I'd say go for Regular Expressions.
Private Sub TeamTxtExists()
Dim Path As String, Pattern As String, FileFound As String
Dim REGEX As Object, Matches As Object
Dim oFSO As Object, oFolder As Object, oFile As Object
Path = "D:\Personal\Stack Overflow\" 'Modify as necessary.
Pattern = "(Team-(\d+).txt)"
Set REGEX = CreateObject("VBScript.RegExp")
With REGEX
.Pattern = Pattern
.Global = True
.IgnoreCase = True
End With
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFolder = oFSO.GetFolder(Path)
For Each oFile In oFolder.Files
Set Matches = REGEX.Execute(oFile.Name)
For Each Match In Matches
Debug.Print Match.Value
Next Match
Next oFile
End Sub
This will print in your immediate window (Ctrl-G in the VBE) all the names of text files that don't have AAA or the like in their filenames. Tried and tested.

In a similar vein to Loop through files in a folder using VBA?
Use Dir to efficiently find the first group of files that match team-xxxx.txt
Then zero in on the wanted match which could either be done with
Like for a simple match
Regexp for a harder match
Exit the Dir list on a successful match
I went with the regexp.
code
Sub LoopThroughFiles()
Dim objRegex As Object
Dim StrFile As String
Dim bFound As Boolean
bFound = False
Set objRegex = CreateObject("VBScript.RegExp")
objRegex.Pattern = "team-\d+"
StrFile = Dir("c:\temp\team-*.txt")
Do While Len(StrFile) > 0
If objRegex.test(StrFile) = False Then
StrFile = Dir
Else
bFound = True
MsgBox "Your file is " & StrFile
Exit Do
End If
Loop
If Not bFound Then MsgBox "No Match", vbCritical
End Sub

Is this helping?:
Dir("Your_folder_path_ending_with_a_\" & "Team-(*).txt")
Going a bit more in-depth and using the folder content shown in the picture:
This sub will return all the file names that only contain "Team-(Random Number).txt":
Sub showFileName()
Dim FolderPath As String: FolderPath = "C:\test\"
Dim Filter As String: Filter = "Team-(*).txt"
Dim dirTmp As String
dirTmp = Dir(FolderPath & Filter)
Do While Len(dirTmp) > 0
Debug.Print dirTmp
dirTmp = Dir
Loop
End Sub
The result is:
Team-(123).txt
Team-(14).txt
Team-(PI).txt

VB: Read "x" integer from a string and replace it

First of all, I am not from programming background and totally new to VB. For some reasons, I have to do a small scripting task at office.
I have gone through many posts here as well on other forums but not able to find the required information.
Ok so here's what I need to do:
I want to find an integer from a string (the integer can be any from 5 to 4095) and replace it with an integer of my choice.
I have done some analysis but not able to find a function which can search for "any" integer from a string. Searching for fixed integers is working fine.
e.g:
Convert: "There are 10 apples" - where 10 can be any number from 5 to 4095, not known to me.
To: "There are 5 apples" - 5 is the number I will manually give.
Any help is appreciated.
Thank you.
Edit
Final Code:
Set objFSO = CreateObject("Scripting.FileSystemObject")
strFile = "C:\Users\inkasap\Desktop\temp\IOParams_Backup.xml"
Set objFile = objFSO.OpenTextFile(strFile)
Do Until objFile.AtEndOfStream strLine = objFile.ReadLine
Dim re, strLine
Set re = New RegExp
if InStr(strLine,"name=""BIOSCUPP") > 0 Then
re.Pattern = "\d+"
Set colMatch = re.Execute(strLine)
For Each objMatch In colMatch
strLine = Replace(strLine, objMatch.Value, "30")
Next end
if WScript.Echo strLine
Loop

Well, you can use regular expressions to find any numerical value, and then replace it:
Dim re, test
Set re = New RegExp
re.Pattern = "\d+"
test = "There are 4000 apples"
Set colMatch = re.Execute(test)
For Each objMatch In colMatch
MsgBox(Replace(test,objMatch.Value,"5"))
Next
This page contains all informations you need.

You can use a RegularExpressions with Regex.Replace method:
Dim str = "There are 10 apples"
Dim regex = New System.Text.RegularExpressions.Regex("\d+", System.Text.RegularExpressions.RegexOptions.Compiled)
Dim replaced = regex.Replace(str, "5") ' "There are 5 apples" '
Edit: Just only seen that you need a vbscript instead of VB.NET approach.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Search a file based on a text file's words - search

Related

Find unknown name and surname in opened Word document, copy it and paste into the cell A12 in excel .activesheet with excel VBA

Removing particular string from a cell

Read a Text file which has huge line (more than 1024 characters) using VBA macro

Use Dir to find file without "AAA"

VB: Read "x" integer from a string and replace it

Categories

Resources