String contains substring and substring not part of longer word (exact match) - string

I have captured the full text of a PDF-file in a string called pdfText.
Next I am looping through an array containing substrings to be found/searched for in the pdfText-string.
One of the substrings is Invoice.
Both pdfText and the substrings I am searching for are converted to lower case.
If at least one of the substrings is found in pdfText, a boolean is set to true.
Now, I have an example where pdfText contains '...Net amount to be invoiced...'. This is the only variant of 'invoice' in the text.
This of course returns true if I use
substring = "Invoice" ... pdfText.contains(substring.ToLower).
But in this case I need it to return false. I need to find only exact matches.
Another example, if the pdfText contains '...This is an invoice. Please pay....Net amount to be invoiced...' the boolean should be set to true because of the first invoice-match, but not the second invoiced-(non)match.
So what I am looking for is a way to find the substring Invoice in the string pdfText and make sure that the substring is not part of a longer word such as invoiced, invoice-process, etc. Note that invoice. (followed by a full stop) should return True.
I believe this should be possible, but cannot wrap my head around it currently.
I might need to use regex?

This one uses the RegEx proposed by @Mederic at https://stackoverflow.com/a/45587916/2326360, with a slight change.
Use the built-in UiPath activity Is Match, found under Programming->String.
Use it inside your loop, with the current settings.
The RegEx is: substring+"[^a-zA-Z]"
I have declared the following variables:

RegEx would be a good approach.
I only started RegEx not long ago but I think this would work fine.
RegEx:
(invoice)[^a-zA-Z]
Explanation:
() Creates a Capture Group
invoice looks for the match for invoice
[^a-zA-Z] Requires the next character to be something other than a letter (a-z or A-Z)
Example:
Sample: This was invoiced
Result: No Result
Sample: This is an invoice.
Result: Match on invoice. Capture group 1 = invoice
Implementation:
Dim m As Match = Regex.Match(pdfText.ToLower(), "(invoice)[^a-zA-Z]")
' If successful, write the captured group.
If m.Success Then
    Dim key As String = m.Groups(1).Value
    Console.WriteLine(key)
End If
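The pattern above has two limits worth knowing: it needs a character after the word, so invoice at the very end of the text is not matched, and a hyphen counts as a non-letter, so invoice-process would still match. A minimal Python sketch of a stricter variant follows. It assumes that "part of a longer word" means a neighbouring letter or hyphen; the function names contains_whole_word and any_whole_word are made up for illustration.

```python
import re

def contains_whole_word(text, word):
    """Return True if `word` occurs in `text` with no letter or hyphen
    directly before or after it (assumed definition of "exact match")."""
    pattern = r"(?<![A-Za-z-])" + re.escape(word) + r"(?![A-Za-z-])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def any_whole_word(text, words):
    """Return True if at least one of `words` is found as a whole word."""
    return any(contains_whole_word(text, w) for w in words)

pdf_text = "This is an invoice. Please pay. Net amount to be invoiced"
found = any_whole_word(pdf_text, ["Invoice", "Receipt"])
```

The lookarounds check the neighbouring characters without consuming them, so the word is also found at the start or end of the text. Digits are not treated as part of a word here; add 0-9 to both character classes if invoice2 should be rejected as well.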

Related

Check occurrence of a string, a part of which might vary

I am trying to search for a string "SAMPLEX_FIND", where the letter in place of "X" may vary. For example: "ABCSAMPLEA_FINDER", "asrSAMPLEB_FINDer", etc.
You can use regex for that:
import re
example = "ABCSAMPLEA_FINDER"
print(re.findall(r".*SAMPLE._FIND.*", example)[0])
Note that if the pattern is not found, re.findall returns an empty list, so indexing it with [0] raises an IndexError.
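One way to avoid the empty-list problem is to use re.search and test the result before using it. The sketch below assumes the varying character is always a single ASCII letter (the answer above uses . for any character); find_sample is a hypothetical helper name.

```python
import re

def find_sample(text):
    """Return the matched 'SAMPLE?_FIND' fragment, or None if it is absent."""
    match = re.search(r"SAMPLE[A-Za-z]_FIND", text)
    return match.group(0) if match else None

first = find_sample("ABCSAMPLEA_FINDER")
second = find_sample("asrSAMPLEB_FINDer")
missing = find_sample("nothing to see here")
```

Returning None instead of indexing into a list lets the caller decide what a missing match should mean.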

replace 2 or 3 words in one sentence in one cell from another words in another cell on Excel

I was looking for a solution and I found it here
replacing many words every one with alternative word
But now I'm using an alternative version of the code that I got from the link below that post, which is case-sensitive.
Function SubstituteMultipleCS(text As String, old_text As Range, new_text As Range) As String
    Dim i As Long
    Dim Result As String
    For i = 1 To old_text.Cells.Count
        Result = Replace(text, old_text.Cells(i), new_text.Cells(i))
        text = Result
    Next i
    SubstituteMultipleCS = Result
End Function
I'm using it to make German Anki cards so I need to replace some words with ___. It's working with one single word or a bunch of words if they are together, but...
The problem is the following:
Some verbs conjugation have a sentence structure when I must place the main verb after the noun and the particle, which belongs to the verb, at the end. Something like this
As you can see in the picture, the verb "schaute an" is not replaced by the new word because "schaute" is separated from "an" in the original sentence.
Is there any way to fix this?
thank you.
Here is a formula you may use (which works for your current sample data):
Formula in C2:
=IFERROR(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(" "&SUBSTITUTE(B2,"."," ")&" "," "&FILTERXML("<t><s>"&SUBSTITUTE(A2," ","</s><s>")&"</s></t>","//s[position() = 1]")&" ",D2,1),IFERROR(" "&FILTERXML("<t><s>"&SUBSTITUTE(A2," ","</s><s>")&"</s></t>","//s[position() = 2]")&" ",""),D2,1),IFERROR(" "&FILTERXML("<t><s>"&SUBSTITUTE(A2," ","</s><s>")&"</s></t>","//s[position() = 3]")&" ",""),D2,1))&".","")
The advantage of nested substitutes is that we can tell the function to replace only the first occurrence if a word could occur more than once in the sentence. Not sure if it's watertight.
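The idea behind the formula (split the phrase into its words and replace the first whole-word occurrence of each) can be sketched in Python as follows. This is only an illustration of the approach, not a translation of the formula; blank_out_words and the sample sentence are made up.

```python
import re

def blank_out_words(sentence, phrase, blank="___"):
    """Replace the first whole-word occurrence of every word in `phrase`,
    even when the words are separated in `sentence`."""
    for word in phrase.split():
        pattern = r"\b" + re.escape(word) + r"\b"
        sentence = re.sub(pattern, blank, sentence, count=1)
    return sentence

card = blank_out_words("Er schaute den Film an.", "schaute an")
```

Because of the \b word boundaries, the particle "an" is not replaced inside longer words, and count=1 limits each word to its first occurrence.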

Lua: Search a specific string

Hi all, I have tried all the string patterns and library arguments but I am still stuck.
I want to get the name of the director from the following string. I have tried string.match, but it matches from the first occurrence it finds in the string.
The string is...
fixstrdirector = {id:39254,cast:[{id:15250,name:Hope Davis,character:Aunt Debra,order:5,cast_id:10,profile_path:/aIHF11Ss8P0A8JUfiWf8OHPVhOs.jpg},{id:53650,name:Anthony Mackie,character:Finn,order:3,cast_id:11,profile_path:/5VGGJ0Co8SC94iiedWb2o3C36T.jpg},{id:19034,name:Evangeline Lilly,character:Bailey Tallet,order:2,cast_id:12,profile_path:/oAOpJKgKEdW49jXrjvUcPcEQJb3.jpg},{id:6968,name:Hugh Jackman,character:Charlie Kenton,order:0,cast_id:13,profile_path:/wnl7esRbP3paALKn4bCr0k8qaFu.jpg},{id:79072,name:Kevin Durand,character:Ricky,order:4,cast_id:14,profile_path:/c95tTUjx5T0D0ROqTcINojpH6nB.jpg},{id:234479,name:Dakota Goyo,character:Max Kenton,order:1,cast_id:15,profile_path:/7PU6n4fhDuFwuwcYVyRNVEZE7ct.jpg},{id:8986,name:James Rebhorn,character:Marvin,order:6,cast_id:16,profile_path:/ezETMv0YM0Rg6YhKpu4vHuIY37D.jpg},{id:930729,name:Marco Ruggeri,character:Cliff,order:7,cast_id:17,profile_path:/1Ox63ukTd2yfOf1LVJOMXwmeQjO.jpg},{id:19860,name:Karl Yune,character:Tak Mashido,order:8,cast_id:18,profile_path:/qK315vPObCNdywdRN66971FtFez.jpg},{id:111206,name:Olga Fonda,character:Farra Lemkova,order:9,cast_id:19,profile_path:/j1qabOHf3Pf82f1lFpUmdF5XvSp.jpg},{id:53176,name:John Gatins,character:Kingpin,order:10,cast_id:41,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:1126350,name:Sophie Levy,character:Big Sister,order:11,cast_id:42,profile_path:null},{id:1126351,name:Tess Levy,character:Little Sister,order:12,cast_id:43,profile_path:null},{id:1126352,name:Charlie Levy,character:Littlest Sister,order:13,cast_id:44,profile_path:null},{id:187983,name:Gregory Sims,character:Bill Panner,order:14,cast_id:45,profile_path:null}],crew:[{id:58726,name:Leslie Bohem,department:Writing,job:Screenplay,profile_path:null},{id:53176,name:John Gatins,department:Writing,job:Screenplay,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:17825,name:Shawn Levy,department:Directing,job:Director,profile_path:/7f2f8EXdlWsPYN0HPGcIlG21xU.jpg},{id:12415,name:Richard 
Matheson,department:Writing,job:Story,profile_path:null},{id:57113,name:Dan Gilroy,department:Writing,job:Story,profile_path:null},{id:25210,name:Jeremy Leven,department:Writing,job:Story,profile_path:null},{id:17825,name:Shawn Levy,department:Production,job:Producer,profile_path:/7f2f8EXdlWsPYN0HPGcIlG21xU.jpg},{id:34970,name:Susan Montford,department:Production,job:Producer,profile_path:/1XJt51Y9ciPhkHrAYE0j6Jsmgji.jpg},{id:3183,name:Don Murphy,department:Production,job:Producer,profile_path:null},{id:34967,name:Rick Benattar,department:Production,job:Producer,profile_path:null},{id:1126348,name:Eric Hedayat,department:Production,job:Producer,profile_path:null},{id:186721,name:Ron Ames,department:Production,job:Producer,profile_path:null},{id:10956,name:Josh McLaglen,department:Production,job:Executive Producer,profile_path:null},{id:57634,name:Mary McLaglen,department:Production,job:Executive Producer,profile_path:null},{id:23779,name:Jack Rapke,department:Production,job:Executive Producer,profile_path:null},{id:488,name:Steven Spielberg,department:Production,job:Executive Producer,profile_path:/cuIYdFbEe89PHpoiOS9tmo84ED2.jpg},{id:30,name:Steve Starkey,department:Production,job:Executive Producer,profile_path:null},{id:24,name:Robert Zemeckis,department:Production,job:Executive Producer,profile_path:/isCuZ9PWIOyXzdf3ihodXzjIumL.jpg},{id:531,name:Danny Elfman,department:Sound,job:Original Music Composer,profile_path:/pWacZpYPos8io22nEiim7d3wp2j.jpg},{id:18265,name:Mauro Fiore,department:Crew,job:Cinematography,profile_path:null},{id:54271,name:Dean Zimmerman,department:Editing,job:Editor,profile_path:null},{id:25365,name:Richard Hicks,department:Production,job:Casting,profile_path:null},{id:5490,name:David Rubin,department:Production,job:Casting,profile_path:null},{id:52088,name:Tom Meyer,department:Art,job:Production Design,profile_path:null}]}
I have tried string.match(fixstrdirector,"name:(.+),department:Directing")
but it gives me everything from the first occurrence of name: it finds up to the director's name.
output:
Hope Davis,character:Aunt Debra,order:5,cast_id:10,profile_path:/aIHF11Ss8P0A8JUfiWf8OHPVhOs.jpg},{id:53650,name:Anthony Mackie,character:Finn,order:3,cast_id:11,profile_path:/5VGGJ0Co8SC94iiedWb2o3C36T.jpg},{id:19034,name:Evangeline Lilly,character:Bailey Tallet,order:2,cast_id:12,profile_path:/oAOpJKgKEdW49jXrjvUcPcEQJb3.jpg},{id:6968,name:Hugh Jackman,character:Charlie Kenton,order:0,cast_id:13,profile_path:/wnl7esRbP3paALKn4bCr0k8qaFu.jpg},{id:79072,name:Kevin Durand,character:Ricky,order:4,cast_id:14,profile_path:/c95tTUjx5T0D0ROqTcINojpH6nB.jpg},{id:234479,name:Dakota Goyo,character:Max Kenton,order:1,cast_id:15,profile_path:/7PU6n4fhDuFwuwcYVyRNVEZE7ct.jpg},{id:8986,name:James Rebhorn,character:Marvin,order:6,cast_id:16,profile_path:/ezETMv0YM0Rg6YhKpu4vHuIY37D.jpg},{id:930729,name:Marco Ruggeri,character:Cliff,order:7,cast_id:17,profile_path:/1Ox63ukTd2yfOf1LVJOMXwmeQjO.jpg},{id:19860,name:Karl Yune,character:Tak Mashido,order:8,cast_id:18,profile_path:/qK315vPObCNdywdRN66971FtFez.jpg},{id:111206,name:Olga Fonda,character:Farra Lemkova,order:9,cast_id:19,profile_path:/j1qabOHf3Pf82f1lFpUmdF5XvSp.jpg},{id:53176,name:John Gatins,character:Kingpin,order:10,cast_id:41,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:1126350,name:Sophie Levy,character:Big Sister,order:11,cast_id:42,profile_path:null},{id:1126351,name:Tess Levy,character:Little Sister,order:12,cast_id:43,profile_path:null},{id:1126352,name:Charlie Levy,character:Littlest Sister,order:13,cast_id:44,profile_path:null},{id:187983,name:Gregory Sims,character:Bill Panner,order:14,cast_id:45,profile_path:null}],crew:[{id:58726,name:Leslie Bohem,department:Writing,job:Screenplay,profile_path:null},{id:53176,name:John Gatins,department:Writing,job:Screenplay,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:17825,name:Shawn Levy
You're searching from the first occurrence of "name:" until the "department:Directing" with everything in between.
Instead, you need to restrict what can be between the two strings. Here for example I'm saying that the characters that make up the name can only be alphanumeric or a space:
string.match(fixstrdirector,"name:([%w ]+),department:Directing")
Alternatively, given that there's a comma separating the parameters, a better approach would be to search for "name:" followed by any characters other than a comma, followed by "department:Directing":
string.match(fixstrdirector,"name:([^,]+),department:Directing")
Of course that wouldn't work if the name had a comma in it!
Lua patterns provide the - modifier for tasks like the one above. As stated in PiL, Section 20.2:
The + modifier matches one or more characters of the original class.
It will always get the longest sequence that matches the pattern.
Like *, the modifier - also matches zero or more occurrences of
characters of the original class. However, instead of matching the
longest sequence, it matches the shortest one.
Next, when you use . in a pattern, it matches any character, so the match runs from the first occurrence of name: until ,department:Directing is found. Since you know the data is JSON-like, you can match [^,] instead; that is, non-comma characters.
So, for your case try:
local tAllNames = {}
for sName in fixstrdirector:gmatch( "name:([^,]-),department:Directing" ) do
    tAllNames[ #tAllNames + 1 ] = sName
end
and all your required names will be stored in the table tAllNames. An example of the above can be seen at codepad.
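The same effect can be reproduced with Python's re module, shown here on a shortened, made-up sample in the same key:value layout as the string above. The sketch assumes names never contain a comma.

```python
import re

# Shortened sample in the same layout as fixstrdirector (made up for illustration).
sample = (
    "cast:[{id:1,name:Hope Davis,character:Aunt Debra}],"
    "crew:[{id:2,name:John Gatins,department:Writing,job:Screenplay},"
    "{id:3,name:Shawn Levy,department:Directing,job:Director}]"
)

# Unrestricted: the match starts at the FIRST "name:" and runs on to the director.
unrestricted = re.search(r"name:(.+),department:Directing", sample).group(1)

# Restricted: the captured name may not contain a comma, so only the
# "name:" directly in front of ",department:Directing" can match.
directors = re.findall(r"name:([^,]+),department:Directing", sample)
```

Switching to a lazy quantifier alone would not help, because the match would still begin at the first name: in the string; restricting the character class is what keeps the capture inside one field.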

C# 4.0 function to check for first four characters in the string

I need to check whether a string is a valid code name.
So, my string can have values like below:
String test = "C000. ", "C010. ", "C020. ", "C030. ", "CA00. ","C0B0. ","C00C. "
So my function needs to validate below conditions:
It should start with C
After that next 3 characters should be numeric before .
Rest it can be anything.
So in above string values, only ["C000.", "C010.", "C020.", "C030."] are valid ones.
EDIT:
Below is the code I tried:
if (nameObject.Title.StartsWith(String.Format("^[C][0-9]{3}$",nameObject.Title)))
I'd suggest a regex, for example (written off the top of my head, may need work):
string s = "C030.";
// ^ anchors the match at the start, so the string must begin with C.
Regex reg = new Regex(@"^C[0-9]{3}\.");
bool isMatch = reg.IsMatch(s);
This regex should do the trick:
Regex.IsMatch(input, @"^C[0-9]{3}\..*")
Check out http://www.techotopia.com/index.php/Working_with_Strings_in_C_Sharp
for a quick tutorial on (among other things) individual access of string elements, so you can test each element for your criteria.
If you think your criteria may change, using regular expressions gives you maximum flexibility (but is more runtime intensive than regular string-element evaluation). In your case, it may be overkill, IMHO.
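Both approaches mentioned above (a regular expression and plain per-character checks) can be compared on the sample values from the question. This is a minimal Python sketch; the function names are made up and the rule is assumed to be exactly "C, three ASCII digits, a dot, then anything".

```python
import re

CODE_PATTERN = re.compile(r"C[0-9]{3}\.")

def is_valid_code_regex(title):
    """re.match anchors at the start of the string; the rest may be anything."""
    return CODE_PATTERN.match(title) is not None

def is_valid_code_plain(title):
    """Same rule using only slicing and character checks."""
    digits = title[1:4]
    return (
        title.startswith("C")
        and len(digits) == 3
        and digits.isascii()
        and digits.isdigit()
        and title[4:5] == "."
    )

codes = ["C000. ", "C010. ", "C020. ", "C030. ", "CA00. ", "C0B0. ", "C00C. "]
valid = [code for code in codes if is_valid_code_regex(code)]
```

The plain version avoids the regex engine but is longer and easier to get subtly wrong, which is the trade-off the last answer describes.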

Split string by first delimiter

I have a column with a long list of folder and file names. The folders and file names vary. I want to extract the file name from the column into another column, but I am struggling to do this in Excel.
Example of column data:(files and folder altered to hide details that should not be public)
c:\data\1\nc2\media\ss\system media\ne\d - wnd enging works v5.swf
c:\data\1\nc2\media\ss\special campaigns\samns dec 2012\trainerv5.swf
C:\Local\Messages\17362~000000001~20131231235910~4.MUF
c:\data\1\nc2\media\ss\system media\tl\nd - tfl statusv4.swf
c:\data\1\nc2\media\ss\system media\core\ss_bagage v2.swf
I know I should be able to search from the right to the first occurrence of "\" but I can't figure out the syntax.
Many thanks
UPDATE:
The formula =RIGHT(B2,LEN(B2)-SEARCH("\",B2,1)) should work, but it returns incorrect results. If I change it to search for "." it pulls out the file extension, so there is a key item I'm missing.
=RIGHT(A1,LEN(A1)-FIND("~",SUBSTITUTE(A1,"\","~",LEN(A1)-LEN(SUBSTITUTE(A1,"\","")))))
Copy it into any column, say B, drag it down, and you are done.
VBA is a more efficient option if you have many files to parse. Create a module and add the below:
Function GetFileName(file As String) As String
    ' Late-bound FileSystemObject, so no library reference is needed.
    Dim fso As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    GetFileName = fso.GetFileName(file)
End Function
There are several different ways to get the text following the last slash in a string, including the following formula. In this example, H15 is the cell containing the string to search. If it can't find a slash, it returns the "-" (dash) character.
=iferror(RIGHT(H15,LEN(H15)-SEARCH("|",SUBSTITUTE(H15,"/","|",LEN(H15)-LEN(SUBSTITUTE(H15,"/",""))))),"-")
The formula first finds the number of slashes in the string. LEN of the original string gives its total length, and LEN of the string after SUBSTITUTE has removed every slash gives the length without them; the difference between the two is the number of slashes.
Then, you substitute in a marker character(I used "|") for the last slash. By searching for the marker, you find where the bit after the slash starts. The total length of the string minus where the marker starts tells you how many characters to take from the right, which you then do.
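The steps the formula performs (count the separators, locate the last one, take everything to its right) can be sketched in Python as follows. text_after_last is a made-up name, and the default separator is the backslash from the question rather than the forward slash used in the formula.

```python
def text_after_last(path, sep="\\", default="-"):
    """Return the text after the last `sep`, or `default` if `sep` is absent."""
    # Same counting trick as the formula: length with minus length without.
    # This gives the number of separators for a one-character `sep`.
    count = len(path) - len(path.replace(sep, ""))
    if count == 0:
        return default
    # rfind locates the last separator directly, so no marker character is needed.
    return path[path.rfind(sep) + 1:]

name = text_after_last(r"c:\data\1\nc2\media\ss\system media\core\ss_bagage v2.swf")
```

Searching from the right also sidesteps the case where the chosen marker character already occurs in a folder name.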
If you need more generic string parsing and are willing to use a little bit of VBA, you can use the split function as suggested by Jamie Bull in his answer to this question on SuperUser.
His function will use any character you choose to split the string into segments and return whichever segment you choose.
I've copied Jamie's function here for convenient reference:
Function STR_SPLIT(str, sep, n) As String
    Dim V() As String
    V = Split(str, sep)
    STR_SPLIT = V(n - 1)
End Function
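For comparison, a rough Python counterpart of the function above might look like this. It keeps the 1-based segment number n; str_split is only an illustrative name.

```python
def str_split(text, sep, n):
    """Split `text` on `sep` and return the n-th segment (1-based)."""
    parts = text.split(sep)
    return parts[n - 1]

second = str_split("data/media/file.swf", "/", 2)
last = "data/media/file.swf".split("/")[-1]
```

In Python the last segment is available directly with index -1, so the segment count does not need to be known in advance.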

Resources