find matching string in two strings - string

I would like to get hints for a perl script that finds the longest common substring present in two strings. Each string is maximal 500 characters long.
For example
abcsffwqfwqsdfasdfTHISISANAPPLEfasdfasdfsdfsadfasdfsdaf4353.54.4fdfsdgg
detertqteqtTHISISANAPPLEafsedfgwetwqrgtwrgtwetpqw4t5osdavm\wert4384..53
The output should be THISISANAPPLE
Sounds easy, but may not be trivial.
Anyone has an idea?

Check String::LCSS_XS
use String::LCSS_XS 'lcss';
my ($s1,$s2) = (
"abcsffwqfwqsdfasdfTHISISANAPPLEfasdfasdfsdfsadfasdfsdaf4353.54.4fdfsdgg",
"detertqteqtTHISISANAPPLEafsedfgwetwqrgtwrgtwetpqw4t5osdavm\wert4384..53"
);
my $longest = lcss ($s1, $s2);
print "$longest\n";
output
THISISANAPPLE

Related

Regex to find numbers that are not in a phrase

If I have e.g. this string:
Beschreibung Menge VK-Preis MwSt% Betrag
Schadenbewertunginkl.Restwertermittlung 1 25,00€ 19 25,00€
Rechnungsbetragexcl.MwSt.: 25,00€
MwSt.(19%): 4,75€
Rechnungsbetragincl.MwSt.: 123.029,75€
I want to extract all the numbers.
My regexes are:
regex_up_to_thousand = r'\b(?:\d{1,3}){1}(?:,{1}\d{2})\b'
and
regex_every_price = r'\b(?:\d{1,3}(\.|,))+(:?\d{3}(\.|,))(?:\d{2})\b'
My idea was to first get the "big" prices, remove them from the text and get the other numbers.
Which works in most cases, until I have a date that looks like this maybe
Gutachtennummer: 1009126 Leistungsdatum: 11.10.2021
I would get the 11.10 with my second regex, and I don't know how to prevent this.
I thought the \b would help, but sadly not.
Any ideas?
It's not the end of the world, since I do a lot of math in the background, but it's a possibility that a date would fit some values and I calculate something wrong in the end.
You could try the following pattern.
\b\d+(?:(?:\.|,)\d{3})*(?:(?:\.|,)\d{2})\b(?!\W\d)
The main thing is (?!\W\d) at the end which ensures that after your amount you will not have a construct of 1 non-word character followed by 1 digit.
Example: https://regex101.com/r/q1ic9S/1

Replacing a certain part of string with a pre-specified Value

I am fairly new to Puppet and Ruby. Most likely this question has been asked before but I am not able to find any relevant information.
In my puppet code I will have a string variable retrieved from the fact hostname.
$n="$facts['hostname'].ex-ample.com"
I am expecting to get the values like these
DEV-123456-02B.ex-ample.com,
SCC-123456-02A.ex-ample.com,
DEV-123456-03B.ex-ample.com,
SCC-999999-04A.ex-ample.com
I want to perform the following action. Change the string to lowercase and then replace the
-02, -03 or -04 to -01.
So my output would be like
dev-123456-01b.ex-ample.com,
scc-123456-01a.ex-ample.com,
dev-123456-01b.ex-ample.com,
scc-999999-01a.ex-ample.com
I figured I would need to use .downcase on $n to make everything lowercase. But I am not sure how to replace the digits. I was thinking of .gsub or split but not sure how. I would prefer to make this happen in a oneline code.
If you really want a one-liner, you could run this against each string:
str
.downcase
.split('-')
.map
.with_index { |substr, i| i == 2 ? substr.gsub(/0[0-9]/, '01') : substr }
.join('-')
Without knowing what format your input list is taking, I'm not sure how to advise on how to iterate through it, but maybe you have that covered already. Hope it helps.
Note that Puppet and Ruby are entirely different languages and the other answers are for Ruby and won't work in Puppet.
What you need is:
$h = downcase(regsubst($facts['hostname'], '..(.)$', '01\1'))
$n = "${h}.ex-ample.com"
notice($n)
Note:
The downcase and regsubst functions come from stdlib.
I do a regex search and replace using the regsubst function and replace ..(.)$ - 2 characters followed by another one that I capture at the end of the string and replace that with 01 and the captured string.
All of that is then downcased.
If the -01--04 part is always on the same string index you could use that to replace the content.
original = 'DEV-123456-02B.ex-ample.com'
# 11 -^
string = original.downcase # creates a new downcased string
string[11, 2] = '01' # replace from index 11, 2 characters
string #=> "dev-123456-01b.ex-ample.com"

string parts seperated by ; to ASCII written in a new string

Something like that is coming in:
str="Hello;this;is;a;text"
What I do want as result is this:
result="72:101:108:108:111;116:104:105:115;..."
which should be the Text in ASCII.
You could use string matching to get each word separated by ; and then convert, concat:
local str = "Hello;this;is;a;text"
for word in str:gmatch("[^;]+") do
ascii = table.pack(word:byte(1, -1))
local converted = table.concat(ascii, ":")
print(converted)
end
The output of the above code is:
72:101:108:108:111
116:104:105:115
105:115
97
116:101:120:116
I'll leave the rest of work to you. Hint: use table.concat.
Here is another approach, which exploits that fact that gsub accepts a table where it reads replacements:
T={}
for c=0,255 do
T[string.char(c)]=c..":"
end
T[";"]=";"
str="Hello;this;is;a;text"
result=str:gsub(".",T):gsub(":;",";")
print(result)
Another possibility:
function convert(s)
return (s:gsub('.',function (s)
if s == ';' then return s end
return s:byte()..':'
end)
:gsub(':;',';')
:gsub(':$',''))
end
print(convert 'Hello;this;is;a;text')
Finding certain character or string (such as ";") can be done by using string.find - https://www.lua.org/pil/20.1.html
Converting character to its ASCII code can be done by string.byte - https://www.lua.org/pil/20.html
What you need to do is build a new string using two functions mentioned above. If you need more string-based functions please visit official Lua site: https://www.lua.org/pil/contents.html
Okay...I got way further, but I can't find how to return a string made up of two seperate strings like
str=str1&" "&str2

Reading from a string using sscanf in Matlab

I'm trying to read a string in a specific format
RealSociedad
this is one example of string and what I want to extract is the name of the team.
I've tried something like this,
houseteam = sscanf(str, '%s');
but it does not work, why?
You can use regexprep like you did in your post above to do this for you. Even though your post says to use sscanf and from the comments in your post, you'd like to see this done using regexprep. You would have to do this using two nested regexprep calls, and you can retrieve the team name (i.e. RealSociedad) like so, given that str is in the format that you have provided:
str = 'RealSociedad';
houseteam = regexprep(regexprep(str, '^<a(.*)">', ''), '</a>$', '')
This looks very intimidating, but let's break this up. First, look at this statement:
regexprep(str, '^<a(.*)">', '')
How regexprep works is you specify the string you want to analyze, the pattern you are searching for, then what you want to replace this pattern with. The pattern we are looking for is:
^<a(.*)">
This says you are looking for patterns where the beginning of the string starts with a a<. After this, the (.*)"> is performing a greedy evaluation. This is saying that we want to find the longest sequence of characters until we reach the characters of ">. As such, what the regular expression will match is the following string:
<ahref="/teams/spain/real-sociedad-de-futbol/2028/">
We then replace this with a blank string. As such, the output of the first regexprep call will be this:
RealSociedad</a>
We want to get rid of the </a> string, and so we would make another regexprep call where we look for the </a> at the end of the string, then replace this with the blank string yet again. The pattern you are looking for is thus:
</a>$
The dollar sign ($) symbolizes that this pattern should appear at the end of the string. If we find such a pattern, we will replace it with the blank string. Therefore, what we get in the end is:
RealSociedad
Found a solution. So, %s stops when it finds a space.
str = regexprep(str, '<', ' <');
str = regexprep(str, '>', '> ');
houseteam = sscanf(str, '%*s %s %*s');
This will create a space between my desired string.

Lua: Search a specific string

Hi all tried all the string pattrens and library arguments but still stuck.
i want to get the name of the director from the following string i have tried the string.matcH but it matches the from the first character it finD from the string
the string is...
fixstrdirector = {id:39254,cast:[{id:15250,name:Hope Davis,character:Aunt Debra,order:5,cast_id:10,profile_path:/aIHF11Ss8P0A8JUfiWf8OHPVhOs.jpg},{id:53650,name:Anthony Mackie,character:Finn,order:3,cast_id:11,profile_path:/5VGGJ0Co8SC94iiedWb2o3C36T.jpg},{id:19034,name:Evangeline Lilly,character:Bailey Tallet,order:2,cast_id:12,profile_path:/oAOpJKgKEdW49jXrjvUcPcEQJb3.jpg},{id:6968,name:Hugh Jackman,character:Charlie Kenton,order:0,cast_id:13,profile_path:/wnl7esRbP3paALKn4bCr0k8qaFu.jpg},{id:79072,name:Kevin Durand,character:Ricky,order:4,cast_id:14,profile_path:/c95tTUjx5T0D0ROqTcINojpH6nB.jpg},{id:234479,name:Dakota Goyo,character:Max Kenton,order:1,cast_id:15,profile_path:/7PU6n4fhDuFwuwcYVyRNVEZE7ct.jpg},{id:8986,name:James Rebhorn,character:Marvin,order:6,cast_id:16,profile_path:/ezETMv0YM0Rg6YhKpu4vHuIY37D.jpg},{id:930729,name:Marco Ruggeri,character:Cliff,order:7,cast_id:17,profile_path:/1Ox63ukTd2yfOf1LVJOMXwmeQjO.jpg},{id:19860,name:Karl Yune,character:Tak Mashido,order:8,cast_id:18,profile_path:/qK315vPObCNdywdRN66971FtFez.jpg},{id:111206,name:Olga Fonda,character:Farra Lemkova,order:9,cast_id:19,profile_path:/j1qabOHf3Pf82f1lFpUmdF5XvSp.jpg},{id:53176,name:John Gatins,character:Kingpin,order:10,cast_id:41,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:1126350,name:Sophie Levy,character:Big Sister,order:11,cast_id:42,profile_path:null},{id:1126351,name:Tess Levy,character:Little Sister,order:12,cast_id:43,profile_path:null},{id:1126352,name:Charlie Levy,character:Littlest Sister,order:13,cast_id:44,profile_path:null},{id:187983,name:Gregory Sims,character:Bill Panner,order:14,cast_id:45,profile_path:null}],crew:[{id:58726,name:Leslie Bohem,department:Writing,job:Screenplay,profile_path:null},{id:53176,name:John Gatins,department:Writing,job:Screenplay,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:17825,name:Shawn Levy,department:Directing,job:Director,profile_path:/7f2f8EXdlWsPYN0HPGcIlG21xU.jpg},{id:12415,name:Richard Matheson,department:Writing,job:Story,profile_path:null},{id:57113,name:Dan Gilroy,department:Writing,job:Story,profile_path:null},{id:25210,name:Jeremy Leven,department:Writing,job:Story,profile_path:null},{id:17825,name:Shawn Levy,department:Production,job:Producer,profile_path:/7f2f8EXdlWsPYN0HPGcIlG21xU.jpg},{id:34970,name:Susan Montford,department:Production,job:Producer,profile_path:/1XJt51Y9ciPhkHrAYE0j6Jsmgji.jpg},{id:3183,name:Don Murphy,department:Production,job:Producer,profile_path:null},{id:34967,name:Rick Benattar,department:Production,job:Producer,profile_path:null},{id:1126348,name:Eric Hedayat,department:Production,job:Producer,profile_path:null},{id:186721,name:Ron Ames,department:Production,job:Producer,profile_path:null},{id:10956,name:Josh McLaglen,department:Production,job:Executive Producer,profile_path:null},{id:57634,name:Mary McLaglen,department:Production,job:Executive Producer,profile_path:null},{id:23779,name:Jack Rapke,department:Production,job:Executive Producer,profile_path:null},{id:488,name:Steven Spielberg,department:Production,job:Executive Producer,profile_path:/cuIYdFbEe89PHpoiOS9tmo84ED2.jpg},{id:30,name:Steve Starkey,department:Production,job:Executive Producer,profile_path:null},{id:24,name:Robert Zemeckis,department:Production,job:Executive Producer,profile_path:/isCuZ9PWIOyXzdf3ihodXzjIumL.jpg},{id:531,name:Danny Elfman,department:Sound,job:Original Music Composer,profile_path:/pWacZpYPos8io22nEiim7d3wp2j.jpg},{id:18265,name:Mauro Fiore,department:Crew,job:Cinematography,profile_path:null},{id:54271,name:Dean Zimmerman,department:Editing,job:Editor,profile_path:null},{id:25365,name:Richard Hicks,department:Production,job:Casting,profile_path:null},{id:5490,name:David Rubin,department:Production,job:Casting,profile_path:null},{id:52088,name:Tom Meyer,department:Art,job:Production Design,profile_path:null}]}
i have tried string.match(fixstrdirector,"name:(.+),department:Directing")
but it gives me the from the first occurace it find the name to the end of thr string
output:
Hope Davis,character:Aunt Debra,order:5,cast_id:10,profile_path:/aIHF11Ss8P0A8JUfiWf8OHPVhOs.jpg},{id:53650,name:Anthony Mackie,character:Finn,order:3,cast_id:11,profile_path:/5VGGJ0Co8SC94iiedWb2o3C36T.jpg},{id:19034,name:Evangeline Lilly,character:Bailey Tallet,order:2,cast_id:12,profile_path:/oAOpJKgKEdW49jXrjvUcPcEQJb3.jpg},{id:6968,name:Hugh Jackman,character:Charlie Kenton,order:0,cast_id:13,profile_path:/wnl7esRbP3paALKn4bCr0k8qaFu.jpg},{id:79072,name:Kevin Durand,character:Ricky,order:4,cast_id:14,profile_path:/c95tTUjx5T0D0ROqTcINojpH6nB.jpg},{id:234479,name:Dakota Goyo,character:Max Kenton,order:1,cast_id:15,profile_path:/7PU6n4fhDuFwuwcYVyRNVEZE7ct.jpg},{id:8986,name:James Rebhorn,character:Marvin,order:6,cast_id:16,profile_path:/ezETMv0YM0Rg6YhKpu4vHuIY37D.jpg},{id:930729,name:Marco Ruggeri,character:Cliff,order:7,cast_id:17,profile_path:/1Ox63ukTd2yfOf1LVJOMXwmeQjO.jpg},{id:19860,name:Karl Yune,character:Tak Mashido,order:8,cast_id:18,profile_path:/qK315vPObCNdywdRN66971FtFez.jpg},{id:111206,name:Olga Fonda,character:Farra Lemkova,order:9,cast_id:19,profile_path:/j1qabOHf3Pf82f1lFpUmdF5XvSp.jpg},{id:53176,name:John Gatins,character:Kingpin,order:10,cast_id:41,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:1126350,name:Sophie Levy,character:Big Sister,order:11,cast_id:42,profile_path:null},{id:1126351,name:Tess Levy,character:Little Sister,order:12,cast_id:43,profile_path:null},{id:1126352,name:Charlie Levy,character:Littlest Sister,order:13,cast_id:44,profile_path:null},{id:187983,name:Gregory Sims,character:Bill Panner,order:14,cast_id:45,profile_path:null}],crew:[{id:58726,name:Leslie Bohem,department:Writing,job:Screenplay,profile_path:null},{id:53176,name:John Gatins,department:Writing,job:Screenplay,profile_path:/A2MqnSKVzOuBf8MVfNyve2h2LxJ.jpg},{id:17825,name:Shawn Levy
You're searching from the first occurrence of "name:" until the "department:Directing" with everything in between.
Instead, you need to restrict what can be between the two strings. Here for example I'm saying that the characters that make up the name can only be alphanumeric or a space:
string.match(fixstrdirector,"name:([%w ]+),department:Directing")
Alternatively, given that there's a comma separating the parameters, a better approach would be to search for "name:" followed by any characters other than a comma, followed by "department:Directing":
string.match(fixstrdirector,"name:([^,]+),department:Directing")
Of course that wouldn't work if the name had a comma it in!
Lua patterns provides - modifier for tasks as you have above. As stated on PiL - Section 20.2:
The + modifier matches one or more characters of the original class.
It will always get the longest sequence that matches the pattern.
Like *, the modifier - also matches zero or more occurrences of
characters of the original class. However, instead of matching the
longest sequence, it matches the shortest one.
Next, when you are using . to match, it'll find any and all characters satisfying the pattern. Therefore, you'll get the result from first occurence of name until the ,department:Directing is found. Since you know that it is a JSON data, you can try to match for [^,]; that is, non-comma characters.
So, for your case try:
local tAllNames = {}
for sName in fixstrdirector:gmatch( "name:([^,]-),department:Directing" ) do
tAllNames[ #tAllNames + 1 ] = sName
end
and all your required names will be stored in the table tAllNames. An example of the above can be seen at codepad.

Resources