LevenshteinSim() Approximate string matching - string

I am using levenshteinSim() to do the approximate string matching. I am facing a problem
here is what my data look like
string = "Mitchell"
stringvector = c("Ray Mitchell", "Mitchell Dough","Juila Mitch")
.
I want the algorithm to match only second part of the Stringvector, not the first half..How do i do it. I really appreciate your help. And how do I use weighing schema?
Thanks
Kothavari

I believe you will need to preprocess the data to just pull out the second part of the string and use the algo on that.
Other people seem to do some preproessing first. See here

Related

How to get a substring with Regex in Python

I am trying to formnulate a regex to get the ids from the below two strings examples:
/drugs/2/drug-19904-5106/magnesium-oxide-tablet/details
/drugs/2/drug-19906/magnesium-moxide-tablet/details
In the first case, I should get 19904-5106 and in the second case 19906.
So far I tried several, the closes I could get is [drugs/2/drug]-.*\d but would return g-19904-5106 and g-19907.
Please any help to get ride of the "g-"?
Thank you in advance.
When writing a regex expression, consider the patterns you see so that you can align it correctly. For example, if you know that your desired IDs always appear in something resembling ABCD-1234-5678 where 1234-5678 is the ID you want, then you can use that. If you also know that your IDs are always digits, then you can refine the search even more
For your example, using a regex string like
.+?-(\d+(?:-\d+)*)
should do the trick. In a python script that would look something like the following:
match = re.search(r'.+?-(\d+(?:-\d+)*)', my_string)
if match:
my_id = match.group(1)
The pattern may vary depending on the depth and complexity of your examples, but that works for both of the ones you provided
This is the closest I could find: \d+|.\d+-.\d+

Regex or IndexOf?

I have a long string "AB100123485;AB10064279293-IP-1-KNPO;AB473898487-MM41". I have to extract integer value after "IP-" i.e 1 (only) what is the most efficient way ? I am using c#
Thanks
The 'most-efficient' way depends on how consistent your string is in terms of length and appearance. You can surely do this with a regular expression as a quick solution if you just want to get the digit directly following IP-.
You can utilize the RegularExpressions API, passing in your regular expression and input string.
https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.match?view=netframework-4.8#System_Text_RegularExpressions_Regex_Match_System_String_System_String_
This pattern should get you started IP-[0-9]; refine it more to your use case as needed.
For example:
Match matched = System.Text.RegularExpressions.Match(
"AB100123485;AB10064279293-IP-1-KNPO;AB473898487-MM41",
"IP-[0-9]"
);

Generate string data from regex

I would like to be able to take a regex and generate conforming data using the python hypothesis library. For example given a regex of
regex = re.compile('[a-zA-Z]')
This would match any english alpha characters. An example generator for this could be.
import hypothesis
import string
hypothesis.strategies.text(alphabet=string.ascii_letters)
But Ideally I want to construct a string that will match any regex passed in.
There's a work in progress pull request for adding this feature. Nothing extant will let you do it easily, but looking at the PR might give you a good idea about how to translate any specific example you need.
Update: the from_regex strategy was added in Hypothesis 3.19.

SSRS - How to get a part of a string

I have a parameter called Analyst group in this format :
[Dimension].[Analyst Group].&[Nl.Workplace.Foundation]
I want to pass this parameter to another report, to filter data. Its a multi value parameter. But the other report only accepts it in this format : [KanBan].[Analyst Group].&[Nl.Workplace.Foundation]
So im trying to isolate the "Nl.Workplace.Foundation", so i can do the following thing in the Go To Report parameter expression :="[KanBan].[Analyst Group].&["& --Isolated analyst group-- &"]" to create the desired format.
So what i need is to extract the part between .&[ and ]
But i really have no idea how to isolate that part of the string.
Found a solution! If i just use the Parameter.label instead of Parameter.value it automatically does what i want!
A different solution has been found, but I will still answer the initial question. It could help.
So what i need is to extract the part between .&[ and ]
You could use a regex.
This may not be the fastest way but it can handle most of the situations.
So let's assume you have a string containing:
[Dimension].[Analyst Group].&[Nl.Workplace.Foundation]
And you want to get the following string:
Nl.Workplace.Foundation
Just use the following expression:
=System.Text.RegularExpressions.Regex.Match("[Dimension].[Analyst Group].&[Nl.Workplace.Foundation]", "\.&\[(?<NWF>[^]]+)\]").Groups("NWF").Value
In the expression, replace the input string with your dynamic values, like for example:
=System.Text.RegularExpressions.Regex.Match(Fields!Dimension.Value & "." & Fields!AnalystGroup.Value, "\.&\[(?<NWF>[^]]+)\]").Groups("NWF").Value
I'm keeping the formula as simple as possible so that you can easily adapt it, with, say, handling the case where an input string will not have a match (with the above query it will return #Error).
You could do this by adding an IIF() or better, use a custom function that you can reuse in several places and will reduce the length of your expression.

parsing a string that ends

I have a huge string. I need to extract a substring from that that huge string. The conditions are the string starts with either "TECHNICAL" or "JUSTIFY" or "ALIGN" and ends with a number( any number from 1 to 10) followed by period and then followed by space. so for example, I have
string x = "This is a test, again I am testing TECHNICAL: I need to extract this substring starting with testing. 8. This is test again and again and again and again.";
so I need this
TECHNICAL: I need to extract this substring starting with testing.
I was wondering if someone has elegant solution for that.
I was trying to use the regular expression, but I guess I could not figure out the right expresion.
any help will be appreciated.
Thanks in advance.
Try this: #"((?:TECHNICAL|JUSTIFY|ALIGN).*?)(?:[1-9]|10)\. "

Resources