Comparing a string with a manually added string

I have been racking my brain over this for a week now:
We have an assignment where our program has two options. With option 1, the program asks for a name and a date, and then generates an email addressed to the given name, with that date.
With the second option, we paste text into the program, and it tells us whether the 'template' from option 1 was used, and it gives us the name and the date.
My question is: how do I compare the template string with the manually entered string so that the name and the date (could be 2nd of October, could be 10/02, could be Sunday the 2nd, basically anything that differs from the template) can vary and the program still reports that the template matches?
I thought of cutting the strings up and comparing them word for word, but then what? And how?

Since I do not know what language you are programming in, I will give you some examples of what you ask in the languages I know.
Python(2.7):
x = raw_input('Manual String')  # get user input; this can be replaced
y = 'this is a string: ' + str(x)  # str() in case x is a number or another type
if y == 'this is a string: doubleo':
    print "The strings are equal!"
C:
Use strcmp, as described on this page:
http://www.tutorialspoint.com/ansi_c/c_strcmp.htm
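A plain equality check like the one above cannot handle a name and date that vary. One possible approach, sketched here in Python 3, is to turn the template into a regular expression in which the fixed text must match literally and each placeholder becomes a capture group. The template text, the {name} and {date} placeholder syntax, and the function names are all assumptions made for illustration; the assignment's real template would replace them.

```python
import re

# Hypothetical template; the real assignment text would go here.
TEMPLATE = "Dear {name},\n\nYour appointment is on {date}.\n\nKind regards"


def template_to_regex(template):
    """Build a regex where literal text is escaped and {x} becomes a named group."""
    # re.split with a capture group alternates literal parts and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # fixed template text
        else:
            pattern += "(?P<%s>.+?)" % part   # variable part, e.g. name or date
    return re.compile("^" + pattern + "$", re.DOTALL)


def match_template(text, template=TEMPLATE):
    """Return the captured fields if text follows the template, else None."""
    m = template_to_regex(template).match(text.strip())
    return m.groupdict() if m else None


pasted = "Dear Alice,\n\nYour appointment is on sunday the 2nd.\n\nKind regards"
print(match_template(pasted))
print(match_template("Hello Bob, see you soon"))
```

Because the date group accepts any text, "2nd of October", "10/02" and "Sunday the 2nd" are all treated as a match; interpreting the captured date is a separate step that this sketch does not attempt.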

Related

FuzzyWuzzy for very similar records in Python

I have a dataset with which I want to find the closest string match. For that purpose I'm using FuzzyWuzzy in this way
sol=process.extract(t,dev2,scorer=fuzz.token_sort_ratio)
Where t is the string and dev2 is the list to compare to. My problem is that sometimes the list has very similar records and the options provided by FuzzyWuzzy seem to be lacking. I've tested with token_sort_ratio, token_set_ratio, partial_token_sort_ratio, partial_token_set_ratio, ratio, partial_ratio, and WRatio.
For example, the string Italy - Serie A gives me the following 2 closest matches.
Token_sort_ratio: (92, 'Italy - Serie D');(86, 'Italian - Serie A')
The one wanted is obviously the second one, but character by character the first one is closer, and it is a different league.
This happens with teams as well. If, let's say, I have the string Buchtholz, I would obtain Buchtholz II before I get TSV Buchtholz.
My main guess now would be to try and weight the presence and absence of several characters more heavily, like single capital letters at the end of the string, so if there is a difference in the letter or an absence it is weighted as less close. Or for () and special characters.
I don't know if there is a way to take this into account or you guys have a better approach to get the string that really matches.
Similarity matches often require knowledge of the data being analysed. i.e. it is not just a blind single round of matching. I recommend that you pass your results through more steps of matching, starting with inclusive/optimistic approaches (like token_set_ratio) with low cut off scores and working toward more exclusive/pessimistic approaches with higher cut off scores until you have a clear winner. If you know more about the text you're analyzing, you can even modify the strings as you progress.
In a case I worked on, I did similarity matches of goods movement descriptions. In the descriptions, the number sequences were more important than the text. For example, when looking for a match for "SLURRY VALVE 250MM RAGMAX 2000", the 250 and 2000 parts of the string are important; otherwise I get "SLURRY VALVE 50MM RAGMAX 2000" as the best match instead of "VALVE B/F 250MM,RAGMAX 250RAG2000 RAGON", which is a better result.
I put the similarity match process through two steps: 1. Get a bunch of similar matches using an optimistic matching scorer (token_set_ratio) 2. get the number sequences of these results and pass them through another round of matching with a more strict scorer (token_sort_ratio). Doing this gave me the better result in the example I showed above.
Below are some blocks of code that could be of assistance.
Here's a function to get the numbers from a string. (In your case you might use this to exclude numbers from your string instead?)
def get_numbers_from_string(description):
    numbers = ''.join((ch if ch in '0123456789.-' else ' ') for ch in description)
    numbers = ' '.join([nr for nr in numbers.split()])
    return numbers
and here is a portion of the code I used to put the description match through two rounds:
try:
    # get close match from goods move that has material numbers
    df_material = pd.DataFrame(process.extract(description,
                                               corpus_material,
                                               scorer=fuzz.token_set_ratio),
                               columns=['Similar Text', 'Score']
                               )
    if df_material['Score'][df_material['Score'] >= cut_off_accuracy_materials].count() >= 1:
        similar_text = df_material['Similar Text'].iloc[0]
        score = df_material['Score'].iloc[0]
        if nr_description_numbers > 4:
            # if there are multiple matches found, then get best number combination match
            df_material = df_material[df_material['Score'] >= cut_off_accuracy_materials]
            new_corpus = list(df_material['Similar Text'])
            new_corpus = np.vectorize(get_numbers_from_string)(new_corpus)
            df_material['numbers'] = new_corpus
            df_numbers = pd.DataFrame(process.extract(description_numbers,
                                                      new_corpus,
                                                      scorer=fuzz.token_sort_ratio),
                                      columns=['numbers', 'Score']
                                      )
            similar_text = df_material['Similar Text'][df_material['numbers'] == df_numbers['numbers'].iloc[0]].iloc[0]
            nr_score = df_numbers['Score'].iloc[0]
hope it helps, and good luck
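Applied to the league example from the question, the same two-round idea might look like the sketch below. To keep it self-contained it uses difflib from the standard library as a stand-in for a FuzzyWuzzy scorer, so the scores will differ from token_sort_ratio. The function names, the 0.8 cut-off, and the rule that stand-alone capital letters must agree are assumptions chosen for this particular data, not part of FuzzyWuzzy.

```python
import difflib
import re


def standalone_capitals(text):
    """Return single capital letters that stand alone, e.g. the 'A' in 'Serie A'."""
    return set(re.findall(r"\b[A-Z]\b", text))


def two_stage_match(query, choices, cutoff=0.8):
    """Shortlist by character similarity, then require matching stand-alone capitals."""
    # Round 1 (optimistic): keep every choice whose similarity reaches the cut-off.
    scored = [(difflib.SequenceMatcher(None, query, c).ratio(), c) for c in choices]
    shortlist = sorted((pair for pair in scored if pair[0] >= cutoff), reverse=True)
    if not shortlist:
        return None
    # Round 2 (strict): prefer candidates whose stand-alone capitals equal the query's.
    wanted = standalone_capitals(query)
    strict = [c for _, c in shortlist if standalone_capitals(c) == wanted]
    # Fall back to the best round-1 candidate if the strict filter removes everything.
    return strict[0] if strict else shortlist[0][1]


choices = ["Italy - Serie D", "Italian - Serie A", "Spain - La Liga"]
print(two_stage_match("Italy - Serie A", choices))  # Italian - Serie A
```

The second round encodes knowledge about the data (the league letter is decisive), which is the point made above: a different data set would need a different strict rule.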

String contains substring and substring not part of longer word (exact match)

I have captured the full text of a PDF-file in a string called pdfText.
Next I am looping through an array containing substrings to be found/searched for in the pdfText-string.
One of the substrings is Invoice.
Both pdfText and the substrings I am searching for are converted to lower case.
If at least one of the substrings is found in pdfText, a boolean is set to true.
Now, I have an example where pdfText contains '...Net amount to be invoiced...'. This is the only variant of 'invoice' in the text.
This of course returns true if I use
substring = "Invoice" ... pdfText.contains(substring.ToLower).
But in this case I need it to return false. I need to find only exact matches.
Another example, if the pdfText contains '...This is an invoice. Please pay....Net amount to be invoiced...' the boolean should be set to true because of the first invoice-match, but not the second invoiced-(non)match.
So what I am looking for is to find a substring Invoice in a string pdfText and make sure, that the substring is not part of a longer word invoiced, invoice-process etc.. Note, that invoice. should return True.
I believe this should be possible, but cannot wrap my head around it currently.
I might need to use regex?
This one uses the RegEx, with a slight change, proposed by @Mederic at https://stackoverflow.com/a/45587916/2326360
Use the built-in UiPath activity Is Match, found under Programming->String.
Use it inside your loop, with the current settings.
The RegEx is: substring+"[^a-zA-Z]"
I have declared the following variables:
RegEx would be a good approach.
I only started RegEx not long ago but I think this would work fine.
RegEx:
(invoice)[^a-zA-Z]
Explanation:
() Creates a Capture Group
invoice looks for the match for invoice
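[^a-zA-Z] Requires the next character to be something other than a-z or A-Z (note that a character must follow, so invoice at the very end of the text will not match)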
Example:
Sample: This was invoiced
Result: No Result
Sample: This is an invoice.
Result: Match on invoice. Capture group 1 = invoice
Implementation:
' Requires: Imports System.Text.RegularExpressions
Dim m As Match = Regex.Match(pdfText.ToLower, "(invoice)[^a-zA-Z]")
' If successful, write the group.
If (m.Success) Then
    Dim key As String = m.Groups(1).Value
    Console.WriteLine(key)
End If
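For comparison, here is a minimal Python sketch of the same check that uses lookarounds instead of consuming a following character. It therefore also matches when the word is the last thing in the text, and it treats a hyphen as part of a longer word so that invoice-process is rejected, as the question asks. The function name and the choice of which characters count as "part of a word" are assumptions.

```python
import re


def contains_exact_word(text, word):
    """True if word occurs in text without a letter or hyphen directly before or after it."""
    pattern = r"(?<![A-Za-z-])" + re.escape(word) + r"(?![A-Za-z-])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_exact_word("Net amount to be invoiced", "Invoice"))       # False
print(contains_exact_word("This is an invoice. Please pay", "Invoice"))  # True
print(contains_exact_word("See the invoice-process", "Invoice"))         # False
```

Looping over the array of search terms and setting the boolean as soon as one call returns True would reproduce the behaviour described in the question.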

Struct name from variable in Matlab

I have created a structure containing a few different fields. The fields contain data from a number of different subjects/participants.
At the beginning of the script I prompt the user to enter the "Subject number" like so:
prompt='Enter the subject number in the format SUB_n: ';
SUB=input(prompt,'s');
Example SUB_34 for the 34th subject.
I want to then name my structure such that it contains this string... i.e. I want the name of my structure to be SUB_34, e.g. SUB_34.field1. But I don't know how to do this.
I know that you can assign strings to a specific field name for example for structure S if I want field1 to be called z then
S=struct;
field1='z';
S.(field1);
works but it does not work for the structure name.
Can anyone help?
Thanks
Rather than creating structures named SUB_34 I would strongly recommend just using an array of structures instead and having the user simply input the subject number.
number = input('Subject Number')
S(number) = data_struct
Then you could simply find it again using:
subject = S(number);
If you really insist on it, you could use the method proposed in the comment by @Sembei, using eval to get the struct. You really should not do this, though.
S = eval([SUB, ';']);
Or to set the structure
eval([SUB, ' = mydata;']);
One (of many) reasons not to do this is that I could enter the following at your prompt:
>> prompt = 'Enter the subject number in the format SUB_n: ';
>> SUB = input(prompt, 's');
>> eval([SUB, ' = mydata;']);
And I enter:
clear all; SUB_34
This would have the unforeseen consequence that it would remove all of your data since eval evaluates the input string as a command. Using eval on user input assumes that the user is never going to ever write something malformed or malicious, accidentally or otherwise.

How to verify ANY text is present with selenium IDE

I know how to verify if a specific text is present in a web page using Selenium IDE. But what I wanted to know is, can you verify that any text is present in an element?
For example, there's a text box with the title "Top Champion". Its content changes daily to the name of a person. I just want to check whether there is any text in this text box, no matter what the text actually is. I've tried the verify text command with a blank value, but it doesn't work. If the command could return a true or false result, that would be really helpful.
BTW, verify value doesn't work either since the element that I'm testing is not a form field
Your best bet is as follows (I have written single tests for this for numbers)
Medium rigour:
waitForText | css=.SELECTORS | regex:.+?
This will wait until there is at least 1 character present.
Strong rigour (only works if you have a subset of characters present):
waitForText | css=.SELECTORS | regex:^[0-9]+$
This will wait until there is text. This text must start with a number, have at least 1 number, and then finish. It does not permit any character outside of the subset given. An example you could do to match numbersNAMEnumbers would be.
waitForText | css=.SELECTORS | regex:^[0-9]+[a-zA-Z]+[0-9]+$
This would wait for a string such as 253432234BobbySmith332
Luke
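To see what the three patterns above accept and reject, they can be tried outside Selenium. The sketch below uses Python's re module purely as an illustration; the helper name is made up, and whether Selenium IDE applies the patterns in exactly the same way is an assumption that should be checked in the IDE itself.

```python
import re

ANY_TEXT = re.compile(r".+")                              # at least one character
DIGITS_ONLY = re.compile(r"^[0-9]+$")                     # nothing but digits
DIGITS_NAME_DIGITS = re.compile(r"^[0-9]+[a-zA-Z]+[0-9]+$")  # numbersNAMEnumbers


def accepts(pattern, text):
    """True if the pattern is found in the text."""
    return pattern.search(text) is not None


print(accepts(ANY_TEXT, ""))                                 # False
print(accepts(ANY_TEXT, "Bobby Smith"))                      # True
print(accepts(DIGITS_ONLY, "12345"))                         # True
print(accepts(DIGITS_ONLY, "12a45"))                         # False
print(accepts(DIGITS_NAME_DIGITS, "253432234BobbySmith332")) # True
```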
If I have understood your question properly, below is one way you can check whether an element contains a string. I'm not sure if this is what you are looking for.
List<WebElement> findElement = webElement.findElements(By.xpath("YOUR_TEXTINPUT_PATH_HERE"));
if (findElement.size() > 0) {
    if (findElement.get(0).getText() != null && findElement.get(0).getText().indexOf("THE_STRING_THAT_YOU_WANT_TO SEARCH") != -1) {
        // IF IT COMES HERE, THAT MEANS THE ELEMENT IS PRESENT WITH THE TEXT
    }
}
store text|[your element]|StoredText
execute script|return ${StoredText}.length > 0|x
assert|x|true
Using these three lines in the Selenium IDE, the first line will extract the text from the element into the variable StoredText.
The second line will store whether the length of that text is greater than zero into the variable x (a true or false result).
The third line asserts that the result was true, failing the test if not. You don't need the third line if all you want is the true or false result.
So if the element contains any text, the extracted text length will be greater than zero, the variable x will be true, and the assert will pass. This verifies that any text is present in the element.

Is there any way when I choose an option not to clear previously typed data in the input

The problem is this:
In my programme at first the user gets options for a first name - so hopefully he likes something from the options and he chooses it -so far everything is OK!
But then when he types space he starts receiving options for second name and a if he likes something and chooses it - then the Autocomplete just erases the first name. Is there any way I can change that?
Hello Rich, thank you very much for your response. I've now decided to change my task, and here is what I made. When a user types, for example, the character I, I get all the first names that start with I. So far no problem! When he then types a space and K, for example, I make a request to my web service that returns the middle names or the last names that start with K (one of them should start with K for Iwelina). In this case, for Iwelina I've got RADULSKA KOSEWA and KOSEWA NEDEWA. For the autocomplete source I concatenate iwelina with (radulska kosewa) and iwelina with (KOSEWA NEDEWA), so in the end I've got IWELINA RADULSKA KOSEWA and IWELINA KOSEWA NEDEWA. The only problem is that when I type Iwelina K, I get only IWELINA KOSEWA NEDEWA. Here is the code for the autocomplete:
$('#input').autocomplete({
    source: function(request, response) {
        var matcher = new RegExp( $.ui.autocomplete.escapeRegex(request.term, " "));
        var data = $.grep( srcAutoComp, function(value) {
            return matcher.test( value.label || value.value || value );
        });
        response(data);
    }
});
If you know how I can change it, I will be glad for the help.
I don't understand how, when the user begins to type the second name, he's getting results that are only the last name. For example, if he types "Joh" and selects "John" from the options, and then continues to type "John Do", then how is it possible that your drop-down gives him results for only the last name, like "Doe"?
At any rate, assuming this is truly happening, you could just combine all combinations of first and last names in your source data and that will show "John Doe" in the drop down when the user types "Joh" selects "John" and then continues to type "John Do".
Another way to do this is with a complicated change to the search and response events to search after a space if it is there, and recombine it with the first string after the search for the last name is complete. If you give me your source data, I could put something together for this.
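The reason "Iwelina K" returns only IWELINA KOSEWA NEDEWA is that the matcher looks for the typed text as one contiguous piece. One possible alternative, sketched here in Python rather than jQuery to show only the matching logic, is to split the typed text on spaces and require each piece to be the start of some word in the candidate. The function names and the sample list are made up for illustration; the same test could be written inside the source callback.

```python
def matches(term, candidate):
    """True if every typed token is a prefix of some word in the candidate name."""
    words = candidate.lower().split()
    tokens = term.lower().split()
    return all(any(word.startswith(token) for word in words) for token in tokens)


def filter_names(term, names):
    """Keep the names that match every token of the typed term."""
    return [name for name in names if matches(term, name)]


names = ["IWELINA RADULSKA KOSEWA", "IWELINA KOSEWA NEDEWA", "IWAN PETROW"]
print(filter_names("Iwelina K", names))
# ['IWELINA RADULSKA KOSEWA', 'IWELINA KOSEWA NEDEWA']
```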

Resources