Pattern Matching BASIC programming Language and Universe Database - basic

I need to identify the following patterns in a string:
- "2N':'2N':'2N"
- "2N'-'2N'-'2N"
- "2N'/'2N'/'2N"
- "2N'/'2N'-'2N"
and so on.
Basically, in plain language, I want this pattern:
2 NUMBERS [: / -] 2 NUMBERS [: / -] 2 NUMBERS
Is there any way to write one pattern that covers all the possible scenarios? Otherwise I have to write 9 patterns in total and match each of them against the string. And that is not even the real scenario in my code: there I have to match four 2-digit numbers separated by [: / -], for which I would have to write 27 patterns. I have used the case of three 2-digit numbers here just to keep the explanation simple.
Please help. Thank you.

Maybe you could try something like (Pick R83 style)
OK = X MATCH "2N1X2N1X2N" AND X[3,1]=X[6,1] AND INDEX(":/-",X[3,1],1) > 0
Here the variable X is some input string such as 12-34-56.
This should set the variable OK to 1 if validation passes, and to 0 for any invalid format.
This gets all of your required validation into a single statement. I have assumed that the two non-numeric characters have to be the same. If that is not true, the check could be changed to something like:
OK = X MATCH "2N1X2N1X2N" AND INDEX(":/-",X[3,1],1) > 0 AND INDEX(":/-",X[6,1],1) > 0
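For comparison, the two single-statement checks above can be sketched in Python rather than UniVerse BASIC. This is only an illustration of the same idea with regular expressions; the names `SAME_SEP`, `ANY_SEP` and `is_valid` are made up for this sketch, and it assumes the whole input must be exactly the 8-character pattern.

```python
import re

# Hypothetical names for illustration; not part of any BASIC or UniVerse API.
# Same separator in both positions: \1 refers back to the first separator.
SAME_SEP = re.compile(r"[0-9]{2}([:/-])[0-9]{2}\1[0-9]{2}")
# Any mix of the three separators.
ANY_SEP = re.compile(r"[0-9]{2}[:/-][0-9]{2}[:/-][0-9]{2}")


def is_valid(text, same_separator=True):
    """Return True if the whole string is NN?NN?NN with ? in ':/-'."""
    pattern = SAME_SEP if same_separator else ANY_SEP
    return pattern.fullmatch(text) is not None


print(is_valid("12-34-56"))                        # True
print(is_valid("12/34-56"))                        # False
print(is_valid("12/34-56", same_separator=False))  # True
```

A single character class covers all separator combinations, so the count of patterns does not grow with the number of 2-digit groups.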
OK, I guess the requirement about surrounding characters was not obvious to me. Still, it does not make things much harder. You just need to parse the string, looking for the first (I assume) such pattern, if any, in the input string. This can be done in a few lines of code. Here is a (rather untested) R83-style test program:
PROMPT ":"
LOOP
   LOOP
      CRT 'Enter test string':
      INPUT S
   WHILE S # "" AND LEN(S) < 8 DO
      CRT "Invalid input! Hit RETURN to exit, or enter a string with >= 8 chars!"
   REPEAT
UNTIL S = "" DO
   *
   * Look for 1st occurrence of pattern in string..
   CARDNUM = ""
   FOR I = 1 TO LEN(S)-7 WHILE CARDNUM = ""
      IF S[I,8] MATCH "2N1X2N1X2N" THEN
         * A failed separator check just moves on to the next position;
         * skipping ahead could miss a valid pattern that overlaps this one.
         IF INDEX(":/-",S[I+2,1],1) > 0 AND INDEX(":/-",S[I+5,1],1) > 0 THEN
            CARDNUM = S[I,8] ;* Found it!
         END
      END
   NEXT I
   *
   CRT CARDNUM
REPEAT
Only 7 or 8 lines here actually look for the card number pattern in the source/test string.
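The scanning loop above can also be sketched in Python, as an illustration only. The function name `find_first_pattern` and the constants are hypothetical; the sketch slides an 8-character window over the string one position at a time and returns an empty string when nothing matches, mirroring the empty CARDNUM.

```python
SEPARATORS = ":/-"
DIGITS = "0123456789"


def find_first_pattern(s):
    """Return the first NN?NN?NN substring with ? in ':/-', or '' if none."""
    for i in range(len(s) - 7):
        window = s[i:i + 8]
        digits = window[0:2] + window[3:5] + window[6:8]
        separators = window[2] + window[5]
        if all(c in DIGITS for c in digits) and all(c in SEPARATORS for c in separators):
            return window
    return ""


print(find_first_pattern("CARD 12:34:56 IS IN THIS STRING"))  # 12:34:56
print(find_first_pattern("12x34x56:78:90"))                   # 56:78:90
print(find_first_pattern("no card here"))                     # prints an empty line
```

The second call shows why the window advances by one position each time: the valid pattern starts inside a window that looked promising but had the wrong separators.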

Not quite perfect, but how about 2N1X2N1X2N? This gets you 2 numbers followed by 1 of any character, followed by 2 numbers, and so on.

This might help:
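BIG.STRING ="HELLO TILDE ~ CARD 12:34:56 IS IN THIS STRING"
TEMP.STRING = BIG.STRING
* Map a literal "~" to "*" and every separator to "~", so one pattern covers all cases
CONVERT "~:/-" TO "*~~~" IN TEMP.STRING
IF TEMP.STRING MATCHES '0X2N"~"2N"~"2N0X' THEN
   * Assumes the first separator in the string belongs to the card number
   FIRST.TILDE.POSN = INDEX(TEMP.STRING,"~",1)
   CARD.STRING = BIG.STRING[FIRST.TILDE.POSN-2,8]
   PRINT CARD.STRING
END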

Related

Is there a way to replace the characters in a string from index 0 to index -4 (i.e. all but the last 4 characters) with '#'?

For example, if my string were 'HelloWorld',
I would want the output to be ######orld.
My code:
myString = 'ThisIsAString'
hashedString = myString.replace(myString[:-4], '#')
print(hashedString)
Output >> #ring
I expected the output to have just one # symbol since it is replacing argument 1 with argument 2.
Can anyone help me with this?
You could multiply '#' by the word length minus 4 and then append the last four characters using string slicing.
myString = 'HelloWorld'
print('#' * (len(myString) - 4) + myString[-4:])
myString = 'ThisIsAString'
print('#' * (len(myString) - 4) + myString[-4:])
string.replace(old, new) replaces all instances of old with new. So the code you provided is actually replacing the entire beginning of the string with a single pound sign.
You will also notice that input like abcdabcd will give the output ##, since you are replacing all 'abcd' substrings.
Using replace, you could do
hashes = '#' * len(string[:-4])
hashedString = string.replace(string[:-4], hashes, 1)
Note the string multiplication to get the right number of pound symbols, and the 1 passed to replace, which tells it only to replace the first case it finds.
A better method would be to not use replace at all:
hashes = '#' * (len(string) - 4)
leftover = string[-4:]
hashedString = hashes + leftover
This time we build the string of pound signs the same way, but instead of replacing anything we just take the last 4 characters and append them after the pound signs.
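The slicing approach can be wrapped in a small helper. This is a minimal sketch; the function name `mask_all_but_last` and its `keep` and `mask_char` parameters are hypothetical, and it assumes that strings shorter than `keep` should be returned unmasked.

```python
def mask_all_but_last(text, keep=4, mask_char="#"):
    """Replace every character except the last `keep` with `mask_char`."""
    hidden = max(len(text) - keep, 0)  # strings shorter than `keep` stay unmasked
    return mask_char * hidden + text[hidden:]


print(mask_all_but_last("HelloWorld"))     # ######orld
print(mask_all_but_last("ThisIsAString"))  # #########ring
print(mask_all_but_last("abcdabcd"))       # ####abcd
```

Because nothing is searched for or replaced, repeated substrings such as 'abcdabcd' cause no surprises.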

Making one string the anagram of other

I have a problem where two strings of same length are given, and I have to tell how many letters I have to change in the first string to make it an anagram of the second.
Here is what I did:
count = 0
Mutable_str = ''.join(sorted("hhpddlnnsjfoyxpci"))
Ref_str = ''.join(sorted("ioigvjqzfbpllssuj"))
i = 0
while i < len(Mutable_str):
    if Mutable_str[i] != Ref_str[i]:
        count += 1
    i += 1
print(count)
My algorithm in this case returned 16 as result. But the correct answer is 10. Can someone tell me what is wrong in my code?
Thank you very much!
You need to use str.count.
You need to add up the differences between the number of occurrences of each character in the two strings. This can be done with str.count(c), where c is each distinct character in the second string (obtained with set()). We then take max() of the difference and 0, so that a negative difference does not affect the total.
So, as you can see, it boils down to one neat little one-liner:
def changes(s1, s2):
    return sum(max(0, s2.count(c) - s1.count(c)) for c in set(s2))
and some tests:
>>> changes("hhpddlnnsjfoyxpci", "ioigvjqzfbpllssuj")
10
>>> changes("abc", "bcd")
1
>>> changes("jimmy", "bobby")
4
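The same counting idea can be written with collections.Counter, which tallies each string once instead of calling str.count per character. This is a sketch of an equivalent approach, not the answer's own code; the length check is an added assumption based on the question's statement that both strings have the same length. It also hints at why comparing the sorted strings position by position overcounts: one extra letter early in the sorted order shifts every later position out of alignment.

```python
from collections import Counter


def changes(s1, s2):
    """Number of letters of s1 that must change to make it an anagram of s2."""
    if len(s1) != len(s2):
        raise ValueError("both strings must have the same length")
    # Counter subtraction keeps only positive counts: letters s2 needs but s1 lacks.
    missing = Counter(s2) - Counter(s1)
    return sum(missing.values())


print(changes("hhpddlnnsjfoyxpci", "ioigvjqzfbpllssuj"))  # 10
print(changes("abc", "bcd"))                              # 1
print(changes("jimmy", "bobby"))                          # 4
```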

Find the location of multiple strings in a cell array of strings

I have 2 questions about searching for strings in MATLAB.
If I have to find a string in a cell array of strings, I can do the following to get the location of 'PO' in the cell array:
find(strcmpi({'PO','FOO','PO1','FOO1','PO1','PO'},'PO'))
% 1 6
But, I really want to search for multiple strings ({'PO1', 'PO'}) at the same time (not using a for loop). What is the best way to do this?
Is there any function like histc() that can tell me how many times the string has occurred? Again, for one string, I could do:
length(strfind({'PO','FOO','PO1','FOO1','PO1','PO'},'PO'))
But this obviously doesn't work for multiple strings at a time.
If you want to find multiple strings, use the second output of ismember, which tells you which of the search strings each element matched. In case you really need case-insensitive matching, I've added the upper call to force all inputs to upper case. You can omit it if you know the data is already uppercase.
data = {'PO','FOO','PO1','FOO1','PO1','PO', 'PO'};
[tf, inds] = ismember(upper(data), {'PO1', 'PO'});
% 2 0 1 0 1 2 2
You can then use the second output to determine which string was found where:
% PO1 Occurrences
find(inds == 1)
% 3 5
% PO Occurrences
find(inds == 2)
% 1 6 7
If you want the equivalent of histc, you can use accumarray to do that. We can pass it all of the values of inds that are non-zero (i.e. the ones that you were actually searching for).
accumarray(inds(tf).', ones(sum(tf), 1))
% 2 3
If instead you want to get the histogram of all strings (not just the ones you're searching for) you could do the following:
[strings, ~, inds] = unique(data, 'stable');
occurrences = accumarray(inds, ones(size(inds)));
% 'PO' [3]
% 'FOO' [1]
% 'PO1' [2]
% 'FOO1' [1]
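For readers who want to check the expected numbers outside MATLAB, here is a rough Python analogue of the ismember/accumarray steps above. It is only a sketch: `find_positions` is a hypothetical helper, positions are 1-based to match the MATLAB output, and the search terms are assumed to be uppercase already.

```python
from collections import Counter

data = ["PO", "FOO", "PO1", "FOO1", "PO1", "PO", "PO"]
targets = ["PO1", "PO"]


def find_positions(items, wanted):
    """Map each wanted string to the 1-based positions where it occurs (case-insensitive)."""
    positions = {term: [] for term in wanted}
    for index, item in enumerate(items, start=1):
        key = item.upper()
        if key in positions:
            positions[key].append(index)
    return positions


positions = find_positions(data, targets)
print(positions)                                        # {'PO1': [3, 5], 'PO': [1, 6, 7]}
print({term: len(p) for term, p in positions.items()})  # {'PO1': 2, 'PO': 3}
print(Counter(data))                                    # histogram of all strings
```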

list of all the permutations and combinations for 2 strings

Let's take a word:
qwerty
I want to insert periods (dots) between the letters of the string. It could be any other character as well.
For example,
q.werty
qw.erty
qwe.rty
qwer.ty
qwert.y
The above is for 1 period or dot. So inserting 1 period into a 6-letter string generates 5 outputs (N-1).
Now for 2 periods (2 dots):
q.w.erty
q.we.rty
q.wer.ty
q.wert.y
qw.e.rty
qw.er.ty
qw.ert.y
qwe.r.ty
qwe.rt.y
qwer.t.y
and so on..
NOTE: There must not be 2 consecutive dots between 2 letters in the string. Also, there must not be a period before starting character and/or after ending character.
Can anyone provide a shell script (sh, bash) that lists all the possible combinations and permutations? I have tried Googling and didn't find anything worthwhile to refer to.
EDIT: Any help on how to start this as a bash shell script would be great.
Your puzzle is fun, so here's some code:
#!/bin/bash
t=qwerty
echo '---- one dot ----'
for (( i = 1; i < ${#t}; ++i )); do
    echo "${t:0:i}.${t:i}"
done
echo '---- two dots ----'
for (( i = 1; i < (${#t} - 1); ++i )); do
    for (( j = i + 1; j < ${#t}; ++j )); do
        echo "${t:0:i}.${t:i:j - i}.${t:j}"
    done
done
Output:
---- one dot ----
q.werty
qw.erty
qwe.rty
qwer.ty
qwert.y
---- two dots ----
q.w.erty
q.we.rty
q.wer.ty
q.wert.y
qw.e.rty
qw.er.ty
qw.ert.y
qwe.r.ty
qwe.rt.y
qwer.t.y
See the Bash Manual for everything.
I won't write the code, but I can guide you to the answer.
I assume you want to consider every possible number of dots: not just 1 or 2, but 3, 4, ..., up to the length of the string minus 1.
After each character in the string except the last, there are two possibilities: there is a dot or there is not. So for an n-character string, there are 2^(n-1) possibilities (including the one with no dots at all).
You could write a for loop that goes through all 2^(n-1) possibilities. Each one corresponds to a single output with dots after certain letters.
Let i be the counter of that for loop. Then have an inner j loop that goes from 1 to n-1. If the jth bit of i is 1, put a dot after the jth letter.
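The bitmask idea described above can be sketched as follows. This is a Python illustration rather than the shell script the question asks for; the function name `dotted_variants` and the `sep` parameter are hypothetical. Bit j of the mask (counting from 0) decides whether a dot goes after letter j+1.

```python
def dotted_variants(word, sep="."):
    """Return all 2^(n-1) ways to place `sep` in the gaps between letters of `word`."""
    n = len(word)
    variants = []
    for mask in range(2 ** max(n - 1, 0)):
        parts = []
        for j, ch in enumerate(word):
            parts.append(ch)
            # Never add a separator after the last letter.
            if j < n - 1 and (mask >> j) & 1:
                parts.append(sep)
        variants.append("".join(parts))
    return variants


all_variants = dotted_variants("qwerty")
print(len(all_variants))                               # 32
print([v for v in all_variants if v.count(".") == 1])  # the 5 one-dot outputs
print(len([v for v in all_variants if v.count(".") == 2]))  # 10
```

Since each gap holds at most one dot and only gaps between letters are considered, the rules about consecutive, leading and trailing dots are satisfied by construction.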

Recognize relevant string information by checking the first characters

I have a table with 2 columns. Column 1 holds string information and column 2 holds a logical index.
%% Tables and their use
T={'A2P3';'A2P3';'A2P3';'A2P3 with (extra1)';'A2P3 with (extra1) and (extra 2)';'A2P3 with (extra1)';'B2P3';'B2P3';'B2P3';'B2P3 with (extra 1)';'A2P3'};
a={1 1 0 1 1 0 1 1 0 1 1 }
T(:,2)=num2cell(1);
T(3,2)=num2cell(0);
T(6,2)=num2cell(0);
T(9,2)=num2cell(0);
T=table(T(:,1),T(:,2));
class(T.Var1);
class(T.Var2);
T.Var1=categorical(T.Var1)
T.Var2=cell2mat(T.Var2)
class(T.Var1);
class(T.Var2);
if T.Var1=='A2P3' & T.Var2==1
    disp 'go on'
else
    disp 'change something'
end
UPDATES:
I will update this section as soon as I know how to copy my workspace into a code format
** still don't know how to do that but here it goes
*** why working with tables is a double edged sword (but still cool): I have to be very aware of the class inside the table to refer to it in an if else construct, here I had to convert two columns to categorical and to double from cell to make it work...
Here is what my data looks like:
I want to have this:
if T.Var1=='A2P3*************************' & T.Var2==1
    disp 'go on'
else
    disp 'change something'
end
I managed to get MATLAB to do what I want, but the whole point of this post is: how do I tell MATLAB to ignore whatever comes after A2P3 in the string, when the string length is variable? Otherwise it would be very tedious to look up every single variant of the string that starts with A2P3 (and with B2P3, etc.) just to express that.
How do I do that?
Assuming you are working with T (a cell array) as listed in your code, you may use this code to detect the successful matches:
%%// Slightly different than yours
T={'A2P3';'NotA2P3';'A2P3';'A2P3 with (extra1)';'A2P3 with (extra1) and (extra 2)';'A2P3 with (extra1)';'B2P3';'B2P3';'NotA2P3';'B2P3 with (extra 1)';'A2P3'};
a={1 1 0 1 1 0 1 1 0 1 1 }
T(:,2)=num2cell(1);
T(3,2)=num2cell(0);
T(6,2)=num2cell(0);
T(9,2)=num2cell(0);
%%// Get the comparison results
col1_comps = ismember(char(T(:,1)),'A2P3') | ismember(char(T(:,1)),'B2P3');
comparisons = ismember(col1_comps(:,1:4),[1 1 1 1],'rows').*cell2mat(T(:,2))
One quick solution would be to write a function that takes 2 strings and checks whether the first one starts with the second one.
Later edit:
The function would look like this:
for i = 0; i < second string's length; i = i + 1
    if the first string's character at index i doesn't equal the second string's character at index i
        return false
after the for loop, return true
This assumes the second string's length is never greater than the first's. Otherwise, call the function with the arguments swapped.
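The pseudocode above might look like this in Python. It is a sketch under stated assumptions: the names `starts_with` and `decide` are hypothetical, and instead of swapping the arguments when the prefix is longer than the text, this version simply returns False. Python's built-in str.startswith does the same job; the loop is spelled out only to mirror the pseudocode.

```python
def starts_with(text, prefix):
    """Return True if `text` begins with `prefix`, comparing character by character."""
    if len(prefix) > len(text):
        return False  # assumption: a longer prefix can never match
    for i in range(len(prefix)):
        if text[i] != prefix[i]:
            return False
    return True


def decide(label, flag, prefix="A2P3"):
    """Mimic the question's if/else for one row: label in column 1, flag in column 2."""
    return "go on" if starts_with(label, prefix) and flag == 1 else "change something"


rows = [("A2P3", 1), ("A2P3 with (extra1)", 1), ("A2P3 with (extra1)", 0), ("NotA2P3", 1)]
for label, flag in rows:
    print(label, "->", decide(label, flag))
```

Checking one row at a time also sidesteps the question of how an if statement treats a whole column of comparison results.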
