Recognize relevant string information by checking the first characters - string

I have a table with 2 columns. Column 1 holds a string, and column 2 holds a logical index.
%% Tables and their use
T={'A2P3';'A2P3';'A2P3';'A2P3 with (extra1)';'A2P3 with (extra1) and (extra 2)';'A2P3 with (extra1)';'B2P3';'B2P3';'B2P3';'B2P3 with (extra 1)';'A2P3'};
a={1 1 0 1 1 0 1 1 0 1 1 }
T(:,2)=num2cell(1);
T(3,2)=num2cell(0);
T(6,2)=num2cell(0);
T(9,2)=num2cell(0);
T=table(T(:,1),T(:,2));
class(T.Var1);
class(T.Var2);
T.Var1=categorical(T.Var1)
T.Var2=cell2mat(T.Var2)
class(T.Var1);
class(T.Var2);
if T.Var1=='A2P3' & T.Var2==1
disp 'go on'
else
disp 'change something'
end
UPDATES:
I will update this section as soon as I know how to copy my workspace into a code format
** still don't know how to do that but here it goes
*** why working with tables is a double edged sword (but still cool): I have to be very aware of the class inside the table to refer to it in an if else construct, here I had to convert two columns to categorical and to double from cell to make it work...
Here is what my data looks like:
I want to have this:
if T.Var1=='A2P3*************************' & T.Var2==1
disp 'go on'
else
disp 'change something'
end
I managed to get MATLAB to do what I want for exact matches, but the whole point of this post is: how do I tell MATLAB to ignore whatever comes after A2P3 in the string, where the string length is variable? Otherwise it would be very tiring to look up every single variant of the string that starts with A2P3 (and B2P3, etc.) just to say that.
How do I do that?

Assuming you are working with T (cell array) as listed in your code, you may use this code to detect the successful matches -
%%// Slightly different than yours
T={'A2P3';'NotA2P3';'A2P3';'A2P3 with (extra1)';'A2P3 with (extra1) and (extra 2)';'A2P3 with (extra1)';'B2P3';'B2P3';'NotA2P3';'B2P3 with (extra 1)';'A2P3'};
a={1 1 0 1 1 0 1 1 0 1 1 }
T(:,2)=num2cell(1);
T(3,2)=num2cell(0);
T(6,2)=num2cell(0);
T(9,2)=num2cell(0);
%%// Get the comparison results: does each string start with 'A2P3' or 'B2P3'?
%%// strncmp compares only the first 4 characters, so trailing text is ignored
col1_comps = strncmp(T(:,1),'A2P3',4) | strncmp(T(:,1),'B2P3',4);
comparisons = col1_comps & cell2mat(T(:,2))

One quick solution would be to make a function that takes 2 strings and checks whether the first one starts with the second one.
Later Edit:
The function will look like this:
for i = 0; i < second string's length; i = i + 1
    if the first string's character at index i doesn't equal the second string's character at index i
        return false
after the for loop, return true
This assumes the second string is never longer than the first. Otherwise, call the function with the arguments swapped.
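The loop described above can be sketched in Python (the thread itself is about MATLAB, so this is only an illustration). The name `starts_with` and the sample rows are assumptions made for the example. Unlike the suggestion to swap the arguments, this sketch simply returns False when the prefix is longer than the text.

```python
def starts_with(text, prefix):
    """Return True if `text` begins with `prefix`, comparing character by character."""
    if len(prefix) > len(text):
        # A prefix longer than the text can never match.
        return False
    for i in range(len(prefix)):
        if text[i] != prefix[i]:
            return False
    return True


# Hypothetical rows mirroring the question's table: (label, logical flag).
rows = [("A2P3", 1), ("A2P3 with (extra1)", 0), ("B2P3 with (extra 1)", 1)]

# A row qualifies when its label starts with "A2P3" and its flag is 1.
matches = [starts_with(label, "A2P3") and flag == 1 for label, flag in rows]
print(matches)
```

In practice Python's built-in `str.startswith` does the same job; the explicit loop is shown only to match the description.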

Related

How to remove kth element in O(1) time complexity

Given a string, I need to repeatedly remove the smallest character and return the sum of the indices of the removed characters.
Suppose the string is 'abcab'. I first remove the first a, at index 1.
We are left with 'bcab'. Now remove a again, which is the smallest character in the remaining string and is at index 3.
We are left with 'bcb'.
In the same way, remove b at index 1, then remove b again from 'cb' at index 2, and finally remove c.
The total of all indices is 1+3+1+2+1=8.
The question is simple, but we need to do it in O(n). For that I need to remove the kth element in O(1). In Python, del list[index] has time complexity O(n).
How can I delete in constant time using Python?
Edit
This is the exact question
You are given a string S of size N. Assume that count is equal to 0.
Your task is the remove all the N elements of string S by performing the following operation N times
• In a single operation, select an alphabetically smallest character in S, for example, c. Remove c from S and add its index to count. If multiple characters such as c exist, then select the one that has the smallest index.
Print the value of count.
Note Consider 1-based indexing
Solve the problem for T test cases
Input format
• The first line of the input contains an integer T denoting the number of test cases
• The first line of each test case contains an integer N denoting the size of string S
• The second line of each test case contains a string S
Output format
For each test case print a single line containing one integer denoting the value of count
1<T, N < 10^5
• S contains only lowercase English alphabets
Sum of N over all test cases does not exceed 10
Sample input 1
5
abcab
Sample Output1
8
Explanation
The operations occur in the following order
Current string S = 'abcab'. The alphabetically smallest character of S is 'a'. As there are 2 occurrences of a, we choose the first occurrence. Its index 1 will be added to the count and a will be removed. Therefore, S becomes bcab.
a will be removed from S (bcab) and 3 will be added to count.
The first occurrence of b will be removed from S (bcb) and 1 will be added to count.
b will be removed from S (cb) and 2 will be added to count.
c will be removed from S (c) and 1 will be added to count.
If you follow your procedure of repeatedly removing the first occurrence of the smallest character, then each character's index -- when you remove it -- is the number of preceding larger characters in the original string plus one.
So what you really need to do is find, for each character, the number of preceding larger characters, and then add up all those counts.
There are only 26 characters, so you can do this as you go with 26 counters.
Please link to the original problem statement, or copy/paste exactly what it says, without trying to explain it. As is, what you're asking for is impossible.
Forget deleting: if what you're asking for was possible, sorting would be worst-case O(n) (remove the minimum remaining n times, at O(1) cost for each), but it's well known that comparison-based sorting cannot do better than worst-case O(n log n).
One bet: the original problem statement doesn't require that you delete anything - but instead that you return the result as if you had deleted.
With one pass over the input
Putting together various ideas, the final index of a character is one more than the number of larger characters seen before it. So it's possible to do this in one left-to-right pass over the input, using O(1) storage and O(n) time, while deleting nothing:
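def crunch(s):
    # neq[k] counts how many times letter k has been seen so far
    neq = [0] * 26
    result = 0
    orda = ord('a')
    for ch in map(ord, s):
        ch -= orda
        # index at removal = number of larger letters seen before + 1
        result += sum(neq[i] for i in range(ch + 1, 26)) + 1
        neq[ch] += 1
    return result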
For your original:
>>> crunch('abcab')
8
But it's also possible to process arbitrary iterables one character at a time:
>>> from itertools import repeat, chain
>>> crunch(chain(repeat('y', 1000000), 'xz'))
2000002
x is originally at (1-based) index 1000001, which accounts for half the result. Then each of a million 'y's is conceptually deleted, each at index 1. Finally 'z' is at index 1, for a grand total of 2000002.
Looks like you're only interested in the resulting sum of indices and don't need to simulate this algorithm step by step.
In which case you could compute the result in the following way:
For each letter from a to z:
1. Set a counter of already removed letters to 0.
2. Iterate over the string, and if you encounter the current letter, add current_index - already_removed_counter to the result.
2a. If you encounter the current letter or an earlier (smaller) one, increase the counter, since that character has already been removed.
The time complexity is 26 * O(n), which is O(n).
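A minimal Python sketch of the steps above, one pass over the string per letter. The function name `sum_of_removal_indices` is an assumption made for illustration; it is not taken from the problem statement.

```python
import string


def sum_of_removal_indices(s):
    """Sum the 1-based indices at which characters would be removed, without deleting."""
    total = 0
    for letter in string.ascii_lowercase:
        removed = 0  # characters <= letter seen so far in this pass
        for index, ch in enumerate(s, start=1):
            if ch == letter:
                # Everything counted in `removed` is already gone, so the index shifts left.
                total += index - removed
            if ch <= letter:
                removed += 1
    return total


print(sum_of_removal_indices("abcab"))  # 8
```

The outer loop runs a fixed 26 times, so the total work stays linear in the length of the string.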
Since there are only 26 distinct characters in the string, we can take each character separately and linearly traverse the string to find all its occurrences. Keep a counter of how many characters were found. Each time an occurrence of a given character is found, display its index decreased by the counter. Before switching to a new character, remove all the occurrences of the previous one; this can be done in linear time.
res = 0
for c in 'a' .. 'z'
    cnt = 0
    for idx = 1 .. len(s)
        if s[idx] = c
            print idx - cnt
            res += idx - cnt
            cnt++
    removeAll(s, c)
return res
where
removeAll(s,c):
    i = 1
    cnt = 0
    n = len(s)
    while (i <= n)
        if s[i + cnt] = c
            cnt++
            n--
        else
            s[i] = s[i + cnt]
            i++
    len(s) = n
It prints the elements of the sum to better illustrate what's going on.
Edit:
An updated version based on Igor's answer, that does not require actually removing elements. The complexity is the same i.e. O(n).
res = 0
for c in 'a' .. 'z'
    cnt = 0
    for idx = 1 .. len(s)
        if s[idx] <= c
            if s[idx] = c
                print idx - cnt
                res += idx - cnt
            cnt++
return res

Selecting Characters In String

I can grab every 2 chars from sum2.Text in order: from (102030) I get 10, 20, 30.
My issue is selecting exactly those numbers 10, 20, 30 in order.
My current code below outputs MsgBox(10), MsgBox(20), MsgBox(30), but it won't select and replace those exact numbers in order, one by one.
My code:
For i = 0 To sum2.Text.Length - 1 Step 2 'grabs every 2 chars
    Dim result = (sum2.Text.Substring(i, 2)) 'this holds the 2 chars
    MsgBox(result) 'this shows the 2 chars
    sum2.SelectionStart = i 'this starts the selection at i
    sum2.SelectionLength = 2 'this sets the length of the selection to 2 chars
    If sum2.SelectedText.Contains("10") Then
        sum2.SelectedText = sum2.SelectedText.Replace("10", "a")
    End If
    If sum2.SelectedText.Contains("20") Then
        sum2.SelectedText = sum2.SelectedText.Replace("20", "b")
    End If
    If sum2.SelectedText.Contains("30") Then
        sum2.SelectedText = sum2.SelectedText.Replace("30", "c")
    End If
Next
My problem is that it shows the numbers in sum2 one by one correctly, but it won't select and replace them, either all at once or one by one. I believe the issue is with the selection length.
OK, here's my attempt from what I'm understanding you are wanting to do. The problem is, you are trying to alter the string that the loop is using when you replace "10" with "a" so you need to create a variable to hold your newly built string.
Dim part As String = ""
Dim fixed As String = ""
For i = 0 To Sum2.SelectedText.Length - 1 Step 2
    part = Sum2.SelectedText.Substring(i, 2)
    Select Case part
        Case "10"
            part = part.Replace("10", "a")
        Case "20"
            part = part.Replace("20", "b")
        Case "30"
            part = part.Replace("30", "c")
    End Select
    fixed &= part
Next
Sum2.SelectedText = fixed
Of course, this is only to show the workings of moving through the string and changing it. You would need to replace your selected text with the newly formatted result (fixed in this case)
Result: ab3077328732
Also, just so you know, if the format was such that no 2 digits would interfere, you could simply do a
Sum2.SelectedText.Replace("10", "a").Replace("20", "b").Replace...
However, if you had 2-digit pairs like 11 next to 05, it would fail to give the desired result, because it would change 1105 to 1a5. Just something to think about.
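The difference between pair-by-pair replacement and a global replace can be sketched in Python (the thread uses VB.NET, so this is only an illustration). The helper name `replace_pairs` is an assumption made for the example.

```python
def replace_pairs(digits, mapping):
    """Translate `digits` two characters at a time; unknown pairs are kept unchanged."""
    parts = []
    for i in range(0, len(digits), 2):
        pair = digits[i:i + 2]
        parts.append(mapping.get(pair, pair))
    # Build a new string instead of modifying the one being iterated over.
    return "".join(parts)


mapping = {"10": "a", "20": "b", "30": "c"}  # codes taken from the question

print(replace_pairs("102030", mapping))  # abc
print(replace_pairs("1105", mapping))    # 1105, because the pairs are "11" and "05"
print("1105".replace("10", "a"))         # 1a5, because a global replace ignores pair boundaries
```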
Here's some code to get you started:
For i = 0 To sum2.SelectedText.Length - 1 Step 2
MessageBox.Show(sum2.SelectedText.Substring(i, 2))
Next

VBA Greater Than Function Not Working

I have an issue where I am trying to compare values that can be alphanumeric, only numeric, or only alphabetic.
The code originally worked fine for comparing anything within the same 100s group (IE 1-99 with alphabetic components). However when I included 100+ into it, it malfunctioned.
The current part of the code reads:
For j = 1 To thislength
    If lennew < j Then
        enteredval = Left("100A", lennew)
    ElseIf lennew >= j Then
        enteredval = Left("100A", j)
    End If
    If lenold < j Then
        cellval = Left("67", lenold)
    ElseIf lenold >= j Then
        cellval = Left("67", j)
    End If
    'issue occurs here
    If enteredval >= cellval Then
        newrow = newrow+1
    End If
Next j
The issue occurs in the last if statement.
When cycling through the 100 is greater than the 67 but still skips over. I tried to declare them both as strings (above this part of code) to see if that would help but it didn't.
What I am trying to accomplish is to sort through a bunch of rows and find where it should go. IE the 100A should go between 100 and 100B.
Sorry, I should add: lennew=len("100A") and lenold=len("67"), and thislength=4, or whatever is the larger of the two lengths.
The problem is that you're trying to solve the comparison problem by attacking specific values, and that's going to be a problem to maintain. I'd make the problem more generic by creating a function that takes two values and returns -1 if the first operand is "before" the second, 0 if they are the same, and 1 if the first operand is "after" the second, per your rules.
You could then restructure your code to eliminate the specific hardcoded prefix testing and just call the comparison function directly. For example (this is COMPLETELY untested, off-the-cuff, and my VBA is VERRRRRY stale :) but the idea is there; it also assumes the existence of a simple string function called StripPrefix that just takes a string and strips off any leading digits, which I suspect you can spin up fairly readily yourself):
Function CompareCell(Cell1 As String, Cell2 As String) As Integer
    Dim result As Integer
    Dim suffix1 As String
    Dim suffix2 As String
    If Val(Cell1) < Val(Cell2) Then
        result = -1
    ElseIf Val(Cell1) > Val(Cell2) Then
        result = 1
    Else
        ' Numeric prefixes are equal, so compare what follows them.
        ' You must supply StripPrefix, but it's pretty simple;
        ' I just omitted it here for clarity.
        suffix1 = StripPrefix(Cell1) ' e.g. returns "ABC" for "1000ABC"
        suffix2 = StripPrefix(Cell2)
        If suffix1 < suffix2 Then
            result = -1
        ElseIf suffix1 > suffix2 Then
            result = 1
        Else
            result = 0
        End If
    End If
    ' VBA returns a value by assigning it to the function name
    CompareCell = result
End Function
A function like this then allows you to take any two cell references and compare them directly to make whatever decision you need:
If CompareCell(enteredval, cellval) >= 0 Then
    newrow = newrow + 1
End If
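The same comparison idea can be sketched in Python (the thread is about VBA, so this is only an illustration). The names `split_label` and `compare_cell` are assumptions made for the example. As with VBA's Val, a label with no leading digits is treated as having the number 0.

```python
import re


def split_label(label):
    """Split a label such as "100A" into (100, "A"); a missing number counts as 0."""
    match = re.match(r"([0-9]*)(.*)", label)
    digits, suffix = match.group(1), match.group(2)
    return (int(digits) if digits else 0, suffix)


def compare_cell(a, b):
    """Return -1, 0 or 1 when `a` sorts before, equal to, or after `b`."""
    key_a, key_b = split_label(a), split_label(b)
    return (key_a > key_b) - (key_a < key_b)


labels = ["100B", "67", "100", "100A", "9Z"]
print(sorted(labels, key=split_label))  # ['9Z', '67', '100', '100A', '100B']
print(compare_cell("100A", "67"))       # 1
```

Comparing the numeric part as a number, not as text, is what places 100A after 67; a plain string comparison puts "100A" first because "1" sorts before "6".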

Pattern Matching BASIC programming Language and Universe Database

I need to identify following patterns in string.
- "2N':'2N':'2N"
- "2N'-'2N'-'2N"
- "2N'/'2N'/'2N"
- "2N'/'2N'-'2N"
AND SO ON.....
basically i want this pattern if written in Simple language
2 NUMBERS [: / -] 2 NUMBERS [: / -] 2 NUMBERS
So is there any way to write one pattern that covers all the possible scenarios? Otherwise I have to write a total of 9 patterns and match all 9 against the string. And that is not even the real scenario in my code: there I have to match four 2-digit numbers separated by [: / -], for which I would have to write a total of 27 patterns. I have used the scenario with three 2-digit numbers here just to keep the explanation simple.
Please help me...Thank you
Maybe you could try something like (Pick R83 style)
OK = X MATCH "2N1X2N1X2N" AND X[3,1]=X[6,1] AND INDEX(":/-",X[3,1],1) > 0
Where variable X is some input string like: 12-34-56
Should set variable OK to 1 if validation passes, else 0 for any invalid format.
This seems to get all your required validation into a single statement. I have assumed that the non-numeric characters have to be the same. If this is not true, the check could be changed to something like:
OK = X MATCH "2N1X2N1X2N" AND INDEX(":/-",X[3,1],1) > 0 AND INDEX(":/-",X[6,1],1) > 0
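For comparison, the two validation rules above can be sketched with a regular expression in Python (the thread uses Pick/UniVerse BASIC, so this is only an illustration). The names `SAME_SEPARATOR`, `ANY_SEPARATOR` and `is_valid` are assumptions made for the example.

```python
import re

# Two digits, a separator, two digits, the SAME separator again, two digits.
SAME_SEPARATOR = re.compile(r"[0-9]{2}([:/-])[0-9]{2}\1[0-9]{2}")
# Two digits, any of the separators, two digits, any of the separators, two digits.
ANY_SEPARATOR = re.compile(r"[0-9]{2}[:/-][0-9]{2}[:/-][0-9]{2}")


def is_valid(text, same_separator=True):
    """Return True when the whole of `text` has the form NN?NN?NN with ':', '/' or '-'."""
    pattern = SAME_SEPARATOR if same_separator else ANY_SEPARATOR
    return pattern.fullmatch(text) is not None


print(is_valid("12-34-56"))                        # True
print(is_valid("12/34-56"))                        # False
print(is_valid("12/34-56", same_separator=False))  # True
```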
Ok, I guess the requirement of surrounding characters was not obvious to me. Still, it does not make it much harder. You just need to 'parse' the string looking for the first (I assume) such pattern (if any) in the input string. This can be done in a couple of lines of code. Here is a (rather untested ) R83 style test program:
PROMPT ":"
LOOP
   LOOP
      CRT 'Enter test string':
      INPUT S
   WHILE S # "" AND LEN(S) < 8 DO
      CRT "Invalid input! Hit RETURN to exit, or enter a string with >= 8 chars!"
   REPEAT
UNTIL S = "" DO
   *
   * Look for 1st occurrence of pattern in string..
   CARDNUM = ""
   FOR I = 1 TO LEN(S)-7 WHILE CARDNUM = ""
      IF S[I,8] MATCH "2N1X2N1X2N" THEN
         IF INDEX(":/-",S[I+2,1],1) > 0 AND INDEX(":/-",S[I+5,1],1) > 0 THEN
            CARDNUM = S[I,8] ;* Found it!
         END ELSE I = I + 8
      END
   NEXT I
   *
   CRT CARDNUM
REPEAT
There are only 7 or 8 lines here that actually look for the card number pattern in the source/test string.
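The "find the first occurrence inside a longer string" step can also be sketched in Python (again only an illustration, since the thread uses Pick/UniVerse BASIC). The names `CARD_PATTERN` and `find_card_number` are assumptions made for the example.

```python
import re

# Two digits, then ':', '/' or '-', repeated to give three 2-digit groups.
CARD_PATTERN = re.compile(r"[0-9]{2}[:/-][0-9]{2}[:/-][0-9]{2}")


def find_card_number(text):
    """Return the first NN?NN?NN substring of `text`, or "" when there is none."""
    match = CARD_PATTERN.search(text)
    return match.group(0) if match else ""


print(find_card_number("HELLO CARD 12:34:56 IS IN THIS STRING"))  # 12:34:56
```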
Not quite perfect but how about 2N1X2N1X2N this gets you 2 number followed by 1 of any character followed by 2 numbers etc.
This might help:
BIG.STRING ="HELLO TILDE ~ CARD 12:34:56 IS IN THIS STRING"
TEMP.STRING = BIG.STRING
CONVERT "~:/-" TO "*~~~" IN TEMP.STRING
IF TEMP.STRING MATCHES '0X2N"~"2N"~"2N0X' THEN
   FIRST.TILDE.POSN = INDEX(TEMP.STRING,"~",1)
   CARD.STRING = BIG.STRING[FIRST.TILDE.POSN-2,8]
   PRINT CARD.STRING
END

Find the location of multiple strings in a cell array of strings

I have 2 questions regarding searching for strings in MATLAB
If I have to find a string in a cell array of strings I can do the following to get the location of 'PO' in the cell array
find(strcmpi({'PO','FOO','PO1','FOO1','PO1','PO'},'PO'))
% 1 6
But, I really want to search for multiple strings ({'PO1', 'PO'}) at the same time (not using a for loop). What is the best way to do this?
Is there any function like histc() which can tell me how many times the string has occurred. Again for one string, I could do:
length(strfind({'PO','FOO','PO1','FOO1','PO1','PO'},'PO'))
But this obviously doesn't work for multiple strings at a time.
If you want to find multiple strings, then just use the second output of ismember instead to tell you which string it is. If you really need case-insensitive matching, I've added the upper call to force all inputs to be upper-case. You can omit this if you think it's already uppercase.
data = {'PO','FOO','PO1','FOO1','PO1','PO', 'PO'};
[tf, inds] = ismember(upper(data), {'PO1', 'PO'});
% 2 0 1 0 1 2 2
You can then use the second output to determine which string was found where:
% PO1 Occurrences
find(inds == 1)
% 3 5
% PO Occurrences
find(inds == 2)
% 1 6 7
If you want the equivalent of histc, you can use accumarray to do that. We can pass it all of the values of inds that are non-zero (i.e. the ones that you were actually searching for).
accumarray(inds(tf).', ones(sum(tf), 1))
% 2 3
If instead you want to get the histogram of all strings (not just the ones you're searching for) you could do the following:
[strings, ~, inds] = unique(data, 'stable');
occurrences = accumarray(inds, ones(size(inds)));
% 'PO' [3]
% 'FOO' [1]
% 'PO1' [2]
% 'FOO1' [1]
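For readers more familiar with Python, the same lookups can be sketched as follows (the thread is about MATLAB, so this is only an illustration). The variable names are assumptions made for the example, and positions are 1-based to match the MATLAB output.

```python
from collections import Counter

data = ["PO", "FOO", "PO1", "FOO1", "PO1", "PO", "PO"]
targets = ["PO1", "PO"]

# 1-based positions of each target, compared case-insensitively via upper().
positions = {
    target: [i for i, s in enumerate(data, start=1) if s.upper() == target]
    for target in targets
}

# Histogram of every string, playing the role of the accumarray call.
counts = Counter(s.upper() for s in data)

print(positions)                     # {'PO1': [3, 5], 'PO': [1, 6, 7]}
print([counts[t] for t in targets])  # [2, 3]
```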
