What is the Algorithm for this Programming Question? - string

This is a question I encountered in a test, and I was not able to solve it. Every time I think of an algorithm, a new corner case turns up that breaks it. Can someone please explain how to approach the problem?
Problem Statement
The Cytes Lottery is the biggest lottery in the world. On each ticket, there is a string of a-z letters. The company produces a draw string S. A person wins if his/her ticket string is a special substring of the draw string. A special substring is a substring which can be formed by ignoring at most K characters from drawString. For example, if draw string = "xyzabc" and tickets are [ac zb yhja] with K=1 then the winning tickets will be 2 i.e ac (won by ignoring "b" in drawstring) and zb (won by ignoring "a" in drawstring).
Now some people change their ticket strings in order to win the lottery. To avoid any kind of suspicion, they can make the following changes in their strings.
They can change character 'o' to character 'a' and vice versa
They can change character 't' to character 'l' and vice versa
They can erase a character from anywhere in the string
Note that they can ignore at most 'K' characters from the draw string to get a match with the ticket string.
Write an algorithm to find the number of people who win the lottery (either honestly or by cheating).
Input:
The first line of the input consists of an integer - numTickets, representing the number of tickets (N).
The second line consists of a string - drawString, representing the draw string (S).
The third line consists of N space-seperated strings - tickets1, tickets2,........., ticketsN representing the tickets.
The last line consists of an integer-tolerance, representing the maximum number of characters that can be deleted from the drawString(K).
Output:
An integer representing the number of winning tickets (either fairly or by cheating).
Constraints:
0 <= numTickets <= 1000
0 <= length of drawString <= 200
0 <= length of tickets[i] <= 200
0 <= tolerance <= 1000
Note:
The drawString contains lowercase English alphabets
Example:
Input:
3
aabacd
abcde aoc actld
1
Output:
2
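One possible way to attack this is a small dynamic program per ticket. The sketch below rests on assumptions that the statement does not spell out: that at most one character may be erased from a ticket (with unlimited erasures every ticket in the sample would win, yet the expected output is 2), and that ignored draw characters only count when they lie between matched characters. The names normalize, wins, count_winners and max_erase are made up for illustration.

```python
from functools import lru_cache


def normalize(s):
    # 'o' <-> 'a' and 't' <-> 'l' are interchangeable, so map each pair
    # to one representative before comparing.
    return s.replace("o", "a").replace("l", "t")


def wins(ticket, draw, k, max_erase=1):
    # max_erase=1 is an ASSUMPTION inferred from the sample output.
    t, d = normalize(ticket), normalize(draw)
    n, m = len(t), len(d)
    inf = float("inf")

    @lru_cache(maxsize=None)
    def skips(i, j, e):
        # Fewest draw characters ignored while matching t[i:] against d[j:],
        # with e ticket erasures still available.
        if i == n:
            return 0  # the rest of the draw string lies outside the substring
        best = inf
        if e > 0:
            best = skips(i + 1, j, e - 1)                  # erase t[i]
        if j < m:
            if t[i] == d[j]:
                best = min(best, skips(i + 1, j + 1, e))   # match t[i] with d[j]
            best = min(best, 1 + skips(i, j + 1, e))       # ignore d[j]
        return best

    # Draw characters before the chosen start position cost nothing.
    return min(skips(0, j, max_erase) for j in range(m + 1)) <= k


def count_winners(tickets, draw, k):
    return sum(1 for ticket in tickets if wins(ticket, draw, k))


print(count_winners(["abcde", "aoc", "actld"], "aabacd", 1))
```

Under this reading an empty ticket, or a one-letter ticket whose only letter is erased, counts as a win; the statement does not settle that case, so it may need adjusting.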

Related

Finding sequences of nucleotides probability in multiple strings

This is my first post here so please be patient with me.
I am stuck on a problem and don't know what to do. I am given a non-overlapping substring of length 'm' that appears twice in a string of length 'n'. What is the probability of finding the substring of length 'm' in both that string and another string of length '2n'?
For argument's sake let's say that m = 4 and n = 33. I have tried to use Independent Event probability as well as Markov Chain Models, but my answer never seems to be correct.
What would be the chance that the same 2 non-overlapping substrings of length 4 that are found in the string of length 33 will be found in a string of length 66?

What will be the DP and transitions in this problem?

Vasya has a string s of length n consisting only of digits 0 and 1. Also he has an array a of length n.
Vasya performs the following operation until the string becomes empty: choose some consecutive substring of equal characters, erase it from the string and glue together the remaining parts (any of them can be empty). For example, if he erases substring 111 from string 111110 he will get the string 110. Vasya gets a_x points for erasing a substring of length x.
Vasya wants to maximize his total points, so help him with this!
https://codeforces.com/problemset/problem/1107/E
I was trying to get my head around the editorial, but couldn't understand it. Can anyone explain an easy way to do it?
input:
7
1101001
3 4 9 100 1 2 3
output:
109
Explanation
the optimal sequence of erasings is: 1101001 → 111001 → 11101 → 1111 → ∅.
Here, we consider removing prefixes instead of substrings. Why?
We try to remove a consecutive prefix of a particular state which is actually a substring in the main string. So, our DP states will be start index, end index, prefix length.
Let's consider an example str = "1010110". Here, initially start=0, end=7 (exclusive), and prefix=1 (the first '1' is the only prefix character so far). We iterate over all the indices in the current state except the starting index and check whether str[i]==str[start]. Here, for example, str[4]==str[0]. Now we divide the string into "010" (indices 1 to 3) with prefix=1, and "110" (indices 4 to 6) with prefix=2, because the '1' at index 0 becomes glued to the '1' at index 4 once "010" is gone. These two are now independent subproblems. When a string of length 1 remains, we return a[prefix].
Here is my code.
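A minimal sketch of the DP described above, not necessarily the code this answer refers to. The names max_points and best are my own, end is exclusive, and a is 0-indexed, so the score for a block of length x is a[x - 1].

```python
from functools import lru_cache


def max_points(s, a):
    n = len(s)

    @lru_cache(maxsize=None)
    def best(start, end, prefix):
        # s[start:end] is the piece still to be erased; `prefix` counts the
        # characters equal to s[start] glued to its front, s[start] included.
        if start >= end:
            return 0
        # Option 1: erase the accumulated prefix right now.
        result = a[prefix - 1] + best(start + 1, end, 1)
        # Option 2: clear s[start+1:i] first so that s[i] joins the prefix.
        for i in range(start + 1, end):
            if s[i] == s[start]:
                result = max(result,
                             best(start + 1, i, 1) + best(i, end, prefix + 1))
        return result

    return best(0, n, 1)


print(max_points("1101001", [3, 4, 9, 100, 1, 2, 3]))
```

There are O(n^3) states with O(n) transitions each, which should be acceptable for n up to 100 as in the linked problem, although a bottom-up table may be needed for speed in Python.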

Number of substrings with given constraints

I am given a sorted string and I wish to count the number of substrings (not necessarily contiguous) that are possible with the following constraints:
All the alphabets in the substring should be in sorted order.
The substring must contain only 1 vowel.
The length of the substring should be greater than or equal to 3.
For example:
for "aabbc",
we have 3 substrings, "abc", "abb" and "abbc", that match the above constraints, so the answer here is 3.
How do I go about for a general string?
I have tried this for 2-3 hours, but couldn't find a proper way. I was asked this question in a programming coding round today and I fear the same question would be asked in the interview tomorrow. Even hints or approach would be appreciated.
Suppose we have k distinct vowels, and an array A holding the count of each distinct non-vowel (i.e. A[0] is the number of occurrences of the first non-vowel, A[1] is the number of occurrences of the second non-vowel, and so on).
Then (ignoring the length constraint) we have k choices for the vowel, and (A[0]+1)*(A[1]+1)*(A[2]+1)*... choices for the remaining letters (the i-th non-vowel can be used 0, 1, 2, ..., A[i] times).
This overcounts by k (the single-letter cases, a vowel alone) and by k*len(A) (the two-letter cases, a vowel plus one non-vowel), so simply subtract these from the total.
Example Python code:
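from collections import Counter

s = 'aabbc'
vowels = 'aeiou'
C = Counter(s)  # occurrences of each distinct letter
t = 1           # product of (count + 1) over the non-vowels
vowel_count = 0
cons_count = 0
for letter, count in C.items():
    if letter in vowels:
        vowel_count += 1
    else:
        cons_count += 1
        t *= count + 1
# subtract the vowel-only and vowel-plus-one-consonant cases
print(vowel_count * (t - cons_count - 1))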
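To sanity-check the formula on small inputs, it can be compared against plain enumeration. This sketch assumes the input is already sorted, that subsequences are counted as distinct strings, and that "only 1 vowel" means exactly one vowel character; count_by_formula and count_by_brute_force are illustrative names.

```python
from collections import Counter
from itertools import combinations

VOWELS = set("aeiou")


def count_by_formula(s):
    counts = Counter(s)
    vowel_kinds = sum(1 for ch in counts if ch in VOWELS)
    consonant_kinds = sum(1 for ch in counts if ch not in VOWELS)
    ways = 1
    for ch, c in counts.items():
        if ch not in VOWELS:
            ways *= c + 1  # use this consonant 0..c times
    return vowel_kinds * (ways - consonant_kinds - 1)


def count_by_brute_force(s):
    # Enumerate every index subset of size >= 3; exponential, tiny inputs only.
    seen = set()
    for size in range(3, len(s) + 1):
        for idx in combinations(range(len(s)), size):
            candidate = "".join(s[i] for i in idx)
            if sum(ch in VOWELS for ch in candidate) == 1:
                seen.add(candidate)
    return len(seen)


for text in ["aabbc", "aabbcdee", "abcde"]:
    print(text, count_by_formula(text), count_by_brute_force(text))
```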

Deterministic automata to find number of subsequence in string of another string

Deterministic automata to find the number of subsequences in a string?
How can I construct a DFA to count the occurrences of one string as a subsequence of another string?
E.g. in "ssstttrrriiinnngggg" we have 3 subsequences which form the string "string".
Also, both the string to be found and the string to be searched contain only characters from a specific character set.
I have some idea about storing characters in a stack and popping them until we get a match, pushing them again if they don't match.
Can someone give a DFA solution?
OVERLAPPING MATCHES
If you wish to count the number of overlapping sequences then you simply construct a DFA that matches the string, e.g.
1 -(if see s)-> 2 -(if see t)-> 3 -(if see r)-> 4 -(if see i)-> 5 -(if see n)-> 6 -(if see g)-> 7
and then compute the number of ways of being in each state after seeing each character using dynamic programming. See the answers to this question for more details.
DP[a][b] = number of ways of being in state b after seeing the first a characters
         = DP[a-1][b] + DP[a-1][b-1]   if the character at position a is the one needed to take state b-1 to b
         = DP[a-1][b]                  otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=1.
Then the total number of overlapping strings is DP[len(string)][7]
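The recurrence above can be sketched in a few lines, keeping only one row of the table. The function name count_subsequences is an assumption, and states are numbered from 0 (nothing matched) instead of 1.

```python
def count_subsequences(text, pattern):
    # ways[b] = number of ways to have matched the first b pattern characters
    ways = [1] + [0] * len(pattern)
    for ch in text:
        # Walk the states right to left so this character advances each
        # partial match by at most one step.
        for b in range(len(pattern), 0, -1):
            if pattern[b - 1] == ch:
                ways[b] += ways[b - 1]
    return ways[len(pattern)]


print(count_subsequences("ssstttrrriiinnngggg", "string"))  # 3*3*3*3*3*4 = 972
```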
NON-OVERLAPPING MATCHES
If you are counting the number of non-overlapping sequences, then if we assume that the characters in the pattern to be matched are distinct, we can use a slight modification:
DP[a][b] = number of strings being in state b after seeing the first a characters
         = DP[a-1][b] + 1   if the character at position a is the one needed to take state b-1 to b and DP[a-1][b-1]>0
         = DP[a-1][b] - 1   if the character at position a is the one needed to take state b to b+1 and DP[a-1][b]>0
         = DP[a-1][b]       otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=infinity.
Then the total number of non-overlapping strings is DP[len(string)][7]
This approach will not necessarily give the correct answer if the pattern to be matched contains repeated characters (e.g. 'strings').
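A rough sketch of the non-overlapping variant, under the same restriction that the pattern's characters are distinct. The name count_disjoint is hypothetical, states are numbered from 0, and the "infinite" supply in the start state is modelled by always allowing a new match to begin.

```python
def count_disjoint(text, pattern):
    if len(set(pattern)) != len(pattern):
        raise ValueError("this sketch assumes distinct pattern characters")
    position = {ch: i for i, ch in enumerate(pattern)}
    # in_state[b] = partial matches that have consumed exactly b pattern characters
    in_state = [0] * (len(pattern) + 1)
    for ch in text:
        b = position.get(ch)
        if b is None:
            continue                 # character does not occur in the pattern
        if b == 0:
            in_state[1] += 1         # the start state never runs out
        elif in_state[b] > 0:
            in_state[b] -= 1         # one partial match moves from b ...
            in_state[b + 1] += 1     # ... to b + 1
    return in_state[len(pattern)]


print(count_disjoint("ssstttrrriiinnngggg", "string"))  # 3
```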

Substring a text using MOVEL function in RPG

Question:
Is it safe to take the first n characters of a text in RPG using the MOVEL operation, which takes a text of length x and stores it in a variable with capacity n?
Or is SUBST the only safe way to get the first n characters?
The background of the question: one of my colleagues gets the first 3 characters of a 30-character database field by using MOVEL into a variable that is only 3 characters long (truncating the rest). The strange thing is that the receiving variable sometimes shows a minus character ('-') and sometimes doesn't. So I assume MOVEL is not a safe way to do this. I am thinking of strings in C, which are always terminated by '\0': you need to use the strcpy function to copy them safely, rather than assigning with the = operator.
Is anybody who knows RPG familiar with this issue?
MOVEL should work. RPG allows several character data types. Generally speaking, someone using MOVEL will not be dealing with null terminated strings because MOVEL is an old technique and null terminated strings are a newer data type. You can read up on the MOVEx operations and the string operations in the RPG manual. To get a better answer, please post your code, including the definitions of the variables involved.
EDIT: Example of how MOVEL handles signs.
dcl-s long char(20) inz('CORPORATION');
dcl-s short char(3) inz('COR');
dcl-s numb packed(3: 0);
// 369
c movel long numb
dsply numb;
// -369
c movel short numb
dsply numb;
*inlr = *on;
With signed numeric fields in RPG the sign is held in the zone of the last byte of the field. So 123 is X'F1F2F3' but -123 is X'F1F2D3'. If you look at those fields as character strings they will have 123 and 12L in them.
In your program you are transferring something like "123 AAAAAL" to a 3 digit numeric field so you get X'F1F2F3' but because the final character is X'D3' that changes the result to have a zone of D i.e. X'F1F2D3'
Your anomaly depends on what the 30th character contains. If it is } or any capital letter J to R then you get a negative result. [It doesn't matter whether the first 3 characters are numbers or letters because it is only the second half of the byte, the digit, that matters in your example.]
The IBM manuals say:
If factor 2 is character and the result field is numeric, a minus zone is moved into the rightmost position of the result field if the zone from the rightmost position of factor 2 is a hexadecimal D (minus zone). However, if the zone from the rightmost position of factor 2 is not a hexadecimal D, a positive zone is moved into the rightmost position of the result field. Other result field positions contain only numeric characters.
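The zone and digit behaviour quoted above can be illustrated outside RPG. The Python sketch below is only a model of the rule, not the RPG runtime: it assumes the data is EBCDIC code page 037 (Python's cp037 codec), and movel_char_to_numeric is a made-up helper name. It keeps the low nibble of each of the leftmost bytes as a digit and takes the sign from the zone of the last source byte.

```python
def movel_char_to_numeric(source, digits):
    # Hypothetical model of MOVEL from a character field to a numeric field.
    raw = source.encode("cp037")         # EBCDIC bytes (assumed code page)
    value = 0
    for byte in raw[:digits]:
        value = value * 10 + (byte & 0x0F)   # digit = low nibble of each byte
    negative = (raw[-1] >> 4) == 0xD         # zone D on the last source byte
    return -value if negative else value


print(movel_char_to_numeric("CORPORATION".ljust(20), 3))  # last byte is a blank, X'40'
print(movel_char_to_numeric("COR", 3))                    # last byte is 'R', X'D9'
print(movel_char_to_numeric("123 AAAAAL", 3))             # last byte is 'L', X'D3'
```

The model ignores low nibbles above 9, which real numeric fields would not treat as valid digits.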
Don
