Number of substrings with given constraints - string

I am given a sorted string and I wish to count the number of substrings (not necessarily contiguous) that are possible with the following constraints:
All the letters in the substring should be in sorted order.
The substring must contain only 1 vowel.
The length of the substring should be greater than or equal to 3.
For example:
for "aabbc",
we have 3 substrings, "abc", "abb" and "abbc", that match the above constraints. So the answer here is 3.
How do I go about this for a general string?
I have tried this for 2-3 hours, but couldn't find a proper way. I was asked this question in a programming coding round today and I fear the same question would be asked in the interview tomorrow. Even hints or approach would be appreciated.

Suppose the string contains k distinct vowels, and let A be an array holding the number of occurrences of each distinct non-vowel (i.e. A[0] is the count of the first non-vowel, A[1] is the count of the second non-vowel, and so on).
Then (ignoring the length constraint) we have k choices for the vowel, and (A[0]+1)*(A[1]+1)*(A[2]+1)*... choices for the remaining letters (for each non-vowel we can take 0, 1, 2, ..., A[i] copies).
This overcounts by k (the single-letter cases, a vowel on its own) and by k*len(A) (the two-letter cases, a vowel plus one copy of one non-vowel), so simply subtract these from the total.
Example Python code:
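from collections import Counter

s = 'aabbc'
vowels = 'aeiou'
C = Counter(s)  # histogram of the letters in s
t = 1  # product of (count + 1) over the distinct non-vowels
vowel_count = 0  # number of distinct vowels (k)
cons_count = 0  # number of distinct non-vowels (len(A))
for letter, count in C.items():
    if letter in vowels:
        vowel_count += 1
    else:
        cons_count += 1
        t *= count + 1
# subtract the length-1 and length-2 cases for each choice of vowel
print(vowel_count * (t - cons_count - 1))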

Related

linear time algorithm for finding most frequent m-letter substring in a string

Suppose we have an n-letter string and we are searching for the most repeated m-letter substring (1 <= m <= n).
I am looking for an algorithm that solves this problem in linear time. I have arrived at suffix trees, but how can I solve it with a suffix tree?
Thanks a lot.
Idea
You can also solve it with a hash function.
We can treat strings as base-b numbers, where b is a prime number.
For example, if the strings consist only of lowercase letters (26 characters a-z), then we can choose b = 29.
Then we map string characters to corresponding numbers. For example:
a -> 1
b -> 2
c -> 3
...
z -> 26
String abc will equal 29^2*1 + 29^1*2 + 29^0*3 = 841 + 58 + 3 = 902 in base 29.
We should map a -> 1 rather than a -> 0, since otherwise the hash values of aaa and aa would be equal (both 0) in base b, which they shouldn't be.
Now, instead of comparing two strings, we can compare their hash values in base b. If their hash values are equal, then we can say the strings are equal.
Since a hash value can be very large, you can take it modulo a large prime number, for example mod 1e9+7. The probability that two different strings have the same hash value is very low in this case.
Algorithm
The algorithm can be described as below:
Let the n-letter string be S
Let hash(s) be a function that returns the hash value of string s
For each m-letter substring of S, call it s
Increase the number of occurrences of hash(s); call this count o(hash(s))
The result will be the m-letter substring s with the maximum o(hash(s))
To calculate hash(s), first we build array H where:
H[i] = (b^(i-1)*S[1] + b^(i-2)*S[2] + b^(i-3)*S[3] + ... + b^0*S[i]) % mod
Here S[i] is the number mapped to the i-th character of string S.
To calculate b^x, we can calculate array powb where:
powb[0] = 1; powb[i] = (powb[i - 1] * b) % mod
Then for a substring s[l..r] of string S,
hash(s[l..r]) = (H[r] - H[l-1]*b^(r-l+1)) % mod
As we can see, hash(s) can be negative (for example in C++, where % can return a negative result); in this case we should add mod to hash(s) (hash(s) += mod).
Complexity
O(N) to calculate H, powb
O(N) to iterate every substring s
For each s
O(1) to calculate hash(s)
O(log(N)) to update the occurrence count of the hash value (C++ map)
Total complexity: O(N log N)
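The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation: the function name most_frequent_substring and its parameters are made up for this example, it assumes the input consists only of lowercase letters a-z, and it ignores hash collisions (two different substrings with the same hash would be counted together). It uses a dictionary instead of a C++ map, so each count update takes expected O(1) time.

```python
from collections import defaultdict


def most_frequent_substring(s, m, base=29, mod=10**9 + 7):
    """Return (substring, count) for a most frequent m-letter substring of s.

    Assumes s contains only 'a'..'z' and that hash collisions do not occur.
    """
    n = len(s)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= len(s)")

    # H[i] is the hash of the first i characters; powb[i] is base**i % mod.
    H = [0] * (n + 1)
    powb = [1] * (n + 1)
    for i, ch in enumerate(s, start=1):
        value = ord(ch) - ord('a') + 1  # a -> 1, ..., z -> 26
        H[i] = (H[i - 1] * base + value) % mod
        powb[i] = powb[i - 1] * base % mod

    counts = defaultdict(int)  # hash value -> number of occurrences
    first_start = {}           # hash value -> 0-based start of first occurrence
    for l in range(1, n - m + 2):
        r = l + m - 1
        # Python's % already returns a non-negative result here.
        h = (H[r] - H[l - 1] * powb[m]) % mod
        counts[h] += 1
        first_start.setdefault(h, l - 1)

    best = max(counts, key=counts.get)
    start = first_start[best]
    return s[start:start + m], counts[best]
```

For instance, most_frequent_substring("abababc", 2) sees the windows ab, ba, ab, ba, ab, bc and reports "ab" with 3 occurrences.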

What is the Algorithm for this Programming Question?

This is a question I encountered in a test and I was not able to solve it. Every time I think of an algorithm, a new corner case comes up that breaks it. Can someone please explain how to approach the problem?
Problem Statement
The Cytes Lottery is the biggest lottery in the world. On each ticket, there is a string of a-z letters. The company produces a draw string S. A person wins if his/her ticket string is a special substring of the draw string. A special substring is a substring which can be formed by ignoring at most K characters from drawString. For example, if draw string = "xyzabc" and tickets are [ac zb yhja] with K=1 then the winning tickets will be 2 i.e ac (won by ignoring "b" in drawstring) and zb (won by ignoring "a" in drawstring).
Now some people change their ticket strings in order to win the lottery. To avoid any kind of suspicion, they can make the following changes in their strings.
They can change character 'o' to character 'a' and vice versa
They can change character 't' to character 'l' and vice versa
They can erase a character from anywhere in the string
Note that they can ignore at most 'K' characters from the draw string to get a match with the ticket string.
Write an algorithm to find the number of people who win the lottery (either honestly or by cheating).
Input:
The first line of the input consists of an integer - numTickets, representing the number of tickets (N).
The second line consists of a string - drawString, representing the draw string (S).
The third line consists of N space-separated strings - tickets1, tickets2, ..., ticketsN, representing the tickets.
The last line consists of an integer - tolerance, representing the maximum number of characters that can be deleted from the drawString (K).
Output:
An integer representing the number of winning tickets (either fairly or by cheating).
Constraints:
0 <= numTickets <= 1000
0 <= length of drawString <= 200
0 <= length of tickets[i] <= 200
0 <= tolerance <= 1000
Note:
The drawString contains lowercase English letters
Example:
Input:
3
aabacd
abcde aoc actld
1
Output:
2

what will be the dp and transitions in this problem

Vasya has a string s of length n consisting only of digits 0 and 1. Also he has an array a of length n.
Vasya performs the following operation until the string becomes empty: choose some consecutive substring of equal characters, erase it from the string and glue together the remaining parts (any of them can be empty). For example, if he erases substring 111 from string 111110 he will get the string 110. Vasya gets a_x points for erasing a substring of length x.
Vasya wants to maximize his total points, so help him with this!
https://codeforces.com/problemset/problem/1107/E
I was trying to get my head around the editorial, but couldn't understand it... Can anyone suggest an easy way to do it?
input:
7
1101001
3 4 9 100 1 2 3
output:
109
Explanation
the optimal sequence of erasings is: 1101001 → 111001 → 11101 → 1111 → ∅.
Here, we consider removing prefixes instead of substrings. Why?
In any particular state we remove a run of equal characters at the front of that state's string (a prefix), which corresponds to a substring of the main string. So, our DP states will be start index, end index, prefix length.
Let's consider an example str = "1010110". Here, initially start=0, end=7, and prefix=1 (the first '1' is the only character in the prefix for now). We iterate over all the indices in the current state except the starting index and check if str[i]==str[start]. Here, for example, str[4]==str[0]. Now we divide the string into "010" (indices 1 to 3) with prefix=1 and "110" (indices 4 to 6) with prefix=2, since the '1' at index 0 joins the '1' at index 4 once "010" has been erased. These are now two independent subproblems. So, when there remains a string of length 1, we return a[prefix].
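The recursion described above can be sketched in Python as follows. This is an illustrative sketch written for this explanation, not the answerer's own code: the names max_points and solve are made up, the list a is taken in input order (so a[x-1] is the score for erasing x characters), and memoisation is done with functools.lru_cache.

```python
from functools import lru_cache


def max_points(s, a):
    """Maximum total score for erasing the binary string s.

    a[x - 1] is the score for erasing a run of x equal characters.
    """
    n = len(s)

    @lru_cache(maxsize=None)
    def solve(start, end, prefix):
        # State: substring s[start:end], where s[start] currently heads a
        # run of `prefix` equal characters (itself plus earlier ones glued on).
        if start >= end:
            return 0
        # Option 1: erase the current prefix run right away.
        best = a[prefix - 1] + solve(start + 1, end, 1)
        # Option 2: erase everything strictly between start and i first,
        # so that s[i] joins the prefix run.
        for i in range(start + 1, end):
            if s[i] == s[start]:
                best = max(best,
                           solve(start + 1, i, 1) + solve(i, end, prefix + 1))
        return best

    return solve(0, n, 1)
```

On the sample input, max_points("1101001", [3, 4, 9, 100, 1, 2, 3]) gives 109, matching the expected output. There are O(n^3) states with O(n) work each, so the sketch runs in O(n^4) time.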
Here is my code.

Minimum number of operations required to create string A by appending subsequences of string B to an empty string C

You are given two strings A and B, and an empty string C. In one operation you can remove any number of characters (from anywhere) from string B and append the remaining subsequence to string C. Find the minimum number of operations required to turn string C into string A.
e.g if
A is "ABCDE" and
B is "ABDEC" then
In 1st operation you will choose subsequence ABC from B and in 2nd operation DE.
So two operations are required.
if
A is "ABCDE"
B is "EDCBA" then
operations required 5.
Linear complexity, O(n), is expected.
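Just use a greedy algorithm.
1 - Let i = 0
2 - Let j = 0
3 - Search for the first A[i] in B at or after position j
4 - If it exists, let j be the position just after it in B, append the character to C, increment i, and repeat from 3
5 - If it doesn't exist, repeat from 2
Each time you get to 5 corresponds to the end of one operation; the operation still in progress when A is exhausted counts as one more. (This assumes every character of A occurs somewhere in B.)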
Assuming all the characters of A (and B) are different, then here is a solution with linear complexity. You need a hashmap or something similar, as well as an array of indices, Y, of equal length to A and B.
1 - Put each character of A in the hashmap as key, with its index as value.
2 - Look up each character of B in the hashmap to get the value i, and put its index into Y at the position i.
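3 - Go through Y counting the number of times that Y[i] < Y[i-1]. Your number of operations is that count plus one (for "ABCDE" and "ABDEC", Y is 0 1 4 2 3, which has one such drop, giving 2 operations).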
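Both approaches can be sketched in Python as below. These are minimal illustrations under stated assumptions: the function names are made up, the greedy version returns -1 when some character of A never occurs in B (a case the question does not cover), and the second version assumes all characters are distinct and that every character of A occurs in B. Note that the result is the number of drops in Y plus one, which is what matches the two examples in the question.

```python
def min_operations_greedy(a, b):
    """Greedy scan: extend the current operation as far as possible."""
    if not a:
        return 0
    ops = 1   # the operation currently in progress
    j = 0     # next usable position in b for the current operation
    for ch in a:
        pos = b.find(ch, j)
        if pos == -1:
            # Current operation cannot supply ch: start a new one.
            pos = b.find(ch)
            if pos == -1:
                return -1  # assumption: signal "impossible" this way
            ops += 1
        j = pos + 1
    return ops


def min_operations_distinct(a, b):
    """Linear-time variant, assuming all characters are distinct."""
    if not a:
        return 0
    index_in_a = {ch: i for i, ch in enumerate(a)}
    y = [0] * len(a)
    for pos, ch in enumerate(b):
        i = index_in_a.get(ch)
        if i is not None:
            y[i] = pos  # y[i] is where a[i] sits in b
    drops = sum(1 for i in range(1, len(y)) if y[i] < y[i - 1])
    return drops + 1
```

The greedy version calls str.find repeatedly, so it can take O(|A|·|B|) time in the worst case; only the distinct-characters variant is linear.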

Linear space data-structure supporting subsequence query on a static string

Build a data-structure from a given string S of length n which supports fast queries for checking whether an input string J of length m is a subsequence of S.
S is a static string and pre-processing time of the data-structure can be ignored.
Requirements:
The space consumption should be linear O(n)
The runtime of subsequence(J) should depend on m - not necessarily O(m) but, the faster the better.
What is subsequence?
A is a subsequence of B if A can be constructed by removing zero or more characters from B. E.g. ABA is a subsequence of ADBDBAC
What I tried
A data-structure which supports the Subsequence(J) query stores pointers from each letter in S to the next occurrence in S of every letter in the alphabet.
Let A be an array of length n + 1. A contains hash-tables hashed over the alphabet, σ. Each key-value pair (k,v) in a hash-table contains some letter k as key and its next occurrence as value v.
The hash-table A_0 contains the first occurrence of every letter in the alphabet.
The hash-table A_1 contains the index of second occurrence for the letter at S_0 along with the first occurrence of the other letters.
The hash-table A_2 contains the index of the second occurrence for the letters S_0 and S_1, assuming they are different letters (otherwise A_2 will contain the index of the third occurrence of the letter at S_0), along with the first occurrence of the other letters, and so on...
Example: If T is B C A D B, ¥ represents the hash-table A_0 and Ø represents a null pointer, the data-structure would look like:
|0 1 2 3 4 5
|¥ B C A D B
A|3 3 3 Ø Ø Ø
B|1 5 5 5 5 Ø
C|2 2 Ø Ø Ø Ø
D|4 4 4 4 Ø Ø
The alphabet \sigma is built from the letters in T and is static. Therefore, perfect hashing (FKS) can be used.
Running the query
To perform the Subsequence(J) query with the string J, we look up the index of the first occurrence of J_0 in S using A_0, then follow the table at that index for the next letter of J, and so on.
In the example we could query Subsequence("BAB") to test if BAB is a subsequence:
* look-up B in column 0 which returns index 1
* look-up A in column 1 which returns index 3
* look-up B in column 3 which returns index 5
As long as we don't hit a null pointer, the string is a subsequence. The hash lookups take constant time and we have to perform at most |J| of them, so the runtime is O(|J|).
The space consumption is O(|σ|·|S|), since each of the |S| + 1 hash-tables can hold up to |σ| entries, so it is not linear in n.
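The table-based structure described above can be sketched in Python as follows, mainly to make the lookups concrete. It is a rough illustration, not a space-efficient solution: the names build_next_tables and query_tables are made up, plain dictionaries stand in for the perfect hash tables, and positions are 1-based as in the example table. It stores one dictionary per position, so it uses space proportional to the alphabet size times the length of the string.

```python
def build_next_tables(s):
    """tables[i][c] is the smallest 1-based position p > i with s[p-1] == c.

    tables[0] therefore holds the first occurrence of every letter.
    A missing key plays the role of the null pointer.
    """
    n = len(s)
    tables = [None] * (n + 1)
    tables[n] = {}
    nxt = {}
    for i in range(n - 1, -1, -1):
        nxt[s[i]] = i + 1        # 1-based position of s[i]
        tables[i] = dict(nxt)    # copy: this is what costs the extra space
    return tables


def query_tables(tables, j):
    """Return True if j is a subsequence of the string behind `tables`."""
    pos = 0
    for ch in j:
        pos = tables[pos].get(ch)
        if pos is None:          # hit the null pointer
            return False
    return True
```

With tables = build_next_tables("BCADB"), the query for "BAB" follows positions 1, 3, 5 as in the walkthrough, while "ABA" fails because no A occurs after position 5.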
The simple and slow way to check whether or not J is a subsequence of S is:
1 - Start at the beginning of S
2 - For each character c in J, in order, move forward in S to the next occurrence of c.
3 - If and only if you find a match for every character of J, J is a subsequence of S.
You can accelerate these searches by building a map from each character that occurs in S to a sorted array of the positions at which that character occurs.
Then, to find the next occurrence of a character in step 2, you can look up the position array for that character and do a binary search in the array for the next occurrence after the current position.
Total worst-case complexity to do a subsequence check would be O(m log n), and the position arrays take O(n) space in total.
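A minimal Python sketch of the position-array idea, using the standard bisect module for the binary search. The function names build_positions and is_subsequence are made up for this example, and positions are 0-based.

```python
from bisect import bisect_left
from collections import defaultdict


def build_positions(s):
    """Map each character of s to the sorted list of indices where it occurs."""
    positions = defaultdict(list)
    for i, ch in enumerate(s):
        positions[ch].append(i)  # indices are appended in increasing order
    return dict(positions)


def is_subsequence(positions, j):
    """Check whether j is a subsequence of the string behind `positions`."""
    cur = 0  # the next match must be at an index >= cur
    for ch in j:
        occ = positions.get(ch)
        if not occ:
            return False             # ch never occurs in S
        k = bisect_left(occ, cur)    # first occurrence at or after cur
        if k == len(occ):
            return False             # no occurrence left
        cur = occ[k] + 1
    return True
```

For example, with positions = build_positions("ADBDBAC"), is_subsequence(positions, "ABA") is True and is_subsequence(positions, "CA") is False. Each query character costs one binary search, which is where the O(m log n) bound comes from.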

Resources