Substrings and Subsequences - string

In a string s of length n, how many substrings and subsequences can there be? A substring is obtained by deleting any prefix and any suffix from s, while a subsequence is any string formed by deleting zero or more, not necessarily consecutive, positions of s.

Assuming you count by position, so that equal strings taken from different positions are counted separately:
Substrings = n(n+1)/2 (counting non-empty substrings only)
Number of substrings of length 1 = n
Number of substrings of length 2 = n-1
Number of substrings of length 3 = n-2
...
Number of substrings of length n = n - (n-1) = 1
The total is the sum of the integers from 1 to n.
Subsequences = 2^n (including the empty subsequence)
Think of the string as a bit array: each character is either included in the subsequence or not, so there are 2^n combinations.
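As a sanity check on both formulas, here is a minimal Python sketch that counts by brute force. The function names count_substrings and count_subsequences are made up for this illustration. Both count by position, so duplicates are not merged, and the subsequence count includes the empty one.

```python
from itertools import combinations


def count_substrings(s):
    """Count non-empty substrings by enumerating every (start, end) pair."""
    n = len(s)
    return sum(1 for i in range(n) for j in range(i + 1, n + 1))


def count_subsequences(s):
    """Count subsequences by enumerating every set of kept positions."""
    n = len(s)
    return sum(1 for r in range(n + 1) for _ in combinations(range(n), r))


if __name__ == "__main__":
    s = "abcd"
    n = len(s)
    print(count_substrings(s), n * (n + 1) // 2)
    print(count_subsequences(s), 2 ** n)
```

The brute force is exponential for subsequences, so it is only useful for checking the closed forms on short strings.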

Related

VBA (Excel): comparing numbers as strings

How do I compare numbers as strings?
I want to compare, for example, 2 and 10. When I sort them in Excel in ascending order, 2 comes before 10, but when I compare the two in VBA, 10 appears to be lower than 2. I'm using the StrComp function. I need to compare them as strings because this is part of a bigger program that searches for identical strings in Excel columns. The strings can be normal strings, numbers, and number-ish strings such as "12-131xxx".
Mnich, with StrComp, both vbBinaryCompare and vbTextCompare will give you the results you're currently getting. When Excel sorts numbers/integers, it sorts as you would expect, but when it sorts numbers stored as strings, it uses a textual comparison, so every number starting with 1, even 111, is ranked lower than 2. As Comintern mentions, you have to add leading zeros to get around this.
Or, if you want a numeric comparison to be made before the strings are ranked, you could try a Function that extracts the digits and let that weigh your decision:
'pass in a 1-based array from the main routine;
'returns an array holding only the digits of each element
Function StackOverflow(arr()) As String()
    Dim arr_NewStr() As String
    Dim x As Long
    Dim i As Long
    ReDim arr_NewStr(1 To UBound(arr))
    For i = 1 To UBound(arr)
        'scan every character of the current element, including the last one
        For x = 1 To Len(arr(i))
            If IsNumeric(Mid(arr(i), x, 1)) Then
                arr_NewStr(i) = arr_NewStr(i) & Mid(arr(i), x, 1)
            End If
        Next x
    Next i
    'hand the extracted digits back to the caller
    StackOverflow = arr_NewStr
End Function
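The leading-zero idea is not specific to VBA. As a rough illustration in Python (not part of the original answer), the hypothetical helper natural_key below pads every run of digits to a fixed width, so that a plain textual comparison of the padded keys agrees with numeric order. The width of 10 is an arbitrary assumption and must be at least as long as the longest digit run.

```python
import re


def natural_key(text, width=10):
    """Pad each run of digits with leading zeros (width is an assumption)."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)


if __name__ == "__main__":
    values = ["10", "2", "111"]
    print(sorted(values))                   # plain textual order
    print(sorted(values, key=natural_key))  # order after zero padding
```

The original strings are left untouched, so they can still be compared for exact equality; only the sort key is padded.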

Given an integer N greater than zero, how many sequences of 1's and 2's are there?

Given an integer N greater than zero,
how many sequences of 1's and 2's are there such that the sum of the numbers in the sequence is N?
(A sequence does not have to contain both 1 and 2.)
Example:
for N = 2: 11, 2 => ans = 2 sequences of 1's and 2's
for N = 3: 111, 12, 21 => ans = 3 sequences of 1's and 2's
One can derive a recursive formula by looking at the last digit. A sequence summing to N+1 is obtained either by appending a 1 to a sequence summing to N, or by appending a 2 to a sequence summing to N-1. So it gives:
R(N+1) = R(N) + R(N-1)
So we have a Fibonacci-type sequence with R(1)=1 and R(2)=2.
See https://en.wikipedia.org/wiki/Fibonacci_number
It gives
R(N) = (phi^(N+1) - psi^(N+1)) / sqrt(5),
where phi = (1 + sqrt(5))/2 and psi = (1 - sqrt(5))/2.
So you can program the answer using a constant number of operations.
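A short Python sketch of the recurrence, checked against brute-force enumeration, might look like this. The function names are invented for the example. The closed-form variant relies on floating-point arithmetic, so it is only assumed to be exact for moderate N (roughly N below 70 with double precision); the iterative version is exact for any N.

```python
from itertools import product
from math import sqrt


def count_sequences(n):
    """R(n) via the recurrence R(n+1) = R(n) + R(n-1), R(1)=1, R(2)=2."""
    a, b = 1, 1  # a = R(0) (the empty sequence), b = R(1)
    for _ in range(n - 1):
        a, b = b, a + b
    return b


def count_sequences_closed_form(n):
    """Same count from Binet's formula; floating point, moderate n only."""
    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2
    return round((phi ** (n + 1) - psi ** (n + 1)) / sqrt(5))


def count_sequences_brute_force(n):
    """Enumerate every sequence of 1's and 2's of length 1..n and test the sum."""
    return sum(
        1
        for length in range(1, n + 1)
        for seq in product((1, 2), repeat=length)
        if sum(seq) == n
    )


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, count_sequences(n), count_sequences_brute_force(n))
```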

Number of substrings with given constraints

I am given a sorted string and I wish to count the number of substrings (not necessarily contiguous) that satisfy the following constraints:
All the letters in the substring must be in sorted order.
The substring must contain exactly 1 vowel.
The length of the substring must be greater than or equal to 3.
For example:
for "aabbc",
we have 3 substrings, "abc", "abb" and "abbc", that match the above constraints. So the answer here is 3.
How do I go about this for a general string?
I have tried this for 2-3 hours but couldn't find a proper way. I was asked this question in a programming coding round today, and I fear the same question will be asked in the interview tomorrow. Even hints or an approach would be appreciated.
Suppose the string contains k distinct vowels, and let A be an array holding the histogram of the non-vowels (i.e. A[0] is the number of occurrences of the first non-vowel, A[1] is the number of occurrences of the second non-vowel, and so on).
Then (ignoring the length constraint) we have k choices for the vowel, and (A[0]+1)*(A[1]+1)*(A[2]+1)*... choices for the remaining letters (the i-th non-vowel can be used 0, 1, 2, ..., A[i] times).
This total includes k strings of length 1 (the vowel alone) and k*len(A) strings of length 2 (the vowel plus a single non-vowel), so simply subtract these from the total.
Example Python code:
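from collections import Counter

s = 'aabbc'
vowels = 'aeiou'
C = Counter(s)
t = 1            # product of (count + 1) over the distinct non-vowels
vowel_count = 0  # number of distinct vowels (k)
cons_count = 0   # number of distinct non-vowels (len(A))
for letter, count in C.items():
    if letter in vowels:
        vowel_count += 1
    else:
        cons_count += 1
        t *= count + 1
# subtract the vowel-only and vowel-plus-one-consonant cases
print(vowel_count * (t - cons_count - 1))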
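To gain some confidence in the formula, it can be compared with a brute-force count on short inputs. The sketch below is an illustration only: the function names are made up, and it assumes that the input is already sorted, that equal strings are counted once, and that "one vowel" means exactly one vowel character.

```python
from collections import Counter
from itertools import combinations

VOWELS = set("aeiou")


def count_by_formula(s):
    """k * (prod(A[i] + 1) - len(A) - 1), as described in the answer above."""
    counts = Counter(s)
    vowel_kinds = sum(1 for ch in counts if ch in VOWELS)
    consonant_kinds = sum(1 for ch in counts if ch not in VOWELS)
    t = 1
    for ch, c in counts.items():
        if ch not in VOWELS:
            t *= c + 1
    return vowel_kinds * (t - consonant_kinds - 1)


def count_by_brute_force(s):
    """Enumerate all subsequences of length >= 3 and keep the distinct valid ones."""
    found = set()
    for length in range(3, len(s) + 1):
        for idx in combinations(range(len(s)), length):
            cand = "".join(s[i] for i in idx)
            # a subsequence of a sorted string is sorted, so only vowels need checking
            if sum(ch in VOWELS for ch in cand) == 1:
                found.add(cand)
    return len(found)


if __name__ == "__main__":
    for s in ("aabbc", "abbcce"):
        print(s, count_by_formula(s), count_by_brute_force(s))
```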

Count the number of binary strings of length n that are repeatable

The problem is to find the number of repeatable binary strings of length n. A binary string is repeatable if it can be obtained by repeating one of its proper substrings two or more times.
Example
"1010" is a repeatable string, as it can be obtained from "10" by repeating it 2 times.
"1001" is not a repeatable string, as it cannot be obtained from any substring of "1001" by repeating it any number of times.
The first solution I thought of is to generate all possible binary strings of length n and check whether each one is repeatable using the KMP algorithm, but this is not feasible even for small n such as n = 40.
The second approach I thought of is:
for each divisor k of n, find all substrings of length k that repeat n/k times.
Example: for n = 6 we have the divisors 1, 2, 3.
For length 1 we have 2 substrings, "1" and "0", each repeated 6 times, so "111111" and "000000" are repeatable strings.
For length 2 we have 4 substrings, "00", "01", "10" and "11", so "000000", "010101", "101010" and "111111" are repeatable strings.
Similarly, for length 3 we have 8 strings that are repeatable.
Sum up all the divisor-generated strings and subtract the duplicates.
In the above example the strings "111111" and "000000" were counted 3 times, once for each divisor, so clearly I am overcounting. I need to subtract the duplicates, but I can't think of a way to do that. How can I do it?
Am I headed in the right direction, or do I need another approach?
When you use the second scheme, leave out the substrings that are themselves repeatable. For instance, 00 and 11 are repetitions of 0 and 1 respectively, so for length 2 only consider "01" and "10";
for length 3 only consider "001", "010", "011", "100", "101", "110"
...
Generally, reading a substring of length n as a binary number,
for any length n, remove 0 and (2^n)-1 (the all-zeros and all-ones strings),
for even n, also remove the multiples of (2^(n/2)+1): 0, (2^(n/2)+1), (2^(n/2)+1)*2, ..., (2^n)-1,
and if n is divisible by 3, also remove the multiples of (1+2^(n/3)+2^(2n/3)): (1+2^(n/3)+2^(2n/3)), (1+2^(n/3)+2^(2n/3))*2, ...
Continue this for every divisor of n.
One idea is that if, for each divisor, we only count the divisor-sized strings that are not themselves repetitions, then the counts from the divisor's own divisors will account for the divisor-sized strings that are repetitions.
f(1) = 0
f(n) = sum(2^d - f(d)), where 1 <= d < n and d divides n
...meaning that for each proper divisor d of n we add only the strings of length d that are not made from repeated substrings.
f(2) = 2^1-0
f(3) = 2^1-0
f(4) = 2^1-0 + 2^2-2
f(6) = 2^1-0 + 2^2-2 + 2^3-2
...
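The recurrence is easy to try out. Below is a minimal Python sketch with memoisation, together with a brute-force check that is only practical for small n. The names repeatable_count and repeatable_count_brute_force are invented for this example.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def repeatable_count(n):
    """f(n) = sum(2^d - f(d)) over proper divisors d of n, with f(1) = 0."""
    return sum(2 ** d - repeatable_count(d) for d in range(1, n) if n % d == 0)


def repeatable_count_brute_force(n):
    """Test every binary string of length n against every proper divisor."""
    total = 0
    for bits in product("01", repeat=n):
        s = "".join(bits)
        if any(n % d == 0 and s == s[:d] * (n // d) for d in range(1, n)):
            total += 1
    return total


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, repeatable_count(n), repeatable_count_brute_force(n))
```

Because the recurrence only loops over divisors, it stays cheap for sizes such as n = 40, where enumerating all 2^40 strings is out of reach.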

How to find the number of lexicographically minimal string rotations?

How do I find the number of lexicographically minimal rotations of a string?
For example:
S = abab, N = 2
S = abca, N = 1
S = aaaa, N = 4
I tried Duval's algorithm, but it takes too long. The string is 100000000 characters long.
Easy: just determine the minimum period of the string, that is, the smallest K dividing the length such that the string is its first K characters repeated. A string of length L with minimal period K produces identical (and hence lexicographically equal) strings for exactly L/K different rotations, so whatever the lexicographic minimum is, it will be the result of L/K different rotations, and the answer is N = L/K.
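One common way to get the minimum period in linear time is the KMP prefix (failure) function. This is a swapped-in technique that the answer itself does not name. In the sketch below, the function name count_minimal_rotations is hypothetical, and the candidate period is only accepted when it divides the length; otherwise the string is treated as having no shorter repeating block.

```python
def count_minimal_rotations(s):
    """Return len(s) / K, where K is the smallest block whose repetition gives s."""
    n = len(s)
    if n == 0:
        return 0
    # fail[i] = length of the longest proper prefix of s[:i+1] that is also its suffix
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    period = n - fail[-1]
    if n % period != 0:
        period = n  # no shorter block tiles the whole string
    return n // period


if __name__ == "__main__":
    for s in ("abab", "abca", "aaaa"):
        print(s, count_minimal_rotations(s))
```

The loop makes a single pass over the string and uses one integer per character. For a string of 100000000 characters a pure-Python list may be too slow or too large, so the same logic would probably need a compiled language or a typed array.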

Resources