Unique numbers with missing digits - string

I have this problem that I must solve in time that is polynomial in N, K and D given below:
Let N, K be some natural numbers.
Then a, b, c ... are N numbers of exactly K digits each.
a, b, c ... contain only the digits 1 and 2 in some order given by the input.
Although, there have only D digits that are visible, the rest of them being hidden (the hidden digits will be noted with the character "?").
There may be different numbers such that one or more of a, b, c ... are generalizations of the said number:
e.g.
2?122 is a generalization for 21122 and 22122
2?122 is not a generalization for 11111
12??? and ?21?? are both generalizations for 12112
???22 and ???11 cannot be generalizations of the same number
Basically, some number is a generalization of the other if the latter can be one of the "unhidden" versions of the former.
Question:
How many different numbers there are such that at least one of a, b, c or ... is their generalization?***
Quick Reminder:
N = nº of numbers
K = nº of digits in each number
D = nº of visible digits in each number
Conditions & Limitations:
N, K, D are natural numbers
1 ≤ N
1 ≤ D < K
Input / Output snippets for verification of the algorithm:
Input:
N = 3, K = 5, D = 3
112??
?122?
1?2?1
Output:
8
Explanation:
The numbers are 11211, 11212, 11221, 11222, 12211, 12221, 21221, 21222, which are 8 numbers.
11211, 11212, 11221, 11222 are the generalizations of 112??
11221, 11222, 21221, 21222 are the generalizations of ?122?
11211, 11221, 12211, 12221 are the generalizations of 1?2?1
Input:
N = 2, K = 3, D = 1
1??
?2?
Output:
6
Explanation:
The numbers are 111, 112, 121, 122, 221, 222, which are 6 numbers.
From my calculations, I found out that there are 2^(K-D) possible numbers in total that have a as their generalization, 2^(K-D) possible numbers in total that have b as their generalization etc., leaving me with N*2^(K-D) numbers.
My big problem is that I found cases where a number has multiple generalizations and therefore it repeats inside N*2^(K-D), so the real nº of different numbers will be, in this case, something smaller than N*2^(K-D).
I don't know how to find only the different numbers and I need your help.
Thank you very much!
EDIT: Answer to «n. 1.8e9-where's-my-share m.»'s question from the comments:
If you have two generalisations, can you find out how many numbers they both generalise?
For two given general numbers a and b (general meaning that they both contain "?"), it is possible to find nº of numbers generalised by both a and b in polynomial time by using the following logic:
1 - we declare some variable q = 1
2 - we start "scanning" the digits of the two numbers simultaneously from left to right:
2.1 - if we find two unhidden digits and they are different, then no numbers are generalized by both a and b and we return 0
2.2 - if we find two hidden digits, then we multiply q by 2, since of the two general numbers result to both generalize some number, that number can have 1 or 2 in place of "?", therefore for each "?" we double the numbers that can be generalized from both a and b as long as step 2.1 is never true.
3 - if scanned all the digits and step 2.1 was never true, then we return 2^q
Therefore, the nº of numbers both a and b generalize is 0 or 2^q, according to the cases presented above.

Unfortunately, this is impossible to do in polynomial time (unless P=NP, and maybe not even then.) Your problem is equivalent to the problem of counting satisfying assignments to a formula in Disjunctive Normal Form, called the DNF counting problem. DNF counting is Sharp-P-hard, so a polynomial time solution could be used to solve all problems in NP in polynomial time too (and more).
To see the relationship, note that each pattern is equivalent to an AND of several literals. If you take '1' in a position to be a literal, and '2' in that position to be that literal negated, you can convert it to a disjunctive clause.
For example:
1 1 2 ? ?
becomes
(x_1 ∧ x_2 ∧ ¬x_3)
? 1 2 2 ?
becomes
(x_2 ∧ ¬x_3 ∧ ¬x_4)
1 ? 2 ? 1
becomes
(x_1 ∧ ¬x_3 ∧ x_5)
The question of how many numbers satisfy at least one of these patterns is exactly the question of how many assignments satisfy at least one of the equivalent clauses.

Related

Decompose an Integer in sum of integers containing only 1,2,3 as its digit

I was struggling with a problem on Atcoder, in which we have to find minimum K for a number N such that K numbers of whose digits are either 1, 2, or 3 can sum up to N.
Through editorial, I could only understand, if
p is good if p = 10a + b
where a is either 0 or a is itself that kind of number and b is either 1, 2, or 3.
from this point onward, I'm not able under how to proceed further and what the author is meant by a good number. This problem seems really delicate but its solution is bizarre. Also, give me a better solution if possible.
problem link: https://atcoder.jp/contests/arc123/tasks/arc123_c

How to find all strings that do not contain substring palindromes

Disclaimer: This is a problem lifted from HackerRank, but their editorial answer wasn't sufficient so I hoped to get better answers. If it's against any policy, please let me know and I'll take this down.
Problem:
You are given two integers, N and M. Count the number of strings of length N under the alphabet set of size M that doesn't contain any palindromic string of the length greater than 1 as a consecutive substring.
N=2,M=2 -> 2 :: AA, AB, BA, BB
N=2,M=3 -> 6 :: AA, AB, AC, BA, BB, BC, CA, CB, CC
ABCDE counts as it does not contain any palindromic substrings.
ABCCC does not count as it does contain "CCC", a palindrome of length >1.
Editorial
Here is the provided answer which I think is wrong:
For N>=3, there are (M−2) ways to choose any next symbol (after the first two) - basically it should not coincide with the previous and pred-previous symbols, that aren't equal.
If N=1, return M
If N=2, return M * (M-1)
If N>=3, return M * (M-1) * (M-2)^(N-2)
counterexample: N=4, M=3, "ABCC"
My Solution Try
When I was working on this problem, I tried to find all the strings that contained palindromic substrings and subtracting that from the total, M^N. I ran into a lot of problems with over counting. For example, "ABABA" has "ABA","BAB","ABA" of n=3, and "ABABA" of n=5.
Thanks for any help in elucidating this problem. I really hope for a good answer to figure this out!
Suppose you build up palindrome-free strings one letter at a time. For the first letter, you have M choices, and for the second, you have M-1, since you can't use the first letter. This much is obvious.
For every letter after the first two, you can't use the previous letter, and you can't use the letter before that, so that's two choices eliminated. What about the other letters? Well, if using one of those creates a palindrome, it would have to be a palindrome of length at least 4 - but if adding a letter creates a palindrome of length K+2 for K>=2, the string must already have had a palindrome of length K for the new palindrome to build off of. (For K<2, this is okay.) Since the string didn't have any palindromes of length >=2, we can conclude that adding any letter other than the previous two letters is fine.
Thus, we have M choices for the first letter, M-1 choices for the second, and M-2 for every letter after that.

Counting number of times an atom occurs in a table in J language

In J, to count the number of times an element occurs in a list in J is:
count =: 4 : '+/x=y'"0 1.
Alternatively, one can use the "member of interval" E. What is the equivalent expression, in J, for counting the number of times an "atom" occurs in a table?
I am also curious why the rank of "count" is given as 1 0 1, when it is specifically defined as a dyad. Why is the monad rank 1 also included? Can "count", as defined above be used as a monad?
I think what you are looking for is to make the table into a list using (Ravel) monadic ','and then proceed as before. So count becomes:
count=: +/#: (= ,) NB. tacit
count=: 4 : '+/ x = ,y' NB. explicit
Cheers, bob

Algorithm to form a given pattern using some strings

Given are 6 strings of any length. The words are to be arranged in the pattern shown below. They can be arranged either vertically or horizontally.
--------
| |
| |
| |
---------------
| |
| |
| |
--------
The pattern need not to be symmetric and there need to be two empty areas as shown.
For example:
Given strings
PQF
DCC
ACTF
CKTYCA
PGYVQP
DWTP
The pattern can be
DCC...
W.K...
T.T...
PGYVQP
..C..Q
..ACTF
where dot represent empty areas.
The other example is
RVE
LAPAHFUIK
BIRRE
KZGLPFQR
LLHU
UUZZSQHILWB
Pattern is
LLHU....
A..U....
P..Z....
A..Z....
H..S....
F..Q....
U..H....
I..I....
KZGLPFQR
...W...V
...BIRRE
If multiple patterns are possible then pattern with lexicographically smallest first line, then second line and so on is to be formed. What algorithm can be used to solve this?
Find strings which suits to this constraint:
strlen(a) + strlen(b) - 1 = strlen(c)
strlen(d) + strlen(e) - 1 = strlen(f)
After that try every possible situation if they are valid. For example;
aaa.....
d.f.....
d.f.....
d.f.....
cccccccc
..f....e
..f....e
..bbbbbb
There will be 2*2*2 = 8 different situation.
There are a number of heuristics that you can apply, but before that, let's go over some properties of the puzzle.
+aa+
c f
+ee+eee+
f d
+bbb+
Let us call the length of the string with the same character as appeared in the diagram above. We have:
a + b - 1 = e
c + d - 1 = f
I will refer to the 2 strings for the cross in the middle as middle strings.
We also infer that the length of the string cannot be less than 2. Therefore, we can infer:
e > a, e > b
f > c, f > d
From this, we know that the 2 shortest strings cannot be middle strings, due to the inequality above.
The 3 largest strings cannot be equal also, since after choosing any of 3 string as middle string, we are left with 2 largest strings that are equal, and it is impossible according to the inequality above.
The puzzle is only tricky when the lengths are regular. When the lengths are irregular, you can do direct mapping from length to position.
If we have the 2 largest strings being equal, due to the inequality above, they are the 2 middle strings. The worst case for this one is a "regular" puzzle, where the length a, b, c, d are equal.
If the 2 largest strings are unequal, the largest string's position can be determined immediately (since its length is unique in the puzzle) - as one of the middle string. In worst case, there can be 3 candidates for the other middle string - just brute force and check all of them.
Algorithm:
Try to map unique length string to the position.
Brute force the 2 strings in the middle (taken into consideration what I mentioned above), and brute force to fill in the rest.
Even with stupid brute force, there are only 6! = 720 cases, if the string can only go from left to right, up to down (no reverse). There will be 46080 cases (* 2^6) if the string is allowed to be in any direction.

string Rabin-Karp elementary number notations

I am reading about String algorithms in Introduction to Algorithms by Cormen etc
Following is text about some elementary number theoretic notations.
Note: In below text refere == as modulo equivalence.
Given a well-defined notion of the remainder of one integer when divided by another, it is convenient to provide special notation to indicate equality of remainders. If (a mod n) = (b mod n), we write a == b (mod n) and say that a is equivalent to b, modulo n. In other words, a == b (mod n) if a and b have the same remainder when divided by n. Equivalently, a == b (mod n) if and only if n | (b - a).
For example, 61 == 6 (mod 11). Also, -13 == 22 == 2 == (mod 5).
The integers can be divided into n equivalence classes according to their remainders modulo n. The equivalence class modulo n containing an integer a is
[a]n = {a + kn : k Z} .
For example, [3]7 = {. . . , -11, -4, 3, 10, 17, . . .}; other denotations for this set are [-4]7 and [10]7.
Writing a belongs to [b]n is the same as writing a == b (mod n). The set of all such equivalence classes is
Zn = {[a]n : 0 <= a <= n - 1}.----------> Eq 1
My question in above text is in equation 1 it is mentioned that "a" should be between 0 and n-1, but in example it is given as -4 which is not between 0 and 6, why?
In addition to above it is mentioned that for Rabin-Karp algorithm we use equivalence of two numbers modulo a third number? What does this mean?
I'll try to nudge you in the right direction, even though it's not about programming.
The example with -4 in it is an example of an equivalence class, which is a set of all numbers equivalent to a given number. Thus, in [3]7, all numbers are equivalent (modulo 7) to 3, and that includes -4 as well as 17 and 710 and an infinity of others.
You could also name the same class [10]7, because every number that is equivalent (modulo 7) to 3 is at the same time equivalent (modulo 7) to 10.
The last definition gives a set of all distinct equivalence classes, and states that for modulo 7, there is exactly 7 of them, and can be produced by numbers from 0 to 6. You could also say
Zn = {[a]n : n <= a < 2 * n}
and the meaning will remain the same, since [0]7 is the same thing as [7]7, and [6]7 is the same thing as [13]7.
This is not a programming question, but never mind...
it is mentioned that "a" should be between 0 and n-1, but in example it is given as -4 which is not between 0 and 6, why?
Because [-4]n is the same equivalence class as [x]n for some x such that 0 <= x < n. So equation 1 takes advantage of the fact to "neaten up" the definition and make all the possibilities distinct.
In addition to above it is mentioned that for Rabin-Karp algorithm we use equivalence of two numbers modulo a third number? What does this mean?
The Rabin-Karp algorithm requires you to calculate a hash value for the substring you are searching for. When hashing, it is important to use a hash function that uses the whole of the available domain even for quite small strings. If your hash is a 32 bit integer and you just add the successive unicode values together, your hash will usually be quite small resulting in lots of collisions.
So you need a function that can give you large answers. Unfortunately, this also exposes you to the possibility of integer overflow. Hence you use modulo arithmetic to keep the comparisons from being messed up by overflow.

Resources