Context free grammar design - Why is this not allowed? - programming-languages

I am still learning about context-free grammars. One kind of question I am unsure about is coming up with a grammar for a specific pattern.
For example:
"At least 3 zeros."
Why not just come up with a grammar that is like: S -> 000 ?
What rule forbids this kind of grammar?

What can limit your grammar (forbid some constructs) is the grammar form that you use. In Chomsky normal form (CNF) every rule has one of these shapes:
A → BC
A → a
S → ε
In the Greibach normal form (GNF) the rules are:
A → aBCD...
There is also the Backus–Naur form (BNF), which offers a richer syntax, followed by the Extended Backus–Naur form (EBNF) and the Augmented Backus–Naur form (ABNF), which are richer still.
At least 3 zeroes in CNF:
S → AB
A → ZZ
B → ZB
B → 0
Z → 0
At least 3 zeroes in GNF:
S → 0A
A → 0B
B → 0B
B → 0
At least 3 zeroes in BNF:
<S> ::= "000" <A>
<A> ::= "0" <A> | ""
At least 3 zeroes in EBNF:
S = "000", { "0" } ;
At least 3 zeroes in ABNF:
S = 3*"0"
Some recognition (parsing) algorithms require the grammar to be in a particular form, so a grammar that is not already in that form has to be converted first.
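One algorithm that needs a particular form is the CYK recognizer, which only works on grammars in CNF. The Python sketch below runs it on the CNF grammar for "at least 3 zeroes" given above. The function name cyk and the way the rules are encoded as tuples are assumptions made for this example.

```python
# CNF grammar for "at least 3 zeroes", copied from the rules above.
UNARY = [("B", "0"), ("Z", "0")]                              # A -> a
BINARY = [("S", "A", "B"), ("A", "Z", "Z"), ("B", "Z", "B")]  # A -> BC


def cyk(word, unary=UNARY, binary=BINARY, start="S"):
    """Return True if `word` is derivable from `start` (CYK on a CNF grammar)."""
    n = len(word)
    if n == 0:
        return False  # this grammar has no S -> epsilon rule
    # table[i][l - 1] holds the nonterminals that derive word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {lhs for lhs, terminal in unary if terminal == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, b, c in binary:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


if __name__ == "__main__":
    for w in ["0", "00", "000", "00000", "0010"]:
        print(w, cyk(w))
```

The same table-filling loop would not work directly on the BNF or EBNF versions, which is the practical reason for converting a grammar to CNF first.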


Unique numbers with missing digits

I have this problem that I must solve in time that is polynomial in N, K and D given below:
Let N, K be some natural numbers.
Then a, b, c ... are N numbers of exactly K digits each.
a, b, c ... contain only the digits 1 and 2 in some order given by the input.
However, only D digits of each number are visible; the rest are hidden (hidden digits are written as the character "?").
There may be different numbers such that one or more of a, b, c ... are generalizations of the said number:
e.g.
2?122 is a generalization for 21122 and 22122
2?122 is not a generalization for 11111
12??? and ?21?? are both generalizations for 12112
???22 and ???11 cannot be generalizations of the same number
Basically, some number is a generalization of the other if the latter can be one of the "unhidden" versions of the former.
Question:
How many different numbers are there such that at least one of a, b, c ... is their generalization?
Quick Reminder:
N = nº of numbers
K = nº of digits in each number
D = nº of visible digits in each number
Conditions & Limitations:
N, K, D are natural numbers
1 ≤ N
1 ≤ D < K
Input / Output snippets for verification of the algorithm:
Input:
N = 3, K = 5, D = 3
112??
?122?
1?2?1
Output:
8
Explanation:
The numbers are 11211, 11212, 11221, 11222, 12211, 12221, 21221, 21222, which are 8 numbers.
11211, 11212, 11221, 11222 are the generalizations of 112??
11221, 11222, 21221, 21222 are the generalizations of ?122?
11211, 11221, 12211, 12221 are the generalizations of 1?2?1
Input:
N = 2, K = 3, D = 1
1??
?2?
Output:
6
Explanation:
The numbers are 111, 112, 121, 122, 221, 222, which are 6 numbers.
From my calculations, I found out that there are 2^(K-D) possible numbers in total that have a as their generalization, 2^(K-D) possible numbers in total that have b as their generalization etc., leaving me with N*2^(K-D) numbers.
My big problem is that I found cases where a number has multiple generalizations and therefore it repeats inside N*2^(K-D), so the real nº of different numbers will be, in this case, something smaller than N*2^(K-D).
I don't know how to find only the different numbers and I need your help.
Thank you very much!
EDIT: Answer to «n. 1.8e9-where's-my-share m.»'s question from the comments:
If you have two generalisations, can you find out how many numbers they both generalise?
For two given general numbers a and b (general meaning that they both contain "?"), it is possible to find nº of numbers generalised by both a and b in polynomial time by using the following logic:
1 - we declare some variable q = 1
2 - we scan the digits of the two numbers simultaneously from left to right:
2.1 - if both digits are visible and they differ, then no number is generalized by both a and b, and we return 0
2.2 - if both digits are hidden, then we multiply q by 2: a number generalized by both a and b can have either 1 or 2 in that position, so each such position doubles the count, as long as step 2.1 never applies
2.3 - otherwise (the digits agree, or exactly one of them is hidden) the digit in that position is fixed and q stays unchanged
3 - if we scanned all the digits and step 2.1 never applied, then we return q
Therefore, the nº of numbers both a and b generalize is either 0 or q, where q is a power of 2 with one factor of 2 per position hidden in both a and b.
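The scan described above can be sketched in Python as follows. The function name count_common is an assumption made for this example, and patterns are taken to be strings of equal length over the characters 1, 2 and ?.

```python
def count_common(a: str, b: str) -> int:
    """Count the numbers generalized by both pattern a and pattern b."""
    q = 1
    for x, y in zip(a, b):
        if x == "?" and y == "?":
            q *= 2       # position is free in both patterns: 1 or 2
        elif x != "?" and y != "?" and x != y:
            return 0     # two visible digits disagree: nothing in common
        # otherwise the position is fixed by whichever digit is visible
    return q


if __name__ == "__main__":
    print(count_common("12???", "?21??"))
    print(count_common("???22", "???11"))
```

The loop touches each of the K positions once, so one pair of patterns costs O(K) time.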
Unfortunately, no polynomial-time algorithm for this is known, and none exists unless P = NP (and possibly not even then). Your problem is equivalent to counting the satisfying assignments of a formula in disjunctive normal form, known as the DNF counting problem. DNF counting is #P-hard, so a polynomial-time solution could be used to solve every problem in NP in polynomial time as well (and more).
To see the relationship, note that each pattern is equivalent to an AND of several literals. If you take '1' in a position to be a literal, and '2' in that position to be the negation of that literal, each pattern becomes a conjunctive clause, and the whole set of patterns becomes the OR of those clauses, which is a formula in DNF.
For example:
1 1 2 ? ?
becomes
(x_1 ∧ x_2 ∧ ¬x_3)
? 1 2 2 ?
becomes
(x_2 ∧ ¬x_3 ∧ ¬x_4)
1 ? 2 ? 1
becomes
(x_1 ∧ ¬x_3 ∧ x_5)
The question of how many numbers satisfy at least one of these patterns is exactly the question of how many assignments satisfy at least one of the equivalent clauses.
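For small N the count can still be computed exactly by inclusion-exclusion over subsets of patterns. The Python sketch below does that. It is exponential in N, so it does not contradict the hardness result and is only meant for checking small inputs. The names merge and count_union are assumptions made for this example.

```python
from itertools import combinations


def merge(p, q):
    """Intersect two patterns; return None if two visible digits clash."""
    out = []
    for x, y in zip(p, q):
        if x == "?":
            out.append(y)
        elif y == "?" or x == y:
            out.append(x)
        else:
            return None
    return "".join(out)


def count_union(patterns):
    """Count numbers matched by at least one pattern (inclusion-exclusion)."""
    total = 0
    for r in range(1, len(patterns) + 1):
        sign = 1 if r % 2 == 1 else -1
        for subset in combinations(patterns, r):
            merged = subset[0]
            for p in subset[1:]:
                merged = merge(merged, p)
                if merged is None:
                    break
            if merged is not None:
                # every remaining "?" can independently be 1 or 2
                total += sign * 2 ** merged.count("?")
    return total


if __name__ == "__main__":
    print(count_union(["112??", "?122?", "1?2?1"]))
    print(count_union(["1??", "?2?"]))
```

On the two sample inputs from the question this gives 8 and 6, matching the expected outputs.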

How do you interpret it? (u∈Σ∗)

Here is the full rule
{a^k u a^k| k≥1, u∈Σ∗}
Does this mean that u can be replaced by a single a, a single b, or any combination of a's and b's?
So if k=1 then is it aaa | aba OR a(aba)a | a(ba)a
Thanks
Rahman
This rule means every string in the language has the same number of a's at the beginning as at the end, with whatever you want (including more a's) between.
So aaa, aba, aabaa and abaa are all in the language (assuming b is in Σ).
In fact, it is enough that the string is at least 2 characters long and has an a at both ends (left as an exercise).
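A quick way to convince yourself of that claim is to compare the definition with the shortcut on all short strings. The Python sketch below does so for Σ = {a, b}, which is an assumption since the question never fixes Σ. Both function names are made up for this example.

```python
from itertools import product


def in_language_by_definition(s: str) -> bool:
    """Try every k >= 1 such that s = a^k u a^k for some u in Sigma*."""
    for k in range(1, len(s) // 2 + 1):
        if s[:k] == "a" * k and s[-k:] == "a" * k:
            return True  # u is the (possibly empty) middle part s[k:len(s)-k]
    return False


def in_language_shortcut(s: str) -> bool:
    """Length at least 2 and an a at both ends."""
    return len(s) >= 2 and s[0] == "a" and s[-1] == "a"


if __name__ == "__main__":
    for n in range(0, 9):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            assert in_language_by_definition(s) == in_language_shortcut(s)
    print("definition and shortcut agree on all strings up to length 8")
```

The agreement is no accident: any string with a^k at both ends also has a single a at both ends, so k = 1 always suffices.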

Regular expression over the language C={a,b}

Good evening everyone, I'm stuck on the following regular expression.
I think there is a much easier approach to it than mine.
I have to write down the regular expression and the DFA that, over the alphabet {a,b}, accept all the strings that start and end with b and have an even number of a's.
My attempt was to go through cases, but the outcome wasn't too great.
I tried something like this:
b b* (aba)* (aab)* (aa)* (aab)* (aba)* b*b
But I think this is not complete.
Should I follow some general rule to achieve this task, or do I just need to practice regular expressions?
Thanks, any tip or help would be appreciated.
A DFA seems easier to make here, so we can start there and derive the regular expression from there.
We will need at least one, initial, state. This state cannot be accepting because the empty string does not start and end with b. We'll call this q0.
If we see an a in this state, we are looking at a string that does not start with b, so we cannot accept it no matter what comes next. We can represent this with a new, dead, state. We'll call this q1.
If we see a b in q0, we need a new state to represent the fact that we are well on the way to seeing a string that meets the criteria. Indeed, the string b starts and ends with a b and has an even number of a (zero is even); so this state must be accepting. Call this q2.
If we see an a in q2 then we have an odd number of as and did not last see a b, so we cannot accept the string. However, it's still possible to accept a string from this state by seeing an odd number of as followed by at least one b. Call the state to represent this q3.
If we see a b in q2, we're in the same situation as before (even number of a and last saw a b, so we can accept). Stay in q2.
If in q3 and we see an a, we now have an even number of a again and just need a b. Call this new state q4. If we see a b, we still need an a so we might as well stay in q3.
If in q4 and we see an a, we again need more as and can return to q3. If, on the other hand, we get a b, we can return to q2 since the string is in our language.
The DFA looks like this:
q    s    q'
--   --   --
q0   a    q1      q0: initial state
q0   b    q2      q1: dead state, did not begin with b
q1   a    q1      q2: accepting state, even #a and start/stop with b
q1   b    q1      q3: start with b, odd #a
q2   a    q3      q4: start with b, even #a, stop with a
q2   b    q2
q3   a    q4
q3   b    q3
q4   a    q3
q4   b    q2
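The transition table can be checked mechanically. The Python sketch below encodes it as a dictionary, with the dead state q1 looping on both symbols as its description requires, and compares the DFA against the plain-English specification on all short strings. The names DELTA, accepts and spec are assumptions made for this example.

```python
from itertools import product

# (state, symbol) -> next state, following the table above
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q1", ("q1", "b"): "q1",  # dead state: never leaves
    ("q2", "a"): "q3", ("q2", "b"): "q2",
    ("q3", "a"): "q4", ("q3", "b"): "q3",
    ("q4", "a"): "q3", ("q4", "b"): "q2",
}


def accepts(s: str) -> bool:
    """Run the DFA from q0; q2 is the only accepting state."""
    state = "q0"
    for ch in s:
        state = DELTA[(state, ch)]
    return state == "q2"


def spec(s: str) -> bool:
    """Starts and ends with b and has an even number of a's."""
    return s.startswith("b") and s.endswith("b") and s.count("a") % 2 == 0


if __name__ == "__main__":
    for n in range(0, 11):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            assert accepts(s) == spec(s), s
    print("DFA matches the specification on all strings up to length 10")
```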
To get the regular expression we can find regular expressions leading to each state, iteratively, and then take the union of regular expressions for accepting states. In this case, only q2 is accepting so all we need is a regular expression for that state. We proceed iteratively, substituting at each stage.
round 0
(q0): e
(q1): (q0)a + (q1)(a+b)
(q2): (q0)b + (q2)b + (q4)b
(q3): (q2)a + (q3)b + (q4)a
(q4): (q3)a
round 1
(q0): e
(q1): a + (q1)(a+b) = a(a+b)*
(q2): b + (q2)b + (q4)b = (b+(q4)b)b*
(q3): (q2)a + (q3)b + (q4)a = ((q2)+(q4))ab*
(q4): (q3)a
round 2
(q0): e
(q1): a(a+b)*
(q2): (b+(q3)ab)b*
(q3): ((q2)+(q3)a)ab* = (q2)ab* + (q3)aab* = (q2)ab*(aab*)*
(q4): (q3)a
round 3
(q0): e
(q1): a(a+b)*
(q2): (b+(q3)ab)b*
(q3): (b+(q3)ab)b*ab*(aab*)* = bb*ab*(aab*)*+(q3)abb*ab*(aab*)* = bb*ab*(aab*)*(abb*ab*(aab*)*)*
(q4): (q3)a
round 4
(q0): e
(q1): a(a+b)*
(q2): (b+bb*ab*(aab*)*(abb*ab*(aab*)*)*ab)b*
(q3): bb*ab*(aab*)*(abb*ab*(aab*)*)*
(q4): bb*ab*(aab*)*(abb*ab*(aab*)*)*a
Therefore, a regular expression is this:
r = (b+bb*ab*(aab*)*(abb*ab*(aab*)*)*ab)b*
= bb* + bb*ab*(aab*)*(abb*ab*(aab*)*)*abb*
The bb* part encodes the fact that any nonempty string of b's is a string in the language.
The other part begins and ends with bb*, which encodes the fact that any string must begin and end with b.
The outermost a's encode the fact that any string in the language containing an a must have both a first and a last a.
The aab* parts allow contiguous pairs of a's.
The abb*ab* part allows non-contiguous pairs of a's.
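The derived expression can be sanity-checked by brute force as well. The Python sketch below rewrites r in the syntax of Python's re module (the union + becomes |) and compares it with the specification on all short strings. The names R, matches and spec are assumptions made for this example.

```python
import re
from itertools import product

# r = bb* + bb*ab*(aab*)*(abb*ab*(aab*)*)*abb*, with "+" written as "|"
R = re.compile(r"bb*|bb*ab*(aab*)*(abb*ab*(aab*)*)*abb*")


def matches(s: str) -> bool:
    """True if the whole string s is matched by r."""
    return R.fullmatch(s) is not None


def spec(s: str) -> bool:
    """Starts and ends with b and has an even number of a's."""
    return s.startswith("b") and s.endswith("b") and s.count("a") % 2 == 0


if __name__ == "__main__":
    for n in range(0, 11):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            assert matches(s) == spec(s), s
    print("r matches the specification on all strings up to length 10")
```

A check like this does not prove the expression correct, but it catches most substitution slips in a derivation of this size.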
As a final note, the rules for replacing expressions as above are the following:
Rule 1, substitution (r and s are expressions):
A: r
B: As
=
A: r
B: rs
Rule 2, removing self-reference (r and s are expressions):
A: r + As
=
A: rs*
Good evening! Check this out:
b(aa)*b
This generates strings that start and end with b and contain a's only in a single clump of even length (a multiple of 2), if any. Note that it covers only part of the language: strings such as b or babab are not generated.

What's wrong with this Haskell unicode variable name?

What's wrong with this code?
Prelude> let xᵀ = "abc"
<interactive>:10:6: lexical error at character '\7488'
According to my reading of the Haskell 2010 report, any uppercase or lowercase Unicode letter should be valid at the end of a variable name. Does the ᵀ character (MODIFIER LETTER CAPITAL T) not qualify as an uppercase Unicode letter?
Is there a better character to represent the transpose of a vector? I'd like to stay concise since I'm evaluating a dense mathematical formula.
I'm running GHC 7.8.3.
Uppercase Unicode letters are in the Unicode character category Letter, Uppercase [Lu].
Lowercase Unicode letters are in the Unicode character category Letter, Lowercase [Ll].
MODIFIER LETTER CAPITAL T is in the Unicode character category Letter, Modifier [Lm].
I tend to stick to ASCII, so I'd probably just use a name like xTrans or x', depending on the number of lines it is in scope.
Characters not in the category ANY are not valid in Haskell programs and should result in a lexing error.
where
ANY → graphic | whitechar
graphic → small | large | symbol | digit | special | " | '
small → ascSmall | uniSmall | _
ascSmall → a | b | … | z
uniSmall → any Unicode lowercase letter
...
uniDigit → any Unicode decimal digit
...
Modifier letters like ᵀ are not legal Haskell at all. (Unlike sub- or superscript numbers – which are in the Number, Other category so a₁ is treated much like a1.)
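The categories mentioned above can be looked up without a Haskell toolchain. The Python sketch below uses the standard unicodedata module to print the general category of a few characters. The helper name general_category is an assumption made for this example, and the script only reports Unicode data; it does not emulate GHC's lexer.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


if __name__ == "__main__":
    # x, T, MODIFIER LETTER CAPITAL T, SUBSCRIPT ONE
    for ch in ["x", "T", "\u1d40", "\u2081"]:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {general_category(ch)}")
```

U+1D40 is the character reported as '\7488' in the lexical error, since 0x1D40 is 7488 in decimal.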
I like to use non-ASCII Unicode when it helps readability, but unless you've already assigned another meaning to the prime symbol, using it here for transpose should be just fine.

Algorithm to form a given pattern using some strings

Given are 6 strings of any length. The words are to be arranged in the pattern shown below. They can be arranged either vertically or horizontally.
--------
|      |
|      |
|      |
---------------
       |      |
       |      |
       |      |
       --------
The pattern need not be symmetric, and there need to be two empty areas as shown.
For example:
Given strings
PQF
DCC
ACTF
CKTYCA
PGYVQP
DWTP
The pattern can be
DCC...
W.K...
T.T...
PGYVQP
..C..Q
..ACTF
where dots represent empty areas.
The other example is
RVE
LAPAHFUIK
BIRRE
KZGLPFQR
LLHU
UUZZSQHILWB
Pattern is
LLHU....
A..U....
P..Z....
A..Z....
H..S....
F..Q....
U..H....
I..I....
KZGLPFQR
...W...V
...BIRRE
If multiple patterns are possible then pattern with lexicographically smallest first line, then second line and so on is to be formed. What algorithm can be used to solve this?
Find strings that satisfy these constraints:
strlen(a) + strlen(b) - 1 = strlen(c)
strlen(d) + strlen(e) - 1 = strlen(f)
After that, try every possible arrangement and check whether it is valid. For example:
aaa.....
d.f.....
d.f.....
d.f.....
cccccccc
..f....e
..f....e
..bbbbbb
There will be 2*2*2 = 8 different arrangements.
There are a number of heuristics that you can apply, but before that, let's go over some properties of the puzzle.
+aa+
c  f
+ee+eee+
   f   d
   +bbb+
Let each letter also denote the length of the string it labels in the diagram above. We have:
a + b - 1 = e
c + d - 1 = f
I will refer to the 2 strings for the cross in the middle as middle strings.
We also know that the length of a string cannot be less than 2. Therefore, we can infer:
e > a, e > b
f > c, f > d
From this, we know that the 2 shortest strings cannot be middle strings, due to the inequality above.
The 3 longest strings cannot all have the same length either: whichever of them we choose as a middle string, we are left with 2 other strings of that same length, and at least one of them would have to be strictly shorter according to the inequality above.
The puzzle is only tricky when the lengths are regular. When the lengths are irregular, you can map directly from length to position.
If the 2 longest strings have equal length, then by the inequality above they are the 2 middle strings. The worst case for this one is a "regular" puzzle, where the lengths a, b, c, d are equal.
If the 2 longest strings have different lengths, the position of the longest string can be determined immediately (since its length is unique in the puzzle): it is one of the middle strings. In the worst case, there can be 3 candidates for the other middle string, so just brute force and check all of them.
Algorithm:
Try to map unique length string to the position.
Brute force the 2 strings in the middle (taken into consideration what I mentioned above), and brute force to fill in the rest.
Even with naive brute force, there are only 6! = 720 cases if the strings can only go from left to right and from top to bottom (no reversal). If each string may also be reversed, there are 720 * 2^6 = 46080 cases.
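The naive brute force over all 6! role assignments can be sketched in Python as follows. It makes several assumptions that the question does not settle: strings are never reversed, the four outer strings need length at least 3 so that both empty areas exist, "." sorts by its ASCII value when grids are compared, and the names build and solve are made up for this example.

```python
from itertools import permutations


def build(top, left, mid_h, mid_v, right, bottom):
    """Place six words in their roles; return grid rows, or None if invalid."""
    if len(top) + len(bottom) - 1 != len(mid_h):
        return None
    if len(left) + len(right) - 1 != len(mid_v):
        return None
    if min(len(top), len(left), len(right), len(bottom)) < 3:
        return None  # assumption: each box must enclose an empty area
    height, width = len(mid_v), len(mid_h)
    grid = [["."] * width for _ in range(height)]

    def put(word, r, c, dr, dc):
        # write word from (r, c) in direction (dr, dc); fail on a letter clash
        for ch in word:
            if grid[r][c] not in (".", ch):
                return False
            grid[r][c] = ch
            r, c = r + dr, c + dc
        return True

    ok = (put(top, 0, 0, 0, 1)
          and put(left, 0, 0, 1, 0)
          and put(mid_h, len(left) - 1, 0, 0, 1)
          and put(mid_v, 0, len(top) - 1, 1, 0)
          and put(right, len(left) - 1, width - 1, 1, 0)
          and put(bottom, height - 1, len(top) - 1, 0, 1))
    return ["".join(row) for row in grid] if ok else None


def solve(words):
    """Try all 720 role assignments; return the smallest grid row by row."""
    grids = [g for p in permutations(words) if (g := build(*p)) is not None]
    return min(grids) if grids else None


if __name__ == "__main__":
    for row in solve(["PQF", "DCC", "ACTF", "CKTYCA", "PGYVQP", "DWTP"]):
        print(row)
```

Comparing lists of rows in Python compares the first row, then the second, and so on, which matches the tie-breaking order described in the question.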
