I have a deterministic finite automaton (DFA) to build for the language defined by the following statement:
{ω | ω is any string not in a* ∪ b*}
For some reason I'm just not understanding the "a* ∪ b*" part. I know what a union is, but how is this different from a* b*? Is the resulting DFA of these two statements the same? I need to first create the DFA for the complement of this language, and then use that DFA to create the DFA of the above language based on that.
Can someone help me understand this?
Let's break down the definitions and then it'll be simpler to see the difference between a* ∪ b* and a* b*.
The set x* contains every string formed by repeating the symbol x any non-negative number of times (including zero). This includes the empty string (ε), x, xx, xxx and so forth.
The set XY (concatenation) contains every string formed by an element of X followed by an element of Y.
If X = {1, 2} and Y = {a, b}, then XY = {1a, 1b, 2a, 2b}
The set X ∪ Y refers to the union of the sets described by X and Y. It contains every element that is in X or in Y (or in both).
If X = {1, 2} and Y = {a, b}, then X ∪ Y = {1, 2, a, b}
From the above we can deduce that the set of elements in a* b* is any element in the set a* followed by any element in the set b* (remember since we're using the * notation, the empty string is included). a* = {ε, a, aa, aaa, aaaa, ... } and b* = {ε, b, bb, bbb, bbbb, ... }. Therefore a* b* = {ε, a, b, aa, ab, bb, aab, abb, aabb, ...}.
Similarly we now know that a* ∪ b* includes any elements in the set a* OR b*. Therefore a* ∪ b* = {ε, a, b, aa, bb, aaa, bbb, aaaa, bbbb, ...}. Notice there isn't any element that has both the symbols a AND b because that is not in the set.
Finally, you can ask which elements are in a* b* BUT are not in a* ∪ b*. (a* b*) \ (a* ∪ b*) = { x ∈ a* b* | x ∉ a* ∪ b* } = { ab, abb, abbb, ..., aab, aabb, aabbb, ..., aaab, aaabb, ... }. These are the elements of a* b* that contain both symbols a and b.
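A quick way to see the three sets side by side is to enumerate short strings and classify them. The sketch below is only an illustration: the helper name `strings_up_to` and the length bound of 3 are assumptions made for this example.

```python
import re
from itertools import product


def strings_up_to(n, alphabet="ab"):
    """All strings over the alphabet of length 0..n, shortest first."""
    return ["".join(t) for k in range(n + 1) for t in product(alphabet, repeat=k)]


words = strings_up_to(3)
union = [w for w in words if re.fullmatch(r"a*|b*", w)]    # a* ∪ b*
concat = [w for w in words if re.fullmatch(r"a*b*", w)]    # a* b*
only_concat = [w for w in concat if w not in union]        # (a* b*) \ (a* ∪ b*)

print(union)
print(concat)
print(only_concat)
```

Every string in `only_concat` contains both an a and a b, with all the a's first.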
a* U b* is the language of all strings of a only along with all strings of b only: {empty, a, b, aa, bb, aaa, bbb, ...}. The strings not in this language are those strings that contain not only as or bs but both: {ab, ba, aab, aba, baa, abb, bab, bba, ...}. a*b* is yet another language which consists of all the strings where any as come before any bs: {empty, a, b, aa, ab, bb, ...}. a*b* is a superset of a* U b* but it is not equal to it. It is neither a subset nor a superset of the language of strings not in a* U b* but it does overlap in many places. Since all three languages are distinct, all three have distinct DFAs.
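The plan from the question (build the DFA for a* ∪ b* first, then complement it) can be sketched as follows. The state names are made up for this example; the complement is obtained simply by swapping accepting and non-accepting states of the same complete DFA.

```python
# Transition function of a complete DFA over {a, b}.
# State names are hypothetical labels chosen for readability.
DELTA = {
    ("start", "a"): "only_a", ("start", "b"): "only_b",
    ("only_a", "a"): "only_a", ("only_a", "b"): "mixed",
    ("only_b", "a"): "mixed", ("only_b", "b"): "only_b",
    ("mixed", "a"): "mixed", ("mixed", "b"): "mixed",
}
STATES = {"start", "only_a", "only_b", "mixed"}

ACCEPT_UNION = {"start", "only_a", "only_b"}   # accepts a* ∪ b*
ACCEPT_COMPLEMENT = STATES - ACCEPT_UNION      # accepts strings not in a* ∪ b*


def accepts(word, accepting):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = "start"
    for ch in word:
        state = DELTA[(state, ch)]
    return state in accepting


print(accepts("aaa", ACCEPT_UNION), accepts("ba", ACCEPT_COMPLEMENT))
```

Because the DFA is complete (every state has a transition on both symbols), flipping the accepting set is all that the complement needs.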
I need to decide whether the language L = {wxw| x, w ∈ {a,b}* , w≠ε} is regular or not.
I know that L2 = {w x w^R | x, w ∈ {a,b}* , w≠ε} is regular, since you only need to make sure that the word starts and ends with the same letter, but that doesn't seem to work without the reversal (for example w = 10, x = ε).
How do I prove it though?
It is not regular. Use Myhill-Nerode. Consider the prefix a^n b. The shortest string ending with b that can be appended to this to get a string in the language is a^n b. This means every such prefix is distinguishable and there's no DFA.
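The distinguishing argument can be checked by brute force for small n. This is a sketch, not a proof: `in_L` and `distinguished` are names invented here, and the test only covers small exponents.

```python
def in_L(s):
    """True if s = w x w for some non-empty w (x may be empty)."""
    return any(s[:k] == s[-k:] for k in range(1, len(s) // 2 + 1))


def distinguished(n, m):
    """For n < m, check that the suffix a^n b separates the prefixes a^n b and a^m b."""
    suffix = "a" * n + "b"
    return in_L("a" * n + "b" + suffix) and not in_L("a" * m + "b" + suffix)


pairs = [(n, m) for n in range(6) for m in range(n + 1, 7)]
print(all(distinguished(n, m) for n, m in pairs))
```

Each pair of prefixes a^n b and a^m b with n < m lands in a different Myhill-Nerode class, so the number of classes is unbounded.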
I am doing a theory of automata course at University and the teacher gave us this question.
Is this correct
[^[[[a-c]*aa[a-c]*]|[[a-c]*bb[a-c]*]|[[a-c]*cc[a-c]*]]]
Wrong. [^…] only works at the character level.
Regular expressions do not have an operator that complements a whole language (Σ* − L).
Here is a mechanical construction. You should observe that "no aa/bb/cc" means an a can only be followed by b, c or ε, and similarly for the other two letters. So you can start from the regular grammar in DFA form:
S → ε | a A | b B | c C
A → ε | b B | c C
B → ε | c C | a A
C → ε | a A | b B
and then convert the DFA to regular expression by eliminating each state one by one:
S → ε | a (ε | b B | c C) | b B | c C
B → ε | c C | a (ε | b B | c C)
C → ε | b B | a (ε | b B | c C)
Expanding,
S → ε | b B | c C | a | ab B | ac C
B → ε | c C | a | ab B | ac C
C → ε | b B | a | ab B | ac C
You could do some simplification meanwhile, e.g. change all right-recursion X → y | z X to the Kleene star X → z* y, or merge common branches X → xy | y with optionals X → x? y.
S → a? | a? b B | a? c C
B → (ab)* a? | (ab)* a? c C
C → (ac)* a? | (ac)* a? b B
You should be able to figure out the rest.
(Note that there can be multiple regular expressions describing the same language. While this method can produce a solution, it may not be shortest one.)
Try this regex:
^(?!.*([A-Za-z])\1).*$
This regex uses a negative lookahead which asserts that the same character never occurs twice back-to-back anywhere in the string. The group ([A-Za-z]) matches and captures any single letter in the string, and \1 then requires that same captured letter to follow immediately.
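For reference, here is how that pattern behaves with Python's `re` module. The wrapper name `no_repeated_letter` is an assumption for this sketch, and the inputs are single-line strings.

```python
import re

# Negative lookahead: reject if any letter is immediately followed by itself.
NO_DOUBLE = re.compile(r"^(?!.*([A-Za-z])\1).*$")


def no_repeated_letter(s):
    """True if no letter occurs twice in a row in s."""
    return NO_DOUBLE.match(s) is not None


for sample in ["abcabc", "abbc", "", "aa"]:
    print(repr(sample), no_repeated_letter(sample))
```

Note that lookaheads and backreferences go beyond the textbook regular-expression operators, so this may not be accepted in a theory course.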
Demo here:
Regex101
N.B.: this is a more didactic answer trying to get the student thinking, it does not contain the solution to the actual problem.
I assume that you only have to worry about 3 letters as per your question, not about all 26+ ones.
Note that, without spoiling anything, the purpose of the exercise is very likely to show the practical and theoretical limits of regular expressions when trying to track contextual information.
You also seem to be restricted to the bare language without more recent constructs like lookaheads.
The regexp you came up with is not valid under the "canonical" regexp features. E.g., you have nested square brackets. Without more specification, basic regexps have (), |, [], ., *, +, ? with only the () nesting. Note that in the strict theoretical version, there may be even fewer meta symbols allowed. Trust your notes/textbook on that as we do not know what your teacher will accept.
Try to come up with a solution, and then you only need to find a single counter example. Then try to work around that example until you cannot think of any more...
One more hint: Try to work from what you are given. Can you make an expression for a...a where ... contains no a but still follows the rule of no bb nor cc?
It is easy to construct a DFA for this language:
Q    e    Q'
qI   a    qA
qI   b    qB
qI   c    qC
qA   a    qD
qA   b    qB
qA   c    qC
qB   a    qA
qB   b    qD
qB   c    qC
qC   a    qA
qC   b    qB
qC   c    qD
qD   a    qD
qD   b    qD
qD   c    qD

Initial state: qI
Accepting states: qI, qA, qB, qC
Dead state: qD
With a DFA in hand, you can use any well-known algorithm for transforming a DFA into a regular expression to get an answer. These algorithms are known and running through them - automatically or by hand - will always give you a correct answer. Here is a question with answers showing how to go through this process: https://cs.stackexchange.com/q/2016/69
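Before converting, it can help to simulate the DFA and compare it with a direct check. The sketch below transcribes the transition table, assuming that each state remembers the last letter read (so qC moves to qB on b). The helper `has_double` is a hypothetical brute-force reference.

```python
from itertools import product

# Transition table: state -> symbol -> next state.
DELTA = {
    "qI": {"a": "qA", "b": "qB", "c": "qC"},
    "qA": {"a": "qD", "b": "qB", "c": "qC"},
    "qB": {"a": "qA", "b": "qD", "c": "qC"},
    "qC": {"a": "qA", "b": "qB", "c": "qD"},
    "qD": {"a": "qD", "b": "qD", "c": "qD"},  # dead state
}
ACCEPTING = {"qI", "qA", "qB", "qC"}


def accepts(word):
    """Run the DFA from qI and report acceptance."""
    state = "qI"
    for ch in word:
        state = DELTA[state][ch]
    return state in ACCEPTING


def has_double(word):
    """Reference check: does any letter occur twice in a row?"""
    return any(x == y for x, y in zip(word, word[1:]))


words = ["".join(t) for k in range(6) for t in product("abc", repeat=k)]
print(all(accepts(w) == (not has_double(w)) for w in words))
```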
Pumping lemma definition (from wiki)
Let L be a regular language. Then there exists an integer p ≥ 1 depending only on L such that every string w in L of length at least p (p is called the "pumping length") can be written as w = xyz (i.e., w can be divided into three substrings), satisfying the following conditions:
|y| ≥ 1;
|xy| ≤ p
for all i ≥ 0, x y^i z ∈ L
Suppose I want to test the regular expression 011.
Since it is a regular expression, there is a string w of length at least p that can be written as w = xyz.
The number of states of this automaton is 3, so p should be >= 3.
But the only string this automaton accepts is 011,
so I pick 011 as w.
I can break 011 into three parts, 011 = xyz,
but how should I split it? I cannot satisfy
|y| ≥ 1;
|xy| ≤ p
for all i ≥ 0, x y^i z ∈ L
since it only accepts 011.
How can I pump? Where am I wrong?
Let p be 4. Then there are no strings w in L of length at least p, so any statement of the form "Every string w in L of length at least p […]" will be vacuously true. So the pumping lemma is satisfied.
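The vacuous-truth point can be made concrete with a small checker for finite languages. This is a rough sketch: the function name `satisfies_pumping` and the bound `max_i` are assumptions, and only finitely many values of i are tried, so a `True` result for a string of length at least p would not be a proof.

```python
def satisfies_pumping(language, p, max_i=3):
    """Check the pumping condition for a finite language and a candidate p.

    Only i = 0..max_i is tried. For the one-word language used here that is
    enough, because i = 0 already yields a string outside the language.
    """
    for w in language:
        if len(w) < p:
            continue  # the lemma says nothing about strings shorter than p
        found = False
        for end in range(1, min(p, len(w)) + 1):   # |xy| = end <= p
            for start in range(end):               # |y| = end - start >= 1
                x, y, z = w[:start], w[start:end], w[end:]
                if all(x + y * i + z in language for i in range(max_i + 1)):
                    found = True
        if not found:
            return False
    return True


print(satisfies_pumping({"011"}, 4))  # no string is long enough
print(satisfies_pumping({"011"}, 3))  # 011 is long enough but cannot be pumped
```

With p = 4 the loop never finds a string to test, which is exactly why the statement holds vacuously.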
The pumping lemma is generally applied to infinite regular languages.
It is not used to prove that L is regular.
It is used to prove that L is not regular.
But it is satisfied by all infinite regular languages.
Given an alphabet of {a, b} where Na denotes the number of occurrences of a, and Nb the number of occurrences of b:
L1 = {xy | Na(x) = Nb(y)}
L2 = {w | Na(w) and Nb(w) are even numbers}
Wouldn't a single DFA with four states and using mod be able to accept both languages?
No. The two languages are different, so you can't draw a single DFA for both.
An automaton defines a unique language, although of course more than one automaton can accept the same language (these are called 'equivalent automata').
Language L1 = {xy | Na(x) = Nb(y)} is a regular language. In fact every string over {a, b} belongs to it: for a string w, take x to be the first Nb(w) symbols of w and y the rest. If x contains k a's, it contains Nb(w) − k b's, so y contains the remaining k b's and Na(x) = Nb(y). A regular expression for this language is therefore:
(a + b)*
To understand this language read: "Show that the following set over {a, b} is regular".
Language L2 = {w | Na(w) and Nb(w) are even numbers} is also a regular language. Regular expression for this language is:
((a + b(aa)*ab)(bb)*(ba(aa)*ab(bb)*)*a + (b + a(bb)*ba)(aa)*(ab(bb)*ba(aa)*)*b)*
To understand this language and regular expression read: "Need Regular Expression for Finite Automata".
But the two languages are not equal, because some strings in L1 do not belong to L2. For example, ab is a string in L1 but does not contain an even number of a's and b's, hence it does not belong to L2.
Note: L2 is a proper subset of L1, since L1 contains every string; strings such as aa, aaaa, bb, bbbb belong to both languages.
The two languages are different, hence a single DFA is not possible for both.
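The two set-builder definitions can be compared by brute force on short strings. This is only a sketch that reads the definitions literally; the function names and the length bound of 6 are assumptions for the example.

```python
from itertools import product


def in_L1(w):
    """True if w can be split as x y with Na(x) == Nb(y)."""
    return any(w[:i].count("a") == w[i:].count("b") for i in range(len(w) + 1))


def in_L2(w):
    """True if w has an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0


words = ["".join(t) for k in range(7) for t in product("ab", repeat=k)]
only_L1 = [w for w in words if in_L1(w) and not in_L2(w)]
only_L2 = [w for w in words if in_L2(w) and not in_L1(w)]

print(only_L1[:5])
print(only_L2)
```

Any string that appears in exactly one of the two lists is a witness that the languages differ, and therefore that no single DFA can accept exactly both.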
Both the languages L1 = {xy | Na(x) = Nb(y)} and
L2 = {w | Na(w) and Nb(w) are even numbers} are different, so we cannot draw a single DFA for both languages.
For language L1:
A = {xy | Na(x) = Nb(y)} is a regular language.
Every string over {a, b} can be split this way (take x to be the first Nb(w) symbols of w), so a regular expression for this language is:
(a + b)*
Language L2:
A = {w | Na(w) and Nb(w) are even numbers} is also a regular language. Regular expression for this language is:
((a + b(aa)*ab)(bb)*(ba(aa)*ab(bb)*)*a + (b + a(bb)*ba)(aa)*(ab(bb)*ba(aa)*)*b)*
The two languages are not equal because some strings in L1 do not belong to L2. For example, ab is a string in L1 but does not contain an even number of a's and b's, hence it does not belong to L2.
As the two languages are different, a single DFA that accepts exactly both languages cannot be constructed.
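The four-state "mod" automaton mentioned in the question does exist, but it accepts L2 only. A minimal sketch, assuming the state is the pair (Na mod 2, Nb mod 2) and the function name `accepts_L2` is made up for this example:

```python
def accepts_L2(word):
    """Four-state DFA: the state is (Na mod 2, Nb mod 2), start and accept at (0, 0)."""
    state = (0, 0)
    for ch in word:
        a_parity, b_parity = state
        if ch == "a":
            state = ((a_parity + 1) % 2, b_parity)
        elif ch == "b":
            state = (a_parity, (b_parity + 1) % 2)
        else:
            raise ValueError(f"symbol outside the alphabet: {ch!r}")
    return state == (0, 0)


for sample in ["", "ab", "abab", "aabb", "aab"]:
    print(repr(sample), accepts_L2(sample))
```

The string ab is rejected here even though it belongs to L1 (x = a, y = b), so this automaton cannot serve for both languages.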
I'm learning the difference between the lemmata in the question. Every reference I can find uses the example:
{(a^i)(b^j)(c^k)(d^l) : i = 0 or j = k = l}
to show the difference between the two. I can find a decomposition under which the ordinary pumping lemma holds for it, so that lemma cannot be used to "disprove" that the language is context-free.
Select w = uvxyz, s.t. |vy| > 0, |vxy| <= p.
Suppose w contains an equal number of b's, c's, d's.
I selected:
u,v,x = ε
y = (the string of a's)
z = (the rest of the string w)
Pumping y will just add to the number of a's, and if |b|=|c|=|d| at first, it still will now.
(Similar argument for if w has no a's. Then just pump whatever you want.)
My question is, how does Ogden's lemma change this strategy? What does "marking" do?
Thanks!
One important stumbling issue here is that "being able to pump" does not imply context free, rather "not being able to pump" shows it is not context free. Similarly, being grey does not imply you're an elephant, but being an elephant does imply you're grey...
Language context-free => Pumping Lemma is definitely satisfied
Language not context-free => Pumping Lemma *may* be satisfied
Pumping Lemma satisfied => Language *may* be context-free
Pumping Lemma not satisfied => Language definitely not context-free
# (we can write exactly the same for Ogden's Lemma)
# Here "=>" should be read as implies
That is to say, in order to demonstrate that a language is not context free we must show it fails(!) to satisfy one of these lemmata. (Even if it satisfies both we haven't proved it is context free.)
Below is a sketch proof that L = { a^i b^j c^k d^l where i = 0 or j = k = l} is not context free (although it satisfies The Pumping Lemma, it doesn't satisfy Ogden's Lemma):
Pumping lemma for context free grammars:
If a language L is context-free, then there exists some integer p ≥ 1 such that any string s in L with |s| ≥ p (where p is a pumping length) can be written as
s = uvxyz
with substrings u, v, x, y and z, such that:
1. |vxy| ≤ p,
2. |vy| ≥ 1, and
3. u v^n x y^n z is in L for every natural number n.
In our example:
For any s in L (with |s|>=p):
If s contains a's then choose v=a, x=epsilon, y=epsilon (and we have no contradiction to the language being context-free).
If s contains no a's (s=b^j c^k d^l and one of j, k or l is non-zero, since |s|>=1) then choose v=b if j>0, else v=c if k>0, else v=d, with x=epsilon, y=epsilon (and we have no contradiction to the language being context-free).
(So unfortunately: using the Pumping Lemma we are unable to prove anything about L!
Note: the above was essentially the argument you gave in the question.)
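The choice described above always pumps the first symbol of s, which can be checked on a few sample strings. This is an illustrative sketch; `in_L`, `pump` and the sample list are assumptions for this example, and only n = 0..4 is tried.

```python
import re


def in_L(s):
    """Membership in { a^i b^j c^k d^l : i = 0 or j = k = l }."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if m is None:
        return False
    i, j, k, l = (len(g) for g in m.groups())
    return i == 0 or j == k == l


def pump(u, v, x, y, z, n):
    """Build u v^n x y^n z."""
    return u + v * n + x + y * n + z


samples = ["abbccdd", "aaabcd", "bbcddd", "d", "a"]
for s in samples:
    # u = epsilon, v = first symbol, x = y = epsilon, z = the rest of s
    u, v, x, y, z = "", s[0], "", "", s[1:]
    print(s, all(in_L(pump(u, v, x, y, z, n)) for n in range(5)))
```

If the first symbol is an a, pumping changes only the number of a's; otherwise i = 0 and every string of the form b^j c^k d^l is in L.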
Ogden's Lemma:
If a language L is context-free, then there exists some number p > 0 (where p may or may not be a pumping length) such that for any string w of length at least p in L and every way of "marking" p or more of the positions in w, w can be written as
w = uxyzv
with strings u, x, y, z, and v such that:
1. xz has at least one marked position,
2. xyz has at most p marked positions, and
3. u x^n y z^n v is in L for every n ≥ 0.
Note: this marking is the key part of Ogden's Lemma, it says: "not only can every element be "pumped", but it can be pumped using any p marked positions".
In our example:
Let w = a b^p c^p d^p and mark the positions of the bs (of which there are p, so w satisfies the requirements of Ogden's Lemma), and let u,x,y,z,v be a decomposition satisfying the conditions from Ogden's lemma (w=uxyzv).
If x or z contains more than one distinct symbol, then u x^2 y z^2 v is not in L, because there will be symbols in the wrong order (consider (bc)^2 = bcbc).
Either x or z must contain a b (by Lemma condition 1.)
This leaves us with five cases to check (for i,j>0):
x=epsilon, z=b^i
x=a, z=b^i
x=b^i, z=c^j
x=b^i, z=d^j
x=b^i, z=epsilon
In every case (by comparing the number of bs, cs and ds) we can see that u x^2 y z^2 v is not in L (and we have a contradiction (!) to the language being context-free, that is, we've proved that L is not context free).
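The case analysis can be sanity-checked by brute force for a small fixed p. This is only a sketch: the real p from Ogden's lemma is unknown, so p = 3 is an assumption for illustration, and the names `in_L` and `surviving_decomposition` are invented here.

```python
import re
from itertools import combinations_with_replacement


def in_L(s):
    """Membership in { a^i b^j c^k d^l : i = 0 or j = k = l }."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if m is None:
        return False
    i, j, k, l = (len(g) for g in m.groups())
    return i == 0 or j == k == l


def surviving_decomposition(p):
    """Return a decomposition of a b^p c^p d^p that pumps for n = 0, 1, 2, or None."""
    w = "a" + "b" * p + "c" * p + "d" * p
    marked = set(range(1, p + 1))  # positions of the b's
    cuts = combinations_with_replacement(range(len(w) + 1), 4)
    for c1, c2, c3, c4 in cuts:
        u, x, y, z, v = w[:c1], w[c1:c2], w[c2:c3], w[c3:c4], w[c4:]
        in_x_or_z = set(range(c1, c2)) | set(range(c3, c4))
        if not (in_x_or_z & marked):
            continue  # condition 1: xz must contain a marked position
        if len(set(range(c1, c4)) & marked) > p:
            continue  # condition 2: xyz has at most p marked positions
        if all(in_L(u + x * n + y + z * n + v) for n in range(3)):
            return (u, x, y, z, v)
    return None


print(surviving_decomposition(3))
```

No decomposition that touches a marked b survives, whereas pumping the unmarked leading a alone would stay inside L. That contrast is what the marking adds over the plain pumping lemma.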
To summarise, L is not context-free, but this cannot be demonstrated using The Pumping Lemma (but can by Ogden's Lemma) and thus we can say that:
Ogden's lemma is a second, stronger pumping lemma for context-free languages.
I'm not too sure about how to use Ogden's lemma here but your "proof" is wrong. When using the pumping lemma to prove that a language is not context free you cannot choose the splitting into uvxyz. The splitting is chosen "for you" and you have to show that the lemma is not fulfilled for any uvxyz.