Context-free language (pumping lemma for a^n b^m c^min(n,m))

I'm struggling to solve the following problem, for which I'm supposed to use the pumping lemma:
Prove that {a^n b^m c^min(n,m) | m, n >= 0} is not context-free.

Consider the string a^p b^p c^p in the language, where p is the pumping length. By the pumping lemma for context-free languages, this string can be written as uvxyz such that:
|vxy| <= p
|vy| > 0
u v^i x y^i z is also in the language for every natural number i
There are five cases to consider for the placement of vxy in our string:
vxy is entirely in the first section of a's only. If we choose i = 0 and pump down, we lose a's, so min(#a, #b), the minimum of the counts of a's and b's, drops below p while the string still contains p c's. This placement of vxy does not work.
vxy spans the a's and the b's. Choosing i = 0 and pumping down removes a's and/or b's, so min(#a, #b) falls below p while the number of c's stays at p. This choice for vxy doesn't work either.
vxy is entirely in the section of b's only. The same argument from case 1 applies here as well.
vxy spans the b's and the c's (it cannot span the a's and the c's, because |vxy| <= p and the block of b's between them has length p). If v or y straddles the boundary between b's and c's, pumping up with i = 2 produces a string that is not even of the form a*b*c*. Otherwise, if v or y contains a c, pumping up with i = 2 makes the number of c's exceed p while the number of a's stays at p, so the number of c's exceeds min(#a, #b). And if v and y consist of b's only, pumping down with i = 0 leaves p c's while min(#a, #b) drops below p. In every subcase this choice doesn't work either.
vxy is entirely in the section of c's only. Pumping in either direction changes the number of c's away from p = min(#a, #b), so this choice fails as well.
There were five possible places to put vxy in our string, and all of them failed. So no decomposition satisfies the requirements of the pumping lemma, and as a result our language cannot be context-free.
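Not part of the proof, but as a quick sanity check on which strings the language contains, here is a minimal Python membership test (the function name is my own):
import re

def in_language(w: str) -> bool:
    # Strings must have the shape a^n b^m c^k with k == min(n, m).
    match = re.fullmatch(r"(a*)(b*)(c*)", w)
    if match is None:
        return False
    n, m, k = (len(g) for g in match.groups())
    return k == min(n, m)

assert in_language("aabbbcc")        # a^2 b^3 c^2, and min(2, 3) == 2
assert not in_language("aabbbccc")   # a^2 b^3 c^3, but min(2, 3) == 2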

Related

How to prove the correctness of a given grammar?

I am wondering how programming language developers validate and prove that their grammar is correct. Suppose that I created a new grammar for a new language. I can test my grammar with a unit-test tool by providing different kinds of test programs. However, I can never be 100% sure that my grammar is correct. How do language developers ensure that their grammar is correct in the real world?
Let's say I created a grammar for a new language using pencil and paper. However, I made a mistake, and my grammar accepts expressions that end with a +, like 2+2+. If I don't find the mistake, I will implement my language using this incorrect grammar. After implementation and unit testing I may find the error. Is it possible to find it before starting any implementation?
Certainly I can try my grammar on some sample inputs using pencil and paper (derivations etc.), but I may miss some corner cases. Is there a better approach, or how do real-world language developers test their grammars?
A proof is a logical argument that demonstrates the truth of a claim. There are as many ways to prove something as there are ways of thinking about a problem. A common way to prove things about discrete structures (like grammars) is using mathematical induction. Basically, you show that something is true in base cases - the simplest cases possible - and then show that if it's true for all cases under a certain size, it must therefore be true for cases of the next size.
In our case, suppose we only wanted to prove that your grammar never generates a word ending in +. We could do induction on the number of productions used in constructing a string in the language. We would identify all relevant base cases, show the property holds for these strings, and then show that longer strings in the language are constructed in such a way that it is impossible to get a + at the end. Here's an example.
S := S + S | (S) | x
Base case: the shortest string in the language is x, generated as S -> x. It does not end with a +.
Induction hypothesis: assume all strings produced using up to and including k productions do not end with +.
Induction step: we must show that strings produced using more than k productions do not end with +. If we apply the production S -> (S) to any string generated from S, the result ends with ), so the property holds. If we apply S -> S + S, the last symbol of the result is the last symbol of a shorter string (at least two symbols shorter) generated from S; by the induction hypothesis that string did not end in +, so neither does this one. There are no other productions, so no string in the language ends in +. QED
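Alongside the induction proof, you can also machine-check the property for all short derivations. Here is a small Python sketch (my own, purely illustrative) that generates every string derivable from S within a bounded number of expansions of the grammar above and asserts that none ends with +:
def generate(depth):
    # All terminal strings derivable from S in at most `depth` expansions
    # of the grammar S := S + S | (S) | x.
    if depth == 0:
        return set()
    smaller = generate(depth - 1)
    results = {"x"}
    results |= {"(" + s + ")" for s in smaller}
    results |= {a + "+" + b for a in smaller for b in smaller}
    return results

for w in generate(4):
    assert not w.endswith("+"), w
Such a bounded exhaustive test catches many grammar mistakes before any implementation exists, though only the induction argument covers every derivation.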

UML multiple ranges of values in MultiplicityElements

In accordance with the UML specification, is it correct to specify multiple ranges of values in MultiplicityElements? For example, two ranges at an AssociationEnd, 3..7,10..20, or for an Attribute, e.g. account:Account[0..5,8..10]? Popular tools allow you to do that. Is it correct?
TLDR: No, this kind of multiplicity is not correct.
Full answer
Sections 7.5.2 and 7.5.3.2 of the UML 2.5 specification clearly define that a multiplicity is specified within a MultiplicityElement as either a range between two numbers, a specific number (if upper and lower are equal), or a range from a number to infinity if the upper bound is *. Unfortunately, you cannot list arbitrary specific values. To be precise, a multiplicity is given by two numbers, lower and upper, that define the limits of a single range.
I recall that it was possible in some earlier version of UML; however, I've seen it only in some book (which unfortunately I don't remember clearly), not in the specification itself.
As for B.8.15.1, it says nothing about possible values, and in particular does not suggest the possibility of listing several values/ranges.
So possible values are:
a
a..b (where a <= b, if a = b then it is equivalent to a)
*
a..* (if a = 0 then it is equivalent to *)
Both a and b can be expressions that evaluate to a natural number (greater than or equal to 0), provided the inequality a <= b holds for all possible values of the expression(s).
Of course for in-line multiplicities they are put in square brackets.
On the other hand, according to 9.4.2, StructuralFeatures and Parameters are MultiplicityElements, so they have precisely one multiplicity.
One MultiplicityElement can have only one multiplicity range.
Whether one umlDiagramElement can have several multiplicity elements associated with it is not clear to me; the 2.5 specification (chapter B.8.15.1) seems to allow it.
Though the notation is not syntactically allowed, you might well want to specify such sets. This can easily be done by attaching a constraint. If you're egg-headed enough you can write an OCL constraint, but some clear text like { multiplicity must be within range 0..5 and 8..10 } will be fine. Just use * for the real multiplicity.

How to determine whether a given language is regular or not (by just looking at the language)?

Is there any trick to guess whether a language is regular just by looking at it?
In order to choose a proof method, I need some hypothesis first. Do you know any hints/patterns that reduce the time spent on long questions?
For instance, so that I don't spend time on the pumping lemma when the language is regular and I don't want to construct a DFA/grammar.
For example:
1. L = {w ∈ {a,b}* | number of a's in w < number of b's in w}
2. L = {a^n b^m | n, m >= 0}
How can I tell which is regular just by looking at the above examples?
In general, when looking at a language, a good rule of thumb for whether the language is regular or not is to think of a program that can read a string and answer the question "is this string in the language?"
To write such a program, do you need to store some arbitrary value in a variable or is the program's state (that is, the combination of all possible variables' values) limited to some finite fixed number of possibilities? If the language can be recognized by a program that only needs a fixed number of variables that can only have a fixed number of values, then you've got a regular language. If not, then not.
Using this, I can see that the first language is not regular, but the second language is. In the first language, I need to remember how many as I've seen, and how many bs. (Or at the very least, I need to keep track of (# of as) - (# of bs), and accept if the string ends while that count is negative). At the same time, there's no limit on the number of as, so this count could go arbitrarily large.
In the second language, I don't care what n and m are at all. So with the second language, my program would just keep track of "have I seen at least one b yet?" to make sure we don't have any a characters that occur after the first b. (So, one variable with only two values - true or false)
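For instance, the second language can be recognized with the single true/false variable just described; here is a minimal Python sketch (the function name is my own):
def in_a_star_b_star(w: str) -> bool:
    seen_b = False                      # the entire program state: one boolean
    for ch in w:
        if ch == 'b':
            seen_b = True
        elif ch == 'a':
            if seen_b:                  # an 'a' after a 'b' cannot happen in a^n b^m
                return False
        else:
            return False                # only a's and b's are allowed
    return True

assert in_a_star_b_star("aaabb") and not in_a_star_b_star("aba")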
So one way to make language 1 into a regular language is to change it to be:
1. L = {w ∈ {a,b}* | number of a's in w < number of b's in w, and number of a's in w < 100}
Now I don't need to keep track of the number of as that I've seen once I hit 100 (since then I know automatically that the string isn't in the language), and likewise with the number of bs - once I hit 100, I can stop counting because I know that'll be enough unless the number of as is itself too large.
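A sketch of that bounded-counting recognizer, capping both counters at 100 (my own illustration):
def in_bounded_language(w: str) -> bool:
    # Accept w iff (number of a's) < (number of b's) and (number of a's) < 100.
    a_count = 0
    b_count = 0
    for ch in w:
        if ch == 'a':
            a_count = min(a_count + 1, 100)   # 100 already means "too many a's"
        elif ch == 'b':
            b_count = min(b_count + 1, 100)   # beyond 100, extra b's change nothing
        else:
            return False
    return a_count < 100 and a_count < b_count
Each counter takes only 101 possible values, so the program's state is finite.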
One common case you should watch out for with this is when someone asks you about languages where "number of as is a multiple of 13" or "w ∈ {0,1}* and w is the binary representation of a multiple of 13". With these, it might seem like you need to keep track of the whole number to make the determination, but in fact you don't - in both cases, you only need to keep a variable that can count from 0 to 12. So watch out for "multiple of"-type languages. (And the related "is odd" or "is even" or "is 1 more than a multiple of 13")
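For the binary multiple-of-13 case, the trick is to keep only the remainder; a Python sketch (my own):
def is_binary_multiple_of_13(w: str) -> bool:
    remainder = 0                           # the only state: a value in 0..12
    for bit in w:
        if bit not in "01":
            return False
        remainder = (remainder * 2 + int(bit)) % 13
    return remainder == 0

assert is_binary_multiple_of_13(bin(13 * 7)[2:])          # 91 is a multiple of 13
assert not is_binary_multiple_of_13(bin(13 * 7 + 1)[2:])  # 92 is not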
Other mathematical properties though - for example, w ∈ {0,1}* and w is the binary representation of a perfect square - will result in non-regular languages.

Is there a compound assignment operator for a = b <operator> a (where <operator> is not commutative)?

In a lot of languages a = a + b can be written as a += b
In case of numerical operations, a + b is same as b + a, so the single compound operator suffices.
Also, a = a - b can be written as a -= b.
However, a - b is not equal to b - a. Hence, the compound assignment operator does not work for a = b - a.
So, are there compound assignment operators for the operation a = b op a (where op can be +, -, *, /, %, and order matters)?
No, there isn't a shorthand notation for a = b + a. If you need to do a lot of a = b + a for strings, you'd better build a list like:
lst = []
...
lst.append("a")
lst.append("bb")
lst.append("ccc")
lst.append("dddd")
...
lst.reverse()
return ''.join(lst) # "ddddcccbba"
No, there is not.
Origin of the shorthand
I suspect this shorthand comes from assembly language, where the ADD instruction does exactly that: it takes two operands, adds them, and stores the result in the first one.
I'd say people were used to thinking this way, and so this pattern also appeared in the C language as the a += b shorthand. Other languages took this from C.
I think there is no special reason to have or not to have a = a + b or a = b + a; neither of the two is needed more often in programming. The reason is historical, the same way we use the QWERTY keyboard layout and not the others.
Update: see this; it is a myth, because C was based on the B language rather than coming from assembly languages. The origin is not clear.
Possible reasons
Every operator makes the language more complex. Python supports operator overloading, so adding a new operator is even more work.
It would rarely be used compared with +=.
People were used, from assembly language, to the += kind of operation rather than a = b + a, so they were okay with the fact that no shorthand existed and did not request it.
Readability concerns.
Lack of suitable syntax. How would you design it?
Possible solutions
The best possible solution is to just write a = b + a, because it is clear and readable at first glance. For the same reason (readability) (update: who knows?) Python does not provide the a++ known from C and other languages; you have to type a += 1. The += shorthand is not very readable to a programming beginner either, but one can still at least guess what it is about. It is a compromise between tradition, laziness and readability.
Where there is no tradition, readability should win, at least in Python. So one should write a few more characters rather than look for a shorthand. That is the case for a = b + a.
Note
If you are concatenating many strings, consider .join() for performance reasons.
I don't know of such a shortcut built into any language, but some languages would allow you to create one.
In Scala, for instance, you can essentially define your own operators.
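In Python, which (as noted above) supports operator overloading, you can approximate such an operator yourself. The class below and the choice of <<= for "prepend-assign" are purely my own illustration, not a standard idiom:
class PrependStr(str):
    # Interpret a <<= b as a = b + a, i.e. prepend b to a.
    def __ilshift__(self, other):
        return PrependStr(other + self)

a = PrependStr("world")
a <<= "hello, "
print(a)  # hello, world
Spelling out a = b + a is still clearer; this only shows that the language machinery would allow such a shortcut.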

Why do prevailing programming languages like C use array starting from 0? [duplicate]

Why do prevailing programming languages like C use arrays starting from 0? I know some programming languages, like Pascal, have arrays starting from 1. Are there any good reasons for doing so? Or is it merely a historical reason?
Because you access array elements by offset relative to the beginning of the array.
First element is at offset 0.
Later, more complex array data structures appeared (such as SAFEARRAY) that allowed an arbitrary lower bound.
In C, the name of an array is essentially a pointer, a reference to a memory location, so the expression array[n] refers to a memory location n elements away from the starting element. This means the index is used as an offset. The first element of the array is located exactly at the memory location the array name refers to (0 elements away), so it is denoted array[0]. Most programming languages have been designed this way, so indexing from 0 is pretty much inherent to the language.
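To make the offset arithmetic concrete, here is a throwaway Python sketch; the element size and base address are made-up numbers, not anything the answer specifies:
# array[i] in C lives at base_address + i * elem_size,
# so the first element (i = 0) sits exactly at base_address.
elem_size = 4            # pretend sizeof(int) == 4
base_address = 0x1000    # pretend the array starts here
for i in range(4):
    print(f"array[{i}] -> address {hex(base_address + i * elem_size)}")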
However, Dijkstra explains why we should index from 0. This is the problem of how to denote a subsequence of natural numbers, say for example 1,2,3,...,10. We have four conventions available:
a. 0 < i < 11
b. 1 <= i < 11
c. 0 < i <= 10
d. 1 <= i <= 10
Dijkstra argues that the proper notation should be able to denote naturally the two following cases:
The subsequence includes the smallest natural number, 0
The subsequence is empty
Requirement 1 leaves out a. and c., since for a subsequence starting at 0 they would have the form -1 < i, which uses a number not in the natural number set (Dijkstra says this is ugly). So we are left with b. and d. Requirement 2 leaves out d., since for a set including 0 that is shrunk to the empty one, d. takes the form 0 <= i <= -1, which is a little messed up! Subtracting the bounds in b. also gives the sequence length, which is another plus. Hence we are left with b., which is by far the most widely used notation in programming now.
Now you know. So, remember and take pride in the fact that each time you write something like
for( i=0; i<N; i++ ) {
sum += a[i];
}
you are not just following the rules of language notation. You are also promoting mathematical beauty!
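Python's built-in range bakes in exactly convention b. (lower bound included, upper bound excluded); a quick illustration:
N = 10
print(list(range(0, N)))   # 0 <= i < N: exactly N values, 0 through 9
print(len(range(0, N)))    # the length is simply the upper bound minus the lower
print(list(range(0, 0)))   # the empty sequence, written without resorting to -1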
In assembly and C, arrays were implemented as memory pointers, and the first element was stored at offset 0 from the pointer.
In C, arrays are tied to pointers. The array index is a number that you add to the pointer to the array's initial element. This is tied to one of the addressing modes of the PDP-11, where you could specify a base address and place an offset to it in a register to simulate an array. By the way, this is also where ++ and -- came from: the PDP-11 provided so-called auto-increment and auto-decrement addressing modes.
P.S. I think Pascal used 1 by default; generally, you were allowed to specify the range of your array explicitly, so you could start it at -10 and end at +20 if you wanted.
Suppose you can store only two bits. That gives you four combinations:
00, 10, 01, 11
Now, assign integers to those 4 values. Two reasonable mappings are:
00->0
01->1
10->2
11->3
and
10->-2
11->-1
00->0
01->1
(Another idea is to use signed magnitude and use the mapping:
11->-1
10->-0
00->+0
01->+1)
It simply does not make sense to use 00 to represent 1 and use 11 to represent 4. Counting from 0 is natural. Counting from 1 is not.
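To see the two mappings side by side, here is a throwaway Python sketch (the two's-complement rule for 2 bits is simply: subtract 4 when the high bit is set):
# For each 2-bit pattern, show its unsigned value and its two's-complement value.
for bits in ("00", "01", "10", "11"):
    unsigned = int(bits, 2)                               # mapping 1: plain binary
    signed = unsigned - 4 if unsigned >= 2 else unsigned  # mapping 2: two's complement
    print(bits, unsigned, signed)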
