Wondering whether the following language is finite - regular-language

There is a question in my class about whether the following language is finite or not:
{w : w is a regular expression for {a^m b^n : m+n ≤ k}}, where k is a specific natural number.
I think it is finite, because there can be at most (k+1)*k/2 words in the described language, but the reference answer says the language is infinite.
Can anybody explain it?
PS: is there only one regular expression for a particular regular language?

If I interpret your question correctly, then yeah, it's infinite. We're looking for the number of different regular expressions that match, let's say, 3-character strings of 'a's and 'b's where all the 'a's come first. Different regexp languages can vary in their allowed syntax, but all of them have some kind of union operator. We could be really pathological and change an 'a' in your pattern to ('a' | 'a'), which of course reduces to 'a', but it's a new way to write it. There are then an infinite number of ways to write that pattern by continuing to expand in the same way.
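A minimal sketch of that expansion in Haskell (the helper name expand is made up for illustration):

expand :: String -> String
expand r = "(" ++ r ++ "|a)"   -- wrap the pattern in a redundant union with 'a'

equivalentRegexes :: [String]
equivalentRegexes = iterate expand "a"

-- take 3 equivalentRegexes == ["a","(a|a)","((a|a)|a)"]
-- Every element denotes the same one-string language {a}, so infinitely
-- many distinct regular expressions describe it.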

Related

Prove regular language and automata

This is a grammar, and I want to check whether this language is regular or not.
L → ε | aLcLc | LL
For example, strings generated by this grammar include:
acc, accacc, aacccc, acaccc, aaacccccc, ...
I know that this is not a regular language, but how do I prove it? Is building an automaton the right way to prove it? What would the resulting automaton be? I don't see a pattern I could use to build the automaton.
Thank you for any help!
First, let me quickly demonstrate that you cannot deduce that the language of a grammar is irregular based solely on the fact that the grammar itself is not a regular grammar. To see this, consider the unrestricted grammar:
S -> SSaSS | aS | e
SaS -> aSa
aaS -> SSa
This is clearly not a regular grammar, but you should be able to verify that it generates the infinite regular language of all strings of a's.
That said, how should we proceed? We will need to figure out what language your grammar generates, and then argue that that particular language cannot be regular. We notice that the only rule that introduces terminal symbols always introduces twice as many c's as a's. Furthermore, it's not hard to see the language must be infinite. We can use the Myhill-Nerode theorem to show that these observations imply the language must be irregular.
Consider the prefix a^n of a hypothetical string in the language of this grammar. The shortest string which can be appended to the end of this prefix to give us a string generated by this grammar is c^(2n). No shorter string will work, and that string always works. Imagine now that we were looking at a correct deterministic finite automaton for the language of the grammar. Then, whatever state processing the prefix a^n left us in, we'd need the shortest path from there to an accepting state in the automaton to have length 2n. But a DFA must have finitely many states, and n is an arbitrary natural number. Our DFA cannot work for all possible n (it would need to have arbitrarily many states). This is a contradiction, so there can be no correct DFA for the language of the grammar. Since all regular languages have DFAs, that means the language of this grammar cannot be regular.
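Stated compactly in Myhill-Nerode terms (writing L for the language of the grammar):

% For m ≠ n, the suffix c^{2m} distinguishes the prefixes a^m and a^n:
a^m c^{2m} \in L, \qquad a^n c^{2m} \notin L \quad (m \neq n)
% so the prefixes a^1, a^2, a^3, ... fall into infinitely many distinct
% equivalence classes, and no finite-state automaton can separate them all.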

Will L = {a*b*} be classified as a regular language?

Will L = {a*b*} be classified as a regular language?
I am confused because I know that L = {a^n b^n} is not regular. What difference does the Kleene star make?
Well, it makes a difference whether you have L = {a^n b^n} or L = {a*b*}.
The a^n b^n language is one where you must have the same number of a's and b's, for example: {aaabbb, ab, aabb, ...}. As you said, this is not a regular language.
But when we talk about L = {a*b*} it is a bit different: here you can have any number of a's followed by any number of b's (including 0). Some examples are:
{a, b, aaab, aabbb, aabbbb, ...}
As you can see, it is different from the {a^n b^n} language, where you needed to have the same number of a's and b's.
And yes, a*b* is regular by its nature. If you want a good explanation of why it is regular, you can check this: How to prove a language is regular. They might have a better explanation than me (:
I hope it helped you.
The language described by the regular expression a*b* is regular by definition. These expressions cannot describe any non-regular language and are indeed one of the ways of defining the regular languages.
{a^n b^n : n>0} (this would be a formally complete way of describing it), on the other hand, cannot be described by a regular expression. Intuitively, when reaching the border between the a's and the b's you need to remember n. Since n is not bounded, no finite-memory device can do that. In a*b* you only need to remember that from now on only b should appear; this is very finite. The two stars are in some sense not related; each expands its block independently of the other.
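A minimal sketch of that "very finite" memory, as a two-state recognizer in Haskell (the names State, step, and accepts are illustrative, not from any library):

import Control.Monad (foldM)

-- State A: still allowed to read a's; State B: only b's allowed from now on.
data State = A | B deriving (Eq, Show)

step :: State -> Char -> Maybe State
step A 'a' = Just A
step A 'b' = Just B
step B 'b' = Just B
step _  _  = Nothing  -- an 'a' after a 'b' (or any other symbol) rejects

accepts :: String -> Bool
accepts w = case foldM step A w of
  Just _  -> True     -- both states accept: a*b* includes "", "aa", "abb", ...
  Nothing -> False

-- accepts "aabbb" == True; accepts "aba" == False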

Finiteness of Regular Language

We all know that (a + b)* is a regular language over the symbols a and b.
But (a + b)* is a string of infinite length, and it is regular as we can build a finite automaton for it, so it should be finite.
Can anyone please explain this?
A finite automaton can be constructed for any regular language, and a regular language can be a finite or an infinite set. Of course, there are infinite sets that are not regular sets.
Notes:
1. Every finite set is a regular set.
2. Any DFA for an infinite set always contains a loop (a DFA without a loop cannot accept an infinite set).
3. Every non-regular language is an infinite set.
The word "finite" in finite automata significance the presence of 'finite amount of memory' in automata for the class of regular languages, hence only 'finite' (or says bounded) amount of information can be stored at any instance of time while processing a string of language.
In finite automata, memory is present in the form of states only (whereas in the other class of automata like Pda, Turing Machines external memory are used to store unbounded information). You can think a finite automata as a CPU without explicit memory; that can only store recent results in its registers.
So, we can define "regular language" as — a class of languages for which only bounded (finite) information is required to stored at any instance of time while processing language strings.
Further reading (for infinite languages):
What a regular language is: What is basically a regular language? And: Why is a*b* regular, but the language { a^n b^n | n > 0 } is not?
To understand how states are used as memory elements, read this answer: How to write regular expression for a DFA
And for the difference between automata for finite and infinite regular languages: To make sure: Pumping lemma for infinite regular languages only?
Each word in the language (a + b)* is of finite length, in the same way that there are infinitely many integers, but each of them is finite.
Yes, the language itself is an infinite set. Most languages are. But a finite automaton (NB: automata is the plural) works just fine for them, provided each word is of finite length.
As an aside: This type of question probably should go to cs.stackexchange.com.
But (a + b)* is a string of infinite length
No, (a + b)* is a finite way to express an infinite set (language) of finite strings.
1. A regular expression describes the strings generated by some language. Applying that regular expression gives you all the strings that the language describes.
2. When you convert that regular expression to a finite automaton (an automaton with finitely many states), it means that those same strings can also be generated by traversing from state to state on that automaton. Intuitively, each state here represents a group of strings belonging to that language: it says that, after having "absorbed" some input, the string is now in state X.
Example:
If you want a regex to accept strings with an even number of 0s, then you'll have one state (group) indicating that an even number of 0s has been observed in the input so far, and another state (group) for odd numbers; this second state would be your non-accepting state in the FA.
As shown here, you just needed 2 (finite) states to generate an infinite number of strings, because of the grouping into odd and even we did.
And that is why it is regular.
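That two-state grouping can be written out directly; a minimal Haskell sketch (all names illustrative):

-- Parity of the number of '0's seen so far; Even is the accepting state.
data Parity = Even | Odd deriving (Eq, Show)

flipP :: Parity -> Parity
flipP Even = Odd
flipP Odd  = Even

step :: Parity -> Char -> Parity
step p '0' = flipP p  -- a '0' toggles the parity
step p _   = p        -- any other symbol leaves it unchanged

acceptsEvenZeros :: String -> Bool
acceptsEvenZeros w = foldl step Even w == Even

-- acceptsEvenZeros "0110" == True; acceptsEvenZeros "0" == False
-- Two states suffice even though the accepted set is infinite.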
It just means there exists a finite regular expression for the specified language; it is in no way related to the number of strings generated from the expression.
For many regular languages we can generate an infinite number of strings that belong to the language, but to prove that the language is regular we need a regular expression, which must be finite.
So here the expression (a+b)* is a finite way of expressing any number of a's or b's or combinations of them, but the length can take any value, which results in infinitely many strings.

Why Haskell doesn't have a single element tuple?

I'm wondering why Haskell doesn't have a single element tuple. Is it just because nobody has needed it so far, or are there rational reasons? I found an interesting thread in a comment on the Real World Haskell website, http://book.realworldhaskell.org/read/types-and-functions.html#funcstypes.composite, where people guessed at various reasons:
No good syntax sugar.
It is useless.
You can think that a normal value like (1) is actually a single element tuple.
But does anyone know the reason beyond a guess?
There's a lib for that!
http://hackage.haskell.org/packages/archive/OneTuple/0.2.1/doc/html/Data-Tuple-OneTuple.html
Actually, we have a OneTuple we use all the time. It's called Identity, and is now used as the base of standard pure monads in the new mtl:
http://hackage.haskell.org/packages/archive/transformers/0.2.2.0/doc/html/Data-Functor-Identity.html
And it has an important use! By virtue of providing a type constructor of kind * -> *, it can be made an instance (a trivial one, granted, though not the most trivial) of Monad, Functor, etc., which lets us use it as a base for transformer stacks.
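A minimal sketch of Identity as a one-tuple (using the Data.Functor.Identity module that ships with modern base):

import Data.Functor.Identity (Identity (..))

boxed :: Identity Int
boxed = Identity 3            -- "wrap" a value, much like a 1-tuple (3,)

unboxed :: Int
unboxed = runIdentity boxed   -- project the single component back out

-- Because Identity has kind * -> *, it admits Functor/Monad instances:
-- runIdentity (fmap (+ 1) (Identity 41)) == 42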
The exact reason is that it's totally unnecessary. Why would you need a one-tuple if you can just have its value?
The syntax also tends to be a bit clunky. In Python, you can have one-tuples, but you need a trailing comma to distinguish one from a parenthesized expression:
onetuple = (3,)
All in all, there's no reason for it. I'm sure there's no "official" reason, as the designers of Haskell probably never even considered a single element tuple, because it has no use.
I don't know if you were looking for some reasons beyond the obvious, but in this case the obvious answer is the right one.
My answer is not exactly about Haskell semantics, but about the theoretical mathematical elegance of making a value the same as its one-tuple. (So this answer should not be taken as an explanation of the standard behavior expected of a Haskell implementation, because it isn't intended as such.)
In programming languages and computation models where all functions are curried, such as lambda-calculus and combinatory logic, every function has exactly one input argument and one output/return value. No more, no less.
When we want a particular function f to have more than one input argument – say 3 –, we simulate it under this curried regime by creating a 1-argument function that returns a 2-argument function. Thus, f x y z = ((f x) y) z, and f x would return a 2-argument function.
Likewise, sometimes we might want to return more than one value from a function. It is not literally possible under this semantics, but we can simulate it by returning a tuple. We can generalize this.
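A minimal sketch of both simulations (all names illustrative):

-- "Three arguments" via currying: f x y z parses as ((f x) y) z.
scale3 :: Double -> Double -> Double -> Double
scale3 x y z = x * y * z

scaleBy2 :: Double -> Double -> Double
scaleBy2 = scale3 2   -- applying one argument yields a 2-argument function

-- "Two return values" simulated by returning a tuple.
divAndMod :: Int -> Int -> (Int, Int)
divAndMod a b = (a `div` b, a `mod` b)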
If, for uniformity, we constrain the only return value of any function to be an (n-)tuple, we are able to harmonize some interesting features of the unit value and of supposedly non-tuple return values with the features of tuples in general, as follows.
Let's adopt as the general syntax of n-tuples the following schema, where c_i is the component with index i: (c_1, c_2, ..., c_n).
Notice that n-tuples have delimiting parentheses in this syntax.
Under this schema, how would we represent a 0-tuple? Since it has no components, this degenerate case would be represented like this: ( ). This syntax precisely coincides with the syntax we adopt to represent the unit value. So, we are tempted to make the unit value the same as a 0-tuple.
What about a 1-tuple? It would have this representation: (v). Here a syntactic ambiguity immediately arises: parentheses would be used in the language both as 1-tuple delimiters and as mere grouping of values or expressions. So, in a context where (v) appears, a compiler or interpreter would be unsure whether this is a 1-tuple with a component whose value is v, or just an isolated value v inside superfluous parentheses.
A way to solve this ambiguity is to force a value to be the same as the 1-tuple that would have it as its only component. Not much would be sacrificed, since the only non-empty projection we can perform on a 1-tuple is to obtain its only value.
For this to be consistently enforced, the syntactical consequence is that we would have to relax a bit our former requirement that delimiting parentheses are mandatory for all n-tuples: now they would be optional for 1-tuples, and mandatory for all other values of n. (Or we could require all values to be delimited by parentheses, but this would be inconvenient for practical use.)
In summary, under the interpretation that a 1-tuple is the same as its only component value, we could, by making syntactic puns with parentheses, consider all return values of functions in our programming language or computing model as n-tuples: the 0-tuple in the case of the unit type, 1-tuples in the case of ordinary/"atomic" values which we usually don't think of as tuples, and pairs/triples/quadruples/... for other kinds of tuples. This heterodox interpretation is mathematically parsimonious and uniform, is expressive enough to simulate functions with multiple input arguments and multiple return values, and is not incompatible with Haskell (in the sense that no harm is done if the programmer assumes this unofficial interpretation).
This was an argument by syntactic puns. Whether you are satisfied or not with it, we can do even better. A more principled argument can be taken from the mathematical theory of relations, by exploring the Cartesian product operation.
An (n-adic) relation is extensionally defined as a uniform set of (n-)tuples. (This characterization is fundamental to relational database theory and is therefore important knowledge for professional computer programmers.)
A dyadic relation – a set of pairs (2-tuples) – is a subset of the Cartesian product of 2 sets: R ⊆ A × B. For a homogeneous relation: R ⊆ A × A = A^2.
A triadic relation – a set of triples (3-tuples) – is a subset of the Cartesian product of 3 sets: R ⊆ A × B × C. For a homogeneous relation: R ⊆ A × A × A = A^3.
A monadic relation – a set of monuples (1-tuples) – is a subset of the Cartesian product of 1 set: R ⊆ A^1 = A (by the usual mathematical convention that A^1 = A).
As we can see, a monadic relation is just a set of atomic values! This means that a set of 1-tuples is a set of atomic values. Therefore, it is convenient to consider that 1-tuples and atomic values are the same thing.

What is the difference between Pattern Matching and Guards?

I am very new to Haskell and to functional programming in general. My question is pretty basic. What is the difference between Pattern Matching and Guards?
Function using pattern matching
check :: [a] -> String
check [] = "Empty"
check (x:xs) = "Contains Elements"
Function using guards
check_ :: [a] -> String
check_ lst
  | length lst < 1 = "Empty"
  | otherwise      = "Contains elements"
To me it looks like Pattern Matching and Guards are fundamentally the same. Both evaluate a condition, and if true will execute the expression hooked to it. Am I correct in my understanding?
In this example I can either use pattern matching or guards to arrive at the same result. But something tells me I am missing out on something important here. Can we always replace one with the other?
Could someone give examples where pattern matching is preferred over guards and vice versa?
Actually, they're fundamentally quite different! At least in Haskell, at any rate.
Guards are both simpler and more flexible: They're essentially just special syntax that translates to a series of if/then expressions. You can put arbitrary boolean expressions in the guards, but they don't do anything you couldn't do with a regular if.
Pattern matches do several additional things: They're the only way to deconstruct data, and they bind identifiers within their scope. In the same sense that guards are equivalent to if expressions, pattern matching is equivalent to case expressions. Declarations (either at the top level, or in something like a let expression) are also a form of pattern match, with "normal" definitions being matches with the trivial pattern, a single identifier.
Pattern matches also tend to be the main way stuff actually happens in Haskell; attempting to deconstruct data in a pattern is one of the few things that forces evaluation.
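For instance, the pattern-matching version of check from the question is just sugar for a case expression (a sketch):

check :: [a] -> String
check lst = case lst of
  []    -> "Empty"
  _ : _ -> "Contains Elements"

-- This is what the two-equation definition in the question desugars to,
-- while guards desugar to a chain of if/then/else tests.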
By the way, you can actually do pattern matching in top-level declarations:
square = (^2)
(one:four:nine:_) = map square [1..]
This is occasionally useful for a group of related definitions.
GHC also provides the ViewPatterns extension which sort of combines both; you can use arbitrary functions in a binding context and then pattern match on the result. This is still just syntactic sugar for the usual stuff, of course.
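A tiny ViewPatterns sketch (stripPrefix is a real function from Data.List; the function greet is made up):

{-# LANGUAGE ViewPatterns #-}
import Data.List (stripPrefix)

-- The view function runs first; we then pattern match on its result.
greet :: String -> String
greet (stripPrefix "Hello, " -> Just name) = "Hi, " ++ name
greet _                                    = "Who are you?"

-- greet "Hello, Ada" == "Hi, Ada"; greet "yo" == "Who are you?"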
As for the day-to-day issue of which to use where, here are some rough guidelines (a short sketch illustrating them follows the list):
Definitely use pattern matching for anything that can be matched directly one or two constructors deep, where you don't really care about the compound data as a whole, but do care about most of the structure. The @ syntax lets you bind the overall structure to a variable while also pattern matching on it, but doing too much of that in one pattern can get ugly and unreadable quickly.
Definitely use guards when you need to make a choice based on some property that doesn't correspond neatly to a pattern, e.g. comparing two Int values to see which is larger.
If you need only a couple of pieces of data from deep inside a large structure, particularly if you also need to use the structure as a whole, guards and accessor functions are usually more readable than some monstrous pattern full of @ and _.
If you need to do the same thing for values represented by different patterns, but with a convenient predicate to classify them, using a single generic pattern with a guard is usually more readable. Note that if a set of guards is non-exhaustive, anything that fails all the guards will drop down to the next pattern (if any). So you can combine a general pattern with some filter to catch exceptional cases, then do pattern matching on everything else to get details you care about.
Definitely don't use guards for things that could be trivially checked with a pattern. Checking for empty lists is the classic example, use a pattern match for that.
In general, when in doubt, just stick with pattern matching by default, it's usually nicer. If a pattern starts getting really ugly or convoluted, then stop to consider how else you could write it. Besides using guards, other options include extracting subexpressions as separate functions or putting case expressions inside the function body in order to push some of the pattern matching down onto them and out of the main definition.
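A short, hedged sketch of those guidelines (all names are made up):

-- Pattern match when the shape is one constructor deep:
describe :: [Int] -> String
describe []      = "empty"
describe (x : _) = "starts with " ++ show x

-- Guard when the choice is a property, not a shape:
larger :: Int -> Int -> Int
larger a b
  | a >= b    = a
  | otherwise = b

-- An as-pattern (@) binds the whole structure while matching on it:
duplicateHead :: [a] -> [a]
duplicateHead whole@(x : _) = x : whole
duplicateHead []            = []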
For one, you can put boolean expressions within a guard. Just as with list comprehensions, boolean expressions can be freely mixed with the pattern guards. For example:
f x | [y] <- x
    , y > 3
    , Just z <- h y
    = ...
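A concrete, runnable variant of that schematic (the table lookup standing in for h, and all names here, are illustrative assumptions):

import Data.Char (isDigit)

classify :: [String] -> String
classify x | [y] <- x                   -- the argument is a one-element list
           , all isDigit y              -- a plain boolean guard mixed in
           , Just z <- lookup y table   -- a second pattern guard
           = "matched: " ++ z
  where table = [("42", "the answer")]
classify _ = "no match"

-- classify ["42"] == "matched: the answer"; classify ["x"] == "no match"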
Update
There is a nice quote from Learn You a Haskell about the difference:
Whereas patterns are a way of making sure a value conforms to some form and deconstructing it, guards are a way of testing whether some property of a value (or several of them) are true or false. That sounds a lot like an if statement and it's very similar. The thing is that guards are a lot more readable when you have several conditions and they play really nicely with patterns.
To me it looks like Pattern Matching and Guards are fundamentally the same. Both evaluate a condition, and if true will execute the expression hooked to it. Am I correct in my understanding?
Not quite. First, pattern matching cannot evaluate arbitrary conditions. It can only check whether a value was created using a given constructor.
Second, pattern matching can bind variables. So while the pattern [] might be equivalent to the guard null lst (not using length because that wouldn't be equivalent; more on that later), the pattern x:xs most certainly is not equivalent to the guard not (null lst), because the pattern binds the variables x and xs, which the guard does not.
A note on using length: using length to check whether a list is empty is very bad practice, because calculating the length needs to traverse the whole list, taking O(n) time, while just checking whether the list is empty takes O(1) time with null or pattern matching. Furthermore, using length just plain does not work on infinite lists.
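A minimal sketch of that point (illustrative names):

-- null inspects only the outermost constructor, so it is O(1) and works
-- even on infinite lists; length must walk the entire list first.
isEmptyFast :: [a] -> Bool
isEmptyFast = null

isEmptySlow :: [a] -> Bool
isEmptySlow xs = length xs < 1

-- isEmptyFast [1 ..] == False, but isEmptySlow [1 ..] never returns.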
In addition to the other good answers, I'll try to be specific about guards: Guards are just syntactic sugar. If you think about it, you will often have the following structure in your programs:
f y = ...
f x = if p x then A else B
That is, if a pattern matches, it is followed right after by an if-then-else discrimination. A guard folds this discrimination into the pattern match directly:
f y = ...
f x | p x       = A
    | otherwise = B
(otherwise is defined to be True in the standard library.) It is more convenient than an if-then-else chain, and it sometimes makes the code much simpler variant-wise, so it is easier to write than the if-then-else construction.
In other words, it is sugar on top of another construction in a way that greatly simplifies your code in many cases. You will find that it eliminates a lot of if-then-else chains and makes your code more readable.
