How to read language definition syntax for regular/nonregular languages - abstract

I am trying to understand a simple concept about language definitions, specifically when there are two strings in the definition, such as:
Language F = ww | w ∈ {0,1}*
Can someone help me understand the syntax? It makes sense to me when there is only one w. The set notation containing w also confuses me.
Looking it up online/in the book didn't answer this specific question concisely.

What this particular notation is telling you is that there is a language F consisting of exactly the strings that can be written in the form ww, where w is some string of binary digits (w ∈ {0,1}* means w is any finite string of 0s and 1s, including the empty string). "ww" means that you take a string w and concatenate it with itself; by analogy, if w were Patrick87 (not a binary string, but it shows the idea), then ww would be Patrick87Patrick87. Not all strings are of the form ww; for instance, "01" is not, while "0101" is (take w = 01). So, this definition tells us which strings are and which strings are not in F.
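To make the definition concrete, here is a minimal Python sketch of a membership test for F. The function name in_F is made up for illustration; it only checks that the input is a binary string of even length whose two halves are equal.

```python
def in_F(s: str) -> bool:
    """Return True if s can be written as ww for some w in {0,1}*."""
    # Every symbol must come from the alphabet {0, 1}.
    if any(ch not in "01" for ch in s):
        return False
    half, remainder = divmod(len(s), 2)
    # ww always has even length, and its two halves are both w.
    return remainder == 0 and s[:half] == s[half:]


print(in_F("0101"))  # w = "01"
print(in_F("01"))    # the halves "0" and "1" differ
print(in_F(""))      # w = the empty string
```

Note that the empty string is in F, because the empty string is a valid choice of w.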

Related

Why is the most constrained language class for this not Regular and instead Context-Free?

This is the question:
Q: Given the following production rules, which is finite or otherwise the most constrained language in the Chomsky language hierarchy corresponding to the language described by the following production rules?
Production Rules provided
From what I've read about regular languages, a regular language is one that can be recognized by a finite automaton, so it can't be something like a^n b^n, where we have to count part of the string to produce the rest of it. I'm still quite confused about what it means that we cannot have strings where we have to count part of the string to produce the rest of it (taking this particular question as an example). Could anyone help explain this?
Thanks a bunch.
The grammar is reproduced here:
S := aAbA
A := aAb
A := aba
Right off the bat, from the syntax of this grammar, we can guarantee the language is at most context-free. This is because all productions have a single non-terminal on the left-hand side. Given this, the most restrictive language class of the Chomsky hierarchy to which this language belongs must either be the regular languages or the context-free languages.
We can show this language is not regular using the pumping lemma for regular languages. Assume the language of this grammar were regular. Then there is a pumping length p >= 1 such that for any string w in the language of the grammar of length at least p, it must be possible to write w = uvx where |uv| <= p, |v| > 0 and for all n >= 0, u(v^n)x is also a string in the language. Consider now the string a(a^p)aba(b^p)baba. This string is in the language of the grammar because we can derive it as follows:
S => aAbA => a(aAb)bA => aa(aAb)bbA => ... => a(a^p)A(b^p)bA
=> a(a^p)aba(b^p)bA => a(a^p)aba(b^p)baba
This string has length at least p (indeed, its length is 2p + 8). As such, I should be able to write it in the manner described above; however, notice that the first p+1 symbols in this string are exclusively a. No matter how I write w = uvx, because |uv| <= p, uv consists only of a, and so must v. Thus, pumping only changes the number of a in the prefix of the string. But this cannot give me strings in the language for any n other than 1, since:
the production for S adds one a and one b, and introduces exactly two instances of the non-terminal A
each A is eventually eliminated by A := aba, which adds exactly one more a than b
the remaining production A := aAb adds the same number of a and b
as a result, every string produced by this grammar has exactly two more a's than b's
changing the number of a's in the prefix without changing the number of b's cannot maintain this characteristic of strings generated by the grammar
This is a contradiction. The only assumption we made was that the language is regular. Thus, we conclude the assumption was wrong and that the language cannot be regular.
Because the language is not regular, the most restrictive language class in the Chomsky hierarchy for it is the context-free languages.
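As a sanity check on the argument above, here is a small Python sketch. It assumes the language of the grammar is { a a^i aba b^i b a^j aba b^j : i, j >= 0 }, which follows from A generating exactly the strings a^k aba b^k. The function names are made up for illustration, and trying one value of p illustrates the proof but does not replace it.

```python
import re


def in_language(s: str) -> bool:
    """Membership test for { a a^i aba b^i b a^j aba b^j : i, j >= 0 }."""
    m = re.fullmatch(r"a(a*)aba(b*)b(a*)aba(b*)", s)
    # The regex checks the shape; the counts must match pairwise (i and j).
    return bool(m) and len(m[1]) == len(m[2]) and len(m[3]) == len(m[4])


def pumping_always_fails(p: int) -> bool:
    """Try every split w = uvx with |uv| <= p, |v| > 0 on the chosen string."""
    w = "a" + "a" * p + "aba" + "b" * p + "b" + "aba"
    if not in_language(w):
        return False
    for uv_len in range(1, p + 1):
        for v_len in range(1, uv_len + 1):
            u = w[: uv_len - v_len]
            v = w[uv_len - v_len : uv_len]
            x = w[uv_len:]
            # Pumping down (n = 0) or up (n = 2) must leave the language.
            if in_language(u + x) or in_language(u + v * 2 + x):
                return False
    return True


print(pumping_always_fails(5))
```

Every split of the first p symbols lands inside the leading run of a's, so each pumped string breaks the balance between the first block of a's and the first block of b's.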

Prove regular language and automata

This is a grammar and I want to check if this language is regular or not.
L → ε | aLcLc | LL
For example the result of this grammar is:
acc, accacc ..., aacccc, acaccc, accacc, aaacccccc, ...
I know that it is not a regular language, but how do I prove it? Is building an automaton the right way to prove it? What is the resulting automaton? I don't see a pattern I could use to build the automaton.
Thank you for any help!
First, let me quickly demonstrate that you cannot deduce the language of a grammar is irregular based solely on the grammar's being irregular. To see this, consider the unrestricted grammar:
S -> SSaSS | aS | e
SaS -> aSa
aaS -> SSa
This is clearly not a regular grammar but you should be able to verify it generates the infinite regular language of all strings of a.
That said, how should we proceed? We will need to figure out what language your grammar generates, and then argue that particular language cannot be regular. We notice that the only rule that introduces terminal symbols always introduces twice as many c as it does a. Furthermore, it's not hard to see the language must be infinite. We can use the Myhill-Nerode theorem to show that these observations imply the language must be irregular.
Consider the prefix a^n of a hypothetical string in the language of this grammar. The shortest string which can be appended to the end of this prefix to give us a string generated by this grammar is c^(2n). No shorter string will work, and that string always works. Imagine now that we were looking at a correct deterministic finite automaton for the language of the grammar. Then, whatever state processing the prefix a^n left us in, we'd need the shortest path from there to an accepting state in the automaton to have length exactly 2n. Two prefixes a^m and a^n with m ≠ n therefore cannot lead to the same state, since that state would need two different shortest distances to acceptance. So the DFA would need a distinct state for every natural number n, but a DFA must have finitely many states. This is a contradiction, so there can be no correct DFA for the language of the grammar. Since all regular languages have DFAs, that means the language of this grammar cannot be regular.
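The claim about shortest completions can be checked by brute force for small n. The following Python sketch uses a memoized recognizer for the grammar L → ε | aLcLc | LL; the names derives and shortest_completion are made up for illustration, and the search is only practical for short strings.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(s: str) -> bool:
    """True if L derives s under L -> ε | aLcLc | LL."""
    if s == "":
        return True
    n = len(s)
    # Rule L -> a L c L c: pick the position i of the middle c.
    if s[0] == "a" and s[-1] == "c":
        for i in range(1, n - 1):
            if s[i] == "c" and derives(s[1:i]) and derives(s[i + 1 : n - 1]):
                return True
    # Rule L -> L L: both halves non-empty, so the recursion terminates.
    return any(derives(s[:i]) and derives(s[i:]) for i in range(1, n))


def shortest_completion(prefix: str, max_len: int):
    """Shortest t over {a, c} with prefix + t in the language, or None."""
    for k in range(max_len + 1):
        for letters in product("ac", repeat=k):
            t = "".join(letters)
            if derives(prefix + t):
                return t
    return None


for n in range(1, 4):
    print(n, shortest_completion("a" * n, 2 * n))
```

For each n tried, the only completion found within length 2n is c^(2n), matching the distance argument above.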

Will L = {a*b*} be classified as a regular language?

Will L = {a*b*} be classified as a regular language?
I am confused because I know that L = {a^n b^n} is not regular. What difference does the Kleene star make?
Well, it does make a difference whether you have L = {a^n b^n} or L = {a*b*}.
When you have the a^n b^n language, it is a language where you must have the same number of a's and b's, for example: {aaabbb, ab, aabb, etc}. As you said, this is not a regular language.
But when we talk about L = {a*b*} it is a bit different: here you can have any number of a's followed by any number of b's (including 0). Some examples are:
{a, b, aaab, aabbb, aabbbb, etc}
As you can see it is different from the {a^n b^n} language where you needed to have the same numbers of a's and b's.
And yes, a*b* is regular by its nature. If you want a good explanation of why it is regular, you can check the question "How to prove a language is regular"; they might have a better explanation than me (:
I hope it helped you
The language described by the regular expression a*b* is regular by definition. These expressions cannot describe any non-regular language and are indeed one of the ways of defining the regular languages.
{a^n b^n: n>0} (this would be a formally complete way of describing it) on the other hand, cannot be described by a regular expression. Intuitively, when reaching the border between a and b you need to remember n. Since it is not bounded, no finite-memory device can do that. In a*b* you only need to remember that from now on only b should appear; this is very finite. The two stars in some sense are not related; each expands its block independently of the other.
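The difference in memory can be sketched in Python. The first function below simulates a two-state automaton for a*b* and only remembers whether a b has been seen. The second has to compare two counts, which is the unbounded information a finite automaton cannot keep. Both function names are made up for illustration, and n > 0 is assumed for a^n b^n as in the definition above.

```python
def accepts_a_star_b_star(s: str) -> bool:
    """Simulate a DFA for a*b*: the only memory is the current state."""
    state = "reading_a"
    for ch in s:
        if ch == "a" and state == "reading_a":
            continue  # still in the block of a's
        if ch == "b":
            state = "reading_b"  # from now on only b may appear
            continue
        return False  # an a after a b, or a symbol outside {a, b}
    return True


def in_an_bn(s: str) -> bool:
    """Membership in { a^n b^n : n > 0 }; needs the unbounded count n."""
    n, remainder = divmod(len(s), 2)
    return n > 0 and remainder == 0 and s == "a" * n + "b" * n


for word in ["aabbb", "aabb", "ba", ""]:
    print(repr(word), accepts_a_star_b_star(word), in_an_bn(word))
```

Every string of the form a^n b^n is accepted by the first function, but not the other way round.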

Isn't every language regular, according to formal definition of it?

This is the definition of regular languages from Wikipedia's article:
The collection of regular languages over an alphabet Σ is defined recursively as follows:
The empty language Ø is a regular language.
For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language.
If A and B are regular languages, then A ∪ B (union), A • B (concatenation), and A* (Kleene star) are regular languages.
No other languages over Σ are regular.
Now think about aⁿbⁿ which we know is not regular, but doesn't it pass the above rules?
{a} is regular, so is {b} and also their concatenation and thus the mentioned lang!
It feels like I'm mistaking a set of languages (in other words, a set of sets) for a set of words (which is the language)?
You are mistaken in your statement that you can form this specific language from the rules. Formally, this follows from the Pumping Lemma. To address the reasoning in your question, though:
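{a} is regular, so by repeated concatenation, {a^m} is regular for each fixed m
{b} is regular, so by repeated concatenation, {b^n} is regular for each fixed n
so their concatenation {a^m b^n} is regular as well, and so is the language a*b* of all such strings, but it is precisely the constraint m == n that you cannot formulate via this family. Each application of the rules fixes m and n, and collecting a^n b^n for every n would require a union of infinitely many languages, which the rules do not provide.
aⁿbⁿ is a language that contains only the strings with n a's followed by n b's.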
You can create a regular language that is a superset of this language, but not this language itself.
You're right, {a} is regular and {b} is regular. Thus, by the rules you mentioned, their concatenation is regular as well. However, the concatenation of two languages is defined as {vw | v in L_1, w in L_2}. Since both L_1 and L_2 only contain a single word (a and b, respectively), this definition is equivalent to {vw | v = a, w = b}, which is the set {ab}.
Thus, the concatenation of the two languages is the set {ab}, not a^n b^n.
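The set-level definitions can be tried out directly on finite languages. In this Python sketch, concat follows the definition {vw | v in L_1, w in L_2}, and star_up_to is a made-up, bounded stand-in for the Kleene star (the real A* is infinite whenever A contains a non-empty word).

```python
def concat(A: set, B: set) -> set:
    """Language concatenation: every word of A followed by every word of B."""
    return {v + w for v in A for w in B}


def star_up_to(A: set, k: int) -> set:
    """Finite approximation of A*: concatenations of at most k words of A."""
    result = {""}
    layer = {""}
    for _ in range(k):
        layer = concat(layer, A)
        result |= layer
    return result


# Concatenating the two singleton languages gives exactly one word.
print(concat({"a"}, {"b"}))

# Starring each block independently gives a^m b^n with m and n unrelated.
superset = concat(star_up_to({"a"}, 3), star_up_to({"b"}, 3))
print(sorted(superset, key=lambda word: (len(word), word)))
```

The second result contains ab, aabb and aaabbb, but also words such as aab and b, which is why it is only a superset of a^n b^n.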

Haskell tuple constructor (GHC) and the separation between a language and its implementation

Haskell blew my mind yet again when I realised that
(x,y)
Is just syntactic sugar for
(,) x y
Naturally I wanted to extend this to larger tuples. But
(,) x ((,) y z)
Gave me
(x,(y,z))
Which was not what I was looking for. On a whim, I tried
(,,) x y z
And it worked, giving exactly what I wanted:
(x,y,z)
This raised the question: How far can you take it? Much to my astonishment, there seemed to be no limit. All of the below are valid operators:
(,)
(,,)
(,,,)
(,,,,)
--etc
(,,,,,,,,,,,,,,)
(,,,,,,,,,,,,,,,)
--etc
(,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)
--etc
This behaviour is amazing and leads to my actual question: Is it something which can be emulated in my own functions? Or is it just a GHC-specific feature of the tuple operator?
I'm thinking it's the latter, as I've read the Haskell 98 specification and iirc it says that implementations only have to define the tuple operator for up to 15 items, whereas GHC has gone the whole hog and lets you do it up to arbitrary limits.
So, would it be possible to define this family of operators/functions from within the haskell implementation itself, using nothing but the type system and existing language features (declarations, type signatures, function definitions etc.)? And if so, how? Or is it impossible and you have to instead look into the compiler to find the supporting framework for this collection of functions?
This leads to an even more general question: How much of Haskell is supported by Haskell itself, through type and function definitions, declarations etc; and how much is supported by the compiler/implementation? (I am aware that GHC was written in Haskell, that doesn't answer the question)
That is, if you were to abandon the standard libraries (including the prelude) and do everything from the ground up in raw Haskell; would it be possible to build a complete implementation that has all the features of GHC, using only that minimal set of features? What is the minimum set of language features that you need in order to build a Haskell implementation using Haskell? Would I be able to abandon the prelude and then completely rebuild it manually from within GHC? If you abandon the prelude and never import anything, what is left over for you to work with?
It may seem like I'm asking a million questions, but they're really all trying to ask the same thing with different wording. Give it your best shot SO!
Alas, there is no magic in the tuples. Here's the implementation GHC uses, and to give you some idea of what's going on here's the source for the last definition:
data (,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,) a b c d e f g h i j k l m n o p q r s t u v w x y z a_ b_ c_ d_ e_ f_ g_ h_ i_ j_ k_ l_ m_ n_ o_ p_ q_ r_ s_ t_ u_ v_ w_ x_ y_ z_ a__ b__ c__ d__ e__ f__ g__ h__ i__ j__
= (,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,) a b c d e f g h i j k l m n o p q r s t u v w x y z a_ b_ c_ d_ e_ f_ g_ h_ i_ j_ k_ l_ m_ n_ o_ p_ q_ r_ s_ t_ u_ v_ w_ x_ y_ z_ a__ b__ c__ d__ e__ f__ g__ h__ i__ j__
...yeah.
So, would it be possible to define this family of operators/functions from within the haskell implementation itself, using nothing but the type system and existing language features (declarations, type signatures, function definitions etc.)? And if so, how? Or is it impossible and you have to instead look into the compiler to find the supporting framework for this collection of functions?
No, there's no way to define the tuples like that in a generic way. The common pattern is purely syntactic, nothing that can be done recursively in the type system or otherwise. You could generate such definitions using Template Haskell, certainly, but you'd still be generating each individually with string manipulation to create the name, not using any sort of shared structure.
There's also the matter that tuple syntax is built-in and not something that can be imitated, but that's a separate issue. You might imagine types like:
data Tuple2 a b = Tuple2 a b
data Tuple3 a b c = Tuple3 a b c
...etc., which don't use special syntax but still can't be defined generically for the reasons above.
This leads to an even more general question: How much of Haskell is supported by Haskell itself, through type and function definitions, declarations etc; and how much is supported by the compiler/implementation? (I am aware that GHC was written in Haskell, that doesn't answer the question)
Almost all of it is defined in Haskell. Certain things have special syntax you can't imitate, but in most cases that only extends as far as the compiler giving special attention to certain definitions. Otherwise, there's no difference between this:
data [] a = [] | a : [a]
...and any equivalent type you define yourself.
That is, if you were to abandon the standard libraries (including the prelude) and do everything from the ground up in raw Haskell; would it be possible to build a complete implementation that has all the features of GHC, using only that minimal set of features? What is the minimum set of language features that you need in order to build a Haskell implementation using Haskell? Would I be able to abandon the prelude and then completely rebuild it manually from within GHC? If you abandon the prelude and never import anything, what is left over for you to work with?
You may find it enlightening to read about GHC's NoImplicitPrelude and RebindableSyntax extensions, which let you, among other things, change the definitions used to interpret do notation, how numeric literals are handled, what the if then else syntax does, etc.
Suffice it to say that very, very little can't be reimplemented. Most things that can't are only special due to syntax, and could be replaced with equivalent stuff (like lists and tuples, above).
In the end there's a limited set of things that have very special behavior--the IO type being an obvious example--that you can't replace at all, because they're hooked directly into something in the runtime system that you can't replace.

Resources