How can we distinguish between regular languages and context-free languages?

To express regular languages we use regular expressions, and for context-free languages we need stack-like memory. I know context-free languages have features such as center-embedding, but I'm still not sure when we can be confident that a given language is context-free. For example, why is natural language not a regular language? Is there any reason besides center-embedding?

Automata theory states that a regular language can be processed by a finite-state machine (FSM). However, if a language allows unbounded "center-embedding", that language cannot be regular; it is a context-free language (CFL), which requires a push-down automaton (PDA).
Importantly, a PDA is an FSM with one additional resource: a memory-like device, a "stack" (or, in simple cases, a counter), used to keep track of the embeddings.
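As a concrete illustration (a sketch of my own, not part of the original answer): a recognizer for the classic center-embedded language { a^n b^n } needs exactly this extra counter, because a plain FSM would need a distinct state for every possible nesting depth.

    # Sketch: recognizing the center-embedded language { a^n b^n } with a
    # single counter (a one-symbol stack). A plain FSM cannot do this,
    # since it would need a separate state for every nesting depth.
    def accepts_anbn(s: str) -> bool:
        depth = 0              # plays the role of the PDA's stack
        seen_b = False
        for ch in s:
            if ch == "a":
                if seen_b:     # an 'a' after a 'b' is malformed
                    return False
                depth += 1     # "push"
            elif ch == "b":
                seen_b = True
                depth -= 1     # "pop"
                if depth < 0:  # more b's than a's so far
                    return False
            else:
                return False
        return depth == 0      # every push matched by a pop

    assert accepts_anbn("aaabbb")
    assert not accepts_anbn("aabbb")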
Wikipedia says, under "Languages that are not context-free":

To prove that a given language is not context-free, one may employ the pumping lemma for context-free languages or a number of other methods, such as Ogden's lemma or Parikh's theorem.
Wikipedia says, under "Deciding whether a language is regular":

To prove that a language is not regular, one often uses the Myhill–Nerode theorem or the pumping lemma, among other methods.
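To make the regular-language pumping lemma concrete, here is a small brute-force check (my own illustration; the function names are invented) showing that for L = { a^n b^n }, no admissible split of a^p b^p survives pumping:

    # For s = a^p b^p and L = { a^n b^n }, try every split s = xyz with
    # |xy| <= p and |y| >= 1, and spot-check x y^i z for small i. If no
    # split survives, s witnesses (for this p) that L cannot be pumped.
    def in_lang(s: str) -> bool:
        n = len(s) // 2
        return s == "a" * n + "b" * n

    def pumpable(s, p, member):
        for i in range(p + 1):             # |x| = i
            for j in range(i + 1, p + 1):  # |xy| = j, so |y| >= 1
                x, y, z = s[:i], s[i:j], s[j:]
                if all(member(x + y * k + z) for k in range(4)):
                    return True
        return False

    p = 5
    print(pumpable("a" * p + "b" * p, p, in_lang))  # False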
Why is natural language not a regular language?
Chomsky claimed in 1957 that "English is not a regular language". As for context-free languages, he wrote: "I do not know whether or not English is itself literally outside the range of such analyses".
I would add that the usual argument is precisely the center-embedding one: English allows unboundedly deep nested constructions, and no machine with a fixed, finite set of states can keep track of arbitrary nesting depth.

Related

How do you determine if a language is regular, context free but not regular, or not context free?

I have a homework problem that requires you to prove that a language is one of the three:
A Regular Language
Context-Free but Not Regular
Not Context-Free
How would you prove each one? I know the pumping lemma can show that a language is not regular or not context-free, but that's it.
The example to help me understand better is the following:
{ a^(2n+1) b^(3n+2) | n ∈ N }, over the alphabet { a, b }, where N is the set of natural numbers.
The pumping lemma for regular languages can tell you that a language is not regular; however, it cannot tell you that a language is regular. To tell that a language is regular, you must do the equivalent of producing a finite automaton, regular grammar or regular expression and then proving it's correct for your language.
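For instance (a sketch of my own, not from the original answer), "producing a finite automaton" can be as direct as writing down a transition table and testing it; here the language is "strings over {a, b} with an even number of a's":

    # A two-state DFA for "even number of a's" over the alphabet {a, b}.
    DFA = {
        ("even", "a"): "odd",
        ("even", "b"): "even",
        ("odd", "a"): "even",
        ("odd", "b"): "odd",
    }
    START, ACCEPTING = "even", {"even"}

    def dfa_accepts(s: str) -> bool:
        state = START
        for ch in s:
            state = DFA[(state, ch)]  # total transition function
        return state in ACCEPTING

    assert dfa_accepts("abab")        # two a's
    assert not dfa_accepts("ab")      # one a

Proving correctness would then amount to an induction on the input showing that the state always tracks the parity of a's seen so far.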
The pumping lemma for context-free languages is likewise only a necessary condition: if a language fails it, that language is not context-free, but satisfying it does not prove that a language is context-free. So, just as in the regular case, to show that a language is context-free you produce a witness, such as a pushdown automaton or a context-free grammar, and prove it correct.
In your case, we can first choose the string a^(2p+1) b^(3p+2) to show that the language is not regular by the pumping lemma for regular languages: with |xy| <= p, the pumped substring consists only of a's, so pumping changes the number of a's while leaving the b's fixed and breaks the required relationship. To show that the language is context-free, observe that each increment of n adds exactly two a's and three b's in lockstep; that insight gives a CFG directly:
S -> aaSbbb | abb
Then we should show the grammar is correct, which is left as an exercise.
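As a quick sanity check (my own sketch, not part of the original answer), one can compare a direct membership test for { a^(2n+1) b^(3n+2) | n ∈ N } against strings derived from the grammar above:

    # Direct membership test versus strings derived from S -> aaSbbb | abb.
    def in_language(s: str) -> bool:
        a = len(s) - len(s.lstrip("a"))   # length of the leading a-block
        b = len(s) - a
        if s != "a" * a + "b" * b:        # must be all a's, then all b's
            return False
        # need a = 2n+1 and b = 3n+2 for the same natural number n
        return a % 2 == 1 and 3 * ((a - 1) // 2) + 2 == b

    def derive(n: int) -> str:
        # apply S -> aaSbbb n times, then finish with S -> abb
        return "aa" * n + "abb" + "bbb" * n

    for n in range(100):
        assert in_language(derive(n))
    assert not in_language("aabbb")       # a-count 2 is not of form 2n+1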

How many languages does a DFA recognize?

According to Sipser's "Introduction to the Theory of Computation": "If A is the set of all strings that machine M accepts, we say that A is the language of machine M and write L(M) = A. We say that M recognizes A ... A machine may accept several strings, but it always recognizes only one language." And also: "We say that M recognizes language A if A = {w | M accepts w}."
I guess the question has already been answered, but I would like to know if anyone has any thoughts about it: is there anything interesting we can say about the subsets of a regular language? Can we say that the original DFA recognizes them, and is there any interesting relationship between the original DFA and the ones that recognize the smaller languages?
If the language recognized by a DFA (of which there is always exactly one) is finite, then there are finitely many sublanguages of that language (indeed, if the language accepted consists of N strings, there are 2^N sublanguages).
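To see the 2^N count concretely (my own illustration): the sublanguages of a finite language are exactly the subsets of its string set, i.e. its powerset:

    # A finite language with N strings has exactly 2^N sublanguages,
    # one per subset of the accepted strings.
    from itertools import combinations

    language = {"a", "ab", "abb"}         # N = 3 strings

    def sublanguages(lang):
        items = sorted(lang)
        return [set(c) for r in range(len(items) + 1)
                       for c in combinations(items, r)]

    print(len(sublanguages(language)))    # 8 == 2**3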
There is no useful relationship which can be easily inferred from the sub/super language relationship w.r.t. where in the Chomsky hierarchy the language falls. That is: a sublanguage of a regular language may be undecidable, and a sublanguage of an undecidable language may be regular, with all possible variations in between.
Because of this, there is no particularly neat relationship to be worked out among DFAs of sub/super languages: not all of the sublanguages will even be regular; some sublanguages will have simpler DFAs than the DFA of the super language, and some will have more complicated ones. Some will even share the super language's transition structure, differing only in the set of accepting states.
Given a DFA, there is only one language corresponding to the machine. A language is a set, that is, the collection of all the strings accepted by the DFA.

Are features of programming languages a concept in semantics, syntax or something else?

When talking about features of programming languages, such as in the Programming Language Comparison and the D Language Feature Comparison Table, I was wondering what aspect of languages the concept of "features" belongs to or is discussed under:
Semantics,
syntax
or something else?
Thanks and regards!
This is just a gut feeling, I'm not a language theory guy or anything. I'd say adding a feature to a programming language means both
adding semantics for certain circumstances or constructions (e.g. "is-expressions return a boolean according to whether the type of a template argument matches some type, according to the following fifty rules: ...")
defining a syntax that belongs to it (e.g. adding IsExpr : "is" "(" someKindOfExpression ")" in the grammar)
It depends entirely on what you mean by a "feature," and how it's implemented. Some features, like Java's generics, are nothing but syntactic sugar - so that's a "syntax feature." The bytecode generated is unaffected by using Java's generics due to type erasure. This allows for backwards compatibility with bytecode from before generics were introduced in Java 1.5 (i.e. Java 1.4 and earlier).
Other language features go much deeper than the syntactic level, like C#'s generics, which are implemented using reification to provide "first-class" generic objects.
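To illustrate the "surface-only feature" idea in another language (an analogy of my own, not from the answer above): Python's type annotations behave much like erased generics, in that they change what the parser accepts but not what happens at runtime:

    # Like Java's erased generics, Python's type annotations are a purely
    # syntactic feature: the interpreter records them but does not enforce
    # them when the code runs.
    def first(items: list[int]) -> int:
        return items[0]

    print(first([1, 2, 3]))      # 1, as the annotation suggests
    print(first(["x", "y"]))     # 'x' - no runtime error despite the hint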
I don't think there is a clean separation for the concept of programming language "features", as many features, like garbage collection (Java) or pattern matching (Haskell), are provided by the runtime environment. So, generally, I would say that the programming language - the grammar - per se provides no features. It just determines the rules of the language (syntax). As the behaviour is determined by how the code (produced by the grammar by obeying its rules) is interpreted, programming language features are a semantic aspect.

How are "pattern languages" related to "regular languages"?

I always thought that there was no real language class below type-3 grammars, but now I found the "Language identification in the limit" model which allows learning of pattern languages but not regular languages.
What exactly are pattern languages and what's their difference from regular languages?
Disclaimer: I am not familiar with language identification in the limit.
But I can say this: pattern languages are not "above" regular languages in the Chomsky hierarchy – in fact, they are not formal languages at all. Rather, they are structured but informal descriptions in natural language, which means they don't fit into the Chomsky hierarchy in the first place.
(Pattern languages are generalizations of the description texts used for the Design Patterns by the Gang of Four.)

Conditions for a meta-circular evaluator

Are there any conditions that a language must satisfy so that a meta-circular evaluator can be written for that language? Can I write one for BASIC, or for Python?
To quote Reg Braithwaite:
The difference between self-interpreters and meta-circular interpreters is that the latter restate language features in terms of the features themselves, instead of actually implementing them. (Circular definitions, in other words; hence the name). They depend on their host environment to give the features meaning.
Given that, one of the key features of a language that allows a meta-circular interpreter to be written for it is homoiconicity: the primary representation of the program is a primitive data structure of the language itself. Lisp exhibits this by virtue of the fact that programs are themselves expressed as lists.
You can write one for any language that is Turing-complete; however, your mileage may vary.
For Python, it has been done (PyPy). A list of languages for which it has been done can be found at the Wikipedia article.
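To make the homoiconicity point tangible, here is a minimal sketch (mine, not from the answers above) of the Lisp-style setup in Python: programs are plain nested lists, so an evaluator can walk them directly. A genuine meta-circular evaluator would go one step further and express this function in the language being evaluated.

    # Programs as nested lists, walked by a tiny evaluator. A true
    # meta-circular evaluator would write this function in the evaluated
    # language itself, relying on the host for the features' meaning.
    import operator

    ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def evaluate(expr, env=ENV):
        if isinstance(expr, (int, float)):  # literal
            return expr
        if isinstance(expr, str):           # name lookup
            return env[expr]
        op, *args = expr                    # e.g. ["+", 1, ["*", 2, 3]]
        return env[op](*(evaluate(a, env) for a in args))

    print(evaluate(["+", 1, ["*", 2, 3]]))  # 7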
