how can i see directly that a language is not regular - regular-language

given L={a^n b^n c^n}, how can i say directly without looking at production rules that this language is not regular? i can use pumping lemma but some guys are saying just looking at the grammar that this is not regular one. how is it possible?

You have three chars in your alphabet. All of them depends on the same variable: n.
Now, if you have only two of them, imagine {a^n b^n} you can easily accomplish the task with this production:
S -> ab | aSb
But you have three of them and there's no way to link all of them to the same variable. You should use two syntax category, but since you do it, they are unlinked and you can generate different string from each one of them. The only way to link them is with only one syntax category, and that is impossible.
You can't do:
S -> abc | aSbc
In fact, you can't have a syntax category in your final string, so that is not a string. It needs to be transformed again. And what can you do from that point?
You can do:
aabcbc
or you can do:
aaSbcbc
The first one is a string, and isn't part of your language. The second is not a string, yet. But it's very easy to see that you can't manage to do any allowed string from that.

Related

Find any values from a set/list within a string

Long time lurker, first time poster. I'm hoping to get some advice from the brilliant minds in this community. In the project I'm working in, the goal is to look at a user-provided string and determine if the content of that string contains any (one or many) matches to a list of match criteria. For example:
User-provided string: "I like thing a and thing b"
Match List:
Match Criteria
Match Type
Category
Foo
Exact (Case Insensitive)
Bar
Thing a
Contains (Case Insensitive)
Things
Thing b
Contains (Case Insensitive)
Stuff
In this case, it would return the following matches:
Thing a > Things
Thing b > Stuff
As of now, my approach is to iterate through the match criteria list and check each list item against the user-supplied string using the Match Type specified (Exact, Contains, Regular Expression), returning a list of the matches and then doing some stuff with that list. This approach works, even when matching ~100 rules and handling a 200-record batch, but it seems obvious that the performance will be pretty terrible if a large number of rules is introduced.
Is there a better way to do this that would be supported in Apex called by a trigger? I would love to learn a more sophisticated approach if there is one.
Thanks in advance!
What do you need it for. Is it a pure apex exercise or is it "close" to certain standard sObjects? There are lots of built-in features around "fuzzy matching".
In no specific order...
Have you looked into "all things Einstein", from categorising leads to predicting how likely this opportunity is to close. Might not be direction you expected to take but who knows
Obviously SOSL comes to mind, like what powers the global search. It automatically does some substitutions for you like Mike -> Michael
Matching rules, duplicate rules. You'd have a limit of say 5 active rules but you could hook to them up from Apex, including creative abuse of the system. "Dear salesforce, let's pretend I'm making such and such Opportunity, can you find me similar Opportunities?" (plot twist- you're not making an Oppty at all, you're creating account of some venture capitalist looking for investments that match his preferences). Give matching rules a go, if not for everything then at least for more creative fuzzy matching. You really don't want to implement soundex, levenshtein etc manually...
tags? There's somewhat forgotten feature from SF classic, it creates bunch of tables (AccountTag, ContactTag). This plus SOSL could be close to what you need.
Additionally if you need this for anything close to Knowledge Base:
Data Categories come to mind
KB supports synonyms, letting you define your (not very intuitive) "thing b => stuff" mapping.
and it should survive translations

How to prove the correctness of a given grammar?

I am wondering how programming langauge developers validate and prove that their grammar is correct. Suppose that I created a new grammar for a new langauge. I can test my grammar with a unit test tool by providing different kinds of test programs. However, I will never 100% ensure that my grammar is correct. How do language developers ensure that their grammar is correct in real world?
Let's say I created a grammar for a new language using pencil and paper. However, I did a mistake and my grammar accepts the expressions that end with a + like 2+2+. I will implement my language using this incorrect grammar, if I don't find the mistake in it. After implementation and unit testing, I can find the error. Is it possible to find it before starting any implementation?
Definitely, I can try my grammar with some sample inputs using pencil and paper (derivation etc.), but I may miss some corner cases. Is there a better approach or how in the real language developers test their grammar?
A proof is a logical argument that demonstrates the truth of a claim. There are as many ways to prove something as there are ways of thinking about a problem. A common way to prove things about discrete structures (like grammars) is using mathematical induction. Basically, you show that something is true in base cases - the simplest cases possible - and then show that if it's true for all cases under a certain size, it must therefore be true for cases of the next size.
In our case: suppose we wanted only to prove your grammar didn't generate + at the end of a word. We could do induction on the number of productions used in constructing a string in the language. We would identify all relevant base cases, show the property holds for these strings, and then show that longer strings in the language are constructed in such a way that it is impossible to get a + at the end. Here's an example.
S := S + S | (S) | x
Base case: the shortest string in the language is x, generated as S -> x. It does not end with a +.
Induction hypothesis: assume all strings produced using up to and including k productions do not end with +.
Induction step: we must show strings produced using more than k productions do not end with +. If we apply the rule (S) to any string generated from S, we do not add + so the property holds. If we apply S + S to strings generated from S, the last symbol in S + S is the last symbol of a shorter string (at least 2 symbols shorter) generated by S. By the induction hypothesis, that string did not end in +, so neither does this one. There are no other productions, so no string in the language ends in +. QED

Regular expression with quantity dependence

Is it possible to create regular expression that will describe words like:
aabc
aaaabcbc
aaaaaabcbcbc
?
Words are created like each occurrence of (bc) on the right is connected with occurrence (aa) on the left.
Words below are not valid:
aa
bc
aaabc
aaaabc
aabcbc
No it cannot be expressed in terms of regular expressions. That is because, your expression, requires a number of "aa" followed by equal number of "bc". This requires infinite memory. FA do not have infinite memory.
It can be expressed in context-free grammar.-
S -> aaSbc | έ Epsilon stands for string of zero length(empty string).
This generates strings like -
Valid string - έ(Empty string), aabc, aaaabcbc and so on.
Read more about context free and regular grammar here.
Just fyi it would be possible in non regular languages.
For example you could play with balancing groups in .NET and get something like this:
^(?<a>aa)+(bc(?<-a>))+(?(a)(?!))$

caesar cipher check in ocaml

I want to implement a check function that given two strings s1 and s2 will check if s2 is the caesar cipher of s1 or not. the inter face needs to be looked like string->string->bool.
the problem is that I am not allowed to use any string functions other than String.length, so how can I solve it? i am not permitted any list array, iterations. Only recursions and pattern matching.
Please help me. And also can you tell me how I can write a substring function in ocaml other than the module function with the above restrictions?
My guess is that you are probably allowed to use s.[i] to get the ith character of string s. This is the same as String.get, but the instructor may not think of it in those terms. Without some form of getting the individual characters for the string, I believe that this is impossible. You should probably double check with your instructor to be sure, but I would be surprised if he had meant for you to be unable to separate a string into characters (which is something that you cannot do with pattern-matching alone in Ocaml).
Once you can get individual characters, the way to do it should be pretty clear (you do not need substring to traverse each string recursively).
If you still want to write substring, creating it would be complex since you don't have access to String.create or other similar functions. But you can write your own version of String.create using recursion, one character string literals (like "x"), the ability to set a character in a string to another (like s.[0] <- c), and string concatenation (s1 ^ s2). Again, of course, all of this is assuming that those operators are allowed to be used.

Identifying frequent formulas in a codebase

My company maintains a domain-specific language that syntactically resembles the Excel formula language. We're considering adding new builtins to the language. One way to do this is to identify verbose commands that are repeatedly used in our codebase. For example, if we see people always write the same 100-character command to trim whitespace from the beginning and end of a string, that suggests we should add a trim function.
Seeing a list of frequent substrings in the codebase would be a good start (though sometimes the frequently used commands differ by a few characters because of different variable names used).
I know there are well-established algorithms for doing this, but first I want to see if I can avoid reinventing the wheel. For example, I know this concept is the basis of many compression algorithms, so is there a compression module that lets me retrieve the dictionary of frequent substrings? Any other ideas would be appreciated.
The string matching is just the low hanging fruit, the obvious cases. The harder cases are where you're doing similar things but in different order. For example suppose you have:
X+Y
Y+X
Your string matching approach won't realize that those are effectively the same. If you want to go a bit deeper I think you need to parse the formulas into an AST and actually compare the AST's. If you did that you could see that the tree's are actually the same since the binary operator '+' is commutative.
You could also apply reduction rules so you could evaluate complex functions into simpler ones, for example:
(X * A) + ( X * B)
X * ( A + B )
Those are also the same! String matching won't help you there.
Parse into AST
Reduce and Optimize the functions
Compare the resulting AST to other ASTs
If you find a match then replace them with a call to a shared function.
I would think you could use an existing full-text indexer like Lucene, and implement your own Analyzer and Tokenizer that is specific to your formula language.
You then would be able to run queries, and be able to see the most used formulas, which ones appear next to each other, etc.
Here's a quick article to get you started:
Lucene Analyzer, Tokenizer and TokenFilter
You might want to look into tag-cloud generators. I couldn't find any source in the minute that I spent looking, but here's an online one:
http://tagcloud.oclc.org/tagcloud/TagCloudDemo which probably won't work since it uses spaces as delimiters.

Resources