How to find a Regular Expression for DFA - state-machine

Can anyone tell me how to find a regular expression for a DFA? I have read and watched a lot of material, but I still feel confused about this.
Here is my picture:
For example, with the picture above I do not see any rule, because there can be an arbitrary number of 0s or 1s. The only thing I know is that every accepted string must end with a 1.

Any valid loop from the end (accepting) state can be taken zero or more times. Start the regex with the options for getting to the target state. There is then a pair of options for loops that get back to the target state, so that becomes an or-expression applied zero or more times.
I could give the answer, but this looks very much like homework.
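One general way to check whatever regex you derive is to compare it, by brute force, against a simulation of the DFA. Below is a minimal Python sketch; since the diagram isn't shown here, the DFA is a made-up two-state example over {0,1} that accepts exactly the strings ending in 1, so both the transition table and the candidate pattern are assumptions:

import itertools
import re

# Hypothetical DFA: states A (start) and B (accepting);
# '0' always leads to A, '1' always leads to B.
TRANSITIONS = {('A', '0'): 'A', ('A', '1'): 'B',
               ('B', '0'): 'A', ('B', '1'): 'B'}

def dfa_accepts(s):
    state = 'A'
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state == 'B'

# A candidate regex for this assumed DFA (strings ending in 1).
candidate = re.compile(r'(0|1)*1')

# Compare the two on every binary string up to length 8.
for n in range(9):
    for bits in itertools.product('01', repeat=n):
        s = ''.join(bits)
        assert dfa_accepts(s) == bool(candidate.fullmatch(s)), s
print('candidate regex agrees with the DFA on all strings up to length 8')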

Related

Dynamic Programming Question - What is the base case?

I've been preparing for coding interviews. I have been trying to beef up my dynamic programming skills and came across Alvin's awesome channel and this great video about some of the approaches to take.
In one of the programming sections, the problem statement is "Can you construct the target string from the contents of an array?" He goes on to correctly use recursion as an approach, but I got stuck on his base case.
He indicates, for example, that this should return true:
canConstruct('StakeBoard', ['sta', 'te', 'bo', 'ard'])
He goes on to settle the recursion base case as the following, returning true, by literally saying "Because to generate an empty string, you can generally take no-zero elements from the array!" This is the part I did not understand. What does non-zero elements of the array mean here to make this a base case?
canConstruct('', ['cat', 'dog'])
"Because to generate an empty string, you can generally take no-zero elements from the array!"
It should be something like
"Because to generate an empty string, you can generally take no or zero elements from the array!"
The question then asks:
What does non-zero elements of the array mean here to make this a base case?
Actually, he meant
If the target is empty, then no matter what array of words is given, taking no words (i.e. zero words) gives the answer true.
We call a case a base case when we have a definite answer for that case, and the case described above definitely looks like one; that is why it is treated as the base case.
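For reference, a rough Python sketch of the recursion being discussed (not Alvin's exact code, and using a lowercase 'skateboard' so the pieces actually line up):

def canConstruct(target, word_bank):
    # Base case: an empty target can always be built by taking zero words.
    if target == '':
        return True
    # Otherwise try every word that is a prefix of the target and recurse on the rest.
    for word in word_bank:
        if target.startswith(word):
            if canConstruct(target[len(word):], word_bank):
                return True
    return False

print(canConstruct('skateboard', ['ska', 'te', 'bo', 'ard']))  # True
print(canConstruct('', ['cat', 'dog']))                        # True (the base case)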

Build a calculator in Python 3

Hey everyone, I need your help with my programming course; I am pursuing my undergraduate degree in psychology.
The question is:
The Python application you will develop will receive the mathematical operation that the user wants to calculate and print the result on the screen.
This process will continue until the user enters "Done".
The application will terminate when the user enters "end".
Conditions and restrictions:
1) you will operate with positive integers
2) only 4 operations will be used
3) remember the priority of the operations
4) you will use "*" as the multiplication sign
5) no brackets will be used
6) it is forbidden to use an external module
7) you are not responsible for the user's incorrect entries
8) of the built-in functions, you may use only the following:
int
float
range
print
input
len
str
max
min
You can use all methods of the list and str data structures.
There are some things to be considered:
Since you write that no modules are allowed, I assume this should only run in the CLI. That makes the exercise a lot easier, since GUIs and Python are a love-hate relationship of their own.
Assuming I understood your assignment right, the user will type expressions like "1*2+3/4-5"; typing "Done" indicates that these should be processed and the result printed, right? If the user types "end", the script stops.
Also, you only have to handle positive numbers; this, together with the fact that you don't have to validate the user's entries, makes it much easier. I won't give you a full solution, since you should learn something from this assignment, but here are some hints that may nudge you in the right direction.
Hints:
The input will arrive as a string, so handling strings will be the main task. You should look up the documentation to see in what ways you can manipulate strings. Also have a closer look at helper functions like split, in case the word Done appears more than once in the string.
Strings behave very much like lists of letters, punctuation and digits.
Since a string is essentially such a list, you won't have a neatly stored 22 in there; it will be a '2' followed by a '2', surrounded by an operator or the words Done or end. You will have to go through your string, break it up into smaller parts, and from there on call functions to do whatever there is to do; a sketch of that first step is given after these hints.
Keep in mind that there are countless ways to achieve what you have to do. If you aren't familiar with programming, keep it simple, break the problem into smaller steps and then just work through the exercise.
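For example, the breaking-up step could look roughly like this (a minimal sketch with a hypothetical tokenize helper, assuming the expression contains only digits and the four operators; evaluating the token list with the right precedence, * and / before + and -, is left as the exercise):

def tokenize(expression):
    # Split something like "1*2+3/4-5" into numbers and operators,
    # using only the allowed built-ins plus str/list methods.
    tokens = []
    number = ''
    for ch in expression:
        if ch in '0123456789':
            number = number + ch          # collect consecutive digits
        else:
            if len(number) > 0:
                tokens.append(int(number))
                number = ''
            tokens.append(ch)             # ch is one of + - * /
    if len(number) > 0:
        tokens.append(int(number))
    return tokens

print(tokenize('1*2+3/4-5'))              # [1, '*', 2, '+', 3, '/', 4, '-', 5]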
Hope this will help you. If it helped you or gave you a hint in the right direction, I would appreciate an upvote.
Have fun coding.

Algorithm (or pointer to literature) sought for string processing challenge

A group of amusing students write essays exclusively by plagiarising portions of the complete works of William Shakespeare. At one end of the scale, an essay might consist exclusively of a verbatim copy of a soliloquy... at the other, one might see work so novel that - while using a common alphabet - no two adjacent characters in the essay were used adjacently by Will.
Essays need to be graded. A score of 1 is assigned to any essay which can be found (character-by-character identical) in the plain-text of the complete works. A score of 2 is assigned to any work that can be successfully constructed from no fewer than two distinct (character-by-character identical) passages in the complete works, and so on... up to the limit - for an essay with N characters - which scores N if, and only if, no two adjacent characters in the essay were also placed adjacently in the complete works.
The challenge is to implement a program which can efficiently (and accurately) score essays. While any (practicable) data-structure to represent the complete works is acceptable - the essays are presented as ASCII strings.
Having considered this teasing question for a while, I came to the conclusion that it is much harder than it sounds. The naive solution, for an essay of length N, involves 2**(N-1) traversals of the complete works - which is far too inefficient to be practical.
While, obviously, I'm interested in suggested solutions - I'd also appreciate pointers to any literature that deals with this, or any similar, problem.
CLARIFICATIONS
Perhaps some examples (ranging over much shorter strings) will help clarify the 'score' for 'essays'?
Assume Shakespeare's complete works are abridged to:
"The quick brown fox jumps over the lazy dog."
Essays scoring 1 include "own fox jump" and "The quick brow". The essay "jogging" scores 6 (despite being short) because it can't be represented in fewer than 6 segments of the complete works... It can be segmented into six strings that are all substrings of the complete works as follows: "[j][og][g][i][n][g]". N.B. Establishing scores for this short example is trivial compared to the original problem - because, in this example "complete works" - there is very little repetition.
Hopefully, this example segmentation helps clarify the 2**(N-1) candidate segmentations of an essay. Each of the (N-1) gaps between the N characters in the essay may either be a break between segments or not, resulting in ~2**(N-1) segmentation hypotheses, each of which requires substring searches of the complete works to test.
An (N)DFA would be a wonderful solution - if it were practical. I can see how to construct something that solves 'substring matching' in this way - but not scoring. The state space for scoring, on the surface at least, seems wildly too large (for any substantial complete works of Shakespeare). I'd welcome any explanation that undermines my assumption that the (N)DFA would be too large to be practical to compute/store.
A general approach for plagiarism detection is to append the student's text to the source text separated by a character not occurring in either and then to build either a suffix tree or suffix array. This will allow you to find in linear time large substrings of the student's text which also appear in the source text.
I find it difficult to be more specific because I do not understand your explanation of the score - the method above would be good for finding the longest stretch in the student's work which is an exact quote, but I don't understand your N - is it the number of distinct sections of source text needed to construct the student's text?
If so, there may be a dynamic programming approach. At step k, we work out the least number of distinct sections of source text needed to construct the first k characters of the student's text. Using a suffix array built just from the source text (or otherwise), we find the longest match between the source text and characters x..k of the student's text, where x is of course as small as possible. Then the least number of sections of source text needed to construct the first k characters of the student's text is the least needed to construct characters 1..x-1 (which we have already worked out) plus 1. By running this process for k = 1..the length of the student's text, we find the least number of sections of source text needed to reconstruct the whole of it.
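A minimal Python sketch of that dynamic programme, with a naive substring test standing in for the suffix-array lookup (the function name and the abridged "complete works" are just the example from the question):

def score(essay, works):
    # dp[k] = least number of source-text sections needed to build the
    # first k characters of the essay (dp[0] = 0: nothing needs nothing).
    n = len(essay)
    INF = float('inf')
    dp = [INF] * (n + 1)
    dp[0] = 0
    for k in range(1, n + 1):
        # try every possible final section essay[x:k] that occurs in the works
        for x in range(k):
            if dp[x] + 1 < dp[k] and essay[x:k] in works:
                dp[k] = dp[x] + 1
    return dp[n]

works = "The quick brown fox jumps over the lazy dog."
print(score("own fox jump", works))   # 1
print(score("jogging", works))        # 6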
(Or you could just search StackOverflow for the student's text, on the grounds that students never do anything these days except post their question on StackOverflow :-)).
I claim that repeatedly moving along the target string from left to right, using a suffix array or tree to find the longest match at any time, will find the smallest number of different strings from the source text that produces the target string. I originally found this by looking for a dynamic programming recursion but, as pointed out by Evgeny Kluev, this is actually a greedy algorithm, so let's try and prove this with a typical greedy algorithm proof.
Suppose not. Then there is a solution better than the one you get by going for the longest match every time you run off the end of the current match. Compare the two proposed solutions from left to right and look for the first time when the non-greedy solution differs from the greedy solution. If there are multiple non-greedy solutions that do better than the greedy solution I am going to demand that we consider the one that differs from the greedy solution at the last possible instant.
If the non-greedy solution is going to do better than the greedy solution, and there isn't a non-greedy solution that does better and differs later, then the non-greedy solution must find that, in return for breaking off its first match earlier than the greedy solution, it can carry on its next match for longer than the greedy solution. If it can't, it might somehow do better than the greedy solution, but not in this section, which means there is a better non-greedy solution which sticks with the greedy solution until the end of our non-greedy solution's second matching section, which is against our requirement that we want the non-greedy better solution that sticks with the greedy one as long as possible. So we have to assume that, in return for breaking off the first match early, the non-greedy solution gets to carry on its second match longer. But this doesn't work, because, when the greedy solution finally has to finish using its first match, it can jump on to the same section of matching text that the non-greedy solution is using, just entering that section later than the non-greedy solution did, but carrying on for at least as long as the non-greedy solution. So there is no non-greedy solution that does better than the greedy solution and the greedy solution is optimal.
Have you considered using N-Grams to solve this problem?
http://en.wikipedia.org/wiki/N-gram
First read the complete works of Shakespeare and build a trie. Then process the string left to right. We can greedily take the longest substring that matches one in the data because we want the minimum number of strings, so there is no factor of 2^N. The second part is dirt cheap O(N).
The depth of the trie is limited by the available space. With a gigabyte of RAM you could reasonably expect to exhaustively cover Shakespearean English strings of length at least 5 or 6. I would require that the leaf nodes are unique (which also gives a rule for constructing the trie) and keep a pointer to their place in the actual works, so you have access to the continuation.
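A small Python sketch of the greedy left-to-right scoring described above. For brevity a naive substring test stands in for the trie (or suffix-array) lookup of the longest match; the names are made up:

def greedy_score(essay, works):
    score, i, n = 0, 0, len(essay)
    while i < n:
        # take the longest piece starting at i that occurs in the works
        j = n
        while j > i and essay[i:j] not in works:
            j -= 1
        if j == i:
            return None           # a character that never occurs in the works
        score, i = score + 1, j   # consume the piece and count one section
    return score

works = "The quick brown fox jumps over the lazy dog."
print(greedy_score("The quick brow", works))  # 1
print(greedy_score("jogging", works))         # 6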
This feels like a problem of partially matching a very large regular expression.
If so, it can be solved by a very large non-deterministic finite automaton, or, put more broadly, by a graph representing, for every character in the works of Shakespeare, all the possible next characters.
If necessary for efficiency reasons, the NDFA is guaranteed to be convertible to a DFA. But that construction can give rise to 2^n states; maybe this is what you were alluding to?
This aspect of the complexity does not really worry me. The NDFA will have M + C states: one state for each character of the text, plus C states (where C = 26*2 + the number of punctuation characters) that connect to the M states and allow the algorithm to (re)start when there are 0 matched characters. The question is whether the corresponding DFA would have O(2^M) states and, if so, whether it is necessary to build that DFA; theoretically it is not. However, consider that in this construction each state has one and only one transition to exactly one other state (the next state corresponding to the next character in that work). We would expect each of the start states to be connected to M/C states on average, but in the worst case M, meaning the NDFA will have to track at most M simultaneous states. That's a large number, but not an impossibly large number for computers these days.
The score would be derived by initializing it to 1 and then incrementing it every time a non-accepting state is reached.
It's true that one of the approaches to string searching is building a DFA. In fact, for the majority of the string search algorithms, it looks like a small modification on failure to match (increment counter) and success (keep going) can serve as a general strategy.

A reverse inference engine (find a random X for which foo(X) is true)

I am aware that languages like Prolog allow you to write things like the following:
mortal(X) :- man(X). % All men are mortal
man(socrates). % Socrates is a man
?- mortal(socrates). % Is Socrates mortal?
yes
What I want is something like this, but backwards. Suppose I have this:
mortal(X) :- man(X).
man(socrates).
man(plato).
man(aristotle).
I then ask it to give me a random X for which mortal(X) is true (thus it should give me one of 'socrates', 'plato', or 'aristotle' according to some random seed).
My questions are:
Does this sort of reverse inference have a name?
Are there any languages or libraries that support it?
EDIT
As somebody below pointed out, you can simply ask mortal(X) and it will return all X, from which you can then pick one at random. What if, however, that list were very large, perhaps in the billions? Obviously in that case it wouldn't do to generate every possible result before picking one.
To see how this would be a practical problem, imagine a simple grammar that generated a random sentence of the form "adjective1 noun1 adverb transitive_verb adjective2 noun2". If the lists of adjectives, nouns, verbs, etc. are very large, you can see how the combinatorial explosion is a problem. If each list had 1000 words, you'd have 1000^6 possible sentences.
Instead of Prolog's standard depth-first search, a randomized depth-first search strategy could easily be implemented. All that is required is to randomize the program flow at choice points, so that every time a disjunction is reached a random branch of the search tree (= Prolog program) is selected instead of the first one.
Note, though, that this approach does not guarantee that all solutions will be equally probable. To guarantee that, you would need to know in advance how many solutions each branch will generate, in order to weight the randomization accordingly.
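A tiny Python analogue of that idea (not Prolog; the fact base and names are made up): shuffle the alternatives at each choice point, so the first solution found is a random one rather than always the first clause:

import random

MAN = ['socrates', 'plato', 'aristotle']   # man/1 facts

def mortal():
    # mortal(X) :- man(X).  Try the man/1 clauses in a random order.
    choices = list(MAN)
    random.shuffle(choices)
    for x in choices:
        yield x            # each yielded value is one binding for X

print(next(mortal()))      # e.g. 'plato'
# With nested choice points the solutions are not necessarily equally
# likely, which is the caveat mentioned above.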
I've never used Prolog or anything similar, but judging by what Wikipedia says on the subject, asking
?- mortal(X).
should list everything for which mortal is true. After that, just pick one of the results.
So to answer your questions,
I'd go with "a query with a variable in it"
From what I can tell, Prolog itself should support it quite fine.
I don't think you can calculate the nth solution directly, but you can calculate the first n solutions (with n randomly picked) and take the last one. Of course this would be problematic if n = 10^(big_number)...
You could also do something like
mortal(X) :- man(X).
man(X) :- random(1,4,ID), man(ID,X).
man(1,socrates).
man(2,plato).
man(3,aristotle).
but the problem is that if not every man were mortal, for example if only 1 man out of 1,000,000 were mortal, you would have to search a lot. It would be like searching for solutions to an equation by trying random numbers until you find one.
You could develop some sort of heuristic to find a solution close to the randomly chosen number, but that may (negatively) affect the randomness.
I suspect that there is no way to do it more efficiently: you either have to compute the set of solutions and pick one, or keep picking members of a superset of the solutions until you find an actual solution. But don't take my word for it xd
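A quick Python sketch of that trial-and-error behaviour (made-up names; in this toy fact base every candidate qualifies, but the retry loop shows where the cost goes when few do):

import random

MAN = {1: 'socrates', 2: 'plato', 3: 'aristotle'}    # man/2 facts
MORTAL = {'socrates', 'plato', 'aristotle'}          # suppose only these are mortal

def random_mortal(max_tries=10000):
    for _ in range(max_tries):
        x = MAN[random.randint(1, len(MAN))]         # pick a random ID
        if x in MORTAL:                              # check the goal
            return x
    return None    # gave up: too few of the candidates were mortal

print(random_mortal())   # e.g. 'aristotle'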

Extracting information in a string

I would like to parse strings with an arbitrary number of parameters, such as P1+05 or P2-01, all put together like P1+05P2-02. I can get that data from such strings with a rather large IF tree (too much to post here...) and a variable keeping track of the position within the string. When it reaches a key letter (like P) it knows how many characters to read and proceeds accordingly, nothing special. In this example, say I have two players in a game and I want to give +05 and -01 health to players 1 and 2, respectively (hence the + and -; I want the strings to be somewhat readable).
It works, but I feel this could be done better. I am using Lua to parse the strings, so maybe there is some built-in function within Lua to ease that process? Or maybe some general hints, or references for better approaches?
Here is some code:
for w in string.gmatch("P1+05P2-02","%u[^%u]+") do
  print(w)
end
It assumes that each "word" begins with an uppercase letter and its parameters contain no uppercase letters.
