Reversing a list, evaluation order - haskell

I am reading page 69 of Haskell School of Expression and I am not sure that I got the evalution of rev [1:2:3:4] right.
Hudak does not explain the evalution(rewriting) order in detail in his book for reverse.
Could someone please either confirm that my guess (shown in the attached picture) is correct or if not correct then point out what I got wrong. I believe that it is correct but I am not 100% sure, this is the reason for asking.
So the question is:
when I evaluate one step of reverse then aftes the evaluation (i.e. rewriting) the result should be surrounded by parenthesis, right?
If I understand correctly, these unlucky appearance of parentheseses is the reason for the poor (read quadratic) time complexity of reverse. In this example 6 steps are spent in total on list appending in order to reverse a 4 element list.

Yes, nested, left-associative calls to append (in Haskell, goes by the names (++) and (<>)) generates poor performance of singly-linked lists.
There are several solutions to this problem, since it's been known about for 30 or 40 years, at least. I believe the library version of reverse uses an accumulator to achieve linear complexity rather than quadratic, but it's still not something you want to call frequently on lists.

Related

Guess the number: Dynamic Programming -- Identifying Subproblems

I am working on the following problem and having a hell of a time at the moment.
We are playing the Guessing Game. The game will work as follows:
I pick a number between 1 and n.
You guess a number.
If you guess the right number, you win the game.
If you guess the wrong number, then I will tell you whether the number I picked is higher or lower, and you will continue guessing.
Every time you guess a wrong number x, you will pay x dollars. If you run out of money, you lose the game.
Given a particular n, return the minimum amount of money you need to guarantee a win regardless of what number I pick.
So, what do I know? Clearly this is a dynamic programming problem. I have two choices, break things up recursively or go ahead and do things bottom up. Bottom up seems like a better choice to me (though technically the max recursion depth would be 100 as we are guaranteed n<=100). The question then is: What do the sub-problems look like?
Well, I think we could start thinking about subarrays (but possible we need subsequences here) so what is the worst case in each possible sub-division kind of thing? That is:
[1,2,3,4,5,6]
[[1],[2],[3],[4],[5],[6]] -> 21
[[1,2],[3,4],[5,6]] -> 9
[[1,2,3],[4,5,6]] -> 7
...
but I don't think I quite have the idea yet. So, to get succinct since this post is kind of long: How are we breaking this up? What is the sub-problem we are trying to solve here?
Relevant Posts:
Binary Search Doesn't work in this case?

Parse arithmetic/boolean expression but skip capture

Given the following expression
x = a + 3 + b * 5
I would like to write that in the following data structure, where I'm only interested to capture the variables used on the RHS and keep the string intact. Not interesting in parsing a more specific structure since I'm doing a transformation from language to language, and not handling the evaluation
Variable "x" (Expr ["a","b"] "a + 3 + b * 5")
I've been using this tutorial as my starting point, but I'm not sure how to write an expression parser without buildExpressionParser. That doesn't seem to be the way I should approach this.
I am not sure why you want to avoid buildExpressionParser, as it hides a lot of the complexity in parsing expressions with infix operators. It is the right way to do things....
Sorry about that, but now that I got that nag out of the way, I can answer your question.
First, here is some background-
The main reason writing a parser for expressions with infix operators is hard is because of operator precedence. You want to make sure that this
x+y*z
parses as this
+
/ \
x *
/\
y z
and not this
*
/ \
+ z
/ \
x y
Choosing the correct parsetree isn't a very hard problem to solve.... But if you aren't paying attention, you can write some really bad code. Why? Performance....
The number of possible parsetrees, ignoring precedence, grows exponentially with the size of the input. For instance, if you write code to try all possibilities then throw away all but the ones with the proper precedence, you will have a nasty surprise when your parser tackles anything in the real world (remember, exponential complexity often ain't just slow, it is basically not a solution at all.... You may find that you are waiting half an hour for a simple parse, no one will use that parser).
I won't repeat the details of the "proper" solution here (a google search will give the details), except to note that the proper solution runs at O(n) with the size of the input, and that buildExpressionParser hides all the complexity of writing such a parser for you.
So, back to your original question....
Do you need to use buildExpressionParser to get the variables out of the RHS, or is there a better way?
You don't need it....
Since all you care about is getting the variables used in the right side, you don't care about operator precedence. You can just make everything left associative and write a simple O(n) parser. The parsetrees will be wrong, but who cares? You will still get the same variables out. You don't even need a context free grammar for this, this regular expression basically does it
<variable>(<operator><variable>)*
(where <variable> and <operator> are defined in the obvious way).
However....
I wouldn't recommend this, because, as simple as it is, it still will be more work than using buildExpressionParser. And it will be trickier to extend (like adding parenthesis). But most important, later on, you may accidentally use it somewhere where you do need a full parsetree, and be confused for a while why the operator precedence is so completely messed up.
Another solution is, you could rewrite your grammar to remove the ambiguity (again, google will tell you how).... This would be good as a learning exercise, but you basically would be repeating what buildExpressionParser is doing internally.

Is my program Turing-complete?

I spent a week or two programming a simple logic solver. Having built it, I found myself wondering whether the language it solves is Turing-complete or not. So I coded up a small set of equations which accept any valid expression in the SKI combinator calculus, and produce a result set which contains the normal form of that expression. Since SKI is Turing-complete, proving that my language can execute SKI would demonstrate its Turing-completeness.
There is a glitch, however. The solver does not reduce the expression in normal order. Actually what it does is to try every possible reduction order. Which means that the solution set is typically huge. If a normal form exists, it will be in there somewhere, but it's difficult to tell where.
This brings me to two questions:
Is my language Turing-complete? Or do I need to find a better proof?
Is the number of solutions a computable function of the input?
(At first I assumed that the size of the solution set was exponential or factorial in the input size. But on closer inspection, this is not true. You can write huge expressions which are already in normal form, and tiny expressions which do not terminate. I have a feeling that determining the size of the solution set might be equivilent to solving the Halting Problem, but I'm not completely sure...)
A) As augustss says, your system is clearly turing complete.
B) You're right that determining solution size is the same as the halting problem. If a sequence doesn't terminate, then you get an infinite solution set. So to determine if the set is infinite, you need to determine if a reduction sequence terminates. But that's precisely the halting problem!
C) As I recall, a system that, given a set of instructions to a turing machine, merely says how many steps they take to terminate (which is, I suppose, the cardinality of your solution set) or fails to terminate if the instructions themselves fail to terminate is in itself turing complete. So that should help with the intuition here.
In answer to my own question... I found that by tweaking the source code, I can make it so that if the input SKI expression has a normal form, that normal form will always be solution #1. So if you just ignore any further solutions, the program reduces any SKI expression to normal form.
I believe this constitutes a "better proof". ;-)

Functional alternative to caching known "answers"

I think the best way to form this question is with an example...so, the actual reason I decided to ask about this is because of because of Problem 55 on Project Euler. In the problem, it asks to find the number of Lychrel numbers below 10,000. In an imperative language, I would get the list of numbers leading up to the final palindrome, and push those numbers to a list outside of my function. I would then check each incoming number to see if it was a part of that list, and if so, simply stop the test and conclude that the number is NOT a Lychrel number. I would do the same thing with non-lychrel numbers and their preceding numbers.
I've done this before and it has worked out nicely. However, it seems like a big hassle to actually implement this in Haskell without adding a bunch of extra arguments to my functions to hold the predecessors, and an absolute parent function to hold all of the numbers that I need to store.
I'm just wondering if there is some kind of tool that I'm missing here, or if there are any standards as a way to do this? I've read that Haskell kind of "naturally caches" (for example, if I wanted to define odd numbers as odds = filter odd [1..], I could refer to that whenever I wanted to, but it seems to get complicated when I need to dynamically add elements to a list.
Any suggestions on how to tackle this?
Thanks.
PS: I'm not asking for an answer to the Project Euler problem, I just want to get to know Haskell a bit better!
I believe you're looking for memoizing. There are a number of ways to do this. One fairly simple way is with the MemoTrie package. Alternatively if you know your input domain is a bounded set of numbers (e.g. [0,10000)) you can create an Array where the values are the results of your computation, and then you can just index into the array with your input. The Array approach won't work for you though because, even though your input numbers are below 10,000, subsequent iterations can trivially grow larger than 10,000.
That said, when I solved Problem 55 in Haskell, I didn't bother doing any memoization whatsoever. It turned out to just be fast enough to run (up to) 50 iterations on all input numbers. In fact, running that right now takes 0.2s to complete on my machine.

A reverse inference engine (find a random X for which foo(X) is true)

I am aware that languages like Prolog allow you to write things like the following:
mortal(X) :- man(X). % All men are mortal
man(socrates). % Socrates is a man
?- mortal(socrates). % Is Socrates mortal?
yes
What I want is something like this, but backwards. Suppose I have this:
mortal(X) :- man(X).
man(socrates).
man(plato).
man(aristotle).
I then ask it to give me a random X for which mortal(X) is true (thus it should give me one of 'socrates', 'plato', or 'aristotle' according to some random seed).
My questions are:
Does this sort of reverse inference have a name?
Are there any languages or libraries that support it?
EDIT
As somebody below pointed out, you can simply ask mortal(X) and it will return all X, from which you can simply pick a random one from the list. What if, however, that list would be very large, perhaps in the billions? Obviously in that case it wouldn't do to generate every possible result before picking one.
To see how this would be a practical problem, imagine a simple grammar that generated a random sentence of the form "adjective1 noun1 adverb transitive_verb adjective2 noun2". If the lists of adjectives, nouns, verbs, etc. are very large, you can see how the combinatorial explosion is a problem. If each list had 1000 words, you'd have 1000^6 possible sentences.
Instead of the deep-first search of Prolog, a randomized deep-first search strategy could be easyly implemented. All that is required is to randomize the program flow at choice points so that every time a disjunction is reached a random pole on the search tree (= prolog program) is selected instead of the first.
Though, note that this approach does not guarantees that all the solutions will be equally probable. To guarantee that, it is required to known in advance how many solutions will be generated by every pole to weight the randomization accordingly.
I've never used Prolog or anything similar, but judging by what Wikipedia says on the subject, asking
?- mortal(X).
should list everything for which mortal is true. After that, just pick one of the results.
So to answer your questions,
I'd go with "a query with a variable in it"
From what I can tell, Prolog itself should support it quite fine.
I dont think that you can calculate the nth solution directly but you can calculate the n first solutions (n randomly picked) and pick the last. Of course this would be problematic if n=10^(big_number)...
You could also do something like
mortal(ID,X) :- man(ID,X).
man(X):- random(1,4,ID), man(ID,X).
man(1,socrates).
man(2,plato).
man(3,aristotle).
but the problem is that if not every man was mortal, for example if only 1 out of 1000000 was mortal you would have to search a lot. It would be like searching for solutions for an equation by trying random numbers till you find one.
You could develop some sort of heuristic to find a solution close to the number but that may affect (negatively) the randomness.
I suspect that there is no way to do it more efficiently: you either have to calculate the set of solutions and pick one or pick one member of the superset of all solutions till you find one solution. But don't take my word for it xd

Resources