How would you implement a heuristic search in Haskell? - haskell

In Haskell or some other functional programming language, how would you implement a heuristic search?
Take as an example search space, the nine-puzzle, that is a 3x3 grid with 8 tiles and 1 hole, and you move tiles into the hole until you have correctly assembled a picture. The heuristic is the "Manhattan heuristic", which evaluates a board position adding up the distance each tile is from its target position, taking as the distance the number of squares horizontally plus the number of squares vertically each tile needs to be moved to get to the correct location.
I have been reading John Hughes paper on pretty printing as I know that pretty printer back-tracks to find better solutions. I am trying to understand how to generalise a heuristic search along these lines.
===
Note that my ultimate aim here is not to write a solver for the 9-puzzle, but to learn some general techniques for writing efficient heuristic searches in FP languages. I am also interested to learn if there is code that can be generalised and re-used across a wider class of such problems, rather than solving any specific problem.
For example, a search space can be characterised by a function that maps a State to a List of States together with some 'operation' that describes how one state is transitioned into another. There could also be a goal function, mapping a State to Bool, indicating when a goal State has been reached. And of course, the heuristic function mapping a State to a Number reflecting how well it is estimated to score. Other descriptions of the search are possible.

I don't think it's necessarily very specific to FP or Haskell (unless you utilize lists as "multiple possibility" monads, as in Learn You A Haskell For Great Good).
One way to do it would be by writing a recursive function taking the following:
the current state (that is the board configuration)
possibly some path metadata, e.g., the number of steps from the initial configuration (which is just the recursion depth), or a memoization-map of all the states already considered
possibly some decision, metadata, e.g., a pesudo-random number generator
Within each recursive call, the function would take the state, and check if it is the required result. If not it would
if it uses a memoization map, check if a choice was already considered
If it uses a recursive-step count, check whether to pursue the choices further
If it decides to recursively call itself on the possible choices emanating from this state (e.g., if there are different tiles which can be pushed into the hole), it could do so in the order based on the heuristic (or possibly pseudo-randomly based on the order based on the heuristic)
The function would return whether it succeeded, and, if they are used, updated versions of the memoization map and/or pseudo-random number generator.

Related

How would I construct an integer optimization model corresponding to a graph

Suppose we're given some sort of graph where the feasible region of our optimization problem is given. For example: here is an image
How would I go on about constructing these constraints in an integer optimization problem? Anyone got any tips? Thanks!
Mate, I agree with the others that you should be a little more specific than that paint-ish picture ;). In particular you are neither specifying any objective/objective direction nor are you giving any context, what about this graph should be integer-variable related, except for the existence of disjunctive feasible sets, which may be modeled by MIP-techniques. It seems like your problem is formalization of what you conceptualized. However, in case you are just being lazy and are just interested in modelling disjunctive regions, you should be looking into disjunctive programming techniques, such as "big-M" (Note: big-M reformulations can be problematic). You should be aiming at some convex-hull reformulation if you can attain one (fairly easily).
Back to your picture, it is quite clear that you have a problem in two real dimensions (let's say in R^2), where the constraints bounding the feasible set are linear (the lines making up the feasible polygons).
So you know that you have two dimensions and need two real continuous variables, say x[1] and x[2], to formulate each of your linear constraints (a[i,1]*x[1]+a[i,2]<=rhs[i] for some index i corresponding to the number of lines in your graph). Additionally your variables seem to be constrained to the first orthant so x[1]>=0 and x[2]>=0 should hold. Now, to add disjunctions you want some constraints that only hold when a certain condition is true. Therefore, you can add two binary decision variables, say y[1],y[2] and an additional constraint y[1]+y[2]=1, to tell that only one set of constraints can be active at the same time. You should be able to implement this with the help of big-M by reformulating the constraints as follows:
If you bound things from above with your line:
a[i,1]*x[1]+a[i,2]-rhs[i]<=M*(1-y[1]) if i corresponds to the one polygon,
a[i,1]*x[1]+a[i,2]-rhs[i]<=M*(1-y[2]) if i corresponds to the other polygon,
and if your line bounds things from below:
-M*(1-y[1])<=-a[i,1]*x[1]-a[i,2]+rhs[i] if i corresponds to the one polygon,
-M*(1-y[1])<=-a[i,1]*x[1]-a[i,2]+rhs[i] if i corresponds to the other polygon.
It is important that M is sufficiently large, but not too large to cause numerical issues.
That being said, I am by no means an expert on these disjunctive programming techniques, so feel free to chime in, add corrections or make things clearer.
Also, a more elaborate question typically yields more elaborate and satisfying answers ;) If you had gone to the effort of making up a true small example problem you likely would have gotten a full formulation of your problem or even an executable piece of code in no time.

Monotonicity and A*. Is it optimal?

Is A* an optimal search algorithm (ie. will it find the best solution) even if it is non-monotonic? Why or Why not?
A* is an optimal search algorithm as long as the heuristic is admissible.
But, if the heuristic is inconsistent, you will need to re-expand nodes to ensure optimality. (That is, if you find a shorter path to a node on the closed list, you need to update the g-cost and put it back on the open list.)
This re-expansion can introduce exponential overhead in state spaces with exponentially growing edge costs. There are variants of A* which reduce this to polynomial overhead by essentially interleaving a dijkstra search in between each A* node expansion.
This paper has an overview of recent work, and particularly citations of other work by Martelli and Mero who detail these worst-case graphs and suggest optimizations to improve A*.
A* is an optimal search algorithm if and only if the heuristic is both admissible and monotonic (also referred to as consistent). For a discussion of the distinction between these criteria, see this question. Effectively, the consistency requirement means that the heuristic cannot overestimate the distance between any pair of nodes in the graph being searched. This is the necessary because any overestimate could result in the search ignoring a good path in favor of a path that is actually worse.
Let's walk through an example of why this is true. Consider two nodes, A and B, with heuristic estimates of 5 and 6, respectively. If the heuristic is admissible and consistent, the shortest path that could possibly exist goes through A and is no shorter than 5. But what if the heuristic isn't consistent? In order for a heuristic to be admissible but inconsistent, it must always underestimate the distance to the goal but not result in there being a clear relationship between heuristic estimates at nodes within the graph. For a concrete example, see my answer to this question. In this example, the heuristic function randomly chooses between two other functions. If the heuristic estimates at A and B were not calculated based on the same function, we don't actually know which one of them currently has the potential to lead to a shorter path. In effect, we aren't using the same scale to measure them. So we could choose A when B was actually the better option. This might result in us finding the goal through a sub-optimal path.

efficient functional data structure for finite bijections

I'm looking for a functional data structure that represents finite bijections between two types, that is space-efficient and time-efficient.
For instance, I'd be happy if, considering a bijection f of size n:
extending f with a new pair of elements has complexity O(ln n)
querying f(x) or f^-1(x) has complexity O(ln n)
the internal representation for f is more space efficient than having 2 finite maps (representing f and its inverse)
I am aware of efficient representation of permutations, like this paper, but it does not seem to solve my problem.
Please have a look at my answer for a relatively similar question. The provided code can handle general NxM relations, but also be specialized to just bijections (just as you would for a binary search tree).
Pasting the answer here for completeness:
The simplest way is to use a pair of unidirectional maps. It has some cost, but you won't get much better (you could get a bit better using dedicated binary trees, but you have a huge complexity cost to pay if you have to implement it yourself). In essence, lookups will be just as fast, but addition and deletion will be twice as slow. Which isn't so bad for a logarithmic operation. Another advantage of this technique is that you can use specialized maps types for the key or value type if you have one available. You won't get as much flexibility with a specific generalist data structure.
A different solution is to use a quadtree (instead of considering a NxN relation as a pair of 1xN and Nx1 relations, you see it as a set of elements in the cartesian product (Key*Value) of your types, that is, a spatial plane), but it's not clear to me that the time and memory costs are better than with two maps. I suppose it needs to be tested.
Although it doesn't satisfy your third requirement, bimaps seem like the way to go. (They just make two finite maps, one in each direction, convenient to use.)

Represent Flowchart-specified Algorithms in Haskell

I'm confronted with the task of implementing algorithms (mostly business logic style) expressed as flowcharts. I'm aware that flowcharts are not the best algorithm representation due to its spaghetti-code property (would this be a use-case for CPS?), but I'm stuck with the specification expressed as flowcharts.
Although I could transform the flowcharts into more appropriate equivalent representations before implementing them, that could make it harder to "recognize" the orginal flow-chart in the resulting implementation, so I was hoping there is some way to directly represent flowchart-algorithms as (maybe monadic) EDSLs in Haskell, so that the semblance to the original flowchart-specification would be (more) obvious.
One possible representation of flowcharts is by using a group of mutually tail-recursive functions, by translating "go to step X" into "evaluate function X with state S". For improved readability, you can combine into a single function both the action (an external function that changes the state) and the chain of if/else or pattern matching that helps determine what step to take next.
This is assuming, of course, that your flowcharts are to be hardcoded (as opposed to loaded at runtime from an external source).
Sounds like Arrows would fit exactly what you describe. Either do a visualization of arrows (should be quite simple) or generate/transform arrow code from flow-graphs if you must.
Assuming there's "global" state within the flowchart, then that makes sense to package up into a state monad. At least then, unlike how you're doing it now, each call doesn't need any parameters, so can be read as a) modify state, b) conditional on current state, jump.

Converting graph to canonical string

I'm looking for a way of storing graphs as strings. The strings are to be used as keys in a map, so that two topologically identical graphs will map to the same value in the map. Does anybody know of such an algorithm?
The nodes of the tree are labeled with duplicate labels being allowed.
The program is in java and an implementation in that would be neat, but any pointers to possible algorithms are appreciated.
if you have an algorithm that maps general graphs to strings, and so that two graphs map to the same string if and only if they are topologically equivalent, then you have an algorithm for GRAPH AUTOMORPHISM. Graph automorphism has no known polynomial-time algorithms. So you can't have (easily :) a polynomial-time algorithm that calculates the strings as you postulate them, because otherwise you'd have constructed a previously unknown and very efficient algorithm to graph automorphism.
This doesn't mean that it wouldn't be possible to solve the problem for your class of graphs; it just means that for the class of all graphs it's kind of difficult.
You may find the following question relevant...
Using finite automata as keys to a container
Basically, an automaton can be minimised using well-known algorithms from automata-theory textbooks. Hopcrofts is an example. There is precisely one minimal automaton that is equivalent to any given automaton. However, that minimal automaton may be represented in different ways. Constructing a safe canonical form is basically a matter of renumbering nodes and ordering the adjacency table using information that is significant in defining the automaton, and not by information that is specific to the representation.
The basic principle should extend to general graphs. Whether you can minimise your graphs depends on their semantics, but the basic idea of renumbering the nodes and sorting the adjacency list still applies.
Other answers here assume things about your graphs - for example that the nodes have unique labels that can be ordered and which are significant for the semantics of your graphs, that can be used to identify the nodes in an adjacency matrix or list. This simply won't work if you're interested in morphims of unlabelled graphs, for instance. Different ways of numbering the nodes (and thus ordering the adjacency list) will result in different canonical forms for equivalent graphs that just happen to be represented differently.
As for the renumbering etc, an approach is to borrow and adapt principles from automata minimisation algorithms. Basically...
Create a vector of blocks (sets of nodes). Initially, populate this with one block per class of nodes (ie per distinct node annotation). The modification here is that we order these by annotation details (not by representation-specific node IDs).
For each class (annotation) of edges in order, evaluate each block. If each node in the block can follow the current edge-type to reach the same set of next blocks, leave it untouched. Otherwise, split it as necessary to get maximal blocks that achieve this objective. Keep these split blocks clustered together in the vector (preserve the existing ordering, just refine it a bit), and order the split blocks based on a suitable ordering of the next-block sets. For example, use bitvectors as long as the current vector of blocks, with a set bit for each block reachable by following the current edge type. To order the bitvectors, treat them as big integers.
EDIT - I forgot to mention - in the second bullet, as soon as you split a block, you restart with the first block in the vector and first edge annotation. Obviously, a naive implementation will be slow, so take the principle and use it to adapt Hopcrofts minimisation algorithm.
If you end up with blocks that have multiple nodes in them, those nodes are equivalent. Whether that means they can be merged or not depends on your semantics, but the relative ordering of nodes within each such block clearly doesn't matter.
If dealing with graphs that can be minimised (e.g. automaton digraphs) I suspect it's best to minimise first, though I still haven't got around to implementing this myself.
The key thing is, of course, ensuring that your renumbering is sensitive only to the significant details of the graph - its structure and annotations - and not the things that are only there so that you can construct a representation such as node IDs/addresses etc.
Once you have the blocks ordered, deriving a canonical form should be easy.
gSpan introduced the 'minimum DFS code' which encodes graphs such that if two graphs have the same code, they must be isomorphic. gSpan has implementations in C++ and Java.
A common way to do this is using Adjacency lists
Beside an Adjacency list, there are adjacency matrices. Which one you choose should depend on which you use to implement your Graph class (adjacency lists are usually the better choice, but they both have strengths and weaknesses). If you have a totally different implementation of Graph, consider using one of these, as it makes many graph algorithms very easy to implement.
One other option is, if possible, overriding hashCode() and equals() on the Graph class and use the actual graph object as the key rather than converting to a string.
E: overriding the hashCode() and equals() is the route I would take if some vertices are not uniquely labeled. As noted in the comments, this can be expensive, but I think it would depend on the implementation of the Graph class.
If equals() is too expensive, then you should use an adjacency list or matrix, but don't just use the node names. You have to carefully specify exactly what it is that identifies individual graphs and vertices (and therefore what would make them equal), and then make your string representation of the adjacency list use those properties instead of the node names. I'd suggest you write this specification of your graph equals operation down.

Resources