How to generate distinct solutions in Prolog for '8 out of 10 cats does countdown' numbers game solver?

I wrote a Prolog program to find all solutions to any '8 out of 10 cats does countdown' numbers round. I am happy with the result. However, the solutions are not unique. I tried distinct/2 and reduced/3 from the "solution sequences" library, but they did not produce unique solutions.
The problem is simple: you are given a list of six numbers [n1,n2,n3,n4,n5,n6] and a target number (R). Calculate R from any combination of n1 to n6 using only +, -, *, /. You do not have to use all the numbers, but each number may be used at most once. If two solutions are identical, only one should be generated and the other discarded.
Sometimes there are equivalent results with different arrangements, such as:
(100+3)*6*75/50+25
(100+3)*75*6/50+25  
Does anyone have any suggestions for eliminating such redundancy?
Each solution is a nested term of operators and integers, for example +(2,*(4,-(10,5))). Such a solution is an unbalanced binary tree with arithmetic operators at the root and internal nodes and numbers at the leaf nodes. For the solutions to be unique, no two trees should be equivalent.
The Code:
:- use_module(library(lists)).
:- use_module(library(solution_sequences)).
solve(L,R,OP) :-
    findnsols(10,OP,solve_(L,R,OP),S),
    print_solutions(S).

solve_(L,R,OP) :-
    distinct(find_op(L,OP)),
    R =:= OP.

find_op(L,OP) :-
    select(N1,L,Ln),
    select(N2,Ln,[]),
    N1 > N2,
    member(OP,[+(N1,N2), -(N1,N2), *(N1,N2), /(N1,N2), N1, N2]).
find_op(L,OP) :-
    select(N,L,Ln),
    find_op(Ln,OP_),
    OP_ > N,
    member(OP,[+(OP_,N), -(OP_,N), *(OP_,N), /(OP_,N), OP_]).

print_solutions([]).
print_solutions([A|B]) :-
    format('~w~n',A),
    print_solutions(B).
Test:
solve([25,50,75,100,6,3],952,X).
Result:
(100+3)*6*75/50+25 <- s1
((100+6)*3*75-50)/25 <- s2
(100+3)*75*6/50+25 <- s1
((100+6)*75*3-50)/25 <- s2
(100+3)*75/50*6+25 <- s1
true.
This code uses select/3 from the "lists" library.
UPDATE: Generating solutions using a DCG
The following is an attempt to generate solutions using a DCG. It produces a more exhaustive solution set than the code posted above; in a way, using a DCG resulted in more correct and elegant code. However, it is much harder to 'guess' what the code is doing.
The issue of redundant solutions still persists.
:- use_module(library(lists)).
:- use_module(library(solution_sequences)).
s(L) --> [L].
s(+(L,Ls)) --> [L], s(Ls).
s(*(L,Ls)) --> [L], s(Ls), {L =\= 1, Ls =\= 1, Ls =\= 0}.
s(-(L,Ls)) --> [L], s(Ls), {L =\= Ls, Ls =\= 0}.
s(/(L,Ls)) --> [L], s(Ls), {Ls =\= 1, Ls =\= 0}.
s(-(Ls,L)) --> [L], s(Ls), {L =\= Ls}.
s(/(Ls,L)) --> [L], s(Ls), {L =\= 1, Ls =\= 0}.

solution_list([N,H],S) :-
    phrase(s(S),[N,H]).
solution_list([N,H|T],S) :-
    phrase(s(S),[N,H|T])
    ;   solution_list([H|T],S).

solve(L,R,S) :-
    permutation(L,X),
    solution_list(X,S),
    R =:= S.

I suggest defining a sorting weight on each node (inner or leaf). The number resulting from reducing the child node could be used, although ties will appear. These can be broken by additionally looking at the topmost operations, sorting * before +, for example. Ideally one would like a sorting operation for which a "tie" means "exactly the same subtree of arithmetic operations".
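To make that concrete, here is a minimal sketch (Python tuples standing in for Prolog terms; all names are mine) that goes one small step beyond the per-node weight: it flattens maximal chains of a commutative operator and sorts the collected operands by their reduced value, with the printed structure as a crude tiebreaker standing in for the operator ordering described above:

def evaluate(t):
    # Reduce a tree to its numeric value; leaves are plain numbers.
    if isinstance(t, (int, float)):
        return t
    op, l, r = t
    a, b = evaluate(l), evaluate(r)
    return {'+': a + b, '-': a - b, '*': a * b, '/': a / b}[op]

def canonical(t):
    # Rebuild the tree so that equivalent trees map to one canonical form.
    if isinstance(t, (int, float)):
        return t
    op = t[0]
    l, r = canonical(t[1]), canonical(t[2])
    if op not in '+*':
        return (op, l, r)
    # Flatten the maximal chain of this commutative operator ...
    operands, stack = [], [l, r]
    while stack:
        s = stack.pop()
        if isinstance(s, tuple) and s[0] == op:
            stack.extend(s[1:])
        else:
            operands.append(s)
    # ... and sort by (reduced value, structure), so ties mean equal subtrees.
    operands.sort(key=lambda s: (evaluate(s), repr(s)))
    tree = operands[0]
    for s in operands[1:]:
        tree = (op, tree, s)
    return tree

With this, the two equivalent trees from the question, (100+3)*6*75/50+25 and (100+3)*75*6/50+25, normalize to the same tuple, so duplicates can be filtered through a set.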

Since the OP is only seeking hints to help solve the problem, here are a few.
Use a DCG as a generator. (SWI-Prolog) (Prolog DCG Primer)
a. For a more refined version of using DCGs as a generator, look for examples that use length/2. When you understand why, you might see a beam of light shining down on you for a few moments. (The light beam is a video-gaming thing.)
Use a constraint solver. (SWI-Prolog) (CLP(FD) and CLP(ℤ): Prolog Integer Arithmetic) (Understanding CLP(FD) Prolog code of N-queens problem)
Since your solutions are constrained to the 6 numbers and the operators are always binary (+,-,*,/), it is possible to enumerate the unique binary trees. If you know about OEIS, you can find related links there that will help you solve this problem, but you need to give OEIS a sequence. To get one, draw the trees for N from 2 to 5 and enter that sequence into OEIS, e.g.
N is the number of leaf (*) nodes.
N=2 (1 way to draw the tree)

      -
     / \
    *   *

N=3 (2 ways to draw the tree)

      -           -
     / \         / \
    -   *       *   -
   / \             / \
  *   *           *   *

So the sequence starts with 1, 2, ...
Hint: this page (link died) showed images of the trees, so you could check that you had drawn them correctly. In my description N counts the number of leaves (*), but on that page N counts the number of internal nodes (-). If we call my N N1 and the page's N N2, then the relation is N2 = N1 - 1.
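If you want to check your drawings programmatically, here is a small sketch (Python, names mine) that enumerates the shapes of full binary trees with N leaves; the counts it prints for N = 2..5 form exactly the sequence to feed into OEIS:

def tree_shapes(n):
    # A shape is '*' for a leaf, or a (left, right) pair for an internal node.
    if n == 1:
        return ['*']
    shapes = []
    for k in range(1, n):                     # k leaves go to the left subtree
        for left in tree_shapes(k):
            for right in tree_shapes(n - k):
                shapes.append((left, right))
    return shapes

for n in range(2, 6):
    print(n, len(tree_shapes(n)))             # counts: 1, 2, 5, 14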
This might be a Hamiltonian cycle problem (Wolfram MathWorld) (Hamiltonianicity of the Tower of Hanoi Problem). Remember that there is a relation between binary trees and the Tower of Hanoi, but in your case there are added constraints; I don't know whether those constraints rule out a Hamiltonian-cycle formulation.
Also, don't think of building the final answer from a combination of any number and operator; instead, build subsets of operators and numbers, and then use those subsets to build the answer. You constrain at the start, not at the end.
Or put another way: don't think combinations at the start, but permutations of combinations (not sure if that is the correct pattern, but it is in the ball park), and then use those to build the tree; see the sketch below.
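A hedged sketch of that ordering in Python (itertools does the combinatorial part; names mine): pick the subset of numbers first, then its arrangements, and only then fit operators and tree shapes over each arrangement:

from itertools import combinations, permutations

def candidate_sequences(numbers):
    # Constrain first: which numbers are used, then in what order.
    for size in range(2, len(numbers) + 1):
        for subset in combinations(numbers, size):
            for arrangement in set(permutations(subset)):
                yield arrangement

Each arrangement would then be paired with every tree shape (see tree_shapes above) and every assignment of operators to internal nodes.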

Related

TSP / CPP variant - subtour constraint

I'm developing an optimization problem that is a variant of the Traveling Salesman Problem. In this case you don't have to visit all the cities, there is a required start and end point, there are min and max bounds on the tour length, you may traverse each arc multiple times if you want, and the objective function is nonlinear in the arcs traversed (and the number of times you traverse each arc). The decision variables are integers: how many times you traverse each arc.
I developed a nonlinear integer program in Pyomo and am getting results from the NEOS server. However, I didn't put in subtour constraints, and my results contain two disconnected subtours.
I can find integer-programming formulations of the TSP that say how to formulate subtour constraints, but this problem is a little different from the standard TSP, and I'm trying to figure out how to start. Any help would be greatly appreciated.
EDIT: problem formulation
50 arcs, not exhaustive pairs between nodes. 50 decision variables N_ab, integer >= 0, each giving how many times you traverse from a to b. There is a length and a profit associated with each N_ab. Two constraints keep the sum of length_ab * N_ab over all ab between a min and a max distance. I have a constraint that the sum of N_ab into each node equals the sum of N_ab out of that node, so you can either not visit a node at all or visit it multiple times. The objective function is nonlinear and related to the interaction between pairs of arcs (not relevant to subtours).
Subtours: looking at math.uwaterloo.ca/tsp/methods/opt/subtour.htm, that formulation isn't applicable, since I am not required to visit all cities and may not be able to. For example, say I have 20 nodes and 50 arcs (all arcs of length 10), and the distance constraints require a tour of exactly length 30; then I can visit at most three nodes (start at A -> B -> C -> A = length 30) and will not visit the other nodes at all. TSP subtour elimination would require edges from the subgroup ABC to the subgroup of unvisited nodes, which isn't needed for my problem.
Here is an approach that is adapted from the prize-collecting TSP (e.g., this paper). Let V be the set of all nodes. I am assuming V includes a depot node, call it node 1, that must be on the tour. (If not, you can probably add a dummy node that serves this role.)
Let x[i] be a decision variable that equals 1 if we visit node i at least once, and 0 otherwise. (You might already have such a decision variable in your model.)
Add these constraints, which define x[i]:
x[i] <= sum {j in V} N[i,j] for all i in V
M * x[i] >= N[i,j] for all i, j in V
In other words: x[i] cannot equal 1 if there are no edges coming out of node i, and x[i] must equal 1 if there are any edges coming out of node i.
(Here, N[i,j] is the number of times we traverse the arc from i to j, and M is a sufficiently large number, e.g. the maximum number of times you can traverse one arc.)
Here is the subtour-elimination constraint, defined for all subsets S of V such that S includes node 1, and for all nodes i in V \ S:
sum {j in S} (N[i,j] + N[j,i]) >= 2 * x[i]
In other words, if we visit node i, which is not in S, then there must be at least two edges into or out of S. (A subtour would violate this constraint for S equal to the nodes that are on the subtour that contains 1.)
We also need a constraint requiring node 1 to be on the tour:
x[1] = 1
I might be playing a little fast and loose with the directional indices, i.e., I'm not sure if your model sets N[i,j] = N[j,i] or something like that, but hopefully the idea is clear enough and you can modify my approach as necessary.
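Since the question mentions Pyomo, here is a hedged sketch of the two linking constraints above; it assumes model.V (nodes), model.A (arcs as (i, j) pairs), and the integer variables model.N already exist, and every name here is a hypothetical stand-in for whatever the real model uses:

from pyomo.environ import Var, Constraint, Binary

def add_visit_linking(model, M):
    model.x = Var(model.V, domain=Binary)    # x[i] = 1 iff node i is visited

    # x[i] <= sum_j N[i,j]: a node with no outgoing traversals is not visited
    model.x_ub = Constraint(model.V, rule=lambda m, i:
        m.x[i] <= sum(m.N[i, j] for j in m.V if (i, j) in m.A))

    # M * x[i] >= N[i,j]: any used outgoing arc forces x[i] = 1
    model.x_lb = Constraint(model.A, rule=lambda m, i, j:
        M * m.x[i] >= m.N[i, j])

The exponential family of subtour-elimination constraints would then typically be added lazily: solve, look for a violated subset S, add its constraint, and repeat.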

Solving math with integers larger than any available integer data type

In some programming competitions where the numbers are larger than any available integer data type, we often use strings instead.
Question 1:
Given these large numbers, how to calculate e and f in the below expression?
(a/b) + (c/d) = e/f
note: GCD(e,f) = 1, i.e. they must be in minimised form. For example {e,f} = {1,2} rather than {2,4}.
Also, all a,b,c,d are large numbers known to us.
Question 2:
Can someone also suggest a way to find the GCD of two big numbers (bigger than any available integer type)?
I would suggest using full bytes or words rather than strings.
It is relatively easy to think in base 256 instead of base 10, and a lot more efficient for the processor, which no longer has to multiply and divide by 10 all the time. Ideally, choose a word size that is half the processor's natural word size, as that makes carries easy to implement. Of course, thinking in base 64K or 4G is slightly more complex, but even better than base 256.
The only downside is converting the initial big numbers from the ASCII input, which you get for free in base 10. With a larger word size you can make this more efficient by packing a number of digits into a single word up front (e.g. 9 digits at a time for base 4G), then performing a long multiply of that single word into the correct offset in your large-integer format.
A compromise might be to run your engine in base 1 billion: this will still be 9 times (for addition) or 81 times (for schoolbook multiplication) more efficient than using base 10!
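To illustrate, here is a sketch of base-10^9 limbs with addition (Python as pseudocode for a language without native big integers; names mine):

BASE = 10**9    # base 1 billion: 9 decimal digits per limb

def from_decimal(s):
    # Consume 9 ASCII digits per limb, least-significant limb first.
    limbs = []
    while s:
        limbs.append(int(s[-9:]))
        s = s[:-9]
    return limbs

def add(a, b):
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = carry
        total += a[i] if i < len(a) else 0
        total += b[i] if i < len(b) else 0
        result.append(total % BASE)
        carry = total // BASE
    if carry:
        result.append(carry)
    return result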
The simplest way to solve this equation is to multiply a/b by d/d and c/d by b/b, so that both fractions have the common denominator b*d.
I think you will then need to prime factorise your big numbers e and f to find any common factors. Remember to search again for the same factor squared.
Of course, that means you have to write a prime generating sieve. You only need to generate factors up to the square root, or half the digits of the min value of e and f.
You could prime factorise b and d to get a lower initial denominator, but you will need to do it again anyway after the addition.
I think that the way to solve this is to separate the problem:
Process the input numbers as arrays of characters (i.e. std::string)
Make a class whose objects store one of the large numbers in an std::list (or similar) and that can do the needed arithmetic on your data
You can then solve your problems normally, without having to worry about your large inputs causing overflow.
Here's a webpage that explains how you can write such an arithmetic class (with sample code in C++ showing addition).
Once you have such an arithmetic class, you no longer need to worry about how to store the data or any overflow.
I get the impression that you already know how to find the GCD when you don't have overflow issues, but just in case, here's an explanation of finding the GCD (with C++ sample code).
As for the specific math problem:
// given formula: a/b + c/d = e/f
// = ( ( a*d + b*c ) / ( b*d ) )
// Define some variables here to save on copying
// (I assume that your class that holds the
// large numbers is called "ARITHMETIC")
ARITHMETIC numerator = a*d + b*c;
ARITHMETIC denominator = b*d;
ARITHMETIC gcd = GCD( numerator , denominator );
// because we know that GCD(e,f) is 1, this implies:
ARITHMETIC e = numerator / gcd;
ARITHMETIC f = denominator / gcd;
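For Question 2, the usual alternative to factorising is Euclid's algorithm, which needs only comparison and remainder (or repeated subtraction), operations a big-number class must support anyway. A sketch, with Python ints standing in for the ARITHMETIC type:

def gcd(a, b):
    # Euclid: gcd(a, b) = gcd(b, a mod b); stops when the remainder is 0.
    while b:
        a, b = b, a % b
    return a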

Simultaneous Subset sums

I am dealing with a problem which is a variant of a subset-sum problem, and I am hoping that the additional constraint could make it easier to solve than the classical subset-sum problem. I have searched for a problem with this constraint but I have been unable to find a good example with an appropriate algorithm either on StackOverflow or through googling elsewhere.
The problem:
Assume you have two lists of positive numbers A1,A2,A3,... and B1,B2,B3,... with the same number of elements N, and two sums Sa and Sb. The problem is to find a simultaneous set Q where |sum(A{Q}) - Sa| <= epsilon and |sum(B{Q}) - Sb| <= epsilon. So, if Q is {1, 5, 7}, then A1 + A5 + A7 - Sa <= epsilon and B1 + B5 + B7 - Sb <= epsilon. Epsilon is an arbitrarily small positive constant.
Now, I could solve this as two completely separate subset-sum problems, but removing the simultaneity constraint admits erroneous solutions (where Qa != Qb). I also suspect that the additional constraint should make this problem easier than two NP-complete problems. I would like to solve instances with 18+ elements in both lists, and most subset-sum algorithms have a long run time at that size. I have investigated the pseudo-polynomial dynamic programming algorithm, but it has the problems that (a) its speed relies on the numbers having a small bit-depth (which does not necessarily hold for my instance) and (b) it does not take the simultaneity constraint into account.
Any advice on how to use the simultaneity constraint to reduce the run time? Is there a dynamic programming approach I could use to take into account this constraint?
If I understand your description of the problem correctly (I'm confused about why you put absolute-value bars around "sum (A{Q}) - Sa" and "sum (B{Q}) - Sb"; it doesn't seem to fit the rest of the explanation), then it is NP-hard.
You can see this by making a reduction from Subset sum (SUB) to Simultaneous subset sum (SIMSUB).
If you have a SUB instance consisting of a set X = {x1,x2,...,xn} and a target t, and you have an algorithm that solves SIMSUB when given two sets A = {a1,a2,...,an} and B = {b1,b2,...,bn}, two integers Sa and Sb, and a value for epsilon, then we can solve SUB like this:
Let A = X and let B be a set of length n consisting of only 0's. Set Sa = t, Sb = 0 and epsilon = 0. You can now run the SIMSUB algorithm on this problem and get the solution to your SUB problem.
This shows that SIMSUB is at least as hard as SUB and is therefore NP-hard.
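For what it's worth, the pseudo-polynomial DP mentioned in the question does extend to the simultaneous case by tracking both sums at once. A sketch (names mine; runtime O(N * Sa * Sb), so it only helps when the target sums, and not just N, are small):

def simultaneous_subset_sum(A, B, Sa, Sb, eps=0):
    # Map each reachable (sum_a, sum_b) pair to one subset achieving it.
    reachable = {(0, 0): []}
    for i, (a, b) in enumerate(zip(A, B)):
        # Snapshot the items so element i is used at most once.
        for (sa, sb), subset in list(reachable.items()):
            key = (sa + a, sb + b)
            if key not in reachable:
                reachable[key] = subset + [i]
    for (sa, sb), subset in reachable.items():
        if abs(sa - Sa) <= eps and abs(sb - Sb) <= eps:
            return subset
    return None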

Longest repeated substring with at least k occurrences correctness

The algorithm for finding the longest repeated substring is formulated as follows:
1) build the suffix tree
2) find the deepest internal node with at least k leaf descendants
But I cannot understand why this works; what makes this algorithm correct? Also, the source where I found this algorithm says that it finds the repeated substring in O(n), where n is the length of the substring; this is also not clear to me! Consider the following tree: here the longest repeated substring is "ru", and if we apply DFS it will find it in 5 steps, not 2.
Can you explain this stuff to me?
Thanks
[image: the suffix tree referenced in the question]
I suppose you know perfectly well that O(n) (Big-O notation) refers to the order of growth of some quantity as a function of n, and not the equivalence of the quantity with n.
I write this because, reading the question, I was in doubt.
I'm writing this as an answer and not a comment since it's a bit too long for a comment (I suppose...).
Given a string S of N characters, building the corresponding suffix tree is O(N) (using an algorithm such as Ukkonen's).
Now, such a suffix tree can have at most 2N - 1 nodes (root and leaves included).
If you traverse your tree and compute the number of leaves reachable from a given node along with its depth, you'll find the desired result. To do so, you start from the root and explore each of its children.
Some pseudo-code:
traverse(node, depth):
    nb_leaves <-- 0
    if empty(children(node)):
        nb_leaves <-- 1
    else:
        for child in children(node):
            nb_leaves <-- nb_leaves + traverse(child, depth+1)
    node.setdepth(depth)
    node.setoccurrences(nb_leaves)
    return nb_leaves
The initial call is traverse(root, 0). Since the structure is a tree, there is exactly one call to traverse per node, so there are at most 2N - 1 calls and the overall traversal is only O(N). Now you just have to keep track of the node with the maximum depth that also satisfies depth > 0 && nb_leaves >= k, by adding the relevant bookkeeping; this does not hinder the overall complexity.
In the end, the complexity of the algorithm to find such a substring is O(N), where N is the length of the input string (and not the length of the matching substring!).
Note: The traversal described above is basically a DFS on the suffix tree.
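A sketch of that bookkeeping, assuming nodes carry the depth and occurrences fields set by traverse above (the node type and field names are hypothetical):

def deepest_repeated(root, k):
    # Find the deepest internal node with at least k leaves below it.
    best, stack = None, [root]
    while stack:
        node = stack.pop()
        if node.children:                          # internal nodes only
            if node.depth > 0 and node.occurrences >= k:
                if best is None or node.depth > best.depth:
                    best = node
            stack.extend(node.children)
    return best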

Finding the minimum number of swaps to convert one string to another, where the strings may have repeated characters

I was looking through a programming question when the following question suddenly seemed related.
How do you convert one string to another using as few swaps as possible? The strings are guaranteed to be interconvertible (they have the same multiset of characters; this is given), but the characters can be repeated. I saw web results on the same question, though without repeated characters.
Any two characters in the string can be swapped.
For instance : "aabbccdd" can be converted to "ddbbccaa" in two swaps, and "abcc" can be converted to "accb" in one swap.
Thanks!
This is an expanded and corrected version of Subhasis's answer.
Formally, the problem is: given an n-letter alphabet V and two m-letter words, x and y, for which there exists a permutation p such that p(x) = y, determine the least number of swaps (permutations that fix all but two elements) whose composition q satisfies q(x) = y. Assuming that m-letter words are maps from the set {1, ..., m} to V and that p and q are permutations on {1, ..., m}, the action p(x) is defined as the composition p followed by x.
The least number of swaps whose composition is p can be expressed in terms of the cycle decomposition of p. When j_1, ..., j_k are pairwise distinct in {1, ..., m}, the cycle (j_1 ... j_k) is a permutation that maps j_i to j_{i+1} for i in {1, ..., k-1}, maps j_k to j_1, and maps every other element to itself. The permutation p is the composition of every distinct cycle (j p(j) p(p(j)) ... j'), where j is arbitrary and p(j') = j. The order of composition does not matter, since each element appears in exactly one of the composed cycles. A k-element cycle (j_1 ... j_k) can be written as the product (j_1 j_k) (j_1 j_{k-1}) ... (j_1 j_2) of k - 1 swaps. In general, every permutation can be written as a composition of m minus (the number of cycles in its cycle decomposition) swaps, and a straightforward induction proof shows that this is optimal.
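A sketch of that count when the permutation p with p(x) = y is known (as it is when all letters are distinct): the minimum number of swaps is m minus the number of cycles. With repeated letters, the whole difficulty, as discussed below, is choosing the p with the most cycles.

def min_swaps(p):
    # p[j] = position in x of the letter that y wants at position j.
    seen = [False] * len(p)
    cycles = 0
    for i in range(len(p)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = p[j]
    return len(p) - cycles

For the question's example, "abcc" -> "accb" with p = [0, 3, 2, 1] gives 4 - 3 = 1 swap.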
Now we get to the heart of Subhasis's answer. Instances of the asker's problem correspond one-to-one with Eulerian digraphs (for every vertex, in-degree equals out-degree) G with vertex set V and m arcs labeled 1, ..., m. For j in {1, ..., m}, the arc labeled j goes from y(j) to x(j). The problem in terms of G is to determine the greatest number of parts a partition of the arcs of G into directed cycles can have. (Since G is Eulerian, such a partition always exists.) This is because the permutations q such that q(x) = y are in one-to-one correspondence with the partitions, as follows: for each cycle (j_1 ... j_k) of q, there is a part whose directed cycle is comprised of the arcs labeled j_1, ..., j_k.
The problem with Subhasis's NP-hardness reduction is that arc-disjoint cycle packing on Eulerian digraphs is a special case of arc-disjoint cycle packing on general digraphs, so an NP-hardness result for the latter has no direct implications for the complexity status of the former. In very recent work (see the citation below), however, it has been shown that, indeed, even the Eulerian special case is NP-hard. Thus, by the correspondence above, the asker's problem is as well.
As Subhasis hints, this problem can be solved in polynomial time when n, the size of the alphabet, is fixed (it is fixed-parameter tractable). Since there are O(n!) distinguishable cycles when the arcs are unlabeled, we can use dynamic programming on a state space of size O(m^n), the number of distinguishable subgraphs. In practice, that might be sufficient for (let's say) a binary alphabet, but if I were to try to solve this problem exactly on instances with large alphabets, then I would likely try branch and bound, obtaining bounds by using linear programming with column generation to pack cycles fractionally.
@article{DBLP:journals/corr/GutinJSW14,
  author    = {Gregory Gutin and Mark Jones and Bin Sheng and Magnus Wahlstr{\"o}m},
  title     = {Parameterized Directed $k$-Chinese Postman Problem and $k$ Arc-Disjoint Cycles Problem on Euler Digraphs},
  journal   = {CoRR},
  volume    = {abs/1402.2137},
  year      = {2014},
  ee        = {http://arxiv.org/abs/1402.2137},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
You can construct the "difference" strings S and S', i.e. strings which contain the characters at the differing positions of the two input strings; e.g. for acbacb and abcabc they are cbcb and bcbc. Say they contain n characters.
You can now construct a "permutation graph" G which has n nodes and an edge from i to j if S[i] == S'[j]. In the case of all-unique characters, it is easy to see that the required number of swaps is (n - number of cycles in G), which can be found in O(n) time.
However, in the case where characters are duplicated any number of times, this reduces to the problem of finding the largest number of cycles in a directed graph, which, I think, is NP-hard (e.g. see http://www.math.ucsd.edu/~jverstra/dcig.pdf).
In that paper a few greedy algorithms are pointed out, one of which is particularly simple:
At each step, find the minimum length cycle in the graph (e.g. Find cycle of shortest length in a directed graph with positive weights )
Delete it
Repeat until all vertices have been covered.
However, there may be efficient algorithms utilizing the properties of your case (the only one I can think of is that your graphs will be K-partite, where K is the number of unique characters in S). Good luck!
Edit:
Please refer to David's answer for a fuller and correct explanation of the problem.
Do an A* search (see http://en.wikipedia.org/wiki/A-star_search_algorithm for an explanation) for the shortest path through the graph of equivalent strings from one string to the other, using the Levenshtein distance / 2 as the cost heuristic.
