Related
I have been playing around with a simple n-queens model in MiniZinc:
include "globals.mzn";
int: n_queens = 8;
array[1..n_queens] of var 1..n_queens: queens;
constraint alldifferent(queens);
constraint alldifferent(i in 1..n_queens) (queens[i] + i);
constraint alldifferent(i in 1..n_queens) (queens[i] - i);
solve satisfy;
The MiniZinc handbook mentions failures as the "number of leaf nodes that were failed". Following are the statistics after running the model:
%%%mzn-stat: initTime=0.000576
%%%mzn-stat: solveTime=0.000822
%%%mzn-stat: solutions=1
%%%mzn-stat: variables=24
%%%mzn-stat: propagators=19
%%%mzn-stat: propagations=1415
%%%mzn-stat: nodes=47
%%%mzn-stat: failures=22
%%%mzn-stat: restarts=0
%%%mzn-stat: peakDepth=5
%%%mzn-stat-end
There were 22 failures. Being a beginner to constraint programming, my understanding was that the entire purpose of the paradigm is to prune and avoid leaf nodes as much as possible. I am extra confused since the peak depth of the search tree is reported as 5 (not 8).
Am I interpreting these statistics right? If yes, why are there leaf node failures in the model? Will I create a better model by trying to reduce these failures?
Those values depend of the search strategy, some times you cannot avoid a leaf node because it hasn't been pruned, that means, nothing before it told the solver that that node was going to be a failure, modeling it in a different way can prevent some failures, and can also prevent suboptimal solutions in the case of optimization problems.
These are the first three nodes that got evaluated on the search tree of the default search strategy of minizinc, I labeled them in the image of the Search Tree in the order they got evaluated, and the 4 and 5 to show the arrival to a feasible solution.
In the the blue dots are nodes where there is still uncertainty, the red squares are failures, white dots are non evaluated nodes, large triangles are whole branches where the search only resulted in failures, the green diamond means a feasible solution, and orange diamonds mean non-best-but-feasible solution (only in optimization problems).
The explanation of each of the labeled nodes is
0: Root node: all variables are un assigned
Nothing has happened, these are all the decision variables and their full domains
queens = array1d(1..8, [[1..8], [1..8], [1..8], [1..8], [1..8], [1..8], [1..8], [1..8]]);
1: First decision
Then it picked the smallest value in the domain of the last variable and made the first split, the solver thought either queens[8] = 1(left child of the root) or queens[8] = [2..8](right child of the root), it will first evaluate queens[8] = 1 and that bring the first node to existence,
queens = array1d(1..8, [[2..7], {2..6,8}, {2..5,7..8}, {2..4,6..8}, {2..3,5..8}, {2,4..8}, [3..8], 1]);
where the decision queens[8] = 1 already propagated to the other variables and removed values from its domains.
2: The search continues
Then it again splits at queens[7], this is the left child node where queens[7] = 3, the minimum value in the domain of that variable, and the propagation of that decision to the other variables.
queens = array1d(1..8, [{2,4..7}, {2,4..6}, {2,4..5,8}, {2,4,7..8}, {2,6..8}, [5..8], 3, 1]);
In hindsight (more like cheating by looking at the image of the Search Tree) we know that this whole branch of the search will result in failures, but we cannot know that while searching, because there is still uncertainty in some variables, to know that we would have to evaluate all of the possibilities, which are possibly feasible, that might happen or not, hopefully we will find a satisfying solution before that, but before carry on with the search, notice that already some some pruning got done in the form of nodes that won't exist, for example queens[4] can only take the values 2,4,7,8 at this point, and we haven't made any decision on it, its just the solver eliminating values from the variable that it knows will certainly result in failures, if we where making a brute force search this variable would have the same domain as in the root node [1..8] because we haven't made a decision on it yet, so we are making a smarter search by propagating the constraints.
3: First Failure: but we carry on
Carrying on with the same strategy it makes a split for queens[6], this time the minimum value queens[6] = 5, when propagating to the undecided variables, but there is no solution that satisfies all the constraints (here it gave the value 8 to two queens), so this is a dead end and must backtrack.
queens = array1d(1..8, [7, 2, 4, 8, 8, 5, 3, 1]); ---> Failure
So the very first three nodes of the search lead to a failure.
The search continues like that, since the choice for queens[6] = 5 caused a failure it goes to the next value queens[6] = [6..8], that search also results in the failures that are encircled in red in the image of the Search Tree.
As you can probably guess by now the search strategy is something like go in the order of the variables and split the domain of the variables by picking the smallest value available and put the rest of the domain in another node, this in minizinc search annotations are called input_order and indomain_min.
Now we fast forward the search to the node labeled 4.
4: Prelude to a solution: are we there yet?
Here you can see that queens[8] = 1 (remains the same), queens[7] = 5 while in the node 2 it was queens[7] = 3, that means that all the possibilities where queens[8] = 1 and queens[7] = [3..4] where either evaluated or pruned, but all lead to failures.
queens = array1d(1..8, [{2,4,6..7}, {2..3,6}, {2..4,7}, {3..4,7}, {2,6}, 8, 5, 1]);
Then this node spited into queens[6] = 2 (left child) which lead to more failures and queens[6] = 6 (right child)
5: We struck gold: a feasible Solution !!
queens[2] = 6 propagated, and the result satisfied all the constraints, so we have a solution and we stop the search.
queens = array1d(1..8, [4, 2, 7, 3, 6, 8, 5, 1]);
Pruning
Arriving to the solution only required 47 nodes of the gigantic Whole Search Tree, the area inside the blue line is the search tree is the Search Tree where nodes labeled 0,1,2,3,4,5 are, it is gigantic even pruned for this relatively small instance of 8 decision variables of cardinality 8 with a global constraint which certainly reduces the span of the search tree by a lot since it communicates the domains of the variables between each other much more effectively than the constraint store of the solver. The whole search tree only has 723 nodes in total (nodes and leafs) where only 362 are leafs, while the brute force search could generate all the possible 8^8 leaf nodes directly (again, it might not, but it could), thats a search space of 16.777.216 possibilities (its like 8 octal digits since its 8 variables with cardinality of domain 8), it is a big saving when you compare it, of the 16.777.216 to the solver only 362 made some sense, and 92 where feasible, its less than 0.0001% of the combinations of the whole search space you would face by, for example, generating at random a solution by generating 8 random digits in the range [1..8] and evaluating its feasibility afterwards, talk about a needle in a haystack.
Pruning basically means to reduce the search space, anything better than the evaluation of ALL the combinations, even by removing one single possibility is considered a pruned search space. Since this was a satisfaction problem rather than an optimization one, the pruning is just to remove unfeasible values from the domain of the variables.
In the optimization problems there are two types of pruning, the satisfaction pruning like before, eliminating imposible solutions, and the pruning done by the bounds of the objective function, when the bounds of the objective function can be determined before all the variables reached a value and be it is determined to be "worst" than the current "best" value found so far (i.e. in a minimization optimization the smallest value the objective could take in a branch is larger than the smallest value found so far in a feasible solution) you can prune that branch, which surely contains feasible (but not as good) solutions as well as unfeasible solutions, and save some work, also you still have to prune or evaluate all the tree if you want to find the optimal solution and prove that it is optimal.
To explore search trees like the ones of the images you can run your code with the gecode-gist solver in the minizinc IDE, or use minizinc --Solver gecode-gist <modelFile> <dataFile> in the command line, upon double clicking on one of the nodes you will see the state of the decision variables just like the ones in this post.
And even further use solve :: int_search( pos, varChoise, valChoise, complete) satisfy; to test this different search strategies
% variable selections:
ann : varChoise
% = input_order
% = first_fail
% = smallest
% = largest
;
% value selections:
ann : valChoise
% = indomain_min
% = indomain_max
% = indomain_median
% = indomain_random
% = indomain_split
% = indomain_reverse_split
;
just paste this in you model and uncomment one varChoise annotation and one valChoise to test that combination of variable selection and value selection, and see if one strategy finds the solution with less failures, less nodes, or less propagations. You can read more about them in the minizinc documentation.
This is a first run-in with not only bitwise ops in python, but also strange (to me) syntax.
for i in range(2**len(set_)//2):
parts = [set(), set()]
for item in set_:
parts[i&1].add(item)
i >>= 1
For context, set_ is just a list of 4 letters.
There's a bit to unpack here. First, I've never seen [set(), set()]. I must be using the wrong keywords, as I couldn't find it in the docs. It looks like it creates a matrix in pythontutor, but I cannot say for certain. Second, while parts[i&1] is a slicing operation, I'm not entirely sure why a bitwise operation is required. For example, 0&1 should be 1 and 1&1 should be 0 (carry the one), so binary 10 (or 2 in decimal)? Finally, the last bitwise operation is completely bewildering. I believe a right shift is the same as dividing by two (I hope), but why i>>=1? I don't know how to interpret that. Any guidance would be sincerely appreciated.
[set(), set()] creates a list consisting of two empty sets.
0&1 is 0, 1&1 is 1. There is no carry in bitwise operations. parts[i&1] therefore refers to the first set when i is even, the second when i is odd.
i >>= 1 shifts right by one bit (which is indeed the same as dividing by two), then assigns the result back to i. It's the same basic concept as using i += 1 to increment a variable.
The effect of the inner loop is to partition the elements of _set into two subsets, based on the bits of i. If the limit in the outer loop had been simply 2 ** len(_set), the code would generate every possible such partitioning. But since that limit was divided by two, only half of the possible partitions get generated - I couldn't guess what the point of that might be, without more context.
I've never seen [set(), set()]
This isn't anything interesting, just a list with two new sets in it. So you have seen it, because it's not new syntax. Just a list and constructors.
parts[i&1]
This tests the least significant bit of i and selects either parts[0] (if the lsb was 0) or parts[1] (if the lsb was 1). Nothing fancy like slicing, just plain old indexing into a list. The thing you get out is a set, .add(item) does the obvious thing: adds something to whichever set was selected.
but why i>>=1? I don't know how to interpret that
Take the bits in i and move them one position to the right, dropping the old lsb, and keeping the sign. Sort of like this
Except of course that in Python you have arbitrary-precision integers, so it's however long it needs to be instead of 8 bits.
For positive numbers, the part about copying the sign is irrelevant.
You can think of right shift by 1 as a flooring division by 2 (this is different from truncation, negative numbers are rounded towards negative infinity, eg -1 >> 1 = -1), but that interpretation is usually more complicated to reason about.
Anyway, the way it is used here is just a way to loop through the bits of i, testing them one by one from low to high, but instead of changing which bit it tests it moves the bit it wants to test into the same position every time.
I am trying to draw a DAG for Longest Increasing Subsequence {3,2,6,4,5,1} but cannot break this into a DAG structure.
Is it possible to represent this in a tree like structure?
As far as I know, the answer to the actual question in the title is, "No, not all DP programs can be reduced to DAGs."
Reducing a DP to a DAG is one of my favorite tricks, and when it works, it often gives me key insights into the problem, so I find it always worth trying. But I have encountered some that seem to require at least hypergraphs, and this paper and related research seems to bear that out.
This might be an appropriate question for the CS Stack Exchange, meaning the abstract question about graph reduction, not the specific question about longest increasing subsequence.
Assuming following Sequence, S = {3,2,6,4,5,1,7,8} and R = root node. Your tree or DAG will look like
R
3 2 4 1
6 5 7
8
And your result is the longest path (from root to the node with the maximum depth) in the tree (result = {r,1,7,8}).
The result above show the longest increasing sequence in S. The Tree for the longest increasing subsequence in S look as follows
R
3 2 6 4 5 1 7 8
6 4 7 5 7 7 8
7 5 8 7 8 8
8 7 8
8
And again the result is the longest path (from root to the node with the maximum depth) in the tree (result = {r,2,4,5,7,8}).
The answer to this question should be YES.
I'd like to cite the following from here: The Soul of Dynamic Programming
Formulations and Implementations.
A DP must have a corresponding DAG (most of the time implicit), otherwise we cannot find a valid order for computation.
For your case, Longest Increasing Subsequence can be represented as some DAG like the following:
The task is amount to finding the longest path in that DAG. For more information please refer to section 6.2 of Algorithms, Dynamic programming.
Yes, It is possible to represent longest Increasing DP Problem as DAG.
The solution is to find the longest path ( a path that contains maximum nodes) from every node to the last node possible for that particular node.
Here, S is the starting node, E is the ending node and C is count of nodes between S and E.
S E C
3 5 3
2 5 3
6 6 1
4 5 2
5 5 1
1 1 1
so the answer is 3 and it is very easy to generate solution as we have to traverse the nodes only.
I think it might help you.
Reference: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-20-dynamic-programming-ii-text-justification-blackjack/
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Why does the indexing start with zero in 'C'?
Why do prevailing programming languages like C use array starting from 0? I know some programming languages like PASCAL have arrays starting from 1. Are there any good reasons for doing so? Or is it merely a historical reason?
Because you access array elements by offset relative to the beginning of the array.
First element is at offset 0.
Later more complex array data structures appeared (such as SAFEARRAY) that allowed arbitrary lower bound.
In C, the name of an array is essentially a pointer, a reference to a memory location, and so the expression array[n] refers to a memory location n-elements away from the starting element. This means that the index is used as an offset. The first element of the array is exactly contained in the memory location that array refers (0 elements away), so it should be denoted as array[0]. Most programming languages have been designed this way, so indexing from 0 is pretty much inherent to the language.
However, Dijkstra explains why we should index from 0. This is a problem on how to denote a subsequence of natural numbers, say for example 1,2,3,...,10. We have four solutions available:
a. 0 < i < 11
b. 1<= i < 11
c. 0 < i <= 10
d. 1 <= i <= 10
Dijkstra argues that the proper notation should be able to denote naturally the two following cases:
The subsequence includes the smallest natural number, 0
The subsequence is empty
Requirement 1. leaves out a. and c. since they would have the form -1 < i which uses a number not lying in the natural number set (Dijkstra says this is ugly). So we are left with b. and d. Now requirement 2. leaves out d. since for a set including 0 that is shrunk to the empty one, d. takes the form 0 <= i <= -1, which is a little messed up! Subtracting the ranges in b. we also get the sequence length, which is another plus. Hence we are left with b. which is by far the most widely used notation in programming now.
Now you know. So, remember and take pride in the fact that each time you write something like
for( i=0; i<N; i++ ) {
sum += a[i];
}
you are not just following the rules of language notation. You are also promoting mathematical beauty!
here
In assembly and C, arrays was implemented as memory pointers. There the first element was stored at offset 0 from the pointer.
In C arrays are tied to pointers. Array index is a number that you add to the pointer to the array's initial element. This is tied to one of the addressing modes of PDP-11, where you could specify a base address, and place an offset to it in a register to simulate an array. By the way, this is the same place from which ++ and -- came from: PDP-11 provided so-called auto-increment and auto-decrement addressing modes.
P.S. I think Pascal used 1 by default; generally, you were allowed to specify the range of your array explicitly, so you could start it at -10 and end at +20 if you wanted.
Suppose you can store only two bits. That gives you four combinations:
00 10 01 11 Now, assign integers to those 4 values. Two reasonable mappings are:
00->0
01->1
10->2
11->3
and
11->-2
10->-1
00->0
01->1
(Another idea is to use signed magnitude and use the mapping:
11->-1 10->-0 00->+0 01->+1)
It simply does not make sense to use 00 to represent 1 and use 11 to represent 4. Counting from 0 is natural. Counting from 1 is not.
I've been using J for a few months now, and I find that reading unfamiliar code (e.g. that I didn't write myself) is one of the most challenging aspects of the language, particularly when it's in tacit. After a while, I came up with this strategy:
1) Copy the code segment into a word document
2) Take each operator from (1) and place it on a separate line, so that it reads vertically
3) Replace each operator with its verbal description in the Vocabulary page
4) Do a rough translation from J syntax into English grammar
5) Use the translation to identify conceptually related components and separate them with line breaks
6) Write a description of what each component from (5) is supposed to do, in plain English prose
7) Write a description of what the whole program is supposed to do, based on (6)
8) Write an explanation of why the code from (1) can be said to represent the design concept from (7).
Although I learn a lot from this process, I find it to be rather arduous and time-consuming -- especially if someone designed their program using a concept I never encountered before. So I wonder: do other people in the J community have favorite ways to figure out obscure code? If so, what are the advantages and disadvantages of these methods?
EDIT:
An example of the sort of code I would need to break down is the following:
binconv =: +/# ((|.#(2^i.###])) * ]) # ((3&#.)^:_1)
I wrote this one myself, so I happen to know that it takes a numerical input, reinterprets it as a ternary array and interprets the result as the representation of a number in base-2 with at most one duplication. (e.g., binconv 5 = (3^1)+2*(3^0) -> 1 2 -> (2^1)+2*(2^0) = 4.) But if I had stumbled upon it without any prior history or documentation, figuring out that this is what it does would be a nontrivial exercise.
Just wanted to add to Jordan's Answer : if you don't have box display turned on, you can format things this way explicitly with 5!:2
f =. <.#-:##{/:~
5!:2 < 'f'
┌───────────────┬─┬──────┐
│┌─────────┬─┬─┐│{│┌──┬─┐│
││┌──┬─┬──┐│#│#││ ││/:│~││
│││<.│#│-:││ │ ││ │└──┴─┘│
││└──┴─┴──┘│ │ ││ │ │
│└─────────┴─┴─┘│ │ │
└───────────────┴─┴──────┘
There's also a tree display:
5!:4 <'f'
┌─ <.
┌─ # ─┴─ -:
┌─ # ─┴─ #
──┼─ {
└─ ~ ─── /:
See the vocabulary page for 5!: Representation and also 9!: Global Parameters for changing the default.
Also, for what it's worth, my own approach to reading J has been to retype the expression by hand, building it up from right to left, and looking up the pieces as I go, and using identity functions to form temporary trains when I need to.
So for example:
/:~ i.5
0 1 2 3 4
NB. That didn't tell me anything
/:~ 'hello'
ehllo
NB. Okay, so it sorts. Let's try it as a train:
[ { /:~ 'hello'
┌─────┐
│ehllo│
└─────┘
NB. Whoops. I meant a train:
([ { /:~) 'hello'
|domain error
| ([{/:~)'hello'
NB. Not helpful, but the dictionary says
NB. "{" ("From") wants a number on the left.
(0: { /:~) 'hello'
e
(1: { /:~) 'hello'
h
NB. Okay, it's selecting an item from the sorted list.
NB. So f is taking the ( <. # -: # # )th item, whatever that means...
<. -: # 'hello'
2
NB. ??!?....No idea. Let's look up the words in the dictionary.
NB. Okay, so it's the floor (<.) of half (-:) the length (#)
NB. So the whole phrase selects an item halfway through the list.
NB. Let's test to make sure.
f 'radar' NB. should return 'd'
d
NB. Yay!
addendum:
NB. just to be clear:
f 'drara' NB. should also return 'd' because it sorts first
d
Try breaking the verb up into its components first, and then see what they do. And rather than always referring to the vocab, you could simply try out a component on data to see what it does, and see if you can figure it out. To see the structure of the verb, it helps to know what parts of speech you're looking at, and how to identify basic constructions like forks (and of course, in larger tacit constructions, separate by parentheses). Simply typing the verb into the ijx window and pressing enter will break down the structure too, and probably help.
Consider the following simple example: <.#-:##{/:~
I know that <. -: # { and /: are all verbs, ~ is an adverb, and # is a conjunction (see the parts of speech link in the vocab). Therefore I can see that this is a fork structure with left verb <.#-:## , right verb /:~ , and dyad { . This takes some practice to see, but there is an easier way, let J show you the structure by typing it into the ijx window and pressing enter:
<.#-:##{/:~
+---------------+-+------+
|+---------+-+-+|{|+--+-+|
||+--+-+--+|#|#|| ||/:|~||
|||<.|#|-:|| | || |+--+-+|
||+--+-+--+| | || | |
|+---------+-+-+| | |
+---------------+-+------+
Here you can see the structure of the verb (or, you will be able to after you get used to looking at these). Then, if you can't identify the pieces, play with them to see what they do.
10?20
15 10 18 7 17 12 19 16 4 2
/:~ 10?20
1 4 6 7 8 10 11 15 17 19
<.#-:## 10?20
5
You can break them down further and experiment as needed to figure them out (this little example is a median verb).
J packs a lot of code into a few characters and big tacit verbs can look very intimidating, even to experienced users. Experimenting will be quicker than your documenting method, and you can really learn a lot about J by trying to break down large complex verbs. I think I'd recommend focusing on trying to see the grammatical structure and then figure out the pieces, building it up step by step (since that's how you'll eventually be writing tacit verbs).
(I'm putting this in the answer section instead of editing the question because the question looks long enough as it is.)
I just found an excellent paper on the jsoftware website that works well in combination with Jordan's answer and the method I described in the question. The author makes some pertinent observations:
1) A verb modified by an adverb is a verb.
2) A train of more than three consecutive verbs is a series of forks, which may have a single verb or a hook at the far left-hand side depending on how many verbs there are.
This speeds up the process of translating a tacit expression into English, since it lets you group verbs and adverbs into conceptual units and then use the nested fork structure to quickly determine whether an instance of an operator is monadic or dyadic. Here's an example of a translation I did using the refined method:
d28=: [:+/\{.#],>:#[#(}.-}:)#]%>:#[
[: +/\
{.#] ,
>:#[ #
(}.-}:)#] %
>:#[
cap (plus infix prefix)
(head atop right argument) ravel
(increment atop left argument) tally
(behead minus curtail) atop right
argument
divided by
increment atop left argument
the partial sums of the sequence
defined by
the first item of the right argument,
raveled together with
(one plus the left argument) copies
of
(all but the first element) minus
(all but the last element)
of the right argument, divided by
(one plus the left argument).
the partial sums of the sequence
defined by
starting with the same initial point,
and appending consecutive copies of
points derived from the right argument by
subtracting each predecessor from its
successor
and dividing the result by the number
of copies to be made
Interpolating x-many values between
the items of y
I just want to talk about how I read:
<.#-:##{/:~
First off, I knew that if it was a function, from the command line, it had to be entered (for testing) as
(<.#-:##{/:~)
Now I looked at the stuff in the parenthesis. I saw a /:~, which returns a sorted list of its arguments, { which selects an item from a list, # which returns the number of items in a list, -: half, and <., floor...and I started to think that it might be median, - half of the number of items in the list rounded down, but how did # get its arguments? I looked at the # signs - and realized that there were three verbs there - so this is a fork. The list comes in at the right and is sorted, then at the left, the fork got the list to the # to get the number of arguments, and then we knew it took the floor of half of that. So now we have the execution sequence:
sort, and pass the output to the middle verb as the right argument.
Take the floor of half of the number of elements in the list, and that becomes the left argument of the middle verb.
Do the middle verb.
That is my approach. I agree that sometimes the phrases have too many odd things, and you need to look them up, but I am always figuring this stuff out at the J instant command line.
Personally, I think of J code in terms of what it does -- if I do not have any example arguments, I rapidly get lost. If I do have examples, it's usually easy for me to see what a sub-expression is doing.
And, when it gets hard, that means I need to look up a word in the dictionary, or possibly study its grammar.
Reading through the prescriptions here, I get the idea that this is not too different from how other people work with the language.
Maybe we should call this 'Test Driven Comprehension'?