several walkers walking on a grid: How to organize the threads? - multithreading

My algorithm is processing DEMs. a DEM (Digital Elevation Model) is a representations of ground topography where elevation is known at grid nodes.
My problem can be summarized as follows:
Q is a queue containing nodes to visit.
at start, the boundary of the grid is pushed in Q.
while Q is not empty, do
remove Node N from the top of Q
if N was never visited then do
consider the 8 neighbors of N
among them select the unvisited ones
among them select those with a higher elevation than N's
push these at Q's tail
mark N as visited
done
done
As described, the algorithm will mark as 'visited' every node that can be reached from the border by a continuously ascendant path. It is worth noticing that the order of processing the nodes in the queue is unimportant. Note also that some points may request a tortuous ascendant path to be reached from the border. Think for example to a cone with a furrow spiraling around it. The ridge of the furrow is such a unique and tortuous path capable of reaching the top of the cone without never descending into the furrow.
Anyway, I want to mutithread this algorithm. I am still in the first step of wondering which is the best organization of data and threads in order to have as least pain as possible at debugging the beast when it is written.
My first thought is to divide the grid into tiles and split the Queue in as many tiles as there is in the grid. The tiles are piled in a work-list. A few threads are parsing the work-list and grab any tile where something can be done at the moment.
Working on a specific tile will firstly need that the tile's queue is not empty. I may also need that the neighboring tiles can be locked if the walker's tile has to visit a node at the edge of the tile.
I am thinking that when a walker cannot lock a neighboring tile while it needs to, then it can skip to the next node in the local queue, or even the thread itself can release the tile to the work-list and seek for another tile to work on.
My actual experience of multi-thread programming is good enough to understand that this lovely description is very likely to turn into a nightmare when I will debug it. However I am not experienced enough to evaluate the various possibilities of programming the algorithm and make a good decision, having in mind that I will not be given a month to debug a spaghetti dish.
Thanks for reading :)

Related

Should the Monte Carlo tree in calculating the previous bestMove be used to feed the next Monte Carlo search?

I have seen some MCTS implementation online and how they are used in a game.
A best move is calculated each move based on the state at that moment.
If you have a sequence of moves in a game between human and computer like:
turn_h1,turn_c1,turn_h2,turn_c2,turn_h3,turn_c3,....turn_hn,turn_cn
turn_h(i)=human, turn_c(i)=computer and i the i-th move of a player (human/computer).
And for each computer's turn i there is a corresponding state that is used to determine the i-th best move with MCTS.
Question: Should the tree built in the (i-1)-th turn(bestmove) be used for the i-th turn(MCTS bestmove)?
I mean, should the tree which was the result of the best move in state (n-1) be used as input for determining the best move at the i-th state?
Other words can I re-use already constructed tree-nodes from previous turns/bestmoves calculations, so that I do not need to build the whole tree again?
I have created a sequence of turns in pseudo-code just to make clear what what I mean with using the (i-1)th state(tree) to feed the next MCST bestmove. (of course in real world the logic below would be implemented as an iteration/loop construct):
#start game
initial_game_state.board= initialize_board()
#turn 1
#human play
new_game_state_1 = initial_game_state.board.make_move(move_1)
#computer play
move_1 = MCTS.determine_bestmove(new_game_state_1)
new_game_state_2 = game_state_1.board.make_move(move_1)
#turn 2
#human play
new_game_state_3 = new_game_state_2.board.make_move(move_2)
#computer play
move_3 = MCTS.determine_bestmove(new_game_state_3)
new_game_state_4 = new_game_state_4.board.makeMove(move_3)
#turn 3
# ....
Yes you can do this. This is commonly referred to as "tree reuse" (at least, that's how I usually call it).
You would start out your MCTS call (except for the very first one, in which there is no "previous tree" yet) by navigating from the root node to the node that corresponds to the one you have actually reached in the "real" game.
Note that, in a two-player alternating-move game, this does not only involve a move that your MCTS agent made, but also a move made by the opponent. Due to how MCTS work, if the opponent "surprised" your MCTS agent by selecting a move that MCTS didn't predict, it is likely that this leads to a subtree of the previous tree that had relatively few visits. In this case, tree reuse won't have much effect. But in cases where the opponent doesn't surprise you, and plays exactly what MCTS already predicted during the previous search, you may end up getting a relatively large subtree to initialise your new search with.
As for if you "should" do this, as is the literal wording in your question... you don't have to. There are many MCTS implementations out there which don't do this. I'd generally recommend it anyway. It's not too difficult to implement. It generally won't give a big boost in performance (because the playing strength of MCTS tends to scale sub-linearly with increases in "thinking time"), but it definitely shouldn't hurt either, and may give a small boost in playing strength.
Note that, in nondeterministic games, if you implement an "open-loop" variant of MCTS (without explicit chance nodes), the part of the subtree that you're "re-using" will be partially based on outdated information. In such games, it may be beneficial to discount all the statistics gathered in your previous search (i.e. multiply all your visit counts and accumulated scores by a number between 0 and 1) before starting the new search process.
Important implementation detail: when re-using the previous tree, if your new root node (which used to be a node in the middle of your previous tree) has a reference/pointer back to its parent node, make sure to set it to null. If you forget about this, all search trees of all your previous searches will fully persist in memory throughout an entire game, and you'll likely run out of memory quickly.

How Might I organize vertex data in WebGL for a frame-by-frame (very specific) animated program?

I have been working on an animated graphics project with very specific requirements, and after quite a bit of searching and test coding, I have figured that I could take several approaches, but the Khronos and MDN documentation I have been reading coupled with other posts I have seen here don't answer all of my questions regarding my particular project. In the meantime, I have written short test programs (setting infrastructure for testing).
Firstly, I should describe the project:
The main object drawn to the screen is a simple quad surrounded by a black outline (LINE_LOOP or LINES will do, probably, though I have had issues with z-fighting...that will be left for another question). When the user interacts with the program, exactly one new quad is created and immediately drawn, but for a set amount of time its vertices move around until the quad moves to its final destination. (Note that translations won't do.) Random black lines are also drawn, and sometimes those lines also move around.
Once one of the quads reaches its final spot, it never moves again.
A new quad is always atop old quads (closer to the screen). That means that I need to layer the quads and lines from oldest to newest.
*this also means that it would probably be best to assign z-values to each quad and line, even if the graphics are in pixel coordinates and use an orthographic matrix. Would everyone agree with this?
Given these parameters, I have a few options with varying levels of complexity:
1> Take the object-oriented approach and just assign a buffer to each quad, and the same goes for the random lines. --creation and destruction of buffers every frame for the one shape that is moving. I truthfully think that this is a terrible idea that might only work in a higher level library that does heavy optimization underneath. This approach also doesn't take advantage of the fact that almost every quad will stay the same.
[vertices0] ... , [verticesN]
Draw x N (many draws for many small-size buffers)
2> Assign a z-value to each quad, outline, and line (as mentioned above). Allocate a huge vertex buffer and element buffer to store all permanently-in-their-final-positions quads. Resize only in the very unlikely case someone interacts for long enough. Create a second tiny buffer to store the one temporary moving quad and use bufferSubData every frame. When the quad reaches its destination, bufferSubData it into the large buffer and overwrite the small buffer upon creation of the next quad...all on the same frame. The main questions I have here are: is it possible (safe?) to use bufferSubData and draw it on the same frame? Also, would I use DYNAMIC_DRAW on both buffers even though the larger one would see fewer updates?
[permanent vertices ... | uninitialized (keep a count)]
bufferSubData -> [tempVerticesForOneQuad]
Draw 2x
3> Still create the large and small buffers, but instead of using bufferSubData every frame, create a second shader program and add an attribute for the new/moving quad that explicitly sets the vertex positions for the animation (I would pass vertex index attributes). Only draw with the small buffer when the quad is moving. For the frame when the quad reaches its destination, draw both large and small buffer, but then bufferSubData the final coordinates into the large permanent buffer to be used in the next frame.
switchToShaderProgramA();
[permanent vertices...| uninitialized (keep a count)]
switchToShaderProgramB();
[temp vertices] <- shader B accepts indices for each vertex so we can do all animation in the vertex shader
---last frame of movement arrives : bufferSubData into the permanent vertices buffer for when the the next quad is created
I get the sense that the third option might be the best, but I would like to learn whether there are some other factors that I did not consider. For example, my assumption that a program switch, additional attributes, and vertex shader manipulation would be faster than just substituting the buffer values as in 2>. The advantage of approach 3> (I think) is that I can defer the buffer substitution to a time when nothing needs to be drawn.
Still, I am still not sure of how to work with the randomly-appearing lines. I can't take the "single quad vertex buffer" approach since the number of lines cannot be predicted. Might I also allocate a large buffer for the moving lines? Those also stay after the quad is finished moving, though I don't think that I could use the vertex shader trick because there would be too many attributes to set (as opposed to the 4 for the one quad). I suppose that I could create a large "permanent line data" buffer first, but what to do during the animation is tricky because the lines move. Maybe bufferSubData() + draw on the same frame is not terrible? Or it could be. This is where I need advise.
I understand that this question might not be too specific code-wise, but I don't believe that I would be allowed to show the core of the program. All I have is the typical WebGL boilerplate ready.
I am looking forward to hearing people's thoughts on how I might proceed and whether there are any trade-offs I might have missed when considering the three options above.
Thank you in advance, and please feel free to ask any additional questions if clarification is necessary.
Honestly, for what you're describing, it doesn't sound to me like it matters which you choose. On modern hardware, drawing a few hundred quads and a few thousand lines each frame would not really tax the hardware much.
Having said that, I agree that approach 1 seems very inefficient. Approach 2 sounds perfectly fine. You can safely draw a buffer on the same frame that you uploaded the data. I don't think it matters much whether you use DYNAMIC_DRAW or STATIC_DRAW for the buffer. I tend to think of dynamic buffers as being something you're updating every frame. If you only update it every few seconds or less, then static is fine. Approach 3 is also fine. Between 2 and 3, I'd say do whichever is easier for you to understand and program.
Likewise, for the lines, I would use a separate buffer. It sounds like that one changes per frame, so I would use DYNAMIC_DRAW for that. Allocating a single large buffer for it and performing a glBufferSubData() per frame is probably a fine strategy. As always, trying it and profiling it will tell you for sure.

How does Monte Carlo Search Tree work?

Trying to learn MCST using YouTube videos and papers like this one.
http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf
However I am not having much of a luck understanding the details beyond the high level theoretical explanations. Here are some quotes from the paper above and questions I have.
Selection Phase: MCTS iteratively selects the highest scoring child node of the current state. If the current state is the root node, where did these children come from in the first place? Wouldn't you have a tree with just a single root node to begin with? With just a single root node, do you get into Expansion and Simulation phase right away?
If MCTS selects the highest scoring child node in Selection phase, you never explore other children or possibly even a brand new child whilst going down the levels of the tree?
How does the Expansion phase happen for a node? In the diagram above, why did it not choose leaf node but decided to add a sibling to the leaf node?
During the Simulation phase, stochastic policy is used to select legal moves for both players until the game terminates. Is this stochastic policy a hard-coded behavior and you are basically rolling a dice in the simulation to choose one of the possible moves taking turns between each player until the end?
The way I understand this is you start at a single root node and by repeating the above phases you construct the tree to a certain depth. Then you choose the child with the best score at the second level as your next move. The size of the tree you are willing to construct is basically your hard AI responsiveness requirement right? Since while the tree is being constructed the game will stall and compute this tree.
Selection Phase: MCTS iteratively selects the highest scoring child node of the current state. If the current state is the root node, where did these children come from in the first place? Wouldn't you have a tree with just a single root node to begin with? With just a single root node, do you get into Expansion and Simulation phase right away?
The selection step is typically implemented not to actually choose among nodes which really exist in the tree (having been created through the Expansion step). It is typically ipmlemented to choose among all possible successor states of the game state matching your current node.
So, at the very beginning, when you have just a root node, you'll want your Selection step to still be able to select one out of all the possible successor game states (even if they don't have matching nodes in the tree yet). Typically you'll want a very high score (infinite, or some very large constant) for game states which have never been visited yet (which don't have nodes in the tree yet). This way, your Selection Step will always randomly select among any states that don't have a matching node yet, and only really use the exploration vs. exploitation trade-off in cases where all possible game states already have a matching node in the tree.
If MCTS selects the highest scoring child node in Selection phase, you never explore other children or possibly even a brand new child whilst going down the levels of the tree?
The ''score'' used by the Selection step should typically not just be the average of all outcomes of simulations going through that node. It should typically be a score consisting of two parts; an "exploration" part, which is high for nodes that have been visited relatively infrequently, and an "exploitation" part, which is high for nodes which appear to be good moves so far (where many simulations going through that node ended in a win for the player who's allowed to choose a move to make). This is described in Section 3.4 of the paper you linked. The W(s, a) / N(s, a) is the exploitation part (simply average score), and the B(s, a) is the exploration part.
How does the Expansion phase happen for a node? In the diagram above, why did it not choose leaf node but decided to add a sibling to the leaf node?
The Expansion step is typically implemented to simply add a node corresponding to the final game state selected by the Selection Step (following what I answered to your first question, the Selection Step will always end in selecting one game state that has never been selected before).
During the Simulation phase, stochastic policy is used to select legal moves for both players until the game terminates. Is this stochastic policy a hard-coded behavior and you are basically rolling a dice in the simulation to choose one of the possible moves taking turns between each player until the end?
The most straightforward (and probably most common) implementation is indeed to play completely at random. It is also possible to do this differently though. You could for example use heuristics to create a bias towards certain actions. Typically, completely random play is faster, allowing you to run more simulations in the same amount of processing time. However, it typically also means every individual simulation is less informative, meaning you actually need to run more simulations for MCTS to play well.
The way I understand this is you start at a single root node and by repeating the above phases you construct the tree to a certain depth. Then you choose the child with the best score at the second level as your next move. The size of the tree you are willing to construct is basically your hard AI responsiveness requirement right? Since while the tree is being constructed the game will stall and compute this tree.
MCTS does not uniformly explore all parts of the tree to the same depth. It has a tendency to explore parts which appear to be interesting (strong moves) deeper than parts which appear to be uninteresting (weak moves). So, typically you wouldn't really use a depth limit. Instead, you would use a time limit (for example, keep running iterations until you've spent 1 second, or 5 seconds, or 1 minute, or whatever amount of processing time you allow), or an iteration count limit (for example, allow it to run 10K or 50K or any number of simulations you like).
Basically, Monte Carlo is : try randomly many times(*) and then keep the move that led to the best outcome most of the times.
(*) : the number of times and the depth depends on the speed of the decision you want to acheive.
So the root node is always the current game state with immediate children being your possible moves.
If you can do 2 moves (yes/no, left/right,...) then you have 2 sub-nodes.
If you cannot do any moves (it may happen depending on the game) then you do not have any decision to make, then Montec Carlo is useless for this move.
If you have X possible moves (chess game) then each possible move is a direct child node.
Then, (in a 2 player game), evey level is alternating "your moves", "opponent moves" and so on.
How to traverse the tree should be random (uniform).
Your move 1 (random move of sub-level 1)
His move 4 (random move of sub-level 2)
Your move 3 (random move of sub-level 3) -> win yay
Pick a reference maximum depth and evaluate how many times you win or lose (or have a sot of evaluation function if the game is not finished after X depth).
You repeat the operation Y times (being quite large) and you select the immediate child node (aka: your move) that leads to you winning most of the times.
This is to evaluate which move you should do now. After this, the opponent moves and it is your turn again. So you have to re-create a tree with the root node being the new current situation and redo the Monte Carlo technique to guess what is your best possible move. And so on.

TicTacToe strategic reduction

I decided to write a small program that solves TicTacToe in order to try out the effect of some pruning techniques on a trivial game. The full game tree using minimax to solve it only ends up with 549,946 possible games. With alpha-beta pruning, the number of states required to evaluate was reduced to 18,297. Then I applied a transposition table that brings the number down to 2,592. Now I want to see how low that number can go.
The next enhancement I want to apply is a strategic reduction. The basic idea is to combine states that have equivalent strategic value. For instance, on the first move, if X plays first, there is nothing strategically different (assuming your opponent plays optimally) about choosing one corner instead of another. In the same situation, the same is true of the center of the walls of the board, and the center is also significant. By reducing to significant states only, you end up with only 3 states for evaluation on the first move instead of 9. This technique should be very useful since it prunes states near the top of the game tree. This idea came from the GameShrink method created by a group at CMU, only I am trying to avoid writing the general form, and just doing what is needed to apply the technique to TicTacToe.
In order to achieve this, I modified my hash function (for the transposition table) to enumerate all strategically equivalent positions (using rotation and flipping functions), and to only return the lowest of the values for each board. Unfortunately now my program thinks X can force a win in 5 moves from an empty board when going first. After a long debugging session, it became apparent to me the program was always returning the move for the lowest strategically significant move (I store the last move in the transposition table as part of my state). Is there a better way I can go about adding this feature, or a simple method for determining the correct move applicable to the current situation with what I have already done?
My gut feeling is that you are using too big of a hammer to attack this problem. Each of the 9 spots can only have one of two labels: X or O or empty. You have then at most 3^9 = 19,683 unique boards. Since there are 3 equivalent reflections for every board, you really only have 3^9 / 4 ~ 5k boards. You can reduce this by throwing out invalid boards (if they have a row of X's AND a row of O's simultaneously).
So with a compact representation, you would need less than 10kb of memory to enumerate everything. I would evaluate and store the entire game graph in memory.
We can label every single board with its true minimax value, by computing the minimax values bottom up instead of top down (as in your tree search method). Here's a general outline: We compute the minimax values for all unique boards and label them all first, before the game starts. To make the minimax move, you simply look at the boards succeeding your current state, and pick the move with the best minimax value.
Here's how to perform the initial labeling. Generate all valid unique boards, throwing out reflections. Now we start labeling the boards with the most moves (9), and iterating down to the boards with least moves (0). Label any endgame boards with wins, losses, and draws. For any non-endgame boards where it's X's turn to move: 1) if there exists a successor board that's a win for X, label this board a win; 2) if in successor boards there are no wins but there exists a draw, then label this board a draw; 3) if in successor boards there are no wins and no draws then label this board a loss. The logic is similar when labeling for O's turn.
As far as implementation goes, because of the small size of the state space I would code the "if there exists" logic just as a simple loop over all 5k states. But if you really wanted to tweak this for asymptotic running time, you would construct a directed graph of which board states lead to which other board states, and perform the minimax labeling by traversing in the reverse direction of the edges.
Out of curiosity, I wrote a program to build a full transposition table to play the game without any additional logic. Taking the 8 symmetries into account, and assuming computer (X) starts and plays deterministic, then only 49 table entries are needed!
1 entry for empty board
5 entries for 2 pieces
21 entries for 4 pieces
18 entries for 6 pieces
4 entries for 8 pieces
You're on the right track when you're thinking about reflections and rotations. However, you're applying it to the wrong place. Don't add it to your transposition table or your transposition table code -- put it inside the move generation function, to eliminate logically equivalent states from the get-go.
Keep your transposition table and associated code as small and as efficient as possible.
You need to return the (reverse) transposition along with the lowest value position. That way you can apply the reverse transposition to the prospective moves in order to get the next position.
Why do you need to make the transposition table mutable? The best move does not depend on the history.
There is a lot that can be said about this, but I will just give one tip here which will reduce your tree size: Matt Ginsberg developed a method called Partition Search which does equivalency reductions on the board. It worked well in Bridge, and he uses tic-tac-toe as an example.
You may want to try to solve tic-tac-toe using monte-carlo simulation. If one (or both) of the players is a machine player, it could simply use the following steps (this idea comes from one of the mini-projects in the coursera course Principles of Computing 1 which is a part of the Specialization Fundamentals of Computing, taught by RICE university.):
Each of the machine players should use a Monte Carlo simulation to choose the next move from a given TicTacToe board position. The general idea is to play a collection of games with random moves starting from the position, and then use the results of these games to compute a good move.
When a paritular machine player wins one of these random games, it wants to favor the squares in which it played (in hope of choosing a winning move) and avoid the squares in which the opponent played. Conversely, when it loses one of these random games, it wants to favor the squares in which the opponent played (to block its opponent) and avoid the squares in which it played.
In short, squares in which the winning player played in these random games should be favored over squares in which the losing player played. Both the players in this case will be the machine players.
The following animation shows a game played between 2 machine players (that ended in a tie), using 10 MC trials at each board state to determine the next move.
It shows how each of the machine players learns to play the game just by using Monte-Carlo Simulation with 10 trials (a small number of trials) at every state of the board, the scores shown at the right bottom of each grid square are used by each of the players at their corresponding turns, to choose its next move (the brighter cells represent better moves for the current player, as per the simulation results).
Here is my blog on this for more details.

Visit all nodes in a graph with least repeat visits

I have a tile based map where several tiles are walls and others are walkable. the walkable tiles make up a graph I would like to use in path planning. My question is are their any good algorithms for finding a path which visits every node in the graph, minimising repeat visits?
For example:
map example http://img220.imageshack.us/img220/3488/mapq.png
If the bottom yellow tile is the starting point, the best path to visit all tiles with least repeats is:
path example http://img222.imageshack.us/img222/7773/mapd.png
There are two repeat visits in this path. A worse path would be to take a left at the first junction, then backtrack over three already visited tiles.
I don't care about the end node but the start node is important.
Thanks.
Edit:
I added pictures to my question but cannot see them when viewing it. here they are:
http://img220.imageshack.us/img220/3488/mapq.png
http://img222.imageshack.us/img222/7773/mapd.png
Additionally, in the graphs I need this for there will never be a situation where min repeats = 0. That is, to step on every tile in the map the player must cross his own path at least once.
Your wording is bad -- it allows a reduction to an NP-complete problem. If you could minimize repeat visits, then could you push them to 0 and then you would have a Hamiltonian Cycle. Which is solvable, but hard.
This sounds like it could be mapped onto the traveling salesman problem ... and so likely ends up being NP complete and no efficient deterministic algorithm is known.
Finding a path is fairly straight forward -- find a (or the minimum) spanning subtree and then do a depth/breadth-first traversal. Finding the optimal route is the really difficult bit.
You could use one of the dynamic optimization techniques to try and converge on a fairly good solution.
Unless there is some attribute of the minimum spanning subtree that could be used to generate the best path ... but I don't remember enough graph theory for that.

Resources