Where/how is branch prediction data stored? - branch-prediction

I have always wondered where and how the prediction data is stored. Is there a limit? Is it only recent branches? I am mostly concerned with Intel architectures, but anything I can learn about any architecture is appreciated.

Somewhere internally in the processor. What exactly is done depends on the processor.
In a very simple case, you might take 4096 bits of branch prediction data. Then for every branch, you take the last 12 bits of the branch's address, which gives 4096 different values, and use that as the index into your branch prediction data. And since you have only one bit per entry, you just store whether the branch was last taken.
The advantage is that it is very cheap. The disadvantage is that two branches whose addresses are a multiple of 4096 bytes apart use the same entry in the table. So if your code executes two such branches all the time, and one is always taken and one is never taken, the branch prediction is quite bad.
Some processors use two bits per branch, meaning "strongly taken", "weakly taken", "weakly not taken", "strongly not taken". Every time a branch is taken, the prediction moves towards "strongly taken"; if the branch is not taken, it moves towards "strongly not taken". This works better if branches are usually taken, with rare exceptions.
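As a rough illustration of such a table, here is a toy Python sketch of the bookkeeping (not of how the hardware implements it):
# Toy model of a table of 2-bit saturating counters, indexed by the low
# 12 bits of the branch address. Counter values: 0 = strongly not taken,
# 1 = weakly not taken, 2 = weakly taken, 3 = strongly taken.
TABLE_BITS = 12
table = [1] * (1 << TABLE_BITS)

def predict(branch_address):
    return table[branch_address & ((1 << TABLE_BITS) - 1)] >= 2

def update(branch_address, taken):
    i = branch_address & ((1 << TABLE_BITS) - 1)
    if taken:
        table[i] = min(3, table[i] + 1)   # move towards "strongly taken"
    else:
        table[i] = max(0, table[i] - 1)   # move towards "strongly not taken"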
Some processors don't just use the last 12 (or more) bits of the branch address; they also mix in whether, say, the last four branches were taken. Say you have this code:
if (x >= 0) { ... }
if (x <= 0) { ... }
and x is rarely 0, but quite randomly positive or negative. Then the first branch is hard to predict, but the second is never taken after the first one is taken, and always taken if the first one is not taken. By mixing in this information, you use up two entries in the branch prediction table for the second branch, but the prediction for the second branch will be highly accurate, even though the branch is randomly taken or not taken.
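As a rough sketch of how that mixing can work (a toy illustration in the spirit of gshare-style predictors, not any specific processor), the recent branch outcomes are kept in a small shift register and XORed into the table index:
TABLE_BITS = 12
HISTORY_BITS = 4
history = 0  # outcomes of the last four branches, one bit each

def index_with_history(branch_address):
    # XOR the history bits into the low address bits, so the same branch
    # uses different table entries depending on how recent branches went.
    return (branch_address ^ history) & ((1 << TABLE_BITS) - 1)

def record_outcome(taken):
    global history
    history = ((history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)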
You always have the problem that the same entry in the branch prediction table will be used for more than one branch; you just live with that. (Doing anything clever to handle this would take far too much storage; the whole point of using only one or two bits per entry is that you can have massive tables with very little storage.)

Branch predictor metadata is stored on-chip, in dedicated branch-predictor tables. Some research proposes storing it in the cache hierarchy (an approach called predictor virtualization), but I don't think it has been implemented in any real processor yet.
Since you expressed a willingness to learn more, see my survey paper for more details on the architectures of several branch predictors.

Related

Optimise multiple objectives in MiniZinc

I am a newbie in CP, but I want to solve a problem I got in college.
I have a MiniZinc model which minimizes the number of Machines used to perform some Tasks. Machines have some resources and Tasks have resource requirements. Besides minimizing that number, I am also trying to minimize the cost of allocating Tasks to Machines (I have an array with the costs). Is there any way to first minimize the number of machines and then optimize the cost in MiniZinc?
For example, I have 3 Tasks and 2 Machines. Every Machine has enough resources to hold all 3 Tasks, but I want to allocate the Tasks where the cost is lower.
Sorry for my English, and thanks for your help. If there is a need, I will paste my code.
The technique that you are referring to is called lexicographic optimisation/objectives. The idea is to optimise for multiple objectives, where there is a clear ordering between the objectives. For example, when optimising (A, B, C) we would optimise B and C, subject to A. So if we can improve the value of A then we would allow B and C to worsen. Similarly, C is also optimised subject to B.
This technique is often used, but is currently not (yet) natively supported in MiniZinc. There are however a few workarounds:
As shown in the radiation model, we can scale the first objective by a value that is at least as large as the maximum of the second objective (and so on). This ensures that any improvement in the first objective trumps any improvement or stagnation in the second. The result of the instance should then be the lexicographic optimum.
We can separate our model into multiple stages. In each stage we concern ourselves with only a single objective (working from most important to least important), and each subsequent stage fixes the objectives of the earlier stages. The solution of the final stage should give you the lexicographically optimal solution.
Some solvers support lexicographic optimisation natively. There is some experimental support for using these lexicographic objectives in MiniZinc, as found in std/experimental.mzn.
Note that lexicographic techniques might not always (explicitly) talk about both minimisation and maximisation; however, you can always convert from one to the other by negating the intended objective value.
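To make the scaling workaround concrete, here is a toy sketch (plain Python rather than MiniZinc, with invented candidate data): if the cost objective is known to be at most cost_max, then minimising machines * (cost_max + 1) + cost is equivalent to minimising the number of machines first and the cost second.
# Toy illustration of the scaling workaround for lexicographic objectives.
# Each candidate solution is a pair (machines_used, cost); the data is invented.
candidates = [(2, 7), (1, 9), (2, 3), (1, 12)]
cost_max = max(cost for _, cost in candidates)   # upper bound on the second objective

def combined(candidate):
    machines, cost = candidate
    # Saving one machine always outweighs any possible cost saving.
    return machines * (cost_max + 1) + cost

print(min(candidates, key=combined))             # (1, 9): fewest machines, then cheapest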

Should the Monte Carlo tree in calculating the previous bestMove be used to feed the next Monte Carlo search?

I have seen some MCTS implementations online and how they are used in a game.
A best move is calculated each turn based on the state at that moment.
If you have a sequence of moves in a game between a human and a computer like:
turn_h1,turn_c1,turn_h2,turn_c2,turn_h3,turn_c3,....turn_hn,turn_cn
where turn_h(i) is the human's i-th move and turn_c(i) is the computer's i-th move.
For each of the computer's turns i there is a corresponding state that is used to determine the i-th best move with MCTS.
Question: Should the tree built for the (i-1)-th turn's best move be used for the i-th turn's MCTS best-move search?
I mean, should the tree which resulted from the best-move search at the (i-1)-th state be used as input for determining the best move at the i-th state?
In other words, can I re-use the already constructed tree nodes from previous turns' best-move calculations, so that I do not need to build the whole tree again?
I have created a sequence of turns in pseudo-code just to make clear what I mean by using the (i-1)-th state (tree) to feed the next MCTS best-move search (of course, in the real world the logic below would be implemented as a loop):
# start game
initial_game_state.board = initialize_board()
# turn 1
# human plays
new_game_state_1 = initial_game_state.board.make_move(human_move_1)
# computer plays
computer_move_1 = MCTS.determine_bestmove(new_game_state_1)
new_game_state_2 = new_game_state_1.board.make_move(computer_move_1)
# turn 2
# human plays
new_game_state_3 = new_game_state_2.board.make_move(human_move_2)
# computer plays
computer_move_2 = MCTS.determine_bestmove(new_game_state_3)
new_game_state_4 = new_game_state_3.board.make_move(computer_move_2)
# turn 3
# ....
Yes, you can do this. It is commonly referred to as "tree reuse" (at least, that's what I usually call it).
You would start your MCTS call (except for the very first one, for which there is no "previous tree" yet) by navigating from the root of the previous tree to the node that corresponds to the state you have actually reached in the "real" game.
Note that, in a two-player alternating-move game, this involves not only the move that your MCTS agent made, but also the move made by the opponent. Due to how MCTS works, if the opponent "surprised" your MCTS agent by selecting a move that MCTS didn't predict, this likely leads to a subtree of the previous tree that had relatively few visits. In that case, tree reuse won't have much effect. But in cases where the opponent doesn't surprise you, and plays exactly what MCTS already predicted during the previous search, you may end up with a relatively large subtree with which to initialise your new search.
As for whether you "should" do this, as is the literal wording of your question: you don't have to. There are many MCTS implementations out there which don't. I'd generally recommend it anyway. It's not too difficult to implement, and while it generally won't give a big boost in performance (the playing strength of MCTS tends to scale sub-linearly with increases in "thinking time"), it definitely shouldn't hurt either, and may give a small boost in playing strength.
Note that, in nondeterministic games, if you implement an "open-loop" variant of MCTS (without explicit chance nodes), the part of the subtree that you're "re-using" will be partially based on outdated information. In such games, it may be beneficial to discount all the statistics gathered in your previous search (i.e. multiply all your visit counts and accumulated scores by a number between 0 and 1) before starting the new search process.
Important implementation detail: when re-using the previous tree, if your new root node (which used to be a node in the middle of your previous tree) has a reference/pointer back to its parent node, make sure to set it to null. If you forget about this, all search trees of all your previous searches will fully persist in memory throughout an entire game, and you'll likely run out of memory quickly.
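To illustrate, here is a minimal sketch of the reuse step (hypothetical Python; the Node class and all names are invented, not from any particular MCTS library):
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}        # move -> child Node
        self.visits = 0.0
        self.total_score = 0.0

def reuse_subtree(previous_root, own_move, opponent_move):
    # Navigate along the two moves actually played since the last search.
    node = previous_root.children.get(own_move)
    if node is not None:
        node = node.children.get(opponent_move)
    if node is None:
        return Node()             # the opponent surprised us: start a fresh tree
    node.parent = None            # detach, so the old tree can be garbage-collected
    return node

def decay_statistics(node, factor):
    # Optional, for open-loop MCTS in nondeterministic games:
    # discount the old statistics before the new search begins.
    node.visits *= factor
    node.total_score *= factor
    for child in node.children.values():
        decay_statistics(child, factor)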

Procedural modelling classical Chinese visions of political order

The problem I’m dealing with at the moment involves a system described in the Guanzi. A large section of the book is about how governments should work to extract a surplus from the economy which they can redistribute to ensure the loyalty of existing followers and gain new ones. Under this system, whoever can redistribute the most wealth becomes the overall leader. However, he also has to out-compete the other individuals in the system: they are all busy trying to establish their own redistribution networks.
The result is a series of pyramid-shaped redistribution networks, both independent and nested.
[Figure: simplified visual representation of the expected outcome]
These are dynamic across time and space. Gaining resources lets you acquire more followers, which in turn gives you access to more resources. There is also a random component involved: a bad harvest or a war may wipe out your resources. If one leader runs out of resources (whether as a result of a disaster or because he redistributed them too generously among his followers), he will either be supplanted by a follower or his network will collapse and its members leave to join other networks.
I think it is possible to model this algorithmically.
We can assume that willingness to share resources is innate.
Generosity = propensity score
An individual acquires followers as a function of both the surplus resources he possesses and his willingness to share them.
Followers[t] = Surplus[t-1] * Generosity
It is worth noting that growth is endogenous in this model. It is a product of whatever economic growth coefficient is deemed realistic given technology and natural resources (a), as well as of the previous cycle’s surplus and the number of followers an individual has, on the basis that these constitute factors of production. (Note: I'm not interested in getting actual monetary values out of this, just modelling the relationships. I understand that if you plugged real numbers into it people would end up redistributing more than they own.)
Growth[t] = a * (Surplus[t-1] * Followers[t-1])
At T=0 the surplus enjoyed by each individual in the system must be generated randomly.
Surplus[0] = randomly generated number
Followers generate additional resources for their leader, but they also need to be remunerated, meaning that they simultaneously deplete their leader’s resources, proportional to his generosity propensity score. A random component must also be included, as mentioned above, to account for famines, bumper crops, wars etc.
Surplus[t] = RandomComponent * (Surplus[t-1] + Growth[t]) - (Followers[t-1] * Generosity)
Once these relationships have been defined, then the algorithm is relatively simple:
T1:
Each individual checks the Surplus*Generosity score of the nearest individual who is not already following him. If Individual A’s SG > Individual B’s SG, then Individual B moves closer to Individual A and becomes his follower. (Note: If individual B has followers of his own, he carries them with him. Also: Followers automatically re-check their leader's SG in every round, since he is the closest individual to them. They will leave his network to become free agents once more if his SG drops below their own.)
Otherwise, he does nothing.
T2:
Each individual’s stats (Followers, Surplus) are recalculated based on the new situation.
Step 1 is repeated.
T3:
The previous steps are repeated.
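If it helps, one cycle of this could be sketched in code roughly as follows (illustrative Python; the 1-D positions, the shock range, and all names are my own simplifications, not part of the Guanzi system):
import random

a = 0.05                                      # economic growth coefficient

class Individual:
    def __init__(self, position):
        self.position = position
        self.generosity = random.random()     # innate propensity to share
        self.surplus = random.random() * 100  # Surplus[0], randomly generated
        self.followers = 0
        self.leader = None

def step(individuals):
    # T2: recalculate each individual's stats based on the previous cycle.
    for p in individuals:
        growth = a * p.surplus * p.followers
        shock = random.uniform(0.5, 1.5)      # famine, bumper crop, war, ...
        p.surplus = shock * (p.surplus + growth) - p.followers * p.generosity
    # T1: compare S*G with the nearest individual who is not already a follower.
    for p in individuals:
        others = [q for q in individuals if q is not p and q.leader is not p]
        nearest = min(others, key=lambda q: abs(q.position - p.position))
        if nearest.surplus * nearest.generosity > p.surplus * p.generosity:
            if p.leader is not None:
                p.leader.followers -= 1       # leave the old network
            p.leader = nearest
            nearest.followers += 1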
One would expect the individuals with the optimal generosity score to build the biggest networks, as they acquire followers without completely depleting their resources.
I suspect – but am not sure – that this model’s characteristics are similar to those of an L-system model.
Individuals are programmed with a simple instruction: "If the person closest to you has a higher S*G score than you do, approach and follow him."
On the basis of this, the individuals form structures (from the perspective of the individual with the optimal S*G score, they appear to cluster around him in a semi-structured way).
These structures grow with every successive time period.
They collapse after depleting their own resources, or when a random disaster strikes.
After a collapse, the process automatically begins again.
However, I'm not a maths or a computing guy (I'm a Chinese philosophy guy), so I'm not sure whether I'm just being fooled by a superficial resemblance. Is this a genuine example of string rewriting, or am I just convincing myself it is because you get tree-like structures out of it? Is this even a model that can work at all? Have I totally messed up my equations? (I haven't done this since high school, so it's highly probable.)
All help is gratefully received.

How to calculate receptive field of blocks with skip connection?

Although there are many resources about how to calculate the receptive field (RF) of CNNs (e.g. http://fomoro.com/tools/receptive-fields), I didn't find anything regarding skip connections. In [1] they mention that skip connections make the effective RF smaller, but what happens to the theoretical RF?
At the end of the day, I would like to know how to calculate the receptive field of a network comprising many residual blocks.
Thanks,
Daniel
TL;DR compute the receptive field ignoring all skip connections.
First, in the general case, suppose we have two branches of data flow, A and B. You can compute the receptive field for branches A and B independently, and then simply take the maximum where the branches merge. (The reason you can take the max is that branches typically merge via channel concatenation or element-wise addition, and either way the merged output is influenced by the inputs of both branches.)
Now, when one branch is a skip connection and the other is not, the one which is not gives the larger receptive field. If you have many skip connections, the longest route (the one that takes no skip connections) gives the maximum receptive field; hence the result in the TL;DR.
Getting the maximum among branches becomes more complicated if instead of a simple skip connection you have something like an inception block.
In those cases, you may want to compute the receptive field directly by definition.
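For a chain of convolutions the computation is mechanical: each layer adds (kernel - 1) * jump to the receptive field, where jump is the product of all strides so far. A small sketch (plain Python; the layer lists are invented for illustration), with residual blocks handled by simply following their longest path:
def receptive_field(layers):
    # layers: (kernel, stride) pairs along the longest path, skips ignored.
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k-1)*jump
        jump *= stride              # jump = spacing between neighbouring outputs
    return rf

# A residual block with two 3x3 stride-1 convs: the skip branch sees rf 1,
# the conv branch sees rf 5, so the block as a whole sees max(1, 5) = 5.
block = [(3, 1), (3, 1)]
print(receptive_field(block))       # 5
# Stacking blocks: concatenate the longest paths of all blocks.
print(receptive_field(block * 4))   # 17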

Continuous-time finite-horizon MDP

Is there any algorithm for solving a finite-horizon semi-Markov decision process?
I want to find the optimal policy for a sequential decision problem with a finite action space, a finite state space, and a deadline. Critically, different actions take different amounts of time and for one of the actions this duration is stochastic. I can model time as being discrete or continuous depending on which methods are available.
I am aware of algorithms for discounted infinite-horizon semi-MDPs, but I cannot find any work on finite-horizon semi-MDPs. Has this class of problems been studied before?
As with almost any MDP, backward dynamic programming should work. You could discretize your finite horizon into small steps from 0 to the deadline and then recursively update the values, starting from the deadline and working backwards. In the state space you'll have to track the current action, the total time spent on that action, and the already completed actions. The number of possible states may be quite large.
In the dynamic program you can perhaps exploit the fact that, when an action completes, you can read off the value function directly at the completion time, so you never need to evaluate the intermediate steps of an ongoing action.
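For what it's worth, here is a hedged sketch of that backward induction (Python; states, actions, durations, reward and transition are placeholders you would define for your problem, and the transition is assumed deterministic for simplicity):
def backward_induction(T, states, actions, durations, reward, transition):
    # V[t][s]: best expected total reward from time t in state s; deadline at T.
    # durations(s, a) yields (d, p) pairs: action a takes d steps with prob. p.
    V = [{s: 0.0 for s in states} for _ in range(T + 1)]
    policy = [{s: None for s in states} for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for s in states:
            for a in actions(s):
                q = 0.0
                for d, p in durations(s, a):
                    if t + d <= T:    # the action finishes before the deadline
                        q += p * (reward(s, a) + V[t + d][transition(s, a)])
                    # else: assumed to earn nothing if it cannot finish in time
                if policy[t][s] is None or q > V[t][s]:
                    V[t][s], policy[t][s] = q, a
    return V, policy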
