Use IntVar as index of IntVar[] array in a constraint - constraint-programming

I want to use the value of an IntVar as an index into another IntVar array in a constraint, using Choco Solver.
I have an IntVar array next, where next[i] contains the task that follows the i-th task,
and another IntVar array person, where person[i] contains the person assigned to task i.
My constraint is to ensure continuity in the task allocation.
This is what I've already tried, but it failed:
model.distance(person[i], person[next[i].getValue()], "=", 0).post();

The solution is the element constraint: model.element(IntVar value, IntVar[] table, IntVar index, int offset). Calling next[i].getValue() while building the model cannot work, because the solver has not assigned next[i] yet; element instead posts the relation person[next[i]] = person[i] declaratively, so it is enforced during search.
In my case:
model.element(person[i], person, next[i], 0).post();
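For readers outside the Choco ecosystem, here is the same idea sketched in Python with Google OR-Tools CP-SAT; the domain sizes are made up, and only the element constraint itself mirrors the Choco line above:

from ortools.sat.python import cp_model

model = cp_model.CpModel()
n = 4  # number of tasks (made-up size)

# person[i]: worker assigned to task i; next_[i]: task that follows task i
person = [model.NewIntVar(0, 2, f"person[{i}]") for i in range(n)]
next_ = [model.NewIntVar(0, n - 1, f"next[{i}]") for i in range(n)]

for i in range(n):
    # Enforce person[next_[i]] == person[i]: the following task keeps the same worker
    model.AddElement(next_[i], person, person[i])

solver = cp_model.CpSolver()
status = solver.Solve(model)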

Related

Counter sum filtered counts

The result should be a sum over a Counter, like counter.total(), but only over certain elements, filtered by their keys. I would imagine something like the following code exists:
sum(counter_element.count() for counter_element in counter.elements() if counter_element.key() is good)
How do I achieve the correct result?
I tried this with counter.elements(), counter.items() and counter.values(), but no combination of functions did the job.
sum(count for (key, count) in counter.items() if key is good)
does the job. The main idea is that .items() returns (key, count) tuples, so the unpacked key and count already hold exactly the objects one is interested in.
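A runnable version, with a made-up filter set standing in for the "is good" test:

from collections import Counter

counter = Counter("abracadabra")  # a: 5, b: 2, r: 2, c: 1, d: 1
good = {"a", "b"}                 # hypothetical filter: the keys we care about

total = sum(count for key, count in counter.items() if key in good)
print(total)                      # 7 == 5 a's + 2 b's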

When hash(item1) == hash(item2) but item1 != item2, how can the dict still tell they are different? [duplicate]

I am trying to understand the Python hash function under the hood. I created a custom class where all instances return the same hash value.
class C:
    def __hash__(self):
        return 42
I just assumed that only one instance of the above class can be in a dict at any time, but in fact a dict can have multiple elements with the same hash.
c, d = C(), C()
x = {c: 'c', d: 'd'}
print(x)
# {<__main__.C object at 0x7f0824087b80>: 'c', <__main__.C object at 0x7f0823ae2d60>: 'd'}
# note that the dict has 2 elements
I experimented a little more and found that if I override the __eq__ method such that all the instances of the class compare equal, then the dict only allows one instance.
class D:
    def __hash__(self):
        return 42
    def __eq__(self, other):
        return True
p, q = D(), D()
y = {p: 'p', q: 'q'}
print(y)
# {<__main__.D object at 0x7f0823a9af40>: 'q'}
# note that the dict only has 1 element
So I am curious to know how a dict can have multiple elements with the same hash.
Here is everything about Python dicts that I was able to put together (probably more than anyone would like to know; but the answer is comprehensive). A shout out to Duncan for pointing out that Python dicts use slots and leading me down this rabbit hole.
Python dictionaries are implemented as hash tables.
Hash tables must allow for hash collisions: even if two keys have the same hash value, the implementation of the table must have a strategy to insert and retrieve the key and value pairs unambiguously.
Python dict uses open addressing to resolve hash collisions (explained below) (see dictobject.c:296-297).
A Python hash table is just a contiguous block of memory (sort of like an array, so you can do an O(1) lookup by index).
Each slot in the table can store one and only one entry. This is important.
Each entry in the table is actually a combination of three values: <hash, key, value>. This is implemented as a C struct (see dictobject.h:51-56).
The figure below is a logical representation of a Python hash table; 0, 1, ..., i, ..., n on the left are indices of the slots (shown just for illustration; they are not stored along with the table, obviously!).
# Logical model of Python Hash table
-+-----------------+
0| <hash|key|value>|
-+-----------------+
1| ... |
-+-----------------+
.| ... |
-+-----------------+
i| ... |
-+-----------------+
.| ... |
-+-----------------+
n| ... |
-+-----------------+
When a new dict is initialized it starts with 8 slots. (see dictobject.h:49)
When adding entries to the table, we start with some slot i that is based on the hash of the key. CPython uses the initial index i = hash(key) & mask (where mask = PyDict_MINSIZE - 1, but that's not really important). Just note that the initial slot i that is checked depends on the hash of the key.
If that slot is empty, the entry is added to the slot (by entry, I mean <hash|key|value>). But what if that slot is occupied?! Most likely because another entry has the same hash (a hash collision!).
If the slot is occupied, CPython (and even PyPy) compares the hash AND the key (by compare I mean == comparison, not the is comparison) of the entry in the slot against the hash and key of the entry to be inserted (dictobject.c:337,344-345). If both match, it concludes the entry already exists, gives up, and moves on to the next entry to be inserted. If either the hash or the key doesn't match, it starts probing.
Probing just means it searches slot by slot to find an empty slot. Technically we could just go one by one, i+1, i+2, ..., and use the first available one (that's linear probing). But for reasons explained beautifully in the comments (see dictobject.c:33-126), CPython uses random probing, in which the next slot is picked in a pseudo-random order. The entry is added to the first empty slot. For this discussion, the actual algorithm used to pick the next slot is not really important (see dictobject.c:33-126 for the probing algorithm). What is important is that the slots are probed until the first empty slot is found.
The same thing happens for lookups: it starts with the initial slot i (where i depends on the hash of the key). If the hash or the key doesn't match the entry in the slot, it starts probing until it finds a slot with a match. If all slots are exhausted, it reports a failure.
BTW, the dict will be resized if it is two-thirds full. This avoids slowing down lookups. (see dictobject.h:64-65)
There you go! The Python implementation of dict checks for both hash equality of two keys and the normal equality (==) of the keys when inserting items. So in summary, if there are two keys, a and b and hash(a)==hash(b), but a!=b, then both can exist harmoniously in a Python dict. But if hash(a)==hash(b) and a==b, then they cannot both be in the same dict.
Because we have to probe after every hash collision, one side effect of too many hash collisions is that the lookups and insertions will become very slow (as Duncan points out in the comments).
I guess the short answer to my question is, "Because that's how it's implemented in the source code ;)"
While this is good to know (for geek points?), I am not sure how it can be used in real life, because unless you are trying to explicitly break something, why would two objects that are not equal have the same hash?
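To make the mechanics above concrete, here is a deliberately simplified open-addressing table in Python. It is a sketch, not CPython's code: it uses linear probing instead of CPython's perturbed probing, and it never resizes, so it only holds a handful of entries.

class MiniDict:
    """Toy open-addressing table: each slot holds one (hash, key, value) entry."""
    def __init__(self):
        self.slots = [None] * 8  # CPython also starts with 8 slots

    def _find_slot(self, key):
        h = hash(key)
        mask = len(self.slots) - 1
        i = h & mask  # initial slot depends on the hash of the key
        while True:
            entry = self.slots[i]
            if entry is None:
                return i  # empty slot: key is absent, insert here
            if entry[0] == h and entry[1] == key:
                return i  # same hash AND equal key: same entry
            i = (i + 1) & mask  # occupied by a different key: probe the next slot

    def __setitem__(self, key, value):
        self.slots[self._find_slot(key)] = (hash(key), key, value)

    def __getitem__(self, key):
        entry = self.slots[self._find_slot(key)]
        if entry is None:
            raise KeyError(key)
        return entry[2]

With the C class from the question, two instances hash to 42 but compare unequal, so they land in two different slots; with the D class, the second insert finds a slot whose hash and key both match and overwrites it, exactly like the real dict.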
For a detailed description of how Python's hashing works see my answer to Why is early return slower than else?
Basically it uses the hash to pick a slot in the table. If there is a value in the slot and the hash matches, it compares the items to see if they are equal.
If the hash matches but the items aren't equal, then it tries another slot. There's a formula to pick this (which I describe in the referenced answer), and it gradually pulls in unused parts of the hash value; but once it has used them all up, it will eventually work its way through all slots in the hash table. That guarantees we eventually find either a matching item or an empty slot. When the search finds an empty slot, it inserts the value or gives up (depending on whether we are adding or getting a value).
The important thing to note is that there are no lists or buckets: there is just a hash table with a particular number of slots, and each hash is used to generate a sequence of candidate slots.
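As a rough sketch of that formula (constants lifted from the comments in dictobject.c; this is illustrative Python, not the exact C code):

def probe_order(h, mask, PERTURB_SHIFT=5):
    # Yield candidate slot indices for hash h, mimicking CPython's probing.
    perturb = h & 0xFFFFFFFFFFFFFFFF  # C treats the hash as an unsigned word
    i = h & mask
    while True:
        yield i
        perturb >>= PERTURB_SHIFT         # gradually pull in unused hash bits
        i = (i * 5 + perturb + 1) & mask  # once perturb is 0, this visits every slot

Once perturb has been shifted down to zero, the recurrence i = (5*i + 1) & mask has full period modulo a power of two, which is why every slot is eventually visited.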
Edit: the answer below is one of the possible ways to deal with hash collisions; it is, however, not how Python does it. Python's wiki, referenced below, is also incorrect. The best source, given by @Duncan below, is the implementation itself: https://github.com/python/cpython/blob/master/Objects/dictobject.c I apologize for the mix-up.
It stores a list (or bucket) of elements at the hash, then iterates through that list until it finds the actual key in that list. A picture says more than a thousand words:
Here you see John Smith and Sandra Dee both hash to 152. Bucket 152 contains both of them. When looking up Sandra Dee it first finds the list in bucket 152, then loops through that list until Sandra Dee is found and returns 521-6955.
(The following is wrong; it's only here for context.) On Python's wiki you can find (pseudo?) code for how Python performs the lookup.
There are actually several possible solutions to this problem; check out the Wikipedia article for a nice overview: http://en.wikipedia.org/wiki/Hash_table#Collision_resolution
Hash tables, in general, have to allow for hash collisions! You will get unlucky and two things will eventually hash to the same value. Underneath, there is a list of objects that share that same hash key. Usually there is only one thing in that list, but in this case it will keep stacking them into the same one. The only way it knows they are different is through the equals operator.
When this happens, your performance will degrade over time, which is why you want your hash function to be as "random as possible".
In this thread I did not see what exactly Python does with instances of user-defined classes when we put them into a dictionary as keys. Let's read some documentation: it declares that only hashable objects can be used as keys, and hashable are all immutable built-in classes and all user-defined classes.
User-defined classes have __cmp__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns a result derived from id(x).
So if you have a constant __hash__ in your class but do not provide any __cmp__ or __eq__ method, then all your instances are unequal for the dictionary.
On the other hand, if you provide a __cmp__ or __eq__ method but do not provide __hash__, your instances are still unequal in terms of the dictionary. (Note that the quoted docs are Python 2's; in Python 3 there is no __cmp__, and a class that overrides __eq__ without defining __hash__ has its __hash__ set to None, so its instances cannot be used as keys at all.)
class A(object):
    def __hash__(self):
        return 42

class B(object):
    def __eq__(self, other):
        return True

class C(A, B):
    pass

dict_a = {A(): 1, A(): 2, A(): 3}
dict_b = {B(): 1, B(): 2, B(): 3}
dict_c = {C(): 1, C(): 2, C(): 3}

print(dict_a)
print(dict_b)
print(dict_c)
Output
{<__main__.A object at 0x7f9672f04850>: 1, <__main__.A object at 0x7f9672f04910>: 3, <__main__.A object at 0x7f9672f048d0>: 2}
{<__main__.B object at 0x7f9672f04990>: 2, <__main__.B object at 0x7f9672f04950>: 1, <__main__.B object at 0x7f9672f049d0>: 3}
{<__main__.C object at 0x7f9672f04a10>: 3}

Creating a defaultdict of a tuple with a list and int

I know that to create a defaultdict with default values, I can use:
defaultdict(lambda: 0)
and for a defaultdict of tuples with default values, I can use:
defaultdict(lambda: (0, 0))
But I am struggling with this: how do I create a defaultdict of a tuple holding a list and an int? I need something like:
{key1: (['a','b','c','a'], 100), key2: (['a','a','a','a'], 2100), key3: (['adds','bas','cs','a'], 300), key4: (['a'], 30)}
So I need to check for an item in the list and, if it is not present, increment the int value. Is my idea of tackling this situation using a defaultdict correct?
If you want to be able to do this:
d["some_key"][1] += 1
even if the key doesn't exist, and get [set(), 1] as the value, then do:
d = collections.defaultdict(lambda : [set(),0])
Note #1: defaultdict(lambda: 0) is an overkill spelling of defaultdict(int).
Note #2: I used a list and not a tuple for the default value. Had I used a tuple, I would have had a hard time incrementing the second item by 1, since tuples are immutable.
Note #3: tuples are mostly useful as keys (because they're immutable, thus hashable), not as values, where you can store anything you want, hashable or not.
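Putting it together, a runnable sketch of the check-and-count logic the question describes (key and item names are made up):

from collections import defaultdict

d = defaultdict(lambda: [set(), 0])  # [items seen so far, counter]

def record(key, item):
    seen = d[key][0]
    if item not in seen:   # item not present yet...
        seen.add(item)
        d[key][1] += 1     # ...so bump the counter

record("key1", "a")
record("key1", "b")
record("key1", "a")        # duplicate: the counter stays at 2
print(d["key1"])           # [{'a', 'b'}, 2]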

How to create an array of functions which partly depend on outside parameters? (Python)

I am interested in creating a list/array of functions G consisting of many small functions g. This should essentially correspond to a series of functions "evolving" in time.
Each g takes in two variables and returns the product of these variables with an outside global variable indexed at the same time-step.
Assume obs_mat (T x 1) is a pre-defined global array, and t corresponds to the time-step:
G = []
for t in range(T):
    # tried declaring obs here too.
    def g(current_state, observation_noise):
        obs = obs_mat[t]
        return current_state * observation_noise * obs
    G.append(g)
Unfortunately, when I test the resultant functions, they do not seem to pick up the difference in the time-varying constant obs (I got G[0](100, 100) the same as G[5](100, 100)). I tried playing around with the scope of obs, but without much luck. Would anyone be able to help guide me in the right direction?
This is a common "gotcha" when referencing variables from an outer scope inside an inner function. The outer variable is looked up when the inner function is run, not when the inner function is defined (so all versions of the function see the variable's last value). For each function to see a different value, you either need to make sure they're looking in separate namespaces, or you need to bind the value to a default parameter of the inner function.
Here's an approach that uses an extra namespace:
def make_func(x):
    def func(a, b):
        return a*b*x
    return func
list_of_funcs = [make_func(i) for i in range(10)]
Each inner function func has access to the x parameter in the enclosing make_func function. Since they're all created by separate calls to make_func, they each see separate namespaces with different x values.
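A quick sanity check (arguments made up):

print(list_of_funcs[2](3, 4))  # 3 * 4 * 2 = 24
print(list_of_funcs[5](3, 4))  # 3 * 4 * 5 = 60: each closure kept its own x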
Here's the other approach that uses a default argument (with functions created by a lambda expression):
list_of_funcs = [lambda a, b, x=i: a*b*x for i in range(10)]
In this version, the i variable from the list comprehension is bound to the default value of the x parameter in the lambda expression. This binding means the functions won't care about the value of i changing later on. The downside of this solution is that any code that accidentally calls one of the functions with three arguments instead of two may run without an exception (perhaps with odd results).
The problem you are running into is one of scoping. Function bodies aren't evaluated until the function is actually called, so the functions you have there will use whatever the current value of the variable within their scope is at the time of evaluation (which means they'll all have the same t if you call them after the for-loop has ended).
In order to see the value that you would like, you'd need to immediately call the function and save the result.
I'm not really sure why you're using an array of functions. Perhaps what you're trying to do is map a partial function across the time series, something like the following?
from functools import partial

def g(current_state, observation_noise, t):
    obs = obs_mat[t]
    return current_state * observation_noise * obs

g_maker = partial(g, current, observation)
results = list(map(g_maker, range(T)))
What's happening here is that partial creates a partially-applied function, which is merely waiting for its final value to be evaluated. That final value is dynamic (but the first two are fixed in this example), so mapping that partially-applied function over a range of values gets you answers for each value.
Honestly, this is a guess because it's hard to see what else you are trying to do with this data and it's hard to see what you're trying to achieve with the array of functions (and there are certainly other ways to do this).
The issue (assuming that your G.append call is mis-indented) is simply that the name t is rebound as you loop over the iterator returned by range(T). Since every function g you create refers to the same name t, they all wind up using its final value, T - 1. The fix is to capture the value the name references (the simplest way is to pass t in as the default value of an argument in g's argument list):
G = []
for t in range(T):
    def g(current_state, observation_noise, t_kw=t):
        obs = obs_mat[t_kw]
        return current_state * observation_noise * obs
    G.append(g)
This works because it creates another name that points at the value that t references during that iteration of the loop. (You could even call the parameter t rather than t_kw and it would still work, because the default argument is bound to the value that the loop's t is bound to at definition time. The value itself never changes; the loop's t is simply rebound to another value on the next iteration, while the default argument still points at the "original" value.)
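With the fix in place, the asker's own test now distinguishes the functions; here is a self-contained check, with made-up values for obs_mat and T:

T = 10
obs_mat = list(range(T))  # made-up observations: obs_mat[t] == t

G = []
for t in range(T):
    def g(current_state, observation_noise, t_kw=t):
        return current_state * observation_noise * obs_mat[t_kw]
    G.append(g)

print(G[0](100, 100))  # 0: uses obs_mat[0]
print(G[5](100, 100))  # 50000: uses obs_mat[5], no longer identical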

Methods for nearby numbers in Groovy

In Groovy, are there any methods that can find nearby numbers? For example:
def list = [22,33,37,56]
def number = 25
// any method to find that $number is nearer to 22 than to 33?
Is there any method for the above-mentioned purpose, or do I have to construct my own method or closure?
Thanks in advance.
The following combination of Groovy's collection methods will give you the closest number in the list:
list.groupBy { (it - number).abs() }.min { it.key }.value.first()
The list.groupBy { (it - number).abs() } will transform the list into a map, where each map entry consists of the distance to the number as key and the original list entry as the value:
[3:[22], 8:[33], 12:[37], 31:[56]]
The values are now each a list on their own, as theoretically the original list could contain two entries with equal distance. On the map you then select the entry with the smallest key, take its value and return the first entry of the value's list.
Edit:
Here's a simpler version that sorts the original list based on the distance and returns the first value of the sorted list:
list.sort { (it - number).abs() }.first()
If it's a sorted list, Collections.binarySearch() does nearly the same job, as does Arrays.binarySearch(); for a missing element they return the insertion point (encoded as -(insertion point) - 1), from which you can compare the two neighbouring values yourself.
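A Python sketch of that binary-search route using bisect, with the list from the question:

from bisect import bisect_left

def nearest(sorted_list, number):
    i = bisect_left(sorted_list, number)  # insertion point that keeps the list sorted
    if i == 0:
        return sorted_list[0]
    if i == len(sorted_list):
        return sorted_list[-1]
    before, after = sorted_list[i - 1], sorted_list[i]
    return before if number - before <= after - number else after

print(nearest([22, 33, 37, 56], 25))  # 22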
