How to make a tuple including a numpy array hashable? - python-3.x

One way to make a numpy array hashable is setting it to read-only. This has worked for me in the past. But when I use such a numpy array in a tuple, the whole tuple is no longer hashable, which I do not understand. Here is the sample code I put together to illustrate the problem:
import numpy as np
npArray = np.ones((1,1))
npArray.flags.writeable = False
print(npArray.flags.writeable)
keySet = (0, npArray)
print(keySet[1].flags.writeable)
myDict = {keySet : 1}
First I create a simple numpy array and set it to read-only. Then I add it to a tuple and check if it is still read-only (which it is).
When I want to use the tuple as a key in a dictionary, I get the error TypeError: unhashable type: 'numpy.ndarray'.
Here is the output of my sample code:
False
False
Traceback (most recent call last):
File "test.py", line 10, in <module>
myDict = {keySet : 1}
TypeError: unhashable type: 'numpy.ndarray'
What can I do to make my tuple hashable and why does Python show this behavior in the first place?

You claim that
One way to make a numpy array hashable is setting it to read-only
but that's not actually true. Setting an array to read-only just makes it read-only. It doesn't make the array hashable, for multiple reasons.
The first reason is that an array with the writeable flag set to False is still mutable. For one thing, you can always set writeable=True again and resume writing to it, or do more exotic things like reassign its shape even while writeable is False. For another, even without touching the array itself, you could mutate its data through another view that has writeable=True.
>>> import numpy
>>> x = numpy.arange(5)
>>> y = x[:]
>>> x.flags.writeable = False
>>> x
array([0, 1, 2, 3, 4])
>>> y[0] = 5
>>> x
array([5, 1, 2, 3, 4])
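And to illustrate the first point, the flag can simply be flipped back and the array written to directly (a small sketch continuing the session above):
>>> x.flags.writeable = True
>>> x[1] = 7
>>> x
array([5, 7, 2, 3, 4])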
Second, for hashability to be meaningful, objects must first be equatable: == must return a boolean and must be an equivalence relation. NumPy arrays don't do that; == on arrays broadcasts and returns an array of element-wise comparisons. The purpose of hash values is to quickly locate equal objects, but when your objects don't even have a built-in notion of whole-object equality, there's not much point in providing hashes.
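A quick illustration of that point in the same interpreter session:
>>> a = numpy.arange(3)
>>> b = numpy.arange(3)
>>> a == b                  # element-wise comparison, not a single boolean
array([ True,  True,  True])
>>> if a == b:              # even truth-testing the comparison result is refused
...     pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()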
You're not going to get hashable tuples with arrays inside. You're not even going to get hashable arrays. The closest you can get is to put some other representation of the array's data in the tuple.
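If you do need a dictionary keyed by an array's contents, one workable sketch (using the names from the question) is to put a hashable snapshot of the data into the tuple instead of the array itself, for example its shape plus its raw bytes. Note this keys on a copy of the data, so later changes to the array are not reflected in the key:
key = (0, npArray.shape, npArray.tobytes())
myDict = {key: 1}
print(myDict[(0, npArray.shape, npArray.tobytes())])  # 1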

The fastest way to hash a numpy array is likely via its raw bytes, e.g. arr.tostring() (an older alias for what is now arr.tobytes()).
In [11]: %timeit hash(y.tostring())
What you could do, rather than use a tuple, is define a class:
class KeySet(object):
    def __init__(self, i, arr):
        self.i = i
        self.arr = arr

    def __hash__(self):
        return hash((self.i, hash(self.arr.tostring())))
Now you can use it in a dict:
In [21]: ks = KeySet(0, npArray)
In [22]: myDict = {ks: 1}
In [23]: myDict[ks]
Out[23]: 1
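One caveat worth adding: as written, two distinct KeySet objects built from equal data hash alike but still compare unequal (the default identity __eq__), so only the exact same object finds the entry again. If lookups with a freshly built key are needed, a minimal sketch would also define __eq__ to match __hash__:
import numpy as np

class KeySet(object):
    def __init__(self, i, arr):
        self.i = i
        self.arr = arr

    def __hash__(self):
        return hash((self.i, self.arr.tostring()))

    def __eq__(self, other):
        return self.i == other.i and np.array_equal(self.arr, other.arr)
With that in place, myDict[KeySet(0, npArray)] also returns 1 even though it is a different object from ks.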

Related

Why is taking a slice of a list which is assigned to another list not changing the original?

I have a class that is a representation of a mathematical tensor. The tensor in the class is stored as a single list, not lists inside another list. That means [[1, 2, 3], [4, 5, 6]] would be stored as [1, 2, 3, 4, 5, 6].
I've made a __setitem__() function and a function to handle taking slices of this tensor while it's in single-list format. For example, slice(1, None, None) would become slice(3, None, None) for the list mentioned above. However, when I assign this slice a new value, the original tensor isn't updated.
Here is what the simplified code looks like:
class Tensor:
    def __init__(self, tensor):
        self.tensor = tensor  # Here I would flatten it, but for now imagine it's already flattened.

    def __setitem__(self, slices, value):
        slices = [slices]
        temp_tensor = self.tensor  # any changes to temp_tensor should also change self.tensor.
        for s in slices:  # Here I would call self.slices_to_index(), but this is to keep the code simple.
            temp_tensor = temp_tensor[s]
        temp_tensor = value  # In my mind, this should have also changed self.tensor, but it hasn't.
Maybe I'm just being stupid and can't see why this isn't working. Maybe my actual question isn't just 'why doesn't this work?' but also 'is there a better way to do this?'. Thanks for any help you can give me.
NOTES:
Each 'dimension' of the list must have the same shape, so [[1, 2, 3], [4, 5]] isn't allowed.
This code is massively simplified as there are many other helper functions and stuff like that.
In __init__() I would flatten the list, but as I just said, to keep things simple I left that out, along with self.slice_to_index().
You should not think of Python variables as you would in C++ or Java. Think of them as labels you place on values. Check this example:
>>> l = []
>>> l.append
<built-in method append of list object at 0x7fbb0d40cf88>
>>> l.append(10)
>>> l
[10]
>>> ll = l
>>> ll.append(10)
>>> l
[10, 10]
>>> ll
[10, 10]
>>> ll = ["foo"]
>>> l
[10, 10]
As you can see, the ll variable at first points to the same list as l, but later we make it point to another list. Modifying the new list that ll points to won't modify the original list pointed to by l.
So, in your case if you want self.tensor to point to a new value, just do it:
class Tensor:
    def __init__(self, tensor):
        self.tensor = tensor  # Here I would flatten it, but for now imagine it's already flattened.

    def __setitem__(self, slices, value):
        slices = [slices]
        temp_tensor = self.tensor  # any changes to the list pointed to by temp_tensor will be reflected in self.tensor, since it is the same list
        for s in slices:
            temp_tensor = temp_tensor[s]
        self.tensor = value
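As for the 'is there a better way?' part: if the goal is to update part of the original flattened list rather than replace it, the usual approach is to assign into the existing list through the translated slice, which mutates the object self.tensor already points to. A minimal sketch, assuming the slice has already been converted to flat-list coordinates:
class Tensor:
    def __init__(self, tensor):
        self.tensor = tensor  # assumed already flattened

    def __setitem__(self, flat_slice, value):
        # Slice assignment mutates the existing list object in place,
        # so self.tensor sees the change; rebinding a local name would not.
        self.tensor[flat_slice] = value

t = Tensor([1, 2, 3, 4, 5, 6])
t[slice(3, None)] = [7, 8, 9]
print(t.tensor)  # [1, 2, 3, 7, 8, 9]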

How can I fill an array with sets

Say I have an image of 2x2 pixels named image_array; each pixel color is identified by a tuple of 3 entries (RGB), so the shape of image_array is 2x2x3.
I want to create an np.array c which has the shape 2x2x1 and whose last coordinate is an empty set.
I tried this:
import numpy as np
image = (((1,2,3), (1,0,0)), ((1,1,1), (2,1,2)))
image_array = np.array(image)
c = np.empty(image_array.shape[:2], dtype=set)
c.fill(set())
c[0][1].add(124)
print(c)
I get:
[[{124} {124}]
 [{124} {124}]]
Instead, I would like it to return:
[[{} {124}]
 [{} {}]]
Any idea ?
The object array has to be filled with separate set() objects. That means creating them individually, as I do with a list comprehension:
In [279]: arr = np.array([set() for _ in range(4)]).reshape(2,2)
In [280]: arr
Out[280]:
array([[set(), set()],
[set(), set()]], dtype=object)
That construction should highlight the fact that this array is closely related to a list, or a list of lists.
Now we can do a set operation on one of those elements:
In [281]: arr[0,1].add(124) # more idiomatic than arr[0][1]
In [282]: arr
Out[282]:
array([[set(), {124}],
[set(), set()]], dtype=object)
Note that we cannot operate on more than one set at a time. The object array offers few advantages compared to a list.
This is a 2d array; the sets don't form a dimension. Contrast that with
In [283]: image = (((1,2,3), (1,0,0)), ((1,1,1), (2,1,2)))
...: image_array = np.array(image)
...:
In [284]: image_array
Out[284]:
array([[[1, 2, 3],
[1, 0, 0]],
[[1, 1, 1],
[2, 1, 2]]])
While it started with tuples, it made a 3d array of integers.
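Applied to the original example, a small sketch of the same list-comprehension idea that reproduces the output the question asks for (note that empty sets print as set() rather than {}):
import numpy as np

c = np.array([set() for _ in range(4)], dtype=object).reshape(2, 2)
c[0, 1].add(124)
print(c)
# [[set() {124}]
#  [set() set()]]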
Try this:
import numpy as np
x = np.empty((2, 2), dtype=object)
x[0, 0] = {1, 2, 3}
print(x)
[[{1, 2, 3} None]
[None None]]
For non-number types in numpy you should use the object dtype (dtype=object; the older np.object alias is deprecated in newer NumPy versions).
Whenever you do fill(set()), it fills the array with exactly the same set object, since every cell refers to that one set. To fix this, just create a set if there isn't one already, every time you need to add to it:
c = np.empty(image_array.shape[:2], dtype=set)
if not c[0][1]:
    c[0, 1] = set([124])
else:
    c[0, 1].add(124)
print(c)
# [[None {124}]
# [None None]]
Try changing your line c[0][1].add(124) to this:
c[0][1] = 124
print(c)

slicing error in numpy array

I am trying to run the following code
fs = 1000
data = np.loadtxt("trainingdataset.txt", delimiter=",")
data1 = data[:,2]
data2 = data1.astype(int)
X,Y = data2['521']
but it gives me the following error
Traceback (most recent call last):
File "C:\Users\hadeer.elziaat\Desktop\testspec.py", line 58, in <module>
X,Y = data2['521']
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
my dataset
1,4,6,10
2,100,125,10
3,100,7216,254
4,100,527,263
5,100,954,13
6,100,954,23
You're using the string '521' rather than the number 521 for indexing. Try X,Y = data2[521] instead.
If you are only given the string, you could cast it to an int first: X,Y = data2[int('521')], but this might result in some errors and/or unexpected behaviour.
Next problem: you are requiring two variables, one for X and one for Y, yet the data2[521] selection only provides you with a single value (the number in the 3rd column, 522nd row).
You say you want all the data in the 3rd column.
I assume you also want some kind of x-axis, since you are attempting to do X, Y = .... How about using the first column for that? Then your code would be:
import numpy as np
data = np.loadtxt("trainingdataset.txt", delimiter=',', dtype='int')
x = data[:, 0]
y = data[:, 2]
What remains unclear from your question is why you tried to index your data with the string '521' in the first place - which failed because you cannot use strings as indices on plain arrays.

copy in numpy(python 3)

I've just learnt about copy, shallow copy, and deep copy in Python. I created a list b, then made c equal to b, and I know it's reasonable to find that the same elements share identical ids. I expected a similar result in numpy when I took nearly the same steps; however, it shows that the same element has a different id each time, and I can't figure out why that happens in numpy.
You don't need a duplicate reference to produce the result.
import numpy as np
a = np.array([[10, 10], [2, 3], [4, 5]])
for x, y in zip(a, a):
    print(id(x), ',', id(y))
# 52949424 , 52949464
# 52949624 , 52951424
# 52949464 , 52949424
When zip iterates over the array, each step indexes into it, and indexing a numpy array builds a new ndarray object every time (for basic indexing it is a view onto the same data, not a copy), so the ids differ. Remember that [] on a numpy array does not behave like [] on a list, which hands you back the very same element object.
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html
You may try this to see why messing with id is not a good idea for numpy.
a[0] is a[0] # False
a[0] is a[[0]] # False
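To see that the differing ids do not mean the data was copied, a quick check continuing with the array a above:
row = a[0]
print(row is a[0])    # False: each indexing operation builds a fresh wrapper object
print(row.base is a)  # True: the wrapper is a view of a's data, not a copy
row[0] = 99
print(a[0, 0])        # 99: mutating the view is visible in the original array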

Difference between map and list iterators in python3

I ran into unexpected behaviour when working with map and list iterators in Python 3. In this MWE I first generate a map of maps. Then I want the first element of each inner map collected in one list, and the remaining parts left in the original map:
# s will be a map of maps
s=[[1,2,3],[4,5,6]]
s=map(lambda l: map(lambda t:t,l),s)
# uncomment to obtain desired output
# s = list(s) # s is now a list of maps
s1 = map(next,s)
print(list(s1))
print(list(map(list,s)))
Running the MWE as-is in Python 3.4.2 yields the expected output for s1, namely [1, 4], but the empty list [] for s. Uncommenting the marked line yields the correct output: s1 as above, and the expected output for s as well, [[2, 3], [5, 6]].
The docs say that map expects an iterable. Until now, I had not seen any difference between map objects and list iterators. Could someone explain this behaviour?
PS: Curiously enough, if I uncomment the first print statement, the initial state of s is printed. So it could also be that this behaviour has something to do with a kind of lazy(?) evaluation of maps?
A map() is an iterator; you can only iterate over it once. You could get individual elements with next() for example, but once you run out of items you cannot get any more values.
I've given your objects a few easier-to-remember names:
>>> s = [[1, 2, 3], [4, 5, 6]]
>>> map_of_maps = map(lambda l: map(lambda t: t, l), s)
>>> first_elements = map(next, map_of_maps)
Iterating over first_elements here will in turn iterate over map_of_maps. You can only do so once, so once we run out of elements any further iteration will fail:
>>> next(first_elements)
1
>>> next(first_elements)
4
>>> next(first_elements)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
list() does exactly the same thing; it takes an iterable argument, and will iterate over that object to create a new list object from the results. But if you give it a map() that is already exhausted, there is nothing to copy into the new list anymore. As such, you get an empty result:
>>> list(first_elements)
[]
You need to recreate the map() from scratch:
>>> map_of_maps = map(lambda l: map(lambda t: t, l), s)
>>> first_elements = map(next, map_of_maps)
>>> list(first_elements)
[1, 4]
>>> list(first_elements)
[]
Note that a second list() call on the map() object resulted in an empty list object, once again.
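This is also why the commented-out s = list(s) line in the question fixes the MWE: materializing the outer map once stores the inner map objects in a list, so they can be reached again after the first pass. A quick sketch:
>>> s = [[1, 2, 3], [4, 5, 6]]
>>> maps = list(map(lambda l: map(lambda t: t, l), s))  # outer map materialized once
>>> list(map(next, maps))   # consumes one element of each inner map
[1, 4]
>>> list(map(list, maps))   # the stored inner maps are still usable
[[2, 3], [5, 6]]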
