How can I fill an array with sets - python-3.x

Say I have an image of 2x2 pixels named image_array, each pixel color is identified by a tuple of 3 entries (RGB), so the shape of image_array is 2x2x3.
I want to create an np.array c which has the shape 2x2x1 and which last coordinate is an empty set.
I tried this:
import numpy as np
image = (((1,2,3), (1,0,0)), ((1,1,1), (2,1,2)))
image_array = np.array(image)
c = np.empty(image_array.shape[:2], dtype=set)
c.fill(set())
c[0][1].add(124)
print(c)
I get:
[[{124} {124}]
[{124} {124}]]
And instead I would like the return:
[[{} {124}]
[{} {}]]
Any idea ?

The object array has to be filled with separate set() objects. That means creating them individually, as I do with a list comprehension:
In [279]: arr = np.array([set() for _ in range(4)]).reshape(2,2)
In [280]: arr
Out[280]:
array([[set(), set()],
[set(), set()]], dtype=object)
That construction should highlight that fact that this array is closely related to a list, or list of lists.
Now we can do a set operation on one of those elements:
In [281]: arr[0,1].add(124) # more idiomatic than arr[0][1]
In [282]: arr
Out[282]:
array([[set(), {124}],
[set(), set()]], dtype=object)
Note that we cannot operate on more than one set at a time. The object array offers few advantages compared to a list.
This is a 2d array; the sets don't form a dimension. Contrast that with
In [283]: image = (((1,2,3), (1,0,0)), ((1,1,1), (2,1,2)))
...: image_array = np.array(image)
...:
In [284]: image_array
Out[284]:
array([[[1, 2, 3],
[1, 0, 0]],
[[1, 1, 1],
[2, 1, 2]]])
While it started with tuples, it made a 3d array of integers.

Try this:
import numpy as np
x = np.empty((2, 2), dtype=np.object)
x[0, 0] = set(1, 2, 3)
print(x)
[[{1, 2, 3} None]
[None None]]
For non-number types in numpy you should use np.object.

whenever you do fill(set()), this will fill the array with exactly same set, as they refer to the same set. To fix this, just make a set if there isnt one everytime you need to add to the set
c = np.empty(image_array.shape[:2], dtype=set)
if not c[0][1]:
c[0,1] = set([124])
else:
c[0,1].add(124)
print (c)
# [[None {124}]
# [None None]]

Try changing your line c[0][1].add to this.
c[0][1] = 124
print(c)

Related

Can we initialise a numpy array of numpy arrays with different shapes using some constructor?

I want an array that looks like this,
array([array([[1, 1], [2, 2]]), array([3, 3])], dtype=object)
I can make an empty array and then assign elements one by one like this,
z = [np.array([[1,1],[2,2]]), np.array([3,3])]
x = np.empty(shape=2, dtype=object)
x[0], x[1] = z
I thought if this possible then so should be this: x = np.array(z, dtype=object), but that gets me the error: ValueError: could not broadcast input array from shape (2,2) into shape (2).
So is the way given above the only way to make a ragged numpy array? Or, is there a nice one line constructor/function we can can call to make the array x from above.

Why is taking a slice of a list which is assigned to another list not changing the original?

I have a class that is a representation of a mathematical tensor. The tensor in the class, is stored as a single list, not lists inside another list. That means [[1, 2, 3], [4, 5, 6]] would be stored as [1, 2, 3, 4, 5, 6].
I've made a __setitem__() function and a function to handle taking slices of this tensor while it's in single list format. For example slice(1, None, None) would become slice(3, None, None) for the list mentioned above. However when I assign this slice a new value, the original tensor isn't updated.
Here is what the simplified code looks like
class Tensor:
def __init__(self, tensor):
self.tensor = tensor # Here I would flatten it, but for now imagine it's already flattened.
def __setitem__(self, slices, value):
slices = [slices]
temp_tensor = self.tensor # any changes to temp_tensor should also change self.tensor.
for s in slices: # Here I would call self.slices_to_index(), but this is to keep the code simple.
temp_tensor = temp_tensor[slice]
temp_tensor = value # In my mind, this should have also changed self.tensor, but it hasn't.
Maybe i'm just being stupid and can't see why this isn't working. Maybe my actual questions isn't just ' why doesn't this work?' but also 'is there a better way to do this?'. Thanks for any help you can give me.
NOTES:
Each 'dimension' of the list must have the same shape, so [[1, 2, 3], [4, 5]] isn't allowed.
This code is massively simplified as there are many other helper functions and stuff like that.
in __init__() I would flatten the list but as I just said to keep things simple I left that out, along with self.slice_to_index().
You should not think of python variables as in c++ or java. Think of them as labels you place on values. Check this example:
>>> l = []
>>> l.append
<built-in method append of list object at 0x7fbb0d40cf88>
>>> l.append(10)
>>> l
[10]
>>> ll = l
>>> ll.append(10)
>>> l
[10, 10]
>>> ll
[10, 10]
>>> ll = ["foo"]
>>> l
[10, 10]
As you can see, ll variable first points to the same l list but later we just make it point to another list. Modifying the later ll won't modify the original list pointed by l.
So, in your case if you want self.tensor to point to a new value, just do it:
class Tensor:
def __init__(self, tensor):
self.tensor = tensor # Here I would flatten it, but for now imagine it's already flattened.
def __setitem__(self, slices, value):
slices = [slices]
temp_tensor = self.tensor # any changes to the list pointed by temp_tensor will be reflected in self.tensor since it is the same list
for s in slices:
temp_tensor = temp_tensor[slice]
self.tensor = value

How to randomly select from list of lists

I have a list of lists in python as follows:
a = [[1,1,2], [2,3,4], [5,5,5], [7,6,5], [1,5,6]]
for example, How would I at random select 3 lists out of the 6?
I tried numpy's random.choice but it does not work for lists.
Any suggestion?
numpy's random.choice doesn't work on 2-d array, so one alternative is to use lenght of the array to get the random index of 2-d array and then get the elements from that random index. see below example.
import numpy as np
random_count = 3 # number of random elements to find
a = [[1,1,2], [2,3,4], [5,5,5], [7,6,5], [1,5,6]] # 2-d data list
alist = np.array(a) # convert 2-d data list to numpy array
random_numbers = np.random.choice(len(alist), random_count) # fetch random index of 2-d array based on len
for item in random_numbers: # iterate over random indexs
print(alist[item]) # print random elememt through index
You can use the random library like this:
a = [[1,1,2], [2,3,4], [5,5,5], [7,6,5], [1,5,6]]
import random
random.choices(a, k=3)
>>> [[1, 5, 6], [2, 3, 4], [7, 6, 5]]
You can read more about the random library at this official page https://docs.python.org/3/library/random.html.

dask array map_blocks, with differently shaped dask array as argument

I'm trying to use dask.array.map_blocks to process a dask array, using a second dask array with different shape as an argument. The use case is firstly running some peak finding on a 2-D stack of images (4-dimensions), which is returned as a 2-D dask array of np.objects. Ergo, the two first dimensions of the two dask arrays are the same. The peaks are then used to extract intensities from the 4-dimensional dataset. In the code below, I've omitted the peak finding part. Dask version 1.0.0.
import numpy as np
import dask.array as da
def test_processing(data_chunk, position_chunk):
output_array = np.empty(data_chunk.shape[:-2], dtype='object')
for index in np.ndindex(data_chunk.shape[:-2]):
islice = np.s_[index]
intensity_list = []
data = data_chunk[islice]
positions = position_chunk[islice]
for x, y in positions:
intensity_list.append(data[x, y])
output_array[islice] = np.array(intensity_list)
return output_array
data = da.random.random(size=(4, 4, 10, 10), chunks=(2, 2, 10, 10))
positions = np.empty(data.shape[:-2], dtype='object')
for index in np.ndindex(positions.shape):
positions[index] = np.arange(10).reshape(5, 2)
data_output = da.map_blocks(test_processing, data, positions, dtype=np.object,
chunks=(2, 2), drop_axis=(2, 3))
data_output.compute()
This gives the error ValueError: Can't drop an axis with more than 1 block. Please useatopinstead., which I'm guessing is due to positions having 3 dimensions, while data has 4 dimensions.
The same function, but without the positions dask array works fine.
import numpy as np
import dask.array as da
def test_processing(data_chunk):
output_array = np.empty(data_chunk.shape[:-2], dtype='object')
for index in np.ndindex(data_chunk.shape[:-2]):
islice = np.s_[index]
intensity_list = []
data = data_chunk[islice]
positions = [[5, 2], [1, 3]]
for x, y in positions:
intensity_list.append(data[x, y])
output_array[islice] = np.array(intensity_list)
return output_array
data = da.random.random(size=(4, 4, 10, 10), chunks=(2, 2, 10, 10))
data_output = da.map_blocks(test_processing, data, dtype=np.object,
chunks=(2, 2), drop_axis=(2, 3))
data_computed = data_output.compute()
This has been fixed in more recent versions of dask: running the same code on version 2.3.0 of dask works fine.

copy in numpy(python 3)

I've just learnt copy,shallow copy,and deep copy in python,and I created a list b,then make c equal b.I know it's reasonable to find that the same element share the identical 'id'.Then I think I'll get the similar result in numpy when I make the nearly same steps,however,it shows that the same element has different 'id', I can't figure out how that happens in numpy.
You don't need a duplicate reference to produce the result.
import numpy as np
a = np.array([[10, 10], [2, 3], [4, 5]])
for x, y in zip(a, a):
print(id(x), ',', id(y))
# 52949424 , 52949464
# 52949624 , 52951424
# 52949464 , 52949424
My guess is that when zip iterates over arrays, it triggers indexing in numpy which then returns copied row. Remember that [] in numpy is not like that for list.
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html
You may try this to see why messing with id is not a good idea for numpy.
a[0] is a[0] # False
a[0] is a[[0]] # False

Resources