Array of integers which appear to be objects - python-3.x

I have a strange situation which I do not understand. I want to use an array as indices.
batch_index = np.arange(min(mb_size,len(D)), dtype=np.int32)
Q_target[batch_index, actions] = rewards + gamma*np.max(Q_next, axis=1)*dones
But when I run this I get an error:
IndexError: arrays used as indices must be of integer (or boolean) type
It appears to be an array of objects:
In: actions
Out: array([2, 2, 2, 2], dtype=object)
The same in variable explorer:
Type: Array of object Value: ndarray object of numpy module
At the same time:
In: type(actions[0])
Out: numpy.int64
I know that I can use:
actions = np.array(actions, dtype=np.int32)
But I do not understand why I have to do it.
P.S. here is how I get actions
from collections import deque
import random
import numpy as np

D = deque()
D.append((state, action, reward, state_new, done))
# type(action) == int64
minibatch = np.array(random.sample(D, min(mb_size, len(D))))
actions = minibatch[:, 1]

In my opinion it's because minibatch is of mixed type (i.e. done is a Boolean, reward is most probably some sort of numeric, etc.), since you build it from the deque.
In Python in general, and specifically in NumPy, when you have mixed, non-numeric object types, the array is cast to object (or another unifying type that all items agree on). So in your case, the reason you can't index the data with this array is not the batch_index, but the actions array.
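Here is a minimal sketch of that effect (made-up values; in the question the state entries are arrays, which is what forces NumPy to fall back to object dtype, so dtype=object is set explicitly here to reproduce it):
import numpy as np

# Mixed-type rows like (state, action, reward, state_new, done).
# dtype=object mimics what np.array() does when the rows are ragged.
rows = [(0.1, np.int64(2), True), (0.3, np.int64(2), False)]
minibatch = np.array(rows, dtype=object)

actions = minibatch[:, 1]
print(actions.dtype)        # object -- rejected as an index array
print(type(actions[0]))     # <class 'numpy.int64'> -- elements are still ints

actions = actions.astype(np.int32)   # now valid for fancy indexing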

Assigning values to non-contiguous slices in an ndarray

I have a base array that contains data. Some indices in that array need to be re-assigned a new value, and those indices are non-contiguous. I'd like to avoid for-looping over all of them and instead use slice notation, as it's likely to be faster.
For instance:
arr = np.zeros(100)
sl_obj_1 = slice(2,5)
arr[sl_obj_1] = 42
Works for a single slice. But I have another non-contiguous slice to apply to that same array, say
sl_obj_2 = slice(12,29)
arr[sl_obj_2] = 55
I would like to accomplish something along the lines of:
arr[sl_obj_1, sl_obj_2] = 42, 55
Any ideas?
EDIT: changed the example to emphasize that the sequences are of varying lengths.
There isn't a good way to directly extract multiple slices from a NumPy array, much less different-sized slices. But you can cheat by converting your slices into indices, and using an index array.
In the case of 1-dimensional arrays, this is relatively simple using index arrays.
import numpy as np

def slice_indices(some_list, some_slice):
    """Convert a slice into indices of a list."""
    return np.arange(len(some_list))[some_slice]
    # For a non-NumPy solution, use this:
    # return range(*some_slice.indices(len(some_list)))
arr = np.arange(10)
# We'll make [1, 2, 3] and [8, 7] negative.
slice1, new1 = np.s_[1:4], [-1, -2, -3]
slice2, new2 = np.s_[8:6:-1], [-8, -7]
# (Here, np.s_ is just a nicer notation for slices.[1])
# Get indices to replace
idx1 = slice_indices(arr, slice1)
idx2 = slice_indices(arr, slice2)
# Use index arrays to assign to all of the indices
arr[[*idx1, *idx2]] = *new1, *new2
# That line expands to this:
arr[[1, 2, 3, 8, 7]] = -1, -2, -3, -8, -7
Note that this doesn't entirely avoid Python iteration—the star operators still create iterators and the index array is a regular python list. In a case with large amounts of data, this could be noticeably slower than the manual approach, because it calculates each index that will be assigned.
You will also need to make sure the replacement data is already the right shape, or use NumPy's manual broadcasting functions (e.g. np.broadcast_to) to fix the shapes. This introduces additional overhead; if you were relying on automatic broadcasting, you're probably better off doing the assignments in a loop.
arr = np.zeros(100)
idx1 = slice_indices(arr, slice(2, 5))
idx2 = slice_indices(arr, slice(12, 29))
new1 = np.broadcast_to(42, len(idx1))
new2 = np.broadcast_to(55, len(idx2))
arr[[*idx1, *idx2]] = *new1, *new2
To generalize to more dimensions, slice_indices will need to take care of shape, and you'll have to be more careful about joining multiple sets of indices (rather than arr[[i1, i2, i3]], you'll need arr[[i1, i2, i3], [j1, j2, j3]], which can't be concatenated directly).
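For instance, a minimal 2-D sketch (hypothetical regions, not from the question) that builds the row and column index lists separately before joining them:
import numpy as np

arr2d = np.zeros((5, 5))
# Two rectangular regions given as (row_slice, col_slice) pairs.
regions = [(np.s_[0:2], np.s_[0:2]), (np.s_[3:5], np.s_[2:4])]
values = [1, 2]

rows, cols, vals = [], [], []
for (rs, cs), v in zip(regions, values):
    r, c = np.mgrid[rs, cs]           # dense row/column grids for the region
    rows.append(r.ravel())
    cols.append(c.ravel())
    vals.append(np.broadcast_to(v, r.size))

# Join rows and columns separately, then index once.
arr2d[np.concatenate(rows), np.concatenate(cols)] = np.concatenate(vals)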
In practice, if you need to do this a lot, you'd probably be better off using a simple function to encapsulate the loop you're trying to avoid.
def set_slices(arr, *indices_and_values):
    """Set multiple locations in an array to their corresponding values.

    indices_and_values should be a list of index-value pairs.
    """
    for idx, val in indices_and_values:
        arr[idx] = val
# Your example:
arr = np.zeros(100)
set_slices(arr, (np.s_[2:5], 42), (np.s_[12:29], 55))
(If your only goal is making it look like you are using multiple indices simultaneously, here are two functions that try to do everything for you, including broadcasting and handling multidimensional arrays.)
[1] np.s_: NumPy's index-expression helper for building slice objects.

Numpy array is modified globally by the function without return [duplicate]

This question already has answers here:
Are Python variables pointers? Or else, what are they?
(9 answers)
Closed 3 years ago.
When I use a function that doesn't return anything, the input parameters remain globally unchanged.
For example:
def func(var):
    var += 1

a = 0
for i in range(3):
    func(a)
    print(a)
logically results in
0
0
0
But it doesn't seem to work the same when the input is a numpy array:
import numpy as np

def func(var):
    var += 1

a = np.zeros(3)
for i in range(3):
    func(a)
    print(a)
Output:
[1. 1. 1.]
[2. 2. 2.]
[3. 3. 3.]
Thus, all modifications were applied to array globally, not inside the function. Why is it happening? And, more generally, are there any special rules on how to handle np arrays as functions input?
In Python, any value passed to a function is passed by object reference. So, in the first case, where you pass a number to your function, var is set to a reference to the object that represents the number 0. In Python, even numbers are objects. Incrementing var in this case actually means rebinding it to a reference to the object that represents 0 + 1, i.e. the object that represents 1. Note that the object that represents 0 is not changed; within the function, the reference is replaced.
When you pass a numpy array to your function, it is likewise passed in by object reference. Hence, var now holds a reference to your array a. Incrementing an array with var += 1 means adding 1 to each of its elements. Hence, the actual content of the object that var references has to change. However, var still references the same object after incrementation.
Take a look at the following code:
import numpy as np
def func(vals):
    print('Before inc: ', id(vals))
    vals += 1
    print('After inc: ', id(vals))
When you pass in a number literal, vals is set to a reference of the object representing the respective number. This object has a unique id, which you can obtain using the id function. After incrementation, vals is a reference to the object representing a number one greater than the first one. You can verify that by calling id again after incrementation. So, the output of the above function is something like:
Before inc: 4351114480
After inc: 4351114512
Note that there are two different objects. When you now pass in a numpy array, the resulting ids are the same:
a = np.zeros(3)
func(a)
Before inc: 4496241408
After inc: 4496241408
If you want to modify an array inside of a function, without the changes taking effect outside of the function, you have to copy your array:
def func(vals):
    _vals = vals.copy()
    # doing stuff with `_vals` won't change the array passed in as `vals`
+= for int (and float, str, ...) creates a new value. Such types are known as immutable, because each individual object cannot be changed.
>>> i = 1
>>> id(i)
4550541072
>>> i += 1
>>> id(i) # different id
4550541104
This means incrementing such a value inside a function creates a new value inside the function. Any references outside of the function are unaffected.
+= for np.array (and list, Counter, ...) modifies the content. Such types are known as mutable, because each individual object can be changed.
>>> l = [0, 1, 2, 3]
>>> id(l)
4585425088
>>> l += [4]
>>> id(l)
4585425088
This means incrementing such a value inside a function changes the value of the object visible inside and outside of the function. Any references inside and outside of the function point to the exact same object, and show its changed value.
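A small sketch (not from the answers above) that makes the same contrast inside functions: += mutates the shared array in place, while var = var + 1 rebinds the local name to a brand-new array and leaves the caller's array untouched.
import numpy as np

def inc_inplace(var):
    var += 1          # writes into the caller's array

def inc_rebind(var):
    var = var + 1     # allocates a new array; only the local name changes

a = np.zeros(3)
inc_inplace(a)
print(a)              # [1. 1. 1.] -- change is visible to the caller
inc_rebind(a)
print(a)              # [1. 1. 1.] -- unchanged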

Python 3.7 boolean indexing *warning* using 'list'

Take a very large list such that, for any number of reasons, it and all the rest to come do not fit in available memory; here: A = [ 2, -3, 10, 0.2]
Map the sign of its components: sign_A = list(map(lambda u: (abs(u)==u), A))
You get [True, False, True, True]
Do some logic where you need to operate on abs_A = [abs(e) for e in A]. So you flush A and keep working with sign_A and abs_A. The logic yields the indices of components of interest, say i, k, ..., for the list abs_A.
The problem I have is when using the ternary operator (falsevalue, truevalue)[condition] to do some algebra on the signed components of A, e.g.:
abs_A[i]*(-1, 1)[sign_A[i]] + abs_A[k]**(-1, 1)[sign_A[k]]
# equivalently, can use:
# abs_A[i]*(-1, 1)[np.bool(sign_A[i])] + abs_A[k]**(-1, 1)[np.bool(sign_A[k])]
I get this warning:
DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index.
The warning indirectly tells me that there is probably a better, more "pythonesque" way than my snippet to do this. I found relevant posts (e.g. here and here) but no suggestion as to how I should deal with it. Pointers, anyone?
With lists, the boolean indexing works fine:
In [21]: A = [ 2, -3, 10, 0.2]
In [22]: sign_A = list(map(lambda u: (abs(u)==u), A))
In [23]: abs_A = [abs(e) for e in A]
In [24]: i=0; k=1
In [25]: abs_A[i]*(-1, 1)[sign_A[i]] + abs_A[k]**(-1, 1)[sign_A[k]]
Out[25]: 2.3333333333333335
We do get the warning if we try to index with a numpy boolean:
In [26]: abs_A[i]*(-1, 1)[np.array(sign_A)[i]]
/usr/local/bin/ipython3:1: DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index
#!/usr/bin/python3
Out[26]: 2
We can get around that by making the sign_A array integer right from the start:
In [27]: abs_A[i]*(-1, 1)[np.array(sign_A,dtype=int)[i]]
Out[27]: 2
If we start with an array:
In [28]: B = np.array(A)
The sign array, using where to map directly onto (-1, 1) space:
In [30]: sign_B = np.where(B>=0,1,-1)
In [31]: sign_B
Out[31]: array([ 1, -1, 1, 1])
the abs array:
In [32]: abs_B = np.abs(B)
the recreated array:
In [33]: abs_B*sign_B
Out[33]: array([ 2. , -3. , 10. , 0.2])
To avoid the warning, replace np.bool() with int():
abs_A[i]*(-1, 1)[int(sign_A[i])] + ...
An easy solution here is to use the syntax truevalue if condition else falsevalue in place of (falsevalue, truevalue)[condition].
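Applied to the snippet from the question, that form would read:
abs_A[i]*(1 if sign_A[i] else -1) + abs_A[k]**(1 if sign_A[k] else -1)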
Expanding on the answer by DYZ from Mar 15 '19:
In this case, using np.bool to map the sign values into indices is what triggered
DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index.
Using int() instead resolved the issue.
A wrapper approach is less readable but would have also worked: int(np.bool(sign_A[i]))
In the more general case of NumPy's logical functions, the same warning is triggered, e.g., by using a scalar comparison as an index:
result = X[np.less_equal(a, b)]
Here np.less_equal(a, b) returns an np.bool_ scalar, and X could hold items of a type other than integer. Casting the boolean with int() before indexing resolves the warning in the same way:
result = X[int(np.less_equal(a, b))]
This is how I resolved a warning triggered today within a function definition; I was guided by DYZ's answer.

Finding multiple indexes but the array always has a length of 1

This seems trivial (again) but has me stumped.
I need to find the indexes of multiple values in a numpy array. I can do this with where and isin, but the resulting answer always has a length of 1 regardless of how many indexes are found. Example:
import numpy as np
a = [1,3,5,7,9,11,13,15]
b = [1,7,13]
x = np.where(np.isin(a,b))
print(x)
print(len(x))
this returns
(array([0, 3, 6]),)
1
I think it's because the array is a single item inside a tuple. How do I return just the array?
Just use
x = np.where(np.isin(a,b))[0]
to get what you expect.
As hpaulj points out in the comments, where returns a tuple with one array for each input array dimension; in this case there is only one dimension, which is why x is a tuple of length one.
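A small sketch (not from the answer) of why the tuple exists: for a 2-D input, where returns one index array per dimension.
import numpy as np

m = np.array([[1, 7], [13, 4]])
rows, cols = np.where(np.isin(m, [1, 7, 13]))
print(rows)   # [0 0 1]
print(cols)   # [0 1 0]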

setting an array element with a list

I'd like to create a numpy array with 3 columns (not really), the last of which will be a list of variable lengths (really).
import random
import numpy

N = 2
A = numpy.empty((N, 3))
for i in range(N):
    a = random.uniform(0, 1/2)
    b = random.uniform(1/2, 1)
    c = []
    A[i,] = [a, b, c]
Over the course of execution I will then append to or remove items from the lists. I used numpy.empty to initialize the array since this is supposed to give an object type; even so, I'm getting the 'setting an array element with a sequence' error. I know I am, that's what I want to do.
Previous questions on this topic seem to be about avoiding the error; I need to circumvent the error. The real array has 1M+ rows, otherwise I'd consider a dictionary. Ideas?
Initialize A with
A = numpy.empty((N, 3), dtype=object)
per numpy.empty docs. This is more logical than A = numpy.empty((N, 3)).astype(object) which first creates an array of floats (default data type) and only then casts it to object type.
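A minimal sketch of the fixed setup (hypothetical values): with dtype=object every cell holds an arbitrary Python object, so the last column can store lists of varying length.
import random
import numpy

N = 2
A = numpy.empty((N, 3), dtype=object)
for i in range(N):
    A[i, 0] = random.uniform(0, 1/2)
    A[i, 1] = random.uniform(1/2, 1)
    A[i, 2] = []          # assign cell-by-cell to avoid sequence coercion

A[0, 2].append('item')    # the stored list behaves like any Python list
print(A[0, 2])            # ['item']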
