Finding multiple indexes but the array always has a length of 1 - python-3.x

This seems trivial (again) but has me stumped.
I need to find the indexes of multiple values in a numpy array. I can do this with where and isin but the resulting answer always has a length of 1 regardless of how many indexes are found. Example
import numpy as np
a = [1,3,5,7,9,11,13,15]
b = [1,7,13]
x = np.where(np.isin(a,b))
print(x)
print(len(x))
this returns
(array([0, 3, 6]),)
1
I think it's because the array is a single item inside a tuple. How do I return just the array?

Just use
x = np.where(np.isin(a,b))[0]
to get what you expect.
As hpaulj points out in the comments, where returns a tuple with one array for each dimension of the input array. In this case there is only one dimension, which is why x is a tuple of length one.
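To make the "one array per dimension" point concrete, here is a small sketch. The 2-D array `grid` is made up for illustration and is not part of the original question; `np.flatnonzero` is shown as one possible alternative for the 1-D case.

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11, 13, 15])
b = [1, 7, 13]

# 1-D input: np.where returns a tuple holding a single index array.
result = np.where(np.isin(a, b))
idx = result[0]

# np.flatnonzero returns the index array directly for the flattened input.
flat = np.flatnonzero(np.isin(a, b))

# 2-D input (hypothetical example): the tuple now holds one array per axis.
grid = np.array([[1, 2], [7, 13]])
rows, cols = np.where(np.isin(grid, b))

print(len(result), idx, flat)
print(rows, cols)
```

For a 1-D array either form should give the same indices; which one reads better is a matter of taste.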

Related

Assigning values to discontinued slices in a ndarray

I have a base array that contains data. Some indices in that array need to be re-assigned a new value, and the indices which do are discontinued. I'd like to avoid for-looping over all of that and using the slice notation as it's likely to be faster.
For instance:
arr = np.zeros(100)
sl_obj_1 = slice(2,5)
arr[sl_obj_1] = 42
Works for a single slice. But I have another discontinued slice to apply to that same array, say
sl_obj_2 = slice(12,29)
arr[sl_obj_2] = 55
I would like to accomplish something along the lines of:
arr[sl_obj_1, sl_obj_2] = 42, 55
Any ideas?
EDIT: changed the example to emphasize that the sequences are of varying lengths.
There isn't a good way to directly extract multiple slices from a NumPy array, much less different-sized slices. But you can cheat by converting your slices into indices, and using an index array.
In the case of 1-dimensional arrays, this is relatively simple using index arrays.
import numpy as np
def slice_indices(some_list, some_slice):
    """Convert a slice into indices of a list"""
    return np.arange(len(some_list))[some_slice]
    # For a non-NumPy solution, use this:
    # return range(*some_slice.indices(len(some_list)))
arr = np.arange(10)
# We'll make [1, 2, 3] and [8, 7] negative.
slice1, new1 = np.s_[1:4], [-1, -2, -3]
slice2, new2 = np.s_[8:6:-1], [-8, -7]
# (Here, np.s_ is just a nicer notation for slices.[1])
# Get indices to replace
idx1 = slice_indices(arr, slice1)
idx2 = slice_indices(arr, slice2)
# Use index arrays to assign to all of the indices
arr[[*idx1, *idx2]] = *new1, *new2
# That line expands to this:
arr[[1, 2, 3, 8, 7]] = -1, -2, -3, -8, -7
Note that this doesn't entirely avoid Python iteration: the star operators still create iterators, and the index array is a regular Python list. With large amounts of data, this could be noticeably slower than the manual approach, because it computes every index that will be assigned.
You will also need to make sure the replacement data already has the right shape, or you can use NumPy's manual broadcasting functions (e.g. np.broadcast_to) to fix the shapes. This introduces additional overhead. If you were relying on automatic broadcasting, you're probably better off doing the assignments in a loop.
arr = np.zeros(100)
idx1 = slice_indices(arr, slice(2, 5))
idx2 = slice_indices(arr, slice(12, 29))
new1 = np.broadcast_to(42, len(idx1))
new2 = np.broadcast_to(55, len(idx2))
arr[[*idx1, *idx2]] = *new1, *new2
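As a variation on the index-array idea, the joining and repeating can be kept inside NumPy instead of using star-unpacking. This is only a sketch: the names `slices`, `values`, `index_parts`, and `vals` are made up for illustration, and it assumes a 1-D array with one scalar value per slice.

```python
import numpy as np

arr = np.zeros(100)
slices = [slice(2, 5), slice(12, 29)]
values = [42, 55]

# Turn each slice into explicit indices, then join them into one index array.
index_parts = [np.arange(arr.size)[s] for s in slices]
idx = np.concatenate(index_parts)

# Repeat each value once per index in its slice so the shapes line up.
vals = np.repeat(values, [len(part) for part in index_parts])

# A single fancy-indexed assignment covers both discontinuous ranges.
arr[idx] = vals
```

When the slice bounds are known up front, `np.r_[2:5, 12:29]` builds the same index array in one expression.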
To generalize to more dimensions, slice_indices will need to take care of shape, and you'll have to be more careful about joining multiple sets of indices (rather than arr[[i1, i2, i3]], you'll need arr[[i1, i2, i3], [j1, j2, j3]], which can't be concatenated directly).
In practice, if you need to do this a lot, you'd probably be better off using a simple function to encapsulate the loop you're trying to avoid.
def set_slices(arr, *indices_and_values):
    """Set multiple locations in an array to their corresponding values.
    Each positional argument should be an (index, value) pair.
    """
    for idx, val in indices_and_values:
        arr[idx] = val
# Your example:
arr = np.zeros(100)
set_slices(arr, (np.s_[2:5], 42), (np.s_[12:29], 55))
(If your only goal is making it look like you are using multiple indices simultaneously, here are two functions that try to do everything for you, including broadcasting and handling multidimensional arrays.)
[1] np.s_

How to split list into numpy array?

A basic question about populating np arrays from a list:
m is a numpy array with shape (4,486,9).
d is a list with length 23328 and a varying number of items for each index.
I am iterating through m on dimension 1 and 2 and d on dimension 1.
I want to import 9 "columns" from particular lines of d at constant intervals, into m. 6 of those columns are successive, they are shown below with index "some_index".
What I have done below works okay but looks really heavy in syntax, and just wrong. There must be a way to export the successive columns more efficiently?
import numpy as np
m=np.empty((4,486,9))
d=[] #list filled in from files
#some_index is an integer incremented in the loops following some conditions
#some_other_index is another integer incremented in the loops following some other conditions
for i in something:
    for j in another_thing:
        m[i][j]=[d[some_index][-7], d[some_index][-6], d[some_index][-5], d[some_index][-4], d[some_index][-3], d[some_index][-2], d[some_other_index][4], d[some_other_index][0], d[some_other_index][4]]
Without much imagination, I tried the following attempts, neither of which works, because a NumPy array needs a comma to separate the items:
for i in something:
    for j in another_thing:
        m[i][j]=[d[some_index][-7:-1], d[some_other_index][4], d[some_other_index][0], d[some_other_index][4]]
ValueError: setting an array element with a sequence.
m[i][j]=[np.asarray(d[some_index][-7:-1]), d[some_other_index][4], d[some_other_index][0], d[some_other_index][4]]
ValueError: setting an array element with a sequence.
Thanks for your help.
Is this what you are looking for?
You can make use of numpy arrays to select multiple elements at once.
I have taken the liberty of creating some data to make sure we are doing the right thing.
import numpy as np
m=np.zeros((4,486,9))
d=[[2,1,2,3,1,12545,45,12], [12,56,34,23,23,6,7,4,173,47,32,3,4], [7,12,23,47,24,13,1,2], [145,45,23,45,56,565,23,2,2],
[54,13,65,47,1,45,45,23], [125,46,5,23,2,24,23,5,7]] #list filled in from files
d = [np.asarray(row) for row in d] # this is where the solution lies: each row becomes an array that accepts a list of indices
something = [2,3]
another_thing = [10,120,200]
some_index = 0
some_other_index = 5
select_elements = [-7,-6,-5,-4,-3,-2,4,0,4] # this is the order in which you are selecting the elements
for i in something:
    for j in another_thing:
        print('i:{}, j:{}'.format(i, j))
        m[i,j,:]=d[some_index][select_elements]
Also, I noticed you were indexing this way m[i][j] = .... You can do the same with m[i,j,:] = ...
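If the nine values come from two different rows, as in the question, one way to sketch it is to slice the successive columns from the first row and concatenate them with the picked columns from the second. The names `row` and `other` stand in for d[some_index] and d[some_other_index], and the data is the sample data from this answer, not the asker's real files.

```python
import numpy as np

m = np.zeros((4, 486, 9))

# Stand-ins for d[some_index] and d[some_other_index] (assumed sample data).
row = np.array([12, 56, 34, 23, 23, 6, 7, 4, 173, 47, 32, 3, 4])
other = np.array([125, 46, 5, 23, 2, 24, 23, 5, 7])

# Six successive columns from one row, then three picked columns from another.
# np.concatenate flattens both pieces into a single length-9 vector.
m[2, 10] = np.concatenate([row[-7:-1], other[[4, 0, 4]]])

print(m[2, 10])
```

The original attempts failed because the slice produced a nested sequence inside the outer list; concatenating yields one flat sequence of nine numbers instead.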

How can I logically test the output of a np.where result?

I was trying to scan an array for values and take action depending on the result. However, when I had a closer look at what the code was doing I noticed that my logical condition was ill posed.
I will illustrate what I mean with the following example:
#importing numpy
import numpy as np
#creating a test array
a = np.zeros((3,3))
#searching items bigger than 1 in 'a'
index = np.where(a > 1)
I was expecting my index to return an empty list. In fact it returns a tuple object, like:
index
Out[5]: (array([], dtype=int64), array([], dtype=int64))
So, the test I was imposing:
#testing if there are values
#in 'a' that fulfil the where condition
if index[0] != []:
    print('Values found.')
#testing if there are no values
#in 'a' that fulfil the where condition
if index[0] == []:
    print('No values found.')
Will not achieve its purpose because I was comparing different objects (is that correct to say?).
So what is the correct way to create this test?
Thanks for your time!
For your 2D array, np.where returns a tuple of arrays of indices (one for each axis), so that a[index] gives you an array of the elements fulfilling the condition.
Indeed, you compared an empty list to an empty array. Instead, I would compare the size property (or e.g. len()) of the first element of this tuple:
if index[0].size == 0:
    print('No values found.')
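A short sketch of both outcomes may help. Comparing an array with a list is evaluated elementwise rather than producing a single True/False, which is why the size check is the safer test. The helper `has_match` is a made-up name for illustration; testing the boolean mask with `.any()` is an alternative that skips np.where entirely.

```python
import numpy as np

def has_match(mask):
    """Return True if at least one element of a boolean mask is True."""
    return bool(mask.any())

a = np.zeros((3, 3))

# No element exceeds 1, so every index array in the tuple is empty.
index = np.where(a > 1)
nothing_found = index[0].size == 0

# After setting one element, the same test reports a hit.
a[1, 2] = 5
index_after = np.where(a > 1)
something_found = index_after[0].size > 0

if something_found:
    print('Values found at rows', index_after[0], 'cols', index_after[1])
```

If only the yes/no answer is needed, `has_match(a > 1)` avoids building the index arrays at all.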

setting an array element with a list

I'd like to create a numpy array with 3 columns (not really), the last of which will be a list of variable lengths (really).
import random
import numpy
N = 2
A = numpy.empty((N, 3))
for i in range(N):
    a = random.uniform(0, 1/2)
    b = random.uniform(1/2, 1)
    c = []
    A[i,] = [a, b, c]
Over the course of execution I will then append or remove items from the lists. I used numpy.empty to initialize the array since this is supposed to give an object type, even so I'm getting the 'setting an array with a sequence error'. I know I am, that's what I want to do.
Previous questions on this topic seem to be about avoiding the error; I need to circumvent the error. The real array has 1M+ rows, otherwise I'd consider a dictionary. Ideas?
Initialize A with
A = numpy.empty((N, 3), dtype=object)
per numpy.empty docs. This is more logical than A = numpy.empty((N, 3)).astype(object) which first creates an array of floats (default data type) and only then casts it to object type.
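A minimal sketch of how the loop might look with an object array. Assigning each cell individually is a choice made here to keep it unambiguous that the third column stores the list object itself; it is not something the original answer prescribes.

```python
import random
import numpy as np

N = 2
A = np.empty((N, 3), dtype=object)

for i in range(N):
    A[i, 0] = random.uniform(0, 1/2)
    A[i, 1] = random.uniform(1/2, 1)
    A[i, 2] = []          # each row gets its own, independent list

# The stored lists can be mutated in place later on.
A[0, 2].append(7)
print(A)
```

Keep in mind that an object array stores references to Python objects, so it does not get the speed or memory benefits of a numeric array.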

How to get list of indices for elements whose value is the maximum in that list

Suppose I have a list l=[3,4,4,2,1,4,6]
I would like to obtain a subset of this list containing the indices of elements whose value is max(l).
In this case, list of indices will be [1,2,5].
I am using this approach to solve a problem where, a list of numbers are provided, for example
l=[1,2,3,4,3,2,2,3,4,5,6,7,5,4,3,2,2,3,4,3,4,5,6,7]
I need to identify the element that occurs most often; however, in case more than one element appears the same number of times,
I need to choose the element which is greater in magnitude,
suppose I apply a counter on l and get {1:5,2:5,3:4...}, I have to choose '2' instead of '1'.
Please suggest how to solve this
Edit-
The problem begins like this,
1) a list is provided as an input
l=[1 4 4 4 5 3]
2)I run a Counter on this to obtain the counts of each unique element
3)I need to obtain the key whose value is maximum
4)Suppose the Counter object contains multiple entries whose value is maximum,
as in Counter{1:4,2:4,3:4,5:1}
I have to choose 3 as the key whose value is 4.
5)So far, I have been able to get the Counter object, and I have separated the key/value lists using k=counter.keys();v=counter.values()
6)I want to get the indices whose values are max in v
If I run v.index(max(v)), I get the first index whose value matches max value, but I want to obtain the list of indices whose value is max, so that I can obtain corresponding list of keys and obtain max key in that list.
With long lists, using NumPy or another array library would be helpful; otherwise you can simply use either
l.index(max(l))
or
max(range(len(l)),key=l.__getitem__)
These however return only one of the many argmax's.
So for your problem, since you want the maximum that appears last, you can search the reversed list:
len(l)-l[::-1].index(max(l))-1
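To get every index of the maximum rather than just one, a NumPy sketch along these lines could work. It uses the Counter from step 4 of the question; the names `keys`, `counts`, `tied`, and `best_key` are made up for illustration.

```python
from collections import Counter
import numpy as np

counter = Counter({1: 4, 2: 4, 3: 4, 5: 1})

# keys() and values() iterate in the same order, so positions line up.
keys = np.array(list(counter.keys()))
counts = np.array(list(counter.values()))

# All positions whose count equals the maximum count.
tied = np.flatnonzero(counts == counts.max())

# Among the tied keys, pick the largest one.
best_key = keys[tied].max()

print(tied, best_key)
```

Without NumPy, a list comprehension over enumerate(v) that keeps the positions equal to max(v) gives the same set of indices.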
If I understood correctly, the following should do what you want.
from collections import Counter
def get_largest_most_freq(lst):
    c = Counter(lst)
    # get the largest frequency
    freq = max(c.values())
    # get list of all the values that occur freq times
    items = [k for k, v in c.items() if v == freq]
    # return largest most frequent item
    return max(items)
def get_indexes_of_most_freq(lst):
    _max = get_largest_most_freq(lst)
    # get list of all indexes that have a value matching _max
    return [i for i, v in enumerate(lst) if v == _max]
>>> lst = [3,4,4,2,1,4,6]
>>> get_largest_most_freq(lst)
4
>>> get_indexes_of_most_freq(lst)
[1, 2, 5]
>>> lst = [1,2,3,4,3,2,2,3,4,5,6,7,5,4,3,2,2,3,4,3,4,5,6,7]
>>> get_largest_most_freq(lst)
3
>>> get_indexes_of_most_freq(lst)
[2, 4, 7, 14, 17, 19]
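The tie-breaking rule can also be folded into a single max call by comparing (count, value) tuples. This is just an alternative sketch; `largest_most_frequent` is a made-up name and not part of the answer above.

```python
from collections import Counter

def largest_most_frequent(lst):
    """Return the most frequent element, preferring the larger one on ties."""
    counts = Counter(lst)
    # Tuples compare left to right: first by count, then by the value itself.
    return max(counts, key=lambda k: (counts[k], k))

print(largest_most_frequent([3, 4, 4, 2, 1, 4, 6]))
print(largest_most_frequent([1, 1, 2, 2]))
```

The second call has a tie between 1 and 2, so the tuple comparison falls through to the value and picks the larger element.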
