How to split list into numpy array? - python-3.x

A basic question about populating np arrays from a list:
m is a numpy array with shape (4,486,9).
d is a list with length 23328 and a varying number of items for each index.
I am iterating through m on dimension 1 and 2 and d on dimension 1.
I want to import 9 "columns" from particular lines of d, at constant intervals, into m. Six of those columns are successive; they are shown below with index "some_index".
What I have done below works, but the syntax looks heavy and just wrong. Is there a more efficient way to export the successive columns?
import numpy as np
m = np.empty((4, 486, 9))  # the shape must be passed as a tuple
d = []  # list filled in from files
# some_index is an integer incremented in the loops following some conditions
# some_other_index is another integer incremented in the loops following some other conditions
for i in something:
    for j in another_thing:
        m[i][j] = [d[some_index][-7], d[some_index][-6], d[some_index][-5], d[some_index][-4], d[some_index][-3], d[some_index][-2], d[some_other_index][4], d[some_other_index][0], d[some_other_index][4]]
Without much imagination, I tried the following, which does not work because the np array needs a comma to differentiate the items:
for i in something:
    for j in another_thing:
        m[i][j] = [d[some_index][-7:-1], d[some_other_index][4], d[some_other_index][0], d[some_other_index][4]]
ValueError: setting an array element with a sequence.
        m[i][j] = [np.asarray(d[some_index][-7:-1]), d[some_other_index][4], d[some_other_index][0], d[some_other_index][4]]
ValueError: setting an array element with a sequence.
Thanks for your help.

Is this what you are looking for?
You can make use of numpy arrays to select multiple elements at once.
I have taken the liberty to create some data in order to make sure we are doing the right thing
import numpy as np
m = np.zeros((4, 486, 9))
d = [[2,1,2,3,1,12545,45,12], [12,56,34,23,23,6,7,4,173,47,32,3,4], [7,12,23,47,24,13,1,2], [145,45,23,45,56,565,23,2,2],
     [54,13,65,47,1,45,45,23], [125,46,5,23,2,24,23,5,7]]  # list filled in from files
# This is where the solution lies: turn every row into a numpy array.
# The rows differ in length, so d stays a list of arrays rather than one 2-D array.
d = [np.asarray(row) for row in d]
something = [2, 3]
another_thing = [10, 120, 200]
some_index = 0
some_other_index = 5
select_elements = [-7, -6, -5, -4, -3, -2, 4, 0, 4]  # this is the order in which you are selecting the elements
for i in something:
    for j in another_thing:
        print('i:{}, j:{}'.format(i, j))
        m[i, j, :] = d[some_index][select_elements]  # one index list picks all nine items
Also, I noticed you were indexing this way m[i][j] = .... You can do the same with m[i,j,:] = ...
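Note that the loop above takes all nine values from d[some_index], while the question takes the last three from d[some_other_index]. Here is a minimal sketch of one way to combine both lines, assuming the same sample data; the helper name build_row and the fixed index values are made up for illustration and are not part of the original answer.

```python
import numpy as np

# Sample ragged data (same values as in the answer above); rows differ in length.
d = [[2, 1, 2, 3, 1, 12545, 45, 12],
     [12, 56, 34, 23, 23, 6, 7, 4, 173, 47, 32, 3, 4],
     [7, 12, 23, 47, 24, 13, 1, 2],
     [145, 45, 23, 45, 56, 565, 23, 2, 2],
     [54, 13, 65, 47, 1, 45, 45, 23],
     [125, 46, 5, 23, 2, 24, 23, 5, 7]]

m = np.zeros((4, 486, 9))
some_index = 0        # assumed values; in the question these change inside the loops
some_other_index = 5


def build_row(d, some_index, some_other_index):
    """Hypothetical helper: six successive items from one line, three picked items from another."""
    successive = d[some_index][-7:-1]                       # items -7 .. -2
    picked = [d[some_other_index][k] for k in (4, 0, 4)]    # items 4, 0, 4
    return np.concatenate((successive, picked))             # flat array of 9 values


row = build_row(d, some_index, some_other_index)
m[2, 10, :] = row
print(row)
```

np.concatenate flattens the slice and the picked items into a single 9-element row, which avoids the "setting an array element with a sequence" error that a nested list triggers.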

Related

append vectors to empty list

In a for loop, I iteratively read one vector from a file, which I then want to put in a list or numpy array. I don't really understand how this process works for numpy arrays or lists. Since I know numpy arrays are not meant to change size, I wanted to use an empty list and iteratively append the vector I'm reading.
import numpy as np
a = np.array([[1],[2],[3]])
b = np.array([[2],[3],[4]])
c = timeStep = list()
c = c.append(a)
c = c.append(b)
The example above describes what I would like to do but, when I print the c list after appending a, the terminal shows there is nothing inside.
If you want to create a list, I suggest you use:
c = []  # this is an empty list in Python
Now, if you want to append a numpy array to that list, you should use:
c.append(YourNumpyArray)
Your code should look like this:
a = np.array([[1],[2],[3]])
b = np.array([[2],[3],[4]])
c = []
c.append(a)
c.append(b)
print(c)
Notice that you don't need to write c = c.append(...).
As you can also see in the figure, append doesn't return anything (it returns None). It appends to the original list c in place, so c = c.append(a) overwrites the list with None.
This should be the working code for you.
import numpy as np
a = np.array([[1],[2],[3]])
b = np.array([[2],[3],[4]])
c = timeStep = list()
c.append(a)
c.append(b)
On a side note, yes you can use python lists with numpy arrays. If the number of elements in the list c is fixed or pre-known, you can make it a numpy array too.
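As a small sketch of both points (append returns None, and a list of equally shaped arrays can be turned into one array afterwards), assuming the two vectors from the question; the names vectors, result and stacked are illustrative only.

```python
import numpy as np

a = np.array([[1], [2], [3]])
b = np.array([[2], [3], [4]])

vectors = []                  # plain Python list; append mutates it in place
result = vectors.append(a)    # the return value of append is None
vectors.append(b)

# Once all vectors are collected, join them into a single array.
stacked = np.hstack(vectors)  # two (3, 1) columns side by side -> shape (3, 2)

print(result)
print(len(vectors))
print(stacked.shape)
```

Collecting in a list and converting once at the end is usually cheaper than growing a numpy array inside the loop, because every np.append or np.concatenate call copies the whole array.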

Sliding window over a string using python

I am working on a dataset as a part of my course practice and am stuck in a particular step. I have tried that using R, but I wish to do the same in python. I am comparatively new to python and so require help.
The data set has a column named 'Seq' with 5000+ records. Another column, named 'MainSeq', contains the sequence in which each Seq value occurs as a substring. I need to check the presence of Seq in MainSeq based on the given start position and then print the 7 letters before and after each letter of Seq, i.e.:
I have a value in column 'MainSeq' such as 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
Column 'Seq' contains the value JKLMNO.
Start position of J = 10 and of O = 15.
I need to create a new column that takes the 7 letters before and after each letter from J to O, so that every entry has a total length of 15:
CDEFGHI**J**KLMNOPQ
DEFGHIJ**K**LMNOPQR
EFGHIJK**L**MNOPQRS
FGHIJKL**M**NOPQRST
GHIJKLM**N**OPQRSTU
HIJKLMN**O**PQRSTUV
I know to apply the logic on a specific seq. But since I have around 5000+ seq records, I need to figure out a way to apply the same on all the seq records.
seq = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
i = seq.index('J')
j = seq.index('O')
value = 7
for mid in range(i, 1+j):
    print(seq[mid-value:mid+value+1])
I'm not sure this will do exactly what you want, since you haven't supplied much data to test with, but it might work or at least give you a start.
import pandas as pd
df = pd.DataFrame({'MainSeq':['ABCDEFGHIJKLMNOPQRSTUVWXYZ','ABCDEFGHIJKLMNOPQRSTUVWXYZ'], 'Seq':'JKLMNO'})
def get_sequences(seq, letters, value):
    # one window of `value` letters on each side of every letter in `letters`
    sequences = [seq[seq.index(letter)-value:seq.index(letter)+value+1] for letter in letters]
    return sequences
df['new_seq'] = df.apply(lambda row : get_sequences(row['MainSeq'], row['Seq'], 7), axis = 1)
df = df.explode('new_seq')  # one row per window
print(df)
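One caveat with the approach above: seq.index(letter) always finds the first occurrence of a letter, so it can pick the wrong position when a letter repeats in MainSeq, and a negative start index makes the slice come back empty near the beginning of the string. The sketch below is one possible alternative that works from the position of the whole substring and clamps the left edge; the function name windows and the flank parameter are illustrative assumptions, not part of the original answer.

```python
def windows(main_seq, sub_seq, flank=7):
    """Return one window per letter of sub_seq, with `flank` letters on each side."""
    start = main_seq.find(sub_seq)        # 0-based position of the whole substring
    if start == -1:                       # substring not present
        return []
    result = []
    for mid in range(start, start + len(sub_seq)):
        left = max(0, mid - flank)        # clamp so the slice never starts below 0
        result.append(main_seq[left:mid + flank + 1])
    return result


main_seq = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
result = windows(main_seq, 'JKLMNO')
for window in result:
    print(window)
```

With a DataFrame, the same function could be used in place of get_sequences inside df.apply, since it takes the same first two arguments.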

Extracting values of a key from an array of dictionaries

I want to read from a .npy file to do some signal processing tasks but during this task I received this error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
this is my code:
import numpy as np
import matplotlib.pyplot as plt
file = '/signal/data.npy'
d = np.load(file,allow_pickle=True,encoding = 'latin1')
d['soma'][0]
There are similar questions, but I could not use them to solve this one. Can anyone help me fix it?
Thanks
This is the error:
This is part of my data (d is equal to res):
Your data consists of arrays of dictionaries. For each array you have some keys with their values.
The solution, as @hpaulj said, is:
res[array_index]["your_key"]
You have a numpy array d and you are trying to index it with a string such as "soma", which is not possible. The numpy indexing rule is:
only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices.
If your numpy array contains dictionaries, you need to extract the dictionaries first. d['soma'] does not extract elements of a numpy array.
This loops over array d and extracts the first element of the value stored under key 'soma', for every dictionary in d that has the key 'soma':
lfp = [i['soma'][0] for i in d if 'soma' in i]
And if it is a dataframe instead of a numpy array, try:
import pandas as pd
d = pd.read_pickle(file)
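A minimal sketch of the two access patterns, using a small made-up object array in place of the loaded file; the keys 'soma' and 'dend' and all values are invented for illustration. It also covers the common case where a single dictionary was saved, which np.load returns as a 0-dimensional object array.

```python
import numpy as np

# Hypothetical stand-in for np.load(file, allow_pickle=True): an object array of dicts.
d = np.array([{'soma': [0.1, 0.2], 'dend': [1.0]},
              {'dend': [2.0]},
              {'soma': [0.3, 0.4]}], dtype=object)

# Index the array with an integer first, then the dictionary with its key.
first = d[0]['soma'][0]

# Collect the first 'soma' value from every dictionary that has that key.
lfp = [record['soma'][0] for record in d if 'soma' in record]

# If a single dict was saved, the array is 0-dimensional; .item() unwraps it.
single = np.array({'soma': [0.5, 0.6]}, dtype=object)
unwrapped = single.item()['soma'][0]

print(first, lfp, unwrapped)
```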

Awkward arrays; choosing an item from each row according to a list of indices

So the challenge is this: given an awkward array with n rows and a list of n indices (i_1 to i_n), return a list containing element i_m of row_m for all rows.
This could be done like this:
import awkward
some_awkward_array = awkward.fromiter([[1,2],[3,4,5],[6]])
some_indices = [0,2,0]
desired_elements = [row[i] for row, i in zip(some_awkward_array, some_indices)]
assert desired_elements == [1,5,6]
But if this were a numpy array, we would have access to choose, and so we could do:
import numpy as np
some_numpy_array = np.array([[1,2,0],[3,4,5],[6,0,0]])
some_indices = [0,2,0]
desired_elements = np.choose(some_indices, some_numpy_array.T)
assert desired_elements.tolist() == [1,5,6]
The second version seems to scale better; it becomes faster at around 12 rows. Is there an equivalent option for an awkward array?
Edit: maybe this is an IndexedMaskedArray thing, but I can't get it to do what I want.
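For the NumPy half of the comparison only, the same per-row selection can also be written with integer array indexing or np.take_along_axis, without transposing. This is a sketch of the padded NumPy case under the question's sample data and does not answer the awkward-array part.

```python
import numpy as np

some_numpy_array = np.array([[1, 2, 0], [3, 4, 5], [6, 0, 0]])
some_indices = np.array([0, 2, 0])

# Pair every row number with the column index wanted from that row.
rows = np.arange(len(some_numpy_array))
by_fancy_indexing = some_numpy_array[rows, some_indices]

# Equivalent: take one element per row along axis 1, then drop the extra axis.
by_take = np.take_along_axis(some_numpy_array, some_indices[:, None], axis=1)[:, 0]

print(by_fancy_indexing, by_take)
```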

Finding multiple indexes but the array always has a length of 1

This seems trivial (again) but has me stumped.
I need to find the indexes of multiple values in a numpy array. I can do this with where and isin, but the result always has a length of 1, regardless of how many indexes are found. Example:
import numpy as np
a = [1,3,5,7,9,11,13,15]
b = [1,7,13]
x = np.where(np.isin(a,b))
print(x)
print(len(x))
this returns
(array([0, 3, 6]),)
1
I think it's because the array is a single item inside a tuple. How do I return just the array?
Just use
x = np.where(np.isin(a,b))[0]
to get what you expect.
As hpaulj points out in the comments, where returns a tuple with one array for each dimension of the input array. In this case there is only one dimension, which is why x is a tuple of length one.
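To illustrate the tuple-per-dimension behaviour, here is a short sketch using the data from the question, plus a reshaped 2-D copy that is added here purely for illustration. np.flatnonzero is shown as an alternative that returns the index array directly for 1-D input.

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11, 13, 15])
b = [1, 7, 13]
mask = np.isin(a, b)

idx = np.where(mask)[0]        # unpack the single array from the 1-tuple
flat = np.flatnonzero(mask)    # same indices, no tuple involved

# With a 2-D input, where returns one index array per dimension.
grid = a.reshape(2, 4)
rows, cols = np.where(np.isin(grid, b))

print(idx, flat)
print(rows, cols)
```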
