Index a 2D matrix with a 1D matrix - python-3.x

I have a 2D matrix of values named matrix1 as shown below:
col1 col2 col3
   1    1    0
   2    1    2
I have a 1D matrix of values named arr1 as shown below:
col1
10
20
30
I would like to use values from this 2D matrix to index values from a 1D matrix, creating a new 2D matrix in the process.
new_col1 new_col2 new_col3
      20       20       10
      30       20       30
The actual array is shaped (512, 1) and the matrix is shaped (65672, 720). I have tried using arr1[matrix1], but I end up getting a memory error.

A Python 3 solution:
import numpy as np
x = np.array([[1, 1, 0], [2, 1, 2]])
y = np.array([10, 20, 30])
y[x]
Output:
array([[20, 20, 10],
[30, 20, 30]])

It turned out I was using a 32-bit Python interpreter instead of a 64-bit one (I am using a virtual environment in PyCharm). Switching the interpreter to 64-bit fixed the memory error.
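If switching to a 64-bit interpreter is not an option, shrinking the result's dtype is another lever. A rough sketch (the shapes are taken from the question; the arithmetic is only illustrative):

```python
import numpy as np

# Shapes from the question (not allocated here, just used for arithmetic):
rows, cols = 65672, 720

# A float64 result of that shape needs rows * cols * 8 bytes:
bytes_f64 = rows * cols * np.dtype(np.float64).itemsize
print(bytes_f64 / 1e6)  # ≈ 378 (MB), close to a 32-bit process's practical limit

# Casting the 1-D lookup array to float32 before indexing halves the
# result's memory, since the fancy-indexed result inherits its dtype:
arr1 = np.arange(512, dtype=np.float64)
small = arr1.astype(np.float32)[np.array([[1, 1, 0], [2, 1, 2]])]
print(small.dtype)  # float32
```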

Related

How to extract and remove few range of indices from a two dimensional numpy array in python

I am stuck on a problem that needs to be resolved. I have created a two-dimensional matrix from a continuous range of values. Now I want to extract a few ranges of indices from that 2D matrix. Suppose I have a matrix like:
a = [[ 12   4  35   0  26  15 100]
     [ 17  37  29  87  46  95 120]]
Now I want to delete some parts based on indices, for example index 2 to 5 and 8:10. After deleting, I want my array returned with the same two dimensions. Thank you in advance.
I have tried several approaches, like numpy stacking and concatenating, but I could not solve the problem.
Deleting columns of a numpy array is relatively straightforward.
Using a corrected example from the question, it looks like this:
import numpy as np
a = np.array([
    [12, 4, 35, 0, 26, 15, 100],
    [17, 37, 29, 87, 46, 95, 120]])
print('first array:')
print(a)
# deletes columns 2 and 3 from every row
b = np.delete(a, [2, 3], 1)
print('second array:')
print(b)
which gives this:
first array:
[[ 12 4 35 0 26 15 100]
[ 17 37 29 87 46 95 120]]
second array:
[[ 12 4 26 15 100]
[ 17 37 46 95 120]]
So columns 2 and 3 have been removed in the above example.
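To delete a whole range of columns at once, as the question asks (e.g. index 2 to 5), np.delete also accepts a slice, which can be built with np.s_. A sketch using the same array:

```python
import numpy as np

a = np.array([
    [12, 4, 35, 0, 26, 15, 100],
    [17, 37, 29, 87, 46, 95, 120]])

# np.s_[2:5] builds the slice object 2:5, so columns 2, 3 and 4 go at once
b = np.delete(a, np.s_[2:5], axis=1)
print(b)
# [[ 12   4  15 100]
#  [ 17  37  95 120]]
```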

Generate conditional lists of lists in Pandas, "Pythonically"

I want to generate a conditional list of lists. The number of embedded lists is determined by the number of unique conditions, and each embedded list contains values from a given condition.
I can generate this list of lists using a for-loop; see the code below. However, I am looking for a faster, more Pythonic (i.e., loop-free) approach.
import pandas as pd
from random import randint
example_conditions = ["A","A","B","B","B","C","D","D","D","D"]
example_values = [randint(-100,100) for _ in example_conditions ]
df = pd.DataFrame({
    "conditions": example_conditions,
    "values": example_values
})
lol = []
for condition in df["conditions"].unique():
    sublist = df.loc[df["conditions"] == condition]["values"].values.tolist()
    lol.append(sublist)
Thanks!
Try:
x = df.groupby("conditions")["values"].agg(list).to_list()
print(x)
Prints:
[[-1, 78], [33, 74, -79], [59], [-32, -2, 52, -66]]
Input dataframe:
conditions values
0 A -1
1 A 78
2 B 33
3 B 74
4 B -79
5 C 59
6 D -32
7 D -2
8 D 52
9 D -66
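One caveat worth checking against your data: groupby sorts the group keys by default, whereas the original for-loop follows first-appearance order. Passing sort=False reproduces the loop's ordering. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "conditions": ["B", "B", "A", "A", "A", "C"],
    "values": [1, 2, 3, 4, 5, 6],
})

# sort=False keeps groups in first-appearance order (B, A, C),
# matching what iterating over df["conditions"].unique() would produce.
x = df.groupby("conditions", sort=False)["values"].agg(list).to_list()
print(x)
```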

Iterating over a list of arrays to use as input in a function

In the code below, how can I iterate over y to get all the 5 groups of 2 arrays each to use as input to func?
I know I could just do :
func(y[0],y[1])
func(y[2],y[3])...etc....
But I can't write those lines out by hand, because y can contain hundreds of arrays.
import numpy as np
import itertools
# creating an array with 100 samples
array = np.random.rand(100)
# making the array an iterator
iter_array = iter(array)
# Creating a list of lists to store 10 lists of 10 elements each
n = 10
result = [[] for _ in range(n)]
# Filling the lists
for _ in itertools.repeat(None, 10):
    for i in range(n):
        result[i].append(next(iter_array))
# Casting the lists to arrays
y = np.array([np.array(xi) for xi in result], dtype=object)
# list to store the results of the calculation below
result_func = []
# applying a function that takes 2 arrays as input
# I have 10 arrays within y, so I need to perform the function below 5 times: [0,1],[2,3],[4,5],[6,7],[8,9]
a = func(y[0], y[1])
# Saving the result
result_func.append(a)
You could use list comprehension:
result_func = [func(y[i], y[i+1]) for i in range(0, 10, 2)]
or the general for loop:
for i in range(0, 10, 2):
    result_func.append(func(y[i], y[i+1]))
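The same pairing can also be written with zip over strided slices, which avoids hard-coding the length. A small sketch (func here is a hypothetical stand-in for the question's function):

```python
import numpy as np

def func(a, b):
    # hypothetical stand-in for the question's function
    return int(a.sum() + b.sum())

y = [np.arange(3), np.arange(3, 6), np.arange(6, 9), np.arange(9, 12)]

# y[::2] gives elements 0, 2, 4, ...; y[1::2] gives 1, 3, 5, ...
# zip pairs them up as (y[0], y[1]), (y[2], y[3]), ...
result_func = [func(a, b) for a, b in zip(y[::2], y[1::2])]
print(result_func)  # [15, 51]
```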
Because of numpy's fill order when reshaping, you could reshape the array to have:
- a variable depth (depending on the number of arrays),
- a height of two,
- the same width as the number of elements in each input row.
Thus, when filling, numpy will fill two rows before needing to increase the depth by one.
Iterating over this array yields a series of matrices (one per depthwise layer). Each matrix has two rows, which come out to be y[0] and y[1], then y[2] and y[3], and so on.
For example's sake, say the inner arrays each have length 6 and that there are 8 of them in total (so there are 4 function calls):
import numpy as np
elems_in_row = 6
y = np.array([
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
    [25, 26, 27, 28, 29, 30],
    [31, 32, 33, 34, 35, 36],
    [37, 38, 39, 40, 41, 42],
    [43, 44, 45, 46, 47, 48],
])
# the `-1` makes the number of rows be inferred from the input array.
y2 = y.reshape((-1,2,elems_in_row))
for ar1, ar2 in y2:
    print("1st:", ar1)
    print("2nd:", ar2)
    print("")
output:
1st: [1 2 3 4 5 6]
2nd: [ 7 8 9 10 11 12]
1st: [13 14 15 16 17 18]
2nd: [19 20 21 22 23 24]
1st: [25 26 27 28 29 30]
2nd: [31 32 33 34 35 36]
1st: [37 38 39 40 41 42]
2nd: [43 44 45 46 47 48]
As a sidenote, if your function outputs simple values (like integers or floats) and does not have side-effects like IO, it may perhaps be possible to use apply_along_axis to create the output array directly without explicitly iterating over the pairs.

How can I use Python to convert an adjacency matrix to a transition matrix?

I am trying to convert a matrix like
1 1 0
0 1 1
0 1 1
to become
1 ⅓ 0
0 ⅓ ½
0 ⅓ ½
I was thinking about summing the rows and then dividing by them, but I was wondering if there was a better way to accomplish this using numpy or any other way in Python.
You can do it with numpy like below:
import numpy as np
arr = np.array([[1, 1, 0],
                [0, 1, 1],
                [0, 1, 1]])
print(arr/arr.sum(axis=0))
[[1.         0.33333333 0.        ]
 [0.         0.33333333 0.5       ]
 [0.         0.33333333 0.5       ]]
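One refinement worth considering: if any column sums to zero (a node with no edges in that direction), the division emits a warning and produces NaNs. A sketch that guards against that with np.where, assuming zero-sum columns should simply stay zero:

```python
import numpy as np

arr = np.array([[1, 1, 0],
                [0, 1, 0],
                [0, 1, 0]])  # last column is all zeros

col_sums = arr.sum(axis=0)
# Replace zero sums with 1 so the division is safe; those columns stay 0.
safe = np.where(col_sums == 0, 1, col_sums)
transition = arr / safe
print(transition)
```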

Getting count of rows using groupby in Pandas

I have two columns in my dataset, col1 and col2. I want to display data grouped by col1.
For that I have written code like:
grouped = df[['col1','col2']].groupby(['col1'], as_index= False)
The above code creates the groupby object.
How do I use the object to display the data grouped as per col1?
To get the counts by group, you can use dataframe.groupby('column').size().
Example:
In [10]: df = pd.DataFrame({'id': [123, 512, 'zhub1', 12354.3, 129, 753, 295, 610],
                            'colour': ['black', 'white', 'white', 'white',
                                       'black', 'black', 'white', 'white'],
                            'shape': ['round', 'triangular', 'triangular', 'triangular', 'square',
                                      'triangular', 'round', 'triangular']
                            }, columns=['id', 'colour', 'shape'])
In [11]:df
Out[11]:
id colour shape
0 123 black round
1 512 white triangular
2 zhub1 white triangular
3 12354.3 white triangular
4 129 black square
5 753 black triangular
6 295 white round
7 610 white triangular
In [12]:df.groupby('colour').size()
Out[12]:
colour
black 3
white 5
dtype: int64
In [13]:df.groupby('shape').size()
Out[13]:
shape
round 2
square 1
triangular 5
dtype: int64
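An equivalent one-liner for the same tallies is value_counts on the column, which orders the result from most to least frequent by default. A small sketch with just the colour column:

```python
import pandas as pd

df = pd.DataFrame({
    "colour": ["black", "white", "white", "white",
               "black", "black", "white", "white"],
})

# Same counts as df.groupby('colour').size(), sorted by frequency.
counts = df["colour"].value_counts()
print(counts)
```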
Try the groups attribute and the get_group() method of the object returned by groupby():
>>> import numpy as np
>>> import pandas as pd
>>> anarray=np.array([[0, 31], [1, 26], [0, 35], [1, 22], [0, 41]])
>>> df = pd.DataFrame(anarray, columns=['is_female', 'age'])
>>> by_gender=df[['is_female','age']].groupby(['is_female'])
>>> by_gender.groups # returns indexes of records
{0: [0, 2, 4], 1: [1, 3]}
>>> by_gender.get_group(0)['age'] # age of males
0 31
2 35
4 41
Name: age, dtype: int64
