How to convert a 2d numpy array into a 1d numpy array by summing the values and not using for loop? - python-3.x

Is there a numpy function that can combine a 2d numpy array into a 1d numpy array by summing the values? I want to do it without using a for loop.
Example:
[[1 0 0 0 0], [0 1 0 0 0]] => [1 1 0 0 0]

Use the ndarray sum method with axis=0, which sums down each column (collapsing the rows):
import numpy as np
arr2d = np.array([[1, 3, 8, 2, 0], [0, 1, 0, 5, 1]])
arr1d = arr2d.sum(axis=0)
# arr1d is now array([1, 4, 8, 7, 1])
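To make the role of axis concrete, here is a small sketch on the question's own example: axis=0 collapses the rows and leaves one total per column (the 1d result asked for), while axis=1 collapses the columns and leaves one total per row.

```python
import numpy as np

# The 2x5 example from the question
arr2d = np.array([[1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0]])

col_totals = arr2d.sum(axis=0)  # collapse the rows: one total per column
row_totals = arr2d.sum(axis=1)  # collapse the columns: one total per row

print(col_totals)  # [1 1 0 0 0]
print(row_totals)  # [1 1]
```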

Related

Create a matrix from another matrix in Python 3.11

I need to create two new numpy arrays from another matrix: one that keeps only the odd elements and one that keeps only the even elements, with zeros in every other position. How can I do that?
I tried accessing the indices of the elements directly, but this approach doesn't seem to work with arrays.
Example input:
1 2 3
4 5 6
7 8 9
should yield two matrices, one with the even elements:
0 2 0
4 0 6
0 8 0
and one with the odd elements:
1 0 3
0 5 0
7 0 9
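You can use np.where, which takes the element from a where the condition is true and 0 elsewhere:
import numpy as np
a = np.arange(1, 10).reshape(3, 3)  # the example input
is_odd = a % 2  # 1 for odd elements, 0 for even ones
odd = np.where(is_odd, a, 0)
even = np.where(1 - is_odd, a, 0)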
Output:
# odd
array([[1, 0, 3],
       [0, 5, 0],
       [7, 0, 9]])
# even
array([[0, 2, 0],
       [4, 0, 6],
       [0, 8, 0]])
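An equivalent sketch, swapping np.where for multiplication by a boolean mask: True behaves as 1 and False as 0, so each product keeps an element or zeroes it out.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)  # the 3x3 example input from the question

odd_mask = a % 2 == 1   # True where the element is odd
odd = a * odd_mask      # keeps odd elements, 0 elsewhere
even = a * ~odd_mask    # keeps even elements, 0 elsewhere

print(odd)
print(even)
```

Because the two masks never overlap, odd + even reconstructs a. NumPy's % gives the remainder the sign of the divisor, so -3 % 2 == 1 and negative odd numbers are classified correctly too.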

Flag the month of default - python pandas

I have a pandas dataframe like this:
df = pd.DataFrame({
'customer_id': ['100', '200', '300', '400', '500', '600'],
'Month1': [1, 1, 1, 1, 1, 1],
'Month2': [1, 0, 1, 1, 1, 1],
'Month3': [0, 0, 0, 0, 1, 1],
'Month4': [0, 0, 0, 0, 0, 1]})
Each value flags whether the customer is still in good standing on a loan: the first month with a 0 is the month in which the customer defaulted. I want a new column with the number of the month in which each customer defaulted.
Desired output:
pd.DataFrame({
'customer_id': ['100', '200', '300', '400', '500', '600'],
'Month1': [1, 1, 1, 1, 1, 1],
'Month2': [1, 0, 1, 1, 1, 1],
'Month3': [0, 0, 0, 0, 1, 1],
'Month4': [0, 0, 0, 0, 0, 1],
'default_month': [3, 2, 3, 3, 4, np.nan]})
You can check whether none of the 'Month' columns in a row is 0 using ne(0) and all(axis=1); for those rows, return np.nan, meaning the customer has not yet defaulted (row 5 in your example).
Then, using eq(0) and idxmax, you can find the first value in each row that equals 0 and grab that column's name.
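import numpy as np
m = df.filter(like='Month')  # keep only the Month columns
# NaN where no month is 0, otherwise the name of the first column equal to 0
df['default_month'] = np.where(m.ne(0).all(axis=1), np.nan,
                               m.eq(0).idxmax(axis=1))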
df
customer_id Month1 Month2 Month3 Month4 default_month
0 100 1 1 0 0 Month3
1 200 1 0 0 0 Month2
2 300 1 1 0 0 Month3
3 400 1 1 0 0 Month3
4 500 1 1 1 0 Month4
5 600 1 1 1 1 NaN
Here's some code to get your result:
First we need a list of the month columns in reverse order; here I pulled them directly from the column labels.
months = list(df.columns[1:5])
months.reverse()
months is now ['Month4', 'Month3', 'Month2', 'Month1'].
We iterate from the last month to the first, so when we find an earlier month of default, it overwrites the later one:
for i, m in enumerate(months):
    mask = df[m] == 0  # check for a default in this month
    df.loc[mask, 'default_month'] = len(months) - i  # 1-based month number
This returns the output you are looking for; customers who never defaulted are left as NaN.
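The idxmax answer returns column names such as 'Month3', whereas the question asks for the month number. A minimal sketch that produces numbers directly, assuming the Month columns are already in chronological order (Month1 to Month4) so that position + 1 equals the month number:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer_id': ['100', '200', '300', '400', '500', '600'],
    'Month1': [1, 1, 1, 1, 1, 1],
    'Month2': [1, 0, 1, 1, 1, 1],
    'Month3': [0, 0, 0, 0, 1, 1],
    'Month4': [0, 0, 0, 0, 0, 1]})

m = df.filter(like='Month')            # assumes these are in month order
is_zero = m.eq(0).to_numpy()           # True where the customer is in default
first_zero = is_zero.argmax(axis=1) + 1  # 1-based position of the first True per row

# argmax returns 0 for rows with no True at all, so mask never-defaulted rows with NaN
df['default_month'] = np.where(is_zero.any(axis=1), first_zero, np.nan)
print(df['default_month'].tolist())
```

The result is a float column (NaN forces float), matching the [3, 2, 3, 3, 4, NaN] asked for in the question.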

How can I use Python to convert an adjacency matrix to a transition matrix?

I am trying to convert a matrix like
1 1 0
0 1 1
0 1 1
to become
1 ⅓ 0
0 ⅓ ½
0 ⅓ ½
I was thinking about summing the rows and then dividing by them, but I was wondering if there was a better way to accomplish this using numpy or any other way in Python.
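You can do it with numpy by dividing each column by its sum (your target matrix is normalized per column). Broadcasting divides every column j of arr by arr.sum(axis=0)[j]: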
import numpy as np
arr = np.array([[1, 1, 0],
[0, 1, 1],
[0, 1, 1]])
print(arr/arr.sum(axis=0))
[[1.         0.33333333 0.        ]
 [0.         0.33333333 0.5       ]
 [0.         0.33333333 0.5       ]]
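If some node has no edges, its column sums to 0 and the plain division above yields nan (with a RuntimeWarning). A sketch that guards against this, using a hypothetical helper to_transition and NumPy's out/where ufunc arguments so zero columns simply stay zero. It assumes the column convention from the question; if you instead want each row to sum to 1, divide by arr.sum(axis=1, keepdims=True).

```python
import numpy as np

def to_transition(adj):
    """Hypothetical helper: column-normalize an adjacency matrix.

    Each nonzero column is divided by its sum so it sums to 1; an all-zero
    column (a node with no edges) stays all-zero instead of becoming nan.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    # Only divide where the column sum is nonzero; elsewhere keep the 0 from `out`
    return np.divide(adj, col_sums, out=np.zeros_like(adj), where=col_sums != 0)

arr = np.array([[1, 1, 0],
                [0, 1, 1],
                [0, 1, 1]])
P = to_transition(arr)
print(P.sum(axis=0))  # [1. 1. 1.]

# Row-stochastic variant: each row sums to 1 instead
R = arr / arr.sum(axis=1, keepdims=True)
```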

Column wise specific value count

aMat=df1000.iloc[:,1:].values
print(aMat)
Using the code above, I got the following data matrix from a dataset:
[[1 2 5 2 4]
[1 2 1 2 2]
[1 2 4 2 4]
[1 5 1 1 4]
[1 4 4 2 5]]
The dataset can only hold the values 1, 2, 3, 4 and 5. For each column, I want to count how many 1s, 2s, 3s, 4s and 5s it contains: the number of 1s in the first column, the number of 2s in the first column, and so on up to the number of 5s in the last column. At the end the list should look like this:
[[5,0,0,0,0],[0,3,0,1,1],[2,0,0,2,1],[1,4,0,0,0],[0,1,0,3,1]]
Please help
Let's try:
import pandas as pd
df = pd.DataFrame([[1, 2, 5, 2, 4],
                   [1, 2, 1, 2, 2],
                   [1, 2, 4, 2, 4],
                   [1, 5, 1, 1, 4],
                   [1, 4, 4, 2, 5]])
df.apply(pd.Series.value_counts).reindex([1,2,3,4,5]).fillna(0).to_numpy('int')
Output:
array([[5, 0, 2, 1, 0],
       [0, 3, 0, 4, 1],
       [0, 0, 0, 0, 0],
       [0, 1, 2, 0, 3],
       [0, 1, 1, 0, 1]])
Or, transposed:
df.apply(pd.Series.value_counts).reindex([1,2,3,4,5]).fillna(0).T.to_numpy('int')
Output:
array([[5, 0, 0, 0, 0],
       [0, 3, 0, 1, 1],
       [2, 0, 0, 2, 1],
       [1, 4, 0, 0, 0],
       [0, 1, 0, 3, 1]])
You can use np.bincount with apply_along_axis.
a = df.to_numpy()
# minlength makes every column's counts the same length; [:, 1:] drops the count for value 0
np.apply_along_axis(np.bincount, 0, a, minlength=a.max()+1).T[:, 1:]
array([[5, 0, 0, 0, 0],
       [0, 3, 0, 1, 1],
       [2, 0, 0, 2, 1],
       [1, 4, 0, 0, 0],
       [0, 1, 0, 3, 1]], dtype=int64)
Maybe using stack:
df.stack().groupby(level=1).value_counts().unstack(fill_value=0).reindex(columns=[1,2,3,4,5],fill_value=0)
Output:
1 2 3 4 5
0 5 0 0 0 0
1 0 3 0 1 1
2 2 0 0 2 1
3 1 4 0 0 0
4 0 1 0 3 1
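Or a method using collections.Counter (a is df.to_numpy(), as in the bincount answer). Recent pandas orders columns built from a list of dicts by first appearance, so sort_index(axis=1) keeps the values in ascending order:
import collections
pd.DataFrame(list(map(collections.Counter, a.T))).fillna(0).sort_index(axis=1)  # append .values for an array
Output: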
1 2 4 5
0 5.0 0.0 0.0 0.0
1 0.0 3.0 1.0 1.0
2 2.0 0.0 2.0 1.0
3 1.0 4.0 0.0 0.0
4 0.0 1.0 3.0 1.0
My attempt with get_dummies and a groupby sum (DataFrame.sum(level=...) was removed in pandas 2.0):
pd.get_dummies(df.stack()).groupby(level=1).sum()
1 2 4 5
0 5 0 0 0
1 0 3 1 1
2 2 0 2 1
3 1 4 0 0
4 0 1 3 1
If you also need column 3 (all zeros), use reindex:
pd.get_dummies(df.stack()).groupby(level=1).sum().reindex(columns=range(1, 6), fill_value=0)
1 2 3 4 5
0 5 0 0 0 0
1 0 3 0 1 1
2 2 0 0 2 1
3 1 4 0 0 0
4 0 1 0 3 1
Or, if you fancy a main course of numpy with a side dish of broadcasting:
# edit courtesy @user3483203
np.equal.outer(df.values, np.arange(1, 6)).sum(0)
array([[5, 0, 0, 0, 0],
       [0, 3, 0, 1, 1],
       [2, 0, 0, 2, 1],
       [1, 4, 0, 0, 0],
       [0, 1, 0, 3, 1]])
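The question asks for a plain Python list with one row of counts per column. A minimal sketch wrapping the same broadcasting idea in a hypothetical helper count_per_column, which also takes the set of possible values explicitly, so a value that never occurs (like 3 here) still gets a column of zeros:

```python
import numpy as np

def count_per_column(a, values):
    """Hypothetical helper: result[j][k] = how often values[k] occurs in column j of a."""
    a = np.asarray(a)
    values = np.asarray(values)
    # (rows, cols, 1) == (len(values),) broadcasts to a (rows, cols, len(values)) boolean cube;
    # summing over the rows leaves one count per (column, value) pair
    return (a[:, :, None] == values).sum(axis=0).tolist()

aMat = np.array([[1, 2, 5, 2, 4],
                 [1, 2, 1, 2, 2],
                 [1, 2, 4, 2, 4],
                 [1, 5, 1, 1, 4],
                 [1, 4, 4, 2, 5]])
print(count_per_column(aMat, [1, 2, 3, 4, 5]))
```

The intermediate cube holds rows × columns × len(values) booleans, which is fine for five possible values; for a large range of values, the np.bincount answer avoids that memory cost.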

pandas df from a dictionary of list

I have a dictionary whose values are two-element lists, and I would like to turn it into a three-column pandas DataFrame.
This dict
{
'Abg': [2, 0],
'Aidi': [1, 2],
'Geng': [0, 0],
}
into this df
0 1 2
Abg 2 0
Aidi 1 2
Geng 0 0
How do I do that?
Solution found (DataFrame.from_items was removed in pandas 1.0; from_dict with orient='index' does the same job):
pd.DataFrame.from_dict(name_dict,
                       orient='index',
                       columns=['A', 'B'])
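With orient='index' the dictionary keys end up in the index, so the frame has two data columns. If you want the keys as a real third column, here is a sketch that moves them out of the index; the column name 'name' is an arbitrary choice for illustration.

```python
import pandas as pd

name_dict = {'Abg': [2, 0], 'Aidi': [1, 2], 'Geng': [0, 0]}

df = pd.DataFrame.from_dict(name_dict, orient='index', columns=['A', 'B'])
# Name the index, then turn it into an ordinary column
df3 = df.rename_axis('name').reset_index()
print(df3)
```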
