Flag the month of default - python pandas - python-3.x

I have a pandas dataframe like this:
pd.DataFrame({
'customer_id': ['100', '200', '300', '400', '500', '600'],
'Month1': [1, 1, 1, 1, 1, 1],
'Month2': [1, 0, 1, 1, 1, 1],
'Month3': [0, 0, 0, 0, 1, 1],
'Month4': [0, 0, 0, 0, 0, 1]})
Each row shows a per-month flag for a customer's loan: the first month with a 0 is the month the customer defaulted. I want an output column that shows the number of the month in which each customer defaulted.
Output:
pd.DataFrame({
'customer_id': ['100', '200', '300', '400', '500', '600'],
'Month1': [1, 1, 1, 1, 1, 1],
'Month2': [1, 0, 1, 1, 1, 1],
'Month3': [0, 0, 0, 0, 1, 1],
'Month4': [0, 0, 0, 0, 0, 1],
'default_month': [3, 2, 3, 3, 4, np.nan]})

You can check whether all the 'Month' columns in a row are non-zero using ne(0) and all(axis=1), and return np.nan in that case, meaning the customer has not yet defaulted (i.e. your row 5).
Then, using eq(0) and idxmax, you can find the first value in each row that equals 0 and grab that column name.
import numpy as np
m = df.filter(like='Month')
df['default_month'] = np.where(m.ne(0).all(1), np.nan,
                               m.eq(0).idxmax(1))
df
  customer_id  Month1  Month2  Month3  Month4 default_month
0         100       1       1       0       0        Month3
1         200       1       0       0       0        Month2
2         300       1       1       0       0        Month3
3         400       1       1       0       0        Month3
4         500       1       1       1       0        Month4
5         600       1       1       1       1           NaN
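If you need the numeric month (as in your desired output) rather than the column label, one small follow-up (a sketch that assumes the columns are always named MonthN) is to strip the prefix afterwards:
# Turn 'Month3' into 3.0 while leaving NaN untouched (assumes 'MonthN' column names)
df['default_month'] = df['default_month'].str.replace('Month', '').astype(float)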

Here's some code to get your result.
First we need a list of the month columns in reverse order. From your data, I just pulled them directly from the columns.
months = list(df.columns[1:5])
months.reverse()
months is now ['Month4', 'Month3', 'Month2', 'Month1'].
We iterate backwards, so when we find an earlier month of default it overwrites the later one:
for (i, m) in enumerate(months):
    mask = df[m] == 0  # Check for a default
    df.loc[mask, 'default_month'] = len(months) - i
This returns the output you are looking for.

Related

How to solve diagonally constraint sudoku?

Write a program to solve a Sudoku puzzle by filling the empty cells, where 0 represents an empty cell.
Rules:
All the numbers in the sudoku must appear exactly once in the diagonal running from top-left to bottom-right.
All the numbers in the sudoku must appear exactly once in the diagonal running from top-right to bottom-left.
All the numbers in the sudoku must appear exactly once in each 3*3 sub-grid.
However, numbers can repeat in a row or column.
Constraints:
N = 9, where N represents the number of rows and columns of the grid.
What I have tried:
N = 9

def printing(arr):
    for i in range(N):
        for j in range(N):
            print(arr[i][j], end=" ")
        print()

def isSafe(grid, row, col, num):
    for x in range(9):
        if grid[x][x] == num:
            return False
    cl = 0
    for y in range(8, -1, -1):
        if grid[cl][y] == num:
            cl += 1
            return False
    startRow = row - row % 3
    startCol = col - col % 3
    for i in range(3):
        for j in range(3):
            if grid[i + startRow][j + startCol] == num:
                return False
    return True

def solveSudoku(grid, row, col):
    if (row == N - 1 and col == N):
        return True
    if col == N:
        row += 1
        col = 0
    if grid[row][col] > 0:
        return solveSudoku(grid, row, col + 1)
    for num in range(1, N + 1, 1):
        if isSafe(grid, row, col, num):
            grid[row][col] = num
            print(grid)
            if solveSudoku(grid, row, col + 1):
                return True
        grid[row][col] = 0
    return False

if (solveSudoku(grid, 0, 0)):
    printing(grid)
else:
    print("Solution does not exist")
Input:
grid =
[
[0, 3, 7, 0, 4, 2, 0, 2, 0],
[5, 0, 6, 1, 0, 0, 0, 0, 7],
[0, 0, 2, 0, 0, 0, 5, 0, 0],
[2, 8, 3, 0, 0, 0, 0, 0, 0],
[0, 5, 0, 0, 7, 1, 2, 0, 7],
[0, 0, 0, 3, 0, 0, 0, 0, 3],
[7, 0, 0, 0, 0, 6, 0, 5, 0],
[0, 2, 3, 0, 3, 0, 7, 4, 2],
[0, 5, 0, 0, 8, 0, 0, 0, 0]
]
Output:
8 3 7 0 4 2 1 2 4
5 0 6 1 8 6 3 6 7
4 1 2 7 3 5 5 0 8
2 8 3 5 4 8 6 0 5
6 5 0 0 7 1 2 1 7
1 4 7 3 2 6 8 4 3
7 8 1 4 1 6 3 5 0
6 2 3 7 3 5 7 4 2
0 5 4 2 8 0 6 8 1
Basically I am stuck on the implementation of checking for distinct numbers along the diagonals. So the question is: how can I make sure that no element gets repeated in those diagonals and in the sub-grids?
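For what it's worth, a minimal sketch of the kind of check the rules describe (the helper below is illustrative, not a drop-in fix for the code above): a cell only constrains a diagonal if it actually lies on that diagonal, so test row == col for the main diagonal and row + col == N - 1 for the anti-diagonal, and always test the 3*3 sub-grid.
def is_safe(grid, row, col, num, n=9):
    # Main diagonal (top-left to bottom-right): only relevant if the cell lies on it
    if row == col and any(grid[i][i] == num for i in range(n)):
        return False
    # Anti-diagonal (top-right to bottom-left): only relevant if the cell lies on it
    if row + col == n - 1 and any(grid[i][n - 1 - i] == num for i in range(n)):
        return False
    # 3*3 sub-grid containing (row, col)
    r0, c0 = row - row % 3, col - col % 3
    if any(grid[r0 + i][c0 + j] == num for i in range(3) for j in range(3)):
        return False
    return True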

Pandas value_counts on multiple columns : inconsistent result (bug ?)

I have the following dataframe :
df = pd.DataFrame( np.array([
[ 1, -1, -1, -1, -1, -1, 1],
[ 1, 1, 1, -1, -1, -1, 1],
[ 1, -1, -1, -1, -1, -1, 1],
[ 1, 1, 1, -1, 1, 1, 1],
[-1, -1, -1, 1, 1, 1, 1],
[ 1, -1, -1, -1, -1, -1, 1],
[-1, -1, -1, -1, 1, 1, 1] ] ) ,
columns = ["A", "B", "C", "D", "E", "F", "G"])
and I want to compute the number of times each row sequence of values appears (a histogram of the rows). For this I use value_counts:
df.value_counts()
But I get the following output, with values missing for some sequences in the first few columns:
 A   B   C   D   E   F   G
 1  -1  -1  -1  -1  -1   1    3
     1   1  -1   1   1   1    1
                -1  -1   1    1
-1  -1  -1   1   1   1   1    1
            -1   1   1   1    1
dtype: int64
After looking at the data, it seems that when a value is identical in the first columns, it is not always output.
Is this a bug? How can I get a histogram with full row values?
I'm using Python 3.7.9 with pandas 1.1.5
The result of value_counts() is a Series with a MultiIndex representing the row values. Nothing is actually missing: when a MultiIndex is printed, repeated leading values are blanked out by default (the display is "sparsified"), which is why some cells look empty.
df.value_counts().index.values
#array([(1, -1, -1, -1, -1, -1, 1), (-1, -1, -1, -1, 1, 1, 1),
# (-1, -1, -1, 1, 1, 1, 1), (1, 1, 1, -1, -1, -1, 1),
# (1, 1, 1, -1, 1, 1, 1)], dtype=object)
Try this to see if it is formatted better, especially if you're in a Jupyter notebook:
newdf = df.value_counts().to_frame("counts")
or
newdf2 = df.value_counts().to_frame("counts").reset_index()
print(newdf2)
   A  B  C  D  E  F  G  counts
0  1 -1 -1 -1 -1 -1  1       3
1 -1 -1 -1 -1  1  1  1       1
2 -1 -1 -1  1  1  1  1       1
3  1  1  1 -1 -1 -1  1       1
4  1  1  1 -1  1  1  1       1
If you want a simple bar chart, you can do:
newdf.plot.bar()
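If you would rather see the repeated index values printed instead of blanked, pandas also exposes a display option for this (it only changes how the MultiIndex is rendered, not the counts; worth verifying it exists in your pandas version):
with pd.option_context('display.multi_sparse', False):
    print(df.value_counts())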

Pandas Multiple condition on column

Column A Column B
1 10
1 10
0 10
0 10
3 20
I have to check if Column A is 0 and Column B is 10, and if so change Column A to 1:
df.loc[(df['Column A']==0) & (df['Column B'] == 10) ,df['Column A']] = 1
But I am getting an error:
"None of [Int64Index([1, 1, 0, 0, 0, 1, 1, 0, 1, 1,\n ...\n 1, 0, 0, 0, 0, 2, 1, 0, 1, 1],\n dtype='int64', length=2715)] are in the [columns]"
I think my solution is not correct, even though all the indexes are correct.
The final df should look something like this:
Column A Column B
1 10
1 10
1 10
1 10
3 20
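For reference, the error comes from passing the Series df['Column A'] as the column indexer in .loc; passing the column name instead should do what you describe (a minimal sketch built on the sample rows above, not tested against your full dataframe):
import pandas as pd

df = pd.DataFrame({'Column A': [1, 1, 0, 0, 3],
                   'Column B': [10, 10, 10, 10, 20]})

# Use the column *name* as the second .loc argument, not the Series itself
df.loc[(df['Column A'] == 0) & (df['Column B'] == 10), 'Column A'] = 1
print(df)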

transform integer value patterns in a column to a group

DataFrame
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})
df
Expected output
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],'group':[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 4, 100, 5, 5, 5, 5]})
df
I need to transform the dataframe into the output above. I am after a rule that treats a 1 as the start of a new group, where a group consists of a single 1 followed by n zeroes. If a row does not meet the group criteria, it should be assigned group 100.
I tried in the line of;
bs=df[df.occurance.eq(1).any(1)&df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs
Even when broken down, this could only boolean-select the group starts and nothing more.
Any help?
Create a mask of rows where occurance is 1 and the next row's occurance is also 1 (a 1 immediately followed by another 1 cannot start a valid group). Exclude those rows, build the group numbers with a cumulative sum of occurance via Series.cumsum, and finally restore the excluded positions with the value 100 using Series.reindex:
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
df['group'] = df.loc[~m, 'occurance'].cumsum().reindex(df.index, fill_value=100)
print (df)
occurance value group
0 1 45 1
1 0 3 1
2 0 2 1
3 0 12 1
4 1 14 2
5 0 32 2
6 0 1 2
7 0 1 2
8 0 6 2
9 0 4 2
10 1 9 3
11 0 32 3
12 1 78 100
13 1 96 4
14 0 12 4
15 0 6 4
16 0 3 4

Column wise specific value count

aMat=df1000.iloc[:,1:].values
print(aMat)
By using the above code I got the below mentioned data matrix from a dataset:
[[1 2 5 2 4]
[1 2 1 2 2]
[1 2 4 2 4]
[1 5 1 1 4]
[1 4 4 2 5]]
The dataset can only hold the values 1, 2, 3, 4 and 5. I want to count the number of 1s present in the first column, the number of 2s present in the first column, the number of 3s present in the first column, the number of 4s present in the first column, the number of 5s present in the first column, the number of 1s present in the second column, ............. and so on. At the end the list should look like this:
[[5,0,0,0,0],[0,3,0,1,1],[2,0,0,2,5],[1,4,0,0,0],[0,1,0,3,1]]
Please help
Let's try:
df = pd.DataFrame([[1, 2, 5, 2, 4],
[1, 2, 1, 2, 2],
[1, 2, 4, 2, 4],
[1, 5, 1, 1, 4],
[1, 4, 4, 2, 5]])
df.apply(pd.Series.value_counts).reindex([1,2,3,4,5]).fillna(0).to_numpy('int')
Output:
array([[5, 0, 2, 1, 0],
[0, 3, 0, 4, 1],
[0, 0, 0, 0, 0],
[0, 1, 2, 0, 3],
[0, 1, 1, 0, 1]])
Or, transposed:
df.apply(pd.Series.value_counts).reindex([1,2,3,4,5]).fillna(0).T.to_numpy('int')
Output:
array([[5, 0, 0, 0, 0],
[0, 3, 0, 1, 1],
[2, 0, 0, 2, 1],
[1, 4, 0, 0, 0],
[0, 1, 0, 3, 1]])
You can use np.bincount with apply_along_axis.
a = df.to_numpy()
np.apply_along_axis(np.bincount, 0, a, minlength=a.max()+1).T[:, 1:]
array([[5, 0, 0, 0, 0],
[0, 3, 0, 1, 1],
[2, 0, 0, 2, 1],
[1, 4, 0, 0, 0],
[0, 1, 0, 3, 1]], dtype=int64)
Maybe using stack:
df.stack().groupby(level=1).value_counts().unstack(fill_value=0).reindex(columns=[1,2,3,4,5],fill_value=0)
Out[495]:
1 2 3 4 5
0 5 0 0 0 0
1 0 3 0 1 1
2 2 0 0 2 1
3 1 4 0 0 0
4 0 1 0 3 1
A method using collections.Counter (with a = df.to_numpy() as above):
import collections
pd.DataFrame(list(map(collections.Counter, a.T))).fillna(0)  # .values
Out[527]:
1 2 4 5
0 5.0 0.0 0.0 0.0
1 0.0 3.0 1.0 1.0
2 2.0 0.0 2.0 1.0
3 1.0 4.0 0.0 0.0
4 0.0 1.0 3.0 1.0
My attempt with get_dummies and sum:
pd.get_dummies(df.stack()).sum(level=1)
1 2 4 5
0 5 0 0 0
1 0 3 1 1
2 2 0 2 1
3 1 4 0 0
4 0 1 3 1
If you need the column 3 with all zeros, use reindex:
pd.get_dummies(df.stack()).sum(level=1).reindex(columns=range(1, 6), fill_value=0)
1 2 3 4 5
0 5 0 0 0 0
1 0 3 0 1 1
2 2 0 0 2 1
3 1 4 0 0 0
4 0 1 0 3 1
Or, if you fancy a main course of numpy with a side dish of broadcasting:
# edit courtesy #user3483203
np.equal.outer(df.values, np.arange(1, 6)).sum(0)
array([[5, 0, 0, 0, 0],
[0, 3, 0, 1, 1],
[2, 0, 0, 2, 1],
[1, 4, 0, 0, 0],
[0, 1, 0, 3, 1]])
