Pandas multiple conditions on a column - python-3.x

Column A Column B
1 10
1 10
0 10
0 10
3 20
I have to check if Column A is 0 and Column B is 10, and then change Column A to 1:
df.loc[(df['Column A']==0) & (df['Column B'] == 10) ,df['Column A']] = 1
But I am getting an error:
"None of [Int64Index([1, 1, 0, 0, 0, 1, 1, 0, 1, 1,\n ...\n 1, 0, 0, 0, 0, 2, 1, 0, 1, 1], dtype='int64', length=2715)] are in the [columns]"
I think my solution is not correct, even though all the indexes are correct. The final df should look something like this:
Column A Column B
1 10
1 10
1 10
1 10
3 20
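A likely fix, not part of the original post: the second argument of df.loc should be the column label 'Column A', not the Series df['Column A']; passing the Series makes pandas look up its values as column names, which is what raises the "None of [...] are in the [columns]" error. A minimal sketch of the corrected assignment:

import pandas as pd

df = pd.DataFrame({'Column A': [1, 1, 0, 0, 3],
                   'Column B': [10, 10, 10, 10, 20]})

# Select rows where Column A is 0 and Column B is 10, then set Column A to 1
df.loc[(df['Column A'] == 0) & (df['Column B'] == 10), 'Column A'] = 1
print(df)
#    Column A  Column B
# 0         1        10
# 1         1        10
# 2         1        10
# 3         1        10
# 4         3        20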

Related

How to solve a diagonally constrained sudoku?

Write a program to solve a Sudoku puzzle by filling the empty cells, where 0 represents an empty cell.
Rules:
Every number must appear exactly once in the diagonal running from top-left to bottom-right.
Every number must appear exactly once in the diagonal running from top-right to bottom-left.
Every number must appear exactly once in each 3*3 sub-grid.
However, numbers can repeat in rows and columns.
Constraints:
N = 9, where N is the number of rows and columns of the grid.
What I have tried:
N = 9

def printing(arr):
    for i in range(N):
        for j in range(N):
            print(arr[i][j], end=" ")
        print()

def isSafe(grid, row, col, num):
    # top-left to bottom-right diagonal
    for x in range(9):
        if grid[x][x] == num:
            return False
    # top-right to bottom-left diagonal
    cl = 0
    for y in range(8, -1, -1):
        if grid[cl][y] == num:
            return False
        cl += 1
    # 3*3 sub-grid
    startRow = row - row % 3
    startCol = col - col % 3
    for i in range(3):
        for j in range(3):
            if grid[i + startRow][j + startCol] == num:
                return False
    return True

def solveSudoku(grid, row, col):
    if row == N - 1 and col == N:
        return True
    if col == N:
        row += 1
        col = 0
    if grid[row][col] > 0:
        return solveSudoku(grid, row, col + 1)
    for num in range(1, N + 1):
        if isSafe(grid, row, col, num):
            grid[row][col] = num
            print(grid)
            if solveSudoku(grid, row, col + 1):
                return True
            grid[row][col] = 0
    return False

if solveSudoku(grid, 0, 0):
    printing(grid)
else:
    print("Solution does not exist")
Input:
grid =
[
[0, 3, 7, 0, 4, 2, 0, 2, 0],
[5, 0, 6, 1, 0, 0, 0, 0, 7],
[0, 0, 2, 0, 0, 0, 5, 0, 0],
[2, 8, 3, 0, 0, 0, 0, 0, 0],
[0, 5, 0, 0, 7, 1, 2, 0, 7],
[0, 0, 0, 3, 0, 0, 0, 0, 3],
[7, 0, 0, 0, 0, 6, 0, 5, 0],
[0, 2, 3, 0, 3, 0, 7, 4, 2],
[0, 5, 0, 0, 8, 0, 0, 0, 0]
]
Output:
8 3 7 0 4 2 1 2 4
5 0 6 1 8 6 3 6 7
4 1 2 7 3 5 5 0 8
2 8 3 5 4 8 6 0 5
6 5 0 0 7 1 2 1 7
1 4 7 3 2 6 8 4 3
7 8 1 4 1 6 3 5 0
6 2 3 7 3 5 7 4 2
0 5 4 2 8 0 6 8 1
Basically I am stuck on the implementation of checking for distinct numbers along the diagonals. So the question is: how can I make sure that no element gets repeated in those diagonals and in the 3*3 sub-grids?
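One possible direction, not part of the original post and assuming the rules stated above (diagonals and sub-grids constrained, rows and columns free): a diagonal only constrains a cell that actually lies on it, i.e. row == col for the top-left to bottom-right diagonal and row + col == N - 1 for the other one, so isSafe should only scan a diagonal in those cases. A rough sketch of such a check:

def isSafe(grid, row, col, num):
    # Main diagonal only constrains cells with row == col
    if row == col and any(grid[x][x] == num for x in range(N)):
        return False
    # Anti-diagonal only constrains cells with row + col == N - 1
    if row + col == N - 1 and any(grid[x][N - 1 - x] == num for x in range(N)):
        return False
    # Every 3*3 sub-grid must still contain distinct numbers
    startRow, startCol = row - row % 3, col - col % 3
    for i in range(3):
        for j in range(3):
            if grid[startRow + i][startCol + j] == num:
                return False
    return True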

Flag the month of default - python pandas

I have a pandas dataframe like this:
df = pd.DataFrame({
    'customer_id': ['100', '200', '300', '400', '500', '600'],
    'Month1': [1, 1, 1, 1, 1, 1],
    'Month2': [1, 0, 1, 1, 1, 1],
    'Month3': [0, 0, 0, 0, 1, 1],
    'Month4': [0, 0, 0, 0, 0, 1]})
Each Month column holds a boolean flag for a loan; the first month with a 0 means the customer defaulted in that month. I want an output that displays the month number in which the customer defaulted on the loan.
Output:
pd.DataFrame({
'customer_id': ['100', '200', '300', '400', '500', '600'],
'Month1': [1, 1, 1, 1, 1, 1],
'Month2': [1, 0, 1, 1, 1, 1],
'Month3': [0, 0, 0, 0, 1, 1],
'Month4': [0, 0, 0, 0, 0, 1],
'default_month': [3, 2, 3, 3, 4, np.nan]})
You can check whether all the 'Month' columns in a row are not 0, using all(axis=1) and ne(0), and return np.nan, meaning the person has not yet defaulted (i.e. your row 5).
Then, using eq(0) and idxmax, you can find the first value in each row that equals 0 and grab that column name.
import numpy as np

m = df.filter(like='Month')
df['default_month'] = np.where(m.ne(0).all(1), np.nan,
                               m.eq(0).idxmax(1))
df
customer_id Month1 Month2 Month3 Month4 default_month
0 100 1 1 0 0 Month3
1 200 1 0 0 0 Month2
2 300 1 1 0 0 Month3
3 400 1 1 0 0 Month3
4 500 1 1 1 0 Month4
5 600 1 1 1 1 NaN
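If the numeric default_month from the expected output is wanted instead of the column name, one option (an extra step, not part of the original answer) is to pull the digits out of the label afterwards:

# Convert 'Month3' -> 3.0 and leave NaN untouched
df['default_month'] = pd.to_numeric(
    df['default_month'].str.extract(r'(\d+)$', expand=False))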
Here's some code to get your result:
First we need a list of the month columns in reverse order; from your data, I just pulled them directly from the columns.
months = list(df.columns[1:5])
months.reverse()
months is now ['Month4', 'Month3', 'Month2', 'Month1'].
We iterate backwards so that when we find an earlier month of default, it overwrites the later one:
for (i, m) in enumerate(months):
    mask = df[m] == 0  # check for a default in this month
    df.loc[mask, 'default_month'] = len(months) - i
This returns the output you are looking for.

transform integer value patterns in a column to a group

DataFrame
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})
df
Expected output
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],'group':[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 4, 100, 5, 5, 5, 5]})
df
I need to transform the dataframe into the expected output. I am after a rule that treats a 1 as the start of a new group, where a group consists of a single 1 followed by n zeroes. If the group criteria are not met, the row should be grouped as 100.
I tried in the line of;
bs=df[df.occurance.eq(1).any(1)&df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs
Even when broken down, this could only boolean-select the group starts and nothing more.
Any help?
Create a mask marking each 1 that is immediately followed by another 1, filter those rows out of occurance, create a cumulative sum with Series.cumsum, and finally fill the removed positions with 100 via Series.reindex:
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
df['group'] = df.loc[~m, 'occurance'].cumsum().reindex(df.index, fill_value=100)
print (df)
occurance value group
0 1 45 1
1 0 3 1
2 0 2 1
3 0 12 1
4 1 14 2
5 0 32 2
6 0 1 2
7 0 1 2
8 0 6 2
9 0 4 2
10 1 9 3
11 0 32 3
12 1 78 100
13 1 96 4
14 0 12 4
15 0 6 4
16 0 3 4
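To see why row 12 ends up as 100, it can help to print the rows selected by the mask; this is just an inspection aid, not part of the original answer:

print(df.loc[m])  # rows where a 1 is immediately followed by another 1
#     occurance  value  group
# 12          1     78    100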

Check if any row has the same values as a numpy array

I am working with a pandas.DataFrame that looks as follows:
A B C D
index
1 0 0 0 1
2 1 0 0 1
3 ...
4 ...
...
I am creating numpy arrays that have the same shape as a row of this dataframe, and I want to check whether the array I am creating is present as a row within the dataframe.
In this case, for example, my array would look like this, if it is in the dataframe:
a= [0,0,0,1]
It is not present if it looks like this:
b = [1,1,1,1]
Any help, even if it is a link to the right answer, is much appreciated, as I have looked through Stack Overflow and hopefully did not miss anything.
df = pd.DataFrame({'A':[0, 1, 0, 0],
'B':[0, 0, 1, 1],
'C':[0, 0, 0, 0],
'D':[1, 1, 0, 1]})
# A B C D
# 0 0 0 0 1
# 1 1 0 0 1
# 2 0 1 0 0
# 3 0 1 0 1
>>> a = [0, 0, 0, 1]
>>> (df == a).all(axis=1).any()
True
>>> b = [1, 1, 1, 1]
>>> (df == b).all(axis=1).any()
False
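The same check works when a is a NumPy array rather than a plain list, since the comparison broadcasts element-wise across every row; a quick sketch with the frame above:

import numpy as np

a = np.array([0, 0, 0, 1])
# Compare the array against each row, then require a full-row match somewhere
print((df == a).all(axis=1).any())  # True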

vectorize groupby pandas

I have a dataframe like this:
day time category count
1 1 a 13
1 2 a 47
1 3 a 1
1 5 a 2
1 6 a 4
2 7 a 14
2 2 a 10
2 1 a 9
2 4 a 2
2 6 a 1
I want to group by day and category and get a vector of the counts per time, where time can be between 1 and 10. I have stored the max and min of time in two variables called max and min.
This is how I want the resulting dataframe to look:
day category count
1 a [13,47,1,0,2,4,0,0,0,0]
2 a [9,10,0,2,0,1,14,0,0,0]
Does anyone know how to turn this aggregation into a vector?
Use reindex with MultiIndex.from_product to add the missing time values, and then groupby with list:
df = df.set_index(['day','time', 'category'])
a = df.index.levels[0]
b = range(1,11)
c = df.index.levels[2]
df = df.reindex(pd.MultiIndex.from_product([a,b,c], names=df.index.names), fill_value=0)
df = df.groupby(['day','category'])['count'].apply(list).reset_index()
print (df)
day category count
0 1 a [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1 2 a [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
EDIT:
df = (df.set_index(['day','time', 'category'])['count']
        .unstack(1, fill_value=0)
        .reindex(columns=range(1,11), fill_value=0))
print (df)
time 1 2 3 4 5 6 7 8 9 10
day category
1 a 13 47 1 0 2 4 0 0 0 0
2 a 9 10 0 2 0 1 14 0 0 0
df = df.apply(list, 1).reset_index(name='count')
print (df)
day ... count
0 1 ... [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1 2 ... [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
[2 rows x 3 columns]
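Side note, an assumption rather than part of the original answer: since the question keeps the time bounds in variables, the hard-coded range(1, 11) can be built from them instead; renaming them avoids shadowing the builtins min and max:

t_min, t_max = 1, 10              # hypothetical bound variables from the question
b = range(t_min, t_max + 1)       # replaces the hard-coded range(1, 11)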
