transform integer value patterns in a column to a group - python-3.x

DataFrame
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})
df
Expected output
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],'group':[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 4, 100, 5, 5, 5, 5]})
df
I need to transform the dataframe into the output. I am after a wild card that will determine 1 is the start of a new group and a group consists of only 1 followed by n zeroes. If a group criteria is not met, then group it as 100.
I tried in the line of;
bs=df[df.occurance.eq(1).any(1)&df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs
This even when broken down could only bool select start and nothing more.
Any help?

Create mask by compare 1 and next 1 in mask, then filter occurance for all values without them, create cumulative sum by Series.cumsum and last add 100 values by Series.reindex:
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
df['group'] = df.loc[~m, 'occurance'].cumsum().reindex(df.index, fill_value=100)
print (df)
occurance value group
0 1 45 1
1 0 3 1
2 0 2 1
3 0 12 1
4 1 14 2
5 0 32 2
6 0 1 2
7 0 1 2
8 0 6 2
9 0 4 2
10 1 9 3
11 0 32 3
12 1 78 100
13 1 96 4
14 0 12 4
15 0 6 4
16 0 3 4

Related

How to solve diagonally constraint sudoku?

Write a program to solve a Sudoku puzzle by filling the empty cells where 0 represent empty cell.
Rules:
All the number in sudoku must appear exactly once in diagonal running from top-left to bottom-right.
All the number in sudoku must appear exactly
once in diagonal running from top-right to bottom-left.
All the number in sudoku
must appear exactly once in a 3*3 sub-grid.
However number can repeat in row or column.
Constraints:
N = 9; where N represent rows and column of grid.
What I have tried:
N = 9
def printing(arr):
for i in range(N):
for j in range(N):
print(arr[i][j], end=" ")
print()
def isSafe(grid, row, col, num):
for x in range(9):
if grid[x][x] == num:
return False
cl = 0
for y in range(8, -1, -1):
if grid[cl][y] == num:
cl += 1
return False
startRow = row - row % 3
startCol = col - col % 3
for i in range(3):
for j in range(3):
if grid[i + startRow][j + startCol] == num:
return False
return True
def solveSudoku(grid, row, col):
if (row == N - 1 and col == N):
return True
if col == N:
row += 1
col = 0
if grid[row][col] > 0:
return solveSudoku(grid, row, col + 1)
for num in range(1, N + 1, 1):
if isSafe(grid, row, col, num):
grid[row][col] = num
print(grid)
if solveSudoku(grid, row, col + 1):
return True
grid[row][col] = 0
return False
if (solveSudoku(grid, 0, 0)):
printing(grid)
else:
print("Solution does not exist")
Input:
grid =
[
[0, 3, 7, 0, 4, 2, 0, 2, 0],
[5, 0, 6, 1, 0, 0, 0, 0, 7],
[0, 0, 2, 0, 0, 0, 5, 0, 0],
[2, 8, 3, 0, 0, 0, 0, 0, 0],
[0, 5, 0, 0, 7, 1, 2, 0, 7],
[0, 0, 0, 3, 0, 0, 0, 0, 3],
[7, 0, 0, 0, 0, 6, 0, 5, 0],
[0, 2, 3, 0, 3, 0, 7, 4, 2],
[0, 5, 0, 0, 8, 0, 0, 0, 0]
]
Output:
8 3 7 0 4 2 1 2 4
5 0 6 1 8 6 3 6 7
4 1 2 7 3 5 5 0 8
2 8 3 5 4 8 6 0 5
6 5 0 0 7 1 2 1 7
1 4 7 3 2 6 8 4 3
7 8 1 4 1 6 3 5 0
6 2 3 7 3 5 7 4 2
0 5 4 2 8 0 6 8 1
Basically I am stuck on the implementation of checking distinct number along diagonal. So question is how I can make sure that no elements gets repeated in those diagonal and sub-grid.

Pandas Multiple condition on column

Column A Column B
1 10
1 10
0 10
0 10
3 20
I have to check if column A is 0 and column B is 10 then change column A to 1
df.loc[(df['Column A']==0) & (df['Column B'] == 10) ,df['Column A']] = 1
But i am getting an error
Final df should look something like this None of [Int64Index([1, 1, 0, 0, 0, 1, 1, 0, 1, 1,\n ...\n 1, 0, 0, 0, 0, 2, 1, 0, 1, 1],\n dtype='int64', length=2715)] are in the [columns]"
I think my solution is not correct. All the indexes are correct
Column A Column B
1 10
1 10
1 10
1 10
3 20

Column wise specific value count

aMat=df1000.iloc[:,1:].values
print(aMat)
By using the above code I got the below mentioned data matrix from a dataset:
[[1 2 5 2 4]
[1 2 1 2 2]
[1 2 4 2 4]
[1 5 1 1 4]
[1 4 4 2 5]]
The data set only can hold 1,2,3,4 and 5 value. So I want to count number of 1 present in first column, number of 2 present in first column, number of 3 present in first column, number of 4 present in first column, number of 5 present in first column, number of 1 present in second column,.............so on. Means at the end the list will look like this:
[[5,0,0,0,0],[0,3,0,1,1],[2,0,0,2,5],[1,4,0,0,0],[0,1,0,3,1]]
Please help
Let's try:
df = pd.DataFrame([[1, 2, 5, 2, 4],
[1, 2, 1, 2, 2],
[1, 2, 4, 2, 4],
[1, 5, 1, 1, 4],
[1, 4, 4, 2, 5]])
df.apply(pd.Series.value_counts).reindex([1,2,3,4,5]).fillna(0).to_numpy('int')
Output:
array([[5, 0, 2, 1, 0],
[0, 3, 0, 4, 1],
[0, 0, 0, 0, 0],
[0, 1, 2, 0, 3],
[0, 1, 1, 0, 1]])
Or, transposed:
df.apply(pd.Series.value_counts).reindex([1,2,3,4,5]).fillna(0).T.to_numpy('int')
Output:
array([[5, 0, 0, 0, 0],
[0, 3, 0, 1, 1],
[2, 0, 0, 2, 1],
[1, 4, 0, 0, 0],
[0, 1, 0, 3, 1]])
You can use np.bincount with apply_along_axis.
a = df.to_numpy()
np.apply_along_axis(np.bincount, 0, a, minlength=a.max()+1).T[:, 1:]
array([[5, 0, 0, 0, 0],
[0, 3, 0, 1, 1],
[2, 0, 0, 2, 1],
[1, 4, 0, 0, 0],
[0, 1, 0, 3, 1]], dtype=int64)
May using stack
df.stack().groupby(level=1).value_counts().unstack(fill_value=0).reindex(columns=[1,2,3,4,5],fill_value=0)
Out[495]:
1 2 3 4 5
0 5 0 0 0 0
1 0 3 0 1 1
2 2 0 0 2 1
3 1 4 0 0 0
4 0 1 0 3 1
Method from collections
pd.DataFrame(list(map(collections.Counter,a.T))).fillna(0)#.values
Out[527]:
1 2 4 5
0 5.0 0.0 0.0 0.0
1 0.0 3.0 1.0 1.0
2 2.0 0.0 2.0 1.0
3 1.0 4.0 0.0 0.0
4 0.0 1.0 3.0 1.0
My attempt with get_dummies and sum:
pd.get_dummies(df.stack()).sum(level=1)
1 2 4 5
0 5 0 0 0
1 0 3 1 1
2 2 0 2 1
3 1 4 0 0
4 0 1 3 1
If you need the column 3 with all zeros, use reindex:
pd.get_dummies(df.stack()).sum(level=1).reindex(columns=range(1, 6), fill_value=0)
1 2 3 4 5
0 5 0 0 0 0
1 0 3 0 1 1
2 2 0 0 2 1
3 1 4 0 0 0
4 0 1 0 3 1
Or, if you fancy a main course of numpy with a side dish of broadcasting:
# edit courtesy #user3483203
np.equal.outer(df.values, np.arange(1, 6)).sum(0)
array([[5, 0, 0, 0, 0],
[0, 3, 0, 1, 1],
[2, 0, 0, 2, 1],
[1, 4, 0, 0, 0],
[0, 1, 0, 3, 1]])

vectorize groupby pandas

I have a dataframe like this:
day time category count
1 1 a 13
1 2 a 47
1 3 a 1
1 5 a 2
1 6 a 4
2 7 a 14
2 2 a 10
2 1 a 9
2 4 a 2
2 6 a 1
I want to group by day, and category and get a vector of the counts per time. Where time can be between 1 and 10. The max and min of time I have defined in two variables called max and min.
This is how I want the resulting dataframe to look:
day category count
1 a [13,47,1,0,2,4,0,0,0,0]
2 a [9,10,0,2,0,1,14,0,0,0]
Does anyone know how to make this aggregation into a vaector?
Use reindex with MultiIndex.from_product for append missing categories and then groupby with list:
df = df.set_index(['day','time', 'category'])
a = df.index.levels[0]
b = range(1,11)
c = df.index.levels[2]
df = df.reindex(pd.MultiIndex.from_product([a,b,c], names=df.index.names), fill_value=0)
df = df.groupby(['day','category'])['count'].apply(list).reset_index()
print (df)
day category count
0 1 a [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1 2 a [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
EDIT:
df = (df.set_index(['day','time', 'category'])['count']
.unstack(1, fill_value=0)
.reindex(columns=range(1,11), fill_value=0))
print (df)
time 1 2 3 4 5 6 7 8 9 10
day category
1 a 13 47 1 0 2 4 0 0 0 0
2 a 9 10 0 2 0 1 14 0 0 0
df = df.apply(list, 1).reset_index(name='count')
print (df)
day ... count
0 1 ... [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1 2 ... [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
[2 rows x 3 columns]

Image.frombytes not writing squares

I have a numpy array:
[[12 13 12 5 6 5 14 4 6 11 11 10 8 11 8 11 7 8 0 0 0]
[ 5 14 4 6 11 11 10 8 11 8 11 8 11 8 11 7 8 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 8 11 8 11 8 11 7 8 0 0]
[ 5 14 4 6 11 11 10 7 8 0 0 0 0 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 11 10 8 11 8 11 8 11 8 11 8 11 8 11 7 8]
[ 5 14 4 6 11 10 8 11 10 8 11 10 8 11 10 7 8 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 7 8 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 11 10 1 11 1 11 7 8 0 0 0 0 0 0 0 0]
[ 5 14 4 6 11 10 10 1 11 1 11 1 11 7 8 0 0 0 0 0 0]
[ 5 14 4 6 11 10 10 8 11 8 11 8 11 7 8 0 0 0 0 0 0]
[ 5 14 4 6 11 10 8 11 10 8 11 10 8 11 10 8 11 7 7 0 0]]
And a colors dictionary:
{0: (0, 0, 0), 1: (17, 17, 17), 2: (34, 34, 34), 3: (51, 51, 51), 4: (68, 68, 68), 5: (85, 85, 85), 6: (102, 102, 102), 7: (119, 119, 119), 8: (136, 136, 136), 9: (153, 153, 153), 10: (170, 170, 170), 11: (187, 187, 187), 12: (204, 204, 204), 13: (221, 221, 221), 14: (238, 238, 238)}
And I'm trying to write pass the array through the dictionary, then write those colors in 10x10 blocks to a .png file. So far I have:
rows = []
for row in arr:
for j in range(10):
for col in row:
for i in range(10):
rows.extend(colors[col])
rows = bytes(rows)
img = Image.frombytes('RGB', (110, 120), rows)
img.save("generated.png")
But this writes it like this:
Which has lines instead of the 10x10 blocks I was trying to write. It seems to me as though the blocks are shifted somehow, but I can't figure out how to un-shift them. Why is this behavior happening?
I believe you only need to change the size parameter to obtain the result you want. Replacing this line should correct the error:
# img = Image.frombytes('RGB', (110, 120), rows)
img = Image.frombytes('RGB', (210, 110), rows)
Size should be a 2-Tuple of the width and height of the image in pixels. The rows list you are creating is an image that is (210,110) pixels. You are drawing that to an image that is (110,120) pixels. This causes the image to break to a new row every 110 pixels.
Here is a working example:
from PIL import Image
array = [
[12, 13, 12, 5, 6, 5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 7, 8],
[5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 7, 8, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 11, 10, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 1, 11, 1, 11, 1, 11, 7, 8, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 10, 8, 11, 8, 11, 8, 11, 7, 8, 0, 0, 0, 0, 0, 0],
[5, 14, 4, 6, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 10, 8, 11, 7, 7, 0, 0],
]
colors = {
0: (0, 0, 0),
1: (17, 17, 17),
2: (34, 34, 34),
3: (51, 51, 51),
4: (68, 68, 68),
5: (85, 85, 85),
6: (102, 102, 102),
7: (119, 119, 119),
8: (136, 136, 136),
9: (153, 153, 153),
10: (170, 170, 170),
11: (187, 187, 187),
12: (204, 204, 204),
13: (221, 221, 221),
14: (238, 238, 238)
}
rows = []
for row in array:
for _ in range(10):
for col in row:
for _ in range(10):
rows.extend(colors[col])
rows = bytes(rows)
img = Image.frombytes('RGB', (210, 110), rows)
img.save("generated.png")

Resources