I have a list called 'Aff' that consists of dictionaries. It looks like this:
Aff=[{('J', 0, 1): 36, ('J', 1, 1): 36, ('J', 2, 1): 42}, {('I', 0, 1): 36, ('I', 1, 1): 30}, {('H', 0, 1): 36, ('H', 1, 1): 36, ('H', 2, 1): 42}]
and I want to get this structure in Excel:
Num  letter  NV  Position  Q
1    J       0   1         36
1    J       1   1         36
1    J       2   1         42
2    I       0   1         36
2    I       1   1         30
...etc.
Because your dictionary keys are tuples, you need to flatten the data before writing it out. IIUC, Num is the index (+1) of the dictionary in the list.
Once you have flattened your data, you can use pandas' DataFrame.to_excel():
import pandas as pd
Aff=[{('J', 0, 1): 36, ('J', 1, 1): 36, ('J', 2, 1): 42}, {('I', 0, 1): 36, ('I', 1, 1): 30}, {('H', 0, 1): 36, ('H', 1, 1): 36, ('H', 2, 1): 42}]
arr = []
for num, d in enumerate(Aff):
    for k, v in d.items():
        arr.append([num + 1] + list(k) + [v])
df = pd.DataFrame(arr, columns=['Num', 'letter', 'NV', 'Position', 'Q'])
df.to_excel('output.xlsx', index=False)
print(df) would output:
   Num letter  NV  Position   Q
0    1      J   0         1  36
1    1      J   1         1  36
2    1      J   2         1  42
3    2      I   0         1  36
4    2      I   1         1  30
5    3      H   0         1  36
6    3      H   1         1  36
7    3      H   2         1  42
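The flattening step can also be pulled into a small helper, which makes it easy to inspect the rows before writing the file. This is only a sketch: the name flatten_aff is made up for illustration, and it assumes every key is a 3-tuple as in the question. The to_excel call is left commented out because it needs an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

def flatten_aff(aff):
    """Turn a list of {(letter, nv, position): q} dicts into flat rows."""
    # Hypothetical helper: Num is the 1-based position of each dict in the list.
    return [
        [num, *key, value]
        for num, d in enumerate(aff, start=1)
        for key, value in d.items()
    ]

Aff = [{('J', 0, 1): 36, ('J', 1, 1): 36, ('J', 2, 1): 42},
       {('I', 0, 1): 36, ('I', 1, 1): 30},
       {('H', 0, 1): 36, ('H', 1, 1): 36, ('H', 2, 1): 42}]

rows = flatten_aff(Aff)
df = pd.DataFrame(rows, columns=['Num', 'letter', 'NV', 'Position', 'Q'])
# df.to_excel('output.xlsx', index=False)  # requires an Excel engine such as openpyxl
```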
Write a program to solve a Sudoku puzzle by filling the empty cells, where 0 represents an empty cell.
Rules:
Every number must appear exactly once in the diagonal running from top-left to bottom-right.
Every number must appear exactly once in the diagonal running from top-right to bottom-left.
Every number must appear exactly once in each 3*3 sub-grid.
However, a number can repeat in a row or column.
Constraints:
N = 9, where N represents the number of rows and columns of the grid.
What I have tried:
N = 9

def printing(arr):
    for i in range(N):
        for j in range(N):
            print(arr[i][j], end=" ")
        print()

def isSafe(grid, row, col, num):
    for x in range(9):
        if grid[x][x] == num:
            return False
    cl = 0
    for y in range(8, -1, -1):
        if grid[cl][y] == num:
            cl += 1
            return False
    startRow = row - row % 3
    startCol = col - col % 3
    for i in range(3):
        for j in range(3):
            if grid[i + startRow][j + startCol] == num:
                return False
    return True

def solveSudoku(grid, row, col):
    if (row == N - 1 and col == N):
        return True
    if col == N:
        row += 1
        col = 0
    if grid[row][col] > 0:
        return solveSudoku(grid, row, col + 1)
    for num in range(1, N + 1, 1):
        if isSafe(grid, row, col, num):
            grid[row][col] = num
            print(grid)
            if solveSudoku(grid, row, col + 1):
                return True
        grid[row][col] = 0
    return False

if (solveSudoku(grid, 0, 0)):
    printing(grid)
else:
    print("Solution does not exist")
Input:
grid =
[
[0, 3, 7, 0, 4, 2, 0, 2, 0],
[5, 0, 6, 1, 0, 0, 0, 0, 7],
[0, 0, 2, 0, 0, 0, 5, 0, 0],
[2, 8, 3, 0, 0, 0, 0, 0, 0],
[0, 5, 0, 0, 7, 1, 2, 0, 7],
[0, 0, 0, 3, 0, 0, 0, 0, 3],
[7, 0, 0, 0, 0, 6, 0, 5, 0],
[0, 2, 3, 0, 3, 0, 7, 4, 2],
[0, 5, 0, 0, 8, 0, 0, 0, 0]
]
Output:
8 3 7 0 4 2 1 2 4
5 0 6 1 8 6 3 6 7
4 1 2 7 3 5 5 0 8
2 8 3 5 4 8 6 0 5
6 5 0 0 7 1 2 1 7
1 4 7 3 2 6 8 4 3
7 8 1 4 1 6 3 5 0
6 2 3 7 3 5 7 4 2
0 5 4 2 8 0 6 8 1
Basically, I am stuck on checking for distinct numbers along the diagonals. So the question is: how can I make sure that no element is repeated in those diagonals and in the sub-grids?
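One likely cause of the problem is that the posted isSafe tests both diagonals for every cell, even cells that do not lie on a diagonal, and that the anti-diagonal loop does not appear to advance cl together with y. A minimal sketch of a check that only applies each diagonal rule to cells actually on that diagonal could look like the following. The name is_safe and the small demo grid are made up for illustration, and this has not been run against the input grid above.

```python
N = 9

def is_safe(grid, row, col, num):
    """Return True if num may be placed at (row, col) under the rules above."""
    # Main diagonal (top-left to bottom-right): only applies when row == col.
    if row == col and any(grid[i][i] == num for i in range(N)):
        return False
    # Anti-diagonal (top-right to bottom-left): only applies when row + col == N - 1.
    if row + col == N - 1 and any(grid[i][N - 1 - i] == num for i in range(N)):
        return False
    # 3x3 sub-grid that contains the cell; rows and columns are not checked,
    # because the rules allow repeats there.
    start_row, start_col = row - row % 3, col - col % 3
    for i in range(start_row, start_row + 3):
        for j in range(start_col, start_col + 3):
            if grid[i][j] == num:
                return False
    return True

# Illustrative demo grid: empty except for a 5 in the top-left corner.
demo = [[0] * N for _ in range(N)]
demo[0][0] = 5
on_diagonal = is_safe(demo, 4, 4, 5)   # same main diagonal as the 5
same_row = is_safe(demo, 0, 5, 5)      # same row only, different sub-grid
```

Here on_diagonal is False because (4, 4) shares the main diagonal with the existing 5, while same_row is True because a repeat within a row is allowed.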
DataFrame
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3]})
df
Expected output
df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],'group':[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 100, 4, 4, 4, 4]})
df
I need to transform the dataframe into the expected output. I am after a rule in which a 1 marks the start of a new group, and a group consists of a single 1 followed by n zeroes. If a group does not meet this criterion, it should be labelled 100.
I tried something along the lines of:
bs=df[df.occurance.eq(1).any(1)&df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs
Even when broken down, this could only boolean-select the start of each group and nothing more.
Any help?
Create a mask that is True where a 1 is immediately followed by another 1. Then select occurance for all rows outside the mask, take the cumulative sum with Series.cumsum, and finally fill the masked rows with 100 using Series.reindex:
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)
df['group'] = df.loc[~m, 'occurance'].cumsum().reindex(df.index, fill_value=100)
print (df)
    occurance  value  group
0           1     45      1
1           0      3      1
2           0      2      1
3           0     12      1
4           1     14      2
5           0     32      2
6           0      1      2
7           0      1      2
8           0      6      2
9           0      4      2
10          1      9      3
11          0     32      3
12          1     78    100
13          1     96      4
14          0     12      4
15          0      6      4
16          0      3      4
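To see why this works, it can help to split the one-liner into its intermediate steps. The sketch below does that on the occurance column only; the variable name valid is introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'occurance': [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]})

# True where a 1 is immediately followed by another 1 (a "group" with no zeroes).
m = df.occurance.eq(1) & df.occurance.shift(-1).eq(1)

# A running count of the remaining 1s numbers the valid groups 1, 2, 3, ...
valid = df.loc[~m, 'occurance'].cumsum()

# Restoring the full index brings the masked rows back, filled with 100.
df['group'] = valid.reindex(df.index, fill_value=100)
```

Note that a 1 in the very last row is not flagged by this mask, since there is no following row to compare it with.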
Setting: for a dataframe with 10 columns I have a list of 10 functions which I wish to apply in a function1(column1), function2(column2), ..., function10(column10) fashion. I have looked into pandas.DataFrame.apply and pandas.DataFrame.transform but they seem to broadcast and apply each function on all possible columns.
IIUC, with zip and a for loop:
Example
def function1(x):
    return x + 1

def function2(x):
    return x * 2

def function3(x):
    return x**2

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})
functions = [function1, function2, function3]
print(df)
#    A  B  C
# 0  1  1  1
# 1  2  2  2
# 2  3  3  3
for col, func in zip(df, functions):
    df[col] = df[col].apply(func)
print(df)
#    A  B  C
# 0  2  2  1
# 1  3  4  4
# 2  4  6  9
You could do something like:
# list containing your functions, in the same order as the columns
fun_list = []
# assume df is your dataframe
for i, fun in enumerate(fun_list):
    df.iloc[:, i] = fun(df.iloc[:, i])
You can probably map your N functions onto each row by using a lambda that returns a Series containing your operations. Check the following code:
import pandas as pd
matrix = [(22, 34, 23), (33, 31, 11), (44, 16, 21), (55, 32, 22), (66, 33, 27),
(77, 35, 11)]
df = pd.DataFrame(matrix, columns=list('xyz'), index=list('abcdef'))
Will produce:
    x   y   z
a  22  34  23
b  33  31  11
c  44  16  21
d  55  32  22
e  66  33  27
f  77  35  11
and then:
res_df = df.apply(lambda row: pd.Series([row.iloc[0] + 1, row.iloc[1] + 2, row.iloc[2] + 3]), axis=1)
will give you:
    0   1   2
a  23  36  26
b  34  33  14
c  45  18  24
d  56  34  25
e  67  35  30
f  78  37  14
You can simply apply a function to a specific column:
df['x'] = df['x'].apply(lambda x: x*2)
Similar to @Chris Adams's answer, but this makes a copy of the dataframe using a dictionary comprehension and zip.
def function1(x):
    return x + 1

def function2(x):
    return x * 2

def function3(x):
    return x**2

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})
functions = [function1, function2, function3]
print(df)
#    A  B  C
# 0  1  1  1
# 1  2  2  2
# 2  3  3  3
df_2 = pd.DataFrame({col: func(df[col]) for col, func in zip(df, functions)})
print(df_2)
#    A  B  C
# 0  2  2  1
# 1  3  4  4
# 2  4  6  9
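Since the question mentions DataFrame.transform: it also accepts a dict that maps column labels to functions, in which case each function is applied only to its own column rather than broadcast over all of them. A minimal sketch, reusing the example functions from the answers above (the name by_column is introduced here for illustration):

```python
import pandas as pd

def function1(x):
    return x + 1

def function2(x):
    return x * 2

def function3(x):
    return x**2

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})
functions = [function1, function2, function3]

# Pair each column label with its function, then let transform do the dispatch.
by_column = dict(zip(df.columns, functions))
out = df.transform(by_column)
```

This returns a new dataframe and leaves df untouched. Keying the functions by column label also avoids relying on the column order staying the same.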
I'm trying to build a frame with the following structure:
h/a                              totales
sub1          sub2               sub1          sub2
a b ...       f g ....m          a b ...       f g ....m
That is, 2 labels for the first level, again 2 labels for the second one, and then a set of column names, where sub1 and sub2 don't have the same column names.
In order to do so I did the following:
columnas=pd.MultiIndex.from_product([['h/a','totals'],['means','percentages'],
                                     [('means','a'),('means','b'),....('percentage','g'),....]],
                                    names=['data level 1','data level 2','data level 3'])
data=[data,pata,......]
newframe=pd.DataFrame(data,columns=columnas)
What I get is this error:
>ValueError: Shape of passed values is (1, 21), indices imply (84, 21)
How can I fix this to have a multi leveled frame by column names?
Thank you
I think you need MultiIndex.from_tuples with list comprehensions:
import numpy as np
import pandas as pd

L1 = list('abc')
L2 = list('ghi')
tups = ([('h/a','means', x) for x in L1] +
        [('h/a','percentage', x) for x in L2] +
        [('totals','means', x) for x in L1] +
        [('totals','percentage', x) for x in L2])
columnas=pd.MultiIndex.from_tuples(tups, names=['data level 1','data level 2','data level 3'])
print (columnas)
MultiIndex(levels=[['h/a', 'totals'],
['means', 'percentage'],
['a', 'b', 'c', 'g', 'h', 'i']],
labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
[0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]],
names=['data level 1', 'data level 2', 'data level 3'])
#some random data
np.random.seed(785)
data = np.random.randint(10, size=(3, 12))
print (data)
[[8 0 4 1 2 5 4 1 4 1 1 8]
[1 5 0 7 4 8 4 1 3 8 0 2]
[5 9 4 9 4 6 3 7 0 5 2 1]]
newframe=pd.DataFrame(data,columns=columnas)
print (newframe)
data level 1    h/a                          totals
data level 2  means        percentage         means        percentage
data level 3      a  b  c           g  h  i       a  b  c           g  h  i
0                 8  0  4           1  2  5       4  1  4           1  1  8
1                 1  5  0           7  4  8       4  1  3           8  0  2
2                 5  9  4           9  4  6       3  7  0           5  2  1
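The error in the question is consistent with how from_product works: it builds every combination of the levels, so two first-level labels, two second-level labels and 21 third-level entries would give 2 * 2 * 21 = 84 columns. The sketch below compares the two constructors on the small example from this answer; the variable names product_cols and tuple_cols are introduced here for illustration.

```python
import pandas as pd

L1 = list('abc')
L2 = list('ghi')

# from_product builds every combination: 2 * 2 * 6 = 24 columns,
# including pairings such as ('h/a', 'means', 'g') that are not wanted.
product_cols = pd.MultiIndex.from_product(
    [['h/a', 'totals'], ['means', 'percentage'], L1 + L2])

# from_tuples keeps only the listed combinations: 2 * (3 + 3) = 12 columns.
tups = [(top, sub, x)
        for top in ['h/a', 'totals']
        for sub, names in [('means', L1), ('percentage', L2)]
        for x in names]
tuple_cols = pd.MultiIndex.from_tuples(tups)

print(len(product_cols), len(tuple_cols))
```

The number of columns in the index has to match the width of the data passed to pd.DataFrame, which is why only the from_tuples version fits a 12-column array.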
I have a dataframe like this:
day  time  category  count
1    1     a         13
1    2     a         47
1    3     a         1
1    5     a         2
1    6     a         4
2    7     a         14
2    2     a         10
2    1     a         9
2    4     a         2
2    6     a         1
I want to group by day and category and get a vector of the counts per time, where time can be between 1 and 10. I have defined the max and min of time in two variables called max and min.
This is how I want the resulting dataframe to look:
day  category  count
1    a         [13,47,1,0,2,4,0,0,0,0]
2    a         [9,10,0,2,0,1,14,0,0,0]
Does anyone know how to make this aggregation into a vector?
Use reindex with MultiIndex.from_product to add the missing time values (filled with 0), and then groupby with list:
df = df.set_index(['day','time', 'category'])
a = df.index.levels[0]
b = range(1,11)
c = df.index.levels[2]
df = df.reindex(pd.MultiIndex.from_product([a,b,c], names=df.index.names), fill_value=0)
df = df.groupby(['day','category'])['count'].apply(list).reset_index()
print (df)
   day category                             count
0    1        a  [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1    2        a  [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
EDIT:
df = (df.set_index(['day','time', 'category'])['count']
        .unstack(1, fill_value=0)
        .reindex(columns=range(1,11), fill_value=0))
print (df)
time           1   2  3  4  5  6   7  8  9  10
day category
1   a         13  47  1  0  2  4   0  0  0   0
2   a          9  10  0  2  0  1  14  0  0   0
df = df.apply(list, 1).reset_index(name='count')
print (df)
   day  ...                             count
0    1  ...  [13, 47, 1, 0, 2, 4, 0, 0, 0, 0]
1    2  ...  [9, 10, 0, 2, 0, 1, 14, 0, 0, 0]
[2 rows x 3 columns]
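For reference, here is a self-contained sketch of the unstack approach that takes the time range from two variables, as the question describes. The names time_min and time_max are assumptions made here; the question calls them min and max, which would shadow Python's built-ins.

```python
import pandas as pd

df = pd.DataFrame({
    'day':      [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time':     [1, 2, 3, 5, 6, 7, 2, 1, 4, 6],
    'category': ['a'] * 10,
    'count':    [13, 47, 1, 2, 4, 14, 10, 9, 2, 1],
})

# Assumed names for the bounds of the time axis (inclusive on both ends).
time_min, time_max = 1, 10

# One row per (day, category), one column per time value, missing times as 0.
wide = (df.set_index(['day', 'category', 'time'])['count']
          .unstack('time', fill_value=0)
          .reindex(columns=range(time_min, time_max + 1), fill_value=0))

# Collapse each row of counts into a single list.
result = wide.apply(list, axis=1).reset_index(name='count')
```

Like the answer above, this assumes each (day, time, category) combination appears at most once; duplicates would need to be aggregated first, for example with a groupby sum.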