Can anyone please explain to me what ‘def displayRMagicSquare(magic_square)’ is asking for exactly? - python-3.x

I want to achieve the expected output by creating two functions. I have included some guidelines within the functions.
def createRMagicSquare(row1, row2, row3, row4):
    '''
    A function that creates a Ramanujan magic square

    Returns
    ----------
    '''
    return magic_square

def displayRMagicSquare(magic_square):
    '''
    A function that displays a Ramanujan magic square

    Parameters
    ----------
    '''

c = createRMagicSquare([23, 22, 18, 87], [89, 27, 9, 25], [90, 24, 89, 16], [19, 46, 23, 11])
displayRMagicSquare(c)
Expected output
23 22 18 87
89 27 9 25
90 24 89 16
19 46 23 11

You are already achieving your expected output in createRMagicSquare(), so I think there is no need for displayRMagicSquare().
To createRMagicSquare() you are passing four lists,
createRMagicSquare([23, 22, 18, 87], [89, 27, 9, 25], [90, 24, 89, 16], [19, 46, 23, 11])
and they are being grouped into a single list of rows:
magic_square = [row1, row2, row3, row4]
The line print(" ".join(map(str, row))) then converts each row into a string of space-separated elements, such as
23 22 18 87
89 27 9 25
90 24 89 16
19 46 23 11
If you still want to use two functions to print the expected output:
def createRMagicSquare(row1, row2, row3, row4):
    magic_square = [row1, row2, row3, row4]
    return magic_square

def displayRMagicSquare(magic_square):
    for row in magic_square:
        print(" ".join(map(str, row)))

c = createRMagicSquare([23, 22, 18, 87], [89, 27, 9, 25], [90, 24, 89, 16], [19, 46, 23, 11])
displayRMagicSquare(c)
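If you also want the columns to line up when the numbers have different widths (not required by the expected output, just an optional tweak), you could right-align each entry:
def displayRMagicSquare(magic_square):
    # width of the widest number, so every column lines up
    width = max(len(str(n)) for row in magic_square for n in row)
    for row in magic_square:
        print(" ".join(str(n).rjust(width) for n in row))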

Related

How to rearrange a pandas dataframe having N columns and append N columns together in python?

I have a dataframe df as shown below. A index, B Index and C Index appear as headers,
and each of them has a sub-header, Last Price.
Input
A index B Index C Index
Date Last Price Date Last Price Date Last Price
1/10/2021 12 1/11/2021 46 2/9/2021 67
2/10/2021 13 2/11/2021 51 3/9/2021 70
3/10/2021 14 3/11/2021 62 4/9/2021 73
4/10/2021 15 4/11/2021 47 5/9/2021 76
5/10/2021 16 5/11/2021 51 6/9/2021 79
6/10/2021 17 6/11/2021 22 7/9/2021 82
7/10/2021 18 7/11/2021 29 8/9/2021 85
I want to transform it to the below dataframe.
Expected Output
Date Index Name Last Price
1/10/2021 A index 12
2/10/2021 A index 13
3/10/2021 A index 14
4/10/2021 A index 15
5/10/2021 A index 16
6/10/2021 A index 17
7/10/2021 A index 18
1/11/2021 B Index 46
2/11/2021 B Index 51
3/11/2021 B Index 62
4/11/2021 B Index 47
5/11/2021 B Index 51
6/11/2021 B Index 22
7/11/2021 B Index 29
2/9/2021 C Index 67
3/9/2021 C Index 70
4/9/2021 C Index 73
5/9/2021 C Index 76
6/9/2021 C Index 79
7/9/2021 C Index 82
8/9/2021 C Index 85
How can this be done with a pandas dataframe?
The structure of your df is not clear from your output. It would be useful if you provided Python code that creates an example, or at the very least the output of df.columns. For now, let us assume it is a two-level MultiIndex created as follows:
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ('A index', 'Date'), ('A index', 'Last Price'),
    ('B index', 'Date'), ('B index', 'Last Price'),
    ('C index', 'Date'), ('C index', 'Last Price'),
])
data = [
    ['1/10/2021', 12, '1/11/2021', 46, '2/9/2021', 67],
    ['2/10/2021', 13, '2/11/2021', 51, '3/9/2021', 70],
    ['3/10/2021', 14, '3/11/2021', 62, '4/9/2021', 73],
    ['4/10/2021', 15, '4/11/2021', 47, '5/9/2021', 76],
    ['5/10/2021', 16, '5/11/2021', 51, '6/9/2021', 79],
    ['6/10/2021', 17, '6/11/2021', 22, '7/9/2021', 82],
    ['7/10/2021', 18, '7/11/2021', 29, '8/9/2021', 85],
]
df = pd.DataFrame(columns=columns, data=data)
Then what you are trying to do is basically an application of .stack, with some rearrangement afterwards:
(df.stack(level=0)
   .reset_index(level=1)
   .rename(columns={'level_1': 'Index Name'})
   .sort_values(['Index Name', 'Date'])
)
This produces:
Index Name Date Last Price
0 A index 1/10/2021 12
1 A index 2/10/2021 13
2 A index 3/10/2021 14
3 A index 4/10/2021 15
4 A index 5/10/2021 16
5 A index 6/10/2021 17
6 A index 7/10/2021 18
0 B index 1/11/2021 46
1 B index 2/11/2021 51
2 B index 3/11/2021 62
3 B index 4/11/2021 47
4 B index 5/11/2021 51
5 B index 6/11/2021 22
6 B index 7/11/2021 29
0 C index 2/9/2021 67
1 C index 3/9/2021 70
2 C index 4/9/2021 73
3 C index 5/9/2021 76
4 C index 6/9/2021 79
5 C index 7/9/2021 82
6 C index 8/9/2021 85
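To match the expected output exactly (Date first, a clean integer index), a small follow-up sketch, assuming the df built above:
result = (df.stack(level=0)
            .reset_index(level=1)
            .rename(columns={'level_1': 'Index Name'})
            .sort_values(['Index Name', 'Date'])
            .reset_index(drop=True)
            [['Date', 'Index Name', 'Last Price']])
print(result)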

How to split columns by date using Python

df.head(7) shows:
Month,ward1,ward2,...ward30
Apr-19, 20, 30, 45
May-19, 18, 25, 42
Jun-19, 25, 19, 35
Jul-19, 28, 22, 38
Aug-19, 24, 15, 40
Sep-19, 21, 14, 39
Oct-19, 15, 18, 41
to:
Month, ward1
Apr-19, 20
May-19, 18
Jun-19, 25
Jul-19, 28
Aug-19, 24
Sep-19, 21
Oct-19, 15
Month,ward2
Apr-19, 30
May-19, 25
Jun-19, 19
Jul-19, 22
Aug-19, 15
Sep-19, 14
Oct-19, 18
Month, ward30
Apr-19, 45
May-19, 42
Jun-19, 35
Jul-19, 38
Aug-19, 40
Sep-19, 39
Oct-19, 41
How can I group by date in Python using pandas?
I have a dataframe df that contains a datetime column and 30 other columns, which I want to split into separate dataframes, one per column with its dates, but I am facing some difficulties.
Try using a dictionary comprehension to hold your separate dataframes:
indexed = df.set_index('Month')
dfs = {col: indexed[[col]] for col in indexed.columns}
print(dfs['ward1'])
ward1
Month
Apr-19 20
May-19 18
Jun-19 25
Jul-19 28
Aug-19 24
Sep-19 21
Oct-19 15
print(dfs['ward30'])
ward30
Month
Apr-19 45
May-19 42
Jun-19 35
Jul-19 38
Aug-19 40
Sep-19 39
Oct-19 41
One straightforward way would be to set the date column as the index and separate out every other column:
data.set_index('Month', inplace=True)
data_dict = {col: data[col] for col in data.columns}
Note that data[col] gives a Series; use data[[col]] instead if you want one-column DataFrames.
You have to create new DataFrames:
data1 = pd.DataFrame()
data1['Month'] = df['Month']
data1['ward1'] = df['ward1']
data1.head()
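Doing that by hand for all 30 wards is tedious; a short loop over the columns generalizes it (a sketch, assuming df still has its Month column):
# build one two-column DataFrame per ward
ward_frames = {}
for col in df.columns.drop('Month'):
    ward_frames[col] = df[['Month', col]].copy()

print(ward_frames['ward2'].head())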

Most frequently occurring numbers across multiple columns using pandas

I have a data frame with numbers in multiple columns listed by date. What I'm trying to do is find the most frequently occurring numbers across the whole data set, and also grouped by date.
import pandas as pd
import glob

def lotnorm(pdobject):
    # clean up special characters in the column names and make the date column the index as a date type
    pdobject["Date"] = pd.to_datetime(pdobject["Date"])
    pdobject = pdobject.set_index('Date')
    for column in pdobject:
        if '#' in column:
            pdobject = pdobject.rename(columns={column: column.replace('#', '')})
    return pdobject

def lotimport():
    lotret = {}
    # list files in data directory with csv filename
    for lotpath in glob.glob("data/*.csv"):
        lotname = lotpath.split('\\')[1].split('.')[0]
        lotret[lotname] = lotnorm(pd.read_csv(lotpath))
    return lotret

print(lotimport()['ozlotto'])
------------- Output ---------------------
1 2 3 4 5 6 7 8 9
Date
2020-07-07 4 5 7 9 12 13 32 19 35
2020-06-30 1 17 26 28 38 39 44 14 41
2020-06-23 1 3 9 13 17 20 41 28 45
2020-06-16 1 2 13 21 22 27 38 24 33
2020-06-09 8 11 26 27 31 38 39 3 36
... .. .. .. .. .. .. .. .. ..
2005-11-15 7 10 13 17 30 32 41 20 14
2005-11-08 12 18 22 28 33 43 45 23 13
2005-11-01 1 3 11 17 24 34 43 39 4
2005-10-25 7 16 23 29 36 39 42 19 43
2005-10-18 5 9 12 30 33 39 45 7 19
The output I am aiming for is
Number frequency
45 201
32 195
24 187
14 160
48 154
--------------- Updated with append experiment -----------
I tried using append to create a single series from the dataframe. It worked for individual lines of code, but I got a really odd result when I ran it inside a for loop.
temp = lotimport()['ozlotto']['1']
print(temp)
temp = temp.append(lotimport()['ozlotto']['2'], ignore_index=True, verify_integrity=True)
print(temp)
temp = temp.append(lotimport()['ozlotto']['3'], ignore_index=True, verify_integrity=True)
print(temp)
lotcomb = pd.DataFrame()
for i in lotimport()['ozlotto'].columns.tolist():
    print(f"{i} - {type(i)}")
    lotcomb = lotcomb.append(lotimport()['ozlotto'][i], ignore_index=True, verify_integrity=True)
print(lotcomb)
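The odd result is most likely because DataFrame.append treats each appended Series as a single row, so lotcomb grows one row per column instead of one long column. A sketch of the intended stacking (my reading of the goal), using pd.concat:
# concatenate every column of the ozlotto frame into one long Series
ozlotto = lotimport()['ozlotto']
lotcomb = pd.concat([ozlotto[col] for col in ozlotto.columns], ignore_index=True)
print(lotcomb)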
This solution might be the one you are looking for.
import numpy as np

freqvalues = np.unique(df.to_numpy(), return_counts=True)
df2 = pd.DataFrame(index=freqvalues[0], data=freqvalues[1], columns=["Frequency"])
df2.index.name = "Numbers"
df2
Output:
Frequency
Numbers
1 6
2 5
3 5
5 8
6 4
7 7
8 2
9 7
10 3
11 4
12 2
13 8
14 1
15 4
16 4
17 6
18 4
19 5
20 9
21 3
22 4
23 2
24 4
25 5
26 4
27 6
28 1
29 6
30 3
31 3
... ...
70 6
71 6
72 5
73 5
74 2
75 8
76 5
77 3
78 3
79 2
80 3
81 4
82 6
83 9
84 5
85 4
86 1
87 3
88 4
89 3
90 4
91 4
92 3
93 5
94 1
95 4
96 6
97 6
98 1
99 6
97 rows × 1 columns
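A possible alternative without numpy is to flatten the frame and let pandas count (a sketch, assuming the same df of drawn numbers as above):
# flatten all columns into one Series, then count occurrences (sorted descending)
counts = pd.Series(df.to_numpy().ravel()).value_counts()
counts.index.name = "Numbers"
print(counts.head(10))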
df.max(axis=0)   # maximum of each column
df.max(axis=1)   # maximum of each row (index)
OK, so the final answer I came up with was a mix of a few things, including some of the great input from people in this thread. Essentially I do the following:
1. Pull in the CSV file, clean up the dates and the column names, and convert it to a pandas dataframe.
2. Create a new pandas Series and append each column to it, ignoring dates to prevent conflicts.
3. Once I have the Series, use Vioxini's suggestion of numpy to get counts of unique values, turn the values into the index, then sort the column by count in descending order and return the top 10 values.
Below is the resulting code; I hope it helps someone else.
import pandas as pd
import glob
import numpy as np

def lotnorm(pdobject):
    # clean up special characters in the column names and make the date column the index as a date type
    pdobject["Date"] = pd.to_datetime(pdobject["Date"])
    pdobject = pdobject.set_index('Date')
    for column in pdobject:
        if '#' in column:
            pdobject = pdobject.rename(columns={column: column.replace('#', '')})
    return pdobject

def lotimport():
    lotret = {}
    # list files in data directory with csv filename
    for lotpath in glob.glob("data/*.csv"):
        lotname = lotpath.split('\\')[1].split('.')[0]
        lotret[lotname] = lotnorm(pd.read_csv(lotpath))
    return lotret

# NOTE: Series.append was removed in pandas 2.0; on newer versions use
# pd.concat([...], ignore_index=True) to stack the columns instead.
lotcomb = pd.Series([], dtype=object)
for i in lotimport()['ozlotto'].columns.tolist():
    lotcomb = lotcomb.append(lotimport()['ozlotto'][i], ignore_index=True, verify_integrity=True)

freqvalues = np.unique(lotcomb.to_numpy(), return_counts=True)
lotop = pd.DataFrame(index=freqvalues[0], data=freqvalues[1], columns=["Frequency"])
lotop.index.name = "Numbers"
lotop.sort_values(by=['Frequency'], ascending=False).head(10)

DataFrameGroupBy.agg NamedAgg on same column errors out on custom function, but works on built-in function

Setup
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(zip([1, 1, 2, 2, 2, 3, 7, 7, 9, 10],
                      *np.random.randint(1, 100, 20).reshape(-1, 10)),
                  columns=['A', 'B', 'C'])
Out[127]:
A B C
0 1 45 71
1 1 48 89
2 2 65 89
3 2 68 13
4 2 68 59
5 3 10 66
6 7 84 40
7 7 22 88
8 9 37 47
9 10 88 89
f = lambda x: x.max()
NamedAgg on built-in function works fine
df.groupby('A').agg(B_min=('B', 'min'), B_max=('B', 'max'), C_max=('C', 'max'))
Out[133]:
B_min B_max C_max
A
1 45 48 89
2 65 68 89
3 10 10 66
7 22 84 88
9 37 37 47
10 88 88 89
NamedAgg on custom function f errors out
df.groupby('A').agg(B_min=('B', 'min'), B_max=('B', f), C_max=('C', 'max'))
KeyError: "[('B', '<lambda>')] not in index"
Is there any explanation for this error? Is it an intentional restriction?
The issue is because of _mangle_lambda_list, which gets called at some point during the aggregation. There is a mismatch: the resulting aggregation gets renamed, but the list of output columns, order, which is then used for indexing, does not get changed. Since that function specifically checks for com.get_callable_name(aggfunc) == "<lambda>", any name other than '<lambda>' will work without issue:
Sample data
import pandas as pd
import numpy as np

np.random.seed(0)
df = pd.DataFrame(zip([1, 1, 2, 2, 2, 3, 7, 7, 9, 10],
                      *np.random.randint(1, 100, 20).reshape(-1, 10)),
                  columns=['A', 'B', 'C'])
f = lambda x: x.max()
kwargs = {'B_min': ('B', 'min'), 'B_max': ('B', f), 'C_max': ('C', 'max')}
Here are the most relevant major steps that get called when you aggregate, and we can see where the KeyError comes from.
func, columns, order = pd.core.groupby.generic._normalize_keyword_aggregation(kwargs)
print(order)
#[('B', 'min'), ('B', '<lambda>'), ('C', 'max')]
func = pd.core.groupby.generic._maybe_mangle_lambdas(func)
df.groupby('A')._aggregate(func)
# B C
# min <lambda_0> max # _0 ruins indexing with ('B', '<lambda>')
#A
#1 45 48 89
#2 65 68 89
#3 10 10 66
#7 22 84 88
#9 37 37 47
#10 88 88 89
Because _mangle_lambda_list is only called when there are multiple aggregations for the same column, you can get away with the '<lambda>' name, so long as it is the only aggregation for that column.
df.groupby('A').agg(A_min=('A', 'min'), B_max=('B', f))
# A_min B_max
#A
#1 1 48
#2 2 68
#3 3 10
#7 7 84
#9 9 37
#10 10 88
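Given the name check above, a simple workaround (my suggestion, not part of the original answer) is to give the custom function a real name, e.g. with def, so it never hits the '<lambda>' special case:
# a named function sidesteps the '<lambda>' mangling entirely
def col_max(x):
    return x.max()

df.groupby('A').agg(B_min=('B', 'min'), B_max=('B', col_max), C_max=('C', 'max'))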

Converting a pandas dataframe to a dictionary of lists

I have the following problem. I have a pandas dataframe that was populated from a pandas.read_sql.
The dataframe looks like this:
PERSON GRADES
20 A 70
21 A 23
22 A 67
23 B 08
24 B 06
25 B 88
26 B 09
27 C 40
28 D 87
29 D 11
And I would need to convert it to a dictionary of lists like this:
{'A':[70,23,67], 'B':[08,06,88,09], 'C':[40], 'D':[87,11]}
I've tried for 2 hours and now I'm out of ideas. I think I'm missing something very simple.
Use groupby and to_dict:
df.groupby('PERSON').GRADES.apply(list).to_dict()
Out[286]: {'A': [70, 23, 67], 'B': [8, 6, 88, 9], 'C': [40], 'D': [87, 11]}
(Integers cannot carry leading zeros, so 08 and 06 necessarily come out as 8 and 6.)
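For intuition, here is the same grouping written out in plain Python with a defaultdict (an illustrative sketch; the one-liner above is all you need):
from collections import defaultdict

grades = defaultdict(list)
for person, grade in zip(df['PERSON'], df['GRADES']):
    grades[person].append(grade)
print(dict(grades))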
