How to split columns by date using Python - python-3.x

df.head(7)
Month,ward1,ward2,...,ward30
Apr-19, 20, 30, 45
May-19, 18, 25, 42
Jun-19, 25, 19, 35
Jul-19, 28, 22, 38
Aug-19, 24, 15, 40
Sep-19, 21, 14, 39
Oct-19, 15, 18, 41
Desired output (one dataframe per ward):
Month, ward1
Apr-19, 20
May-19, 18
Jun-19, 25
Jul-19, 28
Aug-19, 24
Sep-19, 21
Oct-19, 15
Month,ward2
Apr-19, 30
May-19, 25
Jun-19, 19
Jul-19, 22
Aug-19, 15
Sep-19, 14
Oct-19, 18
Month, ward30
Apr-19, 45
May-19, 42
Jun-19, 35
Jul-19, 38
Aug-19, 40
Sep-19, 39
Oct-19, 41
How do I group by date in Python using pandas?
I have a dataframe df that contains a date column and 30 other columns. I want to split it into separate dataframes, one per column, each with the date attached, but I am having some difficulties doing this in pandas.

Try using a dictionary comprehension to hold your separate dataframes:
indexed = df.set_index('Month')
dfs = {col: indexed[[col]] for col in indexed.columns}
print(dfs['ward1'])
ward1
Month
Apr-19 20
May-19 18
Jun-19 25
Jul-19 28
Aug-19 24
Sep-19 21
Oct-19 15
print(dfs['ward30'])
ward30
Month
Apr-19 45
May-19 42
Jun-19 35
Jul-19 38
Aug-19 40
Sep-19 39
Oct-19 41

One straightforward way is to set the date column as the index and then separate out every other column (each value in the dictionary is a Series):
data = df.set_index('Month')
data_dict = {col: data[col] for col in data.columns}

Alternatively, you can build each new DataFrame explicitly:
data1 = pd.DataFrame()
data1['Month'] = df['Month']
data1['ward1'] = df['ward1']
data1.head()
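A minimal sketch that generalises the explicit approach above to every ward column at once, assuming the sample data from the question (only ward1, ward2 and ward30 are reproduced). Keeping 'Month' as an ordinary column matches the desired output in the question; the name ward_frames is only illustrative.

```python
import pandas as pd

# Sample data from the question; the real frame has ward1 ... ward30
df = pd.DataFrame({
    'Month': ['Apr-19', 'May-19', 'Jun-19', 'Jul-19', 'Aug-19', 'Sep-19', 'Oct-19'],
    'ward1': [20, 18, 25, 28, 24, 21, 15],
    'ward2': [30, 25, 19, 22, 15, 14, 18],
    'ward30': [45, 42, 35, 38, 40, 39, 41],
})

# One two-column DataFrame per ward, keeping 'Month' as an ordinary column;
# .copy() makes each frame independent of df
ward_frames = {
    col: df[['Month', col]].copy()
    for col in df.columns
    if col != 'Month'
}

for frame in ward_frames.values():
    print(frame.to_string(index=False))
    print()
```

Because the dictionary is keyed by column name, ward_frames['ward30'] gives the last table from the desired output.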

Related

How to rearrange a pandas dataframe having N columns and append N columns together in python?

I have a dataframe df as shown below. A index, B Index and C Index appear as top-level headers,
and each of them has the sub-headers Date and Last Price.
Input
A index B Index C Index
Date Last Price Date Last Price Date Last Price
1/10/2021 12 1/11/2021 46 2/9/2021 67
2/10/2021 13 2/11/2021 51 3/9/2021 70
3/10/2021 14 3/11/2021 62 4/9/2021 73
4/10/2021 15 4/11/2021 47 5/9/2021 76
5/10/2021 16 5/11/2021 51 6/9/2021 79
6/10/2021 17 6/11/2021 22 7/9/2021 82
7/10/2021 18 7/11/2021 29 8/9/2021 85
I want to transform it into the dataframe below.
Expected Output
Date Index Name Last Price
1/10/2021 A index 12
2/10/2021 A index 13
3/10/2021 A index 14
4/10/2021 A index 15
5/10/2021 A index 16
6/10/2021 A index 17
7/10/2021 A index 18
1/11/2021 B Index 46
2/11/2021 B Index 51
3/11/2021 B Index 62
4/11/2021 B Index 47
5/11/2021 B Index 51
6/11/2021 B Index 22
7/11/2021 B Index 29
2/9/2021 C Index 67
3/9/2021 C Index 70
4/9/2021 C Index 73
5/9/2021 C Index 76
6/9/2021 C Index 79
7/9/2021 C Index 82
8/9/2021 C Index 85
How can this be done with a pandas dataframe?
The structure of your df is not clear from your output. It would be useful if you provided Python code that creates an example, or at the very least the output of df.columns. For now, let us assume it is a two-level MultiIndex created like this:
columns = pd.MultiIndex.from_tuples([('A index','Date'), ('A index','Last Price'),('B index','Date'), ('B index','Last Price'),('C index','Date'), ('C index','Last Price')])
data = [
['1/10/2021', 12, '1/11/2021', 46, '2/9/2021', 67],
['2/10/2021', 13, '2/11/2021', 51, '3/9/2021', 70],
['3/10/2021', 14, '3/11/2021', 62, '4/9/2021', 73],
['4/10/2021', 15, '4/11/2021', 47, '5/9/2021', 76],
['5/10/2021', 16, '5/11/2021', 51, '6/9/2021', 79],
['6/10/2021', 17, '6/11/2021', 22, '7/9/2021', 82],
['7/10/2021', 18, '7/11/2021', 29, '8/9/2021', 85],
]
df = pd.DataFrame(columns = columns, data = data)
Then what you are trying to do is basically an application of .stack, with some rearrangement afterwards:
(df.stack(level = 0)
.reset_index(level=1)
.rename(columns = {'level_1':'Index Name'})
.sort_values(['Index Name','Date'])
)
This produces:
Index Name Date Last Price
0 A index 1/10/2021 12
1 A index 2/10/2021 13
2 A index 3/10/2021 14
3 A index 4/10/2021 15
4 A index 5/10/2021 16
5 A index 6/10/2021 17
6 A index 7/10/2021 18
0 B index 1/11/2021 46
1 B index 2/11/2021 51
2 B index 3/11/2021 62
3 B index 4/11/2021 47
4 B index 5/11/2021 51
5 B index 6/11/2021 22
6 B index 7/11/2021 29
0 C index 2/9/2021 67
1 C index 3/9/2021 70
2 C index 4/9/2021 73
3 C index 5/9/2021 76
4 C index 6/9/2021 79
5 C index 7/9/2021 82
6 C index 8/9/2021 85
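One caveat with the answer above: it sorts the Date strings lexicographically, which only works here because every day number is a single digit ('10/10/2021' would sort before '2/10/2021'). A sketch that parses the dates before sorting and reorders the columns to match the expected output, assuming day/month/year dates (the sample is ambiguous, so change the format string if they are month/day/year):

```python
import pandas as pd

# Rebuild the two-level column MultiIndex from the answer above
columns = pd.MultiIndex.from_tuples([
    ('A index', 'Date'), ('A index', 'Last Price'),
    ('B index', 'Date'), ('B index', 'Last Price'),
    ('C index', 'Date'), ('C index', 'Last Price'),
])
data = [
    ['1/10/2021', 12, '1/11/2021', 46, '2/9/2021', 67],
    ['2/10/2021', 13, '2/11/2021', 51, '3/9/2021', 70],
    ['3/10/2021', 14, '3/11/2021', 62, '4/9/2021', 73],
    ['4/10/2021', 15, '4/11/2021', 47, '5/9/2021', 76],
    ['5/10/2021', 16, '5/11/2021', 51, '6/9/2021', 79],
    ['6/10/2021', 17, '6/11/2021', 22, '7/9/2021', 82],
    ['7/10/2021', 18, '7/11/2021', 29, '8/9/2021', 85],
]
df = pd.DataFrame(data, columns=columns)

# Move the top column level ('A index', ...) into the row index, then into a column
out = (
    df.stack(level=0)
      .reset_index(level=1)
      .rename(columns={'level_1': 'Index Name'})
)

# Assumption: dates are day/month/year; parsing them makes the sort chronological
out['Date'] = pd.to_datetime(out['Date'], format='%d/%m/%Y')

# Sort within each index, put the columns in the expected order, renumber rows
out = (
    out.sort_values(['Index Name', 'Date'])
       [['Date', 'Index Name', 'Last Price']]
       .reset_index(drop=True)
)
print(out)
```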

How to sum certain columns ending a certain word of a dataframe in python pandas?

I am trying to sum the columns ending in 'Gen' and the columns ending in 'Load' into two new columns.
My dataframe is:
Date A_Gen A_Load B_Gen B_Load
1-1-2010 30 20 40 30
1-2-2010 45 25 35 25
The result wanted is:
Date A_Gen A_Load B_Gen B_Load S_Gen S_Load
1-1-2010 30 20 40 30 70 50
1-2-2010 45 25 35 25 80 50
Try using filter(like='..') to get the relevant columns, sum along axis=1, and return your 2 new columns:
df['S_Gen'], df['S_Load'] = df.filter(like='Gen').sum(axis=1), df.filter(like='Load').sum(axis=1)
Output:
df
Date A_Gen A_Load B_Gen B_Load S_Gen S_Load
0 2010-01-01 30 20 40 30 70 50
1 2010-02-01 45 25 35 25 80 50
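filter(like=...) matches the substring anywhere in a column name, while the question asks for columns ending in 'Gen' or 'Load'. A sketch using an anchored regex instead, with assign so both sums are computed from the original columns; the frame is rebuilt from the question's sample with Date left as strings. Note that S_Gen and S_Load also end in '_Gen' and '_Load', so running this a second time on the result would include them in the sums.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1-1-2010', '1-2-2010'],
    'A_Gen': [30, 45], 'A_Load': [20, 25],
    'B_Gen': [40, 35], 'B_Load': [30, 25],
})

# regex='_Gen$' keeps only names that END in "_Gen";
# both sums are evaluated on df before the new columns are added
df = df.assign(
    S_Gen=df.filter(regex='_Gen$').sum(axis=1),
    S_Load=df.filter(regex='_Load$').sum(axis=1),
)
print(df)
```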

Can anyone please explain to me what ‘def displayRMagicSquare(magic_square)’ is asking for exactly?

I want to achieve the expected output by creating two functions. I have included some guidelines within the functions.
def createRMagicSquare(row1,row2,row3,row4):
    '''
    A function that creates a Ramanujan magic square
    Returns
    ----------
    '''
    return magic_square

def displayRMagicSquare(magic_square):
    '''
    A function that displays a Ramanujan magic square
    Parameters
    ----------
    '''
c = createRMagicSquare([23, 22, 18 ,87], [89, 27, 9 ,25] , [90, 24 ,89 ,16] , [19, 46 ,23 ,11])
displayRMagicSquare(c)
Expected output
23 22 18 87
89 27 9 25
90 24 89 16
19 46 23 11
You are already producing the expected output in createRMagicSquare(), so I think there is no need for displayRMagicSquare().
You pass four lists to createRMagicSquare():
createRMagicSquare([23, 22, 18 ,87], [89, 27, 9 ,25] , [90, 24 ,89 ,16] , [19, 46 ,23 ,11])
and they are grouped into a single list:
magic_square = [row1,row2,row3,row4]
and the line print(" ".join(map(str,i))) converts each inner list into a string of space-separated elements:
23 22 18 87
89 27 9 25
90 24 89 16
19 46 23 11
If you still want to use two functions to print the expected output:
def createRMagicSquare(row1, row2, row3, row4):
    # group the four rows into one list of lists
    magic_square = [row1, row2, row3, row4]
    return magic_square

def displayRMagicSquare(magic_square):
    # print each row as space-separated numbers
    for i in magic_square:
        print(" ".join(map(str, i)))
    return magic_square

c = createRMagicSquare([23, 22, 18, 87], [89, 27, 9, 25], [90, 24, 89, 16], [19, 46, 23, 11])
displayRMagicSquare(c)
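If the exercise wants displayRMagicSquare to handle only presentation, one option is to separate building the text from printing it, which also makes the output easy to check. A sketch; format_square and display_square are hypothetical names, not part of the assignment template:

```python
def format_square(square):
    """Return the rows of `square` as lines of space-separated numbers."""
    return "\n".join(" ".join(str(value) for value in row) for row in square)


def display_square(square):
    """Print the formatted square; all formatting lives in format_square."""
    print(format_square(square))


square = [[23, 22, 18, 87], [89, 27, 9, 25], [90, 24, 89, 16], [19, 46, 23, 11]]
display_square(square)
```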

Converting a pandas dataframe to a dictionary of lists

I have the following problem. I have a pandas dataframe that was populated from a pandas.read_sql.
The dataframe looks like this:
PERSON GRADES
20 A 70
21 A 23
22 A 67
23 B 08
24 B 06
25 B 88
26 B 09
27 C 40
28 D 87
29 D 11
And I would need to convert it to a dictionary of lists like this:
{'A':[70,23,67], 'B':[08,06,88,09], 'C':[40], 'D':[87,11]}
I've tried for 2 hours and now I'm out of ideas. I think I'm missing something very simple.
Use groupby and to_dict:
df.groupby('PERSON').GRADES.apply(list).to_dict()
Out[286]: {'A': [70, 23, 67], 'B': [8, 6, 88, 9], 'C': [40], 'D': [87, 11]}
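An equivalent sketch that iterates over the groups directly, with the frame rebuilt from the question. Leading zeros such as 08 cannot survive in integer data (08 is not even a valid integer literal in Python 3), which is why the output shows 8 and 6; store GRADES as strings if the zeros matter.

```python
import pandas as pd

df = pd.DataFrame(
    {'PERSON': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'D', 'D'],
     'GRADES': [70, 23, 67, 8, 6, 88, 9, 40, 87, 11]},
    index=range(20, 30),
)

# Each iteration yields (person, Series of that person's grades in row order)
grades = {person: group.tolist() for person, group in df.groupby('PERSON')['GRADES']}
print(grades)
```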

Python - sum of each day in multiple excel sheet

I have Excel sheets with data like this:
Sheet1
duration date
10 5/20/2017 08:20
23 5/20/2017 10:20
33 5/21/2017 12:20
56 5/22/2017 23:20
Sheet2
duration date
34 5/20/2017 01:20
12 5/20/2017 03:20
05 5/21/2017 11:20
44 5/22/2017 23:20
Expected output:
day[20] : [33, 46]
day[21] : [33, 5]
day[22] : [56, 44]
I am trying to sum the duration per day across all sheets with the code below:
import pandas as pd

xls = pd.ExcelFile('reports.xlsx')
report_sheets = []
for sheetName in xls.sheet_names:
    sheet = pd.read_excel(xls, sheet_name=sheetName)
    sheet['date'] = pd.to_datetime(sheet['date'])
    print(sheet.groupby(sheet['date'].dt.strftime('%Y-%m-%d'))['duration'].sum().sort_values())
How can I achieve this?
You can pass sheet_name=None to read_excel to get a dictionary of DataFrames, one per sheet:
dfs = pd.read_excel('reports.xlsx', sheet_name=None)
print (dfs)
OrderedDict([('Sheet1', duration date
0 10 5/20/2017 08:20
1 23 5/20/2017 10:20
2 33 5/21/2017 12:20
3 56 5/22/2017 23:20), ('Sheet2', duration date
0 34 5/20/2017 01:20
1 12 5/20/2017 03:20
2 5 5/21/2017 11:20
3 44 5/22/2017 23:20)])
Then aggregate in dictionary comprehension:
dfs1 = {i:x.groupby(pd.to_datetime(x['date']).dt.strftime('%Y-%m-%d'))['duration'].sum() for i, x in dfs.items()}
print (dfs1)
{'Sheet2': date
2017-05-20 46
2017-05-21 5
2017-05-22 44
Name: duration, dtype: int64, 'Sheet1': date
2017-05-20 33
2017-05-21 33
2017-05-22 56
Name: duration, dtype: int64}
Finally, concat the results, group by date to build the lists, and convert them to a dictionary with to_dict:
d = pd.concat(dfs1).groupby(level=1).apply(list).to_dict()
print (d)
{'2017-05-22': [56, 44], '2017-05-21': [33, 5], '2017-05-20': [33, 46]}
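A self-contained version of the same pipeline, with the dictionary that read_excel(..., sheet_name=None) would return built by hand so it runs without reports.xlsx. The explicit format string assumes the month/day/year timestamps shown in the question.

```python
import pandas as pd

# Hand-built stand-in for pd.read_excel('reports.xlsx', sheet_name=None)
dfs = {
    'Sheet1': pd.DataFrame({'duration': [10, 23, 33, 56],
                            'date': ['5/20/2017 08:20', '5/20/2017 10:20',
                                     '5/21/2017 12:20', '5/22/2017 23:20']}),
    'Sheet2': pd.DataFrame({'duration': [34, 12, 5, 44],
                            'date': ['5/20/2017 01:20', '5/20/2017 03:20',
                                     '5/21/2017 11:20', '5/22/2017 23:20']}),
}

# Daily totals per sheet, keyed by 'YYYY-MM-DD' strings
daily = {
    name: sheet.groupby(
        pd.to_datetime(sheet['date'], format='%m/%d/%Y %H:%M').dt.strftime('%Y-%m-%d')
    )['duration'].sum()
    for name, sheet in dfs.items()
}

# Stack the per-sheet totals (outer level = sheet), then collect one list per day
per_day = pd.concat(daily).groupby(level=1).apply(list).to_dict()
print(per_day)
```

Each list follows the order of the sheets in the dictionary, so Sheet1's total comes first.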
Alternatively, make a function that takes a sheet's DataFrame and returns a dictionary:
def make_goofy_dict(d):
    d = d.set_index('date').duration.resample('D').sum()
    return d.apply(lambda x: [x]).to_dict()
Then use merge_with from either toolz or cytoolz
from cytoolz.dicttoolz import merge_with
merge_with(lambda x: sum(x, []), map(make_goofy_dict, (sheet1, sheet2)))
{Timestamp('2017-05-20 00:00:00', freq='D'): [33, 46],
Timestamp('2017-05-21 00:00:00', freq='D'): [33, 5],
Timestamp('2017-05-22 00:00:00', freq='D'): [56, 44]}
Details: sheet1 and sheet2 above are the two sheets with the date column already parsed:
print(sheet1, sheet2, sep='\n\n')
duration date
0 10 2017-05-20 08:20:00
1 23 2017-05-20 10:20:00
2 33 2017-05-21 12:20:00
3 56 2017-05-22 23:20:00
duration date
0 34 2017-05-20 01:20:00
1 12 2017-05-20 03:20:00
2 5 2017-05-21 11:20:00
3 44 2017-05-22 23:20:00
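merge_with(lambda x: sum(x, []), ...) gathers the values for each key across all input dictionaries and concatenates the one-element lists. If you would rather not depend on toolz/cytoolz, here is a plain-Python sketch of the same merge; merge_concat is a hypothetical helper name, and the per-day totals are written out by hand.

```python
from collections import defaultdict


def merge_concat(*dicts):
    """Hypothetical stand-in for merge_with(lambda x: sum(x, []), dicts)."""
    merged = defaultdict(list)
    for d in dicts:
        for key, values in d.items():
            merged[key].extend(values)  # concatenate lists in argument order
    return dict(merged)


sheet1_totals = {'2017-05-20': [33], '2017-05-21': [33], '2017-05-22': [56]}
sheet2_totals = {'2017-05-20': [46], '2017-05-21': [5], '2017-05-22': [44]}
merged = merge_concat(sheet1_totals, sheet2_totals)
print(merged)
```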
For your actual workbook, I'd do this:
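import pandas as pd
from functools import partial
from cytoolz.dicttoolz import merge_with

def make_goofy_dict(d):
    # daily totals, each wrapped in a one-element list so the lists can be concatenated
    d = d.set_index('date').duration.resample('D').sum()
    return d.apply(lambda x: [x]).to_dict()

def read_sheet(xls, sn):
    return pd.read_excel(xls, sheet_name=sn, parse_dates=['date'])

xls = pd.ExcelFile('reports.xlsx')
# read_sheet needs the ExcelFile as well as the sheet name, so bind xls with partial
sheet_dict = merge_with(
    lambda x: sum(x, []),
    map(make_goofy_dict, map(partial(read_sheet, xls), xls.sheet_names))
)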
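One behavioural difference between the two answers: resample('D') emits a row for every calendar day in the range, so a day with no entries in a sheet appears with a total of 0, whereas the groupby/strftime approach simply omits that day. A small sketch with a hypothetical sheet that has no rows on 2017-05-21:

```python
import pandas as pd


def make_goofy_dict(d):
    # Same helper as above: daily totals, each wrapped in a one-element list
    d = d.set_index('date').duration.resample('D').sum()
    return d.apply(lambda x: [x]).to_dict()


# Hypothetical sheet with nothing recorded on 2017-05-21
sheet = pd.DataFrame({
    'duration': [10, 23, 56],
    'date': pd.to_datetime(['2017-05-20 08:20', '2017-05-20 10:20', '2017-05-22 23:20']),
})
result = make_goofy_dict(sheet)
print(result)
```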
