"Max value day" of the week and tallying up each day that was highest Python - python-3.x

I was able to get the highest value of the week. Now, I need to figure out which day of the week it was so I can tally up how many times a certain day of the week is the highest.
For example,
Day of the week that has highest value of that week
Mon:5
Tue:2
Wed:3
Thur:2
Fri:1
This is what my dataframe looked like before I parsed the information that I needed.
Date Weekdays Week Open Close
0 2019-06-26 Wednesday 26 208.279999 208.509995
1 2019-06-27 Thursday 26 208.970001 212.020004
2 2019-06-28 Friday 26 213.000000 213.169998
3 2019-07-01 Monday 27 214.250000 214.619995
4 2019-07-02 Tuesday 27 214.380005 214.539993
.. ... ... ... ... ...
500 2021-06-21 Monday 25 275.619995 277.100006
501 2021-06-22 Tuesday 25 277.570007 276.920013
502 2021-06-23 Wednesday 25 276.890015 274.660004
503 2021-06-24 Thursday 25 275.000000 275.489990
504 2021-06-25 Friday 25 276.369995 278.380005
[505 rows x 5 columns]
Now I was able to get the highest value of the week, but I want to get the day it fell on and tally which days were the highest.
#Tally up the highest days of the week at OPEN
new_data.groupby(pd.Grouper('Week')).Open.max()
The result was
Week
26 213.000000
27 215.130005
28 215.210007
29 214.440002
30 208.369995
31 210.000000
32 204.199997
33 214.740005
34 210.050003
35 217.509995
36 222.000000
37 220.539993
38 220.279999
39 214.000000
40 214.300003
41 215.880005
42 216.740005
43 212.429993
44 213.550003
45 222.809998
46 228.500000
47 233.570007
48 233.919998
49 231.190002
50 231.259995
51 227.679993
52 226.860001
1 233.539993
2 234.789993
3 235.220001
4 233.000000
5 236.979996
6 241.429993
7 244.729996
8 248.070007
9 251.080002
10 264.220001
11 260.309998
12 252.750000
13 259.940002
14 264.220001
15 270.470001
16 272.299988
17 276.290009
18 289.970001
19 292.350006
20 290.200012
21 290.190002
22 292.910004
23 292.559998
24 286.660004
25 277.570007
53 230.500000
Name: Open, dtype: float64

I got you. We wrap the groupby in df.loc, then select the indexes for the max values of Open in each group. Finally just take the value_counts of the Weekdays.
df.loc[df.groupby(["Week"]).Open.idxmax()].Weekdays.value_counts()
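One caveat, offered as a sketch rather than a correction: the sample data runs from 2019 to 2021, so grouping on Week alone pools the same week number from different years into one group. Assuming that is not intended, grouping on the ISO year and week keeps them apart. The small frame below reuses the first five rows from the question; the Year column is an addition that is not in the original data.

```python
import pandas as pd

# First five rows from the question (Date and Open only).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-06-26", "2019-06-27", "2019-06-28",
                            "2019-07-01", "2019-07-02"]),
    "Open": [208.279999, 208.970001, 213.000000, 214.250000, 214.380005],
})
df["Weekdays"] = df["Date"].dt.day_name()

# Hypothetical helper columns: ISO year and ISO week derived from Date.
iso = df["Date"].dt.isocalendar()
df["Year"] = iso["year"].astype(int)
df["Week"] = iso["week"].astype(int)

# Row label of the highest Open within each (Year, Week) group.
top_rows = df.loc[df.groupby(["Year", "Week"])["Open"].idxmax()]

# Count how often each weekday held the weekly high.
tally = top_rows["Weekdays"].value_counts()
print(tally)
```

With these five rows, week 26 peaks on Friday and week 27 on Tuesday, so each of those days is counted once.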

Related

Find earliest date within daterange

I have the following market data:
data = pd.DataFrame({'year': [2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020],
'month': [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11],
'day': [1,2,5,6,7,8,9,12,13,14,15,16,19,20,21,22,23,26,27,28,29,30,2,3,5,6,9,10,11,12,13,16,17,18,19,20,23,24,25,26,27,30]})
data['date'] = pd.to_datetime(data)
data['spot'] = [77.3438,78.192,78.1044,78.4357,78.0285,77.3507,76.78,77.13,77.0417,77.6525,78.0906,77.91,77.6602,77.3568,76.7243,76.5872,76.1374,76.4435,77.2906,79.2239,78.8993,79.5305,80.5313,79.3615,77.0156,77.4226,76.288,76.5648,77.1171,77.3568,77.374,76.1758,76.2325,76.0401,76.0529,76.1992,76.1648,75.474,75.551,75.7018,75.8639,76.3944]
data = data.set_index('date')
I'm trying to find the spot value for the first day of the month in the date column. I can find the first business day with below:
def get_month_beg(d):
    month_beg = (d.index + pd.offsets.BMonthEnd(0) - pd.offsets.MonthBegin(normalize=True))
    return month_beg
data['month_beg'] = get_month_beg(data)
However, due to data issues, sometimes the earliest date from my data does not match up with the first business day of the month.
We'll call the earliest spot value of each month the "strike", which is what I'm trying to find. So for October, the spot value would be 77.3438 (10/1/20) and in Nov it would be 80.5313 (which is on 11/2/20, NOT 11/1/20).
I tried below, which only works if my data's earliest date matches up with the first business date of the month (eg it works in Oct, but not in Nov)
data['strike'] = data.month_beg.map(data.spot)
As you can see, I get NaN in Nov because the first business day in my data is 11/2 (spot rate 80.5313) not 11/1. Does anyone know how to find the earliest date within a date range (in this case the earliest date of each month)?
I was hoping the final df would look like below:
data = pd.DataFrame({'year': [2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020],
'month': [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11],
'day': [1,2,5,6,7,8,9,12,13,14,15,16,19,20,21,22,23,26,27,28,29,30,2,3,5,6,9,10,11,12,13,16,17,18,19,20,23,24,25,26,27,30]})
data['date'] = pd.to_datetime(data)
data['spot'] = [77.3438,78.192,78.1044,78.4357,78.0285,77.3507,76.78,77.13,77.0417,77.6525,78.0906,77.91,77.6602,77.3568,76.7243,76.5872,76.1374,76.4435,77.2906,79.2239,78.8993,79.5305,80.5313,79.3615,77.0156,77.4226,76.288,76.5648,77.1171,77.3568,77.374,76.1758,76.2325,76.0401,76.0529,76.1992,76.1648,75.474,75.551,75.7018,75.8639,76.3944]
data['strike'] = [77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,77.3438,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313,80.5313]
data = data.set_index('date')
I believe we can take the first() row for every year and month combination and later join that back to the main data.
data2=data.groupby(['year','month']).first().reset_index()
#join data 2 with data based on month and year later on
year month day spot
0 2020 10 1 77.3438
1 2020 11 2 80.5313
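The join step is only mentioned in a comment above. A minimal sketch of one way it could look, assuming the rows are already in date order and using a four-row subset of the question's data; the names first_rows and merged are illustrative, not from the original answer:

```python
import pandas as pd

# Four-row subset of the question's data, already sorted by date.
data = pd.DataFrame({
    "year": [2020, 2020, 2020, 2020],
    "month": [10, 10, 11, 11],
    "day": [1, 2, 2, 3],
    "spot": [77.3438, 78.192, 80.5313, 79.3615],
})

# First spot value per (year, month), renamed to strike.
first_rows = (
    data.groupby(["year", "month"], as_index=False)["spot"].first()
        .rename(columns={"spot": "strike"})
)

# Left join so every original row picks up its month's strike.
merged = data.merge(first_rows, on=["year", "month"], how="left")
print(merged)
```

The merge repeats each month's strike on every row of that month, which is the shape the question asks for.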
Based on the question, my understanding is that we need the earliest day present in each month and the 'spot' value on that day.
Correct me if I have misunderstood.
Strike = Spot value from first day of each month
To do this, we need to do the following:
Step 1: Get the Year/Month value from the Date column. Alternatively, we
can use the Year and Month columns you already have in the DataFrame.
Step 2: We need to groupby Year and Month. That will give all the
records by Year+Month. From this, we need to get the first record
(which will be the earliest date of the month). The earliest date can
either be 1st or 2nd or 3rd of the month depending on the data in the
column.
Step 3: By using transform in Groupby, pandas will send back the
results to match the dataframe length. So for each record, it will
send the same result. In this example, we have only 2 months (Oct &
Nov). However, we have 42 rows. Transform will send us back 42 rows.
The code: groupby(['year','month'])['date'].transform('first') will give
the first date present in the data for each month.
Use This:
data['dy'] = data.groupby(['year','month'])['date'].transform('first')
or, to build a single yyyy-mm key to group by instead of the year and month columns:
data['dx'] = data.date.dt.to_period('M') #to get yyyy-mm value
Step 4: Using transform, we can also get the Spot value. This can be
assigned to Strike giving us the desired result. Instead of getting
first day of the month, we can change it to return Spot value.
The code will be: groupby(['year','month'])['spot'].transform('first')
Use this:
data['strike'] = data.groupby(['year','month'])['spot'].transform('first')
or
data['strike'] = data.groupby('dx')['spot'].transform('first')
Putting all this together
The full code to get Strike Price using Spot Price from first day of month
import pandas as pd
import numpy as np
data = pd.DataFrame({'year': [2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020],
'month': [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11],
'day': [1,2,5,6,7,8,9,12,13,14,15,16,19,20,21,22,23,26,27,28,29,30,2,3,5,6,9,10,11,12,13,16,17,18,19,20,23,24,25,26,27,30]})
data['date'] = pd.to_datetime(data)
data['spot'] = [77.3438,78.192,78.1044,78.4357,78.0285,77.3507,76.78,77.13,77.0417,77.6525,78.0906,77.91,77.6602,77.3568,76.7243,76.5872,76.1374,76.4435,77.2906,79.2239,78.8993,79.5305,80.5313,79.3615,77.0156,77.4226,76.288,76.5648,77.1171,77.3568,77.374,76.1758,76.2325,76.0401,76.0529,76.1992,76.1648,75.474,75.551,75.7018,75.8639,76.3944]
#Pick the first day of month Spot price as the Strike price
data['strike'] = data.groupby(['year','month'])['spot'].transform('first')
#Print the full DataFrame, now including the strike column
print (data)
The output of this will be:
year month day date spot strike
0 2020 10 1 2020-10-01 77.3438 77.3438
1 2020 10 2 2020-10-02 78.1920 77.3438
2 2020 10 5 2020-10-05 78.1044 77.3438
3 2020 10 6 2020-10-06 78.4357 77.3438
4 2020 10 7 2020-10-07 78.0285 77.3438
5 2020 10 8 2020-10-08 77.3507 77.3438
6 2020 10 9 2020-10-09 76.7800 77.3438
7 2020 10 12 2020-10-12 77.1300 77.3438
8 2020 10 13 2020-10-13 77.0417 77.3438
9 2020 10 14 2020-10-14 77.6525 77.3438
10 2020 10 15 2020-10-15 78.0906 77.3438
11 2020 10 16 2020-10-16 77.9100 77.3438
12 2020 10 19 2020-10-19 77.6602 77.3438
13 2020 10 20 2020-10-20 77.3568 77.3438
14 2020 10 21 2020-10-21 76.7243 77.3438
15 2020 10 22 2020-10-22 76.5872 77.3438
16 2020 10 23 2020-10-23 76.1374 77.3438
17 2020 10 26 2020-10-26 76.4435 77.3438
18 2020 10 27 2020-10-27 77.2906 77.3438
19 2020 10 28 2020-10-28 79.2239 77.3438
20 2020 10 29 2020-10-29 78.8993 77.3438
21 2020 10 30 2020-10-30 79.5305 77.3438
22 2020 11 2 2020-11-02 80.5313 80.5313
23 2020 11 3 2020-11-03 79.3615 80.5313
24 2020 11 5 2020-11-05 77.0156 80.5313
25 2020 11 6 2020-11-06 77.4226 80.5313
26 2020 11 9 2020-11-09 76.2880 80.5313
27 2020 11 10 2020-11-10 76.5648 80.5313
28 2020 11 11 2020-11-11 77.1171 80.5313
29 2020 11 12 2020-11-12 77.3568 80.5313
30 2020 11 13 2020-11-13 77.3740 80.5313
31 2020 11 16 2020-11-16 76.1758 80.5313
32 2020 11 17 2020-11-17 76.2325 80.5313
33 2020 11 18 2020-11-18 76.0401 80.5313
34 2020 11 19 2020-11-19 76.0529 80.5313
35 2020 11 20 2020-11-20 76.1992 80.5313
36 2020 11 23 2020-11-23 76.1648 80.5313
37 2020 11 24 2020-11-24 75.4740 80.5313
38 2020 11 25 2020-11-25 75.5510 80.5313
39 2020 11 26 2020-11-26 75.7018 80.5313
40 2020 11 27 2020-11-27 75.8639 80.5313
41 2020 11 30 2020-11-30 76.3944 80.5313
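Note that transform('first') returns the first row in each group as the data is ordered, so it only yields the earliest date if the rows are in date order. A minimal sketch, assuming the rows might arrive unsorted, that sorts before grouping; the four shuffled rows are taken from the data above:

```python
import pandas as pd

# Four rows from the data above, deliberately out of order.
data = pd.DataFrame({
    "date": pd.to_datetime(["2020-11-03", "2020-10-02",
                            "2020-11-02", "2020-10-01"]),
    "spot": [79.3615, 78.192, 80.5313, 77.3438],
})

# Sort by date so that 'first' really means 'earliest'.
data = data.sort_values("date").reset_index(drop=True)

# Group on the year-month period and broadcast the first spot to every row.
month = data["date"].dt.to_period("M")
data["strike"] = data.groupby(month)["spot"].transform("first")
print(data)
```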
Previous Answer to get the first day of each month (within the column data)
One way to do it is to create a helper column that stores the year-month of each row. Then use drop_duplicates() on that column and retain only the first row.
Key assumption:
The rows are sorted by date, so the first row for each month is its earliest date. Note that drop_duplicates(keep='first') keeps the first occurrence of every value, so a month with only one row is still retained.
That will give you the first row of each month.
import pandas as pd
import numpy as np
data = pd.DataFrame({'year': [2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020],
'month': [10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11],
'day': [1,2,5,6,7,8,9,12,13,14,15,16,19,20,21,22,23,26,27,28,29,30,2,3,5,6,9,10,11,12,13,16,17,18,19,20,23,24,25,26,27,30]})
data['date'] = pd.to_datetime(data)
data['spot'] = [77.3438,78.192,78.1044,78.4357,78.0285,77.3507,76.78,77.13,77.0417,77.6525,78.0906,77.91,77.6602,77.3568,76.7243,76.5872,76.1374,76.4435,77.2906,79.2239,78.8993,79.5305,80.5313,79.3615,77.0156,77.4226,76.288,76.5648,77.1171,77.3568,77.374,76.1758,76.2325,76.0401,76.0529,76.1992,76.1648,75.474,75.551,75.7018,75.8639,76.3944]
#create a helper column holding the year-month period (yyyy-mm) of each row
data['dx'] = data.date.dt.to_period('M')
#drop duplicates while retaining only the first row of each month
dx = data.drop_duplicates('dx',keep='first')
#This will give you the first row of each month
print (dx)
The output of this will be:
year month day date spot dx
0 2020 10 1 2020-10-01 77.3438 2020-10
22 2020 11 2 2020-11-02 80.5313 2020-11
You can also group by the month and take the first record; this works whether a month has one row or many.
data.groupby(['dx']).first()
This will give you:
year month day date spot
dx
2020-10 2020 10 1 2020-10-01 77.3438
2020-11 2020 11 2 2020-11-02 80.5313
data['strike']=data.groupby(['year','month'])['spot'].transform('first')
I think this line achieves it without creating any other DataFrame.

How can I delete duplicates group 3 columns using two criteria (first two columns)?

This is my data set:
Year created Week created SUM_New SUM_Closed SUM_Open
0 2018 1 17 0 82
1 2018 6 62 47 18
2 2018 6 62 47 18
3 2018 6 62 47 18
4 2018 6 62 47 18
In last three columns there is already the sum for the year and week. I need to get rid of duplicates so that the table contains unique values (for the example above):
Year created Week created SUM_New SUM_Closed SUM_Open
0 2018 1 17 0 82
4 2018 6 62 47 18
I tried to group the data, but it does not work as expected: it does what I need, but only for one column.
df.groupby(['Year created', 'Week created']).size()
And output:
Year created Week created
2017 48 2
49 25
50 54
51 36
52 1
2018 1 17
2 50
3 37
But it is just one column and I don't know which one because even if I separate the data on three parts and do the same procedure for each part I get the same result (as above) for all.
I believe you need drop_duplicates:
df = df.drop_duplicates(['Year created', 'Week created'])
print (df)
Year created Week created SUM_New SUM_Closed SUM_Open
0 2018 1 17 0 82
1 2018 6 62 47 18
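The expected output in the question keeps the row with index 4 rather than index 1. A minimal sketch, assuming that detail matters, using the keep='last' option of drop_duplicates on the five sample rows; the name deduped is illustrative:

```python
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame({
    "Year created": [2018, 2018, 2018, 2018, 2018],
    "Week created": [1, 6, 6, 6, 6],
    "SUM_New": [17, 62, 62, 62, 62],
    "SUM_Closed": [0, 47, 47, 47, 47],
    "SUM_Open": [82, 18, 18, 18, 18],
})

# Keep the last row of each (year, week) pair instead of the first.
deduped = df.drop_duplicates(subset=["Year created", "Week created"], keep="last")
print(deduped)
```

Because the duplicated rows are identical, keep='first' and keep='last' differ only in which index label survives.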
df2 = df.drop_duplicates(['Year created', 'Week created', 'SUM_New', 'SUM_Closed'])
print(df2)
hope this helps.

Calculate sum for entire sheet based on hours

I have a spreadsheet that has various weekly hours against a fixed number. For example
Name Weekly Hours Week 1 Week 2 Week 3
Jon 40 44 36 40
Shaun 40 40 36 44
Dawn 20 25 10 16
Is there a way to convert the weekly figures so that each one shows Weekly Hours minus that week's hours?
Example for Jon
-4, 4, 0
Not sure how to do this and wondered if there was a global setting/sum?

Average formula using number of blank rows above

I'm working on spreadsheet with logged flows that are not at uniform periods.
Looking for formula for Col G that will average values in Col A for logged values for previous 10 minutes.
Here's the spreadsheet data:
Flow Time min sec sec 10_min Average
187.29 06:10:09 10 9 609
202.90 06:11:21 11 21 681
280.94 06:12:37 12 37 757
218.51 06:13:43 13 43 823
187.29 06:15:13 15 13 913
124.86 06:16:26 16 26 986
109.25 06:18:52 18 52 1132
109.25 06:20:00 20 0 1200 1 177.54
202.90 06:22:30 22 30 1350
265.33 06:23:36 23 36 1416
280.94 06:24:42 24 42 1482
249.73 06:25:58 25 58 1558
218.51 06:27:39 27 39 1659
421.41 06:28:47 28 47 1727
421.41 06:30:00 30 0 1800 1 294.32
Use an AVERAGEIFS and construct the criteria with the TEXT function while modifying one criteria by ten minutes.
=AVERAGEIFS(A:A,B:B, TEXT(B9-TIME(0, 10, 0), ">0.0###############"),B:B, TEXT(B9, "<=0.0###############"))
Note that times can also be resolved as decimal numbers, which I have used here. My second average came up slightly different from yours. You may wish to change the >= to > .
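To check the arithmetic outside the spreadsheet, here is a minimal pandas sketch of the same window logic, assuming a strictly-greater-than lower bound and an inclusive upper bound. The function name trailing_average is illustrative. With these bounds it reproduces both averages shown in the question; an inclusive lower bound would also pull the 06:20:00 reading into the 06:30:00 window.

```python
import pandas as pd

# Flow readings and timestamps from the question, as offsets from midnight.
flows = pd.DataFrame({
    "Flow": [187.29, 202.90, 280.94, 218.51, 187.29, 124.86, 109.25, 109.25,
             202.90, 265.33, 280.94, 249.73, 218.51, 421.41, 421.41],
    "Time": pd.to_timedelta([
        "06:10:09", "06:11:21", "06:12:37", "06:13:43", "06:15:13",
        "06:16:26", "06:18:52", "06:20:00", "06:22:30", "06:23:36",
        "06:24:42", "06:25:58", "06:27:39", "06:28:47", "06:30:00"]),
})


def trailing_average(frame, end, minutes=10):
    """Mean Flow for rows with start < Time <= end, where start = end - minutes."""
    start = end - pd.Timedelta(minutes=minutes)
    window = frame[(frame["Time"] > start) & (frame["Time"] <= end)]
    return window["Flow"].mean()


avg_0620 = trailing_average(flows, pd.Timedelta("06:20:00"))
avg_0630 = trailing_average(flows, pd.Timedelta("06:30:00"))
print(round(avg_0620, 2), round(avg_0630, 2))
```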

Excel indexmatch, vlookup

I have a holiday calendar for several years in one table. Can anyone help – How to arrange this data by week and show holiday against week? I want to reference this data in other worksheets and hence arranging this way will help me to use formulae on other sheets. I want the data to be: col A having week numbers and column B showing holiday for year 1, col. C showing holiday for year 2, etc.
Fiscal Week
2015 2014 2013 2012
Valentine's Day 2 2 2 3
President's Day 3 3 3 4
St. Patrick's Day 7 7 7 7
Easter 10 12 9 11
Mother's Day 15 15 15 16
Memorial Day 17 17 17 18
Flag Day 20 19 19 20
Father's Day 21 20 20 21
Independence Day 22 22 22 23
Labor Day 32 31 31 32
Columbus Day 37 37 37 37
Thanksgiving 43 43 43 43
Christmas 47 47 47 48
New Year's Day 48 48 48 49
ML King Day 51 51 51 52
It's not too clear what year 1 is, so I'm going to assume that's 2015, and year 2 is 2014, etc.
Here's how you could set it up, if I understand correctly. Use this index/match formula (pseudo-formula):
=Iferror(Index([holiday names range],match([week number],[2015's week numbers in your table],0)),"")
It looks like this:
(=IFERROR(INDEX($A$3:$A$17,MATCH($H3,B$3:B$17,0)),""), in the cell next to the week numbers)
You can then drag the formula across, and the match range (B3:B17 in the formula above) will "slide over" to the next year's column as you drag.
