I have a dataframe with 3 columns [user_id, year_month, value]. I want to automatically calculate the average of the last 6 months for each unique user_id and assign it to a new column.
user_id value year_month
1 50 2021-01
1 54 2021-02
.. .. ..
1 50 2021-11
1 47 2021-12
2 36 2021-01
2 48.5 2021-05
.. .. ..
2 54 2021-11
2 30.2 2021-12
3 41.4 2021-01
3 48.5 2021-02
3 41.4 2021-05
.. .. ..
3 30.2 2021-12
The data covers 12 to 24 months in total.
To get the Jan 2022 value (Jul 2021 to Dec 2021): (55+32+33+63+54+51)/6
To get the Feb 2022 value (Aug 2021 to Jan 2022): (32+33+37+53+54+51)/6
To get the Mar 2022 value (Sep 2021 to Feb 2022): (45+32+33+63+54+51)/6
To get the Apr 2022 value (Oct 2021 to Mar 2022): (63+54+51+45+32+33)/6
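First, set your datetime column as the index and sort it:
df = df.set_index('year_month').sort_index()
Then take a rolling mean within each user. Note that rolling() only accepts fixed-frequency windows, so a month offset such as '6M' raises a ValueError; use a row count instead (this assumes one row per month):
df.groupby('user_id')['value'].rolling(6).mean()
That returns a Series indexed by user_id and year_month. Alternatively, skip the index step and work on the original frame, which is more intuitive:
df.sort_values('year_month').groupby('user_id')['value'].transform(lambda s: s.rolling(6).mean())  # returns the wanted Series, aligned to df's index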
As paul h said
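The rolling average described above can be sketched as follows. This is a minimal illustration, not a definitive solution: the sample values and the column names avg_6m and prev_avg_6m are made up, and it assumes each user has exactly one row per month with no gaps (the data in the question does have gaps, which would need reindexing first). The shift(1) variant averages the six months before the current row, which is closer to the "Jan 2022 from Jul to Dec 2021" examples in the question.

```python
import pandas as pd

# Hypothetical sample data: two users, one row per month of 2021.
months = [f"2021-{m:02d}" for m in range(1, 13)]
df = pd.DataFrame({
    "user_id": [1] * 12 + [2] * 12,
    "year_month": months * 2,
    "value": list(range(1, 13)) + list(range(10, 130, 10)),
})

# 'YYYY-MM' strings sort chronologically, so a plain sort is enough.
df = df.sort_values(["user_id", "year_month"])

# Mean of the current row and the five rows before it, per user.
df["avg_6m"] = df.groupby("user_id")["value"].transform(
    lambda s: s.rolling(6).mean()
)

# Mean of the six rows strictly before the current one, per user.
df["prev_avg_6m"] = df.groupby("user_id")["value"].transform(
    lambda s: s.shift(1).rolling(6).mean()
)
```

The first five rows of each user are NaN in avg_6m because a full six-row window is not yet available; passing min_periods=1 to rolling() would relax that.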
Given the following df with string date column with ordinal numbers for day, abbreviated month name for month, and normal year:
date oil gas
0 1st Oct 2021 428 99
1 10th Sep 2021 401 101
2 2nd Oct 2020 189 74
3 10th Jan 2020 659 119
4 1st Nov 2019 691 130
5 30th Aug 2019 742 162
6 10th May 2019 805 183
7 24th Aug 2018 860 182
8 1st Sep 2017 759 183
9 10th Mar 2017 617 151
10 10th Feb 2017 591 149
11 22nd Apr 2016 343 88
12 10th Apr 2015 760 225
13 23rd Jan 2015 1317 316
How can I parse the date column into the standard %Y-%m-%d format?
My ideas so far: 1. strip the ordinal indicators ('st', 'nd', 'rd', 'th') from the day string while keeping the day number, using a regular expression; 2. convert the abbreviated month name to a number (I was not sure whether %b covers this); 3. finally convert everything to %Y-%m-%d.
Code that may be useful for the first step:
df['date'].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)
References:
https://metacpan.org/release/DROLSKY/DateTime-Locale-0.46/view/lib/DateTime/Locale/en_US.pm#Months
pd.to_datetime already handles this case if you don't specify the format parameter:
>>> pd.to_datetime(df['date'])
0 2021-10-01
1 2021-09-10
2 2020-10-02
3 2020-01-10
4 2019-11-01
5 2019-08-30
6 2019-05-10
7 2018-08-24
8 2017-09-01
9 2017-03-10
10 2017-02-10
11 2016-04-22
12 2015-04-10
13 2015-01-23
Name: date, dtype: datetime64[ns]
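If you prefer an explicit format over automatic inference, the three steps from the question can be sketched like this. It is only an illustration under stated assumptions: the column names parsed and iso are made up, only a few of the rows are reproduced, and %b matches abbreviated month names in the current locale, so English month names are assumed.

```python
import pandas as pd

# A few rows from the question's data.
df = pd.DataFrame(
    {"date": ["1st Oct 2021", "10th Sep 2021", "2nd Oct 2020", "23rd Jan 2015"]}
)

# Step 1: drop the ordinal suffix that directly follows a digit.
cleaned = df["date"].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)

# Step 2: parse day, abbreviated month name and year explicitly.
df["parsed"] = pd.to_datetime(cleaned, format="%d %b %Y")

# Step 3: render as %Y-%m-%d strings (only needed if strings are wanted).
df["iso"] = df["parsed"].dt.strftime("%Y-%m-%d")
```

The parsed column is already a datetime64 column; the strftime step only matters when the result has to be stored or displayed as text.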
Is there a way to flatten a matrix into a vector? I don't know the number of elements in the matrix in advance, so let's say the matrix has n elements.
Below is an example of how I want to transform the table.
Any help, guidance, suggestion, or recommendation would be much appreciated.
raw data.csv:
,January,February,March,April,May,June,July,August,September,October,November,December
2019,1,2,3,4,5,6,7,8,9,10,11,12
2018,13,14,15,16,17,18,19,20,21,22,23,24
2017,25,26,27,28,29,30,31,32,33,34,35,36
raw=pd.read_csv('raw data.csv')
raw.head()
Unnamed: 0 January February March April May June July August September October November December
0 2019 1 2 3 4 5 6 7 8 9 10 11 12
1 2018 13 14 15 16 17 18 19 20 21 22 23 24
2 2017 25 26 27 28 29 30 31 32 33 34 35 36
final=pd.read_csv('Final.csv')
final.head(20)
Year&Month Value
0 2019 January 1
1 2019 February 2
2 2019 March 3
3 2019 April 4
4 2019 May 5
5 2019 June 6
6 2019 July 7
7 2019 August 8
8 2019 September 9
9 2019 October 10
10 2019 November 11
11 2019 December 12
12 2018 January 13
13 2018 February 14
14 2018 March 15
15 2018 April 16
16 2018 May 17
17 2018 June 18
18 2018 July 19
19 2018 August 20
You can use pandas stack
df = pd.read_csv(r'raw data.csv')
df.set_index(df.columns[0]).stack().reset_index()
Out:
Unnamed: 0 level_1 0
0 2019 January 1
1 2019 February 2
2 2019 March 3
3 2019 April 4
4 2019 May 5
5 2019 June 6
6 2019 July 7
7 2019 August 8
8 2019 September 9
9 2019 October 10
10 2019 November 11
11 2019 December 12
12 2018 January 13
13 2018 February 14
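To get from the stacked result to the two-column layout shown in Final.csv, one possible follow-up is sketched below. It assumes a shortened inline copy of the CSV (three months, two years) instead of the real file, and the intermediate column names Year and Month are chosen here for illustration.

```python
import io

import pandas as pd

# Shortened stand-in for 'raw data.csv' (assumption: same layout, fewer columns).
csv_text = """,January,February,March
2019,1,2,3
2018,13,14,15
"""

# Use the unnamed first column (the year) as the index.
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)

# stack() walks the table row by row, so the original year order is kept.
long = raw.stack().reset_index()
long.columns = ["Year", "Month", "Value"]

# Combine year and month into the single label used in Final.csv.
long["Year&Month"] = long["Year"].astype(str) + " " + long["Month"]
final = long[["Year&Month", "Value"]]
```

stack() is used rather than melt() here because melt() would group the output by month column instead of keeping each year's months together.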
I have values like
Month Price
Jan 10
Feb 20
Mar 30
............
Dec 50
I have a dropdown for selecting the month.
If the user picks the month Feb,
then the sum displayed should be 30 (Jan + Feb).
Please help! I have tried a lot of Excel functions and ended up frustrated.
Very interesting idea.
Formula in B1 =SUM(INDIRECT("E1:E"&MATCH(A1,D:D,0)))
Hope this will help you.
     A     B     C     D       E
1    Feb   30          Month   Price
2                      Jan     10
3                      Feb     20
4                      Mar     30
5                      Apr     40
6                      May     50
7                      Jun     60
8                      Jul     70
9                      Aug     80
10                     Sep     90
11                     Oct     100
12                     Nov     110
13                     Dec     120
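For cross-checking the result outside Excel, the same "sum everything up to the selected month" logic can be sketched in pandas. This is only an illustration of what the formula computes; the function name sum_through is hypothetical.

```python
import pandas as pd

# Prices from the sheet above, keyed by month label.
prices = pd.Series(
    [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
)

def sum_through(month: str) -> int:
    """Sum all prices from the first month up to and including `month`."""
    # Label-based slicing with .loc includes the end label.
    return int(prices.loc[:month].sum())

picked = sum_through("Feb")
```

As in the MATCH-based formula, the position of the selected month determines where the summed range ends.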
I have an Excel 2007 file with weekly values covering two years. I want to collate these into a cumulative monthly sum, while also bearing the year in mind. For example, if I had the following dates:
6 Apr 13
13 Apr 13
20 Apr 13
27 Apr 13
4 May 13
.
.
.
.
.
12 Apr 14
19 Apr 14
26 Apr 14
3 May 14
10 May 14
17 May 14
24 May 14
.
.
.
.
14 Feb 15
21 Feb 15
28 Feb 15
7 Mar 15
14 Mar 15
21 Mar 15
28 Mar 15
And I wanted to get the summed totals for the following periods:
April 2013
May 2013
June 2013
July 2013
.
.
.
.
April 2014
May 2014
June 2014
.
.
.
January 2015
February 2015
March 2015
What would be the best way to go about this? I was thinking of using SUMIFS, but was not sure how well MONTH() and YEAR() work when nested inside it, as in SUMIF(...YEAR(...MONTH(..))).
Assuming your data is in cells A1:B15, with dates in column A and the quantity to be summed in column B, you can try this:
=SUMPRODUCT((MONTH(A1:A15)=2)*(YEAR(A1:A15)=2014)*(B1:B15))
Replace 2 with the number of the month you want and 2014 with the desired year. Hope that helps.
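If the data can be exported, the same month-and-year grouping can be sketched in pandas, which produces every monthly total at once rather than one formula per period. The quantities below are invented purely for illustration, and the column names date and quantity are assumptions.

```python
import pandas as pd

# Hypothetical weekly data: a few of the dates from the question with made-up quantities.
weekly = pd.DataFrame({
    "date": pd.to_datetime([
        "2013-04-06", "2013-04-13", "2013-04-20", "2013-04-27",
        "2013-05-04", "2014-04-12", "2014-04-19",
    ]),
    "quantity": [1, 2, 3, 4, 5, 6, 7],
})

# Grouping by a monthly period keeps April 2013 and April 2014 separate,
# which is what the MONTH(...) * YEAR(...) conditions achieve in the formula.
monthly = weekly.groupby(weekly["date"].dt.to_period("M"))["quantity"].sum()
```

Each entry of monthly corresponds to one SUMPRODUCT call with a fixed month and year.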