How to sum by month in timestamp Data Frame?

I have a DataFrame like this:
trx_date    trx_amount
2013-02-11  35
2014-03-10  26
2011-02-9   10
2013-02-12  5
2013-01-11  21
How do I group it by month and year so that I can sum trx_amount?
Example expected output:
trx_monthly  trx_sum
2013-02      40
2013-01      21
2014-02      35

You can convert the values to monthly periods with Series.dt.to_period and then aggregate with sum:
import pandas as pd
df['trx_date'] = pd.to_datetime(df['trx_date'])
df1 = (df.groupby(df['trx_date'].dt.to_period('M').rename('trx_monthly'))['trx_amount']
         .sum()
         .reset_index(name='trx_sum'))
print(df1)
  trx_monthly  trx_sum
0     2011-02       10
1     2013-01       21
2     2013-02       40
3     2014-03       26
Or convert the datetimes to strings in the format YYYY-MM with Series.dt.strftime:
df2 = (df.groupby(df['trx_date'].dt.strftime('%Y-%m').rename('trx_monthly'))['trx_amount']
         .sum()
         .reset_index(name='trx_sum'))
print(df2)
  trx_monthly  trx_sum
0     2011-02       10
1     2013-01       21
2     2013-02       40
3     2014-03       26
Or group by year and month separately; the output then has three columns:
df3 = (df.groupby([df['trx_date'].dt.year.rename('year'),
                   df['trx_date'].dt.month.rename('month')])['trx_amount']
         .sum()
         .reset_index(name='trx_sum'))
print(df3)
   year  month  trx_sum
0  2011      2       10
1  2013      1       21
2  2013      2       40
3  2014      3       26
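For reference, here is a minimal, self-contained sketch of the to_period approach on the question's sample data. It is only an illustration: the day in 2011-02-9 is zero-padded here so that every date string has the same format, and the names monthly and result are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "trx_date": ["2013-02-11", "2014-03-10", "2011-02-09", "2013-02-12", "2013-01-11"],
    "trx_amount": [35, 26, 10, 5, 21],
})
df["trx_date"] = pd.to_datetime(df["trx_date"])

# Map every timestamp to its calendar month, then sum the amounts per month.
monthly = df["trx_date"].dt.to_period("M").rename("trx_monthly")
result = df.groupby(monthly)["trx_amount"].sum().reset_index(name="trx_sum")
print(result)
```

The trx_monthly column keeps the period dtype, so it sorts chronologically; call .astype(str) on it if plain strings are needed.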

You can try this (note that grouping on the month number alone combines the same month from different years):
df['trx_month'] = df['trx_date'].dt.month
df_agg = df.groupby('trx_month')['trx_amount'].sum()
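A small sketch of why the grouping key matters, using the question's sample data (with the day in 2011-02-9 zero-padded as an assumption of this example). Grouping on the month number alone merges February 2011 with February 2013, while grouping on year and month keeps them apart. The names by_month and by_year_month are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "trx_date": pd.to_datetime(
        ["2013-02-11", "2014-03-10", "2011-02-09", "2013-02-12", "2013-01-11"]
    ),
    "trx_amount": [35, 26, 10, 5, 21],
})

# Month number only: February 2011 and February 2013 share one group.
by_month = df.groupby(df["trx_date"].dt.month)["trx_amount"].sum()

# Year and month together keep the two Februaries apart.
by_year_month = df.groupby(
    [df["trx_date"].dt.year.rename("year"), df["trx_date"].dt.month.rename("month")]
)["trx_amount"].sum()

print(by_month)
print(by_year_month)
```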

Related

Pandas: How to create DateTime index

I have a pandas DataFrame:
   year month  count
0  2014   Jan     12
1  2014   Feb     10
2  2015   Jan     12
3  2015   Feb     10
How can I create a DatetimeIndex from 'year' and 'month', so that the result looks like this:
            count
2014.01.31     12
2014.02.28     10
2015.01.31     12
2015.02.28     10

Use to_datetime with DataFrame.pop to extract and remove the two columns, then add offsets.MonthEnd to move each date to the end of its month:
dates = pd.to_datetime(df.pop('year').astype(str) + df.pop('month'), format='%Y%b')
df.index = dates + pd.offsets.MonthEnd()
print (df)
count
2014-01-31 12
2014-02-28 10
2015-01-31 12
2015-02-28 10
Or:
dates = pd.to_datetime(df.pop('year').astype(str) + df.pop('month'), format='%Y%b')
df.index = dates + pd.to_timedelta(dates.dt.daysinmonth - 1, unit='d')
print (df)
count
2014-01-31 12
2014-02-28 10
2015-01-31 12
2015-02-28 10
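The MonthEnd variant can be sketched end to end as follows. This is an illustration under a few assumptions: the month abbreviations are English (the %b directive is locale-dependent), and the names first_days and result are made up for the example. Unlike the snippets above, it does not modify the original df.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "count": [12, 10, 12, 10],
})

# Parse strings such as "2014Jan" to the first day of each month.
first_days = pd.to_datetime(df["year"].astype(str) + df["month"], format="%Y%b")

# Roll each date forward to the last day of its month and use it as the index.
result = df[["count"]].copy()
result.index = pd.DatetimeIndex(first_days + pd.offsets.MonthEnd())
print(result)
```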

Grouping data based on month-year in pandas and then dropping all entries except the latest one- Python

Below is my example dataframe
Date Indicator Value
0 2000-01-30 A 30
1 2000-01-31 A 40
2 2000-03-30 C 50
3 2000-02-27 B 60
4 2000-02-28 B 70
5 2000-03-31 C 90
6 2000-03-28 C 100
7 2001-01-30 A 30
8 2001-01-31 A 40
9 2001-03-30 C 50
10 2001-02-27 B 60
11 2001-02-28 B 70
12 2001-03-31 C 90
13 2001-03-28 C 100
Desired Output
Date Indicator Value
2000-01-31 A 40
2000-02-28 B 70
2000-03-31 C 90
2001-01-31 A 40
2001-02-28 B 70
2001-03-31 C 90
I want to write code that groups the data by month-year, keeps the entry with the latest date in each month-year, and drops the rest. The data runs until the year 2020.
I was only able to get the count per month-year. I could not write code that groups the data by month-year and indicator and returns the correct result.

Use Series.dt.to_period to get monthly periods, find the index of the maximal date in each group with DataFrameGroupBy.idxmax, and then pass those index labels to DataFrame.loc:
df['Date'] = pd.to_datetime(df['Date'])
print(df['Date'].dt.to_period('M'))
0 2000-01
1 2000-01
2 2000-03
3 2000-02
4 2000-02
5 2000-03
6 2000-03
7 2001-01
8 2001-01
9 2001-03
10 2001-02
11 2001-02
12 2001-03
13 2001-03
Name: Date, dtype: period[M]
df = df.loc[df.groupby(df['Date'].dt.to_period('M'))['Date'].idxmax()]
print(df)
Date Indicator Value
1 2000-01-31 A 40
4 2000-02-28 B 70
5 2000-03-31 C 90
8 2001-01-31 A 40
11 2001-02-28 B 70
12 2001-03-31 C 90
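As a self-contained sketch of the idxmax approach on the question's sample data (the names latest_idx and latest are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2000-01-30", "2000-01-31", "2000-03-30", "2000-02-27", "2000-02-28",
             "2000-03-31", "2000-03-28", "2001-01-30", "2001-01-31", "2001-03-30",
             "2001-02-27", "2001-02-28", "2001-03-31", "2001-03-28"],
    "Indicator": list("AACBBCCAACBBCC"),
    "Value": [30, 40, 50, 60, 70, 90, 100, 30, 40, 50, 60, 70, 90, 100],
})
df["Date"] = pd.to_datetime(df["Date"])

# For each calendar month, find the row label holding the latest date.
latest_idx = df.groupby(df["Date"].dt.to_period("M"))["Date"].idxmax()

# Keep only those rows; the original row labels are preserved.
latest = df.loc[latest_idx]
print(latest)
```

If several indicators can occur within one month and the latest row per indicator is wanted, one option is to group on both the monthly period and the 'Indicator' column instead.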

To find sum and percentage from columns of two different dataframe and append result in third dataframe

I have made two identical-looking dataframes, which look like this:
df1:
date       id  email        Count
4/22/2019  1   abc@xyz.com  10
4/22/2019  1   def@xyz.com  4
4/23/2019  1   abc@xyz.com  5
4/23/2019  1   def@xyz.com  10
df2:
date       id  Email_ID     Count
4/22/2019  1   fgh@xyz.com  5
4/22/2019  1   ijk@xyz.com  6
4/23/2019  1   fgh@xyz.com  7
4/23/2019  1   ijk@xyz.com  8
I want to make a third dataframe, df3, that holds the sum of the 'Count' column of each dataframe (df1 and df2) per date, together with each dataframe's share of the total as a percentage, for example df1_count% = df1_count / (df1_count + df2_count) * 100. The output df3 should look something like this:
df3:
           Count                 Count%
date       df1_count  df2_count  df1_count%  df2_count%
4/22/2019  14         11         56%         44%
4/23/2019  15         15         50%         50%
How can this be done with pandas? I can do it with a 'for' loop but not with pandas functionality; any leads will help.
Output as per @jezrael's solution:
           Count      Count      count%      count%
           df1_count  df2_count  df1_count%  df2_count%
Date
4/22/2019  14         11         56%         44%
4/23/2019  15         15         50%         50%

Use concat with a sum aggregation:
df = pd.concat([df1.groupby('date')['Count'].sum(),
                df2.groupby('date')['Count'].sum()], axis=1, keys=('df1_count','df2_count'))
And then add new columns:
s = (df['df1_count'] + df['df2_count'])
df['df1_count%'] = df['df1_count'] / s * 100
df['df2_count%'] = df['df2_count'] / s * 100
df = df.reset_index()
print (df)
date df1_count df2_count df1_count% df2_count%
0 4/22/2019 14 11 56.0 44.0
1 4/23/2019 15 15 50.0 50.0
If you need the percentages as strings, first round the values with Series.round to drop the decimals and then convert them to strings:
s = (df['df1_count'] + df['df2_count'])
df['df1_count%'] = (df['df1_count'] / s * 100).round().astype(str) + '%'
df['df2_count%'] = (df['df2_count'] / s * 100).round().astype(str) + '%'
df = df.reset_index()
print (df)
date df1_count df2_count df1_count% df2_count%
0 4/22/2019 14 11 56.0% 44.0%
1 4/23/2019 15 15 50.0% 50.0%
EDIT:
df = pd.concat([df1.groupby('date')['Count'].sum(),
                df2.groupby('date')['Count'].sum()], axis=1,
               keys=('Count_df1_count','Count_df2_count'))
s = (df['Count_df1_count'] + df['Count_df2_count'])
df['Count%_df1_count%'] = (df['Count_df1_count'] / s * 100).round().astype(str) + '%'
df['Count%_df2_count%'] = (df['Count_df2_count'] / s * 100).round().astype(str) + '%'
df.columns = df.columns.str.split('_', expand=True, n=1)
print (df)
Count Count%
df1_count df2_count df1_count% df2_count%
date
4/22/2019 14 11 56.0% 44.0%
4/23/2019 15 15 50.0% 50.0%
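The steps above can be combined into one runnable sketch. It uses only the 'date' and 'Count' columns of the sample data, and it adds one assumption that is not in the answer: the rounded percentages are cast to int before being turned into strings, so the result reads 56% rather than 56.0%. The names df3, total and pct are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["4/22/2019", "4/22/2019", "4/23/2019", "4/23/2019"],
    "Count": [10, 4, 5, 10],
})
df2 = pd.DataFrame({
    "date": ["4/22/2019", "4/22/2019", "4/23/2019", "4/23/2019"],
    "Count": [5, 6, 7, 8],
})

# Sum Count per date in each frame and place the two sums side by side.
df3 = pd.concat(
    [df1.groupby("date")["Count"].sum(), df2.groupby("date")["Count"].sum()],
    axis=1,
    keys=["df1_count", "df2_count"],
)

# Share of each frame in the per-date total, formatted as a whole-number percentage.
total = df3["df1_count"] + df3["df2_count"]
for col in ["df1_count", "df2_count"]:
    pct = (df3[col] / total * 100).round().astype(int)
    df3[col + "%"] = pct.astype(str) + "%"
print(df3)
```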

Split dates into time ranges in pandas

14 [2018-03-14, 2018-03-13, 2017-03-06, 2017-02-13]
15 [2017-07-26, 2017-06-09, 2017-02-24]
16 [2018-09-06, 2018-07-06, 2018-07-04, 2017-10-20]
17 [2018-10-03, 2018-09-13, 2018-09-12, 2018-08-3]
18 [2017-02-08]
This is my data. Every ID has its own dates, which range between 2017-02-05 and 2018-06-30. I need to split the dates into 5 time ranges of 4 months each, so that for the first 4 months every ID has only the dates in that range (from 2017-02-05 to 2017-06-05), like this:
14 [2017-03-06, 2017-02-13]
15 [2017-02-24]
16 [null] # or delete empty rows, it doesn't matter
17 [null]
18 [2017-02-08]
The same is then needed for 2017-06-05 to 2017-10-05, and so on for every 4-month range. I also can't use nested for loops because the data is too big. This is what I have tried so far:
months_4 = individual_dates.copy()
for _ in months_4['Date']:
    _ = np.where(pd.to_datetime(_) <= pd.to_datetime('2017-9-02'), _, np.datetime64('NaT'))
and
months_8 = individual_dates.copy()
range_8 = pd.date_range(start='2017-9-02', end='2017-11-02')
for _ in months_8['Date']:
    _ = _[np.isin(_, range_8)]
Neither attempt achieved anything; the data stays the same no matter what.
Update: I did what you said:
individual_dates['Date'] = individual_dates['Date'].str.strip('[]').str.split(', ')
df = pd.DataFrame({
    'Date' : list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID' : individual_dates['ClientId'].repeat(individual_dates['Date'].str.len())
})
df
and here is the result:
Date ID
0 '2018-06-30T00:00:00.000000000' '2018-06-29T00... 14
1 '2017-03-28T00:00:00.000000000' '2017-03-27T00... 15
2 '2018-03-14T00:00:00.000000000' '2018-03-13T00... 16
3 '2017-12-14T00:00:00.000000000' '2017-03-28T00... 17
4 '2017-05-30T00:00:00.000000000' '2017-05-22T00... 18
5 '2017-03-28T00:00:00.000000000' '2017-03-27T00... 19
6 '2017-03-27T00:00:00.000000000' '2017-03-26T00... 20
7 '2017-12-15T00:00:00.000000000' '2017-11-20T00... 21
8 '2017-07-05T00:00:00.000000000' '2017-07-04T00... 22
9 '2017-12-12T00:00:00.000000000' '2017-04-06T00... 23
10 '2017-05-21T00:00:00.000000000' '2017-05-07T00... 24

For better performance, I suggest flattening the lists into a single column and then filtering with isin and boolean indexing:
from itertools import chain
df = pd.DataFrame({
    'Date' : list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID' : individual_dates['ID'].repeat(individual_dates['Date'].str.len())
})
range_8 = pd.date_range(start='2017-02-05', end='2017-06-05')
df['Date'] = pd.to_datetime(df['Date'])
df = df[df['Date'].isin(range_8)]
print(df)
Date ID
0 2017-03-06 14
0 2017-02-13 14
1 2017-02-24 15
4 2017-02-08 18
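A related sketch of the same flatten-then-filter idea, written with DataFrame.explode and Series.between instead of chain and isin. These are swapped-in techniques, not the ones used in the answer: explode with ignore_index needs pandas 1.1 or later, and between compares against the range bounds directly, so it also matches timestamps that are not at midnight. The sample frame is rebuilt from the question with the day in 2018-08-3 zero-padded, and the names flat, start, end and first_range are illustrative.

```python
import pandas as pd

individual_dates = pd.DataFrame({
    "ID": [14, 15, 16, 17, 18],
    "Date": [
        ["2018-03-14", "2018-03-13", "2017-03-06", "2017-02-13"],
        ["2017-07-26", "2017-06-09", "2017-02-24"],
        ["2018-09-06", "2018-07-06", "2018-07-04", "2017-10-20"],
        ["2018-10-03", "2018-09-13", "2018-09-12", "2018-08-03"],
        ["2017-02-08"],
    ],
})

# One row per (ID, date) pair instead of one list per ID.
flat = individual_dates.explode("Date", ignore_index=True)
flat["Date"] = pd.to_datetime(flat["Date"])

# First 4-month range; both bounds are inclusive by default.
start = pd.Timestamp("2017-02-05")
end = start + pd.DateOffset(months=4)
first_range = flat[flat["Date"].between(start, end)]
print(first_range)
```

The later ranges can be obtained the same way by moving start forward by four months each time, which needs a single loop over the five ranges rather than nested loops over the data.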

How to split rows in pandas with special condition of date?

I have a DataFrame like:
Code  Date    sales
1     2/2013  10
1     3/2013  11
2     3/2013  12
2     4/2013  14
...
I want to convert it into a DataFrame with a timeline, code, and sales of each type of item:
Date    Code  Sales1  Code  Sales2
2/2013  1     10      NA    NA
3/2013  1     11      2     12
4/2013  NA    NA      2     14
....
or into a simpler form:
Date    Code  Sales1  Date    Code  Sales2  .....
2/2013  1     10      3/2013  2     12
3/2013  1     11      4/2013  2     14
or, simplest of all, split it into many small DataFrames.

IIUC, use concat with the groupby result:
df.index = df.groupby('Code').cumcount()  # create the key for concat
pd.concat([x for _, x in df.groupby('Code')], axis=1)
Out[392]:
Code Date sales Code Date sales
0 1 2/2013 10 2 3/2013 12
1 1 3/2013 11 2 4/2013 14
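A self-contained sketch of this cumcount-plus-concat idea on the question's sample data (the name wide is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Code": [1, 1, 2, 2],
    "Date": ["2/2013", "3/2013", "3/2013", "4/2013"],
    "sales": [10, 11, 12, 14],
})

# Number the rows within each Code so the blocks line up on 0, 1, 2, ...
df.index = df.groupby("Code").cumcount()

# Place one block of columns per Code side by side.
wide = pd.concat([group for _, group in df.groupby("Code")], axis=1)
print(wide)
```

The resulting frame has repeated column names (Code, Date, sales for each block), so columns are easiest to address by position, for example with wide.iloc.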

Actually, splitting the data that way was a mistake. I rethought the problem and solved it with pivot_table:
pd.pivot_table(df, values = ['sales'], index = ['code'], columns = ['date'])
The result looks like this:
sum
date 2/2013 3/2013 4/2013 ....
code
1 10 11 NaN
2 NaN 12 14
...
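A runnable sketch of the pivot_table approach, with two assumptions that differ from the snippet above: it uses the column names from the question's sample (Code, Date, sales), and it passes aggfunc='sum' explicitly, since pivot_table averages by default. The name table is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Code": [1, 1, 2, 2],
    "Date": ["2/2013", "3/2013", "3/2013", "4/2013"],
    "sales": [10, 11, 12, 14],
})

# One row per Code, one column per Date; combinations without data become NaN.
table = pd.pivot_table(df, values="sales", index="Code", columns="Date", aggfunc="sum")
print(table)
```

Because the dates are plain strings here, the columns are ordered alphabetically; converting 'Date' to a datetime or period type first would give chronological order.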
