Pandas groupby Month with annual summary - python-3.x

I have a list of items describing orders placed, like this:
items=[('September 2021',1,40),('June 2022',1,77),....]
To get a dataframe showing how many orders I received and how much I was paid for each date, I do the following:
tabla2=tabla.sort_values(by=['Date']).groupby(['Date']).agg({'Subscriptions':'count','Total amount (€)':'sum'}).astype('float64').round(2)
What I want is to include a row with the yearly numbers after the months of each year, and a Totals row at the bottom.
For the totals I do the following:
df1=pd.DataFrame(pd.Series({'Date':"<b>Totals</b>",'Subscriptions':"<b>{}</b>".format(tabla['Subscriptions'].sum().astype('int')),
'Total amount (€)':"<b>{}</b>".format(tabla['Total amount (€)'].sum().round(2))})).T.set_index(['Date'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
tabla2=pd.concat([tabla2,df1])
The <b> tags make the text bold later when rendering the table with plotly.
So I end up with something like this:
Date Subscriptions Total amount (€)
September 2021 15 345
.... ... ...
<b>2021</b> 132 1256
June 2022 17 452
... ... ...
<b>2022</b> 144 3215
<b>Totals</b> 1234 4567
What is the most pythonic way of accomplishing this from the tabla2 dataframe?
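One possible approach, sketched under assumptions: the sample rows below are hypothetical, the column names are taken from the question, and only the row labels are wrapped in <b> (not the values). The idea is to aggregate per month, then loop over the years and append a subtotal row after each year's months, with a grand total at the end.

```python
import pandas as pd

# Hypothetical sample in the (month-year label, subscriptions, amount) shape
# used in the question.
items = [
    ("September 2021", 1, 40.0),
    ("September 2021", 1, 25.5),
    ("December 2021", 1, 10.0),
    ("June 2022", 1, 77.0),
]
tabla = pd.DataFrame(items, columns=["Date", "Subscriptions", "Total amount (€)"])

# Parse the labels so months sort chronologically rather than alphabetically.
tabla["when"] = pd.to_datetime(tabla["Date"], format="%B %Y")
monthly = tabla.groupby("when").agg(
    {"Subscriptions": "count", "Total amount (€)": "sum"}
)


def bold(label):
    """Wrap a row label in <b> tags for plotly."""
    return f"<b>{label}</b>"


parts = []
for year, block in monthly.groupby(monthly.index.year):
    # Month rows, relabelled back to the "September 2021" style.
    parts.append(block.set_axis(block.index.strftime("%B %Y")))
    # Yearly subtotal row, placed right after that year's months.
    parts.append(block.sum().to_frame(bold(year)).T)
# Grand total at the bottom.
parts.append(monthly.sum().to_frame(bold("Totals")).T)

tabla2 = pd.concat(parts).round(2)
tabla2.index.name = "Date"
print(tabla2)
```

Keeping a real datetime index until the last step is what makes the per-year grouping and the chronological ordering straightforward; the string labels are only restored for display.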

Related

Pandas: Transpose the first 3 rows but duplicate the rest of the data so I get a unique row and a larger table

I'm trying to convert an existing Excel sheet that has 3 layers of column headers. The first layer is the year, but it's a merged cell. The 2nd layer is the months, also merged, and the 3rd layer alternates Rent | Other.
My original data is shaped like this:
2022     Unnamed: 1  Unnamed: 2  Unnamed: 3 ...  2023     Unnamed: 135  Unnamed: 136 ...
January  NaN         February    NaN             January  NaN           February
Rent     Other       Rent        Other           Rent     Other         Rent
100      0           120         30              110      25            100
I added the "..." to the table; it continues for roughly 130 columns per year.
I tried to forward fill the year and months:
2022     2022     2022      2022
January  January  February  February
Rent     Other    Rent      Other
100      0        120       30
I want it to look like this:
Year  Month     Rent  Other
2022  January   100   0
2022  February  120   30
Flying blind here since I don't have access to your Excel file.

import pandas as pd

df = (
    # Your file looks to be row-oriented instead of pandas' usual column-oriented
    # format. We will import it without column names and assign them later.
    pd.read_excel("file.xlsx", header=None)
    # Fill in the blanks since some of the cells are merged
    .ffill(axis=1)
    # Label the four rows, then transpose the dataframe to the usual
    # column-oriented format
    .set_axis(["Year", "Month", "Metric", "Value"])
    .T
)
# Month names are usually a pain in the neck to work with. By default, they sort
# in alphabetical order, so April, August, February, ... It's best to convert
# them into numbers, but if you want to keep the names, use CategoricalDtype to
# keep them in calendar order
MonthDType = pd.CategoricalDtype(
    # freq="MS" (month start) yields all 12 months; with month-end "M" the range
    # would stop at November because 2022-12-31 is past the end date
    pd.date_range("2022-01-01", "2022-12-01", freq="MS").strftime("%B"), ordered=True
)
df["Month"] = df["Month"].astype(MonthDType)
# The final pivot
df = df.pivot_table(
    index=["Year", "Month"],
    columns="Metric",
    values="Value",
    aggfunc="sum",
    observed=True,
).reset_index()
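Since the Excel file isn't available, the same steps can be tried on an in-memory frame. This is only a sketch: `raw` is a hypothetical stand-in for what `pd.read_excel("file.xlsx", header=None)` might return, built from the sample values in the question, and the explicit `astype` step is an addition to make the numeric types unambiguous.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel("file.xlsx", header=None):
# merged header cells come through as NaN to the right of the first cell.
raw = pd.DataFrame(
    [
        [2022, np.nan, np.nan, np.nan, 2023, np.nan],
        ["January", np.nan, "February", np.nan, "January", np.nan],
        ["Rent", "Other", "Rent", "Other", "Rent", "Other"],
        [100, 0, 120, 30, 110, 25],
    ]
)

tidy = (
    raw.ffill(axis=1)  # spread year/month across the merged cells
    .set_axis(["Year", "Month", "Metric", "Value"])  # label the four rows
    .T  # one row per (year, month, metric) combination
    .astype({"Year": int, "Value": float})
)

# Ordered categorical so months sort in calendar order, not alphabetically.
month_dtype = pd.CategoricalDtype(
    pd.date_range("2022-01-01", periods=12, freq="MS").strftime("%B"),
    ordered=True,
)
tidy["Month"] = tidy["Month"].astype(month_dtype)

wide = tidy.pivot_table(
    index=["Year", "Month"],
    columns="Metric",
    values="Value",
    aggfunc="sum",
    observed=True,
).reset_index()
print(wide)
```

The pivot sorts the metric columns alphabetically, so `Other` comes before `Rent`; reorder with `wide[["Year", "Month", "Rent", "Other"]]` if the layout from the question is needed.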

Excel formula that returns sum based on categorical and date data from 2 columns?

I have a dataset in excel that looks like this:
Supplier Date QTY
A 03/02/2018 10
A 05/01/2018 15
A 08/06/2018 30
B 02/01/2018 20
B 04/01/2018 50
B 08/01/2018 40
B 08/15/2018 50
B 10/01/2018 60
C 03/09/2018 25
C 04/08/2018 25
C 05/20/2018 25
And I want to make a spreadsheet that allows the user to enter start and end dates and the Supplier name and be able to see a total quantity of items received in that time period from that supplier.
I've tried using a combination of VLOOKUP and SUM, but I've only been able to get the first result of the quantity associated with that supplier to return. I understand that this is the nature of VLOOKUP--to only return one value. I'm pretty new to Excel so I just don't really even know which function(s) would be best to use in this scenario.
Example Output:
Enter Supplier: "B"
Enter Start Date: "03/01/2018"
Enter End Date: "09/01/2018"
Items Received: 140
Try SUMPRODUCT, assuming the supplier is entered in F1, the start date in F2, and the end date in F3:
=SUMPRODUCT((A2:A12=F1)*(B2:B12>=F2)*(B2:B12<=F3)*(C2:C12))
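The formula multiplies three TRUE/FALSE arrays (supplier match, on or after the start date, on or before the end date) by the quantities, so only rows passing all three tests contribute to the sum. As a cross-check of that logic, here is the same filter sketched in pandas on the sample data; the frame and variable names are made up for illustration.

```python
import pandas as pd

# Sample data from the question (dates are month/day/year).
received = pd.DataFrame(
    {
        "Supplier": list("AAABBBBBCCC"),
        "Date": pd.to_datetime(
            [
                "03/02/2018", "05/01/2018", "08/06/2018",
                "02/01/2018", "04/01/2018", "08/01/2018",
                "08/15/2018", "10/01/2018",
                "03/09/2018", "04/08/2018", "05/20/2018",
            ],
            format="%m/%d/%Y",
        ),
        "QTY": [10, 15, 30, 20, 50, 40, 50, 60, 25, 25, 25],
    }
)

supplier = "B"
start = pd.Timestamp("2018-03-01")
end = pd.Timestamp("2018-09-01")

# Each condition is a boolean column; combining them with & mirrors the
# multiplication of TRUE/FALSE arrays inside SUMPRODUCT.
mask = (received["Supplier"] == supplier) & received["Date"].between(start, end)
items_received = int(received.loc[mask, "QTY"].sum())
print(items_received)  # 140
```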

Reformatting Excel datastructure from multiple rows in columns/rows format

I have the following case: I have data in a format like this:
date - timestamp - value, with 29 values per day, so the date stays the same across those rows
What I need is something like:
one row per date
one column per timestamp (29 timestamps per day)
values where date and timestamp meet
Is there a way to reformat the structure completely? As the data set is pretty big (10 years, 29 values per day), it would take ages to do manually. I need the result in the normal Excel format so I can easily import the data in C#.
What I have:
22.03.2018 08:00 200
22.03.2018 08:30 202
22.03.2018 ...
22.03.2018 22:00 120
23.03.2018 08:00 12
What I want:
            08:00  08:30  ...  22:00
22.03.2018  200    202    ...  120
23.03.2018  12
Any help would be appreciated :)
Br
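If a scripting step outside Excel is acceptable, this long-to-wide reshape is a pivot. A minimal sketch in pandas, assuming the three columns are named `date`, `timestamp` and `value` (the names are hypothetical) and using only the sample rows shown above:

```python
import pandas as pd

# Sample rows from the question; the column names are assumptions.
readings = pd.DataFrame(
    {
        "date": ["22.03.2018", "22.03.2018", "22.03.2018", "23.03.2018"],
        "timestamp": ["08:00", "08:30", "22:00", "08:00"],
        "value": [200, 202, 120, 12],
    }
)

# One row per date, one column per timestamp; missing combinations become NaN.
# The dates are plain strings here, so for real data spanning several months
# parse them first with pd.to_datetime(..., format="%d.%m.%Y") to get
# chronological row order.
wide = readings.pivot(index="date", columns="timestamp", values="value")
print(wide)

# Writing back to Excel needs an engine such as openpyxl installed:
# wide.to_excel("wide.xlsx")
```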

Averaging aggregated(SUM) values in Spotfire

I'm trying to average aggregated (SUM) values, but my expression keeps computing a weighted average over the whole data set.
Table Structure
REGION SITE_ID MONTH QUANTITY
A 1 01 5
A 1 02 6
A 2 01 4
B 3 01 10
B 3 02 12
Expression
Avg(
Sum([quantity]) over (All([region]))/
UniqueCount([site_id]) over (All([region]))/
UniqueCount([month]) over (All([region]))
) over (All([region]))
To clarify, I want to average A's and B's monthly quantity per site,
but I keep getting the total quantity divided by the total number of site_ids divided by the number of months.
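To make the difference between the two calculations concrete, here they are worked through on the sample table in pandas. This is one reading of the requirement (monthly quantity per site computed per region first, then averaged across regions), not a Spotfire expression.

```python
import pandas as pd

# Sample table from the question.
data = pd.DataFrame(
    {
        "REGION": ["A", "A", "A", "B", "B"],
        "SITE_ID": [1, 1, 2, 3, 3],
        "MONTH": ["01", "02", "01", "01", "02"],
        "QUANTITY": [5, 6, 4, 10, 12],
    }
)

# Aggregate per region first: total quantity, distinct sites, distinct months.
per_region = data.groupby("REGION").agg(
    total=("QUANTITY", "sum"),
    sites=("SITE_ID", "nunique"),
    months=("MONTH", "nunique"),
)
per_region["monthly_qty_per_site"] = (
    per_region["total"] / per_region["sites"] / per_region["months"]
)

# Wanted: the plain average of the per-region figures (A: 3.75, B: 11.0).
wanted = per_region["monthly_qty_per_site"].mean()

# What the expression produces instead: everything pooled over the whole table.
pooled = data["QUANTITY"].sum() / data["SITE_ID"].nunique() / data["MONTH"].nunique()

print(wanted, pooled)
```

On this sample the per-region average is 7.375, while the pooled figure is 37 / 3 / 2, roughly 6.17, which is the weighted result described above.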
This really depends on where you are going to use it and what the REAL data looks like. This should get you started. Insert this calculated column.
SUM([QUANTITY]) OVER (Intersect([REGION],[MONTH])) / UniqueCount([REGION]) AS [AvgOverRegionByMonth]
This could be inaccurate depending on what the rest of your data looks like. You can also accomplish this in a cross table. The expressions for the Sum and Avg columns are as follows:
Sum([QUANTITY]) as [Sum], Sum([QUANTITY]) / Count([REGION]) as [Average]
EDIT
In order to ONLY get the average over the months, use this formula:
AVG([QUANTITY]) OVER ([MONTH]) as [AvgOverMonth]

Running-Total for multiple Measures

I'd appreciate it if anyone could advise how to apply the Running-Total function to multiple measures.
I have 2 measures (Count) and (Revenue) by Month (Row) and Year (Column)
     Product A                          Product B
     2014             2015              2014             2015
     Count | Revenue  Count | Revenue   Count | Revenue  Count | Revenue
Jan  100
Feb  200
YTD  300
Mar  555
YTD  855
I tried:
running-total (currentMeasure for [Month], [Year], [Product Type])
but I get an error saying that currentMeasure is not supported for QueryPackage.
In case I interpreted the message wrong, I'd appreciate advice on how to apply the running-total correctly to multiple measures.
Thank you.
Count
running-total([Count] for [Product Type],[Year])
Revenue
running-total([Revenue] for [Product Type],[Year])
You don't specify month because it is in the row header. The 'for' clause indicates the grouping and the grouping comes from the column headers while the month defines the row grain. Another way to think about it is to put it into English:
Within each Product Type and Year combination, show me the accumulating total of each succeeding row broken out by Month.
Also, currentMeasure is used only in dimensionally-sourced reports, I believe.
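The grouping idea (restart the accumulation for each Product Type and Year, and let the months define the row order) can be illustrated outside the reporting tool. Here is a sketch in pandas; the Count values for Product A come from the question, while the Product B and Revenue numbers are invented for the example.

```python
import pandas as pd

# Product A counts are from the question; the other numbers are made up.
report = pd.DataFrame(
    {
        "Product Type": ["A"] * 3 + ["B"] * 3,
        "Year": [2014] * 6,
        "Month": [1, 2, 3] * 2,
        "Count": [100, 200, 555, 10, 20, 30],
        "Revenue": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

# Rows must be in month order within each group for the running total.
report = report.sort_values(["Product Type", "Year", "Month"])

# One cumulative sum per measure, restarted for every
# (Product Type, Year) combination, like the 'for' clause.
running = report.groupby(["Product Type", "Year"])[["Count", "Revenue"]].cumsum()
report = report.join(running.add_suffix(" YTD"))
print(report)
```

For Product A in 2014 this gives 100, 300, 855, matching the YTD rows in the question's table.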
