I have a working pivot based upon the following code:
import numpy as np
import pandas as pd

pd.pivot_table(df,
               index=["row_a", "row_b"],
               columns=["col_a"],
               values=["metric_a", "metric_b"],
               aggfunc={"metric_a": np.max, "metric_b": np.sum})
Based upon that code, I correctly receive the below output.
However, I would like to essentially swap the column with the metric to receive the below output. Is this possible?
I think all you need is a call to pandas.DataFrame.swaplevel after the initial pivot, followed by sorting the columns to group the top level (level=0):
# Assuming df holds the result of the pivot
df.swaplevel(0, 1, axis=1).sort_index(axis=1)
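For anyone who wants to try this end to end, here is a minimal sketch. The sample rows are made up for illustration; only the column names come from the question. It uses the string aggregators "max" and "sum" in place of np.max and np.sum, which should behave the same here.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
raw = pd.DataFrame({
    "row_a": ["x", "x", "y", "y"],
    "row_b": ["p", "p", "q", "q"],
    "col_a": ["c1", "c2", "c1", "c2"],
    "metric_a": [1, 2, 3, 4],
    "metric_b": [10, 20, 30, 40],
})

# Same pivot as in the question: metrics end up on the top column level.
pivoted = pd.pivot_table(
    raw,
    index=["row_a", "row_b"],
    columns=["col_a"],
    values=["metric_a", "metric_b"],
    aggfunc={"metric_a": "max", "metric_b": "sum"},
)

# Swap the two column levels, then sort so each col_a value groups its metrics.
swapped = pivoted.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(swapped)
```

After the swap, the top column level holds the col_a values and each one has metric_a and metric_b underneath it.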
I'm trying to achieve the following data transformation in Power Query but can't seem to get there on my own. I have read a lot of different questions and answers on the forum, but it seems just a bit beyond my grasp.
I have a table which has the ticker of a specific currency in the first column.
There is a second column with the date and time when a certain event, related to that specific currency, happens. This second column is basically the different 5-minute intervals which exist on any given day.
Finally there is a third column which describes the magnitude of the event.
The table therefore looks like this
What I would like to do in Power Query is transpose the unique names of the currencies into the first row of a new table. The first column of this table would be the longest time series for any given currency. In this case, as you can see in the data I am attaching, the longest time series is that of the currency ETH. Using the longest calendar as the first column, I would then like to place the values described in item 3 above as rows in the new table.
The new layout would look like this
My steps to transform the raw data in the first table are detailed in this image. Basically just expanding a JSON file and getting all the data I need into that first format which I described previously.
What I then do is:
Pivot using the first column
Transpose
That gives me a whole bunch of new columns. Way more than I want. Any idea what I can do differently?
In Power Query:
Click to select the pair column.
Transform > Pivot Column > Values Column: basis > Advanced options: Don't Aggregate
code:
#"Pivoted Column6" = Table.Pivot(YourPriorStepName, List.Distinct(YourPriorStepName[pair]), "pair", "basis")
output:
I am trying to create a table that will show me the mode of a data set. The data is contained in 3 columns.
Sample Data
Though the actual data set is thousands of rows
I am trying to identify what the most frequent rate paid is for each weight and zone.
I can get an average via a pivot table. I can also have a pivot table show me how many times each rate shows up in each weight and zone, but that is just a count. I would like it to show me the mode rate.
Any ideas on how to work this would be very appreciated!
Update: This is what I need the end result to look like:
Result:
I found the answer to what I needed to do here: https://www.get-digital-help.com/2010/02/11/match-two-criteria-and-return-multiple-rows-in-excel/
I was able to use this formula to create a list of values. From that list I used a mode and min formula to return the mode or min value.
From that I was able to populate a table with the values as needed.
Screen shot of the results.
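If the data can be taken outside Excel, the same "mode per weight and zone" table can be sketched in pandas. This is not the formula approach described above, just an alternative under assumptions: the column names Weight, Zone and Rate and the sample rows are made up, and ties are resolved by taking the smallest mode.

```python
import pandas as pd

# Hypothetical sample; the column names and values are assumptions.
shipments = pd.DataFrame({
    "Weight": [1, 1, 1, 1, 2, 2, 2],
    "Zone":   [2, 2, 2, 3, 2, 2, 2],
    "Rate":   [5.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0],
})

def smallest_mode(rates: pd.Series) -> float:
    # Series.mode() returns every tied value in ascending order,
    # so the first entry is the smallest of the most frequent rates.
    return rates.mode().iloc[0]

# One row per weight, one column per zone, each cell holding the mode rate.
mode_table = (
    shipments.groupby(["Weight", "Zone"])["Rate"]
    .agg(smallest_mode)
    .unstack("Zone")
)
print(mode_table)
```

Weight and zone combinations that never occur in the data show up as NaN.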
I'm quite new to using pivot tables and data models, so I don't even know if what I want to do is possible. I have a pivot table (PivotTable1) and its source (Table 25) and I would like to add a hundred or so measures which are listed in the TableCombinations.
For example, I entered the first two measures (in orange), but they are not linked to TableCombinations, and entering them all one by one would take a long time. Each measure is for a distinct Sum wfn column that sums all other rows multiplied by a coefficient. The TableCombinations table simply states the coefficient to be used for each column. For the first three rows, these are my measure formulas:
sum wf1=1.4*Table25[Sum of wD]+0*Table25[Sum of wL]+0*Table25[Sum of wS]+0*Table25[Sum of wW]+0*Table25[Sum of wWSOUL]
sum wf2=1.25*Table25[Sum of wD]+1.5*Table25[Sum of wL]+1*Table25[Sum of wS]+0*Table25[Sum of wW]+0*Table25[Sum of wWSOUL]
sum wf3=1.25*Table25[Sum of wD]+1.5*Table25[Sum of wL]+0*Table25[Sum of wS]+0.4*Table25[Sum of wW]+0*Table25[Sum of wWSOUL]
...
Two questions:
Is there a way to link the tables so that any change made to TableCombinations would then be updated in the pivot table measures?
Is there a way to generate all of the measures without typing them in one by one?
You should be able to use just one DAX measure to do this, using the CROSSJOIN function.
Don't set up a relationship between the Tables, and drag # to the Columns area of the PivotTable. Then create this Measure:
=SUMX(CROSSJOIN(Table1,Table2),Table1[wD]*Table2[wD]+Table1[wL]*Table2[wL]+Table1[wS]*Table2[wS]+Table1[wW]*Table2[wW]+Table1[wWSOUL]*Table2[wWSOUL])
That should give you the exact answer you need.
Here's how it looks using some sample data:
...and here's the sample data I'm using:
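To check the arithmetic outside Excel, the cross-join idea can be sketched in pandas. The numbers below are made up (they are not the sample data from the answer), the names loads and combos are hypothetical stand-ins for Table1 and Table2, and how="cross" needs pandas 1.2 or later.

```python
import pandas as pd

cols = ["wD", "wL", "wS", "wW", "wWSOUL"]

# Hypothetical stand-in for Table1: one row per load record.
loads = pd.DataFrame({
    "wD": [1.0, 2.0], "wL": [3.0, 4.0], "wS": [0.0, 1.0],
    "wW": [2.0, 0.0], "wWSOUL": [0.0, 0.0],
})

# Hypothetical stand-in for Table2: one row of coefficients per combination "#".
combos = pd.DataFrame({
    "#": [1, 2],
    "wD": [1.4, 1.25], "wL": [0.0, 1.5], "wS": [0.0, 1.0],
    "wW": [0.0, 0.0], "wWSOUL": [0.0, 0.0],
})

# Pair every load row with every combination row, like CROSSJOIN.
cross = loads.merge(combos, how="cross", suffixes=("_load", "_coef"))

# Row-wise sum of load * coefficient, then total per combination, like SUMX.
cross["weighted"] = sum(cross[f"{c}_load"] * cross[f"{c}_coef"] for c in cols)
totals = cross.groupby("#")["weighted"].sum()
print(totals)
```

Editing a coefficient in combos and rerunning changes the totals, which is the behaviour the single measure gives you in the PivotTable.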
You could certainly use VBA to add measures, and to update them when the Table changes. I might have a crack at writing up an answer along that approach shortly. But here's another way to achieve what you want.
I've previously written some code to slave a Table to a PivotTable, so that any change in the PivotTable's dimensions or placement will be reflected in the shadowing Table's dimensions and placement. This effectively gives us a way to add a calculated field to a PivotTable that can refer to something outside of that PivotTable. If the PivotTable grows, the Calculated Table will grow. If the PivotTable shrinks, the Calculated Table will shrink, and any redundant formulas in it will be deleted.
You can easily use this approach to perform your calculations in a 2nd table alongside your PivotTable, and each column x in that 2nd table could easily reference row x in your 'parameters' table.
See Select Newest Record and Create New Table of Unique Values in Excel
I'm trying to read in a data set and drop its first two columns, but it seems to be dropping the wrong columns. I was looking at this thread, but its suggestion is not giving the expected result. My data set starts with 6 columns, and I need to remove the first two. Other threads suggest dropping columns by label, but I would prefer not to name columns only to drop them if I can do it in one step.
df= pd.read_excel('Data.xls', header=17,footer=246)
df.drop(df.columns[[0,1]], axis=1, inplace=True)
But it is dropping columns 4 and 5 instead of the first two. Is there something with the drop function that I'm just completely missing?
If I understand your question correctly, you have a multilevel index, so df.columns[[0, 1]] starts counting from the non-index columns.
If you know the positions of the columns, why not select the ones you want to keep directly, such as:
df = df.iloc[:, 2:]
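A small sketch of positional selection, assuming a frame with six ordinary (non-index) columns. The column labels a to f are placeholders. Note that positions count only regular columns, so dropping the first two means keeping everything from position 2 onward.

```python
import pandas as pd

# Hypothetical six-column frame; labels are placeholders.
data = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=list("abcdef"),
)

# Keep everything from position 2 onward, i.e. drop the first two columns.
trimmed = data.iloc[:, 2:]

# Equivalent drop by position, without typing the column names.
same = data.drop(columns=data.columns[:2])

print(trimmed.columns.tolist())
```

If the first columns of the sheet were read in as the index, data.reset_index() turns them back into regular columns before selecting by position.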
In other words, the question is: how do I add a calculation to a pivot table based on columns that do not exist at the model level?
I've reproduced my problem using AdventureWorksDW2014 sample database.
Let's say I want to calculate difference between Actual and Budget scenario amounts in the FactFinance table for each Organisation and present it in a form of pivot table.
To achieve that I've created a simple model (screen above) and added SumOfAmount measure to the FactFinance table SumOfAmount:=SUM([Amount])
Next, I've opened my model in Excel and created very simple pivot table (shown below)
So (the question part), I now want to add an extra column to my pivot table that calculates something (for example, the difference) between the Actual and Budget columns. I want this new column to be part of the pivot table so that I can filter it and/or add new grouping levels without having to change anything "outside" the pivot table.
TRIED SO FAR
I tried to add a Calculated Field, but it seems I can only use "real" columns for calculations. Columns that appear in a pivot table based on values from the COLUMNS quadrant can't be used as sources for calculations.
FINAL SOLUTION
I finally got it working by combining two pivot tables: the old one and one with a Diff measure, defined as Diff:=[Actual Amount]-[Budget Amount], where
Actual Amount:=Calculate([SumOfAmount];'DimScenario'[ScenarioName] = "Actual")
Budget Amount:=Calculate([SumOfAmount];'DimScenario'[ScenarioName] = "Budget")
as #WimV suggested
Your first calculated measure is good:
SumOfAmount:=SUM('FactFinance'[Amount])
Add the following calculated measures (mark them as hidden if needed):
Budget Amount:=Calculate([SumOfAmount],'DimScenario'[ScenarioName] = "Budget")
Actual Amount:=Calculate([SumOfAmount],'DimScenario'[ScenarioName] = "Actual")
You can then use the new measures in, for example, a difference calculation.