Use my custom row order with pandas .describe() function - python-3.x

Assuming I have the following test DataFrame df:
Car Sold make profit
Honda 100 Accord 10
Honda 20 Fit 5
Toyota 300 Corolla 20
Hyundai 150 Elantra 20
BMW 20 Z-class 100
Toyota 45 Lexus 7
BMW 50 X-class 30
JEEP 150 cherokee 2
Honda 20 CRV 5
Toyota 30 Yaris 3
I need a summary statistic table for number of cars sold, by type of car.
I can do that this way:
df.groupby('Car')['Sold'].describe()
This gives me something like the following:
Car count mean std min 25% 50% 75% max
BMW 2
Honda 3
Hyundai 1
JEEP 1
Toyota 3
The 'Car' values are listed in the summary statistic table in ascending alphabetical order. I am looking for a way to sort them in my own pre-specified order: I want the rows listed as "Toyota, Hyundai, JEEP, BMW, Honda".

df.groupby('Car')['Sold'].describe().loc[["Toyota", "Hyundai", "JEEP", "BMW", "Honda"]]
puts the rows in that order, but I cannot get the same approach to work with a multi-level index. For instance, if I want the summary statistics table by 'Car' and then further by 'make', .loc does not give me the desired result.
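One way to get a custom order that also survives a multi-level groupby is to turn 'Car' into an ordered categorical before grouping. The following is a minimal sketch, assuming the test data above; the names `order`, `ordered` and `summary` are only illustrative. Because groupby sorts categorical keys by category order rather than alphabetically, the first index level follows the custom list, while 'make' stays alphabetical within each car.

```python
import pandas as pd

# Test data from the question.
df = pd.DataFrame({
    "Car": ["Honda", "Honda", "Toyota", "Hyundai", "BMW",
            "Toyota", "BMW", "JEEP", "Honda", "Toyota"],
    "Sold": [100, 20, 300, 150, 20, 45, 50, 150, 20, 30],
    "make": ["Accord", "Fit", "Corolla", "Elantra", "Z-class",
             "Lexus", "X-class", "cherokee", "CRV", "Yaris"],
    "profit": [10, 5, 20, 20, 100, 7, 30, 2, 5, 3],
})

# Desired row order for the first index level.
order = ["Toyota", "Hyundai", "JEEP", "BMW", "Honda"]

# An ordered categorical makes groupby sort 'Car' by this order.
ordered = df.assign(Car=pd.Categorical(df["Car"], categories=order, ordered=True))

# observed=True keeps only the (Car, make) pairs that occur in the data.
summary = ordered.groupby(["Car", "make"], observed=True)["Sold"].describe()
print(summary)
```

The same `ordered` frame also works for the single-level case, `ordered.groupby("Car", observed=True)["Sold"].describe()`, so the `.loc` step is no longer needed.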

Related

Excel Power Query. Unpivot Years only to create multiple measures on each row

I'm trying to normalise some data that is supplied in Excel. The data is made up of a number of dimension columns followed by several measure columns over time. Unfortunately the data comes in with a single "Measure/Year" identifier which means that if there are 10 years of data and 4 measures, there will be 40 measure columns.
I can't select specific columns to unpivot as the number of columns will change over time and I want to automate this completely.
A simplified sample of data looks like this (just showing 2 measures over 3 years in this example - but potentially 5 measures over an ever increasing number of years).
Country  Category  Product  QTY_2018  QTY_2019  QTY_2020  Value_2018  Value_2019  Value_2020
France   Fruit     Apple    10        20        30        11          22          33
France   Fruit     Orange   40        50        60        44          55          66
Germany  Veg       Carrot   70        80        90        77          88          99
What I would like to achieve is...
Country  Category  Product  Year  QTY  Value
France   Fruit     Apple    2018  10   11
France   Fruit     Apple    2019  20   22
France   Fruit     Apple    2020  30   33
France   Fruit     Orange   2018  40   44
France   Fruit     Orange   2019  50   55
France   Fruit     Orange   2020  60   66
Germany  Veg       Carrot   2018  70   77
Germany  Veg       Carrot   2019  80   88
Germany  Veg       Carrot   2020  90   99
So far I have selected all the non-measure columns, applied the "Unpivot other columns" transform, and then created 2 custom columns to hold the measure name (QTY or Value in this example) and the year. This gets around the problem of the varying number of measure columns, but it only gets me so far.
I now have data that looks like this:
Country  Category  Product  Year  Measure  Amount
France   Fruit     Apple    2018  QTY      10
France   Fruit     Apple    2018  Value    11
and so on...
Notes:
The measure label column will always be of the form 'measurename_YYYY'.
The list of measure names is finite (4 or 5 maybe), so updating the query to support a new measure name will be fine, as this will be rare. The number of years will increase each year, though, and I want end users to be able to refresh the query from a sheet they update (the sample data above), so the varying periods must be handled in the query.
If this can be done in the data model, I'm happy to go with that too.
I may be going about this the wrong way with my attempts so far, but my Power Query knowledge is pretty basic, so any help would be gratefully received.
You should be able to just repivot on the new Measure column to get your desired result now.
You're nearly there. Just Pivot on your "Measure" column, to complete the output:
Unpivoted = Table.UnpivotOtherColumns(Source, {"Country", "Category", "Product"}, "Attribute", "Value"),
#"Split Column" = Table.SplitColumn(Unpivoted, "Attribute", Splitter.SplitTextByEachDelimiter({"_"}, QuoteStyle.Csv, false), {"Measure", "Year"}),
#"Pivoted Column" = Table.Pivot(#"Split Column", List.Distinct(#"Split Column"[Measure]), "Measure", "Value")
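For comparison only, the same unpivot, split and pivot sequence can be sketched in pandas. This is not part of the Power Query answer; it assumes the sample data above, and the names `wide`, `keys`, `long` and `tidy` are illustrative. The unpivoted value column is called "Amount" here so that it does not clash with the measure that is itself named "Value".

```python
import pandas as pd

# Sample data from the question: one column per measure/year pair.
wide = pd.DataFrame({
    "Country": ["France", "France", "Germany"],
    "Category": ["Fruit", "Fruit", "Veg"],
    "Product": ["Apple", "Orange", "Carrot"],
    "QTY_2018": [10, 40, 70],
    "QTY_2019": [20, 50, 80],
    "QTY_2020": [30, 60, 90],
    "Value_2018": [11, 44, 77],
    "Value_2019": [22, 55, 88],
    "Value_2020": [33, 66, 99],
})
keys = ["Country", "Category", "Product"]

# Step 1: unpivot every non-key column, however many there are.
long = wide.melt(id_vars=keys, var_name="Attribute", value_name="Amount")

# Step 2: split 'measurename_YYYY' at the first underscore.
long[["Measure", "Year"]] = long["Attribute"].str.split("_", n=1, expand=True)

# Step 3: pivot the measure names back into columns.
tidy = (
    long.pivot(index=keys + ["Year"], columns="Measure", values="Amount")
    .reset_index()
    .rename_axis(columns=None)
)
print(tidy)
```

As in the query, only the key columns are named explicitly, so additional years or measures are picked up without changing the code.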

Excel combining up to 100 ranges that have similar but not identical columns into one

I'm pulling in data for unique items, and each item has a potentially unique set of columns that go with it. As a trivial example, two ranges might look like this:
Range 1:
Brand Year MPG Sales
Nissan 2000 25 150
Nissan 2005 27 180
Nissan 2008 30 190
Range 2:
Brand Year Sales
Honda 2000 95
Honda 2001 100
Honda 2001 150
Ideally I would like to combine into one named range:
Brand Year MPG Sales
Nissan 2000 25 150
Nissan 2005 27 180
Nissan 2008 30 190
Honda 2000 Null 95
Honda 2001 Null 100
Honda 2001 Null 150
In reality I have something like 100+ named ranges like this and the columns vary from anywhere between 25-35.
Is there a smart way to achieve the desired result? Right now I'm trying to use VBA to loop through each range and check whether the row exists. That seems like a sloppy solution and I'm having issues getting it to work, although I can probably brute-force it.
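Outside Excel, the desired result is a union of columns by name with missing cells left empty. The following pandas sketch shows that behaviour on the two sample ranges; it assumes the ranges have already been loaded into DataFrames, and the names `range1`, `range2` and `combined` are illustrative.

```python
import pandas as pd

# The two sample ranges from the question.
range1 = pd.DataFrame({
    "Brand": ["Nissan", "Nissan", "Nissan"],
    "Year": [2000, 2005, 2008],
    "MPG": [25, 27, 30],
    "Sales": [150, 180, 190],
})
range2 = pd.DataFrame({
    "Brand": ["Honda", "Honda", "Honda"],
    "Year": [2000, 2001, 2001],
    "Sales": [95, 100, 150],
})

# concat aligns columns by name; columns missing from a range become NaN.
combined = pd.concat([range1, range2], ignore_index=True, sort=False)
print(combined)
```

The list passed to `pd.concat` can hold any number of frames, so the same call would cover 100+ ranges once they are loaded.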

1) Issue In Normalize Transformation for Informatica Power Center

I am trying to normalize the records of my source table using the Normalizer transformation in Informatica, but the sequence does not restart for each source row.
Below is the source table:
Store_Name Sales_Quarter1 Sales_Quarter2 Sales_Quarter3 Sales_Quarter4
DELHI 150 240 455 100
MUMBAI 100 500 350 340
Target table:
Store_name
Sales
Quarter
I am using an Occurrence of 4 on the Sales column to get the GCID Sales column.
For Quarter, I am using the GCID Sales column.
Output:
STORE_NAME SALES_COLUMN QUARTER
Mumbai 100 1
Mumbai 500 2
Mumbai 350 3
Mumbai 340 4
Delhi 150 5
Delhi 240 6
Delhi 455 7
Delhi 100 8
Why does the Quarter value not restart from 1 for Delhi, and instead continue from 5?
The GK column keeps a sequential number across all rows, whereas GCID is the column that numbers the occurrences within a single source row. So double-check that it is the GCID port, and not the GK port, that is linked to the QUARTER port in the target.
It would also help to provide a screenshot of the mapping and of the Normalizer transformation (Normalizer tab) to make the question more informative.
I assume you have the 'Store_Name' port at level 1 and the 'Sales_Quarter1', 'Sales_Quarter2', 'Sales_Quarter3' and 'Sales_Quarter4' ports grouped at level 2 on the Normalizer tab (using the >> button in the top left area), with Occurrence set to 4 at group level for these four ports.
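The difference between the two generated columns can be illustrated with a small Python loop. This is only a sketch of the numbering described above, not of how PowerCenter implements it; the `gk` and `gcid` variables are stand-ins for the GK and GCID ports.

```python
# Source rows from the question: one row per store, four quarterly values.
rows = [
    ("DELHI", [150, 240, 455, 100]),
    ("MUMBAI", [100, 500, 350, 340]),
]

normalized = []
gk = 0  # GK-like counter: keeps running across all output rows
for store, sales in rows:
    # GCID-like counter: restarts at 1 for every source row
    for gcid, amount in enumerate(sales, start=1):
        gk += 1
        normalized.append(
            {"store": store, "sales": amount, "gcid": gcid, "gk": gk}
        )

for record in normalized:
    print(record)
```

Linking the `gk`-style value to QUARTER gives 1 to 8, as in the output shown in the question, while the `gcid`-style value gives 1 to 4 for each store.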

Excel How to make a formula differentiate different vehicle plates

I have little knowledge of Excel and I'm trying to set up a table so I can get the gas consumption of each vehicle in a company. All the data is entered in a single table, so how can I calculate the increase in km for each vehicle in order to then calculate its consumption?
The problem is that I don't know how to make the formula differentiate between plates.
The table is the following:
   A         B         C       D    E       F               G
1  Date      Plate     km      Gas  Signed  Increased km's  Consum
2  1/1/2018  0157-AAA  123456  50   YES
3  5/1/2018  0157-AAA  123789  20   NO
4  8/2/2018  0157-AAA  123987  30   NO
5  1/2/2018  0582-BBB  123456  40   YES
6  1/3/2018  0356-CCC  123456  30   NO
Another example:
Date Plate km Gas Increased km Consum %
3/5/2017 1111-AAA 150 20 150 13,33333333
7/5/2017 1111-AAA 400 30 250 12
7/5/2017 2222-BBB 50 10 50 20
7/5/2017 3333-CCC 20 5 20 25
10/5/2017 2222-BBB 200 30 150 20
Each plate is a different vehicle.
Gas is the amount of fuel the vehicle refills, in litres.
The table is filled in manually, so it is updated daily or every 2-3 days.
The problem is calculating the increased km, as there may be other plates in between on the same date.
Consum % = Gas/Increased km *100
I thought about just sorting the rows by date and by plate and applying a general formula to everything.
Thanks
I think I finally "solved my problem". The formula I now use relies on filtering the table by plate so that each vehicle's rows are adjacent and in order. The formula is then:
Increased km =IF(B2=B1;C2-C1;C2)
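The same per-plate difference can be sketched in pandas without sorting the table by plate first. This is an illustrative alternative, not the Excel solution; it assumes the second example table above with rows in chronological order, and the name `log` is made up for the example. As in the IF formula, the first reading of a plate keeps its own km value.

```python
import pandas as pd

# Second example table from the question, in the order it was entered.
log = pd.DataFrame({
    "Date": ["3/5/2017", "7/5/2017", "7/5/2017", "7/5/2017", "10/5/2017"],
    "Plate": ["1111-AAA", "1111-AAA", "2222-BBB", "3333-CCC", "2222-BBB"],
    "km": [150, 400, 50, 20, 200],
    "Gas": [20, 30, 10, 5, 30],
})

# Difference to the previous reading of the same plate; a plate's first
# reading has no predecessor, so it falls back to its own km value.
log["Increased km"] = log.groupby("Plate")["km"].diff().fillna(log["km"])

# Consum % = Gas / Increased km * 100, as defined in the question.
log["Consum %"] = log["Gas"] / log["Increased km"] * 100
print(log)
```

For this sample the computed columns match the "Increased km" and "Consum %" values listed in the example table.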

EXCEL: Count of Column Items For Every Distinct Item In Another

In my Excel sheet, I have 2 columns. Names of restaurants, and ratings for each one. For each rating, a new row is created, so of course a restaurant name occurs multiple times.
Restaurant Rating
McDonalds 8
McDonalds 7
Red Robin 5
Qdoba 7
Etc.
How can I get the number of times each rating happens for each restaurant? We'll say rating goes from 1-10.
I want it to look like this:
Restaurant 1(rating) 2 3 4 5 6 7 8 9 10
McDonalds 889 22 45 77 484 443 283 333 44 339
Any help is appreciated!
Using Pivot Tables:
Use a pivot table with "Restaurant" as the rows, "Rating" as the columns and "Count of Rating" as the values.
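For reference, the pivot-table layout described above corresponds to a cross-tabulation. The pandas sketch below builds it from the four sample rows in the question; the names `ratings` and `counts` are illustrative, and the 1-10 rating scale is taken from the question.

```python
import pandas as pd

# Sample rows from the question: one row per rating.
ratings = pd.DataFrame({
    "Restaurant": ["McDonalds", "McDonalds", "Red Robin", "Qdoba"],
    "Rating": [8, 7, 5, 7],
})

# Rows = restaurant, columns = rating, values = number of occurrences.
counts = pd.crosstab(ratings["Restaurant"], ratings["Rating"])

# Show every rating from 1 to 10, with 0 for ratings that never occur.
counts = counts.reindex(columns=range(1, 11), fill_value=0)
print(counts)
```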
Using Countifs:
