Pivoting a table with duplicate index

I wanted to pivot this table:
Year County Sex rate
0 2006 Alameda Male 45.80
1 2006 Alameda Female 54.20
2 2006 Alpine Male 52.81
3 2006 Alpine Female 47.19
4 2006 Amador Male 49.97
5 2006 Amador Female 50.30
My desired output is:
Year County Male Female
2006 Alameda 45.80 54.20
2006 Alpine 52.81 47.19
2006 Amador 49.97 50.30
I tried doing this:
sex_rate=g.pivot(index="County",columns='Year',values='rate')
But I keep getting this error:
ValueError: Index contains duplicate entries, cannot reshape
Please help. I am new to Python.

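I think you want index=['Year', 'County'] and columns='Sex', not index='County' and columns='Year': each County appears twice for the same Year (once per Sex), which is why pivot reports duplicate entries. And since you are passing two columns to index, you may want to use pivot_table instead of pivot; pivot_table also aggregates any remaining duplicates (mean by default) instead of raising:
df.pivot_table(index=['Year', 'County'], columns='Sex', values='rate').reset_index()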
Output:
Sex Year County Female Male
0 2006 Alameda 54.20 45.80
1 2006 Alpine 47.19 52.81
2 2006 Amador 50.30 49.97
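For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the pivot_table call above. It assumes the Sex labels are capitalized consistently ("Female", not "female"); a stray lowercase label would end up in its own column. The names pivot_failed and wide are only for illustration.

```python
import pandas as pd

# Sample rows from the question; "df" is the name used in the answer.
df = pd.DataFrame({
    "Year": [2006] * 6,
    "County": ["Alameda", "Alameda", "Alpine", "Alpine", "Amador", "Amador"],
    "Sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "rate": [45.80, 54.20, 52.81, 47.19, 49.97, 50.30],
})

# The original attempt: each (County, Year) pair occurs twice,
# so pivot() cannot place both values in one cell and raises ValueError.
try:
    df.pivot(index="County", columns="Year", values="rate")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Year + County identify a row, Sex supplies the new column headers.
wide = (
    df.pivot_table(index=["Year", "County"], columns="Sex", values="rate")
      .reset_index()
)
wide.columns.name = None  # drop the leftover "Sex" label on the column axis

print(wide)
```

Because every (Year, County, Sex) combination occurs once here, the default mean aggregation leaves the rates unchanged. If your real data has true duplicates, check that averaging them is what you want.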

Related

Excel - Count based on criteria in 3 other columns

I'm looking for help in getting a count based on criteria in 3 other columns. I started to do a pivot table, but I cannot see how to add an IF statement to the distinct count there.
I need a count of each customer within the customer type, by each supplier, if the Cases > 0 for that year.
Here's a sample data set:
Supplier  Customer    Type     2019 Cases  2020 Cases
ABC       Al's Store  Package  3           2
ABC       Ben's       Package  0           6
ABC       Kroger      Grocery  2           1
ABC       Publix      Grocery  1           0
XYZ       Al's Store  Package  0           5
XYZ       Ben's       Package  4           0
XYZ       Kroger      Grocery  0           1
XYZ       Publix      Grocery  3           7
I need a result like this. My actual report will have each supplier on their own tab.
Supplier  Type     2019 Customer Count  2020 Customer Count  My Reason
ABC       Package  1                    2                    Al's bought in both years, but Ben's only in 2020
ABC       Grocery  2                    1                    Kroger bought in both years, but Publix only in 2019
XYZ       Package  1                    1                    Al only bought in 2020, Ben only bought in 2019
XYZ       Grocery  1                    2                    Kroger only bought in 2020
Thanks!

Python percentage of 2 columns in new column based on condition

I asked this question earlier and got some feedback, but I am still stuck: I cannot calculate the percentage of two columns based on conditions. The two columns are 'Tested population' and 'Total population', grouped by 'Year' and 'Gender', and I want to show the result in a new column 'Percentage'.
Year Race Gender Tested population Total population
2017 Asian Male 345 567
2017 Hispanic Female 666 67899
2018 Native Male 333 35543
2018 Asian Female 665 78955
2019 Hispanic Female 4444 44356
2020 Native Male 3642 6799
2017 Asian Male 5467 7998
2018 Asian Female 5467 7998
2019 Hispanic Male 456 4567
Code:
df = pd.DataFrame(alldata, columns=['Year', 'Gender', 'Tested population', 'Total population'])
df2 = df.groupby(['Year', 'Gender']).agg({'Tested population': 'sum'})
pop_pcts = df2.groupby(level=0).apply(lambda x: 100 * x / float(x.sum()))
print(pop_pcts)
Output:
Tested population
Year Gender
2017 Female 10.280951
Male 89.719049
2018 Female 94.849188
Male 5.150812
2019 Female 90.693878
Male 9.306122
2020 Male 100.000000
Whereas I want the data in this format, with the result shown alongside the other columns as a new column 'Percentage':
Year Race Gender Tested population Total population Percentage
2017 Asian Male 345 567 60.8466
2017 Hispanic Female 666 67899 0.98087
2018 Native Male 333 35543 0.93689
2018 Asian Female 665 78955 0.84225
2019 Hispanic Female 4444 44356 10.0189
2020 Native Male 3642 6799 53.5667
2019 Hispanic Male 456 4567 9.98467
I have gone through "Pandas percentage of total with groupby" but could not fix my issue. Can someone help with this?

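df['Percentage'] = 100 * df['Tested population'] / df['Total population']
I believe you just need to add a column. Note that the column is named 'Total population' (lowercase p), and the factor of 100 is what makes the values match your desired output (345 / 567 gives 60.8466).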
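A small sketch of that idea, using the nine rows from the question. The row-wise Percentage column matches the desired output. The second column, "Group percentage", is my own hypothetical addition in case you really want the ratio per Year and Gender group; it sums both columns within each group before dividing.

```python
import pandas as pd

# Rows copied from the question.
df = pd.DataFrame({
    "Year": [2017, 2017, 2018, 2018, 2019, 2020, 2017, 2018, 2019],
    "Race": ["Asian", "Hispanic", "Native", "Asian", "Hispanic",
             "Native", "Asian", "Asian", "Hispanic"],
    "Gender": ["Male", "Female", "Male", "Female", "Female",
               "Male", "Male", "Female", "Male"],
    "Tested population": [345, 666, 333, 665, 4444, 3642, 5467, 5467, 456],
    "Total population": [567, 67899, 35543, 78955, 44356, 6799, 7998, 7998, 4567],
})

# Row-wise percentage: tested / total for each row, as in the desired output.
df["Percentage"] = 100 * df["Tested population"] / df["Total population"]

# Hypothetical variant: percentage per (Year, Gender) group.
# transform("sum") returns group totals aligned to the original rows.
totals = df.groupby(["Year", "Gender"])[
    ["Tested population", "Total population"]
].transform("sum")
df["Group percentage"] = (
    100 * totals["Tested population"] / totals["Total population"]
)

print(df.round(4))
```

Using transform rather than agg keeps one value per original row, so the result can be assigned straight back as a column without a merge.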

Pandas Group By Multiple Columns and Calculate Standard Deviation

I have a pandas dataframe that contains statistics of basketball players from the NBA from multiple seasons and teams. It looks like this:
Year Team Player PTS/G
2018 Lakers Lebron James 27.6
2018 Lakers Kyle Kuzma 10.3
2019 Rockets James Harden 25.5
2019 Rockets Russel Westbrook 23.2
I want to create a new column called 'PTS Dev' that is the standard deviation of PTS/G for each team and year. Then, I plan on analyzing where a player is according to that deviation. This is my attempt to calculate that column:
final_data['PTS Dev'] = final_data.groupby('Team', 'Year')['PTS/G'].std()

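Use groupby with transform. Pass the grouping columns as a list, and use transform so the result has one value per original row and can be assigned as a column: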
final_data['PTS Dev'] = final_data.groupby(['Team', 'Year'])['PTS/G'].transform('std')
final_data
Out[9]:
Year Team Player PTS/G PTS Dev
0 2018 Lakers Lebron James 27.6 12.232947
1 2018 Lakers Kyle Kuzma 10.3 12.232947
2 2019 Rockets James Harden 25.5 1.626346
3 2019 Rockets Russel Westbrook 23.2 1.626346
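Here is a self-contained sketch of the same call on the four sample rows. The "PTS Z" column is a hypothetical extra (not something you asked for by name) showing one way to express where a player sits relative to that deviation, assuming a z-score against the team and year mean is what you have in mind.

```python
import pandas as pd

# Sample rows from the question.
final_data = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2019],
    "Team": ["Lakers", "Lakers", "Rockets", "Rockets"],
    "Player": ["Lebron James", "Kyle Kuzma", "James Harden", "Russel Westbrook"],
    "PTS/G": [27.6, 10.3, 25.5, 23.2],
})

# Group keys go in a list; groupby('Team', 'Year') would pass 'Year'
# as the second positional argument instead of as a grouping column.
grouped = final_data.groupby(["Team", "Year"])["PTS/G"]

# transform broadcasts each group's sample standard deviation (ddof=1)
# back onto the rows of that group.
final_data["PTS Dev"] = grouped.transform("std")

# Hypothetical follow-up: distance from the group mean in units of PTS Dev.
final_data["PTS Z"] = (
    final_data["PTS/G"] - grouped.transform("mean")
) / final_data["PTS Dev"]

print(final_data)
```

A group with a single player gets NaN for the standard deviation, so real data may need a fillna or a filter before the z-score step.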

Lookup value in one dataframe and paste it into another dataframe

I have two dataframes in Python: one big (car listings) and one small (car base configuration prices). The small one looks like this:
Make Model MSRP
0 Acura ILX 27990
1 Acura MDX 43015
2 Acura MDX Sport Hybrid 51960
3 Acura NSX 156000
4 Acura RDX 35670
5 Acura RLX 54450
6 Acura TLX 31695
7 Alfa Romeo 4C 55900
8 Alfa Romeo Giulia 37995
… … … …
391 Toyota Yaris 14895
392 Toyota Yaris iA 15950
393 Volkswagen Atlas 33500
394 Volkswagen Beetle 19795
395 Volkswagen CC 34475
396 Volkswagen GTI 24995
397 Volkswagen Golf 19575
398 Volkswagen Golf Alltrack 25850
399 Volkswagen Golf R 37895
400 Volkswagen Golf SportWagen 21580
401 Volkswagen Jetta 17680
402 Volkswagen Passat 22440
403 Volkswagen Tiguan 24890
404 Volkswagen Touareg 42705
405 Volkswagen e-Golf 28995
406 Volvo S60 33950
Now I want to paste the values from the MSRP column (far right column) based on matching the Make and Model columns into the big dataframe (car listings) that looks like the following:
makeName modelName trimName carYear mileage
0 BMW X5 sDrive35i 2017 0
1 BMW X5 sDrive35i 2017 3
2 BMW X5 sDrive35i 2017 0
3 Audi A4 Premium Plus 2017 0
4 Kia Optima LX 2016 10
5 Kia Optima SX Turbo 2017 15
6 Kia Optima EX 2016 425
7 Rolls-Royce Ghost Series II 2017 15
… … … … … …
In the end I would like to have the following:
makeName modelName trimName carYear mileage MSRP
0 BMW X5 sDrive35i 2017 0 value from the other table
1 BMW X5 sDrive35i 2017 3 value from the other table
2 BMW X5 sDrive35i 2017 0 value from the other table
3 Audi A4 Premium Plus 2017 0 value from the other table
4 Kia Optima LX 2016 10 value from the other table
5 Kia Optima SX Turbo 2017 15 value from the other table
6 Kia Optima EX 2016 425 value from the other table
7 Rolls-Royce Ghost Series II 2017 15 value from the other table
… … … … … …
I read the documentation regarding pd.concat, merge and join but I am not making any progress.
Can you guys help?
Thanks!

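You can use merge to join the two dataframes. Start from the listings and use a left join so every listing is kept, matching makeName/modelName in the listings against Make/Model in the price table:
car_listings.merge(car_base, left_on=['makeName','modelName'], right_on=['Make','Model'], how='left')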
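A minimal sketch of that lookup. The three price rows come from your small table. The listing rows are made up for illustration, since the listings you posted do not overlap with the part of the price table shown, and the frame names car_base and car_listings are assumptions.

```python
import pandas as pd

# Three rows taken from the price table in the question.
car_base = pd.DataFrame({
    "Make": ["Acura", "Acura", "Volkswagen"],
    "Model": ["ILX", "MDX", "Golf"],
    "MSRP": [27990, 43015, 19575],
})

# Hypothetical listings: two that match the price table and one that does not.
car_listings = pd.DataFrame({
    "makeName": ["Acura", "Volkswagen", "BMW"],
    "modelName": ["ILX", "Golf", "X5"],
    "carYear": [2017, 2016, 2017],
    "mileage": [0, 10, 3],
})

# how="left" keeps every listing; listings with no matching Make/Model
# get NaN in MSRP instead of being dropped.
merged = car_listings.merge(
    car_base,
    left_on=["makeName", "modelName"],
    right_on=["Make", "Model"],
    how="left",
).drop(columns=["Make", "Model"])  # the key columns from car_base are redundant

print(merged)
```

If the same Make and Model pair appears more than once in the price table, each matching listing is repeated once per match, so it is worth deduplicating the price table before merging.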

PowerPivot Cohort Analysis

I'm trying to do cohort analysis using Excel's PowerPivot. I have a table recording which users have purchased which products in which months, e.g.
UserID Product Date Quantity
1 Ham Mar 15 2
1 Cheese Jan 15 7
2 Ham Mar 15 8
3 Fish Mar 15 2
2 Cheese Apr 15 8
I want to use a calculated field to filter for a cohort of users who purchased a given product in a given month but be able to analyse all their purchases.
E.g. cohort Ham, Mar 15
--> Users 1, 2
UserID Product Date Quantity
1 Ham Mar 15 2
1 Cheese Jan 15 7
2 Ham Mar 15 8
2 Cheese Apr 15 8
I know this could be done easily using SQL but I am working with colleagues who prefer to use Excel over Access/Some SQL interface.
Thank you

Create a calculated column like this:
=if([UserID]&SlicerValue=[UserID]&[Product],[UserID])
where SlicerValue is the product (e.g. Ham) selected from a slicer created from a table of unique products.
