Python percentage of 2 columns in new column based on condition - python-3.x

I asked this question earlier and got some feedback, but I am still stuck: I cannot calculate the percentage of two columns based on a condition. The two columns are ‘Tested population’ and ‘Total population’. I want to group by ‘Year’ and ‘Gender’ and show the result in a new column, ‘Percentage’.
Year Race Gender Tested population Total population
2017 Asian Male 345 567
2017 Hispanic Female 666 67899
2018 Native Male 333 35543
2018 Asian Female 665 78955
2019 Hispanic Female 4444 44356
2020 Native Male 3642 6799
2017 Asian Male 5467 7998
2018 Asian Female 5467 7998
2019 Hispanic Male 456 4567
Code:
df = pd.DataFrame(alldata, columns=['Year', 'Gender', 'Tested population', 'Total population'])
df2 = df.groupby(['Year', 'Gender']).agg({'Tested population': 'sum'})
pop_pcts = df2.groupby(level=0).apply(lambda x:
100 * x / float(x.sum()))
print(pop_pcts)
Output:
             Tested population
Year Gender
2017 Female          10.280951
     Male            89.719049
2018 Female          94.849188
     Male             5.150812
2019 Female          90.693878
     Male             9.306122
2020 Male           100.000000
Whereas I want the data in this format, shown alongside the other columns with a new 'Percentage' column:
Year Race Gender Tested population Total population Percentage
2017 Asian Male 345 567 60.8466
2017 Hispanic Female 666 67899 0.98087
2018 Native Male 333 35543 0.93689
2018 Asian Female 665 78955 0.84225
2019 Hispanic Female 4444 44356 10.0189
2020 Native Male 3642 6799 53.5667
2019 Hispanic Male 456 4567 9.98467
I have gone through "Pandas percentage of total with groupby" but was not able to fix my issue. Can someone help with this?

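I believe you just need to add a column that divides the two columns row by row. Note that the column is named 'Total population' (lowercase p), and that you need to multiply by 100 to get the percentages shown in your expected output:
df['Percentage'] = 100 * df['Tested population'] / df['Total population']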
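A minimal sketch of that calculation, using the sample rows from the question. The 'Group percentage' column is an assumption about what "based on grouping Year and Gender" might mean (tested over total within each Year/Gender group). It is not something the question's expected output shows, so drop it if only the row-wise value is needed.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame(
    [
        (2017, "Asian", "Male", 345, 567),
        (2017, "Hispanic", "Female", 666, 67899),
        (2018, "Native", "Male", 333, 35543),
        (2018, "Asian", "Female", 665, 78955),
        (2019, "Hispanic", "Female", 4444, 44356),
        (2020, "Native", "Male", 3642, 6799),
        (2017, "Asian", "Male", 5467, 7998),
        (2018, "Asian", "Female", 5467, 7998),
        (2019, "Hispanic", "Male", 456, 4567),
    ],
    columns=["Year", "Race", "Gender", "Tested population", "Total population"],
)

# Row-wise percentage: tested / total for each row, scaled to percent.
df["Percentage"] = 100 * df["Tested population"] / df["Total population"]

# Assumed alternative: percentage within each Year/Gender group.
# transform("sum") returns the group totals aligned to the original rows.
group_sums = df.groupby(["Year", "Gender"])[
    ["Tested population", "Total population"]
].transform("sum")
df["Group percentage"] = (
    100 * group_sums["Tested population"] / group_sums["Total population"]
)

print(df)
```

Keeping every original column in the DataFrame (including 'Race') is what lets the new column sit alongside the others, unlike the groupby/agg result, which only contains the grouped keys and the aggregated value.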

Related

Pivoting a table with duplicate index

I wanted to pivot this table:
Year County Sex rate
0 2006 Alameda Male 45.80
1 2006 Alameda Female 54.20
2 2006 Alpine Male 52.81
3 2006 Alpine Female 47.19
4 2006 Amador Male 49.97
5 2006 Amador female 50.30
My desired output is:
Year County Male Female
2006 Alameda 45.80 54.20
2006 Alameda 52.81 47.19
2006 Alpine 49.97 50.30
I tried doing this:
sex_rate=g.pivot(index="County",columns='Year',values='rate')
But I keep getting this error:
ValueError: Index contains duplicate entries, cannot reshape
Please help. I am new to Python.
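I think you want index=['Year', 'County'], not just index='County', and columns='Sex' rather than columns='Year'. With only 'County' as the index, each county appears more than once for the same year, which is what triggers the duplicate-entries error. And since you are passing two columns to index, you may want to use pivot_table instead of pivot:
df.pivot_table(index=['Year', 'County'],
               columns='Sex', values='rate'
              ).reset_index()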
Output:
Sex Year County Female Male
0 2006 Alameda 54.20 45.80
1 2006 Alpine 47.19 52.81
2 2006 Amador 50.30 49.97
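A self-contained sketch of the pivot_table approach on the question's six rows. The `str.capitalize()` step is an assumption: the last row in the question has a lowercase "female", which would otherwise produce a separate column.

```python
import pandas as pd

# Rows copied from the question (note the lowercase "female" in the last row).
g = pd.DataFrame(
    {
        "Year": [2006] * 6,
        "County": ["Alameda", "Alameda", "Alpine", "Alpine", "Amador", "Amador"],
        "Sex": ["Male", "Female", "Male", "Female", "Male", "female"],
        "rate": [45.80, 54.20, 52.81, 47.19, 49.97, 50.30],
    }
)

# Assumed clean-up: treat "female" and "Female" as the same category.
g["Sex"] = g["Sex"].str.capitalize()

# One row per (Year, County), one column per Sex value.
wide = g.pivot_table(index=["Year", "County"], columns="Sex", values="rate").reset_index()
wide.columns.name = None  # drop the leftover "Sex" label on the column axis

print(wide)
```

pivot_table aggregates duplicates (mean by default) instead of raising, so it is worth checking that each Year/County/Sex combination really occurs only once.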

Summing a year's worth of data that spans two years in pandas

I have a DataFrame that contains data similar to this:
Name Date A B C
John 19/04/2018 10 11 8
John 20/04/2018 9 7 9
John 21/04/2018 22 15 22
… … … … …
John 16/04/2019 8 8 9
John 17/04/2019 10 11 18
John 18/04/2019 8 9 11
Rich 19/04/2018 18 7 6
… … … … …
Rich 18/04/2019 19 11 17
The data can start on any day and contains at least 365 days of data, sometimes more. What I want to end up with is a DataFrame like this:
Name Date Sum
John April 356
John May 276
John June 209
Rich April 452
I need to sum up all of the months to get a year’s worth of data (April - March) but I need to be able to handle taking part of April’s total (in this example) from 2018 and part from 2019. What I would also like to do is shift the days so they are consecutive and follow on in sequence so rather than:
John 16/04/2019 8 8 9 Tuesday
John 17/04/2019 10 11 18 Wednesday
John 18/04/2019 8 9 11 Thursday
John 19/04/2019 10 11 8 Thursday (was 19/04/2018)
John 20/04/2019 9 7 9 Friday (was 20/04/2018)
It becomes
John 16/04/2019 8 8 9 Tuesday
John 17/04/2019 10 11 18 Wednesday
John 18/04/2019 8 9 11 Thursday
John 19/04/2019 9 7 9 Friday (was 20/04/2018)
This shift should happen prior to summing to get the final DataFrame. Is this possible?
Additional information requested in comments
Here is a link to the initial data set https://github.com/stottp/exampledata/blob/master/SOExample.csv and the required output would be:
Name Month Total
John March 11634
John April 11470
John May 11757
John June 10968
John July 11682
John August 11631
John September 11085
John October 11924
John November 11593
John December 11714
John January 11320
John February 10167
Rich March 11594
Rich April 12383
Rich May 12506
Rich June 11112
Rich July 11636
Rich August 11303
Rich September 10667
Rich October 10992
Rich November 11721
Rich December 11627
Rich January 11669
Rich February 10335
Let's see if I understood correctly. If you want to sum, I suppose you mean sum the values of columns ['A', 'B', 'C'] for each day and get the total value monthly.
If that's right, the first thing to do is parse the ['Date'] column as dates (the sample is day-first) and set it as the index so that the data frame is easier to work with:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.set_index('Date')
Next, re-sample the data frame (from days to months) whilst summing the values of ['A', 'B', 'C']. The result has one row per month, so keep it as its own series rather than assigning it back to the daily frame (on pandas 2.2 and later the month-end alias is 'ME' rather than 'M'):
monthly_sum = df[['A', 'B', 'C']].resample('M').sum().sum(axis=1)
monthly_sum.head()
Out[37]:
Date
2012-11-30 1956265
2012-12-31 2972076
2013-01-31 2972565
2013-02-28 2696121
2013-03-31 2970687
Freq: M, dtype: int64
The last part, squashing February of 2018 and 2019 together as if they were a single month, might be done with something like:
df.loc['2019-02'].merge(df.loc['2018-02'], how='outer', on=['Date', 'A', 'B', 'C'])
Test this last step and see if it works for you.
Cheers
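An alternative sketch for the per-name monthly totals, assuming the data covers exactly one year so that no calendar day is counted twice. Grouping on the month name (ignoring the year) is what merges the April 2018 and April 2019 parts. Most rows below come from the question; the May row is hypothetical and only there to show a second month.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "John", "John", "John", "Rich", "Rich"],
        "Date": ["19/04/2018", "20/04/2018", "16/04/2019",
                 "01/05/2018",  # hypothetical row, not in the question
                 "19/04/2018", "18/04/2019"],
        "A": [10, 9, 8, 1, 18, 19],
        "B": [11, 7, 8, 2, 7, 11],
        "C": [8, 9, 9, 3, 6, 17],
    }
)

# Dates in the sample are day-first (dd/mm/yyyy).
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Daily total across the three value columns, plus the month name as a key.
df["Total"] = df[["A", "B", "C"]].sum(axis=1)
df["Month"] = df["Date"].dt.month_name()

# Sum per person and calendar month, regardless of the year.
monthly = df.groupby(["Name", "Month"], as_index=False)["Total"].sum()

print(monthly)
```

The result is sorted alphabetically by month name. To get an April to March order, the 'Month' column could be converted to an ordered categorical before grouping.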

Lookup value in one dataframe and paste it into another dataframe

I have two dataframes in Python: one big (car listings) and one small (car base configuration prices). The small one looks like this:
Make Model MSRP
0 Acura ILX 27990
1 Acura MDX 43015
2 Acura MDX Sport Hybrid 51960
3 Acura NSX 156000
4 Acura RDX 35670
5 Acura RLX 54450
6 Acura TLX 31695
7 Alfa Romeo 4C 55900
8 Alfa Romeo Giulia 37995
… … … …
391 Toyota Yaris 14895
392 Toyota Yaris iA 15950
393 Volkswagen Atlas 33500
394 Volkswagen Beetle 19795
395 Volkswagen CC 34475
396 Volkswagen GTI 24995
397 Volkswagen Golf 19575
398 Volkswagen Golf Alltrack 25850
399 Volkswagen Golf R 37895
400 Volkswagen Golf SportWagen 21580
401 Volkswagen Jetta 17680
402 Volkswagen Passat 22440
403 Volkswagen Tiguan 24890
404 Volkswagen Touareg 42705
405 Volkswagen e-Golf 28995
406 Volvo S60 33950
Now I want to paste the values from the MSRP column (far right column) based on matching the Make and Model columns into the big dataframe (car listings) that looks like the following:
makeName modelName trimName carYear mileage
0 BMW X5 sDrive35i 2017 0
1 BMW X5 sDrive35i 2017 3
2 BMW X5 sDrive35i 2017 0
3 Audi A4 Premium Plus 2017 0
4 Kia Optima LX 2016 10
5 Kia Optima SX Turbo 2017 15
6 Kia Optima EX 2016 425
7 Rolls-Royce Ghost Series II 2017 15
… … … … … …
In the end I would like to have the following:
makeName modelName trimName carYear mileage MSRP
0 BMW X5 sDrive35i 2017 0 value from the other table
1 BMW X5 sDrive35i 2017 3 value from the other table
2 BMW X5 sDrive35i 2017 0 value from the other table
3 Audi A4 Premium Plus 2017 0 value from the other table
4 Kia Optima LX 2016 10 value from the other table
5 Kia Optima SX Turbo 2017 15 value from the other table
6 Kia Optima EX 2016 425 value from the other table
7 Rolls-Royce Ghost Series II 2017 15 value from the other table
… … … … … …
I read the documentation regarding pd.concat, merge and join but I am not making any progress.
Can you guys help?
Thanks!
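You can use merge to join the two dataframes together. Start from the big listings frame and use a left join so that every listing is kept. left_on names the columns in the listings frame and right_on the matching columns in the price table:
car_listings.merge(car_base, how='left', left_on=['makeName', 'modelName'], right_on=['Make', 'Model'])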
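A small runnable sketch of the lookup, with the listings frame on the left so that its columns (makeName, modelName) go in `left_on`. The frame names `car_listings` and `car_base` follow the answer. The Kia Optima price is a hypothetical value, since the visible part of the price table has no row matching the sample listings.

```python
import pandas as pd

# Small price table (Acura row from the question; Kia price is hypothetical).
car_base = pd.DataFrame(
    {
        "Make": ["Acura", "Kia"],
        "Model": ["ILX", "Optima"],
        "MSRP": [27990, 22500],
    }
)

# A few listings copied from the question.
car_listings = pd.DataFrame(
    {
        "makeName": ["BMW", "Kia", "Kia"],
        "modelName": ["X5", "Optima", "Optima"],
        "trimName": ["sDrive35i", "LX", "SX Turbo"],
        "carYear": [2017, 2016, 2017],
        "mileage": [0, 10, 15],
    }
)

# Left join: keep every listing, attach MSRP where make and model match.
merged = car_listings.merge(
    car_base,
    how="left",
    left_on=["makeName", "modelName"],
    right_on=["Make", "Model"],
).drop(columns=["Make", "Model"])  # the duplicated key columns are not needed

print(merged)
```

Listings without a matching make/model (the BMW row here) end up with NaN in MSRP, which makes gaps in the price table easy to spot.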

Excel PivotTable for counting words for female and male

I have an Excel sheet in the following form:
words male words female
I 2 rose 4
am 3 baby 6
sunny 4 slim 9
baby 5 travel 11
football 9
I want to turn this into a new sheet in the following form using an Excel PivotTable. If a word is common to both male and female, show both values; if not, show only its own value and leave the other cell blank.
words     male  female
I         2
am        3
sunny     4
baby      5     6
football  9
rose            4
slim            9
travel          11
Thanks for your time and consideration!
If you paste the data sets on top of each other (cut/paste) and add a label for male/female, a standard pivot table would do the trick:
words count gender
I 2 male
am 3 male
sunny 4 male
baby 5 male
football 9 male
rose 4 female
baby 6 female
slim 9 female
travel 11 female
Pivoted:
Row Labels  female  male
am                  3
baby        6       5
football            9
I                   2
rose        4
slim        9
sunny               4
travel      11
If you can't do this, are you open to VBA?
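If the data can be loaded into Python, the same stack-then-pivot idea can be sketched with pandas. This is only an illustration of the technique described above, not an Excel solution. The frame names `male` and `female` and the `count` column are assumptions.

```python
import pandas as pd

# The two word lists from the question, as separate frames.
male = pd.DataFrame(
    {"words": ["I", "am", "sunny", "baby", "football"], "count": [2, 3, 4, 5, 9]}
)
female = pd.DataFrame(
    {"words": ["rose", "baby", "slim", "travel"], "count": [4, 6, 9, 11]}
)

# Stack the two lists on top of each other and label each row.
stacked = pd.concat(
    [male.assign(gender="male"), female.assign(gender="female")],
    ignore_index=True,
)

# Pivot: one row per word, one column per gender; missing pairs become NaN.
pivoted = stacked.pivot_table(
    index="words", columns="gender", values="count", aggfunc="sum"
)

print(pivoted)
```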

SUM(IF(ColA=ColA AND ColB=ColB,ColC,0)

This SUMIF calculation has stumped me within Excel (2013).
A B C D E
Created Source Conv Rev RPConv
Jan,1 2014 Apples 3 5.00 =Rev/Conv
Jan,1 2014 Oranges 2 4.00 =Rev/Conv
Jan,7 2014 Apples 3 5.00 =Rev/Conv
Feb,1 2014 Apples 5 5.00 =Rev/Conv
Feb,1 2014 Oranges 3 4.00 =Rev/Conv
CURRENT: =SUM(IF(MONTH($A:$A)=1 AND $B:$B='Apples',$D:$D,0)
What I expect to return is:
5.00+5.00
but unfortunately it rejects the statement altogether.
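Given the tag and assuming Month 1 is January 2014 (41640 is the serial number of 1 Jan 2014 and 41671 that of 1 Feb 2014):
=SUMIFS(D:D,A:A,">"&41639,A:A,"<"&41671,B:B,"Apples")
Alternatively, as an array formula (confirm with Ctrl+Shift+Enter). AND() collapses an array to a single TRUE/FALSE, so multiply the two conditions instead, and restrict the ranges to the data rows so that MONTH() is not applied to the header text (the ranges below assume the data sits in rows 2 to 6):
=SUM(IF((MONTH($A$2:$A$6)=1)*($B$2:$B$6="Apples"),$D$2:$D$6,0))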
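As a cross-check of the condition these formulas are meant to express (month is January and source is "Apples"), here is a short Python sketch over the question's five rows. It also shows where the serial numbers 41639 and 41671 in the SUMIFS version come from, assuming Excel's default 1900 date system, in which modern dates count days from 30 Dec 1899.

```python
from datetime import date

# Rows from the question: (Created, Source, Conv, Rev).
rows = [
    (date(2014, 1, 1), "Apples", 3, 5.00),
    (date(2014, 1, 1), "Oranges", 2, 4.00),
    (date(2014, 1, 7), "Apples", 3, 5.00),
    (date(2014, 2, 1), "Apples", 5, 5.00),
    (date(2014, 2, 1), "Oranges", 3, 4.00),
]

# Sum Rev where both conditions hold for the same row.
total = sum(
    rev
    for created, source, conv, rev in rows
    if created.month == 1 and source == "Apples"
)
print(total)  # 10.0, i.e. 5.00 + 5.00


def excel_serial(d):
    """Serial number of a date in Excel's 1900 date system (dates after Feb 1900)."""
    return (d - date(1899, 12, 30)).days


print(excel_serial(date(2014, 1, 1)))  # 41640, so ">41639" means on or after 1 Jan
print(excel_serial(date(2014, 2, 1)))  # 41671, so "<41671" means before 1 Feb
```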
