SUM(IF(ColA=ColA AND ColB=ColB,ColC,0) - excel

This SUMIF calculation has stumped me within Excel (2013).
A B C D E
Created Source Conv Rev RPConv
Jan,1 2014 Apples 3 5.00 =Rev/Conv
Jan,1 2014 Oranges 2 4.00 =Rev/Conv
Jan,7 2014 Apples 3 5.00 =Rev/Conv
Feb,1 2014 Apples 5 5.00 =Rev/Conv
Feb,1 2014 Oranges 3 4.00 =Rev/Conv
My current attempt is:
=SUM(IF(MONTH($A:$A)=1 AND $B:$B='Apples',$D:$D,0)
What I expect it to return is:
5.00 + 5.00
but unfortunately Excel rejects the formula altogether.

Given the excel tag, and assuming month 1 means January 2014 (serial number 41639 is 31 Dec 2013 and 41671 is 1 Feb 2014, so the two date criteria bracket January 2014):
=SUMIFS(D:D,A:A,">"&41639,A:A,"<"&41671,B:B,"Apples")

AND collapses an array of conditions into a single TRUE/FALSE, so it cannot test row by row inside an array formula. Multiply the conditions instead and confirm with Ctrl+Shift+Enter:
=SUM(IF((MONTH($A:$A)=1)*($B:$B="Apples"),$D:$D,0))
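For comparison, the same conditional sum can be checked in pandas (a minimal sketch; the frame below rebuilds the sample table from the question):

```python
import pandas as pd

# The sample table from the question, rebuilt as a DataFrame.
df = pd.DataFrame({
    'Created': pd.to_datetime(['2014-01-01', '2014-01-01', '2014-01-07',
                               '2014-02-01', '2014-02-01']),
    'Source': ['Apples', 'Oranges', 'Apples', 'Apples', 'Oranges'],
    'Conv': [3, 2, 3, 5, 3],
    'Rev': [5.00, 4.00, 5.00, 5.00, 4.00],
})

# Sum Rev for rows where the month is January and the source is Apples.
total = df.loc[(df['Created'].dt.month == 1) & (df['Source'] == 'Apples'), 'Rev'].sum()
# total == 10.0 (5.00 + 5.00)
```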

Related

Python percentage of 2 columns in new column based on condition

I asked this question earlier and got some feedback, but I am still stuck: I cannot calculate the percentage of two columns based on conditions. The two columns are 'Tested population' and 'Total population'; grouping by 'Year' and 'Gender', I want to show the result in a new column, 'Percentage'.
Year Race Gender Tested population Total population
2017 Asian Male 345 567
2017 Hispanic Female 666 67899
2018 Native Male 333 35543
2018 Asian Female 665 78955
2019 Hispanic Female 4444 44356
2020 Native Male 3642 6799
2017 Asian Male 5467 7998
2018 Asian Female 5467 7998
2019 Hispanic Male 456 4567
df = pd.DataFrame(alldata, columns=['Year', 'Gender', 'Tested population', 'Total population'])
df2 = df.groupby(['Year', 'Gender']).agg({'Tested population': 'sum'})
pop_pcts = df2.groupby(level=0).apply(lambda x: 100 * x / float(x.sum()))
print(pop_pcts)
Output:
Tested population
Year Gender
2017 Female 10.280951
Male 89.719049
2018 Female 94.849188
Male 5.150812
2019 Female 90.693878
Male 9.306122
2020 Male 100.000000
Whereas I want the data in the following format, shown alongside the other columns as a new column 'Percentage'.
Year Race Gender Tested population Total population Percentage
2017 Asian Male 345 567 60.8466
2017 Hispanic Female 666 67899 0.98087
2018 Native Male 333 35543 0.93689
2018 Asian Female 665 78955 0.84225
2019 Hispanic Female 4444 44356 10.0189
2020 Native Male 3642 6799 53.5667
2019 Hispanic Male 456 4567 9.98467
I have gone through Pandas percentage of total with groupby but was not able to fix my issue. Can someone help with this?
I believe you just need to add a column. Note that the column is named 'Total population' (lowercase p), and multiply by 100 to match your expected output:
df['Percentage'] = df['Tested population'] / df['Total population'] * 100
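A runnable sketch of that answer, using the sample rows from the question:

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    'Year': [2017, 2017, 2018, 2018, 2019, 2020],
    'Race': ['Asian', 'Hispanic', 'Native', 'Asian', 'Hispanic', 'Native'],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Female', 'Male'],
    'Tested population': [345, 666, 333, 665, 4444, 3642],
    'Total population': [567, 67899, 35543, 78955, 44356, 6799],
})

# Row-wise percentage of tested over total, as in the expected output.
df['Percentage'] = df['Tested population'] / df['Total population'] * 100
# First row: 345 / 567 * 100 ≈ 60.8466
```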

How to fill empty cell value in pandas with condition

My sample dataset is below. Actual data through 2020 is available.
Item Year Amount final_sales
A1 2016 123 400
A2 2016 23 40
A3 2016 6
A4 2016 10 100
A5 2016 5 200
A1 2017 123 400
A2 2017 23
A3 2017 6
A4 2017 10
A5 2017 5 200
I have to extrapolate the 2017 (and subsequent years) final_sales column data from 2016 for every Item if 2017 data is not available.
In the dataset above, final_sales is not available for 2017 for A2 and A4, but is available for 2016. How can I bring in the 2016 final_sales value when the corresponding year's value is not available?
Expected results as below. Thanks.
Item Year Amount final_sales
A1 2016 123 400
A2 2016 23 40
A3 2016 6
A4 2016 10 100
A5 2016 5 200
A1 2017 123 400
A2 2017 23 40
A3 2017 6
A4 2017 10 100
A5 2017 5 200
It looks like you want to fill forward where there is missing data.
You can do this with ffill (forward fill), which is available on pandas objects.
In your case, you only want to fill forward within each item, so first group by Item and then fill. Forward filling simply carries the last valid value forward in order (hence why we sort first).
df['final_sales'] = df.sort_values('Year').groupby('Item')['final_sales'].ffill()
Note that on your example data, A3 is missing for 2016 as well, so there is nothing to carry forward and it remains missing for 2017.
GroupBy.ffill works here; it only requires the Year column to be sorted, as in the question's sample data:
# if necessary, sort by both columns first
df = df.sort_values(['Year', 'Item'])
df['final_sales'] = df.groupby('Item')['final_sales'].ffill()
print(df)
Item Year Amount final_sales
0 A1 2016 123 400.0
1 A2 2016 23 40.0
2 A3 2016 6 NaN
3 A4 2016 10 100.0
4 A5 2016 5 200.0
5 A1 2017 123 400.0
6 A2 2017 23 40.0
7 A3 2017 6 NaN
8 A4 2017 10 100.0
9 A5 2017 5 200.0
Something like this?
def fill_final(x):
    if x['Year'] != 2016:
        prev = df[(df['Year'] == 2016) & (df['Item'] == x['Item'])]['final_sales']
        return prev.iloc[0] if not prev.empty else x['final_sales']
    return x['final_sales']
df['final_sales'] = df.apply(fill_final, axis=1)
I did not test this, but it should set you on the right path. Note the column is 'Year' (capitalised), and the lookup returns a Series, hence the .iloc[0].

Summing a years worth of data that spans two years pandas

I have a DataFrame that contains data similar to this:
Name Date A B C
John 19/04/2018 10 11 8
John 20/04/2018 9 7 9
John 21/04/2018 22 15 22
… … … … …
John 16/04/2019 8 8 9
John 17/04/2019 10 11 18
John 18/04/2019 8 9 11
Rich 19/04/2018 18 7 6
… … … … …
Rich 18/04/2019 19 11 17
The data can start on any day and contains at least 365 days of data, sometimes more. What I want to end up with is a DataFrame like this:
Name Date Sum
John April 356
John May 276
John June 209
Rich April 452
I need to sum up all of the months to get a year's worth of data (April to March), but I need to be able to handle taking part of April's total (in this example) from 2018 and part from 2019. I would also like to shift the days so they are consecutive and follow on in sequence, so that rather than:
John 16/04/2019 8 8 9 Tuesday
John 17/04/2019 10 11 18 Wednesday
John 18/04/2019 8 9 11 Thursday
John 19/04/2019 10 11 8 Thursday (was 19/04/2018)
John 20/04/2019 9 7 9 Friday (was 20/04/2018)
It becomes
John 16/04/2019 8 8 9 Tuesday
John 17/04/2019 10 11 18 Wednesday
John 18/04/2019 8 9 11 Thursday
John 19/04/2019 9 7 9 Friday (was 20/04/2018)
Prior to summing to get the final DataFrame. Is this possible?
Additional information requested in comments
Here is a link to the initial data set https://github.com/stottp/exampledata/blob/master/SOExample.csv and the required output would be:
Name Month Total
John March 11634
John April 11470
John May 11757
John June 10968
John July 11682
John August 11631
John September 11085
John October 11924
John November 11593
John December 11714
John January 11320
John February 10167
Rich March 11594
Rich April 12383
Rich May 12506
Rich June 11112
Rich July 11636
Rich August 11303
Rich September 10667
Rich October 10992
Rich November 11721
Rich December 11627
Rich January 11669
Rich February 10335
Let's see if I understood correctly. By "sum", I suppose you mean summing the values of columns ['A', 'B', 'C'] for each day and getting the total per month.
If that's right, the first thing to do is parse the 'Date' column and set it as the index so that the data frame is easier to work with:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.set_index('Date')
Next, re-sample the data frame from days to months while summing the values of 'A', 'B' and 'C':
monthly = df[['A', 'B', 'C']].resample('M').sum().sum(axis=1)
monthly.head()
Out[37]:
Date
2012-11-30 1956265
2012-12-31 2972076
2013-01-31 2972565
2013-02-28 2696121
2013-03-31 2970687
Freq: M, dtype: int64
The last part, squashing February 2018 and February 2019 together as if they were a single month, might come from something like:
df.loc['2019-02'].reset_index().merge(df.loc['2018-02'].reset_index(), how='outer', on=['Date', 'A', 'B', 'C'])
Test this last step and see if it works for you.
Cheers
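Another way to get the April-to-March totals, assuming exactly one year of data per name: group by month name, so the partial Aprils from 2018 and 2019 fall into the same bucket automatically. A minimal sketch with a few made-up rows:

```python
import pandas as pd

# Made-up April rows from both calendar years for one name.
df = pd.DataFrame({
    'Name': ['John'] * 4,
    'Date': pd.to_datetime(['19/04/2018', '20/04/2018',
                            '17/04/2019', '18/04/2019'], dayfirst=True),
    'A': [10, 9, 10, 8],
    'B': [11, 7, 11, 9],
    'C': [8, 9, 18, 11],
})

# Daily total across the three value columns.
df['Sum'] = df[['A', 'B', 'C']].sum(axis=1)

# Grouping by month *name* merges the partial Aprils of 2018 and 2019.
out = (df.groupby(['Name', df['Date'].dt.month_name().rename('Month')])['Sum']
         .sum()
         .reset_index(name='Total'))
```

This relies on the stated guarantee of at least (and here exactly) 365 days per name; with more than a year of data, the same month from three calendar years would be merged too.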

Pandas DF Lookup - if value for record not available take the latest available

I am new to pandas and have been trying to accomplish this task for a couple of days now without success. In the beginning I had 3 dataframes that I was supposed to turn into only one with all the info. I managed to merge two of them correctly, which is now df1; however, the third one involves a tricky piece of logic that I haven't figured out yet. The data structure is the following:
df1.head()
Out[12]:
Concat YearNb_x MonthNb_x WeekNb_x NatCoCode VariantCode \
1 BN2004384AAA112017 2017 1 1 AAA BN2004384
2 BN2004388AAA112017 2017 1 1 AAA BN2004388
4 BN2004510AAA112017 2017 1 1 AAA BN2004510
5 BN2004645AAA112017 2017 1 1 AAA BN2004645
6 BN2004780AAA112017 2017 1 1 AAA BN2004780
Suppliercode_x ModelName_x SumOfVolume Price
1 HUAWEI P9 (Eva) 745 399.991667
2 HUAWEI P9 lite (Venus) 1770 211.666667
4 SAMSUNG A3 (2016) 6210 205.000000
5 APPLE iPhone 6s Plus 2 724.166667
6 SAMSUNG Galaxy J5 (2016) 4571 190.000000
df2.head()
Out[13]:
YearNb MonthNb WeekNb NatCoCode VariantCode Suppliercode \
0 2016 1 1 BBB BN2001707 APPLE
1 2016 1 2 BBB BN2001707 APPLE
2 2016 1 3 BBB BN2001707 APPLE
3 2016 1 4 BBB BN2001707 APPLE
4 2016 1 1 BBB BN2002345 SAMSUNG
ModelName LocalPrice ProductCategoryCode
0 iPhone 4S 385.0 HS
1 iPhone 4S 385.0 HS
2 iPhone 4S 385.0 HS
3 iPhone 4S 385.0 HS
4 G. Note 2 (N7100) 395.0 HS
All info except for the prices should be the same. What I need to do is look up the prices in df2 (by month; WeekNb can be ignored) for the same combination of items (NatCoCode, VariantCode, Suppliercode, etc.), and IF the price for the respective month is not available, df1 should take the LATEST available one.
I was trying the following logic, which obviously doesn't work:
import pandas as pd

df1 = pd.read_excel('output2.xlsx')
df2 = pd.read_excel('localtest.xlsx')

def PriceAssignment(df1, df2):
    i = 1
    while i >= 5:
        for i in df1['VariantCode'], df2['BNCode']:
            if df1.loc[df1[i], df1['YearNb_x'], df1['WeekNb_x'], df1['NatCoCode'], df1['VariantCode']] == df2.loc[df2[i], df2['YearNb_x'], df2['WeekNb_x'], df2['NatCoCode'], df2['VariantCode']]:
                df1['LocalPrice'] == df2.loc['Price']
            elif df2['MonthNb'] == 12:
                df2['YearNb'] -= i
            else:
                df2['MonthNb'] -= i
            i += 1
    return df1
The output would be something like:
From:
2017 2 OBE BN2004780BBB622017 SAMSUNG Galaxy J5 (2016) 500
2017 2 OBE BN2005184BBB622017 APPLE iPhone 6s Plus 300
2017 1 OBE BN2005190BBB622017 APPLE iPhone 7 350
To:
771 BN2004780BBB622017 2017 2 6 BBB BN2004780 SAMSUNG Galaxy J5 (2016) 67 171.9008264
772 BN2005184BBB622017 2017 2 6 BBB BN2005184 APPLE iPhone 6s Plus 13 614.8760331
773 BN2005190BBB622017 2017 2 6 BBB BN2005190 APPLE iPhone 7 1261 690.9090909
Result:
771 BN2004780BBB622017 2017 2 6 BBB BN2004780 SAMSUNG Galaxy J5 (2016) 67 171.9008264 500
772 BN2005184BBB622017 2017 2 6 BBB BN2005184 APPLE iPhone 6s Plus 13 614.8760331 300
773 BN2005190BBB622017 2017 2 6 BBB BN2005190 APPLE iPhone 7 1261 690.9090909 350
In this example, record 777 doesn't have a local price for the same month (03). In that case I would like to assign the latest available value for this item, here the value from the month before, which would be added in the LocalPrice column.
I was trying to check whether a price was available for the same item within the last five months (a subjective window). The data (spreadsheets) can be found HERE.
Does anyone have an idea or know a proper way on how to perform such operation?
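One way to express "take the latest available price" in pandas is pd.merge_asof: build a month-start date on both frames, then take, per key combination, the most recent price at or before each row's month. A minimal sketch with the question's column names but invented data:

```python
import pandas as pd

# Hypothetical rows: df1 needs prices, df2 holds monthly local prices.
df1 = pd.DataFrame({'NatCoCode': ['BBB', 'BBB'],
                    'VariantCode': ['BN2004780', 'BN2005190'],
                    'YearNb': [2017, 2017], 'MonthNb': [2, 3]})
df2 = pd.DataFrame({'NatCoCode': ['BBB', 'BBB'],
                    'VariantCode': ['BN2004780', 'BN2005190'],
                    'YearNb': [2017, 2017], 'MonthNb': [2, 2],
                    'LocalPrice': [500.0, 350.0]})

# Build a sortable month-start date on both frames.
for d in (df1, df2):
    d['MonthDate'] = pd.to_datetime(d['YearNb'].astype(str) + '-'
                                    + d['MonthNb'].astype(str) + '-01')

# Per (NatCoCode, VariantCode), take the latest price at or before each month.
out = pd.merge_asof(
    df1.sort_values('MonthDate'),
    df2.sort_values('MonthDate')[['NatCoCode', 'VariantCode', 'MonthDate', 'LocalPrice']],
    on='MonthDate', by=['NatCoCode', 'VariantCode'], direction='backward')
```

Here BN2005190 has no March price, so merge_asof falls back to its February price of 350, which is exactly the "latest available" behaviour described. A five-month cutoff could be added with the tolerance parameter (e.g. tolerance=pd.Timedelta(days=153)).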

Lookup value in one dataframe and paste it into another dataframe

I have two dataframes in Python one big (car listings), one small (car base configuration prices). The small one looks like this:
Make Model MSRP
0 Acura ILX 27990
1 Acura MDX 43015
2 Acura MDX Sport Hybrid 51960
3 Acura NSX 156000
4 Acura RDX 35670
5 Acura RLX 54450
6 Acura TLX 31695
7 Alfa Romeo 4C 55900
8 Alfa Romeo Giulia 37995
… … … . …
391 Toyota Yaris 14895
392 Toyota Yaris iA 15950
393 Volkswagen Atlas 33500
394 Volkswagen Beetle 19795
395 Volkswagen CC 34475
396 Volkswagen GTI 24995
397 Volkswagen Golf 19575
398 Volkswagen Golf Alltrack 25850
399 Volkswagen Golf R 37895
400 Volkswagen Golf SportWagen 21580
401 Volkswagen Jetta 17680
402 Volkswagen Passat 22440
403 Volkswagen Tiguan 24890
404 Volkswagen Touareg 42705
405 Volkswagen e-Golf 28995
406 Volvo S60 33950
Now I want to paste the values from the MSRP column (far right) into the big dataframe (car listings) by matching on the Make and Model columns. The big dataframe looks like the following:
makeName modelName trimName carYear mileage
0 BMW X5 sDrive35i 2017 0
1 BMW X5 sDrive35i 2017 3
2 BMW X5 sDrive35i 2017 0
3 Audi A4 Premium Plus2017 0
4 Kia Optima LX 2016 10
5 Kia Optima SX Turbo 2017 15
6 Kia Optima EX 2016 425
7 Rolls-Royce Ghost Series II 2017 15
… … … … … …
In the end I would like to have the following:
makeName modelName trimName carYear mileage MSRP
0 BMW X5 sDrive35i 2017 0 value from the other table
1 BMW X5 sDrive35i 2017 3 value from the other table
2 BMW X5 sDrive35i 2017 0 value from the other table
3 Audi A4 Premium Plus2017 0 value from the other table
4 Kia Optima LX 2016 10 value from the other table
5 Kia Optima SX Turbo 2017 15 value from the other table
6 Kia Optima EX 2016 425 value from the other table
7 Rolls-Royce Ghost Series II 2017 15 value from the other table
… … … … … …
I read the documentation regarding pd.concat, merge and join but I am not making any progress.
Can you guys help?
Thanks!
You can use merge to join the two dataframes together. The lookup keys live in differently named columns on each side, so pass left_on/right_on (matching the left and right frames respectively), and use how='left' to keep every listing even when no base price matches:
car_listings.merge(car_base, left_on=['makeName', 'modelName'], right_on=['Make', 'Model'], how='left')
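A small runnable example of that pattern, with made-up rows; the redundant key columns from the lookup table can be dropped after the merge:

```python
import pandas as pd

# Small base-price lookup table (made-up subset).
car_base = pd.DataFrame({
    'Make': ['Volkswagen', 'Volvo'],
    'Model': ['Jetta', 'S60'],
    'MSRP': [17680, 33950],
})

# Listings; BMW X5 has no base price, so its MSRP stays NaN.
car_listings = pd.DataFrame({
    'makeName': ['Volvo', 'Volkswagen', 'BMW'],
    'modelName': ['S60', 'Jetta', 'X5'],
    'mileage': [10, 3, 0],
})

out = car_listings.merge(car_base,
                         left_on=['makeName', 'modelName'],
                         right_on=['Make', 'Model'],
                         how='left')
# Drop the duplicate key columns brought in from the lookup table.
out = out.drop(columns=['Make', 'Model'])
```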
