Find difference between two integer columns but by specific ID column [duplicate] - python-3.x

This question already has answers here:
Python: Sum values in DataFrame if other values match between DataFrames
(3 answers)
Closed 2 years ago.
I have the following two dataframes.
last_request_df:
name fruit_id sold
apple 123 1
melon 456 12
banana 12 23
current_request_df:
name fruit_id sold
apple 123 5
melon 456 19
banana 12 43
orange 55 3
mango 66 0
The output should be based on matching the fruit_id column from both last_request_df and current_request_df and figuring out the difference in the sold column:
difference_df:
name fruit_id sold
apple 123 4
melon 456 7
banana 12 20
orange 55 3
mango 66 0
I've tried the following, but I'm afraid it does not match on the fruit_id column.
difference_df['sold_diff'] = current_request_df['sold'] - last_request_df['sold']
Is there a preferred method to capture the difference_df based on the data I've provided?

# Set the index to name for both DataFrames
difference_df = current_request_df.set_index('name')
last_request_df = last_request_df.set_index('name')
# Find the difference using sub. To do this, align last_request_df to the same index by reindexing
difference_df['sold'] = difference_df['sold'].sub(last_request_df.reindex(index=difference_df.index).fillna(0)['sold'])
fruit_id sold
name
apple 123 4.0
melon 456 7.0
banana 12 20.0
orange 55 3.0
mango 66 0.0
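The answer above aligns the two frames on name, while the question asks for matching on fruit_id. A minimal sketch of the same idea keyed on fruit_id, using a left merge, might look like the following. The `_last` suffix and the `merged` variable are illustrative names, and the sketch assumes fruit_id is unique within last_request_df.

```python
import pandas as pd

last_request_df = pd.DataFrame({
    "name": ["apple", "melon", "banana"],
    "fruit_id": [123, 456, 12],
    "sold": [1, 12, 23],
})
current_request_df = pd.DataFrame({
    "name": ["apple", "melon", "banana", "orange", "mango"],
    "fruit_id": [123, 456, 12, 55, 66],
    "sold": [5, 19, 43, 3, 0],
})

# Left merge keeps every row of the current request; fruit_ids that are
# missing from the last request get NaN in "sold_last".
merged = current_request_df.merge(
    last_request_df[["fruit_id", "sold"]],
    on="fruit_id",
    how="left",
    suffixes=("", "_last"),
)

# Treat a missing previous value as 0, then subtract.
previous_sold = merged["sold_last"].fillna(0).astype(int)
difference_df = merged.assign(sold=merged["sold"] - previous_sold).drop(columns="sold_last")
print(difference_df)
```

Because the subtraction happens after the merge, the result stays an integer column and does not depend on the row order of either frame.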

Related

SQL one-to-many join with the "many" side as a filter condition

I am using SQLAlchemy.
Parent table
id name
1 sea bass
2 Tanaka
3 Mike
4 Louis
5 Jack
Child table
id user_id pname number
1 1 Apples 2
2 1 Banana 1
3 1 Grapes 3
4 2 Apples 2
5 2 Banana 2
6 2 Grapes 1
7 3 Strawberry 5
8 3 Banana 3
9 3 Grapes 1
I want to select the parents that have Apples and sort them by their number of Bananas. However, when I filter for "parents with Apples", the result is narrowed to the Apples rows and the Banana rows disappear. I have searched for a way to achieve this, but have not been able to find one.
Thank you in advance for your help.
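One possible reading of the question is "keep only parents that have an Apples row, and order them by the number on their Banana row". Under that assumption, a rough sketch is to join the child table twice, once for the filter and once for the sort key. The example below uses plain SQL through Python's standard sqlite3 module rather than SQLAlchemy, and the table names `parent` and `child` are assumed, since the question does not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (id INTEGER PRIMARY KEY, user_id INTEGER, pname TEXT, number INTEGER);
    INSERT INTO parent VALUES (1,'sea bass'),(2,'Tanaka'),(3,'Mike'),(4,'Louis'),(5,'Jack');
    INSERT INTO child VALUES
        (1,1,'Apples',2),(2,1,'Banana',1),(3,1,'Grapes',3),
        (4,2,'Apples',2),(5,2,'Banana',2),(6,2,'Grapes',1),
        (7,3,'Strawberry',5),(8,3,'Banana',3),(9,3,'Grapes',1);
""")

# Alias "a" restricts the result to parents that have Apples.
# Alias "b" brings in the Banana row separately, so the Apples
# filter does not remove it; LEFT JOIN keeps parents without Bananas.
query = """
    SELECT p.id, p.name, COALESCE(b.number, 0) AS bananas
    FROM parent AS p
    JOIN child AS a ON a.user_id = p.id AND a.pname = 'Apples'
    LEFT JOIN child AS b ON b.user_id = p.id AND b.pname = 'Banana'
    ORDER BY bananas DESC, p.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The same two-alias join can be expressed in SQLAlchemy by aliasing the child table twice; that translation is not shown here.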

Looking for specific record matches in another dataframe [duplicate]

This question already has answers here:
Pandas: how to merge two dataframes on a column by keeping the information of the first one?
(4 answers)
Closed 1 year ago.
I have df1 as follows:
Name | ID
________|_____
Banana | 10
Orange | 21
Peach | 115
Then I have a df2 like this:
ID Price
10 2.34
10 2.34
115 6.00
I want to modify df2 by adding another column named Fruit, to get this as output:
ID Fruit Price
10 Banana 2.34
10 Banana 2.34
115 Peach 6.00
200 NA NA
I can use iloc to get one specific match, but how can I do it for all records in df2?
Have you tried the merge function?
pd.merge(df1, df2)
Output:
Name ID Price
0 Banana 10 2.34
1 Banana 10 2.34
2 Peach 115 6.00
EDIT:
If you want to add only a specific column from df2:
df = pd.merge(df1, df2[['ID', 'Price']], on='ID', how='left')
Output:
Name ID Price
0 Banana 10 2.34
1 Banana 10 2.34
2 Orange 21 NaN
3 Peach 115 6.00
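The merges above start from df1, whereas the question asks to keep df2's rows and add a Fruit column to them. A small sketch of that direction, assuming each ID appears only once in df1, is to build an ID-to-name lookup and map it onto df2. The names `id_to_name` and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Banana", "Orange", "Peach"], "ID": [10, 21, 115]})
df2 = pd.DataFrame({"ID": [10, 10, 115], "Price": [2.34, 2.34, 6.00]})

# Series indexed by ID whose values are the fruit names.
id_to_name = df1.set_index("ID")["Name"]

# map() looks up each ID of df2 in that Series; IDs not present in df1
# would become NaN. df2's rows and their order are preserved.
result = df2.assign(Fruit=df2["ID"].map(id_to_name))[["ID", "Fruit", "Price"]]
print(result)
```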

Filter rows based on the count of unique values

I need to count how often each value occurs in column A and keep only the rows whose value occurs at least, say, 2 times.
A C
Apple 4
Orange 5
Apple 3
Mango 5
Orange 1
I have calculated the counts, but I cannot figure out how to filter on them using df.value_count()
I want to keep the rows whose value in column A occurs at least 2 times. Expected DataFrame:
A C
Apple 4
Orange 5
Apple 3
Orange 1
value_counts should be called on a Series (single column) rather than a DataFrame:
counts = df['A'].value_counts()
Giving:
A
Apple 2
Mango 1
Orange 2
dtype: int64
You can then filter this to only keep those >= 2 and use isin to filter your DataFrame:
filtered = counts[counts >= 2]
df[df['A'].isin(filtered.index)]
Giving:
A C
0 Apple 4
1 Orange 5
2 Apple 3
4 Orange 1
Use duplicated with parameter keep=False:
df[df.duplicated(['A'], keep=False)]
Output:
A C
0 Apple 4
1 Orange 5
2 Apple 3
4 Orange 1
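For thresholds other than 2, where duplicated no longer applies, one alternative sketch is to broadcast each group's size back onto the rows with groupby and transform, then filter with a boolean mask. The variable names and the `min_count` threshold are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Apple", "Orange", "Apple", "Mango", "Orange"],
    "C": [4, 5, 3, 5, 1],
})

min_count = 2  # illustrative threshold: keep values occurring at least this often

# transform("size") returns, for every row, the number of rows sharing its value of A.
group_sizes = df.groupby("A")["A"].transform("size")
filtered_df = df[group_sizes >= min_count]
print(filtered_df)
```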

Calculating the mean with a condition using pandas groupby on two columns, and printing only the mean for each category

Input
Fruit Count Price tag
Apple 55 35 red
Orange 60 40 orange
Apple 60 36 red
Apple 70 41 red
Output 1
Fruit Mean tag
Apple 35.5 red
Orange 40 orange
I need the mean of Price, counting only prices between 31 and 40.
Output 2
Fruit Count tag
Apple 2 red
Orange 1 orange
I need the count of rows whose price is between 31 and 40.
Please help.
Use between with boolean indexing for filtering:
df1 = df[df['Price'].between(31, 40)]
print (df1)
Fruit Count Price tag
0 Apple 55 35 red
1 Orange 60 40 orange
2 Apple 60 36 red
If you want both aggregations as columns of one DataFrame:
df2 = df1.groupby(['Fruit', 'tag'])['Price'].agg(['mean','size']).reset_index()
print (df2)
Fruit tag mean size
0 Apple red 35.5 2
1 Orange orange 40.0 1
Or as 2 separate DataFrames:
df3 = df1.groupby(['Fruit', 'tag'], as_index=False)['Price'].mean()
print (df3)
Fruit tag Price
0 Apple red 35.5
1 Orange orange 40.0
df4 = df1.groupby(['Fruit', 'tag'])['Price'].size().reset_index()
print (df4)
Fruit tag Price
0 Apple red 2
1 Orange orange 1
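To get column labels closer to the ones requested in the question (Mean and Count), a sketch using named aggregation could look like this. The output names `Mean` and `Count` are taken from the desired output in the question; `in_range` and `summary` are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Orange", "Apple", "Apple"],
    "Count": [55, 60, 60, 70],
    "Price": [35, 40, 36, 41],
    "tag": ["red", "orange", "red", "red"],
})

# between() is inclusive on both ends by default, so 40 is kept and 41 is dropped.
in_range = df[df["Price"].between(31, 40)]

# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation function).
summary = in_range.groupby(["Fruit", "tag"], as_index=False).agg(
    Mean=("Price", "mean"),
    Count=("Price", "count"),
)
print(summary)
```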

Add rows according to other rows

My DataFrame object similar to this one:
Product StoreFrom StoreTo Date
1 out melon StoreQ StoreP 20170602
2 out cherry StoreW StoreO 20170614
3 out Apple StoreE StoreU 20170802
4 in Apple StoreE StoreU 20170812
I want to avoid duplicates: the 3rd and 4th rows describe the same action. I am trying to get:
Product StoreFrom StoreTo Date Days
1 out melon StoreQ StoreP 20170602
2 out cherry StoreW StoreO 20170614
5 in Apple StoreE StoreU 20170812 10
I have more than 10k entries and could not find similar work. Any help would be very useful.
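# Assumed grouping keys; cols was not defined in the original answer
cols = ['Product', 'StoreFrom', 'StoreTo']
d1 = df.assign(Date=pd.to_datetime(df.Date.astype(str)))
# Days elapsed since the first row of each group
d2 = d1.assign(Days=d1.Date - d1.groupby(cols).Date.transform('first'))
d2.drop_duplicates(cols, keep='last')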
io Product StoreFrom StoreTo Date Days
1 out melon StoreQ StoreP 2017-06-02 0 days
2 out cherry StoreW StoreO 2017-06-14 0 days
4 in Apple StoreE StoreU 2017-08-12 10 days
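For reference, here is a self-contained sketch of the same approach on the sample data. It assumes the grouping keys are Product, StoreFrom and StoreTo, that the unnamed first column is called `io` as in the output above, and that Days should be a whole number of days counted from the earliest date in each group.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "io": ["out", "out", "out", "in"],
        "Product": ["melon", "cherry", "Apple", "Apple"],
        "StoreFrom": ["StoreQ", "StoreW", "StoreE", "StoreE"],
        "StoreTo": ["StoreP", "StoreO", "StoreU", "StoreU"],
        "Date": [20170602, 20170614, 20170802, 20170812],
    },
    index=[1, 2, 3, 4],
)

cols = ["Product", "StoreFrom", "StoreTo"]  # assumed grouping keys

# Parse the integer dates (YYYYMMDD) into real timestamps.
d1 = df.assign(Date=pd.to_datetime(df["Date"].astype(str), format="%Y%m%d"))

# Days elapsed since the earliest date within each group.
d1["Days"] = (d1["Date"] - d1.groupby(cols)["Date"].transform("min")).dt.days

# Keep only the last row of each group.
result = d1.drop_duplicates(subset=cols, keep="last")
print(result)
```

Since everything is done with vectorised groupby operations, the approach should scale to the 10k rows mentioned in the question.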
