This question already has answers here:
Python: Sum values in DataFrame if other values match between DataFrames
(3 answers)
Closed 2 years ago.
I have the following two dataframes.
last_request_df:
name    fruit_id  sold
apple   123       1
melon   456       12
banana  12        23
current_request_df:
name    fruit_id  sold
apple   123       5
melon   456       19
banana  12        43
orange  55        3
mango   66        0
The output should be based on matching the fruit_id column from both last_request_df and current_request_df and figuring out the difference in the sold column:
difference_df:
name    fruit_id  sold
apple   123       4
melon   456       7
banana  12        20
orange  55        3
mango   66        0
I've tried the following, but I'm afraid this does not match rows by the fruit_id column:
difference_df['sold_diff'] = current_request_df['sold'] - last_request_df['sold']
Is there a preferred method to capture the difference_df based on the data I've provided?
# Set the index to name for both dfs so rows align by fruit
difference_df = current_request_df.set_index('name')
last_request_df = last_request_df.set_index('name')
# Find the difference using sub. Reindex last_request_df to the same index and
# fill missing fruits (orange, mango) with 0 so new fruits keep their count
difference_df['sold'] = difference_df['sold'].sub(
    last_request_df.reindex(index=difference_df.index)['sold'].fillna(0))
        fruit_id  sold
name
apple   123        4.0
melon   456        7.0
banana  12        20.0
orange  55         3.0
mango   66         0.0
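An alternative sketch that matches rows explicitly on fruit_id via merge rather than index alignment (data recreated from the tables above; a left merge keeps every fruit in the current request, and fruits absent from the last request subtract 0):

```python
import pandas as pd

last_request_df = pd.DataFrame({
    'name': ['apple', 'melon', 'banana'],
    'fruit_id': [123, 456, 12],
    'sold': [1, 12, 23]})
current_request_df = pd.DataFrame({
    'name': ['apple', 'melon', 'banana', 'orange', 'mango'],
    'fruit_id': [123, 456, 12, 55, 66],
    'sold': [5, 19, 43, 3, 0]})

# left-merge on fruit_id so every fruit in the current request is kept;
# fruits absent from the last request get a previous count of 0
merged = current_request_df.merge(
    last_request_df[['fruit_id', 'sold']],
    on='fruit_id', how='left', suffixes=('', '_last'))
difference_df = merged.assign(
    sold=(merged['sold'] - merged['sold_last'].fillna(0)).astype(int)
).drop(columns='sold_last')
print(difference_df)
```

Casting back with astype(int) avoids the float column that fillna otherwise introduces.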
I am using SQLAlchemy with the following tables.
Parent table

id  name
1   sea bass
2   Tanaka
3   Mike
4   Louis
5   Jack
Child table

id  user_id  pname       number
1   1        Apples      2
2   1        Banana      1
3   1        Grapes      3
4   2        Apples      2
5   2        Banana      2
6   2        Grapes      1
7   3        Strawberry  5
8   3        Banana      3
9   3        Grapes      1
I want to list, for each parent id that has Apples, the number of Bananas as well; but when I filter for "parents with Apples", the Banana rows disappear. I have searched for a way to achieve this but have not been able to find one.
Thank you in advance for your help.
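One way to phrase this is to join the child table twice: an inner join to keep only parents that have Apples, and a LEFT JOIN to pick up their Banana counts without filtering them away. A minimal sketch using the stdlib sqlite3 module with hypothetical table names parent and child; the same statement could be run through SQLAlchemy's text(), or rebuilt with two aliased joins on the child model:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE child (id INTEGER PRIMARY KEY, user_id INTEGER,
                    pname TEXT, number INTEGER);
INSERT INTO parent VALUES (1,'sea bass'),(2,'Tanaka'),(3,'Mike'),
                          (4,'Louis'),(5,'Jack');
INSERT INTO child VALUES (1,1,'Apples',2),(2,1,'Banana',1),(3,1,'Grapes',3),
                         (4,2,'Apples',2),(5,2,'Banana',2),(6,2,'Grapes',1),
                         (7,3,'Strawberry',5),(8,3,'Banana',3),(9,3,'Grapes',1);
""")

# parents that have Apples, together with their Banana count (NULL if none);
# the inner join filters, the LEFT JOIN only decorates
rows = con.execute("""
    SELECT p.id, p.name, b.number AS bananas
    FROM parent AS p
    JOIN child AS a ON a.user_id = p.id AND a.pname = 'Apples'
    LEFT JOIN child AS b ON b.user_id = p.id AND b.pname = 'Banana'
    ORDER BY p.id
""").fetchall()
print(rows)
```

The key point is that the Apples condition lives in a join of its own, so it never removes the Banana rows.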
This question already has answers here:
Pandas: how to merge two dataframes on a column by keeping the information of the first one?
(4 answers)
Closed 1 year ago.
I have df1 as follow:
Name    | ID
--------|-----
Banana  | 10
Orange  | 21
Peach   | 115
Then I have a df2 like this:
ID   Price
10   2.34
10   2.34
115  6.00
I want to modify df2 by adding another column named Fruit, to get this output:
ID   Fruit   Price
10   Banana  2.34
10   Banana  2.34
115  Peach   6.00
200  NA      NA
I can use iloc to look up one specific match, but how do I do it for all records in df2?
Have you tried the merge function?
pd.merge(df1, df2)
Output:
     Name   ID  Price
0  Banana   10   2.34
1  Banana   10   2.34
2   Peach  115   6.00
EDIT:
If you want to merge in only specific columns from df2 (note the key column is named ID in the sample data):
df = pd.merge(df1, df2[['ID', 'Price']], on='ID', how='left')
Output:
     Name   ID  Price
0  Banana   10   2.34
1  Banana   10   2.34
2  Orange   21    NaN
3   Peach  115   6.00
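Since the asker wants the new column added to df2 itself (keeping df2's row order and length), a map-based sketch is another option, assuming the sample data above:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Banana', 'Orange', 'Peach'],
                    'ID': [10, 21, 115]})
df2 = pd.DataFrame({'ID': [10, 10, 115],
                    'Price': [2.34, 2.34, 6.00]})

# build an ID -> Name lookup Series from df1 and map it over df2's ID column;
# any ID missing from df1 would become NaN
df2['Fruit'] = df2['ID'].map(df1.set_index('ID')['Name'])
print(df2[['ID', 'Fruit', 'Price']])
```

Unlike merge, map never duplicates or drops rows of df2, which matches the "modify df2" phrasing of the question.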
I need to count the values of column A and keep only the rows whose value occurs at least, say, 2 times.
A       C
Apple   4
Orange  5
Apple   3
Mango   5
Orange  1
I have calculated the counts but am not able to figure out how to filter with them: df.value_count()
I want to filter column A for values that occur at least twice; expected DataFrame:
A       C
Apple   4
Orange  5
Apple   3
Orange  1
value_counts should be called on a Series (single column) rather than a DataFrame:
counts = df['A'].value_counts()
Giving:
A
Apple     2
Orange    2
Mango     1
Name: count, dtype: int64
You can then filter this to only keep those >= 2 and use isin to filter your DataFrame:
filtered = counts[counts >= 2]
df[df['A'].isin(filtered.index)]
Giving:
        A  C
0   Apple  4
1  Orange  5
2   Apple  3
4  Orange  1
Use duplicated with parameter keep=False:
df[df.duplicated(['A'], keep=False)]
Output:
        A  C
0   Apple  4
1  Orange  5
2   Apple  3
4  Orange  1
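Equivalently, groupby with transform('size') gives each row the count of its own A value directly, which skips the intermediate value_counts Series. A sketch on the sample data above:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Apple', 'Orange', 'Apple', 'Mango', 'Orange'],
                   'C': [4, 5, 3, 5, 1]})

# transform('size') broadcasts each group's size back to its rows,
# so the boolean mask lines up with df and keeps values occurring >= 2 times
out = df[df.groupby('A')['A'].transform('size') >= 2]
print(out)
```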
Input

Fruit   Count  Price  tag
Apple   55     35     red
Orange  60     40     orange
Apple   60     36     red
Apple   70     41     red
Output 1 (mean of Price, restricted to rows where Price is between 31 and 40):

Fruit   Mean  tag
Apple   35.5  red
Orange  40    orange
Output 2 (count of rows where Price is between 31 and 40):

Fruit   Count  tag
Apple   2      red
Orange  1      orange

Please help.
Use between with boolean indexing for filtering:
df1 = df[df['Price'].between(31, 40)]
print (df1)
    Fruit  Count  Price     tag
0   Apple     55     35     red
1  Orange     60     40  orange
2   Apple     60     36     red
If you want multiple aggregated columns in one DataFrame:
df2 = df1.groupby(['Fruit', 'tag'])['Price'].agg(['mean','size']).reset_index()
print (df2)
    Fruit     tag  mean  size
0   Apple     red  35.5     2
1  Orange  orange  40.0     1
Or as two separate DataFrames:
df3 = df1.groupby(['Fruit', 'tag'], as_index=False)['Price'].mean()
print (df3)
    Fruit     tag  Price
0   Apple     red   35.5
1  Orange  orange   40.0
df4 = df1.groupby(['Fruit', 'tag'])['Price'].size().reset_index()
print (df4)
    Fruit     tag  Price
0   Apple     red      2
1  Orange  orange      1
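With named aggregation (pandas >= 0.25) both outputs can also be produced in one pass, with the result columns named as in the question. A sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Orange', 'Apple', 'Apple'],
                   'Count': [55, 60, 60, 70],
                   'Price': [35, 40, 36, 41],
                   'tag': ['red', 'orange', 'red', 'red']})

# keep only rows with Price between 31 and 40 (inclusive), then aggregate,
# naming each output column explicitly
df1 = df[df['Price'].between(31, 40)]
out = (df1.groupby(['Fruit', 'tag'])
          .agg(Mean=('Price', 'mean'), Count=('Price', 'size'))
          .reset_index())
print(out)
```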
My DataFrame object is similar to this one (the first, unnamed column records whether the movement is in or out; it is shown as io in the output below):

    io Product StoreFrom StoreTo      Date
1  out   melon    StoreQ  StoreP  20170602
2  out  cherry    StoreW  StoreO  20170614
3  out   Apple    StoreE  StoreU  20170802
4   in   Apple    StoreE  StoreU  20170812
I want to avoid duplications: the 3rd and 4th rows describe the same action. I am trying to reach

    io Product StoreFrom StoreTo      Date  Days
1  out   melon    StoreQ  StoreP  20170602
2  out  cherry    StoreW  StoreO  20170614
5   in   Apple    StoreE  StoreU  20170812    10

and I have more than 10k entries. I could not find similar work on this; any help will be very useful.
# columns that identify one action
cols = ['Product', 'StoreFrom', 'StoreTo']

# parse the integer dates, compute days since each group's first row,
# then keep only the last row per action
d1 = df.assign(Date=pd.to_datetime(df.Date.astype(str)))
d2 = d1.assign(Days=d1.Date - d1.groupby(cols).Date.transform('first'))
d2.drop_duplicates(subset=cols, keep='last')
    io Product StoreFrom StoreTo       Date     Days
1  out   melon    StoreQ  StoreP 2017-06-02   0 days
2  out  cherry    StoreW  StoreO 2017-06-14   0 days
4   in   Apple    StoreE  StoreU 2017-08-12  10 days
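For reference, a self-contained version of the same idea (sample data reconstructed from the question, the unnamed in/out column named io as in the printed output, and Days stored as a plain integer via .dt.days rather than a timedelta):

```python
import pandas as pd

df = pd.DataFrame({
    'io': ['out', 'out', 'out', 'in'],
    'Product': ['melon', 'cherry', 'Apple', 'Apple'],
    'StoreFrom': ['StoreQ', 'StoreW', 'StoreE', 'StoreE'],
    'StoreTo': ['StoreP', 'StoreO', 'StoreU', 'StoreU'],
    'Date': [20170602, 20170614, 20170802, 20170812],
})

# columns that identify one action
cols = ['Product', 'StoreFrom', 'StoreTo']

# parse YYYYMMDD integers as dates, compute days since the group's first row,
# then keep only the last row of each action
df['Date'] = pd.to_datetime(df['Date'].astype(str))
df['Days'] = (df['Date'] - df.groupby(cols)['Date'].transform('first')).dt.days
result = df.drop_duplicates(subset=cols, keep='last')
print(result)
```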