SQL: one-to-many join with the "many" side as a narrowing condition - python-3.x

I am using SQLAlchemy.
Parent table
id name
1 sea bass
2 Tanaka
3 Mike
4 Louis
5 Jack
Child table
id user_id pname number
1 1 Apples 2
2 1 Banana 1
3 1 Grapes 3
4 2 Apples 2
5 2 Banana 2
6 2 Grapes 1
7 3 Strawberry 5
8 3 Banana 3
9 3 Grapes 1
I want to get, ordered by parent id, the Apples and Banana numbers for each parent that has Apples. But when I filter on "parents with Apples", the condition is applied to the joined rows, so the Banana rows disappear. I have searched for a way to achieve this, but have not been able to find it.
Thank you in advance for your help.
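One way to express this in SQLAlchemy is to filter the parents with a correlated EXISTS subquery instead of filtering the joined child rows. A minimal sketch, assuming ORM models named Parent and Child with the columns shown above (the model names are illustrative, not from the question):

from sqlalchemy import Column, ForeignKey, Integer, String, and_, exists, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('parent.id'))
    pname = Column(String)
    number = Column(Integer)

# Correlated EXISTS: "this parent has at least one Apples row".
has_apples = exists().where(
    and_(Child.user_id == Parent.id, Child.pname == 'Apples')
)

# Restrict the parents with EXISTS instead of filtering the joined rows,
# so the Banana rows of qualifying parents survive.
stmt = (
    select(Parent.id, Child.pname, Child.number)
    .join(Child, Child.user_id == Parent.id)
    .where(has_apples)
    .where(Child.pname.in_(['Apples', 'Banana']))
    .order_by(Parent.id)
)

Because the EXISTS condition is evaluated per parent rather than per joined row, parent 3 (no Apples) is excluded while the Banana rows of parents 1 and 2 are kept.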

Related

comma separated values in columns as rows in pandas

I have a dataframe in pandas as shown below, where the info value is the same for every row of a given id:
id text info
1 great boy,police
1 excellent boy,police
2 nice girl,mother,teacher
2 good girl,mother,teacher
2 bad girl,mother,teacher
3 awesome grandmother
4 superb grandson
I want to get the list elements one per row for each id, like:
id text info
1 great boy
1 excellent police
2 nice girl
2 good mother
2 bad teacher
3 awesome grandmother
4 superb grandson
Let us try: take the first row of each id, split its info, and explode it so there is one element per original row:
df['new'] = df.loc[~df.id.duplicated(), 'info'].str.split(',').explode().values
df
id text info new
0 1 great boy,police boy
1 1 excellent boy,police police
2 2 nice girl,mother,teacher girl
3 2 good girl,mother,teacher mother
4 2 bad girl,mother,teacher teacher
5 3 awesome grandmother grandmother
6 4 superb grandson grandson
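This works because there are exactly as many exploded elements as rows: each id repeats as many times as its info has comma-separated parts, and both sides are in the same order, so assigning with .values lines them up positionally.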
Take advantage of the fact that 'info' is duplicated.
df['info'] = df['info'].drop_duplicates().str.split(',').explode().to_numpy()
Output:
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson
One way using pandas.DataFrame.groupby.transform.
Note that this assumes:
each info value, after splitting on ',', has exactly as many elements as there are rows for that id
info values are identical within the same id.
df["info"] = df.groupby("id")["info"].transform(lambda x: x.str.split(",").iloc[0])
print(df)
Output:
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson
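The lambda returns one Python list per group; since its length equals the group size, transform broadcasts it element-wise across the group's rows rather than repeating it.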
Create a temp variable that numbers the rows within each info group:
temp = df.groupby('info').cumcount()
Then use a list comprehension to pick the element at that position from each split info:
df['info'] = [ent.split(',')[pos] for ent, pos in zip(df['info'], temp)]
df
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson
Or try apply:
df['info'] = (
    pd.DataFrame({'info': df['info'].str.split(','),
                  'n': df.groupby('id').cumcount()})
    .apply(lambda x: x['info'][x['n']], axis=1)
)
Output:
>>> df
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson

Find difference between two integer columns but by specific ID column [duplicate]

I have the following two dataframes.
last_request_df:
name fruit_id sold
apple 123 1
melon 456 12
banana 12 23
current_request_df:
name fruit_id sold
apple 123 5
melon 456 19
banana 12 43
orange 55 3
mango 66 0
The output should be based on matching the fruit_id column from both last_request_df and current_request_df and figuring out the difference in the sold column:
difference_df:
name fruit_id sold
apple 123 4
melon 456 7
banana 12 20
orange 55 3
mango 66 0
I've tried the following, but I'm afraid this does not match by the fruit_id column.
difference_df['sold_diff'] = current_request_df['sold'] - last_request_df['sold']
Is there a preferred method to capture the difference_df based on the data I've provided?
# Set the index to name for both dfs
difference_df = current_request_df.set_index('name')
last_request_df = last_request_df.set_index('name')
# Find the difference using sub; reindex last_request_df first so both dfs share the same index
difference_df['sold'] = difference_df['sold'].sub(
    last_request_df.reindex(index=difference_df.index).fillna(0)['sold']
)
fruit_id sold
name
apple 123 4.0
melon 456 7.0
banana 12 20.0
orange 55 3.0
mango 66 0.0
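An alternative sketch using a left merge on fruit_id, so the matching is explicit (the column names are from the question; the suffix handling is my choice):

merged = current_request_df.merge(
    last_request_df[['fruit_id', 'sold']],
    on='fruit_id', how='left', suffixes=('', '_last')
)
# Fruits absent from the last request (orange, mango) get 0 as their previous sold.
merged['sold'] = merged['sold'] - merged['sold_last'].fillna(0)
difference_df = merged.drop(columns='sold_last')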

Filter rows based on the count of unique values

I need to count the occurrences of each value in column A and keep only the rows whose value occurs at least a given number of times, say 2.
A C
Apple 4
Orange 5
Apple 3
Mango 5
Orange 1
I have calculated the counts with df.value_counts() but am not able to figure out how to filter with them.
I want to keep the rows whose value in column A appears at least twice; expected DataFrame:
A C
Apple 4
Orange 5
Apple 3
Orange 1
value_counts should be called on a Series (single column) rather than a DataFrame:
counts = df['A'].value_counts()
Giving:
A
Apple 2
Mango 1
Orange 2
dtype: int64
You can then filter this to only keep those >= 2 and use isin to filter your DataFrame:
filtered = counts[counts >= 2]
df[df['A'].isin(filtered.index)]
Giving:
A C
0 Apple 4
1 Orange 5
2 Apple 3
4 Orange 1
Use duplicated with parameter keep=False:
df[df.duplicated(['A'], keep=False)]
Output:
A C
0 Apple 4
1 Orange 5
2 Apple 3
4 Orange 1
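A third option, as a one-line sketch: keep the rows whose value in A belongs to a group of size at least 2, via groupby/transform:

df[df.groupby('A')['A'].transform('size') >= 2]

Since transform('size') broadcasts each group's size back onto its rows, this avoids the intermediate counts Series; unlike duplicated(keep=False), the threshold also generalizes beyond 2.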

VBA Excel - Sort a daily delivery manifest into groups based on complex rules

I am trying to come up with a way to sort my Excel table. I wrote a very basic VBA macro to do this when it was a random sort only. Now there are complex parameters/rules I have to meet, and it's way outside my skill set.
The problem: I receive a daily file with a list of items grouped into shipments, like the example below. The list can have as few as 1 and as many as 24 items. I have to sort these by restaurant.
Example Original List
ITEM SHIPMENT
Oranges 1
Apples 1
Grapes 1
Pears 2
Pork 3
Chicken 4
Rice 5
Peas 5
Beans 5
Water 5
Corn 5
Milk 5
Eggs 5
Salmon 6
Tofu 7
Juice 8
Cheese 8
Salt 8
Pepper 9
Onions 10
Oats 11
Barley 11
Kale 11
Chips 12
The items need to be distributed among 6 restaurants, and there are complex rules:
Overall Rules
No Restaurant can have more than 1 item from a shipment
No Restaurant can have more than 4 items
Sorting Rules
Restaurant 1 always gets the first two items (items 1-2)
Restaurants 2-5 evenly get the next 16 items (items 3-18, four each)
Restaurant 6 gets the next 2 items (items 19-20)
Restaurants 1 and 6 then evenly get the last 4 items (items 21-24, two each)
If there are more than 6 items in a shipment (more items than restaurants), the extra items stay in the warehouse.
The overall rules override the sorting rules. For example, in the example list Restaurant 1 cannot have both Oranges and Apples, since they are from the same shipment, so the sort changes.
Example Sort
Restaurant 1 Shipment
Oranges 1
Pears 2
Rice 5
Kale 11
Restaurant 2
Apples 1
Peas 5
Salmon 6
Salt 8
Restaurant 3
Grapes 1
Beans 5
Tofu 7
Pepper 9
Restaurant 4
Pork 3
Water 5
Juice 8
Onions 10
Restaurant 5
Chicken 4
Corn 5
Cheese 8
Oats 11
Restaurant 6
Milk 5
Barley 11
Chips 12
Warehouse Items
Eggs 5
Looking at it as a whole now, I'm not even sure this is possible, and I have no idea how to go about doing it. If anyone has any input, I'd love to hear it. Thank you so much for your help.
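A rough greedy sketch of one possible approach, in Python rather than the requested VBA (the fallback order when the target restaurant would break an overall rule is my assumption, the question does not specify it, and this is not guaranteed to reproduce the example assignment exactly): walk the items in order, try each item's target restaurant first, fall back to the others, and send unplaceable items to the warehouse.

from collections import defaultdict

# The example list as (item, shipment) pairs.
items = [
    ('Oranges', 1), ('Apples', 1), ('Grapes', 1), ('Pears', 2),
    ('Pork', 3), ('Chicken', 4), ('Rice', 5), ('Peas', 5),
    ('Beans', 5), ('Water', 5), ('Corn', 5), ('Milk', 5),
    ('Eggs', 5), ('Salmon', 6), ('Tofu', 7), ('Juice', 8),
    ('Cheese', 8), ('Salt', 8), ('Pepper', 9), ('Onions', 10),
    ('Oats', 11), ('Barley', 11), ('Kale', 11), ('Chips', 12),
]

# Target restaurant per position, from the sorting rules:
# R1 gets items 1-2, R2-5 round-robin items 3-18, R6 items 19-20,
# then R1/R6 alternate for items 21-24.
targets = [1, 1] + [2, 3, 4, 5] * 4 + [6, 6] + [1, 6, 1, 6]

assigned = defaultdict(list)        # restaurant -> [(item, shipment)]
used_shipments = defaultdict(set)   # restaurant -> shipments already taken
warehouse = []

for (name, shipment), target in zip(items, targets):
    # Try the target first, then the remaining restaurants in order (assumption).
    for r in [target] + [x for x in range(1, 7) if x != target]:
        if len(assigned[r]) < 4 and shipment not in used_shipments[r]:
            assigned[r].append((name, shipment))
            used_shipments[r].add(shipment)
            break
    else:
        warehouse.append((name, shipment))  # no restaurant can take it

for r in range(1, 7):
    print(f'Restaurant {r}:', assigned[r])
print('Warehouse:', warehouse)

The same loop structure translates to VBA with nested For loops over a 6x4 array; the interesting design decision is the fallback order, which determines which items end up in the warehouse.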

Add rows according to other rows

My DataFrame object is similar to this one:
io Product StoreFrom StoreTo Date
1 out melon StoreQ StoreP 20170602
2 out cherry StoreW StoreO 20170614
3 out Apple StoreE StoreU 20170802
4 in Apple StoreE StoreU 20170812
I want to avoid duplications; the 3rd and 4th rows show the same action. I am trying to reach:
io Product StoreFrom StoreTo Date Days
1 out melon StoreQ StoreP 20170602
2 out cherry StoreW StoreO 20170614
5 in Apple StoreE StoreU 20170812 10
and I have more than 10k entries. I could not find similar work on this. Any help will be very useful.
cols = ['Product', 'StoreFrom', 'StoreTo']  # columns identifying the same action (not defined in the original answer)
d1 = df.assign(Date=pd.to_datetime(df.Date.astype(str)))
d2 = d1.assign(Days=d1.groupby(cols).Date.apply(lambda x: x - x.iloc[0]))
d2.drop_duplicates(cols, keep='last')  # keep the last row of each group
io Product StoreFrom StoreTo Date Days
1 out melon StoreQ StoreP 2017-06-02 0 days
2 out cherry StoreW StoreO 2017-06-14 0 days
4 in Apple StoreE StoreU 2017-08-12 10 days
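Note that drop_duplicates returns a new DataFrame rather than modifying d2 in place, so assign the result if it is needed later. Days is a timedelta column; d2['Days'].dt.days gives the plain integer 10 from the desired output.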
