How do I check total amount of times a certain value occurred in a nested loop? - python-3.x

Question: Calculate the total number of apples bought on Monday and Wednesday only.
This is my code currently:
apple = 0
banana = 0
orange = 0
# number of fruits bought on Monday, Tuesday and Wednesday respectively
fruit = [ ['apple', 'cherry', 'apple', 'orange'], \
          ['orange', 'apple', 'apple', 'banana'], \
          ['banana', 'apple', 'cherry', 'orange'] ]
for x in fruit:
    if 'apple' in x:
        if fruit.index(x) == 0 or fruit.index(x) == 2:
            apple + 1
print(apple)
For some reason, the current result which I am getting for printing apple is 0.
What is wrong with my code?

The problem in your code is that apple + 1 only computes a new value; the result is never assigned back to a variable, so apple keeps its initial value and that is what gets printed:
apple = 0
apple + 1
you need to do:
apple += 1
Also note that index() always returns the position of the first occurrence of an element, so with fruit.index(x) two days with identical lists would both be reported at the first day's position. The same applies inside a day's list; for example:
fruit[1].index('apple')
returns the index of the first occurrence of 'apple' in fruit[1], which is 1.
However, fixing the increment is still not enough. The question asks for the number of apples bought on Monday and Wednesday only, while your loop adds 1 for each of those days that contains an apple, so it would print 2 instead of 3. Count the apples in those two lists directly. Below is the correct solution:
apple = 0
banana = 0
orange = 0
# number of fruits bought on Monday, Tuesday and Wednesday respectively
fruit = [ ['apple', 'cherry', 'apple', 'orange'],
          ['orange', 'apple', 'apple', 'banana'],
          ['banana', 'apple', 'cherry', 'orange'] ]
apple += fruit[0].count('apple')
apple += fruit[2].count('apple')
print(apple)
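If the days of interest may change, the same counting can be written with enumerate, so that the day positions come from the loop rather than from fruit.index(x). This is only a sketch; the names wanted_days and apple_total are made up for illustration.

```python
# Fruits bought on Monday, Tuesday and Wednesday respectively
fruit = [
    ['apple', 'cherry', 'apple', 'orange'],
    ['orange', 'apple', 'apple', 'banana'],
    ['banana', 'apple', 'cherry', 'orange'],
]

# Illustrative name: positions of Monday and Wednesday in the outer list
wanted_days = {0, 2}

# list.count returns how many times 'apple' occurs in one day's list
apple_total = sum(
    day.count('apple')
    for position, day in enumerate(fruit)
    if position in wanted_days
)
print(apple_total)  # 3
```

Changing wanted_days is then enough to count a different set of days.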

There are two issues with your code.
The first issue is:
if fruit.index(x) == 0 or fruit.index(x) == 2:
    apple + 1
apple + 1 does nothing useful: it computes a value and throws it away. To increment apple you need apple += 1. With that fix, apple ends up as 2.
The second issue is that you need the total number of apples, which is 3 and not 2: two apples were bought on Monday and one on Wednesday.
You can use collections.Counter for this:
from collections import Counter
for x in fruit:
    if 'apple' in x:
        if fruit.index(x) == 0 or fruit.index(x) == 2:
            apple += Counter(x)['apple']
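The fragment above assumes that apple and fruit are already defined, and it relies on fruit.index(x), which returns the first matching position and therefore misreports a day whose list is identical to an earlier one. A self-contained sketch that avoids this by using enumerate (the variable names are illustrative):

```python
from collections import Counter

fruit = [
    ['apple', 'cherry', 'apple', 'orange'],   # Monday
    ['orange', 'apple', 'apple', 'banana'],   # Tuesday
    ['banana', 'apple', 'cherry', 'orange'],  # Wednesday
]

apple = 0
for day_index, day in enumerate(fruit):
    # Only Monday (0) and Wednesday (2) are of interest
    if day_index in (0, 2):
        # Counter returns 0 for a missing key, so no 'in' check is needed
        apple += Counter(day)['apple']
print(apple)  # 3

# Why index() is fragile: identical lists always resolve to the first position
same_days = [['apple'], ['pear'], ['apple']]
print([same_days.index(day) for day in same_days])  # [0, 1, 0]
```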

it should be apple += 1, not apple + 1

Related

Given a column value, check if another column value is present in preceding or next 'n' rows in a Pandas data frame

I have the following data
jsonDict = {'Fruit': ['apple', 'orange', 'apple', 'banana', 'orange', 'apple','banana'], 'price': [1, 2, 1, 3, 2, 1, 3]}
    Fruit  price
0   apple      1
1  orange      2
2   apple      1
3  banana      3
4  orange      2
5   apple      1
6  banana      3
What I want to do is check if Fruit == banana and if yes, I want the code to scan the preceding as well as the next n rows from the index position of the 'banana' row, for an instance where Fruit == apple. An example of the expected output is shown below taking n=2.
   Fruit  price
2  apple      1
5  apple      1
I have tried doing
position = df[df['Fruit'] == 'banana'].index
resultdf= df.loc[((df.index).isin(position)) & (((df['Fruit'].index+2).isin(['apple']))|((df['Fruit'].index-2).isin(['apple'])))]
# Output is an empty dataframe
Empty DataFrame
Columns: [Fruit, price]
Index: []
Preference will be given to vectorized approaches.
IIUC, you can use 2 masks and boolean indexing:
# df = pd.DataFrame(jsonDict)
n = 2
m1 = df['Fruit'].eq('banana')
# is the row ±n of a banana?
m2 = m1.rolling(2*n+1, min_periods=1, center=True).max().eq(1)
# is the row an apple?
m3 = df['Fruit'].eq('apple')
out = df[m2&m3]
output:
   Fruit  price
2  apple      1
5  apple      1
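For comparison, the same ±n neighbourhood mask can be built from shifted copies of the banana mask instead of a rolling window. This is an alternative sketch, not the approach used above; is_banana and near_banana are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['apple', 'orange', 'apple', 'banana', 'orange', 'apple', 'banana'],
    'price': [1, 2, 1, 3, 2, 1, 3],
})
n = 2

is_banana = df['Fruit'].eq('banana')

# OR together the banana mask shifted by 1..n rows in both directions
near_banana = pd.Series(False, index=df.index)
for k in range(1, n + 1):
    near_banana |= is_banana.shift(k, fill_value=False)
    near_banana |= is_banana.shift(-k, fill_value=False)

# Keep apples that lie within n rows of some banana
out = df[near_banana & df['Fruit'].eq('apple')]
print(out)
```

Unlike the centred rolling window, this mask does not include the banana rows themselves, which makes no difference here because only apple rows are kept.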

Find users' most frequent recommendations based on input queries

I have an input query table as follows:
    query
0  orange
1   apple
2    meat
which I want to match against the user query table below:
   user       query
0    a1      orange
1    a1  strawberry
2    a1        pear
3    a2      orange
4    a2  strawberry
5    a2       lemon
6    a3      orange
7    a3      banana
8    a6        meat
9    a7        beer
10   a8       juice
Given a query in the input query table, I want to match it against the queries made by users in the user query table and return the top 3 other queries, ranked by total count.
For example, orange in the input query table matches users a1, a2 and a3 in the user query table, who have all queried orange. The other items they have queried are strawberry (count of 2) and pear, lemon, banana (count of 1 each).
The answer will be strawberry (since it has the highest count), pear, lemon (since we only return the top 3).
Similar reasoning applies to apple (no user queried it, so the output is 'nothing') and to the meat query.
So the final output table is
    query   recommend
0  orange  strawberry
1  orange        pear
2  orange       lemon
3   apple     nothing
4    meat     nothing
Here is the code
import pandas as pd
import numpy as np
# Create sample dataframes
df_input = pd.DataFrame( {'query': {0: 'orange', 1: 'apple', 2: 'meat'}} )
df_user = pd.DataFrame( {'user': {0: 'a1', 1: 'a1', 2: 'a1', 3: 'a2', 4: 'a2', 5: 'a2', 6: 'a3', 7: 'a3', 8: 'a6', 9: 'a7', 10: 'a8'}, 'query': {0: 'orange', 1: 'strawberry', 2: 'pear', 3: 'orange', 4: 'strawberry', 5: 'lemon', 6: 'orange', 7: 'banana', 8: 'meat', 9: 'beer', 10: 'juice'}} )
target_users = df_user[df_user['query'].isin(df_input['query'])]['user']
mask_users=df_user['user'].isin(target_users)
mask_queries=df_user['query'].isin(df_input['query'])
df1=df_user[mask_users & mask_queries]
df2=df_user[mask_users]
df=df1.merge(df2,on='user').rename(columns={"query_x":"query", "query_y":"recommend"})
df=df[df['query']!=df['recommend']]
df=df.groupby(['query','recommend'], as_index=False).count().rename(columns={"user":"count"})
df=df.sort_values(['query','recommend'],ascending=False, ignore_index=False)
df=df.groupby('query').head(3)
df=df.drop(columns=['count'])
df=df_input.merge(df,how='left',on='query').fillna('nothing')
df
Where df is the result. Is there any way to make the code more concise?
Unless there is a particular reason to favor pears over bananas (since they both count for one), I would suggest a more idiomatic way to do it:
import pandas as pd
df_input = pd.DataFrame(...)
df_user = pd.DataFrame(...)
df_input = (
    df_input
    .assign(
        recommend=df_input["query"].map(
            lambda x: df_user[
                (df_user["user"].isin(df_user.loc[df_user["query"] == x, "user"]))
                & (df_user["query"] != x)
            ]
            .value_counts(subset="query")
            .index[0:3]
            .to_list()
            if x in df_user["query"].unique()
            else "nothing"
        )
    )
    .explode("recommend")
    .fillna("nothing")
    .reset_index(drop=True)
)
print(df_input)
# Output
    query   recommend
0  orange  strawberry
1  orange      banana
2  orange       lemon
3   apple     nothing
4    meat     nothing
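To make the counting logic explicit, here is a plain-Python sketch of the same idea using collections.Counter rather than pandas. The function name recommend and the rows list are assumptions for illustration; ties between equally frequent items come back in whatever order Counter.most_common yields them.

```python
from collections import Counter

# (user, query) pairs from the user query table
rows = [
    ('a1', 'orange'), ('a1', 'strawberry'), ('a1', 'pear'),
    ('a2', 'orange'), ('a2', 'strawberry'), ('a2', 'lemon'),
    ('a3', 'orange'), ('a3', 'banana'),
    ('a6', 'meat'), ('a7', 'beer'), ('a8', 'juice'),
]

def recommend(query, rows, top=3):
    """Return up to `top` other queries made by users who also made `query`."""
    # Users who made the given query
    users = {user for user, q in rows if q == query}
    # Count everything else those users queried
    counts = Counter(q for user, q in rows if user in users and q != query)
    # Fall back to 'nothing' when there is no co-occurring query
    return [item for item, _ in counts.most_common(top)] or ['nothing']

for query in ['orange', 'apple', 'meat']:
    print(query, recommend(query, rows))
```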

How to find the total length of a column value that has multiple values in different rows for another column

Is there a way to find the IDs that have both Apple and Strawberry and then get their total number (length)? And likewise the IDs that have only Apple, and the IDs that have only Strawberry?
df:
    ID       Fruit
0  ABC       Apple  <-ABC has Apple and Strawberry
1  ABC  Strawberry  <-ABC has Apple and Strawberry
2  EFG       Apple  <-EFG has Apple only
3  XYZ       Apple  <-XYZ has Apple and Strawberry
4  XYZ  Strawberry  <-XYZ has Apple and Strawberry
5  CDF  Strawberry  <-CDF has Strawberry
6  AAA       Apple  <-AAA has Apple only
Desired output:
Length of IDs that has Apple and Strawberry: 2
Length of IDs that has Apple only: 2
Length of IDs that has Strawberry: 1
Thanks!
If all values in the Fruit column are always either Apple or Strawberry, you can compare sets per group and then count the IDs by summing the True values:
v = ['Apple','Strawberry']
out = df.groupby('ID')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print (out)
2
EDIT: If there are many values:
s = df.groupby('ID')['Fruit'].agg(frozenset).value_counts()
print (s)
{Apple} 2
{Strawberry, Apple} 2
{Strawberry} 1
Name: Fruit, dtype: int64
You can use pivot_table and value_counts for DataFrames (available since pandas 1.1.0):
df.pivot_table(index='ID', columns='Fruit', aggfunc='size', fill_value=0)\
.value_counts()
Output:
Apple  Strawberry
1      1             2
       0             2
0      1             1
Alternatively you can use:
df.groupby(['ID', 'Fruit']).size().unstack('Fruit', fill_value=0)\
.value_counts()
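If the three numbers are wanted as separate values rather than as one value_counts table, one possible sketch is to build a boolean presence table with pd.crosstab and combine its columns. The variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ABC', 'ABC', 'EFG', 'XYZ', 'XYZ', 'CDF', 'AAA'],
    'Fruit': ['Apple', 'Strawberry', 'Apple', 'Apple',
              'Strawberry', 'Strawberry', 'Apple'],
})

# One row per ID, one boolean column per fruit: True if the ID has that fruit
has = pd.crosstab(df['ID'], df['Fruit']) > 0

both = int((has['Apple'] & has['Strawberry']).sum())
apple_only = int((has['Apple'] & ~has['Strawberry']).sum())
strawberry_only = int((~has['Apple'] & has['Strawberry']).sum())

print(both, apple_only, strawberry_only)  # 2 2 1
```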

Compare values in two different pandas columns

I have a dataframe that looks like this:
Fruit   Cost  Quantity  Fruit_Copy
Apple   0.5   6         Watermelon
Orange  0.3   2         Orange
Apple   0.5   8         Apple
Apple   0.5   7         Apple
Banana  0.25  8         Banana
Banana  0.25  7         Banana
Apple   0.5   6         Apple
Apple   0.5   3         Apple
I want to write a snippet that, in pandas, compares Fruit and Fruit_Copy and outputs a new column "Match" that indicates if the values in Fruit = Fruit_Copy.
Thanks in advance!
Let's say your dataframe is 'fruits'. Then you can use the pandas Series equality method, pd.Series.eq:
fruits['Match'] = pd.Series.eq(fruits['Fruit'],fruits['Fruit_Copy'])
Something like this would work.
df.loc[df['Fruit'] == df['Fruit_Copy'], 'Match'] = 'Yes'
Using numpy.where:
df['Match'] = np.where(df['Fruit'] == df['Fruit_Copy'], 'Yes', 'No')
You could try something like this:
import pandas as pd
import numpy as np
fruits = pd.DataFrame({'Fruit':['Apple', 'Orange', 'Apple', 'Apple', 'Banana', 'Banana', 'Apple', 'Apple'], 'Cost':[0.5,0.3,0.5,0.5,0.25,0.25,0.5,0.5], 'Quantity':[6,2,8,7,8,7,6,3], 'Fruit_Copy':['Watermelon', 'Orange', 'Apple', 'Apple', 'Banana', 'Banana', 'Apple', 'Apple']})
fruits['Match'] = np.where(fruits['Fruit'] == fruits['Fruit_Copy'], 1, 0)
fruits
    Fruit  Cost  Quantity  Fruit_Copy  Match
0   Apple  0.50         6  Watermelon      0
1  Orange  0.30         2      Orange      1
2   Apple  0.50         8       Apple      1
3   Apple  0.50         7       Apple      1
4  Banana  0.25         8      Banana      1
5  Banana  0.25         7      Banana      1
6   Apple  0.50         6       Apple      1
7   Apple  0.50         3       Apple      1
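Note that the .loc assignment shown earlier fills only the matching rows, so non-matching rows are left as NaN, whereas a direct comparison or np.where fills every row. A small sketch on a three-row sample (the shortened data and the extra column names are for illustration only):

```python
import numpy as np
import pandas as pd

fruits = pd.DataFrame({
    'Fruit': ['Apple', 'Orange', 'Banana'],
    'Fruit_Copy': ['Watermelon', 'Orange', 'Banana'],
})

# Element-wise comparison gives a boolean column directly
fruits['Match'] = fruits['Fruit'] == fruits['Fruit_Copy']

# np.where turns the boolean mask into a label for every row
fruits['Match_label'] = np.where(fruits['Match'], 'Yes', 'No')

# .loc assigns only where the mask is True; the other rows stay missing
fruits.loc[fruits['Match'], 'Match_loc'] = 'Yes'

print(fruits)
```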

compare columns and replace result in existing column

I have two pandas columns, where I first compare the two columns and then replace an old string with a new one.
My data:
shopping  on_List
Banana    1
Apple     0
Grapes    1
None      0
Banana    1
Nuts      0
Lemon     1
In order to compare the two I have done the following:
results = []
for shopping, on_list in zip(df.shopping, df.on_list):
    if shopping != 'None' and on_list == 1:
        items = shopping
        if items == 'Banana':
            re = items.replace('Banana', 'Bananas')
        elif items == 'Lemon':
            re = items.replace('Lemon', 'Lemons')
        elif items == 'Apples':
            re = items.replace('Apple','Apples')
        results.append(re)
print(results)
Output: ['Bananas','Lemons', 'Apples']
Ideally I would like to return a new column in which the old values in the 'shopping' column are replaced by the new ones.
This is my desired output, but unfortunately my new list (results) is not the same length as the current df:
shopping
Bananas
Apples
Grapes
None
Bananas
Nuts
Lemons
I suggest creating a dictionary for the mapping and replacing only the filtered values:
d = {'Banana':'Bananas', 'Lemon':'Lemons', 'Apple':'Apples'}
mask = df['on_List'].eq(1) & df['on_List'].notnull()
df['shopping'] = df['shopping'].mask(mask, df['shopping'].map(d)).fillna(df['shopping'])
# slower solution
#df['shopping'] = df['shopping'].mask(mask, df['shopping'].replace(d))
print (df)
  shopping  on_List
0  Bananas        1
1    Apple        0
2   Grapes        1
3     None        0
4  Bananas        1
5     Nuts        0
6   Lemons        1
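A loop-based alternative that keeps one result per row, so the new column has the same length as df:
val = []
for i in range(len(df)):
    item = df["shopping"][i]
    if item is not None and df["on_List"][i] == 1:
        if item == "Banana":
            val.append("Bananas")
        elif item == "Lemon":
            val.append("Lemons")
        elif item == "Apple":
            val.append("Apples")
        else:
            # no plural form defined (e.g. Grapes): keep the item unchanged
            val.append(item)
    else:
        # not on the list: keep the original value
        val.append(item)
df["Result"] = pd.Series(val, index=df.index)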
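Another way to keep the result the same length as the DataFrame is to substitute values only in the rows selected by a mask. This is a sketch assuming the missing entries are the string 'None', as in the question's loop; plural, on_list and mapped are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'shopping': ['Banana', 'Apple', 'Grapes', 'None', 'Banana', 'Nuts', 'Lemon'],
    'on_List': [1, 0, 1, 0, 1, 0, 1],
})

plural = {'Banana': 'Bananas', 'Lemon': 'Lemons', 'Apple': 'Apples'}
on_list = df['on_List'].eq(1)

# Map through the dictionary; items without a plural form become NaN,
# so fall back to the original value for those
mapped = df['shopping'].map(plural).fillna(df['shopping'])

# where() keeps the original value when the condition is True (not on the
# list) and takes the mapped value otherwise
df['shopping'] = df['shopping'].where(~on_list, mapped)

print(df['shopping'].tolist())
```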
