Here I am attempting to query a column in dataframe df, which has boolean values 'Yes' or 'No', in order to perform some function of random letter assignment according to a probability distribution in rows where the condition is met.
if (df['some_bool'] == 'Yes'):
    df['score'] = np.random.choice(['A', 'B', 'C'], len(df), p=[0.3, 0.2, 0.5])
What is a correct way of writing this as I receive the following error for the above:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thanks!
Try this instead:
df['score'] = np.where(df['some_bool'] == 'Yes',
                       np.random.choice(['A', 'B', 'C'], len(df), p=[0.3, 0.2, 0.5]),
                       '')
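A minimal runnable sketch of this fix, using a small sample frame (the 'some_bool' column values are assumed from the question):

```python
import numpy as np
import pandas as pd

# Sample frame with the 'Yes'/'No' flag column described in the question
df = pd.DataFrame({'some_bool': ['Yes', 'No', 'Yes', 'No']})

# np.where evaluates the condition elementwise, so there is no
# ambiguous truth value; rows failing the condition get ''
df['score'] = np.where(df['some_bool'] == 'Yes',
                       np.random.choice(['A', 'B', 'C'], len(df), p=[0.3, 0.2, 0.5]),
                       '')

print(df)
```

Note that the random letters are drawn for every row, but only kept where the condition holds.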
I have a pandas df:
df = pd.DataFrame({'61 - 90': [np.NaN, 14, np.NaN, 9, 34, np.NaN],
                   '91 and over': [np.NaN, 10, np.NaN, 1, np.NaN, 9]})
I am trying to apply a lambda function that returns False if BOTH columns for a record are NaN. My attempt at solving this:
df['not_na'] = df[['61 - 90', '91 and over']].apply(lambda x: False if pd.isna(x) else True)
The error message I receive:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 61 - 90')
Why don't you do:
df['not_na'] = df[['61 - 90', '91 and over']].notnull().any(axis=1)
To do this with a lambda function elementwise over the data frame, we need to use applymap:
df[['61 - 90', '91 and over']].applymap(lambda x: False if pd.isna(x) else True)
The documentation for the applymap function is available at the link below:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.applymap.html
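Putting the notnull().any(axis=1) approach to work on the sample frame above, as a minimal runnable sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'61 - 90': [np.nan, 14, np.nan, 9, 34, np.nan],
                   '91 and over': [np.nan, 10, np.nan, 1, np.nan, 9]})

# True if at least one of the two columns is non-null, i.e. False only
# when BOTH are NaN, which is what the question asks for
df['not_na'] = df[['61 - 90', '91 and over']].notnull().any(axis=1)

print(df['not_na'].tolist())  # [False, True, False, True, True, True]
```

This avoids the ambiguous truth value entirely, because any(axis=1) reduces each row to a single boolean.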
I have the following data frame and list values
import pandas as pd
import numpy as np
df_merge = pd.DataFrame({'column1': ['a', 'c', 'e'],
                         'column2': ['b', 'd', 'f'],
                         'column3': [0.5, 0.6, 0.04],
                         'column4': [0.7, 0.8, 0.9]})
bb = ['b','h']
dd = ['d', 'I']
ff = ['f', 'l']
I am trying to use np.where and np.select instead of an IF function:
condition = [((df_merge['column1'] == 'a') & (df_merge['column2'] == df_merge['column2'].isin(bb))),
             ((df_merge['column1'] == 'c') & (df_merge['column2'] == df_merge['column2'].isin(dd))),
             ((df_merge['column1'] == 'e') & (df_merge['column2'] == df_merge['column2'].isin(ff)))]
choices1 = [((np.where(df_merge['column3'] >= 1, 'should not have, ', 'correct')) & (np.where(df_merge['column4'] >= 0.45, 'should not have, ', 'correct')))]
df_merge['Reason'] = np.select(condition, choices1, default='correct')
However, when I try to run the choices1 line, I get the following error:
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I am not sure whether we can use np.where inside choices as attempted above.
np.where should be applied to both columns. Expected output as below:
df_merge = pd.DataFrame({'column1': ['a', 'c', 'e'],
                         'column2': ['b', 'd', 'f'],
                         'column3': [0.5, 0.6, 0.04],
                         'column4': [0.7, 0.8, 0.9],
                         'Reason': ['correct, should not have', 'correct, should not have', 'correct, should not have']})
Any help / guidance / alternative is much appreciated.
First, the condition list has to be the same length as choices1, so the last condition is commented out (removed) to get length 2.
Then, isin already returns a boolean mask, so comparing its output against the column makes no sense.
The last problem was the need for a choices list of length 2, so & was replaced by , and the extra parentheses in choices1 were removed to avoid creating tuples:
condition = [(df_merge['column1'] == 'a') & df_merge['column2'].isin(bb),
(df_merge['column1'] == 'c') & df_merge['column2'].isin(dd)
# (df_merge['column1'] == 'e') & df_merge['column2'].isin(ff),
]
choices1 = [np.where(df_merge['column3'] >= 1, 'should not have','correct'),
np.where(df_merge['column4'] >= 0.45, 'should not have','correct')]
df_merge['Reason'] = np.select(condition, choices1, default='correct')
print (df_merge)
column1 column2 column3 column4 Reason
0 a b 0.50 0.7 correct
1 c d 0.60 0.8 should not have
2 e f 0.04 0.9 correct
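The length requirement mentioned in the answer can be checked directly; np.select raises a ValueError when the condition and choice lists differ in length. A minimal sketch with made-up boolean masks:

```python
import numpy as np

cond = [np.array([True, False]), np.array([False, True])]
choices = [np.array(['x', 'x'])]  # deliberately one entry short

# mismatched lengths fail before any selection happens
try:
    np.select(cond, choices, default='z')
except ValueError as exc:
    print('ValueError:', exc)

# with matching lengths, np.select picks elementwise from the
# first condition that is True, else the default
out = np.select(cond, [np.array(['x', 'x']), np.array(['y', 'y'])], default='z')
print(out)  # ['x' 'y']
```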
I have a dataframe below:
df = pd.DataFrame({'Product': ['A', 'A', 'C', 'D'], 'Volume': ['-3', '3', '1', '5']})
I am using groupby and sum.
final = df.groupby(['Product'])['Volume'].sum().reset_index()
print(final)
This is ok.
But I want to print only those rows where the sum != 0, like Products C and D.
Any idea how I can do that?
I tried to use:
if final != 0:
    print(final)
But this throws an error, and usually when I get this error, my syntax is definitely wrong...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Your data frame has Volume as strings; is that intended? If you want to sum them as numbers, you have to convert them first, and then you can apply the filter.
df = pd.DataFrame({'Product': ['A', 'A', 'C', 'D'], 'Volume': ['-3', '3', '1', '5']})
# convert from string to integers
df.Volume = df.Volume.map(lambda x: int(x))
final = df.groupby(['Product'])['Volume'].sum().reset_index()
# choose rows whose sum is nonzero
print(final[final.Volume != 0])
It will print only rows C and D.
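A stylistic alternative (not required): the conversion can be done with astype on the whole column instead of mapping row by row, and the filter chained on the result.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'C', 'D'], 'Volume': ['-3', '3', '1', '5']})

# astype(int) converts the whole column at once
final = (df.assign(Volume=df['Volume'].astype(int))
           .groupby('Product')['Volume'].sum()
           .reset_index())

print(final[final['Volume'] != 0])
```

Product A sums to -3 + 3 = 0, so it drops out, leaving C (1) and D (5).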
Given,
import pandas as pd
df = pd.DataFrame({'Product': ['A', 'A', 'C', 'D'], 'Volume': [-3, 3, 1, 5]})
final = df.groupby(['Product'])['Volume'].sum().reset_index()
Use boolean selection to keep only the rows that match your criteria: df[some_series_of_booleans_based_on_condition]
print(final[final['Volume'] != 0])
#output:
Product Volume
1 C 1
2 D 5
The idea is that if [some series of booleans]: doesn't make sense for Python to interpret, so it complains with the ValueError you saw.
I have a function which iterates over a list of lists. If it finds a value that is itself a list, it should create a string from that value and insert it in place of the original one:
def lst_to_str(lst):
    for x in lst:
        for y in x:
            i = 0
            if type(y) == list:
                x[i] = ",".join(y)
            i += 1
    return lst
The problem is, when I apply this function to a pd.DataFrame column:
df['pdns'] = df['pdns'].apply(lambda x: lst_to_str(x))
It returns me the original nested list:
[['a', 'b', 'c', 'd'], ['a1', 'b1', 'd1', 'c1'],['a2', 'b2', 'c2', ['d2_1', 'd2_2']]]
Instead of:
[['a', 'b', 'c', 'd'], ['a1', 'b1', 'd1', 'c1'],['a2', 'b2', 'c2', 'd2_1, d2_2']]
Your code is buggy. In your function, i is reset to 0 on every pass through the inner loop, so the counter never tracks the position of y, and the joined string is written to the wrong index of x. Track the real index of y (for example with enumerate) and try again.
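A corrected sketch using enumerate, so the joined string replaces the element at its own position (one possible fix, not necessarily how the original author intended it):

```python
def lst_to_str(lst):
    for x in lst:
        # enumerate tracks the real position of y, so the joined
        # string overwrites the sublist it came from
        for i, y in enumerate(x):
            if isinstance(y, list):
                x[i] = ",".join(y)
    return lst

data = [['a', 'b', 'c', 'd'],
        ['a1', 'b1', 'd1', 'c1'],
        ['a2', 'b2', 'c2', ['d2_1', 'd2_2']]]
print(lst_to_str(data))
```

Note the function mutates the nested lists in place and also returns them, matching how it is used with .apply above.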
df = pd.DataFrame([[3,3,3]]*4,index=['a','b','c','d'])
While we can extract a copy of a section of an index by specifying row numbers, like below:
i1=df.index[1:3].copy()
Unfortunately, we can't extract a copy of a section of an index by specifying keys (as with the df.loc method). When I try the below:
i2=df.index['a':'c'].copy()
I get the below error:
TypeError: slice indices must be integers or None or have an __index__ method
Is there any alternative to call a subset of an index based on its keys? Thank you
Simplest is loc with index:
i1 = df.loc['b':'c'].index
print (i1)
Index(['b', 'c'], dtype='object')
Or it is possible to use get_loc for positions:
i1 = df.index
i2 = i1[i1.get_loc('b') : i1.get_loc('c') + 1]
print (i2)
Index(['b', 'c'], dtype='object')
i3 = i1[i1.get_loc('b') : i1.get_loc('d') + 1]
print (i3)
Index(['b', 'c', 'd'], dtype='object')
Alternative:
i1 = i1[i1.searchsorted('b') : i1.searchsorted('d') + 1]
print (i1)
Index(['b', 'c', 'd'], dtype='object')
Try using .loc, see this documentation:
i2 = df.loc['a':'c'].index
print(i2)
Output:
Index(['a', 'b', 'c'], dtype='object')
or
df.loc['a':'c'].index.tolist()
Output:
['a', 'b', 'c']
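The key difference the answers above rely on: unlike positional slicing, .loc label slices include both endpoints. A small sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[3, 3, 3]] * 4, index=['a', 'b', 'c', 'd'])

# positional slicing excludes the stop position ...
print(df.index[1:3].tolist())          # ['b', 'c']
# ... while label-based slicing with .loc includes both endpoints
print(df.loc['a':'c'].index.tolist())  # ['a', 'b', 'c']
```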