run function in pandas to create new column - python-3.x

I have this df:
df = {'Option': ["A", "B", "C"]}
I'm trying to create a new column, Identifier, that equals 1 if the value in the Option column equals "A". Otherwise, the value in Identifier should be 0.
I created the following function to do this:
def trigger(row):
    if df['Option'] == "A":
        return 1
    else:
        return 0
Here is what I tried for the Identifier column:
df['Identifier'] = df['Option'].apply(trigger, axis=1)
When I print(df), I get the following error: TypeError: trigger() got an unexpected keyword argument 'axis'
The final df should look like this:
finaldf = {'Option': ["A", "B", "C"],
           'Identifier': [1, 0, 0]}
It seems like a relatively straightforward problem, but I don't know why it doesn't work.

Your method does not work because you are not using the row argument inside trigger. Furthermore, you can do this entirely vectorized:
df['Identifier'] = 0
df.loc[df.Option == 'A', 'Identifier'] = 1
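If you do want to keep a row-wise function, here is a minimal sketch of how the apply version could look (assuming the same toy df; note that axis=1 belongs to DataFrame.apply, not Series.apply):

import pandas as pd

df = pd.DataFrame({'Option': ["A", "B", "C"]})

def trigger(row):
    # use the row that apply passes in, not the whole DataFrame
    return 1 if row['Option'] == "A" else 0

# apply the function row by row across the DataFrame
df['Identifier'] = df.apply(trigger, axis=1)
print(df)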

Try:
df['Identifier'] = np.where(df.Option == 'A', 1, 0)
For multiple conditions you might try:
df["Identifier"] = np.where(df.Option.isin(["A", "B"]), 1, 0)

Related

Python3 multiple equal sign in the same line

There is a function in the Python 2 code that I am rewriting in Python 3:
def abc(self, id):
    if not isinstance(id, int):
        id = int(id)
    mask = self.programs['ID'] == id
    assert sum(mask) > 0
    name = self.programs[mask]['name'].values[0]
"id" here is a panda series where the index is strings and the column is int like the following
import numpy as np
import pandas as pd

data = np.array(['1', '2', '3', '4', '5'])
# providing an index
ser = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
print(ser)
self.programs['ID'] is a DataFrame column where there is one row with integer data, such as 1:
import pandas as pd
# initialize list of lists
data = [[1, 'abc']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'name'])
I am really confused by the lines mask = self.programs['ID'] == id and assert sum(mask) > 0. Could someone enlighten me?
Basically, mask = self.programs['ID'] == id returns a Series of boolean values indicating whether each 'ID' value is equal to id or not.
Then assert sum(mask) > 0 sums up the boolean Series. Note that True is treated as 1 in Python and False as 0. So this asserts that there is at least one row where the programs['ID'] column has a value equal to id.
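A small illustrative sketch of that behaviour (the example values here are hypothetical, not from the original code):

import pandas as pd

programs = pd.DataFrame({'ID': [1, 2, 2, 3], 'name': ['a', 'b', 'c', 'd']})
mask = programs['ID'] == 2
print(mask.tolist())   # [False, True, True, False]
print(sum(mask))       # 2 -> True counts as 1, False as 0
assert sum(mask) > 0   # passes because at least one row matched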

replacing nan values with a function python

I have a big data set (100k+ rows) with many more columns than in the snippet attached. I need to replace missing values with values from a reference table. I have found countless articles on how to replace NaN values with the same number, but I can't find relevant help on replacing them with different values obtained from a function. My problem is that np.nan is not equal to np.nan, so how can I make the comparison? I'm trying to say that if the value is null, then replace it with the particular value from the reference table. I have found the way shown below, but it is a dangerous method because the replacement only happens in the exception handler, so if anything goes wrong I wouldn't see it. Here is the snippet:
import numpy as np
import pandas as pd

sampleData = {
    'BI Business Name': ['AAA', 'BBB', 'CCC', 'CCC', 'DDD', 'DDD'],
    'BId Postcode': ['NW1 8NZ', 'NW1 8NZ', 'WC2N 4AA', 'WC2N 4AA', 'CV7 9JY', 'CV7 9JY'],
    'BI Website': ['www#1', 'www#1', 'www#2', 'www#2', 'www#3', 'www#3'],
    'BI Telephone': ['999', '999', '666', '001', np.nan, '12345']
}
df = pd.DataFrame(sampleData)
df
and here is my method:
feature = 'BI Telephone'
df[[feature]] = df[[feature]].astype('string')

def missing_phone(row):
    try:
        old_value = row[feature]
        if (old_value == 'NaN' or old_value == 'nan' or old_value == np.nan
                or old_value is None or old_value == ''):
            reference_value = row[reference_column]
            new_value = reference_table[reference_table[reference_column] == reference_value].iloc[0, 0]
            print('changed')
            return new_value
        else:
            print('unchanged as value is not nan. The value is {}'.format(old_value))
            return old_value
    except Exception as e:
        reference_value = row[reference_column]
        new_value = reference_table[reference_table[reference_column] == reference_value].iloc[0, 0]
        print('exception')
        return new_value

df[feature] = df.apply(missing_phone, axis=1)
df
If I don't change the data type to string, the NaN is just left unchanged. How can I fix it?
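As a side note on the nan-comparison point: since np.nan == np.nan is False, a null check is usually done with pd.isna rather than ==. A minimal sketch (the reference mapping and column names here are illustrative assumptions, not the asker's actual reference_table):

import numpy as np
import pandas as pd

df = pd.DataFrame({'BI Telephone': ['999', np.nan, '666'],
                   'BI Website': ['www#1', 'www#2', 'www#3']})
# hypothetical reference mapping from website to phone number
reference = {'www#2': '555'}

def fill_phone(row):
    # pd.isna is True for np.nan and None alike
    if pd.isna(row['BI Telephone']):
        return reference.get(row['BI Website'], row['BI Telephone'])
    return row['BI Telephone']

df['BI Telephone'] = df.apply(fill_phone, axis=1)
print(df)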

Can't figure out how to print out all the integers in my dictionary value list

Curious if there is any way to do something similar to this.
Dictionary = {"item":1,"item2":[1,2,3,4]}
for keys,values in Dictionary.items():
if values == list():
print(values[0:3])
With the resulting outcome of
1
2
3
4
Sure, there is:
Dictionary = {"item": 1, "item2": [1,2,3,4]}
for values in Dictionary.values():
if type(values) == list:
for item in values:
print(item, end=' ')
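As an aside, a commonly preferred variant is to test with isinstance rather than comparing type() directly; a minimal sketch of the same loop, printing one value per line as in the expected output:

Dictionary = {"item": 1, "item2": [1, 2, 3, 4]}
for values in Dictionary.values():
    # isinstance also accepts subclasses of list
    if isinstance(values, list):
        for item in values:
            print(item)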

Remove consecutive duplicate entries from pandas in each cell

I have a data frame that looks like this:
d = {'col1': ['a,a,b', 'a,c,c,b'], 'col2': ['a,a,b', 'a,b,b,a']}
pd.DataFrame(data=d)
Expected output:
d = {'col1': ['a,b', 'a,c,b'], 'col2': ['a,b', 'a,b,a']}
I have tried this:
from itertools import groupby

arr = ['a', 'a', 'b', 'a', 'a', 'c', 'c']
print([x[0] for x in groupby(arr)])
How do I remove the consecutive duplicate entries in each row and column of the dataframe? For example, a,a,b,c should become a,b,c.
From what I understand, you don't want to include values that repeat consecutively. You can try this custom function:
def myfunc(x):
    s = pd.Series(x.split(','))
    res = s[s.ne(s.shift())]
    return ','.join(res.values)

print(df.applymap(myfunc))
    col1   col2
0    a,b    a,b
1  a,c,b  a,b,a
Another function can be created with itertools.groupby, such as:
from itertools import groupby

def myfunc(x):
    l = [x[0] for x in groupby(x.split(','))]
    return ','.join(l)
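Used the same way as above, with applymap; a quick self-contained sketch, assuming df is built from the question's dictionary:

import pandas as pd
from itertools import groupby

d = {'col1': ['a,a,b', 'a,c,c,b'], 'col2': ['a,a,b', 'a,b,b,a']}
df = pd.DataFrame(data=d)

def myfunc(x):
    # keep the first element of each consecutive group of equal values
    l = [k for k, _ in groupby(x.split(','))]
    return ','.join(l)

print(df.applymap(myfunc))
#     col1   col2
# 0    a,b    a,b
# 1  a,c,b  a,b,a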
You could define a function to help with this, then use .applymap to apply it to all columns (or .apply one column at a time):
d = {'col1': ['a,a,b', 'a,c,c,b'], 'col2': ['a,a,b', 'a,b,b,a']}
df = pd.DataFrame(data=d)

def remove_dups(string):
    split = string.split(',')  # split string into a list
    uniques = set(split)       # remove duplicate list elements
    return ','.join(uniques)   # rejoin the list elements into a string

result = df.applymap(remove_dups)
This returns:
    col1 col2
0    a,b  a,b
1  a,c,b  a,b
Edit: This looks slightly different from your expected output; why do you expect a,b,a for the second row in col2?
Edit 2: To preserve the original order, you can replace the set() call with unique_everseen():
from more_itertools import unique_everseen
...
uniques = unique_everseen(split)
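Putting that edit together, a minimal sketch of the order-preserving version (assuming more_itertools is installed; the groupby approach from the other answer avoids the extra dependency):

import pandas as pd
from more_itertools import unique_everseen

d = {'col1': ['a,a,b', 'a,c,c,b'], 'col2': ['a,a,b', 'a,b,b,a']}
df = pd.DataFrame(data=d)

def remove_dups(string):
    split = string.split(',')
    # keeps only the first occurrence of each element, preserving order
    uniques = unique_everseen(split)
    return ','.join(uniques)

result = df.applymap(remove_dups)
print(result)
#     col1 col2
# 0    a,b  a,b
# 1  a,c,b  a,b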

Dictionary using distinct characters as values

I need to make a dictionary using the strings in the list as keys and their number of distinct characters as values.
I have tried some functions and ended up with the following code, but I cannot seem to get the string itself used as the key:
value=["check", "look", "try", "pop"]
print(value)
def distinct_characters(x):
for i in x:
yield dict (i=len(set(i)))
print (list(distinct_characters(value))
I would like to get
{ "check" : 4, "look" : 3, "try" : 3, "pop" : 2}
but I keep getting
{ "i" : 4, "i" : 3, "i" : 3, "i" : 2}
Well, a string is itself an iterable, so don't call list on a sequence of dicts; instead, call dict on a sequence of (key, value) tuples, like below:
value=["check", "look", "try", "pop"]
print(value)
def distinct_characters(x):
for i in x:
yield (i, len(set(i)))
print(dict(distinct_characters(value)))
Output:
{'check': 4, 'look': 3, 'try': 3, 'pop': 2}
Consider the simple dictionary comprehension:
value = ["check", "look", "try", "pop"]
result = {key: len(set(key)) for key in value}
print(result)
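This comprehension produces the same result, {'check': 4, 'look': 3, 'try': 3, 'pop': 2}, without needing a generator function.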
Thanks for the replies.
I needed to write it as a function for a class exercise, so I ended up using this code:
value=["check", "look", "try", "pop"]
print(value)
def distinct_characters(x):
for i in x:
yield (i, len(set(i)))
print(dict(distinct_characters(value)))
Thanks again
