I want to add a region column to df1 using the IP ranges stated in df2. The two data frames are:
df1 has ip_address
df2 has ip_from, ip_to, region
Can this be done with a conditional statement that uses indexes?
For example: if df1.ip[0] falls within a specific IP range, add that range's region to df1.
I am guessing there should be a loop around the if statement so that it can go through df2, find the range that contains the IP, and grab the region.
I know that adding each condition manually works,
but is there a way to make the conditions loop by index?
region = []
for row in df1['ip']:
    if row > 15:
        region.append('D')
    elif row > 10:
        region.append('C')
    elif row > 5:
        region.append('B')
    else:
        region.append('A')
df1['region'] = region
To make it iterate through rows, can it be done this way?
region = []
# For each row in the column,
for row in df1['ip']:
    if (row >= df2.loc[row, 'ip_from']) and (row <= df2.loc[row, 'ip_to']):
        region.append(df2.loc[row, 'region'])
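You can use a list of regions whose indices match the IP address values (here df1 is a plain dict and df2 a plain list, not pandas objects):
df1 = {'ip': [13, 4, 7, 2], 'region': []}
# Position i in df2 holds the region for IP value i.
df2 = list('AAAAAABBBBBCCCCCDDDDD')
for i in range(len(df1['ip'])):
    df1['region'].append(df2[df1['ip'][i]])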
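If the data really lives in pandas data frames, one possible sketch is to turn the ip_from/ip_to columns into an IntervalIndex and look each IP up in it. The sample ranges below are assumptions chosen to match the list above (0-5 is A, 6-10 is B, 11-15 is C, 16-20 is D), the IPs are assumed to be plain integers, and the ranges are assumed not to overlap.

```python
import pandas as pd

# Hypothetical sample data mirroring the list-based answer above.
df1 = pd.DataFrame({"ip": [13, 4, 7, 2]})
df2 = pd.DataFrame({
    "ip_from": [0, 6, 11, 16],
    "ip_to": [5, 10, 15, 20],
    "region": ["A", "B", "C", "D"],
})

# Build one closed interval per df2 row (assumes the ranges do not overlap).
intervals = pd.IntervalIndex.from_arrays(df2["ip_from"], df2["ip_to"], closed="both")

# get_indexer returns the position of the interval containing each IP, or -1 if none.
pos = intervals.get_indexer(df1["ip"])

regions = df2["region"].to_numpy()
df1["region"] = [regions[p] if p >= 0 else None for p in pos]
print(df1)
```

IPs that fall outside every range get None instead of a region, so they are easy to spot afterwards.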
I have the 2 dataframes as below:
Based on the values in df2, if a df1 row matches ALL the conditions of a df2 row, then remove it from df1.
Expected output:
If a column value in df2 is NULL, treat it as matching ALL values; otherwise do a regular match.
For example, the 1st row of df2 only has a product value (the other columns are NULL), so the filter should match every book, every business and ONLY product = Forex. The "tr4" row matches and is therefore removed.
The 2nd row of df2 has to match book = b2, every business (since it is NULL) and product = Swap. No row matches all of these (AND) conditions, so nothing is removed.
The result can be in place or a new df. How can this be done?
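# Collect the df1 rows that match any df2 rule, then drop them all at once.
# "[NULL]" in df2 acts as a wildcard for that column.
to_drop = set()
for i in range(len(df2)):
    for j in range(len(df1)):
        # Parentheses are needed because "and" binds tighter than "or";
        # df1 must be indexed with j, not i.
        if ((df2['book'][i] == "[NULL]" or df2['book'][i] == df1['book'][j])
                and (df2['business'][i] == "[NULL]" or df2['business'][i] == df1['business'][j])
                and (df2['product'][i] == "[NULL]" or df2['product'][i] == df1['product'][j])):
            to_drop.add(j)
df1 = df1.drop(index=sorted(to_drop))
df1.reset_index(drop=True, inplace=True)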
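The same idea can be sketched with boolean masks instead of a nested loop over positions. The sample frames below are hypothetical (the originals are not shown here), including the "trade" column and the business values, and the wildcard is assumed to be stored as the string "[NULL]" as in the loop above.

```python
import pandas as pd

WILDCARD = "[NULL]"  # assumption: NULL is stored as this string in df2

# Hypothetical data shaped like the description in the question.
df1 = pd.DataFrame({
    "trade": ["tr1", "tr2", "tr3", "tr4"],
    "book": ["b1", "b2", "b1", "b3"],
    "business": ["fx", "rates", "rates", "fx"],
    "product": ["Swap", "Bond", "Swap", "Forex"],
})
df2 = pd.DataFrame({
    "book": [WILDCARD, "b2"],
    "business": [WILDCARD, WILDCARD],
    "product": ["Forex", "Swap"],
})

# mask[k] becomes True when df1 row k satisfies every non-wildcard column of some rule.
mask = pd.Series(False, index=df1.index)
for _, rule in df2.iterrows():
    match = pd.Series(True, index=df1.index)
    for col in ["book", "business", "product"]:
        if rule[col] != WILDCARD:
            match &= df1[col] == rule[col]
    mask |= match

result = df1[~mask].reset_index(drop=True)
print(result)
```

Only the loop over rules remains; the comparison against df1 is done column-wise, which should scale better when df1 is large and df2 is small.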
df1 = pd.DataFrame(zip(l1), columns=['l1'])
df1.l1.value_counts()
df2 = pd.DataFrame(zip(l2), columns=['l2'])
df2.l2.value_counts()
I want to add the column values from l2 to l1 depending on the value counts in df1. For example:
if the value count of 'bb_#2' is less than the value count of 'bb_#1' in df1, then all of the 'bb_#3' entries in df2 should be added to df1 and renamed to 'bb_#2';
the same logic applies to 'aa_#3';
'cc_#2' and 'cc_#3' in df2 should be combined and added to df1 as 'cc_#1'.
The conditions should be checked in df1, and if a condition is met, the values from l2 in df2 should be added to the l1 column of df1.
The expected output is given here as well:
l1=['aa_#1', 'bb_#1', 'bb_#2', 'aa_#1', 'aa_#1', 'aa_#1', 'bb_#1','aa_#2','bb_#2','aa_#2','bb_#1','bb_#1','bb_#1','bb_#2','bb_#2','cc_#1','bb_#2','bb_#2', 'bb_#2','aa_#2','aa_#2','aa_#2', 'cc_#1','cc_#1','cc_#1','cc_#1']
Please let me know if there is a way to do this in Python. I have 10,000 rows like this to add from l2 to l1 and I don't know how to even begin with it. I am really new to Python.
This is a method that doesn't use pandas. The .count() method returns the number of times an item occurs in a list. The .extend() method appends the items of another list to the end of an existing list. Lastly, multiplying a list by an integer repeats and concatenates it that many times: ['a'] * 3 == ['a', 'a', 'a']
def extend_list(l1, l2, prefixes, final_prefixes, suffix_1, suffix_2, suffix_3):
    # If suffix_2 is rarer than suffix_1 in l1, move the suffix_3 items of l2 into l1 as suffix_2.
    for prefix in prefixes:
        if l1.count(f'{prefix}_{suffix_2}') < l1.count(f'{prefix}_{suffix_1}'):
            l1.extend([f'{prefix}_{suffix_2}'] * l2.count(f'{prefix}_{suffix_3}'))
    # For the final prefixes, fold both suffix_2 and suffix_3 items of l2 into suffix_1.
    for final_prefix in final_prefixes:
        l1.extend([f'{final_prefix}_{suffix_1}'] *
                  (l2.count(f'{final_prefix}_{suffix_2}') + l2.count(f'{final_prefix}_{suffix_3}')))
    # Return the list so the result can be assigned by the caller.
    return l1

l1 = ['aa_#1','bb_#1','bb_#2','aa_#1','aa_#1','aa_#1','bb_#1','aa_#2','bb_#2','aa_#2','bb_#1','bb_#1','bb_#1','bb_#2','bb_#2','cc_#1']
l2 = ['aa_#3','aa_#3','aa_#3','bb_#3','bb_#3','bb_#3','cc_#2','cc_#2','cc_#3','cc_#3']
l1 = extend_list(l1, l2, ["aa", "bb"], ["cc", "dd"], "#1", "#2", "#3")
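For comparison, here is a rough pandas sketch of the same rules, in case the data should stay in data frames. It builds a rename table from the value counts in df1 and maps the l2 column through it. The prefixes and suffixes are hard-coded from the example and would need adapting for other data.

```python
import pandas as pd

l1 = ['aa_#1','bb_#1','bb_#2','aa_#1','aa_#1','aa_#1','bb_#1','aa_#2','bb_#2',
      'aa_#2','bb_#1','bb_#1','bb_#1','bb_#2','bb_#2','cc_#1']
l2 = ['aa_#3','aa_#3','aa_#3','bb_#3','bb_#3','bb_#3','cc_#2','cc_#2','cc_#3','cc_#3']

df1 = pd.DataFrame({"l1": l1})
df2 = pd.DataFrame({"l2": l2})
counts = df1["l1"].value_counts()

# Decide, per prefix, which l2 labels get moved and what they are renamed to.
rename = {}
for prefix in ["aa", "bb"]:
    if counts.get(f"{prefix}_#2", 0) < counts.get(f"{prefix}_#1", 0):
        rename[f"{prefix}_#3"] = f"{prefix}_#2"
rename.update({"cc_#2": "cc_#1", "cc_#3": "cc_#1"})

# Labels without a rename rule map to NaN and are dropped, i.e. not moved.
moved = df2["l2"].map(rename).dropna()
combined = pd.concat([df1["l1"], moved], ignore_index=True)
print(combined.value_counts())
```

The moved rows are appended at the end, so the order differs from the expected output in the question, but the value counts should be the same.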
I'm trying to copy all of the values across a subset of columns into a new column using the apply function, but it seems to just copy the entire dataframe range. I'm receiving that subset of the dataframe as a result, though I'm expecting df.loc[index, 'consolidated_commentary'] to contain a concatenated version of the text in the columns listed in all_commentary_columns.
My code is:
for index, row in df[all_commentary_columns].iterrows():
    if pd.isna(row).prod():
        df.loc[index, 'new_col'] = 'good'
    else:
        df.loc[index, 'new_col'] = 'bad'
    df.loc[index, 'consolidated_commentary'] = df[all_commentary_columns].apply(lambda x: x.loc[all_commentary_columns], axis=1)
I pulled this from the answer I referenced in the comments:
df.loc[index, 'consolidated_commentary'] = df.loc[index, all_commentary_columns].astype(str).apply(lambda x: ' '.join(' '.join(x).split()),axis=1)
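A loop-free variant may be easier to reason about: apply a function to each row that joins the non-missing text, and derive new_col from a single isna() check. This is only a sketch; the column names comment_a and comment_b and the sample text are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical commentary columns; the real names live in all_commentary_columns.
df = pd.DataFrame({
    "comment_a": ["late delivery", np.nan, np.nan],
    "comment_b": ["refund issued", "ok", np.nan],
})
all_commentary_columns = ["comment_a", "comment_b"]

# Join the non-missing text of each row with single spaces.
df["consolidated_commentary"] = df[all_commentary_columns].apply(
    lambda row: " ".join(row.dropna().astype(str)), axis=1
)

# 'good' when every commentary column is missing, otherwise 'bad' (same rule as the loop).
df["new_col"] = np.where(df[all_commentary_columns].isna().all(axis=1), "good", "bad")
print(df)
```

Because the assignment targets a whole column, each row receives one string rather than a slice of the dataframe.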
I'm trying to perform a check on grouped data in pandas. For each group defined by "atable" and "column", I want to see whether the sum of "value" for Include "Yes" equals the sum of "value" for Include "No", but only if the group has both "Yes" and "No" in Include. If the sums differ, I want to print an error with the group details. My data looks like this:
df1 = pd.DataFrame({
'atable':['Users','Users','Users','Users','Locks'],
'column':['col_1','col_1','col_1','col_a','col'],
'Include':['No','Yes','Yes','Yes','Yes'],
'value':[3,2,1,1,1],
})
df1
  Include atable column  value
0      No  Users  col_1      3
1     Yes  Users  col_1      2
2     Yes  Users  col_1      1
3     Yes  Users  col_a      1
4     Yes  Locks    col      1
I tried the code below, but it also reports an error for groups that do not have both "Yes" and "No" in the Include column:
grouped = df1.groupby(["atable", "column"])
for index, rows in grouped:
    if (([rows['Include'].isin(["Yes", "No"])])) and (rows[rows['Include'] == 'Yes']['value'].sum() != rows[rows['Include'] == 'No']["value"].sum()):
        print("error", index)
Output:
error ('Locks', 'col')
error ('Users', 'col_a')
I don't want my code to report an error for rows 3 and 4, since those groups only have "Yes" in the Include column.
This worked:
grouped = df1.groupby(["atable", "column"])
for index, rows in grouped:
    # A zero sum is used here as a stand-in for "no Yes rows" or "no No rows" in the group.
    if (rows[rows['Include'] == 'Yes']['value'].sum() != rows[rows['Include'] == 'No']["value"].sum()) and (rows[rows['Include'] == 'Yes']['value'].sum() != 0) and (rows[rows['Include'] == 'No']['value'].sum() != 0):
        print("error", index)
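One caveat with testing the sums against 0 is that a group whose "Yes" or "No" values legitimately add up to 0 would be skipped. A possible alternative, sketched below, checks for the presence of both labels explicitly. The helper name mismatched_groups and the last two rows of the sample data (the 'col_x' group) are hypothetical, added only so that one group actually fails.

```python
import pandas as pd

df1 = pd.DataFrame({
    'atable': ['Users', 'Users', 'Users', 'Users', 'Locks', 'Locks', 'Locks'],
    'column': ['col_1', 'col_1', 'col_1', 'col_a', 'col', 'col_x', 'col_x'],
    'Include': ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes'],
    'value': [3, 2, 1, 1, 1, 5, 4],  # the two 'col_x' rows are made up
})

def mismatched_groups(df):
    """Return the (atable, column) keys whose Yes and No sums differ."""
    errors = []
    for key, rows in df.groupby(["atable", "column"]):
        sums = rows.groupby("Include")["value"].sum()
        # Only compare when the group contains both labels.
        if {"Yes", "No"} <= set(sums.index) and sums["Yes"] != sums["No"]:
            errors.append(key)
    return errors

for key in mismatched_groups(df1):
    print("error", key)
```

With this data, groups that contain only "Yes" rows are skipped, ('Users', 'col_1') passes because 3 equals 2 + 1, and only the made-up ('Locks', 'col_x') group is reported.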
I am trying to add a new column in a pandas data frame, then update the value of the column row by row:
my_df['col_A'] = 0
for index, row in my_df.iterrows():
    my_df.loc[index]['col_A'] = 100  # value here changes in real case
    print(my_df.loc[index]['col_A'])
my_df
However, in the printout, all values in col_A are still 0. Why is that? What did I miss? Thanks!
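You are assigning to a slice in this line: my_df.loc[index]['col_A'] = 100. This is chained indexing: my_df.loc[index] first returns a new Series for that row (typically a copy), and the assignment then changes that copy rather than my_df.
Instead, pass the row and column labels to a single .loc call:
my_df['col_A'] = 0
for index, row in my_df.iterrows():
    my_df.loc[index, 'col_A'] = 100  # value here changes in real case
    print(my_df.loc[index, 'col_A'])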
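As a small self-contained check of the fix above, the sketch below uses a made-up column x to stand in for whatever drives the per-row value. It also shows the column-wise form, which is usually preferable when the new value can be computed from other columns.

```python
import pandas as pd

# Hypothetical frame; "x" stands in for the data the real value depends on.
my_df = pd.DataFrame({"x": [1, 2, 3]})

# Row by row: a single .at (or .loc) call with both labels writes into my_df itself.
my_df["col_A"] = 0
for index, row in my_df.iterrows():
    my_df.at[index, "col_A"] = row["x"] * 100

# Column-wise: the same result without an explicit loop.
my_df["col_B"] = my_df["x"] * 100
print(my_df)
```

.at is the scalar counterpart of .loc, so either works here as long as the row and column labels go into one indexing call.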