I want to double the value in the distance column in the rows which have the value of 'one-way' in the hike_type column. I am iterating through the df and finding all of the proper rows but I am having trouble getting the multiplication to stick.
This finds the proper rows but does not put the change into effect:
for index, row in df.iterrows():
    if row['hike_type'] == 'one-way':
        row['distance'] * 2
This hasn't worked either:
for index, row in df.iterrows():
    if row['hike_type'] == 'one-way':
        row['distance'] = row['distance'] * 2
For some reason, when I print the value instead (below), it shows what I want:
for index, row in df.iterrows():
    if row['hike_type'] == 'one-way':
        print(row['distance'] * 2)
IIUC, what you want can be achieved in one line with np.where:
df['distance'] = np.where(df['hike_type'] == 'one-way', df['distance'].astype(int) * 2, df['distance'])
Or you can use df.loc with df.update:
df.update(df.loc[df['hike_type'] == 'one-way', 'distance'].astype(int) * 2)
Or:
df.update(df[df['hike_type'] == 'one-way']['distance'].astype(int) * 2)
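The loops in the question most likely fail because each `row` yielded by `iterrows()` is typically a copy, so assigning to it leaves `df` untouched. A minimal sketch of writing the change back through a boolean mask and `df.loc`; the sample values are made up, and only the column names come from the question:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "hike_type": ["one-way", "loop", "one-way"],
    "distance": [3.0, 5.0, 2.5],
})

# Boolean mask selecting the one-way hikes.
one_way = df["hike_type"] == "one-way"

# Assign through .loc so the change is written to df itself.
df.loc[one_way, "distance"] *= 2

print(df["distance"].tolist())
```

This version skips the `astype(int)` cast, which would truncate fractional distances before doubling them.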
I want to solve this kind of problem in Python:
tran_df['bad_debt'] = tran_df.apply(lambda x: 1 if (x['second_mortgage'] != 0 and x['home_equity'] != 0) else x['debt'], axis=1)
I want to be able to create a new column and iterate over the rows for specific columns.
In Excel it's really easy; I did:
if(AND(col_name1<>0,col_name2<>0),1,col_name5)
Any help would be much appreciated.
To iterate over rows only for certain columns:
for rowIndex, row in df[['col1', 'col2']].iterrows():  # iterate over rows
    pass  # work with row['col1'] and row['col2'] here
To create a new column:
df['new'] = 0 # Initialise as 0
As a rule, iterating over rows in pandas is slow and rarely needed. Use the np.where function from NumPy to pick the value for each row:
tran_df['bad_debt'] = np.where(
    (tran_df['second_mortgage'] != 0) & (tran_df['home_equity'] != 0),
    1, tran_df['debt'])
First create the new column with an initial value, then use .loc to locate the rows that match the condition and assign the new value:
tran_df['bad_debt'] = tran_df['debt']
tran_df.loc[(tran_df['second_mortgage'] != 0) & (tran_df['home_equity'] != 0), 'bad_debt'] = 1
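Or:
tran_df['bad_debt'] = 1
tran_df.loc[(tran_df['second_mortgage'] == 0) | (tran_df['home_equity'] == 0), 'bad_debt'] = tran_df['debt']
Remember to wrap each condition in round brackets when combining them with the bitwise operators & and |, because those operators bind more tightly than comparisons such as != and ==.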
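To check that the np.where and .loc approaches agree, here is a small sketch on made-up data. The values are assumptions; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
tran_df = pd.DataFrame({
    "second_mortgage": [0, 100, 250, 0],
    "home_equity": [50, 0, 75, 0],
    "debt": [10, 20, 30, 40],
})

# Each comparison is parenthesised because & binds more tightly than !=.
both_nonzero = (tran_df["second_mortgage"] != 0) & (tran_df["home_equity"] != 0)

# Approach 1: np.where picks 1 where the mask is True, else the debt value.
via_where = np.where(both_nonzero, 1, tran_df["debt"])

# Approach 2: start from debt, then overwrite the matching rows with .loc.
tran_df["bad_debt"] = tran_df["debt"]
tran_df.loc[both_nonzero, "bad_debt"] = 1

print(tran_df["bad_debt"].tolist())
print(list(via_where))
```

Only the third row has both values nonzero, so it is the only one set to 1 in either approach.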
df['Brand'].value_counts() gives the number of occurrences of each value in the Brand column. I want to remove all rows whose Brand value occurs fewer than 6 times. The Brand column contains strings.
Use:
df = pd.DataFrame({'Brand':[1,2,3,3,3,3,3,3,3,3]})
counts = df['Brand'].value_counts()
df[df['Brand'].apply(lambda x: counts[x] >= 6)]
Output: the eight rows at index 2 through 9, all with Brand equal to 3.
A more efficient way, if your data is large:
counts = df['Brand'].value_counts()
df[df['Brand'].isin(counts[counts >= 6].index)]
Output: the rows at index 2 through 9 (Brand 3).
Another way:
df = pd.DataFrame({'Brand':[1,2,3,3,3,3,3,3,3,3]})
temp = df['Brand'].tolist()
df[df['Brand'].apply(lambda x: temp.count(x) >= 6)]
with the same output.
You can do it as below:
valueCount = df['Brand'].value_counts()
df = df[df['Brand'].isin(valueCount[valueCount >= 6].index)]
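Since the question says Brand holds strings, here is a sketch of the same filter on string data, mapping each row to its brand's count. The brand names and frequencies are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with string brands, as described in the question.
df = pd.DataFrame({"Brand": ["a"] * 6 + ["b"] * 2 + ["c"] * 7})

# Number of rows per brand, then looked up for every row.
counts = df["Brand"].value_counts()
count_per_row = df["Brand"].map(counts)

# Keep brands that occur at least 6 times (drop those with fewer than 6).
filtered = df[count_per_row >= 6]

print(sorted(filtered["Brand"].unique()))
print(len(filtered))
```

The threshold is `>= 6` because the question asks to remove brands occurring fewer than 6 times, so a brand with exactly 6 rows stays.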
I have a dataframe as represented here
A
0.001216
0.000453
0.00506
0.004556
0.005266
I want to create a new column B according to the formula in the code below.
column_key = 'B'
factor = 'A'
df[column_key] = np.nan
df.loc[0, column_key] = (df.loc[0, factor] + 1) * 100
for i in range(1, len(df)):
    df.loc[i, column_key] = (df.loc[i, factor] + 1) * df.loc[i - 1, column_key]
I have been trying to fill each cell from the previous cell in the same column and the adjacent cell in column A.
This is what I have tried, but I don't think it is efficient.
Can anyone suggest a more efficient approach to this problem?
Using Series.cumprod(), it can be done as follows:
df['B'] = df['A'] + 1
df.loc[0, 'B'] = df.loc[0, 'B'] * 100
df['B'] = df['B'].cumprod()
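The recurrence B[i] = (A[i] + 1) * B[i-1] with a starting value of 100 is a running product, which is why cumprod applies. A short sketch that checks the vectorised result against the loop, using the five values of A from the question:

```python
import numpy as np
import pandas as pd

# Values of column A from the question.
df = pd.DataFrame({"A": [0.001216, 0.000453, 0.00506, 0.004556, 0.005266]})

# Loop version: B[0] = (A[0] + 1) * 100, B[i] = (A[i] + 1) * B[i-1].
expected = []
previous = 100.0
for a in df["A"]:
    previous = (a + 1) * previous
    expected.append(previous)

# Vectorised version: running product of (A + 1), scaled by the start value 100.
df["B"] = 100 * (df["A"] + 1).cumprod()

print(np.allclose(df["B"], expected))
```

Multiplying the whole running product by 100 gives the same result as scaling only the first element before calling cumprod.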
Searching for missing values?
columns = ['median', 'p25th', 'p75th']
# Look at the dtypes of the columns
print(____)
# Find how missing values are represented (Search for missing values in the median, p25th, and p75th columns.)
print(recent_grads["median"].____)
# Replace missing values with NaN,using numpy's np.nan.
for column in ___:
    recent_grads.loc[____ == '____', column] = ____?
# Names of the columns we're searching for missing values
columns = ['median', 'p25th', 'p75th']
# Take a look at the dtypes
print(recent_grads[columns].dtypes)
# Find how missing values are represented
print(recent_grads["median"].unique())
# Replace missing values with NaN
for column in columns:
    recent_grads.loc[recent_grads[column] == 'UN', column] = np.nan
The right answer is:
for column in columns:
    recent_grads.loc[recent_grads[column] == 'UN', column] = np.nan
# Print .dtypes
print(recent_grads.dtypes)
# Output summary statistics
print(recent_grads.describe())
# Exclude data of type object
print(recent_grads.describe(exclude=["object"]))
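Replacing 'UN' with NaN leaves the columns as text, so describe() may still not treat them as numbers. The sketch below runs the same replacement loop on a tiny stand-in frame and then converts the columns with pd.to_numeric. The frame and its values are hypothetical, since the real recent_grads data is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for recent_grads; the real dataset is not shown here.
recent_grads = pd.DataFrame({
    "median": ["36000", "UN", "40000"],
    "p25th": ["25000", "30000", "UN"],
    "p75th": ["UN", "50000", "60000"],
})
columns = ["median", "p25th", "p75th"]

# Same loop as in the answer: swap the 'UN' marker for NaN.
for column in columns:
    recent_grads.loc[recent_grads[column] == "UN", column] = np.nan

# Convert the cleaned text columns to numbers so they can be summarised.
for column in columns:
    recent_grads[column] = pd.to_numeric(recent_grads[column])

print(recent_grads.dtypes)
print(recent_grads.describe())
```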
Is there a way to extract values from the previous row during the itertuples() operations?
Pseudo code:
df=pd.DataFrame([[1,3,4],[5,6,7],[6,3,5],[7,4,23]])
for row in df.itertuples():
    if (value of column[1] in [row -1 ])>(value of column[1] in row / 2):
        do something
I know someone might suggest doing vectorized operations using .diff() or .shift(), but I would like to know how to achieve the above via a for loop.
I'd set up a variable to track the previous row:
prow = None
for row in df.itertuples():
    if prow is None:
        prow = row
    else:
        # row[0] is the index, so row[1] is the first data column
        if prow[1] > row[1] / 2:
            pass
            # do something
        prow = row
Or you can zip two itertuples together:
for prow, row in zip(df.iloc[:-1].itertuples(), df.iloc[1:].itertuples()):
    if prow[1] > row[1] / 2:
        pass
        # do something
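A runnable sketch of the pairing idea on the DataFrame from the question, with "do something" replaced by recording the row position (an assumption made for illustration). It passes index=False so that row[1] refers to the column labelled 1, and uses shift() only to cross-check the loop.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 4], [5, 6, 7], [6, 3, 5], [7, 4, 23]])

# index=False drops the index from each tuple, so row[1] is the column labelled 1.
rows = list(df.itertuples(index=False))

# Loop version: pair each row with its predecessor and record matching positions.
loop_hits = []
for position, (prow, row) in enumerate(zip(rows[:-1], rows[1:]), start=1):
    if prow[1] > row[1] / 2:
        loop_hits.append(position)

# Vectorised cross-check: compare the shifted column with half the current value.
mask = df[1].shift() > df[1] / 2
shift_hits = df.index[mask].tolist()

print(loop_hits, shift_hits)
```

With the default index=True, as in the snippets above, row[1] would instead be the first data column, because position 0 holds the index.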