Double a column's values when a different column equals a specified value


I want to double the value in the distance column in the rows which have the value of 'one-way' in the hike_type column. I am iterating through the df and finding all of the proper rows but I am having trouble getting the multiplication to stick.
This finds the right rows but does not apply the change:
for index, row in df.iterrows():
    if row['hike_type'] == 'one-way':
        row['distance'] * 2
This hasn't worked either:
for index, row in df.iterrows():
    if row['hike_type'] == 'one-way':
        row['distance'] = row['distance'] * 2
For some reason, when I do the following, it prints what I want:
for index, row in df.iterrows():
    if row['hike_type'] == 'one-way':
        print(row['distance'] * 2)

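IIUC (if I understand correctly), what you want can be achieved with a single line:
import numpy as np
df['distance'] = np.where(df['hike_type'] == 'one-way', df['distance'].astype(int) * 2, df['distance'])
Or you can use df.loc:
df.update(df.loc[df['hike_type'] == 'one-way', 'distance'].astype(int) * 2)
Or:
df.update(df[df['hike_type'] == 'one-way']['distance'].astype(int) * 2)
The .astype(int) is only needed if distance is stored as text (e.g. '5'); leave it out for a numeric column, because it truncates decimal distances.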
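Why the original loops fail: iterrows() yields each row as a new Series, and the pandas documentation warns that writing to it may have no effect on the DataFrame. So row['distance'] = ... only changes that copy, and row['distance'] * 2 on its own just computes a value and discards it. A minimal sketch of two working fixes follows; the sample rows are made up, and only the hike_type and distance column names come from the question.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "hike_type": ["one-way", "loop", "one-way", "out-and-back"],
    "distance": [3.5, 5.0, 2.0, 4.0],
})

# Fix 1: keep the loop, but write through the DataFrame with .at instead of into `row`.
# Iterate over the original df and write into a copy, so both fixes start from the same data.
looped = df.copy()
for index, row in df.iterrows():
    if row["hike_type"] == "one-way":
        looped.at[index, "distance"] = row["distance"] * 2

# Fix 2: no loop; a boolean mask selects the rows and *= updates them in place.
vectorized = df.copy()
vectorized.loc[vectorized["hike_type"] == "one-way", "distance"] *= 2

print(looped["distance"].tolist())      # [7.0, 5.0, 4.0, 4.0]
print(vectorized["distance"].tolist())  # [7.0, 5.0, 4.0, 4.0]
```

The .loc version is usually the better choice: the selection and the multiplication happen in one vectorized step, with no Python-level loop.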

Related

Is there a pandas DataFrame function that lets me iterate over the rows of certain columns?

I want to solve this kind of problem in Python:
tran_df['bad_debt'] = tran_df.apply(lambda x: 1 if (x['second_mortgage'] != 0 and x['home_equity'] != 0) else x['debt'], axis=1)
I want to be able to create a new column by iterating over the rows of specific columns.
In Excel it's really easy; I did:
if(AND(col_name1<>0,col_name2<>0),1,col_name5)
Any help would be much appreciated.

To iterate over rows only for certain columns:
for rowIndex, row in df[['col1', 'col2']].iterrows():  # iterate over rows
    print(rowIndex, row['col1'], row['col2'])  # row holds only col1 and col2
To create a new column:
df['new'] = 0  # initialise as 0
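Combining the two snippets above for the task in the question gives a minimal loop-based sketch. The sample values are made up (only the column names come from the question), and the write goes through tran_df.at because the row yielded by iterrows() is a copy. It works, but the vectorized answers are typically much faster on large frames.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
tran_df = pd.DataFrame({
    "second_mortgage": [0.0, 0.2, 0.1],
    "home_equity": [0.3, 0.0, 0.4],
    "debt": [0.5, 0.6, 0.7],
})

tran_df["bad_debt"] = 0.0  # initialise as float, since the debt values are floats here
for rowIndex, row in tran_df[["second_mortgage", "home_equity", "debt"]].iterrows():
    # Same logic as IF(AND(col1<>0, col2<>0), 1, col5) in Excel
    if row["second_mortgage"] != 0 and row["home_equity"] != 0:
        tran_df.at[rowIndex, "bad_debt"] = 1
    else:
        tran_df.at[rowIndex, "bad_debt"] = row["debt"]

print(tran_df["bad_debt"].tolist())  # [0.5, 0.6, 1.0]
```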

As a rule, iterating over rows in pandas is the wrong approach. Use the np.where function from NumPy to select the right value for each row:
import numpy as np
tran_df['bad_debt'] = np.where(
    (tran_df['second_mortgage'] != 0) & (tran_df['home_equity'] != 0),
    1, tran_df['debt'])
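np.where maps one-to-one onto the Excel IF(AND(...)) from the question: the combined condition, the value if true, and the value if false. A runnable sketch on a few made-up rows (only the column names come from the question):

```python
import numpy as np
import pandas as pd

# Made-up sample rows; only the column names come from the question.
tran_df = pd.DataFrame({
    "second_mortgage": [0.0, 0.2, 0.1, 0.0],
    "home_equity": [0.3, 0.0, 0.4, 0.0],
    "debt": [0.5, 0.6, 0.7, 0.8],
})

# Condition, value if true, value if false: the same three parts as the Excel formula
both_nonzero = (tran_df["second_mortgage"] != 0) & (tran_df["home_equity"] != 0)
tran_df["bad_debt"] = np.where(both_nonzero, 1, tran_df["debt"])

print(tran_df["bad_debt"].tolist())  # [0.5, 0.6, 1.0, 0.8]
```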

First create the new column with an initial value, then use .loc to select the rows that match the condition and assign the new value:
tran_df['bad_debt'] = tran_df['debt']
tran_df.loc[(tran_df['second_mortgage'] != 0) & (tran_df['home_equity'] != 0), 'bad_debt'] = 1
Or the other way round:
tran_df['bad_debt'] = 1
tran_df.loc[(tran_df['second_mortgage'] == 0) | (tran_df['home_equity'] == 0), 'bad_debt'] = tran_df['debt']
Remember to wrap each condition in parentheses when combining them with the bitwise operators & and |, because those operators bind more tightly than comparisons such as != and ==.
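A quick check on made-up rows that the two-step .loc version produces the same column as a one-step alternative, Series.mask, which replaces values wherever the condition is True. Series.mask is swapped in here only for comparison; it is not part of the answer above.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
tran_df = pd.DataFrame({
    "second_mortgage": [0.0, 0.2, 0.1, 0.0],
    "home_equity": [0.3, 0.0, 0.4, 0.0],
    "debt": [0.5, 0.6, 0.7, 0.8],
})
cond = (tran_df["second_mortgage"] != 0) & (tran_df["home_equity"] != 0)

# Two-step .loc version: start from debt, then overwrite the matching rows with 1
tran_df["bad_debt"] = tran_df["debt"].copy()
tran_df.loc[cond, "bad_debt"] = 1

# One-step alternative: Series.mask swaps in 1 wherever cond is True
bad_debt_mask = tran_df["debt"].mask(cond, 1)

print(tran_df["bad_debt"].tolist() == bad_debt_mask.tolist())  # True
```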

How to remove rows whose value count in a column is less than a particular number?

df['Brand'].value_counts() gives the number of occurrences of each value in the Brand column. I want to remove all rows whose value occurs fewer than 6 times. Brand is a string column.

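Use:
import pandas as pd
df = pd.DataFrame({'Brand': [1, 2, 3, 3, 3, 3, 3, 3, 3, 3]})
df[df['Brand'].apply(lambda x: df['Brand'].value_counts()[x] >= 6)]
Output: the eight rows where Brand is 3 (index 2 to 9); the rows with Brand 1 and 2 are dropped.
A more efficient way, if your data is huge (value_counts() is computed once instead of once per row):
temp = df['Brand'].value_counts() >= 6
df[df['Brand'].isin(temp[temp].index)]
Output: the same eight rows.
Another way:
df = pd.DataFrame({'Brand': [1, 2, 3, 3, 3, 3, 3, 3, 3, 3]})
temp = df['Brand'].tolist()
df[df['Brand'].apply(lambda x: temp.count(x) >= 6)]
This gives the same output, although list.count() rescans the whole list for every row.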

You can also do it in two steps:
valueCount = df['Brand'].value_counts()
df = df[df['Brand'].map(valueCount) >= 6]
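Another option, not taken from the answers above, is to filter on group size with groupby. A sketch with made-up string brand names (the question says Brand is a string column), keeping the brands that occur at least 6 times:

```python
import pandas as pd

# Made-up string brands: "c" occurs 8 times, "a" and "b" once each.
df = pd.DataFrame({"Brand": ["a", "b", "c", "c", "c", "c", "c", "c", "c", "c"]})

# Keep only the brands (groups) that have at least 6 rows.
filtered = df.groupby("Brand").filter(lambda g: len(g) >= 6)

# Same idea with transform: broadcast each brand's row count back onto its rows
# (count equals group size here because Brand has no missing values).
sizes = df.groupby("Brand")["Brand"].transform("count")
filtered2 = df[sizes >= 6]

print(filtered.index.tolist())  # [2, 3, 4, 5, 6, 7, 8, 9]
```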

Use previous values of rows in data frame to build the current cell value

I have a DataFrame like this:
A
0.001216
0.000453
0.00506
0.004556
0.005266
I want to create a new column B according to the formula in the code below:
import numpy as np
column_key = 'B'
factor = 'A'
df[column_key] = np.nan
df.loc[0, column_key] = (df.loc[0, factor] + 1) * 100
for i in range(1, len(df)):
    df.loc[i, column_key] = (df.loc[i, factor] + 1) * df.loc[i - 1, column_key]
I am trying to fill each cell using the previous cell of the same column and the adjacent cell in column A.
This is what I have tried, but I don't think it is going to be efficient.
Can anyone help me with the most efficient approach to this problem?
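Unrolling the recurrence in the loop above shows that each value of B is 100 times a running product of (1 + A), which is exactly what a cumulative product computes:

```latex
B_0 = 100\,(1 + A_0), \qquad B_i = (1 + A_i)\,B_{i-1} \quad (i \ge 1)
\;\Longrightarrow\;
B_i = 100 \prod_{k=0}^{i} (1 + A_k)
```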

Using Series.cumprod(), it can be done as follows:
df['B'] = df['A'] + 1
df.loc[0, 'B'] = df.loc[0, 'B'] * 100
df['B'] = df['B'].cumprod()
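A quick way to check that the cumprod answer matches the loop is to run both on the sample A values from the question and compare. This sketch computes the loop into a plain NumPy array and applies the factor of 100 after the product, which is equivalent because multiplication is commutative; floating-point rounding can differ in the last digits, hence np.allclose rather than exact equality.

```python
import numpy as np
import pandas as pd

# Sample values from the question
df = pd.DataFrame({"A": [0.001216, 0.000453, 0.00506, 0.004556, 0.005266]})

# Loop version of the recurrence, computed into a NumPy array
a = df["A"].to_numpy()
loop_b = np.empty(len(a))
loop_b[0] = (a[0] + 1) * 100
for i in range(1, len(a)):
    loop_b[i] = (a[i] + 1) * loop_b[i - 1]

# Vectorized version: B_i = 100 * prod_{k<=i} (1 + A_k)
df["B"] = (df["A"] + 1).cumprod() * 100

print(np.allclose(df["B"].to_numpy(), loop_b))  # True
```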

Names of the columns we're searching for missing values

Searching for missing values?
columns = ['median', 'p25th', 'p75th']
# Look at the dtypes of the columns
print(____)
# Find how missing values are represented (Search for missing values in the median, p25th, and p75th columns.)
print(recent_grads["median"].____)
# Replace missing values with NaN,using numpy's np.nan.
for column in ___:
    recent_grads.loc[____ == '____', column] = ____?

# Names of the columns we're searching for missing values
columns = ['median', 'p25th', 'p75th']
# Take a look at the dtypes
print(recent_grads[columns].dtypes)
# Find how missing values are represented
print(recent_grads["median"].unique())
# Replace missing values with NaN
for column in columns:
    recent_grads.loc[recent_grads[column] == 'UN', column] = np.nan

The right answer is:
for column in columns:
    recent_grads.loc[recent_grads[column] == 'UN', column] = np.nan
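A self-contained sketch of the fill-in answer on a tiny made-up stand-in for recent_grads (the real dataset is not shown here; 'UN' is the missing-value marker identified above). After the replacement the columns still hold text, so the sketch also converts them with pd.to_numeric, an extra step that is not part of the original answer, so that numeric summaries such as describe(exclude=["object"]) can include them.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for recent_grads; "UN" marks a missing value.
recent_grads = pd.DataFrame({
    "major": ["A", "B", "C"],
    "median": ["40000", "UN", "52000"],
    "p25th": ["30000", "UN", "41000"],
    "p75th": ["50000", "60000", "UN"],
})
columns = ["median", "p25th", "p75th"]

# Replace the "UN" marker with NaN, as in the answer above
for column in columns:
    recent_grads.loc[recent_grads[column] == "UN", column] = np.nan

# Extra step (assumption, not in the answer): turn the remaining text into numbers
recent_grads[columns] = recent_grads[columns].apply(pd.to_numeric)

print(recent_grads[columns].isna().sum())
```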

# Print .dtypes
print(recent_grads.dtypes)
# Output summary statistics
print(recent_grads.describe())
# Exclude data of type object
print(recent_grads.describe(exclude=["object"]))

How to iterate a df (itertuples) that can perform logical operations based on previous rows in pandas

Is there a way to extract values from the previous row during the itertuples() operations?
Pseudo code:
df=pd.DataFrame([[1,3,4],[5,6,7],[6,3,5],[7,4,23]])
for row in df.itertuples():
    if (value of column[1] in [row -1 ])>(value of column[1] in row / 2):
        do something
I know someone might suggest doing vectorized operations using .diff() or .shift(), but I would like to know how to achieve the above via a for loop.

I'd set up a variable to track the previous row:
prow = None
for row in df.itertuples():
    # row[0] is the index, so row[1] is the first column (label 0)
    if prow is None:
        prow = row
    else:
        if prow[1] > row[1] / 2:
            pass
            # do something
        prow = row
Or you can zip two itertuples() iterators together:
for prow, row in zip(df.iloc[:-1].itertuples(), df.iloc[1:].itertuples()):
    if prow[1] > row[1] / 2:
        pass
        # do something
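A runnable version of the loop approach on the question's DataFrame, assuming "column[1]" means the column labelled 1, which itertuples() exposes as row[2] because row[0] is the index. The "do something" step is replaced by collecting the index of each matching row, and the result is compared with the .shift() approach mentioned in the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 4], [5, 6, 7], [6, 3, 5], [7, 4, 23]])

# Loop version: remember the previous row while iterating.
# itertuples() gives (Index, col 0, col 1, col 2), so column label 1 is row[2].
hits_loop = []
prow = None
for row in df.itertuples():
    if prow is not None and prow[2] > row[2] / 2:
        hits_loop.append(int(row.Index))  # stand-in for "do something"
    prow = row

# Vectorized equivalent: shift() lines up the previous row's value with the current row
# (the first row gets NaN, and NaN > x is False, so it never matches).
prev = df[1].shift()
hits_vec = df[prev > df[1] / 2].index.tolist()

print(hits_loop, hits_vec)  # [2, 3] [2, 3]
```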
