Python pandas move cell value to another cell in same row - python-3.x

I have a DataFrame like this:
id  Description   Price      Unit
1   Test Only     1254       12
2   Data test     Fresher    4
3   Sample        3569       1
4   Sample Onces  Code test
5   Sample        245        2
Whenever the Price column holds a non-integer value, I want to move that value left into the Description column and set Price to NaN. There is no specific word to match on; the only rule is that if Price contains a non-numeric value, that string should move to the Description column.
I have already tried pandas replace and concat, but neither works.
Desired output is like this:
id  Description  Price  Unit
1   Test Only    1254   12
2   Fresher             4
3   Sample       3569   1
4   Code test
5   Sample       245    2

This should work:
# imports for the sample data below
import numpy as np
import pandas as pd

# data
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'Description': ['Test Only', 'Data test', 'Sample', 'Sample Onces', 'Sample'],
                   'Price': ['1254', 'Fresher', '3569', 'Code test', '245'],
                   'Unit': [12, 4, 1, np.nan, 2]})

# convert the Price column to numeric, coercing non-numeric values to NaN
price = pd.to_numeric(df.Price, errors='coerce')
# where Price is not numeric, replace Description with the original Price string
df.Description = df.Description.mask(price.isna(), df.Price)
# assign the numeric values back to the Price column
df.Price = price
df
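On the sample data above, the result should look something like this:
   id Description   Price  Unit
0   1   Test Only  1254.0  12.0
1   2     Fresher     NaN   4.0
2   3      Sample  3569.0   1.0
3   4   Code test     NaN   NaN
4   5      Sample   245.0   2.0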

Use:
#convert values to numeric
price = pd.to_numeric(df['Price'], errors='coerce')
#test for missing values
m = price.isna()
#shift only the matched rows one column to the left
df.loc[m, ['Description','Price']] = df.loc[m, ['Description','Price']].shift(-1, axis=1)
print (df)
id Description Price
0 1 Test Only 1254
1 2 Fresher NaN
2 3 Sample 3569
3 4 Code test NaN
4 5 Sample 245
If you need numeric values in the output Price column:
df = df.assign(Price=price)
print (df)
id Description Price
0 1 Test Only 1254.0
1 2 Fresher NaN
2 3 Sample 3569.0
3 4 Code test NaN
4 5 Sample 245.0

Related

Given a column value, check if another column value is present in preceding or next 'n' rows in a Pandas data frame

I have the following data
jsonDict = {'Fruit': ['apple', 'orange', 'apple', 'banana', 'orange', 'apple','banana'], 'price': [1, 2, 1, 3, 2, 1, 3]}
Fruit price
0 apple 1
1 orange 2
2 apple 1
3 banana 3
4 orange 2
5 apple 1
6 banana 3
What I want to do is check if Fruit == 'banana', and if so, scan the preceding and the next n rows from the index position of the 'banana' row for an instance where Fruit == 'apple'. An example of the expected output is shown below, taking n=2.
Fruit price
2 apple 1
5 apple 1
I have tried doing
position = df[df['Fruit'] == 'banana'].index
resultdf= df.loc[((df.index).isin(position)) & (((df['Fruit'].index+2).isin(['apple']))|((df['Fruit'].index-2).isin(['apple'])))]
# Output is an empty dataframe
Empty DataFrame
Columns: [Fruit, price]
Index: []
Preference will be given to vectorized approaches.
IIUC, you can use 2 masks and boolean indexing:
# df = pd.DataFrame(jsonDict)
n = 2
m1 = df['Fruit'].eq('banana')
# is the row within ±n rows of a banana?
m2 = m1.rolling(2*n+1, min_periods=1, center=True).max().eq(1)
# is the row an apple?
m3 = df['Fruit'].eq('apple')
out = df[m2&m3]
output:
Fruit price
2 apple 1
5 apple 1
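If rolling over the boolean mask raises a dtype error in your pandas version (whether it does is version-dependent, so treat this as an assumption), casting the mask to int first applies the same window logic:
# same centered window, but on a 0/1 mask instead of a boolean one
m2 = m1.astype(int).rolling(2*n+1, min_periods=1, center=True).max().eq(1)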

Get column value from the column that is dynamically selected depending on row value of another column

I have a dataframe as below.
month fe_month_OCT re_month_APR fe_month_MAY
0 OCT 1 1 2
1 APR 4 2 2
2 MAY 1 4 3
I'm trying to create a new column that pulls its value from whichever fe_month_ or re_month_ column matches the month of that row. For the same month we will never see two columns, i.e. fe_month_APR and re_month_APR will never both appear in the same df; it will be either fe or re.
Output example: for the first row the new column should take its value from fe_month_OCT because month=OCT; for the second row the value should come from re_month_APR, and so on.
Expected output:
month fe_month_OCT re_month_APR fe_month_MAY d_month
0 OCT 1 1 2 1
1 APR 4 2 2 2
2 MAY 1 4 3 3
Code to create input dataframe:
data = {'month': ['OCT', 'APR', 'MAY'], 'fe_month_OCT': [1, 4, 1], 're_month_APR': [1, 2, 4],'fe_month_MAY': [2, 2, 3] }
db = pd.DataFrame(data)
Assuming all the column names are in the form "fe_month_" plus the string in db["month"], you can use apply().
get_value = lambda row: row["fe_month_" + row["month"]]
db["d_month"] = db.apply(get_value, axis=1)
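Since the question says a month can appear with either prefix, here is a sketch that also covers re_month_ columns; the helper name month_value is hypothetical, and it assumes exactly one of fe_month_<MONTH> / re_month_<MONTH> exists for each row:
import numpy as np

def month_value(row):
    # try both prefixes and return the first matching column's value
    for prefix in ("fe_month_", "re_month_"):
        col = prefix + row["month"]
        if col in row.index:
            return row[col]
    return np.nan  # no matching column for this month

db["d_month"] = db.apply(month_value, axis=1)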

How to select columns based on criteria?

I have the following dataframe:
d2 = {('CAR','ALPHA'): pd.Series(['A22', 'A23', 'A24', 'A25'], index=[2, 3, 4, 5]),
      ('CAR','BETA'): pd.Series(['B22', 'B23', 'B24', 'B25'], index=[2, 3, 4, 5]),
      ('MOTOR','SOLO'): pd.Series(['S22', 'S23', 'S24', 'S25'], index=[2, 3, 4, 5])}
db = pd.DataFrame(data=d2)
In the columns that have 'CAR' in the level-0 MultiIndex, I would like to delete all the values and set them to NA from a given row index onward, e.g. 4.
I am trying to use .loc, but I would like the results to be saved in the same dataframe.
The second thing I would like to do is set the values of the columns whose level-0 MultiIndex label is different from 'CAR' to NA from another row index onward, e.g. 3.
Use slicers for the first part; for the second, compare the level values from MultiIndex.get_level_values:
idx = pd.IndexSlice
db.loc[4:, idx['CAR', :]] = np.nan
db.loc[3:, db.columns.get_level_values(0) != 'CAR'] = 'AAA'
Or:
mask = db.columns.get_level_values(0) == 'CAR'
db.loc[4:, mask] = np.nan
db.loc[3:, ~mask] = 'AAA'
print(db)
    CAR       MOTOR
  ALPHA BETA   SOLO
2   A22  B22    S22
3   A23  B23    AAA
4   NaN  NaN    AAA
5   NaN  NaN    AAA

Getting all rows where for column 'C' the entry is larger than the preceding element in column 'C'

How can I select all rows of a data frame where a condition on a column is met, when the condition involves the relationship between every two consecutive entries of that column? To give a specific example, let's say I have a DataFrame:
>>> df = pd.DataFrame({'A': [1, 2, 3, 4],
...                    'B': ['spam', 'ham', 'egg', 'foo'],
...                    'C': [4, 5, 3, 4]})
>>> df
A B C
0 1 spam 4
1 2 ham 5
2 3 egg 3
3 4 foo 4
>>>df2 = df[ return every row of df where C[i] > C[i-1] ]
>>> df2
A B C
1 2 ham 5
3 4 foo 4
There is plenty of great information about slicing and indexing in the pandas docs and here, but this is a bit more complicated, I think. I could also be going about it the wrong way. What I'm looking for is the rows of data where the value stored in C is no longer monotonically declining.
Any help is appreciated!
Use boolean indexing, comparing against the shifted column values:
print (df[df['C'] > df['C'].shift()])
A B C
1 2 ham 5
3 4 foo 4
Detail:
print (df['C'] > df['C'].shift())
0 False
1 True
2 False
3 True
Name: C, dtype: bool
Equivalently, you can compare the diff of the column:
print (df[df['C'].diff() > 0])
A B C
1 2 ham 5
3 4 foo 4

Pandas: Update values of a column

I have a large dataframe with multiple columns (sample shown below). I want to update the values of one particular column (the Population column) by dividing its values by 1000.
City Population
Paris 23456
Lisbon 123466
Madrid 1254
Pekin 86648
I have tried
df['Population'].apply(lambda x: int(str(x))/1000)
and
df['Population'].apply(lambda x: int(x)/1000)
Both give me the error
ValueError: invalid literal for int() with base 10: '...'
If your DataFrame really does look as presented, then the second example should work just fine (with the int not even being necessary):
In [16]: df
Out[16]:
City Population
0 Paris 23456
1 Lisbon 123466
2 Madrid 1254
3 Pekin 86648
In [17]: df['Population'].apply(lambda x: x/1000)
Out[17]:
0 23.456
1 123.466
2 1.254
3 86.648
Name: Population, dtype: float64
In [18]: df['Population']/1000
Out[18]:
0 23.456
1 123.466
2 1.254
3 86.648
However, from the error, it seems like you have the unparsable string '...' somewhere in your Series, and that the data needs to be cleaned further.
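A minimal cleaning sketch, assuming the offending entries are placeholder strings such as '...': pd.to_numeric with errors='coerce' turns them into NaN instead of raising, after which the division works.
import pandas as pd

# coerce non-numeric entries (e.g. '...') to NaN, then scale
df['Population'] = pd.to_numeric(df['Population'], errors='coerce') / 1000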
