Python formatting decimal points on specific columns which contain 'nan' - python-3.x

I am loading a dataframe from a CSV file using
df_c = pd.read_csv(os.path.join(dir_w, csv_c))
which gives me:
SubCase Row1 Row2 Row3 Row4
0 1003001 NaN 0 NaN 10.0
1 1003002 NaN 0 NaN 10.5
2 1003003 NaN 0 NaN 11.3
3 2000001 110001.0 10 1.0 9.81
4 2000002 110001.0 10 1.0 5.06
For the float columns 'Row1' and 'Row3' I want to remove the decimal points, while keeping them in 'Row4' ('Row2' is already integer-valued). So it looks like this:
SubCase Row1 Row2 Row3 Row4
0 1003001 NaN 0 NaN 10.0
1 1003002 NaN 0 NaN 10.5
2 1003003 NaN 0 NaN 11.3
3 2000001 110001 10 1 9.81
4 2000002 110001 10 1 5.06
I have tried the following code, with no luck:
na_mask = df_c['Row1'].notnull()
df_c.loc[na_mask, 'Row1'] = df_c.loc[na_mask, 'Row1'].round(decimals=0)
and
na_mask = df_c['Row1'].notnull()
df_c.loc[na_mask, 'Row1'] = df_c.loc[na_mask, 'Row1'].astype(int)
Any ideas? Thanks in advance for any help.
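One common way to get integers and missing values into the same column is pandas' nullable Int64 extension dtype (pandas >= 0.24); plain astype(int) fails precisely because int64 has no representation for NaN. A minimal sketch on a reconstruction of the frame (missing entries then print as <NA> rather than NaN):

import numpy as np
import pandas as pd

df_c = pd.DataFrame({
    'SubCase': [1003001, 1003002, 1003003, 2000001, 2000002],
    'Row1': [np.nan, np.nan, np.nan, 110001.0, 110001.0],
    'Row2': [0, 0, 0, 10, 10],
    'Row3': [np.nan, np.nan, np.nan, 1.0, 1.0],
    'Row4': [10.0, 10.5, 11.3, 9.81, 5.06],
})

# astype(int) raises on NaN; the nullable Int64 dtype (capital "I")
# stores integers and missing values side by side, so the decimal points disappear
df_c[['Row1', 'Row3']] = df_c[['Row1', 'Row3']].astype('Int64')
print(df_c)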

Related

Replace only leading NaN values in Pandas dataframe

I have a dataframe of time series data, in which data reporting starts at different times (columns) for different observation units (rows). Prior to the first reported datapoint for each unit, the dataframe contains NaN values, e.g.
0 1 2 3 4 ...
A NaN NaN 4 5 6 ...
B NaN 7 8 NaN 10...
C NaN 2 11 24 17...
I want to replace the leading (left-side) NaN values with 0, but only the leading ones (i.e. leaving the internal missing ones as NaN). So the result on the example above would be:
0 1 2 3 4 ...
A 0 0 4 5 6 ...
B 0 7 8 NaN 10...
C 0 2 11 24 17...
(Note the retained NaN for row B col 3)
I could iterate through the dataframe row-by-row, identify the first index of a non-NaN value in each row, and replace everything left of that with 0. But is there a way to do this as a whole-array operation?
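For concreteness, the example frame (truncated to five columns) can be reconstructed like this, so the snippets below run as-is:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, np.nan, 4, 5, 6],
     [np.nan, 7, 8, np.nan, 10],
     [np.nan, 2, 11, 24, 17]],
    index=list('ABC'),
)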
Use notna plus cumsum along the rows; cells where the cumulative count is still zero are the leading NaNs:
df[df.notna().cumsum(1) == 0] = 0
df
0 1 2 3 4
A 0.0 0.0 4 5.0 6
B 0.0 7.0 8 NaN 10
C 0.0 2.0 11 24.0 17
Here is another way, using cumprod() and apply():
# count leading NaNs per row: isna() gives 1/0 and cumprod stays 1 only while NaNs lead
s = df.isna().cumprod(axis=1).sum(axis=1)
# fillna fills left to right; note fillna requires limit > 0, so every row must start with a NaN here
df.apply(lambda x: x.fillna(0, limit=s.loc[x.name]), axis=1)
Output:
0 1 2 3 4
A 0.0 0.0 4.0 5.0 6.0
B 0.0 7.0 8.0 NaN 10.0
C 0.0 2.0 11.0 24.0 17.0

Drop columns that have a header but all rows are empty Python 3 & Pandas

I just could not figure this one out:
df.dropna(axis=1, how="all").dropna(axis=0, how="all")
All headers have data. How can I exclude the headers from a df.dropna(how="all") command?
I am afraid this is going to be trivial, but help me out guys.
Thanks,
Levi
Okay, as I understand it, what you want is as follows:
drop any column where all rows contain NaN
drop any row where all values are NaN
So for example, given a dataframe df like:
Id Col1 Col2 Col3 Col4
0 1 25.0 A NaN 6
1 2 15.0 B NaN 7
2 3 23.0 C NaN 8
3 4 5.0 D NaN 9
4 5 NaN E NaN 10
convert the dataframe by:
df.dropna(axis=1, how="all", inplace=True)
df.dropna(axis=0, how="all", inplace=True)
which yields:
Id Col1 Col2 Col4
0 1 25.0 A 6
1 2 15.0 B 7
2 3 23.0 C 8
3 4 5.0 D 9
4 5 NaN E 10
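One point worth making explicit, since the question asks about excluding headers: dropna only ever inspects cell values, never column labels, so a column whose header is the only non-empty thing still counts as all-NaN. A minimal sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'Col1': [25.0, 15.0, 23.0, 5.0, np.nan],
    'Col2': list('ABCDE'),
    'Col3': [np.nan] * 5,   # header present, every value missing
    'Col4': [6, 7, 8, 9, 10],
})

# the header 'Col3' does not protect the column; it is dropped
print(df.dropna(axis=1, how='all'))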

Getting average in pandas dataframe when there are NaN values

I have a pandas dataframe and I would like to add a row at the end of dataframe to show the average of each column; however, due to NaN values in Col2, Col3, and Col4, the mean function cannot return the correct average of the columns. How can I fix this issue?
Col1 Col2 Col3 Col4
1 A 11 10 NaN
2 B 14 NaN 15
3 C 45 16 0
4 D NaN 16 NaN
5 E 12 23 5
P.S. This is the dataframe after getting average (df.loc["mean"] = df.mean()):
Col1 Col2 Col3 Col4
1 A 11 10 NaN
2 B 14 NaN 15
3 C 45 16 0
4 D NaN 16 NaN
5 E 12 23 5
Mean NaN NaN NaN NaN
The problem is that the columns are not filled with numbers but with string representations of numbers, so first convert them to numeric with DataFrame.astype:
cols = ['Col2','Col3','Col4']
df[cols] = df[cols].astype(float)
df.loc["mean"] = df.mean()
print(df)
Col1 Col2 Col3 Col4
1 A 11.0 10.00 NaN
2 B 14.0 NaN 15.000000
3 C 45.0 16.00 0.000000
4 D NaN 16.00 NaN
5 E 12.0 23.00 5.000000
mean NaN 20.5 16.25 6.666667
Or, if there are some non-numeric values, use to_numeric with errors='coerce':
cols = ['Col2','Col3','Col4']
df[cols] = df[cols].apply(lambda x: pd.to_numeric(x, errors='coerce'))
df.loc["mean"] = df.mean()
You can also write the NaN handling explicitly with skipna=True when calculating the mean (skipping NaN is already the default):
# DataFrame.append was removed in pandas 2.0; see the pd.concat sketch below
df = df.mean(axis=0, skipna=True).rename('Mean').pipe(df.append)
print(df)
Col1 Col2 Col3 Col4
0 A 11.0 10.00 NaN
1 B 14.0 NaN 15.000000
2 C 45.0 16.00 0.000000
3 D NaN 16.00 NaN
4 E 12.0 23.00 5.000000
Mean NaN 20.5 16.25 6.666667
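A caveat on that last snippet: DataFrame.append was removed in pandas 2.0, so on current pandas the same row-appending step is spelled with pd.concat. A sketch under that assumption:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': list('ABCDE'),
    'Col2': [11, 14, 45, np.nan, 12],
    'Col3': [10, np.nan, 16, 16, 23],
    'Col4': [np.nan, 15, 0, np.nan, 5],
})

# mean() skips NaN by default; numeric_only=True leaves the string column out
mean_row = df.mean(numeric_only=True).rename('Mean')
df = pd.concat([df, mean_row.to_frame().T])
print(df)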

Number of NaN values before first non NaN value Python dataframe

I have a dataframe with several columns, some of which contain NaN values. For each row, I would like to create another column containing the total number of value columns minus the number of NaN values before the first non-NaN value.
Original dataframe:
ID Value0 Value1 Value2 Value3
1 10 10 8 15
2 NaN 45 52 NaN
3 NaN NaN NaN NaN
4 NaN NaN 100 150
The extra column would look like:
ID NewColumn
1 4
2 3
3 0
4 2
Thanks in advance!
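Reconstructing the question's frame so the snippets below can be run as-is:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Value0': [10, np.nan, np.nan, np.nan],
    'Value1': [10, 45, np.nan, np.nan],
    'Value2': [8, 52, np.nan, 100],
    'Value3': [15, np.nan, np.nan, 150],
})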
Set the index to ID
Attach a non-null column to stop/catch the argmax
Use argmax to find the first non-null value
Subtract those values from the length of the relevant columns
df.assign(
    NewColumn=df.shape[1] - 1 -
    df.set_index('ID').assign(notnull=1).notnull().values.argmax(1)
)
ID Value0 Value1 Value2 Value3 NewColumn
0 1 10.0 10.0 8.0 15.0 4
1 2 NaN 45.0 52.0 NaN 3
2 3 NaN NaN NaN NaN 0
3 4 NaN NaN 100.0 150.0 2
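An alternative spelling (not from the original thread) that reuses the isna + cumprod trick from the leading-NaN answer above: count the leading NaNs directly and subtract from the number of value columns.

vals = df.set_index('ID')
# leading-NaN count per row, as in the earlier answer
leading = vals.isna().cumprod(axis=1).sum(axis=1)
df['NewColumn'] = (vals.shape[1] - leading).to_numpy()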

Join several rows in pandas data frame

This is an example of the existing Data Frame:
A B C t
0 2.0 NaN NaN 0.2
1 NaN 1.0 NaN 0.2
2 NaN NaN 3.0 0.2
3 2.0 NaN NaN 0.2
4 NaN 1.0 NaN 0.2
5 NaN NaN 3.0 0.2
What I would like to have as a result looks like this:
A B C t
0 2 1 3 0.2
1 2 1 3 0.6
In this case the rows with indexes 1 and 2 are merged into the first row. This should also be possible for longer DataFrames of the same shape.
In addition, the timestamp (column 't') is the relative timestamp between the rows, which means the timestamps have to be added up.
Thanks for the answers, and sorry for the bad English :)
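A hedged sketch of one reading: assume each block of three consecutive rows belongs together, take the first non-NaN value per column, and sum the relative timestamps within each block. Note this yields t = 0.6 for both output rows, whereas the expected output shows 0.2 for the first, so the intended timestamp rule may differ.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [2.0, np.nan, np.nan, 2.0, np.nan, np.nan],
    'B': [np.nan, 1.0, np.nan, np.nan, 1.0, np.nan],
    'C': [np.nan, np.nan, 3.0, np.nan, np.nan, 3.0],
    't': [0.2] * 6,
})

# label each block of three consecutive rows, then collapse each block:
# groupby 'first' skips NaN, and 'sum' adds the relative timestamps
out = df.groupby(df.index // 3).agg({'A': 'first', 'B': 'first', 'C': 'first', 't': 'sum'})
print(out)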
