Remove NaN values from certain columns Pandas Series [duplicate] - python-3.x

I have the following DF:
AA   BB   CC
1    1    1
NaN  3    NaN
4    4    6
NaN  NaN  3
NaN  NaN  4
The output should be:
AA   BB   CC
1    1    1
4    3    6
     4    3
          4
I've tried:
df = df.dropna(subset=['AA', 'BB', 'CC'])
AA BB CC
0 2 3 1
2 5 5 6
and this is the output I get.
Is there anything else I should be doing differently?

You can use:
df.apply(lambda x: x.dropna().reset_index(drop = True))
AA BB CC
0 1.0 1.0 1.0
1 4.0 3.0 6.0
2 NaN 4.0 3.0
3 NaN NaN 4.0
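For reference, here is a self-contained sketch of the same approach. The sample frame is an assumption reconstructed from the question's table; each column is compacted independently, and pandas pads the shorter columns with NaN when it reassembles the result.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption, not the asker's real frame).
df = pd.DataFrame({
    "AA": [1, np.nan, 4, np.nan, np.nan],
    "BB": [1, 3, 4, np.nan, np.nan],
    "CC": [1, np.nan, 6, 3, 4],
})

# Drop the NaN values column by column, then renumber each column from 0
# so the surviving values line up at the top of the frame.
out = df.apply(lambda col: col.dropna().reset_index(drop=True))
print(out)
```

Note that dropna(subset=...) works row-wise: it removes every row that has a NaN in any of the listed columns, which is why it cannot push values upward within a column.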

Related

Drop columns that have a header but all rows are empty Python 3 & Pandas

I just could not figure this one out:
df.dropna(axis = 1, how="all").dropna(axis= 0 ,how="all")
All headers have data. How can I exclude the headers from a df.dropna(how="all") command?
I am afraid this is going to be trivial, but help me out guys.
Thanks,
Levi
Okay, as I understand it, what you want is the following:
drop any column where all rows contain NaN
drop any row where all columns contain NaN
So for example, given a dataframe df like:
Id Col1 Col2 Col3 Col4
0 1 25.0 A NaN 6
1 2 15.0 B NaN 7
2 3 23.0 C NaN 8
3 4 5.0 D NaN 9
4 5 NaN E NaN 10
convert the dataframe by:
df.dropna(axis = 1, how="all", inplace= True)
df.dropna(axis = 0, how='all', inplace= True)
which yields:
Id Col1 Col2 Col4
0 1 25.0 A 6
1 2 15.0 B 7
2 3 23.0 C 8
3 4 5.0 D 9
4 5 NaN E 10
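A runnable sketch of the same idea, using the example frame above. It chains the two calls instead of using inplace=True, so the original frame is left untouched; the variable name cleaned is just illustrative.

```python
import numpy as np
import pandas as pd

# Example frame from the answer: Col3 has a header but no data.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Col1": [25.0, 15.0, 23.0, 5.0, np.nan],
    "Col2": list("ABCDE"),
    "Col3": [np.nan] * 5,
    "Col4": [6, 7, 8, 9, 10],
})

# First drop columns that are entirely NaN, then rows that are entirely NaN.
# Column headers are never counted as data by dropna.
cleaned = df.dropna(axis=1, how="all").dropna(axis=0, how="all")
print(cleaned)
```

Row 4 survives because how="all" only removes a row when every value in it is NaN.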

Replacing constant values with nan

import pandas as pd
data={'col1':[1,3,3,1,2,3,2,2]}
df=pd.DataFrame(data,columns=['col1'])
print(df)
col1
0 1
1 3
2 3
3 1
4 2
5 3
6 2
7 2
Expected result:
   col1  newCol1
0     1        1
1     3        3
2     3      NaN
3     1        1
4     2        2
5     3        3
6     2        2
7     2      NaN
Try where combined with shift:
df['col2'] = df.col1.where(df.col1.ne(df.col1.shift()))
df
Out[191]:
col1 col2
0 1 1.0
1 3 3.0
2 3 NaN
3 1 1.0
4 2 2.0
5 3 3.0
6 2 2.0
7 2 NaN
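Spelled out step by step, the one-liner above could look like the sketch below. The intermediate name prev is only for illustration: shift() moves every value down one row, ne() marks rows that differ from the previous one, and where() keeps the value on those rows and puts NaN everywhere else.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 3, 3, 1, 2, 3, 2, 2]})

# Value of the previous row (NaN for the first row).
prev = df["col1"].shift()

# Keep a value only where it differs from the one directly above it.
df["col2"] = df["col1"].where(df["col1"].ne(prev))
print(df)
```

The new column becomes float because NaN cannot be stored in an integer column.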

Concatenating Pandas dataframes, getting Nan values for first dataframe

I'm trying to join two dataframes. 'df' is my initial dataframe containing all the header information I require. 'row' is my first row of data that I want to append to 'df'.
df =
FName E1 E2 E3 E4 E5 E6
0 Nan 2 2 2 2 2 2
1 Nan 1 1 1 1 1 1
2 Nan 3 4 5 6 7 8
3 Nan 4 5 6 7 8 10
4 Nan 1002003004 1002004005 1002005006 1002006007 1002007008 1002008010
row =
0 1 2 3 4 5 6
0 501#_ZMB_2019-04-03_070528_reciprocals 30.0193 30.0193 30.0193 34.8858 34.8858 34.8858
I'm trying to create this:
FName E1 E2 E3 E4 E5 E6
0 Nan 2 2 2 2 2 2
1 Nan 1 1 1 1 1 1
2 Nan 3 4 5 6 7 8
3 Nan 4 5 6 7 8 10
4 Nan 1002003004 1002004005 1002005006 1002006007 1002007008 1002008010
5 501#_ZMB_2019-04-03_070528_reciprocals 30.0193 30.0193 30.0193 34.8858 34.8858 34.8858
I have tried the following:
df = df.append(row, ignore_index=True)
and
df = pd.concat([df, row], ignore_index=True)
Both of these result in the loss of all the data in the first df, which should contain all the header information.
0 1 2 3 4 5 6
0 Nan Nan Nan Nan Nan Nan Nan
1 Nan Nan Nan Nan Nan Nan Nan
2 Nan Nan Nan Nan Nan Nan Nan
3 Nan Nan Nan Nan Nan Nan Nan
4 Nan Nan Nan Nan Nan Nan Nan
5 501#_ZMB_2019-04-03_070528_reciprocals 30.0193 30.0193 30.0193 34.8858 34.8858 34.8858
I've also tried
df = pd.concat([df.reset_index(drop=True, inplace=True), row.reset_index(drop=True, inplace=True)])
Which produced the following Traceback
Traceback (most recent call last):
File "<ipython-input-146-3c1ecbd1987c>", line 1, in <module>
df = pd.concat([df.reset_index(drop=True, inplace=True), row.reset_index(drop=True, inplace=True)])
File "C:\Users\russells\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 228, in concat
copy=copy, sort=sort)
File "C:\Users\russells\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 280, in __init__
raise ValueError('All objects passed were None')
ValueError: All objects passed were None
Does anyone know what I'm doing wrong?
When you concatenate extra rows, pandas aligns the columns, which currently do not overlap. rename will get the job done:
pd.concat([df, row.rename(columns=dict(zip(row.columns, df.columns)))],
ignore_index=True)
FName E1 E2 E3 E4 E5 E6
0 Nan 2 2 2 2 2 2
1 Nan 1 1 1 1 1 1
2 Nan 3 4 5 6 7 8
3 Nan 4 5 6 7 8 10
4 Nan 1002003004 1002004005 1002005006 1002006007 1002007008 1002008010
5 501#_ZMB_2019-04-03_070528_reciprocals 30.0193 30.0193 30.0193 34.8858 34.8858 34.8858
Or if you just need to assign one row at the end and you have a RangeIndex on df:
df.loc[df.shape[0], :] = row.to_numpy()
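A small sketch of the alignment issue, with a cut-down version of the two frames (the FName placeholders and the "sample" label are made up for illustration). Two side notes: reset_index(..., inplace=True) returns None, which is why the last attempt in the question raised "All objects passed were None", and DataFrame.append was removed in pandas 2.0, so pd.concat is the safer choice.

```python
import pandas as pd

# Cut-down stand-ins for the frames in the question (values are illustrative).
df = pd.DataFrame({"FName": ["hdr0", "hdr1"], "E1": [2, 1], "E2": [2, 1]})
row = pd.DataFrame([["sample", 30.0193, 34.8858]])  # columns are 0, 1, 2

# Without renaming, concat takes the union of the column labels,
# so nothing lines up and the gaps are filled with NaN.
misaligned = pd.concat([df, row], ignore_index=True)

# Renaming row's columns to match df makes the values land in the right places.
renamed = row.rename(columns=dict(zip(row.columns, df.columns)))
combined = pd.concat([df, renamed], ignore_index=True)
print(combined)
```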

shifting a column down in a pandas dataframe

I have data in the following way
A B C
1 2 3
2 5 6
7 8 9
I want to change the dataframe into
A B C
2 3
1 5 6
2 8 9
3
One way would be to add a blank row to the dataframe and then use shift
# input df:
A B C
0 1 2 3
1 2 5 6
2 7 8 9
df.loc[len(df.index), :] = None
df['A'] = df.A.shift(1)
print (df)
A B C
0 NaN 2.0 3.0
1 1.0 5.0 6.0
2 2.0 8.0 9.0
3 7.0 NaN NaN
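As an alternative sketch (not the answer's exact method), reindex can add the blank row without modifying the original frame. The name shifted is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Extend the index by one label; the new row is filled with NaN.
shifted = df.reindex(range(len(df) + 1))

# Move column A down by one row; the value pushed out of row 2 lands in the new row 3.
shifted["A"] = shifted["A"].shift(1)
print(shifted)
```

Without the extra row, shift(1) would silently discard the last value of A.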

Python: Summing every five rows of column b data and create a new column

I have a dataframe like the one below. I would like to sum rows 0 to 4 (every 5 rows) and create another column with the summed value ("new column"). My real dataframe has 263 rows, so the last three rows of every 12 rows will be the sum of three rows only. How can I do this using Pandas/Python? I have recently started learning Python. Thanks in advance for any advice!
My data pattern is more complex, as I am using the index as one of my column values and it repeats like:
Row Data "new column"
0 5
1 1
2 3
3 3
4 2 14
5 4
6 8
7 1
8 2
9 1 16
10 0
11 2
12 3 5
0 3
1 1
2 2
3 3
4 2 11
5 2
6 6
7 2
8 2
9 1 13
10 1
11 0
12 1 2
...
259 50 89
260 1
261 4
262 5 10
I tried iterrows and groupby but can't make it work so far.
Use this:
df['new col'] = df.groupby(df.index // 5)['Data'].transform('sum')[lambda x: ~(x.duplicated(keep='last'))]
Output:
Data new col
0 5 NaN
1 1 NaN
2 3 NaN
3 3 NaN
4 2 14.0
5 4 NaN
6 8 NaN
7 1 NaN
8 2 NaN
9 1 16.0
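One caveat with the line above: duplicated(keep='last') is applied to the summed values, so if two different blocks happen to have the same sum, the earlier block loses its total. A slightly more defensive sketch, which marks the last row of each block from the block key instead, might look like this (the names block, sums and is_last are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Data": [5, 1, 3, 3, 2, 4, 8, 1, 2, 1, 0, 2, 3]})

# Block number for each row: rows 0-4 -> 0, rows 5-9 -> 1, rows 10-12 -> 2.
block = pd.Series(df.index // 5, index=df.index)

# Block total repeated on every row of the block.
sums = df.groupby(block)["Data"].transform("sum")

# True only on the last row of each block, decided by the key, not by the sum.
is_last = ~block.duplicated(keep="last")

df["new col"] = sums.where(is_last)
print(df)
```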
Edit to handle updated question:
g = df.groupby(df.Row).cumcount()
df['new col'] = df.groupby([g, df.Row // 5])['Data']\
.transform('sum')[lambda x: ~(x.duplicated(keep='last'))]
Output:
Row Data new col
0 0 5 NaN
1 1 1 NaN
2 2 3 NaN
3 3 3 NaN
4 4 2 14.0
5 5 4 NaN
6 6 8 NaN
7 7 1 NaN
8 8 2 NaN
9 9 1 16.0
10 10 0 NaN
11 11 2 NaN
12 12 3 5.0
13 0 3 NaN
14 1 1 NaN
15 2 2 NaN
16 3 3 NaN
17 4 2 11.0
18 5 2 NaN
19 6 6 NaN
20 7 2 NaN
21 8 2 NaN
22 9 1 13.0
23 10 1 NaN
24 11 0 NaN
25 12 1 2.0
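The same caveat about equal sums applies here. Assuming the Row column repeats 0 to 12 as in the question, a sketch that picks the last row of each (cycle, block) pair from the keys rather than from the sums could be:

```python
import pandas as pd

# Two passes of Row 0..12, with the Data values from the question.
df = pd.DataFrame({
    "Row": list(range(13)) * 2,
    "Data": [5, 1, 3, 3, 2, 4, 8, 1, 2, 1, 0, 2, 3,
             3, 1, 2, 3, 2, 2, 6, 2, 2, 1, 1, 0, 1],
})

# Which pass through 0..12 a row belongs to (0 for the first, 1 for the second).
cycle = df.groupby("Row").cumcount()
# Block of five within a pass.
block = df["Row"] // 5

# Total per (cycle, block), repeated on every row of that group.
sums = df.groupby([cycle, block])["Data"].transform("sum")

# Last row of each (cycle, block) pair, decided by the keys.
keys = pd.DataFrame({"cycle": cycle, "block": block})
is_last = ~keys.duplicated(keep="last")

df["new col"] = sums.where(is_last)
print(df)
```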
