I am loading a dataframe from a csv file using,
df_c = pd.read_csv(os.path.join(dir_w, csv_c))
Which gives me,
SubCase Row1 Row2 Row3 Row4
0 1003001 NaN 0 NaN 10.0
1 1003002 NaN 0 NaN 10.5
2 1003003 NaN 0 NaN 11.3
3 2000001 110001.0 10 1.0 9.81
4 2000002 110001.0 10 1.0 5.06
For columns 'Row1' and 'Row2' I want to remove the decimal points, while keeping the decimal points in 'Row4'. So it looks like this,
SubCase Row1 Row2 Row3 Row4
0 1003001 NaN 0 NaN 10.0
1 1003002 NaN 0 NaN 10.5
2 1003003 NaN 0 NaN 11.3
3 2000001 110001 10 1 9.81
4 2000002 110001 10 1 5.06
I have tried the following code with no luck,
na_mask = df_c['Row1'].notnull()
df_c.loc[na_mask, 'Row1'] = df_c.loc[na_mask, 'Row1'].round(decimals=0)
and
na_mask = df_c['Row1'].notnull()
df_c.loc[na_mask, 'Row1'] = df_c.loc[na_mask, 'Row1'].astype(int)
Any ideas? Thanks in advance for any help.
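Not part of the original question, but for reference: .astype(int) raises on NaN, and assigning integers back into a float64 column upcasts them to float again, which is why the masked assignment keeps the .0. One workaround (a sketch, assuming pandas >= 0.24) is the nullable 'Int64' dtype, which stores whole numbers alongside missing values:

```python
import pandas as pd
import numpy as np

# Small stand-in for the CSV data in the question
df_c = pd.DataFrame({
    'Row1': [np.nan, np.nan, 110001.0, 110001.0],
    'Row4': [10.0, 10.5, 9.81, 5.06],
})

# The nullable 'Int64' dtype keeps the missing values while
# displaying whole numbers without a decimal point.
df_c['Row1'] = df_c['Row1'].astype('Int64')
print(df_c)
```

The same conversion would apply to 'Row2' and 'Row3', while 'Row4' is left as float64.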
I would like to reshape a dataframe whose first column should be used to group the other columns under an additional header row.
Initial dataframe
df = pd.DataFrame(
{
'col1':['A','A','A','B','B','B'],
'col2':[1,2,3,4,5,6],
'col3':[1,2,3,4,5,6],
'col4':[1,2,3,4,5,6],
'colx':[1,2,3,4,5,6]
}
)
Trial:
Using pd.pivot() I can create an example, but it does not match my expected output; the grouping seems to be flipped:
df.pivot(columns='col1', values=['col2','col3','col4','colx'])
col2 col3 col4 colx
col1 A B A B A B A B
0 1.0 NaN 1.0 NaN 1.0 NaN 1.0 NaN
1 2.0 NaN 2.0 NaN 2.0 NaN 2.0 NaN
2 3.0 NaN 3.0 NaN 3.0 NaN 3.0 NaN
3 NaN 4.0 NaN 4.0 NaN 4.0 NaN 4.0
4 NaN 5.0 NaN 5.0 NaN 5.0 NaN 5.0
5 NaN 6.0 NaN 6.0 NaN 6.0 NaN 6.0
Expected output:
A B
col1 col2 col3 col4 colx col2 col3 col4 colx
0 1 1 1 1 4 4 4 4
1 2 2 2 2 5 5 5 5
2 3 3 3 3 6 6 6 6
Create a counter column with GroupBy.cumcount, then use DataFrame.pivot, swap the levels of the MultiIndex in the columns with DataFrame.swaplevel, sort it, and finally remove the index and column names with DataFrame.rename_axis:
df = (df.assign(g = df.groupby('col1').cumcount())
.pivot(index='g', columns='col1')
.swaplevel(0,1,axis=1)
.sort_index(axis=1)
.rename_axis(index=None, columns=[None, None]))
print(df)
A B
col2 col3 col4 colx col2 col3 col4 colx
0 1 1 1 1 4 4 4 4
1 2 2 2 2 5 5 5 5
2 3 3 3 3 6 6 6 6
As an alternative to the classical pivot, you can concat the outputs of groupby using a dictionary comprehension, ensuring alignment with reset_index:
out = pd.concat({k: d.drop(columns='col1').reset_index(drop=True)
for k,d in df.groupby('col1')}, axis=1)
output:
A B
col2 col3 col4 colx col2 col3 col4 colx
0 1 1 1 1 4 4 4 4
1 2 2 2 2 5 5 5 5
2 3 3 3 3 6 6 6 6
Hello I have data as follows:
Col1 Col2 col3
A 2020-08-01 25
A 2020-10-01 56
A 2020-11-01 26
B 2020-06-01 32
B 2020-08-01 45
I want to create another column (col4) that, for each category in Col1, holds the col3 value from the row at least 2 months prior, as below:
Col1 Col2 col3 col4
A 2020-08-01 25 NaN
A 2020-10-01 56 25
A 2020-11-01 26 NaN
B 2020-06-01 32 NaN
B 2020-08-01 45 32
I tried pd.shift, but it's not working if I have missing months in the data. Can anyone please help?
Use np.where to conditionally fill col4 where the consecutive difference within each Col1 group is greater than or equal to 60 days (Col2 must be a datetime dtype):
df['col4'] = np.where(df.groupby('Col1')['Col2'].diff().dt.days.ge(60), df['col3'].shift(), np.nan)
Col1 Col2 col3 col4
0 A 2020-08-01 25 NaN
1 A 2020-10-01 56 25.0
2 A 2020-11-01 26 NaN
3 B 2020-06-01 32 NaN
4 B 2020-08-01 45 32.0
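For reference, a self-contained run of the above (a sketch; it assumes Col2 has been parsed to datetime, e.g. with pd.to_datetime):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'B', 'B'],
    'Col2': ['2020-08-01', '2020-10-01', '2020-11-01',
             '2020-06-01', '2020-08-01'],
    'col3': [25, 56, 26, 32, 45],
})
df['Col2'] = pd.to_datetime(df['Col2'])  # diff().dt.days needs datetimes

# Fill col4 with the previous col3 value only where the gap within
# each Col1 group is at least 60 days.
df['col4'] = np.where(df.groupby('Col1')['Col2'].diff().dt.days.ge(60),
                      df['col3'].shift(), np.nan)
print(df)
```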
I just could not figure this one out:
df.dropna(axis = 1, how="all").dropna(axis= 0 ,how="all")
All headers have data. How can I exclude the headers from a df.dropna(how="all") command?
I am afraid this is going to be trivial, but help me out guys.
Thanks,
Levi
Okay, as I understand it, what you want is as follows:
drop any column where all rows contain NaN
drop any row where all columns contain NaN
So for example, given a dataframe df like:
Id Col1 Col2 Col3 Col4
0 1 25.0 A NaN 6
1 2 15.0 B NaN 7
2 3 23.0 C NaN 8
3 4 5.0 D NaN 9
4 5 NaN E NaN 10
convert the dataframe by:
df.dropna(axis=1, how="all", inplace=True)
df.dropna(axis=0, how="all", inplace=True)
which yields:
Id Col1 Col2 Col4
0 1 25.0 A 6
1 2 15.0 B 7
2 3 23.0 C 8
3 4 5.0 D 9
4 5 NaN E 10
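The same can also be written without inplace by chaining the two calls (a minimal runnable sketch of the data above):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Id':   [1, 2, 3, 4, 5],
    'Col1': [25.0, 15.0, 23.0, 5.0, np.nan],
    'Col2': list('ABCDE'),
    'Col3': [np.nan] * 5,
    'Col4': [6, 7, 8, 9, 10],
})

# how="all" only drops a column/row when *every* value in it is NaN,
# so the headers and partially filled rows are untouched.
out = df.dropna(axis=1, how="all").dropna(axis=0, how="all")
print(out)
```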
I have a dataframe with several columns, some of which contain NaN values. For each row, I would like to create another column containing the total number of value columns minus the number of NaN values before the first non-NaN value.
Original dataframe:
ID Value0 Value1 Value2 Value3
1 10 10 8 15
2 NaN 45 52 NaN
3 NaN NaN NaN NaN
4 NaN NaN 100 150
The extra column would look like:
ID NewColumn
1 4
2 3
3 0
4 2
Thanks in advance!
Set the index to ID
Attach a non-null column to stop/catch the argmax
Use argmax to find the first non-null value
Subtract those values from the length of the relevant columns
df.assign(
    NewColumn=df.shape[1] - 1 -
              df.set_index('ID').assign(notnull=1).notnull().values.argmax(1)
)
ID Value0 Value1 Value2 Value3 NewColumn
0 1 10.0 10.0 8.0 15.0 4
1 2 NaN 45.0 52.0 NaN 3
2 3 NaN NaN NaN NaN 0
3 4 NaN NaN 100.0 150.0 2
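A self-contained version of the same idea (the notnull sentinel column only exists so that argmax is well-defined for all-NaN rows):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Value0': [10, np.nan, np.nan, np.nan],
    'Value1': [10, 45, np.nan, np.nan],
    'Value2': [8, 52, np.nan, 100],
    'Value3': [15, np.nan, np.nan, 150],
})

values = df.set_index('ID')
# Append an all-True sentinel so argmax finds a hit even on all-NaN rows,
# then argmax returns the position of the first non-null value per row.
first_valid = values.assign(notnull=1).notnull().to_numpy().argmax(axis=1)
df['NewColumn'] = values.shape[1] - first_valid
print(df)
```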