I have two dataframe of shape (6,3) & (2,3). Now I want to reindex second dataframe like first dataframe and also fill na values with either ffill method or bfill method. my code is as follows:
df1 = pd.DataFrame(np.random.randn(6,3),columns = ['Col1','Col2','Col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns = ['Col1','Col2','Col3'])
df2 = df2.reindex_like(df1,method='ffill')
But this code is not working well as I am getting following result:
Col1 Col2 Col3
0 0.578282 -0.199872 0.468505
1 1.086811 -0.707933 -0.924984
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
Any suggestion would be great
Related
I have a pandas dataframe as below:
import pandas as pd
df = pd.DataFrame({'ORDER':["A", "A"], 'col1':[np.nan, np.nan], 'col2':[np.nan, 5]})
df
ORDER col1 col2
0 A NaN NaN
1 A NaN 5.0
I want to create a column 'new' as sum(col1, col2) ignoring Nan only if one of the column as Nan,
If both of the columns have NaN value, it should return NaN as below
I tried the below code and it works fine. Is there any way to achieve the same with just one line of code.
df['new'] = df[['col1', 'col2']].sum(axis = 1)
df['new'] = np.where(pd.isnull(df['col1']) & pd.isnull(df['col2']), np.nan, df['new'])
df
ORDER col1 col2 new
0 A NaN NaN NaN
1 A NaN 5.0 5.0
Do sum with min_count
df['new'] = df[['col1','col2']].sum(axis=1,min_count=1)
Out[78]:
0 NaN
1 5.0
dtype: float64
Use the add function on the two columns, which takes a fill_value argument that lets you replace NaN:
df['col1'].add(df['col2'], fill_value=0)
0 NaN
1 5.0
dtype: float64
Is this ok?
df['new'] = df[['col1', 'col2']].sum(axis = 1).replace(0,np.nan)
How do i remove nan values from dataframe in Python? I already tried with dropna(), but that did not work for me. Also is NaN diffferent from nan. I am using Pandas.
While printing the data frame it does not print as NaN but instead as nan.
1 2.11358 0.649067060588935
2 nan 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 nan
You can change nan values with NaN using replace() and then use dropna().
import numpy as np
df = df.replace('nan', np.nan)
df = df.dropna()
Update:
Original dataframe:
1 2.11358 0.649067060588935
2 nan 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 nan
Applied df.replace('nan', np.nan):
1 2.11358 0.649067060588935
2 NaN 0.6094130485307419
3 2.10066 0.3653980276694516
4 2.10545 NaN
Applied df.dropna():
1 2.11358 0.649067060588935
3 2.10066 0.3653980276694516
I am trying to add a row of data to a pandas DataFrame, but it keeps adding a separate row for each piece of data. I feel I am missing something very simple and obvious, but what it is I do not know.
import pandas
colNames = ["ID", "Name", "Gender", "Height", "Weight"]
df1 = pandas.DataFrame(columns = colNames)
df1.set_index("ID", inplace=True, drop=False)
i = df1.shape[0]
person = [{"ID":i},{"Name":"Jack"},{"Gender":"Male"},{"Height":177},{"Weight":75}]
df1 = df1.append(pandas.DataFrame(person, columns=colNames))
print(df1)
Output:
ID Name Gender Height Weight
0 0.0 NaN NaN NaN NaN
1 NaN Jack NaN NaN NaN
2 NaN NaN Male NaN NaN
3 NaN NaN NaN 177.0 NaN
4 NaN NaN NaN NaN 75.0
You are using too many squiggly brackets. All of your data should be inside one pair of squiggly brackets. This creates a single python dictionary. Change that line to:
person = [{"ID":i,"Name":"Jack","Gender":"Male","Height":177,"Weight":75}]
I need your help:
I want to merge consecutive rows like this:
Input:
Time ColA ColB Time_for_test[sec]
2020-01-19 08:51:56.461 NaN B NaN
2020-01-19 08:52:15.405 NaN NaN 18.95
2020-01-19 08:52:40.923 A NaN NaN
2020-01-19 08:52:59.589 NaN NaN 18.67
2020-01-19 08:54:07.687 NaN B NaN
Output:
Time ColA ColB Time_for_test[sec]
2020-01-19 08:51:56.461 NaN B NaN
2020-01-19 08:52:15.405 NaN B 18.95
2020-01-19 08:52:40.923 A NaN NaN
2020-01-19 08:52:59.589 A NaN 18.67
2020-01-19 08:54:07.687 NaN B NaN
Of course, I checked if exist similar cases published on the site:
I tried one adding a new column like that:
merge_df = merge_df.fillNa(0)
merge_df['sum'] = merge_df['TableA']+merge_df['Time_for_ST[sec]'].shift(-1)
It did not work.
Thank you for patience
stack and unstack are your friends. Assuming your dataframe index is unique:
df[['ColA', 'ColB']].stack() \
.reset_index(level=1) \
.reindex(df.index) \
.ffill() \
.set_index('level_1', append=True) \
.unstack() \
.droplevel(0, axis=1)
Since it's one long operation chain, you can run only line 1, then line 1,2, then 1,2,3.... to see how it works.
I'm appending data from a list to pandas df. I keep getting NaN in my entries.
Based on what I've read I think I might have to mention the data type for each column in my code.
dumps = [];features_df = pd.DataFrame()
for i in range (int(len(ids)/50)):
dumps = sp.audio_features(ids[i*50:50*(i+1)])
for i in range (len(dumps)):
print(list(dumps[0].values()))
features_df = features_df.append(list(dumps[0].values()), ignore_index = True)
Expected results, something like-
[0.833, 0.539, 11, -7.399, 0, 0.178, 0.163, 2.1e-06, 0.101, 0.385, 99.947, 'audio_features', '6MWtB6iiXyIwun0YzU6DFP', 'spotify:track:6MWtB6iiXyIwun0YzU6DFP', 'https://api.spotify.com/v1/tracks/6MWtB6iiXyIwun0YzU6DFP', 'https://api.spotify.com/v1/audio-analysis/6MWtB6iiXyIwun0YzU6DFP', 149520, 4]
for one row.
Actual-
danceability energy ... duration_ms time_signature
0 NaN NaN ... NaN NaN
1 NaN NaN ... NaN NaN
2 NaN NaN ... NaN NaN
3 NaN NaN ... NaN NaN
4 NaN NaN ... NaN NaN
5 NaN NaN ... NaN NaN
For all rows
append() strategy in a tight loop isn't a great way to do this. Rather, you can construct an empty DataFrame and then use loc to specify an insertion point. The DataFrame index should be used.
For example:
import pandas as pd
df = pd.DataFrame(data=[], columns=['n'])
for i in range(100):
df.loc[i] = i
print(df)
time python3 append_df.py
n
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
real 0m13.178s
user 0m12.287s
sys 0m0.617s
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.