Creating a column in one dataframe from another dataframe doesn't transfer missing rows - python-3.x

I have the following two dataframes:
data = {'Name': ['Tom', 'Jack', 'nick', 'juli'], 'marks': [99, 98, 95, 90]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
data = {'salata': ['ntomata', 'tzatziki']}
df2 = pd.DataFrame(data, index=['rank3', 'rank5'])
What I want is to copy the salata column from df2 to df.
df['salata'] = df2['salata']
However, it doesn't copy the missing row rank5 to df.
Update: Thank you for the answers.
What should I use in case the dataframes have different column multiindex levels?
For example:
data = {('Name','Here'): ['Tom', 'Jack', 'nick', 'juli'], ('marks','There'): [99, 98, 95, 90]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
df[('salata','-')] = df2['salata']

Use DataFrame.combine_first:
#all columns
df = df.combine_first(df2)
#only columns in list
#df = df.combine_first(df2[['salata']])
print (df)
       Name  marks    salata
rank1   Tom   99.0       NaN
rank2  Jack   98.0       NaN
rank3  nick   95.0   ntomata
rank4  juli   90.0       NaN
rank5   NaN    NaN  tzatziki
EDIT:
If df has MultiIndex columns, first create a matching MultiIndex in df2, e.g. with MultiIndex.from_product:
df2.columns = pd.MultiIndex.from_product([[''], df2.columns])
df = df.combine_first(df2)
print (df)
                 Name marks
         salata  Here There
rank1       NaN   Tom  99.0
rank2       NaN  Jack  98.0
rank3   ntomata  nick  95.0
rank4       NaN  juli  90.0
rank5  tzatziki   NaN   NaN
Another solution with concat:
df = pd.concat([df, df2], axis=1)
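To see why the plain assignment loses rank5, it may help to compare it with the approaches above side by side. This is a minimal sketch using the question's data; the variable names assigned, combined and concatenated are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Tom", "Jack", "nick", "juli"], "marks": [99, 98, 95, 90]},
    index=["rank1", "rank2", "rank3", "rank4"],
)
df2 = pd.DataFrame({"salata": ["ntomata", "tzatziki"]}, index=["rank3", "rank5"])

# Plain assignment aligns df2["salata"] to df's existing index,
# so a label that exists only in df2 (rank5) is silently dropped.
assigned = df.copy()
assigned["salata"] = df2["salata"]

# combine_first and concat(axis=1) both work on the union of the two indexes.
combined = df.combine_first(df2)
concatenated = pd.concat([df, df2], axis=1)

print(assigned.index.tolist())
print(combined.index.tolist())
print(concatenated.index.tolist())
```

Column assignment never adds rows, which is why a method that builds the union of both indexes is needed here.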

If your indexes are representative of your example, then you can do an outer join:
df = df.join(df2,how='outer')
       Name  marks    salata
rank1   Tom   99.0       NaN
rank2  Jack   98.0       NaN
rank3  nick   95.0   ntomata
rank4  juli   90.0       NaN
rank5   NaN    NaN  tzatziki
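For the MultiIndex case from the update, one more option is to extend df's index first and then assign as in the question. This is a sketch under the assumption that the asker wants to keep the ('salata', '-') column key; it is not one of the answers above, just another way to get the same rows.

```python
import pandas as pd

data = {
    ("Name", "Here"): ["Tom", "Jack", "nick", "juli"],
    ("marks", "There"): [99, 98, 95, 90],
}
df = pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])
df2 = pd.DataFrame({"salata": ["ntomata", "tzatziki"]}, index=["rank3", "rank5"])

# Add the rows that exist only in df2 (rank5); they start out as NaN.
df = df.reindex(df.index.union(df2.index))

# Assignment aligns on the index, which now contains rank5.
df[("salata", "-")] = df2["salata"]
print(df)
```

Because the row labels are added before the assignment, df2 does not need its own MultiIndex columns here.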

Related

Create a new dataframe from specific columns

I have a dataframe and I want to use columns to create new rows in a new dataframe.
>>> df_1
  mix_id      ngs    phr   d      mp1      mp2    mp1_wt    mp2_wt    mp1_phr    mp2_phr
2    M01  SBR2353  100.0 NaN  MES/HPD  SBR2353  0.253731  0.746269  25.373134  74.626866
3    M02  SBR2054   80.0 NaN     TDAE  SBR2054  0.264706  0.735294  21.176471  58.823529
I would like to have a dataframe like this.
>>> df_2
  mix_id      ngs        phr   d
1    M01  MES/HPD  25.373134 NaN
2    M01  SBR2353  74.626866 NaN
3    M02     TDAE  21.176471 NaN
4    M02  SBR2054  58.823529 NaN
IIUC, you can use pd.wide_to_long. It does, however, need the repeating columns to have the number as a suffix, so the first part of the solution just renames the columns to move the number to the end:
import re
df.columns=[col for col in df.columns[:6]] + [re.sub(r'\d','',col) + str(re.search(r'(\d)',col).group(0)) for col in df.columns[6:] ]
# this turns mp1_wt into mp_wt1, to support pd.wide_to_long
df2=pd.wide_to_long(df, stubnames=['mp','mp_wt','mp_phr'], i=['mix_id','ngs','d'], j='val').reset_index().drop(columns='val')
df2.drop(columns=['ngs','phr','mp_wt'], inplace=True)
df2.rename(columns={'mp':'ngs','mp_phr':'phr'}, inplace=True)
df2
  mix_id   d      ngs        phr
0    M01 NaN  MES/HPD  25.373134
1    M01 NaN  SBR2353  74.626866
2    M02 NaN     TDAE  21.176471
3    M02 NaN  SBR2054  58.823529
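If renaming columns for pd.wide_to_long feels heavy for only two repeated groups, a plain pd.concat over the two column groups may be enough. The sketch below is an alternative, not the answer's method; it rebuilds df_1 from the question and assumes that exactly the groups mp1 and mp2 exist.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(
    {
        "mix_id": ["M01", "M02"],
        "ngs": ["SBR2353", "SBR2054"],
        "phr": [100.0, 80.0],
        "d": [np.nan, np.nan],
        "mp1": ["MES/HPD", "TDAE"],
        "mp2": ["SBR2353", "SBR2054"],
        "mp1_wt": [0.253731, 0.264706],
        "mp2_wt": [0.746269, 0.735294],
        "mp1_phr": [25.373134, 21.176471],
        "mp2_phr": [74.626866, 58.823529],
    },
    index=[2, 3],
)

# One slice per group (mp1, mp2), each renamed to the target column names.
parts = [
    df_1[["mix_id", f"mp{n}", f"mp{n}_phr", "d"]].set_axis(
        ["mix_id", "ngs", "phr", "d"], axis=1
    )
    for n in (1, 2)
]

# A stable sort on the original index keeps the mp1 row before the mp2 row
# for each mix; then renumber the rows from 1 as in the desired output.
df_2 = pd.concat(parts).sort_index(kind="mergesort").reset_index(drop=True)
df_2.index = df_2.index + 1
print(df_2)
```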

pandas pivot dataframe, but add unseen values

I have 2 dataframes:
purchases = pd.DataFrame([['Alice', 'sweeties', 4],
['Bob', 'chocolate', 5],
['Alice', 'chocolate', 3],
['Claudia', 'juice', 2]],
columns=['client', 'item', 'quantity'])
goods = pd.DataFrame([['sweeties', 15],
['chocolate', 7],
['juice', 8],
['lemons', 3]], columns=['good', 'price'])
and I want to transform purchases into a table with columns and index like in this picture (image not included):
My first thought was to use pivot:
purchases.pivot(columns="item", values="quantity")
Output:
The problem is: I also need the lemons column in the pivot result because it's present in the goods dataframe (just filled with None values).
How can I accomplish that?
You can chain with reindex:
purchases.pivot(columns="item", values="quantity").reindex(goods['good'], axis=1)
Output:
good  sweeties  chocolate  juice  lemons
0          4.0        NaN    NaN     NaN
1          NaN        5.0    NaN     NaN
2          NaN        3.0    NaN     NaN
3          NaN        NaN    2.0     NaN
You can use df.merge with df.pivot:
In [3626]: x = goods.merge(purchases, left_on='good', right_on='item', how='left')
In [3628]: x['total'] = x.price * x.quantity # you can tweak this calculation
In [3634]: res = x[['good', 'client', 'total']].pivot(index='client', columns='good', values='total').dropna(how='all').fillna(0)
In [3635]: res
Out[3635]:
good     chocolate  juice  lemons  sweeties
client
Alice         21.0    0.0     0.0      60.0
Bob           35.0    0.0     0.0       0.0
Claudia        0.0   16.0     0.0       0.0
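The two ideas above can also be combined: pivot with clients as rows, then reindex the columns against goods. Since the target picture is not available, the client-by-item layout below is an assumption, and the name table is only for illustration.

```python
import pandas as pd

purchases = pd.DataFrame(
    [
        ["Alice", "sweeties", 4],
        ["Bob", "chocolate", 5],
        ["Alice", "chocolate", 3],
        ["Claudia", "juice", 2],
    ],
    columns=["client", "item", "quantity"],
)
goods = pd.DataFrame(
    [["sweeties", 15], ["chocolate", 7], ["juice", 8], ["lemons", 3]],
    columns=["good", "price"],
)

# Pivot to one row per client, then force the column set (and order)
# to match goods; items nobody bought, such as lemons, become all-NaN columns.
table = purchases.pivot(index="client", columns="item", values="quantity").reindex(
    columns=goods["good"]
)
print(table)
```

Note that pivot raises an error if the same client buys the same item twice; in that case pivot_table with an aggregation function would be the usual replacement.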

Summing up two columns of pandas dataframe ignoring NaN

I have a pandas dataframe as below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ORDER':["A", "A"], 'col1':[np.nan, np.nan], 'col2':[np.nan, 5]})
df
ORDER col1 col2
0 A NaN NaN
1 A NaN 5.0
I want to create a column 'new' as sum(col1, col2), ignoring NaN when only one of the columns is NaN.
If both columns are NaN, it should return NaN, as below.
I tried the code below and it works fine. Is there any way to achieve the same with just one line of code?
df['new'] = df[['col1', 'col2']].sum(axis = 1)
df['new'] = np.where(pd.isnull(df['col1']) & pd.isnull(df['col2']), np.nan, df['new'])
df
ORDER col1 col2 new
0 A NaN NaN NaN
1 A NaN 5.0 5.0
Do sum with min_count
df['new'] = df[['col1','col2']].sum(axis=1,min_count=1)
Out[78]:
0 NaN
1 5.0
dtype: float64
Use the add function on the two columns. It takes a fill_value argument that replaces a NaN on one side before adding; if both values are NaN, the result is still NaN:
df['col1'].add(df['col2'], fill_value=0)
0 NaN
1 5.0
dtype: float64
Is this ok? Note that it also turns a genuine sum of 0 into NaN:
df['new'] = df[['col1', 'col2']].sum(axis = 1).replace(0,np.nan)
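The three suggestions agree on the question's data but not in general. The sketch below runs them side by side on a slightly larger frame; the two extra rows (2 + 3 and -5 + 5) are hypothetical and were added only to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [np.nan, np.nan, 2.0, -5.0],
        "col2": [np.nan, 5.0, 3.0, 5.0],
    }
)

# NaN only when fewer than one non-NaN value is present in the row.
with_min_count = df[["col1", "col2"]].sum(axis=1, min_count=1)

# fill_value replaces a one-sided NaN; a NaN on both sides stays NaN.
with_add = df["col1"].add(df["col2"], fill_value=0)

# Replacing 0 afterwards cannot tell "both NaN" from a real sum of zero.
with_replace = df[["col1", "col2"]].sum(axis=1).replace(0, np.nan)

print(pd.DataFrame({"min_count": with_min_count, "add": with_add, "replace": with_replace}))
```

On the last row, the first two approaches keep the real sum of 0.0, while the replace version reports NaN.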

Trying to append a single row of data to a pandas DataFrame, but instead adds rows for each field of input

I am trying to add a row of data to a pandas DataFrame, but it keeps adding a separate row for each piece of data. I feel I am missing something very simple and obvious, but what it is I do not know.
import pandas
colNames = ["ID", "Name", "Gender", "Height", "Weight"]
df1 = pandas.DataFrame(columns = colNames)
df1.set_index("ID", inplace=True, drop=False)
i = df1.shape[0]
person = [{"ID":i},{"Name":"Jack"},{"Gender":"Male"},{"Height":177},{"Weight":75}]
df1 = df1.append(pandas.DataFrame(person, columns=colNames))
print(df1)
Output:
    ID  Name Gender  Height  Weight
0  0.0   NaN    NaN     NaN     NaN
1  NaN  Jack    NaN     NaN     NaN
2  NaN   NaN   Male     NaN     NaN
3  NaN   NaN    NaN   177.0     NaN
4  NaN   NaN    NaN     NaN    75.0
You are using too many curly braces. Each pair of braces creates a separate Python dictionary, and each dictionary in the list becomes its own row. Put all of your data inside one pair of braces so that the list holds a single dictionary. Change that line to:
person = [{"ID":i,"Name":"Jack","Gender":"Male","Height":177,"Weight":75}]
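One caveat for newer installations: DataFrame.append was removed in pandas 2.0, so the append call in the question no longer runs there. A minimal sketch of the same idea with pd.concat follows; the second person ("Jill" and her values) is hypothetical and only there to show a row being added to a non-empty frame.

```python
import pandas as pd

col_names = ["ID", "Name", "Gender", "Height", "Weight"]

# One dictionary per person: one dictionary becomes one row.
people = [{"ID": 0, "Name": "Jack", "Gender": "Male", "Height": 177, "Weight": 75}]
df1 = pd.DataFrame(people, columns=col_names)

# Hypothetical second person, wrapped in a one-row frame and concatenated.
new_row = pd.DataFrame(
    [{"ID": len(df1), "Name": "Jill", "Gender": "Female", "Height": 165, "Weight": 60}],
    columns=col_names,
)
df1 = pd.concat([df1, new_row], ignore_index=True)
print(df1)
```

When many rows are added in a loop, collecting the dictionaries in a list and building the frame once at the end is usually cheaper than concatenating repeatedly.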

How to combine different columns in a dataframe using comprehension-python

Suppose a dataframe contains
attacker_1  attacker_2  attacker_3  attacker_4
Lannister   nan         nan         nan
nan         Stark       greyjoy     nan
I want to create another column called AttackerCombo that aggregates the 4 columns into 1 column.
How would I go about defining such code in python?
I have been practicing Python and I reckon a list comprehension makes sense here, but [list(x) for x in attackers], where attackers is a numpy array of the 4 columns, combines all 4 columns into 1 column with the NaNs included. I would like to remove all the NaNs as well.
So the result for each row, instead of looking like starknannanlannister, would look like stark/lannister.
I think you need apply with join, removing the NaN values with dropna:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join(x.dropna()), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy
If you need an empty string as the separator, use DataFrame.fillna:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('') \
.apply(''.join, axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4     attackers
0  Lannister        NaN        NaN         NaN     Lannister
1        NaN      Stark    greyjoy         NaN  Starkgreyjoy
Another 2 solutions with list comprehension - first compare by notnull and second check if string:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join([e for e in x if pd.notnull(e)]), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy
#python 3 - isinstance(e, str), python 2 - isinstance(e, basestring)
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join([e for e in x if isinstance(e, str)]), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy
You can set a new column in the dataframe and fill it with a lambda function. Note that this version does not drop the NaN values; they end up in the result as the text nan:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x : '{}{}{}{}'.format(*x), axis=1)
You don't specify how you want to aggregate them, so for instance, if you want them separated by a dash:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x : '{}-{}-{}-{}'.format(*x), axis=1)
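Since the question asked about a list comprehension specifically, here is a minimal sketch of that route without apply. It rebuilds the two example rows, uses the column name AttackerCombo from the question, and assumes the missing entries are real NaN values rather than the text "nan".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "attacker_1": ["Lannister", np.nan],
        "attacker_2": [np.nan, "Stark"],
        "attacker_3": [np.nan, "greyjoy"],
        "attacker_4": [np.nan, np.nan],
    }
)
cols = ["attacker_1", "attacker_2", "attacker_3", "attacker_4"]

# Outer comprehension: one item per row of the 4 attacker columns.
# Inner generator: keep only the non-missing names, then join them with "/".
df["AttackerCombo"] = [
    "/".join(v for v in row if pd.notna(v)) for row in df[cols].to_numpy()
]
print(df["AttackerCombo"].tolist())
```

A row in which all four columns are NaN would produce an empty string here, which may or may not be what is wanted.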
