How to combine different columns in a dataframe using a comprehension (Python) - python-3.x

Suppose a dataframe contains:
attacker_1 attacker_2 attacker_3 attacker_4
Lannister  NaN        NaN        NaN
NaN        Stark      greyjoy    NaN
I want to create another column called AttackerCombo that aggregates the 4 columns into one.
How would I go about writing this in Python?
I have been practicing Python and I reckon a list comprehension of this sort makes sense, but [list(x) for x in attackers],
where attackers is a NumPy array of the 4 columns, aggregates all 4 columns into one while keeping the NaNs, and I would like to remove those as well.
So instead of a row looking like
starknannanlannister it would look like stark/lannister
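For reference, here is a minimal reconstruction of the sample frame (column dtypes are an assumption) that the snippets below can be run against:
import numpy as np
import pandas as pd

# Rebuild the example from the question; NaN marks the missing attackers.
df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})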

I think you need apply with join, removing the NaNs with dropna:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join(x.dropna()), axis=1)
print (df)
attacker_1 attacker_2 attacker_3 attacker_4 attackers
0 Lannister NaN NaN NaN Lannister
1 NaN Stark greyjoy NaN Stark/greyjoy
If you need an empty string as the separator, use DataFrame.fillna:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('') \
.apply(''.join, axis=1)
print (df)
attacker_1 attacker_2 attacker_3 attacker_4 attackers
0 Lannister NaN NaN NaN Lannister
1 NaN Stark greyjoy NaN Starkgreyjoy
Here are another 2 solutions with a list comprehension: the first keeps elements that pass pd.notnull, and the second keeps elements that are strings:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join([e for e in x if pd.notnull(e)]), axis=1)
print (df)
attacker_1 attacker_2 attacker_3 attacker_4 attackers
0 Lannister NaN NaN NaN Lannister
1 NaN Stark greyjoy NaN Stark/greyjoy
# Python 3: isinstance(e, str); Python 2: isinstance(e, basestring)
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join([e for e in x if isinstance(e, str)]), axis=1)
print (df)
attacker_1 attacker_2 attacker_3 attacker_4 attackers
0 Lannister NaN NaN NaN Lannister
1 NaN Stark greyjoy NaN Stark/greyjoy

You can create a new column in the dataframe and fill it with a lambda function:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x : '{}{}{}{}'.format(x[0],x[1],x[2],x[3]), axis=1)
You don't specify how you want to aggregate them; for instance, if you want the values separated by a dash:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x : '{}-{}-{}-{}'.format(x[0],x[1],x[2],x[3]), axis=1)
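Note that str.format renders missing values as the literal text 'nan', so rows with gaps come out like 'Lannister-nan-nan-nan'. A hedged variant (an assumption, not part of the original answer) that skips the missing cells first:
# Drop the NaNs before joining so they never appear as the text 'nan'.
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
    .apply(lambda x: '-'.join(x.dropna().astype(str)), axis=1)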

Related

Create a new dataframe from specific columns

I have a dataframe and I want to use columns to create new rows in a new dataframe.
>>> df_1
mix_id ngs phr d mp1 mp2 mp1_wt mp2_wt mp1_phr mp2_phr
2 M01 SBR2353 100.0 NaN MES/HPD SBR2353 0.253731 0.746269 25.373134 74.626866
3 M02 SBR2054 80.0 NaN TDAE SBR2054 0.264706 0.735294 21.176471 58.823529
I would like to have a dataframe like this.
>>> df_2
mix_id ngs phr d
1 M01 MES/HPD 25.373134 NaN
2 M01 SBR2353 74.626866 NaN
3 M02 TDAE 21.176471 NaN
4 M02 SBR2054 58.823529 NaN
IIUC, you can use pd.wide_to_long; it does, however, need the repeating columns to carry the number as a suffix. So the first part of the solution just renames the columns to move the number to the end:
import re
# rename e.g. mp1_wt to mp_wt1 so the digit becomes a suffix, as pd.wide_to_long expects
df.columns = [col for col in df.columns[:6]] + [re.sub(r'\d', '', col) + str(re.search(r'(\d)', col).group(0)) for col in df.columns[6:]]
df2=pd.wide_to_long(df, stubnames=['mp','mp_wt','mp_phr'], i=['mix_id','ngs','d'], j='val').reset_index().drop(columns='val')
df2.drop(columns=['ngs','phr','mp_wt'], inplace=True)
df2.rename(columns={'mp':'ngs','mp_phr':'phr'}, inplace=True)
df2
mix_id d ngs phr
0 M01 NaN MES/HPD 25.373134
1 M01 NaN SBR2353 74.626866
2 M02 NaN TDAE 21.176471
3 M02 NaN SBR2054 58.823529
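If you want to try the steps above, a minimal sketch of the input frame (values copied from the question; dtypes are an assumption) could look like this:
import numpy as np
import pandas as pd

# Rebuild the wide frame from the question.
df = pd.DataFrame({
    'mix_id': ['M01', 'M02'],
    'ngs': ['SBR2353', 'SBR2054'],
    'phr': [100.0, 80.0],
    'd': [np.nan, np.nan],
    'mp1': ['MES/HPD', 'TDAE'],
    'mp2': ['SBR2353', 'SBR2054'],
    'mp1_wt': [0.253731, 0.264706],
    'mp2_wt': [0.746269, 0.735294],
    'mp1_phr': [25.373134, 21.176471],
    'mp2_phr': [74.626866, 58.823529],
}, index=[2, 3])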

Trying to append a single row of data to a pandas DataFrame, but instead adds rows for each field of input

I am trying to add a row of data to a pandas DataFrame, but it keeps adding a separate row for each piece of data. I feel I am missing something very simple and obvious, but what it is I do not know.
import pandas
colNames = ["ID", "Name", "Gender", "Height", "Weight"]
df1 = pandas.DataFrame(columns = colNames)
df1.set_index("ID", inplace=True, drop=False)
i = df1.shape[0]
person = [{"ID":i},{"Name":"Jack"},{"Gender":"Male"},{"Height":177},{"Weight":75}]
df1 = df1.append(pandas.DataFrame(person, columns=colNames))
print(df1)
Output:
ID Name Gender Height Weight
0 0.0 NaN NaN NaN NaN
1 NaN Jack NaN NaN NaN
2 NaN NaN Male NaN NaN
3 NaN NaN NaN 177.0 NaN
4 NaN NaN NaN NaN 75.0
You are using too many curly braces. All of your data should be inside one pair of braces so that it creates a single Python dictionary. Change that line to:
person = [{"ID":i,"Name":"Jack","Gender":"Male","Height":177,"Weight":75}]

Combine row values in consecutive rows that contain NaN and int values using pandas

I need your help:
I want to merge consecutive rows like this:
Input:
Time ColA ColB Time_for_test[sec]
2020-01-19 08:51:56.461 NaN B NaN
2020-01-19 08:52:15.405 NaN NaN 18.95
2020-01-19 08:52:40.923 A NaN NaN
2020-01-19 08:52:59.589 NaN NaN 18.67
2020-01-19 08:54:07.687 NaN B NaN
Output:
Time ColA ColB Time_for_test[sec]
2020-01-19 08:51:56.461 NaN B NaN
2020-01-19 08:52:15.405 NaN B 18.95
2020-01-19 08:52:40.923 A NaN NaN
2020-01-19 08:52:59.589 A NaN 18.67
2020-01-19 08:54:07.687 NaN B NaN
Of course, I checked whether similar cases had already been published on the site.
I tried adding a new column like this:
merge_df = merge_df.fillNa(0)
merge_df['sum'] = merge_df['TableA']+merge_df['Time_for_ST[sec]'].shift(-1)
It did not work.
Thank you for your patience.
stack and unstack are your friends. Assuming your dataframe index is unique:
df[['ColA', 'ColB']].stack() \
.reset_index(level=1) \
.reindex(df.index) \
.ffill() \
.set_index('level_1', append=True) \
.unstack() \
.droplevel(0, axis=1)
Since it's one long operation chain, you can run just the first line, then the first two lines, then the first three, and so on, to see how it works.
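Here is a minimal reconstruction of the question's frame (timestamps copied from the post, default RangeIndex assumed) to try the chain against:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2020-01-19 08:51:56.461',
                            '2020-01-19 08:52:15.405',
                            '2020-01-19 08:52:40.923',
                            '2020-01-19 08:52:59.589',
                            '2020-01-19 08:54:07.687']),
    'ColA': [np.nan, np.nan, 'A', np.nan, np.nan],
    'ColB': ['B', np.nan, np.nan, np.nan, 'B'],
    'Time_for_test[sec]': [np.nan, 18.95, np.nan, 18.67, np.nan],
})

# The chain returns the forward-filled ColA/ColB block aligned to df's index.
filled = (df[['ColA', 'ColB']].stack()
          .reset_index(level=1)
          .reindex(df.index)
          .ffill()
          .set_index('level_1', append=True)
          .unstack()
          .droplevel(0, axis=1))
print(filled)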

How does the reindex_like function work with method "ffill" & "bfill"?

I have two dataframes, of shape (6,3) and (2,3). I want to reindex the second dataframe like the first and also fill the NaN values with either the ffill or the bfill method. My code is as follows:
df1 = pd.DataFrame(np.random.randn(6,3),columns = ['Col1','Col2','Col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns = ['Col1','Col2','Col3'])
df2 = df2.reindex_like(df1,method='ffill')
But this code is not working as expected, as I am getting the following result:
Col1 Col2 Col3
0 0.578282 -0.199872 0.468505
1 1.086811 -0.707933 -0.924984
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
Any suggestion would be great
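One possible workaround (a hedged sketch, not taken from the thread) is to reindex first and then forward-fill in a separate step:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(6, 3), columns=['Col1', 'Col2', 'Col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['Col1', 'Col2', 'Col3'])

# Reindex to df1's labels first, then fill the newly created NaN rows forward.
df2 = df2.reindex_like(df1).ffill()
print(df2)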

Pandas take value from columns if not NaN

Given the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':['One','Two',np.nan],
'B':[np.nan,np.nan,'Three'],
})
df
A B
0 One NaN
1 Two NaN
2 NaN Three
I'd like to create a column 'C' that takes the value of 'A' or 'B', whichever is not NaN, like this:
A B C
0 One NaN One
1 Two NaN Two
2 NaN Three Three
Thanks in advance!
You can use combine_first:
df['C'] = df.A.combine_first(df.B)
print(df)
A B C
0 One NaN One
1 Two NaN Two
2 NaN Three Three
Or fillna:
df['C'] = df.A.fillna(df.B)
print(df)
A B C
0 One NaN One
1 Two NaN Two
2 NaN Three Three
Or np.where, adding a fallback value (e.g. 1) when both conditions are False:
df['C'] = np.where(df.A.notnull(), df.A, np.where(df.B.notnull(), df.B, 1))
print(df)
A B C
0 One NaN One
1 Two NaN Two
2 NaN Three Three
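If there were more than two source columns, a hedged generalization (an assumption, not part of the original answers) is a row-wise back-fill that keeps the first non-NaN value:
# Back-fill across each row, then take the first column: the first non-NaN of A, B, ...
df['C'] = df[['A', 'B']].bfill(axis=1).iloc[:, 0]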
