How to get Series list elements vertically in pandas - python-3.x

I need to get a transpose or columnar representation for a list of Series in pandas. Below is the code snippet I have used to form lists from a Series:
series1.index.values.tolist()
series1.values.tolist()
This gives the lists below as output.
Current output:
['A', 'B', ..., 'Z'], [4424180.0, 7463.0, ..., 34]
Required output:
'A' 4424180
'B' 7463

You need reset_index, optionally with rename_axis:
series1 = pd.Series([4424180.0, 7463.0,34], index=['A', 'B', 'Z'])
print (series1)
A 4424180.0
B 7463.0
Z 34.0
dtype: float64
df = series1.rename_axis('a').reset_index(name='b')
print (df)
a b
0 A 4424180.0
1 B 7463.0
2 Z 34.0
df = series1.reset_index()
df.columns = ['a','b']
print (df)
a b
0 A 4424180.0
1 B 7463.0
2 Z 34.0
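If you only need the index/value pairs printed vertically rather than a DataFrame, iterating the Series also works (a minimal sketch, reusing the series1 defined above):
# print each (index, value) pair of the Series on its own line
for idx, val in series1.items():
    print(idx, val)
A 4424180.0
B 7463.0
Z 34.0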

Related

Unique values across columns row-wise in pandas with missing values

I have a dataframe like
import pandas as pd
import numpy as np
df = pd.DataFrame({"Col1": ['A', np.nan, 'B', 'B', 'C'],
                   "Col2": ['A', 'B', 'B', 'A', 'C'],
                   "Col3": ['A', 'B', 'C', 'A', 'C']})
I want to get the unique combinations across columns for each row and create a new column with those values, excluding the missing values.
The code I have right now to do this is
def handle_missing(s):
    return np.unique(s[s.notnull()])

def unique_across_rows(data):
    unique_vals = data.apply(handle_missing, axis=1)
    # numpy unique sorts the values automatically
    merged_vals = unique_vals.apply(lambda x: x[0] if len(x) == 1 else '_'.join(x))
    return merged_vals

df['Combos'] = unique_across_rows(df)
This returns the expected output:
Col1 Col2 Col3 Combos
0 A A A A
1 NaN B B B
2 B B C B_C
3 B A A A_B
4 C C C C
It seems to me that a more vectorized approach to do this should exist within pandas: how could I do that?
You can try a simple list comprehension which might be more efficient for larger dataframes:
df['combos'] = ['_'.join(sorted(k for k in set(v) if pd.notnull(k))) for v in df.values]
Or you can wrap the above list comprehension in a more readable function:
def combos():
    for v in df.values:
        unique = set(filter(pd.notnull, v))
        yield '_'.join(sorted(unique))

df['combos'] = list(combos())
Col1 Col2 Col3 combos
0 A A A A
1 NaN B B B
2 B B C B_C
3 B A A A_B
4 C C C C
You can also use agg/apply on axis=1 like below:
df['Combos'] = df.agg(lambda x: '_'.join(sorted(x.dropna().unique())), axis=1)
print(df)
Col1 Col2 Col3 Combos
0 A A A A
1 NaN B B B
2 B B C B_C
3 B A A A_B
4 C C C C
Try the following (explanation in the comments):
df['Combos'] = (df.stack() # this removes NaN values
.sort_values() # so we have A_B instead of B_A in 3rd row
.groupby(level=0) # group by original index
.agg(lambda x: '_'.join(x.unique())) # join the unique values
)
Output:
Col1 Col2 Col3 Combos
0 A A A A
1 NaN B B B
2 B B C B_C
3 B A A A_B
4 C C C C
Fill the NaN values with a string placeholder '-'. Create a unique array from the Col1, Col2, Col3 list and remove the placeholder, then join the unique array values with a '-':
import pandas as pd
import numpy as np

def unique(list1):
    if '-' in list1:
        list1.remove('-')
    x = np.array(list1)
    return np.unique(x)

df = pd.DataFrame({"Col1": ['A', np.nan, 'B', 'B', 'C'],
                   "Col2": ['A', 'B', 'B', 'A', 'C'],
                   "Col3": ['A', 'B', 'C', 'A', 'C']}).fillna('-')
s = "-"
for key, row in df.iterrows():
    df.loc[key, 'combos'] = s.join(unique([row.Col1, row.Col2, row.Col3]))
print(df.head())
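Note that this variant joins with '-', so row 2 comes out as 'B-C' rather than the 'B_C' in the expected output above. To match it, keep '-' as the NaN placeholder but use '_' as the join separator (a small adjustment, not part of the original answer):
s = "_"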

How to compare a string of one column of pandas with rest of the columns and if value is found in any column of the row append a new row?

I want to compare the Category column with all the predicted_site columns and, if the value matches any column in the row, append a column named rank containing 1 if the value is found, else 0.
Use DataFrame.filter to select the predicted columns, compare them with the category column by DataFrame.eq, convert to integers, change the column names with DataFrame.add_prefix, and last add the new columns with DataFrame.join:
df = pd.DataFrame({
    'category': list('abcabc'),
    'B': [4, 5, 4, 5, 5, 4],
    'predicted1': list('adadbd'),
    'predicted2': list('cbarac')
})
df1 = df.filter(like='predicted').eq(df['category'], axis=0).astype(int).add_prefix('new_')
df = df.join(df1)
print (df)
category B predicted1 predicted2 new_predicted1 new_predicted2
0 a 4 a c 1 0
1 b 5 d b 0 1
2 c 4 a a 0 0
3 a 5 d r 0 0
4 b 5 b a 1 0
5 c 4 d c 0 1
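If you want the single rank column described in the question rather than one indicator column per predicted column, the same idea collapses with any (a minimal sketch, reusing the df above):
# 1 if the category value matches any predicted column in the row, else 0
df['rank'] = df.filter(like='predicted').eq(df['category'], axis=0).any(axis=1).astype(int)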
This solution is much less elegant than the one proposed by @jezrael, but you can try it.
# sample dataframe
d = {'cat': ['comp-el', 'el', 'comp', 'comp-el', 'el', 'comp'], 'predicted1': ['com', 'al', 'p', 'col', 'el', 'comp'], 'predicted2': ['a', 'el', 'p', 'n', 's', 't']}
df = pd.DataFrame(data=d)

# iterating through rows
for i, row in df.iterrows():
    # assigning values
    cat = df.loc[i, 'cat']
    predicted1 = df.loc[i, 'predicted1']
    predicted2 = df.loc[i, 'predicted2']
    # condition
    if (cat == predicted1 or cat == predicted2):
        df.loc[i, 'rank'] = 1
    else:
        df.loc[i, 'rank'] = 0
output:
cat predicted1 predicted2 rank
0 comp-el com a 0.0
1 el al el 1.0
2 comp p p 0.0
3 comp-el col n 0.0
4 el el s 1.0
5 comp comp t 1.0

append one dataframe column value to another dataframe

I have two dataframes. df1 is an empty dataframe and df2 has some data, as shown. A few columns are common to both dfs. I want to append df2's column data into df1's columns. df3 is the expected result.
I have referred to Python + Pandas + dataframe: couldn't append one dataframe to another, but it is not working. It gives the following error:
ValueError: Plan shapes are not aligned
df1:
Empty DataFrame
Columns: [a, b, c, d, e]
Index: []
df2:
c e
0 11 55
1 22 66
df3 (expected output):
   a  b   c  d   e
0        11      55
1        22      66
I tried with append but am not getting the desired result:
import pandas as pd

l1 = ['a', 'b', 'c', 'd', 'e']
l2 = []
df1 = pd.DataFrame(l2, columns=l1)
l3 = ['c', 'e']
l4 = [[11, 55],
      [22, 66]]
df2 = pd.DataFrame(l4, columns=l3)
print("concat", "\n", pd.concat([df1, df2]))  # column order is preserved
print("merge NaN", "\n", pd.merge(df2, df1, how='left', on=l3))  # column order is not preserved
#### Output ####
#concat
a b c d e
0 NaN NaN 11 NaN 55
1 NaN NaN 22 NaN 66
#merge
c e a b d
0 11 55 NaN NaN NaN
1 22 66 NaN NaN NaN
Append seems to work for me. Does this not do what you want?
df1 = pd.DataFrame(columns=['a', 'b', 'c'])
print("df1: ")
print(df1)
df2 = pd.DataFrame(columns=['a', 'c'], data=[[0, 1], [2, 3]])
print("df2:")
print(df2)
print("df1.append(df2):")
print(df1.append(df2, ignore_index=True, sort=False))
Output:
df1:
Empty DataFrame
Columns: [a, b, c]
Index: []
df2:
a c
0 0 1
1 2 3
df1.append(df2):
a b c
0 0 NaN 1
1 2 NaN 3
Have you tried pd.concat?
pd.concat([df1,df2])
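Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions the concat route is the one to use. A minimal sketch that also keeps df1's column order (assuming df1 and df2 as defined in the question):
# concat, then reorder columns to match df1; missing columns are filled with NaN
df3 = pd.concat([df1, df2], ignore_index=True)[df1.columns]
print(df3)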

Merge two columns into one keeping hierarchical structure using pandas or excel writer

I need to collapse two columns into one, preserving the hierarchical structure of the rest, either with pandas alone or with pandas and an Excel writer. I need to transform this:
df = pd.DataFrame({'A': [ 'p', 'p', 'q'], 'B': ['x', 'y', 'z'], 'C': [1, 2, 3]})
df
A B C
0 p x 1
1 p y 2
2 q z 3
To this:
A C
0 p
1 x 1
2 y 2
3 q
4 z 3
UPD.
Thank you for your help. I edited my question and added more details.
It seems you need:
df1 = df.stack().drop_duplicates().reset_index(drop=True).to_frame(name='A')
print (df1)
A
0 p
1 x
2 y
3 q
4 z
Detail:
print (df.stack())
0 A p
B x
1 A p
B y
2 A q
B z
dtype: object
print (df.stack().drop_duplicates())
0 A p
B x
1 B y
2 A q
B z
dtype: object
Or, if you need to remove duplicates only in the first column, you can replace them with NaNs and the stack function will remove those rows:
df = pd.DataFrame({'A': [ 'p', 'p', 'q'], 'B': ['x', 'z', 'z']})
print (df)
A B
0 p x
1 p z
2 q z
df['A'] = df['A'].mask(df['A'].duplicated())
df = df.stack().reset_index(drop=True).to_frame(name='A')
print (df)
A
0 p
1 x
2 z
3 q
4 z
Detail:
df['A'] = df['A'].mask(df['A'].duplicated())
print (df)
A B
0 p x
1 NaN z
2 q z
EDIT:
df1 = (df.set_index('C')
         .stack()
         .reset_index(name='A')
         .drop('level_1', 1)
         .drop_duplicates('A')[['A','C']])
df1['C'] = df1['C'].mask(df1['A'].isin(df['A']), '')
print (df1)
A C
0 p
1 x 1
3 y 2
4 q
5 z 3
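A version note (not part of the original answer): the positional axis argument in .drop('level_1', 1) no longer works in recent pandas; use the keyword form instead:
.drop(columns='level_1')   # instead of .drop('level_1', 1)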
Use stack as mentioned above.
Alternatively,
In [5443]: _, idx = np.unique(df, return_index=True)
In [5444]: pd.DataFrame({'A': df.values.flatten()[np.sort(idx)]})
Out[5444]:
A
0 p
1 x
2 y
3 q
4 z

Filter data iteratively in Python data frame

I'm wondering about existing pandas functionality that I might not have been able to find so far.
Basically, I have a data frame with various columns. I'd like to select specific rows depending on the values of certain columns (FYI: I was interested in the value of column D, which had several parameters described in A-C).
E.g. I want to know which row(s) have A==1 & B==2 & C==5?
df
A B C D
0 1 2 4 a
1 1 2 5 b
2 1 3 4 c
df_result
1 1 2 5 b
So far I have been able to basically reduce this:
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1],
                   'B': [2, 2, 3],
                   'C': [4, 5, 4],
                   'D': ['a', 'b', 'c']})
df_A = df[df['A'] == 1]
df_B = df_A[df_A['B'] == 2]
df_C = df_B[df_B['C'] == 5]
To this:
parameter = [['A', 1],
             ['B', 2],
             ['C', 5]]

df_filtered = df
for x, y in parameter:
    df_filtered = df_filtered[df_filtered[x] == y]
which yielded the same results. But I wonder if there's another way, maybe without a loop, in one line?
You could use the query() method to filter the data and construct the filter expression from the parameters, like:
In [288]: df.query(' and '.join(['{0}=={1}'.format(x[0], x[1]) for x in parameter]))
Out[288]:
A B C D
1 1 2 5 b
Details
In [296]: df
Out[296]:
A B C D
0 1 2 4 a
1 1 2 5 b
2 1 3 4 c
In [297]: query = ' and '.join(['{0}=={1}'.format(x[0], x[1]) for x in parameter])
In [298]: query
Out[298]: 'A==1 and B==2 and C==5'
In [299]: df.query(query)
Out[299]:
A B C D
1 1 2 5 b
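One caveat, not from the original answer: the '{0}=={1}' formatting only produces a valid expression for numeric values; for string values (such as those in column D) you would need to quote them, e.g. with !r:
# repr-quote each value so strings become valid query literals
query = ' and '.join('{0}=={1!r}'.format(c, v) for c, v in parameter)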
Just for information, in case others are interested, I would have done it this way:
import numpy as np
matched = np.all([df[vn] == vv for vn, vv in parameter], axis=0)
df_filtered = df[matched]
But I like the query function better, now that I have seen it, @John Galt.