How to get Series list elements vertically in pandas - python-3.x

I need to get a transpose or columnar representation for a list of Series in pandas. Below is the code snippet I have used to form lists from a Series:
series1.index.values.tolist()
series1.values.tolist()
This gives the lists below as output.
Current output:
['A', 'B', ..., 'Z'], [4424180.0, 7463.0, ..., 34]
Required output:
'A' 4424180
'B' 7463

You need reset_index, optionally with rename_axis:
series1 = pd.Series([4424180.0, 7463.0,34], index=['A', 'B', 'Z'])
print (series1)
A 4424180.0
B 7463.0
Z 34.0
dtype: float64
df = series1.rename_axis('a').reset_index(name='b')
print (df)
a b
0 A 4424180.0
1 B 7463.0
2 Z 34.0
df = series1.reset_index()
df.columns = ['a','b']
print (df)
a b
0 A 4424180.0
1 B 7463.0
2 Z 34.0
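If you only need the index/value pairs printed vertically rather than a DataFrame, iterating the Series also works (a minimal sketch, reusing the series1 defined above):
# print each (index, value) pair of the Series on its own line
for idx, val in series1.items():
    print(idx, val)
A 4424180.0
B 7463.0
Z 34.0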

Related

Unique values across columns row-wise in pandas with missing values

I have a dataframe like
import pandas as pd
import numpy as np
df = pd.DataFrame({"Col1": ['A', np.nan, 'B', 'B', 'C'],
                   "Col2": ['A', 'B', 'B', 'A', 'C'],
                   "Col3": ['A', 'B', 'C', 'A', 'C']})
I want to get the unique combinations across columns for each row and create a new column with those values, excluding the missing values.
The code I have right now to do this is
def handle_missing(s):
    return np.unique(s[s.notnull()])

def unique_across_rows(data):
    unique_vals = data.apply(handle_missing, axis=1)
    # numpy unique sorts the values automatically
    merged_vals = unique_vals.apply(lambda x: x[0] if len(x) == 1 else '_'.join(x))
    return merged_vals

df['Combos'] = unique_across_rows(df)
This returns the expected output:
Col1 Col2 Col3 Combos
0 A A A A
1 NaN B B B
2 B B C B_C
3 B A A A_B
4 C C C C
It seems to me that a more vectorized approach to do this should exist within pandas: how could I do that?
You can try a simple list comprehension which might be more efficient for larger dataframes:
df['combos'] = ['_'.join(sorted(k for k in set(v) if pd.notnull(k))) for v in df.values]
Or you can wrap the above list comprehension in a more readable function:
def combos():
    for v in df.values:
        unique = set(filter(pd.notnull, v))
        yield '_'.join(sorted(unique))

df['combos'] = list(combos())
Col1 Col2 Col3 combos
0 A A A A
1 NaN B B B
2 B B C B_C
3 B A A A_B
4 C C C C
You can also use agg/apply on axis=1 like below:
df['Combos'] = df.agg(lambda x: '_'.join(sorted(x.dropna().unique())), axis=1)
print(df)
Col1 Col2 Col3 Combos
0 A A A A
1 NaN B B B
2 B B C B_C
3 B A A A_B
4 C C C C
Try the following (explanation in the comments):
df['Combos'] = (df.stack() # this removes NaN values
.sort_values() # so we have A_B instead of B_A in 3rd row
.groupby(level=0) # group by original index
.agg(lambda x: '_'.join(x.unique())) # join the unique values
)
Output:
Col1 Col2 Col3 Combos
0 A A A A
1 NaN B B B
2 B B C B_C
3 B A A A_B
4 C C C C
Fill the NaN values with a string placeholder '-'. Create a unique array from the Col1, Col2, Col3 list and remove the placeholder, then join the unique array values with a '-':
import pandas as pd
import numpy as np

def unique(list1):
    if '-' in list1:
        list1.remove('-')
    x = np.array(list1)
    return np.unique(x)

df = pd.DataFrame({"Col1": ['A', np.nan, 'B', 'B', 'C'],
                   "Col2": ['A', 'B', 'B', 'A', 'C'],
                   "Col3": ['A', 'B', 'C', 'A', 'C']}).fillna('-')
s = "-"
for key, row in df.iterrows():
    df.loc[key, 'combos'] = s.join(unique([row.Col1, row.Col2, row.Col3]))
print(df.head())
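Note that this variant joins with '-', so row 2 comes out as 'B-C' rather than the 'B_C' in the expected output above. To match it, keep '-' as the NaN placeholder but use '_' as the join separator (a small adjustment, not part of the original answer):
s = "_"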

How to compare a string of one column of pandas with rest of the columns and if value is found in any column of the row append a new row?

I want to compare the Category column with all the predicted_site columns and, if the value matches any column in the row, append a column named rank containing 1 if the value is found, else 0.
Use DataFrame.filter to select the predicted columns, compare them with the category column by DataFrame.eq, convert to integers, change the column names with DataFrame.add_prefix, and last add the new columns with DataFrame.join:
df = pd.DataFrame({
    'category': list('abcabc'),
    'B': [4, 5, 4, 5, 5, 4],
    'predicted1': list('adadbd'),
    'predicted2': list('cbarac')
})
df1 = df.filter(like='predicted').eq(df['category'], axis=0).astype(int).add_prefix('new_')
df = df.join(df1)
print (df)
category B predicted1 predicted2 new_predicted1 new_predicted2
0 a 4 a c 1 0
1 b 5 d b 0 1
2 c 4 a a 0 0
3 a 5 d r 0 0
4 b 5 b a 1 0
5 c 4 d c 0 1
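If you want the single rank column described in the question rather than one indicator column per predicted column, the same idea collapses with any (a minimal sketch, reusing the df above):
# 1 if the category value matches any predicted column in the row, else 0
df['rank'] = df.filter(like='predicted').eq(df['category'], axis=0).any(axis=1).astype(int)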
This solution is much less elegant than the one proposed by @jezrael, but you can try it.
# sample dataframe
d = {'cat': ['comp-el', 'el', 'comp', 'comp-el', 'el', 'comp'], 'predicted1': ['com', 'al', 'p', 'col', 'el', 'comp'], 'predicted2': ['a', 'el', 'p', 'n', 's', 't']}
df = pd.DataFrame(data=d)

# iterating through rows
for i, row in df.iterrows():
    # assigning values
    cat = df.loc[i, 'cat']
    predicted1 = df.loc[i, 'predicted1']
    predicted2 = df.loc[i, 'predicted2']
    # condition
    if (cat == predicted1 or cat == predicted2):
        df.loc[i, 'rank'] = 1
    else:
        df.loc[i, 'rank'] = 0
output:
cat predicted1 predicted2 rank
0 comp-el com a 0.0
1 el al el 1.0
2 comp p p 0.0
3 comp-el col n 0.0
4 el el s 1.0
5 comp comp t 1.0

append one dataframe column value to another dataframe

I have two dataframes. df1 is an empty dataframe and df2 has some data, as shown. A few columns are common to both dfs. I want to append df2's column data into df1's columns. df3 is the expected result.
I have referred to Python + Pandas + dataframe: couldn't append one dataframe to another, but it is not working. It gives the following error:
ValueError: Plan shapes are not aligned
df1:
Empty DataFrame
Columns: [a, b, c, d, e]
Index: []
df2:
c e
0 11 55
1 22 66
df3 (expected output):
   a  b   c  d   e
0        11      55
1        22      66
I tried with append but am not getting the desired result:
import pandas as pd

l1 = ['a', 'b', 'c', 'd', 'e']
l2 = []
df1 = pd.DataFrame(l2, columns=l1)
l3 = ['c', 'e']
l4 = [[11, 55],
      [22, 66]]
df2 = pd.DataFrame(l4, columns=l3)
print("concat", "\n", pd.concat([df1, df2]))  # column order is preserved
print("merge NaN", "\n", pd.merge(df2, df1, how='left', on=l3))  # column order is not preserved
#### Output ####
#concat
a b c d e
0 NaN NaN 11 NaN 55
1 NaN NaN 22 NaN 66
#merge
c e a b d
0 11 55 NaN NaN NaN
1 22 66 NaN NaN NaN
Append seems to work for me. Does this not do what you want?
df1 = pd.DataFrame(columns=['a', 'b', 'c'])
print("df1: ")
print(df1)
df2 = pd.DataFrame(columns=['a', 'c'], data=[[0, 1], [2, 3]])
print("df2:")
print(df2)
print("df1.append(df2):")
print(df1.append(df2, ignore_index=True, sort=False))
Output:
df1:
Empty DataFrame
Columns: [a, b, c]
Index: []
df2:
a c
0 0 1
1 2 3
df1.append(df2):
a b c
0 0 NaN 1
1 2 NaN 3
Have you tried pd.concat?
pd.concat([df1,df2])
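Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions the concat route is the one to use. A minimal sketch that also keeps df1's column order (assuming df1 and df2 as defined in the question):
# concat, then reorder columns to match df1; missing columns are filled with NaN
df3 = pd.concat([df1, df2], ignore_index=True)[df1.columns]
print(df3)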

Merge two columns into one keeping hierarchical structure using pandas or excel writer

I need to collapse two columns into one, preserving the hierarchical structure of the rest, either with pandas alone or with pandas and an Excel writer. I need to transform this:
df = pd.DataFrame({'A': [ 'p', 'p', 'q'], 'B': ['x', 'y', 'z'], 'C': [1, 2, 3]})
df
A B C
0 p x 1
1 p y 2
2 q z 3
To this:
A C
0 p
1 x 1
2 y 2
3 q
4 z 3
UPD.
Thank you for your help. I edited my question and added more details.
It seems you need:
df1 = df.stack().drop_duplicates().reset_index(drop=True).to_frame(name='A')
print (df1)
A
0 p
1 x
2 y
3 q
4 z
Detail:
print (df.stack())
0 A p
B x
1 A p
B y
2 A q
B z
dtype: object
print (df.stack().drop_duplicates())
0 A p
B x
1 B y
2 A q
B z
dtype: object
Or, if you need to remove duplicates only in the first column, you can replace them with NaNs and the stack function will remove those rows:
df = pd.DataFrame({'A': [ 'p', 'p', 'q'], 'B': ['x', 'z', 'z']})
print (df)
A B
0 p x
1 p z
2 q z
df['A'] = df['A'].mask(df['A'].duplicated())
df = df.stack().reset_index(drop=True).to_frame(name='A')
print (df)
A
0 p
1 x
2 z
3 q
4 z
Detail:
df['A'] = df['A'].mask(df['A'].duplicated())
print (df)
A B
0 p x
1 NaN z
2 q z
EDIT:
df1 = (df.set_index('C')
         .stack()
         .reset_index(name='A')
         .drop('level_1', 1)
         .drop_duplicates('A')[['A','C']])
df1['C'] = df1['C'].mask(df1['A'].isin(df['A']), '')
print (df1)
A C
0 p
1 x 1
3 y 2
4 q
5 z 3
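A version note (not part of the original answer): the positional axis argument in .drop('level_1', 1) no longer works in recent pandas; use the keyword form instead:
.drop(columns='level_1')   # instead of .drop('level_1', 1)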
Use stack as mentioned above.
Alternatively,
In [5443]: _, idx = np.unique(df, return_index=True)
In [5444]: pd.DataFrame({'A': df.values.flatten()[np.sort(idx)]})
Out[5444]:
A
0 p
1 x
2 y
3 q
4 z

Filter data iteratively in Python data frame

I'm wondering about existing pandas functionality that I might not have been able to find so far.
Basically, I have a data frame with various columns. I'd like to select specific rows depending on the values of certain columns (FYI: I was interested in the value of column D, which had several parameters described in A-C).
E.g. I want to know which row(s) have A==1 & B==2 & C==5?
df
A B C D
0 1 2 4 a
1 1 2 5 b
2 1 3 4 c
df_result
1 1 2 5 b
So far I have been able to basically reduce this:
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1],
                   'B': [2, 2, 3],
                   'C': [4, 5, 4],
                   'D': ['a', 'b', 'c']})
df_A = df[df['A'] == 1]
df_B = df_A[df_A['B'] == 2]
df_C = df_B[df_B['C'] == 5]
To this:
parameter = [['A', 1],
             ['B', 2],
             ['C', 5]]

df_filtered = df
for x, y in parameter:
    df_filtered = df_filtered[df_filtered[x] == y]
which yielded the same results. But I wonder if there's another way, maybe without a loop, in one line?
You could use the query() method to filter the data and construct the filter expression from the parameters, like:
In [288]: df.query(' and '.join(['{0}=={1}'.format(x[0], x[1]) for x in parameter]))
Out[288]:
A B C D
1 1 2 5 b
Details
In [296]: df
Out[296]:
A B C D
0 1 2 4 a
1 1 2 5 b
2 1 3 4 c
In [297]: query = ' and '.join(['{0}=={1}'.format(x[0], x[1]) for x in parameter])
In [298]: query
Out[298]: 'A==1 and B==2 and C==5'
In [299]: df.query(query)
Out[299]:
A B C D
1 1 2 5 b
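One caveat, not from the original answer: the '{0}=={1}' formatting only produces a valid expression for numeric values; for string values (such as those in column D) you would need to quote them, e.g. with !r:
# repr-quote each value so strings become valid query literals
query = ' and '.join('{0}=={1!r}'.format(c, v) for c, v in parameter)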
Just for information, in case others are interested, I would have done it this way:
import numpy as np
matched = np.all([df[vn] == vv for vn, vv in parameter], axis=0)
df_filtered = df[matched]
But I like the query function better, now that I have seen it, @John Galt.