How to append a column to a dataframe with values from a specified column index [duplicate] - python-3.x

This question already has an answer here:
Vectorized look-up of values in Pandas dataframe
(1 answer)
Closed 2 years ago.
I have a dataframe in which one of the columns is used to designate which of the other columns has the specific value I'm looking for.
df = pd.DataFrame({'COL1':['X','Y','Z'],'COL2':['A','B','C'],'X_SORT':['COL1','COL2','COL1']})
I'm trying to add a new column called 'X_SORT_VALUE' and assign it the value of the column identified in the X_SORT column.
df = df.assign(X_SORT_VALUE=lambda x: (x['X_SORT']))
But I'm getting the value of the X_SORT column:
COL1 COL2 X_SORT X_SORT_VALUE
0 X A COL1 COL1
1 Y B COL2 COL2
2 Z C COL1 COL1
Rather than the value of the column it names, which is what I want:
COL1 COL2 X_SORT X_SORT_VALUE
0 X A COL1 X
1 Y B COL2 B
2 Z C COL1 Z

You need df.lookup here:
df['X_SORT_VALUE'] = df.lookup(df.index, df['X_SORT'])
COL1 COL2 X_SORT X_SORT_VALUE
0 X A COL1 X
1 Y B COL2 B
2 Z C COL1 Z
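Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. On current versions, a minimal sketch of the same row-wise lookup using factorize plus NumPy fancy indexing (along the lines of the replacement recipe the pandas deprecation note suggested):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'COL1': ['X', 'Y', 'Z'],
                   'COL2': ['A', 'B', 'C'],
                   'X_SORT': ['COL1', 'COL2', 'COL1']})

# Map each X_SORT label to a column position, then pick one value
# per row with (row, column) fancy indexing.
idx, cols = pd.factorize(df['X_SORT'])
df['X_SORT_VALUE'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print(df['X_SORT_VALUE'].tolist())  # ['X', 'B', 'Z']
```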

Related

How to find the intersection of a pair of columns in pandas dataframe with pairs in any order?

I have below dataframe
col1 col2
a b
b a
c d
d c
e d
The desired output should be the unique pairs from the two columns, ignoring order:
col1 col2
a b
c d
e d
Convert each row's values to a frozenset (which ignores order), then filter by DataFrame.duplicated with boolean indexing:
df = df[~df[['col1','col2']].apply(frozenset, axis=1).duplicated()]
print (df)
col1 col2
0 a b
2 c d
4 e d
Or you can sort the values with np.sort (requires import numpy as np) and remove duplicates with DataFrame.drop_duplicates; note this sorts the values within each pair:
df = pd.DataFrame(np.sort(df[['col1','col2']]), columns=['col1','col2']).drop_duplicates()
print (df)
col1 col2
0 a b
2 c d
4 d e
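The frozenset approach run end-to-end with the sample data (a sketch), showing that pair order is ignored:

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abcde'),
                   'col2': list('badcd')})

# frozenset is unordered, so {'a','b'} == {'b','a'}; duplicated()
# then flags the later occurrence of each unordered pair.
mask = df[['col1', 'col2']].apply(frozenset, axis=1).duplicated()
out = df[~mask]
print(out)
```

Unlike the np.sort variant, this keeps the original order within each pair and the original index.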

Pandas: Create different dataframes from an unique multiIndex dataframe

I would like to know how to pass from a multiindex dataframe like this:
A B
col1 col2 col1 col2
1 2 12 21
3 1 2 0
To two separate dfs. df_A:
col1 col2
1 2
3 1
df_B:
col1 col2
12 21
2 0
Thank you for the help
I think it is better to use DataFrame.xs here, selecting by the first level:
print (df.xs('A', axis=1, level=0))
col1 col2
0 1 2
1 3 1
What you need is not recommended, but it is possible to create DataFrames by groups (this writes variables into globals()):
for i, g in df.groupby(level=0, axis=1):
    globals()['df_' + str(i)] = g.droplevel(level=0, axis=1)
print (df_A)
col1 col2
0 1 2
1 3 1
Better is to create a dictionary of DataFrames:
d = {i: g.droplevel(level=0, axis=1) for i, g in df.groupby(level=0, axis=1)}
print (d['A'])
col1 col2
0 1 2
1 3 1
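Since groupby(..., axis=1) is deprecated in recent pandas, the same dictionary can also be built by iterating the first level of the columns, e.g. (a sketch):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 12, 21], [3, 1, 2, 0]],
                  columns=pd.MultiIndex.from_product([['A', 'B'],
                                                      ['col1', 'col2']]))

# df[key] selects one top-level group; the result already has the
# second level ('col1', 'col2') as its columns.
d = {key: df[key] for key in df.columns.levels[0]}
print(d['A'])
```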

groupby column in pandas

I am trying to group by a column's values in pandas, but I'm not getting the result I need.
Example:
Col1 Col2 Col3
A 1 2
B 5 6
A 3 4
C 7 8
A 11 12
B 9 10
-----
The result I need, grouping by Col1:
Col1 Col2 Col3
A 1,3,11 2,4,12
B 5,9 6,10
C 7 8
but I am getting this output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000025BEB4D6E50>
I can do this in Excel Power Query with the Group By function, but I can't get the same with Python and pandas. Any help?
Try this
(
    df
    .groupby('Col1')
    .agg(lambda x: ','.join(x.astype(str)))
    .reset_index()
)
it outputs
Col1 Col2 Col3
0 A 1,3,11 2,4,12
1 B 5,9 6,10
2 C 7 8
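The snippet run end-to-end with the sample data (a sketch):

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'A', 'C', 'A', 'B'],
                   'Col2': [1, 5, 3, 7, 11, 9],
                   'Col3': [2, 6, 4, 8, 12, 10]})

# join() needs strings, so cast each group's values first.
out = (df.groupby('Col1')
         .agg(lambda x: ','.join(x.astype(str)))
         .reset_index())
print(out)
```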
Very good. Building on this, I created a solution to aggregate between 0 markers:
df[df['A'] != 0].groupby((df['A'] == 0).cumsum()).sum()
It groups the column between the zeros and sums each run.
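That between-zeros idea run end-to-end (a sketch with hypothetical data):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 0, 3, 4, 0, 5]})

# (df['A'] == 0).cumsum() increments at every 0, so each stretch
# between zeros gets its own group id; drop the zero rows before summing.
out = df[df['A'] != 0].groupby((df['A'] == 0).cumsum())['A'].sum()
print(out.tolist())  # [3, 7, 5]
```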

Grouping corresponding Rows based on One column

I have an Excel Sheet Dataframe with no fixed number of rows and columns.
eg.
Col1 Col2 Col3
A 1 -
A - 2
B 3 -
B - 4
C 5 -
I would like to group rows with the same content in Col1, like the following.
Col1 Col2 Col3
A 1 2
B 3 4
C 5 -
I am using pandas GroupBy, but I'm not getting what I want.
Try using groupby (note: pd.np was removed in pandas 1.0, so import NumPy directly):
print(df.replace('-', np.nan).groupby('Col1', as_index=False).first().fillna('-'))
Output:
Col1 Col2 Col3
0 A 1 2
1 B 3 4
2 C 5 -
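The same approach run end-to-end on a current pandas (a sketch; NumPy is imported directly since pd.np was removed in pandas 1.0):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'C'],
                   'Col2': [1, '-', 3, '-', 5],
                   'Col3': ['-', 2, '-', 4, '-']})

# Treat '-' as missing so first() picks the first real value per
# group; fillna restores the '-' placeholder afterwards.
out = (df.replace('-', np.nan)
         .groupby('Col1', as_index=False)
         .first()
         .fillna('-'))
print(out)
```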

Filter a dataframe with NOT and AND condition

I know this question has been asked multiple times, but for some reason it is not working for my case.
So I want to filter the dataframe using the NOT and AND condition.
For example, my dataframe df looks like:
col1 col2
a 1
a 2
b 3
b 4
b 5
c 6
Now, I want to use a condition to remove where col1 has "a" AND col2 has 2
My resulting dataframe should look like:
col1 col2
a 1
b 3
b 4
b 5
c 6
I tried this, but even though I used &, it removes all the rows which have "a" in col1:
df = df[(df['col1'] != "a") & (df['col2'] != "2")]
To remove cells where col1 is "a" AND col2 is 2 means to keep cells where col1 isn't "a" OR col2 isn't 2 (negation of A AND B is NOT(A) OR NOT(B)):
df = df[(df['col1'] != "a") | (df['col2'] != 2)] # or "2", depending on whether the `2` is an int or a str
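Equivalently, you can negate the AND condition as a whole with ~, which avoids applying De Morgan's law by hand (a sketch; compare against 2 or "2" depending on the column's dtype):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b', 'b', 'c'],
                   'col2': [1, 2, 3, 4, 5, 6]})

# Keep every row EXCEPT those where both conditions hold.
out = df[~((df['col1'] == 'a') & (df['col2'] == 2))]
print(out)
```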
