How to create a column based on string positions of another column in Python? - python-3.x

df:
Col_A
0 011011
1 000111
2 011000
3 011111
4 011010
How can I create a column based on string positions? For each value in Col_A I need to find the positions where the character is '0' and list them in a new column Col_B.
Output:
Col_A Col_B
0 011011 pos1,pos4
1 000111 pos1,pos2,pos3
2 011000 pos1,pos4,pos5,pos6
3 011111 pos1
4 011010 pos1,pos4,pos6

First convert the strings to a DataFrame and add column names with a function passed to rename:
f = lambda x: f'pos{x+1}'
df1 = pd.DataFrame([list(x) for x in df['Col_A']], index=df.index).rename(columns=f)
print (df1)
pos1 pos2 pos3 pos4 pos5 pos6
0 0 1 1 0 1 1
1 0 0 0 1 1 1
2 0 1 1 0 0 0
3 0 1 1 1 1 1
4 0 1 1 0 1 0
Then compare against '0' with DataFrame.eq and, for the new column, use matrix multiplication via DataFrame.dot, removing the trailing separator with Series.str.rstrip:
df['Col_B'] = df1.eq('0').dot(df1.columns + ',').str.rstrip(',')
print (df)
Col_A Col_B
0 011011 pos1,pos4
1 000111 pos1,pos2,pos3
2 011000 pos1,pos4,pos5,pos6
3 011111 pos1
4 011010 pos1,pos4,pos6
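
Putting the two steps together, a self-contained sketch you can paste and run (it constructs the sample frame from the question first):
import pandas as pd

# sample data from the question
df = pd.DataFrame({'Col_A': ['011011', '000111', '011000', '011111', '011010']})

# one column per character position, named pos1, pos2, ...
f = lambda x: f'pos{x + 1}'
df1 = pd.DataFrame([list(x) for x in df['Col_A']], index=df.index).rename(columns=f)

# True where the character is '0'; the dot product with the column names joins them per row
df['Col_B'] = df1.eq('0').dot(df1.columns + ',').str.rstrip(',')
print(df)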

Related

How to count occurrences of a particular value in a given column corresponding to another column

To count the occurrences of each value in the given column per group, use pd.crosstab together with a row-wise sum:
In [236]: output = pd.crosstab(df['Rel_ID'], df['Values'])
In [238]: output['total'] = output.sum(axis=1)
In [239]: output
Out[239]:
Values 400.0 500.0 1700.0 6300.0 total
Rel_ID
TESTA 1 1 1 1 4
TESTB 1 0 1 1 3
TESTC 0 1 1 0 2
TESTD 1 0 1 1 3
TESTE 1 1 0 0 2
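
The input frame is not shown in the answer; a minimal reproduction (the data below is made up only to match the crosstab output above) might look like this:
import pandas as pd

# hypothetical long-format input: one row per (Rel_ID, Values) occurrence
df = pd.DataFrame({
    'Rel_ID': ['TESTA', 'TESTA', 'TESTA', 'TESTA',
               'TESTB', 'TESTB', 'TESTB',
               'TESTC', 'TESTC',
               'TESTD', 'TESTD', 'TESTD',
               'TESTE', 'TESTE'],
    'Values': [400.0, 500.0, 1700.0, 6300.0,
               400.0, 1700.0, 6300.0,
               500.0, 1700.0,
               400.0, 1700.0, 6300.0,
               400.0, 500.0],
})

output = pd.crosstab(df['Rel_ID'], df['Values'])
output['total'] = output.sum(axis=1)   # total count per Rel_ID
print(output)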

How to identify where a particular sequence in a row occurs for the first time

I have a dataframe in pandas, an example of which is provided below:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6
A 1 0 0 1 0 0
B 1 1 0 0 1 0
C 1 0 1 1 0 0
D 0 0 1 0 0 1
E 1 1 1 1 1 1
As you can see, 1s and 0s occur in different columns. I would like Python code that finds the column number where the pattern 1 0 0 occurs for the first time. For example, for person A the first 1 0 0 pattern starts at appear_1, so the first occurrence is 1. Similarly, for person B the first 1 0 0 pattern starts at appear_2, so the first occurrence is 2. The resulting table should have a new column named 'first_occurrence'. If no 1 0 0 pattern occurs at all (as in row E), the value in the first_occurrence column should be the number of 1s in that row. The resulting table should look something like this:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6 first_occurrence
A 1 0 0 1 0 0 1
B 1 1 0 0 1 0 2
C 1 0 1 1 0 0 4
D 0 0 1 0 0 1 3
E 1 1 1 1 1 1 6
Thank you in advance.
I try not to reinvent the wheel, so I build on my answer to a previous question. On top of that answer, you additionally need idxmax, np.where, and Index.get_indexer:
import numpy as np

cols = ['appear_1', 'appear_2', 'appear_3', 'appear_4', 'appear_5', 'appear_6']
df1 = df[cols]
m = df1[df1.eq(1)].ffill(axis=1).notna()           # mask out everything before the first 1
df2 = df1[m].bfill(axis=1).eq(0)                   # mark the 0s inside that region
m2 = df2 & df2.shift(-1, axis=1, fill_value=True)  # a 0 followed by another 0 (a trailing 0 also counts)
df['first_occurrence'] = np.where(m2.any(axis=1),
                                  df1.columns.get_indexer(m2.idxmax(axis=1)),
                                  df1.shape[1])
Out[540]:
Person appear_1 appear_2 appear_3 appear_4 appear_5 appear_6 first_occurrence
0 A 1 0 0 1 0 0 1
1 B 1 1 0 0 1 0 2
2 C 1 0 1 1 0 0 4
3 D 0 0 1 0 0 1 3
4 E 1 1 1 1 1 1 6
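
An alternative sketch (not the original answer's approach, but it reproduces the sample output; check it against your own edge cases): join each row into a string of digits and look for the literal substring '100', falling back to the count of 1s when it is absent.
import numpy as np

cols = ['appear_1', 'appear_2', 'appear_3', 'appear_4', 'appear_5', 'appear_6']
s = df[cols].astype(str).apply(''.join, axis=1)    # e.g. '100100' for person A
pos = s.str.find('100')                            # -1 when the pattern is absent
df['first_occurrence'] = np.where(pos.ge(0), pos + 1, df[cols].sum(axis=1))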

List column name having value greater than zero

I have following dataframe
A | B | C | D
1 0 2 1
0 1 1 0
0 0 0 1
I want to add a new column that lists, for each row, every column whose value is greater than zero, together with that value:
A | B | C | D | New
1 0 2 1 A-1, C-2, D-1
0 1 1 0 B-1, C-1
0 0 0 1 D-1
We can use mask and stack:
s = (df.mask(df == 0).stack()      # long format: (row, column) -> nonzero value
       .astype(int).astype(str)
       .reset_index(level=1)       # bring the column name back as a column
       .apply('-'.join, axis=1)    # 'A-1', 'C-2', ...
       .add(',')
       .groupby(level=0).sum()     # concatenate per original row
       .str[:-1])                  # drop the trailing comma
df['New'] = s
df
Out[170]:
A B C D New
0 1 0 2 1 A-1,C-2,D-1
1 0 1 1 0 B-1,C-1
2 0 0 0 1 D-1
Combine the column names with the df values that are not zero, then filter out the None values:
import numpy as np

df = pd.read_clipboard()
arrays = np.where(df != 0, df.columns.values + '-' + df.values.astype('str'), None)
new = []
for array in arrays:
    new.append(list(filter(None, array)))
df['New'] = new
df
Out[1]:
A B C D New
0 1 0 2 1 [A-1, C-2, D-1]
1 0 1 1 0 [B-1, C-1]
2 0 0 0 1 [D-1]
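
If a single comma-separated string is preferred over a list (as in the expected output), a row-wise comprehension is another option; a sketch assuming the same four columns:
cols = ['A', 'B', 'C', 'D']
df['New'] = df[cols].apply(
    lambda row: ', '.join(f'{col}-{val}' for col, val in row.items() if val > 0),
    axis=1,
)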

Comparing two different sized pandas Dataframes and to find the row index with equal values

I need some help comparing two pandas dataframes.
I have two dataframes
The first dataframe is
df1 =
a b c d
0 1 1 1 1
1 0 1 0 1
2 0 0 0 1
3 1 1 1 1
4 1 0 1 0
5 1 1 1 0
6 0 0 1 0
7 0 1 0 1
and the second dataframe is
df2 =
a b c d
0 1 1 1 1
1 1 0 1 0
2 0 0 1 0
I want to find the row indices of dataframe 1 (df1) where the entire row is the same as a row in dataframe 2 (df2). My expected result would be
0
3
4
6
The above indices do not need to be in order; all I want are the indices of dataframe 1 (df1).
Is there a way without using for loop?
Thanks
Tommy
You can use merge with indicator=True:
df1.merge(df2, indicator=True, how='left').loc[lambda x: x['_merge'] == 'both'].index
Out[459]: Int64Index([0, 3, 4, 6], dtype='int64')
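
A loop-free alternative (just a sketch, assuming both frames share the same column order) is to compare whole rows as tuples:
mask = pd.Series(list(map(tuple, df1.values)), index=df1.index).isin(list(map(tuple, df2.values)))
df1[mask].index
Unlike the merge approach, which relies on merge producing a fresh RangeIndex aligned with df1's rows, this keeps df1's original index even when it is not a default RangeIndex.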

drop duplicate rows from dataframe based on column precedence - python

If I have a dataframe
Example:
Name A B C
0 Jon 0 1 0
1 Jon 1 0 1
2 Alan 1 0 0
3 Shaya 0 1 1
If there is a duplicate name in my dataset, I want the row where column A is 1 to take precedence.
NB: column A can only have the values 1 or 0.
Output:
Name A B C
1 Jon 1 0 1
2 Alan 1 0 0
3 Shaya 0 1 1
IIUC, sort the values before dropping duplicates:
df.sort_values('A').drop_duplicates('Name', keep='last').sort_index()
Out[126]:
Name A B C
1 Jon 1 0 1
2 Alan 1 0 0
3 Shaya 0 1 1
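
An alternative sketch with the same result (relying on the stated guarantee that A is only 0 or 1): keep, for each name, the row where A is largest.
df.loc[df.groupby('Name')['A'].idxmax()].sort_index()
When all of a name's rows have the same A, idxmax keeps the first of them.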
