How to split data into groups by conditions on two columns in pandas - python-3.x

I have a dataframe and want to split it into groups based on a condition on the flag_0 and flag_1 columns: consecutive rows where flag_0 is 3 and flag_1 is 1.
Here is an example dataframe:
df=pd.DataFrame({'flag_0':[1,2,3,1,2,3,1,2,3,3,3,3,1,2,3,1,2,3,4,4],'flag_1':[1,2,3,1,2,3,1,2,1,1,1,1,1,2,1,1,2,3,4,4],'dd':[1,1,1,7,7,7,8,8,8,1,1,1,7,7,7,8,8,8,5,7]})
Out[172]:
flag_0 flag_1 dd
0 1 1 1
1 2 2 1
2 3 3 1
3 1 1 7
4 2 2 7
5 3 3 7
6 1 1 8
7 2 2 8
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1
12 1 1 7
13 2 2 7
14 3 1 7
15 1 1 8
16 2 2 8
17 3 3 8
18 4 4 5
19 4 4 7
Desired output:
group 1
Out[172]:
flag_0 flag_1 dd
9 3 1 1
10 3 1 1
11 3 1 1
group 2
Out[172]:
flag_0 flag_1 dd
14 3 1 7

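You can build a boolean mask of the rows that satisfy both conditions and use groupby to split the dataframe. Grouping by the cumulative sum of the inverted mask gives every consecutive run of matching rows its own key: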
cond = {'flag_0': 3, 'flag_1': 1}
mask = df[list(cond)].eq(cond).all(1)
groups = [g for k,g in df[mask].groupby((~mask).cumsum())]
output (row 8 also has flag_0 = 3 and flag_1 = 1, so it belongs to the first group):
[ flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1,
flag_0 flag_1 dd
14 3 1 7]
groups[0]
flag_0 flag_1 dd
8 3 1 8
9 3 1 1
10 3 1 1
11 3 1 1
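A self-contained sketch of the same mask-and-cumsum idea, written with explicit comparisons instead of a dict of conditions and numbering the groups from 1 as in the desired output. The names `run_id` and `groups` (a dict instead of a list) are illustrative choices, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'flag_0': [1, 2, 3, 1, 2, 3, 1, 2, 3, 3, 3, 3, 1, 2, 3, 1, 2, 3, 4, 4],
    'flag_1': [1, 2, 3, 1, 2, 3, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 3, 4, 4],
    'dd':     [1, 1, 1, 7, 7, 7, 8, 8, 8, 1, 1, 1, 7, 7, 7, 8, 8, 8, 5, 7],
})

# True on rows where both flags have the wanted values
mask = (df['flag_0'] == 3) & (df['flag_1'] == 1)

# The counter only increases on non-matching rows, so every
# consecutive run of matching rows shares the same value
run_id = (~mask).cumsum()

# Group only the matching rows by their run id, numbering groups from 1
groups = {
    number: g
    for number, (_, g) in enumerate(df[mask].groupby(run_id[mask]), start=1)
}

for number, g in groups.items():
    print(f'group {number}')
    print(g)
```

Using a dict keyed by group number makes `groups[1]` and `groups[2]` line up with the "group 1" / "group 2" labels in the question.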


How to copy values from one column in df1 to df2 based on specific values in other three columns?

I have two dataframes with similar shapes and column names and would like to copy the values of df1['property'] and paste them in df2['property'], but there is a condition.
df1:
i j k property
1 1 1 10
1 1 2 20
1 1 3 30
1 2 1 40
1 2 2 50
1 2 3 60
1 3 1 70
1 3 2 80
1 3 3 90
2 1 1 100
2 1 2 110
2 1 3 120
2 2 1 130
2 2 2 140
2 2 3 150
2 3 1 160
2 3 2 170
2 3 3 180
3 1 1 190
3 1 2 200
3 1 3 210
3 2 1 220
3 2 2 230
3 2 3 240
3 3 1 250
3 3 2 260
3 3 3 270
df2:
i j k property
1 1 1 100
2 1 1 100
3 1 1 100
1 2 1 100
2 2 1 100
3 2 1 100
1 3 1 100
2 3 1 100
3 3 1 100
1 1 2 100
2 1 2 100
3 1 2 100
1 2 2 100
2 2 2 100
3 2 2 100
1 3 2 100
2 3 2 100
3 3 2 100
1 1 3 100
2 1 3 100
3 1 3 100
1 2 3 100
2 2 3 100
3 2 3 100
1 3 3 100
2 3 3 100
3 3 3 100
The other three columns (i, j, k) represent different positions, and the value of df1['property'] must replace df2['property'] only where df1[['i','j','k']] is the same as df2[['i','j','k']]. Could anyone help me with this?
I think I should use the map function, but I do not know how to do this with a condition on three columns.
IIUC you want DataFrame.merge:
df2['property']=( df2.drop('property',axis=1)
.merge(df1,on=['i','j','k'],how = 'left')['property']
.fillna(df2['property']) )
print(df2)
#or this:
#df2['property']=( df2.merge(df1,on=['i','j','k'],how = 'left')['property_y']
# .fillna(df2['property']) )
We could also use DataFrame.update:
df2_update=df2.set_index(['i','j','k'])
df2_update.update(df1.set_index(['i','j','k']))
df2_update = df2_update.reset_index()
print(df2_update)
Output
i j k property
0 1 1 1 10
1 2 1 1 100
2 3 1 1 190
3 1 2 1 40
4 2 2 1 130
5 3 2 1 220
6 1 3 1 70
7 2 3 1 160
8 3 3 1 250
9 1 1 2 20
10 2 1 2 110
11 3 1 2 200
12 1 2 2 50
13 2 2 2 140
14 3 2 2 230
15 1 3 2 80
16 2 3 2 170
17 3 3 2 260
18 1 1 3 30
19 2 1 3 120
20 3 1 3 210
21 1 2 3 60
22 2 2 3 150
23 3 2 3 240
24 1 3 3 90
25 2 3 3 180
26 3 3 3 270
I'd do this:
import pandas as pd, numpy as np
df1 = pd.DataFrame(dict(i=np.repeat([1,2,3],9), j=np.repeat([[1,2,3],[1,2,3],[1,2,3]],3), k=[1,2,3]*9,\
property=range(10,280,10)))
df2 = pd.DataFrame(dict(k=np.repeat([1,2,3],9), j=np.repeat([[1,2,3],[1,2,3],[1,2,3]],3), i=[1,2,3]*9,\
property=100))
df = pd.concat([df1,df2.rename(columns={"i":"ii","j":"jj","k":"kk","property":"property2"})],axis=1)
df.property2 = np.where((df.i==df.ii)&(df.j==df.jj)&(df.k==df.kk),df.property,df.property2)
df=df[["ii","jj","kk","property2"]]
print(df)
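Gives (the frames are compared row by row, so a value is only copied where both frames have the same i, j, k at the same position):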
ii jj kk property2
0 1 1 1 10
1 2 1 1 100
2 3 1 1 100
3 1 2 1 40
4 2 2 1 100
5 3 2 1 100
6 1 3 1 70
7 2 3 1 100
8 3 3 1 100
9 1 1 2 100
10 2 1 2 110
11 3 1 2 100
12 1 2 2 100
13 2 2 2 140
14 3 2 2 100
15 1 3 2 100
16 2 3 2 170
17 3 3 2 100
18 1 1 3 100
19 2 1 3 100
20 3 1 3 210
21 1 2 3 100
22 2 2 3 100
23 3 2 3 240
24 1 3 3 100
25 2 3 3 100
26 3 3 3 270
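Since the question mentions map: a minimal sketch of a map-style lookup keyed on the three columns. It assumes the (i, j, k) combinations in df1 are unique, builds a dict from (i, j, k) tuples to property, and keeps df2's original value where no key matches. The frames are rebuilt with itertools.product so the block runs on its own; the names `lookup` and `keys` are illustrative.

```python
from itertools import product

import pandas as pd

values = [1, 2, 3]

# df1: (i, j, k) in i-major order, property = 10, 20, ..., 270
df1 = pd.DataFrame(list(product(values, repeat=3)), columns=['i', 'j', 'k'])
df1['property'] = list(range(10, 280, 10))

# df2: the same combinations with i varying fastest, property = 100
df2 = pd.DataFrame([(i, j, k) for k, j, i in product(values, repeat=3)],
                   columns=['i', 'j', 'k'])
df2['property'] = 100

# Map each (i, j, k) tuple in df1 to its property value
lookup = df1.set_index(['i', 'j', 'k'])['property'].to_dict()

# Look up every df2 row; fall back to the existing value when the key is missing
keys = zip(df2['i'], df2['j'], df2['k'])
df2['property'] = [lookup.get(key, old) for key, old in zip(keys, df2['property'])]

print(df2.head())
```

Unlike the positional comparison, this matches rows by key regardless of their order, so it gives the same values as the merge and update answers.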

I need to write a program to print the following pattern:

I need to produce the following pattern:
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
I have written code that produces the same pattern but right side up. I don't understand how to flip it over.
rows = 6
for i in range(1, 6 + 1):
    for j in range(1, rows + 1):
        if(j < i):
            print(' ', end = ' ')
        else:
            print(i, end = ' ')
    print()
Edit: This somewhat fails with rows >= 12. Honorable mention to alexanderhurst for finding the bug in this implementation and providing another clean solution. However, we can mimic tabulate by using tabs (\t) instead of spaces (see the bottom).
Why not something simpler?
rows = 6
l = list(range(1, rows + 1))
for i in range(rows):
    print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
Edit: If you want the other orientations, try these:
>>> for i in range(rows):
...     x = " " * 2*i + " ".join(str(x) for x in l[:rows-i])
...     print(x[::-1])
6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
>>> for i in range(rows - 1, -1, -1):
...     print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
...
          1
        1 2
      1 2 3
    1 2 3 4
  1 2 3 4 5
1 2 3 4 5 6
>>> for i in range(rows - 1, -1, -1):
...     x = " " * 2*i + " ".join(str(x) for x in l[:rows-i])
...     print(x[::-1])
...
1
2 1
3 2 1
4 3 2 1
5 4 3 2 1
6 5 4 3 2 1
Bug for larger numbers of rows:
>>> rows = 14
>>> l = list(range(rows))
>>> for i in range(rows):
...     print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
  0 1 2 3 4 5 6 7 8 9 10 11 12
    0 1 2 3 4 5 6 7 8 9 10 11
      0 1 2 3 4 5 6 7 8 9 10
        0 1 2 3 4 5 6 7 8 9
          0 1 2 3 4 5 6 7 8
            0 1 2 3 4 5 6 7
              0 1 2 3 4 5 6
                0 1 2 3 4 5
                  0 1 2 3 4
                    0 1 2 3
                      0 1 2
                        0 1
                          0
Hotfix 1: use tabs. This works reasonably well if your tab width is the same as mine and you use fewer than 20 rows on a maximized screen (otherwise alexanderhurst's solution might not solve your problem either).
>>> for i in range(rows):
...     print("\t" * i + "\t".join(str(x) for x in l[:rows-i]))
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1 2 3 4 5 6 7 8 9 10 11
0 1 2 3 4 5 6 7 8 9 10
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6
0 1 2 3 4 5
0 1 2 3 4
0 1 2 3
0 1 2
0 1
0
Hotfix 2: add / remove spaces according to number length (e.g. using log(x) or len(str(x)) or similar) but it becomes too complex.
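Hotfix 2 may be less complex than it sounds if every number is padded to the width of the largest one with str.rjust. A sketch of that idea; the function name number_triangle is made up for this example, and the triangle is returned as a list of lines so it is easy to check.

```python
def number_triangle(rows):
    """Return the inverted, right-aligned number triangle as a list of lines."""
    width = len(str(rows))  # the widest number on any row is `rows` itself
    lines = []
    for i in range(rows):
        count = rows - i  # how many numbers this row holds
        cells = [str(n).rjust(width) for n in range(1, count + 1)]
        # every missing leading cell is replaced by width + 1 spaces
        # (the cell itself plus its separator), which keeps the right edge aligned
        lines.append(' ' * ((width + 1) * i) + ' '.join(cells))
    return lines


for line in number_triangle(12):
    print(line)
```

Because each cell has the same width, every line has the same length, so the right edge stays aligned for any number of rows.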
This solution resembles yours with a few changes: it first prints the number of spaces needed for the triangle shape, then counts up to i, and then moves to the next line.
num = 6
for i in range(num,0,-1):
    print(' '*(num - i), end='')
    for j in range(i):
        print(j + 1, end=' ')
    print()
This does have an odd effect once two-digit numbers appear, for example with num = 16:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  1 2 3 4 5 6 7 8 9 10 11 12 13 14
   1 2 3 4 5 6 7 8 9 10 11 12 13
    1 2 3 4 5 6 7 8 9 10 11 12
     1 2 3 4 5 6 7 8 9 10 11
      1 2 3 4 5 6 7 8 9 10
       1 2 3 4 5 6 7 8 9
        1 2 3 4 5 6 7 8
         1 2 3 4 5 6 7
          1 2 3 4 5 6
           1 2 3 4 5
            1 2 3 4
             1 2 3
              1 2
               1
You can use tabulate (a third-party package, installed with pip install tabulate) to keep everything in its column. Here I also used a list comprehension to reduce the code size.
code:
from tabulate import tabulate
count = 16
numbers = [[''] * (count - i) + [j+1 for j in range(i)] for i in range(count, 0, -1)]
print(tabulate(numbers))
output:
- - - - - - - - - -- -- -- -- -- -- --
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 2 3 4 5 6 7 8 9 10 11 12 13
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7
1 2 3 4 5 6
1 2 3 4 5
1 2 3 4
1 2 3
1 2
1
- - - - - - - - - -- -- -- -- -- -- --
You can count backwards with range():
rows = 6
for i in range(6, 0, -1):
    for j in range(1, rows + 1):
        if(6-j >= i): # if i = 6, doesn't activate. i=5, activates once. i=4, activates twice, etc.
            print(' ', end = ' ')
        else:
            print(j - (6 - i), end = ' ')  # restart numbering at 1 after the padding
    print()
So from what I can see you are trying to make the form:
     1
    21
   321
  4321
 54321
654321
So the loops need to be reversed and you need to add a space filler section.
rows = 6
for i in range(1, rows+1):
    out = ''
    for j in range(1, rows):
        out += ' '
    for j in range(i, 0, -1):
        out += str(j)
    print(out)
    rows -= 1
A one-line statement using a generator expression would be:
pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(6, 0, -1)))
For clarity, here are the same commands executed in the Python interactive shell:
>>> pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(6, 0, -1)))
>>>
>>> print(pattern)
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
>>>
For repetitive work like this (for example, to try several sizes), it helps to wrap the logic in a function:
def print_num_triangle(n=6):
    """
    1 2 3 4 5 6
      1 2 3 4 5
        1 2 3 4
          1 2 3
            1 2
              1
    """
    pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(k) for k in range(1, num + 1)) for i, num in enumerate(range(n, 0, -1)))
    print(pattern)

if __name__ == "__main__":
    print_num_triangle(10)
    # 1 2 3 4 5 6 7 8 9 10
    #   1 2 3 4 5 6 7 8 9
    #     1 2 3 4 5 6 7 8
    #       1 2 3 4 5 6 7
    #         1 2 3 4 5 6
    #           1 2 3 4 5
    #             1 2 3 4
    #               1 2 3
    #                 1 2
    #                   1
    print_num_triangle(7)
    # 1 2 3 4 5 6 7
    #   1 2 3 4 5 6
    #     1 2 3 4 5
    #       1 2 3 4
    #         1 2 3
    #           1 2
    #             1
    print_num_triangle()  # default -> 6
    # 1 2 3 4 5 6
    #   1 2 3 4 5
    #     1 2 3 4
    #       1 2 3
    #         1 2
    #           1

Pandas: how to get the top n groups by a flag column

I have a dataframe like the one below.
df = pd.DataFrame({'group':[1,2,1,3,3,1,4,4,1,4], 'match': [1,1,1,1,1,1,1,1,1,1]})
group match
0 1 1
1 2 1
2 1 1
3 3 1
4 3 1
5 1 1
6 4 1
7 4 1
8 1 1
9 4 1
I want to get the top n groups by number of rows, like below (n=3).
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 4 1
5 4 1
6 4 1
7 3 1
8 3 1
In my actual data each row carries other information I need, so I only want to sort by the number of matches and extract the top n.
How can I do this?
I believe that if you need the top 3 groups per value of the match column, you can use SeriesGroupBy.value_counts with GroupBy.head to keep the top 3 per group, then convert the index to a DataFrame with Index.to_frame and use DataFrame.merge:
s = df.groupby('match')['group'].value_counts().groupby(level=0).head(3).swaplevel()
df = s.index.to_frame().reset_index(drop=True).merge(df)
print (df)
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 4 1
5 4 1
6 4 1
7 3 1
8 3 1
Or, if you only need values where match is 1, filter with boolean indexing first and then use Series.value_counts:
s = df.loc[df['match'] == 1, 'group'].value_counts().head(3)
df = s.index.to_frame(name='group').merge(df)
print (df)
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 4 1
5 4 1
6 4 1
7 3 1
8 3 1
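Solution with isin and ordered categoricals:
# if you need to filter match == 1
idx = df.loc[df['match'] == 1, 'group'].value_counts().head(3).index
# if you don't need the filter
#idx = df.group.value_counts().head(3).index
# copy() avoids SettingWithCopyWarning when assigning to the filtered frame
df = df[df.group.isin(idx)].copy()
df['group'] = pd.Categorical(df['group'], categories=idx, ordered=True)
# a stable sort keeps the original row order inside each group
df = df.sort_values('group', kind='stable')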
print (df)
group match
0 1 1
2 1 1
5 1 1
8 1 1
6 4 1
7 4 1
9 4 1
3 3 1
4 3 1
The difference between the solutions is easiest to see with changed data in the match column:
df = pd.DataFrame({'group':[1,2,1,3,3,1,4,4,1,4,10,20,10,20,10,30,40],
'match': [1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0]})
print (df)
group match
0 1 1
1 2 1
2 1 1
3 3 1
4 3 1
5 1 1
6 4 1
7 4 1
8 1 1
9 4 1
10 10 0
11 20 0
12 10 0
13 20 0
14 10 0
15 30 0
16 40 0
Top 3 values per group of match:
s = df.groupby('match')['group'].value_counts().groupby(level=0).head(3).swaplevel()
df1 = s.index.to_frame().reset_index(drop=True).merge(df)
print (df1)
group match
0 10 0
1 10 0
2 10 0
3 20 0
4 20 0
5 30 0
6 1 1
7 1 1
8 1 1
9 1 1
10 4 1
11 4 1
12 4 1
13 3 1
14 3 1
Top 3 values where match == 1:
s = df.loc[df['match'] == 1, 'group'].value_counts().head(3)
df2 = s.index.to_frame(name='group').merge(df)
print (df2)
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 4 1
5 4 1
6 4 1
7 3 1
8 3 1
Top 3 values, ignoring the match column:
s = df['group'].value_counts().head(3)
df3 = s.index.to_frame(name='group').merge(df)
print (df3)
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 10 0
5 10 0
6 10 0
7 4 1
8 4 1
9 4 1
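Another option, sketched here as an alternative rather than part of the answer above, keeps the original rows and index: map each row to the size of its group, rank the sizes densely, and keep the rows whose rank is at most n. Because tied groups share a dense rank, this can return more than n groups when sizes tie, which differs from value_counts().head(n). The helper column name _size is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 2, 1, 3, 3, 1, 4, 4, 1, 4],
                   'match': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})
n = 3

# Size of the group each row belongs to
size = df['group'].map(df['group'].value_counts())

# Dense rank: the largest group gets 1, tied sizes share a rank
rank = size.rank(method='dense', ascending=False)

top = (df[rank <= n]
       .assign(_size=size)
       # largest groups first; the group id keeps tied groups contiguous
       .sort_values(['_size', 'group'], ascending=[False, True])
       .drop(columns='_size'))
print(top)
```

If you need exactly n groups even when sizes tie, the value_counts().head(n) solutions above are the better fit.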

Table-wise value count

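I have a table like this and want to draw a histogram of the number of 0s, 1s, 2s and 3s across the whole table. Is there a way to do it?
You can use melt to put all values into a single column and then count them with value_counts (or plot them with hist),
for example: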
df
A B C D
0 3 1 1 1
1 3 3 2 2
2 1 0 1 1
3 3 2 3 0
4 3 1 1 3
5 3 0 3 1
6 3 1 1 0
7 1 3 3 0
8 3 1 3 3
9 3 3 1 3
df.melt()['value'].value_counts()
3 18
1 14
0 5
2 3
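To get the bar heights in value order, sort the counts by their index. A sketch using DataFrame.stack, which, like melt, flattens all columns into one Series; the frame is the example table from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [3, 3, 1, 3, 3, 3, 3, 1, 3, 3],
    'B': [1, 3, 0, 2, 1, 0, 1, 3, 1, 3],
    'C': [1, 2, 1, 3, 1, 3, 1, 3, 3, 1],
    'D': [1, 2, 1, 0, 3, 1, 0, 0, 3, 3],
})

# stack() turns the 10x4 table into one Series of 40 values;
# sort_index() orders the counts by value (0, 1, 2, 3) instead of by frequency
counts = df.stack().value_counts().sort_index()
print(counts)
```

To draw the chart itself, counts.plot.bar() should work if matplotlib is installed.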

Dataframe concatenate columns

I have a dataframe with a multiindex (ID, Date, LID) and columns from 0 to N that looks something like this:
0 1 2 3 4
ID Date LID
00112 11-02-2014 I 0 1 5 6 7
00112 11-02-2014 II 2 4 5 3 4
00112 30-07-2015 I 5 7 1 1 2
00112 30-07-2015 II 3 2 8 7 1
I would like to group the dataframe by ID and Date and concatenate the columns to the same row such that it looks like this:
0 1 2 3 4 5 6 7 8 9
ID Date
00112 11-02-2014 0 1 5 6 7 2 4 5 3 4
00112 30-07-2015 5 7 1 1 2 3 2 8 7 1
Using pd.concat and pd.DataFrame.xs:
pd.concat(
[df.xs(x, level=2) for x in df.index.levels[2]],
axis=1, ignore_index=True
)
0 1 2 3 4 5 6 7 8 9
ID Date
112 11-02-2014 0 1 5 6 7 2 4 5 3 4
30-07-2015 5 7 1 1 2 3 2 8 7 1
Use unstack + sort_index:
import numpy as np
df = df.unstack().sort_index(axis=1, level=1)
# new column names 0..N
df.columns = np.arange(len(df.columns))
print (df)
0 1 2 3 4 5 6 7 8 9
ID Date
112 11-02-2014 0 1 5 6 7 2 4 5 3 4
30-07-2015 5 7 1 1 2 3 2 8 7 1
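A self-contained sketch of the unstack approach that rebuilds the example frame. It assumes ID is stored as a string, so the leading zeros of 00112 survive (the outputs above show it as 112, i.e. parsed as an integer); adjust if your data loads it differently.

```python
import numpy as np
import pandas as pd

# ID kept as a string so '00112' keeps its leading zeros (an assumption)
index = pd.MultiIndex.from_tuples(
    [('00112', '11-02-2014', 'I'), ('00112', '11-02-2014', 'II'),
     ('00112', '30-07-2015', 'I'), ('00112', '30-07-2015', 'II')],
    names=['ID', 'Date', 'LID'])
df = pd.DataFrame([[0, 1, 5, 6, 7],
                   [2, 4, 5, 3, 4],
                   [5, 7, 1, 1, 2],
                   [3, 2, 8, 7, 1]], index=index)

# Move LID into the columns, then order them as all 'I' columns, then all 'II'
wide = df.unstack('LID').sort_index(axis=1, level='LID')

# Replace the (column, LID) pairs with a flat 0..N range
wide.columns = np.arange(wide.shape[1])
print(wide)
```

Referring to the level by name ('LID') instead of by position makes the intent clearer if the index order ever changes.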
