As the title says, my dataframe looks as follows:
ID  Follow up month  Value-x  value -y
1   0                12       12
1   0                11       14
2   0                10       11
2   3                11       0
2   0                12       1
1   3                13       12
2   3                11       5
I want to add another column called Timepoint, which would make the table look as follows:
ID  Follow up month  Value-x  value -y  Timepoint
1   0                12       12        1
1   0                11       14        1
2   0                10       11        1
2   3                11       0         2
2   0                12       1         1
1   3                13       12        2
2   3                11       5         2
So far I have tried to group the rows by their ID and Follow up month and then assign a timepoint using cumcount, but this didn't give me the result I wanted. Any help on how to handle this would be appreciated.
From your table I can only infer that you want to create the Timepoint column based on the corresponding values in Follow up month, which can be done like this:
from io import StringIO
import pandas as pd

# Columns are separated by two or more spaces so that the names containing
# single spaces ("Follow up month", "value -y") stay intact.
wt = StringIO("""ID  Follow up month  Value-x  value -y
1  0  12  12
1  0  11  14
2  0  10  11
2  3  11  0
2  0  12  1
1  3  13  12
2  3  11  5""")
df = pd.read_csv(wt, sep=r'\s\s+', engine='python')
df['Timepoint'] = df['Follow up month'].apply(lambda x: 1 if x == 0 else 2)
df
Output:
ID Follow up month Value-x value -y Timepoint
0 1 0 12 12 1
1 1 0 11 14 1
2 2 0 10 11 1
3 2 3 11 0 2
4 2 0 12 1 1
5 1 3 13 12 2
6 2 3 11 5 2
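As a side note (not part of the original answer), the same mapping can be written in vectorized form instead of apply:
import numpy as np

# Vectorized equivalent of the apply/lambda above, assuming the same df.
df['Timepoint'] = np.where(df['Follow up month'] == 0, 1, 2)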
Edit
Based on your comment, this should be what you want:
def timepoint(s):
    if not s.isin([0]).any() and s.iloc[0] == 3:
        return 1
    else:
        return s.apply(lambda x: 1 if x==0 else 2)

df['Timepoint'] = df.groupby('ID')['Follow up month'].transform(timepoint)
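As a quick sanity check on a made-up frame (my own illustration, not from the question), a group that never reaches month 0 and starts at month 3 gets Timepoint 1 for all of its rows, while a group containing month 0 is mapped elementwise:
demo = pd.DataFrame({'ID': [3, 3, 1, 1],
                     'Follow up month': [3, 6, 0, 3],
                     'Value-x': [9, 8, 7, 6],
                     'value -y': [1, 2, 3, 4]})
# reuses the timepoint() function defined above
demo['Timepoint'] = demo.groupby('ID')['Follow up month'].transform(timepoint)
# ID 3 (months 3 and 6): no zero and first month is 3 -> Timepoint 1, 1
# ID 1 (months 0 and 3): contains a zero -> elementwise mapping -> 1, 2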
Related
I have a DataFrame with two columns, ID and Value1. I want to select the rows where the value of the Value1 column changes, keeping the 3 rows before the change, the 3 rows after it, and the change-point row itself.
df=pd.DataFrame({'ID':[1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],'Value1':[0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
ID Value1
0 1 0
1 3 0
2 4 0
3 6 0
4 7 0
5 8 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
12 56 1
13 78 1
14 89 1
15 34 1
16 56 1
output:
ID Value1
0 4 0
1 6 0
2 7 0
3 8 2
4 90 2
5 23 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
IIUC,
import pandas as pd

df = pd.DataFrame({'ID':[1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],
                   'Value1':[0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
df = df.reset_index(drop=True)  # the index needs to start from zero for this solution

# diff() gives row-wise differences, so a non-zero diff marks a change point.
# The nested comprehension collects the 3 indices before and after each change
# (plus the change point itself), and list(set()) drops duplicate index values.
ind = list(set([val for i in df[df['Value1'].diff() != 0].index
                for val in range(i - 3, i + 4) if i > 0 and val >= 0]))
df[df.index.isin(ind)]
ID Value1
2 4 0
3 6 0
4 7 0
5 8 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
12 56 1
13 78 1
14 89 1
15 34 1
If you want to retain duplicate index values, drop the list(set(...)) wrapper around the comprehension.
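A sketch of that variation (my own illustration, assuming the same df as above); keeping the raw list means rows that fall into two overlapping windows are returned twice:
# Same window logic as above but without list(set(...)), so duplicate index
# values are kept; the extra bound keeps the labels inside the frame.
ind_with_dupes = [val
                  for i in df[df['Value1'].diff() != 0].index
                  for val in range(i - 3, i + 4)
                  if i > 0 and 0 <= val < len(df)]
df.loc[ind_with_dupes]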
I need to produce the following pattern:
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
I have written code that produces the same pattern, but right side up. I don't understand how to flip it over.
rows = 6
for i in range(1, 6 + 1):
    for j in range(1, rows + 1):
        if(j < i):
            print(' ', end = ' ')
        else:
            print(i, end = ' ')
    print()
Edit: this somewhat fails for rows >= 12 (honorable mention to alexanderhurst for finding the bug in this implementation and providing another clean solution). However, we can mimic tabulate by using tabs (\t) instead of spaces (see the bottom of this answer).
Why not something simpler?
rows = 6
l = list(range(1, rows + 1))
for i in range(rows):
    print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
Edit: if you want mirrored or flipped variants, try these:
>>> for i in range(rows):
...     x = " " * 2*i + " ".join(str(x) for x in l[:rows-i])
...     print(x[::-1])
6 5 4 3 2 1
5 4 3 2 1
4 3 2 1
3 2 1
2 1
1
>>> for i in range(rows, -1, -1):
...     print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
...
          1
        1 2
      1 2 3
    1 2 3 4
  1 2 3 4 5
1 2 3 4 5 6
>>> for i in range(rows, -1, -1):
...     x = " " * 2*i + " ".join(str(x) for x in l[:rows-i])
...     print(x[::-1])
...
1
2 1
3 2 1
4 3 2 1
5 4 3 2 1
6 5 4 3 2 1
Bug for larger numbers of rows:
>>> rows = 14
>>> l = list(range(rows))
>>> for i in range(rows):
...     print(" " * 2*i + " ".join(str(x) for x in l[:rows-i]))
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
  0 1 2 3 4 5 6 7 8 9 10 11 12
    0 1 2 3 4 5 6 7 8 9 10 11
      0 1 2 3 4 5 6 7 8 9 10
        0 1 2 3 4 5 6 7 8 9
          0 1 2 3 4 5 6 7 8
            0 1 2 3 4 5 6 7
              0 1 2 3 4 5 6
                0 1 2 3 4 5
                  0 1 2 3 4
                    0 1 2 3
                      0 1 2
                        0 1
                          0
Hotfix 1: use tabs. This can work okay if your tab width is the same as mine and you use fewer than about 20 rows at full screen width (otherwise alexanderhurst's solution might not solve your problem either).
>>> for i in range(rows):
...     print("\t" * i + "\t".join(str(x) for x in l[:rows-i]))
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1 2 3 4 5 6 7 8 9 10 11
0 1 2 3 4 5 6 7 8 9 10
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6
0 1 2 3 4 5
0 1 2 3 4
0 1 2 3
0 1 2
0 1
0
Hotfix 2: add or remove spaces according to the length of each number (e.g. using log10(x) or len(str(x)) or similar), but this quickly becomes complex.
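For what it's worth, a minimal sketch of that idea (my own illustration, not from the original answer): pad every number to the width of the largest one, so the shift per row matches the column pitch:
rows = 14
l = list(range(rows))
width = len(str(max(l)))  # widest number, e.g. 2 digits for 13
for i in range(rows):
    # each cell takes `width` characters plus one separating space, so shifting
    # by (width + 1) per row keeps the columns aligned
    print(" " * (width + 1) * i + " ".join(str(x).rjust(width) for x in l[:rows - i]))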
This solution resembles yours, with a few changes: it first prints the spaces needed for the triangle shape, then counts from 1 up to the length of the current row, and then moves to the next line.
num = 6
for i in range(num, 0, -1):
    print(' '*(num - i), end='')
    for j in range(i):
        print(j + 1, end=' ')
    print()
This does have an odd effect if you use a value of 10 or greater, because two-digit numbers break the alignment:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  1 2 3 4 5 6 7 8 9 10 11 12 13 14
   1 2 3 4 5 6 7 8 9 10 11 12 13
    1 2 3 4 5 6 7 8 9 10 11 12
     1 2 3 4 5 6 7 8 9 10 11
      1 2 3 4 5 6 7 8 9 10
       1 2 3 4 5 6 7 8 9
        1 2 3 4 5 6 7 8
         1 2 3 4 5 6 7
          1 2 3 4 5 6
           1 2 3 4 5
            1 2 3 4
             1 2 3
              1 2
               1
You can use tabulate to keep everything in its column. Here I also used a list comprehension to reduce code size.
code:
from tabulate import tabulate
count = 16
numbers = [[''] * (count - i) + [j+1 for j in range(i)] for i in range(count, 0, -1)]
print(tabulate(numbers))
output:
- - - - - - - - - -- -- -- -- -- -- --
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 2 3 4 5 6 7 8 9 10 11 12 13
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7
1 2 3 4 5 6
1 2 3 4 5
1 2 3 4
1 2 3
1 2
1
- - - - - - - - - -- -- -- -- -- -- --
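If the dashed rules above and below are unwanted, tabulate's tablefmt argument can switch to a plain layout (a small usage note, not part of the original answer):
print(tabulate(numbers, tablefmt='plain'))  # same columns, no horizontal rules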
You can count backwards with range():
rows = 6
for i in range(6, 0, -1):
    for j in range(1, rows + 1):
        if(6-j >= i): # if i = 6, doesn't activate. i=5, activates once. i=4, activates twice, etc.
            print(' ', end = ' ')
        else:
            print(i, end = ' ')
    print()
So from what I can see you are trying to make the form:
     1
    21
   321
  4321
 54321
654321
So the loops need to be reversed and you need to add a space filler section.
rows = 6
for i in range(1, rows+1):
    out = ''
    for j in range(1, rows):
        out += ' '
    for j in range(i, 0, -1):
        out += str(j)
    print(out)
    rows -= 1
A one-line statement using a generator expression would be:
pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(6, -1, -1)))
For clarification, here are the same commands executed in a Python interactive terminal.
>>> pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(6, -1, -1)))
>>>
>>> print(pattern)
1 2 3 4 5 6
  1 2 3 4 5
    1 2 3 4
      1 2 3
        1 2
          1
>>>
A functional approach is suggested for this kind of repetitive work (if you want to try it with multiple inputs).
def print_num_triangle(n=6):
    """
    1 2 3 4 5 6
      1 2 3 4 5
        1 2 3 4
          1 2 3
            1 2
              1
    """
    pattern = '\n'.join((' ' * 2 * i) + ' '.join(str(n) for n in range(1, num + 1)) for i, num in enumerate(range(n, -1, -1)))
    print(pattern)

if __name__ == "__main__":
    print_num_triangle(10)
    # 1 2 3 4 5 6 7 8 9 10
    #   1 2 3 4 5 6 7 8 9
    #     1 2 3 4 5 6 7 8
    #       1 2 3 4 5 6 7
    #         1 2 3 4 5 6
    #           1 2 3 4 5
    #             1 2 3 4
    #               1 2 3
    #                 1 2
    #                   1
    #
    print_num_triangle(7)
    # 1 2 3 4 5 6 7
    #   1 2 3 4 5 6
    #     1 2 3 4 5
    #       1 2 3 4
    #         1 2 3
    #           1 2
    #             1
    print_num_triangle()  # default -> 6
    # 1 2 3 4 5 6
    #   1 2 3 4 5
    #     1 2 3 4
    #       1 2 3
    #         1 2
    #           1
I have a dataframe like the one below.
df = pd.DataFrame({'group':[1,2,1,3,3,1,4,4,1,4], 'match': [1,1,1,1,1,1,1,1,1,1]})
group match
0 1 1
1 2 1
2 1 1
3 3 1
4 3 1
5 1 1
6 4 1
7 4 1
8 1 1
9 4 1
I want to get the top n groups, like below (n=3).
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 4 1
5 4 1
6 4 1
7 3 1
8 3 1
In my actual data each row has other information I need to keep, so I only want to sort by the number of matches and extract the top n. How can I do this?
I believe you need the top 3 groups per match column value - use SeriesGroupBy.value_counts with GroupBy.head for the top 3 per group, then convert the index to a DataFrame with Index.to_frame and join it back with DataFrame.merge:
s = df.groupby('match')['group'].value_counts().groupby(level=0).head(3).swaplevel()
df = s.index.to_frame().reset_index(drop=True).merge(df)
print (df)
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 4 1
5 4 1
6 4 1
7 3 1
8 3 1
Or, if you need to filter only the values where match is 1, use Series.value_counts with boolean indexing:
s = df.loc[df['match'] == 1, 'group'].value_counts().head(3)
df = s.index.to_frame(name='group').merge(df)
print (df)
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 4 1
5 4 1
6 4 1
7 3 1
8 3 1
Solution with isin and ordered categoricals:
# if you need to filter match == 1
idx = df.loc[df['match'] == 1, 'group'].value_counts().head(3).index
# if you don't need the filter
#idx = df.group.value_counts().head(3).index
df = df[df.group.isin(idx)]
df['group'] = pd.CategoricalIndex(df['group'], ordered=True, categories=idx)
df = df.sort_values('group')
print (df)
group match
0 1 1
2 1 1
5 1 1
8 1 1
6 4 1
7 4 1
9 4 1
3 3 1
4 3 1
The difference between the solutions is best seen with changed data in the match column:
df = pd.DataFrame({'group':[1,2,1,3,3,1,4,4,1,4,10,20,10,20,10,30,40],
'match': [1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0]})
print (df)
group match
0 1 1
1 2 1
2 1 1
3 3 1
4 3 1
5 1 1
6 4 1
7 4 1
8 1 1
9 4 1
10 10 0
11 20 0
12 10 0
13 20 0
14 10 0
15 30 0
16 40 0
Top3 values per groups by match:
s = df.groupby('match')['group'].value_counts().groupby(level=0).head(3).swaplevel()
df1 = s.index.to_frame().reset_index(drop=True).merge(df)
print (df1)
group match
0 10 0
1 10 0
2 10 0
3 20 0
4 20 0
5 30 0
6 1 1
7 1 1
8 1 1
9 1 1
10 4 1
11 4 1
12 4 1
13 3 1
14 3 1
Top3 values by match == 1:
s = df.loc[df['match'] == 1, 'group'].value_counts().head(3)
df2 = s.index.to_frame(name='group').merge(df)
print (df2)
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 4 1
5 4 1
6 4 1
7 3 1
8 3 1
Top3 values, match column is not important:
s = df['group'].value_counts().head(3)
df3 = s.index.to_frame(name='group').merge(df)
print (df3)
group match
0 1 1
1 1 1
2 1 1
3 1 1
4 10 0
5 10 0
6 10 0
7 4 1
8 4 1
9 4 1
I have a table like this, and I want to draw a histogram of the number of 0s, 1s, 2s, and 3s across the whole table. Is there a way to do it?
You can apply melt and hist. For example:
df
A B C D
0 3 1 1 1
1 3 3 2 2
2 1 0 1 1
3 3 2 3 0
4 3 1 1 3
5 3 0 3 1
6 3 1 1 0
7 1 3 3 0
8 3 1 3 3
9 3 3 1 3
df.melt()['value'].value_counts()
3 18
1 14
0 5
2 3
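The answer mentions hist but only shows the counts; a minimal plotting sketch (my own addition, assuming matplotlib is installed) could look like this:
import matplotlib.pyplot as plt

# Bar chart of how often each value (0, 1, 2, 3) appears anywhere in the table.
df.melt()['value'].value_counts().sort_index().plot(kind='bar')
plt.xlabel('value')
plt.ylabel('count')
plt.show()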
I have a 2d numpy matrix, for example:
arr = np.arange(0, 12).reshape(3,4)
I would like to get this into a DataFrame, such that:
X Y Z
0 0 0
0 1 1
0 2 2
0 3 3
1 0 4
1 1 5
1 2 6
1 3 7
2 0 8
2 1 9
2 2 10
2 3 11
How would I do this (efficiently)?
You can use numpy functions:
import numpy as np
import pandas as pd

x1 = np.repeat(np.arange(arr.shape[0]), arr.shape[1])  # row index for every element
x2 = np.tile(np.arange(arr.shape[1]), arr.shape[0])    # column index for every element
x3 = arr.flatten()
pd.DataFrame(np.array([x1, x2, x3]).T, columns=['X','Y','Z'])
Output
X Y Z
0 0 0 0
1 0 1 1
2 0 2 2
3 0 3 3
4 1 0 4
5 1 1 5
6 1 2 6
7 1 3 7
8 2 0 8
9 2 1 9
10 2 2 10
11 2 3 11
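An alternative sketch (my own addition, not from the original answer) that builds the row/column pairs in one call with np.indices:
import numpy as np
import pandas as pd

arr = np.arange(0, 12).reshape(3, 4)
rows, cols = np.indices(arr.shape)  # grids of row indices and column indices
pd.DataFrame({'X': rows.ravel(), 'Y': cols.ravel(), 'Z': arr.ravel()})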