Given this dataframe:
Feat1 Feat2 Feat3 Feat4 Labels
-46.220314 22.862856 -6.1573067 5.6060414 2
-23.80669 20.536781 -5.015675 4.2216353 2
-42.092365 25.680704 -5.0092897 5.665794 2
-35.29639 21.709473 -4.160352 5.578346 2
-37.075096 22.347767 -3.860426 5.6953945 2
-42.8849 28.03802 -7.8572545 3.3361 2
-32.3057 26.568039 -9.47018 3.4532788 2
-24.469942 27.005375 -9.301921 4.3995037 2
-97.89892 -0.38156664 6.4163384 7.234347 1
-81.96325 0.1821717 -1.2870358 4.703838 1
-78.41986 -6.766374 0.8001185 0.83444935 1
-100.68544 -4.5810957 1.6977689 1.8801615 1
-87.05412 -2.9231584 6.817379 5.4460077 1
-64.121056 -3.7892206 -0.283514 6.3084154 1
-94.504845 -0.9999217 3.2884297 6.881124 1
-61.951996 -8.960198 -1.5915259 5.6160254 1
-108.19452 13.909201 0.6966458 -1.956591 0
-97.4037 22.897585 -2.8488266 1.4105041 0
-92.641335 22.10624 -3.5110545 2.467166 0
-199.18787 3.3090565 -2.5994794 4.0802555 0
-137.5976 6.795896 1.6793671 2.2256763 0
-208.0035 -1.33229 -3.2078092 1.5177402 0
-108.225975 14.341716 1.02891 -1.8651972 0
-121.29299 18.274035 2.2891548 2.3360753 0
I want to sort the rows by the values in the "Labels" column, using a custom order of the labels.
I can sort ascending, so that the labels appear as [0 1 2], with the command
df2 = df1.sort_values(by = 'Labels', ascending = True)
and with ascending = False the labels appear as [2 1 0].
How do I sort so that the labels appear as [1 0 2]?
Any help will be greatly appreciated!
Here's a way using Categorical:
df['Labels'] = pd.Categorical(df['Labels'],
                              categories = [1, 0, 2],
                              ordered=True)
df.sort_values('Labels')
Output:
Feat1 Feat2 Feat3 Feat4 Labels
11 -100.685440 -4.581096 1.697769 1.880162 1
15 -61.951996 -8.960198 -1.591526 5.616025 1
8 -97.898920 -0.381567 6.416338 7.234347 1
9 -81.963250 0.182172 -1.287036 4.703838 1
10 -78.419860 -6.766374 0.800118 0.834449 1
14 -94.504845 -0.999922 3.288430 6.881124 1
12 -87.054120 -2.923158 6.817379 5.446008 1
13 -64.121056 -3.789221 -0.283514 6.308415 1
21 -208.003500 -1.332290 -3.207809 1.517740 0
20 -137.597600 6.795896 1.679367 2.225676 0
19 -199.187870 3.309057 -2.599479 4.080255 0
18 -92.641335 22.106240 -3.511055 2.467166 0
17 -97.403700 22.897585 -2.848827 1.410504 0
16 -108.194520 13.909201 0.696646 -1.956591 0
23 -121.292990 18.274035 2.289155 2.336075 0
22 -108.225975 14.341716 1.028910 -1.865197 0
7 -24.469942 27.005375 -9.301921 4.399504 2
6 -32.305700 26.568039 -9.470180 3.453279 2
5 -42.884900 28.038020 -7.857254 3.336100 2
4 -37.075096 22.347767 -3.860426 5.695394 2
3 -35.296390 21.709473 -4.160352 5.578346 2
2 -42.092365 25.680704 -5.009290 5.665794 2
1 -23.806690 20.536781 -5.015675 4.221635 2
0 -46.220314 22.862856 -6.157307 5.606041 2
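If you would rather not overwrite the Labels column, the same ordered Categorical can be used as a temporary sort key. The following is only a sketch under some assumptions: the small frame is a made-up stand-in for the question's data, and the helper column name _order is hypothetical. Passing kind="stable" keeps the original row order within each label, which the default sort does not guarantee.

```python
import pandas as pd

# Small stand-in for the question's frame (values are illustrative only).
df = pd.DataFrame({
    "Feat1": [-46.2, -23.8, -97.9, -81.9, -108.2, -97.4],
    "Labels": [2, 2, 1, 1, 0, 0],
})

# Build the ordered categorical as a temporary key instead of overwriting "Labels".
label_order = [1, 0, 2]
ordered = pd.Categorical(df["Labels"], categories=label_order, ordered=True)

# "_order" is a hypothetical helper column; kind="stable" preserves the
# original row order within each label.
result = (
    df.assign(_order=ordered)
      .sort_values("_order", kind="stable")
      .drop(columns="_order")
)
print(result)
```

The original df is left untouched, and Labels keeps its integer dtype in the result.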
You can use an ordered Categorical, or if you don't want to change the DataFrame, the poor-man's variant, a mapping Series:
order = [1, 0, 2]
key = pd.Series({k:v for v,k in enumerate(order)}).get
# or
# pd.Series(range(len(order)), index=order).get
df1.sort_values(by='Labels', key=key)
Example:
df1 = pd.DataFrame({'Labels': [1,0,1,2,0,2,1]})
order = [1, 0, 2]
key = pd.Series({k:v for v,k in enumerate(order)}).get
print(df1.sort_values(by='Labels', key=key))
Labels
0 1
2 1
6 1
1 0
4 0
3 2
5 2
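A variant of the same idea, in case the Series.get trick looks too terse: build a plain dict of ranks and map it inside the key function. This is a sketch, not a replacement for the answer above; the name rank is my own, and the key argument of sort_values needs pandas 1.1 or later.

```python
import pandas as pd

df1 = pd.DataFrame({"Labels": [1, 0, 1, 2, 0, 2, 1]})
order = [1, 0, 2]

# Map each label to its position in the desired order: {1: 0, 0: 1, 2: 2}.
rank = {label: position for position, label in enumerate(order)}

# The key function receives the "Labels" Series and must return a Series of
# the same length; kind="stable" keeps ties in their original order.
result = df1.sort_values(by="Labels", key=lambda s: s.map(rank), kind="stable")
print(result)
```

df1 itself is not modified; only the returned frame is reordered.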
Here is another way to do it.
Create a helper column by mapping each label to its position in the desired order, then sort on that column as usual:
df['sort_label'] = df['Labels'].map({1:0, 0:1, 2:2})
df.sort_values('sort_label')
Feat1 Feat2 Feat3 Feat4 Labels sort_label
11 -100.685440 -4.581096 1.697769 1.880162 1 0
15 -61.951996 -8.960198 -1.591526 5.616025 1 0
8 -97.898920 -0.381567 6.416338 7.234347 1 0
9 -81.963250 0.182172 -1.287036 4.703838 1 0
10 -78.419860 -6.766374 0.800119 0.834449 1 0
14 -94.504845 -0.999922 3.288430 6.881124 1 0
12 -87.054120 -2.923158 6.817379 5.446008 1 0
13 -64.121056 -3.789221 -0.283514 6.308415 1 0
21 -208.003500 -1.332290 -3.207809 1.517740 0 1
20 -137.597600 6.795896 1.679367 2.225676 0 1
19 -199.187870 3.309057 -2.599479 4.080255 0 1
18 -92.641335 22.106240 -3.511054 2.467166 0 1
17 -97.403700 22.897585 -2.848827 1.410504 0 1
16 -108.194520 13.909201 0.696646 -1.956591 0 1
23 -121.292990 18.274035 2.289155 2.336075 0 1
22 -108.225975 14.341716 1.028910 -1.865197 0 1
7 -24.469942 27.005375 -9.301921 4.399504 2 2
6 -32.305700 26.568039 -9.470180 3.453279 2 2
5 -42.884900 28.038020 -7.857254 3.336100 2 2
4 -37.075096 22.347767 -3.860426 5.695394 2 2
3 -35.296390 21.709473 -4.160352 5.578346 2 2
2 -42.092365 25.680704 -5.009290 5.665794 2 2
1 -23.806690 20.536781 -5.015675 4.221635 2 2
0 -46.220314 22.862856 -6.157307 5.606041 2 2
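One thing to watch with the map approach: a label that is missing from the mapping becomes NaN in the helper column, and sort_values puts NaN last by default. A small sketch, assuming a made-up frame that contains an unmapped label 3, and dropping the helper column again after sorting:

```python
import pandas as pd

# Illustrative data; label 3 is deliberately absent from the mapping below.
df = pd.DataFrame({"Labels": [2, 3, 1, 0, 1]})
rank = {1: 0, 0: 1, 2: 2}

result = (
    df.assign(sort_label=df["Labels"].map(rank))  # unmapped labels become NaN
      .sort_values("sort_label", kind="stable")   # NaN rows are placed last by default
      .drop(columns="sort_label")                 # remove the helper column again
)
print(result)
```

If unmapped labels should come first instead, sort_values accepts na_position='first'.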
I have a df as shown below
df:
Id Jan20 Feb20 Mar20 Apr20 May20 Jun20 Jul20 Aug20 Sep20 Oct20 Nov20 Dec20 Amount
1 20 0 0 12 1 3 1 0 0 2 2 0 100
2 0 0 2 1 0 2 0 0 1 0 0 0 500
3 1 2 1 2 3 1 1 2 2 3 1 1 300
From the above I would like to calculate an Activeness value, which is the number of non-zero values per row across the month columns listed below.
'Jan20', 'Feb20', 'Mar20', 'Apr20', 'May20', 'Jun20', 'Jul20',
'Aug20', 'Sep20', 'Oct20', 'Nov20', 'Dec20'
Expected Output:
Id Jan20 Feb20 Mar20 Apr20 May20 Jun20 Jul20 Aug20 Sep20 Oct20 Nov20 Dec20 Amount Activeness
1 20 0 0 12 1 3 1 0 0 2 2 0 100 7
2 0 0 2 1 0 2 0 0 1 0 0 0 500 4
3 1 2 1 2 3 1 1 2 2 3 1 1 300 12
I tried the code below:
df['Activeness'] = pd.Series(index=df.index, data=np.count_nonzero(df[['Jan20', 'Feb20',
                             'Mar20', 'Apr20', 'May20', 'Jun20', 'Jul20',
                             'Aug20', 'Sep20', 'Oct20', 'Nov20', 'Dec20']], axis=1))
It works well, but I would like to know whether there is a faster method.
You can try:
df['Activeness'] = df.filter(like = '20').ne(0, axis =1).sum(1)
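Note that filter(like='20') picks up every column whose name contains "20", so it relies on no other column matching. Here is a self-contained sketch on the question's three rows that spells the month columns out and checks that the filter version and np.count_nonzero agree. The names month_cols and via_filter are mine; I have not timed the two variants, so measure on your own data before choosing.

```python
import numpy as np
import pandas as pd

month_cols = ["Jan20", "Feb20", "Mar20", "Apr20", "May20", "Jun20",
              "Jul20", "Aug20", "Sep20", "Oct20", "Nov20", "Dec20"]

# The three rows from the question.
df = pd.DataFrame(
    [[1, 20, 0, 0, 12, 1, 3, 1, 0, 0, 2, 2, 0, 100],
     [2, 0, 0, 2, 1, 0, 2, 0, 0, 1, 0, 0, 0, 500],
     [3, 1, 2, 1, 2, 3, 1, 1, 2, 2, 3, 1, 1, 300]],
    columns=["Id"] + month_cols + ["Amount"],
)

# Variant from the answer: select columns by name pattern, count the non-zeros.
via_filter = df.filter(like="20").ne(0).sum(axis=1)

# Variant with an explicit column list, counting on the underlying NumPy array.
df["Activeness"] = np.count_nonzero(df[month_cols].to_numpy(), axis=1)

print(df[["Id", "Amount", "Activeness"]])
```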
I have a table in df:
X1 X2
1 1
1 2
2 2
2 2
3 3
3 3
I want to calculate Y, where Y = Yprevious + 1 if X1 = X1previous and X2 = X2previous, else 0. Y on the first line is 0. Example:
X1 X2 Y
1 1 0
2 2 0
2 2 1
2 2 2
2 2 3
3 3 0
Not a duplicate... Previously, the question was simpler: adding a value in a specific line. Now the term appears during the calculation itself, so I need some kind of cumulative calculation.
What I need, with a longer example:
X1 X2 Y
1 1 0
2 2 0
2 2 1
2 2 2
2 2 3
3 3 0
3 3 1
2 2 0
What I get with the answer from the linked duplicate:
X1 X2 Y
1 1 0
2 2 0
2 2 1
2 2 2
2 2 3
3 3 0
3 3 1
2 2 4
Use GroupBy.cumcount, grouping by helper columns that identify runs of consecutive equal values:
df1 = df[['X1','X2']].ne(df[['X1','X2']].shift()).cumsum()
df['Y'] = df.groupby([df1['X1'], df1['X2']]).cumcount()
print (df)
X1 X2 Y
0 1 1 0
1 2 2 0
2 2 2 1
3 2 2 2
4 2 2 3
5 3 3 0
6 3 3 1
7 2 2 0
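The same idea can also be written with a single run id instead of one helper column per X column. This is a sketch on the question's longer example; the names cols, changed and run_id are my own. A row starts a new run as soon as any of the compared columns differs from the previous row.

```python
import pandas as pd

# The longer example from the question.
df = pd.DataFrame({"X1": [1, 2, 2, 2, 2, 3, 3, 2],
                   "X2": [1, 2, 2, 2, 2, 3, 3, 2]})
cols = ["X1", "X2"]

# True where the row differs from the previous row in at least one column
# (the first row always counts as a change because shift() yields NaN).
changed = df[cols].ne(df[cols].shift()).any(axis=1)

# Each change starts a new run; cumsum turns the flags into run ids.
run_id = changed.cumsum()

# Position of each row within its run, starting at 0.
df["Y"] = df.groupby(run_id).cumcount()
print(df)
```

Because the run id only ever increases, the last 2 2 row gets its own run and starts again at 0 instead of continuing at 4.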
I'm trying to create a simple game in python 3 and I'm trying to build in an EXP system, for example, every 50 experience points, your health (Which is already an integer) increases by one. Is there a command for this?
(I'm coding this on repl.it if that matters)
I've never shunned guessing. :)
Let me suppose that you are incrementing a variable called experience_points and that, for every 50 increments, you want to increase a variable called health by one.
experience_points += 1
if experience_points % 50 == 0:
    health += 1
This bit of code shows how this might work. Notice how health goes up by one for every 50 times that experience_points goes up by one.
Welcome to the modulus operator!
>>> experience_points = 0
>>> health = 0
>>> while True:
...     # do something in the game
...     experience_points += 1
...     if experience_points % 50 == 0:
...         health += 1
...     print (experience_points, health, '<--', end='')
...     if experience_points > 160:
...         break
...
1 0 <--2 0 <--3 0 <--4 0 <--5 0 <--6 0 <--7 0 <--8 0 <--9 0 <--10 0 <--11 0 <--12 0 <--13 0 <--14 0 <--15 0 <--16 0 <--17 0 <--18 0 <--19 0 <--20 0 <--21 0 <--22 0 <--23 0 <--24 0 <--25 0 <--26 0 <--27 0 <--28 0 <--29 0 <--30 0 <--31 0 <--32 0 <--33 0 <--34 0 <--35 0 <--36 0 <--37 0 <--38 0 <--39 0 <--40 0 <--41 0 <--42 0 <--43 0 <--44 0 <--45 0 <--46 0 <--47 0 <--48 0 <--49 0 <--50 1 <--51 1 <--52 1 <--53 1 <--54 1 <--55 1 <--56 1 <--57 1 <--58 1 <--59 1 <--60 1 <--61 1 <--62 1 <--63 1 <--64 1 <--65 1 <--66 1 <--67 1 <--68 1 <--69 1 <--70 1 <--71 1 <--72 1 <--73 1 <--74 1 <--75 1 <--76 1 <--77 1 <--78 1 <--79 1 <--80 1 <--81 1 <--82 1 <--83 1 <--84 1 <--85 1 <--86 1 <--87 1 <--88 1 <--89 1 <--90 1 <--91 1 <--92 1 <--93 1 <--94 1 <--95 1 <--96 1 <--97 1 <--98 1 <--99 1 <--100 2 <--101 2 <--102 2 <--103 2 <--104 2 <--105 2 <--106 2 <--107 2 <--108 2 <--109 2 <--110 2 <--111 2 <--112 2 <--113 2 <--114 2 <--115 2 <--116 2 <--117 2 <--118 2 <--119 2 <--120 2 <--121 2 <--122 2 <--123 2 <--124 2 <--125 2 <--126 2 <--127 2 <--128 2 <--129 2 <--130 2 <--131 2 <--132 2 <--133 2 <--134 2 <--135 2 <--136 2 <--137 2 <--138 2 <--139 2 <--140 2 <--141 2 <--142 2 <--143 2 <--144 2 <--145 2 <--146 2 <--147 2 <--148 2 <--149 2 <--150 3 <--151 3 <--152 3 <--153 3 <--154 3 <--155 3 <--156 3 <--157 3 <--158 3 <--159 3 <--160 3 <--161 3 <--
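The modulus check works as long as experience goes up by exactly one at a time. If your game awards several points at once (I'm guessing again), the counter can jump over a multiple of 50 without ever landing on it. A sketch that handles this with integer division; the function name gain_experience and its parameters are made up for illustration:

```python
def gain_experience(experience_points, health, gained, points_per_health=50):
    """Return the updated (experience_points, health) after gaining some points."""
    new_points = experience_points + gained
    # Number of multiples of points_per_health passed by this gain.
    thresholds_crossed = (new_points // points_per_health
                          - experience_points // points_per_health)
    return new_points, health + thresholds_crossed


experience_points, health = 40, 10
experience_points, health = gain_experience(experience_points, health, 30)
print(experience_points, health)  # 70 11
```

Here the jump from 40 to 70 passes 50 once, so health goes from 10 to 11 even though experience_points % 50 is never 0.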
I'm rather new to Python.
I'm trying to compute a cumulative sum for each client to see the consecutive months of inactivity (flag: 1 or 0). The cumulative sum of the 1s therefore needs to reset whenever there is a 0. It also needs to reset when a new client starts. See the example below, where a is the client column and b holds the dates.
After some research, I found the questions 'Cumsum reset at NaN' and 'In Python Pandas using cumsum with groupby'. I assume I need to combine them somehow.
Adapting the code from 'Cumsum reset at NaN' so that it resets at 0 works:
cumsum = v.cumsum().ffill()
reset = -cumsum[v.isnull() !=0].diff().fillna(cumsum)
result = v.where(v.notnull(), reset).cumsum()
However, I can't get it to work together with a groupby. My count just keeps going...
So, a dataset would look like this:
import pandas as pd
df = pd.DataFrame({'a' : [1,1,1,1,1,1,1,2,2,2,2,2,2,2],
                   'b' : [1/15,2/15,3/15,4/15,5/15,6/15,1/15,2/15,3/15,4/15,5/15,6/15,7/15,8/15],
                   'c' : [1,0,1,0,1,1,0,1,1,0,1,1,1,1]})
This should result in a dataframe with the columns a, b, c and d, with
'd' : [1,0,1,0,1,2,0,1,2,0,1,2,3,4]
Please note that I have a very large dataset, so calculation time is really important.
Thank you for helping me
Use groupby.apply on the client groups. Within each group, label runs of contiguous equal values by comparing each value with its predecessor and taking the cumsum. groupby.cumcount on those run labels then gives each row's position within its run, and adding 1 turns that into a 1-based count.
Multiplying by the original column acts as a logical AND: rows where c is 0 are zeroed out, and only the runs of 1s keep their counts.
df['d'] = df.groupby('a', group_keys=False)['c'] \
            .apply(lambda x: x * (x.groupby((x != x.shift()).cumsum()).cumcount() + 1))
print(df['d'])
0 1
1 0
2 1
3 0
4 1
5 2
6 0
7 1
8 2
9 0
10 1
11 2
12 3
13 4
Name: d, dtype: int64
Another way is to use series.expanding on the groupby object, which applies a function to the values from the first index of the group up to the current index.
reduce then applies a two-argument function cumulatively to those values, reducing them to a single number: the length of the current run of 1s.
from functools import reduce
df.groupby('a')['c'].expanding() \
  .apply(lambda i: reduce(lambda x, y: x+1 if y==1 else 0, i, 0))
a
1 0 1.0
1 0.0
2 1.0
3 0.0
4 1.0
5 2.0
6 0.0
2 7 1.0
8 2.0
9 0.0
10 1.0
11 2.0
12 3.0
13 4.0
Name: c, dtype: float64
Timings:
%%timeit
df.groupby('a')['c'].apply(lambda x: x * (x.groupby((x != x.shift()).cumsum()).cumcount() + 1))
100 loops, best of 3: 3.35 ms per loop
%%timeit
df.groupby('a')['c'].expanding().apply(lambda s: reduce(lambda x, y: x+1 if y==1 else 0, s, 0))
1000 loops, best of 3: 1.63 ms per loop
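Since calculation time matters for a large dataset, it may be worth trying a version that avoids apply with a Python lambda altogether. The following is a sketch, not something I have benchmarked: the function name add_streak and its parameters are my own, and it assumes the flag column only contains 0 and 1. It labels each run of equal flags with a block id and takes the cumsum of the flag within each (client, block) pair, so the sum restarts at every 0 and at every new client.

```python
import pandas as pd

def add_streak(df, group_col="a", flag_col="c", out_col="d"):
    """Return a copy of df with a per-client running count of consecutive 1s."""
    # New block id each time the flag differs from the previous row.
    block = df[flag_col].ne(df[flag_col].shift()).cumsum().rename("block")
    # Summing the flag within (client, block) gives 1, 2, 3, ... for runs of 1s
    # and 0 for runs of 0s; a new client always starts a new group.
    streak = df.groupby([group_col, block])[flag_col].cumsum()
    return df.assign(**{out_col: streak})


df = pd.DataFrame({"a": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                   "c": [1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1]})
result = add_streak(df)
print(result["d"].tolist())
```

Grouping by the client column as well as the block id matters when a run of 1s spans the boundary between two clients: the block id alone would not reset there.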
I think you need a custom function with groupby:
# changed the value of c in the row with index 6 to 1 for better testing
df = pd.DataFrame({'a' : [1,1,1,1,1,1,1,2,2,2,2,2,2,2],
                   'b' : [1/15,2/15,3/15,4/15,5/15,6/15,1/15,2/15,3/15,4/15,5/15,6/15,7/15,8/15],
                   'c' : [1,0,1,0,1,1,1,1,1,0,1,1,1,1],
                   'd' : [1,0,1,0,1,2,3,1,2,0,1,2,3,4]})
print (df)
a b c d
0 1 0.066667 1 1
1 1 0.133333 0 0
2 1 0.200000 1 1
3 1 0.266667 0 0
4 1 0.333333 1 1
5 1 0.400000 1 2
6 1 0.066667 1 3
7 2 0.133333 1 1
8 2 0.200000 1 2
9 2 0.266667 0 0
10 2 0.333333 1 1
11 2 0.400000 1 2
12 2 0.466667 1 3
13 2 0.533333 1 4
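def f(x):
    x = x.copy()  # work on a copy of the group rather than on a slice of df
    x.loc[x.c == 1, 'e'] = 1  # .loc replaces the removed .ix indexer
    a = x.e.notnull()
    # running count of 1s minus the count at the most recent 0 = current streak
    x.e = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)
    return (x)
print (df.groupby('a', group_keys=False).apply(f))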
a b c d e
0 1 0.066667 1 1 1
1 1 0.133333 0 0 0
2 1 0.200000 1 1 1
3 1 0.266667 0 0 0
4 1 0.333333 1 1 1
5 1 0.400000 1 2 2
6 1 0.066667 1 3 3
7 2 0.133333 1 1 1
8 2 0.200000 1 2 2
9 2 0.266667 0 0 0
10 2 0.333333 1 1 1
11 2 0.400000 1 2 2
12 2 0.466667 1 3 3
13 2 0.533333 1 4 4