I have the following dataframe:
car_id time(seconds) is_charging
1 1 65 1
2 1 70 1
3 1 67 1
4 1 71 1
5 1 120 0
6 1 124 0
7 1 117 0
8 1 80 1
9 1 74 1
10 1 62 1
11 1 130 0
12 1 124 0
I want to create a new column that enumerates the charging and discharging periods of the 'is_charging' column, so that later I can group by that new column and compute the mean, max, min, etc. of each period.
The resulting dataframe should be like this:
car_id time(seconds) is_charging periods_id
1 1 65 1 1
2 1 70 1 1
3 1 67 1 1
4 1 71 1 1
5 1 120 0 2
6 1 124 0 2
7 1 117 0 2
8 1 80 1 3
9 1 74 1 3
10 1 62 1 3
11 1 130 0 4
12 1 124 0 4
I've done this using a for loop, like this:
df['periods_id'] = 0

def computePeriodIDs(df):
    period_id = 1
    previous_charging_state = df.at[df.index[0], 'is_charging']
    for ind in df.index:
        if df.at[ind, 'is_charging'] != previous_charging_state:
            previous_charging_state = df.at[ind, 'is_charging']
            period_id = period_id + 1
        df.at[ind, 'periods_id'] = period_id

computePeriodIDs(df)
This is far too slow for the number of rows I have. I'm trying to use a vectorized function, especially apply(), but due to my lack of understanding I haven't had much success, and I cannot find a similar problem online.
Can someone help me optimize this?
Try this:
df.is_charging.diff().ne(0).cumsum()
Out[115]:
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 3
9 3
10 3
11 4
12 4
Name: is_charging, dtype: int32
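The idea: diff() is non-zero (NaN on the first row) exactly where is_charging changes, and cumsum() over that boolean mask increments the counter at each change. A minimal sketch of attaching the column and then aggregating per period (df and column names taken from the question):

df['periods_id'] = df['is_charging'].diff().ne(0).cumsum()

# per-period statistics; any aggregation works here
stats = df.groupby('periods_id')['time(seconds)'].agg(['mean', 'max', 'min'])
print(stats)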
I have two dataframes with similar shapes and column names, and I would like to copy the values of df1['property'] and paste them into df2['property'], but there is a condition.
df1:
i j k property
1 1 1 10
1 1 2 20
1 1 3 30
1 2 1 40
1 2 2 50
1 2 3 60
1 3 1 70
1 3 2 80
1 3 3 90
2 1 1 100
2 1 2 110
2 1 3 120
2 2 1 130
2 2 2 140
2 2 3 150
2 3 1 160
2 3 2 170
2 3 3 180
3 1 1 190
3 1 2 200
3 1 3 210
3 2 1 220
3 2 2 230
3 2 3 240
3 3 1 250
3 3 2 260
3 3 3 270
df2:
i j k property
1 1 1 100
2 1 1 100
3 1 1 100
1 2 1 100
2 2 1 100
3 2 1 100
1 3 1 100
2 3 1 100
3 3 1 100
1 1 2 100
2 1 2 100
3 1 2 100
1 2 2 100
2 2 2 100
3 2 2 100
1 3 2 100
2 3 2 100
3 3 2 100
1 1 3 100
2 1 3 100
3 1 3 100
1 2 3 100
2 2 3 100
3 2 3 100
1 3 3 100
2 3 3 100
3 3 3 100
The other three columns (i, j, k) represent different positions, and the copied value of df1['property'] must replace df2['property'] only where df1[['i','j','k']] is the same as df2[['i','j','k']]. Could anyone help me with this?
In my mind, I should use the map function, but I do not know how to apply it with a three-column condition.
IIUC you want DataFrame.merge:
df2['property'] = (df2.drop('property', axis=1)
                      .merge(df1, on=['i', 'j', 'k'], how='left')['property']
                      .fillna(df2['property']))
print(df2)
# or this:
# df2['property'] = (df2.merge(df1, on=['i', 'j', 'k'], how='left')['property_y']
#                       .fillna(df2['property']))
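(The how='left' merge produces NaN for any (i, j, k) key of df2 that has no match in df1, and fillna then keeps the old df2 value there. With the data above every key matches, so it is only a safety net.)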
We could also use DataFrame.update:
df2_update=df2.set_index(['i','j','k'])
df2_update.update(df1.set_index(['i','j','k']))
df2_update = df2_update.reset_index()
print(df2_update)
Output
i j k property
0 1 1 1 10
1 2 1 1 100
2 3 1 1 190
3 1 2 1 40
4 2 2 1 130
5 3 2 1 220
6 1 3 1 70
7 2 3 1 160
8 3 3 1 250
9 1 1 2 20
10 2 1 2 110
11 3 1 2 200
12 1 2 2 50
13 2 2 2 140
14 3 2 2 230
15 1 3 2 80
16 2 3 2 170
17 3 3 2 260
18 1 1 3 30
19 2 1 3 120
20 3 1 3 210
21 1 2 3 60
22 2 2 3 150
23 3 2 3 240
24 1 3 3 90
25 2 3 3 180
26 3 3 3 270
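On the map idea from the question: a possible sketch (not from the answers above) is to build a lookup Series keyed by the three columns and reindex it against df2's keys, with df1 and df2 as in the question:

import pandas as pd

# lookup keyed by (i, j, k); reindex pulls out the matching df1 values
lookup = df1.set_index(['i', 'j', 'k'])['property']
keys = pd.MultiIndex.from_frame(df2[['i', 'j', 'k']])
df2['property'] = pd.Series(lookup.reindex(keys).to_numpy(), index=df2.index).fillna(df2['property'])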
I'd do this:
import pandas as pd
import numpy as np

df1 = pd.DataFrame(dict(i=np.repeat([1, 2, 3], 9),
                        j=np.repeat([[1, 2, 3], [1, 2, 3], [1, 2, 3]], 3),
                        k=[1, 2, 3] * 9,
                        property=range(10, 280, 10)))
df2 = pd.DataFrame(dict(k=np.repeat([1, 2, 3], 9),
                        j=np.repeat([[1, 2, 3], [1, 2, 3], [1, 2, 3]], 3),
                        i=[1, 2, 3] * 9,
                        property=100))

df = pd.concat([df1, df2.rename(columns={"i": "ii", "j": "jj", "k": "kk", "property": "property2"})], axis=1)
df.property2 = np.where((df.i == df.ii) & (df.j == df.jj) & (df.k == df.kk), df.property, df.property2)
df = df[["ii", "jj", "kk", "property2"]]
print(df)
Gives:
ii jj kk property2
0 1 1 1 10
1 2 1 1 100
2 3 1 1 100
3 1 2 1 40
4 2 2 1 100
5 3 2 1 100
6 1 3 1 70
7 2 3 1 100
8 3 3 1 100
9 1 1 2 100
10 2 1 2 110
11 3 1 2 100
12 1 2 2 100
13 2 2 2 140
14 3 2 2 100
15 1 3 2 100
16 2 3 2 170
17 3 3 2 100
18 1 1 3 100
19 2 1 3 100
20 3 1 3 210
21 1 2 3 100
22 2 2 3 100
23 3 2 3 240
24 1 3 3 100
25 2 3 3 100
26 3 3 3 270
I have a DataFrame with two columns, ID and Value1. I want to select the rows where the value of the Value1 column changes, keeping the 3 rows before each change, the 3 rows after it, and the change-point row itself.
df=pd.DataFrame({'ID':[1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],'Value1':[0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
ID Value1
0 1 0
1 3 0
2 4 0
3 6 0
4 7 0
5 8 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
12 56 1
13 78 1
14 89 1
15 34 1
16 56 1
output:
ID Value1
0 4 0
1 6 0
2 7 0
3 8 2
4 90 2
5 23 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
IIUC,
import numpy as np
df=pd.DataFrame({'ID':[1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],'Value1':[0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
df = df.reset_index(drop=True)  # the index needs to start from zero for this solution
ind = list(set([val for i in df[df['Value1'].diff()!=0].index for val in range(i-3, i+4) if i>0 and val>=0]))
# diff gives the row-wise difference from the previous value; the nested
# comprehension collects a window of +/-3 rows around each change point,
# and list(set()) drops any duplicate index values
df[df.index.isin(ind)]
ID Value1
2 4 0
3 6 0
4 7 0
5 8 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
12 56 1
13 78 1
14 89 1
15 34 1
If you want to retain duplicate occurrences (as in the expected output above), drop the list(set()) wrapper around the comprehension.
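A NumPy-based sketch of the same idea (df as defined above; this is an alternative, not the original answer):

import numpy as np

change = np.flatnonzero(df['Value1'].diff().fillna(0).ne(0))  # change-point positions
windows = np.concatenate([np.arange(i - 3, i + 4) for i in change])  # +/-3 rows around each
keep = np.unique(windows)
keep = keep[(keep >= 0) & (keep < len(df))]  # clip to valid positions
print(df.iloc[keep])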
Can anyone help me with this issue, please? I have used other data sets with different numbers of studies (NS) and treatments (NT) and it worked fine.
Any help will be highly appreciated.
The dataset is as follows:
list(N=186, NS=5, NT=3, mean=c(0,0))
Where
N = number of intervals
NS = number of studies
NT = number of treatments
s[] = study ID
r[] = number of events
n[] = number at risk
t[] = study arm base
time[] = time in months
dt[] = difference in interval (months)
b[] = study arm base is listed above; the model is:
model {
  for (i in 1:N) {  # N = number of datapoints in dataset
    # likelihood
    r[i] ~ dbin(p[i], n[i])
    p[i] <- 1 - exp(-h[i]*dt[i])  # hazard h over interval [t, t+dt], expressed as deaths per unit person-time (e.g. months)
    # fixed effects model
    log(h[i]) <- nu[i] + log(time[i])*theta[i]
    nu[i] <- mu[s[i],1] + d[s[i],1]*(1 - equals(t[i],b[i]))
    theta[i] <- mu[s[i],2] + d[s[i],2]*(1 - equals(t[i],b[i]))
  }
  # priors
  d[1,1] <- 0
  d[1,2] <- 0
  for (j in 2:NT) {  # NT = number of treatments
    d[j,1:2] ~ dmnorm(mean[1:2], prec2[,])
  }
  for (k in 1:NS) {
    mu[k,1:2] ~ dmnorm(mean[1:2], prec2[,])
  }
}
# WinBUGS data set
list(N=176, NS=5, NT=3, mean=c(0,0),
prec2 = structure(.Data = c(0.0001,0,0,0.0001), .Dim = c(2,2)))
# initials 1
list(
d=structure(.Data=c(NA,NA,0,0,0,0,0,0), .Dim = c(4,2)),
mu = structure(.Data=c(1,1,1,1,1,1,1,1), .Dim = c(4,2)))
# initials 2
list(
d=structure(.Data=c(NA,NA,0.5,0.5,0.5,0.5,0.5,0.5), .Dim = c(4,2)),
mu = structure(.Data=c(0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5), .Dim = c(4,2)))
s[] r[] n[] t[] b[] time[] dt[]
1 1 62 1 1 3 2
1 2 59 1 1 7 4
1 6 53 1 1 11 2
1 2 51 1 1 13 2
1 3 48 1 1 15 2
1 2 45 1 1 17 2
1 5 40 1 1 19 2
1 2 37 1 1 23 4
1 2 35 1 1 25 2
1 2 32 1 1 27 2
1 1 31 1 1 29 2
1 2 28 1 1 31 2
1 2 26 1 1 33 2
1 2 23 1 1 35 2
1 1 21 1 1 39 4
1 1 14 1 1 51 12
1 2 55 2 1 5 4
1 1 54 2 1 7 2
1 2 52 2 1 9 2
1 1 51 2 1 11 2
1 5 46 2 1 13 2
1 2 44 2 1 15 2
1 3 41 2 1 17 2
1 3 37 2 1 19 2
1 2 35 2 1 21 2
1 1 34 2 1 23 2
1 1 33 2 1 25 2
1 1 32 2 1 31 6
1 3 29 2 1 33 2
1 1 28 2 1 35 2
1 1 26 2 1 39 4
1 1 24 2 1 41 2
1 1 22 2 1 43 2
1 2 19 2 1 45 2
2 8 169 1 1 3 4
2 10 148 1 1 5 2
2 8 137 1 1 7 2
2 6 127 1 1 9 2
2 8 118 1 1 11 2
2 7 109 1 1 13 2
2 3 105 1 1 15 2
2 4 95 1 1 17 2
2 3 84 1 1 19 2
2 3 76 1 1 21 2
2 4 68 1 1 23 2
2 4 60 1 1 25 2
2 4 50 1 1 27 2
2 1 35 1 1 31 4
2 2 29 1 1 33 2
2 1 25 1 1 35 2
2 3 21 1 1 37 2
2 1 18 1 1 39 2
2 2 11 1 1 43 4
2 1 180 2 1 1 2
2 11 162 2 1 3 2
2 9 147 2 1 5 2
2 9 135 2 1 7 2
2 6 125 2 1 9 2
2 6 116 2 1 11 2
2 6 106 2 1 13 2
2 7 95 2 1 15 2
2 1 92 2 1 17 2
2 5 84 2 1 19 2
2 3 77 2 1 21 2
2 2 67 2 1 23 2
2 1 59 2 1 25 2
2 4 49 2 1 27 2
2 1 40 2 1 29 2
2 2 34 2 1 31 2
2 3 23 2 1 37 6
2 1 19 2 1 39 2
4 1 62 1 1 3 2
4 2 59 1 1 7 4
4 6 53 1 1 11 2
4 2 51 1 1 13 2
4 3 48 1 1 15 2
4 2 45 1 1 17 2
4 5 40 1 1 19 2
4 2 37 1 1 23 4
4 2 35 1 1 25 2
4 2 32 1 1 27 2
4 1 31 1 1 29 2
4 2 28 1 1 31 2
4 2 26 1 1 33 2
4 2 23 1 1 35 2
4 1 21 1 1 39 4
4 1 14 1 1 51 12
4 2 55 2 1 5 4
4 1 54 2 1 7 2
4 2 52 2 1 9 2
4 1 51 2 1 11 2
4 5 46 2 1 13 2
4 2 44 2 1 15 2
4 3 41 2 1 17 2
4 3 37 2 1 19 2
4 2 35 2 1 21 2
4 1 34 2 1 23 2
4 1 33 2 1 25 2
4 1 32 2 1 31 6
4 3 29 2 1 33 2
4 1 28 2 1 35 2
4 1 26 2 1 39 4
4 1 24 2 1 41 2
4 1 22 2 1 43 2
4 2 19 2 1 45 2
5 8 169 1 1 3 4
5 10 148 1 1 5 2
5 8 137 1 1 7 2
5 6 127 1 1 9 2
5 8 118 1 1 11 2
5 7 109 1 1 13 2
5 3 105 1 1 15 2
5 4 95 1 1 17 2
5 3 84 1 1 19 2
5 3 76 1 1 21 2
5 4 68 1 1 23 2
5 4 60 1 1 25 2
5 4 50 1 1 27 2
5 1 35 1 1 31 4
5 2 29 1 1 33 2
5 1 25 1 1 35 2
5 3 21 1 1 37 2
5 1 18 1 1 39 2
5 2 11 1 1 43 4
5 1 180 2 1 1 2
5 11 162 2 1 3 2
5 9 147 2 1 5 2
5 9 135 2 1 7 2
5 6 125 2 1 9 2
5 6 116 2 1 11 2
5 6 106 2 1 13 2
5 7 95 2 1 15 2
5 1 92 2 1 17 2
5 5 84 2 1 19 2
5 3 77 2 1 21 2
5 2 67 2 1 23 2
5 1 59 2 1 25 2
5 4 49 2 1 27 2
5 1 40 2 1 29 2
5 2 34 2 1 31 2
5 3 23 2 1 37 6
5 1 19 2 1 39 2
3 2 179 1 1 1 2
3 4 172 1 1 3 2
3 3 168 1 1 5 2
3 6 157 1 1 7 2
3 4 151 1 1 9 2
3 9 142 1 1 11 2
3 10 130 1 1 13 2
3 7 123 1 1 15 2
3 3 119 1 1 17 2
3 5 112 1 1 19 2
3 3 108 1 1 21 2
3 3 103 1 1 23 2
3 12 91 1 1 25 2
3 2 68 1 1 27 2
3 2 46 1 1 29 2
3 8 29 1 1 31 2
3 2 23 1 1 33 2
3 3 8 1 1 35 2
3 5 175 3 1 3 4
3 7 163 3 1 5 2
3 12 151 3 1 7 2
3 12 139 3 1 9 2
3 4 132 3 1 11 2
3 9 122 3 1 13 2
3 7 114 3 1 15 2
3 4 108 3 1 17 2
3 7 101 3 1 19 2
3 5 96 3 1 21 2
3 7 89 3 1 23 2
3 2 87 3 1 25 2
3 4 68 3 1 27 2
3 4 50 3 1 29 2
3 3 40 3 1 31 2
3 3 22 3 1 33 2
3 1 8 3 1 35 2
END
You have set NT = 3, while the indexing vector s ranges from 1 to 5 (and d is indexed by s[i] in your model, so d needs as many rows as there are values of s).
Set NT = 5 or NT = length(unique(s)).
I want to calculate a return column, RET, which is the cumulative product over 2 periods (the current and the next period), computed per id with groupby('id').
df['RET'] = df.groupby('id')['trt1m1'].rolling(2,min_periods=2).apply(lambda x:x.prod()).reset_index(0,drop=True)
Expected Result:
id datadate trt1m1 RET
1 20051231 1 2
1 20060131 2 6
1 20060228 3 12
1 20060331 4 16
1 20060430 4 20
1 20060531 5 NaN
2 20061031 10 110
2 20061130 11 165
2 20061231 15 300
2 20070131 20 420
2 20070228 21 NaN
Actual Result:
id datadate trt1m1 RET
1 20051231 1 NaN
1 20060131 2 2
1 20060228 3 6
1 20060331 4 12
1 20060430 4 16
1 20060531 5 20
2 20061031 10 NaN
2 20061130 11 110
2 20061231 15 165
2 20070131 20 300
2 20070228 21 420
The code I used calculates the cumulative product over the trailing 2 periods instead of the forward 2 periods.
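For a 2-period forward window the product can be taken directly as the current value times the next one; a minimal sketch (assuming the df from the question):

# product of the current and next value within each id group;
# the last row of each group has no next value, so it becomes NaN
df['RET'] = df.groupby('id')['trt1m1'].transform(lambda s: s * s.shift(-1))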
I'm trying to transpose and sum with the following criteria: I have to create a row for each LOGIN and DATE, with a column holding the ACT values and the sums of their respective MAP values, and, in the middle, separated by ':', the sum of all the MAP values, as follows:
LOGIN DATE ACT MAP
1 11/02/2008 149 3
1 11/02/2008 18 1
1 11/02/2008 18 1
1 11/02/2008 18 5
1 13/02/2008 145 2
1 13/02/2008 43 3
2 13/02/2008 19 0
2 13/02/2008 18 1
2 14/02/2008 18 1
2 14/02/2008 18 1
3 14/02/2008 39 1
3 15/02/2008 149 0
3 15/02/2008 43 0
3 15/02/2008 19 1
3 15/02/2008 19 1
1 11/02/2008 149 18 : 10 : 3 7 (this is the first row that I should create, because 149 and 18 are the ACT values for this LOGIN and DATE; 3 is the MAP value for ACT 149, and 7 is the sum of the MAP values for ACT 18, 7 = 1 + 1 + 5; in the middle, the value 10 = 3 + 7)
1 13/02/2008 145 43 : 5: 2 3
2 13/02/2008 19 18 : 1: 1 0
2 14/02/2008 18 : 2 : 2
3 14/02/2008 39 : 1 : 1
3 15/02/2008 149 43 19 : 2 : 0 0 2
I grouped and summed to obtain this, but I need to process it by rows:
LOGIN MAP
1 15
11/02/2008 10
13/02/2008 5
2 3
13/02/2008 1
14/02/2008 2
3 3
14/02/2008 1
15/02/2008 2
I transformed the input file and now it looks like the table below; now I need to concatenate the values of the ACT column until I find a blank row. For example, I need to create 18 149 10 7 3 for the first group, up to the first blank. For the second blank I need to create 43 145 5 3 2.
LOGIN ACT Total
1 18 7
1 149 3
1 10
1 43 3
1 145 2
1 5
2 18 1
2 19 0
2 1
2 18 2
2 2
3 39 1
3 1
3 19 2
3 43 0
3 149 0
3 2
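A sketch of one way to build the requested rows directly from the original LOGIN/DATE/ACT/MAP frame shown at the start of the question (column names as given there; the exact spacing of the output strings is approximate):

import pandas as pd

# sum MAP per (LOGIN, DATE, ACT), keeping first-occurrence order
g = df.groupby(['LOGIN', 'DATE', 'ACT'], sort=False, as_index=False)['MAP'].sum()

def summarize(group):
    acts = ' '.join(group['ACT'].astype(str))  # all ACT values for this LOGIN/DATE
    maps = ' '.join(group['MAP'].astype(str))  # their summed MAP values
    return f"{acts} : {group['MAP'].sum()} : {maps}"

out = g.groupby(['LOGIN', 'DATE'], sort=False).apply(summarize)
print(out)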