I have a DataFrame and I want to shift the column names to the left, starting from a specific column. The original DataFrame has many columns, so I cannot do this by renaming columns one by one.
import pandas as pd

df = pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'H':[1,3,4,7,8,11,1,15,78,15,16,87],
'N':[1,3,4,98,8,11,1,15,20,15,16,87],
'p':[1,3,4,9,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'y':[0,0,0,0,1,1,1,0,0,0,0,0]})
print(df)
A H N p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here I want to remove the label N. First, the DataFrame after removing the label N (the six data columns remain, so the names shift left and the last column is left unnamed):
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Required output:
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here the last column can be ignored.
Note: the original DataFrame has many columns and I cannot rename them by hand, so I need some automatic method to shift the column names left.
You can do:
df.columns = sorted(df.columns.str.replace('N', ''), key=lambda x: x == '')
df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
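This works because the key maps every surviving name to False and the emptied 'N' to True; False sorts before True, and sorted() is stable, so the remaining names keep their original order and the blank label lands last.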
Replace the columns with your own custom list.
>>> cols = list(df.columns)
>>> cols.remove('N')
>>> df.columns = cols + ['']
Output
>>> df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
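If several labels may need to be dropped, here is a hedged generalization of the same idea (shift_names_left is just an illustrative name, not anything from the question):

def shift_names_left(df, labels):
    # keep the surviving names in order and pad with blanks on the right
    kept = [c for c in df.columns if c not in labels]
    df.columns = kept + [''] * (len(df.columns) - len(kept))
    return df

shift_names_left(df, {'N'})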
Suppose I have this dataframe:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
4 20 21 22 23 24
I want to swap the position of row 1 and 2.
Is there a native Pandas function that can do this?
Thanks!
Use rename with a custom dict and sort_index
d = {1: 2, 2: 1}
df_final = df.rename(d).sort_index()
Out[27]:
0 1 2 3 4
0 0 1 2 3 4
1 10 11 12 13 14
2 5 6 7 8 9
3 15 16 17 18 19
4 20 21 22 23 24
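Note that rename(d) only swaps the index labels; sort_index() then reorders the rows by those new labels, which is what actually moves the data into place.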
As far as I am aware, there is no native Pandas function for this, but here is a custom one:
# Input
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, -1))

def swap_rows(df, i1, i2):
    # copy both rows first so the second assignment doesn't read mutated data
    a, b = df.iloc[i1, :].copy(), df.iloc[i2, :].copy()
    df.iloc[i1, :], df.iloc[i2, :] = b, a
    return df

print(swap_rows(df, 1, 2))
Output:
0 1 2 3 4
0 0 1 2 3 4
1 10 11 12 13 14
2 5 6 7 8 9
3 15 16 17 18 19
4 20 21 22 23 24
Cheers!
Try numpy flip:
df.iloc[1:3] = np.flip(df.to_numpy()[1:3], axis=0)
df
0 1 2 3 4
0 0 1 2 3 4
1 10 11 12 13 14
2 5 6 7 8 9
3 15 16 17 18 19
4 20 21 22 23 24
Or swap through a copy, assigning the crossed-over rows from the original:
df1 = df.copy()
df1.iloc[1, :], df1.iloc[2, :] = df.iloc[2, :], df.iloc[1, :]
df1
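Another hedged option along the same positional lines: take a permuted position list with iloc, which returns a reordered copy instead of mutating in place.

order = list(range(len(df)))
order[1], order[2] = order[2], order[1]   # swap positions 1 and 2
df_swapped = df.iloc[order].reset_index(drop=True)
df_swapped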
I have a DataFrame with two columns, ID and Value1, and I want to select the rows where the value in Value1 changes. I want to keep the change-point row itself, plus the 3 rows before and the 3 rows after the change.
df=pd.DataFrame({'ID':[1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],'Value1':[0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
ID Value1
0 1 0
1 3 0
2 4 0
3 6 0
4 7 0
5 8 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
12 56 1
13 78 1
14 89 1
15 34 1
16 56 1
output:
ID Value1
0 4 0
1 6 0
2 7 0
3 8 2
4 90 2
5 23 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
IIUC,
import pandas as pd

df = pd.DataFrame({'ID': [1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],
                   'Value1': [0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
df = df.reset_index(drop=True)  # the index needs to start from zero for this solution
ind = list(set([val for i in df[df['Value1'].diff() != 0].index
                for val in range(i - 3, i + 4) if i > 0 and val >= 0]))
# diff gives the row-wise difference; the nested comprehension expands each
# change point into its 3-before/3-after window, and list(set()) drops duplicates
df[df.index.isin(ind)]
ID Value1
2 4 0
3 6 0
4 7 0
5 8 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
12 56 1
13 78 1
14 89 1
15 34 1
If you want to retain duplicate occurrences (overlapping windows), drop the list(set()) wrapper around the comprehension.
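For what it's worth, here is a boolean-mask sketch of the same windowing (it assumes the default RangeIndex, so labels and positions coincide):

import pandas as pd

df = pd.DataFrame({'ID': [1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],
                   'Value1': [0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
change_points = df.index[df['Value1'].diff().fillna(0).ne(0)]
mask = pd.Series(False, index=df.index)
for cp in change_points:
    mask.iloc[max(cp - 3, 0):cp + 4] = True   # the change row, 3 before, 3 after
df[mask]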
Can anyone help me with this issue, please? I have used other datasets with different numbers of studies (NS) and treatments (NT) and it worked fine.
Any help will be highly appreciated.
The dataset is as follows:
list(N=186, NS=5, NT=3, mean=c(0,0), ...)
Where
N = number of intervals
NS = number of studies
NT = number of treatments
s[] = study ID
r[] = number of events
n[] = number at risk
t[] = study arm ID
b[] = study arm base
time[] = time in months
dt[] = interval width (months)
model {
  for (i in 1:N) {   # N = number of datapoints in dataset
    # likelihood
    r[i] ~ dbin(p[i], n[i])
    p[i] <- 1 - exp(-h[i]*dt[i])   # hazard h over interval [t, t+dt], expressed as deaths per unit person-time (e.g. months)
    # fixed effects model
    log(h[i]) <- nu[i] + log(time[i])*theta[i]
    nu[i] <- mu[s[i],1] + d[s[i],1]*(1 - equals(t[i], b[i]))
    theta[i] <- mu[s[i],2] + d[s[i],2]*(1 - equals(t[i], b[i]))
  }
  # priors
  d[1,1] <- 0
  d[1,2] <- 0
  for (j in 2:NT) {   # NT = number of treatments
    d[j,1:2] ~ dmnorm(mean[1:2], prec2[,])
  }
  for (k in 1:NS) {   # NS = number of studies
    mu[k,1:2] ~ dmnorm(mean[1:2], prec2[,])
  }
}
# WinBUGS data set
list(N=176, NS=5, NT=3, mean=c(0,0),
prec2 = structure(.Data = c(0.0001,0,0,0.0001), .Dim = c(2,2)))
# initials 1
list(
d=structure(.Data=c(NA,NA,0,0,0,0,0,0), .Dim = c(4,2)),
mu = structure(.Data=c(1,1,1,1,1,1,1,1), .Dim = c(4,2)))
# initials 2
list(
d=structure(.Data=c(NA,NA,0.5,0.5,0.5,0.5,0.5,0.5), .Dim = c(4,2)),
mu = structure(.Data=c(0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5), .Dim = c(4,2)))
s[] r[] n[] t[] b[] time[] dt[]
1 1 62 1 1 3 2
1 2 59 1 1 7 4
1 6 53 1 1 11 2
1 2 51 1 1 13 2
1 3 48 1 1 15 2
1 2 45 1 1 17 2
1 5 40 1 1 19 2
1 2 37 1 1 23 4
1 2 35 1 1 25 2
1 2 32 1 1 27 2
1 1 31 1 1 29 2
1 2 28 1 1 31 2
1 2 26 1 1 33 2
1 2 23 1 1 35 2
1 1 21 1 1 39 4
1 1 14 1 1 51 12
1 2 55 2 1 5 4
1 1 54 2 1 7 2
1 2 52 2 1 9 2
1 1 51 2 1 11 2
1 5 46 2 1 13 2
1 2 44 2 1 15 2
1 3 41 2 1 17 2
1 3 37 2 1 19 2
1 2 35 2 1 21 2
1 1 34 2 1 23 2
1 1 33 2 1 25 2
1 1 32 2 1 31 6
1 3 29 2 1 33 2
1 1 28 2 1 35 2
1 1 26 2 1 39 4
1 1 24 2 1 41 2
1 1 22 2 1 43 2
1 2 19 2 1 45 2
2 8 169 1 1 3 4
2 10 148 1 1 5 2
2 8 137 1 1 7 2
2 6 127 1 1 9 2
2 8 118 1 1 11 2
2 7 109 1 1 13 2
2 3 105 1 1 15 2
2 4 95 1 1 17 2
2 3 84 1 1 19 2
2 3 76 1 1 21 2
2 4 68 1 1 23 2
2 4 60 1 1 25 2
2 4 50 1 1 27 2
2 1 35 1 1 31 4
2 2 29 1 1 33 2
2 1 25 1 1 35 2
2 3 21 1 1 37 2
2 1 18 1 1 39 2
2 2 11 1 1 43 4
2 1 180 2 1 1 2
2 11 162 2 1 3 2
2 9 147 2 1 5 2
2 9 135 2 1 7 2
2 6 125 2 1 9 2
2 6 116 2 1 11 2
2 6 106 2 1 13 2
2 7 95 2 1 15 2
2 1 92 2 1 17 2
2 5 84 2 1 19 2
2 3 77 2 1 21 2
2 2 67 2 1 23 2
2 1 59 2 1 25 2
2 4 49 2 1 27 2
2 1 40 2 1 29 2
2 2 34 2 1 31 2
2 3 23 2 1 37 6
2 1 19 2 1 39 2
4 1 62 1 1 3 2
4 2 59 1 1 7 4
4 6 53 1 1 11 2
4 2 51 1 1 13 2
4 3 48 1 1 15 2
4 2 45 1 1 17 2
4 5 40 1 1 19 2
4 2 37 1 1 23 4
4 2 35 1 1 25 2
4 2 32 1 1 27 2
4 1 31 1 1 29 2
4 2 28 1 1 31 2
4 2 26 1 1 33 2
4 2 23 1 1 35 2
4 1 21 1 1 39 4
4 1 14 1 1 51 12
4 2 55 2 1 5 4
4 1 54 2 1 7 2
4 2 52 2 1 9 2
4 1 51 2 1 11 2
4 5 46 2 1 13 2
4 2 44 2 1 15 2
4 3 41 2 1 17 2
4 3 37 2 1 19 2
4 2 35 2 1 21 2
4 1 34 2 1 23 2
4 1 33 2 1 25 2
4 1 32 2 1 31 6
4 3 29 2 1 33 2
4 1 28 2 1 35 2
4 1 26 2 1 39 4
4 1 24 2 1 41 2
4 1 22 2 1 43 2
4 2 19 2 1 45 2
5 8 169 1 1 3 4
5 10 148 1 1 5 2
5 8 137 1 1 7 2
5 6 127 1 1 9 2
5 8 118 1 1 11 2
5 7 109 1 1 13 2
5 3 105 1 1 15 2
5 4 95 1 1 17 2
5 3 84 1 1 19 2
5 3 76 1 1 21 2
5 4 68 1 1 23 2
5 4 60 1 1 25 2
5 4 50 1 1 27 2
5 1 35 1 1 31 4
5 2 29 1 1 33 2
5 1 25 1 1 35 2
5 3 21 1 1 37 2
5 1 18 1 1 39 2
5 2 11 1 1 43 4
5 1 180 2 1 1 2
5 11 162 2 1 3 2
5 9 147 2 1 5 2
5 9 135 2 1 7 2
5 6 125 2 1 9 2
5 6 116 2 1 11 2
5 6 106 2 1 13 2
5 7 95 2 1 15 2
5 1 92 2 1 17 2
5 5 84 2 1 19 2
5 3 77 2 1 21 2
5 2 67 2 1 23 2
5 1 59 2 1 25 2
5 4 49 2 1 27 2
5 1 40 2 1 29 2
5 2 34 2 1 31 2
5 3 23 2 1 37 6
5 1 19 2 1 39 2
3 2 179 1 1 1 2
3 4 172 1 1 3 2
3 3 168 1 1 5 2
3 6 157 1 1 7 2
3 4 151 1 1 9 2
3 9 142 1 1 11 2
3 10 130 1 1 13 2
3 7 123 1 1 15 2
3 3 119 1 1 17 2
3 5 112 1 1 19 2
3 3 108 1 1 21 2
3 3 103 1 1 23 2
3 12 91 1 1 25 2
3 2 68 1 1 27 2
3 2 46 1 1 29 2
3 8 29 1 1 31 2
3 2 23 1 1 33 2
3 3 8 1 1 35 2
3 5 175 3 1 3 4
3 7 163 3 1 5 2
3 12 151 3 1 7 2
3 12 139 3 1 9 2
3 4 132 3 1 11 2
3 9 122 3 1 13 2
3 7 114 3 1 15 2
3 4 108 3 1 17 2
3 7 101 3 1 19 2
3 5 96 3 1 21 2
3 7 89 3 1 23 2
3 2 87 3 1 25 2
3 4 68 3 1 27 2
3 4 50 3 1 29 2
3 3 40 3 1 31 2
3 3 22 3 1 33 2
3 1 8 3 1 35 2
END
You have set NT = 3, but d is indexed by s[i] in the model, and the s vector ranges from 1 to 5, so the index runs past the NT rows of d.
Set NT = 5, or compute it as NT = length(unique(s)).
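If it helps to confirm that from Python, here is a small sketch (assuming the table above is saved as a whitespace-separated file without the trailing END line; 'arms.txt' is a hypothetical name):

import pandas as pd

dat = pd.read_csv('arms.txt', sep=r'\s+')
print(dat['s[]'].max())        # 5 -> d[s[i],] and mu[s[i],] index up to row 5
print(dat['t[]'].nunique())    # 3 -> the count that NT = 3 was based on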
I've a list of week numbers from 1 to 53. I am trying to calculate (1) the quarter a week falls in and (2) the number of that week within its quarter, using only the numeric week numbers (week 53 needs to be quarter 4, week 14; week 27 needs to be quarter 3, week 1). I got this working in Excel, but not in Python. Any thoughts?
I tried the following, but each attempt has an issue with weeks like 13 or 27, depending on the method I'm using:
week 13 should be quarter 1, week 27 should be quarter 3.
df['qtr1'] = df['wk'] // 13
df['qtr2'] = (np.maximum((df['wk'] - 1), 1) / 13) + 1
df['qtr3'] = (df['wk'] - 1) // 13
df['qtr4'] = df['qtr2'].astype(int)
The results are awkward:
wk qtr1 qtr2 qtr3 qtr4
1.0 0 1.076923 -1.0 1
13.0 1(wrong) 1.923077 0.0 1
14.0 1 2.000000 1.0 2
27.0 2 3.000000 1.0 2 (wrong)
28.0 2 3.076923 2.0 3
You can convert your weeks to integers using astype:
df['wk'] = df['wk'].astype(int)
You should subtract one first, like:
df['qtr'] = ((df['wk']-1) // 13) + 1
df['weekinqtr'] = (df['wk']-1) % 13 + 1
since 13 // 13 is 1, not zero, which would put week 13 in quarter 2. This gives us:
>>> df
wk qtr weekinqtr
0 1 1 1
1 13 1 13
2 14 2 1
3 26 2 13
4 27 3 1
5 28 3 2
If you want extra columns per quarter, you can use get_dummies(..) [pandas-doc] to obtain a one-hot encoding per quarter:
>>> df.join(pd.get_dummies(df['qtr'], prefix='qtr'))
wk qtr weekinqtr qtr_1 qtr_2 qtr_3
0 1 1 1 1 0 0
1 13 1 13 1 0 0
2 14 2 1 0 1 0
3 26 2 13 0 1 0
4 27 3 1 0 0 1
5 28 3 2 0 0 1
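One case the formula above doesn't cover: the question says week 53 should land in quarter 4 as week 14, while (53 - 1) // 13 + 1 gives 5. A hedged sketch that clamps the spillover week into Q4 (assuming 13-week quarters otherwise):

import numpy as np
import pandas as pd

df = pd.DataFrame({'wk': range(1, 54)})
df['qtr'] = np.minimum((df['wk'] - 1) // 13 + 1, 4)   # clamp week 53 into quarter 4
df['weekinqtr'] = df['wk'] - (df['qtr'] - 1) * 13     # week 53 -> 14

print(df.tail(3))   # weeks 51, 52, 53 -> (4, 12), (4, 13), (4, 14)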
Floor division // and modulo % do what you want, I think, as long as you subtract one before dividing (otherwise the boundary weeks 13, 26, and 39 land in the wrong quarter):
In [254]: df = pd.DataFrame({'week': range(1, 53)})
In [255]: df['qtr'] = ((df['week'] - 1) // 13) + 1
In [256]: df['qtr_week'] = df['week'] % 13
In [257]: df.loc[df['qtr_week'] == 0, 'qtr_week'] = 13
In [258]: df
Out[258]:
week qtr qtr_week
0 1 1 1
1 2 1 2
2 3 1 3
3 4 1 4
4 5 1 5
5 6 1 6
6 7 1 7
7 8 1 8
8 9 1 9
9 10 1 10
10 11 1 11
11 12 1 12
12 13 1 13
13 14 2 1
14 15 2 2
15 16 2 3
16 17 2 4
17 18 2 5
18 19 2 6
19 20 2 7
20 21 2 8
21 22 2 9
22 23 2 10
23 24 2 11
24 25 2 12
25 26 2 13
26 27 3 1
27 28 3 2
28 29 3 3
29 30 3 4
30 31 3 5
31 32 3 6
32 33 3 7
33 34 3 8
34 35 3 9
35 36 3 10
36 37 3 11
37 38 3 12
38 39 3 13
39 40 4 1
40 41 4 2
41 42 4 3
42 43 4 4
43 44 4 5
44 45 4 6
45 46 4 7
46 47 4 8
47 48 4 9
48 49 4 10
49 50 4 11
50 51 4 12
51 52 4 13
So I have some rows of data and some columns with dates, as you can see below.
I want the sum of each week for every row, but the tricky thing is that not every week has 5 days; some weeks might have only 3. So somehow I want to go by the week number and then sum over it.
Can anyone help me with a formula (or a VBA macro)? I am completely lost after trying several approaches.
18-May-15 19-May-15 20-May-15 21-May-15 22-May-15 25-May-15 26-May-15 27-May-15 28-May-15 29-May-15 1-Jun-15 2-Jun-15 3-Jun-15 4-Jun-15 WEEK 1 TOTAL WEEK 2 TOTAL
33 15 10 19 18 8 10 15 10 29 16 24 8 26 74
18 11 8 17 0 6 16 9 16 16 36 9 6 4 55
0 0 1 0 0 1 0 0 1 0 0 3 3 2 8
30 7 4 8 8 11 10 3 0 11 3 4 5 6 18
0 0 0 11 0 0 0 1 0 7 8 1 1 2 12
1 1 4 0 5 1 6 2 1 4 2 4 5 4 15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
52 27 22 36 23 15 32 26 27 49 54 37 19 34 144
30 50 25 21 34 12 33 32 26 43 54 43 18 32 147
0 0 1 0 3 0 0 0 0 0 0 0 0 0 0
29 5 3 4 4 1 1 2 4 4 3 4 2 3 12
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 4 1 10 9 0 0 0 0 0 1 1 2
1 2 0 0 0 0 0 1 3 0 0 0 2 2 4
15 29 5 17 16 4 18 20 12 28 25 22 4 23 74
11 15 11 3 15 7 11 9 5 12 18 10 5 7 40
1 0 2 1 1 0 0 1 8 1 4 3 2 0 9
3 6 7 0 2 1 4 2 1 2 7 8 7 2 24
21 21 21 21 21 22 22 22 22 22 23 23 23 23
Using SUMIF is one way, but you need to get your references straight in order to make the formula easy to enter:
=SUMIF(Weeknums,M$1,$B2:$K2)
where Weeknums is a named range over the row of calculated week numbers (the bottom row of the sample above).
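For instance, in the sample above the week-number row shows 21 beneath 18-May through 22-May, so for the first data row the formula with M$1 = 21 adds the five week-21 cells: 33+15+10+19+18 = 95.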
Also note that the column headers showing the week number to be summed could be made more explanatory with custom number formatting.
I know you've already accepted an answer, but just to show you: if you transposed your data, you would be able to use pivot tables.
You could set up a calculated field to compute exactly what you want, and depending on how you sort/group the dates you could roll this up by weeks, months, quarters, or even years.
You would then get all of your final values displayed in an easy-to-read format, grouped however you want. In my opinion this is a much more powerful solution in the long run.