I have some rows of data and some columns with dates, as you can see in the data below.
I want the sum per week for each row. The tricky part is that not every week has 5 days, so there may be weeks with only 3 days. So I would like to work out the week number of each date and then sum by it.
Can anyone help me with a formula (or a VBA macro)?
I am completely lost after trying several approaches.
18-May-15 19-May-15 20-May-15 21-May-15 22-May-15 25-May-15 26-May-15 27-May-15 28-May-15 29-May-15 1-Jun-15 2-Jun-15 3-Jun-15 4-Jun-15 WEEK 1 TOTAL WEEK 2 TOTAL
33 15 10 19 18 8 10 15 10 29 16 24 8 26 74
18 11 8 17 0 6 16 9 16 16 36 9 6 4 55
0 0 1 0 0 1 0 0 1 0 0 3 3 2 8
30 7 4 8 8 11 10 3 0 11 3 4 5 6 18
0 0 0 11 0 0 0 1 0 7 8 1 1 2 12
1 1 4 0 5 1 6 2 1 4 2 4 5 4 15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
52 27 22 36 23 15 32 26 27 49 54 37 19 34 144
30 50 25 21 34 12 33 32 26 43 54 43 18 32 147
0 0 1 0 3 0 0 0 0 0 0 0 0 0 0
29 5 3 4 4 1 1 2 4 4 3 4 2 3 12
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 4 1 10 9 0 0 0 0 0 1 1 2
1 2 0 0 0 0 0 1 3 0 0 0 2 2 4
15 29 5 17 16 4 18 20 12 28 25 22 4 23 74
11 15 11 3 15 7 11 9 5 12 18 10 5 7 40
1 0 2 1 1 0 0 1 8 1 4 3 2 0 9
3 6 7 0 2 1 4 2 1 2 7 8 7 2 24
21 21 21 21 21 22 22 22 22 22 23 23 23 23
Using SUMIF is one way. But you need to get your references straight in order to make it easy to enter.
Note the formula:
=SUMIF(Weeknums,M$1,$B2:$K2)
where Weeknums is a named range holding the row of calculated week numbers.
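For readers who want to check the totals outside Excel, the same idea (label each date column with its week number, then sum the columns that share a label) can be sketched in pandas. This is only an illustration and not part of the answer above: the variable names are made up, the two rows are copied from the question's sample data, and ISO week numbers are assumed, which for these dates give the same 21/22/23 labels as the week-number row in the question.

```python
import pandas as pd

# 14 business days from 18-May-15 to 4-Jun-15, as in the question's header row
dates = pd.bdate_range("2015-05-18", "2015-06-04")

# First two data rows from the question (daily values only, without totals)
rows = [
    [33, 15, 10, 19, 18, 8, 10, 15, 10, 29, 16, 24, 8, 26],
    [18, 11, 8, 17, 0, 6, 16, 9, 16, 16, 36, 9, 6, 4],
]
data = pd.DataFrame(rows, columns=dates)

# ISO week number of each date column (plays the role of the Weeknums row)
weeknums = data.columns.isocalendar().week.astype(int).to_numpy()

# Transpose so dates become rows, sum per week number, then transpose back
weekly = data.T.groupby(weeknums).sum().T
print(weekly)
```

Because the grouping key is the week number rather than a fixed block of five columns, the four-day week at the end is summed correctly without special handling.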
Also note that the column headers showing the week number to be summed could be made more explanatory with custom formatting.
I know you've already accepted an answer, but just to show you:
If you transposed your data, you would be able to use pivot tables.
You could set up a calculated field to calculate exactly what you want (and depending on how you sort/group the dates, you could group this by weeks, months, quarters or even years).
You would then get all of your final values displayed in an easy-to-read format, grouped by whatever you want. In my opinion this is a much more powerful solution in the long run.
I am a Python beginner.
I have the following pandas DataFrame with only two columns, "Time" and "Input".
I want to loop over the "Input" column with a window size of w = 3 (three consecutive values). For every window, if all the elements within it are 1s, I want to keep the first element as 1 and change the remaining values to 0s.
index Time Input
0 11 0
1 22 0
2 33 0
3 44 1
4 55 1
5 66 1
6 77 0
7 88 0
8 99 0
9 1010 0
10 1111 1
11 1212 1
12 1313 1
13 1414 0
14 1515 0
My intended output is as follows
index Time Input What_I_got What_I_Want
0 11 0 0 0
1 22 0 0 0
2 33 0 0 0
3 44 1 1 1
4 55 1 1 0
5 66 1 1 0
6 77 1 1 1
7 88 1 0 0
8 99 1 0 0
9 1010 0 0 0
10 1111 1 1 1
11 1212 1 0 0
12 1313 1 0 0
13 1414 0 0 0
14 1515 0 0 0
What should I do to get the desired output? Am I missing something in my code?
import pandas as pd
import re
pd.Series(list(re.sub('111', '100', ''.join(df.Input.astype(str))))).astype(int)
Out[23]:
0 0
1 0
2 0
3 1
4 0
5 0
6 1
7 0
8 0
9 0
10 1
11 0
12 0
13 0
14 0
dtype: int32
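The one-liner above can be unpacked into a small self-contained sketch. The helper name `collapse_runs` and the `Result` column are made up for illustration, and the approach assumes the column holds only the single-digit values 0 and 1, so that each row maps to exactly one character of the joined string. The input is the "Input" column from the intended-output table in the question.

```python
import re
import pandas as pd

def collapse_runs(series, w=3):
    """Replace each non-overlapping run of w ones with a 1 followed by w-1 zeros."""
    bits = "".join(series.astype(str))            # e.g. "000111111011100"
    collapsed = re.sub("1" * w, "1" + "0" * (w - 1), bits)
    return pd.Series([int(c) for c in collapsed], index=series.index)

df = pd.DataFrame({"Input": [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0]})
df["Result"] = collapse_runs(df["Input"])
print(df)
```

`re.sub` scans left to right and never revisits characters it has already replaced, which is what makes the windows non-overlapping.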
I have a dataframe and I want to shift the column names to the left, starting from a specific column. The original dataframe has many columns, so I cannot do this by renaming the columns by hand.
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'H':[1,3,4,7,8,11,1,15,78,15,16,87],
'N':[1,3,4,98,8,11,1,15,20,15,16,87],
'p':[1,3,4,9,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'y':[0,0,0,0,1,1,1,0,0,0,0,0]})
print((df))
A H N p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here I want to remove the label N. The dataframe after removing label N:
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Required output:
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Here the last column can be ignored.
Note: the original dataframe has many columns, so I cannot rename them manually; I need an automatic method to shift the column names left.
You can do
df.columns=sorted(df.columns.str.replace('N',''),key=lambda x : x=='')
df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
Replace the columns with your own custom list.
>>> cols = list(df.columns)
>>> cols.remove('N')
>>> df.columns = cols + ['']
Output
>>> df
A H p B y
0 1 1 1 1 1 0
1 3 3 3 3 3 0
2 4 4 4 4 4 0
3 7 7 98 9 6 0
4 8 8 8 8 8 1
5 11 11 11 11 11 1
6 1 1 1 1 1 1
7 15 15 15 15 19 0
8 20 78 20 20 20 0
9 15 15 15 15 15 0
10 16 16 16 16 16 0
11 87 87 87 87 87 0
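The same idea can be wrapped in a small helper so it works for any label. This is a sketch only: `shift_labels_left` is a hypothetical name, not a pandas function, and it returns a copy rather than modifying the dataframe in place.

```python
import pandas as pd

def shift_labels_left(df, label):
    """Return a copy whose column labels skip `label`, padding the end with ''."""
    labels = [c for c in df.columns if c != label]
    out = df.copy()
    # The data stays where it is; only the labels move left
    out.columns = labels + [""] * (len(df.columns) - len(labels))
    return out

df = pd.DataFrame({"A": [1, 3], "H": [1, 3], "N": [1, 98],
                   "p": [1, 9], "B": [1, 6], "y": [0, 0]})
shifted = shift_labels_left(df, "N")
print(list(shifted.columns))  # ['A', 'H', 'p', 'B', 'y', '']
```

Note that the values originally under N now sit under the label p, which is the behaviour the question asks for.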
I have a DataFrame with two columns, ID and Value1. I want to select rows around each point where the value of column Value1 changes: the 3 rows before the change, the 3 rows after it, and the change-point row itself.
df=pd.DataFrame({'ID':[1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],'Value1':[0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
ID Value1
0 1 0
1 3 0
2 4 0
3 6 0
4 7 0
5 8 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
12 56 1
13 78 1
14 89 1
15 34 1
16 56 1
output:
ID Value1
0 4 0
1 6 0
2 7 0
3 8 2
4 90 2
5 23 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
IIUC,
import pandas as pd
df=pd.DataFrame({'ID':[1,3,4,6,7,8,90,23,56,78,90,34,56,78,89,34,56],'Value1':[0,0,0,0,0,2,2,2,2,0,0,0,1,1,1,1,1]})
df = df.reset_index(drop=True) #index needs to start from zero for solution
ind = list(set([val for i in df[df['Value1'].diff()!=0].index for val in range(i-3, i+4) if i>0 and val>=0]))
# diff gives column wise differencing. combined it with nested list and
# finally, list(set()) to drop any duplicates in index values
df[df.index.isin(ind)]
ID Value1
2 4 0
3 6 0
4 7 0
5 8 2
6 90 2
7 23 2
8 56 2
9 78 0
10 90 0
11 34 0
12 56 1
13 78 1
14 89 1
15 34 1
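If you want to retain duplicate rows where the windows overlap, drop the list(set()) wrapper and select with df.loc[ind] instead of isin, since a boolean mask returns each row at most once. In that case also make sure no index in ind runs past the end of the DataFrame.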
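A variant that keeps the overlapping rows can be sketched with positional slices, one per change point. This is an illustrative alternative rather than the code above: the names `change_pos`, `windows` and `result` are made up, and the sketch assumes the column contains at least one change (otherwise `pd.concat` has nothing to concatenate and raises an error).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 3, 4, 6, 7, 8, 90, 23, 56, 78, 90, 34, 56, 78, 89, 34, 56],
    "Value1": [0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 1, 1],
})

values = df["Value1"].to_numpy()
# Positions where a value differs from the one just before it
change_pos = np.flatnonzero(values[1:] != values[:-1]) + 1

# One slice per change point: 3 rows before, the change row, 3 rows after.
# iloc slices are clipped at the end of the frame; max() guards the start.
windows = [df.iloc[max(p - 3, 0): p + 4] for p in change_pos]
result = pd.concat(windows)
print(result)
```

Rows that fall inside two windows appear twice in `result`, once for each change point they are close to.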
I have a list of numbers from 1 to 53. I am trying to calculate 1) the quarter a week falls in and 2) the number of that week within that quarter, using numeric week numbers (53 needs to be qtr 4, wk 14; 27 needs to be qtr 3, wk 1). I got this working in Excel, but not in Python. Any thoughts?
I tried the following, but with each attempt I have an issue with weeks like 13 or 27, depending on the method I'm using.
13 should be qtr 1; 27 should be qtr 3.
df['qtr1'] = df['wk']//13
df['qtr2']=(np.maximum((df['wk']-1),1)/13)+1
df['qtr3']=((df1['wk']-1)//13)
df['qtr4'] = df['qtr2'].astype(int)
Results are awkward
wk qtr qtr2 qtr3 qtr4
1.0 0 1.076923 -1.0 1
13.0 1(wrong) 1.923077 0.0 1
14.0 1 2.000000 1.0 2
27.0 2 3.000000 1.0 2 (wrong)
28.0 2 3.076923 2.0 3
You can convert your weeks to integers, by using astype:
df['wk'] = df['wk'].astype(int)
You should subtract one first, like:
df['qtr'] = ((df['wk']-1) // 13) + 1
df['weekinqtr'] = (df['wk']-1) % 13 + 1
since 13//13 is 1, not zero, so without the subtraction week 13 would land in quarter 2. This gives us:
>>> df
wk qtr weekinqtr
0 1 1 1
1 13 1 13
2 14 2 1
3 26 2 13
4 27 3 1
5 28 3 2
If you want extra columns per quarter, you can use get_dummies(..) to obtain a one-hot encoding per quarter:
>>> df.join(pd.get_dummies(df['qtr'], prefix='qtr'))
wk qtr weekinqtr qtr_1 qtr_2 qtr_3
0 1 1 1 1 0 0
1 13 1 13 1 0 0
2 14 2 1 0 1 0
3 26 2 13 0 1 0
4 27 3 1 0 0 1
5 28 3 2 0 0 1
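The question also asks for week 53 to count as quarter 4, week 14, which the plain formula would place in a fifth quarter. One way to handle that, sketched here as an assumption about the desired behaviour rather than as part of the answer above, is to cap the quarter at 4 and compute the week within the quarter from the capped value.

```python
import pandas as pd

df = pd.DataFrame({"wk": [1, 13, 14, 26, 27, 39, 40, 52, 53]})

# Cap the quarter at 4 so that week 53 stays in the fourth quarter
df["qtr"] = ((df["wk"] - 1) // 13 + 1).clip(upper=4)

# Week within the quarter: offset from the last week of the previous quarter
df["weekinqtr"] = df["wk"] - 13 * (df["qtr"] - 1)
print(df)
```

For weeks 1 to 52 this gives the same values as the modulo version; only week 53 differs, coming out as quarter 4, week 14.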
Using floor division // and modulo % should work for what you want, I think. Subtract one before dividing so that weeks 13, 26 and 39 count as the last week of their quarter rather than as part of the next one:
In [254]: df = pd.DataFrame({'week':range(1, 52)})
In [255]: df['qtr'] = ((df['week'] - 1) // 13) + 1
In [256]: df['qtr_week'] = df['week'] % 13
In [257]: df.loc[(df['qtr_week'] ==0),'qtr_week']=13
In [258]: df
Out[258]:
week qtr qtr_week
0 1 1 1
1 2 1 2
2 3 1 3
3 4 1 4
4 5 1 5
5 6 1 6
6 7 1 7
7 8 1 8
8 9 1 9
9 10 1 10
10 11 1 11
11 12 1 12
12 13 1 13
13 14 2 1
14 15 2 2
15 16 2 3
16 17 2 4
17 18 2 5
18 19 2 6
19 20 2 7
20 21 2 8
21 22 2 9
22 23 2 10
23 24 2 11
24 25 2 12
25 26 2 13
26 27 3 1
27 28 3 2
28 29 3 3
29 30 3 4
30 31 3 5
31 32 3 6
32 33 3 7
33 34 3 8
34 35 3 9
35 36 3 10
36 37 3 11
37 38 3 12
38 39 3 13
39 40 4 1
40 41 4 2
41 42 4 3
42 43 4 4
43 44 4 5
44 45 4 6
45 46 4 7
46 47 4 8
47 48 4 9
48 49 4 10
49 50 4 11
50 51 4 12
I have this data set and I would like the box plots of all 9 input variables to appear on the same plot, even though they are on different scales. Could you please tell me if there is an easy way to accomplish this?
I am a novice SAS user, so I would appreciate some advice. Thank you.
data raw;
input ID$ Family DistRd Cotton Maize Sorg Millet Bull Cattle Goats;
datalines;
FARM1 12 80 1.5 1 3 0.25 2 0 1
FARM2 54 8 6 4 0 1 6 32 5
FARM3 11 13 0.5 1 0 0 0 0 0
FARM4 21 13 2 2.5 1 0 1 0 5
FARM5 61 30 3 5 0 0 4 21 0
FARM6 20 70 0 2 3 0 2 0 3
FARM7 29 35 1.5 2 0 0 0 0 0
FARM8 29 35 2 3 2 0 0 0 0
FARM9 57 9 5 5 0 0 4 5 2
FARM10 23 33 2 2 1 0 2 1 7
FARM11 13 9 0.5 2 2 0 0 0 0
FARM12 15 9 2 2 2 0 0 0 0
FARM13 27 3 1.5 0 2 1 0 0 1
FARM14 28 5 2 0.5 2 2 2 0 5
FARM15 52 5 7 1 7 0 4 11 3
FARM16 12 10 2 2.5 3 0 0 0 0
FARM17 25 30 1 1 4 0 2 0 5
FARM18 5 3 1 0 1 0.5 0 0 3
FARM19 45 30 4.5 1 1 0 6 13 20
FARM20 6 7 1 1 1 1 2 0 5
FARM21 17 8 1.5 0.5 1.5 0.25 0 0 2
FARM22 22 6 3 2 3 1 3 0 2
FARM23 43 40 7 3 3 0.5 6 2 3
FARM24 66 36 0 0.5 5 5 0 0 0
FARM25 15 3 1 0 1.5 0.5 1 0 1
FARM26 26 5 2 1.5 2 2 1 0 0
FARM27 31 5 1.5 1 3 2 2 0 0
FARM28 37 2 3 2 3 5 3 0 5
FARM29 81 2 8 4 4 12 7 8 13
FARM30 14 10 0 0.5 3 1 0 0 0
FARM31 20 7 2 1 4 3 2 0 5
FARM32 26 7 2 1 2 2 2 0 2
FARM33 12 10 0.5 1 3 1 0 0 0
FARM34 18 35 4 3 3 3 4 0 0
FARM35 11 29 1 0.5 3 2 2 0 2
FARM36 50 29 5 3 5 4 4 8 4
FARM37 7 9 0 1 1 0 0 0 0
FARM38 26 9 2 1 3 0 0 0 0
FARM39 19 33 1 1.5 0 4 2 0 0
FARM40 43 33 3 3 4 7 4 3 0
FARM41 18 12 3 0 1 1 2 1 1
FARM42 64 20 3 5 2 2 4 0 6
FARM43 61 25 9 7 3 8 4 17 0
FARM44 18 3 0.5 0.5 2 2 0 0 4
FARM45 11 2 0.5 0 1.5 1.5 1 1 0
FARM46 30 3 4 2 4 0 4 2 0
FARM47 16 1.5 2 0.5 2 2 2 2 0
FARM48 46 1 0.75 1 3 2 0 0 2
FARM49 18 2 1.5 0.5 2 2 2 0 2
FARM50 81 3 12 1.5 10 8 11 14 15
FARM51 15 0 1.5 1.5 2.5 0 1 0 0
FARM52 26 11 3.5 2 4 0 2 2 2
FARM53 10 11 0 0 1.5 0 0 0 0
FARM54 40 12 5 3 6 1 8 17 10
FARM55 82 4 11 7 5 0.5 8 5 0
FARM56 40 5.5 6 4 2.5 1 3 0 2
FARM57 29 8 3 2 4 2 0 0 2
FARM58 23 5 5 4 3 1 1 0 0
FARM59 53 4 0 3 0 3 6 0 0
FARM60 57 3.5 9 8 0 0 10 23 0
FARM61 23 4 2 2 0.5 4 2 0 0
FARM62 9 31 2 2 0 2 1 0 0
FARM63 22 35 3 2 3 0 5 6 1
FARM64 25 35 3 1 2.5 0 4 8 10
FARM65 20 0 1.5 1 3 0 1 6 0
FARM66 27 41 1.1 0.25 1.5 1.5 0 3 1
FARM67 30 19 2 2 4 1 2 0 5
FARM68 77 18 8 4 6 4 6 8 6
FARM69 13 100 0.5 0.5 0 1 0 0 4
FARM70 24 100 2 3 0 0.5 3 14 10
FARM71 29 90 2 1.5 1.5 1.5 2 0 2
FARM72 57 90 10 7 0 1.5 7 8 7
;
run;
You need to transpose the values and use the group= option on the VBOX statement.
Steps:
1. Sort by ID
2. Transpose the data
3. Adjust the labels for display
4. Plot with PROC SGPLOT
proc sort data=raw;
by id;
run;
proc transpose data=raw out=raw_t;
by id;
run;
data raw_t;
set raw_t;
label _name_ = "Variable";
label col1 = "Value";
run;
ods html;
title "My Box Plot";
proc sgplot data=raw_t;
vbox col1 / group=_name_ ;
run;
ods html close;
This produces a single plot with one box per variable.
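For readers more at home in pandas than in SAS, the reshaping that the sort and transpose steps perform (one row per farm and variable instead of one column per variable) can be sketched as follows. This is a rough analogue, not a translation of the SAS code: only the first three farms and three of the nine variables from the question are included, and the column names "Variable" and "Value" mirror the labels used above.

```python
import pandas as pd

# First three farms and three variables from the question's data set
raw = pd.DataFrame({
    "ID": ["FARM1", "FARM2", "FARM3"],
    "Family": [12, 54, 11],
    "DistRd": [80, 8, 13],
    "Cotton": [1.5, 6, 0.5],
})

# Wide to long: one row per (ID, variable) pair
long = raw.melt(id_vars="ID", var_name="Variable", value_name="Value")
print(long)
```

With matplotlib installed, `long.boxplot(column="Value", by="Variable")` would then draw one box per variable on shared axes.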