I'm working on a spreadsheet with logged flows that are not recorded at uniform intervals.
I'm looking for a formula for column G that averages the values in column A logged during the previous 10 minutes.
Here's the spreadsheet data:
Flow    Time      min  sec  sec   10_min  Average
187.29  06:10:09  10   9    609
202.90  06:11:21  11   21   681
280.94  06:12:37  12   37   757
218.51  06:13:43  13   43   823
187.29  06:15:13  15   13   913
124.86  06:16:26  16   26   986
109.25  06:18:52  18   52   1132
109.25  06:20:00  20   0    1200  1       177.54
202.90  06:22:30  22   30   1350
265.33  06:23:36  23   36   1416
280.94  06:24:42  24   42   1482
249.73  06:25:58  25   58   1558
218.51  06:27:39  27   39   1659
421.41  06:28:47  28   47   1727
421.41  06:30:00  30   0    1800  1       294.32
Use AVERAGEIFS and construct the criteria with the TEXT function, offsetting one criterion by ten minutes.
=AVERAGEIFS(A:A,B:B, TEXT(B9-TIME(0, 10, 0), ">0.0###############"),B:B, TEXT(B9, "<=0.0###############"))
Note that times can also be resolved as decimal numbers, which is what I have used here. My second average came out slightly different from yours with >= as the lower bound; changing the >= to >, as in the formula above, excludes the reading taken exactly ten minutes earlier.
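To sanity-check the two averages outside Excel, here is a minimal Python sketch of the same windowing rule: average every flow logged after (end - 10 minutes) and up to and including the end time. The names READINGS, parse_time and trailing_average are made up for this illustration; the data is copied from the question.

```python
from datetime import datetime, timedelta

# (time of day, flow) pairs copied from the question's sheet.
READINGS = [
    ("06:10:09", 187.29), ("06:11:21", 202.90), ("06:12:37", 280.94),
    ("06:13:43", 218.51), ("06:15:13", 187.29), ("06:16:26", 124.86),
    ("06:18:52", 109.25), ("06:20:00", 109.25), ("06:22:30", 202.90),
    ("06:23:36", 265.33), ("06:24:42", 280.94), ("06:25:58", 249.73),
    ("06:27:39", 218.51), ("06:28:47", 421.41), ("06:30:00", 421.41),
]


def parse_time(text):
    """Parse an HH:MM:SS string (hypothetical helper for this sketch)."""
    return datetime.strptime(text, "%H:%M:%S")


def trailing_average(readings, end, window=timedelta(minutes=10)):
    """Average flows logged in the window (end - window, end]."""
    end_time = parse_time(end)
    start_time = end_time - window
    flows = [flow for stamp, flow in readings
             if start_time < parse_time(stamp) <= end_time]
    # Return None rather than dividing by zero when nothing was logged.
    return sum(flows) / len(flows) if flows else None


avg_0620 = trailing_average(READINGS, "06:20:00")
avg_0630 = trailing_average(READINGS, "06:30:00")
print(avg_0620, avg_0630)
```

Switching the comparison to start_time <= ... would pull the 06:20:00 reading into the second window, which is the >= versus > difference mentioned in the answer.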
This is what my input looks like in Excel:
days_took_to_equip  cumu_percent
1                   0.017418302
2                   0.020625735
3                   0.023148307
4                   0.025237133
5                   0.026972115
6                   0.028752754
7                   0.030350763
8                   0.032040087
9                   0.033603853
10                  0.035270349
11                  0.036788458
12                  0.037518976
13                  0.038283738
14                  0.039379516
15                  0.040189935
16                  0.040783481
17                  0.041685215
18                  0.042347247
19                  0.043032109
20                  0.043739798
21                  0.044230616
22                  0.04476709
23                  0.045269322
24                  0.045725896
25                  0.046250956
26                  0.046684701
27                  0.047129861
28                  0.047620678
29                  0.047997352
30                  0.048396854
My expected output is:
Range    Avg cum Percent
1 to 10  0.027
1 to 20  0.033
1 to 30  0.038
I tried pivot tables, but the labelling is tricky here.
I need this output to plot a graph.
Try:
=MAP(SEQUENCE(3,1,10,10),LAMBDA(x,AVERAGE(INDEX(B2:B31,SEQUENCE(x)))))
I came up with three alternatives. With the range labels (for example "1 to 10") in column D, the cells contain these formulas:
E3: =AVERAGE(INDEX($B$2:$B$31,SEQUENCE(RIGHT($D3,2))))
F3: =AVERAGE(INDEX($B$2:$B$31,ROW(INDIRECT("1:"&RIGHT($D3,2)))))
G3: =AVERAGE(OFFSET($A$1,1,1,RIGHT(D3,2)))
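All of the formulas above average the first 10, 20 and 30 rows of column B. A plain-Python sketch of that calculation, using the values from the question, can serve as a cross-check; CUMU_PERCENT and average_first are names invented for this example.

```python
# cumu_percent values for days 1..30, copied from the question.
CUMU_PERCENT = [
    0.017418302, 0.020625735, 0.023148307, 0.025237133, 0.026972115,
    0.028752754, 0.030350763, 0.032040087, 0.033603853, 0.035270349,
    0.036788458, 0.037518976, 0.038283738, 0.039379516, 0.040189935,
    0.040783481, 0.041685215, 0.042347247, 0.043032109, 0.043739798,
    0.044230616, 0.04476709, 0.045269322, 0.045725896, 0.046250956,
    0.046684701, 0.047129861, 0.047620678, 0.047997352, 0.048396854,
]


def average_first(values, count):
    """Average of the first `count` values, like AVERAGE(INDEX(range, SEQUENCE(count)))."""
    head = values[:count]
    return sum(head) / len(head)


# One (label, average) pair per range, mirroring SEQUENCE(3,1,10,10).
averages = [(f"1 to {n}", average_first(CUMU_PERCENT, n)) for n in (10, 20, 30)]
for label, value in averages:
    print(label, value)
```

The unrounded averages are close to the expected figures in the question; the middle one appears to be truncated rather than rounded there.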
I have this set of data in a DataFrame:
    data     winsor_data
0   1660     1660
1   600      600
2   50       50
3   3173.55  3173.55
4   30       30
5   120      120
6   7.84     7.84
7   1660     1660
8   33.3     33.3
9   2069.49  2069.49
10  42       42
11  384.29   384.29
12  1660     1660
13  1338.57  1338.57
14  200000   200000
15  1760     1760
The value at index 14 is clearly an outlier.
from scipy.stats.mstats import winsorize
dfdailyIncome['winsor_data'] = winsorize(df['data'], limits=(0,0.95))
I do not understand why the outlier is not clipped. Maybe it has something to do with the way the quantiles are calculated.
I think you are misinterpreting the 'limits' parameter. It gives the fraction of values to clip at each end, not the quantile to clip at.
If you want to clip the largest 10 percent of your values, you need:
dfdailyIncome['winsor_data'] = winsorize(df['data'], limits=[0,0.1])
In your example you clip the largest 95 percent of your data.
Hint: Even if you used winsorize(df['data'], limits=[0,0.05]), your data would stay the same, because with fewer than 20 values, 5 percent of the data amounts to less than one value, so nothing is replaced.
See the example from here for further explanation: scipy.stats.mstats.winsorize
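To make the effect of the upper limit concrete, here is a small plain-Python sketch of upper-tail winsorizing. It assumes the number of clipped values is int(fraction * n), which is my understanding of the default behaviour of scipy.stats.mstats.winsorize and not a copy of its implementation; winsorize_upper is a hypothetical helper written for this illustration.

```python
# The 16 values from the question, in index order.
DATA = [1660, 600, 50, 3173.55, 30, 120, 7.84, 1660,
        33.3, 2069.49, 42, 384.29, 1660, 1338.57, 200000, 1760]


def winsorize_upper(values, upper_fraction):
    """Replace the k largest values with the largest remaining value.

    ASSUMPTION: k = int(upper_fraction * n), i.e. the count is truncated.
    """
    n = len(values)
    k = min(int(upper_fraction * n), n - 1)
    if k == 0:
        return list(values)          # less than one value to clip
    cap = sorted(values)[n - k - 1]  # largest value that is kept
    return [min(v, cap) for v in values]


unchanged = winsorize_upper(DATA, 0.05)  # int(0.05 * 16) == 0
clipped = winsorize_upper(DATA, 0.10)    # int(0.10 * 16) == 1
print(unchanged[14], clipped[14])
```

Under that assumption, a limit of 0.05 leaves all 16 values alone, while 0.10 replaces only the value at index 14 with the next-largest value.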
I was able to get the highest value of the week. Now, I need to figure out which day of the week it was so I can tally up how many times a certain day of the week is the highest.
For example,
Day of the week that has highest value of that week
Mon:5
Tue:2
Wed:3
Thur:2
Fri:1
This is what my dataframe looked like before I parsed the information that I needed.
     Date        Weekdays   Week  Open        Close
0    2019-06-26  Wednesday  26    208.279999  208.509995
1    2019-06-27  Thursday   26    208.970001  212.020004
2    2019-06-28  Friday     26    213.000000  213.169998
3    2019-07-01  Monday     27    214.250000  214.619995
4    2019-07-02  Tuesday    27    214.380005  214.539993
..   ...         ...        ...   ...         ...
500  2021-06-21  Monday     25    275.619995  277.100006
501  2021-06-22  Tuesday    25    277.570007  276.920013
502  2021-06-23  Wednesday  25    276.890015  274.660004
503  2021-06-24  Thursday   25    275.000000  275.489990
504  2021-06-25  Friday     25    276.369995  278.380005
[505 rows x 5 columns]
I was able to get the highest value of each week, but I also want to get the day it fell on and tally which days were the highest.
#Tally up the highest days of the week at OPEN
new_data.groupby(pd.Grouper('Week')).Open.max()
The result was
Week
26 213.000000
27 215.130005
28 215.210007
29 214.440002
30 208.369995
31 210.000000
32 204.199997
33 214.740005
34 210.050003
35 217.509995
36 222.000000
37 220.539993
38 220.279999
39 214.000000
40 214.300003
41 215.880005
42 216.740005
43 212.429993
44 213.550003
45 222.809998
46 228.500000
47 233.570007
48 233.919998
49 231.190002
50 231.259995
51 227.679993
52 226.860001
1 233.539993
2 234.789993
3 235.220001
4 233.000000
5 236.979996
6 241.429993
7 244.729996
8 248.070007
9 251.080002
10 264.220001
11 260.309998
12 252.750000
13 259.940002
14 264.220001
15 270.470001
16 272.299988
17 276.290009
18 289.970001
19 292.350006
20 290.200012
21 290.190002
22 292.910004
23 292.559998
24 286.660004
25 277.570007
53 230.500000
Name: Open, dtype: float64
I got you. We wrap the groupby in df.loc, then select the indexes for the max values of Open in each group. Finally just take the value_counts of the Weekdays.
df.loc[df.groupby(["Week"]).Open.idxmax()].Weekdays.value_counts()
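The same idea can be sketched without pandas: find the row with the highest Open in each week, then count the weekdays of those rows. The sketch below uses only the ten rows visible in the question, and it groups by ISO year and week rather than week number alone, since the data spans several years and the same week number recurs; that choice is my assumption about what is wanted, not part of the answer above. ROWS, WEEKDAY_NAMES and tally are illustrative names.

```python
from collections import Counter
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# (Date, Open) pairs for the rows shown in the question.
ROWS = [
    ("2019-06-26", 208.279999), ("2019-06-27", 208.970001),
    ("2019-06-28", 213.000000), ("2019-07-01", 214.250000),
    ("2019-07-02", 214.380005), ("2021-06-21", 275.619995),
    ("2021-06-22", 277.570007), ("2021-06-23", 276.890015),
    ("2021-06-24", 275.000000), ("2021-06-25", 276.369995),
]

# Keep the (day, open) pair with the highest Open per (ISO year, ISO week).
best_per_week = {}
for day_text, open_price in ROWS:
    day = date.fromisoformat(day_text)
    iso = day.isocalendar()
    key = (iso[0], iso[1])
    if key not in best_per_week or open_price > best_per_week[key][1]:
        best_per_week[key] = (day, open_price)

# Count how often each weekday holds the weekly maximum.
tally = Counter(WEEKDAY_NAMES[day.weekday()] for day, _ in best_per_week.values())
print(tally)
```

In pandas terms, keying on the year as well would mean grouping by both a year column and Week before calling idxmax.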
I have a requirement to get the row number of the next matching value, i.e.
Number 1 Number 2 Number 3 Number 4 Number 5 Number 6
16 33 28 20 23 14
13 12 27 29 2 32
31 25 9 28 17 10
11 22 14 3 18 13
12 39 22 32 25 24
37 40 33 18 9 3
4 35 17 24 7 12
16 3 38 8 17 24
Now, 16 is next present in the 8th row, so 6 rows are skipped. 33 is next present in the 6th row, so 4 rows are skipped. Similarly, 28 is present in the 3rd row, so 1 row is skipped.
The output will be:
6 4 1 19 10 2
Assume that 20 and 23 are found in further rows not shown here. Skipped rows = row number of the next occurrence of that number - present row number - 1.
I am not able to build a formula for this. MATCH should work, I guess, but I'm not sure.
Put this formula in the first cell:
=AGGREGATE(15,6,ROW($A$3:$F$22)/($A$3:$F$22=A2),1) - ROW($A$3)
Then drag/copy across
If you want to drag down (put the results in columnar form):
=AGGREGATE(15,6,ROW($A$3:$F$22)/($A$3:$F$22=INDEX($2:$2,ROW(1:1))),1) - ROW($A$3)
Put it in the first cell and drag/copy down.
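For checking the formula's results against the sample, here is a short Python sketch of the same rule: for each number in a row, find the nearest later row that contains it in any column and count the rows in between. GRID holds only the eight rows shown in the question, so numbers whose next occurrence lies further down (20 and 23) come back as None; skipped_rows is a name made up for this example.

```python
# The eight rows shown in the question (Number 1 .. Number 6).
GRID = [
    [16, 33, 28, 20, 23, 14],
    [13, 12, 27, 29, 2, 32],
    [31, 25, 9, 28, 17, 10],
    [11, 22, 14, 3, 18, 13],
    [12, 39, 22, 32, 25, 24],
    [37, 40, 33, 18, 9, 3],
    [4, 35, 17, 24, 7, 12],
    [16, 3, 38, 8, 17, 24],
]


def skipped_rows(grid, row_index):
    """For each value in grid[row_index], count rows skipped before it reappears."""
    result = []
    for value in grid[row_index]:
        gap = None  # stays None when the value never reappears
        for later in range(row_index + 1, len(grid)):
            if value in grid[later]:
                gap = later - row_index - 1
                break
        result.append(gap)
    return result


first_row_gaps = skipped_rows(GRID, 0)
print(first_row_gaps)
```

The non-None entries line up with the 6, 4, 1 and 2 in the expected output.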
So let's say I have a few numbers in a sheet:
a b c d
1 33 53 23 11
2 42 4 83 64
3 75 3 48 38
4 44 0 22 45
5 2 34 76 6
6
7 Total 85
I would like to display those numbers so that the cell value still holds the original figure (A1 = 33)
but the cell displays both the number and its percentage of the total (B7), e.g.
a b c d
1 33 (39%) 53 (62%) 23 (27%) 11 (13%)
2 42 (49%) 4 (5%) 83 (98%) 64 (75%)
3 75 (88%) 3 (4%) 48 (56%) 38 (45%)
4 44 (52%) 0 (0%) 22 (26%) 45 (53%)
5 2 (2%) 34 (40%) 76 (89%) 6 (7%)
6
7 Total 85
I know how to format a cell as a percentage, but I can't figure out how to display both the original value and the calculated percentage (value/total*100) without changing the cell value, so that I can still sum the cells at the end (e.g. A6 =SUM(A1:A5) = 196).
Does anyone have an idea? I was hoping there might be a way to duplicate and calculate the figure using text formatting, but I can't get anything to work.
I'm guessing this is a trivial answer and maybe not what you're looking for, but why not just add a column for each of the columns you have now?
a a' b b' c c' d d'
1 33 (39%) 53 (62%) 23 (27%) 11 (13%)
2 42 (49%) 4 (5%) 83 (98%) 64 (75%)
3 75 (88%) 3 (4%) 48 (56%) 38 (45%)
4 44 (52%) 0 (0%) 22 (26%) 45 (53%)
5 2 (2%) 34 (40%) 76 (89%) 6 (7%)
6
7 Total 85
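The helper-column idea keeps the numbers numeric and derives the display text separately. A minimal Python sketch of that separation, using the values and the total of 85 from the question (VALUES, TOTAL and label are illustrative names, not anything from Excel):

```python
# Numeric cells A1:D5 from the question; these stay untouched.
VALUES = [
    [33, 53, 23, 11],
    [42, 4, 83, 64],
    [75, 3, 48, 38],
    [44, 0, 22, 45],
    [2, 34, 76, 6],
]
TOTAL = 85  # the figure in B7


def label(value, total):
    """Build the display text 'value (pct%)' without altering the number."""
    return f"{value} ({value / total:.0%})"


# Display strings live in a parallel structure, like the a', b', ... columns.
labels = [[label(v, TOTAL) for v in row] for row in VALUES]

# Sums still work because the originals are still numbers.
column_a_sum = sum(row[0] for row in VALUES)
print(labels[0], column_a_sum)
```

The point of the design is that the text is always recomputed from the numbers, so the two can never drift apart.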
@Ari's answer seems to meet the requirements in your question, does not repeat information any more than your example output does, and should be viable for up to around 8,000 columns to start with (unless you have a very old version of Excel). Jerry's comment is also correct: what you want to achieve, the way you want to achieve it, is not possible.
However, there are other approaches that might be acceptable substitutes. One is to copy your data and use Paste Special with Operation Divide, either elsewhere or over the top of your data. Pasting over the top shows either the values or the percentages at any one time; pasting elsewhere duplicates your data. Pasting over the top would also require something like Operation Multiply to revert to the values, and reformatting each time if it is to appear as in your example.
Another is to use a PivotTable with some calculated fields. Both are shown below:
I appreciate neither is exactly what you are asking for.