Getting the 30 mins max sum of Column B per day - excel

So I have the first column with dates at different timestamps, and the second column holds the data. Let the first column be A and the second column be B. For each day, I need the maximum sum of the data over any 30-minute window.
So for example, for the data below,
dateTimeRead(YYYY-MM-DD HH-mm-ss) rain_value(mm) air_pressure(hPa)
1/2/2015 0:00 0 941.5675
1/2/2015 0:15 0 941.4625
1/2/2015 0:30 0 941.3
1/2/2015 0:45 0.1 941.2725
1/2/2015 1:00 0.2 941.12
1/2/2015 1:15 0.3 940.8625
1/2/2015 1:30 0.6 940.7575
1/2/2015 1:45 0.2 940.6075
1/2/2015 2:00 0 940.545
1/2/2015 2:15 0 940.27
1/2/2015 2:30 0 940.2125
1/2/2015 16:15 0 940.625
1/2/2015 16:30 0 940.69
1/2/2015 16:45 0 940.6175
1/2/2015 17:00 0 940.635
1/2/2015 19:00 0 941.9975
1/2/2015 20:45 0 942.7925
1/2/2015 21:00 0 942.745
1/2/2015 21:15 0 942.6325
1/2/2015 21:30 0 942.735
1/2/2015 21:45 0 942.765
1/2/2015 22:00 0 941.6
1/3/2015 2:15 0.1
1/3/2015 2:30 0.2 941.1275
1/3/2015 2:45 0.1 941.125
1/3/2015 3:00 0.1 940.955
1/3/2015 3:15 0 941.035
the desired output would be
Date Max Sum
1/2/2015 1.1
1/3/2015 0.4
and so On

You can do this by keeping track of the 30-minute interval sums in a helper column, then using an array formula to calculate the max per day.
For example, let's suppose your data above is in columns A-C. (But we ignore the data in C and focus on column B as you have done in your example.) In $D$1, let's put your desired interval, 0:30. In column E, we'll keep track of, for each time in column A, what the sum of rain_value was for the last 30-minute window. To calculate this, you could paste the following formula in E2 and copy down the column (adjusting if you want > instead of >=, for example):
=SUMIFS(B:B,A:A,"<="&A2,A:A,">="&A2-$D$1)
// assumes the time interval is in $D$1
Now you have a column of data that includes the windows over which you want to take the max. One way to do this is by using the MAX formula as an array formula. First, create a new column F which just extracts the date part of the datetime in column A. You can do this just by putting =INT(A2) in cell F2 and copying down, for example.
Then, create a column G just for your dates (1/2/2015 and 1/3/2015 in your example). In column H, calculate the following array formula* in H2 and copy down to get the max of column E:
{=MAX(IF(F:F=G2,E:E))}
This will get the max per date.
*If you don't know how to execute array formulas, basically just type the formula =MAX(IF(F:F=G2,E:E)) into H2, but then instead of pressing Enter, press Ctrl-Shift-Enter on Windows (or Cmd-Enter on a Mac). There are ways to do this last part without array formulas too, with clever use of SUMPRODUCT or INDEX.
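For readers who want to cross-check the spreadsheet result outside Excel, the same two steps (a trailing 30-minute sum per reading, then the max per day) can be sketched in pandas. This is only an illustration under assumptions: the dates are read as month/day/year, only the rows with rain are reproduced from the question, and the variable names are made up for the example. closed="both" mirrors the >= and <= comparisons in the SUMIFS formula, and, like SUMIFS, the window can reach back across midnight.

```python
import pandas as pd

# Subset of the question's readings (rain_value in mm); dates assumed to be M/D/YYYY.
readings = pd.DataFrame(
    {
        "dateTimeRead": pd.to_datetime(
            [
                "2015-01-02 00:45", "2015-01-02 01:00", "2015-01-02 01:15",
                "2015-01-02 01:30", "2015-01-02 01:45", "2015-01-02 02:00",
                "2015-01-03 02:15", "2015-01-03 02:30", "2015-01-03 02:45",
                "2015-01-03 03:00", "2015-01-03 03:15",
            ]
        ),
        "rain_value": [0.1, 0.2, 0.3, 0.6, 0.2, 0.0, 0.1, 0.2, 0.1, 0.1, 0.0],
    }
).set_index("dateTimeRead")

# Helper-column equivalent: sum of everything in [t - 30min, t] for each reading t.
window_sum = readings["rain_value"].rolling("30min", closed="both").sum()

# Max of the window sums per calendar day (rounded to hide floating-point noise).
daily_max = window_sum.groupby(window_sum.index.normalize()).max().round(10)
print(daily_max)
```

On this subset the two daily maxima agree with the desired output in the question (1.1 and 0.4).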

Spotfire calculate difference with respect to previous row value

I have data as shown below. I created the "Difference in values" column manually: each value minus the previous value for the same name, e.g. the value at 8:15 AM minus the value at 8:00 AM, which gives 2 in the second row, and so on for Tushar and Lohit respectively. How can I do this calculation in Spotfire? I believe the OVER and Previous functions can help, but I am unable to find anything on this. Please help.
Name Time Values Difference in values
Tushar 08:00 AM 2 0
Tushar 08:15 AM 4 2
Tushar 08:30 AM 5 1
Tushar 08:45 AM 6 1
Tushar 09:00 AM 7 1
Lohit 08:00 AM 2 0
Lohit 08:15 AM 4 2
Lohit 08:30 AM 5 1
Lohit 08:45 AM 6 1
This should work
SN([Values] - Max([Values]) over (Intersect(Previous([Time]),[Name])),0)
where Max(..) is just to have an aggregation, since it is only looking at the previous Time row for each value of Name. [so Min would work just as well].
SN(...) is there to set the result to 0 when it is empty (as in the first row of each Name).
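To make the logic of that expression concrete, here is a rough pandas analogue (not Spotfire syntax): take the difference to the previous row within each Name, and use 0 where there is no previous row. It assumes the rows are already ordered by time within each name, as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tushar"] * 5 + ["Lohit"] * 4,
    "Time": ["08:00 AM", "08:15 AM", "08:30 AM", "08:45 AM", "09:00 AM",
             "08:00 AM", "08:15 AM", "08:30 AM", "08:45 AM"],
    "Values": [2, 4, 5, 6, 7, 2, 4, 5, 6],
})

# diff() within each Name plays the role of Previous([Time]) intersected with [Name];
# fillna(0) plays the role of SN(..., 0) for the first row of each name.
df["Difference in values"] = (
    df.groupby("Name")["Values"].diff().fillna(0).astype(int)
)
print(df)
```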

Time manipulations

Hello, I have to count how many people were scheduled during each hour in Excel. I converted the start and end date/time values so they contain only the time, and then tried subtracting one from the other, but that only gives me the number of hours. What I need is a list of the hours themselves, so that instead of
starting on 9:00
ending on 17:00
I get this:
9:00
10:00
11:00
12:00
13:00
14:00
15:00
16:00
17:00
to count every hour that employee was at work. But I don't know how :(
Or is there a better way of doing that?
Assuming your table looks something like this:
Person         Start  End    09:00  10:00  11:00  12:00  13:00  14:00  15:00
Alice          08:35  16:35  1      1      1      1      1      1      1
Bob            09:35  17:35  0      1      1      1      1      1      1
Carl           10:35  18:35  0      0      1      1      1      1      1
Dan            11:35  19:35  0      0      0      1      1      1      1
Ed             12:35  20:35  0      0      0      0      1      1      1
Total present                1      2      3      4      5      5      5
You can compute the entries 0 or 1 in each cell under the times using the formula
=IF(AND((E$4>$C6);(E$4<=$D6));1;0)
In the formula, E$4 is a reference to the column header, e.g. "9:00", $C6 and $D6 are references to the start and end times of the person. They are defined using partial absolute references ($) so the same formula can be copied and pasted in all the cells.
The result will be 1 if the person was present at that time and 0 if not.
The "Total present" formulas just sum up the 1's and 0's in the column.
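The same presence grid can be sketched in pandas as a sanity check on the spreadsheet. This is a minimal illustration, not part of the Excel solution; the frame and variable names are assumptions, and the comparison copies the formula's rule (header time strictly after the start, and at or before the end).

```python
import pandas as pd

shifts = pd.DataFrame({
    "Person": ["Alice", "Bob", "Carl", "Dan", "Ed"],
    "Start": pd.to_timedelta(["08:35:00", "09:35:00", "10:35:00", "11:35:00", "12:35:00"]),
    "End": pd.to_timedelta(["16:35:00", "17:35:00", "18:35:00", "19:35:00", "20:35:00"]),
})

# Column headers 09:00 .. 15:00, both as labels and as time offsets from midnight.
labels = [f"{h:02d}:00" for h in range(9, 16)]
hours = pd.to_timedelta([label + ":00" for label in labels]).to_numpy()[None, :]

start = shifts["Start"].to_numpy()[:, None]
end = shifts["End"].to_numpy()[:, None]

# Same test as =IF(AND(E$4>$C6; E$4<=$D6); 1; 0), evaluated for every cell at once.
presence = pd.DataFrame(
    ((hours > start) & (hours <= end)).astype(int),
    index=shifts["Person"].tolist(),
    columns=labels,
)
totals = presence.sum()  # the "Total present" row
print(presence)
print(totals)
```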

How to remove successive similar numbers in a pandas DF column

I have a pandas DF with a column - this column can have 3 values, either 0, 1 or ' ' (see example below).
What I want to do is remove all successive numbers that are similar. So a 0 can never be followed by a 0 and a 1 can never be followed by 1. Instead I want to replace these by a ' '.
Current pandas DF
time value
1:00 0
2:00
3:00 0
4:00 1
5:00
6:00
7:00 1
8:00 1
9:00 0
What I want
time value
1:00 0
2:00
3:00
4:00 1
5:00
6:00
7:00
8:00
9:00 0
I tried to work with loops, but cannot find a clean way to refer to 'the next same value'.
Any simple solution for this?
An itertools solution:
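from itertools import chain, groupby
import numpy as np
# each run of equal values becomes: its key, then empty strings for the rest of the run
df.value = list(chain.from_iterable(
    [key, *[''] * (len(list(gr)) - 1)]
    for key, gr in groupby(df.value.replace("", np.nan).ffill())
))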
replacing empty strings with np.nan
forward filling the NaNs to get streams of 0's and 1's
grouping by 0's and 1's
placing back the key (which is 0 or 1) along with some empty strings (group's length - 1)
flattening these blocks with chain.from_iterable
casting to a list to assign it back to the dataframe
to get
time value
0 1:00 0
1 2:00
2 3:00
3 4:00 1
4 5:00
5 6:00
6 7:00
7 8:00
8 9:00 0
We can use loc on value to drop the rows holding empty strings, then shift and compare the remaining rows to build a boolean mask. Finally, we overwrite value with an empty string wherever the mask is True:
s = df['value'].loc[lambda x: x != '']
m = s.eq(s.shift())
df.loc[m[m].index, 'value'] = ''
time value
0 1:00 0
1 2:00
2 3:00
3 4:00 1
4 5:00
5 6:00
6 7:00
7 8:00
8 9:00 0
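To try the shift-and-compare approach end to end, the question's frame can be rebuilt by hand. This is a small reproduction under one assumption: blanks are stored as empty strings (''), as both answers assume, rather than as a single space.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["1:00", "2:00", "3:00", "4:00", "5:00", "6:00", "7:00", "8:00", "9:00"],
    "value": [0, "", 0, 1, "", "", 1, 1, 0],
})

s = df["value"].loc[lambda x: x != ""]  # keep only the rows holding 0 or 1
m = s.eq(s.shift())                     # True where a value repeats the previous kept one
df.loc[m[m].index, "value"] = ""        # blank out the repeats in the original frame
print(df)
```

If the real column holds ' ' instead of '', the comparison in the second line would need to match that.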

Groupby expanding count - elements changing of group at different time stamps

I have a huge DataFrame that looks as follows (this is just an example to illustrate the problem):
id timestamp target_time interval
1 08:00:00 10:20:00 (10-11]
1 08:30:00 10:21:00 (10-11]
1 09:10:00 11:30:00 (11-12]
2 09:15:00 10:15:00 (10-11]
2 09:35:00 10:11:00 (10-11]
3 09:45:00 11:12:00 (11-12]
...
I would like to create a series looking as follows:
interval timestamp unique_ids
(10-11] 08:00:00 1
08:30:00 1
09:15:00 1
09:35:00 1
(11-12] 09:10:00 1
09:45:00 2
The objective is to count, for each time interval, how many unique ids had their corresponding target_time within the interval at their timestamp. Note that the target_time for each id can change at different timestamps. For instance, for the id 1 the interval is (10-11] from 08:00:00 to 08:30:00, but then it changes to (11-12] at 09:10:00. Therefore, at 09:15:00 I do not want to count the id 1 in the resulting Series.
I tried a groupby -> expand -> np.unique approach, but it does not provide the result that I want:
df.set_index('timestamp').groupby('interval').id.expanding().apply(lambda x: np.unique(x).shape[0])
interval timestamp unique_ids
(10-11] 08:00:00 1
08:30:00 1
09:15:00 2
09:35:00 2
(11-12] 09:10:00 1
09:45:00 2
Any hint on how I can approach this problem? I want to make use of pandas routines as much as possible, in order to reduce computational time, since the length of the DataFrame is 1453076...
Many thanks in advance!
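One possible way to express the requirement, sketched on the example rows only: keep, for every timestamp, each id's most recent interval, then count how many ids currently sit in the interval of the row being evaluated. This assumes one row per (timestamp, id) pair and zero-padded time strings that sort correctly. The wide table grows as timestamps times ids, so this is meant to illustrate the logic and may not be practical at 1.4 million rows.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "timestamp": ["08:00:00", "08:30:00", "09:10:00", "09:15:00", "09:35:00", "09:45:00"],
    "interval": ["(10-11]", "(10-11]", "(11-12]", "(10-11]", "(10-11]", "(11-12]"],
}).sort_values("timestamp")

# One column per id, holding that id's latest known interval at each timestamp.
current = df.pivot(index="timestamp", columns="id", values="interval").ffill()

# Interval of the row recorded at each timestamp.
row_interval = df.set_index("timestamp")["interval"]

# Number of ids whose latest interval equals the row's interval.
unique_ids = current.eq(row_interval, axis=0).sum(axis=1)

result = (
    pd.DataFrame({"interval": row_interval, "unique_ids": unique_ids})
    .reset_index()
    .set_index(["interval", "timestamp"])
    .sort_index()["unique_ids"]
)
print(result)
```

On the example this gives 1 for 09:15:00 and 09:35:00 in (10-11], because id 1 has already moved to (11-12] by then.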

Return values in same row by searching different column

Given data like this:
A B C D
1 MAX. Time MIN. Time
2 140 08:00 100 01:00
3 150 15:00 50 02:00
4 130 17:00 80 03:00
5 120 22:00 90 04:00
=MAX(A2:A5) will return 150 and
=MIN(C2:C5) will return 50
How can I find the values in COL B in same row as 150 (for MAX) and in COL D in the same row as 50 (for MIN)?
If you can confirm that there is only one max (min) value (otherwise the formula returns the first occurrence), you can simply use VLOOKUP:
=VLOOKUP(Max(A2:A5),A2:B5,2,0)
The formula for the min is analogous:
=VLOOKUP(Min(C2:C5),C2:D5,2,0)
Alternatively, you can use a more flexible formula:
=INDEX(B2:B5,MATCH(Min(C2:C5),C2:C5,0))
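The formula above finds the min in column C and returns the corresponding value from column B. To return the time from column D instead, as the question asks, use D2:D5 as the first argument of INDEX.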
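For comparison, the same "find the extreme value, then read another column in that row" lookup can be sketched in pandas. The column names Time_max and Time_min are made up here, because the sheet uses the header "Time" twice. Like MATCH with 0, idxmax and idxmin return the first occurrence when there are ties.

```python
import pandas as pd

df = pd.DataFrame({
    "MAX.": [140, 150, 130, 120],
    "Time_max": ["08:00", "15:00", "17:00", "22:00"],
    "MIN.": [100, 50, 80, 90],
    "Time_min": ["01:00", "02:00", "03:00", "04:00"],
})

# Row label of the extreme value, then the time stored in that same row.
time_of_max = df.loc[df["MAX."].idxmax(), "Time_max"]
time_of_min = df.loc[df["MIN."].idxmin(), "Time_min"]
print(time_of_max, time_of_min)
```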
