I have data taken at different times on different days, for example:
dateTimeRead(YYYY-MM-DD HH-mm-ss) rain_value(mm) air_pressure(hPa)
1/2/2015 0:00 0 941.5675
1/2/2015 0:15 0 941.4625
1/2/2015 0:30 0 941.3
1/2/2015 0:45 0 941.2725
1/2/2015 1:00 0.2 941.12
1/2/2015 1:15 0 940.8625
1/2/2015 1:30 0 940.7575
1/2/2015 1:45 0 940.6075
1/2/2015 2:00 0 940.545
1/2/2015 2:15 0 940.27
1/2/2015 2:30 0 940.2125
1/2/2015 16:15 0 940.625
1/2/2015 16:30 0 940.69
1/2/2015 16:45 0 940.6175
1/2/2015 17:00 0 940.635
1/2/2015 19:00 0 941.9975
1/2/2015 20:45 0 942.7925
1/2/2015 21:00 0 942.745
1/2/2015 21:15 0 942.6325
1/2/2015 21:30 0 942.735
1/2/2015 21:45 0 942.765
1/2/2015 22:00 0 7/30/1902
1/3/2015 2:30 0 941.1275
1/3/2015 2:45 0 941.125
1/3/2015 3:00 0 940.955
1/3/2015 3:15 0 941.035
There are dates with missing time stamps.
From these readings how may I extract the maximum values by day for rain_value(mm)?
There is a fairly standard array formula style to provide a pseudo-MAXIF function but I prefer to use INDEX and enter it as a standard formula.
With the date to be determined in F3, the formula in G3 is,
=MAX(INDEX(($A$2:$A$999>=$F3)*($A$2:$A$999<($F3+1))*$B$2:$B$999, , ))
A CSE array formula for the same thing would be something like,
=MAX(IF($A$2:$A$999>=$F3, IF($A$2:$A$999<$F3+1, $B$2:$B$999)))
Array formulas need to be finalized with Ctrl+Shift+Enter↵.
An array formula may not be suitable for your particular requirement since it seems you may have very many readings. Instead I would suggest a PivotTable, with the date/Time entries parsed (Text to Columns, Fixed width) and date for ROWS, Max of rain_value(mm) for VALUES.
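For readers working outside Excel, the same "date for ROWS, Max for VALUES" grouping can be sketched in pandas. This is only an illustration: the column names `dateTimeRead` and `rain_value` are assumed, the rows are a small subset of the sample in the question, and the dates are assumed to be month/day/year.

```python
import pandas as pd

# A few rows copied from the question; the column names are assumptions.
readings = pd.DataFrame(
    {
        "dateTimeRead": [
            "1/2/2015 0:45",
            "1/2/2015 1:00",
            "1/2/2015 1:15",
            "1/3/2015 2:30",
            "1/3/2015 2:45",
        ],
        "rain_value": [0.0, 0.2, 0.0, 0.0, 0.0],
    }
)

# Parse the text timestamps (assumed month/day/year).
readings["dateTimeRead"] = pd.to_datetime(
    readings["dateTimeRead"], format="%m/%d/%Y %H:%M"
)

# Group on the calendar date only; missing time stamps simply do not contribute.
daily_max = readings.groupby(readings["dateTimeRead"].dt.date)["rain_value"].max()
print(daily_max)
```

Grouping on the date part alone means gaps in the time stamps need no special handling, which is the same reason the PivotTable approach works.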
Related
I want to simulate battery charging data.
Imagine a battery with a constant capacity, e.g. 30000. In the real world a person starts charging it at a random time between 18:00 and 18:30, sometimes at 18:29 and sometimes at 18:00, so the half-hourly values vary with the start time, but the total amount does not change.
index value
0 2021-01-01 00:00:00 0
1 2021-01-01 00:30:00 0
2 2021-01-01 01:00:00 0
3 2021-01-01 01:30:00 0
4 2021-01-01 02:00:00 0
... ... ...
995 2021-01-21 17:30:00 0
996 2021-01-21 18:00:00 0
997 2021-01-21 18:30:00 0
998 2021-01-21 19:00:00 0
999 2021-01-21 19:30:00 0
1000 2021-01-21 20:00:00 0
So, if the charging speed is 5000 per half hour, the pattern sometimes looks like [10, 5000, 5000, 5000, 5000, 5000, 4990] and sometimes like [2500, 5000, 5000, 5000, 5000, 5000, 2500].
And I want to generate such a pattern and insert it into a given time.
index value
0 2021-01-01 00:00:00 0
1 2021-01-01 00:30:00 0
2 2021-01-01 01:00:00 0
3 2021-01-01 01:30:00 0
4 2021-01-01 02:00:00 0
... ... ...
995 2021-01-21 17:30:00 0
996 2021-01-21 18:00:00 2500
997 2021-01-21 18:30:00 5000
998 2021-01-21 19:00:00 5000
999 2021-01-21 19:30:00 5000
1000 2021-01-21 20:00:00 5000
1001 2021-01-21 20:30:00 5000
1002 2021-01-21 21:00:00 2500
1003 2021-01-21 21:30:00 0
Assume he starts charging around the time defined by the start parameter. If start is '2021-01-01 18:00', charging will begin between 18:00 and 18:30.
The function I want:
def insertPattern(emptyTimeseriesDF, capacity, speed, start):
    return dfWithInsertedPattern
Empty ts generated by:
import datetime
import pandas as pd

index = pd.date_range(datetime.datetime(2021,1,1), periods=1000, freq='30min')
columns = ['value']
df = pd.DataFrame(index=index, columns=columns)
df = df.fillna(0)
df = df.reset_index()
df
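One possible shape for such a function is sketched below. It is not a definitive implementation: the name `insert_pattern`, the optional `rng` argument, and the assumption that the start delay is uniformly distributed within the first half-hour slot are all illustrative choices, not something stated in the question. The first slot receives only the part of `speed` that fits after the delay, later slots receive the full `speed`, and the last slot receives whatever is left so that the total equals `capacity`.

```python
import numpy as np
import pandas as pd


def insert_pattern(df, capacity, speed, start, rng=None):
    """Return a copy of df with one charging session written into 'value'."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    rng = np.random.default_rng() if rng is None else rng
    out = df.copy()
    out["value"] = out["value"].astype(float)

    # Fraction of the first half-hour slot that passes before charging starts
    # (assumed uniform; the question does not specify a distribution).
    delay = rng.uniform(0.0, 1.0)

    # First slot is partial, later slots run at full speed, last slot takes the rest.
    pattern = []
    remaining = float(capacity)
    amount = min(speed * (1.0 - delay), remaining)
    while remaining > 1e-9:
        pattern.append(amount)
        remaining -= amount
        amount = min(speed, remaining)

    # Find the row whose timestamp equals `start` and write the pattern from there.
    positions = np.flatnonzero((out["index"] == pd.Timestamp(start)).to_numpy())
    if len(positions) == 0:
        raise ValueError("start is not one of the timestamps in df")
    first = int(positions[0])
    last = min(first + len(pattern), len(out))  # truncate at the end of the frame
    out.iloc[first:last, out.columns.get_loc("value")] = pattern[: last - first]
    return out


# Usage: an empty half-hourly frame shaped like the one in the question.
frame = pd.DataFrame(
    {"index": pd.date_range("2021-01-01", periods=1000, freq="30min"), "value": 0.0}
)
result = insert_pattern(
    frame, capacity=30000, speed=5000, start="2021-01-10 18:00",
    rng=np.random.default_rng(0),
)
print(result[result["value"] > 0])
```

Passing a seeded generator makes the random start reproducible. Note that a session starting too close to the end of the frame is truncated rather than extended, so in that case the inserted total is less than `capacity`.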
I have a list of data with the total number of orders per day, and I would like to calculate the average number of orders for each day of the week. For example, the average number of orders on Mondays.
0 2018-01-01 00:00:00 3162
1 2018-01-02 00:00:00 1146
2 2018-01-03 00:00:00 396
3 2018-01-04 00:00:00 848
4 2018-01-05 00:00:00 1624
5 2018-01-06 00:00:00 3052
6 2018-01-07 00:00:00 3674
7 2018-01-08 00:00:00 1768
8 2018-01-09 00:00:00 1190
9 2018-01-10 00:00:00 382
10 2018-01-11 00:00:00 3170
1. Make sure your date column is in datetime format (looks like it already is).
2. Add a column that converts the date to the day of the week.
3. Group by the day of the week and take the average.
df['Date'] = pd.to_datetime(df['Date']) # Step 1
df['DayofWeek'] = df['Date'].dt.day_name() # Step 2
df.groupby(['DayofWeek']).mean() # Step 3
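A self-contained version of those three steps, using the sample rows from the question, might look like the sketch below. The column name `Orders` is an assumption (the sample has no header), and the `reindex` call is an optional extra that puts the result in calendar order instead of alphabetical order.

```python
import pandas as pd

# Sample rows from the question; "Orders" is an assumed column name.
orders = pd.DataFrame(
    {
        "Date": pd.date_range("2018-01-01", periods=11, freq="D"),
        "Orders": [3162, 1146, 396, 848, 1624, 3052, 3674, 1768, 1190, 382, 3170],
    }
)

weekday_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
]

# Group by weekday name, average only the order counts, then sort Monday to Sunday.
avg_by_day = (
    orders.groupby(orders["Date"].dt.day_name())["Orders"]
    .mean()
    .reindex(weekday_order)
)
print(avg_by_day)
```

Selecting the `Orders` column before calling `mean()` keeps the date column out of the aggregation.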
I have a database of hourly data for an entire year. I want to find the 98th percentile for NO2 (for example) for each hour for each season (Dec-Jan-Feb, Mar-Apr-May, etc.)
I'm trying to use MATCH and INDEX to find the cells for one hour for one season.
=INDEX(A1:E8985,MATCH(Z2,(C3:C8985=AA2,AA3,AA13)*(B3:B8985=Z2),0))
where A1:E8985 is the table area I'm looking in
Z2 is the hour (1:00), looking in column B, which contains the hours
AA2,AA3,AA13 are January, February, and December (one season), looking in column C, which contains the months.
Right now, I'm getting an #N/A error even though the criteria should be met multiple times. I have made sure that the columns match formats.
Sample of part of the table:
Date Time Month NO NO2
1/1/2016 1:00 January -0.1 0.2
1/1/2016 2:00 January -0.1 0.1
1/1/2016 3:00 January -0.1 0.1
1/1/2016 4:00 January -0.1 0.2
1/1/2016 5:00 January -0.1 0.2
1/1/2016 6:00 January -0.1 0.4
1/1/2016 7:00 January -0.1 0.3
1/1/2016 8:00 January -0.1 0.8
1/1/2016 9:00 January -0.1 0.5
1/1/2016 10:00 January -0.1 0.2
1/1/2016 11:00 January -0.1 1.3
1/1/2016 12:00 January -0.1 0.7
1/1/2016 13:00 January -0.1 0.4
1/1/2016 14:00 January 0 0.7
1/1/2016 15:00 January -0.1 0.5
1/1/2016 16:00 January -0.1 0.4
1/1/2016 17:00 January -0.1 1
1/1/2016 18:00 January -0.1 0.7
1/1/2016 19:00 January -0.1 0.9
1/1/2016 20:00 January 1.6 4.5
1/1/2016 21:00 January 2.8 6
1/1/2016 22:00 January 0.1 1.1
1/1/2016 23:00 January 0.2 1.3
1/2/2016 0:00 January 0.2 1.4
Let me summarize the logic you want: the 98th percentile of NO2 where the month is January, February or December and the time is 1:00, then the same for 2:00, and so on.
If so, the formula below does that; it is applied only to the sample data you have provided.
Note that it is an array formula.
=PERCENTILE.INC(
  IF(C1:C25="January",
    IF(B1:B25=Z2,
      E1:E25,
      ""),
    IF(C1:C25="February",
      IF(B1:B25=Z2,
        E1:E25,
        ""),
      IF(C1:C25="December",
        IF(B1:B25=Z2,
          E1:E25,
          ""),
        ""))
  ),0.98)
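If the data can be exported, the same "98th percentile per hour per season" logic can be sketched in pandas. The data below is synthetic (random values standing in for a full year of NO2 readings), and the names `timestamp`, `season`, `hour` and the season labels are illustrative assumptions. As in the question, December is grouped with January and February of the same calendar year.

```python
import numpy as np
import pandas as pd

# Synthetic hourly values for 2016; real NO2 readings would replace these.
stamps = pd.date_range("2016-01-01 00:00", "2016-12-31 23:00", freq="h")
rng = np.random.default_rng(0)
data = pd.DataFrame({"timestamp": stamps, "NO2": rng.gamma(2.0, 1.0, len(stamps))})

# Map each month number to a season label (Dec-Jan-Feb, Mar-Apr-May, ...).
season_of_month = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}
data["season"] = data["timestamp"].dt.month.map(season_of_month)
data["hour"] = data["timestamp"].dt.hour

# One 98th percentile per (season, hour); rows are hours, columns are seasons.
p98 = data.groupby(["season", "hour"])["NO2"].quantile(0.98).unstack("season")
print(p98)
```

The default linear interpolation used by `quantile` follows the same inclusive definition as PERCENTILE.INC, so the two should agree on identical input.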
My input data is as follows.
Input
10-08-2012 15-08-2012
sid abc001 0 abc001 0 abc001
status p 0 p 0 s
intime 10:00 0 10:00 0 09:00
outtime 19:00 0 19:00 0 18:00
Total 9 0 9 0 9
Here I need to delete specific cells in a column and shift the cells beside them to the left. I have used the code below to get the required output.
Sub Delete_Range_To_ShiftLeft()
Range("C3:C7").Delete Shift:=xlToLeft
End Sub
Required output:
10-08-2012 15-08-2012
sid abc001 abc001 abc001
status p p s
intime 10:00 10:00 09:00
outtime 19:00 19:00 18:00
Total 9 9 9
Can you please let me know, if you have any better way to do it?
How can I convert my hourly data to daily data when there are some missing values in the hourly data? My Excel sheet looks like this:
date hour ENERGY(MJ)
1/01/2002 0:00 0
1/01/2002 1:00 0
1/01/2002 2:00 0
1/01/2002 3:00 0
1/01/2002 4:00 0
1/01/2002 5:00 0
1/01/2002 6:00 0.15
1/01/2002 7:00 0.74
1/01/2002 8:00 1.46
1/01/2002 9:00 2.23
1/01/2002 10:00 2.89
Thanks
If your data is in A1:C12, you have 5 non-zero hourly readings (total 7.47 MJ). These may be summed with SUM(C2:C12), then divided by 5 (derivable with COUNTIF(C2:C12,">"&0)) to compute an average hourly rate from the data available, and finally scaled up to a full day by multiplying by 24:
=SUM(C2:C12)*24/COUNTIF(C2:C12,">"&0)
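The arithmetic in that formula can be checked with a short pandas sketch that applies it per date. The column names and the helper `scaled_daily_total` are hypothetical; the values are the sample rows from the question.

```python
import pandas as pd

# Sample rows from the question; column names are assumptions.
hourly = pd.DataFrame(
    {
        "date": ["1/01/2002"] * 11,
        "hour": list(range(11)),
        "energy_mj": [0, 0, 0, 0, 0, 0, 0.15, 0.74, 1.46, 2.23, 2.89],
    }
)


def scaled_daily_total(values):
    """Mirror SUM(...)*24/COUNTIF(...,">0"): mean of non-zero readings times 24."""
    nonzero_count = (values > 0).sum()
    if nonzero_count == 0:
        return 0.0  # avoid dividing by zero on days with no non-zero readings
    return values.sum() * 24 / nonzero_count


daily = hourly.groupby("date")["energy_mj"].apply(scaled_daily_total)
print(daily)
```

For the sample day this gives 7.47 * 24 / 5, i.e. roughly 35.86 MJ. Like the formula, it treats zero readings as absent, which is worth keeping in mind if genuine zeros occur in the data.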