Fetch start and end datetime while values are zero - python-3.x

I would like to fetch the start and end datetime of each stretch where the value of the data is zero.
The data is in PostgreSQL.
A PostgreSQL solution would be the most helpful, but a Python solution using numpy or pandas would also work.
For example:
column 1 will contain the datetime
column 2 will contain the values.
DateTime Value
06-07-2021 12:00 -521362.8779
06-07-2021 12:15 -57275.52732
06-07-2021 12:30 0
06-07-2021 12:45 0
06-07-2021 13:00 0
06-07-2021 13:15 0
06-07-2021 13:30 0
06-07-2021 13:45 0
06-07-2021 14:00 -57275.52732
06-07-2021 14:15 -377411.4886
06-07-2021 14:30 -377411.4886
06-07-2021 14:45 0
06-07-2021 15:00 0
06-07-2021 15:15 0
06-07-2021 15:30 -889863.5254
06-07-2021 15:45 -1194683.49
06-07-2021 16:00 0
06-07-2021 16:15 0
06-07-2021 16:30 0
06-07-2021 16:45 0
06-07-2021 17:00 -89539.05766
06-07-2021 17:15 -1117269.624
06-07-2021 17:30 -857357.2725
The required output shall be
Column 1 serial no,
Column 2 Start DateTime,
Column 3 End DateTime
Serial No Start DateTime End DateTime
1 06-07-2021 12:30 06-07-2021 13:45
2 06-07-2021 14:45 06-07-2021 15:15
3 06-07-2021 16:00 06-07-2021 16:45

Assuming your DateTime column already has a datetime dtype, or that you load the text above (held in a string df_string) into a dataframe using
import io
import numpy as np
import pandas as pd
df = pd.read_csv(io.StringIO(df_string), sep=r'\s{2,}', engine='python', parse_dates=['DateTime'])
then you do
x = df['Value'].to_numpy()
# A run starts at a zero whose previous value is non-zero (or at the first row).
mask = np.empty(x.shape[0], 'bool')
mask[0] = x[0] == 0
mask[1:] = (x[1:] == 0) & (x[:-1] != 0)
# A run ends at a zero whose next value is non-zero (or at the last row).
mask2 = np.empty(x.shape[0], 'bool')
mask2[-1] = x[-1] == 0
mask2[:-1] = (x[1:] != 0) & (x[:-1] == 0)
df2 = pd.DataFrame({'Start': df['DateTime'][mask].reset_index(drop=True),
                    'End': df['DateTime'][mask2].reset_index(drop=True)})
and you get
Start End
0 2021-06-07 12:30:00 2021-06-07 13:45:00
1 2021-06-07 14:45:00 2021-06-07 15:15:00
2 2021-06-07 16:00:00 2021-06-07 16:45:00

I just compare the current row with next/previous row values. If one is zero and the other is not, then it's a Start or End.
You can use the shift method to shift the rows.
df1 = pd.DataFrame()
df1['Start DateTime'] = (
df[(df['Value'] == 0) & (df['Value'].shift() != 0)]
['DateTime'].reset_index(drop=True) )
df1['End DateTime'] = (
df[(df['Value'] == 0) & (df['Value'].shift(-1) != 0)]
['DateTime'].reset_index(drop=True))
   Start DateTime    End DateTime
0  06-07-2021 12:30  06-07-2021 13:45
1  06-07-2021 14:45  06-07-2021 15:15
2  06-07-2021 16:00  06-07-2021 16:45
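Both answers pair up start and end rows by position. As a hedged alternative, here is a minimal sketch that labels each run of consecutive zeros and aggregates per run, which also yields the serial number from the question. It assumes the timestamps are 15 minutes apart and that 06-07-2021 means 7 June 2021 (matching the parsed output of the first answer); the names is_zero, run_id and runs are made up for illustration.

```python
import pandas as pd

# Values copied from the question; the timestamps are rebuilt with date_range
# under the assumption of a regular 15-minute grid starting at 12:00.
values = [
    -521362.8779, -57275.52732, 0, 0, 0, 0, 0, 0,
    -57275.52732, -377411.4886, -377411.4886, 0, 0, 0,
    -889863.5254, -1194683.49, 0, 0, 0, 0,
    -89539.05766, -1117269.624, -857357.2725,
]
df = pd.DataFrame({
    "DateTime": pd.date_range("2021-06-07 12:00", periods=len(values), freq="15min"),
    "Value": values,
})

is_zero = df["Value"].eq(0)
# The run id increases by one every time the zero/non-zero flag changes.
run_id = is_zero.astype(int).diff().ne(0).cumsum()

# Keep only the zero rows and take the first and last timestamp of each run.
runs = (
    df[is_zero]
    .groupby(run_id[is_zero])["DateTime"]
    .agg(Start="min", End="max")
    .reset_index(drop=True)
)
runs.index = runs.index + 1  # serial numbers start at 1
runs.index.name = "Serial No"
print(runs)
```

Because each run is aggregated on its own, the start and end of a run cannot drift out of step, even when the data begins or ends with zeros.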

Related

How to calculate average number per day of the week?

I have a list of data with the total number of orders per day, and I would like to calculate the average number of orders per day of the week. For example, the average number of orders on Mondays.
0 2018-01-01 00:00:00 3162
1 2018-01-02 00:00:00 1146
2 2018-01-03 00:00:00 396
3 2018-01-04 00:00:00 848
4 2018-01-05 00:00:00 1624
5 2018-01-06 00:00:00 3052
6 2018-01-07 00:00:00 3674
7 2018-01-08 00:00:00 1768
8 2018-01-09 00:00:00 1190
9 2018-01-10 00:00:00 382
10 2018-01-11 00:00:00 3170
1. Make sure your date column is in datetime format (looks like it already is).
2. Add a column that converts the date to the day of the week.
3. Group by the day of the week and take the average.
df['Date'] = pd.to_datetime(df['Date'])  # Step 1
df['DayofWeek'] = df['Date'].dt.day_name()  # Step 2
df.groupby(['DayofWeek']).mean()  # Step 3
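A self-contained sketch of the same three steps on the sample data from the question. The question does not show column names, so Date and Orders are assumptions; selecting the Orders column before averaging keeps the date column out of the mean, and the reindex only puts the weekdays in calendar order.

```python
import pandas as pd

# Sample data from the question; the column names are hypothetical.
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=11, freq="D"),
    "Orders": [3162, 1146, 396, 848, 1624, 3052, 3674, 1768, 1190, 382, 3170],
})

df["DayofWeek"] = df["Date"].dt.day_name()

# Average only the order counts, one value per weekday name.
avg = df.groupby("DayofWeek")["Orders"].mean()

# groupby sorts the names alphabetically, so restore calendar order.
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
avg = avg.reindex(weekdays)
print(avg)
```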

I'm trying to find the last column with data in each row and return the column name to the data frame

I'm trying to get the name of the column that the ffill value came from.
I've searched Google and Stack Overflow and haven't found a way to accomplish this.
This is the ffill code:
df["LAST_PUNCH"] = df.ffill(axis=1).iloc[:, -1]
This is my dataframe:
SHIFT IN OUT IN_1
DA6-0730 07:30 12:35 13:05
DB0-ACOM 08:18 12:30
DC4-0730 07:30 12:39 13:09
DC4-0730 07:30 12:34 13:04
This is my dataframe after using ffill:
SHIFT IN OUT IN_1 LAST_PUNCH
DA6-0730 07:30 12:35 13:05 13:05
DB0-ACOM 08:18 12:30 12:30
DC4-0730 07:30 12:39 13:09 13:09
DC4-0730 07:30 12:34 13:04 13:04
I would like to get the column name where the ffill value came from and
append to the end of the ffill value:
SHIFT IN OUT IN_1 LAST_PUNCH
DA6-0730 07:30 12:35 13:05 13:05_IN_1
DB0-ACOM 08:18 12:30 12:30_OUT
DC4-0730 07:30 12:39 13:09 13:09_IN_1
DC4-0730 07:30 12:34 13:04 13:04_IN_1
Ummm this is a little bit tricky
(df+'_'+pd.DataFrame(dict(zip(df.columns.values,df.columns.values)),index=df.index)).\
reindex(columns=df.columns).ffill(axis=1).iloc[:,-1]
Out[360]:
0 13:05_IN_1
1 12:30_OUT
2 13:09_IN_1
3 13:04_IN_1
Name: IN_1, dtype: object
Or using idxmax with reversed order of columns
df.ffill(axis=1).iloc[:, -1]+'_'+df[df.columns[::-1]].notnull().idxmax(1)
Out[375]:
0 13:05_IN_1
1 12:30_OUT
2 13:09_IN_1
3 13:04_IN_1
dtype: object
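The idxmax variant can be tried on its own with the sketch below. It rebuilds the frame from the question and assumes that SHIFT is a regular column and that only IN, OUT and IN_1 hold punches; punch_cols, last_value and last_col are names made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SHIFT": ["DA6-0730", "DB0-ACOM", "DC4-0730", "DC4-0730"],
    "IN": ["07:30", "08:18", "07:30", "07:30"],
    "OUT": ["12:35", "12:30", "12:39", "12:34"],
    "IN_1": ["13:05", np.nan, "13:09", "13:04"],
})

punch_cols = ["IN", "OUT", "IN_1"]  # assumed to be the only punch columns
punches = df[punch_cols]

# Last non-null value per row: forward-fill across columns, take the last one.
last_value = punches.ffill(axis=1).iloc[:, -1]

# idxmax returns the first True per row, so scanning the columns in reverse
# order gives the name of the last non-null column.
last_col = punches[punch_cols[::-1]].notna().idxmax(axis=1)

df["LAST_PUNCH"] = last_value + "_" + last_col
print(df)
```

A row with no punches at all would end up with NaN in LAST_PUNCH, since the forward-filled value is still missing.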

Transform CSV structure with pandas dataframe

My CSV contains rows such as:
entryTime entryPrice exitTime exitPrice
06/01/2009 04:00 93.565 06/01/2009 06:00 93.825
I want to load them into a Dataframe that will have two rows per CSV row, in the following format:
datetime signal price
06/01/2009 04:00 entry 93.565
06/01/2009 06:00 exit 93.825
indexed by datetime column. What would be a fast way to do it?
Use numpy.tile with numpy.ravel:
print (df)
entryTime entryPrice exitTime exitPrice
0 01/01/2009 04:00 90.565 02/01/2009 06:00 91.825
1 03/01/2009 04:00 92.565 04/01/2009 06:00 93.825
2 05/01/2009 04:00 94.565 06/01/2009 06:00 95.825
3 07/01/2009 04:00 96.565 08/01/2009 07:00 97.825
4 09/01/2009 04:00 98.565 10/01/2009 06:00 99.825
a = np.tile(['entry','exit'], len(df))
b = df[['entryTime','exitTime']].values.ravel()
c = df[['entryPrice','exitPrice']].values.ravel()
df = pd.DataFrame({'price':c, 'signal':a},
index=pd.to_datetime(b),
columns=['signal','price'])
print (df)
signal price
2009-01-01 04:00:00 entry 90.565
2009-02-01 06:00:00 exit 91.825
2009-03-01 04:00:00 entry 92.565
2009-04-01 06:00:00 exit 93.825
2009-05-01 04:00:00 entry 94.565
2009-06-01 06:00:00 exit 95.825
2009-07-01 04:00:00 entry 96.565
2009-08-01 07:00:00 exit 97.825
2009-09-01 04:00:00 entry 98.565
2009-10-01 06:00:00 exit 99.825
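Note that the output above parses 02/01/2009 as 1 February, i.e. month first. If the CSV is actually day-first, which the question does not state, passing an explicit format avoids the ambiguity. The sketch below is an alternative to the tile/ravel approach that stacks the entry and exit halves with pd.concat; the helper side and the name tidy are made up for illustration, and the day-first format is an assumption.

```python
import pandas as pd

# First two rows of the answer's sample frame.
df = pd.DataFrame({
    "entryTime": ["01/01/2009 04:00", "03/01/2009 04:00"],
    "entryPrice": [90.565, 92.565],
    "exitTime": ["02/01/2009 06:00", "04/01/2009 06:00"],
    "exitPrice": [91.825, 93.825],
})

def side(frame, signal):
    """Return the '<signal>Time'/'<signal>Price' pair as datetime/signal/price."""
    out = frame[[f"{signal}Time", f"{signal}Price"]].copy()
    out.columns = ["datetime", "price"]
    out.insert(1, "signal", signal)
    return out

tidy = pd.concat([side(df, "entry"), side(df, "exit")])

# Assumption: dates are day-first (DD/MM/YYYY). Swap %d and %m if they are not.
tidy["datetime"] = pd.to_datetime(tidy["datetime"], format="%d/%m/%Y %H:%M")

# Index by datetime and interleave entries and exits chronologically.
tidy = tidy.set_index("datetime").sort_index(kind="mergesort")
print(tidy)
```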

Fill column with previous values in another column spotfire

I am trying to insert a calculated column such that when T1 = CMP 1 Stops, it copies the timestamp from when T1 = CMP 1 Starts.
timestamp T1 Calculated Expected
5/1/2017 14:00
5/1/2017 14:15
5/1/2017 14:30 CMP 1 Starts
5/1/2017 14:45 CMP 1 Stops 5/1/2017 14:30 5/1/2017 14:30
5/1/2017 15:00
5/1/2017 15:15
5/1/2017 15:30
5/1/2017 15:45
5/1/2017 16:00
5/1/2017 16:15
5/1/2017 16:30 CMP 1 Starts
5/1/2017 16:45 CMP 1 ON
5/1/2017 17:00 CMP 1 Stops 5/1/2017 16:45 5/1/2017 16:30
5/1/2017 17:15
5/1/2017 17:30
5/1/2017 17:45
5/1/2017 18:00
5/1/2017 18:15
5/1/2017 18:30
5/1/2017 18:45 CMP 1 Starts
5/1/2017 19:00 CMP 1 ON
5/1/2017 19:15 CMP 1 Stops 5/1/2017 19:00 5/1/2017 18:45
5/1/2017 19:30
5/1/2017 19:45
Example: Expected column
Note: The value does not have to be filled only on the row where T1 = CMP 1 Stops; filling all the null rows with the timestamp from when T1 = CMP 1 Starts would also work for me.
The first expression you will need is:
If((Trim([T1])="CMP 1 Stops") or (Trim([T1])="CMP 1 Starts"),Max([timestamp]) over (PreviousPeriod([timestamp]))) as [YourNewColumn]
Then, if you want to limit it to the rows where [T1] = "CMP 1 Stops" just add another calculated column:
case when [T1] = "CMP 1 Stops" then [YourNewColumn] end as [YourFinalColumn]

Maximum values per day from data with different dates

I have data taken at different times on different days, for example:
dateTimeRead(YYYY-MM-DD HH-mm-ss) rain_value(mm) air_pressure(hPa)
1/2/2015 0:00 0 941.5675
1/2/2015 0:15 0 941.4625
1/2/2015 0:30 0 941.3
1/2/2015 0:45 0 941.2725
1/2/2015 1:00 0.2 941.12
1/2/2015 1:15 0 940.8625
1/2/2015 1:30 0 940.7575
1/2/2015 1:45 0 940.6075
1/2/2015 2:00 0 940.545
1/2/2015 2:15 0 940.27
1/2/2015 2:30 0 940.2125
1/2/2015 16:15 0 940.625
1/2/2015 16:30 0 940.69
1/2/2015 16:45 0 940.6175
1/2/2015 17:00 0 940.635
1/2/2015 19:00 0 941.9975
1/2/2015 20:45 0 942.7925
1/2/2015 21:00 0 942.745
1/2/2015 21:15 0 942.6325
1/2/2015 21:30 0 942.735
1/2/2015 21:45 0 942.765
1/2/2015 22:00 0 7/30/1902
1/3/2015 2:30 0 941.1275
1/3/2015 2:45 0 941.125
1/3/2015 3:00 0 940.955
1/3/2015 3:15 0 941.035
There are dates with missing time stamps.
From these readings how may I extract the maximum values by day for rain_value(mm)?
There is a fairly standard array formula style to provide a pseudo-MAXIF function but I prefer to use INDEX and enter it as a standard formula.
      
With the date to be determined in F3, the formula in G3 is,
=MAX(INDEX(($A$2:$A$999>=$F3)*($A$2:$A$999<(F3+1))*$B$2:$B$999, , ))
A CSE array formula for the same thing would be something like,
=MAX(IF($A$2:$A$999>=$F3, IF($A$2:$A$999<$F3+1, $B$2:$B$999)))
Array formulas need to be finalized with Ctrl+Shift+Enter.
An array formula may not be suitable for your particular requirement since it seems you may have very many readings. Instead I would suggest a PivotTable, with the date/Time entries parsed (Text to Columns, Fixed width) and date for ROWS, Max of rain_value(mm) for VALUES.
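If the readings can be exported from the workbook, the same per-day maximum can also be sketched in pandas rather than Excel. This is only an illustration on a few rows from the question: the column names dateTimeRead and rain_value are shortened from the headers, and the dates are assumed to be month-first (1/2/2015 read as 2 January 2015).

```python
import pandas as pd

# A handful of readings copied from the question.
df = pd.DataFrame({
    "dateTimeRead": ["1/2/2015 0:45", "1/2/2015 1:00", "1/2/2015 1:15",
                     "1/3/2015 2:30", "1/3/2015 2:45"],
    "rain_value": [0, 0.2, 0, 0, 0],
})

# Assumption: month-first dates.
df["dateTimeRead"] = pd.to_datetime(df["dateTimeRead"], format="%m/%d/%Y %H:%M")

# Group by calendar day; missing time stamps within a day do not matter.
daily_max = df.groupby(df["dateTimeRead"].dt.date)["rain_value"].max()
print(daily_max)
```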
