compute elapsed time between rows - elapsedtime

uploadedby | uploaddate
Gracey Vinas | 2012-04-20 20:16:00
Gracey Vinas | 2012-04-20 20:25:00
Gracey Vinas | 2012-04-20 20:35:00
Gracey Vinas | 2012-04-20 20:39:00
Gracey Vinas | 2012-04-20 22:07:00
Gracey Vinas | 2012-04-21 00:04:00
Gracey Vinas | 2012-04-21 01:14:00
Gracey Vinas | 2012-04-23 17:56:00
Gracey Vinas | 2012-04-23 18:06:00
Gracey Vinas | 2012-04-23 18:21:00
Gracey Vinas | 2012-04-23 19:04:00
Gracey Vinas | 2012-04-23 19:11:00
Gracey Vinas | 2012-04-23 19:24:00
Gracey Vinas | 2012-04-23 20:08:00
Gracey Vinas | 2012-04-23 20:22:00
Gracey Vinas | 2012-04-23 21:00:00
Gracey Vinas | 2012-04-23 22:04:00
Gracey Vinas | 2012-04-23 22:17:00
Gracey Vinas | 2012-04-23 22:29:00
Gracey Vinas | 2012-04-23 23:02:00
Gracey Vinas | 2012-04-23 23:48:00
Gracey Vinas | 2012-04-24 00:23:00
Gracey Vinas | 2012-04-24 01:54:00
Gracey Vinas | 2012-04-24 17:13:00
Gracey Vinas | 2012-04-24 17:32:00
Gracey Vinas | 2012-04-24 17:38:00
Gracey Vinas | 2012-04-24 17:45:00
Gracey Vinas | 2012-04-24 17:54:00
How do I get the average elapsed time between uploads for each date in msql? For example, the average elapsed time for the uploads on 2012-04-20 is (time diff of rows 1 and 2 (9 mins) + time diff of rows 2 and 3 (10 mins) + time diff of rows 3 and 4 (4 mins) + time diff of rows 4 and 5 (88 mins)) / 4 = 27.75 mins.

You didn't specify the language, so here is a Python solution:
from re import search
from itertools import groupby
from operator import itemgetter
from datetime import datetime

elapsed = []
averages = []
with open("log.txt", "r") as f:
    for line in f:
        date = search(r"\d{4}-\d{2}-\d{2}", line)
        time = search(r"\d{2}:\d{2}:\d{2}", line)
        if date and time:
            elapsed.append((date.group(), datetime.strptime(time.group(), '%H:%M:%S')))
for date, times in groupby(elapsed, itemgetter(0)):
    times = list(times)
    # (last - first) / (n - 1) equals the average of the consecutive gaps
    averages.append((date, (times[-1][1] - times[0][1]).seconds / 60. / (len(times) - 1)))
for avg in averages:
    print('On date %s average minutes downloading is %.2f' % avg)
The output for your data in the log.txt file is:
On date 2012-04-20 average minutes downloading is 27.75
On date 2012-04-21 average minutes downloading is 70.00
On date 2012-04-23 average minutes downloading is 27.08
On date 2012-04-24 average minutes downloading is 175.17
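If Python (rather than SQL) ends up being an option, here is an equivalent pandas sketch, not part of the original answer: it averages the explicit consecutive gaps instead of using the (last - first)/(n - 1) shortcut, and both give 27.75 minutes for 2012-04-20. It assumes the timestamps are already loaded into a DataFrame; only the first date from the sample is shown.
import pandas as pd

# minimal pandas sketch; assumes the upload timestamps are in a DataFrame column
df = pd.DataFrame({"uploaddate": pd.to_datetime([
    "2012-04-20 20:16:00", "2012-04-20 20:25:00", "2012-04-20 20:35:00",
    "2012-04-20 20:39:00", "2012-04-20 22:07:00",
])})
day = df["uploaddate"].dt.date
# consecutive gaps within each date, in minutes, then their per-date mean
gaps_min = df.groupby(day)["uploaddate"].diff().dt.total_seconds() / 60
print(gaps_min.groupby(day).mean())   # 2012-04-20 -> 27.75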

Related

Extract all rows on new sheet where driver name = NS

On a new Excel sheet, I am trying to list all rows for a specific driver's name from this sheet.
I want all the rows where the driver's name is NS (or whatever is in cell A1 on my new sheet). I have over 1000 rows of data and more columns than shown below, but this gives you an idea.
The first line of data is A4:M4.
I tried an INDEX/MATCH function but was unable to get it to work.
+------------+-------------------------------+----------------+-------------+
| Date | Vehicle | Driver | Drops Taken |
+------------+-------------------------------+----------------+-------------+
| 01/04/2019 | LCG - DAF 7.5 Tonner | RL | |
| 01/04/2019 | GXO - Merc 3.5T Dropside | KA | 9 |
| 01/04/2019 | KDZ - DAF 12 Tonner | NS | 12 |
| 01/04/2019 | RYZ - DAF 12 Tonner | MM | 10 |
| 02/04/2019 | GXO - Merc 3.5T Dropside | KA | 8 |
| 02/04/2019 | KDZ - DAF 12 Tonner | NS | 12 |
| 02/04/2019 | LCG - DAF 7.5 Tonner | RL | |
| 02/04/2019 | RYZ - DAF 12 Tonner | MM | 9 |
| 03/04/2019 | KDV - DAF 12 Tonner | NS | |
| 03/04/2019 | GXO - Merc 3.5T Dropside | KA | 8 |
| 03/04/2019 | Hire Vehicle | RL | |
| 03/04/2019 | KDZ - DAF 12 Tonner | NS | 8 |
| 03/04/2019 | RYZ - DAF 12 Tonner | MM | 7 |
+------------+-------------------------------+----------------+-------------+
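Side note, not an Excel formula: if the sheet ever gets pulled into Python, the same "all rows for one driver" filter is a one-line boolean mask in pandas. This is only a sketch; the column names are taken from the table above and only a few sample rows are used.
import pandas as pd

# sketch only: a boolean mask keeps every row whose Driver is "NS"
df = pd.DataFrame(
    [["01/04/2019", "LCG - DAF 7.5 Tonner", "RL", None],
     ["01/04/2019", "KDZ - DAF 12 Tonner", "NS", 12],
     ["02/04/2019", "KDZ - DAF 12 Tonner", "NS", 12],
     ["02/04/2019", "RYZ - DAF 12 Tonner", "MM", 9]],
    columns=["Date", "Vehicle", "Driver", "Drops Taken"],
)
ns_rows = df[df["Driver"] == "NS"]   # equivalent of filtering on the name in A1
print(ns_rows)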

why Linux system time adds and subtracts one hour every 6 months?

When calculating the number of seconds between 2 consecutive days, there should be 86400 seconds (24*60*60).
But twice a year that's not the case...
One day has only 23 hours, and 6 months later another day has 25 hours.
Why does this happen?
I ran some code to check the number of seconds between consecutive days from 2005 till 2019,
and every year each day has exactly 24 hours except for 2 days where there are 23 and 25 hours.
Here is a summary of my results.
The difference column is 86400 minus the number of seconds between this day and the previous one:
+------------+------------+-------------------+
| dates | difference | number_of_seconds |
+------------+------------+-------------------+
| 2005-04-02 | 3600 | 82800 |
| 2005-10-10 | -3600 | 90000 |
| 2006-04-01 | 3600 | 82800 |
| 2006-10-02 | -3600 | 90000 |
| 2007-03-31 | 3600 | 82800 |
| 2007-09-17 | -3600 | 90000 |
| 2008-03-29 | 3600 | 82800 |
| 2008-10-06 | -3600 | 90000 |
| 2009-03-28 | 3600 | 82800 |
| 2009-09-28 | -3600 | 90000 |
| 2010-03-27 | 3600 | 82800 |
| 2010-09-13 | -3600 | 90000 |
| 2011-04-02 | 3600 | 82800 |
| 2011-10-03 | -3600 | 90000 |
| 2012-03-31 | 3600 | 82800 |
| 2012-09-24 | -3600 | 90000 |
| 2013-03-30 | 3600 | 82800 |
| 2013-10-28 | -3600 | 90000 |
| 2014-03-29 | 3600 | 82800 |
| 2014-10-27 | -3600 | 90000 |
| 2015-03-28 | 3600 | 82800 |
| 2015-10-26 | -3600 | 90000 |
| 2016-03-26 | 3600 | 82800 |
| 2016-10-31 | -3600 | 90000 |
| 2017-03-25 | 3600 | 82800 |
| 2017-10-30 | -3600 | 90000 |
| 2018-03-24 | 3600 | 82800 |
| 2018-10-29 | -3600 | 90000 |
+------------+------------+-------------------+
Here is an example of the code that I ran inside my full script:
echo $((($(date +%s --date 2006-03-31)-$(date +%s --date 2006-03-30))))
echo $((($(date +%s --date 2006-04-01)-$(date +%s --date 2006-03-31))))
echo $((($(date +%s --date 2006-04-02)-$(date +%s --date 2006-04-01))))
The date command with the %s format gives you the wall-clock time in seconds since the epoch, and your location observes daylight saving time. So when you change to and from summer time, you either gain or lose an hour.
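A quick way to see the same effect from Python (a sketch, not the original poster's code; Europe/Paris is just an example zone that observes DST, and it needs Python 3.9+ for zoneinfo):
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("Europe/Paris")   # any zone with DST shows the effect
for day in (datetime(2006, 3, 25), datetime(2006, 3, 26), datetime(2006, 3, 27)):
    start = day.replace(tzinfo=tz)
    end = (day + timedelta(days=1)).replace(tzinfo=tz)
    # aware-datetime subtraction goes through UTC, so the spring-forward day is 82800 s long
    print(day.date(), int((end - start).total_seconds()))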

Count the number of sessions if the beginning and end of each session is known [duplicate]

This question already has answers here:
How to group by time interval in Spark SQL
(2 answers)
Closed 4 years ago.
I have a Hive table with two columns containing date-time values: the start and finish of a "session". The following is a sample of such a table:
+----------------------+----------------------+--+
| start_time | end_time |
+----------------------+----------------------+--+
| 2017-01-01 00:24:52 | 2017-01-01 00:25:20 |
| 2017-01-01 00:31:11 | 2017-01-01 10:31:15 |
| 2017-01-01 10:31:15 | 2017-01-01 20:40:53 |
| 2017-01-01 20:40:53 | 2017-01-01 20:40:53 |
| 2017-01-01 10:31:15 | 2017-01-01 10:31:15 |
| 2017-01-01 07:09:34 | 2017-01-01 07:29:00 |
| 2017-01-01 11:36:41 | 2017-01-01 15:32:00 |
| 2017-01-01 07:29:00 | 2017-01-01 07:34:30 |
| 2017-01-01 11:06:30 | 2017-01-01 11:36:41 |
| 2017-01-01 07:45:00 | 2017-01-01 07:50:00 |
+----------------------+----------------------+--+
There are a lot of sessions. I need to get a dataset that gives the number of sessions on a half-hour time grid over some interval, as follows:
+----------------------+--------------+--+
| time | sessions_num |
+----------------------+--------------+--+
| 2018-07-04 00:30:00 | 85 |
| 2018-07-04 01:00:00 | 86 |
| 2018-07-04 01:30:00 | 84 |
| 2018-07-04 02:00:00 | 85 |
| 2018-07-04 02:30:00 | 84 |
| 2018-07-04 03:00:00 | 84 |
| 2018-07-04 03:30:00 | 84 |
| 2018-07-04 04:00:00 | 84 |
| 2018-07-04 04:30:00 | 84 |
| 2018-07-04 05:00:00 | 84 |
| 2018-07-04 05:30:00 | 84 |
| 2018-07-04 06:00:00 | 84 |
| 2018-07-04 06:30:00 | 85 |
| 2018-07-04 07:00:00 | 85 |
| 2018-07-04 07:30:00 | 85 |
| 2018-07-04 08:00:00 | 85 |
| 2018-07-04 08:30:00 | 85 |
| 2018-07-04 09:00:00 | 83 |
| 2018-07-04 09:30:00 | 82 |
| 2018-07-04 10:00:00 | 82 |
| 2018-07-04 10:30:00 | 83 |
| 2018-07-04 11:00:00 | 82 |
| 2018-07-04 11:30:00 | 82 |
| 2018-07-04 12:00:00 | 83 |
+----------------------+--------------+--+
What is the Apache Hive or Apache Spark (or some other) way to make the last table from the first one?
You can do that with the DataFrame window function, but it requires some preprocessing of your data. PySpark example:
#creating example dataframe
from pyspark.sql.functions import to_timestamp
l = [('2017-01-01 00:24:52','2017-01-01 00:25:20')
,('2017-01-01 00:31:11', '2017-01-01 10:31:15')
,('2017-01-01 10:31:15','2017-01-01 20:40:53')
,('2017-01-01 20:40:53','2017-01-01 20:40:53')
,('2017-01-01 10:31:15','2017-01-01 10:31:15')
,('2017-01-01 07:09:34','2017-01-01 07:29:00')
,('2017-01-01 11:36:41','2017-01-01 15:32:00')
,('2017-01-01 07:29:00','2017-01-01 07:34:30' )
,('2017-01-01 11:06:30','2017-01-01 11:36:41' )
,('2017-01-01 07:45:00','2017-01-01 07:50:00' )
]
df = spark.createDataFrame(l,['begin','end'])
df = df.select(to_timestamp(df.begin).alias('begin'),to_timestamp(df.end).alias('end'))
Now we create a new column that contains a list of timestamps, one for every 30 minutes of a session. Imagine a client raising an event every 30 minutes after the session begins, plus one more if the last event and the session end fall in different windows (for example, begin 2017-01-01 00:24:52 / end 2017-01-01 00:25:20 leads to one event, while begin 2017-01-01 07:29:00 / end 2017-01-01 07:34:30 raises two events):
from pyspark.sql.functions import window
from pyspark.sql.types import ArrayType,TimestampType
from pyspark.sql.functions import udf, array, explode
from datetime import timedelta
def generateRows(arr):
    li = []
    li.append(arr[0])
    # range(begin, end)
    while (li[-1] + timedelta(minutes=30)) < arr[1]:
        li.append(li[-1] + timedelta(minutes=30))
    # check if last range item and end belong to a different window
    rounded = li[-1] - timedelta(minutes=li[-1].minute % 30, seconds=li[-1].second, microseconds=li[-1].microsecond)
    if (rounded + timedelta(minutes=30)) < arr[1]:
        li.append(arr[1])
    return li
generateRows_udf = udf(lambda arr: generateRows(arr), ArrayType(TimestampType()))
dftoExplode = df.withColumn('toExplode', generateRows_udf(array(df.begin, df.end)))
Now we can 'explode' the toExplode column to create one row for every event:
df_exploded = dftoExplode.withColumn('EventSessionOpen', explode('toExplode'))
df_exploded = df_exploded.drop(df_exploded.toExplode)
and finally we can apply the dataframe window function to get the desired result:
result = df_exploded.groupBy(window(df_exploded.EventSessionOpen, "30 minutes")).count().orderBy("window")
result.show(truncate=False)
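As a quick, Spark-free sanity check of the per-window event logic described above, here is a standalone sketch that reuses the same rounding idea as generateRows (it is not part of the original answer):
from datetime import datetime, timedelta

def events(begin, end, step=timedelta(minutes=30)):
    # same idea as generateRows: one timestamp per 30-minute window the session touches
    li = [begin]
    while li[-1] + step < end:
        li.append(li[-1] + step)
    rounded = li[-1] - timedelta(minutes=li[-1].minute % 30,
                                 seconds=li[-1].second,
                                 microseconds=li[-1].microsecond)
    if rounded + step < end:
        li.append(end)
    return li

fmt = '%Y-%m-%d %H:%M:%S'
print(len(events(datetime.strptime('2017-01-01 00:24:52', fmt),
                 datetime.strptime('2017-01-01 00:25:20', fmt))))   # -> 1 event
print(len(events(datetime.strptime('2017-01-01 07:29:00', fmt),
                 datetime.strptime('2017-01-01 07:34:30', fmt))))   # -> 2 events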

My NetWorkDays is sometimes more than the Calendar days

So in my job, I am trying to find the average number of calendar days between two dates, and the average number of business days.
For calendar days, I am using:
=[#[DateLastInReview ]]-[#[DateTimeStamp ]]
For Business days, I am using:
=NETWORKDAYS([#[DateTimeStamp ]], [#[DateLastInReview ]], Holiday)-(1-MOD([#[DateLastInReview ]]-[#[DateTimeStamp ]],1))
They should both give me the durations to a decimal place... but my issue is that sometimes the business days are higher than the calendar days, which makes no sense. Because of how the data was copied from Excel, I had to show the timestamp separately from the date stamp below, but they are in the same cell when used in the NETWORKDAYS formula.
+----------------+---------------+-------------------+------------------+--------+--------+
| DateTimeStamp | DateStampTime | DateLastInReview | LastInReviewTime | PreCal | PreBus |
+----------------+---------------+-------------------+------------------+--------+--------+
| 2018-07-11 | 12:00 AM | 7/12/2018 | 12:00 AM | 1.00 | 1.00 |
| 2018-07-09 | 3:07 PM | 7/10/2018 | 9:42 AM | 0.77 | 0.77 |
| 2018-07-02 | 12:34 PM | 7/3/2018 | 7:45 AM | 0.80 | 1.80 |
| 2018-07-02 | 3:34 PM | 7/3/2018 | 8:06 AM | 0.69 | 1.69 |
| 2018-07-02 | 9:59 AM | 7/3/2018 | 8:06 AM | 0.92 | 1.92 |
| 2018-06-29 | 1:54 PM | 7/2/2018 | 9:52 AM | 2.83 | 1.83 |
| 2018-07-09 | 11:46 PM | 7/11/2018 | 9:16 AM | 1.40 | 2.40 |
| 2018-06-29 | 11:57 AM | 6/29/2018 | 1:58 PM | 0.08 | 0.08 |
| 2018-07-05 | 1:29 PM | 7/6/2018 | 9:08 AM | 0.82 | 1.82 |
| 2018-07-05 | 3:49 PM | 7/6/2018 | 10:21 AM | 0.77 | 1.77 |
| 2018-06-27 | 10:31 AM | 6/28/2018 | 9:38 AM | 0.96 | 1.96 |
| 2018-07-06 | 2:46 PM | 7/9/2018 | 8:58 AM | 2.76 | 1.76 |
| 2018-06-28 | 3:32 PM | 7/10/2018 | 7:12 AM | 11.65 | 7.65 |
| 2018-06-29 | 3:04 PM | 7/2/2018 | 11:24 AM | 2.85 | 1.85 |
| 2018-07-11 | 10:28 AM | 7/11/2018 | 1:25 PM | 0.12 | 0.12 |
| 2018-07-10 | 3:30 PM | 7/11/2018 | 2:29 PM | 0.96 | 1.96 |
| 2018-06-26 | 4:09 PM | 7/3/2018 | 12:42 PM | 6.86 | 5.86 |
| 2018-06-28 | 9:18 AM | 6/28/2018 | 10:58 AM | 0.07 | 0.07 |
| 2018-07-09 | 11:39 PM | 7/11/2018 | 9:06 AM | 1.39 | 2.39 |
| 2018-07-06 | 10:40 AM | 7/6/2018 | 3:00 PM | 0.18 | 0.18 |
| 2018-07-02 | 9:33 AM | 7/2/2018 | 2:12 PM | 0.19 | 0.19 |
| 2018-07-10 | 12:00 AM | 7/10/2018 | 4:39 PM | 0.69 | 0.69 |
| 2018-07-03 | 8:20 AM | 7/6/2018 | 9:00 AM | 3.03 | 2.03 |
| 2018-06-27 | 8:52 AM | 6/29/2018 | 9:07 AM | 2.01 | 2.01 |
| 2018-07-09 | 12:50 PM | 7/11/2018 | 8:56 AM | 1.84 | 2.84 |
| 2018-07-05 | 2:56 PM | 7/6/2018 | 12:53 PM | 0.91 | 1.91 |
| 2018-07-10 | 8:43 AM | 7/10/2018 | 9:42 AM | 0.04 | 0.04 |
| 2018-07-10 | 8:43 AM | 7/10/2018 | 9:42 AM | 0.04 | 0.04 |
+----------------+---------------+-------------------+------------------+--------+--------+
You have too many parentheses; try:
=NETWORKDAYS([#[ DateTimeStamp]],[#[ DateLastInReview]])-1-MOD([#[ DateTimeStamp]],1)+MOD([#[ DateLastInReview]],1)
The reason could be the way the NETWORKDAYS function works. =NETWORKDAYS("7/12/2018","7/13/2018") evaluates to 2, while 7/13/2018 - 7/12/2018 evaluates to 1.
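To see the one-day overshoot concretely, here is a small Python re-creation of both formulas for one sample row (2018-07-02 12:34 PM to 2018-07-03 7:45 AM, a Monday to a Tuesday). The business-day count is hard-coded on the assumption that no weekend or holiday falls in that range; this is a sketch, not the actual spreadsheet.
from datetime import datetime

start = datetime(2018, 7, 2, 12, 34)   # DateTimeStamp (Monday)
end   = datetime(2018, 7, 3, 7, 45)    # DateLastInReview (Tuesday)

calendar_days = (end - start).total_seconds() / 86400   # ~0.80
networkdays = 2   # NETWORKDAYS counts both 7/2 and 7/3 (inclusive on both ends)

# original sheet formula: NETWORKDAYS(...) - (1 - MOD(end - start, 1))
original = networkdays - (1 - calendar_days % 1)                      # ~1.80, one day too high
# suggested formula: NETWORKDAYS(...) - 1 - MOD(start, 1) + MOD(end, 1)
suggested = networkdays - 1 - (12 + 34/60) / 24 + (7 + 45/60) / 24    # ~0.80

print(round(calendar_days, 2), round(original, 2), round(suggested, 2))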

Spotfire - how to create a calculated column for the time of the next stage

I want to create a calculated column to show the time each action ends, grouped by [Case ID], [Stage], and [Action]. The order of the stages is not necessarily alphabetical and a stage can be repeated; for example, after [Stage] 'C' we could have another [Stage] 'C' later.
Thanks,
Thanks for the updated test data. The data types are extremely important when asking. Also, the test data should mirror the actual data as closely as possible, otherwise solutions often won't scale. For example, in the test data the values are time only; sorting on time alone doesn't take the day into account, so everything is treated equally. Since these values are actually DateTime, I have added that to the test data. These expressions will give you the expected results as identified in your question.
Rank([Time_Action_Begin],"asc",[Case ID]) as [Rank]
Min([Time_Action_Begin]) OVER (Intersect([Case ID],Next([Rank])))
RESULTS
+---------+-------+----------+------------------------+------------------------+------+
| Case ID | Stage | Action | Time_Action_Begin | Time_Action_End | Rank |
+---------+-------+----------+------------------------+------------------------+------+
| 1 | A | approve | 01/01/2016 11:30:00 PM | 01/02/2016 12:30:00 AM | 1 |
| 1 | A | approve | 01/01/2016 11:30:00 PM | 01/02/2016 12:30:00 AM | 1 |
| 1 | B | approve | 01/02/2016 12:30:00 AM | 01/02/2016 1:30:00 AM | 3 |
| 1 | B | approve | 01/02/2016 12:30:00 AM | 01/02/2016 1:30:00 AM | 3 |
| 1 | C | approve | 01/02/2016 1:30:00 AM | 01/02/2016 2:30:00 AM | 5 |
| 1 | C | approve | 01/02/2016 1:30:00 AM | 01/02/2016 2:30:00 AM | 5 |
| 1 | D | approve | 01/02/2016 2:30:00 AM | 01/02/2016 3:30:00 AM | 7 |
| 1 | D | approve | 01/02/2016 2:30:00 AM | 01/02/2016 3:30:00 AM | 7 |
| 1 | E | approve | 01/02/2016 3:30:00 AM | 01/02/2016 4:30:00 AM | 9 |
| 1 | E | approve | 01/02/2016 3:30:00 AM | 01/02/2016 4:30:00 AM | 9 |
| 1 | F | complete | 01/02/2016 4:30:00 AM | 01/02/2016 5:30:00 AM | 11 |
| 1 | F | complete | 01/02/2016 4:30:00 AM | 01/02/2016 5:30:00 AM | 11 |
| 1 | C | approve | 01/02/2016 5:30:00 AM | | 13 |
| 1 | C | approve | 01/02/2016 5:30:00 AM | | 13 |
| 2 | A | approve | 01/01/2016 10:30:00 PM | 01/02/2016 12:30:00 AM | 1 |
| 2 | A | approve | 01/01/2016 10:30:00 PM | 01/02/2016 12:30:00 AM | 1 |
| 2 | B | approve | 01/02/2016 12:30:00 AM | 01/02/2016 2:30:00 AM | 3 |
| 2 | B | approve | 01/02/2016 12:30:00 AM | 01/02/2016 2:30:00 AM | 3 |
| 2 | C | approve | 01/02/2016 2:30:00 AM | 01/02/2016 3:30:00 AM | 5 |
| 2 | C | approve | 01/02/2016 2:30:00 AM | 01/02/2016 3:30:00 AM | 5 |
| 2 | D | approve | 01/02/2016 3:30:00 AM | 01/02/2016 4:30:00 AM | 7 |
| 2 | D | approve | 01/02/2016 3:30:00 AM | 01/02/2016 4:30:00 AM | 7 |
| 2 | E | approve | 01/02/2016 4:30:00 AM | 01/02/2016 5:30:00 AM | 9 |
| 2 | E | approve | 01/02/2016 4:30:00 AM | 01/02/2016 5:30:00 AM | 9 |
| 2 | F | complete | 01/02/2016 5:30:00 AM | 01/02/2016 6:30:00 AM | 11 |
| 2 | F | complete | 01/02/2016 5:30:00 AM | 01/02/2016 6:30:00 AM | 11 |
| 2 | C | approve | 01/02/2016 6:30:00 AM | | 13 |
| 2 | C | approve | 01/02/2016 6:30:00 AM | | 13 |
+---------+-------+----------+------------------------+------------------------+------+
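For readers outside Spotfire, the same "end of this action = earliest begin of the next-ranked action within the same Case ID" idea can be sketched in pandas. This is only a rough equivalent of the two expressions above, not a Spotfire expression; only the first two stages of Case 1 are used, so the last stage ends up with an empty end time.
import pandas as pd

# rough pandas equivalent of Rank(...) + Min(...) OVER Intersect([Case ID], Next([Rank]))
df = pd.DataFrame({
    "Case ID": [1, 1, 1, 1],
    "Stage":   ["A", "A", "B", "B"],
    "Time_Action_Begin": pd.to_datetime([
        "2016-01-01 23:30", "2016-01-01 23:30",
        "2016-01-02 00:30", "2016-01-02 00:30"]),
})
# ties share the lowest rank, like the Spotfire Rank() shown above (1, 1, 3, 3, ...)
df["Rank"] = df.groupby("Case ID")["Time_Action_Begin"].rank(method="min").astype(int)
# earliest begin time per (Case ID, Rank), then shift it back one rank within each case
nxt = df.groupby(["Case ID", "Rank"], as_index=False)["Time_Action_Begin"].min()
nxt["Time_Action_End"] = nxt.groupby("Case ID")["Time_Action_Begin"].shift(-1)
df = df.merge(nxt[["Case ID", "Rank", "Time_Action_End"]], on=["Case ID", "Rank"], how="left")
print(df)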
