Generate an interval-based time series using Spark SQL - apache-spark

I am new to Spark SQL. I want to generate a series of start and end times, at 5-second intervals, for the current date. So, let's say I am running my job on 1st Jan 2018: I want a series of start and end times 5 seconds apart, which gives 17280 records for one day.
START TIME | END TIME
-----------------------------------------
01-01-2018 00:00:00 | 01-01-2018 00:00:04
01-01-2018 00:00:05 | 01-01-2018 00:00:09
01-01-2018 00:00:10 | 01-01-2018 00:00:14
.
.
01-01-2018 23:59:55 | 01-01-2018 23:59:59
01-02-2018 00:00:00 | 01-02-2018 00:00:04
I know I can generate this data frame using a Scala for loop, but my constraint is that I can only use queries to do this.
Is there any way I can create this data structure using select * constructs?
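For a query-only route, Spark 2.4's `sequence` function can expand a start, stop, and step into an array of timestamps to explode. As a plain sketch of the expected rows (outside Spark, just to check the shape of the series), the windows can be generated in Python:

```python
from datetime import datetime, timedelta

def five_second_windows(day: datetime):
    """Yield (start, end) pairs covering one day in 5-second windows."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    next_day = start + timedelta(days=1)
    while start < next_day:
        # each window ends 4 seconds after it starts, e.g. 00:00:00 - 00:00:04
        yield start, start + timedelta(seconds=4)
        start += timedelta(seconds=5)

windows = list(five_second_windows(datetime(2018, 1, 1)))
print(len(windows))  # 86400 / 5 = 17280 windows
```

The function name and structure here are illustrative only; the point is that one day at a 5-second step yields exactly 17280 rows.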

Related

Is there a way to do IntervalMatch in Azure Data Factory?

I'm trying to do an IntervalMatch, as described at the following link, in ADF:
https://help.qlik.com/en-US/qlikview/May2022/Subsystems/Client/Content/QV_QlikView/Scripting/ScriptPrefixes/IntervalMatch.htm
Is there an activity (Join, ...) or another way to achieve this?
I tried to reproduce this in an ADF data flow with sample inputs; below is the approach.
Input tables:
Event_log:
Time  | Event | Comment
------|-------|------------------
0:00  | 0     | Start of shift 1
1:18  | 1     | Line stop
2:23  | 2     | Line restart 50%
4:15  | 3     | Line speed 100%
8:00  | 4     | Start of shift 2
11:43 | 5     | End of production
Order_log:
Start | End   | Order
------|-------|------
1:00  | 03:35 | A
2:30  | 07:58 | B
3:04  | 10:27 | C
7:23  | 11:43 | D
Source transformations (source1 and source2) are added for the two tables above.
Then a Join transformation is added, with the following Join settings:
Source1 (Event_log) is the left stream and Source2 (Order_log) is the right stream.
Join type is Right Outer.
The join conditions are Start <= Time and End >= Time.
Output of Join Transformation:
Start | End   | Order | Time  | Event | Comment
------|-------|-------|-------|-------|------------------
NULL  | NULL  | NULL  | 0:00  | 0     | Start of shift 1
1:00  | 3:35  | A     | 1:18  | 1     | Line stop
1:00  | 3:35  | A     | 2:23  | 2     | Line restart 50%
2:30  | 7:58  | B     | 4:15  | 3     | Line speed 100%
3:04  | 10:27 | C     | 4:15  | 3     | Line speed 100%
3:04  | 10:27 | C     | 8:00  | 4     | Start of shift 2
7:23  | 11:43 | D     | 8:00  | 4     | Start of shift 2
7:23  | 11:43 | D     | 11:43 | 5     | End of production
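The same interval match is just an ordinary SQL join on a containment condition. A minimal sqlite3 sketch with the sample data (table and column names are my own; times are zero-padded so that string comparison orders them correctly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE event_log (time TEXT, event INTEGER, comment TEXT);
CREATE TABLE order_log (start_t TEXT, end_t TEXT, ord TEXT);
""")
con.executemany("INSERT INTO event_log VALUES (?, ?, ?)", [
    ("00:00", 0, "Start of shift 1"), ("01:18", 1, "Line stop"),
    ("02:23", 2, "Line restart 50%"), ("04:15", 3, "Line speed 100%"),
    ("08:00", 4, "Start of shift 2"), ("11:43", 5, "End of production"),
])
con.executemany("INSERT INTO order_log VALUES (?, ?, ?)", [
    ("01:00", "03:35", "A"), ("02:30", "07:58", "B"),
    ("03:04", "10:27", "C"), ("07:23", "11:43", "D"),
])

# keep every event (the "right outer" side in the ADF flow), matching any
# order whose [start_t, end_t] interval contains the event time
rows = con.execute("""
    SELECT o.ord, e.time, e.comment
    FROM event_log e
    LEFT JOIN order_log o ON o.start_t <= e.time AND o.end_t >= e.time
    ORDER BY e.time, o.ord
""").fetchall()
for row in rows:
    print(row)
```

This reproduces the eight rows of the join output above, including the NULL match for the 0:00 event.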

Computing First Day of Previous Quarter in Spark SQL

How do I derive the first day of the previous quarter for any given date in a Spark SQL query using the SQL API? A few samples of the required output are below:
input_date | start_date
------------------------
2020-01-21 | 2019-10-01
2020-02-06 | 2019-10-01
2020-04-15 | 2020-01-01
2020-07-10 | 2020-04-01
2020-10-20 | 2020-07-01
2021-02-04 | 2020-10-01
The Quarters generally are:
1 | Jan - Mar
2 | Apr - Jun
3 | Jul - Sep
4 | Oct - Dec
Note: I am using Spark SQL v2.4.
Any help is appreciated. Thanks.
Use date_trunc after subtracting an interval of 3 months:
df.withColumn("start_date", to_date(date_trunc("quarter", expr("input_date - interval 3 months"))))
.show()
+----------+----------+
|input_date|start_date|
+----------+----------+
|2020-01-21|2019-10-01|
|2020-02-06|2019-10-01|
|2020-04-15|2020-01-01|
|2020-07-10|2020-04-01|
|2020-10-20|2020-07-01|
|2021-02-04|2020-10-01|
+----------+----------+
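The same logic (step back one quarter, then truncate to its first day) can be cross-checked outside Spark with a small Python helper; the function name is mine:

```python
from datetime import date

def prev_quarter_start(d: date) -> date:
    """First day of the quarter before the one containing d."""
    # quarter index 0..3 for the current date, then step one back
    q = (d.month - 1) // 3 - 1
    year = d.year
    if q < 0:  # Jan-Mar rolls back to Oct of the previous year
        q, year = 3, year - 1
    return date(year, 3 * q + 1, 1)

for d in [date(2020, 1, 21), date(2020, 4, 15), date(2021, 2, 4)]:
    print(d, "->", prev_quarter_start(d))
```

This matches all six sample rows in the question, including the year rollover for January and February dates.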
Personally, I would create a table with the dates for the next twenty years using Excel or something, and just reference that table.

Excel: convert float number to time

In Excel I need to convert float numbers to time.
For example:
8,3 must become 08:30
10 must become 10:00
11,3 must become 11:30
Any ideas?
Thank you
Simple:
=(INT(A1)+(A1-INT(A1))/0.6)/24
Input | Output
----- | --------
8.3 | 08:30:00
10 | 10:00:00
11.3 | 11:30:00
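The formula divides the fractional part by 0.6 so that the digits after the decimal separator are read as literal minutes (.3 becomes 30 minutes), then divides by 24 to get an Excel time fraction. A Python sketch of the same conversion, for cross-checking (function name is mine, and it assumes the decimal digits are always minutes):

```python
def float_to_time(x: float) -> str:
    """Convert e.g. 8.3 -> '08:30', reading the decimals as minutes."""
    hours = int(x)
    minutes = round((x - hours) * 100)  # 8.3 -> 30 minutes
    return f"{hours:02d}:{minutes:02d}"

for x in (8.3, 10, 11.3):
    print(x, "->", float_to_time(x))
```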

Graphing of Timestamps from PostgreSQL and external program

I am looking for a way to graph the time a toy-store stock boy spent checked in at his job. There would be a defined start and end time for the overall graph, with bars spanning the amount of time he/she spent on the job.
The database simply holds the times the stock boy clocked in and the times he/she clocked out. Example:
timeshifts table
employerID | start_time | end_time
---------------------------------------------
1 | 2014-12-10 09:00:00 | 2014-12-10 09:37:00
1 | 2014-12-10 09:53:00 | 2014-12-10 11:44:00
1 | 2014-12-10 12:00:00 | 2014-12-10 15:00:00
After extracting the data and importing it into (??), my IDEAL graph output would look something like this:
I know PostgreSQL can't do this on its own, but I am not sure if I need to structure the data in any special format to calculate the X distance (something like age(end_time, start_time) or the like).
Thank you for your help ahead of time!
In PostgreSQL, you can subtract timestamps.
select employer_id, start_time, end_time, end_time - start_time as elapsed_time
from timeshifts;
employer_id | start_time             | end_time               | elapsed_time
------------|------------------------|------------------------|-------------
1           | 2014-12-10 09:00:00-05 | 2014-12-10 09:37:00-05 | 00:37:00
1           | 2014-12-10 09:53:00-05 | 2014-12-10 11:44:00-05 | 01:51:00
1           | 2014-12-10 12:00:00-05 | 2014-12-10 15:00:00-05 | 03:00:00
Whether Excel can recognize the values in "elapsed_time" is a different thing. It might be easier to do the subtraction in Excel.
create temp table timeshifts (
employer_id integer not null,
start_time timestamp with time zone not null,
end_time timestamp with time zone not null,
check (start_time < end_time),
primary key (employer_id, start_time)
);
insert into timeshifts values
(1, '2014-12-10 09:00:00', '2014-12-10 09:37:00'),
(1, '2014-12-10 09:53:00', '2014-12-10 11:44:00'),
(1, '2014-12-10 12:00:00', '2014-12-10 15:00:00');
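If the subtraction does end up being done outside the database, the same elapsed times fall out of plain datetime arithmetic. A Python sketch with the sample rows (timezone offsets omitted for brevity):

```python
from datetime import datetime, timedelta

shifts = [
    ("2014-12-10 09:00:00", "2014-12-10 09:37:00"),
    ("2014-12-10 09:53:00", "2014-12-10 11:44:00"),
    ("2014-12-10 12:00:00", "2014-12-10 15:00:00"),
]
fmt = "%Y-%m-%d %H:%M:%S"
# end - start gives a timedelta, the analogue of PostgreSQL's interval
elapsed = [datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
           for start, end in shifts]
total = sum(elapsed, timedelta())
for (start, end), dt in zip(shifts, elapsed):
    print(start, "->", end, "=", dt)
```

The three differences come out as 37 minutes, 1:51, and 3:00, matching the query output above.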

MDB query for Time

I have table as
Id Name Date Time
1 S 1-Dec-2009 9:00
2 N 1-Dec-2009 10:00
1 S 1-Dec-2009 10:30
1 S 1-Dec-2009 11:00
2 N 1-Dec-2009 11:10
Need query to display as
Id Name Date Time
1 S 1-Dec-2009 9:00
1 S 1-Dec-2009 11:00
2 N 1-Dec-2009 10:00
2 N 1-Dec-2009 11:10
My backend database is MS Access, and I am using VB6; I need the max and min times.
I would add two extra [int] columns, say hour and minute, and then use an MS Access query to sort on them; it would be MUCH easier to call that from VB. The query itself would be something like the following:
SELECT * FROM YOURTABLE ORDER BY id, hour, minute;
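Since the desired output shows only the earliest and latest time per Id, a GROUP BY with MIN/MAX aggregates gets there without extra columns. A sqlite3 sketch (table and column names are mine; times zero-padded and dates in ISO form so text comparison sorts correctly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (id INTEGER, name TEXT, d TEXT, t TEXT)")
con.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", [
    (1, "S", "2009-12-01", "09:00"),
    (2, "N", "2009-12-01", "10:00"),
    (1, "S", "2009-12-01", "10:30"),
    (1, "S", "2009-12-01", "11:00"),
    (2, "N", "2009-12-01", "11:10"),
])
# earliest and latest clock time per id/name/date
rows = con.execute("""
    SELECT id, name, d, MIN(t), MAX(t)
    FROM log GROUP BY id, name, d ORDER BY id
""").fetchall()
for row in rows:
    print(row)
```

MS Access supports the same MIN/MAX/GROUP BY syntax, so a similar query should work there via ADO from VB6.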