Computing First Day of Previous Quarter in Spark SQL - apache-spark

How do I derive the first day of the previous quarter for any given date in a Spark SQL query using the SQL API? A few sample inputs and expected outputs are below:
input_date | start_date
------------------------
2020-01-21 | 2019-10-01
2020-02-06 | 2019-10-01
2020-04-15 | 2020-01-01
2020-07-10 | 2020-04-01
2020-10-20 | 2020-07-01
2021-02-04 | 2020-10-01
The Quarters generally are:
1 | Jan - Mar
2 | Apr - Jun
3 | Jul - Sep
4 | Oct - Dec
Note: I am using Spark SQL v2.4.
Any help is appreciated. Thanks.

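Subtract 3 months from the date, then truncate the result to the start of its quarter with date_trunc. Since expr takes a Spark SQL expression string, the same logic can also be written directly in a SQL query.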
df.withColumn("start_date", to_date(date_trunc("quarter", expr("input_date - interval 3 months"))))
.show()
+----------+----------+
|input_date|start_date|
+----------+----------+
|2020-01-21|2019-10-01|
|2020-02-06|2019-10-01|
|2020-04-15|2020-01-01|
|2020-07-10|2020-04-01|
|2020-10-20|2020-07-01|
|2021-02-04|2020-10-01|
+----------+----------+
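To make the arithmetic behind this answer explicit, here is a minimal sketch in plain Python (standard library only, not Spark) of the same idea: find the quarter that contains the date, step back one quarter, and take its first day. The function name previous_quarter_start is hypothetical and used only for illustration.

```python
from datetime import date


def previous_quarter_start(d: date) -> date:
    """Return the first day of the quarter before the one containing d."""
    # Zero-based quarter index: 0 = Jan-Mar, 1 = Apr-Jun, 2 = Jul-Sep, 3 = Oct-Dec.
    quarter = (d.month - 1) // 3
    if quarter == 0:
        # The quarter before Q1 is Q4 of the previous year.
        return date(d.year - 1, 10, 1)
    # First month of the previous quarter: 1, 4 or 7.
    return date(d.year, 3 * (quarter - 1) + 1, 1)


# The sample rows from the question.
samples = [date(2020, 1, 21), date(2020, 4, 15), date(2021, 2, 4)]
results = [previous_quarter_start(d) for d in samples]
```

Subtracting 3 months and truncating to the quarter, as the Spark expression does, handles the year boundary in the same way without the explicit branch.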

Personally, I would create a lookup table of dates covering the next twenty years (using Excel or similar) and just reference that table.

Related

Use PIVOT TABLE to show occupation percentage by month from date ranges. google sheet

This is my first time here.
I have a lease table where each row has the following:
Lease number | Tenant name | Unit number | Check-in date | Check-out date.
I need a table (preferably a pivot table) that shows the total occupancy (as a percentage) per month for each year. Keep in mind that a lease can run from the middle of one month to the middle of another (e.g. Oct 3 2020 - Feb 17 2021).
Expected result:
    | 2019 | 2020 | 2021
jan | 0%   | 58%  | 67%
feb | 67%  | 21%  | 42%
mar ....
As I'm not a professional, I'm looking for a simple way to solve this.

Conditional Cumulative Sum based on multiple columns

I have a simple inventory table in excel that looks like this:
Number of Items | Date Incoming | Date Out
-------------------------------------------------------
10 | 1 Jan 2018 | 30 Jan 2018
30 | 15 Jan 2018 | 1 May 2018
20 | 1 Feb 2018 | 15 Mar 2018
I would like something that gives me the total number of items present in the inventory at each date, that is:
1 Jan 2018 | 10
15 Jan 2018 | 40
30 Jan 2018 | 30
1 Feb 2018 | 50
15 Mar 2018 | 30
1 May 2018 | 0
What I was thinking of is some sort of cumulative sum where the number of items is added at "Date Incoming" and subtracted at "Date Out".
Can you help me? I would prefer to avoid macros, but even a VBA solution is fine.
For a given date, you can do:
=sumif(#DateIn, "<="&#CellWithGivenDate, #NumberOfItems) - sumif(#DateOut, "<="&#CellWithGivenDate, #NumberOfItems)
With #NumberOfItems, #DateIn, and #DateOut being columns 1 to 3 of your sample, and #CellWithGivenDate being the relevant cell in column 1 of your expected result sample.
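The logic of the formula (everything that has arrived by the given date, minus everything that has left by it) can be sketched in Python for readers who want to check the numbers outside a spreadsheet. This is only an illustration; the names inventory and items_in_stock are hypothetical.

```python
from datetime import date

# (number of items, date incoming, date out), copied from the question.
inventory = [
    (10, date(2018, 1, 1), date(2018, 1, 30)),
    (30, date(2018, 1, 15), date(2018, 5, 1)),
    (20, date(2018, 2, 1), date(2018, 3, 15)),
]


def items_in_stock(rows, given_date):
    """Items present on given_date: arrived on or before it, minus left on or before it."""
    arrived = sum(n for n, date_in, _ in rows if date_in <= given_date)
    left = sum(n for n, _, date_out in rows if date_out <= given_date)
    return arrived - left


# Every date that appears in either column, in chronological order.
all_dates = sorted({d for _, date_in, date_out in inventory for d in (date_in, date_out)})
stock_by_date = {d: items_in_stock(inventory, d) for d in all_dates}
```

Like the two SUMIF terms, an item counts as gone on its "Date Out" itself, which is what the expected result in the question shows for 30 Jan 2018.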

Generate an interval-based time series using Spark SQL

I am new to Spark SQL. I want to generate the following series of start and end times at 5-second intervals for the current date. For example, if I run my job on 1st Jan 2018, I want a series of start and end times spaced 5 seconds apart, which gives 17280 records for one day.
START TIME | END TIME
-----------------------------------------
01-01-2018 00:00:00 | 01-01-2018 00:00:04
01-01-2018 00:00:05 | 01-01-2018 00:00:09
01-01-2018 00:00:10 | 01-01-2018 00:00:14
.
.
01-01-2018 23:59:55 | 01-01-2018 23:59:59
01-02-2018 00:00:00 | 01-01-2018 00:00:05
I know I can generate this data-frame using a scala for loop. My constraint is that I can use only queries to do this.
Is there any way I can create this data structure using select * constructs?
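The series itself is plain arithmetic: window number i starts at midnight plus 5*i seconds and ends 4 seconds later. The Python sketch below only pins down the expected rows; it is not a Spark SQL query and so does not meet the query-only constraint stated above. The function name five_second_windows is hypothetical.

```python
from datetime import datetime, timedelta


def five_second_windows(day_start: datetime, step_seconds: int = 5):
    """Return (start, end) pairs that cover one day in step_seconds windows."""
    windows_per_day = 24 * 60 * 60 // step_seconds
    windows = []
    for i in range(windows_per_day):
        start = day_start + timedelta(seconds=i * step_seconds)
        # The end is inclusive, so it falls one second before the next start.
        end = start + timedelta(seconds=step_seconds - 1)
        windows.append((start, end))
    return windows


windows = five_second_windows(datetime(2018, 1, 1))
```

In a query, the same arithmetic would be applied to a generated column of integers from 0 to 17279.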

Excel: convert float number to time

In Excel I need to convert float numbers to time.
For example:
8,3 must become 08:30
10 must become 10:00
11,3 must become 11:30
Any ideas?
Thank you
Simple (the formula returns a fraction of a day, so format the result cell as a time):
=(INT(A1)+(A1-INT(A1))/0.6)/24
Input | Output
----- | --------
8.3 | 08:30:00
10 | 10:00:00
11.3 | 11:30:00
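The formula treats the digits after the decimal point as minutes: dividing the fractional part by 0.6 turns .30 into half an hour, and dividing by 24 turns hours into the fraction of a day that Excel stores as a time. A minimal Python sketch of the same conversion follows, assuming inputs such as 8.3 mean 8 hours 30 minutes; the function name decimal_hours_to_time is hypothetical.

```python
def decimal_hours_to_time(value: float) -> str:
    """Convert an 'hours.minutes' number such as 8.3 into an 'HH:MM' string."""
    whole_hours = int(value)
    # Digits after the decimal point are minutes written as a fraction (.3 = 30 min).
    minute_part = value - whole_hours
    # Same step as the Excel formula: .30 / 0.6 = 0.5 hours.
    hours = whole_hours + minute_part / 0.6
    # Round to whole minutes to absorb floating-point noise.
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


converted = [decimal_hours_to_time(v) for v in (8.3, 10, 11.3)]
```

The sketch stops at an HH:MM string and skips the final division by 24, which is only needed for Excel's internal time representation.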

DAX Ranking events year over year

I have a table of data that has a format similar to the following:
EventID | Event Date
--------------------
1 | 1/1/2014
2 | 2/8/2014
3 | 10/1/2014
4 | 2/5/2014
5 | 4/1/2014
6 | 9/1/2014
What I am trying to do is create a DAX formula to rank each event in the order that it happened for the year. So I want to end up with something like this. This way I can compare the events year over year as the events don't happen on any regular time schedule.
Event Date | Year | Rank
------------------------
1/1/2014 | 2014 | 1
2/8/2014 | 2014 | 2
10/1/2014 | 2014 | 3
2/5/2015 | 2015 | 1
4/1/2015 | 2015 | 2
9/1/2015 | 2015 | 3
I have tried to do this by creating a formula that will give me the day number of the year:
Day of Year =(YEARFRAC(CONCATENATE("Jan 1 ", YEAR([Event Date])),[Event Date])*360)+1
Then I tried using RANKX on this table, but I can't seem to get the proper result. Perhaps I am misunderstanding how RANKX works, or not going about this the right way.
=RANKX(FILTER(Event,EARLIER(Event[Event Year])=Event[Event Year]),Event[Day of Year])
or
=RANKX(All(Event[Event Year]),[Day of Year],,1,Dense)
Any ideas would be much appreciated!
Thanks for any help in advance!
Create the following measures:
[Year]:=YEAR(LASTDATE(Event[Event Date]))
and
[Rank]:=RANKX(FILTER(ALL(Event),[Year]=YEAR(MAX(Event[Event Date]))),FIRSTDATE(Event[Event Date]),,1,DENSE)
and this is the result that you get:
Note: My dates are in UK format and I suspect yours were in US format, so the rankings do not appear to tally with your example, but it does work!
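As a cross-check of the intended result (not of the DAX itself), the ranking the question asks for can be sketched in Python: group the event dates by year, sort each group, and number the dates in ascending order. The function name rank_within_year is hypothetical, and the dates below are the ones from the expected output in the question, read in US month/day order.

```python
from collections import defaultdict
from datetime import date


def rank_within_year(event_dates):
    """Map each event date to its 1-based chronological rank within its year."""
    dates_by_year = defaultdict(set)
    for d in event_dates:
        dates_by_year[d.year].add(d)
    ranks = {}
    for dates in dates_by_year.values():
        # Duplicate dates collapse into the set, so tied events share a rank (dense).
        for position, d in enumerate(sorted(dates), start=1):
            ranks[d] = position
    return ranks


events = [
    date(2014, 1, 1), date(2014, 2, 8), date(2014, 10, 1),
    date(2015, 2, 5), date(2015, 4, 1), date(2015, 9, 1),
]
ranks = rank_within_year(events)
```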
