How do I derive the first day of the previous quarter for any given date in a Spark SQL query using the SQL API? A few sample inputs and expected outputs are below:
input_date | start_date
------------------------
2020-01-21 | 2019-10-01
2020-02-06 | 2019-10-01
2020-04-15 | 2020-01-01
2020-07-10 | 2020-04-01
2020-10-20 | 2020-07-01
2021-02-04 | 2020-10-01
The Quarters generally are:
1 | Jan - Mar
2 | Apr - Jun
3 | Jul - Sep
4 | Oct - Dec
Note: I am using Spark SQL v2.4.
Any help is appreciated. Thanks.
Subtract 3 months from the input date, then truncate to the quarter with date_trunc and convert back to a date.
import org.apache.spark.sql.functions.{date_trunc, expr, to_date}

df.withColumn("start_date", to_date(date_trunc("quarter", expr("input_date - interval 3 months"))))
  .show()
+----------+----------+
|input_date|start_date|
+----------+----------+
|2020-01-21|2019-10-01|
|2020-02-06|2019-10-01|
|2020-04-15|2020-01-01|
|2020-07-10|2020-04-01|
|2020-10-20|2020-07-01|
|2021-02-04|2020-10-01|
+----------+----------+
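Since the question asks for the SQL API specifically, the same logic should also work as a plain Spark SQL query; this is only a sketch, assuming the data is registered as a temporary view named dates:
SELECT input_date,
       to_date(date_trunc('quarter', input_date - INTERVAL 3 MONTHS)) AS start_date
FROM   dates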
Personally, I would create a lookup table of quarter start dates covering the next twenty years using Excel or something similar, and just reference that table.
Related
My first time here.
I have a lease table where each row has the following:
Lease number | Tenant name | Unit number | Check-in date | Check-out date.
I need a table (preferably a pivot) that shows the total occupancy (as a %) per month for each year. Keep in mind that the check-in/check-out dates can run from the middle of one month to the middle of another (Oct 3 2020 - Feb 17 2021).
Expected result:
| 2019 | 2020 | 2021
jan | 0% | 58% | 67%
feb | 67% | 21% | 42%
mar ....
As I'm not a professional, I'm looking for a simple/easy way to resolve it.
I have a simple inventory table in excel that looks like this:
Number of Items | Date Incoming | Date Out
-------------------------------------------------------
10 | 1 Jan 2018 | 30 Jan 2018
30 | 15 Jan 2018 | 1 May 2018
20 | 1 Feb 2018 | 15 Mar 2018
I would like something that can give me the total number of items present in the inventory at each date, that is:
1 Jan 2018 | 10
15 Jan 2018 | 40
30 Jan 2018 | 30
1 Feb 2018 | 50
15 Mar 2018 | 30
1 May 2018 | 0
What I was thinking of is some sort of cumulative sum where the number of items is added at "Date Incoming" and subtracted at "Date Out".
Can you help me? I would prefer to avoid macros, but even a VBA solution is fine.
For a given date, you can do:
=SUMIF(#DateIn, "<="&#CellWithGivenDate, #NumberOfItems) - SUMIF(#DateOut, "<="&#CellWithGivenDate, #NumberOfItems)
With #NumberOfItems, #DateIn, and #DateOut being columns 1 to 3 of your sample, and #CellWithGivenDate being the relevant cell in column 1 of your expected result sample.
I am new to Spark SQL. I want to generate the following series of start times and end times, at 5-second intervals, for the current date. So let's say I am running my job on 1st Jan 2018; I want a series of start and end times 5 seconds apart. There will be 17280 records for 1 day.
START TIME | END TIME
-----------------------------------------
01-01-2018 00:00:00 | 01-01-2018 00:00:04
01-01-2018 00:00:05 | 01-01-2018 00:00:09
01-01-2018 00:00:10 | 01-01-2018 00:00:14
.
.
01-01-2018 23:59:55 | 01-01-2018 23:59:59
01-02-2018 00:00:00 | 01-02-2018 00:00:04
I know I can generate this DataFrame using a Scala for loop. My constraint is that I can only use queries to do this.
Is there any way I can create this data structure using only SELECT constructs?
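One possible query-only sketch, assuming Spark 2.4 or later (where sequence() accepts timestamp bounds and an interval step); each window end is simply the start plus 4 seconds, and the subquery alias t is just illustrative:
SELECT start_time,
       start_time + INTERVAL 4 SECONDS AS end_time
FROM (
  SELECT explode(
           sequence(
             CAST(current_date() AS TIMESTAMP),                                   -- today 00:00:00
             CAST(current_date() AS TIMESTAMP) + INTERVAL 1 DAY - INTERVAL 5 SECONDS, -- today 23:59:55
             INTERVAL 5 SECONDS
           )
         ) AS start_time
) t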
In Excel I need to convert float numbers to time.
For example:
8,3 must become 08:30
10 must become 10:00
11,3 must become 11:30
Any ideas?
Thank you
Simple:
=(INT(A1)+(A1-INT(A1))/0.6)/24
The integer part is the hours; the fractional part (e.g. 0.3 for 30 minutes) is divided by 0.6 to turn it into a fraction of an hour, and dividing the whole thing by 24 gives Excel's time serial value (format the cell as a time).
Input | Output
----- | --------
8.3 | 08:30:00
10 | 10:00:00
11.3 | 11:30:00
I have a table of data that has a format similar to the following:
EventID | Event Date
--------------------
1 | 1/1/2014
2 | 2/8/2014
3 | 10/1/2014
4 | 2/5/2015
5 | 4/1/2015
6 | 9/1/2015
What I am trying to do is create a DAX formula to rank each event in the order that it happened for the year. So I want to end up with something like this. This way I can compare the events year over year as the events don't happen on any regular time schedule.
Event Date | Year | Rank
------------------------
1/1/2014 | 2014 | 1
2/8/2014 | 2014 | 2
10/1/2014 | 2014 | 3
2/5/2015 | 2015 | 1
4/1/2015 | 2015 | 2
9/1/2015 | 2015 | 3
I have tried to do this by creating a formula that will give me the day number of the year:
Day of Year =(YEARFRAC(CONCATENATE("Jan 1 ", YEAR([Event Date])),[Event Date])*360)+1
Then I used RANKX on this table, but I can't seem to get the proper result. Perhaps I am not understanding the use of RANKX, or am not going about this the right way.
=RANKX(FILTER(Event,EARLIER(Event[Event Year])=Event[Event Year]),Event[Day of Year])
or
=RANKX(All(Event[Event Year]),[Day of Year],,1,Dense)
Any ideas would be much appreciated!
Thanks for any help in advance!
Create the following measures:
[Year]:=YEAR(LASTDATE(Event[Event Date]))
and
[Rank]:=RANKX(FILTER(ALL(Event),[Year]=YEAR(MAX(Event[Event Date]))),FIRSTDATE(Event[Event Date]),,1,DENSE)
and this is the result that you get:
Note: My dates are in UK format and I suspect yours were in US format, so the rankings do not appear to tally with your example, but it does work!