Generating adjstrt and adjend dates based on each partition in Spark - apache-spark

The data is partitioned by id and plan_nm.
I need to convert the SAS code below to Spark. It generates adjstrt and adjend dates from fill_dt:
adjstrt=fill_dt;
adjend=fill_dt+days_sply-1;
If a record's fill date overlaps the adjend date of the previous record with the same id and plan name, the adjusted start date of the current record is moved to the day after the previous adjend, and adjend is recomputed from it. If there is no overlap, adjstrt is fill_dt and adjend is fill_dt + days_sply - 1. Below is the SAS code that implements the adjusted start and end dates.
SAS code:
data input;
set step1;
by id plan_nm;
format adjstrt adjend lastdate mmddyy10.;
retain lastdate;
adjstrt=fill_dt;
adjend=fill_dt+days_sply-1;
if first.plan_nm then lastdate=.;
else do;
if adjstrt <= lastdate then do;
adjstrt = lastdate+1;
adjend = adjstrt+days_sply-1;
end;
end;
lastdate=adjend;
run;
I tried using the lag function to find adjstrt and adjend, but these values need to be modified according to the logic above, and I am getting discrepancies in the values.
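A plain lag is unlikely to reproduce this logic on its own, because each adjusted end date depends on the previous row's already adjusted end date rather than on its original one. The sketch below mirrors the retained-variable logic of the SAS step in plain Python; it is an illustration under assumptions, not a tested Spark job. The function name `adjust_fills`, the dict-based row layout, and the sample rows are hypothetical, and rows are assumed to be ordered by fill_dt within each (id, plan_nm) group.

```python
from datetime import date, timedelta
from itertools import groupby


def adjust_fills(rows):
    """Add adjstrt/adjend to each row, mirroring the SAS RETAIN logic.

    rows: list of dicts with keys id, plan_nm, fill_dt (date), days_sply (int).
    Assumption: ordering by fill_dt inside each (id, plan_nm) group is intended.
    """
    ordered = sorted(rows, key=lambda r: (r["id"], r["plan_nm"], r["fill_dt"]))
    result = []
    for _, group in groupby(ordered, key=lambda r: (r["id"], r["plan_nm"])):
        lastdate = None  # SAS: if first.plan_nm then lastdate = .
        for row in group:
            adjstrt = row["fill_dt"]
            # Overlap with the previous adjusted end: start the day after it.
            if lastdate is not None and adjstrt <= lastdate:
                adjstrt = lastdate + timedelta(days=1)
            adjend = adjstrt + timedelta(days=row["days_sply"] - 1)
            lastdate = adjend  # carried over to the next row of the same group
            result.append({**row, "adjstrt": adjstrt, "adjend": adjend})
    return result


# Hypothetical sample rows, not taken from the question.
sample = [
    {"id": 1, "plan_nm": "A", "fill_dt": date(2021, 1, 1), "days_sply": 30},
    {"id": 1, "plan_nm": "A", "fill_dt": date(2021, 1, 20), "days_sply": 30},
    {"id": 1, "plan_nm": "A", "fill_dt": date(2021, 4, 1), "days_sply": 10},
    {"id": 2, "plan_nm": "A", "fill_dt": date(2021, 1, 5), "days_sply": 5},
]
for r in adjust_fills(sample):
    print(r["id"], r["plan_nm"], r["fill_dt"], r["adjstrt"], r["adjend"])
```

In PySpark, a per-group function of this shape could be applied with `groupBy("id", "plan_nm").applyInPandas(...)` once it is rewritten to take and return a pandas DataFrame; that wiring is not shown here.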
Expected Output

Related

MS Access - Multiple Queries shared Criteria in time stamp date range

MS Access 2016,
I have multiple queries (approximately 120) that gather temperature history based on date criteria, which I currently update manually in each query, e.g. >=#8/1/2021# And <#9/1/2021# for the month of August 2021.
What would be the best solution to update this in one place so all queries could refer to that one date range?
Sample Query: (Usually 43 parameter points)
SELECT
History1HourAverage.TimeStamp,
History1HourAverage.Parameter001,
History1HourAverage.Value001,
History1HourAverage.Parameter002,
History1HourAverage.Value002,
History1HourAverage.Parameter003,
History1HourAverage.Value003,
History1HourAverage.Parameter004,
History1HourAverage.Value004
FROM History1HourAverage
WHERE
(
(
(History1HourAverage.TimeStamp)>=#8/1/2021#
AND (History1HourAverage.TimeStamp)<#9/1/2021#
)
AND ((History1HourAverage.Parameter001)="10S8MApp.nvoSpaceTemp")
AND ((History1HourAverage.Parameter002)="10S9MApp.nvoSpaceTemp")
AND ((History1HourAverage.Parameter003)="10S10MApp.nvoSpaceTemp")
AND ((History1HourAverage.Parameter004)="10S11MApp.nvoSpaceTemp")
);
Thank you
A couple of options:
Either
Add a table called DateRange with two fields, StartDate and EndDate. Enter one record in that table with the date range that you want to use for your queries. Edit each of your queries and include the DateRange table. Set the criteria for the query to WHERE History1HourAverage.TimeStamp >= DateRange.StartDate And History1HourAverage.TimeStamp < DateRange.EndDate
Alternatively
Create a VBA module with two functions
Public Function StartDate() As Date
StartDate = #8/1/2021#
End Function
Public Function EndDate() As Date
EndDate = #9/1/2021#
End Function
Set the criteria for each query to WHERE History1HourAverage.TimeStamp >= StartDate() And History1HourAverage.TimeStamp < EndDate()
When you want to use a different date range you either (1) edit the data in DateRange table, or (2) edit your functions to return the new dates.
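The DateRange-table option can be tried out in a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for Access, so the SQL dialect, the ISO text timestamps, the trimmed-down table, and the helper name `fetch_in_range` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE History1HourAverage (TimeStamp TEXT, Value001 REAL);
    CREATE TABLE DateRange (StartDate TEXT, EndDate TEXT);
    INSERT INTO History1HourAverage VALUES
        ('2021-07-31 23:00:00', 70.1),
        ('2021-08-01 00:00:00', 71.5),
        ('2021-08-31 23:00:00', 72.0),
        ('2021-09-01 00:00:00', 69.8);
    -- The single row that every query reads its date range from.
    INSERT INTO DateRange VALUES ('2021-08-01 00:00:00', '2021-09-01 00:00:00');
    """
)


def fetch_in_range(connection):
    """Return the history rows that fall inside the shared DateRange row."""
    return connection.execute(
        """
        SELECT h.TimeStamp, h.Value001
        FROM History1HourAverage AS h, DateRange AS d
        WHERE h.TimeStamp >= d.StartDate AND h.TimeStamp < d.EndDate
        ORDER BY h.TimeStamp
        """
    ).fetchall()


august = fetch_in_range(conn)

# Editing the one DateRange row changes the result of every query that uses it.
conn.execute(
    "UPDATE DateRange SET StartDate = '2021-09-01 00:00:00', "
    "EndDate = '2021-10-01 00:00:00'"
)
september = fetch_in_range(conn)

print(august)
print(september)
```

Because DateRange holds exactly one row, listing it in the FROM clause without a join condition does not multiply the result rows.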

How can I split weekly data to monthly using Excel/ Power Pivot?

My data is in weekly buckets. I want to split the numbers into monthly figures, but since a week can span two months, I want to weight each week's value by the number of its days that fall in each month. For example:
In the picture above, I want to split that 200 into 5/7*200 for Jan and 2/7*200 for Feb. How can I do that using Excel, Power Pivot, or DAX functions? Any help is much appreciated.
Thank you!
Assume your fact table looks something like the one below, where each value is associated with the starting date of the week in which it occurred.
The actual data may be more granular, with multiple rows for each week and additional attributes (such as identifiers of a person or a store, depending on the business), but what is shown below will work the same way.
First, we need to create a date table. We can do that in the "Design" tab by clicking "Date Table", then "New".
In this date table, we need to add a column holding the start date of the week that each row's date falls in. Place the cursor in the "Add Column" area and enter the following formula, then rename the column to "Week Start Date".
= [Date] - [Day Of Week Number] + 1
Now we can define the measure that calculates the number allocated to each month with the following formula. The measure:
Iterates over each row of the fact table
Counts the number of days of that week that are visible in the filter context
Adds the portion of the value that corresponds to the visible days
Value Allocation := SUMX (
MyData,
VAR WeekStartDate = MyData[Week]
VAR NumDaysInSelection = COUNTROWS (
FILTER (
'Calendar',
'Calendar'[Week Start Date] = WeekStartDate
)
)
VAR AllocationRate = DIVIDE ( NumDaysInSelection, 7 )
RETURN AllocationRate * MyData[Value]
)
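The arithmetic behind the measure can be checked outside Power Pivot with a short Python sketch. The function name `allocate_week` and the week start date 2021-01-27 are hypothetical; that date is chosen only because it leaves 5 days in January and 2 in February, matching the 200 example from the question.

```python
from collections import Counter
from datetime import date, timedelta


def allocate_week(week_start, value):
    """Split one weekly value across months by the share of days in each month."""
    days = [week_start + timedelta(days=offset) for offset in range(7)]
    # Count how many of the week's 7 days fall in each (year, month).
    counts = Counter((d.year, d.month) for d in days)
    return {month: value * n / 7 for month, n in counts.items()}


print(allocate_week(date(2021, 1, 27), 200))
```

The two portions always add up to the original weekly value, which is a quick sanity check for the pivot table totals as well.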
The result in the pivot table will look like this.

How to find date periods between 2 dates?

I have two dates: one is stored in my data, and the other, the end date, is held in a calculated column. How can I calculate the time period between those dates? I need every date in the period between them; is that possible with DAX?
How can I use a calculated column inside my DAX? Also, I don't have a calendar table in my database.
For example, if the start date is 2019-05-31 and the end date is 2019-06-03, the difference is 3 days and the period covers the dates 2019-05-31, 2019-06-01, 2019-06-02 and 2019-06-03.
Totally possible and easy. If you just need the difference between dates in two columns you can create a calculated column using the following:
DateDiff =
DATEDIFF ( 'Table'[Date1], 'Table'[Date2], DAY )
This will take the difference between Date1 and Date2 in days.
DECLARE @start_date [date] = CAST('2012-08-01' as [date])
DECLARE @end_date [date] = CAST('2012-09-01' as [date])
SELECT
DATEADD(day, [v].[number], @start_date)
FROM
[master].[dbo].[spt_values] [v]
WHERE
[v].[type] = 'P' AND
DATEADD(day, [v].[number], @start_date) <= @end_date
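The distinction between the day difference and the list of dates in the period can be illustrated with a small Python sketch, using the dates from the question. The helper name `dates_between` is hypothetical, and the sketch only illustrates the arithmetic; it is not a DAX or SQL solution.

```python
from datetime import date, timedelta


def dates_between(start, end):
    """Return every date from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


start, end = date(2019, 5, 31), date(2019, 6, 3)
print((end - start).days)         # difference in days
print(dates_between(start, end))  # the individual dates in the period
```

The difference is 3 days, while the inclusive period contains 4 dates, which is why a date difference alone does not return the list of dates.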

setting date range filter in saved search formula

I’m looking for some guidance on how to write a date range filter within a saved search formula. For example, below is what I have so far to see sales of a certain type, but I also need to add a date filter to get only sales for this year to date. I understand I can set a date filter in the filter section, but I also need to add other columns to this report that display transaction amounts of different types and dates. Thanks in advance for any input.
Formula:
CASE WHEN {transaction.type} = 'Invoice' AND {transaction.custbody1} = 'Direct' THEN {transaction.amount} end
You can use the SQL TO_CHAR function to return text portions of a date, for comparison with the corresponding text portions of sysdate. In the example you give, that would look like:
CASE WHEN {transaction.type} = 'Invoice'
AND {transaction.custbody1} = 'Direct'
AND TO_CHAR({trandate}, 'YYYY') = TO_CHAR(sysdate, 'YYYY')
THEN {transaction.amount}
END
You can use the same approach with different format strings to match almost any part of a date. You could also use EXTRACT to get slightly better performance in some situations. EXTRACT returns a number rather than text.
CASE WHEN {transaction.type} = 'Invoice'
AND {transaction.custbody1} = 'Direct'
AND EXTRACT(YEAR from {trandate}) = EXTRACT(YEAR FROM sysdate)
THEN {transaction.amount}
END
You may find TO_CHAR easier to use when you want to combine different parts of a date. For example, TO_CHAR({trandate}, 'YYYY-MM') = TO_CHAR(sysdate, 'YYYY-MM') rather than EXTRACT(YEAR from {trandate}) = EXTRACT(YEAR from sysdate) AND EXTRACT(MONTH from {trandate}) = EXTRACT(MONTH from sysdate)
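The equivalence of the two styles can be illustrated in Python, where `strftime` plays the role of TO_CHAR and the `year`/`month` attributes play the role of EXTRACT. This is only an analogy under that assumption; the function names are hypothetical and nothing here runs inside a saved search.

```python
from datetime import date


def same_month_text(d, today):
    """Text comparison, like TO_CHAR(..., 'YYYY-MM') on both sides."""
    return d.strftime("%Y-%m") == today.strftime("%Y-%m")


def same_month_parts(d, today):
    """Numeric comparison, like two EXTRACT conditions joined by AND."""
    return d.year == today.year and d.month == today.month


today = date(2021, 8, 15)  # fixed stand-in for sysdate
for d in (date(2021, 8, 1), date(2021, 7, 31), date(2020, 8, 15)):
    print(d, same_month_text(d, today), same_month_parts(d, today))
```

Both functions agree on every input; the text version needs one comparison where the numeric version needs one per date part.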

Get last item with date range and name filter in google sheets

I have the below set of records in Google Sheets. I would like to filter the rows by a specific name and date range. Once I have the filtered data, I would like to fetch the final amount from the last row.
Ex: I would like to fetch a final amount of 300 if my date range (dd/mm/yyyy) is 01/01/2016 to 11/06/2016 and the name selected is 'Sandeep'.
As I have experience with SQLite, I inserted the same records into a database and got the expected result using the query below.
select Final from MyTable where Date in (select max(Date) from MyTable WHERE Date BETWEEN '01/01/2016' AND '11/06/2016' and name = "Sandeep")
But I have no idea how to use multiple select statements in Google Sheets. I am fine with getting the result any other way, so please help me get the result as explained above.
= QUERY (A1:E50,"Select E where A > date '2016-01-01' and A < date '2016-06-11' and B ='Sandeep' order by A desc limit 1")
Use column IDs (A, B, C) instead of names such as name or income. Multiple columns can be given in a single Select clause, separated by commas.
Dates in the where clause must be written in yyyy-mm-dd format, regardless of the format of the dates in the actual column.
See if this works
=index(E:E, max(filter(row(A:A), A:A>date(2016, 1, 1), A:A<date(2016, 6, 11), B:B="Sandeep")))
If you want to include start and end date, change > to >= and < to <=.
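The filter-then-take-the-last-row logic shared by both formulas can be sketched in Python. The rows below are hypothetical, since the sheet from the question is not reproduced here, and the function name `last_final` is an assumption; the range is treated as inclusive on both ends.

```python
from datetime import date

# Hypothetical rows standing in for the sheet: date, name, final amount.
rows = [
    {"date": date(2016, 1, 5), "name": "Sandeep", "final": 100},
    {"date": date(2016, 2, 10), "name": "Ravi", "final": 250},
    {"date": date(2016, 3, 15), "name": "Sandeep", "final": 300},
    {"date": date(2016, 7, 1), "name": "Sandeep", "final": 450},
]


def last_final(rows, name, start, end):
    """Final amount of the latest row for `name` with start <= date <= end."""
    matches = [r for r in rows if r["name"] == name and start <= r["date"] <= end]
    if not matches:
        return None  # no row satisfies both the name and the date range
    return max(matches, key=lambda r: r["date"])["final"]


print(last_final(rows, "Sandeep", date(2016, 1, 1), date(2016, 6, 11)))
```

Taking the row with the latest date corresponds to "order by A desc limit 1" in the QUERY version; the INDEX version instead takes the matching row that sits lowest in the sheet, which gives the same result only when the sheet is sorted by date.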
