=IF(AND(WEEKDAY(AA3,2)<5,(AA3-INT(AA3))<17/24),((INT(AA3)+1)+12/24),IF(AND(WEEKDAY(AA3,2)<5,(AA3-INT(AA3))>17/24),((INT(AA3)+2)+12/24),IF(WEEKDAY(AA3,2)=5,(INT(AA3)+4)+12/24,IF(WEEKDAY(AA3,2)=7,(INT(AA3)+2)+12/24,IF(WEEKDAY(AA3,2)=6,(INT(AA3)+3)+12/24,)))))
I am trying to find the next business day depending on day of the week and hour of the day. Here is what I have converted to DAX but it does not work and I have no idea why.
NBD =
IF (
AND (
WEEKDAY ( D2S[Actual Received Time], 2 <= 5 ),
HOUR ( D2S[Actual Received Time] ) < 14
),
(
INT ( D2S[Actual Received Time] ) + 23.99 / 24
),
IF (
AND (
WEEKDAY ( D2S[Actual Received Time], 2 ) = 5,
HOUR ( D2S[Actual Received Time] > 14 )
),
(
INT ( D2S[Actual Received Time] ) + 3 + 12 / 24
),
IF (
AND (
WEEKDAY ( D2S[Actual Received Time], 2 ) <= 5,
HOUR ( D2S[Actual Received Time] ) >= 14
),
INT ( D2S[Actual Received Time] ) + 1 + 12 / 24,
IF (
WEEKDAY ( D2S[Actual Received Time], 2 ) = 6,
INT ( D2S[Actual Received Time] ) + 2 + 12 / 24,
IF (
WEEKDAY ( D2S[Actual Received Time], 2 ) = 7,
INT ( D2S[Actual Received Time] ) + 1 + 12 / 24
)
)
)
)
)
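Line 4 of your formula should read:
WEEKDAY ( D2S[Actual Received Time], 2 ) <= 5,
(with "<= 5" outside the parentheses). Line 13 has the same kind of misplaced parenthesis: "> 14" sits inside the HOUR ( ) call and should come after it.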
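For comparison, here is a minimal Python sketch of the branching the DAX formula appears to be aiming for. The function name and the 23:59 end-of-day value (an approximation of 23.99 / 24) are assumptions for illustration only. It also assumes that anything received on a Friday at or after the cutoff rolls to Monday noon, whereas the DAX above tests "> 14" for Friday.

```python
from datetime import datetime, time, timedelta

CUTOFF_HOUR = 14  # cutoff hour used in the DAX version


def next_business_deadline(received: datetime) -> datetime:
    """Hypothetical helper mirroring the IF branches of the NBD formula."""
    weekday = received.isoweekday()  # Monday=1 ... Sunday=7, like WEEKDAY(x, 2)
    midnight = datetime.combine(received.date(), time.min)
    noon = timedelta(hours=12)

    if weekday <= 5 and received.hour < CUTOFF_HOUR:
        # Weekday before the cutoff: due at the end of the same day.
        return midnight + timedelta(hours=23, minutes=59)
    if weekday == 5:
        # Friday at or after the cutoff: Monday noon.
        return midnight + timedelta(days=3) + noon
    if weekday <= 4:
        # Monday to Thursday at or after the cutoff: next day noon.
        return midnight + timedelta(days=1) + noon
    if weekday == 6:
        # Saturday: Monday noon.
        return midnight + timedelta(days=2) + noon
    # Sunday: Monday noon.
    return midnight + timedelta(days=1) + noon


print(next_business_deadline(datetime(2018, 6, 1, 15, 0)))
```

Checking the Friday case before the general "weekday at or after the cutoff" case matters, because the first matching branch wins, just as in the nested IFs.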
Sometimes a person types something like 9999999999999999 or 0000000000 8999999998888888, ...
instead of their identification number. I want a column that identifies those cases, meaning the cases in which a single character (in this case a digit) is typed more than 7 times.
I have no idea how to do it. My best guess would be to count each digit using LEN, but that would mean at least 10 nested IF statements. Any suggestions?
We can find the maximum digit count as follows (as a calculated column):
MaxDigitCount =
VAR n = LEN ( [ID] )
VAR Digits =
ADDCOLUMNS ( GENERATESERIES ( 1, n ), "#D", MID ( [ID], [Value], 1 ) )
VAR Frequencies =
GROUPBY ( Digits, [#D], "#Freq", COUNTX ( CURRENTGROUP (), [Value] ) )
RETURN
MAXX ( Frequencies, [#Freq] )
Suppose [ID] = "899999999777". Then n = 12 and Digits is the table generated by creating a list from 1 to 12 and adding the column of digits corresponding to each of those positions.
Digits =
Value | #D |
1 | 8 |
2 | 9 |
3 | 9 |
4 | 9 |
5 | 9 |
6 | 9 |
7 | 9 |
8 | 9 |
9 | 9 |
10 | 7 |
11 | 7 |
12 | 7 |
Then Frequencies summarizes this table by grouping on #D and counting the number of occurrences of each distinct digit.
Frequencies =
#D | #Freq |
8 | 1 |
9 | 8 |
7 | 3 |
Finally, return the maximum value in the #Freq column.
Using this, it's easy to check if the value is greater than 7. You can modify the final line to
IF ( MAXX ( Frequencies, [#Freq] ) > 7, ">7", "<=7" )
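To sanity-check the logic outside the data model, the same "most frequent character" count can be sketched in a few lines of Python. The function names and the default threshold are assumptions for illustration. Like the DAX version, this counts every character in the string, so a space in the input is counted too.

```python
from collections import Counter


def max_char_count(id_text: str) -> int:
    """Return how often the most frequent character occurs (0 for an empty string)."""
    counts = Counter(id_text)
    return max(counts.values(), default=0)


def is_suspicious(id_text: str, threshold: int = 7) -> bool:
    """Flag IDs in which one character is repeated more than `threshold` times."""
    return max_char_count(id_text) > threshold


print(max_char_count("899999999777"), is_suspicious("899999999777"))
```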
I've constructed a data model around utilization for my company's fleet within PowerQuery. We have a number of different columns in the data model. Specifically, mileage, VIN, start date, and end date (see below for example table).
Mileage | VIN | Start Date | End Date |
0 | 123 | 6/1/18 | 6/30/18 |
0 | 123 | 7/1/18 | 7/31/18 |
0 | 123 | 8/1/18 | 8/31/18 |
0 | 123 | 9/1/18 | 9/30/18 |
0 | 123 | 10/1/18 | 10/31/18 |
What I'm trying to accomplish: if mileage is equal to 0 for one month, the row is categorized into a bucket labeled 0-30 days; if mileage is equal to 0 for two consecutive months, it is categorized as 31-60 days; and 0 mileage for three or more consecutive months is categorized as >60 days. From the example above, this vehicle would be categorized in the ">60 days" bucket. Is there an easy way to do this within the data model using DAX? Please let me know if you have any follow-up questions. Thank you!
Try this as a Calculated Column:
Buckets =
VAR rowDate = 'myTable'[ Start Date ]
VAR previousDate =
CALCULATE (
MAX ( 'myTable'[ Start Date ] ),
FILTER (
ALLEXCEPT ( myTable, myTable[ VIN ] ),
'myTable'[ Start Date ] < rowDate
)
)
VAR prePreviousDate =
CALCULATE (
MAX ( 'myTable'[ Start Date ] ),
FILTER (
ALLEXCEPT ( myTable, myTable[ VIN ] ),
'myTable'[ Start Date ] < previousDate
)
)
VAR PreviousMileage =
CALCULATE (
MAX ( 'myTable'[Mileage ] ),
ALLEXCEPT ( 'myTable', 'myTable'[ VIN ] ),
'myTable'[ Start Date ] = previousDate
)
VAR PrePreviousMileage =
CALCULATE (
MAX ( 'myTable'[Mileage ] ),
ALLEXCEPT ( 'myTable', 'myTable'[ VIN ] ),
'myTable'[ Start Date ] = prePreviousDate
)
RETURN
SWITCH (
TRUE (),
'myTable'[Mileage ] + IF ( ISBLANK ( PreviousMileage ), 1, PreviousMileage )
+ IF ( ISBLANK ( PrePreviousMileage ), 1, PrePreviousMileage )
= 0, "> 60 Days",
'myTable'[Mileage ] + IF ( ISBLANK ( PreviousMileage ), 1, PreviousMileage )
= 0, "31 to 60 Days",
'myTable'[Mileage ] = 0, "0 to 30 Days",
"No Days"
)
The result looks like this. I added some values for testing.
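The bucketing rule can also be checked with a small Python sketch that counts consecutive zero-mileage months for a single VIN. The function name and the plain-list input are assumptions for illustration. It expects the months in chronological order and produces the same labels as the SWITCH above.

```python
def bucket_zero_mileage(mileages):
    """Label each month of one vehicle by its current run of zero-mileage months."""
    labels = []
    streak = 0  # consecutive zero-mileage months ending at the current month
    for mileage in mileages:
        streak = streak + 1 if mileage == 0 else 0
        if streak >= 3:
            labels.append("> 60 Days")
        elif streak == 2:
            labels.append("31 to 60 Days")
        elif streak == 1:
            labels.append("0 to 30 Days")
        else:
            labels.append("No Days")
    return labels


# Five zero-mileage months in a row, as in the example table for VIN 123.
print(bucket_zero_mileage([0, 0, 0, 0, 0]))
```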
Is there an efficient way to transform the Hive table below into the target shape shown? The source table has ~1500 columns.
I'm using Spark 2.0, with source and target as DataFrames.
(id, dt , source1_ColA, source1_ColB, source2_ColA, source2_ColB)
------------------------------------------------------------
(10,"2018-06-01", 10, 9, 5, 8 )
(20,"2018-06-01", 20, 12, 16, 11 )
The columns A,B are transformed as shown below
Target table
(id, dt , element_name, source1, source2 )
---------------------------------------
(10,"2018-06-01", ColA , 10 , 5 )
(10,"2018-06-01", ColB , 9 , 8 )
(20,"2018-06-01", ColA , 20 , 16 )
(20,"2018-06-01", ColB , 12 , 11 )
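The reshaping itself can be illustrated with a plain-Python sketch (not Spark code). It assumes every non-key column is named `<source>_<element>`, as in the sample header, and the function name is hypothetical. In Spark the same grouping would have to be expressed with DataFrame operations; this only shows the intended mapping.

```python
def wide_to_long(rows, key_cols=("id", "dt")):
    """Turn one wide row per key into one row per (key, element_name)."""
    result = []
    for row in rows:
        by_element = {}  # element name -> {source name: value}
        for col, value in row.items():
            if col in key_cols:
                continue
            source, element = col.split("_", 1)
            by_element.setdefault(element, {})[source] = value
        for element, by_source in by_element.items():
            out = {k: row[k] for k in key_cols}
            out["element_name"] = element
            out.update(by_source)
            result.append(out)
    return result


source_rows = [
    {"id": 10, "dt": "2018-06-01", "source1_ColA": 10, "source1_ColB": 9,
     "source2_ColA": 5, "source2_ColB": 8},
]
for target_row in wide_to_long(source_rows):
    print(target_row)
```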
I have to create a dataframe having columns start_date and end_date where end_date > start_date using a function which randomly generates date values.
I tried something like this:
Project = pd.DataFrame({'Name': np.random.choice(['Starbucks','Macdonalds', 'KFC', 'Maruti',
'Honda','Mercedes', 'BMW', 'Reebok','Nike','Lee'],10),
'Start_Date':Project.apply(lambda row: gen_datetime(), axis = 1),
'End_Date': Project.apply(lambda row: gen_datetime() where('End_Date' > 'Start_Date' ), axis = 1)})
I don't know how to use the condition statement:
def gen_datetime(min_year=2017, max_year=datetime.now().year):
    start = date(min_year, 10, 28)
    years = max_year - min_year + 1
    end = start + timedelta(days=365 * years)
    for i in range(10):
        random_date = start + (end - start) * random.random()
    return random_date
The idea is to generate a random end date from each start date by adding a random timedelta:
from datetime import date, datetime, timedelta
import numpy as np
import pandas as pd

N = 10
shift_end_date = 20

def gen_datetime(min_year=2017, max_year=datetime.now().year):
    start = date(min_year, 10, 28)
    years = max_year - min_year + 1
    end = start + timedelta(days=365 * years)
    # candidate start dates, leaving room for the end-date shift
    dates = pd.date_range(start, end - timedelta(shift_end_date))
    return np.random.choice(dates, N)

names = ['Starbucks','Macdonalds', 'KFC', 'Maruti',
         'Honda','Mercedes', 'BMW', 'Reebok','Nike','Lee']
Project = pd.DataFrame({'Name': np.random.choice(names, N),
                        'Start_Date': gen_datetime()})
# shift each start date by 1 to shift_end_date - 1 days, so End_Date is always later
days = pd.to_timedelta(np.random.randint(1, shift_end_date, size=N), unit='d')
Project['End_Date'] = Project['Start_Date'] + days
print(Project)
Name Start_Date End_Date
0 Maruti 2018-07-31 2018-08-13
1 KFC 2017-11-20 2017-11-21
2 Maruti 2018-07-22 2018-07-23
3 Reebok 2018-05-13 2018-05-15
4 KFC 2018-08-16 2018-08-29
5 Starbucks 2018-03-18 2018-03-23
6 Reebok 2018-02-13 2018-03-03
7 Lee 2018-04-26 2018-05-10
8 Reebok 2018-09-11 2018-09-15
9 Honda 2018-05-15 2018-05-19
Improved solution: the function returns both arrays (start and end dates) and uses the origin parameter of to_datetime, which needs pandas 0.20.1+:
from datetime import datetime
import numpy as np
import pandas as pd

N = 10

def gen_datetime(min_year=2017, max_year=datetime.now().year):
    start = pd.Timestamp(min_year, 10, 28)
    years = max_year - min_year + 1
    end = 365 * years
    # random 2 x N array of day offsets from start, sorted within each column
    # so that d[0] <= d[1] (ties are possible, so End_Date can equal Start_Date)
    d = np.sort(np.random.randint(end, size=[2, N]), axis=0)
    # convert to datetime with the origin parameter
    a = pd.to_datetime(d[0], unit='D', origin=start)
    b = pd.to_datetime(d[1], unit='D', origin=start)
    # return both arrays together
    return a, b

# extract output to 2 variables
start, end = gen_datetime()
names = ['Starbucks','Macdonalds', 'KFC', 'Maruti',
         'Honda','Mercedes', 'BMW', 'Reebok','Nike','Lee']
Project = pd.DataFrame({'Name': np.random.choice(names, N),
                        'Start_Date': start,
                        'End_Date': end}, columns=['Name','Start_Date','End_Date'])
print(Project)
Name Start_Date End_Date
0 Reebok 2017-11-20 2018-06-28
1 Nike 2018-06-12 2018-07-23
2 Reebok 2018-04-26 2018-07-06
3 BMW 2018-02-20 2018-07-14
4 Starbucks 2018-04-02 2018-09-10
5 Starbucks 2017-12-14 2018-03-29
6 Lee 2018-05-17 2018-09-13
7 Macdonalds 2017-11-01 2018-08-20
8 Reebok 2018-04-09 2018-06-27
9 Macdonalds 2018-02-21 2018-10-07
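If the end date must be strictly later than the start date, one option is to draw two distinct day offsets per row and sort them. The sketch below uses only the standard library. The function name, the default start date and the two-year span are assumptions for illustration, not part of the answers above.

```python
import random
from datetime import date, timedelta


def gen_date_pairs(n, start=date(2017, 10, 28), span_days=730, rng=None):
    """Return n (start_date, end_date) tuples with end_date > start_date."""
    rng = rng or random.Random()
    pairs = []
    for _ in range(n):
        # sample() draws without replacement, so the two offsets always differ
        low, high = sorted(rng.sample(range(span_days), 2))
        pairs.append((start + timedelta(days=low), start + timedelta(days=high)))
    return pairs


for start_date, end_date in gen_date_pairs(5):
    print(start_date, end_date)
```

The two columns of the resulting list can then be passed to a DataFrame constructor as Start_Date and End_Date.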
The PARALLELPERIOD function allows for the comparison of values between points in time (for example, how sales compare to a year ago). I'm doing something wrong in my use of it, but have no idea what that might be.
Set up
I created a bog simple PowerPivot SQL Server 2008+ source query and named it Source. The query generates 168 rows: 6 IDs (100-600) and 28 dates (first of a month from Jan 2010 to Apr 2012) all cross applied together.
; WITH SRC (groupKey, eventDate, value) AS
(
SELECT G.groupKey, D.eventDate, CAST(rand(G.groupKey * year(D.eventDate) * month(D.eventDate)) * 100 AS int)
FROM
(
SELECT 100
UNION ALL SELECT 200
UNION ALL SELECT 300
UNION ALL SELECT 400
UNION ALL SELECT 500
UNION ALL SELECT 600
) G (groupKey)
CROSS APPLY
(
SELECT CAST('2010-01-01' AS date)
UNION ALL SELECT CAST('2010-02-01' AS date)
UNION ALL SELECT CAST('2010-03-01' AS date)
UNION ALL SELECT CAST('2010-04-01' AS date)
UNION ALL SELECT CAST('2010-05-01' AS date)
UNION ALL SELECT CAST('2010-06-01' AS date)
UNION ALL SELECT CAST('2010-07-01' AS date)
UNION ALL SELECT CAST('2010-08-01' AS date)
UNION ALL SELECT CAST('2010-09-01' AS date)
UNION ALL SELECT CAST('2010-10-01' AS date)
UNION ALL SELECT CAST('2010-11-01' AS date)
UNION ALL SELECT CAST('2010-12-01' AS date)
UNION ALL SELECT CAST('2011-01-01' AS date)
UNION ALL SELECT CAST('2011-02-01' AS date)
UNION ALL SELECT CAST('2011-03-01' AS date)
UNION ALL SELECT CAST('2011-04-01' AS date)
UNION ALL SELECT CAST('2011-05-01' AS date)
UNION ALL SELECT CAST('2011-06-01' AS date)
UNION ALL SELECT CAST('2011-07-01' AS date)
UNION ALL SELECT CAST('2011-08-01' AS date)
UNION ALL SELECT CAST('2011-09-01' AS date)
UNION ALL SELECT CAST('2011-10-01' AS date)
UNION ALL SELECT CAST('2011-11-01' AS date)
UNION ALL SELECT CAST('2011-12-01' AS date)
UNION ALL SELECT CAST('2012-01-01' AS date)
UNION ALL SELECT CAST('2012-02-01' AS date)
UNION ALL SELECT CAST('2012-03-01' AS date)
UNION ALL SELECT CAST('2012-04-01' AS date)
) D (eventDate)
)
SELECT
*
FROM
SRC;
I added a derived column in PowerPivot using a formula I lifted from MSDN
=CALCULATE(SUM(Source[value]), PARALLELPERIOD(Source[eventDate], -1, year))
There are no errors displayed but there's never any calculated data. I've tried different intervals (-1, +1) and periods (year, month) but to no avail.
The only thing I could observe that was different between my demo and the MSDN example was that theirs had a separate dimension defined for the date. Easy enough to rectify, so I created a Dates query with the following. Because it filters only on the year part of the two bounds, this query generates a row for every day from 2010-01-01 through 2012-12-31 (1096 rows).
DECLARE
@start int = 20100101
, @stop int = 20120601;
WITH L0 AS
(
SELECT
0 AS C
UNION ALL
SELECT
0
)
, L1 AS
(
SELECT
0 AS c
FROM
L0 AS A
CROSS JOIN L0 AS B
)
, L2 AS
(
SELECT
0 AS c
FROM
L1 AS A
CROSS JOIN L1 AS B
)
, L3 AS
(
SELECT
0 AS c
FROM
L2 AS A
CROSS JOIN L2 AS B
)
, L4 AS
(
SELECT
0 AS c
FROM
L3 AS A
CROSS JOIN L3 AS B
)
, L5 AS
(
SELECT
0 AS c
FROM
L4 AS A
CROSS JOIN L4 AS B
)
, NUMS AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
FROM
L5
)
, YEARS AS
(
SELECT
Y.number
FROM
NUMS Y
WHERE
Y.number BETWEEN @start / 10000 AND @stop / 10000
)
, MONTHS AS
(
SELECT
Y.number
FROM
NUMS Y
WHERE
Y.number BETWEEN 1 and 12
)
, DAYS AS
(
SELECT
Y.number
FROM
NUMS Y
WHERE
Y.number BETWEEN 1 and 31
)
, CANDIDATES_0 AS
(
SELECT
Y.number * 10000 + M.number * 100 + D.number AS SurrogateKey
, CAST(Y.number * 10000 + M.number * 100 + D.number AS char(8)) AS DateValue
FROM
YEARS Y
CROSS APPLY
MONTHS M
CROSS APPLY
DAYS D
)
, HC AS
(
SELECT
Y.number * 10000 + M.number * 100 + D.number AS SurrogateKey
, CAST(Y.number * 10000 + M.number * 100 + D.number AS char(8)) AS DateValue
FROM
YEARS Y
CROSS APPLY
MONTHS M
CROSS APPLY
DAYS D
WHERE
D.number < 31
AND M.number IN (4,6,9,11)
UNION ALL
SELECT
Y.number * 10000 + M.number * 100 + D.number AS SurrogateKey
, CAST(Y.number * 10000 + M.number * 100 + D.number AS char(8)) AS DateValue
FROM
YEARS Y
CROSS APPLY
MONTHS M
CROSS APPLY
DAYS D
WHERE
D.number < 32
AND M.number IN (1,3,5,7,8,10,12)
UNION ALL
SELECT
Y.number * 10000 + M.number * 100 + D.number AS SurrogateKey
, CAST(Y.number * 10000 + M.number * 100 + D.number AS char(8)) AS DateValue
FROM
YEARS Y
CROSS APPLY
MONTHS M
CROSS APPLY
DAYS D
WHERE
D.number < 29
AND M.number = 2
AND
(
Y.number % 4 > 0
OR Y.number % 100 = 0 AND Y.number % 400 > 0
)
UNION ALL
SELECT
Y.number * 10000 + M.number * 100 + D.number AS SurrogateKey
, CAST(Y.number * 10000 + M.number * 100 + D.number AS char(8)) AS DateValue
FROM
YEARS Y
CROSS APPLY
MONTHS M
CROSS APPLY
DAYS D
WHERE
D.number < 30
AND M.number = 2
AND
(
Y.number % 4 = 0
OR Y.number % 100 = 0 AND Y.number % 400 = 0
)
)
, CANDIDATES AS
(
SELECT
C.SurrogateKey
, CAST(C.DateValue as date) As DateValue
FROM
HC C
WHERE
ISDATE(c.DateValue) = 1
)
, PARTS
(
DateKey
, FullDateAlternateKey
, DayNumberOfWeek
, EnglishDayNameOfWeek
, DayNumberOfMonth
, DayNumberOfYear
, WeekNumberOfYear
, EnglishMonthName
, MonthNumberOfYear
, CalendarQuarter
, CalendarYear
, CalendarSemester
--,FiscalQuarter
--,FiscalYear
--,FiscalSemester
) AS
(
SELECT
CAST(C.SurrogateKey AS int)
, C.DateValue
, DATEPART(WEEKDAY, C.DateValue)
, DATENAME(WEEKDAY, C.DateValue)
, DATEPART(DAY, C.DateValue)
, DATEPART(DAYOFYEAR, C.DateValue)
, DATEPART(WEEK, C.DateValue)
, DATENAME(MONTH, C.DateValue)
, DATEPART(MONTH, C.DateValue)
, DATEPART(QUARTER, C.DateValue)
, DATEPART(YEAR, C.DateValue)
, DATEPART(WEEK, C.DateValue)
FROM
CANDIDATES C
WHERE
C.DateValue IS NOT NULL
)
SELECT
P.*
FROM
--HC P
PARTS P
ORDER BY 1;
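As a quick cross-check of the row count, the same calendar can be generated in a few lines of Python. This is only a sketch and the function name is an assumption. It mirrors the query's behaviour of keeping just the year part of each bound (integer division by 10000) and emitting every day of those years.

```python
from datetime import date, timedelta


def days_in_years(first_year, last_year):
    """Return every calendar date from Jan 1 of first_year to Dec 31 of last_year."""
    current = date(first_year, 1, 1)
    last = date(last_year, 12, 31)
    days = []
    while current <= last:
        days.append(current)
        current += timedelta(days=1)
    return days


# Year bounds derived the same way the query derives them.
first_year = 20100101 // 10000
last_year = 20120601 // 10000
calendar = days_in_years(first_year, last_year)
print(len(calendar), calendar[0], calendar[-1])
```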
With data generated, I created a relationship between the Source and Dates and tried this formula with no luck either
=CALCULATE(SUM(Source[value]), PARALLELPERIOD(Dates[FullDateAlternateKey], -1, year))
The PowerPivot designer looks like
Any thoughts on what I'm doing wrong?
References
PARALLELPERIOD Function
PowerPivot DAX PARALLELPERIOD vs DATEADD
The DAX expression you used in the derived column should be a measure and defined in the calculation area...
MeasurePriorPeriodValue := CALCULATE(SUM(Source[value]), PARALLELPERIOD(Source[eventDate], -1, year))
...as long as the column you use in the PARALLELPERIOD function is configured with a date data type, it should still work. Keeping the date table separate from the rest is "best practice" but not required. It lets you ensure that there are no gaps in the dates, which can cause problems with some DAX time-intelligence functions.
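To make the intended result concrete, here is a rough Python sketch of what a "prior year" measure evaluates to: take the dates visible in the current filter context, shift their years by the offset, and sum the values over those whole years. The function name and the dictionary input are assumptions for illustration. It approximates PARALLELPERIOD semantics for the year interval only and returns 0 where DAX would return a blank.

```python
from datetime import date


def parallel_period_year_sum(values, context_dates, offset_years=-1):
    """Sum `values` over the full years obtained by shifting the context dates."""
    target_years = {d.year + offset_years for d in context_dates}
    return sum(v for d, v in values.items() if d.year in target_years)


# Hypothetical monthly data: 1 per month in 2010, 2 per month in 2011.
values = {date(2010, m, 1): 1 for m in range(1, 13)}
values.update({date(2011, m, 1): 2 for m in range(1, 13)})

# Context filtered to March 2011: the result covers all of 2010.
print(parallel_period_year_sum(values, [date(2011, 3, 1)]))
```

A measure is evaluated once per cell of the pivot, so the set of context dates changes with the row and column labels, which is what the sketch's `context_dates` argument stands in for.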