DAX Measure to calculate number of lost days in different year from total number of days - excel

I am trying to calculate the number of days lost in a particular year, based on a calendar table that I have created.
For example, I have 3 columns: the event, the number of days lost, and the date when the event started.
Event    DaysLost  Date started
Injury   30        25/12/2016
Injury   588       06/08/2012
For the first case:
Days in 2016 - 6
Days in 2017 - 24
For the second case:
Days in 2012 - 146
Days in 2013 - 365
Days in 2014 - 77
In the first case, only 6 days need to be counted in 2016, and the rest of the days should automatically be counted in 2017. But I cannot figure out how to do it.
In my output I would like to put the years in one column and the days lost for each year next to that particular year.
I have a calendar table and I want the sum of days to populate for a particular year.
I tried calculating it by getting the end date (adding the number of days to the first start date) and then, if the days were more than the days remaining in that year, subtracting the remaining days from the total so that the rest moves to the next year. But I cannot figure out how to keep adding days to subsequent years when the days extend over many years, and how to list them afterwards.
Sept 4, 2017
Please see the excel solution below
Excel solution of the problem

0) Importing the data from your Excel screenshot into Power BI results in this.
1) Create a new column in that table using the following formula for end date (to help with future formulas).
EndDate = Injuries[First Start Date] + Injuries[Days]
You stated that you have a calendar table, so you can skip to step 3
2) Create a new table by clicking on Modeling -> New Table and entering the following formula. This gives a single column table with a list of years.
Years = GENERATESERIES(2000, 2020, 1)
3) Create another new table using the following formula. This gives a table with all of the fields from the initial data table crossjoined with the Year table that was just created. The formula also filters the resulting table to only return rows where the value in the Year column is between the year of the First Start Date and the year of the First Start Date plus Days. To learn more about the CROSSJOIN function, check out its documentation.
InjuriesByYear = FILTER(
    CROSSJOIN(Years, Injuries),
    Years[Year] >= Injuries[First Start Date].[Year] &&
    Years[Year] <= Injuries[EndDate].[Year]
)
4) Create relationships from the InjuriesByYear table back to the initial data table and the Year table. This will help facilitate nicer reporting efforts.
5) In the InjuriesByYear table, create a new column by clicking on Modeling -> New Column and entering the following formula. The first IF checks if all of the days lost are in a single year. The second IF handles when the days are spread across multiple years, with the True clause handling the first year, and the False clause handling all other years.
DayPerYear = IF(
    InjuriesByYear[Year] = InjuriesByYear[First Start Date].[Year] && InjuriesByYear[Year] = InjuriesByYear[EndDate].[Year],
    InjuriesByYear[Days],
    IF(
        InjuriesByYear[Year] = InjuriesByYear[First Start Date].[Year],
        DATEDIFF(InjuriesByYear[First Start Date], DATE(InjuriesByYear[First Start Date].[Year], 12, 31), DAY),
        DATEDIFF(DATE(InjuriesByYear[Year], 1, 1), MIN(InjuriesByYear[EndDate], DATE(InjuriesByYear[Year], 12, 31)), DAY) + 1
    )
)
6) To test it all out, create a pivot table as configured below. Following these steps, the pivot table should match your Excel solution.
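To make the three branches of DayPerYear easier to check by hand, here is a minimal Python sketch of the same logic. It is only an illustration, not part of the Power BI model: the function name days_per_year is made up for this example, and it assumes the same convention as above (EndDate = First Start Date + Days).

```python
from datetime import date, timedelta


def days_per_year(start, days):
    """Split `days` across calendar years, branch by branch as in DayPerYear.

    Hypothetical helper for illustration only.
    """
    end = start + timedelta(days=days)  # EndDate = First Start Date + Days
    result = {}
    for year in range(start.year, end.year + 1):
        if start.year == end.year:
            # All of the days fall in a single year.
            result[year] = days
        elif year == start.year:
            # First year: day boundaries from the start date to 31 December.
            result[year] = (date(year, 12, 31) - start).days
        else:
            # Later years: 1 January up to the end date or 31 December, inclusive.
            last = min(end, date(year, 12, 31))
            result[year] = (last - date(year, 1, 1)).days + 1
    return result


first_case = days_per_year(date(2016, 12, 25), 30)
second_case = days_per_year(date(2012, 8, 6), 588)
```

With the 30-day injury from the question this gives 6 days in 2016 and 24 in 2017, and the per-year values always add up to the original number of days.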

This is a Power Query based approach...
I started with this:
Then I added a custom column by clicking the Add Column tab and Custom Column button and completing the pop-up window like this:
...and clicking OK.
Then I changed the type for that new column by selecting it and then clicking the Transform tab and then Data Type and Date.
Then I added another custom column, completing the pop-up like this:
Then I added another custom column, completing the pop-up like this:
Then I added yet another custom column, completing the pop-up like this:
Then I expanded that last column I added by clicking on the expand icon at the top of the column and choosing Expand to New Rows.
Then I added a final custom column, completing the pop-up like this:
Finally, I grouped by the Event, DaysLost, Started, and Year columns and summed the DaysLostForYear column by clicking the Transform tab and Group By button and completing the pop-up like this:
I end up with this:
You might want a different grouping, but this should get you close. It shows how many days were lost in the years associated with each instance of an injury's total days lost. For instance, the first injury, which was 30 days in duration, started on 12/25/2016: 7 of those days occurred in 2016 and 23 in 2017. The second injury was 588 days, started on 8/6/2012: 148 days were in 2012, 365 in 2013, and 75 in 2014.
Note that I count the started date as a lost day.
Note also that I account for leap years.
I hope this helps.
Here's the query code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Event", type text}, {"DaysLost", Int64.Type}, {"Started", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Ended", each Date.AddDays([Started],[DaysLost]-1)),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Ended", type date}}),
#"Added Custom3" = Table.AddColumn(#"Changed Type1", "DaysYearStarted", each Number.From(Date.From(Text.From(Date.Year([Started]))&"/12/31")-[Started])+1),
#"Added Custom4" = Table.AddColumn(#"Added Custom3", "DaysYearEnded", each Number.From([Ended]-Date.From(Text.From(Date.Year([Ended])-1)&"/12/31"))),
#"Added Custom5" = Table.AddColumn(#"Added Custom4", "Year", each List.Numbers(Date.Year([Started]),Date.Year([Ended])-Date.Year([Started])+1)),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom5", "Year"),
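#"Added Custom1" = Table.AddColumn(#"Expanded Custom", "DaysLostForYear", each
// an injury that starts and ends in the same year loses all of its days in that year
if Date.Year([Started])=Date.Year([Ended]) then [DaysLost] else
if [Year]=Date.Year([Started]) then [DaysYearStarted] else
if [Year]=Date.Year([Ended]) then [DaysYearEnded] else
// full intermediate year; build a date so Date.IsLeapYear receives a date value
if Date.IsLeapYear(#date([Year],1,1)) then 366 else 365),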
#"Grouped Rows" = Table.Group(#"Added Custom1", {"Event", "DaysLost", "Started", "Year"}, {{"DaysLostForYear", each List.Sum([DaysLostForYear]), type number}})
in
#"Grouped Rows"
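As a cross-check of the numbers quoted above (7 and 23 days, then 148, 365 and 75 days), the same split can be sketched in a few lines of Python by clamping the injury interval to each calendar year. This is an alternative formulation rather than a translation of the query: the function name lost_days_by_year is made up for the example, and it assumes the same convention as the query (the start date counts as a lost day, so Ended = Started + DaysLost - 1).

```python
from datetime import date, timedelta


def lost_days_by_year(started, days_lost):
    """Overlap of [started, ended] with each calendar year, counted inclusively.

    Hypothetical helper for illustration only.
    """
    ended = started + timedelta(days=days_lost - 1)  # start day counts as lost
    result = {}
    for year in range(started.year, ended.year + 1):
        first = max(started, date(year, 1, 1))   # clamp to the start of the year
        last = min(ended, date(year, 12, 31))    # clamp to the end of the year
        result[year] = (last - first).days + 1
    return result


short_injury = lost_days_by_year(date(2016, 12, 25), 30)
long_injury = lost_days_by_year(date(2012, 8, 6), 588)
```

Because the interval is clamped on both sides, leap years and injuries that start and end in the same year need no special branch.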


How to speed up dynamic columns with formulas in Power Query

The Question (How do I make it faster)
I have been playing around with Power Query in Excel for over a year now but for the first time, I have a query that takes 20+ minutes to run.
I am sure there is something here I can learn!
While it does currently work I believe if it was well-written it would run much faster.
Data Structure
There are two databases here
Database of Company (Aka attendees) - About 400 rows
Company Title
Rita Book
Paige Turner
Dee End
etc
Database of Events - About 500 rows
An Event can have many Company (Attendees). The database exports this as a comma-separated list in the column [#"Export CSV - Company"]
Event Title  Export CSV - Company              Date      Year
Event 1      Rita Book, Dee End                1/1/2015  2015
Event 2      Paige Turner                      2/1/2015  2015
Event 3      Dee End                           3/1/2015  2015
Event 4      Rita Book, Paige Turner, Dee End  1/1/2016  2016
etc          ...                               ...       ...
Note that I also have a separate query called #"Company Event Count - 1 Years List" which is a list of all years that events have been run.
The Goal
For a visualization, I need to get the data into the following structure:
Company Title  2015  2016  etc
John Smith     10    20    ...
Jane Doe       5     14    ...
etc            ...   ...   ...
The Code
I have done my best to comment on my code below. Feel free to ask any questions.
let
// This is a function. It was the only way I could figure out how to use [Company Title] from #"Keep only names column" and "currentColumnTitleYearStr" from the dynamically created columns in the same scope
count_table_year_company = (myTbl, yearStr, companyStr) =>
Table.RowCount(
Table.SelectRows(
myTbl,
each Text.Contains([#"Export CSV - Company"], companyStr)
)
),
Source = #"Company 1 - Loaded CSV From Folder", // Grab a list of all Company
#"Keep only names column" = Table.SelectColumns(Source,{"Company Title"}), // Keep only the [Company Title] field
// Dynamically create columns for each year. Example Columns: [Company Title], [2015], [2016], [2017], etc
#"Add Columns for each year" =
List.Accumulate(
#"Company Event Count - 1 Years List", // The list of years; one column is added per year
#"Keep only names column",
(state, currentColumnTitleYearStr) => Table.AddColumn(
state,
currentColumnTitleYearStr, // The Year becomes the column title and is also used in filters
let // I hoped that filtering the table by Year at this point would mean it only has to do it once per column, instead of once per cell.
eventsThisYearTbl = Table.SelectRows(
#"Event 1 - Loaded CSV From Folder",
each ([Year] = Number.FromText(currentColumnTitleYearStr))
)
in(
// Finally for each cell, calculate the count of events. E.g How many events did 'John Smith' attend in 2015
each count_table_year_company(eventsThisYearTbl, currentColumnTitleYearStr, [Company Title]) //CompanyTitleVar
)
)
),
FinalStep = #"Add Columns for each year"
in
FinalStep
My Theories
I believe one of a few things may be making it slow:
I am using List.Accumulate to dynamically create a column for each year. While this does work, I think it may be the wrong function for the job, especially because the state value, which is like a running total of every cell, must be huge.
I worry that I have an 'each' where I don't need it, but I can't seem to remove any. It's my understanding that every 'each' is effectively a nested loop, so removing one may have a dramatic impact on performance.
In Conclusion
While it does currently work, I know there is something for me to learn here.
Thank you so much for any guidance or suggested readings you can provide :)
Does this do what you want? Converts from left to right. If not please explain more clearly
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
SplitNames = Table.TransformColumns(Source,{{"Names", each Text.Split(_,", ")}}),
#"Expanded Names" = Table.ExpandListColumn(SplitNames, "Names"),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Names",{"Event Title", "Date"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Count", each 1),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Added Custom", {{"Year", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Added Custom", {{"Year", type text}}, "en-US")[Year]), "Year", "Count", List.Sum)
in #"Pivoted Column"
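The reason this tends to be faster is that it makes a single pass over the events (split, expand, count, pivot) instead of filtering the whole events table once for every company/year cell. A minimal Python sketch of that shape, using the sample events from the question; the function names are made up for this illustration, and missing combinations are filled with 0 here, whereas Table.Pivot would leave them null:

```python
from collections import Counter

# Sample rows from the question: (Event Title, Export CSV - Company, Year)
events = [
    ("Event 1", "Rita Book, Dee End", 2015),
    ("Event 2", "Paige Turner", 2015),
    ("Event 3", "Dee End", 2015),
    ("Event 4", "Rita Book, Paige Turner, Dee End", 2016),
]


def count_by_company_and_year(rows):
    """Split the comma-separated names and count one attendance per name."""
    counts = Counter()
    for _title, companies, year in rows:
        for company in companies.split(", "):
            counts[(company, year)] += 1
    return counts


def pivot(counts):
    """Turn (company, year) counts into one row per company, one key per year."""
    years = sorted({year for _company, year in counts})
    companies = sorted({company for company, _year in counts})
    return {c: {y: counts.get((c, y), 0) for y in years} for c in companies}


table = pivot(count_by_company_and_year(events))
```

Each event row is touched once, so the work grows with the number of attendances rather than with companies times years times events.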

Translate Excel Formula to Power Query

In my Power Query I have a column that shows different durations for certain items, but it displays an error when I attempt to convert it to time or duration.
As a workaround, next to my Excel table I created a formula that converts the duration into the format I wish to use, but I have not been able to translate the formula into a language that Power Query can understand (I am pretty new to Power Query).
This is how the data is pulled from source:
But I would like it to show like this:
The Excel formula I am using to accomplish this is:
=IF(LEN([#Age])=7,"0"&[#Age],IF(LEN([#Age])=5,"00:"&[#Age],IF(LEN([#Age])=4,"00:0"&[#Age],IF(LEN([#Age])=3,"00:00"&[#Age],[#Age]))))
It would be nice to have it in Power Query instead of the Excel sheet, as it serves as a learning opportunity.
I am self-learning Power Query in Excel, so any help is welcome.
EDIT: In case the duration is more than 24:00:00, how would I approach it?
Here is the error code it returns
You can add a custom column with the formula:
Duration.FromText(
Text.Combine(
List.LastN(
{"00"} & List.ReplaceValue(Text.Split([Age],":"),"","00",Replacer.ReplaceValue),
3),
":"))
The formula:
Splits the text string by the colon into a list.
Replaces blanks with "00" and prepends a "00" element to the list.
Retrieves the last three elements and combines them into a colon-separated text string.
Uses the Duration.FromText function to convert that string to a duration.
Finally, set the data type of the column to duration.
In the PQ Editor, a duration will have the format of d.hh:mm:ss, but when you load it back into Excel, you can change that to [hh]:mm:ss
You can accomplish the above all in the PQ User Interface.
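For readers who find the nested list functions hard to follow, the padding steps can be sketched in Python. This is only an illustration of the logic, not Power Query code: the function name parse_age is made up, and it assumes every input contains at least one colon (for example ":45", "1:23", "12:34" or "1:02:03"), as the Excel formula in the question implies.

```python
from datetime import timedelta


def parse_age(age):
    """Pad a partial h:mm:ss text to three parts and convert it to a duration.

    Hypothetical helper; assumes the text contains at least one colon.
    """
    # Split on the colon and replace empty pieces with "00".
    parts = [piece if piece else "00" for piece in age.split(":")]
    # Prepend "00" and keep only the last three pieces: hours, minutes, seconds.
    hours, minutes, seconds = (int(p) for p in (["00"] + parts)[-3:])
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

For example, parse_age("12:34") is read as 0 hours, 12 minutes and 34 seconds, while parse_age("1:02:03") already has all three parts and the prepended "00" is simply dropped.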
Here is M-Code that does the same thing:
let
Source = Excel.CurrentWorkbook(){[Name="Table16"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Age", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Duration", each Duration.FromText(
Text.Combine(
List.LastN(
{"00"} & List.ReplaceValue(Text.Split([Age],":"),"","00",Replacer.ReplaceValue),
3),
":"))),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Age"})
in
#"Removed Columns"
You can even do it (using M-Code in the Advanced Editor) without adding a column by using the Table.TransformColumns function:
let
Source = Excel.CurrentWorkbook(){[Name="Table16"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Age", type text}}),
#"Change to Duration" = Table.TransformColumns(#"Changed Type",
{"Age", each Duration.FromText(
Text.Combine(
List.LastN(
{"00"} & List.ReplaceValue(Text.Split(_,":"),"","00",Replacer.ReplaceValue),
3),
":")), type duration})
in
#"Change to Duration"
All result in:
Edit
With your modified data, now showing duration values of more than 23 hours (not allowed in a duration literal in PQ), the transformation will be different. We have to check the hours and break it into days and hours if it is more than 23.
Note: the below edit also assumes there will never be anything entered in the day location; and that entries for minutes and seconds will always be within range. If there might be day values, you will need to just add what's there to the "overflow" from the hours entry
So we change the Custom Column formula to check for that:
let
split = List.LastN({"00","00"} & List.ReplaceValue(Text.Split([Age],":"),"","00",Replacer.ReplaceValue),4),
s = Number.From(List.Last(split)),
m = Number.From(List.LastN(split,2){0}),
hTotal = Number.From(List.LastN(split,3){0}),
h = Number.Mod(hTotal,24),
d = Number.IntegerDivide(hTotal,24)
in #duration(d,h,m,s)
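The overflow handling is just an integer division of the hour count by 24. A minimal Python sketch of the same steps, under the same assumptions as the note above (nothing entered in the day position, minutes and seconds in range); the function name split_age is made up for this illustration:

```python
from datetime import timedelta


def split_age(age):
    """Return (days, hours, minutes, seconds), carrying hours above 23 into days.

    Hypothetical helper; any value in the day position is ignored.
    """
    parts = [piece if piece else "00" for piece in age.split(":")]
    # Pad to four positions (day, hours, minutes, seconds) and keep the last four.
    parts = (["00", "00"] + parts)[-4:]
    h_total, m, s = int(parts[1]), int(parts[2]), int(parts[3])
    d, h = divmod(h_total, 24)  # whole days and the remaining hours
    return d, h, m, s


d, h, m, s = split_age("27:15:30")
as_duration = timedelta(days=d, hours=h, minutes=m, seconds=s)
```

Here "27:15:30" becomes 1 day, 3 hours, 15 minutes and 30 seconds, which is the same total length as 27 hours, 15 minutes and 30 seconds.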
If you might have illegal values for minutes or seconds, you can add logic to check for that also.
Also, if you will be loading this into Excel, and you might have total days >31, you will need to format it (in Excel) as [hh]:mm:ss, because with the format d.hh:mm:ss Excel cannot display more than 31 days (although the proper value will be stored in the cell).

How to extract months with data and find n-th value as starting point and n-th value as ending point in Excel Power Query, maybe VBA

I have a data set which consists of Date/Time, Pressure and a custom column. It represents pressure over time, and within each month I want to know my starting point (after 5 minutes) and my ending point (the next-to-last row). To help you out a bit, the measurements usually take roughly 30-40 minutes, as you can see in the example below, so the amount of data can vary.
The Time column is calculated using:
=([#[Date/Time]]-I5)*1440+L5
This data set represents whole data and all the months with values, and I need separated (filtered) months with these starting/ending points as on the screenshot. I used Power Query a lot to play with data, but maybe there is another method to obtain those values...and make them dynamic when possible for future data.
I will also upload my dummy workbook with whole data set (all the months), filter table with months if needed for your infos and test.
https://docs.google.com/spreadsheets/d/1LGl-eri6ewCni2NJ2wGeoYIf-40KO2Lr/edit?usp=sharing&ouid=101738555398870704584&rtpof=true&sd=true
In Power Query:
Based on your shared workbook and what you have written, it seems that for any given month, you
start the minute count after excluding the first entry in the month (if that is a typo/error, just remove the function that removes that first line), and
with that second entry = minute 0, return the first entry in or after minute 5 as well as the next to last entry in the table.
Edit: minor change in algorithm.
Note that I started with just the Date and Pressure columns
Algorithm
Add a column of monthYear
GroupBy monthYear
Custom aggregation to
Remove the first and last rows of the table
Create a list of durations in minutes of each time compared with the first time in month. This will be a minute + fraction of a minute
Add that list as a column to the original table
Determine the first entry in or after the fifth minute
Determine the last entry
Filter the month subtable to return those two entries.
If you want to see the result for just a given month, you can filter the result in the resultant Excel table.
M Code
please read the comments and examine the Applied Steps to better understand the algorithm
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date/Time", type datetime}, {"P7 [mbar]", Int64.Type}}),
//add month/year column for grouping
#"Added Custom" = Table.AddColumn(#"Changed Type", "month Year",
each Number.ToText(Date.Month([#"Date/Time"]),"00") & Number.ToText(Date.Year([#"Date/Time"]),"0000")),
#"Grouped Rows" = Table.Group(#"Added Custom", {"month Year"}, {
//elapsed minutes column
{"Elapsed Minutes", (x)=> let
//remove first and last rows from table
t=Table.RemoveColumns(Table.RemoveFirstN(Table.RemoveLastN(x)),"month Year"),
//add a column with the elapsed minutes
TableToFilter = Table.FromColumns(
Table.ToColumns(t)
& {List.Generate(
()=>[em=null, idx=0],
each [idx]< Table.RowCount(t),
each [em=Duration.TotalMinutes(t[#"Date/Time"]{[idx]+1} - t[#"Date/Time"]{0}), idx=[idx]+1],
each [em])}, type table[#"Date/Time"=datetime, #"P7 [mbar]"=number, elapsed=number]),
//filter for last entry (which would be next to last in the month)
maxMinute = List.Max(TableToFilter[elapsed]),
//filter for first entry in the 5th minute
fifthMinute = List.Select(TableToFilter[elapsed], each Number.IntegerDivide(_,1)>=5){0},
//select the 5th minute and the last row
FilteredTable = Table.SelectRows(TableToFilter, each [elapsed]=fifthMinute or [elapsed]=maxMinute)
in FilteredTable,type table[#"Date/Time"=datetime, #"P7 [mbar]"=number, elapsed=number]}
}),
//remove unneeded column and expand the others
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"month Year"}),
#"Expanded Elapsed Minutes" = Table.ExpandTableColumn(#"Removed Columns", "Elapsed Minutes", {"Date/Time", "P7 [mbar]"}, {"Date/Time", "P7 [mbar]"})
in
#"Expanded Elapsed Minutes"
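If the grouping logic is hard to follow in M, the same algorithm can be sketched in Python. This is only an illustration: the function name month_endpoints and the sample readings are invented for the example, and the input is assumed to be a list of (date/time, pressure) pairs already sorted by time.

```python
from datetime import datetime
from itertools import groupby


def month_endpoints(readings, threshold_minutes=5):
    """For each month, return (first row at or after minute 5, next-to-last row).

    Hypothetical helper; `readings` must be sorted by time.
    """
    result = {}
    by_month = groupby(readings, key=lambda r: (r[0].year, r[0].month))
    for month, group in by_month:
        rows = list(group)[1:-1]  # drop the first and last rows of the month
        if not rows:
            continue
        t0 = rows[0][0]  # the second entry of the month is minute 0
        elapsed = [(ts - t0).total_seconds() / 60 for ts, _p in rows]
        start = next(
            (row for row, em in zip(rows, elapsed) if em >= threshold_minutes),
            None,
        )
        result[month] = (start, rows[-1])
    return result


# Invented sample data: minutes past 10:00 on 1 January 2020 with a pressure value.
sample = [
    (datetime(2020, 1, 1, 10, minute), pressure)
    for minute, pressure in [(0, 1000), (1, 990), (3, 980), (5, 970),
                             (7, 960), (10, 950), (30, 940), (31, 930)]
]
endpoints = month_endpoints(sample)
```

In this sample the count starts at 10:01, so the first reading at or after minute 5 is the one at 10:07, and the ending point is the next-to-last reading of the month at 10:30.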
Results from your shared workbook data
In Office/Excel 365
Filter Column (eg for January 2020)
E4: 1/1/2020
E5: 1/1/2020
Results
F4 (date/time 5th minute): =IF(COUNTIFS(Table1[Date/Time],">="&E4,Table1[Date/Time],"<" & EDATE(E4,1))=0,"",
LET(x,FILTER(Table1[Date/Time],(Table1[Date/Time]>=E4)*(Table1[Date/Time]<EDATE(E4,1))),
y, (x-INDEX(x,2))*1440,
z, XMATCH(5,y,1),
INDEX(x,z,1)))
G4: (Pressure 5th minute): =IF(F4="","",
LET(x,FILTER(Table1,(Table1[Date/Time]>=E4)*(Table1[Date/Time]<EDATE(E4,1))),
y, (INDEX(x,0,1)-INDEX(x,2,1))*1440,
z, XMATCH(5,y,1),
INDEX(x,z,2)))
F5: (Date next to last): =IF(COUNTIFS(Table1[Date/Time],">="&E5,Table1[Date/Time],"<" & EDATE(E5,1))=0,"",
LET(x,FILTER(Table1[Date/Time],(Table1[Date/Time]>=E5)*(Table1[Date/Time]<EDATE(E5,1))),
INDEX(x,COUNT(x)-1)))
G5: (Pressure next to last):=IF(F5="","",
LET(x,FILTER(Table1,(Table1[Date/Time]>=E5)*(Table1[Date/Time]<EDATE(E5,1))),
INDEX(x,COUNT(INDEX(x,0,1))-1,2)))

Count active members between 2 dates PowerPivot DAX

Apologies if I make any errors, first time posting here!
I have a dataset that I've read into the Excel data model using PowerQuery, I've split this into 3 tables that I've linked through a unique ID field (so one main table with just the unique IDs and general info then two tables linked from it).
What I want to do is take one of the linked tables that looks like this:
ID      Start Date  End Date    Category
123456  01/01/2000  01/01/2001  A
I've created a separate date table, and what I want is a count of every active ID for each month of the date table. I managed this using CALCULATE and FILTER in a calculated column on the date table, but when I load that into the pivot it ignores the categories.
I tried relating the date table using the start date field of the other table but it didn't make any difference.
I've found tonnes of PowerBI solutions that involve calculated tables but being Excel based is a requirement.
Thanks in advance!
I'm afraid that to expand the date interval in Power Query we need to write a line of M code.
This is a small sample that creates a table with the same columns as the table in your question. I used different values to keep the example simple.
The idea is to expand each date interval into an M list containing every date in the interval, and then to use this list to create the new rows with the new column "Date".
The last step removes the "Start Date" and "End Date" columns.
This code can be directly pasted into the advanced query editor for a new blank query.
let
Source = #table(
type table
[
#"ID"=number,
#"Start Date"=date,
#"End Date"=date,
#"Category"=text
],
{
{1,#date(2020,1,1),#date(2020,1,2), "A"},
{2,#date(2020,1,10),#date(2020,1,12), "A"}
}
),
SourceWithList = Table.AddColumn(Source, "Date",
each List.Dates([Start Date], Duration.Days([End Date] - [Start Date]) + 1, #duration(1, 0, 0, 0))),
#"Expanded DateList" = Table.ExpandListColumn(SourceWithList, "Date"),
#"Removed Columns" = Table.RemoveColumns(#"Expanded DateList",{"Start Date", "End Date"})
in
#"Removed Columns"
The Source statement is just needed for the example, to create the starting table.
The SourceWithList is the M code to be written: it adds a column using the function Table.AddColumn(), and creates the new column using the function List.Dates().
This function requires the start date, the duration and the step interval.
The duration is computed with the function Duration.Days() that returns the difference between two dates as number of days.
To create the #"Expanded DateList" step it's possible to use the Power Query interface, clicking on "Expand to new rows" in the column menu. The screenshots I took are from Power BI, but the Power Pivot interface for Power Query is very similar.
Then remove the "Start Date" and "End Date" columns by selecting the column and clicking on "Remove Columns"
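To see why the expanded table makes the count easy, here is a minimal Python sketch of the same expansion followed by a distinct count of IDs per month and category. It uses the two sample rows from the query above; the function names are made up for this illustration, and the distinct count stands in for whatever measure you would define in the data model.

```python
from datetime import date, timedelta

# Sample rows from the query above: (ID, Start Date, End Date, Category)
rows = [
    (1, date(2020, 1, 1), date(2020, 1, 2), "A"),
    (2, date(2020, 1, 10), date(2020, 1, 12), "A"),
]


def expand_dates(rows):
    """One output row per ID per active day, like List.Dates plus expand."""
    expanded = []
    for id_, start, end, category in rows:
        n_days = (end - start).days + 1  # both end points are included
        for offset in range(n_days):
            expanded.append((id_, category, start + timedelta(days=offset)))
    return expanded


def active_ids_per_month(expanded):
    """Distinct IDs that were active on at least one day of each month."""
    active = {}
    for id_, category, day in expanded:
        active.setdefault((day.year, day.month, category), set()).add(id_)
    return {key: len(ids) for key, ids in active.items()}


expanded = expand_dates(rows)
monthly = active_ids_per_month(expanded)
```

The two sample intervals expand to five rows in total, and both IDs count as active in January 2020 for category A.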

Table.RemoveColumns based on the date

I am looking to insert a remove column step which removes any column where the header (which is a date) is before a certain date (older than X years prior to the current date). I receive a large data dump which is just a list of client names and fees they pay each month from 2012 to today, headed by the month they pay each fee, but as time goes on I don't need the oldest of the data.
So far I have tried producing a list from the headers (based on a previous response from another board member - thank you #horseyride!) and then removing the columns which don't meet the criteria from that list. However, it keeps breaking.
This is the latest line in the advanced Editor
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"Calendar Period", type text}}, "en-GB"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"Calendar Period", type text}}, "en-GB")[#"Calendar Period"]), "Calendar Period", "Approved Invoice Amount", List.Sum)
These are the lines I am attempting to create:
"ColumnList" = List.Select(Table.ColumnNames(#"Pivoted Column"), each Text.Contains(_, " ")),,
"Delete Columns"= Table.Transform(#"Pivoted Column", Table.RemoveColumns(#"ColumnList", each {})as table)
in
#"Delete Columns"
I can't seem to get the second bit of code right; that is what I believe it should look like for now. Essentially, I want the table to remove any columns whose header (a date) is more than X years older than today's date.
EDIT - Screenshot of before and after IF the desired cut off was Dec 2012:
Example Data
Thank you in advance
Just use the following code. For a static date:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
final = Table.SelectColumns(Source, List.Select(Table.ColumnNames(Source),
each try Date.From(_) >= #date(2012,12,1) otherwise true))
in
final
For dynamic date (older than 3 years prior to the current date):
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
final = Table.SelectColumns(Source, List.Select(Table.ColumnNames(Source),
each try Date.From(_) >= Date.AddYears(Date.From(DateTime.FixedLocalNow()),-3)
otherwise true))
in
final
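The key idea in both queries is the try ... otherwise true: a header that cannot be read as a date (such as the client name column) is always kept, and a header that is a date is kept only if it is on or after the cut-off. A minimal Python sketch of that rule follows. The function names and the "%Y-%m" header format are assumptions made for this illustration, since the real header format is not shown in the question.

```python
from datetime import date, datetime


def parse_header(name, fmt="%Y-%m"):
    """Return the header as a date, or None if it is not a date (assumed format)."""
    try:
        return datetime.strptime(name, fmt).date()
    except ValueError:
        return None


def columns_to_keep(names, cutoff, fmt="%Y-%m"):
    """Keep non-date headers and date headers on or after the cut-off."""
    kept = []
    for name in names:
        header_date = parse_header(name, fmt)
        if header_date is None or header_date >= cutoff:
            kept.append(name)
    return kept


def years_before(today, years):
    """Dynamic cut-off: the same day `years` years earlier (29 Feb falls back to 28 Feb)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


headers = ["Client", "2012-11", "2012-12", "2013-01"]
kept = columns_to_keep(headers, date(2012, 12, 1))
```

With a cut-off of 1 December 2012, the "2012-11" column is dropped and "Client", "2012-12" and "2013-01" are kept.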
