Table.RemoveColumns based on the date - excel

I am looking to insert a remove column step which removes any column where the header (which is a date) is before a certain date (older than X years prior to the current date). I receive a large data dump which is just a list of client names and fees they pay each month from 2012 to today, headed by the month they pay each fee, but as time goes on I don't need the oldest of the data.
So far I have tried producing a list from the headers (based on a previous response from another board member; thank you #horseyride!) and then removing the columns which don't meet the criteria from that list. However, it keeps breaking.
This is the latest line in the Advanced Editor:
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"Calendar Period", type text}}, "en-GB"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"Calendar Period", type text}}, "en-GB")[#"Calendar Period"]), "Calendar Period", "Approved Invoice Amount", List.Sum)
These are the lines I am attempting to create:
"ColumnList" = List.Select(Table.ColumnNames(#"Pivoted Column"), each Text.Contains(_, " ")),,
"Delete Columns"= Table.Transform(#"Pivoted Column", Table.RemoveColumns(#"ColumnList", each {})as table)
in
#"Delete Columns"
I can't seem to get the second bit of code right; that is what I believe it should look like for now. Essentially, I want the table to remove any column whose header (a date) is more than X years before today's date.
EDIT - Screenshot of before and after if the desired cut-off was Dec 2012:
Example Data
Thank you in advance

Just use the following code. For a static date:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Keep a column if its header is not a date, or is a date on or after the cut-off
    final = Table.SelectColumns(Source, List.Select(Table.ColumnNames(Source),
        each try Date.From(_) >= #date(2012,12,1) otherwise true))
in
    final
For a dynamic date (dropping columns older than 3 years before the current date):
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // The cut-off is today's date minus 3 years; non-date headers are always kept
    final = Table.SelectColumns(Source, List.Select(Table.ColumnNames(Source),
        each try Date.From(_) >= Date.AddYears(Date.From(DateTime.FixedLocalNow()),-3)
            otherwise true))
in
    final
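The selection rule used in both queries can be sketched outside Power Query as well. The following is a minimal Python illustration, not the answer's code: it assumes the headers are text in dd/mm/yyyy form (the real format is not shown in the question), and the names `parse_header`, `columns_to_keep` and `years_before` are made up for this sketch.

```python
from datetime import date, datetime

def parse_header(name, fmt="%d/%m/%Y"):
    """Return the header as a date, or None when it is not a date (assumed format)."""
    try:
        return datetime.strptime(name, fmt).date()
    except ValueError:
        return None

def columns_to_keep(headers, cutoff, fmt="%d/%m/%Y"):
    kept = []
    for name in headers:
        parsed = parse_header(name, fmt)
        # Non-date headers (e.g. the client name) are always kept,
        # mirroring "otherwise true" in the M code.
        if parsed is None or parsed >= cutoff:
            kept.append(name)
    return kept

def years_before(today, years):
    # Same calendar day N years earlier; 29 Feb falls back to 28 Feb.
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)

headers = ["Client", "01/11/2012", "01/12/2012", "01/01/2013"]
static_result = columns_to_keep(headers, date(2012, 12, 1))
dynamic_cutoff = years_before(date(2015, 12, 15), 3)
dynamic_result = columns_to_keep(headers, dynamic_cutoff)
```

With the static cut-off of 1 Dec 2012 the November column is dropped; with the dynamic cut-off computed from an example "today" of 15 Dec 2015, the December 2012 column is dropped as well.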

Related

Translate Excel Formula to Power Query

In my Power Query query I have a column that shows durations for certain items, but it returns an error when I try to convert it to a time or duration type.
As a workaround, next to my Excel table I created a formula that converts the duration into the format I want to use, but I have not been able to translate the formula into a language that Power Query can understand (I am pretty new to Power Query).
This is how the data is pulled from source:
But I would like it to show like this:
The Excel Formula I am using to accomplish this is:
=IF(LEN([#Age])=7,"0"&[#Age],IF(LEN([#Age])=5,"00:"&[#Age],IF(LEN([#Age])=4,"00:0"&[#Age],IF(LEN([#Age])=3,"00:00"&[#Age],[#Age]))))
It would be nice to have it in Power Query instead of the Excel sheet, as it serves as a learning opportunity.
I am teaching myself Power Query in Excel, so any help is welcome.
EDIT: If the duration is more than 24:00:00, how would I approach it?
Here is the error code it returns
You can add a custom column with the formula:
Duration.FromText(
Text.Combine(
List.LastN(
{"00"} & List.ReplaceValue(Text.Split([Age],":"),"","00",Replacer.ReplaceValue),
3),
":"))
The formula:
Splits the text string at the colons into a list.
Replaces blank elements with "00" and prepends a "00" element to the list.
Retrieves the last three elements and combines them into a colon-separated text string.
Uses the Duration.FromText function to convert that string to a duration.
Finally, set the data type of the column to duration.
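The padding steps listed above can be traced with a short Python sketch. It only mirrors the text manipulation (split, fill blanks, prepend, keep the last three parts); the function name `normalize_age` is hypothetical, and the conversion to an actual duration is left out.

```python
def normalize_age(age):
    # Split on colons; an entry such as ":45" yields an empty first part.
    parts = age.split(":")
    # Replace empty parts with "00".
    parts = [p if p != "" else "00" for p in parts]
    # Prepend one "00" and keep only the last three parts (hh, mm, ss).
    parts = (["00"] + parts)[-3:]
    return ":".join(parts)

examples = {age: normalize_age(age) for age in [":45", "05:45", "1:05:45"]}
```

Note that, like the M formula, this does not zero-pad single digits: an input of "1:05:45" is returned unchanged.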
In the PQ Editor, a duration will have the format of d.hh:mm:ss, but when you load it back into Excel, you can change that to [hh]:mm:ss
You can accomplish the above all in the PQ User Interface.
Here is M-Code that does the same thing:
let
Source = Excel.CurrentWorkbook(){[Name="Table16"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Age", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Duration", each Duration.FromText(
Text.Combine(
List.LastN(
{"00"} & List.ReplaceValue(Text.Split([Age],":"),"","00",Replacer.ReplaceValue),
3),
":"))),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Age"})
in
#"Removed Columns"
You can even do it (using M-Code in the Advanced Editor) without adding a column by using the Table.TransformColumns function:
let
Source = Excel.CurrentWorkbook(){[Name="Table16"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Age", type text}}),
#"Change to Duration" = Table.TransformColumns(#"Changed Type",
{"Age", each Duration.FromText(
Text.Combine(
List.LastN(
{"00"} & List.ReplaceValue(Text.Split(_,":"),"","00",Replacer.ReplaceValue),
3),
":")), type duration})
in
#"Change to Duration"
All result in:
Edit
With your modified data, now showing duration values of more than 23 hours (not allowed in a duration literal in PQ), the transformation will be different. We have to check the hours and break it into days and hours if it is more than 23.
Note: the edit below also assumes there will never be anything entered in the day position, and that entries for minutes and seconds will always be within range. If there might be day values, you will need to add them to the "overflow" from the hours entry.
So we change the Custom Column formula to check for that:
let
    // Pad to four parts (days, hours, minutes, seconds), replacing blanks with "00"
    split = List.LastN({"00","00"} & List.ReplaceValue(Text.Split([Age],":"),"","00",Replacer.ReplaceValue),4),
    s = Number.From(List.Last(split)),
    m = Number.From(List.LastN(split,2){0}),
    hTotal = Number.From(List.LastN(split,3){0}),
    // Hours beyond 23 overflow into whole days
    h = Number.Mod(hTotal,24),
    d = Number.IntegerDivide(hTotal,24)
in
    #duration(d,h,m,s)
If you might have illegal values for minutes or seconds, you can add logic to check for that also.
Also, if you will be loading this into Excel and the total might exceed 31 days, you will need to format the cells (in Excel) as [hh]:mm:ss, because with the format d.hh:mm:ss Excel cannot display more than 31 days (although the proper value will be stored in the cell).
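The overflow handling for hours above 23 can be checked with a small Python sketch. It follows the same assumptions as the edit above (nothing in the day position, minutes and seconds within range); `age_to_duration` is a hypothetical name, and `timedelta` stands in for the Power Query duration type.

```python
from datetime import timedelta

def age_to_duration(age):
    # Replace blank parts with "00" and pad to four parts (d, h, m, s).
    parts = [p if p != "" else "00" for p in age.split(":")]
    parts = (["00", "00"] + parts)[-4:]
    seconds = int(parts[-1])
    minutes = int(parts[-2])
    total_hours = int(parts[-3])
    # Hours beyond 23 are carried into whole days.
    days, hours = divmod(total_hours, 24)
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

sample = age_to_duration("30:15:10")
```

Here "30:15:10" becomes 1 day, 6 hours, 15 minutes and 10 seconds.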

Comparing Cells in a row to see if adjacent cells have same values

Problem: My maximum range is around 10,000 rows x 365 columns, and I want to compare cell values across each row.
Conditions:
It has to return how many times a name is repeated in each row, for every primary key.
If a name appears only once in a row, it need not be shown; anything that appears two or more times should be displayed.
It has to exclude blank cells, and once it encounters "Dispatched" it need not count further.
Requirement: Any solution either excel or macro would do.
Sample Excel File
| Bag Number | 8th July | 9th July | 10th July | 11th July | 12th July | 13th July |
| 20/F/43352/1 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH |
| 20/F/43352/2 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH |
| 20/F/43352/3 | FINAL POLISH | QC | Dispatched | Dispatched | Dispatched | Dispatched |
| 20/F/43352/4 | Casting | Casting | Laser Cutting | Filing | Filing | FINAL POLISH |
| 20/F/43352/5 | Casting | | | | | |
| 20/F/43352/6 | Casting | Casting | FINAL POLISH | Dispatched | | |
| 20/F/43352/7 | FILING | FILING | FILING | FINAL POLISH | FINAL POLISH | FINAL POLISH |
The Output for the same should be
Bags
Casting
Filing
Final Polish
Dispatched
20/F/43347/1
3days
3 days
Yes
20/F/43347/2
3days
3 days
Yes
20/F/43347/3
2 days
3days
3 days
Yes
Background
Until very recently this process was manual: once the spreadsheet was made, it would be divided among 3 people, who would manually scan it, highlight items and proceed.
I tried a row-wise COUNTIF, but that only reduces 365 columns to 12 columns and leaves behind lots of unnecessary values (if an item is in a station for only 1 day, it need not be highlighted).
I tried a pivot table, but it did not give a summary that makes sense.
VBA is not my strong suit, so I haven't tried anything there.
I am looking for something that will help make sense of this and highlight if any product is stuck anywhere.
Hi all, to answer all queries,
#braX I have tried COUNTIF with the department names, but the resulting table is unwieldy for my requirement. I am looking for ideas to solve this.
#DavidWooley-AST there are a total of 12 departments, and the data is kept for an entire year; a primary key can take 45 days or more to go through every department.
Also, in case of any rework there is a chance of a revisit to a department, so that data also has to be captured. Sorry, I should have mentioned this before.
You can create the output you show using Power Query, available in Windows Excel 2010+ and Office 365.
The below should get you started.
You will have to add some lines in the Table.Group Aggregation list for other tasks.
You may also need to add code to exclude non-repeats and anything after "Dispatched", but you showed no examples of that in your data or results, so I did not code anything for that.
I also don't know what you mean by "highlight if any product is stuck anywhere".
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
//Replace table name in next line with the "real" table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
//unpivot all except the "Bag Number" to => a three column table
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Bag Number"}, "Attribute", "Value"),
//remove unneeded Attribute column (the dates)
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Group by Bag Number
// then extract the Count for each type
// Add " days" to each count
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Bag Number"}, {
{"Filing", (t)=> "Filing " & Text.From(List.Count(List.Select(t[Value],each _ = "FILING"))) & " days"},
{"Final Polish", (t)=> "Final Polish " & Text.From(List.Count(List.Select(t[Value],each _ = "FINAL POLISH"))) & " days"}
}),
//Merge columns with commas (and hyphen for the first to the rest) to get final format
#"Merged Columns" = Table.CombineColumns(#"Grouped Rows",{"Filing", "Final Polish"},
Combiner.CombineTextByDelimiter(", ", QuoteStyle.None),"Merged"),
#"Merged Columns1" = Table.CombineColumns(#"Merged Columns",{"Bag Number", "Merged"},
Combiner.CombineTextByDelimiter(" - ", QuoteStyle.None),"A")
in
#"Merged Columns1"
Edit based on your new example of data and desired output
Given your new example, you can get the output from PQ as shown below.
Note that you can add the other departments using the same syntax as shown for those done (except for Dispatched which is treated differently).
M Code
let
//Replace table name in next line with the "real" table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
//unpivot all except the "Bag Number" to => a three column table
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Bag Number"}, "Attribute", "Value"),
//remove unneeded Attribute column (the dates)
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Change to proper case for consistency and text matching
properCase = Table.TransformColumns(#"Removed Columns",{{"Value", Text.Proper, type text}}),
//Group by Bag Number
// then extract the Count for each type
// Show null if count < 2
// Add " days" to each count
// Show only `Dispatched` if it occurs one or more times
#"Grouped Rows" = Table.Group(properCase, {"Bag Number"}, {
{"Casting", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Casting"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Laser Cutting", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Laser Cutting"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Filing", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Filing"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Final Polish", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Final Polish"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"QC", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Qc"))
in
if x < 2 then null else Number.ToText(x) & " days", type text},
{"Dispatched", (t)=>
let
x =List.Count(List.Select(t[Value], each _ = "Dispatched"))
in
if x = 0 then null else "Dispatched", type text}
})
in
#"Grouped Rows"
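The grouping rule in the query above (count each department per bag, show nothing for fewer than 2 days, and show only a flag for Dispatched) can be sketched in Python for a single bag. This is an illustration of the rule rather than a translation of the M code; `DEPARTMENTS` and `summarize_bag` are hypothetical names, and `str.title()` plays the role of Text.Proper, which is why "QC" is matched as "Qc".

```python
from collections import Counter

# Department names after proper-casing; "QC" becomes "Qc", as in the M code.
DEPARTMENTS = ["Casting", "Laser Cutting", "Filing", "Final Polish", "Qc"]

def summarize_bag(values):
    # Ignore blanks and normalise case before counting.
    counts = Counter(v.strip().title() for v in values if v and v.strip())
    summary = {}
    for dept in DEPARTMENTS:
        n = counts[dept]
        # A single day in a department is not reported.
        summary[dept] = f"{n} days" if n >= 2 else None
    # Dispatched is only flagged, never counted in days.
    summary["Dispatched"] = "Dispatched" if counts["Dispatched"] > 0 else None
    return summary

bag4 = summarize_bag(["Casting", "Casting", "Laser Cutting", "Filing", "Filing", "FINAL POLISH"])
bag3 = summarize_bag(["FINAL POLISH", "QC", "Dispatched", "Dispatched", "Dispatched", "Dispatched"])
```

Like the query, this sketch does not stop counting after the first "Dispatched"; that extra condition from the question would still have to be added.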

Using VBA to calculate latest dates of certain tasks?

I am using a spreadsheet to log the completed and in-progress tasks of a project. I want to write some VBA code that can identify the latest delivery date within a task. However, each task contains various sub tasks.
So the boundaries are the tasks, which are whole numbers, and in between these whole numbers (e.g. 46 and 47) are the sub tasks.
The latest date needs to be calculated by examining the dates of the sub tasks between each pair of whole numbers, e.g. 46.1, 46.2, 46.3 etc.
Would I be better off using Excel functions, or would it be easier to use code?
For example, this is the Excel formula I would write from VBA:
Worksheets("Activity Overview").Cells(n, "E").Value = "=IFERROR(IF(AGGREGATE(14,7,'Sub Tasks'!S:S/(('Sub Tasks'!A:A>='Activity Overview'!A" & n & ")*('Sub Tasks'!A:A<'Activity Overview'!A" & n + 1 & ")),1),AGGREGATE(14,7,'Sub Tasks'!S:S/(('Sub Tasks'!A:A>='Activity Overview'!A" & n & ")*('Sub Tasks'!A:A<'Activity Overview'!A" & n + 1 & ")),1),""""),"""")"
Use MAXIFS():
=MAXIFS(B:B,A:A,">="&E1,A:A,"<"&E1+1)
If one does not have MAXIFS then use AGGREGATE:
=AGGREGATE(14,7,$B$1:$B$6/(($A$1:$A$6>=E1)*($A$1:$A$6<E1+1)),1)
AGGREGATE is an array type formula and as such the references should be limited to the data range.
Here's a solution using SUMPRODUCT. It basically filters all values between the base value (>=49) and less than the base value plus one (<50).
You can also do this using Power Query aka Get & Transform available in Excel 2010+
Get Data from Table/Range
Change the Date column to Date format (it defaults to DateTime)
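Create a custom column holding the integer part of the Task column and name it Main Task. Use Number.RoundDown here rather than Int64.From, because Int64.From rounds to the nearest integer, so a sub task such as 46.6 would end up under 47.
=Number.RoundDown([Task])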
Delete the original Task column
Group By the Main Task column
New Column Name Latest Date
Operation: Max
Column: Target Delivery Date
If there are any nulls in the Latest Date Column (from tasks with no delivery dates), you can either leave them, or filter them out.
Data
Results
M-Code
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Task", type number}, {"Target Delivery Date", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Main Task", each Number.RoundDown([Task])),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Main Task"}, {{"Latest Date", each List.Max([Target Delivery Date]), type date}}),
#"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Latest Date] <> null))
in
#"Filtered Rows"
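The same grouping idea (take the whole-number part of each task and keep the latest delivery date per group, skipping missing dates) can be sketched in Python. The row layout of (task number, date or None) and the name `latest_date_per_main_task` are assumptions made for this illustration.

```python
import math
from datetime import date

def latest_date_per_main_task(rows):
    latest = {}
    for task, delivery in rows:
        # Tasks with no delivery date are skipped, like the null filter in the query.
        if delivery is None:
            continue
        # 46.1, 46.2, ... all belong to main task 46.
        main = math.floor(task)
        if main not in latest or delivery > latest[main]:
            latest[main] = delivery
    return latest

rows = [
    (46.1, date(2020, 1, 5)),
    (46.6, date(2020, 2, 1)),
    (47.0, None),
    (47.2, date(2020, 1, 20)),
]
result = latest_date_per_main_task(rows)
```

The sample dates are invented; with them, main task 46 resolves to 1 Feb 2020 and main task 47 to 20 Jan 2020.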

excel formula with multiple criteria (match and index?)

I have a table with following structure and it shows calendar entries:
| Title | Description | StartTime | EndTime | User |
I want to create a new table with the following structure, showing all users and their plans for the dates given in the first row:
| User | Date1 | Date2 | Date3 | …
My problem is something like this:
In the second table, I want to show the title of each row whose start and end dates enclose Date1 (or Date2, ...). So I need an Excel formula which I can write in all cells.
I could write a SQL statement like this (I know its syntax is not correct, but I want to show what I need):
SELECT Title
FROM Table1, Table2
WHERE Date1 > StartDate AND Date1 < EndDate and User.Table1 = User.Table2
Can you please help me?
Can't think of a simple way to do this.
First of all, how do you plan to display it if there are two titles that fall under the same date segment for the same user?
To me this looks like an effort to reverse engineer a summary table into a more detailed one: you will need to list out the individual dates (filling in all the missing data), after which a simple pivot will do the job.
First you will need to keep only ONE date field, then populate all the dates in between start and end date.
From this:
*listing two titles - a and b for user ak to illustrate the problem where one user has multiple titles appearing within the same date segment.
To this: - populating all the dates where the title will appear
Then just pivot the new range to get this:
Instead of the title being listed out, we can see on which dates it occurred. Copy and paste the pivot as values, then replace the title count "1" with the title name "a" to get the result below:
Assuming you would want the title concatenated by user, just copy the blue part, and get the end result below:
Do you have Power Query? If you have Excel 2016, it is built in (as Get & Transform); for previous versions you can download it as a free add-in.
Go to Data
Select From Table/Range
ok
The Query Editor will appear; there you can:
Change data type to "Date"
Go to Add Column
Under the Date options, select "Subtract Days"
Fix the negative results by changing the formula to Duration.Days([End] - [Start])
Add a custom column with the formula List.Dates([Start],[Subtraction]+1,#duration(1,0,0,0))
Click the double arrow in the corner of the new column header and choose "Expand to New Rows"
Select and delete Columns that you won't need
Go to Transform
Click "Pivot Column"
In "Advanced Options" select "Don't aggregate"
ok
Go to Home and select "Close & Load"
Finally you get a new sheet with the new information.
You can add some filters to see a specific period of time...
The nice thing about this is that you can append as much data as you want; a simple right click and Refresh on the green table will then update the result.
This is the query if you just want to copy and paste in the "Advanced Editor"
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Title", type text}, {"Start", type date}, {"End", type date}, {"User", type text}}),
#"Inserted Date Subtraction" = Table.AddColumn(#"Changed Type", "Subtraction", each Duration.Days([End] - [Start])),
#"Added Custom" = Table.AddColumn(#"Inserted Date Subtraction", "Days", each List.Dates([Start],[Subtraction]+1,#duration(1,0,0,0))),
#"Expanded Days" = Table.ExpandListColumn(#"Added Custom", "Days"),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Days",{"Start", "End", "Subtraction"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"Days", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"Days", type text}}, "en-US")[Days]), "Days", "Title")
in
#"Pivoted Column"
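The expand-then-pivot idea in this query can be illustrated with a short Python sketch: every calendar entry is expanded to one record per day from Start to End inclusive, and the records are then arranged by user and date. The tuple layout (title, start, end, user) and the name `expand_and_pivot` are assumptions for this example. Unlike a pivot without aggregation, the sketch collects titles into a list, so two entries for the same user on the same day are both kept.

```python
from datetime import date, timedelta

def expand_and_pivot(entries):
    plan = {}
    for title, start, end, user in entries:
        # Number of days covered, counting both the start and the end date.
        n_days = (end - start).days + 1
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            plan.setdefault(user, {}).setdefault(day, []).append(title)
    return plan

entries = [
    ("a", date(2020, 1, 1), date(2020, 1, 3), "ak"),
    ("b", date(2020, 1, 3), date(2020, 1, 4), "ak"),
]
plan = expand_and_pivot(entries)
```

In this invented example, user "ak" has title "a" on 1 to 3 January and title "b" on 3 to 4 January, so 3 January carries both titles.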

DAX Measure to calculate number of lost days in different year from total number of days

I am trying to calculate the number of days falling in a particular year, based on a calendar table that I have created.
For example, I have 3 columns:
Event, number of days, and the date when the event started
Event DaysLost
Injury 30 25/12/2016
Injury 588 06/08/2012
Days in 2016 - 6
Days in 2017 - 24
For the second case:
Days in 2012 - 146
Days in 2013 - 365
Days in 2014 - 77
In the first case there are only 6 days which need to be counted in 2016, and the rest of the days should automatically be counted in 2017. But I cannot figure out how to do it.
In my output I would like to put the years in one column and the days lost for each year next to that particular year.
I have a calendar table and I want the sum of days to populate for a particular year.
I tried calculating it by getting the end date (adding the number of days to the first start date) and then, if the days exceeded the remaining days in that year, subtracting the remaining days from the total and moving the rest to the next year. But I cannot figure out how to keep carrying days over to following years when the period extends across many years, and then list them afterwards.
Sept 4, 2017
Please see the excel solution below
Excel solution of the problem
0) Importing the data from your Excel screenshot into Power BI results in this.
1) Create a new column in that table using the following formula for end date (to help with future formulas).
EndDate = Injuries[First Start Date] + Injuries[Days]
You stated that you have a calendar table, so you can skip to step 3
2) Create a new table by clicking on Modeling -> New Table and entering the following formula. This gives a single column table with a list of years.
Years = GENERATESERIES(2000, 2020, 1)
3) Create another new table using the following formula. This gives a table with all of the fields from the initial data table cross joined with the Years table that was just created. The formula also filters the resulting table to only return rows where the value in the Year column is between the year of the First Start Date and the year of the EndDate (First Start Date plus Days). To learn more about the CROSSJOIN function, check out its documentation.
InjuriesByYear = FILTER(
CROSSJOIN(Years, Injuries),
Years[Year] >= Injuries[First Start Date].[Year] &&
Years[Year] <= Injuries[EndDate].[Year]
)
4) Create relationships from the InjuriesByYear table back to the initial data table and the Year table. This will help facilitate nicer reporting efforts.
5) In the InjuriesByYear table, create a new column by clicking on Modeling -> New Column and entering the following formula. The first IF checks if all of the days lost are in a single year. The second IF handles when the days are spread across multiple years, with the True clause handling the first year, and the False clause handling all other years.
DayPerYear = IF(
InjuriesByYear[Year] = InjuriesByYear[First Start Date].[Year] && InjuriesByYear[Year] = InjuriesByYear[EndDate].[Year], InjuriesByYear[Days],
IF(
InjuriesByYear[Year] = InjuriesByYear[First Start Date].[Year], DATEDIFF(InjuriesByYear[First Start Date], DATE(InjuriesByYear[First Start Date].[Year], 12, 31), DAY),
DATEDIFF(DATE(InjuriesByYear[Year], 1, 1), MIN(InjuriesByYear[EndDate], DATE(InjuriesByYear[Year], 12, 31)), DAY) + 1
)
)
6) To test it all out, create a pivot table configured as shown below. Following these steps, the pivot table should match your Excel solution.
This is a Power Query based approach...
I started with this:
Then I added a custom column by clicking the Add Column tab and Custom Column button and completing the pop-up window like this:
...and clicking OK.
Then I changed the type for that new column by selecting it and then clicking the Transform tab and then Data Type and Date.
Then I added another custom column, completing the pop-up like this:
Then I added another custom column, completing the pop-up like this:
Then I added yet another custom column, completing the pop-up like this:
Then I expanded that last column I added by clicking on the expand icon at the top of the column and choosing Expand to New Rows.
Then I added a final custom column, completing the pop-up like this:
Finally, I grouped by the Event, DaysLost, Started, and Year columns and summed the DaysLostForYear column by clicking the Transform tab and Group By button and completing the pop-up like this:
I end up with this:
You might want a different grouping, but this should get you close. It shows how many days were lost in the years associated with each instance of an injury's total days lost. For instance, the first injury, which was 30 days in duration, started on 12/25/2016: 7 of those days occurred in 2016 and 23 in 2017. The second injury was 588 days, started on 8/6/2012: 148 days were in 2012, 365 in 2013, and 75 in 2014.
Note that I count the started date as a lost day.
Note also that I account for leap years.
I hope this helps.
Here's the query code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Event", type text}, {"DaysLost", Int64.Type}, {"Started", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Ended", each Date.AddDays([Started],[DaysLost]-1)),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Ended", type date}}),
#"Added Custom3" = Table.AddColumn(#"Changed Type1", "DaysYearStarted", each Number.From(Date.From(Text.From(Date.Year([Started]))&"/12/31")-[Started])+1),
#"Added Custom4" = Table.AddColumn(#"Added Custom3", "DaysYearEnded", each Number.From([Ended]-Date.From(Text.From(Date.Year([Ended])-1)&"/12/31"))),
#"Added Custom5" = Table.AddColumn(#"Added Custom4", "Year", each List.Numbers(Date.Year([Started]),Date.Year([Ended])-Date.Year([Started])+1)),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom5", "Year"),
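#"Added Custom1" = Table.AddColumn(#"Expanded Custom", "DaysLostForYear", each
// If the whole period falls inside a single year, all days belong to that year
if Date.Year([Started])=Date.Year([Ended]) then [DaysLost] else
if [Year]=Date.Year([Started]) then [DaysYearStarted] else
if [Year]=Date.Year([Ended]) then [DaysYearEnded] else
// Date.IsLeapYear expects a date, so build one from the year number
if Date.IsLeapYear(#date([Year],1,1)) then 366 else 365),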
#"Grouped Rows" = Table.Group(#"Added Custom1", {"Event", "DaysLost", "Started", "Year"}, {{"DaysLostForYear", each List.Sum([DaysLostForYear]), type number}})
in
#"Grouped Rows"
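The per-year split can be cross-checked with a few lines of Python. This sketch follows the convention of the Power Query answer (the start date counts as a lost day, so the last lost day is the start date plus DaysLost minus 1); the function name `days_lost_per_year` is hypothetical.

```python
from datetime import date, timedelta

def days_lost_per_year(started, days_lost):
    # Last lost day, counting the start date itself as day 1.
    ended = started + timedelta(days=days_lost - 1)
    result = {}
    for year in range(started.year, ended.year + 1):
        # Clip the period to the calendar year being counted.
        first = max(started, date(year, 1, 1))
        last = min(ended, date(year, 12, 31))
        result[year] = (last - first).days + 1
    return result

first_injury = days_lost_per_year(date(2016, 12, 25), 30)
second_injury = days_lost_per_year(date(2012, 8, 6), 588)
```

For the two injuries in the question this gives 7 and 23 days for 2016 and 2017, and 148, 365 and 75 days for 2012 to 2014, matching the figures quoted in this answer. Clipping each year to its own start and end also covers leap years and periods that stay within one year.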
