Overlapping Time Series Data in Power Query - excel

Hello all you Power Query wizards,
I have a similar question to this question: Timeseries with overlapping timeframes, using just the most recent in Excel Power Query, except my column isn't just a date column, but instead a date/time column. I am bringing together a directory of files that look like this and have overlapping times but I only want to keep the newer data instead of combining them together:
List A
List B
Does anyone have a strategy to accomplish this goal, or is this something I should do outside of Power Query, such as in Python?
Many thanks in advance for any insight you can provide!
let
Source = Folder.Files("C:\Users\xxxx\OneDrive\Documents\Atom Projects\10MinOrtho\2. Orthometric\2021-06\10MinOrthos"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File (2)", each #"Transform File (2)"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File (2)"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File (2)", Table.ColumnNames(#"Transform File (2)"(#"Sample File (2)"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type date}, {"Column2", type time}, {"Column3", type number}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Source.Name"}),
#"Merged Date and Time" = Table.CombineColumns(#"Removed Columns", {"Column1", "Column2"}, (columns) => List.First(columns) & List.Last(columns), "Merged"),
#"Sorted Rows" = Table.Sort(#"Merged Date and Time",{{"Merged", Order.Ascending}})
in
#"Sorted Rows"

You don't describe exactly what you want to do with the overlapping times.
I suggest the following:
Remove the entries from List A that fall in the overlap region with List B.
This can be done with a simple filter based on the first time listed in List B.
I have assumed that List B is sorted by date/time. If it is not, a minor code change will be required.
Then append the two lists.
M Code
let
Source = Excel.CurrentWorkbook(){[Name="ListA"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date/Time", type datetime}, {"Value", type number}}),
Source2 = Excel.CurrentWorkbook(){[Name="ListB"]}[Content],
#"Changed Type2" = Table.TransformColumnTypes(Source2,{{"Date/Time", type datetime}, {"Value", type number}}),
//overlap starts at the first date from the second list
overlapStart = #"Changed Type2"[#"Date/Time"]{0},
//Filter list A to end before start time in List B
filteredA = Table.SelectRows(#"Changed Type", each [#"Date/Time"] < overlapStart),
//now combine the two lists
combLists = Table.Combine({filteredA,#"Changed Type2"})
in
combLists
Lists A & B
Combined
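If List B is not sorted, the "minor code change" mentioned above could look like the following sketch, which takes the earliest Date/Time in List B with List.Min instead of reading the first row. The inline tables are hypothetical sample data standing in for ListA and ListB, and the final sort is an optional addition that is not part of the original answer.

```powerquery
let
    // Hypothetical sample data; replace with the ListA and ListB tables
    ListA = #table(
        {"Date/Time", "Value"},
        {
            {#datetime(2021, 6, 1, 0, 0, 0), 1},
            {#datetime(2021, 6, 1, 0, 10, 0), 2},
            {#datetime(2021, 6, 1, 0, 20, 0), 3}
        }),
    // List B is deliberately out of order
    ListB = #table(
        {"Date/Time", "Value"},
        {
            {#datetime(2021, 6, 1, 0, 20, 0), 30},
            {#datetime(2021, 6, 1, 0, 10, 0), 20},
            {#datetime(2021, 6, 1, 0, 30, 0), 40}
        }),
    // Earliest time in List B, regardless of row order
    overlapStart = List.Min(ListB[#"Date/Time"]),
    // Keep only the List A rows that come before the overlap
    filteredA = Table.SelectRows(ListA, each [#"Date/Time"] < overlapStart),
    // Append, then sort so the result is in time order
    combLists = Table.Sort(
        Table.Combine({filteredA, ListB}),
        {{"Date/Time", Order.Ascending}})
in
    combLists
```

With this sample, only the 00:00 row of List A survives; the 00:10 and 00:20 readings come from List B.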

Related

How to group data from a column into rows based on prefix in Excel

I have a set of data in an Excel column, for example 1000_1.jpg, 1000_2.jpg, 1001_1.jpg ... I am looking to convert this data into rows based on the prefix of each file name, i.e. 1000, 1001, etc.
I have tried the formula given by @Tom in "how to group data from a column into rows based on content", but it only works on a small data set; I tested it on 10,000 rows. When I test it on the whole Excel sheet, the same formula returns 0.
I am attaching excel file link here: https://drive.google.com/file/d/1vfEFh2idNpB_gMiMWPhXY2JTsAALtxS0/view?usp=sharing
The expected result is the same as in the referenced question.
This can be done quite quickly with Power Query.
I'm not so good at it, and it can probably be done more elegantly, but it works like this:
let
Source = Table.FromColumns({Lines.FromBinary(File.Contents("D:\OneDrive\Desktop\images names.csv"))}),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Images Names", type text}}),
#"Inserted Text Between Delimiters" = Table.AddColumn(#"Changed Type", "Text Between Delimiters", each Text.BetweenDelimiters([Images Names], "_", "."), type text),
#"Changed Type1" = Table.TransformColumnTypes(#"Inserted Text Between Delimiters",{{"Text Between Delimiters", Int64.Type}}),
#"Inserted Text Before Delimiter" = Table.AddColumn(#"Changed Type1", "Text Before Delimiter", each Text.BeforeDelimiter([Images Names], "_"), type text),
#"Sorted Rows" = Table.Sort(#"Inserted Text Before Delimiter",{{"Text Before Delimiter", Order.Ascending}, {"Text Between Delimiters", Order.Ascending}}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Sorted Rows", {{"Text Between Delimiters", type text}}, "de-DE"), List.Distinct(Table.TransformColumnTypes(#"Sorted Rows", {{"Text Between Delimiters", type text}}, "de-DE")[#"Text Between Delimiters"]), "Text Between Delimiters", "Images Names"),
#"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([2] <> null)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Text Before Delimiter"})
in
#"Removed Columns"
82505 rows and 40 columns
I uploaded the file here:
https://1drv.ms/x/s!AncAhUkdErOkgvInwjmn2ETvY0-ysA?e=Rnpxp1
If you try to load it from Power Query into Excel, it might not work, because the original name list is not available. But you can see how I did it, and if you want, you can just download the pasted values (in case you only need to do this job once, it is already done).
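For readers who only need the grouping idea, here is a minimal sketch that uses Table.Group on the prefix instead of a pivot. This is a different technique from the query above, not the answerer's method; the three file names come from the question's example, and the step names are made up for illustration.

```powerquery
let
    // Sample rows taken from the question's example
    Source = #table(
        {"Images Names"},
        {{"1000_1.jpg"}, {"1000_2.jpg"}, {"1001_1.jpg"}}),
    // Everything before the first underscore is the prefix
    AddedPrefix = Table.AddColumn(
        Source, "Prefix",
        each Text.BeforeDelimiter([Images Names], "_"), type text),
    // One row per prefix, with its file names joined by a semicolon
    Grouped = Table.Group(
        AddedPrefix, {"Prefix"},
        {{"Files", each Text.Combine([Images Names], ";"), type text}})
in
    Grouped
```

This yields one row for prefix 1000 holding "1000_1.jpg;1000_2.jpg" and one row for prefix 1001 holding "1001_1.jpg". The Files column can then be split into separate columns on the semicolon if one file per cell is needed.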

I am trying to create two separate columns from one column. I have a cost type column that has actual and forecast as types.

As shown in the image above, I am trying to separate actuals and forecast into columns. Is there a way to transform a table as shown in the image?
It's filtering and combining.
Use the drop-down at the top of the cost type column to filter for "actual". Remove the cost type column. Rename the amount column to actuals.
Use the drop-down at the top of the cost type column to filter for "forecast". Remove the cost type column. Rename the amount column to forecast.
Combine the two data sets.
Sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Filtered Rows" = Table.SelectRows(Source, each ([cost type] = "actual")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"cost type"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"amount", "actuals"}}),
#"Filtered Rows2" = Table.SelectRows(Source, each ([cost type] = "forecast")),
#"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows2",{"cost type"}),
#"Renamed Columns2" = Table.RenameColumns(#"Removed Columns2",{{"amount", "forecast"}}),
combined = Table.Combine({#"Renamed Columns" , #"Renamed Columns2"})
in combined
An alternate data structure, which makes more sense to me, puts actual and forecast side by side on the same row:
Code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"project id", type text}, {"title", type text}, {"manager", type text}, {"dept", type text}, {"Subproject id", type text}, {"effective date", type text}, {"cost type", type text}, {"amount", Int64.Type}}),
#"Pivoted Column" = Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[#"cost type"]), "cost type", "amount", List.Sum)
in #"Pivoted Column"
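To see what the pivot produces without the workbook, here is a small sketch on made-up data. Only the column names "project id", "cost type" and "amount" come from the code above; the rows are hypothetical.

```powerquery
let
    // Hypothetical rows; P2 has no forecast entry
    Source = #table(
        {"project id", "cost type", "amount"},
        {
            {"P1", "actual", 100},
            {"P1", "forecast", 120},
            {"P2", "actual", 50}
        }),
    // Each distinct cost type becomes a column; amounts are summed per project
    Pivoted = Table.Pivot(
        Source,
        List.Distinct(Source[#"cost type"]),
        "cost type", "amount", List.Sum)
in
    Pivoted
```

The result has one row per project with an actual and a forecast column. P2 gets null under forecast because it has no forecast row.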

Compare columns return maximum power query

I have data from multiple suppliers which I wish to compare. The data shown in the image below has been previously transformed via a series of steps using power query. The final step was to pivot the Supplier column (in this example consisting of X,Y,Z) so that these new columns can be compared and the maximum value is returned.
How can I compare the values in columns X, Y and Z to do this? Importantly, X, Y and Z aren't necessarily the only suppliers. If I add, say, A as a new supplier to the original data, a new column A will be generated, and I wish to include it in the comparison so that a column at the end outputs the highest value found for each row. So reading from the top down, it would read in this example: 3, 3, 1, 1, 5, 0.04, 10, etc.
Thanks
Link to file https://onedrive.live.com/?authkey=%21AE_6NgN3hnS6MpA&id=8BA0D02D4869CBCA%21763&cid=8BA0D02D4869CBCA
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Material", type text}, {"Residual Solvents", type text}, {"RMQ", type text}, {"COA", type text}, {"Technical Data Sheet", type text}}),
//Replace Time and null with blank
#"Replaced Value" = Table.ReplaceValue(#"Changed Type","00:00:00","",Replacer.ReplaceText,{"Material", "RMQ", "Residual Solvents", "Technical Data Sheet", "COA"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",null,"",Replacer.ReplaceValue,{"Material", "RMQ", "Residual Solvents", "Technical Data Sheet", "COA"}),
//Trims all whitespace from user
#"Power Trim" = Table.TransformColumns(#"Replaced Value1",{{"Material", #"PowerTrim", type text}, {"Residual Solvents", #"PowerTrim", type text}, {"RMQ", #"PowerTrim", type text}, {"COA", #"PowerTrim", type text}, {"Technical Data Sheet",#"PowerTrim", type text}}),
//Unpivot to develop a single column of solvent/metals/date data
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Power Trim", {"Material", "Supplier"}, "Attribute", "Value"),
//split into rows by line feed
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Unpivoted Other Columns",
{{"Value", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Value"),
#"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Value", Text.Trim, type text}}),
//filter out the blank rows
#"Filtered Rows" = Table.SelectRows(#"Trimmed Text", each ([Value] <> "" and [Value] <> "Not Provided")),
//Add custom column for separating the tables
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each try Date.FromText([Value]) otherwise
if [Value] = "Heavy Metals" or [Value] = "Residual Solvents" or [Value] = "Other" then [Value] else null),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type1",{"Custom"}),
//Filter the value and custom columns to remove contaminant type from Value column and remove dates from Custom column
#"Filtered Rows1" = Table.SelectRows(#"Filled Down", each ([Custom] = "Heavy Metals" or [Custom] = "Residual Solvents") and ([Value] <> "Heavy Metals" and [Value] <> "Residual Solvents")),
//split substance from amount
#"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows1", "Value",
Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Substance", "Amount"}),
//Filter for the Heavy Metals table
#"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter1", each ([Custom] = "Heavy Metals")),
#"Changed Type2" = Table.TransformColumnTypes(#"Filtered Rows2",{{"Amount", type number}}),
//Group by Material and Substance, then extract the Max contaminant and Source
#"Grouped Rows" = Table.Group(#"Changed Type2", {"Substance","Material", "Supplier"}, {
{"Amount", each List.Max([Amount]), type number},
{"Source", (t) => t[Attribute]{List.PositionOf(t[Amount],List.Max(t[Amount]))}, type text}
}),
#"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Substance", Order.Ascending}}),
//PIVOT to compare suppliers
#"Pivoted Column" = Table.Pivot(#"Sorted Rows", List.Distinct(#"Sorted Rows"[Supplier]), "Supplier", "Amount", List.Sum)
in
#"Pivoted Column"
Add an Index Column starting with zero (0).
Add a Custom Column:
=List.Max(
Record.ToList(
Table.SelectColumns(#"Added Index",
List.RemoveFirstN(
Table.ColumnNames(#"Pivoted Column"),3)){[Index]}))
Then remove the Index column
Algorithm
Generate a list of the relevant column names to compare.
We exclude the first three column names from this list.
Note that we refer to the step PRIOR to adding the Index column for the list of column names. If we referred to the previous step, where we added the Index column, we'd also have to remove the last column name.
Select the relevant columns.
{[Index]} returns the record corresponding to the Index number.
Convert the record to a list and use List.Max.
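Put together as a full query, the steps above could look like the following sketch. The Pivoted table is a hypothetical stand-in for the #"Pivoted Column" step, with three leading non-supplier columns and invented values; the column name "Max" is also an assumption.

```powerquery
let
    // Hypothetical stand-in for the #"Pivoted Column" step
    Pivoted = #table(
        {"Substance", "Material", "Source", "X", "Y", "Z"},
        {
            {"Arsenic", "M1", "COA", 3, 1, 2},
            {"Lead", "M1", "RMQ", 0.04, 0.01, 0.02}
        }),
    // Supplier columns are everything after the first three columns,
    // so a new supplier column is picked up automatically
    SupplierColumns = List.RemoveFirstN(Table.ColumnNames(Pivoted), 3),
    AddedIndex = Table.AddIndexColumn(Pivoted, "Index", 0, 1),
    // Look up the current row by position and take the largest supplier value
    AddedMax = Table.AddColumn(
        AddedIndex, "Max",
        each List.Max(
            Record.ToList(
                Table.SelectColumns(AddedIndex, SupplierColumns){[Index]})),
        type number),
    RemovedIndex = Table.RemoveColumns(AddedMax, {"Index"})
in
    RemovedIndex
```

For the sample rows, the Max column holds 3 and 0.04.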

Split data grouped within cells from multiple columns into rows using Power Query Editor

This is similar to a beginner's question I posted: Split values in cell into columns and rows
When I try to achieve the same effect for multiple columns, Power Query Editor can split one column as desired, but for the other column it copies all of the values into each new row (as in the image). This makes sense; however, I'm wondering if it's possible to split the data as shown in the desired outcome.
I have found a workaround by repeating the Power Query Editor exercise once for each column to be split and then moving the output columns so that they are adjacent. However, this seems like an inefficient way to achieve this. Can Power Query split both columns as desired without having to do this twice?
I would suggest first combining the columns; then doing the split.
But when you combine the columns, you need to do this on a row-by-row basis to keep things together on the same line.
A list of each cell contents can be created with the Text.Split function.
Then the two lists can be combined using the List.Zip function.
Finally, we just split them up.
I use a Custom Column to create the joined lists. You can see the formula by clicking on the Added Custom applied step.
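As a minimal illustration of what List.Zip does with the two split lists, consider a single row. The cell contents below are hypothetical sample values, not data from the question.

```powerquery
let
    // Hypothetical cell contents: two values per cell, separated by a line feed
    Sub = "Water#(lf)Ethanol",
    CAS = "7732-18-5#(lf)64-17-5",
    // Pair the first items together, then the second items, and so on
    Zipped = List.Zip({
        Text.Split(Sub, "#(lf)"),
        Text.Split(CAS, "#(lf)")
    })
in
    Zipped  // {{"Water", "7732-18-5"}, {"Ethanol", "64-17-5"}}
```

Each inner list becomes one output row once the list column is expanded, which keeps the matching Sub and CAS values on the same line.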
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table6"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Material", type text}, {"Sub", type text}, {"CAS", type text}}),
//combine the two columns
#"Added Custom" = Table.AddColumn(#"Changed Type", "list", each List.Zip({
Text.Split([Sub],"#(lf)"),
Text.Split([CAS],"#(lf)")
})),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Sub", "CAS"}),
//Expand the list and split into rows
#"Expanded list" = Table.ExpandListColumn(#"Removed Columns", "list"),
#"Extracted Values" = Table.TransformColumns(#"Expanded list", {"list", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "list", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"list.1", "list.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"list.1", type text}, {"list.2", type text}}),
//Rename the split columns
renamed = Table.RenameColumns(#"Changed Type1",List.Zip({Table.ColumnNames(#"Changed Type1"),Table.ColumnNames(Source)}))
in
renamed
Try the code below.
The key is in the added custom columns that split on line feed into lists, and then combine those lists into a table that can be expanded into rows. To make null handling easier, I converted nulls to the text "[null]", then converted them back at the end.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Sub", type text}, {"CAS", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"[null]",Replacer.ReplaceValue,{"Sub", "CAS"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Custom", each Text.Split([Sub],"#(lf)")),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.Split([CAS],"#(lf)")),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each Table.FromColumns({[Custom],[Custom.1]})),
#"Expanded Custom.2" = Table.ExpandTableColumn(#"Added Custom2", "Custom.2", {"Column1", "Column2"}, {"Column1", "Column2"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom.2",{"Sub", "CAS", "Custom", "Custom.1"}),
#"Replaced Value1" = Table.ReplaceValue(#"Removed Columns","[null]",null,Replacer.ReplaceValue,{"Column1", "Column2"})
in #"Replaced Value1"

Create a different pivot view in Power Query

I have the data structured in Excel in the following format.
What I want to do is transform it into this. In simple terms, for each ID I want to record the difference in value from the previous day, and if there is no value for the previous day, we just keep the current value.
As an intermediate step I am trying to transform the raw data into something like this, but I am not sure how to go about it with simple Excel pivot tables or Power Query transformations.
There is something wrong with your sample, because [v1-v2] is not the same method as [v5-v4, v3-v2, v8-v7], but I assume the latter ones are right.
See if this works for you.
It assumes the data is in 3 columns in a range named Table1 with column headers Dates, ID, Value.
You can paste it into Power Query using the Advanced Editor.
It creates a column with yesterday's value for that ID, returning null if nothing is found. It then does the subtraction and pivots.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Dates", type date}, {"ID", type text}, {"Value", Int64.Type}}),
Yesterday = Table.AddColumn(#"Changed Type" , "Yesterday", (i) => List.Sum(Table.SelectRows( #"Changed Type", each ([ID] = i[ID] and Date.AddDays([Dates],1) = i[Dates]))[Value]), type number ),
#"Replaced Value" = Table.ReplaceValue(Yesterday,null,0,Replacer.ReplaceValue,{"Yesterday"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Custom", each [Value]-[Yesterday]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Value", "Yesterday"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"Dates", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"Dates", type text}}, "en-US")[Dates]), "Dates", "Custom", List.Sum)
in #"Pivoted Column"
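The Yesterday step above scans the whole table once per row. As a possible alternative, the same lookup can be sketched as a self-join on a copy of the table whose dates are shifted forward by one day. This is a swapped-in technique, not the answerer's code; the sample rows are hypothetical, and it assumes at most one row per ID and date.

```powerquery
let
    // Hypothetical sample data with the headers the answer assumes
    Source = #table(
        type table [Dates = date, ID = text, Value = number],
        {
            {#date(2021, 1, 1), "A", 10},
            {#date(2021, 1, 2), "A", 15},
            {#date(2021, 1, 2), "B", 7}
        }),
    // Shift each date forward one day so it lines up with the following day
    Shifted = Table.TransformColumns(
        Source, {{"Dates", each Date.AddDays(_, 1), type date}}),
    ShiftedRenamed = Table.RenameColumns(Shifted, {{"Value", "Yesterday"}}),
    // Left outer join keeps rows that have no previous-day value
    Joined = Table.NestedJoin(
        Source, {"ID", "Dates"},
        ShiftedRenamed, {"ID", "Dates"},
        "Prev", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Joined, "Prev", {"Yesterday"}),
    // Treat a missing previous day as 0, so the current value is kept
    Diff = Table.AddColumn(
        Expanded, "Custom",
        each [Value] - (if [Yesterday] = null then 0 else [Yesterday]),
        type number)
in
    Diff
```

For the sample, A on 1 January keeps 10, A on 2 January becomes 5, and B on 2 January keeps 7. The result can then be pivoted on Dates in the same way as in the query above.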
