I am a bit new to Power Query and need some help with the points specified below the code.
I have a piece of working code that does the following:
Load the source
Find the max "series" of a row; the format is a mix of letters and numbers, e.g. Y21Q3S1, where the letters stay the same and the numbers increase (year, quarter, and series).
Check whether a certain tag is assigned to a row: I search all the tag columns for a tag, write it in the "Tags" column, and write "none" if none were found.
Through grouping, I find the points per tag for each "max series".
Finally, I present it in a table in Excel, with the first column being the series, then a column for each tag, as well as a column "None" for rows where none of the tags were present. I also add a last-updated date column.
The code:
let
Source = Csv.Document(Web.Contents("somefile.csv"),[Delimiter=",", Columns=32, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type with Locale" = Table.TransformColumnTypes(#"Promoted Headers", {{"Custom field (Points)", type number}}, "en-GB"),
#"Added Custom" = Table.AddColumn(#"Changed Type with Locale", "Max Series", each List.Max({[Series], [Series_1], [Series_2], [Series_3], [Series_4]})),
Tags = Table.AddColumn(#"Added Custom", "Tags", each if List.Contains({[Tags], [Tags_7], [Tags_8], [Tags_9], [Tags_10], [Tags_11], [Tags_12], [Tags_13], [Tags_14], [Tags_15], [Tags_16], [Tags_17], [Tags_18], [Tags_19], [Tags_20], [Tags_21], [Tags_22], [Tags_23]}, "tag1") then "tag1"
else if List.Contains({[Tags], [Tags_7], [Tags_8], [Tags_9], [Tags_10], [Tags_11], [Tags_12], [Tags_13], [Tags_14], [Tags_15], [Tags_16], [Tags_17], [Tags_18], [Tags_19], [Tags_20], [Tags_21], [Tags_22], [Tags_23]}, "tag2") then "tag2"
else if List.Contains({[Tags], [Tags_7], [Tags_8], [Tags_9], [Tags_10], [Tags_11], [Tags_12], [Tags_13], [Tags_14], [Tags_15], [Tags_16], [Tags_17], [Tags_18], [Tags_19], [Tags_20], [Tags_21], [Tags_22], [Tags_23]}, "tag3") then "tag3"
else "zzzNone"),
RemoveDummy = Table.SelectRows(Tags, each [ID] <> "ID-1234"),
#"Grouped Rows" = Table.Group(RemoveDummy, {"Max Series", "Tags"}, {{"Points per Tags", each List.Sum([#"Custom field (Points)]), type number}}),
#"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Tags", Order.Ascending}}),
#"Pivoted Column" = Table.Pivot(#"Sorted Rows", List.Distinct(#"Sorted Rows"[Tags]), "Tags", "Points per Tags"),
#"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"zzzNone", "None"}, {"Max Series", "Series"}}),
#"Added Custom1" = Table.AddColumn(#"Renamed Columns", "Last update", each DateTime.LocalNow()),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom1",{{"Last update", type datetime}})
in
#"Changed Type1"
The "Series" and "Tags" columns are a multivariable field, containing all series and tags and is translated by excel into multiple columns. The issue is that the number of series and tags are changing and to try coping with this I have created a dummy row with a lot of series. However, as you can see from the code this also changes and somehow "Tags_2" to "Tags_6" has disappeared and I had to error correct by removing these from the code.
Is there a dynamic way to if any column "Tags_*" contains "tag1" then... so I don't have to hard-code this?
Same goes for the "Max Series" where I would like to dynamically take max value of any columns "Series_*"
I would like to make the “Tags” step more dynamic, so that I can take input from a table in the excel sheet specifying which tags I want to search for instead of hardcoding “tag1, “tag2” etc.
My current code only assigns the points to the first tag found. However, I would like to assign points to several tags, so if two tags were found the "points" be assigned with half to each and for 3 tags they would all get one third of the points. I don’t know how to do this. Could you help me here?
As I am a bit new powerquery my code might be far from optimal, if you have some suggestions in your answers on how I can improve it that would be highly appreciated :-)
Hi mitru and welcome to StackOverflow!
You can make the 'Tags' step automatic with 'Unpivot other columns' and 'Group by' operations. To do this, select all the non-Tag* columns and use 'Unpivot other columns'. Then perform a 'Group by' operation with the operation set to All Rows.
You will receive a column populated with tables. The next step is to create a custom column with the following formula:
=if List.Contains([Tags][Value],"tag1") then "tag1"
else if List.Contains([Tags][Value],"tag2") then "tag2"
else if List.Contains([Tags][Value],"tag3") then "tag3"
else "zzzNone"
Here [Tags] is the column containing the tables, while [Value] is the column within each table that contains the tags you are looking for.
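Put together in the Advanced Editor, the whole approach might look roughly like this (just a sketch: PreviousStep and the fixed, non-tag column names are placeholders for whatever your query actually contains):
let
    // keep the fixed columns and unpivot everything else (the Tags_* columns) into Attribute/Value pairs
    Unpivoted = Table.UnpivotOtherColumns(PreviousStep, {"ID", "Custom field (Points)", "Max Series"}, "Attribute", "Value"),
    // group back to one row per original row, collecting the tag values in a nested table called "Tags"
    Grouped = Table.Group(Unpivoted, {"ID", "Custom field (Points)", "Max Series"}, {{"Tags", each _, type table}}),
    // search the nested table's Value column for the tags
    Tagged = Table.AddColumn(Grouped, "Tag", each
        if List.Contains([Tags][Value], "tag1") then "tag1"
        else if List.Contains([Tags][Value], "tag2") then "tag2"
        else if List.Contains([Tags][Value], "tag3") then "tag3"
        else "zzzNone")
in
    Tagged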
Under the link below there is a file with a sample solution that I created.
https://sendeyo.com/en/608c8dee7f
Regarding bullet 3, I am not sure how the scoring system should work. Can you provide sample data with the final output?
With the help of Gonso's post I was able to make the "tags" step more dynamic.
Furthermore, I found a solution for bullet 3, assigning points to the different tags if more than one tag is present.
I am posting the updated code here in case anyone else finds the solution helpful:
let
Source = Csv.Document(Web.Contents("somefile"),[Delimiter=",", Columns=50, Encoding=65001, QuoteStyle=QuoteStyle.None]),
Promoted_Headers = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
Changed_Type_with_Locale = Table.TransformColumnTypes(Promoted_Headers, {{"Custom field (Points)", type number}}, "en-GB"),
Max_Series = Table.AddColumn(Changed_Type_with_Locale, "Max Series", each List.Max({[Series], [Series_1], [Series_2], [Series_3], [Series_4]})),
Unpivoted_Other_Columns = Table.UnpivotOtherColumns(Max_Series, {"Type", "ID", "Custom field (Points)", "Series", "Series_1", "Series_2", "Series_3", "Series_4", "Series_5", "Series_6", "Series_7", "Series_8", "Series_9", "Max Series"}, "Attribute", "Value"),
Tags = Table.Group(Unpivoted_Other_Columns, {"Type", "ID", "Custom field (Points)", "Series", "Series_1", "Series_2", "Series_3", "Series_4", "Series_5", "Series_6", "Series_7", "Series_8", "Series_9", "Max Series"}, {{"Tags", each _, type table [Type=nullable text, ID=nullable text, #"Custom field (Points)"=nullable number, Series=nullable text, Series_1=nullable text, Series_2=nullable text, Series_3=nullable text, Series_4=nullable text, Series_5=nullable text, Series_6=nullable text, Series_7=nullable text, Series_8=nullable text, Series_9=nullable text, #"Max Series"=text, Attribute=text, Value=text]}}),
Tag1 = Table.AddColumn(Tags, "Tag1", each if List.Contains([Tags][Value],"tag1") then 1
else 0),
Tag2 = Table.AddColumn(Tag1, "Tag2", each if List.Contains([Tags][Value],"tag2") then 1
else 0),
Tag3 = Table.AddColumn(Tag2, "Tag3", each if List.Contains([Tags][Value],"tag3") then 1
else 0),
NoneTag = Table.AddColumn(Tag3, "TagNone", each if List.Sum({[Tag1], [Tag2], [Tag3]}) > 0
then 0
else 1),
Tag_total = Table.AddColumn(NoneTag, "Tag_total", each List.Sum({[Tag1], [Tag2], [Tag3], [TagNone]})),
Tag_update = Table.ReplaceValue(Tag_total,each [Tag1], each if [Tag1] > 0 then ([#"Custom field (Points)"] * ([Tag1] / [Tag_total])) else [Tag1],Replacer.ReplaceValue,{"Tag1"}),
Tag2_update = Table.ReplaceValue(Tag_update, each [Tag2], each if [Tag2] > 0 then ([#"Custom field (Points)"] * ([Tag2] / [Tag_total])) else [Tag2],Replacer.ReplaceValue,{"Tag2"}),
Tag3_update = Table.ReplaceValue(Tag2_update, each [Tag3], each if [Tag3] > 0 then ([#"Custom field (Points)"] * ([Tag3] / [Tag_total])) else [Tag3],Replacer.ReplaceValue,{"Tag3"}),
TagNone_update = Table.ReplaceValue(Tag3_update, each [TagNone], each if [TagNone] > 0 then ([#"Custom field (Points)"] * ([TagNone] / [Tag_total])) else [TagNone],Replacer.ReplaceValue,{"TagNone"}),
RemoveDummy = Table.SelectRows(TagNone_update, each [ID] <> "ID-1234"),
Grouped_Series = Table.Group(RemoveDummy, {"Max Series"}, {{"Tag1", each List.Sum([Tag1]), type number}, {"Tag2", each List.Sum([Tag2]), type number}, {"Tag3", each List.Sum([Tag3]), type number}, {"None", each List.Sum([TagNone]), type nullable number}}),
Sorted_Series = Table.Sort(Grouped_Series,{{"Max Series", Order.Ascending}}),
Renamed_Series = Table.RenameColumns(Sorted_Series,{{"Max Series", "Series"}}),
Added_last_update = Table.AddColumn(Renamed_Series, "Last update", each DateTime.LocalNow()),
Changed_date_Type = Table.TransformColumnTypes(Added_last_update,{{"Last update", type datetime}})
in
Changed_date_Type
Power BI junior here
How to look in each Excel file from a SharePoint list and extract contents from predefined cells.
I am currently accessing a few intranet SharePoint libraries containing .xlsx files, and with the metadata of those files I am doing some reporting. For example, a library contains 10 Excel files, so I can graph who uploaded them, when they were uploaded, and what category they were assigned to...
However, is there a way with Power Query to look into each and every one of the files, take the value from, say, cell A1 of the Excel file, and add it as a new column "CellA1Content"? I.e., make your own metadata from the content of the files and add it to the imported metadata table.
I've found some functions that I might need:
File.Contents
Excel.CurrentWorkbook
However, I am not well-versed enough in Power Query to put it all together, if it's even possible at all. I would have to do a foreach operation of some kind.
Edit: Solution
This worked. I selected the first non-hidden sheet in the Excel file, and I also made the function so that I can pass the column and row number.
Main query:
let
Source = SharePoint.Contents("http://mysharepoint", [Implementation=null, ApiVersion=15]),
... ... ...
//Open each excel and get cell D5
#"AddedColumn1" = Table.AddColumn(#"Filtered Rows", "AddedColumn1", each GetCellContent([Content],4,5))
in
AddedColumn1
Blank query in Power BI, called GetCellContent:
let
Source = (binaryParameter,col,row) => let
Source = Excel.Workbook(binaryParameter, null, false),
UnhiddenSheets = Table.SelectRows(Source, each if [Hidden]=false and [Kind]="Sheet" then true else false),
Sheet = UnhiddenSheets{0}[Data],
Column = Table.SelectColumns(Sheet,{Text.Combine({"Column",Number.ToText(col)})}),
Cell = Record.Field(Column{row-1}, Text.Combine({"Column",Number.ToText(col)}) )
in
Cell
in
Source
You'll need a function, used in a custom column like this.
This is my local interpretation of your problem, without SharePoint, but the same logic applies.
Main Query
let
Source = Folder.Contents("YourDirectory"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content", "Name"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Row1Col1", each PullRow1Col1([Content]))
in
#"Added Custom"
PullRow1Col1:
let
Source = (binaryParameter) => let
Source = Excel.Workbook(binaryParameter, null, false),
Sheet1_sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
Column1 = Sheet1_sheet{0}[Column1]
in
Column1
in
Source
I have a workbook where I fetch data from SQL Server using fixed parameter values for a SQL query.
I want to make another sheet and have the parameter for the SQL query be taken from the cell values.
I didn't find anything in this regard.
Also, I would like to refresh the data as soon as the cell values change in the other sheet.
For this to work, you need to set up three different parts:
1) A parameter table in an Excel sheet
2) Changes to the advanced editor in PowerQuery
3) A macro to refresh the PQ when any cells in the parameter table are changed
1) Excel Table
You can see I included a column called param which can hold a Parameter name to help keep straight which parameter is which.
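For example, the parameter table (named Table1 in the code below) might look like this; the date is just an assumed sample value, and only the value column is actually read by the query:
param      value
dateSel    2018-06-01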
2) PQ Advanced Editor
let
ParamTable = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Param = "'" & Text.From(ParamTable[value]{0}) & "'",
Source = Sql.Database("IP Address", "Database Name", [Query="Select * from weeks#(lf)where date >= '2018-01-01' and date < " &Param])
in
Source
Equivalent Alternative (the only difference is where the variable is used in the SQL query):
let
ParamTable = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Param = "'" & Text.From(ParamTable[value]{0}) & "'",
Source = Sql.Database("IP Address", "Database Name", [Query="Select * from weeks#(lf)where date < " &Param & " and date >= '2018-01-01'"])
in
Source
Alternative Variable Type (if dealing with numbers, the string markers ' aren't required):
let
ParamTable = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Param = Text.From(ParamTable[value]{0}),
Source = Sql.Database("IP Address", "Database Name", [Query="Select * from weeks#(lf)where cnt < " &Param & " and date >= '2018-01-01'"])
in
Source
Explanation:
After pulling the parameter table into the PQ query (ParamTable = Excel.CurrentWorkbook(){[Name="Table1"]}[Content]), the columns can be accessed by column name ([value]) and the rows by a zero-based index ({0}). Since I was pulling in a date value, I needed to convert it to a string I could insert into the SQL query -- hence the Text.From() and the ' appended to each end (SQL marks strings with a single ' rather than the double ").
Since I named the variable Param, I used it in the query string by substituting & Param for the literal value that had originally been there.
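For example, assuming the first row of Table1's value column holds 2018-06-01, the steps evaluate roughly as follows:
ParamTable[value]{0}   -->  2018-06-01
Param                  -->  "'2018-06-01'"
so the query text sent to the server becomes:
Select * from weeks
where date >= '2018-01-01' and date < '2018-06-01'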
2.1 Power Query's Value.NativeQuery
let
ParamTable = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Param = ParamTable[value]{0},
Source = Value.NativeQuery(Sql.Database("IP Address", "Database Name"), "Select * from weeks where date < @dateSel and date >= '2018-01-01'",[dateSel = Param])
in
Source
Alternative Formatting:
let
ParamTable = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Param = ParamTable[value]{0},
Source = Sql.Database("IP Address", "Database Name"),
Data = Value.NativeQuery(Source, "
Select * from weeks
where date < @dateSel and date >= '2018-01-01'
",[dateSel = Param])
in
Data
Notes:
When using Value.NativeQuery(), you can pass a date or datetime value directly in as a variable without having to include the single apostrophes (see the sketch after these notes).
Sometimes splitting the data retrieval into a Source step and a NativeQuery step can help with PQ's sporadic firewall issues.
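For instance, with a hard-coded sample date purely for illustration, the parameterised call could look like this (a sketch, assuming the same weeks table as above):
Data = Value.NativeQuery(Source, "Select * from weeks where date < @dateSel and date >= '2018-01-01'", [dateSel = #date(2018, 6, 1)])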
3) Macro
This performs a simple check whether anything in the table has changed and, if so, runs the refresh. You will need to make sure it is placed in the correct module (Worksheet_Change must live in the worksheet module of the sheet holding the parameter table). Items you will need to change are:
Sheet1 is the codename of the worksheet with the parameter table.
"Table1" is the name of the parameter table.
"Query - Query1" is the name of the connection to be refreshed.
Note: Query1 is the name of the query. Default names for connections are usually "Query - " followed by the query name.
Private Sub Worksheet_Change(ByVal Target As Range)
    If Not Intersect(Target, Sheet1.ListObjects("Table1").DataBodyRange) Is Nothing Then
        ThisWorkbook.Connections("Query - Query1").Refresh
    End If
End Sub
I want to import multiple text files from a folder, each file containing two columns, into a single Excel sheet, so that every new file starts in a new column. Ideally, I need the two columns from the first file and only the second column from every additional text file.
In Power Query, I tried to use the "Import From Folder (Import metadata and links about files in a folder)" functionality, followed by the Query Editor and expanding the binaries, and the result was that every new file was appended at the end of the previous one. But I want every file to start a new column in the same sheet, and I don't know how to do that.
How can I direct Power Query to do that?
Thanks in advance for your help!
My proposal includes 2 rather difficult steps added via the Advanced Editor, but it is dynamic with regard to the number of .txt files in the folder. I added a ton of comments, so it should be self-explanatory.
/* In this query, .txt files from a folder are combined.
Each source file has 2 columns.
The resulting table consists of both columns from the first file and each second column from the other files.
Tables are joined using each file's first column as the key, with a left outer join.
It is assumed that each file has column headers in the first row, that the first column header is the same for each file
and, preferably, the second column header differs per file, although this is not necessary.
This query is tested with the following file contents:
File1.txt:
ID,File1
1,A
2,B
3,C
4,D
File2.txt:
ID,File2
1,W
2,X
3,Y
Another file was added later on, to test for .txt files being added to the folder: works fine!
*/
let
// Standard UI:
Source = Folder.Files("C:\Users\Marcel\Documents\Forum bijdragen\StackOverflow Power Query\Multiple files in 1 folder"),
// Standard UI; step renamed
FilteredTxt = Table.SelectRows(Source, each [Extension] = ".txt"),
// Standard UI; step renamed
RemovedColumns = Table.RemoveColumns(FilteredTxt,{"Name", "Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path"}),
// UI add custom column "FileContents" with formula Csv.Document([Content]); step renamed
AddedFileContents = Table.AddColumn(RemovedColumns, "FileContents", each Csv.Document([Content])),
// Standard UI; step renamed
RemovedBinaryContent = Table.RemoveColumns(AddedFileContents,{"Content"}),
// In the next 3 steps, temporary names for the new columns are created ("Column2", "Column3", etcetera)
// Standard UI: add custom Index column, start at 2, increment 1
#"Added Index" = Table.AddIndexColumn(RemovedBinaryContent, "Index", 2, 1),
// Standard UI: select Index column, Transform tab, Format, Add Prefix: "Column"
#"Added Prefix" = Table.TransformColumns(#"Added Index", {{"Index", each "Column" & Text.From(_, "en-US"), type text}}),
// Standard UI:
#"Renamed Columns" = Table.RenameColumns(#"Added Prefix",{{"Index", "ColumnName"}}),
// Now we have the names for the new columns
// Advanced Editor: create a list with records with FileContents (tables) and ColumnNames (text) (1 list item (or record) per txt file in the folder)
// From this list, the resulting table will be built in the next step.
ListOfRecords = Table.ToRecords(#"Renamed Columns"),
// Advanced Editor: use List.Accumulate to build the table with all columns,
// starting with Column1 of the first file (Table.FromList(ListOfRecords{0}[FileContents][Column1], each {_}),)
// adding Column2 of each file for all items in ListOfRecords.
BuildTable = List.Accumulate(ListOfRecords,
Table.FromList(ListOfRecords{0}[FileContents][Column1], each {_}),
(TableSoFar,NewColumn) =>
Table.ExpandTableColumn(Table.NestedJoin(TableSoFar, "Column1", NewColumn[FileContents], "Column1", "Dummy", JoinKind.LeftOuter), "Dummy", {"Column2"}, {NewColumn[ColumnName]})),
// Standard UI
#"Promoted Headers" = Table.PromoteHeaders(BuildTable)
in
#"Promoted Headers"