Data set Transformation for Power BI - excel

I'm working in Power BI on an analysis of authors' publication counts and trends.
I have the data set shown in the image below:
a column of authors and another column with their IDs.
Each cell contains multiple authors, and likewise multiple IDs.
So my question:
Is there a way to match each author with their ID so I can proceed with my analysis?
Thank you so much

Since you chose to provide your data as a screenshot, which cannot be copy/pasted into a table, I had to make up my own.
The approach: split each column into a list,
then combine the two lists into a table.
M Code (Transform=>Home=>Advanced Editor)
let
Source = Table.FromRecords(
{[Authors="Author A, Author B", #"Author(s) ID"="12345;67890;"],
[Authors="Author C,Author D,Author E", #"Author(s) ID"="444123;789012;66666;"],
[Authors="Author X, Author Y, Author Z, Author P", #"Author(s) ID"="1111;2222;3333;4444;"]}),
#"Changed Type" = Table.TransformColumnTypes(Source, {{"Authors", type text},{"Author(s) ID", type text}}),
//split each column into a List; trim the entries
authors = List.Combine(List.Transform(#"Changed Type"[Authors], each List.Transform(Text.Split(_, ","), Text.Trim))),
IDs = List.Combine(List.Transform(#"Changed Type"[#"Author(s) ID"], each Text.Split(Text.Trim(_,";"),";"))),
//create new table
result = Table.FromColumns({authors,IDs},
type table[Authors=text, #"Author(s) ID"=text])
in
result
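The same split-and-pair logic can be sketched in Python to check it outside Power BI. This is an illustrative sketch, not part of the M answer: the sample rows mirror the made-up table above, and the function name `pair_authors_with_ids` is hypothetical. One design difference is deliberate: the M query combines every row's authors into one list and every row's IDs into another, so a row with a missing ID would shift all later pairs; the sketch pairs row by row and raises an error on a count mismatch instead.

```python
def pair_authors_with_ids(rows):
    """Split each row's author and ID strings and pair them by position.

    rows: iterable of (authors, ids) tuples, e.g. ("Author A, Author B", "12345;67890;").
    Returns a list of (author, id) tuples.
    """
    pairs = []
    for authors_text, ids_text in rows:
        # Authors are comma-separated; trim each name and skip empty entries
        authors = [a.strip() for a in authors_text.split(",") if a.strip()]
        # IDs are semicolon-separated with a trailing ";"; skip the empty entry it leaves
        ids = [i.strip() for i in ids_text.split(";") if i.strip()]
        if len(authors) != len(ids):
            raise ValueError(f"{len(authors)} authors but {len(ids)} IDs in row {authors_text!r}")
        pairs.extend(zip(authors, ids))
    return pairs


rows = [
    ("Author A, Author B", "12345;67890;"),
    ("Author C,Author D,Author E", "444123;789012;66666;"),
    ("Author X, Author Y, Author Z, Author P", "1111;2222;3333;4444;"),
]
for author, author_id in pair_authors_with_ids(rows):
    print(author, author_id)
```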

Related

Power Query: Function to search a column for a list of keywords and return only rows with at least one match

I am making a simple Google-like search function in Power Query.
Let's say I have a column called Description in a table called Database. The user then enters some search terms like "dog, cat, animals". I want to filter Database for rows that contain at least one of these keywords. The keywords can change each time, depending on what the user types into a named range in Excel.
I know you can filter a column in Power Query for multiple keywords, like this:
FilterRows = Table.SelectRows(LastStep, each Text.Contains([English], "dog") or Text.Contains([English], "cat")),
but those keywords are static, and the column is also static. I want to be able to control both the keywords and the column name as variables. I think I need to write a function but I am not sure how to start.
Your question requires several moving parts.
First, I would get the keywords from a named range "Keywords" into a table like this:
{KeywordTbl}
let
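RawKeywords = Excel.CurrentWorkbook(){[Name="Keywords"]}[Content]{0}[Column1],
// Split on commas and trim each entry, so "dog, cat,animals" also works; an empty cell gives an empty list instead of an error
GetKeywords = if RawKeywords = null then {} else List.Select(List.Transform(Text.Split(RawKeywords, ","), Text.Trim), each _ <> ""),
ConvertToTable = Table.FromList(GetKeywords, Splitter.SplitByNothing(), {"Keywords"})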
in
ConvertToTable
Secondly, store the column name where you want to search in an Excel named range called "ColName". Then pull the named range into Power Query like this:
{ColName}
let
GetColName = Excel.CurrentWorkbook(){[Name="ColName"]}[Content]{0}[Column1]
in
GetColName
Then I would write a function that takes four parameters: the table and column you want to search, and the table and column containing the keywords:
{SearchColForKeywords}
(LookInTbl as table, KeywordTbl as table, LookInCol as text, KeywordCol as text) =>
let
RelativeMerge = Table.AddColumn(LookInTbl, "RelativeJoin",
(Earlier) => Table.SelectRows(KeywordTbl,
each Text.Contains(Record.Field(Earlier, LookInCol), Record.Field(_, KeywordCol), Comparer.OrdinalIgnoreCase))),
ExpandRelativeJoin = Table.ExpandTableColumn(RelativeMerge, "RelativeJoin", {KeywordCol}, {"Keywords found"}),
FilterRows = Table.SelectRows(ExpandRelativeJoin, each [Keywords found] <> null and [Keywords found] <> ""),
// A row that matches several keywords now appears once per keyword.
// Group on all of the original columns to get one row back, with the matched keywords concatenated into one line.
// (Exact duplicate rows in LookInTbl are also merged by this step.)
GroupAllData = Table.Group(FilterRows, Table.ColumnNames(LookInTbl),
{{"Keywords found", each Text.Combine(List.Transform([Keywords found], Text.From), ", "), type text}})
in
GroupAllData
FYI, half of this function was developed by a user named lmkeF on the Power BI community forum. The full discussion is here. I merely improved on his solution.
Finally, I will use that function in another query like this:
StepName = SearchColForKeywords(MainTbl, KeywordTbl, ColName, "Keywords"),
You may customize the 4 variable names.
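To make the intended behaviour concrete, here is the same search sketched in Python: split the keyword cell on commas, then keep a row if the chosen column contains at least one keyword, ignoring case, and record which keywords matched. The row data, the `Description` column, and the function names `parse_keywords` and `search_col_for_keywords` are hypothetical stand-ins for your own tables, and `str.casefold` only approximates the M function's `Comparer.OrdinalIgnoreCase`.

```python
def parse_keywords(cell):
    """Split a comma-separated keyword cell such as "dog, cat, animals"; None gives no keywords."""
    if cell is None:
        return []
    return [k.strip() for k in cell.split(",") if k.strip()]


def search_col_for_keywords(rows, look_in_col, keywords):
    """Return rows whose look_in_col contains at least one keyword (case-insensitive).

    Each returned row is a copy with a "Keywords found" entry listing the matches,
    so a row that matches several keywords still appears only once.
    """
    lowered = [(k, k.casefold()) for k in keywords]
    result = []
    for row in rows:
        text = (row.get(look_in_col) or "").casefold()
        found = [k for k, k_low in lowered if k_low in text]
        if found:
            result.append({**row, "Keywords found": ", ".join(found)})
    return result


database = [
    {"Word ID": 1, "Description": "A small Dog barking"},
    {"Word ID": 2, "Description": "Fish tank"},
    {"Word ID": 3, "Description": "Cat and dog toys"},
]
matches = search_col_for_keywords(database, "Description", parse_keywords("dog, cat, animals"))
for row in matches:
    print(row["Word ID"], row["Keywords found"])
```

Because both the keyword list and the column name are ordinary arguments, neither is hard-coded, which is the same property the M function gets from its four parameters.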

Excel Power Query: Variables for Table Name

I'm trying to achieve something that seems like it should be fairly simple, but I can't find an answer: replacing the name of a table or Power Query query with a variable.
I'm currently trying to do this with a merge query, so it would look something like this:
Table.NestedJoin(VARIABLE1,key1,VARIABLE2,key2,"Append",JoinKind.Inner)
Currently getting all sorts of errors no matter what I try...
Thank you!
// Edit:
I'm not really looking to write a function; I want this to be as easy as possible for users, so they can update a named table in the workbook, refresh, and get a table as output. Here is my current code; hopefully that helps. My region code replacements worked fine, but the day replacements don't. I need each day (Monday through Thursday) to be replaced with my day variables (StartDay, Day2, etc.). Each of those is a separate text query referring back to the Excel workbook inputs, and each should pull up a query based on its text (for example, StartDay = Monday should pull the Monday query). This is the error I get; I assume it is reading the value as the text "Monday" rather than the query Monday.
Expression.Error: We cannot convert the value "Monday" to type Table.
Details:
Value=Monday
Type=Type
let
ANDOriginCode = OriginRegion,
ANDDestinationCode = DestinationRegion,
ANDStartDay = StartDay,
ANDDay2 = Day2,
ANDDay3 = Day3,
ANDDay4 = Day4,
ANDDay5 = Day5,
Source = Table.NestedJoin(Monday,{"Tuesday Destination Region Code"},Tuesday,{"Tuesday Origin Region Code"},"Append1 (3)",JoinKind.Inner),
#"Filtered Rows1" = Table.SelectRows(Source, each [Monday Origin Region Code] = OriginRegion),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"ID", "Pickup Day of Week", "Delivery Day of Week"}),
#"Expanded Append1 (3)" = Table.ExpandTableColumn(#"Removed Columns", "Append1 (3)", {"Tuesday Origin Region Code", "Wednesday Destination Region Code", "Tuesday Projected Number of Loads"}, {"Tuesday Origin Region Code", "Wednesday Destination Region Code", "Tuesday Projected Number of Loads"}),
#"Merged Queries" = Table.NestedJoin(#"Expanded Append1 (3)",{"Wednesday Destination Region Code"},Wednesday,{"Wednesday Origin Region Code"},"Append1 (4)",JoinKind.Inner),
#"Expanded Append1 (4)" = Table.ExpandTableColumn(#"Merged Queries", "Append1 (4)", {"Wednesday Origin Region Code", "Thursday Destination Region Code", "Wednesday Projected Number of Loads"}, {"Wednesday Origin Region Code", "Thursday Destination Region Code", "Wednesday Projected Number of Loads"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded Append1 (4)",{"Thursday Destination Region Code"},Thursday,{"Thursday Origin Region Code"},"Append1 (5)",JoinKind.Inner)
in
#"Merged Queries1"
This might help:
let
Source = (VARIABLE1 as table, VARIABLE2 as table) => Table.NestedJoin(VARIABLE1, Key1, VARIABLE2, Key2, "Append", JoinKind.Inner)
in
Source
You can use parameters for Key1 and Key2. The function will prompt you to select your tables.
You can invoke it from any other query with:
Function.Invoke(Merge,{Table1,Table2})
Replace Merge with whatever you named the first query above and replace Table1 and Table2 with your target tables.
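The point of this answer is that the tables become ordinary function arguments, so the same merge logic can be reused for any pair. The idea is sketched below in Python, with tables as lists of dictionaries and an inner join on one key column from each side. `merge_tables` and the sample column names are hypothetical, and unlike `Table.NestedJoin` (which adds a nested-table column to expand later) this sketch flattens each match straight into the left row.

```python
def merge_tables(left, right, left_key, right_key):
    """Inner join: pair every left row with every right row whose key values are equal.

    On a column-name clash, the right-hand value overwrites the left-hand one.
    """
    # Index the right table by its key so each left row is matched with one lookup
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            joined.append({**lrow, **rrow})
    return joined


monday = [{"Monday Origin": "A", "Tuesday Destination": "B"},
          {"Monday Origin": "A", "Tuesday Destination": "C"}]
tuesday = [{"Tuesday Origin": "B", "Wednesday Destination": "D"}]
print(merge_tables(monday, tuesday, "Tuesday Destination", "Tuesday Origin"))
```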
In case you're thinking of it, I have not been able to figure out how to pass tables through parameters. When you do that, the value you enter is recognized as text (for instance, "Table" rather than Table), so it won't work. I could not find any information on how to pass a table value, like Table, in a variable. Anyhow, I hope this helps at least a little.
I was searching for this, too!
I finally found it, thanks to Chris Webb at https://blog.crossjoin.co.uk/2015/02/06/expression-evaluate-in-power-querym/
The key is using Expression.Evaluate with #shared as the second argument.
If you define Query1 as
let
Source = 1 + 1
in
Source
Query2 as
let
Source = 15 * 10
in
Source
define pIndex as a parameter that is "1" or "2", and
define QuerySwitch as
Expression.Evaluate("Query" & pIndex, #shared)
then QuerySwitch will return
2 when pIndex is "1"
150 when pIndex is "2"
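For intuition, #shared is the environment that holds every query and library function by name, and Expression.Evaluate looks up the name built from text inside it. A rough Python analogue, offered only as a mental model and not as how the Power Query engine works, is a dictionary from query names to zero-argument functions: only the query whose name is assembled at run time gets called. The names `queries` and `query_switch` are hypothetical. The same lookup is what the edit to the question needs: turning the text "Monday" into the Monday query.

```python
# Each "query" is a zero-argument function, so nothing runs until it is looked up and called.
queries = {
    "Query1": lambda: 1 + 1,
    "Query2": lambda: 15 * 10,
}


def query_switch(p_index):
    """Build the query name from a text parameter and evaluate only that query."""
    name = "Query" + p_index
    if name not in queries:
        raise KeyError(f"No query named {name!r}")
    return queries[name]()


print(query_switch("1"))  # 2
print(query_switch("2"))  # 150
```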
My example:
I have a query, QueryThatTakesFiveMinutes, that other queries use and that writes to an Excel table (also named "QueryThatTakesFiveMinutes").
If I define a query "QueryThatTakesFiveMinutes Cached" by selecting the output QueryThatTakesFiveMinutes table in Excel and creating a new query from that table, then, when I'm testing, I can change all the queries that use QueryThatTakesFiveMinutes to use #"QueryThatTakesFiveMinutes Cached" instead, and test downstream computation without waiting five minutes every time. Then I just need to remember to change it back when I'm ready.
But that was annoying.
I created a named range in Excel called "ProductionMode" that points to a cell holding either TRUE or FALSE.
In Power Query, I defined a very handy function called fGetNamedCellValue as
(rangeName as text) => Excel.CurrentWorkbook(){[Name=rangeName ]}[Content]{0}[Column1]
so that I can define a "ProductionMode" query as
fGetNamedCellValue("ProductionMode")
I use this in a way that's similar to the Index parameter above, but this way I can edit it via Excel.
When I defined "modeQueryThatTakesFiveMinutes" as
if ProductionMode then QueryThatTakesFiveMinutes else #"QueryThatTakesFiveMinutes Cached"
and changed all queries that use QueryThatTakesFiveMinutes to use modeQueryThatTakesFiveMinutes instead, I was very surprised to find that both QueryThatTakesFiveMinutes and #"QueryThatTakesFiveMinutes Cached" were evaluated and it didn't save any time at all!
So then after searching, being overjoyed to find your question only to realize it wasn't answered, then finding Chris Webb's article, I tried redefining modeQueryThatTakesFiveMinutes as
Expression.Evaluate(
if ProductionMode then
"QueryThatTakesFiveMinutes"
else
"#""QueryThatTakesFiveMinutes Cached""",
#shared
)
Unfortunately, instead of working, I got an error of
Formula.Firewall: Query 'modeQueryThatTakesFiveMinutes' references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
However, I found a way around this, too, by putting the offending code within a function that the consuming query executes.
Deleting ProductionMode and defining a new query fProductionMode of
() => fGetNamedCellValue("ProductionMode") as logical
now doesn't return true or false; it returns a function that returns true or false when invoked. Why is one legal and the other not? I don't know, but it is! Change the definition of modeQueryThatTakesFiveMinutes to
Expression.Evaluate(
if fProductionMode() then
"QueryThatTakesFiveMinutes"
else
"#""QueryThatTakesFiveMinutes Cached""",
#shared
)
and it works!

Add custom field to a "pivot table" - Power Query

I've used Power Query to add custom fields to a table made from 2 merged tables in order to simulate a pivot table. However, I can't seem to add a filter to my final table. Is there another way to do this?
I've tried using an Excel PivotTable, but I can't insert a calculated field the way I want.
Here's my Excel file:
https://ufile.io/x2v1j
I'll start with a disclaimer: I'm not exactly sure what you're trying to do, but I took a stab at it anyway.
I figured you were trying to filter the months in the T_Catégories query before your grouping, so I added a manual filter step there. When I did that and deselected some months, your T_Final query broke. The reason is that filtering out months also filtered out categories that T_Final relied on for column names, including the column names your calculations referenced. I had to change T_Final so that it determines the column names dynamically.
Again, I'm not exactly sure about what you're trying to do, so I may have gotten it wrong with respect to the calculations, but this might help get you closer at least.
Like I said, in T_Catégories, I added the filter:
That's when things broke for T_Final. So in T_Final, I needed to:
Change the step Valeur remplacée1 to = Table.ReplaceValue(#"Colonne dynamique",null,0,Replacer.ReplaceValue,Table.ColumnNames(#"Colonne dynamique"))
(I was pretty sure you were using the columns resulting from the previous step Colonne dynamique.)
Change the step Personnalisée ajoutée3 to = Table.AddColumn(#"Valeur remplacée1", "Total général", each List.Sum(List.RemoveFirstN(Record.ToList(_),1)))
(This is making a list from the record, then removing the first entry of the list and summing what remains in the list.)
Change the step Colonnes permutées to = Table.ReorderColumns(#"Personnalisée ajoutée3",Table.ColumnNames(#"Personnalisée ajoutée3"))
(I was pretty sure you were using the column resulting from the previous step Personnalisée ajoutée3.)
Change the step Personnalisée ajoutée to = Table.AddColumn(#"Colonnes permutées", "Indisponibilté", each List.Sum(List.RemoveLastN(List.RemoveFirstN(Record.ToList(_),1),2)))
(This is making a list from the record, then removing the first entry of the list, then removing the last two entries of the list, and summing what remains in the list. This is especially where I'm not sure I added the items you intended. At least you can see what I did to be able to add the columns without using static column names.)
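The positional list arithmetic in these steps can be checked with a small Python sketch. It treats one row of the pivoted table as an ordered list of values, as Record.ToList does, with the Métier label first; the sample row and its column order (three categories, then "Nombre employés") are made up for illustration, and the function names are hypothetical.

```python
def total_general(row):
    """Mirror List.Sum(List.RemoveFirstN(Record.ToList(_), 1)): sum everything after the first value."""
    return sum(row[1:])


def indisponibilite(row):
    """Mirror List.Sum(List.RemoveLastN(List.RemoveFirstN(Record.ToList(_), 1), 2)):
    drop the first value (the label) and the last two values, then sum the rest."""
    return sum(row[1:-2])


# Hypothetical pivoted row: label, three category counts, then "Nombre employés"
row = ["Métier X", 3, 5, 2, 10]
row_with_total = row + [total_general(row)]  # "Total général" is appended as the last column
print(total_general(row))                    # 20
print(indisponibilite(row_with_total))       # 10
```

Because the steps work by position, they keep working when filtering removes a category column, but they silently change meaning if the column order changes. Note also that, as written, "Total général" includes the "Nombre employés" column; check whether that is what you intend.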
Here's the M code for the three queries:
T_Catégories:
let
Source = Excel.CurrentWorkbook(){[Name="T_Catégories"]}[Content],
#"Type modifié" = Table.TransformColumnTypes(Source,{{"Métier", type text}, {"Code absence", Int64.Type}, {"Date", type date}, {"Catégorie", type text}}),
#"Colonnes supprimées" = Table.RemoveColumns(#"Type modifié",{"Code absence", "Date"}),
#"Filtered Rows" = Table.SelectRows(#"Colonnes supprimées", each true),
#"Lignes groupées" = Table.Group(#"Filtered Rows", {"Métier", "Catégorie"}, {{"Nombre", each Table.RowCount(_), type number}})
in
#"Lignes groupées"
T_métiers:
let
Source = Excel.CurrentWorkbook(){[Name="T_métiers"]}[Content],
#"Type modifié" = Table.TransformColumnTypes(Source,{{"Métier", type text}, {"Nombre", Int64.Type}})
in
#"Type modifié"
T_Final:
let
Source = Table.Combine({T_Catégories, T_métiers}),
#"Valeur remplacée" = Table.ReplaceValue(Source,null,"Nombre employés",Replacer.ReplaceValue,{"Catégorie"}),
#"Colonne dynamique" = Table.Pivot(#"Valeur remplacée", List.Distinct(#"Valeur remplacée"[Catégorie]), "Catégorie", "Nombre"),
#"Valeur remplacée1" = Table.ReplaceValue(#"Colonne dynamique",null,0,Replacer.ReplaceValue,Table.ColumnNames(#"Colonne dynamique")),
#"Personnalisée ajoutée3" = Table.AddColumn(#"Valeur remplacée1", "Total général", each List.Sum(List.RemoveFirstN(Record.ToList(_),1))),
#"Colonnes permutées" = Table.ReorderColumns(#"Personnalisée ajoutée3",Table.ColumnNames(#"Personnalisée ajoutée3")),
#"Personnalisée ajoutée" = Table.AddColumn(#"Colonnes permutées", "Indisponibilté", each List.Sum(List.RemoveLastN(List.RemoveFirstN(Record.ToList(_),1),2))),
#"Personnalisée ajoutée1" = Table.AddColumn(#"Personnalisée ajoutée", "Disponibilté", each [Nombre employés]*7.5),
#"Personnalisée ajoutée2" = Table.AddColumn(#"Personnalisée ajoutée1", "Taux disponibilté (%)", each (1-[Indisponibilté]/[Disponibilté])*100),
#"Type modifié" = Table.TransformColumnTypes(#"Personnalisée ajoutée2",{{"Indisponibilté", Int64.Type}, {"Disponibilté", type number}, {"Taux disponibilté (%)", type number}})
in
#"Type modifié"
I would think you can progress from here fairly well.

Trying to pull data from a SODA API into Excel

The API call looks like this:
https://data.edmonton.ca/resource/3pdp-qp95.json?house_number=10008&street_name=103%20STREET%20NW
and returns data in json:
[{"account_number":"3070208","garage":"N","house_number":"10008","latitude":"53.539158992619","longitude":"-113.497760691896","neighbourhood":"DOWNTOWN","street_name":"103 STREET NW","tax_class":"Non Residential","total_asmt":"1717000"}]
I have an Excel table with specific house_number and street_name pairs, and I want to capture the total_asmt column for each pair.
I've been able to create a power query which pulls the very first data point into a new sheet:
let
Parameter = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Removed Other Columns" = Table.SelectColumns(Parameter,{"house_number", "street_name"}),
X = #"Removed Other Columns"[house_number]{0},
Y = #"Removed Other Columns"[street_name]{0},
Source = Json.Document(Web.Contents("https://data.edmonton.ca/resource/3pdp-qp95.json?house_number="& X &"&street_name=" & Y))
in
Source
I can't figure out how to iterate through all the values I have in X and Y, or how to capture specific rows from the JSON data. Any help would be appreciated!
Thanks,
Aaleem
I think your best bet is to not do it at all.
Why spend your time scraping this data one address at a time when you could have the entire city's data in under a minute?
JSON: https://data.edmonton.ca/resource/3pdp-qp95.json
CSV: https://data.edmonton.ca/api/views/q7d6-ambg/rows.csv?accessType=DOWNLOAD
XML: https://data.edmonton.ca/api/views/q7d6-ambg/rows.xml?accessType=DOWNLOAD
...among others. Heck, they even have !
And when you're done with that one, they have a few hundred other interesting datasets.
The trick was to create a function in Power Query and then invoke it on each row of a table. Create the function as below, then on the Data tab select your table using "From Table/Range"; from there it is pretty straightforward.
let a_value= (x as number,y as text)=> //this creates the function
let //this is essentially the query I wanted with some minor changes from above
x_text = Number.ToText(x, "D", ""),
Source = Json.Document(Web.Contents("https://data.edmonton.ca/resource/3pdp-qp95.json?house_number="&x_text&"&street_name="&y)),
Source1 = Source{0},
total_asmt = Source1[total_asmt]
in
total_asmt
in a_value //closes the function
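The function above makes one request per row. A Python sketch of the same per-row steps follows: it builds the query string with proper URL encoding (spaces in street names become %20) and pulls total_asmt out of the first record of the JSON array. It is illustrative only: it parses a trimmed copy of the sample response from the question instead of calling the live API, and `build_url` and `first_total_asmt` are hypothetical names. On the M side, Web.Contents also accepts a Query option record, which encodes the parameters for you.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "https://data.edmonton.ca/resource/3pdp-qp95.json"


def build_url(house_number, street_name):
    """Build the SODA query URL for one address; quote() encodes spaces as %20."""
    params = {"house_number": str(house_number), "street_name": street_name}
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def first_total_asmt(json_text):
    """Return total_asmt from the first record of a SODA JSON response, or None if it is empty."""
    records = json.loads(json_text)
    return records[0]["total_asmt"] if records else None


# Trimmed copy of the sample response shown in the question
sample = ('[{"account_number":"3070208","garage":"N","house_number":"10008",'
          '"street_name":"103 STREET NW","total_asmt":"1717000"}]')
print(build_url(10008, "103 STREET NW"))
print(first_total_asmt(sample))  # 1717000
```

Returning None for an empty response is a choice made for the sketch; the M function's `Source{0}` would raise an error when an address has no match.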

Orchard: In what table is the Blog post stored

I'm attempting to export data from an older Orchard database and am having trouble finding the table in which the content of a blog post is stored. I've tried a number of different 'search all columns' stored procedures to search all tables and columns, but I am not finding text from the post itself.
If I have a blog post where the opening sentence is:
This sentence contains a unique word.
I would have expected at least one of the various 'Search all columns' examples to have turned up a table/column. But so far, none have.
thx
Orchard stores data based on two tables, ContentItemRecord and ContentItemVersionRecord, which store metadata for content items such as BlogPost. These content items are built from multiple parts; each part has its own table, and the relation between an item and its parts is based on the Id column (if the part is not draftable) or the ContentItemRecord_Id column (if it is draftable).
Taking the BlogPost type as an example: it is built from TitlePart, BodyPart, AutoroutePart and CommonPart. To select all the data of a post (Id = 90), you can find its title in the TitlePartRecord table (ContentItemRecord_Id = 90), its body text in the BodyPartRecord table (same relation as the title part record), its route in the AutorouteRecord table (same relation), and its common metadata in CommonPartRecord (Id = 90).
This is how to extract data from the Orchard database; I hope this helps.
Thanks to @mdameer.
The query corresponding to mdameer's answer is this:
SELECT * FROM dbo.default_Title_TitlePartRecord
inner join dbo.default_Orchard_Framework_ContentItemRecord on
dbo.default_Title_TitlePartRecord.ContentItemRecord_id=dbo.default_Orchard_Framework_ContentItemRecord.Id
inner join dbo.default_Common_BodyPartRecord on
dbo.default_Common_BodyPartRecord.ContentItemRecord_id=dbo.default_Orchard_Framework_ContentItemRecord.Id
where dbo.default_Title_TitlePartRecord.ContentItemRecord_id=90
and this is the right solution.
In case it is useful for others, the following is the actual SQL query used to migrate an Orchard instance to Umbraco. It is derived from the excellent answers by mdameer and Iman Salehi:
SELECT t.Title, f.Data, b.Text FROM dbo.Title_TitlePartRecord t
inner join dbo.Orchard_Framework_ContentItemRecord f on
t.ContentItemRecord_id=f.Id
inner join dbo.Common_BodyPartRecord b on
b.ContentItemRecord_id=f.Id
AND b.Id = (
SELECT MAX(m2.Id)
FROM dbo.Common_BodyPartRecord m2
WHERE m2.ContentItemRecord_id = f.Id
)
AND t.Id = (
SELECT MAX(m2.Id)
FROM dbo.Title_TitlePartRecord m2
WHERE m2.ContentItemRecord_id = f.Id
)
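The MAX(Id) subqueries keep only the newest TitlePartRecord and BodyPartRecord row for each content item, presumably because each saved version of a draftable part adds another row. The sketch below runs the same join against a throwaway in-memory SQLite database from Python's standard library, so the pattern can be checked without an Orchard install. The table names drop the dbo. schema prefix (SQLite has no dbo schema), and the columns and rows are a made-up minimal subset of the real Orchard schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orchard_Framework_ContentItemRecord (Id INTEGER PRIMARY KEY, Data TEXT);
CREATE TABLE Title_TitlePartRecord (Id INTEGER PRIMARY KEY, ContentItemRecord_id INTEGER, Title TEXT);
CREATE TABLE Common_BodyPartRecord (Id INTEGER PRIMARY KEY, ContentItemRecord_id INTEGER, Text TEXT);

-- One blog post (content item 90) with two saved versions of its title and body
INSERT INTO Orchard_Framework_ContentItemRecord VALUES (90, '<Data/>');
INSERT INTO Title_TitlePartRecord VALUES (1, 90, 'Old title'), (2, 90, 'New title');
INSERT INTO Common_BodyPartRecord VALUES (10, 90, 'Old body'), (11, 90, 'This sentence contains a unique word.');
""")

# Same shape as the migration query: join the parts to the item, keeping only the highest Id per part table
LATEST_POSTS = """
SELECT t.Title, f.Data, b.Text
FROM Title_TitlePartRecord t
INNER JOIN Orchard_Framework_ContentItemRecord f ON t.ContentItemRecord_id = f.Id
INNER JOIN Common_BodyPartRecord b ON b.ContentItemRecord_id = f.Id
  AND b.Id = (SELECT MAX(m2.Id) FROM Common_BodyPartRecord m2 WHERE m2.ContentItemRecord_id = f.Id)
  AND t.Id = (SELECT MAX(m2.Id) FROM Title_TitlePartRecord m2 WHERE m2.ContentItemRecord_id = f.Id)
"""
rows = conn.execute(LATEST_POSTS).fetchall()
print(rows)  # [('New title', '<Data/>', 'This sentence contains a unique word.')]
```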
