In Excel Power Query, I have a table. Column A holds single numbers. I want to mark the records whose Column A value appears in a list. A cut-down version of the problem:
let
    TableA = Table.FromColumns({{1, 2, 4}}, {"A"}),
    ListB = {4, 5, 6},
    DPart = Table.AddColumn(TableA, "IsInB",
        List.MatchesAny(ListB, each _ = [A]))
in
    DPart
I get an error on the DPart step:
Expression.Error: We cannot apply field access to the type Number.
Details:
Value=4
Key=A
Apparently, the code is trying to access the [A] field of each element of the list, instead of the [A] column of the current row of TableA.
What's the correct syntax to accomplish this?
This works:
let
    TableA = Table.FromColumns({{1, 2, 4}}, {"A"}),
    ListB = {4, 5, 6},
    DPart = Table.AddColumn(TableA, "IsInB",
        (x) => List.MatchesAny(ListB, each _ = x[A]))
in
    DPart
But I would prefer:
let
    TableA = Table.FromColumns({{1, 2, 4}}, {"A"}),
    ListB = {4, 5, 6},
    DPart = Table.AddColumn(TableA, "IsInB",
        each List.Contains(ListB, _[A]))
in
    DPart
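For context, a short desugaring note (my restatement, not from the original post): in M, each expr is shorthand for the one-parameter function (_) => expr, and a bare [A] means _[A]. In the failing version there is no outer row function at all; the only function is the each passed to List.MatchesAny, so _ is the list element (a number) and [A] is field access on that number:

// failing version, desugared: _[A] on the number 4 gives Expression.Error
List.MatchesAny(ListB, (_) => _ = _[A])
// working version: the row parameter has its own name, x, so nothing is shadowed
(x) => List.MatchesAny(ListB, (_) => _ = x[A])
// preferred version: no inner function, so _ is still the table row
(_) => List.Contains(ListB, _[A])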
I have been working on a Power Query for Excel that checks a folder, gets the Excel workbooks, and consolidates the sheets within. I'm a novice in Power Query, so I need help with examples to accomplish it.
I've been stuck iterating on changes, trying to overcome the fact that some Excel sheets do not have the same column names ("Column1", "Column2" or "Column3"), and when accessing columns by name the query gives me an error.
The comparison should be done on columns 1 to 3 of each nested table in the 'First15Rows' column; if there are 3 or more desired titles found in the same row, then the sheet is considered valid.
So I'm asking for help. The current query looks like this:
let
    Source = Folder.Files(Excel.CurrentWorkbook(){[Name="FldrLocation"]}[Content][FldrLocation]{0}),
    FilterFileNames = Table.SelectRows(Source, each not Text.StartsWith([Name], "~$") and Text.Contains([Extension], ".xls")),
    RemoveOtherCols1 = Table.SelectColumns(FilterFileNames, {"Content", "Name", "Date modified"}),
    OnlyRecent = Table.SelectRows(RemoveOtherCols1, each [Date modified] >= Date.AddWeeks(DateTime.LocalNow(), -WeeksAgo)),
    AddSheetsColumn = Table.AddColumn(OnlyRecent, "Custom", each Excel.Workbook([Content])),
    ExpandSheetsFromTable = Table.ExpandTableColumn(AddSheetsColumn, "Custom", {"Name", "Data"}, {"Sheets", "Data"}),
    FilterSheetNames = Table.SelectRows(ExpandSheetsFromTable, each not Text.Contains([Sheets], "Print") and not Text.StartsWith([Sheets], "_xlnm")),
    RemoveEmptySheets = Table.SelectRows(FilterSheetNames, each
        if Table.IsEmpty(Table.SelectRows([Data], each _[Column1] <> null)) then null else true),
    AddFirst15Rows = Table.AddColumn(RemoveEmptySheets, "First15Rows", each Table.FirstN([Data], 15)),
    CheckMatch = Table.SelectRows(AddFirst15Rows, each
        if Table.IsEmpty(Table.SelectRows([First15Rows], each _[Column1] = "Date" or _[Column2] = "Time"))
        then null
        else true)
in
    CheckMatch
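Since the question is open here, one possible direction (a hedged sketch of mine, not an accepted answer; DesiredTitles and the step name AddIsValid are made up): read each row of the nested table positionally with Table.ToRows, so no column names are ever referenced, and count desired titles among the first three cells:

// DesiredTitles is an assumed list of header strings
DesiredTitles = {"Date", "Time", "Name"},
AddIsValid = Table.AddColumn(AddFirst15Rows, "IsValid", each
    let
        // each row of the nested table as a plain list of cell values - no column names involved
        rows = Table.ToRows([First15Rows]),
        // count desired titles among the first three cells of each row
        hits = List.Transform(rows, (r) => List.Count(List.Intersect({List.FirstN(r, 3), DesiredTitles})))
    in
        // the sheet is valid if any single row holds 3 or more desired titles
        List.AnyTrue(List.Transform(hits, (h) => h >= 3)))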
With Table.NestedJoin and JoinKind.FullOuter, a null may be written into columns of the left table when a value in the right table's "key" does not exist in the left table's "key".
However, unlike a null that is in the left table because the cell is empty, this created null does not return true for the formula [column] = null.
For example (shown as screenshots in the original post):
Table1: note the null in row 3.
Table2.
Joined table: the null in row 5 was created as a result of the join.
Custom column, added with the formula =[A]=null: note the different results for the two nulls.
M code to reproduce the above:
let
    Source1 = Table.FromRecords({
        [A="a"],
        [A="b"],
        [A=null],
        [A="c"]
    }),
    type1 = Table.TransformColumnTypes(Source1, {"A", type text}),
    Source2 = Table.FromRecords({
        [A="c"],
        [A="d"]
    }),
    type2 = Table.TransformColumnTypes(Source2, {"A", type text}),
    combo = Table.NestedJoin(type1, "A", type2, "A", "joined", JoinKind.FullOuter),
    #"Added Custom" = Table.AddColumn(combo, "Custom", each [A] = null)
in
    #"Added Custom"
Explanations and suggestions as to how to deal with this would be appreciated.
Edit: In addition to the above, doing a Replace also only replaces the null in row 3, and not the null in row 5. There seems to be something different about these two nulls.
Note: If I Expand the table, the null in Column A will now test correctly.
Asking the same question on the Microsoft Q&A forum pointed me to the possibility of an issue with the Power Query Evaluation model and also this article on Lazy Evaluation and Query Folding in Power BI/Power Query.
By forcing evaluation of the table with Table.Buffer, both nulls now behave the same.
So:
let
    Source1 = Table.FromRecords({
        [A="a"],
        [A="b"],
        [A=null],
        [A="c"]
    }),
    type1 = Table.TransformColumnTypes(Source1, {"A", type text}),
    Source2 = Table.FromRecords({
        [A="c"],
        [A="d"]
    }),
    type2 = Table.TransformColumnTypes(Source2, {"A", type text}),
    //Table.Buffer forces evaluation
    combo = Table.Buffer(Table.NestedJoin(type1, "A", type2, "A", "joined", JoinKind.FullOuter)),
    //IsItNull now works
    IsItNull = Table.AddColumn(combo, "[A] = null", each [A] = null)
in
    IsItNull
It also seems that try ... otherwise forces an evaluation. So instead of Table.Buffer, the following works as well:
...
combo = Table.NestedJoin(type1,"A",type2,"A","joined",JoinKind.FullOuter),
//try ... otherwise seems to force Evaluation
IsItNull = Table.AddColumn(combo, "[A] = null", each try [A] = null otherwise null)
Very interesting case. Indeed, the behaviour of the last null is counterintuitive under most plausible implementations. If you want the same behaviour for both kinds of nulls, try this approach:
= Table.AddColumn(combo, "test", each [A] ?? 10)
Quite interestingly, the seemingly equivalent code doesn't work:
= Table.AddColumn(combo, "test", each if [A] = null then 10 else [A])
Moreover, if we try to improve the previous code by using the first syntax, we still get an unexpected result (10 instead of 20 for the last null):
= Table.AddColumn(combo, "test", each if [A] = null then 10 else [A] ?? 20)
Curiously, applying the ?? operator also fixes the problem with the initial column. Afterwards there are regular nulls in column A (the add step is defined just below):
= Table.AddColumn(add, "test2", each [A] = null)
So, if we don't need any calculations and just want to fix the invalid nulls, we can use code like this:
= Table.TransformColumns(combo, {"A", each _ ?? _})
The column doesn't matter; doing the transform on the joined column gives the very same result:
transform = Table.TransformColumns(combo, {"joined", each _ ?? _}),
add = Table.AddColumn(transform, "test", each [A] = null)
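If you'd rather not pick a column at all, the same trick can be applied across every column (a sketch of mine building on the combo step above, not from the original answer; fixAll and test are made-up step names):

// build a {name, transform} pair for each column and normalize its nulls in one pass
fixAll = Table.TransformColumns(combo,
    List.Transform(Table.ColumnNames(combo), (c) => {c, each _ ?? _})),
test = Table.AddColumn(fixAll, "test", each [A] = null)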
I have two pandas DataFrames with the same structure, lTable and rTable.
I loop through one to update with values from the other:
for i, lrow in lTable.iterrows():
    rrows = rTable[rTable.Date == lrow["Date"]]
    if not rrows.empty:
        ref = rrows.head(1)["RefPrice"]
        lTable.loc[i, "RefPrice"] = ref
Why do I get a ValueError on the .loc line?
ValueError: Incompatible indexer with Series
Here, change the one-element Series returned after filtering rows with head(1):
ref = rrows.head(1)["RefPrice"]
to a scalar by selecting the first value by position:
ref = rrows["RefPrice"].iat[0]
Or, for a one-line solution, change:
ref = rrows.head(1)["RefPrice"]
lTable.loc[i,"RefPrice"] = ref
to:
lTable.loc[i,"RefPrice"] = rrows["RefPrice"].iat[0]
But a merge with a left join should be better/faster:
lTable = lTable.merge(rTable[['Date', "RefPrice"]], on='Date', how='left')
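A minimal, self-contained sketch of the merge route (the sample data is made up by me; note that because lTable already has a RefPrice column, the plain merge above would create suffixed columns, which this resolves explicitly):

import pandas as pd

lTable = pd.DataFrame({"Date": ["2024-01-01", "2024-01-02"], "RefPrice": [None, None]})
rTable = pd.DataFrame({"Date": ["2024-01-02"], "RefPrice": [42.0]})

# left join on Date; keep lTable's column name and suffix the incoming one
merged = lTable.merge(rTable[["Date", "RefPrice"]], on="Date", how="left", suffixes=("", "_r"))
# take the right-hand value where the join found one, else keep the original
merged["RefPrice"] = merged["RefPrice_r"].combine_first(merged["RefPrice"])
merged = merged.drop(columns="RefPrice_r")
print(merged)  # the 2024-01-02 row now carries RefPrice 42.0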
If I query a SQLite table using a single key, I can use the following code for parametrization:
contact_phones_list = ['+123456789', '+987654321']
q = "select * from {} WHERE user_phone in ({})".format(
    my_table_name,
    ', '.join('?' for _ in contact_phones_list)
)
res = self.cursor.execute(q, contact_phones_list).fetchall()
Now I want to query for key pairs for which I have values:
keys = ['user_phone', 'contact_phone']
values = [('+1234567', '+1000000'), ('+987654321', '+1200000')]
q = "select contact_phone, is_main, aliases from {} WHERE ({}) in ({})".format(
    my_table_name,
    ', '.join(keys),
    ', '.join('(?, ?)' for _ in values)
)
res = self.cursor.execute(q, values).fetchall()
I'm getting the error "row value misused". I have tried many combinations: a sublist instead of a tuple, a single "?", etc.
How can I create parametrization in this case?
EDIT: adding the "VALUES" keyword and flattening the list works:
keys = ['user_phone', 'contact_phone']
values = [('+1234567', '+1000000'), ('+987654321', '+1200000')]
values_q = []
for v in values:
    values_q += [v[0], v[1]]
q = "select * from my_table_name WHERE ({}) IN (VALUES {})".format(
    ', '.join(keys),
    ', '.join('(?, ?)' for _ in values)
)
res = cursor.execute(q, values_q).fetchall()
Is this a workaround, or the only acceptable solution?
From the documentation:
For a row-value IN operator, the left-hand side (hereafter "LHS") can be either a parenthesized list of values or a subquery with multiple columns. But the right-hand side (hereafter "RHS") must be a subquery expression.
You're building up something that looks like (?,?) IN ((?,?), (?,?)), which doesn't meet that requirement. The syntax (?,?) IN (VALUES (?,?), (?,?)) works, though.
Also, I think you might have to flatten out that list of tuples you pass to the prepared statement, but somebody more knowledgeable about python would have to say for sure.
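On the flattening point, a small sketch (my addition; it reproduces what the loop in the EDIT does using the standard library):

import itertools

values = [('+1234567', '+1000000'), ('+987654321', '+1200000')]
# sqlite3 wants one flat sequence of parameters, one per '?'
values_q = list(itertools.chain.from_iterable(values))
# values_q == ['+1234567', '+1000000', '+987654321', '+1200000']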
Using the DocumentDB query playground, I am working on a filter type of query. I have a set of attributes in my data that are set up to allow the user to search by specific attribute. Each attribute type becomes an OR statement if multiple items are selected under the same name in the name/value collection. If attributes of differing names are selected (i.e. color and size), they combine into an AND statement.
SELECT food.id,
food.description,
food.tags,
food.foodGroup
FROM food
JOIN tag1 IN food.tags
JOIN tag2 IN food.tags
WHERE (tag1.name = "snacks" OR tag1.name = "granola bars")
AND (tag2.name = "microwave")
This query works beautifully in the playground.
The main issue is that I have up to 12 attributes, and maybe more. Five joins is my maximum allowed number, so once I hit that, the query below doesn't work. (Note that this isn't playground data, but a sample of my own.)
SELECT s.StyleID FROM StyleSearch s
JOIN a0 in s.Attributes
JOIN a1 in s.Attributes
JOIN a2 in s.Attributes
JOIN a3 in s.Attributes
JOIN a4 in s.Attributes
JOIN a5 in s.Attributes
WHERE (a0 = "color-finish|Grey" OR a0 = "color-finish|Brown" OR a0 = "color-finish|Beige")
AND (a1 = "fabric-type|Polyester" OR a1 = "fabric-type|Faux Leather")
AND (a2 = "design-features|Standard" OR a2 = "design-features|Reclining")
AND (a3 = "style_parent|Contemporary" OR a3 = "style_parent|Modern" OR a3 = "style_parent|Transitional")
AND (a4 = "price_buckets|$1500 - $2000" OR a4 = "price_buckets|$2000 and Up")
AND (a5 = "dimension_width|84 in +")
I am not 100% sure I am using the proper query to perform this, but a simple WHERE clause like the one below (which works in SQL) brings back anything matching any of the OR statements, so I end up with items from each AND group.
SELECT s.StyleID FROM StyleSearch s
JOIN a in s.Attributes
WHERE (a = "color-finish|Grey" OR a = "color-finish|Brown" OR a = "color-finish|Beige")
AND (a = "fabric-type|Polyester" OR a = "fabric-type|Faux Leather")
AND (a = "design-features|Standard" OR a = "design-features|Reclining")
AND (a = "style_parent|Contemporary" OR a = "style_parent|Modern" OR a = "style_parent|Transitional")
AND (a = "price_buckets|$1500 - $2000" OR a = "price_buckets|$2000 and Up")
AND (a = "dimension_width|84 in +")
Here is an example of the data:
{
    "StyleID": "chf_12345-bmc",
    "Attributes": [
        "brand|chf",
        "color|red",
        "color|yellow",
        "dimension_depth|30 in +",
        "dimension_height|counter height",
        "materials_parent|wood",
        "price_buckets|$500 - $1000",
        "style_parent|rustic",
        "dimension_width|55 in +"
    ]
}
I am looking for the proper way to handle this. Thanks in advance.
Is it possible for you to change the structure of your document to add filter attributes specifically for your query? E.g.:
{
    "StyleID": "chf_12345-bmc",
    "Attributes": [
        "brand|chf",
        "color|red",
        "color|yellow",
        "dimension_depth|30 in +",
        "dimension_height|counter height",
        "materials_parent|wood",
        "price_buckets|$500 - $1000",
        "style_parent|rustic",
        "dimension_width|55 in +"
    ],
    "filter_color": "red,yellow",
    "filter_fabric_type": "Polyester,leather"
}
This would eliminate the join restriction because now your query looks something like this:
SELECT s.StyleID FROM StyleSearch s
WHERE (CONTAINS(s.filter_color, "Grey") OR CONTAINS(s.filter_color, "Red"))
AND (CONTAINS(s.filter_fabric_type, "Polyester") OR CONTAINS(s.filter_fabric_type, "Leather"))
Of course this does mean that you have additional fields to maintain.
You might also consider writing a stored procedure for this and using JavaScript to loop through your collection and filter that way: DocumentDB stored procedure tutorial
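As a further aside (my suggestion, not part of the original answer): DocumentDB's SQL dialect also has an ARRAY_CONTAINS function, which can express the same per-group OR/AND logic against the existing Attributes array without any JOINs, sidestepping the join limit entirely:

SELECT s.StyleID FROM StyleSearch s
WHERE (ARRAY_CONTAINS(s.Attributes, "color-finish|Grey") OR ARRAY_CONTAINS(s.Attributes, "color-finish|Brown"))
  AND (ARRAY_CONTAINS(s.Attributes, "fabric-type|Polyester") OR ARRAY_CONTAINS(s.Attributes, "fabric-type|Faux Leather"))
  AND ARRAY_CONTAINS(s.Attributes, "dimension_width|84 in +")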