Conditional column in Power Query

I have a table with 3 columns: Name, Property and Value.
All names are unique, but in some cases different Names share both the same Property and the same Value.
I want to add a conditional column that lists, separated by line feeds, all Names whose Property and Value are equal. For example, for the 1st Name the conditional column would show the 5 other names that have the same property and value.
So far I have tried adding a conditional column:
If Property equals Property Then
Else if Value equals Value Then Name
but it just returns the values from the Name column, and I don't know how to combine these names.
Thanks!

You could group your rows by Property and Value, then combine the Names within each group.
= Table.Group(Source, {"Property", "Value"}, {{"Names", each Text.Combine(_[Name], ", "), type text}})
Table.Group - like SQL's GROUP BY
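Text.Combine - like array joining in other languages: you provide a list and a separator and receive a string. To get the line feed delimiter asked for in the question, pass "#(lf)" as the separator instead of ", ".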
Original table:
| Name | Property | Value |
| ---- | -------- | ----- |
| A | a | 1 |
| B | b | 2 |
| C | a | 2 |
| D | a | 1 |
| E | b | 2 |
Full query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Grouped = Table.Group(Source, {"Property", "Value"}, {{"Names", each Text.Combine(_[Name], ", "), type text}})
in
    Grouped
Result:
| Property | Value | Names |
| -------- | ----- | ----- |
| a | 1 | A, D |
| b | 2 | B, E |
| a | 2 | C |
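For readers who want to check the grouping logic outside Excel, here is a minimal Python sketch of the same idea using the sample table above. It is only an illustration and not part of the original answer: the function name `names_by_group` is hypothetical. It also sketches the per-row variant the question describes, where each row gets the other names sharing its Property and Value, joined by a line feed.

```python
from collections import defaultdict

# Sample rows from the table above: (Name, Property, Value)
rows = [
    ("A", "a", 1),
    ("B", "b", 2),
    ("C", "a", 2),
    ("D", "a", 1),
    ("E", "b", 2),
]

def names_by_group(rows):
    """Collect names per (Property, Value) pair, similar in spirit to Table.Group."""
    groups = defaultdict(list)
    for name, prop, value in rows:
        groups[(prop, value)].append(name)
    return dict(groups)

groups = names_by_group(rows)

# One entry per group, names joined with ", " (mirrors Text.Combine)
grouped = {key: ", ".join(names) for key, names in groups.items()}

# Per-row variant: the other names in the same group, joined by a line feed
others = {
    name: "\n".join(n for n in groups[(prop, value)] if n != name)
    for name, prop, value in rows
}
```

Here `grouped` holds one combined string per Property/Value pair, while `others` is keyed by Name and is empty for a name that has no partner.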

Related

How to merge rows based on value in one column in Excel?

I want to merge/concatenate rows that have duplicate values in one column. The merge applies to more than one column. The table below shows an example of the problem:
+-----+--------+--------+--------+--------+--------+----------+
| | A | B | C | D | E | F |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | | | param1 | | | |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | param2 | | | | | |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | | | | | | |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | | | | | | param3 |
+-----+--------+--------+--------+--------+--------+----------+
| Cat | | param5 | | | | |
+-----+--------+--------+--------+--------+--------+----------+
| Cat | | | | param6 | | |
+-----+--------+--------+--------+--------+--------+----------+
I have about 4000 unique row values and about 30 columns. Each row value occurs between 1 and 10 times.
My preferred table:
+-----+--------+--------+--------+--------+--------+----------+
| | A | B | C | D | E | F |
+-----+--------+--------+--------+--------+--------+----------+
| Dog | param2 | | param1 | | | param3 |
+-----+--------+--------+--------+--------+--------+----------+
| Cat | | param5 | | param6 | | |
+-----+--------+--------+--------+--------+--------+----------+
Can this be done in Excel with some magic, or do I need something more advanced like Python for this?
I have tried multiple formulas with CONCATENATE, but without success.
Thank you in advance

This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac).
To use Power Query:
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
    //change next line to reflect actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
    //set all columns to data type text
    #"Changed Type" = Table.TransformColumnTypes(Source,
        List.Transform(Table.ColumnNames(Source), each {_, type text})),
    //Group by Animal (this assumes the first column is named "Animal")
    //Then "Fill Up" each column and return only the first row
    #"Group Animal" = Table.Group(#"Changed Type", "Animal",
        {"Consolidate", each Table.FillUp(_, Table.ColumnNames(_)){0}}),
    //Expand the grouped table and re-set the data types to text
    #"Expanded Consolidate" = Table.ExpandRecordColumn(#"Group Animal", "Consolidate",
        List.RemoveFirstN(Table.ColumnNames(#"Changed Type"))),
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Consolidate",
        List.Transform(Table.ColumnNames(#"Expanded Consolidate"), each {_, type text}))
in
    //return the last step so the re-set data types are actually applied
    #"Changed Type1"
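The "fill up, then keep the first row" step amounts to taking the first non-blank value in each column for every group. A minimal Python sketch of that logic, using the sample data from the question, might look like this. The function name `consolidate` is hypothetical, and blanks are represented as `None` by assumption.

```python
# Sample rows from the question: the animal, then columns A..F (None = blank)
rows = [
    ("Dog", None, None, "param1", None, None, None),
    ("Dog", "param2", None, None, None, None, None),
    ("Dog", None, None, None, None, None, None),
    ("Dog", None, None, None, None, None, "param3"),
    ("Cat", None, "param5", None, None, None, None),
    ("Cat", None, None, None, "param6", None, None),
]

def consolidate(rows):
    """For each key, keep the first non-blank value found in every column."""
    merged = {}  # dicts preserve insertion order, so keys stay in first-seen order
    for key, *values in rows:
        current = merged.setdefault(key, [None] * len(values))
        for i, value in enumerate(values):
            if current[i] is None and value is not None:
                current[i] = value
    return [(key, *values) for key, values in merged.items()]

result = consolidate(rows)
```

If two rows of the same group both hold a value in the same column, this sketch keeps the upper one, which is also what filling up and taking the first row would give.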

Tricky. One way is to nest REDUCE() in another:
Formula in A8:
=DROP(REDUCE(0,UNIQUE(A1:A6),LAMBDA(a,b,VSTACK(a,REDUCE(b,SEQUENCE(6),LAMBDA(x,y,HSTACK(x,@SORT(INDEX(FILTER(B1:G6&"",A1:A6=b),,y),,-1))))))),1)
Or, a bit more dynamic:
=LET(r,A1:G6,s,TAKE(r,,1),t,DROP(r,,1)&"",DROP(REDUCE(0,UNIQUE(s),LAMBDA(a,b,VSTACK(a,REDUCE(b,SEQUENCE(COLUMNS(t)),LAMBDA(x,y,HSTACK(x,@SORT(INDEX(FILTER(t,s=b),,y),,-1))))))),1))

Highlight the values you need grouped together, then go to the Data tab and click Group.

PySpark Map to Columns, rename key columns

I am converting the Map column to multiple columns dynamically based on the values in the column. I am using the following code (taken mostly from here), and it works perfectly fine.
However, I would like to rename the column names that are programmatically generated.
Input df:
| map_col |
|:-------------------------------------------------------------------------------|
| {"customer_id":"c5","email":"abc@yahoo.com","mobile_number":"1234567890"} |
| null |
| {"customer_id":"c3","mobile_number":"2345678901","email":"xyz@gmail.com"} |
| {"email":"pqr@hotmail.com","customer_id":"c8","mobile_number":"3456789012"} |
| {"email":"mnk@GMAIL.COM"} |
Code to convert Map to Columns
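from pyspark.sql import functions as F
# Collect the distinct keys that occur in any row's map
keys_df = df.select(F.explode(F.map_keys(F.col("map_col")))).distinct()
keys = list(map(lambda row: row[0], keys_df.collect()))
# Build one column per key, named after the key
key_cols = list(map(lambda f: F.col("map_col").getItem(f).alias(str(f)), keys))
final_cols = [F.col("*")] + key_cols
df = df.select(final_cols)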
Output df:
| customer_id | mobile_number | email |
|:----------- |:--------------| :---------------|
| c5 | 1234567890 | abc@yahoo.com |
| null | null | null |
| c3 | 2345678901 | xyz@gmail.com |
| c8 | 3456789012 | pqr@hotmail.com |
| null | null | mnk@GMAIL.COM |
I already have the fields customer_id, mobile_number and email in the main dataframe, of which map_col is one of the columns. I get an error when I try to generate the output because the same column names already exist in the dataset. Therefore, I need to rename the generated columns to customer_id_2, mobile_number_2, and email_2 before they are added to the dataset. The map_col column may have more keys and values than shown.
Desired output:
| customer_id_2 | mobile_number_2 | email_2 |
|:------------- |:-----------------| :---------------|
| c5 | 1234567890 | abc@yahoo.com |
| null | null | null |
| c3 | 2345678901 | xyz@gmail.com |
| c8 | 3456789012 | pqr@hotmail.com |
| null | null | mnk@GMAIL.COM |

Add the following line just before the code which converts map to columns:
df = df.withColumn('map_col', F.expr("transform_keys(map_col, (k, v) -> concat(k, '_2'))"))
This uses transform_keys (available in Spark SQL 3.0 and later), which appends _2 to each original key name, as you needed.
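To make the effect of the key transformation concrete, here is a plain-Python analogy on ordinary dictionaries. It does not use Spark at all, the helper name `suffix_keys` is hypothetical, and the sample maps are trimmed-down assumptions modelled on the input rows above.

```python
# Simplified sample maps (assumed for illustration); None stands for a null map
maps = [
    {"customer_id": "c5", "mobile_number": "1234567890"},
    None,
]

def suffix_keys(m, suffix="_2"):
    """Return a copy of the map with the suffix appended to every key."""
    if m is None:
        return None  # in this sketch a null map is passed through unchanged
    return {key + suffix: value for key, value in m.items()}

renamed = [suffix_keys(m) for m in maps]
```

Once the keys carry the suffix, the column-generating code from the question would alias the new columns with those suffixed names, so they no longer clash with the existing ones.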

Count matches of entire record in two separate tables - excel

I've been trying to find a formula that counts the matches between two tables (like an inner join) in Excel.
I have table1 with columns (ID, UserName, Function) and table2 with columns (UserName, Function, etc.). I need to count the exact matches between table1 (UserName & Function) and table2 (UserName & Function).
I tried SUMPRODUCT(--(table1[UserName:Function]=table2[UserName:Function])), but it seems to compare column by column and returns an incorrect value. I also tried concatenating those columns within SUMPRODUCT, but that still doesn't work.
Is it possible to do this in one formula, or should I build a UDF with an SQL query?
Would it be possible to return the matching records and list them as an array using the FILTERXML formula?
sample data:
table1:
| ID | UserName | Function |
| -- | -------- | ----------|
| 1 | oopz | FCA4001 |
| 2 | oopz | FCA4002 |
| 3 | arronT | FCA4001 |
table2:
| UserName | Function |
| -------- | ----------|
| randalO | FCA4001 |
| oopz | FCA4001 |
| arronT | FCA4005 |
Thanks in advance!:)
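
The matching described in this question can be sketched in Python as follows. This is only an illustration of the intended logic, not an Excel solution. It counts the rows of table1 whose (UserName, Function) pair occurs in table2, and it assumes the pairs in table2 are unique (with duplicates, a true inner join would return more rows than this count).

```python
# Sample data from the question
table1 = [
    (1, "oopz", "FCA4001"),
    (2, "oopz", "FCA4002"),
    (3, "arronT", "FCA4001"),
]
table2 = [
    ("randalO", "FCA4001"),
    ("oopz", "FCA4001"),
    ("arronT", "FCA4005"),
]

# Set of (UserName, Function) pairs present in table2
pairs_in_table2 = set(table2)

# Rows of table1 whose (UserName, Function) pair also appears in table2
matches = [row for row in table1 if (row[1], row[2]) in pairs_in_table2]
match_count = len(matches)
```

Comparing whole pairs, rather than each column separately, is what prevents a row such as (arronT, FCA4001) from being counted just because each value appears somewhere in table2.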

Return unique column headers matching criteria

Consider the following data below:
| 1st | 2nd | A | B | C | D | E | F | G | H |
|-----|-----|---|---|---|---|---|---|---|---|
| y | x | | | 1 | | | | | |
| y | x | | | 1 | | | | | |
| y | x | | | | 1 | | | | |
| | x | 1 | | | | | | | |
| y | | 1 | 1 | 1 | | | | | |
| y | x | | | | | | 1 | | |
| y | | | | | | | | 1 | |
| | x | | | | | 1 | | | |
| | x | | | | | | | | 1 |
| y | x | | | | | | | | 1 |
What I wish to do is return all column headers (from A to H) that meet the following condition: the column has a value of 1 on a row that has both a y and an x in the first two columns.
I already have a working array formula to do this, which is as follows:
{=INDEX($C$1:$J$1,SMALL(IF(($A$2:$A$11="y")*($B$2:$B$11="x")*($C$2:$J$11=1),COLUMN($C$1:$J$1)-COLUMN($B$1)),ROW(1:1)))}
However, while I drag this down, it returns two C values and one for D, F and H.
This is because there are two 1's under header C that meet the condition. What I want is to return unique values, so C should only be returned once. I tried using MATCH and an additional COUNTIF instead of the SMALL function, but it returns an error, and the 'Evaluate Formula' feature of Excel isn't helping. Below is the erroneous formula I experimented with:
{=INDEX($C$1:$J$1,MATCH(0,IF(($A$2:$A$11="y")*($B$2:$B$11="x")*($C$2:$J$11=1),COUNTIF($N$1:N1,COLUMN($C$1:$J$1)-COLUMN($B$1))),0))}
My current workaround is to use the first formula as a "helper column" and then create another formula, based on its result, that returns only the unique values. However, the double array formula badly slows down Excel's calculation because of the huge volume of data I'm dealing with.
Any help or suggestions would be appreciated (no VBA please, since I believe it's not needed here). Thanks!

Insert a helper row. I put it just under the header row, before the data. This row checks whether the column has a 1 that lines up with both an x and a y. I assumed "lines up" means the cells are non-blank; if you need specific values, change the formula from <>"" to ="y" or =134, as the case may be. Place the following formula under the first column header you are interested in and copy it to the right.
=--(0<SUMPRODUCT(($B$3:$B$12<>"")*($C$3:$C$12<>"")*(D3:D12=1)))
Then, where you want to generate your list in a column, without gaps and sorted in the order the headings appear from left to right, use the following formula and copy it down as required:
=IFERROR(INDEX($1:$1,AGGREGATE(15,6,COLUMN($D$2:$K$2)/$D$2:$K$2,ROW(A1))),"")
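The above formula returns a blank value when no column heading applies or when you have copied the formula down beyond the number of applicable columns.
The above formulas are based on a proof-of-concept layout (the original image is not reproduced here) with the headers in row 1, the helper row in row 2, and the data starting in row 3. Adjust ranges to suit your needs.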
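The two-step idea (flag each column once, then list the flagged headers) can be sketched in Python using the sample data from the question. This is a minimal illustration with a hypothetical function name, `matching_headers`. Unlike the helper formula above, it tests for the literal values "y" and "x" rather than for non-blank cells.

```python
headers = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Sample data from the question: (1st, 2nd, headers that hold a 1 on that row)
rows = [
    ("y", "x", {"C"}),
    ("y", "x", {"C"}),
    ("y", "x", {"D"}),
    ("", "x", {"A"}),
    ("y", "", {"A", "B", "C"}),
    ("y", "x", {"F"}),
    ("y", "", {"G"}),
    ("", "x", {"E"}),
    ("", "x", {"H"}),
    ("y", "x", {"H"}),
]

def matching_headers(rows, headers):
    # Helper row: flag a header if any row with both y and x has a 1 under it
    flags = [
        any(first == "y" and second == "x" and h in ones
            for first, second, ones in rows)
        for h in headers
    ]
    # List step: keep the flagged headers in left-to-right order
    return [h for h, flagged in zip(headers, flags) if flagged]

result = matching_headers(rows, headers)
```

Because each column is flagged at most once, a header such as C appears only once in the list no matter how many qualifying rows it has.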

Have you tried without the use of an array formula? I don't know how large the data actually is. But, this might be what you are looking for:
=IF(COUNTIFS($A:$A,"y",$B:$B,"x",C:C,1)>0,C1,"")
Assuming column A is "1st" and your last header, "H", is in column J, try pasting the formula in K1 and dragging it to the right until S1.

Excel: get the value of third column on the behalf of second column

I am not very familiar with Excel formulas, and I am trying to get the value of a third column based on a second column.
Example:
|---------------------------------------------------------|
| A B C D E |
|-----|----------|----------|--------------|--------------|
|Sr.No| Bar Code | Cat Id | Org BarCode | Org Category |
|---------------------------------------------------------|
| 1 | 89457898 | | 85214784 | 2 |
| 2 | 87414714 | | 63247458 | 3 |
| 3 | 85214784 | | 89457898 | 4 |
| 4 | 63247458 | | ---- | --- |
-----------------------------------------------------------
I just want to fill column C with the value from column E, by matching column B against column D.
Can anyone please tell me which formula I can use to do this?

Use VLOOKUP. Enter the following formula into cell C1 and then copy it down column C:
=VLOOKUP(B1, D$1:E$4, 2, FALSE)
To cover more than 4 rows, extend the range in the formula accordingly. If you want to display a placeholder when a value in column B is not found, wrap the call to VLOOKUP as follows:
=IFNA(VLOOKUP(B1, D$1:E$4, 2, FALSE), "Not found")
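For comparison, the same exact-match lookup with a fallback can be sketched in Python with a dictionary, using the sample data from the question. The variable names are hypothetical, and bar codes are treated as text by assumption.

```python
# Sample data from the question: Org BarCode -> Org Category (columns D and E)
org_category = {
    "85214784": 2,
    "63247458": 3,
    "89457898": 4,
}

# Bar codes from column B, in row order
bar_codes = ["89457898", "87414714", "85214784", "63247458"]

# Exact-match lookup with a placeholder for missing codes (like IFNA around VLOOKUP)
cat_ids = [org_category.get(code, "Not found") for code in bar_codes]
```

The second bar code has no counterpart in the Org BarCode column, so it receives the placeholder instead of a category.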
