I would like to write a query in a serverless pool that concatenates string values from multiple rows into a single row of comma-separated values. When I use the COALESCE function I get the error below, which I am unable to fix: "Queries referencing variables are not supported in distributed processing mode"
Input rows :
A
B
C
A
B
Output row (only distinct values should appear in the list, as below):
A,B,C
You can use the STRING_AGG() function to concatenate values from multiple rows into a single comma-separated row.
Get the distinct values of the column first, then apply STRING_AGG to the result, as below.
select STRING_AGG(col1, ',') output_col1 from (select distinct col1 from #tb1) a
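To make the logic of the query above easier to follow, here is a minimal Python sketch of the same idea: remove duplicates first, then join with a comma. The function name `distinct_join` is hypothetical and only for illustration; this is not serverless pool code. Note that the sketch keeps first-seen order, whereas STRING_AGG does not guarantee any order unless WITHIN GROUP (ORDER BY ...) is specified.

```python
def distinct_join(values, sep=","):
    """Join the distinct values of an iterable, keeping first-seen order."""
    # None stands in for SQL NULL here, which STRING_AGG ignores.
    # dict.fromkeys drops duplicates while preserving insertion order.
    unique = dict.fromkeys(v for v in values if v is not None)
    return sep.join(unique)


rows = ["A", "B", "C", "A", "B"]
print(distinct_join(rows))
```

With the sample rows from the question, the duplicates A and B are dropped before joining, which is the role played by the inner `select distinct` in the SQL.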
Related
I am trying to isolate several substrings from a specific column of a parquet file that contains text (string). The substrings are all in an array, and I want to keep only those rows that contain one or more of these substrings (words), while adding a new column with the substrings that were found in the text.
I have currently used the following transformations:
source: which is the parquet file I use
derived column: where I create a new column (words) which contains an array of the words-substrings that are contained in the text, by using the following expression
intersect(split(text_column, ' '), ['array','of','words'])
filter: where I want to filter the derived column that was created at the previous transformation and exclude those rows that are either Null or contain an empty array
sink
I am currently stuck at the 3rd transformation, where I cannot filter out the rows for which the 2 arrays do not intersect. I think that when intersect doesn't find any common element it returns an empty string array, and I have not found the right condition to filter it out.
I have tried:
1. not(isNull(words))
2. word != array('')
3. not(isNull(word[1]))
but none of them worked.
Any suggestions regarding the whole process or the filtering of the empty string array would be appreciated.
Thank you in advance.
I was expecting to get back only the rows that contain at least one of the substrings, but I get all the rows regardless of whether they contain one of the substrings.
You can check the size of the array and remove the rows with array size=0. In filter transformation, filter on size(words)!=0.
I reproduced this with sample inputs.
The derived column transformation uses the same expression:
intersect(split(text_column, ' '), ['array','of','words'])
Then, in the Filter transformation, the condition is set to size(words)!=0
This way, the rows with an empty array are removed.
Reference: MS document on size expression.
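The derive-then-filter logic can be sketched in plain Python to show why checking the array size works where a null check does not: an empty intersection is still a (non-null) array, just with zero elements. This is only an illustration, not data flow expression code; `matched_words` and the sample rows are made up for the example.

```python
KEYWORDS = ["array", "of", "words"]


def matched_words(text, keywords=KEYWORDS):
    """Return the keywords that appear as space-separated tokens in text."""
    tokens = set(text.split(" "))
    return [word for word in keywords if word in tokens]


rows = ["an array of things", "nothing relevant here", "just words"]

# Derived column step: attach the list of matches to every row.
derived = [(text, matched_words(text)) for text in rows]

# Filter step: keep only rows whose match list is not empty.
kept = [(text, words) for text, words in derived if len(words) != 0]

for text, words in kept:
    print(text, words)
```

The second sample row produces an empty list rather than a null value, so only a size check removes it.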
I hope you can give me some guidance.
I have this first table with two comma delimited cells:
I separated the delimited values of the last column with Power Query, using split into rows and then unpivot (https://exceloffthegrid.com/power-query-split-delimited-cells-into-rows/):
My question is about the second column: how can I avoid generating every combination of the values (the cross product) and instead put only the single matching value in each cell?
I want to get the table like this:
You could add 2 more steps:
Between the first and second tables above, create a new column that combines the values pairwise, e.g. "d-1, e-2, f-3" and "y-4, x-5", and process that column before the unpivot.
After the unpivot, split the column with the combined values back into 2 columns.
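The two extra steps can be sketched in Python to show why pairing the values first avoids the cross product. This is an illustration only, not Power Query M. The function name `expand_pairs` is hypothetical, and the sketch assumes both cells hold the same number of items, use ", " as the delimiter, and that the left-hand values contain no "-".

```python
def expand_pairs(row_id, left_cell, right_cell, sep=", "):
    """Expand two delimited cells into one output row per matching pair."""
    left_items = left_cell.split(sep)
    right_items = right_cell.split(sep)

    # Step 1: combine the values position by position, e.g. "d-1".
    combined = [f"{a}-{b}" for a, b in zip(left_items, right_items)]

    # Step 2: after expanding to rows, split each combined value back
    # into two columns at the first "-".
    return [(row_id, *item.split("-", 1)) for item in combined]


for row in expand_pairs(1, "d, e, f", "1, 2, 3"):
    print(row)
```

Because the values are paired before the rows are expanded, three items on each side give three rows instead of nine.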
I am working with two columns, both of which contain concatenated data. Examples are shown below:
The column I need to search in:
col1
buildingtoysjacknicksharon
watertoysnatealexasamfelix
rctoyssharonsamnate
sciencetoysjackfelixalexa
The list:
col2
buildingtoyssharon
watertoysaleaxa
rctoysnate
sciencefelix
I wanted to do a VLOOKUP of column 2 against column 1, but the data in the two columns do not match exactly. I tried other methods using LOOKUP, SEARCH, INDEX, MATCH, COUNTIF, etc., but could not find a way to look up the data from column 2.
I wanted to know if there is a way to return "buildingtoyssharon" from "buildingtoysjacknicksharon" etc.
I'm using Presto, and I have a dataset of rows with ids and values; each id can have multiple rows with multiple values.
I need to group the values into an array and create one row of values for each id (a comma-delimited string of all values per id).
The number of values for each id can be different (some will have 1, some will have ~10)
Any ideas on how to do that?
I would suggest looking at [array_agg](https://prestodb.io/docs/current/functions/aggregate.html "Aggregate Functions")
select
id, array_agg(value)
from
database
group by
id
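This returns one array per id. If you need a comma-delimited string rather than an array, you can wrap the aggregate as array_join(array_agg(value), ','). I hope this is what you want.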
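For readers who want to see the grouping step spelled out, here is a rough Python sketch of what the aggregation does: collect the values per id, then join each group into one comma-delimited string. It is not Presto code; `group_values` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_values(rows, sep=","):
    """Collect values per id and join each group into one string."""
    grouped = defaultdict(list)
    for row_id, value in rows:
        grouped[row_id].append(str(value))
    # One entry per id, regardless of how many values the id has.
    return {row_id: sep.join(values) for row_id, values in grouped.items()}


rows = [(1, "a"), (1, "b"), (2, "c"), (1, "d")]
print(group_values(rows))
```

Ids with a single value simply produce a string with no separator, so a varying number of values per id needs no special handling.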
I have the following table with three columns A, B, and C in Power BI, from which I want to select columns A and B to create a new table.
I am looking for the equivalent of this in pandas:
table_2 = table1[["A","B"]]
or from SQL:
SELECT A,B FROM table1;
But I can't find the equivalent function in Power BI; according to the docs, the SELECTCOLUMNS function is used to create new columns:
Adds calculated columns to the given table or table expression.
The SELECTCOLUMNS function works fine for this. It allows you to create more complex calculated columns, but you can simply use the column itself as the calculation definition.
This should do the trick:
SELECTCOLUMNS(table1, "A", table1[A], "B", table1[B])