I'm using Presto, and I have a dataset of rows with ids and values. Each id can have multiple rows with different values.
I need to group the values into an array and create one row of "value"s for each "id" (a comma-delimited string of all values per id).
The number of values per id varies (some will have 1, some will have ~10).
Any ideas on how to do that?
I would suggest looking at [array_agg](https://prestodb.io/docs/current/functions/aggregate.html "Aggregate Functions"):
select
id, array_agg(value)
from
database
group by
id
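This returns one array per id. If you need a comma-delimited string instead of an array, wrapping the aggregate as array_join(array_agg(value), ',') should work.
I hope this is what you want.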
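To check what the grouped output should look like, here is a minimal Python sketch of the same group-and-join logic. It is not Presto code; the sample rows and the function name `group_values` are made up for illustration.

```python
from collections import defaultdict

def group_values(rows):
    """Collect the values for each id and join them into a comma-delimited string."""
    grouped = defaultdict(list)
    for row_id, value in rows:
        # Values are converted to strings so they can be joined.
        grouped[row_id].append(str(value))
    return {row_id: ",".join(values) for row_id, values in grouped.items()}

# Hypothetical sample data: (id, value) pairs
rows = [(1, "a"), (1, "b"), (2, "c"), (1, "d")]
print(group_values(rows))  # {1: 'a,b,d', 2: 'c'}
```

The sketch keeps values in input order, which a SQL engine does not necessarily guarantee for an aggregate.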
I would like to write a query in a serverless pool that concatenates string values from multiple rows into a single row of comma-separated values. When I use the COALESCE function I get the error below, which I am unable to fix: "Queries referencing variables are not supported in distributed processing mode"
Input rows :
A
B
C
A
B
Output row (Looking for distinct values only while creating a list like below)
A,B,C
You can use the STRING_AGG() function to concatenate values from multiple rows into a single comma-separated row.
Get the distinct values of the column and apply STRING_AGG to the result, as below.
select STRING_AGG(col1, ',') output_col1 from (select distinct col1 from #tb1) a
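The distinct-then-join step can be sketched in Python to show the expected result for the input rows in the question. The function name `distinct_join` is hypothetical, and sorting the values is an assumption made here so the output is stable.

```python
def distinct_join(values, sep=","):
    """Drop duplicate values, then join what remains with a separator."""
    # set() removes duplicates; sorted() is an assumption made for a stable order.
    return sep.join(sorted(set(values)))

# The input rows from the question
print(distinct_join(["A", "B", "C", "A", "B"]))  # A,B,C
```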
I have a table of data below in columns A & B. I would like to count the number of times a Type number appears for each ID shown in the top table D to I. I've included the answers I'm looking for in the table under it.
I know I can use =COUNTIFS(B2:B25,"="&E1) to get the count of the Types, but I don't know how to restrict the count to each unique ID.
You just left out the second criteria pair of COUNTIFS, the one that matches the ID.
Try: =COUNTIFS($B$2:$B$25, E$1, $A$2:$A$25, $D2)
If one of the headers is the text "Blank" and should count the empty Type cells, use this instead:
=COUNTIFS($B$2:$B$25, IF(E$1="Blank","",E$1), $A$2:$A$25, $D2)
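What the two-criteria COUNTIFS computes is a count per (ID, Type) pair. A minimal Python sketch of that idea follows; the sample rows and the name `count_types_per_id` are invented for illustration, and an empty string stands in for a blank Type cell.

```python
from collections import Counter

def count_types_per_id(rows):
    """Count how often each (id, type) pair occurs."""
    return Counter(rows)

# Hypothetical sample data: (ID, Type) pairs; "" stands in for a blank Type cell
rows = [("X", 1), ("X", 1), ("X", 2), ("Y", 1), ("Y", "")]
counts = count_types_per_id(rows)

print(counts[("X", 1)])   # 2
print(counts[("Y", "")])  # 1
print(counts[("Y", 2)])   # 0, because a Counter returns 0 for pairs that never occur
```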
I want to iterate over the rows of a dataset (row by row) and get the value of a certain column. How can I achieve this?
I tried:
oldDF.foreach((ForeachFunction<Row>) row -> System.out.println(row));
Is this the right way? If not, how can I do it, and how do I access the value of a column in a row?
Thanks.
If you just want to output the values of a specific column, you can select that column and show the results on the console like this:
oldDF.select("_2").show() // show the 2nd column's values of oldDF
If you want the elements of this column as an array or list of Row objects, swap the show() method for collect() or collectAsList() like this:
oldDF.select("_2").collect() // store the 2nd column's values of oldDF
I have a table (Table 1) with multiple names per row i.e. Column Structure is (Unique ID, Name 1, Name 2, Name 3, etc). I then have another table (Table2) which is just a list of names in one column (Name x, Name y, Name z, etc). I am trying to identify which rows in Table 1 have a column with a name in Table2 and then return whichever name is matching. It does not matter if there are multiple matches, I only care if there is a match.
My current thinking is to use INDEX/MATCH, but I am unsure whether there is a way to pass the list from Table2 as the "Lookup Value" argument of MATCH. Does anyone know if this is possible, or if there is another way to get what I am looking for?
I have a table, say TABLE-1, with 4 columns: (ID, NAME, GROUP_NAME, IS_APPROVER). ID is not the primary key, and there are many rows with the same ID. Now I have a new table, say TABLE-2, with columns (NEW_ID, OLD_IDs). In TABLE-2, OLD_ID is the primary key. I have to replace every ID value in TABLE-1 with the NEW_ID value from TABLE-2, matched on OLD_ID. How can I do this?
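You could use VLOOKUP to make it a little easier.
I added a new column to the first table; the formula for that column is simply
=VLOOKUP([@ID], Table2[#All], 2, FALSE)
VLOOKUP searches the first column of the range, so this assumes the old IDs are in the first column of Table2 and the new IDs in the second. FALSE forces an exact match.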
Assuming your tables look like the below:
You should be able to use a combination of INDEX and MATCH to get your desired results:
=INDEX(TABLE-2[NEW_ID],(MATCH(TABLE-1[@ID],TABLE-2[OLD_IDs],0)))
If your data is in ranges rather than tables, the general formula is:
=INDEX(ColumnFromWhichValueShouldBeReturned, (MATCH(LookupValue, ColumnAgainstWhichToLookup, 0)))
EDIT following further info from OP:
If not all the IDs are matched in TABLE-2, you can use the following, which keeps the original ID when no match is found:
=IFERROR(INDEX(TABLE-2[NEW_ID],(MATCH(TABLE-1[@ID],TABLE-2[OLD_IDs],0))), TABLE-1[@ID])
In practice you can see this working below:
This is the solution.
Ref: https://www.ablebits.com/office-addins-blog/2014/08/13/excel-index-match-function-vlookup/
=INDEX(Table_2[NewIDs],MATCH(Table_1[@ID],Table_2[Old_ID],0))
Ex: =INDEX($K$2:$K$253,MATCH(B2,$J$2:$J$253,0))
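The lookup-with-fallback behaviour of the IFERROR(INDEX(MATCH(...))) variant above can be sketched in Python as a dictionary lookup that keeps the original ID when no mapping exists. The sample IDs and the name `remap_ids` are hypothetical.

```python
def remap_ids(ids, old_to_new):
    """Replace each id with its new id; keep the original when no mapping exists."""
    # dict.get(key, default) plays the role of IFERROR around INDEX/MATCH.
    return [old_to_new.get(old_id, old_id) for old_id in ids]

# Hypothetical mapping built from TABLE-2: old ID -> new ID
old_to_new = {101: 9001, 102: 9002}

# Hypothetical ID column from TABLE-1 (IDs may repeat; 103 has no mapping)
print(remap_ids([101, 102, 101, 103], old_to_new))  # [9001, 9002, 9001, 103]
```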