Counting distinct elements from strings in PostgreSQL

I am trying to count the elements contained in a string in the following way:
row features
1 'a | b | c'
2 'a | c'
3 'b | c | d'
4 'a'
Result:
feature count
a 3
b 2
c 3
d 1
I have already found a solution by finding the highest number of features and separating each feature into its own column (so feature1 = content of feature 1 in the string), but then I have to aggregate the data manually. There must be a smarter way to get the result shown in my example.

If you normalize the data with unnest(), this turns into a simple GROUP BY:
select trim(f.feature), count(*)
from the_table t
cross join lateral unnest(string_to_array(t.features, '|')) as f(feature)
group by trim(f.feature)
order by 1;
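The same split, trim, and count logic can be sketched outside the database. This is only an illustration in Python, not part of the answer above; the names rows and count_features are made up for the example, and the sample data is copied from the question.

```python
from collections import Counter

# Sample rows copied from the question's "features" column.
rows = ["a | b | c", "a | c", "b | c | d", "a"]


def count_features(feature_strings):
    """Split each string on '|', trim whitespace, and count every feature."""
    counts = Counter()
    for features in feature_strings:
        # Mirrors unnest(string_to_array(features, '|')) followed by trim().
        counts.update(part.strip() for part in features.split("|"))
    return counts


feature_counts = count_features(rows)
for feature, count in sorted(feature_counts.items()):
    print(feature, count)
```

Each string contributes one element per feature, which is what the lateral unnest() does row by row before the GROUP BY.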

I used regexp_split_to_table:
SELECT regexp_split_to_table(feature, E' \\| ') AS k, count(*)
FROM tab1
GROUP BY k

Related

How to "Group By" by result and count in Azure App Insights

I'm trying to group some results I have in app insights and am struggling
If I were to tabulate my results, it would look like
Product Version
A 1
B 2
A 2
A 1
B 3
B 3
As you can see, I have 2 products (A and B), and each has a version number.
I am trying to group these and provide a count, so my end result is
Product Version Count
A 1 2
A 2 1
B 2 1
B 3 2
At the moment, my approach is a mess because I am doing this manually with
customEvents
| summarise A1 = count(customEvents.['payload.prod'] == "A" and myEvents.['payload.vers'] == "1"),
| summarise A2 = count(customEvents.['payload.prod'] == "A" and myEvents.['payload.vers'] == "2")
I have no idea how I can aggregate these so it can group by product and version and then count the occurrences of each
I think you are looking for:
customEvents
| extend Product = tostring(customDimensions.prod)
| extend MajorVersion = split(tostring(customDimensions.Version), ".")[0]
| summarize Count = count() by Product , tostring(MajorVersion)
I wrote this off the top of my head, so there might be some syntax issues. I assumed prod and vers are in customDimensions; let me know if that is not the case.
You can summarize by multiple fields as you can see.
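To make the idea of grouping by two fields at once concrete, here is a small Python sketch (not Kusto) that counts (product, version) pairs from the question's table. The names events and pair_counts are illustrative only.

```python
from collections import Counter

# Events copied from the question's table as (product, version) pairs.
events = [("A", 1), ("B", 2), ("A", 2), ("A", 1), ("B", 3), ("B", 3)]

# Counting whole tuples is the in-memory analogue of summarizing
# a count by two columns: each distinct pair becomes one group.
pair_counts = Counter(events)

for (product, version), count in sorted(pair_counts.items()):
    print(product, version, count)
```

The key point is that the group key is the combination of both fields, so no per-combination column such as A1 or A2 has to be written by hand.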

How to unnest multiple columns in presto, outputting into corresponding rows

I'm trying to unnest some code
I have a couple of columns that contain arrays, both using | as a delimiter
The data would be stored looking like this, with extra values to the side which show the current currency
I want to output it like this
I tried doing another unnest column, like this
SELECT c.campaign, c.country, a.product_name, u.price--, u.price -- add price to this split. handy for QBR
FROM c, UNNEST(split(price, '|')) u(price), UNNEST(split(product_name, '|')) a(product_name)
group by 1,2, 3, 4
but this duplicated several rows, so I suspect that unnesting the two columns separately doesn't work
Thanks
The issue with your query is that the clause FROM c, UNNEST(...), UNNEST(...) is effectively computing the cross join between each row of c and the rows produced by each of the derived tables resulting from the UNNEST calls.
You can solve it by unnesting all your arrays in a single call to UNNEST, thus, producing a single derived table. When used in that manner, the UNNEST produces a table with one column for each array and one row for each element in the arrays. If the arrays have a different length, it will produce rows up to the number of elements in the largest array and fill in with NULL for the column of the smaller array.
To illustrate, for your case, this is what you want:
WITH data(a, b, c) AS (
VALUES
('a|b|c', '1|2|3', 'CAD'),
('d|e|f', '4|5|6', 'USD')
)
SELECT t.a, t.b, data.c
FROM data, UNNEST(split(a, '|'), split(b, '|')) t(a, b)
which produces:
a | b | c
---+---+-----
a | 1 | CAD
b | 2 | CAD
c | 3 | CAD
d | 4 | USD
e | 5 | USD
f | 6 | USD
(6 rows)
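The difference between two separate UNNEST calls and a single combined one can be mimicked in Python. This is a rough analogy rather than Presto itself: itertools.product plays the role of the cross join, and itertools.zip_longest plays the role of the single UNNEST, padding the shorter list with None much as the missing values become NULL. The function names are invented for the illustration.

```python
from itertools import product, zip_longest


def unnest_separately(names, prices):
    """Two independent unnests: every name is paired with every price."""
    return list(product(names.split("|"), prices.split("|")))


def unnest_together(names, prices):
    """One combined unnest: elements are paired by position, padded with None."""
    return list(zip_longest(names.split("|"), prices.split("|")))


print(len(unnest_separately("a|b|c", "1|2|3")))
print(unnest_together("a|b|c", "1|2|3"))
print(unnest_together("a|b", "1"))
```

With three elements per column, the separate form yields nine pairs per source row, which is the duplication described in the question, while the combined form yields three.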

How to count every word occurrence in a string in an Oracle loop?

I've got a problem which seems simple at first, but really isn't. I'm storing words in a table in such a way that the pair of strings "A B C D E" and "D E F" becomes:
id value
-- -----
1 A
1 B
1 C
1 D
1 E
2 D
2 E
2 F
I pass my Oracle procedure a string which looks like this: "A B C D G". Now I want to check the percentage of similarity between the strings in the database and the string passed as a parameter.
I presume that I have to use one of the split functions and an array, then check whether every word in the passed string occurs in the table, and then count per id. But there's a twist: I need a precise percentage value.
So, result from example above should look like this:
id percentage
-- ----------
1 80 -- 4 out of 5 words exist in the query string (A B C D)
2 33 -- 1 out of 3 (D)
So, my questions are:
what is most effective way to split query string and then iterate on it (table?)
how to store partial results and then count them?
how to count final percentage value?
Every help would be greatly appreciated.
The following query would give you what you want without the need to bother with procedures.
select id
, sum(case when value in ('A', 'B', 'C', 'D', 'G') then 1 else 0 end) / count(*)
from my_table
group by id
Alternatively if you have to pass the string "A B C D G" and get a result back you could do:
select id
, sum(case when instr('A B C D G', value) <> 0 then 1 else 0 end) / count(*)
from my_table
group by id
These do involve full scanning the table or an index full scan if you use the suggested index below, so you might want to add the following where clause if you only want to find ids that have a percentage > 0.
select mt.id
, sum(case when instr('A B C D G', mt.value) <> 0 then 1 else 0 end) / count(*)
from my_table mt
where exists ( select 1
from my_table
where id = mt.id
and instr('A B C D G', value) <> 0 )
group by mt.id
For all the queries, my_table should be indexed on value, id in that order.
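The arithmetic behind these queries (matches per id divided by rows per id) can be checked with a short Python sketch. It is an illustration only: stored and similarity_percentages are made-up names, the result is multiplied by 100 to get a percentage, and words are compared as whole words through a set, which is stricter than the substring test that instr() performs.

```python
from collections import defaultdict

# (id, value) rows copied from the question's table.
stored = [
    (1, "A"), (1, "B"), (1, "C"), (1, "D"), (1, "E"),
    (2, "D"), (2, "E"), (2, "F"),
]


def similarity_percentages(rows, query):
    """For each id, return the percentage of its words found in the query string."""
    query_words = set(query.split())
    words_by_id = defaultdict(list)
    for row_id, value in rows:
        words_by_id[row_id].append(value)
    return {
        row_id: 100 * sum(word in query_words for word in words) / len(words)
        for row_id, words in words_by_id.items()
    }


percentages = similarity_percentages(stored, "A B C D G")
for row_id, pct in sorted(percentages.items()):
    print(row_id, round(pct))
```

For the sample data this gives 4 matches out of 5 words for id 1 and 1 match out of 3 for id 2.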
Have you had a look at UTL_MATCH? It doesn't do exactly what you're trying to achieve, but you may find it useful if the definition of your percentage agreement isn't set in stone.

Within Table Subquery of Identical Combinations

I would like to select groups that have the exact same attributes from a table. For example, my table is like the following
facs_run_id | fcj_id
1 | 17
1 | 4
1 | 12
2 | 17
2 | 4
2 | 12
3 | 17
3 | 12
3 | 10
In this table each facs_run_id has a different combination of fcj_id values; some are shared between facs_run_id numbers while others are not. For example, facs_run_id 1 and 2 above are identical, while 3 shares some fcj_id values but is not identical to 1 and 2. I would like to write a query to:
gather all fcj_id from a particular facs_run_id
find all facs_run_id that have the exact same fcj_id combination.
Herein, I want to find all facs_run_id that are equal in fcj_id combinations to facs_run_id: 1, so it should return 2 (or 1 & 2).
I can get those that are missing certain fcj_id and even find which fcj_id are missing with this:
SELECT facs_run_id
FROM facs_panel
EXCEPT
SELECT fcj_id
FROM facs_panel
WHERE facs_run_id = 2;
or this:
SELECT row(fp.*, fcj.fcj_antigen, fcj.fcj_color)
FROM facs_panel fp
LEFT OUTER JOIN facs_conjugate_lookup fcj ON fcj.fcj_id = fp.fcj_id
WHERE fp.fcj_id in ( SELECT fp.fcj_id
FROM facs_panel fp
WHERE fp.facs_run_id = 1);
But I am not able to make a query that returns IDENTICAL facs_run_id. I suppose this could be considered a way of looking for aggregated duplicates, but I don't know how to do that. Any suggestions or pointers would be greatly appreciated (or a better way to create the table if this type of query will not work).
It's pretty easy with a couple CTEs:
with f1 as (select
facs_run_id,
-- sort inside the aggregate so identical sets always produce equal arrays
array_agg(fcj_id order by fcj_id) as flist
from facs_panel
group by facs_run_id),
f2 as (select flist, count(*)
from f1
group by flist
having count(*) > 1)
select f2.flist, f1.facs_run_id
from f2
join f1 on (f2.flist = f1.flist)
order by flist, facs_run_id;
The data from the question, run through this query, produces:
flist | facs_run_id
-----------+-------------
{4,12,17} | 1
{4,12,17} | 2
(2 rows)
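The same two-step idea (collect each run's ids, then group runs by that collection) can be sketched in Python. This is an illustration, not the PostgreSQL query: panel and identical_runs are invented names, and a frozenset is used as the group key, so element order and duplicate fcj_id values within a run are ignored.

```python
from collections import defaultdict

# (facs_run_id, fcj_id) rows copied from the question's table.
panel = [
    (1, 17), (1, 4), (1, 12),
    (2, 17), (2, 4), (2, 12),
    (3, 17), (3, 12), (3, 10),
]


def identical_runs(rows):
    """Return {combination: [run ids]} for combinations shared by more than one run."""
    ids_by_run = defaultdict(set)
    for run_id, fcj_id in rows:
        ids_by_run[run_id].add(fcj_id)

    # Group runs by their full combination; a frozenset is hashable and order-free.
    runs_by_combo = defaultdict(list)
    for run_id, fcj_ids in ids_by_run.items():
        runs_by_combo[frozenset(fcj_ids)].append(run_id)

    return {
        combo: sorted(runs)
        for combo, runs in runs_by_combo.items()
        if len(runs) > 1
    }


for combo, runs in identical_runs(panel).items():
    print(sorted(combo), runs)
```

Run 3 drops out because no other run has exactly the combination 10, 12, 17.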

Excel: Arrange Rows to Match Another Row

I have a database with a column A with some values, and then 2 additional columns: column B contains a bunch of values that match the ones in column A but are not in order. I also have a column C with information that pertains to that specific row, and I would like it to stay in 'sync' with column B. For example:
| A | B | C |
1 3 A
2 1 F
3 2 D
4 5 R
5 4 P
I'd like a way to sort it so my result would be:
| A | B | C |
1 1 F
2 2 D
3 3 A
4 4 P
5 5 R
Is there a way to do this?
If possible, if there is no match, delete the row?
In Excel 2007/2010, simply select the cell with the "B" in it, go to the Data tab along the top, and click the A to Z button near the middle of that tab. As long as B and C are adjacent columns, they will sort according to your needs. Note that this assumes column A is not adjacent to the other two; if it is, highlight columns B and C first and then use the same sort button. If Excel shows a Sort Warning, select the "Continue with the current selection" radio button and click OK.
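For readers who would rather check the expected result outside Excel, the alignment can be sketched in Python. This is an illustration only and says nothing about Excel's own behavior; column_a, rows_bc, and align_to_a are made-up names, and rows in A without a match in B are dropped, as the question asks.

```python
# Column A values and the unsorted (B, C) pairs copied from the question.
column_a = [1, 2, 3, 4, 5]
rows_bc = [(3, "A"), (1, "F"), (2, "D"), (5, "R"), (4, "P")]


def align_to_a(keys, rows):
    """Reorder (B, C) pairs to follow column A, dropping keys with no match."""
    lookup = {b: c for b, c in rows}
    return [(key, key, lookup[key]) for key in keys if key in lookup]


aligned = align_to_a(column_a, rows_bc)
for a, b, c in aligned:
    print(a, b, c)
```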
