Select rows from an array of uuids when dealing with two tables - node.js

I have products and providers. Each product has a uuid, and each provider has a list of uuids of the products it can provide.
How do I select all the products that a given provider (identified by its uuid) can offer?
Products:
+------+------+------+
| uuid | date | name |
+------+------+------+
| 0    | -    | -    |
| 1    | -    | -    |
| 2    | -    | -    |
+------+------+------+
Providers:
+------+----------------+
| uuid | array_products |
+------+----------------+
| 0    | [...]          |
| 1    | [...]          |
| 2    | [...]          |
+------+----------------+

select p.name, u.product_uuid
from products p
join
(
    select unnest(array_products) as product_uuid
    from providers
    where uuid = :target_provider_uuid
) u on p.uuid = u.product_uuid;
Please note, however, that your data design is inefficient and much harder to work with than a normalized one.
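If you'd rather avoid the derived table, an equivalent formulation tests array membership directly; a minimal sketch, assuming array_products is a uuid[] column:

-- any() checks whether p.uuid appears in the provider's array
select p.name, p.uuid as product_uuid
from products p
join providers pr on p.uuid = any(pr.array_products)
where pr.uuid = :target_provider_uuid;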

Related

How to create sketches for millions of rows of data using Spark?

I have a data frame something like this:
| UserID | Platform | Genre | Publisher  |
| ------ | -------- | ----- | ---------- |
| 1      | PS2      | FPS   | Activision |
| 2      | PS1      | Race  | EA Sports  |
| 3      | PS2      | RTS   | Microsoft  |
| 4      | Xbox     | Race  | EA Sports  |
Now, from the above data frame, I want to build a Map whose keys combine a column name and value, and whose values are sets of user IDs.
For example:
Platform_PS2 = [1,3]
Platform_Xbox = [4]
Platform_PS1 = [2]
Genre_Race = [2,4]
Basically, I want to build sketches from these sets at the end.
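One way to get those per-value user sets, sketched in Spark SQL and assuming the data frame has been registered as a temp view named games (a hypothetical name): group by each column, collect the user IDs into a set, and union the columns together.

-- one branch per column: key = '<column>_<value>', value = set of user ids
select concat('Platform_', Platform) as key, collect_set(UserID) as user_ids
from games
group by Platform
union all
select concat('Genre_', Genre), collect_set(UserID)
from games
group by Genre
union all
select concat('Publisher_', Publisher), collect_set(UserID)
from games
group by Publisher

Collecting this result to the driver gives the Map, and each user_ids set can then be fed to the sketch builder.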

Creating an incremental model in DBT+Spark with no unique_key

I have a user table as follows:
|---------|------------|
| user_id | visited    |
|---------|------------|
| 1       | 12-23-2021 |
| 1       | 11-23-2021 |
| 1       | 10-23-2021 |
| 2       | 01-21-2021 |
| 3       | 02-19-2021 |
| 3       | 02-25-2021 |
|---------|------------|
I'm trying to create an incremental model that gets each user's most recent visited date.
Since an incremental model needs a unique key, I'm concatenating user_id || visited -> unique_id.
DBT + Spark
{{ config(
    materialized='incremental',
    file_format='delta',
    unique_key='unique_id',
    incremental_strategy='merge'
) }}

with CTE as (
    select user_id,
           visited,
           user_id || visited as unique_id
    from my_table
    {% if is_incremental() %}
    where visited >= date_add(current_date, -1)
    {% endif %}
)
select user_id,
       unique_id,
       max(visited) as recent_visited_date
from CTE
group by 1, 2
The above model gives me the following result:
|---------|-------------|---------------------|
| user_id | unique_id   | recent_visited_date |
|---------|-------------|---------------------|
| 1       | 112-23-2021 | 12-23-2021          |
| 1       | 111-23-2021 | 11-23-2021          |
| 1       | 110-23-2021 | 10-23-2021          |
| 2       | 201-21-2021 | 01-21-2021          |
| 3       | 302-19-2021 | 02-19-2021          |
| 3       | 302-25-2021 | 02-25-2021          |
|---------|-------------|---------------------|
The output I want is:
|---------|---------------------|
| user_id | recent_visited_date |
|---------|---------------------|
| 1       | 12-23-2021          |
| 2       | 01-21-2021          |
| 3       | 02-25-2021          |
|---------|---------------------|
I know that for an incremental model with the merge strategy, the unique_id has to be in the final table so the merge can compare against it, but keeping the unique_id produces the wrong output.
Is there any other way to get max(visited) per user?
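One possible way around it, as a sketch rather than a definitive fix: make user_id itself the unique_key and aggregate before the merge, so each user only ever has one row for the merge to replace. This assumes rows arriving in an incremental run are never older than the date already stored for that user; if they can be, the select would need to compare against {{ this }} (e.g. with greatest()).

{{ config(
    materialized='incremental',
    file_format='delta',
    unique_key='user_id',
    incremental_strategy='merge'
) }}

-- one row per user; the merge on user_id replaces it on each incremental run
select user_id,
       max(visited) as recent_visited_date
from my_table
{% if is_incremental() %}
where visited >= date_add(current_date, -1)
{% endif %}
group by 1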

Oracle: update table where a number column is in a string variable

Here is what I want to do.
Current table:
+----+-------------+
| id | data        |
+----+-------------+
| 1  | max         |
| 2  | linda       |
| 3  | sam         |
| 4  | henry       |
+----+-------------+
I have a string variable id_str = '1,3,4'.
Mystery Query - something like:
UPDATE table SET data = 'jen' where id in (id_str)
Resulting table:
+----+-------------+
| id | data        |
+----+-------------+
| 1  | jen         |
| 2  | linda       |
| 3  | jen         |
| 4  | jen         |
+----+-------------+
Starting from a list of ids given as a CSV string, say :id_str, you can do:
update mytable
set data = 'jen'
where ',' || :id_str || ',' like ',%' || id || ',%'
An alternative is a regex function:
where regexp_like(:id_str, '(^|,)' || id || '(,|$)')
Both solutions work, but they are rather inefficient. A much better solution would be to pass the search parameters as a proper list of values rather than a CSV string.
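If the CSV string is all you can pass in, one way to turn it into a proper list inside the statement is the regexp_substr/connect by split idiom (a sketch, assuming Oracle 11g or later for regexp_count):

update mytable
set data = 'jen'
where id in (
    -- split '1,3,4' into one row per id, then match with a plain IN
    select to_number(regexp_substr(:id_str, '[^,]+', 1, level))
    from dual
    connect by level <= regexp_count(:id_str, ',') + 1
);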

Find all occurrences from a string - Presto

I have the following rows in Hive (HDFS) and am using Presto as the query engine.
1,#markbutcher72 #charlottegloyn Not what Belinda Carlisle thought. And yes, she was singing about Edgbaston.
2,#tomkingham #markbutcher72 #charlottegloyn It's true the garden of Eden is currently very green...
3,#MrRhysBenjamin #gasuperspark1 #markbutcher72 Actually it's Springfield Park, the (occasional) home of the might
The requirement is to get the following through a Presto query. How can we do this?
1,markbutcher72
1,charlottegloyn
2,tomkingham
2,markbutcher72
2,charlottegloyn
3,MrRhysBenjamin
3,gasuperspark1
3,markbutcher72
select t.id,
       u.token
from mytable as t
cross join unnest(regexp_extract_all(text, '(?<=#)\S+')) as u(token);
+----+----------------+
| id | token          |
+----+----------------+
| 1  | markbutcher72  |
| 1  | charlottegloyn |
| 2  | tomkingham     |
| 2  | markbutcher72  |
| 2  | charlottegloyn |
| 3  | MrRhysBenjamin |
| 3  | gasuperspark1  |
| 3  | markbutcher72  |
+----+----------------+
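As a side note, if the regex flavor in use does not support lookbehind, the same tokens can be captured with an explicit group instead, since regexp_extract_all also accepts a group index:

-- group 1 drops the leading '#' without needing lookbehind
cross join unnest(regexp_extract_all(text, '#(\S+)', 1)) as u(token)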

How can I do multiple concurrent insert transactions against postgres without causing a deadlock?

I have a large dump file that I am processing in parallel and inserting into a Postgres 9.4.5 database. There are ~10 processes that each start a transaction, insert ~X000 objects, and then commit, repeating until their chunk of the file is done. Except they never finish, because the database locks up.
The dump contains 5 million or so objects, each object representing an album. An object has a title, a release date, a list of artists, a list of track names, etc. I have a release table for each one of these (whose primary key comes from the object in the dump) and then join tables with their own primary keys for things like release_artist and release_track.
The tables look like this:
Table: mdc_releases
  Column  |           Type           | Modifiers | Storage  | Stats target | Description
----------+--------------------------+-----------+----------+--------------+-------------
 id       | integer                  | not null  | plain    |              |
 title    | text                     |           | extended |              |
 released | timestamp with time zone |           | plain    |              |
Indexes:
    "mdc_releases_pkey" PRIMARY KEY, btree (id)

Table: mdc_release_artists
   Column   |  Type   |                             Modifiers                             | Storage | Stats target | Description
------------+---------+-------------------------------------------------------------------+---------+--------------+-------------
 id         | integer | not null default nextval('mdc_release_artists_id_seq'::regclass) | plain   |              |
 release_id | integer |                                                                   | plain   |              |
 artist_id  | integer |                                                                   | plain   |              |
Indexes:
    "mdc_release_artists_pkey" PRIMARY KEY, btree (id)
and inserting an object looks like this:
insert into release(...) values(...) returning id; -- refer to id below as $ID
insert into release_meta(release_id, ...) values ($ID, ...);
insert into release_artists(release_id, ...) values ($ID, ...), ($ID, ...), ...;
insert into release_tracks(release_id, ...) values ($ID, ...), ($ID, ...), ...;
So the transactions look like BEGIN, the above snippet 5000 times, COMMIT. I've done some googling on this and I'm not sure why what look to me like independent inserts are causing deadlocks.
This is what select * from pg_stat_activity shows:
| state_change                  | waiting | state               | backend_xid | backend_xmin | query
+-------------------------------+---------+---------------------+-------------+--------------+---------------------------------
| 2016-01-04 18:42:35.542629-08 | f       | active              |             | 2597876      | select * from pg_stat_activity;
| 2016-01-04 07:36:06.730736-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:37:36.066837-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:37:36.314909-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:37:49.491939-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:36:04.865133-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:38:39.344163-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:36:48.400621-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:34:37.802813-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:37:24.615981-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:37:10.887804-08 | f       | idle in transaction |             |              | BEGIN
| 2016-01-04 07:37:44.200148-08 | f       | idle in transaction |             |              | BEGIN
