How to do inner join with two containers in Cosmos DB - azure

I have two containers in Cosmos DB and I need to do an inner join between them.
Example:
VD_VENDAS
VD_ESTOQUE
VD_COMPRAS
I need to do an inner join between VD_VENDAS and VD_ESTOQUE. I tried this:
SELECT v.obraId
FROM VD_VENDAS v
JOIN VD_ESTOQUE e IN v.obraId
But it doesn't work. How can I do this?

Cosmos DB's JOIN is only for self-joins (joining against data within the same document). There is no way to run a query that performs a join across multiple containers.
To accomplish something similar to what you want, you'll need to run two queries, one against each container (with whatever filtering you need), and then combine the results in your application. How you write those queries is really up to you.
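As a rough illustration of combining the two result sets on the client, here is a minimal Python sketch. It assumes the two queries have already been run and returned lists of dicts; the `inner_join` helper, the sample rows, and the `id` values are hypothetical and not part of any Cosmos DB SDK. Only `obraId` comes from the question.

```python
from collections import defaultdict


def inner_join(left_rows, right_rows, key):
    """Hash-join two lists of dicts on `key`, like an SQL INNER JOIN."""
    # Index the right-hand rows by join key so each lookup is O(1).
    index = defaultdict(list)
    for row in right_rows:
        if key in row:
            index[row[key]].append(row)

    joined = []
    for left in left_rows:
        if key not in left:
            continue  # rows without the key cannot match anything
        for right in index.get(left[key], []):
            joined.append({"left": left, "right": right})
    return joined


# Stand-ins for the results of the two separate container queries
# (made-up data, for illustration only).
vendas = [{"id": "v1", "obraId": 10}, {"id": "v2", "obraId": 20}]
estoque = [{"id": "e1", "obraId": 10}, {"id": "e2", "obraId": 30}]

pairs = inner_join(vendas, estoque, "obraId")
```

Here `pairs` holds only the combinations whose `obraId` appears in both inputs. Filtering each query as tightly as possible before joining keeps the amount of data pulled to the client small.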

Related

cassandra - SELECT result as WHERE condition

I want to use the result of a SELECT query as the input to another query's condition, like this:
DELETE FROM message_user WHERE id = 8a81de70-1991-11e9-a38f-9e0aa7c9f25f and group = e5b04c50-1982-11e9-abf3-b17ecbb80329 and receiver in (SELECT member FROM chat_group_member WHERE id = e5b04c50-1982-11e9-abf3-b17ecbb80329)
Cassandra is a distributed database, and nested queries are a kind of join. In Cassandra, data may be stored on multiple hosts, so performing a join could require pulling a large amount of data onto a single node. That could cause performance problems, since all nodes are peers running on commodity hardware. Hence, I think, it is not supported.

Is it possible to use multiple databases within same query using pg-promise?

I have two related tables, T1 and T2, in different databases, D1 and D2. I need to do an inner join between the two tables.
From "Joining Results from Two Separate Databases" it is clear that either separate queries must be run against the two databases and the results consolidated on the client side, or dblink / postgres-fdw must be used.
However, I came across the issue "Multiple Databases #1" and the $dc parameter documented under pg-promise/Database.
I believe the "Multiple Databases #1" issue just makes it possible to connect to multiple databases within the same codebase.
The description of $dc parameter states:
This is mainly to facilitate the use of multiple databases which may need separate protocol extensions, or different implementations within a single task
However, I did not find any examples.
Is the $dc parameter just a database context object that can be accessed, or would it allow an inner join between two different databases?
Is there a way to use two database connections and do a join across databases, without having to do it on the client side, using pg-promise?
Is the $dc parameter just a database context object that can be accessed, or would it allow an inner join between two different databases?
It is the former.
Is there a way to use two database connections and do a join across databases, without having to do it on the client side, using pg-promise?
No. Each Database object represents only a single connection to a database.
Database Context is there to allow re-use of tasks, transactions and protocol extensions across multiple Database objects, by relying on its value.

CosmosDB Join (SQL API)

I'm using Cosmos DB with the SQL API and I'm trying to join two collections. I have seen examples of joins within a document, but that is not what I am looking for.
RequestLog
{
"DateTimeStamp": "2018-03-16T10:56:52.1411006Z",
"RequestId": "8ce80648-66e2-4357-98a8-7a71e8b65301",
"IPAddress": "0.0.0.173"
}
ResponseLog
{
"DateTimeStamp": "2018-03-16T10:56:52.1411006Z",
"RequestId": "8ce80648-66e2-4357-98a8-7a71e8b65301",
"Body": "Hello"
}
Is it possible to join both collections? If so, how?
The Cosmos DB JOIN operation is limited to the scope of a single document. What you can do is join a parent object with its child objects within the same document.
Cross-document joins are NOT supported, so you would have to implement such a query yourself.
It is not possible to write join queries across multiple collections in Cosmos, or even across multiple documents in a single collection for that matter. Your only options here would be to issue separate queries (preferably in parallel) OR if your documents lived together in the same collection, you could retrieve all the relevant logs for a request using the common RequestId property.
SELECT * from c WHERE c.RequestId = '8ce80648-66e2-4357-98a8-7a71e8b65301'
This will only work if the object structure across the documents is the same. In this example it's possible because they both share a property of the same name called RequestId. You can't do JOIN on arbitrary properties.
In Azure Cosmos DB, joins are scoped to a single item. Cross-item and cross-container joins are not supported. See the official documentation for details.
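To make the "separate queries, preferably in parallel" option concrete, here is a small Python sketch. The two `query_*` functions are hypothetical stand-ins that filter in-memory lists; in a real application they would each run a query against the corresponding collection. The sample documents are the ones from the question.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for the two collections (documents from the question).
REQUEST_LOG = [{
    "DateTimeStamp": "2018-03-16T10:56:52.1411006Z",
    "RequestId": "8ce80648-66e2-4357-98a8-7a71e8b65301",
    "IPAddress": "0.0.0.173",
}]
RESPONSE_LOG = [{
    "DateTimeStamp": "2018-03-16T10:56:52.1411006Z",
    "RequestId": "8ce80648-66e2-4357-98a8-7a71e8b65301",
    "Body": "Hello",
}]


def query_request_log(request_id):
    # Hypothetical: replace with a real query against RequestLog.
    return [d for d in REQUEST_LOG if d["RequestId"] == request_id]


def query_response_log(request_id):
    # Hypothetical: replace with a real query against ResponseLog.
    return [d for d in RESPONSE_LOG if d["RequestId"] == request_id]


def fetch_request_and_response(request_id):
    """Run both lookups concurrently and merge the matching documents."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        req_future = pool.submit(query_request_log, request_id)
        resp_future = pool.submit(query_response_log, request_id)
        requests, responses = req_future.result(), resp_future.result()
    # On shared property names, the response document's value wins.
    return [{**req, **resp} for req in requests for resp in responses]


merged = fetch_request_and_response("8ce80648-66e2-4357-98a8-7a71e8b65301")
```

Each entry in `merged` carries the properties of both documents for that RequestId. Threads are used here only because the stand-ins are blocking calls; an async client would do the same job with two awaited queries.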

Selecting from multiple tables in Cassandra CQL

So I have two tables in the query I am using:
SELECT
R.dst_ap, B.name
FROM airports as A, airports as B, routes as R
WHERE R.src_ap = A.iata
AND R.dst_ap = B.iata;
However it is throwing the error:
mismatched input 'as' expecting EOF (..., B.name FROM airports [as] A...)
Is there any way I can do what I am attempting to do (which is how it works relationally) in Cassandra CQL?
The short answer is that there are no joins in Cassandra. Period. So using SQL-based JOIN syntax will yield an error similar to what you posted above.
The idea with Cassandra (or any distributed database) is to ensure that your queries can be served by a single node (cutting down on network time). There really isn't a way to guarantee that data from different tables could be queried from a single node. For this reason, distributed joins are typically seen as an anti-pattern. To that end, Cassandra simply doesn't allow them.
In Cassandra you need to take a query-based modeling approach. So you could solve this by building a table from your post-join result set, consisting of desired combinations of dst_ap and name. You would have to find an appropriate way to partition this table, but ultimately you would want to build it based on A) the result set you expect to see and B) the properties you expect to filter on in your WHERE clause.
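One way to picture the "build a table from your post-join result set" idea is to do the join once in application code and write the resulting rows to a dedicated table. The Python sketch below only computes those rows. The sample airports and routes are made up, `build_route_destinations` is a hypothetical helper, and using `src_ap` as the partition key is an assumption, not a recommendation from the answer.

```python
def build_route_destinations(airports, routes):
    """Precompute the rows a denormalized 'route destinations' table would hold."""
    name_by_iata = {a["iata"]: a["name"] for a in airports}
    rows = []
    for route in routes:
        src, dst = route["src_ap"], route["dst_ap"]
        # Mirror the inner-join semantics of the original query:
        # keep a route only if both endpoints exist in airports.
        if src in name_by_iata and dst in name_by_iata:
            rows.append({
                "src_ap": src,            # assumed partition key
                "dst_ap": dst,
                "dst_name": name_by_iata[dst],
            })
    return rows


# Made-up sample data, for illustration only.
airports = [
    {"iata": "JFK", "name": "John F Kennedy Intl"},
    {"iata": "LHR", "name": "Heathrow"},
]
routes = [
    {"src_ap": "JFK", "dst_ap": "LHR"},
    {"src_ap": "LHR", "dst_ap": "JFK"},
    {"src_ap": "JFK", "dst_ap": "ZZZ"},  # unknown airport, dropped
]

rows = build_route_destinations(airports, routes)
```

Each dict in `rows` would become one row of the query-specific table, so a read such as "destinations and names for a given src_ap" can be served without any join at query time.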

ARRAY_CONTAINS vs JOIN in azure-cosmosDB

The JSON documents that we plan to ingest into DocumentDb look as follows:
[
{"id":"id1","LastName": "user1", "GroupMembership":["g1","g2"]},
{"id":"id2","LastName": "user2", "GroupMembership":["g1","g4","g5"]},
{"id":"id3","LastName": "user3", "GroupMembership":["g3","g4","g2"]},
…
]
We want to answer queries such as: get the count of all users who are members of group "g1" or "g2", and so on. The number of users is very large (a few million).
What is the best way to implement this query so that it uses the index and avoids any scans?
Should I be using ARRAY_CONTAINS or JOIN? Does ARRAY_CONTAINS use the index internally, or does it do a scan?
Option 1)
SELECT VALUE COUNT(1) FROM Users WHERE ARRAY_CONTAINS(Users.GroupMembership, "g1") or ARRAY_CONTAINS(Users.GroupMembership, "g2")
Option 2)
SELECT VALUE COUNT(1) FROM Users JOIN Membership in Users.GroupMembership WHERE Membership = "g1" or Membership = "g2"
Both queries should utilize the index the same way, but ARRAY_CONTAINS is likely to provide a better execution time compared to JOIN. You could profile both queries using the Query Metrics as per this article: https://learn.microsoft.com/en-us/azure/cosmos-db/documentdb-sql-query-metrics#query-execution-metrics
Both should provide the same index utilization. However, with JOIN you can get duplicate results per entry, whereas with ARRAY_CONTAINS you won't. I think that difference is very significant. For more about the duplication issue, see the replies to the Stack Overflow questions "Getting duplicate records in select query for the Azure DocumentDB" and "Cosmos db joins give duplicate results".
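The duplication effect can be seen without a database. The Python sketch below imitates the counting behaviour of the two options on the three sample documents from the question. It is a model of the semantics only (JOIN yields one row per matching array element, ARRAY_CONTAINS yields at most one row per document), not of how Cosmos DB executes or indexes the queries.

```python
users = [
    {"id": "id1", "LastName": "user1", "GroupMembership": ["g1", "g2"]},
    {"id": "id2", "LastName": "user2", "GroupMembership": ["g1", "g4", "g5"]},
    {"id": "id3", "LastName": "user3", "GroupMembership": ["g3", "g4", "g2"]},
]
wanted = {"g1", "g2"}


def count_array_contains(users, wanted):
    # Option 1: a document counts once if any wanted group is in its array.
    return sum(
        1 for u in users
        if any(g in u["GroupMembership"] for g in wanted)
    )


def count_join(users, wanted):
    # Option 2: the self-join produces one row per (document, array element),
    # so a user in both g1 and g2 is counted twice.
    return sum(
        1 for u in users
        for membership in u["GroupMembership"]
        if membership in wanted
    )


array_contains_count = count_array_contains(users, wanted)
join_count = count_join(users, wanted)
```

With this data, `array_contains_count` is 3 while `join_count` is 4, because user id1 belongs to both "g1" and "g2" and is counted once per matching membership in the JOIN form.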
