Sub query and GroupBy in Azure cosmosDB


I have a situation where I need to get container items based on a GROUP BY subquery. This looks simple, but it is not working for me. Help is appreciated! Below is the SQL query:
SELECT * FROM my_container
WHERE my_container.item.Id
IN (SELECT VALUE c.item.Id FROM c WHERE c.item.name = 'ABC'
GROUP BY c.item.Id )
This gives an error because the subquery result is not in the format that IN accepts, e.g. IN ('a', 'b').
The items in my_container look something like this:
[
  {
    item: {
      name: "ABC",
      id: "1",
      address1: "address1",
      city: "city1"
    }
  },
  {
    item: {
      name: "ABC",
      id: "2",
      address1: "address2",
      city: "city2"
    }
  },
  {
    item: {
      name: "ABC",
      id: "3",
      address1: "address3",
      city: "city3"
    }
  }
]

The result of your subquery is an array ([]), but the IN keyword only accepts a parenthesized list of values, such as ('a', 'b').
I tried this SQL:
SELECT * FROM c WHERE ARRAY_CONTAINS((SELECT VALUE c.item.id FROM c WHERE c.item.name = 'ABC' GROUP BY c.item.id),c.item.id,false)
But it returns 0 rows. The reason is that the ARRAY_CONTAINS() function does not support a subquery as its argument.
As a workaround, you should use two SQL queries to achieve the goal.
First, execute SELECT VALUE c.item.id FROM c WHERE c.item.name = 'ABC' GROUP BY c.item.id to get the output array.
Then pass the result of the first step to ARRAY_CONTAINS() and execute the query below:
SELECT * FROM c WHERE ARRAY_CONTAINS(['1','2','3'],c.item.id,false)
By the way, subqueries in Cosmos DB do not behave like subqueries in a relational database. To learn more about subqueries, please refer to the Cosmos DB subquery documentation.
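The second step of the workaround can be sketched in Python. This is only an illustration: the helper name build_id_filter_query and the parameter name @ids are made up for this example, and the list of ids stands in for whatever the first query returned. Passing the array as a query parameter avoids pasting literal values into the SQL text.

```python
import json


def build_id_filter_query(ids):
    """Build the step-two query as a parameterized query spec (hypothetical helper)."""
    # Step one's GROUP BY should already return distinct ids; de-duplicating
    # here (order preserved) is only a safeguard.
    unique_ids = list(dict.fromkeys(ids))
    return {
        "query": "SELECT * FROM c WHERE ARRAY_CONTAINS(@ids, c.item.id, false)",
        "parameters": [{"name": "@ids", "value": unique_ids}],
    }


# Stand-in for the array returned by the first query.
step_one_result = ["1", "2", "3"]
spec = build_id_filter_query(step_one_result)
print(json.dumps(spec, indent=2))
```

With the azure-cosmos Python SDK, a spec like this would typically be passed as container.query_items(query=spec["query"], parameters=spec["parameters"], enable_cross_partition_query=True), assuming container is an existing container client.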

In addition to Steve's answer, you can call the ARRAY function on your subquery instead of copying and pasting the query result, like this:
SELECT * FROM c WHERE ARRAY_CONTAINS(
ARRAY(SELECT VALUE c.item.id FROM c WHERE c.item.name = 'ABC' GROUP BY c.item.id),
c.item.id,
false
)

Related

Is there a way to specify the column name generated by the inline_outer function in Spark SQL?

I have a table named order like this:
id | campaigns
2  | [{"id": "1", "title": "test", "type": "one"}, {"id": "2", "title": "test2", "type": "two"}]
5  | [{"id": "3", "title": "test3", "type": "three"}]
What I expect:
id | campaignId | title | type
2  | 1          | test  | one
2  | 2          | test2 | two
5  | 3          | test3 | three
My code:
SELECT orderId AS id, id AS campaignid, title, type
FROM (
SELECT id AS orderId, inline_outer(from_json(campaigns, 'ARRAY<STRUCT<id: STRING, title: STRING, type: STRING>>'))
FROM order
);
I have to rename the id field to orderId in the subquery because my campaigns field also includes an id key.
Question: Is there a way to specify the column name generated by the inline_outer function in Spark SQL?
What I tried:
SELECT id, inline_outer(from_json(campaigns, 'ARRAY<STRUCT<id: STRING, title: STRING, type: STRING>>')) AS ('campaignId', 'title', 'type')
FROM order;
SELECT id, inline_outer(from_json(campaigns, 'ARRAY<STRUCT<id: STRING, title: STRING, type: STRING>>')) AS {'campaignId', 'title', 'type'}
FROM order;
However, neither of the two attempts above is valid Spark SQL syntax.
Thank you in advance.

You need to cast the from_json output and change the column name:
SELECT
id,
inline_outer(cast(from_json(campaigns, 'ARRAY<STRUCT<id: STRING, title: STRING, type: STRING>>') AS ARRAY<STRUCT<campaignId: STRING, title: STRING, type: STRING>>))
FROM order;

Here is a solution using only PySpark:
from pyspark.sql import functions as F, types as T

# df is assumed to be the DataFrame holding the order table

# Define the schema of the JSON
schema = T.ArrayType(
    T.StructType(
        [
            T.StructField("id", T.StringType()),
            T.StructField("title", T.StringType()),
            T.StructField("type", T.StringType()),
        ]
    )
)
# Alternatively, this schema also works with the current example:
# schema = T.ArrayType(T.MapType(T.StringType(), T.StringType()))

# Convert the JSON string to an array of structs
df = df.withColumn(
    "campaigns",
    F.from_json("campaigns", schema),
)

# Explode the array: one row per campaign
df = df.withColumn("campaign", F.explode("campaigns"))

# Select the fields, renaming campaign.id to campaignId
df = df.select(
    "id",
    F.col("campaign.id").alias("campaignId"),
    F.col("campaign.title"),
    F.col("campaign.type"),
)
df.show()
+---+----------+-----+-----+
| id|campaignId|title| type|
+---+----------+-----+-----+
|  2|         1| test|  one|
|  2|         2|test2|  two|
|  5|         3|test3|three|
+---+----------+-----+-----+
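One difference between the two answers is worth keeping in mind: explode drops orders whose campaigns array is NULL or empty, whereas inline_outer (like explode_outer in PySpark) keeps such an order as a single row of NULLs. The plain-Python sketch below, which does not use Spark at all, mimics the "outer" behaviour on the sample data from the question. The function name flatten_orders and the in-memory orders list are made up for this illustration.

```python
import json

# Sample rows from the question; campaigns is a JSON string, as in the table.
orders = [
    {"id": 2, "campaigns": '[{"id": "1", "title": "test", "type": "one"}, '
                           '{"id": "2", "title": "test2", "type": "two"}]'},
    {"id": 5, "campaigns": '[{"id": "3", "title": "test3", "type": "three"}]'},
]


def flatten_orders(rows):
    """Return one flat row per campaign, renaming the campaign id to campaignId."""
    flat = []
    for row in rows:
        campaigns = json.loads(row["campaigns"]) if row["campaigns"] else []
        # "Outer" semantics: an order without campaigns still yields one row,
        # with None in every campaign column.
        for campaign in campaigns or [{}]:
            flat.append({
                "id": row["id"],
                "campaignId": campaign.get("id"),
                "title": campaign.get("title"),
                "type": campaign.get("type"),
            })
    return flat


for flat_row in flatten_orders(orders):
    print(flat_row)
```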

CosmosDB - list in aggregate query response

I have the following document structure:
{
"id": "1",
"aId": "2",
"bId": "3",
....
},
{ "id":"2",
"aId": "2",
"bId": "4"
}
How do I return, for a given aId, a JSON document that contains the list of all bIds sharing that aId and, as an additional field, the count of those bIds? For the example above and the condition WHERE aId = "2", the response would be:
{
"aId": "2",
"bIds" : ["4","3"],
"bIds count" : 2
}
This assumes I only pass one aId as a parameter.
I tried something like:
select
(select 'something') as aId,
(select distinct value c.bId from c where c.aId='something') as bIds
from TableName c
But for the love of me, I can't figure out how to get that list, its count, and the hardcoded aId in a single JSON response (a single row).
For example this query:
select
(select distinct value 'someId') as aId,
(select distinct value c.bId) as bIds
from c where c.aId='someId'
will return
[ {'aId': 'someId', 'bIds': '2'}, {'aId': 'someId', 'bIds': '4'} ]
while what I actually want is
[ {'aId': 'someId', 'bIds': ['2', '4']} ]
Here is query that is closest to what i want:
select
c.aId as aId,
count(c2) as bIdCount,
array(select distinct value c2.bId from c2)
from c join (select c.bId from c) as c2
where c.aId = 'SOME_ID'
The only problem is that the line with array makes this query fail; if I delete that line, it works (it correctly returns the id and the count in one row). But I also need to select the contents of this list, and I am lost as to why it is not working. The example is almost copy-pasted from "How to perform array projection Cosmos Db":
https://azurelessons.com/array-in-cosmos-db/#How_to_perform_array_projection_Azure_Cosmos_DB

Here is how you'd return an array of bId:
SELECT distinct value c.bId
FROM c
where c.aId = "2"
This yields:
[
"3",
"4"
]
Removing the value keyword:
SELECT distinct c.bId
FROM c
where c.aId = "2"
yields:
[
{ "bId" : "3" },
{ "bId" : "4" }
]
From either of these, you can count the number of array elements returned. If your payload must include count and aId, you'll need to add those to your JSON output.
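Adding the count and aId on the client side could look like the sketch below. It assumes the array returned by the SELECT DISTINCT VALUE query is already available as a Python list; the helper name summarize_b_ids is made up for this example, and the field names follow the output requested in the question.

```python
def summarize_b_ids(a_id, b_ids):
    """Wrap the distinct bId values in the payload shape asked for in the question."""
    # The query already returns distinct values; set() is only a safeguard,
    # and sorting makes the output order predictable.
    unique = sorted(set(b_ids))
    return {"aId": a_id, "bIds": unique, "bIds count": len(unique)}


# Stand-in for the result of: SELECT DISTINCT VALUE c.bId FROM c WHERE c.aId = "2"
query_result = ["3", "4"]
payload = summarize_b_ids("2", query_result)
print(payload)
```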

Cosmos db null value

I have two kinds of records, shown below, in my table staudentdetail in Cosmos DB. In the example below, previousSchooldetail is a nullable field: it may or may not be present for a student.
Sample records:
{
"empid": "1234",
"empname": "ram",
"schoolname": "high school ,bankur",
"class": "10",
"previousSchooldetail": {
"prevSchoolName": "1763440",
"YearLeft": "2001"
} --(Nullable)
}
{
"empid": "12345",
"empname": "shyam",
"schoolname": "high school",
"class": "10"
}
I am trying to access the above records from Azure Databricks using PySpark or Scala code. But when we build the DataFrame by reading from Cosmos DB, previousSchooldetail is not included in the DataFrame. When we change the query to filter on an id that has previousSchooldetail, it does show up in the DataFrame.
Case 1:-
val Query = "SELECT * FROM c "
Columns returned when the query is run directly:
empid
empname
schoolname
class
Case2:-
val Query = "SELECT * FROM c where c.empid=1234"
Columns returned when the query is run with the WHERE clause:
empid
empname
schoolname
class
previousSchooldetail
prevSchoolName
YearLeft
Could you please tell me why I am not able to get previousSchooldetail in case 1, and how I should proceed?

As @Jayendran mentioned in the comments, the first query returns the previousSchooldetail object wherever it is available; otherwise, the column is not present.
You can have this column present for all the scenarios by using the IS_DEFINED function. Try tweaking your query as below:
SELECT c.empid,
c.empname,
IS_DEFINED(c.previousSchooldetail) ? c.previousSchooldetail : null
as previousSchooldetail,
c.schoolname,
c.class
FROM c
If you want the result as a flat structure, it is trickier and requires two separate queries, such as:
Query 1
SELECT c.empid,
c.empname,
c.schoolname,
c.class,
p.prevSchoolName,
p.YearLeft
FROM c JOIN c.previousSchooldetail p
Query 2
SELECT c.empid,
c.empname,
c.schoolname,
c.class,
null as prevSchoolName,
null as YearLeft
FROM c
WHERE not IS_DEFINED (c.previousSchooldetail) or
c.previousSchooldetail = null
Unfortunately, Cosmos DB does not support LEFT JOIN or UNION. Hence, I'm not sure if you can achieve this in a single query.
Alternatively, you can create a stored procedure to return the desired result.
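Another option, sketched below under stated assumptions, is to do the flattening on the client after a plain SELECT * FROM c: every document is mapped to the same set of columns, with None where previousSchooldetail is missing or null. The helper name flatten_student is hypothetical, and the docs list simply reuses the two sample records from the question in place of real query results.

```python
def flatten_student(doc):
    """Map a document to a fixed, flat set of columns (hypothetical helper)."""
    # A missing or null previousSchooldetail is treated as an empty object.
    previous = doc.get("previousSchooldetail") or {}
    return {
        "empid": doc.get("empid"),
        "empname": doc.get("empname"),
        "schoolname": doc.get("schoolname"),
        "class": doc.get("class"),
        "prevSchoolName": previous.get("prevSchoolName"),
        "YearLeft": previous.get("YearLeft"),
    }


# Stand-in for the documents returned by SELECT * FROM c.
docs = [
    {
        "empid": "1234",
        "empname": "ram",
        "schoolname": "high school ,bankur",
        "class": "10",
        "previousSchooldetail": {"prevSchoolName": "1763440", "YearLeft": "2001"},
    },
    {"empid": "12345", "empname": "shyam", "schoolname": "high school", "class": "10"},
]

rows = [flatten_student(doc) for doc in docs]
for row in rows:
    print(row)
```

Because every row now has the same keys, a DataFrame built from rows has a stable schema regardless of which documents carry previousSchooldetail.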

How do I select a numerical field in Cosmos DB?

I have data that looks like: {"id": "abc", "1":"2", "3":"5"}
I'm trying to select this data with this SQL query:
SELECT c.3 FROM c WHERE c.id = '102'
This gives me a syntax error. I also tried c.'3', "c.3", and c."3", but none of those worked.
Is there a way to do this?

Please try something like:
SELECT c["1"], c["3"] FROM c WHERE c.id = '102'
It will produce an output like:
[
{
"1": "2",
"3": "5"
}
]

Azure Document Query Sub Dictionaries

I have stored the following JSON document in the Azure Document DB:
"JobId": "04e63d1d-2af1-42af-a349-810f55817602",
"JobType": 3,
"
"Properties": [
{
"Key": "Value1",
"Value": "testing1"
},
{
"Key": "Value",
"Value": "testing2"
}
]
When I query the document back, I can easily run:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C.Key = 'Value1'
However when i try to query:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C.Value = 'testing1'
I get an error that the query cannot be computed. I assume this is due to 'VALUE' being a reserved keyword within the query language.
I cannot specify a specific order in the property array because different subclasses can add different property in different orders as they need them.
Does anybody have a suggestion for how I can still complete this query?

To escape keywords in DocumentDB, you can use the [] syntax. For example, the above query would be:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C["Value"] = 'testing1'
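When queries are assembled in code, the bracket syntax can be applied to every property name so that reserved words (and names that are not valid identifiers, such as "3") never reach the parser unquoted. The sketch below is only an illustration: bracket_path is a made-up helper, and it assumes JSON-style double-quoted strings are acceptable as property names in the query.

```python
import json


def bracket_path(alias, *names):
    """Return a property path such as C["Value"] using bracket notation (hypothetical helper)."""
    # json.dumps adds the double quotes and escapes any quotes inside the name.
    return alias + "".join("[" + json.dumps(name) + "]" for name in names)


condition = bracket_path("C", "Value") + " = 'testing1'"
query = (
    "Select f.id, f.Properties, C.Key from f "
    "Join C IN f.Properties where " + condition
)
print(query)
```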
