Selecting key and value of dictionary in Azure DocumentDB / Azure CosmosDB - azure

Consider these two sample documents stored in DocumentDB.
Document 1
{
  "JobId": "04e63d1d-2af1-42af-a349-810f55817602",
  "JobType": 3,
  "Properties": {
    "Key1": "Value1",
    "Key2": "Value2"
  },
  "KeyNames": ["Key1", "Key2"]
}
Document 2
{
  "JobId": "04e63d1d-2af1-42af-a349-810f55817603",
  "JobType": 4,
  "Properties": {
    "Key3": "Value3",
    "Key4": "Value4"
  },
  "KeyNames": ["Key3", "Key4"]
}
I want to select all the keys and all the values in the Properties object for each document.
Something like:
SELECT
c.JobId,
c.JobType,
c.Properties.<Keys> AS Keys,
c.Properties.<Values> AS Values
FROM c
But as you can see, the keys are not fixed, so how do I list them? The final result should look like this. I cannot use .NET or LINQ; I need a query that can be executed in the DocumentDB Query Explorer.
[
  {
    "JobId": "04e63d1d-2af1-42af-a349-810f55817602",
    "JobType": 3,
    "Key1": "Value1"
  },
  {
    "JobId": "04e63d1d-2af1-42af-a349-810f55817602",
    "JobType": 3,
    "Key2": "Value2"
  },
  {
    "JobId": "04e63d1d-2af1-42af-a349-810f55817603",
    "JobType": 4,
    "Key3": "Value3"
  },
  {
    "JobId": "04e63d1d-2af1-42af-a349-810f55817603",
    "JobType": 4,
    "Key4": "Value4"
  }
]

I was able to solve my problem using a UDF in DocumentDB. Since KeyNames is an array, a self-join on it returns each key.
So this query:
SELECT
c.JobId,
c.JobType,
Key,
udf.GetValueUsingKey(c.Properties, Key) AS Value
FROM collection AS c
JOIN Key in c.KeyNames
returned the desired result.
You can define a UDF using the Script Explorer provided in DocumentDB.
For my purpose I used:
function GetValueUsingKey(Properties, Key) {
var result = Properties[Key];
return JSON.stringify(result);
}
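To make it easier to see what the self-join plus the UDF compute, here is a minimal Python sketch that emulates the query on the two sample documents. It is only an illustration, not DocumentDB code: the names `get_value_using_key` and `flatten` are made up for this sketch, and returning "null" for a missing key is an assumption of the sketch rather than the UDF's behaviour.

```python
import json

# Sample documents copied from the question.
documents = [
    {
        "JobId": "04e63d1d-2af1-42af-a349-810f55817602",
        "JobType": 3,
        "Properties": {"Key1": "Value1", "Key2": "Value2"},
        "KeyNames": ["Key1", "Key2"],
    },
    {
        "JobId": "04e63d1d-2af1-42af-a349-810f55817603",
        "JobType": 4,
        "Properties": {"Key3": "Value3", "Key4": "Value4"},
        "KeyNames": ["Key3", "Key4"],
    },
]


def get_value_using_key(properties, key):
    """Look up the key and return the value serialized as a JSON string.

    Assumption of this sketch: a missing key yields the string "null".
    """
    return json.dumps(properties.get(key))


def flatten(docs):
    """Emulate the JOIN over KeyNames: one row per (document, key) pair."""
    rows = []
    for doc in docs:
        for key in doc["KeyNames"]:
            rows.append(
                {
                    "JobId": doc["JobId"],
                    "JobType": doc["JobType"],
                    "Key": key,
                    "Value": get_value_using_key(doc["Properties"], key),
                }
            )
    return rows


rows = flatten(documents)
for row in rows:
    print(row)
```

Because the value is serialized with a JSON encoder, string values come back wrapped in quotes; returning the raw value instead avoids that if plain strings are wanted.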
Hope this helps :)

Related

CouchDB View : Add document as a new field to a document

Let's suppose I have these two documents :
{
"type": "ip",
"_id": "321",
"key1": "10",
"key2": "20",
"ip_config": {
"ip": "127.0.0.1",
"connexion": "WIFI"
}
}
{
"type": "device",
"_id": "1",
"key1": "10",
"key2": "20",
"device": {
"port": "8808",
"bits": 46
}
}
I want to generate a view in CouchDB that gives me the following output:
{
"key1": "10",
"key2": "20",
"ip_config": {
"port": "8808",
"bits": 46
},
"device": {
"port": "8808",
"bits": 46
}
}
What is the map function that can help me get this output?
As @RamblinRose points out, you cannot "join" documents with a view. The only thing you can do is emit the keys that are common between the docs (in this case it looks like key1 and key2 identify this relationship).
So if you had a database called devices and created a design document called test with a view called device-view with a map function:
function (doc) {
emit([doc.key1, doc.key2], null);
}
Then you would be able to obtain all the documents related to the combination of key1 and key2 with:
https://host/devices/_design/test/_view/device-view?include_docs=true&key=[%2210%22,%2220%22]
This would give you:
{"total_rows":2,"offset":0,"rows":[
{"id":"1","key":["10","20"],"value":null,"doc":{"_id":"1","_rev":"1-630408a91350426758c0932ea109f4d5","type":"device","key1":"10","key2":"20","device":{"port":"8808","bits":46}}},
{"id":"321","key":["10","20"],"value":null,"doc":{"_id":"321","_rev":"1-09d9a676c37f17c04a2475492995fade","type":"ip","key1":"10","key2":"20","ip_config":{"ip":"127.0.0.1","connexion":"WIFI"}}}
]}
This doesn't do the join, so you would have to process the returned rows on the client to obtain a single document.
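A minimal sketch of that client-side step, in Python, assuming the view response has already been parsed into a dict. The function name `merge_rows` and the choice to drop `_id`, `_rev` and `type` are assumptions made for illustration.

```python
def merge_rows(rows, skip=("_id", "_rev", "type")):
    """Combine the `doc` of every view row into one dict.

    Fields listed in `skip` belong to a single document, so they are dropped.
    If two docs share a field, the later row wins.
    """
    merged = {}
    for row in rows:
        for field, value in row["doc"].items():
            if field not in skip:
                merged[field] = value
    return merged


# Parsed response from the view query shown above (revisions omitted).
response = {
    "total_rows": 2,
    "offset": 0,
    "rows": [
        {
            "id": "1",
            "key": ["10", "20"],
            "value": None,
            "doc": {
                "_id": "1",
                "type": "device",
                "key1": "10",
                "key2": "20",
                "device": {"port": "8808", "bits": 46},
            },
        },
        {
            "id": "321",
            "key": ["10", "20"],
            "value": None,
            "doc": {
                "_id": "321",
                "type": "ip",
                "key1": "10",
                "key2": "20",
                "ip_config": {"ip": "127.0.0.1", "connexion": "WIFI"},
            },
        },
    ],
}

combined = merge_rows(response["rows"])
print(combined)
```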

CouchDB Mango query - Match any key with array item

I have the following documents:
{
  "_id": "doc1",
  "binds": {
    "subject": {
      "Test1": ["something"]
    },
    "object": {
      "Test2": ["something"]
    }
  }
},
{
  "_id": "doc2",
  "binds": {
    "subject": {
      "Test1": ["something"]
    },
    "object": {
      "Test3": ["something"]
    }
  }
}
I need a Mango selector that retrieves documents where any field inside binds (subject, object, etc.) contains an object with a key equal to any value from an array passed as a parameter. That is, if the keys nested under binds contain any value from the array, the selector should return that document.
For instance, given the array ["Test2"], my selector should retrieve doc1, since binds["object"]["Test2"] exists; the array ["Test1"] should retrieve doc1 and doc2, and the array ["Test2", "Test3"] should also retrieve doc1 and doc2.
F.Y.I. I am using Node.js with nano lib to access CouchDB API.
I am providing this answer because the luxury of altering document "schema" is not always an option.
With the given document structure this cannot be done with Mango in any reasonable manner. Yes, it can be done, but only when employing very brittle and inefficient practices.
Mango does not provide an efficient means of querying documents for dynamic properties; it does support searching within property values, e.g. arrays [1].
Using worst practices, this selector will find docs whose binds properties subject or object have a property named Test2 or Test3:
{
"selector": {
"$or": [
{
"binds.subject.Test2": {
"$exists": true
}
},
{
"binds.object.Test2": {
"$exists": true
}
},
{
"binds.subject.Test3": {
"$exists": true
}
},
{
"binds.object.Test3": {
"$exists": true
}
}
]
}
}
Yuk.
The problems
1. The queried property names vary, so a Mango index cannot be leveraged (Test37 anyone?)
2. Because of (1), a full index scan (_all_docs) occurs every query
3. Requires programmatic generation of the $or clause
4. Requires a knowledge of the set of property names to query (Test37 anyone?)
The given document structure is a show stopper for a Mango index and query.
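To illustrate what "programmatic generation of the $or clause" involves, here is a small Python sketch that rebuilds the selector above from two lists. The function name `build_selector` is made up for this example, and the caller still has to know every first-level name under binds as well as every property name, which is exactly the weakness listed above. Property names containing dots would also break the dotted-path keys.

```python
import json


def build_selector(bind_names, property_names):
    """Build one $exists clause per (property name, bind name) combination."""
    clauses = [
        {f"binds.{bind}.{prop}": {"$exists": True}}
        for prop in property_names
        for bind in bind_names
    ]
    return {"selector": {"$or": clauses}}


selector = build_selector(["subject", "object"], ["Test2", "Test3"])
print(json.dumps(selector, indent=2))
```

The number of clauses grows as the product of the two lists, so the selector gets unwieldy quickly.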
This is where map/reduce shines
Consider a view with the map function
function (doc) {
  for (var prop in doc.binds) {
    if (doc.binds.hasOwnProperty(prop)) {
      // prop = subject, object, foo, bar, etc
      var obj = doc.binds[prop];
      for (var objProp in obj) {
        if (obj.hasOwnProperty(objProp)) {
          // objProp = Test1, Test2, Test37, Fubar, etc
          emit(objProp, prop);
        }
      }
    }
  }
}
So the map function indexes any doc that has a binds property, emitting one row per second-level property, e.g. binds.subject.Test1, binds.foo.bar.
Given the two documents in the question, this would be the basic view index
id    key    value
doc1  Test1  subject
doc2  Test1  subject
doc1  Test2  object
doc2  Test3  object
And since view queries provide the keys parameter, this query would provide your specific solution using JSON
{
include_docs: true,
reduce: false,
keys: ["Test2","Test3"]
}
Querying that index with cURL
$ curl -G http://{view endpoint} -d 'include_docs=false' -d 'reduce=false' -d 'keys=["Test2","Test3"]'
would return
{
"total_rows": 4,
"offset": 2,
"rows": [
{
"id": "doc1",
"key": "Test2",
"value": "object"
},
{
"id": "doc2",
"key": "Test3",
"value": "object"
}
]
}
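For anyone who wants to trace the logic without a CouchDB instance, the map function and the keys lookup can be emulated in plain Python. This is only a sketch: `map_doc`, `build_index` and `query_keys` are names invented here, and the sort is a simplification of CouchDB's collation that happens to be sufficient for these plain ASCII keys.

```python
def map_doc(doc):
    """Yield (key, value) pairs the way the map function calls emit()."""
    for prop, obj in doc.get("binds", {}).items():
        # prop = subject, object, foo, bar, etc
        if isinstance(obj, dict):
            for obj_prop in obj:
                # obj_prop = Test1, Test2, Test37, Fubar, etc
                yield obj_prop, prop


def build_index(docs):
    """Collect every emitted row, ordered by key and then by document id."""
    rows = [
        {"id": doc["_id"], "key": key, "value": value}
        for doc in docs
        for key, value in map_doc(doc)
    ]
    return sorted(rows, key=lambda row: (row["key"], row["id"]))


def query_keys(index, keys):
    """Return the rows whose key is one of the requested keys."""
    return [row for key in keys for row in index if row["key"] == key]


docs = [
    {
        "_id": "doc1",
        "binds": {
            "subject": {"Test1": ["something"]},
            "object": {"Test2": ["something"]},
        },
    },
    {
        "_id": "doc2",
        "binds": {
            "subject": {"Test1": ["something"]},
            "object": {"Test3": ["something"]},
        },
    },
]

index = build_index(docs)
result = query_keys(index, ["Test2", "Test3"])
print(result)
```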
Of course there are options to expand the form and function of such a view by leveraging collation and complex keys, and there's the handy reduce feature.
I've seen commentary that Mango is great for those new to CouchDB due to its "ease" in creating indexes and the query options, and that map/reduce is for the more seasoned. I believe such comments are well intentioned but misguided; Mango is alluring but has its pitfalls [1]. Views do require considerable thought, but hey, that's what we're supposed to be doing anyway.
[1] $elemMatch, for example, requires in-memory scanning, which can be very costly.

Update the Nested Json with another Nested Json using Python

I have one full set of nested JSON, and I need to update it with the latest values from another nested JSON.
Can anyone help me with this?
I want to implement this in PySpark.
The full JSON looks like this:
{
"email": "abctest#xxx.com",
"firstName": "name01",
"id": 6304,
"surname": "Optional",
"layer01": {
"key1": "value1",
"key2": "value2",
"key3": "value3",
"key4": "value4",
"layer02": {
"key1": "value1",
"key2": "value2"
},
"layer03": [
{
"inner_key01": "inner value01"
},
{
"inner_key02": "inner_value02"
}
]
},
"surname": "Required only$uid"
}
The latest JSON looks like this:
{
"email": "test#xxx.com",
"firstName": "name01",
"surname": "Optional",
"id": 6304,
"layer01": {
"key1": "value1",
"key2": "value2",
"key3": "value3",
"key4": "value4",
"layer02": {
"key1": "value1_changedData",
"key2": "value2"
},
"layer03": [
{
"inner_key01": "inner value01"
},
{
"inner_key02": "inner_value02"
}
]
},
"surname": "Required only$uid"
}
In the above, for id=6304 we have received updates for the layer01.layer02.key1 and email fields.
So I need to apply these values to the full JSON. Kindly help me with this.
You can load the two JSON files into Spark data frames and do a left join to get updates from the latest JSON data:
from pyspark.sql import functions as F
full_json_df = spark.read.json(full_json_path, multiLine=True)
latest_json_df = spark.read.json(latest_json_path, multiLine=True)
updated_df = full_json_df.alias("full").join(
latest_json_df.alias("latest"),
F.col("full.id") == F.col("latest.id"),
"left"
).select(
F.col("full.id"),
*[
F.when(F.col("latest.id").isNotNull(), F.col(f"latest.{c}")).otherwise(F.col(f"full.{c}")).alias(c)
for c in full_json_df.columns if c != 'id'
]
)
updated_df.show(truncate=False)
#+----+------------+---------+-----------------------------------------------------------------------------------------------------+--------+
#|id |email |firstName|layer01 |surname |
#+----+------------+---------+-----------------------------------------------------------------------------------------------------+--------+
#|6304|test#xxx.com|name01 |[value1, value2, value3, value4, [value1_changedData, value2], [[inner value01,], [, inner_value02]]]|Optional|
#+----+------------+---------+-----------------------------------------------------------------------------------------------------+--------+
Update:
If the schema changes between full and latest JSONs, you can load the 2 files into the same data frame (this way the schemas are being merged) and then deduplicate per id:
from pyspark.sql import Window
from pyspark.sql import functions as F
merged_json_df = spark.read.json("/path/to/{full_json.json,latest_json.json}", multiLine=True)
# order priority: latest file then full
w = Window.partitionBy(F.col("id")).orderBy(F.when(F.input_file_name().like('%latest%'), 0).otherwise(1))
updated_df = merged_json_df.withColumn("rn", F.row_number().over(w))\
.filter("rn = 1")\
.drop("rn")
updated_df.show(truncate=False)
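If the latest JSON only carries the changed fields, or the documents are small enough to handle as Python dicts (for example on the driver), a recursive merge is another option. The sketch below is not part of the Spark approach above; `deep_update` is a hypothetical helper, and replacing lists wholesale instead of merging them element by element is an assumption you may want to change.

```python
import copy


def deep_update(full, latest):
    """Return a copy of `full` with values from `latest` applied recursively.

    Nested dicts are merged key by key; any other value (lists included)
    is replaced by the value from `latest`.
    """
    result = copy.deepcopy(full)
    for key, value in latest.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Trimmed versions of the documents in the question.
full = {
    "email": "abctest#xxx.com",
    "id": 6304,
    "layer01": {
        "key1": "value1",
        "layer02": {"key1": "value1", "key2": "value2"},
        "layer03": [{"inner_key01": "inner value01"}],
    },
}
latest = {
    "email": "test#xxx.com",
    "id": 6304,
    "layer01": {"layer02": {"key1": "value1_changedData"}},
}

updated = deep_update(full, latest)
print(updated)
```

The inputs are left untouched, so the original full document is still available for comparison afterwards.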

Azure CosmosDB (SQL) - How to query for a set of objects with a list, returning an object when any element in its list matches a condition

From the database we need to return all objects that have a ClosedDate within a date range. The ClosedDate property is on a child object contained in a list within the object. I want to return the object if any ClosedDate in that list is within the date range. Currently I'm only able to construct a Cosmos query that returns the object when all ClosedDates are in the range, but I need it returned when any are in the range.
Current Query
IQueryable<ServiceRepairOrder> query = this.Client.CreateDocumentQuery<ServiceRepairOrder>(UriFactory.CreateDocumentCollectionUri(DatabaseName, ContainerName()), queryOptions)
.Where(ro => ro.AccountId == this.AccountID)
.Where(ro => ro.Items.Any(li => li.ClosedDate >= start && li.ClosedDate <= end) );
Object JSON Example
{
"id": "45144",
"Type": "ServiceRepairOrder",
"AccountID": "account1",
"Items": [
{
"ClosedDate": "someDateInRange",
"Id": "itemId1",
"Key": "value1"
},
{
"ClosedDate": "someDateOutOfRange",
"Id": "itemId2",
"Key": "value2"
}
]
}
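Can this help you? It assumes ClosedDate is stored as a zero-padded ISO 8601 string, so that string comparison follows date order.
SELECT DISTINCT c.id FROM c JOIN t IN c.Items WHERE t.ClosedDate >= "2020-11-01" AND t.ClosedDate <= "2020-11-30"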
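The "any item matches" semantics of that join can be checked outside Cosmos with a few lines of Python. This is just a sketch: the orders and dates below are invented for illustration, and it assumes dates are stored as zero-padded `YYYY-MM-DD` strings. It also shows why the range bounds need the same zero padding, since the comparison is done on strings.

```python
def has_item_closed_in_range(order, start, end):
    """True when ANY item's ClosedDate falls inside [start, end], inclusive."""
    return any(start <= item["ClosedDate"] <= end for item in order["Items"])


# Hypothetical orders; the dates are invented for illustration.
orders = [
    {
        "id": "45144",
        "Items": [
            {"Id": "itemId1", "ClosedDate": "2020-11-05"},
            {"Id": "itemId2", "ClosedDate": "2021-02-01"},
        ],
    },
    {
        "id": "45145",
        "Items": [{"Id": "itemId3", "ClosedDate": "2019-01-15"}],
    },
]

# Order 45144 matches because one of its two items is in range.
matching_ids = [
    o["id"] for o in orders
    if has_item_closed_in_range(o, "2020-11-01", "2020-11-30")
]

# Pitfall: the unpadded bound "2020-11-1" sorts after "2020-11-05" as a string,
# so days 02 to 09 of the month are silently excluded.
unpadded_ids = [
    o["id"] for o in orders
    if has_item_closed_in_range(o, "2020-11-1", "2020-11-30")
]

print(matching_ids, unpadded_ids)
```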

How to create complex structure in Cassandra with CQL3

I have a problem representing a complex data structure in Cassandra.
JSON example of the data:
{
"A": {
"A_ID": "1111",
"field1": "value1",
"field2": "value2",
"field3": [
{
"id": "id1",
"name": "name1",
"segment": [
{
"segment_id": "segment_id_1",
"segment_name": "segment_name_1",
"segment_value": "segment_value_1"
},
{
"segment_id": "segment_id_2",
"segment_name": "segment_name_2",
"segment_value": "segment_value_2"
},
...
]
},
{
"id": "id2",
"name": "name2",
"segment": [
{
"segment_id": "segment_id_3",
"segment_name": "segment_name_3",
"segment_value": "segment_value_3"
},
{
"segment_id": "segment_id_4",
"segment_name": "segment_name_4",
"segment_value": "segment_value_4"
},
...
]
},
...
]
}
}
Only one query will be used:
Find by A_ID.
I think this data should be stored in one table (column family), without serialization/deserialization operations, for efficiency.
How can I do this if CQL does not support nested maps and lists?
Cassandra 2.1 adds support for nested structures: https://issues.apache.org/jira/browse/CASSANDRA-5590
The downside to "just store it as a json/protobuf/avro/etc blob" is that you have to read-and-rewrite the entire blob to update any field. So at the very least you should pull your top level fields into Cassandra columns, leveraging collections as appropriate.
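To make the read-and-rewrite cost concrete, here is a toy Python sketch in which a dict stands in for a table with a single text column keyed by A_ID. Nothing here talks to Cassandra; `table`, `save` and `update_field` are made-up names for illustration only.

```python
import json

# Stand-in for a table keyed by A_ID with one text column holding the blob.
table = {}


def save(a_id, record):
    """Serialize the whole record and store it as a single blob."""
    table[a_id] = json.dumps(record)


def update_field(a_id, field, value):
    """Changing one field still means reading and rewriting the entire blob."""
    record = json.loads(table[a_id])   # read and deserialize everything
    record[field] = value              # change a single field
    table[a_id] = json.dumps(record)   # serialize and write everything back


save("1111", {"A_ID": "1111", "field1": "value1", "field2": "value2"})
update_field("1111", "field1", "changed")
print(table["1111"])
```

With top-level fields pulled out into their own columns, the same change would touch only the one column.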
As you will be using it just as a key/value store, you could actually store it either as JSON or, to save the data more efficiently, as something like BSON or even Protobuf.
I personally would store it as a Protobuf record, as it doesn't save the field names, which may be repeating in your case.
