CouchDB View: Add document as a new field to a document

Let's suppose I have these two documents:
{
"type": "ip",
"_id": "321",
"key1": "10",
"key2": "20",
"ip_config": {
"ip": "127.0.0.1",
"connexion": "WIFI"
}
}
{
"type": "device",
"_id": "1",
"key1": "10",
"key2": "20",
"device": {
"port": "8808",
"bits": 46
}
}
I want to generate a view in CouchDB that gives me the following output:
{
"key1": "10",
"key2": "20",
"ip_config": {
"port": "8808",
"bits": 46
},
"device": {
"port": "8808",
"bits": 46
}
}
What is the map function that can help me get this output?

As @RamblinRose points out, you cannot "join" documents with a view. The only thing you can do is emit the keys that are common between the docs (in this case it looks like key1 and key2 identify this relationship).
So if you had a database called devices and created a design document called test with a view called device-view with a map function:
function (doc) {
emit([doc.key1, doc.key2], null);
}
Then you would be able to obtain all the documents related to the combination of key1 and key2 with:
https://host/devices/_design/test/_view/device-view?include_docs=true&key=[%2210%22,%2220%22]
This would give you:
{"total_rows":2,"offset":0,"rows":[
{"id":"1","key":["10","20"],"value":null,"doc":{"_id":"1","_rev":"1-630408a91350426758c0932ea109f4d5","type":"device","key1":"10","key2":"20","device":{"port":"8808","bits":46}}},
{"id":"321","key":["10","20"],"value":null,"doc":{"_id":"321","_rev":"1-09d9a676c37f17c04a2475492995fade","type":"ip","key1":"10","key2":"20","ip_config":{"ip":"127.0.0.1","connexion":"WIFI"}}}
]}
This doesn't do the join, so you would have to process them to obtain a single document.
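If you need the merged document from the question, a small client-side merge over the view rows is enough. Below is a rough sketch in Python using the requests library, assuming the devices database and device-view from above; the host and key values are illustrative:
import requests

# Fetch both docs sharing key1/key2 through the view in one round trip.
url = "https://host/devices/_design/test/_view/device-view"
params = {"include_docs": "true", "key": '["10","20"]'}
rows = requests.get(url, params=params).json()["rows"]

# Start from the shared keys, then copy each doc's type-specific object in.
merged = {"key1": "10", "key2": "20"}
for row in rows:
    doc = row["doc"]
    if doc["type"] == "ip":
        merged["ip_config"] = doc["ip_config"]
    elif doc["type"] == "device":
        merged["device"] = doc["device"]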

Related

Get max of grouped documents in CosmosDb

I want to query Azure CosmosDb documents with an SQL API query. These documents shall be filtered and grouped by specific values. From each group, only the document with a specified max value shall be returned.
Example
Azure CosmosDb Documents
{
"id": "1",
"someValue": "shall be included",
"group": "foo",
"timestamp": "1668907312"
}
{
"id": "2",
"someValue": "shall be included",
"group": "foo",
"timestamp": "1668907314"
}
{
"id": "3",
"someValue": "shall be included",
"group": "bar",
"timestamp": "1668907312"
}
{
"id": "4",
"someValue": "don't include",
"group": "bar",
"timestamp": "1668907312"
}
Query
I want to get all documents
with "someValue": "shall be included"
grouped by group
from each group, only the document with the max timestamp
Example response
{
"id": "2",
"someValue": "shall be included",
"group": "foo",
"timestamp": "1668907314"
},
{
"id": "3",
"someValue": "shall be included",
"group": "bar",
"timestamp": "1668907312"
}
Question
What is the best way to do this? It would be optimal if
it is possible in one query
and executable with the Azure SDK using SqlParameter (to prevent injection)
What I've tried
My current approach consists of 2 queries and uses ARRAY_CONTAINS, which does not allow the use of SqlParameter for the document paths.
First Query
SELECT c.group AS group,
MAX(c.timestamp) AS maxValue
FROM c
WHERE c.someValue = 'shall be included'
GROUP BY c.group
Second Query
SELECT * FROM c WHERE ARRAY_CONTAINS(
<RESULT-FIRST-QUERY>,
{
"group": c.group,
"maxValue": c.timestamp
},
false
)
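For the SDK part of the question, both queries can be run with parameters, which avoids splicing user input or document paths into the SQL string. Here is a rough sketch with the Python azure-cosmos package; the account, database and container names are illustrative, bracket syntax is used because group is a reserved keyword in the Cosmos SQL grammar, and passing the first result set as an array-valued parameter is an assumption worth verifying:
from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com", credential="<key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

# First query: the filter value is a SQL parameter, not concatenated text.
groups = list(container.query_items(
    query=(
        'SELECT c["group"] AS grp, MAX(c.timestamp) AS maxValue '
        'FROM c WHERE c.someValue = @someValue GROUP BY c["group"]'
    ),
    parameters=[{"name": "@someValue", "value": "shall be included"}],
    enable_cross_partition_query=True,
))

# Second query: the whole first result set goes in as one array parameter,
# so ARRAY_CONTAINS needs no literals built by string concatenation.
docs = list(container.query_items(
    query=(
        'SELECT * FROM c WHERE ARRAY_CONTAINS('
        '@groups, {"grp": c["group"], "maxValue": c.timestamp}, false)'
    ),
    parameters=[{"name": "@groups", "value": groups}],
    enable_cross_partition_query=True,
))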
I would utilize the MAX() function in conjunction with GROUP BY, i.e.:
SELECT *
FROM c
WHERE c.someValue = "shall be included"
GROUP BY c.group
HAVING MAX(c.timestamp)
Haven't run that yet/need to make a collection, but seems like it should do the trick...

Cosmos Db: How to query for the maximum value of a property in an array of arrays?

I'm not sure how to query when using CosmosDb, as I'm used to SQL. My question is about how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far, but apparently I don't understand very well how they work.
In a structure such as the one below, how do I query for the city with the largest population across all states, using the Data Explorer in Azure:
{
"id": 1,
"states": [
{
"name": "New York",
"cities": [
{
"name": "New York",
"population": 8500000
},
{
"name": "Hempstead",
"population": 750000
},
{
"name": "Brookhaven",
"population": 500000
}
]
},
{
"name": "California",
"cities":[
{
"name": "Los Angeles",
"population": 4000000
},
{
"name": "San Diego",
"population": 1400000
},
{
"name": "San Jose",
"population": 1000000
}
]
}
]
}
This is currently not possible as far as I know.
It would look a bit like this:
SELECT TOP 1 state.name as stateName, city.name as cityName, city.population FROM c
join state in c.states
join city in state.cities
--order by city.population desc <-- this does not work in this case
You could write a user-defined function that will allow you to write the query you probably expect, similar to this: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
function OnlyMaxPop(states){
    // sort descending by the population of each state's single remaining city
    function compareStates(stateA, stateB){
        return stateB.cities[0].population - stateA.cities[0].population;
    }
    // reduce each state to only its most populous city
    var onlyWithOneCity = states.map(s => {
        var maxpop = Math.max.apply(Math, s.cities.map(o => o.population));
        return {
            name: s.name,
            cities: s.cities.filter(x => x.population === maxpop)
        };
    });
    return onlyWithOneCity.sort(compareStates)[0];
}
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.

Update the Nested Json with another Nested Json using Python

For example, I have one full set of nested JSON, and I need to update this JSON with the latest values from another nested JSON.
Can anyone help me with this?
I want to implement this in Pyspark.
The full JSON looks like this:
{
"email": "abctest#xxx.com",
"firstName": "name01",
"id": 6304,
"surname": "Optional",
"layer01": {
"key1": "value1",
"key2": "value2",
"key3": "value3",
"key4": "value4",
"layer02": {
"key1": "value1",
"key2": "value2"
},
"layer03": [
{
"inner_key01": "inner value01"
},
{
"inner_key02": "inner_value02"
}
]
},
"surname": "Required only$uid"
}
The latest JSON looks like this:
{
"email": "test#xxx.com",
"firstName": "name01",
"surname": "Optional",
"id": 6304,
"layer01": {
"key1": "value1",
"key2": "value2",
"key3": "value3",
"key4": "value4",
"layer02": {
"key1": "value1_changedData",
"key2": "value2"
},
"layer03": [
{
"inner_key01": "inner value01"
},
{
"inner_key02": "inner_value02"
}
]
},
"surname": "Required only$uid"
}
In the above, for id=6304, we have received updates for the layer01.layer02.key1 and email address fields.
So I need to apply these updates to the full JSON. Kindly help me with this.
You can load the 2 JSON files into Spark data frames and do a left join to get the updates from the latest JSON data:
from pyspark.sql import functions as F
full_json_df = spark.read.json(full_json_path, multiLine=True)
latest_json_df = spark.read.json(latest_json_path, multiLine=True)
updated_df = full_json_df.alias("full").join(
latest_json_df.alias("latest"),
F.col("full.id") == F.col("latest.id"),
"left"
).select(
F.col("full.id"),
*[
F.when(F.col("latest.id").isNotNull(), F.col(f"latest.{c}")).otherwise(F.col(f"full.{c}")).alias(c)
for c in full_json_df.columns if c != 'id'
]
)
updated_df.show(truncate=False)
#+----+------------+---------+-----------------------------------------------------------------------------------------------------+--------+
#|id |email |firstName|layer01 |surname |
#+----+------------+---------+-----------------------------------------------------------------------------------------------------+--------+
#|6304|test#xxx.com|name01 |[value1, value2, value3, value4, [value1_changedData, value2], [[inner value01,], [, inner_value02]]]|Optional|
#+----+------------+---------+-----------------------------------------------------------------------------------------------------+--------+
Update:
If the schema changes between the full and latest JSONs, you can load the 2 files into the same data frame (this way the schemas are merged) and then deduplicate per id:
from pyspark.sql import Window
from pyspark.sql import functions as F
merged_json_df = spark.read.json("/path/to/{full_json.json,latest_json.json}", multiLine=True)
# order priority: latest file then full
w = Window.partitionBy(F.col("id")).orderBy(F.when(F.input_file_name().like('%latest%'), 0).otherwise(1))
updated_df = merged_json_df.withColumn("rn", F.row_number().over(w))\
.filter("rn = 1")\
.drop("rn")
updated_df.show(truncate=False)

How to validate elements in nested json array in karate?

I am using the Karate framework for writing some automated test cases. I'd like to validate the schema for each element in a nested array list. For the example below, I would like to validate each child of each element in the returned array. Is there a way to get an array list of all children of all elements? I can do that by calling some Java functions, but I was wondering if there's a way in Karate to get that.
Something like "for each element in the returned array validate the schema of each of its children".
Thanks!
[
{
"id": "A",
"children": [
{
"size": "10",
"type": "A",
"name": "B"
},
{
"size": "10",
"type": "A",
"name": "B"
}
]
},
{
"id": "B",
"children": [
{
"size": "10",
"type": "A",
"name": "B"
},
{
"size": "3",
"type": "C",
"name": "D"
}
]
}
]
match each will be more convenient for validating a JSON array against a schema:
* def children = $response[*].children[*]
* def schema = { "name": "#string","size": "#string","type": "#string"}
* match each children == schema
This will extract all the values of the children and validate that each child matches the schema.

How to search through data with arbitrary amount of fields?

I have a web-form builder for science events. The event moderator creates a registration form with an arbitrary number of boolean, integer, enum and text fields.
The created form is used to:
register a new member for the event;
search through registered members.
What is the best search tool for the second task (searching the members of an event)? Is Elasticsearch a good fit for this?
I wrote a post about how to index arbitrary data into Elasticsearch and then to search it by specific fields and values. All this, without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to do the following steps to get what you want:
Create a special index described in the post.
Flatten the data you want to index using the flattenData function:
https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4.
Create a document with the original and flattened data and index it into Elasticsearch:
{
"data": { ... },
"flatData": [ ... ]
}
Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
Execute queries on the flatData object to find what you need.
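As a rough sketch of steps 2 and 3 in Python: the helper below is a hypothetical, minimal port of the JavaScript flattenData gist (handling only flat documents, unlike the real one), and the index name and endpoint for the elasticsearch-py client are illustrative:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def flatten_data(obj):
    # Minimal flattener mirroring the arrays shown in the example below;
    # the real gist also handles nested objects and arrays.
    out = []
    for key, value in obj.items():
        typ = "long" if isinstance(value, int) else "string"
        out.append({
            "key": key,
            "type": typ,
            "key_type": f"{key}.{typ}",
            f"value_{typ}": value,
        })
    return out

doc = {"eventId": "2T73ZT1R463DJNWE36IA8FEN", "name": "Bob", "age": 22, "sex": 0}

# Step 3: index the original document together with its flattened form.
es.index(index="members", body={"data": doc, "flatData": flatten_data(doc)})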
Example
Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:
{
"eventId": "2T73ZT1R463DJNWE36IA8FEN",
"name": "Bob",
"age": 22,
"sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "2T73ZT1R463DJNWE36IA8FEN"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Bob"
},
{
"key": "age",
"type": "long",
"key_type": "age.long",
"value_long": 22
},
{
"key": "sex",
"type": "long",
"key_type": "sex.long",
"value_long": 0
}
]
Then we will wrap this data in a document as shown before and index it.
Then the second event moderator creates another form having a new field, a field with the same name and type, and also a field with the same name but a different type:
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
"eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
"name": "Alice",
"city": "New York",
"sex": "female"
});
This will produce the following data:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Alice"
},
{
"key": "city",
"type": "string",
"key_type": "city.string",
"value_string": "New York"
},
{
"key": "sex",
"type": "string",
"key_type": "sex.string",
"value_string": "female"
}
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "eventId"}},
{"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
]
}
}
}
},
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "name"}},
{"match": {"flatData.value_string": "bob"}}
]
}
}
}
}
]
}
}
}
Elasticsearch automatically detects the field content in order to index it correctly, even if the mapping hasn't been defined previously. So, yes: Elasticsearch suits these cases well.
However, you may want to fine-tune this behavior, or maybe the default mapping applied by Elasticsearch doesn't correspond to what you need: in this case, take a look at the default mapping or, for even further control, the dynamic templates feature.
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution is covered in this article on common problems with Elasticsearch.
Essentially, you want to have everything that can possibly be user-defined as a value. Using nested documents, you can have a key-field and differently mapped value fields to achieve pretty much the same.
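For instance, a fixed nested key/value mapping along those lines could be created like this; a sketch using the elasticsearch-py client, where the endpoint, index name and field names are illustrative:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Every user-defined form field becomes a key/value entry inside one nested
# "attributes" field, so the mapping never grows as moderators invent fields.
es.indices.create(index="members", body={
    "mappings": {
        "properties": {
            "attributes": {
                "type": "nested",
                "properties": {
                    "key": {"type": "keyword"},
                    "value_string": {"type": "keyword"},
                    "value_long": {"type": "long"},
                    "value_bool": {"type": "boolean"},
                },
            }
        }
    }
})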
