Apply a filter on an array field of CouchDB - couchdb

I'm working on Hyperledger Fabric. I need a particular value from an array, not the full document, in CouchDB.
Example:
{
"f_id": "1",
"History": [
{
"amount": "1",
"contactNo": "-",
"email": "i2#mail.com"
},
{
"amount": "5",
"contactNo": "-",
"email": "i#gmail.com",
}
],
"size": "12"
}
I want only the object in the History array whose email is "i2#mail.com", not the full History array.
Mango query:
{
"selector": {
"History": {
"$elemMatch": {
"email": "i2#mail.com"
}
}
}
}
Output:
{
"f_id": "1",
"History": [
{
"amount": "1",
"contactNo": "-",
"email": "i2#mail.com"
},
{
"amount": "5",
"contactNo": "-",
"email": "i#gmail.com",
}
],
"size": "12"
}
The full History array is returned, but I need only the first object of the History array.
Can anyone guide me?
Thanks.

I think it's not possible, because rich queries retrieve complete records (key-value pairs) matching a given selector.
You may want to reconsider your design. For example, if you want to keep a history and query it, this approach may work out:
GetState of your special key my_record.
If the key exists:
PutState the new value with key my_record.
Enrich the old value with additional attributes: {"DocType": "my_history", "time": "789546"}. With the help of these new attributes, it will be possible to create indexes and search via querying.
PutState the enriched old value with a new key my_record_<uniqueId>.
If the key doesn't exist, just put your value with key my_record without any new attributes.
With this approach the my_record key will always hold the latest value. You can query history by any attributes, with or without pagination, by using indexes (or not, based on your performance concerns).
This approach also consumes less space. If you accumulate history on a single key, the existing history is copied into the next version on every update, which means every entry consumes previous_size + delta instead of just delta.
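The steps above can be sketched as chaincode-style JavaScript. The `stub` here is a minimal in-memory stand-in for Fabric's ChaincodeStub (getState/putState only), and the `uniqueId`/`time` parameters are assumptions about where those values come from:

```javascript
// Sketch of the history pattern: the plain key always holds the latest
// value, and each superseded value is re-saved under a history key,
// enriched with the attributes suggested above ("DocType", "time").
async function putWithHistory(stub, key, newValue, uniqueId, time) {
  const oldBytes = await stub.getState(key);
  if (oldBytes && oldBytes.length > 0) {
    // Key exists: enrich the old value so it can be indexed and queried.
    const oldValue = JSON.parse(oldBytes.toString());
    oldValue.DocType = 'my_history';
    oldValue.time = time;
    await stub.putState(`${key}_${uniqueId}`, Buffer.from(JSON.stringify(oldValue)));
  }
  // In either case, the plain key gets the new value.
  await stub.putState(key, Buffer.from(JSON.stringify(newValue)));
}

// In-memory stand-in for the real Fabric stub, for illustration only.
function memoryStub() {
  const db = new Map();
  return {
    async getState(k) { return db.get(k) || Buffer.alloc(0); },
    async putState(k, v) { db.set(k, v); },
    db,
  };
}
```

With this in place, querying the history is a matter of selecting on `DocType` and `time`, while a plain GetState of `my_record` returns the latest value.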

Related

Power Automate FIlter Array with Array Object as Attribute

I have an Object-Array1 with some attributes that are an Object-Array2. I want to filter my Object-Array1 to only those elements that contain a special value in Object-Array2. How do I do this? Example:
{
"value": [
{
"title": "aaa",
"ID": 1,
"Responsible": [
{
"EMail": "abc#def.de",
"Id": 1756,
},
{
"EMail: "xyz#xyz.com",
"Id": 289,
}
]
},
{
"title": "bbbb",
"ID": 2,
"Responsible": [
{
"EMail": "tzu#iop.de",
"Id": 1756,
}
]
}
]
}
I want to filter my Object-Array1 (with title & ID) to only those elements that contain abc#def.de.
How do I do this in Power Automate with the "Filter Array" action? I tried this way, but it didn't work:
Firstly, you haven't entered an expression, you've entered text. That will never work.
Secondly, even if you did set that as an expression, I don't think you'll be able to make it work over an array, at least, not without specifying more properties and making it a little more complex.
I think the easiest way is to use a contains statement after turning the item into a string.
The expression I am using on the left-hand side is
string(item()?['Responsible'])
compared with a contains condition against the e-mail value.
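Outside Power Automate, the same contains-on-stringified-object check can be sketched in plain JavaScript, purely to illustrate what the Filter Array condition does:

```javascript
// Keep only elements of the outer array whose stringified Responsible
// sub-array contains the target e-mail address.
function filterByResponsibleEmail(items, email) {
  return items.filter(item => JSON.stringify(item.Responsible || []).includes(email));
}
```

With the sample data above, filterByResponsibleEmail(value, 'abc#def.de') keeps only the element with title "aaa". Note that, like the Power Automate contains check, this matches the substring anywhere in the sub-object, not a specific property.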

Searching for sub-objects with a date range containing the queried date value

Let's say we're handling the advertising of various job openings across several channels (newspapers, job boards, etc.). For each channel, we can buy a "publication period" which will mean the channel will advertise our job openings during that period. How can we find the jobs for a given channel that have a publication period valid for today (i.e. starting on or before today, and ending on or after today)? The intent is to be able to generate a feed of "active" job openings that (e.g.) a job board can consume periodically to determine which jobs should be displayed to its users.
Another wrinkle is that each job opening is associated with a given tenant id: the feeds will have to be generated scoped to tenant and channel.
Let's say we have the following simplified documents (if you think the data should be modeled differently, please let me know also):
{
"_id": "A",
"tenant_id": "foo",
"name": "Job A",
"publication_periods": [
{
"channel": "linkedin",
"start": "2021-03-10T00:00:0.0Z",
"end": "2021-03-17T00:00:0.0Z"
},
{
"channel": "linkedin",
"start": "2021-04-10T00:00:0.0Z",
"end": "2021-04-17T00:00:0.0Z"
},
{
"channel": "monster.com",
"start": "2021-03-10T00:00:0.0Z",
"end": "2021-03-17T00:00:0.0Z"
}
]
}
{
"_id": "B",
"tenant_id": "foo",
"name": "Job B",
"publication_periods": [
{
"channel": "linkedin",
"start": "2021-04-10T00:00:0.0Z",
"end": "2021-04-17T00:00:0.0Z"
},
{
"channel": "monster.com",
"start": "2021-03-15T00:00:0.0Z",
"end": "2021-03-20T00:00:0.0Z"
}
]
}
{
"_id": "C",
"tenant_id": "foo",
"name": "Job C",
"publication_periods": [
{
"channel": "monster.com",
"start": "2021-05-15T00:00:0.0Z",
"end": "2021-05-20T00:00:0.0Z"
}
]
}
{
"_id": "D",
"tenant_id": "bar",
"name": "Job D",
"publication_periods": [
...
]
}
How can I query the jobs linked to tenant "foo" that have an active publication period for "monster.com" on the date of 17.03.2021? (I.e. this query should return both jobs A and B.)
Note that the DB will contain documents of other (irrelevant) types.
Since I essentially need to "find all job openings containing an object in the publication_periods array having: CHAN as the channel value, "start" <= DATE, "end" >= DATE" it appears I'd require a Mango query to achieve this, as standard view queries don't provide comparison operators (if this is mistaken, please correct me).
Naturally, I want the Mango query to be executed only on relevant data (i.e. exclude documents that aren't job openings), but I can't find references on how to do this (whether in the docs or elsewhere): all resources I found simply seem to define the Mango index on the entire set of documents, relying on the fact that documents where the indexed field is absent won't be indexed.
How can I achieve what I'm after?
Initially, I was thinking of creating a view that would emit the publication period information along with a {'_id': id} object in order to "JOIN" the job opening document to the matching periods at query time (per Best way to do one-to-many "JOIN" in CouchDB). However, I realized that I wouldn't be able to query this view as needed (i.e. "start" value before today, "end" value after today) since I wouldn't have a definite start/end key to use... And I have no idea how to properly leverage a Mango index/query for this. Presumably I'd have to create a partial index based on document type and the presence of publication periods, but how can I even index the multiple publication periods that can be located within a single document? Can a Mango index be defined against a specific view as opposed to all documents in the DB?
I stumbled upon this answer, Mango search in Arrays, indicating that I should be able to index the data with:
{
"index": {
"fields": [
"tenant_id",
"publication_periods.[].channel",
"publication_periods.[].start",
"publication_periods.[].end"
]
},
"ddoc": "job-openings-periods-index",
"type": "json"
}
And then query them with
{
"selector": {
"tenant_id": "foo",
"publication_periods": {
"$elemMatch": {
"$and": [
{
"channel": "monster.com"
},
{
"start": {
"$lte": "2021-03-17T00:00:0.0Z"
}
},
{
"end": {
"$gte": "2021-03-17T00:00:0.0Z"
}
}
]
}
}
},
"use_index": "job-openings-periods-index"
"execution_stats": true
}
Sadly, I'm informed that the index "was not used because it does not contain a valid index for this query", and performance is terrible, which I will leave for another question.
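For reference, the "partial index based on document type" idea mentioned above maps to CouchDB's partial_filter_selector. A sketch of such an index (the "type" field name is an assumption, and whether the query planner would actually select it for the $elemMatch query above would still need to be verified with _explain):

```json
{
  "index": {
    "partial_filter_selector": {
      "type": "job_opening",
      "publication_periods": { "$exists": true }
    },
    "fields": ["tenant_id"]
  },
  "ddoc": "job-openings-partial-index",
  "type": "json"
}
```

As far as I know, Mango JSON indexes cannot index individual array elements, so the $elemMatch part of the selector would be applied as a filter after the index narrows candidates by tenant_id; a Mango index also cannot be defined against a view.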

Search, Sort and aggregate documents

I have a database with two different document types:
{
"id": "1",
"type": "User",
"username": "User 1"
}
and a second document type with the following structure:
{
"id": "2",
"type": "Asset",
"name": "Asset one",
"owner_id": "1" //id of the user who owns the asset
}
We need to display the list of existing assets and the name of the owner (side by side). We were able to achieve this by using views and linked documents. The problem is, now we need to be able to search and sort which is not supported by views.
Is what we're trying to accomplish possible using CouchDB? Can we do this using search indexes?
We're using CouchDB 2.3.1 and we're not able to upgrade (at least for now).
I need to search for username and asset name and also be able to sort by these fields. We don't need a full featured search. Something like matches (case insensitive) is good enough.
The id / owner_id specified in the examples represent the document _id. A user will not own more than ~10 assets. The normal scenario is 2 or 3 assets.
Without knowing the complete nature of the asset documents (e.g. lifetime, immutability, etc.), this may get you moving in a positive direction. The problem appears to be that information from both documents is needed to generate a meaningful view, which isn't happening.
Assuming asset names are immutable and the number of assets per user is low, consider decoupling and denormalizing the owner_id relationship by keeping a list of assets in the User document.
For example, a User document where the assets property contains a collection of owned asset document information (_id, name):
{
"_id": "1",
"type": "User",
"username": "User 1",
"assets": [
[
"2",
"Asset one"
],
[
"10",
"Asset ten"
]
]
}
Given this structure, an Asset document is fairly thin
{
"_id": "2",
"type": "Asset",
"name": "Asset one"
}
I will assume there is much more information in the Asset documents than presented.
So how do we get searched and sorted results? Consider a design doc _design/user/_view/assets with the following map function:
function (doc) {
if(doc.type === "User" && doc.assets) {
for(var i = 0; i < doc.assets.length; i++) {
/* emit user name, asset name, value as asset doc id */
emit(doc.username + '/' + doc.assets[i][1], { _id: doc.assets[i][0] });
/* emit asset name with leading /, value as User doc _id */
emit('/' + doc.assets[i][1], { _id: doc._id })
}
}
}
Let's assume the database only has the one user "User 1" and two Asset documents "Asset one" and "Asset ten".
This query (using cURL)
curl -G <db endpoint>/_design/user/_view/assets
yields
{
"total_rows":4,"offset":0,"rows":[
{"id":"1","key":"/Asset one","value":{"_id":"1"}},
{"id":"1","key":"/Asset ten","value":{"_id":"1"}},
{"id":"1","key":"User 1/Asset one","value":{"_id":"2"}},
{"id":"1","key":"User 1/Asset ten","value":{"_id":"10"}}
]
}
Not very interesting, except notice the rows are returned in ascending order according to their keys. To reverse the order, simply add the descending=true parameter
curl -G <db endpoint>/_design/user/_view/assets?descending=true
yields
{
"total_rows":4,"offset":0,"rows":[
{"id":"1","key":"User 1/Asset ten","value":{"_id":"10"}},
{"id":"1","key":"User 1/Asset one","value":{"_id":"2"}},
{"id":"1","key":"/Asset ten","value":{"_id":"1"}},
{"id":"1","key":"/Asset one","value":{"_id":"1"}}
]
}
Now here's where things get cool, and those cool things are startkey and endkey.
Given the nature of the keys, we can query all assets for "User 1" and have the Asset documents returned in order according to the asset name, leveraging the slash in the key
curl -G <db endpoint>/_design/user/_view/assets
-d "startkey="""User%201/"""" -d "endkey="""User%201/\uFFF0""""
note I'm on Windows, where we have to escape double quotes ;(
yields
{
"total_rows":4,"offset":2,"rows":[
{"id":"1","key":"User 1/Asset one","value":{"_id":"2"}},
{"id":"1","key":"User 1/Asset ten","value":{"_id":"10"}}
]
}
This is a prefix search. Note the use of the high Unicode character \uFFF0 as a terminator; we're asking for all documents in the view whose key starts with "User 1/".
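The prefix-range trick can be captured in a small helper (plain JavaScript; it builds the JSON-encoded startkey/endkey values that the curl calls above pass as query parameters):

```javascript
// Build startkey/endkey values for a view prefix search: everything from
// `prefix` up to prefix + U+FFF0, a very high code point that sorts after
// any ordinary character, so the range covers all keys with that prefix.
function prefixRange(prefix) {
  return {
    startkey: JSON.stringify(prefix),
    endkey: JSON.stringify(prefix + '\ufff0'),
  };
}
```

For example, prefixRange('User 1/') produces the same startkey/endkey pair used in the query above, ready to be URL-encoded into the request.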
Likewise to get a sorted list of all Assets
curl -G <db endpoint>/_design/user/_view/assets
-d "startkey="""/"""" -d "endkey="""/\uFFF0""""
yields
{
"total_rows":4,"offset":0,"rows":[
{"id":"1","key":"/Asset one","value":{"_id":"1"}},
{"id":"1","key":"/Asset ten","value":{"_id":"1"}}
]
}
Since the Asset document _id is emitted, use include_docs to fetch the Asset document:
curl -G <db endpoint>_design/user/_view/assets -d "include_docs=true"
-d "startkey="""User%201/"""" -d "endkey="""User%201/\uFFF0""""
yields
{
"total_rows": 4,
"offset": 2,
"rows": [
{
"id": "1",
"key": "User 1/Asset one",
"value": {
"_id": "2"
},
"doc": {
"_id": "2",
"_rev": "2-f4e78c52b04b77e4b5d2787c21053155",
"type": "Asset",
"name": "Asset one"
}
},
{
"id": "1",
"key": "User 1/Asset ten",
"value": {
"_id": "10"
},
"doc": {
"_id": "10",
"_rev": "2-30cf9245b2f3e95f22a06cee6789d91d",
"type": "Asset",
"name": "Asset 10"
}
}
]
}
The same goes for Assets, where the User _id is emitted.
Caveat
The major drawback here is that deleting an Asset document requires updating the User document; not the end of the world but it would be ultra nice to avoid that dependency.
Given the original 1-1 relationship of asset to user, getting rid of the Asset document altogether and simply storing all Asset data in the User document might be feasible depending on your usage, and wildly reduces complexity.
I hope the above inspires a solution. Good luck!

Azure Search match against two properties of the same object

I would like to do a query that matches against two properties of the same item in a sub-collection.
Example:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "person.1#xpto.org" },
{ "type": "phone", "value": "555-12345" },
]
}
]
I would like to be able to search for emails that contain xpto.org, but doing something like the following doesn't work:
search.ismatchscoring('email', 'contacts/type', 'full', 'all') and search.ismatchscoring('/.*xpto.org/', 'contacts/value', 'full', 'all')
instead, it will evaluate each condition in the context of the main object, so objects like the following will also match:
[
{
"name": "Person 1",
"contacts": [
{ "type": "email", "value": "555-12345" },
{ "type": "phone", "value": "person.1#xpto.org" },
]
}
]
Is there any way around this without having an additional field that concatenates type and value?
Just saw the official doc. At this moment, there's no support for correlated search:
This happens because each clause applies to all values of its field in the entire document, so there's no concept of a "current sub-document".
https://learn.microsoft.com/en-us/azure/search/search-howto-complex-data-types
and https://learn.microsoft.com/en-us/azure/search/search-query-understand-collection-filters
The solution I've implemented was creating different collections per contact type.
This way I'm able to search directly in, let's say, the email collection without the need for correlated search. It might not be the solution for all cases, but it works well in this one.
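A sketch of what the reshaped documents could look like, with one collection per contact type (the field names here are assumptions):

```json
{
  "name": "Person 1",
  "emailContacts": ["person.1#xpto.org"],
  "phoneContacts": ["555-12345"]
}
```

With this shape, a search scoped to emailContacts can only ever match e-mail values, so no correlated search is needed; the trade-off is one field per contact type in the index schema.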

How to search through data with arbitrary amount of fields?

I have a web-form builder for science events. The event moderator creates a registration form with an arbitrary number of boolean, integer, enum and text fields.
The created form is used to:
register a new member for the event;
search through registered members.
What is the best search tool for the second task (searching the members of an event)? Is Elasticsearch well suited for this task?
I wrote a post about how to index arbitrary data into Elasticsearch and then search it by specific fields and values. All this, without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to do the following steps to get what you want:
Create a special index described in the post.
Flatten the data you want to index using the flattenData function:
https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4.
Create a document with the original and flattened data and index it into Elasticsearch:
{
"data": { ... },
"flatData": [ ... ]
}
Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
Execute queries on the flatData object to find what you need.
Example
Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:
{
"eventId": "2T73ZT1R463DJNWE36IA8FEN",
"name": "Bob",
"age": 22,
"sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "2T73ZT1R463DJNWE36IA8FEN"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Bob"
},
{
"key": "age",
"type": "long",
"key_type": "age.long",
"value_long": 22
},
{
"key": "sex",
"type": "long",
"key_type": "sex.long",
"value_long": 0
}
]
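A minimal sketch of such a flatten function, reproducing the output shown above for flat documents (the real flattenData from the gist also handles nested objects and arrays, and distinguishes more value types):

```javascript
// Flatten a flat JSON object into {key, type, key_type, value_<type>}
// records, as in the examples above. For this sketch, all numbers map
// to "long" and everything else to "string".
function flattenData(doc) {
  return Object.entries(doc).map(([key, value]) => {
    const type = typeof value === 'number' ? 'long' : 'string';
    return {
      key,
      type,
      key_type: `${key}.${type}`,
      [`value_${type}`]: value,
    };
  });
}
```

Because each value lands in a type-specific field (value_string, value_long, ...), two forms using the same field name with different types never collide in the index mapping.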
Then we will wrap this data in a document as I've shown before and index it.
Then the second event moderator creates another form having a new field, a field with the same name and type, and also a field with the same name but a different type:
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
"eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
"name": "Alice",
"city": "New York",
"sex": "female"
});
This will produce the following data:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Alice"
},
{
"key": "city",
"type": "string",
"key_type": "city.string",
"value_string": "New York"
},
{
"key": "sex",
"type": "string",
"key_type": "sex.string",
"value_string": "female"
}
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "eventId"}},
{"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
]
}
}
}
},
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "name"}},
{"match": {"flatData.value_string": "bob"}}
]
}
}
}
}
]
}
}
}
Elasticsearch automatically detects the field content in order to index it correctly, even if the mapping hasn't been defined previously. So, yes: Elasticsearch suits these cases well.
However, you may want to fine-tune this behavior, or maybe the default mapping applied by Elasticsearch doesn't correspond to what you need: in this case, take a look at the default mapping or, for even further control, the dynamic templates feature.
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution is covered in this article on common problems with Elasticsearch.
Essentially, you want to have everything that can possibly be user-defined as a value. Using nested documents, you can have a key-field and differently mapped value fields to achieve pretty much the same.
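An index mapping along those lines, inferred from the queries shown earlier (a sketch, not necessarily the exact mapping from the referenced article), could look like:

```json
{
  "mappings": {
    "properties": {
      "data": { "type": "object", "enabled": false },
      "flatData": {
        "type": "nested",
        "properties": {
          "key": { "type": "keyword" },
          "type": { "type": "keyword" },
          "key_type": { "type": "keyword" },
          "value_string": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword" } }
          },
          "value_long": { "type": "long" }
        }
      }
    }
  }
}
```

The key fields are exact-match keywords, while each value_<type> field gets a mapping appropriate to its type; the original data object is stored but not indexed, so user-defined keys never grow the mapping.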
