ArangoDB AQL: Updating Strange Attribute Names

In ArangoDB I have a lookup table like this:
{
'49DD3A82-2B49-44F5-A0B2-BD88A32EDB13': 'Human readable value 1',
'B015E210-27BE-4AA7-83EE-9F754F8E469A': 'Human readable value 2',
'BC54CF8A-BB18-4E2C-B333-EA7086764819': 'Human readable value 3',
'8DE15947-E49B-4FDC-89EE-235A330B7FEB': 'Human readable value n'
}
I have documents in a separate collection, such as the one below, whose "details" attribute holds attribute/value pairs that are not human-readable:
{
"ptype": {
"name": "BC54CF8A-BB18-4E2C-B333-EA7086764819",
"accuracy": 9.6,
"details": {
"49DD3A82-2B49-44F5-A0B2-BD88A32EDB13": "B015E210-27BE-4AA7-83EE-9F754F8E469A",
"8DE15947-E49B-4FDC-89EE-235A330B7FEB": true
}
}
}
I need to update this document by looking up the human-readable values in the lookup table, and I also need to replace the non-human-readable attribute names with the readable names from the same lookup table.
The result should look like this:
{
"ptype": {
"name": "Human readable value 3",
"accuracy": 9.6,
"details": {
"Human readable value 1": "Human readable value 2",
"Human readable value n": true
}
}
}
So ptype.name, and both the attribute names and the values in ptype.details, are replaced with values from the lookup table.

This query should help you see how a LUT (Look Up Table) can be used.
One cool feature of AQL is that you can assign a LUT (or the result of a query) to a variable with the LET statement, and then access its contents later in the query.
See if this example helps:
LET lut = {
'aaa' : 'Apples',
'bbb' : 'Bananas',
'ccc' : 'Carrots'
}
LET garden = [
{
'size': 'Large',
'plant_code': 'aaa'
},
{
'size': 'Medium',
'plant_code': 'bbb'
},
{
'size': 'Small',
'plant_code': 'ccc'
}
]
FOR doc IN garden
RETURN {
'size': doc.size,
'vegetable': lut[doc.plant_code]
}
The result of this query is:
[
{
"size": "Large",
"vegetable": "Apples"
},
{
"size": "Medium",
"vegetable": "Bananas"
},
{
"size": "Small",
"vegetable": "Carrots"
}
]
Notice that the final FOR loop, the part that actually returns data, refers to the LUT using doc.plant_code as the lookup key.
This is much more performant than performing a subquery there: if you had 100,000 garden documents, you wouldn't want to run a supporting query 100,000 times just to work out the name for each plant_code.
If you want to guard against codes that cannot be found in the LUT, you could optionally write the final query in this format:
FOR doc IN garden
RETURN {
'size': doc.size,
'vegetable': (lut[doc.plant_code] ? lut[doc.plant_code] : 'Unknown')
}
This version returns the vegetable value with an inline if/then/else (the ternary operator): if the code is not found in the LUT, it returns 'Unknown' instead.
Hope this helps you with your particular use case.
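The answer above only covers values, while the question also asks for the attribute names inside details to be renamed. As a minimal sketch of that logic (plain JavaScript rather than AQL, purely to show the mapping; the helper names translate and translateDoc are hypothetical), the LUT is applied to ptype.name and to every key and value of ptype.details. Unlike the 'Unknown' fallback, anything not found in the LUT (such as the value true) is kept unchanged. In AQL the same idea would presumably combine lut[...] lookups with functions such as ATTRIBUTES, ZIP and MERGE followed by an UPDATE; treat that as a pointer, not a tested query.

```javascript
// Hypothetical helper: look a single value up in the LUT.
// Only strings can be LUT keys; anything else (e.g. true, 9.6) is kept as-is,
// as is any string that is missing from the LUT.
function translate(lut, value) {
  return typeof value === "string" && Object.prototype.hasOwnProperty.call(lut, value)
    ? lut[value]
    : value;
}

// Hypothetical helper: translate ptype.name plus every key AND value of ptype.details.
function translateDoc(lut, doc) {
  const details = {};
  for (const [key, value] of Object.entries(doc.ptype.details)) {
    details[translate(lut, key)] = translate(lut, value);
  }
  return {
    ...doc,
    ptype: { ...doc.ptype, name: translate(lut, doc.ptype.name), details },
  };
}

const lut = {
  "49DD3A82-2B49-44F5-A0B2-BD88A32EDB13": "Human readable value 1",
  "B015E210-27BE-4AA7-83EE-9F754F8E469A": "Human readable value 2",
  "BC54CF8A-BB18-4E2C-B333-EA7086764819": "Human readable value 3",
  "8DE15947-E49B-4FDC-89EE-235A330B7FEB": "Human readable value n",
};

const doc = {
  ptype: {
    name: "BC54CF8A-BB18-4E2C-B333-EA7086764819",
    accuracy: 9.6,
    details: {
      "49DD3A82-2B49-44F5-A0B2-BD88A32EDB13": "B015E210-27BE-4AA7-83EE-9F754F8E469A",
      "8DE15947-E49B-4FDC-89EE-235A330B7FEB": true,
    },
  },
};

console.log(JSON.stringify(translateDoc(lut, doc), null, 2));
```

Because the LUT is built once and each lookup is a plain property access, this keeps the same "no supporting query per document" property the answer describes.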

Related

CouchDB Mango query - Match any key with array item

I have the following documents:
{
"_id": "doc1",
"binds": {
"subject": {
"Test1": ["something"]
},
"object": {
"Test2": ["something"]
}
}
},
{
"_id": "doc2",
"binds": {
"subject": {
"Test1": ["something"]
},
"object": {
"Test3": ["something"]
}
}
}
I need a Mango selector that retrieves documents where any field inside binds (subject, object, etc.) holds an object with a key equal to any value from an array passed as a parameter. In other words, if any key nested under binds matches any value of that array, the document should be returned.
For instance, given the array ["Test2"], my selector should retrieve doc1, since binds["object"]["Test2"] exists; the array ["Test1"] should retrieve doc1 and doc2; and the array ["Test2", "Test3"] should also retrieve doc1 and doc2.
FYI, I am using Node.js with the nano library to access the CouchDB API.
I am providing this answer because the luxury of altering document "schema" is not always an option.
With the given document structure this cannot be done with Mango in any reasonable manner. Yes, it can be done, but only when employing very brittle and inefficient practices.
Mango does not provide an efficient means of querying documents for dynamic property names, although it does support searching within property values, e.g. arrays [1].
Using worst practices, this selector will find docs whose binds.subject or binds.object has a property named Test2 or Test3:
{
"selector": {
"$or": [
{
"binds.subject.Test2": {
"$exists": true
}
},
{
"binds.object.Test2": {
"$exists": true
}
},
{
"binds.subject.Test3": {
"$exists": true
}
},
{
"binds.object.Test3": {
"$exists": true
}
}
]
}
}
Yuk.
The problems:
1. The queried property names vary, so a Mango index cannot be leveraged (Test37 anyone?)
2. Because of (1), a full index scan (_all_docs) occurs on every query
3. Requires programmatic generation of the $or clause
4. Requires knowledge of the set of property names to query (Test37 anyone?)
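To make problem 3 concrete, here is a minimal sketch (JavaScript, since the asker uses Node.js; buildExistsSelector is a hypothetical helper) that generates the $or selector above. It must be told every container name (subject, object, ...) in advance and produces one clause per container/name pair, which is exactly the brittleness described.

```javascript
// Hypothetical helper: build the brittle "$or of $exists" Mango selector.
// bindProps: the containers under binds to check (e.g. subject, object).
// names: the property names to look for (e.g. Test2, Test3).
function buildExistsSelector(bindProps, names) {
  const clauses = [];
  for (const name of names) {
    for (const prop of bindProps) {
      // One clause per (container, name) pair: the selector grows as
      // bindProps.length * names.length, and a new container means new code.
      clauses.push({ [`binds.${prop}.${name}`]: { $exists: true } });
    }
  }
  return { selector: { $or: clauses } };
}

console.log(JSON.stringify(buildExistsSelector(["subject", "object"], ["Test2", "Test3"]), null, 2));
```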
The given document structure is a show stopper for a Mango index and query.
This is where map/reduce shines
Consider a view with this map function:
function (doc) {
for(var prop in doc.binds) {
if(doc.binds.hasOwnProperty(prop)) {
// prop = subject, object, foo, bar, etc
var obj = doc.binds[prop];
for(var objProp in obj) {
if(obj.hasOwnProperty(objProp)) {
// objProp = Test1, Test2, Test37, Fubar, etc
emit(objProp,prop)
}
}
}
}
}
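To see which rows this map function produces, it can be exercised outside CouchDB. The sketch below assumes emit is passed in as a parameter (inside CouchDB it is a global) and uses a hypothetical buildView helper that sorts rows by key and then document id; plain string comparison only approximates CouchDB's Unicode collation, but it is adequate for keys like these.

```javascript
// The map function from above; emit is a parameter here so it can run outside CouchDB.
function map(doc, emit) {
  for (var prop in doc.binds) {
    if (doc.binds.hasOwnProperty(prop)) {
      var obj = doc.binds[prop];
      for (var objProp in obj) {
        if (obj.hasOwnProperty(objProp)) {
          emit(objProp, prop);
        }
      }
    }
  }
}

// Hypothetical stand-in for the view builder: collect { id, key, value } rows
// from every doc, then order them by key and, for equal keys, by doc id.
function buildView(docs) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return rows.sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
}

const docs = [
  { _id: "doc1", binds: { subject: { Test1: ["something"] }, object: { Test2: ["something"] } } },
  { _id: "doc2", binds: { subject: { Test1: ["something"] }, object: { Test3: ["something"] } } },
];
console.log(buildView(docs));
```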
So the map function emits one row for every property nested two levels deep under binds, e.g. binds.subject.Test1 or binds.foo.bar, with the inner property name as the key and the outer one as the value.
Given the two documents in the question, this would be the basic view index:
id    key    value
doc1  Test1  subject
doc2  Test1  subject
doc1  Test2  object
doc2  Test3  object
Since view queries accept a keys parameter, this query would provide your specific solution using JSON:
{
"include_docs": true,
"reduce": false,
"keys": ["Test2","Test3"]
}
Querying that index with cURL:
$ curl -G http://{view endpoint} -d 'include_docs=false' -d 'reduce=false' --data-urlencode 'keys=["Test2","Test3"]'
would return
{
"total_rows": 4,
"offset": 2,
"rows": [
{
"id": "doc1",
"key": "Test2",
"value": "object"
},
{
"id": "doc2",
"key": "Test3",
"value": "object"
}
]
}
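A multi-key query returns rows per requested key, so a document matching several keys appears more than once. A rough in-memory model (JavaScript; queryByKeys and distinctIds are hypothetical helpers, and the rows are the four index rows from the table above) shows why you may want to de-duplicate document ids before fetching the documents themselves.

```javascript
// The four index rows from the table above, in view (key) order.
const rows = [
  { id: "doc1", key: "Test1", value: "subject" },
  { id: "doc2", key: "Test1", value: "subject" },
  { id: "doc1", key: "Test2", value: "object" },
  { id: "doc2", key: "Test3", value: "object" },
];

// Rough stand-in for a view query with `keys`: for each requested key,
// in the order given, return the rows whose key matches.
function queryByKeys(rows, keys) {
  return keys.flatMap((k) => rows.filter((r) => r.key === k));
}

// A document matching several keys shows up once per matching row,
// so keep only the first occurrence of each id.
function distinctIds(rows) {
  return [...new Set(rows.map((r) => r.id))];
}

const hits = queryByKeys(rows, ["Test1", "Test2"]);
console.log(hits.length, distinctIds(hits));
```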
Of course there are options to expand the form and function of such a view by leveraging collation and complex keys, and there's the handy reduce feature.
I've seen commentary that Mango is great for those new to CouchDB due to its "ease" in creating indexes and its query options, and that map/reduce is for the more seasoned. I believe such comments are well-intentioned but misguided; Mango is alluring but has its pitfalls [1]. Views do require considerable thought, but hey, that's what we're supposed to be doing anyway.
[1] $elemMatch, for example, requires in-memory scanning, which can be very costly.

How to define an index to use in a Mango Query

I am trying to create a CouchDB Mango Query with an index, in the hope that the query runs faster. At the moment I have the following Mango Query, which returns what I am looking for but is slow. Therefore, I assume I need to create an index to make it faster. I need help figuring out how to create that index.
selector: {
categoryIds: {
$in: categoryIds,
},
},
sort: [{ publicationDate: 'desc' }],
You can assume that my documents are, let's say, news articles from different categories. Each document has a categoryIds array listing the one or more categories the article belongs to. My query needs to be optimized for requests like "Give me all news that have categoryId1 in their array of categoryIds, sorted by publicationDate". What I don't know is:
1. How to define an index
2. What that index should be
3. How to use that index in the "use_index" field of the Mango Query
Any help is appreciated.
Update after "Alexis Côté" answer:
If I define the index like this:
{
"_id": "_design/0f11ca4ef1ea06de05b31e6bd8265916c1bbe821",
"_rev": "6-adce50034e870aa02dc7e1e075c78361",
"language": "query",
"views": {
"categoryIds-json-index": {
"map": {
"fields": {
"categoryIds": "asc"
},
"partial_filter_selector": {}
},
"reduce": "_count",
"options": {
"def": {
"fields": [
"categoryIds"
]
}
}
}
}
}
And run the Mango Query like this:
{
"selector": {
"categoryIds": {
"$in": [
"e0bd5f97ac35bdf6893351337d269230"
]
}
},
"use_index": "categoryIds-json-index"
}
It still returns the results, but they are not sorted by publicationDate as I want. So I am not clear what solution you are suggesting.
You can create an index as documented here
In your case, you will need an index on the "categoryIds" field.
You can specify the index using "use_index": "_design/<name>"
Note: the query planner should automatically pick this index if it's compatible.
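As a sketch of what the answer describes (JavaScript; indexRequest, findRequest and the ddoc name category-index are hypothetical), these are the JSON bodies you might send to CouchDB's POST /{db}/_index and POST /{db}/_find endpoints. Giving the index an explicit ddoc makes the design document name predictable for use_index, which expects a design document (or a [design document, index name] pair) rather than the bare index name. This only mirrors the answer; it does not by itself address sorting by publicationDate.

```javascript
// Hypothetical helper: body for POST /{db}/_index.
// "ddoc" names the design document the index is stored in; "name" names the index.
function indexRequest(ddoc, name, fields) {
  return { index: { fields }, ddoc, name, type: "json" };
}

// Hypothetical helper: body for POST /{db}/_find that pins the query
// to that design document via use_index, as the answer suggests.
function findRequest(ddoc, categoryIds) {
  return {
    selector: { categoryIds: { $in: categoryIds } },
    use_index: `_design/${ddoc}`,
  };
}

const idx = indexRequest("category-index", "categoryIds-json-index", ["categoryIds"]);
const query = findRequest("category-index", ["e0bd5f97ac35bdf6893351337d269230"]);
console.log(JSON.stringify(idx));
console.log(JSON.stringify(query));
```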

How to iterate through indexed field to add field from another index

I'm rather new to Elasticsearch, so I'm coming here in the hope of finding advice.
I have two indices in elastic from two different csv files.
The index_1 has this mapping:
{'settings': {
'number_of_shards' : 3
},
'mappings': {
'properties': {
'place': {'type': 'keyword' },
'address': {'type': 'keyword' },
}
}
}
The file contains about 400,000 documents.
index_2, built from a much smaller file (about 50 documents), has this mapping:
{'settings': {
"number_of_shards" : 1
},
'mappings': {
'properties': {
'place': {'type': 'text' },
'address': {'type': 'keyword' },
}
}
}
The "place" field in index_2 contains all of the unique values of the "place" field in index_1.
In both indices the "address" fields are postcodes of datatype keyword, with the structure 0000AZ.
Based on the "place" keyword in index_1, I want to assign the corresponding "address" value from index_2.
I have tried using the pandas library, but the index_1 file is too large. I have also tried writing modules based on pandas and elasticsearch, quite unsuccessfully, although I believe this is a promising direction. A good solution would stay within the elasticsearch library as much as possible, as these indices will later be used for further analysis.
If I understand correctly, it sounds like you want to use updateByQuery (the Update By Query API).
The request body should look something like this:
{
'query': {'term': {'place': "placeToMatch"}},
'script': 'ctx._source.address = "updatedZipCode"'
}
This will update the address field of all documents with the matched place.
EDIT:
So what we want to do is use updateByQuery while iterating over all the documents in index2.
First step: get all the documents from index2. We'll just do this using the basic search feature:
{
"index": 'index2',
"size": 100, // enough for all ~50 documents; past 10,000 results you'll have to paginate
"body": {"query": {"match_all": {}}}
}
Now iterate over the results and run an update by query for each one:
// pseudocode
doc = response[i]
// update by query request: match index1 docs on place, copy the address from index2
{
index: 'index1',
body: {
'query': {'term': {'place': doc._source.place}},
'script': {
'source': 'ctx._source.address = params.address',
'params': {'address': doc._source.address}
}
}
}
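The per-document requests could be generated like this. A minimal JavaScript sketch (buildUpdateRequests and the sample place/postcode values are hypothetical) that turns index_2 search hits into update-by-query requests against index_1, passing the postcode via script params rather than splicing it into the script source. Each object could then be handed to your client's update-by-query call; the exact parameter shape differs between client versions.

```javascript
// Hypothetical helper: one update-by-query request per index_2 hit.
// Matches index_1 docs on the keyword field "place" and sets their address,
// passing the postcode as a script parameter instead of inlining it.
function buildUpdateRequests(hits) {
  return hits.map((hit) => ({
    index: "index_1",
    body: {
      query: { term: { place: hit._source.place } },
      script: {
        source: "ctx._source.address = params.address",
        lang: "painless",
        params: { address: hit._source.address },
      },
    },
  }));
}

// Illustrative hits shaped like a search response from index_2 (made-up values).
const hits = [
  { _source: { place: "PlaceA", address: "1234AB" } },
  { _source: { place: "PlaceB", address: "5678CD" } },
];
console.log(JSON.stringify(buildUpdateRequests(hits), null, 2));
```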

Couchdb mango query speed

I have the following type of documents:
{
"_id": "0710b1dd6cc2cdc9c2ffa099c8000f7b",
"_rev": "1-93687d40f54ff6ca72e66ca7fc99caff",
"date": "2018-06-04T07:46:08.848Z",
"topic": "some topic"
}
The collection is not very large, only 20k documents.
However, the following query is very slow: it takes about 5 seconds!
{
selector: {
topic: 'some topic'
},
sort: ['date'],
}
I tried various indexes, e.g.
index: {
fields: ['topic', 'date']
}
but nothing really worked well.
What am I missing here?
When sorting in a Mango query, you need to ensure that the sort order you are asking for matches the index that you are using.
If you are indexing the data set in (topic, date) order, then you can use the following query on "topic" to get the data out in date order using the index:
{
"selector": {
"topic": "some topic"
},
"sort": [
"topic",
"date"
]
}
Because the sort matches the form of the data in the index, the index is used to answer the query which should speed up your query time considerably.
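Why the matching sort helps can be illustrated with a small in-memory model. This is a JavaScript sketch, not CouchDB's implementation (buildIndex and findByTopic are hypothetical, and the documents are made up): entries kept in [topic, date] order mean that an equality match on topic is one contiguous run of rows that is already in date order, so no separate sort step is needed.

```javascript
// Compare two entries by the composite key [topic, date].
// ISO 8601 timestamps in the same format sort correctly as strings.
function compareKeys(a, b) {
  if (a.topic !== b.topic) return a.topic < b.topic ? -1 : 1;
  if (a.date !== b.date) return a.date < b.date ? -1 : 1;
  return 0;
}

// Hypothetical "index": a copy of the docs kept in [topic, date] order.
function buildIndex(docs) {
  return [...docs].sort(compareKeys);
}

// Equality on the first key field selects one contiguous run; inside it the
// rows are already in date order. (Linear search here for brevity; a real
// index would seek straight to the start of the run.)
function findByTopic(index, topic) {
  const start = index.findIndex((d) => d.topic === topic);
  if (start === -1) return [];
  let end = start;
  while (end < index.length && index[end].topic === topic) end++;
  return index.slice(start, end);
}

const docs = [
  { _id: "a", topic: "some topic", date: "2018-06-04T07:46:08.848Z" },
  { _id: "b", topic: "other topic", date: "2018-01-01T00:00:00.000Z" },
  { _id: "c", topic: "some topic", date: "2018-01-15T12:00:00.000Z" },
];
console.log(findByTopic(buildIndex(docs), "some topic").map((d) => d._id));
```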

Sorting and placing matched values on top

I am using MongoDB and Node.js to display a record set in a page. I have got as far as displaying them on the page alphabetically, but I would like to display one row (the "default" row) at the top, and all the others alphabetically beneath it.
I know, I know, Mongo is definitely not SQL, but in SQL I would have done something like this:
SELECT *
FROM themes
ORDER BY name != "Default", name ASC;
or perhaps even
SELECT * FROM themes WHERE name = "Default"
UNION
SELECT * FROM themes WHERE name != "Default" ORDER BY name ASC;
I have tried a few variations of Mongo's sorting options, such as
"$orderby": {'name': {'$eq': 'Default'}, 'name': 1}
but without any luck so far. I have been searching a lot for approaches to this problem but I haven't found anything. I am new to Mongo but perhaps I'm going about this all wrong.
My basic code at the moment:
var db = req.db;
var collection = db.get('themes');
collection.find({"$query": {}, "$orderby": {'name': 1}}, function(e, results) {
res.render('themes-saved', {
title: 'Themes',
section: 'themes',
page: 'saved',
themes: results
});
});
You cannot do that in MongoDB, as sorting must be on a specific value already present in a field of your document. What you "can" do is $project a "weighting" to the record(s) matching your condition. As in:
collection.aggregate(
[
{ "$project": {
// "each", "field" and "youWant" stand for whichever fields you need in the output
"each": 1,
"field": 1,
"youWant": 1,
"name": 1,
"weight": {
"$cond": [
{ "$eq": [ "$name", "Default" ] },
10,
0
]
}
}},
{ "$sort": { "weight": -1, "name": 1 } }
],
function(err,results) {
// results: the "Default" theme first, then the rest sorted by name
}
);
So you inspect the field you want to match a value in (or apply other logic), assign a weight to the documents that match, and a lower score or 0 to those that do not.
When you then $sort on that "weighting" first (descending from highest in this case), the matching documents are listed before others with a lower weighting.
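The same weighting logic can be checked in plain JavaScript. This sketch (sortDefaultFirst and the theme names are hypothetical) mirrors the $cond weight and the { weight: -1, name: 1 } sort for an in-memory array, which may also be enough on its own when the theme list is small.

```javascript
// Mirror of the pipeline: weight 10 for "Default", 0 otherwise (like the $cond),
// then sort by weight descending and name ascending (like { weight: -1, name: 1 }).
function sortDefaultFirst(themes) {
  const byName = (a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);
  return themes
    .map((t) => ({ ...t, weight: t.name === "Default" ? 10 : 0 }))
    .sort((a, b) => b.weight - a.weight || byName(a, b));
}

const themes = [{ name: "Ocean" }, { name: "Default" }, { name: "Autumn" }, { name: "Midnight" }];
console.log(sortDefaultFirst(themes).map((t) => t.name));
```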
