Cloudant: Searching across databases

I have documents in two different databases: fruits and vegetables. It's easier for me to keep the databases separate.
Now suppose I want my users to search across any combination of these databases. Would it work if I ran the same query against both databases and merged the results? That is: does the order field in a result have an absolute value, or is it relative to the other results in the same response? For example:
Run my query on the fruits database:
{
  total_rows: 2,
  bookmark: "xxx",
  rows: [
    {
      id: "Apple",
      order: [
        2,
        220
      ],
      fields: {
        title: "Apple"
      }
    },
    {
      id: "pear",
      order: [
        1,
        4223
      ],
      fields: {
        title: "Pear"
      }
    }
  ]
}
Then I run my query on the vegetables database:
{
  total_rows: 1,
  bookmark: "xxx",
  rows: [
    {
      id: "broccoli",
      order: [
        1.5,
        3000
      ],
      fields: {
        title: "Broccoli"
      }
    }
  ]
}
Then I bring the results together to produce:
{
  total_rows: 3,
  bookmark: "xxx",
  rows: [
    {
      id: "Apple",
      order: [
        2,
        220
      ],
      fields: {
        title: "Apple"
      }
    },
    {
      id: "broccoli",
      order: [
        1.5,
        3000
      ],
      fields: {
        title: "Broccoli"
      }
    },
    {
      id: "pear",
      order: [
        1,
        4223
      ],
      fields: {
        title: "Pear"
      }
    }
  ]
}
Would this work? Or is it better to just make a single foods database?

It is not possible to perform joins across databases using CouchDB or Cloudant. You will need to either:
put all your data in a single database and query that
have separate databases and replicate the data from each into a single database, and query that
have separate databases and perform the join functionality in your application tier
I've added this question to: How can I use my SQL knowledge with Cloudant/CouchDB?
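For the third option, merging in the application tier amounts to concatenating the rows arrays and re-sorting them on the first element of order (the relevance score when sorting by relevance). Be aware that scores computed against two separate indexes are not guaranteed to be directly comparable, so this is an approximate interleaving rather than a true global ranking. A minimal sketch in Node.js, where fetchResults is a hypothetical helper that runs the search query against one database:

// Query both databases in parallel and merge rows by descending score.
// fetchResults(dbName, query) is assumed to return the Cloudant search response.
async function searchAll(query) {
  const [fruits, vegetables] = await Promise.all([
    fetchResults('fruits', query),
    fetchResults('vegetables', query),
  ]);
  const rows = [...fruits.rows, ...vegetables.rows]
    .sort((a, b) => b.order[0] - a.order[0]); // order[0] is the score
  return { total_rows: fruits.total_rows + vegetables.total_rows, rows };
}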

Related

MongoDB - findOne with nested subdocuments and projection

I am currently using the code below in Node.js to find and return data at various nesting levels from a MongoDB database. I'd like to add another layer of nesting (as mentioned in #3).
Collection:
[
  {
    "title": "Category A",
    "link": "a",
    "items": [
      {
        "title": "Item C",
        "link": "a-c",
        "series": [
          {
            "title": "Item C X",
            "link": "a-c-x"
          },
          {
            "title": "Item C Y",
            "link": "a-c-y"
          }
        ]
      },
      {
        "title": "Item D",
        "link": "a-d"
      }
    ]
  },
  {
    "title": "Category B",
    "link": "b"
  }
]
The query:
const doc = await ... .findOne(
  {
    $or: [
      { link: id },
      { "items.link": id },
      { "items.series.link": id }
    ],
  },
  {
    projection: {
      _id: 0,
      title: 1,
      link: 1,
      items: { $elemMatch: { link: id } },
    },
  }
);
Intended results:
1. (works) if link of the document is matched, there should only be an object with the title and link returned.
e.g. value of id variable: "a"
expected query result: { title: "Category A", link: "a" }
2. (works) if items.link of a subdocument is matched, it should be the same as above + an additional element in the items array returned.
e.g. value of id variable: "a-c"
expected query result: { title: "Category A", link: "a", items: [{ title: "Item C", link: "a-c" }] }
3. if items.series.link of a sub-subdocument is matched (works), it should return the same as in 2. + an additional element inside the matched items.series (struggling with this).
e.g. value of id variable: "a-c-y"
expected query result: { title: "Category A", link: "a", items: [{ title: "Item C", link: "a-c", series: [{ title: "Item C Y", link: "a-c-y" }] }] }
current query result: the whole Category A document with all sub-documents
Questions:
a.) How do I modify the projection to return the expected output in #3 as well?
b.) Is the approach above sound in terms of read speed from a denormalized structure? I figured there would probably need to be indexes on link, items.link and items.series.link, as they are all completely unique within a document, but maybe there is a way to achieve the above goal with a completely different approach?
I ended up going halfway via MongoDB, fetching the full item in both cases: when the item link is matched and when the series link is matched:
projection: {
  _id: 0,
  title: 1,
  link: 1,
  items: { $elemMatch: { $or: [
    { link: id },
    { "series.link": id }
  ] } },
}
After that, JavaScript filters the series array to see whether a series entry matched:
doc?.items?.[0]?.series?.find(item => item.link === id)
If this expression is truthy (returns an object), we matched a series; if there is a doc but the expression is falsy, we matched an item result.
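Spelled out, the branching could look like this (a sketch reusing the doc and id variables from above):

// Distinguish category, item and series matches after the findOne above.
const matchedSeries = doc?.items?.[0]?.series?.find(item => item.link === id);
if (matchedSeries) {
  // id matched a series entry, e.g. "a-c-y"
} else if (doc?.items?.length) {
  // id matched an item, e.g. "a-c"
} else if (doc) {
  // id matched a category, e.g. "a"
}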
Although this is not a pure MongoDB solution and there is definitely room for improvement, the above achieves the end goal of being able to distinguish between category, item and series results.

MongoDB asymmetrical return of data: first item in array returned in full, the rest with certain properties omitted?

I'm new to MongoDB and getting to grips with its syntax and capabilities. To achieve the functionality described in the title, I believe I can create a promise that runs two simultaneous queries on the document: one to get the full content of a single item in the array (or at least the data that is omitted in the other query, to re-add afterwards), searched for by most recent date; the other to return the array minus specific properties. I have the following document:
{
  _id: ObjectId('5rtgwr6gsrtbsr6hsfbsr6bdrfyb'),
  uuid: 'something',
  mainArray: [
    {
      id: 1,
      title: 'A',
      date: '05/06/2020',
      array: ['lots', 'off', 'stuff']
    },
    {
      id: 2,
      title: 'B',
      date: '28/05/2020',
      array: ['even', 'more', 'stuff']
    },
    {
      id: 3,
      title: 'C',
      date: '27/05/2020',
      array: ['mountains', 'of', 'knowledge']
    }
  ]
}
and I would like to return
{
  uuid: 'something',
  mainArray: [
    {
      id: 1,
      title: 'A',
      date: '05/06/2020',
      array: ['lots', 'off', 'stuff']
    },
    {
      id: 2,
      title: 'B'
    },
    {
      id: 3,
      title: 'C'
    }
  ]
}
How valid and performant is the promise approach versus constructing one query that achieves this? I have no idea how to express such 'combined-rule' conditions in MongoDB; could anyone give an example?
If the subdocument array you want to omit is not very large, I would just remove it on the application side. Doing the processing in MongoDB means you choose to use MongoDB's compute resources instead of your application's. Generally your application is easier and cheaper to scale, so implementing this at the application layer is preferable.
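For example, the application-side version could look like this (a sketch, assuming doc is the fetched document):

// Keep the first (most recent) element in full; reduce the rest to id and title.
const result = {
  uuid: doc.uuid,
  mainArray: doc.mainArray.map((item, index) =>
    index === 0 ? item : { id: item.id, title: item.title }
  ),
};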
But in this exact case it's not too complex to implement it in MongoDB:
db.collection.aggregate([
  {
    $addFields: { // keep the first element somewhere
      first: { $arrayElemAt: ["$mainArray", 0] }
    }
  },
  {
    $project: { // remove the subdocument field
      "mainArray.array": false
    }
  },
  {
    $addFields: { // join the first element with the rest of the transformed array
      mainArray: {
        $concatArrays: [
          [ // first element
            "$first"
          ],
          { // select elements from the transformed array except the first
            $slice: ["$mainArray", 1, { $size: "$mainArray" }]
          }
        ]
      }
    }
  },
  {
    $project: { // remove the temporary first element
      "first": false
    }
  }
])
MongoDB Playground

Transform raw query to MongoDB query the efficient way

In a Node.js app with MongoDB storage, I receive the following query from the user:
const rawQuery = [
  '{"field":"ingredient","type":"AND","value":"green and blue"}',
  '{"field":"ingredient","type":"AND","value":"black"}',
  '{"field":"ingredient","type":"OR","value":"pink"}',
  '{"field":"ingredient","type":"OR","value":"orange"}',
  '{"field":"place","type":"AND","value":"london"}',
  '{"field":"school","type":"NOT","value":"fifth"}',
  '{"field":"food","type":"OR","value":"burger"}',
  '{"field":"food","type":"OR","value":"pizza"}',
  '{"field":"ownerFirstName","type":"AND","value":"Jimmy"}'
];
I have a collection called restaurant, and a collection called owners.
Would the following query handle such a search scenario?
const query = {
  $and: [
    { ingredient: 'green and blue' },
    { ingredient: 'black' },
    { $or: [
      { ingredient: 'pink' },
      { ingredient: 'orange' },
    ] },
    { place: 'london' },
    { school: { $ne: 'fifth' } },
    { $or: [
      { food: 'burger' },
      { food: 'pizza' },
    ] }
  ]
};
How can I transform the rawQuery into this Mongo query? (It has to be dynamic, because I have many fields; this example only includes a few of them.)
This example query aims to get the restaurants that match the ingredient/place/school/food criteria in the restaurant collection and also to match the owner's first name from the other collection. Each restaurant document has an ownerUuid field that points to the owner in the owners collection.
What is the best way to run such a search in MongoDB in a production environment?
How can this be achieved with Elasticsearch?
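One possible way to build the filter dynamically (a sketch, not an authoritative answer: AND and NOT terms go straight into $and, OR terms are grouped per field into $or clauses; the ownerFirstName term lives in the owners collection and would need a $lookup or a separate query, so it is skipped here):

// Build a MongoDB filter object from the raw query strings.
function buildQuery(rawQuery) {
  const and = [];
  const orByField = {};
  for (const raw of rawQuery) {
    const { field, type, value } = JSON.parse(raw);
    if (field === 'ownerFirstName') continue; // cross-collection: handle separately
    if (type === 'AND') {
      and.push({ [field]: value });
    } else if (type === 'NOT') {
      and.push({ [field]: { $ne: value } });
    } else if (type === 'OR') {
      (orByField[field] = orByField[field] || []).push({ [field]: value });
    }
  }
  for (const clauses of Object.values(orByField)) {
    and.push({ $or: clauses });
  }
  return and.length ? { $and: and } : {};
}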

Push if not present or update a nested array mongoose [duplicate]

I have documents that look something like this, with a unique index on bars.name:
{ name: 'foo', bars: [ { name: 'qux', somefield: 1 } ] }
I want to either update the sub-document where { name: 'foo', 'bars.name': 'qux' } with $set: { 'bars.$.somefield': 2 }, or create a new sub-document { name: 'qux', somefield: 2 } under { name: 'foo' }.
Is it possible to do this in a single query with upsert, or will I have to issue two separate ones?
Related: 'upsert' in an embedded document (suggests changing the schema to make the sub-document identifier the key, but that answer is from two years ago and I'm wondering if there are better solutions now).
No, there isn't really a better solution to this, so perhaps an explanation will help.
Suppose you have a document in place that has the structure as you show:
{
  "name": "foo",
  "bars": [{
    "name": "qux",
    "somefield": 1
  }]
}
If you do an update like this:
db.foo.update(
  { "name": "foo", "bars.name": "qux" },
  { "$set": { "bars.$.somefield": 2 } },
  { "upsert": true }
)
Then all is fine, because a matching document was found. But if you change the value of "bars.name":
db.foo.update(
  { "name": "foo", "bars.name": "xyz" },
  { "$set": { "bars.$.somefield": 2 } },
  { "upsert": true }
)
Then you will get a failure. The only thing that has really changed here is that in MongoDB 2.6 and above the error is a little more succinct:
WriteResult({
  "nMatched" : 0,
  "nUpserted" : 0,
  "nModified" : 0,
  "writeError" : {
    "code" : 16836,
    "errmsg" : "The positional operator did not find the match needed from the query. Unexpanded update: bars.$.somefield"
  }
})
That is better in some ways, but you really do not want to "upsert" here anyway. What you want to do is add the element to the array where the "name" does not currently exist.
So what you really want is the "result" from an update attempt without the "upsert" flag, to see whether any documents were affected:
db.foo.update(
  { "name": "foo", "bars.name": "xyz" },
  { "$set": { "bars.$.somefield": 2 } }
)
Yielding the response:
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
So when the number of modified documents is 0, you know you need to issue the following update:
db.foo.update(
  { "name": "foo" },
  { "$push": { "bars": {
    "name": "xyz",
    "somefield": 2
  }}}
)
There really is no other way to do exactly what you want. Since the additions to the array are not strictly a "set" type of operation, you cannot use $addToSet combined with the "bulk update" functionality to "cascade" your update requests.
In this case it seems you need to check the result, or otherwise accept reading the whole document and deciding in code whether to update or insert a new array element.
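With a Node.js driver, the check-the-result approach might look like this (a sketch; updateOne and matchedCount are standard driver API, but the surrounding code is mine, not the original answer's):

// Try the positional update first; fall back to $push when nothing matched.
const res = await collection.updateOne(
  { name: 'foo', 'bars.name': 'xyz' },
  { $set: { 'bars.$.somefield': 2 } }
);
if (res.matchedCount === 0) {
  await collection.updateOne(
    { name: 'foo' },
    { $push: { bars: { name: 'xyz', somefield: 2 } } }
  );
}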
If you don't mind changing the schema a bit and having a structure like this:
{ "name": "foo", "bars": { "qux": { "somefield": 1 },
"xyz": { "somefield": 2 },
}
}
You can perform your operations in one go.
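With that keyed layout, a single $set with dot notation either updates the existing entry or creates it, for example (a sketch):

// Dot notation creates the nested key if it is missing, so one upsert suffices.
db.foo.update(
  { "name": "foo" },
  { "$set": { "bars.xyz.somefield": 2 } },
  { "upsert": true }
)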
Reiterating 'upsert' in an embedded document for completeness
I was digging for the same feature, and found that in version 4.2 or above, MongoDB provides a new feature called update with aggregation pipeline.
This feature, combined with some other techniques, makes it possible to achieve a subdocument upsert in a single query.
It's a very verbose query, but I believe that if you know you won't have too many records in the sub-collection, it's viable. Here's an example of how to achieve this:
const documentQuery = { _id: '123' }
const subDocumentToUpsert = { name: 'xyz', id: '1' }
collection.update(documentQuery, [
  {
    $set: {
      sub_documents: {
        $cond: {
          if: { $not: ['$sub_documents'] },
          then: [subDocumentToUpsert],
          else: {
            $cond: {
              if: { $in: [subDocumentToUpsert.id, '$sub_documents.id'] },
              then: {
                $map: {
                  input: '$sub_documents',
                  as: 'sub_document',
                  in: {
                    $cond: {
                      if: { $eq: ['$$sub_document.id', subDocumentToUpsert.id] },
                      then: subDocumentToUpsert,
                      else: '$$sub_document',
                    },
                  },
                },
              },
              else: { $concatArrays: ['$sub_documents', [subDocumentToUpsert]] },
            },
          },
        },
      },
    },
  },
])
There's a way to do it in two queries, but they will still work inside a bulkWrite.
This is relevant because in my case not being able to batch it was the biggest hang-up. With this solution, you don't need to collect the result of the first query, which allows you to do bulk operations if you need to.
Here are the two successive queries to run for your example:
// Update the subdocument if it exists
collection.updateMany({
  name: 'foo', 'bars.name': 'qux'
}, {
  $set: {
    'bars.$.somefield': 2
  }
})
// Insert the subdocument otherwise
collection.updateMany({
  name: 'foo', 'bars.name': { $ne: 'qux' }
}, {
  $push: {
    bars: {
      somefield: 2, name: 'qux'
    }
  }
})
This also has the added benefit of avoiding corrupted data / race conditions when multiple applications write to the database concurrently. You won't risk ending up with two bars: { somefield: 2, name: 'qux' } subdocuments in your document if two applications run the same queries at the same time.
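For reference, the same two steps batched into a single bulkWrite round trip might look like this (a sketch using the standard driver API):

// Each updateMany still targets exactly the documents it should.
collection.bulkWrite([
  {
    updateMany: {
      filter: { name: 'foo', 'bars.name': 'qux' },
      update: { $set: { 'bars.$.somefield': 2 } },
    },
  },
  {
    updateMany: {
      filter: { name: 'foo', 'bars.name': { $ne: 'qux' } },
      update: { $push: { bars: { somefield: 2, name: 'qux' } } },
    },
  },
])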

MongoDB apply iterator with additional query to all results

Is there any way within Mongo, via MapReduce or the aggregation framework, to apply a second query based on the result set of the first? For example, an aggregate within an aggregate, or a new emit/query within MapReduce.
For example, I have a materialized path pattern of items (which also includes parentId); I can get all of the roots simply by:
db.collection.find({ parentId: null })
  .toArray(function (err, docs) {
  });
What I want to do is determine whether these docs have children, just a true/false flag. I can iterate through the docs using async's each and check, but on large result sets this is not very performant and causes event loop delays; I can use eachSeries, but that is just slow.
Ideally, I'd like to be able to handle this all within Mongo. Any suggestions if that's possible?
Edit: example collection:
{
  _id: 1,
  parentId: null,
  name: 'A Root Node',
  path: ''
}
{
  _id: 2,
  parentId: 1,
  name: 'Child Node A',
  path: ',1'
}
{
  _id: 3,
  parentId: 2,
  name: 'Child Node B',
  path: ',1,2'
}
{
  _id: 4,
  parentId: null,
  name: 'Another Root Node',
  path: ''
}
This basically represents two root nodes, where one root node ({_id: 1}) has two children (one being direct), for example:
1
  2
    3
4
What I would like to do is query based on parentId, so I can get the root nodes by passing null (or the children of a node by passing that node's id), and determine whether any of the items in the result set have children of their own. An example response for {parentId: null}:
[{
  _id: 1,
  parentId: null,
  name: 'A Root Node',
  path: '',
  hasChildren: true
},
{
  _id: 4,
  parentId: null,
  name: 'Another Root Node',
  path: '',
  hasChildren: false
}]
You could try creating an array of parent ids from the materialized paths, which you can then use in the aggregation pipeline to project the extra hasChildren field/flag.
This can be done by using the map() method on the cursor returned from the find() method. The following illustrates this:
var arr = db.collection.find({ "parentId": { "$ne": null } })
        .map(function (e) { return e.path; })
        .join('')
        .split(',')
        .filter(function (e) { return e; })
        .map(function (e) { return parseInt(e); }),
    parentIds = _.uniq(arr); /* using lodash's uniq method to return a unique array */
Armed with this array of parentIds, you can then use the aggregation framework, in particular a $project pipeline stage that uses the set operator $setIsSubset. This operator takes two arrays and returns true when the first array is a subset of the second (including when the two arrays are equal), and false otherwise:
db.collection.aggregate([
  {
    "$match": {
      "parentId": null
    }
  },
  {
    "$project": {
      "parentId": 1,
      "name": 1,
      "path": 1,
      "hasChildren": { "$setIsSubset": [ [ "$_id" ], parentIds ] }
    }
  }
], function (err, res) { console.log(res); });
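On MongoDB 3.2 and above, a self-$lookup can compute the flag entirely server-side and avoid building the parentIds array on the client. This is an alternative sketch, not part of the original answer, assuming the collection is literally named collection as in the snippets above:

// Join each root node to its direct children, then project a boolean flag.
db.collection.aggregate([
  { $match: { parentId: null } },
  { $lookup: {
      from: 'collection',        // self-join on the same collection
      localField: '_id',
      foreignField: 'parentId',
      as: 'children'
  } },
  { $project: {
      parentId: 1,
      name: 1,
      path: 1,
      hasChildren: { $gt: [{ $size: '$children' }, 0] }
  } }
]);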
