How to find an object at the nth nested level in MongoDB? (single collection, single document) - node.js

I am trying to find an object nested at the nth level, using its '_id', within the same document.
Any suggestions or references or code samples would be appreciated.
For example, the document looks like this:
{
    "_id": "xxxxx",
    "name": "One",
    "pocket": [{
        "_id": "xxx123",
        "name": "NestedOne",
        "pocket": []
    }, {
        "_id": "xxx1234",
        "name": "NestedTwo",
        "pocket": [{
            "_id": "xxx123456",
            "name": "NestedTwoNested",
            "pocket": [{
                "_id": "xxx123666",
                "name": "NestedNestedOne",
                "pocket": []
            }]
        }]
    }]
}
A pocket can hold more pockets, and the nesting depth is dynamic.
Here, I would like to search a "pocket" by "_id", say "xxx123456", but without using a static path reference.
Thanks again.

I highly recommend you change your document structure to something easier to manage/search, as this will only become more of a pain to work with.
Why not use multiple collections, as explained in this answer?
Here is an easy way to think about this for your situation, which I hope is easier to reason about than dropping in some schema code...
Store all of your things as children in the same document. Give them unique _ids.
Store all of the contents of their pockets as collections: each one simply holds all the ids that would normally be inside the pocket (instead of the nested pockets themselves).
That way, a lot of your work can happen outside of DB calls. You just batch pull out the items you need when you need them, instead of searching nested documents!
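For illustration only (the exact layout here is my assumption, not part of the original suggestion), the flattened idea looks roughly like this: every item appears once in a flat list with its own _id, and a pocket is just an array of the _ids it contains.

{ "_id": "xxx1234",   "name": "NestedTwo",       "pocket": ["xxx123456"] }
{ "_id": "xxx123456", "name": "NestedTwoNested", "pocket": ["xxx123666"] }
{ "_id": "xxx123666", "name": "NestedNestedOne", "pocket": [] }

Looking an item up by _id is then a flat search (or a single query, if each item is stored as its own document) instead of a recursive walk through nested pockets.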
However, if you can work with the entire document:
Looks like you want to do a recursive search a certain number of levels deep. I'll give you the general idea with some pseudocode, in the hope that you'll be able to figure out the rest.
Say your function will be:
function SearchNDeep(obj, n, id){
    /**
    You want to go down 1 level, to pocket.
    See if pocket has any things in it. If so:
    check all of the things... for further pockets...
    Once you've checked one level of things, increment the counter.
    When the counter reaches the right level, you'd want to then see if the
    object you're checking has an '_id' of id.
    **/
}
That's the general idea. There is a cleaner, recursive way to do this where you call SearchNDeep while passing a number for how deep you are, base case being no more levels to go, or the object is found.
Remember to return false or undefined if you don't find it, and the right object if you do! Good luck!
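In case it helps, here is a minimal sketch of that recursive version, based purely on the description above (the depth handling and return values are my assumptions):

function SearchNDeep(obj, n, id) {
    // Found it: this object carries the _id we are looking for.
    if (obj._id === id) {
        return obj;
    }
    // Base case: no more levels to go, or nothing nested left to check.
    if (n <= 0 || !Array.isArray(obj.pocket)) {
        return undefined;
    }
    // Otherwise check everything in the pocket, one level deeper.
    for (var i = 0; i < obj.pocket.length; i++) {
        var found = SearchNDeep(obj.pocket[i], n - 1, id);
        if (found) {
            return found;
        }
    }
    return undefined;
}

// e.g. SearchNDeep(doc, 3, "xxx123456") should return the "NestedTwoNested" object.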

Related

Querying MongoDB using pymongo (completely new to mongo/pymongo)

If this question seems too trivial then please let me know in the comments, I will do further research on how to solve it.
I have a collection called products where I store details of a particular product from different retailers. The schema of a document looks like this -
{
    "_id": "uuid of a product",
    "created_at": "timestamp",
    "offers": [{
        "retailer_id": 123,
        "product_url": "url-of-a-product.com",
        "price": "1"
    }, {
        "retailer_id": 456,
        "product_url": "url-of-a-product.com",
        "price": "1"
    }]
}
_id of a product is system generated. Consider a product like 'iPhone X'. This will be a single document with URLs and prices from multiple retailers like Amazon, eBay, etc.
Now if a new URL comes into the system, I need to query whether this URL already exists in our database. The obvious way to do this is to iterate over every offer of every product document and see if the product_url field matches the input URL. That would require loading all the documents into memory and iterating through the offers of every product one by one. Now my question arises -
Is there a simpler method to achieve this? Using pymongo?
Or my database schema needs to be changed since this basic check_if_product_url_exists() is too complex?
MongoDB provides searching within arrays using dot notation.
So your query would be:
db.collection.find({'offers.product_url': 'url-of-a-product.com'})
The same syntax works in MongoDB shell or pymongo.

AZURE SEARCH, ismatch not filtering

I'm using Azure Search to perform some custom searches on a database.
I have one field with this kind of structure:
"STUFF": "05-05-16-00|"
but I'm having trouble creating the filter, because it's possible that I won't have all the numbers that build this structure. It all depends on what the end user types. So I need a wildcard to fill in the blanks for the missing numbers, like this:
"05-05-??-??" -> the pipe is important, because this field can hold more than one code.
Now I need to catch all the possible elements that START WITH 05-05, for example: 05-05-11-01.
I thought I was supposed to use the search.ismatch() function, but it doesn't work.
Here is some code:
search.ismatch('05-05-??-??','STUFF');
And the results were:
"STUFF": "02-02-16-00|",
"STUFF": "02-02-14-00|",
This is driving me crazy, because I don't know why these results came back.
It may be important to know that I'm performing a POST request to the Azure Search API with this code in 'filter'.
Maybe I should escape the special characters like - and ?, like this:
search.ismatch('05\\-05\\-\\?\\?\\-\\?\\?','STUFF')
But the results were the same.
Can somebody please help me?
EDIT 1
Following this article, I changed some things and tried the following search:
search.ismatch('\"05-00*\"','STUFF','simple', 'all')
I'm starting to get some results, but now these are my results:
"STUFF": "06-05-02-00|", //WRONG
"STUFF": "05-02-05-01|", //RIGHT
"STUFF": "05-02-02-07|", //RIGHT
For some reason, it's returning values with the right structure, but not matching at the front of the text.
EDIT 2
I made some changes and replaced all the "-" characters with the keyword "OU", and I'm trying to follow this question to do something like a "contains", performing a POST request with the following parameters:
{
    "search": "*",
    "filter": "search.ismatch('/.*08010000OU/.*','STUFF', 'full', 'all')",
    "skip": "0",
    "count": true
}
I'm trying to use a wildcard at the beginning of the search query because I'm still missing some information.
I believe you won't be able to solve this using the StandardAnalyzer. Try switching to WhitespaceAnalyzer for this particular field, and it will probably work with "05-05*".
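Roughly, that means two things: define the field with the whitespace analyzer in the index, and then filter with a prefix wildcard. A sketch of what that could look like (untested; the field definition and query body below are my assumptions based on the answer above):

Field definition for STUFF in the index, so that "05-05-16-00|" stays one token:

{ "name": "STUFF", "type": "Edm.String", "searchable": true, "analyzer": "whitespace" }

Then the POST body for the query, using a prefix match:

{
    "search": "*",
    "filter": "search.ismatch('05-05*', 'STUFF')",
    "count": true
}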

How to backup/dump structure of graphs in arangoDB

Is there a way to dump the graph structure of an ArangoDB database? arangodump unfortunately just dumps the data of edges and collections.
According to the documentation, in order to dump the structural information of all collections (including system collections), you run the following:
arangodump --dump-data false --include-system-collections true --output-directory "dump"
If you do not want the system collections to be included then don't provide the argument (it defaults to false) or provide a false value.
How the structure and data of collections are dumped is described below, from the documentation:
Structural information for a collection will be saved in files with name pattern .structure.json. Each structure file contains a JSON object with these attributes:
parameters: contains the collection properties
indexes: contains the collection indexes
Document data for a collection will be saved in files with name pattern .data.json. Each line in a data file is a document insertion/update or deletion marker, along with some metadata.
For testing I often want to extract a subgraph with a known structure. I use that to test my queries against. The method is not pretty but it might address your question. I blogged about it here.
Although @Raf's answer is accepted, --dump-data false will only give structure files for all the collections, and the data won't be there. Including --include-system-collections true would give the _graphs system collection's structure, which won't have information pertaining to the creation/structure of individual graphs.
To get the graph creation data as well, the right command is as follows:
arangodump --server.database <DB_NAME> --include-system-collections true --output-directory <YOUR_DIRECTORY>
We are interested in the file named _graphs_<long_id>.data.json, which has the data format below.
{
    "type": 2300,
    "data": {
        "_id": "_graphs/social",
        "_key": "social",
        "_rev": "_WaJxhIO--_",
        "edgeDefinitions": [
            {
                "collection": "relation",
                "from": ["female", "male"],
                "to": ["female", "male"]
            }
        ],
        "numberOfShards": 1,
        "orphanCollections": [],
        "replicationFactor": 1
    }
}
Hope this helps other users with the same requirement!
Currently ArangoDB manages graphs via documents in the system collection _graphs.
One document equals one graph. It contains the graph name, the involved vertex collections, and the edge definitions that configure the directions of the edge collections.

Return a document with the largest value

I've got a series of documents that have a version field.
{
    "_id": "xxx",
    "_rev": "6-4bdeb530234c454ae2f16d77ba577428",
    "name": "glossary",
    "locale": "en-us",
    "version": "186.0",
    "title": "Glossary",
    ...
}
I'd like to be able to return the document where name is glossary that has the largest version.
If I make the map function like this:
function(doc) {
    emit(doc.name, doc);
}
and do a reduce on it, I can figure out which document has the latest version, but when I try to return it (return latest_doc), I get a reduce_overflow_error error.
It seems like I need to do this via a map function only, but I can't figure out how to return a single document with the highest value from the map.
I'm sure there's an easy way to do this, but I haven't been able to figure it out.
Can you help me get the latest version of my glossary document?
reduce_overflow_error is CouchDB's way of telling you that you are not using reduce the way it is supposed to be used. From the wiki:
Reduce is a powerful feature of CouchDB but is often misused which leads to performance problems. From 0.10 onwards, CouchDB uses a heuristic to detect reduce functions that won't scale to give the developer an early warning. A reduce function must reduce the input values to a smaller output value. If you are building a composite return structure in your reduce, or only transforming the values field, rather than summarizing it, you might be misusing this feature
Try this map function
function(doc) {
    emit([doc.name, doc.version], doc._id);
}
query it with
/view-name?include_docs=true&startkey=["glossary",{}]&endkey=["glossary"]&descending=true&limit=1
This will give you the document with the highest version number. Remove the limit parameter if you want more documents.

Freebase batch search

I'm trying to use Freebase to search for multiple items at a time (using one API call). For example, if I have two items:
Robert Downey, Jr.
The Avengers
I want to query Freebase once and get back results for both items. Basically all I need is the mid for the top 3 or 4 results for both items. I would like to rely on Freebase's search API to provide disambiguation for topics. For example, I'd like to be able to search for "Robert Downey, Jr." with the abbreviation: "RDJ".
This is easy to do when searching one item at a time:
https://www.googleapis.com/freebase/v1/search?query=rdj
Making two calls like this would give me exactly what I'm looking for, but I would like to stay away from making these calls individually.
Reconciliation
I did run across the json-rpc call for reconciliation, and I have tried the following:
Endpoint: https://www.googleapis.com/rpc
POST body:
[
    {
        "method": "freebase.reconcile",
        "apiVersion": "v1",
        "params": {
            "name": ["RDJ"],
            "key": "api_key",
            "limit": 10
        }
    },
    {
        "method": "freebase.reconcile",
        "apiVersion": "v1",
        "params": {
            "name": ["the avengers"],
            "key": "api_key",
            "limit": 10
        }
    }
]
This works fairly well for Robert Downey, Jr. in that I get a result of type /film/actor, as I did using the Search API. However, for The Avengers, I get a set of results with type /book/book rather than the 2012 film. These results don't seem to be prioritized the same way as the search results.
I tried something similar using json-rpc for a Freebase search method:
{
    "method": "freebase.search",
    "apiVersion": "v1",
    "params": {
        "name": ["RDJ"],
        "key": "api_key",
        "limit": 10
    }
}
But the "freebase.search" method didn't seem to exist.
One thing to note is that I will not know the expected type of the items I am looking for beforehand.
Long story short: I want the exact results the search API provides, but with multiple queries wrapped up into one call.
Am I missing something terribly simple like an OR operator for the search API?? I've been searching for days, but can't seem to find a good solution. I would appreciate any help at all!
Why not just make two calls asynchronously? That would give you the results you need with almost no penalty in latency.
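For instance, something along these lines would run both searches in parallel from Node (a rough sketch; it assumes a fetch implementation such as node-fetch, a valid API key, and the endpoint from the question):

var fetch = require('node-fetch');

// One Search API call per item; both requests are fired at the same time.
function searchFreebase(query) {
    var url = 'https://www.googleapis.com/freebase/v1/search' +
        '?query=' + encodeURIComponent(query) +
        '&limit=4&key=YOUR_API_KEY';
    return fetch(url).then(function (res) { return res.json(); });
}

Promise.all([searchFreebase('rdj'), searchFreebase('the avengers')])
    .then(function (results) {
        // results[0] and results[1] hold the top matches for each query.
        console.log(results);
    });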
A few relevant facts:
The Reconcile API is still experimental. It's intended for use in reconciling against a type at a minimum and usually scoring using additional property values.
The Search API isn't included in the RPC mechanism because its freeform output doesn't work with the assumptions of the RPC framework. Ditto for the Topic API, although that's not really relevant here.
The Search API has a fairly expressive S-expression language. You don't say if you want the queries scored independently or together, but if you want them ranked jointly, you can use a filter expression like [(any name:rdj name:"The Avengers")]
https://www.googleapis.com/freebase/v1/search?query=&limit=10&filter=%28any%20name:rdj%20name:%22the%20avengers%22%29
