Limit JSON nesting level at parsing stage in rapidjson - rapidjson

Is there any way to limit the nesting level in the JSON document at the parsing stage in rapidjson (in order to limit the resource consumption, e.g. memory)? For instance set maximum nesting level to 100, so if the document to be parsed overpassed that limit, then rapidjson returns error.
I mean, something like this:
rapidjson::Document document;
document.Parse(myJsonAsString, 100);
if (document.HasParseError())
{
// use GetParseError() to check if the error was due to exceding nesting level
}
I have been looking to the Parse() method documentation but it's not clear to me if something like this is possible.

Related

Marklogic QueryByExample in collection NodeJS

TLDR
Is there a way to limit queryByExample to a collection in NodeJS?
Problem faced
I have a complex query with some optional fields (i.e. sometimes some search fields will be omitted). So I need to create a query dynamically, e.g. in JSON. QueryByExample seems to be the right tool to use here as it gives me that flexibility to pass a JSON. However my problem is that I would like to limit my search to only one collection or directory.
e.g. I was hoping for something like
searchJSON = {
title: { $word: "test" },
description: { $word: "desc" }
};
//query
db.documents.query(qb.where(
qb.collection("collectionName"),
qb.byExample(searchJSON)
)).result()...
In this case searchJSON could have been built dynamically, for example maybe sometimes title may be omitted from the search.
This doesn't work because the query builder only allows queryByExample to be the only query. But I'd instead like to built a dynamic search query which is limited to a collection or directory.
At present, I think you would have to express the query with QueryBuilder instead of Query By Example using
qb.and([
qb.collection('collectionName'),
qb.word('title', 'test'),
qb.word('description', 'desc')
])
See http://docs.marklogic.com/jsdoc/queryBuilder.html#word
That said, it should be possible for the Node.js API to relax that restriction based on the fixes in MarkLogic 9.0-2
Please file an issue on https://github.com/marklogic/node-client-api

Couchdb filter using reduce functions/linked documents

Considering:
doc profile
{
_id:"1",
name:"john",
likes: ["2222","1111"]
}
doc likes
{
_id:"2222",
value:"true"
}
{
_id:"1111",
value:"false"
}
I have a filter on my xamarin app to get the profile, and it works well but I need to include the "children" (linked) docs... I can do this with a view setting include_docs=true but I want couchdb to filter so I can use replication.
Also, it would be possible to accomplish the same result if I could use a reduce function to filter data, but I can't make the filter use the reduce function.. So, any idea?
the expected result would be:
doc profile
{
_id:"1",
name:"john",
likes: {
{_id:"2222",
value:"true"},
{_id:"1111",
value:"false"]
}
}
Thanks!
I can do this with a view setting include_docs=true but I want couchdb to filter so I can use replication
You might already know this but you can use couchdb views as filters.
Also, it would be possible to accomplish the same result if I could use a reduce function to filter data
The reduce function is for "reducing" the values that are returned by the map function. The map function returns a key and a value like so:
emit(key,value)
The reduce function only gets the keys and the values that are returned from a map function. For example if you call a view with
?key=abc
and it returns results like
[{
_id:...,
type: abc
},
{
_id:...,
type:abc
}
....
]
You already have all the documents filtered by the key "abc". The reduce function will get as inputs the key, the value and a rereduce parameters. If you use the reduce function as a post map processing step to further filter the results from the view there will be two problems:
There is no way to pass a parameter to a reduce. The keys that you specify will only be used by the map function and then passed as they are to reduce.
It is not a good idea anyway. With reduce you want to return a small value that aggregates the results you get from a view. So taking the above example if you return say an integer as a value from the map function ( in emit(key,value)//suppose that the value is an integer) the reduce function may return a sum or aggregate of those values. But trying to return a modified document is not what reduce function is for. From the docs
"A reduce function must reduce the input values to a smaller output value. If you are building a composite return structure in your reduce, or only transforming the values field, rather than summarizing it, you might be misusing this feature. "
List functions might be more suited to what you are trying to do. If you want to process the results of the view query before returning them they are they way to go.
In list functions you get a set of results returned by the view function. You can even pass additional parameters if you'd like to apply complex filters on them. But you won't be able to use list functions for replication.
Finally replication works on a document level. Documents have _rev fields that is used by the replicator process to check what version the document is in before the replication is performed. So you won't be able to replicate the results returned by a view. Only the documents will be replicated.

Referencing external doc in CouchDB view

I am scraping an 90K record database using JSON-RPC and I am trying to put in some basic error checking. I want to start by scraping the database twice using two different settings and adding a prefix to the second scrape. This way I can check to ensure that the two settings are not producing different records (due to dropped updates, etc). I wanted to implement the comparison using a view which compares each document from the first scrape with it's twin produced by the second scrape and then emit the names of records with a difference between them.
However, I cannot quite figure out how to pull in another doc in the view, everything I have read only discusses external docs using the emit() function, which is too late to permit me to compare it. In the example below, the lookup() function would grab the referenced document.
Is this just not possible?
function(doc) {
if(doc._id.slice(0,1)!=='$' && doc._id.slice(0,1)!== "_"){
var otherDoc = lookup('$test" + doc._id);
if(otherDoc){
var keys = doc.value.keys();
var same = true;
keys.forEach(function(key) {
if ((key.slice(0,1) !== '_') && (key.slice(0,1) !=='$') && (key!=='expires')) {
if (!Object.equal(otherDoc[key], doc[key])) {
same = false;
}
}
});
if(!same){
emit(doc._id, 1);
}
}
}
}
Context
You are correct that this is not possible in CouchDB. The whole point of the map function is that it must be idempotent, otherwise you lose all the other nice benefits of a pre-calculated index.
This is why you cannot access external resources in the map function, whether they be other records or the clock. Any time you run a map you must always get the same result if you put the same record into it. Since there are no relationships between records in CouchDB, you cannot promise that this is possible.
Solution
However, you can still achieve your end goal, just be different means. Some possibilities...
Assuming there is some meaningful numeric value in each doc, you could use a view to take the sum of all those values and group them by which import you did ({key: <batch id>, value: <meaningful number>}). Then compare the two numbers in your client or the browser to see if they match.
A brute force approach would be to use a view to pair the docs that should match. Each doc is on a different row, but they're grouped by a common field. Then iterate through the entire index comparing the pairs. This would certainly be the quickest to code and doesn't depend on your application or data.
Implement a validation function to enforce a schema on your data. Just be warned that this will reduce your write throughput since each written record will be piped out of Erlang and into the JS engine. Also, this is only applicable if you're worried about properly formed records instead of their precise content, which might not be the case.
Instead of your different batch jobs creating different docs, have them place them into the same doc. The structure might look like this: { "_id": "something meaningful", "batch_one": { ..data.. }, "batch_two": { ..data.. } } Then your validation function could compare them or you could create a view that indexes all the docs that don't match. All depends on where in your pipeline you want to do the error checking and correction.
Personally I like the last option better, but only if you don't plan to use the database as is in production. Ie., you wouldn't want to carry around all that extra data in each record.
Hope that helps.
Cheers.

Alternative to skip and limit for mongoose pagination with arbitrary sorting

Let me start by saying that I've read (MongoDB - paging) that using skip and limit for pagination is bad for performance and that its better to sort by something like dateCreated and modify the query for each page.
In my case, I'm letting the user specify the parameter to sort by. Some may be alphabetical. Specifying a query for this type of arbitrary sorting seems rather difficult.
Is there a performance-friendly way to do pagination with arbitrary sorting?
Example
mongoose.model('myModel').find({...})
.sort(req.sort)
...
Secondary question: At what scale do I need to worry about this?
i don't think you can do this.
but in my opinion the best way is to build you query depending on your req.sort var.
for example (it's written in coffescript)
userSort = {name:1} if req.sort? and req.sort="name"
userSort = {date:1} if req.sort? and req.sort="date"
userSort = {number:1} if req.sort? and req.sort="number"
find {}, null , {skip : 0 , limit: 0, sort : userSort } , (err,results)->

Working with nested objects in Redis?

Say I have a hash where a nested property could change.
"key1": {
"prop1": {
"subprop1": "could_change"
}
}
If I get sent the info that prop1.subprop1 has changed can I preform atomic updates on this property? Right now node_redis saves prop1 as a string that says '[object Object]'. If I JSON.stringify() the obj then I would need to retrieve the object, parse to object in memory, make the edit, and then stringify and save the object -- not knowing if something has changed in the mean time.
If I should be working with this data in a different way could someone please explain? I have an object with may nested attributes that I need to be able to update parts of, in addition to needing to retrieve as a whole object.
Thanks for any help!
Lua scripting or a lock pattern would solve your problem.
EVAL 'local obj = cjson.decode(redis.call("GET", "key1")); obj.prop1.subprop1 = ARGV[1]; redis.call("SET", "key1", cjson.encode(obj));' 0 "did_change"
You could even make something more advanced in Lua for editing any key's JSON subobjects if you wanted to.
Look at the Redis SETNX command docs for an example of how to use a lock.

Resources