I am using MongoDB and what I am trying to do is query a collection and receive a well-filtered, ordered, and projected result. Sadly the logic I want to implement is complex, and even if we assume it is possible with db.collection.aggregate, it will result in a long, complex, hard-to-read aggregation pipeline, which I believe is unwanted in most cases.
So I was thinking - MongoDB understands JavaScript, so most likely I can pass a JavaScript function during the query itself, expecting that my Mongo server will run the query, pass the query result to the provided function, and then return the final result to me. Something like:
db.collection.find(myQuery, serverCallback).toArray(function(err, db) { ... });
Sadly it seems this is impossible. Investigating further, I reached stored JavaScript and understood that I can define that serverCallback on the server instead of passing it. Which is good, but it seems messy and wrong to me. So basically this is why I decided to ask here, hoping someone with better MongoDB experience can comment on this approach.
[My understandings]
I believe that not every case of filtering, aggregating, etc. can be achieved with db.collection.aggregate, which is pretty normal. For all the cases that need a special way of filtering the query result, we have two options: define stored JavaScript that we execute on the query result on the Mongo server, or fetch the information from the server and do the processing/filtering in the client.
If we choose to define stored JavaScript, it is very likely that we will put some project-specific logic into the Mongo server. I think that project specifics should always belong in the project code instead of the database. That way we can version them with git and easily access them when we want to change them.
If we choose to apply the aggregation logic after the query, we lose the ability to choose who does the calculations - the server or the client. Which is an option we may want to have.
[My question]
What is the reasoning behind not allowing serverCallback to be provided during the query? I believe there must be reasons here that I do not understand.
[Edits]
First I want to say that I have resolved my problem, but as it was way too complex to explain easily, I will stick to something simpler to explain and understand. I believe this example of MongoDB stored JavaScript is a great one, so let's use it. Basically, what I tried to ask above was: is there a way to pass this sum function during db.collection.find (and if not, why not)? Something like this:
function sum(queryResultAsArray) {
    // Do whatever we want with queryResultAsArray.
    // For the example, keep result rows where x + y == 6.
    return queryResultAsArray.filter(function(row) {
        return row.x + row.y == 6;
    });
}
db.test.find({}, sum);
And for this to be equivalent to the example:
db.test.find({$where: "sum(this.x, this.y) == 6"});
For reasoning on why one might prefer passing a function rather than using stored JavaScript, see the original post.
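For context, here is a minimal sketch of the client-side alternative (option two from the original post), assuming a reasonably recent version of the Node.js native driver; the connection string and database/collection names are placeholders:

const { MongoClient } = require('mongodb');

async function sumFilter() {
    const client = await MongoClient.connect('mongodb://localhost:27017');
    const queryResultAsArray = await client.db('test').collection('test').find({}).toArray();

    // Same logic as sum() above: keep rows where x + y == 6.
    const filtered = queryResultAsArray.filter(function(row) {
        return row.x + row.y == 6;
    });

    await client.close();
    return filtered;
}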
Related
Let me state upfront that I have already looked in the Mongo documentation but have not found what I am looking for. I've also read similar questions, but they always deal with very simple queries. I'm working with Node's native Mongo driver. This is a scalability problem, so the collections I am talking about can have anywhere from a few dozen to millions of records.
Basically I have a query and I need to validate all results (which have a complex structure). Two possible solutions come to mind:
I create a query as specific as possible and try to validate the result directly on the server
I use the cursor to go through the documents one by one from the client (this would also allow me to stop if I am looking for only one result)
Here is the question: what is the most efficient way in terms of latency, overall time, bandwidth use, and computational load on the server and client? There is probably no single answer; in fact, I'd like to understand the pros and cons of the different approaches (and which approach you would recommend). I know the solution should be determined on a case-by-case basis, but I am trying to figure out what would best cover most cases.
Also, to be more specific:
A) Since the query is complex (several nested objects with ranges of values and lists of allowed values), performing the validation on the server would certainly save bandwidth, but is it always possible? And in terms of computation, could it be more efficient to do it on the client?
B) I don't understand the cursor behaviour: is it a continuously open stream until it is closed by the server/client? Also, does the result of next() already take up resources on the server/client, or does that happen at the time of the call?
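To make question B concrete, here is a rough sketch of iterating a cursor with the Node native driver and stopping early. It assumes a recent driver version, and isValid() is a hypothetical placeholder for the complex validation:

// Iterate document by document and stop at the first valid result.
async function findFirstValid(collection, query) {
    const cursor = collection.find(query);

    // The driver fetches documents from the server in batches; hasNext()/next()
    // only hit the server again when the current batch is exhausted.
    while (await cursor.hasNext()) {
        const doc = await cursor.next();
        if (isValid(doc)) {          // hypothetical validation function
            await cursor.close();    // release the server-side cursor early
            return doc;
        }
    }
    return null;
}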
If anyone knows, I'd also like to know how Mongoose solved these "problems", for example in the case of custom validators.
I'm developing a Node application which has an endpoint that filters stories.
I accept up to five query params. I don't know if that is too many query params, but I need them to filter in the database.
Example:
http://localhost:3000/api/search?location=wherever&duration=123
I was wondering if there is any better way to do it, like:
http://localhost:3000/api/search?filter=location:wherever,duration:123
or I could even send a stringified object in the URL, but that seems like an ugly solution to me.
The second way I put here looks cleaner to me, but I'm not sure how to handle the parameters.
In the first option I can pass req.query to my service and handle the object, whereas with the other way handling the filter value would be harder.
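For illustration, here is a minimal sketch (just my assumption about how it could work, not an established convention) of parsing the second form in an Express handler:

const express = require('express');
const app = express();

// Parse ?filter=location:wherever,duration:123 into an object.
app.get('/api/search', function(req, res) {
    const filters = {};
    (req.query.filter || '').split(',').forEach(function(pair) {
        const [key, value] = pair.split(':');
        if (key && value !== undefined) {
            filters[key] = value;
        }
    });
    // filters is now { location: 'wherever', duration: '123' }
    res.json(filters);
});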
What do you think? How do you manage this case when sending a filter to your backend?
Thanks in advance, and my apologies if this has already been answered; I didn't find a similar post.
What is the maximum number of query params in a URL that I should use?
There is no limit to the number of query parameters and it doesn't cause any problem to have lots of them.
I accept up to five query params. I don't know if that is too many query params, but I need them to filter in the database.
Five is low, not high - you're absolutely fine.
The practical limit to the total URL length is around 2000 characters so you just want to make sure your URLs stay comfortably below that and not worry about the count of separate query parameters.
I would only consider an alternative way to send the query if the total URL length was getting too long or could be too long. Aside from that, you can have as many query parameters as you want. It's not really something a human consumes directly, it's just something your code generates and parses so the quantity really doesn't matter.
That said, if you had 100 separate query parameters that would all be used at once, I'd wonder what you were doing and if there was a better way to accomplish it or express the query, but it wouldn't be a technical problem or limitation.
The structural alternative is using POST instead of GET and sending the data as JSON or something like that. You can send as much data as you want that way, but a POST changes the usability of the URL, since it's not something that can just be directly linked to, bookmarked, or copied into a URL bar like a GET can.
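As a sketch of that POST alternative (assuming Express with its built-in JSON body parser), it might look something like this:

const express = require('express');
const app = express();
app.use(express.json()); // parse JSON request bodies

// POST /api/search with a body like { "location": "wherever", "duration": 123 }
app.post('/api/search', function(req, res) {
    const filters = req.body || {};
    // ...run the database query using filters...
    res.json({ filters: filters });
});

app.listen(3000);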
I did a google search, but I could not find what I really need.
I need to query an API which has the same route but with different parameters.
Example:
router.get('/items/:query', function(){})
In this case, I would search for all items.
router.get('/items/:id', function(){})
Here, I would look for a specific item.
At the core of your issue is that you are trying to specify two different resources at the same location. If you design your API to adhere to RESTful principles, you'll see why that's not a wise choice. Here are two good starting points:
https://en.wikipedia.org/wiki/Representational_state_transfer
http://www.restapitutorial.com/lessons/whatisrest.html
In RESTful APIs the root resource represents the item collection:
/api/items
...and when you specify an id that indicates you want only one item:
/api/items/abc123
If you really still want to accomplish what you asked in your question, you'll need to add a URL parameter like /items/query?query=true or /items/abc123?detail=true, but this will be confusing to 99% of the web developers who ever look at your code.
Also, I'm not sure if you really meant this, but when you pass a variable named "query" to the server, that seems to indicate that you're going to send a SQL query (or a similar query language) from the client to the server. This is a dangerous practice for several reasons - it's best to keep all of that type of code on your server.
Edit: if you really absolutely positively have to do it this way then maybe have a query parameter that says ?collection=true. This would at least be understood by other developers that might have to maintain the code in future. Also make sure you add comments to explain why you weren't able to implement rest so you're not leaving behind a bad reputation :)
The issue is that, without additional pattern matching, there isn't a way for Express to distinguish between /items/:query and /items/:id; they are the same route pattern, just with different aliases for the parameter.
Depending on how you intend to structure your query, you may want to consider having the route /items and then using query string parameters, or having a separate /items/search/:query endpoint.
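A minimal sketch of that layout (the route names follow the examples above; the handler bodies are placeholders):

const express = require('express');
const router = express.Router();

// GET /items -> the whole collection, optionally filtered via the query string
router.get('/items', function(req, res) {
    // e.g. /items?name=foo gives req.query.name === 'foo'
    res.json({ query: req.query });
});

// GET /items/:id -> a single item
router.get('/items/:id', function(req, res) {
    res.json({ id: req.params.id });
});

// Alternatively, a dedicated search endpoint keeps the two patterns distinct:
// router.get('/items/search/:query', ...)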
I just recently started to learn NodeJS/SailsJS and have few questions.
Please note that I have a strong Java background (this affects my style of thinking and the architecture).
How to prevent SQL injection in SailsJS?
Basically, I have:
User.query(query, function(err, result) {
    if (err)
        return next(err);

    res.json({ data: result });
});
But where/how should I put the parameters for the SQL query?
In Java I used to do something like:
Query q = new Query("select * from User where id = :id");
q.setParameter("id", some-value);
What about the Data Access Object pattern?
I am feeling uncomfortable having SQL queries in Controller.
Do you have any best practices for that? Maybe some example projects?
All the example projects I've found so far do not use any complex SQL queries.
They are more like school projects using predefined methods on domain classes (like User.create, User.find), etc.
Thank you in advance.
Best Regards,
Maksim
Sails.js uses a DAO library written in JavaScript (of course). It is called waterlinejs; the documentation is here.
Basically, it means that if you want to find a User with a specific id, you only need:
User.findOne({ id: xxx }).then(function(data) {
    res.json(data);
});
This is the main advantage of using Sails.js: waterlinejs can have different adapters, but all of them can construct the User model and let you access it like this. So if you use the sails-mysql adapter it will create a user table with id, name, ...etc. as columns; if you use the memory adapter the data will be stored in memory.
It does provide User.query('select.....', callback) in case what you want cannot be achieved with the DAO methods. But because it is a last resort, for query building Sails.js does not have native support; you can certainly use a package like sprintf to build the SQL.
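Regarding the original question about parameters: depending on your Sails version and adapter, .query() usually also accepts an array of values to escape, which is the closest thing to the Java setParameter() example. This is only a sketch - the placeholder syntax varies by adapter (? for sails-mysql, $1 for sails-postgresql), so check your adapter's documentation:

// Hedged sketch, assuming the sails-mysql adapter: the value in the array is
// escaped by the adapter instead of being concatenated into the SQL string.
User.query('SELECT * FROM user WHERE id = ?', [someValue], function(err, result) {
    if (err) return next(err);
    res.json({ data: result });
});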
Since you are a Java programmer (just like me), as a side note I'd like you to remember that these findOne() methods provided by waterlinejs are asynchronous and promise-based. They are quite different from Java. I spent a lot of time wrapping my head around this, but as soon as I became comfortable with the idea, I started to love it right away.
Let me know if this is clear.
Is it possible to transform the returned data from a Find query in MongoDB?
As an example, I have a first and last field to store a user's first and last name. In certain queries, I wish to return the first name and last initial only (e.g. 'Joe Smith' returned as 'Joe S'). In MySQL a SUBSTRING() function could be used on the field in the SELECT statement.
Are there data transformations or string functions in Mongo like there are in SQL? If so, can you please provide an example of usage? If not, is there a proposed method of transforming the data aside from looping through the returned object?
It is possible to do just about anything server-side with MongoDB. The reason you will usually hear "no" is that you sacrifice too much speed for it to make sense under ordinary circumstances. One of the main forces behind PyMongo, Mike Dirolf of 10gen, has a good blog post on using server-side JavaScript with PyMongo here: http://dirolf.com/2010/04/05/stored-javascript-in-mongodb-and-pymongo.html. His example is storing a JavaScript function to return the sum of two fields, but you could easily modify it to return the first letter of your user name field. The gist would be something like:
db.system_js.first_letter = "function (x) { return x.charAt(0); }"
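As a sketch of how that stored helper could then be used from the mongo shell, assuming an older MongoDB release where stored JavaScript is still usable from $where (newer releases have deprecated or removed this, and $where runs JavaScript per document, so it is slow compared to normal query operators):

// Hypothetical usage: users whose first name starts with 'J'.
db.users.find({ $where: "first_letter(this.first) == 'J'" });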
Understand first, though, that MongoDB is made to be really good at retrieving your data, not at processing it. The recommendation (see, for example, 50 Tips and Tricks for MongoDB Developers by Kristina Chodorow, from O'Reilly) is to do what Andrew tersely alluded to above: make a first-letter field and return that instead. Any processing can be done more efficiently in the application.
But if you feel that even querying for the full name before returning fullname[0] from your 'view' is too much of a performance hit, remember you don't need to do everything the fastest possible way. I'd avoided map-reduce in MongoDB for a while because of all the public concerns about speed. Then I ran my first map-reduce and twiddled my thumbs for 0.1 seconds as it processed 80,000 10k documents. I realize in the scheme of things that's tiny, but it illustrates that just because it's bad for a massive website to take a performance hit on some server-side processing doesn't mean it would matter to you. In my case, I imagine it would take me longer to migrate to Hadoop than to just eat that 0.1 seconds every now and then. Good luck with your site.
The question you should ask yourself is why you need that data. If you need it for display purposes, do that in your view code. If you need it for query purposes, then do as Andrew suggested, and store it as an extra field on the object. Mongo doesn't provide server-side transformations (usually, and where it does, you usually don't want to use them); the answer is usually to not treat your data as you would in a relational DB, but to use the more flexible nature of the data store to pre-bake your data into the formats that you're going to be using.
If you can provide more information on how this data should be used, then we might be able to answer a little more usefully.
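To make the "do it in your view code" suggestion concrete, here is a tiny sketch of the transformation applied after the query, using the first/last field names from the question:

// Build a "Joe S" style display name in application code after fetching the document.
function displayName(user) {
    return user.first + ' ' + user.last.charAt(0);
}

// displayName({ first: 'Joe', last: 'Smith' }) === 'Joe S'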