I guess node.js's MySQL drivers are async, but I'm not really sure what that means, so... The mysql npm module for node.js allows rows to be "streamed" or "gathered up all at once".
Can someone show me where streaming and gathering are applied, and can someone show me how a mysql_fetch_array()-like operation (from PHP) would be done in node.js?
"Streaming" means that you get a callback for each row, with an argument that's an object corresponding to the row. The other option gives you a callback after all the rows have been fetched, with an argument that's an array of objects, each element corresponding to one row. Which one to use depends on how your application logic is structured.
The latter one sounds like what you're looking for, though (again depending on your application logic) it might or might not be the best way to do it.
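Since you asked to see both, here is a minimal sketch of the two modes with the mysql npm module (the connection settings, users table, and name column are placeholders). The buffered rows array in the first mode is the closest analogue to looping over mysql_fetch_array() in PHP.

import * as mysql from 'mysql';

const connection = mysql.createConnection({
  host: 'localhost',   // placeholder credentials
  user: 'user',
  password: 'secret',
  database: 'mydb',
});

// "Gathered up all at once": one callback after the whole result
// set has been buffered into an array of row objects.
connection.query('SELECT id, name FROM users', (err, rows) => {
  if (err) throw err;
  for (const row of rows) {
    console.log(row.name); // each element is an object keyed by column name
  }
});

// "Streaming": no callback; instead, one 'result' event per row
// as it arrives from the server.
connection
  .query('SELECT id, name FROM users')
  .on('error', (err) => console.error(err))
  .on('result', (row) => console.log(row.name)) // called once per row
  .on('end', () => connection.end());           // all rows consumed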
Let me say up front that I have already tried looking in the Mongo documentation, but I have not found what I am looking for. I've also read similar questions, but they always deal with very simple queries. I'm working with Node's native MongoDB driver. This is a scalability problem, so the collections I am talking about can hold anywhere from a few dozen records to millions.
Basically I have a query and I need to validate all results (which have a complex structure). Two possible solutions come to mind:
I create a query as specific as possible and try to validate the result directly on the server
I use the cursor to go through the documents one by one from the client (this would also allow me to stop early if I am looking for only one result); a sketch of this approach follows below
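For concreteness, here is a minimal sketch of that cursor approach with the native driver (the URI, collection name, and isValid() are placeholders standing in for your setup and your complex validation):

import { MongoClient } from 'mongodb';

const isValid = (doc: any): boolean => doc != null; // stand-in for the real validation

async function findFirstValid() {
  const client = await MongoClient.connect('mongodb://localhost:27017'); // placeholder URI
  const cursor = client.db('mydb').collection('items').find({ /* query */ });

  // The driver pulls documents from the server in batches; next() only
  // issues a new round trip when the current batch is exhausted.
  while (await cursor.hasNext()) {
    const doc = await cursor.next();
    if (isValid(doc)) {
      await cursor.close(); // stop early and free the server-side cursor
      await client.close();
      return doc;
    }
  }
  await client.close();
  return null;
}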
Here is the question: what is the most efficient way, in terms of latency, overall time, bandwidth use, and computational load on the server and client? There is probably no single answer; in fact I'd like to understand the pros and cons of the different approaches (and whichever approach you recommend). I know the solution should be determined on a case-by-case basis, but I am trying to figure out what would best cover most cases.
Also, to be more specific:
A) Since the query is complex (several nested objects with ranges of values and lists of allowed values), performing the validation on the server would certainly save bandwidth, but is it always possible? And in terms of computation, could it be more efficient to do it on the client?
B) I don't understand the cursor's behavior: is it a continuously open stream until it is closed by the server or client? Also, does the result of next() already consume resources on the server/client beforehand, or only at the moment of the call?
If anyone knows, I'd also like to know how Mongoose solved these "problems", for example in the case of custom validators.
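For reference, a Mongoose custom validator looks roughly like this (the schema and rule are invented). Note that Mongoose runs custom validators in the Node process, primarily when documents are saved, not on the MongoDB server against query results:

import { Schema, model } from 'mongoose';

const itemSchema = new Schema({
  x: {
    type: Number,
    validate: {
      validator: (v: number) => v >= 0 && v <= 6, // hypothetical rule
      message: 'x must be between 0 and 6',
    },
  },
});

const Item = model('Item', itemSchema);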
Well I have a few pipes in the application I'm working on and I'm starting to think they actually should be guards or even interceptors.
One of them is called PincodeStatusValidationPipe and its job is as simple as can be: it checks the cache for a certain value, and if that value is the one expected it returns what it got; otherwise it throws a FORBIDDEN exception.
Another pipe is called UserExistenceValidationPipe. It operates on the login method and checks whether a user exists in the DB, plus some other things related to that user (e.g. whether a password expected by the login method is present and, if so, whether it matches that of the retrieved user); otherwise it throws appropriate exceptions.
I know it's more of a design question but I find it quite important and I would appreciate any hints. Thanks in advance.
EDIT:
Well, I think UserExistenceValidationPipe is definitely not the best name choice; something like UserValidationPipe fits way better.
If you are throwing a FORBIDDEN already, I would suggest migrating the PincodeStatusValidationPipe to a PincodeStatusValidationGuard, as returning false from a guard will throw a FORBIDDEN for you. You'll also have full access to the Request object, which is pretty nice to have.
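A minimal sketch of that migration, assuming a cache wrapper and a pincode route param (both placeholders inferred from the description above):

import { CanActivate, ExecutionContext, Injectable } from '@nestjs/common';

const EXPECTED_STATUS = 'CONFIRMED'; // placeholder for the expected cache value

interface CacheService {             // hypothetical cache wrapper
  get(key: string): Promise<string>;
}

@Injectable()
export class PincodeStatusValidationGuard implements CanActivate {
  constructor(private readonly cache: CacheService) {}

  async canActivate(context: ExecutionContext): Promise<boolean> {
    const request = context.switchToHttp().getRequest();
    const status = await this.cache.get(request.params.pincode); // assumed route param
    // Returning false here makes Nest respond with 403 Forbidden for you.
    return status === EXPECTED_STATUS;
  }
}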
For the UserExistenceValidationPipe, a pipe is not the worst thing to have. I consider existence validation to be a part of business logic, and as such should be handled in the service, but that's me. I use pipes for data validation and transformation, meaning I check the shape of the data there and pass it on to the service if the shape looks correct.
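For instance, a shape-checking pipe in that spirit might look like this (the pipe name and payload shape are invented):

import { BadRequestException, Injectable, PipeTransform } from '@nestjs/common';

@Injectable()
export class LoginPayloadPipe implements PipeTransform {
  transform(value: any) {
    // Check only the shape here; whether the user actually exists
    // is business logic and belongs in the service.
    if (typeof value?.username !== 'string' || typeof value?.password !== 'string') {
      throw new BadRequestException('username and password are required');
    }
    return value; // hand the well-shaped payload on to the service
  }
}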
As for interceptors, I like to use those for logging, caching, and response mapping, though I've heard of others using interceptors for overall validators instead of using multiple pipes.
As the question is mostly an opinionated one, I'll leave the final decision up to you. In short, guards are great for short circuiting requests with a failure, interceptors are good for logging, caching, and response mapping, and pipes are for data validation and transformation.
I am using MongoDB, and what I am trying to do is query a collection and receive a well-filtered, ordered, and projected result. Sadly the logic I want to implement is complex, and even if we assume it is possible to express with db.collection.aggregate, it would result in a long, complex, hard-to-read aggregation descriptor, which I believe in most cases is unwanted.
So I was thinking: MongoDB understands JavaScript, so most likely I can pass a JavaScript function during the query itself, expecting my mongo server to run the query, apply the provided function to the query result, and then return the final result to me. Something like:
db.collection.find(myQuery, serverCallback).toArray(function(err, db) { ... });
Sadly it seems this is impossible. Investigating further, I came across stored JavaScript and understood that I can define that serverCallback on the server instead of passing it. Which works, but seems messy and wrong to me. So basically that is why I decided to ask here whether someone with more MongoDB experience can comment on this approach.
[My understanding]
I believe that not every case of filtering, aggregating, etc. can be achieved with db.collection.aggregate, which is pretty normal. For all the cases that need a special way of filtering the query result, we have two options: define stored JavaScript that we execute on the query result on the mongo server, or fetch the information from the server and do the processing/filtering in the client.
If we choose to define stored JavaScript, it is very likely that we will end up putting project-specific logic into the mongo server. I think project specifics should always live in the project code rather than in the database. That way we can version them with git and easily access them when we want to change them.
If we choose to apply the aggregation logic after the query, we lose the ability to choose who performs the calculations, the server or the client, which is a choice we may well want to keep.
[My question]
What is the reasoning behind not allowing serverCallback to be provided during the query? I believe that there must be reasons that I do not understand here.
[Edits]
First I want to say that I have resolved my problem, but as it was way too complex to explain easily, I will stick to something simpler to explain and understand. I believe this example of MongoDB stored JavaScript provides a great illustration, so let's use it. Basically, what I tried to ask above is: is there a way to pass this sum function during db.collection.find (and why isn't there one)? Something like this:
function sum(queryResultAsArray) {
    // Do whatever we want with queryResultAsArray.
    // For the example we filter result rows with x + y == 6.
    return queryResultAsArray.filter(function(row) {
        return row.x + row.y == 6;
    });
}
db.test.find({}, sum);
And for this to be equivalent to the example's:
db.test.find({$where: "sum(this.x, this.y) == 6"});
For reasoning on why one might prefer passing a function rather than using stored JavaScript, see the original post above.
In our domain-driven application, we use a type called ServiceResponse<> to send data between layers of our application - specifically, one is returned by every method in the domain. As of right now, it encapsulates the data (if any) which was returned from the method, or any errors that it may have generated.
My question, then, is this: is it an acceptable practice to add fields to this object that may be useful in other layers of the application? For example, is it good form to add a Status or StatusCode field to it that may be interpreted later by the service layer for use as an HTTP status code (with or without some mapping)?
It sounds like a fine place to me. The idea that every method returns a "response" of some sort smells a bit like trying to decouple too much, but there are some cases where such extreme decoupling is warranted.
In any case, the ServiceResponse could easily have a status, and if it needed one, that is where I would put it.
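As a sketch (the names and the status enum are illustrative, not from your codebase), the idea is to keep the status domain-level and let the service layer map it to HTTP at the edge:

enum ServiceStatus { Ok, NotFound, ValidationFailed, Error }

class ServiceResponse<T> {
  constructor(
    public readonly data: T | null,
    public readonly errors: string[] = [],
    public readonly status: ServiceStatus = ServiceStatus.Ok,
  ) {}
}

// The service layer owns the mapping, so the domain never
// has to know about HTTP.
function toHttpStatus(status: ServiceStatus): number {
  switch (status) {
    case ServiceStatus.Ok: return 200;
    case ServiceStatus.NotFound: return 404;
    case ServiceStatus.ValidationFailed: return 400;
    default: return 500;
  }
}

That separation answers the "with or without some mapping" question in one place: the domain records what happened, and only the edge decides how to say it in HTTP.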
Is it possible to transform the returned data from a Find query in MongoDB?
As an example, I have a first and last field to store a user's first and last name. In certain queries, I wish to return the first name and last initial only (e.g. 'Joe Smith' returned as 'Joe S'). In MySQL a SUBSTRING() function could be used on the field in the SELECT statement.
Are there data transformations or string functions in Mongo like there are in SQL? If so, can you please provide an example of usage? If not, is there a proposed method of transforming the data aside from looping through the returned object?
It is possible to do just about anything server-side with MongoDB. The reason you will usually hear "no" is that you sacrifice too much speed for it to make sense under ordinary circumstances. Mike Dirolf of 10gen, one of the main forces behind PyMongo, has a good blog post on using server-side JavaScript with PyMongo here: http://dirolf.com/2010/04/05/stored-javascript-in-mongodb-and-pymongo.html. His example stores a JavaScript function that returns the sum of two fields, but you could easily modify it to return the first letter of your user name field. The gist would be something like:
db.system_js.first_letter = "function (x) { return x.charAt(0); }"
Understand first, though, that MongoDB is built to be really good at retrieving your data, not at processing it. The recommendation (see, for example, 50 Tips and Tricks for MongoDB Developers by Kristina Chodorow, O'Reilly) is to do what Andrew tersely alluded to above: make a first-letter column and return that instead. Any processing can be done more efficiently in the application.
But if you feel that even querying for the full name before returning fullname[0] from your 'view' is too much of a security risk, you don't need to do everything the fastest possible way. I'd avoided map-reduce in MongoDB for a while because of all the public concerns about speed. Then I ran my first map-reduce and twiddled my thumbs for 0.1 seconds as it processed 80,000 10k documents. I realize that in the scheme of things that's tiny, but it illustrates that just because some server-side processing would be a performance hit for a massive website doesn't mean it would matter to you. In my case, I imagine it would take me longer to migrate to Hadoop than to just eat that 0.1 seconds every now and then. Good luck with your site.
The question you should ask yourself is why you need that data. If you need it for display purposes, do that in your view code. If you need it for query purposes, then do as Andrew suggested, and store it as an extra field on the object. Mongo doesn't provide server-side transformations (usually, and where it does, you usually don't want to use them); the answer is usually to not treat your data as you would in a relational DB, but to use the more flexible nature of the data store to pre-bake your data into the formats that you're going to be using.
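A small sketch of that pre-baking idea with the Node driver (the URI, collection, and field names are invented):

import { MongoClient } from 'mongodb';

async function demo() {
  const client = await MongoClient.connect('mongodb://localhost:27017'); // placeholder URI
  const users = client.db('mydb').collection('users');

  // Bake the display form at write time...
  const first = 'Joe', last = 'Smith';
  await users.insertOne({ first, last, displayName: `${first} ${last.charAt(0)}` });

  // ...so reads are a plain projection with no server-side string work.
  const doc = await users.findOne({ first: 'Joe' }, { projection: { displayName: 1, _id: 0 } });
  console.log(doc?.displayName); // "Joe S"
  await client.close();
}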
If you can provide more information on how this data should be used, then we might be able to answer a little more usefully.