Currently, when using the Rust Rocket framework, you have to obtain your database connection via the controller. Basically, the connection is handed to your handler from a pre-configured pool, and we then have to pass this connection down into any struct which needs a database connection.
If I would like to separate the concern of reading from the data store (or potentially from multiple data stores, if caching is involved as well), then I will have to pass one, or potentially multiple, different connection structs from my handler into the lower layer.
While I am aware that I can encapsulate all of the connections into a single request guard, I am dissatisfied by the lack of abstraction. I feel quite strongly that my handler should know nothing about the database, in order to keep the concerns as separate as possible.
How would I proceed in Rust to obtain a connection from some shared pool of connections inside an object, without using request guards or drilling arguments down through every layer?
Note: my terminology may be incorrect due to my rather limited experience with Rocket.
Related
I'm currently trying to implement something similar to DataSourceUtils::doGetConnection in Rust with Actix Web.
I have multiple repository traits with methods like get_user_for_update. Currently the implementations of these traits receive an sqlx::PgPool and draw a connection from this pool. But in many cases multiple repository methods are called and should execute within one transaction. Is there any way to bind a connection to the current async context in Actix Web, so that each repository method could use the logic: "If there's a connection bound to this context already, use it. If a transaction is requested, get a connection and bind it to this context; otherwise get a connection, execute, and close it"?
I'd like to avoid passing a connection reference to every repository method, since I'd have to pass it between services too and mocking the connection is really difficult (associated generic types etc).
I know that in PHP objects are created for each request and destroyed when the processing is finished.
And in Java, depending on configuration, objects can remain in memory and be either associated with a single user (through server sessions) or shared between multiple users.
Is there a general rule for this in Node.js?
I see many projects instantiating all app objects in the entry script, in which case they will be shared between requests.
Others will keep object creation inside functions, so AFAIK objects are destroyed after processing each request.
What are the downsides of each approach? Obviously, things like memory usage and information sharing should be considered, but are there any other things specific to Node.js that we should pay attention to?
Javascript has no such thing as objects that are tied to a given request. The language is garbage collected and all objects are garbage collected when there are no more references to them and no code can reach them. This has absolutely nothing to do with request handlers.
so AFAIK objects are destroyed after processing each request.
No. The lifetime of objects in Javascript has absolutely nothing to do with requests.
Instead, think of function scopes. If you create an object in a request handler, use it in that request handler, and don't store it somewhere that creates a long-lasting reference to it, then, just like in ANY other function in Javascript, when that request handler function finishes and returns and has no more asynchronous operations still in flight, any objects created within that function that are not stored in some other scope will be cleaned up by the garbage collector.
It is the exact same rules for a request handler as it is for any other function call in the language.
So, please forget anything you know about PHP as its request-specific architecture will only mess you up in Javascript/node.js. There is no such thing in node.js.
Instead, think of a node.js server as one, long running process with a garbage collector. All objects that are created will be garbage collected when they are no longer reachable by live code (e.g. there are no live references to them that any code can get to). This is the same whether the object is created at startup of the server, in a request handler on the server, in a recurring timer on the server or any other event on the server. The language has one garbage collector that works the same everywhere and has no special behavior for server requests.
The usual way to do things in a node.js server is to create objects as local variables in the request handler function (or in any functions that it calls), or maybe even occasionally to assign them as properties of the request or response objects (middleware will often do this). Since everything is scoped to a function call in the request chain, when that function call is done the things you created as local variables in those functions will become eligible for garbage collection.
In general, you do not use many higher scoped variables outside the request handler except for purposeful long term storage (session state, database connections, or other server-wide state).
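A minimal sketch of that distinction, assuming an Express app (the cache, route, and field names below are made up for illustration):

```javascript
const express = require('express');
const app = express();

// Module scope: created once at startup and shared by every request.
// This is purposeful long-term state (here, a simple in-memory cache).
const cache = new Map();

app.get('/greet/:name', (req, res) => {
  // Function scope: `greeting` exists only for this request. Once the
  // handler returns and the response has been sent, nothing references it
  // and it is garbage collected, just like a local in any other function.
  const greeting = { message: `Hello, ${req.params.name}` };

  // Only data deliberately stored in the higher scope outlives the request.
  cache.set(req.params.name, Date.now());

  res.json(greeting);
});

app.listen(3000);
```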
Is there a general rule for this in Node.js?
Not really in the sense you were asking since Javascript is really just about the scope that a variable is declared in and then garbage collection from there, but I will offer some guidelines down below.
If data is stored in a scope higher than the request handler (module scope or global scope), then it probably lasts for a long time because there is a lasting reference that future request handlers can access so it will not be garbage collected.
If objects are created and used within a request handler and not attached to any higher scope, then they will be garbage collected by the language automatically when the function is done executing.
Session frameworks typically create a specific mechanism for storing server-side state that persists on the server on a per-user basis. A popular node.js session manager, express-session, does exactly this. There, you follow the rules of the session framework for how to store or remove data from each user's session. This isn't really a language feature; it's a specific library built in the language. Even the session manager relies on the garbage collector. Data persists in the session manager when desired because there are lasting references to the data that make it available to future requests.
node.js has no such thing as "per-user" or "per-request" data built into the language or the environment. A session manager builds "per-user" data artificially by making persistent data that can be requested or accessed on a per-user basis.
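For illustration, a hedged sketch of that artificial per-user data using express-session (the secret and the view-counter logic are placeholders, not a recommended configuration):

```javascript
const express = require('express');
const session = require('express-session');

const app = express();
app.use(session({
  secret: 'replace-with-a-real-secret', // placeholder
  resave: false,
  saveUninitialized: false
}));

app.get('/', (req, res) => {
  // req.session is the per-user state the session manager persists between
  // requests; everything else in this handler is ordinary local data.
  req.session.views = (req.session.views || 0) + 1;
  res.send(`You have visited this page ${req.session.views} times`);
});

app.listen(3000);
```

The session manager keeps a lasting server-side reference to each user's session data, which is exactly why it survives between requests while the handler's locals are garbage collected.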
Some general rules for node.js:
Define in your head and in your design which data is local to a specific request handler, which data is meant for long-term storage, and which data is meant for user-specific sessions. You should be very clear about that.
Don't ever put request-specific variables into any higher scope that any other request handler can access, unless these are purposeful shared variables that are meant to be accessed by multiple requests. Accidentally sharing variables between requests creates concurrency issues, race conditions, and very hard-to-track-down server bugs, as one request may write to that variable while doing its work and then another request may come along and also write to it, trampling what the first request was working on. Keep these kinds of request-specific variables local to the request handler (local to the function for the request handler) so that this can never happen.
If you are storing data for long-term use (beyond the lifetime of a specific request), which would generally mean storing it in a module-scoped variable or a global-scoped variable (you should generally avoid global-scoped variables), then be very, very careful about how the data is stored and accessed to avoid race conditions or inconsistent state that might mess up some other request handler reading/writing that data. node.js makes this simpler because it runs your Javascript as single-threaded, but once your request handler makes some sort of asynchronous call (like a database call), other request handlers get to run, so you have to be careful about modifications to shared state across asynchronous boundaries.
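A contrived, self-contained sketch of that last pitfall (the route, counter, and fake database call are all made up): two overlapping requests can both pass the check below before either one updates the shared counter, because the await yields the event loop.

```javascript
const express = require('express');
const app = express();

// Module scope: shared by every request handler for the life of the process.
let activeJobs = 0;

// Stand-in for a real asynchronous call (database query, HTTP request, ...).
const pretendDbCall = () => new Promise(resolve => setTimeout(resolve, 100));

app.post('/job', async (req, res) => {
  if (activeJobs >= 1) {
    return res.status(429).send('busy');
  }
  // Hazard: this await yields the event loop, so a second request can pass
  // the check above before this one increments the shared counter.
  await pretendDbCall();
  activeJobs += 1;
  // ... do the job's work ...
  activeJobs -= 1;
  res.send('done');
});

app.listen(3000);
```

The usual fix is to update the shared state before the first await (or to serialize access with a queue), so no other request can interleave between the check and the write.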
I see many projects instantiating all app objects in the entry script, in which case they will be shared between requests.
In the example of a web server using the Express framework, there is one app object that all requests have access to. The only request-specific variables are the request and response objects that are created by the web server framework and passed into your request handler. Those will be unique to each new request. All other server state is accessible by all requests.
What are the downsides of each approach?
If you're asking for a comparison of the Apache/PHP web server model to the node.js/Express web server model, that's a really giant question. They are very different architectures and the topic has been widely discussed and debated before. I'd suggest you do some searching on that topic, read what has been previously written and then ask a more specific question about things you don't quite understand or need some clarification on.
I know that blocking code is discouraged in node.js because it is single-threaded. My question is asking whether or not blocking code is acceptable in certain circumstances.
For example, if I was running an Express webserver that requires a MongoDB connection, would it be acceptable to block the event loop until the database connection was established? This is assuming that all pages served by Express require a database query (which would fail if MongoDB was not initialized).
Another example would be an application that requires the contents of a configuration file before it can initialize. Is there any benefit in using fs.readFile over fs.readFileSync in this case?
Is there a way to work around this? Is wrapping all the code in a callback or promise the best way to go? How would that be different from using blocking code in the above examples?
It is really up to you to decide what is acceptable. And you would do that by determining what the consequences of blocking would be ... on a case-by-case basis. That analysis would take into account:
how often it occurs,
how long the event loop is likely to be blocked, and
the impact that blocking in that context will have on usability [1].
Obviously, there are ways to avoid blocking, but these tend to add complexity to your application. Really, you need to decide ... on a case-by-case basis ... whether that added complexity is warranted.
Bottom line: >>you<< need to decide what is acceptable based on your understanding of your application and your users.
[1] For example, in a game it would be more acceptable to block the UI while switching "levels" than during active play. Or for a general web service, "once-off" blocking while a config file is loaded or a DB connection is established during webserver startup is more acceptable than if this happened on every request.
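A hedged illustration of that footnote (the config file, URL, and collection names are assumptions): blocking once while the server is starting, before it accepts any requests, is very different from blocking inside a handler.

```javascript
const fs = require('fs');
const express = require('express');
const { MongoClient } = require('mongodb');

// Acceptable: a one-off blocking read while the server is starting up and
// is not yet accepting any requests.
const config = JSON.parse(fs.readFileSync('./config.json', 'utf8'));

async function start() {
  // Also fine: wait for the initial connection before listening at all.
  const client = await MongoClient.connect(config.mongoUrl);
  const db = client.db('app');

  const app = express();
  app.get('/items', async (req, res) => {
    // NOT acceptable here: fs.readFileSync inside a handler would stall
    // every other in-flight request for the duration of the read.
    res.json(await db.collection('items').find().toArray());
  });
  app.listen(3000);
}

start();
```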
From my experience, most tasks should be handled in a callback or by returning a promise. You DO NOT want to block code in a Node application; that's what makes it so nice! Mostly, with MongoDB, it will crash before it has a chance to connect if there is no connection. It won't really have an effect on an API call because your server will be dead!
Source: I'm a developer at a bootcamp that teaches MEAN stack.
Your two examples are completely different. The distinction actually answers the question in and of itself.
Grabbing data from a database is dependent on being connected to that database. Any code that is dependent upon that data is then dependent upon that connection. These things have to happen serially for the app to function and be meaningful.
On the other hand, readFileSync will block ALL code, not just code that is reliant on it. You could start reading a csv file while simultaneously establishing a database connection. Once both are done, you could add that csv data to the database.
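As a sketch of that idea (the file path, URL, and collection name are made up), the two independent operations can be started together, and only the step that needs both results waits for them:

```javascript
const fs = require('fs/promises');
const { MongoClient } = require('mongodb');

async function load() {
  // Start both independent operations; neither blocks the event loop.
  const [csvText, client] = await Promise.all([
    fs.readFile('./data.csv', 'utf8'),                 // made-up file
    MongoClient.connect('mongodb://localhost:27017')   // made-up URL
  ]);

  // Only the step that needs both results waits for both of them.
  const rows = csvText.trim().split('\n').map(line => ({ line }));
  await client.db('app').collection('rows').insertMany(rows);
  await client.close();
}

load();
```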
I'm using the Node native client 1.4 in my application and I found something in the documentation a little bit confusing:
A Connection Pool is a cache of database connections maintained by the driver so that connections can be re-used when new connections to the database are required. To reduce the number of connection pools created by your application, we recommend calling MongoClient.connect once and reusing the database variable returned by the callback:
Several questions come in mind when reading this:
Does it mean the db object also maintains the failover feature provided by a replica set? I thought that should be the work of MongoClient (not sure about this, but the C# driver documentation does say MongoClient maintains the replica set stuff).
If I'm reusing the db object, when should I invoke the db.close() function? I saw the db.close() in every example. But shouldn't we keep it open if we want to reuse it?
EDIT:
As it's a topic about reusing, I'd also want to know how we can share the db in different functions/objects?
As the project grows bigger, I don't want to nest all the functions/objects in one big closure, but I also don't want to pass it to all the functions/objects.
What's a more elegant way to share it among the application?
The concept of "connection pooling" for database connections has been around for some time. It really is a common-sense approach: when you consider it, establishing a connection to a database every time you wish to issue a query is very costly, and you don't want the additional overhead that involves.
So the general principle is that you have an object handle (the db reference in this case) that essentially goes and checks which "pooled" connection it can use, and, if the current "pool" is fully utilized, possibly creates another connection (or a few others), up to the pool limit, in order to service the request.
The MongoClient class itself is just a constructor or "factory" type class whose purpose is to establish the connections, and indeed the connection pool, and return a handle to the database for later usage. So it is actually the connections created here that are managed for things such as replica set failover, or possibly choosing another router instance from the available instances, and generally handling the connections.
As such, the general practice in "long lived" applications is that the "handle" is either globally available or able to be retrieved from an instance manager to give access to the available connections. This avoids the need to "establish" a new connection elsewhere in your code, which has already been stated to be a costly operation.
You mention the "example" code that is present throughout many such driver manuals, which often or always calls db.close. But these are just examples, not intended as long-running applications, and as such those examples tend to be "cycle complete" in that they show all of the "initialization", the "usage" of the various methods, and finally the "cleanup" as the application exits.
Good application or ODM type implementations will typically have a way to setup connections, share the pool and then gracefully cleanup when the application finally exits. You might write your code just like "manual page" examples for small scripts, but for a larger long running application you are probably going to implement code to "clean up" your connections as your actual application exits.
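One common way to make that handle "globally available" without nesting everything in one closure or passing it into every function is a small module that connects once and hands out the same handle everywhere it is required (the file name, URL, and database name below are illustrative):

```javascript
// db.js - connect once and share the pooled handle with the whole app.
const { MongoClient } = require('mongodb');

let clientPromise = null;

function getClient() {
  if (!clientPromise) {
    // The first caller triggers the connection; everyone else reuses it.
    clientPromise = MongoClient.connect('mongodb://localhost:27017'); // made-up URL
  }
  return clientPromise;
}

async function getDb() {
  return (await getClient()).db('app'); // made-up database name
}

// Graceful cleanup when the long-running application actually exits.
process.on('SIGINT', async () => {
  if (clientPromise) {
    await (await clientPromise).close();
  }
  process.exit(0);
});

module.exports = { getDb };
```

Any other file can then require('./db') and call await getDb(); Node's module cache guarantees they all share the same underlying pool, and the connection is only closed when the application itself exits.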
I'm writing a server application and its client counterpart that both use Netty for the network layer. I find myself facing typical safety concerns about sending a password from a client to the server so I decided SSL was the safest way of doing this.
I know of the securechat example and will use this to modify my pipelines accordingly. However, I would also like to disable SSL after the password transmission and acknowledgement, to save a few precious CPU cycles on the server side, which may be busy with many other clients. The ChannelPipeline documentation states that:
"Once attached, the coupling between the channel and the pipeline is permanent; the channel cannot attach another pipeline to it nor detach the current pipeline from it."
The idea is then to not change the pipeline on-the-fly, which is prohibited, but to somehow tell the SslHandler in the pipeline that it should stop encrypting messages at some point. I was thinking of creating a class inheriting from SslHandler, overriding its handleDownstream function to call context.sendDownstream(evt) after some point in the communication.
Question 1: Is this a bad idea, that is, disabling SSL at some point?
To allow one block in the pipeline (say a Decoder) to tell another block (say the SslHandler) that it should change its behaviour from now on, I thought I could create, say, an AtomicBoolean in my ChannelPipelineFactory's getPipeline() and pass it to the constructors of both the Decoder and the SslHandler.
Question 2: Is this a bad idea, that is, sharing state between pipeline blocks? I'm worried I might screw up the multithreading of Netty here: do the blocks of a pipeline work on a single message, one at a time? I.e., does the first block wait for the completion of the last block before pulling the next message?
EDIT:
Oh my bad, this is from the ChannelPipeline page I had been visiting many times and quoting in this very question:
"A ChannelHandler can be added or removed at any time because a ChannelPipeline is thread safe. For example, you can insert a SslHandler when sensitive information is about to be exchanged, and remove it after the exchange."
So this answers question 2 about modifying the pipeline's content on-the-fly, and not the pipeline reference itself.
I'm not sure about the efficacy of turning off SSL once established, but I think you have misinterpreted the mutability of the pipeline. Once a given channel is associated with a pipeline, that association is immutable. However, the handlers in the pipeline can be safely modified. That is to say, you can add and remove handlers as your protocol requires. Accordingly, you should be able to remove the SSL handler once it has served its purpose.
You can remove SslHandler from the pipeline with ChannelPipeline.remove(..) then it should turn your connection to plaintext. Please file a bug if it does not work - we actually have not tried that scenario in production :-)
I'm not sure about Netty, but in principle, you could indeed carry on with plain traffic on the same TCP connection. There are a few downsides:
Only the authentication would be secured. A MITM could perform actions other than those intended by the user. (This is similar to using HTTP Digest to some extent: the credentials are protected, but the request/response entities aren't.)
From an implementation point of view, this is tricky to get right. The TLS specification says:
If the application protocol using TLS provides that any data may be carried over the underlying transport after the TLS connection is closed, the TLS implementation must receive the responding close_notify alert before indicating to the application layer that the TLS connection has ended.
This implies that you're going to have to synchronise your stream somehow, waiting for the close_notify response before carrying on with your plain traffic.
The SSLEngine programming model is rather complex, and you may find that the Netty API doesn't necessarily handle this situation.
While it may make sense to want to save a few CPU cycles, most of the SSL/TLS overhead is in the handshake, which you'll be doing anyway. The symmetric cryptographic operations used for the actual encryption of the data are much less expensive. (You should try to measure this overhead to see if it really is a problem.)