I'm trying to build a Rust web framework using Actix that needs to query an HBase backend. We have chosen to use the Thrift code generator to generate the APIs with this file. However, we are having some trouble figuring out how to pass the connection to our web-tier query functions. The official Actix way is to use an extractor that extracts the application state, in this case an HBase connection. More specifically, we are able to create objects of type THBaseServiceSyncClient, which is the application data we wish to pass around: it keeps an open connection to HBase and allows us to run queries.
The official way is to clone this data for each running thread. The first issue we encountered is that this type does not implement the Clone trait. We were able to implement our own Clone function, only to realize that it also does not implement the DerefMut trait. This is harder to work around and cannot be circumvented, due to the function definitions in the API linked above. The usual way to deal with this is to wrap the object in a Mutex. We experimented with that, and it performed very poorly: the contention was far too high, and we simply cannot have one global connection for all the threads to share.
We researched how connections to other popular databases are handled in Rust. We realized that a connection pool is usually used: a pool of active connections is kept, and a manager keeps track of alive/dead connections and spins up more if needed. We found the r2d2 crate, which claims to provide a generic connection pool for Rust. Unfortunately there is no Thrift support, so we experimented by implementing our own very simple pool manager, similar to the mysql variant here. The result was underwhelming: the throughput was not nearly what we need, and a lot of the time was spent in the pool manager, according to some simple flamegraph profiling.
Are there more obvious ways to achieve this goal that we're missing? I'm wondering if anyone has experienced similar issues and can provide some insight into the best way to go about this. Much appreciated.
Related
I'm currently building an application with actix-web and sqlx. The architecture I've built is very similar to this source.
This is basically a trait wrapping the DB access; so far so good. But it assumes every single method gets a connection from the pool and executes its query on it. There's no way to share one connection across a transactional workflow (e.g. SELECT FOR UPDATE ... process ... UPDATE).
With which architecture or library could I achieve this?
I'm a Rust newbie and want to write a Node.js package related to database querying.
I'm using napi-rs for the package.
Node.js has its own async machinery; Rust has a similar thing called Tokio.
I want to create an ORM for Node.js and learn Rust at the same time, so the idea is to construct queries on the Node.js side, execute them on the Rust side, and return the response to Node.
I see two ways:
Use tokio with tokio-postgres
Not use Tokio, and instead write my own database adapter library that relies on Node's async functionality and Node sockets. I'm not even sure I can do this, but I can try. Not sure if that makes sense.
The first way is much simpler, but will it work? Is it efficient to include Tokio in a Node.js package?
The execute_tokio_future method of napi works perfectly! Details on how to use it are here: https://forum.safedev.org/t/adventures-in-rust-node-js-and-safe/2959/10
In benchmarks, the Rust addon beat an equivalent Node library, so the idea paid off.
Background
I am working on an actix-web application using diesel through r2d2 and am unsure of how to best make asynchronous queries. I have found three options that seem reasonable, but am unsure of which one is best.
Potential Solutions
Sync Actor
For one I could use the actix example, but it is quite complicated and requires a fair deal of boilerplate to build. I hope there exists a more reasonable solution.
Actix_web::web::block
As another option I could use the actix_web::web::block to wrap my query functions into a future, but I am unsure of the performance implications of this.
Is the query then running in the same Tokio system? From what I could find in the source, it runs on a thread in the underlying actix-web threadpool. Is that a problem?
If I read the code right, r2d2 blocks its thread when acquiring a connection, which would block part of the core actix-web pool. The same goes for the database queries. Would this then block all of actix-web if I run more queries than there are threads in that pool? If so, big problem.
Futures-cpupool
Finally, the safe bet that may have some unneeded overhead is futures-cpupool. The main issue is that this means adding another crate to my project, and I don't like the idea of multiple CPU pools floating around in my application needlessly.
Since both r2d2 and diesel block, there are a surprising number of tricky things here.
Most importantly, do not share this cpupool with anything not using the same r2d2 pool (as all threads created may just block waiting for an r2d2 connection, locking down the whole pool when work exists).
Secondly (a bit more obviously), you thus shouldn't have more r2d2 connections than threads in the pool, or vice versa, since the larger side would waste resources (connections sitting unused, or threads constantly blocked). Perhaps one extra thread is worthwhile, so connection handover is handled by the OS scheduler rather than the cpupool scheduler.
Finally, mind which database you are using and the performance you get from it. Running r2d2 with a single connection and a single thread in the pool might be best for a write-heavy SQLite application (though I would recommend a proper database for such a workload).
Old answers
Old solutions that may work
https://www.reddit.com/r/rust/comments/axy0hp/patterns_to_scale_actixweb_and_diesel/
In essence, recommends Futures-cpupool.
What is the best approach to encapsulate blocking I/O in future-rs?
Recommends Futures-cpupool for general cases.
Old solutions that don't work
https://www.reddit.com/r/rust/comments/9fe1ye/noob_here_can_we_talk_about_async_and_databases/
A really nice fix for an old actix-web version. From what I can find, requests no longer carry a cpu-pool.
I am going with futures-cpupool. It is the best solution due to the blocking nature of my interactions.
Using actix_web::web::block is decent enough, but will use a shared thread-pool in actix (and due to the blocking calls I use this can block the entire thread pool and interfere with other tasks given to actix_web).
It is better to use futures-cpupool to create a separate threadpool per database, used only for database interactions. This way you group all the tasks that need to wait for each other (when there are more tasks than connections) into one pool, preventing them from blocking any tasks that don't need a connection. You can also limit the number of threads to the number of connections, so that a task is only scheduled when it won't block.
In the case where you only want to use one database connection (or very few) the sync actor is a pretty good option. It will act like a futures-cpupool with one thread, ensuring that all tasks are run one at a time, except that it will use one of actix-web's underlying threads rather than a separate one (therefore, only good for very few connections). I find the boilerplate too big to be worth it, though.
I am using kue.js, which is a redis-backed priority queue for node, for pretty straightforward job-queue stuff (sending mails, tasks for database workers).
As part of the same application (albeit in a different service), I now want to use redis to manually store some mappings for a url-shortener. Does concurrent manual use of the same redis instance and database as kue.js interfere with kue, i.e., does kue require exclusive access to its redis instance?
Or can I use the same redis instance manually as long as I, e.g., avoid certain key prefixes?
I do understand that I could use multiple databases on the same instance, but I found a lot of chatter from various sources discouraging use of the database feature, as well as talk of it being deprecated in the future, which is why I would like to use the same database for now if that is safely possible.
Any insight on this as well as considerations or advice why this might or might not be a bad idea are very welcome, thanks in advance!
I hope I am not too late with this answer; I just came across this post.
It should be perfectly safe. See the README, especially the section on redis connections.
You will notice that each queue can have its own prefix (the default is q), so as long as you are aware of how prefixes are used in your system, you should be fine. I am not sure why it would be a bad idea, as long as you know about the prefixes and the load the various apps put on the redis server. Can you reference a post/page where this was described as a bad idea?
I am learning and evaluating Spark and Flink before picking one of them for a project I've been given.
In my evaluation I came up with the following simple task, which I cannot figure out how to implement in either framework.
Let's say that:
1. I have a stream of events that simply signal that some item has changed somewhere in a database.
2. For each of those events, I need to query the DB to get the new version of the item.
3. Then apply some transformation.
4. Then connect to another DB and write the results.
My question is as follows:
Using Flink or Spark, how can one make sure that the calls to the DBs are handled asynchronously, to avoid thread starvation?
I come from Scala/Akka, where we typically avoid making blocking calls and use futures all the way in this kind of situation. Akka Streams allows that fine-grained level of control for stream processing, for instance Integrating stream with external service. This avoids thread starvation: while I wait on my I/O operation, the thread can be used for something else.
In short, I don't see how to work with futures in either framework, but I believe this can somehow be reproduced in both.
Can anyone explain how this is supposed to be handled in Flink or Spark?
If it is not supported out of the box, has anyone had experience incorporating it somehow?
Since Flink 1.2.0, you can use the Async I/O API to achieve this.