Long-running tasks in Thrift (RPC)

I plan on using Apache Thrift but some calls will be long running/blocking but still require a return value, which would traditionally be returned via callback.
I understand that Thrift does not support callbacks (has this changed?) so I am thinking about making the function just block until a response is returned. Would this be ok? Will Thrift complain (timeout) if an RPC request takes too long?
They say Thrift wasn't intended for bi-directional communication but it should be easy enough to do with a socket.
Context: I am using Thrift for IPC between two processes on the same machine, so there won't be a huge load on the server; that alleviates the usual concern that long-running HTTP requests would overload it.
Am I missing a solution provided by something else?

I understand that Thrift does not support callbacks (has this changed?)
No (not supported), and no (not changed).
some calls will be long running/blocking but still require a return value, which would traditionally be returned via callback.
Yes, blocking until the result is ready is the way to go if you want to stick with the RPC style of doing things, or are technically limited in that regard.
so I am thinking about making the function just block until a response is returned. Would this be ok?
Long-running calls are a perfectly legal solution. Even polling could be an option, provided of course that you don't flood the server with calls. It depends on what "long" means exactly.
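For illustration, a client-side poll loop might look roughly like this with the Node.js thrift package. JobService, submit and getStatus are hypothetical names standing in for whatever your IDL actually defines:

    const thrift = require('thrift');
    // Hypothetical module produced by the Thrift code generator for your IDL.
    const JobService = require('./gen-nodejs/JobService');

    const connection = thrift.createConnection('localhost', 9090);
    const client = thrift.createClient(JobService, connection);

    // Kick off the long-running task, then ask for its status once a second.
    client.submit({ payload: 'work to do' }, (err, jobId) => {
      const timer = setInterval(() => {
        client.getStatus(jobId, (err, status) => {
          if (status && status.done) {
            clearInterval(timer);
            console.log('result:', status.result);
            connection.end();
          }
        });
      }, 1000);
    });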
Will Thrift complain (timeout) if an RPC request takes too long?
That depends on the configured timeouts: most Thrift client transports let you set (or disable) a socket timeout, so set it generously for long calls. And you can always initiate a new request after a connection has dropped.
They say Thrift wasn't intended for bi-directional communication but it should be easy enough to do with a socket.
In a local setup having both ends acting as client and server is indeed possible, and maybe the better option in your case.
In contrast, it's much harder to do that across the interblag. So if you have firm plans to extend your solution into such a scenario later, the bidirectional approach may create some additional headaches when you have to rewrite it into long-running calls. If that is not the case, you can safely ignore this paragraph.
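If you do go the both-ends-are-servers route, a rough sketch of the wiring with the Node.js thrift package could look like this; WorkerService, CallbackService and their methods are made-up names for the two directions of your IDL:

    const thrift = require('thrift');
    // Hypothetical generated modules, one service per direction.
    const WorkerService = require('./gen-nodejs/WorkerService');
    const CallbackService = require('./gen-nodejs/CallbackService');

    // Process A exposes CallbackService so process B can "call back" with results...
    thrift.createServer(CallbackService, {
      jobDone: (jobId, result, callback) => {
        console.log('job', jobId, 'finished with', result);
        callback(null);
      },
    }).listen(9091);

    // ...and is at the same time a client of WorkerService in process B.
    const connection = thrift.createConnection('localhost', 9090);
    const worker = thrift.createClient(WorkerService, connection);
    worker.startJob({ payload: 'data' }, (err, jobId) => {
      console.log('job accepted:', jobId);
    });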

Related

Rust HTTP2 request multiplexing in single thread

I am trying to write a piece of latency-sensitive Rust code that makes a few HTTP requests very quickly (ideally within a few microseconds of one another -- I know that seems ridiculous when considering the effects of network latency, but I'm co-located with the server I'm trying to work with here). The server I'm trying to work with supports HTTP/2 (I know it at least does ALPN -- not sure if it also does prior-knowledge connections).
I've tried using Reqwest's tokio-based client, but tokio's task spawning time seems to be anywhere from 10-100us (which is more than I'd like).
I've tried doing independent threads with their own HTTP clients that just make their own requests when they get an incoming message on their channels, but the time between sending the message and them receiving it can be anywhere from 1-20us (which is, again, more than I'd like).
I've also tried using libcurl's multi library, but that seems to add milliseconds of latency, which is far from ideal.
I've also tried using hyper's client, but any way that I try to enable HTTP/2 seems to run into issues.
I know that HTTP/2 can multiplex many requests into a single connection. And it seems to me that it's probably possible to do this in a single thread. Would anyone be able to point me in the right direction here?
Alright, here's some of the code I've got so far: https://github.com/agrimball/stack-overflow-examples
Not sure if this question is scoped appropriately for StackOverflow. The basic idea is simple (I want to do HTTP/2 request multiplexing and haven't found a usable lib for it in Rust -- any help?), but the story of how I got to that question and the list of alternatives I've already tried does end up making it a bit long...
In any case, thanks in advance for any help!
ninja edit: And yes, being able to do so from a single thread would be ideal since it would remove a bit of latency as well as some of the code complexity. So that part of the title is indeed relevant as well... Still striving for simplicity / elegance when possible!
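Not Rust, but for what it's worth, the multiplexing itself really is a single-threaded affair. Here is the idea with Node's built-in http2 module: one TCP connection, several streams in flight at once, no extra threads (the URL is a placeholder). On the Rust side, the h2 crate is the usual low-level building block for this pattern:

    const http2 = require('node:http2');

    // One session = one TCP connection; each request below becomes a
    // separate HTTP/2 stream multiplexed onto it, all on this one thread.
    const session = http2.connect('https://localhost:8443');

    let remaining = 3;
    for (const path of ['/a', '/b', '/c']) {
      const req = session.request({ ':method': 'GET', ':path': path });
      req.setEncoding('utf8');
      let body = '';
      req.on('data', (chunk) => { body += chunk; });
      req.on('end', () => {
        console.log(path, '->', body.length, 'bytes');
        if (--remaining === 0) session.close();
      });
      req.end();
    }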

Best approach to connect two nodejs REST API servers

The scenario: I have two Node applications providing REST APIs. Server_A has one set of REST endpoints, and Server_B has another.
We have a requirement where Server_A needs some data from Server_B. We could create REST endpoints for this, but I worry about performance: Server_A would open a new HTTP connection to Server_B for every request.
We could use WebSockets, but I am not sure whether that would be a good approach.
In all cases Server_A will be calling Server_B, and Server_B will return data immediately.
Server_B will be doing most of the database operations; Server_A does calculations only and will call Server_B for the data it needs.
In addition, there would be only one socket connection, between Server_A and Server_B; all other clients will connect via REST only.
Could anyone suggest whether this is the right approach, or do you have a better idea?
Code references and module suggestions would also be helpful.
Thanks
What you are asking about is premature optimization. You are attempting to optimize before you even know you have a problem.
HTTP connections are pretty darn fast. There are databases that work using an HTTP API and those databases are consulted on every HTTP request of the server. So, an HTTP API that is used frequently can work just fine.
What you need to do is to implement your server A using the regular HTTP requests to server B that are already supported. Then, test your system at load and see how it performs. Chances are pretty good that the real bottleneck won't have anything to do with the fact that you're using HTTP requests between server A and server B and if you want to improve the performance of your system, you will probably be working on different problems. This is why you don't want to do premature optimization.
The more moving parts in a system, the less likely you have any idea where the actual bottlenecks are when you put the system under load. That's why you have to test the system under load, instrument it like crazy so you can see where the performance is being impacted the most, and then measure like crazy. Then, and only then, will you know where it makes sense to invest your development resources to improve your scalability or performance.
FYI, a webSocket connection has some advantages over repeated HTTP connections (less connection overhead per request), but also some disadvantages (it's not request/response, so you have to invent your own way to match a response with a given request).
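If load testing ever does show the per-request TCP handshake mattering, HTTP keep-alive gives you most of the benefit of a persistent socket while staying plain request/response. A minimal sketch for server A's side (host and port are placeholders):

    const http = require('http');

    // Reuse TCP connections to server B instead of opening one per request.
    const agent = new http.Agent({ keepAlive: true, maxSockets: 10 });

    function getFromServerB(path, callback) {
      http.get({ host: 'server-b.internal', port: 4000, path, agent }, (res) => {
        let body = '';
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => callback(null, JSON.parse(body)));
      }).on('error', callback);
    }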

Node.js options to push updates to some microcontrollers that have an HTTP 1.1 stack

The title pretty well says it.
I need the microcontrollers to stay connected to the server to receive updates within a couple of seconds and I'm not quite sure how to do this.
The client in this case is very limited to say the least and it seems like all the solutions I've found for polling or something like socket.io require dropping some significant JavaScript to the client.
If I'm left having to reimplement one of those libraries in C on the micro I could definitely use some pointers on the leanest way to handle it.
I can't just pound the server with constant requests because this is going to increase to a fair number of connected micros.
Just use ordinary long polling: each controller initially makes an HTTP request and waits for a response, which happens when there's an update. Once the controller receives the response, it makes another request. Lather, rinse, repeat. This won't hammer the server because each controller makes only one request per update, and node's architecture is such that you can have lots of requests pending, since you aren't creating a new thread or process for each active connection.
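A bare-bones sketch of that long-poll endpoint in plain Node; the micro just issues an ordinary GET and blocks on the response, no client-side library required:

    const http = require('http');

    let waiting = [];  // parked responses for controllers currently polling

    http.createServer((req, res) => {
      if (req.url === '/poll') {
        waiting.push(res);              // hold the request open until an update
        req.on('close', () => {         // forget it if the micro disconnects
          waiting = waiting.filter((r) => r !== res);
        });
      }
    }).listen(8080);

    // Call this whenever there is a new update to push out.
    function publish(update) {
      const parked = waiting;
      waiting = [];
      for (const res of parked) {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(update));
      }
    }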

Node.js App with API endpoints which take 20sec+ :: Connections Left Open :: How to Optimize?

I have a Node.js RESTful API returning JSON data. One of the API calls can (and frequently does) take 10 - 20 seconds to finish. This long RTT is due to connecting to external APIs, like DiffBot, MailChimp, Facebook, Twitter, etc. I wish I could make the API call shorter, but I cannot.
Of course, I've implemented the node code in a nice async way, but the problem is that the client's inbound connection (to the node app) is alive while it waits for the server to finish, and thus might be killing my performance. In fact, I'm currently guessing that this may explain my long-running timeout issue in node.
I've already increased maxSockets to a huge number...
require('http').globalAgent.maxSockets = 9999;
For the sake of interest, I'm printing out the active sockets each time a new connection is made (here's the code).
Which gives me output like this:
SOCKETS: {} { 'graph.facebook.com:443': 5, 'api.instagram.com:443': 1 }
Nothing too enlightening there. The max connections I ever see is around 20 or so, total, across all hosts. But this doesn't really tell me anything about incoming connections, or how to optimize them so that my server does not choke when there are many of them alive at once (which I suspect it is).
You should optimize your architecture, not just the code.
First, I would change the way the client and server interact with each other. The server should end the request upon receipt and notify the client once all the tasks for that request are truly complete.
There are different ways to achieve that. For example, the client can poll the status of the request with an AJAX call every X seconds. Another option is to use a WebSocket.
If you're going with this approach, look into Socket.IO. It supports many transports with the same API; if WebSocket is available it will use that, otherwise it falls back to other transports such as Flash sockets and long polling.
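For the polling variant, the usual shape is: answer immediately with a job id, then let the client ask for status. A rough sketch, assuming Express and an in-memory job table (a real system would hand the work to the queue discussed below):

    const express = require('express');
    const crypto = require('crypto');
    const app = express();

    const jobs = {};  // jobId -> { done, result }

    // Stand-in for the DiffBot/MailChimp/Facebook/Twitter calls.
    function runExternalApiCalls(jobId) {
      setTimeout(() => { jobs[jobId] = { done: true, result: 'ok' }; }, 15000);
    }

    app.post('/api/tasks', (req, res) => {
      const jobId = crypto.randomUUID();
      jobs[jobId] = { done: false, result: null };
      runExternalApiCalls(jobId);          // kick off the 10-20s work
      res.status(202).json({ jobId });     // reply right away
    });

    app.get('/api/tasks/:id', (req, res) => {
      const job = jobs[req.params.id];
      if (!job) return res.sendStatus(404);
      res.json(job);                       // client polls until done: true
    });

    app.listen(3000);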
Second, you shouldn't use one process to do all this work. Use a queue (preferably a messaging system that supports queues), then run workers (separate processes) to do the heavy lifting.
Personally, I use AMQP due to its features and portability (it's an open standard), but feel free to use any other queue system with a persistent backend.
That way, if one or more process(es) crash(es) and you use the right queue, you wouldn't lose any data (such as the API tasks you mentioned).
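A minimal sketch of that hand-off, assuming the amqplib package and a local RabbitMQ; the queue name and message shape are made up:

    const amqp = require('amqplib');

    // Producer: the web process drops the task on a durable queue and returns.
    async function enqueue(task) {
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();
      await ch.assertQueue('api-tasks', { durable: true });
      ch.sendToQueue('api-tasks', Buffer.from(JSON.stringify(task)),
                     { persistent: true });
      await ch.close();
      await conn.close();
    }

    // Worker: a separate process does the heavy lifting and acks on success,
    // so a crash before the ack re-queues the task instead of losing it.
    async function work(handle) {
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();
      await ch.assertQueue('api-tasks', { durable: true });
      ch.consume('api-tasks', async (msg) => {
        const task = JSON.parse(msg.content.toString());
        await handle(task);   // the slow external API calls live here
        ch.ack(msg);
      });
    }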
Hope it helps.

Choose between TCP "long" connection and "short" connection for internal service

I have an app where a web server redirects some requests to backend servers, and the backend servers (Linux) do complicated computations and respond to the web server.
For managing the TCP socket connections between the web server and a backend server, I think there are two basic strategies:
"Short" connection: one connection per request. This seems very easy for socket management and simplifies the whole program structure. After accept() we just hand the request to a thread to process it and finally close the socket.
"Long" connection: one TCP connection carries multiple requests, one after another. This strategy could make better use of socket resources and bring some performance improvement (I am not quite sure). BUT it seems to bring a lot more complexity than "short" connections. For example, since a socket fd may now be used by multiple threads, synchronization is required; and there is more: handling socket failures, message sequencing...
Are there any suggestions for choosing between these two strategies?
UPDATE: @SargeATM's answer reminds me that I should say more about the backend service.
Each request is essentially context-free: the backend service can do its calculation based on a single request message. It is effectively stateless.
Without getting into the architecture of the backend which I think heavily influences this decision, I prefer short connections for stateless "quick" request/response type traffic and long connections for stateful protocols like a synchronization or file transfer.
I know there is some TCP overhead for establishing a new connection (if it isn't localhost), but that has never been anything I have had to optimize in my applications.
OK, I will get a little into architecture, since this is important. I always use threads not per request but per function. So I would have one thread that listens on the socket, another thread that reads packets off all the active connections, another doing the backend calculations, and a last thread saving to a database if needed. This keeps things clean and simple, and makes it easy to measure slow spots, maintain the code, and optimize later if needed.
What about a third option... no connection!
If your job descriptions and job results are both small, UDP sockets may be a good idea. You have even fewer resources to manage, as there's no need to bind the request/response to a file descriptor, which gives you some flexibility for the future. Imagine you have more backend services and would like to do some load balancing: a busy service can forward the job to another one along with the UDP address of the job submitter. The latter just waits for the result and doesn't care where the task was performed.
Obviously you'd have to deal with lost, duplicated, and out-of-order packets, but as a reward you don't have to deal with broken connections. Out-of-order packets are probably not a big deal if you can fit each request and response in a single UDP message, duplication can be taken care of with job ids, and lost packets... well, they can simply be resent ;-)
Consider this!
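A bare-bones version of that idea with Node's dgram module; the id field is what lets the submitter pair responses with requests and shrug off duplicates (compute and the backend address are placeholders):

    const dgram = require('dgram');

    // Backend: receive a job, compute, send the result back to the submitter.
    const compute = (job) => job.payload.length;  // stand-in for the real work
    const server = dgram.createSocket('udp4');
    server.on('message', (msg, rinfo) => {
      const job = JSON.parse(msg.toString());
      const result = { id: job.id, answer: compute(job) };
      server.send(JSON.stringify(result), rinfo.port, rinfo.address);
    });
    server.bind(5000);

    // Submitter: send the job and resend until the matching id comes back.
    const client = dgram.createSocket('udp4');
    const job = { id: 42, payload: 'numbers to crunch' };
    const timer = setInterval(() => {
      client.send(JSON.stringify(job), 5000, 'backend.internal');
    }, 500);
    client.on('message', (msg) => {
      const result = JSON.parse(msg.toString());
      if (result.id === job.id) {   // ignore duplicates and strays
        clearInterval(timer);
        console.log('answer:', result.answer);
        client.close();
      }
    });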
Well, you are right.
The biggest problem with persistent connections is making sure that the app gets a "clean" connection from the pool, without any leftover data from another request.
There are a lot of ways to deal with that problem, but in the end it is better to close() a tainted connection and open a new one than to try to clean it...
