I was doing some Node.js work and ran into a scenario in which I had to use POST requests. I noticed that Node deals with POST requests in a slightly different manner than GET requests: for POST requests we need to register two event listeners, on('data', ...) and on('end', ...), whereas for GET requests I found no such complication. All of this led me to believe that maybe GET requests are always guaranteed to be sent within one chunk of data from the client, whereas POST requests can be sent over multiple chunks. Am I correct, or is there a flaw in my understanding? Please correct me if so.
GET requests don't usually have a "body" as part of them, so once you've read the HTTP request headers, you have everything; there is no need for additional code to read more.
POST requests, on the other hand, usually do have a body, so once you've gotten the headers, you then need to read the body.
FYI, TCP is a streaming protocol, which means there are no guarantees about what chunks data will arrive in. Even the headers themselves could arrive in multiple packets. But the HTTP library you're using already takes care of that for you: it reads data until it has all the headers. Reading the body of a POST request is more up to you to do, unless you use some sort of body-parser middleware which will read the body for you.
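To make that concrete, here is a minimal sketch of the usual pattern with Node's built-in http module (the port and messages are arbitrary):

const http = require('http');

http.createServer((req, res) => {
  if (req.method === 'POST') {
    const chunks = [];
    // 'data' may fire once or many times; TCP decides the chunking.
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const body = Buffer.concat(chunks).toString();
      res.end('received ' + body.length + ' bytes');
    });
  } else {
    // For a GET, the parsed headers are all there is; no body to stream.
    res.end('hello');
  }
}).listen(3000);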
So I have been reading about flow-based programming (FBP) over the last few days and I have also been reading J. Paul Morrison's book about it. However, I feel I still can't really wrap my head around it. The general concept is that you see programming as some sort of assembly line where you have components that take some packet as input and produce some packets as output. You can connect these components, and packets travel through the network. While I totally see how this can work for ETL-type applications or batch processing, I have no good idea how you could handle things like synchronous request/response patterns or database transactions with it.
For example, let's say I have a web server implemented in FBP. This web server has a GET /user/{id} endpoint which should return a JSON document with some information about a user. It also has a POST /user/{id} endpoint where you can update the user by sending some JSON back to the server. So here is how I would imagine this flow to look:
I tried to have many reusable components instead of putting the whole logic of handling a request into a single component. So there is an HTTP server component which sends out requests to a dispatcher component, which then dispatches the requests into subsequent flows. In each flow the request is parsed by a generic "Request parser" component which outputs the various parts of the request into the rest of the flow.
The upper part is quite straightforward: I read the entity of the user with the given ID from the DB, serialize the object to JSON, and then send it back. However, at this point we don't really have a reference to the HTTP request anymore, so how would I know where to send this response?
On the lower part we have some additional complexity, because I would like to write to the database in a transactional way. So first a transaction is started (in parallel, the request body is parsed into some object), then the user object is retrieved from the database and merged with the inputs from the request. At the end it is written back to the database and the transaction is committed. Finally, some "OK" status is returned to the caller. Here I have the additional problem that when committing the transaction, I really don't know which transaction to commit. And of course, when sending the response, I don't know which request to send it to.
So both problems seem to have something in common: a kind of "context" that spans many components. In one example it is an HTTP request/response context, in the other a transactional context. In regular programming, these contexts are usually handled at the thread level. Since a request runs in a single thread, the transaction and request contexts are bound to a thread-local, so they can be accessed everywhere as long as everything is running in the same thread.
In flow-based programming, every component runs independently, ideally on separate threads. This is actually a key property, because it allows for parallelization and effective use of multiple processors. However, when that thread-local context is no longer there, how can you handle these problems in flow-based programming? This would get even more complicated with proper error handling (which I left out of my example).
I figure that when you do reactive-style programming, where most of the processing is asynchronous and multithreaded as well, you will have the same issues, so I wonder if there are patterns to handle this. Do you have real-life experience with either reactive-style programming or flow-based programming, and do you have hints on how I could solve this problem?
I wrote a quick answer on Twitter - thought I would post it here as well... Apologies for double-posting!
I like substreams for this/these problem(s), where the first Information Packet in the substream provides the "context" you were talking about. This may help: https://github.com/jpaulm/javafbp-websockets. HTH!
PS This loop-style network topology is also the basis of Facebook's new "Flux" technology - see Jing Chen's presentation, in which she compares this approach with MVC: https://www.youtube.com/watch?v=nYkdrAPrdcw
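To illustrate the substream idea in plain JavaScript (this is only a loose sketch, not a real FBP runtime, and all the component names are made up): each request opens a substream whose first information packet is a context object; downstream components pass the context through untouched, and the last one uses it to find the right response.

const http = require('http');

// A "component" is modeled here as a function from (context, packet) to packet.
const components = [
  (ctx, id) => ({ id, name: 'user-' + id }),   // "Read user from DB" (stubbed)
  (ctx, user) => JSON.stringify(user),         // "Serialize to JSON"
  (ctx, json) => {                             // "Send response"
    // The context packet still holds the original response object,
    // so this component knows where the result must go.
    ctx.res.writeHead(200, { 'Content-Type': 'application/json' });
    ctx.res.end(json);
  },
];

http.createServer((req, res) => {
  const ctx = { req, res };               // first IP of the substream
  const id = req.url.split('/').pop();    // GET /user/{id}
  components.reduce((packet, comp) => comp(ctx, packet), id);
}).listen(3000);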
Hopefully this may nudge you in the right direction. I had a similar issue where I needed to perform a synchronous operation in an asynchronous microservice architecture.
How I solved it was by using the Observer pattern. I have three components: an HTTP server, a callback server, and a timer wheel. The HTTP server, similar to yours, receives the incoming request; the callback server receives the overall result after asynchronous processing; and the timer wheel queues the original HTTP context and reconciles the response to the HTTP request.
When an incoming request is received, the HTTP server creates a correlation ID, appends it to the request metadata, appends the callback server URL to the request metadata, and finally adds the request and the original HTTP context together into the timer wheel. Then the HTTP server passes the request to the dispatcher, as in your case, and sends messages to the relevant components for asynchronous processing.
Depending on the outcome of the current processing component's execution, it will retrieve the callback URL from the metadata and send the response to the callback server. In your case it would be the JSON serialization or the database write that does this. The callback server then extracts the correlation ID that was appended, gets the corresponding HTTP context, and writes the response.
NB: each timer object in the timer wheel has a configurable timeout; that way, if the asynchronous processing is delayed, the timer fires and a configurable message is returned to the HTTP client of the corresponding HTTP context.
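A rough Node.js sketch of that correlate-and-time-out mechanism (the endpoint names, timeout value, and in-memory map are all assumptions for illustration):

const http = require('http');
const crypto = require('crypto');

const pending = new Map(); // correlation ID -> { res, timer }

http.createServer((req, res) => {
  if (req.url.startsWith('/callback/')) {
    // The "callback server": the asynchronous worker posts its result here.
    const id = req.url.split('/').pop();
    const entry = pending.get(id);
    if (entry) {
      clearTimeout(entry.timer);
      pending.delete(id);
      entry.res.end('done');   // answer the original HTTP request
    }
    res.end('ok');
    return;
  }

  // Original request: register the HTTP context, then hand off for
  // asynchronous processing (the dispatch itself is omitted here).
  const id = crypto.randomUUID();
  const timer = setTimeout(() => {   // a much-simplified "timer wheel"
    pending.delete(id);
    res.statusCode = 504;
    res.end('timed out');
  }, 5000);
  pending.set(id, { res, timer });
}).listen(3000);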
To respond to an HTTP request, we can just use return "content" in the handler method. But for some mission-critical use cases, I would like to make sure the HTTP 200 OK response was actually delivered. Any ideas?
The HTTP protocol doesn't work that way. If you need an acknowledgement, then you need the client to send the acknowledgement to you.
Alternatively, you could look at implementing a bidirectional socket (one sample library is socket.io) where the client can send the ACK. If it is mission-critical, then don't rely on plain HTTP; use WebSockets.
You can also use AJAX callbacks to gather the acknowledgment. One way of building such a solution would be a UUID generated for every request and returned as part of a header:
$ curl -v http://domain/url
....
response:
X-ACK-Token: 89080-3e432423-234234-23-42323
and then the client makes a call again:
$ curl http://domain/ack/89080-3e432423-234234-23-42323
So the server would know that the given response has been acknowledged by the client. But you cannot enforce an automatic ACK; it is still on the client to send it, and if they don't, you have no way of knowing.
PS: the token above is not an actual UUID, just a random number shared as an example.
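A hedged sketch of what the server side of this could look like in Node.js (the /ack route, header name, and in-memory set are just one possible shape of the idea):

const http = require('http');
const crypto = require('crypto');

const unacked = new Set(); // tokens handed out but not yet acknowledged

http.createServer((req, res) => {
  if (req.url.startsWith('/ack/')) {
    const token = req.url.slice('/ack/'.length);
    unacked.delete(token);            // the client confirmed delivery
    res.end('acknowledged');
    return;
  }
  const token = crypto.randomUUID();
  unacked.add(token);                 // stays here until the client ACKs
  res.setHeader('X-ACK-Token', token);
  res.end('content');
}).listen(3000);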
Take a look at Microsoft's asynchronous server socket.
An asynchronous server socket requires a method to begin accepting connection requests from the network, a callback method to handle those connection requests and begin receiving data, and a callback method to end receiving the data (this is where your client could respond with the success or failure of the HTTP request that was made).
Example
It is not possible with HTTP alone. If for some reason you can't use sockets because your implementation requires HTTP (like an API), you must agree on a timeout strategy with your client.
It depends on how many cases you want to handle, but for example you can state something like this:
1. The client generates an internal identifier and sends the HTTP request including that "ClientID" (like a timestamp or a random number), either in the headers or as a body parameter.
2. The server responds 200 OK (or an error, it does not matter).
3. The client waits 60 seconds for the server's answer (you define your maximum timeout).
   - If it receives the response, it handles it and finishes.
   - If it does NOT receive the answer, it tries again after the timeout, including the same "ClientID" generated in step 1.
4. The server detects that the "ClientID" was already received.
   - Either it returns 409 Conflict, informing the client that it "already exists", and the client should know how to handle it,
   - or it just returns 200 OK, and the client never knows that it was already received the first time.
Again, this depends a lot on your business/technical requirements, because you could even get two or more consecutive timeout loops to handle.
Hope this gives you an idea.
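To make step 4 concrete, here is a rough Node.js sketch of the server-side deduplication; the header name, status codes, and in-memory set are assumptions, not a fixed spec:

const http = require('http');

const seen = new Set(); // ClientIDs that were already processed

http.createServer((req, res) => {
  const clientId = req.headers['x-client-id'];
  if (!clientId) {
    res.statusCode = 400;
    return res.end('missing X-Client-ID');
  }
  if (seen.has(clientId)) {
    // Step 4, variant A: report the duplicate. (Variant B would be to
    // simply return 200 OK again, making the retry indistinguishable.)
    res.statusCode = 409;
    return res.end('already exists');
  }
  seen.add(clientId);
  res.end('ok'); // first time we see this ClientID: process normally
}).listen(3000);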
As #tarun-lalwani already wrote, the HTTP protocol is not designed for that. What you can do is let the app create a file, and have your program check the existence and timestamp of that remote file after the 200 response. This has the implication that every 200 response requires another request to check the file.
For example, I've got 2 threads, each sending a request to the same serial port. Will their responses follow the same order as the requests, or might the response to the latter request come first?
The serial reading process has no way to know which request the response it's receiving belongs to, so I want to be sure of the response order in order to handle the reading process correctly.
Thanks.
I am making a certain client -> server application using CherryPy as the web server.
I will need to create a request with a large Content-Length header while sending only about 80% of that content, but I don't want CherryPy to read the POST data based on the Content-Length I sent; I want to read it manually and write it to another file. However, it seems CherryPy times out waiting for the whole Content-Length to arrive.
In other words, I want to read the incoming POST stream manually but still allow CherryPy to process the request headers (and not the body).
UPDATE: I think I can do this with a custom processor: http://docs.cherrypy.org/stable/refman/_cpreqbody.html, but I still don't understand how I can write a processor and call it in my application.
You could try doing it with rfile, but see the warning. You should really look for a solution that doesn't break the standards. Perhaps use WebSocket.
In a Node.js server that accepts HTTPS POST requests that are typically pretty large (a few MB), we want to be able to start processing a request before the entire thing has been accepted by the server.
For example, if a request with a big fat body arrives, we want to look at its path and, based on that, decide whether to terminate/reject it, without having to wait for the entire request to arrive (and pay the IO cost of receiving that fat body).
You could try the Connect limit middleware:
https://github.com/senchalabs/connect/blob/master/lib/middleware/limit.js
or implement your own solution in a similar way by checking req.headers['content-length'], etc.
Based on experimentation, it seems that Node.js only fires the request event after parsing the HTTP headers, meaning there's a chance to examine the headers before we even start listening for the data event.
Thus the solution seems to be to check the headers before reading any data and potentially reject the request at that point. If we don't reject it then, we start accumulating the data buffers as they arrive, and if they exceed the limit (and thus conflict with the reported content length) we have another chance to reject the request right there by calling response.end().
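A sketch of that approach (using plain http instead of https for brevity; the rejected path and the 1 MB limit are arbitrary examples):

const http = require('http');

const LIMIT = 1024 * 1024; // arbitrary 1 MB cap for this example

http.createServer((req, res) => {
  // This callback runs once the headers are parsed, before any body
  // bytes have been consumed, so we can reject cheaply right here.
  if (req.url.startsWith('/rejected-path') ||
      Number(req.headers['content-length'] || 0) > LIMIT) {
    res.statusCode = 413;
    res.end('rejected');
    req.socket.destroy(); // stop paying the IO cost of the fat body
    return;
  }
  let received = 0;
  req.on('data', (chunk) => {
    received += chunk.length;
    if (received > LIMIT) { // body grew past the limit we allowed
      res.statusCode = 413;
      res.end('too large');
      req.socket.destroy();
    }
  });
  req.on('end', () => res.end('accepted'));
}).listen(3000);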