Performance difference between res.json() and res.end() - node.js

I want to send a JSON response using Node and Express. I'm trying to compare the performance of res.end and res.json for this purpose.
Version 1: res.json
res.json(anObject);
Version 2: res.end
res.setHeader('Content-Type', 'application/json');
res.end(JSON.stringify(anObject));
Running some benchmarks I can see that the second version is almost 15% faster than the first one. Is there a particular reason I have to use res.json if I want to send a JSON response?

Yes, it is very desirable to use json despite the overhead.
setHeader and end come from the native http module. By using them, you're effectively bypassing a lot of Express's added features, hence the moderate speed bump in your benchmark.
However, benchmarks in isolation don't tell the whole story. json is really just a convenience method that sets the Content-Type and then calls send. send is an extremely useful function because it:
Supports HEAD requests
Sets the appropriate Content-Length header to ensure that the response does not use Transfer-Encoding: chunked, which wastes bandwidth.
Most importantly, provides ETag support automatically, allowing conditional GETs.
The last point is the biggest benefit of json and probably the biggest part of the 15% difference. Express calculates a CRC32 checksum of the JSON string and adds it as the ETag header. This allows a browser making subsequent requests for the same resource to issue a conditional GET (the If-None-Match header), and your server will respond 304 Not Modified if the JSON string is the same, meaning the actual JSON need not be sent over the network again.
This can add up to substantial bandwidth (and thus time) savings. Because the network is a much larger bottleneck than CPU, these savings are almost sure to eclipse the relatively small CPU savings you'd get from skipping json().
Finally, there's also the issue of bugs. Your "version 2" example has a bug.
JSON is stringified as UTF-8, and Chrome (contrary to spec) does not default to handling application/json responses as UTF-8; you need to supply a charset. This means non-ASCII characters will be mangled in Chrome. This issue has already been discovered by Express users, and Express sets the proper header for you.
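A corrected "version 2" would declare the charset explicitly, matching what Express sends for you. The stand-in res object below is hypothetical, just enough to show what ends up on the wire; in a real handler, res is the Express response.

```javascript
// Corrected "version 2": state the charset, as Express does automatically.
function sendJson(res, anObject) {
  res.setHeader('Content-Type', 'application/json; charset=utf-8');
  res.end(JSON.stringify(anObject));
}

// Minimal stand-in response object (illustrative, not a real ServerResponse).
const sent = { headers: {}, body: null };
const res = {
  setHeader(name, value) { sent.headers[name] = value; },
  end(body) { sent.body = body; },
};

sendJson(res, { greeting: 'héllo' }); // non-ASCII survives intact
```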
This is one of the many reasons to be careful of premature/micro-optimization. You run the very real risk of introducing bugs.

Related

How to ignore big cookies in varnish

I want to ignore requests with a big cookie size. We have some requests dropped in Varnish due to "BogoHeader Header too long: Cookie: xyz". How can it be done in VCL? I didn't find any len, length or strlen function in VCL; I know that it can be done in the vcl_recv phase.
A strlen() feature won't help fix your problem. Varnish discards the request due to the large Cookie header before vcl_recv is executed. If you don't want those requests to be discarded, you need to check and adjust some runtime parameters: http_req_hdr_len, http_req_size, http_resp_hdr_len, etc.
In any case, if you are still interested in a strlen() feature, it would be trivial to add it to the std VMOD, but that support doesn't exist at the moment. You could consider using an existing VMOD that includes utilities like strlen() (or implement one on your own), but that's probably too much work. Finally, you could fall back on a hacky approach using just VCL and a regexp:
if (req.http.Cookie ~ "^.{1024,}$") {
...
}

Varnish return(fetch/deliver) vs chunked encoding

I'm trying to get varnish cache response to be chunked... (that's possible, right?)
I have the following scenario:
1 - cache is clean, good to go (service varnish restart)
2 - access the www.mywebsite.com/page for the first time
(no content-length is returned, and chunking is there, great!)
3 - the next time I access the page (like simple reloading) it will be cached.. and now I get this:
(now we have content-length... which means no chunking :( not great!)
After reading some Varnish docs/blogs (and this: http://book.varnish-software.com/4.0/chapters/VCL_Basics.html), looks like there are two "last" returns: return(fetch) or return(deliver).
When forcing a return(fetch), the chunked encoding works... but it also means that the request won't be cached, right? While return(deliver) caches correctly but adds the content-length header.
I've tried adding these to my default.vcl file:
set beresp.do_esi = true; (at vcl_backend_response stage)
and
unset beresp.http.content-length; (at different stages, without success)
So.. how to have Varnish caching working with Transfer-Encoding: chunked?
Thanks for your attention!
Is there a reason why you want to send it chunked? Chunked transfer encoding is kind of a clumsy workaround for when the content length isn't known ahead of time. What's actually happening here is Varnish is able to compute the length of the gzipped content after caching it for the first time, and so doesn't have to use the workaround! Rest assured that you are not missing out on any performance gains in this scenario.
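The framing cost Varnish avoids once it knows the length can be quantified with a small sketch. The chunk size below is arbitrary (real servers pick their own); the point is that every chunk carries a hex length line plus CRLF delimiters that a plain Content-Length response never needs.

```javascript
// Sketch: bytes HTTP/1.1 chunked framing adds versus a known Content-Length.
function chunkedFrame(body, chunkSize) {
  let wire = '';
  for (let i = 0; i < body.length; i += chunkSize) {
    const chunk = body.slice(i, i + chunkSize);
    // Each chunk: hex size, CRLF, data, CRLF.
    wire += chunk.length.toString(16) + '\r\n' + chunk + '\r\n';
  }
  return wire + '0\r\n\r\n'; // terminating zero-length chunk
}

const body = 'x'.repeat(1000);
const chunked = chunkedFrame(body, 100);
const overhead = chunked.length - body.length; // pure framing bytes
```

With Content-Length, the body goes out as-is; chunked framing here costs 65 extra bytes for a 1 KB body, and the receiver also cannot know the total size up front.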

Modify Header server: ArangoDB

Something that seems easy, but I can't find the way to do it. Is it possible to change this header sent in a response
server: ArangoDB
by something else (in order to be less verbose and more secure) ?
Also, I need to store a large string (a very long URL + a lot of information) in a document, but what is the max length of a joi.string?
Thx,
The internal string limit in V8 (the JavaScript engine used by ArangoDB) is around 256 MB in the V8 version that ArangoDB ships. Thus 256 MB will be the absolute maximum string length that can be used from JavaScript code executed in ArangoDB.
Regarding maximum URL lengths as mentioned above: URLs should not get too long, because very long URLs may not be portable across browsers. In practice several browsers enforce URL length limits of around 64 K, so URLs should definitely not get longer than that. I would recommend using much shorter URLs and passing huge payloads in the HTTP request body instead. This also means you may need to change from HTTP GET to HTTP POST or HTTP PUT, but it's at least portable.
Finally regarding the HTTP response header "Server: ArangoDB" that is sent by ArangoDB in every HTTP response: starting with ArangoDB 2.8, there is an option to turn this off: --server.hide-product-header true. This option is not available in the stable 2.7 branch yet.
No, there currently is no configuration to disable the server: header in ArangoDB.
I would recommend prepending an NGiNX or similar HTTP-Proxy to achieve that (and other possible hardening for your service).
The implementation of server header can be found in lib/Rest/HttpResponse.cpp.
Regarding Joi -
I only found how to specify a string length in joi - not what its maximum could be.
I guess the general javascript limit for strings should be taken into account.
However, it rather seems that you shouldn't exceed a limit of around 2000 characters for URLs, which should therefore be the practical limit.

Heartbeat extension: does it make sense to allow for arbitrary payload?

https://www.rfc-editor.org/rfc/rfc6520 does not explain why a heartbeat request/response round-trip is supposed to contain a payload. It just specifies that there is room for payload and that the response has to contain the same payload as the request.
What is this payload good for? My questions are:
What could it be that the engineers thought when they designed the protocol to allow for including arbitrary payload into the heartbeat request? What are the advantages?
What are the reasons that this payload must be contained in the response?
I see that by allowing for arbitrary payload the application is able to unambiguously match a certain response with a certain request. Is that the only advantage? If yes, then why did one not force the payload to be of a certain length? What is the flexibility in the payload length good for? Does it have to do with a cryptographic concept, such that the length of heartbeat requests must be unpredictable?
Other "heartbeat"-like protocol extensions simply pre-define the exact request (e.g. "ping") and the corresponding response (e.g. "pong"). Why did https://www.rfc-editor.org/rfc/rfc6520 take a different route?
It is important to understand the reasoning behind the choices made in RFC6520 in order to properly assess hypotheses that all this might have been an intelligently placed backdoor.
Regarding the arbitrary size: the RFC abstract states that the Heartbeat extension is a basis for path MTU (PMTU) discovery for DTLS. Varying the size is a basis for implementing that protocol (http://en.wikipedia.org/wiki/Path_MTU_Discovery).
Regarding the arbitrary content: packet delivery order may not be preserved, and packets may be lost. Varying the content lets the sender match each response to the request it answers.
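Both points can be illustrated with a sketch. The message shapes below are simplified stand-ins for the RFC 6520 structures (field names and the payload scheme are made up for illustration): the echoed payload acts as a correlation token over an unordered, lossy transport, and the padding length doubles as the PMTU probe size.

```javascript
// Sketch: arbitrary echoed payload as a correlation token for heartbeats.
let nextId = 0;
const outstanding = new Map(); // payload -> metadata about the probe

function sendHeartbeatRequest(probeSize) {
  const payload = 'hb-' + (nextId++); // unique, sender-chosen content
  outstanding.set(payload, { probeSize });
  // Padding varies the total message size, probing the path MTU.
  return { type: 'heartbeat_request', payload, padding: '.'.repeat(probeSize) };
}

// RFC 6520 requires the peer to echo the payload back verbatim.
function makeHeartbeatResponse(request) {
  return { type: 'heartbeat_response', payload: request.payload };
}

// Match a response to its request even if responses arrive out of order.
function onHeartbeatResponse(response) {
  const probe = outstanding.get(response.payload);
  if (!probe) return null; // stale, duplicate, or unsolicited: ignore
  outstanding.delete(response.payload);
  return probe;
}

const req1 = sendHeartbeatRequest(512);
const req2 = sendHeartbeatRequest(1024);
const probe = onHeartbeatResponse(makeHeartbeatResponse(req2)); // out of order
```

A fixed "ping"/"pong" exchange could not distinguish which of several in-flight probes a given "pong" answers, nor vary the datagram size for MTU discovery.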

How long should a message header/prefix be?

I've worked with a few protocols, and written my own. I have written some message formats with only 1 char to identify the message, and some with 4 chars. I don't feel that I'm experienced enough to tell which is better, so I'm looking for an answer which describes in which scenario one might be better than the other.
For performance, you would imagine that sending 2 bytes (A%1i) is faster than sending 5 bytes (ABCD%1i). However, I have noticed that when writing the protocol with the 1 byte prefix, if you have a bug which causes your code to not read enough data from the socket, you might get garbage data coming into your system.
So is the purpose of a 4 byte prefix just to provide a guarantee that your message is clean? Is it worth it for the performance you sacrifice? Do you really sacrifice any performance at all? Maybe it's better to have a 2 or 3 byte prefix?
I'm not sure if this question should be specific to TCP, or whether it applies to all transport protocols. Advice on this would be interesting.
Update: For interest, I will mention that Synergy uses 4-byte message prefixes, so for a mouse move delta the header is the same size as the actual data. Some have suggested just having a 1 or 2 byte prefix to improve efficiency. I wonder what drawbacks this would have?
Update: Also, I wonder if only the handshake really matters, if you're worried about garbage data. Synergy has a long handshake (a few bytes), so are the 4-byte message prefixes needed? I made a protocol recently that has only a 1 byte handshake, and that turned out to be a bad idea, since incompatible protocols were spamming the system with bad data (off the back of this, I might recommend at least having a long handshake).
The purpose of the header is to make it easier to solve the frame synchronization problem (byte alignment in serial communication).
To synchronize, the receiver looks for anything in the data stream that "looks like" a start-of-message header.
If you have lots of different kinds of valid start-of-message headers, and all of them are 1 byte long, then you will inevitably get a lot of "false frame synchronizations" -- garbage from something that "looks like" a start-of-message header, but isn't.
It would be better to pick some other header that makes it "unlikely" that anything in the serial data stream "looks like" a valid start-of-message header.
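A minimal resync sketch shows why longer markers false-match less often. The 4-byte magic value and the frame layout (magic, 1-byte length, payload) are made up for illustration; note the garbage below even contains a partial "SY", which a 1-byte marker would have matched spuriously.

```javascript
// Sketch: recovering frame sync by scanning for a multi-byte magic marker.
const MAGIC = Buffer.from('SYNC'); // illustrative start-of-message marker

// Offset of the next plausible frame start at or after `from`, or -1.
function findFrameStart(stream, from) {
  return stream.indexOf(MAGIC, from);
}

// Build a stream: 3 bytes of garbage (including a partial 'SY'), then a frame.
const payload = Buffer.from('hello');
const frame = Buffer.concat([MAGIC, Buffer.from([payload.length]), payload]);
const stream = Buffer.concat([Buffer.from([0x00, 0x53, 0x59]), frame]);

const start = findFrameStart(stream, 0);   // skips past the garbage
const len = stream[start + MAGIC.length];  // 1-byte length field
const body = stream
  .slice(start + MAGIC.length + 1, start + MAGIC.length + 1 + len)
  .toString();
```

With a 1-byte marker ('S' alone), the scanner would lock on at offset 1 and read garbage; the 4-byte marker makes an accidental match far less likely, though (as noted below) never impossible.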
It is inevitable that you will get garbage data coming into your system, no matter how you design the packet header.
Whatever you use to handle these other problems (such as occasional bit errors in the middle of the message) should also be adequate to handle the occasional "false frame synchronization" garbage.
In some systems, any bad data is quickly overwritten by fresh new good data, and if you blink you might never see the bad data.
Other systems need at least some sort of error detection in the footer to reject the bad data.
Yet other systems need to not only detect such errors, but somehow keep re-sending that message -- until both sides are convinced that an error-free version of that message has been successfully received.
As Oleksi implied, in some systems the latency is not significantly different between sending a single binary bit (100 ms) and sending 10 bytes (102.4 ms).
So the advantages of using a tiny header (2.4% less latency!) may not be worth it compared to the advantages of using a more verbose header (easier debugging; easier to make backward-compatible and forward-compatible; easier to test the effect of minor changes "in isolation" without upgrading both sides in lockstep to the new protocol which is completely incompatible with the old protocol).
Perhaps you could get the best of both worlds by (a) keeping the verbose, easy-to-debug headers on messages that are so rarely used that the effect of tiny headers is too small to measure (which I suspect is nearly all messages), and (b) introducing a "tiny header" format for any kind of message where the effect of tiny headers is "noticeably better" or at least measurable.
It looks like the Synergy protocol is flexible enough to add such a "tiny header" format in a way that is easily distinguishable from the other kinds of message headers.
I use Synergy between my laptop and a few desktop machines. I am glad someone is trying to make it even better.
The performance will depend on the content of the message you are sending. If your content is several kilobytes, it doesn't really matter how many bytes your header is. For now, I would choose the scheme that's easiest to work with, because the performance difference between sending one byte, or four bytes is going to be negligible compared to the actual data that you're sending.

Resources