Is compressing an HTTP response useless when the Content-Length is lower than 1K? - http-compression

I wrote a web service that responds with JSON content smaller than 1K. Which of these compression strategies is best?
Gzip the content at the reverse proxy, like any other text resource?
Add a rule to skip compression for resources under a size threshold?
I think that packet sizes on the internet are greater than 1K (this article is pretty interesting, but it raises more questions than it answers: 579 bytes? 1518 bytes?). So it would make sense to avoid spending time and CPU compressing content that will already fit in one packet.
So I am mostly looking for somebody's testing of these two strategies. Has anybody run any tests?
I am also interested in any rule you have written for this.
Thanks

I downloaded a copy of this page (that is, the source code containing the HTML for this question), and kept only the first 993 characters.
That is, the original size is 993 characters.
Compressing that file using gzip compression results in a file of 595 bytes.
This means that the new file is almost 60% of the original!
Conclusion: Yes, it is easily worth compressing ~1KB of (textual) data.
Approximately halving the original size to 515 characters results in a compressed file of 397 bytes; the new file is about 77% of the original, which is not as good, but still an advantage.
Approximately halving the file again to 223 characters results in a compressed file of 277 bytes, and the compressed file is now larger, so for very small packet sizes, gzip compression isn't useful, although it's still possible to achieve compression. (But not with a naive use of gzip).
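If you want to repeat this kind of measurement for your own JSON responses, a minimal Node/TypeScript sketch (the payload below is invented) is to compare the raw and gzipped byte counts directly:

import { gzipSync } from "node:zlib";

// Compare raw vs gzipped sizes for a small JSON body (sample data is made up).
const body = JSON.stringify({ id: 42, name: "example", tags: ["news", "tech"] });
const raw = Buffer.byteLength(body);
const gz = gzipSync(Buffer.from(body)).length;
console.log(`raw: ${raw} bytes, gzipped: ${gz} bytes (${Math.round((gz / raw) * 100)}% of original)`);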
To give you an idea of how tiny ~500 bytes is, consider google.com's response (including HTTP headers):
HTTP/1.0 302 Found
Location: http://www.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
X-Content-Type-Options: nosniff
Date: Wed, 16 Mar 2011 11:27:29 GMT
Server: sffe
Content-Length: 219
X-XSS-Protection: 1; mode=block
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
That is already 465 bytes including the headers! (But the HTTP headers are not normally compressed, only the content, which here is 219 bytes.)
Compressing that content results in 266 bytes (excluding the headers), so it is a small increase and not worth worrying about.

While it may not help, compressing a small packet probably doesn't hurt either. In addition, high-concurrency systems that make use of keep-alives may still benefit, because they can potentially buffer up multiple responses in a single packet, and compression will squeeze more responses into each packet.
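For the asker's second strategy, here is a rough sketch of a "don't compress below a threshold" rule using only Node's built-in http and zlib modules (the 1024-byte cutoff and the sample payload are assumptions, not recommendations; reverse proxies such as nginx expose the same idea through configuration, e.g. the gzip_min_length directive):

import { createServer } from "node:http";
import { gzipSync } from "node:zlib";

// Only gzip the response when the client accepts it and the body is large enough.
const THRESHOLD = 1024; // bytes; an assumed cutoff, tune it for your own payloads

createServer((req, res) => {
  const body = Buffer.from(JSON.stringify({ message: "hello", items: [1, 2, 3] }));
  const acceptsGzip = /\bgzip\b/.test(req.headers["accept-encoding"] ?? "");

  res.setHeader("Content-Type", "application/json; charset=utf-8");
  if (acceptsGzip && body.length >= THRESHOLD) {
    const compressed = gzipSync(body);
    res.setHeader("Content-Encoding", "gzip");
    res.setHeader("Content-Length", compressed.length);
    res.end(compressed);
  } else {
    res.setHeader("Content-Length", body.length);
    res.end(body);
  }
}).listen(8080);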

Related

Varnish return(fetch/deliver) vs chunked encoding

I'm trying to get Varnish's cached response to be chunked... (that's possible, right?)
I have the following scenario:
1 - cache is clean, good to go (service varnish restart)
2 - access the www.mywebsite.com/page for the first time
(no content-length is returned, and chunking is there, great!)
3 - the next time I access the page (e.g., a simple reload) it will be served from the cache... and now I get this:
(now we have content-length... which means no chunking :( not great!)
After reading some Varnish docs/blogs (and this: http://book.varnish-software.com/4.0/chapters/VCL_Basics.html), it looks like there are two "last" returns: return(fetch) or return(deliver).
When forcing a return(fetch), the chunked encoding works... but it also means that the request won't be cached, right? While return(deliver) caches correctly but adds the content-length header.
I've tried adding these to my default.vcl file:
set beresp.do_esi = true; (at vcl_backend_response stage)
and
unset beresp.http.content-length; (at different stages, without success)
So... how can I get Varnish caching to work with Transfer-Encoding: chunked?
Thanks for your attention!
Is there a reason why you want to send it chunked? Chunked transfer encoding is kind of a clumsy workaround for when the content length isn't known ahead of time. What's actually happening here is that Varnish is able to compute the length of the gzipped content after caching it for the first time, and so doesn't have to use the workaround! Rest assured that you are not missing out on any performance gains in this scenario.
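To see the plain HTTP behaviour this answer is describing, independent of Varnish, here is a small Node/TypeScript sketch (the port and paths are arbitrary): when the body length is known up front the server sends Content-Length, and when it is not, it falls back to Transfer-Encoding: chunked.

import { createServer } from "node:http";

createServer((req, res) => {
  if (req.url === "/known") {
    const body = "hello";
    res.setHeader("Content-Length", Buffer.byteLength(body)); // length known: no chunking needed
    res.end(body);
  } else {
    // No Content-Length and the body is written in pieces, so the
    // server automatically falls back to Transfer-Encoding: chunked.
    res.write("hel");
    res.write("lo");
    res.end();
  }
}).listen(8080);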

gzip and pipe to output (performance consideration)

Q1) Can I check: if I do
gzip -c file | encrypt (some parameters)
a) does gzip write its output line by line and pipe it to the encrypt command, or
b) does gzip run to completion first, and then the output is piped all at once to the encrypt command?
====================================================
Q2) Will performing gzip | encrypt have any performance advantage over running gzip first and then encrypt?
Regards,
Noob
Gzip is a streaming compressor/decompressor. So (for large enough inputs) the compressor/decompressor starts writing output before it has seen the whole input.
That's one of the reasons gzip compression is used for HTTP compression. The sender can compress while it's still generating content; the recipient can work on decompressing the first part of the content, while still receiving the rest.
Gzip does not work "line-by-line", because it doesn't know what a line is. But it does work "chunk-by-chunk", where the compressor defines the size of the chunk.
"Performance" is too vague a word, and too complex an area to give a yes or no answer.
With gzip -c file | encrypt, for a large enough file, you will see encrypt and gzip working concurrently. That is, encrypt will be encrypting the first compressed block while gzip is still compressing the last chunks of the file.
The size of a pipe buffer is implementation-dependent. Under SunOS, it's 4 kB. That is, gunzip < file.gz | encrypt will move data in 4 kB chunks. Again, it depends on the OS; Cygwin might behave completely differently.
I should add that this is documented in man 7 pipe; search for PIPE_BUF.
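The same streaming behaviour can be sketched with Node/TypeScript streams (the file names are placeholders, and the output file stands in for the encrypt stage of the pipeline): the gzip transform emits compressed chunks as the input arrives, so the downstream consumer starts working before the whole input has been read.

import { createReadStream, createWriteStream } from "node:fs";
import { createGzip } from "node:zlib";

createReadStream("file")               // reads the input chunk by chunk
  .pipe(createGzip())                  // compresses each chunk as it arrives
  .pipe(createWriteStream("file.gz")); // stands in for the `encrypt` stage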

Performance difference between res.json() and res.end()

I want to send a JSON response using Node and Express. I'm trying to compare the performance of res.end and res.json for this purpose.
Version 1: res.json
res.json(anObject);
Version 2: res.end
res.setHeader('Content-Type', 'application/json');
res.end(JSON.stringify(anObject));
Running some benchmarks I can see that the second version is almost 15% faster than the first one. Is there a particular reason I have to use res.json if I want to send a JSON response?
Yes, it is very desirable to use json despite the overhead.
setHeader and end come from the native http module. By using them, you're effectively bypassing a lot of Express's added features, hence the moderate speed bump in your benchmark.
However, benchmarks in isolation don't tell the whole story. json is really just a convenience method that sets the Content-Type and then calls send. send is an extremely useful function because it:
Supports HEAD requests
Sets the appropriate Content-Length header to ensure that the response does not use Transfer-Encoding: chunked, which wastes bandwidth.
Most importantly, provides ETag support automatically, allowing conditional GETs.
The last point is the biggest benefit of json and probably the biggest part of the 15% difference. Express calculates a CRC32 checksum of the JSON string and adds it as the ETag header. This allows a browser making subsequent requests for the same resource to issue a conditional GET (the If-None-Match header), and your server will respond 304 Not Modified if the JSON string is the same, meaning the actual JSON need not be sent over the network again.
This can add up to substantial bandwidth (and thus time) savings. Because the network is a much larger bottleneck than CPU, these savings are almost sure to eclipse the relatively small CPU savings you'd get from skipping json().
Finally, there's also the issue of bugs. Your "version 2" example has a bug.
JSON is stringified as UTF-8, and Chrome (contrary to spec) does not default to handling application/json responses as UTF-8; you need to supply a charset. This means non-ASCII characters will be mangled in Chrome. This issue has already been discovered by Express users, and Express sets the proper header for you.
This is one of the many reasons to be careful of premature/micro-optimization. You run the very real risk of introducing bugs.
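As an illustration only, here is roughly what a corrected "version 2" looks like next to res.json, assuming Express 4.x (the route paths and anObject are invented, and the manual version still lacks send's ETag support):

import express from "express";

const app = express();
const anObject = { greeting: "héllo" }; // non-ASCII value to show why the charset matters

// Manual version: you set the charset and Content-Length yourself, but get no ETag.
app.get("/manual", (req, res) => {
  const body = JSON.stringify(anObject);
  res.setHeader("Content-Type", "application/json; charset=utf-8");
  res.setHeader("Content-Length", Buffer.byteLength(body));
  res.end(body);
});

// Convenience method: also sets an ETag, enabling conditional GETs and 304 responses.
app.get("/json", (req, res) => {
  res.json(anObject);
});

app.listen(3000);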

How to modify a gzip compressed file

I have a single gzip-compressed file (100 GB uncompressed, 40 GB compressed). Now I would like to modify some bytes / ranges of bytes; I do NOT want to change the file's size.
For example
Bytes 8 - 10 and bytes 5000 - 40000
Is this possible without recompressing the whole file?
Stefan
Whether you want to change the file size makes no difference (since the resulting archive isn't laid out according to the original file size anyway), but if you split the original file into parts so that the parts you want to modify are isolated, and use a multiple-file compression method (such as zip) instead of the single-file gzip method, you could update just the changed parts without decompressing and recompressing the entire file.
In your example:
bytes1-7.bin \
bytes8-10.bin \ bytes.zip
bytes11-4999.bin /
bytes5000-40000.bin /
Then you could update bytes8-10.bin and bytes5000-40000.bin but not the other two. But whether this will take less time is dubious.
In a word, no. It would be necessary to replace one or more deflate blocks with new blocks with exactly the same total number of bits, but with different contents. If the new data is less compressible with deflate, this becomes impossible. Even if it is more compressible, it would require a lot of bit twiddling by hand to try to get the bits to match. And it still might not be possible.
The man page for gzip says "If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip." I believe that means that gzip compression continues through the files, therefore is context-sensitive, and therefore will not permit what you want.
Either decompress/patch/recompress, or switch to a different representation of your data (perhaps an uncompressed tar or zip of individually compressed files, so you only have to decompress/recompress the one you want to change.) The latter will not store your data as compactly, in general, but that's the tradeoff you have to make.
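A minimal Node/TypeScript sketch of the "individually compressed parts" idea both answers point at (the chunk size and helper names are assumptions, not a drop-in solution): split the data into fixed-size chunks, gzip each chunk separately, and re-gzip only the chunk a patch touches.

import { gzipSync, gunzipSync } from "node:zlib";

const CHUNK_SIZE = 64 * 1024; // assumed chunk size

// Compress each chunk of the original data as its own gzip member.
function compressChunks(data: Buffer): Buffer[] {
  const members: Buffer[] = [];
  for (let off = 0; off < data.length; off += CHUNK_SIZE) {
    members.push(gzipSync(data.subarray(off, off + CHUNK_SIZE)));
  }
  return members;
}

// Patch one chunk: only this member is decompressed and recompressed.
function patchChunk(members: Buffer[], index: number, patch: (plain: Buffer) => Buffer): void {
  const plain = gunzipSync(members[index]);
  members[index] = gzipSync(patch(plain));
}

// Example: flip one byte inside the second chunk.
const members = compressChunks(Buffer.alloc(256 * 1024, 0x41));
patchChunk(members, 1, (plain) => { plain[10] ^= 0xff; return plain; });

Note that the size of the recompressed member can still change when the new data compresses differently, which is exactly the limitation the other answer describes.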

How to corrupt the header of a tar.gz for testing purposes

How can I corrupt the header of a tar.gz for testing purposes, so that when the application tries to extract it, it fails?
Thanks
It's awfully simple to create a file that gzip won't recognize:
dd if=/dev/urandom bs=1024 count=1 of=bad.tar.gz
While of course it's possible to create a valid gzip file with /dev/urandom, it's about as likely as being struck by lightning. Under a clear sky.
Get yourself a hex editor; that previous question recommends Bless.
You can try arbitrarily changing bits, but if you want to be more surgical, take a look at the gzip spec, which can tell you exactly which bits to flip in the outer gzip header. Or try the tar specification.
There are checksums embedded in gzip files; those may be a good first choice to change:
If FHCRC is set, a CRC16 for the gzip header is present, immediately
before the compressed data. The CRC16 consists of the two least
significant bytes of the CRC32 for all bytes of the gzip header up to
and not including the CRC16
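If you'd rather corrupt a copy of a real archive than generate random bytes, a small Node/TypeScript sketch (the file names are placeholders) is to clobber the two gzip magic bytes, 0x1f 0x8b, so any gzip-aware tool rejects the file immediately:

import { readFileSync, writeFileSync } from "node:fs";

const data = readFileSync("good.tar.gz"); // path is a placeholder
data[0] ^= 0xff; // corrupt the first gzip magic byte (0x1f)
data[1] ^= 0xff; // corrupt the second gzip magic byte (0x8b)
writeFileSync("bad.tar.gz", data);

Flipping bits further into the member instead (for example in the stored CRC) gives you a file that is still recognized as gzip but fails its integrity check during extraction, which may be closer to what you want to test.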
