gzip and deflate in .htaccess [duplicate]

Can someone tell me the difference in the following scripts in terms of CPU load performance and compression?
<ifModule mod_gzip.c>
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime ^application/x-javascript.*
mod_gzip_item_exclude mime ^image/.*
mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</ifModule>
<ifModule mod_deflate.c>
<filesMatch "\.(js|css)$">
SetOutputFilter DEFLATE
</filesMatch>
</ifModule>

Both these modules do the same thing: add gzip compression on the fly.
The one you should use these days is mod_deflate - it is the more modern and recommended one and is distributed with Apache. mod_gzip was an older third-party implementation and there is no good reason to use it anymore.
Don't be fooled by the names into thinking they use different compression schemes. They both use gzip (which is a format that uses a compression algorithm called DEFLATE). The latter is only called mod_deflate because the name mod_gzip was taken. They both will achieve the same compression levels with equivalent settings.
They have different configuration (which may include different default settings) so you need to find documentation for the specific one you're using.
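As a hedged example, a minimal mod_deflate block in .htaccess might look like the following (the MIME type list is illustrative, and on Apache 2.4 AddOutputFilterByType is provided by mod_filter):

<IfModule mod_deflate.c>
  # Compress common text-based responses on the fly.
  AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml
  AddOutputFilterByType DEFLATE application/javascript application/x-javascript
</IfModule>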

I guess you're asking about the differences between these two, and not about compression in general. In that case, I recommend using mod_deflate, which relies on the still actively maintained zlib project. By the way, old articles (6-7 years old) that compare the two are no longer relevant.

Most compression algorithms, when applied to a plain-text file, can reduce its size by 70% or more, depending on the content in the file. When using compression algorithms, the difference between standard and maximum compression levels is small, especially when you consider the extra CPU time necessary to process these extra compression passes. This is quite important when dynamically compressing Web content. Most software content compression techniques use a compression level of 6 (out of 9 levels) to conserve CPU cycles. The file size difference between level 6 and level 9 is usually so small as to be not worth the extra time involved.
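If you want to pin that level explicitly, mod_deflate has a DeflateCompressionLevel directive; a sketch for the main server config (the directive is not allowed in .htaccess):

<IfModule mod_deflate.c>
  # Level 6 keeps most of the size reduction of level 9 at a fraction
  # of the CPU cost. Valid values are 1 (fastest) to 9 (smallest).
  DeflateCompressionLevel 6
</IfModule>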
For files identified as text/.* MIME types, compression can be applied to the file prior to placing it on the wire. This simultaneously reduces the number of bytes transferred and improves performance. Testing has also shown that Microsoft Office and PostScript files can be GZIP-encoded for transport by the compression modules.
Some important MIME types that cannot be GZIP encoded are external JavaScript files, PDF files and image files. The problem with JavaScript files is mainly due to bugs in browser software, as these files are really text files and overall performance would benefit from compressing them for transport. PDF and image files are already compressed, and attempting to compress them again simply makes them larger and leads to potential rendering issues with browsers.
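One common way to express the "skip already-compressed formats" rule with mod_deflate is an environment-variable exclusion; a sketch assuming mod_setenvif is loaded (the extension list is illustrative):

<IfModule mod_setenvif.c>
  # Responses flagged "no-gzip" are skipped by mod_deflate, so
  # already-compressed formats are left alone.
  SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|pdf|zip|gz)$ no-gzip
</IfModule>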
Prior to sending a compressed file to a client, it is vital that the server ensures the client receiving the data correctly understands and renders the compressed format. Browsers that understand compressed content send a variation of the following client request headers:
Accept-encoding: gzip
Accept-encoding: gzip, deflate
Current major browsers include some variation of this header with every request they send. If the server sees the header and chooses to provide compressed content, it should respond with the server response header:
Content-Encoding: gzip
For more information, see this article: http://www.linuxjournal.com/article/6802
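To check which encoding a server actually negotiates, you can send the request header yourself and inspect the response headers; a quick sketch with curl (example.com is a placeholder):

# Perform a GET, discard the body, and print only the response headers;
# look for "Content-Encoding: gzip" in the output.
curl -s -o /dev/null -D - -H "Accept-Encoding: gzip, deflate" https://example.com/style.css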

Enabling mod_deflate and mod_gzip accomplishes basically the same thing: they both compress your web files before they are sent to the visitors of your website.
There are differences between the two, though; mod_deflate can sometimes achieve a slightly better compression ratio than mod_gzip. Another reason to choose mod_deflate is that it is better supported than mod_gzip, which makes it easier to configure because it is better documented. More info can be found here.

Check out this article on linuxjournal.com
http://www.linuxjournal.com/article/6802
Other than that, mod_deflate is easier to configure and generally ships with the Apache package.

IMHO you should avoid using mod_deflate for static files; instead, serve precompressed files through mod_rewrite, compressed at the best level available (AFAIK, gzip files created with 7-Zip). Apache does not cache gzipped static files.
For dynamic files it depends on available bandwidth: mod_deflate needs a lot of processing power, but it is usable if your server can saturate your network connection anyway.
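A hedged sketch of that precompressed-file approach (file extensions and types are illustrative; it assumes you generate the .gz siblings yourself and that mod_headers is available):

<IfModule mod_rewrite.c>
  RewriteEngine On
  # If the client accepts gzip and a precompressed sibling exists, serve it.
  RewriteCond %{HTTP:Accept-Encoding} gzip
  RewriteCond %{REQUEST_FILENAME}.gz -f
  RewriteRule ^(.*)\.(css|js)$ $1.$2.gz [QSA,L]
  # Restore the original Content-Type and mark the encoding.
  <FilesMatch "\.css\.gz$">
    ForceType text/css
    Header set Content-Encoding gzip
    Header append Vary Accept-Encoding
  </FilesMatch>
  <FilesMatch "\.js\.gz$">
    ForceType application/javascript
    Header set Content-Encoding gzip
    Header append Vary Accept-Encoding
  </FilesMatch>
</IfModule>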

Related

Tool that improves compression by uncompressing inner archives

There was a compression tool that uncompressed the inner gz/bz2/xz/etc files before storing them in a tar format and archiving them, and I don't remember its name. I'm creating archives that contain very similar rpm/deb/tgz packages, and applying compression only at the end will probably improve the compression ratio significantly.
From what I remember, the tool also stored a metadata file that recorded what compression options were used, in order to reproduce identical zipped files during decompression.
Found it: https://github.com/schnaader/precomp-cpp
It's not clear yet whether it only does recompressing of a given archive, or it also accepts a list of input files/dirs (some of which may already be compressed), like tar does.
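For comparison, the manual version of the underlying idea looks roughly like this (paths are illustrative, and this is not how precomp itself works; it just shows the "decompress the inner layers, compress once at the end" principle):

# Unpack the already-compressed inner archives first...
mkdir unpacked
for f in packages/*.tgz; do
  tar -xzf "$f" -C unpacked
done
# ...then create one outer archive and compress it in a single pass,
# so the compressor can exploit redundancy across all packages.
tar -cf - unpacked | xz -9 > packages.tar.xz

Unlike precomp, this throws away the original compression parameters, so the inner archives cannot be reproduced byte-identically afterwards.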

How do you prepare deflate streams for PIGZ (parallel gzip)?

I am using the PIGZ library. https://zlib.net/pigz/
I compressed large files using multiple threads per file with this library and now I want to decompress those files using multiple threads per file too. As per the documentation:
Decompression can’t be parallelized, at least not without specially
prepared deflate streams for that purpose.
However, the documentation doesn't specify how to do that, and I'm finding it difficult to find information on this.
How would I create these "specially prepared deflate streams" that PIGZ can utilise for decompression?
pigz does not currently support parallel decompression, so it wouldn't help to specially prepare such a deflate stream.
The main reason this has not been implemented is that, in most situations, decompression is fast enough to be i/o bound, not processor bound. This is not the case for compression, which can be much slower than decompression, and where parallel compression can speed things up quite a bit.
You could write your own parallel decompressor using zlib and pthread. pigz 2.3.4 and later will in fact make a specially prepared stream for parallel decompression by using the --independent (-i) option. That makes the blocks independently decompressible, and puts two sync markers in front of each to make it possible to find them quickly by scanning the compressed data. The uncompressed size of a block is set with --blocksize or -b. You might want to make that size larger than the default, e.g. 1M instead of 128K, to reduce the compression impact of using -i. Some testing will tell you how much your compression is reduced by using -i.
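In command-line terms, that amounts to something like the following (bigfile is a placeholder; -b takes the block size in KiB):

# Compress with independently decompressible 1 MiB blocks; -i adds the
# sync markers, -b 1024 sets the block size to 1024 KiB (default 128 KiB).
pigz -i -b 1024 -c bigfile > bigfile.indep.gz
# Compress with default settings and compare sizes to see what -i costs.
pigz -c bigfile > bigfile.default.gz
ls -l bigfile.indep.gz bigfile.default.gz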
(By the way, pigz is not a library, it is a command-line utility.)

How much does using htaccess files slow down website performance (especially with solid state disks)?

The Apache docs say (http://httpd.apache.org/docs/2.4/howto/htaccess.html),
"You should avoid using .htaccess files completely if you have access
to httpd main server config file. Using .htaccess files slows down
your Apache http server. Any directive that you can include in a
.htaccess file is better set in a Directory block, as it will have the
same effect with better performance."
But that gives me no idea of the scale of the impact.
I have an architecture designed for shared hosting where the only choice was to use htaccess files.
I'm moving over to Digital Ocean where I can do what I like.
I need to make a judgement on whether to stick with htaccess files or move stuff from there into the centralized config files and switch them off.
There could be 100s of small low-use sites (local businesses).
If the performance hit amounts to under about 50ms in serving a page or has some other minor hit like reducing the number of concurrent accesses that can be supported by under about 5%, then I don't care.
If the effect is big enough that people might feel the difference, then I care enough to spend time changing things.
But I've found nothing that gives me an indication of what order of magnitude of the hit I can expect.
Can anyone enlighten me?
Edit: I'm not looking for anything like exact numbers. But surely someone somewhere who is more able than me has done some benchmarking, or knows from experience the type of difference there can be under particular circumstances.
From an answer on Quora by Jonathan Klein, 12ms for a 1500 line .htaccess file:
Having a large .htaccess does have a cost. Ours is currently ~1500 lines and we benchmarked the time spent parsing it at around 10-12ms on a production webserver. Hardware makes a difference obviously, but you can fairly safely assume that the cost of that 3000 line .htaccess is around 25-35ms per request.
The slide shows the impact of .htaccess files by comparing three setups: no .htaccess file at all, an .htaccess file in the root folder, and .htaccess files in subfolders as well, against the no-.htaccess baseline.
A Digital Ocean tutorial at https://www.digitalocean.com/community/tutorials/how-to-use-the-htaccess-file says,
"The .htaccess page may slow down your server somewhat; for most servers this will probably be an imperceptible change."
httpd.conf is parsed once, at startup. If you use .htaccess files, they are read on every request. That causes a fairly large performance hit which only gets worse as the number of requests increases.
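If you move everything into the main config, the per-request lookups can be switched off entirely; a minimal sketch for httpd.conf, assuming /var/www is your document root:

# With AllowOverride None, Apache never even looks for .htaccess files
# under this tree, so the per-request directory walk disappears.
<Directory "/var/www">
  AllowOverride None
</Directory>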
I quote from a tutorial on .htaccess by Joseph Pecoraro on code.tutsplus.com here:
Also, when [.htaccess is] enabled the server will take a potential
performance hit. The reason is because, every server request, if
.htaccess support is enabled, when Apache goes to fetch the requested
file for the client, it has to look for a .htaccess file in every
single directory leading up to wherever the file is stored.
These potential file accesses (potential because the files may not
exist) and their execution (if they did exist) will take time. Again,
my experience is that it's unnoticeable and it doesn’t outweigh the
benefits and flexibility that .htaccess files provide developers.
For your scenario, my personal recommendation would be "Don't fix what isn't broken", because time and effort equal money, and I fully agree with your reasoning in this comment.
It is hard to give numbers, as the impact depends on the speed of your hardware and how much memory you are willing to dedicate to file caching, but let me give you an example that should clarify the impact. Say you have a WordPress site and an image located at /wp-content/uploads/2015/10/my.png needs to be served.
While serving this file might take one disk access, checking for all the possible .htaccess files needs an additional five disk accesses (one per directory level along the path). If the file is big, the overhead might not be noticeable, but for a small file that might even fit into one block on the disk, you waste 80% of the time doing work that is not needed at all.
Still, a big enough cache can fix almost any design/coding problem, but the more directories you have, the bigger the cache you will need to avoid performance degradation.
In your specific case it is actually a no-brainer. You already generate the .htaccess files, so all you need to do is wrap their contents in a Directory directive, generate them into some directory, and Include them from httpd.conf. Even doing it manually should not take more than a few hours, and after that you have a better server architecture.
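A rough sketch of that layout, with hypothetical paths (/etc/apache2/generated/ and the per-site file name are just placeholders):

# httpd.conf: include one generated file per site instead of using
# .htaccess (IncludeOptional needs Apache 2.4; use Include on 2.2).
IncludeOptional /etc/apache2/generated/*.conf

# /etc/apache2/generated/example-site.conf (generated, illustrative):
<Directory "/var/www/example-site">
  # Disable .htaccess scanning for this tree...
  AllowOverride None
  # ...and place the former .htaccess directives here instead
  # (e.g. RewriteEngine On, ExpiresActive On, and so on).
</Directory>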

When not to use maximum compression in PNG?

Intro
When saving PNG images through GIMP, I've always used level 9 (maximum) compression, as I knew it's lossless. Now I have to specify a compression level when saving a PNG image through the GD extension of PHP.
Question
Is there any case when I shouldn't compress a PNG to the maximum level? Any compatibility issues, for example? If there's no problem, why ask the user at all; why not automatically compress to the maximum?
Each PNG compression level requires significantly more memory and processing power to compress (and, to a lesser degree, to decompress).
There is a rapid tail-off in the compression gains from each level, however, so choose one that balances the web server resources available for compression against your need to reduce bandwidth.
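In GD this is just the third argument to imagepng(); a minimal sketch (the file name and the level 6 choice are illustrative, and $img stands for whatever image you already have):

<?php
// $img is an existing GD image; created here only to keep the example self-contained.
$img = imagecreatetruecolor(200, 200);

// Level 6 is a reasonable middle ground: most of the size win of
// level 9 at a fraction of the CPU cost. 0 = no compression, 9 = maximum.
imagepng($img, 'output.png', 6);

imagedestroy($img);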

Is there an appender/configuration for log4j or Logback that allows you to write to a GZIP file?

I'm having an issue with logging using up too much disk I/O and too much space when a large number of users are on a live system that has issues which only happen in the live environment.
Is there a log4j or (preferably) Logback appender/configuration that will allow writing directly to a GZIP-compressed file?
This feature already exists in Logback. Take a look at the appenders section, specifically at the time-based rolling policy.
Quote:
Just like FixedWindowRollingPolicy, TimeBasedRollingPolicy supports automatic file compression. This feature is enabled if the value of the fileNamePattern option ends with .gz or .zip.
Also take a look at time and size based rolling policy.
You can setup rollover to occur after one log file hits a certain limit.
I don't believe writing directly to a GZIP-compressed file for every log statement would be feasible, since that would create a pretty big performance overhead. Using a combination of the existing features sounds reasonable to me.
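For reference, a sketch of a logback.xml that uses TimeBasedRollingPolicy with gzip-on-rollover (file names, the daily pattern, and maxHistory are illustrative):

<configuration>
  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- The trailing .gz makes logback gzip each rolled-over file. -->
      <fileNamePattern>app.%d{yyyy-MM-dd}.log.gz</fileNamePattern>
      <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder>
      <pattern>%d %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="FILE" />
  </root>
</configuration>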
The space issue is already solved by logback. It will compress your log files during rollover. The IO issue is quite a different one and I am afraid logback does not offer a solution.
