How to compress html pages using SetOutputFilter DEFLATE - .htaccess

I cannot get compressed HTML pages in my browser, even though I am 100% sure mod_deflate is enabled on my server.
My .htaccess file contains this snippet:
<IfModule mod_deflate.c>
<Files *.html>
SetOutputFilter DEFLATE
</Files>
</IfModule>
A non compressed excerpt of my content is:
<div>
<div>
Content
</div>
</div>
With the htaccess code I am using, I would expect to get the output below in my browser (no space and no tabs at the beginning of each line):
<div>
<div>
Content
</div>
</div>
Is there something wrong with the code I am using in the htaccess file?
Is keeping all tabs in front of each html line after compression the normal behavior of mod_deflate?
If so, would you recommend that I switch tabs with spaces in my html code to get the desired effect?
Thanks for your insights on this

For the DEFLATE output filter to compress the content:
The content should be at least about 120 bytes; compressing smaller payloads can increase the output size.
The HTTP client making the request must support gzip/deflate encoding and advertise it in its Accept-Encoding request header.
Most modern web browsers support gzip encoding and automatically decompress the gzipped content for you, so what you see with the browser's View Page Source option is not the compressed content. The decompressed page is byte-for-byte identical to the original, tabs included: mod_deflate compresses the transfer, it does not minify the markup. To verify that your browser received compressed content, press F12, select the Network tab, and pick your requested page. If the response headers include Content-Encoding: gzip, the compression worked.
In Firefox, you can remove support for gzip,deflate by going to about:config and emptying the value for network.http.accept-encoding. Now with no support for gzip, Firefox will receive uncompressed content from your Apache server.
Alternatively, if you want to see the compressed bytes themselves, use a client that does not decompress the content automatically.
curl behaves this way unless you pass the --compressed option:
curl -H "Accept-Encoding: gzip,deflate" http://example.com/page.html > page.gz
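To illustrate why the tabs survive, here is a minimal Python sketch using the standard gzip module (not mod_deflate itself, so treat it as an analogy). The sample markup and its tab indentation are made up for the example. It shows that gzip is lossless and that a very small payload can grow once the fixed gzip header and trailer are added.

```python
import gzip

# Hypothetical sample page: tab-indented markup, repeated to reach a realistic size.
page = b"<div>\n\t<div>\n\t\tContent\n\t</div>\n</div>\n" * 50

compressed = gzip.compress(page)
restored = gzip.decompress(compressed)

# Compression is lossless: every tab and newline comes back unchanged.
assert restored == page

# Repetitive markup shrinks considerably on the wire.
print(len(page), "->", len(compressed), "bytes")

# A tiny payload grows, because the gzip header and trailer add fixed overhead.
tiny = b"<p>hi</p>"
tiny_compressed = gzip.compress(tiny)
print(len(tiny), "->", len(tiny_compressed), "bytes")
```

Removing the leading tabs would be a separate minification step applied to the HTML before it is served; compression alone never changes what View Page Source shows.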

Related

image not cached when loaded with meta refresh

Not really an issue, but how come images cached via .htaccess with:
<FilesMatch "\.(jpg|jpeg|gif|bmp)$">
Header set Cache-Control "max-age=6048000, public"
</FilesMatch>
are reloaded every time the URL of the page is requested through a meta refresh tag?
<meta http-equiv="refresh" content="2;URL=page_with_large_jpg.php">
This meta tag is in another PHP script. PHP scripts are cached for only 1 second:
<FilesMatch "\.(html|htm|php)$">
Header set Cache-Control "max-age=1, no-cache, private, must-revalidate"
</FilesMatch>
In most situations the cache works well. If I enter the URL of a page containing a large JPG image manually, or reach that page by clicking a link to it, the JPG is clearly served from the cache (provided I visited the page previously, of course) and is displayed instantly. But if the page containing the large JPG is reached through a meta refresh tag in the head section, the image is loaded again, taking a few seconds or more to display fully if it is very large.
Is there a way to prevent this?

Content disposition link conflict

I use MODx Evolution and added the following to my .htaccess file:
<IfModule mod_headers.c>
<FilesMatch "\.jpg$">
Header append Content-Disposition "attachment;"
</FilesMatch>
<FilesMatch "\.jpeg$">
Header append Content-Disposition "attachment;"
</FilesMatch>
<FilesMatch "\.png$">
Header append Content-Disposition "attachment;"
</FilesMatch>
</IfModule>
I have a download button for each image that can be downloaded, like this:
<div class="box download-box">
<a class="button" href="[*template-variable-image*]">Download</a>
</div>
The above code works perfectly.
Now I've added another button for users to see the image in full scale in a separate browser tab with this code:
<h2 class="thumb-caption"><span data-href="[*template-variable-image*]" target="_blank">PREVIEW</span></h2>
Now when the user clicks "PREVIEW", the download dialog triggered by the Content-Disposition attachment header appears. How can I get "PREVIEW" to show the image the way I planned and not the download dialog?
This is more an HTML question than a MODX one. Many modern browsers support the download attribute on the a tag.
So throw away the .htaccess additions and use:
<a class="button" href="[*template-variable-image*]" download>Download</a>
You could also use JavaScript for this to cover all browsers. John Culviner has written a nice jQuery plugin, jquery-file-download, for this.

What is the htaccess equivalent of <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

What is the .htaccess equivalent of meta http-equiv="Content-Type" content="text/html; charset=utf-8"? YSlow says I should put this in my .htaccess. I'm on an Apache server.
OK, I think I found an answer here, but which code is appropriate? I only have .html extensions on my site. http://www.askapache.com/htaccess/setting-charset-in-htaccess.html
AddCharset UTF-8 .html
vs
AddType 'text/html; charset=UTF-8' html
vs
AddDefaultCharset UTF-8
vs
Content-Type: text/html; charset=UTF-8
The first one, AddCharset, tells the server to declare files ending in .html as encoded in UTF-8.
The second gives the full Content-Type for HTML files, including both the MIME type and charset. This shouldn't be necessary, since Apache should already be configured to serve .html files as text/html.
The third, AddDefaultCharset, adds a default charset parameter to responses whose content type is text/plain or text/html; other types, such as XML documents or stylesheets, are not affected by it. This is what I would recommend for a site that serves only HTML; you should be saving all of your documents in UTF-8 by default anyhow. If you add other kinds of text files later, you can declare their character set with AddCharset for those extensions.
The last is not an Apache configuration; it's the actual header that should be sent along with your documents if you set one of the above options. You can check the headers that were sent in Firebug on Firefox, or various developer tools that other browsers offer. You should always have a Content-Type: header, and if your text is encoded in UTF-8, it should always specify charset=UTF-8.
Note that the meta tag is not required if you set the charset appropriately via the headers. It is still nice to have the meta tag if you are going to view the files locally, without a web server; in that case, there is nothing to set the header, so the browser needs to fall back to the meta tag. But for this purpose, you can use the shorter and simpler meta tag: <meta charset=utf-8>. This abbreviated form was formally introduced in HTML5, but browsers have actually supported it for much longer, and it's compatible with all modern browsers, even back to IE 6.
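If you would rather check the header in a script than in browser tools, here is a small Python sketch that splits a Content-Type value into its media type and charset using the standard library's email.message module. The helper name parse_content_type is made up for this example; the header string would come from whatever HTTP client you use.

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lowercases the charset and returns None if it is absent.
    return msg.get_content_type(), msg.get_content_charset()


print(parse_content_type("text/html; charset=UTF-8"))
print(parse_content_type("text/html"))
```

A None charset for an HTML response suggests that none of the directives above took effect for that file.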
Another possibility is the rewrite engine (in this case, matching no-extension URLs):
RewriteEngine on
RewriteRule ^([^.]*)$ $1 [type=text/html]

How to crawl only HTML in Nutch?

Is it possible to crawl/fetch only plain HTML pages via Nutch (i.e. no pictures, video, flash, excel, exe, pdf or word files)?
How to check Content-Type of the page and fetch only text/html pages via Nutch?
Edit conf/regex-urlfilter.txt and list the file suffixes to ignore:
-\.(jpg|gif|zip|ico)$
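To see what such a rule does and does not catch, here is a rough Python imitation of the filter. It assumes, as the comments in Nutch's default filter file describe, that the first matching rule decides and that a URL matching no rule is ignored. The "+." catch-all rule and the function name accepts are assumptions added for the example, not part of the answer above.

```python
import re

# Rules in the order they would appear in conf/regex-urlfilter.txt.
# "-" rejects a matching URL, "+" accepts it.
RULES = [
    ("-", re.compile(r"\.(jpg|gif|zip|ico)$")),
    ("+", re.compile(r".")),  # assumed catch-all: accept everything else
]


def accepts(url):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched, so the URL is ignored


print(accepts("http://example.com/index.html"))
print(accepts("http://example.com/logo.jpg"))
print(accepts("http://example.com/photo.jpg?size=large"))
```

The last URL gets through because the suffix rule only looks at how the URL ends. A filter like this never sees the Content-Type of the response, so it approximates "HTML only" rather than guaranteeing it.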

Stop IE8 from opening or downloading a text/plain MIME type

I'm dynamically generating a text file in PHP, so it has a .php extension but a text/plain MIME type. All browsers display the file as nicely preformatted text, except IE8.
Googling tells me that IE8 added a security measure: if the HTTP Content-Type header doesn't match the expected content type (I think based on the extension and some sniffing), it forces the file to be downloaded. In my case I have to open it, and also give it permission to open the file I just told it to open! That's probably a Win7 annoyance though. Serving a static plain text file works fine, of course.
So can I stop IE8 from downloading the file and get it to view it normally? The code has to run on multiple shared hosting environments, so I think I'm stuck with the .php extension.
Add this to your HTTP header:
X-Content-Type-Options: nosniff
It's an IE8 feature that lets you opt out of its MIME sniffing.
Alternatively, you can "trick" IE8 into thinking that it is indeed serving up a text file. These 2 lines do it for me and don't involve using non-standardized "X-" headers:
header("Content-Type: text/plain");
header("Content-Disposition: inline; filename=\"whatever.txt\"");
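For anyone generating the text outside PHP, the same headers can be sent from any server-side language. Below is a minimal Python WSGI sketch that combines both suggestions (nosniff plus an inline Content-Disposition); the function name app and the body text are placeholders, and whether IE8 needs both headers or just one is not something this sketch verifies.

```python
def app(environ, start_response):
    """Serve generated text with headers that ask the browser to display it inline."""
    body = "line one\nline two\n".encode("utf-8")  # placeholder content
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Opt out of MIME sniffing, as in the first answer.
        ("X-Content-Type-Options", "nosniff"),
        # Suggest a .txt filename while keeping the display inline, as in the second answer.
        ("Content-Disposition", 'inline; filename="whatever.txt"'),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app could be served with wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() from the standard library.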

Resources