Code snippet explanation, image serving in .htaccess - .htaccess

I am working on a site that I've inherited, and I'm looking at the .htaccess file. Truth be told I'm really NOT familiar with writing custom .htaccess rules, so I was wondering if someone can explain to me what is going on in this code snippet?
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_ACCEPT} image/webp
RewriteCond %{DOCUMENT_ROOT}/$1.webp -f
RewriteRule (.+)\.(jpe?g|png)$ $1.webp [T=image/webp,E=accept:1]
</IfModule>
After doing some googling, I'm pretty sure it's taking any image that's served up and returning it with .webp on the end instead of its original extension?
I believe the intent of the code is to fool pagespeed insights so that it doesn't complain about serving next gen image formats.
I was just wondering if someone can give me a full break down on exactly what is going on and how it works?
Also, I've noticed that when I change image sizes on the site (for optimisation purposes), PageSpeed doesn't register the changes while this code is present in the .htaccess, as if it's contributing to some sort of image cache. If I remove this snippet, PageSpeed detects the new optimised images.

I'm pretty sure it's taking any image that's served up and returning it with .webp on the end instead of its original extension?
Basically, yes. (Although, it's not a "redirect".)
More specifically... (providing mod_rewrite is enabled - checked by the outer <IfModule mod_rewrite.c> container):
RewriteRule (.+)\.(jpe?g|png)$ - For every URL-path that matches the regex (.+)\.(jpe?g|png)$. In other words, any URL that ends in .jpg, .jpeg or .png.
RewriteCond %{HTTP_ACCEPT} image/webp - And the Accept HTTP request header contains the string image/webp. In other words the user-agent accepts webp images.
RewriteCond %{DOCUMENT_ROOT}/$1.webp -f - And the corresponding .webp image exists in the same place as the requested file. eg. Request /foo/bar/baz.jpg and /foo/bar/baz.webp exists.
RewriteRule ... $1.webp - Then internally rewrite the request to the .webp image (e.g. /foo/bar/baz.webp). Note that this is not a "redirect"; the user-agent still sees the original .jpg (or .png) filename. Also...
T=image/webp - Sets the Content-Type header to image/webp. (Overriding the image/jpeg or image/png value that would otherwise be sent.)
E=accept:1 - Sets the environment variable accept to 1. (This can then be used by your application.)
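One common use of that variable, sketched here under the assumption that mod_headers is enabled, is to tell caches that the response depends on the Accept request header, so a shared cache does not hand a .webp body to a browser that cannot display it. After the internal rewrite Apache renames the variable to REDIRECT_accept, which is why the condition below tests that name rather than accept.

```apache
<IfModule mod_headers.c>
    # Added only when the webp rule actually swapped in the .webp file
    # (the internal rewrite renames "accept" to "REDIRECT_accept")
    Header append Vary Accept env=REDIRECT_accept
</IfModule>
```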
I believe the intent of the code is to fool pagespeed insights so that it doesn't complain about serving next gen image formats.
I don't think the intent is to necessarily "fool pagespeed insights". (That's a lot of work just to fool a tool!)
However, it does allow graceful degradation: a .jpg (or .png) is served by default, and the .webp is served if the user-agent supports it (and it exists!).
Also, I've noticed that when I change image sizes on the site (for optimisation purposes), PageSpeed doesn't register the changes while this code is present in the .htaccess, as if it's contributing to some sort of image cache. If I remove this snippet, PageSpeed detects the new optimised images.
Are you regenerating the .webp image?
This code only applies when requesting .jpg (or .png) images and the corresponding .webp image exists. Unless you also update the .webp image (or it didn't exist in the first place) then any compliant user-agent will not see a change in your image.
In the future, when everything and everybody supports .webp images, you could safely remove the two RewriteCond directives (and with them the relatively expensive filesystem check) and save all your images as .webp only, so no .jpg files need to exist on the server. The image URLs would still end in .jpg. These directives allow you to upgrade your images without having to change the URLs.
In which case you should also remove the <IfModule mod_rewrite.c> wrapper as your site is now dependent on mod_rewrite doing its thing. The <IfModule> wrapper should only be used if mod_rewrite is optional to your site working "normally".
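A minimal sketch of that end state, assuming every image has already been converted so only the .webp files remain on disk, and (as in the original snippet) that the .htaccess file sits in the document root:

```apache
RewriteEngine On
# Every .jpg/.jpeg/.png URL is always served from the .webp file of the same name:
# no Accept check, no file-exists check, and no <IfModule> wrapper
RewriteRule (.+)\.(jpe?g|png)$ $1.webp [T=image/webp,L]
```

The E=accept:1 flag is dropped here because every client now receives the same file, so there is nothing left to negotiate.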

Related

.htaccess rewrite all image.php to their respective image URL

I'm wondering if there is a rule I can add to my .htaccess file that will make the URL of an image to appear as another image URL. Here is the use case I'm referring to:
My app automatically watermarks images using the GD library on the fly. In turn, the newly watermarked image has an image source of localhost/some/dir/image.php?original=someFilename.png.
Is it possible to make that dynamic image URL appear as the original, but still serve the dynamic, watermarked image? Thanks in advance.
This should work (currently testing it):
RewriteRule ^(.+\.png)$ /some/dir/image.php?original=$1
Works for me, but you probably do not want to allow arbitrary characters in the filename (".+").
The following would be much more restrictive but also safer:
RewriteRule ^([A-Za-z0-9]+\.png)$ /some/dir/image.php?original=$1
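If the original PNGs live at the same URL path that is being requested (an assumption; adjust the condition if image.php reads its originals from a different folder), a file-exists check additionally stops the script from being invoked for names that do not correspond to a real image. Because the request is always rewritten, the unwatermarked original is never served directly at that URL.

```apache
RewriteEngine On
# Only rewrite when the requested PNG really exists on disk
RewriteCond %{REQUEST_FILENAME} -f
# Letters, digits, underscore and hyphen only; [L] stops further rule processing
RewriteRule ^([A-Za-z0-9_-]+\.png)$ /some/dir/image.php?original=$1 [L]
```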

Compress .htaccess file

I've got an .htaccess file which contains over 3,000 lines, mainly thanks to 301 redirects I have set up from my old ecommerce site. The file is 323 KB in size and I'm worried it's going to be a burden on load times and therefore conversions.
Is there anything available that can compress (minify?) the file into a smaller size, or can someone offer a better idea for handling the 301 redirects?
If the redirects are simple redirects (i.e. url1 to url2, no regex etc.) AND you have access to httpd.conf, then you could use a RewriteMap for all the redirects and possibly have just one rule in your .htaccess to handle them.
From the RewriteMap documentation:
The looked-up keys are cached by httpd until the mtime (modified time) of the mapfile changes, or the httpd server is restarted. This ensures better performance on maps that are called by many requests.
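A sketch of that setup, assuming a plain-text map file; the map name `redirects`, the file path and the example entries are placeholders. The RewriteMap directive itself must go in httpd.conf or a <VirtualHost> block, but the lookup can then be used from .htaccess:

```apache
# --- httpd.conf or <VirtualHost> (RewriteMap is not allowed in .htaccess) ---
RewriteMap redirects "txt:/etc/apache2/redirects.txt"

# --- redirects.txt: one "old-path new-path" pair per line (placeholder entries) ---
# /old-shop/blue-widget.html /widgets/blue.html
# /old-shop/red-widget.html  /widgets/red.html

# --- .htaccess ---
RewriteEngine On
# Look the requested path up in the map; continue only if a target was found
RewriteCond ${redirects:%{REQUEST_URI}} ^(.+)$
RewriteRule ^ %1 [R=301,L]
```

For a few thousand entries, the RewriteMap documentation suggests converting the text file to a DBM hash with httxt2dbm and using the dbm: map type instead of txt:, which avoids scanning the text file on lookups.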
Can you specify some regular expressions to group / match all of these redirects? This then offers two options for doing this:
The first is to use a (hopefully smaller) set of RewriteRule statements using the [R=301] flag.
The second is to move this redirection into a redirector script, where you use, say, PHP logic to decode the legacy ecommerce URI into its current format and then issue a response with a 301/302 status and a Location: header pointing to the current URI. This would also require a catch-all rewrite of the legacy ecommerce URIs to this redirector script, e.g.
RewriteRule ^(product/.*) rewriter.php?uri=$1 [QSA,L]
Without some examples, I can't give a more specific reply. Sorry.
I've had some of these cases before; most of the time you can replace the redirect statements with RewriteRules. For example, if your URLs went from:
http://shop.example.com/shop/category/product-id.html
To this:
http://shop.example.com/category/product-id.html
You can fetch it with a rewrite like this:
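RewriteRule ^shop/([a-z]+)/([0-9]+)\.html$ /$1/$2.html [R=301,L]
This will still result in a 301 redirect, so crawlers will still know it's a permanent move. (In an .htaccess file the pattern is matched without the leading slash, and the flags must be comma-separated with no spaces, otherwise Apache rejects the rule.)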

How do I use .htaccess to limit file uploads to .pdf?

I have a simple upload form that allows a file to be uploaded to a folder in the site. I don't want to allow anything but .pdf files to be uploaded. However, I cannot modify the form at all to limit the upload, and I can't use PHP on the back end to limit it either. JavaScript is insecure because a user can turn it off. How can I limit the upload to a .pdf file with .htaccess?
As far as I know, it isn't possible. You could, however, restrict the files being returned and force their MIME type to be application/pdf, so they will be treated like PDFs even if they aren't. If this was combined with JavaScript, it would help honest users (e.g. if someone accidentally selects a .jpg they will get a warning right away), and it will make attacks more difficult.
It seems like the third-party mod_upload might be able to help, though.
To restrict the output types, you could use a .htaccess file similar to this:
# Prevent request to non-.pdf files
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !\.pdf$
RewriteRule ^ - [F]
# Tell the browser that this is a PDF
Header set Content-Type application/pdf
# Hint that the browser shouldn't try to auto-detect the content type
Header set X-Content-Type-Options nosniff
(note: I wrote those from memory, so make sure to test them before you trust them…)
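An alternative sketch that does the same restriction without mod_rewrite, assuming Apache 2.4 (Require is the 2.4 access-control syntax), that the .htaccess file sits in the upload folder, and that the server's AllowOverride permits AuthConfig and FileInfo directives there:

```apache
# Deny every file in this folder by default...
Require all denied
# ...but allow files whose names end in .pdf
<FilesMatch "\.pdf$">
    Require all granted
</FilesMatch>
# Always label responses as PDF, whatever the file really contains
ForceType application/pdf
# Needs mod_headers: ask browsers not to second-guess the content type
Header set X-Content-Type-Options nosniff
```

As with the rewrite version, this only controls what gets served back; it does not stop a non-PDF file from being written to the folder in the first place.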

Fastest way to redirect missing image files

I have an on-the-fly thumbnailing system and am trying to find the best way to make sure it's as fast as possible when serving up images. Here is the current flow:
User requests thumbnail thumbnails/this-is-the-image-name.gif?w=200&h=100&c=true
The .htaccess file uses mod_rewrite to send requests from this folder to a PHP file
PHP file checks file_exists() for the requested image based on the query string values
If it does:
header('content-type: image/jpeg');
echo file_get_contents($file_check_path);
die();
If it doesn't it creates the thumbnail and returns it.
My question is whether there is a way to optimize this into being faster? Ideally my htaccess file would do a file_exists and only send you to the PHP file when it doesn't... but since I am using query strings there is no way to build a dynamic URL to check. Is it worth switching from query strings to an actual file request and then doing the existence check in htaccess? Will that be any faster? I prefer the query string syntax, but currently all requests go to the PHP file which returns images whether they exist or not.
Thank you for any input in advance!
You should be able to do this in theory. The RewriteCond command has a flag -f which can be used to check for the existence of a file. You should be able to have a rule like this:
# If the file doesn't exist
RewriteCond %{REQUEST_FILENAME} !-f
# off to PHP we go
RewriteRule (.*) your-code.php [L,QSA]
The twist here is that I imagine you're naming files according to the parameters that come in, so the example above might be stored as thumbnails/this-is-the-image-name-200-100.gif. If that is the case, you'll need to generate a filename to test on the fly and check for that instead of REQUEST_FILENAME; the details of this are really specific to your setup. If you can, I would recommend a naming scheme that doesn't involve too much effort. For example, you could store your thumbnails on the filesystem in a directory structure like /width/height/filename, which would be easier to check for in a rewrite rule than modified-filename-width-height.gif.
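A sketch of how the /width/height/filename layout could be checked, assuming the .htaccess file is in the document root, cached thumbnails are written to a hypothetical /thumbs/ folder, and the query string always starts with w=...&h=... (the c flag is ignored here for brevity; `file` is a hypothetical parameter name for thumbnail.php):

```apache
RewriteEngine On
# Hypothetical cache layout: /thumbs/<width>/<height>/<filename>
# Build the candidate path from ?w=...&h=... and remember it in an environment variable
RewriteCond %{QUERY_STRING} ^w=([0-9]+)&h=([0-9]+)
RewriteRule ^thumbnails/(.+)$ - [E=THUMB:/thumbs/%1/%2/$1]
# Serve the cached copy directly if it is already on disk
RewriteCond %{DOCUMENT_ROOT}%{ENV:THUMB} -f
RewriteRule ^thumbnails/ %{ENV:THUMB} [L]
# Otherwise let PHP generate it (QSA passes the original w/h/c along)
RewriteRule ^thumbnails/(.+)$ thumbnail.php?file=$1 [L,QSA]
```

Storing the candidate path in an environment variable avoids relying on %N backreferences in the final substitution, since those only refer to the regex conditions of the rule they appear in.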
If you haven't checked it out, Apache's mod_rewrite guide has a bunch of decent examples.
UPDATE: so, you'll actually need to check for the dynamic filename from the looks of it. I think that the easiest way to do something like this will be to stick the filename you generate into an environment variable, like this (I've borrowed from your other question to flesh this out):
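# generate potential thumbnail filename from the requested file name and the query string
RewriteCond %{SCRIPT_FILENAME}%{QUERY_STRING} /([a-zA-Z0-9-]+)\.(jpg|gif|png)w=([0-9]+)&h=([0-9]+)(&c=(true|false))?$
# store it in a variable (the URL itself is left unchanged)
RewriteRule .* - [E=thumbnail:%1-%2-%3-%4-%6.jpg]
# if the thumbnail already exists, serve it directly ([END] needs Apache 2.3.9+)
RewriteCond %{DOCUMENT_ROOT}/path/%{ENV:thumbnail} -f
RewriteRule .* /path/%{ENV:thumbnail} [END]
# otherwise off to PHP we go; the match is repeated because %1..%6 only
# refer to the conditions of the rule they appear in
RewriteCond %{SCRIPT_FILENAME}%{QUERY_STRING} /([a-zA-Z0-9-]+)\.(jpg|gif|png)w=([0-9]+)&h=([0-9]+)(&c=(true|false))?$
RewriteRule .* thumbnail.php?file_name=%1&type=%2&w=%3&h=%4&c=%6 [L]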
This is completely untested, and may well not work as written.
Also, one huge recommendation I have for you: if possible, turn on rewrite logging at a high level. On Apache 2.2 that means RewriteLog plus RewriteLogLevel; Apache 2.4 removed both directives and replaced them with per-module settings of the LogLevel directive. The log for rewrite rules can be pretty convoluted, but it definitely gives you an idea of what is going on. You need server access to do this, since the logging config can't go in an .htaccess file.
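For reference, a sketch of both variants; the log file path is a placeholder, and trace3 is just one choice between rewrite:trace1 and rewrite:trace8:

```apache
# Apache 2.4: server config or <VirtualHost> (not .htaccess);
# rewrite trace messages go to the normal ErrorLog
LogLevel alert rewrite:trace3

# Apache 2.2 only (both directives were removed in 2.4)
# RewriteLog "/var/log/apache2/rewrite.log"
# RewriteLogLevel 3
```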

Issues when creating pretty URL that uses actual site urls

I want to create functionality similar to the site downforeveryoneorjustme.com. They use a pretty URL to take in the URL of any given site. I'm sure they use htaccess to do this; however, the method I'm using is encountering problems.
This is my .htaccess file that I'm using to send the site URL to a file.php:
RewriteEngine on
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)?$ /file.php?var=$1
However when I type in something like
mysite.com/http://google.com the variable it sends the file is http:/google.com (missing a slash). I can't figure out why this is occurring.
Also, when I type in something like mysite.com/existingfolder, where existingfolder is a folder on my site, it always works incorrectly. The variable it passes to the file is missing.html instead of existingfolder. In this case, the page doesn't display images either. The images can't be found, and I'm assuming it's because it's searching for them in the wrong folder on the site: it might think the page is in existingfolder rather than in the folder it should be in.
Does anyone know why I'm getting these problems? I'm new to htaccess, and I'm assuming it has something to do with that.
Thanks for any help.
I'm sure they use htaccess to do this
I'm not. I'm not even sure they're using Apache.
mod_rewrite is not always the answer to all URL-processing problems. It's certainly prone to some of the quirks of path-based URL handling, including the removal of double-slashes.
I suggest reading the Apache-specific REQUEST_URI variable from your script, rather than relying on rewrites to get a parameter. This will give you the path requested by the browser without any processing.
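A minimal sketch of that approach, assuming the .htaccess file sits in the document root: hand every request that is not an existing file to file.php without a var parameter, and let the script read the raw path from REQUEST_URI (in PHP, $_SERVER['REQUEST_URI']). Existing directories are deliberately not excluded, so mysite.com/existingfolder is also meant to reach the script, although mod_dir's DirectorySlash behaviour may still need checking on your server.

```apache
RewriteEngine On
# Leave real files (images, CSS, file.php itself) alone
RewriteCond %{REQUEST_FILENAME} !-f
# Everything else goes to file.php; no parameter is passed, because the
# script reads the untouched request path from REQUEST_URI instead
RewriteRule ^ /file.php [L]
```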

Resources