500 Internal Server Error instead of: 404 - Not Found - .htaccess

Apparently these lines in my .htaccess cause the server to return a 500 instead of the 404 error that should appear when trying to access a non-existent page:
RewriteEngine On
# Don't rewrite requests to /de or other real files
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^.*/(css|scripts)
# Rewrite incoming requests to their equivalent behind /de
RewriteRule ^(.*)$ de/$1 [L,QSA]
Alas, I'm very unfamiliar with .htaccess. Where's the mistake in this one that causes the 500 instead of the 404 error?
This is the error in the logfile:
Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
Edit:
RewriteRule ^(.*)$ de/$1 [L,QSA]
This part is responsible for the error. Why does it prevent the 404-error page though?

This part is responsible for the error. Why does it prevent the 404-error page though?
The reason this isn't producing a 404 error page is that your conditions don't stop the blind match from being rewritten to de/ when the target under de/ doesn't exist.
These are the 3 preventative conditions you have:
# these prevent rewriting if the URI points to an existing directory or file
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# this prevents requests for css and scripts from being rewritten
RewriteCond %{REQUEST_URI} !^.*/(css|scripts)
However, nothing prevents a rewrite if, for example, /de/blahblah doesn't exist. Thus, if someone requests /blahblah, this is what happens:
URI = /blahblah
checks first condition: blahblah is not a directory
checks second condition: blahblah is not a file
checks third condition: blahblah isn't a css or script
rewrites to /de/blahblah
Rewrite engine loops, URI = /de/blahblah
checks first condition: de/blahblah is not a directory
checks second condition: de/blahblah is not a file
checks third condition: de/blahblah isn't a css or script
rewrites to /de/de/blahblah
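etc., until the limit of 10 internal redirects from the log message is exceeded and the server gives up with a 500.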
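To watch this loop happen instead of inferring it from the single log line, mod_rewrite can log each step. A minimal sketch, assuming Apache 2.4 (2.2 used the older RewriteLog/RewriteLogLevel directives instead) and access to the main server or virtual-host configuration, since LogLevel is not accepted in .htaccess:

```apacheconf
# Main server config or inside <VirtualHost> (assumed; not in .htaccess).
# Apache 2.4: keep all other modules at "alert" and log mod_rewrite's
# decisions at trace level 3 to the error log.
LogLevel alert rewrite:trace3
```

With this in place, each pass through the .htaccess rules should show up in the error log, so the growing de/de/... prefix becomes visible. Trace logging is verbose and slows the server down, so it is worth switching off again once the problem is found.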
You need to add an additional set of conditions so that the rule only rewrites IF the target exists:
RewriteCond %{DOCUMENT_ROOT}/de%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}/de%{REQUEST_URI} -d
Thus, you should have something like:
RewriteEngine On
# Don't rewrite requests to /de or other real files
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^.*/(css|scripts)
# make sure rewrite target actually exists as a file or directory
RewriteCond %{DOCUMENT_ROOT}/de%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}/de%{REQUEST_URI} -d
# Rewrite incoming requests to their equivalent behind /de
RewriteRule ^(.*)$ de/$1 [L,QSA]
This way, a request for a page that doesn't exist under /de is never rewritten into the de/ directory, so Apache returns a 404 instead of looping.


URL Rewriting using .htaccess

I'm trying to get a URL to rewrite using .htaccess but can't seem to get it working.
I'm trying to rewrite http://website.com/pages/blog/article.php?article=blog-entry so that it can be entered as http://website.com/pages/blog/blog-entry, but I'm getting an error when I try the following:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^pages/blog/(.+)$ pages/blog/article.php?article=$1 [NC,L]
Can anybody see where I'm going wrong? This just gives me a 404 error. Thanks in advance.
Use this rule inside /pages/blog/.htaccess:
RewriteEngine on
RewriteBase /pages/blog/
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([\w-]+)/?$ article.php?article=$1 [QSA,L]
I'm trying to rewrite http://website.com/pages/blog.php?article=blog-entry so that it can be entered as http://website.com/pages/blog/blog-entry but I'm getting an error when I try the following:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^pages/blog/(.+)$ pages/blog/article.php?article=$1 [NC,L]
Your wording is confusing, but I believe this is what you mean:
The real URL is: http://website.com/pages/blog.php?article=blog-entry
You want to be able to use a 'friendly' URL, http://website.com/pages/blog/blog-entry, to point to the real URL.
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^pages/blog/(.+)$ /pages/blog/article.php?article=$1 [QSA,L]
The first two tests ask: is this NOT an existing directory? Is this NOT an existing file? Because article.php is an existing file, the rewritten request won't match this rule again, so you won't enter an endless loop, which is always the risk with incorrectly written rewrite rules.
The rule captures the slug from the friendly URL and passes it as the article parameter to the file that actually processes the request; the QSA (query string append) flag additionally keeps any query string from the original request. This is not a redirect: the URL the user sees does not change. The rewrite only happens internally in Apache, which sends the request to the desired target with the desired information.
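To make the difference between the captured value and QSA concrete, here is the same rule with the expected mappings written as comments. This is a sketch assuming the .htaccess sits in the document root; the page=2 parameter is purely hypothetical:

```apacheconf
# .htaccess in the document root (assumed); same rule as above.
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# Internal rewrite; the browser keeps showing the friendly URL:
#   /pages/blog/blog-entry         -> /pages/blog/article.php?article=blog-entry
#   /pages/blog/blog-entry?page=2  -> /pages/blog/article.php?article=blog-entry&page=2
# The substitution defines its own query string, so without QSA the
# original "?page=2" would be dropped; with QSA it is appended.
RewriteRule ^pages/blog/(.+)$ /pages/blog/article.php?article=$1 [QSA,L]
```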
You have to test whether the file or directory exists because otherwise you'd be applying this rule incorrectly: it should only be applied when the requested path does NOT exist. This is basically how all blog/CMS 'search-engine-friendly URLs' work, more or less.
Finally, since the target is /blog.php?article=blog-entry, you can't skip the leading /.
However, it's unclear to me why you'd want the friendly URL to be so long when you could make it short and friendlier, like pages/[article-name].

How to rewrite to a script and also redirect away from that script using .htaccess while avoiding infinite loops

I want to have all the URLs on my site handled by a single script. So I put in a rewrite rule like this:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /myscript.php?p=$1 [L]
But I don't want to allow access to my script via URLs that actually contain "myscript.php", so I would like to redirect those back to the main site:
Redirect 301 /myscript.php http://example.com/
The problem is that if I put both of those rules into my .htaccess file it causes an infinite loop. How do I get them both to work at the same time?
I would also like to be able to redirect things like:
/myscript.php?p=foo -> /foo
You can set an environment variable when the rewrite to the script happens:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !myscript\.php
RewriteRule (.*) /myscript.php?p=$1 [L,E=LOOP:1]
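and test for it in your second rule (after the internal rewrite, Apache passes the variable on with a REDIRECT_ prefix, hence REDIRECT_LOOP):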
RewriteCond %{ENV:REDIRECT_LOOP} !1
RewriteRule ^myscript\.php$ / [R,L]
Never test with 301 enabled, because browsers cache permanent redirects; see the answer "Tips for debugging .htaccess rewrite rules" for details.
Using an environment variable is perfectly OK; however, you don't need to set one yourself. Apache provides the REDIRECT_STATUS environment variable, which can be used for this purpose.
REDIRECT_STATUS is empty (or not set) on the initial request. It is set to 200 after the first successful internal rewrite, or to some other HTTP status code in the case of an error (404, etc.).
So, instead of checking that REDIRECT_LOOP is not 1, we can simply check that REDIRECT_STATUS is empty to ensure we are testing the initial request and not the rewritten request. For example:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^myscript\.php$ / [R,L]
(Note that it is just REDIRECT_STATUS, there is no STATUS variable at the start of the request.)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !myscript\.php
RewriteRule (.*) /test/myscript.php?p=$1 [L,E=LOOP:1]
Aside: The RewriteCond directive that checks REQUEST_URI doesn't really do anything here. If the first condition is true (i.e. it's not a file), then this condition must also be true, since myscript.php is a file. However, the rules could be optimised by putting this condition first. That would avoid the file-system check on every request (including the rewritten request). For example:
RewriteCond %{REQUEST_URI} !^/test/myscript\.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /test/myscript.php?p=$1 [L]
Or, you could include a pre-check (an exception) before this rule instead that halts processing when myscript.php is requested:
RewriteRule ^test/myscript\.php$ - [L]
However, if you do this, then the above canonical redirects must appear before these rules, otherwise they will never be processed. (Putting the canonical redirects first is generally preferable anyway.)
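The question also asks how to send /myscript.php?p=foo to /foo. A minimal sketch combining the pieces above, assuming the script lives at /myscript.php in the document root (not /test/), that p only holds simple values made of word characters, "/" and "-", and using 302 while testing as advised above:

```apacheconf
# .htaccess in the document root (assumed). Canonical redirects come first.
RewriteEngine On

# 1. /myscript.php?p=foo -> /foo, only for the client's original request
#    (REDIRECT_STATUS is empty until an internal rewrite has happened).
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} (?:^|&)p=([\w/-]+)(?:&|$)
# %1 is the p value captured above; the trailing "?" drops the old query string.
RewriteRule ^myscript\.php$ /%1? [R=302,L]

# 2. Any other direct request for /myscript.php goes back to the home page.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^myscript\.php$ / [R=302,L]

# 3. Front controller: anything that is not a real file or directory.
#    On the second pass the URI is /myscript.php, an existing file, so it stops.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /myscript.php?p=$1 [L]
```

Once the behaviour is confirmed, the two 302s can be changed to 301.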

mod_rewrite to remove file extension, add trailing slash, remove www and redirect to 404 if no file/directory is available

I would like to create rewrite rules in my .htaccess file to do the following:
When accessed via domain.com/abc.php: remove the file extension, append a trailing slash and load the abc.php file. The URL should look like this after the rewrite: domain.com/abc/
When accessed via domain.com/abc/: leave the URL as is and load abc.php
When accessed via domain.com/abc: append a trailing slash and load abc.php. The URL should look like this after the rewrite: domain.com/abc/
Remove www
Redirect to a 404 page (404.php) when the accessed URL doesn't resolve to a folder or file, e.g. when accessing domain.com/nothingthere.php, domain.com/nothingthere/ or domain.com/nothingthere
Make some permanent 301 redirects from old URLs to new ones (e.g. domain.com/abc.html to domain.com/abc/)
All PHP files sit in the document root directory, but if there is a solution that would also make URLs such as domain.com/abc/def/ work (loading domain.com/abc/def.php), that would be great as well, but it's not necessary
So here is what I have at the moment (thrown together from various sources and samples from around the web):
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTPS} !=on
# redirect from www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]
# remove php file extension
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{THE_REQUEST} ^GET\ /[^?\s]+\.php
RewriteRule (.*)\.php$ /$1/ [L,R=301]
# add trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*[^/]$ /$0/ [L,R=301]
# resolve urls to matching php files
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*)/$ $1.php [L]
</IfModule>
With this, the first four requirements seem to work: whether I enter domain.com/abc.php, domain.com/abc/ or domain.com/abc, the final URL always ends up being domain.com/abc/ and domain.com/abc.php is loaded.
When I enter a URL that resolves to a file that doesn't exist, I get error 310 (redirect loop), when really a 404 page should be loaded. Additionally, I haven't tried whether subfolders work, but as I said, that's low priority. I'm pretty sure I can just slap the permanent 301 redirects for legacy URLs on top of that without any issues as well; I just wanted to mention it. So the real issue is the non-working 404 page.
I've had problems getting ErrorDocument to work reliably with rewrite errors, so I tend to prefer handling invalid pages correctly in my rewrite cascade. I've tried to cover a full range of test vectors with this and didn't find any gaps.
Some general points:
You need to use the DOCUMENT_ROOT environment variable in this. Unfortunately, on some shared hosting services it isn't set up correctly during rewrite execution, so the hosting provider sets up a shadow variable to do the same job. Mine uses DOCUMENT_ROOT_REAL, but I've also come across PHP_DOCUMENT_ROOT. Run phpinfo() to find out which one to use on your service.
There's a debug-info rule (the one that copies DOCROOT, URI, REQFN and FILENAME into environment variables) that you can trim, as long as you still set DOCROOT appropriately.
You can't always use %{REQUEST_FILENAME} where you'd expect to. This is because if the URI maps to DOCROOT/somePathThatExists/name/theRest then the %{REQUEST_FILENAME} is set to DOCROOT/somePathThatExists/name rather than the full pattern equivalent to the rule match string.
This is per-directory context, so there are no leading slashes, and we need to realise that the rewrite engine will loop over the .htaccess file until a pass ends without a match.
This processes all valid combinations and, at the very end, internally rewrites to 404.php, which I assume sets the 404 status as well as displaying the error page.
It will currently decode someValidScript.php/otherRubbish in the SEO fashion, but extra logic can pick this one up as well.
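If you are not sure which variable your host provides, one option is to set DOCROOT with a fallback instead of hard-coding DOCUMENT_ROOT_REAL. A minimal sketch, assuming DOCUMENT_ROOT_REAL is the shadow variable's name (substitute whatever phpinfo() shows on your host); it would replace the E=DOCROOT part of the debug rule, and because neither rule has an [L] flag, processing continues with the rules after them:

```apacheconf
# Default: Apache's own document root.
RewriteRule ^ - [E=DOCROOT:%{DOCUMENT_ROOT}]

# Override it only when the host's shadow variable is set and non-empty.
# DOCUMENT_ROOT_REAL is an assumption; use the name your host provides.
RewriteCond %{ENV:DOCUMENT_ROOT_REAL} .
RewriteRule ^ - [E=DOCROOT:%{ENV:DOCUMENT_ROOT_REAL}]
```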
So here is the .htaccess fragment:
Options -Indexes -MultiViews
AcceptPathInfo Off
RewriteEngine On
RewriteBase /
## Looping stop. Not needed from Apache 2.3 on, which introduces the [END] flag
RewriteCond %{ENV:REDIRECT_END} =1
RewriteRule ^ - [L,NS]
## 301 redirections ##
RewriteRule ^ - [E=DOCROOT:%{ENV:DOCUMENT_ROOT_REAL},E=URI:%{REQUEST_URI},E=REQFN:%{REQUEST_FILENAME},E=FILENAME:%{SCRIPT_FILENAME}]
# redirect from HTTP://www to non-www
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]
# remove php file extension on GETs (no point in /[^?\s]+\.php as rule pattern requires this)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_METHOD} =GET
RewriteRule (.*)\.php$ $1/ [L,R=301]
# add trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*[^/]$ $0/ [L,R=301]
# terminate if file exists. Note this match may be after internal redirect.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L,E=END:1]
# terminate if directory index.php exists. Note this match may be after internal redirect.
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{ENV:DOCROOT}/$1/index.php -f
RewriteRule ^(.*?)/?$ $1/index.php [L,NS,E=END:1]
# resolve urls to matching php files
RewriteCond %{ENV:DOCROOT}/$1.php -f
RewriteRule ^(.*?)/?$ $1.php [L,NS,E=END:1]
# Anything else redirect to the 404 script. This one does have the leading /
RewriteRule ^ /404.php [L,NS,E=END:1]
Enjoy :-)
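On Apache 2.4, as the looping-stop comment at the top of the fragment hints, the terminating rules could use the [END] flag instead of setting the END environment variable. A sketch under that assumption, keeping the options, the DOCROOT setup and the redirects from the fragment above unchanged:

```apacheconf
# Apache 2.4 assumed: [END] also stops the extra per-directory pass that
# follows an internal rewrite, so the REDIRECT_END check at the top of the
# fragment can be removed. DOCROOT must still be set earlier in the file.

# Terminate if the requested file exists.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [END]

# Terminate with the directory's index.php if it exists.
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{ENV:DOCROOT}/$1/index.php -f
RewriteRule ^(.*?)/?$ $1/index.php [END,NS]

# Resolve /abc/ to abc.php when that file exists.
RewriteCond %{ENV:DOCROOT}/$1.php -f
RewriteRule ^(.*?)/?$ $1.php [END,NS]

# Anything else goes to the 404 script, which must send the 404 status itself.
RewriteRule ^ /404.php [END,NS]
```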
You'll probably want to check whether the PHP file exists before adding the trailing slash.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^.*[^/]$ /$0/ [L,R=301]
Or, if you really want a trailing slash for all 404 pages (so /images/error.jpg will become /images/error.jpg/, which I think is weird):
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*[^/]$ /$0/ [L,R=301]
I came up with this:
DirectorySlash Off
RewriteEngine on
Options +FollowSymlinks
ErrorDocument 404 /404.php
#if it's www
# redirect to non-www.
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301,QSA]
#else if it has slash at the end, and it's not a directory
# serve the appropriate php
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1.php [L,QSA]
#else if it's an existing file, and it's not php or html
# serve the content without rewrite
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{REQUEST_URI} !\.(php|html?)$
RewriteRule ^ - [L,QSA]
#else
# strip php/html extension, force slash
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.*?)((\.php)|(\.html?))?/?$ /$1/ [L,NC,R=301,QSA]
Certainly not very elegant (ENV:REDIRECT_STATUS is quite a hack), but it passes my modest tests. Unfortunately, I can't test the www redirection, as I'm on localhost and have no real access to a server, but that part should work too.
You see, I used the ErrorDocument directive to specify the error page, and the DirectorySlash Off directive to make sure Apache doesn't interfere with the slash-appending fun. I also used the QSA (Query String Append) flag, which appends the original query string to the rewritten request so that it's not lost. It looks kind of silly after the trailing slash, but anyhow.
Otherwise it's pretty straightforward, and I think the comments explain it pretty well. Let me know if you run into any trouble with it.
Create a folder under the root of the domain.
Place a .htaccess in that folder containing RewriteRule ^$ index.php
Parse the URL.
With PHP code you can then strip the URL or file extension as required.

RewriteRule subtle differences - one and the same?

I'm trying to better understand mod_rewrite, and I've come across some rules that look different but that I think do the same thing. In this case: no existing files or directories, and rewriting to an index.php page.
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule .+ - [L]
Do I need the [OR] or can I leave it off?
What are the differences or advantages of the following rules? I'm currently using the first one, but I've come across the last four in places like WordPress:
#currently using
RewriteRule ^(.+)$ index\.php?$1 [L]
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Do I need the [OR] or can I leave it off?
In this case you need the [OR], because RewriteCond directives are ANDed by default, and a request can't be both a file and a directory (as far as mod_rewrite is concerned).
RewriteRule ^(.+)$ index\.php?$1 [L]
This rewrites every request that isn't for the document root (e.g. http://domain.com/) to index.php, with the path as the query string. Thus a request for http://domain.com/some/path/file.html gets internally rewritten to index.php?some/path/file.html
RewriteRule ^index\.php$ - [L]
This is a rule to prevent rewrite looping. The rewrite engine keeps looping through all the rules until the URI is the same before and after a rewrite iteration (ignoring the query string). If the URI is exactly index.php, this rule stops the current iteration without changing anything (the - means "no substitution" and [L] ends the iteration). The rewrite engine sees that the URI was index.php before going through the rules and is still index.php afterwards, so it stops and all rewriting is done. This prevents mod_rewrite from rewriting things to index.php?index.php, which the first rule would do on the second pass through the rewrite engine if it weren't for this rule (provided this rule comes before it).
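Put in the order the explanation relies on, the guard and the first rule might look like this. A sketch assuming the .htaccess is in the document root; like the question's first rule, it also rewrites requests for existing files:

```apacheconf
# .htaccess in the document root (assumed).
RewriteEngine On

# Guard first: a request that is already for index.php is left alone.
# "-" means "no substitution"; [L] ends this round of rewriting.
RewriteRule ^index\.php$ - [L]

# Then send every non-empty path to index.php as its query string, e.g.
#   /some/path/file.html  ->  index.php?some/path/file.html
# On the next pass the path is "index.php", the guard matches,
# the URI no longer changes, and rewriting stops.
RewriteRule ^(.+)$ index.php?$1 [L]
```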
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
This is the catch-all: if the first rule never gets applied and the request isn't for an existing file or directory, the request is sent to index.php. In this case, though, it looks like this rule will never get applied, since the first rule already matches every non-empty path.
EDIT:
Is there a way to ignore a certain rule if a condition is true? For example, www.domain.com/some/path > index.php?some/path, but if the URI is www.domain.com/this/path > no rewrite?
You'd have to add 2 conditions: one that checks that the requested host isn't "www.domain.com" and one that checks that the URI isn't "/this/path". With [OR] between them, the rule still applies unless both match:
RewriteCond %{HTTP_HOST} !^(www\.)?domain\.com$ [NC,OR]
RewriteCond %{REQUEST_URI} !^/this/path
The [NC] flag makes the match case-insensitive, so when someone enters the URL http://WWW.domain.com/ in their address bar, it will match (or in this case, not match). The second condition is true when the URI doesn't start with "/this/path", so requests for http://domain.com/this/path/file.html fail it and do NOT get rewritten. If you want to exclude exactly "/this/path", then the regular expression needs to be !^/this/path$.
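Put together with the index.php loop guard from the question, the exclusion might look like this. A sketch assuming the .htaccess is in the document root and that domain.com and /this/path are the placeholders from the question:

```apacheconf
# .htaccess in the document root (assumed).
RewriteEngine On

# Leave requests that are already for index.php alone.
RewriteRule ^index\.php$ - [L]

# Rewrite to index.php UNLESS the host is (www.)domain.com AND the path
# starts with /this/path: two negated conditions joined by OR are the same
# as NOT (host matches AND path matches).
RewriteCond %{HTTP_HOST} !^(www\.)?domain\.com$ [NC,OR]
RewriteCond %{REQUEST_URI} !^/this/path
RewriteRule ^(.+)$ index.php?$1 [L]
```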
Why not use [OR] in the final block between !-f and !-d?
This is the logical negation of -f OR -d: "if the file exists, don't rewrite, OR if the directory exists, don't rewrite" turns into "if the file doesn't exist, AND if the directory doesn't exist, then rewrite"
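The equivalence can be seen by writing the rule both ways. This is a sketch for the document root; the two variants are alternatives, so the second one is commented out:

```apacheconf
RewriteEngine On

# Variant A: -f OR -d. Pass existing files and directories through untouched...
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule .+ - [L]
# ...and send everything else to index.php.
RewriteRule . /index.php [L]

# Variant B (use INSTEAD of variant A): NOT -f AND NOT -d, so
# no [OR] is needed, because the conditions are ANDed by default.
# RewriteCond %{REQUEST_FILENAME} !-f
# RewriteCond %{REQUEST_FILENAME} !-d
# RewriteRule . /index.php [L]
```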

mod_rewrite with .htaccess to spoof subdirectory

I have a Django app running on a subdomain, subdomain.domain.com/appname, but I don't want the app name to show up in any of my URLs. I have accomplished this via .htaccess:
RewriteEngine On
RewriteCond %{REQUEST_URI} !admin
RewriteCond %{REQUEST_URI} !appname
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /appname/$1 [L]
This accomplishes the case where the requested URL is subdomain.domain.com/home and it is served from subdomain.domain.com/appname/home.
However, I'd also like to accomplish the reverse, where the requested url is subdomain.domain.com/appname/home, and the displayed url changes to subdomain.domain.com/home, which then triggers the rule above and is served from subdomain.domain.com/appname/home
I tried the following, but got an error saying I have a loop:
RewriteEngine On
RewriteCond %{REQUEST_URI} appname
RewriteRule ^appname/(.*)$ /$1 [N,R=301]
RewriteCond %{REQUEST_URI} !admin
RewriteCond %{REQUEST_URI} !appname
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /appname/$1 [L]
Try it without the 'N' flag. From the mod_rewrite documentation:
'next|N' (next round)
Re-run the rewriting process (starting again with the first rewriting rule). This time, the URL to match is no longer the original URL, but rather the URL returned by the last rewriting rule. This corresponds to the Perl next command or the continue command in C. Use this flag to restart the rewriting process - to immediately go to the top of the loop.
Be careful not to create an infinite loop!
http://httpd.apache.org/docs/current/mod/mod_rewrite.html
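Removing [N] alone may not be enough if the .htaccess is also applied to /appname/...: after the internal rewrite to /appname/home, the request runs through the rules again and the first rule would redirect it back to /home. One way to break that cycle is to redirect only the client's original request, using REDIRECT_STATUS, which is empty before any internal rewrite and set (usually to 200) afterwards. A sketch under that assumption, using 302 while testing:

```apacheconf
RewriteEngine On

# Externally redirect /appname/... to /... only for the client's original
# request. After the internal rewrite below, REDIRECT_STATUS is set, so this
# rule is skipped and the redirect can't bounce back and forth.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^appname/(.*)$ /$1 [R=302,L]

# Internally serve everything else from /appname/, as before.
RewriteCond %{REQUEST_URI} !admin
RewriteCond %{REQUEST_URI} !appname
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /appname/$1 [L]
```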
