.htaccess URL rewriting challenge - .htaccess

I'm having trouble with some URL rewriting.
Everything below works fine, but I need to add a rule that removes query strings from URLs.
site.com/page?a=b
will become
site.com/page
Can someone help out? I have done some reading on .htaccess but I find it terribly complex. Also, will need to know where in the file my new directives should appear.
Thanks.
# EE 404 page for missing pages
ErrorDocument 404 /index.php/404/index
# Simple 404 for missing files
ErrorDocument 404 "File Not Found"
# Rewriting will likely already be on, uncomment if it isnt
RewriteEngine On
RewriteBase /
# Block access to "hidden" directories whose names begin with a period. This
# includes directories used by version control systems such as Subversion or Git.
RewriteRule "(^|/)\." - [F]
# remove the www - Uncomment to activate
#
# RewriteCond %{HTTPS} !=on
# RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
#
# Remove the trailing slash to paths without an extension
# Uncomment to activate
#
# RewriteRule ^(.*)/$ /$1 [R=301,L]
#
# Remove index.php
# Uses the "include method"
# http://expressionengine.com/wiki/Remove_index.php_From_URLs/#Include_List_Method
#
RewriteCond %{QUERY_STRING} !^(ACT=.*)$ [NC]
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5})$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} ^/(home|inc|publishers|sidebars|about|include-template|testing|advertisers|products|sitemap|style|ad-choices|social-bar|search|404||members|P[0-9]{2,8}) [NC]
RewriteRule (.*) /index.php/$1 [L]

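This removes the query string from the URL. Ending the substitution with a bare ? replaces the existing query string with an empty one. Note that Apache does not allow a comment on the same line as a directive, so the comment goes on its own line:
# remove query string
RewriteRule ^(.*) /index.php/$1? [L]
Hope it helps.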

I'm a little late with an answer, but since I was searching for similar behaviour, I thought I should share it: you can also add a flag to a rewrite rule to remove the query string. The flag is [QSD] (query string discard), and it avoids workarounds like the ? at the end ;-)
The Apache mod_rewrite flags documentation describes this flag in more detail. I feel like pointing out that "This flag is available in [Apache] version 2.4.0 and later".
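A minimal sketch of the [QSD] approach applied to the question, assuming Apache 2.4+ and that the block is placed near the top of the file, after the hidden-directory rule and before the "Remove index.php" rules. The ACT exclusion mirrors the condition already in the question's file; whether any other query strings must survive is an assumption to check against your own site.

```apache
# Externally redirect any request that carries a query string
# to the same path without it. [QSD] requires Apache 2.4.0 or later.
RewriteCond %{QUERY_STRING} .
# Assumption: ExpressionEngine ACT requests keep their query string,
# as in the existing index.php rule.
RewriteCond %{QUERY_STRING} !^ACT= [NC]
# 302 while testing; switch to 301 once the behaviour is confirmed.
RewriteRule ^(.*)$ /$1 [QSD,R=302,L]
```

Because this is an external redirect, the browser's address bar changes from site.com/page?a=b to site.com/page, which the purely internal rewrite with a trailing ? does not do.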

Related

Redirecting all urls (including existing files) to the same page results in a 500 Internal Server Error

This is my current .htaccess file:
RewriteEngine on
# remove trailing slash
RewriteRule (.*)(/|\\)$ $1 [R]
# everything
RewriteRule ^(.*?)$ /handler.php?url=$1 [L,QSA]
However, this doesn't work, it throws a 500 Internal Server Error
My previous .htaccess file looked like this:
RewriteEngine on
# remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*)(/|\\)$ $1 [R]
# everything
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*?)$ /handler.php?url=$1 [L,QSA]
And it worked, except for specific files. However, now I'd like the specific files to redirect into the handler as well. Is there a way to use RewriteRules without the RewriteConds?
Without RewriteCond for file check you can tweak your regex like this:
RewriteEngine on
RewriteBase /
# remove trailing slash for non directories
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+?)/$ $1 [R,L]
# every request not for handler.php
RewriteRule ^((?!handler\.php$).*)$ handler.php?url=$1 [L,QSA]
With the help of CBroe and Sumurai8, I was able to fix the problem on my own. The problem was not that you can't have RewriteRules without RewriteConds, but that if you rewrite every url into a single file, it will rewrite requests to that specific file to itself once more, creating an infinite loop.
The new .htaccess:
RewriteEngine on
# remove trailing slash
RewriteRule (.*)(/|\\)$ $1 [R,L]
# everything
RewriteCond %{REQUEST_URI} !^/handler\.php$
RewriteRule ^(.*?)$ /handler.php?url=$1 [L,QSA]
Relevant resource: Request exceeded the limit of 10 internal redirects
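On Apache 2.3.9 or later, the same loop can also be avoided with the [END] flag instead of an exclusion condition. This is only a sketch, assuming the .htaccess file sits in the document root and that handler.php is the single entry point, as in the question:

```apache
RewriteEngine On

# Remove a trailing slash with an external redirect
RewriteRule ^(.+)/$ /$1 [R,L]

# Route every request to the handler. Unlike [L], [END] stops all further
# rewrite processing for this request, so the rewritten /handler.php
# request is not passed through these rules a second time.
RewriteRule ^(.*)$ /handler.php?url=$1 [END,QSA]
```

With [L] alone, per-directory rules are re-run after each internal rewrite, which is what produced the "exceeded the limit of 10 internal redirects" error.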

Htaccess - Rewrite engine (reverse engineering a line of code)

On a site I'm working on, if you enter the url, plus 1 directory, the htaccess adds a trailing slash.
So, this: http://www.mysite.com/shirts
Becomes this: http://www.mysite.com/shirts/
The htaccess that runs the site is quite long and complex, so it's not easy to find or test which rule is causing the rewrite. I was able to track down the issue to this line of code (I think):
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
Does this rule match the behavior I'm describing above? It seems to be the cause, but it doesn't make logical sense to me. I don't understand where the trailing slash is coming from.
Can someone shed some light on this for me? Thanks in advance.
Edit: MORE:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
By default, Apache adds the trailing / when the requested path maps to a directory. To disable that, use:
DirectorySlash Off
This behavior comes from mod_dir; its documentation covers the DirectorySlash directive in more detail.
However, if you're trying to remove the / to fix images not showing, that is not the right way to do it. Use the HTML base tag instead, for example:
<BASE href="http://www.yourdomain.com/">
The HTML documentation for the base element explains how it works.
Your current rule as you have updated on your question:
RewriteCond %{HTTP_HOST} ^mysite\.com$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
Means:
if domain on the URL is only mysite.com
redirect current URL to domain with www.
So an example of it would be, if you access:
http://domain.com/blog/some_blog_article
It will redirect the user to:
http://www.domain.com/blog/some_blog_article
Note how it retains everything and only adds the www. to the domain.
If you really want to add the trailing slash yourself regardless, here is one way to do it:
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
# check if it is a directory
RewriteCond %{REQUEST_FILENAME} -d
# check if the ending `/` is missing and redirect with slash
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]
# if file or directory does not exist
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# and we still want to append the `/` at the end
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]

Cyclic redirection in .htaccess

I have the following .htaccess file:
AddDefaultCharset utf-8
RewriteEngine on
Options +SymLinksIfOwnerMatch
RewriteBase /
# redirect all www-requests to no-www
# -
RewriteCond %{HTTP_HOST} ^www\.site\.com$ [NC]
RewriteRule ^(.*)$ http://site.com/$1 [R=301,L]
# redirect all home pages to / (root)
# -
RewriteCond %{THE_REQUEST} ^.*/index\.(php|html?)
RewriteRule ^(.*)index\.(php|html?)$ /$1 [R=301,L]
# remove trailing slash from dirs
# -
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)/$ /$1 [R=301,L]
# automatically add index.php when needed
# -
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond $1 !^(index\.php|login\.php|reg\.php|robots\.txt|css/|js/)
RewriteRule ^(.*)$ /index.php [L]
The .htaccess file should do the following (for SEO):
Conversion to no-www (http://www.site.com should become http://site.com)
All URIs with trailing slashes should convert to no-trailing-slash: http://site.com/me/ should be redirected http://site.com/me
All URIs with index.php/index.html should convert to just nothing: http://site.com/admin/index.php or http://site.com/admin/ should be eventually displayed as http://site.com
However the current version of .htaccess results in a cyclic redirection when trying to access (http://site.com/admin). The real document that should be fetched by browser is http://site.com/admin/index.php.
Can anyone please help me with this issue?
There's a module called mod_dir that's loaded by default, and it redirects requests for directories that are missing the trailing slash to the same URL with a trailing slash. You can turn this off using the DirectorySlash directive, but note the security warning in the documentation: with the redirect turned off, a request without the trailing slash no longer loads the default index and, if mod_autoindex is active, can list the directory contents instead, which is an information disclosure issue. Your last rule looks like it tries to load the index itself, though incorrectly.
First, turn off the DirectorySlash
DirectorySlash Off
Then you need to change the last rule to:
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_FILENAME}/index.php -f
RewriteRule ^(.*)$ /$1/index.php [L]
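Put together with the rest of the question's file, the result might look like the sketch below. It assumes the file lives in the document root. It also uses ^(.+)$ instead of ^(.*)$ in the final rule, which is my own adjustment so that a request for the site root is left to mod_dir's normal DirectoryIndex handling instead of being rewritten to //index.php.

```apache
AddDefaultCharset utf-8
Options +SymLinksIfOwnerMatch
DirectorySlash Off
RewriteEngine On
RewriteBase /

# www -> no-www
RewriteCond %{HTTP_HOST} ^www\.site\.com$ [NC]
RewriteRule ^(.*)$ http://site.com/$1 [R=301,L]

# Strip index.php / index.html, but only when the client asked for it
# explicitly (THE_REQUEST is the original request line)
RewriteCond %{THE_REQUEST} ^.*/index\.(php|html?)
RewriteRule ^(.*)index\.(php|html?)$ /$1 [R=301,L]

# Remove the trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)/$ /$1 [R=301,L]

# Directory requested without a trailing slash: serve its index.php
# through an internal rewrite, so the browser URL stays slash-free
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_FILENAME}/index.php -f
RewriteRule ^(.+)$ /$1/index.php [L]
```

A request for /admin no longer loops: mod_dir does not add the slash back, and the last rule is internal, so the index-stripping redirect (which checks the original request line) never sees index.php.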

htaccess rewrite/redirect if last character is NOT a digit or slash

I have a site where I have a htaccess rule set to take the entire url, and forward it to my index file, using the below rule, with everything working fine.
#################################
# Magic Re-Writes DO NOT CHANGE #
#################################
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine on
#RewriteBase /
# Do Not apply if a specific file or folder exists
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# The rules on how to rewrite the urls
RewriteRule (.*) /index.php?url=$1 [QSA,L]
</IfModule>
So the above rule forwards http://mydomain.com/players/scoresheet/singlegame
to
http://mydomain.com/index.php?url=players/scoresheet/singlegame
However, I also need to cater for people forgetting the trailing slash in the URL. That is normally straightforward, but I need to force the final trailing slash ONLY if the last character is not numerical (or already a slash, obviously).
For Example, someone types;
http://mydomain.com/players/scoresheet/singlegame
I need the url in the browser to show as: http://mydomain.com/players/scoresheet/singlegame/
but still be forwarded to: http://mydomain.com/index.php?url=players/scoresheet/singlegame/
As I said, the exceptions are when the URL already ends with a slash or with a numerical digit.
(Hope that makes sense)
OK, here's what I have so far...
#######################################
# Add trailing slash to url #
# unless last character is a number #
#######################################
<IfModule mod_rewrite.c>
RewriteEngine on
Rewritecond %{REQUEST_URI} [^0-9/]$
RewriteRule ^(.*)$ /$1/ [R=301,L]
</IfModule>
#################################
# Magic Re-Writes DO NOT CHANGE #
#################################
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine on
RewriteBase /
# Do Not apply if a specific file or folder exists
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# The rules on how to rewrite the urls
RewriteRule (.*) /index.php?url=$1 [QSA,L]
</IfModule>
The problem with this is that although it does add the slash to the URL, it also adds the index.php, so what I end up with is:
Visit: http://mydomain.com/players/scoresheet/singlegame
URL gets rewritten to: http://mydomain.com/index.php?url=players/scoresheet/singlegame/
The slash is added, but I need it to do so without displaying the index part.
I have gone backwards and forwards, with many different outcomes (usually outright failures, or loops).
Any help would be appreciated
Your rule is correct, but it's blindly redirecting even when it's not supposed to. The URL that you have above is probably not what it's getting rewritten to. You have it as:
http://mydomain.com/index.php?url=players/scoresheet/singlegame/
But I'm willing to bet it's really something like:
# note the slash here--------v
http://mydomain.com/index.php/?url=players/scoresheet/singlegame/
Because after the URI is internally rewritten and routed to /index.php, the rewrite engine loops again and the redirect catches it, and redirects /index.php to /index.php/. So you need to add the same exclusion conditions that you have in your routing rule:
So change:
Rewritecond %{REQUEST_URI} [^0-9/]$
RewriteRule ^(.*)$ /$1/ [R=301,L]
to either:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
Rewritecond %{REQUEST_URI} [^0-9/]$
RewriteRule ^(.*)$ /$1/ [R=301,L]
or:
RewriteCond %{REQUEST_URI} !index.php
Rewritecond %{REQUEST_URI} [^0-9/]$
RewriteRule ^(.*)$ /$1/ [R=301,L]
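For context, here is a sketch of how the first variant fits together with the routing rule from the question in a single block. It assumes the file sits in the document root; only the two extra conditions on the redirect are new.

```apache
<IfModule mod_rewrite.c>
    Options +FollowSymlinks
    RewriteEngine On
    RewriteBase /

    # 1. External redirect: append "/" unless the path already ends in a
    #    digit or "/". The !-f / !-d checks stop it from firing on
    #    /index.php after the internal rewrite below has happened.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_URI} [^0-9/]$
    RewriteRule ^(.*)$ /$1/ [R=301,L]

    # 2. Internal rewrite: hand everything else to the front controller
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule (.*) /index.php?url=$1 [QSA,L]
</IfModule>
```

The redirect has to come first: the browser is sent to the slash-terminated URL, requests it again, and only that second request is routed internally to index.php.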

mod rewrite to remove file extension, add trailing slash, remove www and redirect to 404 if no file/directory is available

I would like to create rewrite rules in my .htaccess file to do the following:
When accessed via domain.com/abc.php: remove the file extension, append a trailing slash and load the abc.php file. url should look like this after rewrite: domain.com/abc/
When accessed via domain.com/abc/: leave the url as is and load abc.php
When accessed via domain.com/abc: append trailing slash and load abc.php. url should look like this after rewrite: domain.com/abc/
Remove www
Redirect to 404 page (404.php) when accessed url doesn't resolve to folder or file, e.g. when accessing either domain.com/nothingthere.php or domain.com/nothingthere/ or domain.com/nothingthere
Make some permanent 301 redirects from old urls to new ones (e.g. domain.com/abc.html to domain.com/abc/)
All php files sit in the document root directory, but if there is a solution that would make urls such as domain.com/abc/def/ (would load domain.com/abc/def.php) also work it would be great as well, but not necessary
So here is what I have at the moment (thrown together from various sources and samples from around the web):
<IfModule mod_rewrite.c>
RewriteCond %{HTTPS} !=on
# redirect from www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]
# remove php file extension
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{THE_REQUEST} ^GET\ /[^?\s]+\.php
RewriteRule (.*)\.php$ /$1/ [L,R=301]
# add trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*[^/]$ /$0/ [L,R=301]
# resolve urls to matching php files
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*)/$ $1.php [L]
</IfModule>
With this the first four requirements seem to work, whether I enter domain.com/abc.php, domain.com/abc/ or domain.com/abc, the final url always ends up being domain.com/abc/ and domain.com/abc.php is loaded.
When I enter a url that resolves to a file that doesn't exist I'm getting an error 310 (redirect loop), when really a 404 page should be loaded. Additionally I haven't tried if subfolders work, but as I said, that's low priority. I'm pretty sure I can just slap the permanent 301 redirects for legacy urls on top of that without any issues as well, just wanted to mention it. So the real issue is really the non-working 404 page.
I've had problems with getting ErrorDocument to work reliably with rewrite errors, so I tend to prefer to handle invalid pages correctly in my rewrite cascade. I've tried to cover a full range of test vectors with this. Didn't find any gaps.
Some general points:
You need to use the DOCUMENT_ROOT environment variable in this. Unfortunately if you use a shared hosting service then this isn't set up correctly during rewrite execution, so hosting providers set up a shadow variable to do the same job. Mine uses DOCUMENT_ROOT_REAL, but I've also come across PHP_DOCUMENT_ROOT. Do a phpinfo to find out what to use for your service.
There's a debug info rule that you can trim as long as you replace DOCROOT appropriately
You can't always use %{REQUEST_FILENAME} where you'd expect to. This is because if the URI maps to DOCROOT/somePathThatExists/name/theRest then the %{REQUEST_FILENAME} is set to DOCROOT/somePathThatExists/name rather than the full pattern equivalent to the rule match string.
This is "Per Directory" so no leading slashes and we need to realise that the rewrite engine will loop on the .htaccess file until a no-match stop occurs.
This processes all valid combinations and at the very end redirects to the 404.php which I assume sets the 404 Status as well as displaying the error page.
It will currently decode someValidScript.php/otherRubbish in the SEO fashion, but extra logic can pick this one up as well.
So here is the .htaccess fragment:
Options -Indexes -MultiViews
AcceptPathInfo Off
RewriteEngine On
RewriteBase /
## Looping stop. Not needed in Apache 2.3.9 and later, which introduce the [END] flag
RewriteCond %{ENV:REDIRECT_END} =1
RewriteRule ^ - [L,NS]
## Debug info. Can be trimmed, but DOCROOT is used by the rules further down
RewriteRule ^ - [E=DOCROOT:%{ENV:DOCUMENT_ROOT_REAL},E=URI:%{REQUEST_URI},E=REQFN:%{REQUEST_FILENAME},E=FILENAME:%{SCRIPT_FILENAME}]
## 301 redirections ##
# redirect from HTTP://www to non-www
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]
# remove php file extension on GETs (no point in /[^?\s]+\.php as rule pattern requires this)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_METHOD} =GET
RewriteRule (.*)\.php$ $1/ [L,R=301]
# add trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*[^/]$ $0/ [L,R=301]
# terminate if file exists. Note this match may be after internal redirect.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L,E=END:1]
# terminate if directory index.php exists. Note this match may be after internal redirect.
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{ENV:DOCROOT}/$1/index.php -f
RewriteRule ^(.*)(/?)$ $1/index.php [L,NS,E=END:1]
# resolve urls to matching php files
RewriteCond %{ENV:DOCROOT}/$1.php -f
RewriteRule ^(.*?)/?$ $1.php [L,NS,E=END:1]
# Anything else redirect to the 404 script. This one does have the leading /
RewriteRule ^ /404.php [L,NS,E=END:1]
Enjoy :-)
You'll probably want to check if the php file exists before adding the trailing slash.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^.*[^/]$ /$0/ [L,R=301]
or if you really want a trailing slash for all 404 pages (so /images/error.jpg will become /images/error.jpg/, which I think is weird):
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*[^/]$ /$0/ [L,R=301]
I came up with this:
DirectorySlash Off
RewriteEngine on
Options +FollowSymlinks
ErrorDocument 404 /404.php
#if it's www
# redirect to non-www.
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301,QSA]
#else if it has slash at the end, and it's not a directory
# serve the appropriate php
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1.php [L,QSA]
#else if it's an existing file, and it's not php or html
# serve the content without rewrite
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{REQUEST_URI} !\.(php|html?)$
RewriteRule ^ - [L,QSA]
#else
# strip php/html extension, force slash
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.*?)((\.php)|(\.html?))?/?$ /$1/ [L,NC,R=301,QSA]
Certainly not very elegant (env:redirect_status is quite a hack), but it passes my modest tests. Unfortunately I can't test the www redirection, as I'm on localhost and have no real access to a server, but that part should work too.
As you can see, I used the ErrorDocument directive to specify the error page, and the DirectorySlash Off directive to make sure Apache doesn't interfere with the slash-appending fun. I also used the QSA (Query String Append) flag so that the query string is not lost. Strictly speaking, mod_rewrite passes the query string through unchanged whenever the substitution contains no ? of its own, so the flag is redundant here. The query string looks kind of silly after the trailing slash, but anyhow.
Otherwise it's pretty straightforward, and I think the comments explain it pretty well. Let me know if you run into any trouble with it.
Create a folder under the root of the domain.
Place a .htaccess file in that folder containing RewriteRule ^$ index.php
Parse the URL in index.php.
With PHP code you can then strip the URL or file extension as required.
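The steps above could be sketched as the following .htaccess for that folder. This is an assumption about what the answer intends: the rule as written (^$) only matches a request for the folder itself, so the sketch widens it to every path that is not an existing file, leaving index.php to inspect the requested URL.

```apache
RewriteEngine On

# Leave real files (images, CSS, index.php itself) alone
RewriteCond %{REQUEST_FILENAME} !-f

# Send everything else in this folder to index.php, which can read the
# originally requested URL (for example from $_SERVER['REQUEST_URI'])
# and decide what to serve or whether to answer with a 404
RewriteRule ^ index.php [L]
```

The !-f condition also prevents a loop: once the request has been rewritten to index.php, the file exists and the rule no longer applies.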
