Surprising rewriting of URL by htaccess rule - .htaccess

I've narrowed down my problem and have a specific question.
With only the following code in the .htaccess, why does index2.php get called if I type in my URL as www.mysite.com/url2?
RewriteEngine On
RewriteCond %{REQUEST_URI} (.html|.htm|.feed|.pdf|.raw)$ [NC]
RewriteRule (.*) index2.php [L]
I've also tested it at http://www.regextester.com, and it should not be replaced with index2.php.
In the end I want this rule to skip any URL starting with /url2 or /url2/*.
EDIT: I've made a screen recording of this problem: http://screenr.com/BBBN

You have this in your .htaccess:
RewriteEngine On
RewriteCond %{REQUEST_URI} (.html|.htm|.feed|.pdf|.raw)$ [NC]
RewriteRule (.*) index2.php [L]
What does it do? It rewrites anything that ends with html, htm, feed, pdf or raw to index2.php. So if index2.php is served even though your URL does not appear to end with one of those extensions, there are two possible explanations:
There is another rewrite rule in an .htaccess file in a parent directory (or in the server config files) that causes the URL to be rewritten.
Your URL actually does end with one of those extensions. Keep in mind that what you enter in your address bar can be changed and rewritten. For example, if you enter www.mysite.com/url2 in your address bar and that file doesn't exist on the server, your server will try to load the proper error document. So, if your error document is /404.html, it will be rewritten to index2.php in the end.
Update:
I think that is the case here. Create a file named 404.php in your document root. Inside your main .htaccess (in your document root), put this:
ErrorDocument 404 /404.php
Delete all other ErrorDocument directives.
Inside 404.php, put this:
<?php
echo 'From 404.php file';
?>
Logic behind it:
When mod_rewrite behaves strangely, the best solution in my experience is to use the rewrite log. To enable it, put this in your virtual host or in whichever other server config section you choose:
RewriteLogLevel 9
RewriteLog "logs/RewriteLog.log"
Be careful: the code above enables the rewrite log and starts logging at the highest level possible (logging everything). It will slow your server down, and the log file will become huge very quickly. Do this only on your dev server.
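Note that RewriteLog and RewriteLogLevel only exist up to Apache 2.2. In Apache 2.4 they were removed, and rewrite tracing is enabled through the per-module LogLevel syntax instead. A minimal sketch, assuming Apache 2.4 and access to the server or virtual host configuration (the trace level chosen here is just an example):

```apache
# Apache 2.4+: mod_rewrite trace output goes to the regular ErrorLog.
# Put this in the server or virtual host config, not in .htaccess.
# trace1 is the least verbose level, trace8 the most.
LogLevel alert rewrite:trace3
```

The same caution applies: higher trace levels slow the server down and make the log grow quickly.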
Explanation: When you try to access www.mysite.com/url2, Apache hands your URL to the rewrite module. The rewrite module checks whether any RewriteRule applies to your URL. Because you have one rule and it doesn't apply to your URL, Apache tries to load the normal file. But this file does not exist, so Apache takes the next step, which is showing the proper error message. When you set a custom error document, Apache runs the rules against the new address. For example, if the error document is /404.html, Apache checks whether your rule applies to /404.html or not. Since it does, it rewrites it.
The point to remember is that Apache will do this every time there is a change in the URL, whether the change is made by the rewrite module or not!
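One way to keep the error document out of the rule is to exclude it explicitly. This is only a sketch: it assumes the error document is /404.html, as in the example above, and that this .htaccess sits in the document root.

```apache
RewriteEngine On
# Skip the (assumed) error document so the internal redirect to it is not rewritten.
RewriteCond %{REQUEST_URI} !^/404\.html$
# Escaped dot: only real extensions match.
RewriteCond %{REQUEST_URI} \.(html|htm|feed|pdf|raw)$ [NC]
RewriteRule (.*) index2.php [L]
```

Consecutive RewriteCond lines are combined with AND by default, so the rule only fires when both conditions hold.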

The rule you list should work as you expect if this is the only rule. Fact is that theory is fun, but apparently it doesn't work as expected. Please note that . will match ANY CHARACTER. If you want to match the full stop/period character, you'll need to escape it. That's why I use \.(html|htm|feed|pdf|raw)$ instead of (.html|.htm|.feed|.pdf|.raw)$ below.
You can add another RewriteCond that simply doesn't match if the url starts with /url2, like below. This might not be a viable solution if there are lots of urls that shouldn't be matched.
RewriteCond %{REQUEST_URI} !^/url2
RewriteCond %{REQUEST_URI} \.(html|htm|feed|pdf|raw)$ [NC]
RewriteRule (.*) index2.php [L]
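Note that !^/url2 also excludes URLs that merely begin with the same characters, such as a hypothetical /url2-old.html. If only /url2 itself and anything below /url2/ should be skipped, as the question asks, the condition can be tightened. A sketch, not tested against the asker's setup:

```apache
# Excludes /url2 and /url2/..., but not /url2-old.html
RewriteCond %{REQUEST_URI} !^/url2(/|$)
RewriteCond %{REQUEST_URI} \.(html|htm|feed|pdf|raw)$ [NC]
RewriteRule (.*) index2.php [L]
```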
To get a better understanding of what is happening, you can alter the rule to something like this. Now simply enter the URLs you don't want to be matched in the URL bar and inspect the URL bar after the redirect happens. In the url parameter you now see which URL actually triggered this rule to match. This screencast shows you a similar version working with a sneaky RewriteRule that is working away on the URL.
#A way of finding out what is -actually- matched
RewriteCond %{REQUEST_URI} \.(html|htm|feed|pdf|raw)$ [NC]
RewriteCond %{REQUEST_URI} !/foo
RewriteRule (.*) /foo?url=$1 [R,L]
You can decide to match the %{THE_REQUEST} variable instead. This will always contain the request itself. If something else is rewriting the url, this variable doesn't change, meaning you can use this to overwrite any changes. Make sure the url won't be matching itself. You would get something like below. An example screencast can be found here.
#If it doesn't end on .html/htm/feed etc, this one won't match
RewriteCond %{THE_REQUEST} ^(GET|POST)\ /.*\.(html|htm|feed|pdf|raw)\ HTTP [NC]
RewriteCond %{REQUEST_URI} !^/index2\.php$
RewriteRule (.*) /index2.php [L]

Related

Targeting single directory for rewrite rule

I have edited this question to use the actual URLs. I need the url
http://westernmininghistory.com/mine_db/main.php?page=mine_detail&dep_id=10257227
To be rewritten like
http://westernmininghistory.com/mine_detail/10257227/
I have tried
RewriteRule ^([^/]*)/([^/]*)/$ /mine_db/main.php?page=$1&dep_id=$2 [L]
Which works on this page but breaks every other page on the site. I was wondering if there was a way to force the rewriterule to only operate on files within the mine_db directory. I had tried RewriteCond but with no success:
#RewriteCond %{REQUEST_URI} ^/mine_db
I really don't know the proper syntax for this though. Any ideas?
First of all, your rule can be shortened and written without needing a RewriteCond. Also, it appears that you want to capture 2 variables after test_db.
You can try this rule instead:
RewriteRule ^(mine_detail)/([0-9]+)/?$ /mine_db/main.php?page=$1&dep_id=$2 [QSA,L,NC]
This will work with URIs like /mine_detail/12345 (the trailing slash is optional). Also note that the rewrite above happens silently (internally), without changing the URL in the browser. If you want to change the URL in the browser, use the R flag as well, like this:
RewriteRule ^(mine_detail)/([0-9]+)/?$ /mine_db/main.php?page=$1&dep_id=$2 [QSA,L,NC,R]
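If the goal is rather to send visitors who still use the old query-string URL to the short form, a redirect in the opposite direction can sit in front of the internal rewrite. The following is a sketch under the assumption that the .htaccess is in the document root. Matching on THE_REQUEST (the original request line) keeps the redirect from firing on the internally rewritten URL.

```apache
# External redirect: /mine_db/main.php?page=mine_detail&dep_id=N -> /mine_detail/N/
RewriteCond %{THE_REQUEST} \s/mine_db/main\.php\?page=mine_detail&dep_id=([0-9]+)\s [NC]
# %1 is the dep_id captured by the RewriteCond; the trailing ? drops the old query string.
RewriteRule ^mine_db/main\.php$ /mine_detail/%1/? [R=301,L]

# Internal rewrite from the answer above
RewriteRule ^(mine_detail)/([0-9]+)/?$ /mine_db/main.php?page=$1&dep_id=$2 [QSA,L,NC]
```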

htaccess redirect and mask

How would I redirect from the root folder to a sub folder and then mask that folder?
So instead of http://root.com/sub_folder
It would be just http://root.com
I have tried:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^root\.com$
RewriteRule (.*) http://root.com/$1 [R=301,L]
RewriteRule ^$ /sub [L]
However, that does not work. Any help will be welcome.
To clarify what I think you're looking for:
You want users who enter http://root.com with no trailing path to be rewritten silently to http://root.com/sub.
If a user directly enters http://root.com/sub, however, you want them to be redirected to http://root.com.
Any other path within root.com should be left alone.
The following two rules accomplish this. If you have more than one domain and only want this to apply to one domain, add your original RewriteCond in front of each RewriteRule.
RewriteRule ^sub/?$ http://root.com/ [R=301,L]
RewriteRule ^$ /sub [END]
First rule redirects /sub with or without trailing slash to root.com. Second rule rewrites base domain to /sub.
EDIT: Per Jon Lin's comment, below, the [L] flag only stops the current round of processing, and internal rewrites are sent through the rules once more (I always forget that part). So, you can terminate the second line with [END] instead, which stops all rewrite processing. The catch is that [END] is only available in Apache 2.4 or higher, so if you're on an older version something trickier will need to be done.
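For Apache versions without [END], one possible workaround is to make the redirect depend on what the browser actually requested. This sketch assumes the rules live in the document root .htaccess. %{THE_REQUEST} holds the original request line and is not changed by internal rewrites, so the redirect no longer fires after the second rule has rewritten the URL.

```apache
# Redirect only when the client itself asked for /sub or /sub/
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/sub/?[?\s]
RewriteRule ^sub/?$ http://root.com/ [R=301,L]

# Silently serve the sub folder for the bare domain
RewriteRule ^$ /sub/ [L]
```

The target is written as /sub/ with a trailing slash on the assumption that sub is a directory, so that mod_dir has no reason to issue its own trailing-slash redirect.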

Redirect to fallback file if first attempt fails

I have this in my .htaccess:
RewriteRule ^images/([^/\.]+)/(.+)$ themes/current/images/$1/$2 [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^images/([^/\.]+)/(.+)$ modules/$1/images/$2 [L,NC]
The idea is that it does the following:
// Rewrite this...
images/calendar/gear.png
// ... to this
themes/current/images/calendar/gear.png
// HOWEVER, if that rewritten path doesn't exist, rewrite the original URL to this:
modules/calendar/images/gear.png
The only things that change here are calendar and gear.png, the first of which could be any other single word and the latter the file name (possibly with path) to an image file.
I can rewrite the original URL to the first rewrite as shown in the example just fine, but what I cannot do is get my .htaccess to serve up the file from the other, fallback location if the first location 404s. I was under the impression that not using [L] in my first RewriteRule would rewrite the URL for RewriteCond.
The problem I'm having is that instead of serving the fallback file, the browser just shows a 404 for the first rewritten path (themes/current/images/calendar/gear.png), instead of falling back to modules/calendar/images/gear.png. What am I doing wrong?
Please note that my regex isn't perfect, but I can refine that later. Right now I'm concerning myself with the rewrite logic itself.
Fallthrough rules are fraught with bugs. My general recommendation is that any rule with a replacement string other than - should trigger an internal redirect to restart the .htaccess parse. This avoids the subrequest and URI_PATH bugs.
Next, once you go to 404, again in my experience this is unrecoverable. I have a fragment which does something similar to what you are trying to do:
# For HTML cacheable blog URIs (a GET to a specific list, with no query params,
# guest user and the HTML cache file exists) then use it instead of executing PHP
RewriteCond %{HTTP_COOKIE} !blog_user
RewriteCond %{REQUEST_METHOD}%{QUERY_STRING} =GET [NC]
RewriteCond %{ENV:DOCUMENT_ROOT_REAL}/blog/html_cache/$1.html -f
RewriteRule ^(article-\d+|index|sitemap.xml|search-\w+|rss-[0-9a-z]*)$ \
blog/html_cache/$1.html [L,E=END:1]
Note that I do the conditional test in filesystem space and not URI (Location) space. So this would map in your case to
RewriteCond %{DOCUMENT_ROOT}/themes/current/images/$1/$2 -f
RewriteRule ^images/(.+?)/(.+)$ themes/current/images/$1/$2 [L]
Though do a phpinfo() to check whether your hosting provider uses an alternative to DOCUMENT_ROOT if it is a shared hosting offering, e.g. an alternative environment variable, as mine uses DOCUMENT_ROOT_REAL.
The second rule will be picked up on the second processing pass after the internal redirect.
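Put together for the question's paths, the pair of rules might look like the sketch below. It assumes the .htaccess is in the document root and that %{DOCUMENT_ROOT} points at the real filesystem path. $1 and $2 in the RewriteCond refer to the groups of the RewriteRule pattern that follows it.

```apache
# Serve the themed image only if that file exists on disk
RewriteCond %{DOCUMENT_ROOT}/themes/current/images/$1/$2 -f
RewriteRule ^images/([^/.]+)/(.+)$ themes/current/images/$1/$2 [L,NC]

# Otherwise fall back to the module's own image
RewriteRule ^images/([^/.]+)/(.+)$ modules/$1/images/$2 [L,NC]
```

Neither rewritten path starts with images/, so neither rule matches again on the pass that follows the internal redirect.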

Force removal of index.php with .htaccess

I'm currently using the following to rewrite http://www.site.com/index.php/test/ to also work directly with http://www.site.com/test/, but I would like to not only allow the second version, I would like to FORCE the second version. If a user goes to http://www.site.com/index.php/test/ it should immediately reroute them to http://www.site.com/test/. index.php should never appear in a url. Stipulation: this should only apply to the first index.php. If I have a title like http://www.site.com/index.php/2011/06/08/remove-index.php-from-urls/ it should leave the second index.php, as it is part of the URL.
Current rule that allows but does not force:
#Remove index.php
RewriteCond $1 !^(index.php|images|css|js|robots.txt)
RewriteRule ^(.*)$ /index.php/$1 [L]
Thanks.
As you wrote, if a user goes to http://www.site.com/index.php/test/, this rule will immediately reroute them to http://www.site.com/test/:
RedirectMatch 301 /index.php/(.*)/$ /$1
I'm not sure if that is what you need as your current rewrite rule is opposite to mine.
First (and wrong) answer - see below
You can accomplish a redirection with these directives (in this order):
RewriteCond %{REQUEST_URI} ^index.php
RewriteRule ^index\.php/(.+)$ /$1 [R,L]
RewriteCond $1 !^(index.php|images|css|js|robots.txt)
RewriteRule ^(.*?)$ /index.php/$1 [L]
That will first redirect all the requests that begin with index.php to the corresponding shortened url, then silently serve index.php/etc with the second rule.
EDIT - Please read on!
In fact, the solution above generates an infinite redirection loop, because Apache takes the following actions (let's say we request /index.php/abc):
1. The first RewriteCond matches.
2. Apache redirects [R], that is, generates a new HTTP request, to /abc.
3. /abc fails the first RewriteCond.
4. /abc matches the second RewriteCond.
5. Apache does not redirect, but rewrites this URI (so it makes a "hidden" request) to /index.php/abc. We are again at point 1; that's a loop.
Please note...
By using the [L] (last rule) flag, we can only tell Apache not to process more rewrite rules, and only if the current rule matches. Since a new HTTP request is made, there is no information about how many redirections we have been through yet. So every time, one of the two rules matches, and in any case it generates a new request (=> loop).
Using the [C] (chain rules) flag is rather pointless because it makes Apache process a rule only if the previous rule matches, while the two rules we have are mutually exclusive.
Using the [NS] (not if subrequest) flag on rule #1 is again not an option because it simply does not apply to our case (see the Apache RewriteRule docs about it).
Setting env variables is not an option (alas), since a new request is made at pt 2, thus destroying all environment variables we set.
An alternative solution can be to rewrite e.g. /abc , to /index.php?path=abc. That is done by these rules (please, delete your RedirectMatch similar rule before adding these):
RedirectMatch ^/index\.php(/.*) $1
RewriteCond %{REQUEST_URI} !^/(index.php|images|css|js|robots.txt|favicon.ico)
RewriteRule ^(.+) /index.php?path=$1 [L,QSA]
I don't know the internals of CodeIgniter's scripts, but like most MVC scripts, it will probably read $_SERVER['PATH_INFO'] to work out which page is requested. You could slightly modify the code that recognizes the page like this (I assumed that the page path is stored in the $page var):
$page = $_SERVER['PATH_INFO'];
if (isset($_GET['path']) && strlen($_GET['path'])) $page = $_GET['path']; // Add this line
This won't break the previous code and accomplishes what you asked for.
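The redirect loop can also be avoided without touching the PHP code, by redirecting only when index.php appears in the original request line. A sketch, assuming the rules are in the document root .htaccess and are used in place of the rule sets shown earlier:

```apache
# Redirect only if the browser itself requested /index.php/...
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php/
RewriteRule ^index\.php/(.*)$ /$1 [R=301,L]

# Internally route everything else through index.php
RewriteCond $1 !^(index\.php|images|css|js|robots\.txt)
RewriteRule ^(.*)$ /index.php/$1 [L]
```

Because the pattern is anchored at the start of the path, only the first index.php is removed: /index.php/2011/06/08/remove-index.php-from-urls/ is redirected to /2011/06/08/remove-index.php-from-urls/. THE_REQUEST does not change during the internal rewrite, so the redirect does not fire again on the second pass.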

How do I get the [L] flag of RewriteRule (.htaccess) really working?

To newcomers: While trying to comprehensively describe my problem and phrase my questions I produced a huge amount of text. If you don't want to read the whole thing, my observations about the misconception that the [L] flag is not working, from which it all sprang, are located in the Additional observations section. Why I misunderstood the apparent behaviour is described in my Answer, as is the solution to the given problem.
Setup
I have following code in my .htaccess file:
# disallow directory indexing
Options -Indexes
# turn mod_rewrite on
Options +FollowSymlinks
RewriteEngine on
# allow access to robots file
RewriteRule ^robots.txt$ robots.txt [NC,L]
# mangle core request handler address
RewriteRule ^core/(\?.+)?$ core/handleCoreRequest.php$1 [NC,L]
# mangle web file addresses (move them to application root folder)
# application root folder serves as application GUI address
RewriteRule ^$ web/index.html [L]
# allow access to images
RewriteRule ^(images/.+\.(ico|png|bmp|jpg|gif))$ web/$1 [NC,L]
# allow access to stylesheets
RewriteRule ^(css/.+\.css)$ web/$1 [NC,L]
# allow access to javascript
RewriteRule ^(js/.+\.js)$ web/$1 [NC,L]
# allow access to library scripts, styles and images
RewriteRule ^(lib/js/.+\.js)$ web/$1 [NC,L]
RewriteRule ^(lib/css/.+\.css)$ web/$1 [NC,L]
RewriteRule ^(lib/(.+/)?images/.+\.(ico|png|bmp|jpg|gif))$ web/$1 [NC,L]
# redirect all other requests to application address
# RewriteRule ^(.*)$ /foo/ [R]
My web application (and its .htaccess file) is located in foo subfolder of DOCUMENT_ROOT (accessed from browser as http://localhost/foo/). It has PHP core part located in foo/core and JavaScript GUI part located in foo/web. As can be seen from the code above, I want to allow access only to single core script that handles all requests from GUI and to 'safe' web files and redirect all other requests to base application address (last commented directive).
Problem
Behaviour
It works until I try the last part by uncommenting the last redirecting directive. If I comment some more lines, the appropriate page parts stop working, etc.
However, when I uncomment last line, which should be performed only when matching of all previous rules fails (at least that's what I understand), page goes into redirection cycle (Firefox throws error page with something like "This page isn't redirecting properly"), because it's redirecting to http://localhost/foo/ again and again and again, forever.
Questions
What I don't understand is this processing of this rule:
RewriteRule ^$ web/index.html [L],
specifically the [L] flag. The flag apparently doesn't work for me. When the last line is commented, it correctly redirects, but when I uncomment it, it is always processed, even though rewriting should stop on [L] flag. Anyone got any ideas?
Also, on a sidenote, I'd be thrilled to know why my following attempt at fixing it doesn't work either:
RewriteEngine on
RewriteRule ^core/(\?.+)?$ core/handleCoreRequest.php$1 [NC,L]
RewriteRule ^(.*)$ web/$1 [L]
RewriteRule ^.*$ /foo/ [L]
This actually doesn't work at all. Even if I remove the last line, it still doesn't redirect anything correctly. How does the redirecting work in the first example, if it doesn't work in the second?
It would also be of great benefit to me if anybody knew any way to actually debug these directives. I spent hours on this without even the slightest clue what could possibly be wrong.
Additional observations
After trying the advice given by bbadour (not that I haven't tried it before, but now that I had a second opinion, I gave it another shot) and it didn't work, I've come up with the following observation. By rewriting last line to this:
RewriteRule ^(.*)$ /foo/?uri=$1 [R,L]
or this
RewriteRule ^(.*)$ /foo/?uri=%{REQUEST_URI} [R,L]
and using Firebug's Net panel, I found out more evidence, that the [L] flag is clearly not working as expected in the previously mentioned RewriteRule ^$ web/index.html [L] rule (let's call it THE RULE from now on). In first case I get [...]uri=web/index.html, in second case [...]uri=/foo/web/index.html. That means that THE RULE gets executed (rewrites ^$ to web/index.html), but the rewriting doesn't stop there. Any more ideas, please?
After hours of searching and testing, I finally found the real problem and solution. Hopefully this will help somebody else too, when they come across the same problem.
Cause of observed behavior
.htaccess file is processed after every redirect (even without [R] flag),
which means that after the RewriteRule ^$ web/index.html [L] is processed, mod_rewrite correctly stops rewriting, goes to the end of the file, redirects correctly to /foo/web/index.html, and then the server starts processing .htaccess file for the new location, which is the same file. Now only the last rewrite rule matches and redirects back to /foo/ (this time with [R], so the redirect can be observed in browser) ... and the .htaccess file is processed again, and again, and again...
Once more for clarity: Because only the hard redirects can be observed, it seems like the [L] flag is ignored, but it is not so. Instead, the .htaccess is processed two times redirecting back and forth between /foo/ and /foo/web/index.html.
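The second pass can be made visible, and suppressed, with a guard at the top of the rule set. This is a sketch of a commonly used idiom rather than part of the solution described here: Apache sets the REDIRECT_STATUS environment variable on internal redirects, so it is empty only on the first pass. It assumes nothing else on the server performs internal redirects that should still be rewritten.

```apache
RewriteEngine on

# On any internally redirected pass, stop rewriting and keep the URL as it is.
RewriteCond %{ENV:REDIRECT_STATUS} !^$
RewriteRule ^ - [L]

# First pass only: serve the GUI entry page for the application root
RewriteRule ^$ web/index.html [L]

# First pass only: send everything else back to the application root
RewriteRule ^(.*)$ /foo/ [R,L]
```

On Apache 2.4 or later, ending the internal rewrite with [END] instead of [L] serves a similar purpose.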
Solution
Disallow direct access to subfolder
To virtually move subdirectory to application root directory, additional complex conditional rewrites must be used. Variable THE_REQUEST is useful for distinguishing between hard and soft redirects:
RewriteCond %{THE_REQUEST} ^GET\ /foo/web/
RewriteRule ^web/(.*) /foo/$1 [L,R]
For this rewrite rule to be matched, two conditions must apply. First, on second line, the "local URI" must start with web/ (which corresponds with absolute web URI /foo/web/). Second, on first line, the real request URI must start with /foo/web/ too. Together this means, that the rule only matches when the file inside the web/ subfolder is requested directly from the browser, in which case we want to do a hard redirect.
Redirect to allowed content from root to subfolder (soft)
RewriteCond $1 !^web/
RewriteCond $1 ^(.+\.(html|css|js|ico|png|bmp|jpg|gif))?$
RewriteRule ^(.*)$ web/$1 [L,NC]
We want to redirect to allowed content only if we haven't done it already, hence the first condition. Second condition specifies mask for allowed content. Anything matching this mask will be softly redirected, possibly returning 404 error if the content doesn't exist.
Hide all content not in subfolder or not allowed
RewriteRule !^web/ /foo/ [L,R]
This will do a hard redirect to the application root for all URIs not beginning with web/ (and remember, the only requests that can begin with web/ at this point are internal redirects for allowed content).
Real example
My code shown in my "question" after using solution tips mentioned above gradually transformed into the following:
# disallow directory indexing
Options -Indexes
# turn mod_rewrite on
Options +FollowSymlinks
RewriteEngine on
# allow access to robots file
RewriteRule ^robots.txt$ - [NC,L]
# mangle core request handler address
# disallow direct access to core request handler
RewriteCond %{THE_REQUEST} !^(GET|POST)\ /foo/core/handleCoreRequest.php
RewriteRule ^core/handleCoreRequest.php$ - [L]
# allow access to request handler under alias
RewriteRule ^core/$ core/handleCoreRequest.php [NC,QSA,L]
# mangle GUI files addressing (move to application root folder)
# disallow direct access to GUI subfolder
RewriteCond %{THE_REQUEST} ^GET\ /foo/web/
RewriteRule ^web/(.*) /foo/$1 [L,R]
# allow access only to correct filetypes in appropriate locations
RewriteCond $1 ^$ [OR]
RewriteCond $1 ^(images/.+\.(ico|png|bmp|jpg|gif))$ [OR]
RewriteCond $1 ^(css/.+\.css)$ [OR]
RewriteCond $1 ^(js/.+\.js)$ [OR]
RewriteCond $1 ^(lib/js/.+\.js)$ [OR]
RewriteCond $1 ^(lib/css/.+\.css)$ [OR]
RewriteCond $1 ^(lib/(.+/)?images/.+\.(ico|png|bmp|jpg|gif))$
RewriteRule ^(.*)$ web/$1 [L,NC]
# hide all files not in GUI subfolder that are not whitelisted above
RewriteRule !^web/ /foo/ [L,R]
What I don't like about this approach is that the application root folder must be hardcoded in .htaccess file (as far as I know), so the file must be generated on application install, not simply copied.
To debug, try simplifying your regex and the URL you request (a part of the full URL you want to match), and see if it works. Then, step by step, add more bits to the regex and the test URL until you find where things stop working properly.
Try using:
RewriteRule ^(.*)$ /foo/ [R,L]
If it still loops, put a RewriteCond in front of it to skip the rule if it is already /foo/
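Such a condition could look like the sketch below, assuming the application lives under /foo/ as in the question. Note that it only guards against the externally visible /foo/ URL. It does not help when an earlier rule has already internally rewritten /foo/ to another path, because REQUEST_URI then holds the rewritten path.

```apache
# Skip the redirect when the requested path is already /foo/
RewriteCond %{REQUEST_URI} !^/foo/$
RewriteRule ^(.*)$ /foo/ [R,L]
```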
