.htaccess, two consecutive rewrites? - .htaccess

I need to take a URL, "/ServiceSearch/r.php?n=blahblah", and have it go to "/search/blahblah/" so that it appears in the browser as "/search/blahblah", but I actually want it to REALLY be going to "r.php?n=ServiceSearch&n=blahblah".
So I was thinking I'll need to rewrite the first URL, "/ServiceSearch/r.php?n=blahblah", to the second, "/search/blahblah/", and then the second to the third, "r.php?n=ServiceSearch&n=blahblah".
Well, I know this is wrong, but it's my best guess. I'm really struggling with it.

Well, I know this is wrong
No, that’s actually the right way. Something like the following should work:
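RewriteCond %{QUERY_STRING} ^n=([^&]+)$
RewriteRule ^ServiceSearch/r\.php$ /search/%1/? [R,L]
RewriteRule ^search/([^/]+)/?$ /r.php?n=ServiceSearch&n=$1 [L]
A RewriteRule pattern only sees the URL path, never the query string, so the first rule needs a RewriteCond on %{QUERY_STRING} to capture the variable part (“blahblah”). That capture is inserted into the replacement via %1, and the trailing ? drops the original query string. In the second rule, ([^/]+) captures the same part from the path and inserts it via $1. Note that in an .htaccess file the pattern is matched without the leading slash. The [R] flag makes the first rule an HTTP redirect, i.e. the client’s browser is instructed to request the new address. The second rule has no [R], so it is handled on the server side and the browser never sees it. [L] marks a rule as the last one to apply in the current pass. Strictly speaking, it isn’t necessary on the second rule, but if you later add more rewrite rules it will prevent unwanted interference.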

Related

RewriteRule giving me issues with my regex

I'm trying to do a simple redirect where going to a url like www.example.com/foo will take me to www.example.com/quokka/inquiry/ask.php?user=foo.
For testing purposes I started with this:
RewriteRule ^(m.*)$ /quokka/inquiry/ask.php?user=$1
This works great for use cases where the foo starts with the letter: m, but I want it to be super customizable. So then I make this my redirect (note the removal of the letter m):
RewriteRule ^(.*)$ /quokka/inquiry/ask.php?user=$1
Why isn't the RewriteRule above working for any instance of foo? I believe there's something wrong with my regex.
Any help would be greatly appreciated.
RewriteRule ^(.*)$ /quokka/inquiry/ask.php?user=$1
Depending on what other directives you have in your .htaccess file, this is possibly causing an internal rewrite loop, which is preventing the URL from ever resolving correctly (do you get a 500 Internal Server Error?). Or, at best, an invalid rewrite to /quokka/inquiry/ask.php?user=quokka/inquiry/ask.php.
Aside: Note that, as mentioned, this is an internal rewrite, not strictly a "redirect" as you stated in your question. The term "redirect" usually refers to an "external 3xx redirect". (Although admittedly the Apache docs also confuse these terms, but do at least qualify this as an "internal redirect".)
In the case of the above directive, the rewritten URL is also captured by the ^(.*)$ pattern (which captures anything), which results in a loop something like:
Request: www.example.com/foo
Rewritten to: /quokka/inquiry/ask.php?user=foo
Rewritten to: /quokka/inquiry/ask.php?user=quokka/inquiry/ask.php
Rewritten to: /quokka/inquiry/ask.php?user=quokka/inquiry/ask.php
...
URL-rewriting does not stop when it gets to the end of the .htaccess file. Processing loops until the URL passes through unchanged. (Although what is considered a "change" is not always entirely clear, as you can get loops simply by rewriting the URL, even when the rewritten URL is the same, as in step#4 above.)
The pattern ^(m.*)$ "works" because the rewritten URL does not start with an "m". But if you have any other URLs that start with an "m", then these will also be rewritten and become inaccessible.
You need to have a unique URL that only captures "user IDs" (in this case). For example, all URLs that reference "user IDs" could have a specific prefix, eg. example.com/u/<userid>.
RewriteRule ^u/(.*)$ /quokka/inquiry/ask.php?user=$1
Or perhaps are of a maximum length that does not conflict with any other URL (eg. between 3 and 8 chars):
RewriteRule ^(.{3,8})$ /quokka/inquiry/ask.php?user=$1
Also, if you are as restrictive as possible on the format of the user ID, then this might also be sufficient, e.g. only lowercase letters:
RewriteRule ^([a-z]+)$ /quokka/inquiry/ask.php?user=$1
However, using a prefix and restriction (regex should always be as restrictive as possible) would be my preference, as it avoids potential conflicts in the future. For example:
RewriteRule ^u/([a-z]{3,8})$ /quokka/inquiry/ask.php?user=$1 [L]
Also, include the L flag to ensure that no other directives that immediately follow are processed.
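Where a dedicated prefix is not an option, a commonly used guard is to skip the rewrite whenever the request already maps to a real file or directory. The following is only a minimal sketch and not part of the original answer. It assumes the rule sits in the document root's .htaccess and that ask.php exists on disk at the path shown:

```apache
RewriteEngine On

# Only rewrite when the request does not map to an existing file or directory.
# After the first pass the URL points at ask.php, which exists (assumption),
# so the second pass leaves the URL unchanged and the loop ends.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /quokka/inquiry/ask.php?user=$1 [L]
```

This keeps the catch-all pattern usable, but any URL that does not correspond to a real file is still treated as a user ID, so the prefixed and restricted pattern above remains the safer choice.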

Renaming and redirecting pages fails in htaccess

I am sorry to ask this question, because the answer seemingly is so easy. However, after three hours of trial and error I am without a clue.
I have several pages on a website using parameters in the url. I would like to change that, to a more regular url. Example:
domain.com/pag.php?id=1-awesome-page should become domain.com/awesome-page
So far so good, but so far I have three problems.
1. The old page is still accessible, so Google will index it as duplicate content. When I try to redirect it, I get infinite loop errors.
2. For whatever reason, sometimes SOME images (straight from the content) get stripped from the newly named page. I tried playing with a base URL and renaming the images and URLs, but nothing has worked so far.
3. Also, the redirect doesn't care whether I enter id=1-awesome-page or id=2-worthless-page. It all redirects to the first one.
Among the things I've tried:
RewriteCond %{QUERY_STRING} id=1-awesome-page
RewriteRule ^pag\.php$ /awesome-page? [L,R=301]
RewriteRule ^awesome-page?$ pag\.php?id=1 [NC]
What you want to do cannot really be done with mod_rewrite, unless you want to make a rule for every page, which will probably slow your site down quite a lot. This is because you can't summon the 1 in 1-awesome-page out of thin air, and your pag.php page doesn't seem to be able to load the page based only on its SEO name. If you need to use that number, you need to have that number somewhere in your URL.
As for your questions:
The error you mention cannot be reproduced with the current iteration of your .htaccess. You likely had an infinite loop previously, and since you used R=301 to test, the browser cached that redirect and now requests only the second resource whenever you ask for the first. You should test with [R,L] and only change to [R=301,L] when everything works as expected. Otherwise cached redirects will produce behaviour that does not match your current .htaccess.
When you have a URL a and a URL b, and want to redirect a to b and internally rewrite b to a, you need to make sure that both rules can never match during the same request. You can either use the %{THE_REQUEST} trick or use the END flag. Both are outlined in this answer.
If you have a problem with resources on a page not loading after making a fancy URL, you likely used relative URLs. This question outlines the possibilities on how to resolve this. You can either make the URLs absolute or relative to the root of your site, or use <base href="/">.
The following would work for /pag.php?id=123-news-page and /news/123/news-page.
RewriteCond %{THE_REQUEST} pag\.php\?.*id=([^-]+)-([^&\s]+)
RewriteRule ^pag\.php$ /news/%1/%2? [L,R]
RewriteRule ^news/([^/]+)/([^/]+)/?$ pag.php?id=$1-$2 [L]
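The END flag mentioned above can be sketched as follows. This is an illustrative variant rather than the original answer's code. It assumes Apache 2.4 (END was added in 2.3.9) and the same /news/<id>/<name> URL scheme:

```apache
# External redirect from the old query-string URL to the clean URL.
# The trailing "?" drops the original query string.
RewriteCond %{QUERY_STRING} (?:^|&)id=([^-&]+)-([^&]+)
RewriteRule ^pag\.php$ /news/%1/%2? [R,L]

# Internal rewrite back to the script. END stops all further rewrite
# processing for this request, including the extra pass that .htaccess
# rules normally get after an internal rewrite, so the redirect rule
# above is never evaluated against the rewritten URL.
RewriteRule ^news/([^/]+)/([^/]+)/?$ pag.php?id=$1-$2 [END]
```

With END on the internal rewrite, the first rule no longer needs to inspect %{THE_REQUEST}, because it only ever sees what the client actually requested.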

mod_rewrite: How to disable not clean urls navigation of rewrite rules

I've enabled the mod_rewrite module and everything works.
I created simple rules for clean URLs, but how do I disable navigation to the underlying URL with the parameters?
example:
# rewrite rule for cleaning
RewriteRule ^bookstore/([0-9]+)?$ /bookstore/book.php?id=$1 [L]
Now, if I navigate to http://mydomine.com/bookstore/123 everything works, but the URL http://mydomine.com/bookstore/book.php?id=123 is also navigable.
How can I make only the first one visible and navigable?
Add this to the same htaccess file:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /bookstore/book\.php\?id=([0-9]*)
RewriteRule ^bookstore/book\.php$ /bookstore/%1? [L,R=301]
This will 301 redirect requests for the URI with query strings to the one without.
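If the parameterised URL should stop working altogether rather than redirect, a variant of the same idea is to answer direct requests with a 404. This is a sketch under the same assumptions as the rule above (the .htaccess is in the document root and the script lives at /bookstore/book.php). Use it instead of the redirect, not alongside it:

```apache
# THE_REQUEST holds the original request line sent by the client, so this
# condition only matches when book.php was requested directly. The internal
# rewrite from /bookstore/123 is unaffected.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /bookstore/book\.php[?\ ]
# "-" means no substitution; a non-3xx status with R stops rewriting and
# returns that status to the client.
RewriteRule ^bookstore/book\.php$ - [R=404,L]
```

The trade-off is that old bookmarks to book.php?id=... break, whereas the 301 version keeps them working.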
Not 100% sure about this, but I think that if you rewrite A to B, then both A and B will work.
I would like to ask why exactly it is a problem that http://mydomine.com/bookstore/book.php?id=123 is navigable too. What is the problem if that link is valid too, and the user can use both links... although it would take them some time and luck to discover the second option. What would they gain by doing that? What would you lose? If the answer in both cases is "nothing", then simply stop worrying. :) If you used the old links previously and now replace them with new links, then it is a good thing that your customers' old bookmarks will still work.
But assuming that you have a good reason for disabling the old URLs, how about changing them both? For example, rename "book.php" to "xyz.php" and then rewrite http://mydomine.com/bookstore/123 to http://mydomine.com/bookstore/xyz.php?id=123 -- and the old http://mydomine.com/bookstore/book.php?id=123 will stop working.
Ok, that is an ugly solution, but you can make it nicer if instead of renaming the files you just move them to a subdirectory, like http://mydomine.com/xyz/bookstore/book.php?id=123 . Alternatively, you could use the redirect to add a "secret" parameter and then check it in the PHP file, for example rewrite http://mydomine.com/bookstore/123 to http://mydomine.com/bookstore/book.php?id=123&secret=xyz . Sure, it's just a "security by obscurity", but again... what exactly would anyone gain by discovering your true URLs?

Why does this cause an infinite request loop?

Earlier today, I was helping someone with an .htaccess use case, and came up with a solution that works but can't quite figure it out myself!
He wanted to be able to:
Browse to index.php?id=3&cat=5
See the location bar read index/3/5/
Have the content served from index.php?id=3&cat=5
The last two steps are fairly typical (usually from the user entering index/3/5 in the first place), but the first step was required because he still had some old-format links in his site and, for whatever reason, couldn't change them. So he needed to support both URL formats, and have the user always end up seeing the prettified one.
After much to-ing and fro-ing, we came up with the following .htaccess file:
RewriteEngine on
# Prevents browser looping, which does seem
# to occur in some specific scenarios. Can't
# explain the mechanics of this problem in
# detail, but there we go.
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [L]
# Hard-rewrite ("[R]") to "friendly" URL.
# Needs RewriteCond to match original querystring.
# Uses "?" in target to remove original querystring,
# and "%n" backrefs to move its components.
# Target must be a full path as it's a hard-rewrite.
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$
RewriteRule ^index\.php$ http://example.com/index/%1/%2/? [L,R]
# Soft-rewrite from "friendly" URL to "real" URL.
# Transparent to browser.
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2
Whilst it might seem to be a somewhat strange use case ("why not just use the proper links in the first place?", you might ask), just go with it. Regardless of the original requirement, this is the scenario and it's driving me mad.
Without the first rule, the client enters into a request loop, trying to GET /index/X/Y/ repeatedly and getting 302 each time. The check on REDIRECT_STATUS makes everything run smoothly. But I would have thought that after the final rule, no more rules would be served, the client wouldn't make any more requests (note, no [R]), and everything would be gravy.
So... why would this result in a request loop when I take out the first rule?
Without being able to tinker with your setup, I can't say for sure, but I believe this problem is due to the following relatively arcane feature of mod_rewrite:
When you manipulate a URL/filename in per-directory context mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases.
(source: mod_rewrite technical documentation, I highly recommend reading this)
In other words, when you use a RewriteRule in an .htaccess file, it's possible that the new, rewritten URL maps to an entirely different directory on the filesystem, in which case the .htaccess file in the original directory wouldn't apply anymore. So whenever a RewriteRule in an .htaccess file matches the request, Apache has to restart processing from scratch with the modified URL. This means, among other things, that every RewriteRule gets checked again.
In your case, what happens is that you access /index/X/Y/ from the browser. The last rule in your .htaccess file triggers, rewriting that to /index.php?id=X&cat=Y, so Apache has to create a new internal subrequest with the URL /index.php?id=X&cat=Y. That matches your earlier external redirect rule, so Apache sends a 302 response back to the browser to redirect it to /index/X/Y/. But remember, the browser never saw that internal subrequest; as far as it knows, it was already on /index/X/Y/. So it looks to you as though you're being redirected from /index/X/Y/ to that same URL, triggering an infinite loop.
Besides the performance hit, this is probably one of the better reasons that you should avoid putting rewrite rules in .htaccess files when possible. If you move these rules to the main server configuration, you won't have this problem because matches on the rules won't trigger internal subrequests. If you don't have access to the main server configuration files, one way you can get around it (EDIT: or so I thought, although it doesn't seem to work - see comments) is by adding the [NS] (no subrequest) flag to your external redirect rule,
RewriteRule ^index\.php$ http://example.com/index/%1/%2/? [L,R,NS]
Once you do that, you should no longer need the first rule that checks the REDIRECT_STATUS.
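For readers who do have access to the main server configuration, the suggestion above might look roughly like this. It is only a sketch: the virtual host, the document root path and the use of the PT flag are assumptions, not something taken from the question:

```apache
<VirtualHost *:80>
    ServerName example.com
    # Hypothetical document root.
    DocumentRoot "/var/www/example"

    RewriteEngine On

    # In server or virtual-host context the pattern is matched against the
    # full URL-path, so it starts with a slash.
    RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$
    RewriteRule ^/index\.php$ /index/%1/%2/? [R,L]

    # PT passes the rewritten URL back to normal URL-to-filename mapping.
    RewriteRule ^/index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2 [PT,L]
</VirtualHost>
```

Note that Apache does not allow trailing comments on a directive line, which is why every comment here sits on its own line.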
The solution below worked for me.
RewriteEngine on
RewriteBase /
#rule1
#Guard condition: only if the original client request was for index.php
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php [NC]
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$ [NC]
RewriteRule . /index/%1/%2/? [L,R]
#rule 2
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2 [L,NC]
Here is what I think is happening
From the steps you quoted above
Browse to index.php?id=3&cat=5
See the location bar read index/3/5/
Have the content served from index.php?id=3&cat=5
At Step 1, Rule 1 matches and redirects, so the location bar shows index/3/5/, which fulfills Step 2.
At Step 3, Rule 2 now matches and rewrites to index.php.
The rules are rerun, for the reasons David stated, but since THE_REQUEST is immutable once set to the original request, it still contains /index/3/5 so Rule 1 does not match.
Rule 2 does not match either and the result of index.php is served.
Most other variables are mutable e.g. REQUEST_URI. Their modification during rule processing, and the incorrect expectation that the pattern matches are against the original request is a common reason for infinite loops.
It feels quite esoteric sometimes, but I am sure there is a logical reason for its complexity :-)
EDIT
Surely there are two distinct requests
There are 2 client requests, the original one from Step1 and the one from the external redirect in step 2.
What I glossed over above is that when Rule 2 matches on the second request, it is rewritten to /index.php and causes an internal redirect. This forces the .htaccess file for the / directory to be loaded again (it could easily have been another directory with different .htaccess rules) and all the rules to be re-run.
So... why would this result in a request loop when I take out the first rule?
When the rules are re-run, the first rule now unexpectedly matches, as a result of Rule2's rewrite, and does a redirect, causing an infinite loop.
David's answer does contain most of this information and is what I meant "for the reasons David stated".
However, the main point here is that you do need an extra condition to prevent the infinite loop: either yours, which stops further rule processing on internal redirects, or mine, which prevents Rule 1 from matching.

301 redirect question?

Is this a good example of redirecting a page to a page on another domain:
RewriteCond %{HTTP_HOST} ^dejan.com.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.dejan.com.au$
RewriteRule ^seo_news_blog_spam\.html$ "http\:\/\/dejanseo\.com\.au\/blog\-spam\/" [R=301,L]
or good old works too:
301 redirect seo_news_blog_spam.html http://dejanseo.com.au/blog/spam/
and what's the difference?
Presumably, the rules are functionally equivalent (well, assuming that http://dejanseo.com.au/blog/spam/ was supposed to be http://dejanseo.com.au/blog-spam/ like the first one redirects to, and the only host pointing at that location is dejanseo.com.au with or without the www).
The first example uses directives from mod_rewrite, whereas the second one uses some from mod_alias. I imagine that the preferred option is the second one for this particular case, if not only because it's a bit simpler (there's also marginal additional overhead involved in creating the regular expressions being used by mod_rewrite, but that's very minor):
Redirect 301 /seo_news_blog_spam.html http://dejanseo.com.au/blog-spam/
However, I suspect the reason that you have the first one is that it was created using CPanel (based on the unnecessary escapes in the replacement that appeared before in another user's question where it was indicated CPanel was the culprit). They've gone with the mod_rewrite option because it provides conditional flexibility that the Redirect directive does not, and I assume this flexibility is reflected somewhat in whatever interface is used to create these rules.
You'll note that there is a condition on whether or not to redirect based upon your host name in the first example, identified by the RewriteCond. This allows for you to perform more powerful redirects that are based on more than just the request path. Note that mod_rewrite also allows for internal redirects invisible to the user, which mod_alias is not intended for, but that's not the capacity it's being used in here.
As a final aside, the host names in your RewriteCond statements should technically have their dots escaped, since the . character has special meaning in regular expressions. You could also combine them, change them to direct string comparisons, or remove them altogether (since I imagine they don't do anything useful here).
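As a sketch of that last point, the two host conditions could be combined into one with the dots escaped and the cPanel-style escaping removed from the target. This assumes the intended target is http://dejanseo.com.au/blog-spam/, as in the first example:

```apache
# Match the bare or www host; [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_HOST} ^(www\.)?dejan\.com\.au$ [NC]
RewriteRule ^seo_news_blog_spam\.html$ http://dejanseo.com.au/blog-spam/ [R=301,L]
```

Without the escapes, a pattern such as ^dejan.com.au$ would also match hosts like dejanXcom.au, since an unescaped dot matches any single character.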
Unbelievable, the problem was that the syntax wasn't correct, so instead of:
redirect 301 seo_news_blog_spam.html http://dejanseo.com.au/blog/spam/
it should look like this:
Redirect 301 seo_news_blog_spam.html http://dejanseo.com.au/blog/spam/
A single capital first letter was the source of all the trouble, what a waste of time :D
It works now as it is supposed to!
Thanks to everyone who participated, issue solved.
