.htaccess mod_rewrite variables through redirect

Short Version:
I wrote the question and realized most people wouldn't want to read that much text. Treat the long version below as reference; here's the TL;DR:
I need to 301 redirect this url http://app.com/search/foo-bar/
to this url http://app.com/#!/search/foo-bar/
and send this: /foo-bar/, or anything else past /search/ to a server side script. In this case, it's written in php.
Edit for clarity:
Current answers seem to focus on the rewrite to hashbang. That part is not the problem. The problem is that I lose any associated data when rewriting to a hashbang url, as the server side will see app.php as the location, not app.php/#!/foo-bar/ - So I need to capture foo-bar, and send it to the server somewhere other than in the URL. The rewrite works, and is not the issue. Thanks for your answers though!
Long Version:
Ok, so I have an interesting issue that has been tough for me to figure out.
The Scenario:
I have a backbone.js app that uses the hashbang for state:
app.com/#!/search/search-term/key-value/foo-bar/
In addition, I have google traffic coming to the site from the previous version that will be hitting "pretty url" style urls:
app.com/search/search-term/key-value/foo-bar/
I use an .htaccess mod_rewrite to swap the old url out for a hashbanged one if a user hits the legacy url.
I recently introduced a JavaScript-free, bootstrapped version of the site, written in PHP, that the rest of the site will be built on top of so that it degrades gracefully and supports crawlers.
For the php site to work, I need to pass in the values past the hashbang to the server side script, so I can figure out what to display.
The Problem:
When I transform a url and add an anchor, everything past the anchor (hashbang) is no longer sent to the request, so I don't have access to it in php.
RewriteRule search/?(.*) #!/search/$1 [R=301,NC,L]
My options for sending things to the server side then are reduced to:
1. Query String
2. Environment Variables
3. Headers
So, I tried sending things via the query string
RewriteRule search/?(.*) #!/search/$1?filter=$1 [R=301,NC,L]
Obviously that didn't work (the query is behind the anchor), so I tried it in front of the hashbang
RewriteRule search/?(.*) ?filter=$1/#!/search/$1 [R=301,NC,L]
That works, but is hideous and redundant to the end user. So, I thought I might try using environment variables.
RewriteRule search/?(.*) /#!/search/$1 [R=301,NC,L,E=FILTER:$1]
That failed, because environment variables aren't preserved through a redirect (duh). I turned to using headers:
RewriteRule search/?(.*) /#!/search/$1 [R=301,NC,L,E=FILTER:$1]
Header set filterParams "%{FILTER}e"
But for some reason, the headers aren't received by the page through the redirect. That seemed to make sense (although I've now stepped well outside of my comfort level with apache directives), so I tried echoing the header, in hopes that it would be passed, received by the second rewrite (that didn't find search), and echoed out.
RewriteRule search/?(.*) /#!/search/$1 [R=301,NC,L,E=FILTER:$1]
Header set filterParams "%{FILTER}e"
Header echo filterParams
Nada - the filter doesn't exist, so although it makes it to the server, it is null. My next thought was to attempt to employ some sort of conditional. Here was my attempt:
RewriteRule search/?(.*) legacy.php/#!/search/$1 [R=301,NC,L,E=FILTER:$1]
<FilesMatch "legacy.php">
Header set filterParams "%{FILTER}e"
</FilesMatch>
Header echo filterParams
That didn't seem to work either, so I'm stumped. I realize that I've spent so long on this that I probably have the solution within my grasp and I'm just tired of looking at it, or it's not even remotely possible, even with gross header hacking.
Anyone have a clue how to do this?

RFC 1738 treats # as an unsafe character: it delimits the fragment identifier, which the browser never sends to the server.
Additionally, the Apache docs say # signals a comment in Apache config files.
Short answer: your approach is broken, not your implementation.

AFAIK, there's no good way to preserve variables through redirect without sticking them in the query string...
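One alternative that avoids the query string is a short-lived cookie set on the redirect itself. The sketch below is an illustration under assumptions, not a tested fix: the cookie name filter, the one-minute lifetime, and the / path are made up for the example. It also adds the NE flag so the # in the target is not percent-encoded to %23. The Header set attempts in the question fail for a simpler reason: response headers go to the browser, and the browser does not echo them back on its next request, whereas it does send cookies back.

```apache
RewriteEngine on

# Redirect legacy /search/... URLs to the hashbang URL.
# NE: keep "#" literal instead of escaping it to %23.
# CO: set cookie "filter" (hypothetical name) to the captured path,
#     domain app.com, lifetime 1 minute, path "/".
RewriteRule ^search/(.*)$ /#!/search/$1 [R=301,NE,NC,L,CO=filter:$1:app.com:1:/]
```

On the follow-up request, PHP should find the value in $_COOKIE['filter']. Two caveats: a colon in the captured text would clash with the CO field separator, and browsers cache 301 responses, so a repeat visit may never reach the server to receive the cookie. A 302 may be safer while testing.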

Related

htaccess redirect not working for long url

How do I redirect the following long link:
http://www.vbpmonitor.com/index.php?option=com_content&view=article&id=24&utm_source=MagnetMail&utm_medium=email&utm_term=asmith#panaceainc.com&utm_content=EVVWP040716&utm_campaign=White%20Paper%3A%20Optimizing%20VBM%20Quality%20Tiering%20for%20Physicians
to
http://www.vbpmonitor.com/optimizing-vbm-quality-tiering-for-physicians
Redirect 301 /index.php?option=com_content&view=article&id=24&utm_source=MagnetMail&utm_medium=email&utm_term=asmith#panaceainc.com&utm_content=EVVWP040716&utm_campaign=White%20Paper%3A%20Optimizing%20VBM%20Quality%20Tiering%20for%20Physicians http://www.vbpmonitor.com/optimizing-vbm-quality-tiering-for-physicians
As said above in the comments, I suspect that you have a glitch in your logic here and that in reality you want the redirection to work the other way 'round. Redirecting from the long to the search engine friendly URL simply does not make any sense. So:
Using a Redirect rule you could try that instead:
Redirect 301 /optimizing-vbm-quality-tiering-for-physicians /index.php?option=com_content&view=article&id=24&utm_source=MagnetMail&utm_medium=email&utm_term=asmith#panaceainc.com&utm_content=EVVWP040716&utm_campaign=White%20Paper%3A%20Optimizing%20VBM%20Quality%20Tiering%20for%20Physicians
This will redirect an incoming request to the short URL to the actually existing long URL. That is the usual scenario.
If, however, you really want to redirect that long URL to the short version, then you cannot do that with a Redirect rule, because Redirect matches only the URL path and never sees the query string. This might for example be the case if you accidentally sent out that long URL and have a working redirection set up for the short version. Unfortunately you do not explain anything about that in your question or comments, so I can only guess here.
You'd have to use the more flexible rewriting module and a combination of RewriteCond and RewriteRule. That lets you "cut out" specific patterns of request URLs and "redesign" what the request should look like after the rewriting.
This would be a simple example that applies two conditions to redirecting the request for file index.php to the short URL:
RewriteEngine on
RewriteCond %{QUERY_STRING} view=article
RewriteCond %{QUERY_STRING} id=24
RewriteRule ^/?index\.php$ /optimizing-vbm-quality-tiering-for-physicians [L,R=301]
Note: this version should work both in the HTTP server's host configuration and in .htaccess style files. You should always prefer the first option if you have access to it.
As said above, I can only guess here with the sparse information you provided. I picked two out of many request arguments, since those appear to be the ones best suited as distinct identifiers. But you may have to tweak things. Note that per default RewriteConds are combined by a logical AND, so they both have to resolve to something truish.
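By default mod_rewrite appends the original query string to the redirect target, so the rule above would send visitors to the short URL with all the tracking parameters still attached. A minimal variant, assuming Apache 2.4 or later for the QSD flag, drops the query string and anchors each parameter so that id=24 does not also match id=240:

```apache
RewriteEngine on

# Match each parameter as a whole key=value pair anywhere in the query string.
RewriteCond %{QUERY_STRING} (^|&)view=article(&|$)
RewriteCond %{QUERY_STRING} (^|&)id=24(&|$)

# QSD discards the incoming query string instead of appending it to the target.
RewriteRule ^/?index\.php$ /optimizing-vbm-quality-tiering-for-physicians [QSD,R=301,L]
```

On Apache 2.2, where QSD is not available, ending the target with a bare ? has the same effect.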
For more precise details about this stuff I would like to point you to the official documentation of those modules again. The documentation is extremely precise, well written and comes with good examples. I would always prefer the information there to snippets you find somewhere on the internet or partial answers to questions...
http://httpd.apache.org/docs/current/mod/mod_alias.html#redirect
http://httpd.apache.org/docs/current/mod/mod_rewrite.html

Trying to make a GET variable invisible in a URL but retain its usefulness using mod_rewrite

Good day all,
I am trying to master the magic of mod_rewrite and require some advice/help.
I am trying to turn an URL from:
http://www.domainname.com/preview/about/5
To this:
http://www.domainname.com/preview/about
The issue is, I still need to retain the [id] part of the original URL to be used as a GET later on and it not be visible.
The code I have thus far:
RewriteRule ^preview\/([^/]+)\/([^/]+)\/$ /preview\/$1?id=$2 [R=301,QSA]
RewriteRule ^preview\/([^/]+)\/$ ?mode=preview&id=$2 [L,QSA]
This manages to create an URL like: http://www.domainname.com/preview/about/?id=5 and passes the ID through, I just need the ?id=5 to be invisible in the URL.
Thank you in advance anyone who has a solution for this, much appreciated.
UPDATE:
I have managed to get the following code to work as expected. Alas, it uses a static value for the ID; all I need now is to get it working with dynamic ID values.
RewriteRule ^preview\/([^/]+)\/([^/]+)\/$ /preview\/$1 [R=301,QSA]
RewriteCond %{QUERY_STRING} !.*id=5.*$
RewriteRule ^preview\/([^/]+)\/$ ?mode=preview&id=5 [L,QSA]
You cannot get 'invisible' get parameters. The closest you'll get is setting a cookie to pass this data onwards.
RewriteRule ^preview/([^/]+)/([^/]+)[/]?$ preview/$1/ [CO=id:$2:127.0.0.1:1:/preview/$1:0:1,R]
In PHP you can access this cookie with $_COOKIE['id'], and the id is invisible in the URL (because it isn't actually there). The CO flag is documented in the mod_rewrite flags reference.
Edit: If you want to do it all with mod_rewrite, you can access this cookie from mod_rewrite too. As this is an internal rewrite, you can probably just use a direct path to the actual file you want to call.
RewriteCond %{HTTP_COOKIE} id=([^;]*)
RewriteRule ^preview/([^/]+)[/]?$ preview/$1?id=%1 [CO=id:-:127.0.0.1:-1:/preview/$1:0:1,END]
Edit2: I've added in a reset for the id-cookie in the second rule (expiry time T-1 minutes). This will cause the correct page to load if the user decides to go to preview/about/ again within 1 minute from going to preview/about/5 (which redirects to preview/about/ with a hidden id set to '5' to load something different).
If you are not passing the "ID" as part of the query string (e.g. ?id=5) or part of the URI (e.g. /preview/about/5) then you need to pass it in the request body, in something like a POST request. Otherwise, you can't make it "invisible", because the webserver isn't going to see it. If the webserver doesn't see it as a request, there is nothing mod_rewrite can possibly do to extract it.
Assuming you can't set up your site so that requests get POSTed (sort of like how a form is submitted) every time someone clicks on a link, your best bet is probably having it look like the http://www.domainname.com/preview/about/5 form, or maybe http://www.domainname.com/preview/about-5?

.htaccess redirect to subfolder, and remove its name

I'm kind of a noob in the world of web, so my apologies... I tried many things found on SO and elsewhere, but I didn't manage to do what I want. And the Apache documentation is... well, a bit too complete.
Basically what I want to do is redirect my domain to a subfolder. I found easy solutions for this (many different actually).
http://www.foo.com/
http://foo.com/
should redirect to /bar and appear as http://foo.com/
Using the following I got the expected result :
RewriteEngine on
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} ^www\.foo\.com$ [NC]
RewriteRule ^/?$ http://foo.com/ [R=301,L]
RewriteRule ^((?!bar/).*)$ bar/$1 [NC,L]
But I also want the subfolder as well as filenames not to appear when explicitly entered, i.e :
http://www.foo.com/index.html
http://foo.com/index.html
http://www.foo.com/bar
http://foo.com/bar
http://www.foo.com/bar/index.html
http://foo.com/bar/index.html
Should all appear as
http://foo.com/
Is this possible ?
Obviously using .htaccess, since I'm on a virtual host.
Thanks
As Felipe says, it's not really possible, because you lose information when you do that R=301 redirect: a hard redirect like this starts a whole new request, with no memory of the previous request.
Of course, there are ways to do similar things. The easiest is to put the original request in the query string (here's a good rundown on how mod_rewrite works with query strings). Sure, the query string does show up in the URL, but some browsers shorten what they display in the address bar, so if your goal is aesthetics, this method may be workable.
If you really don't want to show any of the original query in the URL, you might use cookies by employing the CO flag (here are some very good examples about cookie manipulation). At any rate, the information about the original request must somehow be passed in the hard redirect.
But anyhow, and most importantly, why would you want to do something like this? It's bound to confuse humans and robots alike. A great many pages behaved like this back when frames were fashionable, and it was pretty terrible (no bookmarking, no easy linking to content, Google results with the snippet "your browser cannot handle frames", no reloading, erratic back button, oh boy, those were the days).
Speaking of which, if your content is html, you may just use a plain old iframe to achieve the effect (but I'd sincerely advise against it).
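What can be done without losing information is narrower: hide the bar/ prefix while keeping the rest of the path visible. The following is a sketch for a root .htaccess, assuming the foo.com and bar names from the question. It checks THE_REQUEST (the raw request line) so that only URLs the visitor actually requested with /bar are redirected, not the internally rewritten ones.

```apache
RewriteEngine on

# 1. Canonical host: www.foo.com -> foo.com, keeping the path.
RewriteCond %{HTTP_HOST} ^www\.foo\.com$ [NC]
RewriteRule ^(.*)$ http://foo.com/$1 [R=301,L]

# 2. A visitor asked for /bar or /bar/...: redirect to the same path without it.
RewriteCond %{THE_REQUEST} \s/bar(/|\s) [NC]
RewriteRule ^bar(?:/(.*))?$ /$1 [R=301,L]

# 3. Everything else is served from bar/ internally; the address bar is unchanged.
RewriteRule ^(?!bar/)(.*)$ bar/$1 [L]
```

With these rules http://foo.com/bar/index.html should end up displayed as http://foo.com/index.html. Collapsing every URL to a bare http://foo.com/ is still subject to the objections above.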

Having huge redirect list in .htaccess a Problem?

I want to 301 redirect every post, but I have over 3000 posts.
If I list
Redirect permanent /blog/2010/07/post.html http://new.blog.com/2010/07/23/post/
Redirect permanent /blog/2010/07/post1.html http://new.blog.com/2010/07/24/post1/
Redirect permanent /blog/2010/07/post2.html http://new.blog.com/2010/07/25/post2/
Redirect permanent /blog/2010/07/post3.html http://new.blog.com/2010/07/26/post3/
Redirect per......
for over 3000 URLs in .htaccess, would this eat my server's resources or cause some other problem? I'm not sure how .htaccess works, but if the server goes through this list each time a user requests a page, I would guess it will be a resource hog.
I can't use RedirectMatch because I added a date variable to my new URL. Do you have any other suggestions for redirecting these posts? Or am I just fine?
Thanks!
I am not an Apache expert, so I cannot speak to whether or not having 3,000 redirects in .htaccess is a problem (though my gut tells me it probably is a bad idea). However, as a simpler solution to your problem, why not use mod_rewrite to do your redirects?
RewriteRule ^/?blog/([^/]+)/([^/]+)/([^/]+)\.html$ http://new.blog.com/$1/$2/$3/ [R=permanent,L]
This uses a regex to match old URLs and rewrite them to new ones. The [R=permanent] instructs mod_rewrite to issue a 301 with the new URL instead of silently rewriting the request internally.
In your example, it looks like you've added the day of the post to the URL, which does not exist in the old URL. Since you obviously cannot use a regexp to divine the day an arbitrary post was made, this method may not work for you. If you can drop the day from the URL, then you're good to go.
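If the day can be dropped, the same mapping also fits in a single mod_alias directive. A sketch, assuming four-digit years and two-digit months as in the question's examples:

```apache
# One pattern replaces the whole list: $1 = year, $2 = month, $3 = post name.
RedirectMatch permanent ^/blog/([0-9]{4})/([0-9]{2})/([^/]+)\.html$ http://new.blog.com/$1/$2/$3/
```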
Edit: The first time I read your question, I missed the last paragraph. ("I can't use RedirectMatch because I added date variable in my new url.") In this case, you can use mod_rewrite's RewriteMap to look up the day component of a post.
You have two options:
1. Use a hashmap to perform fast lookups in a static file. This means all your old URLs will work, but any new posts cannot be accessed using the old URL scheme.
2. Use a script to grab the day.
In option one, create a file called posts.txt and put:
/yyyy/mm/pppp dd
...for each post where yyyy is the year of the post, mm is the month, and pppp is the post name (without the .html).
When you're done, run:
$ httxt2dbm -i posts.txt -o posts.map
Then we add this to the server/virtual server config (note the path is a filesystem path, not a URL):
RewriteMap postday dbm:/path/to/file/posts.map
RewriteRule ^/blog/([^/]+)/([^/]+)/([^/]+)\.html$ http://new.blog.com/$1/$2/${postday:/$1/$2/$3}/$3/ [R=permanent,L]
In option two, use prg:/path/to/script/lookup.whatever as your RewriteMap. See the mod_rewrite documentation for more info about using a script.
Doing the lookup in mod_rewrite is better than just redirecting to a script which looks up the date and then redirects to the final destination because you should never redirect more than once. Issuing a 301 or 302 incurs a round trip cost, which increases the latency of your page load time.
If you have some way in code to determine the day of a post, you can generate the redirect on the fly. Set up a mod_rewrite pattern that matches the old URLs (something like \.html$), route those requests to a front controller script, and have it calculate the new URL from the old one and issue the 301 header.
With php as an example:
$_SERVER['REQUEST_URI']
will contain the requested url and
header("Location: http://new.blog.com/$y/$m/$d/$title/",TRUE,301);
will send a redirect.
That's... a lot of redirects. But the first thing I would tell you, and probably the only thing I can tell you without qualification, is that you should run some tests and see what the access times for your blog are like, and also look at the server's CPU and memory usage while you're doing it. If they're fairly low even with that giant list of redirects, you're okay as long as your blog doesn't experience a sudden increase in traffic. (I strongly suspect the 3000 rewrites will be slowing Apache down a lot, though)
That being said, I would second josh's suggestion of replacing the redirects with something dynamic. Like animuson said, if you're willing to drop the day from the URL, it'll be easy to set up a RewriteRule directive to handle the redirection. Otherwise, you could do it with a PHP script, or generally some code in whatever scripting language you (can) use. If you're using one of the popular blog engines, it probably contains code to do this already. Basically you could do something like
RewriteRule .* /blog/index.php
and just let the PHP script sort out which post was requested. It has access to the database so it'll be able to do that, and then you can either display the post directly from the PHP script, or to recover your original redirection behavior, you can send a Location header with the correct URL.
An alternative would be to use RewriteMap, which lets you write a RewriteRule where the target is determined by a program or file of your choice instead of being directly specified in the configuration file. As one option, you can specify a text file that contains the old and new URLs, and Apache will handle searching the file for the appropriate line for any given request. Read the documentation (linked above) for the full details. I will mention that this isn't used very often, and I'm not sure how much faster it would be compared to just having 3000 redirects.
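As a rough sketch of the text-file variant: the map name postmap and the file path are placeholders, and RewriteMap has to be declared in the server or virtual host configuration, not in .htaccess.

```apache
# /path/to/posts-map.txt (hypothetical), one "old-path new-URL" pair per line:
#   blog/2010/07/post.html   http://new.blog.com/2010/07/23/post/
#   blog/2010/07/post1.html  http://new.blog.com/2010/07/24/post1/

RewriteEngine on
RewriteMap postmap txt:/path/to/posts-map.txt

# Redirect only when the lookup returns something; unknown URLs fall through.
# $1 is the path captured by the RewriteRule pattern below.
RewriteCond ${postmap:$1} ^(.+)$
RewriteRule ^/?(blog/.+\.html)$ %1 [R=301,L]
```

Apache keeps looked-up entries of a txt: map in memory, so the file is not re-read on every request; for thousands of entries, converting it to a dbm: map with httxt2dbm is another option.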
Last tip: Apache can be significantly faster if you're able to move the configuration directives (like Redirect) into the server or virtual host configuration file, and disable reading of .htaccess entirely. I would guess that moving 3000 directives from .htaccess into the virtual host configuration could make your server considerably faster. But even moving the directives into the vhost config file probably wouldn't produce as much of a speedup as using a single RewriteRule.
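A sketch of that layout, with a hypothetical server name and document root and assuming Apache 2.4 access-control syntax. Note that Apache config comments must sit on their own line:

```apache
<VirtualHost *:80>
    # Hypothetical host name and paths, for illustration only.
    ServerName old.blog.example
    DocumentRoot /var/www/blog

    <Directory /var/www/blog>
        # Stop Apache from looking for .htaccess files on every request.
        AllowOverride None
        Require all granted
    </Directory>

    # Directives placed here are parsed once at startup, not per request.
    Redirect permanent /blog/2010/07/post.html http://new.blog.com/2010/07/23/post/
    Redirect permanent /blog/2010/07/post1.html http://new.blog.com/2010/07/24/post1/
</VirtualHost>
```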
It's never a good idea to make a massive list of Redirects. A better programming technique is to simply redirect the pages without that date variable, then have a small PHP snippet that detects if it's missing and redirects to the URL with it included. The long list looks tacky and slows down Apache because it's checking that URL (and every other URL that might not even be affected by this) against each line. If it were only 5 or so, I'd say fine, but 3,000 is a definite NO.
Although I'm not a big fan of this method, a better choice would be to redirect all those URLs normally using a single match statement, redirecting them to the page without the date part, or with a dash or something, then include a small PHP snippet to check if the date is valid and if not, rewrite the path again to the correctly formed URL.
Honestly, if you didn't have that part there before, you don't need it now, and it will probably just confuse the search engines changing the URL for 3,000 posts. You don't really need a date in the URL, a good title is much more meaningful not only to users, but also to search engines, than a bunch of numbers.

Is it possible to handle such URL

http://www.example.com/http://www.test.com
I have tried many different methods using .htaccess with no luck. I need to get that second url coming as parameter. Is it possible to redirect it to index.php and get it as $_SERVER["REQUEST_URI"] or other method? Thanks
UPD: Looks like it is impossible to get whole URL, question marks are not recognized. Ideal example:
127.0.0.1/http://www.test.com/script.php?a=hello&b=world#blabla;par2?%par3
and I need to get this exact string in my index.php:
www.test.com/script.php?a=hello&b=world#blabla;par2?%par3
It's definitely possible: http://downforeveryoneorjustme.com/http://www.google.com/
As to how, it's been covered on ServerFault already
The Problem:
This is a problem with Apache running on Windows. Apache on Windows does not let you have a colon (:) in your REQUEST URI. This is basically for avoiding URLs like http://www.mysite.com/C:/SomeFile.exe but is actually annoying.
If you use mod_rewrite at the same time it will be skipped.
You, and some applications (like Wikipedia), use a colon (:) in URLs. So what can you do with Apache on Windows?
The Solution:
At the time of writing this answer this bug still persists and there is no absolute solution, BUT there is a trick:
You may change your URL to something like this:
http://www.mysite.com/url/http://www.test.com
in this example http://www.mysite.com/ is your SCRIPT PATH and /url/http://www.test.com is your REQUEST URI.
The problem will be gone if there is a Slash (/) before Colon (:).
You can get the URI but only without the fragment since that is not transmitted to the server. Try this rule:
RewriteRule ^http:/ index.php [L]
Then the requested URI path plus query (so the part from the third / up to the first # or the end of the URI) is available at $_SERVER['REQUEST_URI'].
