Avoiding sub-directory request rewrites with Apache mod_rewrite - security

I want to write a rewrite rule such that if a user goes to the URL example.org/stuff/junk.jpg, the rule fires and the request ends up at re-writer.php, but if the user goes to example.org/stuff/hackingisawesome/junk.jpg, the rule is not triggered and they get a standard 404 (or a page, if one exists).
I can't tell, based on the environment variables, if this is possible without some fairly fancy regex.
So does anyone know of either:
a) a way this is already built into the mod_rewrite syntax, or
b) a good, reliable way of handling this with regular expressions?
Links to documentation or tutorials welcome. I'm just feeling clueless on where to go next.
Oh, and I can imagine ways to have the script that the rule rewrites to simply deliver the 404 itself, but I'd rather the rule only fire when the conditions are met.

Try this:
RewriteRule ^stuff/[^/]+$ re-writer.php
This will rewrite all requests to /stuff/… with only one additional path segment to re-writer.php.
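For context, here is a minimal sketch of how that rule might sit in an .htaccess file. It assumes the file lives in the document root and that mod_rewrite is enabled; the [L] flag is an addition for illustration and not part of the rule as posted.

```apache
RewriteEngine On

# [^/]+ matches one or more characters that are not a slash, so the
# pattern accepts exactly one path segment after stuff/.
#   stuff/junk.jpg                   -> rewritten to re-writer.php
#   stuff/hackingisawesome/junk.jpg  -> no match, handled normally
RewriteRule ^stuff/[^/]+$ re-writer.php [L]
```

Because the character class excludes the slash, any deeper path simply fails to match and Apache serves it (or returns a 404) as usual.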

Related

htaccess redirect not working for long url

How do I redirect the following long link:
http://www.vbpmonitor.com/index.php?option=com_content&view=article&id=24&utm_source=MagnetMail&utm_medium=email&utm_term=asmith#panaceainc.com&utm_content=EVVWP040716&utm_campaign=White%20Paper%3A%20Optimizing%20VBM%20Quality%20Tiering%20for%20Physicians
to
http://www.vbpmonitor.com/optimizing-vbm-quality-tiering-for-physicians
Redirect 301 /index.php?option=com_content&view=article&id=24&utm_source=MagnetMail&utm_medium=email&utm_term=asmith#panaceainc.com&utm_content=EVVWP040716&utm_campaign=White%20Paper%3A%20Optimizing%20VBM%20Quality%20Tiering%20for%20Physicians http://www.vbpmonitor.com/optimizing-vbm-quality-tiering-for-physicians
As said above in the comments, I suspect that you have a glitch in your logic here and that in reality you want the redirection to work the other way around. Redirecting from the long URL to the search-engine-friendly one simply does not make any sense. So:
Using a Redirect rule you could try that instead:
Redirect 301 /optimizing-vbm-quality-tiering-for-physicians /index.php?option=com_content&view=article&id=24&utm_source=MagnetMail&utm_medium=email&utm_term=asmith#panaceainc.com&utm_content=EVVWP040716&utm_campaign=White%20Paper%3A%20Optimizing%20VBM%20Quality%20Tiering%20for%20Physicians
This will redirect an incoming request to the short URL to the actually existing long URL. That is the usual scenario.
If, however, you really want to redirect that long URL to the short version, then you cannot do that with a Redirect rule, because Redirect matches only the URL path and never looks at the query string. This might for example be the case if you accidentally sent out that long URL and have a working setup for the short version. Unfortunately you do not explain anything about that in your question or comments, so I can only guess here.
You'd have to use the more flexible rewriting module and a combination of RewriteCond and RewriteRule. That lets you "cut out" specific patterns from request URLs and "redesign" what the request should look like after the rewriting.
This would be a simple example that applies two conditions to redirecting the request for the file index.php to the short URL:
RewriteEngine on
RewriteCond %{QUERY_STRING} view=article
RewriteCond %{QUERY_STRING} id=24
RewriteRule ^/?index\.php$ /optimizing-vbm-quality-tiering-for-physicians [L,R=301]
Note: this version should work both in the HTTP server's host configuration and in .htaccess-style files. You should always prefer the first option if you have access to it.
As said above, I can only guess here with the sparse information you provided. I picked two out of many request arguments, since those appear to be the ones best suited as distinct identifiers. But you may have to tweak things. Note that by default RewriteConds are combined with a logical AND, so both have to match.
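One detail worth keeping in mind: by default mod_rewrite carries the original query string over to the redirect target, so the visitor would land on the short URL with all the utm_ parameters still attached. A hedged variant of the rule above, assuming Apache 2.4 or later (where the QSD flag exists), discards the query string and also anchors the two conditions so that, for instance, id=245 does not match:

```apache
RewriteEngine On

# (^|&) and (&|$) make sure we match a whole parameter, not a prefix of one.
RewriteCond %{QUERY_STRING} (^|&)view=article(&|$)
RewriteCond %{QUERY_STRING} (^|&)id=24(&|$)

# QSD = "query string discard": the redirect target carries no query string.
RewriteRule ^/?index\.php$ /optimizing-vbm-quality-tiering-for-physicians [QSD,R=301,L]
```

On older Apache versions, ending the target with a bare ? has the same effect of dropping the original query string.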
For more precise details about this stuff I would like to point you to the official documentation of those modules again. The documentation is extremely precise, well written and comes with good examples. I would always prefer the information there to snippets you find somewhere on the internet or partial answers to questions...
http://httpd.apache.org/docs/current/mod/mod_alias.html#redirect
http://httpd.apache.org/docs/current/mod/mod_rewrite.html

Hiding URL parameters with .htaccess while still making them available to $_GET

I have a script on my site ('write-review.php') that takes an optional url parameter 'site'. So server-side requests could be:
/reviews/write-review.php
or
/reviews/write-review.php?site=foo
I'm using .htaccess to create search engine friendly URLs and hide my php extensions, so that requests to this script are respectively rewritten as
/reviews/write-a-review/
or
/reviews/write-a-review/foo
I think having 'foo' in the URL may cause confusion for my users, so I'm trying to write an htaccess rewrite rule that removes 'foo' while still passing this variable to my script. Thus, a request to /reviews/write-a-review/foo would be rewritten as /reviews/write-a-review/ but write-review.php would be passed 'foo'.
The rewrite rule I currently have in place is:
RewriteRule ^reviews/write-a-review/?$ reviews/write-review.php
RewriteRule ^reviews/write-a-review/([a-zA-Z0-9_-]+)/?$ reviews/write-review.php?site=$1
Is it even possible to do what I've described above? There are MANY questions on Stack Overflow that are similar to this, and I've read through at least a dozen, but I haven't found a way to do this specifically.
Any help is really appreciated.
Thanks,
Chris
Is it even possible to do what I've described above?
No. To alter the actual URL the user inputs, you'd have to do a header redirect, during which you would lose foo.
This is not possible, except maybe by using ridiculous technical tricks like storing foo in a session variable or something. I would not recommend going that route.

301 redirect question?

Is this a good example of redirecting a page to a page on another domain:
RewriteCond %{HTTP_HOST} ^dejan.com.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.dejan.com.au$
RewriteRule ^seo_news_blog_spam\.html$ "http\:\/\/dejanseo\.com\.au\/blog\-spam\/" [R=301,L]
or does the good old way work too:
301 redirect seo_news_blog_spam.html http://dejanseo.com.au/blog/spam/
and what's the difference?
Presumably, the rules are functionally equivalent (well, assuming that http://dejanseo.com.au/blog/spam/ was supposed to be http://dejanseo.com.au/blog-spam/ like the first one redirects to, and the only host pointing at that location is dejanseo.com.au with or without the www).
The first example uses directives from mod_rewrite, whereas the second one uses some from mod_alias. I imagine that the preferred option is the second one for this particular case, if only because it's a bit simpler (there's also marginal additional overhead involved in creating the regular expressions being used by mod_rewrite, but that's very minor). Note that the URL path given to Redirect has to begin with a slash:
Redirect 301 /seo_news_blog_spam.html http://dejanseo.com.au/blog-spam/
However, I suspect the reason that you have the first one is that it was created using CPanel (based on the unnecessary escapes in the replacement that appeared before in another user's question where it was indicated CPanel was the culprit). They've gone with the mod_rewrite option because it provides conditional flexibility that the Redirect directive does not, and I assume this flexibility is reflected somewhat in whatever interface is used to create these rules.
You'll note that there is a condition on whether or not to redirect based upon your host name in the first example, identified by the RewriteCond. This allows for you to perform more powerful redirects that are based on more than just the request path. Note that mod_rewrite also allows for internal redirects invisible to the user, which mod_alias is not intended for, but that's not the capacity it's being used in here.
As a final aside, the host names in your RewriteCond statements should technically have their dots escaped, since the . character has special meaning in regular expressions. You could also combine them, change them to direct string comparisons, or remove them altogether (since I imagine they don't do anything useful here).
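As a sketch of that last point, the two conditions could be folded into one with the dots escaped. This assumes, as above, that the intended target is http://dejanseo.com.au/blog-spam/; the [NC] flag is an addition so that the host name is compared case-insensitively.

```apache
RewriteEngine On

# \. matches a literal dot; (www\.)? covers the host with or without www.
RewriteCond %{HTTP_HOST} ^(www\.)?dejan\.com\.au$ [NC]

# The target needs no backslash escapes; it is a plain string, not a regex.
RewriteRule ^seo_news_blog_spam\.html$ http://dejanseo.com.au/blog-spam/ [R=301,L]
```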
Unbelievable: the problem was that the syntax wasn't correct, so instead of:
redirect 301 seo_news_blog_spam.html http://dejanseo.com.au/blog/spam/
it should look like this:
Redirect 301 seo_news_blog_spam.html http://dejanseo.com.au/blog/spam/
One capital first letter was the source of all the trouble, what a waste of time :D
It works now as it is supposed to!
Thanks to everyone who participated, issue solved.

Having huge redirect list in .htaccess a Problem?

I want to 301-redirect every post, but I have over 3000 posts.
If I list
Redirect permanent /blog/2010/07/post.html http://new.blog.com/2010/07/23/post/
Redirect permanent /blog/2010/07/post1.html http://new.blog.com/2010/07/24/post1/
Redirect permanent /blog/2010/07/post2.html http://new.blog.com/2010/07/25/post2/
Redirect permanent /blog/2010/07/post3.html http://new.blog.com/2010/07/26/post3/
Redirect per......
for over 3000 URLs in .htaccess, would this eat my server's resources or cause some other problem? I'm not sure how .htaccess works, but if the server reads through this list each time a user requests a page, I would guess it will be a resource hog.
I can't use RedirectMatch because I added date variable in my new url. Do you have any other suggestions redirecting these posts? Or am I just fine?
Thanks!
I am not an Apache expert, so I cannot speak to whether or not having 3,000 redirects in .htaccess is a problem (though my gut tells me it probably is a bad idea). However, as a simpler solution to your problem, why not use mod_rewrite to do your redirects?
RewriteRule ^/?blog/(.+)/(.+)/(.+)\.html$ http://new.blog.com/$1/$2/$3/ [R=permanent,L]
This uses a regex to match old URLs and rewrite them to new ones. The [R=permanent] instructs mod_rewrite to issue a 301 with the new URL instead of silently rewriting the request internally.
In your example, it looks like you've added the day of the post to the URL, which does not exist in the old URL. Since you obviously cannot use a regexp to divine the day an arbitrary post was made, this method may not work for you. If you can drop the day from the URL, then you're good to go.
Edit: The first time I read your question, I missed the last paragraph. ("I can't use RedirectMatch because I added date variable in my new url.") In this case, you can use mod_rewrite's RewriteMap to look up the day component of a post.
You have two options:
1. Use a hashmap to perform fast lookups in a static file. This means all your old URLs will work, but any new posts cannot be accessed using the old URL scheme.
2. Use a script to grab the day.
In option one, create a file called posts.txt and put:
/yyyy/mm/pppp dd
...for each post where yyyy is the year of the post, mm is the month, and pppp is the post name (without the .html).
When you're done, run:
$ httxt2dbm -i posts.txt -o posts.map
Then add the following to the server or virtual host config. (Note that the path is a filesystem path, not a URL.)
RewriteMap postday dbm:/path/to/file/posts.map
RewriteRule ^/blog/(.+)/(.+)/(.+)\.html$ http://new.blog.com/$1/$2/${postday:/$1/$2/$3}/$3/ [R=permanent,L]
In option two, use prg:/path/to/script/lookup.whatever as your RewriteMap. See the mod_rewrite documentation for more info about using a script.
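If building the DBM file is inconvenient, a plain-text map is another possibility. The following is only a sketch under a few assumptions: it goes in the server or virtual host config, posts.txt uses keys with the leading slash exactly as in the format shown above (/yyyy/mm/pppp dd), and the RewriteCond guard is an addition that skips the redirect when a post is missing from the map.

```apache
RewriteEngine On

# txt: maps are read straight from the text file, so no httxt2dbm step.
RewriteMap postday "txt:/path/to/file/posts.txt"

# The condition runs only after the rule pattern has matched, so $1..$3
# are available here. An unknown key yields an empty string, the
# condition fails, and no redirect is issued.
RewriteCond ${postday:/$1/$2/$3} ^(\d+)$

# %1 is the day captured by the condition above.
RewriteRule ^/blog/([^/]+)/([^/]+)/([^/]+)\.html$ http://new.blog.com/$1/$2/%1/$3/ [R=permanent,L]
```

The guard avoids sending visitors to a malformed URL with an empty day segment when the lookup fails.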
Doing the lookup in mod_rewrite is better than just redirecting to a script which looks up the date and then redirects to the final destination because you should never redirect more than once. Issuing a 301 or 302 incurs a round trip cost, which increases the latency of your page load time.
If you have some way in code to determine the day of a post, you can generate the redirect on the fly. You can set up a mod_rewrite rule whose pattern matches something like .html, route those requests to a front controller script, and have that script calculate the new URL from the old one and issue the 301 header.
With php as an example:
$_SERVER['REQUEST_URI']
will contain the requested url and
header("Location: http://new.blog.com/$y/$m/$d/$title/",TRUE,301);
will send a redirect.
That's... a lot of redirects. But the first thing I would tell you, and probably the only thing I can tell you without qualification, is that you should run some tests and see what the access times for your blog are like, and also look at the server's CPU and memory usage while you're doing it. If they're fairly low even with that giant list of redirects, you're okay as long as your blog doesn't experience a sudden increase in traffic. (I strongly suspect the 3000 rewrites will be slowing Apache down a lot, though)
That being said, I would second josh's suggestion of replacing the redirects with something dynamic. Like animuson said, if you're willing to drop the day from the URL, it'll be easy to set up a RewriteRule directive to handle the redirection. Otherwise, you could do it with a PHP script, or generally some code in whatever scripting language you (can) use. If you're using one of the popular blog engines, it probably contains code to do this already. Basically you could do something like
RewriteRule .* /blog/index.php
and just let the PHP script sort out which post was requested. It has access to the database so it'll be able to do that, and then you can either display the post directly from the PHP script, or to recover your original redirection behavior, you can send a Location header with the correct URL.
An alternative would be to use RewriteMap, which lets you write a RewriteRule where the target is determined by a program or file of your choice instead of being directly specified in the configuration file. As one option, you can specify a text file that contains the old and new URLs, and Apache will handle searching the file for the appropriate line for any given request. Read the documentation (linked above) for the full details. I will mention that this isn't used very often, and I'm not sure how much faster it would be compared to just having 3000 redirects.
Last tip: Apache can be significantly faster if you're able to move the configuration directives (like Redirect) into the server or virtual host configuration file, and disable reading of .htaccess entirely. I would guess that moving 3000 directives from .htaccess into the virtual host configuration could make your server considerably faster. But even moving the directives into the vhost config file probably wouldn't produce as much of a speedup as using a single RewriteRule.
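A rough sketch of what that could look like in a virtual host file. The ServerName and DocumentRoot values are placeholders, not taken from the question, and a real host would need its usual access-control directives as well.

```apache
<VirtualHost *:80>
    # Hypothetical host name and path, for illustration only.
    ServerName www.example.com
    DocumentRoot "/var/www/html"

    <Directory "/var/www/html">
        # Stop Apache from looking for .htaccess files on every request.
        AllowOverride None
    </Directory>

    # Directives that used to live in .htaccess are parsed once at startup.
    Redirect permanent /blog/2010/07/post.html http://new.blog.com/2010/07/23/post/
</VirtualHost>
```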
It's never a good idea to make a massive list of Redirects. A better programming technique is to simply redirect the pages without that date variable, then have a small PHP snippet that detects if it's missing and redirects to the URL with it included. The long list looks tacky and slows down Apache because it's checking that URL (and every other URL that might not even be affected by this) against each line. If it were only 5 or so, I'd say fine, but 3,000 is a definite NO.
Although I'm not a big fan of this method, a better choice would be to redirect all those URLs normally using a single match statement, redirecting them to the page without the date part, or with a dash or something, then include a small PHP snippet to check if the date is valid and if not, rewrite the path again to the correctly formed URL.
Honestly, if you didn't have that part there before, you don't need it now, and it will probably just confuse the search engines changing the URL for 3,000 posts. You don't really need a date in the URL, a good title is much more meaningful not only to users, but also to search engines, than a bunch of numbers.

.htaccess Rewrite Question

I have a number of pages in my site, as one would expect.
For example:
index.php
submit.php
view.php?id=blah
I want these rewritten like
index/
submit/
view/blah
What's the best way of doing this?
The ways of handling it through .htaccess Rewrite can generate a bit of a headache. It seems like a basic answer, but unless you're up on your regular expressions, you're going to be lost.
There are a few ways of handling it, however. I'm assuming that you only have index.php, submit.php, and view.php with an id associated.
RewriteEngine On
RewriteRule ^(index|submit|view)/(\d+)$ /$1.php?id=$2 [L]
RewriteRule ^(index|submit|view)/?$ /$1.php [L]
Here's how it works: the first line tells .htaccess to turn on the rewrite engine. The first rule then reads: at the beginning of the URL path, after the domain name, check for index, submit, or view, followed by a slash and a numeric id. If both are present, the request is passed to PHP as /(index, submit, or view).php?id=(the id).
The second rule covers the case where no id is given.
This is a simple way of handling it. A more complex way of handling it would be...
RewriteEngine On
RewriteRule ^([a-z]+)/?(\d+)?$ /$1.php?id=$2
This takes whatever is written in lowercase letters (add the [NC] flag if uppercase should be accepted as well), uses that as the filename, and treats the id as optional, so the page will load without one.
You should be sure to include some safeguards around your $_GET handling to return errors if the names are erroneous or the id doesn't return anything of worth.
You should play around with it, research Regular Expressions over a pot of coffee and something alcoholic (I believe that regexp is the #1 cause of alcoholism in modern programmers, but I could be wrong ;-P ...) til you find a scheme that fits comfortably into your system.
As a side-note, you can have as many RewriteRules as you need, but they always get processed from the top one first. I realize this sounds like common sense, but it's important to know when debugging.
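Since the id in the question is "blah" rather than a number, here is a hedged variant that accepts non-numeric ids. It assumes the .htaccess file sits in the document root and that ids consist only of letters, digits, underscores, and hyphens; adjust the character class if yours differ.

```apache
RewriteEngine On

# index/ and submit/ take no id; the trailing slash is optional.
RewriteRule ^(index|submit)/?$ /$1.php [L]

# view/blah -> /view.php?id=blah
RewriteRule ^view/([A-Za-z0-9_-]+)/?$ /view.php?id=$1 [L]
```

Listing the page names explicitly, rather than matching any word, keeps arbitrary paths from being mapped onto PHP files that may not exist.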

Resources