I am trying to block requests being made to our pagination parameter by multiple robots (evil ones, it seems).
Hundreds of requests like this one are showing up:
http://www.ourdomain.com/search.php?q=search+query&page=366100876
Is there a way, using regular expressions in .htaccess, to block any request for a page larger than 1000, or with more than 4 digits in the 'page' parameter?
The 'q' parameter is of course always different.
Thank you.
I derived most of this from a really cool article called Ultimate .htaccess file sample. Very handy.
Redirect 500 /error500.html
RedirectMatch 500 ^.{1001,}$
That would send away any request whose URL path is longer than 1000 characters (RedirectMatch does not look at the query string).
LimitRequestBody 102400
That would reject any request whose body is larger than 100 KB (102400 bytes).
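To target the GET variable page specifically, you need mod_rewrite, because RedirectMatch only tests the URL path and never sees the query string:
RewriteCond %{QUERY_STRING} page=[0-9]{4}
RewriteRule ^ - [F]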
I tried this and it works, added it to some other checks I had:
RewriteCond %{QUERY_STRING} page=[0-9]{4} [OR]
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Send all blocked requests to the homepage with a 403 Forbidden error!
RewriteRule ^(.*)$ index.php [F,L]
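For anyone who wants to check offline what the page=[0-9]{4} condition actually catches, here is a minimal Python sketch. It assumes Python's re module behaves like Apache's regex engine for a pattern this simple, and the STRICT variant (parameter boundary, 5 or more digits) is my own suggestion, not something from the rules above.

```python
import re

# Same pattern as the RewriteCond; Apache applies it as an unanchored search.
LOOSE = re.compile(r"page=[0-9]{4}")
# Hypothetical stricter variant: must start a parameter and have 5+ digits.
STRICT = re.compile(r"(?:^|&)page=[0-9]{5,}")


def is_blocked(pattern, query_string):
    """Return True if the query string would trigger the blocking condition."""
    return pattern.search(query_string) is not None


if __name__ == "__main__":
    samples = (
        "q=search+query&page=366100876",
        "q=a&page=1000",
        "q=a&page=999",
        "q=a&mypage=12345",
    )
    for qs in samples:
        print(qs, is_blocked(LOOSE, qs), is_blocked(STRICT, qs))
```

Note that the loose pattern also blocks page=1000 itself and any parameter whose name merely ends in "page", so pick the variant that fits how deep your pagination really goes.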
A: htaccess: Rewrite, URI /*admin*/* [NC,OR] - you must rename anything with "admin" in it to something else: xyzzy, spot, nancy, fuivan... This also applies to names like login.php, which becomes jumpjoy.php.
B: When you see an attempted hack - 188 hits from a non-RU, non-CH IP - contact the hosting company and tell them. More often than not you get back "Thank you, we have found and cleaned out the bot. [banned the user]"
I never display "403 forbidden", I redirect to some other page (I have a random selection of 7). I then append htaccess with "deny from $IP". I host local groups and businesses so there is no need for Ivan wanting a cab... he couldn't afford the fare :-/ There are 900+ deny from in my htaccess.
I run a crontab "find $HOME -newer lasttime | mail-me ; touch lasttime". That way if anyone does get in I know within a couple of hours. Also "chmod 444 [all].php".
I am using .htaccess to redirect certain subfolders of my domain, to remove the question mark to improve my URLs.
Currently my URLs are like this:
www.example.com/post/?sometitle
I am trying to remove the question mark, so it is the following URL:
www.example.com/post/sometitle
Currently I have the following code in my .htaccess file:
RewriteCond %{THE_REQUEST} /post/?([^\s&]+) [NC]
RewriteRule ^ /post/%1 [R=302,L,NE]
I am using PHP GET parameters. What I want is that when the browser visits example.com/post/sometitle, the page that is currently at example.com/post/?sometitle is displayed.
In that case you need to do the opposite of what you are asking in your question: you need to internally rewrite (not externally "redirect") the request from example.com/post/sometitle to example.com/post/?sometitle.
However, you must have already changed all the URLs in your application to use the new URL format (without the query string). You shouldn't be using .htaccess alone for this.
I also assume that /post is a physical directory and that you are really serving index.php in that directory (mod_dir is issuing an internal subrequest to this file). So, instead of /post/?sometitle, it's really /post/index.php?sometitle?
For example:
RewriteEngine On
# Rewrite /post/sometitle to filesystem path
RewriteRule ^post/([\w-]+)$ /post/index.php?$1 [L]
So, now when you request /post/sometitle the request is internally rewritten and handled by /post/index.php?sometitle instead.
I have assumed that "sometitle" can consist of 1 or more of the characters a-z, A-Z, 0-9, _ and -. Hence the regex [\w-]+.
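To see which request paths that rule accepts, the following Python sketch mimics the pattern and substitution. It is only an approximation of what mod_rewrite does: the function name is made up for illustration, and re.ASCII is used so that \w stays limited to the characters listed above.

```python
import re

# Per-directory pattern from the RewriteRule (no leading slash in .htaccess).
# re.ASCII keeps \w to [A-Za-z0-9_], matching the character set described above.
REWRITE = re.compile(r"^post/([\w-]+)$", re.ASCII)


def rewrite_target(path):
    """Return the internally rewritten target, or None if the rule does not match."""
    m = REWRITE.match(path)
    return f"/post/index.php?{m.group(1)}" if m else None


if __name__ == "__main__":
    for path in ("post/sometitle", "post/my-title_2", "post/", "post/a/b"):
        print(path, "->", rewrite_target(path))
```

Paths with a slash or a space after post/ fall through unchanged, which is why the old /post/ URL (with the query string) still reaches index.php directly.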
If this is a new site then you can stop there. However, if you are changing an existing URL structure that has already been indexed by search engines and linked to by external third parties then you'll need to redirect the old URLs to the new. (Just to reiterate, you must have already changed the URL in your application, otherwise users will experience repeated redirects as they navigate your site.)
To implement the redirect, you can add something like the following before the above rewrite:
# Redirect any "stray" requests to the old URL
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ([\w-]+)
RewriteRule ^post/$ /post/%1 [R=302,NE,QSD,L]
The check against the REDIRECT_STATUS environment variable ensures we only redirect "direct requests" (it is empty on the initial request and set after the internal rewrite), and thus avoids a redirect loop.
(Change to 301 only when tested as OK, to avoid caching issues.)
In Summary:
RewriteEngine On
# Redirect any "stray" requests to the old URL
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ([\w-]+)
RewriteRule ^post/$ /post/%1 [R=302,NE,QSD,L]
# Rewrite /post/sometitle to filesystem path
RewriteRule ^post/([\w-]+)$ /post/index.php?$1 [L]
UPDATE: If you have multiple URLs ("folders") that all follow the same pattern, such as /post/<title>, /home/<title> and /build/<title> then you can modify the above to cater for all three, for example:
# Redirect any "stray" requests to the old URL
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ([\w-]+)
RewriteRule ^(post|home|build)/$ /$1/%1 [R=302,NE,QSD,L]
# Rewrite /post/sometitle to filesystem path
RewriteRule ^(post|home|build)/([\w-]+)$ /$1/index.php?$2 [L]
Aside: (With my Webmasters hat on...) This is not really much of an "improvement" to the URL structure. If this is an established website with many backlinks and good SE ranking then you should think twice about making this change as you could see a dip in rankings at least initially.
If all you need is to move the title out of the query string, try the rule below. It uses the QSD flag to discard the original query string once the rule has matched.
RewriteCond %{QUERY_STRING} ([^\s&]+) [NC]
RewriteRule ^ /post/%1 [R=302,L,NE,QSD]
A search bot is scanning pages on my site with a lot of strange GET params right now. For example ?x?, ?728%10%02, ?%18%9B%D9%DF%05 etc. I don't know where the bot found those URLs, but it makes my CPU smoke because the cache system doesn't handle URLs with GET params.
I am not able to modify the cache system, but I want to redirect requests with GET params to the same URL without GET params through .htaccess. However, I have some important GET params that shouldn't be redirected: ?s=... for site search, and the utm labels.
In summary I want to redirect the following urls
/some-url?x?
/some-url?728%10%02
/some-url?%18%9B%D9%DF%05
and many other GET params to
/some-url
But keep untouched urls like this:
/some-url?s=searh_term or
/some-url?utm_campaign=my_campaign
If only a fixed set of GET parameters is valid, you can check against them in your .htaccess file and redirect every request whose query string does not start with one of the allowed parameters.
RewriteEngine On
# check that there is indeed a query string
RewriteCond %{QUERY_STRING} ^.+$
# check that it doesn't start with one of allowed parameters
RewriteCond %{QUERY_STRING} !^(utm_campaign|s|other|parameters|list)= [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]
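A quick way to sanity-check the allow-list before deploying it is to run the same two conditions in Python. This is only a sketch: the function name is made up, and "other", "parameters" and "list" are the placeholders from the rule above, not real parameter names.

```python
import re

# Second RewriteCond: query string starts with an allowed parameter ([NC] = ignore case).
ALLOWED = re.compile(r"^(utm_campaign|s|other|parameters|list)=", re.IGNORECASE)


def should_strip_query(query_string):
    """True when both conditions hold: non-empty query, no allowed parameter at the start."""
    return bool(query_string) and ALLOWED.match(query_string) is None


if __name__ == "__main__":
    for qs in ("x?", "728%10%02", "s=search_term", "utm_campaign=my_campaign", "", "foo=1&s=x"):
        print(repr(qs), should_strip_query(qs))
```

Because the pattern is anchored at the start, only the first parameter is inspected: foo=1&s=x would be redirected, and other utm labels such as utm_source need to be added to the list explicitly.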
I've narrowed down my problem and have a specific question.
With only the following code in the .htaccess, why does index2.php get called if I type in my URL as www.mysite.com/url2 ?
RewriteEngine On
RewriteCond %{REQUEST_URI} (.html|.htm|.feed|.pdf|.raw)$ [NC]
RewriteRule (.*) index2.php [L]
I've also tested it at http://www.regextester.com and, according to that, the URL should not be replaced with index2.php.
In the end I want this rule to skip any URL starting with /url2 or /url2/*.
EDIT: I've made screen recording of this problem: http://screenr.com/BBBN
You have this in your .htaccess:
RewriteEngine On
RewriteCond %{REQUEST_URI} (.html|.htm|.feed|.pdf|.raw)$ [NC]
RewriteRule (.*) index2.php [L]
What does it do? It rewrites any URL that ends with html, htm, feed, pdf or raw to index2.php. So, if index2.php is being served for a URL that does not appear to end with one of those extensions, there are two possible explanations:
There is another rewrite rule in an .htaccess file in a parent directory (or in the server config files) that causes the URL to be rewritten.
Your URL actually does end with one of those extensions by the time the rule runs. Keep in mind that what you enter in your address bar can be changed and rewritten by the server. For example, if you enter www.mysite.com/url2 in your address bar and that file doesn't exist on the server, your server will try to load the proper error document. So, if your error document is /404.html, it will be rewritten to index2.php in the end.
Update:
I think that is the case here. Create a file named 404.php in your document root. Inside your main .htaccess (in your document root), put this:
ErrorDocument 404 /404.php
Delete all other ErrorDocument directives.
Inside 404.php, put this:
<?php
echo 'From 404.php file';
?>
Logic behind it:
When you see weird behavior in mod_rewrite, the best solution in my experience is to use the rewrite log. To enable it, put this in your virtual host or another server config section of your choice:
RewriteLogLevel 9
RewriteLog "logs/RewriteLog.log"
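Be careful: the code above enables the rewrite log at the highest level possible (logging everything). It will slow your server down and the log file will become huge very quickly, so do this only on your dev server. Note that RewriteLog and RewriteLogLevel exist only up to Apache 2.2; on Apache 2.4 use the LogLevel directive with a rewrite trace level instead (for example LogLevel alert rewrite:trace3).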
Explanation: When you try to access www.mysite.com/url2, Apache gives your URL to the rewrite module. The rewrite module checks whether any of the RewriteRules applies to your URL. Because you have one rule and it doesn't apply to your URL, Apache tries to load the normal file. But this file does not exist. So Apache takes the next step, which is showing the proper error message. When you set a custom error file, Apache runs the test again against the new address. For example, if the error document is /404.html, Apache checks whether your rule applies to /404.html or not. Since it does, it rewrites it.
The point to remember is that Apache will do this every time there is a change in the URL, whether the change is made by the rewrite module or not!
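The sequence described above can be played through in a toy Python model. This is a deliberate simplification of Apache's request processing (the function and its arguments are invented for illustration), but it shows why /url2 can end up at index2.php when the error document is /404.html and not when it is /404.php.

```python
import re

# The condition from the question's .htaccess, unchanged ([NC] = ignore case).
RULE = re.compile(r"(.html|.htm|.feed|.pdf|.raw)$", re.IGNORECASE)


def serve(uri, existing_files, error_document):
    """Return the chain of URIs visited while handling one request (toy model)."""
    trace = [uri]
    if RULE.search(uri):
        trace.append("/index2.php")      # rule applies to the original URI
        return trace
    if uri in existing_files:
        return trace                     # plain file, nothing rewritten
    trace.append(error_document)         # 404: switch to the error document
    if RULE.search(error_document):      # the rule is tested again on the new URI
        trace.append("/index2.php")
    return trace


if __name__ == "__main__":
    files = {"/index2.php"}
    print(serve("/url2", files, "/404.html"))
    print(serve("/url2", files, "/404.php"))
```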
The rule you list should work as you expect if this is the only rule. Fact is that theory is fun, but apparently it doesn't work as expected. Please note that . will match ANY CHARACTER. If you want to match the full stop/period character, you'll need to escape it. That's why I use \.(html|htm|feed|pdf|raw)$ instead of (.html|.htm|.feed|.pdf|.raw)$ below.
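To make the difference between the two patterns concrete, here is a small Python check. It assumes Python's re treats these simple patterns the same way Apache's regex engine does; the sample URIs are made up.

```python
import re

# Pattern from the question: each "." matches ANY character.
UNESCAPED = re.compile(r"(.html|.htm|.feed|.pdf|.raw)$", re.IGNORECASE)
# Pattern with the dot escaped: a literal "." is required before the extension.
ESCAPED = re.compile(r"\.(html|htm|feed|pdf|raw)$", re.IGNORECASE)


def matches(pattern, uri):
    """Return True if the pattern matches anywhere in the URI (unanchored search)."""
    return pattern.search(uri) is not None


if __name__ == "__main__":
    for uri in ("/page.html", "/newsfeed", "/draw", "/url2"):
        print(uri, matches(UNESCAPED, uri), matches(ESCAPED, uri))
```

URIs such as /newsfeed or /draw have no extension at all, yet the unescaped pattern still catches them; /url2 is matched by neither, which is why something else must be changing the URL in the question.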
You can add another RewriteCond that simply doesn't match if the url starts with /url2, like below. This might not be a viable solution if there are lots of urls that shouldn't be matched.
RewriteCond %{REQUEST_URI} !^/url2
RewriteCond %{REQUEST_URI} \.(html|htm|feed|pdf|raw)$ [NC]
RewriteRule (.*) index2.php [L]
To get a better understanding of what is happening, you can alter the rule to something like this. Now simply enter the URLs you don't want to be matched in the address bar and inspect the address bar after the redirect happens. The url parameter now shows which URL actually triggered the rule. This screencast shows you a similar version working with a sneaky RewriteRule that is working away on the URL.
#A way of finding out what is -actually- matched
RewriteCond %{REQUEST_URI} \.(html|htm|feed|pdf|raw)$ [NC]
RewriteCond %{REQUEST_URI} !/foo
RewriteRule (.*) /foo?url=$1 [R,L]
You can decide to match the %{THE_REQUEST} variable instead. This will always contain the request itself. If something else is rewriting the url, this variable doesn't change, meaning you can use this to overwrite any changes. Make sure the url won't be matching itself. You would get something like below. An example screencast can be found here.
#If it doesn't end on .html/htm/feed etc, this one won't match
RewriteCond %{THE_REQUEST} ^(GET|POST)\ /.*\.(html|htm|feed|pdf|raw)\ HTTP [NC]
RewriteCond %{REQUEST_URI} !^/index2\.php$
RewriteRule (.*) /index2.php [L]
I'm trying to set up a htaccess file that would accomplish the following:
Only allow my website to be viewed if the viewing user is coming from a specific domain (link)
So, for instance, I have a domain called protect.mydomain.com. I only want people coming from a link on unprotected.mydomain.com to be able to access protect.mydomain.com.
The big outstanding issue I have is that if you get to protect.mydomain.com from unprotected.mydomain.com and click on a link in the protect.mydomain.com that goes to another page under protect.mydomain.com then I get sent back to my redirect because the http_referer is protect.mydomain.com . So to combat that I put in a check to allow the referrer to be protect.mydomain.com as well. It's not working and access is allowed from everywhere. Here is my htaccess file. (All this is under https)
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_REFERER} ^https://(.+\.)*mydomain\.com
RewriteCond %1 !^(protect|unprotected)\.$
RewriteRule ^.*$ https://unprotected.mydomain.com/ [R=301,L]
You are matching your referer against ^https://(.+\.)*mydomain\.com. That means if a completely different site, say http://stealing_your_images.com/, links to something on protect.mydomain.com, the first condition fails, so the request is never redirected to https://unprotected.mydomain.com/. You want to approach it from the other direction: only allow certain referers to pass through, then redirect everything else:
RewriteEngine On
RewriteBase /
# allow these referers to passthrough
RewriteCond %{HTTP_REFERER} ^https://(protect|unprotected)\.mydomain\.com
RewriteRule ^ - [L]
# redirect everything else
RewriteRule ^ https://unprotected.mydomain.com/ [R,L]
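If you want to test which Referer values pass the allow-list, the sketch below mirrors the condition in Python. The STRICT variant, which additionally requires the host name to end right after .com, is my own assumption about what you probably want and is not part of the rules above; a Referer that includes an explicit port would need a further tweak.

```python
import re

# Pattern from the RewriteCond: anchored at the start only.
ALLOWED = re.compile(r"^https://(protect|unprotected)\.mydomain\.com")
# Hypothetical stricter variant: host must end at ".com" (followed by "/" or nothing).
STRICT = re.compile(r"^https://(protect|unprotected)\.mydomain\.com(/|$)")


def passes(pattern, referer):
    """Return True if a request with this Referer header would be let through."""
    return pattern.search(referer or "") is not None


if __name__ == "__main__":
    samples = (
        "https://unprotected.mydomain.com/page",
        "https://protect.mydomain.com/",
        "http://stealing_your_images.com/",
        "https://protect.mydomain.com.evil.example/",
        "",
    )
    for ref in samples:
        print(repr(ref), passes(ALLOWED, ref), passes(STRICT, ref))
```

Requests with no Referer at all (typed URLs, bookmarks, privacy tools) are redirected too, and the header is trivially spoofable, so treat this as a convenience gate rather than real protection.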
I have the following page: www.domain.com/index.php?route=information/contact and I'd like to rewrite it so that it shows up as: www.domain.com/contact, but there's more...
What's important, is that when someone types in www.domain.com/contact, it redirects them to www.domain.com/index.php?route=information/contact, which in turn, is rewritten as www.domain.com/contact.
I appreciate any help! Thanks.
Edit: To clarify
I want users to be able to enter www.domain.com/contact and be redirected to www.domain.com/index.php?route=information/contact.
However once redirected, I'd like a purely aesthetic rewrite so that www.domain.com/index.php?route=information/contact shows up as www.domain.com/contact (the same as what they typed in.)
Is this possible?
Edit: My .htaccess file currently...
Options +FollowSymlinks
# Prevent Directory listing
Options -Indexes
# Prevent Direct Access to files
<FilesMatch "\.(tpl|ini)">
Order deny,allow
Deny from all
</FilesMatch>
# SEO URL Settings
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)\?*$ index.php?_route_=$1 [L,QSA]
RewriteCond %{QUERY_STRING} ^route=common/home$
RewriteCond %{REQUEST_METHOD} !^POST$
RewriteRule ^index\.php$ http://www.domain.com/? [R=301,L]
### Additional Settings that may need to be enabled for some servers
### Uncomment the commands by removing the # sign in front of it.
### If you get an "Internal Server Error 500" after enabling, then restore the # as this means your host doesn't allow that.
# 1. If your cart only allows you to add one item at a time, it is possible register_globals is on. This may work to disable it:
# php_flag register_globals off
Try these rules in your .htaccess file:
Options +FollowSymlinks -MultiViews
RewriteEngine on
RewriteCond %{THE_REQUEST} ^GET\s/+index\.php [NC]
RewriteCond %{QUERY_STRING} ^route=information [NC]
RewriteRule . /contact? [L,NC,R=301]
RewriteRule ^contact$ /index.php?route=information/contact [L,NC]
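The first rule externally redirects direct requests for the index.php URL to the pretty URL. The second rule has no R flag, so it is applied internally and the URL in the user's browser doesn't change; the L flag only stops further rules from being processed in that pass.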
Your question is extremely unclear, and I suspect that inexperience is to blame.
With the following rule:
RewriteRule /?(.*) index.php?route=information/$1
the location bar will read "/contact" but index.php will be invoked via an internal rewrite.
With a small modification:
RewriteRule /?(.*) index.php?route=information/$1 [R]
the location bar will read "/index.php?route=information/contact" and index.php will be invoked, after the redirect.
As always, the rule should follow the appropriate RewriteCond so as to avoid rewriting if an actual file is requested.
AFAIK, you can't make the address bar show a different address than the one that the page was loaded from. If you want the user to see www.domain.com/contact in the address bar when viewing the page, you need to make the server actually return the page content (not a redirect) when that URL is requested.
I think you might be misunderstanding URL rewriting: it's not for changing what the user sees in the address bar, it's for changing what the server sees when a request arrives from the user. If you create a rewrite rule that changes /foo to /bar, then when the user types /foo in their browser, the server will treat it as a request for /bar.
What you want, I think, is that when the user types www.domain.com/contact in their browser, the server should treat it as a request for www.domain.com/index.php?route=information/contact, but the browser should still show the pretty URL that the user typed. The way to do that is to simply rewrite /contact to /index.php?route=information/contact on the server. No redirect is needed; the user simply requests the pretty URL, and the server handles the request based on the equivalent ugly one and sends back the resulting page.
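The difference between the two approaches can be shown with a toy Python model. Nothing here is Apache code: the mapping, the function names and the returned fields are invented purely to illustrate what the browser ends up displaying in each case.

```python
# Hypothetical mapping from the pretty URL to the URL the application understands.
PRETTY_TO_INTERNAL = {"/contact": "/index.php?route=information/contact"}


def handle_with_rewrite(requested):
    """Internal rewrite: the server swaps the target silently and answers directly."""
    target = PRETTY_TO_INTERNAL.get(requested, requested)
    return {"status": 200, "address_bar": requested, "handled_as": target}


def handle_with_redirect(requested):
    """External redirect: the browser is told to request the target URL itself."""
    target = PRETTY_TO_INTERNAL.get(requested, requested)
    if target == requested:
        return {"status": 200, "address_bar": requested, "handled_as": requested}
    # After following the redirect, the address bar shows the ugly URL.
    return {"status": 302, "address_bar": target, "handled_as": target}


if __name__ == "__main__":
    print(handle_with_rewrite("/contact"))
    print(handle_with_redirect("/contact"))
```

In both cases index.php produces the page; only the rewrite keeps /contact in the address bar.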