I have an example .htaccess file (found here) that lists bad robots to block. Here's a small sample from that file:
#bad bots start
#programmed by tab-studio.com public version 2017.12
#1 new rule every 500 entries
RewriteCond %{HTTP_USER_AGENT} \
12soso|\
192\.comagent|\
1noonbot|\
zuibot|\
zyborg|\
zyte\
[NC]
RewriteRule .* - [F]
#bad bots end
Basically, it returns a 403 when the User-Agent matches. I checked this post to see how I can convert these .htaccess rules to a web.config rewrite rule in IIS.
When I import the rules, however, I get an unexpected result: no rules seem to be converted (see image below). What am I doing wrong?
It's almost certainly choking on the trailing \ line continuations and the line breaks that follow them. If you put the whole condition, including its [NC] flag, on a single line, it should import properly:
#bad bots start
#programmed by tab-studio.com public version 2017.12
#1 new rule every 500 entries
RewriteCond %{HTTP_USER_AGENT} 12soso|192\.comagent|1noonbot|zuibot|zyborg|zyte [NC]
RewriteRule .* - [F]
#bad bots end
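If the real file is long, flattening it by hand is tedious. Below is a minimal Python sketch of one way to join the continued lines before importing. The function name `join_continuations` and the sample lines are made up for this illustration; it assumes a continuation is a backslash as the very last character of a line, and the sample deliberately puts a space before the final backslash so that the pattern and the [NC] flag stay separated.

```python
def join_continuations(text: str) -> str:
    """Join every line that ends in a backslash with the line after it."""
    joined = []
    pending = ""
    for line in text.splitlines():
        if line.endswith("\\"):
            # Drop the trailing backslash and keep accumulating.
            pending += line[:-1]
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        # The input ended in the middle of a continuation.
        joined.append(pending)
    return "\n".join(joined)


# Hypothetical, shortened sample in the same shape as the bad-bots block.
SAMPLE_LINES = [
    "RewriteCond %{HTTP_USER_AGENT} \\",
    "12soso|\\",
    "192\\.comagent|\\",
    "zyte \\",
    "[NC]",
    "RewriteRule .* - [F]",
]

if __name__ == "__main__":
    print(join_continuations("\n".join(SAMPLE_LINES)))
```

Lines that do not end in a backslash (comments, the RewriteRule) pass through unchanged.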
Having said that, you might consider looking at using Request Filtering & Scan Headers instead: https://learn.microsoft.com/en-us/iis/configuration/system.webserver/security/requestfiltering/filteringrules/filteringrule/scanheaders/
We added the hotlink protection below in order to save bandwidth.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^https://(www\.)?domain [NC]
RewriteCond %{HTTP_REFERER} !^https://(www\.)?domain.*$ [NC]
RewriteRule \.(gif|GIF|jpg|JPG|PNG|png|jpeg|JPEG|mp4|MP4|mkv|MKV|webm|WEBM|ico|ICO)$ - [F]
This is working perfectly. Now we want to exclude everything under admin/thumbs (for example, domain.tld/admin/thumbs/image.jpg) from the hotlink protection.
We tried adding the code below, but it's not working. We searched Stack Overflow and multiple forums, but none of the answers helped us.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^https://(www\.)?domain [NC]
RewriteCond %{HTTP_REFERER} !^https://(www\.)?domain.*$ [NC]
RewriteCond %{REQUEST_URI} !/admin/thumbs$
RewriteRule \.(gif|GIF|jpg|JPG|PNG|png|jpeg|JPEG|mp4|MP4|mkv|MKV|webm|WEBM|ico|ICO)$ - [F]
Any help would be appreciated.
RewriteCond %{REQUEST_URI} !/admin/thumbs$
This creates an exception for any URL that ends with /admin/thumbs, whereas it seems you want to create an exception for any file in the /admin/thumbs subdirectory, i.e. any URL that starts with /admin/thumbs.
The suggestion !^admin/thumbs/? in comments is incorrect: the REQUEST_URI server variable always starts with a slash, so that pattern never matches, the negated condition is always successful, and the request is potentially blocked.
You should use the CondPattern !^/admin/thumbs($|/) instead to exclude requests for /admin/thumbs, /admin/thumbs/ and /admin/thumbs/<anything>, but not /admin/thumbsomething. For example:
RewriteCond %{REQUEST_URI} !^/admin/thumbs($|/)
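To see the difference between the three patterns side by side, here is a small Python sketch. It uses Python's `re` module as a stand-in for Apache's PCRE matching, which should behave the same for patterns this simple; the helper name `excluded` and the sample URIs are only illustrative.

```python
import re

# The three CondPatterns discussed above, without the leading "!" negation.
PATTERNS = {
    "original": r"/admin/thumbs$",
    "from_comments": r"^admin/thumbs/?",
    "suggested": r"^/admin/thumbs($|/)",
}

URIS = [
    "/admin/thumbs",
    "/admin/thumbs/",
    "/admin/thumbs/image.jpg",
    "/admin/thumbsomething",
]


def excluded(pattern: str, uri: str) -> bool:
    """True when the pattern matches, so the negated condition fails
    and the request is exempt from hotlink protection."""
    return re.search(pattern, uri) is not None


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        exempt = [uri for uri in URIS if excluded(pattern, uri)]
        print(f"{name:14} exempts {exempt}")
```

Only the suggested pattern exempts the image inside the subdirectory while still leaving /admin/thumbsomething protected.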
Your existing rule can be further simplified and refined. The two existing conditions that check HTTP_REFERER are effectively the same (the second only appends .*$), and both match too much, since they accept any hostname that merely starts with domain. The mixed-case RewriteRule pattern can also be flattened by using the NC (nocase) flag instead.
For example, the complete rule would become:
# Hotlink protection for images, except those in "/admin/thumbs/..."
RewriteCond %{HTTP_REFERER} !^https://(www\.)?example\.com($|/) [NC]
RewriteCond %{REQUEST_URI} !^/admin/thumbs($|/)
RewriteRule \.(gif|jpg|png|jpeg|mp4|mkv|webm|ico)$ - [NC,F]
Note that this also blocks an empty Referer header. This includes direct requests (anyone typing the URL directly into the browser's address bar) and any user that has suppressed the Referer in their browser (which some users do for increased privacy).
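The combined behaviour of the rule can be modelled in a few lines of Python. This is only a sketch of the matching logic, not of how Apache evaluates it internally; the function name `is_forbidden` and the test hostnames are assumptions made for the example.

```python
import re

# Same patterns as the complete rule above.
REFERER_OK = re.compile(r"^https://(www\.)?example\.com($|/)", re.IGNORECASE)
THUMBS = re.compile(r"^/admin/thumbs($|/)")
MEDIA = re.compile(r"\.(gif|jpg|png|jpeg|mp4|mkv|webm|ico)$", re.IGNORECASE)

# The looser pattern from the question, for comparison.
LOOSE_REFERER = re.compile(r"^https://(www\.)?domain", re.IGNORECASE)


def is_forbidden(uri: str, referer: str = "") -> bool:
    """Model of the rule: 403 only when every condition holds."""
    if not MEDIA.search(uri):
        return False  # RewriteRule pattern does not match
    if REFERER_OK.search(referer):
        return False  # request comes from our own site
    if THUMBS.search(uri):
        return False  # /admin/thumbs is exempt
    return True


if __name__ == "__main__":
    print(is_forbidden("/img/a.JPG", "https://www.example.com/page"))
    print(is_forbidden("/img/a.jpg", "https://example.com.evil.test/"))
    print(is_forbidden("/img/a.jpg"))  # empty Referer
    print(is_forbidden("/admin/thumbs/image.jpg", "https://evil.test/"))
```

The `LOOSE_REFERER` pattern is included to show why the original conditions "match too much": it accepts a Referer such as https://domain-thief.test/ because nothing anchors the end of the hostname.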
Alternative solution with additional .htaccess file
Alternatively, you could create an additional .htaccess in the /admin/thumbs/ subdirectory and simply disable the rewrite engine. For example:
RewriteEngine Off
This overrides and prevents the hotlink-protection directives in the parent config from being processed when anything within this subdirectory is requested.
I've narrowed down my problem and have a specific question.
With only the following code in the .htaccess file, why does index2.php get called if I enter www.mysite.com/url2 as the URL?
RewriteEngine On
RewriteCond %{REQUEST_URI} (.html|.htm|.feed|.pdf|.raw)$ [NC]
RewriteRule (.*) index2.php [L]
I've also tested it at http://www.regextester.com, and according to that it should not be replaced with index2.php.
In the end I want this rule to skip any URL starting with /url2 or /url2/*.
EDIT: I've made screen recording of this problem: http://screenr.com/BBBN
You have this in your .htaccess:
RewriteEngine On
RewriteCond %{REQUEST_URI} (.html|.htm|.feed|.pdf|.raw)$ [NC]
RewriteRule (.*) index2.php [L]
What does it do? It rewrites anything that ends with html, htm, feed, pdf or raw to index2.php. So, if you are getting index2.php for a URL that does not appear to end with one of those extensions, there are two possible explanations:
There is another rewrite rule in an .htaccess file in a parent directory (or in the server config files) that causes the URL to be rewritten.
Your URL actually does end with one of those extensions. Keep in mind that what you enter in your address bar can be changed and rewritten on the server. For example, if you enter www.mysite.com/url2 in your address bar and that file doesn't exist on the server, the server will try to load the appropriate error document. So, if your error document is /404.html, that request will be rewritten to index2.php in the end.
Update:
I think that is the case here. To check, create a file named 404.php in your document root. Inside your main .htaccess (in your document root), put this:
ErrorDocument 404 /404.php
Delete all other ErrorDocument directives.
Inside 404.php, put this:
<?php
echo 'From 404.php file';
?>
Logic behind it:
When mod_rewrite behaves strangely, the best tool in my experience is the rewrite log. To enable it, put this in your virtual host or main server config (these directives are not allowed in .htaccess):
RewriteLogLevel 9
RewriteLog "logs/RewriteLog.log"
Be careful: the code above enables the rewrite log at the highest level possible (logging everything). It will slow your server down and the log file will become huge very quickly, so do this only on your dev server. Note also that RewriteLog and RewriteLogLevel exist only in Apache 2.2 and earlier; Apache 2.4 replaced them with the LogLevel directive, for example LogLevel alert rewrite:trace3.
Explanation: When you try to access www.mysite.com/url2, Apache hands your URL to the rewrite module. The rewrite module checks whether any of the RewriteRules applies to your URL. Because you have one rule and it doesn't apply to your URL, Apache tries to load the normal file. But this file does not exist, so Apache takes the next step, which is serving the appropriate error document. When you set a custom error document, Apache runs the rules against that new address as well. For example, if the error document is /404.html, Apache checks whether your rule applies to /404.html. Since it does, the request is rewritten.
The point to remember is that Apache will do this every time the URL changes, whether the change is made by the rewrite module or not!
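The sequence described above can be sketched as a toy model in Python. This is a deliberate simplification and not how Apache is implemented; the function name `resolve` and the set of existing files are assumptions for the example.

```python
import re

# The rule from the question, dots left unescaped as in the original.
RULE = re.compile(r"(.html|.htm|.feed|.pdf|.raw)$", re.IGNORECASE)

# Hypothetical files that exist on the server.
EXISTING_FILES = {"/index2.php", "/404.html", "/404.php"}


def resolve(uri: str, error_document: str) -> list:
    """Return the chain of URIs the toy server walks through."""
    trail = [uri]
    if RULE.search(uri):
        trail.append("/index2.php")
        return trail
    if uri not in EXISTING_FILES:
        # The file is missing, so the error document is requested internally
        # and the rewrite rule is tested again against that new address.
        trail.append(error_document)
        if RULE.search(error_document):
            trail.append("/index2.php")
    return trail


if __name__ == "__main__":
    print(resolve("/url2", "/404.html"))
    print(resolve("/url2", "/404.php"))
```

With /404.html as the error document the chain ends at /index2.php; with /404.php it stops at the error document, which is what the 404.php test above is meant to reveal.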
In theory, the rule you list should work as you expect if it is the only rule; apparently, in practice, it doesn't. Please note that an unescaped . matches ANY CHARACTER. If you want to match a literal full stop/period, you need to escape it. That's why I use \.(html|htm|feed|pdf|raw)$ instead of (.html|.htm|.feed|.pdf|.raw)$ below.
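The effect of the unescaped dot is easy to check outside Apache. The Python sketch below compares both patterns using the `re` module as a stand-in for PCRE; the sample URIs are invented for the illustration.

```python
import re

UNESCAPED = re.compile(r"(.html|.htm|.feed|.pdf|.raw)$", re.IGNORECASE)
ESCAPED = re.compile(r"\.(html|htm|feed|pdf|raw)$", re.IGNORECASE)

URIS = ["/page.html", "/news-feed", "/draw", "/url2"]


def matches(pattern, uri: str) -> bool:
    """True when the compiled pattern matches somewhere in the URI."""
    return pattern.search(uri) is not None


if __name__ == "__main__":
    for uri in URIS:
        print(uri, matches(UNESCAPED, uri), matches(ESCAPED, uri))
```

The unescaped version also catches /news-feed and /draw, because the dot happily matches the "-" and the "d". Neither version matches /url2, which is why something else must be changing the URL in the question.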
You can add another RewriteCond that simply doesn't match if the url starts with /url2, like below. This might not be a viable solution if there are lots of urls that shouldn't be matched.
RewriteCond %{REQUEST_URI} !^/url2
RewriteCond %{REQUEST_URI} \.(html|htm|feed|pdf|raw)$ [NC]
RewriteRule (.*) index2.php [L]
To get a better understanding of what is happening, you can alter the rule to something like this. Now simply enter the URLs you don't want to be matched in the address bar and inspect the address bar after the redirect happens. The url parameter shows which URL actually triggered the rule. This screencast shows a similar version at work, with a sneaky RewriteRule quietly changing the URL.
#A way of finding out what is -actually- matched
RewriteCond %{REQUEST_URI} \.(html|htm|feed|pdf|raw)$ [NC]
RewriteCond %{REQUEST_URI} !/foo
RewriteRule (.*) /foo?url=$1 [R,L]
You can decide to match the %{THE_REQUEST} variable instead. This will always contain the request itself. If something else is rewriting the url, this variable doesn't change, meaning you can use this to overwrite any changes. Make sure the url won't be matching itself. You would get something like below. An example screencast can be found here.
#If it doesn't end on .html/htm/feed etc, this one won't match
RewriteCond %{THE_REQUEST} ^(GET|POST)\ /.*\.(html|htm|feed|pdf|raw)\ HTTP [NC]
RewriteCond %{REQUEST_URI} !^/index2\.php$
RewriteRule (.*) /index2.php [L]
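To get a feel for what the THE_REQUEST condition accepts, the pattern can be tried against a few request lines in Python. This is only a sketch using the `re` module in place of PCRE; the helper name `matches_request_line` and the sample request lines are assumptions.

```python
import re

# Same pattern as the THE_REQUEST condition above, with [NC] as IGNORECASE.
THE_REQUEST = re.compile(
    r"^(GET|POST)\ /.*\.(html|htm|feed|pdf|raw)\ HTTP", re.IGNORECASE
)


def matches_request_line(line: str) -> bool:
    """True when the raw request line would satisfy the condition."""
    return THE_REQUEST.search(line) is not None


if __name__ == "__main__":
    for line in [
        "GET /page.html HTTP/1.1",
        "GET /url2 HTTP/1.1",
        "GET /page.html?x=1 HTTP/1.1",
        "HEAD /page.html HTTP/1.1",
    ]:
        print(line, "->", matches_request_line(line))
```

One thing this makes visible: because the request line includes the query string, a request such as /page.html?x=1 does not match this pattern, and neither do methods other than GET and POST.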
I'm pretty green when it comes to rewriting URL's with htaccess, though I can do the basics.
In this case I have a series of query strings returned from a vendor to one of my scripts. That's all fine and dandy, except when one of them includes a URL with 'http' or 'https' in it. When Apache detects that, it throws a 403 Forbidden error. I thought that I could craft a RewriteRule that would rewrite the 'http' portion of the query string into something that could get past Apache's rules.
This will eventually be installed on a client's machine so I can't change any server settings.
An example URL would be:
http://mysite.com/gocardless_confirm.php?resource_uri=https%3A%2F%2Fsandbox.gocardless.com%2Fapi%2Fv1%2Fbills%2F07F56ERHRT
Here's the settings I was trying to use:
RewriteCond ${QUERY_STRING} ^(.*)$resource_uri=http^(.*)$
RewriteCond %{REQUEST_URI} ^gocardless_confirm.php
RewriteRule ^(.*)$ gocardless_confirm.php?$1resource_uri=hllp$2 [L]
How can I rewrite this portion so I can simply get to the script?
Thanks!
Try:
RewriteCond %{QUERY_STRING} ^(.*)resource_uri=http(.*)$
RewriteRule ^gocardless_confirm\.php$ gocardless_confirm.php?%1resource_uri=hllp%2 [L]
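Subpatterns captured in a RewriteCond are referenced as %N (here %1 and %2) in the RewriteRule, whereas $N refers to groups captured by the RewriteRule pattern itself.
^ matches the beginning of the whole string the pattern is tested against.
$ matches its end.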
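The capture-and-substitute step can be traced in Python, with the `re` module standing in for mod_rewrite's matching. The function name `rewrite_query` is made up for this sketch; groups 1 and 2 play the role of %1 and %2.

```python
import re

# Same pattern as the RewriteCond above.
COND = re.compile(r"^(.*)resource_uri=http(.*)$")


def rewrite_query(query: str) -> str:
    """Rebuild the query string the way the substitution does."""
    match = COND.search(query)
    if match is None:
        return query  # condition fails, nothing is rewritten
    return f"{match.group(1)}resource_uri=hllp{match.group(2)}"


if __name__ == "__main__":
    original = (
        "resource_uri=https%3A%2F%2Fsandbox.gocardless.com"
        "%2Fapi%2Fv1%2Fbills%2F07F56ERHRT"
    )
    print(rewrite_query(original))
```

Since the value arrives in the script as hllps..., the script presumably has to turn hllp back into http before using it.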
I'm trying to make a facebook like user profile page using .htaccess
i.e. http://example.com/<userid> will actually call http://example.com/sites/<userid>/<sub_page>
using the following code:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/?(.*)$ sites/$1/$2 [NC,L]
Options -Indexes
ErrorDocument 404 /missing.html
ErrorDocument 403 /forbidden.html
It works perfectly well when the folder sites/<userid> exists, but when it doesn't, it throws a 500 Internal Server Error. I've scoured the internet but couldn't get beyond this. Can someone please help with this?
Can I have a text file with all existing folders and somehow use that to generate a list in the .htaccess?
Thanks in advance!! :)
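Your rewrite rules are looping. The regular expression ^([^/]+)/?(.*)$ also matches the rewritten URI sites/something/something, so the rule keeps applying to its own output (your URI starts looking like sites/sites/sites/sites/sites/sites/sites/sites/sites/something/something, etc.) until Apache gives up with a 500. When sites/<userid> exists, the rewritten request maps to a real file or directory, so the !-f and !-d conditions stop the rule from firing again; when it doesn't exist, nothing stops it.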
Either tweak your regular expression so that it doesn't liberally match what's after the first ([^/]+)/? expression or add an additional condition:
RewriteCond %{REQUEST_URI} !^/sites/
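The loop, and the effect of the extra condition, can be reproduced in a short Python sketch. It models the case where nothing exists on disk, so the !-f and !-d checks always pass; paths are written without the leading slash, as the RewriteRule sees them in .htaccess. The function names and the `max_rounds` cut-off are assumptions made for this illustration.

```python
import re

# Same pattern as the RewriteRule in the question.
RULE = re.compile(r"^([^/]+)/?(.*)$")


def rewrite_once(path: str) -> str:
    """Apply the substitution sites/$1/$2 if the pattern matches."""
    match = RULE.search(path)
    if match is None:
        return path
    return f"sites/{match.group(1)}/{match.group(2)}"


def rewrite_rounds(path: str, guard: bool, max_rounds: int = 3) -> list:
    """Re-apply the rule the way per-directory rewriting does, optionally
    with the extra 'not already under sites/' condition."""
    history = [path]
    for _ in range(max_rounds):
        current = history[-1]
        if guard and current.startswith("sites/"):
            break  # the added RewriteCond fails, the rule no longer fires
        rewritten = rewrite_once(current)
        if rewritten == current:
            break
        history.append(rewritten)
    return history


if __name__ == "__main__":
    print(rewrite_rounds("john/photos", guard=False))
    print(rewrite_rounds("john/photos", guard=True))
```

Without the guard every round prepends another sites/; with it, the rewriting stops after the first pass.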
We've just finished a major restructuring of our website and I'm trying to write a set of redirect rules of varying specificity. The redirects are half working:
They correctly re-route old URLs
They incorrectly also allow and re-route URLs that include text not specified in the RewriteCond statements (when instead I would expect to see a "Not Found" error message displayed in the browser.)
Statements in the .htaccess file (located in the root of the web site) include:
RewriteBase /
RewriteCond %{REQUEST_URI} /company/company-history.html
RewriteRule (.*)$ http://www.technofrolics.com/about/index.html
RewriteCond %{REQUEST_URI} /press
RewriteRule (.*)$ http://www.technofrolics.com/gallery/index.html
The above correctly executes the desired redirect
but also works when I enter the following after the domain name:
/youcanenteranytext/hereatall/anditstillworks/press
In other words, any text following the domain and preceding the conditional string seems to be allowed/ignored. Any advice on how to restrict the condition or rewrite rule to prevent this would be much appreciated!
Thanks, Margarita
You need to include bounds (anchors) in your regular expressions when you match against %{REQUEST_URI}. The ^ anchors the match to the beginning of the string:
RewriteCond %{REQUEST_URI} ^/company/company-history\.html
Will make it so requests for /garbage/stuff/company/company-history.html won't match. And likewise:
RewriteCond %{REQUEST_URI} ^/press
Will make it so requests for /youcanenteranytext/hereatall/anditstillworks/press won't match. You can additionally employ the $ in your regular expression to indicate the end of the match, so something like this:
RewriteCond %{REQUEST_URI} ^/press$
Will ONLY match requests for /press and not /something/press or /press/somethingelse or /press/.
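The three variants can be compared quickly in Python, using the `re` module as a stand-in for Apache's regex engine. The helper name `matching` and the list of sample URIs are only for illustration.

```python
import re

PATTERNS = {
    "unanchored": r"/press",
    "start": r"^/press",
    "both": r"^/press$",
}

URIS = [
    "/press",
    "/press/",
    "/press/somethingelse",
    "/youcanenteranytext/hereatall/anditstillworks/press",
]


def matching(pattern: str) -> list:
    """Return the sample URIs that the given CondPattern matches."""
    return [uri for uri in URIS if re.search(pattern, uri)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(f"{name:11} {matching(pattern)}")
```

The unanchored pattern matches all four URIs, ^/press drops the one with the arbitrary prefix, and ^/press$ keeps only /press itself.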