.htaccess blocking User-Agents AND IPs at the same time not working

How do I block User-Agents and IPs AT THE SAME TIME?
Currently I am using this:
SetEnvIfNoCase User-Agent "Chrome/80" good_ua
SetEnvIfNoCase User-Agent "Chrome/81" good_ua
SetEnvIfNoCase User-Agent "Chrome/82" good_ua
SetEnvIfNoCase User-Agent "Chrome/83" good_ua
order deny,allow
deny from all
allow from env=good_ua
That whitelists those UAs. But when I try adding this code:
deny from 1.1.1.1
deny from 1.0.0.1
only the UA blocking works; I cannot make them both work at the same time. I need to block IPs and allow certain UAs.
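One likely explanation: with Order deny,allow, the Allow directives are evaluated last and override any matching Deny, so a request with a whitelisted UA slips past the IP bans. A minimal sketch of one way around this, assuming Apache 2.2-style access control (mod_access_compat on Apache 2.4), is to flip the order; with Order allow,deny a request must match an Allow directive and must not match any Deny directive:
# Whitelist the UAs (Chrome/8[0-3] condenses the four patterns above)
SetEnvIfNoCase User-Agent "Chrome/8[0-3]" good_ua
# With allow,deny, a request needs an Allow match AND no Deny match
Order allow,deny
allow from env=good_ua
deny from 1.1.1.1
deny from 1.0.0.1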

Related

How to block Yandex

I'm trying to block Yandex from my site. I've tried the solutions posted in other threads, but they are not working, so I'm wondering if I am doing something wrong.
The user-agent string is:
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
I have tried the following (one at a time). RewriteEngine is on
SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block
Order Allow,Deny
Deny from env=bad_bot_block
Allow from ALL
SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block
<RequireAll>
Require all granted
Require not env bad_bot_block
</RequireAll>
Can anyone see a reason one of the above won't work or have any other suggestions?
In case anyone else has this problem, the following worked for me:
RewriteCond %{HTTP_USER_AGENT} ^.*(yandex).*$ [NC]
RewriteRule .* - [F,L]
SetEnvIfNoCase User-Agent "^yandex.com$" bad_bot_block
With the start and end-of-string anchors in the regex you are basically checking that the User-Agent string is exactly equal to "yandex.com" (except that the . matches any character), which clearly does not match the stated user-agent string.
You need to check that the User-Agent header contains "YandexBot" (or "yandex.com"). You can also use a case-sensitive match here, since the real Yandex bot does not vary the case.
For example, try the following instead:
SetEnvIf User-Agent "YandexBot" bad_bot_block
Consider using the BrowserMatch directive instead, which is a shortcut for SetEnvIf User-Agent.
If you are on Apache 2.4 then you should be using the Require (second) variant of your two code blocks. The Order, Deny and Allow directives are Apache 2.2 directives and are formally deprecated on Apache 2.4.
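Putting those points together on Apache 2.4, a minimal sketch that reuses your second code block with the corrected, unanchored pattern:
SetEnvIf User-Agent "YandexBot" bad_bot_block
<RequireAll>
# Allow everyone except requests whose User-Agent matched above
Require all granted
Require not env bad_bot_block
</RequireAll>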
However, consider using robots.txt instead to block crawling in the first place. Yandex supposedly supports robots.txt.
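For example, a robots.txt at the document root like the following asks Yandex not to crawl the site at all (standard User-agent and Disallow fields; note that robots.txt is advisory and only stops bots that honor it):
User-agent: Yandex
Disallow: /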

Use .htaccess to Block Yandex, Baidu, and MJ12bot

I am so tired of Yandex, Baidu, and MJ12bot eating all my bandwidth. None of them even care about the useless robots.txt file.
I would also like to block any user-agent with the word "spider" in it.
I have been using the following code in my .htaccess file to look at the user-agent string and block them that way, but it seems they still get through. Is this code correct? Is there a better way?
BrowserMatchNoCase "baidu" bots
BrowserMatchNoCase "yandex" bots
BrowserMatchNoCase "spider" bots
BrowserMatchNoCase "mj12bot" bots
Order Allow,Deny
Allow from ALL
Deny from env=bots
To block user agents, you can use:
SetEnvIfNoCase User-Agent "(yandex|baidu|spider|mj12bot)" not-allowed=1
Order Allow,Deny
Allow from ALL
Deny from env=not-allowed
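If you are on Apache 2.4, where Order/Allow/Deny are deprecated, a minimal equivalent sketch using Require directives would be:
SetEnvIfNoCase User-Agent "(yandex|baidu|spider|mj12bot)" not-allowed
<RequireAll>
# Grant everyone except requests flagged as bots above
Require all granted
Require not env not-allowed
</RequireAll>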

Apache Configuration: How to deny from env=protected but allow from env=errorhandler?

I've written this .htaccess configuration, which works:
SetEnvIfNoCase Host ^staging\. protected
SetEnvIfNoCase Host ^dev\. protected
AuthType basic
AuthName "Protected"
AuthUserFile "/path/to/.htpasswd"
Order deny,allow
deny from env=protected
Require valid-user
Satisfy any
It checks whether the hostname starts with "staging." or "dev." and sets the environment variable "protected". If it is set, the browser will ask for the password; if not, no access restriction takes effect. This is the expected behaviour.
Unfortunately, the CMS I am working with makes its own HTTP request from localhost to fetch the error page internally if a 404 error occurs. But the server itself is not authenticated, so this fails.
I was able to set another environment variable like this:
SetEnvIfNoCase Request_URI "404$" errorhandler
But it has no effect, even if I change the last block to this:
Order allow,deny
allow from env=errorhandler
deny from env=protected
Require valid-user
Satisfy any
What do I need to change to ask for the password if env=protected is set, but skip the password if env=errorhandler is also set? Thanks.
Try these directives:
SetEnvIfNoCase Host ^(dev|staging)\. protected
SetEnvIfNoCase Request_URI 404 !protected
AuthType basic
AuthName "Protected"
AuthUserFile "/path/to/.htpasswd"
Require valid-user
Satisfy any
Order allow,deny
allow from all
deny from env=protected
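The trick here is the ! prefix in the second SetEnvIfNoCase line: prefixing the variable name with ! unsets it, so any request whose URI matches 404 has protected removed again and is no longer caught by the deny.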

Restricting access from only 3 domains in my htaccess file - SetEnvIf Referer

Hi, so I have the below working:
SetEnvIf Referer "^http://sub\.site1\.com/yvvl/Portal/" local_referral
SetEnvIf Referer "^http://sub\.site2\.com/yvvl/Portal/" auth_referral
SetEnvIf Referer "^http://sub\.site3\.com/yvvl/Portal/" authC_referral
Order Deny,Allow
Deny from all
Allow from env=local_referral
Allow from env=auth_referral
Allow from env=authC_referral
What I don't know how to do is wildcard it so anything from those 3 domains will be accepted; my regex is not good at all.
Thanks
Just remove everything after the .com:
SetEnvIf Referer "^http://sub\.site1\.com/" local_referral
SetEnvIf Referer "^http://sub\.site2\.com/" auth_referral
SetEnvIf Referer "^http://sub\.site3\.com/" authC_referral
Since there's no fence-post for the end of the Referer (the $ end-of-string anchor), that will match anything that starts with http://sub.site1.com/ etc.
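If you would rather keep a single pattern, a sketch using alternation (assuming all three hosts can share one environment variable) would be:
SetEnvIf Referer "^http://sub\.(site1|site2|site3)\.com/" local_referral
Order Deny,Allow
Deny from all
Allow from env=local_referral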

robots.txt htaccess block google

In my .htaccess file I have:
<Files ~ "\.(tpl|txt)$">
Order deny,allow
Deny from all
</Files>
This denies any text file from being read, but the Google search engine gives me the following error:
robots.txt Status
http://mysite/robots.txt
18 minutes ago 302 (Moved temporarily)
How can I modify .htaccess to permit Google to read robots.txt while prohibiting everyone else from accessing text files?
Use this:
<Files ~ "\.(tpl|txt)$">
Order deny,allow
Deny from all
SetEnvIfNoCase User-Agent "Googlebot" goodbot
Allow from env=goodbot
</Files>
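For reference, a minimal Apache 2.4 sketch of the same idea, using Require from mod_authz_core instead of the deprecated Order/Deny/Allow:
<Files ~ "\.(tpl|txt)$">
SetEnvIfNoCase User-Agent "Googlebot" goodbot
# Only requests whose User-Agent matched above are allowed
Require env goodbot
</Files>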
