Set Status Code for Deny from - .htaccess

I want to block certain user agents and tell them that the file doesn't even exist (status code 404).
My current .htaccess returns status code 403:
RewriteEngine on
SetEnvIfNoCase User-Agent UpdaterV* updater
Order Deny,Allow
Deny from All
Allow from env=updater
How can I fix this?

You can use mod_rewrite to get a 404 status code:
RewriteEngine on
# If the user agent does not contain UpdaterV...
RewriteCond %{HTTP_USER_AGENT} !UpdaterV [NC]
# ...return 404 for the request
RewriteRule ^ - [R=404,L]
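If you are on Apache 2.4, a minimal sketch using the <If> directive (an alternative, assuming the updater's User-Agent contains UpdaterV) achieves the same thing without mod_rewrite:
<If "%{HTTP_USER_AGENT} !~ /UpdaterV/i">
    # Non-3xx status: the URL argument is omitted, and every path returns 404
    Redirect 404 /
</If>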

Related

htaccess redirect empty uri instead of directory listing

If I just do Options -Indexes, localhost/subdir/ gives me a 500 error.
But I want to redirect it to https://github.com/miranda-zhang/cloud-computing-schema
Something like
RewriteRule ^$ https://github.com/miranda-zhang/cloud-computing-schema [R=308,L]
Currently, this doesn't work on my local server.
I have to do
RewriteRule ^home$ https://github.com/miranda-zhang/cloud-computing-schema [R=308,L]
And use the URL localhost/subdir/home.
Also, Options -Indexes seems to make the following stop working.
ErrorDocument 406 https://miranda-zhang.github.io/cloud-computing-schema/v1.0/406.html
RewriteRule ^.*$ https://miranda-zhang.github.io/cloud-computing-schema/v1.0/406.html [R=406,L]
Full original .htaccess file
I also tried some methods from "Problem detecting empty REQUEST_URI with Apache mod_rewrite".
The following doesn't seem to work, or maybe I'm missing some other config.
RewriteCond %{HTTP_HOST} ^localhost/coocon
Or
RewriteCond %{REQUEST_URI} "^/$"
It seems something else was wrong; it is a 403 error now:
Forbidden
You don't have permission to access /cocoon/ on this server.
Cannot serve directory /var/www/html/cocoon/: No matching DirectoryIndex (index.html,index.cgi,index.pl,index.php,index.xhtml,index.htm) found, and server-generated directory index forbidden by Options directive
This might work:
Options -Indexes
ErrorDocument 403 https://github.com/miranda-zhang/cloud-computing-schema
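Note that when ErrorDocument is given a full URL like this, Apache answers with a 302 redirect rather than serving the page with a 403 status. If the goal is only to redirect the bare directory URL, a minimal mod_alias sketch (assuming the .htaccess sits in the cocoon directory, so the URL path is /cocoon/) would be:
Options -Indexes
# Redirect only the bare directory URL; requests for real files are untouched
RedirectMatch 302 ^/cocoon/$ https://github.com/miranda-zhang/cloud-computing-schema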

HTACCESS: Throw 404 error but still accessible

I have the following structure:
- root
  - css
    - test.css
  - index.php
Is there a way to throw a 404 error when anything in the css folder is opened directly in the browser, such as:
http://localhost:1993/css
http://localhost:1993/css/test.css
but still have the CSS work on the site? I want the CSS to load normally when pages reference it, but to throw a 404 error when someone opens the path directly. Or turn off the directory listing for everything in the directory (with all files inside)? Or something like that? Is it possible?
You can try mod_access (mod_access_compat in Apache 2.4):
<FilesMatch "\.css$">
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
    Allow from localhost
</FilesMatch>
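On Apache 2.4 without mod_access_compat, a minimal equivalent sketch uses the Require directive from mod_authz_host:
<FilesMatch "\.css$">
    # Allow only requests originating from the server itself
    Require local
</FilesMatch>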
With mod_rewrite:
RewriteEngine On
# The negation prefix goes before the pattern: host is not localhost
RewriteCond %{HTTP_HOST} !^localhost
RewriteCond %{REQUEST_URI} ^/css [OR]
RewriteCond %{REQUEST_FILENAME} \.css$
RewriteRule ^ - [R=404]

Restrict access to webpage from single referrer domain using .htaccess

I want to restrict access to a website to only allow referrers from a single domain. I can't get the .htaccess file to work correctly.
Say the referrer is http://domainname.com - access will be allowed.
Or http://subdomain.domainname.com - access will be allowed.
But any other referrer (or typing the URL in directly) will be blocked and directed to an Access Denied page.
Code as follows (note I need to allow access from ANY referring page on domainname.com):
RewriteEngine On
RewriteBase /
# allow these referers to passthrough
RewriteCond %{HTTP_REFERER} ^http://(protect|unprotected)\.domainname\.com
RewriteRule ^ - [L]
# everybody else receives a forbidden
RewriteRule ^ - [F]
ErrorDocument 403 /forbidden.html
The HTTP Referer header only says where the request is coming from. E.g. when there is a link on some webpage on www.example.net
<a href="http://www.example.com/some/path">Click here</a>
then the request will be for http://www.example.com/some/path and the Referer header will contain the URI of the page on www.example.net.
If you block any request without a specific referer, then any direct request will be blocked too. Also note, that the referer header is sent by the client and therefore, it is not a reliable indicator.
Another caveat: according to Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content, §5.5.2 Referer (RFC 7231), the client may send a partial URI, which doesn't contain a domain name at all.
To answer your question: if you want to allow requests coming from either domainname.com or any of its subdomains, you might check for
RewriteCond %{HTTP_REFERER} ^http://(?:.*\.)?domainname\.com
RewriteRule ^ - [L]
RewriteRule ^ - [F]
or the other way round, forbid when you negate it
RewriteCond %{HTTP_REFERER} !^http://(?:.*\.)?domainname\.com
RewriteRule ^ - [F]
To check for one of multiple conditions, cond1 or cond2 or cond3, you must use RewriteCond with the ornext|OR flag, e.g.
RewriteCond %{HTTP_REFERER} ^http://(?:.*\.)?nature\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(?:.*\.)?adclick\.g\.doubleclick\.net [OR]
RewriteCond %{HTTP_REFERER} ^http://(?:.*\.)?onepointedpixel\.com
RewriteRule ^ - [L]
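One more hedged refinement: if the linking pages may be served over TLS (an assumption, not part of the original question), match both schemes:
RewriteCond %{HTTP_REFERER} !^https?://(?:.*\.)?domainname\.com
RewriteRule ^ - [F]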

Why isn't this htaccess code blocking a specified IP?

I have a large htaccess file for my site. One of the IPs I'm trying to block is 27.153.228.56
Despite my htaccess, I still see 27.153.228.56 showing up in my latest visitor logs.
Is there something wrong with my htaccess that's allowing this IP to access the site?
There are many more IPs blocked but this is an abbreviated version:
# Protect from spam bots
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} .wp-comments-post\.php*
RewriteCond %{HTTP_REFERER} !.garagehangover.com* [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule (.*) ^http://%{REMOTE_ADDR}/$ [R=301,L]
</IfModule>
# Begin IP blocking #
Order Allow,Deny
deny from 27.153.228.56
# End IP blocking #
#Begin Bad Bot Blocking
BrowserMatchNoCase yandex bad_bot
Deny from env=bad_bot
# End Bad Bot Blocking
Allow from all
Change the order to:
Order Deny,Allow
and remove the Allow from all line.
This will process all the Deny rules, and if none match, allow the request.
Also, you would generally put these rules before the RewriteEngine On directive.
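Putting both changes together, a minimal sketch of the corrected blocking section (same IPs and bot as above):
# Begin IP blocking #
Order Deny,Allow
Deny from 27.153.228.56
# End IP blocking #
# Begin Bad Bot Blocking
BrowserMatchNoCase yandex bad_bot
Deny from env=bad_bot
# End Bad Bot Blocking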
Looks ok to me. But you could try to block a range of IPs like this...
deny from 27.153.228.0/255.255.255.0
or this
deny from 27.153.0.0/255.255.0.0
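Equivalently, mod_authz_host also accepts CIDR notation:
deny from 27.153.228.0/24
deny from 27.153.0.0/16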

how to ban crawler 360Spider with robots.txt or .htaccess?

I've got a problem because of 360Spider: this bot makes too many requests per second to my VPS and slows it down (CPU usage rises to 10-70%, while usually it is 1-2%). I looked into the httpd logs and saw lines like these:
182.118.25.209 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/42957-polovity.html HTTP/1.1" 200 96809 "http://www.hrinchenko.com/slovar/znachenie-slova/42957-polovity.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider
182.118.25.208 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/52614-rospryskaty.html HTTP/1.1" 200 100239 "http://www.hrinchenko.com/slovar/znachenie-slova/52614-rospryskaty.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider
etc.
How can I block this spider completely via robots.txt? Now my robots.txt looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
User-agent: YoudaoBot
Disallow: /
User-agent: sogou spider
Disallow: /
I've added lines:
User-agent: 360Spider
Disallow: /
but that does not seem to work. How can I block this angry bot?
If you suggest blocking it via .htaccess, note that mine currently looks like this:
# Turn on URL rewriting
RewriteEngine On
# Installation directory
RewriteBase /
SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them
# Protect hidden files from being viewed
<Files .*>
Order Deny,Allow
Deny From All
</Files>
# Protect application and system files from being viewed
RewriteRule ^(?:application|modules|system)\b.* index.php/$0 [L]
# Allow any files or directories that exist to be displayed directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite all other URLs to index.php/URL
RewriteRule .* index.php/$0 [PT]
And, in spite of the presence of
SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them
this bot still tries to kill my VPS and keeps showing up in the access logs.
In your .htaccess file, simply add the following:
RewriteCond %{REMOTE_ADDR} ^(182\.118\.2)
RewriteRule ^.*$ http://182.118.25.209/take_a_hike_moron [R=301,L]
This will catch ALL the bots launched from the 182.118.2xx.xxx range and send them back to themselves...
The crappy 360 bot is being fired from servers in China... so as long as you don't mind saying bye-bye to crappy Chinese traffic from that IP range, this is guaranteed to make those puppies disappear from reaching any files on your web site.
The following two lines in your .htaccess file will also pick it off, simply because it is stupid enough to proudly put 360Spider in its user agent string. This could be handy for when they use IP ranges other than 182.118.2xx.xxx:
RewriteCond %{HTTP_USER_AGENT} .*(360Spider) [NC]
RewriteRule ^.*$ http://182.118.25.209/take_a_hike_moron [R=301,L]
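If you'd rather not spend a redirect on them at all, a hedged alternative is to fail the request outright with a 403:
RewriteCond %{HTTP_USER_AGENT} 360Spider [NC]
RewriteRule ^ - [F,L]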
And yes... I hate them too!
Your robots.txt seems right, but some bots simply ignore it (malicious bots crawl from arbitrary IP addresses, using botnets of hundreds to millions of infected devices all around the globe). In that case you can limit the number of requests per second using the mod_security module for Apache 2.x.
Config example here: http://blog.cherouvim.com/simple-dos-protection-with-mod_security/
[EDIT] On Linux, iptables also allows restricting tcp:port connections to (x) per second per IP, provided conntrack capabilities are enabled in your kernel. See: https://serverfault.com/questions/378357/iptables-dos-limit-for-all-ports
You can put the following rules into your .htaccess file:
RewriteEngine On
RewriteBase /
# Match the bot's User-Agent; the Referer header would only contain a URL
SetEnvIfNoCase User-Agent 360Spider block_them
Deny from env=block_them
Note: the Apache module mod_setenvif must be enabled in your server configuration.
The person running the crawler might be ignoring robots.txt. You could block them by IP:
Order Deny,Allow
Deny from 216.86.192.196
or, in .htaccess, by user agent:
SetEnvIfNoCase User-Agent 360Spider blocked
Deny from env=blocked
I have lines in my .htaccess file like this to block bad bots:
RewriteEngine On
RewriteCond %{ENV:bad} 1
RewriteCond %{REQUEST_URI} !/forbidden.php
RewriteRule (.*) - [R=402,L]
SetEnvIf Remote_Addr "^38\.99\." bad=1
SetEnvIf Remote_Addr "^210\.195\.45\." bad=1
SetEnvIf Remote_Addr "^207\.189\." bad=1
SetEnvIf Remote_Addr "^69\.84\.207\." bad=1
# ...
SetEnvIf Remote_Addr "^221\.204\." bad=1
SetEnvIf User-agent "360Spider" bad=1
It will send the status code 402 Payment Required to all blacklisted IPs / user-agents.
You can put anything that you want displayed to the bot in forbidden.php.
It's quite effective.
I just had to block 360Spider too. I solved it with StreamCatcher on IIS (IIS7), which fortunately was already installed, so only a small configuration change was needed. Details at http://needs-be.blogspot.com/2013/02/how-to-block-spider360.html
I use the following, and it helps a lot! It checks HTTP_USER_AGENT for bad bots:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{REQUEST_URI} !^/error\.html$
RewriteCond %{HTTP_USER_AGENT} EasouSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YisouSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Sogou\ web\ spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} 360Spider [NC]
RewriteRule ^.*$ - [F,L]
</IfModule>
# Note: <Location> is not allowed in .htaccess, so these directives go at top level there
<IfModule mod_setenvif.c>
SetEnvIfNoCase User-Agent "EasouSpider" bad_bot
SetEnvIfNoCase User-Agent "YisouSpider" bad_bot
SetEnvIfNoCase User-Agent "LinksCrawler" bad_bot
Order Allow,Deny
Allow from All
Deny from env=bad_bot
</IfModule>
