Blocking a specific page from Pinterest with .htaccess

I have a problem with Pinterest: a pin of a page on my website has pushed my own page out of the Google results.
I would like to block Pinterest from crawling or creating new pins of this specific page.
Pinterest's user agents are:
Pinterest/0.2 (+https://www.pinterest.com/bot.html)
Mozilla/5.0 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Pinterestbot/1.0; +https://www.pinterest.com/bot.html)
Pinterest's bot IP range is:
54.236.1.XXX
For example, say the page I want to block contains "substring" in its URL.
Here is my code in .htaccess, and obviously it's not working:
RewriteEngine On
RewriteCond %{REQUEST_URI} substring [NC]
RewriteCond %{HTTP_USER_AGENT} pinterest [NC,OR]
RewriteCond %{HTTP_REFERER} ^(www\.)?pinterest\. [NC,OR]
RewriteCond %{REMOTE_ADDR} ^54\.236\.1\.
RewriteRule .* - [F]
Thank you for your help!
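A possible corrected sketch, not tested against Pinterest's actual traffic. In the rules above, %{HTTP_REFERER} contains the full URL including the scheme (e.g. https://www.pinterest.com/...), so the anchored pattern ^(www\.)?pinterest\. can never match. The [OR] chain itself groups as intended: mod_rewrite reads the conditions as "URL contains substring AND (user agent OR referer OR IP)". "substring" is still the placeholder from the question:

```apache
RewriteEngine On
# Only act on URLs containing the placeholder "substring"
RewriteCond %{REQUEST_URI} substring [NC]
# ...AND any one of the three [OR]-chained conditions below.
# 1) A Pinterest user agent (Pinterest/0.2 or Pinterestbot/1.0)
RewriteCond %{HTTP_USER_AGENT} pinterest [NC,OR]
# 2) A referer on pinterest.<tld> or one of its subdomains; the referer
#    starts with the scheme, so the scheme has to be part of the pattern
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?pinterest\. [NC,OR]
# 3) A request from the 54.236.1.x range quoted above
RewriteCond %{REMOTE_ADDR} ^54\.236\.1\.
# Respond with 403 Forbidden
RewriteRule ^ - [F]
```

Blocking the crawler only stops new requests; it does not remove pins that already exist on Pinterest.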

Related

Redirect a specific user agent using .htaccess

Because Gmail/Google opens every email I send to check its content and links, I see inflated open rates. Around 80% of my emails get checked by Gmail, so my open rates show 80%.
Most ESPs solved this by excluding opens and clicks from this user agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 Mozilla/5.0".
So what I want to do now is redirect any request with this user agent to a new URL, so that it cannot access/download the tracking pixel in the email.
I heard you can do this in the .htaccess file. I found numerous snippets that redirect user agents, but none of them shows how to redirect only one very specific user agent string.
I found snippets like this:
RewriteCond %{HTTP_USER_AGENT} Opera
RewriteRule ^abc.html$ http://example.com/xy/opera.html [R=301]
But they always match "Firefox" or things like "googlebot". Is there any way to redirect ONLY THIS SPECIFIC string?
I have no knowledge about this stuff at all and hope to get a copy-paste solution.
The URL I want to redirect to is beseductiv.com.
Thank you very much.
Could you please try the following, written based on your shown samples.
RewriteEngine ON
RewriteCond %{HTTP_USER_AGENT} ^(Mozilla|Safari|googlebot).*$ [NC]
RewriteRule ^(.*)$ http://example.com/xy/opera.html [R=301,L]
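Note that ^(Mozilla|Safari|googlebot).*$ matches every user agent that begins with "Mozilla", which is nearly every browser, and the target is still the example.com URL from the question. To match only the one Gmail string, mod_rewrite's "=" prefix compares the whole header as a literal string rather than as a regex. A sketch under assumptions: pixel\.gif is a hypothetical path for the tracking pixel (replace it with the real one), and http://beseductiv.com/ is the target named in the question:

```apache
RewriteEngine On
# "=" means exact, literal comparison: no regex, so the parentheses,
# dots and semicolons need no escaping. The quotes are required
# because the value contains spaces.
RewriteCond %{HTTP_USER_AGENT} "=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 Mozilla/5.0"
# Hypothetical pixel path: only the pixel request is redirected, which also
# avoids a redirect loop if the pixel is served from beseductiv.com itself
RewriteRule ^pixel\.gif$ http://beseductiv.com/ [R=302,L]
```

A 302 is used instead of 301 so the redirect is not cached as permanent. Because the match is exact, it will not fire if the user agent differs even slightly (a different version number, for instance).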

How to properly escape a user agent for blocking in htaccess file

I'm trying to block one specific user who is constantly scraping my site by hand. He uses a VPN, so IP blocking doesn't work. I can't seem to get him blocked via my .htaccess file. I'm using the following code, which is the escaped version of this user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36 OPR/65.0.3467.48
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
SetEnvIfNoCase User-Agent "Chrome version 0\.0 running on Win10 Mozilla\/5\.0 \(Windows NT 10\.0; Win64; x64\) AppleWebKit\/537\.36 \(KHTML, like Gecko\) Chrome\/78\.0\.3904\.97 Safari\/537\.36 OPR\/65\.0\.3467\.48" bad_bot
Deny from env=bad_bot
</IfModule>
Does this make any sense? I don't want to block too many other users with this piece of code, but I'm afraid that user agent is quite common.
Thanks in advance!
PS: This goes way beyond my knowledge, but feel free to suggest any technical solution or use technical language, and I'll figure it out with a dev friend.
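The SetEnvIfNoCase pattern above begins with "Chrome version 0.0 running on Win10", which is not part of the user agent quoted in the question, so it can never match. A sketch that matches only that exact string, assuming Apache 2.4 with mod_setenvif and mod_authz_core (the Require syntax replaces the older Deny from) and a host that allows Require in .htaccess (AllowOverride AuthConfig). The SetEnvIfNoCase line belongs outside the <IfModule mod_rewrite.c> block because it does not use mod_rewrite:

```apache
# Flag the scraper's exact user agent. The pattern is a regex, so dots and
# parentheses are escaped; ^ and $ stop it from matching longer strings.
SetEnvIfNoCase User-Agent "^Mozilla/5\.0 \(Windows NT 10\.0; Win64; x64\) AppleWebKit/537\.36 \(KHTML, like Gecko\) Chrome/78\.0\.3904\.97 Safari/537\.36 OPR/65\.0\.3467\.48$" bad_bot

# Let everyone in except requests flagged above
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

As feared in the question, this is an ordinary Opera 65 user agent on 64-bit Windows 10, so it will also block every other visitor running that exact browser build.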

Chrome redirects to the mobile version of the website on a desktop

Chrome on a desktop is redirected to the mobile version.
I have a website with an iPhone version: iPhones and Safari users are redirected to the mobile version of the website.
Chrome on a desktop is also redirected to the iPhone/mobile version of the website.
I don't want that.
Other browsers like Mozilla Firefox and Internet Explorer work fine.
In my .htaccess file I have:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} iPhone
RewriteCond %{REQUEST_URI} !^/iphone/
RewriteRule .* /iphone/ [R]
RewriteCond %{HTTP_USER_AGENT} Safari
RewriteCond %{REQUEST_URI} !^/iphone/
RewriteRule .* /iphone/ [R]
I searched the web for many hours before asking here.
Does anyone have experience with this type of bug?
The website is HTML5; no PHP is used.
Regards iamdaves
As the examples below show, the user agent string for Chrome also includes the keyword 'Safari'. Thus you might want to use something else to detect just the Safari browser.
Chrome 37.0.2049.0
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36
Perhaps 'Macintosh' or 'Mac OS'
Safari 5.1.7
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.13+ (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2
Examples from www.useragentstring.com
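Based on the two sample strings above, one option is to keep "Safari" as the trigger but exclude anything that also says "Chrome". A sketch of the question's rules with that extra condition (the [L] flag is added so each redirect stops further rule processing):

```apache
RewriteEngine on

# iPhones: any user agent containing "iPhone"
RewriteCond %{HTTP_USER_AGENT} iPhone
RewriteCond %{REQUEST_URI} !^/iphone/
RewriteRule .* /iphone/ [R,L]

# Safari: contains "Safari" but not "Chrome" (desktop Chrome sends
# "Safari/537.36" too, as the first sample shows)
RewriteCond %{HTTP_USER_AGENT} Safari
RewriteCond %{HTTP_USER_AGENT} !Chrome
RewriteCond %{REQUEST_URI} !^/iphone/
RewriteRule .* /iphone/ [R,L]
```

Many other Chromium-based browsers also put "Chrome" in their user agent, so the same condition excludes them; desktop Safari on a Mac still matches, just as with the original rules.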

How to ban the 360Spider crawler with robots.txt or .htaccess?

I've got a problem because of 360Spider: this bot makes too many requests per second to my VPS and slows it down (CPU usage rises to 10-70%, while it is usually 1-2%). I looked into the httpd logs and saw lines like these:
182.118.25.209 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/42957-polovity.html HTTP/1.1" 200 96809 "http://www.hrinchenko.com/slovar/znachenie-slova/42957-polovity.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider
182.118.25.208 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/52614-rospryskaty.html HTTP/1.1" 200 100239 "http://www.hrinchenko.com/slovar/znachenie-slova/52614-rospryskaty.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider
etc.
How can I block this spider completely via robots.txt? My robots.txt currently looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
User-agent: YoudaoBot
Disallow: /
User-agent: sogou spider
Disallow: /
I've added these lines:
User-agent: 360Spider
Disallow: /
but that does not seem to work. How do I block this angry bot?
If you suggest blocking it via .htaccess, keep in mind that it currently looks like this:
# Turn on URL rewriting
RewriteEngine On
# Installation directory
RewriteBase /
SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them
# Protect hidden files from being viewed
<Files .*>
Order Deny,Allow
Deny From All
</Files>
# Protect application and system files from being viewed
RewriteRule ^(?:application|modules|system)\b.* index.php/$0 [L]
# Allow any files or directories that exist to be displayed directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite all other URLs to index.php/URL
RewriteRule .* index.php/$0 [PT]
And despite the presence of
SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them
this bot still tries to kill my VPS and keeps showing up in the access logs.
In your .htaccess file simply add the following:
RewriteCond %{REMOTE_ADDR} ^(182\.118\.2)
RewriteRule ^.*$ http://182.118.25.209/take_a_hike_moron [R=301,L]
This will catch ALL the bots being launched from the 182.118.2xx.xxx range and send them back to themselves...
The crappy 360 bot is being fired from servers in China... so as long as you don't mind saying bye-bye to crappy Chinese traffic from that IP range, this is guaranteed to stop those puppies from reaching any files on your website.
The following two lines in your .htaccess file will also pick it off, simply because it is stupid enough to proudly put 360Spider in its user agent string. This could be handy for when they use IP ranges other than 182.118.2xx.xxx:
RewriteCond %{HTTP_USER_AGENT} .*(360Spider) [NC]
RewriteRule ^.*$ http://182.118.25.209/take_a_hike_moron [R=301,L]
And yes... I hate them too !
Your robots.txt seems right. Some bots just ignore it (malicious bots crawl from any IP address of any botnet, and a botnet can range from hundreds to millions of infected devices all around the globe). In this case you can limit the number of requests per second using the mod_security module for Apache 2.x.
Config example here: http://blog.cherouvim.com/simple-dos-protection-with-mod_security/
[EDIT] On Linux, iptables can also restrict tcp:port connections per (x) second(s) per IP, provided conntrack support is enabled in your kernel. See: https://serverfault.com/questions/378357/iptables-dos-limit-for-all-ports
You can put the following rules into your .htaccess file:
RewriteEngine On
RewriteBase /
SetEnvIfNoCase User-Agent 360Spider block_them
Deny from env=block_them
Note: the Apache module mod_setenvif must be enabled in your server configuration.
The person running the crawler might be ignoring robots.txt. You could block them by IP:
Order Deny,Allow
Deny from 216.86.192.196
in .htaccess, or by user agent:
SetEnvIfNoCase User-agent 360Spider blocked
Deny from env=blocked
I have lines in my .htaccess file like this to block bad bots:
RewriteEngine On
RewriteCond %{ENV:bad} 1
RewriteCond %{REQUEST_URI} !/forbidden.php
RewriteRule (.*) - [R=402,L]
SetEnvIf Remote_Addr "^38\.99\." bad=1
SetEnvIf Remote_Addr "^210\.195\.45\." bad=1
SetEnvIf Remote_Addr "^207\.189\." bad=1
SetEnvIf Remote_Addr "^69\.84\.207\." bad=1
# ...
SetEnvIf Remote_Addr "^221\.204\." bad=1
SetEnvIf User-agent "360Spider" bad=1
It will send the status code 402 Payment Required to all blacklisted IPs / user-agents.
You can put anything that you want displayed to the bot in forbidden.php.
It's quite effective.
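For forbidden.php to actually be sent with the 402, Apache presumably also needs to be told to use it as the error page. That line is not shown in the answer, so the following is an assumption about the missing piece (it assumes forbidden.php sits in the document root):

```apache
# Use /forbidden.php as the body of every 402 response. The error page is
# fetched through an internal redirect that passes the rewrite rules again;
# the RewriteCond excluding /forbidden.php keeps that from looping.
ErrorDocument 402 /forbidden.php
```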
I just had to block 360Spider. Solved with StreamCatcher on IIS (IIS7), which fortunately was already installed so only a small configuration change was needed. Details at http://needs-be.blogspot.com/2013/02/how-to-block-spider360.html
I use the following, and it helps a lot! It checks HTTP_USER_AGENT for bad bots:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{REQUEST_URI} !^/error\.html$
RewriteCond %{HTTP_USER_AGENT} EasouSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YisouSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Sogou\ web\ spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} 360Spider [NC]
RewriteRule ^.*$ - [F,L]
</IfModule>
<Location />
<IfModule mod_setenvif.c>
SetEnvIfNoCase User-Agent "EasouSpider" bad_bot
SetEnvIfNoCase User-Agent "YisouSpider" bad_bot
SetEnvIfNoCase User-Agent "LinksCrawler" bad_bot
Order Allow,Deny
Allow from All
Deny from env=bad_bot
</IfModule>
</Location>
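<Location> is only valid in the server config and virtual host contexts, so the second block above belongs in httpd.conf or a <VirtualHost>, not in .htaccess. A sketch of an .htaccess-friendly version of it, simply dropping the wrapper and adding 360Spider to the list; on Apache 2.4 the Order/Allow/Deny lines assume mod_access_compat is loaded:

```apache
<IfModule mod_setenvif.c>
    # Flag known bad bots by user agent (case-insensitive substring match)
    SetEnvIfNoCase User-Agent "EasouSpider" bad_bot
    SetEnvIfNoCase User-Agent "YisouSpider" bad_bot
    SetEnvIfNoCase User-Agent "LinksCrawler" bad_bot
    SetEnvIfNoCase User-Agent "360Spider" bad_bot
    # Allow everyone, then deny flagged requests (with Allow,Deny, Deny wins)
    Order Allow,Deny
    Allow from All
    Deny from env=bad_bot
</IfModule>
```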

Why is my `favicon.ico` request not working?

I have a favicon.ico on my site.
In the HTML, I link to its location...
<link rel="icon" href="/assets/images/layout/favicon.ico" type="image/x-icon" />
I also have this in my .htaccess.
# Redirect /favicon.ico requests
RewriteCond %{REQUEST_URI} !^assets/images/layout/favicon\.ico [NC]
RewriteCond %{REQUEST_URI} ^favicon\.(gif|ico|png|jpe?g)$ [NC]
RewriteRule ^(.*)$ assets/images/layout/favicon.ico [R=301,L]
...to redirect the /favicon.ico requests to a different location.
For some reason, every time I request favicon.ico in my browser, I get a 304 Not Modified response with matching ETags and apparently a blank image, even though /assets/images/layout/favicon.ico does exist.
I get the same issue when trying to access it with the full path.
What is going on here? What is causing this 304?
First of all, I would rather put this rule in .htaccess like this:
RewriteRule ^favicon\.(gif|ico|png|jpe?g)$ /assets/images/layout/favicon.ico [L,NC]
Then if you have this in your web page:
<link rel="icon" href="/favicon.ico" type="image/x-icon" />
/favicon.ico will be internally redirected to /assets/images/layout/favicon.ico
I have tested this in IE, Firefox and Chrome, and all three behave the same way: the first time (or after clearing the cache) I get a 200 for favicon.ico, but afterwards all browsers cache the icon file and don't bother sending another request to the server. With this setup I didn't see any 304 in my testing.
My access log:
Chrome
127.0.0.1 - - [05/May/2011:23:58:15 -0400] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24"
IE
127.0.0.1 - - [06/May/2011:00:05:18 -0400] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
Firefox
127.0.0.1 - - [06/May/2011:00:07:33 -0400] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
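If an external 301 is really wanted, as in the question, note why the question's conditions never fire: %{REQUEST_URI} always starts with a slash (/favicon.ico), so ^favicon\. cannot match it. A minimal sketch, assuming the .htaccess file sits in the document root, where the RewriteRule pattern is matched against the path without its leading slash:

```apache
RewriteEngine On
# Per-directory matching strips the leading "/", so this only matches
# /favicon.ico (and .gif/.png/.jpg/.jpeg variants) at the site root,
# never /assets/images/layout/favicon.ico, so it cannot loop.
RewriteRule ^favicon\.(gif|ico|png|jpe?g)$ /assets/images/layout/favicon.ico [R=301,L,NC]
```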
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^favicon\.ico$ _/img/ico/favicon.ico [L]
RewriteRule ^apple-touch-icon\.png$ _/img/ico/apple-touch-icon.png [L]
RewriteRule ^apple-touch-icon-precomposed\.png$ _/img/ico/apple-touch-icon-precomposed.png [L]
RewriteRule ^apple-touch-icon-57x57-precomposed\.png$ _/img/ico/apple-touch-icon-57x57-precomposed.png [L]
RewriteRule ^apple-touch-icon-72x72-precomposed\.png$ _/img/ico/apple-touch-icon-72x72-precomposed.png [L]
RewriteRule ^apple-touch-icon-144x144-precomposed\.png$ _/img/ico/apple-touch-icon-144x144-precomposed.png [L]
RewriteRule ^apple-touch-icon-114x114-precomposed\.png$ _/img/ico/apple-touch-icon-114x114-precomposed.png [L]
</IfModule>
