From every example on the net, this seems to be the config to use to block referrer spam, yet I am still getting traffic from trafficmonetize.org. Can anyone tell me what to look for, or give me some ideas?
## SITE REFERRER BANNING
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} best-seo-offer\.com [NC,OR]
RewriteCond %{HTTP_REFERER} 100dollars-seo\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-your-website\.com [NC,OR]
RewriteCond %{HTTP_REFERER} seoanalyses\.com [NC,OR]
RewriteCond %{HTTP_REFERER} 4webmasters\.org [NC,OR]
RewriteCond %{HTTP_REFERER} trafficmonetize\.org [NC]
RewriteRule .* - [F]
I spent a week dealing with referral bots spamming my sites. The first line of defense was the .htaccess file; however, bots were still able to get through and hit my Google Analytics account.
The reason some of these bots show up is that they never actually visit your website. They take your Google Analytics tracker code, place it in a JavaScript file on their own servers, and ping it, which generates false pageviews.
The best solution I came up with was simply filtering them out in my Google Analytics account. Here is the Moz article I used as a reference. Since adding the filter, the bots no longer appear in my Analytics stats.
Server-side solutions like the .htaccess file only work for crawler spam; from your list, that would be:
semalt
100dollars-seo
buttons-for-website
buttons-for-your-website
Ghost spam like 4webmasters and trafficmonetize never accesses your site, so there is no point in trying to block it from the .htaccess file. It all happens within GA, so it has to be filtered out there; that's why it keeps showing up in your reports.
As for seoanalyses, I'm not sure, since I haven't seen it on any of the properties I manage, but you can check for yourself: select hostname as a second dimension. If you see a fake hostname or (not set), it is ghost spam; if it has a valid hostname, it is a crawler. Either way, you can filter it.
You can use two approaches for filtering spam: one is creating a Campaign Source filter that excludes the referral; a more advanced approach is creating a valid-hostname filter that gets rid of all ghost spam at once.
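For the valid-hostname approach, the filter pattern is just a regular expression matching the hostnames where your tracking code legitimately runs (yoursite.com below is a placeholder):

```text
Google Analytics custom filter (sketch; this is not .htaccess syntax)
Filter type:    Custom > Include
Filter field:   Hostname
Filter pattern: ^(www\.)?yoursite\.com$
```

Ghost spam hits arrive with fake or unset hostnames, so an include filter on your valid hostnames drops all of them at once.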
Here you can find more information about the spam and both solutions:
https://stackoverflow.com/a/28354319/3197362
https://stackoverflow.com/a/29717606/3197362
I would like to block some specific pages from being indexed/accessed by Google. These pages have a GET parameter in common, and I would like to redirect bots to the equivalent page without the GET parameter.
Example - page to block for crawlers:
mydomain.com/my-page/?module=aaa
Should be blocked based on the presence of module= and redirected permanently to
mydomain.com/my-page/
I know that a canonical tag could spare me the trouble of doing this, but the problem is that those URLs are already in the Google index and I'd like to accelerate their removal. I added a noindex tag a month ago and I still see results in Google Search. It is also affecting my crawl budget.
What I wanted to try out is the following:
RewriteEngine on
RewriteCond %{QUERY_STRING} module=
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
Is this correct?
What should I add for the final redirection?
It's a tricky thing to do so before implementing anything I'd like to make sure it's the right thing to do.
Thanks
That would be:
RewriteEngine On
RewriteCond %{QUERY_STRING} module= [NC]
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule ^ %{REQUEST_URI}? [L,R=301]
The trailing ? in %{REQUEST_URI}? strips the previous query string from the redirect target.
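To see what that trailing ? changes, compare the redirect targets for the question's example URL (a sketch of the mapping, not extra config):

```apache
# With    RewriteRule ^ %{REQUEST_URI}? [L,R=301]
#   /my-page/?module=aaa  ->  301  ->  /my-page/
# Without the trailing ?, the query string is carried over by default:
#   /my-page/?module=aaa  ->  301  ->  /my-page/?module=aaa
```

(On Apache 2.4+, the QSD flag on the RewriteRule achieves the same thing as the trailing ?.)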
Our website was getting many hits from "Sogou web spider", so we thought of blocking it using .htaccess rules. We created the rules below:
RewriteCond %{HTTP_USER_AGENT} Sogou [NC]
RewriteRule ^.*$ - [L]
However, we are still getting hits from Sogou. I would like to know what changes I should make to this rule to block Sogou.
Thanking you,
As @faa mentioned, you're not actually blocking anything: RewriteRule ^.*$ - [L] leaves the request untouched and merely stops further rewriting. Try this instead:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Sogou [NC]
RewriteRule ^ - [F,L]
Make sure you've got RewriteEngine On and the [F] flag, which returns a 403 Forbidden response.
You may still see hits from them in your access logs, but with the combination of not sending any data and the 403 Forbidden response, the hits should die off eventually. Even if they continue to crawl your site, they should no longer generate so much extra load on your server.
I'm trying to redirect images on my server to a URL if the user agent is NOT a bot.
So far I have:
RewriteCond %{HTTP_USER_AGENT} "Windows" [NC]
RewriteCond %{REQUEST_URI} jpg
RewriteRule ^(.*)$ http://www.myurl.com/$1 [R=301,L]
But something is wrong. Is it possible to combine these two conditions?
Your idea is admirable, but the logic is flawed given real-world bot behavior.
I deal with site security all the time, and User-Agent strings are faked constantly. If you have the option to install it, I would recommend a tool like ModSecurity: an Apache module firewall that uses configurable rulesets to deny bad patterns of access behavior. Honestly, though, if you are having trouble with .htaccess rules like this, ModSecurity might be too complex to pick up quickly.
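For example, a minimal ModSecurity (v2) rule sketch; the User-Agent substring and the rule id are placeholders, and ids must be unique in your config:

```apache
SecRuleEngine On
SecRule REQUEST_HEADERS:User-Agent "@contains badbot" \
    "id:1000001,phase:1,t:lowercase,deny,status:403"
```

Real deployments usually start from a maintained ruleset such as the OWASP Core Rule Set rather than hand-written rules.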
A better tactic is to just prevent hotlinking via mod_rewrite rules like this:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com/.*$ [NC]
# don't rewrite the replacement image itself, or you get a redirect loop
RewriteCond %{REQUEST_URI} !angryman\.gif
RewriteRule \.(gif|jpg)$ http://www.mydomain.com/angryman.gif [R,L]
Then again, reading your question, I'm not 100% sure what you want to achieve. Maybe mod_rewrite examples like this can give you hints on how to approach the issue. Good luck!
I'm having trouble blocking two bad bots that keep sucking bandwidth from my site and I'm certain it has something to do with the * in the user-agent name that they use.
Right now, I'm using the following code to block the bad bots (this is an excerpt)...
# block bad bots
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^spider$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^robot$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^crawl$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^discovery$
RewriteRule .* - [F,L]
When I try to do RewriteCond %{HTTP_USER_AGENT} ^*bot$ [OR] or RewriteCond %{HTTP_USER_AGENT} ^(*bot)$ [OR] I get an error.
Guessing there is a pretty easy way to do this that I just haven't found yet on Google.
An asterisk (*) in a regular expression pattern needs to be escaped; unescaped, it is a quantifier, and ^* has nothing to repeat, which is why you get an error.
RewriteCond %{HTTP_USER_AGENT} ^\*bot$
should do the trick.
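A quick way to convince yourself of the difference (the user-agent strings below are made up; Python's regex engine behaves like Apache's PCRE for these patterns):

```python
import re

# Hypothetical user-agent strings for illustration.
agents = ["*bot", "superbot", "Mozilla/5.0"]

# Escaped asterisk: matches only the literal string "*bot".
literal = re.compile(r"^\*bot$")
# Dot before the asterisk: matches any string ending in "bot".
suffix = re.compile(r"^.*bot$")

print([a for a in agents if literal.match(a)])  # ['*bot']
print([a for a in agents if suffix.match(a)])   # ['*bot', 'superbot']
```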
I think you are missing a dot (.); change your condition to this:
RewriteCond %{HTTP_USER_AGENT} ^.*bot$ [OR]
But how is this going to prevent Bad Bot access?
I work for a security company (I'm also PM at Botopedia.org), and I can tell you that 99.9% of bad bots will not use any of these expressions in their user-agent string.
Most of the time, bad bots use legitimate-looking user-agents (impersonating browsers and VIP bots like Googlebot), and you simply cannot filter them via user-agent data alone.
For effective bot detection you should look at other signals, like:
1) Suspicious signatures (e.g. the order of header parameters)
and/or
2) Suspicious behavior (e.g. early robots.txt access, or request rates/patterns)
Then you should use different challenges (e.g. JS, cookies, or even CAPTCHA) to verify your suspicions.
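As a toy illustration of the request-rate signal, here is a sketch that flags clients making unusually many requests; the log lines and the threshold are invented for this example:

```python
from collections import Counter

# Invented access-log lines in the form "client-ip method path".
log_lines = [
    "1.2.3.4 GET /robots.txt",
    "1.2.3.4 GET /page1",
    "1.2.3.4 GET /page2",
    "1.2.3.4 GET /page3",
    "5.6.7.8 GET /page1",
]

MAX_REQUESTS = 3  # arbitrary per-window threshold for this sketch

# Count requests per client IP and flag anything over the threshold.
hits = Counter(line.split()[0] for line in log_lines)
suspects = [ip for ip, n in hits.items() if n > MAX_REQUESTS]
print(suspects)  # ['1.2.3.4']
```

A real system would bucket requests into time windows and combine this with the other signals before challenging a client.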
The problem you've described is often referred to as a "Parasitic Drag".
This is a very real and serious issue; we actually published research about it just a couple of months ago.
(We found that on an average-sized site, 51% of visitors are bots, 31% of them malicious.)
Honestly, I don't think you can solve this problem with a few lines of regex.
We offer our bot-filtering services for free, and there are several others like us. (I can recommend good services if needed.)
GL.
I have a website, and some of the images on it are hotlinked by other websites. I would like to block that, but allow certain sites (like some forums) to hotlink my images, since I have made posts about my product there. I have a list of sites where I regularly promote my products, and I would like to allow those to hotlink my images while blocking all others.
I use shared hosting from hostgator btw.
I hope it is clearly understandable.
Please help.
Thanks
You can put this code in your .htaccess file:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yoursite\.com [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} ^http://.*$
# skip the placeholder image itself, or the rule would loop on it
RewriteCond %{REQUEST_URI} !nohotlinks\.png
RewriteRule \.(jpe?g|gif|bmp|png)$ /media/nohotlinks.png [L]
Please read this article for more information
http://www.yourhtmlsource.com/sitemanagement/bandwidththeft.html
Add the sites you want to allow as a RewriteCond to the hotlinking rule of your choice, e.g. @YNhat's above.
To allow hotlinking from alloweddomain.com, add:
RewriteCond %{HTTP_REFERER} !^http://(www\.)?alloweddomain\.com
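Putting it together, a full allow-list sketch (yoursite.com, alloweddomain.com, and someforum.com are placeholders); returning 403 with [F] instead of serving a replacement image avoids having to exclude the placeholder from the rule:

```apache
RewriteEngine On
# Allow empty referers (direct visits, privacy-conscious browsers).
RewriteCond %{HTTP_REFERER} !^$
# Allow your own site and every domain on the allow-list.
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yoursite\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?alloweddomain\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?someforum\.com [NC]
# Everything else gets 403 Forbidden for image requests.
RewriteRule \.(jpe?g|gif|bmp|png)$ - [F,L]
```

Each extra allowed site is one more negated RewriteCond; all conditions must hold (logical AND) for the block to fire.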