Block spider bot except one - .htaccess

I have a site that a SEMrush spider bot scans every day, at a different hour each time.
I can block the user agent via .htaccess, but on Sundays I scan my own site with SEMrush to look for improvements.
So if I block the SEMrush user agent I block myself, and the IP is different every time because the scan comes from SEMrush.
Is there any way to block all SEMrush user agents except mine?
Thanks

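You can use the following rule to block the user agent. It excludes your IP, so requests from that address are not blocked even when they carry that user agent. Replace yourIP with the address to exempt, escaping the dots (for example 203\.0\.113\.10).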
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} semrush [NC]
RewriteCond %{REMOTE_ADDR} !^yourIP$
RewriteRule ^ - [F]
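Since the question says the scanner's IP changes every time but the owner's own scan always runs on a Sunday, a day-of-week exemption may fit better than an IP exemption. The following is only a sketch under that assumption: it uses mod_rewrite's TIME_WDAY variable (0 = Sunday, in the server's local time) and lets every SEMrush request through on Sundays, not only the owner's.

```apache
RewriteEngine On
# Match any user agent containing "semrush", case-insensitively
RewriteCond %{HTTP_USER_AGENT} semrush [NC]
# ...but only when today is not Sunday (TIME_WDAY: 0 = Sunday, server local time)
RewriteCond %{TIME_WDAY} !^0$
# Answer 403 Forbidden for those requests
RewriteRule ^ - [F]
```

If the server clock is not in your time zone, the Sunday window shifts accordingly.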

Related

blocking referrer spam not working with my htaccess file

From every example on the net, this seems to be the config to use to block referrer spam, yet I am still getting traffic from trafficmonetize.org. Can anyone give me some ideas about what to look for?
## SITE REFERRER BANNING
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} best-seo-offer\.com [NC,OR]
RewriteCond %{HTTP_REFERER} 100dollars-seo\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-your-website\.com [NC,OR]
RewriteCond %{HTTP_REFERER} seoanalyses\.com [NC,OR]
RewriteCond %{HTTP_REFERER} 4webmasters\.org [NC,OR]
RewriteCond %{HTTP_REFERER} trafficmonetize\.org [NC]
RewriteRule .* - [F]
I spent a week dealing with referral bots spamming sites. The first line of defense was the htaccess file, but bots were still able to get through and hit my Google Analytics account.
Some of these bots show up in your reports because they are in fact not visiting your website at all. They take your Google Analytics tracker code, place it in a script on their own servers, and ping it, which causes false pageviews.
The best solution I came up with was simply filtering them out in my Google Analytics account. Here is the Moz article that I used as a reference. Since adding the filter, the bots no longer appear in my Analytics stats.
Server solutions like the .htaccess file will only work for crawler spam. From your list, that is:
semalt
100dollars-seo
buttons-for-website
buttons-for-your-website
Ghost spam like 4webmasters and trafficmonetize never accesses your site, so there is no point in trying to block it from the .htaccess file. It all happens within GA and has to be filtered out there; that's why it keeps showing up in your reports.
As for seoanalyses, I'm not sure, since I haven't seen it on any of the properties I manage, but you can check for yourself: select hostname as a second dimension. If you see a fake hostname or "(not set)", it is ghost spam; if it has a valid hostname, it is crawler spam. Either way you can filter it.
You can use two approaches for filtering spam: one is to create a Campaign Source filter that excludes the referral; the more advanced one is to create a valid hostname filter, which gets rid of all ghost spam.
Here you can find more information about the spam and both solutions:
https://stackoverflow.com/a/28354319/3197362
https://stackoverflow.com/a/29717606/3197362
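For the four entries identified above as crawler spam, the server-side block could be trimmed to a single condition. This is only a sketch: it assumes all four use the .com domains shown in the question and that mod_rewrite is enabled; the ghost spam entries are left out on purpose because they never reach the server.

```apache
RewriteEngine On
# Crawler spam only: these referrers actually request pages from the site
RewriteCond %{HTTP_REFERER} (semalt|100dollars-seo|buttons-for-website|buttons-for-your-website)\.com [NC]
# Answer 403 Forbidden
RewriteRule ^ - [F]
```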

htaccess to allow pages to be visited via a redirect, but NOT directly

I thought I had things set up correctly, but I haven't found a 'perfect' solution quite yet and wanted to see if anyone out there has done a similar setup.
I have a page that, depending on the day of the week, redirects to other pages (which are not publicly linked). I do not want users to be able to get to the final page without going through the designated referer, so that they cannot bookmark Monday's page and pull it up on Friday. If they try, they should be redirected to a page saying that they cannot access it via a bookmark.
The main page is www.example.com/AB/. Upon visiting that page a user is automatically redirected to www.example.com/AB/123 or www.example.com/AB/123
What I would like to accomplish is to block access to the final pages if they are not referred to from the parent page. The parent page, however, can be accessed from anywhere (referrer does not matter).
Thoughts?
Thank you!
EDIT:
@olaf - Here's where I am at ...
RewriteEngine On
RewriteCond %{HTTP_REFERER} !mysite\.com [NC]
RewriteRule ^/?q/? http://mysite.com/sorry-no-bookmarks/ [R=302,L]
Disregarding days or timing, anything that is a subpage under "/q/..." should be redirected to the "no bookmarks" page if the visitor did not come from somewhere (anywhere) on mysite.com. In addition, I would like to keep anything under "/n/..." freely accessible from anywhere (outside referers included). This works mostly okay with Chrome, but Firefox and IE are blocking things regardless. Fun times!
Do you mean that users who reach /AB/$dynamic without coming from /AB/ should be sent to an abuse text? Give this a try:
RewriteCond %{HTTP_REFERER} !example\.com/AB/?$ [NC]
RewriteRule ^/?AB/([a-z0-9_-]+)/? abuse.txt [L]
Since you stay on the same server, I wouldn't redirect but do an internal rewrite instead:
RewriteEngine on
RewriteCond %{TIME_WDAY} 1
RewriteRule ^AB/?$ /AB/monday.html [L]
RewriteCond %{TIME_WDAY} 2
RewriteRule ^AB/?$ /AB/tuesday.html [L]
# and so on ...
Or name the daily files 1.html for Monday, 2.html for Tuesday, ... and do it in one rule:
RewriteEngine on
RewriteRule ^AB/?$ /AB/%{TIME_WDAY}.html [L]
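The internal rewrite hides the daily file name, but a visitor who guesses it could still request it directly. One possible way to close that gap is sketched below. It assumes the .htaccess file sits in the document root, that the daily files are named 0.html (Sunday) through 6.html (Saturday) to match TIME_WDAY, and it relies on the REDIRECT_STATUS environment variable being empty on the initial request and set once mod_rewrite has performed its internal redirect.

```apache
RewriteEngine On
# A daily file requested directly: REDIRECT_STATUS is still empty, so refuse it
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^AB/[0-6]\.html$ - [F]
# /AB/ itself is internally rewritten to today's file (0 = Sunday ... 6 = Saturday)
RewriteRule ^AB/?$ /AB/%{TIME_WDAY}.html [L]
```

Because nothing depends on the Referer header, this variant should behave the same in every browser.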

htaccess only accept traffic from specific http_referer

I'm trying to set up a htaccess file that would accomplish the following:
Only allow my website to be viewed if the viewing user is coming from a specific domain (link)
For instance, I have a domain called protect.mydomain.com. I only want people coming from a link on unprotected.mydomain.com to be able to access protect.mydomain.com.
The big outstanding issue is this: if you get to protect.mydomain.com from unprotected.mydomain.com and then click a link on protect.mydomain.com that goes to another page under protect.mydomain.com, you get sent back to my redirect, because the http_referer is now protect.mydomain.com. To combat that, I put in a check to allow the referrer to be protect.mydomain.com as well. It's not working, and access is allowed from everywhere. Here is my htaccess file. (All this is under https.)
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_REFERER} ^https://(.+\.)*mydomain\.com
RewriteCond %1 !^(protect|unprotected)\.$
RewriteRule ^.*$ https://unprotected.mydomain.com/ [R=301,L]
You are matching the referer against ^https://(.+\.)*mydomain\.com. This means that if a completely different site, say http://stealing_your_images.com/, links to something on protect.mydomain.com, the first condition fails and the request is never redirected to https://unprotected.mydomain.com/. You want to approach it from the other direction: let only certain referers pass through, then redirect everything else:
RewriteEngine On
RewriteBase /
# allow these referers to passthrough
RewriteCond %{HTTP_REFERER} ^https://(protect|unprotected)\.mydomain\.com
RewriteRule ^ - [L]
# redirect everything else
RewriteRule ^ https://unprotected.mydomain.com/ [R,L]

use htaccess to redirect any desktop users to holding page

I have a mobile web-app which isn't desktop ready just yet, and I don't want users to be able to view the desktop version of the app just yet. Is it possible to catch any requests from desktop browsers and redirect to a holding page, or possibly even to a tablet version of the site by appending the User Agent info to the URL? A basic attempt at this is as follows, and doesn't work...
# Turn mod_rewrite on
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^.*(MSIE.*Windows\ NT|Lynx|Safari|Opera|Firefox|Konqueror) [NC]
RewriteRule ^$ holding.jpg [L,R]
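Try changing your RewriteRule's regex so that it matches any URL except the holding page itself (matching the holding page as well would redirect it to itself in a loop):
RewriteRule !^holding\.jpg$ holding.jpg [L,R]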
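A fuller sketch of the rule set follows. It keeps the user agent list from the question and adds one assumption that is not in the original: mobile browsers usually also contain "Safari" in their user agent string, so a second condition skips anything that identifies itself as "Mobile" or "Android". Treat the token list as illustrative rather than complete.

```apache
RewriteEngine On
RewriteBase /
# Desktop-looking user agents (list taken from the question)...
RewriteCond %{HTTP_USER_AGENT} (MSIE.*Windows\ NT|Lynx|Safari|Opera|Firefox|Konqueror) [NC]
# ...that do not also identify as a phone or tablet (assumed tokens)
RewriteCond %{HTTP_USER_AGENT} !(Mobile|Android) [NC]
# Redirect everything except the holding page itself, to avoid a loop
RewriteRule !^holding\.jpg$ holding.jpg [L,R=302]
```

A temporary 302 is used so that browsers do not cache the redirect once the desktop version is ready.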

Blocking "hacking" attempts via .htaccess

I am trying to block requests being made to our pagination parameter by multiple robots (evil ones, it seems).
Hundreds of these types of requests are showing up:
http://www.ourdomain.com/search.php?q=search+query&page=366100876
Is there a way, using regular expressions in .htaccess, to turn away any request that asks for a page larger than 1000, or anything with more than 4 digits in the parameter 'page'?
'q' parameter is of course always different.
Thank you.
I derived most of this from a really cool article called Ultimate .htaccess file sample. Very handy.
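ErrorDocument 500 /error500.html
RedirectMatch 500 ^.{1001,}$
That would send away any URL path longer than 1000 characters.
LimitRequestBody 102400
That would reject any request body over 100K.
To target the GET variable page specifically (RedirectMatch only sees the URL path, not the query string, so this needs mod_rewrite):
RewriteEngine On
RewriteCond %{QUERY_STRING} page=[0-9]{4}
RewriteRule ^ - [R=500,L]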
I tried this and it works. I added it to some other checks I had:
RewriteCond %{QUERY_STRING} page=[0-9]{4} [OR]
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Send all blocked requests to homepage with 403 Forbidden error!
RewriteRule .* index.php [F]
A: In .htaccess, add a rewrite condition for any URI containing "admin" (/*admin*/* [NC,OR]). You must then rename anything of your own with "admin" in it to something else: xyzzy, spot, nancy, fuivan... The same goes for names like login.php, which becomes jumpjoy.php.
B: When you see an attempted hack (188 hits from a non-RU, non-CH IP), contact the hosting company and tell them. More often than not you get back "Thank you, we have found and cleaned out the bot.[banned the user]"
I never display "403 forbidden"; I redirect to some other page (I have a random selection of 7). I then append "deny from $IP" to htaccess. I host local groups and businesses, so there is no need for Ivan wanting a cab... he couldn't afford the fare :-/ There are 900+ "deny from" lines in my htaccess.
I run a crontab "find $HOME -newer lasttime | mail-me ; touch lasttime". That way, if anyone does get in, I know within a couple of hours. Also "chmod 444 [all].php".

Resources