Our website was getting many hits from the "Sogou web spider", so we decided to block it using .htaccess rules. We created the rules below:
RewriteCond %{HTTP_USER_AGENT} Sogou [NC]
RewriteRule ^.*$ - [L]
However, we are still getting hits from Sogou. What changes should I make to this rule to block Sogou?
As #faa mentioned, you're not actually blocking anything: a "-" substitution with only the [L] flag leaves the request unchanged. Try this instead:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Sogou [NC]
RewriteRule ^.*$ map.txt [R=403]
Make sure you've got RewriteEngine On and the [R=403].
You may still see hits from them in your access logs, but with the combination of not sending any data and a 403 Forbidden response, the hits should die off eventually. Even if they continue to crawl your site, the crawling should no longer generate so much extra load on your server.
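For comparison, here is a minimal sketch of the same block written with mod_rewrite's [F] (forbidden) flag, which returns a 403 without needing a substitution target such as map.txt. It assumes the .htaccess file sits in the document root and that mod_rewrite is enabled; it is one possible variant, not the only correct one.

```apache
RewriteEngine On
# Match "Sogou" anywhere in the User-Agent header, case-insensitively
RewriteCond %{HTTP_USER_AGENT} Sogou [NC]
# "-" means no substitution; [F] answers with 403 Forbidden and stops rewriting
RewriteRule ^ - [F]
```

The requests will still appear in the access log, now with status 403.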
Recently I moved my websites to the hosting provider one.com. They have set up an automated mechanism (I don't know what they use to achieve it) that maps any first-level folder on the webspace to a subdomain.
I.e. the folder http://example.com/folder1/ will also be available as http://folder1.example.com/
Now, I have a site that uses quite a lot of JavaScript to include pages from a hardcoded, static source. Because of the same-origin policy (SOP), the scripts work or fail depending on which hardcoded reference they use.
So, to make sure that everybody gets a working version of the website, I wanted to redirect direct folder access to the subdomain as well.
My .htaccess for this, which works locally and on various .htaccess testers out there, does not seem to work with one.com:
RewriteEngine On
#Rewrite Access to folder1-folder to subdomain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteCond %{REQUEST_URI} ^/folder1.*?$ [NC]
RewriteRule .* http://folder1.example.com/ [R=301,L]
Since I don't know the exact mechanism one.com is using to achieve the mentioned behaviour, it might just be a conflict with my rules.
Support says that all the directives used are fully supported, and therefore they weren't able to tell what's going wrong...
Has anybody encountered something similar and has a hint for me?
Just figured out the solution:
RewriteEngine On #does not work
vs.
RewriteEngine on #does work
You need to check that the actual request was made for /folder/ and not the URI (which can internally be rewritten). Try:
#Rewrite Access to folder1-folder to subdomain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /+folder1/ [NC]
RewriteRule ^folder1/(.*)$ http://folder1.example.com/$1 [R=301,L]
What I want to achieve is to redirect any subdomain.mydomain.info to mydomain.info/subdomain using a 301 so that the visitor still sees subdomain.mydomain.info.
After some research I found that I had to set a wildcard in my A record, so I did that. Then I went on to create a .htaccess. Below is my entire .htaccess.
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.domain\.info [NC]
RewriteCond %{HTTP_HOST} ([^.]+)\.domain\.info [NC]
RewriteRule ^(.*)$ /%1/$1 [L]
When I open subdomain.mydomain.info, where I know that mydomain.info/subdomain is an existing folder, I only get a message telling me that the domain "subdomain.mydomain.info" is unavailable.
My webspace is running a Confixx panel, just if that helps.
What could be going wrong here?
At this point I am guessing that some configuration outside the .htaccess needs to be made, but I have no idea what and where.
BIG EDIT:
Revisiting this. It turned out I had to talk to my provider to get some things set up correctly. Still trying to figure this out though.
Current situation: the .htaccess from above gives me a 500. Putting in an R, as was suggested in the comments, redirects "sd.domain.info" to "domain.info/sd/sd/sd/sd" and results in an error in my browser. The browser says "There is redirect on this page" and gives me the option to load it again. The version suggested by Al Kafri Firas also gives me a 500. When I remove the .htaccess, any "subdomain.domain.info" gets redirected to "domain.info", with the URL being changed in my browser's address bar.
Still looking to get this working....
Revert all changes you made to your A record and use these rules:
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.example\.info$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z0-9-]+)\.example\.info$ [NC]
RewriteRule ^ /%2%{REQUEST_URI} [PT,L]
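The "sd/sd/sd/sd" result described in the question is consistent with the rule being applied again after each rewrite, since in .htaccess context the rule set is reprocessed after an internal rewrite and the host name still matches. A hedged sketch of one common guard follows. It assumes that the REDIRECT_STATUS environment variable is empty on the initial request and set after an internal redirect, which holds on typical Apache setups but should be verified on the host in question; example.info is a placeholder domain.

```apache
RewriteEngine on
# Only act on the initial request, not on an already rewritten one
# (assumption: REDIRECT_STATUS is empty until an internal redirect happens)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# Leave the www host alone
RewriteCond %{HTTP_HOST} !^www\.example\.info$ [NC]
# Capture the subdomain label as %1
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.info$ [NC]
# Internally map /path on sub.example.info to /sub/path
RewriteRule ^ /%1%{REQUEST_URI} [L]
```

Because there is no R flag, the visitor keeps seeing the subdomain URL in the address bar.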
Need your help. I just spent many hours on this .htaccess problem and still don't have a clue how to manage it.
I have many http://www.example.com/menu-alias/foo links on my company's website which should get redirected to http://www.example.com/foo.
This alone shouldn't be the hard part but listen up... the tricky part follows.
I can't get the site (Joomla 1.5) working without the 'menu-alias'. This means that all http://www.example.com/foo requests should get internally mapped to http://www.example.com/menu-alias/foo, so that the user still has http://www.example.com/foo in the browser's address bar.
To make it even more complicated, I have to 301 redirect the old menu-alias/foo links to /foo.
Can some htaccess guru help me out? Is this even possible?
You can try adding these rules in the htaccess file in your document root (or vhost config):
RewriteEngine On
# externally redirect requests that have "menu-alias"
RewriteCond %{THE_REQUEST} /menu-alias/([^\ \?]+)
RewriteRule ^ /%1 [L,R=301]
# internally rewrite requests back to menu-alias
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/menu-alias/
RewriteRule ^/?(.*)$ /menu-alias/$1 [L]
A couple of potential problems:
Joomla may be looking for the original un-rewritten request in $_SERVER; if so, the rewrite won't work.
The rule that adds /menu-alias/ back into the URI does so blindly: it rewrites all requests that don't point to an existing resource. This means "virtual" paths that Joomla may handle will get "menu-alias" prepended.
I am trying to redirect blackberry users to my mobile site by doing:
# redirect for blackberry users
RewriteCond %{HTTP_USER_AGENT} ^.BlackBerry.$
RewriteRule ^(.*)$ http://www.domain.com/m/ [R=301]
in my .htaccess, but nothing happens when I try to access the site from the device. I've already deleted the cache and cookies, and nothing works. I have been googling around and it seems I'm doing the redirect correctly, but I guess not. What am I missing?
My .htaccess only contains that by the way.
Edit
The .htaccess is in my server's root.
If this isn't the only rule in your .htaccess file, you might have an issue where a later rule messes up your redirect. To redirect immediately, you need to include the L flag.
I also suspect that your regular expression for the user agent is probably not correct for the input you're testing against, since the two . match just one character on either side of the word "BlackBerry". It would also be a good idea to guard against a redirect loop with a check to see if you're already in /m/ (although if you have mod_rewrite directives in a .htaccess file in that directory it's not important).
Putting all of that together, we get something like the following:
# Check for x-wap-profile/Profile headers
RewriteCond %{HTTP:x-wap-profile} !^$ [OR]
RewriteCond %{HTTP:Profile} !^$ [OR]
# Check for BlackBerry anywhere in the user agent string
RewriteCond %{HTTP_USER_AGENT} BlackBerry [NC]
# Make sure we're not in /m/ already
RewriteCond %{REQUEST_URI} !^/m/
RewriteRule ^ http://example.com/m/ [R=301,L]
You may also want that RewriteRule to be...
RewriteRule ^.*$ http://example.com/m/$0 [R=301,L]
...if the content is named the same (but mobile-friendly) in the /m/ directory.
Make sure your BlackBerry is not in "emulation mode" where it passes the user-agent for IE or Firefox instead of BlackBerry. You can check in the browser's option screen.
A better way around this would be to base your rewrite rule on the "x-wap-profile" and/or "profile" headers, which the mobile browser should always send accurately.
I'm trying to get www.example.com and www.example.com/index.html to go to index.html, but I want all other urls e.g. www.example.com/this/is/another/link to still show www.example.com/this/is/another/link but be processed by a generic script. I've tried
RewriteEngine on
RewriteCond %{REQUEST_URI} !^index\.html$
RewriteCond %{REQUEST_URI} !^$
RewriteRule ^(.*)$ mygenericscript.php [L]
but it won't work. Can someone please help?
Instead of testing what %{REQUEST_URI} is, you can instead just test if the resource exists:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* mygenericscript.php
This also prevents your static resources (images, stylesheets, etc.) from being rewritten if they're served from the same directory your .htaccess is in.
What's probably happening now is that you're seeing an internal server error, caused by an infinite internal redirection loop when you try to access anything that isn't / or /index.html. This is because .* matches every request, and after you rewrite to mygenericscript.php the first time, the rule set is reprocessed (because of how mod_rewrite works in the context that you're using it in).
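If you would rather keep the literal behaviour from the question (only / and /index.html are served directly, everything else goes to the script), a hedged sketch is to stop processing explicitly for the script itself and for the index page before the catch-all rule. This assumes the .htaccess file and mygenericscript.php both live in the document root.

```apache
RewriteEngine On
# Already rewritten to the script: stop here so the rule set does not loop
RewriteRule ^mygenericscript\.php$ - [L]
# Leave the root URL and index.html untouched
RewriteRule ^(index\.html)?$ - [L]
# Everything else is handled by the generic script (internal rewrite, URL unchanged)
RewriteRule ^ mygenericscript.php [L]
```

Unlike the file-existence checks, this variant also sends requests for existing images and stylesheets to the script, so it only fits if that is what you want.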
The easiest way to do this is to install a 404 handler, which gets executed when the server does not find a file to display.
ErrorDocument 404 /mygenericscript.php
or
ErrorDocument 404 /cgi-bin/handler.cgi
or similar should do the trick.
It is not that RewriteRules cannot be used for this; it is just that they are tricky to set up and require in-depth knowledge of how Apache handles requests. It is a bit of a black art.
It appears that you're using PHP, so you can also use auto_prepend_file or auto_append_file:
http://php.net/manual/en/ini.core.php
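As a rough illustration of that suggestion, the directive can be set from .htaccess when PHP runs as an Apache module. The file path below is hypothetical, and under CGI or FastCGI setups php_value is not available in .htaccess, so treat this as a sketch under that assumption.

```apache
# Only valid when PHP is loaded as an Apache module (mod_php)
# The path is a placeholder; point it at your own script
php_value auto_prepend_file "/var/www/example/prepend.php"
```

Note that this runs the named file before every PHP script in the directory; it does not by itself route non-PHP URLs to a script.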