htaccess - Difference between .* and \*

I would like to allow a robot with the user agent ECLoadToEdge/383175. Since I cannot confirm if the 6 numbers will change, I intend to use an asterisk.
May I know the difference between:
RewriteCond %{HTTP_USER_AGENT} !^ECLoadToEdge\*$
and
RewriteCond %{HTTP_USER_AGENT} !^ECLoadToEdge.*$
Would it be better to use !^ECLoadToEdge.[0-9]{6} instead of * for performance?

This rule is wrong:
RewriteCond %{HTTP_USER_AGENT} !^ECLoadToEdge\*$
Since \* will try to match a literal asterisk in the user agent.
You should just use:
RewriteCond %{HTTP_USER_AGENT} !^ECLoadToEdge
Since you don't care what comes after ECLoadToEdge
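The difference between the three patterns can be checked outside Apache. The following is a minimal sketch that uses Python's re module as a stand-in for the regex engine behind mod_rewrite (an assumption; for these simple patterns the two behave the same). The helper name matches is hypothetical.

```python
import re

# Python's re module stands in for mod_rewrite's regex engine here (assumption).
ESCAPED_STAR = re.compile(r"^ECLoadToEdge\*$")  # \* is a literal asterisk character
DOT_STAR = re.compile(r"^ECLoadToEdge.*$")      # .* is "any character, zero or more times"
PREFIX_ONLY = re.compile(r"^ECLoadToEdge")      # only the start of the string is checked


def matches(pattern, user_agent):
    """Return True if the pattern is found in the user agent (unanchored search)."""
    return pattern.search(user_agent) is not None


ua = "ECLoadToEdge/383175"
print(matches(ESCAPED_STAR, ua))               # False: there is no "*" in the user agent
print(matches(ESCAPED_STAR, "ECLoadToEdge*"))  # True: only a literal asterisk matches
print(matches(DOT_STAR, ua))                   # True
print(matches(PREFIX_ONLY, ua))                # True: same result without the trailing .*$
```

Since the prefix-only pattern accepts everything the .* version accepts, the trailing .*$ adds nothing here.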

Related

RewriteCond for query parameters in .htaccess files

I have to write some Rewrite Rules and I need to make a check based on my query parameters.
The public url is something like this abc.com/lmn/xyz.json and there is an optional parameter optparam.
This is what I want to achieve:
If optparam is present and not equal to false, the conditions have to fail and carry on with other rules.
After reading through few blogs and posts, I have a very faint idea about these rules. So tried this:
RewriteCond %{QUERY_STRING} !^optparam $ [NC,OR]
RewriteCond %{QUERY_STRING} ^optparam=false$ [NC]
RewriteRule ^lmn/xyz.json$ xyz.json
But the RewriteRule is applied even when I send the param value to be true.
Please tell me what I am missing here.
Thanks in advance!
Examples:
abc.com/lmn/xyz.json ==> Rule should fire
abc.com/lmn/xyz.json?optparam ==> Rule should not fire
abc.com/lmn/xyz.json?optparam=false ==> Rule should fire
abc.com/lmn/xyz.json?optPARAM=hfjsgzjrg ==> Rule should not fire
abc.com/lmn/xyz.json?optParam=FALSE ==> Rule should fire
Change your rule to this:
RewriteCond %{QUERY_STRING} ^(optparam=false)?$ [NC]
RewriteRule ^lmn/xyz\.json$ xyz.json [L,NC]
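To see how this single condition behaves on the five example URLs from the question, here is a small sketch in Python. It assumes Python's re module behaves like mod_rewrite's regex engine for this pattern, with re.IGNORECASE playing the role of the [NC] flag. The function name rule_fires is hypothetical.

```python
import re

# Same pattern as the RewriteCond above; re.IGNORECASE plays the role of [NC].
CONDITION = re.compile(r"^(optparam=false)?$", re.IGNORECASE)


def rule_fires(query_string):
    """Return True if the condition holds, i.e. the RewriteRule would be applied."""
    return CONDITION.search(query_string) is not None


# Query strings taken from the examples in the question.
examples = [
    "",                    # abc.com/lmn/xyz.json
    "optparam",            # ?optparam
    "optparam=false",      # ?optparam=false
    "optPARAM=hfjsgzjrg",  # ?optPARAM=hfjsgzjrg
    "optParam=FALSE",      # ?optParam=FALSE
]
for qs in examples:
    print(repr(qs), "fires" if rule_fires(qs) else "does not fire")
```

The optional group already covers the empty query string, so no separate ^$ alternative is needed.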
Well, I think an additional check of optparam not present at all is required. Tried this:
RewriteCond %{QUERY_STRING} (^$|!^optparam$|^optparam=false$) [NC]
RewriteRule ^lmn/xyz\.json$ xyz.json [L,NC]
This seems to work for me. But is there any way to further shorten this condition? I believe I am not using regex to its full potential here.

htaccess cookie value is directory redirect

I have a cookie name=dir and value=test. I want htaccess to check if that value exist as a directory and redirect based on that.
RewriteCond /var/www/whatever/%{HTTP_COOKIE:dir} -d
RewriteRule ^(.*)$ example.com [R]
I know it would be possible with a RewriteMap, but I have no access to the conf file and RewriteMaps must be defined there, not in htaccess. A pure mod_rewrite solution would be best, because the module for setenv isn't enabled either. I've tried and googled, but to no avail.
Something like %{HTTP:header} but for cookies would be ideal, but Apache doesn't do that.
You have to match against %{HTTP_COOKIE} in a separate RewriteCond:
RewriteCond %{HTTP_COOKIE} ^dir=(.+)$
RewriteCond /var/www/whatever/%1 -d
RewriteRule ^(.*)$ example.com [R]
@starkeen: I wasn't aware that I could use %1 in RewriteCond; I thought it was for RewriteRule only. Your answer works perfectly except for 2 things:
A. Regex. %{HTTP_COOKIE} is a string that can have 3 cases in this situation:
Case 1: dir=abc - Your regex works
Case 2: dir=abc; cookie1=v1 - Your regex does not work
Case 3: cookie1=v1; dir=abc; cookie2=v2 - Your regex does not work
Also important: it might be something like dir_save=v1; x_dir=v2; dir=abs, so something like
RewriteCond %{HTTP_COOKIE} ^.*dir=(.*).*$
will not work either.
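So the match has to start at the beginning of the string or after '; ' and end at the end of the string or at ';'. The value itself must not contain ';', otherwise the capture runs on into the following cookies: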
RewriteCond %{HTTP_COOKIE} (^|;\ )dir=([^;]*)(;|$)
RewriteCond /var/www/whatever/%2 -d
RewriteRule ^(.*)$ example.com [R]
B. Also check for strange values of the cookie dir, like empty or '\' or '.' or '..\' and so on. On Windows, of course, '\' ...
RewriteCond %{HTTP_COOKIE} (^|;\ )dir=([^;]*)(;|$)
# must be a directory
RewriteCond /var/www/whatever/%2 -d
# must not be empty
RewriteCond %2 !^$
# must not contain dot
RewriteCond %2 !^.*\..*$
# must not contain /
RewriteCond %2 !^.*/.*$
# must not contain \
RewriteCond %2 !^.*\\.*$
RewriteRule ^(.*)$ example.com [R]
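A cookie pattern like this is easy to get subtly wrong, so it can help to try it on the cases listed above before putting it into .htaccess. The sketch below uses Python's re module as a stand-in for mod_rewrite's regex engine (an assumption) and compares a greedy (.*) capture with a capture that stops at the next semicolon. The names GREEDY, BOUNDED and cookie_dir are hypothetical.

```python
import re

# Two candidate patterns for extracting the value of the "dir" cookie.
GREEDY = re.compile(r"(^|; )dir=(.*)(;|$)")      # (.*) may run past the next ";"
BOUNDED = re.compile(r"(^|; )dir=([^;]*)(;|$)")  # value stops at the next ";"


def cookie_dir(pattern, cookie_header):
    """Return the captured value of the dir cookie, or None if there is no match."""
    match = pattern.search(cookie_header)
    return match.group(2) if match else None


headers = [
    "dir=abc",                          # case 1
    "dir=abc; cookie1=v1",              # case 2
    "cookie1=v1; dir=abc; cookie2=v2",  # case 3
    "dir_save=v1; x_dir=v2; dir=abs",   # similarly named cookies
]
for header in headers:
    print(header, "|", cookie_dir(GREEDY, header), "|", cookie_dir(BOUNDED, header))
```

In cases 2 and 3 the greedy capture swallows the cookies that follow dir, while the bounded capture returns only the value of dir.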
@all: sorry for answering my own question (although starkeen showed a vital part I wasn't aware of) and also sorry for being so rigorous about regex and strange cookie values, but such things can cost you hours to figure out in the worst case. I've already made such mistakes and learned the hard way.

htaccess block *bot and bot*

I'm having trouble blocking two bad bots that keep sucking bandwidth from my site and I'm certain it has something to do with the * in the user-agent name that they use.
Right now, I'm using the following code to block the bad bots (this is an excerpt)...
# block bad bots
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^spider$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^robot$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^crawl$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^discovery$
RewriteRule .* - [F,L]
When I try to do RewriteCond %{HTTP_USER_AGENT} ^*bot$ [OR] or RewriteCond %{HTTP_USER_AGENT} ^(*bot)$ [OR] I get an error.
Guessing there is a pretty easy way to do this that I just haven't found yet on Google.
An asterisk (*) in a regular expression pattern needs to be escaped if you want to match a literal asterisk, since otherwise it is interpreted as a quantifier.
RewriteCond %{HTTP_USER_AGENT} ^\*bot$
should do the trick.
I think you are missing a dot (.); change your condition to this:
RewriteCond %{HTTP_USER_AGENT} ^.*bot$ [OR]
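The patterns discussed in this thread behave quite differently, which a quick experiment makes visible. This sketch uses Python's re module in place of mod_rewrite's regex engine (an assumption; error reporting for invalid patterns in particular may differ between engines). The user-agent strings and helper names are made up for illustration.

```python
import re


def is_match(pattern, user_agent):
    """Return True if the pattern is found in the user agent (unanchored search)."""
    return re.search(pattern, user_agent) is not None


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# ^robot$ only matches a user agent that is exactly "robot".
print(is_match(r"^robot$", "robot"))         # True
print(is_match(r"^robot$", "my-robot/2.0"))  # False

# ^.*bot$ matches anything ending in "bot"; ^\*bot$ only the literal string "*bot".
print(is_match(r"^.*bot$", "evilbot"))       # True
print(is_match(r"^.*bot$", "evilbot/1.0"))   # False: does not end in "bot"
print(is_match(r"^\*bot$", "*bot"))          # True
print(is_match(r"^\*bot$", "evilbot"))       # False

# Without anchors, "bot" matches anywhere in the string.
print(is_match(r"bot", "evilbot/1.0"))       # True

# A quantifier with nothing to repeat is rejected by Python's re module.
print(compiles(r"^*bot$"))                   # False
print(compiles(r"^(*bot)$"))                 # False
```

If the goal is "contains bot anywhere", the unanchored pattern is the simplest of these.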
But how is this going to prevent Bad Bot access?
I work for a security company (also PM at Botopedia.org) and I can tell that 99.9% of bad bots will not use any of these expressions in their user-agent string.
Most of the time Bad Bots will use legitimate looking user-agents (impersonating browsers and VIP bots like Googlebot) and you simply cannot filter them via user-agent data alone.
For effective bot detection you should look into other signs like:
1) Suspicious signatures (i.e. Order of Header parameter)
or/and
2) Suspicious behavior (i.e. early robots.txt access or request rates/patterns)
Then you should use different challenges (i.e. JS or Cookie or even CAPTCHA) to verify your suspicions.
The problem you've described is often referred to as a "Parasitic Drag".
This is a very real and serious issue and we actually published research about it just a couple of months ago.
(We found that on an average sized site 51% of visitors will be bots, 31% malicious)
Honestly, I don't think you can solve this problem with a few lines of RegEx.
We offer our Bot filtering services for free and there are several others like us. (I can endorse good services if needed)
GL.

.htaccess RewriteRule not working

Hi people@stackoverflow,
Maybe I have a fundamental misconception about the working of RewriteRule. Or maybe not. Nevertheless, I'm trying to figure this out now for two days, without any progress.
This is the current situation:
I have a Joomla website with SEF and mod_rewrite turned on.
This results in the URL:
mysite.com/index.php?option=com_remository&Itemid=7
being rewritten to:
mysite.com/sub-directory/sub-directory/0000-Business-files/
These are the lines that are currently used in my .htaccess (all standard Joomla)
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^([^\-]*)\-(.*)$ $1 $2 [N]
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule ^(.*)$ index.php [F,L]
# RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$ [NC]
RewriteRule (.*) index.php
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
This is what I want to achieve:
When a visitor uses this URL
mysite.com/sub directory/sub directory/0000 Business files/
it should lead him to the right page.
Although I know it's not the best idea to use spaces in a URL, I'm confronted with the fact that these 'spacious' URL's are used in a PDF, that's already been issued.
I thought I could use mod_rewrite to rewrite these URL's. But all I get is 'page not found'
I've added this rule on top of the .htaccess file:
RewriteRule ^([^\-]*)\-(.*)$ $1 $2 [N]
But this is not working. What am I doing wrong? Or, also possible, am I missing the point on when and how to use mod_rewrite?
rgds, Eric
First off, the default behavior of apache is usually to allow direct URLs that map to the underlying file system (relative to the document root), and you should use RewriteRule when you want to work around that. Looking at your question, it seems like you want to browse the filesystem and so you should not use a RewriteRule.
If mysite.com/sub+directory/sub+directory/0000+Business+files/ doesn't work (without your rule), I'm wondering: do you have that directory structure on your server? I.e. does it look like this?
[document root]/index.php
[document root]/sub directory/sub directory/0000 Business files/
If not, I'm not sure I understand what you're trying to achieve, and what you mean by the visitor being "lead to the right page". Could you provide an example URL that the user provides, and the corresponding URL (or file system path) that you want the user to be served?
Regarding your rewrite rule, I'm not even sure that it is allowed, and I'm surprised you don't get a 500 Internal Server Error. RewriteRule takes two arguments (matching pattern and substitution) and optionally some flags, but because of the space between $1 and $2 you're supplying three arguments (+ flags).
EDIT: I got the pattern wrong, but it still doesn't make much sense. It matches against any URL that has at least one dash in it, and then picks out the parts before and after the first dash. So, for a URL like "this-is-a-url-path/to-a-file/on-the-server", $1 would be "this" and $2 would be "is-a-url-path/to-a-file/on-the-server". Again, if I had some example URLs and their corresponding rewrites, I could help you find the right pattern.
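The split described in the EDIT can be reproduced with a few lines of Python. This is only a sketch: Python's re module stands in for mod_rewrite's regex engine (an assumption), and the function name split_on_first_dash is hypothetical.

```python
import re

# The pattern from the RewriteRule in the question.
DASH_SPLIT = re.compile(r"^([^\-]*)\-(.*)$")


def split_on_first_dash(path):
    """Return ($1, $2) as the pattern would capture them, or None if it does not match."""
    match = DASH_SPLIT.search(path)
    return match.groups() if match else None


print(split_on_first_dash("this-is-a-url-path/to-a-file/on-the-server"))
# ('this', 'is-a-url-path/to-a-file/on-the-server')

# A path without any dash does not match, so the rule would not apply to it.
print(split_on_first_dash("sub directory/sub directory/"))
# None
```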
On a side note, spaces aren't allowed in URLs, but the browser and server probably do some work behind the scenes, allowing your PDFs to be picked up correctly.

mod_rewrite regex (too many redirects)

I am using mod_rewrite to convert subdomains into directory URLs (solution from here). When I explicitly write a rule for one subdomain, it works perfectly:
RewriteCond %{HTTP_HOST} ^[www\.]*sub-domain-name.domain-name.com [NC]
RewriteCond %{REQUEST_URI} !^/sub-domain-directory/.*
RewriteRule ^(.*) /sub-domain-directory/$1 [L]
However, if I try to match all subdomains, it results in 500 internal error (log says too many redirects). The code is:
RewriteCond %{HTTP_HOST} ^[www\.]*([a-z0-9-]+).domain-name.com [NC]
RewriteCond %{REQUEST_URI} !^/%1/.*
RewriteRule ^(.*) /%1/$1 [L]
Can anyone suggest what went wrong and how to fix it?
Your second RewriteCond will never return false, because you can't use backreferences within your test clauses (they're compiled during parsing, making this impossible since no variable expansion will take place). You're actually testing for paths beginning with the literal text /%1/, which isn't what you wanted. Given that you're operating in a per-directory context, the rule set will end up being applied again, resulting in a transformation like the following:
path -> sub/path
sub/path -> sub/sub/path
sub/sub/path -> sub/sub/sub/path
...
This goes on for about ten iterations before the server gets upset and throws a 500 error. There are a few different ways to fix this, but I'm going to choose one that most closely resembles the approach you were trying to take. I'd also modify that first RewriteCond, since the regular expression is a bit flawed:
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com$ [NC]
RewriteCond %1 !=www
RewriteCond %1#%{REQUEST_URI} !^([^#]+)#/\1/
RewriteRule .* /%1/$0 [L]
First, it checks the HTTP_HOST value and captures the subdomain, whatever it might be. Then, assuming you don't want this transformation to take place in the case of www, it makes sure that the capture does not match that. After that, it uses the regular expression's own internal backreferences to see if the REQUEST_URI begins with the subdomain value. If it doesn't, it prepends the subdomain as a directory, like you have now.
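The third condition is the least obvious one: it glues the captured subdomain and the request URI together with a # and lets the regex compare the two halves through its own \1 backreference. A minimal sketch of that check in Python, assuming Python's re module behaves like mod_rewrite's regex engine for this pattern (the function name needs_prefix is hypothetical):

```python
import re

# The CondPattern from the third RewriteCond; the leading "!" negation is
# handled in needs_prefix below rather than in the pattern itself.
ALREADY_PREFIXED = re.compile(r"^([^#]+)#/\1/")


def needs_prefix(subdomain, request_uri):
    """Return True if the URI does not yet start with /<subdomain>/."""
    test_string = f"{subdomain}#{request_uri}"  # mirrors %1#%{REQUEST_URI}
    return ALREADY_PREFIXED.search(test_string) is None


print(needs_prefix("sub", "/path"))         # True: rewrite to /sub/path
print(needs_prefix("sub", "/sub/path"))     # False: already prefixed, stop here
print(needs_prefix("sub", "/subway/path"))  # True: "/subway/" is not "/sub/"
```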
The potential problem with this approach is that it won't work correctly if you access a path beginning with the same name as the subdomain the request is sent to, like sub.example.com/sub/. An alternative is to check the REDIRECT_STATUS environment variable to see if an internal redirect has already been performed (that is, this prepending step has already occurred):
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com$ [NC]
RewriteCond %1 !=www
RewriteCond %{ENV:REDIRECT_STATUS} =""
RewriteRule .* /%1/$0 [L]
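The answer calls the first RewriteCond from the question "a bit flawed" without spelling out why. One concrete issue is that [www\.] is a character class (any run of "w" and "." characters), not an optional "www." prefix. The sketch below illustrates this with Python's re module standing in for mod_rewrite's regex engine (an assumption); the host name web.domain-name.com is made up, and the second pattern is the answer's pattern adapted to the question's domain.

```python
import re

# First RewriteCond pattern from the question; re.IGNORECASE plays the role of [NC].
ORIGINAL = re.compile(r"^[www\.]*([a-z0-9-]+).domain-name.com", re.IGNORECASE)
# The answer's pattern, adapted to the question's domain name.
REVISED = re.compile(r"^([^.]+)\.domain-name\.com$", re.IGNORECASE)


def captured_subdomain(pattern, host):
    """Return what the first capture group (%1) would contain, or None."""
    match = pattern.search(host)
    return match.group(1) if match else None


# The character class eats the leading "w" of "web", so %1 becomes "eb".
print(captured_subdomain(ORIGINAL, "web.domain-name.com"))  # eb
print(captured_subdomain(REVISED, "web.domain-name.com"))   # web
```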
