I am currently working on an AngularJS project, and for SEO I decided to use an automatic crawler.
The only problem is that the service asks me to add the following lines to my .htaccess, and they cause a 500 Internal Server Error on my server:
<ifmodule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} .*googlebot.* [OR][NC]
RewriteCond %{HTTP_USER_AGENT} .*bingbot.* [NC]
RewriteRule .* http://crawlr.wombit.se/Crawler/htmlsnapshot?url=$1 [P]
</ifmodule>
I tested those lines against my website and the answer is:
This variable is not supported: %{HTTP_USER_AGENT}
I already checked a bunch of topics to see if I could find a solution, but I didn't find anything working for my case...
PS: I also tried removing all the other rules, and I am sure that those two RewriteCond lines are throwing the error.
Update - server configuration
Apache version 2.2.26
PHP version 5.4.26
MySQL version 5.1.73-cll
I'm not sure why you're getting that error; %{HTTP_USER_AGENT} is a valid Apache 2.* mod_rewrite variable. The problem I see is that your flags are malformed.
[OR][NC]
needs to be
[OR,NC]
Also, you're back-referencing $1, but your pattern contains no capture group, so $1 will simply be empty. You probably want to either replace $1 with %{REQUEST_URI} or create a group by changing the pattern to (.*).
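Putting both fixes together, the block might look like the sketch below. It keeps the snapshot URL from the question unchanged and assumes the service expects the requested path in the url parameter. Note that the [P] flag also requires mod_proxy (and mod_proxy_http) to be loaded on the server.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
# Flags for one condition go in a single bracket, separated by commas
RewriteCond %{HTTP_USER_AGENT} .*googlebot.* [OR,NC]
RewriteCond %{HTTP_USER_AGENT} .*bingbot.* [NC]
# (.*) creates the capture group that $1 refers to
RewriteRule (.*) http://crawlr.wombit.se/Crawler/htmlsnapshot?url=$1 [P]
</IfModule>
```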
I'm having trouble writing a rule for .htaccess which will redirect HTML to PHP. I'm using 302 until I get it to work right - then I'll change to a 301. I found several postings that describe this, but I am having problems - possibly because I'm running a hosting package, and each client is in a virtual/subfolder (sorry for poor description of hosting environment).
The rule I am using is
...
RewriteEngine on
RewriteRule ^(.*).html$ $1.php [R=302]
...
When I try to go to https://dmcelebratealife.com/index.html I get a 404 message saying:
https://www.dmcelebratealife.com/var/www/vhosts/dmcelebratealife.com/public_html/index.php
I added the following and it seemed to work for me.
RewriteEngine on
RewriteRule ^(.*)\.html$ /$1.php [R=302]
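The leading slash matters because, in a .htaccess file, a relative substitution is resolved against the directory's filesystem path, which is how /var/www/vhosts/... ended up in the redirect. An alternative sketch, assuming the .htaccess file sits in the document root, is to keep the substitution relative and set RewriteBase instead:

```apache
RewriteEngine on
# Resolve relative substitutions against the URL path "/" rather than
# the directory's filesystem path (assumes .htaccess is in the document root)
RewriteBase /
RewriteRule ^(.*)\.html$ $1.php [R=302,L]
```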
This works for me and is domain-independent.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(?!.+\.\w{2,4})(.+)$ $1.php [L]
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(?!.+\.\w{2,4})(.+)$ $1.html [L]
The first two lines rewrite to the PHP file if it exists. If it doesn't, the second two try for an HTML file.
Note that it doesn't use a 302, as you don't want to tell the user what you are doing. This is an internal rewrite, not an external redirect.
This means that if the user types in
https://www.example.com/test
it will remain in the URL box, but the following file will be executed
https://www.example.com/test.php
This is EXACTLY the same case as: (htaccess) How to prevent a file from DIRECT URL ACCESS?
But none of the code provided in the answers works for me. I tried the snippets one by one, then tried combining them, and it still doesn't work. Here is my code:
# prevent direct image url access
# ----------
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)://(www\.)?example\.com [NC]
RewriteCond %{HTTP_REFERER} !^http(s)://(www\.)?example\.com.*$ [NC]
# this not works
RewriteRule \.(png|gif|jpe?g)$ - [F]
# and this
RewriteRule \.(png|gif|jpe?g)$ - [F,NC]
# and this
RewriteRule \.(png|gif|jpe?g)$ https://example.com/wp-login.php [NC,R,L]
# even by combining them
# ----------
# /prevent direct image url access
The case simulation:
index.php has <img src="test.png" alt=""> and should be normally accessible. The requirement is: http://example.com/test.png shouldn't be accessible.
I use WordPress on WP Engine, and I don't think WordPress's default rewrite rules cause the problem, since the code from the answers is placed above the WordPress rewrite block.
UPDATE
I use PHP Version 5.5.9-1ubuntu4.14 on Apache 2 on WP Engine
Your rules basically work for me, except for one thing:
The (s) is not doing what you think it does.
RewriteCond %{HTTP_REFERER} !^http(s)://(www\.)?example\.com [NC]
Parentheses define a capture group, which serves no purpose here: (s) still requires a literal s, so the pattern only matches https referers. If you remove the (s), it works for http.
If you want to use https too you have to write it like this:
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]
The ? will make the preceding character (or group, if in parentheses) optional.
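For reference, the question's rule set with that fix applied could be reduced to the sketch below, using example.com as in the question. One caveat: because of the first condition, requests that send no Referer header at all (for example, a URL typed directly into the address bar) are still served. Only requests referred from other sites are refused.

```apache
RewriteEngine On
# Let requests without a Referer header through
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by the site itself through (http or https, with or without www)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]
# Everything else asking for an image gets 403 Forbidden
RewriteRule \.(png|gif|jpe?g)$ - [F,NC]
```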
I have the following snippet:
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://%{SERVER_NAME}/
RewriteRule \.(js|css|png|jpg) - [R=404,L]
Simple, and it should work, right? It seems to 404 the listed file types if I have referrers enabled in the browser; with referrers disabled, the files are served. I have checked the value of %{SERVER_NAME} and it is www.mydomain.com. I've tested this in multiple browsers and under HTTP and HTTPS, all with the same result. I used the rewrite below to check the value of %{SERVER_NAME}:
RewriteRule servername value_is_%{SERVER_NAME} [R=301,L]
The URL I get redirected to is then https://www.mydomain.com/value_is_www.mydomain.com
That being said the snippet should allow a referrer with that value or an empty one. But why is it being triggered? It's been driving me nuts for the past 2 hours, but it's 5am so I could be just crazy =o\ Thank you in advance, and I'm off to bed!
Problem is, you cannot use variables in conditional patterns (well, at least not until Apache 2.4) as the patterns are being precompiled during server startup.
For your particular problem, though, there's a simple workaround that you may use to mimic the condition:
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{SERVER_NAME}%{HTTP_REFERER} !^(.*)https?://\1/
RewriteRule \.(js|css|png|jpg) - [R=404,L]
Yep, that's all. You cannot use variables but sure can use back-references.
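To see why this works, it may help to walk through one request. The values in the comments are hypothetical, taken from the host name mentioned in the question:

```apache
# Assumed example values:
#   %{SERVER_NAME}  = www.mydomain.com
#   %{HTTP_REFERER} = https://www.mydomain.com/page.html
# TestString      = www.mydomain.comhttps://www.mydomain.com/page.html
#
# ^(.*) captures "www.mydomain.com", and https?://\1/ then requires the
# same text again right after the scheme. The pattern matches, the leading
# "!" negates it, the condition fails, and the file is served normally.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{SERVER_NAME}%{HTTP_REFERER} !^(.*)https?://\1/
RewriteRule \.(js|css|png|jpg) - [R=404,L]
```

With a referrer from another host, the text after the scheme differs from the captured server name, the pattern does not match, the negated condition is true, and the rule returns the 404.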
Oh ... and btw. Apache 2.4 does ship with expressions that may be used instead of the conditional patterns:
RewriteCond expr "! %{HTTP_REFERER} -strmatch '*://%{HTTP_HOST}/*'"
Hi people@stackoverflow,
Maybe I have a fundamental misconception about the working of RewriteRule. Or maybe not. Nevertheless, I'm trying to figure this out now for two days, without any progress.
This is the current situation:
I have a Joomla website with SEF and mod_rewrite turned on.
This results in the URL:
mysite.com/index.php?option=com_remository&Itemid=7
being rewritten to:
mysite.com/sub-directory/sub-directory/0000-Business-files/
These are the lines that are currently used in my .htaccess (all standard Joomla)
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^([^\-]*)\-(.*)$ $1 $2 [N]
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule ^(.*)$ index.php [F,L]
# RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$ [NC]
RewriteRule (.*) index.php
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
This is what I want to achieve:
When a visitor uses this URL
mysite.com/sub directory/sub directory/0000 Business files/
it should lead him to the right page.
Although I know it's not the best idea to use spaces in a URL, I'm confronted with the fact that these 'spacious' URLs are used in a PDF that has already been issued.
I thought I could use mod_rewrite to rewrite these URLs, but all I get is 'page not found'.
I've added this rule on top of the .htaccess file:
RewriteRule ^([^\-]*)\-(.*)$ $1 $2 [N]
But this is not working. What am I doing wrong? Or, also possible, am I missing the point on when and how to use mod_rewrite?
rgds, Eric
First off, the default behavior of apache is usually to allow direct URLs that map to the underlying file system (relative to the document root), and you should use RewriteRule when you want to work around that. Looking at your question, it seems like you want to browse the filesystem and so you should not use a RewriteRule.
If mysite.com/sub+directory/sub+directory/0000+Business+files/ doesn't work (without your rule), I'm wondering: do you have that directory structure on your server? I.e. does it look like this?
[document root]/index.php
[document root]/sub directory/sub directory/0000 Business files/
If not, I'm not sure I understand what you're trying to achieve, or what you mean by the visitor being "led to the right page". Could you provide an example URL that the user enters and the corresponding URL (or file system path) that you want served?
Regarding your rewrite rule, I'm not even sure that it is allowed, and I'm surprised you don't get a 500 Internal Server Error. RewriteRule takes two arguments (matching pattern and substitution) and optionally some flags, but because of the space between $1 and $2 you're supplying three arguments (+ flags).
EDIT: I got the pattern wrong, but it still doesn't make much sense. It matches against any URL that has at least one dash in it, and then picks out the parts before and after the first dash. So, for a URL like "this-is-a-url-path/to-a-file/on-the-server", $1 would be "this" and $2 would be "is-a-url-path/to-a-file/on-the-server". Again, if I had some example URLs and their corresponding rewrites, I could help you find the right pattern.
On a side note, spaces aren't allowed in URLs, but the browser and server probably do some work behind the scenes (encoding them as %20), allowing your PDF links to be picked up correctly.
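If the goal is actually the reverse of the rule in the question, i.e. turning the spaces in the requested path into hyphens so that the existing SEF URLs match, something along these lines might be a starting point. This is only a sketch based on that assumption about the intent, and it has not been tested against the site in question. It would go above the standard Joomla rules:

```apache
RewriteEngine On
# Replace the first run of whitespace in the path with a hyphen, then
# restart rule processing ([N]) until no whitespace is left.
# The pattern is matched against the decoded path, so %20 appears as a space.
RewriteRule "^(\S*)\s+(.*)$" "$1-$2" [N]
```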
I've tried every single example I could find, they all produce an internal server error. I have these rules set up (this works, no error):
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}/index.php !-f
RewriteRule ^((/?[^/]+)+)/?$ ?q=$1 [L]
So if it's not an existing file or an existing directory with an index.php we redirect. For instance, http://domain.com/foo/bar becomes http://domain.com/?q=foo/bar
Thing is, I want the trailing slash stripped. So take off the /? at the end of the rule. How do I make it so that http://domain.com/foo/bar/ becomes http://domain.com/foo/bar with a visible redirect first (fixing the client's URL), and only then the real, silent redirection to ?q=?
Everywhere I look I see this:
RewriteRule (.*)/$ $1 [R,L]
But it gives me a 500 error if I insert it before my rule.
If foo/bar exists as a real directory, then the server will be redirecting the client to foo/bar/ (with the trailing slash). It has to do that in order for relative URLs to work correctly on the client. If you put in a rule to rewrite that back to foo/bar with a redirect then there will be a loop. An easy way to test if that's happening is to specify a path that doesn't exist at all (I assume from your index.php detection that the directory tree actually exists). The nonexistent path won't trigger the built-in redirect.
If I set up a similar set of rules to yours (plus the suggested slash-removal rule), I can see the difference between a directory that exists and one that doesn't. The ones that don't exist work as expected; the ones that do exist cause Firefox to say "This page isn't redirecting properly". IE8 says something similar. Perhaps the Apache setup you're using detects the loop and turns it into the 500 error?
It looks like the simpler rewrite rule you mention at the end of your question should work. The problem is that the 500 error isn't really helpful in figuring out why it's not working. One approach I've found useful for debugging mod_rewrite errors is to enable its logging. Add the following to your httpd.conf:
RewriteLog "/usr/local/var/apache/logs/rewrite.log"
RewriteLogLevel 3
Then try again and look in the log to see what's going on. Once you're done, you can disable the log by setting RewriteLogLevel to 0. See the mod_rewrite docs for details.
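Note that RewriteLog and RewriteLogLevel exist only up to Apache 2.2. On Apache 2.4, if that is what you are running, the rough equivalent is the per-module LogLevel setting, and the trace output goes to the regular error log rather than a separate file:

```apache
# Apache 2.4 only: mod_rewrite trace messages are written to the ErrorLog
LogLevel alert rewrite:trace3
```

As with RewriteLog, it is worth lowering the level again once you are done, since the trace output is verbose.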
Try this rule in front of your current rule:
RewriteRule (.*)/$ /$1 [R,L]
Try these rules:
#prevent mod_dir from adding slash
DirectorySlash Off
#redirect /folder/ to /folder
RewriteCond %{THE_REQUEST} ^GET\s\S+/(\?\S+)?\s [NC]
RewriteRule ^(.*)/$ /$1 [R=301,L,QSA]
#internal redirect for directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ /$1/ [L]