htaccess: one of two similar rules not redirecting - .htaccess

I have two rewrite rules:
RewriteRule ^library/(.*)$ market-intelligence/resources/$1 [L,R=301]
RewriteRule ^library/.*\.pdf$ email/$1 [L,R=301]
As one can see, they are for the same directory, but the second deals with all pdf files. However, any pdfs in the directory still lead to the first rule's destination.
Am I doing something wrong?

Am I doing something wrong?
Yes. Rules are tested sequentially, and with the [L] flag the first matching rule fires and ends that pass. So the trick is to order your rules from the specific to the general. In this case, swap them: the PDFs will then be redirected to the email folder and everything else to market-intelligence/resources.
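To make the ordering point concrete, here is a minimal Python sketch of first-match-wins rule selection. It is only a model, not mod_rewrite itself: Python's re module stands in for Apache's regex engine, first_match is a hypothetical helper, and $1 is written \1. The PDF rule is given a capture group here (an assumption about the intent), because as written in the question it has none, so $1 has nothing to refer to.

```python
import re

def first_match(rules, path):
    """Return the target of the first rule whose pattern matches, as [L] would."""
    for pattern, target in rules:
        m = re.search(pattern, path)
        if m:
            return m.expand(target)  # substitute \1 with the captured text
    return None  # no rule matched

# Hypothetical Python versions of the two rules from the question.
general = (r"^library/(.*)$", r"market-intelligence/resources/\1")
pdf = (r"^library/(.*\.pdf)$", r"email/\1")

original_order = first_match([general, pdf], "library/report.pdf")
swapped_order = first_match([pdf, general], "library/report.pdf")
non_pdf = first_match([pdf, general], "library/guide.html")

print(original_order)  # market-intelligence/resources/report.pdf
print(swapped_order)   # email/report.pdf
print(non_pdf)         # market-intelligence/resources/guide.html
```

With the general rule first, the PDF never reaches its own rule; with the specific rule first, PDFs go to email/ and everything else still falls through to the general rule.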

Related

Clarification of using [L] in htaccess rewrites?

I wonder if you could just clarify a simple point for me, please. Sorry if it's a bit basic or has been asked before. Say I have something like the following rewrite rules:
RewriteRule ^directory/file http://www.mysite.com/newdirectory/file [R=301,NC,L]
RewriteRule ^directory/file2 http://www.mysite.com/newdirectory/file2 [R=301,NC,L]
RewriteRule ^directory/file3 http://www.mysite.com/newdirectory/file3 [R=301,NC,L]
Since these changes are permanent, I'm using the R=301 flag. I am assuming I am correct in adding an [L] flag to each of these, since they are independent rewrites that don't relate to each other? If I have a section underneath that deals with hotlinking, or some other rewrites, I'm assuming they will still work? I guess what I am asking is: using [L] doesn't stop all processing of the other rewrite rules, does it?
Again sorry if this is a stupid question, I just need someone to clarify it for me as I've read some articles about this, but in my mind they haven't made it 100% clear.
Thanks.
Definitely not a silly question; many web developers misunderstand the L flag.
This flag only ends the current pass through the rules (it works like continue in a for loop); in .htaccess context, if the URL was rewritten, the rewrite engine then processes the rules again. It does not and cannot stop the rules above or below from being processed on that next pass.
Yes and no. It is meant to stop processing the rules, but it doesn't stop the other rules from running later.
The Apache docs say:
The [L] flag causes mod_rewrite to stop processing the rule set. In most contexts, this means that if the rule matches, no further rules will be processed. This corresponds to the last command in Perl, or the break command in C. Use this flag to indicate that the current rule should be applied immediately without considering further rules.
But, what it doesn't say is that:
(from mediacollege)
The L flag will tell Apache to stop processing the rewrite rules for
that request. Now what is often unrealised is that it now makes a new
request for the new, rewritten filename and begin processing the
rewrite rules again.
Therefore, if you were to do a rewrite where the destination is still
a match to the pattern, it will not behave as desired. In these cases,
you should use a RewriteCond to exclude a certain file from the rule.
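The re-processing described above can be modelled with a short Python sketch. Everything here is an assumption made for illustration: run_rules is a hypothetical helper, the rule that prefixes app/ is invented, the "unless" pattern plays the role of a negated RewriteCond, and the cap of 10 passes stands in for Apache's internal recursion limit.

```python
import re

MAX_PASSES = 10  # assumed cap, standing in for Apache's internal recursion limit

def run_rules(rules, path):
    """Apply the first matching rule, then start over, until nothing matches."""
    for passes in range(1, MAX_PASSES + 1):
        for pattern, target, unless in rules:
            if unless and re.search(unless, path):
                continue                 # the condition excludes this path
            m = re.search(pattern, path)
            if m:
                path = m.expand(target)  # rewrite; [L] ends this pass
                break
        else:
            return path, passes          # no rule matched: the URL has settled
    return None, MAX_PASSES              # the rewrite never settled

# The destination still matches the pattern, so the rule keeps firing.
looping = [(r"^(.*)$", r"app/\1", None)]
# Same rule, but skipped once the path already starts with app/.
guarded = [(r"^(.*)$", r"app/\1", r"^app/")]

loop_result = run_rules(looping, "about")
guard_result = run_rules(guarded, "about")
print(loop_result)   # (None, 10)
print(guard_result)  # ('app/about', 2)
```

The guarded version settles on the second pass, because the exclusion stops the rule from matching its own output.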

.htaccess and dynamically generated SEO friendly URLs

I'm trying to build a website that may be called from the URL bar with any one of the following examples:
domainname.com/en
domainname.com/zh-cn/
domainname.com/fr/page1
domainname.com/ru/dir1/page2
domainname.com/jp/dir1/page2/
domainname.com/es-mx/dir1/dir2/page3.html
These page requests need to hit my .htaccess template and ultimately be converted into this php call:
/index.php?lng=???&tpl=???
I've been trying to write RewriteCond and RewriteRule code that will safely deal with the dynamic nature of the URLs I'm trying to take in, but I'm totally defeated. I've read close to 50 different websites and have been working on this for almost a week now, but I have no idea what I'm doing. I don't even know if I should be using a RewriteCond. Here is my last attempt at writing a RewriteRule myself:
RewriteRule ^(([a-z]{2})(-[a-z]{2})?)([a-z0-9-\./]*) /index.php?lng=$1&tpl=$4 [QSA,L,NC]
Thanks for any help,
Vince
What's causing your loop is that your regex pattern also matches /index.php. Why? Let's take a look:
First, the directory prefix is stripped because these are rules in an htaccess file, so the URI after the first rewrite is index.php (the query string is kept separately).
The beginning of your regex, ^(([a-z]{2})(-[a-z]{2})?), matches the "in" at the start of that URI.
The next bit of your regex, ([a-z0-9-\./]*), matches dex.php. Thus the rule matches and gets applied again, and will continue to be applied until you reach the internal recursion limit.
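The match described above can be checked with a small Python sketch. Python's re module is used as a stand-in for mod_rewrite's regex engine (an assumption; the two agree on a pattern this simple), and the NC flag is modelled with re.IGNORECASE.

```python
import re

# The pattern from the question, verbatim; NC is modelled with re.IGNORECASE.
original = re.compile(r"^(([a-z]{2})(-[a-z]{2})?)([a-z0-9-\./]*)", re.IGNORECASE)

# After the first rewrite, the rules see "index.php" (prefix stripped).
m = original.search("index.php")
lng, tpl = m.group(1), m.group(4)
print(lng, tpl)  # in dex.php
```

So the rewritten URI is itself rewritten to /index.php?lng=in&tpl=dex.php, which matches again, and so on.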
Your URL structure:
domainname.com/en
domainname.com/zh-cn/
domainname.com/fr/page1
domainname.com/ru/dir1/page2
domainname.com/jp/dir1/page2/
domainname.com/es-mx/dir1/dir2/page3.html
Each URL has either a / after the language code or nothing at all, so you need to account for that:
# here -------------------v
^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$
# and an ending match here ------------^
You shouldn't need to change anything else:
RewriteRule ^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$ /index.php?lng=$1&tpl=$4 [QSA,L,NC]
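As a quick sanity check of the corrected pattern, here is a Python sketch that runs it against the example URLs and against index.php. Again, Python's re is only a stand-in for Apache's regex engine, and the variable names are made up for illustration.

```python
import re

# The corrected pattern, verbatim; NC is modelled with re.IGNORECASE.
fixed = re.compile(r"^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$", re.IGNORECASE)

paths = [
    "en",
    "zh-cn/",
    "fr/page1",
    "ru/dir1/page2",
    "jp/dir1/page2/",
    "es-mx/dir1/dir2/page3.html",
]

# Map each path to (group 1, group 4), i.e. what $1 and $4 would carry.
results = {}
for p in paths:
    m = fixed.search(p)
    results[p] = (m.group(1), m.group(4))

# The rewritten URI no longer matches, so the rule is not applied a second time.
loops = fixed.search("index.php") is not None

for p, groups in results.items():
    print(p, groups)
print(loops)  # False
```

Note that group 4 keeps the leading slash (for fr/page1 it is /page1), exactly as $4 did in the original pattern; the inner group 5 holds the same text without the slash. For a bare en, group 4 does not participate in the match, which Python reports as None.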

htaccess mod rewrite

I want to rewrite the newsletter page of my site to url.com/newsletter/. The problem is that another rule overlaps this one. The rules look like this:
# Primary rule, which overlaps the secondary rule
RewriteRule ^([^/]*)/$ /index.php?categories=$1 [L,QSA]
RewriteRule ^newsletter/$ /?newsletter=$1 [L]
Is there any way to apply special-case rules or something like that? (I don't want to use a workaround such as an .html or .php extension, just the URL as above.)
Apache reads these rules from the top down. So put your new rule first and then the existing rule and give it a try.
Just found the solution: I need to keep the newsletter rule above the other rule so that it is applied first.

Why does this cause an infinite request loop?

Earlier today, I was helping someone with an .htaccess use case, and came up with a solution that works but can't quite figure it out myself!
He wanted to be able to:
Browse to index.php?id=3&cat=5
See the location bar read index/3/5/
Have the content served from index.php?id=3&cat=5
The last two steps are fairly typical (usually from the user entering index/3/5 in the first place), but the first step was required because he still had some old-format links in his site and, for whatever reason, couldn't change them. So he needed to support both URL formats, and have the user always end up seeing the prettified one.
After much to-ing and fro-ing, we came up with the following .htaccess file:
RewriteEngine on
# Prevents browser looping, which does seem
# to occur in some specific scenarios. Can't
# explain the mechanics of this problem in
# detail, but there we go.
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [L]
# Hard-rewrite ("[R]") to "friendly" URL.
# Needs RewriteCond to match original querystring.
# Uses "?" in target to remove original querystring,
# and "%n" backrefs to move its components.
# Target must be a full path as it's a hard-rewrite.
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$
RewriteRule ^index\.php$ http://example.com/index/%1/%2/? [L,R]
# Soft-rewrite from "friendly" URL to "real" URL.
# Transparent to browser.
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2
Whilst it might seem to be a somewhat strange use case ("why not just use the proper links in the first place?", you might ask), just go with it. Regardless of the original requirement, this is the scenario and it's driving me mad.
Without the first rule, the client enters into a request loop, trying to GET /index/X/Y/ repeatedly and getting 302 each time. The check on REDIRECT_STATUS makes everything run smoothly. But I would have thought that after the final rule, no more rules would be served, the client wouldn't make any more requests (note, no [R]), and everything would be gravy.
So... why would this result in a request loop when I take out the first rule?
Without being able to tinker with your setup, I can't say for sure, but I believe this problem is due to the following relatively arcane feature of mod_rewrite:
When you manipulate a URL/filename in per-directory context mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases.
(source: mod_rewrite technical documentation, I highly recommend reading this)
In other words, when you use a RewriteRule in an .htaccess file, it's possible that the new, rewritten URL maps to an entirely different directory on the filesystem, in which case the .htaccess file in the original directory wouldn't apply anymore. So whenever a RewriteRule in an .htaccess file matches the request, Apache has to restart processing from scratch with the modified URL. This means, among other things, that every RewriteRule gets checked again.
In your case, what happens is that you access /index/X/Y/ from the browser. The last rule in your .htaccess file triggers, rewriting that to /index.php?id=X&cat=Y, so Apache has to create a new internal subrequest with the URL /index.php?id=X&cat=Y. That matches your earlier external redirect rule, so Apache sends a 302 response back to the browser to redirect it to /index/X/Y/. But remember, the browser never saw that internal subrequest; as far as it knows, it was already on /index/X/Y/. So it looks to you as though you're being redirected from /index/X/Y/ to that same URL, triggering an infinite loop.
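The sequence described above can be traced with a small Python model of the three rules. It is a sketch under stated assumptions, not how Apache is implemented: handle is a hypothetical function, the internal flag stands in for REDIRECT_STATUS being 200 after an internal rewrite, and paths are shown with the directory prefix already stripped.

```python
import re

def handle(path, query, guard=False, internal=False):
    """Run the rules once; return ("redirect", url) or ("serve", path, query)."""
    # Guard rule: stop if this pass follows an internal rewrite.
    if guard and internal:
        return ("serve", path, query)
    # External redirect: index.php?id=X&cat=Y -> /index/X/Y/
    q = re.search(r"^id=(\d+)&cat=(\d+)$", query)
    if re.search(r"^index\.php$", path) and q:
        return ("redirect", f"/index/{q.group(1)}/{q.group(2)}/")
    # Internal rewrite: index/X/Y/ -> index.php?id=X&cat=Y, then rules run again.
    m = re.search(r"^index/(\d+)/(\d+)/$", path)
    if m:
        new_query = f"id={m.group(1)}&cat={m.group(2)}"
        return handle("index.php", new_query, guard, internal=True)
    return ("serve", path, query)

without_guard = handle("index/3/5/", "")
with_guard = handle("index/3/5/", "", guard=True)
print(without_guard)  # ('redirect', '/index/3/5/')
print(with_guard)     # ('serve', 'index.php', 'id=3&cat=5')
```

Without the guard, a request for index/3/5/ is answered with a redirect to /index/3/5/, which the browser follows, and the cycle repeats. With the guard, the second pass stops early and index.php is served.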
Besides the performance hit, this is probably one of the better reasons that you should avoid putting rewrite rules in .htaccess files when possible. If you move these rules to the main server configuration, you won't have this problem because matches on the rules won't trigger internal subrequests. If you don't have access to the main server configuration files, one way you can get around it (EDIT: or so I thought, although it doesn't seem to work - see comments) is by adding the [NS] (no subrequest) flag to your external redirect rule,
RewriteRule ^index\.php$ http://example.com/index/%1/%2/? [L,R,NS]
Once you do that, you should no longer need the first rule that checks the REDIRECT_STATUS.
The solution below worked for me.
RewriteEngine on
RewriteBase /
#rule1
#Guard condition: only if the original client request was for index.php
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php [NC]
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$ [NC]
RewriteRule . /index/%1/%2/? [L,R]
#rule 2
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2 [L,NC]
Here is what I think is happening
From the steps you quoted above
Browse to index.php?id=3&cat=5
See the location bar read index/3/5/
Have the content served from index.php?id=3&cat=5
At Step 1, Rule 1 matches and redirects the browser to the friendly URL, which fulfills Step 2.
At Step 3, Rule 2 now matches and rewrites to index.php.
The rules are rerun, for the reasons David stated, but since THE_REQUEST is immutable once set to the original request, it still contains /index/3/5 so Rule 1 does not match.
Rule 2 does not match either and the result of index.php is served.
Most other variables are mutable e.g. REQUEST_URI. Their modification during rule processing, and the incorrect expectation that the pattern matches are against the original request is a common reason for infinite loops.
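To see why the THE_REQUEST condition separates the two cases, here is a Python sketch that applies the same pattern to two request lines. The request lines are assumed examples of what a browser would send, and Python's re stands in for Apache's regex engine; [NC] is modelled with re.IGNORECASE.

```python
import re

# The guard pattern from Rule 1, verbatim; [NC] is modelled with re.IGNORECASE.
guard = re.compile(r"^[A-Z]{3,9}\ /index\.php", re.IGNORECASE)

# Assumed request lines: THE_REQUEST holds the line the client originally sent.
old_link = "GET /index.php?id=3&cat=5 HTTP/1.1"
friendly = "GET /index/3/5/ HTTP/1.1"

old_matches = guard.search(old_link) is not None
friendly_matches = guard.search(friendly) is not None
print(old_matches)       # True
print(friendly_matches)  # False
```

Because THE_REQUEST still holds the friendly request line after Rule 2 has rewritten the URL internally, the condition fails on the second pass and Rule 1 does not redirect again.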
It feels quite esoteric sometimes, but I am sure there is a logical reason for its complexity :-)
EDIT
Surely there are two distinct requests
There are 2 client requests, the original one from Step1 and the one from the external redirect in step 2.
What I glossed over above is that when Rule 2 matches on the second request, the URL is rewritten to /index.php, which causes an internal redirect. This forces the .htaccess file for the / directory to be loaded again (it could easily have been another directory with different .htaccess rules) and all the rules to be re-run.
So... why would this result in a request loop when I take out the first rule?
When the rules are re-run, the first rule now unexpectedly matches, as a result of Rule2's rewrite, and does a redirect, causing an infinite loop.
David's answer does contain most of this information, and it is what I meant by "for the reasons David stated".
However, the main point here is that you do need an extra condition. Either yours, which stops further rule processing on internal redirects, or mine, which prevents Rule 1 from matching, is necessary to prevent the infinite loop.

mod_rewrite Redirect Rule Variables question

I'm a bit of an .htaccess n00b, and can't for the life of me get a handle on regular expressions.
I have the following piece of RewriteRule code that works just fine:
RewriteRule ^logo/?$ /pages/logo.html
Basically, it takes /pages/logo.html and makes it /logo.
Is there a way for me to generalize that code with variables, so that it works automatically without having to have an independent line for each page?
I know $1 can work as a variable, but that's usually for queries, and I can't get it to work in this instance.
First you need to know that mod_rewrite can only handle requests to the server. So you would need to request /logo to have it rewritten to /pages/logo.html. And that’s what the rule does, it rewrites requests with the URL path /logo internally to /pages/logo.html and not vice versa.
If you now want to use portions of the matched string, you need to wrap them in groups, written (expr), which you can then reference with $n. In your case the character class [^/], which matches any character other than the slash /, is suitable:
RewriteRule ^([^/]+)$ /pages/$1.html
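Here is a small Python sketch of what that generalized rule does to a few request paths. The rewrite function is hypothetical and Python's re is only a stand-in for mod_rewrite's matching; paths are shown without the leading slash, as they appear to rules in an .htaccess file.

```python
import re

def rewrite(path):
    """Model of: RewriteRule ^([^/]+)$ /pages/$1.html"""
    m = re.search(r"^([^/]+)$", path)
    return f"/pages/{m.group(1)}.html" if m else None

logo = rewrite("logo")
about = rewrite("about")
nested = rewrite("pages/logo.html")  # contains a slash, so it is left alone
trailing = rewrite("logo/")          # unlike ^logo/?$, no trailing slash allowed

print(logo)      # /pages/logo.html
print(about)     # /pages/about.html
print(nested)    # None
print(trailing)  # None
```

Because the rewritten path contains a slash, the rule does not match its own output. If the optional trailing slash from the original rule matters, a pattern such as ^([^/]+)/?$ would be one way to keep it (an untested suggestion).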
Try this:
RewriteRule ^/pages/(.*)\.html$ /$1
The (.*) matches anything between pages/ and .html. Whatever it matches is used in $1. So, /pages/logo.html becomes /logo, and /pages/subdir/other_page.html would become /subdir/other_page
