htaccess/mod_rewrite: Help to understand rewrite rules and rewrite conditions - .htaccess

I have the following rewrite rules, which I honestly don't quite understand. This great community helped me to write them a long time ago:
RewriteEngine On
#1
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#FB([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/facebook/%1/%2? [L,NE,R=302]
#2
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#LI([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/linkedin/%1/%2? [L,NE,R=302]
#3
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#PT([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/pinterest/%1/%2? [L,NE,R=302]
#4
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} \.([^.]+)\.[^.#]+#XN([^&]*) [NC]
RewriteRule ^ https://www.mysite.de/link/xing/%1/%2? [L,NE,R=302]
#5
RewriteCond %{QUERY_STRING}#%{HTTP_HOST} ^IG([^&#]+)#(?:.+\.)?([^.]+)\. [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%2/%1 [L,NE,R=302]
#6
RewriteCond %{HTTP_HOST} \.([^.]+)\.[^.]+$
RewriteRule ^IG([^/]+)/?$ https://www.mysite.de/link/instagram/%1/$1 [L,NC,NE,R=302]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ / [L,QSA,R=302]
Besides lots of other things, I don't get why in rule #5 %{QUERY_STRING} and %{HTTP_HOST} are swapped compared to the other rules. What does the # do within the rewrite condition? Does it function as a separator?
What does rule #6 do in addition to rule #5?
The main problem for now: The following link:
https://test.info/IGZSGUFSPSWC?fbclid=PAAaY2XV-SntbX2OtypPe92gVjSldUctufiEup5zBCOCC7rB71MO8JQlc-8F0&e=ATPMSEL2pg5PoJpf1puuGgcr8dPHj-CUM60wGlrTHZ4VGbz5KBIno8SeX_UzO-K1HHjnP8ebBEwdDfWgMGB3Pa1mv6YCLAUzSBvdZQ
should be redirected to
https://www.mysite.de/link/instagram/datenschutz-impressum/ZSGUFSPSWC
but it is redirected to
https://www.mysite.de/link/facebook/datenschutz-impressum/clid=PAAaY2XV-SntbX2OtypPe92gVjSldUctufiEup5zBCOCC7rB71MO8JQlc-8F0
The query string begins with fb so the first rule does apply. I want rule #5 to apply.
How can this be done?
I wish you a great new year

#5
RewriteCond %{QUERY_STRING}#%{HTTP_HOST} ^IG([^&#]+)#(?:.+\.)?([^.]+)\. [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%2/%1 [L,NE,R=302]
I don't get why in rule #5 %{QUERY_STRING} and %{HTTP_HOST} is swapped compared to the other rules.
There would not seem to be any good reason for that. Maybe it was written at a different time and it just made sense to do it that way at the time?
Everything is just reversed... the regex and the backreferences in the substitution string (ie. %2 and %1).
That rule could be rewritten the same way as the preceding rules like this:
#5 (reversed)
RewriteCond %{HTTP_HOST}#%{QUERY_STRING} ^(?:.+\.)?([^.]+)\.[^#]*#IG([^&]+)$ [NC]
RewriteRule ^ https://www.mysite.de/link/instagram/%1/%2 [L,NE,R=302]
Note that this rule is subtly different to the preceding rules for some reason, which does look like an error, but may not make any difference (depending on the request). Points of note:
This rule makes the subdomain optional (for single level TLDs), whereas in the preceding rules the subdomain is mandatory. For instance, this rule will match test.info, whereas the preceding rules will not, since they are expecting a subdomain like www.test.info. So I doubt that test.info is an accurate "exemplified" hostname in your example?
This rule only permits a single URL parameter that starts IG followed by something. Whereas the earlier rules match the two character code followed by anything (incl. nothing) and any other URL parameters (which are simply discarded).
This rule also preserves the query string, whereas the preceding rules discard it. This looks like an oversight/error?
What does the # do within the rewrite condition? Does it function as a separator?
Yes, it's simply a separator between the two parts of the test string (the hostname and the query string). This effectively allows two conditions to be combined into one, so that backreferences can be captured from both. Any character that cannot occur in either the hostname or the query string could be used here.
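To see the separator trick in isolation, here is a minimal sketch in Python. It assumes Python's re module behaves like Apache's PCRE for this simple pattern, and the hostname and query string are made up for illustration. The pattern is the CondPattern of rule #1 from the question.

```python
import re

# CondPattern of rule #1, copied from the question. re.IGNORECASE stands in for [NC].
RULE_1 = re.compile(r"\.([^.]+)\.[^.#]+#FB([^&]*)", re.IGNORECASE)

def combined(host, query_string):
    """Build the test string the way %{HTTP_HOST}#%{QUERY_STRING} expands."""
    return f"{host}#{query_string}"

# Hypothetical request: https://www.datenschutz-impressum.info/?FBabc123&x=1
match = RULE_1.search(combined("www.datenschutz-impressum.info", "FBabc123&x=1"))

host_label = match.group(1)  # what %1 would hold (taken from the hostname part)
code = match.group(2)        # what %2 would hold (taken from the query string part)
```

Because both captures come from a single condition, both are available as %1 and %2 in the RewriteRule substitution.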
What does rule #6 do in addition to rule #5?
Rule #6 checks the URL-path instead of the query string (in rule #5). In fact, it's this rule that should be triggered for your example URL, not rule #5.
In other words, rule #5 would match /anything?IGX, whereas rule #6 matches /IGX.
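A rough sketch of what rule #6 does, again using Python's re module as a stand-in for Apache's regex engine (an assumption). The function name rule_6 is hypothetical, the hostname with a www subdomain is a guess at the real one, and the query string (which Apache would pass through unchanged, since the substitution has no trailing ?) is left out.

```python
import re

HOST_COND = re.compile(r"\.([^.]+)\.[^.]+$")              # RewriteCond of rule #6 (no NC)
PATH_RULE = re.compile(r"^IG([^/]+)/?$", re.IGNORECASE)   # RewriteRule pattern of rule #6 (NC)

def rule_6(host, url_path):
    """Return the redirect target, or None if rule #6 would not apply.

    url_path has no leading slash, as in a .htaccess context.
    """
    host_match = HOST_COND.search(host)
    path_match = PATH_RULE.search(url_path)
    if host_match and path_match:
        return ("https://www.mysite.de/link/instagram/"
                f"{host_match.group(1)}/{path_match.group(1)}")
    return None

with_subdomain = rule_6("www.datenschutz-impressum.info", "IGZSGUFSPSWC")
bare_domain = rule_6("test.info", "IGZSGUFSPSWC")  # no subdomain, condition fails
```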
Solution
The query string begins with fb so the first rule does apply. I want rule #5 to apply.
(Except, as mentioned above, test.info would not match, and rule #6 should apply here, not rule #5.)
I would question whether you need the NC (case-insensitive) flag on all the preceding conditions? Do you need to match fb in the query string in the first rule, or just FB as stated in the regex? Removing the NC would resolve the immediate problem you are experiencing.
Otherwise, you need to change the order of the rules so that rule #6 is first and therefore takes priority over rule #1.
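The effect of the NC flag on rule #1 can be checked with a small Python sketch (assuming Python's re module matches like Apache's engine for this pattern). The hostname with a subdomain is hypothetical and the query string is shortened from the question.

```python
import re

PATTERN = r"\.([^.]+)\.[^.#]+#FB([^&]*)"   # CondPattern of rule #1

# %{HTTP_HOST}#%{QUERY_STRING} for a request carrying Facebook's fbclid parameter
test_string = "www.datenschutz-impressum.info#fbclid=PAAaY2XV&e=ATPMSEL2"

with_nc = re.search(PATTERN, test_string, re.IGNORECASE)   # [NC] present
without_nc = re.search(PATTERN, test_string)               # [NC] removed

# A hostname without a subdomain never matches, with or without NC
bare_host = re.search(PATTERN, "test.info#fbclid=PAAaY2XV", re.IGNORECASE)
```

With NC the lowercase fb of fbclid satisfies FB and the remainder up to the first & is captured, which is where the clid=... in the unwanted redirect comes from.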
Aside:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ / [L,QSA,R=302]
This 302 redirects anything that would ordinarily trigger a 404 to root (ie. the "homepage"). This isn't generally recommended for SEO or users. However, this is more easily achieved with the following core directive, which will do the same thing:
ErrorDocument 404 https://www.example.com/
Logically, this should be defined at the top of the file, but the order does not really matter.

Related

mod_rewrite and redirect causing loop

I have problem when I try to redirect and rewrite together.
I have a site at example.com/show_table.php?table=12 (max 99 tables). I wanted nice links, so I wrote this .htaccess rewrite rule:
RewriteRule ^table/([0-9]{1,2})$ show_table.php?table=$1 [L,NC]
Now the links look like example.com/table/12, which is fine. But I want all old links to redirect to the new format. So I tried a 301 redirect and added this code to .htaccess:
RewriteCond %{REQUEST_URI} show_table.php
RewriteCond %{QUERY_STRING} ^table=([0-9]{1,2})$
RewriteRule ^show_table\.php$ http://example.com/table/%1? [L,R=301,NC]
But when I visit example.com/show_table.php?table=12, I just get a redirect loop. I don't understand: the first is a rewrite, the second is a redirection, so there aren't two redirections. Do you see any error?
Thanks!
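Instead of checking REQUEST_URI in the condition, you need to check THE_REQUEST (which contains the full original HTTP request line, like GET /show_table.php HTTP/1.1). When Apache performs the internal rewrite, it changes REQUEST_URI to the rewritten value (show_table.php) and the rules are processed again. Your redirect condition then matches the rewritten request, and that sends you into a loop. THE_REQUEST does not change during internal rewrites.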
# Match show_table.php in the input request
RewriteCond %{THE_REQUEST} /show_table\.php
RewriteCond %{QUERY_STRING} ^table=([0-9]{1,2})$
# Do a full redirection to the new URL
RewriteRule ^show_table\.php$ http://example.com/table/%1? [L,R=301,NC]
# Then apply the internal rewrite as you already have working
RewriteRule ^table/([0-9]{1,2})$ show_table.php?table=$1 [L,NC]
You could get more specific in the %{THE_REQUEST} condition, but it should be sufficient and not harmful to use show_table\.php as the expression.
You'll want to read over the notes on THE_REQUEST over at Apache's RewriteCond documentation.
Note: Technically, you can capture the query string in the same RewriteCond and reduce it to just one condition. This is a little shorter:
# THE_REQUEST will include the query string so you can get it here.
RewriteCond %{THE_REQUEST} /show_table\.php\?table=([0-9]{1,2})
RewriteRule ^show_table\.php$ http://example.com/table/%1? [L,R=301,NC]
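A small Python sketch of why matching against THE_REQUEST breaks the loop. The function name redirect_target is hypothetical, and Python's re module is assumed to behave like Apache's engine for this pattern.

```python
import re

# The single-condition pattern from above, applied to the raw request line.
THE_REQUEST_COND = re.compile(r"/show_table\.php\?table=([0-9]{1,2})")

def redirect_target(request_line):
    """Return the 301 target if the client itself asked for show_table.php."""
    m = THE_REQUEST_COND.search(request_line)
    return f"http://example.com/table/{m.group(1)}" if m else None

# The browser asks for the old URL: the request line names show_table.php.
old_url = redirect_target("GET /show_table.php?table=12 HTTP/1.1")

# The browser asks for the new URL. Even after the internal rewrite to
# show_table.php?table=12, THE_REQUEST still holds this original line.
new_url = redirect_target("GET /table/12 HTTP/1.1")
```

Only the first request is redirected, so the internally rewritten request no longer triggers the external redirect.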

Stop hotlinking using htaccess and non-specific domain code

I need to write an anti-hotlink rule for my .htaccess file, but it cannot be specific to any particular domain name. Here's what I found on another site so far, but I'm not sure exactly why it doesn't work. Can anyone spot the problem?
# Stop hotlinking.
#------------------------------
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} ^https?://([^/]+)/ [NC]
# Note the # is just used as a boundary. It could be any character that isn't used in domain-names.
RewriteCond %1#%{HTTP_HOST} !^(.+)#\1$
RewriteRule \.(bmp|gif|jpe?g|png|swf)$ - [F,L,NC]
Try this.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?([^/]+)/.*$ [NC]
RewriteCond %2#%{HTTP_HOST} !^(.+)#(www\.)?\1$ [NC]
RewriteRule \.(bmp|gif|jpe?g|png|swf)$ - [F,L,NC]
This even works when only one of the referrer or the target URL has a leading www.
EDIT : (how does this % thing work?)
%n references the n(th) bracket's matched content from the last matched rewrite condition.
So, in this case
%1 = either www. OR "" blank (because it's optional; used ()? to do that)
%2 = yourdomain.com (without www always)
So, now the rewrite condition actually tries to match
stealer.com#yourdomain.com OR stealer.com#www.yourdomain.com (the referrer's domain comes first, then the requested host)
with ^(.+)#(www\.)?\1$ which means (.+)# anything and everything before # followed by www. (but again optional); followed by \1 the first bracket's matched content (within this regex; not the rewrite condition) i.e. the exact same thing before #.
So, stealer.com would fail the regex while yourdomain.com would pass. But, since we've negated the condition with a !, stealer.com passes the condition and hence the hotlink-stopper rule is applied.
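The three conditions can be walked through in Python as a rough sketch. The function name is_hotlink and the example domains are hypothetical, and Python's re module is assumed to treat the \1 backreference like Apache's engine does.

```python
import re

REFERER = re.compile(r"^https?://(www\.)?([^/]+)/.*$", re.IGNORECASE)
SAME_SITE = re.compile(r"^(.+)#(www\.)?\1$", re.IGNORECASE)

def is_hotlink(referer, host):
    """Return True when the image request would be answered with 403."""
    if referer == "":
        return False                      # condition 1: empty referrer is allowed
    m = REFERER.search(referer)
    if m is None:
        return False                      # condition 2 fails, rule is not applied
    test_string = f"{m.group(2)}#{host}"  # %2#%{HTTP_HOST}
    # condition 3 is negated with "!": it passes when the regex does NOT match
    return SAME_SITE.search(test_string) is None

blocked = is_hotlink("https://stealer.com/page.html", "www.yourdomain.com")
own_page = is_hotlink("http://www.yourdomain.com/gallery/", "yourdomain.com")
own_page_www_host = is_hotlink("http://yourdomain.com/gallery/", "www.yourdomain.com")
```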

mod rewrite in htaccess with paging system

I have the following code which works fine
RewriteRule ^articles/([^/\.]+)/?$ articles.php?pid=$1 [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+articles\.php\?pid=([^\s&]+) [NC]
RewriteRule ^ http://www.mydomain.com/articles/%1? [R=301,L]
The problem is that I have a paging system and I send another variable to that page for it, which looks like this
&pageNum_getArticles=1#1
I've already tried the following, but I think it gets confused by the hash
RewriteRule ^articles/([^/\.]+)/?$ articles.php?pid=$1&pageNum_getArticles=$2 [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+articles\.php\?pid=([^\s&]+)&pageNum_getArticles=([^\s&]+) [NC]
RewriteRule ^ http://www.mydomain.com/articles/%1/%2? [R=301,L]
Thanks for the info. So is there a solution about this or I have to change the paging system?
Yes.
You have 2 sets of rewrite rules that sort of work together. The first rule takes a nice-looking URL without any query string and internally rewrites it to a PHP script. The second rule takes the ugly-looking PHP script URI and redirects the browser to the nicer-looking one. Your first set (without paging) is correct. But in your second set the RewriteRule that routes to the script still uses the same pattern with only one capture group, so $2 doesn't refer to any match. You need to add another group for the page number.
Try:
# second match here---------------v
RewriteRule ^articles/([^/.]+)/([^/.]+)/?$ articles.php?pid=$1&pageNum_getArticles=$2 [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+articles\.php\?pid=([^\s&]+)&pageNum_getArticles=([^\s&]+) [NC]
RewriteRule ^ http://www.mydomain.com/articles/%1/%2? [R=301,L]
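Note that if you still want to use the rules that don't use paging, you need to have the non-paging rules after the ones with paging. Otherwise the non-paging THE_REQUEST condition, which is not anchored at the end, also matches requests that carry the paging parameter.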
Additionally, if for some reason your paging doesn't work without the URL fragment, and if the fragment is always going to be the same as the paging number, you can just add it to the end of the redirect. But you'll need a NE like faa suggested:
RewriteRule ^ http://www.mydomain.com/articles/%1/%2#%2? [R=301,L,NE]
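How the paged and unpaged RewriteRule patterns divide up the nice URLs can be sketched in Python. The function name route and the example paths are hypothetical, and the unpaged pattern is written with the same character class as the paged one.

```python
import re

PAGED = re.compile(r"^articles/([^/.]+)/([^/.]+)/?$")   # two groups: pid and page number
UNPAGED = re.compile(r"^articles/([^/.]+)/?$")          # one group: pid only

def route(url_path):
    """Return the internal target for a nice URL (no leading slash), or None."""
    m = PAGED.search(url_path)
    if m:
        return f"articles.php?pid={m.group(1)}&pageNum_getArticles={m.group(2)}"
    m = UNPAGED.search(url_path)
    if m:
        return f"articles.php?pid={m.group(1)}"
    return None

paged = route("articles/42/3")
unpaged = route("articles/42/")
```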

htaccess ReWriteCond ReWriteRule - redirect occurs even when URL includes nonsense characters

We've just finished a major restructuring of our website and I'm trying to write a set of redirect rules of varying specificity. The redirects are half working:
They correctly re-route old URLs
They incorrectly also allow and re-route URLs that include text not specified in the RewriteCond statements (when instead I would expect to see a "Not Found" error message displayed in the browser.)
Statements in the .htaccess file (located in the root of the web site) include:
RewriteBase /
RewriteCond %{REQUEST_URI} /company/company-history.html
RewriteRule (.*)$ http://www.technofrolics.com/about/index.html
RewriteCond %{REQUEST_URI} /press
RewriteRule (.*)$ http://www.technofrolics.com/gallery/index.html
The above correctly executes the desired redirect
but also works when I enter the following after the domain name:
/youcanenteranytext/hereatall/anditstillworks/press
In other words, any text following the domain and preceding the conditional string seems to be allowed/ignored. Any advice on how to restrict the condition or rewrite rule to prevent this would be much appreciated!
Thanks, Margarita
You need to include anchors in your regular expressions when you match against %{REQUEST_URI}. Without them, the pattern matches anywhere in the string. The ^ indicates the beginning of the match.
RewriteCond %{REQUEST_URI} ^/company/company-history\.html
Will make it so requests for /garbage/stuff/company/company-history.html won't match. And likewise:
RewriteCond %{REQUEST_URI} ^/press
Will make it so requests for /youcanenteranytext/hereatall/anditstillworks/press won't match. You can additionally employ the $ in your regular expression to indicate the end of the match, so something like this:
RewriteCond %{REQUEST_URI} ^/press$
Will ONLY match requests for /press and not /something/press or /press/somethingelse or /press/.
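The difference between the unanchored and anchored patterns can be checked with a short Python sketch. The helper name matches is hypothetical, and re.search is used because a RewriteCond pattern, like re.search, matches anywhere in the string unless it anchors itself.

```python
import re

def matches(pattern, request_uri):
    """Unanchored search, the way a RewriteCond pattern is applied."""
    return re.search(pattern, request_uri) is not None

long_uri = "/youcanenteranytext/hereatall/anditstillworks/press"

unanchored = matches(r"/press", long_uri)               # original condition
start_anchored = matches(r"^/press", long_uri)          # with ^
prefix_only = matches(r"^/press", "/press/somethingelse")
exact = matches(r"^/press$", "/press")
exact_trailing_slash = matches(r"^/press$", "/press/")
```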

mod_rewrite regex (too many redirects)

I am using mod_rewrite to convert subdomains into directory URLs (solution from here). When I explicitly write a rule for one subdomain, it works perfectly:
RewriteCond %{HTTP_HOST} ^[www\.]*sub-domain-name.domain-name.com [NC]
RewriteCond %{REQUEST_URI} !^/sub-domain-directory/.*
RewriteRule ^(.*) /sub-domain-directory/$1 [L]
However, if I try to match all subdomains, it results in 500 internal error (log says too many redirects). The code is:
RewriteCond %{HTTP_HOST} ^[www\.]*([a-z0-9-]+).domain-name.com [NC]
RewriteCond %{REQUEST_URI} !^/%1/.*
RewriteRule ^(.*) /%1/$1 [L]
Can anyone suggest what went wrong and how to fix it?
Your second RewriteCond will never return false, because you can't use backreferences within the condition pattern (the regular expressions are compiled during parsing, so no variable expansion takes place there). You're actually testing for paths beginning with the literal text /%1/, which isn't what you wanted. Given that you're operating in a per-directory context, the rule set will end up being applied again, resulting in a transformation like the following:
path -> sub/path
sub/path -> sub/sub/path
sub/sub/path -> sub/sub/sub/path
...
This goes on for about ten iterations before the server gets upset and throws a 500 error. There are a few different ways to fix this, but I'm going to choose the one that most closely resembles the approach you were trying to take. I'd also modify that first RewriteCond, since the regular expression is a bit flawed:
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com$ [NC]
RewriteCond %1 !=www
RewriteCond %1#%{REQUEST_URI} !^([^#]+)#/\1/
RewriteRule .* /%1/$0 [L]
First, it checks the HTTP_HOST value and captures the subdomain, whatever it might be. Then, assuming you don't want this transformation to take place in the case of www, it makes sure that the capture does not match that. After that, it uses the regular expression's own internal backreferences to see if the REQUEST_URI begins with the subdomain value. If it doesn't, it prepends the subdomain as a directory, like you have now.
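The same logic as a rough Python sketch, to show that a second pass over the rules leaves the path alone. The function name rewrite and the example host are hypothetical, and Python's re module is assumed to handle the \1 backreference like Apache's engine.

```python
import re

HOST = re.compile(r"^([^.]+)\.example\.com$", re.IGNORECASE)
ALREADY_PREFIXED = re.compile(r"^([^#]+)#/\1/")   # subdomain#/subdomain/...

def rewrite(host, request_uri):
    """Return the path after one pass through the rule set."""
    m = HOST.search(host)
    if m is None or m.group(1) == "www":           # RewriteCond %1 !=www
        return request_uri
    sub = m.group(1)
    # RewriteCond %1#%{REQUEST_URI} !^([^#]+)#/\1/
    if ALREADY_PREFIXED.search(f"{sub}#{request_uri}"):
        return request_uri                         # already prefixed, leave alone
    return f"/{sub}{request_uri}"                  # RewriteRule .* /%1/$0

first_pass = rewrite("sub.example.com", "/page.html")
second_pass = rewrite("sub.example.com", first_pass)
```

The second pass returns its input unchanged, which is what stops the repeated prepending.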
The potential problem with this approach is that it won't work correctly if you access a path beginning with the same name as the subdomain the request is sent to, like sub.example.com/sub/. An alternative is to check the REDIRECT_STATUS environment variable to see if an internal redirect has already been performed (that is, this prepending step has already occurred):
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com$ [NC]
RewriteCond %1 !=www
RewriteCond %{ENV:REDIRECT_STATUS} =""
RewriteRule .* /%1/$0 [L]

Resources