Match "full stop" in mod rewrite - .htaccess

At the moment I am just matching numbers, letters, dashes and underscores in my .htaccess file:
RewriteRule ^([A-Za-z0-9-_]+)/?$ index.php?folder=$1
I also want to match full stops in the string. I don't want to use:
(.*)
I have tried:
([.A-Za-z0-9-_]+)
([\.A-Za-z0-9-_]+)
([\\.A-Za-z0-9-_]+)
([A-Za-z0-9-_\.]+)
None of these seem to work. How can I escape the full stop so that it matches a literal full stop?
---------- Additional information ----------------
As an example:
mydomain.com/groups/green/ should go to index.php?folder=green
In addition I am also re-writing subdomains over the top of this (I think this is causing the complication)...
anotherdomain.com should map to index.php?folder=anotherdomain.com
I have successfully rewritten the subdomain with the following rule:
# external group domain name
RewriteCond %{ENV:Rewrite-Done} !^Yes$
## exclude requests from myhost.com
RewriteCond %{HTTP_HOST} !^www\.myhost\.com
## allowed list of domain masking domains
RewriteCond %{HTTP_HOST} ^(anotherdomain.com|extra.domain.com|external.otherdomain.com)
RewriteRule (.*) /groups/%1/$1
I think this is where the complication lies.
---------------- Solution ----------------------
Despite not finding a solution to the exact problem above, I have worked around it by changing the first re-direct (which maps the external domains) from:
RewriteRule (.*) /groups/%1/$1
to:
RewriteRule (.*) /groups/external/$1&external_domain=%1
The second re-write (on the folder) can then interpret the "external domain" variable instead of the folder.

Your first option is the simplest and is correct. Inside square brackets . has no special meaning, so you include it verbatim without any special escaping needed.
There is one small issue, though: the second dash in 0-9-_ sits in a position where a dash normally defines a character range. To make a literal dash unambiguous inside square brackets, place it at the beginning (or end) of the character class:
([-.A-Za-z0-9_]+)
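If that doesn't work, something else is wrong with your RewriteRule. For instance, if the rule lives in the server or virtual host configuration rather than in a per-directory .htaccess file, the URL-path being matched begins with a slash /, so a pattern that starts with ^( will not match it.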
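The corrected character class can be checked outside Apache with a small sketch. Python's re module is used here only as a stand-in for the regex engine behind mod_rewrite, and the helper name folder_for and the sample paths are made up for illustration:

```python
import re

# Same pattern as the RewriteRule, with the dash moved to the front of the class.
FOLDER = re.compile(r"^([-.A-Za-z0-9_]+)/?$")

def folder_for(path):
    """Return the captured folder name, or None if the pattern does not match."""
    m = FOLDER.match(path)
    return m.group(1) if m else None

print(folder_for("green/"))             # green
print(folder_for("anotherdomain.com"))  # anotherdomain.com
print(folder_for("my-group_1.test/"))   # my-group_1.test
print(folder_for("has space"))          # None

# Inside a character class the dot is literal, so it does not match other characters.
print(re.fullmatch(r"[.]", "x"))        # None
```

If the pattern behaves as expected here but not in Apache, the cause is more likely the rule's context or an earlier rule than the character class itself.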

Related

htaccess RewriteRule with literal question marks (not query string)

I need to be able to match question marks because there was a translated text encoding mistake, and part of the URL ended up hardcoded with question marks in them. Here's a URL example that I need to rewrite:
https://example.com/Documentation/Product????/index.html
Here is my current rewrite rule. It works when the characters following "Product" are not question marks, but when they are, the rule doesn't apply.
RewriteRule "^Documentation/Product[^/]+/(.*)$" "https://s3.amazonaws.com/company-documentation/Help/Product/$1" [L,NC]
How would I make sure that question marks are considered to be characters too in this rule? I can't expect that only question marks and not the original non-English characters will be in the URL, so I want the rule above to match both question marks and any other character.
I found this topic which seems relevant, but the flags don't help, and the answer doesn't explain how to overcome the problem mentioned in the "Aside".
https://webmasters.stackexchange.com/questions/107259/url-path-with-encoded-question-mark-results-in-incorrect-redirect-when-copied-to
https://example.com/Documentation/Product????/index.html
You say it's "not a query string", but actually that is exactly what it is. And that is why you can't match it with the RewriteRule pattern. The above URL is split as follows:
URL-path: /Documentation/Product (matched by the RewriteRule pattern)
Query string: ???/index.html (note 3 ? - the first one starts the query string)
To match the query string you'll need an additional RewriteCond directive that checks against the QUERY_STRING server variable.
For example, to match the above URL, you would need to do something like:
RewriteCond %{QUERY_STRING} ^\?*/index\.html
RewriteRule ^Documentation/Product$ https://s3.amazonaws.com/company-documentation/Help/Product/index.html [NC,R,L]
This matches any number of erroneous ? at the start of the query string.
I've added the R (redirect) flag. Your directive (without the R flag) would trigger an external redirect anyway (because you are specifying an absolute URL in the substitution), but it is far better to be explicit here. This is also a temporary (302) redirect. If this should be permanent (301) then change it to R=301, but only once you have confirmed that it's working OK (301s are cached hard by the browser, which can make testing problematic).
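The split into URL-path and query string described above can be reproduced with a short sketch. It uses Python's urllib.parse.urlsplit as a stand-in for the server's own URL parsing (an assumption; both split at the first ?), and then applies the CondPattern from the RewriteCond to the resulting query string:

```python
import re
from urllib.parse import urlsplit

url = "https://example.com/Documentation/Product????/index.html"
parts = urlsplit(url)

print(parts.path)   # /Documentation/Product
print(parts.query)  # ???/index.html

# The CondPattern from the RewriteCond, applied to the query string.
cond = re.compile(r"^\?*/index\.html")
print(bool(cond.search(parts.query)))  # True
```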
UPDATE:
...so I want the rule above to match both question marks and any other character.
Only if there are question marks in the URL will there be a query string, so I think it is advisable to keep these two rules separate.
If there could be any erroneous characters at the start of the query string and if you want to capture the end part of the URL (like you are doing in your original directive, eg. index.html) then you can modify the above to read:
RewriteCond %{QUERY_STRING} /(.*)$
RewriteRule ^Documentation/Product$ https://s3.amazonaws.com/company-documentation/Help/Product/%1 [NC,R,L]
Note the %1 (as opposed to $1) backreference in the substitution string. This is a backreference to the captured group in the last matched CondPattern (ie. /(.*)$).
You can follow this with your existing directive (but remember to include the R flag) for more "normal" URLs that don't contain a ? (ie. query string).
NB: Surrounding the arguments in double quotes is entirely optional in this example. Quotes are only required if you have unescaped spaces in the pattern or substitution arguments.
In summary
# Redirect URLs of the form:
# "/Documentation/Product?<anything#1>/<anything#2>"
RewriteCond %{QUERY_STRING} /(.*)$
RewriteRule ^Documentation/Product$ https://s3.amazonaws.com/company-documentation/Help/Product/%1 [NC,R,L]
# Redirect URL-paths of the form (no query string):
# "/Documentation/Product<something>/<anything>"
RewriteRule ^Documentation/Product[^/]+/(.*) https://s3.amazonaws.com/company-documentation/Help/Product/$1 [NC,R,L]
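As a rough way to reason about the two rules together, here is a sketch that emulates only their pattern matching and backreference substitution. It is not how Apache is implemented: redirect_target and TARGET are hypothetical names, Python's re stands in for the regex engine, and other redirect behaviour is deliberately ignored.

```python
import re
from urllib.parse import urlsplit

TARGET = "https://s3.amazonaws.com/company-documentation/Help/Product/"

def redirect_target(url):
    """Return the redirect URL the two rules would produce, or None."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")  # per-directory patterns see no leading slash

    # Rule 1: the RewriteRule pattern is tested first, then its RewriteCond.
    if re.search(r"^Documentation/Product$", path, re.IGNORECASE):
        cond = re.search(r"/(.*)$", parts.query)
        if cond:
            return TARGET + cond.group(1)  # %1 comes from the CondPattern

    # Rule 2: "normal" URL-paths without a query string.
    rule = re.search(r"^Documentation/Product[^/]+/(.*)", path, re.IGNORECASE)
    if rule:
        return TARGET + rule.group(1)      # $1 comes from the RewriteRule pattern
    return None

print(redirect_target("https://example.com/Documentation/Product????/index.html"))
print(redirect_target("https://example.com/Documentation/ProductXYZ/guide/a.html"))
print(redirect_target("https://example.com/Other/page.html"))
```

The first two calls both end up under TARGET (index.html and guide/a.html respectively), while the third returns None because neither pattern matches.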

What does ::$1 mean in an htaccess?

I've been browsing the symfony2 framework source. In the htaccess file for their example website, I found the %{REQUEST_URI}::$1 written as follows:
RewriteCond %{REQUEST_URI}::$1 ^(/.+)(.+)::\2$
RewriteRule ^(.*) - [E=BASE:%1]
The comment above that rule explains
The following rewrites all other queries to the front controller. The condition ensures that if you are using Apache aliases to do mass virtual hosting, the base path will be prepended to allow proper resolution of the app.php file; it will work in non-aliased environments as well, providing a safe, one-size fits all solution.
However, that doesn't explain the ::$1 or ::\2.
Are they backreferences? If not, what are they? What is their purpose?
I have encountered almost the same htaccess file in my Zend project. Here are my thoughts; I hope they help.
The htaccess file (located at the Zend project directory, same as index.php) says
RewriteCond %{REQUEST_URI}::$1 ^(/.+)(.+)::\2$
RewriteRule ^(.*)$ - [E=BASE:%1]
RewriteRule ^(.*)$ %{ENV:BASE}index.php [NC,L]
Suppose Zend is installed at http://mydomain.tld/zend (referred to as yourdomain from here on)
and we are requesting yourdomain/mycontroller/myaction.
Therefore %{REQUEST_URI} will be /zend/mycontroller/myaction.
Note that $1 is a backreference to the first captured group of the RewriteRule pattern. In the htaccess context [1], that pattern "will initially be matched against the filesystem path, after removing the prefix that led the server to the current RewriteRule (e.g. app1/index.html or index.html depending on where the directives are defined)".
Therefore $1 will be mycontroller/myaction.
And %{REQUEST_URI}::$1 will be /zend/mycontroller/myaction::mycontroller/myaction.
The above string will be matched against ^(/.+)(.+)::\2$. Note that the two capturing groups in parentheses before the ::, i.e. (/.+)(.+), can split the first part of the string in many ways. For example:
Group 1: /z
Group 2: end/mycontroller/myaction
or
Group 1: /zend/mycontroller/myactio
Group 2: n
and anything in between is a valid match. In fact, the most interesting one would be
Group 1: /zend/
Group 2: mycontroller/myaction
which is the only split for which the backreference \2 (after ::) matches what the second group captured.
In this case, /zend/ will be stored in the environment variable BASE which is what the first RewriteRule does. The %1 refers to the first matched string in RewriteCond which is /zend/.
Looking at the second RewriteRule, it is clear why this is needed. As index.php can only be found at /zend/index.php, we need to add /zend/ in front of index.php.
Here we assume to use the URL-path as Substitution for the second RewriteRule directive. Refer to [1] and search for "A DocumentRoot-relative path to the resource to be served" under the RewriteRule Directive section.
All the above leave the query string unchanged/untouched. It is up to index.php how to parse the query string (as well as the URI).
Lastly goes the case where Zend is installed at the domain root.
%{REQUEST_URI} will be /mycontroller/myaction.
$1 will be mycontroller/myaction.
The string to be matched by RewriteCond will be /mycontroller/myaction::mycontroller/myaction.
This time the second group in (/.+)(.+) can never match mycontroller/myaction, because the first group needs at least one character after the initial slash. The closest the second group can get is ycontroller/myaction, which is not exactly mycontroller/myaction, so there cannot be a match.
As a result, the first RewriteRule is not used. The BASE environment variable will not be set, and when the second RewriteRule uses it, it will simply be empty.
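Both cases can be tried out with a small sketch. Python's re module supports the same \2 backreference syntax and is used here only as a stand-in for the engine behind mod_rewrite; base_for is a hypothetical helper that returns what would end up in BASE:

```python
import re

COND = re.compile(r"^(/.+)(.+)::\2$")

def base_for(request_uri, rule_match):
    """Emulate the RewriteCond: return the prefix captured as %1, or '' if no match."""
    m = COND.match(f"{request_uri}::{rule_match}")
    return m.group(1) if m else ""

# Installed under /zend/: only the split "/zend/" + "mycontroller/myaction" satisfies \2.
print(repr(base_for("/zend/mycontroller/myaction", "mycontroller/myaction")))  # '/zend/'

# Installed at the domain root: group 1 needs a character after the slash, so no match.
print(repr(base_for("/mycontroller/myaction", "mycontroller/myaction")))       # ''

# What the second RewriteRule would then substitute.
print(base_for("/zend/mycontroller/myaction", "mycontroller/myaction") + "index.php")  # /zend/index.php
```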
References
[1] http://httpd.apache.org/docs/current/mod/mod_rewrite.html
The $1 in %{REQUEST_URI}::$1 references the matched string of the RewriteRule directive, i.e., the matched string of .* in ^(.*). So %{REQUEST_URI}::$1 is expanded to the requested URI path as supplied by the user, followed by the path that the RewriteRule pattern matched (relative to the directory containing the .htaccess file), with :: between them.
The pattern ^(/.+)(.+)::\2$ is used to find a prefix (first capturing group) which makes the remaining part match the part behind the :: (\2 is a back reference to the matched string of the second capturing group of the pattern).
If such a match is found, the prefix is stored in the environment variable BASE ([E=BASE:%1], where %1 references the matched string of the previous successful RewriteCond pattern match).

How to write this .htaccess rewrite rule

I am setting up a MVC style routing system using mod rewrite within an .htaccess file (and some php parsing too.)
I need to be able to direct different URLs to different php files that will be used as controllers. (index.php, admin.php, etc...)
I have found and edited a rewrite rule that does this well by looking at the first word after the first slash:
RewriteCond %{REQUEST_URI} ^/stats(.*)
RewriteRule ^(.*)$ /hello.php/$1 [L]
However, my problem is I want it to rewrite based on the 2nd word, not the first. I want the first word to be a username. So I want this:
http://www.samplesite.com/username/admin to redirect to admin.php
instead of:
http://www.samplesite.com/admin
I think I just need to edit the rewrite rule slightly with a 'anything can be here' type variable, but I'm unsure how to do that.
I guess you can prefix [^/]+/ to match and ignore that username/
RewriteCond %{REQUEST_URI} ^/[^/]+/stats(.*)
RewriteRule ^[^/]+/(.*)$ /hello.php/$1 [L]
Then http://www.samplesite.com/username/statsadmin will be rewritten to http://www.samplesite.com/hello.php/statsadmin (or something close to it; I have not seen the rest of the .htaccess file).
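A minimal sketch of how the prefixed pair of directives behaves, assuming the .htaccess file sits in the document root (so the RewriteRule pattern sees the path without its leading slash). The function name rewrite is made up, and Python's re only approximates the engine Apache uses:

```python
import re

COND = re.compile(r"^/[^/]+/stats(.*)")  # tested against REQUEST_URI (leading slash)
RULE = re.compile(r"^[^/]+/(.*)$")       # tested against the per-directory path

def rewrite(request_uri):
    """Return the rewritten path, or None if the rule does not apply."""
    path = request_uri.lstrip("/")
    m = RULE.match(path)
    if m and COND.match(request_uri):
        return "/hello.php/" + m.group(1)  # $1 drops the username segment
    return None

print(rewrite("/username/statsadmin"))  # /hello.php/statsadmin
print(rewrite("/statsadmin"))           # None
print(rewrite("/username/other"))       # None
```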
To answer your question, "an anything can be here type variable" would be something like a full-stop . - it means "any character". Also the asterisk * means "zero or more of the preceding character or parenthesized grouped characters".
But I don't think you need that...If your matching url will always end in "admin" then you can use the dollar sign $ to match the end of the string.
RewriteRule admin$ admin.php [R,NC,L]
Rewrites www.anything.at/all/that/ends/in/admin to www.anything.at/admin.php

htaccess is not working

I am trying to write an htaccess rewrite rule. But it is not redirecting,
This is my present rule
RewriteEngine On
RewriteRule ^([a-zA-Z0-9]+)$ question.php?qkey=$1
that will show a url like sitename/questionkey and redirect it perfectly.
Now I am trying to show a URL like sitename/questioncategory/questiontitle
I am trying to use the following rule, but it is not working
RewriteEngine On
RewriteRule ^([a-zA-Z0-9]+)/^([a-zA-Z0-9]+)/^([a-zA-Z0-9]+)$ question.php?qkey=$1
First, it's probably better for clarity and maintenance to replace your ([a-zA-Z0-9]+) with ([\w]+). Note that \w also matches the underscore.
Secondly, your new rule doesn't work because of the caret ^ character. At the beginning it means 'match the beginning of the line', which cannot apply three times in the same regex. Remove the two later ^ carets (and then make use of your additional captured groups somewhere with $1, $2, etc.).
Lastly, you probably don't need to match the end of the line with the $ character. This is unfriendly to many URLs, for example ones with a trailing slash.
RewriteEngine On
RewriteRule ^([\w]+)/([\w]+)/([\w]+) question.php?qkey=$1&cat=$2&qtitle=$3
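The effect of the stray carets is easy to see in a quick sketch. Python's re is used as a stand-in for the regex engine behind mod_rewrite, and the sample paths are invented:

```python
import re

# The original attempt: a ^ in the middle can never match after a "/".
broken = re.compile(r"^([a-zA-Z0-9]+)/^([a-zA-Z0-9]+)/^([a-zA-Z0-9]+)$")
# The corrected pattern from the rule above.
fixed = re.compile(r"^([\w]+)/([\w]+)/([\w]+)")

path = "php/arrays/sorting"  # hypothetical URL-path
print(broken.search(path))          # None
print(fixed.search(path).groups())  # ('php', 'arrays', 'sorting')

# \w also accepts underscores, and dropping the trailing $ tolerates a trailing slash.
print(fixed.search("my_cat/some_title/x_1/").groups())  # ('my_cat', 'some_title', 'x_1')
```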

Ampersands in URL problems

I have a php page which creates URL like:
vendors/London City/cat-DJ & Entertainment/keywords
which my .htaccess redirects as shown below
RewriteRule vendors/(.+)/cat-(.+)/(.+)$ vendors.php?location=$1&category=$2&freetext=$3 [L]
RewriteRule vendors/(.+)/cat-(.+)/(.+)/$ vendors.php?location=$1&category=$2&freetext=$3 [L]
Problem 1 is: in the vendors.php file, I am getting only "DJ ; Entertainment" as category. The ampersand is missing.
Problem 2 is : My complete .htaccess file is shown below... 6 rules are defined.
RewriteRule vendors/(.+)/(.+)/$ vendors.php?location=$1&freetext=$2 [L]
RewriteRule vendors/(.+)/(.+)$ vendors.php?location=$1&freetext=$2 [L]
RewriteRule vendors/(.+)/cat-(.+)/$ vendors.php?location=$1&category=$2 [L]
RewriteRule vendors/(.+)/cat-(.+)$ vendors.php?location=$1&category=$2 [L]
RewriteRule vendors/(.+)/cat-(.+)/(.+)$ vendors.php?location=$1&category=$2&freetext=$3 [L]
RewriteRule vendors/(.+)/cat-(.+)/(.+)/$ vendors.php?location=$1&category=$2&freetext=$3 [L]
Why is the URL vendors/London City/cat-DJ & Entertainment/keywords matching rule 3 or 4 and being redirected to vendors.php?location=$1&category=$2?
Does .htaccess process the rules one by one, from top to bottom?
I solved the problem by moving rules 5 and 6 above the other rules. Is that the correct fix?
1. I don't really like the idea of having spaces and other special characters in the URLs. I don't know if it's possible with your site, but instead of this kind of URL
vendors/London City/cat-DJ & Entertainment/keywords
you should have this one:
vendors/london-city/cat-dj-and-entertainment/keywords
For that, of course, you will have to perform some additional transformations / lookups in your database to convert london-city back to London City and dj-and-entertainment back to DJ & Entertainment. This can be done by storing these "from-to" pairs in database.
2. In any case -- order of rules matters. Therefore you should start with more specific rules and end up with more generic rules.
Also -- the (.+) pattern is way too broad, as it can match hello as well as hello/pink/kitten. To ensure that you always grab only one section (the part of the URL between slashes), use the ([^/]+) pattern instead -- this will address one of the aspects of your "prob #2".
Therefore, try these optimized rules (each rule will match the URL with and without trailing slash):
RewriteRule ^vendors/([^/]+)/cat-([^/]+)/([^/]+)/?$ vendors.php?location=$1&category=$2&freetext=$3 [L]
RewriteRule ^vendors/([^/]+)/cat-([^/]+)/?$ vendors.php?location=$1&category=$2 [L]
RewriteRule ^vendors/([^/]+)/([^/]+)/?$ vendors.php?location=$1&freetext=$2 [L]
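To see why both the narrower pattern and the rule order matter, here is a sketch that walks the three rules in order and stops at the first match, roughly what the L flag does. RULES and route are hypothetical names, and Python's re only approximates Apache's regex engine:

```python
import re

# The three optimized rules, most specific first.
RULES = [
    (r"^vendors/([^/]+)/cat-([^/]+)/([^/]+)/?$", ("location", "category", "freetext")),
    (r"^vendors/([^/]+)/cat-([^/]+)/?$", ("location", "category")),
    (r"^vendors/([^/]+)/([^/]+)/?$", ("location", "freetext")),
]

def route(path):
    """Return the parameters of the first rule that matches, or None."""
    for pattern, names in RULES:
        m = re.search(pattern, path)
        if m:
            return dict(zip(names, m.groups()))
    return None

url = "vendors/London City/cat-DJ & Entertainment/keywords"
print(route(url))
print(route("vendors/London City/cat-DJ/"))
print(route("vendors/London City/keywords"))

# With the broad (.+) pattern, the first group swallows a whole extra segment.
print(re.search(r"vendors/(.+)/(.+)$", url).groups())
# ('London City/cat-DJ & Entertainment', 'keywords')
```

If the last, most generic rule were listed first, vendors/London City/cat-DJ would match it with freetext set to cat-DJ, which is why the specific rules need to come first.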
Also I'm not getting the value of 'category' with the Ampersand as given in the url. I am getting only semi-colon. What can be the reason?
I do not have an Apache box running at the moment, so I cannot check this right now, but try adding the B or NE flag next to the L flag (e.g. [L,B]); one of them should help:
http://httpd.apache.org/docs/current/rewrite/flags.html#flag_b
http://httpd.apache.org/docs/current/rewrite/flags.html#flag_ne
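As an analogy for why an unescaped ampersand copied from a backreference breaks the category parameter, consider the following sketch. Python's parse_qs stands in for PHP's query-string parsing and quote for the escaping of backreferences that the B flag is documented to perform; neither is what Apache or PHP actually run, so treat it as an illustration only:

```python
from urllib.parse import parse_qs, quote

category = "DJ & Entertainment"

# Backreference copied verbatim: the & inside the value starts a new parameter.
raw = f"location=London City&category={category}&freetext=keywords"
print(parse_qs(raw, keep_blank_values=True)["category"])  # ['DJ ']

# Backreference escaped first: the value survives intact.
escaped = f"location=London%20City&category={quote(category, safe='')}&freetext=keywords"
print(parse_qs(escaped)["category"])  # ['DJ & Entertainment']
```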
From the docs:
The order in which these rules are defined is important - this is the order in which they will be applied at run-time.
