I'm trying to figure out how to set up a mod_rewrite rule so that a request for a page, with a URL parameter appended, determines the root from which that file is served.
Here's the setup. In my "foo" directory I have "bar", "bar19", "bar27", etc.
Ideally I'd like to match on the first two characters of the "v" parameter. So like this:
I would like a request for ..................... to be served from:
www.example.com/foo/bar/something.html .......... foo/bar/something.html
www.example.com/foo/bar/something.html?v=19xxx ... foo/bar19/something.html
www.example.com/foo/bar/something.html?v=27xxx ... foo/bar27/something.html
Of course, I would expect a request whose "v" value has no corresponding directory to return a 404.
I've done some Mod_Rewrite magic before, but I'm kind of stuck here. Any ideas?
Add a .htaccess file in directory /foo with the following content. Of course you can also insert it into your httpd.conf:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /foo
# match query parameter v with two-digit value; capture
# the value as %2 and the query string before and after v
# as %1 and %3, respectively
RewriteCond %{QUERY_STRING} ^(.*)v=(\d\d)[^&]*(&.*)?$
# match path into "bar" and rest; insert two-digit value from
# RewriteCond in between; append rest of query string without v value;
# use case-insensitivity
RewriteRule ^(bar)(.*)$ $1%2$2?%1%3 [NC]
</IfModule>
I think the key is to use captured values from the RewriteCond (accessible as %1, %2, etc.) and at the same time captured values from the RewriteRule itself (as usual $1, $2, etc.).
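To see how the two kinds of backreference combine, here is a rough Python simulation of the condition and rule above. It is only a sketch: the function name rewrite is made up for illustration, and it assumes the per-directory path has already had the /foo/ prefix stripped, as happens in .htaccess context.

```python
import re

# Mirrors: RewriteCond %{QUERY_STRING} ^(.*)v=(\d\d)[^&]*(&.*)?$
COND = re.compile(r"^(.*)v=(\d\d)[^&]*(&.*)?$")
# Mirrors: RewriteRule ^(bar)(.*)$ ... [NC]
RULE = re.compile(r"^(bar)(.*)$", re.IGNORECASE)


def rewrite(path, query):
    """Return (new_path, new_query) roughly as the rule pair would produce."""
    cond = COND.match(query)
    rule = RULE.match(path)
    if not (cond and rule):
        return path, query  # rule does not fire; request is served as-is
    before = cond.group(1)        # %1: query string before v
    digits = cond.group(2)        # %2: first two digits of v
    after = cond.group(3) or ""   # %3: query string after v (may be unset)
    # $1%2$2 builds the new path, %1%3 rebuilds the query string without v
    return rule.group(1) + digits + rule.group(2), before + after


print(rewrite("bar/something.html", "v=19xxx"))  # ('bar19/something.html', '')
print(rewrite("bar/something.html", ""))         # ('bar/something.html', '')
```

If v sits between two other parameters, the rebuilt query string keeps both neighbouring separators (a=1&v=27abc&b=2 becomes a=1&&b=2), which most query parsers should tolerate.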
So, after searching for a solution all over this community, my question is as follows:
I'm working within the Wordpress environment on an Apache server. I have a folder within uploads named /restricted/. Everything in here (any file extension) can only be accessed if:
A cookie named 'custom_cookie' is set
And this cookie value must be a partial match of the URL request
If these conditions fail, an image is served instead. Inside this /restricted/ folder I have a .htaccess file. Everything should preferably be done in that .htaccess file, not in the root .htaccess file.
The cookie is set by functions.php, no problem with that part. And comments about security are not the question here.
This is an url example (localhost): http://localhost/komfortkonsult/wp-content/uploads/restricted/some-file.jpg?r=870603c9d23f2b7ea7882e89923582d7
The first condition (a cookie named custom_cookie is set) is working with this:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /komfortkonsult/
RewriteCond %{REQUEST_URI} ^.*uploads/restricted/.*
RewriteCond %{HTTP_COOKIE} !custom_cookie
RewriteRule . /komfortkonsult/restricted.png [R,L]
</IfModule>
However, for the next part I'm totally lost. I tried and failed with the following approaches:
RewriteCond %{HTTP_COOKIE} custom_cookie=(.*)$
RewriteCond %1::%{REQUEST_URI} ^(.*?)::/\1/?
RewriteRule . /komfortkonsult/restricted.png [R,L]
Likewise:
RewriteCond %{QUERY_STRING} ^r=(.*)$
RewriteRule ^/ - [E=COOKIE_MATCH:%1]
RewriteCond %{HTTP_COOKIE} !custom_cookie="%{ENV:COOKIE_MATCH}"
RewriteRule . /komfortkonsult/restricted.png [R,L]
Likewise:
RewriteCond %{HTTP_COOKIE} custom_cookie=([^;]+) [NC]
RewriteCond %{REQUEST_URI} !%1 [NC]
RewriteRule . /komfortkonsult/restricted.png [R,L]
And so on. I really want to keep this inside the .htaccess, instead of validating through a .php file call. But if that is the only solution for my architecture, please provide a full working example (not foo=bar, your redirects go here...)
Any other approaches of my objectives are welcome.
Thanks so much for helping me out with this.
/ Intervik
Update (after accepted answer and working) example of usage
The objectives are one layer of protection in a Wordpress single install. All media, images or other files, uploaded and attached to pages, are hidden (replaced by an image) if A) the user is not logged-in or B) The user is logged in but not with the capability of 'edit_post'.
But the restriction is only for files uploaded into a unique folder called /restricted/. The folder resides in the Wordpress original /uploads/ root. This restricted material must not be direct-linked or accessible by search engines etc. No browser cache is allowed and the restriction must work immediately after log-out. And more... but I think you get it.
The namespace 'custom_cookie' is just a providing example. And the examples showing the Wordpress install is within a subfolder on localhost. LIKE h**p://example.com/workspace/. Remove 'workspace/' if in root.
The cookie architecture, functions.php
function intervik_theme_set_custom_cookie(){
    if(is_user_logged_in()){
        global $current_user;
        if(current_user_can('edit_posts')){
            if(!isset($_COOKIE['custom_cookie'])){
                // $current_user->roles is an array, so join it before concatenating
                $cookie_value = $current_user->ID . '|' . $current_user->user_login . '|' . implode(',', $current_user->roles);
                $salt = wp_salt('auth');
                $cookie_hash = hash_hmac('md5', $cookie_value, $salt);
                setcookie('custom_cookie', $cookie_hash, time()+36, '/');
                $_COOKIE['custom_cookie'] = $cookie_hash;
            } else {
                // Recompute the expected hash and drop the cookie if it does not match
                $cookie_value = $current_user->ID . '|' . $current_user->user_login . '|' . implode(',', $current_user->roles);
                $salt = wp_salt('auth');
                $cookie_hash = hash_hmac('md5', $cookie_value, $salt);
                if($cookie_hash != $_COOKIE['custom_cookie']){
                    setcookie('custom_cookie', '', 1, '/');
                    unset($_COOKIE['custom_cookie']);
                }
            }
        } else {
            // Logged in but without the capability: remove any existing cookie
            if(isset($_COOKIE['custom_cookie'])){
                setcookie('custom_cookie', '', 1, '/');
                unset($_COOKIE['custom_cookie']);
            }
        }
    } else {
        // Not logged in: remove any existing cookie
        if(isset($_COOKIE['custom_cookie'])){
            setcookie('custom_cookie', '', 1, '/');
            unset($_COOKIE['custom_cookie']);
        }
    }
}
add_action('init', 'intervik_theme_set_custom_cookie');
As you can see, each cookie value is unique to a valid user, and the cookie expires after 36 seconds (enough for a page load, but use +120 for 2 minutes). This "token" is applied to every request sent to the server:
The link to attachment url filter:
function intervik_restricted_wp_get_attachment_url($url, $post_id){
    if(strpos($url, '/restricted/') !== FALSE){
        if(isset($_COOKIE['custom_cookie'])){
            $url = add_query_arg('r', $_COOKIE['custom_cookie'], $url);
        }
    }
    return $url;
}
add_filter('wp_get_attachment_url', 'intervik_restricted_wp_get_attachment_url', 10, 2);
We are not allowing any other query strings. Note that more filters must be added for sizes, such as wp_get_attachment_image_src, but for direct links to media this is enough.
Replacing the if(current_user_can('edit_posts')) check with a plain
if(is_user_logged_in()) changes everything to a simple logged-in/logged-out
restriction. You can then skip the filters in the admin backend with
if(!is_admin() && strpos($url, '/restricted/') !== FALSE) ...
And finally the .htaccess file, in the root of the uploads/restricted/ folder:
# BEGIN Intervik
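# Apache 2.4 rejects Options lines that mix signed and unsigned keywords,
# so both changes are expressed with explicit +/- prefixes
Options +FollowSymLinks -Indexes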
<IfModule !mod_rewrite.c>
Deny from all
</IfModule>
<IfModule mod_headers.c>
Header set Cache-Control "no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires 0
</IfModule>
RewriteEngine On
RewriteCond %{HTTP_COOKIE}::%{QUERY_STRING} !\bcustom_cookie=([0-9a-f]{32})\b.*::r=\1(&|$)
RewriteRule . /workspace/restricted.png? [R,L]
# END Intervik
I also placed the nice PNG image "Restricted Access timeout" in the Wordpress install root. This is also served as the thumbnail in the Library admin area for non-valid administrators. The upload filter or backend is another area.
We are not protecting England's financial plans here, but we want to keep
some paperwork for an organisation and some pictures away from Google and from
your wife.
Please comment
It's actually working, and you are welcome to comment on the flaws or security risks. There is also another validation layer with PHP above this layer in our install, but we need speed for the less important stuff.
You've got some of the correct bits in your different attempts, but you need to bring them together in the correct order.
Try the following instead:
RewriteEngine On
# custom_cookie value is 32 char hex and must match the value of the "r" URL parameter
RewriteCond %{HTTP_COOKIE}::%{QUERY_STRING} !\bcustom_cookie=([0-9a-f]{32})\b.*::r=\1(&|$)
RewriteRule ^ /komfortkonsult/restricted.png [QSD,R,L]
The QSD flag (Apache 2.4+) is required to remove the query string from the redirected URL. Alternatively, if you are still using Apache 2.2 then you can append a ? to the substitution instead.
Note that the RewriteBase is not required here. The <IfModule mod_rewrite.c> wrapper should also be removed: it is only appropriate if the site is intended to work without mod_rewrite being available, and this one is not. If the wrapper is kept and mod_rewrite is not available then your directives will simply be skipped silently and access will be unrestricted. In this case, it is preferable to fail with an error so that access is denied (for everyone).
Assumptions:
The cookie value is a 32 character hex value (as in your example).
The r URL parameter is always the first URL parameter (as in your example).
You mentioned "any file extension", however, redirecting to an image only really "works" if an image is being requested in the first place. If you have files other than images it may be preferable to simply return a 403 Forbidden. (Strictly speaking, sending a 403 is the correct response rather than a 302, followed by 200 OK.) To send a 403 instead, just change the RewriteRule directive to read:
RewriteRule ^ - [F]
How this works...
An important point, that is missed from all but one of your examples, is the r URL parameter is part of the query string, not the URL-path. The REQUEST_URI server variable contains the URL-path only, which notably excludes the query string. To match the query string you need to compare against the QUERY_STRING server variable.
%{HTTP_COOKIE}::%{QUERY_STRING} - The Cookie HTTP request header is joined with the query string using a separator (::) that should not appear in either value. This forms the TestString.
!\bcustom_cookie=([0-9a-f]{32})\b.*::r=\1(&|$) - This is the CondPattern that is matched against the TestString. \b is a word boundary, so we match only this specific cookie. The value of this cookie is captured using ([0-9a-f]{32}). We then skip over any remaining characters in the cookie header until we get to our separator (::). After this we are matching against the query string (the value of the QUERY_STRING server variable in the TestString). The "magic" is the \1 backreference to the first captured group, i.e. the cookie value.
The ! prefix on the CondPattern negates the entire pattern. So, the condition is successful when this pattern does not match, ie. when the values of the cookie and URL parameter are different (or not present at all).
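The behaviour of this condition can be tried out away from Apache. The following Python sketch applies the same pattern to a hand-built TestString; the helper name is_blocked and the sample cookie headers are invented for illustration, and Python's re module is only assumed to behave like Apache's PCRE for this particular pattern.

```python
import re

# Same pattern as the CondPattern, without the leading "!" negation
PATTERN = re.compile(r"\bcustom_cookie=([0-9a-f]{32})\b.*::r=\1(&|$)")


def is_blocked(cookie_header, query_string):
    """True when the rule would fire, i.e. when the negated condition succeeds."""
    test_string = cookie_header + "::" + query_string
    # RewriteCond patterns are unanchored, hence search() rather than match()
    return PATTERN.search(test_string) is None


token = "870603c9d23f2b7ea7882e89923582d7"
print(is_blocked("a=1; custom_cookie=" + token + "; b=2", "r=" + token))  # False
print(is_blocked("custom_cookie=" + token, "r=" + "0" * 32))              # True
print(is_blocked("", "r=" + token))                                       # True
```

The first call passes because the cookie and the r parameter carry the same value, regardless of where the cookie sits in the header. The other two are blocked: a mismatched value, and no cookie at all.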
Why your attempts were not working...
RewriteCond %{HTTP_COOKIE} custom_cookie=(.*)$
RewriteCond %1::%{REQUEST_URI} ^(.*?)::/\1/?
This assumes your cookie is the last cookie in the Cookie header. This is difficult to guarantee.
You are trying to match the cookie value with the entire URL-path (REQUEST_URI), so this will never match. It assumes your URL is of the form: http://localhost/870603c9d23f2b7ea7882e89923582d7.
RewriteCond %{QUERY_STRING} ^r=(.*)$
RewriteRule ^/ - [E=COOKIE_MATCH:%1]
RewriteCond %{HTTP_COOKIE} !custom_cookie="%{ENV:COOKIE_MATCH}"
Good, you are checking the query string for the URL parameter value. However...
The first RewriteRule never matches because the URL-path never starts with a slash in per-directory (.htaccess) context. Consequently, the COOKIE_MATCH environment variable is never set.
The CondPattern is a regex, not a plain string, so %{ENV:COOKIE_MATCH} is not evaluated - it is seen as a literal string. You've also enclosed this in double quotes, which aren't part of the cookie value either.
RewriteCond %{HTTP_COOKIE} custom_cookie=([^;]+) [NC]
RewriteCond %{REQUEST_URI} !%1 [NC]
Again, you are comparing against the URL-path, not the query string. However, as mentioned above, the %1 backreference is not evaluated in the CondPattern, so this is seen as a literal string anyway.
It is because the %{VARIABLE} (and %1 etc.) expressions are not evaluated in the CondPattern that we need to use the seemingly complex expression that uses a regex backreference of the form:
%{VAR1}##%{VAR2} ^(.+)##\1$
I need to be able to match question marks because there was a translated text encoding mistake, and part of the URL ended up hardcoded with question marks in them. Here's a URL example that I need to rewrite:
https://example.com/Documentation/Product????/index.html
Here is my current rewrite rule. It works when the characters following "Product" are not question marks, but when they are, the rule doesn't apply.
RewriteRule "^Documentation/Product[^/]+/(.*)$" "https://s3.amazonaws.com/company-documentation/Help/Product/$1" [L,NC]
How would I make sure that question marks are considered to be characters too in this rule? I can't expect that only question marks and not the original non-English characters will be in the URL, so I want the rule above to match both question marks and any other character.
I found this topic which seems relevant, but the flags don't help, and the answer doesn't explain how to overcome the problem mentioned in the "Aside".
https://webmasters.stackexchange.com/questions/107259/url-path-with-encoded-question-mark-results-in-incorrect-redirect-when-copied-to
https://example.com/Documentation/Product????/index.html
You say it's "not a query string", but actually that is exactly what it is. And that is why you can't match it with the RewriteRule pattern. The above URL is split as follows:
URL-path: /Documentation/Product (matched by the RewriteRule pattern)
Query string: ???/index.html (note 3 ? - the first one starts the query string)
To match the query string you'll need an additional RewriteCond directive that checks against the QUERY_STRING server variable.
For example, to match the above URL, you would need to do something like:
RewriteCond %{QUERY_STRING} ^\?*/index\.html
RewriteRule ^Documentation/Product$ https://s3.amazonaws.com/company-documentation/Help/Product/index.html [NC,R,L]
This matches any number of erroneous ? at the start of the query string.
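The split described above can be checked with any standard URL parser. This short Python sketch uses urllib.parse as a stand-in for what the server does (an assumption made for illustration only) and then applies the same query-string pattern as the condition:

```python
import re
from urllib.parse import urlsplit

url = "https://example.com/Documentation/Product????/index.html"
parts = urlsplit(url)
print(parts.path)   # /Documentation/Product
print(parts.query)  # ???/index.html

# Same pattern as: RewriteCond %{QUERY_STRING} ^\?*/index\.html
condition_matches = re.search(r"^\?*/index\.html", parts.query) is not None
print(condition_matches)  # True
```

Only the first ? acts as the delimiter, which is why three of the four question marks end up inside the query string.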
I've added the R (redirect) flag. Your directive (without the R flag) would trigger an external redirect anyway (because you are specifying an absolute URL in the substitution), but it is far better to be explicit here. This is also a temporary (302) redirect. If this should be permanent (301) then change it to R=301, but only once you have confirmed that it's working OK (301s are cached hard by the browser so can make testing problematic).
UPDATE:
...so I want the rule above to match both question marks and any other character.
Only if there are question marks in the URL will there be a query string, so I think it is advisable to keep these two rules separate.
If there could be any erroneous characters at the start of the query string and if you want to capture the end part of the URL (like you are doing in your original directive, eg. index.html) then you can modify the above to read:
RewriteCond %{QUERY_STRING} /(.*)$
RewriteRule ^Documentation/Product$ https://s3.amazonaws.com/company-documentation/Help/Product/%1 [NC,R,L]
Note the %1 (as opposed to $1) backreference in the substitution string. This is a backreference to the captured group in the last matched CondPattern (ie. /(.*)$).
You can follow this with your existing directive (but remember to include the R flag) for more "normal" URLs that don't contain a ? (ie. query string).
NB: Surrounding the arguments in double quotes is entirely optional in this example. They are only required if you have unescaped spaces in the pattern or substitution arguments.
In summary
# Redirect URLs of the form:
# "/Documentation/Product?<anything#1>/<anything#2>"
RewriteCond %{QUERY_STRING} /(.*)$
RewriteRule ^Documentation/Product$ https://s3.amazonaws.com/company-documentation/Help/Product/%1 [NC,R,L]
# Redirect URL-paths of the form (no query string):
# "/Documentation/Product<something>/<anything>"
RewriteRule ^Documentation/Product[^/]+/(.*) https://s3.amazonaws.com/company-documentation/Help/Product/$1 [NC,R,L]
I need to substitute the character %26 for & and %3D for = in my URL:
http://www.example.com/dir/?sort-by=title%26listing_types%3Dcars
I tried the rewrite below but it does not work
RewriteRule ^dir/?$ ?sort-by=$1&listing-types=$2 [QSA,L]
Any help will be welcome
RewriteRule ^dir/?$ ?sort-by=$1&listing-types=$2 [QSA,L]
Your backreferences $1 and $2 don't actually refer to anything so these will always be empty (resulting in empty URL parameters). However, $n backreferences refer back to the RewriteRule pattern, which does not match against the query string anyway. You would need %n type backreferences that refer back to the last matched CondPattern in a preceding RewriteCond directive.
This also looks like it should ideally be a redirect, rather than an internal rewrite? Otherwise, you run the risk of duplicate content.
http://www.example.com/dir/?sort-by=title%26listing_types%3Dcars
You can do something like the following to match the above URL and replace the appropriate characters:
RewriteCond %{QUERY_STRING} ^(sort-by=.*)%26(listing_types)%3D(.*)
RewriteRule ^(dir/)$ /$1?%1&%2=%3 [R,L]
The above would temporarily (302) redirect /dir/?sort-by=<anything1>%26listing_types%3D<anything2> to /dir/?sort-by=<anything1>&listing_types=<anything2>. Change R to R=301 if this should be permanent, but only once you have confirmed it is working OK.
$1 is a backreference to the captured group in the RewriteRule pattern (ie. "dir/") and %1, %2 and %3 are backreferences to the corresponding captured groups in the preceding CondPattern in order to reconstruct the query string.
If you specifically need this to be an internal rewrite then remove the R flag (and optionally remove the slash prefix on the substitution).
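To check what the condition captures, the same regex can be run in Python against the raw (still percent-encoded) query string. The function name fix_query is hypothetical, and it only rebuilds the query string in the way the %1&%2=%3 part of the substitution does:

```python
import re

# Mirrors: RewriteCond %{QUERY_STRING} ^(sort-by=.*)%26(listing_types)%3D(.*)
COND = re.compile(r"^(sort-by=.*)%26(listing_types)%3D(.*)")


def fix_query(query):
    """Rebuild the query string as %1&%2=%3, or return it unchanged."""
    m = COND.search(query)
    if m is None:
        return query  # condition fails, so the rule would not fire
    return m.group(1) + "&" + m.group(2) + "=" + m.group(3)


print(fix_query("sort-by=title%26listing_types%3Dcars"))
# sort-by=title&listing_types=cars
```

A query string that is already correct does not match the condition and comes back unchanged, so the redirect would not be triggered a second time.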
http://www.example.com/dir/?sort-by=title%26listing_types%3Dcars
As mentioned in comments, you could instead handle this entirely in your server-side code. For example, in PHP you could do something like:
<?php
$queryString = urldecode($_SERVER['QUERY_STRING']);
parse_str($queryString,$urlParams);
print_r($urlParams);
?>
Given the above request URL, this will output:
Array
(
[sort-by] => title
[listing_types] => cars
)
I've been browsing the symfony2 framework source. In the htaccess file for their example website, I found the %{REQUEST_URI}::$1 written as follows:
RewriteCond %{REQUEST_URI}::$1 ^(/.+)(.+)::\2$
RewriteRule ^(.*) - [E=BASE:%1]
The comment above that rule explains
The following rewrites all other queries to the front controller. The condition ensures that if you are using Apache aliases to do mass virtual hosting, the base path will be prepended to allow proper resolution of the app.php file; it will work in non-aliased environments as well, providing a safe, one-size fits all solution.
However, that doesn't explain the ::$1 or ::\2.
Are they backreferences? If not, what are they? What is their purpose?
I have encountered almost the same htaccess file in my Zend project, and here are my thoughts and hope it helps.
The htaccess file (located at the Zend project directory, same as index.php) says
RewriteCond %{REQUEST_URI}::$1 ^(/.+)(.+)::\2$
RewriteRule ^(.*)$ - [E=BASE:%1]
RewriteRule ^(.*)$ %{ENV:BASE}index.php [NC,L]
Suppose Zend is installed at http://mydomain.tld/zend (let's call it yourdomain later on)
and we are requesting yourdomain/mycontroller/myaction
Therefore %{REQUEST_URI} will be /zend/mycontroller/myaction.
Note that $1 is a backreference to the group captured by the pattern in the RewriteRule directive. In the htaccess context [1], that pattern "will initially be matched against the filesystem path, after removing the prefix that led the server to the current RewriteRule (e.g. app1/index.html or index.html depending on where the directives are defined)".
Therefore $1 will be mycontroller/myaction.
And %{REQUEST_URI}::$1 will be /zend/mycontroller/myaction::mycontroller/myaction.
The above string will be matched against ^(/.+)(.+)::\2$. Note that the two capturing groups in parentheses, i.e. (/.+)(.+), before the :: can match in many combinations. For example:
Group 1: /z
Group 2: end/mycontroller/myaction
or
Group 1: /zend/mycontroller/myactio
Group 2: n
and anything in between is a valid match. In fact, the most interesting one would be
Group 1: /zend/
Group 2: mycontroller/myaction
which (is the only case that) makes backreference \2 (after ::) to the second group a match.
In this case, /zend/ will be stored in the environment variable BASE which is what the first RewriteRule does. The %1 refers to the first matched string in RewriteCond which is /zend/.
Looking at the second RewriteRule, it is clear why this is needed. As index.php can only be found at /zend/index.php, we need to add /zend/ in front of index.php.
Here we assume to use the URL-path as Substitution for the second RewriteRule directive. Refer to [1] and search for "A DocumentRoot-relative path to the resource to be served" under the RewriteRule Directive section.
All the above leave the query string unchanged/untouched. It is up to index.php how to parse the query string (as well as the URI).
Lastly goes the case where Zend is installed at the domain root.
%{REQUEST_URI} will be /mycontroller/myaction.
$1 will be mycontroller/myaction.
The string to be matched by RewriteCond will be /mycontroller/myaction::mycontroller/myaction.
This time the second group in (/.+)(.+) will never match mycontroller/myaction, as there needs to be at least one character following the initial slash for the first group. The second group can therefore be at most ycontroller/myaction, never exactly mycontroller/myaction, so there cannot be a match.
As a result, the first RewriteRule is not used. The BASE environment variable will not be set, and when the second RewriteRule uses it, it will simply be empty.
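Both cases can be reproduced with the same regex in Python. This is just a sketch of the matching step: the helper name base_for is made up, and the two inputs are the REQUEST_URI and $1 values worked out above.

```python
import re

# Mirrors: RewriteCond %{REQUEST_URI}::$1 ^(/.+)(.+)::\2$
PATTERN = re.compile(r"^(/.+)(.+)::\2$")


def base_for(request_uri, rule_path):
    """Return what %1 (and so BASE) would hold, or "" when there is no match."""
    m = PATTERN.match(request_uri + "::" + rule_path)
    return m.group(1) if m else ""


# Installed in a subdirectory: the prefix /zend/ is recovered
print(repr(base_for("/zend/mycontroller/myaction", "mycontroller/myaction")))  # '/zend/'
# Installed at the domain root: no match, so BASE stays empty
print(repr(base_for("/mycontroller/myaction", "mycontroller/myaction")))       # ''
```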
References
[1] http://httpd.apache.org/docs/current/mod/mod_rewrite.html
The $1 in %{REQUEST_URI}::$1 references the matched string of the RewriteRule directive, i.e., the matched string of .* in ^(.*). So %{REQUEST_URI}::$1 is expanded to the requested URI path as supplied by the user and the current internal URI path (the rule pattern never sees the query string), separated by ::.
The pattern ^(/.+)(.+)::\2$ is used to find a prefix (first capturing group) which makes the remaining part match the part behind the :: (\2 is a back reference to the matched string of the second capturing group of the pattern).
If such a match is found, the prefix is stored in the environment variable BASE ([E=BASE:%1], where %1 references the matched string of the previous successful RewriteCond pattern match).
I am learning how to write regular expressions for .htaccess redirects.
So far I've managed to figure out everything I needed, except for a couple of regular expressions which don't behave as I expected. I am testing my regular expressions using a desktop application, and they work fine there, but not in the .htaccess file.
FYI: The RewriteBase is set to /site/
This is the incoming URL:
/site/view-by-tag/politics/?el_mcal_month=3&el_mcal_year=2009
I want to grab "politics" and redirect to /site/tags/politics/
Here is what I used:
RewriteRule ^view-by-tag/([a-zA-Z\-]+)/([a-zA-Z0-9\-\/\.\_\=\?\&]+) /tags/$1/ [R=301,L]
I added the capture of all the characters after politics because I am having the issue that when there is a ? in the URL the redirect does not work, and I can't figure out why. In the URL given above, if I remove the ? it works fine, but if the ? is in there, nothing happens. Is there a reason for this?
The same thing happens when I try to capture 307 from /site/?option=com_content&view=article&id=307&catid=89&Itemid=55
I used this regular expression, article&id=([0-9]+) /?p=$1 [R=301,L] but again, when there is a ? in the URL it stops the redirect from doing anything.
What is the reason for that?
The .htaccess file in question is on a Wordpress blog (3.4.1)
The point that you've missed is that the rewrite engine splits the URI into two parts: the REQUEST_URI and the QUERY_STRING. The query string part isn't used in the rule match string so there is no point in constructing rule regexp patterns to look for it.
You can probe and pick out parameters from the query string by using rewrite conditions and condition regexps to set %N variables.
By default the query string is appended to the output substitution string unless you have a ?someparam in it -- in which case it is ignored unless you used the [QSA] (query string append) parameter.
The way that you'd pick up the id in /site/?option=com_content&view=article&id=307&catid=89&Itemid=55 is to use something like:
RewriteCond %{QUERY_STRING} \bid=(\d+)
Before the rule and this would set %1 to 307. Read the rewrite documentation for more general discussion of how to do this.
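As a quick check of what that condition would capture, the same regex can be run in Python against the query string from the question. The function name capture_id is made up for this sketch:

```python
import re


def capture_id(query_string):
    """Return what %1 would hold after the RewriteCond, or None if it fails."""
    m = re.search(r"\bid=(\d+)", query_string)
    return m.group(1) if m else None


qs = "option=com_content&view=article&id=307&catid=89&Itemid=55"
print(capture_id(qs))                # 307
print(capture_id("catid=89&id=5"))   # 5
print(capture_id("catid=89"))        # None
```

The \b word boundary is what stops the pattern from latching onto catid= or Itemid=, since there is no boundary between the letters before "id" in those names.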
The query string must be processed separately in a RewriteCond if you need to manipulate it, and should not be matched inside the RewriteRule. Instead, just match the request path without the query string, and use QSA to append the query string onto the redirect:
RewriteRule ^view-by-tag/([A-Za-z-]+)/?$ /tags/$1/ [R=301,L,QSA]
# OR, if you don't want the rest of the query string appended, put a `?` onto
# the redirect to replace it with nothing
RewriteRule ^view-by-tag/([A-Za-z-]+)/?$ /tags/$1/? [R=301,L]
Actually, the QSA is not needed in an R redirect here: when the substitution contains no query string of its own, the default behavior is to pass the original query string along with the redirect.
If you need to capture 307 from the query string, do it in a RewriteCond and capture in %1:
# Capture the id in %1
RewriteCond %{QUERY_STRING} id=([\d]+)
# Redirect everything to /, pass %1 into p
RewriteRule . /?p=%1 [R=301,L]