Removal of trailing dot in RewriteRule of .htaccess

The .htaccess rewrite rule applied in a restful database application:
RewriteRule ^author/([A-z.]+)/([A-z]+)$ get_author.php?first_name=$1&last_name=$2
applied to
http://localhost:8080/API/author/J./Doe
removes the period from "J." and the resulting name "J Doe" is obviously not in the database (while "J. Doe" is). This rewrite rule only removes a trailing period, e.g. "J.O" translates correctly to "J.O". I use XAMPP 7.0.6 plus Apache under Windows 10. What to do in order to NOT remove the trailing dot on the initial?
Update:
Apparently my question wasn't clear, so I'll give it another try.
The regexp (RewriteRule) above is supposed to assign "J." to the variable $1. Instead it assigns "J" to $1, in other words, the regex drops the trailing dot. Secondly, the regex assigns "Doe" to the variable $2, this assignment is as expected and correct. The variables $1 (with incorrect value "J") and $2 (with correct value "Doe") are used in a database search. This search fails because of the missing dot. The database contains "J. Doe", but not "J Doe".
When a dot is not trailing, as in "J.O", the variable $1 gets the correct value "J.O". In other words, the regex does not remove all dots, only the trailing ones.
My question is: how can I tell (the rewrite engine of) .htaccess to apply the regexp correctly?
For comparison, the following piece of JS code does what I want:
var regexp = "^author/([A-z.]+)/([A-z]+)$";
var result = "author/J./Doe".match(regexp);
alert(result[1] + " " + result[2]);

This is apparently (still) a "feature": https://bz.apache.org/bugzilla/show_bug.cgi?id=20036
Problem: Apache strips all trailing dots and spaces unless the path segment is exactly "." or "..".
I ran into the problem because I tried to map a URL from get/a/b/c to get.php?param1=a&param2=b&param3=c, but c can legitimately have trailing dots. The issue is not actually mod_rewrite related but happens with regular URLs too, for example with the URL of a favicon file that is definitely not named this way. Other servers, such as the one serving the Stack Overflow favicon file, don't do this, which turns this into a way to detect an Apache server when the HTTP Server header is stripped.
To work around this problem, I still map the URL using mod_rewrite, but then in the PHP script, I use the exact same regex to manually map the parameters:
if (preg_match('#/get/([^/]+)/([^/]+)/(.+)$#', $_SERVER['REQUEST_URI'], $matches)) {
    $param1 = $matches[1];
    $param2 = $matches[2];
    $param3 = $matches[3];
}
Instead of using the PATH_INFO, I use the REQUEST_URI because it's untouched.
This means that if you absolutely need to pass trailing dots in a path string to a backend using Apache, your best bet right now is to write an intermediate script that extracts the proper parameters and then does the proxy request for you.
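For illustration only, here is the same idea as a minimal Python sketch instead of PHP. The function name extract_params and the sample paths are hypothetical; it assumes the backend can see the raw request URI (what PHP exposes as REQUEST_URI), and it drops any query string before matching.

```python
import re
from urllib.parse import unquote, urlsplit

# Same pattern as the PHP snippet above.
PATTERN = re.compile(r"/get/([^/]+)/([^/]+)/(.+)$")

def extract_params(request_uri):
    """Return (param1, param2, param3) from a raw request URI, or None."""
    path = urlsplit(request_uri).path  # drop "?query" if present
    match = PATTERN.search(path)
    if match is None:
        return None
    # The raw URI is still percent-encoded, so decode each captured part.
    return tuple(unquote(part) for part in match.groups())

print(extract_params("/get/a/b/c.."))       # trailing dots are kept
print(extract_params("/get/J./Doe/x%20y"))  # percent-encoding is decoded
```

Because the match runs on the raw URI and not on the path the server has already normalised, the trailing dots are still present when the parameters are extracted.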

Related

Is it possible to put a line break into a 'mailto:' rewrite in htaccess?

For complex reasons I've had to remove an enquiry form from a web site and use a 'mailto:' instead. For simplicity I've changed the htaccess file so that the former 'contact' link to the form now becomes a 'mailto:' as follows:
RewriteRule ^contact$ mailto:myname#mydomain.com?subject=BusinessName\ BandB\ Enquiry&body=You\ can\ find\ our\ availability\ on\ line.\ Delete\ this\ content\ if\ inapplicable
That does work, my local e-mail client (Thunderbird) opens with the information correctly shown in subject and body. (My TB is set to compose in plain text, I've yet to test with HTML)
I would like to introduce a new line in the body so that 'Delete this content if inapplicable' is on a separate line. Is there any way to do this? Given mod_rewrite's intended purpose I could understand if there isn't but I thought I'd ask before giving up.
I would like to introduce a new line in the body so that 'Delete this content if inapplicable' is on a separate line.
New lines in the body are represented by two characters: carriage return (char 13) + line feed (char 10) (see RFC2368). This would need to be URL encoded in the resulting URL as %0D%0A.
When used in the RewriteRule substitution string, the literal % characters need to be backslash-escaped to negate their special meaning as a backreference to the preceding CondPattern (of which there is none here), i.e. \%0D\%0A. Otherwise, you will end up with the string DA, because there is no %0 backreference in this example.
You can also avoid having to backslash-escape all the literal spaces by enclosing the entire argument (substitution string) in double quotes.
So, try the following instead:
RewriteRule ^contact$ "mailto:myname#mydomain.com?subject=BusinessName BandB Enquiry&body=You can find our availability on line.\%0D\%0ADelete this content if inapplicable" [R,L]
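To see what the encoded line break looks like in the final URL, here is a small Python sketch that builds a mailto: link outside of mod_rewrite. The helper name build_mailto and the address someone@example.com are made up for the example; only the %0D%0A encoding of CR+LF comes from the answer above.

```python
from urllib.parse import quote

def build_mailto(address, subject, body):
    """Build a mailto: URL; line breaks in the body become %0D%0A."""
    # Normalise every line break to CR+LF before encoding.
    body = body.replace("\r\n", "\n").replace("\n", "\r\n")
    return "mailto:{}?subject={}&body={}".format(
        quote(address, safe="@"),
        quote(subject, safe=""),
        quote(body, safe=""),
    )

url = build_mailto(
    "someone@example.com",
    "BandB Enquiry",
    "You can find our availability on line.\nDelete this content if inapplicable",
)
print(url)
```

The printed URL contains "line.%0D%0ADelete", which is the sequence the RewriteRule has to emit literally.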

Encoded URL in htaccess 301 redirect (mod_rewrite)

I am lost after digging into this matter for many days. We have the following redirects:
RewriteRule ^something/something2/?$ http://test.com/blabla?key=blablabla1287963%3D [R=301,L,E=OUTLINK:1]
Header always set X-Robots-Tag "noindex" env=OUTLINK
Unfortunately that %3D got stripped by the module (mod_rewrite). The main problem is that I know how to manually fix it but I have multiple similar redirects and I need a "global solution". Please note that moving back to redirect 301 (I had no issues with redirect 301 and encoded URLs/characters) is not an option since I want to use noindex...
Thank you!
that %3D got stripped by the module
I think you'll find that it's the %3 that gets stripped, not %3D. %3 is seen as a backreference to a preceding condition - which I suspect doesn't exist - so gets replaced with an empty string in the substitution. (This would not have been a problem with Redirect since %N backreferences aren't a thing with mod_alias.)
You need to backslash escape the % to represent a literal % in the substitution string in order to negate its special meaning in this case.
You will then need the NE flag on the RewriteRule to prevent the % itself from being URL encoded (as %25) in the response (essentially doubly encoding the URL param value).
For example:
RewriteRule ^foo$ http://test.com/blabla?key=blablabla1287961\%3D [NE,R=302,L,E=OUTLINK:1]
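The effect can be mimicked with a toy model in Python. This is not how Apache is implemented; expand_substitution is a hypothetical helper that only imitates the two behaviours described above: %N is replaced by the N-th RewriteCond group (or by an empty string if there is none), and \% yields a literal %.

```python
import re

def expand_substitution(substitution, cond_groups=()):
    """Toy model of %N handling in a RewriteRule substitution string."""
    def replace(match):
        if match.group(1):           # "\%" -> literal percent sign
            return "%"
        index = int(match.group(2))  # "%N" -> N-th RewriteCond group
        return cond_groups[index] if index < len(cond_groups) else ""
    return re.sub(r"\\(%)|%(\d)", replace, substitution)

# Unescaped: "%3" is treated as an (empty) backreference, leaving only "D".
print(expand_substitution("key=blablabla1287963%3D"))
# Escaped: the percent sign survives.
print(expand_substitution(r"key=blablabla1287963\%3D"))
```

The same model reproduces the mailto case from the previous question: an unescaped %0D%0A collapses to DA.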
I have multiple similar redirects and I need a "global solution"
As far as a "global solution" goes, there isn't a magic switch you can enable on the server that will "fix" this. You need to modify each directive where this conflict occurs.

what does $1 in .htaccess file mean?

I am trying to understand the meaning of this line in the .htaccess file
RewriteRule ([a-z0-9/-]+).html $1.php [NC,L,QSA]
Basically, what does $1.php mean? Which file on the server does it refer to?
If we have home.html, where is this going to redirect to? home.php?
$1 is the first captured group from your regular expression; that is, the contents between ( and ). If you had a second set of parentheses in your regex, $2 would contain the contents of those parens. Here is an example:
RewriteRule ([a-z0-9/-]+)-([a-z]+).html$ $1-$2.php [NC,L,QSA]
Say a user navigates to hello-there.html. They would be served hello-there.php. In your substitution string, $1 contains the contents of the first set of parens (hello), while $2 contains the contents of the second set (there). There will always be exactly as many "dollar" values available in your substitution string as there are sets of capturing parentheses in your regex.
If you have nested parens, say, (([a-z]+)-[a-z]+), $1 always refers to the outermost capture (in this case the whole regex), $2 is the first nested set, and so on.
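The numbering of capture groups is the same in most regex engines, so it can be tried out away from Apache. A minimal Python sketch, assuming Python's re module behaves like mod_rewrite's PCRE for these simple patterns (the dot before html is escaped here, and \1 plays the role of $1):

```python
import re

# Two capture groups, as in the RewriteRule above.
pattern = re.compile(r"([a-z0-9/-]+)-([a-z]+)\.html$", re.IGNORECASE)

match = pattern.search("hello-there.html")
print(match.group(1), match.group(2))  # first and second capture

# Substitution: \1 and \2 correspond to $1 and $2 in the RewriteRule.
rewritten = pattern.sub(r"\1-\2.php", "hello-there.html")
print(rewritten)

# Nested groups are numbered by the position of their opening parenthesis.
nested = re.match(r"(([a-z]+)-[a-z]+)", "hello-there")
print(nested.group(1), nested.group(2))
```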
.htaccess files can contain a wide variety of Apache configuration directives, but this one, like many, is to do with the URL rewriting module, mod_rewrite.
A RewriteRule directive has 3 parts:
a Pattern (regular expression) which needs to match against the current URL
a Substitution string, representing the URL to serve instead, or the URL to instruct the browser to redirect to
an optional set of flags
In this case, you have a regular expression which matches anything ending in .html which consists only of letters a-z, digits 0-9, / and -. However, it also contains a set of parentheses (...), which mark a part of the pattern to be "captured".
The Substitution string can then reference this "captured" value; the first capture is $1, and the second would be $2, and so on.
In this case, the captured part is everything before the .html, and the Substitution is $1.php, meaning whatever string came before .html is kept, but the .html is thrown away and .php is stuck on instead.
So for your specific example, accessing home.html will instead act as though you had requested home.php.
It's a reference to the first capture group denoted by the parentheses in the pattern ([a-z0-9/-]+).html$. If there were two (.*)-(.*) then you would access $1 for the first capture group and $2 for the second, etc...
$1 refers to the first group captured by your regex (i.e. the part between the parentheses). In your case it refers to:
([a-z0-9/-]+)
For the URL mypage.html, $1 will contain "mypage", and the rule will redirect to mypage.php.

Need a mod_rewrite .htaccess solution to replace %20 spaces with -'s in the finished URL

I need an .htaccess mod_rewrite solution that will take a .cgi search query like this:
www.mydomain.com/cgi-bin/finda/therapist.cgi?Therapy_Type=Pilates Training&City=Los Angeles&State=CA
and return matching results in the browser's address bar to look like this:
www.mydomain.com/therapists/Pilates-Training-Los-Angeles-CA.html
or better yet:
www.mydomain.com/therapists/pilates-training-los-angeles-ca.html
Notice the database includes values with one, two or three words + spaces...
For example:
Therapy_Type=Pilates Training <- includes a space
City=Los Angeles <- includes a space
State=CA <- no space
I used the tool at: http://www.generateit.net/mod-rewrite/ to generate the following RewriteRule:
RewriteEngine On
RewriteRule ^([^-]*)-([^-]*)-([^-]*)\.html$ /cgi-bin/finda/therapist.cgi?Therapy_Types=$1&City=$2&State=$3 [L]
This does work (finds the search matches) and generates the results page, but because the parameter values have spaces in them, we end up with a URL that looks like this:
www.mydomain.com/therapists/Pilates%20Training-Los%20Angeles-CA.html
I've spent days in this forum and others trying to find a solution to get rid of these %20 (encoded spaces) so the final returned URL will look like 1) or 2) above.
I know someone on here must know how to do this... Help ;-)
If you replace the %20 with -, then how would you know where the therapy type ends and the city starts?
pilates-training-los-angeles-ca
would be
type=pilates
city=training
state=los
So I don't think you want to replace the %20 with -. You could however replace it with another character, like _:
pilates_training-los_angeles-ca
You then would have to translate every _ to a space within your PHP script (or whatever language you are using server side).
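As a rough sketch of that server-side translation, here it is in Python (the same logic would apply in PHP or a CGI script). The function name parse_slug is hypothetical, and it assumes the values themselves never contain "-" or "_".

```python
def parse_slug(slug):
    """Split 'type-city-state' on '-', then turn '_' back into spaces."""
    parts = slug.split("-")
    if len(parts) != 3:
        raise ValueError("expected exactly three '-'-separated parts")
    therapy_type, city, state = (part.replace("_", " ") for part in parts)
    return {"Therapy_Type": therapy_type, "City": city, "State": state}

print(parse_slug("pilates_training-los_angeles-ca"))

# With '-' used for spaces as well, the boundaries are ambiguous:
print(len("pilates-training-los-angeles-ca".split("-")))  # five parts, not three
```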

urlencoded Forward slash is breaking URL

About the system
I have URLs of this format in my project:-
http://project_name/browse_by_exam/type/tutor_search/keyword/class/new_search/1/search_exam/0/search_subject/0
Where keyword/class pair means search with "class" keyword.
I have a common index.php file which executes for every module in the project. There is only a rewrite rule to remove the index.php from URL:-
RewriteCond $1 !^(index\.php|resources|robots\.txt)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php [L,QSA]
I am using urlencode() while preparing the search URL and urldecode() while reading the search URL.
Problem
Only the forward slash character is breaking URLs causing 404 page not found error.
For example, if I search one/two the URL is
http://project_name/browse_by_exam/type/tutor_search/keyword/one%2Ftwo/new_search/1/search_exam/0/search_subject/0/page_sort/
How do I fix this? I need to keep index.php hidden in the URL. Otherwise, if that was not needed, there would have been no problem with forward slash and I could have used this URL:-
http://project_name/index.php?browse_by_exam/type/tutor_search/keyword/one%2Ftwo/new_search/1/search_exam/0/search_subject/0
Apache denies all URLs with %2F in the path part, for security reasons: scripts can't normally (i.e. without rewriting) tell the difference between %2F and / because the PATH_INFO environment variable is automatically URL-decoded (which is stupid, but it is a long-standing part of the CGI specification, so nothing can be done about it).
You can turn this feature off using the AllowEncodedSlashes directive, but note that other web servers will still disallow it (with no option to turn that off), and that other characters may also be taboo (eg. %5C), and that %00 in particular will always be blocked by both Apache and IIS. So if your application relied on being able to have %2F or other characters in a path part you'd be limiting your compatibility/deployment options.
I am using urlencode() while preparing the search URL
You should use rawurlencode(), not urlencode() for escaping path parts. urlencode() is misnamed, it is actually for application/x-www-form-urlencoded data such as in the query string or the body of a POST request, and not for other parts of the URL.
The difference is that + doesn't mean space in path parts. rawurlencode() will correctly produce %20 instead, which will work both in form-encoded data and other parts of the URL.
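The same distinction exists in other languages. As a rough analogue (not an exact equivalent of the PHP functions), Python's urllib.parse.quote_plus does form-style encoding like urlencode(), while urllib.parse.quote with safe="" does path-style encoding like rawurlencode():

```python
from urllib.parse import quote, quote_plus

term = "one two/three"

form_style = quote_plus(term)      # form encoding: space becomes "+"
path_style = quote(term, safe="")  # path encoding: space becomes "%20"

print(form_style)
print(path_style)
```

Both variants encode the slash as %2F; only the treatment of the space differs.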
Replace %2F with %252F after url encoding
PHP
function custom_http_build_query($query = array()) {
    return str_replace('%2F', '%252F', http_build_query($query));
}
Handle the request via htaccess
.htaccess
RewriteCond %{REQUEST_URI} ^(.*?)(%252F)(.*?)$ [NC]
RewriteRule . %1/%3 [R=301,L,NE]
Resources
http://www.leakon.com/archives/865
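To make the double encoding visible, here is a small Python sketch of the same replacement (the helper name double_encode_slashes is made up). The point is that %252F is simply %2F with its percent sign encoded once more, so one round of decoding gives back %2F and a second round gives back the slash.

```python
from urllib.parse import quote, unquote

def double_encode_slashes(value):
    """Percent-encode a value, then encode the '%' of every %2F again."""
    return quote(value, safe="").replace("%2F", "%252F")

encoded = double_encode_slashes("one/two")
print(encoded)                    # slash is now %252F
print(unquote(encoded))           # one decode: %2F
print(unquote(unquote(encoded)))  # two decodes: the original slash
```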
In Apache, AllowEncodedSlashes On would prevent the request from being immediately rejected with a 404.
Just another idea on how to fix this.
$encoded_url = str_replace('%2F', '/', urlencode($url));
I had the same problem with a slash in a URL GET parameter; in my case the following PHP code works:
$value = "hello/world";
$value = str_replace('/', '&#47;', $value);
$value = urlencode($value);
# $value is now hello%26%2347%3Bworld
I first replace the slash with its HTML entity and then do the URL encoding.
Here's my humble opinion. !!!! Don't !!!! change settings on the server to make your parameters work correctly. This is a time bomb waiting to happen someday when you change servers.
The best way I have found is to just convert the parameter to base 64 encoding. So in my case, I'm calling a php service from Angular and passing a parameter that could contain any value.
So my typescript code in the client looks like this:
private encodeParameter(parm: string) {
    if (!parm) {
        return null;
    }
    return btoa(parm);
}
And to retrieve the parameter in php:
$item_name = $request->getAttribute('item_name');
$item_name = base64_decode($item_name);
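One thing worth checking with this approach: the standard Base64 alphabet itself contains "/" and "+", so an encoded value can still end up with a slash in it. A minimal Python sketch of the URL-safe variant, which uses "-" and "_" instead (encode_param and decode_param are hypothetical helper names, and both client and server would have to agree on this alphabet):

```python
import base64

def encode_param(value):
    """Encode text with the URL-safe Base64 alphabet ('-' and '_')."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")

def decode_param(token):
    """Reverse encode_param."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")

# Standard Base64 can produce a slash; the URL-safe alphabet cannot.
print(base64.b64encode(b"???"))          # b'Pz8/'
print(base64.urlsafe_b64encode(b"???"))  # b'Pz8_'

token = encode_param("one/two")
print(token, decode_param(token))
```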
On my hosting account this problem was caused by a ModSecurity rule that was set for all accounts automatically. Upon my reporting this problem, their admin quickly removed this rule for my account.
Use a different character and replace the slashes server side
e.g. Drupal.org uses %21 (the exclamation mark character !) to represent the slash in a URL parameter.
Both of the links below work:
https://api.drupal.org/api/drupal/includes%21common.inc/7
https://api.drupal.org/api/drupal/includes!common.inc/7
If you're worried that the character may clash with a character in the parameter then use a combination of characters.
So your url would be
http://project_name/browse_by_exam/type/tutor_search/keyword/one_-!two/new_search/1/search_exam/0/search_subject/0
Swap it in with JS and convert it back to a slash on the server side.
For me it is simplest to use base64_encode:
$term = base64_encode($term);
$url = $youurl.'?term='.$term;
Afterwards, decode the term:
$term = base64_decode($_GET['term']);
This way both "/" and "\" are encoded.
A standard solution for this problem is to allow slashes by making the parameter that may contain slashes the last parameter in the url.
For a product code url you would then have...
mysite.com/product/details/PR12345/22
For a search term you'd have
http://project/search_exam/0/search_subject/0/keyword/Psychology/Management
(The keyword here is Psychology/Management)
It's not a massive amount of work to process the first "named" parameters then concat the remaining ones to be product code or keyword.
Some frameworks have this facility built in to their routing definitions.
This is not applicable to use cases involving two parameters that may contain slashes.
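A minimal sketch of that processing in Python, assuming name/value pairs come first and the slash-carrying parameter (here "keyword") comes last. The function name parse_path and its last_param argument are hypothetical and not taken from any particular framework.

```python
def parse_path(path, last_param="keyword"):
    """Read name/value pairs; the last parameter swallows all remaining segments."""
    segments = path.strip("/").split("/")
    params = {}
    i = 0
    while i < len(segments):
        name = segments[i]
        if name == last_param:
            # Everything after the name belongs to this value, slashes included.
            params[name] = "/".join(segments[i + 1:])
            break
        params[name] = segments[i + 1] if i + 1 < len(segments) else ""
        i += 2
    return params

print(parse_path("search_exam/0/search_subject/0/keyword/Psychology/Management"))
```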
I use the JavaScript encodeURI() function for the part of the URL that has forward slashes that should be seen as characters rather than as part of the HTTP address.
E.g.:
"/api/activites/" + encodeURI("?categorie=assemblage&nom=Manipulation/Finition")
see http://www.w3schools.com/tags/ref_urlencode.asp
I solved this by using 2 custom functions like so:
function slash_replace($query) {
    return str_replace('/', '_', $query);
}
function slash_unreplace($query) {
    return str_replace('_', '/', $query);
}
So to encode I could call:
rawurlencode(slash_replace($param))
and to decode I could call
slash_unreplace(rawurldecode($param));
Cheers!
You can use %2F if using it this way:
?param1=value1&param2=value%2Fvalue
but if you use /param1=value1/param2=value%2Fvalue it will throw an error.
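The query-string case can be checked with any standard URL parser. A short Python sketch, using a made-up example.com URL, shows that %2F inside a query value decodes to a plain slash without affecting the path:

```python
from urllib.parse import parse_qs, urlsplit

url = "http://example.com/search?param1=value1&param2=value%2Fvalue"

parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.path)        # the path is unaffected by the encoded slash
print(params["param2"])  # the value is decoded to contain "/"
```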
