URL Rewrite with encoded and "unencoded" special characters - .htaccess

I need to write redirects for a list of URLs and got some problems because they contain encoded as well as "unencoded" special characters.
Example:
http://example.com/lvl-+-1/écrire/gar%C3%A7ons/2/fr
As you can see, there is an é as well as encoded characters like %C3%A7 (for ç) in the same URL. How can I write a redirect for this?
Currently I'm trying the following, I already escaped the +characters with \:
RewriteRule ^lvl-\+-1/écrire/gar%C3%A7ons/2/fr https://www.example.com/Boîtes [NC,L,R=301,NE]
The new URLs contains special characters like î, therefore I set the NEtag.
Unfortunately this isn't working, because I guess the characters are either not encoded at all or twice.
Is there a way to catch such URLs with with "unencoded" and encoded characters?

To match % character use hex notation as \xMN.
This rule should work for you:
RewriteRule ^lvl-\+-1/écrire/gar\xC3\xA7ons/2/fr/?$ /Boîtes [NC,L,R=301,NE]

Related

Is it possible to put a line break into a 'mailto:' rewrite in htaccess?

For complex reasons I've had to remove an enquiry form from a web site and use a 'mailto:' instead. For simplicity I've changed the htaccess file so that the former 'contact' link to the form now becomes a 'mailto:' as follows:
RewriteRule ^contact$ mailto:myname#mydomain.com?subject=BusinessName\ BandB\ Enquiry&body=You\ can\ find\ our\ availability\ on\ line.\ Delete\ this\ content\ if\ inapplicable
That does work, my local e-mail client (Thunderbird) opens with the information correctly shown in subject and body. (My TB is set to compose in plain text, I've yet to test with HTML)
I would like to introduce a new line in the body so that 'Delete this content if inapplicable' is on a separate line. Is there any way to do this? Given mod_rewrite's intended purpose I could understand if there isn't but I thought I'd ask before giving up.
I would like to introduce a new line in the body so that 'Delete this content if inapplicable' is on a separate line.
New lines in the body need are represented by two characters: carriage return (char 13) + line feed (char 10) (see RFC2368). This would need to be URL encoded in the resulting URL as %0D%0A.
When used in the RewriteRule substitution string the literal % characters would need to backslash-escaped to negate their special meaning as a backreference to the preceding CondPattern (which there isn't one). ie. \%0D\%0A. Otherwise, you will end up with the string DA, because there is no %0 backreference in this example.
You can also avoid having to backslash-escape all the literal spaces by encloses the entire argument (substitution string) in double quotes.
So, try the following instead:
RewriteRule ^contact$ "mailto:myname#mydomain.com?subject=BusinessName BandB Enquiry&body=You can find our availability on line.\%0D\%0ADelete this content if inapplicable" [R,L]

htaccess RewriteRule - pattern not matching

I'm trying to use categories in a clean way in my urls like this:
website.com/category
In the url the categories are written like this: Some random examples:
Animals
Consumer-Electronics
Books-&-Comics
External-Hard-Discs
Form,-Beauty-&-Health
Black-&-White-TV
The-Adventures-Of-Tintin
Fryers,-Waffle-makers-&-Cooking
etc...
As you can see, there is a random combination of words (with starting upper case), characters "-", ",", and "&". There are more combinations than the examples.
With rewrite I'm trying to get the categories in a variable like this:
RewriteRule ^([\w-&]+)$ /categories.php?mcn=$1 [L,NC]
This is not working. If I read out the variable I wanted with "Books-&-Comics" in categories.php, I only get "Books-" while it should be "Books-&-Comics".
When I add a "," in the character class like this:
RewriteRule ^([\w,-&]+)$ /categories.php?mcn=$1 [L,NC]
I get an internal server error.
How should my RewriteRule look like to match the category examples and get them correctly in the variable?
For your first problem, the issue is that your parameters are being decoded and thus the & is starting a new URL parameter. You can fix this by adding a B flag to your rule.
Your second issue is that the pattern ^([\w,-&]+)$ is invalid. It is trying to match any word character, or any character between , and &. (Ascii 44 & 38) because this is out of order, the regex fails. As you want to match the - character rather than using it as a range indicator, it should be escaped.
With these changes made your rule is:
RewriteRule ^([\w,\-&]+)$ /categories.php?mcn=$1 [L,NC,B]
A regex helper like regex101 can be a huge help in creating your rules.

Removal of trailing dot in RewriteRule of .htaccess

The .htaccess rewrite rule applied in a restful database application:
RewriteRule ^author/([A-z.]+)/([A-z]+)$ get_author.php?first_name=$1&last_name=$2
applied to
http://localhost:8080/API/author/J./Doe
removes the period from "J." and the resulting name "J Doe" is obviously not in the database (while "J. Doe" is). This rewrite rule only removes a trailing period, e.g. "J.O" translates correctly to "J.O". I use XAMPP 7.0.6 plus Apache under Windows 10. What to do in order to NOT remove the trailing dot on the initial?
Update:
Apparently my question wasn't clear, I give it another try.
The regexp (RewriteRule) above is supposed to assign "J." to the variable $1. Instead it assigns "J" to $1, in other words, the regex drops the trailing dot. Secondly, the regex assigns "Doe" to the variable $2, this assignment is as expected and correct. The variables $1 (with incorrect value "J") and $2 (with correct value "Doe") are used in a database search. This search fails because of the missing dot. The database contains "J. Doe", but not "J Doe".
When a dot is not trailing, as in "J.O", the variable $1 gets the correct value "J.O". In other words, the regex does not remove all dots, only the trailing ones.
My question is: how can I tell (the rewrite engine of) .htaccess to apply the regexp correctly?
For comparison, the following piece of JS code does what I want:
var regexp = "^author/([A-z.]+)/([A-z]+)$";
var result = "author/J./Doe".match(regexp);
alert(result[1] + " " + result[2]);
This is apparently (still) a "feature": https://bz.apache.org/bugzilla/show_bug.cgi?id=20036
Problem: Apache strips all trailing dots and spaces unless the path segments is exactly "." or "..".
I ran into the problem because I tried to map an URL from get/a/b/c to get.php?param1=a&param2=b&param3=c, but c can legitimately have trailing dots. The issue is not actually mod_rewrite related but happens with regular URLs too, example URL of a file that's definitely not named this way: Example favicon file. Other servers don't do this. Example: Stackoverflow favicon file, which turns this into a way to detect an Apache server when the HTTP server header is stripped.
To work around this problem, I still map the URL using mod_rewrite, but then in the PHP script, I use the exact same regex to manually map the parameters:
if(preg_match('#/get/([^/]+)/([^/]+)/(.+)$#',$_SERVER['REQUEST_URI'],$matches)){
$param1=$matches[1];
$param2=$matches[2];
$param3=$matches[3];
}
Instead of using the PATH_INFO, I use the REQUEST_URI because it's untouched.
This means if you absolutely need to pass trailing dots in a path string to a backend using apache, your best bet right now is to write an intermediate script that extracts the proper parameters and then does the proxy request for you.

Need a mod_rewrite .htaccess solution to replace %20 spaces with -'s in the finished URL

I need an .htaccess mod_rewrite solution that will take a .cgi search query like this:
www.mydomain.com/cgi-bin/finda/therapist.cgi?Therapy_Type=Pilates Training&City=Los Angeles&State=CA
and return matching results in the browser's address bar to look like this:
www.mydomain.com/therapists/Pilates-Training-Los-Angeles-CA.html
or better yet:
www.mydomain.com/therapists/pilates-training-los-angeles-ca.html
Notice the database includes values with one, two or three words + spaces...
For example:
Therapy_Type=Pilates Training <- includes a space
City=Los Angeles <- includes a space
State=CA <- no space
I used the tool at: http://www.generateit.net/mod-rewrite/ to generate the following RewriteRule:
RewriteEngine On
RewriteRule ^([^-]*)-([^-]*)-([^-]*)\.html$ /cgi-bin/finda/therapist.cgi?Therapy_Types=$1&City=$2&State=$3 [L]
This does work (finds the search matches) and generates the results page, but because the parameter values have spaces in them, we end up with a URL that looks like this:
www.mydomain.com/therapists/Pilates%20Training-Los%20Angeles-CA.html
I've spent days in this forum and others trying to find a solution to get rid of these %20 (encoded spaces) so the final returned URL will look like 1) or 2) above.
I know someone on here must know how to do this... Help ;-)
If you replace the %20 with -, then how would you know where the therapy type ends and the city starts?
pilates-training-los-angeles-ca
would be
type=pilates
city=training
state=los
So I don't think you like to replace the %20 by -. You could however replace it with another character, like _:
pilates_training-los_angeles-ca
You then would have to translate every _ to a space within your PHP script (or whatever language you are using server side).

URL Rewrite with 3 parameters for forum .htaccess

I am trying to URL Rewrite to the following URL
Http://***.com/index.php?p=forum&mod=view_posts&page=$3&name=$2&id=$1
Http://***.com/forum/{id}-{name}/{page}
Http://***.com/forum/1-Hello-World/1
I have tryed the following code and have had no joy
RewriteRule ^forum/([^-]+)-([^&]+)/([^-]+)$ index.php?p=forum&mod=view_posts&page=$3&orderby=$2&id=$1
Thanks
That regex isn't very good: you see, the "([^&]+)" says: "one or more characters, up until the first ampersand", while you have no ampersands in the subject. Also, the "([^-]+)$" says "one or more characters before a hyphen", while you don't intend to end the subject with a hyphen.
Try this one:
^forum/([^-]+)-([^/]+)/(.+)$
But note that this actually captures any characters in the id and page positions, so you might be better off with
^forum/([0-9]+)-([^/]+)/([0-9]+)$
as that allows only numbers in those positions.
Also, you probably meant "index.php?p=forum&mod=view_posts&page=$3&name=$2&id=$1" instead of "index.php?p=forum&mod=view_posts&page=$3&orderby=$2&id=$1"

Resources