Ampersands in URL problems - .htaccess

I have a php page which creates URL like:
vendors/London City/cat-DJ & Entertainment/keywords
which my .htaccess redirects as shown below
RewriteRule vendors/(.+)/cat-(.+)/(.+)$ vendors.php?location=$1&category=$2&freetext=$3 [L]
RewriteRule vendors/(.+)/cat-(.+)/(.+)/$ vendors.php?location=$1&category=$2&freetext=$3 [L]
Problem 1: in vendors.php I am getting only "DJ ; Entertainment" as the category. The ampersand is missing.
Problem 2: my complete .htaccess file is shown below; it defines 6 rules.
RewriteRule vendors/(.+)/(.+)/$ vendors.php?location=$1&freetext=$2 [L]
RewriteRule vendors/(.+)/(.+)$ vendors.php?location=$1&freetext=$2 [L]
RewriteRule vendors/(.+)/cat-(.+)/$ vendors.php?location=$1&category=$2 [L]
RewriteRule vendors/(.+)/cat-(.+)$ vendors.php?location=$1&category=$2 [L]
RewriteRule vendors/(.+)/cat-(.+)/(.+)$ vendors.php?location=$1&category=$2&freetext=$3[L]
RewriteRule vendors/(.+)/cat-(.+)/(.+)/$ vendors.php?location=$1&category=$2&freetext=$3[L]
Why does the URL vendors/London City/cat-DJ & Entertainment/keywords match rule 3 or 4 and get rewritten to vendors.php?location=$1&category=$2 ?
Does .htaccess process the rules one by one, from top to bottom?
I solved the problem by moving rules 5 and 6 above the other rules. Did I make the correct fix?

1. I don't really like the idea of having spaces and other special characters in the URLs. I don't know if it's possible with your site, but instead of this kind of URL
vendors/London City/cat-DJ & Entertainment/keywords
you should have this one:
vendors/london-city/cat-dj-and-entertainment/keywords
For that, of course, you will have to perform some additional transformations / lookups in your database to convert london-city back to London City and dj-and-entertainment back to DJ & Entertainment. This can be done by storing these "from-to" pairs in database.
2. In any case -- order of rules matters. Therefore you should start with more specific rules and end up with more generic rules.
Also -- the (.+) pattern is far too broad, as it can match hello as well as hello/pink/kitten. To ensure that you always grab only one section (the part of the URL between two /) use the ([^/]+) pattern instead -- this will address one of the aspects of your "prob #2".
Therefore, try these optimized rules (each rule will match the URL with and without trailing slash):
RewriteRule ^vendors/([^/]+)/cat-([^/]+)/([^/]+)/?$ vendors.php?location=$1&category=$2&freetext=$3 [L]
RewriteRule ^vendors/([^/]+)/cat-([^/]+)/?$ vendors.php?location=$1&category=$2 [L]
RewriteRule ^vendors/([^/]+)/([^/]+)/?$ vendors.php?location=$1&freetext=$2 [L]
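To see why (.+) causes trouble, the patterns can be tried outside Apache. The following is only a sketch that uses Python's re module as a stand-in for the regex engine behind mod_rewrite; the helper first_match and the variable names are made up for illustration, and the path is assumed to arrive already URL-decoded and without a leading slash, as it does in .htaccess.

```python
import re

# Path from the question, as a RewriteRule in .htaccess would see it (assumption).
PATH = "vendors/London City/cat-DJ & Entertainment/keywords"

# Two of the original patterns (rules 2 and 4 in the question's numbering).
ORIGINAL = [
    r"vendors/(.+)/(.+)$",
    r"vendors/(.+)/cat-(.+)$",
]

# The tightened patterns from this answer, most specific first.
TIGHTENED = [
    r"^vendors/([^/]+)/cat-([^/]+)/([^/]+)/?$",
    r"^vendors/([^/]+)/cat-([^/]+)/?$",
    r"^vendors/([^/]+)/([^/]+)/?$",
]


def first_match(patterns, path):
    """Return (index, captured groups) of the first pattern that matches, or None."""
    for index, pattern in enumerate(patterns):
        # Patterns are unanchored unless they use ^ and $, hence re.search.
        match = re.search(pattern, path)
        if match:
            return index, match.groups()
    return None


# (.+) crosses slashes, so each loose pattern swallows more than one segment.
loose = {pattern: re.search(pattern, PATH).groups() for pattern in ORIGINAL}

# ([^/]+) stops at the next slash, so only the three-segment rule matches.
tight = first_match(TIGHTENED, PATH)
```

With the loose patterns the category comes out as "DJ & Entertainment/keywords" (or the whole middle of the URL ends up in location), while the tightened set matches only its first rule and splits the path into exactly three pieces.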
Also I'm not getting the value of 'category' with the ampersand as given in the URL. I am getting only a semicolon. What can be the reason?
I do not have an Apache box running next to me at the moment, so I cannot check it right now, but try adding the B or NE flag next to the L flag (e.g. [L,B]) -- one of them should help:
http://httpd.apache.org/docs/current/rewrite/flags.html#flag_b
http://httpd.apache.org/docs/current/rewrite/flags.html#flag_ne
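The general failure mode can be shown without Apache. This is a rough sketch in Python, not a reproduction of what PHP or mod_rewrite do internally: urllib.parse stands in for the server-side query parser, and quote() stands in for the escaping that the B flag requests for backreferences. It will not reproduce the exact "DJ ; Entertainment" output reported in the question.

```python
from urllib.parse import parse_qs, quote

# Values as captured by the three groups of the rewrite pattern.
captured = [
    ("location", "London City"),
    ("category", "DJ & Entertainment"),
    ("freetext", "keywords"),
]

# Captured text pasted into the substitution as-is: the "&" inside the
# category is read as a parameter separator.
raw_query = "&".join(f"{name}={value}" for name, value in captured)

# Captured text percent-encoded before it is pasted in.
escaped_query = "&".join(f"{name}={quote(value, safe='')}" for name, value in captured)

raw_params = parse_qs(raw_query)
escaped_params = parse_qs(escaped_query)
```

In this model raw_params["category"] is cut off at the ampersand, while escaped_params["category"] comes back as the full "DJ & Entertainment".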

From the docs:
The order in which these rules are defined is important - this is the order in which they will be applied at run-time.

Related

.Htaccess redirect with multiple query strings

I tried searching and trying to understand how to do a redirect with (multiple) query strings, but I didn't have any luck. I'm hoping someone here can help me understand this issue :)
I'm working on this ecommerce shop and people are searching the ecommerce search input for content located in a different CMS. For example, the word "returns". This isn't a product in the ecommerce system so of course it returns an error for the results (no products found).
My idea was simply to manually redirect those queries to the proper landing pages in the CMS.
Here's an example of the URL for "return" on the ecommerce system:
http://www.domain.com/catalog/search.php?mode=search&page=1&substring=return
And here's where I would like to send people:
http://www.domain.com/catalog/Returns.html
Any thoughts on how to do this? Thanks in advance!
Solution
The way to do this is as Phil suggested; but with a few (small) modifications:
RewriteEngine On
RewriteCond %{QUERY_STRING} substring=returns? [NC]
RewriteRule . /catalog/Returns.html? [L]
RewriteCond %{QUERY_STRING} substring=shipping [NC]
RewriteRule . /catalog/Shipping.html? [L]
N.B. In the event you only want to remove one parameter see the Additional Information and Explanations below.
N.B. For more strict matching see Where & becomes a problem below.
Explanation
Background
The best way for me to explain the difference (between the above and Phil's original) and why you were having a problem is to explain what is going on...
RewriteCond %{QUERY_STRING} substring=returns? [NC] checks the query string for instances of the regex that follows it, in this case substring=returns? (see the clarification below).
The [NC] flag simply means to match upper and lower case letters.
Clarification: the regex (substring=returns?) means substring=return is matched literally, with or without a trailing s.
Problem
If the condition is met (i.e. the regex pattern is matched in the query string) then the rewrite rule is triggered. This is where the problem lies...
Given the URL: http://example.com/?substring=returns
The original rule:
RewriteRule . /catalog/Returns.html [L]
Rewrites the URL leaving the query string in place, like so:
http://example.com/?substring=returns
http://example.com/catalog/Returns.html?substring=returns
http://example.com/catalog/Returns.html?substring=returns
http://example.com/catalog/Returns.html?substring=returns
http://example.com/catalog/Returns.html?substring=returns
...and so on until limit is reached...
Side note: The [L] flag stops the .htaccess file from going through any more rules but it doesn't stop it looping again.
Solution
The solution then is to overwrite the query string (since we no longer need it). You can do this simply by adding a ? to the end of the RewriteRule substitution:
RewriteRule . /catalog/Returns.html? [L]
N.B. In the event you only want to remove one parameter see the Additional Information and Explanations below.
N.B. For more strict matching see Where & becomes a problem below.
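The loop described above, and the effect of the trailing ?, can be mimicked with a toy model. This is only a sketch of the behaviour as this answer describes it, not of Apache's internals: the function simulate, the starting path and the cap MAX_PASSES are assumptions made for the illustration.

```python
import re

CONDITION = re.compile(r"substring=returns?", re.IGNORECASE)  # the RewriteCond regex
RULE = re.compile(r".")                                       # the RewriteRule pattern
MAX_PASSES = 10  # arbitrary cap for this model only


def simulate(discard_query):
    """Re-run the rule until it stops matching; return (passes, path, query)."""
    path, query = "catalog/search.php", "mode=search&page=1&substring=return"
    passes = 0
    for _ in range(MAX_PASSES):
        if not (CONDITION.search(query) and RULE.search(path)):
            break  # condition or pattern failed: nothing left to rewrite
        path = "catalog/Returns.html"
        if discard_query:
            query = ""  # the trailing "?" replaces the query string with an empty one
        passes += 1
    return passes, path, query


looping = simulate(discard_query=False)
fixed = simulate(discard_query=True)
```

With the query string left in place the model keeps firing until it hits the cap; once the query string is emptied, the condition fails on the second pass and the rewriting stops after one pass.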
Resources
The following resources may come in handy in the future:
.htaccess flags
http://httpd.apache.org/docs/current/rewrite/flags.html
Regular expressions
http://www.regular-expressions.info/ - Check out the tutorials section
Additional Information and Explanations
Where & becomes a problem
RewriteCond %{QUERY_STRING} &substring=returns? [NC]
In the above the regex means to match the characters &substring=return with an optional s appended to it.
So it would match the following as expected:
http://example.com/?var1=somevalue&substring=return
http://example.com/?var1=somevalue&substring=returns
http://example.com/?var1=somevalue&substring=return&var2=othervalue
http://example.com/?var1=somevalue&substring=returns&var2=othervalue
Which is fine and given the original query string wouldn't be a problem, however, if I were to navigate to the page and write in the parameters in a different order, the & wouldn't necessarily be there and therefore it wouldn't match (when it should):
http://example.com/?substring=return&var1=somevalue
http://example.com/?substring=returns&var1=somevalue
Simply getting rid of it (as I did) would solve this problem, but it doesn't come risk free.
RewriteCond %{QUERY_STRING} substring=returns? [NC]
If you were to introduce a new parameter secondsubstring for example it would match when it shouldn't:
Good Match > http://example.com/?substring=return&var1=somevalue
Good Match > http://example.com/?var1=somevalue&substring=return
Bad Match > http://example.com/?secondsubstring=return&var1=somevalue
To solve this potential issue you could do the following:
RewriteCond %{QUERY_STRING} ^(.*&)?substring=returns?
The above will match:
http://example.com/?substring=return&var1=somevalue
http://example.com/?var1=somevalue&substring=return
But won't match:
http://example.com/?secondsubstring=return&var1=somevalue
One more potential problem is that the expression would match:
http://example.com/?substring=returning&var1=somevalue
http://example.com/?substring=return%20television&var1=somevalue
My understanding, again, is that this wouldn't be a problem in the given situation. However if it were to be a problem you could do:
RewriteCond %{QUERY_STRING} ^(.*&)?substring=returns?(&|$)
The above checks that the character following return/returns is either an & signalling the end of the variable and the start of a new one or the end of the query string.
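The three conditions discussed in this section can be compared side by side. The sketch below uses Python's re module as a stand-in for the RewriteCond regex engine; the labels loose, anchored and strict are mine, and case-insensitive matching is applied to all three to mirror [NC], which is an assumption for the last two.

```python
import re

PATTERNS = {
    "loose": r"substring=returns?",
    "anchored": r"^(.*&)?substring=returns?",
    "strict": r"^(.*&)?substring=returns?(&|$)",
}


def matches(name, query):
    """True if the named pattern is found in the query string."""
    return re.search(PATTERNS[name], query, re.IGNORECASE) is not None


QUERIES = [
    "substring=return&var1=somevalue",
    "var1=somevalue&substring=returns",
    "secondsubstring=return&var1=somevalue",
    "substring=returning&var1=somevalue",
    "substring=return%20television&var1=somevalue",
]

# One row per query string, one column per pattern.
table = {query: {name: matches(name, query) for name in PATTERNS} for query in QUERIES}
```

In this table only the strict pattern accepts the first two query strings and rejects the other three.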
Rewriting one parameter
In some circumstances as Phil pointed out it may be preferable to only remove one parameter at a time and leave the rest of the query string untouched.
You can do this, quite simply, by implementing capture groups in the RewriteCond and outputting them in the RewriteRule:
RewriteCond %{QUERY_STRING} ^(.*&)?substring=returns?(&.*)?$ [NC]
RewriteRule . /catalog/Returns.html?%1%2 [L]
Rewrite explanation
You use %N to insert capture groups from the rewrite condition and $N to insert capture groups from the rewrite rule.
So in this case we redirect to:
/catalog/Returns.html?(RewriteCond Group1)(RewriteCond Group2)
/catalog/Returns.html?%1%2
The [L] flag - as previously - stops the processing of any rules further down the .htaccess file
Regex explanation
^(.*&)?substring=returns?(&.*)?$
^ Start of string
(.*&)? First capture group
Capture any character . 0 or more times *
Followed by an &
The ? makes the entire group optional
substring=returns? Matches substring=return literally with an optional s
(&.*)? Second capture group
Capture an &
Capture any character . 0 or more times *
The ? again makes the group optional
$ End of string
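To check what the two capture groups actually hold, the regex can be run against a few query strings. This is a sketch using Python's re module; the helper remaining_query is hypothetical and simply concatenates the groups the way %1%2 would, treating a group that did not take part in the match as empty.

```python
import re

CONDITION = re.compile(r"^(.*&)?substring=returns?(&.*)?$", re.IGNORECASE)


def remaining_query(query):
    """Return what %1%2 would expand to, or None if the condition does not match."""
    match = CONDITION.search(query)
    if match is None:
        return None
    # An optional group that did not participate is None in Python; use "" instead.
    before, after = (group or "" for group in match.groups())
    return before + after


only_param = remaining_query("substring=returns")
leading = remaining_query("mode=search&page=1&substring=return")
middle = remaining_query("var1=a&substring=returns&var2=b")
no_match = remaining_query("substring=returning")
```

Note that the separators are captured along with the neighbouring parameters, so the result can carry a trailing & or a doubled &&. Query-string parsers generally tolerate this, but it is worth knowing if the exact string matters.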
[L] flag vs [END]
For completeness sake...
The [L] flag stops the .htaccess from going over any more rules further down the .htaccess file.
The [END] flag stops the rewrite process completely.
To illustrate with an example:
while(TRUE){
if(condition1){ continue; }
if(condition2){ continue; }
if(condition3){ continue; }
if(condition4){ continue; }
}
while(TRUE){
if(condition1){ break; }
if(condition2){ break; }
if(condition3){ break; }
if(condition4){ break; }
}
In the above code blocks the [L] flag acts like a continue statement in that it skips the rest of the code block and starts again. Whilst the [END] flag acts as a break statement and stops the loop entirely.
If we were to replace the [L] flag with [END] in Phil's original answer then it would work, with the caveats mentioned in the Where & becomes a problem section above.
RewriteEngine On
RewriteCond %{QUERY_STRING} &substring=returns? [NC]
RewriteRule . /catalog/Returns.html [L]
RewriteCond %{QUERY_STRING} &substring=shipping [NC]
RewriteRule . /catalog/Shipping.html [L]
etc.
Would something like that do the job for you? Note that 'returns?' means 'return' or 'returns'. Are you limited to one search term at a time, or might customers type in a phrase? I think & is safe to use there, but it's possible it's not.
Don't forget to do this stuff ahead of any commands to rewrite Returns.html to Returns.php, do SEO, etc.

mod_rewrite .htaccess with %20 translate to -

I have been reading about .htaccess files for a couple of hours now and I think I'm starting to get the idea but I still need some help. I found various answers around SO but still unsure how to do this.
As far as I understand you write a rule for each page extension you want to 'prettify', so if you have something.php , anotherpage.php, thispage.php etc and they are expecting(will receive??) arguments, each needs its own rule. Is this correct?
The site I want to change has urls like this,
maindomain.com/sue.php?r=word1%20word2
and at least one page with two arguments
maindomain.com/kevin.php?r=place%20name&c=person%20name
So what I would like to make is
maindomain.com/sue/word1-word2/
maindomain.com/kevin/place-name/person-name/
Keeping this .php page and making it look like the directory. Most of the tutorials I have read deal with how to remove the .php page to which the argument is passed. But I want to keep it.
The problem I am foreseeing is that all of the .php?r= parts of the URL are the same, i.e. sue.php?r=, kevin.php?r=, and the .htaccess decides which URL to change based on the filename and then omits it. If I want to keep the file name, will I have to change the ?r= so that it is individual? I hope this makes sense. So far I have this, but I'm sure it won't work.
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^([a-zA-Z0-9]+)/$1.php?r=$1
RewriteRule ^([a-zA-Z0-9]+)/$1.php?r=$1&c=$1
And I think I have to add ([^-]*) this in some part or some way so that it detects the %20 part of the URL, but then how do I convert it to -. Also, how are my $_GET functions going to work??
I hope my question makes sense
You're missing a space somewhere in those rules, but I think you've got the right idea in making 2 separate rules. The harder problem is converting all the - to spaces. Let's start with the conversion to GET variables:
# check that the "sue.php" actually exists:
RewriteCond %{REQUEST_URI} ^/([a-zA-Z0-9]+)/([^/]+)/?$
RewriteCond %{DOCUMENT_ROOT}/%1.php -f
RewriteRule ^([a-zA-Z0-9]+)/([^/]+)/?$ /$1.php?r=$2 [L,QSA]
RewriteCond %{REQUEST_URI} ^/([a-zA-Z0-9]+)/([^/]+)/([^/]+)/?$
RewriteCond %{DOCUMENT_ROOT}/%1.php -f
RewriteRule ^([a-zA-Z0-9]+)/([^/]+)/([^/]+)/?$ /$1.php?r=$2&c=$3 [L,QSA]
Those will take a URI that looks like /sue/blah/ and:
1. Extract the sue part
2. Check that /document_root/sue.php actually exists
3. Rewrite /sue/blah/ to /sue.php?r=blah
The same thing applies to two-word URIs, something like /kevin/foo/bar/:
1. Extract the kevin part
2. Check that /document_root/kevin.php actually exists
3. Rewrite /kevin/foo/bar/ to /kevin.php?r=foo&c=bar
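The steps above can be sketched in Python to see which URI ends up where. This is an illustration only: EXISTING_SCRIPTS is a hypothetical stand-in for the -f file check against the document root, and the function rewrite is not part of any real API.

```python
import re

# Hypothetical scripts present in the document root (stands in for the -f test).
EXISTING_SCRIPTS = {"sue.php", "kevin.php"}

ONE_ARG = re.compile(r"^([a-zA-Z0-9]+)/([^/]+)/?$")
TWO_ARGS = re.compile(r"^([a-zA-Z0-9]+)/([^/]+)/([^/]+)/?$")


def rewrite(path):
    """Map a pretty path such as 'sue/word1-word2/' to its script and query string."""
    match = ONE_ARG.search(path)
    if match and f"{match.group(1)}.php" in EXISTING_SCRIPTS:
        return f"/{match.group(1)}.php?r={match.group(2)}"
    match = TWO_ARGS.search(path)
    if match and f"{match.group(1)}.php" in EXISTING_SCRIPTS:
        return f"/{match.group(1)}.php?r={match.group(2)}&c={match.group(3)}"
    return None  # no rule applies, the request is left alone


one = rewrite("sue/word1-word2/")
two = rewrite("kevin/place-name/person-name/")
unknown = rewrite("nobody/foo/")
```

Because ([^/]+) cannot cross a slash, the one-argument pattern never matches a two-argument path, so the two rules do not compete with each other.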
Now, to get rid of the "-" and change them to spaces:
RewriteCond %{QUERY_STRING} ^(.*)(c|r)=([^&]+)-(.*)$
RewriteRule ^(.*)$ /$1?%1%2=%3\ %4 [L]
This looks a little messy, but the condition matches the query string, looks for a c= or r= in the query string, matches against a - in the value of a c= or r=, then rewrites the query string to replace the - with a space (note that the space gets encoded as a %20). This will remove all the - instances in the values of the GET parameters c and r and replace them with a space.
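A single application of that condition only replaces one - at a time, so the full conversion relies on the rule being applied repeatedly. The sketch below models that in Python, on the assumption that the rewritten URL is run through the rule again until the condition stops matching; the function name and the max_passes cap are made up, and the model keeps a literal space rather than %20.

```python
import re

CONDITION = re.compile(r"^(.*)(c|r)=([^&]+)-(.*)$")


def dashes_to_spaces(query, max_passes=20):
    """Apply the rule repeatedly; each pass replaces one '-' in an r= or c= value."""
    passes = 0
    for _ in range(max_passes):
        match = CONDITION.search(query)
        if match is None:
            break  # no dash left in any r= or c= value
        before, name, head, tail = match.groups()
        query = f"{before}{name}={head} {tail}"  # same shape as %1%2=%3\ %4
        passes += 1
    return query, passes


two_params = dashes_to_spaces("r=place-name&c=person-name")
many_dashes = dashes_to_spaces("r=a-b-c")
```

Each dash costs one pass, so a value with many dashes means many internal rewrites; doing the replacement in the PHP script instead may be simpler.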

How to write this .htaccess rewrite rule

I am setting up a MVC style routing system using mod rewrite within an .htaccess file (and some php parsing too.)
I need to be able to direct different URLs to different php files that will be used as controllers. (index.php, admin.php, etc...)
I have found and edited a rewrite rule that does this well by looking at the first word after the first slash:
RewriteCond %{REQUEST_URI} ^/stats(.*)
RewriteRule ^(.*)$ /hello.php/$1 [L]
However, my problem is I want it to rewrite based on the 2nd word, not the first. I want the first word to be a username. So I want this:
http://www.samplesite.com/username/admin to redirect to admin.php
instead of:
http://www.samplesite.com/admin
I think I just need to edit the rewrite rule slightly with a 'anything can be here' type variable, but I'm unsure how to do that.
I guess you can prefix [^/]+/ to match and ignore that username/
RewriteCond %{REQUEST_URI} ^/[^/]+/stats(.*)
RewriteRule ^[^/]+/(.*)$ /hello.php/$1 [L]
then http://www.samplesite.com/username/statsadmin will be redirected to http://www.samplesite.com/hello.php/statsadmin (or so, I do not know the .htaccess file)
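The effect of the [^/]+/ prefix can be checked quickly in Python. This is a sketch only: the function rewrite is hypothetical, the condition is tested against the full request URI and the rule against the path without its leading slash, which is how it is assumed to work in .htaccess.

```python
import re

CONDITION = re.compile(r"^/[^/]+/stats(.*)")  # RewriteCond on REQUEST_URI
RULE = re.compile(r"^[^/]+/(.*)$")            # RewriteRule pattern


def rewrite(request_uri):
    """Return the rewritten target, or None if the condition or rule fails."""
    if not CONDITION.search(request_uri):
        return None
    match = RULE.search(request_uri.lstrip("/"))
    if match is None:
        return None
    # The username segment is matched but not captured, so it is dropped.
    return f"/hello.php/{match.group(1)}"


with_user = rewrite("/username/statsadmin")
without_user = rewrite("/stats")
other_page = rewrite("/username/admin")
```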
To answer your question, "an anything can be here type variable" would be something like a full-stop . - it means "any character". Also the asterisk * means "zero or more of the preceding character or parenthesized grouped characters".
But I don't think you need that...If your matching url will always end in "admin" then you can use the dollar sign $ to match the end of the string.
RewriteRule admin$ admin.php [R,NC,L]
Rewrites www.anything.at/all/that/ends/in/admin to www.anything.at/admin.php

migrated system with new urls

I am switching system from a MVC to a custom code system. Currently we are using this format for urls:
index.php?part=CAPACITOR&type=CERAMIC&model=0805&page=spec
I now need to rewrite the URLs to be nicer for users, like:
mysitecom/CAPACITOR/
mysitecom/CAPACITOR/CERAMIC/
mysitecom/CAPACITOR/CERAMIC/0805/spec.html#2
where #1 and #2 are the pages loaded in jQuery. The developer uses the old way, with /index.php/CAPACITOR/CERAMIC/0805/spec.html
Because I don't think using the index.php in the url is good, what can I do to make this better?
Here's what you need to use
RewriteEngine On
RewriteBase /
RewriteRule ^([a-z0-9\-_]+)/?$ index.php?part=$1&type=all&model=all&page=index [L,NC]
RewriteRule ^([a-z0-9\-_]+)/([a-z0-9\-_]+)/?$ index.php?part=$1&type=$2&model=all&page=index [L,NC]
RewriteRule ^([a-z0-9\-_]+)/([a-z0-9\-_]+)/([a-z0-9\-_]+)/?$ index.php?part=$1&type=$2&model=$3&page=index [L,NC]
RewriteRule ^([a-z0-9\-_]+)/([a-z0-9\-_]+)/([a-z0-9\-_]+)/([a-z0-9\-_\.]+)\.html$ index.php?part=$1&type=$2&model=$3&page=$4 [L,NC]
So when a folder (for example CERAMIC) is not provided, you can pass a placeholder value (all) to load everything; the same idea applies to model. This means that if only the first part is provided, only the first rule will be used. As for page.html, you can load the index by default.
Now, a-z0-9\-_ means letters, numbers, dashes and underscores ONLY. You can use ([^/]+) instead if you prefer; that will allow more characters.
L means last: if the rule matches, processing stops there. NC means no case, so A = a and ABC = abc.
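How the four rules divide the work can be tried out in Python. This is a sketch under the assumption that Python's re behaves like the mod_rewrite regex engine for these patterns; the route function and the RULES table are illustrative, and re.IGNORECASE plays the role of NC.

```python
import re

SEG = r"([a-z0-9\-_]+)"  # one URL segment: letters, digits, dash, underscore

RULES = [
    (rf"^{SEG}/?$", "index.php?part={0}&type=all&model=all&page=index"),
    (rf"^{SEG}/{SEG}/?$", "index.php?part={0}&type={1}&model=all&page=index"),
    (rf"^{SEG}/{SEG}/{SEG}/?$", "index.php?part={0}&type={1}&model={2}&page=index"),
    (rf"^{SEG}/{SEG}/{SEG}/([a-z0-9\-_\.]+)\.html$",
     "index.php?part={0}&type={1}&model={2}&page={3}"),
]


def route(path):
    """Return the target of the first matching rule, or None if none matches."""
    for pattern, template in RULES:
        match = re.search(pattern, path, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return None


part_only = route("CAPACITOR/")
part_and_type = route("CAPACITOR/CERAMIC/")
full = route("CAPACITOR/CERAMIC/0805/spec.html")
untouched = route("index.php")
```

Since the segment pattern excludes dots, a direct request for index.php matches none of the rules and is left untouched in this model.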

URL Beautification using .htaccess

In search of a more user-friendly URL, how do I achieve both of the following elegantly, using only .htaccess?
/de/somepage
going to /somepage?ln=de
/zh-CN/somepage#7
going to /somepage?ln=zh-CN#7
summary:
/[language]/[pagefilenameWithoutExtension][optional anchor#][a number from 0-9]
should load (without changing url)
/[pagefilenameWithoutExtension]?ln=[language][optional anchor#][a number from 0-9]
UPDATE, after provided solution:
1. exception /zh-CN/somepage should be reachable as /cn/somepage
2. PHP-generated thumbnails don't load anymore, for example:
img src="imgcpu?src=someimage.jpg&w=25&h=25&c=f&f=bw"
RewriteRule ^([a-z][a-z](-[A-Z][A-Z])?)/(.*) /$3?ln=$1 [L]
You don't need to do anything for fragments (eg: #7). They aren't sent to the server. They're handled entirely by the browser.
Update:
If you really want to treat zh-CN as a special case, you could do something like:
RewriteRule ^zh-CN/(.*) /$1?ln=zh-CN [L]
RewriteRule ^cn/(.*) /$1?ln=zh-CN [L]
RewriteRule ^([a-z][a-z])/(.*) /$2?ln=$1 [L]
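The order of these three rules matters, because the generic two-letter rule would also match cn/. A small Python sketch makes that visible; it assumes the path is matched without its leading slash, and the rewrite helper is made up for the illustration.

```python
import re

# Same order as the rules above: special cases first, generic rule last.
RULES = [
    (re.compile(r"^zh-CN/(.*)"), lambda m: f"/{m.group(1)}?ln=zh-CN"),
    (re.compile(r"^cn/(.*)"), lambda m: f"/{m.group(1)}?ln=zh-CN"),
    (re.compile(r"^([a-z][a-z])/(.*)"), lambda m: f"/{m.group(2)}?ln={m.group(1)}"),
]


def rewrite(path, rules=RULES):
    """Return the target produced by the first matching rule, or None."""
    for pattern, build in rules:
        match = pattern.search(path)
        if match:
            return build(match)
    return None


german = rewrite("de/somepage")
chinese_short = rewrite("cn/somepage")
chinese_long = rewrite("zh-CN/somepage")
thumbnail = rewrite("imgcpu")

# With the generic rule tried first, cn/ is no longer mapped to zh-CN.
wrong_order = rewrite("cn/somepage", rules=list(reversed(RULES)))
```

In this model a path such as imgcpu matches none of the patterns, so the thumbnail script is not touched by these particular rules.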
I would suggest the following -
RewriteEngine on
RewriteRule ^([a-z][a-z])/([a-zA-Z]+) /$2?ln=$1
RewriteRule ^([a-z][a-z])/([a-zA-Z]+#([0-9])+) /$2?ln=$1$3
The first rule takes care of URLs like /de/somepage. The language must be exactly two characters long and contain only a to z characters.
The second rule takes care of URLs like /uk/somepage#7.
