& Ampersand in URL - .htaccess

I am trying to figure out how to use the ampersand symbol in a URL.
Having seen it here: http://www.indeed.co.uk/B&Q-jobs I wish to do something similar.
I'm not exactly sure what the server is going to call when the URL is accessed.
Is there a way to grab a request like this with .htaccess and rewrite to a specific file?
Thanks for your help!

Ampersands are commonly used in a query string. Query strings are one or more variables at the end of the URL that the page uses to render content, track information, etc. Query strings typically look something like this:
http://www.website.com/index.php?variable=1&variable=2
Notice how the first special character in the URL after the file extension is a ?. This designates the start of the query string.
In your example, there is no ?, so no query string is started. The ampersand is a reserved character whose conventional purpose is to link variables in a query string together. The URL syntax does permit it in a path segment (RFC 3986 lists & among the sub-delimiters allowed there), but it has no special meaning in that position, so in the link you provided the & is simply a literal character of the path.
What is likely happening on the server is a rewrite. A rewrite instructs the server to serve a specific file based on a pattern or match. For example, an .htaccess rewrite rule that may work with your example could be:
RewriteEngine on
RewriteRule ^/?B&Q-(.*)$ /scripts/b-q.php?variable=$1 [NC,L]
This rule would find any URLs starting with http://www.indeed.co.uk/B&Q- and show the content of /scripts/b-q.php instead, passing the rest of the path as the variable. For /B&Q-jobs, that is http://www.indeed.co.uk/scripts/b-q.php?variable=jobs.
For more information about Apache rewrite rules, check out their official documentation.
Lastly, I would recommend against using ampersands in URLs, even when doing rewrites, unless they are part of the query string. The conventional purpose of an ampersand in a URL is to string variables together in a query string. Using it outside that purpose is unconventional and may cause confusion in the future.
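To illustrate where the query string starts and why the & in the example link is just part of the path, here is a small sketch using Python's standard urllib.parse module. It is only an illustration of URL splitting, not of what Apache does internally.

```python
from urllib.parse import urlsplit, parse_qs

# URL with a query string: everything after the first "?" is the query.
with_query = urlsplit("http://www.website.com/index.php?variable=1&variable=2")
print(with_query.path)             # /index.php
print(with_query.query)            # variable=1&variable=2
print(parse_qs(with_query.query))  # {'variable': ['1', '2']}

# URL from the question: there is no "?", so the "&" stays inside the path.
no_query = urlsplit("http://www.indeed.co.uk/B&Q-jobs")
print(no_query.path)   # /B&Q-jobs
print(no_query.query)  # empty string
```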

A URI like /B&Q-jobs may get sent to the server percent-encoded, like this: /B%26Q-jobs. However, by the time it passes through the rewrite engine, the URI has already been decoded, so you want to match against the literal & character:
RewriteRule ^/?B&Q-jobs$ /a/specific/file.html [L]
This makes it so when someone requests /B&Q-jobs, they actually get served the content at /a/specific/file.html.
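As a rough illustration of the encoding point, the sketch below uses Python's urllib.parse and re modules as stand-ins. The regex is the same pattern as in the rule, but Python's matching here only approximates what mod_rewrite does with the decoded path.

```python
import re
from urllib.parse import quote, unquote

raw_path = "/B&Q-jobs"
encoded = quote(raw_path)    # "&" becomes %26, "/" is left alone
decoded = unquote(encoded)   # back to the literal "&"

# Python stand-in for the rewrite pattern (assumption: same regex syntax applies)
pattern = re.compile(r"^/?B&Q-jobs$")
matches_decoded = bool(pattern.match(decoded))
matches_encoded = bool(pattern.match(encoded))

print(encoded)          # /B%26Q-jobs
print(matches_decoded)  # True
print(matches_encoded)  # False
```

The pattern only matches the decoded form, which is why the rule is written with a literal & rather than %26.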

Related

Multiple and Variable parameters URL rewriting

I don't know how to rewrite URLs of this type:
mywebsite/param1-val1-param2-val2-param3-val3-param4-val4.html
That's really simple to do, BUT my problem is that the set of parameters varies, like:
mywebsite/param1-val1-param3-val3-param4-val4.html
or
mywebsite/param3-val3-param4-val4.html
So the number of parameters is not always the same. Sometimes there is just one, sometimes there are 10 or more. The URL redirects to a search script, which will grab the parameters through the GET query string.
What I want is to avoid writing a line in htaccess for every link. The links are pretty simple in that form, separated by a hyphen (-).
Rather than rely on complex rewrite rules, I would suggest a simple rewrite rule and then modifying the code of your web application to do the hard part. Supporting this kind of variable parameters is not something that a rewrite rule is going to be very good at on its own.
I would use the following rewrite rule, which intercepts any URL that contains a hyphen separator and ends in .html:
RewriteRule ^(.+[\-].+)\.html$ /query.html?params=$1
Then your web application can get the parameters from the CGI parameter called params. They now look like this: param1-val1-param3-val3-param4-val4. Your code should then split on the hyphens and put the parameters into a map. Most web frameworks support some way of adding to or overriding the request parameters. If yours does, you can do this without invasive modifications to the rest of your code.
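The split-and-map step could look roughly like the following Python sketch. The function name parse_params is hypothetical, and the code assumes that neither parameter names nor values contain a hyphen themselves; if they can, a different separator would be needed.

```python
def parse_params(raw: str) -> dict:
    """Turn 'k1-v1-k2-v2' into {'k1': 'v1', 'k2': 'v2'}."""
    parts = raw.split("-") if raw else []
    if len(parts) % 2 != 0:
        # An odd number of pieces means a name without a value (or a stray hyphen).
        raise ValueError(f"unpaired parameter in {raw!r}")
    # Even positions are names, odd positions are the matching values.
    return dict(zip(parts[0::2], parts[1::2]))


print(parse_params("param1-val1-param3-val3-param4-val4"))
# {'param1': 'val1', 'param3': 'val3', 'param4': 'val4'}
```

Because the function does not care how many pairs arrive, one rewrite rule covers URLs with one parameter as well as URLs with ten.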

Is it possible with canonical URL for this pattern in htaccess: /a/*/id/uniqueid?

A big problem is that I am not a programmer! So I need to solve this by means within my own competence. I would be very happy for help!
I have an issue with a lot of duplicated URLs in the Google index and there are strong signs that it is causing SEO problems.
I don’t have duplicate links on the site itself, but as it was once set up, for certain pages the system allows all sorts of variations in the URL. As long as it has a specific article ID, the same content will be presented under an infinite number of URLs.
I guess the duplicates in Google's index have been growing over a long time and are due to links gone wrong from other sites that link to mine. The problem is that the system has accepted the variations.
Here are examples of variations that exist in the Google index:
site.com/a/Cow_Cat/id/5272
site.com/a/cow_cat/id/5272
site.com/a/cow…cat/id/5272
site.com/a/cowcat/id/5272
site.com/a/bird/id/5272
The first URL, with mixed case, is the one used site-wide and for now I have to live with it; it would take too long to change everything to lower case. I cannot make a manual effort via htaccess as there is a total of 300,000 articles. I believe there are tens of thousands that have one or more duplicates.
My question is this:
Is it possible to create rules for canonical URLs in htaccess so that the above URLs are handled as one, as well as the rest of the 300,000?
I.e., is there a way to say that all URLs having
/a/*/id/uniqueid
should be seen as one, based only on the unique ID and without regard to the text expressed with the “*”?
My hope is that it would be possible to say that a certain pattern like above should only be differentiated by the last unique segment.
If it is not possible in htaccess, how would it be done with link rel="canonical" on each page? Can the code include wildcards?
I should add that the majority of the duplicates are caused by incoming links being lower case where the site itself is using a mix. Would it be OK to assign a canonical URL only with lower case although the site itself is basically always using a mix of lower/upper case?
If this is possible, I would be very happy to be helped with how to do it!!!!
Jonas
Hi Michael! I am not an expert but this is how I think it could be done:
1) My problem is that the URLs have mixed cases and I cannot change that now.
2) If it is OK for the search engines, it would be fine for me to make the canonical URL identical to the actual URL except that it is all lower case; that would solve approx. 90% of the duplicates. I.e., this would be the used URL: site.com/a/Cow_Cat/id/5272 and this would be the canonical: site.com/a/cow_cat/id/5272. As I understand it, that would be good SEO... or?
My idea was NOT to change the address in the browser address bar (i.e. using a 301 redirect) but rather just to tell the search engines which URLs are duplicates. As I understand it, that can be done by defining a canonical URL either in htaccess (as a pattern, I hope) or as a tag on each page.
3) IF it would be possible to find a wildcard solution... I am not sure if this is possible at all, but that would mean it was possible to NOT assign a specific canonical URL but rather a "group pattern", i.e. "Please, search engine, see all URLs with this pattern, having the unique identifier at the end, as if they are one and the same URL; you, the search engine, decide which one you prefer": /a/*/id/uniqueid
Would that work? It will only work in htaccess if canonical URLs can be defined as a group where the group is defined as a pattern with a defined part as the unique id.
Is it possible when adding a tag for each page to say that "all URLs containing this unique id should be treated the same"? If that would work it would look something similar to this:
link rel="canonical" /a/*/id/5272
I don't know if this syntax with a wildcard exists, but it would be nice : )
My advice would be to use 301 redirects, with URL rewriting. Ask your webmaster to place this in your Apache config or virtual host config:
RewriteMap lc int:tolower
Then inside your .htaccess file you can use the map ${lc:$1} to convert matches to lower case. Here, the $1 part is a match (backreference from brackets in a regex in the RewriteRule) and the ${lc: } part is just how you apply the lc (lowercase) function set up earlier. Here is an example of what you might want in your .htaccess file:
# Match a URL containing any uppercase characters
RewriteCond %{REQUEST_URI} [A-Z]
# Redirect to the lowercase version of the URL
RewriteRule (.*) /${lc:$1} [L,R=301]
As for matching the IDs, presuming your examples mean "always end with the ID" you could use a regex like:
^(.+/)(\d+)$
The first match (brackets) gets everything up to and including the forward slash before the ID, and the second part grabs the ID. We can then use it to point to a single, specific URL (like canonical, but with a 301).
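To see what a pattern of this shape captures, here is a small Python sketch that applies it to some of the example URLs from the question. The function name split_article_path is hypothetical, and Python's re module is only a stand-in for the regex engine Apache uses.

```python
import re

# Everything up to and including the last slash, then a numeric ID at the end.
ID_PATTERN = re.compile(r"^(.+/)(\d+)$")


def split_article_path(path):
    """Return (prefix, article_id), or None if the path does not end in an ID."""
    match = ID_PATTERN.match(path)
    if match is None:
        return None
    return match.group(1), match.group(2)


variations = [
    "/a/Cow_Cat/id/5272",
    "/a/cow_cat/id/5272",
    "/a/cowcat/id/5272",
    "/a/bird/id/5272",
]
ids = {split_article_path(p)[1] for p in variations}
print(split_article_path("/a/Cow_Cat/id/5272"))  # ('/a/Cow_Cat/id/', '5272')
print(ids)                                       # {'5272'}
```

All variations yield the same ID, which is what allows them to be pointed at one URL.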
If you do just want to use canonical tags, then you'll have to say what you're using code-wise, but an example I use in PHP (so as not to add tags to hundreds of individual pages, for instance) would be:
if (!empty($_SERVER["REDIRECT_URL"])) {
    // Prefer the redirect URL when the server has set it
    $canonicalUrl = $_SERVER["SERVER_NAME"] . $_SERVER["REDIRECT_URL"];
} else if (!empty($_SERVER["REQUEST_URI"])) {
    // Otherwise use the request URI with the query string stripped off
    $canonicalUrl = $_SERVER["SERVER_NAME"] . preg_replace('/^([^?]+)\?.*$/', '$1', $_SERVER['REQUEST_URI']);
}
Here, the redirect URL is used if it's available, and if not, the request URI is used. This code strips off the query string (the ?something=true part of http://www.mysite.com/a/blah/12345/?something=true). Of course, you can extend this code to specify a custom path, rather than just removing the query string, by adjusting the regex.

Mod-Rewrite to a query string

I need to create a series of redirects for personalized urls. They need to go from a directory on my server to a more complex url on another which includes a query string based on the original url. Here is an example:
I would like to rewrite from this:
http://www.mywebsite.com/TestDirectory/John.Doe
to this:
http://their.server.com/adifferentdirectoryname/page.aspx?u=John.Doe&s=lorem&dm=purl
There will be hundreds of these personalized urls I will send out, so I need the solution to account for that so that I don't have to write this for hundreds of names.
Any help is greatly appreciated. Much Thanks!
I think you want something like this:
RewriteRule ^TestDirectory/(\w+\.\w+)$ foo.aspx?u=$1 [R]
The regex \w+\.\w+ matches a word, a dot, and another word. The $1 is replaced with the captured string from the regex. The [R] means to actually redirect the user.
These rules are tough to get just right so I recommend reading through some examples.
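To check what the regex captures and what the full target URL from the question would look like, here is a Python sketch. The function name build_redirect is hypothetical, the target address and the fixed s and dm values are taken from the question, and Python's re module only approximates Apache's regex engine.

```python
import re
from urllib.parse import urlencode

# Same pattern as in the rule: a word, a dot, and another word.
PURL_PATTERN = re.compile(r"^TestDirectory/(\w+\.\w+)$")
TARGET = "http://their.server.com/adifferentdirectoryname/page.aspx"


def build_redirect(path):
    """Return the redirect URL for a personalized path, or None if it does not match."""
    match = PURL_PATTERN.match(path)
    if match is None:
        return None
    query = urlencode({"u": match.group(1), "s": "lorem", "dm": "purl"})
    return f"{TARGET}?{query}"


print(build_redirect("TestDirectory/John.Doe"))
# http://their.server.com/adifferentdirectoryname/page.aspx?u=John.Doe&s=lorem&dm=purl
```

Note that \w does not match a hyphen, so a name such as Mary-Jane.Doe would not match this pattern without widening the character class.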

Is it possible to handle such URL

http://www.example.com/http://www.test.com
I have tried many different methods using .htaccess with no luck. I need to get that second url coming as parameter. Is it possible to redirect it to index.php and get it as $_SERVER["REQUEST_URI"] or other method? Thanks
UPD: Looks like it is impossible to get whole URL, question marks are not recognized. Ideal example:
127.0.0.1/http://www.test.com/script.php?a=hello&b=world#blabla;par2?%par3
and I need to get this exact string in my index.php:
www.test.com/script.php?a=hello&b=world#blabla;par2?%par3
It's definitely possible: http://downforeveryoneorjustme.com/http://www.google.com/
As to how, it's been covered on ServerFault already
The Problem:
This is a problem with Apache running on Windows. Apache on Windows does not let you have a colon (:) in your REQUEST URI. This is basically for avoiding URLs like http://www.mysite.com/C:/SomeFile.exe but is actually annoying.
If you use mod_rewrite at the same time it will be skipped.
You and some applications (like Wikipedia) use a colon (:) in URLs, so what can be done in Apache on Windows?
The Solution:
At the time of writing this answer this bug still persists and there is no absolute solution, BUT there is a trick:
You may change your URL to something like this:
http://www.mysite.com/url/http://www.test.com
In this example, http://www.mysite.com/ is your SCRIPT PATH and /url/http://www.test.com is your REQUEST URI.
The problem goes away if there is a slash (/) before the colon (:).
You can get the URI but only without the fragment since that is not transmitted to the server. Try this rule:
RewriteRule ^http:/ index.php [L]
Then the requested URI path plus query (so the part from the third / up to the first # or the end of the URI) is available at $_SERVER['REQUEST_URI'].
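To make it concrete which part of the "ideal example" from the question can reach the server at all, here is a sketch using Python's urllib.parse. It only splits the URL as typed; the http:// prefix on the host is added here so the splitter sees a complete URL.

```python
from urllib.parse import urlsplit

typed = "http://127.0.0.1/http://www.test.com/script.php?a=hello&b=world#blabla;par2?%par3"
parts = urlsplit(typed)

# Path plus query is what gets transmitted; the fragment stays in the browser.
server_sees = parts.path + ("?" + parts.query if parts.query else "")

print(server_sees)     # /http://www.test.com/script.php?a=hello&b=world
print(parts.fragment)  # blabla;par2?%par3
```

Everything after the first # is treated as the fragment, including the later ? and %, so that tail cannot be recovered in index.php.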

How to use htaccess to rewrite url to html anchor tag (#)

I have a situation where I want to take the following URL:
/1/john
and have it redirect using Apache's htaccess file to go to
/page.php?id=1&name=john#john
so that it goes to an html anchor with the name of john.
I've found a lot of reference to escaping special characters, and to adding the [NE] flag so that the redirect ignores the # sign, but these don't work. For example, adding [NE,R] means that the URL just appears in the browser address as the original: http://example.com/page.php?id=1&name=john#john.
This is possible using [NE] flag (noescape).
By default, special characters, such as & and ?, for example, will be converted to their hexcode equivalent. Using the [NE] flag prevents that from happening.
More info http://httpd.apache.org/docs/2.2/rewrite/flags.html#flag_ne
You can in fact do one of these things, but not both.
You can use the [NE] flag to signify to Apache not to escape the '#' character, but for the redirect to work, you have to specify an absolute URL to redirect to, not simply a relative page. Apache cannot do the scrolling of the window down to the anchor for you. But the browser will, if you redirect to an absolute URL.
What you want to do, can be accomplished with URL rewriting, or, more specifically, URL beautification.
I just quickly found this well explained blog post for you, I hope it can help you out with the learning to rewrite URLs-part.
As for the #-thing (expecting that you now know what I'm talking about), I don't see a problem in passing the same variable to the rewritten URL twice. Like: (notice the last part of the first line)
RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)$ /$1/$2/#$2 [R]
RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)/$ /index.php?page=$1&subpage=$2
Though, you'll have to escape the #-part, and it seems that it can be done this way:
RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)$ /$1/$2/\%23$2 [R,NE]
BTW, URL rewriting is not that hard (but can become complicated, and I'm not an expert), but Google can help a lot along the way.
You cannot do an internal redirect to an anchor. (Just think about it: how would Apache scroll down to the anchor?) Your link should point to /1/john#john. Anchors aren't part of the request URI.
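Since only the browser acts on the anchor, the server can at most hand it a URL containing #john through an external redirect. The mapping such a redirect would have to perform can be sketched in Python as follows. The function name redirect_target and the exact pattern are assumptions for illustration, not part of any answer above.

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern: a numeric id, a slash, then a name made of word characters.
PATH_PATTERN = re.compile(r"^/(\d+)/(\w+)$")


def redirect_target(path):
    """Map '/1/john' to '/page.php?id=1&name=john#john', or None if no match."""
    match = PATH_PATTERN.match(path)
    if match is None:
        return None
    page_id, name = match.groups()
    query = urlencode({"id": page_id, "name": name})
    # The fragment is appended unescaped so the browser can scroll to it.
    return f"/page.php?{query}#{name}"


print(redirect_target("/1/john"))  # /page.php?id=1&name=john#john
```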

Resources