htaccess 301 redirect Find and Replace - .htaccess

I have a website which refers to various Indian cities. The code builds links from an external source, which may name a city as Calcutta, calcuta, Kolkata, etc.
I need an htaccess 301 redirect rule which, given all the misspellings, would redirect the following to the known good spelling (Kolkata):
Calcutta-to-Delhi.html to Kolkata-to-Delhi.html
and also
Calcuta-Airport.html to Kolkata-Airport.html
Thanks.

What you're asking for is not simple and may need a fairly powerful server, but the results are simply amazing.
Here's what I'd suggest:
Test whether your expression matches the URL, i.e. http://mysite.com/(expr1)-to-(expr2)/
If it does and there's no such file, rewrite to a PHP file that handles everything.
In this PHP file, analyze the URL once again: if it matches /(expr1)-to-(expr2)/, do a SOUNDEX search with MySQL. See query sample here.
Then, in this "special" 404 case, offer a suggestion like Google does, e.g. "Did you mean Kolkata-to-Delhi.html? If so, click on the link".
This is hard work, but it's both interesting and shows your skill. Very few websites do this (Google is the only one I know of).
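The first two steps above (match the URL shape, and only act when no real file exists) could look roughly like this in the .htaccess file at the document root. This is a sketch only: suggest.php and its from/to parameters are made-up names, not something from the question.

```apache
RewriteEngine On

# Only step in when the requested path is not an existing file
RewriteCond %{REQUEST_FILENAME} !-f
# Match <expr1>-to-<expr2>.html and hand both parts to the handler.
# suggest.php and the "from"/"to" parameter names are hypothetical.
RewriteRule ^([^/]+)-to-([^/]+)\.html$ /suggest.php?from=$1&to=$2 [L]
```

This is an internal rewrite (no R flag), so the handler still sees the misspelled address and can answer with a 404 page that carries the suggestion.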

Maybe you can try something like this:
# REDIRECTS FROM (eg): http://www.calcutta.com.au/ to http://www.kolkata.com.au/
RewriteCond %{HTTP_HOST} ^calcutta\.com\.au$ [nocase,ornext]
RewriteCond %{HTTP_HOST} ^www\.calcutta\.com\.au$ [nocase]
RewriteRule ^(.*)$ http://www.kolkata.com.au/ [nocase,L,R=301]
Notice the "[nocase]" flag, which ignores case. The flag list must not contain spaces. This will only redirect from "calcutta" to "kolkata" and not from "calcuta" (notice the missing 't' in calcuta).
You will need to add one more rule per additional spelling, following the same pattern as above.
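The rules above work on the host name, while the question is about file names such as Calcutta-to-Delhi.html. For that case, a minimal sketch, assuming the .htaccess file sits in the document root and the misspelled city is always the first part of the file name:

```apache
RewriteEngine On

# calcutt?a matches "calcutta" and "calcuta"; NC makes the match
# case-insensitive. $1 carries the rest of the file name over unchanged.
RewriteRule ^calcutt?a-(.+\.html)$ /Kolkata-$1 [R=301,NC,L]
```

With this, Calcutta-to-Delhi.html goes to /Kolkata-to-Delhi.html and Calcuta-Airport.html goes to /Kolkata-Airport.html. The correct spelling does not match the pattern, so there is no redirect loop. Further misspellings could be added as alternatives inside a group, and a city in the second position (after -to-) would need its own rule.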

Related

How to redirect only when there is something after .html?

I have found that there are some people with bad syntax links to our articles.
For example, we have an article with URL
http://www.oursite.com/demo/article-179.html
The issue is that a lot of people have linked back to this article with bad syntax such as
http://www.oursite.com/demo/article-179.html%5Cohttp:/www.oursite.com/demo/glossary.php
Now, I added the following ReWrite Rule in the .htaccess file to take care of such links.
RewriteRule article-179\.html(.*)$ "http\:\/\/www\.oursite\.com\/demo\/article-179\.html [301,L]
But this has resulted in a Redirect Loop message. How can we fix this issue via an htaccess rewrite rule? Basically, we need a rewrite rule that applies only when there are one or more characters after the .html. If not, it should not redirect.
Any help would be highly appreciated!
With best regards!
Use + instead of *. * matches zero or more, which causes the pattern to match for the redirected path too, + instead matches one or more.
Also, you should make the pattern as precise as possible, i.e. don't just check whether the path contains article-179.html; check for the full path. And if this all happens on the same domain, there's no need to use the absolute URL for the redirect.
There's also no need for escaping the substitution parameter like you did, it's treated as a simple string except for:
back-references ($N) to the RewriteRule pattern
back-references (%N) to the last matched RewriteCond pattern
server-variables as in rule condition test-strings (%{VARNAME})
mapping-function calls (${mapname:key|default})
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
Long story short, theoretically this should do it:
RewriteRule ^demo/article-179\.html(.+)$ /demo/article-179.html [R=301,L]
or this if you really need the absolute URL:
RewriteRule ^demo/article-179\.html(.+)$ http://www.oursite.com/demo/article-179.html [R=301,L]
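If the same kind of broken link shows up for other articles, the rule can be generalised with a capture group. This is only a sketch, and it assumes every article follows the article-<number>.html naming, which the question does not confirm:

```apache
RewriteEngine On

# $1 captures the clean article file name. The trailing .+ requires at
# least one extra character, so the clean URL itself never matches and
# the redirect cannot loop.
RewriteRule ^demo/(article-[0-9]+\.html).+$ /demo/$1 [R=301,L]
```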

.htaccess and dynamically generated SEO friendly URLs

I'm trying to build a website that may be called from the URL bar with any one of the following examples:
domainname.com/en
domainname.com/zh-cn/
domainname.com/fr/page1
domainname.com/ru/dir1/page2
domainname.com/jp/dir1/page2/
domainname.com/es-mx/dir1/dir2/page3.html
These page requests need to hit my .htaccess template and ultimately be converted into this php call:
/index.php?lng=???&tpl=???
I've been trying to write RewriteCond and RewriteRule code that will safely deal with the dynamic nature of the URLs I'm trying to take in, but I'm totally defeated. I've read close to 50 different websites and have been working on this for almost a week now, but I have no idea what I'm doing. I don't even know if I should be using a RewriteCond. Here is my last attempt at making a RewriteRule myself:
RewriteRule ^(([a-z]{2})(-[a-z]{2})?)([a-z0-9-\./]*) /index.php?lng=$1&tpl=$4 [QSA,L,NC]
Thanks for any help,
Vince
What's causing your loop is that your regex pattern also matches /index.php. Why? Let's take a look:
First, the prefix is stripped because these are rules in an htaccess file, so the URI after the first rewrite is: index.php (the query string is separate)
The beginning of your regex, ^(([a-z]{2})(-[a-z]{2})?), matches the "in" at the start of that URI
The next bit of your regex, ([a-z0-9-\./]*), matches dex.php. Thus the rule matches and gets applied again, and will continue to get applied until you've reached the internal recursion limit.
Your URL structure:
domainname.com/en
domainname.com/zh-cn/
domainname.com/fr/page1
domainname.com/ru/dir1/page2
domainname.com/jp/dir1/page2/
domainname.com/es-mx/dir1/dir2/page3.html
Each of these either has a / after the language code or nothing at all, so you need to account for that:
# here -------------------v
^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$
# and an ending match here -----------------^
You shouldn't need to change anything else, but note that $4 now includes the leading slash (use $5 if tpl should not start with one):
RewriteRule ^(([a-z]{2})(-[a-z]{2})?)(/([a-z0-9-\./]*))?$ /index.php?lng=$1&tpl=$4 [QSA,L,NC]

redirect old wordpress ?page_id= to non-wordpress site

I used to have a WP site that I converted to a standard HTML site. The problem is that a Google search, instead of http://www.genealogyinc.com, returns http://www.genealogyinc.com/?page_id=21. I don't know how many pages are like this, but I am trying to find an htaccess workaround; all the ones I found online give me 500 server errors.
I need a rewrite for any ?page_id=, because I don't know how many other numbers are out there.
Thanks
Off the top of my head, without testing, it would be something like this.
The first line looks for the page_id query string parameter; if it is present, the rule on the second line applies. The trailing ? in the target drops the query string, so every ?page_id=N URL redirects to the plain home page. The rewrite rule below may need some tweaking, but I hope this helps you.
RewriteCond %{QUERY_STRING} (^|&)page_id=
RewriteRule ^ /? [R=301,L]
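If the site runs Apache 2.4 or later (an assumption, the question doesn't say), the same idea can be written with the QSD flag. Wrapping it in an IfModule block means the rules are simply skipped when mod_rewrite is not loaded, instead of producing a 500 error. A sketch, not tested against this site:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Fire only when a page_id parameter is present in the query string
    RewriteCond %{QUERY_STRING} (^|&)page_id=
    # QSD (Apache 2.4+) discards the query string on the redirect target
    RewriteRule ^ / [R=301,L,QSD]
</IfModule>
```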

How to redirect an erroneous URL

I just noticed that some websites load perfectly fine even when given a wrong URL. How do they accomplish this? What I mean is, suppose you click on a link that seems good, like www.foo.com, but it ends in a space character, which would appear in the address bar as www.foo.com%20. Some sites manage to redirect this to their correct URL while others just break. How can this be achieved? I'm guessing it has something to do with the .htaccess, but I have no idea what to do or where to do it.
The URL I'd like to redirect looks like this actually: http://foo.com/%C2%A0
I get the following error message:
The requested URL /%C2%A0 was not found on this server.
How can I make this redirection?
So far I came up with:
RewriteEngine on
RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^%?\ ]*\%
RewriteCond %{REQUEST_URI} !^/
RewriteRule ^(.*)$ http://www.foo.com/ [R=301,L]
but it's not working at all
URL Rewrite is the IIS module for this kind of case, and equivalents exist for other servers, if rewriting the URL is what you have in mind.
Don't forget that browsers may make certain guesses about what someone enters: if someone types "foo.com ", the browser may trim the white space by default rather than URL-encode it. If "http://foo.com" fails, it may try "http://www.foo.com" as another simple interpretation of what was typed. If both fail, it may just Google the text, treating the address bar like a search box.
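On Apache, the attempt in the question cannot fire because %{REQUEST_URI} always starts with a slash, so the condition !^/ is never true. A possible sketch for the /%C2%A0 case, assuming the rules live in the .htaccess file at the document root and that only a path made of encoded spaces should be caught:

```apache
RewriteEngine On

# THE_REQUEST is the raw request line, e.g. "GET /%C2%A0 HTTP/1.1",
# so the percent-encoding is still visible here.
# Match a path that consists only of encoded spaces (%20) or
# encoded non-breaking spaces (%C2%A0).
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/(?:%20|%C2%A0)+(?:\s|\?) [NC]
RewriteRule ^ / [R=301,L]
```

The redirect target / does not match the condition, so the rule does not loop. This only covers junk in the path; a stray character in the host part (www.foo.com%20) never reaches the server's rewrite rules.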

URL Rewriting based on form input

I'm creating a frontpage for my website with a single form and input text, Google-style. It's working fine, however, I want to generate a pretty URL based on the input. Let's say, my input is called "id", and using the GET method of form, and the action defined to "/go/", on submission, the URL will be:
site.com/go/?id=whateverIType
and I want to change it to
site.com/go/whateverIType
I was thinking on Mod Rewrite, but if the user put something in the URL, like:
site.com/go/?dontwant=this&id=whateverIType&somemore=trash
I want to ignore every variable except "id" and rewrite the URL.
What's the best way to get this done? Thanks in advance!
PS: I'm using CodeIgniter, maybe there's something I can use for it as well. I already have a controller for "go".
I'm not familiar with CodeIgniter, but you can try the following RewriteRule:
RewriteEngine on
RewriteCond %{REQUEST_URI} ^/go/
RewriteCond %{QUERY_STRING} (?:^|&)id=([^&]*)
RewriteRule (.*) /go/%1? [L,R]
The %1 references the regex group from the last matched RewriteCond, and the trailing ? strips the query string from the redirected URL. The (?:^|&) prefix makes sure only a parameter named exactly id matches, not one whose name merely ends in id.
Hope this helps.
Mod_rewrite supports conditions and rules with regular expressions, so you could have a rule that matches the ?id=XXXX part, extracts it from the URL (ignoring the other parameters), and rewrites the URL accordingly.
However... I don't think you want to do this, because if you rewrite the URL to be /go/Some+Search+Query, you won't be able to pick it up with say, PHP, without parsing the URL out manually.
It's really tough to have custom, SEO-friendly URLs with user input, but it is technically possible. You're better off leaving in the ?id=XXX part, and instead, using mod_rewrite in the opposite approach... take all URLs that match the pattern /go/My+Search+Terms and translate that back into something like ?id=My+Search+Terms, that way you'll be able to easily parse out the value using the URL's GET parameters. This isn't an uncommon practice - Google actually still uses URL parameters for user input (example URL: http://www.google.com/search?q=test).
Just keep in mind that mod_rewrite rewrites the URL before anything else (even PHP), so anything you do to the URL you need to handle. Think of mod_rewrite as a regular expression-based, global "Find and Replace" for URLs, every time a page is called on the server. For example, if you remove the query string, you need to make sure your website/application/whatever accounts for that.
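The "opposite approach" described above could be sketched like this. It assumes a plain PHP script at /go/index.php that reads the id GET parameter (a hypothetical layout, not the CodeIgniter setup from the question) and an .htaccess file in the document root:

```apache
RewriteEngine On

# Skip real files, so /go/index.php itself is never rewritten again
RewriteCond %{REQUEST_FILENAME} !-f
# Internal rewrite (no R flag): the visitor keeps seeing
# /go/My+Search+Terms, while the script receives the term as "id"
RewriteRule ^go/(.+)$ /go/index.php?id=$1 [L]
```

Terms containing characters such as & would need extra escaping (mod_rewrite has a B flag for that). If this is combined with an external redirect away from ?id= URLs, that redirect should test %{THE_REQUEST} rather than %{QUERY_STRING}, otherwise the two rules can chase each other.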
In application/config/routes.php:
$route['go/(:any)'] = "go/index/$1";
Here go is your controller and index is the index action.
http://codeigniter.com/user_guide/general/routing.html
You can use something like this in your .htaccess if you aren't already:
RewriteEngine on
RewriteCond $1 !^(index\.php|images|css|js|robots\.txt)
RewriteRule ^(.*)$ /index.php/$1 [L]

Resources