.htaccess: redirect specific link to another? - .htaccess

I have these three links:
localhost/my_projects/my_website.php
localhost/my_projects/my_website.html
localhost/my_projects/my_website
The paths of the php and html files are as follows:
C:\xampp\htdocs\my_projects\my_website.php
C:\xampp\htdocs\my_projects\my_website.html
The link without an extension is "artificial" and I want to use said link:
localhost/my_projects/my_website
to get the contents of either of these links:
localhost/my_projects/my_website.php
localhost/my_projects/my_website.html
The reason for the two example files, instead of just one, is that I want to be able to switch between those two files when I edit the htaccess file. Obviously I only want to access one of those files at a time.
What do I need to have in my .htaccess file inside the my_projects folder to accomplish that? How can I make one specific link redirect to another specific link?

After reading your comment clarifying your folder structure I corrected the RewriteRule. (By the way, it would be best if you add that info to the question itself instead of in comments).
The url you want to target is: http://localhost/my_projects/my_website
http:// is the protocol
localhost is your domain (it could also be 127.0.0.1 or a domian name like www.example.com in the Internet)
I assume you are running Apache on port 80, otherwise in the url you need to also specify the port. For port 8086 for example it would be http://localhost:8086/my_projects/my_website.
The real path is htdocs/my_projects/my_website.php or htdocs/my_projects/my_website.html depending on your needs (obviously both won't work at the same time).
Here the my_projects in the "fake" url collides with the real folder "my_projects" so Apache will go for the folder and see there is no my_website (with no extension) document there (it won't reach the rewrite rules).
There is a question in SO that provides a work around for this, but it is not a perfect solution, it has edge cases where the url will still fail or make other urls fail. I had posted it yesterday, but I seem not to find it now.
The simple solution if you have the flexibility for doing it is to change the "fake" url for it not to collide with the real path.
One option is for example to replace the underscores with hyphens.
Then you would access the page as http://localhost/my-projects/my-website if you want to keep a sort of "fake" folder structure in the url. Otherwise you could simply use http://localhost/my-website.
Here are both alternatives:
# This is for the directory not to be shown. You can remove it if you don't mind that happening.
Options -Indexes
RewriteEngine On
#Rule for http://localhost/my-projects/my-website
RewriteRule ^my-projects/my-website(.+)?$ my_projects/my_website.php$1 [NC,L]
#Rule for http://localhost/my-website
RewriteRule ^my-website(.+)?$ my_projects/my_website.php$1 [NC,L]
(Don't use both, just choose one of these two, or use them to adapt it to your needs)
The first part the rewrite rule is the regular expression for your "fake" url, the second part is the relative path of your real folder structure upto the page you want to show.
In the regular expression we capture whatever what we assume to be possible query parameters after .../my_website, and paste it after my_website.php in the second part of the rule (the $1).
Later on if you want to point the url to my_website.html, you have to change the second part of the rule, where it says .php, replace it by .html.
By the way, it is perfectly valid and you'll see it in most SEO friendly web sites to write an url as http://www.somesite.com/some-page-locator, and have a rewrite rule that translates that url to a page on the website, which is what I had written in my first answer.

Related

URL rewrite with HTACCESS for multiple language support

I am looking to add multiple language support to my website. Is it possible to use the .htaccess file to change something like:
example.com/dir/?lang=en to example.com/en/dir/
example.com/main/?lang=de to example.com/de/main/
example.com/main/page.php?lang=de to example.com/de/main/page.php
Where this works with any possible directories - so if for instance later on I made a new directory, I wouldn't need to add to this. In the above example, I want the latter to be what the user types in/is in the address bar, and the start to be how it is used internally.
Is it possible to use the .htaccess file to change something like:
example.com/dir/?lang=en to example.com/en/dir/
Yes, except that you don't change the URL to example.com/en/dir/ in .htaccess. You change the URL to example.com/en/dir/ in your internal links in your application, before you change anything in .htaccess. This is the canonical URL and is "what the user types in/is in the address bar" - as you say.
You then use .htaccess to internally rewrite the request from example.com/en/dir/, back into the URL your application understands, ie. example.com/dir/?lang=en (or rather example.com/dir/index.php?lang=en - see below). This is entirely hidden from the user. The user only ever sees example.com/en/dir/ - even when they look at the HTML source.
So, we need to rewrite /<lang>/<url-path>/ to /<url-path>/?lang=<lang>. Where <lang> is assumed to be a 2 character lowercase language code. If you are offering only a small selection of languages then this should be explicitly stated to avoid conflicts. We can also handle any additional query string on the original request (if this is requried). eg. /<lang>/<url-path>/?<query-string> to /<url-path>/?lang=<lang>&<query-string>.
A slight complication here is that a URL of the form /dir/?lang=en is not strictly a valid endpoint and requires further rewriting. I expect you are relying on mod_dir to issue an internal subrequest for the DirectoryIndex, eg. index.php? So, really, this should be rewritten directly to /dir/index.php?lang=en - or whatever the DirectoryIndex document is defined as.
For example, in your root .htaccess file:
RewriteEngine On
# Rewrite "/<lang>/<directory>/" to `/<directory>/index.php?lang=<lang>"
RewriteCond %{DOCUMENT_ROOT}/$2/index.php -f
RewriteRule ^([a-z]{2})/(.*?)/?$ $2/index.php?lang=$1 [L]
# Rewrite "/<lang>/<directory>/<file>" to `/<directory>/<file>?lang=<lang>"
RewriteCond %{DOCUMENT_ROOT}/$2 -f
RewriteRule ^([a-z]{2})/(.+) $2?lang=$1 [L]
If you have just two languages (as in your example), or a small subset of known languages then change the ([a-z]{2}) subpattern to use alternation and explicitly identify each language code, eg. (en|de|ab|cd).
This does assume you don't have physical directories in the document root that consist of 2 lowercase letters (or match the specific language codes).
Only URLs where the destination directory (that contains index.php) or file exists are rewritten.
This will also rewrite requests for the document root (not explicitly stated in your examples). eg. example.com/en/ (trailing slash required here) is rewritten to /index.php?lang=en.
The regex could be made slightly more efficient if requests for directories always contain a trailing slash. In the above I've assumed the trailing slash is optional, although this does potentially create a duplicate content issue unless you resolve this in some other way (eg. rel="canonical" link element). So, in the code above, both example.com/en/dir/ (trailing slash) and example.com/en/dir (no trailing slash) are both accessible and both return the same resource, ie. /dir/index.php?lang=en.

mod_rewrite so a specific directory is removed/hidden from the URL, but still displays the content

I'd like to create a rewrite in .htaccess for my site so that when a user asks for URL A, the content comes from URL B, but the user still sees the URL as being URL A.
So, for example, let's say I have content at mydomain.com/projects/project-example. I want users to be able to ask for mydomain.com/project-example, still see that URL in their address bar, but the browser should display the content from mydomain.com/projects/project-example.
I've looked through several .htaccess rewrite tips and FAQs, but unfortunately none of them seemed to present a solution for exactly what I've described above. Not everything on my domain will be coming from the /projects/ directory, so I'd imagine the rewrite should check to see if the page exists first so it's not appending /projects/ to every url. I'm really stumped.
If a rewrite is not exactly what I need, or if there is a simple solution for this problem, I'd love to hear it.
This tutorial should have everything that you need, including addressing exactly what you are asking: http://httpd.apache.org/docs/2.0/misc/rewriteguide.html . It may just be a matter of terminology.
So, for example, let's say I have content at mydomain.com/projects/project-example. I want users to be able to ask for mydomain.com/project-example, still see that URL in their address bar, but the browser should display the content from mydomain.com/projects/project-example.
With something like:
RewriteRule ^project-example$ /projects/project-example [L]
When someone requests http://mydomain.com/project-example and the URI /project-example gets rewritten internally to /projects/project-example. Note that when this is in an .htaccess file, the URI /project-example gets the leading slash removed when matching.
If you have a directory of stuff, you can use regular expressions and back-references, for example you want any request for http://mydomain.com/stuff/ to map to /internal/stuff/:
RewriteRule ^stuff/(.*)$ /internal/stuff/$1 [L]
So requests for http://mydomain.com/stuff/file1.html, http://mydomain.com/stuff/image1.png, etc. get rewritten to /internal/stuff/file1.html, /internal/stuff/image1.png, etc.

URL rewrite image links

I have third party sites that link to some images on my site. The images were placed in Magento's image cache some time ago. But when the cache is refreshed, Magento modifies the file names and thus the links become unreachable. It is not every image just certain ones that this is happening to. I have 22 images where I need to do this.
How can I modify my .htaccess to make the links go to a static copy of the image located in another directory?
Take a look at mod_alias and RedirectMatch, you can use regular expressions to match against a URI and a target (where to redirect to), if you don't need regular expressions, you can just use Redirect.
RedirectMatch /old_image_uri /new_image_uri

Having huge redirect list in .htaccess a Problem?

I want to redirect every post 301 redirect, but I have over 3000 posts.
If I list
Redirect permanent /blog/2010/07/post.html http://new.blog.com/2010/07/23/post/
Redirect permanent /blog/2010/07/post1.html http://new.blog.com/2010/07/24/post1/
Redirect permanent /blog/2010/07/post2.html http://new.blog.com/2010/07/25/post2/
Redirect permanent /blog/2010/07/post3.html http://new.blog.com/2010/07/26/post3/
Redirect per......
for over 3000 url redirect command in .htaccess would this eat my server resource or cause some problem? Im not sure how .htaccess work but if the server is looking at these lists each time user requests for page, I would guess it will be a resource hog.
I can't use RedirectMatch because I added date variable in my new url. Do you have any other suggestions redirecting these posts? Or am I just fine?
Thanks!
I am not an Apache expert, so I cannot speak to whether or not having 3,000 redirects in .htaccess is a problem (though my gut tells me it probably is a bad idea). However, as a simpler solution to your problem, why not use mod_rewrite to do your redirects?
RewriteRule ^/blog/(.+)/(.+)/(.+).html$ http://new.blog.com/$1/$2/$3/ [R=permanent]
This uses a regex to match old URLs and rewrite them to new ones. The [R=permanent] instructs mod_rewrite to issue a 301 with the new URL instead of silently rewriting the request internally.
In your example, it looks like you've added the day of the post to the URL, which does not exist in the old URL. Since you obviously cannot use a regexp to divine the day an arbitrary post was made, this method may not work for you. If you can drop the day from the URL, then you're good to go.
Edit: The first time I read your question, I missed the last paragraph. ("I can't use RedirectMatch because I added date variable in my new url.") In this case, you can use mod_rewrite's RewriteMap to lookup the day component of a post.
You have two options:
Use a hashmap to perform fast lookups in a static file. This means all your old URLs will work, but any new posts cannot be accessed using the old URL scheme.
Use a script to grab the day.
In option one, create a file called posts.txt and put:
/yyyy/mm/pppp dd
...for each post where yyyy is the year of the post, mm is the month, and pppp is the post name (without the .html).
When you're done, run:
$ httxt2dbm -i posts.txt -o posts.map
Then we add to to the server/virtual server config: (Note the path is a filesystem path, not a URL.)
RewriteMap postday dbm:/path/to/file/posts.map
RewriteRule ^/blog/(.+)/(.+)/(.+).html$ http://new.blog.com/$1/$2/${postday:$1/$2/$3}/$3/ [R=permanent]
In option two, use pgm:/path/to/script/lookup.whatever as your RewriteMap. See the mod_rewrite documentation for more info about using a script.
Doing the lookup in mod_rewrite is better than just redirecting to a script which looks up the date and then redirects to the final destination because you should never redirect more than once. Issuing a 301 or 302 incurs a round trip cost, which increases the latency of your page load time.
If you have some way in code to determine the day of a post, you can generate the rewrite on the fly. You can setup a mod_rewrite pattern, something like .html and set up a front controller pattern to calculate the new url from the old and issue the 301 header.
With php as an example:
$_SERVER['REQUEST_URI']
will contain the requested url and
header("Location: http://new.blog.com/$y/$m/$d/$title/",TRUE,301);
will send a redirect.
That's... a lot of redirects. But the first thing I would tell you, and probably the only thing I can tell you without qualification, is that you should run some tests and see what the access times for your blog are like, and also look at the server's CPU and memory usage while you're doing it. If they're fairly low even with that giant list of redirects, you're okay as long as your blog doesn't experience a sudden increase in traffic. (I strongly suspect the 3000 rewrites will be slowing Apache down a lot, though)
That being said, I would second josh's suggestion of replacing the redirects with something dynamic. Like animuson said, if you're willing to drop the day from the URL, it'll be easy to set up a RewriteRule directive to handle the redirection. Otherwise, you could do it with a PHP script, or generally some code in whatever scripting language you (can) use. If you're using one of the popular blog engines, it probably contains code to do this already. Basically you could do something like
RewriteRule .* /blog/index.php
and just let the PHP script sort out which post was requested. It has access to the database so it'll be able to do that, and then you can either display the post directly from the PHP script, or to recover your original redirection behavior, you can send a Location header with the correct URL.
An alternative would be to use RewriteMap, which lets you write a RewriteRule where the target is determined by a program or file of your choice instead of being directly specified in the configuration file. As one option, you can specify a text file that contains the old and new URLs, and Apache will handle searching the file for the appropriate line for any given request. Read the documentation (linked above) for the full details. I will mention that this isn't used very often, and I'm not sure how much faster it would be compared to just having 3000 redirects.
Last tip: Apache can be significantly faster if you're able to move the configuration directives (like Redirect) into the server or virtual host configuration file, and disable reading of .htaccess entirely. I would guess that moving 3000 directives from .htaccess into the virtual host configuration could make your server considerably faster. But even moving the directives into the vhost config file probably wouldn't produce as much of a speedup as using a single RewriteRule.
It's never a good idea to make a massive list of Redirects. A better programming technique is to simply redirect the pages without that date variable then have a small PHP snippet that detects if it's missing and redirects to the URL with it included. The long list looks tacky and slows down Apache because it's checking that URL (any every other URL that might not even be affected by this) against each line. If it were only 5 or so, I'd say fine, but 3,000 is a definite NO.
Although I'm not a big fan of this method, a better choice would be to redirect all those URLs normally using a single match statement, redirecting them to the page without the date part, or with a dash or something, then include a small PHP snippet to check if the date is valid and if not, rewrite the path again to the correctly formed URL.
Honestly, if you didn't have that part there before, you don't need it now, and it will probably just confuse the search engines changing the URL for 3,000 posts. You don't really need a date in the URL, a good title is much more meaningful not only to users, but also to search engines, than a bunch of numbers.

Getting "mywebsite.org/" to resolve to "mywebsite.org/index.php"

At my work we have various web pages that, my boss feels, are being ranked lower than they should be because "mywebsite.org/category/" looks like a different URL to search engines than "mywebsite.org/category/index.php" does, even though they show the same file. I don't think it works this way but he's convinced. Maybe I'm wrong though. I have two questions:
How do i make it so that it will say "index.php" in the address bar of all subcategories?
Is this really how pagerank works?
Besides changing all the links everywhere, a simpler solution is to use a rewrite rule. Make sure it is a permanent redirect, or Google will keep using the old link (without index.php). How you do this exactly depends on your web server, but for Apache HTTPd it looks something like the example given below.
Yes. Or so I've heard. Very few people know for sure. But Google mentions this guideline (as "Be consistent"). Make sure to check out all of Google's Webmaster guidelines.
Apache config for rewrite rule:
# in the generic config
LoadModule rewrite_module modules/mod_rewrite.so
# in your virutal host
RewriteEngine On
# redirect everything that ends in a slash to the same, but with index.php added
RewriteRule ^(.*)/$ $1/index.php [R=301,L]
# or the other way around, as suggested
# RewriteRule ^(.*)/index.php$ $1/ [R=301,L]
Adding this code to the top of every page should also work:
<?php
if (substr($_SERVER['REQUEST_URI'], -1) == '/') {
$new_request_uri = $_SERVER['REQUEST_URI'].'index.php';
header('HTTP/1.1 301 Moved Permanently');
header('Location: '.$new_request_uri);
exit;
}
?>
You don't tell us if you're using straight PHP or some other framework, but for PHP, probably you just need to change all the links on your site to "mywebsite.org/category/index.php".
I think it's possible that this does affect your search engine rank. However, you would be better off using only "mywebsite.org/category" rather than adding "index.php" to each one.
Bottom line is that you need to make sure all your links in your website use one or the other. What actually gets shown in the address bar is unimportant.
A simple solution is to put in the <head> tag:
<link rel="canonical" href="http://mywebsite.org/category/" />
Then, no matter which page the search engine ends up on, it will know it is simply a different view of /category/
And for your second question--yes, it can affect your results, if Google thinks you are spamming. If it wasn't, they wouldn't have added support for rel="canonical". Although I wouldn't be surprised if they treat somedir/index.* the same as somedir/
I'm not sure if /category/ and /category/index.php are considered two urls for seo, but there is a good chance that it will effect them, one way or another. There is nothing wrong with making a quick change just to be sure.
A few thoughts:
URLs
Rather than adding /index.php, you will be better off making it so there is no index.php on any of them, since the keyword 'index' is probably not what you want.
You can make a script that will check if the URL of the current page ends in index.php and remove it, then forward to the resulting URL.
For example, on one of my sites, I require the 'www.' for my domain (www.domain.com and domain.com are considered two URLs for search purposes, though not always), so I have a script that checks each page and if there is no www., it ads it, and forwards.
if (APPLICATION_LIVE) {
if ( (strtolower($_SERVER["HTTP_HOST"]) != "www.domain.com") ) {
header("HTTP/1.1 301 Moved Permanently"); // Recognized by search engines and may count the link toward the correct URL...
header("Location: " . 'www.domain.com/'.$_SERVER["REQUEST_URI"] );
exit();
}
}
You could mode that to do what you need.
That way, if a crawler visits the wrong URL, it will be notified that it was replaced with the correct URL. If a person visits the wrong URL, they will be forwarded to the correct URL (most won't notice), and then if they copy the url from the browser to send someone or link to that page, they will end up linking to the correct url for that page.
LINKING URLS
They way other pages link to your pages is more important for seo. Make sure all your in-site links use the proper URL (without /index.php), and that if you have a 'link to this page' feature, it doesn't include the /index.php part. You can't control how everyone links to you, but you can take some control over it, like with the script in item 1.
URL ROUTING
You may also want to consider using some sort of framework or stand-alone URL rerouting scheme. It could make it so there were more keywords, etc.
See here for an example: http://docs.kohanaphp.com/general/routing
I agree with everyone who's saying to ditch the index.php. Please don't force your visitor to type index.php if not typing it could get them the same result.
You didn't say if you're on an IIS or Apache server.
IIS can be set to assume index.php is the default page so that http:// mywebsite.org/ will resolve correctly without including index.php.
I would say that if you want to include the default page and force your users to type the page name in the url, make the page name meaningful to a search engine and to your visitors.
Example:
http://mywebsite.org/teaching-web-scripting.php
is far more descriptive and beneficial for SEO rankings than just
http://mywebsite.org/index.php
Might want to take a look at robots.txt files? Not quite the best solution, but you should be able to implement something workable with them...

Resources