Using one directory with different aliases using .htaccess - .htaccess

I'm redeveloping a site (replacing it with one based on CodeIgniter), which is currently a horrid mess of repeated procedural code, however, it has good search engine rankings. Because of this, I need to keep the exact same URL structure.
The company has many different quote pages, which are all essentially the same - so I've produced one clean version which can be used everywhere.
The quote system is now in a folder called /get-quote, but due to the old URLs being required, that folder mustn't be visible anywhere.
I'd like the following to happen, but don't know how to:
A user accessing /insurancequote.php should (on the server) load the /get-quote/ directory (which in turn will load the default CI route). The Base URL in CI should be http://www.mysite.com/insurancequote.php (I'm able to do that bit), so moving to step 2 would result in: http://www.mysite.com/insurancequote.php/step2 (which would map to /get-quote/step2).
Secondly, a user accessing /brokerquote.php should show mysite.com/broker in the address bar (redirect?), but on the server access /get-quote/broker.
Thirdly, a user accessing one of many broker-specific pages, e.g. mysite.com/brokername1.php or mysite.com/broker/brokername2.php (yep, they are scattered all over the place! - but I do know where each one is) should show mysite.com/broker/brokername1 or mysite.com/broker/brokername2. On the server, /get-quote/broker/brokername1 or /get-quote/broker/brokername2 should be accessed.
I don't think what I've written is completely clear, so maybe sudocode helps:
If '/insurancequote.php'
Dont Redirect
Use '/get-quote/'
If '/brokerquote.php'
Redirect '/broker/'
Use '/get-quote/broker/'
// Do the following (manually) for each broker
If '/brokername1.php'
Redirect '/broker/brokername1/'
Use '/get-quote/broker/brokername1/'
If '/brokers/bname2.php'
Redirect '/broker/brokername2/'
Use '/get-quote/broker/brokername2/'
If '/mybrokerpage.php'
Redirect '/broker/mybroker/'
Use '/get-quote/broker/mybroker/'
Is this possible? If so, how would I go about doing it?
Thanks!

The risk you take is messing up your new clean code for historical reasons (and the guy coming next you will say, WTF, this is a mess!).
For me the right solution would be handling the url migration in apache and not in your application. Every refferenced url that you do not want to keep should get a 410 - Gone message (think about referenced images for example) and every referenced page which have a new matching page should get a 301 - moved permanently redirection on the right page. Then after some time as gone check the access log of your server, and if nobody checks the old url anymore then remove the rules.
If you know every old url and every matching url then use a matching url file (or hash file, faster) and manage the redirection 301 with rewriteMap. You can have a really big number of files in a hash file, the match should be fast. And it should be a temporary function, waiting for robots to fix the urls.

Related

WebURK without any kind of subdirectories

It is my first time seeing something like this.
Does anyone know, what the name/kind/type of the website is that does not have any kind of subdirectories on the web-URL page, and it always just stays as a plain domain name, and how it was made, and how it can be avoided since I need to send an API call to one of those subdirectories?
Example:
I have a website let's call it example.net. It has UI page and it has a home page, which should look like this in a browser: example.net/home, or it has a /shipment option inside of the UI page. So the URL should look like this:
example.net/shipment and it has one more subdirectory inside for example /report, and if I select it, it should look like this: example.net/shipment/report (something like this).
And open up that subdirectory, but again web-URL link on a website continues to stay just as a example.net all the time.
And for some reason whatever subdirectory I would go on a website, Web-browser URL will remain as a hello-world.net all the time without any kind of changes subdirectories on a web-browser URL.
It is an internal website, so I can not post examples of it from work here.
Does anyone knows, what the name of that kind of set?
How it can be avoided? Since I need to send an API request to one of the subdirectories?
I am not a developer, and I am new to IT, so I am not really sure, what the name of this, and how does it works.
If you are on example.net/shipment and you want to link to a subdirectory, the link needs to include that subdirectory. You have two possibilities:
Root relative links: <a href=/shipment/report>
Absolute links: <a href=https://example.com/shipment/report>
If you shipment directory has a trailing slash (example.net/shipment/), you a third possibility. (Note this only works with a shipment URL that is different than what you specified in your question.)
Document relative links: <href=report>
There is no name for websites that don't have subdirectories that I know of. Websites are often set up like this to make the URLs easy to type and remember which helps with SEO.

.htaccess: redirect specific link to another?

I have these three links:
localhost/my_projects/my_website.php
localhost/my_projects/my_website.html
localhost/my_projects/my_website
The paths of the php and html files are as follows:
C:\xampp\htdocs\my_projects\my_website.php
C:\xampp\htdocs\my_projects\my_website.html
The link without an extension is "artificial" and I want to use said link:
localhost/my_projects/my_website
to get the contents of either of these links:
localhost/my_projects/my_website.php
localhost/my_projects/my_website.html
The reason for the two example files, instead of just one, is that I want to be able to switch between those two files when I edit the htaccess file. Obviously I only want to access one of those files at a time.
What do I need to have in my .htaccess file inside the my_projects folder to accomplish that? How can I make one specific link redirect to another specific link?
After reading your comment clarifying your folder structure I corrected the RewriteRule. (By the way, it would be best if you add that info to the question itself instead of in comments).
The url you want to target is: http://localhost/my_projects/my_website
http:// is the protocol
localhost is your domain (it could also be 127.0.0.1 or a domian name like www.example.com in the Internet)
I assume you are running Apache on port 80, otherwise in the url you need to also specify the port. For port 8086 for example it would be http://localhost:8086/my_projects/my_website.
The real path is htdocs/my_projects/my_website.php or htdocs/my_projects/my_website.html depending on your needs (obviously both won't work at the same time).
Here the my_projects in the "fake" url collides with the real folder "my_projects" so Apache will go for the folder and see there is no my_website (with no extension) document there (it won't reach the rewrite rules).
There is a question in SO that provides a work around for this, but it is not a perfect solution, it has edge cases where the url will still fail or make other urls fail. I had posted it yesterday, but I seem not to find it now.
The simple solution if you have the flexibility for doing it is to change the "fake" url for it not to collide with the real path.
One option is for example to replace the underscores with hyphens.
Then you would access the page as http://localhost/my-projects/my-website if you want to keep a sort of "fake" folder structure in the url. Otherwise you could simply use http://localhost/my-website.
Here are both alternatives:
# This is for the directory not to be shown. You can remove it if you don't mind that happening.
Options -Indexes
RewriteEngine On
#Rule for http://localhost/my-projects/my-website
RewriteRule ^my-projects/my-website(.+)?$ my_projects/my_website.php$1 [NC,L]
#Rule for http://localhost/my-website
RewriteRule ^my-website(.+)?$ my_projects/my_website.php$1 [NC,L]
(Don't use both, just choose one of these two, or use them to adapt it to your needs)
The first part the rewrite rule is the regular expression for your "fake" url, the second part is the relative path of your real folder structure upto the page you want to show.
In the regular expression we capture whatever what we assume to be possible query parameters after .../my_website, and paste it after my_website.php in the second part of the rule (the $1).
Later on if you want to point the url to my_website.html, you have to change the second part of the rule, where it says .php, replace it by .html.
By the way, it is perfectly valid and you'll see it in most SEO friendly web sites to write an url as http://www.somesite.com/some-page-locator, and have a rewrite rule that translates that url to a page on the website, which is what I had written in my first answer.

Notes 9, rewriting URLs

How do you rewrite a URL in Notes 9 XPages.
Let's say I have:
www.example.com/myapp.nsf/page-name
How do I get rid of that .nsf part:
www.example.com/page-name
I don't want to do lots of manual re-direct because my pages are dynamically formed like wordpress.
I've read this: http://www.ibm.com/developerworks/lotus/library/ls-Web_site_rules/
It does not address the issue.
If you use substitution rules like the following, you can get rid of the db.nsf part and call your XPages directly as example.com/xpage1.xsp:
Rule (substitution): /db.nsf/* -> /db.nsf/*
Rule (substitution): /* -> /db.nsf/*
However, you have to "manually" generate your URLs without the db.nsf part in e.g. menus because the XPages runtime will include the db.nsf part in the URLs if you use for instance the openPage simple action.
To completely control what is going in and out put your Domino behind an Apache HTTP and use mod_rewrite. On Domino 9.0 Windows you can use mod_domino
You can do it with a mix of subsitutions, "URL-pattern" and paritial refresh.
I had the same problem, my customers wants clean URLs for SEO.
My URLs now looks like these:
www.myserver.de/products/financesoftware/anyproduct
First i used one subsitution to cover the folder, database and xpage part of the URL.
My substitution: "/products" -> "/web/techdemo.nsf/product.xsp"
Problem with these is, any update on this site (with in redirect mode) and the user gets back the "dirty" URL.
I solved this with the use of paritial refreshes only.
Last but not least, i uses my own slash pattern at the end of the xpage call (.xsp)
In my case thats the "/financesoftware/anyproduct/" part.
I used facesContext.getExternalContext().getRequestPathInfo() to resolve that URL part.
Currently i used good old RegExp to get the slash separated parameters back out of the url, but i am investigating a REST solution at the moment.
I haven't actually done this, but just saw the option yesterday while looking for something else. In your Xpage, go to All Properties, and look at 'navigationRules' and 'pageBaseUrl'. I think you will find what you are looking for there.

CakePHP nice urls - how to prevent normal urls from working

I have a website that's written using CakePHP. I've added some rewrite rules in the .htacces file to change the default urls to different ones (instead of /controller1/action1/parameter I have /some-string-about-controller-and-action/parameter, for example).
The problem is that now both the normal url and the nice one are available, and google seems to be indexing both, which is a problem. I'd like to only keep the nice one, which is the proper way to handle this so that it affects the google results as little as possible?
I don't know why you don't want to use cakes own routing (if you are having trouble doing what you want, you can accomplish what you want with a custom route class), then make sure that you redirect all relevant URL's in your .htaccess file to the desired URL using a MOVED PERMANENTLY redirect.
This way google will index the target url instead of the one that is undesirable. You are right to take offense to this, double indexing is a great way to harm your SEO rankings.

Having huge redirect list in .htaccess a Problem?

I want to redirect every post 301 redirect, but I have over 3000 posts.
If I list
Redirect permanent /blog/2010/07/post.html http://new.blog.com/2010/07/23/post/
Redirect permanent /blog/2010/07/post1.html http://new.blog.com/2010/07/24/post1/
Redirect permanent /blog/2010/07/post2.html http://new.blog.com/2010/07/25/post2/
Redirect permanent /blog/2010/07/post3.html http://new.blog.com/2010/07/26/post3/
Redirect per......
for over 3000 url redirect command in .htaccess would this eat my server resource or cause some problem? Im not sure how .htaccess work but if the server is looking at these lists each time user requests for page, I would guess it will be a resource hog.
I can't use RedirectMatch because I added date variable in my new url. Do you have any other suggestions redirecting these posts? Or am I just fine?
Thanks!
I am not an Apache expert, so I cannot speak to whether or not having 3,000 redirects in .htaccess is a problem (though my gut tells me it probably is a bad idea). However, as a simpler solution to your problem, why not use mod_rewrite to do your redirects?
RewriteRule ^/blog/(.+)/(.+)/(.+).html$ http://new.blog.com/$1/$2/$3/ [R=permanent]
This uses a regex to match old URLs and rewrite them to new ones. The [R=permanent] instructs mod_rewrite to issue a 301 with the new URL instead of silently rewriting the request internally.
In your example, it looks like you've added the day of the post to the URL, which does not exist in the old URL. Since you obviously cannot use a regexp to divine the day an arbitrary post was made, this method may not work for you. If you can drop the day from the URL, then you're good to go.
Edit: The first time I read your question, I missed the last paragraph. ("I can't use RedirectMatch because I added date variable in my new url.") In this case, you can use mod_rewrite's RewriteMap to lookup the day component of a post.
You have two options:
Use a hashmap to perform fast lookups in a static file. This means all your old URLs will work, but any new posts cannot be accessed using the old URL scheme.
Use a script to grab the day.
In option one, create a file called posts.txt and put:
/yyyy/mm/pppp dd
...for each post where yyyy is the year of the post, mm is the month, and pppp is the post name (without the .html).
When you're done, run:
$ httxt2dbm -i posts.txt -o posts.map
Then we add to to the server/virtual server config: (Note the path is a filesystem path, not a URL.)
RewriteMap postday dbm:/path/to/file/posts.map
RewriteRule ^/blog/(.+)/(.+)/(.+).html$ http://new.blog.com/$1/$2/${postday:$1/$2/$3}/$3/ [R=permanent]
In option two, use pgm:/path/to/script/lookup.whatever as your RewriteMap. See the mod_rewrite documentation for more info about using a script.
Doing the lookup in mod_rewrite is better than just redirecting to a script which looks up the date and then redirects to the final destination because you should never redirect more than once. Issuing a 301 or 302 incurs a round trip cost, which increases the latency of your page load time.
If you have some way in code to determine the day of a post, you can generate the rewrite on the fly. You can setup a mod_rewrite pattern, something like .html and set up a front controller pattern to calculate the new url from the old and issue the 301 header.
With php as an example:
$_SERVER['REQUEST_URI']
will contain the requested url and
header("Location: http://new.blog.com/$y/$m/$d/$title/",TRUE,301);
will send a redirect.
That's... a lot of redirects. But the first thing I would tell you, and probably the only thing I can tell you without qualification, is that you should run some tests and see what the access times for your blog are like, and also look at the server's CPU and memory usage while you're doing it. If they're fairly low even with that giant list of redirects, you're okay as long as your blog doesn't experience a sudden increase in traffic. (I strongly suspect the 3000 rewrites will be slowing Apache down a lot, though)
That being said, I would second josh's suggestion of replacing the redirects with something dynamic. Like animuson said, if you're willing to drop the day from the URL, it'll be easy to set up a RewriteRule directive to handle the redirection. Otherwise, you could do it with a PHP script, or generally some code in whatever scripting language you (can) use. If you're using one of the popular blog engines, it probably contains code to do this already. Basically you could do something like
RewriteRule .* /blog/index.php
and just let the PHP script sort out which post was requested. It has access to the database so it'll be able to do that, and then you can either display the post directly from the PHP script, or to recover your original redirection behavior, you can send a Location header with the correct URL.
An alternative would be to use RewriteMap, which lets you write a RewriteRule where the target is determined by a program or file of your choice instead of being directly specified in the configuration file. As one option, you can specify a text file that contains the old and new URLs, and Apache will handle searching the file for the appropriate line for any given request. Read the documentation (linked above) for the full details. I will mention that this isn't used very often, and I'm not sure how much faster it would be compared to just having 3000 redirects.
Last tip: Apache can be significantly faster if you're able to move the configuration directives (like Redirect) into the server or virtual host configuration file, and disable reading of .htaccess entirely. I would guess that moving 3000 directives from .htaccess into the virtual host configuration could make your server considerably faster. But even moving the directives into the vhost config file probably wouldn't produce as much of a speedup as using a single RewriteRule.
It's never a good idea to make a massive list of Redirects. A better programming technique is to simply redirect the pages without that date variable then have a small PHP snippet that detects if it's missing and redirects to the URL with it included. The long list looks tacky and slows down Apache because it's checking that URL (any every other URL that might not even be affected by this) against each line. If it were only 5 or so, I'd say fine, but 3,000 is a definite NO.
Although I'm not a big fan of this method, a better choice would be to redirect all those URLs normally using a single match statement, redirecting them to the page without the date part, or with a dash or something, then include a small PHP snippet to check if the date is valid and if not, rewrite the path again to the correctly formed URL.
Honestly, if you didn't have that part there before, you don't need it now, and it will probably just confuse the search engines changing the URL for 3,000 posts. You don't really need a date in the URL, a good title is much more meaningful not only to users, but also to search engines, than a bunch of numbers.

Resources