htaccess to remove print.html from URL

We are currently hosting a large Joomla site.
Google has indexed hundreds of the "print" versions of our pages.
For example, if we have an article with the URL:
www.mysite.com/funnyarticle.html
then Joomla automatically created:
www.mysite.com/funnyarticle/print.html
We have moved the site and deleted these pages, so Google now gets a 404 error for them.
We would like to redirect or rewrite (not sure which is the correct term) the "print" URLs to their respective articles.
I would like to use htaccess to remove:
/print.html
and replace it with:
.html
I have seen examples but cannot get them to work correctly.
So I was hoping I could get specific advice on how to remove and replace the exact strings above.
Thanks for your time.
Regards,
Aforantman

You can create a robots.txt file with the following lines:
User-agent: *
Disallow: /*/print.html
This tells search engine robots not to crawl URLs matching /*/print.html.

You probably want to use a RewriteRule. See Apache's guide on how to use them: http://httpd.apache.org/docs/2.0/rewrite/rewrite_guide.html
But if you just want Google (and other search engines) to ignore those print versions, put a corresponding entry in your robots.txt. That way you don't need to fiddle around with Joomla's way of generating and accessing the print version for your human visitors.

Put these lines in your DOCROOT/.htaccess file:
RewriteEngine On
RewriteBase /
RewriteRule ^(.*?)/print\.html$ /$1.html [L,R=301]
This will redirect anyone (including Googlebot) who follows a link to one of these pages to the correct article. The dot is escaped and the pattern is anchored with $ so that only URLs ending in /print.html match. Article paths containing / are handled too, since . also matches /; with the end anchor the lazy ? makes no difference to what is matched, and dropping it costs at most a few µs of runtime :-)

You can use robots.txt, as Jishnu said. This is the best way to do this.
User-agent: *
Disallow: /*/print.html

Related

How to ignore/redirect all URLs matching a certain string

I am using the Wordpress plugin, Timely All-in-One events calendar. Unfortunately it is creating a plethora of duplicate URLs which end in strings like (https://www.mywebsite.com/events/action~agenda/page_offset~-2/request_format~json/cat_ids~4) or (https://www.mywebsite.com/events/action~oneday/exact_date~2-4-2019/) for example.
As a consequence of these URL directives each being for a different calendar view but containing the same webpage title and content, some search engines are seeing this as duplicate content. Whilst robots.txt is setup to tell bots to ignore the URLs containing said strings, some crawlers are ignoring robots.txt. I have also disabled the various different calendar views so there is now only the agenda view but even in spite of this, bots continue to crawl these URLs.
Is it therefore possible to use Apache or a .htaccess directive to tell the server to handle any request containing "/action~" by either removing the string from the URL, so the browser just reads "/events/", or redirecting the URL to another page?
There are over 500 of these URLs so I ideally would like a quick remedy!
Thanks in advance.
Try this rewrite rule in your .htaccess file:
RewriteEngine On
RewriteRule ^events/action~ /events/ [L,R=301]
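One practical detail is placement: on a typical WordPress install the redirect has to sit above WordPress's own rewrite block, otherwise the catch-all rule there hands the request to index.php first. A minimal sketch, assuming the default WordPress permalink block and a site installed at the document root (both are assumptions, not details given in the question):

```apache
# Assumption: WordPress lives at the document root with its default rewrite block.
RewriteEngine On

# Redirect every URL whose path starts with events/action~ back to the
# main events page. This must come before the WordPress block below.
RewriteRule ^events/action~ /events/ [L,R=301]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

Keeping the custom rule outside the BEGIN/END markers also means WordPress should not overwrite it when it regenerates its own block.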

Is there a 301 wildcard match?

I am having great difficulty fielding all the 301 redirects that seem to be needed following a complete site redesign. Entire sub-directories and their extensive content no longer exist. If, for example, 'myolddata' was a folder that no longer exists, and was full of countless files that each no longer exists either, I would like to say "forget all that and just go to the index page". There seems no other way of closing the door on Google & Bing endlessly reporting squillions of 404's to me. Is there a way of saying in effect:
Redirect 301 /myolddata/ /index.html
Redirect 301 /myolddata/* /index.html
where the first says 'forget the folder' and the second says 'and forget everything that was in it'?
The second part to this issue is that old PHP files and their arbitrary search parameters are logged too. Stuff like:
oldfile.php?this=1&that=2&somethingelse=3
oldfile.php?this=Tom&that=Dick&somethingelse=Harry
You get the picture. Millions of them. How can I set up a 301 to say "forget oldfile.php and any parameter with it imaginable - they are all gone!"
Your assistance, comments and advice would be incredibly valuable, so all insights welcome please!!
It is better to use mod_rewrite rules with a 301 status code for this. Put this code in your DOCUMENT_ROOT/.htaccess file:
RewriteEngine On
RewriteRule ^(oldfile\.php$|myolddata(/|$)) /index.html? [L,NC,R=301]
The ? at the end of the target strips any existing query string.
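On Apache 2.4 and later, the QSD (query string discard) flag states the same intent more explicitly than a trailing ?. A sketch of the same rule under that assumption (the Apache version is not stated in the question):

```apache
RewriteEngine On
# QSD drops the incoming query string; it requires Apache 2.4.0 or later.
RewriteRule ^(oldfile\.php$|myolddata(/|$)) /index.html [L,NC,R=301,QSD]
```

With either variant, a request such as oldfile.php?this=1&that=2 should end up at /index.html with no parameters attached.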

using mod_rewrite to create SEO friendly URLS

I've been searching google for this but can't find the solution to my exact needs. Basically I've already got my URL's named how I like them i.e. "http://mysite.com/blog/page1.php"
What I'm trying to achieve (if it's possible!) is to use rewrite to alter the existing URLS to: "http://mysite.com/blog/page1"
The problem I've come across is that the examples I've found will do this if the user enters "http://mysite.com/blog/page1" into the browser, which is great. However, I need it to work for the existing links in Google so as not to lose traffic, so that incoming URLs like "http://mysite.com/blog/page1.php" are directed to "http://mysite.com/blog/page1".
The 1st example (Canonical URLs) at the following is pretty much what you want:
http://httpd.apache.org/docs/2.0/misc/rewriteguide.html#url
This should do the trick, rewriting requests without .php to have it, invisibly to the user. The leading /? lets the pattern match both in the server config (where the path starts with /) and in .htaccess (where it does not).
RewriteEngine On
RewriteRule ^/?blog/([^.]+)$ /blog/$1.php
You will need to write a rewrite rule that maps your old URLs to your new URLs as a permanent redirect. This lets the search engine know that the new, SEO-friendly URLs are the ones to be used.
RewriteRule ^/?blog/page1\.php$ /blog/page1 [R=301,L]
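Used together in a .htaccess file, an internal rewrite to .php and a redirect away from .php can trigger each other in a loop. One common way around this is to test %{THE_REQUEST}, which holds the original request line and is not changed by internal rewrites. A minimal sketch, assuming the rules live in the document root's .htaccess and that every page under /blog/ is a .php file on disk (both are assumptions):

```apache
RewriteEngine On

# 1) External redirect: only fires when the client itself asked for the
#    .php URL, e.g. "GET /blog/page1.php HTTP/1.1".
RewriteCond %{THE_REQUEST} \s/blog/([^.?\s]+)\.php[?\s]
RewriteRule ^ /blog/%1 [R=301,L]

# 2) Internal rewrite: serve /blog/page1 from page1.php if that file exists.
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^blog/([^./]+)$ /blog/$1.php [L]
```

Because the first rule looks at the original request line rather than the rewritten path, the internal rewrite in step 2 should not cause a second redirect.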

How to rename /page.php?1 to /welcome.html in .htaccess?

I have a CMS that does not generate friendly URLs.
What is the best way to rename them without Google seeing duplicate content?
Right now I have this in .htaccess:
RewriteEngine On
RewriteBase /
RewriteRule ^welcome\.html$ page.php?1 [L]
RewriteRule ^about-us\.html$ page.php?2 [L]
Is this the best way to do it?
Any help would be appreciated.
Google has no problem spidering and indexing this very simple dynamic URL scheme. But if you want extra on-page optimization points from keyword-rich URLs, it would be best to switch to a CMS that creates them automatically. That saves you from maintaining the link scheme by hand, both in your content and in the rule file.
Otherwise there is always the chance that you forget to replace the dynamic links with the readable ones when you create new content. Also, your CMS will always answer both variants, the friendly one and the dynamic one, so you have to tell Google the "canonical" URL to avoid duplicate content. Duplicates can arise because you cannot control how other people link to content on your site.
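Besides declaring a canonical URL, another option is to redirect the dynamic address to the friendly one, so that only one variant ever answers with content. A minimal sketch for the welcome page, assuming the rules sit in the document root's .htaccess (the %{THE_REQUEST} check is what keeps the redirect from firing on the internal rewrite):

```apache
RewriteEngine On
RewriteBase /

# Client asked for /page.php?1 directly: send it to the friendly URL.
# The trailing ? on the target drops the old query string.
RewriteCond %{THE_REQUEST} \s/page\.php\?1\s
RewriteRule ^page\.php$ /welcome.html? [R=301,L]

# Friendly URL: serve it internally from the dynamic page.
RewriteRule ^welcome\.html$ page.php?1 [L]
```

Each additional page (about-us.html and so on) would need its own pair of rules, which is the manual maintenance cost mentioned above.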

Getting "mywebsite.org/" to resolve to "mywebsite.org/index.php"

At my work we have various web pages that, my boss feels, are being ranked lower than they should be because "mywebsite.org/category/" looks like a different URL to search engines than "mywebsite.org/category/index.php" does, even though they show the same file. I don't think it works this way but he's convinced. Maybe I'm wrong though. I have two questions:
How do I make it so that it will say "index.php" in the address bar for all subcategories?
Is this really how pagerank works?
Besides changing all the links everywhere, a simpler solution is to use a rewrite rule. Make sure it is a permanent redirect, or Google will keep using the old link (without index.php). How you do this exactly depends on your web server, but for Apache HTTPd it looks something like the example given below.
As for whether this is really how PageRank works: yes, or so I've heard. Very few people know for sure. But Google mentions this in its guidelines (as "Be consistent"). Make sure to check out all of Google's Webmaster guidelines.
Apache config for the rewrite rule:
# in the generic config
LoadModule rewrite_module modules/mod_rewrite.so
# in your virtual host
RewriteEngine On
# redirect everything that ends in a slash to the same URL, but with index.php added
RewriteRule ^(.*)/$ $1/index.php [R=301,L]
# or the other way around, as suggested
# RewriteRule ^(.*)/index\.php$ $1/ [R=301,L]
Adding this code to the top of every page should also work:
<?php
// If the requested URL ends in "/", redirect permanently to the same URL plus index.php
if (substr($_SERVER['REQUEST_URI'], -1) == '/') {
    $new_request_uri = $_SERVER['REQUEST_URI'].'index.php';
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: '.$new_request_uri);
    exit;
}
?>
You don't tell us if you're using straight PHP or some other framework, but for PHP, probably you just need to change all the links on your site to "mywebsite.org/category/index.php".
I think it's possible that this does affect your search engine rank. However, you would be better off using only "mywebsite.org/category" rather than adding "index.php" to each one.
Bottom line is that you need to make sure all your links in your website use one or the other. What actually gets shown in the address bar is unimportant.
A simple solution is to put in the <head> tag:
<link rel="canonical" href="http://mywebsite.org/category/" />
Then, no matter which page the search engine ends up on, it will know it is simply a different view of /category/
And for your second question: yes, it can affect your results if Google thinks you are spamming. If it didn't matter, they wouldn't have added support for rel="canonical". That said, I wouldn't be surprised if they treat somedir/index.* the same as somedir/
I'm not sure whether /category/ and /category/index.php are considered two URLs for SEO, but there is a good chance that it will affect them one way or another. There is nothing wrong with making a quick change just to be sure.
A few thoughts:
URLs
Rather than adding /index.php, you will be better off making it so there is no index.php on any of them, since the keyword 'index' is probably not what you want.
You can make a script that will check if the URL of the current page ends in index.php and remove it, then forward to the resulting URL.
For example, on one of my sites, I require the 'www.' for my domain (www.domain.com and domain.com are considered two URLs for search purposes, though not always), so I have a script that checks each page and, if there is no www., adds it and forwards.
if (APPLICATION_LIVE) {
    if (strtolower($_SERVER["HTTP_HOST"]) != "www.domain.com") {
        // A 301 is recognized by search engines and may count the link toward the correct URL
        header("HTTP/1.1 301 Moved Permanently");
        // REQUEST_URI already starts with "/", and the Location header needs the scheme
        header("Location: http://www.domain.com" . $_SERVER["REQUEST_URI"]);
        exit();
    }
}
You could modify that to do what you need.
That way, if a crawler visits the wrong URL, it will be notified that it was replaced with the correct URL. If a person visits the wrong URL, they will be forwarded to the correct URL (most won't notice), and then if they copy the url from the browser to send someone or link to that page, they will end up linking to the correct url for that page.
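The same check can also be done at the server level instead of in a PHP script. A minimal sketch for Apache that removes a trailing index.php, assuming mod_rewrite is enabled and the rules go in the document root's .htaccess (the server setup is not stated in the question, so treat this as one possible approach):

```apache
RewriteEngine On

# Match only when the client itself requested .../index.php, for example
# "GET /category/index.php HTTP/1.1". %1 captures the directory part,
# here "/category/", and the client is redirected there permanently.
RewriteCond %{THE_REQUEST} \s(/(?:[^?\s]*/)?)index\.php[?\s]
RewriteRule ^ %1 [R=301,L]
```

Testing %{THE_REQUEST} rather than the current path matters here: when Apache serves /category/ by internally mapping it to index.php, the original request line is unchanged, so the redirect should not loop.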
LINKING URLS
The way other pages link to your pages is more important for SEO. Make sure all your in-site links use the proper URL (without /index.php), and that if you have a 'link to this page' feature, it doesn't include the /index.php part. You can't control how everyone links to you, but you can take some control over it, like with the script in item 1.
URL ROUTING
You may also want to consider using some sort of framework or stand-alone URL rerouting scheme. It could make it so there were more keywords, etc.
See here for an example: http://docs.kohanaphp.com/general/routing
I agree with everyone who's saying to ditch the index.php. Please don't force your visitor to type index.php if not typing it could get them the same result.
You didn't say if you're on an IIS or Apache server.
IIS can be set to treat index.php as the default page, so that http://mywebsite.org/ will resolve correctly without including index.php.
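On Apache, the corresponding setting is the DirectoryIndex directive from mod_dir. A one-line sketch, assuming mod_dir is loaded (it usually is by default):

```apache
# Serve index.php when a directory URL such as /category/ is requested.
DirectoryIndex index.php
```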
I would say that if you want to include the default page and force your users to type the page name in the url, make the page name meaningful to a search engine and to your visitors.
Example:
http://mywebsite.org/teaching-web-scripting.php
is far more descriptive and beneficial for SEO rankings than just
http://mywebsite.org/index.php
Might want to take a look at robots.txt files? Not quite the best solution, but you should be able to implement something workable with them...

Resources