SEO - Problems possibly related to 301 Moved Permanently - .htaccess

Right, here's the story:
We have had a website for one of our brands for many years. The site design was very bad, so we recently did a complete overhaul: mostly design, but also some of the backend code.
The original site used links such as example.com/products/item/127, and I wanted to make them more user-friendly, in particular by including the product name. The same link now reads example.com/product/127/my-jucy-product/.
Since the switchover our Google results have taken a beating (we were on the first page for our normal search terms, now we're nearer the 4th!). The other problem is that the links to the old products haven't updated to the new links, despite me coding a 301 redirect from old to new. The 301 is fired not from .htaccess but from our PHP framework.
I had a look at how the site loads from an old link that is still in Google, and here's what Firebug reports:
GET <google link> 302 Found
GET example.com/products/item/127 302 Found
GET example.com/products/item/127 301 Moved Permanently
GET example.com/product/127/my-jucy-product/ 302 Found
So the Google link has a 302, good. But when the old link comes in, our framework returns a 302! Only afterwards, when the request finally hits the right part of the framework, does it return the 301. So here's my question:
Is the reason our old links have not changed, and our Google ranking has nosedived, that Google is seeing a 302 before the 301?
At the time I was reluctant to mess with our .htaccess because it had become pretty complicated and I was under some pretty intense time constraints. Now I'm wondering whether this was the wrong decision and whether I should revisit it.
Many thanks!
Edit
Bugger, just signed up to the Webmaster Tools and I'm getting redirect errors all over the place, hundreds of them! I think this is my problem.
Edit 2
So on closer inspection it looks as if the problem was that I was being lazy and not using .htaccess to redirect my URLs. I had wanted to avoid this because at the time it was easier just to throw a PHP header. Regardless, I have now started converting our framework to depend more on .htaccess. Not only has this solved the problem (well, we'll see when I get a Google crawl), it has also improved the speed dramatically!

One thing to look at is canonical links (which is how SO does it). This means you don't need to do redirects, old links will still work and search engines will get updated accordingly.
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

There's no telling how Google will adjust its PageRank witch's brew on a day to day basis, but in general, you should expect to see a (temporary) drop in PR following a mass 301 redirect of legacy URLs. It often just takes a little time (a month, maybe two) for things to percolate.
Note this does not answer your question about whether the 302 is hurting you. Just pointing out that, even if it's not hurting you, you should still see a drop in PR temporarily, on the basis of the mass 301 redirect alone.

I think no one except Google can answer your question with 100% confidence.
The 302 temporary redirect most probably prevents Google from updating the old link to the new one, and this COULD have an effect on page ranks.
I'd first of all make sure that all old pages are accessible and redirect immediately with a 301.
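One way to check this is to walk the redirect chain for an old URL one hop at a time and flag any hop that is not a permanent redirect. The following is a minimal Python sketch under stated assumptions, not a definitive tool: `trace_redirects`, `non_permanent_hops` and the `fetch` callable are hypothetical names, and `fetch` is assumed to return the status code and Location header of a single request without following redirects. The fake site below is a simplified, made-up version of the chain from the question (the /products/view/127 hop is invented for illustration).

```python
from typing import Callable, List, Optional, Tuple

# (status code, Location header or None) for one request, redirects not followed
Response = Tuple[int, Optional[str]]


def trace_redirects(url: str, fetch: Callable[[str], Response],
                    max_hops: int = 10) -> List[Tuple[str, int]]:
    """Follow redirects one hop at a time and record (url, status) per hop."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        # Stop at the first response that is not a redirect.
        if status not in (301, 302, 303, 307, 308) or location is None:
            break
        url = location
    return hops


def non_permanent_hops(hops: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the redirect hops that are not permanent (301 or 308)."""
    return [(u, s) for u, s in hops if 300 <= s < 400 and s not in (301, 308)]


# Hypothetical responses standing in for real HTTP requests.
FAKE_SITE = {
    "/products/item/127": (302, "/products/view/127"),
    "/products/view/127": (301, "/product/127/my-jucy-product/"),
    "/product/127/my-jucy-product/": (200, None),
}

hops = trace_redirects("/products/item/127", lambda url: FAKE_SITE[url])
print(non_permanent_hops(hops))
```

In real use, `fetch` would wrap an HTTP client configured not to follow redirects. Any hop the second function reports is a candidate for replacing with a direct 301.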

Related

How to redirect a URL?

I have a problem in redirecting a URL on a Silverstripe website. I have a news feed page with a summary of articles in a paginated style. It displays 20 articles initially and switches to the next 20 based on the page number chosen. It is just the standard blog layout. When I click on page 2 then it should navigate to https://*****/news/?count=20 and for page 3 as https://*****/news/?count=40 etc. However upon clicking the blog page number it navigates to https://*****/news/news/?count=20. So the navigation link is not rewriting the parent URL.
All of my other Silverstripe websites work fine with the same blog layout except this and I don't see any reason to tweak the default code. I thought of adding a .htaccess redirect like this
Redirect 301 /news/news/?start=20 https://******/news/?start=20
but I had no luck getting it to work. Please suggest a solution for this.
The output I expect is to redirect to the right URL
https://******/news/?start=20
Here is a simple redirection rule that should fix the symptom you describe:
RewriteEngine on
RewriteRule ^/?news/news/(.*)$ /news/$1 [R=301,L]
But I doubt that approach is a good idea, simply because it tries to fix a symptom, not the cause. The cause is that you actually create requests to URLs containing /news/news/, which should never happen. I assume this happens because you hand out relative references (something like news/...) instead of absolute references (/news/...). I strongly suggest that you handle the cause instead of trying to fix the symptom.
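To illustrate the difference between the two kinds of reference, here is a small Python sketch that resolves both against a page URL the way a browser would. The host example.com and the variable names are assumptions for illustration only; this is not Silverstripe code.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page on which the pagination links are rendered.
current_page = "https://example.com/news/"

# A relative reference is resolved against the current path.
relative_link = urljoin(current_page, "news/?start=20")

# An absolute-path reference is resolved against the site root.
absolute_link = urljoin(current_page, "/news/?start=20")

print(relative_link)  # https://example.com/news/news/?start=20
print(absolute_link)  # https://example.com/news/?start=20
```

If the pagination template emits the first form, the doubled /news/news/ path appears without any server-side rule being involved.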

Preventing shortened URL (by htaccess) resolving

This is a bit of a knowledge gainer for me really. Every day is a school day, and I like to know what is possible rather than have a go at something that is impossible.
With help from here (.htaccess to hide 2 folder paths) I have shortened the full path so that people accessing the website don't know the full directory.
My problem now is that I would like the short URL not to resolve if it is typed in, and to be reachable only by navigating through the site. I suppose I want the URL to be purely for display, so that entering it directly wouldn't work. Is this possible?
You can't do this with .htaccess rewrite rules — if the URL works in a link, then it also works if typed in.
(OK, technically you could make your rewrite rules conditional on the presence of an appropriate HTTP Referer (sic) header, but that's not something you really should rely on.)
What you could do is fake the URL shown in the browser's address bar using the JavaScript History API, like this:
history.replaceState({}, "", "/whatever")
(Try running that in your browser's JS console!)
Of course, that's a purely cosmetic change; you'd still need to use the real URL of the page in your links, and it would be trivial for anyone with a basic understanding of how web browsers work to figure out the real URL. It also has the annoying side effect that reloading the page will cause the browser to try to load the fake URL, likely breaking the page. But that's pretty much inevitable; there's effectively no difference between reloading a page and typing its URL into the address bar. But if you're OK with all that, it could be one way to go.
Honestly, though, I suspect that what you really should do is to stop trying to do this, and instead take a step back and try to find an alternative solution to your real problem, whatever it may be.

Magento URL adding random string

Currently have a Magento install running which seems to be printing #.UD3vymhSSYV at the end of every product URL. I assumed I could remove it using the htaccess file, however whenever this is done it generates a different random code.
http://domainname.com/gentlemans-tall-coat-wallet#.UD3vymhSSYV
As you can see above. I'm just stumped, not sure where it is coming from or how to get rid of it.
Any help would be great.
THIS ANSWER IS POSSIBLY WRONG:
Harmless - you have an AddThis extension added.
Their URLs are like: http://www.addthis.com/browser-extensions#.UD4zyRVFcal
No further action needed but you may want to get rid of the 'addthis' stalker code.
THIS ANSWER IS MOSTLY WRONG:
That does not look like a Magento SID string to me.
You may have a compromised server with someone able to write/inject stuff into your index.php or .htaccess.
Fire up the site on localhost and make sure that install files are as they should be, particularly index.php and .htaccess.
The server may be compromised with a common attack on whatever else is on the box, e.g. Wordpress, Expression Engine or anything else with known vulnerabilities.
As for what the anchor string does, this may not be obvious to you but could be just for the search engines to pick up - check your site isn't a 'viagra' site on Google.
Don't panic though - it is unlikely that the DB of Magento is compromised.
Yes, it is done by addthis script.
See how to remove it:
http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls#.UrGWX_QW314

CakePHP nice urls - how to prevent normal urls from working

I have a website that's written using CakePHP. I've added some rewrite rules in the .htaccess file to change the default URLs to different ones (instead of /controller1/action1/parameter I have /some-string-about-controller-and-action/parameter, for example).
The problem is that now both the normal URL and the nice one are available, and Google seems to be indexing both, which is a problem. I'd like to keep only the nice one. What is the proper way to handle this so that it affects the Google results as little as possible?
I don't know why you don't want to use Cake's own routing (if you are having trouble doing what you want, you can accomplish it with a custom route class). Either way, make sure that you redirect all relevant URLs in your .htaccess file to the desired URL using a 301 Moved Permanently redirect.
This way Google will index the target URL instead of the undesirable one. You are right to be concerned about this: double indexing is a great way to harm your SEO rankings.

Having huge redirect list in .htaccess a Problem?

I want to 301-redirect every post, but I have over 3000 posts.
If I list
Redirect permanent /blog/2010/07/post.html http://new.blog.com/2010/07/23/post/
Redirect permanent /blog/2010/07/post1.html http://new.blog.com/2010/07/24/post1/
Redirect permanent /blog/2010/07/post2.html http://new.blog.com/2010/07/25/post2/
Redirect permanent /blog/2010/07/post3.html http://new.blog.com/2010/07/26/post3/
Redirect per......
for over 3000 URLs in .htaccess, would this eat my server's resources or cause some other problem? I'm not sure how .htaccess works, but if the server reads through this list each time a user requests a page, I would guess it will be a resource hog.
I can't use RedirectMatch because I added date variable in my new url. Do you have any other suggestions redirecting these posts? Or am I just fine?
Thanks!
I am not an Apache expert, so I cannot speak to whether or not having 3,000 redirects in .htaccess is a problem (though my gut tells me it probably is a bad idea). However, as a simpler solution to your problem, why not use mod_rewrite to do your redirects?
# Each group matches one path segment; the dot before "html" is escaped.
RewriteRule ^/?blog/([^/]+)/([^/]+)/([^/]+)\.html$ http://new.blog.com/$1/$2/$3/ [R=permanent]
This uses a regex to match old URLs and rewrite them to new ones. The [R=permanent] instructs mod_rewrite to issue a 301 with the new URL instead of silently rewriting the request internally.
In your example, it looks like you've added the day of the post to the URL, which does not exist in the old URL. Since you obviously cannot use a regexp to divine the day an arbitrary post was made, this method may not work for you. If you can drop the day from the URL, then you're good to go.
Edit: The first time I read your question, I missed the last paragraph. ("I can't use RedirectMatch because I added date variable in my new url.") In this case, you can use mod_rewrite's RewriteMap to look up the day component of a post.
You have two options:
1. Use a hashmap to perform fast lookups in a static file. This means all your old URLs will work, but any new posts cannot be accessed using the old URL scheme.
2. Use a script to grab the day.
In option one, create a file called posts.txt and put:
/yyyy/mm/pppp dd
...for each post where yyyy is the year of the post, mm is the month, and pppp is the post name (without the .html).
When you're done, run:
$ httxt2dbm -i posts.txt -o posts.map
Then add the following to the server/virtual server config. (Note the path is a filesystem path, not a URL.)
RewriteMap postday dbm:/path/to/file/posts.map
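# The lookup key keeps the leading slash so it matches the /yyyy/mm/pppp keys in posts.txt.
RewriteRule ^/blog/([^/]+)/([^/]+)/([^/]+)\.html$ http://new.blog.com/$1/$2/${postday:/$1/$2/$3}/$3/ [R=permanent]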
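With 3000 posts, posts.txt is probably easier to generate than to type. The following is a minimal Python sketch that builds the `/yyyy/mm/pppp dd` lines described above. It assumes you can export the posts as (year, month, day, name) tuples from your blog; the sample data and the names `map_lines` and `write_map` are hypothetical.

```python
from typing import Iterable, List, Tuple

# (year, month, day, post name without .html)
Post = Tuple[int, int, int, str]


def map_lines(posts: Iterable[Post]) -> List[str]:
    """Build one '/yyyy/mm/pppp dd' line per post."""
    return [f"/{y:04d}/{m:02d}/{name} {d:02d}" for y, m, d, name in posts]


def write_map(path: str, posts: Iterable[Post]) -> None:
    """Write the lookup lines to a text file for httxt2dbm to convert."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(map_lines(posts)) + "\n")


# Hypothetical sample data, matching the first two redirects in the question.
sample_posts = [(2010, 7, 23, "post"), (2010, 7, 24, "post1")]
print(map_lines(sample_posts))
```

Calling `write_map("posts.txt", sample_posts)` would produce the input file for the httxt2dbm step.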
In option two, use prg:/path/to/script/lookup.whatever as your RewriteMap. See the mod_rewrite documentation for more info about using a script.
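As a rough idea of what such a script might look like: a program-type map is started once by Apache, receives one lookup key per line on standard input, and must answer each with one line on standard output (the string NULL when there is no match), without buffering the output. Below is a minimal Python sketch under those assumptions. The in-memory table and the names `POST_DAYS`, `lookup_day` and `serve` are hypothetical; a real script might query the blog database instead.

```python
import sys

# Hypothetical lookup table keyed the same way as posts.txt.
POST_DAYS = {"/2010/07/post": "23", "/2010/07/post1": "24"}


def lookup_day(key: str) -> str:
    """Return the day for a '/yyyy/mm/pppp' key, or NULL when unknown."""
    return POST_DAYS.get(key.strip(), "NULL")


def serve(stdin, stdout) -> None:
    """Answer one key per input line until the input is closed."""
    for line in stdin:
        stdout.write(lookup_day(line) + "\n")
        stdout.flush()  # each answer must be sent immediately, not buffered
```

The script would end by calling `serve(sys.stdin, sys.stdout)`. Keeping the lookup in its own function makes it easy to test without Apache.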
Doing the lookup in mod_rewrite is better than just redirecting to a script which looks up the date and then redirects to the final destination because you should never redirect more than once. Issuing a 301 or 302 incurs a round trip cost, which increases the latency of your page load time.
If you have some way in code to determine the day of a post, you can generate the rewrite on the fly. You can set up a mod_rewrite pattern, something like .html, and set up a front controller to calculate the new URL from the old one and issue the 301 header.
With php as an example:
$_SERVER['REQUEST_URI']
will contain the requested url and
header("Location: http://new.blog.com/$y/$m/$d/$title/",TRUE,301);
will send a redirect.
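The step in between, working out the new URL from the requested one, could look roughly like this. It is a Python sketch of the logic rather than PHP, and it rests on assumptions: old URLs have the form /blog/yyyy/mm/title.html as in the question, and `day_of` is a hypothetical callback that returns the day of a post (for example from the blog database) or None if the post is unknown.

```python
import re
from typing import Callable, Optional

# Old scheme from the question: /blog/yyyy/mm/title.html
OLD_POST = re.compile(r"^/blog/(\d{4})/(\d{2})/([^/]+)\.html$")


def new_location(request_uri: str,
                 day_of: Callable[[str, str, str], Optional[str]]) -> Optional[str]:
    """Return the new URL for an old post URL, or None if it is not one."""
    match = OLD_POST.match(request_uri)
    if match is None:
        return None
    year, month, title = match.groups()
    day = day_of(year, month, title)
    if day is None:
        return None  # unknown post: let the caller send a 404 instead
    return f"http://new.blog.com/{year}/{month}/{day}/{title}/"


# Hypothetical lookup standing in for a database query.
days = {("2010", "07", "post"): "23"}
print(new_location("/blog/2010/07/post.html",
                   lambda y, m, t: days.get((y, m, t))))
```

The returned value is what would go into the Location header of the 301 response.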
That's... a lot of redirects. But the first thing I would tell you, and probably the only thing I can tell you without qualification, is that you should run some tests and see what the access times for your blog are like, and also look at the server's CPU and memory usage while you're doing it. If they're fairly low even with that giant list of redirects, you're okay as long as your blog doesn't experience a sudden increase in traffic. (I strongly suspect the 3000 rewrites will be slowing Apache down a lot, though)
That being said, I would second josh's suggestion of replacing the redirects with something dynamic. Like animuson said, if you're willing to drop the day from the URL, it'll be easy to set up a RewriteRule directive to handle the redirection. Otherwise, you could do it with a PHP script, or generally some code in whatever scripting language you (can) use. If you're using one of the popular blog engines, it probably contains code to do this already. Basically you could do something like
RewriteRule .* /blog/index.php
and just let the PHP script sort out which post was requested. It has access to the database so it'll be able to do that, and then you can either display the post directly from the PHP script, or to recover your original redirection behavior, you can send a Location header with the correct URL.
An alternative would be to use RewriteMap, which lets you write a RewriteRule where the target is determined by a program or file of your choice instead of being directly specified in the configuration file. As one option, you can specify a text file that contains the old and new URLs, and Apache will handle searching the file for the appropriate line for any given request. Read the documentation (linked above) for the full details. I will mention that this isn't used very often, and I'm not sure how much faster it would be compared to just having 3000 redirects.
Last tip: Apache can be significantly faster if you're able to move the configuration directives (like Redirect) into the server or virtual host configuration file, and disable reading of .htaccess entirely. I would guess that moving 3000 directives from .htaccess into the virtual host configuration could make your server considerably faster. But even moving the directives into the vhost config file probably wouldn't produce as much of a speedup as using a single RewriteRule.
It's never a good idea to make a massive list of Redirects. A better programming technique is to simply redirect the pages without that date variable, then have a small PHP snippet that detects if it's missing and redirects to the URL with it included. The long list looks tacky and slows down Apache because it's checking that URL (and every other URL that might not even be affected by this) against each line. If it were only 5 or so, I'd say fine, but 3,000 is a definite NO.
Although I'm not a big fan of this method, a better choice would be to redirect all those URLs normally using a single match statement, redirecting them to the page without the date part, or with a dash or something, then include a small PHP snippet to check if the date is valid and if not, rewrite the path again to the correctly formed URL.
Honestly, if you didn't have that part there before, you don't need it now, and it will probably just confuse the search engines changing the URL for 3,000 posts. You don't really need a date in the URL, a good title is much more meaningful not only to users, but also to search engines, than a bunch of numbers.
