I deleted a folder called "forums" from my website 3 months ago, but Google Webmaster Tools keeps reporting that e.g. /forums/member.php?u=1092 is missing (404). Is there any way to stop these messages and tell Google that I am not going to re-upload it? Is this going to affect my SEO ranking?
I tried this code, but it's not working.
RewriteRule ^forums/(.*)$ http://www.mysite.com [301, L]
Thanks.
Have you tried changing the status code to 410?
410 Gone

The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.
More detail is available in the RFC. See also Google's guidance on removing your own content from Google.
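For example, a minimal .htaccess sketch that does this, assuming Apache with mod_rewrite enabled (the path is taken from your rule above):

RewriteEngine On
# [G] forces a 410 Gone response and stops rewriting
RewriteRule ^forums/ - [G]

The idea is that crawlers treat 410 as permanent, per the RFC text quoted above, and drop the URLs instead of retrying them.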
Related
I am having issues with spam links pointing at my site and returning 404 errors.
My site was hacked with a secret spam-links folder in public_html that redirected users to pornographic sites, and those links were plastered across the internet. I have since remedied the malware issue, but I now have several hundred visitors hitting a 404 page because the links no longer exist, messing up my analytics accounts, using bandwidth, etc.
I have searched for a way to block anyone who tries to access these URL paths (so that they never hit my website), but I cannot possibly redirect every single link individually (there were over 2,000); I need a wildcard or something similar. My search led me here: Block Spam Referrer Traffic, but it is not quite the solution I need.
The searches go to pages like www.mywebsite.com/spampage/morespam/ (which have been deleted and now return 404 errors).
There are several iterations of the /spampage/ and /morespam/ URLs.
The referrer is generally a Google search, so I can't block the referrer using .htaccess. I'd like to somehow block www.mywebsite.com/spampage/*/ and all its iterations.
Apologies, I am by no means a programmer. I do appreciate any help that can be offered.
Update#1:
It seems that perhaps the best way is to block these links/directories using the robots.txt file; I have done so and will report back if I have success!
Update#2:
Reporting back. I am new to all this, so I was going about the solution wrong in my original question. Essentially, I found that I needed all of the links de-indexed, as they were generating all the traffic by being indexed by Google. I was able to request de-indexing of the directories in question manually through the Google Webmaster Tools account. One requirement for de-indexing was to have the site's robots.txt block the directories in question from being crawled. Once I did that, I submitted the request to remove the directory from the Google index. Those pages were taken off by Google in about 3 hours (thanks, Google!), so it was pretty quick once I found out the proper way to go about it. No .htaccess editing needed. Once the pages were no longer indexed, traffic went back down to normal levels, and my keywords, etc., are back to normal.
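For anyone following along, the robots.txt part of that looked something like this (directory names taken from the example URLs above; yours would differ):

User-agent: *
Disallow: /spampage/
Disallow: /morespam/

With those Disallow lines in place, the removal request in Webmaster Tools can be submitted for each directory.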
In your opinion, what is the best way to protect a directory listing from external users?
Option 1: Blank index. This is the standard way that I have seen on several sites; it has the advantage of not showing anything, but the disadvantage of implying that there is something there.
Option 2: 404. Send a fake 404 page and redirect. Could this cause problems with web crawlers?
Option 3: 401 error and redirection. This is similar to the blank index, except that it will show an "unauthorized" header. I think this would be a very bad option (because I'm implicitly saying that there is something important inside), but I would like to hear your thoughts on this too.
Thanks for your help. If you know of any other option that I might use, please tell me as well.
The 'best' way is to disable directory listing on the server (this will normally cause a 403 error; see 404 in the following list for a discussion of information leakage).
The easiest way is a blank page (normally index.html or index.htm)
Other options involve returning an error code:
403 (Forbidden) is the default in Apache httpd, and I think this is better than a blank page.
404 is for 'not found', which is not the case here (it could be used to prevent disclosure if nobody knows the directory exists, but if people know it exists, it doesn't make any sense, as its existence is already known).
401 (authentication required) doesn't make any sense in any case.
Other considerations
Some browsers do not display custom error pages. If you want to provide a link to the main page (or somewhere else), a 'blank' page containing a link or a direct 301/302 redirect could be used.
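To make the first ('best') option concrete, disabling directory listing in Apache httpd is a one-line change in the vhost config or .htaccess (a sketch, assuming AllowOverride permits Options):

# stop Apache from auto-generating directory indexes;
# requests for a bare directory then get the default 403
Options -Indexes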
So... we have a custom CMS, with a rewrite rule that sends any page request (when a file doesn't exist) to the root /index.cfm file. There we search our DB for the page in question. If the page exists, we serve up the correct template, etc. If the page doesn't exist, I want to serve up a 404 page. Now I "think" I cannot do this in IIS, since I need to handle the request in ColdFusion, so it has to get through; the file will always exist.

When the page doesn't exist, I've tried using <cfheader statusCode="404"> and then including some HTML, but IIS puts "The resource you are looking for has been removed, had its name changed, or is temporarily unavailable." at the top of the page, before my HTML. In order to get it to display the page, I had to remove the 404 status code handler from IIS.
In addition, when I Fetch as Google, it gets a 301. However, when I view the response headers in Firefox, I get:
Transfer-Encoding: chunked
Content-Type: text/html
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Wed, 16 Jan 2013 21:31:42 GMT
404 Not Found
I've tried a combination of redirecting and all sorts of things. I'm open to letting IIS handle the 404, if there is a way, but I cannot figure out how to get ColdFusion to correctly deliver a 404 so that Google gets it right. Webmaster Tools gets mad at me because I am delivering "soft 404s" at this point, so I am trying to fix that.
I've also tried setting <httpErrors existingResponse="PassThrough" /> (whatever the hell that does), but that didn't work either. I've been looking through other threads trying to figure this out and just can't.
EDIT: Looking further into this, and viewing the header info in both Firebug and Chrome, I clearly see the headers say 404. Why would Fetch as Bing and Fetch as Google say differently?
I also tested this: if I add .cfm to the URLs, Fetch as Google will see the 404. Without the .cfm, however, it thinks it's a 301. Firebug sees both as 404. This seems like a Google issue.
ANSWER (kind of):
So I was doing more testing this morning (right after I added a bounty, actually), and I noticed in Webmaster Tools that Google correctly noted one of my pages as a 404. So I started looking into it. I have an "Add Trailing Slash" rule. Google sees domain.com/page as a 301 (correct, I guess) to domain.com/page/, but it does see domain.com/page/ as a 404. I think using the trailing slash rule as I have it is the right way; still, should I be doing something different, or is the redirect with the slash the "correct" way of doing things, even though Google wants to ding me for it sometimes?
I'm not entirely sure I follow the specifics of your approach, so I will give you a few things that you need to look at in order to get this approach working well (or at least what has worked best for me).
Under "Error Pages", make sure that your 404 error page is set to "Execute a URL on this site" ( I generally set mine to something like "/404.cfm"). This will make sure that your ColdFusion page is called correctly for 404 pages (it sounds like you have this working correctly).
Under "Handler Mappings", double-click on the handler for ".cfm". Then click the "Request Restrictions..." button. It should open to the "mappings" tab. The "Invoke handler only if request is mapped to:" checkbox should NOT be checked.
This can really trip up this sort of operation because it means that IIS won't invoke ColdFusion if the file doesn't exist. This shouldn't be an issue if your 404 is set up correctly, but still something to look into.
While you are in the "Handler Mapping" section, look for the IsapiModule with a path of "*". Mine is always set to ColdFusion - not sure if that makes a difference or not.
The other thing to look at is the "Default Document" setting. Keep in mind that this could impact you when forwarding to a folder.
You might also look at your rewrite rule again and make sure it isn't adding slashes where one already exists.
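On the trailing-slash point, the usual shape of an IIS URL Rewrite rule for this is below (a generic sketch of the standard "append trailing slash" pattern, not your actual rule). Note the [^/] in the match, which is what keeps it from redirecting URLs that already end in a slash:

<rule name="AddTrailingSlash" stopProcessing="true">
  <match url="(.*[^/])$" />
  <conditions>
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
  </conditions>
  <action type="Redirect" redirectType="Permanent" url="{R:1}/" />
</rule>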
I'm fairly new to the Magento platform, but I have a decent amount of experience in web development on Apache servers.
A few days ago I was asked to look into an issue that first surfaced as failing filters.
I had a look at the Google Analytics data, and it seems the SEO-friendly URLs have all stopped showing up. The navigation URLs still use friendly words; however, when the page loads, the URL is redirected to a basic catalog URL.
http://www.camera-camera.com/cameras-and-accessories.html
instead now it goes to
https://www.camera-camera.com/index.php/catalog/category/view/id/9
I checked the admin config; Web > SEO URL Rewrites is set to Yes.
I toggled it to No, saved, set it back to Yes, and saved again. I also tried clearing the catalog URL rewrite cache.
I checked the .htaccess file, and it hasn't been touched for months.
I emptied the core rewrite table and reindexed it.
So I'm out of ideas now; I was hoping some of you more experienced users could suggest what else I can check.
I also found it strange that the URL is now ignoring postback parameters. If you look at their filters, they are simply links to the same page with a POST parameter. This gets stripped and ignored now; might that be related?
A file restore was done on the day it happened. Are there any files I should check it against?
Thanks for any help you can provide !
I just discovered that it was related to HTTPS. I hadn't noticed, but it seems the site keeps redirecting to HTTPS even though the filter links etc. point to HTTP, and in the redirect the parameters are dropped. Now to figure out why it's going to HTTPS.
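For what it's worth, if the HTTP-to-HTTPS redirect lives in .htaccess, a rule shaped like this (a generic sketch, not Magento-specific) preserves the query string, because mod_rewrite appends the original query string automatically unless the substitution contains its own '?':

RewriteEngine On
RewriteCond %{HTTPS} off
# no '?' in the substitution, so ?param=... is carried over
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

If the redirect is instead coming from Magento's own code or config, the equivalent check is whether it rebuilds the URL without the original parameters.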
I would like to block access to my website, probably redirecting visitors to a 404 page, while I am updating it, which can take some time.
Could a redirect to the 404 page every time a user goes to my website work?
You shouldn't do that. Status code 503 ("Service Unavailable") is much better in this case.
RewriteRule . - [R=503,L]
This might work.
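A slightly fuller .htaccess sketch, assuming Apache and a hypothetical /maintenance.html page (the IP address is a placeholder for your own, so you can keep working on the site):

RewriteEngine On
# let yourself through, and keep the maintenance page itself reachable
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
RewriteRule .* - [R=503,L]
ErrorDocument 503 /maintenance.html
# hint to crawlers when to retry (requires mod_headers)
Header always set Retry-After "3600"

Because the status is 503 rather than a redirect or 404, search engines keep your existing rankings and simply retry later.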
If it's just a temporary redirect during site-down maintenance then you probably don't want to use a 404 code. Take a look at the other codes available to you. For a scenario such as this, 307 (temporary redirect) would make a lot more sense. It would also be better if you have any SEO or rely on search crawlers at all, as they will remove results which now produce a 404 but are smart enough to keep results which temporarily produce a 307.
The redirect itself will work fine, just redirect all traffic to a static page. (Did you need advice on how to do that, or were you just looking for alternative options and viability? It's unclear from the question. If the former, I can't help much. It's been years since I've cracked open an .htaccess file.)
Basically, a 404 tells visitors: "This resource isn't here. Don't bother asking again." Whereas a 307 tells visitors: "This resource is temporarily being handled by something else, but it hasn't really moved; please try again later."
Here's a simpler idea: just make a new index page that's your original, except with the content replaced with a "site currently being updated; please come back later" sort of message. And then you'd redirect all hits to your site to that index page.
That's what many sites I've seen tend to do, at least. And it makes sense, at least to me. I mean, would you rather your users not know why the pages they want to access are no longer there, or have them know that the reason is the site being updated? It's basically the same as a 404 page, just with the specific information of why the desired pages aren't there.
EDIT: It seems I'm basically talking about a 503 page, going by David's link and Roland's answer.
That would work, but not only would that be wrong information (the page is not 'not found', it's just currently being updated), it would also mislead your users and crawlers. I would redirect them to an 'Update in progress' page and send it with HTTP status code 423 (Locked) to give the client a standards-conformant answer to exactly your scenario.
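If you do want to try the 423 route, a minimal sketch (assuming Apache with mod_rewrite, and a hypothetical /updating.html page); be aware that 423 comes from WebDAV, so crawler handling of it is less predictable than 503:

ErrorDocument 423 /updating.html
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/updating\.html$
# a non-3xx R= code makes mod_rewrite return that status directly
RewriteRule .* - [R=423,L]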