SEO Toolkit - request is disallowed by a robots.txt rule - IIS

I am trying to run the SEO toolkit IIS extension on an application I have running but I keep getting the following error:
The request is disallowed by a Robots.txt rule
Now I have edited the robots.txt file in both the application and the root website so they both have the following rules:
User-agent: *
Allow: /
But this makes no difference and the toolkit still won't run.
I have even tried deleting both robots.txt files and that still doesn't make any difference.
Does anyone know any other causes for the SEO Toolkit being unable to run, or how to solve this problem?

To allow all robots complete access, I would recommend using the following syntax (according to robotstxt.org):
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use one at all)
The Allow directive is supported only by "some major crawlers", so perhaps the IIS Search Engine Optimization (SEO) Toolkit's crawler doesn't support it.
Hope this helps. If it doesn't, you can also try going through IIS SEO Toolkit's Managing Robots.txt and Sitemap Files learning resource.

Check to make sure the DNS record is pointing to the correct server
If you're searching for the file, account for case sensitivity - robots.txt vs Robots.txt
Verify that the Toolkit is actually attempting to visit the site. Check the IIS logs for the presence of the "iisbot" user-agent.
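If you want to check the logs quickly, something like this from a command prompt will do (a sketch only; the log folder and the W3SVC1 site ID are assumptions that vary per server and IIS version):
findstr /i "iisbot" C:\inetpub\logs\LogFiles\W3SVC1\*.log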

The robots.txt may have been cached. Stop/restart/unload IIS (or the application) and the robots.txt will be refreshed. Then open a browser and reload the file. You can even delete the file to be sure that IIS is not caching it.
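For example, from an elevated command prompt on the server (assuming a full restart of the IIS services is acceptable in your environment):
iisreset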

Basically, robots.txt is a file that tells Google not to crawl the pages the admin has disallowed. Google skips those pages, which is why they never rank and Google never shows their data.

Related

Can search engines read robots.txt if its read access is restricted?

I have added a robots.txt file with some lines to restrict some folders. I also added a restriction, using an .htaccess file, that blocks all access to that robots.txt file. Can search engines read the content of that file?
This file should be freely readable. Search engines are like visitors on your website: if a visitor can't see this file, then the search engine will not be able to see it either.
There's absolutely no reason to try to hide this file.
Web crawlers need to be able to HTTP GET your robots.txt, or they will be unable to parse the file and respect your configuration.
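If you want to keep your other .htaccess restrictions but make sure robots.txt itself stays reachable, an explicit exception might look like this (a sketch, assuming Apache 2.2-style Order/Allow directives):
<Files "robots.txt">
Order allow,deny
Allow from all
</Files>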
The answer is no! But the simplest and safest option is still to test it:
https://support.google.com/webmasters/answer/6062598?hl=en
The robots.txt Tester tool shows you whether your robots.txt file blocks Google web crawlers from specific URLs on your site. For example, you can use this tool to test whether the Googlebot-Image crawler can crawl the URL of an image you wish to block from Google Image Search.

How to restrict the site from being indexed

I know this question has been asked many times, but I want to be more specific.
I have a development domain and moved the site there to a subfolder. Let's say from:
http://www.example.com/
To:
http://www.example.com/backup
So I want the subfolder not to be indexed by search engines at all. I've put a robots.txt with the following contents in the subfolder (can I put it in a subfolder, or does it always have to be at the root? I want the content at the root to stay visible to search engines):
User-agent: *
Disallow: /
Maybe I need to replace it and put the following in the root instead:
User-agent: *
Disallow: /backup
The other thing is, I read somewhere that certain robots don't respect the robots.txt file, so would just putting an .htaccess file in the /backup folder do the job?
Order deny,allow
Deny from all
Any ideas?
This would prevent that directory from being indexed:
User-agent: *
Disallow: /backup/
Additionally, your robots.txt file must be placed in the root of your domain, so in this case, the file would be placed where you can access it in your browser by going to http://example.com/robots.txt
As an aside, you may want to consider setting up a subdomain for your development site, something like http://dev.example.com. Doing so would allow you to completely separate the dev stuff from the production environment and would also ensure that your environments more closely match.
For instance, any absolute paths to JavaScript files, CSS, images or other resources may not work the same from dev to production, and this may cause some issues down the road.
For more information on how to configure this file, see the robotstxt.org site. Good luck!
As a final note, Google Webmaster Tools has a section where you can see what is blocked by the robots.txt file:
To see which URLs Google has been blocked from crawling, visit the Blocked URLs page of the Health section of Webmaster Tools.
I strongly suggest you use this tool, as an incorrectly configured robots.txt file could have a significant impact on the performance of your website.
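On the point about robots that don't respect robots.txt: another option is to have the /backup folder itself mark its responses as not-for-indexing with an X-Robots-Tag header. A sketch of an .htaccess for that folder, assuming Apache with mod_headers enabled ("always" is used so the header is also sent on error responses such as the 403s produced by a Deny rule):
# tell compliant crawlers not to index or follow anything served from this folder
Header always set X-Robots-Tag "noindex, nofollow"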

Webmaster Tools Crawler 403 errors

Google Webmaster Tools is reporting 403 errors for some folders on the websites server for example:
http://www.philaletheians.co.uk/Study%20notes/
The folder isn't forbidden, so I don't understand why Google's crawler would get 403 errors for it.
Why is the Google crawler trying to browse the actual folders instead of just going straight to the files in those folders? Is this something to do with robots.txt?
Make sure there is an actual page or document to serve if someone requests that URL. I've browsed through your site and could not find a link that points to http://www.philaletheians.co.uk/Study%20notes/
Also, it seems all the study notes are inside this "Study%20notes" directory, so that bare directory link will not work anyway. Check Google Webmaster Tools to see where this broken link comes from and fix it.
Have you set the default document correctly in your web server? In Apache, this is the DirectoryIndex setting (which defaults to index.html). Also, in general it might be better to strip spaces etc. from your traversable directory names (the %20 you are seeing between "Study" and "notes" is a URL-encoded space character), so as to keep your URLs clean for your visitors and for search engine bots.
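For reference, a minimal sketch of that Apache setting (the file names are only examples; list whatever default documents your site actually uses):
# serve one of these files when a directory URL such as /Study%20notes/ is requested;
# if none of them exists and directory listings are disabled, Apache answers with a 403
DirectoryIndex index.html index.htm default.htm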

How to move pages around and rename them while not breaking incoming links from external sites that still use the poorly formed URLs

update
Here is the situation:
I'm working on a website that has no physical folder structure. Nothing had been planned or controlled and there were about 4 consecutive webmasters.
Here is an example of an especially ugly directory
\new\new\pasite-new.asp
Most pages are stored in a folder with the same name as the file, for maximum redundancy.
\New\10cap\pasite-10cap.asp
\QL\Address\PAsite-Address.asp
Each of these [page directories]? (I don't know what else to call them) has an include folder; the include folder contains the same *.inc files in every case, just copied about 162 times, once per page directory. The include folder was duplicated so that
<!--#include file="urlstring"--> would work correctly, owing to a lack of understanding of relative paths, the #include virtual directive, or using Server.Execute().
Here is a picture if my explanation was lacking.
Here are some of my limitations:
The site is written in ASP classic
Server is Windows Server 2003 R2 SP2, IIS 6 (according to my resource)
I have no access to the IIS server
I would have to go through a process to add any modules or features to IIS
What changes can I make that would allow me to move pages around and rename them while not breaking incoming links from external sites that still use the poorly formed URLs?
To make my question more specific: how can I move the file 10cap.asp from \new\10cap\ to a better location like \, rename it to something like saveourhomescap.asp, not break any incoming links, and finally not have to leave a dummy 10cap.asp page in the original location with a redirect to the new page?
Wow, that's a lot of limitations to deal with.
Can you set up a custom error page? If so, you can add some code to it that redirects users to the new page. So maybe you create a custom 404 page, and in that page you grab the query string variable and, based on it, send the user to the correct "new" page. That would allow you to delete all of the old pages.
Here is a pretty good article on this method: URL Rewriting for Classic ASP
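A minimal sketch of that idea in classic ASP, assuming the IIS custom 404 error is configured as type "URL" so the original request reaches the error page in the query string in the form 404;http://host/path; the old and new page names below are just placeholders:
<%
' custom 404.asp - map old URLs to their new homes with a permanent redirect
Dim qs, oldUrl, newUrl
qs = Request.ServerVariables("QUERY_STRING")   ' e.g. "404;http://company.com/new/10cap/pasite-10cap.asp"
oldUrl = LCase(Mid(qs, InStr(qs, ";") + 1))

newUrl = ""
Select Case True
    Case InStr(oldUrl, "/new/10cap/pasite-10cap.asp") > 0
        newUrl = "/saveourhomescap.asp"          ' hypothetical new location
    ' Case ... add one entry per moved page
End Select

If newUrl <> "" Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", newUrl
Else
    Response.Status = "404 Not Found"
End If
Response.End
%>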
Well, you have a lot of limitations, and having no access to the IIS server especially hurts. An ISAPI module for URL rewriting is not an option here (no access to IIS), and equally a custom 404 page where you read the originally requested URL and forward with an HTTP 301 won't work (same reason).
I would actually recommend that you go through the process and have them install:
An ISAPI URL rewriting module
or if that doesn't work (for any reason):
Let them point your site's HTTP 404 error to a custom 404.asp, read the originally requested URL, and redirect with an HTTP 301 (Moved Permanently) to your new location.
If none of this is an option for you, I can think of another possibility. I haven't actually tried it, so I'm not 100% sure it will work, but in theory it sounds good ;)
You could do a Response.Redirect in the Session_OnStart event of your global.asa, or change the response header to an HTTP 301. This will only work for new sessions and will not fix real 404 errors. Sorry for the pseudo code, but it's been a while since I did anything with classic ASP, and I think you'll get what I mean ;)
sub Session_OnStart
    ' here should be a Select Case switch on the requested URL, e.g. Request.ServerVariables("URL")
    Response.Redirect("newlocation.asp")
    ' or, if it works in your setup, this would be better (again inside the switch):
    ' Response.Status = "301 Moved Permanently"
    ' Response.AddHeader "Location", "http://company.com/newlocation.asp"
    ' Response.End
end sub
Hope that helps.
I recommend using URL Rewrite for that; see the following blog about it, in particular the "Site Reorganization" section:
http://blogs.msdn.com/b/carlosag/archive/2008/09/02/iis7urlrewriteseo.aspx
For more info about URL Rewrite see: http://www.iis.net/download/URLRewrite
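As an illustration of the "Site Reorganization" idea, a permanent-redirect rule in web.config could look roughly like this (a sketch only; it assumes IIS 7+ with the URL Rewrite module installed, and the rule name and paths are placeholders):
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- permanently redirect one old URL to its new home -->
        <rule name="Old 10cap page" stopProcessing="true">
          <match url="^new/10cap/pasite-10cap\.asp$" />
          <action type="Redirect" url="/saveourhomescap.asp" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>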
You can try ISAPI_Rewrite, since it's classic ASP + IIS 6:
http://www.isapirewrite.com/
They have a lite version which is free, probably good enough for your use.
URL rewriting will only work if you can install a DLL on the server.
One of these articles will help:
http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=qRR&q=url+rewrite+classic+asp&btnG=Search&aq=f&oq=&aqi=g-m1
Basically, you have to point 404 errors to an error page which parses the incoming query string / POST info and redirects the user to the correct location with the incoming parameters added.
Variations on that theme can be found in the examples from Google.

Redirect all HTTP-Requests with *.asp to one single file

Is it possible on IIS to redirect all files with the .asp file extension to one single file (e.g. switch.php, switch.cfm), and if so, how?
Thanks in advance for the upcoming solutions :)
EDIT:
The version of IIS is 6.0.
Here are a few different thoughts off the top of my head:
Use an ISAPI filter. Either write your own or use a commercial one like Helicon ISAPI Rewrite (the reverse proxy feature should be able to do this); see the sketch after this list.
Add a global.asa file to the root of the site and Response.Redirect to the page you want in the Session_OnStart event (I think this event still fires if the requested page doesn’t actually exist but am not 100% sure). More info here.
Define a new 404 “File not found” page in IIS which loads a custom page with a redirect to your desired URL. You could do this with either client or server side script and make it conditional on the requested URL having a .asp extension so as not to catch genuine 404s for other file types.
I’d say option 1 is your “best practice” approach but option 3 would get you up and running very quickly. Good luck!
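Picking up option 1: with Helicon ISAPI_Rewrite 3, which uses Apache mod_rewrite-compatible syntax, a rule along these lines might do it (a sketch; it assumes switch.php is served by the same site, as in the question):
RewriteEngine On
# send every request ending in .asp to the single handler page
RewriteRule \.asp$ /switch.php [NC,L]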
You're going to want to look into "IIS mod_rewrite" on Google :)
It lets you use regular expressions to define rules, and you can set a global match to rewrite everything to one page.
