Block a specific page in Robots.txt - drupal-6

According to this Google support page:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
Disallow: /page1/
All URLs under page1/ will be disallowed, i.e. page1/foo/bar will also be blocked.
Disallow: /page1
Only page1 will be blocked, and page1/foo/bar will be allowed.
But this is not happening. How can I block only page1 and still allow page1/foo/bar to be crawled?
EDIT:
The actual issue is that the same page is crawled twice under different paths,
as /page and /page/

Why don't you just add a robots meta tag to the page you want to exclude?
<meta name="robots" content="noindex, nofollow, noarchive"/>
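For context on why Disallow: /page1 also blocks page1/foo/bar: robots.txt rules are prefix matches, so that rule covers every path that begins with /page1. The sketch below checks this with Python's standard urllib.robotparser. The two rule sets and the allowed() helper are illustrative assumptions, not taken from the asker's site. Putting Allow: /page1/ ahead of the Disallow line is one possible way to keep the sub-pages crawlable, assuming the crawler honours Allow.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical rule set 1: a bare prefix rule.
PREFIX_ONLY = """\
User-agent: *
Disallow: /page1
"""

# Hypothetical rule set 2: explicitly allow everything below /page1/.
WITH_ALLOW = """\
User-agent: *
Allow: /page1/
Disallow: /page1
"""

for path in ("/page1", "/page1/", "/page1/foo/bar", "/other"):
    print(path, allowed(PREFIX_ONLY, path), allowed(WITH_ALLOW, path))
```

With the first rule set, /page1, /page1/ and /page1/foo/bar are all refused. With the second, only /page1 itself (and any other path starting with /page1 but not /page1/) is refused. Google's documentation also describes a $ end-of-URL anchor (Disallow: /page1$), which the standard-library parser used here does not implement.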

Related

Is it better for the robots to find html pages that do redirection or should I just use htaccess?

I want some of the pages to redirect back to the main page, with the target passed as query parameters. These are not data or privacy sensitive, so I don't need to use the "session" variable.
I was thinking that it would be better for the robots to find physical html files. So for example, I have example.kiwi/downloads.html containing
<meta http-equiv="refresh" content="0; url=https://example.kiwi?mode=downloads" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<script>window.location.href = "https://example.kiwi?mode=downloads"</script>
I assume that I can do the same with htaccess. Is that better, and how do I do it? Since .htaccess is a sensitive file, I am trying not to experiment with it too much.

.htaccess works as if I created a new directory

I have bone.php and bone.css inside public_html on my server. Inside bone.php I have a link tag that loads bone.css: <link rel="stylesheet" type="text/css" href="bone.css">. I have created an .htaccess file with a rule for bone.php:
RewriteRule ^community/([0-9a-zA-Z]+) bone.php?first=$1 [NC,L]
After I created the .htaccess file, I had to change the link tag to <link rel="stylesheet" type="text/css" href="../bone.css">. In other words, bone.php behaves as if it were inside a folder, which it is not.
If that is the only way, I will need to change every link on my website. I hope someone can suggest another way. Thanks.
This happens because the base of your relative URIs has changed. Originally, the base is / when the page is /bone.php, and the browser correctly resolves relative links against /. But when the browser requests a page like /community/foo/, the base becomes /community/foo/, the browser prepends that to every relative URL, and none of them load.
You can either make your links absolute, or set the URI base in the header of your pages (between the <head> </head> tags):
<base href="/">
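To see how the base changes the result, the resolution rules can be reproduced with Python's standard urllib.parse.urljoin, which implements the same relative-reference algorithm browsers follow. This is only a sketch: the host example.com and the resolve() helper are assumptions made for illustration.

```python
from urllib.parse import urljoin


def resolve(page_url, href, base_href=None):
    """Resolve `href` the way a browser would for a page at `page_url`.

    If the page declares <base href="...">, that value is first resolved
    against the page URL and then used as the base for the link.
    """
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


# Page served directly: the relative link resolves next to bone.php.
print(resolve("http://example.com/bone.php", "bone.css"))

# Page served through the rewritten URL: the link now resolves under /community/.
print(resolve("http://example.com/community/foo", "bone.css"))

# Same rewritten URL with <base href="/"> in the head.
print(resolve("http://example.com/community/foo", "bone.css", base_href="/"))

# Same rewritten URL with a root-relative (absolute-path) link instead.
print(resolve("http://example.com/community/foo", "/bone.css"))
```

The second call yields http://example.com/community/bone.css, which does not exist. The last two calls both yield http://example.com/bone.css, matching the two fixes suggested above.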

How does Google crawl pages that have numbered, next, or previous buttons

I have a search page that contains paged results. How does Google know to go to the next page so it can crawl all the content and not just the 1st page?
Google actually recommends using rel="next" and rel="prev" for paginated pages.
Basically you'll insert two additional tags in the head of the document (the first page gets only the next tag, and the last page only the prev tag):
<head>
…
<link rel="prev" href="http://www.example.com/article?story=abc&page=1" />
<link rel="next" href="http://www.example.com/article?story=abc&page=3" />
…
</head>
More info can be found on their blog: http://googlewebmastercentral.blogspot.ch/2012/03/video-about-pagination-with-relnext-and.html
Even if you don't do this, Google usually does a pretty good job of indexing paged results. But it doesn't hurt to help them.
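If the pages are generated by a script, the tags can be produced per page. The following is a minimal sketch in Python, not part of the original answer: the pagination_links() helper and its parameters are hypothetical, and the URL layout simply mirrors the example.com links shown above. Note that it escapes & as &amp; inside the attribute, which is the strictly valid HTML form of the same URL.

```python
from html import escape
from urllib.parse import urlencode


def pagination_links(base_url, story, page, last_page):
    """Return the <link> tags for page `page` of a series of `last_page` pages."""

    def page_url(n):
        # Build e.g. http://www.example.com/article?story=abc&page=2
        return f"{base_url}?{urlencode({'story': story, 'page': n})}"

    tags = []
    if page > 1:  # every page except the first points back
        tags.append(f'<link rel="prev" href="{escape(page_url(page - 1))}" />')
    if page < last_page:  # every page except the last points forward
        tags.append(f'<link rel="next" href="{escape(page_url(page + 1))}" />')
    return tags


for tag in pagination_links("http://www.example.com/article", "abc", 2, 3):
    print(tag)
```

For page 2 of 3 this prints one prev tag pointing at page 1 and one next tag pointing at page 3. For page 1 the list holds only the next tag, and for the last page only the prev tag.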

How to show full page URL of welcome file in address bar

I have a "/pages/index.xhtml" page. The problem is that when I run the application, the name of this index page doesn't appear in the address bar.
It only shows http://localhost:8080/myApplication/. What I want to see is http://localhost:8080/myApplication/pages/index.xhtml
Is there any solution?
Here is my welcome file from web.xml:
<welcome-file-list>
<welcome-file>pages/index.xhtml</welcome-file>
</welcome-file-list>
You need to send a redirect from / to /pages/index.xhtml. Easiest is to use a real index file with a meta refresh header for this.
First create a /index.xhtml file as follows:
<!DOCTYPE html>
<html lang="en">
<head>
<title>Dummy homepage</title>
<meta http-equiv="refresh" content="0; url=pages/index.xhtml" />
</head>
</html>
Then change your <welcome-file> as follows:
<welcome-file>index.xhtml</welcome-file>
This also instantly fixes your bad way of using <welcome-file>. It's not supposed to specify the "home page", but it's supposed to specify the folder's own file which needs to be served when a folder such as / or /foo/ is requested in URL instead of a file.
See also
How to use a sub-folder as web.xml welcome directory
why do i get the protected page instead of the login page?

Htaccess: prevent an HTML file from being indexed

We are trying to prevent just one file, "preview.html", from being indexed by Google.
How can we set this up to allow everything else but deny this one HTML file?
<meta name="robots" content="noindex">

Resources