I am creating a website for one of my clients. Consider this my website: www.website.com
I saved all my files on a subdomain, like http://sub.website.com/file.jpg
I also have a robots.txt file on the subdomain: http://sub.website.com/robots.txt.
If someone directly enters the URL http://sub.website.com/robots.txt, they are able to read my robots.txt file.
What should I do if I want to protect those files?
If you make your robots.txt file unreadable, then search engines won't be able to view it to know what not to index.
I have added a robots.txt file with some lines that restrict some folders. I also blocked all access to that robots.txt file using an .htaccess file. Can search engines read the content of that file?
This file should be freely readable. Search engines are like visitors to your website: if a visitor can't see this file, the search engine won't be able to see it either.
There's absolutely no reason to try to hide this file.
Web crawlers need to be able to HTTP GET your robots.txt, or they will be unable to parse the file and respect your configuration.
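If you do protect other files with .htaccess rules, a minimal sketch of how to keep robots.txt explicitly readable might look like this (assuming Apache 2.4; on older 2.2 servers the equivalent is "Allow from all"):

    # Hypothetical .htaccess snippet: whatever other rules exist,
    # keep robots.txt publicly fetchable so crawlers can read it.
    <Files "robots.txt">
        Require all granted
    </Files>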
The answer is no! But the simplest and safest thing is still to test it yourself:
https://support.google.com/webmasters/answer/6062598?hl=en
The robots.txt Tester tool shows you whether your robots.txt file
blocks Google web crawlers from specific URLs on your site. For
example, you can use this tool to test whether the Googlebot-Image
crawler can crawl the URL of an image you wish to block from Google
Image Search.
Using website grabbers, a whole website with its folder structure can be downloaded.
Is there any way to prevent this?
If so, how?
The only way to protect a website's markup is not to publish it. If you want your users to see something, they need to receive the HTML markup and the images that should be displayed, and therefore the files need to be accessible. And if your files are accessible, every user/bot/crawler/grabber can save them.
The best way is to put a few files, like the index page, in the main directory and pull the other sub-pages into it. If you're using PHP, you can do the following:
Keep index.php in the main folder, keep homepage.php in a directory called includes, and load the homepage into index.php via PHP's include function.
Now add an .htaccess file to the includes folder that contains:
"deny from all"
This way users can use the page but will not have direct access to the included files, and the same goes for a grabber.
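A minimal sketch of that layout, assuming the file names above (index.php and includes/homepage.php) and an Apache server where "deny from all" is honored (Apache 2.2 syntax; on 2.4 use "Require all denied"):

    <?php
    // index.php — lives in the web root and is the only publicly
    // reachable entry point; it pulls in the protected sub-page.
    include __DIR__ . '/includes/homepage.php';

A direct request for /includes/homepage.php is rejected by the .htaccess rule, but the include above runs on the server through the filesystem, so that rule does not affect it.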
I have a sitemap file called links.txt, and I want only search engines/bots to access this file.
How can I do that via an .htaccess file?
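One hedged sketch, assuming Apache 2.4 and matching crawlers by their claimed User-Agent. Note that User-Agent strings are trivially spoofed, so this is obscurity rather than real access control:

    # Hypothetical .htaccess: only requests whose User-Agent claims to
    # be a known crawler may fetch links.txt; everyone else gets 403.
    SetEnvIfNoCase User-Agent "(googlebot|bingbot)" is_bot
    <Files "links.txt">
        Require env is_bot
    </Files>

A more robust approach is to verify crawler IPs via reverse DNS, but that typically needs application-level code rather than a static .htaccess rule.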
How do I protect and hide the /adminblah/ folder from robots and from users so that only the administrator will know it exists?
1) To hide it from robots and bots, we can use a robots.txt file.
But that file will then contain Disallow: /adminblah/. As a result, everybody who wants to will know the path to the administrator's folder, because they can read the robots.txt file.
2) To limit access, we can put an .htaccess file in /adminblah/ to password-protect that folder.
Is that smart? Are there any smarter solutions to limit access to the /adminblah/index.php page?
This question concerns all the content: admin PHP files, admin pictures, etc.
Mentioning the directory in robots.txt is not a solution, since, as you say yourself, it's worse than doing nothing.
.htaccess protection is a very good option on its own; add it at the root of /adminblah/, and even if someone (including robots) guesses the path, they'll get nothing.
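A minimal sketch of that protection, assuming Apache with HTTP Basic authentication; the password-file path is a placeholder, and the file would be created with "htpasswd -c /path/outside/webroot/.htpasswd admin":

    # Hypothetical /adminblah/.htaccess: password-protect the whole
    # folder. Keep the .htpasswd file outside the web root.
    AuthType Basic
    AuthName "Administration"
    AuthUserFile /path/outside/webroot/.htpasswd
    Require valid-user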
Using a robots.txt file to hide a directory from search engines is nothing more than security through obscurity. You must have proper access controls on the content, and a .htaccess file is perfect for this.
I like to do things like this:
RedirectMatch permanent (?i)adminblah http://www.fbi.gov
I don't know anything about .htaccess files except how to secure a folder or deny access.
I want to deny direct access to .js files (by typing the file name into the URL) on my server. Say the files are stored in a folder named /js/; how can I use .htaccess to do that?
You cannot do that.
Actually, there is no "direct access" or "indirect access". The browser requests the JS file the same way when loading it from a SCRIPT tag as when you load it separately (by typing the file name into the browser).
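To illustrate why, here is a commonly attempted workaround, sketched under the assumption of Apache 2.4 and the www.website.com domain from the first question. It only blocks requests whose Referer header doesn't name your site, and since Referer is optional and trivially spoofed (and the browser must still be able to fetch the file to run it), it is not real protection:

    # Hypothetical /js/.htaccess: reject .js requests that do not carry
    # a Referer from our own pages. Easily bypassed; shown only to
    # demonstrate that "direct access" cannot truly be blocked.
    SetEnvIfNoCase Referer "^https?://(www\.)?website\.com/" from_my_site
    <FilesMatch "\.js$">
        Require env from_my_site
    </FilesMatch>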