How to hide a website directory from search engines without robots.txt?

We know we can stop search engines from indexing directories on our site using robots.txt.
But this of course has the disadvantage of actually publicising, to potential attackers, the very directories we don't want found.
Password protecting the directory using .htaccess or other means is obviously the best way to keep the directory private.
But what if, for reasons of convenience, we didn't want to add another layer of security to the directory and just wanted to add another level of obfuscation? To hide, for example, an admin login page.
Is there another way to "hide" the directory without broadcasting its location in a robots.txt file?

Here is what to do; note that since you haven't mentioned any particular technology, I haven't included implementation specifics.
If you configure your web server to output the following meta tag in the directory listing HTML page, it will prevent your page from being indexed by compliant search engines.
<meta name="robots" content="noindex">
Adding this would probably require implementing a custom module within your web server that will override the default directory listing output page.
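If your web server happens to be Apache (an assumption; no particular technology was mentioned), you can get the same effect without writing a custom module by sending the equivalent HTTP response header. A minimal sketch, using an .htaccess file placed in the directory you want hidden, assuming mod_headers is enabled:

# .htaccess in the hidden directory (requires mod_headers).
# Sends the noindex directive as an HTTP header on every response
# from this directory, including auto-generated directory listings.
Header set X-Robots-Tag "noindex"

Compliant crawlers treat the X-Robots-Tag header the same way as the meta tag, and it also covers non-HTML responses that can't carry a <meta> element.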

Try using a random string. Something like http://website.com/some-random-string-here/file.html
Then remember not to use some-random-string-here in your robots.txt file or in any links.

Related

Kentico permanent link vs direct path

I'm working on a site where all links (dynamic and hard-coded) to the media library are permanent links (with getmedia...), which makes it hard to locate the exact folder of the files and update them. I asked a developer and heard that permanent links are more secure because the system can check who has access to download the materials. Is that a fair statement, and why or why not? Thanks for your input!
This is not a fair or correct statement. Access is set at the individual media library directory level, not at the individual file level.
For example, if you have an Images media library which has no security behind it, you can access it directly with a URL of:
/site/media/images/logo.png or /getmedia/<guid>/logo.png
and the image will display without issue.
Now suppose you have another media library called "Secure_Files". If you attempt to access:
/site/media/secure_files/file1.pdf
You'll get an error or a login page because the security is set on the
/site/media/secure_files directory.
Here is the documentation on securing media libraries.
By default, Kentico does not check the See library content permission for visitors on the live site. If you wish to require users to have this permission to view media library content, you need to enable the following settings in the Content -> Media category of the Settings application:
Use permanent URLs
Check file permissions
See the note at the very bottom of this documentation page.
Permanent Link is made up of:
/getmedia/
Guid ID
Image Path
.aspx
Eg: /getmedia/C73B5-6A0-4F6-878-3C29D792014/IMG_3860.jpg.aspx
Direct Path is made up of:
/
Site Name
/media/
Media Library Folder Name
Image Path
Eg: /google/media/Blog-images-from-Kentico-Cloud/IMG_360.jpg

Can sitemap.xml be misused to copy entire website?

I am planning to upload a sitemap.xml to my website, which has generated content pages. As of now, if I try to copy the entire website using tools like HTTrack etc., it cannot be copied.
Now, if I want search bots to find and index the content pages on this website, I will have to include all URLs in the sitemap.xml file.
So the question is: will such a sitemap.xml expose all URLs, thereby "facilitating" a full copy of the website?
Inputs on this will be highly appreciated.
Technically, yes.
But I suppose the question you really need to ask is 'Do I care?'
If the answer is yes, you should really consider whether you should be publishing it to the web in the first place.
A well-constructed IA (information architecture) would contain links between pages anyway (for navigational and SEO reasons), so tools like HTTrack would be able to copy the site regardless.
Anything you don't want to be seen by HTTrack also needs to be invisible to the ordinary web user, i.e. either password protected or non-existent.
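For context, a sitemap.xml is nothing more than a plain list of URLs that anyone, not only search bots, can download. A minimal sketch with placeholder URLs:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://website.com/page-one.html</loc></url>
  <url><loc>http://website.com/page-two.html</loc></url>
</urlset>

Whatever you list there is exactly as discoverable to a copying tool as it is to Google.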

IIS7 - How to password protect a single folder using a Web.config file?

I have a folder that contains log files. They're not super critical, but I don't want total strangers looking through them. I'd like to put a password on that one folder. The folder and its contents are served straight up from IIS, so I'm not looking for a coding solution.
With Apache I'd use a .htaccess file.
With IIS it's possible to use multiple Web.config files at various levels to control this kind of thing.
So, what goes in the Web.config file that allows me to require a password when accessing this folder?
I'm happy for the password to pop up in a dialog like old-school websites used to do (I believe this is HTTP Basic or Digest authentication) and so avoid any loginUrl redirection stuff
I'm happy to put the password in the Web.config file in plain text if it's easier
The application is internet facing and running on shared hosting, so I don't have much control over the box beyond what I can configure in Web.config.
You can achieve this using the <location path="..."/> element of the Web.config file, which scopes authorization rules to a single folder; see the sketch below.
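A minimal sketch, assuming the folder is named "logs" (a placeholder here) and that the host has some authentication method, such as Basic, enabled; <location> only scopes the authorization rule and does not itself produce the password dialog:

<configuration>
  <location path="logs">
    <system.web>
      <authorization>
        <!-- "?" means anonymous users: anyone not logged in is denied -->
        <deny users="?" />
      </authorization>
    </system.web>
  </location>
</configuration>

With this in place, IIS returns a 401 for anonymous requests to the folder, and the browser shows its credentials dialog once an authentication scheme like Basic is enabled.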

How to make that only one page requires .htaccess auth?

Here is my problem. I have a page at www.example.com which I don't want to be publicly accessible, so I want to put it behind some kind of login.
The problem is that I also have www.example.com/api, which I need to be publicly accessible.
Do you have any ideas how to achieve this?
Best regards,
Mladjo
.htaccess (or the <Directory> directive) applies to the directory you put it in (for .htaccess) or the directory you specify (for <Directory>), and to all sub-directories below that one. If you have a specific file that you wish to control access on, put it in its own directory one level deeper than your web-root directory, and apply your access restriction to that path and its sub-directories only, as in the sketch below.
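A sketch of the restriction itself, assuming Basic authentication and an existing password file (the paths here are placeholders). Put this .htaccess in the sub-directory that holds the private page:

# .htaccess in the protected sub-directory
AuthType Basic
AuthName "Restricted area"
AuthUserFile /home/user/.htpasswd
Require valid-user

www.example.com/api lives outside that sub-directory, so it remains public.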

google index - will google index my logs?

I have some .txt log files where I print out some important activities for my site.
These files ARE NOT referenced from any link within my site, so I am the only one who knows the URLs
(they contain the current date in the filename, so I have one for each day).
Question: will Google index these kinds of files?
I think Google indexes only the pages whose URLs appear on the site.
Can you confirm my assumption? I just don't want others to find the links via Google etc. :)
In theory they shouldn't. If they aren't linked from anywhere, crawlers shouldn't be able to find them. However, I'm not sure whether URLs can make their way into the index by virtue of having the Google Toolbar installed. I've definitely had some unexpected things turn up in search engines. The only safe way would be to password protect the folder.
Google cannot index pages that it doesn't know exist, so it won't index these unless someone submits the URLs to Google or places them on some website.
If you want to be sure, just disallow indexing for the files in /robots.txt.
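A minimal robots.txt along those lines, assuming the logs live under a /logs/ path (the actual paths and filenames weren't given):

User-agent: *
Disallow: /logs/

Bear in mind that this also broadcasts the path to anyone who reads robots.txt, the trade-off raised in the first question above.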
Best practice is to use robots.txt to prevent the Google crawler from indexing files you don't want to show up in results.
This description from Google Webmaster Tools is very helpful and leads you through the process of creating such a file:
https://support.google.com/webmasters/answer/6062608
edit: As was pointed out in the comments, there is no guarantee that robots.txt will be honoured, so password-protecting the folders is also a good idea.
