Preventing rogue spiders from indexing a directory - security

We have a secure website (developed in .NET 2.0/C#, running on Windows Server and IIS 5) to which members have to log in before they can view some PDF files stored in a virtual directory. To prevent spiders from crawling this website, we have a robots.txt that disallows all user agents. However, this will NOT prevent rogue spiders from indexing the PDF files, since they disregard robots.txt directives. Since the documents are meant to be secure, I do not want ANY spiders getting into this virtual directory (not even the good ones).
I've read a few articles on the web and am wondering how programmers (rather than webmasters) have solved this problem in their applications, since it seems like a very common one. There are many options out there, but I am looking for something easy and elegant.
Some options I have seen, which seem weak, listed here with their cons:
Creating a honeypot/tarpit that will let rogue spiders in and then blacklist their IP addresses. Cons: this can also block valid users coming from the same IP; we would need to maintain the list manually or give members some way to remove themselves from it. We don't have a range of IPs that valid members will use, since the website is on the Internet.
Request header analysis. Cons: the rogue spiders use real agent names, so this is pointless.
Meta robots tag. Cons: only obeyed by Google and other valid spiders.
There was also some talk about using .htaccess, which is supposed to be good, but that only works with Apache, not IIS.
Any suggestions are very much appreciated.
EDIT: As 9000 pointed out below, rogue spiders should not be able to get into a page that requires a login. I guess the question is really 'how to prevent someone who knows the link from requesting the PDF file without logging into the website'.

I see a contradiction between
members have to log in and then they can view some PDF files stored in a virtual directory
and
this will NOT prevent Rogue spiders from indexing the PDF files
How come any unauthorized HTTP request to this directory ever gets served with anything other than code 401? The rogue spiders certainly can't provide an authorization cookie. And if the directory is accessible to them, what is 'member login' then?
Probably you need to serve the PDF files via a script that checks authorization. I think IIS is capable of requiring authorization just for directory access, too (but I don't really know).

I assume that your links to PDFs come from a known location. You can check the Request.UrlReferrer to make sure users are coming from this internal / known page to access the PDFs.
I would definitely force downloads to go through a script where you can check that a user is in fact logged in to the site before allowing the download.
protected void getFile(string fileName) {
    /*
        CHECK AUTH / REFERER HERE
    */
    string filePath = Request.PhysicalApplicationPath + "hidden_PDF_directory/" + fileName;
    System.IO.FileInfo fileInfo = new System.IO.FileInfo(filePath);
    if (fileInfo.Exists) {
        Response.Clear();
        Response.AddHeader("Content-Disposition", "attachment; filename=" + fileInfo.Name);
        Response.AddHeader("Content-Length", fileInfo.Length.ToString());
        Response.ContentType = "application/pdf";
        Response.WriteFile(fileInfo.FullName);
        Response.End();
    } else {
        /*
            ERROR
        */
    }
}
Untested, but this should give you an idea at least.
I'd also stay away from robots.txt since people will often use this to actually look for things you think you're hiding.

Here is what I did (expanding on Leigh's code).
Created an HTTPHandler for PDF files, created a web.config in the secure directory, and configured the handler to handle PDFs.
In the handler, I check to see if the user is logged in using a session variable set by the application.
If the user has the session variable, I create a FileInfo object and write it to the response. Note: don't call 'context.Response.End()'; also, 'Content-Disposition' is obsolete.
So now, whenever there is a request for a PDF in the secure directory, the HTTP handler gets the request and checks whether the user is logged in. If not, an error message is displayed; otherwise, the file is served.
Not sure if there is a performance hit, since I am creating FileInfo objects and sending those rather than sending the file that already exists. The thing is that you can't Server.Transfer or Response.Redirect to the *.pdf file, since you would create an infinite loop and the response would never get returned to the user.

Related

Security Testing - How to test file upload feature for malicious upload

I need to test a file upload feature for security. The purpose is to avoid/stop any type of malicious file from being uploaded.
Thanks !!
There are multiple vulnerabilities that usually come up around file uploads/downloads.
Malware in uploaded files
Any uploaded file should be virus-checked. As @CandiedOrange responded, you can use the EICAR test file for that purpose.
Path injection
The filename of an uploaded file is the same type of user input as any other field in the request; an attacker can freely choose it. As a tester, you can send something like "../filename" to try to save the file to unintended locations or to overwrite other files.
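For illustration, a minimal PHP sketch of neutralising that on the server (the 'document' field name and the upload path are made-up assumptions, not from the original question):
<?php
// Hypothetical upload handler: never trust the client-supplied filename.
$original = isset($_FILES['document']['name']) ? $_FILES['document']['name'] : '';

// basename() strips directory components, so "../../etc/passwd" collapses
// to "passwd" (on Windows hosts you would also want to strip backslashes).
$safeName = basename($original);

// Reject anything that still looks suspicious (empty names, dot files, ...).
if ($safeName === '' || $safeName[0] === '.') {
    http_response_code(400);
    exit('Invalid filename.');
}

move_uploaded_file($_FILES['document']['tmp_name'], '/var/uploads/' . $safeName);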
Filetypes
If the filetype restriction is only on the client, that's obviously useless for security. But even if the file extension is restricted on the server side, say only .pdf is allowed, you can still try to upload something.pdf.php or something.pdf.exe or similar to get around the filter. It's best if the application uses some real content inspection to find out whether the uploaded file is actually of an allowed filetype.
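A rough sketch of such a content check in PHP, assuming the Fileinfo extension is available (the field name and the whitelist are illustrative):
<?php
// Inspect the file's actual bytes rather than trusting its extension.
$finfo = new finfo(FILEINFO_MIME_TYPE);
$mime  = $finfo->file($_FILES['document']['tmp_name']);

$allowed = array('application/pdf', 'image/png', 'image/jpeg');
if (!in_array($mime, $allowed, true)) {
    http_response_code(415);
    exit('File type not allowed.');
}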
Content sniffing
Some browsers have this awesome (not) feature: when a file is downloaded, the browser looks at its content and displays it according to that content, regardless of the Content-Type header received from the server. This means that even if uploads are restricted to, say, .pdf, an attacker might upload an HTML file with JavaScript in a file named "something.pdf", and when somebody else downloads that file, the browser may run the JavaScript, making the application vulnerable to XSS. To prevent this, the application should send the X-Content-Type-Options: nosniff response header.
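In PHP, for instance, that header can simply be added to every download response (a two-line sketch):
// Tell the browser to trust the declared Content-Type instead of sniffing the body.
header('X-Content-Type-Options: nosniff');
header('Content-Type: application/pdf');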
Uploaded file size
If an attacker can upload too many files, or files that are too big, they may be able to achieve denial of service by filling up the disk space on the server.
Download without restriction (direct object reference)
An application might save uploaded files to a location directly accessible to the webserver. In such a case, download links look similar to /uploads/file.pdf. This is only suitable for public files; access control cannot be enforced that way, and anybody who has the link can download the file.
Lack of access control
If files are not meant to be available to all logged-in users, the application must perform authorization to decide whether the logged-in user may actually download the file he is requesting. Too often this authorization step is missing or flawed, resulting in the application serving the wrong files to users who cleverly modify requests.
So the bottom line is, file upload/download vulnerabilities are much more than just virus checking uploaded files.
If your security is signature-based, consider uploading an EICAR test file. It should trigger your protection, and if it doesn't and is somehow executed, all it will do is print "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!" and stop.
Well, you can activate malware protection on your network firewall. Snort is a good option for protecting websites.
You can also add input filters to your application code so it checks whether an uploaded file contains malware.

Is there any security issue if I send a path in a QueryString?

Is there any security issue if I send a path in a QueryString? For example, if I send this request: http://localhost/eCTDTreeViewer/Home/Index/?pathOnServer=G:\test\company2
Thinking about QueryString security, you should keep in mind (read as "worry about") the following points:
URLs are stored in web server logs
URLs are stored in the browser history
URLs are passed in Referrer headers
You can find more detailed information by reading the article "How secure are query strings over HTTPS" and the question "Is an HTTPS query string secure?" on SO.
The risk of exposing a path, given the filesystem is not externally accessible, is negligible.
Especially if the sole purpose of the component you're talking about is to display directories as they exist on the server. What you see in the query string is what you will see in the payload of the response, so it's just fine having the path there in plain text.
Trouble can arise when this "TreeViewer" exposes sensitive files and allows the user to browse to arbitrary locations, enabling them to retrieve passwords stored in files and what not.
Of course it never hurts to add HTTPS, but that only prevents a man in the middle from finding out which directories and files exist on that server and does not offer any additional security.
HTTPS does not make your improperly secured application secure; you still have to implement authentication and authorization, input sanitization, and so on.
Yes, you open yourself up to Directory Traversal (DT) and Local File Inclusion (LFI) attacks.
The main difference between the two is that DT is read-only: a user can read any file on your web server, provided they have sufficient privileges. LFI, on the other hand, would allow an attacker to invoke (execute) a file on the web server, e.g. a PHP file, rather than just reading it.
If, for example, you have a SQL Injection vulnerability on your web application, an attacker may deploy a web shell into your system:
SELECT "<?php system($_GET['cmd']); ?>" INTO OUTFILE C:/tmp/shell.php
An attacker could then invoke the file:
http://localhost/eCTDTreeViewer/Home/Index/?pathOnServer=C:/tmp/shell.php&cmd=echo "foo"
This is very brief but it should provide a good idea as to how dangerous it can be.
If you stay on plain HTTP, yes. The request will be sent in plain text over the network. Don't be confused: it would be the same issue with a POST request carrying your information inside its body.
The good way to make it safe is to use HTTPS. Because of the handshake done before the exchange, the full request (including the path) will be encrypted before being sent to the endpoint.

Allow only certain people to view a file

Let's say I have a swf with a movie or something (it's a stream actually but it doesn't really matter).
I created a quite secure way to get to the page where it is displayed (as an embed). The only problem is this:
How do I stop someone from viewing the source, or using something like Firebug, and sending the address of the file to somebody else?
I want them to see the result but not be able to send it to anyone else.
The platform for my site is LAMP.
You can't do this.
If you don't want the client to know something, the only option is to not tell it;
if you don't tell the client where the file is, it can't possibly view it.
What you should do instead, is to have some authentication and authorization in place, so that only authenticated and authorized users can access said address. That way users can share the address as they like, but unauthorized users can only get "Access denied" messages.
This does not prevent authorized users from downloading the file and hosting it somewhere else, though. If you don't trust them not to do this, don't authorize them.
I created a quite secure way to get to the page where it is displayed
Intriguing. Please describe this secure way in detail.
If it's secure in the sense that it authenticates and authorizes the user, then it already provides the security you ask about against the sharing of URLs.
You could implement a refer[r]er check to prevent users accessing the file directly.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$components = parse_url($referer);
if (!isset($components['host']) || $components['host'] != "www.example.com") {
    // User didn't access the file from your site (or sent no referrer at all).
}
This can be circumvented by spoofing the referer header but should defeat casual users.
Naturally, this doesn't prevent someone downloading and re-hosting the file.

How to prevent XSS injection while allowing users to post external images

A user recently reported to me that they could exploit the BBCode [img] tag that was available to them through the forums.
[img=http://url.to.external.file.ext][img]
Of course, it would show up as a broken image, however the browser would retrieve the file over there. I tested it myself and sure enough it was legit.
I'm not sure how to prevent this type of XSS injection other than downloading the image and checking whether it is a legitimate image with PHP. This could easily be abused with an insanely huge file.
Are there any other solutions to this?
You could request the headers and check if the file is actually an image.
Edit:
Sorry that I couldn't answer in more depth; I was enjoying dinner.
There are two ways I see it:
You check whether the supplied address is actually an image when the post is submitted or viewed; you could accomplish this by checking the headers (making sure it's actually an image) or by using the file extension. This isn't fool-proof and has some obvious issues (the image can be changed on the fly, etc.).
Secure your site so that even if there is a compromise with the [img] tag there is no real problem; for example, the malicious code can't use stolen cookies (a small HttpOnly sketch follows the header-check example below).
Use a script that requests an external image and modifies the headers.
A basic way to check the remote file's content type:
// Passing 1 as the second argument returns an associative array keyed by header
// name, which is more reliable than a fixed numeric index like $Headers[8].
$Headers = get_headers('http://url.to.external.file.ext', 1);
$ContentType = isset($Headers['Content-Type']) ? $Headers['Content-Type'] : '';
if (is_array($ContentType)) $ContentType = end($ContentType); // keep the final value after redirects
if (stripos($ContentType, 'text/html') !== false) {
    echo 'Wrong content type.';
    exit;
}
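On the second option above (limiting what injected script can do with cookies), marking cookies HttpOnly is one cheap, concrete step; a sketch assuming PHP sessions (the 'remember_me' cookie is purely illustrative):
<?php
// The session cookie becomes unreadable from JavaScript, so injected script
// cannot exfiltrate it even if some XSS slips through.
ini_set('session.cookie_httponly', '1');
session_start();

// For other cookies, the last argument of setcookie() does the same.
$token = bin2hex(random_bytes(16)); // illustrative value
setcookie('remember_me', $token, time() + 86400, '/', '', false, true);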
There are only two solutions to this problem: either download the image and serve it from your webserver, or only allow a white-list of URL patterns for the images.
Some gotchas if you decide to download the images (a rough sketch follows this list) -
Make sure you validate the maximum file size. There are ways to stop the download if the file exceeds a certain size, but these are language-specific.
Check that the file is actually an image.
If you store it on the hard disk, be sure to rename it. You shouldn't allow the user to control the file name on the system.
When you serve the images, use a throw-away domain, or use a naked IP address to serve the images. If the browser is ever tricked into thinking the image is executable code, the same-origin policy will prevent further damage.
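A rough PHP sketch of the download-and-validate route described above (the size cap, cache directory, and cURL options are assumptions, not a vetted implementation):
<?php
// Fetch the remote image with a hard size cap, then verify and rename it.
$url     = 'http://url.to.external.file.ext';   // user-supplied URL from the [img] tag
$maxSize = 2 * 1024 * 1024;                     // 2 MB cap (assumption)

$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => false,
    CURLOPT_TIMEOUT        => 10,
    CURLOPT_NOPROGRESS     => false,
));
// The progress callback lets us abort as soon as too many bytes have arrived.
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION,
    function ($handle, $dlTotal, $dlNow, $ulTotal, $ulNow) use ($maxSize) {
        return ($dlNow > $maxSize) ? 1 : 0;     // non-zero return aborts the transfer
    });
$data = curl_exec($ch);
curl_close($ch);

if ($data === false || strlen($data) > $maxSize) {
    exit('Download failed or file too large.');
}

// getimagesizefromstring() fails unless the bytes really are an image.
if (getimagesizefromstring($data) === false) {
    exit('Not a valid image.');
}

// Never reuse the remote filename; generate one of our own.
$localName = bin2hex(random_bytes(16)) . '.img';
file_put_contents('/var/www/imgcache/' . $localName, $data);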

Displaying PDF to user

We're providing a web form whereby users fill in their personal information; some of it is sensitive information (SSN, Birthday, etc). Upon user submission, the data is prefilled into a PDF which is then made available via a link.
We are creating the PDF in a folder that has write access on the website.
How can we safely create and add PDFs to this folder, with whatever naming scheme (a GUID?), such that another user cannot guess/spoof the PDF file location, type it into the URL, and access another person's PDF?
Maybe the PDF folder should have rights specific to each user, but how that is accomplished may be a different question. (The number of users is unknown, as this will be open to the public.)
Any thoughts on this? In a nutshell, we need to allow the user to view a PDF of the data they just entered, while preventing more-savvy users from figuring out the location of other PDF files and accessing them.
Thanks!
Trying to obfuscate the path to a file isn't really making it secure. I would find a way to email the file to the user, or some other way to fetch it for them, instead of allowing access to an open directory.
Make the web app fetch the file for the user instead of relying on open folder permissions on the web server.
Just keep in mind that obfuscation isn't really security.
If it's really just for the moment, create a completely random file (20384058532045850.pdf) in a temporary directory, serve that to the user immediately and remove it after a certain period of time.
Whether your web app has write rights on that directory or not (I assume you are talking about chmod user rights) is not important; it can't be breached through the web server, and I don't see a problem in revealing the directory path per se - you have to reveal something when giving the user a URL to download. If your PDF names are random enough, there is practically no risk of somebody being able to guess the name of another PDF file in the same directory.
As the PDF contains sensitive data, don't forget to turn off caching to prevent a local copy of the PDF being saved in the client's browser cache.
I don't know for sure whether turning off caching through the appropriate headers is enough to prevent local caching in all browsers; you might have to look into that.
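A minimal PHP sketch of that idea (the directory and the PDF-generation step are assumptions; adapt it to whatever actually produces the PDF):
<?php
// Give the freshly generated PDF an unguessable, single-use name.
$pdfDir  = '/var/www/tmp_pdfs/';
$pdfName = bin2hex(random_bytes(16)) . '.pdf';   // e.g. 9f2c...a41.pdf

// ... write the prefilled PDF to $pdfDir . $pdfName here ...

// Serve it straight back, and tell the browser not to keep a cached copy.
header('Content-Type: application/pdf');
header('Content-Disposition: attachment; filename="your-data.pdf"');
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Pragma: no-cache');
readfile($pdfDir . $pdfName);

// Delete it after delivery (a cron job could also sweep files older than a few minutes).
unlink($pdfDir . $pdfName);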
For the purpose of PDFs, would it not be better (I know I will get flamed for this) to store the actual PDF in the database as a BLOB, which would be on the back end of the website in question?
There will be no reference to the URL anywhere nor will there be a specific path highlighted in any links on that form.
Hope this helps,
Best regards,
Tom.
The simplest way is to proxy the file through your application (fpassthru() in PHP, for example); this allows you to use whatever access control/identification system you already use for the dynamic content.
If you don't have any means of identifying your users and restricting access, and assuming your platform has a secure session mechanism, you can protect the file by storing the filename in the user's session and then returning that file (and only that file) to the user when requested. This should mean that an attacker would have to spoof a session to access the file so this should be as secure as your session mechanism is.
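A bare-bones sketch of such a proxy script in PHP (the session keys and the directory are assumptions; the point is that nothing is streamed until the check passes):
<?php
session_start();

// Only serve the single file this session is entitled to.
if (empty($_SESSION['user_id']) || empty($_SESSION['pdf_file'])) {
    header('HTTP/1.1 403 Forbidden');
    exit('Access denied.');
}

// basename() guards against a tampered session value containing "../".
$path = '/var/www/private_pdfs/' . basename($_SESSION['pdf_file']);
$fp   = @fopen($path, 'rb');
if ($fp === false) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

header('Content-Type: application/pdf');
header('Content-Length: ' . filesize($path));
fpassthru($fp);   // stream the file through the application
fclose($fp);
Keeping the PDFs outside the web root (or denying direct web access to the directory) is what actually forces every request through this script.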
