Best image upload directory structure practices? - node.js

I have developed a large web application with NodeJS. I allow my users to upload multiple images to my Google Cloud Storage bucket.
Currently, I am storing all images under the same directory of /uploads/images/.
I'm beginning to think that this is not the safest way, and could affect performance later down the track when the directory has thousands of images. It is also a security concern, since some images are meant to be private, and it could allow users to find images by guessing a unique ID, such as uploads/images/29rnw92nr89fdhw.png.
Would I be best changing my structure to something like /uploads/{user-id}/images/ instead? That way each directory only has a couple dozen images. Then again, can a directory handle thousands of subdirectories without suffering performance issues? Does Google Cloud Storage happen to accommodate issues like this?

GCS does not actually have "directories." They're an illusion that the UI and command-line tools provide as a nicety. As such, you can put billions of objects inside the same "directory" without running into any problems.
One addendum there: if you are inserting more than a thousand objects per second, there are some additional caveats worth being aware of. In such a case, you would see a performance benefit to avoiding sequential object names. In other words, uploading /uploads/user-id/images/000000.jpg through /uploads/user-id/images/999999.jpg, in order, in rapid succession, would likely be slower than if you used random object names. GCS has documentation with more on this, but this should not be a concern unless you are uploading in excess of 1000 objects per second.
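For example, here is a minimal sketch (the bucket name and paths are placeholders, not from the question) of uploading with a non-sequential object name using the @google-cloud/storage Node.js client:

```javascript
const crypto = require('crypto');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();
const bucket = storage.bucket('my-bucket'); // placeholder bucket name

async function uploadImage(userId, localPath) {
  // A random hex component spreads object names across the keyspace instead of
  // writing 000000.jpg, 000001.jpg, ... in order.
  const randomName = crypto.randomBytes(16).toString('hex');
  const destination = `uploads/${userId}/images/${randomName}.jpg`;
  await bucket.upload(localPath, { destination });
  return destination;
}
```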
A nice, long GUID should be effectively unguessable (or at least no more guessable than a password or an access token), but they do have the downside of being non-revocable without renaming the image. Once someone knows it, they know it forever and can leak it to others. If you need firm control of your objects, you could keep them all private and visible only to your project and allow users to access them only via signed URLs. This offers you the most flexibility and control, but it's also harder to implement.
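A rough sketch of the signed-URL approach, again using the @google-cloud/storage client with placeholder names:

```javascript
const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

async function getTemporaryImageUrl(objectName) {
  // The object stays private; only holders of this short-lived URL can read it.
  const [url] = await storage
    .bucket('my-bucket')          // placeholder bucket name
    .file(objectName)
    .getSignedUrl({
      version: 'v4',
      action: 'read',
      expires: Date.now() + 15 * 60 * 1000, // valid for 15 minutes
    });
  return url; // hand this to the authorized user instead of a public link
}
```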

Related

Saving user data from Chrome Extension to global variable, then shared for all users

Wondering if this is at all possible. I'm working on a Chrome extension where, as users browse a particular site, certain elements on the page are saved to chrome.storage.local (or chrome.storage.sync). Those elements are then called again later on a different page. However, it would be useful to allow all users to save this data to 1 global variable/source, and all users be able to read from that variable/source. Do Chrome extensions have any method of accomplishing this?
The data in question isn't anything sensitive, it's not authentication info or anything. The reason I'm hoping to do this and not just save static variables or JSON objects within a content script is that the website I'm building this for changes fairly frequently, and I would rather that data not be completely static.
Thank you!
It's not possible natively, but there are lots of ways to do it for free (assuming you have few users and little load, and don't surpass the providers' free quotas or rate limits), such as a Google App Engine backend or a public Google Spreadsheet used as the sync source. For the spreadsheet case, you can store the data as rows or put everything in a single cell. For App Engine, the Datastore has free read/write quotas and a free storage quota (with limits and rate limits, of course).
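As a rough illustration only (the endpoint URL and the storage key are made up, and this assumes Manifest V3's promise-based chrome.storage API), a content script could periodically pull the shared data from such a backend and cache it locally:

```javascript
// Hypothetical shared-data endpoint (App Engine handler, published spreadsheet
// feed, or similar); not part of the original question.
const SHARED_DATA_URL = 'https://your-backend.example.com/shared-selectors.json';

async function loadSharedData() {
  try {
    const response = await fetch(SHARED_DATA_URL);
    const data = await response.json();
    // Cache locally so the extension still works offline or if the backend is down.
    await chrome.storage.local.set({ sharedSelectors: data });
    return data;
  } catch (err) {
    // Fall back to whatever was cached the last time the fetch succeeded.
    const cached = await chrome.storage.local.get('sharedSelectors');
    return cached.sharedSelectors;
  }
}
```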

Is CouchDB per-user database approach feasible for users with lots of shared data?

I want to implement a webapp - a feed that integrates data from various sources and displays them to users. A user should only be able to see the feed items that he has permissions to read (e.g. because they belong to a project that he is a member of). However, a feed item might (and will) be visible by many users.
I'd really like to use CouchDB (mainly because of the cool _changes feed and map/reduce views). I was thinking about implementing the app as a pure couchapp, but I'm having trouble with the permissions model. AFAIK, there are no per-document permissions in CouchDB and this is commonly implemented using per-user databases and replication.
But when there is a lot of overlap between what various users see, that would introduce a LOT of overhead...stuff would be replicated all over the place and duplicated in many databases. I like the elegance of this approach, but the massive overhead just feels like a dealbreaker... (Let's say I have 50 users and they all see the same data...).
Any ideas on that, please? Or an alternative solution?
You can enforce read permissions as described in CouchDB Authorization on a Per-Database Basis.
For write permissions you can use validation functions, as described in CouchDB: The Definitive Guide - Security.
You can create a database for each project and enforce the permissions there, then all the data is shared efficiently between the users. If a user shares a feed himself and needs permissions on that as well you can make the user into a "project" so the same logic applies everywhere.
Using this design you can authorize a user or a group of users (roles) for each project.
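A small sketch of that pattern, with made-up database and role names (the exact roles and rules are up to you):

```javascript
// Sketch only. Read access: give each project database a _security object, e.g.
//   PUT /project-alpha/_security
//   { "admins":  { "names": [], "roles": ["project-alpha-admin"] },
//     "members": { "names": [], "roles": ["project-alpha-member"] } }
// Users without a matching role cannot read the database at all.
//
// Write access: a validate_doc_update function in a design document
// (stored as a string inside _design/auth) rejects unauthorized writes.
function validateDocUpdate(newDoc, oldDoc, userCtx) {
  // Only users holding the project's writer role may create or modify feed items.
  if (userCtx.roles.indexOf('project-alpha-writer') === -1) {
    throw({ forbidden: 'You are not allowed to write to this project.' });
  }
}
```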
Other than (as victorsavu3 has suggested already) handling your read auth in a proxy between your app and couch, there are only two other alternatives that I can think of.
First is to just not care: disk is cheap, and while having multiple copies of the data may seem like a lot of unnecessary duplication, it massively simplifies your architecture, and you get some automatic benefits like easy scaling up to handle load (by just moving some of your users' DBs off to other servers).
Second is to split the shared data into a different DB. This will occasionally limit things you can do in views (e.g. no "Linked Documents"), but this is not a big deal in many situations.

What kind of security issues will I have if I provide my web app write access?

I would like to give my web application write access to a particular folder on my web server. My web app can create files on this folder and can write data to those files. However, the web app does not provide any interface to the users nor does it publicize the fact that it can create files or write to files. Am I susceptible to any security vulnerabilities? If so, what are they?
You are susceptible to having your server tricked into writing malicious files into that location.
The issues that can arise from that depend on what happens with that folder.
Is it web-accessible?
Then malicious files can be hosted there, such as pages that steal cookies or serve up malware.
Is it a folder where applications are executed automatically?
This would be madness. Do not do this.
Is it just some place where you store files for later processing?
Consider what could happen if malicious files are put there. Say a malicious PDF is fed into your PDF processing system, a bug in the PDF parser is exploited, malicious code gets executed, and then it's all over.
Basically, the issue you expose yourself to, potentially, is as I said - malicious files in that location. You can think through carefully what happens in that folder, and how exposed it is, and decide for yourself how risky it is.
With those risks identified, you can then decide how to go ahead. Since you presumably don't allow direct uploads to that area, you can consider the risk significantly lower: you are really assessing the situation where someone has found a bug in your web server that lets them, without you providing access, tell it to save a file in some place. I'd hazard that there aren't hundreds of these types of issues, though there may be some. Hence it is appropriate to minimise the risk posed by any file in that folder, by making sure the folder and the files in it are used in a restricted way and, if possible, checked to see if they are "good" files.

What security issues appear when users can upload their own files?

I was wondering what security issues appear when the end user of a website can upload files to the server.
For instance if my website allows the users to upload a profile picture, and one user uploads something harmful instead, what could happen? What kind of security should I set up to prevent attacks like this? I'm talking here about images, but what about the case where a user can upload anything into a file-vault kind of application?
It's more a general question than a question about a specific situation, so what are the best practices in that situation? What do you usually do?
I suppose: type validation on upload, different permissions for uploaded files... what else?
EDIT: To clear up the context, I am thinking about a web application where a user can upload any kind of file and then display it in the browser. The file would be stored on the server. The users are whoever uses the website, so there is no trust involved.
I am looking for general answers that could apply for different languages/framework and production environments.
Your first line of defense will be to limit the size of uploaded files, and kill any transfer that is larger than that amount.
File extension validation is probably a good second line of defense. Type validation can be done later... as long as you aren't relying on the (user-supplied) mime-type for said validation.
Why file extension validation? Because that's what most web servers use to identify which files are executable. If your executables aren't locked down to a specific directory (and most likely, they aren't), files with certain extensions will execute anywhere under the site's document root.
File extension checking is best done with a whitelist of the file types you want to accept.
Once you validate the file extension, you can then check to verify that said file is the type its extension claims, either by checking for magic bytes or using the unix file command.
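As an illustration only (the whitelist, paths, and magic-byte table below are assumptions, not taken from the question), the two checks could look something like this in Node.js:

```javascript
const fs = require('fs');
const path = require('path');

const ALLOWED_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png', '.gif']);

// Leading bytes of the allowed image formats.
const MAGIC_BYTES = {
  '.png':  Buffer.from([0x89, 0x50, 0x4e, 0x47]),
  '.jpg':  Buffer.from([0xff, 0xd8, 0xff]),
  '.jpeg': Buffer.from([0xff, 0xd8, 0xff]),
  '.gif':  Buffer.from([0x47, 0x49, 0x46, 0x38]), // "GIF8"
};

function looksLikeAllowedImage(originalFilename, savedFilePath) {
  // 1. Whitelist the extension.
  const ext = path.extname(originalFilename).toLowerCase();
  if (!ALLOWED_EXTENSIONS.has(ext)) return false;

  // 2. Confirm the file actually starts with that format's magic bytes.
  const expected = MAGIC_BYTES[ext];
  const header = Buffer.alloc(expected.length);
  const fd = fs.openSync(savedFilePath, 'r');
  fs.readSync(fd, header, 0, expected.length, 0);
  fs.closeSync(fd);
  return header.equals(expected);
}
```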
I'm sure there are other concerns that I missed, but hopefully this helps.
Assuming you're dealing with only images, one thing you can do is use an image library to generate thumbnails/consistent image sizes, and throw the original away when you're done. Then you effectively have a single point of vulnerability: your image library. Assuming you keep it up-to-date, you should be fine.
Users won't be able to upload zip files or really any non-image file, because the image library will barf if it tries to resize non-image data, and you can just catch the exception. You'll probably want to do a preliminary check on the filename extension though. No point sending a file through the image library if the filename is "foo.zip".
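A minimal sketch of that idea, assuming the sharp image library for Node.js (any image library with similar behaviour would do):

```javascript
const sharp = require('sharp');

async function processProfilePicture(uploadedBuffer) {
  try {
    // Decode, resize, and re-encode; the stored output contains only pixel data
    // produced by the library, and the user's original bytes are thrown away.
    return await sharp(uploadedBuffer)
      .resize(256, 256, { fit: 'cover' })
      .jpeg({ quality: 85 })
      .toBuffer();
  } catch (err) {
    // Not a decodable image (e.g. a zip renamed to .jpg): reject the upload.
    throw new Error('Uploaded file is not a valid image');
  }
}
```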
As for permissions, well... don't set the execute bit. But realistically, permissions won't help protect you much against malicious user input.
If your programming environment allows it, you're going to want to run some of these checks while the upload is in progress. A malicious HTTP client can potentially send a file with an infinite size, i.e. it just never stops transmitting random bytes, resulting in a denial-of-service attack. Or maybe they just upload a gig of video as their profile picture. Most image file formats have a header at the beginning as well. If a client begins to send a file that doesn't match any known image header, you can abort the transfer. But that's starting to move into the realm of overkill. Unless you're Facebook, that kind of thing is probably unnecessary.
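One hedged example of enforcing the size cap while the transfer is still in flight, assuming an Express app with multer (both are assumptions about the stack; multer aborts the stream once the limit is exceeded rather than buffering it all):

```javascript
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({
  limits: { fileSize: 2 * 1024 * 1024 }, // 2 MB cap for a profile picture
});

app.post('/profile-picture', upload.single('picture'), (req, res) => {
  // req.file.buffer now holds an upload no larger than the configured limit.
  res.sendStatus(204);
});

// multer reports an exceeded limit as an error with code 'LIMIT_FILE_SIZE'.
app.use((err, req, res, next) => {
  if (err && err.code === 'LIMIT_FILE_SIZE') {
    return res.status(413).send('File too large');
  }
  next(err);
});
```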
Edit
If you allow users to upload scripts and executables, you should make sure that anything uploaded via that form is never served back as anything other than application/octet-stream. Don't try to mix the Content-Type when you're dealing with potentially dangerous uploads. If you're going to tell users they have to worry about their own security (that's effectively what you do when you accept scripts or executables), then everything should be served as application/octet-stream so that the browser doesn't attempt to render it. You should also probably set the Content-Disposition header. It's probably also wise to involve a virus scanner in the pipeline if you want to deal with executables. ClamAV is scriptable and open source, for example.
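For illustration, the headers described above might be set like this in an Express route serving files from an untrusted upload area (the directory and route are made up, and path handling is simplified):

```javascript
const express = require('express');
const path = require('path');

const app = express();
const UPLOAD_DIR = '/var/app/uploads'; // hypothetical upload location

app.get('/download/:name', (req, res) => {
  // basename() strips any directory components a malicious client might send.
  const fileName = path.basename(req.params.name);
  res.set({
    'Content-Type': 'application/octet-stream',   // never render or sniff it
    'X-Content-Type-Options': 'nosniff',
    'Content-Disposition': `attachment; filename="${fileName}"`, // force download
  });
  res.sendFile(path.join(UPLOAD_DIR, fileName));
});
```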
Size validation would be useful too; you wouldn't want someone to intentionally upload a 100 GB fake image just out of spite, would you? :)
Also, you may want to consider something to prevent people from using your bandwidth just as an easy way to host images (I would mostly be concerned with hosting of illegal material). Most people would use ImageShack for temporary image hosting anyway.
For further reading, there's a great article by Acunetix on Why File Upload Forms are a Major Security Threat
With more context, it would be easier to know where the vulnerabilities may lie.
If the data could be stored in a database (sounds like it won't be), then you should guard against SQL Injection attacks.
If the data could be displayed in a browser (sounds like it would be), then you may need to guard against HTML/CSS Injection attacks.
If you're using scripting languages (e.g., PHP) on the server, then you may need to guard against injection attacks against those specific languages. With compiled server code (or a poor scripting implementation), there's the chance of buffer overrun attacks.
Don't overlook user data security, too: Can your users trust you to prevent their data from being compromised?
EDIT: If you really want to cover all bases, consider the risks of JPEG and WMF security holes. These could be exploited if a malicious user can upload the files from one system, and then views the files -- or persuades another user to view the files -- from another system.
Size of the content
Restricting certain file types (only white-listed file types such as .jpeg, .png, etc. should be allowed)
File tampering (for example, on a site supporting foreign languages where certain encodings are allowed, an attacker may take advantage of this by encoding a script or other malicious code, appending it to the original file, and trying to upload it)

network drive file sharing

For the better part of 10+ years we have relied on various network mapped drives to allow file sharing. One drive letter for sharing files between teams, a separate file share for the entire organization, a third for personal use, etc. I would like to move away from this and am trying to decide if an ECM/SharePoint type solution, or a home grown app, is worth the cost and the way to go? Or should we simply keep relying on login scripts/mapped drives for file sharing, due to their relative simplicity? Does anyone have any experience within their own organization, or thoughts on this?
Thanks.
SharePoint is very good at document sharing.
Documents generally follow a process for approval, have permissions, live in clusters... and these things lend themselves well to SharePoint's document libraries.
However, there are some things that don't lend themselves well to living inside SharePoint... do you have a virtual hard drive (.vhd) file that you want to share with a workmate? It's not such a good idea to try to put a 20 GB file into SharePoint.
SharePoint can handle large files, and so can SQL Server behind it... but do you want your SQL Server bandwidth being saturated by such large files? Do you want your backup of SQL Server to hold copies of such large files multiple times?
I believe that there are a few Microsoft partners who offer the ability to disassociate file blobs from the SharePoint database, so that SharePoint can hold the metadata and a file system holds the actual files, and SharePoint simply becomes the gateway to manage access, permissions, and offer a centralised interface to files throughout an organisation. This would offer you the best of both worlds.
Right now though, I consider SharePoint ideal for documents, and I keep large files (that are not document centric) on Windows file shares.
Definitely use a tool.
The main benefit here is version control: being able to jump easily to a previous version, diff, and see who modified what (see most VCSs' blame/annotate tools; they print out the text file showing when and by whom each line was modified).
Second, you can probably benefit from issue tracking/task tracking.
Other benefits include web access from the internet, having a wiki (which can be great in some situations), etc.
I use Subversion + Redmine at work, and I find it highly useful- test a few solutions and you will surely find out further advantages for you.
One thing that can be overlooked in the change to a document management tool is the planning required around how much is going to be stored, and information architecture issues like where different content is going to end up.
SharePoint in particular is easy to set up without a good plan going forward, and is particularly vulnerable to difficulties later on when things get too busy.
I would not recommend a home grown app for something like this. The problem has been solved by off the shelf tools and growing one from scratch is going to cost a huge amount and not get you any way near the features for the money.
Did I mention how important planning your security groups and document areas (IA) was?
If you just need document storage then SharePoint can do very well. WSS is even free and provides very good document storage capabilities.
But you have to plan carefully, as updating existing applications is painful. If you decide to go with SharePoint, then I can give you a few pieces of advice off the top of my head:
Pay attention to security configuration (user groups, privileges, ...)
Plan your document libraries well, as it is not easy to just move documents between them
Also consider limiting the number of versions that one document can have, because SharePoint stores full copies of each version, not just the changes
Don't use InfoPath :) We have had a very bad experience with it (just don't tell this to the managers)
If you don't really need to change the graphical look of SharePoint, then don't bother with it, as it brings many problems (I'm talking about custom master pages and custom site templates)
Try to use as much out-of-the-box (OOB) functionality as possible, because developing your own web parts not only costs more, it can also be quite complicated.
Make sure to turn on search indexing. This is quite tricky, because it is turned off by default, and you may be as surprised as I was that search isn't working :)
If you just deploy it and load 10,000 documents into it, you will surely have problems later. If you give a little thought to structure, you will end up with really good document storage.
Migrating is very probably worth the cost in the long term. You will gain reliability, versioning, traceability, and extensibility.
Be sure to first identify the groups/rights, and to identify which links need to be fixed (maybe you have applications that use links to the shares).
An open source alternative to SharePoint is Alfresco, it is very good for CIFS (Windows shares) too.

Resources