Technologies for File Sharing

We currently have a social-networking-style platform. Next, we are working on a file-sharing feature, where users should be able to upload and share files (PDF, PPT, DOC, images, ZIP) with friends and groups.
Which specific technologies should we look into? We are not looking for storage providers like Dropbox or Amazon S3 as an answer; we want advice on efficient storage technologies. We also have to store file attributes such as the author, whom the file is shared with, edit rights, download rights, etc.
Any help would be appreciated.

The answer depends on your specific requirements. In general, you should look for a provider that offers high availability (e.g. no single point of failure), high durability (once something is written, it stays written), and high performance (low latency, high throughput). In addition, you may want certain security features, but the specifics are, again, a function of your needs. Since you mentioned sharing attributes, you'll want a provider that gives you a high degree of flexibility and control over access permissions. To store related data, such as authorship, you'll want the ability to store and retrieve arbitrary metadata associated with each stored object. Finally, although you stated you don't want a specific provider recommendation, I will nonetheless add that Google Cloud Storage is an excellent choice because it provides all of the above functionality and more (full disclosure: I work on Google's cloud products).
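To make the metadata and permission requirements concrete, here is a minimal Python sketch of the record you might keep for each uploaded file, independent of which backend holds the bytes. The class names, the `storage_key` field, the `user:`/`group:` principal prefixes, and the three rights are all hypothetical; in practice such a record would live in a database or in the storage service's own object metadata and ACL features.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical rights that can be granted on a shared file."""
    NONE = 0
    VIEW = auto()
    DOWNLOAD = auto()
    EDIT = auto()


@dataclass
class FileRecord:
    """Metadata kept for each file; the bytes themselves live in blob storage."""
    file_id: str
    storage_key: str  # hypothetical key of the object in the storage backend
    author: str
    content_type: str
    # Maps a principal such as "user:bob" or "group:hiking-club" to its rights.
    grants: dict[str, Permission] = field(default_factory=dict)

    def share(self, principal: str, rights: Permission) -> None:
        """Add rights for a user or group, keeping any rights already granted."""
        self.grants[principal] = self.grants.get(principal, Permission.NONE) | rights

    def allows(self, user: str, groups: set[str], wanted: Permission) -> bool:
        """True if the user, directly or via a group, holds every wanted right."""
        if user == self.author:
            return True  # assumption: authors always keep full control
        held = self.grants.get(f"user:{user}", Permission.NONE)
        for group in groups:
            held |= self.grants.get(f"group:{group}", Permission.NONE)
        return wanted in held


doc = FileRecord("f1", "uploads/f1.pdf", author="alice", content_type="application/pdf")
doc.share("group:hiking-club", Permission.VIEW | Permission.DOWNLOAD)
doc.share("user:bob", Permission.VIEW | Permission.EDIT)
print(doc.allows("carol", {"hiking-club"}, Permission.DOWNLOAD))  # True
print(doc.allows("carol", {"hiking-club"}, Permission.EDIT))      # False
```

Storing rights as bit flags lets a single check combine a user's own grants with those of every group they belong to.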

Related

Architecture decisions for system comprising mobile app with database in cloud and varying user restriction levels

I am looking to develop an app that will be used by a fairly small number of people and that has to store and recall data from a cloud database. Users should have various access levels: some can create stuff, some can only read, others can modify, some can do anything, etc., just like you would have on a file system.
I am currently considering Azure (I'm very new to it) and thinking about which components would be involved in the project. Obviously, a mobile app (Xamarin.Forms) would be the front end, plus some kind of Cosmos DB or another database in the cloud, and Blob storage for the media files created by users. But my main question is how to control which users can perform which actions on which data.
A simple way would be to do it within the app itself, but that is counterintuitive and a security risk. Even though this is an internal app used by people in the same or sister organizations, it really sounds bad.
The best option would be for the database itself to handle this, but I am not aware of such a mechanism. Hopefully it actually exists and someone will point me in the right direction.
The only other way I see is some kind of middle layer, still on the back end but just in front of the database. However, that also seems clunky, and I don't know how I would even implement it "in the cloud".
What would be my actual options?
To clarify, it's about having permissions assigned based on certain columns of a table, for example, and not about having different tables for different users that share parts of the data.
That's why it is an "Architecture decisions" question, and not "how do I give read permissions to user X of my database Y".
An answer might be "Database X" has what you want. Or, least favourably, "There's no way to offload that to DB. You will have to keep all data separately, so that users can only operate on their set of data, and then collate stuff on the backend". Or something in between, perhaps.
I'm not knowledgeable about Azure or any of that other stuff, but every DBMS will have user accounts that enable different permissions, e.g. Apache Derby, MySQL, etc.
I would never implement authentication on the client side.
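For comparison, the "middle layer" option from the question can be quite small. The Python sketch below uses the standard-library `sqlite3` module as a stand-in for the cloud database and routes every query through a service that checks the caller's role and adds a row-level filter on a `team` column. The roles, table layout, and `Caller` type are hypothetical, and the sketch assumes the backend derives `Caller` from an authenticated token rather than trusting values sent by the app.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical roles and the actions each one may perform.
ROLE_ACTIONS = {
    "reader": {"read"},
    "creator": {"read", "create"},
    "editor": {"read", "update"},
    "admin": {"read", "create", "update"},
}


@dataclass(frozen=True)
class Caller:
    """Identity the backend derives from an authenticated token (assumed)."""
    user: str
    role: str
    team: str


class RecordService:
    """The only component allowed to talk to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def _require(self, caller: Caller, action: str) -> None:
        if action not in ROLE_ACTIONS.get(caller.role, set()):
            raise PermissionError(f"{caller.role!r} may not {action}")

    def read(self, caller: Caller) -> list[tuple]:
        self._require(caller, "read")
        # Row-level rule: callers only ever see rows belonging to their team.
        return self.conn.execute(
            "SELECT id, title FROM records WHERE team = ? ORDER BY id", (caller.team,)
        ).fetchall()

    def create(self, caller: Caller, title: str) -> int:
        self._require(caller, "create")
        cur = self.conn.execute(
            "INSERT INTO records (title, team, owner) VALUES (?, ?, ?)",
            (title, caller.team, caller.user),
        )
        self.conn.commit()
        return cur.lastrowid

    def update(self, caller: Caller, record_id: int, title: str) -> bool:
        self._require(caller, "update")
        # The team filter means rows of other teams are left untouched.
        cur = self.conn.execute(
            "UPDATE records SET title = ? WHERE id = ? AND team = ?",
            (title, record_id, caller.team),
        )
        self.conn.commit()
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, team TEXT, owner TEXT)")
service = RecordService(conn)
rid = service.create(Caller("ann", "creator", "red"), "Site survey")
print(service.read(Caller("bob", "reader", "red")))  # [(1, 'Site survey')]
print(service.read(Caller("cy", "reader", "blue")))  # []
```

Some database engines can enforce a similar row filter themselves (PostgreSQL and SQL Server both offer row-level security policies), which would move the `WHERE team = ?` rule out of application code.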

Is data removal on a Ceph cluster DoD-compliant?

I'm currently researching cloud storage solutions and came across Ceph, which looks quite interesting. I need it for a project where customers can store data that needs to be processed by a piece of software. That data may contain sensitive information, which brings me to my actual question: if a customer or an automated system removes data from the Ceph cluster, do I have to take further steps to ensure DoD-compliant removal?
Assessing Department of Defense compliance without a stated standard or security level for the information leads to a lot of guesswork and assumptions on the answerer's part.
That being said, the definitive answer is yes: you will have to take additional steps to adhere to any applicable data erasure standards. Ceph does not provide any automated sanitizing process to remove data from disks. The general practice for decommissioning disks that may have held sensitive information includes strict chain-of-custody, degaussing, and destruction procedures. Typical government standards also call for verification of data sanitization and usually exclude the sanitizing system from performing the verification.
Generally, overwrite procedures (such as the superseded DoD 5220.22-M standard) are no longer considered sufficient to mitigate possible recovery tactics, and only layered defences including the final destruction of the disk have been demonstrated to be effective.
Additionally, Ceph is generally not considered a "cloud storage solution", as it is not typically used on top of a cloud platform; rather, it is used to provide distributed storage in on-premises solutions. Running Ceph on top of something like AWS's Elastic Block Store or GCP's Persistent Disk is not advisable.

Are services like AWS secure enough for an organization that is highly responsible for its clients' privacy?

Okay, so we have to store our clients' private medical records online, and the website will also receive a lot of requests, so we need some kind of scaling solution.
We could have our own share of a datacenter and run something like Zend Server Cluster Manager on it, but services like Amazon EC2 look a lot easier to manage, and they are much cheaper too. We just don't know whether they are secure enough!
Are they?
Any better solutions?
More info: there is a reference server that is highly secured, and without it even the decrypted data on the cloud server would be useless. It would be a bunch of meaningless numbers that aren't even linked to each other.
Making the question more clear: Are there any secure storage and process service providers that guarantee there won't be leaks from their side?
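One way to read the "reference server" design described above is as pseudonymization: the cloud stores only random tokens next to the clinical data, while the table mapping tokens back to patients stays on the secured server. The Python sketch below is a hypothetical illustration of that idea (the class, method, and field names are invented); the in-memory dictionary stands in for the reference server's database, and none of this replaces encryption or legal review.

```python
import secrets
from typing import Optional


class ReferenceServer:
    """Hypothetical token registry kept on the secured, non-cloud server."""

    def __init__(self) -> None:
        self._patient_for: dict[str, str] = {}  # token -> real patient id

    def new_token(self, patient_id: str) -> str:
        # 128 random bits: nothing about the token is derived from the patient.
        token = secrets.token_hex(16)
        self._patient_for[token] = patient_id
        return token

    def resolve(self, token: str) -> Optional[str]:
        return self._patient_for.get(token)

    def tokens_for(self, patient_id: str) -> list[str]:
        return [t for t, p in self._patient_for.items() if p == patient_id]


ref = ReferenceServer()
# Only rows like these would be sent to the cloud: a random token plus the values.
cloud_rows = [
    {"token": ref.new_token("patient-0042"), "reading": 7.3},
    {"token": ref.new_token("patient-0042"), "reading": 6.9},
]
```

Because every record gets a fresh token, two rows for the same patient cannot be linked without the reference server; a deterministic keyed hash (e.g. HMAC) would be the alternative if rows had to be linkable inside the cloud store.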
First off, you should contact AWS and explain what you're trying to build and the kind of data you deal with. As far as I remember, they have regulations in place to accommodate most if not all the privacy concerns.
For example, in Germany such an agreement is called an "Auftragsdatenvereinbarung" (data processing agreement). I have no idea how this relates and translates to other countries. AWS offers this.
But no matter whether you go with AWS or another cloud computing service, the issue stays the same. Therefore, what is possible is probably best answered by a lawyer; based on their hopefully well-informed (and expensive) recommendation, I'd go cloud shopping, or maybe not. If you're in the EU, there are a ton of regulations, especially regarding medical records, and some countries add more on top.
From what I remember, end-to-end encryption is basically required when you deal with this kind of data.
Last but not least, security also depends on the setup, the application, etc.
For complete and full security, I'd recommend a system that is not connected to the Internet. All others can fail.
You should never outsource highly sensitive data. Your company, and only your company, should have access to it, in both software and hardware terms. Even if your hosting provider is generally trusted, someone there might simply steal hardware.
Depending on the size of your company, you should have your own custom servers, preferably even inaccessible to the technicians in your datacenter (assuming you don't own the datacenter ;).
So the more important the data is, the fewer outside people should have access to it by any means. In the best case, you can name everyone who has access to it in any way.
(Update: This might not apply to anonymous data, but since you're speaking of customers, I don't think that applies here?)
(On third thought: there are probably laws to take into consideration regarding how you have to handle that kind of information ;)

Google Docs as Content Management System

I'm thinking of using Google Docs as a content management system and integrating it with my Java/J2EE web application.
I only need to upload, view, search meta-data, and organize docs.
Would anybody have a reason to believe I should not try this?
One good reason not to do that is that then you have no control over your system's uptime. Google does occasionally have outages, which would take your system down as well.
In addition, by storing your documents on Google's servers, you are giving up any control over their privacy. There is nothing you can do to ensure that the security of Google's live and backup systems will never be broken, and if they are broken into, your documents' privacy is lost. You'll also need to keep an eye on Google's terms of use. They may very well update them to read "We reserve the right to sell your documents to whomever we please.", which may include your competitors.
That being said, if downtime won't break you, and privacy isn't a huge concern, it doesn't sound like a bad idea. Just make sure they're not the ONLY place you're storing your documents.

network drive file sharing

For the better part of 10+ years, we have relied on various mapped network drives for file sharing: one drive letter for sharing files between teams, a separate file share for the entire organization, a third for personal use, etc. I would like to move away from this and am trying to decide whether an ECM/SharePoint-type solution or a home-grown app is worth the cost and the way to go, or whether we should simply keep relying on login scripts and mapped drives for file sharing due to their relative simplicity. Does anyone have experience with this in their own organization, or thoughts on it?
Thanks.
SharePoint is very good at document sharing.
Documents generally follow a process for approval, have permissions, live in clusters... and these things lend themselves well to SharePoint's document libraries.
However, there are some things that don't lend themselves well to living inside SharePoint... do you have a virtual hard drive (.vhd) file that you want to share with a workmate? It's not such a good idea to try to put a 20 GB file into SharePoint.
SharePoint can handle large files, and so can SQL Server behind it... but do you want your SQL Server bandwidth being saturated by such large files? Do you want your backup of SQL Server to hold copies of such large files multiple times?
I believe that there are a few Microsoft partners who offer the ability to disassociate file blobs from the SharePoint database, so that SharePoint can hold the metadata and a file system holds the actual files, and SharePoint simply becomes the gateway to manage access, permissions, and offer a centralised interface to files throughout an organisation. This would offer you the best of both worlds.
Right now though, I consider SharePoint ideal for documents, and I keep large files (that are not document centric) on Windows file shares.
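The split described above (metadata in a database, file contents on a file system) can be sketched generically. This is not how SharePoint or any partner product implements it; it is a minimal Python illustration using `sqlite3` for the metadata and a directory for the blobs. The `SplitStore` name, the table columns, and the content-addressed layout (each blob named by its SHA-256 digest, which also de-duplicates identical uploads) are assumptions.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class SplitStore:
    """Metadata rows in a database; file contents as plain files on disk."""

    def __init__(self, db: sqlite3.Connection, blob_root: Path) -> None:
        self.db = db
        self.root = blob_root
        db.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " name TEXT PRIMARY KEY, sha256 TEXT, size INTEGER, owner TEXT)"
        )

    def _blob_path(self, digest: str) -> Path:
        # Content-addressed layout: identical files are stored only once.
        return self.root / digest[:2] / digest

    def put(self, name: str, data: bytes, owner: str) -> None:
        digest = hashlib.sha256(data).hexdigest()
        path = self._blob_path(digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        # Only the small metadata row goes into the database.
        self.db.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (name, digest, len(data), owner),
        )
        self.db.commit()

    def get(self, name: str) -> bytes:
        row = self.db.execute("SELECT sha256 FROM files WHERE name = ?", (name,)).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        return self._blob_path(row[0]).read_bytes()


tmp = tempfile.TemporaryDirectory()
store = SplitStore(sqlite3.connect(":memory:"), Path(tmp.name))
store.put("specs/design.docx", b"...document bytes...", owner="alice")
```

With this layout, a database backup contains only the metadata rows, and the blob directory can be backed up separately with ordinary file-level tools.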
Definitely use a tool.
The main benefit here is version control: being able to jump easily to a previous version, diff versions, and see who modified what (see the blame/annotate tool in most VCSs; it prints out a text file showing who modified each line and when).
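To show what diffing and blame/annotate give you, here is a small Python sketch using the standard-library `difflib` module. It is not how Subversion or any other VCS implements these features; the file contents, revision labels, and the `annotate` helper are hypothetical.

```python
import difflib

# Two hypothetical revisions of the same text file, as lists of lines.
r1 = ["Project plan\n", "Budget: 10k\n", "Owner: Ann\n"]
r2 = ["Project plan\n", "Budget: 12k\n", "Owner: Ann\n", "Reviewer: Bob\n"]

# "diff": which lines changed between two revisions.
diff = list(difflib.unified_diff(r1, r2, fromfile="plan.txt@r1", tofile="plan.txt@r2"))
print("".join(diff), end="")


def annotate(revisions):
    """Blame-style view: attribute each line of the newest revision to the
    author of the revision that last introduced or changed it.

    revisions is a list of (author, lines) pairs, oldest first.
    """
    first_author, first_lines = revisions[0]
    blame = [(first_author, line) for line in first_lines]
    for author, lines in revisions[1:]:
        previous = [line for _, line in blame]
        matcher = difflib.SequenceMatcher(a=previous, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                updated.extend(blame[i1:i2])  # unchanged lines keep their author
            else:
                # Replaced/inserted lines belong to this revision's author;
                # deleted lines have j1 == j2 and simply disappear.
                updated.extend((author, line) for line in lines[j1:j2])
        blame = updated
    return blame


for who, line in annotate([("ann", r1), ("bob", r2)]):
    print(f"{who:>4}  {line}", end="")
```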
Second, you can probably benefit from issue tracking/task tracking.
Other benefits include web access from the internet, having a wiki (which can be great in some situations), etc.
I use Subversion + Redmine at work, and I find it highly useful. Test a few solutions and you will surely find further advantages for yourself.
One thing that can be overlooked in the change to a document management tool is the planning required around how much is going to be stored, and information architecture issues such as where different content is going to end up.
SharePoint in particular is easy to set up without a good plan going forward, and is especially vulnerable to difficulties later on when things get too busy.
I would not recommend a home-grown app for something like this. The problem has been solved by off-the-shelf tools, and growing one from scratch is going to cost a huge amount and not get you anywhere near the features for the money.
Did I mention how important planning your security groups and document areas (IA) was?
If you just need document storage, then SharePoint can do very well. WSS is even free, and it provides very good document storage capabilities.
But you have to plan carefully, as updating existing applications is painful. If you decide to go with SharePoint, here are a few pieces of advice off the top of my head:
Pay attention to security configuration (user groups, privileges, ...)
Plan your document libraries well, as it is not easy to just move documents between them
Also consider limiting the number of versions a document can have, because SharePoint stores a full copy of each version, not just the changes
Don't use InfoPath :) we have had a very bad experience with it (just don't tell this to managers)
If you don't really need to change the graphical look of SharePoint, then don't bother with it, as it brings many problems (I'm talking about custom master pages and custom site templates)
Try to use as much out-of-the-box (OOB) functionality as possible, because developing your own web parts not only costs more but can also be quite complicated.
Make sure to turn on search indexing. This is quite tricky, because it is turned off by default, and you will be as surprised as I was that search is not working :)
If you try to just deploy it and load 10,000 documents into it, you will surely have problems with it later. If you give a little thought to structure, you will end up with really good document storage.
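To see why the version-limit tip matters, here is a back-of-the-envelope calculation in Python. It takes the answer's statement that every version is stored as a full copy at face value and assumes each version is the same size; the function name and the figures are illustrative only.

```python
from typing import Optional


def version_storage_mb(file_mb: float, edits: int, max_versions: Optional[int] = None) -> float:
    """Space used by one document's history if every version is a full copy."""
    versions = edits + 1  # the original upload plus one version per edit
    if max_versions is not None:
        versions = min(versions, max_versions)  # older versions are pruned
    return float(file_mb) * versions


print(version_storage_mb(5, 49))                   # 250.0
print(version_storage_mb(5, 49, max_versions=10))  # 50.0
```

Under those assumptions, a 5 MB presentation edited 49 times occupies 250 MB, versus 50 MB with a 10-version limit.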
Migrating is very probably worth the cost in the long term. You will gain reliability, versioning, traceability, and extensibility.
Be sure to first identify the groups/rights, and to identify which links need to be fixed (maybe you have applications that use links to the shares).
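Finding links that point at the old shares can be partly automated. The Python sketch below searches text-based files for UNC paths (`\\server\share\...`) and drive-letter paths (`S:\...`). The helper names, the regular expression, and the list of file extensions are assumptions; binary formats such as .docx or .xlsx would need a proper parser, and the pattern will also flag local drive paths that were never mapped shares.

```python
import re
from pathlib import Path

# UNC paths (\\server\share\...) or drive-letter paths (S:\...).
SHARE_LINK = re.compile(r"""(\\\\[\w.$-]+\\[^\s"'<>|]+|\b[A-Za-z]:\\[^\s"'<>|]*)""")

# Assumed set of text-based file types worth scanning.
TEXT_SUFFIXES = {".txt", ".htm", ".html", ".bat", ".cmd", ".config", ".xml"}


def find_share_links(text: str) -> list[str]:
    """Return every share-style path found in a piece of text."""
    return SHARE_LINK.findall(text)


def scan_tree(root: Path) -> dict[str, list[str]]:
    """Map each text file under root to the share paths it mentions."""
    hits = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            links = find_share_links(path.read_text(errors="ignore"))
            if links:
                hits[str(path)] = links
    return hits


sample = r"See \\fs01\teams\budget.xlsx and S:\Reports\q3.pdf for details."
print(find_share_links(sample))
```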
An open-source alternative to SharePoint is Alfresco; it is very good for CIFS (Windows shares) too.
