With IPFS being distributed p2p storage and sharing, isn't there then a chance that someone could store something illegal on your machine if you are an IPFS provider?
Is there some mechanism that IPFS systems use to prevent this? How would someone even know if illegal content is stored on their machine, especially if they are only storing a part of the file?
I want to run an IPFS node on my machine, but I am unsure if I have to worry about malicious actors using my IPFS node.
The law is very clear here: in almost all cases, you are not responsible for caching partial or complete files or metadata.
distributed p2p storage and sharing
No, it works more like the BitTorrent protocol and less like the TV series "Silicon Valley". You have to share a file, and somebody else needs its hash to download it. The differences are that in BitTorrent the .torrent file was generally preferred over a magnet hash until around 2015, while in IPFS hashes are the only way to address content; and that the system is global: a strong hash function makes collisions practically impossible (at least not within a billion years), so hashes can be checked across the whole network, and duplicate chunks of data, from which the folder/file structure is reconstructed, are stored only once.
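To illustrate the content-addressing idea, here is a tiny sketch. It uses a plain SHA-256 digest as a stand-in for a real IPFS CID (which additionally involves chunking and multihash encoding), but the property is the same: identical content gets an identical address, which is what lets the network deduplicate chunks.

    import hashlib

    def content_id(data: bytes) -> str:
        # Plain SHA-256 hex digest as a stand-in for a real IPFS CID
        # (real CIDs add chunking and multihash/multibase encoding).
        return hashlib.sha256(data).hexdigest()

    a = content_id(b"hello ipfs")
    b = content_id(b"hello ipfs")
    c = content_id(b"hello ipfs!")

    print(a == b)  # True  -> identical content, identical address, stored once
    print(a == c)  # False -> any change yields a completely different address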
The point here is that, just as in BitTorrent, you store no files you did not request. And just as in BitTorrent, you can run IPFS Bitswap nodes to accelerate the swarms; that is what cloudflare-ipfs.com, ipfs.infura.io and possibly others do. Similar things exist in BitTorrent, in particular clients that automatically attach to updated torrents that share the same hashes for file pieces. That is very cool, but in IPFS it happens automatically. There are also servers that propagate the .torrent file (a.k.a. magnet metadata) from just the magnet hash, and I believe DHT crawlers like BTDigg or https://btdb.eu/ play some role too, though not a big one. You can set up your own crawler (BTDigg is open source) that does precisely that: share torrent metadata, which requires almost no resources. (You can even set up your own bootstrap supernode to create your own separate DHT.) It is fun to do, since a lot of material can be found that way. As I understand it, IPFS also does this by default, i.e. it stores some metadata to help data propagate (see the sketch after the links below for checking what your own node actually stores). You can further read this:
https://discuss.ipfs.io/t/ipfs-propagation/4301
https://discuss.ipfs.io/t/how-fast-do-ipns-changes-propagate/311
https://docs.ipfs.io/concepts/bitswap/
There is also this: https://collab.ipfscluster.io/
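If you want to check the "you store nothing you did not request" point on your own node, you can list what is pinned and how big the local repo is. A minimal sketch shelling out to the ipfs CLI from Python (assumes a local go-ipfs/kubo install with an initialized repo; this is just one convenient way to poke at it):

    import json
    import subprocess

    def run_ipfs(*args: str) -> str:
        # Shell out to the local ipfs binary; raises if the command fails.
        return subprocess.run(["ipfs", *args], check=True,
                              capture_output=True, text=True).stdout

    # Content the node keeps permanently: only what you added or pinned yourself.
    print("Pinned roots:")
    print(run_ipfs("pin", "ls", "--type=recursive"))

    # Overall repository size, including temporarily cached blocks
    # (cached blocks get garbage-collected; pinned ones do not).
    stats = json.loads(run_ipfs("repo", "stat", "--enc=json"))
    print("Repo size (bytes):", stats["RepoSize"])
    print("Stored objects:   ", stats["NumObjects"])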
I'm used to working with user-uploaded files to the same server, and transferring my own files to a remote server. But not transferring user-uploaded files to a remote server.
I'm looking for the best (industry) practice for selecting a transfer protocol in this regard.
My application is running Django on a Linux Server and the files live on a Windows Server.
Does it not matter which protocol I choose as long as it's secure (FTPS, SFTP, HTTPS)? Or is one better than the other in terms of performance/security specifically in regards to user-uploaded files?
Please do not link to questions that explain the differences of protocols, I am asking specifically in the context of user-uploaded files.
As long as you choose a standard protocol that provides (mutual) authentication, encryption and message authentication, there is not much difference security-wise. If all of this is provided by a layer of TLS in your chosen protocol (like in all of your examples), you can't make a big mistake on a design level (but implementation is key, many security bugs are bugs of implementation, and not design flaws). Such protocols might differ in the supported list of algorithms for different purposes though.
Performance-wise there can be quite significant differences; it depends on what you want to optimize for. If you choose HTTPS, you won't be able to keep a connection open for a long time, and would most probably have to bear the overhead of the whole connection setup, with authentication and everything, for every transmitted file. (Well, you can actually keep an HTTPS connection open, but that would be quite a custom implementation for such file uploads.) Choosing FTPS/SFTP, you will be able to keep a connection open and transmit as many files as you want, but you would probably need more complex error-handling logic (sometimes connections terminate without the underlying sockets knowing about it for a while, and so on). So in short, I think HTTPS would be more resilient, but secure FTP would be more performant for many small files.
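To make the connection-reuse point concrete, here is a minimal sketch using the paramiko library (host, credentials and paths are placeholders) that opens one SFTP session and pushes many files over it, paying the authentication cost only once:

    import paramiko

    # One connection and one authentication for the whole batch, then reuse it.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # pin a known host key in production
    client.connect("files.example.com", username="uploader", key_filename="/path/to/key")

    sftp = client.open_sftp()
    try:
        for local_path, remote_path in [
            ("/tmp/upload1.png", "/data/uploads/upload1.png"),
            ("/tmp/upload2.png", "/data/uploads/upload2.png"),
        ]:
            sftp.put(local_path, remote_path)  # each file reuses the same session
    finally:
        sftp.close()
        client.close()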
It's also an architecture question, by using HTTPS, you would be able to implement all of this in your application code, while something like FTP would mean dependence on external components, which might be important from an operational point of view (think about how this will actually be deployed and whether there is already a devops function to manage proper operations).
Ultimately it's just a design decision you have to make, the above is just a few things that came to mind without knowing all the circumstances, and not at all a comprehensive list of things to consider.
Tim Berners-Lee recently announced Solid.
How much is this different from ipfs and will it be possible to use them together?
The technical spec makes it sound as though it does not compete with IPFS, since everything seems to be done in the usual single-server-over-HTTP regime, which makes me think that it should be able to be used on top of IPFS instead with minimal pain. Really the decentralized part seems to be the access to the data, not the storage of the data itself, which is a key value-add for IPFS.
One unfortunate thing is that they have come up with their own version of IPLD called Linked Data, so I'm not sure how that would interface with IPFS's content addressing solution.
I expect the first place more answers will show up is this forum thread.
Solid and IPFS are compatible but have different trade offs.
Both systems are driven by URIs. Typically Solid will use http-style URIs, but it is not limited to them; IPFS will use ipfs-type URIs, which play nicely with linked data.
The advantage of ipfs URIs is that they are content addressable and therefore can be long lived, mirrored and findable on a P2P network without the need for DNS. The advantage of http URIs is that they have a large network effect, lots of tooling, and most devices are able to interact with them without extra installation.
Both teams collaborate in a friendly way and even share developers. Hopefully as IPFS grows in popularity both systems can offer more choice to end users as a way to store data. Solid apps will be able to benefit from both types of data, and even mix them together.
I have developed a large web application with NodeJS. I allow my users to upload multiple images to my Google Cloud Storage bucket.
Currently, I am storing all images under the same directory of /uploads/images/.
I'm beginning to think that this is not the safest way, and could affect performance later down the track when the directory has thousands of images. It also opens up a threat, since some images are meant to be private, and it could allow users to search for images by guessing a unique ID, such as uploads/images/29rnw92nr89fdhw.png.
Would I be best changing my structure to something like /uploads/{user-id}/images/ instead? That way each directory only has a couple dozen images. Although, can a directory handle thousands of other subdirectories without suffering performance issues? Does Google Cloud Storage happen to accommodate issues like this?
GCS does not actually have "directories." They're an illusion that the UI and command-line tools provide as a nicety. As such, you can put billions of objects inside the same "directory" without running into any problems.
One addendum there: if you are inserting more than a thousand objects per second, there are some additional caveats worth being aware of. In such a case, you would see a performance benefit to avoiding sequential object names. In other words, uploading /uploads/user-id/images/000000.jpg through /uploads/user-id/images/999999.jpg, in order, in rapid succession, would likely be slower than if you used random object names. GCS has documentation with more on this, but this should not be a concern unless you are uploading in excess of 1000 objects per second.
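If you do find yourself in that range, one simple hedge (the naming scheme below is purely illustrative) is to avoid a sequential counter entirely and derive object names from a random UUID:

    import uuid

    def object_name(user_id: str, extension: str = "jpg") -> str:
        # Random, non-sequential names spread writes across GCS's index,
        # avoiding the hot-spotting that strictly increasing names can cause.
        return f"uploads/{user_id}/images/{uuid.uuid4().hex}.{extension}"

    print(object_name("user-123"))
    # e.g. uploads/user-123/images/3f2c9a1e0b8d4c6f9e7a1b2c3d4e5f60.jpg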
A nice, long GUID should be effectively unguessable (or at least no more guessable than a password or an access token), but it has the downside of being non-revocable without renaming the image. Once someone knows it, they know it forever and can leak it to others. If you need firm control of your objects, you could keep them all private and visible only to your project, and allow users to access them only via signed URLs. This offers you the most flexibility and control, but it's also harder to implement.
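A sketch of the signed-URL approach with the google-cloud-storage Python client (bucket and object names are placeholders; the credentials in use must be allowed to sign):

    from datetime import timedelta

    from google.cloud import storage

    def temporary_image_url(bucket_name: str, object_name: str) -> str:
        # The object itself stays private; the URL grants read access only
        # until it expires, so access is effectively revoked by simply not
        # issuing fresh URLs.
        client = storage.Client()
        blob = client.bucket(bucket_name).blob(object_name)
        return blob.generate_signed_url(
            version="v4",
            expiration=timedelta(minutes=15),
            method="GET",
        )

    url = temporary_image_url("my-app-uploads", "uploads/user-123/images/29rnw92nr89fdhw.png")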
I am looking for an encrypted version control system. Basically I would like to:
Have all files encrypted locally before sending to the server. The server should never receive any file or data unencrypted.
Every other feature should work pretty much the same way as SVN or CVS does today.
Can anyone recommend something like this? I did a lot of searches but I can't find anything.
You should encrypt the data pipe (ssl/ssh) instead, and secure the access to the server. Encrypting the data would force SVN to essentially treat everything as a binary file. It can't do any diff, so it can't store deltas. This defeats the purpose of a delta-based approach.
Your repository would get huge, very quickly. If you upload a file that's 100 KB, then change one byte and check in again, and do that 8 more times (10 revisions total), the repository would store 10 * 100 KB instead of 100 KB + 9 little deltas (let's call it 101 KB).
Update: @TheRook explains that it is possible to do deltas with an encrypted repository. So it may be possible to do this. However, my initial advice stands: it's not worth the hassle, and you're better off encrypting the ssl/ssh pipe and securing the server, i.e. "best practices".
It is possible to create a version control system over ciphertext. It would be ideal to use a stream cipher such as RC4-drop1024 or AES in OFB mode, as long as the same key+IV is used for each revision. That means the same PRNG stream is generated each time and then XOR'ed with the code: if any individual byte is different, you have a mismatch at that position, and the ciphertext itself can be merged normally.
A block cipher in ECB mode could also be used, where the smallest mismatch would be one block in size, so it would be ideal to use small blocks. CBC mode, on the other hand, can produce widely different ciphertext for each revision and is thus undesirable.
I recognize that this isn't very secure: OFB and ECB modes shouldn't normally be used, as they are weaker than CBC mode, and sacrificing the IV is also undesirable. Furthermore, it isn't clear what attack is being defended against, whereas using something like SVN+HTTPS is very common and also secure. My post is merely stating that it is possible to do this efficiently.
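A small sketch of the locality property being described, using AES-OFB from the cryptography package (the fixed key and IV are hard-coded purely for the demonstration; reusing them is exactly the security sacrifice mentioned above):

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    KEY = b"0" * 32   # fixed key and IV: required for the trick, bad for security
    IV = b"1" * 16

    def encrypt(plaintext: bytes) -> bytes:
        enc = Cipher(algorithms.AES(KEY), modes.OFB(IV)).encryptor()
        return enc.update(plaintext) + enc.finalize()

    rev1 = b"print('hello world')\n" * 100
    rev2 = bytearray(rev1)
    rev2[5] ^= 0xFF                      # flip one plaintext byte

    c1, c2 = encrypt(bytes(rev1)), encrypt(bytes(rev2))
    diff = [i for i, (a, b) in enumerate(zip(c1, c2)) if a != b]
    print(diff)                          # [5] -> only one ciphertext byte differs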
Why not set up your repo (subversion, mercurial, whatever) on an encrypted filesystem, and use ssh only to connect?
Use rsyncrypto to encrypt files from your plaintext directory to your encrypted directory, and to decrypt files from your encrypted directory to your plaintext directory, using keys that you keep locally.
Use your favorite version control system (or any other version control system -- svn, git, mercurial, whatever) to synchronize between your encrypted directory and the remote host.
The rsyncrypto implementation you can download now from Sourceforge not only handles changes in bytes, but also insertions and deletions.
It implements an approach very similar to the one that "The Rook" mentions.
Single-byte insertions, deletions, and changes in a plaintext file typically cause rsyncrypto to make a few K of completely different encrypted text around the corresponding point in the encrypted version of that file.
Chris Thornton points out that ssh is one good solution; rsyncrypto is a very different solution. The tradeoff is:
using rsyncrypto requires transferring a few K for each "trivial" change, rather than the half a K it would take using ssh (or on an unencrypted system). So ssh is slightly faster and requires slightly less "diff" storage than rsyncrypto.
using ssh and a standard VCS requires the server to (at least temporarily) have the keys and decrypt the files. With rsyncrypto, all encryption keys never leave the local computer. So rsyncrypto is slightly more secure.
SVN has built-in support for transferring data securely. If you use svnserve, then you can access it securely using ssh. Alternatively, you can access it through the Apache server using https. This is detailed in the svn documentation.
You could use a Tahoe-LAFS grid to store your files. Since Tahoe is designed as a distributed file system, not a versioning system, you'd probably need to use another versioning scheme on top of the file system.
Edit: Here's a prototype extension to use Tahoe-LAFS as the backend storage for Mercurial.
What specifically are you trying to guard against?
Use Subversion ssh or https for the repo access. Use an encrypted filesystem on the clients.
Have a look at Git. It supports various hooks that might do the job. See git encrypt/decrypt remote repository files while push/pull.
Have you thought of using Duplicity? It's like rdiff-backup (delta backups) but encrypted? Not really version control - but maybe you want all the cool diff stuff but don't want to work with anyone else.
Or, just push/pull from a local Truecrypt archive and rsync it to a remote location.
rsync.net has a nice description of both:
http://www.rsync.net/resources/howto/duplicity.html
http://www.rsync.net/products/encrypted.html
Apparently Truecrypt containers still rsync well.
Variant A
Use a distributed VCS and transport changes directly between different clients over encrypted links. In this scenario there is no attackable central server.
For example with mercurial you can use the internal web server, and connect the clients via VPN. Or you can bundle the change sets and distribute them with encrypted mails.
Variant B
You can export an encrypted hard drive partition over the network, mount it on the client side, and run the VCS on the client side. But this approach has lots of problems, like:
possible data loss when two different clients try to access the VCS at the same time
the link itself must be secured against fraudulent write access (when the partition is shared via NFS, it is very likely to end up with a configuration where anyone can write to the shared partition, so even if there is no way for others to read the content, there is an easy hole to destroy it)
There might be also other problems with variant B, so forget variant B.
Variant C
Like @Commodore Jaeger wrote, use a VCS on top of an encryption-aware network file system like AFS.
Similar to some comments above, a useful workaround may be to encrypt&zip the whole repository locally and synchronize it with the remote box. E.g. when using git, the whole repository is stored in directory '.git' within the project dir.
zip/encrypt the whole project dir
upload it to a server (unsure if handling .git alone is sufficient)
before you continue working on the project download this archive
decrypt/unpack and sync with git (locally)
This can be done more comfortably with a simple shell script.
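For instance, here is a rough Python equivalent of such a wrapper (the paths, archive name, use of scp, and Fernet from the cryptography package are all illustrative choices, not part of the original suggestion):

    import subprocess
    import tarfile
    from pathlib import Path

    from cryptography.fernet import Fernet

    PROJECT_DIR = Path("~/myproject").expanduser()    # local working copy with its .git
    ARCHIVE = Path("/tmp/myproject.tar")
    ENC_ARCHIVE = Path("/tmp/myproject.tar.enc")
    KEY_FILE = Path("~/.myproject.key").expanduser()  # symmetric key, kept local only
    REMOTE = "user@host:backups/myproject.tar.enc"

    def push() -> None:
        # 1. archive the whole project dir (including .git)
        with tarfile.open(ARCHIVE, "w") as tar:
            tar.add(PROJECT_DIR, arcname=PROJECT_DIR.name)
        # 2. encrypt it locally; the server never sees plaintext
        key = KEY_FILE.read_bytes().strip()
        ENC_ARCHIVE.write_bytes(Fernet(key).encrypt(ARCHIVE.read_bytes()))
        # 3. upload the opaque blob
        subprocess.run(["scp", str(ENC_ARCHIVE), REMOTE], check=True)

    def pull() -> None:
        subprocess.run(["scp", REMOTE, str(ENC_ARCHIVE)], check=True)
        key = KEY_FILE.read_bytes().strip()
        ARCHIVE.write_bytes(Fernet(key).decrypt(ENC_ARCHIVE.read_bytes()))
        with tarfile.open(ARCHIVE) as tar:
            # overwrites the local copy; merging still has to happen locally via git
            tar.extractall(PROJECT_DIR.parent)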
pro: You can use whatever encryption is appropriate, as well as any VCS that supports local repositories.
cons: it obviously lacks some VCS features when several people want to upload their code base (the merge situation) - although, perhaps, this could be fixed by avoiding remote overwrites and merging locally (which introduces locking, and that is where the nightmare starts)
Nevertheless, this solution is a hack, not a proper solution.
Based on my understanding this cannot be done, because in SVN and other versioning systems, the server needs access to the plaintext in order to perform versioning.
SourceSafe stores data in encrypted files. Wait, I take that back: they're obfuscated. And there's no security other than a front door to the obfuscation.
C'mon, it's Monday.
I was wondering what security issues appear when the end user of a website can upload files to the server.
For instance if my website allows the users to upload a profile picture, and one user uploads something harmful instead, what could happen? What kind of security should I set up to prevent attacks like this? I'm talking here about images, but what about the case where a user can upload anything into a file-vault kind of application?
It's more a general question than a question about a specific situation, so what are the best practices in that situation? What do you usually do?
I suppose: type validation on upload, different permissions for uploaded files... what else?
EDIT: To clear up the context, I am thinking about a web application where a user can upload any kind of file and then display it in the browser. The file would be stored on the server. The users are whoever uses the website, so there is no trust involved.
I am looking for general answers that could apply for different languages/framework and production environments.
Your first line of defense will be to limit the size of uploaded files, and kill any transfer that is larger than that amount.
File extension validation is probably a good second line of defense. Type validation can be done later... as long as you aren't relying on the (user-supplied) mime-type for said validation.
Why file extension validation? Because that's what most web servers use to identify which files are executable. If your executables aren't locked down to a specific directory (and most likely, they aren't), files with certain extensions will execute anywhere under the site's document root.
File extension checking is best done with a whitelist of the file types you want to accept.
Once you validate the file extension, you can then check to verify that said file is the type its extension claims, either by checking for magic bytes or using the unix file command.
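A sketch of both checks combined (the extension whitelist and magic numbers below cover only PNG/JPEG/GIF and are illustrative, not exhaustive):

    ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "gif"}

    # File signatures ("magic bytes") for the whitelisted types.
    MAGIC_PREFIXES = {
        "png":  [b"\x89PNG\r\n\x1a\n"],
        "jpg":  [b"\xff\xd8\xff"],
        "jpeg": [b"\xff\xd8\xff"],
        "gif":  [b"GIF87a", b"GIF89a"],
    }

    def looks_like_allowed_image(filename: str, header: bytes) -> bool:
        # Whitelist the extension, then check the file's first bytes match it.
        ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        if ext not in ALLOWED_EXTENSIONS:
            return False
        return any(header.startswith(magic) for magic in MAGIC_PREFIXES[ext])

    with open("uploaded.tmp", "rb") as f:          # path is illustrative
        ok = looks_like_allowed_image("avatar.png", f.read(16))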
I'm sure there are other concerns that I missed, but hopefully this helps.
Assuming you're dealing with only images, one thing you can do is use an image library to generate thumbnails/consistent image sizes, and throw the original away when you're done. Then you effectively have a single point of vulnerability: your image library. Assuming you keep it up-to-date, you should be fine.
Users won't be able to upload zip files or really any non-image file, because the image library will barf if it tries to resize non-image data, and you can just catch the exception. You'll probably want to do a preliminary check on the filename extension though. No point sending a file through the image library if the filename is "foo.zip".
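A sketch of that approach with the Pillow library (sizes and paths are placeholders):

    from PIL import Image

    def make_safe_thumbnail(upload_path: str, out_path: str, size=(256, 256)) -> bool:
        # Re-encode the upload as a fresh PNG thumbnail; reject anything Pillow can't parse.
        try:
            with Image.open(upload_path) as img:
                img = img.convert("RGB")      # normalise mode (drops alpha, palettes, etc.)
                img.thumbnail(size)           # resize in place, preserving aspect ratio
                img.save(out_path, "PNG")     # only this re-encoded file is kept
            return True
        except OSError:                       # raised for non-image or corrupt data
            return False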
As for permissions, well... don't set the execute bit. But realistically, permissions won't help protect you much against malicious user input.
If your programming environment allows it, you're going to want to run some of these checks while the upload is in progress. A malicious HTTP client can potentially send a file of infinite size, i.e. it just never stops transmitting random bytes, resulting in a denial-of-service attack. Or maybe they just upload a gig of video as their profile picture. Most image file formats have a header at the beginning as well. If a client begins to send a file that doesn't match any known image header, you can abort the transfer. But that's starting to move into the realm of overkill. Unless you're Facebook, that kind of thing is probably unnecessary.
Edit
If you allow users to upload scripts and executables, you should make sure that anything uploaded via that form is never served back as anything other than application/octet-stream. Don't try to mix the Content-Type when you're dealing with potentially dangerous uploads. If you're going to tell users they have to worry about their own security (that's effectively what you do when you accept scripts or executables), then everything should be served as application/octet-stream so that the browser doesn't attempt to render it. You should also probably set the Content-Disposition header. It's probably also wise to involve a virus scanner in the pipeline if you want to deal with executables. ClamAV is scriptable and open source, for example.
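For example, a minimal Django view serving an uploaded file back this way (Django is used only because it appears elsewhere in this thread; the same headers apply in any framework, and the view arguments are placeholders):

    from django.http import FileResponse

    def download_upload(request, stored_path, original_name):
        # Never let the browser guess or render the content: serve it with a
        # generic content type and an explicit attachment disposition.
        # (original_name should be sanitized before being placed in a header.)
        response = FileResponse(open(stored_path, "rb"),
                                content_type="application/octet-stream")
        response["Content-Disposition"] = f'attachment; filename="{original_name}"'
        return response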
size validation would be useful too, wouldn't want someone to intentionally upload a 100gb fake image just out of spite now would you :)
Also, you may want to consider something to prevent people from using your bandwidth just as an easy way to host images (I would mostly be concerned with hosting of illegal stuff). Most people would use ImageShack for temporary image hosting anyway.
For further reading, there's a great article by Acunetix on Why File Upload Forms are a Major Security Threat
With more context, it would be easier to know where the vulnerabilities may lie.
If the data could be stored in a database (sounds like it won't be), then you should guard against SQL Injection attacks.
If the data could be displayed in a browser (sounds like it would be), then you may need to guard against HTML/CSS Injection attacks.
If you're using scripting languages (e.g., PHP) on the server, then you may need to guard against injection attacks against those specific languages. With compiled server code (or a poor scripting implementation), there's the chance of buffer overrun attacks.
Don't overlook user data security, too: Can your users trust you to prevent their data from being compromised?
EDIT: If you really want to cover all bases, consider the risks of JPEG and WMF security holes. These could be exploited if a malicious user can upload the files from one system, and then views the files -- or persuades another user to view the files -- from another system.
Size of the content
Restricting certain file types (only white-listed file types such as .jpeg or .png should be allowed)
File tampering (for example, on a site supporting foreign languages where certain encodings are allowed, an attacker may take advantage of this by appending encoded script or malicious code to an otherwise valid file and trying to upload it)