Tim Berners-Lee recently announced Solid.
How much is this different from ipfs and will it be possible to use them together?
The technical spec makes it sound as though it does not compete with IPFS, since everything seems to be done in the usual single-server-over-HTTP regime, which makes me think that it should be able to be used on top of IPFS instead with minimal pain. Really the decentralized part seems to be the access to the data, not the storage of the data itself, which is a key value-add for IPFS.
One unfortunate thing is that they have come up with their own version of IPLD called Linked Data, so I'm not sure how that would interface with IPFS's content addressing solution.
I expect the first place more answers will show up is this forum thread.
Solid and IPFS are compatible but have different trade offs.
Both systems are driven by URIs, and typically Solid will use http style URIs, but is it not limited to them. IPFS will use ipfs type URIs, which play nicely with linked data.
The advantage of ipfs URIs is that they are content addressable and therefore can be long lived, mirrored and findable on a P2P network without the need for DNS. The advantage of http URIs is that they have a large network effect, lots of tooling, and most devices are able to interact with them without extra installation.
Both teams collaborate in a friendly way and even share developers. Hopefully as IPFS grows in popularity both systems can offer more choice to end users as a way to store data. Solid apps will be able to benefit from both types of data, and even mix them together.
Related
I'm used to working with user-uploaded files to the same server, and transferring my own files to a remote server. But not transferring user-uploaded files to a remote server.
I'm looking for the best (industry) practice for selecting a transfer protocol in this regard.
My application is running Django on a Linux Server and the files live on a Windows Server.
Does it not matter which protocol I choose as long as it's secure (FTPS, SFTP, HTTPS)? Or is one better than the other in terms of performance/security specifically in regards to user-uploaded files?
Please do not link to questions that explain the differences of protocols, I am asking specifically in the context of user-uploaded files.
As long as you choose a standard protocol that provides (mutual) authentication, encryption and message authentication, there is not much difference security-wise. If all of this is provided by a layer of TLS in your chosen protocol (like in all of your examples), you can't make a big mistake on a design level (but implementation is key, many security bugs are bugs of implementation, and not design flaws). Such protocols might differ in the supported list of algorithms for different purposes though.
Performance-wise there can be quite significant differences, it depends on what you want to optimize for. If you choose HTTPS, you won't be able to keep a connection open for a long time, and would most probably have to bear the overhead of the whole connection setup with authentication and everything, for every transmitted file. (Well, you can actually keep a https connection open, but that would be quite a custom implementation for such file uploads.) Choosing FTPS/SFTP you will be able to keep a connection open and transmit as many files as you want, but would probably have to have more complex error handling logic (sometimes connections terminate without the underlying sockets knowing about it for a while and so on). So in short I think HTTPS would be more resilient, but secure FTP would be more performant for many small files.
It's also an architecture question, by using HTTPS, you would be able to implement all of this in your application code, while something like FTP would mean dependence on external components, which might be important from an operational point of view (think about how this will actually be deployed and whether there is already a devops function to manage proper operations).
Ultimately it's just a design decision you have to make, the above is just a few things that came to mind without knowing all the circumstances, and not at all a comprehensive list of things to consider.
With IPFS being distributed p2p storage and sharing, isn't there then a chance that someone could store something illegal on your machine if you are an IPFS provider?
Is there some mechanism that IPFS systems use to prevent this? How would someone even know if illegal content is stored on their machine, especially if they are only storing a part of the file?
I want to run an IPFS node on my machine, but I am unsure if I have to worry about malicious actors using my IPFS node.
The law is very clear here, you are not responsible for caching partial/complete files/metadata in almost all cases.
distributed p2p storage and sharing
No, it works more like in BitTorrent protocol and less like in TV series "Silicon Valley". You need to share a file and somebody will need to find a hash to download a file (the difference is that .torrent file was mostly preferred over magnet hash before around 2015 in BitTorrent, while in IPFS hashes are the only way, also the system is global, that means the complex hash function is used so that no collisions are possible (at least not in a billion years) and thus it can check for hashes over THE WHOLE network and thus do not store duplicate chunks of data that folder/file structure is reconstructed from).
The point here is that just like in BitTorrent you store no files you do not request, now just like in BitTorrent you can do IPFS BitSwap stuff to accelerate the swarms, that is what cloudflare-ipfs.com does and ipfs.infura.io and others (?). In BitTorrent such things also exist in particular to automatically attach to updated torrents that have the same hashes for file parts... That is very cool, but in IPFS it is done automatically. Also different servers exist that propagate .torrent file (a.k.a. magnet metadata) using just magnet hash. I believe even DHT crawlers play some role, like BTDigg or https://btdb.eu/, but not much of course, you can set up you own crawler (as Btdigg is open source) that will do precisely that: share metadata of torrents, that requires almost no resources... (You can even set up your own bootstrap supernode to create you OWN seperate DHT.) But is very cool to do as a lot of stuff can be found there. As I understand IPFS also does that by default, i.e. it stores some metadata to help data propagation. You can further read this:
https://discuss.ipfs.io/t/ipfs-propagation/4301
https://discuss.ipfs.io/t/how-fast-do-ipns-changes-propagate/311
https://docs.ipfs.io/concepts/bitswap/
There is also this: https://collab.ipfscluster.io/
I would like go get my head around how is best to set up a client server architecture where security is of up most importance.
So far I have the following which I hope someone can tell me if its good enough, or it there are other things I need to think about. Or if I have the wrong end of the stick and need to rethink things.
Use SSL certificate on the server to ensure the traffic is secure.
Have a firewall set up between the server and client.
Have a separate sql db server.
Have a separate db for my security model data.
Store my passwords in the database using a secure hashing function such as PBKDF2.
Passwords generated using a salt which is stored in a different db to the passwords.
Use cloud based infrastructure such as AWS to ensure that the system is easily scalable.
I would really like to know is there any other steps or layers I need to make this secure. Is storing everything in the cloud wise, or should I have some physical servers as well?
I have tried searching for some diagrams which could help me understand but I cannot find any which seem to be appropriate.
Thanks in advance
Hardening your architecture can be a challenging task and sharding your services across multiple servers and over-engineering your architecture for semblance security could prove to be your largest security weakness.
However, a number of questions arise when you come to design your IT infrastructure which can't be answered in a single SO answer (will try to find some good white papers and append them).
There are a few things I would advise which is somewhat opinionated backed up with my own thought around it.
Your Questions
I would really like to know is there any other steps or layers I need to make this secure. Is storing everything in the cloud wise, or should I have some physical servers as well?
Settle for the cloud. You do not need to store things on physical servers anymore unless you have current business processes running core business functions that are already working on local physical machines.
Running physical servers increases your system administration requirements for things such as HDD encryption and physical security requirements which can be misconfigured or completely ignored.
Use SSL certificate on the server to ensure the traffic is secure.
This is normally a no-brainer and I would go with a straight, "Yes"; however you must take into consideration the context. If you are running something such as a blog site or documentation-related website that does not transfer any sensitive information at any point in time through HTTP then why use HTTPS? HTTPS has it's own overhead, it's minimal, but it's still there. That said, if in doubt, enable HTTPS.
Have a firewall set up between the server and client.
That is suggested, you may also want to opt for a service such as CloudFlare WAF, I haven't personally used it though.
Have a separate sql db server.
Yes, however not necessarily for security purposes. Database servers and Web Application servers have different hardware requirements and optimizing both simultaneously is not very feasible. Additionally, having them on separate boxes increases your scalability quite a bit which will be beneficial in the long run.
From a security perspective; it's mostly another illusion of, "If I have two boxes and the attacker compromises one [Web Application Server], he won't have access to the Database server".
At foresight, this might seem to be the case but is rarely so. Compromising the Web Application server is still almost a guaranteed Game Over. I will not go into much detail into this (unless you specifically ask me to) however it's still a good idea to keep both services separate from eachother in their own boxes.
Have a separate db for my security model data.
I'm not sure I understood this, what security model are you referring to exactly? Care to share a diagram or two (maybe an ERD) so we can get a better understanding.
Store my passwords in the database using a secure hashing function such as PBKDF2.
Obvious yes; what I am about to say however is controversial and may be flagged by some people (it's a bit of a hot debate)—I recommend using BCrypt instead of PKBDF2 due to BCrypt being slower to compute (resulting in slower to crack).
See - https://security.stackexchange.com/questions/4781/do-any-security-experts-recommend-bcrypt-for-password-storage
Passwords generated using a salt which is stored in a different db to the passwords.
If you use BCrypt I would not see why this is required (I may be wrong). I go into more detail regarding the whole username and password hashing into more detail in the following StackOverflow answer which I would recommend you to read - Back end password encryption vs hashing
Use cloud based infrastructure such as AWS to ensure that the system is easily scalable.
This purely depends on your goals, budget and requirements. I would personally go for AWS, however you should read some more on alternative platforms such as Google Cloud Platform before making your decision.
Last Remarks
All of the things you mentioned are important and it's good that you are even considering them (most people just ignore such questions or go with the most popular answer) however there are a few additional things I want to point:
Internal Services - Make sure that no unrequired services and processes are running on server especially in productions. These services will normally be running old versions of their software (since you won't be administering them) that could be used as an entrypoint for your server to be compromised.
Code Securely - This may seem like another no-brainer yet it is still overlooked or not done properly. Investigate what frameworks you are using, how they handle security and whether they are actually secure. As a developer (and not a pen-tester) you should at least use an automated web application scanner (such as Acunetix) to run security tests after each build that is pushed to make sure you haven't introduced any obvious, critical vulnerabilities.
Limit Exposure - Goes somewhat hand-in-hand with my first point. Make sure that services are only exposed to other services that depend on them and nothing else. As a rule of thumb, keep everything entirely closed and open up gradually when strictly required.
My last few points may come off as broad. The intention is to keep a certain philosophy when developing your software and infrastructure rather than a permanent rule to tick on a check-box.
There are probably a few things I have missed out. I will update the answer accordingly over time if need be. :-)
Okay, so we have to store our clients` private medical records online and also the web site will have a lot of requests, so we have to use some scaling solutions.
We can have our own share of a datacenter and run something like Zend Server Cluster Manager on it, but services like Amazon EC2 look a lot easier to manage, and they are incredibly cheaper too. We just don't know if they are secure enough!
Are they?
Any better solutions?
More info: I know that there is a reference server and it's highly secured and without it, even the decrypted data on the cloud server would be useless. It would be a bunch of meaningless numbers that aren't even linked to each other.
Making the question more clear: Are there any secure storage and process service providers that guarantee there won't be leaks from their side?
First off, you should contact AWS and explain what you're trying to build and the kind of data you deal with. As far as I remember, they have regulations in place to accommodate most if not all the privacy concerns.
E.g., in Germany such thing is a called a "Auftragsdatenvereinbarung". I have no idea how this relates and translates to other countries. AWS offers this.
But no matter if you go with AWS or another cloud computing service, the issue stays the same. And therefor, whatever is possible is probably best answered by a lawyer and based on the hopefully well educated (and expensive) recommendation, I'd go cloud shopping, or maybe not. If you're in the EU, there are a ton of regulations especially in regards to medical records -- some countries add more to it.
From what I remember it's basically required to have end to end encryption when you deal with these things.
Last but not least security also depends on the setup and the application, etc..
For complete and full security, I'd recommend a system that is not connected to the Internet. All others can fail.
You should never outsource highly sensitive data. Your company and only your company should have access to it - in both software and hardware terms. Even if your hoster is generally trusted someone there might just steal hardware.
Depending on the size of your company you should have your custom servers - preferable even unaccessible for the technicans in your datacenter (supposing you don't own the datacenter ;).
So the more important the data is, the less foreign people should have access to it in any means. In the best case you can name all people that have access to them in any way.
(Update: This might not apply to anonymous data, but as you're speaking of customers I don't think that applies here?)
(On a third thought: There're are probably laws to take into consideration of how you have to handle that kind of information ;)
I'm interested in the history of distributed, collaborative, cross-organisational programming paradigms - web services and SOA are de-facto now, but what came before? What models have been superceded by SOA?
Thanks
Well, I suppose there was RPC - which is really what SOAP is, only they didn't piggy-back the data payload on top of a standard protocol (http in SOAP's case). So CORBA and DCE-RPC and ONC RPC all did the same thing, but only over internal networks, not over the internet.
There was also EDI as a 'standard' for exchanging data between disparate entities. This was effectively a way of defining what the data payload would look like (similar the the XML part of SOAP).
But these are still not SOAs really, they provide the same functionality but the big difference was how people thought of using them. Once you could write a machine-to-machine 'website' and have different machines talk to each other through them, it took off. You could do it before using CORBA, say, but it wasn't as easy or as widely known about. You can tell this has happened by the fact we have several terms used for effectively the same thing - SOA, SaaS, Web Services... all the same thing (but lots of money to be made 'consulting' on the difference ;) )
Maybe Silos?
...where services are just not shared across an enterprise, at least in a standard way. This is why products like BizTalk are used: to get silos to talk to each other via standard interfaces.
I don't really think you'll find anything that's been superceded by SOA. You will find that there's been progress in organizing computer programs to take advantage of the SOA type principles. As for programming models that have been in reasonably common use, well, let's see... CORBA, RPC, more generic client-server applications. Of course, computer-to-computer communications were preceded by process-to-process communication using a wide variety of conventions.
SOA as a philosophy of breaking large problems into smaller ones and then composing the results has been known and applied since humans started making bricks instead of building complete walls. Of course, that was mostly implicit. Explicit statements for SOA really started to come about with CORBA and, while SOA is independent of Web Services, the advent of HTTP and XML, and then SOAP, really started to make development of non-specialized "services" easier, more worthwhile and thus common.
This pdf A Note on Distributed Computing should be an interesting read. It is pre-SOA and would give an idea of the history up to that point (1994).
I would say distributed object technology. And before it remote procedure calls.
RPC is one of the earlier approaches and gained popularity from the Sun implementation. One of the famous uses is NFS (network file system).
As object oriented programming became more popular, distributed objects followed. Most important was Microsoft DCOM (and later COM+) and, more industry wide, CORBA.
SOA is a divide and conquer approach that is critically dependent on the concept of services. Which is different from objects as used by CORBA et al, as well as being different from resources as in REST.
Objects are created and their lifetime is typically controlled by the client. On the other hand, services are assumed to be always there provided by the server. This is one reason why SOA is not equivalent to distributed objects.
Services are also stateless, which means that the server when considering the response to a service request need not look at the history of interaction with the client. This was not a consideration when originally devising the RPC concept as scalability wasn't such an important issue then. Interestingly, large scale users of RPC did notice the relationship between scalability and statelessness. The NFS RFC explicitly mentions stateless servers, though with reliability as the main concern. Anyway, statelessness is one of the main difference between services and plain old RPC.
In short, no. I don't believe in the revisionist history of SOA being since the dawn of time. Any more than the universe being written in Lisp (or Perl for that matter). Nor is it equivalent to divide and conquer or division of labour.
SOA started as a concept at some point in the nineties. Overlapping with the development of CORBA. It is much harder to pinpoint an actual date or event and there are more than a few claims to the conceptualisation of it.