I’m looking at creating a P2P system. During initial research, I’m reading from Peer-to-Peer – Harnessing the Power of Disruptive Technologies. That book states “a fully decentralized approach to instant messaging would not work on today's Internet.” Mostly blaming firewalls and NATs. The copyright is 2001. Is this information old or still correct?
It's still largely correct. Most users are still behind firewalls or home routers that block incoming connections. Those can be opened more easily today than in 2001 (using UPnP, for example, which requires little user interaction and knowledge), but most commercial end-user applications - phone (Skype, VoIP), chat (the various messengers), remote control - are centralized solutions that circumvent firewall problems.
I would say that it is just plain wrong, both now and then. Yes, you will have many nodes that will be firewalled, however, you will also have a significant number who are not. So, if end-to-end encryption is used to protect the traffic from snooping, then you can use non-firewalled clients to act as intermediaries between two firewalled clients that want to chat.
You will need to take care, however, to spread the load around, so that a few unfirewalled clients aren't given too much load.
Skype uses a similar idea. They even allow file transfers through intermediaries, though they limit the throughput so as not to overload the middlemen.
That being said, now in 2010 it is a lot easier to punch holes in firewalls than it was in 2001, as most routers will let you automate the opening of ports via UPnP, so you are likely to have a larger pool of unfirewalled clients to work with.
Firewalls and NATs still commonly disrupt direct peer-to-peer communication between home-based PCs (and also between home-based PCs and corporate desktops).
They can be configured to allow particular peer-to-peer protocols, but that remains a stumbling block for most unsavvy users.
I think the original statement is no longer correct. But the field of decentralized computing is still in its infancy, with few serious contenders.
Read this interesting post on ZeroTier (thanks to #joehand): The State of NAT Traversal:
NAT is Traversable
In reading the Internet chatter on this subject I've been shocked by how many people don't really understand this, hence the reason this post was written. Lots of people think NAT is a show-stopper for peer to peer communication, but it isn't. More than 90% of NATs can be traversed, with most being traversable in reliable and deterministic ways.
At the end of the day anywhere from 4% (our numbers) to 8% (an older number from Google) of all traffic over a peer to peer network must be relayed to provide reliable service. Providing relaying for that small a number is fairly inexpensive, making reliable and scalable P2P networking that always works quite achievable.
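To make the traversal claim concrete: the core of UDP hole punching is that both peers send datagrams to each other's publicly observed endpoint at roughly the same time, so each NAT sees an outbound packet first and then admits the "reply" from the other side. A minimal sketch of the mechanics (both "peers" run on localhost here, so no real NAT is involved; in practice each peer would learn the other's public address from a rendezvous/STUN server):

```python
import socket

# Simulate two peers on localhost to show the simultaneous-send
# mechanics; real endpoints would come from a rendezvous server.
peer_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer_a.bind(("127.0.0.1", 0))
peer_b.bind(("127.0.0.1", 0))
addr_a = peer_a.getsockname()
addr_b = peer_b.getsockname()

# Both sides send first: the outbound packet opens a mapping in each
# NAT, which then lets the other side's packet through.
peer_a.sendto(b"punch from A", addr_b)
peer_b.sendto(b"punch from B", addr_a)

peer_a.settimeout(2)
peer_b.settimeout(2)
msg_at_b, _ = peer_b.recvfrom(1024)
msg_at_a, _ = peer_a.recvfrom(1024)
print(msg_at_b.decode())  # punch from A
print(msg_at_a.decode())  # punch from B
```

This deterministic dance is why most cone-type NATs are traversable; the symmetric NATs that randomize port mappings are the main residue that ends up needing a relay.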
I personally know of Dat Project, a decentralized data sharing toolkit (based on their hypercore protocol for P2P streaming).
From their Dat - Distributed Dataset Synchronization And Versioning paper:
Peer Connections
After the discovery phase, Dat should have a list of potential data sources to try and contact. Dat uses either TCP, UTP, or HTTP. UTP is designed to not take up all available bandwidth on a network (e.g. so that other people sharing wifi can still use the Internet), and is still based on UDP so works with NAT traversal techniques like UDP hole punching. HTTP is supported for compatibility with static file servers and web browser clients. Note that these are the protocols we support in the reference Dat implementation, but the Dat protocol itself is transport agnostic.
Furthermore you can use it with the BitTorrent DHT. The paper also contains some references to other technologies that inspired Dat.
For implementation of peer discovery, see: discovery-channel
Then there is also IPFS, or 'The Interplanetary File System', which is currently best positioned to become a standard.
They have extensive documentation on their use of DHT and NAT traversal to achieve decentralized P2P.
The Session messenger seems to have solved the issue with a truly decentralized P2P messenger by using an incentivized mixnet to relay and store messages. It's a fork of the Signal messenger with a mixnet added in. https://getsession.org -- whitepaper: https://getsession.org/wp-content/uploads/2020/02/Session-Whitepaper.pdf
It's very old and not correct. I believe there is a product called Tribler (news article) which enables BitTorrent to function in a fully decentralized way.
If you want to go back a few years (even before that document) you could look at Windows. Windows networking used to function in a fully decentralized way. In some cases it still does.
UPnP is also decentralized in how it discovers available devices on your local network.
In order to be decentralized you need to have a way to locate other peers. This can be done proactively by scanning the network (time consuming) or by having some means of the clients announcing that they are available.
The announcements can be simple UDP packets broadcast every so often to the subnet, which other peers listen for. Another mechanism is broadcasting to IRC channels (most common for command and control of botnets), etc. You might even use Twitter or similar services. Use your imagination here.
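The UDP-broadcast variant can be sketched in a few lines (the port number and JSON announcement format are arbitrary choices for illustration; the demo sends to 127.0.0.1 so it is self-contained, where a real peer would send to the subnet broadcast address):

```python
import json
import socket

ANNOUNCE_PORT = 48555  # arbitrary port chosen for this example

# Listener: any peer on the subnet collects announcements like this.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("", ANNOUNCE_PORT))
listener.settimeout(2)

# Announcer: periodically shout "I'm here" to the subnet. On a real
# LAN you would send to "<broadcast>" (SO_BROADCAST enables that);
# here we target localhost so the example runs anywhere.
announcer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
announcer.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
announcement = json.dumps({"peer_id": "node-42", "service_port": 9000})
announcer.sendto(announcement.encode(), ("127.0.0.1", ANNOUNCE_PORT))

data, addr = listener.recvfrom(1024)
peer = json.loads(data)
print(peer["peer_id"], "announced from", addr[0])
```

Each peer would keep a table of (peer_id, address, last_seen) entries and expire peers that stop announcing.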
Firewalls don't really play a part because they almost always leave open a few ports, such as 80 (http). Obviously you couldn't browse the web if that was closed. Now if the firewall is configured to only allow connections that originated from internal clients, then you'd have a little more work to do. But not much.
NATs are not a concern either, for similar reasons.
Related
While there are many social networks in the wild, most rely on data stored on a central site owned by a third party.
I'd like to build a solution where data remains local on members' systems. Think of the project as an address book which automagically updates a contact's data as soon as that contact changes its coordinates. This base idea might get extended later on...
Updates will be transferred using public/private-key cryptography via a central host. The sole role of the host is to be a store-and-forward intermediary. Private keys remain private on each member's system.
If two clients are both online and a P2P connection can be established, the clients could transfer data telegrams without the central host.
Thus, sender and receiver will be the only parties able to create authentic messages.
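A minimal sketch of such a "telegram" envelope, using stdlib HMAC purely as a stand-in for the real public/private-key signature (the actual design would sign with the sender's private key, e.g. Ed25519 via PyNaCl, and encrypt the payload to the recipient's public key; the field names below are invented for illustration):

```python
import hashlib
import hmac
import json
import time

# Stand-in for a private signing key; in the real design each member
# holds an asymmetric private key that never leaves their machine.
SENDER_KEY = b"demo-shared-secret"

def make_telegram(sender, recipient, payload):
    """Wrap an address-book update in a signed envelope."""
    body = json.dumps({
        "from": sender,
        "to": recipient,
        "sent_at": int(time.time()),
        "payload": payload,  # would be encrypted to the recipient
    }, sort_keys=True)
    sig = hmac.new(SENDER_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def verify_telegram(telegram):
    """Only key holders can do this - not the store-and-forward host."""
    expected = hmac.new(SENDER_KEY, telegram["body"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, telegram["sig"])

t = make_telegram("alice", "bob", {"phone": "+1-555-0100"})
print(verify_telegram(t))  # True
```

The key property is that the central host only ever sees opaque, signed blobs it cannot read or forge.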
Questions:
Are there established protocols I should adopt?
Are there any security concerns I should keep in mind?
Are there existing services that should be integrated or used somehow?
More technically:
Use services provided by e.g. Amazon or Google?
Or better to use a raw web server? If yes: why?
Which algorithm and key length should be used?
UPDATE-1
I googled my own question title and found this academic project developed 2008/09: http://www.lifesocial.org/.
The solution you are describing sounds remarkably like email, with encrypted messages as the payload, and an application rather than a human being creating the messages.
It doesn't really sound like "p2p" - in most P2P protocols, the only requirement for central servers is discovery - you're using store & forward.
As a quick proof of concept, I'd set up an email server, and build an application that sends emails to addresses registered on that server, encrypted using PGP - the tooling and libraries are available, so you should be able to get that up and running in days, rather than weeks. In my experience, building a throw-away PoC for this kind of question is a great way of sifting out the nugget of my idea.
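A sketch of that PoC direction, building the message with the stdlib email package (the encrypt_for() helper is a placeholder for real PGP encryption, e.g. via python-gnupg, and the addresses are made up):

```python
from email.message import EmailMessage

def encrypt_for(recipient, plaintext):
    # Placeholder: a real PoC would shell out to GnuPG here
    # (e.g. python-gnupg) using the recipient's public key.
    return ("-----BEGIN PGP MESSAGE-----\n"
            f"<ciphertext for {recipient}>\n"
            "-----END PGP MESSAGE-----")

def build_update(sender, recipient, contact_json):
    """Package an address-book update as an encrypted email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "contact-update"  # lets the client app filter these
    msg.set_content(encrypt_for(recipient, contact_json))
    return msg

msg = build_update("alice@example.org", "bob@example.org",
                   '{"phone": "+1-555-0100"}')
print(msg["Subject"])  # contact-update
# smtplib.SMTP("mail.example.org").send_message(msg) would then ship it
# through the store-and-forward server.
```

The point of the exercise is that the entire store-and-forward layer comes for free from existing mail infrastructure, leaving only the application logic to write.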
The second issue is that the nature of a social network is that it's a network. Your design may require you to store more than the data of the two direct contacts - you may also have to store their friends, or at least the public interactions those friends have had.
This may not be part of your plan, but if it is, you need to think it through early on - you may end up having to transmit the entire social graph to each participant for local storage, which creates a scalability problem....
The paper about Safebook might be interesting for you.
Also you could take a look at other distributed OSN and see what they are doing.
None of the federated networks mentioned on http://en.wikipedia.org/wiki/Distributed_social_network is actually distributed. What Stefan intends to do is indeed new and was only explored by some proprietary folks.
I've been thinking about the same concept for the last two years. I've finally decided to give it a try using Python.
I've spent the better part of last night and this morning writing a sockets communication script & server. I also plan to remove the central server from the equation as it's just plain cumbersome and there's no point to it when all the members could keep copies of their friend's keys.
Each profile could be accessed via a hashed string of someone's public key. My social network relies on nodes and pods. Pods are computers which have their ports open to the network. They help with relaying traffic as most firewalls block incoming socket requests. Nodes store information and share it with other nodes. Each node will get a directory of active pods which may be used to relay their traffic.
The PeerSoN project looks like something you might be interested in: http://www.peerson.net/index.shtml
They have done a lot of research and the papers are available on their site.
Some thoughts about it:
protocols to use: you could look at existing P2P programs and their design
security concerns: privacy. Take great care not to open doors: a whole system can get compromised because you have opened some door.
services: you could integrate with the regular social networks through their APIs
People will have to install a program on their computers and remember to open it every time, like any P2P client. Leaving everything on a web server requires a smaller footprint / less user action.
Somehow you'll need a centralized server to manage the searches. You can't just broadcast the internet to find friends. Or you'll have to rely upon email requests to add someone, and to do that you'll need to know the email in advance.
The fewer friends/contacts who use your program, the fewer people will want to use it, since it won't have contact information available.
I see that your server will be a store and forward, so the update problem is solved.
I’m wondering if there are ways to achieve flow-based traffic shaping with linux.
Traditional traffic-shaping approaches seem to be based on creating classes for specific protocols or types of packets (such as ssh, http, SYN or ACK) that need high throughput.
Here I want to see every TCP connection as a flow characterized by a certain data-rate.
There’ll be
quick flows such as interactive ssh or IRC chat and
slow flows (bulk data) such as scp or http file transfers
Now I’m looking for a way to characterize / classify an incoming packet to one of these classes, so I can run a tc based traffic shaper on it. Any hints?
Since you mention a dedicated machine I'll assume that you are managing from a network bridge and, as such, have access to the entirety of the packet for the lifetime it is in your system.
First and foremost: throttling at the receiving side of a connection is meaningless when you are speaking of link saturation. By the time you see the packet it has already consumed resources. This is true even if you are a bridge; you can only realistically do anything intelligent on the egress interface.
I don't think you will find an off-the-shelf product that is going to do exactly what you want. You are going to have to modify something like dummynet to be dynamic according to rules you derive during execution, or you are going to have to program a dynamic software router using some existing infrastructure. One I am familiar with is the Click modular router, but there are others. I really don't know how things like tc and ipfw will react to being configured/reconfigured with high frequency - I suspect poorly.
There are things that you should address ahead of time, however. Things that are going to make this task difficult regardless of the implementation. For instance,
How do you plan on differentiating between scp bulk and ssh interactive behavior? Will you monitor initial behavior and apply a rule based on that?
You mention HTTP-specific throttling; this implies DPI. Will you be able to support that on this bridge/router? How many classes of application traffic will you support?
How do you plan on handling contention? (You allot 30% of capacity to each 'bulk' flow, but then get 10 'bulk' flows trying to consume it.)
Will you hard-code the link capacity or measure it? Is it fixed or will it vary?
In general, you can get a fairly rough idea of 'flow' by just hashing the networking 5-tuple. Once you start dealing with applications semantics, however, all bets are off and you need to plow through packet contents to get what you want.
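Hashing the 5-tuple might look like the following sketch (normalizing endpoint order so both directions of a connection land in the same flow bucket):

```python
import hashlib

def flow_id(proto, src_ip, src_port, dst_ip, dst_port):
    """Map a packet's 5-tuple to a stable flow identifier.

    Sorting the two endpoints makes the hash direction-independent,
    so A->B and B->A packets hash to the same flow.
    """
    a = (src_ip, src_port)
    b = (dst_ip, dst_port)
    lo, hi = sorted([a, b])
    key = f"{proto}|{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}"
    return hashlib.sha1(key.encode()).hexdigest()[:16]

fwd = flow_id("tcp", "10.0.0.1", 51000, "93.184.216.34", 80)
rev = flow_id("tcp", "93.184.216.34", 80, "10.0.0.1", 51000)
print(fwd == rev)  # True: both directions map to the same flow
```

Per-flow byte/packet counters keyed on this identifier are then enough to classify a flow as "quick" or "bulk" after observing its early behavior, which is the rule-derivation step discussed above.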
If you had a more specific purpose it might render some of these points moot.
Anytime a username/password authentication is used, the common wisdom is to protect the transport of that data using encryption (SSL, HTTPS, etc). But that leaves the end points potentially vulnerable.
Realistically, which is at greater risk of intrusion?
Transport layer: Compromised via wireless packet sniffing, malicious wiretapping, etc.
Transport devices: Risks include ISPs and Internet backbone operators sniffing data.
End-user device: Vulnerable to spyware, key loggers, shoulder surfing, and so forth.
Remote server: Many uncontrollable vulnerabilities including malicious operators, break-ins resulting in stolen data, physically heisting servers, backups kept in insecure places, and much more.
My gut reaction is that although the transport layer is relatively easy to protect via SSL, the risks in the other areas are much, much greater, especially at the end points. For example, at home my computer connects directly to my router; from there it goes straight to my ISP's routers and onto the Internet. I would estimate the risks at the transport level (both software and hardware) at low to non-existent. But what security does the server I'm connected to have? Have they been hacked into? Is the operator collecting usernames and passwords, knowing that most people use the same information at other websites? Likewise, has my computer been compromised by malware? Those seem like much greater risks.
My question is this: should I be worried if a service I'm using or developing doesn't use SSL? Sure, it's a low-hanging fruit, but there are a lot more fruit up above.
By far the biggest target in network security is the remote server. In the case of a web browser and an HTTP server, the most common threats are in the form of XSS and XSRF. Remote servers are juicy targets for other protocols as well because they often have an open port which is globally accessible.
XSS can be used to bypass the same-origin policy. This can be used by a hacker to fire off XMLHttpRequests to steal data from a remote server. XSS is widespread and easy for hackers to find.
Cross-Site Request Forgeries (XSRF) can be used to change the password for an account on a remote server. It can also be used to hijack mail from your Gmail account. Like XSS, this vulnerability type is also widespread and easy to find.
The next biggest risk is the "transport layer", but I'm not talking about TCP. Instead you should worry more about the other network layers, such as OSI Layer 1, the physical layer (e.g. 802.11b). Being able to sniff the wireless traffic at your local cafe can be incredibly fruitful if applications aren't properly using SSL. A good example is the Wall of Sheep. You should also worry about OSI Layer 2, the data link layer: ARP spoofing can be used to sniff a switched wired network as if it were a wireless broadcast. OSI Layer 4 can be compromised with SSLStrip, which can still be used to this day to undermine TLS/SSL used in HTTPS.
Next up is the end-user device. Users are dirty - if you ever come across one of these "Users", tell them to take a shower! No, seriously: users are dirty because they have lots of spyware, viruses, and bad habits.
Last up is transport devices. Don't get me wrong, this is an incredibly juicy target for any hacker. The problem is that serious vulnerabilities have been discovered in Cisco IOS and nothing much has come of them. There hasn't been a major worm to affect any router. At the end of the day it's unlikely that this part of your network will be directly compromised. Although, if a transport device is responsible for your security, as in the case of a hardware firewall, then misconfigurations can be devastating.
Let's not forget things like:
leaving logged-in sessions unattended
writing passwords on stickies
The real risk is stupid users.
They leave their terminals open when they go to lunch.
They are gullible in front of any service personnel doing "service".
They store passwords and passphrases on notes next to the computer.
In great numbers, someone someday will install the next Killer App (TM) which takes down the network.
Through users, any of the risks you mention can be accomplished via social engineering.
Just because you think the other parts of your communications might be unsafe doesn't mean you shouldn't protect the bits that you can protect as best you can.
The things you can do are:
Protect your own end
give your message a good shot at surviving the internet, by wrapping it up warm
try to make sure that the other end is not an impostor.
The transport is where more people can listen in than at any other stage. (There can only be a maximum of 2 or 3 people standing behind you while you type in your password, but dozens could be plugged into the same router doing a man-in-the-middle attack, and hundreds could be sniffing your wifi packets.)
If you don't encrypt your message then anyone along the way can get a copy.
If you're communicating with a malicious/negligent end-point, then you're in trouble no matter what security you use; you have to avoid that scenario (authenticate them to you as well as yourself to them (server certs)).
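In code, "authenticate them to you" is largely a matter of not disabling the checks you already get for free. In Python, for example, the default SSL context verifies both the server's certificate chain and its hostname:

```python
import ssl

# The default context enables certificate verification and hostname
# checking; turning either off is exactly what exposes you to a
# man-in-the-middle on an otherwise "encrypted" connection.
ctx = ssl.create_default_context()
print(ctx.check_hostname)                     # True
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True

# A client would then wrap its socket like:
# with socket.create_connection(("example.com", 443)) as sock:
#     with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#         ...  # the handshake fails if the server cert doesn't match
```

The common mistake is setting verify_mode to CERT_NONE to silence certificate errors, which keeps the encryption but throws away the server authentication.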
None of these problems have been solved, or anywhere close. But going out there naked is hardly the solution.
I'm looking to implement a basic product activation scheme such that when the program is launched it will contact our server via HTTP to complete the activation. I'm wondering if it is a big problem (especially with bigger companies or educational organizations) that firewalls will block the outgoing HTTP request and prevent activation. Any idea how big an issue this may be?
In my experience, when HTTP traffic is blocked by a hardware firewall there is more often than not a proxy server which is used to browse the internet. Therefore it is good practice to allow the user to enter proxy and authentication details.
The number of times I have seen applications fail due to not using a corporate proxy server, and therefore being blocked by the firewall, astonishes me.
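Honoring a user-supplied proxy is only a few lines with Python's urllib, for instance (the proxy host and credentials below are placeholders):

```python
import urllib.request

def make_opener(proxy_url=None):
    """Build an HTTP opener that honors a user-supplied proxy.

    proxy_url looks like "http://user:password@proxy.corp.example:8080".
    With no proxy configured, urllib falls back to the environment's
    proxy settings (HTTP_PROXY etc.), which is often what you want.
    """
    if proxy_url:
        handler = urllib.request.ProxyHandler(
            {"http": proxy_url, "https": proxy_url})
        return urllib.request.build_opener(handler)
    return urllib.request.build_opener()

opener = make_opener("http://user:secret@proxy.corp.example:8080")
# opener.open("http://activation.example.com/activate?key=...") would
# then route the activation request through the corporate proxy.
```

Shipping a settings dialog that feeds into something like make_opener() avoids most of the "blocked by the firewall" support calls.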
There are personal software solutions that purposely block outgoing connections. Check out Little Snitch. This program can set up rules that explicitly block your computer from making connections to certain domains, IPs and/or ports. A common use for this program is to stop one's computer from "phoning home" to an activation server.
I can't tell you how prevalent this will be, sorry. But I can give you one data point.
In this company, Internet access is granted on an as-needed basis. There is one product I have had to support which is wonderful for its purpose and reasonably priced, but I will never approve its purchase again - the licensing is too much of a hassle to be worth it.
I'd say that it may not be common, but if any one of your customers is a business it's likely that you will encounter someone who tries to run your software behind a restricted internet connection or a proxy. Your software will need to handle this situation, otherwise you will have a pissed-off customer who cannot use your product, and you will lose the sale for sure.
If you are looking for a third-party tool, I've used InstallKey (www.lomacons.com) for product activations. It has functionality that allows for validating both with and without an internet connection.
I need to set up a protocol for fast command/response interactions. My instinct tells me to just knock together a simple protocol with CRLF separated ascii strings like how SMTP or POP3 works, and tunnel it through SSH/SSL if I need it to be secured.
While I could just do this, I'd prefer to build on an existing technology so people could use a friendly library rather than the socket library interface the OS gives them.
I need...
Commands and responses passing structured data back and forth. (XML, S expressions, don't care.)
The ability for the server to make unscheduled notifications to the client without being polled.
Any ideas please?
If you just want request/reply, HTTP is very simple. It's already a request/response protocol. The client and server side are widely implemented in most languages. Scaling it up is well understood.
The easiest way to use it is to send commands to the server as POST requests and for the server to send back the reply in the body of the response. You could also extend HTTP with your own verbs, but that would make it more work to take advantage of caching proxies and other infrastructure that understands HTTP.
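A minimal sketch of that POST-as-command pattern with the Python stdlib (the command name, JSON shape, and reply format are all made up for the example):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class CommandHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The request body carries the command as structured data.
        length = int(self.headers["Content-Length"])
        cmd = json.loads(self.rfile.read(length))
        reply = json.dumps({"echo": cmd["command"],
                            "status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), CommandHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/",
    data=json.dumps({"command": "ping"}).encode(),
    headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)  # {'echo': 'ping', 'status': 'ok'}
server.shutdown()
```

Because it is plain HTTP, the same endpoint works from any language with an HTTP client, and proxies, load balancers and TLS termination all come along for free.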
If you want async notifications, then look at pub/sub protocols (Spread, XMPP, AMQP, JMS implementations, or commercial pub/sub message brokers like TIBCO Rendezvous, TIBCO EMS or WebSphere MQ). The protocol or implementation to pick depends on the reliability, latency and throughput needs of the system you're building. For example, is it OK for notifications to be dropped when the network is congested? What happens to notifications when a client is offline -- do they get discarded, or queued up for when the client reconnects?
AMQP sounds promising. Alternatively, I think XMPP supports much of what you want, though with quite a bit of overhead.
That said, depending on what you're trying to accomplish, a simple ad hoc protocol might be easier.
How about something like SNMP? I'm not sure if it fits exactly with the model your app uses, but it supports both async notify and pull (i.e., TRAP and GET).
That's a great question with a huge number of variables to consider, and the question only mentions a few of them: packet format, asynchronous vs. synchronous messaging, and security. There are many, many others one could think about. I suggest going through a description of the 7-layer protocol stack (OSI/ISO) and asking yourself what you need at those layers, and whether you want to build that layer or get it from somewhere else. (You seem mostly interested in layers 6 and 7, but you also mentioned bits of lower layers.)
Think also about whether this is a safety-critical application or part of a system with formal V&V. Really good, trustworthy communication systems are not easy to design; an "underpowered" protocol can also put a lot of coding burden on the application to do error recovery.
Finally, I would suggest looking at how other applications similar to yours do the job (check open source, read books, etc.) Also useful is the U.S. Patent Office database, etc; one can get great ideas just from reading the description of the communication problem they were trying to solve.