Bootstrapping in peer-to-peer systems [duplicate] - p2p

This question already has answers here:
Peer to Peer: Methods of Finding Peers
(13 answers)
Closed 7 years ago.
Simple question. How does a client connect to a P2P system if it does not know any already-connected nodes? Is it even possible? Like with torrent magnet links, or Pastry?

Yes, it is possible. One technique is to use a well-known URL, where the peer can fetch a list of IP addresses of (bootstrap/central) peers currently reachable online. Another technique is to send a multicast message on the LAN and hope that another peer on the same LAN is already connected to the P2P network. Then, addresses of other peers can be fetched from it.
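For illustration, a rough Python sketch of both techniques; the bootstrap URL, the multicast group/port, and the tiny request/response format are made-up placeholders, not part of any particular P2P system:

    import socket
    import urllib.request

    # Technique 1: fetch a peer list from a well-known URL.
    # The URL and the one-address-per-line format are assumptions for illustration.
    def fetch_bootstrap_peers(url="https://example.org/bootstrap/peers.txt"):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return [line.strip().decode() for line in resp if line.strip()]

    # Technique 2: ask the local network whether any peer is already connected.
    # The multicast group/port and the "WHO_IS_PEER" payload are likewise made up.
    def multicast_discover(group="239.255.42.42", port=4242, wait=2.0):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.settimeout(wait)
        sock.sendto(b"WHO_IS_PEER", (group, port))
        peers = []
        try:
            while True:
                data, addr = sock.recvfrom(1024)  # peers answer with their listen port
                peers.append((addr[0], int(data)))
        except socket.timeout:
            pass
        return peers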

There is an interesting academic paper on this subject: "Addressing the P2P Bootstrap Problem for Small Overlay Networks" (2010) by David Isaac Wolinsky, Pierre St. Juste, P. Oscar Boykin, and Renato Figueiredo.
It explores using existing P2P services (like XMPP, BitTorrent, Gnutella, and Brunet) for bootstrapping smaller, private overlays.
This is the abstract:
"P2P overlays provide a framework for building distributed applications consisting of few to many resources with features including self-configuration, scalability, and resilience to node failures. Such systems have been successfully adopted in large-scale services for content delivery networks, file sharing, and data storage. In small-scale systems, they can be useful to address privacy concerns and for network applications that lack dedicated servers. The bootstrap problem, finding an existing peer in the overlay, remains a challenge to enabling these services for small-scale P2P systems. In large networks, the solution to the bootstrap problem has been the use of dedicated services, though creating and maintaining these systems requires expertise and resources, which constrain their usefulness and make them unappealing for small-scale systems. This paper surveys and summarizes requirements that allow peers potentially constrained by network connectivity to bootstrap small-scale overlays through the use of existing public overlays. In order to support bootstrapping, a public overlay must support the following requirements: a method for reflection in order to obtain publicly reachable addresses, so peers behind network address translators and firewalls can receive incoming connection requests; communication relaying to share public addresses and communicate when direct communication is not feasible; and rendezvous for discovering remote peers, when the overlay lacks stable membership. After presenting a survey of various public overlays, we identify two overlays that match the requirements: XMPP overlays, such as Google Talk and Live Journal Talk, and Brunet, a structured overlay based upon Symphony. We present qualitative experiences with prototypes that demonstrate the ability to bootstrap small-scale private structured overlays from public Brunet or XMPP infrastructures."

Related

Query DHT for the total number of active peers

Assume there is a P2P file sharing system which has no trackers but only a DHT.
How can I find out the number of all active peers uploading/downloading a specific file?
Is the only way to keep querying the DHT with get_peers to get new peers? Is there any better solution?
The distributed nature of the DHT makes it hard to get the exact number of peers in a swarm from it. Technically it's also unneeded and not a very useful number: it's only necessary to get in contact with one other peer in the swarm, and then the Peer Exchange (PEX) extension will supply plenty more peers more efficiently than the DHT.
Some clients also support the BEP 33 DHT scrape extension, which can give an approximate number of peers registered in the DHT, up to a maximum of about 6000.
Unfortunately it's badly designed and has a vulnerability that makes it currently the most potent vector for UDP amplification attacks using the BitTorrent protocol, with a BAF (Bandwidth Amplification Factor) of 13.4. The attack is called Distributed Reflective Denial of Service (DRDoS) and is described in this paper. If this vector starts to get used, it may become necessary to remove the extension from the protocol quickly.
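For what it's worth, the brute-force counting idea from the question looks roughly like the sketch below. `dht.get_peers()` is a stand-in for whatever DHT client library you use (it is not a real API), and as noted above the result is only ever a rough lower bound:

    import time

    # Hypothetical sketch: repeatedly query the DHT for a torrent's info hash
    # and deduplicate the returned peer endpoints. "dht" is a placeholder for
    # a real DHT client; get_peers() is assumed to return (ip, port) tuples.
    def estimate_swarm_size(dht, info_hash, rounds=10, pause=30):
        seen = set()
        for _ in range(rounds):
            for ip, port in dht.get_peers(info_hash):
                seen.add((ip, port))
            time.sleep(pause)  # peers churn, so repeated queries find new ones
        return len(seen)  # lower bound only: unannounced or departed peers are missed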

Private secured P2P Network

I know the concept of building a simple P2P network without any server. My problem is with securing the network. The network should have some administrative nodes. So there are two kinds of nodes:
Nodes with privileges
Nodes without privileges
The first question is: can I assign some nodes more rights than others, like the privilege to send a broadcast message?
How can I secure the network against modified nodes that try to gain privileges?
I'm really interested in answers and resources that can help me. It is important to me to understand this, and I'm happy to add further information if anything is unclear.
You seem lost, and I used to do research in this area, so I'll take a shot. I feel this question is borderline off-topic, but I tend to err toward leaving things open.
See Chord, CAN, Tapestry, and Pastry for examples of P2P networks as well as pseudo-code. These works are all based on distributed hash tables (DHTs) and have been around for over 10 years now. Many of them have open-source implementations you can use.
As for "privileged nodes", your question contradicts itself. You want a P2P network, but you also want nodes with more rights than others. By definition, your network is no longer P2P because peers are no longer equally privileged.
Your question points to trust within P2P networks - a problem that academics have focused on since the introduction of DHTs. I feel that no satisfactory answer has been found yet that solves all problems in all cases. Here are a few approaches which will help you:
(1) Bitcoin addresses malicious users by forcing all users in the network to perform computationally intensive work. For any member to forge bitcoins, they would need more computational power than everyone else combined in order to prove they had done more work than everyone else.
(2) Give privileges based on reputation. You can calculate reputation in any number of ways. One simple example - for each transaction in your system (file sent, database lookup, piece of work done), the requester sends a signed acknowledgement (using private/public keys) to the sender. Each peer can then present the accumulation of their signed acknowledgements to any other peer. Any peer who has accumulated N acknowledgements (you determine N) has more privileges. (A small sketch of this follows after this answer.)
(3) Own a central server that hands out privileges. This one is the simplest and you get to determine what trust means for you. You're handing it out.
That's the skinny version - good luck.
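To make option (2) concrete, here is a minimal sketch of the signed-acknowledgement idea using Ed25519 keys from the cryptography package; the acknowledgement format and the threshold N are illustrative assumptions, not a prescribed design:

    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
    )

    # The requester signs an acknowledgement for a completed transaction.
    requester_key = Ed25519PrivateKey.generate()
    ack = b"transaction:42 peer:abc status:completed"   # illustrative format
    signature = requester_key.sign(ack)

    # Any other peer holding the requester's public key can verify it later.
    requester_pub = requester_key.public_key()
    requester_pub.verify(signature, ack)   # raises InvalidSignature if forged

    # A peer that has accumulated N valid acknowledgements (N is up to you)
    # would then be granted the extra privileges.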
I'm guessing that the administrative nodes are different from normal nodes by being able to tell other nodes what to do (and the regular nodes should obey).
You have to give the admin nodes some kind of way to prove themselves that can be verified by other nodes but not forged by them (like a policeman's ID). The most standard way I can think of is by using TLS certificates.
In (very) short, you create pairs of files called a key and a certificate. The key is secret and belongs to one identity; the certificate is public.
You create a CA certificate, and distribute it to all of your nodes.
Using that CA, you create "administrative node" certificates, one for each administrative node.
When issuing a command, an administrative node presents its certificate to the "regular" node. The regular node, using the CA certificate you provided beforehand, can make sure the administrative node is genuine (because the certificate was actually signed by the CA), and it's OK to do as it asks.
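As a rough sketch of that verification step in Python, assuming you have already generated ca.pem and per-node key/certificate files (the file names and port are placeholders), the standard ssl module lets the regular node demand and check the administrative node's certificate:

    import socket
    import ssl

    # Regular node: accept a connection only if the other side presents a
    # certificate signed by our CA (ca.pem distributed beforehand).
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("regular_node.pem", "regular_node.key")  # our own identity
    ctx.load_verify_locations("ca.pem")                          # trust anchor
    ctx.verify_mode = ssl.CERT_REQUIRED                          # admin must prove itself

    with socket.create_server(("0.0.0.0", 9000)) as server:
        conn, addr = server.accept()
        with ctx.wrap_socket(conn, server_side=True) as tls_conn:
            # Handshake succeeded, so the peer's certificate chains to our CA;
            # commands read from tls_conn can be treated as administrative.
            command = tls_conn.recv(1024)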
Pros:
TLS/SSL is used by many other products to create a secure tunnel, preventing "man in the middle" attacks and impersonations
There are ready-to-use libraries and sample projects for TLS/SSL in practically every language, from .net to C.
There are revocation lists, to "cancel" certificates that have been stolen (although you'll have to find a way to distribute these)
Certificate verification is offline - a node needs no external resources (except for the CA certificate) for verification
Cons:
Since SSL/TLS is a widely-used system, there are many tools to exploit misconfigured / old clients / servers
There have been exploits found in such libraries (e.g. Heartbleed), so you might need to patch your software often.
This solution still requires some serious coding, but it's usually better to rely on an existing and proven system than to go around inventing your own.

Is there such a thing as BitTorrent passive tracking?

Hi, I want to make an application that, given a torrent file (or hash), can give the number of peers without being active (i.e. not responsible) in the process that allows the sharing of a file (for legal reasons, obviously) - whether by being a "passive" tracker (passive as defined previously) or a BitTorrent client that counts "all time" peers (i.e. the number of downloads for a torrent). Can it be done? I know some trackers keep track of downloads, but I don't know whether those that "seem not to" actually do as well. I'm looking for something that can track the number of unique-IP transfers from when the torrent was added to the tracking system, or something that counts completed downloads.
It's not possible to determine all peers just from a tracker. There can be multiple trackers for each torrent, and they may not store complete, fresh, or even truthful information. Additionally there's no obligation for peers to be honest with their trackers. There are also alternatives to centralized trackers, such as DHT and PEX. There's no guarantee that all peers are participating in the same DHT network. Peers might even establish disjoint PEX communities.
In short, you might make a best-effort attempt at determining the total swarm participation for a particular torrent by checking trackers and querying the DHT. But to be as thorough as the technology allows, you'd actually have to participate in the swarm with all manner of transports and protocol extensions currently in use, such as uTP and encryption, and scrape each peer for further peers and download states. Of course the BitTorrent community is familiar with such attempts to scrape data, and there are a lot of security measures in place to prevent exploitation in this way. Examples include IP blocklists and heuristics on peer behaviour.
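As a hedged sketch of the "checking trackers" part: many HTTP trackers expose scrape data at a URL obtained by replacing "announce" with "scrape" in the announce URL, returning a bencoded dictionary with per-torrent seeder/leecher/download counts. Decoding the bencoded response is left out here:

    import urllib.parse
    import urllib.request

    # Best-effort tracker scrape (HTTP trackers only, and only those that
    # support the convention of swapping "announce" for "scrape").
    def scrape(announce_url, info_hash_bytes):
        if "/announce" not in announce_url:
            raise ValueError("tracker does not appear to support scrape")
        scrape_url = announce_url.replace("/announce", "/scrape")
        query = "info_hash=" + urllib.parse.quote_from_bytes(info_hash_bytes)
        with urllib.request.urlopen(scrape_url + "?" + query, timeout=10) as resp:
            # Bencoded: {"files": {info_hash: {"complete": ..., "incomplete": ..., "downloaded": ...}}}
            return resp.read()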

Considerations regarding a p2p social network

While there are many social networks in the wild, most rely on data stored on a central site owned by a third party.
I'd like to build a solution where data remains local on members' systems. Think of the project as an address book which automagically updates a contact's data as soon as the contact changes their coordinates. This base idea might get extended later on...
Updates will be transferred using public/private key cryptography using a central host. The sole role of the host is to be a store and forward intermediate. Private keys remain private on each member's system.
If two clients are both online and a p2p connection can be established, the clients can transfer data telegrams without the central host.
Thus, sender and receiver will be the only parties able to create authentic messages.
Questions:
Are there existing protocols which I should adopt?
Are there any security concerns I should keep in mind?
Are there existing services which should be integrated or used somehow?
More technically:
Use services provided by e.g. Amazon or Google?
Or is it better to use a raw web server? If yes: why?
Which algorithm and key length should be used?
UPDATE-1
I googled my own question title and found this academic project, developed in 2008/09: http://www.lifesocial.org/.
The solution you are describing sounds remarkably like email, with encrypted messages as the payload, and an application rather than a human being creating the messages.
It doesn't really sound like "p2p" - in most P2P protocols, the only requirement for central servers is discovery - you're using store & forward.
As a quick proof of concept, I'd set up an email server and build an application that sends emails to addresses registered on that server, encrypted using PGP - the tooling and libraries are available, so you should be able to get that up and running in days rather than weeks. In my experience, building a throw-away PoC for this kind of question is a great way of sifting out the nugget of the idea.
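A hedged sketch of such a PoC, assuming an SMTP server you control and the python-gnupg wrapper with the recipient's public key already imported; the server address, key fingerprint, and email addresses are placeholders:

    import smtplib
    from email.message import EmailMessage

    import gnupg  # python-gnupg wrapper around a local GnuPG installation

    gpg = gnupg.GPG()
    payload = '{"contact": "alice", "phone": "+00 000 0000"}'      # made-up update telegram
    encrypted = gpg.encrypt(payload, "RECIPIENT_KEY_FINGERPRINT")  # placeholder fingerprint
    assert encrypted.ok, encrypted.status

    msg = EmailMessage()
    msg["From"] = "node-a@example.org"
    msg["To"] = "node-b@example.org"
    msg["Subject"] = "contact-update"
    msg.set_content(str(encrypted))          # ASCII-armored ciphertext as the body

    with smtplib.SMTP("mail.example.org", 587) as server:
        server.starttls()
        server.send_message(msg)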
The second issue is that the nature of a social network is that it's a network. Your design may require you to store more than the data of the two direct contacts - you may also have to store their friends, or at least the public interactions those friends have had.
This may not be part of your plan, but if it is, you need to think it through early on - you may end up having to transmit the entire social graph to each participant for local storage, which creates a scalability problem....
The paper about Safebook might be interesting for you.
Also you could take a look at other distributed OSN and see what they are doing.
None of the federated networks mentioned on http://en.wikipedia.org/wiki/Distributed_social_network is actually distributed. What Stefan intends to do is indeed new and was only explored by some proprietary folks.
I've been thinking about the same concept for the last two years. I've finally decided to give it a try using Python.
I've spent the better part of last night and this morning writing a socket communication script & server. I also plan to remove the central server from the equation, as it's just plain cumbersome and there's no point to it when all the members could keep copies of their friends' keys.
Each profile could be accessed via a hashed string of someone's public key. My social network relies on nodes and pods. Pods are computers which have their ports open to the network. They help with relaying traffic as most firewalls block incoming socket requests. Nodes store information and share it with other nodes. Each node will get a directory of active pods which may be used to relay their traffic.
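For the "profile accessed via a hashed string of someone's public key" part, a minimal sketch (the exact hash and key encoding are assumptions):

    import hashlib

    def profile_id(public_key_bytes):
        # Deterministic identifier: anyone holding the public key can derive it,
        # but the ID itself reveals nothing beyond the key's hash.
        return hashlib.sha256(public_key_bytes).hexdigest()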
The PeerSoN project looks like something you might be interested in: http://www.peerson.net/index.shtml
They have done a lot of research and the papers are available on their site.
Some thoughts about it:
protocols to use: you could look at existing P2P programs and their design
security concerns: privacy. Take great care not to open doors: a whole system can get compromised because you opened one.
services: you could integrate with the regular social networks through their APIs
People will have to install a program on their computers and remember to open it every time, like any P2P client. Keeping everything on a web server requires a smaller footprint and less user action.
Somehow you'll need a centralized server to manage searches. You can't just broadcast across the internet to find friends. Otherwise you'll have to rely upon email requests to add someone, and for that you'll need to know the email address in advance.
The fewer friends/contacts use your program, the fewer people will want to use it, since it won't have contact information available.
I see that your server will be a store and forward, so the update problem is solved.

Fully Decentralized P2P?

I’m looking at creating a P2P system. During initial research, I’m reading from Peer-to-Peer – Harnessing the Power of Disruptive Technologies. That book states “a fully decentralized approach to instant messaging would not work on today's Internet.” Mostly blaming firewalls and NATs. The copyright is 2001. Is this information old or still correct?
It's still largely correct. Most users are still behind firewalls or home routers that block incoming connections. These can be opened more easily today than in 2001 (using UPnP for example, requiring little user interaction and knowledge), but most commercial end-user applications - phone (Skype, VoIP), chat (the various messengers), remote control - are centralized solutions that circumvent firewall problems.
I would say that it is just plain wrong, both now and then. Yes, you will have many nodes that will be firewalled, however, you will also have a significant number who are not. So, if end-to-end encryption is used to protect the traffic from snooping, then you can use non-firewalled clients to act as intermediaries between two firewalled clients that want to chat.
You will need to take care, however, to spread the load around, so that a few unfirewalled clients aren't given too much load.
Skype uses a similar idea. They even allow file transfers through intermediaries, though they limit the throughput so as not to overload the middle-men.
That being said, now in 2010 it is a lot easier to punch holes in firewalls than it was in 2001, as most routers will allow you to automate the opening of ports via UPnP, so you are likely to have a larger pool of unfirewalled clients to work with.
Firewalls and NATs still commonly disrupt direct peer-to-peer communication between home-based PCs (and also between home-based PCs and corporate desktops).
They can be configured to allow particular peer-to-peer protocols, but that remains a stumbling block for most unsavvy users.
I think the original statement is no longer correct. But the field of decentralized computing is still in its infancy, with few serious contenders.
Read this interesting post on ZeroTier (thanks to #joehand): The State of NAT Traversal:
NAT is Traversable
In reading the Internet chatter on this subject I've been shocked by how many people don't really understand this, hence the reason this post was written. Lots of people think NAT is a show-stopper for peer to peer communication, but it isn't. More than 90% of NATs can be traversed, with most being traversable in reliable and deterministic ways.
At the end of the day anywhere from 4% (our numbers) to 8% (an older number from Google) of all traffic over a peer to peer network must be relayed to provide reliable service. Providing relaying for that small a number is fairly inexpensive, making reliable and scalable P2P networking that always works quite achievable.
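To make the "traversable in reliable and deterministic ways" point concrete, here is a minimal, hedged sketch of UDP hole punching; it assumes both peers have already learned each other's public IP:port from some rendezvous service (not shown), which is the part that usually needs infrastructure:

    import socket

    # Both peers run this with the other side's public endpoint, learned out of
    # band from a rendezvous server. Sending first opens an outbound mapping in
    # our own NAT; the peer's packet then arrives through that mapping.
    def punch(local_port, peer_ip, peer_port, attempts=10):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("0.0.0.0", local_port))
        sock.settimeout(1.0)
        for _ in range(attempts):
            sock.sendto(b"punch", (peer_ip, peer_port))
            try:
                data, addr = sock.recvfrom(1024)
                if addr[0] == peer_ip:
                    return sock          # hole is open, keep using this socket
            except socket.timeout:
                continue
        raise ConnectionError("could not traverse NAT; fall back to a relay")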
I personally know of Dat Project, a decentralized data sharing toolkit (based on their hypercore protocol for P2P streaming).
From their Dat - Distributed Dataset Synchronization And Versioning paper:
Peer Connections
After the discovery phase, Dat should have a list of potential data sources to try and contact. Dat uses either TCP, UTP, or HTTP. UTP is designed to not take up all available bandwidth on a network (e.g. so that other people sharing wifi can still use the Internet), and is still based on UDP so works with NAT traversal techniques like UDP hole punching. HTTP is supported for compatibility with static file servers and web browser clients. Note that these are the protocols we support in the reference Dat implementation, but the Dat protocol itself is transport agnostic.
Furthermore you can use it with the BitTorrent DHT. The paper also contains some references to other technologies that inspired Dat.
For implementation of peer discovery, see: discovery-channel
Then there is also IPFS, or 'The Interplanetary File System' which is currently best positioned to become a standard.
They have extensive documentation on their use of DHT and NAT traversal to achieve decentralized P2P.
The Session messenger seems to have solved the issue of a truly decentralized p2p messenger by using an incentivized mixnet to relay and store messages. It's a fork of the Signal messenger with a mixnet added. https://getsession.org -- whitepaper: https://getsession.org/wp-content/uploads/2020/02/Session-Whitepaper.pdf
It's very old and not correct. I believe there is a product out called Tribler (news article) which enables BitTorrent to function in a fully decentralized way.
If you want to go back a few years (even before that document) you could look at Windows. Windows networking used to function in a fully decentralized way. In some cases it still does.
UPNP is also decentralized in how it determines available devices on your local network.
In order to be decentralized you need to have a way to locate other peers. This can be done proactively by scanning the network (time consuming) or by having some means of the clients announcing that they are available.
The announcements can be simple UDP packets that get broadcast every so often to the subnet, which other peers listen for. Another mechanism is broadcasting to IRC channels (most common for command and control of botnets), etc. You might even use Twitter or similar services. Use your imagination here.
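A minimal sketch of the UDP broadcast announcement idea (the port and message format are arbitrary choices for illustration):

    import socket
    import time

    ANNOUNCE_PORT = 37020          # arbitrary choice for this sketch

    def announce_forever(my_listen_port, interval=30):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        while True:
            # Peers on the same subnet listening on ANNOUNCE_PORT learn that a
            # node is reachable on my_listen_port at this source address.
            sock.sendto(b"PEER %d" % my_listen_port,
                        ("255.255.255.255", ANNOUNCE_PORT))
            time.sleep(interval)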
Firewalls don't really play a part because they almost always leave open a few ports, such as 80 (http). Obviously you couldn't browse the network if that was closed. Now if the firewall is configured to only allow connections that originated from internal clients, then you'd have a little more work to do. But not much.
NATs are also not a concern, for similar reasons.
