In a distributed network such as Bitcoin, how does a node find its peers after the initial connection? - p2p

Suppose that the client keeps a hardcoded table of some known, famous nodes in order to start the connection with the network. When that node initially connects to the entry point, it must then proceed to find its own peers. How is that done?
The node obviously can't just ask the initial nodes for their neighbors.

https://en.bitcoin.it/wiki/Satoshi_Client_Node_Discovery explains the process pretty well.
The gist is:
A central service is used to identify the node's own public IP address.
Hardcoded IRC channels, DNS domains, or IP addresses are used to discover the initial nodes.
After that, each node regularly shares its known peers with others using dedicated address-advertisement messages.
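To make the last step concrete, here is a toy Python simulation of gossip-based address discovery, loosely modeled on Bitcoin's getaddr/addr message pair. The class and method names (`Node`, `answer_getaddr`, `receive_addr`) are hypothetical, and the real client's address manager (buckets, timestamps, eviction) is far more involved; this only sketches the idea under those simplifying assumptions.

```python
import random


class Node:
    """Toy node that learns peer addresses by gossip (hypothetical model)."""

    def __init__(self, addr):
        self.addr = addr
        self.known = set()  # addresses this node has heard about

    def bootstrap(self, seeds):
        """Seeds stand in for hardcoded IPs, DNS seeds or an IRC channel."""
        self.known.update(s for s in seeds if s != self.addr)

    def answer_getaddr(self, rng, sample_size=20):
        """Reply to an address request with a random sample, not the full list."""
        pool = sorted(self.known | {self.addr})
        return rng.sample(pool, min(sample_size, len(pool)))

    def receive_addr(self, addrs):
        """Merge advertised addresses into our own table."""
        self.known.update(a for a in addrs if a != self.addr)


def gossip_round(network, rng):
    """Every node asks one random known peer for addresses and advertises itself."""
    for node in network.values():
        if not node.known:
            continue
        peer = network[rng.choice(sorted(node.known))]
        node.receive_addr(peer.answer_getaddr(rng))
        peer.receive_addr([node.addr])


rng = random.Random(42)
addrs = [f"10.0.0.{i}" for i in range(1, 51)]
network = {a: Node(a) for a in addrs}
for node in network.values():
    node.bootstrap(addrs[:3])  # everyone starts with the same three seeds
for _ in range(10):
    gossip_round(network, rng)
print(min(len(n.known) for n in network.values()))
```

Each node starts out knowing only the three seeds, yet after a few rounds every node has learned many other addresses, even though no node ever hands out its complete peer list.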

I'm particularly wondering whether something as complex as Kademlia is required or used, or whether it could be as simple as randomly asking nodes for their neighbors, and then those neighbors' neighbors, for a while.
Both exist; the commonly used terms are structured and unstructured overlay networks. The latter usually try to form a small-world network.
Whether a simple network is sufficient depends on what you want to do with it. Unstructured overlays generally work well enough for simply propagating updates to everyone or for forming separate networks for each topic of interest. More complex tasks such as address-based lookups, on the other hand, benefit from structured overlays like Kademlia.
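For contrast with the random neighbor-walking of unstructured overlays, the core of a structured overlay like Kademlia is a distance metric: the XOR of two node IDs, read as an integer. A minimal Python sketch, assuming small integer IDs instead of Kademlia's 160-bit identifiers; the function names are illustrative only.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two IDs is their bitwise XOR."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Index of the highest differing bit; Kademlia keeps one k-bucket per
    bit position, so contacts are grouped by how far away they are."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not keep itself in its routing table")
    return d.bit_length() - 1


def closest(target: int, node_ids, k: int = 3) -> list:
    """The k known IDs nearest to target under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


print(closest(0b1010, [1, 2, 8, 10, 11, 15, 200]))  # [10, 11, 8]
print(bucket_index(0b10110, 0b00110))               # 4
```

Because every node agrees on the same metric, a lookup can always move to peers that are closer to the target, which is what makes address-based lookups efficient.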

Related

Query DHT for the total number of active peers

Assume there is a P2P file-sharing system which has no trackers, only a DHT.
How can one find the number of all active peers uploading/downloading a specific file?
Is the only way to keep querying the DHT with get_peers to discover new peers? Is there a better solution?
The distributed nature of the DHT makes it hard to get the exact number of peers in a swarm from it. Technically, it's also unnecessary and not a very useful number: you only need to contact one other peer in the swarm, and the PeerEXchange extension will then give you plenty more peers, more efficiently than the DHT.
Some clients also support the BEP33 DHT scrape extension, which can give an approximate number of peers registered in the DHT, up to a maximum of about 6000.
Unfortunately it's badly designed and has a vulnerability that makes it currently the most potent vector for UDP amplification attacks using the BitTorrent protocol, with a BAF (Bandwidth Amplification Factor) of 13.4. The attack is called Distributed Reflective Denial of Service (DRDoS) and is described in this paper. If this vector starts to get used, it may be necessary to quickly remove this extension from the protocol.
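The BEP33 estimate works by counting zero bits in a Bloom filter of peer IPs. The Python sketch below shows the standard Bloom-filter cardinality estimate, assuming a 2048-bit filter with two bit positions per entry (my reading of BEP33's parameters). The way IPs are hashed to bit positions here is a hypothetical stand-in, not the exact bit layout the extension specifies.

```python
import hashlib
import math

M_BITS = 2048   # filter size in bits (256 bytes), assumed from BEP33
K_HASHES = 2    # bit positions set per entry, assumed from BEP33


def positions(ip: str):
    """Hypothetical index derivation: two 16-bit values taken from SHA-1 of
    the IP string, reduced modulo the filter size."""
    h = hashlib.sha1(ip.encode()).digest()
    return [int.from_bytes(h[2 * i:2 * i + 2], "big") % M_BITS for i in range(K_HASHES)]


def build_filter(ips):
    bits = [False] * M_BITS
    for ip in ips:
        for p in positions(ip):
            bits[p] = True
    return bits


def estimate_count(bits):
    """Standard estimate: n = ln(zeros / m) / (k * ln(1 - 1/m))."""
    zeros = bits.count(False)
    if zeros == 0:
        return math.inf  # saturated filter: no finite estimate possible
    return math.log(zeros / M_BITS) / (K_HASHES * math.log(1 - 1 / M_BITS))


ips = [f"10.{i // 256}.{i % 256}.1" for i in range(1000)]
print(round(estimate_count(build_filter(ips))))
```

As the filter fills up, the number of zero bits approaches zero and the estimate becomes unreliable, which is why such a scrape can only report counts up to a fixed ceiling.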

Confused over nearID, farID, nearNonce etc

I am using Flash Media Server.
I am confused by the various IDs:
Am I the nearID, and is the person on the other end the farID?
What is a nearNonce ID?
I found a high-level architecture explanation that beautifully explained how P2P works overall.
Are there any more articles that explain in detail how all the pieces fit together?
Finally, for peers to communicate they need to exchange peerIDs. Would a remote shared object perform this task well, or would you suggest using some other kind of web service, such as XMPP?
Apologies for the many questions.
FMS programming can get very confusing. To tackle your last question: typically your peers are introduced via FMS itself, in Server-Side ActionScript (SSAS). One way to do this is to have your peers connect to a NetGroup, in which case they can discover other peers connected to the same group. You can also manually introduce two peers in the SSAS code.
One hard lesson I learned about NetGroups is that simply being connected to a group does not mean you will be notified when others join the same group. You only get notified when you gain a new neighbor (a direct connection within the group), not when a new non-neighbor peer joins the group (an indirect connection through other peers). If you want to know when a peer joins a group, that peer should announce itself via a group broadcast.
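The behavior described above, where only direct neighbors trigger notifications, can be illustrated with a small Python simulation of flooding an announcement through a group. This is not the FMS NetGroup API; `GroupMember`, `connect`, and `broadcast` are hypothetical names for a generic sketch of why a self-announcement reaches members that never become neighbors of the newcomer.

```python
from collections import deque


class GroupMember:
    """Toy group member: `neighbors` are direct links, `members_seen`
    is everyone this member has heard about."""

    def __init__(self, name):
        self.name = name
        self.neighbors = set()
        self.members_seen = {name}
        self.seen_msgs = set()


def connect(a, b):
    """Create a direct (neighbor) link; only this would raise a neighbor event."""
    a.neighbors.add(b)
    b.neighbors.add(a)


def broadcast(origin, msg_id, announced_name):
    """Flood a message through the group; each member forwards it once."""
    origin.seen_msgs.add(msg_id)
    queue = deque([origin])
    while queue:
        member = queue.popleft()
        member.members_seen.add(announced_name)
        for n in member.neighbors:
            if msg_id not in n.seen_msgs:
                n.seen_msgs.add(msg_id)
                queue.append(n)


a, b, c, d = (GroupMember(x) for x in "abcd")
connect(a, b)
connect(b, c)
connect(c, d)
e = GroupMember("e")
connect(e, d)                    # e's only neighbor is d
broadcast(e, "join:e", e.name)   # e announces itself to the whole group
print("e" in a.members_seen, e in a.neighbors)
```

Member `a` never gains `e` as a neighbor, so it would get no neighbor notification, but the flooded announcement still tells it that `e` joined.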
I'm still learning this stuff, so take this all with a grain of salt :)

Considerations regarding a p2p social network

While there are many social networks in the wild, most rely on data stored on a central site owned by a third party.
I'd like to build a solution where data remains local on members' systems. Think of the project as an address book which automagically updates a contact's data as soon as that contact changes their coordinates. This base idea might be extended later on...
Updates will be encrypted with public/private key cryptography and transferred via a central host. The host's sole role is to act as a store-and-forward intermediary. Private keys remain private on each member's system.
If two clients are both online and a P2P connection can be established, the clients could transfer data telegrams without the central host.
Thus, the sender and receiver will be the only parties able to create authentic messages.
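The message flow described here (direct delivery when both clients are online, otherwise store-and-forward via the host) can be sketched in Python as follows. Encryption is deliberately left out: the host only ever handles opaque ciphertext bytes and recipient IDs, and `RelayHost` and `send_update` are hypothetical names, not an existing library.

```python
from collections import defaultdict, deque


class RelayHost:
    """Hypothetical store-and-forward host: it sees only recipient IDs and
    opaque ciphertext, never plaintext or private keys."""

    def __init__(self):
        self._mailboxes = defaultdict(deque)

    def store(self, recipient_id, ciphertext):
        self._mailboxes[recipient_id].append(ciphertext)

    def fetch(self, recipient_id):
        """Hand over and forget everything queued for this recipient."""
        return list(self._mailboxes.pop(recipient_id, deque()))


def send_update(host, recipient_id, ciphertext, direct_peers):
    """Deliver directly if a P2P connection to the recipient is open,
    otherwise leave the update with the relay host."""
    inbox = direct_peers.get(recipient_id)
    if inbox is not None:
        inbox.append(ciphertext)
        return "direct"
    host.store(recipient_id, ciphertext)
    return "relayed"


host = RelayHost()
online = {"bob": []}  # stand-in for an open P2P connection to bob
send_update(host, "bob", b"ciphertext-1", online)
send_update(host, "carol", b"ciphertext-2", online)
```

Carol's update waits on the host until she next fetches it, while Bob's goes straight to him.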
Questions:
Are there existing protocols I should adopt?
Are there any security concerns I should keep in mind?
Are there existing services that should be integrated or used somehow?
More technically:
Should I use services provided by, e.g., Amazon or Google?
Or is a raw web server better? If so, why?
Which algorithm and key length should be used?
UPDATE-1
I googled my own question title and found this academic project developed in 2008/09: http://www.lifesocial.org/.
The solution you are describing sounds remarkably like email, with encrypted messages as the payload, and an application rather than a human being creating the messages.
It doesn't really sound like P2P: in most P2P protocols, central servers are required only for discovery, whereas you're using store & forward.
As a quick proof of concept, I'd set up an email server, and build an application that sends emails to addresses registered on that server, encrypted using PGP - the tooling and libraries are available, so you should be able to get that up and running in days, rather than weeks. In my experience, building a throw-away PoC for this kind of question is a great way of sifting out the nugget of my idea.
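A sketch of the message-building half of that proof of concept, using Python's standard `email` package. It assumes the payload has already been encrypted with PGP tooling outside this code and that sending happens separately (for example via `smtplib`); the subject marker and attachment filename are hypothetical conventions the receiving application would filter on.

```python
from email.message import EmailMessage


def build_update_email(sender: str, recipient: str, encrypted_payload: bytes) -> EmailMessage:
    """Wrap an already-encrypted contact update in an ordinary email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "contact-update"  # hypothetical marker for the receiving app
    msg.set_content("Encrypted contact update attached.")
    msg.add_attachment(
        encrypted_payload,
        maintype="application",
        subtype="octet-stream",
        filename="update.pgp",  # hypothetical filename convention
    )
    return msg


msg = build_update_email("alice@example.org", "bob@example.org", b"<ciphertext>")
attachment = next(msg.iter_attachments())
print(attachment.get_filename(), attachment.get_content())
```

The receiving side would do the reverse: pick out messages with the marker, extract the attachment bytes, and hand them to the PGP tooling for decryption.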
The second issue is that a social network is, by nature, a network. Your design may require you to store more than the data of the two direct contacts; you may also have to store their friends, or at least the public interactions those friends have had.
This may not be part of your plan, but if it is, you need to think it through early on: you may end up having to transmit the entire social graph to each participant for local storage, which creates a scalability problem.
The paper about Safebook might be interesting for you.
You could also take a look at other distributed online social networks (OSNs) and see what they are doing.
None of the federated networks mentioned on http://en.wikipedia.org/wiki/Distributed_social_network is actually distributed. What Stefan intends to do is indeed new and was only explored by some proprietary folks.
I've been thinking about the same concept for the last two years. I've finally decided to give it a try using Python.
I've spent the better part of last night and this morning writing a sockets communication script & server. I also plan to remove the central server from the equation as it's just plain cumbersome and there's no point to it when all the members could keep copies of their friend's keys.
Each profile could be accessed via a hash of its owner's public key. My social network relies on nodes and pods. Pods are computers that have their ports open to the network. They help relay traffic, as most firewalls block incoming socket connections. Nodes store information and share it with other nodes. Each node will get a directory of active pods which it may use to relay its traffic.
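Addressing a profile by a hash of its owner's public key could look like this in Python. SHA-256 is my assumption (the answer does not name a hash function), the key bytes are a placeholder rather than a real encoded key, and `profile_id` and `verify_profile` are illustrative names.

```python
import hashlib
import hmac


def profile_id(public_key_bytes: bytes) -> str:
    """Stable profile address derived from a public key."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def verify_profile(claimed_id: str, public_key_bytes: bytes) -> bool:
    """Check that a key served by some node matches the ID it was asked for."""
    return hmac.compare_digest(claimed_id, profile_id(public_key_bytes))


key = b"placeholder-public-key-bytes"
pid = profile_id(key)
print(pid[:16], verify_profile(pid, key))
```

Whoever fetches a profile can recompute the hash from the served key and then check signatures made with that key, so the pods relaying the data would not need to be trusted.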
The PeerSoN project looks like something you might be interested in: http://www.peerson.net/index.shtml
They have done a lot of research and the papers are available on their site.
Some thoughts about it:
Protocols to use: you could model your design on existing P2P programs.
Security concerns: privacy. Take great care not to open any doors; a whole system can be compromised because you opened a single door.
Services: you could integrate with the regular social networks through their APIs.
People will have to install a program on their computers and remember to open it every time, like any P2P client. Leaving everything on a web server has a smaller footprint and requires less user action.
Somehow you'll need a centralized server to manage searches. You can't just broadcast to the whole Internet to find friends. Otherwise, you'll have to rely upon email requests to add someone, and to do that you'll need to know their email address in advance.
The fewer friends/contacts use your program, the fewer people will want to use it, since it won't have their contact information available.
I see that your server will be a store and forward, so the update problem is solved.

Chat program without a central server

I'm developing a chat application (in VB.Net). It will be a "secure" chat program. All traffic will be encrypted (I also need to find the best approach for this, but that's not the question for now).
Currently the program works. I have a server application and a client application. However, I want to set up the application so that it doesn't need a central server to work.
What approach can I take to decentralize the network?
I think I need to develop the clients in a way so that they do also act as a server.
How would a client know which server it needs to connect to, and what happens if that server is down? How would the clients/servers know what other nodes there are in the network without a central server?
Ideally, I don't want the clients to know the IP addresses of the different nodes; however, I don't think this would be possible without a central server.
As stated the application will be written in VB.Net, but I think the language doesn't really matter at this point.
Just want to know the different approaches I can follow.
Look, for example, at the paper on the Kademlia protocol (you can find it here). If you just want a quick overview, look at the Wikipedia page http://en.wikipedia.org/wiki/Kademlia. The Kademlia protocol defines a decentralized way of performing node lookups in a network. It has been successfully applied in eMule, so it has been proven to work in practice.
It should cause no serious problems to apply it to your chat software.
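The decentralized node lookup at the heart of Kademlia can be sketched as a toy iterative search in Python. Real Kademlia sends FIND_NODE RPCs over the network in parallel; here a dictionary `routing_tables` (hypothetical) stands in for asking each node which contacts it knows, and the 4-bit IDs and `k`/`alpha` values are chosen only to keep the example small.

```python
def iterative_find_node(target, start_contacts, routing_tables, k=3, alpha=2):
    """Repeatedly query the closest not-yet-queried nodes until the k closest
    known nodes have all been queried."""
    def by_distance(ids):
        return sorted(set(ids), key=lambda n: n ^ target)[:k]

    shortlist = by_distance(start_contacts)
    queried = set()
    while True:
        to_query = [n for n in shortlist if n not in queried][:alpha]
        if not to_query:
            return shortlist
        for node in to_query:
            queried.add(node)
            # Stand-in for a FIND_NODE RPC: the node returns the contacts it knows.
            shortlist = by_distance(shortlist + routing_tables.get(node, []))


# 4-bit toy network where each node knows the IDs differing from it in one bit.
tables = {n: [n ^ (1 << b) for b in range(4)] for n in range(16)}
print(iterative_find_node(13, [0], tables))  # [13, 12, 15]
```

Starting from node 0, which knows only four contacts, the search converges on the target's ID in a few steps without any node holding a global view of the network.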
You need some known IP address for clients to initially get into a network. Once a client is part of a network, things can be more decentralized, but that first step needs something.
There are basically only two options - either the user provides one (for an existing node of the network - essentially how BitTorrent trackers work), or you hard-code in a gateway node (which is effectively a central server).
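Both options boil down to an ordered list of entry points tried in turn. A Python sketch, assuming a user-supplied address is tried before hard-coded gateways; the hostnames and port are placeholders, and the injectable `connect` parameter exists only so the logic can be exercised without a network.

```python
import socket

# Hypothetical fallback gateways; a real client would ship its own list.
HARDCODED_GATEWAYS = [("gateway1.example.org", 5000), ("gateway2.example.org", 5000)]


def _tcp_connect(addr, timeout=3.0):
    return socket.create_connection(addr, timeout=timeout)


def bootstrap(user_supplied=None, gateways=HARDCODED_GATEWAYS, connect=_tcp_connect):
    """Return (address, connection) for the first reachable entry point."""
    candidates = ([user_supplied] if user_supplied else []) + list(gateways)
    for addr in candidates:
        try:
            return addr, connect(addr)
        except OSError:
            continue  # unreachable or unresolvable: try the next entry point
    raise ConnectionError("no entry point into the network is reachable")
```

Once connected, the client would ask that node for further peers and cache them, so that later startups depend less on the gateways.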
Maybe you could look at the uChat program. It's a program from the creator of uTorrent, designed with serverless chat in mind.
The idea is to connect to a swarm from a magnet link and use it to send and receive messages. As in Amber's answer, you still need an access point, whether it's a server, a known swarm, a manually entered IP, etc.
Here is uChat presentation: http://blog.bittorrent.com/2011/06/30/uchat-we-just-need-each-other/

Which DHT algorithm to use (if I want to join two separate DHTs)?

I've been looking into some DHT systems, especially Pastry and Chord. I've read some concerns about Chord's reaction to churn, though I believe that won't be a problem for the task at hand. For a course project, I'm implementing a kind of social network service that doesn't rely on any central servers. I need the DHT for the lookups.
I don't know all the servers in the network at the beginning. As I've stated, there's no main tracker server. It works this way: each client has three dedicated servers, which hold replicated copies of the client's profile, its wall, and its personal info. I only get to know about another group of servers when the user adds a friend (by inputting that client's address). So I would create two separate DHTs on the two groups of three servers, and when the users friend each other I would like to join the DHTs. I would like to do this consistently. I haven't had a lot of time to get familiar with the protocols, so which one is better if I want to join two separate DHTs?
Distributed hash tables are designed to automatically handle the problem of finding a node that stores a given piece of data. So, in the DHT design philosophy, you wouldn't have a dedicated server for the profile, wall, etc... you'd have a dedicated data-identifier for each of those and the DHT would handle placing the data amongst active servers and finding the correct server for a given piece of data.
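To illustrate the point that data is addressed by an identifier rather than pinned to dedicated servers, here is a Python sketch of Chord-style placement: each key is stored on its successor, the first node at or clockwise after the key's position on the identifier ring. The 16-bit ring and name-based IDs are simplifications (Chord derives larger identifiers from SHA-1), and `ring_id`/`successor` are illustrative helpers, not part of any library.

```python
import bisect
import hashlib

RING_BITS = 16  # toy identifier space


def ring_id(name: str) -> int:
    """Map a node name or data key onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


def successor(key_id: int, node_ids: list) -> int:
    """First node ID >= key_id, wrapping around to the smallest ID."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]


servers = [ring_id(s) for s in ("server-a", "server-b", "server-c", "server-d")]
owner = successor(ring_id("alice/wall"), servers)
print(owner in servers)
```

When a new node joins the ring, only the keys between its predecessor and itself change owner, so nodes can join without reshuffling all stored data.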
Pastry and Chord are pretty similar in terms of functionality and differ mostly in how they handle neighbor sets and routing. It's not clear to me that one would be better than the other for this sort of application.
A good technical comparison paper is A performance vs. cost framework for evaluating DHT design tradeoffs under churn (PDF), from Infocom 2005, if you really want details.

Resources