libp2p - How to discover initial peers? - rust

In the Bitcoin Core p2p client, the initial peers are found as follows (quoting the developer guide):
When started for the first time, programs don’t know the IP addresses
of any active full nodes. In order to discover some IP addresses, they
query one or more DNS names (called DNS seeds) hardcoded into Bitcoin
Core and BitcoinJ. The response to the lookup should include one or
more DNS A records with the IP addresses of full nodes that may accept
new incoming connections. For example, using the Unix dig command
(https://en.wikipedia.org/wiki/Dig_%28Unix_command%29):
source: https://developer.bitcoin.org/devguide/p2p_network.html
Is the same approach required in libp2p for initial peer discovery? I was not able to find any tutorial that covers this. I was hoping libp2p would handle this problem. Does libp2p provide guidance or facilities for this?
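For what it's worth, libp2p itself leaves the choice of initial peers to the application: there is no built-in equivalent of Bitcoin's DNS seeds. The common pattern is to hardcode a handful of bootstrap multiaddrs (IPFS does exactly this), seed the Kademlia DHT's routing table with them, and let the DHT discover further peers; mDNS is also available for finding peers on the local network. Below is a minimal sketch against rust-libp2p, assuming roughly v0.53 with the tokio, tcp, dns, noise, yamux and kad features (plus the tokio and futures crates); the builder API changes between releases, and the bootstrap addresses shown are the public IPFS ones, which may change over time.

    use futures::StreamExt;
    use libp2p::kad::store::MemoryStore;
    use libp2p::swarm::SwarmEvent;
    use libp2p::{kad, noise, tcp, yamux, Multiaddr, PeerId};

    // Peer IDs of the public IPFS bootstrap nodes: libp2p's rough
    // equivalent of Bitcoin's hardcoded DNS seeds.
    const BOOTNODES: [&str; 4] = [
        "QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
        "QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
        "QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
        "QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    ];

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let mut swarm = libp2p::SwarmBuilder::with_new_identity()
            .with_tokio()
            .with_tcp(tcp::Config::default(), noise::Config::new, yamux::Config::default)?
            .with_dns()? // needed to resolve the /dnsaddr bootstrap address
            .with_behaviour(|key| {
                // A Kademlia behaviour speaking the default /ipfs/kad/1.0.0 protocol.
                let store = MemoryStore::new(key.public().to_peer_id());
                kad::Behaviour::new(key.public().to_peer_id(), store)
            })?
            .build();

        // Seed the routing table with the hardcoded bootstrap peers, then
        // walk the DHT to populate it with further peers.
        let seed_addr: Multiaddr = "/dnsaddr/bootstrap.libp2p.io".parse()?;
        for peer in BOOTNODES {
            swarm
                .behaviour_mut()
                .add_address(&peer.parse::<PeerId>()?, seed_addr.clone());
        }
        swarm.behaviour_mut().bootstrap()?;

        loop {
            if let SwarmEvent::Behaviour(kad::Event::RoutingUpdated { peer, .. }) =
                swarm.select_next_some().await
            {
                println!("discovered peer: {peer}");
            }
        }
    }

Strictly speaking the bootstrap nodes are a convention, not part of the protocol: any reachable peer's multiaddr works equally well as a first contact.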

Related

How does the bitcoin client determine the first IP address to connect to?

To my knowledge, Bitcoin is a p2p protocol, and I thought a p2p protocol must have a dedicated central server. But it is said that Bitcoin is decentralized.
Back in 2009 we relied on IRC to bootstrap the network: every node would connect to Freenode (later LFnet) and join a channel, and each node's nickname was its encoded public IP address.
Nowadays the Bitcoin Core client, and many other implementations, rely on DNS seeds. DNS seeds are special DNS servers that are configured to return a number of randomly selected nodes from the network. The operators of the DNS seeds also run crawlers to enumerate the publicly reachable nodes that are to be returned by the seeds.
The seeds that are currently included in the Bitcoin Core client are:
bitcoin.sipa.be
dnsseed.bluematt.me
dnsseed.bitcoin.dashjr.org
seed.bitcoinstats.com
bitseed.xf2.org
bitcoin.jonasschnelli.ch
If you send a request to any of these servers, they will return a number of random IPs that are known to run Bitcoin on port 8333:
dig seed.bitcoinstats.com +short
71.19.155.244
173.254.232.51
45.79.97.30
198.252.112.64
35.128.8.141
108.17.18.165
98.208.76.134
8.29.28.12
52.62.2.124
96.234.214.85
47.89.24.56
212.164.215.159
52.62.42.229
68.52.96.191
115.66.205.171
24.250.16.39
201.43.160.155
5.3.253.18
100.40.179.172
50.135.169.181
186.149.249.18
101.201.44.207
96.35.97.46
124.188.118.196
82.8.4.79
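For illustration, the same seed lookup can be done programmatically. A minimal Rust sketch using only the standard library's resolver (the seed host is taken from the list above; each returned A record is a candidate peer on the default Bitcoin port 8333):

    use std::net::ToSocketAddrs;

    fn main() -> std::io::Result<()> {
        // Resolve the DNS seed just like `dig` does; the OS resolver
        // returns the A/AAAA records the seed currently serves.
        for addr in ("seed.bitcoinstats.com", 8333).to_socket_addrs()? {
            println!("candidate peer: {addr}");
        }
        Ok(())
    }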
Besides the DNS seeds, the Core client also has a static list of IPs to try first, and it will cache any previously contacted peers in a local database so it can reconnect without having to query the DNS seeds.
(Disclaimer: I am the operator of one of the DNS seeds)

IGMPv2 flood source detection

In Wireshark I can see general IGMPv2 "Membership Query" requests coming over and over from a 0.0.0.0 source, which (according to the RFC) suggests a machine that hasn't received an address yet. My question is: how can I find such a machine in a Linux environment? These queries trigger many answers and cause a significant network communication slowdown.
When a machine is connected to a network for the first time, it will try to find the DHCP servers in order to get an IP address configuration. Until then, as you already said, it has no IP address, and the only identifier it has is its MAC address, which is used to keep communication alive while it negotiates with the DHCP server (it does not get an IP address until the very end of that negotiation).
Answering your question: you'd find the machine you are looking for by its MAC address. If you are on a small network, a manual check (ifconfig on each host) will do it, but if you are on a big one, you'd better check the MAC address (CAM) table of your switch(es) to get a better idea of where it could be.
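The "manual check" can be scripted. A small Rust sketch for Linux hosts that lists each interface's MAC address by reading /sys/class/net/<iface>/address; run it on candidate machines and compare against the source MAC Wireshark shows for the IGMP queries:

    use std::fs;

    fn main() -> std::io::Result<()> {
        // Every network interface appears as a directory under
        // /sys/class/net; its `address` file holds the MAC.
        for entry in fs::read_dir("/sys/class/net")? {
            let entry = entry?;
            let mac = fs::read_to_string(entry.path().join("address"))?;
            println!("{}: {}", entry.file_name().to_string_lossy(), mac.trim());
        }
        Ok(())
    }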

How does the PEX protocol (magnet links) find its first IP?

I'm trying to understand how a magnet link can work. As I've read, clients use DHT and PEX to get the peers, but if I'm a new node in the network, how can I find peers with only the hash of the file? Doesn't it always require a link to a known host?
Thanks
The bittorrent DHT can be bootstrapped in many ways. It just needs the IP and port of any other reachable DHT node out there.
Current clients generally use several of the following strategies:
bootstrap from a cache of long-lived nodes from a previous session
use a DNS A/AAAA record mapping to a known node (e.g. router.bittorrent.com or dht.transmissionbt.com) with a known port (see the sketch after this answer)
use a node embedded in a .torrent file
retrieve the DHT port from a bittorrent client over a bittorrent connection established through other means, e.g. a conventional tracker.
If a peer is embedded in a magnet link, one can also piggyback a DHT bootstrap on that connection through the bittorrent PORT message
multicast neighbor discovery via LSD
cross-chatter from the IPv4 to the IPv6 DHTs and vice versa (if needed)
Other ways work too, such as user-configurable bootstrap lists, DNS SRV records round-robin mapping to live nodes, or, should everything else fail, manually adding the IP of a friend's node.
Once a node has joined the network the first strategy mentioned above will kick in and it is unlikely that it will have to bootstrap again.
So while most implementations rely on a single/few points of entry into the network for convenience, the protocol itself is flexible enough to decentralize the points of entry too.
Just for emphasis: Any node in the DHT can be used to join the network. Dedicated bootstrap nodes are an implementation detail, not part of the protocol, and could be replaced by other discovery mechanisms if necessary.
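To make the DNS-name strategy concrete, here is a minimal Rust sketch that resolves the well-known bootstrap name and sends a KRPC ping (wire format per BEP 5) over UDP. The 20-byte node ID is a placeholder; a real client would use a persistent random ID and follow up with find_node queries to fill its routing table:

    use std::net::{ToSocketAddrs, UdpSocket};
    use std::time::Duration;

    fn main() -> std::io::Result<()> {
        // A DNS name mapping to a known, long-lived DHT node.
        let bootstrap = "router.bittorrent.com:6881"
            .to_socket_addrs()?
            .next()
            .expect("DNS returned no addresses");

        let socket = UdpSocket::bind("0.0.0.0:0")?;
        socket.set_read_timeout(Some(Duration::from_secs(5)))?;

        // A KRPC `ping` query, bencoded as specified in BEP 5.
        let node_id = [0x42u8; 20]; // placeholder ID
        let mut msg = Vec::new();
        msg.extend_from_slice(b"d1:ad2:id20:");
        msg.extend_from_slice(&node_id);
        msg.extend_from_slice(b"e1:q4:ping1:t2:aa1:y1:qe");
        socket.send_to(&msg, bootstrap)?;

        // A live node replies with a bencoded dict containing its own
        // node ID; `find_node` queries against it would come next.
        let mut buf = [0u8; 1500];
        let (n, from) = socket.recv_from(&mut buf)?;
        println!("got {n} bytes from {from}");
        Ok(())
    }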

Where does p2p software connect to initially?

In BitTorrent, the client connects to the tracker specified in the .torrent file. The tracker is a kind of centralized server, and it is the starting point, so BitTorrent is not pure p2p.
If we want to develop a pure p2p system, we should design a routing overlay network, where all nodes have routing tables like routers do. But even in a routing overlay network, each node initially needs to know at least one existing node (GUID, IP address). So how can we determine this? Should we keep 'one existing node to connect to initially' forever, like a fixed centralized server? If so, I think this is not a fully decentralized method.
The solution you describe (defining a central peer with a well-known IP address) is not the only one.
Another solution is to post an HTML page (or JSON file) at a well-known URL on the net, and make sure this resource contains an updated list of peers a new peer can initially connect to. This list can be updated if a peer goes down. You can also use multiple URLs.
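A hypothetical sketch of that approach in Rust, assuming the reqwest crate (blocking and json features enabled); the URL and the JSON shape (a plain array of "host:port" strings) are made up for illustration:

    // Fetch the published peer list from a well-known URL.
    fn fetch_bootstrap_peers() -> Result<Vec<String>, reqwest::Error> {
        // Hypothetical endpoint; expected body:
        // ["203.0.113.7:4001", "198.51.100.23:4001"]
        reqwest::blocking::get("https://example.org/peers.json")?
            .json::<Vec<String>>()
    }

    fn main() {
        match fetch_bootstrap_peers() {
            Ok(peers) => println!("bootstrap candidates: {peers:?}"),
            Err(e) => eprintln!("fetch failed: {e}"),
        }
    }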
A pure P2P system is a theoretical concept which cannot be fully implemented in reality.
You could use anycast: the first client to answer can send an initial "client list", and your client can then connect to those peers to get more lists.
Classically, I would implement a multicast to a well-known address and wait for answers from other clients.
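A minimal Rust sketch of that multicast idea, using only the standard library; the group address, port, and hello message are made up for illustration:

    use std::net::{Ipv4Addr, UdpSocket};

    fn main() -> std::io::Result<()> {
        // An arbitrary administratively-scoped multicast group.
        let group = Ipv4Addr::new(239, 255, 70, 70);
        let port = 7070;

        let socket = UdpSocket::bind(("0.0.0.0", port))?;
        socket.join_multicast_v4(&group, &Ipv4Addr::UNSPECIFIED)?;
        socket.set_multicast_loop_v4(false)?; // don't hear our own hello

        // Announce ourselves; listening peers answer from their own
        // address, which becomes our first peer candidate.
        socket.send_to(b"PEER_DISCOVERY_HELLO", (group, port))?;

        let mut buf = [0u8; 256];
        let (n, from) = socket.recv_from(&mut buf)?;
        println!("peer candidate at {from} ({n} bytes)");
        Ok(())
    }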
Firstly, a true peer to peer network is not necessarily decentralized. Secondly, decentralization does not necessarily mean that the network doesn't make use of secondary services which may themselves be centralized.
The main question in both of these issues is whether the primary resources of the network solution are distributed through correlated peers.
For example, peer-to-peer video conferencing may use a central contacts service but still be pure peer-to-peer as long as the peers resolve such issues before entering a true peer-to-peer scope. This would also be decentralized.
What it comes down to is what you're trying to solve by using peer-to-peer. A video conference is a video conference: it starts with video being recorded on one peer and ends with the video being viewed on another. As long as each byte of this data is transferred directly between the peers (even if there are hundreds of peers in the conference, and regardless of how these peers found each other), it is a true peer-to-peer video conference.
Note that the video peers will still be in your typical ring, and that the contacts lists may still use the node key for location information rather than an IP. This will still be a network overlay, as it will still be built over IP, replacing its addressing scheme on the peer level to facilitate true peer-to-peer networking.
What it really comes down to is the concept of a network connection. IP just pushes packets to unspecified routers until they reach a specific address. 'Connections' between endpoints only exist within the higher software levels (including when dealing with TCP/IP). A connection is just the data used within software to understand who is who and how each point can handle data. Peer-to-peer network overlays effectively distribute this data, eliminating the need for each peer to create massive numbers of connections to communicate at a massive scope. Decentralization is not required for this (as long as peer-to-peer communication is not centralized), and a secondary service within the system won't necessarily limit a network's scope or otherwise centralize the actual peer-to-peer networking.
So to answer your question, it doesn't matter where it initially connects in order to be considered peer-to-peer, and different peer-to-peer services will handle this based on their service design.
OK, so I am thinking about writing a p2p protocol for a number of different services for an AI project, and I thought I'd come here to see if I could get some ideas on how to make the initial connection.
I have come across several ways to establish the initial connection:
1) You have a static IP address on the internet that distributes information on other peers. This isn't good, because:
a) it's a single point of failure: the service could go offline, preventing new nodes from making an initial connection to peers;
b) the IP address could change. This could be mitigated by using a domain name which is maintained to point to the current IP address of the peer-information service, but that can in theory be subverted by attackers via spoofing, ARP poisoning, DNS attacks, etc.
2) You could force the user to provide an initial IP address or hostname for another peer, and it's up to the user to find the hostname / IP address / port number. This works, but if someone posts disinformation, or the user cannot find a peer on Google or some other search site, then obviously it's fallible.
3) You could leave it to each peer to publish its own existence in a central location, for example a group of IRC channels or a group of websites. Again, unless it's going through a central trusted domain, it's hard to determine the authenticity of the peer.
4) You could use some kind of nmap-style discovery algorithm that searches through subnets for appropriate protocols. The problem with this approach is that it's slow and likely to attract attention from things like firewalls.
5) This is a variation of 3): you could allow the peers to advertise their own information on a website, and then instead of having to look for the information in a specific location (a particular website, or group of websites), you let Google's crawler find it and do the discovery for you. However, it may take a few days for Google to index the pages carrying the peer information, and again it would be up to you to provide some way to verify the authenticity of the advertised data.
6) If you are interested in an exclusive p2p network that locks out certain parties (for example, a file-sharing network that law enforcement or the MPAA should not be able to access), then you could use 2) plus a referral system: require that the new connection provide the referrer's IP address, and have the service connect to the referrer and ask whether it did indeed refer the referee.
That's all I can think of currently, but if anyone comes up with any other ways to do this, I would be very interested.

Is it allowed to run several different DHT nodes behind the same ip:port pair in Mainline DHT?

Is it allowed to run several different DHT nodes behind the same ip:port pair in Mainline DHT?
And which node should reply to a DHT query message?
All or one of them?
Thank you in advance.
The short answer is: one of them. Each request is expected to yield a single response.
DHT nodes are assumed to have a persistent node ID associated with their (IP, port) pair. If the node ID changes (or, as you phrase it, a different node responds), the entry in the remote node's routing table is likely to be removed and replaced by the new node ID.
It is probably a better idea to run the nodes on different ports, so that requests to the same port result in responses from the same node with the same node ID.
As a side note, Azureus has certain security features in its DHT to mitigate attacks where an attacker takes over a certain area of the node-ID space, by restricting which node IDs can be run on any given IP address. There is a proposal (made by me) to do something similar for the mainline DHT: the DHT security extension. With something like that deployed, you would be limited in how many nodes you could run behind a single IP address.
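As a rough illustration of that kind of restriction, the sketch below derives a small set of allowed node-ID prefixes from an IP address. This is not the exact BEP 42 computation (which uses crc32c over a masked IP combined with a small random value); it only demonstrates how binding IDs to IPs caps how much of the ID space one IP can claim:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    use std::net::Ipv4Addr;

    // With a 3-bit random value r, one IP may occupy at most 8 distinct
    // regions of the node-ID space (simplified stand-in for BEP 42).
    fn allowed_prefixes(ip: Ipv4Addr) -> Vec<u32> {
        (0u8..8)
            .map(|r| {
                let mut h = DefaultHasher::new();
                (ip, r).hash(&mut h);
                (h.finish() >> 43) as u32 // keep a 21-bit prefix
            })
            .collect()
    }

    fn main() {
        for p in allowed_prefixes(Ipv4Addr::new(203, 0, 113, 7)) {
            println!("allowed 21-bit node-ID prefix: {p:021b}");
        }
    }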
