Why does Pastry DHT have efficient routing?

Recently I read some articles about Pastry DHT. The articles said that Pastry has efficient routing. In Pastry's routing, each hop takes the message to a node whose ID shares a longer common prefix with the destination node's ID, but node IDs are assigned randomly, so it is possible that a message travels a very long physical distance before it arrives at the destination, and as a result the routing would not be efficient.
For example, in a Pastry route where the destination node ID is d467c4 and the starting node ID is 65a1fc, the route is 65a1fc -> d13da3 -> d4213f -> d462ba -> d46702 -> d467c4. It is possible that the nodes on this route are spread all over the world (IDs are assigned randomly), so the message may travel around the world before it arrives at the final node. That route does not look efficient.
So why is Pastry DHT's routing considered efficient?

That depends on your notion of efficiency. When designing overlay networks, the first concern usually is to bound the total number of hops relative to the network size. In other words, if there are n nodes you don't want O(n)-hop routes; O(log n) is the usual goal, because it can be achieved without total network awareness.
Route length in terms of latency, path cost, or minimum bandwidth along the links is a second-rank concern. It is usually addressed by adding some sort of locality awareness or clustering after the hop count has been optimized.
Pastry is efficient for the hop metric.
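
To make the hop bound concrete: with IDs interpreted in base 2^b (b = 4 in the Pastry paper), each hop matches at least one more digit of the destination ID, so a route needs roughly ceil(log_{2^b} N) hops. A minimal Go sketch of that arithmetic (the node counts below are just illustrative):

    package main

    import (
        "fmt"
        "math"
    )

    // expectedHops returns the Pastry hop bound ceil(log_{2^b}(n)) for a network
    // of n nodes using digits of b bits (b = 4, i.e. hex digits, in the paper).
    func expectedHops(n, b float64) int {
        return int(math.Ceil(math.Log(n) / math.Log(math.Pow(2, b))))
    }

    func main() {
        // Even a million-node network needs only about 5 hops with b = 4.
        for _, n := range []float64{1e3, 1e6, 1e9} {
            fmt.Printf("n = %.0e -> about %d hops\n", n, expectedHops(n, 4))
        }
    }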

When selecting node IDs to fill the slots in each row of the routing table, Pastry prefers nodes that are topologically close to it. The lower the row number, say i, the more candidates there are to choose the nearest node from, since only the first i digits of the ID need to match. As the row number goes up, the pool of eligible candidates shrinks, so the later hops of a route may have higher latency. In practice this means the early hops tend to cover short network distances and only the last hop or two may be long, so the total distance travelled is typically within a small factor of the direct route.
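
As a rough illustration of that preference (the candidate type, the proximity field and pickForSlot are invented for this sketch; real implementations such as FreePastry keep considerably more state): among the candidates that fit a given routing-table slot, keep the topologically closest one.

    package main

    import "fmt"

    // candidate is a hypothetical view of a node that could fill a routing-table slot.
    type candidate struct {
        id        string  // hex node ID
        proximity float64 // e.g. measured RTT in ms; lower means topologically closer
    }

    // sharedPrefixLen counts the leading hex digits two IDs have in common.
    func sharedPrefixLen(a, b string) int {
        n := 0
        for n < len(a) && n < len(b) && a[n] == b[n] {
            n++
        }
        return n
    }

    // pickForSlot chooses, among candidates that share exactly `row` digits with
    // ourID and whose next digit is `col`, the topologically closest one. Low rows
    // have many eligible candidates, so the chosen node tends to be very close;
    // high rows have few candidates, so later hops may be farther away.
    func pickForSlot(ourID string, row int, col byte, cands []candidate) *candidate {
        var best *candidate
        for i := range cands {
            c := &cands[i]
            if sharedPrefixLen(ourID, c.id) != row || row >= len(c.id) || c.id[row] != col {
                continue
            }
            if best == nil || c.proximity < best.proximity {
                best = c
            }
        }
        return best
    }

    func main() {
        cands := []candidate{{"d13da3", 120}, {"d4213f", 45}, {"d462ba", 80}}
        if c := pickForSlot("65a1fc", 0, 'd', cands); c != nil {
            fmt.Println("row 0, column d ->", c.id) // d4213f, the closest of the three
        }
    }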

Related

How can I increase the number of peers in my routing table associated with a given infohash

I'm working on a side project and trying to monitor peers on popular torrents, but I can't see how I can get a hold of the full dataset.
If the theoretical limit on routing table size is 1,280 (from 160 buckets * bucket size k = 8), then I'm never going to be able to hold the full number of peers on a popular torrent (~9,000 on a current top-100 torrent).
My concern with simulating multiple nodes is low efficiency due to overlapping values. I would assume that their bootstrapping paths being similar would result in similar routing tables.
Your approach is wrong, since it would violate the reliability goals of the DHT: you would essentially be performing an attack on that keyspace region, other nodes may detect it and blacklist you, and it would also simply be bad-mannered.
If you want to monitor specific swarms, don't collect data passively from the DHT:
- If the torrents have trackers, just contact them to get peer lists.
- Connect to the swarm and get peer lists via PEX, which provides far more accurate information than the DHT.
- If you really want to use the DHT, perform active lookups (get_peers) at regular intervals (see the sketch below).
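
If you go with the last option, a minimal Go sketch of scheduling those lookups might look like the following; getPeers is a stand-in for whatever get_peers traversal your DHT library actually exposes, and the interval is just a guess to tune:

    package main

    import (
        "fmt"
        "time"
    )

    // getPeers is a placeholder for an active DHT lookup (a BEP5 get_peers
    // traversal) for one infohash; substitute the call your DHT library provides.
    func getPeers(infohash string) []string {
        return nil // hypothetical
    }

    func main() {
        infohashes := []string{"<infohash of the swarm to monitor>"}
        ticker := time.NewTicker(15 * time.Minute) // poll interval, tune to taste
        defer ticker.Stop()
        for range ticker.C {
            for _, ih := range infohashes {
                peers := getPeers(ih)
                fmt.Printf("%s: %d peers seen this round\n", ih, len(peers))
            }
        }
    }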

Kademlia closest good nodes won't intersect enough between two requests

I'm working on a BEP44 implementation and I use the standard Kademlia algorithm to find the closest good nodes for a given hash ID.
Using my program I run go run main.go -put "Hello World!" -kname mykey -salt foobar2 -b public and the value gets stored on over a hundred nodes (good).
Now, when I run it multiple consecutive times, the sets of IPs written to by the put requests barely intersect.
This is a problem, because when I then do a get request, the set of IPs queried does not intersect with the put set, so the value is not found.
In my tests I use the public DHT bootstrap nodes
"router.utorrent.com:6881",
"router.bittorrent.com:6881",
"dht.transmissionbt.com:6881",
When I query the nodes, I select the 8 closest nodes (nodes := s.ClosestGoodNodes(8, msg.InfoHash())), which usually ends up in a list of ~1K queries after a recursive traversal.
In my understanding, storing addresses for the info hash in the DHT is deterministic given the state of the routing tables. As I am doing consecutive queries I expect the tables to change, but not that much.
How can it be that the sets of store nodes barely intersect?
Since BEP44 is an extension it is only supported by a subset of the DHT nodes, which means the iterative lookup mechanism needs to take support into account when determining whether the set of closest nodes is stable and the lookup can be terminated.
If a node returns a token, v or seq field in a get response then it is eligible for the closest-set of a read-only get.
If a node returns a token then it is eligible for the closest-set for a get that will be followed by put operation.
So your lookup may home in on a set of nodes in the keyspace that is closest to the target ID but not eligible for the operations in question. As long as you have candidates that are closer than the best known eligible contacts you have to continue searching. I call this perimeter widening, as it conceptually broadens the search area around the target.
Additionally, you need to take error responses or the absence of a response into account when performing put requests. You can either retry the node or try the next eligible node instead.
I have written down some additional constraints that one might want to put on the closest set in lookups for robustness and security reasons in the documentation of my own DHT implementation.
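
For illustration, a rough Go sketch of the eligibility checks described above; the getResponse struct is hypothetical, but its fields mirror the token, v and seq keys of a BEP44 get response:

    package main

    import "fmt"

    // getResponse is a hypothetical decoded reply to a BEP44 get query.
    type getResponse struct {
        Token string // write token, required before a node will accept a put
        V     []byte // stored value, if the node has one
        Seq   *int64 // sequence number for mutable data, if present
    }

    // eligibleForRead reports whether the responding node may enter the
    // closest-set of a read-only get: any of token, v or seq is enough.
    func eligibleForRead(r getResponse) bool {
        return r.Token != "" || r.V != nil || r.Seq != nil
    }

    // eligibleForWrite reports whether the node may enter the closest-set of a
    // get that will be followed by a put: it must have returned a write token.
    func eligibleForWrite(r getResponse) bool {
        return r.Token != ""
    }

    func main() {
        r := getResponse{V: []byte("Hello World!")} // replied with a value but no token
        fmt.Println(eligibleForRead(r), eligibleForWrite(r)) // true false
    }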
which usually end up in a list of ~1K queries after a recursive traversal.
This suggests something is wrong with your lookup algorithm. In my experience a lookup should only take somewhere between 60 and 200 UDP requests to find its target if you're doing a lookup with concurrent requests, maybe even fewer when it is sequential.
Verbose logging of the terminal sets, to eyeball how the lookups make progress and how much junk I am getting from peers, has served me well.
In my tests I use the public DHT bootstrap nodes
You should write your routing table to disk and reload it from there, and only perform bootstrapping when none of the persisted nodes in your routing table are reachable. Otherwise you are wasting the bootstrap nodes' resources and also wasting time by having to re-populate your routing table before performing any lookups.
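
A minimal sketch of that persistence step, assuming your routing table can be flattened to a list of address strings and that plain JSON on disk is acceptable (both are assumptions of this sketch, not requirements):

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // saveNodes writes known-good node addresses to disk so the next start can
    // reuse them instead of hitting the public bootstrap nodes again.
    func saveNodes(path string, addrs []string) error {
        data, err := json.Marshal(addrs)
        if err != nil {
            return err
        }
        return os.WriteFile(path, data, 0o644)
    }

    // loadNodes reads the persisted addresses back; an error or an empty result
    // means you fall back to the bootstrap nodes.
    func loadNodes(path string) ([]string, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return nil, err
        }
        var addrs []string
        err = json.Unmarshal(data, &addrs)
        return addrs, err
    }

    func main() {
        _ = saveNodes("routing_table.json", []string{"203.0.113.7:6881"})
        addrs, _ := loadNodes("routing_table.json")
        fmt.Println(addrs)
    }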

Will Elasticsearch survive this much load or simply die?

We have an Elasticsearch setup with one cluster of 3 nodes, and we are expecting 800-1000 queries fired per second. If we get a load of around 1,000 queries per second, will the Elasticsearch server respond with delays or will it simply stop working?
Queries are all query_string, fuzzy (prefix & wildcard queries are not used).
There are a few factors to consider, assuming that your network has the necessary throughput:
What's the CPU speed and number of cores for each node?
You should have 2 GHz quad cores at the very least. The nodes should also be dedicated to ELK, so they aren't busy with other tasks.
How much RAM do your nodes have?
You probably want to be north of 10 GB at least.
Are your logs filtered and indexed?
Having your logs filtered will greatly reduce the workload generated by the queries. Additionally, filtered logs can make it so that you don't have to query as much with wildcards (which are very expensive).
Hope that helps point you in a better direction :)
One immediate suggestion: if you are expecting sustained query rates of 800 - 1K/sec, you do not want the nodes storing the data (which will be handling indexing of new records, merging, and shard rebalancing) to also have to deal with query scatter/gather operations. Consider a client + data node topology where you keep your 3 nodes and add n client nodes (data and master set to false in their configs). The actual value for n will vary based on your actual performance; this is something you'll want to determine via experimentation.
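
For reference, a hedged sketch of the relevant settings in each client node's elasticsearch.yml (these are the classic node.master / node.data flags; newer Elasticsearch releases express the same idea through node roles):

    # elasticsearch.yml on each client (coordinating-only) node: it joins the
    # cluster, fans queries out to the data nodes and merges the results, but
    # holds no shards and is never elected master.
    node.master: false
    node.data: false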
Other factors equal or unknown, abundant memory is a good resource to have. Review the Elastic team's guidance on hardware and be sure to link through to the discussion on heap.

Routing table creation at a node in a Pastry P2P network

This question is about the routing table creation at a node in a p2p network based on Pastry.
I'm trying to simulate this scheme of routing table creation in a single JVM. I can't seem to understand how these routing tables are created from the point of joining of the first node.
I have N independent nodes, each with a 160-bit nodeId generated as a SHA-1 hash, and a function to determine the proximity between these nodes. Let's say the 1st node starts the ring and joins it. The protocol says that this node should have had its routing tables set up at this time. But I do not have any other nodes in the ring at this point, so how does it even begin to create its routing tables?
When the 2nd node wishes to join the ring, it sends a Join message (containing its nodeId) to the 1st node, which is passed around in hops to the closest available neighbor for this 2nd node already existing in the ring. These hops contribute to the creation of routing table entries for this new 2nd node. Again, in the absence of a sufficient number of nodes, how do all these entries get created?
I'm just beginning to take a look at the FreePastry implementation to get these answers, but it doesn't seem very apparent at the moment. If anyone could provide some pointers here, that'd be of great help too.
My understanding of Pastry is not complete, by any stretch of the imagination, but it was enough to build a more-or-less working version of the algorithm. Which is to say, as far as I can tell, my implementation functions properly.
To answer your first question:
The protocol says that this [first] node should have had its routing tables
set up at this time. But I do not have any other nodes in the ring at
this point, so how does it even begin to create its routing tables?
I solved this problem by first creating the Node and its state/routing tables. The routing tables, when you think about it, are just information about the other nodes in the network. Because this is the only node in the network, the routing tables are empty. I assume you have some way of creating empty routing tables?
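
For what it's worth, a sketch of what "empty" state tables can look like in code; the nodeRef and state types are invented for this illustration, sized for the 160-bit SHA-1 IDs from the question with b = 4 (40 hex digits):

    package main

    import "fmt"

    // nodeRef is a minimal stand-in for a state-table entry: a node ID plus
    // whatever address/proximity information you track for it.
    type nodeRef struct {
        ID   string // 40 hex digits for a 160-bit SHA-1 node ID
        Addr string
    }

    // state holds the three Pastry tables. A freshly started first node simply
    // has all of them empty; they only fill up as other nodes join.
    type state struct {
        routing   [40][16]*nodeRef // 160/4 rows x 2^4 columns for b = 4
        leafSet   []nodeRef        // numerically closest node IDs, empty at start
        neighbors []nodeRef        // topologically closest nodes, empty at start
    }

    func newState() *state { return &state{} }

    func main() {
        s := newState()
        fmt.Println(len(s.leafSet), s.routing[0][0] == nil) // 0 true: nothing known yet
    }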
To answer your second question:
When the 2nd node wishes to join the ring, it sends a Join
message(containing its nodeID) to the 1st node, which it passes around
in hops to the closest available neighbor for this 2nd node, already
existing in the ring. These hops contribute to the creation of routing
table entries for this new 2nd node. Again, in the absence of
sufficient number of nodes, how do all these entries get created?
You should take another look at the paper (PDF warning!) that describes Pastry; it does a rather good job of explaining the process for nodes joining and exiting the cluster.
If memory serves, the second node sends a message that not only contains its node ID, but actually uses its node ID as the message's key. The message is routed like any other message in the network, which ensures that it quickly winds up at the node whose ID is closest to the ID of the newly joined node. Every node that the message passes through sends their state tables to the newly joined node, which it uses to populate its state tables. The paper explains some in-depth logic that takes the origin of the information into consideration when using it to populate the state tables in a way that, I believe, is intended to reduce the computational cost, but in my implementation, I ignored that, as it would have been more expensive to implement, not less.
To answer your question specifically, however: the second node will send a Join message to the first node. The first node will send its state tables (empty) to the second node. The second node will add the sender of the state tables (the first node) to its state tables, then add the appropriate nodes in the received state tables to its own state tables (no nodes, in this case). The first node would forward the message on to a node whose ID is closer to that of the second node's, but no such node exists, so the message is considered "delivered", and both nodes are considered to be participating in the network at this time.
Should a third node join and route a Join message to the second node, the second node would send the third node its state tables. Then, assuming the third node's ID is closer to the first node's, the second node would forward the message to the first node, who would send the third node its state tables. The third node would build its state tables out of these received state tables, and at that point it is considered to be participating in the network.
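
A rough sketch of that population step, using the simplification mentioned above (just merge whatever state arrives rather than the row-by-row placement from the paper); the types here are invented for the illustration:

    package main

    import "fmt"

    // nodeRef and state are invented for this sketch (and deliberately simpler
    // than the real thing): one flat map instead of three separate tables.
    type nodeRef struct{ ID, Addr string }

    type state struct {
        known map[string]nodeRef
    }

    // absorb merges a received state table (plus its sender) into our own state.
    // Real Pastry slots each entry into a specific routing-table row/column,
    // the leaf set or the neighborhood set instead of a single map.
    func (s *state) absorb(sender nodeRef, received []nodeRef) {
        s.known[sender.ID] = sender
        for _, n := range received {
            s.known[n.ID] = n
        }
    }

    func main() {
        second := &state{known: map[string]nodeRef{}}
        first := nodeRef{ID: "d467c4", Addr: "198.51.100.1:9001"}
        // The first node's state tables are empty, so the second node ends up
        // knowing only about the first node itself -- exactly the case above.
        second.absorb(first, nil)
        fmt.Println(len(second.known)) // 1
    }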
Hope that helps.

When does Cassandra hit Amdahl's law?

I am trying to understand the claims that Cassandra scales linearly with the number of nodes. In a quick look around the 'net I have not seen much of a treatment of this topic. Surely there are serial processing elements in Cassandra that must limit the speed gained as N increases. Any thoughts, pointers or links on this subject would be appreciated.
Edit to provide perspective:
I am working on a project that has a current request for a 1,000+ node Cassandra infrastructure. I did not come up with this spec. I find myself proposing that N be reduced to a range between 200 and 500, with each node being at least twice as fast for serial computation. This is easy to achieve without a cost penalty per node by making simple changes to the server configuration.
Cassandra's scaling is better described in terms of Gustafson's law, rather than Amdahl's law. Gustafson scaling looks at how much more data you can process as the number of nodes increases. That is, if you have N times as many nodes, you can process a dataset N times larger in the same amount of time.
This is possible because Cassandra uses very little cluster-wide coordination, except for schema and ring changes. Most operations only involve a number of nodes equal to the replication factor, which stays constant as the dataset grows -- hence nearly linear scale out.
By contrast, Amdahl scaling looks at how much faster you can process a fixed dataset as the number of nodes increases. That is, if you have N times as many nodes, can you process the same dataset N times faster?
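
In formulas, with p the parallelizable fraction of the work and N the number of nodes:

    S_{\mathrm{Amdahl}}(N)    = \frac{1}{(1 - p) + p/N}
    S_{\mathrm{Gustafson}}(N) = (1 - p) + p\,N

Amdahl's speedup saturates at 1/(1 - p) as N grows, while Gustafson's scaled speedup keeps growing because the problem size is allowed to grow with N.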
Clearly, at some point you reach a limit where adding more nodes doesn't make your requests any faster, because there is a minimum amount of time needed to service a request. Cassandra is not linear here.
In your case, it sounds like you're asking whether it's better to have 1,000 slow nodes or 200 fast ones. How big is your dataset? It depends on your workload, but the usual recommendation is that the optimal size of nodes is around 1TB of data each, making sure you have enough RAM and CPU to match (see cassandra node limitations). 1,000 sounds like far too many, unless you have petabytes of data.
