Working on a BEP44 implementation, I use the standard Kademlia algorithm to find the closest good nodes for a given hash ID.
Using my program I run go run main.go -put "Hello World!" -kname mykey -salt foobar2 -b public and the value gets stored on over a hundred nodes (good).
Now, when I run it multiple consecutive times, the sets of IPs written to by the put requests barely intersect.
This is a problem because when I try to do a get request, the set of IPs queried does not intersect with the put set, so the value is not found.
In my tests I use the public DHT bootstrap nodes:
"router.utorrent.com:6881",
"router.bittorrent.com:6881",
"dht.transmissionbt.com:6881",
When I query the nodes, I select the 8 closest nodes (nodes := s.ClosestGoodNodes(8, msg.InfoHash())), which usually end up in a list of ~1K queries after a recursive traversal.
In my understanding, storing values for an info hash in the DHT is deterministic given the state of the routing tables. As I am doing consecutive queries I expect the tables to change, certainly, but not that much.
How can it happen that the sets of storage nodes do not intersect?
Since BEP44 is an extension, it is only supported by a subset of the DHT nodes, which means the iterative lookup mechanism needs to take that support into account when determining whether the set of closest nodes is stable and the lookup can be terminated.
If a node returns a token, v or seq field in a get response, then it is eligible for the closest-set of a read-only get.
If a node returns a token, then it is eligible for the closest-set of a get that will be followed by a put operation.
So your lookup may home in on a set of nodes in the keyspace that is closest to the target ID but not eligible for the operations in question. As long as you have candidates that are closer than the best known eligible contacts you have to continue searching. I call this perimeter widening, as it conceptually broadens the search area around the target.
Additionally, you need to take error responses or the absence of a response into account when performing put requests. You can either retry the same node or try the next eligible node instead.
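To make the termination rule concrete, here is a rough Java sketch of such a check. The Contact type, the set names and the bookkeeping are stand-ins invented for this answer, not code from any real DHT library:

// Sketch only: an eligibility-aware termination check for an iterative
// BEP44 lookup. Everything here is a made-up illustration of the idea.
import java.math.BigInteger;
import java.util.Comparator;
import java.util.TreeSet;

class Contact {
    final BigInteger id;            // 160-bit node ID as an unsigned integer
    Contact(BigInteger id) { this.id = id; }
}

class Bep44Lookup {
    static final int K = 8;
    final BigInteger target;
    // Unqueried candidates, closest to the target first.
    final TreeSet<Contact> candidates;
    // Responded nodes that are eligible for the operation: for a put that
    // means they returned a write token, for a read-only get a token, v or seq.
    final TreeSet<Contact> eligible;

    Bep44Lookup(BigInteger target) {
        this.target = target;
        Comparator<Contact> byDistance = Comparator.comparing(c -> c.id.xor(target));
        candidates = new TreeSet<>(byDistance);
        eligible = new TreeSet<>(byDistance);
    }

    boolean shouldTerminate() {
        if (candidates.isEmpty())
            return true;                 // nothing left to query
        if (eligible.size() < K)
            return false;                // not enough eligible nodes: keep widening
        // Distance of the k-th closest eligible node to the target.
        BigInteger kthEligible = eligible.stream()
                .skip(K - 1).findFirst().get().id.xor(target);
        BigInteger bestCandidate = candidates.first().id.xor(target);
        // Stop only once no unqueried candidate is closer than the k-th
        // closest *eligible* node, i.e. the eligible closest-set is stable.
        return bestCandidate.compareTo(kthEligible) >= 0;
    }
}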
I have written down some additional constraints that one might want to put on the closest set in lookups for robustness and security reasons in the documentation of my own DHT implementation.
which usually end up in a list of ~1K queries after a recursive traversal.
This suggests something is wrong with your lookup algorithm. In my experience a lookup should only take somewhere between 60 and 200 UDP requests to find its target when you perform requests concurrently, maybe even fewer when it is sequential.
Verbose logging of the terminal sets, to eyeball how the lookups make progress and how much junk I am getting from peers, has served me well.
In my tests I use the public DHT bootstrap nodes
You should write your routing table to disk and reload it from there, and only perform bootstrapping when none of the persisted nodes in your routing table are reachable. Otherwise you are wasting the bootstrap nodes' resources and also wasting time by having to re-populate your routing table before performing any lookups.
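A minimal sketch of that persistence step, assuming a simple line-per-contact cache file; the file name, the host:port format and the ping predicate are assumptions for illustration only:

// Sketch only: persist known contacts as "host:port" lines, reload them on
// startup, and fall back to the bootstrap hosts only if none of the cached
// nodes respond to a ping.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class NodeCache {
    static final Path CACHE = Paths.get("dht-nodes.cache");

    static void save(List<String> contacts) throws IOException {
        Files.write(CACHE, contacts);                  // one "host:port" per line
    }

    static List<String> load() throws IOException {
        return Files.exists(CACHE) ? Files.readAllLines(CACHE) : List.of();
    }

    // Bootstrap only when none of the persisted nodes answer a ping.
    static List<String> startupNodes(Predicate<String> ping,
                                     List<String> bootstrapHosts) throws IOException {
        List<String> reachable = load().stream().filter(ping).collect(Collectors.toList());
        return reachable.isEmpty() ? bootstrapHosts : reachable;
    }
}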
I am a beginner with Accumulo and am using Accumulo 1.7.2.
As an indexing strategy, I am planning to use an Embedded Index with a Rounds Strategy (http://accumulosummit.com/program/talks/accumulo-table-designs/, page 21). I couldn't find any documentation on it anywhere, and I am wondering if any of you could help me here.
My description of that strategy was mostly just to avoid sending a query to all the servers at once by simply querying one portion of the table at a time. Adding rounds to an existing 'embedded index' example might be the easiest place to start.
The Accumulo O'Reilly book includes an example that starts on page 284 in a section called 'Index Partitioned by Document' whose code lives here: https://github.com/accumulobook/examples/tree/master/src/main/java/com/accumulobook/designs/multitermindex
The query portion of that example is in the class WikipediaQueryMultiterm.java. It uses a BatchScanner configured with a single empty range to send the query to all tablet servers. To implement the by-rounds query strategy this could be replaced with something that goes from one tablet server to the next, either in a round-robin fashion, or perhaps going to 1, then if not enough results are found, going to the next 2, then 4 and so on, to mimic what Cassandra does.
Since you can't target servers directly with a query, and since the table uses partitioning IDs, you could configure your scanners to scan all the key-value pairs within the first partition ID, then query the next partition ID and so on, or perhaps visit the partitions in random order to avoid congestion.
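To illustrate just the rounds part, a sketch might look like the following; queryPartition() is a hypothetical hook standing in for however the embedded-index example scans a single partition ID, and the doubling schedule and random ordering are the only real content here:

// Sketch of the "rounds" idea only; nothing Accumulo-specific is shown.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RoundsQuery {
    interface PartitionQuery {
        // Runs the embedded-index query against one partition ID.
        List<String> queryPartition(String partitionId, String term);
    }

    static List<String> query(PartitionQuery q, List<String> partitionIds,
                              String term, int minResults) {
        List<String> shuffled = new ArrayList<>(partitionIds);
        Collections.shuffle(shuffled);            // visit partitions in random order
        List<String> results = new ArrayList<>();
        int round = 1, next = 0;
        while (next < shuffled.size() && results.size() < minResults) {
            // Query `round` partitions this round: 1, then 2, then 4, ...
            for (int i = 0; i < round && next < shuffled.size(); i++, next++) {
                results.addAll(q.queryPartition(shuffled.get(next), term));
            }
            round *= 2;
        }
        return results;
    }
}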
What some others have mentioned, adding additional indexes to help narrow the search space before sending a query to multiple servers hosting an embedded index, is beyond the scope of what I described and is a strategy that I believe is employed by the recently released DataWave project: https://github.com/NationalSecurityAgency/datawave
I'm working on a side project and trying to monitor peers on popular torrents, but I can't see how I can get a hold of the full dataset.
If the theoretical limit on routing table size is 1,280 entries (from 160 buckets * bucket size k = 8), then I'm never going to be able to hold the full number of peers on a popular torrent (~9,000 on a current top-100 torrent).
My concern with simulating multiple nodes is low efficiency due to overlapping values. I would assume that their bootstrapping paths being similar would result in similar routing tables.
Your approach is wrong, since it would violate the reliability goals of the DHT. You would essentially be performing an attack on that keyspace region; other nodes may detect and blacklist you, and it would also simply be bad-mannered.
If you want to monitor specific swarms don't collect data passively from the DHT.
if the torrents have trackers, just contact them to get peer lists
connect to the swarm and get peer lists via PEX which provides far more accurate information than the DHT
if you really want to use the DHT, perform active lookups (get_peers) at regular intervals
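For the third option the scheduling side is straightforward; a rough sketch, where DhtClient.getPeers() is a hypothetical stand-in for whatever DHT client you use to issue the get_peers lookups:

// Sketch: run a get_peers lookup for each monitored infohash every few minutes.
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SwarmMonitor {
    interface DhtClient {
        Set<String> getPeers(byte[] infohash);   // returns "ip:port" strings
    }

    static void start(DhtClient dht, List<byte[]> infohashes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            for (byte[] infohash : infohashes) {
                Set<String> peers = dht.getPeers(infohash);
                System.out.printf("%d peers for %s%n", peers.size(), hex(infohash));
            }
        }, 0, 10, TimeUnit.MINUTES);             // re-sample every 10 minutes
    }

    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("%02x", x));
        return sb.toString();
    }
}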
It seems to me that using IF would make the statement possibly fail if re-tried. Therefore, the statement is not idempotent. For instance, given the CQL below, if it fails because of a timeout or system problem and I retry it, then it may not work because another person may have updated the version between retries.
UPDATE users
SET name = 'foo', version = 4
WHERE userid = 1
IF version = 3
Best practices for updates in Cassandra are to make updates idempotent, yet the IF operator is in direct opposition to this. Am I missing something?
If your application is idempotent, then generally you wouldn't need to use the expensive IF clause, since all your clients would be trying to set the same value.
For example, suppose your clients were aggregating some values and writing the result to a roll up table. Each client would calculate the same total and write the same value, so it wouldn't matter if multiple clients wrote to it, or what order they wrote to it, since it would be the same value.
If what you are actually looking for is mutual exclusion, such as keeping a bank balance, then the IF clause could be used. You might read a row to get the current balance, then subtract some money and update the balance only if the balance hadn't changed since you read it. If another client was trying to add a deposit at the same time, then it would fail and would have to try again.
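As a rough illustration of that read/compare/conditional-update loop; the Db interface and the accounts(userid, balance, version) table are made-up stand-ins for this example, not a real driver API:

// Sketch of an optimistic, IF-based balance update with retry.
class BalanceUpdater {
    interface Db {
        // Runs a CQL statement with bind values and reports whether a
        // conditional update was applied ([applied] = true).
        boolean executeConditional(String cql, Object... values);
        long[] readBalanceAndVersion(long userid);   // {balance, version}
    }

    static void withdraw(Db db, long userid, long amount) {
        while (true) {
            long[] row = db.readBalanceAndVersion(userid);
            long balance = row[0], version = row[1];
            boolean applied = db.executeConditional(
                "UPDATE accounts SET balance = ?, version = ? " +
                "WHERE userid = ? IF version = ?",
                balance - amount, version + 1, userid, version);
            if (applied)
                return;            // we won the race
            // Someone else changed the row between our read and our update:
            // re-read and retry with the new balance and version.
        }
    }
}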
But another way to do that without mutual exclusion is to write each withdrawal and deposit as a separate clustered transaction row, and then calculate the balance as an idempotent result of applying all the transaction rows.
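A sketch of that ledger-style alternative, again with a made-up Db interface and a hypothetical transactions(userid, txid, amount) table: each withdrawal or deposit is inserted under a client-generated txid, so a retry simply rewrites the same row and no IF clause is needed.

// Sketch only: re-inserting the same (userid, txid, amount) row is harmless,
// so the write is idempotent and safe to retry.
import java.util.List;
import java.util.UUID;

class Ledger {
    interface Db {
        void execute(String cql, Object... values);
        List<Long> readAmounts(long userid);     // all transaction amounts for a user
    }

    static void recordWithdrawal(Db db, long userid, long amount, UUID txid) {
        // Same primary key on retry => same row, so no lock or IF is required.
        db.execute("INSERT INTO transactions (userid, txid, amount) VALUES (?, ?, ?)",
                   userid, txid, -amount);
    }

    static long balance(Db db, long userid) {
        // The balance is the idempotent result of applying all transaction rows.
        return db.readAmounts(userid).stream().mapToLong(Long::longValue).sum();
    }
}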
You can use the IF clause for idempotent writes, but it seems pointless. The first client to do the write would succeed and Cassandra would return the value "applied=True". And the next client to try the same write would get back "applied=False, version=4", indicating that the row had already been updated to version 4 so nothing was changed.
This question is more about linearizability (ordering) than idempotency, I think. This query uses Paxos to try to determine the state of the system before applying a change. If the state of the system is identical, then the query can be retried many times without a change in the results. This provides a weak form of ordering (and is expensive), unlike most Cassandra writes. Generally you should only use CAS operations if you are attempting to record the state of a system (rather than a history or log).
Do not use many of these queries if you can help it; the guidelines suggest having only a small percentage of your queries rely on this behavior.
This question is about the routing table creation at a node in a p2p network based on Pastry.
I'm trying to simulate this scheme of routing table creation in a single JVM. I can't seem to understand how these routing tables are created, starting from the point at which the first node joins.
I have N independent nodes, each with a 160-bit nodeId generated as a SHA-1 hash, and a function to determine the proximity between these nodes. Let's say the 1st node starts the ring and joins it. The protocol says that this node should have had its routing tables set up at this time. But I do not have any other nodes in the ring at this point, so how does it even begin to create its routing tables?
When the 2nd node wishes to join the ring, it sends a Join message (containing its nodeID) to the 1st node, which it passes around in hops to the closest available neighbor for this 2nd node, already existing in the ring. These hops contribute to the creation of routing table entries for this new 2nd node. Again, in the absence of a sufficient number of nodes, how do all these entries get created?
I'm just beginning to take a look at the FreePastry implementation to get these answers, but it doesn't seem very apparent at the moment. If anyone could provide some pointers here, that'd be of great help too.
My understanding of Pastry is not complete, by any stretch of the imagination, but it was enough to build a more-or-less working version of the algorithm. Which is to say, as far as I can tell, my implementation functions properly.
To answer your first question:
The protocol says that this [first] node should have had its routing tables
set up at this time. But I do not have any other nodes in the ring at
this point, so how does it even begin to create its routing tables?
I solved this problem by first creating the Node and its state/routing tables. The routing tables, when you think about it, are just information about the other nodes in the network. Because this is the only node in the network, the routing tables are empty. I assume you have some way of creating empty routing tables?
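For what it's worth, a bare-bones sketch of such an initially empty node state; the row and column counts assume the 160-bit SHA-1 IDs from your question interpreted as base-16 digits, and none of this is FreePastry's API:

// All tables start empty; they only fill up as other nodes join and their
// state is learned. 160-bit IDs in base 16 give 40 routing table rows of 16 columns.
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class PastryNodeState {
    static final int ROWS = 40, COLS = 16;

    final String nodeId;                                        // hex-encoded node ID
    final String[][] routingTable = new String[ROWS][COLS];     // all null = empty
    final TreeSet<String> leafSet = new TreeSet<>();            // empty for the first node
    final List<String> neighborhoodSet = new ArrayList<>();     // empty for the first node

    PastryNodeState(String nodeId) {
        this.nodeId = nodeId;   // the very first node has nothing else to record
    }
}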
To answer your second question:
When the 2nd node wishes to join the ring, it sends a Join
message (containing its nodeID) to the 1st node, which it passes around
in hops to the closest available neighbor for this 2nd node, already
existing in the ring. These hops contribute to the creation of routing
table entries for this new 2nd node. Again, in the absence of a
sufficient number of nodes, how do all these entries get created?
You should take another look at the paper (PDF warning!) that describes Pastry; it does a rather good job of explaining the process for nodes joining and exiting the cluster.
If memory serves, the second node sends a message that not only contains its node ID, but actually uses its node ID as the message's key. The message is routed like any other message in the network, which ensures that it quickly winds up at the node whose ID is closest to the ID of the newly joined node. Every node that the message passes through sends their state tables to the newly joined node, which it uses to populate its state tables. The paper explains some in-depth logic that takes the origin of the information into consideration when using it to populate the state tables in a way that, I believe, is intended to reduce the computational cost, but in my implementation, I ignored that, as it would have been more expensive to implement, not less.
To answer your question specifically, however: the second node will send a Join message to the first node. The first node will send its state tables (empty) to the second node. The second node will add the sender of the state tables (the first node) to its state tables, then add the appropriate nodes in the received state tables to its own state tables (no nodes, in this case). The first node would forward the message on to a node whose ID is closer to that of the second node's, but no such node exists, so the message is considered "delivered", and both nodes are considered to be participating in the network at this time.
Should a third node join and route a Join message to the second node, the second node would send the third node its state tables. Then, assuming the third node's ID is closer to the first node's, the second node would forward the message to the first node, who would send the third node its state tables. The third node would build its state tables out of these received state tables, and at that point it is considered to be participating in the network.
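Put as a rough Java sketch (every type and method name here is invented for illustration, it is not FreePastry's API, and the distance calculation is a toy linear one rather than real prefix routing in the circular ID space):

// Handling a Join message at an existing node, as described above.
import java.math.BigInteger;
import java.util.Set;

class JoinHandler {
    interface Network {
        void sendStateTables(String toNodeId, Set<String> knownNodes);
        void forward(String toNodeId, String joiningNodeId);
    }

    // Called on every node a Join message passes through. `knownNodes` stands
    // in for the contents of this node's leaf set / routing table.
    static void onJoin(String selfId, Set<String> knownNodes, Network net,
                       String joiningNodeId) {
        // 1. Give the newcomer a copy of our state tables so it can start
        //    populating its own (these are empty for the very first node).
        net.sendStateTables(joiningNodeId, knownNodes);

        // 2. Route the message onward like any other message, using the
        //    joining node's ID as the key: forward to a known node that is
        //    closer to the key than we are, if one exists.
        String next = null;
        BigInteger key = new BigInteger(joiningNodeId, 16);
        BigInteger bestDist = new BigInteger(selfId, 16).subtract(key).abs();
        for (String candidate : knownNodes) {
            BigInteger d = new BigInteger(candidate, 16).subtract(key).abs();
            if (d.compareTo(bestDist) < 0) {
                bestDist = d;
                next = candidate;
            }
        }
        if (next != null) {
            net.forward(next, joiningNodeId);
        }
        // 3. Otherwise we are the closest node the message will reach: it is
        //    considered "delivered" and the newcomer now participates.
    }
}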
Hope that helps.
To best explain my goal, I will simplify the problem to the basics of my requirements. Please let me know if more details are needed for clarity.
Suppose I have 10 unique numbers (0-9) that are available for assignment. Which numbers are reserved or free is tracked in the database. The goal of a running front-end web service is to successfully request a number for assignment. Once a number has been assigned to a specific node, it is reserved and no other node can have it assigned.
Keep in mind that this is to be a distributed system with no single point of failure.
The caveat that is giving me trouble is Cassandra's notion of eventual consistency. Note that I could tune Cassandra to be fully consistent at the cost of higher latency. If that is my best (and possibly only) option, I can do it, but I'd like to preserve the concept of consistency tuning.
My proposed strategy is to do the following on the node:
1) Query Cassandra to get a list of free numbers.
2) Randomly select one of the free numbers.
3) Perform a Put to Cassandra saying that this node has reserved that number.
4) Repeatedly query Cassandra to see which node successfully reserved the number (repeatedly, because a read may not immediately reflect the assignment).
5) If the returned node name is the name under which this node's reservation was filed, then the reservation was a success.
6) If the returned node name is a different node name, it means that another node requested the number at around the same time as this one, and was given the assignment. This node must go back to step 1 and try again.
I have an odd feeling that specific circumstances will cause errors (double assignment, etc.) if I use the above strategy.
Can anyone else comment on my proposed strategy, and possibly offer their own? Thanks.
You may want to check out Apache ZooKeeper. It seems like it's exactly what you need. You can use ZooKeeper to get new assignments, and store the existing assignments in Cassandra.
Depending on your application, do you really need to choose one from a fixed set of numbers? If all you need is to choose a unique number that no other node has chosen, you could do it by generating a UUID (Universally Unique IDentifier) for each node, e.g. with http://docs.python.org/library/uuid.html.
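(The Java equivalent is java.util.UUID.randomUUID(); for example:)

import java.util.UUID;

class NodeIdentity {
    public static void main(String[] args) {
        // A version-4 (random) UUID; the collision probability is negligible,
        // so no coordination between nodes or with Cassandra is required.
        UUID nodeId = UUID.randomUUID();
        System.out.println(nodeId);
    }
}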