What are the most recent bittorrent DHT implementation recommendations? - bittorrent

I'm working on implementing yet another BitTorrent client and am currently struggling with the DHT. It is implemented according to this specification http://www.bittorrent.org/beps/bep_0005.html, but once I started debugging it I noticed that the responses from other nodes on the network vary.
For example, find_node is supposed to return either the target node's info or the 8 closest nodes. Most nodes reply with 34 closest nodes, and usually only 1 - 3 of those 34 successfully reply to the subsequent ping request.
Is there another document with better implementation recommendations? Maybe it has already been shown that using a 15-minute interval to mark nodes as questionable is not efficient and I should use 10 or some other number? Where can I find the best up-to-date suggestions?
There is another strange thing. Bootstrap nodes like router.bittorrent.com reply with even more close nodes, and usually the length of the "nodes" BDictionary value is not divisible by 6 (compact node info: 4 bytes for the IP and 2 for the port). For now, I simply cut the buffer off at the closest length divisible by 6, but all of that seems strange. Does anybody know why that might happen?

The spec says (emphasis mine):
When a node receives a find_node query, it should respond with a key "nodes" and value of a string containing the compact node info for [...]
Further down:
Contact information for nodes is encoded as a 26-byte string. Also known as "Compact node info" the 20-byte Node ID in network byte order has the compact IP-address/port info concatenated to the end.
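That is, each entry in the "nodes" value is 26 bytes long, not 6, which is why the buffer length you are seeing is not divisible by 6. A minimal parsing sketch, assuming Node.js-style Buffers (names are illustrative):

// Parse the "nodes" value of a find_node/get_peers response.
// Each entry is 26 bytes: 20-byte node ID + 4-byte IPv4 + 2-byte port (big endian).
function parseCompactNodes(buf) {
  if (buf.length % 26 !== 0) {
    throw new Error('"nodes" length is not a multiple of 26');
  }
  const nodes = [];
  for (let off = 0; off < buf.length; off += 26) {
    nodes.push({
      id: buf.slice(off, off + 20).toString('hex'),
      ip: [buf[off + 20], buf[off + 21], buf[off + 22], buf[off + 23]].join('.'),
      port: buf.readUInt16BE(off + 24),
    });
  }
  return nodes;
}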
Additionally you should read the original Kademlia paper since the bittorrent BEP builds on the concepts described therein and omits deeper explanations of those concepts.
You might also want to read up on a few extensions that are more or less the de facto standard for most implementations now: http://libtorrent.org/dht_extensions.html
Also read the other DHT-related BEPs; some are fairly widely adopted and modify/clarify the behavior specified in BEP 5, generally in a backward-compatible way.
For example, find_node is supposed to return either target node info or 8 closest nodes
Nodes will return a variable number of entries. It could be more than 8, or fewer.

Related

KRPC Protocol act weird in BEP-05

According to BEP-05, when you start a find_node or get_peers request, you will receive the query message or the K (8) good nodes closest to the target/infohash.
However, in my case, with the bootstrap node router.utorrent.com:6881, the remote returned the 8 nodes closest to my own node ID. And if it is a get_peers request, it always returns 8 nodes closest to self and 7 invalid peers. But if I query some node that is already near the infohash, the protocol behaves normally.
weird Wireshark dump
successful Wireshark dump
Any help would be appreciated!
You shouldn't pay too much attention to what the bootstrap nodes do as long as they allow you to populate your routing table, since that is their primary purpose.
They receive a disproportionate amount of traffic and to avoid directing undue amounts of traffic to any particular node they may deviate from the specification in a few ways that are harmless as long as only a vanishingly small fraction of the network behaves that way. There is only a single-digit number of bootstrap nodes among millions, so their behavior is negligible and should not be taken as a reference point.
It does not make sense to contact a bootstrap node via get_peers either; find_node queries are the correct choice to populate your routing table. And it is only necessary to contact them in the relatively rare case where other mechanisms have not been successful.

1000 rows limit for chef-api module/wrapper

So I'm using this Node module to connect to Chef from my API.
https://github.com/normanjoyner/chef-api
It contains a method called "partialSearch" which will fetch selected data for all nodes that match a given criterion. The problem I have: one of our environments has 1386 nodes attached to it, but it seems the module returns 1000 results at most.
There does not seem to be any method to "offset" the results. This module works pretty well and it's a shame this feature is not implemented, since its absence really limits the module's usefulness.
Has anyone bumped into a similar issue with this module and can advise how to work around it?
Here is an extract of my code:
chef.config(SetOptions(environment));

console.log("About to search for any servers ...");

chef.partialSearch('node',
  {
    q: "name:*"
  },
  {
    name: ['name'],
    ipaddress: ['ipaddress'],
    chef_environment: ['chef_environment'],
    ip6address: ['ip6address'],
    run_list: ['run_list'],
    chef_client: ['chef_client'],
    ohai_time: ['ohai_time']
  },
  function (err, chefRes) {
    // ... rest of the callback omitted
  });
Regards!
The maximum is 1000 results per page, but you can still request the pages in order. The Chef API doesn't have a formal cursor system for pagination, so it's just separate requests with a different start value, which can sometimes lead to minor desync (an item at the end of one page might shift in ordering and also show up at the start of the next page), so make sure you handle that. That said, the fancy API in the client library you linked doesn't seem to expose that option, so you'll have to add it or otherwise work around the problem. Check out https://github.com/sethvargo/chef-api/blob/master/lib/chef-api/resources/partial_search.rb#L34 for a Ruby implementation that does handle it.
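If you do end up issuing the paged requests yourself, a rough sketch of the paging-plus-dedup logic might look like this. Here searchPage is a hypothetical helper standing in for however you hit the search endpoint with its start and rows parameters; it is not something the chef-api module currently exposes:

// searchPage(start, rows) is assumed to resolve with an array of result rows
// for one page; how you implement it (signed HTTP request, another client, ...)
// is up to you.
async function searchAll(searchPage, pageSize = 1000) {
  const seen = new Set();     // an item near a page boundary can appear twice
  const results = [];
  for (let start = 0; ; start += pageSize) {
    const rows = await searchPage(start, pageSize);
    for (const row of rows) {
      const key = row.name;   // assumes a unique "name" field per node
      if (!seen.has(key)) {
        seen.add(key);
        results.push(row);
      }
    }
    if (rows.length < pageSize) break;   // short page = last page
  }
  return results;
}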
We have run into similar issues with Chef libraries. One workaround you might find useful is to use some node attribute to segment all of your nodes into smaller groups of fewer than 1000.
If you have no such natural segmentation attribute already, a simple implementation would be to create a new attribute called segment and, during your Chef runs, set the attribute's value randomly to a number between 1 and 5.
Now you can perform 5 queries (each query only searching for a single segment) and you should find all your nodes; if the randomness is working, each group will contain roughly 277 nodes (1386/5).
As your node population grows you'll need to keep increasing the number of segments to ensure the segment sizes stay below 1000.
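A rough sketch of the query side, reusing the partialSearch call from the question and assuming a segment attribute populated as described above (the shape of chefRes, here assumed to expose a rows array, may differ depending on the module version):

const segments = [1, 2, 3, 4, 5];
let remaining = segments.length;
const allRows = [];

segments.forEach(function (segment) {
  // One query per segment; each should return well under 1000 nodes.
  chef.partialSearch('node',
    { q: 'segment:' + segment },
    { name: ['name'], ipaddress: ['ipaddress'], chef_environment: ['chef_environment'] },
    function (err, chefRes) {
      if (err) {
        console.error('segment ' + segment + ' failed:', err);
      } else {
        allRows.push.apply(allRows, chefRes.rows);   // assumed response shape
      }
      if (--remaining === 0) {
        console.log('Found ' + allRows.length + ' nodes in total');
      }
    });
});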

Hazelcast High Response Times

We have a Java 1.6 application that uses Hazelcast version 3.7.4, with a topology of two nodes. The application operates mainly with 3 maps.
In normal operation, response times when querying the maps are generally around a few tens of milliseconds.
I have observed that in some circumstances, such as network cuts, the response time increases to huge values, for example 20 or 30 seconds, and this is impacting application performance.
I would like to know whether this kind of situation with network micro-cuts can increase query response times in this manner. I do not know if some concrete configuration can be done to minimize this, and also which other factors can cause such high times.
Here are some examples of executed queries.
Example 1:
String sqlPredicate = "acui='"+acui+"'";
Collection<Agent> agents =
(Collection<Agent>) data.getMapAgents().values(new SqlPredicate(sqlPredicate));
Example 2:
boolean exist = data.getMapAgents().containsKey(agent);
Thanks so much for your help.
Best Regards,
Jorge
The map operations are all TCP socket based and are thus subject to your operating system's TCP driver implementation.
See TCP_NODELAY.

Packing 20 bytes chunk via BLE

I've never worked with Bluetooth before. I have to send data via BLE and I've found the limit of 20 bytes per chunk.
The sender is an Arduino and the receiver could be both an Android or a Node.js app on a pc.
I have to send 9 values, stored as floats, so 4 bytes * 9 = 36 bytes. I need 2 chunks to send all my data via BLE. The receiving side needs both chunks to process them. If some data is lost, I don't care.
I'm not an expert in network protocols, and I think I have to give each message an incremental timestamp so that the receiver can glue together the two chunks with the same timestamp, or discard the older one if a newer timestamp arrives. But I'm not sure how to do a checksum, whether I really need one, whether I really have to care about it, or if - for a simple beta version of my system - I can ignore all those problems.
Can anyone give me some advice? For example, examples of similar situations handled with BLE communication?
You can get around the size limitation using the "Read Blob Request" of ATT. It allows you to read an attribute and also give an offset. So you can use it to read the attribute with an offset of 0; if there's more than ATT_MTU bytes, then you can request again with the offset at ATT_MTU*1, if there's still more at ATT_MTU*2, etc. (You can read about it in 3.4.4.5 of the Bluetooth v4.1 specification; it's in the 4.0 spec too, but I don't have that in front of me right now.)
If the value changes between requests, I'm not sure how you could go about detecting such a change. You could have the attribute send notifications when there's a change, to interrupt the process in case the value changes in the middle of reading it.
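If you instead go with the notification/sequence-number scheme sketched in the question, the receiver-side reassembly can stay very small. A rough sketch using plain Node.js Buffers (no particular BLE library assumed; the 20-byte layout of 1 sequence byte + 1 chunk-index byte + 18 payload bytes is just one possible choice):

// Chunk layout (assumed): [seq (1 byte)] [chunkIndex (1 byte)] [up to 18 payload bytes]
// Two chunks of 18 payload bytes carry the 9 little-endian floats (36 bytes).
let pending = null;  // { seq, parts: [Buffer, Buffer] }

function onChunk(chunk) {
  const seq = chunk[0];
  const idx = chunk[1];
  const payload = chunk.slice(2);

  if (!pending || pending.seq !== seq) {
    pending = { seq: seq, parts: [] };   // new message, drop any stale half
  }
  pending.parts[idx] = payload;

  if (pending.parts[0] && pending.parts[1]) {
    const data = Buffer.concat([pending.parts[0], pending.parts[1]]);
    const values = [];
    for (let i = 0; i < 9; i++) {
      values.push(data.readFloatLE(i * 4));
    }
    console.log('received 9 floats:', values);
    pending = null;
  }
}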

BitTorrent: Why is the value of the peers field binary, not a bencoded list?

I'm trying to implement BitTorrent in C. First of all, before writing any code, I tried using a web browser to send the following message (URL) to the tracker server.
You may try this URL:
http://torrent.ubuntu.com:6969/announce?info_hash=%9ea%80%ed%e7/%c4%ae%c8%de%8c%b0C%81c%fbq%3cJ%22&peer_id=M7-3-5--%eck%a8%2a%7f%e6%3ah%84%f2%9d%c5&port=43611&uploaded=0&downloaded=0&left=0&corrupt=0&key=00BA7F86&event=started&numwant=4&compact=0&no_peer_id=0
I have downloaded the torrent file from this link; it is named xubuntu-13.04-desktop-i386.iso and has 9e6180ede72fc4aec8de8cb0438163fb713c4a22 as its SHA-1 (info-hash) value.
However, after sending the above request, I get:
HTTP/1.0 200 OK
d8:completei357e10:incompletei8e8:intervali1800e5:peers24:l\262j"\310Հp\226\310\325G?\205^%!\221x \364\367\357e
But the BitTorrent specification says:
peers: The value is a list of dictionaries, each with the following keys:
- peer id: the peer's self-selected ID, as described above for the tracker request (string)
- ip: the peer's IP address (either IPv6 or IPv4) or DNS name (string)
- port: the peer's port number (integer)
Why is the value of the peers field binary, not a bencoded list?
Thank you in advance.
The peers value may be a string consisting of multiples of 6 bytes. The first 4 bytes are the IP address and the last 2 bytes are the port number, all in network (big-endian) byte order.
https://wiki.theory.org/BitTorrentSpecification#Tracker_HTTP.2FHTTPS_Protocol
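Decoding that compact string is straightforward; a small sketch with Node.js Buffers, assuming IPv4 peers as in BEP 23:

// Each peer is 6 bytes: 4-byte IPv4 address followed by a 2-byte port, big endian.
function parseCompactPeers(buf) {
  const peers = [];
  for (let off = 0; off + 6 <= buf.length; off += 6) {
    peers.push({
      ip: buf[off] + '.' + buf[off + 1] + '.' + buf[off + 2] + '.' + buf[off + 3],
      port: buf.readUInt16BE(off + 4),
    });
  }
  return peers;
}

The 24-byte "peers" value in the response above therefore decodes to 4 peers, matching the numwant=4 in the request.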
The protocol you refer to was used in the early days of BitTorrent. However, as some trackers became increasingly popular without being scaled out significantly in terms of capacity, the size of the tracker responses became significant. One measure to deal with this was for clients to accept gzipped HTTP responses and the compact peer response (which is by far the most popular format among trackers these days). The compact peer response provides a significantly smaller response with the same amount of information. It's defined in BEP 23.
However, even though the responses are relatively small now, the TCP handshake and teardown still impose a significant cost, which is why many trackers are moving over to UDP (BEP 15).
