BitTorrent Optimistic Unchoke/Bandwith probing - p2p

While thinking about how BitTorrent works, a few questions come to my mind. Would appreciate if somebody can share a few possible responses.
Suppose a BitTorrent gets 50 peers from the tracker and then it establishes connection with 20 of them to form the peer-set. Is this peer-set randomly selected or based on their bandwith? (i understand that the peers which will be unchoked are selected based on their offered bandwidth) Subsequently, how is this bandwidth determined for each connection (a ping can give us the latency but not the bandwidth i assume)
The optimistic unchoke leads to the problem of free-riders in the system. Considering an unchoke might not always result in better peers, why is not possible to discard this policy at all? (I assume this policy helps peers with slow bandwith to fulfill requests, why cannot BitTorrent adopt a policy to probe the bandwith of the optimistic peer without sending data packets; and have another (maybe the 5th connection) for low-bandwidth peers so that they don't starve. This 5th channel will transmit at only a fraction of the bandwith compared to the other 4 channels) This may at least discourage free-riding?

traditionally the peers are selected at random. Some clients may have had weak biases based on previous interactions with the peers or CIDR distance. However, there is a recent proposal (which uTorrent and libtorrent implemens) suggests a consistent but uniformly distributed peer selection/priority algorithm. For more information, see this blog post. The unchoke algorithm is triggered every 15 seconds. The peers are then sorted by the number of bytes they sent during the last 15 seconds. The ones sending the most are then unchoked, and the rest are choked. So, the download rate is the 15 second average.
If you don't optimistically unchoke peers, there's no way for you to prove to them that you are better than the other peers in their unchoke set, and they will never unchoke you back. Without optimistic unchokes (also assuming you don't have the allow-fast extension), there is no way to start a download. When you first join, you won't have any pieces, you can't trade the first piece, you have to rely on being optimistically unchoked. Estimating someone's bandwidth without sending bulk data is hard and probably unreliable. Even if you got a good estimate of someones capacity, that wouldn't necessarily mean that capacity was available to you. The current mechanism is very robust in that it doesn't need to make assumptions about the network equipment between the peers (like the packet-train bandwidth estimation needs to do) and it looks at actual data.

Related

TCP_NODELAY parameter implication on MSS and CPU utilisation

While I was trying to compose test-suite using netperf I had to go through the manual where I came across the below line for option -D
If setting TCP_NODELAY with -D affects throughput and/or service demand for tests where the send size (-m) is larger than the MSS it suggests the TCP/IP stack’s implementation of the Nagle Algorithm may be broken, perhaps interpreting the Nagle Algorithm on a segment by segment basis rather than the proper user send by user send basis
My question is - how can setting TCP_DELAY and it's effect on throughput determines nagle implementation is broken? Can someone help me with a logical explanation on the same?
In broad terms Nagle goes something like:
Is the amount of data in this send, added to any data queued waiting
to be sent, larger than the Maximum Segment Size (MSS) of this
connection? If the answer is yes, then send the data now, so long
as there are no other constraints on sending such as the congestion
window or the receiver’s advertised window. If the answer is no,
then consider the next step:
Is the connection “idle?” In other
words, is there no un-ACKnowledged data out on the network. If so,
then send the data in the user’s send now. If the connection is not
idle, queue the data and wait. We know that one or more of three
things will happen to cause the data to be sent:
The application will continue making send calls and we get to an MSS-worth of data to send.
An ACKnowledgement will arrive from the remote for the currently un-ACKed data making the connection “idle” again.
The retransmission timer for the un-ACKed data will expire and we might piggy-back this data with the retransmission.
The main point being Nagle is supposed to be evaluated on a user-send by user-send basis. If disabling it results in higher throughput with a send size greater than the MSS, it implies the Nagle implementation of that stack is going TCP segment by TCP segment instead.

Connection interval dependent of transmission frequency?

I'm new to BLE, and bluetooth in general, but I'm on a project that includes communication via BT 5.
As the BLE communication has to transmit around 2 bytes, to 1 MB at a time, I'm looking for a way to optimize the transmission time.
I know the pros n cons for the lower transmission freq (125 kbps), and for the highest transmission freq (2 Mbps), and for the DLE of 251 PDU bytes, but what I see from different forums and articles, the throughput mostly depend on the connection parameters as the connection interval and the packets per connection event. But where does the transmission frequency come in?
I've tried searching this forum for an answer, and several others, and even the BT core specification, but I haven't been able to find a solution for my problem.
If you read my answer at Why is BLE 4.2 faster than BLE 4.1, you can see that there are many components affecting the overall transfer speed.
You first have the radio transmission rate itself, which sets the upper limit.
You then have the overhead between all packets that becomes less apparant as longer packets you have.
The connection interval and length of each connection event can be important if you want the throughout to be high. If there is only one connection and the Bluetooth chip is not too stupid, the connection event length will fill the connection interval and therefore the connection interval doesn't really matter. However, if there are other conflicting radio events scheduled in a way that the connection event must be closed, the transmission cannot continue until the next connection event. So in this case, the throuhput will be higher if you lower the connection interval. So as a summary it highly depends on which Bluetooth stack the chip runs, how it's configured by the host and how many active connections you have.
The transmission rate controls your underlying bitrate but on top of that sits different layers of the BLE protocol which slow down the realizable throughput. This article has general derivation of how the different layers impact throughput in case that's useful!

KRPC Protocol act weird in BEP-05

According to BEP-05 , when you start a find_node or get_peers request, you will receive the query message or K (8) good nodes closest to the target/infohash.
However, in my case ,with the bootstrap node router.utorrent.com:6881, the remote returned the 8 nodes which closest to self's nodeId. And if it is a get_peers request, it always returned 8 nodes closest to self and 7 invalid peers. But if access to some special node which redirect to near the infohash, the protocol acts normal.
weird wireshark dump
success wireshark dump
Any help would be appreciated!
You shouldn't pay too much attention to what the bootstrap nodes do as long as they allow you to populate your routing table, since that is their primary purpose.
They receive a disproportionate amount of traffic and to avoid directing undue amounts of traffic to any particular node they may deviate from the specification in a few ways that are harmless as long as only a vanishingly small fraction of the network behaves that way. There is only a single-digit number of bootstrap nodes among millions, so their behavior is negligible and should not be taken as a reference point.
It does not make sense to contact a bootstrap node via get peers either. find node queries would be the correct choice to populate your routing table. And it is only necessary to contact them in the relatively rare case where other mechanisms were not successful.

Why TCP/IP speed depends on the size of sending data?

When I sent small data (16 bytes and 128 bytes) continuously (use a 100-time loop without any inserted delay), the throughput of TCP_NODELAY setting seems not as good as normal setting. Additionally, TCP-slow-start appeared to affect the transmission in the beginning.
The reason is that I want to control a device from PC via Ethernet. The processing time of this device is around several microseconds, but the huge latency of sending command affected the entire system. Could you share me some ways to solve this problem? Thanks in advance.
Last time, I measured the transfer performance between a Windows-PC and a Linux embedded board. To verify the TCP_NODELAY, I setup a system with two Linux PCs connecting directly with each other, i.e. Linux PC <--> Router <--> Linux PC. The router was only used for two PCs.
The performance without TCP_NODELAY is shown as follows. It is easy to see that the throughput increased significantly when data size >= 64 KB. Additionally, when data size = 16 B, sometimes the received time dropped until 4.2 us. Do you have any idea of this observation?
The performance with TCP_NODELAY seems unchanged, as shown below.
The full code can be found in https://www.dropbox.com/s/bupcd9yws5m5hfs/tcpip_code.zip?dl=0
Please share with me your thinking. Thanks in advance.
I am doing socket programming to transfer a binary file between a Windows 10 PC and a Linux embedded board. The socket library are winsock2.h and sys/socket.h for Windows and Linux, respectively. The binary file is copied to an array in Windows before sending, and the received data are stored in an array in Linux.
Windows: socket_send(sockfd, &SOPF->array[0], n);
Linux: socket_recv(&SOPF->array[0], connfd);
I could receive all data properly. However, it seems to me that the transfer time depends on the size of sending data. When data size is small, the received throughput is quite low, as shown below.
Could you please shown me some documents explaining this problem? Thank you in advance.
To establish a tcp connection, you need a 3-way handshake: SYN, SYN-ACK, ACK. Then the sender will start to send some data. How much depends on the initial congestion window (configurable on linux, don't know on windows). As long as the sender receives timely ACKs, it will continue to send, as long as the receivers advertised window has the space (use socket option SO_RCVBUF to set). Finally, to close the connection also requires a FIN, FIN-ACK, ACK.
So my best guess without more information is that the overhead of setting up and tearing down the TCP connection has a huge affect on the overhead of sending a small number of bytes. Nagle's algorithm (disabled with TCP_NODELAY) shouldn't have much affect as long as the writer is effectively writing quickly. It only prevents sending less than full MSS segements, which should increase transfer efficiency in this case, where the sender is simply sending data as fast as possible. The only effect I can see is that the final less than full MSS segment might need to wait for an ACK, which again would have more impact on the short transfers as compared to the longer transfers.
To illustrate this, I sent one byte using netcat (nc) on my loopback interface (which isn't a physical interface, and hence the bandwidth is "infinite"):
$ nc -l 127.0.0.1 8888 >/dev/null &
[1] 13286
$ head -c 1 /dev/zero | nc 127.0.0.1 8888 >/dev/null
And here is a network capture in wireshark:
It took a total of 237 microseconds to send one byte, which is a measly 4.2KB/second. I think you can guess that if I sent 2 bytes, it would take essentially the same amount of time for an effective rate of 8.2KB/second, a 100% improvement!
The best way to diagnose performance problems in networks is to get a network capture and analyze it.
When you make your test with a significative amount of data, for example your bigger test (512Mib, 536 millions bytes), the following happens.
The data is sent by TCP layer, breaking them in segments of a certain length. Let assume segments of 1460 bytes, so there will be about 367,000 segments.
For every segment transmitted there is a overhead (control and management added data to ensure good transmission): in your setup, there are 20 bytes for TCP, 20 for IP, and 16 for ethernet, for a total of 56 bytes every segment. Please note that this number is the minimum, not accounting the ethernet preamble for example; moreover sometimes IP and TCP overhead can be bigger because optional fields.
Well, 56 bytes for every segment (367,000 segments!) means that when you transmit 512Mib, you also transmit 56*367,000 = 20M bytes on the line. The total number of bytes becomes 536+20 = 556 millions of bytes, or 4.448 millions of bits. If you divide this number of bits by the time elapsed, 4.6 seconds, you get a bitrate of 966 megabits per second, which is higher than what you calculated not taking in account the overhead.
From the above calculus, it seems that your ethernet is a gigabit. It's maximum transfer rate should be 1,000 megabits per second and you are getting really near to it. The rest of the time is due to more overhead we didn't account for, and some latencies that are always present and tend to be cancelled as more data is transferred (but they will never be defeated completely).
I would say that your setup is ok. But this is for big data transfers. As the size of the transfer decreases, the overhead in the data, latencies of the protocol and other nice things get more and more important. For example, if you transmit 16 bytes in 165 microseconds (first of your tests), the result is 0.78 Mbps; if it took 4.2 us, about 40 times less, the bitrate would be about 31 Mbps (40 times bigger). These numbers are lower than expected.
In reality, you don't transmit 16 bytes, you transmit at least 16+56 = 72 bytes, which is 4.5 times more, so the real transfer rate of the link is also bigger. But, you see, transmitting 16 bytes on a TCP/IP link is the same as measuring the flow rate of an empty acqueduct by dropping some tears of water in it: the tears get lost before they reach the other end. This is because TCP/IP and ethernet are designed to carry much more data, with reliability.
Comments and answers in this page point out many of those mechanisms that trade bitrate and reactivity for reliability: the 3-way TCP handshake, the Nagle algorithm, checksums and other overhead, and so on.
Given the design of TCP+IP and ethernet, it is very normal that, for little data, performances are not optimal. From your tests you see that the transfer rate climbs steeply when the data size reaches 64Kbytes. This is not a coincidence.
From a comment you leaved above, it seems that you are looking for a low-latency communication, instead than one with big bandwidth. It is a common mistake to confuse different kind of performances. Moreover, in respect to this, I must say that TCP/IP and ethernet are completely non-deterministic. They are quick, of course, but nobody can say how much because there are too many layers in between. Even in your simple setup, if a single packet get lost or corrupted, you can expect delays of seconds, not microseconds.
If you really want something with low latency, you should use something else, for example a CAN. Its design is exactly what you want: it transmits little data with high speed, low latency, deterministic time (just microseconds after you transmitted a packet, you know if it has been received or not. To be more precise: exactly at the end of the transmission of a packet you know if it reached the destination or not).
TCP sockets typically have a buffer size internally. In many implementations, it will wait a little bit of time before sending a packet to see if it can fill up the remaining space in the buffer before sending. This is called Nagle's algorithm. I assume that the times you report above are not due to overhead in the TCP packet, but due to the fact that the TCP waits for you to queue up more data before actually sending.
Most socket implementations therefore have a parameter or function called something like TcpNoDelay which can be false (default) or true. I would try messing with that and seeing if that affects your throughput. Essentially these flags will enable/disable Nagle's algorithm.

Average internet delay

Just wondering, what is the average packet transmission delay between two hosts over the internet (ignoring packet loss and retransmission).
Now, hang a second before you write that it's too genenral and depends on too many factors (Location of the two hosts, network workload at a specific time, just to name a few), i'm aware of that.
Yet, that's why i'm asking what might be the AVERAGE delay. There must be some record for that.
Maybe it's appropriate to ask for seperate countrywide/continentwide/intercontinental average values, too. Whatever makes sense.
However you ask it, this question is WAAAYYYY too general. Ping times can give you a reasonable approximation, though. My avg to a google host:
round-trip (ms) min/avg/max/med = 20/23/37/21
Yahoo:
round-trip (ms) min/avg/max/med = 19/23/38/23
Baidu (China):
round-trip (ms) min/avg/max/med = 269/272/275/272
Pair (Pittsburgh):
round-trip (ms) min/avg/max/med = 63/66/73/67
Google and Y! are using content-distribution networks, so I am most likely hitting servers very nearby. Baidu is across the world from me. Pair is across the country. These are all from a relatively fast connection.
I'd expect a dialup user to see figures that are approximately 100-200 ms higher (depending on network activity at the time). Similarly, my figures would increase significantly if my network were heavily loaded (its not at the moment).
Does that help at all?
You may find the discussion of this stuff at this page interesting. The author argues that traffic is traveling at about half the speed of light (the speed of light being the best you can possibly do for traffic speed, assuming various scientists are right.

Resources