Mainline DHT: why is the hash in ping different from the hash in find_node?

I am working on a Mainline DHT implementation and I have seen some strange behaviour.
Let's say I know a node's IP and port: 1.1.1.1:7777. I send a "find_node" request to it with my own node hash as the target. I get 8 nodes back; let's say the first one's hash is abcdeabcdeabcdeabcde and its IP is 2.2.2.2:8888.
Now I send a "ping" request to 2.2.2.2:8888, and that node responds with a completely different hash than the one I got from 1.1.1.1:7777 in the "find_node" response. And I see that this is not an isolated case. What's going on? Why are the hashes of the same node different from two different sources? Thanks for any answers.

This may be a malicious node that does not keep its node ID consistent in an effort to get into as many routing tables as possible. It might be doing that for data harvesting or DoS amplification purposes.
Generally you shouldn't put too much trust in anything that remote nodes report, and you should sanitize the data. If a node does not keep its ID consistent, you should remove it from your routing table and disregard the results it returns to your queries. I have listed a bunch of possible sanitizing approaches beyond BEP 42 in the documentation of my own DHT implementation.
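If you want to check for this programmatically, you can do exactly what the question describes: ping the address that find_node handed you and compare the ID in the ping response with the ID that was advertised. Here is a minimal Python sketch; the node address, the IDs and the "pn" transaction id are placeholders, and the response handling is a deliberate shortcut rather than a real bencode decoder:

```python
import os
import socket

def bencode(obj) -> bytes:
    """Minimal bencoding, just enough for a KRPC ping query."""
    if isinstance(obj, bytes):
        return str(len(obj)).encode() + b":" + obj
    if isinstance(obj, int):
        return b"i" + str(obj).encode() + b"e"
    if isinstance(obj, dict):  # bencoded dict keys must be sorted
        return b"d" + b"".join(bencode(k) + bencode(v)
                               for k, v in sorted(obj.items())) + b"e"
    raise TypeError(obj)

def ping_node_id(addr, my_id: bytes, timeout=3.0) -> bytes:
    """Send a KRPC ping to addr and return the 20-byte ID it claims."""
    query = bencode({b"t": b"pn", b"y": b"q", b"q": b"ping",
                     b"a": {b"id": my_id}})
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, addr)
        resp, _ = s.recvfrom(1500)
    # Shortcut: grab the 20 bytes following "2:id20:" instead of
    # properly decoding the bencoded response dictionary.
    marker = resp.find(b"2:id20:")
    if marker == -1:
        raise ValueError("no id in response")
    return resp[marker + 7:marker + 27]

my_id = os.urandom(20)                    # your own node ID (placeholder)
advertised_id = bytes.fromhex("61" * 20)  # ID reported by find_node (placeholder)
actual_id = ping_node_id(("2.2.2.2", 8888), my_id)
if actual_id != advertised_id:
    print("inconsistent node ID; drop it from the routing table")
```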
Another possibility is that the node at 2.2.2.2:8888 (B) simply changed its ID in the meantime (e.g. due to a restart) and the node at 1.1.1.1:7777 (A) either has not picked up the change yet or does not properly keep track of ID changes. But this shouldn't be happening too frequently.
"And I see that this is not an isolated case."
Overall I would only expect this behavior from a tiny fraction of the network, so you should compare the number of unique IP addresses sending bogus responses to the number of unique IPs sending sane ones. It's easy to get these kinds of statistics wrong if your implementation is naive and gets trapped by malicious nodes into contacting even more malicious nodes.
But during a lookup you may see this more frequently in the terminal phase, when you get polluted data from nodes that do not sanitize their routing tables properly. As one example, old libtorrent versions did not (see the related issue; note that I'm not singling out libtorrent here, many implementations are crappy in this area).

It could be that 2.2.2.2:8888 does not know its external address/port, or has not updated it yet. Hence the different hashes.


Is there a way to query flow hash in Linux user space?

Suppose I'm receiving UDP packets and I want to know what hash the network stack has computed for that flow. Is there any way to ask the OS what the hash is for a particular socket/flow?
It seems like there should be a way to query skb->hash from userspace, but I can't find a way to get that information.
You indicate that it's skb->hash which needs to be queried. That leads me to think you might be interested in getting per-packet RSS hash values rather than some "flow hash" for a series of packets. If that is indeed the case, the answer is yes, there's a way to accomplish the goal.
In Linux, there's the PACKET_MMAP mechanism. Long story short, this generic API allows an application to set up an Rx/Tx ring mmap-ed from the kernel into the user process, bound to a given network interface. Received packets are made available to the application in the ring entries (slots), and each packet is prefixed with a kernel-defined structure carrying useful meta information. PACKET_MMAP has different API versions; in particular, TPACKET_V3 (available starting from roughly kernel 3.2) provides the RSS hash there (hv1.tp_rxhash at the link below), if requested/configured.
Furthermore, an application may require that packet reception be load balanced among processes (PACKET_FANOUT_HASH at the link below).
You should probably take a look at https://www.kernel.org/doc/html/latest/networking/packet_mmap.html for more details and useful code examples.
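As a small taste of the fanout part from userspace, here is a minimal Python sketch; the C examples in the kernel documentation linked above remain the canonical reference for setting up the TPACKET_V3 ring itself. The constant values are copied from linux/if_packet.h because Python's socket module does not export them (verify them against your headers), "eth0" is a placeholder interface name, and running this requires root or CAP_NET_RAW on Linux:

```python
import socket
import struct

# Constants from <linux/if_packet.h>; Python's socket module does not
# export these, so they are hard-coded here.
SOL_PACKET = 263
PACKET_FANOUT = 18
PACKET_FANOUT_HASH = 0    # load-balance by the packet/flow hash

ETH_P_ALL = 0x0003
FANOUT_GROUP_ID = 42      # arbitrary id shared by cooperating sockets

def make_fanout_socket(ifname: str) -> socket.socket:
    """Open an AF_PACKET socket and join a hash-based fanout group.

    All sockets joining the same group id have received packets
    distributed among them by the kernel according to the flow hash,
    so a given flow always lands on the same socket.
    """
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                      socket.htons(ETH_P_ALL))
    s.bind((ifname, 0))
    fanout_arg = FANOUT_GROUP_ID | (PACKET_FANOUT_HASH << 16)
    s.setsockopt(SOL_PACKET, PACKET_FANOUT, struct.pack("I", fanout_arg))
    return s

if __name__ == "__main__":
    sock = make_fanout_socket("eth0")   # placeholder interface
    frame = sock.recv(65535)
    print(f"got {len(frame)} bytes on this fanout member")
```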

I have a question about options in the DNSperf tool in Linux

I have been trying out DNSperf, the benchmarking tool for authoritative name servers on Linux. The tool has various features and reports results in many aspects, but I would like to understand some of the options, namely -c and -q. I tried to work them out from the C source code, but I don't get it.
For -c, the manual says:
"-c clients: Enables the local server to act as multiple clients and specifies the number of clients represented by this server. The server sends requests from multiple sockets. By default, the local server acts as a single client."
From the source code it looks like it just sends many queries from multiple internal threads. And the value passed to -c must not exceed 256; does that mean the number of sockets must not exceed 256?
Second, I'm also curious about the -q option; the manual says:
"-q num_queries: Sets the maximum number of outstanding requests. When this value is reached, dnsperf stops sending requests until either a response is received or its requests time out. The default value is 100."
What exactly is triggered when the number reaches 100? I don't understand this, and when I tried to find out from the source code it was far too complex.
Could anyone help me understand it? I know my question is somewhat ambiguous, but I'm not sure exactly how to ask it the right way, so please help me.
"-c" option specifies how many local source ports to use when doing queries. This is default to 1. So, you will see all queries using only one source port. Maximum value 256 means that you can use a max of 256 unique src ports to send DNS queries.
"-q" is the queue limit. There can be at most this many queries in the dnsperf queue when it stops generating new queries.
So, if the DNS server is slower than usual and takes a longer time to respond, dnsperf will only generate "-q" number of queries and wait for responses.
For example, if you set "-q" to 100, dnsperf will generate at most 100 queries and wait for their responses or timeout. If it gets 5 responses, it will generate 5 new queries, and again the queue will be full at 100.
If the dns server is fast, it may happen that the queue limit of 100 is never reached, and dnsperf will make DNS queries as fast as possible.
Be aware that high values for -c and -q will also likely increase dnsperf's memory usage under certain network conditions.
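To make the -q window behaviour concrete, here is a small Python model of the same sliding-window idea. This is not dnsperf's actual code; send_query and poll_responses are stand-ins that simulate the network:

```python
import random

MAX_OUTSTANDING = 100   # what -q controls
TOTAL_QUERIES = 1000

def send_query(qid: int) -> None:
    """Stand-in for putting one DNS query on the wire."""
    pass

def poll_responses(outstanding: set) -> set:
    """Stand-in for the receive path: ids answered this tick."""
    return {qid for qid in outstanding if random.random() < 0.3}

outstanding = set()
next_qid = 0
completed = 0

while completed < TOTAL_QUERIES:
    # Fill the window: keep sending until -q queries are in flight.
    while len(outstanding) < MAX_OUTSTANDING and next_qid < TOTAL_QUERIES:
        send_query(next_qid)
        outstanding.add(next_qid)
        next_qid += 1

    # In the real tool this is where it waits; every response (or
    # timeout) frees a slot, which immediately allows a new query
    # to be sent on the next pass through the loop.
    answered = poll_responses(outstanding)
    outstanding -= answered
    completed += len(answered)

print("done:", completed, "queries completed")
```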

Difference between Stagers, stages and singles

I was using msfconsole and searched for linux x64 payloads.
I came across stagers, stages and singles. They all have 'reverse_tcp' in them, which connects back to the attacker. I tried looking up the differences between stagers, stages and singles; they seem similar but different, and I still don't understand.
Can anyone explain the difference between them so I know which one to use?
A staged payload is sent to the victim in two parts. The first part is a small initial payload (the stager) that makes the victim machine connect back to the attacker machine; the attacker machine then sends over the second part (the stage). This is useful when the available buffer is small. In Metasploit's naming, the extra slash marks a staged payload, e.g. linux/x64/shell/reverse_tcp.
A non-staged (single) payload sends the entire shellcode to the victim in one piece, so it needs a larger buffer, e.g. linux/x64/shell_reverse_tcp (note the underscore instead of the slash).

How to implement node-lru-cache?

I've developed a real-time app with Node.js, Socket.io and MongoDB. It has a requirement that when a user loads a specific page, some 20,000 points with x and y coordinates, lying between two specific dates, are fetched from MongoDB and rendered on a map for the client. If the user reloads, the whole process is repeated. I'm confused about how to insert these points into a cache, and under which key, so that when the user reloads the values can easily be fetched from the cache by that key.
Any suggestions? Thanks!
You could:
- write your own caching layer completely from scratch
- use an existing caching library (e.g. the lru-cache module by isaacs, which is probably the most popular in this field)
- use redis as a cache (it has the ability to set a TTL on inserted docs); there is already a mongoose-redis-cache module, maybe that helps
and potentially x other solutions. It depends on the scale of your data, the number of requests, and so on.
The caching is something your database does for you in this case. MongoDB relies on the operating system's memory-mapped I/O for storage, and a general-purpose OS will usually keep the most frequently used pages in memory. If you still want to use an additional cache, the obvious key to use for coordinates would be a Geohash.
The library runtime-memcache implements LRU and a few other caching schemes in JavaScript. It works with Node.js and is written in TypeScript.
It uses a modified doubly linked list to achieve O(1) get, set and remove.
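If you end up rolling your own, or just want to see why the hash-map-plus-doubly-linked-list combination gives O(1) operations, here is a minimal sketch in Python (chosen for brevity; the same shape translates almost line-for-line to JavaScript). The date-range key at the bottom is just one plausible choice for the original map question, not something prescribed by any of these libraries:

```python
class Node:
    __slots__ = ("key", "value", "prev", "next")
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    """The dict gives O(1) lookup; the doubly linked list gives O(1)
    reordering and eviction (most recently used node at the head)."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}                   # key -> Node
        self.head = self.tail = None    # head = most recently used

    def _unlink(self, node):
        if node.prev: node.prev.next = node.next
        else: self.head = node.next
        if node.next: node.next.prev = node.prev
        else: self.tail = node.prev
        node.prev = node.next = None

    def _push_front(self, node):
        node.next = self.head
        if self.head: self.head.prev = node
        self.head = node
        if self.tail is None: self.tail = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        self._unlink(node); self._push_front(node)   # mark as fresh
        return node.value

    def set(self, key, value):
        if key in self.map:
            node = self.map[key]
            node.value = value
            self._unlink(node); self._push_front(node)
            return
        if len(self.map) >= self.capacity:   # evict least recently used
            lru = self.tail
            self._unlink(lru)
            del self.map[lru.key]
        node = Node(key, value)
        self.map[key] = node
        self._push_front(node)

# For the map question: key by the query itself, e.g. the date range.
cache = LRUCache(capacity=128)
cache.set(("2015-01-01", "2015-02-01"), [(1.0, 2.0)] * 3)  # the fetched points
print(cache.get(("2015-01-01", "2015-02-01"))[:1])
```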

Is there any modern review of solutions to the 10000 client/sec problem

(Commonly called the C10K problem)
Is there a more contemporary review of solutions to the C10K problem (last updated: 2 Sept 2006), specifically focused on Linux (epoll, signalfd, eventfd, timerfd...) and libraries like libev or libevent?
Something that discusses all the solved and still unsolved issues on a modern Linux server?
The C10K problem generally assumes you're trying to optimize a single server, but as your referenced article points out "hardware is no longer the bottleneck". Therefore, the first step to take is to make sure it isn't easiest and cheapest to just throw more hardware in the mix.
If we've got a $500 box serving X clients per second, it's a lot more efficient to just buy another $500 box to double our throughput instead of letting an employee gobble up who knows how many hours and dollars trying to figure out how to squeeze more out of the original box. Of course, that's assuming our app is multi-server friendly, that we know how to load balance, etc, etc...
Coincidentally, just a few days ago, Programming Reddit or maybe Hacker News mentioned this piece:
Thousands of Threads and Blocking IO
In the early days of Java, my C programming friends laughed at me for doing socket IO with blocking threads; at the time, there was no alternative. These days, with plentiful memory and processors it appears to be a viable strategy.
The article is dated 2008, so it pulls your horizon up by a couple of years.
To answer the OP's question, you could say that today the equivalent document is not about optimizing a single server for load, but about optimizing your entire online service for load. From that perspective, the number of combinations is so large that what you are looking for is not a document, it is a live website that collects such architectures and frameworks. Such a website exists, and it's called www.highscalability.com.
Side Note 1:
I'd argue against the belief that throwing more hardware at it is a long term solution:
Perhaps the cost of an engineer that "gets" performance is high compared to the cost of a single server. But what happens when you scale out? Let's say you have 100 servers. A 10 percent improvement in server capacity can save you 10 servers a month.
Even if you have just two machines, you still need to handle performance spikes. The difference between a service that degrades gracefully under load and one that breaks down is that someone spent time optimizing for the load scenario.
Side note 2:
The subject of this post is slightly misleading. The C10K document does not try to solve the problem of 10k clients per second. (The number of clients per second is irrelevant unless you also define a workload along with sustained throughput under bounded latency. I think Dan Kegel was aware of this when he wrote that doc.) Look at it instead as a compendium of approaches for building concurrent servers, and of micro-benchmarks for the same. Perhaps what has changed between then and now is that at one point you could assume the service was a website serving static pages; today the service might be a NoSQL datastore, a cache, a proxy or one of hundreds of network infrastructure software pieces.
You can also take a look at this series of articles:
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
He shows a fair amount of performance data and the OS configuration work he had to do in order to support 10K and then 1M connections.
It seems like a system with 30GB of RAM could handle 1 million connected clients on a sort of social network type of simulation, using a libevent frontend to an Erlang based app server.
libev runs some benchmarks against itself and libevent...
I'd recommend reading Zed Shaw's poll, epoll, science, and superpoll [1]: why epoll isn't always the answer, why sometimes it's even better to go with poll, and how to bring the best of both worlds together.
[1] http://sheddingbikes.com/posts/1280829388.html
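For a hands-on feel of the readiness-based APIs discussed here, below is a minimal event-loop echo server in Python; the standard selectors module picks epoll on Linux (kqueue on BSD, poll/select elsewhere). A sketch of the pattern only, not a production server:

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll-backed on Linux

def accept(server):
    conn, addr = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)          # echo back; may block on slow peers,
                                    # which is acceptable for a sketch
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 9000))
server.listen(1024)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:
    for key, _ in sel.select():     # one epoll_wait per iteration
        key.data(key.fileobj)       # dispatch to accept() or read()
```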
Have a look at the RAMCloud project at Stanford: https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud
Their goal is 1,000,000 RPC operations/sec/server. They have numerous benchmarks and commentary on the bottlenecks that are present in a system which would prevent them from reaching their throughput goals.
