Knot Resolver: Parallelism and concurrency in modules - dns

Context
Dear Knot Resolver users, I have a module that hooks into Knot's finish phase:

static knot_layer_api_t _layer = {
    .finish = &collect,
};

The purpose of the collect function, static int collect(knot_layer_t *ctx), is to ask an external oraculum via a REST API whether a particular domain is listed as hosting malware or a phishing campaign, and whether it should be resolved or sinkholed.
It works well as long as Knot Resolver is not targeted with hundreds of concurrent DNS requests.
When that happens, given that the oraculum's API response time varies and can occasionally reach tens to hundreds of milliseconds,
clients temporarily perceive very long response times from Knot Resolver, far exceeding the hard timeout set on communication with the oraculum's API.
Possible problem
I think that the scaling-with-processes model actually makes the module very inefficient, because queries are queued and processed by the module one by one (within a particular process). That means that if n queries almost hit the oraculum's API timeout limit t, the client that sent the (n+1)-th query to that particular kresd process will perceive a response time approaching the accumulated n*t.
Or would it? Am I completely off?
When I prototyped similar functionality in GoDNS using goroutines, the GoDNS server (at the cost of hideous CPU usage) let numerous DNS clients' queries talk to the oraculum and return to the clients "concurrently".
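For illustration, here is a minimal Go sketch of that goroutine-per-query pattern. It is not the actual GoDNS code; the oraculum URL and the checkDomain helper are made-up assumptions:

package main

import (
    "context"
    "fmt"
    "net/http"
    "time"
)

// checkDomain is a hypothetical oraculum client: it asks the REST API
// whether a domain should be sinkholed, bounded by a hard timeout.
func checkDomain(domain string) (sinkhole bool, err error) {
    ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
    defer cancel()
    req, err := http.NewRequestWithContext(ctx, "GET",
        "https://oraculum.example/api/v1/check?domain="+domain, nil)
    if err != nil {
        return false, err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return false, err // timed out or unreachable; policy decides what to do
    }
    defer resp.Body.Close()
    return resp.StatusCode == http.StatusForbidden, nil
}

func main() {
    // One goroutine per query: a slow oraculum response delays only
    // that client instead of queueing everyone behind it.
    for _, d := range []string{"example.com", "malware.test"} {
        go func(domain string) {
            sinkhole, err := checkDomain(domain)
            fmt.Println(domain, sinkhole, err)
        }(d)
    }
    time.Sleep(time.Second) // crude wait, for demo purposes only
}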
Question
Is it OK to use Apache Portable Runtime threading or OpenMP threading to start hiding the API's response time in the module? Or is that a complete Knot Resolver antipattern?
I'm caching the oraculum's API responses in a simple in-memory ephemeral LRU cache that resides in each kresd process. Would it be possible to use kresd's own MVCC cache instead for my arbitrary structure?
Is it possible that the problem is elsewhere, for instance, that Knot Resolver doesn't expect any blocking delay in finish layer and thus some network queue is filled and subsequent DNS queries are rejected and/or intolerably delayed?
Thanks for pointers (pun intended)

A Knot Resolver developer here :-) (I also repeat some things answered by Jan already.)
Scaling with processes can work fine. Waiting for responses from name-servers is done by libuv (via the event loop and callbacks, all within a single thread).
Due to this single-threaded style, no layer function should block (on I/O), as that would make everything else block on it. AFAIK, currently the only case where this can really happen is when (part of) the cache gets swapped out.
There is the YIELD state (http://knot-resolver.readthedocs.io/en/latest/lib.html?highlight=yield). It's used when a sub-request is needed before processing of the layer can continue, but I currently don't know the details of how it works. I don't think it's directly applicable, as resuming the layers currently seems to be triggered only by a sub-request finishing.
Cache: if you put your module before the rrcache module and you change the RRset, the changed version is what will get cached.

Knot DNS developer here (not Resolver though). I think you are right. My understanding is that the layer code is executed synchronously in the daemon thread. The asynchrony appears only at the resolver network I/O level.
Internally the server runs libuv loop which just executes callbacks for events on primitives provided by libuv (sockets, timers, signals, etc.). The problem is that you just cannot suspend the running callback (C function) at an arbitrary point, escape back to libuv loop, and continue with the callback execution at some point later.
That said, asynchronous waiting for an event can happen only where this was expected. And the code driving layers doesn't expect that.
Answers:
I'm not very familiar with libapr or OpenMP, but I don't think this can really be solved without reworking the layer interface and making it asynchronous.
The shared cache could be used for sure. If you cannot find the API, the jolly Knot DNS folks will happily accept a patch or help you write one.
This is exactly the case. Knot Resolver doesn't expect blocking code in the layer finish callback.

Related

Does node-cache use locks?

I'm trying to understand whether the node-cache package uses locks for the cache object, and I can't find anything.
I tried to look at the source code and it doesn't look like it, but this answer suggests otherwise with the quote:
So there is Redis and node-cache for memory locks.
This cache is used in a CRUD server and I want to make sure that GET/UPDATE requests will not create a race condition on the data.
I don't see any evidence of locking in the code.
If two requests for the same key (one not yet in the cache) are made one after the other, the code will launch two separate fetch() operations, and whichever request comes back last is the one whose value remains in the cache. This is probably not normally a problem, but an improved implementation could make only one request for that key and have the second caller wait for the value that is already in flight (see the sketch below).
Since the cache itself is all in-memory, all access to the cache is synchronous and thus regulated by Javascript's single-threaded nature. So, the only place concurrency issues could affect things in the cache code itself is where it launches an asynchronous fetch() operation.
There are, of course, race conditions waiting to happen in how one uses the code that accesses the data, just as there are with a database interface, so the calling code has to be smart about how it uses the interface if it wants to avoid creating race conditions.
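For what it's worth, here is a minimal Go sketch of that request-coalescing idea (a hand-rolled equivalent of Go's singleflight pattern; the slow fetch function is a stand-in):

package main

import (
    "fmt"
    "sync"
    "time"
)

// call tracks one in-flight fetch; later callers wait on done.
type call struct {
    done chan struct{}
    val  string
    err  error
}

// Coalescer makes sure only one fetch per key is in flight at a time.
type Coalescer struct {
    mu    sync.Mutex
    calls map[string]*call
}

func NewCoalescer() *Coalescer {
    return &Coalescer{calls: make(map[string]*call)}
}

// Get runs fetch(key) once; concurrent callers for the same key all
// receive the result of the fetch that is already in flight.
func (c *Coalescer) Get(key string, fetch func(string) (string, error)) (string, error) {
    c.mu.Lock()
    if cl, ok := c.calls[key]; ok {
        c.mu.Unlock()
        <-cl.done // wait for the in-flight fetch to finish
        return cl.val, cl.err
    }
    cl := &call{done: make(chan struct{})}
    c.calls[key] = cl
    c.mu.Unlock()

    cl.val, cl.err = fetch(key)
    close(cl.done) // wake up all waiters

    c.mu.Lock()
    delete(c.calls, key)
    c.mu.Unlock()
    return cl.val, cl.err
}

func main() {
    co := NewCoalescer()
    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // All three goroutines share one 50 ms fetch.
            v, _ := co.Get("user:42", func(k string) (string, error) {
                time.Sleep(50 * time.Millisecond) // simulated slow backend
                return "value-for-" + k, nil
            })
            fmt.Println(v)
        }()
    }
    wg.Wait()
}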
Unfortunately no, but you can write a unit test to confirm it.
I have written a library to fix that, and also added a read-through method to ease usage:
https://github.com/KhanhPham2411/node-cache-async-lock

Comparison of Nodejs EventLoop (with cluster module) and Golang Scheduler

The main criticism of Node.js is based on its single-threaded event loop model.
The biggest disadvantage of Node.js is that one cannot perform CPU-intensive tasks in the application. For demonstration purposes, let's take the example of a while loop (which is perhaps analogous to a DB function returning hundreds of thousands of records and then processing those records in Node.js):
while (1) {
    x++
}
Such code will block the main stack, and consequently all the other tasks waiting in the event queue will never get a chance to execute (in a web application, new users will not be able to connect to the app).
However, one could use a module like cluster to leverage a multi-core system and partially solve the above issue. The cluster module allows one to create a small network of separate processes which can share server ports, giving the Node.js application access to the full power of the server. (One of the biggest disadvantages of using cluster, however, is that state cannot be maintained in the application code.)
But again, there is a high possibility that we would end up in the same situation as described above if the server load is too high.
When I started learning Go and looked at its architecture and goroutines, I thought it would solve the problems that arise from Node.js' single-threaded event loop model and avoid the above scenario of CPU-intensive tasks. Then I came across this interesting code, which blocks the whole Go application, much like the while loop in Node.js:
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    var x int
    threads := runtime.GOMAXPROCS(0)
    for i := 0; i < threads; i++ {
        go func() {
            for { x++ } // busy loop with no function calls, hence no preemption points
        }()
    }
    time.Sleep(time.Second)
    fmt.Println("x =", x)
}

// or perhaps even if we use some number that is just greater than threads
So the question is: if I have a load-intensive application with a lot of CPU-intensive tasks as well (where the DB returns an enormous number of rows and the application then needs to process and modify something in those rows), could I get stuck in the above sort of scenario? Wouldn't incoming users be blocked, and all other tasks as well?
So, how could the above problem be solved?
P.S. Or perhaps the use cases I have mentioned don't make much sense? :)
Currently (Go 1.11 and earlier), your so-called tight loop will indeed clog the scheduler.
This happens simply because, at present, the Go compiler inserts the code which performs "preemption checks" («should I yield to the scheduler so it runs another goroutine?») only in the prologues of the functions it compiles (almost, but let's not digress). If your loop does not call any function, no preemption checks will be made.
The Go developers are well aware of this and are working on eventually alleviating the issue.
Still, note that your alleged problem is a non-issue in most real-world scenarios: code which performs long runs of CPU-intensive work without calling any function is few and far between.
In the cases where you really do have such code, and you have detected (let me underline: detected through profiling, as opposed to just conjuring up "it must be slow") that it really makes other goroutines starve, you may apply several techniques to deal with it:
- Insert calls to runtime.Gosched() at certain key points of your long-running CPU-intensive code. This forcibly relinquishes control to another goroutine without actually suspending the caller goroutine (it will run again as soon as it gets scheduled).
- Dedicate OS threads to the goroutines running those CPU hogs (see the sketch after this list): bound the set of such CPU hogs to, say, N "worker goroutines"; put a dispatcher in front of them (this is called "fan-out"); make sure N is sensibly smaller than runtime.GOMAXPROCS, or raise the latter so that you have those N extra threads; then shovel units of work to those dedicated goroutines via the dispatcher.
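A minimal sketch of that arrangement, combining a periodic runtime.Gosched() with a small fan-out worker pool (the worker count and the cpuHog job are made up for illustration):

package main

import (
    "fmt"
    "runtime"
    "sync"
)

// cpuHog stands in for a long, function-call-free computation; the
// periodic runtime.Gosched() keeps it from starving other goroutines.
func cpuHog(n int) int {
    sum := 0
    for i := 0; i < n; i++ {
        sum += i
        if i%1000000 == 0 {
            runtime.Gosched() // yield to the scheduler now and then
        }
    }
    return sum
}

func main() {
    const workers = 2 // keep this sensibly below runtime.GOMAXPROCS
    jobs := make(chan int)
    var wg sync.WaitGroup

    // The dedicated pool of CPU-hog workers (the "fan-out" targets).
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for n := range jobs {
                fmt.Println(cpuHog(n))
            }
        }()
    }

    // The dispatcher shovels units of work to the pool.
    for _, n := range []int{10000000, 20000000, 30000000} {
        jobs <- n
    }
    close(jobs)
    wg.Wait()
}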

General Strategies for Profiling Simultaneous Asynchronous Requests

We have a system that makes 1 to N asynchronous requests ("foo") within the same time frame. These are launched on threads other than the main thread, and the requests don't necessarily all originate from the same thread.
Callbacks for the asynchronous requests are all handled on one specific thread, which for the sake of discussion, we'll call the 'bar' thread.
Everything done 'request side' is opaque to us. We don't have access to that library.
Up to this point, we've gotten away with a very naive profiler which basically calls markStart('measurement name') and markDone('measurement name') to time a request. I'm getting closer to having to profile the individual foo requests, from the time we start each foo request to when it is handled by bar.
Obviously our existing profiler won't work, and I'll need to introduce a way to associate the correct markDone() call in the callback with its corresponding markStart() from a foo.
If our requests had some manner of sequence number returned in the response, it would be straightforward; however, we don't have those.
Is there a smart, generic way to associate an ID with each of the requests that is visible across threads, or is profiling in this situation usually handled differently (if at all)?
I don't know of any profiler that will be useful for this.
That doesn't mean they don't exist.
I have faced this kind of problem before.
I wrote a book, and discussed this in it.
Basically I came up with two methods: one that works within a thread, and one that works across threads.
You really need both, because either one can spend time unnecessarily.
So here are some scanned pages:
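As for the generic ID-association idea from the question, here is a minimal Go sketch (goroutines stand in for threads; it assumes the request side can carry an opaque token through to the callback):

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

// Profiler correlates markStart/markDone pairs across threads by a
// generated ID instead of by measurement name.
type Profiler struct {
    mu     sync.Mutex
    starts map[uint64]time.Time
    nextID atomic.Uint64
}

func NewProfiler() *Profiler {
    return &Profiler{starts: make(map[uint64]time.Time)}
}

// MarkStart returns a fresh ID; carry it with the request.
func (p *Profiler) MarkStart() uint64 {
    id := p.nextID.Add(1)
    p.mu.Lock()
    p.starts[id] = time.Now()
    p.mu.Unlock()
    return id
}

// MarkDone matches the start no matter which goroutine calls it.
func (p *Profiler) MarkDone(id uint64) time.Duration {
    p.mu.Lock()
    start := p.starts[id]
    delete(p.starts, id)
    p.mu.Unlock()
    return time.Since(start)
}

func main() {
    prof := NewProfiler()
    done := make(chan struct{})

    id := prof.MarkStart() // the "foo" side launches the request
    go func() {            // the "bar" side handles the callback
        time.Sleep(30 * time.Millisecond) // simulated request latency
        fmt.Println("request took", prof.MarkDone(id))
        close(done)
    }()
    <-done
}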

Channels in Go, and emitters in node.js?

Does Go have an equivalent of node.js' "emitter"?
I'm teaching myself Go by porting over a node.js library I wrote. In the node version, the library emits an event once something happens (e.g. it listens on UDP port 1234 and, when "ABC" is received, emits "abcreceived" so the calling code can respond as necessary, e.g. by sending back "DEF").
I've seen channels in Go (and am currently reading up on them), but as I'm still new to this language, I don't know if (or how, for that matter) that can be used to communicate with whatever code is using my library.
I've also seen https://github.com/chuckpreslar/emission, but am not sure if this is acceptable, or if there's a better ("Best practice") way of doing things.
Go and Node.js are very different. Node.js supports concurrency only via callbacks. There might be various ways of dressing them up, but they're fundamentally callbacks.
In Node.js, there is no parallelism; Node.js has a single-threaded runtime. When Node.js async is used to achieve what is called 'parallel' execution, it isn't parallel in the sense used in Go, but concurrent.
Concurrency is not parallelism in the Go world.
Go has explicit concurrency based on Communicating Sequential Processes (CSP), a mathematical basis conceived by Tony Hoare at Oxford. The runtime interleaves cooperating processes called goroutines by time-slicing them onto the available CPU cores. Within each goroutine, the code is single threaded, so is easy to write. In the simple case, no data is shared between goroutines; instead messages pass between them along channels. In this way, there is no need for callbacks.
When goroutines get blocked waiting for I/O, that's OK because they don't use any CPU time until they're unblocked. Their memory footprint is slight and you can have very large numbers of them. So callbacks are not needed for I/O operations either.
Because the execution models of Go and Node.js are about as different as they could be, attempting to port code from one to the other is very likely to lead to very clumsy solutions. It's better to start from the original requirements and implement from scratch.
It would be possible to distort the Go concurrency model using function arguments to behave like callbacks. This would be a bad idea because it would not be idiomatic and would lose the benefits that CSP gives.
So by reading others' Go code and some links in the comments to my question, I think channels are the way to go.
In my library code (semi pseudo-code):
// Make a new channel called "Events" that carries event-name strings
var Events = make(chan string)

func doSomething() {
    // ...
    Events <- "abcreceived" // send "abcreceived" on the Events channel
}
And in the code that will use my library:
evt := <-mylib.Events
switch evt {
case "abcreceived":
    sendBackDEF()
    // (no break needed: Go switch cases do not fall through)
    // ...
}
I still prefer node.js' EventEmitter (because you can transfer data back easily) but for simple things, this should suffice.
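Regarding transferring data back: a channel of structs gets most of the way there. A minimal sketch (the Event type and payload are made up):

package main

import "fmt"

// Event pairs a name with an arbitrary payload, much like an
// EventEmitter's (event, data) arguments.
type Event struct {
    Name string
    Data []byte
}

var Events = make(chan Event)

func listen() {
    // ... pretend we just read "ABC" from UDP port 1234 ...
    Events <- Event{Name: "abcreceived", Data: []byte("ABC")}
}

func main() {
    go listen()
    evt := <-Events
    switch evt.Name {
    case "abcreceived":
        fmt.Printf("got %q, sending back DEF\n", evt.Data)
    }
}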

Node Background Threads - When Do These Get Created?

I've been doing a fair amount of work with Node lately, trying to build a system which has certain characteristics, one of which is non-blocking / parallelism - a Node strong suit, as I understand it.
What I don't fully understand is when a separate thread is spun off to handle some processing. I'm pretty sure this happens on some function calls/callbacks, but certainly not all of them.
In my specific case, it's an Express-based app. At start-up it does several things, including instantiating a RabbitMQ-based "bus", an object with a method which will write to the bus (objA), and an object which will subscribe to the bus and process messages coming across it (objB).
objA will write to the bus inside an Express callback:
app.put('/some/path', (req, res) => {
    objA.methodWhichWritesToBus();
});
I believe that at this point objA.methodWhichWritesToBus is executed in a background/worker thread (whatever you call it), not on the main event loop.
Is that the only point at which this sort of thing happens? methodWhichWritesToBus is I/O-intensive (it calls an elasticsearch service on another box and brings back tens to hundreds of thousands of records), with lots of chained promises etc., but none of that gets split off, does it?
How about the fact that the object on which the method is called is instantiated outside the Express callback - does that affect the parallelism?
Finally, are there ways to effect/force a method etc. to "run in the background"?
I've been noodling on this and testing it for a while now, but all on one machine, so it's difficult to tell what's going on.
Who can clarify this for me?
Pre-answer: this is a topic best learned by going and reading, doing coding exercises to solidify your understanding, and working with the technology in a significant way. You're not going to "get it" based on a Q&A format. That said...
What I don't fully understand is when a separate thread is spun off to handle some processing.
Never, sort of. "Processing", as in the computation that happens in your javascript program, happens in the main event loop thread. End of story. However, waiting on I/O to come back from the OS is not considered "processing", so there are various queues managed by node and the OS to track pending I/O requests and invoke callbacks when data is ready. There are a handful of threads node uses internally to manage this stuff with the OS, but from your program's perspective, those threads are irrelevant. Your program can ask node to do some I/O, then your program keeps running in parallel, and when the I/O is done, node will eventually invoke the callback in the main event loop so you can process the results.
I believe that at this point objA.methodWhichWritesToBus is executed in a background/worker thread (whatever you call it), not on the main event loop.
You call it "asynchronous", and it happens whenever you do I/O, including filesystem calls, networking, or child processes. Which is to say, quite a lot.
How about the fact that the object on which the method is called is instantiated outside the Express callback - does that affect the parallelism?
Nope.
Finally, are there ways to effect/force a method etc. to "run in the background"?
Generally I/O is done asynchronously by default, so no you don't normally need to force anything to run in the background. It's baked into the node design by way of the node core APIs themselves. However, there are ways to delay synchronous processing to a future event loop using setImmediate, setTimeout, or process.nextTick. I explain these in some detail in my blog post setTimeout and friends.
More precisely, all networking is asynchronous. End of story. Specifically, the APIs in node core that are available are all asynchronous, and there's simply no synchronous API available in node. For filesystem IO and child processes, there are both synchronous and asynchronous APIs, but the synchronous APIs must only be used under special limited circumstances, and if you don't know confidently that it's OK in this specific case to make a synchronous IO API call, you should use the asynchronous API so you don't break the lynchpin that makes node perform as it does.
