Linux driver kthreads and blkid task

I'm currently writing a Linux driver for block devices. This has been going on for some time, and I recently changed the driver design from bio-mode to request-mode (I used to handle struct bio, but now I operate on struct request). The change made the functionality simpler, but it also introduced some new issues on which I'm requesting advice. I'm also using a kthread for background maintenance and periodic flushing of entries in my driver-device circular buffers.
One of those issues, which I'm having difficulty handling or even fully understanding, is the slowness of the sbin/blkid process that runs after commands that alter the partition or filesystem type. With my current design, whenever I run fdisk, mkfs, or any other command that modifies the partition/FS type, checking the running processes with ps -ef shows that the blkid process runs afterwards and takes around a minute or two to complete and update the drive info shown to the user in Disk Utility. This is of course far too long, and I have to wait for it before entering another command that modifies the partition/FS type; otherwise the drive info gets messed up, the partition table is lost, etc.
When I went back to my previous bio-mode driver (which had no kthread), I found that this blkid task also runs after the same commands, but finishes very quickly! I only became aware of blkid now because of its slowness; before, it completed too quickly to notice. So I experimented with my driver code, removing pieces to see which code segment causes the slow blkid, and eventually found that the presence of the kthread correlates with the slowness of blkid.
The body of my kthread BTW looks as follows:
while (!kthread_should_stop()) {
    /* < driver code here, between spin_lock()/spin_unlock() ... > */
    /* ... then sleep via schedule_timeout() (250 ms) */
}
Here's what I did to my kthread:
- Slowly removed code until nothing was left between the spin_lock/unlock calls: sbin/blkid is still slow.
- Removed the kthread and replaced it with a timer that periodically expires, re-arms itself, and on every expiry calls a function into which I moved the kthread's driver code: sbin/blkid is still slow.
- Changed the schedule_timeout value from 250 ms to 1 s: blkid became even slower (more than 3 minutes). If I reduce it to 50 ms, it's as slow as with the 250 ms timeout.
I'm really confused: how do my driver kthread and blkid interact in such a way that it slows down blkid?


DirectX produces TDR under Intel when reusing command queue

I'm currently working on a DirectX rendering environment and am encountering undefined behaviour on Intel cards.
I'm currently using trivial synchronization, so that the CPU always waits for the GPU to finish after every ExecuteCommandLists().
Everything else works perfectly fine: drawing, copying my resources, etc.
Since I work with different "passes" (draw passes, memory transfer pass, clear pass), I also have a "present" pass. There I have one command list with just two tasks, which use a resource barrier to transition the render target from the render-target state to the present state. I use the same command queue that is used for the swap chain and where all the drawing happens. All previous tasks were fully executed and waited for.
When I try to execute these two tasks, it causes a device removal on Intel cards without any further explanation. On cards from other vendors it works perfectly fine.
It definitely has something to do with the command queue: when I remove the resource barrier calls, it still crashes. If I create a new command queue, it runs through but also fails, this time with a useful debug message saying that I need to use the same queue as the swap chain.
What's going wrong here, and how can I get more information about the device removal?
Thanks for your help!

Monitoring Process Syscalls in Live Environment

I've been working on a project for a while, and the first step is building a library of syscall traces for processes. Essentially, I'm trying to build a system in which, every time a process requests an OS service via a syscall, relevant information about the event (calling process, time, syscall name) gets logged to a file.
Theoretically, this sounds like a simple enough thing to do; however, implementing it is becoming more of a pain as time goes on. I suppose the main thing causing issues for me is not knowing where to start the implementation.
Initially, I thought that this could all be handled by adding a few lines of code to the kernel entry point, but after digging through entry_64.S for a while, I concluded that there must be an easier way. My next idea was to overwrite every service pointed to by sys_call_table with my own service that logs the call and then invokes the original service. But it turns out this method has difficulties on Linux kernel 5.4.18 because sys_call_table is no longer exported. And even after recompiling the kernel so that sys_call_table is exported, the table sits in a read-only, protected memory region. Lastly, I've been experimenting with auditd. Specifically, I followed this link, but it doesn't seem to be working (when I executed the kill command, based on timestamps there was a corresponding result in ausearch only about 50% of the time).
I'm getting a little burned out by all these dead-ends, and am really hoping to finally have this first stage in my project up and running. Does anyone have any pointers as to what I should try?
Solution: BPFTrace was exactly what I was looking for.
I used BPFTrace to log every time the kernel began executing a syscall (excluding those initiated by BPFTrace itself).

Querying multiple sensors regularly using NodeJS

I need to fetch the values of about 200 sensors every 15 seconds or so. To fetch a value, I simply make an HTTP call with basic authentication and parse the response. The catch is that these sensors might be on slow connections, so I need to allow at least 5 seconds per sensor (usually they respond a lot quicker, but there are always some that are slow and time out).
So right now I have the following setup for that:
There is a NodeJS process that is connected to my DB and knows all about the sensors. It checks regularly for new or deleted sensors. It spawns a child process for every sensor and restarts it if it dies; it also kills it when the sensor gets deleted. Each child process makes an HTTP call to its sensor with a 5-second timeout and, if it receives a value, saves it to Redis; it then repeats forever, scheduling the next round with a 15-second setTimeout. A third process copies all the values from Redis to the main MySQL DB.
So that has been a working solution for half a year, but after a major system upgrade (from Ubuntu 14.04 to 18.04 and thus every package upgraded as well) it seems to leak some memory and I can't seem to figure out where.
After startup, the processes together take about 1.5 GB of memory, but after a day or so this grows to 3 GB and keeps climbing, so before the machine runs out of memory I have to kill all node processes and restart the whole thing.
So now I am trying to find more efficient ways to achieve the same result (query around 200-300 URLs every 15 seconds and store the results in MySQL). At the moment I'm thinking of ditching Redis: the child processes would communicate with their master process, and the master would write to MySQL directly. This way I don't need to load the Redis library into every child process, which might save some memory.
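As a rough sizing aid (not part of the original setup), Little's law gives the worst-case number of requests that must be in flight at once: concurrency = (sensors / interval) x timeout, assuming pessimistically that every request runs until its 5-second timeout. The helper name min_concurrency is hypothetical; this is a sketch of the arithmetic only.

```c
#include <assert.h>

/* Hypothetical helper: worst-case number of concurrent HTTP requests needed so
   that every sensor is polled once per interval, assuming each request lasts
   its full timeout (Little's law: L = lambda * W, lambda = sensors / interval).
   Returns the integer ceiling of sensors * timeout / interval; both times must
   use the same unit (seconds here). */
unsigned min_concurrency(unsigned sensors, unsigned timeout_s, unsigned interval_s) {
    assert(interval_s > 0);
    return (sensors * timeout_s + interval_s - 1) / interval_s;
}
```

For 200 sensors polled every 15 s with a 5 s timeout this gives 67, and for 300 sensors it gives 100, which suggests that a bounded number of concurrent requests would be enough to keep up with the schedule.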
So I need ideas on how to reduce this application's memory usage (I'm limited to PHP and NodeJS, mainly because of my knowledge, so writing a native daemon might be out of the question).
The solution was easier than I thought: rewriting the child process as a native bash script brought memory usage down to almost zero.

How does Erlang sleep (at night?)

I want to run a small clean up process every few hours on an Erlang server.
I know of the timer module. I saw an example in a tutorial that used chained timer:sleep calls to wait for an event that would occur multiple days later, which I found strange. I understand that Erlang processes are unique compared to those in other languages, but the idea of a process/thread sleeping for days, weeks, or even months at a time seemed odd.
So I set out to find out the details of what sleeping actually does. The closest I found was a blog post mentioning that sleep is implemented with a receive timeout, but that still left the question:
What do these sleep/sleep-like functions actually do?
Is my process taking up resources while it sleeps? Would thousands of sleeping processes use as many resources as, say, thousands of processes servicing a recursive call that does nothing? Is there any performance penalty from sleeping repeatedly within processes, or from sleeping for long periods of time? Is the VM constantly expending resources to check whether the conditions for ending the processes' sleep have been met?
And as a side note, I'd appreciate it if someone could comment on whether there is a better way than sleeping to pause for hours or days at a time.
That is the karma of any Erlang process: it waits or dies :o)
When a process is spawned, it executes until its last expression, then dies, returning the result of that last evaluation.
To keep a process alive, there is no other solution than to loop recursively in a never-ending succession of calls.
Of course, there are several conditions that make it stop or sleep:
- end of the loop: the process receives a message telling it to stop recursing;
- a receive block: the process waits until a message matching one of the entries in the receive block is posted in its message queue;
- the VM scheduler stops it temporarily to give other processes access to the CPU.
In the last two cases, execution resumes under the responsibility of the VM scheduler.
While waiting, the process uses no CPU bandwidth, but keeps exactly the same memory layout it had when it started waiting. Erlang/OTP offers a way to reduce this memory to the minimum, the hibernate option (see the documentation of gen_server or gen_fsm), but in my view it is for advanced usage only.
A simple way to create a "signal" that fires a process at regular (or almost regular) intervals is to use a receive block with a timeout (the timeout is given in milliseconds and is bounded; the Erlang reference manual allows at most 4294967295 ms, almost 50 days). For example:
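-module(util).
-export([on_tick_sec/4, on_tick_mn/4, on_tick_hr/4]).
on_tick_sec(Module,Function,Arglist,Period) -> on_tick(Module,Function,Arglist,1000,Period,0).
on_tick_mn(Module,Function,Arglist,Period) -> on_tick(Module,Function,Arglist,60000,Period,0).
on_tick_hr(Module,Function,Arglist,Period) -> on_tick(Module,Function,Arglist,60000,Period*60,0). % counted in minutes
on_tick(Module,Function,Arglist,TimeBase,Period,Period) ->   % Period ticks elapsed: run the job, reset the count
    apply(Module,Function,Arglist),
    on_tick(Module,Function,Arglist,TimeBase,Period,0);
on_tick(Module,Function,Arglist,TimeBase,Period,CountTimeBase) ->
    receive
        stop -> stopped                                       % a stop message ends the loop
    after TimeBase ->                                         % one more tick elapsed
        on_tick(Module,Function,Arglist,TimeBase,Period,CountTimeBase+1)
    end.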
And its usage:
1> Pid = spawn(util,on_tick_sec,[io,format,["hello~n"],5]).
2> Pid ! stop.
The timer module is a standard gen_server running in a separate process. Most functions in the timer module are public interfaces that execute a hidden gen_server:call or gen_server:cast to the timer server (timer:sleep/1 is an exception: it is simply a receive with an after clause). Hiding a server's internals behind such functions is a common pattern that allows the server to evolve without impact on existing applications.
Internally, the server uses an ETS table to store every action it has to perform, along with each timer reference, and it arranges its own wake-up when the next action is due (presumably, ultimately the VM takes care of this).
So you can hibernate a process without any effect on the timer server's behavior. The hibernation mechanism is tricky, though:
- see the documentation of the hibernate/3 definition: you have to "rebuild" the context yourself, since everything is removed from the process context and the system only stores a {Module, Function, Arguments} tuple to restart your process when needed;
- it costs some time in garbage collection and process restart.
That is why I said it is really an advanced feature that needs a good reason to be used.
There is also erlang:hibernate/3, which puts a process into "deep sleep", minimizing its memory usage.

getting notified on flock/lockf/fcntl changes without polling

Is there a way (in Linux) of getting updates on the lockedness status of a file without polling?
I know that the status can be polled via lockf(fd, F_TEST, 0) or a speculative LOCK_NB|LOCK_SH, but polling is bad(tm).
Of course, finding out when a file is NOT locked can be done with a simple lock attempt, but I want to sample the other edge too (use case: a large program uses lockf to synchronize between instances, and I can probably get it changed to flock; I want to add a GUI that displays when the lock is acquirable, without hogging the lock, of course).
Note that inotify does not work in this case, at least on Linux 3.9.1.
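For reference, the two polling probes mentioned in the question can be sketched in C as below. This is a sketch under stated assumptions, not a notification mechanism: both helpers (the names flock_is_held_elsewhere and lockf_is_held_elsewhere are hypothetical) still have to be called periodically. The lockf(F_TEST) probe only queries and never takes the lock, while the flock probe briefly takes a shared lock itself. On Linux local filesystems, flock() locks and lockf()/fcntl() record locks are independent of each other, so each probe only sees locks of its own kind.

```c
#define _DEFAULT_SOURCE  /* expose flock() and lockf() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: probe for a conflicting flock() lock held through
   another open file description. fd must not itself hold a flock() lock.
   Returns 1 if the file is exclusively locked elsewhere, 0 if a shared lock
   could be taken, -1 on any other error. The probe briefly holds LOCK_SH,
   so another program's LOCK_EX|LOCK_NB attempt may fail during that short
   window; other holders of shared locks are not detected. */
int flock_is_held_elsewhere(int fd) {
    assert(fd >= 0);
    if (flock(fd, LOCK_SH | LOCK_NB) == 0) {
        flock(fd, LOCK_UN);                /* release the probe lock at once */
        return 0;
    }
    return (errno == EWOULDBLOCK) ? 1 : -1;
}

/* Hypothetical helper: probe for a lockf()/fcntl() record lock held by
   another process over the whole file, without taking the lock. lockf()
   works from the current file offset, so the offset is moved to 0 for the
   test and restored afterwards. Returns 1 if locked by another process,
   0 if unlocked (or locked by this process), -1 on any other error. */
int lockf_is_held_elsewhere(int fd) {
    assert(fd >= 0);
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t)-1 || lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    int rc = lockf(fd, F_TEST, 0);         /* len 0: from offset 0 to infinity */
    int err = errno;                       /* save before lseek can clobber it */
    lseek(fd, saved, SEEK_SET);            /* restore the caller's offset */
    if (rc == 0)
        return 0;
    return (err == EACCES || err == EAGAIN) ? 1 : -1;
}
```

Because lockf() record locks belong to a process, lockf_is_held_elsewhere() reports 0 for locks taken by the calling process itself; it is meant to run in the GUI process, separate from the program that holds the lock.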
