Detect if userspace module dies unexpectidly from Linux driver - linux

I have a Linux driver which allows userspace applications to register some application specific data which would effect how the driver functions. Typically, the userspace application would deregister its information before exiting. If the application were to crash however, then there would be no way to deregister the information, and there would be a memory leak. I'm wondering if it's possible to detect from kernel space, if a userspace application has exited unexpectedly.

struct task_struct *task = current; would give you the pointer to the current task. task->comm and task->pid will give the name and PID for the process. When the userspace application calls your driver for registering its data (Assuming you are using a character device interface, write) you can add the pid to the list. Run a timer and whenever the timer expires, in the callback, initialize the task pointer again to current and go through the circular linked list by task = task->next until task becomes current again and see if all the pids in the list still exist in the kernel task list. If you find anything missing,

Along with the method in my previous answer, instead of polling, you can use find_task_by_vpid(pid); to get the task structure for the task directly. Comment says this must be called under rcu_read_lock().
Another option would be to implement a generic netlink socket communication between application and kernel module. Here the application will have to respond to the kernel messages. That would be more work compared to the first approach where if find_task_by_vpid() returns NULL, you can conclude your userspace process is dead.

Related

How to detect if a linux thread is crashed

I've this problem, I need to understand if a Linux thread is running or not due to crash and not for normal exit. The reason to do that is try to restart the thread without reset\restart all system.
The pthread_join() seems not a good option because I've several thread to monitoring and the function return on specific thread, It doesn't work in "parallel". At moment I've a keeep live signal from thread to main but I'm looking for some system call or thread attribute to understand the state
Any suggestion?
P
Thread "crashes"
How to detect if a linux thread is crashed
if (0) //...
That is, the only way that a pthreads thread can terminate abnormally while other threads in the process continue to run is via thread cancellation,* which is not well described as a "crash". In particular, if a signal is received whose effect is abnormal termination then the whole process terminates, not just the thread that handled the signal. Other kinds of errors do not cause threads to terminate.
On the other hand, if by "crash" you mean normal termination in response to the thread detecting an error condition, then you have no limitation on what the thread can do prior to terminating to communicate about its state. For example,
it could update a shared object that tracks information about your threads
it could write to a pipe designated for the purpose
it could raise a signal
If you like, you can use pthread_cleanup_push() to register thread cleanup handlers to help with that.
On the third hand, if you're asking about detecting live threads that are failing to make progress -- because they are deadlocked, for example -- then your best bet is probably to implement some form of heartbeat monitor. That would involve each thread you want to monitor periodically updating a shared object that tracks the time of each thread's last update. If a thread goes too long between beats then you can guess that it may be stalled. This requires you to instrument all the threads you want to monitor.
Thread cancellation
You should not use thread cancellation. But if you did, and if you include termination because of cancellation in your definition of "crash", then you still have all the options above available to you, but you must engage them by registering one or more cleanup handlers.
GNU-specific options
The main issues with using pthread_join() to check thread state are
it doesn't work for daemon threads, and
pthread_join() blocks until the specified thread terminates.
For daemon threads, you need one of the approaches already discussed, but for ordinary threads on GNU/Linux, Glibc provides non-standard pthread_tryjoin_np(), which performs a non-blocking attempt to join a thread, and also pthread_timedjoin_np(), which performs a join attempt with a timeout. If you are willing to rely on Glibc-specific functions then one of these might serve your purpose.
Linux-specific options
The Linux kernel makes per-process thread status information available via the /proc filesystem. See How to check the state of Linux threads?, for example. Do be aware, however, that the details vary a bit from one kernel version to another. And if you're planning to do this a lot, then also be aware that even though /proc is a virtual filesystem (so no physical disk is involved), you still access it via slow-ish I/O interfaces.
Any of the other alternatives is probably better than reading files in /proc. I mention it only for completeness.
Overall
I'm looking for some system call or thread attribute to understand the state
The pthreads API does not provide a "have you terminated?" function or any other such state-inquiry function, unless you count pthread_join(). If you want that then you need to roll your own, which you can do by means of some of the facilities already discussed.
*Do not use thread cancellation.

Is it possible to use eBPF to block a malicious process in kernel space?

One way to block a malicious process is tracing its behavior in kernel space eBPF program and then just simply kill it in user space program, but there is latency before user space program receiving data from kernel space. I wonder if there is a way to kill a malicious process in kernel space eBPF program as it is more efficient.
The BPF helper function bpf_send_signal() can be used to send a signal to the process of the monitored task, see its documentation:
long bpf_send_signal(u32 sig)
Description
Send signal sig to the process of the current task.
The signal may be delivered to any of this
process's threads.
Return
0 on success or successfully queued.
-EBUSY if work queue under nmi is full.
-EINVAL if sig is invalid.
-EPERM if no permission to send the sig.
-EAGAIN if bpf program can try again.
The signal to pass can be SIGKILL, for example.
Some projects use it already: Tetragon, a tool based on eBPF for “security observability and runtime enforcement”, can call it to terminate processes.
This helper is available starting with Linux 5.3.

Start process and terminate at later stage

In am writing an SDK in Go that integrators will communicate with via a local socket connection.
From the integrating application I need a way to start the SDK as a process but more importantly, I need to be able to cancel that process when the main application is closing too.
This question is language agnostic (I think) as I think the challenge is linux related. i.e. How to start a program and cancel it at a later stage.
Some possible approaches:
I am thinking that it's a case of starting the program via exec, getting it's PID or some ID then using that to kill later. Sudo may be required to do this, which is not ideal. Also, not good practice as you will be effectively force closing the SDK, offering no time for cleanup.
Start the program via any means. Once ready to close, just send a "shutdown" command via the SDK API which will allow the SDK to cleanup, manage state then exit the application.
What is best practice for this please?
Assuming you're using Linux or similar Unix:
You are on the right track. You won't need sudo. The comments thus far are pointing in the right direction, but not spelling out the details.
See section 2 of the manual pages (man 2 ...) for details on the functions mentioned here. They are documented for calling from C. I don't have experience with Go to help determine how to use them there.
The integrator application will be called the "parent" process. The SDK-as-a-process will be called the "child" process. A process creates a child and becomes its parent by calling fork(). The new process is a duplicate of the parent, running the same code, and having for the most part all the same state (data in memory). But fork() returns different values to parent and child, so each can determine its role in the relationship. This includes informing the parent of the process identifier (pid) of the child. Hang on to this value. Then, the child uses exec() to execute a different program within the existing process, i.e. your SDK binary. An alternative to fork-then-exec is posix_spawn(), which has rather involved parameters (but gives greater control if you need it).
Designing the child to shutdown in response to a signal, rather than a command through the API, will allow processes other than the parent to initiate clean shutdown in standard fashion. For example, this might be useful for the administrator or user; it enables sending the shutdown signal from the shell command-line or script.
The child installs a signal handler function, that will be called when the child process receives a signal, by calling signal() (or the more complex sigaction() recommended for its portability). There are different signals that can be sent/received, identified by different integer values (and also given names like SIGTERM). You indicate which you're interested in receiving when calling signal(). When your signal handler function is invoked, you've received the signal, and can initiate clean shutdown.
When the parent wants the child to shut down cleanly, the parent sends a signal to the child using the unfortunately named kill(). Unfortunately named because signals can be used for other purposes. Anyway, you pass to kill() the pid (returned by fork()) and the specific signal (e.g. SIGTERM) you want to send.
The parent can also determine when the child has completely shut down by calling waitpid(), again passing the pid returned by fork(); or alternately by registering to receive signal SIGCHLD. Register to receive SIGCHLD before fork()/exec() or you might miss the signal.
Actually, it's important that you do call waitpid(), optionally after receiving SIGCHLD, in order to deallocate a resource holding the child process's exit status, so the OS can cleanup that last remnant of the process. Failing to do so keeps the child as a "zombie" process, unable to be fully reclaimed. Too many zombies and the OS will be unable to launch new processes.
If a process refuses to shut down cleanly or as quickly as you require, you may force it to quit (without executing its cleanup code) by sending the signal SIGKILL.
There are variants of exec(), waitpid() and posix_spawn(), with different names and behaviors, mentioned in their man pages.

get current->pid while in interrupt

I'm writing something on the linux scheduler and I need to know which process was running before my interrupt came in.. is the current structure available? If I do current->pid while in the interrupt handler, do I get the pid of the process I interrupted?
You can, current->pid exists and is the process that was interrupted (may be the idle thread, or any).
If you're writing inside the Linux scheduler, you should be very careful. current is changed by the scheduler as it chooses a new process to run, so its value depends on when exactly you read it.
I wouldn't expect current to be valid outside process context. If you're working on scheduler maybe you can get hold of where it stores pointer to running task, e.g. struct cfs_rq.

How can I make system calls invoke my SIGSEGV handler when given protected memory?

I'm working on a memory tracking library where we use mprotect to remove access to most of a program's memory and a SIGSEGV handler to restore access to individual pages as the program touches them. This works great most of the time.
My problem is that when the program invokes a system call (say read) with memory that my library has marked no access, the system call just returns -1 and sets errno to EFAULT. This changes behavior of the programs being tested in strange ways. I would like to be able to restore access to each page of memory given to a system call before it actually goes to the kernel.
My current approach is to create a wrapper for each system call that touches memory. Each wrapper would touch all the memory given to it before handing it off to the real system call. It seems like this will work for calls made directly from the program, but not for those made by libc (for instance, fread will call read directly without using my wrapper). Is there any better approach? How is it possible to get this behavior?
You can use ptrace(2) to achieve this. It allows you to monitor a process and get told whenever certain events occur. For your purposes, look at PTRACE_SYSCALL which allows you to stop the process upon syscall entry and exit.
You will have to change some of your memory tracking infrastructure, however, as ptrace operates such that a parent process monitors a child process, and as far as the child is concerned it doesn't have visibility of when a monitored event occurs. Having said that, you should be able to do something along the lines of:
Setup ptrace parent and child, monitoring (at least) PTRACE_SYSCALL.
Child process does a syscall; and parent is notified.
Parent saves the requested syscall info; and uses PTRACE_GETREGS and PTRACE_SETREGS to change child state so instead of calling the syscall; the child process calls the 'memory unprotect' routine.
Child unprotect's it's memory; then raises SIGUSR1 or similar to tell controlling parent that the memory work is complete.
Parent catches SIGUSR, uses PTRACE_SETREGS to restore the previouly-saved syscall info and resumes the child.
Child resumes and executes the orignal syscall.

Resources