On OpenBSD:
I want to harden an OpenBSD install. For this imho:
sysctl -w kern.wxabort=1
would be more secure, the default is 0.
W^X violations are no longer permitted by default. A kernel log message
is generated, and mprotect/mmap return ENOTSUP. If the sysctl(8) flag
kern.wxabort is set then a SIGABRT occurs instead, for gdb use or
coredump creation.
so:
SIGABRT Abnormal termination
ENOTSUP Operation not supported (POSIX.1)
So to me (not a programmer) this suggests that SIGABRT may be better, since it will kill (?) the process rather than just produce an error and a log message. From a security perspective, killing the badly behaving process is more secure.
Question: Is this true? Is using SIGABRT more secure? Does SIGABRT really kill the process? Or are SIGABRT and ENOTSUP almost the same, with neither killing the process?
Preventing the operation is where you get security. Killing the process is bonus punishment. We're talking about processes not people, though, so punishment isn't necessary.
The question is whether the processes you're interested in handle errors well. If getting an error code back causes them to derail and do undesirable things, then you may want to send them a signal. Or, as the documentation says, if you want a coredump or want to break in with a debugger, SIGABRT would be useful.
Keep in mind that SIGABRT can be caught. Processes can ignore the signal if they want.
Bottom line, there's no real added security from enabling this option.
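To see the two behaviors side by side, here is a minimal test program (a sketch of my own, assuming an OpenBSD system where the binary is not marked wxneeded, so the W^X policy applies). With kern.wxabort=0 the mprotect() call fails with ENOTSUP; with kern.wxabort=1 the process dies with SIGABRT instead:

    /* wxtest.c - observe OpenBSD's W^X enforcement (illustrative sketch). */
    #include <sys/mman.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
        void *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* Asking for a writable and executable mapping is a W^X violation. */
        if (mprotect(p, pagesz, PROT_READ | PROT_WRITE | PROT_EXEC) == -1)
            printf("mprotect refused: %s (errno=%d)\n", strerror(errno), errno);
        else
            printf("mprotect unexpectedly succeeded\n");
        return 0;
    }

Either way the W^X mapping is refused; the sysctl only decides whether the process gets an error code or a fatal signal, which is the point the answer above is making.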
Related
To verify the behavior of a third party binary distributed software I'd like to use, I'm implementing a kernel module whose objective is to keep track of each child this software produces and terminates.
The target binary is produced by Golang, and it is heavily multi-threaded.
The kernel module I wrote installs hooks on the kernel functions _do_fork() and do_exit() to keep track of each process/thread this binary produces and terminates.
The LKM works, more or less.
Under some conditions, however, I see a scenario I'm not able to explain.
It seems that a process/thread can terminate without passing through do_exit().
The evidence I collected with printk() shows the process being created but never shows it terminating.
I'm aware that printk() can be slow, and I'm also aware that messages can be lost in such situations.
To prevent message loss due to the slow console (a 115200 serial tty in this particular setup), I implemented a quicker console and collected the messages with netconsole.
The resulting setup seems to confirm that a process can terminate without passing through the do_exit() function.
But because I wasn't sure my messages couldn't be lost on the printk() infrastructure, I decided to repeat the same test but replacing printk() with ftrace_printk(), which should be a leaner alternative to printk().
Still the same result: occasionally I see processes that do not pass through do_exit(), and when I verify whether the PID is still running, I have to face the fact that it is not.
Also note that I placed my hook at the very first instruction of do_exit(), to be sure the flow cannot end inside a called function before my hook fires.
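For reference, a hook of this kind can be installed with kprobes. The following is only an illustrative sketch (not the actual module, which also hooks _do_fork()) that logs every entry to do_exit():

    /* exit_trace.c - log every task entering do_exit() (illustrative sketch). */
    #include <linux/module.h>
    #include <linux/kprobes.h>
    #include <linux/sched.h>

    static int do_exit_pre(struct kprobe *p, struct pt_regs *regs)
    {
        pr_info("exit_trace: pid=%d tgid=%d comm=%s\n",
                current->pid, current->tgid, current->comm);
        return 0;
    }

    static struct kprobe kp = {
        .symbol_name = "do_exit",
        .pre_handler = do_exit_pre,
    };

    static int __init exit_trace_init(void)
    {
        return register_kprobe(&kp);
    }

    static void __exit exit_trace_exit(void)
    {
        unregister_kprobe(&kp);
    }

    module_init(exit_trace_init);
    module_exit(exit_trace_exit);
    MODULE_LICENSE("GPL");

Logging tgid alongside pid can help here, given the PID-change scenario explained in the answer below.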
My question is then the following:
Can a Linux process terminate without its flow passing through the do_exit() function?
If so, can someone give me a hint of what this scenario can be?
After a long debug session, I'm finally able to answer my own question.
That's not all; I'm also able to explain why I saw the strange behavior I described in my scenario.
Let's start from the beginning: while monitoring a heavily multithreaded application, I observed rare cases where a PID suddenly stopped existing without its flow ever being observed to pass through the Linux kernel's do_exit() function.
Hence my original question:
Can a Linux process terminate without passing through the do_exit() function?
As far as my current knowledge goes, which I would by now consider reasonably extensive, a Linux process cannot end its execution without passing through the do_exit() function.
But this answer contradicts my observations, and the problem that led me to this question is still there.
Someone here suggested that the strange behavior I saw was due to my observations being somehow wrong, implying that my method was inaccurate and so were my conclusions.
My observations were correct: the process I watched terminated without passing through do_exit().
To explain this phenomenon, I want to put another question on the table, one that I think internet searchers may find somewhat useful:
Can two processes share the same PID?
If you'd asked me this a month ago, I'd surely have answered: "definitely not, two processes cannot share the same PID."
Linux is more complex, though.
There's a situation in which, in a Linux system, two different processes can share the same PID!
https://elixir.bootlin.com/linux/v4.19.20/source/fs/exec.c#L1141
Surprisingly, this behavior does not harm anyone; when this happens, one of these two processes is a zombie.
updated to correct an error
The circumstances behind this duplicate PID are more intricate than those described previously. If a threaded process forks before invoking an execve() (the fork also copies the threads), the process must flush the previous exec context. To execute a new text with execve(), the kernel first calls flush_old_exec(), which in turn calls de_thread() for each thread in the process other than the task leader. As a result, all of the process' threads except the task leader are eliminated. Each thread's PID is changed to that of the leader, and if a thread is not terminated immediately, for example because it has to wait for an operation to complete, it keeps running under that PID.
end of the update
That is what I was watching: the PID I was monitoring did not pass through do_exit() because, when the corresponding thread terminated, it no longer had the PID it started with, but its leader's.
For people who know the Linux kernel's mechanics very well this is nothing to be surprised about; the behavior is intended and has not changed since 2.6.17.
The current kernel, 5.10.3, still behaves this way.
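A small userspace experiment can make the PID change visible (my own illustration, Linux with glibc and pthreads assumed): a non-leader thread prints its PID and TID and then calls exec; the exec'd image keeps running under the leader's PID, not under the TID of the thread that called exec.

    /* exec_from_thread.c - show that a thread calling exec ends up running
     * under the thread-group leader's PID (illustrative sketch).
     * Build: cc exec_from_thread.c -pthread */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        (void)arg;
        printf("before exec: pid=%d tid=%ld\n",
               getpid(), (long)syscall(SYS_gettid));
        fflush(stdout);
        /* The exec'd shell prints its own PID: it matches the pid above,
         * not the tid of this thread. */
        execl("/bin/sh", "sh", "-c", "echo after exec: pid=$$", (char *)NULL);
        perror("execl");
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);   /* never returns if the exec succeeds */
        return 0;
    }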
Hoping this will be useful to internet searchers, I'd also like to add that it answers the following:
Question: Can a Linux process/thread terminate without passing through do_exit()? Answer: No. do_exit() is the only way a process can end its execution, whether the termination is intentional or not.
Question: Can two processes share the same PID? Answer: Normally, no. There are rare cases in which two schedulable entities have the same PID.
Question: Does the Linux kernel have scenarios where a process changes its PID? Answer: Yes, there is at least one scenario where a process changes its PID.
Can a Linux process terminate without its flow passing through the do_exit() function?
Probably not, but you should study the source code of the Linux kernel to be sure. Ask on KernelNewbies. Kernel threads and udev or systemd related things (or perhaps modprobe or the older hotplug) are probable exceptions. If your /sbin/init, of pid 1, terminated (which should not happen), strange things would follow.
The LKM works, more or less.
What does that mean? How can a kernel module half-work?
And in real life it does sometimes happen that your Linux kernel panics or crashes (and that could happen with your LKM, if it has not been peer-reviewed by the Linux kernel community). In such a case there is no longer any notion of processes, since they are an abstraction provided by a living Linux kernel.
See also dmesg(1), strace(1), proc(5), syscalls(2), ptrace(2), clone(2), fork(2), execve(2), waitpid(2), elf(5), credentials(7), pthreads(7)
Look also inside the source code of your libc, e.g. GNU libc or musl-libc
Of course, see Linux From Scratch and Advanced Linux Programming
And verifying if the PID is currently running,
This can be done in user land with /proc/, or by using kill(2) with signal 0 (and maybe also pidfd_send_signal(2)...)
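For example, a tiny sketch of the kill(2)-with-signal-0 approach:

    /* alive.c - check whether a PID currently exists (sketch). */
    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s pid\n", argv[0]);
            return 2;
        }
        pid_t pid = (pid_t)atol(argv[1]);
        if (kill(pid, 0) == 0)
            printf("%d exists (and we may signal it)\n", (int)pid);
        else if (errno == ESRCH)
            printf("%d does not exist\n", (int)pid);
        else if (errno == EPERM)
            printf("%d exists but belongs to another user\n", (int)pid);
        else
            perror("kill");
        return 0;
    }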
PS. I still don't understand why you need to write a kernel module or change the kernel code. My intuition would be to avoid doing that when possible.
According to the ptrace documentation:
Stop the tracee at the next clone(2) and automatically start tracing the newly cloned process, which will start with a SIGSTOP, or PTRACE_EVENT_STOP if PTRACE_SEIZE was used.
The problem is that SIGSTOP may not be caused by ptrace at all; even the user can send this signal to the process. Having the child stop with PTRACE_EVENT_STOP instead would be perfect in this case.
I'm spawning a child process myself so using PTRACE_TRACEME is the best way to start tracing it - it's free of race conditions. If I insist on using PTRACE_SEIZE instead, the child process may have already exited before I call PTRACE_SEIZE in the parent process.
Is there any way to prevent the child process from receiving a plain SIGSTOP when tracing with PTRACE_TRACEME?
In a nutshell, you can't.
There is good news, however. Since Linux version 3.4, ptrace supports a new operation, PTRACE_SEIZE. It is initiated from the parent rather than the child, so the attach semantics are somewhat different. Other than that, it has a few differences, one of which is that it solves this particular problem.
You will need to read the man page to get the gory details. Pretty much everything about the way events are reported has changed if you use it. This problem (along with similar ones) is precisely the reason it was introduced, so if that problem bothers you, you should definitely use PTRACE_SEIZE instead of PTRACE_TRACEME, despite the inconvenience.
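One way to avoid the "child may exit before I attach" race without PTRACE_TRACEME is a small pipe handshake: the child blocks until the parent has seized it. A sketch under the assumption of Linux >= 3.4 and a reasonably recent glibc (error handling trimmed):

    /* seize_handshake.c - race-free PTRACE_SEIZE via a pipe (sketch). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) == -1) { perror("pipe"); return 1; }

        pid_t child = fork();
        if (child == 0) {
            /* Child: wait until the parent says it has attached, then exec. */
            char go;
            close(fds[1]);
            read(fds[0], &go, 1);
            execl("/bin/ls", "ls", (char *)NULL);
            _exit(127);
        }

        close(fds[0]);
        /* Attach without disturbing the child's signals; new clones will be
         * auto-attached and reported with PTRACE_EVENT_STOP, not SIGSTOP. */
        if (ptrace(PTRACE_SEIZE, child, 0,
                   PTRACE_O_TRACECLONE | PTRACE_O_TRACEEXIT) == -1) {
            perror("PTRACE_SEIZE");
            return 1;
        }
        write(fds[1], "x", 1);   /* release the child */

        int status;
        while (waitpid(child, &status, 0) > 0 && !WIFEXITED(status))
            ptrace(PTRACE_CONT, child, 0, 0);   /* keep the tracee running */
        return 0;
    }

The child cannot exit before PTRACE_SEIZE runs because it is blocked on the pipe, and the attach itself never injects a SIGSTOP, which is exactly the difference the answer above describes.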
If I receive ENOBUFS or ENOMEM during a call to read(2), is it possible that the kernel may free up resources and a future call will succeed? Or, do I treat the error as fatal, and begin a teardown process?
I'm a bit at a loss to see what possible use may come from retrying.
If you got back ENOMEM on a read, it means the kernel is in serious trouble. Yes, it is possible that retrying might work, but it is also possible it will not. If it will not, how long is it appropriate to wait before retrying? If you retry immediately, what's to prevent you from becoming yet another process spinning in a 100% CPU-bound loop?
Personally, if I got such an error from a read for which I know how to handle errors, I'd handle the error as usual. If it is a situation where I positively need the read to succeed, then I'd fail the program. If this program is mission critical, you will need to run it inside a watchdog that restarts it anyway.
On that note, please bear in mind that if the kernel returned ENOMEM, there is a non-negligible probability that the OOM killer will send SIGKILL to someone. Experience has shown that someone will likely be your process. That is just one more reason to just exit, and handle that exit with a watchdog monitoring the process (bear in mind, however, that the watchdog might also get a SIGKILL if the OOM killer was triggered).
The situation with ENOBUFS isn't much different. The "how long to delay" and infinite-loop considerations are still there. The OOM killer is less likely in that case, but relying on the watchdog is still the correct path, IMHO.
The core issue here is that there are no specific cases in which read(2) should return any of those errors. If a condition arises that results in those errors, it is just as legitimate for the driver to return EIO.
As such, and unless OP knows of a specific use case his code is built to handle, these errors really should be handled the same way.
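In code, that amounts to treating ENOMEM and ENOBUFS exactly like EIO or any other error you have no specific recovery for. A sketch of that shape (the helper name is mine):

    /* Read helper that retries only the errors worth retrying (sketch). */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    ssize_t read_or_die(int fd, void *buf, size_t len)
    {
        for (;;) {
            ssize_t n = read(fd, buf, len);
            if (n >= 0)
                return n;               /* data, or a clean end of file */
            switch (errno) {
            case EINTR:
                continue;               /* interrupted: retrying is harmless */
            case EAGAIN:
                return -1;              /* non-blocking fd: caller should poll */
            case ENOMEM:
            case ENOBUFS:
            case EIO:
            default:
                /* Kernel-side trouble we cannot fix by retrying: give up and
                 * let a watchdog restart the program, as suggested above. */
                perror("read");
                exit(EXIT_FAILURE);
            }
        }
    }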
One last note regarding the OOM killer. People sometimes think of it as something that will save them from hanging the entire system. That is not really the case. The OOM killer kills a process more or less at random. It is true that the more pages a process has, the more likely it is to be the one killed. I strongly suggest not relying on that fact, however.
I have seen cases where physical memory was exhausted and the OOM killer killed a process that used very little memory, taking some time to get to the main culprit. I've seen cases where the memory exhaustion was in the kernel address space, and the user-space processes being killed were completely random.
As I've said above, the OOM killer might kill your watchdog process, leaving your main hogger running. Do not rely on it to fix your code path.
How is the signalling (interrupts) mechanism handled in the kernel? The reason I ask: somehow a SIGABRT signal is received by my application and I want to find out where it comes from.
You should be looking in your application for the cause, not in the kernel.
Usually a process receives SIGABRT when it directly calls abort or when an assert fails. Finding exactly the piece of the kernel that delivers the signal will gain you nothing.
In conclusion, your code or a library your code is using is causing this. See abort(3) and assert.
cnicutar's answer is the best guess IMHO.
It is possible that the signal was emitted by another process, although in the case of SIGABRT it is most likely emitted by the same process that receives it, via the abort(3) libc function.
If in doubt, you can run your application with strace -e kill yourapp your args ... to quickly check whether that kill system call is indeed invoked from within your program or its dependent libraries. Or use gdb's catch syscall.
Note that in some cases the kernel itself can emit signals, such as a SIGKILL when the infamous "OOM killer" goes into action.
BTW, signals are delivered asynchronously; they disrupt the normal workflow of your program. This is why they're painful to trace. Besides machinery such as SystemTap, I don't know how to trace or log signal emission and delivery within the kernel.
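One more thing you can do from inside the application (my own addition, a hedged sketch): install a SIGABRT handler with SA_SIGINFO and log who sent the signal before letting the default action proceed. si_code distinguishes kill()-style senders (SI_USER, with a meaningful si_pid) from other origins such as abort() in your own process.

    /* Log the origin of SIGABRT before dying (sketch). */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void on_abrt(int sig, siginfo_t *info, void *ctx)
    {
        (void)ctx;
        /* fprintf is not async-signal-safe; acceptable as a debugging aid. */
        fprintf(stderr, "SIGABRT: si_code=%d si_pid=%d si_uid=%d\n",
                info->si_code, (int)info->si_pid, (int)info->si_uid);
        signal(sig, SIG_DFL);   /* restore the default action... */
        raise(sig);             /* ...and die with the usual core dump */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_abrt;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGABRT, &sa, NULL);

        abort();                /* stand-in for whatever triggers it for real */
    }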
I know that, given enough context, one could hope to react constructively to (i.e. recover from) a segfault condition.
But, is the effort worth it? If yes, in what situation(s) ?
You can't really hope to recover from a segfault. You can detect that it happened, and dump out relevant application-specific state if possible, but you can't continue the process. This is because (amongst other reasons):
The thread which failed cannot be continued, so your only options are longjmp or terminating the thread. Neither is safe in most cases.
Either way, you may leave a mutex / lock in a locked state which causes other threads to wait forever
Even if that doesn't happen, you may leak resources
Even if you don't do either of those things, the thread which segfaulted may have left the internal state of the application inconsistent when it failed. An inconsistent internal state can cause data errors or further bad behaviour later on, which creates more problems than simply quitting.
So in general, there is no point in trapping it and doing anything EXCEPT terminating the process in a fairly abrupt fashion. There's no point in attempting to write (important) data back to disc, or in continuing to do other useful work. There is some point in dumping out state to logs, which many applications do, and then quitting.
A possibly useful thing to do might be to exec() your own process, or have a watchdog process which restarts it in the case of a crash. (NB: exec does not always have well defined behaviour if your process has >1 thread)
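The watchdog idea is simple enough to sketch (an illustration of the suggestion above, not code from the answer): a parent process that re-runs the real program whenever it dies on a signal.

    /* watchdog.c - restart a child program whenever it dies on a signal (sketch). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
            return 2;
        }
        for (;;) {
            pid_t pid = fork();
            if (pid == 0) {
                execvp(argv[1], &argv[1]);   /* run the real program */
                _exit(127);
            }
            int status;
            waitpid(pid, &status, 0);
            if (WIFEXITED(status))
                return WEXITSTATUS(status);  /* clean exit: stop restarting */
            fprintf(stderr, "watchdog: child killed by signal %d, restarting\n",
                    WTERMSIG(status));
            sleep(1);                        /* avoid a tight restart loop */
        }
    }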
A number of reasons:
To provide more application specific information to debug a crash. For instance, I crashed at stage 3 processing file 'x'.
To probe whether certain memory regions are accessible. This was mostly to satisfy an API for an embedded system. We would try to write to the memory region and catch the segfault that told us that the memory was read-only.
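The probing trick in the second point looks roughly like this (a sketch of the general technique, not the original embedded code; siglongjmp out of a SIGSEGV handler is not strictly portable, but it works on common platforms):

    /* probe.c - test whether an address is writable by catching SIGSEGV (sketch). */
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>

    static sigjmp_buf probe_env;

    static void segv_handler(int sig)
    {
        (void)sig;
        siglongjmp(probe_env, 1);     /* jump back out of the faulting write */
    }

    /* Returns 1 if *p can be written, 0 if the write faults. */
    static int probe_writable(volatile char *p)
    {
        struct sigaction sa, old;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = segv_handler;
        sigaction(SIGSEGV, &sa, &old);

        int ok;
        if (sigsetjmp(probe_env, 1) == 0) {
            char c = *p;              /* read the current value */
            *p = c;                   /* try to write it back */
            ok = 1;
        } else {
            ok = 0;                   /* we arrived here from the handler */
        }
        sigaction(SIGSEGV, &old, NULL);
        return ok;
    }

    int main(void)
    {
        static const char ro[] = "string in a read-only segment (typically)";
        char rw[32] = "writable buffer";
        printf("rodata writable? %d\n", probe_writable((volatile char *)ro));
        printf("stack  writable? %d\n", probe_writable(rw));
        return 0;
    }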
The segfault usually originates from a fault raised by the MMU, which the operating system uses to swap in pages of memory when necessary. If the OS doesn't have that page of memory, it forwards the fault to the application as a signal.
A segmentation fault is really an access to memory that you do not have permission to access (either because it's not mapped, you lack permissions, the virtual address is invalid, etc.).
Depending on the underlying reason, you may want to trap and handle the segmentation fault. For instance, if your program is passed an invalid virtual address, it may log that segfault and then do some damage control.
A segfault does not necessarily mean that the program heap is corrupted. Reading an invalid address ( eg. null pointer ) may result in a segfault, but that does not mean that the heap is corrupted. Also, an application can have multiple heaps depending on the C runtime.
There are very advanced techniques that one might implement by catching a segmentation fault, if you know the segmentation fault isn't an error. For example, you can protect pages so that they can't be read, and then trap SIGSEGV to perform "magical" behavior before the read completes. (See Tomasz Węgrzanowski's "Segfaulting own programs for fun and profit" for an example of what you might do, but usually the overhead is pretty high, so it's not worth doing.)
A similar principle applies to trapping an illegal-instruction exception (usually in the kernel) to emulate an instruction that's not implemented on your processor.
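For a flavour of the page-protection trick, here is a tiny sketch (my own, Linux assumed): a page is mapped PROT_NONE, the first access faults, and the handler makes the page accessible and fills it before the faulting read is restarted.

    /* lazy_page.c - fill a page on first access by trapping SIGSEGV (sketch). */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static size_t pagesz;

    static void on_segv(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;
        /* Find the page containing the faulting address ... */
        char *page = (char *)((uintptr_t)info->si_addr & ~(uintptr_t)(pagesz - 1));
        /* ... make it accessible and give it content, then return: the
         * faulting instruction is restarted and now succeeds.
         * (mprotect/memset in a handler are not strictly async-signal-safe.) */
        mprotect(page, pagesz, PROT_READ | PROT_WRITE);
        memset(page, 'A', pagesz);
    }

    int main(void)
    {
        pagesz = (size_t)sysconf(_SC_PAGESIZE);
        char *region = mmap(NULL, pagesz, PROT_NONE,
                            MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        printf("first byte: %c\n", region[0]);   /* faults once, then reads 'A' */
        return 0;
    }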
To log a crash stack trace, for example.
No. I think it is a waste of time: a segfault indicates there is something wrong in your code, and you will be better advised to find it by examining a core dump and/or your source code. The one time I tried trapping a segfault, it led me off into a hall of mirrors that I could have avoided by simply thinking about the source code. Never again.