Vulkan abort problematic commands? - graphics

I have an application where multiple threads would be rendering different parts of a world. However it may occur that one of those threads could submit a highly problematic, or even malicious, command to Vulkan.
Is there any way to preemptively check a command for problems? Or to let it attempt to execute, then by some means determine that it is problematic and abort it? All the while not corrupting or wrecking legitimate commands that were submitted from other threads.
I know the obvious solution is "don't submit malicious commands!" but, without explaining everything, the gist of this is to try to create a kind of graphics sandbox.

The Vulkan run-time assumes well-formed input; there isn't any error checking (that's left to the validation layers), so without validation you could get rendering corruption or driver crashes.
You can get some limited protection against GPU-side buffer overruns using robustBufferAccess, but it only catches a tiny subset of the problems.
Beyond that the only real solution is to rely on host process isolation, and put each content provider into a separate process on the host OS with a unique rendering context.
Even with that you can get trivial denial-of-service (shader with a very long running and/or infinite loop), which the API doesn't really give you any means to control. You'd be reliant on the privileged GPU driver timing out the process and killing it.
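As a rough host-side illustration of the isolation-plus-timeout idea (this is not Vulkan code, and `run_isolated` and its return strings are names made up for this sketch), a supervisor can run each content provider in its own OS process and kill it when it overruns a time budget. A minimal Python sketch, assuming the untrusted work can be expressed as a script:

```python
import subprocess
import sys


def run_isolated(source: str, timeout_s: float) -> str:
    """Run untrusted Python source in a separate OS process.

    Returns "ok", "crashed" or "timeout". The child cannot corrupt the
    supervisor's memory, and a runaway child is killed by the timeout.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if done.returncode == 0 else "crashed"


if __name__ == "__main__":
    print(run_isolated("print('well behaved')", 30.0))
    print(run_isolated("while True: pass", 0.5))  # the infinite-loop case
```

This only protects the host process. Whether a hung GPU workload can be stopped without disturbing other processes still depends on the driver, as noted above.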

Related

Why do we need semaphores on single cpu?

I have read that we use semaphores inside the Linux kernel, and I have read that semaphores have advantages even on a single CPU (where only one process/thread can run at a time). Can anyone please give me an example of a problem that a semaphore solves (inside the kernel)?
In my view, there can be a problem only if we have more than one CPU, because two processes may make system calls that use the same data structure, and probably cause problems.
Thank you for your help!
You don't really need more than one CPU for concurrency. The multiple CPUs are really "an implementation detail," a piece of hardware quirkiness that you can abstract away from. Concurrency is a logical property of programs. You can have concurrency without multiple CPUs, and use multiple CPUs without "real concurrency".
Consider a web server. It has to be "concurrent," in the sense that it must serve multiple clients at once, hold information about multiple connections at once, and process multiple requests at once. You can have it literally do this, by having multiple CPUs all working at the same time. Yet, the program only has to appear to do multiple things at once. It could just as well be running on one CPU and context switching to fairly service all the work put to it. The fact that a web server does multiple things at once is part of its interface: the I/O for the connections is interleaved, if a request has exclusively locked a resource, another request won't start trying to manipulate that same resource, etc. Writing a web server without concurrency produces a program that is wrong.
Semaphores help you with concurrency, by letting you control the way processes access resources. You asked, if you had one process running, how another could run at the same time with only a single core. Well, as I said, concurrency doesn't need multiple cores. The first process can be paused, and the second one started while the first one is still unfinished. This is just an implementation detail; logically, to the program writer, the two processes are running simultaneously, whether there are multiple cores or not. If the program was written without semaphores (or had broken concurrency in some other way), it would be wrong, even on a single core. Physically, this will be because context switching can abruptly pause one computation and start another at any time, and, without semaphores, the newly live thread won't know what resources it can and cannot access. Logically, this will be because the processes are running simultaneously, once you abstract yourself away from the implementation, and, in general, processes running simultaneously can walk over each other if not properly synchronized.
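To make the "context switching can pause a computation at any time" point concrete, here is a small Python sketch (all names are invented for illustration). The first function interleaves two logical tasks by hand on a single thread, standing in for a single CPU, and loses an update. The second protects the same read-modify-write with a binary semaphore.

```python
import threading


def lost_update_demo() -> int:
    """Two tasks interleaved by hand on one 'CPU', with no locking."""
    shared = {"value": 0}

    def task():
        current = shared["value"]        # read
        yield                            # a context switch happens here
        shared["value"] = current + 1    # write back a stale value

    a, b = task(), task()
    next(a)  # task A reads 0 and is paused
    next(b)  # task B reads 0 and is paused
    for t in (a, b):
        try:
            next(t)                      # resume: each task writes 1
        except StopIteration:
            pass
    return shared["value"]               # one increment has been lost


def guarded_demo(workers: int = 4, rounds: int = 1000) -> int:
    """The same read-modify-write, guarded by a binary semaphore."""
    shared = {"value": 0}
    sem = threading.Semaphore(1)

    def work():
        for _ in range(rounds):
            with sem:                    # only one thread inside at a time
                current = shared["value"]
                shared["value"] = current + 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]
```

Two increments ran in `lost_update_demo`, yet it returns 1: no second CPU was needed to get the wrong answer. `guarded_demo` always returns `workers * rounds`.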
For an example applicable to an OS kernel, consider that every process is logically running concurrently with every other process. A kernel provides the implementation that makes this concurrency work. A resource that two processes may want simultaneously is a hard drive. A semaphore might be used in the kernel to track whether a given drive is currently busy with a read or write. A process trying to read or write to the same disk will ask the kernel to do so, and the kernel can check the semaphore to see that the disk is still busy and force the offending process to wait. Now, an operating system does count as low level code, so in some places, yes, you might want to omit some otherwise vital concurrency safeguards when running on a single CPU, because your job is to handle such implementation details, but higher level parts may still use them.
In contrast, consider a number-crunching program. Let's say it's processing each element of a huge array of data into an equal-sized array of modified data (a functional map operation). It can use multiple CPUs to do this more quickly, but it can also work on one CPU. The observable behavior of the program is the same, and you never get any idea that it's doing multiple things at once from its behavior. Numbers go in, numbers come out, who cares what happens in the middle? Writing such a program without the ability to do multiple things at once does not produce a logically incorrect program, just a slow one. Such a program probably does not need semaphores when running on a single CPU, because it didn't need concurrency in the first place.

How do sandboxing environments recover from faults?

In JavaScript runtimes and other JIT-compiled environments where code needs to be sandboxed, how can the larger program recover when the dynamically loaded code faults?
For instance, the code that SpiderMonkey generates for WebAssembly contains instructions which throw SIGILL when they are executed (e.g., to do a bounds check). When such a fault is thrown, I suppose the exception handler of the JavaScript engine is called. But such a signal handler is very restricted in its abilities. To jump out of the bad dynamically generated code it would typically use siglongjmp, but if I understand the notes at the bottom of man signal-safety correctly, if that happens, then no other unsafe function may be used by the program ever again. Such a restriction is of course unreasonable for a JS runtime (e.g.).
So how do sandboxing environments recover from child faults without losing abilities (in particular, becoming restricted to only async-safe functions)?
One thought I had is that perhaps the sandboxed code is run in another thread with its own signal handler, so that the main thread does not lose any abilities when the child thread receives a signal. But I don't know whether that is correct; I find it hard to find documentation on how signals and threads interact exactly.
As you will see this question is tagged posix, but I am in principle also interested in the Windows approach.

Multi threading analysis techniques

Does anyone know of any analysis techniques that can be used to design/debug thread locking and unlocking sequences? Essentially a technique (like a truth table) I can use to prove that my sequence of locks won't deadlock.
This is not the sort of problem that programming by trial and error works well in.
My particular problem is a read write lock - but I ask this in the general sense. I believe it would be a useful technique to learn if one exists.
I have tried a causal graph in which I have boxes and arrows that I can use to follow the flow of control and that has solved 80% of my problem. But I am still getting occasional deadlocks under stress testing when one thread sneaks through the "gap between instructions" if that makes any sense.
To summarize; what I need is some way of representing the problem so that I can formally analyze the overlap of mutex locks.
Bad news I'm afraid. There are no techniques that I know of that can "prove" that a system that uses locks to control access to shared memory is correct. By "prove" I mean demonstrating analytically that a program won't deadlock, livelock, etc.
The problem is that threads run asynchronously. As soon as you start having a sensible number of threads and shared resources, the number of possible sequences of events (e.g. locking/unlocking shared resources) is astronomically high and you cannot model / analyse each and every one of them.
For this reason Communicating Sequential Processes was developed by Tony Hoare, way back in 1978. It is a development of the Actor model which itself goes a long way to resolving the problem.
Actor and CSP
Briefly, in the Actor model data is not communicated via shared memory with a lock. Instead a copy is sent down a communications channel of some sort (e.g. a socket, or pipe) between two threads. This means that you're never locking memory. In effect all memory is private to threads, with copies of it being sent as and when required to other threads. It's a very 'object orientated' thing; private data (thread-owned memory), public interface (messages emitted and received on communications channels). It's also very scalable - pipes can become sockets, threads can become processes on other computers.
The CSP model is just like that, except that the communications channel won't accept a message unless the receiving end is ready to read it.
This addition is crucial - it means that a system design can be analysed algebraically. Indeed Tony Hoare formulated a process calculus for CSP. The Wikipedia page on CSP cites use of this to prove an eCommerce system's design.
So if one is developing a strict CSP system, it is possible to prove analytically that it cannot deadlock, etc.
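As a rough sketch of the difference (not taken from any particular library; `RendezvousChannel` is a name invented here), a CSP-style channel can be approximated in Python by making `send` block until a receiver has actually taken the message. Strict CSP refuses the message until the receiver is ready; this approximation accepts it but does not let the sender continue until the hand-over is complete.

```python
import queue
import threading


class RendezvousChannel:
    """Approximation of a zero-buffer channel: send() returns only
    after some receiver has taken the item."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)
        self._send_lock = threading.Lock()

    def send(self, item):
        with self._send_lock:     # one sender in flight at a time
            self._q.put(item)
            self._q.join()        # wait for the receiver's task_done()

    def recv(self):
        item = self._q.get()      # blocks until a sender offers an item
        self._q.task_done()       # releases the blocked sender
        return item


if __name__ == "__main__":
    ch = RendezvousChannel()
    received = []

    def consumer():
        for _ in range(3):
            received.append(ch.recv())

    t = threading.Thread(target=consumer)
    t.start()
    for n in range(3):
        ch.send(n)                # each send waits for the matching recv
    t.join()
    print(received)
```

Because a sender with no receiver simply stops, a design that can deadlock tends to do so immediately rather than only under load.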
Real World Experience
I've done many a CSP (or CSP-ish) system, and it's always been good. Instead of doing the maths I've used intuition to help me avoid problems. In effect CSP ensures that if I've gone and built a system that can deadlock, it will deadlock every time. So at least I find it in development, not 2 years later when some network link gets a bit busier than normal.
Real World Options
For Actor model programming there's a lot of options. ZeroMQ, nanomsg, Microsoft's .NET Data Flow library.
They're all pretty good, and with care you can make a system that'll be pretty good. I like ZeroMQ and nanomsg a lot - they make it trivial to split a bunch of threads up into separate processes on separate computers and you've not changed the architecture at all. If absolute performance isn't essential coupling these two up with, for example, Google Protocol Buffers makes for a really tidy system with huge options for incorporating different OSes, languages and systems into your design.
I suspect that MS's DataFlow library for .NET moves ownership of references to the data around instead of copying it. That ought to make it pretty performant (though I've not actually tried it to see).
CSP is a bit harder to come by. You can nearly make ZeroMQ and DataFlow into CSP by setting message buffer lengths. Unfortunately you cannot set the buffer length to zero (which is what would make it CSP). MS's documentation even talks about the benefits to system robustness achieved by setting the queue length to 1.
You can synthesize CSP on top of Actor by having flows of synchronisation messages across the links. This is annoying to have to implement.
I've quite often spun up my own comms framework to get a CSP environment.
There's libraries for Java I think, don't know how actively developed they are.
However as you have existing code written around locked shared memory it'll be a tough job to adapt your code. So....
Kernel Shark
If you're on Linux and your kernel has FTRACE compiled in you can use Kernel Shark to see what has happened in your system. Similarly with DTRACE on Solaris, WindView on VxWorks, TATL on MCOS.
What you do is run your system until it stops, and then very quickly preserve the FTRACE log (it gets overwritten in a circular buffer by the OS). You can then see graphically what has happened (turn on Kernel Shark's process view), which may give clues as to what did what and when.
This helps you diagnose your application's deadlock, which may lead you towards getting things right, but ultimately you can never prove that it is correct this way. That doesn't stop you having a Eureka moment where you now know in your bones that you've got it right.
I know of no equivalent of FTRACE / Kernel shark for Windows.
For a broad range of multithreading tasks, we can draw a graph which reflects the order in which resources are locked. If that graph has cycles, deadlock is possible. If there are no cycles, deadlock can never occur.
For example, consider the Dining Philosophers problem. If each philosopher takes the left fork first and then the right fork, the lock-order graph is a ring connecting all the forks. Deadlock is very possible in this situation. However, if one of the philosophers changes his order, the ring becomes a line and deadlock can never occur. If all the philosophers change their order and all take the right fork first, the graph again forms a ring and deadlock is possible again.
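The check described above can be sketched in a few lines of Python. The function names and the fork numbering are assumptions made for this example: philosopher `i` has fork `i` on the left and fork `(i + 1) % n` on the right, and an edge `a -> b` means "some thread locks `a` and then `b` while still holding `a`".

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a directed lock-order graph."""
    graph = {}
    for first, second in edges:
        graph.setdefault(first, set()).add(second)
        graph.setdefault(second, set())

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on stack, finished
    colour = dict.fromkeys(graph, WHITE)

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:       # back edge: a cycle exists
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


def philosopher_orders(n, right_first=()):
    """Lock order for each philosopher as (first fork, second fork)."""
    orders = []
    for i in range(n):
        left, right = i, (i + 1) % n
        orders.append((right, left) if i in right_first else (left, right))
    return orders


if __name__ == "__main__":
    print(has_cycle(philosopher_orders(5)))                        # ring
    print(has_cycle(philosopher_orders(5, right_first={4})))       # line
    print(has_cycle(philosopher_orders(5, right_first=range(5))))  # ring
```

In a real program the edges would have to be collected from the code or from instrumentation at run time; the graph only covers lock orders that were actually recorded.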

Runtime integrity check of executed files

I just finished writing a linux security module which verifies the integrity of executable files at the start of their execution (using digital signatures). Now I want to dig a little bit deeper and want to check the files' integrity during run-time (i.e. periodically check them - since I am mostly dealing with processes that get started and run forever...) so that an attacker is not able to change the file within main memory without being identified (at least after some time).
The problem here is that I have absolutely no clue how I can check the file's current memory image. My authentication method mentioned above makes use of a mmap-hook which gets called whenever a file is mmaped before its execution, but as far as I know the LSM framework does not provide tools for periodical checks.
So my question: Are there any hints on how I should start this? How can I read a memory image and check its integrity?
Thank you
I understand what you're trying to do, but I'm really worried that this may be a security feature that gives you a warm fuzzy feeling for no good reason; and those are the most dangerous kinds of security features to have. (Another example of this might be the LSM sitting right next to yours, SELinux. Although I think I'm in the minority on this opinion...)
The program data of a process is not the only thing that affects its behavior. Stack overflows, where malicious code is written into the stack and jumped into, make integrity checking of the original program text moot. Not to mention the fact that an attacker can use the original unchanged program text to his advantage.
Also, there are probably some performance issues you'll run into if you are constantly computing DSA inside the kernel. And you're adding that much more to the long list of privileged kernel code that could possibly be exploited later on.
In any case, to address the question: You can possibly write a kernel module that instantiates a kernel thread that, on a timer, hops through each process and checks its integrity. This can be done by using the page tables for each process, mapping in the read only pages, and integrity checking them. This may not work, though, as each memory page probably needs to have its own signature, unless you concatenate them all together somehow.
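The per-page bookkeeping can be illustrated in user space. This is only a sketch of the idea, not kernel code: it uses plain SHA-256 digests instead of signatures, assumes a 4096-byte page size, and works on a byte string standing in for the read-only pages of a process. `page_digests` and `modified_pages` are names invented here.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for this sketch


def page_digests(image: bytes, page_size: int = PAGE_SIZE):
    """Return one SHA-256 digest per page of the given memory image."""
    return [
        hashlib.sha256(image[offset:offset + page_size]).digest()
        for offset in range(0, len(image), page_size)
    ]


def modified_pages(image: bytes, reference, page_size: int = PAGE_SIZE):
    """Return the indices of pages whose digest differs from the reference."""
    current = page_digests(image, page_size)
    if len(current) != len(reference):
        raise ValueError("image and reference cover different page counts")
    return [i for i, (now, ref) in enumerate(zip(current, reference))
            if now != ref]
```

A periodic sweep would compute the reference digests once, when the file is first verified, and call something like `modified_pages` on each pass. Keeping one digest per page means a page can be checked on its own, without concatenating the whole image.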
A good thing to note is that shared libraries only need to be integrity checked once per sweep, since they are re-mapped across all the processes that use them. It takes sophistication to implement this though, so maybe put this under the "nice-to-have" section of your design.
If you disagree with my rationale that this may not be a good idea, I'd be very interested in your thoughts. I ran into this idea at work a while ago, and it would be nice to bring fresh ideas to our discussion.

Is firing off a Thread a valid answer to simplifying code?

As multi-processor and multi-core computers become more and more ubiquitous, is simply firing off a new thread a (relatively) simple and painless way of simplifying code? For instance, in a current personal project, I have a network server listening on a port. Since this is just a personal project, it's just a desktop app, with a GUI integrated into it for configuration. So, the app reads something like this:
Main()
    Read configuration
    Start listener thread
    Run GUI
Listener Thread
    While the app is running
        Wait for a new connection
        Run a client thread for the new connection
Client Thread
    Write synchronously
    Read synchronously
    ad infinitum, or till they disconnect
This approach means that while I have to worry about a lot of locking, with the potential issues that involves, I avoid a lot of spaghetti code from asynchronous calls, etc.
A slightly more insidious version of this came up today when I was working on the startup code. The startup was quick, but it was using lazy loading for a lot of the configuration, which meant that while startup was quick, actually connecting to and using the service was difficult because of the lag while it loaded different sections (this was actually measurable in real time, up to 3-10 seconds sometimes). So I moved to a different strategy: on startup, loop through everything and force the lazy loading to kick in... but this made it start prohibitively slow; get up, go get a coffee slow. Final solution: throw the loop into a separate thread with feedback in the system tray while it's still loading.
Is this "Meh, throw it in another thread, it'll be fine" attitude ok? At what point do you start getting diminishing returns and/or even reduced performance?
Multithreading does a lot of things, but I don't think "simplification" is ever one of them.
It's a great way to introduce bugs into code.
Using multiple threads properly is not easy. It should not be attempted by new developers.
In my opinion, multi-threaded programming is pretty high up on the difficulty (and complexity) scale, along with memory management. To me, the "Meh, throw it in another thread, it'll be fine" attitude is a bit too casual. Think long and hard you must, before forking threads you do.
No.
Plainly and simply, multithreading increases complexity and is a nearly trivial way to add bugs to code. There are concurrency issues such as synchronization, deadlock, race conditions, and priority inversion to name a few.
Secondly, the performance gains are not automatic. Recently, there was an excellent article in MSDN Magazine along these lines. The salient details are that a certain operation was taking 46 seconds per ten iterations coded as a single-threaded operation. The author parallelized the operation naively (one thread per four cores) and the operation dropped to 30 seconds per ten iterations. Sounds great until you take into consideration that the operation now eats 300% more processing power but only experienced a 34% gain in efficiency. It's not worth consuming all available processing power for a gain like that.
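The arithmetic behind those figures can be checked in a few lines. This sketch assumes the parallel run used four cores, which is how the "300% more processing power" figure is read here; `parallel_stats` is a name invented for the example.

```python
def parallel_stats(serial_s: float, parallel_s: float, cores: int):
    """Return (speedup, fraction of wall time saved, per-core efficiency)."""
    speedup = serial_s / parallel_s
    time_saved = 1.0 - parallel_s / serial_s
    efficiency = speedup / cores
    return speedup, time_saved, efficiency


if __name__ == "__main__":
    speedup, saved, efficiency = parallel_stats(46.0, 30.0, cores=4)
    print(f"speedup {speedup:.2f}x, "
          f"wall time saved {saved:.1%}, "
          f"per-core efficiency {efficiency:.1%}")
```

Under that assumption each core delivers well under half of the useful work it did in the single-threaded run, which is the point being made.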
This gives you the extra job of debugging race conditions, and handling locks and synchronisation issues.
I would not use this unless there was a real need.
Read up on Amdahl's law, best summarized by "The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program."
As it turns out, if only a small part of your app can run in parallel you won't get much gains, but potentially many hard-to-debug bugs.
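A small sketch of Amdahl's law makes the limit easy to play with. The function name is invented for this example; `parallel_fraction` is the share of the run time that can be parallelised.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only part of a program runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    for fraction in (0.1, 0.5, 0.9):
        print(fraction, round(amdahl_speedup(fraction, 4), 2))
```

With only 10% of the work parallelisable, even a thousand processors give a speedup of barely 1.11, because the serial 90% still has to run on one of them.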
I don't mean to be flip but what's in that configuration file that it takes so long to load? That's the origin of your problem, right?
Before spawning another thread to handle it, perhaps it can be pared down? Reduced, perhaps put in another data format that would be quicker, etc?
How often does it change? Is it something you can parse once at the beginning of the day and put the variables in shared memory so subsequent runs of your main program can just attach and get the needed values from there?
While I agree with everyone else here in saying that multithreading does not simplify code, it can be used to greatly simplify the user experience of your application.
Consider an application that has a lot of interactive widgets (I am currently developing one where this helps) - in the workflow of my application, a user can "build" the current project they are working on. This requires disabling the interactive widgets my application presents to the user and presenting a dialog with an indeterminate progress bar and a friendly "please wait" message.
The "build" occurs on a background thread; if it were to happen on the UI thread it would make the user experience less enjoyable - after all, it's no fun not being able to tell whether or not you are able to click on a widget in an application while a background task is running (cough, Visual Studio). Not to say that VS doesn't use background threads, I'm just saying their user experience could use some improvement. But I digress.
The one thing I take issue with in the title of your post is that you think of firing off threads when you need to perform tasks - I generally prefer to reuse threads - in .NET, I generally favor using the system thread pool over creating a new thread each time I want to do something, for the sake of performance.
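The same idea in Python terms, as a rough analogue rather than the .NET thread pool itself: a `ThreadPoolExecutor` keeps a fixed set of worker threads and hands queued jobs to them. `build_step` and `run_builds` are placeholder names for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def build_step(n: int):
    """Stand-in for one unit of background work."""
    return n * n, threading.current_thread().name


def run_builds(jobs, max_workers: int = 2):
    """Run every job on a small pool of reused worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build_step, jobs))  # results keep input order


if __name__ == "__main__":
    results = run_builds(range(10))
    print([value for value, _ in results])
    print(sorted({name for _, name in results}))  # at most two thread names
```

Ten jobs run, but no more than `max_workers` threads are ever created, so the cost of starting a thread is paid a couple of times rather than once per job.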
I'm going to provide some balance against the unanimous "no".
DISCLAIMER: Yes, threads are complicated and can cause a whole bunch of problems. Everyone else has pointed this out.
From experience, a sequence of blocking reads/writes to a socket (which requires a separate thread) is much simpler than non-blocking ones. With blocking calls, you can tell the state of the connection just by looking at where you are in the function. With non-blocking calls, you need a bunch of variables to record the state of the connection, and check and modify them every time you interact with the connection. With blocking calls, you can just say "read the next X bytes" or "read until you find X" and it will actually do it (or fail). With non-blocking calls, you have to deal with fragmented data which usually requires keeping temporary buffers and filling them as necessary. You also end up checking if you've received enough data every time you receive a little more. Plus you have to keep a list of open connections and handle unexpected closes for all of them.
It doesn't get much simpler than this:
void WorkerThreadMain(Connection connection) {
    // Block until a complete request arrives (or the read fails).
    Request request = ReadRequest(connection);
    if (!request) return;
    Reply reply = ProcessRequest(request);
    // The client may have disconnected while we were working.
    if (!connection.isOpen) return;
    SendReply(reply, connection);
    connection.close();
}
I'd like to note that this "listener spawns off a worker thread per connection" pattern is how many web servers are designed, and I assume it's how a lot of request/response sorts of server applications are designed.
So in conclusion, I have experienced the asynchronous socket spaghetti code you mentioned, and spawning off worker threads for every connection ended up being a good solution. Having said all this, throwing threads at a problem should usually be your last resort.
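For a runnable version of the thread-per-connection pattern, here is a minimal Python sketch. The one-line "protocol" (read a line, reply with it upper-cased) and the function names are made up for illustration.

```python
import socket
import threading


def handle_client(conn: socket.socket) -> None:
    """Straight-line blocking handler: the connection's state is simply
    the line of code the thread is currently on."""
    with conn, conn.makefile("rb") as reader:
        request = reader.readline()       # blocks for a full line or EOF
        if not request:
            return                        # client went away without asking
        conn.sendall(request.upper())     # blocks until the reply is sent


def accept_loop(listener: socket.socket) -> None:
    """Listener: hand each new connection to its own worker thread."""
    while True:
        conn, _addr = listener.accept()
        threading.Thread(target=handle_client, args=(conn,),
                         daemon=True).start()


if __name__ == "__main__":
    # Exercise the handler over a connected socket pair, without a network.
    client, server_side = socket.socketpair()
    worker = threading.Thread(target=handle_client, args=(server_side,))
    worker.start()
    client.sendall(b"hello\n")
    print(client.recv(1024))
    worker.join()
    client.close()
```

The handler needs no buffers or state variables of its own, at the price of one thread per open connection.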
I think you have no choice but to deal with threads, especially with networking and concurrent connections. Do threads make code simpler? I don't think so. But without them how would you program a server that can handle more than 1 client at the same time?

Resources