Identifying taint sources in a program for taint propagation - security

In taint analysis, a taint source is a program location or statement that may produce an untrusted or external input.
My goal: identify all external user inputs to the program, such as command-line input, file reads, environment variables, and network input, using dynamic analysis (preferably), and propagate the taint.
I read this tutorial (http://shell-storm.org/blog/Taint-analysis-and-pattern-matching-with-Pin/), which intercepts read syscalls using Intel PIN and propagates the taint. I want to extend it to cover the various external inputs mentioned above. (To start out, for C: scanf, gets, fopen, etc.)
Is there any dynamic analysis tool that would help me identify generic external inputs? Any other approaches toward this goal are also appreciated. Thanks

I'm assuming you only target Linux.
In general, the way for a program to get outside input is through contact with the operating system, using system calls. You're talking about libc functions, which are a higher level of abstraction; I would recommend looking at inputs at the system call level.
Two more inputs to the program are the environment variables and command-line arguments, which are found on the stack when the program starts.
One more thing to consider is shared memory in all its shapes and forms.
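To make the syscall-level approach concrete, here is a minimal sketch of a tracer for Linux x86-64 that stops the target at every syscall boundary with ptrace and flags the input syscalls (here just read, pread64 and recvfrom) as taint sources. The names and the syscall set are illustrative; a real tool, like the Pin tutorial you linked, would then mark the reported buffer as tainted and propagate from there:

    /* taint_sources.c - sketch: flag input syscalls as taint sources.
     * Linux x86-64 only. Build: gcc -o taint_sources taint_sources.c
     * Usage: ./taint_sources /bin/cat /etc/hostname */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <sys/syscall.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s prog [args...]\n", argv[0]);
            return 1;
        }
        pid_t child = fork();
        if (child == 0) {                       /* tracee */
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execvp(argv[1], &argv[1]);
            _exit(1);
        }
        int status, entering = 1;
        waitpid(child, &status, 0);             /* stopped after execvp */
        for (;;) {
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to next syscall stop */
            waitpid(child, &status, 0);
            if (WIFEXITED(status))
                break;
            if (entering) {                     /* syscall entry: inspect arguments */
                struct user_regs_struct regs;
                ptrace(PTRACE_GETREGS, child, NULL, &regs);
                long nr = (long)regs.orig_rax;  /* x86-64 syscall number */
                if (nr == SYS_read || nr == SYS_pread64 || nr == SYS_recvfrom)
                    printf("taint source: syscall %ld fd=%llu buf=%#llx len=%llu\n",
                           nr, regs.rdi, regs.rsi, regs.rdx);
            }
            entering = !entering;
        }
        return 0;
    }

Command-line arguments and environment variables can be recorded separately at the first stop, before the target executes anything, since they are already on the stack; shared memory is harder and needs the mmap/shmat calls tracked as well.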

Related

In OS, why loadable kernel modules (LKMs) don't need to invoke message passing in order to communicate?

My question is about one paragraph, shown below; I can't understand the bold sentence. If a module doesn't need to invoke message passing, how does it communicate?
Modules
Perhaps the best current methodology for operating-system design involves
using loadable kernel modules (LKMs). Here, the kernel has a set of core
components and can link in additional services via modules, either at boot time
or during run time. This type of design is common in modern implementations
of UNIX, such as Linux, macOS, and Solaris, as well as Windows.
The idea of the design is for the kernel to provide core services, while
other services are implemented dynamically, as the kernel is running. Linking
services dynamically is preferable to adding new features directly to the kernel,
which would require recompiling the kernel every time a change was made.
Thus, for example, we might build CPU scheduling and memory management
algorithms directly into the kernel and then add support for different file
systems by way of loadable modules.
The overall result resembles a layered system in that each kernel section
has defined, protected interfaces; but it is more flexible than a layered system,
because any module can call any other module. The approach is also similar to
the microkernel approach in that the primary module has only core functions
and knowledge of how to load and communicate with other modules; but it
is more efficient, because modules do not need to invoke message passing in
order to communicate.
Linux uses loadable kernel modules, primarily for supporting device
drivers and file systems. LKMs can be “inserted” into the kernel as the system is started (or booted) or during run time, such as when a USB device is
plugged into a running machine. If the Linux kernel does not have the necessary driver, it can be dynamically loaded. LKMs can be removed from the
kernel during run time as well. For Linux, LKMs allow a dynamic and modular
kernel, while maintaining the performance benefits of a monolithic system. We
cover creating LKMs in Linux in several programming exercises at the end of
this chapter.
In OS, why loadable kernel modules (LKMs) don't need to invoke message passing in order to communicate?
The simple answer is that they're loaded into kernel space and dynamically linked; the kernel can use "mostly normal" function calls instead of anything more expensive (message passing, remote procedure calls, ...) to communicate with them.
Note: Typically (especially for *nix systems) a driver will provide a set of function pointers to the kernel (e.g. maybe one for open(), one for read(), one for ioctl(), etc.) in some kind of "device context" structure, allowing the kernel to call the driver's functions via the function pointers (e.g. result = deviceContext->open(...);).
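As a plain userspace mock-up (the struct and names here are invented for illustration; the real Linux analogue is struct file_operations), the mechanism is nothing more than an ordinary indirect call:

    #include <stdio.h>

    /* what a driver hands the kernel at registration time */
    struct device_context {
        int  (*open)(const char *name);
        long (*read)(char *buf, long len);
    };

    /* a toy driver's implementations */
    static int  mydrv_open(const char *name)    { printf("open %s\n", name); return 0; }
    static long mydrv_read(char *buf, long len) { (void)buf; return len; }

    static struct device_context mydrv = {
        .open = mydrv_open,
        .read = mydrv_read,
    };

    int main(void)
    {
        struct device_context *deviceContext = &mydrv;
        /* what the kernel conceptually does: a direct, cheap function call */
        int result = deviceContext->open("dev0");
        return result;
    }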
"The approach is also similar to the microkernel approach in that the primary module has only core functions and knowledge of how to load and communicate with other modules; but it is more efficient, because modules do not need to invoke message passing in order to communicate."
This paragraph has the potential to give you a false impression. For extensibility alone, modular monolithic kernels are similar to micro-kernels (and both are a lot more extensible than a "literally monolithic (one piece, like stone)" kernel). For other things (e.g. security) modular monolithic kernels are extremely dissimilar to micro-kernels.
For Linux specifically; you can think of it as almost 30 million lines (growing at a rate of over 1 million lines per year) of potential security vulnerabilities running at the highest privilege level with full access to every scrap of data, with an average of about 150 discovered critical vulnerabilities per year (and who knows how many undiscovered critical vulnerabilities).
One of the main goals of micro-kernels is to place isolation barriers between the "kernel core" and everything else; so that you might end up with several thousand lines of kernel that doesn't grow (and a significant improvement in security). It's those isolation barriers that require less efficient communication (e.g. message passing).
"...but it is more efficient, because modules do not need to invoke message passing in order to communicate."
This could be rephrased more correctly as "...but it is more efficient, because modules do not need to pass through an isolation barrier."
Note that message passing is merely one way to pass through an isolation barrier - there's shared memory, signals, pipes, sockets, remote procedure calls, etc. Nothing says a micro-kernel has to use message passing and you could design a micro-kernel that does not use message passing at all.

Persistent storage value handling in Linux

I have a QSPI flash on my embedded board.
I have a driver plus a process "Q" that handles reading from and writing to it.
I want to store variables like SW revisions, IP, operation time, etc.
I would like to ask for suggestions on how to handle the different access rights for reading and writing values from user space and other processes.
I was thinking of having a file for each variable. Then I can assign access rights to those files, and process Q can update the value in a file when it changes. So process Q will only write, and other processes or users can only read.
But I am not sure about the writing side. I was thinking about using a message queue or ZeroMQ and building the software around it, but I am not sure whether that is overkill. And I am not sure how to manage access rights there anyway.
What would be the best approach? I would really appreciate it if you proposed even a totally different approach.
Thanks!
This question will probably be downvoted / flagged due to the "Please suggest an X" nature.
That said, if a file per variable is what you're after, you might want to look at implementing a FUSE file system that wraps your SPI driver/utility "Q" (or build it into "Q" if you can compile/control the source to "Q"). I'm doing this to store settings in an EEPROM on a current work project and it's turned out nicely. So I have, for example, a file that, when read, retrieves 6 bytes from the EEPROM (or a cached copy) and presents a MAC address in standard hex/colon-separated notation.
The biggest advantage here, is that it becomes trivial to access all your configuration / settings data from shell scripts (e.g. your init process) or other scripting languages.
Another neat feature of doing it this way is that you can use inotify (which comes "free", no extra code in the fusefs) to create applications that efficiently detect when settings are changed.
A disadvantage of this approach is that it's non-trivial to do atomic transactions on multiple settings and still maintain normal file semantics.
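For a flavor of how little code that takes, here is a minimal read-only sketch, assuming libfuse 2.x; the mac_address file name and the hard-coded value are made up, standing in for whatever your process "Q" actually reads from the QSPI flash:

    /* settings_fs.c - one read-only file backed by (pretend) flash data.
     * Build: gcc settings_fs.c $(pkg-config fuse --cflags --libs) -o settings_fs
     * Run:   ./settings_fs /mnt/settings && cat /mnt/settings/mac_address */
    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/stat.h>

    static const char *mac_path  = "/mac_address";
    static const char *mac_value = "00:11:22:33:44:55\n"; /* really: 6 bytes from flash */

    static int fs_getattr(const char *path, struct stat *st)
    {
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;
            st->st_nlink = 2;
        } else if (strcmp(path, mac_path) == 0) {
            st->st_mode = S_IFREG | 0444;       /* world-readable, nobody writes */
            st->st_nlink = 1;
            st->st_size = strlen(mac_value);
        } else {
            return -ENOENT;
        }
        return 0;
    }

    static int fs_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                          off_t offset, struct fuse_file_info *fi)
    {
        if (strcmp(path, "/") != 0)
            return -ENOENT;
        filler(buf, ".", NULL, 0);
        filler(buf, "..", NULL, 0);
        filler(buf, mac_path + 1, NULL, 0);
        return 0;
    }

    static int fs_open(const char *path, struct fuse_file_info *fi)
    {
        if (strcmp(path, mac_path) != 0)
            return -ENOENT;
        if ((fi->flags & O_ACCMODE) != O_RDONLY)
            return -EACCES;                     /* only "Q" may change the flash */
        return 0;
    }

    static int fs_read(const char *path, char *buf, size_t size,
                       off_t offset, struct fuse_file_info *fi)
    {
        size_t len = strlen(mac_value);         /* really: read/format flash bytes */
        if (offset >= (off_t)len)
            return 0;
        if (offset + size > len)
            size = len - offset;
        memcpy(buf, mac_value + offset, size);
        return size;
    }

    static struct fuse_operations fs_ops = {
        .getattr = fs_getattr,
        .readdir = fs_readdir,
        .open    = fs_open,
        .read    = fs_read,
    };

    int main(int argc, char *argv[])
    {
        return fuse_main(argc, argv, &fs_ops, NULL);
    }

Per-file access rights then reduce to the st_mode bits you report from getattr(), while your Q process keeps exclusive write access to the flash itself.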

Quick questions about Linux kernel modules

I'm very familiar with Linux (I've been using it for 2 years, no Windows for 1 1/2 years), and I'm finally digging deeper into kernel programming and working on a project. So my questions are:
Will a kernel module run faster than a traditional C program?
How can I communicate with a module (is that even possible), for example call a function in it?
1. Will a kernel module run faster than a traditional C program?
It Depends™
Running as a kernel module means you get to play by different rules, you potentially get to avoid some context switches depending on what you are doing. You get access to some powerful tools that can be leveraged to optimize your code, but don't expect your code to run magically faster just by throwing everything in kernelspace.
2. How can I communicate with a module (is that even possible), for example call a function in it?
There are various ways:
You can use the various file system interfaces: procfs, sysfs, debugfs, sysctl, ... (see the procfs sketch after this list)
You could register a char device
You can make use of the Netlink interface
You could create new syscalls, although that's heavily discouraged
And you can always come up with your own scheme, or use some less common APIs
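As a taste of the first item, here is a minimal procfs sketch; it assumes a kernel new enough to have proc_create_single() (v4.18+), and the file name and value are made up:

    /* demo_proc.c - expose one read-only value under /proc */
    #include <linux/module.h>
    #include <linux/proc_fs.h>
    #include <linux/seq_file.h>

    static int demo_show(struct seq_file *m, void *v)
    {
        seq_printf(m, "value=%d\n", 42);    /* whatever state the module exposes */
        return 0;
    }

    static int __init demo_init(void)
    {
        proc_create_single("kmod_demo", 0444, NULL, demo_show);
        return 0;
    }

    static void __exit demo_exit(void)
    {
        remove_proc_entry("kmod_demo", NULL);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");

After insmod, cat /proc/kmod_demo runs demo_show() in kernel space.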
Will a kernel module run faster than a traditional C program?
The kernel is already a C program, most likely compiled with the same compiler you use. So generic algorithms or processor-intensive computations will execute at almost the same speed.
But most userspace programs (like bash) have to ask the kernel to perform operations on system resources, e.g. to print the prompt on the monitor. That requires entering the kernel with a system call, sending data over the tty interfaces and passing it to a video driver, all of which may introduce some latency. If you implemented bash in-kernel, you could call the video driver directly, which is definitely faster.
That approach, however, has drawbacks. First of all, bash should be able to print its prompt over an ssh session or a serial console as well, which complicates the logic. Also, if your in-kernel bash hangs, you cannot just kill it; you have to reboot the system.
How can I communicate with a module (is that even possible), for example call a function in it?
In addition to the excellent list provided by tux3, I would suggest starting with char devices.
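A misc char device is probably the least boilerplate; a minimal sketch (all names hypothetical) looks like this:

    /* demo_chardev.c - minimal misc char device; shows up as /dev/demo */
    #include <linux/module.h>
    #include <linux/fs.h>
    #include <linux/miscdevice.h>

    static char msg[] = "hello from kernel space\n";

    static ssize_t demo_read(struct file *f, char __user *buf,
                             size_t count, loff_t *ppos)
    {
        /* copies msg out to userspace, handling offset and short reads */
        return simple_read_from_buffer(buf, count, ppos, msg, sizeof(msg) - 1);
    }

    static const struct file_operations demo_fops = {
        .owner = THIS_MODULE,
        .read  = demo_read,
    };

    static struct miscdevice demo_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "demo",                /* udev creates /dev/demo */
        .fops  = &demo_fops,
    };

    module_misc_device(demo_dev);       /* register on load, unregister on unload */
    MODULE_LICENSE("GPL");

After insmod demo.ko, every cat /dev/demo ends up calling demo_read() inside the module; adding a .write or .unlocked_ioctl handler lets userspace pass data or commands in.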

Capture file system system calls on Linux platform

I want to capture all the system calls on a file system in great detail. E.g. for the write system call, I want to record the target file, the number of bytes written and the offset at which the write occurs.
Currently, I want to implement such a logger with inotify. However, it cannot provide such details: e.g. for write it does not report the number of bytes written or the offset.
An alternative is to use bbfs, implemented on FUSE. However, it introduces overhead when logging system calls and delays user operations to an intolerable degree.
Is there some library that can capture system calls on file system, just like ptrace when logging all system calls issued by a process?
There are many options for tracing in Linux. But this sounds like a pretty simple case. Have you investigated simply using the strace utility? It has lots of options that can control tracing granularity, will log arguments to almost all syscalls (including buffer contents if you want that) and exists and works basically everywhere without any setup beyond installing the package.
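For the write details you mention, something along the lines of strace -f -y -e trace=write,pwrite64,lseek -p <pid> should do: -y resolves each file descriptor to its path, write's arguments include the byte count, and lseek (or pwrite64's offset argument) gives you the position. The exact flag set varies a little between strace versions.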
How about writing your own profiling tool using a wrapper? See GCC's -wrapper:
-wrapper
Invoke all subcommands under a wrapper program. The name of the wrapper program and its parameters are passed as a comma separated list.

What is the Linux built-in driver load order?

How can we customize the built-in driver load order (to make some built-in driver module load first, and the dependent module load later)?
Built-in drivers won't be "loaded" at all; that's what built-in means. Their initialization functions are called, and the drivers activated, while the kernel sets itself up. These init functions are called from init/main.c::do_initcalls(). All initcalls are classified into levels, which are defined in initcall_levels and include/linux/init.h.
These levels are actually symbols defined in the linker script (arch/*/kernel/vmlinux.lds.*). At kernel compile time, the linker collects all functions marked module_init() or another *_initcall(), classifies them by level, puts all functions of the same level together in the same place, and creates what is effectively an array of function pointers.
What do_initcall_level() does at run time is call each function pointed to by the pointers in that array. There is no calling policy in do_initcall_level() other than the levels; the order within the array is decided at link time.
So now you can see that a driver's initialization order is fixed at link time. What can you do about it?
put your init function in a higher level, or
put your device driver at a higher position in the Makefile
The first one is clear if you've read the above, i.e. use early_initcall() instead, if appropriate.
The second one needs a bit more explanation. The reason the order in a Makefile matters is how the current kernel build system and the linker work. To make a long story short, the build system takes all object files in obj-y and links them together. It is highly environment dependent, but there is a high probability that the linker places the first object file in obj-y at a lower address, so it gets called earlier.
If you just want your driver to be called earlier than other drivers in the same directory, this is the simplest way to do it.
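For the first option, the change is a one-liner in the driver; a sketch with made-up names follows. Note the non-default *_initcall() levels only make sense for code built into the kernel, not for loadable modules:

    #include <linux/init.h>
    #include <linux/printk.h>

    static int __init mydrv_init(void)
    {
        pr_info("mydrv: probing early\n");
        /* ... register the driver here ... */
        return 0;
    }

    /* module_init(mydrv_init);      default: device-level initcall */
    subsys_initcall(mydrv_init);     /* same function, run at an earlier level */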
depmod examines the symbols exported and required by each module and does a topological sort on them that modprobe can later use to load modules in the proper order. Requiring the symbols from modules you wish to be dependent on is enough for it to do the right thing.
Correct module order and dependencies are handled by modprobe, even within the initrd.
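Concretely: if module b references a symbol that module a exports, depmod records the dependency and modprobe b will insert a first. A sketch with made-up names:

    /* a.c - exports a symbol */
    #include <linux/module.h>

    int a_shared_value = 42;
    EXPORT_SYMBOL(a_shared_value);
    MODULE_LICENSE("GPL");

    /* b.c - references it; depmod now knows b depends on a */
    #include <linux/module.h>

    extern int a_shared_value;

    static int __init b_init(void)
    {
        pr_info("b: a_shared_value=%d\n", a_shared_value);
        return 0;
    }
    module_init(b_init);
    MODULE_LICENSE("GPL");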
Recently I hit this problem: my charger driver depends on the ADC driver, but the charger driver was loaded first and went looking for the ADC phandle, which is defined in the DTS file and has to be initialized by the ADC driver. It was resolved by changing the order of the modules in drivers/Makefile.
