I'm experimenting with Linux Security Modules, trying to make one.
My main sources of knowledge about how they're supposed to work are mailing list archives and the existing LSMs' source code, plus the few pages about them in the Linux documentation.
I understand that there are two kinds of LSMs.
Exclusive LSMs like SELinux / AppArmor, which have the LSM_FLAG_EXCLUSIVE flag set in their LSM definition.
Non-exclusive LSMs like Yama, capabilities or lockdown.
Browsing the source code of all these LSMs, I noticed that the non-exclusive ones never make use of security blobs, while the exclusive ones use them heavily.
For instance, see the AppArmor LSM definition, and the one for Yama.
So, can non-exclusive LSMs specify blob sizes and use this feature?
Trying to find an answer, I explored the framework's source to see whether security blobs were perhaps switched between each LSM hook call; that would allow each LSM to access only its own blobs and not those of another LSM.
However, we can see here in the LSM framework that this is not the case.
If my LSM declares blob sizes, can I use the blobs if my kernel also has SELinux, for instance, enabled?
Won't the structures from SELinux and mine overlap ?
Alright, I found the relevant code in the LSM framework.
QED: yes, all LSMs can use security blobs, as long as they treat the sizes structure as containing offsets once the framework has initialized.
Explanation:
When you define your LSM, you use the DEFINE_LSM macro followed by various fields, including a pointer to a struct lsm_blob_sizes.
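For concreteness, here is a hedged sketch of what such a definition looks like, modeled on the in-tree LSMs; "mylsm" and its context structure are hypothetical names invented for this example:

    #include <linux/lsm_hooks.h>

    /* hypothetical per-task state for this example module */
    struct mylsm_task_ctx {
        u32 flags;
    };

    /* sizes requested from the framework; after init these become offsets */
    struct lsm_blob_sizes mylsm_blob_sizes __lsm_ro_after_init = {
        .lbs_task = sizeof(struct mylsm_task_ctx),
    };

    static int __init mylsm_init(void)
    {
        /* register hooks with security_add_hooks() here */
        return 0;
    }

    DEFINE_LSM(mylsm) = {
        .name  = "mylsm",
        .blobs = &mylsm_blob_sizes,
        .init  = mylsm_init,
    };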
During its own initialization, the LSM framework (which is mostly implemented in security/security.c) manipulates your structure in a few steps.
It stores, in its own instance of the structure (declared here), the sum of all LSMs' security blob sizes. More precisely, the call chain looks like this:
ordered_lsm_init()
 `- prepare_lsm(*lsm)
     `- lsm_set_blob_sizes(lsm->blobs)
         `- lsm_set_blob_size(&needed->lbs_task, &blob_sizes.lbs_task);
lsm_set_blob_size is responsible for the actual addition to the framework's structure instance.
However, combined with lsm_set_blob_sizes, it effectively replaces each size in the currently prepared LSM's struct lsm_blob_sizes with the offset at which this LSM's part of the blob resides.
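The size-to-offset swap is compact enough to quote; this is roughly what lsm_set_blob_size looks like in security/security.c (paraphrased from memory, so details may differ between kernel versions):

    static void __init lsm_set_blob_size(int *need, int *lbs)
    {
        int offset;

        if (*need > 0) {
            offset = *lbs;    /* current end of the shared blob */
            *lbs += *need;    /* grow the framework's running total */
            *need = offset;   /* the LSM's size becomes its offset */
        }
    }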
The framework then calls each LSM's init function.
Later, when a structure that carries a security blob (a task_struct, for instance) gets allocated, the framework allocates a single blob with enough space for the blobs of all security modules; each module then finds its own spot in this larger blob using the offsets now stored in its own struct lsm_blob_sizes.
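This is why the in-tree modules wrap the offset lookup in tiny accessors. A hedged sketch of the pattern, mirroring what SELinux and AppArmor do, with the hypothetical names from above:

    static inline struct mylsm_task_ctx *mylsm_task(struct task_struct *task)
    {
        /* task->security points at the shared blob; lbs_task is now
         * this module's offset into it, not a size */
        return task->security + mylsm_blob_sizes.lbs_task;
    }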
The summed-up security blob sizes can actually be printed for debugging using init_debug, here.
So what this means is that all LSMs can define security blob sizes. The framework is responsible for their allocation (and deallocation), and the blobs of different LSMs can happily live side by side in memory.
I have a QSPI flash on my embedded board.
I have a driver plus a process "Q" to handle reading from and writing to it.
I want to store variables like SW revisions, IP address, operation time, etc.
I would like to ask for suggestions on how to handle the different access rights for reading and writing values from user space and other processes.
I was thinking of having a file for each variable. Then I can assign access rights to those files, and process Q can update the value in a file whenever it changes. So process Q would be the only writer, and other processes or users could only read.
But I don't know how to handle writing. I was thinking about using a message queue or ZeroMQ and building the software around it, but I am not sure whether that is overkill. And I am not sure how to manage access rights in that case anyway.
What would be the best approach? I would really appreciate it if you could propose even a totally different approach.
Thanks!
This question will probably be downvoted / flagged due to the "Please suggest an X" nature.
That said, if a file per variable is what you're after, you might want to look at implementing a FUSE file system that wraps your SPI driver/utility "Q" (or build it into "Q" if you get to compile/control the source to "Q"). I'm doing this to store settings in an EEPROM on a current work project, and it's turned out nicely. So I have, for example, a file that, when read, retrieves 6 bytes from the EEPROM (or a cached copy) and provides a MAC address in standard hex, colon-separated notation.
The biggest advantage here is that it becomes trivial to access all your configuration/settings data from shell scripts (e.g. your init process) or other scripting languages.
Another neat feature of doing it this way is that you can use inotify (which comes "free", no extra code in the fusefs) to create applications that efficiently detect when settings are changed.
A disadvantage of this approach is that it's non-trivial to do atomic transactions on multiple settings and still maintain normal file semantics.
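Returning to the MAC-address example above, here is a minimal, hedged sketch of a one-file FUSE filesystem; read_mac_from_eeprom() is a hypothetical stand-in for whatever your driver or "Q" provides, and the example assumes libfuse 3 (build with: gcc macfs.c -o macfs `pkg-config fuse3 --cflags --libs`):

    #define FUSE_USE_VERSION 31
    #include <fuse.h>
    #include <string.h>
    #include <errno.h>
    #include <stdio.h>
    #include <sys/stat.h>

    static void read_mac_from_eeprom(unsigned char mac[6])
    {
        /* hypothetical: really you'd query the flash/EEPROM here */
        unsigned char fake[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
        memcpy(mac, fake, 6);
    }

    static int macfs_getattr(const char *path, struct stat *st,
                             struct fuse_file_info *fi)
    {
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0555;
            st->st_nlink = 2;
        } else if (strcmp(path, "/mac_address") == 0) {
            st->st_mode = S_IFREG | 0444;  /* world-readable, no writes */
            st->st_nlink = 1;
            st->st_size = 18;              /* "xx:xx:xx:xx:xx:xx\n" */
        } else {
            return -ENOENT;
        }
        return 0;
    }

    static int macfs_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                             off_t off, struct fuse_file_info *fi,
                             enum fuse_readdir_flags flags)
    {
        if (strcmp(path, "/") != 0)
            return -ENOENT;
        fill(buf, ".", NULL, 0, 0);
        fill(buf, "..", NULL, 0, 0);
        fill(buf, "mac_address", NULL, 0, 0);
        return 0;
    }

    static int macfs_read(const char *path, char *buf, size_t size, off_t off,
                          struct fuse_file_info *fi)
    {
        unsigned char mac[6];
        char text[19];
        size_t len;

        if (strcmp(path, "/mac_address") != 0)
            return -ENOENT;
        read_mac_from_eeprom(mac);
        len = snprintf(text, sizeof(text), "%02x:%02x:%02x:%02x:%02x:%02x\n",
                       mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
        if ((size_t)off >= len)
            return 0;
        if (off + size > len)
            size = len - off;
        memcpy(buf, text + off, size);
        return size;
    }

    static const struct fuse_operations macfs_ops = {
        .getattr = macfs_getattr,
        .readdir = macfs_readdir,
        .read    = macfs_read,
    };

    int main(int argc, char *argv[])
    {
        return fuse_main(argc, argv, &macfs_ops, NULL);
    }

Writable settings work the same way: add a .write handler that validates the text and pushes it down to the flash through "Q", and let ordinary file permissions decide who may open the file for writing.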
Although there is another question on a similar topic, it does not cover memory use by shared libraries in chrooted jails.
Let's say we have a few similar chroots. To be more specific, they contain exactly the same sets of binary files and shared libraries, which are actually hard links to the master copies to conserve disk space (and to prevent any possibility of file alteration, the file system is mounted read-only).
How is the memory use affected in such a setup?
As described in the chroot(2) man page (http://man7.org/linux/man-pages/man2/chroot.2.html):

This call changes an ingredient in the pathname resolution process and does nothing else.

So the shared libraries will be loaded in the same way as if they were outside the chroot jail (shared read-only pages, duplicated data, etc.).
Because hardlinks share the same underlying inode, the kernel treats them as the same item when it comes to caching/mapping.
You'll see filesystem cache savings by using hardlinks, as well as disk-space savings.
The biggest issue I'd have with this is that if someone manages to subvert the read-only nature of one of the chroot environments, then they could subvert all of them by making modifications to any of the hardlinked files.
When I set this up, I copied the shared libraries per chroot instead of linking to a read-only mount. With separate files, the text segments were not shared. It's likely that the same inode will map to the same read-only text segment, but this may vary with available memory management hardware and similar architectural details.
Try this experiment on your system: write a small program that makes some minimal use of a large shared library. Run twenty or thirty chroot jails as you describe, each with a running copy of the program. Check overall memory usage before & during running, and dissect one instance to get a good text/data segment breakdown. If memory use increases by the full size of the map for each instance, the segments are not shared. Conversely, if memory use goes up by a fraction of the map, the segments are shared.
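The probe program can be as small as this hedged sketch; libcrypto is used purely as an example of a large shared library (link with -lcrypto), and you would compare the shared/private columns of /proc/<pid>/smaps (or pmap -X <pid>) across the jail instances:

    #include <stdio.h>
    #include <unistd.h>
    #include <openssl/sha.h>  /* drags in the sizeable libcrypto */

    int main(void)
    {
        unsigned char digest[SHA256_DIGEST_LENGTH];

        /* minimal use, enough to fault in some library text pages */
        SHA256((const unsigned char *)"hello", 5, digest);
        printf("pid %d: sleeping, inspect /proc/%d/smaps\n",
               (int)getpid(), (int)getpid());
        pause();  /* keep the mapping alive while you measure */
        return 0;
    }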
Hello, I am a newbie to kernel programming. I am writing a small kernel module
based on the wrapfs template to implement a backup mechanism. This is
purely for learning purposes.
I am extending wrapfs so that when a write call is made, wrapfs transparently
makes a copy of that file in a separate directory, and then the write is
performed on the file. But I don't want to create a copy for every write call.
A naive approach could be to check for the existence of the file in that
directory. But I think checking this on each call could be a severe penalty.
I could also detect the first write call and then store a value for that
specific file using the private_data attribute. But that would not be stored
on disk, so I would need to check it again.
I was also thinking of making use of the modification time. I could save a
modification time, and create a copy only if the file's current modification
time is newer than the saved one; otherwise I would do nothing. I tried to use
inode.i_mtime for this, but it had been modified even before write was called,
and applications can also change that time.
So I was thinking of storing some value in the on-disk inode that indicates
whether its backup has been created. Is that possible? Any other suggestions
or approaches are welcome.
You are essentially saying you want to implement a copy-on-write virtual filesystem layer.
IMO, some of these have been done before, and it would be easier to implement this in userland (using libfuse and the fuse module, for example). That way, you can be king of your castle and add your metadata in whichever way you feel is appropriate:
just add (hidden) metadata files to each directory
use extended POSIX attributes (setfattr and friends; see the sketch after this list)
heck, you could even use a sqlite database
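As a hedged illustration of the second option, here is a userland sketch that marks a file as already backed up using an extended attribute; "user.backup.done" is a made-up attribute name, and the underlying filesystem must support xattrs:

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/xattr.h>

    /* Returns 1 if the file already carries the marker, 0 otherwise. */
    static int backup_done(const char *path)
    {
        char val[2];
        ssize_t n = getxattr(path, "user.backup.done", val, sizeof(val));
        return n >= 0;  /* ENODATA (no such attribute) means "not yet" */
    }

    static int mark_backup_done(const char *path)
    {
        return setxattr(path, "user.backup.done", "1", 1, 0);
    }

    int main(int argc, char *argv[])
    {
        if (argc != 2)
            return 1;
        if (!backup_done(argv[1])) {
            /* ... copy the file aside here ... */
            if (mark_backup_done(argv[1]) != 0)
                perror("setxattr");
        }
        return 0;
    }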
If you really insist on doing these things in-kernel, you'll have a lot more work, since accessing the metadata from kernel mode is going to take a lot more effort (you'd most likely want to emulate your own database using memory-mapped files so as to minimize the amount of 'userland-style' work required, and to make it relatively easy to get atomicity and reliability right [1]).
[1] On How Everybody Gets File IO Wrong: see also here
You can use atime instead of mtime. In that case, setting the S_NOATIME flag on the inode prevents it from being updated (see the touch_atime() function in inode.c). The only other thing you'll need is to mount your filesystem with the noatime option.
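A hedged kernel-side sketch of one way to read that suggestion, as it might look in a wrapfs write path; wrapfs_backup() is a hypothetical helper, and note that i_flags is only an in-memory marker (it does not survive inode eviction):

    static void maybe_backup(struct file *file)
    {
        struct inode *inode = file_inode(file);

        if (!(inode->i_flags & S_NOATIME)) {
            wrapfs_backup(file);  /* hypothetical: snapshot the file */
            /* use S_NOATIME as the "already backed up" marker;
             * touch_atime() will skip this inode from now on */
            inode->i_flags |= S_NOATIME;
        }
    }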
I'm currently developing an application which needs a lot of system and process information, some of which is only available through /proc, and I have some general questions about accessing the structures.
The application will run on Linux (kernel >= 2.6), not on any other Unix-flavored OS. It should have access to any data in /proc; I can't say yet what exactly is necessary, as the specifications are not clear, but the whole /proc directory is relevant to the application.
First of all: is there good documentation that covers all the features added and removed from one kernel version to the next? One thing I'm particularly curious about is the format of the individual files. Can I take that for granted? Does it change between kernel versions?
Dispatching the parsing based on the kernel version wouldn't be a problem at all; it's just that I couldn't find any good docs on what has changed from version to version, which would help me catch parsing errors beforehand.
In addition: is there a definitive list of features that can be activated/deactivated by kernel options (except, of course, the /proc feature itself)? I'm looking for a list of files/directories that only exist when the appropriate options are set in the kernel.
As an example of what I'm thinking of, here is a link to the proc man page (http://linux.die.net/man/5/proc), which includes a lot of good information; e.g. some entries include the earliest kernel version they were available in, and some note whether a module needs to be loaded. It does not, however, describe the output format of all the information, which is something I need if I want to parse it (e.g. whether it is consistent throughout all kernel versions or changed at some point).
The second thing I'm wondering about is what happens if the queried process dies while being queried. What time window do I have? For example, if I'm fetching a list of processes by reading all the structures and parsing them one after another, what happens if process x dies before I get to read it? Even if I check that the directory exists, it could still be gone one call later.
Last but not least: Is there any major distribution out there that is not mounting proc?
From what I understand, a lot of common tools are based on the /proc interface such as lsmod or free, so I'm guessing that I can expect /proc to exist almost always.
The /proc interfaces are pretty stable (unlike the /sys interfaces), even if nothing is guaranteed. Almost all changes are backwards compatible, at least if they've been around for a few versions. You should stick to the documented interfaces to be safe. If a file exists, its format may be extended in later versions, but normally in a backwards-compatible way, e.g. adding columns to a table. The parts that are most at risk of disappearing are those concerning hardware subsystems such as ACPI or SCSI, which are migrating to /sys (with a long transition period during which both exist).
Most of the information is architecture-independent, except for hardware information (e.g. /proc/cpuinfo has very different fields on different architectures).
The main documentation is Documentation/filesystems/proc.txt in the kernel source. Consider proc(5) to be the overview and proc.txt to be the fine details. The kernel documentation is often incomplete, so don't be surprised if you need to resort to reading the source sometimes.
Most optional parts of /proc are activated by default if the driver whose data it exposes is included in the kernel. The exceptions are mostly related to hardware features that rarely need to be accessed from outside the kernel; if you need to access these features, you're probably already expecting to need to dig deeper. Look through Kconfig files in the kernel source for detailed information.
Process data (or hardware data related to removable hardware or provided by unloadable modules) can disappear under your nose. Most files under /proc can be read atomically, with a single read call with a reasonably-sized buffer; if you perform multiple read calls in sequence, drivers are supposed to guarantee that you get well-formed data. There is no way to guarantee atomicity between reads of separate files; if you're reading information about a process, this process can die at any time, and in principle could even be replaced by another process with the same PID before you're finished.
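For instance, a hedged sketch of reading /proc/<pid>/stat with a single read() call, treating a vanished process as a normal, non-fatal event:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    /* Returns 0 on success, -1 if the process vanished or an error occurred. */
    static int read_proc_stat(pid_t pid, char *buf, size_t bufsize)
    {
        char path[64];
        int fd;
        ssize_t n;

        snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
        fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;  /* ENOENT/ESRCH: process already gone, skip it */

        n = read(fd, buf, bufsize - 1);  /* one read: well-formed snapshot */
        close(fd);
        if (n < 0)
            return -1;  /* can also fail if the process died after open */
        buf[n] = '\0';
        return 0;
    }

    int main(void)
    {
        char buf[4096];
        if (read_proc_stat(getpid(), buf, sizeof(buf)) == 0)
            printf("%s", buf);
        return 0;
    }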
As it says in the description of /proc, “everyone should say Y here”. All desktop/server Linux systems and most embedded Linux systems must have /proc; a lot of things, including ps and other process management commands, many filesystem and device-related tools, and module loading, require it. The only systems that might be able to dispense with /proc are very small single-purpose embedded systems that support a single hardware configuration and run a fixed set of programs. You can count on it being there.
The function which creates shared memory in *nix programming takes a key as one of its parameters.
What is the meaning of this key? And how can I use it?
Edit: I mean the key, not the shared memory id.
It's just a System V IPC (inter-process communications) key so different processes can create or attach to the same block of shared memory. The key is typically created with ftok() which turns a fully-specified filename and project ID into a usable key.
Since an application can generally use the same filename in all its different processes (the filename is often a configuration file associated with your application), each different process gets the same key (or, more likely if you're using the project ID to specify multiple shared memory segments, the same set of keys).
For example, we once had an application that used a configuration file processed by our lex/yacc code so we just used that pathname and one project ID for each different shared memory block (there were three or four depending on the purpose of the process in question). This actually made a lot of sense since it was the parsed and evaluated data from that configuration file that was stored in the shared memory blocks.
Since no other application on the system should be using our configuration file for making a key, there was no conflict. The key itself is not limited to shared memory, it could be used for semaphores and other IPC mechanisms as well.
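A hedged sketch of the classic System V flow; the path /etc/myapp.conf and the project ID 'A' are arbitrary examples:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        key_t key = ftok("/etc/myapp.conf", 'A');  /* same inputs => same key */
        if (key == (key_t)-1) {
            perror("ftok");
            return 1;
        }

        /* create the 4 KiB segment if it doesn't exist yet, mode 0600 */
        int shmid = shmget(key, 4096, IPC_CREAT | 0600);
        if (shmid < 0) {
            perror("shmget");
            return 1;
        }

        char *mem = shmat(shmid, NULL, 0);  /* attach into our address space */
        if (mem == (void *)-1) {
            perror("shmat");
            return 1;
        }

        snprintf(mem, 4096, "hello from pid %d", (int)getpid());
        shmdt(mem);  /* detach; the segment persists until IPC_RMID */
        return 0;
    }

Any other process that calls ftok() with the same pathname and project ID gets the same key and therefore attaches to the same segment.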
The POSIX shared memory functions (shm_open and friends) have a more user-friendly interface, in that they accept a unique filename which the applications must use to open the same shared memory block.
Having said that, it is generally also feasible to open a file in /dev/shm under Linux and then mmap it with MAP_SHARED which achieves much the same.
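For comparison, a hedged sketch of the POSIX flow; "/myapp_shm" is an arbitrary example name (link with -lrt on older glibc):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void)
    {
        int fd = shm_open("/myapp_shm", O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
            perror("shm_open");
            return 1;
        }
        if (ftruncate(fd, 4096) != 0) {  /* size the fresh segment */
            perror("ftruncate");
            return 1;
        }

        char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        snprintf(mem, 4096, "hello from pid %d", (int)getpid());

        munmap(mem, 4096);
        close(fd);
        /* shm_unlink("/myapp_shm") removes the name when you're done */
        return 0;
    }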