Understanding kernel patch submitting check list? related to testing patch? - linux

What does this mean by following checklist rule
Tested after it has been merged into the -mm patchset to make sure
that it still works with all of the other queued patches and various
changes in the VM, VFS, and other subsystems.
mm patchset means memory management related patches ?
Please correct my understanding.

Once upon a time, the -mm patchset indeed contained memory management related patches for future verions of the kernel.
That checklist entry is outdated; you should test against the linux-next tree.

Related

Accessing /proc

I'm currently developing an application which needs a lot of system and process information, some of which is only available through /proc, and I have some general questions about accessing the structures.
The application will be run on Linux (kernel >= 2.6), not on any other Unix-flavored OS. It should have access to any data in /proc, I can't say what is necessary now as the specifications are not clear yet, but the whole /proc directory is relevant to the application.
First of all: Is there a good documentation which covers all the features added / removed from kernel version to kernel version? One thing I'm curious about in particular is the format of the individual files. Can I take that for granted? Does it change among kernel versions?
Hooking up the parsing process based on the kernel wouldn't be a problem at all, it's just that I couldn't find any good docs on what has changed from version to version which could help me catching parsing errors in beforehand.
In addition: Is there a definite list of features that can be activated / deactivated by kernel options (except of course the /proc-feature itself)? I'm looking for a list of files / directories that only exist with the appropriate options being set in the kernel.
As an example of what I'm thinking of, this is a link to the proc manpage (http://linux.die.net/man/5/proc) which includes a lot of good information, e.g. some options include the earliest kernel version they were available at, some include whether a module is necessary to be loaded. This does not describe the output format of all information though, which is something I need if I want to parse it (e.g. if it is consistent throughout all kernel versions or changed at some point).
The second thing I'm wondering about is what happens if the process queried dies while being queried. What is my time interval? For example if I'm going to fetch a list of processes reading all the structures, and parse them one after another, what happens if my process x dies before I get to read it? Even if I check if the directory exists, it could still be gone one application call later.
Last but not least: Is there any major distribution out there that is not mounting proc?
From what I understand, a lot of common tools are based on the /proc interface such as lsmod or free, so I'm guessing that I can expect /proc to exist almost always.
The /proc interfaces are pretty stable (unlike the /sys interfaces), even if nothing is guaranteed. Almost all changes are backwards compatible, at least if they've been around for a few versions. You should
stick to the documented interfaces to be safe. If a file exists, its format may be extended in later versions, but normally in a backwards compatible way, e.g. adding columns to a table. The parts that are most at risk of disappearing are parts concerning hardware susbystems such as ACPI or SCSI, which are migrating to /sys (with a long transition period when both exist).
Most of the information is architecture-independent, except for hardware information (e.g. /proc/cpuinfo has very different fields on different architectures).
The main documentation is Documentation/filesystems/proc.txt in the kernel source. Consider proc(5) to be the overview and proc.txt to be the fine details. The kernel documentation is often incomplete, so don't be surprised if you need to resort to reading the source sometimes.
Most optional parts of /proc are activated by default if the driver whose data it exposes is included in the kernel. The exceptions are mostly related to hardware features that rarely need to be accessed from outside the kernel; if you need to access these features, you're probably already expecting to need to dig deeper. Look through Kconfig files in the kernel source for detailed information.
Process data (or hardware data related to removable hardware or provided by unloadable modules) can disappear under your nose. Most files under /proc can be read atomically, with a single read call with a reasonably-sized buffer; if you perform multiple read calls in sequence, drivers are supposed to guarantee that you get well-formed data. There is no way to guarantee atomicity between reads of separate files; if you're reading information about a process, this process can die at any time, and in principle could even be replaced by another process with the same PID before you're finished.
As it says in the description of /proc, “everyone should say Y here”. All desktop/server Linux systems and most embedded Linux systems must have /proc; a lot of things, including ps and other process management commands, many filesystem and device-related tools, and module loading, require it. The only systems that might be able to dispense with /proc are very small single-purpose embedded systems that support a single hardware configuration and run a fixed set of programs. You can count on its being here.

Implementing an update/upgrade system for embedded Linux devices

I have an application that runs on an embedded Linux device and every now and then changes are made to the software and occasionally also to the root file system or even the installed kernel.
In the current update system the contents of the old application directory are simply deleted and the new files are copied over it. When changes to the root file system have been made the new files are delivered as part of the update and simply copied over the old ones.
Now, there are several problems with the current approach and I am looking for ways to improve the situation:
The root file system of the target that is used to create file system images is not versioned (I don't think we even have the original rootfs).
The rootfs files that go into the update are manually selected (instead of a diff)
The update continually grows and that becomes a pita. There is now a split between update/upgrade where the upgrade contains larger rootfs changes.
I have the impression that the consistency checks in an update are rather fragile if at all implemented.
Requirements are:
The application update package should not be too large and it must also be able to change the root file system in the case modifications have been made.
An upgrade can be much larger and only contains the stuff that goes into the root file system (like new libraries, kernel, etc.). An update can require an upgrade to have been installed.
Could the upgrade contain the whole root file system and simply do a dd on the flash drive of the target?
Creating the update/upgrade packages should be as automatic as possible.
I absolutely need some way to do versioning of the root file system. This has to be done in a way, that I can compute some sort of diff from it which can be used to update the rootfs of the target device.
I already looked into Subversion since we use that for our source code but that is inappropriate for the Linux root file system (file permissions, special files, etc.).
I've now created some shell scripts that can give me something similar to an svn diff but I would really like to know if there already exists a working and tested solution for this.
Using such diff's I guess an Upgrade would then simply become a package that contains incremental updates based on a known root file system state.
What are your thoughts and ideas on this? How would you implement such a system? I prefer a simple solution that can be implemented in not too much time.
I believe you are looking wrong at the problem - any update which is non atomic (e.g. dd a file system image, replace files in a directory) is broken by design - if the power goes off in the middle of an update the system is a brick and for embedded system, power can go off in the middle of an upgrade.
I have written a white paper on how to correctly do upgrade/update on embedded Linux systems [1]. It was presented at OLS. You can find the paper here: https://www.kernel.org/doc/ols/2005/ols2005v1-pages-21-36.pdf
[1] Ben-Yossef, Gilad. "Building Murphy-compatible embedded Linux systems." Linux Symposium. 2005.
I absolutely agree that an update must be atomic - I have started recently a Open Source project with the goal to provide a safe and flexible way for software management, with both local and remote update. I know my answer comes very late, but it could maybe help you on next projects.
You can find sources for "swupdate" (the name of the project) at github.com/sbabic/swupdate.
Stefano
Currently, there are quite a few Open Source embedded Linux update tools growing, with different focus each.
Another one that is worth being mentioned is RAUC, which focuses on handling safe and atomic installations of signed update bundles on your target while being really flexible in the way you adapt it to your application and environment. The sources are on GitHub: https://github.com/rauc/rauc
In general, a good overview and comparison of current update solutions you might find on the Yocto Project Wiki page about system updates:
https://wiki.yoctoproject.org/wiki/System_Update
Atomicity is critical for embedded devices, one of the reasons highlighted is power loss; but there could be others like hardware/network issues.
Atomicity is perhaps a bit misunderstood; this is a definition I use in the context of updaters:
An update is always either completed fully, or not at all
No software component besides the updater ever sees a half installed update
Full image update with a dual A/B partition layout is the simplest and most proven way to achieve this.
For Embedded Linux there are several software components that you might want to update and different designs to choose from; there is a newer paper on this available here: https://mender.io/resources/Software%20Updates.pdf
File moved to: https://mender.io/resources/guides-and-whitepapers/_resources/Software%2520Updates.pdf
If you are working with the Yocto Project you might be interested in Mender.io - the open source project I am working on. It consists of a client and server and the goal is to make it much faster and easier to integrate an updater into an existing environment; without needing to redesign too much or spend time on custom/homegrown coding. It also will allow you to manage updates centrally with the server.
You can journal an update and divide your update flash into two slots. Power failure always returns you to the currently executing slot. The last step is to modify the journal value. Non atomic and no way to make it brick. Even it if fails at the moment of writing the journal flags. There is no such thing as an atomic update. Ever. Never seen it in my life. Iphone, adroid, my network switch -- none of them are atomic. If you don't have enough room to do that kind of design, then fix the design.

Git/Linux: What is a good strategy for maintaining a Linux kernel with patches from multiple Git repositories?

I am maintaining a custom Linux kernel which is comprised of merged changes from a variety of sources. This is for an embedded system.
The chip vendor we are working with releases a board support package as a changes against a mainline kernel (2.6.31). I have since made changes to support our custom hardware, and also merged with the stable (2.6.31.y) kernel releases. I've also merged in bug fixes for a specific file system driver that we use, sometimes before the changes make it to the mainline kernel.
I haven't been very systematic about how I have managed the various contributing sources and my own changes. If the change was large I tended to merge; if it was small I tended to rebase the third party changeset on to my own. Generally speaking merge conflicts are rare, since most of my work affects drivers that are not in the mainline kernel anyway.
I'm wondering if there is a better way to manage all of this. One concern is that my changes are mixed in with merges. The history might look something like:
2.6.31 + board support package + my changes (1) + 2.6.31.12 changes + my changes (2) + file system driver update + my changes (3) + 2.6.31.14 changes + my changes(4) + ....
It worries me a bit that my changes are mixed in, sometimes on the other side of merges. Is there a better way to do this? In particular, is there a way to do this that will make life easier when I switch to a newer kernel?
I don't think it will be easy to clean up your current set-up, unless your history is reasonably short, but I would suggest this:
Set up a Master repository which has remotes set up for each of the other places that your code comes from -- the mainline kernel, patches, ...
Keep a separate branch specifically for the updates from your driver supplier.
When you fetch, the updates will not mess with your branches.
When you are ready to merge, merge into some kind of "release" branch. The idea is to keep each source separate from the others, except when it needs to be merged in. Base your changes off of this branch, merging/rebasing as necessary.
Here's a quick diagram which I hope is helpful:
mainline-\----------\-------------------------------\
\ \ /you---\---/-/ \
\release----\-/---/----/-/--------\-/ / --\-----
patches-----------------/---/ / /
/ /
driver-------------------------/--------------/
With so many branches, it is difficult to diagram effectively, but I hope that gives you an idea of what I mean. release holds the official code for your system, you holds your own modifications, driver holds patches from the driver supplier, patches holds patches from some other repo, and mainline is the mainline kernel. Everything gets merged into release, and you base your changes off of release but only interact by merging in each direction, not making changes directly to release.
I think the generally accepted best policy is
(a) patch sets
(b) topic branches
Topic branches are essentially the same but are regularly rebased onto mainline. Topgit is a known tool that makes handling topic branches 'easier' if you have a lot of them. Otherwise: plan ahead and limit the number of branches

Stripping down a kernel in linux?

I recently read a post (admittedly its a few years old) and it was advice for fast number-crunching program:
"Use something like Gentoo Linux with 64 bit processors as you can compile it natively as you install. This will allow you to get the maximum punch out of the machine as you can strip the kernel right down to only what you need."
can anyone elaborate on what they mean by stripping down the kernel? Also, as this post was about 6 years old, which current version of Linux would be best for this (to aid my google searches)?
There is some truth in the statement, as well as something somewhat nonsensical.
You do not spend resources on processes you are not running. So as a first instance I would try minimise the number of processes running. For that we quite enjoy Ubuntu server iso images at work -- if you install from those, log in and run ps or pstree you see a thing of beauty: six or seven processes. Nothing more. That is good.
That the kernel is big (in terms of source size or installation) does not matter per se. Many of this size stems from drivers you may not be using anyway. And the same rule applies again: what you do not run does not compete for resources.
So think about a headless server, stripped down -- rather than your average desktop installation with more than a screenful of processes trying to make the life of a desktop user easier.
You can create a custom linux kernel for any distribution.
Start by going to kernel.org and downloading the latest source. Then choose your configuration interface (you have the choice of console text, 'config', ncurses style 'menuconfig', KDE style 'xconfig' and GNOME style 'gconfig' these days) and execute ./make whateverconfig. After choosing all the options, type make to create your kernel. Then make modules to compile all the selected modules for this kernel. Then, make install will copy the files to your /boot directory, and make modules_install, copies the modules. Next, go to /boot and use mkinitrd to create the ram disk needed to boot properly, if needed. Then you'll add the kernel to your GRUB menu.lst, by editing menu.lst and copying the latest entry and adding a similar one pointing to the new kernel version.
Of course, that's a basic overview and you should probably search for 'linux kernel compile' to find more detailed info. Selecting the necessary kernel modules and options takes a bit of experience - if you choose the wrong options, the kernel might not be bootable and you'll have to start over, which is a pain because selecting the options and compiling the kernel can take 15-30 minutes.
Ultimately, it isn't going to make a large difference to compile a stripped-down custom kernel unless your given task is very, very performance sensitive. It makes sense to remove things you're never going to use from the kernel, though, like say ISDN support.
I'd have to say this question is more suited to SuperUser.com, by the way, as it's not quite about programming.

Linux - identify process owning a specific address in physical memory

Under Linux, how can I tell what specific process owns / is using a given address in physical memory?
I understand that this may require writing a kernel module to access some kernel data structure and return the results to a user - I need to know how it can be done, regardless of how complicated it is.
The pages in use by a process and their location in physical memory are not static pieces of information. However, the information you seek should be in the page tables. A change went into the kernel that might be almost exactly what you're looking for:
author Arjan van de Ven <arjan#linux.intel.com> 2008-04-17 15:40:45 (GMT)
committer Ingo Molnar <mingo#elte.hu> 2008-04-17 15:40:45 (GMT)
commit 926e5392ba8a388ae32ca0d2714cc2c73945c609 (patch)
tree 2718b50b8b66a3614f47d3246b080ee8511b299e
parent 2596e0fae094be9354b29ddb17e6326a18012e8c (diff)
x86: add code to dump the (kernel) page tables for visual inspection by kernel developers
This patch adds code to the kernel to have an (optional)
/proc/kernel_page_tables debug file that basically dumps the kernel
pagetables; this allows us kernel developers to verify that nothing
fishy is going on and that the various mappings are set up correctly.
This was quite useful in finding various change_page_attr() bugs, and
is very likely to be useful in the future as well.
Signed-off-by:Arjan van de Ven <arjan#linux.intel.com>
Cc: mingo#elte.hu
Cc: tglx#tglx.de
Cc: hpa#zytor.com
Signed-off-by: Ingo Molnar <mingo#elte.hu>
Signed-off-by: Thomas Gleixner <tglx#linutronix.de>
The added functionality is enabled by a new config option (X86_PTDUMP).
Might want to start here for a discusson of how process virtual memory is mapped to physical memory. That would give you a good place to start as far as figuring out where you would need to hook into the kernel to access the page table, etc. where that information is stored.
Well due to the way things are done under Linux, a process may own memory at one instance, and then will not anymore, due to paging.
http://en.wikipedia.org/wiki/Paging
Essentially this means that the computer switches out data it doesn't need at one moment so that the memory can be used for something else.
I'm not sure if this helped or not, but I'd advise you to look at page tables and directories, since you can use these to translate to physical addresses.
You might be able to use pmap -d [pid] for this... unfortunately you'd have to run it on all processes to see which one returned a result for the given memory address. Certainly not as efficient as a kernel module (and you might not even get a result, if the memory is paged out while you're looking for it).

Resources