I'm working on a project that manipulates linked lists in the kernel: it implements a "toy" locking mechanism in which all the locks are kept in linked lists. Please help me out with the following questions:
1) How do I create a linked list in the kernel? Can I use the macros in <sys/queue.h>? Or simply malloc(), etc.?
2) Locks are grouped by "lock group name" in this project. Does this mean there should be multiple linked lists, with each linked list representing a "lock group"? Thank you!
1) The queue(3) man page documents some very useful macros in sys/queue.h, implementing lists and tail queues. This header is also available in the kernel.
Memory allocation in the kernel is documented in the malloc(9) man page. Generally it works just like user-level malloc, but takes an additional type parameter that is useful for finding memory leaks. Your code should look something like this:
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/queue.h>

MALLOC_DEFINE(M_MYMALLOC, "mydata", "Some data");

struct foo {
    SLIST_ENTRY(foo) chain;           /* linkage within the list */
};
SLIST_HEAD(, foo) head = SLIST_HEAD_INITIALIZER(head);

struct foo *bar = malloc(sizeof(*bar), M_MYMALLOC, M_WAITOK);
SLIST_INSERT_HEAD(&head, bar, chain);
SLIST_REMOVE_HEAD(&head, chain);      /* unlinks, does not free */
free(bar, M_MYMALLOC);                /* same malloc type as at allocation */
2) It's hard to answer this question without knowing what "grouping" means in the context of your project.
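If grouping does turn out to mean one list per group, though, a common layout is a list of group heads, each carrying its own list of locks. A minimal sketch with sys/queue.h (all names are illustrative, not from your assignment; in the FreeBSD kernel strcmp comes from libkern rather than <string.h>):

#include <sys/queue.h>
#include <string.h>

struct toylock {
    SLIST_ENTRY(toylock) chain;       /* linkage within a group */
    /* ... lock state ... */
};

struct lockgroup {
    char name[32];                    /* the "lock group name" */
    SLIST_HEAD(, toylock) locks;      /* the locks in this group */
    SLIST_ENTRY(lockgroup) chain;     /* linkage among groups */
};

SLIST_HEAD(, lockgroup) groups = SLIST_HEAD_INITIALIZER(groups);

/* Look up a group by name; each group owns one list of locks. */
static struct lockgroup *
find_group(const char *name)
{
    struct lockgroup *g;

    SLIST_FOREACH(g, &groups, chain) {
        if (strcmp(g->name, name) == 0)
            return g;
    }
    return NULL;
}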
On Linux and other POSIX systems, a program can be executed under the identity of another user (i.e. a different euid). Normally, you'd call geteuid and friends to reliably determine the current identities of the process. However, I couldn't figure out a reliable way to determine these identities using only Rust's standard library.
The only thing I found that was close is std::os::unix::fs::MetadataExt.
Is it currently possible to determine the euid (and other ids) of a process using Rust's standard library? Is there a function or trait I'm missing?
This is going to rely on an OS-specific dependency, as the concept does not exist (or doesn't do what you think it does!) for most of the targets you can build Rust code for. In particular, you will find this in the libc crate, which is, as the name suggests, a very small wrapper over libc.
The std::os namespace is typically limited to the bare minimum needed to get process and FS functionality going for the std::process, std::thread and std::fs modules. As such, this call would not have ended up in there. MetadataExt is, for a similar reason, aimed squarely at filesystem usage.
As you might have expected, the call itself is, unimaginatively, geteuid. It is an unsafe extern import, so you'll have to wrap the call in an unsafe block.
It appears that Rust 1.46.0 doesn't expose this functionality in the standard library. If you're using a POSIX system and don't want to rely on an extra dependency, you have four options:
1. You can use libc directly:
#[link(name = "c")]
extern "C" {
    fn geteuid() -> u32;
    fn getegid() -> u32;
}
If you're targeting GNU/Linux in particular, you won't need the link attribute at all, since Rust programs there already link against the system libc and the symbols will resolve anyway. In other words, you can use a plain extern block without the link attribute.
2. Read /proc/self/status (potentially Linux-only). This file contains a line that starts with Uid:, which lists the real user id, the effective user id, and other information that you may also find relevant. Refer to man proc for more information.
3. If you're using a normal GNU/Linux system, you can access the metadata of the /proc/self directory itself. As pointed out in this question, the owner of this directory should match the effective user id of the process. You can get the euid as follows:
use std::os::unix::fs::MetadataExt;
println!("metadata for {:?}", std::fs::metadata("/proc/self").map(|m| m.uid()));
A benefit of this approach is that it is relatively cheap compared to option #2, since it's only a single stat syscall (as opposed to opening a file and reading/parsing its contents).
4. If you're not using a normal GNU/Linux system, you might find success in creating a new dummy file and obtaining the owner id normally via Metadata.
We've been given a project where we implement memory checkpointing: the basic version just walks over pages and dumps the data found to a file (also recording information about each page: private, locked, etc.), while the incremental version only dumps data that has changed since the previous checkpoint. My understanding is that we are essentially building a smaller-scale version of memory save states (I could be wrong, but that's what I'm getting from this). We are currently using the VMA approach, walking the given range (as long as it doesn't go below or above the user-space range, i.e. no kernel addresses or anything below user space) in order to report the data found in the pages we encounter. I know struct vm_area_struct is used to access VMAs (with some functions, including find_vma()).

My issue is that I'm not sure how to check the individual pages within the address range the user gives us, starting from this vm_area_struct. I only know about struct page (that's pretty much it), and I'm still learning about the kernel in detail, so I'm bound to miss things. Is there something I'm missing about vm_area_struct when accessing pages?
My second question is: what do we use to iterate through each individual page within the found VMA (between the given start and end addresses)?
VMAs contain the virtual addresses of their first and (one past their) last bytes:
struct vm_area_struct {
    /* The first cache line has the info for VMA tree walking. */
    unsigned long vm_start;    /* Our start address within vm_mm. */
    unsigned long vm_end;      /* The first byte after our end address
                                  within vm_mm. */
    ...
This means that in order to get a page's data, you first need to figure out in what context your code is running.
If it's within the process context, then a simple copy_from_user() might be enough to get the actual data, plus a page-table walk (through the entirety of your PGD/PUD/PMD/PTE levels) to get the PFN, which you can then turn into a struct page. (Take care not to use the seductive virt_to_page(addr), as it only works on kernel addresses.)
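A minimal sketch of such a walk, assuming the four-level page-table layout of the v2.6/v3.x era this question targets (the helper name is illustrative; the caller should hold mmap_sem for reading):

#include <linux/mm.h>
#include <asm/pgtable.h>

/* Resolve a user virtual address to its struct page, or NULL if the
 * page is not present. */
static struct page *walk_to_page(struct mm_struct *mm, unsigned long addr)
{
    pgd_t *pgd = pgd_offset(mm, addr);
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;
    struct page *page = NULL;

    if (pgd_none(*pgd) || pgd_bad(*pgd))
        return NULL;
    pud = pud_offset(pgd, addr);
    if (pud_none(*pud) || pud_bad(*pud))
        return NULL;
    pmd = pmd_offset(pud, addr);
    if (pmd_none(*pmd) || pmd_bad(*pmd))
        return NULL;
    pte = pte_offset_map(pmd, addr);      /* map the PTE page */
    if (pte_present(*pte))
        page = pte_page(*pte);            /* PFN -> struct page */
    pte_unmap(pte);                       /* release the mapping */
    return page;
}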
In terms of iteration, you need only iterate in PAGE_SIZE steps over the virtual addresses you get from the VMAs.
Note that this assumes the pages are actually mapped. If a page is not (!pte_present()), you might need to map it back in yourself to access the data.
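Putting the two together, a hedged sketch of the per-page loop from process context (checkpoint_page_data() is a hypothetical stand-in for your dump-to-file logic, not a kernel function):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

static void checkpoint_vma(struct vm_area_struct *vma)
{
    unsigned long addr;
    char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);

    if (!buf)
        return;

    /* Walk the VMA's virtual range one page at a time. */
    for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
        /* copy_from_user() faults the page in for us if needed. */
        if (copy_from_user(buf, (const void __user *)addr, PAGE_SIZE) == 0)
            checkpoint_page_data(addr, buf);   /* hypothetical writer */
        /* else: unreadable (e.g. PROT_NONE); record a hole and move on */
    }
    kfree(buf);
}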
If your check is running in some other context (such as a kthread or an interrupt), you must bring the page back in from the swap before accessing it, which is a whole different case. For the easy way in, I'd look here: https://www.kernel.org/doc/gorman/html/understand/understand014.html to understand how to handle swap lookup/retrieval.
I've been reading the documentation and the comments in the source code files but cannot figure out the exact function/code responsible for implementing the LRU in the latest release of the kernel. I want to make slight modifications to it, which is why I'm looking for it.
I've read that the kernel maintains active and inactive lists. Where is this code?
Assuming kernel v3.18, most of the LRU-related code is in mm/swap.c. If you look at this file, there are many functions that are probably what you are interested in. For example:
void lru_cache_add_active_or_unevictable(struct page *page,
                                         struct vm_area_struct *vma)
See: http://lxr.free-electrons.com/source/mm/swap.c#L660
There are other files in mm that are relevant, as well. Try looking at files related to the Linux virtual memory (often shortened to "vm") subsystem, and the files with "swap" in the name.
A lot of the literature on Linux's LRU stuff is out-of-date, as you have discovered. The general concepts are probably the same, but they've renamed/moved around a lot of things.
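For orientation, the active/inactive lists themselves hang off each zone's struct lruvec; abbreviated from include/linux/mmzone.h in v3.18, the layout looks roughly like this:

/* include/linux/mmzone.h (v3.18, abbreviated): the LRU lists that
 * active/inactive pages actually live on, one set per zone. */
enum lru_list {
    LRU_INACTIVE_ANON,
    LRU_ACTIVE_ANON,
    LRU_INACTIVE_FILE,
    LRU_ACTIVE_FILE,
    LRU_UNEVICTABLE,
    NR_LRU_LISTS
};

struct lruvec {
    struct list_head lists[NR_LRU_LISTS];
    struct zone_reclaim_stat reclaim_stat;
    /* ... */
};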
I'm trying to figure out if there is a library that gives me something near the equivalent of Windows custom performance counters (described here http://geekswithblogs.net/.NETonMyMind/archive/2006/08/20/88549.aspx)
Basically, I'm looking for something that can be used to both track global counters within an application, and (ideally) something that presents that information via a well-defined interface to other applications/users. These are application statistics; stuff like memory and disk can be captured in other ways, but I'm looking to expose throughput/transactions/"widgets" handled during the lifetime of my application.
I've seen this question:
Concept of "Performance Counters" in Linux/Unix
and this one
Registry level counters in Linux accessible from Java
but neither is quite what I'm looking for. I don't want to write a static file (this is dynamic information after all; I should be able to get at it even if the disk is full etc.), and would rather avoid a homegrown set of code if at all possible. Ideally, at least on Linux, this data would (I think) be surfaced through /proc in some manner, though it's not clear to me if that can be done from userland (this is less important, as long as it is surfaced in some way to clients.)
But back to the crux of the question: is there any built-in or suitable 3rd-party library that gives me custom global (thread-safe, performant) counters suitable for application metrics that I can use on Linux and other *NIXy operating systems? (And can be interfaced from C/C++?)
In addition to @user964970's comment/solution, I suggest making it OS-agnostic.
Use an OS-agnostic API, like ACE or Boost, to create your own library supplying a named-semaphore-protected counter placed inside a named shared-memory segment.
This should be your library's API:

long *createCounter(const char *name); // Create a counter: sets up a named
                                       // semaphore and a named shared-memory
                                       // segment holding the counter value;
                                       // returns a pointer to the counter.
long *getCounter(const char *name);    // Get a pointer to an existing counter
                                       // in the calling process' address space.
long incCounter(const char *name);     // Increment an existing counter.
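A minimal sketch of that API on plain POSIX (shm_open/sem_open rather than ACE or Boost; error handling abbreviated, names must begin with '/', link with -lrt -lpthread):

#include <fcntl.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create (or open) a counter: one shared-memory long plus one named
 * semaphore protecting it. Returns NULL on failure. */
long *createCounter(const char *name)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    long *ctr;

    if (fd < 0 || ftruncate(fd, sizeof(long)) < 0)
        return NULL;
    ctr = mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
    close(fd);
    sem_open(name, O_CREAT, 0600, 1);    /* the counter's write lock */
    return ctr == MAP_FAILED ? NULL : ctr;
}

long *getCounter(const char *name)
{
    return createCounter(name);          /* O_CREAT makes open idempotent */
}

long incCounter(const char *name)
{
    long *ctr = getCounter(name);
    sem_t *lock = sem_open(name, 0);
    long v = -1;

    if (ctr && lock != SEM_FAILED) {
        sem_wait(lock);                  /* serialize writers */
        v = ++*ctr;
        sem_post(lock);
        sem_close(lock);
    }
    return v;
}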
I'm working on Linux kernel version 2.6.39.1, and am developing a block device driver. In this regard, I want to combine multiple struct bios into a single struct request, which is then added to the request_queue for processing by the device driver, namely scsi_request_fn().
I tried using the ->bi_next field of struct bio to link the multiple struct bios I had composed, thereby creating a linked list of struct bios. When I call submit_bio() to submit a bio to the block layer for I/O, a BUG_ON() is triggered because the code expects bio->bi_next to be NULL.
Is there a way to link several struct bios into a single struct request before sending it to lower layers for servicing?
I'm not sure how to string multiple struct bios together, but you might want to take a look at the "task collector" implementation in libsas and the aic94xx driver for an alternative approach. There isn't much documentation, but the libsas documentation describes it as follows:
Some hardware (e.g. aic94xx) has the capability to DMA more
than one task at a time (interrupt) from host memory. Task
Collector Mode is an optional feature for HAs which support
this in their hardware. (Again, it is completely optional
even if your hardware supports it.)
In Task Collector Mode, the SAS Layer would do natural
coalescing of tasks and at the appropriate moment it would
call your driver to DMA more than one task in a single HA
interrupt. DMBS may want to use this by insmod/modprobe
setting the lldd_max_execute_num to something greater than 1.
Effectively, this lets the block layer (a.k.a. BIO) remain unchanged, but multiple requests are accumulated at the driver layer and submitted together.
Thanks for the reply, @ctuffli. I've decided to use a structure similar to the one described here. Basically, I allocate a struct packet_data which contains pointers to all the struct bios that should be merged to form one single struct bio (and later on, one single struct request); I also store some driver-related information in this struct packet_data. Next, I allocate a new struct bio (let's call it "merged_bio"), copy all the pages from the list of original BIOs, and make merged_bio->bi_private point to the struct packet_data. This last hack lets me keep track of the list of original BIOs and call bio_endio() on each individual BIO once the merged_bio has been successfully transferred.
Not sure if this is the smartest way to do this, but it does what I intended! :^)
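For anyone landing here later, a hedged sketch of that completion path against the v2.6.39-era bio API (the struct and field names are illustrative, not from a real driver):

#include <linux/bio.h>
#include <linux/slab.h>

struct packet_data {
    struct bio **orig_bios;   /* the BIOs merged into merged_bio */
    int nr_bios;
    /* ... driver-private bookkeeping ... */
};

/* bi_end_io callback for merged_bio: finish every original BIO. */
static void merged_bio_end_io(struct bio *merged, int err)
{
    struct packet_data *pd = merged->bi_private;
    int i;

    for (i = 0; i < pd->nr_bios; i++)
        bio_endio(pd->orig_bios[i], err);   /* complete each original BIO */

    bio_put(merged);
    kfree(pd->orig_bios);
    kfree(pd);
}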