What address check does access_ok() perform in Linux? - linux

I am reading about copy_from_user(…) and copy_to_user(…), which copy data from user space into the kernel and from the kernel back to user space. Looking at the internal implementation of copy_from_user(…), I see that it uses two functions,
access_ok(…) and memcpy(…). What I have read says that access_ok(…) is used to check whether the user-space pointer is valid.
What check does access_ok(…) perform to validate the pointer?

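In short, it checks only that the given address range lies within the user portion of the address space. It does not look at the page mapping table and does not test whether memory is actually mapped at that address.
The test is a numeric comparison of the address (plus the size) against the architecture's user-space limit; on x86-64, for example, user space is the lower half of the valid range of address values. If the range passes the check but a page turns out not to be mapped, the fault is caught during the copy itself, and copy_from_user(…) returns the number of bytes it could not copy.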
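The range test, that the address resides in user space, can be sketched in plain user-space C roughly as follows. This is only an illustration, not the kernel's code: USER_LIMIT_SKETCH and access_ok_sketch() are hypothetical names, and the 47-bit limit is an assumption modelled on a typical x86-64 configuration (the real limit is architecture-specific).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space limit, for illustration only. */
#define USER_LIMIT_SKETCH ((uint64_t)1 << 47)

/* Returns true if [addr, addr + size) lies entirely below the limit.
 * Nothing here consults page tables; it is a pure numeric comparison. */
static bool access_ok_sketch(uint64_t addr, uint64_t size)
{
    /* Checking size first means limit - size cannot underflow, and
     * addr + size is never computed, so it cannot wrap around. */
    return size <= USER_LIMIT_SKETCH && addr <= USER_LIMIT_SKETCH - size;
}
```

Written this way, a pointer near the top of the address space with a large size is rejected rather than wrapping around to a small value.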

Related

2 Questions about memory check pointing with linux kernel (custom implementation)

We have been given a project in which we implement memory checkpointing. The basic version just walks over pages and dumps the data it finds to a file (and also reports information about each page: private, locked, etc.); the incremental version only looks at data that has changed since the previous checkpoint and dumps that to a file. My understanding is that we are building a smaller-scale version of memory save states (I could be wrong, but that is what I am getting from this). We are currently using a VMA-based approach to walk the given range (as long as it stays within the user-space range, so no kernel addresses and nothing below user space) and report the data found in the pages we encounter. I know that struct vm_area_struct is used to access a VMA (with functions such as find_vma()). My issue is that I am not sure how to check the individual pages within the user-supplied address range using this vm_area_struct. I only know about struct page, and I am still learning the kernel in detail, so I am bound to miss things. Is there something I am missing about vm_area_struct when it comes to accessing pages?
My second question is: what do we use to iterate through each individual page within the VMA found for the given start and end addresses?
VMAs contain the virtual addresses of their first byte and of the byte just after their last:
struct vm_area_struct {
/* The first cache line has the info for VMA tree walking. */
unsigned long vm_start; /* Our start address within vm_mm. */
unsigned long vm_end; /* The first byte after our end address within vm_mm. */
...
This means that to get a page's data you first need to figure out which context your code is running in.
If it is running in the process's own context, a simple copy_from_user approach might be enough to get the actual data, plus a page walk (through the entire PGD/PUD/PMD/PTE hierarchy) to get the PFN and turn it into a struct page. (Take care not to use the seductive virt_to_page(addr), as this only works on kernel addresses.)
In terms of iteration, you only need to step in PAGE_SIZE increments over the virtual addresses you get from the VMAs.
Note that this assumes the pages are actually mapped. If they are not (!pte_present(pte)), you might need to remap the page yourself to access the data.
If your check is running in some other context (such as a kthread or an interrupt), you must bring the page back from swap before accessing it, which is a whole different case. If you want the easy way, I'd look here: https://www.kernel.org/doc/gorman/html/understand/understand014.html to understand how to handle swap lookup and retrieval.
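The PAGE_SIZE iteration described above can be sketched in user-space C as below. It only models the address arithmetic; struct vma_sketch, PAGE_SIZE_SKETCH, page_fn and for_each_page_in_range() are hypothetical names introduced for illustration, and a 4096-byte page is assumed. In the kernel, the callback would be where the page-table walk and the pte_present() check happen.

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096UL   /* assumed page size */

/* Stand-in for the two vm_area_struct fields shown above. */
struct vma_sketch {
    unsigned long vm_start;  /* first byte of the area */
    unsigned long vm_end;    /* first byte after the area */
};

typedef void (*page_fn)(unsigned long addr, void *ctx);

/* Visits every page of [start, end) that falls inside the VMA and
 * returns how many pages were visited. fn may be NULL. */
static size_t for_each_page_in_range(const struct vma_sketch *vma,
                                     unsigned long start, unsigned long end,
                                     page_fn fn, void *ctx)
{
    size_t count = 0;
    unsigned long addr;

    /* Clamp the requested range to the VMA boundaries. */
    if (start < vma->vm_start)
        start = vma->vm_start;
    if (end > vma->vm_end)
        end = vma->vm_end;

    /* Round the start down to a page boundary. */
    start &= ~(PAGE_SIZE_SKETCH - 1);

    for (addr = start; addr < end; addr += PAGE_SIZE_SKETCH) {
        if (fn)
            fn(addr, ctx);
        count++;
    }
    return count;
}
```

A range that lies completely outside the VMA yields zero pages, so the caller can move on to the next VMA without special-casing.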

ARM domains in the Linux kernel

I have been reading through some ARM code to try to understand what exactly the cpu_domain field inside struct thread_info represents. To understand how it is used, I looked through the places where the variable is referenced. I am trying to understand the following:
Why is the field present in thread_info? I can see that when a context switch happens, the value is set / read, but why? What purpose does the field serve?
I had a look at the function modify_domain that seems to retrieve the domain value and set it in coprocessor CP15, c3. But where is this used? Any system call that takes in addresses verifies it against addr_limit, and page tables have the supervisor bit to check if reads/writes are allowed from userspace. So where do ARM domains come into the picture?

How can I get the A value from an ELF file?

I'm a beginner with Linux systems and I'm studying the ELF file format by reading this document (http://www.skyfree.org/linux/references/ELF_Format.pdf).
In the part about relocation, there is something strange in the relocation calculation.
I know that the calculation differs according to the relocation type.
But look at this.
When the relocation type is R_386_RELATIVE, the document gives the calculation as "B + A".
What exactly does "A" mean, and how can I get this "A" value from the ELF file?
Any help would be appreciated.
From the document you mentioned:
R_386_RELATIVE
The link editor creates this relocation type for dynamic linking. Its offset member gives a location within a shared object that contains a value representing a relative address. The dynamic linker computes the corresponding virtual address by adding the virtual address at which the shared object was loaded to the relative address. Relocation entries for this type must specify 0 for the symbol table index.
A
This means the addend used to compute the value of the relocatable field.
B
This means the base address at which a shared object has been loaded into memory during execution. Generally, a shared object file is built with a 0 base virtual address, but the execution address will be different.
Addend
As shown above, only Elf32_Rela entries contain an explicit addend. Entries of type Elf32_Rel store an implicit addend in the location to be modified. Depending on the processor architecture, one form or the other might be necessary or more convenient. Consequently, an implementation for a particular machine may use one form exclusively or either form depending on context.
Base Address
To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. One then obtains the base address by truncating the memory address to the nearest multiple of the maximum page size. Depending on the kind of file being loaded into memory, the memory address might or might not match the p_vaddr values.
So it boils down to the following:
A is the addend. It is taken from the Elf32_Rela structure (explicit addend) or, for Elf32_Rel, read from the location to be modified (implicit addend).
B is the base address and is calculated from p_vaddr. The particular calculation depends on the architecture.
You can inspect the relocation section of a binary or library file using readelf -r.
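As a rough illustration of "B + A" for an Elf32_Rel entry, where the addend is implicit, the sketch below reads A from the location named by the offset, adds the load base B, and writes the result back. It is a simplified model rather than a real dynamic linker: struct rel_sketch mirrors the two fields of Elf32_Rel, the buffer stands for an object whose virtual address 0 is at image[0], and apply_relative_sketch() is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of Elf32_Rel (no explicit addend). */
struct rel_sketch {
    uint32_t r_offset;  /* location to be modified */
    uint32_t r_info;    /* symbol index and relocation type */
};

#define R_386_RELATIVE_SKETCH 8u               /* value of R_386_RELATIVE */
#define REL_TYPE_SKETCH(info) ((uint8_t)(info)) /* low byte is the type */

/* Applies one R_386_RELATIVE relocation to the object held in image.
 * Returns 0 on success, -1 on a wrong type or an out-of-range offset. */
static int apply_relative_sketch(uint8_t *image, size_t image_len,
                                 uint32_t base,
                                 const struct rel_sketch *rel)
{
    uint32_t value;

    if (REL_TYPE_SKETCH(rel->r_info) != R_386_RELATIVE_SKETCH)
        return -1;
    if (rel->r_offset > image_len ||
        image_len - rel->r_offset < sizeof value)
        return -1;

    /* A: the implicit addend already stored at the location. */
    memcpy(&value, image + rel->r_offset, sizeof value);
    /* B + A: add the base address the object was loaded at. */
    value += base;
    memcpy(image + rel->r_offset, &value, sizeof value);
    return 0;
}
```

With an Elf32_Rela entry the only difference would be that A comes from the entry's explicit addend field instead of being read from the location.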

distinguishing between forwarding addresses and non-copied objects in stop and copy garbage collection

I've read many descriptions of the semispace stop-and-copy collector, and they all have one step in common:
If the pointer points to an object in fromspace, that object is evacuated. But if the pointer points to a forwarding address, the object has already been evacuated, so the old address is simply replaced with the new address.
How does one distinguish whether a pointer points to a forwarding address as opposed to a non-copied object? I've seen numerous claims that stop-and-copy doesn't need a block header, but I don't see how to distinguish the two choices without one. Every description I've read handwaves the issue as "if the target is a forwarding address", without describing how to tell if the target is a forwarding address or not.
For example, I took a look at an implementation here: https://gist.github.com/DmitrySoshnikov/4736334
And it appears to be cheating, by using an extra field "forwardingAddress". That isn't the block-header-free implementation I was led to believe existed.
The only solution I can think of right now is that the garbage collector is conservative instead of precise. But again, nothing I've read has suggested that the stop-and-copy collector is not precise.

Allocating specific address in Linux

I would like to allocate memory at a specific address within a process on Linux.
What I actually want to do is this:
I will have a number of processes.
Each process will call an initialization function in a library (written by me), which will allocate some memory in the address space of the process (to store process-related information). Each process does this.
Once this memory is allocated, the program will later call other functions in the library. These functions need to access the memory (containing the process-related information) allocated by the first function.
The problem is that I cannot store the address of the allocated memory in the library (not even in a static pointer, as there are a number of processes), and I don't want the user program to store that address either. I don't want the user program to know that the library has allocated memory in its address space. The library functions will be an abstraction, and user programs just have to use them.
Is it possible to overcome this problem?
My idea was that whenever a process calls the library's initialization function, the memory always gets allocated at the same address (say 10000) in every process, irrespective of everything else.
Then any library function that wants to access that memory can simply do:
char *p = (char *)10000;
and access it; the access goes to the address space of whichever process called the library function.
I'm not 100% sure I understand what you are aiming for, but if you want to map memory at a specific, fixed address, you can use the MAP_FIXED flag to mmap():
"When MAP_FIXED is set in the flags argument, the implementation is informed that the value of pa shall be addr, exactly. If MAP_FIXED is set, mmap() may return MAP_FAILED and set errno to [EINVAL]. If a MAP_FIXED request is successful, the mapping established by mmap() replaces any previous mappings for the process' pages in the range [pa,pa+len)."
See mmap man page: http://linux.die.net/man/3/mmap
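A minimal sketch of the MAP_FIXED approach, assuming Linux with anonymous mappings available. The helper names reserve_sketch() and map_fixed_sketch() are made up for this example. Because MAP_FIXED silently replaces whatever is already mapped in the range, the sketch first reserves a region the kernel chose and only then maps over that same address, instead of hard-coding a number such as 10000 (which is not page-aligned and may be in use).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserves len bytes of address space at a kernel-chosen address.
 * Returns MAP_FAILED on error. */
static void *reserve_sketch(size_t len)
{
    assert(len > 0);
    return mmap(NULL, len, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Maps len bytes of zeroed, read-write memory exactly at addr,
 * replacing any mapping already present in [addr, addr + len).
 * Returns addr on success, MAP_FAILED on error. */
static void *map_fixed_sketch(void *addr, size_t len)
{
    assert(len > 0);
    return mmap(addr, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}
```

A caller would check both results against MAP_FAILED and release the region with munmap() when done. Note that nothing guarantees the same address is free in every process, which is the main weakness of the fixed-address idea.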
Your question doesn't make sense. As you have worded it, a global variable in your library would work fine.
Maybe you are saying "a single process might load and unload your library, then load it again and want the address on the second load". Maybe you are saying "there are 2 libraries and each library needs the same address". Simple: use setenv() and getenv(). These store and retrieve anything that can be represented as a string in a variable that has PROCESS-WIDE SCOPE, i.e. all libraries can see the same environment variables. Simply convert your address to a string (for example with snprintf(); itoa() is not standard C), use setenv() to save it in an environment variable named "__SuperSecretGlobalAddress__", and then use getenv() to retrieve the value.
When your program starts up, a copy of the shell's environment is made for your process. getenv and setenv access and modify that copy. You cannot change the shell's environment with these functions.
See this post.
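The setenv()/getenv() suggestion could look roughly like the sketch below, assuming a POSIX system. save_addr_sketch() and load_addr_sketch() are hypothetical helper names; the pointer is stored as a hexadecimal string and parsed back with strtoumax().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes setenv() */
#endif
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/* Stores ptr as a hex string in the environment variable name.
 * Returns 0 on success, -1 on error (as setenv() does). */
static int save_addr_sketch(const char *name, const void *ptr)
{
    /* "0x" + two hex digits per byte + terminating NUL. */
    char buf[2 * sizeof(uintptr_t) + 3];

    assert(name != NULL);
    snprintf(buf, sizeof buf, "0x%" PRIxPTR, (uintptr_t)ptr);
    return setenv(name, buf, 1);   /* 1 = overwrite existing value */
}

/* Reads the pointer back, or returns NULL if the variable is unset. */
static void *load_addr_sketch(const char *name)
{
    const char *s;

    assert(name != NULL);
    s = getenv(name);
    if (s == NULL)
        return NULL;
    return (void *)(uintptr_t)strtoumax(s, NULL, 16);
}
```

The initialization function would call save_addr_sketch("__SuperSecretGlobalAddress__", block) once, and every later library call would use load_addr_sketch() with the same name. The value lives only in the calling process's copy of the environment, although child processes started afterwards inherit it.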
