I know that malloc invokes the sbrk system call. Similarly, what system call is invoked when I write to malloc'ed memory (heap memory)?
#include <stdlib.h>

int main(void)
{
    /* 5 bytes of heap memory allocated */
    char *ptr = malloc(5);
    if (ptr == NULL)
        return 1;
    ptr[0] = 10; /* What is the system call invoked for writing into this heap memory? */
    free(ptr);
    return 0;
}
There is no system call involved in this case. Ask your compiler to generate assembly and you will see that there are only some MOV instructions there. Or you can use a debugger to view the assembly.
Accessing memory does not require a system call. On the contrary, accessing memory is what most of your code does most of the time! On a modern OS, you have a flat view of a contiguous range of virtual memory, and you typically only need a system call to mark a particular region (a "page") of that memory as valid; other times, contiguously growing memory ranges such as the call stack don't even require any action on your program's part. It's solely the job of your operating system's memory manager to intercept accesses to memory that isn't mapped to physical memory (via a page fault), do some kernel magic to bring the desired memory into physical space and return control to your program.
The only reason malloc occasionally needs to perform a system call is because it asks the operating system for a random piece of virtual memory somewhere in the middle. If your program were to only function with global and local variables (but no dynamic allocation), you wouldn't need any system calls for memory management.
"operating system doesn't see every write that occurs: a write to memory corresponds simply to a STORE assembly instruction, not a system call. It is the hardware that takes care of the STORE and the necessary address translation. The only time the OS will see a memory write is when the address translation in the page tables fails, causing a trap to the OS. "
Please read the below link for details
http://pages.cs.wisc.edu/~dusseau/Classes/CS537-F04/Questions/sol12.html
For fun I am just trying to write a program in assembly for Linux on a laptop with an x86 processor to get some system information. So one of the things I am trying to find is how much memory is available to my program, and where e.g. the stack is and if and how I can allocate additional memory if needed.
A long time ago I did things like this on an Atari ST, where there was just a system 'malloc' I could request memory from, and there were functions to find the available memory.
I know Linux is set up differently and I kind of have the whole address space to myself, but I guess there are some memory areas I am not allowed to touch.
And somehow a default stack seems to have been set up.
I researched quite a bit for this, but I can't find any 'assembly' system call. Most people point to linking the C malloc for memory management, but I am not looking for a memory manager. I just want to know the memory boundaries of my program.
I find things like getrlimit, setrlimit, prlimit and brk and sbrk, but those seem to be C functions and not system calls.
What am I missing?
Linux uses virtual memory (and ASLR). Atari ST doesn't use either so it did have a fixed memory map for some OS data structures and code. (Because the OS was in ROM and couldn't be easily updated, some people even documented some internal addresses.)
Linux tries to keep the boundary between kernel and user-space rigid, with a well-defined documented API / ABI for user-space to interact with the kernel via system calls. (e.g. on x86-64, via the syscall instruction). User-space doesn't need to care what's on the other side of that wall, and usually not even where its pages are in virtual memory as long as it has pointers to them.
When glibc malloc wants more pages from the OS, it uses mmap(MAP_ANONYMOUS) or brk to get them, and hands out chunks of them for small calls to malloc. It keeps bookkeeping data structures in user-space (so that's per-process of course).
I know Linux is set up differently and I kind of have the whole address space to myself, but I guess there are some memory areas I am not allowed to touch.
Yeah, every process has its own virtual address space. You can only touch the parts you've allocated, otherwise the resulting page fault will be "invalid" (OS knows there isn't supposed to be a physical page for that virtual page) and will deliver a SIGSEGV signal to your process if you try to read or write it. ("valid" page faults happen because of swap space or lazy allocation / copy-on-write; the kernel updates the HW page tables and returns to user-space for it to re-run the instruction that faulted.)
Also, the kernel claims the high half of virtual address space for its own use. (https://wiki.osdev.org/Higher_Half_Kernel). See also https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt for Linux's x86-64 memory map layout.
I can't find any 'assembly' system call.
mmap and brk are true system calls. See the "notes" section of the brk(2) man page. Section 2 man pages are system calls, section 3 are libc functions.
Of course in C when you call mmap(...), you're actually calling a wrapper function in glibc. glibc provides wrapper functions, not inline asm macros that use the syscall instruction directly.
See also The Definitive Guide to Linux System Calls which explains the asm interface, and also the VDSO pages. Linux maps some kernel memory (read-only) into your user-space process, holding code and data so getpid() and clock_gettime() can run in user-space.
Also various Q&As on Stack Overflow, including What are the calling conventions for UNIX & Linux system calls on i386 and x86-64
So one of the things I am trying to find is how much memory is available to my program
There isn't a system call to query the current memory map of your process. Parsing /proc/self/maps would be your best bet.
See Finding mapped memory from inside a process for some fun ideas on using system calls to scan ranges of virtual address space for mapped pages. For example, Linux's mincore(2) syscall returns -ENOMEM if the specified range contains any unmapped pages.
I had a debate with a friend who said that in C, memory is allocated at compile time.
I said it cannot be true since memory can be allocated only when the process loads to memory.
From there we started talking about stack allocation. After some reading, I know the details depend on the OS implementation,
but usually the stack starts at a default size, let's say 1 MB (contiguous) of reserved virtual address space, and when a function is called (no matter where we are on that thread's stack),
a block of X bytes (all the local variables of that function?) is committed (allocated) from the reserved range.
What I would like to understand: when we enter the function, how does the OS know how much to allocate (commit from the reserved range)?
Does it happen dynamically during the function's execution, or does the compiler calculate ahead of time how much space each function needs?
The compiler can and does (at compile time) count up the sizes of all the local variables that the function declares on the stack, and thereby it knows (again, at compile time) how much it will need to increase the stack-pointer when the function is entered (and decrease it when the function returns). This computed amount-to-increase-the-stack-pointer-by value will be written into the executable code directly at compile time, so that it doesn't need to be re-computed when the function is called at run time.
The exception to the above is when your C program is using C99's variable-length-arrays (VLA) feature. As suggested by the name, variable-length-arrays are arrays whose size is not known until run-time, and as such the compiler will have to emit special code for any function that contains one or more VLAs, such that the amount by which to increase the stack-pointer is calculated at run-time.
Note that the physical act of mapping virtual stack addresses to physical RAM addresses (and making sure that the necessary RAM is allocated) is done at run time, and is handled by the operating system, not by the compiler. In particular, if a process tries to access a virtual address (on the stack or otherwise) that is not currently mapped to any physical address, a page fault will be generated by the MMU. The process's execution will be temporarily paused while a page-fault-handler routine executes. The page-fault-handler will evaluate the legality of the virtual address the process tried to access; if the virtual address was a legal one, the page-fault-handler will map it to an appropriate page of physical RAM, and then let the process continue executing. If the virtual address was not one the process is allowed to access (or if the attempt to procure a page of physical RAM failed, e.g. because the computer's memory is full), then the mapping will fail and the OS will halt/crash the process.
When I searched about "What happens if malloc and exit with not free?", I could find answers saying "Today, OS will recover all the allocated memory space after a program exit".
In that answer, what is the meaning of "recover"?.
The OS just deletes its PCB and page table when the process exits, doesn't it?
Are there additional tasks OS has to do for complete termination of process?
When a program starts, the OS allocates some memory to it. During the program's execution the program can request more blocks of memory from the OS and it can release them as well when it doesn't need them any more. When the program exits, all the memory used by it is returned to the OS.
The pair malloc()/free() (and their siblings and equivalents) do not interact with the OS [1]. They manage a block of memory (called the "heap") that the program already got from the OS when it was launched.
All in all, from the OS point of view it doesn't matter if your program uses free() or not. For the program it's important to use free() when a piece of memory is no longer needed, so that further allocations can succeed (by reusing the freed memory blocks).
[1] This is not entirely true. The implementation of malloc() may get more memory blocks from the OS to extend the heap when it is full, but this process is transparent to the program. From the program's point of view, malloc() and free() operate inside a memory block that already belongs to the program.
The operating system allocates and manages memory pages in a process. As part of the process cleanup at exit, the operating system has to deallocate the pages assigned to a process. This includes the page tables, page file space, and physical page frames mapped to logical pages. This task is complicated because multiple processes may map to the same physical page frames, which requires some form of reference counting.
The heap is just memory. The operating system has no knowledge whatsoever of process heaps. Inside malloc (and similar functions), there will be calls to operating system services to map pages to the process address space. The operating system creates the pages but does not care what the pages are used for.
If you do malloc's without corresponding free's your process will keep requesting more and more pages from the operating system (until you reach the point where the system will fail to allocate more pages). All you are doing is screwing up the heap within your own process.
When the process exits, the operating system will just get rid of the pages allocated to the heap and your application's failure to call free will cause no problem to the system at all.
Thus, there is a two level system at work. malloc allocates bytes. The operating system allocates pages.
I was wondering: if the space required on the heap is small enough
that there is no need for a brk/sbrk system call (to shift the break pointer (brk) of the data segment), how does a library function (such as malloc) allocate space on the heap?
I am not asking about the data structures and algorithms for heap management. I am just asking how malloc gets the address of the first location of the heap if it doesn't invoke a system call. I am asking this because I have heard that it is not always necessary to invoke a system call (brk/sbrk), as these are only required to expand the space. Please correct me if I am wrong.
The basic idea is that when your program starts, the heap is very small, but not necessarily zero. If you only allocate (malloc) a small amount of memory, the library is able to handle it within the small amount of space it has when it is loaded. However, when malloc runs out of that space, it needs to make a system call to get more memory.
That system call is often sbrk(), which moves the top of the heap's memory region up by a certain amount. Usually, the malloc library routine grows the heap by more than what is needed for the current allocation, in the hope that future allocations can be performed without making a system call.
Other implementations of malloc use mmap() instead -- this allows the program to create a sparse virtual memory mapping. However, mmap() based malloc implementations do the same thing as the sbrk()-based ones: each system call reserves more memory than what is necessarily needed for the current call.
One way to look at this is to trace a program that uses malloc: you'll see that for N calls to malloc, you will see M system calls (where M is much smaller than N).
The short answer is that it uses sbrk() to allocate a big hunk, which at that point belongs to your app process. It can then further parcel out subsections of that as individual malloc calls without needing to ask the system for anything, until it exhausts that space and needs to resort to sbrk() again.
You said you didn't want the details on the data structures, but suffice it to say that the implementation of malloc (i.e. your own process, not the OS kernel) is keeping track of which space in the region it got from the system is spoken for and which is still available to dole out as individual mallocs. It's like buying a big tract of land, then subdividing it into lots for individual houses.
Use sbrk() or mmap() — http://linux.die.net/man/2/sbrk, http://linux.die.net/man/2/mmap
How exactly does the copy_from_user() function work internally? Does it use any buffers or is there any memory mapping done, considering the fact that kernel does have the privilege to access the user memory space?
The implementation of copy_from_user() is highly dependent on the architecture.
On x86 and x86-64, it simply does a direct read from the userspace address and write to the kernelspace address, while temporarily disabling SMAP (Supervisor Mode Access Prevention) if it is configured. The tricky part of it is that the copy_from_user() code is placed into a special region so that the page fault handler can recognise when a fault occurs within it. A memory protection fault that occurs in copy_from_user() doesn't kill the process like it would if it were triggered by any other process-context code, or panic the kernel like it would if it occurred in interrupt context - it simply resumes execution in a code path which returns -EFAULT to the caller.
regarding "how bout copy_to_user since the kernel is passing on the kernel space address,how can a user space process access it"
A user space process can attempt to access any address. However, if the address is not mapped in that process's user space (i.e. in the page tables of that process), or if there is a problem with the access, like a write attempt to a read-only location, then a page fault is generated. Note that, at least on 32-bit x86 with the classic 3G/1G split, every process has the kernel space mapped in the highest 1 gigabyte of its virtual address space, while the lower 3 gigabytes of the 4GB total address space are used for the process text (i.e. code) and data.
A copy to or from user space is executed by the kernel code that is executing on behalf of the process and actually it's the memory mapping (i.e. page tables) of that process that are in-use during the copy. This takes place while execution is in kernel mode - i.e. privileged/supervisor mode in x86 language.
Assuming the user-space code has passed a legitimate target location (i.e. an address properly mapped in that process's address space) to have data copied to, copy_to_user(), run from kernel context, can write to that address/region normally without problems. After control returns to user space, the process can read from this location, which it set up itself to begin with.
More interesting details can be found in chapters 9 and 10 of Understanding the Linux Kernel, 3rd Edition, by Daniel P. Bovet and Marco Cesati. In particular, access_ok() is a necessary but not sufficient validity check. The user can still pass addresses that do not belong to the process address space. In this case, a Page Fault exception will occur while the kernel code is executing the copy. The most interesting part is how the kernel page fault handler determines that the page fault in such a case is not due to a bug in the kernel code but rather a bad address from the user (especially if the kernel code in question is from a loaded kernel module).
The best answer has something wrong: copy_(from|to)_user can't be used in interrupt context because they may sleep. The copy_(from|to)_user functions can only be used in process context.
The process's page tables include all the information the kernel needs to access user memory, so the kernel could access a user-space address directly if we could be sure the addressed page is in memory. We use the copy_(from|to)_user functions because they check this for us, and if the addressed user-space page is not resident, the resulting page fault is handled for us transparently.
copy_from_user() is a kernel function (not a system call) that works with two buffers from different address spaces:
The user-space buffer in user virtual address space.
The kernel-space buffer in kernel virtual address space.
When copy_from_user() is called, data is copied from the user buffer to the kernel buffer.
A part (write operation) of character device driver code where copy_from_user() is used is given below:
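ssize_t cdev_fops_write(struct file *filp, const char __user *ubuf,
                        size_t count, loff_t *f_pos)
{
        unsigned int kbuf;  /* kernel-side buffer (the original used an uninitialized pointer) */

        if (count < sizeof(kbuf))
                return -EINVAL;
        /* copy_from_user() returns the number of bytes it could NOT copy */
        if (copy_from_user(&kbuf, ubuf, sizeof(kbuf)))
                return -EFAULT;
        printk(KERN_INFO "Data: %u\n", kbuf);
        return sizeof(kbuf);  /* number of bytes consumed */
}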