Allocating specific address in Linux

Allocating specific address in Linux - linux

I would like to allocate a memory in Linux in process at a specific address.
Actually I would like to do something like :
I will have number of process.
Each process will call an initialization function in a library (written by me) which will allocate some memory in address space of the process (which will store process related information). This will be done by each process
Once this memory is allocated, latter the program will call other function in the library. Now these function would like to access the memory allocated (containing process related information) by the first function.
The problem is that i cannot store the address of the memory allocated in the process address space in library (not even in static pointer as there are number of process) and i don't even want user program to store that address. I just don't want user program to know that there is memory allocated by library in their address space. Library function will be abstraction for them and they have to just use them.
Is it possible to to over come this problem.
I was thinking like, whenever any process calls initialization function of library which allocates memory , the memory always gets allocated at same address(say 10000) in all the process irrespective of all other things .
So that any library function which wants to access that memory can easily do by :
char *p=10000;
and then access, which will be access into the address space of the process which called the library function.

Not 100% I got what you are aiming for, but if you want to map memory into a specific set address you can use the MAP_FIXED flag to mmap():
"When MAP_FIXED is set in the flags argument, the implementation is informed that the value of pa shall be addr, exactly. If MAP_FIXED is set, mmap() may return MAP_FAILED and set errno to [EINVAL]. If a MAP_FIXED request is successful, the mapping established by mmap() replaces any previous mappings for the process' pages in the range [pa,pa+len)."
See mmap man page: http://linux.die.net/man/3/mmap

Your question doesnt make sense. As you have worded your question, a global variable in your library would work fine.
Maybe you are saying "a single process might load/unload your library and then load the library again and want the address on the second load". Maybe you are saying "there are 2 libraries and each library needs the same address". Simple. Use setenv() and getenv(). These will store/retrieve anything that can be represented as a string in a variable that has PROCESS WIDE SCOPE....i.e all libraries can see the same environment variables. Simply convert your address to a string (itoa), use setenv() to save it in an environment variable named "__SuperSecretGlobalAddress__", and then use getenv() to retrieve the value.
When your program starts up, a copy of the shell's environment is made for your process. getenv and setenv access and modify that copy. You cannot change the shell's environment with these functions.
See this post.

Related

If I private-`mmap` a file and read it, then another process writes to the same file, will another read at the same location return the same value?

(Context: I'm trying to establish which sequences of mmap operations are safe from the "memory safety" point of view, i.e. what assumptions I can make about mmaped memory without risking security bugs as a consequence of undefined behaviour, or miscompiles due to compilers making incorrect assumptions about how memory could behave. I'm currently working on Linux but am hoping to port the program to other operating systems in the future, so although I'm primarily interested in Linux, answers about how other operating systems behave would also be appreciated.)
Suppose I map a portion into file into memory using mmap with MAP_PRIVATE. Now, assuming that the file doesn't change while I have it mapped, if I access part of the returned memory, I'll be given information from the file at that offset; and (because I used MAP_PRIVATE) if I write to the returned memory, my writes will persist in my process's memory but will have no effect on the underlying file.
However, I'm interested in what will happen if the file does change while I have it mapped (because some other process also has the file open and is writing to it). There are several cases that I know the answers to already:
If I map the file with MAP_SHARED, then if any other process writes to the file via a shared mmap, my own process's memory will also be updated. (This is the intended behaviour of MAP_SHARED, as one of its intended purposes is for shared-memory concurrency.) It's less clear what will happen if another process writes to the file via other means, but I'm not interested in that case.
If the following sequence of events occurs:
I map the file with MAP_PRIVATE;
A portion of the file I haven't accessed yet is written by another process;
I read that portion of the file via my mapping;
then, at least on Linux, the read might return either the old value or the new value:
It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.
— man 2 mmap on Linux
(This case – which is not the case I'm asking about – is covered in this existing StackOverflow question.)
I also checked the POSIX definition of mmap, but (unless I missed it) it doesn't seem to cover this case at all, leaving it unclear whether all POSIX systems would act the same way.
Linux's behaviour makes sense here: at the time of the access, the kernel might have already mapped the requested part of the file into memory, in which case it doesn't want to change the portion that's already there, but it might need to load it from disk, in which case it will see any new value that may have been written to the file since it was opened. So there are performance reasons to use the new value in some cases and the old value in other cases.
If the following sequence of events occurs:
I map the file with MAP_PRIVATE;
I write to a memory address within the file mapping;
Another process changes that part of the file;
then although I don't know this for certain, I think it's very likely that the rule is that the memory address in question continues to reflect the old value, that was written by our process. The reason is that the kernel needs to maintain two copies of that part of the file anyway: the values as seen by our process (which, because it used MAP_PRIVATE, can write to its view of the file without changing the underlying file), and the values that are actually in the file on disk. Writes by other processes obviously need to change the second copy here, so it would be bizarre to also change the first copy; doing so would make the interface less usable and also come at a performance cost, and would have no advantages.
There is one sequence of events, though, where I don't know what happens (and for which the behaviour is hard to determine experimentally, given the number of possible factors that might be relevant):
I map the file with MAP_PRIVATE;
I read some portion of the file via the mapping, without writing;
Another process changes part of the file that I just read;
I read the same portion of the file via the mapping, again.
In this situation, am I guaranteed to read the same data twice? Or is it possible to read the old data the first time and the new data the second time?

Is it possible to force a range of virtual addresses?

I have an Ada program that was written for a specific (embedded, multi-processor, 32-bit) architecture. I'm attempting to use this same code in a simulation on 64-bit RHEL as a shared object (since there are multiple versions and I have a requirement to choose a version at runtime).
The problem I'm having is that there are several places in the code where the people who wrote it (not me...) have used Unchecked_Conversions to convert System.Addresses to 32-bit integers. Not only that, but there are multiple routines with hard-coded memory addresses. I can make minor changes to this code, but completely porting it to x86_64 isn't really an option. There are routines that handle interrupts, CPU task scheduling, etc.
This code has run fine in the past when it was statically-linked into a previous version of the simulation (consisting of Fortran/C/C++). Now, however, the main executable starts, then loads a shared object based on some inputs. This shared object then checks some other inputs and loads the appropriate Ada shared object.
Looking through the code, it's apparent that it should work fine if I can keep the logical memory addresses between 0 and 2,147,483,647 (32-bit signed int). Is there a way to either force the shared object loader to leave space in the lower ranges for the Ada code or perhaps make the Ada code "think" that it's addresses are between 0 and 2,147,483,647?

Is there a way to either force the shared object loader to leave space in the lower ranges for the Ada code
The good news is that the loader will leave the lower ranges untouched.
The bad news is that it will not load any shared object there. There is no interface you could use to influence placement of shared objects.
That said, dlopen from memory (which we implemented in our private fork of glibc) would allow you to do that. But that's not available publicly.
Your other possible options are:
if you can fit the entire process into 32-bit address space, then your solution is trivial: just build everything with -m32.
use prelink to relocate the library to desired address. Since that address should almost always be available, the loader is very likely to load the library exactly there.
link the loader with a custom mmap implementation, which detects the library of interest through some kind of side channel, and does mmap syscall with MAP_32BIT set, or
run the program in a ptrace sandbox. Such sandbox can again intercept mmap syscall, and or-in MAP_32BIT when desirable.
or perhaps make the Ada code "think" that it's addresses are between 0 and 2,147,483,647?
I don't see how that's possible. If the library stores an address of a function or a global in a 32-bit memory location, then loads that address and dereferences it ... it's going to get a 32-bit truncated address and a SIGSEGV on dereference.

How a process can refer to objects that are not in its address space (for example, a file or another process)?

I have the homework question:
Explain how a process can refer to objects that are not in its
address space (for example, a file or another process)?
I know that each process is created with an address space that defines access to every memory mapped resource in that process (got that from this book). I think that the second part to this question does not make sense. How can a process reference an object of another process? Isn't the OS suppose to restrict that? maybe I am not understanding the question correctly. Anyways if I understood the question correctly the only way that will be possible is by using the kernel I believe.

If you are asking it in a general sense, then its a no. Operating systems do not allow one process to access another process's virtual address space under the normal circumstances.
However there are ways in which you can create a controlled environment where such a thing can be done using various techniques.
A perfect example is the debugger. It uses process tracing mechanism (like reading from /proc filesystem or using the ptrace() system calls) to gain access to read and write from another address space.
There is also a shared memory concept, where a particular piece of memory is explicitly shared between two processes and can be controlled via a shared memory object.

You can attach as a debugger to the application. Or if using Windows, you can use windows hooks

I have researched and I have the answer to the file part of the question.
first of an address space is the collection of addresses that a
thread can reference. Normally these addresses reference to an
executable in memory. Some operating systems allow a programmer to
read and write the contents of a file using addresses from the process
address space. This is accomplished by opening the file, then binding each byte in the file to an address in the address space.
The second part of the question this is what I will answer:
Most operating systems will not allow reading addresses from another
process. This will imply a huge security risk. I have not heard of any
operating system that enables you to read data from a thread that is
not owned by the current process. So in short I believe this will not
be possible.

How the share library be shared by different processes?

I read some documents that share library comiled with -fPIC argument,
the .text seqment of the .so will be shared at process fork's dynamic linking stage
(eq. the process will map the .so to the same physical address)
i am interested in who (the kernel or ld.so ) and how to accomplish this?
maybe i should trace the code, but i dont know where to start it.
Nevertheless, i try to verify the statement.
I decide to check the function address like printf which is in the libc.so that all c program will link.
I get the printf virtual address of the process and need to get the physical address. Tried to write a kernel module and pass the address value to kernel, then call virt_to_phys. But it did not work cause the virt_to_phys only works for kmalloc address.
So, process page table look-at might be the solution to find the virtual address map to physical address. Were there any ways to do page table look-at? Or othere ways can fit the verify experiment?
thanks in advance!

The dynamic loader uses mmap(2) with MAP_PRIVATE and appropriate permissions. You can see what it does exactly by running a command from strace -e file,mmap. For instance:
strace -e file,mmap ls
All the magic comes from mmap(2). mmap(2) creates mappings in the calling process, they are usually backed either by a file or by swap (anonymous mappings). In a file-backed mapping, MAP_PRIVATE means that writes to the memory don't update the file, and cause that page to be backed by swap from that point on (copy-on-write).
The dynamic loader gets the info it needs from ELF's program headers, which you can view with:
readelf -l libfoo.so
From these, the dynamic loader determines what to map as code, read-only data, data and bss (zero-filled segment with zero size in file, non-zero size in memory, and a name only matched in crypticness by Lisp's car and cdr).
So, in fact, code and also data is shared, until a write causes copy-on-write. That is why marking constant data as constant is a potentially important space optimization (see DSO howto).
You can get more info on the mmap(2) manpage, and in Documentation/nommu-mmap.txt (the MMU case, no-MMU is for embedded devices, like ADSL routers and the Nintendo DS).

Shared libraries just a particular use of mapped files.
The address which a file is mapped at in a process's address space has nothing to do with whether it is shared or not.
Pages can be shared even if they are mapped at different addresses.
To find out if pages are being shared, do the following:
Find the address that the file(s) are mapped at by examining /proc/pid/maps
There is a tool which extracts data from /proc/pid/pagemap - find it and use it. This gives you info as to exactly which page(s) of a mapping are present and what physical location they are at
If two processes have a page mapped in at the same physical address, it is of course, shared.

Can I write-protect every page in the address space of a Linux process?

I'm wondering if there's a way to write-protect every page in a Linux
process' address space (from inside of the process itself, by way of
mprotect()). By "every page", I really mean every page of the
process's address space that might be written to by an ordinary
program running in user mode -- so, the program text, the constants,
the globals, and the heap -- but I would be happy with just constants,
globals, and heap. I don't want to write-protect the stack -- that
seems like a bad idea.
One problem is that I don't know where to start write-protecting
memory. Looking at /proc/pid/maps, which shows the sections of memory
in use for a given pid, they always seem to start with the address
0x08048000, with the program text. (In Linux, as far as I can tell,
the memory of a process is laid out with the program text at the
bottom, then constants above that, then globals, then the heap, then
an empty space of varying size depending on the size of the heap or
stack, and then the stack growing down from the top of memory at
virtual address 0xffffffff.) There's a way to tell where the top of
the heap is (by calling sbrk(0), which simply returns a pointer to the
current "break", i.e., the top of the heap), but not really a way to
tell where the heap begins.
If I try to protect all pages from 0x08048000 up to the break, I
eventually get an mprotect: Cannot allocate memory error. I don't know why mprotect would be
allocating memory anyway -- and Google is not very helpful. Any ideas?
By the way, the reason I want to do this is because I want to create a
list of all pages that are written to during a run of the program, and
the way that I can think of to do this is to write-protect all pages,
let any attempted writes cause a write fault, then implement a write
fault handler that will add the page to the list and then remove the write
protection. I think I know how to implement the handler, if only I could
figure out which pages to protect and how to do it.
Thanks!

You recieve ENOMEM from mprotect() if you try to call it on pages that aren't mapped.
Your best bet is to open /proc/self/maps, and read it a line at a time with fgets() to find all the mappings in your process. For each writeable mapping (indicated in the second field) that isn't the stack (indicated in the last field), call mprotect() with the right base address and length (calculated from the start and end addresses in the first field).
Note that you'll need to have your fault handler already set up at this point, because the act of reading the maps file itself will likely cause writes within your address space.

Start simple. Write-protect a few page and make sure your signal handler works for these pages. Then worry about expanding the scope of the protection. For example, you probably do not need to write-protect the code-section: operating systems can implement write-or-execute protection semantics on memory that will prevent code sections from ever being written to:
http://en.wikipedia.org/wiki/Self-modifying_code#Operating_systems

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string