Linux Shared Memory - linux

The function which creates shared memory in *inux programming takes a key as one of its parameters..
What is the meaning of this key? And How can I use it?
Edit:
Not shared memory id

It's just a System V IPC (inter-process communications) key so different processes can create or attach to the same block of shared memory. The key is typically created with ftok() which turns a fully-specified filename and project ID into a usable key.
Since an application can generally use the same filename in all its different processes (the filename is often a configuration file associated with your application), each different process gets the same key (or, more likely if your using the project ID to specify multiple shared memory segments, the same set of keys).
For example, we once had an application that used a configuration file processed by our lex/yacc code so we just used that pathname and one project ID for each different shared memory block (there were three or four depending on the purpose of the process in question). This actually made a lot of sense since it was the parsed and evaluated data from that configuration file that was stored in the shared memory blocks.
Since no other application on the system should be using our configuration file for making a key, there was no conflict. The key itself is not limited to shared memory, it could be used for semaphores and other IPC mechanisms as well.

The posix shared memory functions (shm_open and friends) have a more user-friendly interface in that they can accept a unique filename which must be used by the applications to open the same shared memory block.
Having said that, it is generally also feasible to open a file in /dev/shm under Linux and then mmap it with MAP_SHARED which achieves much the same.

Related

Is it possible to share memory using the SysV shmat() interface in one application and the Posix shm_open() interace in another?

Ignoring some details there are two low-level SHM APIs available for in Linux.
We have the older (e.g System V IPC vs POSIX IPC) SysV interface using:
ftok
shmctl
shmget
shmat
shmdt
and the newer Posix interface (though Posix seems to standardize the SysV one as well):
shm_open
shm_unlink
It is possible and safe to share memory such that one program uses shm_open() while the other uses shmget() ?
I think the answer is no, though someone wiser may know better.
shm_open(path,...) maps one file to a shared memory segment whereas ftok(path,id,...) maps a named placeholder file to one or more segments.
See this related question - Relationship between shared memory and files
So on the one hand you have a one to one mapping between filenames and segments and on the other a one to many - as in the linked question.
Also the path used by shmget() is just a placeholder. For shm_open() the map might be the actual file (though this is implementation defined).
I'm not sure there is anyway to make shm_open() and shmat() refer to the same memory location.
Even if you could mix them somehow it would probably be undefined behaviour.
If you look the glibc implementation of shm_open it is simply a wrapper to opening a file.
The implementation of shmget and shmat are internal system calls.
It may be that they share an implementation further down in the Linux kernal but this is not a detail that should be exposed or relied upon.

How does chroot affect dynamic libraries memory use?

Although there is another question with similar topic, it does not cover the memory use by the shared libraries in chrooted jails.
Let's say we have a few similar chroots. To be more specific, exactly the same sets of binary files and shared libraries which are actually hard links to the master copies to conserve the disk space (to prevent the potential possibility of a files alteration the file system is mounted read only).
How is the memory use affected in such a setup?
As described in the chroot system call:
This call changes an ingredient in the pathname resolution process and does nothing else.
So, the shared library will be loaded in the same way as if it were outside the chroot jail (share read only pages, duplicate data, etc.)
http://man7.org/linux/man-pages/man2/chroot.2.html
Because hardlinks share the same underlying inode, the kernel treats them as the same item when it comes to caching/mapping.
You'll see filesystem cache savings by using hardlinks, as well as disk-space savings.
The biggest issue I'd have with this is that if someone manages so subvert the read-only nature of one of the chroot environments, then they could subvert all of them by making modifications to any of the hardlinked files.
When I set this up, I copied the shared libraries per chroot instead of linking to a read-only mount. With separate files, the text segments were not shared. It's likely that the same inode will map to the same read-only text segment, but this may vary with available memory management hardware and similar architectural details.
Try this experiment on your system: write a small program that makes some minimal use of a large shared library. Run twenty or thirty chroot jails as you describe, each with a running copy of the program. Check overall memory usage before & during running, and dissect one instance to get a good text/data segment breakdown. If memory use increases by the full size of the map for each instance, the segments are not shared. Conversely, if memory use goes up by a fraction of the map, the segments are shared.

How many copies of program/class gets loaded into memory when multiple users accessing it at the same time

We are trying to setup Eclipse in a shared environment, i.e., it will be installed on a server and each user connects to it using VNC. There are different reasons for sharing Eclipse, one being proper integration with ClearCase.
We identified that Eclipse is using large amounts of memory. We are wondering whether the Eclipse(JVM?) loads each class once per user/session or whether there is any sort of sharing of objects that are already loaded into memory?
This makes me think about a basic question in general. How many copies of a program gets loaded into memory when two or more users are accessing the host at the same time.
Is it one per user or a single copy is shared between users?
Two questions here:
1) How many copies of a program gets loaded into memory when two or
more users are using it at the same time?
2) How does the above holds in the world of Java/JVM?
Linux allows for sharing binary code between running processes, i.e. the segments that hold executable parts of a program are mapped into virtual memory space of each running copy. Then each process gets its own data parts (stack, heap, etc.).
The issue with Java, or almost any other interpreted language, is that run-time, the JVM, treats byte-code as data, loading it into heap. The fact that Java is half-compiled and half interpreted is irrelevant here. This results in a situation where the JVM executable itself is eligible for code sharing by the OS, but your application Java code is not.
In general, a single copy of a program (i.e. text segment) is loaded into RAM and shared by all instances, so the exact same read-only memory mapped physical pages (though possibly/probably mapped to different addresses in different address spaces, but it's still the same memory). Data is usually private to each process, i.e. each program's data lives in separate pages RAM (though it can be shared).
BUT
The problem is that the actual program here is only the Java runtime interpreter, or the JIT compiler. Eclipse, like all Java programs, is rather data than a program (which however is interpreted as a program). That data is either loaded into the private address space and interpreted by the JVM or turned into an executable by the JIT compiler, resulting in a (temporary) executable binary, which is launched. This means, in principle, each Java program runs as a separate copy, using separate RAM.
Now, you might of course be lucky, and the JVM might load the data as a shared mapping, in this case the bytecode would occupy the same identical RAM in all instances. However, whether that's the case is something only the author of the JVM could tell, and it's not something you can rely on in general.
Also, depending on how clever the JIT is, it might cache that binary for some time and reuse it for identical Java programs, which would be very advantageous, not only because it saves the compilation. All instances launched from the same executable image share the same memory, so this would be just what you want.
It is even likely that this is done -- at least to some extent -- on your JIT compiler, because compiling is rather expensive and it's a common optimization.

How a process can refer to objects that are not in its address space (for example, a file or another process)?

I have the homework question:
Explain how a process can refer to objects that are not in its
address space (for example, a file or another process)?
I know that each process is created with an address space that defines access to every memory mapped resource in that process (got that from this book). I think that the second part to this question does not make sense. How can a process reference an object of another process? Isn't the OS suppose to restrict that? maybe I am not understanding the question correctly. Anyways if I understood the question correctly the only way that will be possible is by using the kernel I believe.
If you are asking it in a general sense, then its a no. Operating systems do not allow one process to access another process's virtual address space under the normal circumstances.
However there are ways in which you can create a controlled environment where such a thing can be done using various techniques.
A perfect example is the debugger. It uses process tracing mechanism (like reading from /proc filesystem or using the ptrace() system calls) to gain access to read and write from another address space.
There is also a shared memory concept, where a particular piece of memory is explicitly shared between two processes and can be controlled via a shared memory object.
You can attach as a debugger to the application. Or if using Windows, you can use windows hooks
I have researched and I have the answer to the file part of the question.
first of an address space is the collection of addresses that a
thread can reference. Normally these addresses reference to an
executable in memory. Some operating systems allow a programmer to
read and write the contents of a file using addresses from the process
address space. This is accomplished by opening the file, then binding each byte in the file to an address in the address space.
The second part of the question this is what I will answer:
Most operating systems will not allow reading addresses from another
process. This will imply a huge security risk. I have not heard of any
operating system that enables you to read data from a thread that is
not owned by the current process. So in short I believe this will not
be possible.

Shared memory: location and locking strategies

I have a writer that creates a shared memory region, I'd like to ensure that readers fail to shm_open() the region until the writer is ready. My hacky way of doing this is writer will shm_open in read-only mode. Once the region is correctly constructed I chmod() the file. This is yucky, and I cannot fcntl() the file descriptor to change the permissions. Any suggestions (short of doing some awful sync in the region?)
Why is chmod() yucky? Partly because there is no glibc code (exposed that is) to tell me where the shared memory region lives (eg /dev/shm). There is some code in glibc to look through the mounts, I'd prefer not to copy it but might have no choice if noone can give me a better solution than the chmod().
Instead of using shm_open, you can certainly use mmap - this allows you to use a file in a directory of your choice (maybe it is an optimisation to place this on a ramdisc).
But to solve the locking problem, maybe you should use a mutex in the shared region, or (at a push) just flock() the file.
If you are trying to make it behave as a queue however, maybe you should just use a more queue-type IPC object instead.

Resources