How many copies of a program/class get loaded into memory when multiple users access it at the same time - Linux

We are trying to setup Eclipse in a shared environment, i.e., it will be installed on a server and each user connects to it using VNC. There are different reasons for sharing Eclipse, one being proper integration with ClearCase.
We identified that Eclipse uses large amounts of memory. We are wondering whether Eclipse (the JVM?) loads each class once per user/session, or whether there is any sort of sharing of objects that are already loaded into memory.
This makes me think about a more basic question in general: how many copies of a program get loaded into memory when two or more users are accessing the host at the same time?
Is it one copy per user, or is a single copy shared between users?
Two questions here:
1) How many copies of a program get loaded into memory when two or more users are using it at the same time?
2) How does the above hold in the world of Java/JVM?

Linux allows for sharing binary code between running processes, i.e. the segments that hold executable parts of a program are mapped into virtual memory space of each running copy. Then each process gets its own data parts (stack, heap, etc.).
The issue with Java, or almost any other interpreted language, is that the runtime, the JVM, treats bytecode as data, loading it into the heap. The fact that Java is half-compiled and half-interpreted is irrelevant here. This results in a situation where the JVM executable itself is eligible for code sharing by the OS, but your application's Java code is not.
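To see this sharing from the outside, here is a minimal C sketch (Linux-specific, for illustration only) that prints the executable mappings of the current process from /proc/self/maps; the read-only executable (r-xp) mapping of the binary itself is the text segment that the kernel shares between all instances of the program:

    /* Print the executable (r-x) mappings of the current process.
       The r-xp mapping of the binary itself is the read-only text
       segment the kernel shares between all running instances. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *maps = fopen("/proc/self/maps", "r");
        char line[512];

        if (!maps) { perror("fopen"); return 1; }
        while (fgets(line, sizeof line, maps)) {
            if (strstr(line, "r-x"))   /* keep only executable mappings */
                fputs(line, stdout);
        }
        fclose(maps);
        return 0;
    }

Running two copies of the same binary and comparing their maps (or the Shared_Clean counters in /proc/<pid>/smaps) shows the same file-backed text pages in both processes.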

In general, a single copy of a program (i.e. its text segment) is loaded into RAM and shared by all instances, so the exact same read-only physical pages are memory-mapped everywhere (though possibly/probably mapped to different addresses in different address spaces, it's still the same memory). Data is usually private to each process, i.e. each program's data lives in separate pages of RAM (though it can be shared).
BUT
The problem is that the actual program here is only the Java runtime interpreter, or the JIT compiler. Eclipse, like all Java programs, is data rather than a program (which, however, is interpreted as a program). That data is either loaded into the private address space and interpreted by the JVM, or turned into an executable by the JIT compiler, resulting in a (temporary) executable binary, which is launched. This means that, in principle, each Java program runs as a separate copy, using separate RAM.
Now, you might of course be lucky, and the JVM might load the data as a shared mapping; in that case the bytecode would occupy the very same RAM in all instances. However, whether that's the case is something only the author of the JVM could tell, and it's not something you can rely on in general.
Also, depending on how clever the JIT is, it might cache that binary for some time and reuse it for identical Java programs, which would be very advantageous, not only because it saves the compilation. All instances launched from the same executable image share the same memory, so this would be just what you want.
It is even likely that this is done -- at least to some extent -- by your JIT compiler, because compiling is rather expensive and this is a common optimization.

Related

How is programming in RTEMS different from Linux?

I am new to programming in RTEMS and was wondering how the two, RTEMS and Linux, differ in terms of programming. I understand RTEMS is a real-time operating system, but if you were to make a hello world app, wouldn't the program be the same?
Note that your question is quite generic. There are a lot of detail differences.
One of the biggest is the format of your binary: most RTEMS binaries are statically linked together, so you have only one big binary containing your system and application. There is some dynamic loading supported, but it's not what most users do.
As already mentioned by n.m. in the comments, RTEMS supports a lot of the POSIX API (at least the embedded subset), so you can use much of the same API as you do on Linux.
A big difference is that RTEMS has a global address space (on most targets), so you don't have separation between tasks. That makes pointer errors a bit harder to debug.
Also a difference: most embedded systems are targeted at long-running applications. In such applications (regardless of whether you are on Linux, RTEMS, or any other system) you should be careful to clean up your stuff (close files, free memory, ...). On Linux (or other desktop-class systems) you have processes, and the kernel cleans up all resources after your process exits. Although you can create threads in RTEMS, no one cleans up after a thread exits.
The POSIX attribute defaults for threads are not specified in the standard and may vary between RTEMS and Linux.
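To make the difference concrete, here is roughly what "hello world" looks like as an RTEMS application, sketched with the classic API (the exact set of CONFIGURE_* options depends on your BSP and RTEMS version). Unlike on Linux, the application itself supplies the initialization task and the static system configuration, and everything is linked into one binary:

    #include <rtems.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* RTEMS calls this initialization task instead of main(). */
    rtems_task Init(rtems_task_argument argument)
    {
      printf("Hello, world!\n");
      exit(0);
    }

    /* Static configuration: drivers, task table, etc. are compiled
       into this one binary together with the application. */
    #define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
    #define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER
    #define CONFIGURE_RTEMS_INIT_TASKS_TABLE
    #define CONFIGURE_MAXIMUM_TASKS 1
    #define CONFIGURE_INIT
    #include <rtems/confdefs.h>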

modular kernel vs micro kernel / monolithic kernel

I am a C programmer and new to Linux kernel programming. I have found that there are 3 types of kernel: monolithic, micro, and modular. While googling, I found some websites saying Linux has a monolithic kernel (on Stack Overflow), others saying a microkernel, and the rest saying a hybrid kernel. So I am totally confused by the modular concept, which says a new driver module can be added without recompiling the kernel; this goes against my assumption that Linux uses a monolithic kernel, since a monolithic kernel runs in a single address space and as a single process. This is also a bit confusing, if so.
Before you try to understand those differences, you have to understand other concepts first:
1. Modular programming.
A module is a functionally complete part of a program. A module usually has the following properties:
Separation of interface and implementation.
Initialization and deinitialization routines. Both are optional. The deinitialization routine is likely to be missing in an environment with GC (a garbage collector).
The modules used by a program compose a directed acyclic graph, a.k.a. a dependency graph (you might have heard about this: cyclic dependencies are not allowed, and a dependency is initialized before the module that depends on it).
Modular programming is essential when building large systems. Every big kernel is a modular kernel, regardless of whether it is monolithic, hybrid or microkernel.
Sometimes modules can be loaded and unloaded dynamically. Dynamic modules are an essential part of any extensible system. Those can be plugins or, if we talk about kernels, drivers that are developed and distributed separately from the kernel.
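As an illustration of the separation of interface and implementation plus init/deinit routines, here is what a small module conventionally looks like in C (the logger_* names are made up for this example):

    /* logger.h -- the module's interface; users see only this. */
    #ifndef LOGGER_H
    #define LOGGER_H

    int  logger_init(const char *path);  /* initialization routine   */
    void logger_log(const char *msg);
    void logger_shutdown(void);          /* deinitialization routine */

    #endif

    /* logger.c -- the implementation; its internals stay hidden. */
    #include <stdio.h>

    static FILE *log_file;               /* private module state */

    int logger_init(const char *path)
    {
        log_file = fopen(path, "a");
        return log_file ? 0 : -1;
    }

    void logger_log(const char *msg)
    {
        if (log_file)
            fprintf(log_file, "%s\n", msg);
    }

    void logger_shutdown(void)
    {
        if (log_file)
            fclose(log_file);
    }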
2. Safe and unsafe languages.
Safe languages very strictly define what can happen in a program. Most importantly, they have no concept of a malformed (or meaningless) program. Every program is valid, and its execution always follows the language specification. Whether the program does what the programmer expects it to do is irrelevant in this context.
Common traits of safe languages:
They use garbage collection.
They have no pointer arithmetic. That means that writing to or reading from an arbitrary address is not allowed.
They prevent out-of-range array access (where such a concept exists). Exceptions or similar mechanisms can be used to signal and recover from such failures.
References (or pointers) have only two possible states: the null reference and a reference to a valid object. This is guaranteed by the garbage collector. In fact, the GC is the key component here. Some languages go even further and do not allow null references at all.
Every object (memory chunk in use) has type information assigned to it, and an object can only be accessed through a reference of the appropriate type, e.g. you cannot access an array of integers through a reference to a string.
You can add more entries to this list, but the basic idea is to guarantee that a program can only access valid memory regions using valid operations. Keep in mind that some unsafe languages can share some or even all of these traits.
Examples of safe languages: Python, Java, safe subset of C#.
Unsafe languages define what can and cannot be done in a program, but there is usually little to nothing to stop the programmer from doing the wrong thing. A program that violates those rules is called a malformed program. From the language's point of view, such a program is meaningless, and the language does not even try to define its behavior, as doing so is usually close to impossible. In terms of C, such a program's behavior is undefined.
Examples of unsafe languages: Assembler, C, C++, Pascal.
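A tiny C example of what "malformed" means in practice: nothing stops the out-of-range write below, and once it executes the language assigns no meaning to the program at all; it may crash, corrupt neighboring data, or appear to work:

    #include <stdio.h>

    int main(void) {
        int a[4] = {0, 1, 2, 3};
        a[10] = 42;             /* out-of-range write: undefined behavior */
        printf("%d\n", a[10]);  /* may print 42, crash, or anything else */
        return 0;
    }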
3. Hardware is unsafe and thus has to be programmed using unsafe language.
Most hardware does nothing to provide you with a safe environment. There were some processors that attached type information to every memory cell (see tagged architectures), but modern ones do not do this, as it complicates the hardware, making it slower, more expensive, and less generic.
Still, some features are provided to make it possible to implement safe environments within the unsafe environment of the hardware, such as memory protection, separate address spaces, and the separation of execution into user mode and kernel mode (a.k.a. supervisor mode).
The kernel is what runs on bare metal, and thus much of it has to be written in unsafe languages like C and Assembly. Another reason is performance: safe environments imply huge overhead.
Microkernel and Monolithic kernel
A monolithic kernel and its modules run in a single shared address space. And since everything is usually written in an unsafe language, it is possible for any part of the kernel to access (and damage) memory that belongs to another part of the kernel due to bugs in the code. The unsafe nature of this environment makes it impossible to detect or recover from those failures and, most importantly, to predict kernel behavior after such failures.
A microkernel is an attempt to overcome those limitations by moving various parts of the kernel into separate address spaces, effectively isolating them from each other while providing a safe way for them to communicate (usually through message passing). Such separation creates a safe environment composed of multiple unsafe processes, allowing the kernel to recover from the failure of some of its subsystems.
At the same time, a monolithic kernel may be able to run parts of itself in a separate address space (e.g. FUSE), while nothing stops a microkernel from supporting modules that share an address space with the main part of the kernel.
If most of the kernel runs in a single address space, it is considered a monolithic kernel. If most of it runs in separate address spaces, it is considered a microkernel. If the kernel is somewhere in between and actively uses both approaches, you have a hybrid kernel.
Hybrid kernel
The concept of a hybrid kernel implies combining the best of both worlds and was invented by Microsoft to boost sales of Windows NT in the 90s. Joke! But this is almost true: every important part of Windows NT runs in a shared address space, so it is another example of a monolithic design.
Many people seem to use this term when describing a monolithic kernel that is able to load modules dynamically. This is because in the past monolithic kernels didn't support dynamic module loading and had to be recompiled every time a module was added. Microkernels are not about dynamic module loading, but about the reliability of the kernel, about its ability to recover from failures of its subsystems.
The answer: Linux is a monolithic kernel.
A monolithic kernel can be modular and can load modules dynamically. A microkernel, on the other hand, has to be modular and has to be able to load modules dynamically: the whole idea is about running them in separate address spaces.
A microkernel is not the only way of overcoming the unsafe nature of a monolithic kernel. Another way is to write a monolithic kernel in a safe language. One problem with such an approach is that the safe environment must either be provided by the hardware (and will be very limited) or be implemented in software using unsafe languages. The implementation of such an environment will be extremely complex and will most likely have many bugs (think of all the bugs found in the JVM).
An example of this would be the experimental OS Singularity.
Well, considering that I may have a quiz on this tomorrow, I should be able to help you out. However, I am still learning, and while my post may have some technical mistakes, it should be conceptually sound.
Basically, as you may understand, there are different types of kernels for an OS.
Monolithic kernels have all their system functionality and services together in one single giant program, occupying a single address space.
Microkernels, on the other hand, have only the bare minimum of system programs and services in the microkernel itself. Most of the services that were previously considered part of the kernel (in the monolithic version), such as the process scheduler, are now in user space and are termed servers. These servers communicate with each other through the microkernel, using inter-process communication, a form of communication laid down by the microkernel.
The modular approach builds on this by making these "servers" dynamically loadable. Thus, one can have a particular "server" (in this type of kernel, called a module) dynamically loaded, without requiring the kernel to be recompiled.
The Linux kernel is a monolithic kernel, but most flavours of Linux, such as Ubuntu, use a mix of the monolithic and modular kernel approaches. This is quite common, as different kernel structures have different pros and cons, and a mixed structure is needed to strike a balance.
See this prior StackOverflow question for some information about your question. Briefly, it sounds like you're wondering...
... reading about the modular concept, which says a new driver module can be added without recompiling the kernel, which is against my assumption that Linux uses a monolithic kernel. A monolithic kernel runs in a single address space and as a single process ...
These two concepts ("modular kernel" and "single address space") are not actually contradictory. You can build a new kernel module without recompiling the entire Linux kernel. When you load this new kernel module, it will actually be loaded into the same address space as the running kernel itself. From the link above...
Do not confuse the term modular kernel to be anything but monolithic. Some monolithic kernels can be compiled to be modular (e.g. Linux); what matters is that the module is inserted into and runs from the same space that handles core functionality (kernel space).
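For illustration, this is the shape of a minimal loadable Linux kernel module: it is built separately from the kernel tree, yet once loaded with insmod it runs in kernel space, in the same address space as the rest of the monolithic kernel:

    #include <linux/init.h>
    #include <linux/module.h>

    static int __init hello_init(void)
    {
        pr_info("hello: loaded into the kernel's address space\n");
        return 0;
    }

    static void __exit hello_exit(void)
    {
        pr_info("hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");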
As you have found, there are several ways to classify kernels and the different types are not necessarily mutually exclusive.

Linux memory/disk behavior during ELF execution

I have a series of executables running on a cluster that read in some small config files at execution start, and then do a lot of processing for several hours, and then write out some data and exit. Our sysadmin is trying to tell me that our fileserver is really slow because his analysis is showing that the cluster nodes are spending all their time using NFS disk I/O reading in ELF executables for execution, long after they've been spawned (note: our executables are only a few MB in size). This doesn't sound right to me, as I was under the impression that the dynamic linker loaded the entire executable into memory at runtime and then operated out of memory. I know that the kernel leaves an open file descriptor to the executable while it's running, but I didn't think it was continually reading from it.
My question is, is my understanding of how executables are loaded flawed? I find it hard to believe that the kernel is constantly doing file reads on the executable to fetch instructions, as this would be terribly slow (even with caching) because branch predictions are hardly reliable, so you'd be spending forever reading the executable from disk if your binary performed frequent jumps.
I was under the impression that the dynamic linker loaded the entire executable into memory at runtime and then operated out of memory.
Your impression is incorrect.
First, a minor inaccuracy: while the dynamic linker is responsible for loading shared libraries, the main executable itself is loaded by the kernel before the dynamic loader is started.
Second, most current systems use demand paging. The files are mmaped, but the code isn't actually loaded into memory until it is accessed (i.e. the process tries to execute it). If you never execute some parts of the program, those parts are never loaded into memory at all.
I find it hard to believe that the kernel is constantly doing file reads on the executable to fetch instructions
It doesn't constantly do that. It typically loads the code into memory and the code stays there.
It is possible for the code to be discarded from memory (which would require reloading it again if it executes again) on a system that doesn't have enough memory (this is called thrashing).
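Demand paging can be observed directly. The following C sketch (Linux; /bin/ls stands in for any ELF file) maps one page of a file and uses mincore(2) to check residency before and after touching it. Caveat: if the file is already in the page cache, the "before" check may already report the page resident:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        long pagesz = sysconf(_SC_PAGESIZE);
        int fd = open("/bin/ls", O_RDONLY);
        unsigned char vec;
        char *p;

        if (fd < 0) { perror("open"); return 1; }
        p = mmap(NULL, pagesz, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        mincore(p, pagesz, &vec);
        printf("before access: resident=%d\n", vec & 1);

        volatile char c = p[0];   /* touching the page faults it in */
        (void)c;

        mincore(p, pagesz, &vec);
        printf("after access:  resident=%d\n", vec & 1);
        return 0;
    }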
because branch predictions are hardly reliable,
Branch prediction
has approximately nothing to do with your problem, and
is exceedingly good on modern CPUs.

Tool for analyzing a portable executable loaded into memory

There are a lot of tools designed to help analyze portable executable files, for example PE Explorer. We can load an .exe file into it and check things like the number of sections, the section alignment, or the virtual addresses of particular sections.
Is there any similar tool which allows me to do the same, but for a portable executable already loaded into memory, without access to its .exe file?
EDIT:
Maybe I will try to clarify what I'm trying to achieve. Let's say (as @0x90 suggested) that I have two, or maybe even three, applications.
app1.exe - executed by the user; creates a new process based on app3.exe and modifies its memory by putting app2.exe into it.
app2.exe - injected into app3.exe's memory by app1.exe.
app3.exe - used to create the new process.
I have the sources of all three applications. I'm simply trying to learn about Windows internals through practical exercises with injection/process hollowing. I have some bug in app1.exe which leads to:
The application was unable to start correctly (0xc0000018).
And I'm trying to find a way to debug this situation. My idea was to compare the PE on disk with the PE in memory and check whether it looks correct. I was surprised that I couldn't find a tool designed for such a purpose.
For clarity, the application you wish to examine shall be called App1 while the application which loads the PE files contents into memory shall be called App2.
To my knowledge, all major disassemblers work on files. This is a problem here because App1 is mapped into the address space of App2 rather than sitting in a file of its own.
Your easiest solution is to dump the executable to disk from memory.
If you control the source of App2, this step is trivial.
If you don't, you will need to attach a debugger, identify the exact memory range where the PE file resides, and use the debugger's functionality to dump that memory range to disk.
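If you do control the source, the dump can be as simple as the following Windows C sketch; the pid, base address, and size are placeholders you would obtain from your own injector or a debugger, not known values:

    #include <stdio.h>
    #include <stdlib.h>
    #include <windows.h>

    int main(void) {
        DWORD   pid  = 1234;                 /* target pid (placeholder)       */
        LPCVOID base = (LPCVOID)0x00400000;  /* image base found in a debugger */
        SIZE_T  size = 0x10000;              /* size of the mapped image       */

        HANDLE h = OpenProcess(PROCESS_VM_READ, FALSE, pid);
        if (!h) return 1;

        unsigned char *buf = malloc(size);
        SIZE_T got = 0;
        if (ReadProcessMemory(h, base, buf, size, &got)) {
            FILE *f = fopen("dump.bin", "wb");
            fwrite(buf, 1, got, f);   /* raw dump; sections keep their
                                         in-memory (virtual) layout */
            fclose(f);
        }
        CloseHandle(h);
        free(buf);
        return 0;
    }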

Is it possible to force a range of virtual addresses?

I have an Ada program that was written for a specific (embedded, multi-processor, 32-bit) architecture. I'm attempting to use this same code in a simulation on 64-bit RHEL as a shared object (since there are multiple versions and I have a requirement to choose a version at runtime).
The problem I'm having is that there are several places in the code where the people who wrote it (not me...) have used Unchecked_Conversions to convert System.Addresses to 32-bit integers. Not only that, but there are multiple routines with hard-coded memory addresses. I can make minor changes to this code, but completely porting it to x86_64 isn't really an option. There are routines that handle interrupts, CPU task scheduling, etc.
This code has run fine in the past when it was statically-linked into a previous version of the simulation (consisting of Fortran/C/C++). Now, however, the main executable starts, then loads a shared object based on some inputs. This shared object then checks some other inputs and loads the appropriate Ada shared object.
Looking through the code, it's apparent that it should work fine if I can keep the logical memory addresses between 0 and 2,147,483,647 (32-bit signed int). Is there a way to either force the shared object loader to leave space in the lower ranges for the Ada code, or perhaps make the Ada code "think" that its addresses are between 0 and 2,147,483,647?
Is there a way to either force the shared object loader to leave space in the lower ranges for the Ada code
The good news is that the loader will leave the lower ranges untouched.
The bad news is that it will not load any shared object there. There is no interface you could use to influence placement of shared objects.
That said, dlopen from memory (which we implemented in our private fork of glibc) would allow you to do that. But that's not available publicly.
Your other possible options are:
if you can fit the entire process into a 32-bit address space, then your solution is trivial: just build everything with -m32.
use prelink to relocate the library to desired address. Since that address should almost always be available, the loader is very likely to load the library exactly there.
link the loader with a custom mmap implementation, which detects the library of interest through some kind of side channel and does the mmap syscall with MAP_32BIT set (see the sketch after this list), or
run the program in a ptrace sandbox. Such a sandbox can again intercept the mmap syscall and OR in MAP_32BIT when desirable.
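As a sketch of what the interposed or rewritten mmap call would do: MAP_32BIT (x86-64 Linux, requires _GNU_SOURCE) asks the kernel to place the mapping in the first 2 GiB of the address space, which is exactly the range the Ada code can represent:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        printf("mapped at %p\n", p);   /* an address below 0x80000000 */
        munmap(p, 4096);
        return 0;
    }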
or perhaps make the Ada code "think" that it's addresses are between 0 and 2,147,483,647?
I don't see how that's possible. If the library stores the address of a function or a global in a 32-bit memory location, then loads that address and dereferences it ... it's going to get a 32-bit truncated address and a SIGSEGV on dereference.
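To illustrate why, here is the round trip the Unchecked_Conversions effectively perform, sketched in C for a 64-bit Linux process; any pointer that doesn't happen to fit in 32 bits comes back truncated:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int x = 42;
        int *p = &x;  /* on 64-bit Linux, typically well above 4 GiB */
        uint32_t as32 = (uint32_t)(uintptr_t)p;   /* top bits lost    */
        int *q = (int *)(uintptr_t)as32;          /* "reconstructed"  */

        printf("original:      %p\n", (void *)p);
        printf("round-tripped: %p\n", (void *)q);
        /* If p != q, dereferencing q would likely raise SIGSEGV. */
        return 0;
    }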
