If we have a string "A" and a number 65, since they look identical in memory, how does the OS know which is the string and which is the number?
Another question - assume that a program allocates some memory (say, one byte). How does the OS remember where that memory has been allocated?
Neither of these details is handled by the operating system; they're handled by user programs.
For your first question, internally in memory there is absolutely no distinction between the character 'A' and the numeric value 65 (assuming, of course, that you're just looking at one byte of data). The difference arises when you see how those bits are interpreted by the program. For example, if the user program tries to print the string to the screen, it will probably make some system call to the OS to ask the OS to print the character. In that case, the code in the OS consists of a series of assembly instructions to replicate those bits somewhere in the display device. The display is then tasked with rendering a set of appropriate pixels to draw the character 'A.' In other words, at no point did the program ever "know" that the value was an 'A.' Instead, the hardware simply pushed around bits which controlled another piece of code that ultimately was tasked with turning those bits into pixels.
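To make that concrete, here is a minimal C sketch (the variable name is just for illustration): the same byte prints as 'A' or as 65 depending purely on how the program asks for it to be interpreted.

#include <stdio.h>

int main(void)
{
    unsigned char b = 65;   /* one byte in memory: 0100 0001 */
    printf("%c\n", b);      /* interpreted as a character: prints A  */
    printf("%d\n", b);      /* interpreted as a number:    prints 65 */
    return 0;
}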
For your second question, that really depends on the memory manager. There are many ways for a program to allocate memory and know where it's stored. I'm not fully sure I understand what you're asking, but I believe that this answer should be sufficient:
At the OS level, the OS kernel doesn't even know that the byte was allocated. Instead, the OS just allocates giant blocks of memory for the user program to use as it runs. When the program terminates, all that memory is reclaimed.
At the program level, most programs contain a memory manager, a piece of code tasked with allocating and divvying up that large chunk of memory into smaller pieces that can then be used by the program. It usually keeps track of allocated memory as a list of "chunks," with the chunks linked together in a doubly-linked list. Each chunk is usually annotated with information indicating whether it's in use and how large it is, which allows the memory manager to reclaim the memory once it's freed.
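As a rough illustration (a hypothetical, simplified layout, not what any particular allocator really does), each chunk's header might look something like this in C:

#include <stddef.h>

/* Hypothetical per-chunk bookkeeping a simple heap manager might keep.
   Real allocators use more compact and more elaborate layouts. */
struct chunk {
    size_t        size;    /* size of this chunk, including the header     */
    int           in_use;  /* nonzero while the chunk is handed out        */
    struct chunk *prev;    /* links to the neighbouring chunks in the list */
    struct chunk *next;
};

When free() is called, the manager can walk back to the header, mark the chunk unused, and possibly merge it with free neighbours.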
At the user code level, when you ask for memory, you typically store it in a pointer to keep track of where the memory is. This is just a series of bytes in memory storing the address, which the OS and memory manager never look at unless instructed to.
Hope this helps!
No. 2 - the system keeps a record of all allocations (of a given process) and can thus remove them, e.g. when the process terminates. I suggest you read a book on operating system principles (e.g. Tanenbaum's "Modern Operating Systems").
The character 'A' and the integer 65 are stored the same way in memory (at least on 32-bit systems). The string "A", however, is stored differently, and how depends on the system or the programming language. Take C, for example, which stores strings essentially as an array of the characters followed by the null character.
Operating systems use memory managers to keep track of which processes are using which parts of memory.
For a computer, a string is a number. The simplest example is an ASCII table, where every letter has a number attached to it. So if you're familiar with C, you could write printf("%c", 65) and actually get an 'A' instead of a number. Hope that made sense.
The OS doesn't remember the location of the memory the program has allocated. That's what pointers are for!
The 'OS' applies an algorithm that will look something like: "if every character in the string is a digit, then the string is a number", which gets more complicated for decimals, +/-, etc.!
http://en.wikipedia.org/wiki/Dynamic_memory_allocation
Related
Disclaimer: I am not a very experienced guy, and many questions might seem stupid or badly phrased.
I have heard about stacks and heaps and read a bit about them, but there are still a few things I don't quite understand:
How does a program find empty memory to store new variables/objects in physical memory?
How does a program know where an object starts and where an object ends in memory? With number variables I can imagine there is some extra information in memory that shows the program how many bits the variable occupies, but correct me if I'm wrong.
This is similar to my first question, but: when a variable has a value represented only by zeros, how does the program not confuse that with free memory?
Does the object value null mean that the address of an object is a bunch of 0's, or does the object point to literally nothing? And if so, how is the "reference" stored so an address can be assigned to it later on?
How does a program find empty memory to store new variables/objects in physical memory?
Modern operating systems use logical address translation. A process sees a range of logical addresses—its address space. The system hardware breaks the address range into pages. The size of the page is system dependent and is often configurable. The operating system manages page tables that map logical pages to physical page frames of the same size.
The address space is divided into a range of pages that is the system space, shared by all processes, and a user space, which is generally unique to each process.
Within the user and system spaces, pages may be valid or invalid. An invalid page has not yet been mapped to the process address space. Most pages are likely to be invalid.
Memory is always allocated from the operating system in pages. The operating system will have system services that transform invalid pages into valid pages with mappings to physical memory. In order to map pages, the operating system needs to find (or the application needs to specify) a range of pages that are invalid, and then has to allocate physical page frames to map to those pages. Note that physical page frames do not have to be mapped contiguously to logical pages.
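On a POSIX system, for example, one of those system services is exposed to user programs as mmap(). A minimal sketch (Linux flags, error handling reduced to a single check):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Ask the kernel to make a range of pages valid and back it with
       anonymous physical page frames (16 pages of 4 KiB assumed here). */
    size_t len = 16 * 4096;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* ... the pages are now usable memory ... */
    munmap(p, len);
    return 0;
}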
You mention stacks and heaps. Stacks and heaps are just memory. The operating system cannot tell whether memory is a stack, a heap or something else. User-mode libraries for memory allocation (such as those that implement malloc/free) allocate memory in pages to create heaps. The only thing that makes this memory a heap is that there is a heap manager controlling it. The heap manager can then allocate smaller blocks of memory from the pages allocated to the heap.
A stack is simpler. It is just a contiguous range of pages. Typically an operating system service that creates a thread or process will allocate a range of pages for a stack and assign the hardware stack pointer register to the high end of the stack range.
How does a program know where an object starts and where an object ends in memory? With number variables I can imagine there is some extra information in memory that shows the program how many bits the variable occupies, but correct me if I'm wrong.
This depends upon how the program is created and how the object is created in memory. For typed languages, the linker binds variables to addresses and also generates instructions for mapping those addresses into the address space. For stack/auto variables, the compiler generates offsets from a pointer into the stack. When a function/subroutine gets called, the compiler generates code to allocate the memory required by the procedure, which it does by simply subtracting from the stack pointer; the memory gets freed by simply adding that value back to the stack pointer.
In the case of typeless languages, such as assembly language or Bliss, the programmer has to keep track of the type of each location. When memory is allocated dynamically, the programmer also has to keep track of the type. Most programming languages help with this by having typed pointers.
This is similar to my first question, but: when a variable has a value represented only by zeros, how does the program not confuse that with free memory?
Free memory is invalid. Accessing free memory causes a hardware exception.
Does the object value null mean that the address of an object is a bunch of 0's, or does the object point to literally nothing? And if so, how is the "reference" stored so an address can be assigned to it later on?
The linker defines the initial state of a program's user address space. Most linkers do not map the first page (or even more than one page). That page is then invalid, which means a null pointer, as you say, references absolutely nothing. If you try to dereference a null pointer, you will usually get some kind of access violation exception.
Most operating systems will allow the user to map the first page, and some linkers will allow the user to override the default setting and map it. This is not commonly done, as it makes detecting memory errors difficult.
How does a program find empty memory to store new variables/objects in physical memory?
Physical memory is managed by the OS, which knows which parts of memory are used by processes and which parts are free. When it needs memory, a program asks the operating system for some. If this memory is for the heap, extra work is needed: the operating system delivers memory in fixed-size blocks called pages (typically 4 KB), so if the user mallocs a few bytes, the allocator has to know, to make good use of memory, which parts of a page are used or available, and it has to keep tracking page contents across successive malloc and free calls. There are specific data structures to describe used space, and algorithms to find free space while avoiding fragmentation.
How does a program know where an object starts and where an object ends in memory? With number variables I can imagine there is some extra information in memory that shows the program how many bits the variable occupies, but correct me if I'm wrong.
The program knows the address (i.e. the start) of every variable. For global or static variables it is fixed by the linker when it places the variables in memory. For local variables, the processor can compute it from the stack position. For dynamically allocated variables, it is stored in another variable (a pointer) when the memory is allocated. As for the end, it depends on the type of the variable. For known types (like int) or compositions of known types (like structs) it can be computed at compile time. In other situations, the program has no way to know the entity's size. For instance, a declaration like int *a may describe an array, but the program has no way to know the array's size. The programmer must keep track of this information, for instance by writing the number of elements in the array into another variable.
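For instance, here is a small C sketch of the usual workaround, keeping the count next to the pointer (the struct and function names are just for the example):

#include <stdlib.h>

/* The pointer only records where the array starts; the element count
   must be carried separately by the programmer. */
struct int_array {
    int    *data;
    size_t  len;
};

struct int_array make_array(size_t n)
{
    struct int_array a;
    a.data = malloc(n * sizeof *a.data);
    a.len  = (a.data != NULL) ? n : 0;
    return a;
}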
This is similar to my first question, but: when a variable has a value represented only by zeros, how does the program not confuse that with free memory?
The program never looks at the memory itself to know whether it is free or not; that is managed by other means (see question 1).
Does the object value null mean that the address of an object is a bunch of 0's, or does the object point to literally nothing? And if so, how is the "reference" stored so an address can be assigned to it later on?
An address is never a bunch of zeros, except for address 0 itself; it is the content that is set to zero. Actually, it is not possible to read or write address 0; doing so generates a "bus error" exception (and maybe you have already encountered one). Pointing to address zero is exactly like pointing to literally nothing, and it generates an error if the program tries to follow the pointer. These variables hold addresses of other variables (pointers), so the address of the pointer itself is well defined. What may not be defined is what it points to. That can be changed by assigning something to the pointer (for instance what malloc returned, or the address of another variable).
We need to store a large (1 GB) block of contiguous bytes in memory for long periods of time (weeks to months), and are trying to choose a Vector/Array library. I have two concerns that I can't find the answer to.
Vector.Unboxed seems to store the underlying bytes on the heap, which can be moved around at will by the GC. Periodically moving 1 GB of data is something I would like to avoid.
Vector.Storable solves this problem by storing the underlying bytes in the C heap. But everything I've read seems to indicate that this is really only meant to be used for communicating with other languages (primarily C). Is there some reason that I should avoid using Vector.Storable for internal Haskell usage?
I'm open to a third option if it makes sense!
My first thought was the mmap package, which allows you to "memory-map" a file into memory, using the virtual memory system to manage paging. I don't know if this is appropriate for your use case (in particular, I don't know if you're loading or computing this 1GB of data), but it may be worth looking at.
In particular, I think this prevents the GC moving the data around (since it's not on the Haskell heap, it's managed by the OS virtual memory subsystem). On the other hand, this interface handles only raw bytes; you couldn't have, say, an array of Customer objects or something.
Are there any modern, common CPUs where it is unsafe to write to adjacent elements of an array concurrently from different threads? I'm especially interested in x86. You may assume that the compiler doesn't do anything obviously ridiculous to increase memory granularity, even if it's technically within the standard.
I'm interested in the case of writing arbitrarily large structs, not just native types.
Note:
Please don't mention the performance issues with regard to false sharing. I'm well aware of these, but they're of no practical importance for my use cases. I'm also aware of visibility issues with regard to data written from threads other than the reader. This is addressed in my code.
Clarification: This issue came up because on some processors (for example, old DEC Alphas) memory could only be addressed at word level. Therefore, writing to memory in non-word size increments (for example, single bytes) actually involved read-modify-write of the byte to be written plus some adjacent bytes under the hood. To visualize this, think about what's involved in writing to a single bit. You read the byte or word in, perform a bitwise operation on the whole thing, then write the whole thing back. Therefore, you can't safely write to adjacent bits concurrently from different threads.
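Expressed in C, setting a single bit necessarily looks like the read-modify-write below; on a machine that can only address whole words, a plain byte store compiles down to the same pattern at word granularity (this is a sketch of the mechanism, not code you would write by hand):

#include <stdint.h>

void set_bit(uint8_t *byte, int bit)
{
    uint8_t old = *byte;            /* read the whole byte               */
    old |= (uint8_t)(1u << bit);    /* modify only the bit we care about */
    *byte = old;                    /* write the whole byte back         */
    /* If another thread changes a *different* bit of this byte between
       the read and the write, its update is silently lost. */
}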
It's also theoretically possible, though utterly silly, for a compiler to implement memory writes this way when the hardware doesn't require it. x86 can address single bytes, so it's mostly not an issue, but I'm trying to figure out if there's any weird corner case where it is. More generally, I want to know if writing to adjacent elements of an array from different threads is still a practical issue or mostly just a theoretical one that only applies to obscure/ancient hardware and/or really strange compilers.
Yet another edit: Here's a good reference that describes the issue I'm talking about:
http://my.safaribooksonline.com/book/programming/java/0321246780/threads-and-locks/ch17lev1sec6
Writing a native-sized value (i.e. 1, 2, 4, or 8 bytes) is atomic (well, 8 bytes is only atomic on 64-bit machines). So, no: writing a native type will always write as expected.
If you're writing multiple native types (i.e. looping to write an array) then it's possible to have an error if there's a bug in the operating system kernel or an interrupt handler that doesn't preserve the required registers.
Yes, definitely, writing a mis-aligned word that straddles the CPU cache line boundary is not atomic.
I'm trying to understand the hazards of not locking shared variables in a threaded (or shared memory) environment. It is easy to argue that if you are doing two or more dependent operations on a variable it is important to hold some lock first. The typical example is the increment operation, which first reads the current value before adding one and writing back.
But what if you only have one writer (and lots of readers) and the write does not depend on the previous value? So I have one thread storing a timestamp offset once every second. The offset holds the difference between local time and some other time base. A lot of readers use this offset to timestamp events, and taking a read lock each time is a little expensive. In this situation I don't care if a reader gets the value just before the write or just after, as long as the reader doesn't get garbage (that is, an offset that was never set).
Say that the variable is a 32-bit integer. Is it possible to get a garbage read of the variable in the middle of a write? Or is writing a 32-bit integer an atomic operation? Will it depend on the OS or hardware? What about a 64-bit integer on a 32-bit system?
What about shared memory instead of threading?
Writing a 64-bit integer on a 32-bit system is not atomic, and you could have incorrect data if you don't take a lock.
As an example, if your integer is
0x00000000 0xFFFFFFFF
and you are going to write the next int in sequence, you want to write:
0x00000001 0x00000000
But if you read the value after one of the ints is written and before the other is, then you could read
0x00000000 0x00000000
or
0x00000001 0xFFFFFFFF
which are wildly different from the correct value.
If you want to work without locks, you have to be very certain what constitutes an atomic operation on your OS/CPU/compiler combination.
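If a C11 compiler is available, one way to get that certainty is to let the implementation pick the mechanism. A minimal sketch using <stdatomic.h> (the names are just for the example; whether the operations end up lock-free depends on the platform, which atomic_is_lock_free can tell you):

#include <stdatomic.h>
#include <stdint.h>

_Atomic uint64_t shared_offset;   /* e.g. the timestamp offset from the question */

/* Writer thread: the store happens as one indivisible update. */
void publish_offset(uint64_t value)
{
    atomic_store(&shared_offset, value);
}

/* Reader threads: never observe a half-written (torn) value. */
uint64_t read_offset(void)
{
    return atomic_load(&shared_offset);
}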
In addition to the above comments, beware of the register bank in a slightly more general setting. You may end up updating only a CPU register and not writing the value back to main memory right away, or the other way around, where you keep using a cached register copy while the original value in memory has been updated. Some languages have a volatile keyword to mark a variable as "always read from memory, never from a locally cached register copy".
The memory model of your language is important. It describes exactly under what conditions a given value is shared among several threads. Either this is the rules of the CPU architecture you are executing on, or it is determined by a virtual machine in which the language is running. Java for instance has a separate memory model you can look at to figure out what exactly to expect.
An 8-bit, 16-bit or 32-bit read/write is guaranteed to be atomic if it is aligned to its size (on the 486 and later), and also if it is unaligned but contained within a cache line (on the P6 and later). Most compilers will guarantee stack (local, assuming C/C++) variables are aligned.
A 64-bit read/write is guaranteed to be atomic if it is aligned (on Pentium and later), however, this relies on the compiler generating a single instruction (for example, popping a 64-bit float from the FPU or using MMX). I expect most compilers will use two 32-bit accesses for compatibility, though it is certainly possible to check (the disassembly) and it may be possible to coerce different handling.
The next issue is caching and memory fencing. However, the effect of ignoring these is that some threads may see the old value even though it has been updated. The value won't be invalid, simply out of date (by microseconds, probably). If this is critical to your application, you will have to dig deeper, but I doubt it is.
(Source: Intel Software Developer Manual Volume 3A)
It very much depends on hardware and how you are talking to it. If you are writing assembler, you will know exactly what you get as processor manuals will tell you which operations are atomic and under what conditions. For example, in the Intel Pentium, 32-bit reads are atomic if the address is aligned, but not otherwise.
If you are working on any level above that, it will depend on how that ultimately gets translated into machine code. Be that a compiler, interpreter, or virtual machine.
The platform you run on determines the size of atomic reads/writes. Generally, a 32-bit (register) platform only supports 32-bit atomic operations. So, if you are writing more than 32-bits, you will probably have to use some other mechanism to coordinate access to that shared data.
One mechanism is to double or triple buffer the actual data and use a shared index to determine the "latest" version:
write(blah)
{
    new_index = ...;               // find a free entry in the global_data array
    global_data[new_index] = blah;
    WriteBarrier();                // write-release
    global_index = new_index;
}

read()
{
    read_index = global_index;
    ReadBarrier();                 // read-acquire
    return global_data[read_index];
}
You need the memory barriers to ensure that you don't read from global_data[...] until after you read global_index and you don't write to global_index until after you write to global_data[...].
This is a little awful since you can also run into the ABA issue with preemption, so don't use this directly.
Platforms often provide atomic read/write access (enforced at the hardware level) to primitive values (32-bit or 64-bit, as in your example) - see the Interlocked* APIs on Windows.
This can avoid the use of a heavier-weight lock for thread-safe variable or member access, but should not be mixed with other types of lock on the same instance or member. In other words, don't use a Mutex to mediate access in one place and use Interlocked* to modify or read it in another.
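For example, on 32-bit Windows a 64-bit value can still be read and written whole by going through InterlockedCompareExchange64, which is implemented with the CMPXCHG8B instruction. A rough sketch (the surrounding function and variable names are just for illustration):

#include <windows.h>

static LONGLONG volatile g_value;   /* 64-bit shared value on a 32-bit build */

LONGLONG atomic_read64(void)
{
    /* A compare-exchange with equal comparand and exchange returns the
       current value atomically without changing it. */
    return InterlockedCompareExchange64(&g_value, 0, 0);
}

void atomic_write64(LONGLONG value)
{
    LONGLONG old = atomic_read64();
    for (;;) {
        /* Returns what was there before; if that matches 'old', our value
           was installed, otherwise retry with the value we just saw. */
        LONGLONG seen = InterlockedCompareExchange64(&g_value, value, old);
        if (seen == old)
            break;
        old = seen;
    }
}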
When executed, the program will start running from virtual address 0x80482c0. This address doesn't point to our main() procedure, but to a procedure named _start which is created by the linker.
My Google research so far just led me to some (vague) historical speculations like this:
There is folklore that 0x08048000 once was STACK_TOP (that is, the stack grew downwards from near 0x08048000 towards 0) on a port of *NIX to i386 that was promulgated by a group from Santa Cruz, California. This was when 128MB of RAM was expensive, and 4GB of RAM was unthinkable.
Can anyone confirm/deny this?
As Mads pointed out, in order to catch most accesses through null pointers, Unix-like systems tend to make the page at address zero "unmapped". Thus, such accesses immediately trigger a CPU exception, in other words a segfault. This is much better than letting the application go rogue. The exception vector table, however, can be at any address, at least on x86 processors (there is a special register for that, loaded with the lidt opcode).
The starting point address is part of a set of conventions which describe how memory is laid out. The linker, when it produces an executable binary, must know these conventions, so they are not likely to change. Basically, for Linux, the memory layout conventions are inherited from the very first versions of Linux, in the early 90's. A process must have access to several areas:
The code must be in a range which includes the starting point.
There must be a stack.
There must be a heap, with a limit which is increased with the brk() and sbrk() system calls.
There must be some room for mmap() system calls, including shared library loading.
Nowadays, the heap, where malloc() goes, is backed by mmap() calls which obtain chunks of memory at whatever address the kernel sees fit. But in older times, Linux was like previous Unix-like systems, and its heap required a big area in one uninterrupted chunk, which could grow towards increasing addresses. So whatever the convention was, it had to stuff code and stack towards low addresses, and give every chunk of the address space after a given point to the heap.
But there is also the stack, which is usually quite small but can grow quite dramatically on some occasions. The stack grows down, and when the stack is full, we really want the process to crash predictably rather than overwrite some data. So there had to be a wide area for the stack, with, at the low end of that area, an unmapped page. And lo! There is an unmapped page at address zero, to catch null pointer dereferences. Hence it was defined that the stack would get the first 128 MB of address space, except for the first page. This means that the code had to go after those 128 MB, at an address similar to 0x080xxxxx.
As Michael points out, "losing" 128 MB of address space was no big deal because the address space was very large with regards to what could be actually used. At that time, the Linux kernel was limiting the address space for a single process to 1 GB, over a maximum of 4 GB allowed by the hardware, and that was not considered to be a big issue.
Why not start at address 0x0? There are at least two reasons for this:
Because address zero is famously known as the NULL pointer, and is used by programming languages to sanity-check pointers. You can't use that address value for this purpose if you're going to execute code there.
The actual contents at address 0 are often (but not always) the exception vector table, and hence are not accessible in non-privileged modes. Consult the documentation of your specific architecture.
As for the entrypoint _start vs main:
If you link against the C runtime (the C standard libraries), the library wraps the function named main, so it can initialize the environment before main is called. On Linux, that means setting up the argc and argv parameters for the application and the env variables, and probably some synchronization primitives and locks. It also makes sure that returning from main passes on the status code, and calls the _exit function, which terminates the process.
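As a rough illustration of what that wrapper does, here is a heavily simplified, hypothetical sketch of a hand-rolled entry point (built with something like gcc -nostartfiles; the real startup code also pulls argc/argv/envp off the initial stack, initializes the C library, and runs constructors before calling main):

#include <unistd.h>     /* _exit */

int main(void);

void _start(void)
{
    /* A real C runtime would set up argc/argv/envp and initialize libc
       here before handing control to main. */
    int status = main();
    _exit(status);      /* pass main's return value back to the kernel */
}

int main(void)
{
    return 42;
}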