Just for clarity.
Does Local Memory refer to the memory allocated to a certain program?
And does Global Memory refer to the main memory?
I am reading about Uniform Memory Access time and Non-Uniform Memory Access time. They say a multiprocessor computer has Uniform Memory Access time if the time it takes to access data locally is the same as the time it takes to access data globally.
I thought that by "locally" they were referring to a cache, but in the preceding statements they clarify that local memory is not a cache.
I can't find any clarity as to the performance of the so-called constant memory referred to in the Numba documentation:
https://numba.pydata.org/numba-doc/dev/cuda/memory.html#constant-memory
I am curious about the size limits of this memory, how fast or slow it is compared to other memory types, and whether there are any pitfalls in using it.
Thank you!
This is more of a general question regarding the constant memory in a CUDA-capable device. You can find info in the official CUDA programming guide and in the CUDA Best Practices Guide, which says:
There is a total of 64 KB constant memory on a device. The constant memory space is cached. As a result, a read from constant memory costs one memory read from device memory only on a cache miss; otherwise, it just costs one read from the constant cache. Accesses to different addresses by threads within a warp are serialized, thus the cost scales linearly with the number of unique addresses read by all threads within a warp. As such, the constant cache is best when threads in the same warp access only a few distinct locations. If all threads of a warp access the same location, then constant memory can be as fast as a register access.
Regarding how this compares to other memory types, here is my short answer. You may want to read this page for further details:
Registers: Thread-private on-chip read + write memory, which can be considered the fastest memory space on a GPU.
Local memory: Thread-private off-chip read + write memory which, despite its misleading name, physically resides in the same location as global memory; hence its high latency.
Global memory: The largest memory, with high latency and global scope; it is also off-chip, with read + write permissions.
Constant memory: Off-chip cached read-only memory limited to 64 KB, which can be accessed by threads as fast as registers if all threads of a warp access the same location.
Shared memory: On-chip, low-latency, read + write memory with limited space per multiprocessor (48 KB to 164 KB depending on the compute capability of your device).
Texture memory: Off-chip read-only memory with an on-chip cache, optimized for 2D spatial locality, which supports unique features like hardware filtering.
Pinned (page-locked) memory: Not an explicit device memory. It is directly accessible by both CPU and GPU code, and is used to maximize and overlap data transfer between CPU and GPU.
These memories have different scopes, lifetimes, and usages. The Numba page that you mentioned in your question explains the basics, but the official CUDA programming guide has a lot more detail. At the end of the day, the answer to the question of when to use each memory type is to a large degree application-dependent.
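As a rough sketch of how this is used in practice (in CUDA C terms, since the quoted text comes from NVIDIA's documentation; the kernel and symbol names here are invented for the example), a small read-only table can be placed in the 64 KB constant space and broadcast to all threads of a warp:

    #include <cuda_runtime.h>

    __constant__ float coeffs[4];   /* lives in the 64 KB constant memory space */

    __global__ void scale(float *out, const float *in, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = coeffs[0] * in[i];  /* every thread reads the same address,
                                            so the read is a broadcast and can be
                                            as fast as a register access */
    }

    /* Host side: fill the symbol before launching the kernel, e.g.
     *   float h_coeffs[4] = {1.f, 2.f, 3.f, 4.f};
     *   cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));
     */

In Numba, the equivalent mechanism is cuda.const.array_like, described on the page linked in the question.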
How can I calculate the real memory usage of a single process? I am not talking about virtual memory, because that just keeps growing. For instance, there are proc files like smaps, where you can get the mappings of a process, but that is virtual memory, and the values in that file just keep growing for a running process. I would like to see the real memory usage of a process: if you plot it over time, the plot should reflect both allocations and the freeing of memory, moving up and down rather than being a line that just keeps growing for a running process.
So, how could I calculate the real memory usage? I would appreciate any helpful answer.
It's actually kind of a complicated question. The two most common metrics for a program's memory usage at the OS level are virtual size and resident set size. (These show in the output of ps -u as the VSZ and RSS columns.) Roughly speaking, these tell the total memory the program has assigned to it, versus how much it is currently actively using.
Further complicating the question is that when you use malloc (or the C++ new operator) to allocate memory, it is allocated from a pool in your process which is built up by occasionally requesting memory from the operating system. When you free memory, it goes back into this pool, but it is typically not returned to the OS. So as your program allocates and frees memory, you typically will not see its memory footprint go up and down. (However, if it frees a lot of memory and then doesn't allocate any more, you may eventually see its RSS go down.)
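To watch this in practice, here is a minimal Linux-only C sketch (the helper name rss_kib is invented) that reads VmRSS from /proc/self/status around a burst of small allocations; after the frees, the RSS typically stays high because the memory went back into the malloc pool rather than to the OS:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Read this process's current resident set size (KiB) from /proc/self/status. */
    static long rss_kib(void)
    {
        FILE *f = fopen("/proc/self/status", "r");
        char line[256];
        long kib = -1;
        if (!f)
            return -1;
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "VmRSS: %ld", &kib) == 1)
                break;
        fclose(f);
        return kib;
    }

    int main(void)
    {
        enum { N = 100000, SZ = 1024 };      /* ~100 MiB in small chunks */
        static char *p[N];

        printf("start:       %ld KiB\n", rss_kib());
        for (int i = 0; i < N; i++) {
            p[i] = malloc(SZ);
            memset(p[i], 1, SZ);             /* touch the pages so they count */
        }
        printf("after alloc: %ld KiB\n", rss_kib());
        for (int i = 0; i < N; i++)
            free(p[i]);                      /* back into the pool, not to the OS */
        printf("after free:  %ld KiB\n", rss_kib());
        return 0;
    }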
Why is it necessary to bring a program into main memory from secondary memory for execution?
Why can't we execute a program from secondary memory?
It may not be possible currently, but could some future mechanism allow a program to be executed directly from secondary memory?
Almost all modern CPUs execute instructions by fetching them from an address in main memory identified by the instruction pointer register, loading the referenced memory through one or more cache levels before the portion of the CPU that executes the instruction even starts its work. Designing a CPU that could, for example, fetch instructions directly from a disk or network stream would be a rather large project, and performance would likely be pathetic. There's a reason you have a main memory that operates orders of magnitude faster than disk/network access, and caches between that and the actual execution cores that are orders of magnitude faster even than the main memory...
Usually, some parts of a program need to be accessed multiple times during its execution. Reading from secondary memory every single time a particular piece of data is needed would obviously take a lot of time.
It is better to load the program into a faster memory, i.e. main memory, so that whenever a part of the program is required it can be accessed much faster. Similarly, the most frequently used variables are stored in cache memory for even faster access. It's all about speed.
If we could somehow develop affordable secondary memory as fast as main memory, we could do without copying the whole program into main memory. However, we would still need some memory to store the temporaries produced during program execution.
Main memory is so called to distinguish it from external mass storage devices such as hard drives; another term for it is RAM. The computer can manipulate only data that is in main memory, so every program you execute and every file you access must be copied from a storage device into main memory. The amount of main memory on a computer is crucial because it determines how many programs can be executed at one time and how much data can be readily available to a program.
A program is stored on flash/disk. For execution, it is loaded into virtual memory and mapped to RAM by the virtual memory manager. During its execution, the process is in RAM. So where does virtual memory exist (where does it hold the .text, .data, .stack, and .heap)?
Virtual memory is a view of the RAM, plus possibly some swap space, provided by a virtual memory manager. Modern OSs have virtual memory managers and provide virtual memory to processes, so that an executing program can behave as if it had a contiguous address space whose size is not limited by the actual RAM. The pages or blocks making up the virtual memory can be mapped anywhere in the RAM, so contiguous virtual pages need not be stored in contiguous RAM areas. They can also be swapped out to page space or swap space, waiting there until needed, whereupon they're read back by the OS and mapped to some RAM page.
When you say
During its execution, the process is in RAM.
This is not entirely correct. Some or all memory pages that belong to the process may be swapped out, as explained.
One more word concerning the answers and comments that say that "virtual" means it doesn't exist. This makes no sense. On the contrary, according to Webster:
being such in essence or effect ...
Hence virtual memory is something (therefore, it exists!) that behaves as if it were memory.
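On Linux you can actually see where a process's segments live in its virtual address space: /proc/self/maps lists every mapping, with the heap and stack regions tagged. A minimal C sketch:

    #include <stdio.h>

    /* Dump this process's own address-space layout; the output includes
     * the executable's text and data mappings plus [heap] and [stack]. */
    int main(void)
    {
        FILE *f = fopen("/proc/self/maps", "r");
        int c;
        if (!f)
            return 1;
        while ((c = fgetc(f)) != EOF)
            putchar(c);
        fclose(f);
        return 0;
    }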
Virtual memory is like an illusion of RAM. It uses paging to let the processes in an operating system use more memory than is physically installed.
Virtual memory means memory you can access with "normal" memory access methods, although it isn't clear where the data is actually stored.
It may be
actually in RAM
in a swap area
in another file (memory mapped file)
and access to it will be handled appropriately.
It is a layer of, well, virtualization so that you as a programmer don't have to worry about where the data is actually put.
The original purpose was mainly to be able to provide processes with more memory than is physically present, extending it by means of swap space, but there are more uses:
The OS is free to use the RAM for whatever it deems necessary, e.g. caching. Under some circumstances, it may be more effective to use RAM for cache than for holding parts of a program which haven't been used for a long time.
Provide additional memory to a program when it requests it: if you call malloc(), the program's library may request the OS to provide a part of memory which can be attached seamlessly into the address space.
Avoid stack overflow: if the stack grows larger and larger, the respective memory section may be extended as well transparently so that the program won't have to worry about it.
A system can even do "overcommitment" of memory: if a process requests a large amount of memory, the OS may say "yes, ok", i.e. provide the memory to the program. That means, in the first place, "allow the program to access a certain address space area", but this address space is not immediately backed by memory. Only when the program actually accesses this memory is the mapping done, and if it cannot be fulfilled, the program is killed by the Out Of Memory killer (at least under Linux); a sketch of this lazy backing follows below.
All this works by page-wise (1 page = 4 KiB) assignment of physical memory to a program, viewed via the program's address space, in the amount and with the frequency that it is needed.
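As a small Linux-only C sketch of that lazy backing (the sizes are arbitrary): a huge anonymous mapping succeeds immediately, but physical pages are only assigned to it when touched:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = (size_t)1 << 33;                  /* 8 GiB of address space */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(p, 1, 1 << 20);   /* touch just 1 MiB; only these pages get RAM */
        puts("mapped 8 GiB, touched 1 MiB");
        munmap(p, len);
        return 0;
    }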
What is a memory heap?
Presumably you mean heap from a memory allocation point of view, not from a data structure point of view (the term has multiple meanings).
A very simple explanation is that the heap is the portion of memory where dynamically allocated memory resides (i.e. memory allocated via malloc). Memory allocated from the heap will remain allocated until one of the following occurs:
The memory is freed
The program terminates
If all references to allocated memory are lost (e.g. you don't store a pointer to it anymore), you have what is called a memory leak. The memory is still allocated, but you have no easy way of accessing it anymore. Leaked memory cannot be reclaimed for future allocations, but when the program ends, the memory will be freed by the operating system.
Contrast this with stack memory which is where local variables (those defined within a method) live. Memory allocated on the stack generally only lives until the function returns (there are some exceptions to this, e.g. static local variables).
You can find more information about the heap in this article.
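As a small C illustration of the lifetime difference (the function names are invented for the example):

    #include <stdio.h>
    #include <stdlib.h>

    /* Stack: 'n' lives only until the function returns. */
    int stack_example(void)
    {
        int n = 42;
        return n;
    }

    /* Heap: the block outlives the function that allocated it. */
    int *heap_example(void)
    {
        int *p = malloc(sizeof *p);
        if (p)
            *p = 42;
        return p;   /* still valid after return; the caller must free it */
    }

    int main(void)
    {
        int *p = heap_example();
        printf("%d %d\n", stack_example(), p ? *p : -1);
        free(p);    /* forgetting this (and losing p) would be a leak */
        return 0;
    }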
A memory heap is a location in memory where memory may be allocated and freed in no particular order. Unlike the stack, where memory is allocated and released in a very defined order, individual data elements allocated on the heap are typically released at times independent of one another. Any such data element is freed when the program explicitly releases the corresponding pointer, and this may result in a fragmented heap. On a stack, by contrast, only data at the top (or the bottom, depending on the way the stack works) may be released, resulting in data elements being freed in the reverse order of their allocation.
The heap is just an area where memory is allocated and deallocated without any particular order. This happens when one creates an object using the new operator or something similar. This is opposed to the stack, where memory is deallocated on a first-in, last-out basis.
It's a chunk of memory allocated from the operating system by the memory manager in use by a process. Calls to malloc() et alia then take memory from this heap instead of having to deal with the operating system directly.
You probably mean heap memory, not memory heap.
Heap memory is essentially a large pool of memory (typically per process) from which the running program can request chunks. This is typically called dynamic allocation.
It is different from the stack, where "automatic variables" are allocated. So, for example, when you define a pointer variable in a C function, enough space to hold a memory address is allocated on the stack. However, you will often need to dynamically allocate space on the heap (with malloc) and then assign the address where that memory chunk starts to the pointer.
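A minimal C sketch of that distinction: the pointer itself is a small automatic variable on the stack, while the chunk it points to lives on the heap:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int local = 1;                      /* automatic variable: on the stack */
        int *p = malloc(100 * sizeof *p);   /* p is on the stack; the 100 ints
                                               it points to are on the heap */
        printf("stack: %p  heap: %p  pointer size: %zu bytes\n",
               (void *)&local, (void *)p, sizeof p);
        free(p);
        return 0;
    }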
A memory heap is a common structure for holding dynamically allocated memory.
See Dynamic_memory_allocation on wikipedia.
There are other structures, like pools, stacks and piles.
A process's memory is commonly divided into two parts: heap memory and stack memory. The heap is the main working memory; it typically starts at the lowest address and grows upward. The stack typically grows from the other end, and data is pushed onto and popped off it in last-in, first-out order, which is why that region is called a stack.
Every running process has its own private virtual memory provided by the OS.
The OS can map this to physical memory at any point, as long as physical memory is available; otherwise it will map to disk and swap as needed.
This virtual memory is logically divided into segments for organizing different kinds of data.
The code segment holds the executable instructions.
The data segment holds static data such as global or static variables.
The stack holds local data that is automatically managed by calling and returning functions.
All of these segments are fixed size, even the stack; it's just that the portion used can grow or shrink, and it is reclaimed as functions return.
The only segment that is not preallocated at app startup with a fixed size is the heap.
The app can request new memory from the OS at runtime, and the OS will reserve a part of your app's virtual space and commit it to physical memory as needed.
The OS will return a pointer to that newly allocated heap memory, and that pointer holds the base (starting) address of the new block. That pointer sits on the stack, and when that stack space is reclaimed your pointer is no longer in scope, so you have no means of accessing that block of memory. If you don't tell the OS you are done with it so it can reclaim it, it is just zombie memory sitting there with no means of access; and if your app keeps requesting memory while never giving it back, it will crash when the system runs out of memory. So it is important to free the memory, or at least assign the pointer to another pointer outside the scope it was defined in, so that you maintain an interface to that memory allocated in heap space. I would suggest looking further into virtual memory and understanding segments.
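To make the scope issue concrete, here is a minimal C sketch (function names invented for the example): one function loses its only pointer to a heap block and leaks it; the other hands the address out so a caller can free it later.

    #include <stdlib.h>

    void leaky(void)
    {
        char *block = malloc(4096);   /* the pointer itself lives on the stack */
        if (block)
            block[0] = 'x';           /* ... use the heap block ... */
    }   /* 'block' goes out of scope here; without free(block), the 4 KiB
         * heap allocation is unreachable for the rest of the program */

    char *keep(void)
    {
        char *block = malloc(4096);
        return block;   /* hand out the base address so the caller can free it */
    }

    int main(void)
    {
        leaky();              /* 4 KiB leaked until the process exits */
        char *b = keep();
        free(b);              /* reclaimed properly */
        return 0;
    }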