Interpreting GPU information from Process Explorer - memory-leaks

I am trying to hunt down a possible memory leak in my SharpDX / DirectX application.
I am getting the following information from Process Explorer, which I do not know how to interpret.
What is Dedicated GPU Memory?
What is System GPU Memory?
What is Committed GPU Memory?

Dedicated GPU memory is basically the VRAM on board the GPU.
System GPU memory is system memory that the graphics driver makes accessible to the GPU through the GART (Graphics Address Remapping Table) so that resources can be stored there. AGP and PCI Express both provide regions of memory set aside for this purpose (sometimes referred to as aperture segments).
Committed GPU memory refers to the amount of memory mapped into a display device's address space by the display driver. It is a difficult concept to explain, and this number typically does not mean much to anyone but driver developers.
I suggest you look into the following documentation on MSDN as well as this overview of GPU address space segmentation. While they are somewhat technical, they give a general overview of what is going on.

Related

Is there a Linux kernel module to perform contiguous physical memory allocation?

I have read this.
But I am working with PCI UIO, and I therefore need contiguous physical memory. I am talking about (among other things) virtio, so the presence of an IOMMU won't help here. PCI-VFIO has a much nicer and more secure approach, I agree, but the IOMMU virtualisation techniques are not mature, as far as I understand.
Writing a PCI-UIO virtio driver in user space running on a guest requires physically contiguous memory. Besides, not all hardware has an IOMMU, so even on such simpler host systems VFIO cannot be used, and there is a need for contiguous physical memory...
So, in short, as long as IOMMUs are not everywhere and are not properly emulated by virtualizers, it seems that allocating contiguous physical memory from user space is needed.
I am aware that a user can READ the page mapping from /proc/<pid>/... and that this allows for the trial-and-error approach which DPDK is using...
But it feels like contiguous physical memory allocation should typically be the job of a kernel module... And I cannot be the first one facing this situation...
Does such a kernel module exist?
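For reference, the trial-and-error approach mentioned above can be sketched roughly as follows. This is only an illustration, assuming the documented /proc/<pid>/pagemap layout (one 64-bit entry per virtual page, page frame number in bits 0-54, "present" flag in bit 63). The helper names are hypothetical and this is not how DPDK itself is implemented. Note that recent kernels report the PFN as zero unless the reader has CAP_SYS_ADMIN, so the sketch treats a zero PFN as unknown.

```python
import mmap
import struct

PAGE_SIZE = mmap.PAGESIZE
PFN_MASK = (1 << 55) - 1   # bits 0-54 hold the page frame number
PRESENT_BIT = 1 << 63      # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Return the PFN stored in a 64-bit pagemap entry, or None if not present."""
    if not entry & PRESENT_BIT:
        return None
    return entry & PFN_MASK


def read_pfn(vaddr, pid="self"):
    """Look up the PFN backing a virtual address via /proc/<pid>/pagemap."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)   # one 8-byte entry per virtual page
        raw = f.read(8)
    if len(raw) != 8:
        return None
    return decode_pagemap_entry(struct.unpack("=Q", raw)[0])


def is_physically_contiguous(pfns):
    """True if every page frame directly follows the previous one."""
    # A PFN of None (not present) or 0 (hidden from unprivileged readers)
    # means contiguity cannot be established.
    if not pfns or any(p is None or p == 0 for p in pfns):
        return False
    return all(b == a + 1 for a, b in zip(pfns, pfns[1:]))
```

A caller would allocate and touch a buffer, collect read_pfn() for each page, and retry the allocation until is_physically_contiguous() returns True, which is exactly why it feels like a workaround rather than a solution.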

Graphics card memory and virtual address space of a process

Suppose I have a game that does lots of graphics work in OpenGL, and I have a desktop with 32-bit Linux installed, 4GB of RAM and a 1GB Nvidia graphics card. What does my game application's virtual address space look like? Is graphics card memory mapped into this virtual address space?
Also, is there some relation between RAM and graphics card memory? Does Linux allocate an equal amount of RAM for the graphics card which cannot be used by any process? That said, does that then result in only 3GB of RAM being available to my game process?
What does my game application's virtual address space look like?
Impossible to tell. OpenGL leaves this detail completely open to the vendor implementation. Anything that satisfies the specification is allowed.
Is graphics card memory mapped in this virtual address space?
Maybe, maybe not. That depends on the actual implementation.
Also, is there some relation between RAM and graphics card memory?
Usually yes. As far as the majority of OpenGL implementations are concerned, the graphics card's RAM is essentially a cache for things that actually live in system memory (CPU RAM + swap space + stuff memory-mapped from storage). However, this is not pinned down by the specification, and anything that satisfies the OpenGL specification is allowed.
Does Linux allocate equal RAM for graphics card which can not be used by any process?
No, because Linux (the kernel) is not concerned with these things. Your graphics card's driver is, though. And the driver may do it any way it sees fit. It can either map OpenGL context data into a separate address space through Physical Address Extension (PAE) or place it in a different process or keep it in your game's address space, or…, or…, or…. There's no written down scheme on this.
That said, does that then result in only 3GB of RAM being available to my game process?
If so, then more like (3GB - 1GB) - x where 0 < x, because the top 1GB of your process' address space is reserved for the kernel, and of course your program's text (the binary executed by the CPU) and the text of the libraries it's using take up some address space as well.
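As a rough worked example of that arithmetic, here is a small sketch. It assumes a 32-bit process with a 4GB virtual address space, the top 1GB reserved for the kernel, and a driver that happens to map the card's full 1GB into the process (which, as said above, is entirely up to the implementation). The function name and the 64MB overhead figure are made up for illustration.

```python
GIB = 1 << 30
MIB = 1 << 20


def usable_address_space(total=4 * GIB, kernel=1 * GIB, gpu_mapping=1 * GIB, overhead=0):
    """Address space left for the program's own data, in bytes."""
    # overhead is the 'x' from the answer: program text, library text, stacks, ...
    return total - kernel - gpu_mapping - overhead


# Upper bound with x = 0, then with a purely illustrative x of 64MB.
print(usable_address_space() // MIB, "MiB")
print(usable_address_space(overhead=64 * MIB) // MIB, "MiB")
```

Note that this is about address space, not about how much physical RAM is installed or free.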

Large physically contiguous memory area

For my M.Sc. thesis, I have to reverse-engineer the hash function Intel uses inside its CPUs to spread data among Last Level Cache slices in Sandy Bridge and newer generations. To this end, I am developing an application on Linux which needs a physically contiguous memory area in order to run my tests. The idea is to read data from this area so that it gets cached, probe whether older data has been evicted (through timing measurements or LLC miss counters) in order to find colliding memory addresses, and finally discover the hash function by comparing these colliding addresses.
The same procedure has already been used in Windows by a researcher, and proved to work.
To do this, I need to allocate an area that must be large (64 MB or more) and fully cacheable, so without DMA-friendly options in the TLB. How can I perform this allocation?
To have full control over the allocation (i.e., for it to be really physically contiguous), my idea was to write a Linux module, export a device and mmap() it from userspace, but I do not know how to allocate so much contiguous memory inside the kernel.
I have heard about the Linux Contiguous Memory Allocator (CMA), but I don't know how it works.
Applications don't see physical memory; a process has some address space in virtual memory. Read about the MMU (what is contiguous in virtual space might not be physically contiguous, and vice versa).
You might want to lock some memory using mlock(2).
But your application will be scheduled, and other processes (or scheduled tasks) will dirty your CPU cache. See also sched_setaffinity(2)
(and even kernel code might be preempted).
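A minimal sketch of those two calls from user space, assuming Linux and a libc that can be reached through ctypes. The helper names pin_to_cpu and lock_buffer are hypothetical. Keep in mind that mlock(2) only keeps pages resident and does nothing to make them physically contiguous, and that it can fail if RLIMIT_MEMLOCK is too low.

```python
import ctypes
import mmap
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU and return the resulting affinity set."""
    os.sched_setaffinity(0, {cpu})   # pid 0 means the calling process
    return os.sched_getaffinity(0)


def lock_buffer(buf):
    """Try to mlock() an anonymous mmap buffer; return True on success."""
    libc = ctypes.CDLL(None, use_errno=True)
    # Obtain the buffer's virtual address through the buffer protocol.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    rc = libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(len(buf)))
    return rc == 0
```

For example, pin_to_cpu(min(os.sched_getaffinity(0))) followed by lock_buffer(mmap.mmap(-1, 64 * mmap.PAGESIZE)) would pin the process to one allowed CPU and try to lock a 64-page buffer. Other tasks can still run on that CPU and disturb the cache, as noted above.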
This page on Kernel Newbies has some ideas about memory allocation. But the max for get_free_pages looks like 8MiB. (Perhaps that's a compile-time constraint?)
Since this would be all-custom, you could explore the mem= boot parameter of the Linux kernel. This limits the amount of memory the kernel uses, and you can party all over the remaining memory without anyone knowing. Heck, if you boot up a busybox system, you could probably do mem=32M, but even mem=256M should work if you're not booting a GUI.
You will also want to look into the Offline Scheduler (and here). It "unplugs" the CPU from Linux so you can have full control over ALL code running on it. (Some parts of this are already in the mainline kernel, and maybe all of it is.)

Are they the same thing: the Linux framebuffer and GPU memory?

From my understanding they are different.
The Linux framebuffer is a software object, and GPU memory is physical memory mapped to the GPU device.
My questions are the following:
1) Is my understanding correct?
2) If so, somehow merging the two into one seems like a possible way to improve performance (I guess there are many technical details explaining why this is not possible, and so on...)
3) If not, could you explain how Linux framebuffer and GPU work together?
The Linux framebuffer device is a virtual device that wraps the data it receives for display. So generally the answer is no: it is not GPU memory. In theory a driver can map GPU memory into fbdev, but it is unlikely that anyone does this. The main problem is that there may be many virtual consoles but, for example, only one monitor, and fbdev must handle this. Another issue is that GPU memory only quite recently became virtualised (directly accessible); on older GPUs you can't just write anything you like into GPU memory.
Aside from that, fbdev provides a unified interface, while direct access to GPU memory would require hardware-specific data formats. When there is a difference between formats, the fbdev driver performs the conversion.
As for performance, it is already very good. There is probably not much benefit in raising it even further.
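To give an idea of the kind of format conversion meant above, here is a small sketch that packs 24-bit RGB pixels into 16-bit RGB565. This is not taken from any actual fbdev driver; RGB565 is just a common example of a format a display might expect, and the function names are made up.

```python
def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit-per-channel RGB into a 16-bit RGB565 value."""
    # Keep the top 5 bits of red, 6 of green and 5 of blue.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def convert_scanline(pixels):
    """Convert a sequence of (r, g, b) tuples into little-endian RGB565 bytes."""
    out = bytearray()
    for r, g, b in pixels:
        out += rgb888_to_rgb565(r, g, b).to_bytes(2, "little")
    return bytes(out)
```

Doing this per pixel on every update is the cost a unified interface pays when the application's format and the hardware's format differ.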

What's the difference between D3DERR_OUTOFVIDEOMEMORY and E_OUTOFMEMORY?

I am developing a tool that draws primitives with DX9 on 32-bit Windows XP.
When I create a vertex buffer or an index buffer, the creation sometimes fails.
The return code can be D3DERR_OUTOFVIDEOMEMORY or E_OUTOFMEMORY.
I am not sure what the difference between them is.
I used the VideoMemory tool from the DX samples to check the memory, and it reports 1024MB.
Does that mean that if I create managed resources totalling more than 1024MB, it will report D3DERR_OUTOFVIDEOMEMORY?
And if there is no more free virtual address space in the process and malloc fails, will DX9 report E_OUTOFMEMORY?
E_OUTOFMEMORY means that DirectX was unable to allocate (i.e. through malloc or new) the block of memory you requested.
D3DERR_OUTOFVIDEOMEMORY means that DirectX was unable to allocate the block of memory you requested out of the video memory pool (either on the graphics card or reserved in main memory).
Caveat: Drivers might lie.
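If it helps to tell the two apart in a log, the codes can be split into their HRESULT fields. The sketch below assumes the usual values from the Windows headers (E_OUTOFMEMORY = 0x8007000E in winerror.h, D3DERR_OUTOFVIDEOMEMORY = MAKE_D3DHRESULT(380) = 0x8876017C in d3d9.h) and the same 13-bit facility mask that the HRESULT_FACILITY macro uses. The function name decode_hresult is made up for illustration.

```python
E_OUTOFMEMORY = 0x8007000E            # FACILITY_WIN32 (7), code 14
D3DERR_OUTOFVIDEOMEMORY = 0x8876017C  # Direct3D facility (0x876), code 380


def decode_hresult(hr):
    """Split an HRESULT into its failure bit, facility and code."""
    hr &= 0xFFFFFFFF                     # accept signed 32-bit values too
    return {
        "failed": bool(hr >> 31),        # severity bit
        "facility": (hr >> 16) & 0x1FFF, # which component raised it
        "code": hr & 0xFFFF,             # component-specific error code
    }


for name, value in (("E_OUTOFMEMORY", E_OUTOFMEMORY),
                    ("D3DERR_OUTOFVIDEOMEMORY", D3DERR_OUTOFVIDEOMEMORY)):
    print(name, decode_hresult(value))
```

The facility field shows where the error comes from: the general Win32 facility for E_OUTOFMEMORY versus the Direct3D facility for D3DERR_OUTOFVIDEOMEMORY.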
D3DERR_OUTOFVIDEOMEMORY is a DirectX memory error that is not necessarily related to video memory; it could be memory occupied for holding a scene or drawing an image. As you have found out, if there is not enough memory for your process you will get E_OUTOFMEMORY. The latter means that the memory assigned to your process has been exhausted. You did not say what operating system/hardware spec you have; your best bet would be to look into a system memory upgrade if you're running low on resources.
Edit: Some laptops/netbooks have a graphics adapter that is 'fitted with system memory'. These graphics adapters are not serious contenders for the likes of 'Beyond Call of Duty' and other top-end games: the adapter actually steals some memory from the main board, thus inflating the amount of RAM that is available to the graphics controller. They are fine if you are doing word processing, emailing and so on, but this comes at the cost of the system RAM that is gobbled up by the controller, a la 'integrated graphics controller'.
Hope this helps,
Best regards,
Tom.

Resources