I have a requirement for large video frame buffer that needs to be physically contiguous. So my question is when kernel driver request physical contig memory, the virtual address returned by kernel will be contiguous or non-contig?
Update:
My apology, let me add more details. For a video buffer of resolution 640x480 with each pixel of 1 byte, the total memory expected 307200 bytes (640x480). For a system that works on 4KiB page, the total pages needed by above buffer will be 75.
Now lets assume in some way this 307200 memory block requested is physically contiguous. But when virtual address of each page is being returned by kernel, will those pages be contiguous or non-contiguous?
Contiguous - the kernel virtual address space mapping is generally 1-1 with physical memory (ie V = P + offset)
Related
When we use kmalloc() it is said, this function returns contiguous physical blocks of memory (if available) and with vmalloc(), we get non-contiguous block of memory (if available) .
It is further stated, access of contiguous block of memory is faster compared to non-contiguous block of memory [Source Link].
To be more specific, lets consider two cases:
Let 1 physical frame=4 KB, page size =4 KB
Case 1:
In my module code, I am using kmalloc() to allocate 20 KB memory to a char array; call succeeds.
Case 2:
I have done above request using vmalloc() and the call has succeeded.
My questions are:
a) How does it take less time for kmalloc() to fulfil the request compared to vmalloc()?
b) How does contiguous allocation lead to fast access of memory compared to non-contiguous allocation?
In each case, CPU generates virtual address, gives to MMU (if TLB miss), does a page walk, identifies frame number, then converts virtual address to a physical address. How does it matter if address is contiguous or non-contiguous?
For kmalloc the whole physical RAM is already mapped 1:1 with an offset1, i.e. physical RAM address N is mapped to virtual address N+PAGE_OFFSET. This makes allocation using kmalloc simpler than with vmalloc, since vmalloc has to find free pages and set up the page tables so that the pages are mapped to a contiguous address block.
There is no difference in access time when accessing kmalloc vs. vmalloc allocated memory, except for the page faults mentioned in the document you linked to.
1 With the exception of systems with more physical memory than fits in the virtual address space reserved for the kernel.
In Linux running on an x86 platform where is the real mode address space mapped to in protected kernel mode? In kernel mode, a thread can access the kernel address space directly. The kernel is in the lower 8MB, The page table is at a certain position, etc (as describe here). But where does the real mode address space go? Can it be accessed directly? For example the BIOS and BIOS addons (See here)?
(My x86-fu is a bit weak. I'll add some tags so that other people can (hopefully) correct me if I'm lying anywhere.)
Physical addresses are the same in real and protected mode. The only difference is in how you get from an address (offset) specified in an instruction to a physical address:
In real mode, the physical address is basically (segment_reg << 4) + offset.
In protected mode, the physical address is translate_via_page_table([segment_reg] + offset).
By [segment_reg] I mean the base address of the segment, looked up in the Global or Local Descriptor Table at the offset in segment_reg. translate_via_page_table() means the address translation done via paging (if enabled).
Looking here, it seems the BIOS ROM appears at physical addresses 0x000F0000-0x000FFFFF. To get at that memory in protected mode with paging, you would have to map it into the virtual address space somewhere by setting up correct page table entries. Assuming 4 KB pages (the usual case), mapping the entire range should require 16 ((0xFFFFF-0xF0000+1)/4096) entries.
To see how the Linux kernel does things, you could look into how e.g. /dev/mem, which allows reading of arbitrary physical addresses, is implemented. The implementation is in drivers/char/mem.c.
The following command (from e.g. this answer) will dump the memory range 0xC0000-0xFFFFF (meaning it includes the video BIOS too, per the memory map linked above):
$ dd if=/dev/mem bs=1k skip=768 count=256 > bios
1024*768 = 0xC0000, and 1024*(768+256) - 1 = 0xFFFFF, which gives the expected physical memory range.
Tracing things a bit, read_mem() in drivers/char/mem.c calls xlate_dev_mem_ptr(), which has an x86-specific implementation in arch/x86/mm/ioremap.c. The ioremap_cache() call in that function seems to be responsible for mapping in the page if needed.
Note that BIOS routines won't work in protected mode by the way. They assume the CPU is running in real mode.
For Linux x86 32 bits, the first 896MB of physical RAM is mapped to a contiguous block of virtual memory starting at virtual address 0xC0000000 to 0xF7FFFFFF. Virtual addresses from 0xF8000000 to 0xFFFFFFFF are assigned dynamically to various parts of the physical memory, so the kernel can have a window of 128MB mapped into any part of physical memory beyond the 896MB limit.
The kernel itself loads at physical address 1MB and up, leaving the first MB free. This first MB is used, for instance, to have DMA buffers that ISA devices needs to be there, because they use the 8237 DMA controller, which can only be mapped to such addresses.
So, reading from virtual memory address 0xC0000000 is actually reading from physical address 0x00000000 (provided the kernel has flagged that page as present)
The linear address beyond 896MB correspond to High memory region ZONE_HIGHMEM.
So the page allocator functions will not work on this region, since they give the linear address of directly mapped page frames in ZONE_NORMAL and ZONE_DMA.
I am confused about these lines specified in Undertanding linux Kernel:
What do they mean when they say "In 64 bit hardware platforms ZONE_HIGHMEM is always empty."
What does this highlighted statement mean: "The allocation of high-memory page frames is done only through alloc_pages() function. These functions do not return linear address since they do not exist. Instead the functions return linear address of the page descriptor of the first allocated page frame. These linear addresses always exist, because all page descriptors are allocated in low memory once and forever during kernel initialization."
What are these Page descriptors and does the 896MB already have all page descriptors of entire RAM.
The x86-32 kernel needs high memory to access more than 1G of physical memory, as it is impossible to permanently map more than 2^{32} addresses within a 32-bit address space and the kernel/user split is 1G/3G.
The x86-64 kernel has no such limitation, as the amount of physically-addressable memory (currently 256T) fits within its 64-bit address space and thus may always be permanently mapped.
High memory is a hack. Ideally you don't need it. Indeed, the point of x86-64 is to be able to directly address all the memory you could possibly want. Taken
from https://www.quora.com/Linux-Kernel/What-is-the-difference-between-high-memory-and-normal-memory
I think page descriptor means struct page. And considering the sizeof struct page. Yes all of them can be stored in ZONE_NORMAL
I am following Gorman's virtual memory management book.
There is a section about kernel table page initialization which is said to be divided into two phases, bootstrapping and finalizing.
Here is what it says about the bootstrapping phase.
The assembler function startup_32() is responsible for enabling the paging unit in
arch/i386/kernel/head.S. While all normal kernel code in vmlinuz is compiled
with the base address at PAGE_OFFSET + 1MiB, the kernel is actually loaded beginning
at the first megabyte (0x00100000) of memory. The first megabyte is used
by some devices for communication with the BIOS and is skipped. The bootstrap
code in this file treats 1MiB as its base address by subtracting __PAGE_OFFSET from
any address until the paging unit is enabled. Therefore before the paging unit is
enabled, a page table mapping has to be established that translates the 8MiB of
physical memory to the virtual address PAGE_OFFSET.
Why we want to subtract __PAGE_OFFEST? For what purpose?
Why we have to do subtracting before the paging unit is enabled? Isn't that we always use subtracting for mapping kernel virtual address to physical memory address?
Why it is 8MB?
Thanks,
Since x86 code isn't generally position-independent, if it's compiled to execute at address X (__PAGE_OFFSET + 1MB) but loaded at address Y (1MB), all addresses inside of it need to be decremented by Y-X (__PAGE_OFFSET + 1MB - 1MB = __PAGE_OFFSET) for it to work.
For example, if there's an instruction to read a byte of memory from the beginning of the kernel, __PAGE_OFFSET + 1MB, the address is reduced by __PAGE_OFFSET and the actual read location becomes 1MB, exactly where the kernel starts in the memory.
When page translation is finally enabled, __PAGE_OFFSET can and, I believe, is effectively subtracted by the page translation mechanism by mapping a range of virtual addresses to a range of physical addresses that are smaller by __PAGE_OFFSET (that is, physical=virtual-__PAGE_OFFSET per the page tables).
Unless there's some additional kernel relocation involved, 8MB is likely just the size of the mapping range, sufficient to map the entire kernel.
In linux kernel, mem_map is the array which holds all "struct page" descriptors. Those pages includes the 128MiB memory in lowmem for dynamically mapping highmem.
Since the lowmem size is 1GiB, so the mem_map array has only 1GiB/4KiB=256KiB entries. If each entry size is 32 byte, then the mem_map memory size = 8MiB. But if we could use mem_map to map all 4GiB physical memory(if we have so much physical memory available on x86-32), then the mem_map array would occupy 32MiB, that is not a lot of kernel memory(or am i wrong?).
So my question is: why do we need to use that 128MiB in low for indirect highmem mapping in the first place? Or put another way, why not to map all those max 4GiB physical memory(if available) in the kernel space directly?
Note: if my understanding of the kernel source above is wrong, please correct. Thanks!
Look Here: http://www.xml.com/ldd/chapter/book/ch13.html
Kernel low memory is the 'real' memory map, addressed with 32-bit pointers on x86.
Kernel high memory is the 'virtual' memory map, addressed with virtual structures on x86.
You don't want to map it all into the kernel address space, because you can't always address all of it, and you need most of your memory for virtual memory segments (virtual, page-mapped process space.)
At least, that's how I read it. Wow, that's a complicated question you asked.
To throw more confusion, chapter 13 talks about some PCI devices not being able to address the 32-bit space, which was the genesis of my previous comment:
On x86, some kernel memory usage is limited to the first Gigabyte of memory bacause of DMA addressing concerns. I'm not 100% familiar with the topic, but there's a comapatibility mode for DMA on the PCI bus. That may be what you are looking at.
3.6 GB is not the ceiling when using physical address extension, which is commonly needed on most modern x86 boards, especially with memory hotplug.
Or put another way, why not to map all those max 4GiB physical
memory(if available) in the kernel space directly?
One reason is userspace: every usespace process have its own virtual address space. Suppose you have 4Gb of RAM on x86. So if we suggest that kernel owns 1Gb of memory (~800 directly mapped + ~200 vmalloc) all other ~3Gb should be dynamically distributed between processes spinning in user space. So how can you map your 4Gbs directly when you have a several address spaces?
why do we need zone_highmem on x86?
The reason is the same. Kernel reserves only ~800Mb for low mem. All other memory will be allocated and connected with particular virtual address only on demand. For example if you will execute a binary a new virtual address space will be created and some pages will be allocated for storing your binary code and data (heap ,stack ...). So the key attribute of high mem is to serve dynamic memory allocation requests, you never know in advance what will be triggered by userspace...