How to allocate a region of memory similar to VirtualAlloc? - linux

I was looking for a method of allocating memory on Linux similar to VirtualAlloc on Windows. The requirements are:
1. The size of the memory block to allocate is 2^16.
2. The address of the memory block must be larger than 0x0000ffff.
3. The last 16 bits of the address of the memory block must be zero.
On Windows, because of the lower limit on application addresses (lpMinimumApplicationAddress), we get (2) automatically. From (1), (2) and the system's allocation rules we also get (3).
Thanks for helping.

Try mmap(..., MAP_ANONYMOUS, ...)
You'll get an address that is aligned to a page boundary. For more stringent alignment than that, you probably need to allocate extra and pick an address inside your larger block that is correctly aligned.
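A minimal sketch of that over-allocate-and-pick approach (the 64 KiB size and alignment come from the question; error handling kept short):
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    const size_t size  = 1UL << 16;   /* 64 KiB block */
    const size_t align = 1UL << 16;   /* low 16 address bits must be zero */

    /* Over-allocate so an aligned block is guaranteed to fit somewhere inside. */
    void *raw = mmap(NULL, size + align, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Round up to the next 64 KiB boundary inside the larger mapping. */
    uintptr_t aligned = ((uintptr_t)raw + align - 1) & ~(uintptr_t)(align - 1);
    void *block = (void *)aligned;

    printf("aligned block at %p\n", block);
    /* ... use block[0 .. size-1] ... */
    return 0;
}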

You want posix_memalign():
void *ptr;
int memalign_err = posix_memalign(&ptr, 1UL << 16, 1UL << 16);
if (memalign_err) {
    fprintf(stderr, "posix_memalign: %s\n", strerror(memalign_err));
} else {
    /* ptr is valid */
}
The first 1UL << 16 is the alignment, and the second is the size.
When you're done with the block you pass it to free().

You can ask mmap for a specific address; it may fail for some addresses, but generally it works.
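For example (a sketch; 0x10000 is just an illustrative hint that satisfies the constraints above, and without MAP_FIXED the kernel is free to ignore it):
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    void *hint = (void *)0x10000;   /* desired address: above 0xffff, 64 KiB aligned */
    void *p = mmap(hint, 1UL << 16, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Without MAP_FIXED the kernel may place the mapping elsewhere. */
    printf("requested %p, got %p\n", hint, p);
    return 0;
}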

Related

Global Descriptor Table location

I'm confused about the location of the Global Descriptor Table (GDT). According to the Intel manuals (from the i386 onward), the GDTR register contains the base address of the GDT, which is supposed to be a linear address.
Following Intel conventions, linear addresses are subject to paging.
Nevertheless, I'm wondering which address space is considered. Ring 3 (user-land) programs are perfectly allowed to modify some segment selectors (ES, for example). This modification should trigger the processor to load the segment descriptor from the corresponding entry in the GDT, whose base address is the linear address given by the GDTR register.
Because linear addresses are subject to paging, I understand from the Intel manuals that segment descriptor loads go through the paging of the current process. Because Linux certainly doesn't want to expose the GDT structure to user-land programs, I thought it somehow managed to introduce a hole in the address space of user-land processes, preventing these processes from reading the GDT while allowing the processor to read it for segment reloads.
I checked using the following code, which showed I'm completely wrong about the GDTR's base linear address.
#include <cstdint>
#include <iostream>
#include <sys/mman.h>

int
main()
{
    struct
    {
        uint16_t  pad;    /* ensures 'base' immediately follows 'size' and stays 4-byte aligned */
        uint16_t  size;
        uintptr_t base;
    } gdt_info;

    /* Store the GDTR (limit and base) starting at gdt_info.size. */
    __asm__ volatile ("sgdt %0" : "=m" (gdt_info.size));

    /* Try to map the pages containing the GDT's linear address. */
    void *try_mmgdt = (void *)(gdt_info.base & ~0xfff);
    void *chk_mmgdt = mmap(try_mmgdt, 0x4000, PROT_EXEC | PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    std::cout << "gdt size: \t" << std::dec << gdt_info.size << std::endl;
    std::cout << "gdt base: \t" << std::hex << gdt_info.base << std::endl;
    std::cout << "mmgdt try:\t" << std::hex << uintptr_t(try_mmgdt) << std::endl;
    std::cout << "mmgdt chk:\t" << std::hex << uintptr_t(chk_mmgdt) << std::endl;
    return 0;
}
The program output (i386-compiled) on my machine is:
gdt size: 127
gdt base: 1dd89000
mmgdt try: 1dd89000
mmgdt chk: 1dd89000
The linear addresses of GDT entries and linear addresses of the mmap chunk perfectly overlap. Nevertheless the mmap chunk has obviously no relation with the GDT.
So my question finally is: which Intel/Linux mechanism makes the linear address in the GDTR and the linear addresses of the current process point to different memory regions?
I found the answer, and it's not straightforward, so I'm posting it here in case it helps others.
First, I need to acknowledge OSDev.org for helping me understand that.
Though the code is compiled for i386, it's running on an x86_64 Linux system. Thus, it's not running in legacy 32-bit mode, but rather in so-called "compatibility mode". In this mode, legacy 32-bit software is allowed to run in an x86_64 environment.
When the system entered Intel 64 (long) mode, it placed the GDT at a linear address in the high end of the 64-bit address space (something like 0xffff88021dd89000). Whenever a "compat" 32-bit application retrieves the GDTR linear address using SGDT, it only gets the lower 32 bits of that address (0x1dd89000). When the processor accesses the GDT, it uses the full 64-bit linear address held in the GDTR register, even in compat mode.
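A tiny illustration of that truncation, using the base address from above (the 64-bit value is only the example quoted in this answer):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t gdt_base_64 = 0xffff88021dd89000ULL;   /* full long-mode linear address */
    uint32_t gdt_base_32 = (uint32_t)gdt_base_64;   /* what a compat-mode SGDT hands back */

    printf("%#x\n", (unsigned)gdt_base_32);          /* prints 0x1dd89000 */
    return 0;
}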

Accessing memory pointers in hardware registers

I'm working on enhancing the stock ahci driver provided in Linux in order to perform some needed tasks. I'm at the point of attempting to issue commands to the AHCI HBA for the hard drive to process. However, whenever I do so, my system locks up and reboots. Explaining the full process of issuing a command to an AHCI drive is far too much for this question; if needed, reference this link for the full discussion (the process is spread across several pieces, but ch. 4 has the necessary data structures).
Essentially, one writes the appropriate structures into memory regions defined by either the BIOS or the OS. The first memory region I should write to is the Command List Base Address contained in the register PxCLB (and PxCLBU if 64-bit addressing applies). My system is 64-bit, so I'm trying to read both 32-bit registers. My code is essentially this:
void __iomem * pbase = ahci_port_base(ap);
u32 __iomem *temp = (u32*)(pbase + PORT_LST_ADDR);
struct ahci_cmd_hdr *cmd_hdr = NULL;
cmd_hdr = (struct ahci_cmd_hdr*)(u64)
((u64)(*(temp + PORT_LST_ADDR_HI)) << 32 | *temp);
pr_info("%s:%d cmd_list is %p\n", __func__, __LINE__, cmd_hdr);
// problems with this next line, makes the system reboot
//pr_info("%s:%d cl[0]:0x%08x\n", __func__, __LINE__, cmd_hdr->opts);
The function ahci_port_base() is found in the ahci driver (at least it is for CentOS 6.x). Basically, it returns the proper address for that port in the AHCI memory region. PORT_LST_ADDR and PORT_LST_ADDR_HI are both macros defined in that driver. The address that I get after getting both the high and low addresses is usually something like 0x0000000037900000. Is this memory address in a space that I cannot simply dereference?
I'm hitting my head against the wall at this point because this link shows that accessing it in this manner is essentially how it's done.
The address that I get after getting both the high and low addresses
is usually something like 0x0000000037900000. Is this memory address
in a space that I cannot simply dereference?
Yes, you are correct - that's a bus address, and you can't just dereference it because paging is enabled. (You shouldn't be just dereferencing the iomapped addresses either - you should be using readl() / writel() for those, but the breakage here is more subtle).
It looks like the right way to access the ahci_cmd_hdr in that driver is:
struct ahci_port_priv *pp = ap->private_data;
cmd_hdr = pp->cmd_slot;
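If you do want to read PxCLB/PxCLBU yourself, a hedged sketch using the MMIO accessors mentioned above would look like this (it reuses ahci_port_base(), PORT_LST_ADDR and PORT_LST_ADDR_HI from the driver; the result is a bus/DMA address, not a pointer the CPU can dereference):
void __iomem *port_mmio = ahci_port_base(ap);

/* Read the two 32-bit halves of the Command List Base Address via readl(). */
u32 clb  = readl(port_mmio + PORT_LST_ADDR);
u32 clbu = readl(port_mmio + PORT_LST_ADDR_HI);

/* This is the DMA (bus) address programmed into the HBA -- not CPU-dereferenceable. */
dma_addr_t cmd_list_dma = ((u64)clbu << 32) | clb;

pr_info("%s: command list bus address is %pad\n", __func__, &cmd_list_dma);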

Mmap is not working for high address memory mapping?

I am trying to do
memory = (char *)mmap((void *)0x0000100000000000, (size_t)0xffffffff / 8, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, 4, 0);
but it's not mapping anything and is returning 0. I need to map memory at a high address on a 64-bit machine.
This is not meant as a complete answer - more of a possible explanation:
0x0000100000000000 is 17592186044416 (2^44). Do you have that high a virtual memory address available? Or, stated another way: is that address valid in your OS? I would guess the answer is no.
Is mmap actually returning MAP_FAILED ( (void *) -1 )? Usually when you give mmap an address it does not like, you get MAP_FAILED and errno == EINVAL. Did you check errno?
Note: 4 bytes is not the word length on a 64-bit OS; usually it is 8. A 4-byte word cannot address all of memory, for example.
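A sketch that makes the failure visible (MAP_FAILED is (void *)-1, not 0; note that MAP_ANONYMOUS mappings conventionally take an fd of -1):
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    char *memory = (char *)mmap((void *)0x0000100000000000UL, (size_t)0xffffffff / 8,
                                PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (memory == MAP_FAILED) {
        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
        return 1;
    }
    printf("mapped at %p\n", (void *)memory);
    return 0;
}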

How does this way of writing data to a specific physical memory address work?

I want to write data to an arbitrary physical memory address to test the error detection and correction feature of my system. One code segment in an existing kernel module is written like this:
u32 addr;
struct page *page;
void *mem;
pci_read_config_dword(priv->mc, I5100_MEMEINJADDRMAT, &addr);
/* Inject error by writing to address */
page = pfn_to_page(addr >> PAGE_SHIFT);
mem = kmap(page) + (addr & (~PAGE_MASK));
*((volatile u32*) (mem)) = 0x01010101;
kunmap(page);
I5100_MEMEINJADDRMAT is the address of a register in the i5100 memory controller; basically, the target memory address is retrieved from that register. I don't understand the remaining code, starting from retrieving a page and then performing bitwise operations.
As far as I understand, pfn_to_page is used to get the page that includes a particular physical address by passing in a page frame number as the argument. The addr >> PAGE_SHIFT part translates a given address into its corresponding page frame number. But I don't understand how to use PAGE_SHIFT correctly. What is the correct data type to use with PAGE_SHIFT?
kmap() returns the appropriate virtual page address, and the offset is added to it to get the correct pointer to a virtual memory address. What does (addr & (~PAGE_MASK)) actually do?
My task is to inject an error at a physical address, but the above code seems to write to a virtual address. Is there any other way?
This:
(addr & (~PAGE_MASK))
will clear the bits in addr that are set in PAGE_MASK. Assuming a page size of 4 KB (2^12 = 4096), PAGE_MASK is ~(PAGE_SIZE - 1), i.e. 0xfffff000 on a 32-bit system.
Its bitwise inverse ~PAGE_MASK is then 0x00000fff, so when addr is bitwise-ANDed with it, all bits except the lowest 12 are cleared, leaving the offset within the page.
/* PAGE_SHIFT determines the page size */
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
I found these definitions in linux-source-3.2.0.
So PAGE_MASK is 0xfffff000, and the operation clears the highest 20 bits; in other words, it extracts the value of the lower 12 bits.
page = pfn_to_page(addr >> PAGE_SHIFT);
This gets the page from the page frame number (the pfn is obtained by shifting the address to the right), because the least significant bits represent the offset within the page (12 bits for x86). The page frame number corresponds to the 20 most significant bits of the physical address on x86.
You can apply the shift to pointer or integer data types; it depends on the situation.
(addr & (~PAGE_MASK)) gets the offset within the frame. To access the correct byte, you add this offset to the mapped page address.
The virtual address maps to the physical address. As far as I know, there is no way other than going through a virtual address. Also, note that this is kernel virtual address space, which is different from userspace virtual addresses; do not mix userspace with kernel space.
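A small worked example of that split (the address is made up for illustration; assumes PAGE_SHIFT == 12 as on x86):
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

int main(void)
{
    uint32_t addr = 0x1dd89a34;              /* hypothetical physical address */

    uint32_t pfn    = addr >> PAGE_SHIFT;    /* page frame number: 0x1dd89 */
    uint32_t offset = addr & ~PAGE_MASK;     /* offset within page: 0xa34  */

    printf("pfn = %#x, offset = %#x\n", (unsigned)pfn, (unsigned)offset);
    return 0;
}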

After mmap(), writing to the returned address is OK, but reading causes a system crash. Why?

I want to share memory between two processes.
After mmap(), I get an address mapStart; then I add an offset to mapStart to get mapAddr, making sure mapAddr does not exceed the mapped PAGE_SIZE.
When I write to mapAddr by
memcpy((void *)mapAddr, data, size);
everything is OK.
But when I read from mapAddr by
memcpy(&data, (void *)mapAddr, size);
the system crashes.
Does anyone know why?
A similar problem is described here.
Adding some info for @Tony Delroy and @J-16 SDiZ:
The mmap call is:
mapStart = (void volatile *)mmap(0, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, memfd, pa_base);
The system crash produces no OS error message; the console prints some MCA info.
The details are described here.
Just an idea:
Is your mmap() spanning memory regions with different attributes? That is illegal.
Older kernels (you said 2.6.18) allowed this, but crash when you access some of it.
See this post for a starting point. If possible, try a newer kernel.
There are at least two possible issues:
After mmap(), I get an address mapStart; then I add an offset to mapStart to get mapAddr, making sure mapAddr does not exceed the mapped PAGE_SIZE.
It is not mapAddr that must stay within the mapped size, but mapAddr + size. You are trying to touch size bytes, not just one.
memcpy((void *)mapAddr, data, size);
memcpy( &data, (void *)mapAddr, size);
Assuming data is not an array (a plausible assumption, since you use it without the address-of operator in the first line), the second line copies not into the location pointed to by data, but into the pointer variable data itself and whatever follows it on the stack. Copying size bytes there can overwrite other stack contents, or run past the end of the stack into... something else.
(If data is indeed an array, it is of course equivalent, but then your code style would be inconsistent.)
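A minimal sketch of the corrected usage, assuming (as in the question) that mapStart came from mmap(0, PAGE_SIZE, ...), data is a pointer to a caller-supplied buffer, and mapAddr = mapStart + offset:
/* The last byte touched is mapAddr + size - 1, so bound-check mapAddr + size. */
if ((char *)mapAddr + size > (char *)mapStart + PAGE_SIZE) {
    /* error: the copy would run past the end of the mapping */
} else {
    memcpy((void *)mapAddr, data, size);   /* write into the shared mapping          */
    memcpy(data, (void *)mapAddr, size);   /* read back: pass the pointer, not &data */
}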
