Global Descriptor Table location - linux

I'm confused about the location of the Global Descriptor Table (GDT). According to the Intel manuals (from the i386 manual onward), the GDTR register contains the base address of the GDT, which is stated to be a linear address.
Following Intel conventions, linear addresses are subject to paging.
Nevertheless, I'm wondering which address space is considered. Ring 3 (user-land) programs are perfectly allowed to modify some segment selectors (ES, for example). Such a modification triggers the processor to load the segment descriptor from the corresponding GDT entry, whose address is computed from the linear base address held in the GDTR register.
Because linear addresses are subject to paging, I understand from the Intel manuals that segment-descriptor loads go through the paging structures of the current process. Since Linux certainly doesn't want to expose the GDT to user-land programs, I assumed it somehow introduced a hole in the address space of user-land processes, preventing those processes from reading the GDT while still allowing the processor to read it for segment reloads.
I checked with the following code, which showed that I'm completely wrong about the GDTR's base linear address:
#include <cstdint>
#include <iostream>
#include <sys/mman.h>

int
main()
{
    struct
    {
        uint16_t pad;   // aligns 'base' to a 4-byte boundary
        uint16_t size;  // GDT limit, filled in by SGDT
        uintptr_t base; // GDT linear base address, filled in by SGDT
    } gdt_info;

    // SGDT stores the 6-byte GDTR (2-byte limit + 4-byte base) starting at gdt_info.size
    __asm__ volatile ("sgdt %0" : "=m" (gdt_info.size) );

    // Try to map anonymous memory over the page containing the reported GDT base
    void* try_mmgdt = (void*)( gdt_info.base & ~0xfff );
    void* chk_mmgdt = mmap(try_mmgdt, 0x4000, PROT_EXEC | PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    std::cout << "gdt size: \t" << std::dec << gdt_info.size << std::endl;
    std::cout << "gdt base: \t" << std::hex << gdt_info.base << std::endl;
    std::cout << "mmgdt try:\t" << std::hex << uintptr_t(try_mmgdt) << std::endl;
    std::cout << "mmgdt chk:\t" << std::hex << uintptr_t(chk_mmgdt) << std::endl;
    return 0;
}
The program output (i386-compiled) on my machine is:
gdt size: 127
gdt base: 1dd89000
mmgdt try: 1dd89000
mmgdt chk: 1dd89000
The linear address of the GDT and the linear addresses of the mmap chunk overlap perfectly. Nevertheless, the mmap chunk obviously has no relation to the GDT.
So my question finally is: which Intel/Linux mechanism makes the linear address in the GDTR and the same linear address in the current process point to different memory regions?

I found the answer, and it's not straightforward, so I'm posting it here in case it helps others.
First, I need to acknowledge OSDev.org for helping me understand this.
Though the code is compiled for i386, it runs on an x86_64 Linux system. It is therefore not running in legacy 32-bit mode, but in so-called "compatibility mode", which lets legacy 32-bit software run in an x86_64 environment.
When the system entered Intel 64 (long) mode, it placed the GDT at a linear address in the high end of the 64-bit address space (something like 0xffff88021dd89000). When a "compat" 32-bit application retrieves the GDTR linear base with SGDT, it only gets the lower 32 bits of that address (0x1dd89000). When the processor itself accesses the GDT, it uses the full 64-bit linear address held in the GDTR, even in compatibility mode.
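For illustration, here is a minimal sketch of the same check built as a 64-bit (x86_64) program rather than i386; in long mode SGDT stores a 10-byte GDTR (2-byte limit plus 8-byte base), so the full linear base becomes visible. The struct layout below is my own, and on CPUs/kernels with UMIP enabled the instruction may trap or return a dummy value:
#include <cstdint>
#include <cstdio>

struct __attribute__((packed)) gdtr64 {
    uint16_t limit;   // GDT limit (size in bytes minus one)
    uint64_t base;    // full 64-bit linear base address
};

int main()
{
    gdtr64 gdtr;
    __asm__ volatile ("sgdt %0" : "=m" (gdtr));   // store the 10-byte GDTR
    std::printf("gdt limit: 0x%x\n", gdtr.limit);
    std::printf("gdt base : 0x%llx\n", (unsigned long long)gdtr.base);
    return 0;
}
Compiled with -m64, the base prints as the full kernel-half linear address (on the machine above it would look like 0xffff88021dd89000) instead of the truncated 0x1dd89000.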

Related

SIDT instruction returns wrong base address in a Linux user-space process

I made the following x86-64 program to view where the base address of the Interrupt Descriptor Table starts:
#include <stdio.h>
#include <inttypes.h>
typedef struct __attribute__((packed)) {
    uint16_t limit;
    uint64_t base;
} idt_data_t;

static inline void store_idt(idt_data_t *idt_data)
{
    asm volatile("sidt %0" : "=m" (*idt_data));
}

int main(void)
{
    idt_data_t idt_data;
    store_idt(&idt_data);
    printf("IDT Limit : 0x%X\n", idt_data.limit);
    printf("IDT Base : 0x%lX\n", idt_data.base);
    return 0;
}
And it prints the following:
IDT Limit : 0xFFF
IDT Base : 0xFFFFFE0000000000
The base address doesn't seem to be correct because the address should always be a physical address, am I right?
Also, I'm not sure but the limit seems to be too high. What am I doing wrong?
It's a linear address, not necessarily a physical address. In other words, it's subject to the page tables like most other addresses. It has to live in pages that are never paged out to disk (the CPU couldn't handle the resulting page fault while delivering an interrupt), but its virtual address can differ from its physical address.
On x86-64, each entry of the IDT is 16 bytes long. There are 256 interrupt vectors. 256 * 16 = 4096 = 0x1000. The IDTR limit is a "less than or equal" check, so it's typical to use 0xFFF.
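As a quick cross-check of that arithmetic, here is a trivial sketch (not from the original post) that recomputes the expected limit:
#include <cstdio>
#include <cstdint>

int main()
{
    const unsigned gate_size = 16;                        // bytes per 64-bit IDT gate
    const unsigned vectors   = 256;                       // interrupt vectors 0..255
    const uint16_t limit     = vectors * gate_size - 1;   // limit is a "<=" check, hence the -1
    std::printf("expected IDTR limit: 0x%X\n", limit);    // prints 0xFFF
    return 0;
}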
SIDT is a privileged instruction on newer CPUs if the OS enables a certain feature (UMIP, User-Mode Instruction Prevention), so it's advisable not to use it in user mode unless you're writing an exploit PoC or something. It's also possible that the OS lies about the answer rather than raising an exception, but I don't know.

How to read a memory dump in binary from GDB?

At the time of a crash I have a post-crash handler where I try to dump what's in certain memory regions:
auto memdump = std::fstream("FileMemDump.bin", std::ios::out | std::ios::binary);
auto memRegion = getMemoryRegion();
std::cout << "Memory region start: " << memRegion.start << " size: " << memRegion.size;
memdump.write((char*)memRegion.start, memRegion.size);
memdump.close();
and after that a core file has been created.
So I load the core in the following manner:
#gdb ./exec ./core.file
Then I give the restore command; the start address is what was printed in the log above... and it fails with the following message:
(gdb) restore ./FileMemDump.bin binary 0 0xFFAA0000
You can't do that without a process to debug.
a. Are the options given to the std::fstream OK, or
b. Is it possible to call the GDB dump command from within the code (since a dump from GDB can be restored),
or is what I am trying to do not feasible?
EDIT:
The big picture: in my process I want to use memory-mapped IO. At init time I allocate huge pages and mmap() them to a /dev device, and similarly I mmap() the non-volatile DIMM area as well (we do not use conventional malloc).
With this, when the process asserts/cores, I am not able to access the huge pages or the non-volatile DIMM areas.
I was trying to have a post-fatal hook where I dump these memory areas into binary file(s). In this question I am asking how to restore those memory areas into the GDB core session so I can inspect them.
Arguments to mmap():
fp = open("/dev/mem", O_RDWR);      /* fp is a plain file descriptor */
mmap(NULL,                          /* let the kernel choose the address */
     region.size,
     PROT_READ | PROT_WRITE,
     MAP_SHARED | MAP_NORESERVE,
     fp,
     phybaseaddr);                  /* physical base address of the region */
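For reference, here is a sketch of the GDB commands being discussed: dump binary memory writes a raw region out of a live inferior, and restore with the binary format is its counterpart. Both act on a running process rather than on a core file, which is why the restore above fails; the addresses below are placeholders taken from the question:
# attach to (or run) a live process first, e.g.  gdb -p <pid>
(gdb) dump binary memory FileMemDump.bin 0xFFAA0000 0xFFAB0000
# later, again with a live inferior, load the raw file back at the same address:
(gdb) restore FileMemDump.bin binary 0xFFAA0000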

DMA over PCIe to other device

I am trying to access the DMA address in a NIC directly from another PCIe device in Linux. Specifically, I am trying to read it from an NVIDIA GPU to bypass the CPU altogether. I have researched zero-copy networking and DMA-to-userspace posts, but they either didn't answer the question or involved some copy from kernel space to user space. I am trying to avoid involving the CPU because of the inconsistency of the delay, and I have very tight latency requirements.
I got hold of the NIC driver for the Intel card I use (the e1000e driver) and found where the ring buffers are allocated. As I understood from a paper I was reading, I would be interested in the descriptor of type dma_addr_t. The driver also has a member of the rx_ring struct called dma. I pass both the desc and dma members to user space using an ioctl call, but I am unable to get anything in the GPU besides zeros.
The GPU code is as follows:
int *setup_gpu_dma(u64 addr)
{
    // Allocate GPU memory
    int *gpu_ptr;
    cudaMalloc((void **) &gpu_ptr, MEM_SIZE);
    // Allocate memory in user space to read the stuff back
    int *h_data;
    cudaMallocHost((void **)&h_data, MEM_SIZE);
    // Present FPGA memory to CUDA as CPU locked pages
    int error = cudaHostRegister((void **) &addr, MEM_SIZE,
                                 CU_MEMHOSTALLOC_DEVICEMAP);
    cout << "Allocation error = " << error << endl;
    // DMA from GPU memory to FPGA memory
    cudaMemcpy((void **) &gpu_ptr, (void **)&addr, MEM_SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy((void **) &h_data, (void **)&gpu_ptr, MEM_SIZE, cudaMemcpyDeviceToHost);
    // Print the data
    // Clean up
}
What am I doing wrong?
cudaHostRegister() operates on already-allocated host memory, so you have to pass addr, not &addr.
If addr is not a host pointer, this will not work. If it is a host pointer, your function interface should use void * and then there will be no need for the typecast.
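To make that concrete, here is a minimal corrected sketch of the routine, under the assumption that addr really is a host-virtual mapping of the device memory. It applies the point above (pass addr itself, not &addr, through a void * interface), passes the pointers rather than their addresses to cudaMemcpy, and uses the runtime-API flag cudaHostRegisterMapped in place of the driver-API constant; MEM_SIZE stands in for the asker's constant:
#include <cuda_runtime.h>
#include <iostream>

#define MEM_SIZE (64 * 1024)   // placeholder for the asker's constant

int *setup_gpu_dma(void *addr)
{
    int *gpu_ptr = nullptr;
    cudaMalloc((void **)&gpu_ptr, MEM_SIZE);        // device buffer

    int *h_data = nullptr;
    cudaMallocHost((void **)&h_data, MEM_SIZE);     // pinned host buffer for readback

    // Register the existing host mapping itself (addr, not &addr)
    cudaError_t err = cudaHostRegister(addr, MEM_SIZE, cudaHostRegisterMapped);
    std::cout << "cudaHostRegister: " << cudaGetErrorString(err) << std::endl;

    // Copy through the registered mapping: host -> device, then device -> host
    cudaMemcpy(gpu_ptr, addr, MEM_SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy(h_data, gpu_ptr, MEM_SIZE, cudaMemcpyDeviceToHost);

    return h_data;   // caller inspects the data and releases it with cudaFreeHost()
}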

How to allocate a region of memory similar to VirtualAlloc?

I was looking for a method of allocating memory on Linux similar to VirtualAlloc on Windows. The requirements are:
1. The size of the memory block to allocate is 2^16.
2. The address of the memory block is larger than 0x0000ffff.
3. The address of the memory block must have its last 16 bits equal to zero.
On Windows, because of the lower limit on application addresses (lpMinimumApplicationAddress), we get (2) for free. From (1), (2) and the system's allocation rules we also get (3).
Thanks for helping.
Try mmap(..., MAP_ANONYMOUS, ...)
You'll get an address which is aligned to a page boundary. For more stringent alignment than that, you probably need to allocate extra and pick a correctly aligned address inside your larger block, as in the sketch below.
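A minimal sketch of that over-allocate-and-trim approach, assuming POSIX mmap and a 64 KiB (2^16) alignment target; the unused head and tail of the mapping could additionally be munmap'ed, which is omitted here for brevity:
#include <sys/mman.h>
#include <cstdint>
#include <cstdio>

int main()
{
    const size_t size  = 1UL << 16;   // requested block size (64 KiB)
    const size_t align = 1UL << 16;   // required alignment (last 16 bits zero)

    // Map more than needed so an aligned sub-range of 'size' bytes must exist
    void *raw = mmap(nullptr, size + align, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Round up to the next 64 KiB boundary inside the mapping
    uintptr_t aligned = ((uintptr_t)raw + align - 1) & ~(uintptr_t)(align - 1);
    std::printf("aligned block at %#lx\n", (unsigned long)aligned);
    return 0;
}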
You want posix_memalign():
void *ptr;
int memalign_err = posix_memalign(&ptr, 1UL << 16, 1UL << 16);
if (memalign_err) {
    fprintf(stderr, "posix_memalign: %s\n", strerror(memalign_err));
} else {
    /* ptr is valid */
}
The first 1UL << 16 is the alignment, and the second is the size.
When you're done with the block you pass it to free().
You can ask mmap for a specific address; it may fail for some specific addresses, but generally it works. A short sketch follows.
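Here is a sketch of passing a specific address as a hint (the hint value is arbitrary; without MAP_FIXED the kernel is free to place the mapping elsewhere, so the result has to be checked):
#include <sys/mman.h>
#include <cstdio>

int main()
{
    void *want = (void *)0x10000000;   // arbitrary hypothetical hint, 64 KiB aligned
    void *got  = mmap(want, 1UL << 16, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED) { std::perror("mmap"); return 1; }
    if (got != want)
        std::printf("hint not honored, got %p instead\n", got);
    return 0;
}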

When 2 ints are stored in Visual Studio, the difference between their locations comes out to be 12 bytes. Is there a reason for this?

When I run the following program in VC++ 2008 Express, I get the difference in location between two consecutively stored integers as '12' instead of the expected '4'. On other compilers, the answer comes out to be '4'. Is there a particular reason why it is '12'?
#include <iostream>
using namespace std;
int main()
{
    int num1, num2;
    cin >> num1 >> num2;
    cout << &num1 << endl << &num2 << endl;
    cout << int(&num1) - int(&num2) << endl; // Here it shows the difference as 12.
    cout << sizeof(num1);                    // Here it shows the size as 4.
    return 0;
}
I'm going to make a wild guess and say that you built it in debug mode. Try building it in release mode and see what you get. I know the C++ run-time will place memory guards around allocated memory in debug mode to catch buffer overflows. I don't know if it does something similar with variables on the stack.
You could be developing code for a computer in China, or it may be that there is a small and rare deficiency in the specific hardware you are using. One old model had difficulty with large numbers where the top bits became set: if the variables were in contiguous memory locations, a build-up of charge in the core memory could have a cross-effect on adjacent locations and alter their contents. Other possibilities are spare memory locations for detecting overflows and underflows, or you could be running 32-bit software mapped onto a 48-bit hardware architecture that was brought forward as a new model with the spare bits and bytes left unused.
