How does an address in the kernel's direct mapping area map to a physical address? (Linux)

My first thought was that converting a kernel virtual address in the direct mapping zone to a physical address should just be a matter of subtracting page_offset_base (0xffff888000000000 with 4-level paging).
But I did a simple test: I wrote a kernel module, allocated some memory with kmalloc(), then printed the pointer in both %lx and %p format. The two differ in the lower bits; the %p value is not the %lx value minus page_offset_base. I'm wondering why?
Here is my code:
buf_high = (char *)kmalloc(140, __GFP_HIGHMEM);
buf = (char *)kmalloc(140, GFP_KERNEL);
printk("kmalloc highmem : %lx, %p\n", (unsigned long)buf_high, buf_high);
printk("kmalloc : %lx, %p\n", (unsigned long)buf, buf);
Here is the output:
[ 1157.671135] kmalloc highmem : ffff888118d2b000, 000000009412a3d4
[ 1157.671142] kmalloc : ffff888118d2b3c0, 000000009ce9058c
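(The lower bits differ most likely because, since kernel 4.15, %p deliberately prints a hashed value rather than the raw pointer; %px prints the raw pointer.) As a hedged illustration, here is a minimal module sketch that prints the same kmalloc() pointer hashed, raw, and translated to a physical address; it assumes x86-64 and a 4.15+ kernel:

#include <linux/module.h>
#include <linux/slab.h>
#include <asm/io.h> /* virt_to_phys() */

static char *buf;

static int __init demo_init(void)
{
    phys_addr_t pa;

    buf = kmalloc(140, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;
    pa = virt_to_phys(buf); /* direct map: virtual minus page_offset_base */
    /* %p prints a hashed value; %px prints the raw kernel pointer */
    pr_info("hashed %p, raw %px, phys %pa\n", buf, buf, &pa);
    return 0;
}

static void __exit demo_exit(void)
{
    kfree(buf);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");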

Related

How to read /proc/<pid>/pagemap in a kernel driver?

I am trying to read /proc/<pid>/pagemap in a kernel driver like this:
uint64_t page;
uint64_t va = 0x7FFD1BF46530;
loff_t pos = va / PAGE_SIZE * sizeof(uint64_t);
struct file * filp = filp_open("/proc/19030/pagemap", O_RDONLY, 0);
ssize_t nread = kernel_read(filp, &page, sizeof(page), &pos);
I get error -22 in nread (EINVAL, invalid argument) and
"kernel read not supported for file /19030/pagemap (pid: 19030 comm: tester)" in dmesg.
0x7FFD1BF46530 is a virtual address in the user-space process with pid 19030 (tester). I assume that pos is the offset into the file, as with lseek64.
Doing precisely the same thing with the same values in a user-space process running as sudo, i.e. reading /proc/19030/pagemap, works fine and produces a correct physical address.
The actual thing I am trying to do here is find the physical address of a user-space virtual address. I need the physical address for a device DMA transfer operation, and a user-space app needs to access this memory. This app allocates 1GB of DMA memory with an anonymous mmap from THP (Transparent Huge Pages). I am trying to avoid the need for sudo by reading /proc/<pid>/pagemap in a kernel driver via ioctl instead.
I would be happy to allocate huge-page DMA memory in the driver but don't know how to do that. dma_alloc_coherent is limited to 4MB allocations at most. Is there a way to get those allocated as contiguous physical memory? I need hundreds of MB or many GB of DMA memory.
The problem with anonymous mmap is that it can only allocate at most a 1GB huge page as physically contiguous memory. Allocating more works, but the memory is not physically contiguous and is unusable for DMA.
Any good ideas or alternative ways of allocating huge pages as DMA memory?
I tried reading /proc/<pid>/pagemap in a kernel driver and expected the same results as when reading the file from a user-space application, which works fine.
"kernel read not supported for file …"
Indeed, as we see in __kernel_read()
if (unlikely(!file->f_op->read_iter || file->f_op->read))
return warn_unsupported(file, "read");
it fails if f_op->read_iter is not wired up (implemented) or if f_op->read is, and both are the case for a pagemap file.
You could try calling pagemap_read() directly instead. (Not feasible, for reasons given in the comments.)
When I had the problem of getting the physical address for a virtual address in a driver, I included and copied some kernel code (not that I recommend this, but I saw no other solution); here's an extract.
static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
                              unsigned long sz)
{
    return NULL;
}

void p4d_clear_bad(p4d_t *p4d)
{
    p4d_ERROR(*p4d);
    p4d_clear(p4d);
}

#include "mm/pagewalk.c"

static int pte(pte_t *pte, unsigned long addr,
               unsigned long next, struct mm_walk *walk)
{
    *(pte_t **)walk->private = pte;
    return 1;
}

/* Scan the real Linux page tables and return a PTE pointer for
 * a virtual address in a context.
 * Returns true (1) if PTE was found, zero otherwise. The pointer to
 * the PTE pointer is unmodified if PTE is not found.
 */
int
get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t **pmdp)
{
    struct mm_walk walk = { .pte_entry = pte, .mm = mm, .private = ptep };

    return walk_page_range(addr, addr + PAGE_SIZE, &walk);
}

/* Find physical address for this virtual address. Normally used by
 * I/O functions, but anyone can call it.
 */
static inline unsigned long iopa(unsigned long addr)
{
    unsigned long pa;

    /* I don't know why this won't work on PMacs or CHRP. It
     * appears there is some bug, or there is some implicit
     * mapping done not properly represented by BATs or in page
     * tables.......I am actively working on resolving this, but
     * can't hold up other stuff. -- Dan
     */
    pte_t *pte;
    struct mm_struct *mm;

#if 0
    /* Check the BATs */
    phys_addr_t v_mapped_by_bats(unsigned long va);

    pa = v_mapped_by_bats(addr);
    if (pa)
        return pa;
#endif
    /* Allow mapping of user addresses (within the thread)
     * for DMA if necessary.
     */
    if (addr < TASK_SIZE)
        mm = current->mm;
    else
        mm = &init_mm;
    /* ATTENTION: I needed the current address space here.
     * You'd have to use mm = file->private_data instead.
     */
    pa = 0;
    if (get_pteptr(mm, addr, &pte, NULL))
        pa = (pte_val(*pte) & PAGE_MASK) | (addr & ~PAGE_MASK);
    return pa;
}

Mapping memory allocated with dmam_alloc_coherent to user space via remap_pfn_range gives a pointer to the wrong area of memory

I am preparing an application running on an Intel Cyclone V SoC (ARM).
I need to map a DMA-coherent memory buffer to user space.
The buffer is allocated in the device driver with:
buf_addr = dmam_alloc_coherent(&pdev->dev, size, &dma_addr, GFP_KERNEL);
The allocation succeeds, and I have verified that the buffer accessed by the hardware via the dma_addr HW address is visible to the kernel via the buf_addr pointer.
Then in the mmap function of the device driver I do:
unsigned long physical = virt_to_phys(buf_addr);
unsigned long vsize = vma->vm_end - vma->vm_start;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
remap_pfn_range(vma,vma->vm_start, physical >> PAGE_SHIFT , vsize, vma->vm_page_prot);
The application mmaps the buffer with:
buf = mmap(NULL, buf_size, PROT_READ | PROT_WRITE, MAP_SHARED, dev_file, 0);
I do not get any error from remap_pfn_range function. Also the application is able to access the mmapped memory, but it is not the buffer allocated with dmam_alloc_coherent.
I have found the macro dma_mmap_coherent that seems to be dedicated particularly for that purpose.
I have verified that the following modification in the mmap function ensures proper operation:
unsigned long vsize = vma->vm_end - vma->vm_start;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
remap=dma_mmap_coherent(&my_pdev->dev,vma,fdata, dma_addr, vsize);
Because the pdev pointer is not directly available in the mmap function, it is passed from the probe function via the global variable my_pdev. In a driver supporting multiple devices, it should be stored in the device context instead.
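A hedged sketch of the full mmap handler along those lines follows; buf_size, fdata, dma_addr and my_pdev are the names from the answer above, and how they are stored is an assumption:

/* Sketch: mmap file operation built around dma_mmap_coherent().
 * fdata/dma_addr come from dmam_alloc_coherent() at probe time. */
static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long vsize = vma->vm_end - vma->vm_start;

    if (vsize > buf_size)   /* refuse to map past the buffer */
        return -EINVAL;
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
    return dma_mmap_coherent(&my_pdev->dev, vma, fdata, dma_addr, vsize);
}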

nopage() method implementation

Does anyone know how a virtual address is translated to a physical address in the nopage method?
With reference to the Linux Device Drivers book, the nopage method is given as:
struct page *simple_vma_nopage(struct vm_area_struct *vma,
                               unsigned long address, int *type)
{
    struct page *pageptr;
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
    unsigned long physaddr = address - vma->vm_start + offset;
    unsigned long pageframe = physaddr >> PAGE_SHIFT;

    if (!pfn_valid(pageframe))
        return NOPAGE_SIGBUS;
    pageptr = pfn_to_page(pageframe);
    get_page(pageptr);
    if (type)
        *type = VM_FAULT_MINOR;
    return pageptr;
}
PAGE_SHIFT is the number of bits used to represent the offset within a page, for both virtual and physical addresses.
But what is the offset variable?
How is a physical address calculated from arithmetic on virtual-address variables like address and vm_start?
I feel the documentation of vm_pgoff is not very clear.
It is the offset, in pages, of the first page of the memory region in RAM.
So if our RAM begins at 0x00000000 and our memory region begins at 0x0000A000, then vm_pgoff = 10 (0xA). If you revisit the mmap system call, you can see that the "offset" we pass is the offset of the starting byte in the file from which "length" bytes will be mapped onto the memory region. This offset is converted to an address by left-shifting it by PAGE_SHIFT, which is 12 (i.e. a 4KB page size).
Now, irrespective of whether the cr3 register is used in the linear-to-physical address translation or not, "address - vm_start" gives the size of the portion between the two addresses.
Example:
vm_start = 0xc0080000
address  = 0xc0090000
address - vm_start = 0x00010000
physaddr = (address - vma->vm_start) + offset
         = 0x00010000 + (10 << PAGE_SHIFT)
         = offset_of_faulting_page + start_addr_of_memory_region_in_RAM
         = 0x00010000 + 0x0000A000
         = 0x0001A000
Now, since this is the physical address, we convert it to a page frame number by right-shifting by PAGE_SHIFT: 0x0001A000 >> 12 = 0x1A = 26 (decimal).
Therefore the 26th page frame must be loaded with the data from the file being mapped.
The data is retrieved from disk using the inode's struct address_space, which holds the information about the location of the page on disk (swap space). Once the data is brought in, we return the struct page that represents this data in the page frame for which the page fault occurred. We return this to the user.
This is my latest understanding, but I haven't tested it.
No, the statement in the book is correct, because, as mentioned above, "physical" is just the starting address of the region/portion that you want to map out of the physical memory, which starts at the "off" physical address and extends to "simple_region_size".
The simple_region_size value is decided by the user, and similarly simple_region_start is decided by the user, with simple_region_start >= off.
So the maximum physical memory that the user can map is decided by psize = simple_region_size - off, i.e. from the start of the physical memory to the end of the portion.
But how much will actually be mapped into this memory region is given by vma->vm_end - vma->vm_start and is represented by vsize. Hence the need to perform the sanity check, since the user could otherwise get more than intended.
Kind regards,
Sanjeev Ranot
"simple_region_start" is the offset from the starting of
physical memory out of which our sub-region needs to be mapped
Example:
off = start of the physical memory (page aligned)= 0xd000 8000
simple_region_start = 0x1000
therefore the physical address of the start of the sub region
we want to map is = 0xd000 8000 + 0x1000
= 0xd000 9000
now virtual size is the portion that needs to be mapped from the
physical memory available. This must be defined properly by the user.
simple_region_size = physical address pointing to last of the
portion that we need to map.
So if we wanted 8KBs to be mapped from the physical memory available
then following is how the calculation goes
simple_region_size = physical address just beyond the last of our portion
simple_region_size = 0xd000 9000 + 0x2000 (for the 8KBs)
simple_region_size = 0xd000 B000
So our 8KBs of portion will range from physical addresses [0xd000 B000 to 0xd000 9000]
Hence physical size i.e. psize = 0x2000
We perform the sanity check i.e
If the size of our portion of physical memory is smaller
than what the user tries to map using the full length
virtual address range of this memory region, then we
raise an exception. i.e say for ex. vsize = 0x3000
Otherwise we use the API "remap_pfn_range" to map the
portion of the physical memory passing in the the
physical address and not the page frame number as
was done previously since this is the IO memory.
I feel it should have been the API "io_remap_page_range"
here the aforementioned.
So it will map the portion of physical memory starting
from the physical address 0xd000 9000 on the the user
linear address starting from vma->vm_start of vsize.
N.B As before I have yet to test this out !
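For concreteness, here is a hedged sketch of the mmap handler this discussion describes, modeled on the LDD "simple" example but updated to the modern remap_pfn_range() signature; simple_region_start and simple_region_size are treated as globals set by the user, and this is untested illustration, not the book's exact code:

static int simple_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
    unsigned long physical = simple_region_start + off;
    unsigned long vsize = vma->vm_end - vma->vm_start;
    unsigned long psize = simple_region_size - off;

    if (vsize > psize)
        return -EINVAL; /* user asked for more than the region holds */
    if (remap_pfn_range(vma, vma->vm_start, physical >> PAGE_SHIFT,
                        vsize, vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}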

Linux device driver to allow an FPGA to DMA directly to CPU RAM

I'm writing a linux device driver to allow an FPGA (currently connected to the PC via PCI express) to DMA data directly into CPU RAM. This needs to happen without any interaction and user space needs to have access to the data. Some details:
- Running 64 bit Fedora 14
- System has 8GB of RAM
- The FPGA (Cyclone IV) is on a PCIe card
In an attempt to accomplish this I performed the following:
- Reserved the upper 2GB of RAM in grub with memmap 6GB$2GB (it will not boot if I add mem=2GB). I can see that the upper 2GB of RAM is reserved in /proc/meminfo
- Mapped BAR0 to allow reading and writing to FPGA registers (this works perfectly)
- Implemented an mmap function in my driver with remap_pfn_range()
- Use ioremap to get the virtual address of the buffer
- Added ioctl calls (for testing) to write data to the buffer
- Tested the mmap by making an ioctl call to write data into the buffer and verified the data was in the buffer from user space
The problem I'm facing arises when the FPGA starts to DMA data to the buffer address I provide: I constantly get PTE errors (from DMAR:), or, with the code below, the following error:
DMAR: [DMA Write] Request device [01:00.0] fault addr 186dc5000
DMAR: [fault reason 01] Present bit in root entry is clear
DRHD: handling fault status reg 3
The address in the first line increments by 0x1000 each time based on the DMA from the FPGA
Here's my init() code:
#define IMG_BUF_OFFSET 0x180000000UL // Location in RAM (6GB)
#define IMG_BUF_SIZE   0x80000000UL  // Size of the buffer (2GB)
#define pci_dma_h(addr) ((addr >> 16) >> 16)
#define pci_dma_l(addr) (addr & 0xffffffffUL)

if ((pdev = pci_get_device(FPGA_VEN_ID, FPGA_DEV_ID, NULL))) {
    printk("FPGA Found on the PCIe Bus\n");

    // Enable the device
    if (pci_enable_device(pdev)) {
        printk("Failed to enable PCI device\n");
        return -1;
    }
    // Enable bus mastering
    pci_set_master(pdev);

    pci_read_config_word(pdev, PCI_VENDOR_ID, &id);
    printk("Vendor id: %x\n", id);
    pci_read_config_word(pdev, PCI_DEVICE_ID, &id);
    printk("Device id: %x\n", id);
    pci_read_config_word(pdev, PCI_STATUS, &id);
    printk("Device Status: %x\n", id);
    pci_read_config_dword(pdev, PCI_COMMAND, &temp);
    printk("Command Register: %x\n", temp);
    printk("Resources Allocated:\n");
    pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &temp);
    printk("BAR0: %x\n", temp);

    // Get the starting address of BAR0
    bar0_ptr = (unsigned int *)pcim_iomap(pdev, 0, FPGA_CONFIG_SIZE);
    if (!bar0_ptr) {
        printk("Error mapping BAR0\n");
        return -1;
    }
    printk("Remapped BAR0\n");

    // Set the DMA mask
    if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
        pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
        printk("Device set up for 64-bit DMA\n");
    } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
        pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
        printk("Device set up for 32-bit DMA\n");
    } else {
        printk(KERN_WARNING "No suitable DMA available.\n");
        return -1;
    }

    // Get a kernel virtual address for the reserved RAM
    virt_addr = ioremap(IMG_BUF_OFFSET, IMG_BUF_SIZE);
    kernel_image_buffer_ptr = (unsigned char *)virt_addr;
    memset(kernel_image_buffer_ptr, 0, IMG_BUF_SIZE);
    printk("Remapped image buffer: 0x%p\n", (void *)virt_addr);
}
Here's my mmap code:
unsigned long image_buffer;
unsigned int low;
unsigned int high;

if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                    vma->vm_end - vma->vm_start,
                    vma->vm_page_prot)) {
    return -EAGAIN;
}

image_buffer = (vma->vm_pgoff << PAGE_SHIFT);

if (0 > check_mem_region(IMG_BUF_OFFSET, IMG_BUF_SIZE)) {
    printk("Failed to check region...memory in use\n");
    return -1;
}
request_mem_region(IMG_BUF_OFFSET, IMG_BUF_SIZE, DRV_NAME);

// Get the bus address from the virtual address above
//dma_page = virt_to_page(addr);
//dma_offset = ((unsigned long)addr & ~PAGE_MASK);
//dma_addr = pci_map_page(pdev, dma_page, dma_offset, IMG_BUF_SIZE, PCI_DMA_FROMDEVICE);
//dma_addr = pci_map_single(pdev, image_buffer, IMG_BUF_SIZE, PCI_DMA_FROMDEVICE);
//dma_addr = IMG_BUF_OFFSET;
//printk("DMA Address: 0x%p\n", (void *)dma_addr);

// Write the start of the image buffer address to the FPGA
low = pci_dma_l(image_buffer);
low &= 0xfffffffc;
high = pci_dma_h(image_buffer);
if (high != 0)
    low |= 0x00000001;

*(bar0_ptr + (17024 / 4)) = 0;
//printk("DMA Address LOW : 0x%x\n", cpu_to_le32(low));
//printk("DMA Address HIGH: 0x%x\n", cpu_to_le32(high));
*(bar0_ptr + (4096 / 4)) = cpu_to_le32(low); //2147483649;
*(bar0_ptr + (4100 / 4)) = cpu_to_le32(high);
*(bar0_ptr + (17052 / 4)) = cpu_to_le32(low & 0xfffffffe); //2147483648;

printk("Process Read Command: Addr:0x%x Ret:0x%x\n", 4096, *(bar0_ptr + (4096 / 4)));
printk("Process Read Command: Addr:0x%x Ret:0x%x\n", 4100, *(bar0_ptr + (4100 / 4)));
printk("Process Read Command: Addr:0x%x Ret:0x%x\n", 17052, *(bar0_ptr + (17052 / 4)));
return 0;
Thank you for any help you can provide.
Do you control the RTL code that writes the TLP packets yourself, or can you name the DMA engine and PCIe BFM (bus functional model) you are using? What do your packets look like in the simulator? Most decent BFMs should trap this rather than let you find it post-deployment with a PCIe hardware capture system.
To target the upper 2GB of RAM you will need to send 2DW (64-bit) addresses from the device. Are the bits in your Fmt/Type field set to do this? The faulting address looks like a masked 32-bit bus address, so something at this level is likely incorrect. Also bear in mind that because PCIe is big-endian, take care when writing the target addresses to the PCIe device endpoint. You might have the lower bytes of the target address dropping into the payload if Fmt is incorrect; again, a decent BFM should spot the resulting packet-length mismatch.
If you have a recent motherboard/modern CPU, the PCIe endpoint should support PCIe AER (Advanced Error Reporting), so if you are running a recent CentOS/RHEL 6.3 you should get a dmesg report of endpoint faults. This is very useful, as the report captures the first handful of DWs of the packet in special capture registers, so you can review the TLP as received.
In your kernel driver, I see you set up the DMA mask, but that is not sufficient, as you have not programmed the IOMMU to allow writes to the pages from the device. Look at the implementation of pci_alloc_consistent() to see what else you should be calling to achieve this.
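As an illustration of that suggestion, here is a hedged sketch of letting the DMA API allocate the buffer (and set up any IOMMU mappings) instead of memmap= plus ioremap(); the buffer size and the way the handles are stored are assumptions, and coherent allocations much larger than this may fail on a fragmented system (see CMA for bigger buffers):

#include <linux/dma-mapping.h>

#define DMA_BUF_SIZE (4 * 1024 * 1024) /* assumed size for illustration */

static void *cpu_addr;        /* kernel virtual address of the buffer */
static dma_addr_t dma_handle; /* bus address to program into the FPGA */

static int setup_dma_buffer(struct pci_dev *pdev)
{
    cpu_addr = dma_alloc_coherent(&pdev->dev, DMA_BUF_SIZE,
                                  &dma_handle, GFP_KERNEL);
    if (!cpu_addr)
        return -ENOMEM;
    /* Write dma_handle (not a raw physical address) to the FPGA's
     * DMA address registers; it is valid behind the IOMMU. */
    return 0;
}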
If you are still looking for the reason, it goes like this:
your kernel has DMA remapping (DMA_REMAPPING) enabled by default, so the IOMMU is throwing the above error because the IOMMU context/domain entries are not programmed for your device.
You can try using intel_iommu=off on the kernel command line, or putting the IOMMU in bypass mode for your device.
Regards,
Samir

Is there any API for determining the physical address from virtual address in Linux?

Is there any API for determining the physical address from virtual address in Linux operating system?
Kernel and user space work with virtual addresses (also called linear addresses) that are mapped to physical addresses by the memory management hardware. This mapping is defined by page tables, set up by the operating system.
DMA devices use bus addresses. On an i386 PC, bus addresses are the same as physical addresses, but other architectures may have special address mapping hardware to convert bus addresses to physical addresses.
In Linux, you can use these functions from asm/io.h:
virt_to_phys(virt_addr);
phys_to_virt(phys_addr);
virt_to_bus(virt_addr);
bus_to_virt(bus_addr);
All this is about accessing ordinary memory. There is also "shared memory" on the PCI or ISA bus. It can be mapped inside a 32-bit address space using ioremap(), and then used via the readb(), writeb() (etc.) functions.
Life is complicated by the fact that there are various caches around, so different ways of accessing the same physical address need not give the same result.
Also, the real physical address behind a virtual address can change. Even more than that: there may be no physical address associated with a virtual address until you access that memory.
As for the user-land API, there are none that I am aware of.
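Before moving on to userland approaches, here is a minimal hedged sketch of the kernel-side functions listed above; it assumes a kmalloc() buffer, the only case in which virt_to_phys() is valid (see the note further down):

#include <linux/module.h>
#include <linux/slab.h>
#include <asm/io.h> /* virt_to_phys() */

static void *vaddr;

static int __init v2p_init(void)
{
    phys_addr_t paddr;

    vaddr = kmalloc(PAGE_SIZE, GFP_KERNEL);
    if (!vaddr)
        return -ENOMEM;
    paddr = virt_to_phys(vaddr); /* valid: kmalloc memory is direct-mapped */
    pr_info("virt %px -> phys %pa\n", vaddr, &paddr);
    return 0;
}

static void __exit v2p_exit(void)
{
    kfree(vaddr);
}

module_init(v2p_init);
module_exit(v2p_exit);
MODULE_LICENSE("GPL");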
/proc/<pid>/pagemap userland minimal runnable example
virt_to_phys_user.c
#define _XOPEN_SOURCE 700
#include <fcntl.h>  /* open */
#include <stdint.h> /* uint64_t */
#include <stdio.h>  /* printf */
#include <stdlib.h> /* size_t */
#include <unistd.h> /* pread, sysconf */

typedef struct {
    uint64_t pfn : 55;
    unsigned int soft_dirty : 1;
    unsigned int file_page : 1;
    unsigned int swapped : 1;
    unsigned int present : 1;
} PagemapEntry;

/* Parse the pagemap entry for the given virtual address.
 *
 * @param[out] entry      the parsed entry
 * @param[in]  pagemap_fd file descriptor to an open /proc/pid/pagemap file
 * @param[in]  vaddr      virtual address to get entry for
 * @return 0 for success, 1 for failure
 */
int pagemap_get_entry(PagemapEntry *entry, int pagemap_fd, uintptr_t vaddr)
{
    size_t nread;
    ssize_t ret;
    uint64_t data;
    uintptr_t vpn;

    vpn = vaddr / sysconf(_SC_PAGE_SIZE);
    nread = 0;
    while (nread < sizeof(data)) {
        ret = pread(pagemap_fd, ((uint8_t *)&data) + nread,
                    sizeof(data) - nread, vpn * sizeof(data) + nread);
        nread += ret;
        if (ret <= 0) {
            return 1;
        }
    }
    entry->pfn = data & (((uint64_t)1 << 55) - 1);
    entry->soft_dirty = (data >> 55) & 1;
    entry->file_page = (data >> 61) & 1;
    entry->swapped = (data >> 62) & 1;
    entry->present = (data >> 63) & 1;
    return 0;
}

/* Convert the given virtual address to physical using /proc/PID/pagemap.
 *
 * @param[out] paddr physical address
 * @param[in]  pid   process to convert for
 * @param[in]  vaddr virtual address to get entry for
 * @return 0 for success, 1 for failure
 */
int virt_to_phys_user(uintptr_t *paddr, pid_t pid, uintptr_t vaddr)
{
    char pagemap_file[BUFSIZ];
    int pagemap_fd;

    snprintf(pagemap_file, sizeof(pagemap_file), "/proc/%ju/pagemap",
             (uintmax_t)pid);
    pagemap_fd = open(pagemap_file, O_RDONLY);
    if (pagemap_fd < 0) {
        return 1;
    }
    PagemapEntry entry;
    if (pagemap_get_entry(&entry, pagemap_fd, vaddr)) {
        return 1;
    }
    close(pagemap_fd);
    *paddr = (entry.pfn * sysconf(_SC_PAGE_SIZE)) +
             (vaddr % sysconf(_SC_PAGE_SIZE));
    return 0;
}

int main(int argc, char **argv)
{
    pid_t pid;
    uintptr_t vaddr, paddr = 0;

    if (argc < 3) {
        printf("Usage: %s pid vaddr\n", argv[0]);
        return EXIT_FAILURE;
    }
    pid = strtoull(argv[1], NULL, 0);
    vaddr = strtoull(argv[2], NULL, 0);
    if (virt_to_phys_user(&paddr, pid, vaddr)) {
        fprintf(stderr, "error: virt_to_phys_user\n");
        return EXIT_FAILURE;
    }
    printf("0x%jx\n", (uintmax_t)paddr);
    return EXIT_SUCCESS;
}
GitHub upstream.
Usage:
sudo ./virt_to_phys_user.out <pid> <virtual-address>
sudo is required to read /proc/<pid>/pagemap even if you have file permissions as explained at: https://unix.stackexchange.com/questions/345915/how-to-change-permission-of-proc-self-pagemap-file/383838#383838
As mentioned at: https://stackoverflow.com/a/46247716/895245 Linux allocates page tables lazily, so make sure that you read and write a byte to that address from the test program before using virt_to_phys_user.
How to test it out
Test program:
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum { I0 = 0x12345678 };

static volatile uint32_t i = I0;

int main(void) {
    printf("vaddr %p\n", (void *)&i);
    printf("pid %ju\n", (uintmax_t)getpid());
    while (i == I0) {
        sleep(1);
    }
    printf("i %jx\n", (uintmax_t)i);
    return EXIT_SUCCESS;
}
The test program outputs the address of a variable it owns, and its PID, e.g.:
vaddr 0x600800
pid 110
and then you can convert the virtual address with:
sudo ./virt_to_phys_user.out 110 0x600800
Finally, the conversion can be tested by using /dev/mem to observe/modify the memory, but you can't do this on Ubuntu 17.04 without recompiling the kernel, as it requires CONFIG_STRICT_DEVMEM=n; see also: How to access physical addresses from user space in Linux? Buildroot is an easy way to overcome that, however.
Alternatively, you can use a virtual machine, e.g. the QEMU monitor's xp command: How to decode /proc/pid/pagemap entries in Linux?
See this to dump all pages: How to decode /proc/pid/pagemap entries in Linux?
Userland subset of this question: How to find the physical address of a variable from user-space in Linux?
Dump all process pages with /proc/<pid>/maps
/proc/<pid>/maps lists all the address ranges of the process, so we can walk it to translate all pages: /proc/[pid]/pagemaps and /proc/[pid]/maps | linux
Kerneland virt_to_phys() only works for kmalloc() addresses
From a kernel module, virt_to_phys() has been mentioned.
However, it is important to highlight that it has this limitation; e.g. it fails for module variables. From the arch/x86/include/asm/io.h documentation:
The returned physical address is the physical (CPU) mapping for
the memory address given. It is only valid to use this function on
addresses directly mapped or allocated via kmalloc().
Here is a kernel module that illustrates that, together with a userland test.
So this is not a very general possibility. See: How to get the physical address from the logical one in a Linux kernel module? for kernel-module methods exclusively.
As answered before, normal programs should not need to worry about physical addresses, as they run in a virtual address space with all its conveniences. Furthermore, not every virtual address has a physical address; it may belong to a mapped file or a swapped-out page. However, sometimes it may be interesting to see this mapping, even in userland.
For this purpose, the Linux kernel exposes its mapping to userland through a set of files in /proc. The documentation can be found in the kernel's pagemap documentation. Short summary:
/proc/$pid/maps provides a list of mappings of virtual addresses together with additional information, such as the corresponding file for mapped files.
/proc/$pid/pagemap provides more information about each mapped page, including the physical address if it exists.
This website provides a C program that dumps the mappings of all running processes using this interface and an explanation of what it does.
The suggested C program above usually works, but it can return misleading results in (at least) two ways:
The page is not present (but the virtual address is mapped to a page!). This happens due to lazy mapping by the OS: it maps addresses only when they are actually accessed.
The returned PFN points to some possibly temporary physical page which could be changed soon after due to copy-on-write. For example: for memory mapped files, the PFN can point to the read-only copy. For anonymous mappings, the PFN of all pages in the mapping could be one specific read-only page full of 0s (from which all anonymous pages spawn when written to).
Bottom line is, to ensure a more reliable result: for read-only mappings, read from every page at least once before querying its PFN. For write-enabled pages, write into every page at least once before querying its PFN.
Of course, theoretically, even after obtaining a "stable" PFN, the mappings could always change arbitrarily at runtime (for example when moving pages into and out of swap) and should not be relied upon.
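A hedged sketch of that advice in C; buf, len and the assumption that the mapping is write-enabled are illustrative:

#include <unistd.h> /* sysconf, size_t */

/* Touch every page of a writable mapping before querying its PFN,
 * so lazily-allocated and copy-on-write pages get their final frame. */
static void touch_pages(volatile char *buf, size_t len)
{
    long page_size = sysconf(_SC_PAGE_SIZE);
    size_t off;

    for (off = 0; off < len; off += page_size)
        buf[off] = buf[off]; /* read, then write back */
}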
I wonder why there is no user-land API.
Because a user-land memory address's physical address is unknown.
Linux uses demand paging for user-land memory. Your user-land object will not have physical memory until it is accessed. When the system is short of memory, your user-land object may be swapped out and lose its physical memory, unless the page is locked for the process. When you access the object again, it is swapped in and given physical memory, but it is likely different physical memory than before. You may take a snapshot of the page mapping, but it is not guaranteed to be the same the next moment.
So looking for the physical address of a user-land object is usually meaningless.
