how to traverse page cache tree (radix tree) of a file address space in linux kernel - linux

I need to get page-cache statistics of an open file. There is a address_space pointer(f_mapping) in file struct which in turn has the root of the radix tree called page_tree. I need to traverse that tree to get information about all the cached pages for that open file.
There are some functions like radix_tree_for_each_chunk(to iterate over chunks), radix_tree_for_each_chunk_slot (to iterate over slots in one chunk) etc, using these the functionality can be achieved. I am unsure about the proper use (arguments) of the same. It would be helpful if any example is posted.

I figured it out from Linux kernel source code.
struct file *file = filp_open("filename",O_RDONLY,0);
struct address_space *file_addr_space = file->f_mapping;
if(file_addr_space==NULL){
printk("error")
}
struct radix_tree_root file_page_tree_root = file_addr_space->page_tree; //contains all pages in page cache
struct radix_tree_iter iter;
void **slot;
int num_dirty = 0;
radix_tree_for_each_slot(slot,&file_page_tree_root,&iter,0){
struct page *page = radix_tree_deref_slot(slot);
if(page!=NULL){
//printk("information about page");
}
}

Related

Kernel API to get Physical RAM Offset

I'm writing a device driver (for Linux kernel 2.6.x) that interacts directly with physical RAM using physical addresses. For my device's memory layout (according to the output of cat /proc/iomem), System RAM begins at physical address 0x80000000; however, this code may run on other devices with different memory layouts so I don't want to hard-code that offset.
Is there a function, macro, or constant which I can use from within my device driver that gives me the physical address of the first byte of System RAM?
Is there a function, macro, or constant which I can use from within my device driver that gives me the physical address of the first byte of System RAM?
It doesn't matter, because you're asking an XY question.
You should not be looking for or trying to use the "first byte of System RAM" in a device driver.
The driver only needs knowledge of the address (and length) of its register block (that is what this "memory" is for, isn't it?).
In 2.6 kernels (i.e. before Device Tree), this information was typically passed to drivers through struct resource and struct platform_device definitions in a board_devices.c file.
The IORESOURCE_MEM property in the struct resource is the mechanism to pass the device's memory block start and end addresses to the device driver.
The start address is typically hardcoded, and taken straight from the SoC datasheet or the board's memory map.
If you change the SoC, then you need new board file(s).
As an example, here's code from arch/arm/mach-at91/at91rm9200_devices.c to configure and setup the MMC devices for a eval board (AT91RM9200_BASE_MCI is the physical memory address of this device's register block):
#if defined(CONFIG_MMC_AT91) || defined(CONFIG_MMC_AT91_MODULE)
static u64 mmc_dmamask = DMA_BIT_MASK(32);
static struct at91_mmc_data mmc_data;
static struct resource mmc_resources[] = {
[0] = {
.start = AT91RM9200_BASE_MCI,
.end = AT91RM9200_BASE_MCI + SZ_16K - 1,
.flags = IORESOURCE_MEM,
},
[1] = {
.start = AT91RM9200_ID_MCI,
.end = AT91RM9200_ID_MCI,
.flags = IORESOURCE_IRQ,
},
};
static struct platform_device at91rm9200_mmc_device = {
.name = "at91_mci",
.id = -1,
.dev = {
.dma_mask = &mmc_dmamask,
.coherent_dma_mask = DMA_BIT_MASK(32),
.platform_data = &mmc_data,
},
.resource = mmc_resources,
.num_resources = ARRAY_SIZE(mmc_resources),
};
void __init at91_add_device_mmc(short mmc_id, struct at91_mmc_data *data)
{
if (!data)
return;
/* input/irq */
if (data->det_pin) {
at91_set_gpio_input(data->det_pin, 1);
at91_set_deglitch(data->det_pin, 1);
}
if (data->wp_pin)
at91_set_gpio_input(data->wp_pin, 1);
if (data->vcc_pin)
at91_set_gpio_output(data->vcc_pin, 0);
/* CLK */
at91_set_A_periph(AT91_PIN_PA27, 0);
if (data->slot_b) {
/* CMD */
at91_set_B_periph(AT91_PIN_PA8, 1);
/* DAT0, maybe DAT1..DAT3 */
at91_set_B_periph(AT91_PIN_PA9, 1);
if (data->wire4) {
at91_set_B_periph(AT91_PIN_PA10, 1);
at91_set_B_periph(AT91_PIN_PA11, 1);
at91_set_B_periph(AT91_PIN_PA12, 1);
}
} else {
/* CMD */
at91_set_A_periph(AT91_PIN_PA28, 1);
/* DAT0, maybe DAT1..DAT3 */
at91_set_A_periph(AT91_PIN_PA29, 1);
if (data->wire4) {
at91_set_B_periph(AT91_PIN_PB3, 1);
at91_set_B_periph(AT91_PIN_PB4, 1);
at91_set_B_periph(AT91_PIN_PB5, 1);
}
}
mmc_data = *data;
platform_device_register(&at91rm9200_mmc_device);
}
#else
void __init at91_add_device_mmc(short mmc_id, struct at91_mmc_data *data) {}
#endif
ADDENDUM
i'm still not seeing how this is an xy question.
I consider it an XY question because:
You conflate "System RAM" with physical memory address space.
"RAM" would be actual (readable/writable) memory that exists in the address space.
"System memory" is the RAM that the Linux kernel manages (refer to your previous question).
Peripherals can have registers and/or device memory in (physical) memory address space, but this should not be called "System RAM".
You have not provided any background on how or why your driver "interacts directly with physical RAM using physical addresses." in a manner that is different from other Linux drivers.
You presume that a certain function is the solution for your driver, but you don't know the name of that function. That's a prototype for an XY question.
can't i call a function like get_platform_device (which i just made up) to get the struct platform_device and then find the struct resource that represents System RAM?
The device driver would call platform_get_resource() (in its probe function) to retrieve its struct resource that was defined in the board file.
To continue the example started above, the driver's probe routune has:
static int __init at91_mci_probe(struct platform_device *pdev)
{
struct mmc_host *mmc;
struct at91mci_host *host;
struct resource *res;
int ret;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
return -ENXIO;
if (!request_mem_region(res->start, resource_size(res), DRIVER_NAME))
return -EBUSY;
...
/*
* Map I/O region
*/
host->baseaddr = ioremap(res->start, resource_size(res));
if (!host->baseaddr) {
ret = -ENOMEM;
goto fail1;
}
that would allow me to write code that can always access the nth byte of RAM, without assumptions of how RAM is arranged in relation to other parts of memory.
That reads like a security hole or a potential bug.
I challenge you to to find a driver in the mainline Linux kernel that uses the "physical address of the first byte of System RAM".
Your title is "Kernel API to get Physical RAM Offset".
The API you are looking would seem to be the struct resource.
What you want to do seems to fly in the face of Linux kernel conventions. For the integrity and security of the system, drivers do not try to access any/every part of memory.
The driver will request and can be given exclusive access to the address space of its registers and/or device memory.
All RAM under kernel management is only accessed through well-defined conventions, such as buffer addresses for copy_to_user() or the DMA API.
A device driver simply does not have free reign to access any part of memory it chooses.
Once a driver is started by the kernel, there is absolutely no way it can disregard "assumptions of how RAM is arranged".
RAM memory mapping is based identical to each SOC or processor. Vendor usually provide their memory mapping related documents to the user.
Probably you need to refer your processor datasheet related document.
As you said memory is hard-coded. In most cases memory mapping is hard-coded over device-tree or in SDRAM driver itself.
Maybe you could look into memblock struct and memblock_start_of_DRAM().
/sys/kernel/debug/memblock/memory represent memory banks.
0: 0x0000000940000000..0x0000000957bb5fff
RAM bank 0: 0x0000000940000000 0x0000000017bb6000
1: 0x0000000980000000..0x00000009ffffffff
RAM bank 1: 0x0000000980000000 0x0000000080000000

How to get the page directory of a pointer in xv6

Here is my 'translate()' in proc.c
I want to get the physical address given the virtual address of a pointer, but I don't know how to get the pointers pgdir(page directory)...
int translate(void* vaddr)
{
cprintf("vaddr = %p\n",vaddr);
int paddr;
pde_t *pgdir;
pte_t *pgtab;
pde_t *pde;
pte_t *pte;
pgdir = (pde_t*)cpu->ts.cr3;
cprintf("page directory base is: %p\n",cpu->ts.cr3);
pde = &pgdir[PDX(vaddr)];
if(*pde & PTE_P){
pgtab = (pte_t*)P2V(PTE_ADDR(*pde));
}else{
cprintf("pde = %d\n",*pde);
cprintf("PTE_P = %d\n",PTE_P);
cprintf("pte not present\n");
return -1;
}
pte = &pgtab[PTX(vaddr)];
paddr = PTE_ADDR(*pte);
cprintf("the virtual address is %p\n",vaddr);
cprintf("the physical address is %d\n",paddr);
return 0;
}
You need to use argint() or argptr() to read the argument.
There's a global variable proc that lives in proc.h.
proc.h, lines 34 to 43
// Per-CPU variables, holding pointers to the
// current cpu and to the current process.
// The asm suffix tells gcc to use "%gs:0" to refer to cpu
// and "%gs:4" to refer to proc. seginit sets up the
// %gs segment register so that %gs refers to the memory
// holding those two variables in the local cpu's struct cpu.
// This is similar to how thread-local variables are implemented
// in thread libraries such as Linux pthreads.
extern struct cpu *cpu asm("%gs:0"); // &cpus[cpunum()]
extern struct proc *proc asm("%gs:4"); // cpus[cpunum()].proc
You can reference proc->pgdir in proc.c or anywhere else that proc.h is included.
For what it's worth your translate function looks good to me.

How to attach per open() data in a device driver?

I have some code which looks like:
static int devname_read(struct cdev *dev, struct uio *uio, int ioflag)
{
int error = modify_state();
return (error);
}
The issue here is that modify_state() operates on global state when it really should be operating on is per open(2). In other words no reader should conflict with each other, and nothing persist when the device is close(2)ed.
How can I associate state with the file-descriptor or related identifier?
You probably want to use cdevpriv; see http://www.freebsd.org/cgi/man.cgi?devfs_set_cdevpriv.

Ripping out the hidden kernel module by reading kernel memory directly?

Is it possible to find hidden kernel modules by reading kernel memory directly?
By hiding I mean a LKM that removes itself from the kernel module list.
If so, what structure should I expect, or what document should I read?
following #Eugene, I find a way to read kernel memory directly to find the so called not-so-clever hidden module: just compare the module from both procfs perspective and sysfs perspective:
static int detect_hidden_mod_init(void)
{
char *procfs_modules[MAX_MODULE_SIZE];
char *sysfs_modules[MAX_MODULE_SIZE];
int proc_module_index = 0, sys_module_index = 0;
struct module *mod;
struct list_head *p;
// get modules from procfs perspective
list_for_each(p, &__this_module.list){
mod = list_entry(p, struct module, list);
procfs_modules[proc_module_index++] = mod->name;
}
// get modules from sysfs perspective
struct kobject *kobj;
struct kset *kset = __this_module.mkobj.kobj.kset;
list_for_each(p, &kset->list) {
kobj = container_of(p, struct kobject, entry);
sysfs_modules[sys_module_index++] = kobj->k_name;
}
//compare the procfs_modules and sysfs_modules
...
}
Actually it can detect most of current module-hidden rootkit, however as Eugene said, "A clever rootkit could try to hide that data as well". So it is not a perfect way.

What does get_current() return in this kernel module?

I have written a kernel module which reads and writes /proc files, and it is working fine. Now I want to use permissions with it, but when I write the function for permissions shown below it gives me an error. The goal is for everyone to be able to read the file but only root can write to it.
int my_permission(struct inode *inode, int op)
{
if(op == 4||(op == 2 && current->euid = 0)) //euid is not a member of task_struct
return 0;
return -EACCES;
}
const struct inode_operations my_iops = {
.permission = my_permission,
};
The error I'm getting is:
/home/karan/practice/procf/testproc1.c: In function ‘my_permission’:
/home/karan/practice/procf/testproc1.c:50:32: error: ‘struct task_struct’ has no member named ‘euid'
I know that current is #defined to get_current(). Why is this happening? Is there a list of members of the struct returned from get_current()?
The struct task_struct is defined in include/linux/sched.h in the kernel source tree, you can view the members there. The current credentials would be in get_current()->cred , and the effective user id is get_current()->cred->euid
It's not safe to access those members directly, you must rather call current_euid() from include/linux/cred.h
http://www.kernel.org/doc/Documentation/security/credentials.txt might be of interest to you as well

Resources