I want to read task_struct from my user-level code, without writing my own syscall to get into the kernel.
Is there any system call available for mapping task_struct into user space?
Background:
I am a beginner in the area of the Linux kernel. I just started to learn about it by reading the book 'Linux Kernel Development - Third Edition' by Robert Love. Most of the explanations in this book are based on Linux kernel 2.6.34.
Hence, I am sorry if this is a repeated question, but I could not find any info related to this on Stack Overflow.
Question:
What I understood from the book is that each thread in Linux has a structure called 'thread_info', which has a pointer to its process/task.
This 'thread_info' is stored at the end of the kernel stack of each live thread,
and it has a pointer to the task it belongs to, as below.
struct thread_info {
struct task_struct *task;
...
};
But when I checked the same structure in the latest Linux code (https://elixir.bootlin.com/linux/v5.16-rc1/source/arch/x86/include/asm/thread_info.h), I saw a very different thread_info structure, shown below. It does not have a 'task_struct' pointer in it.
struct thread_info {
unsigned long flags; /* low level flags */
unsigned long syscall_work; /* SYSCALL_WORK_ flags */
u32 status; /* thread synchronous flags */
#ifdef CONFIG_SMP
u32 cpu; /* current CPU */
#endif
};
My question is: if the 'thread_info' structure does not point to its related task structure here, then how does the kernel find the information about the thread's address space?
Also, if you know any good book on the latest Linux kernel, please provide links.
The pointer to the current task_struct object is stored in an architecture-dependent way. On x86 it is stored in a per-CPU variable:
DECLARE_PER_CPU(struct task_struct *, current_task);
(In arch/x86/include/asm/current.h).
To find out how the current task_struct is stored on a particular architecture and/or in a particular kernel version, just search for the implementation of the current macro: that macro is responsible for returning a pointer to the task_struct of the current process.
I have a question about getuid() and geteuid() in Linux.
I know that getuid() returns the real user ID of the current process, and that geteuid() returns its effective user ID.
My question is where the information about these IDs is stored. Apart from the existence of /etc/passwd, I think every process should store its own ID information somewhere.
If I'm right, please tell me where the information is stored (say, in an area like the stack). If I'm wrong, how does the process get its ID?
This is something maintained by the kernel in its internal in-memory structures.
The Linux kernel uses a structure called struct task_struct:
Every process under Linux is dynamically allocated a struct task_struct structure.
In Linux kernel 4.12.10 this is defined as follows:
task_struct.h:
struct task_struct {
...
/* Objective and real subjective task credentials (COW): */
const struct cred __rcu *real_cred;
/* Effective (overridable) subjective task credentials (COW): */
const struct cred __rcu *cred;
cred.h:
struct cred {
...
kuid_t uid; /* real UID of the task */
kgid_t gid; /* real GID of the task */
kuid_t suid; /* saved UID of the task */
kgid_t sgid; /* saved GID of the task */
kuid_t euid; /* effective UID of the task */
kgid_t egid; /* effective GID of the task */
kuid_t fsuid; /* UID for VFS ops */
kgid_t fsgid; /* GID for VFS ops */
These structures cannot be accessed directly by a user space process. To get this information, such processes have to use either system calls (such as getuid() and geteuid()) or the /proc file system.
Read Advanced Linux Programming and perhaps Operating System: Three Easy Pieces (both are freely downloadable).
(several books are needed to answer your question)
getuid(2) is (like getpid(2) and many others) a system call provided and implemented by the Linux kernel. syscalls(2) is a list of them.
(please take time to read more about system calls in general)
where the informations about id are stored.
The kernel manages data describing every process (in kernel memory, see NPE's answer for details). Each system call is a primitive atomic operation (from user-space perspective) and returns a result (usually in some register, not in memory). Read about CPU modes.
So that information is not in the user-level virtual address space of the process; it is returned by the kernel at every invocation of getuid.
I wonder where Linux kernel keeps 'ruid' and 'euid'.
Below is what I know about them.
When a user runs a file and the file becomes a process, the process gets a ruid and an euid.
If the file has the setuid bit set, the euid of the process changes to the user ID of the owner of that file; if not, the euid does not change and stays the same as the ruid.
Then, the Linux kernel allows the process to run another process or use other resources in the system according to its ruid and euid.
So, I think that means the kernel has to keep the ruid and euid of each process somewhere in RAM.
I thought the 'somewhere' is in the PCB, but the PCB does not have fields for ruid and euid.
I tried to find them in the process files under the '/proc' directory, but failed.
Where does Linux keep ruid and euid of running processes?
Here is an explanation of how it works in new kernels:
From the user-space point of view, the real and effective user IDs can be changed using the setreuid() syscall. See man 2 setreuid for usage details.
The kernel uses struct cred to store the UID and EUID.
Each process has its own struct cred; take a look at the .cred field in struct task_struct.
The RUID is stored in the .uid field of struct cred and the EUID in the .euid field; see the setreuid() syscall code:
struct cred *new;
kuid_t kruid, keuid;
...
kruid = make_kuid(ns, ruid);
keuid = make_kuid(ns, euid);
...
new->uid = kruid;
new->euid = keuid;
...
return commit_creds(new);
The commit_creds() function is what actually installs the new RUID and EUID on the current process.
See also this answer to get a clue about older kernels: How to get current process's UID and EUID in Linux Kernel 4.2?
Can you please summarize the events/steps that happen when I execute a read()/write() system call? How does the kernel know which file system to issue these commands to?
Let's say a process calls write().
Then it will call sys_write().
Now, since sys_write() is executed on behalf of the current process, it can probably access the struct task_struct, and hence the struct files_struct and struct fs_struct, which contain file system information.
But after that I do not see how this fs_struct helps to identify the file system.
Edit: Now that Alex has described the flow, I still have a doubt about how the reads/writes get routed to a file system. Since the VFS does not do it, it must be happening somewhere else. Also, how do the underlying block device and, finally, the hardware protocol (PCI/USB) get attached?
A simple flow chart involving actual data structures would be helpful
Please help.
This answer is based on kernel version 4.0. I traced out some of the code which handles a read syscall. I recommend you clone the Linux source repo and follow along in the source code.
Syscall handler for read, at fs/read_write.c:620 is called. It receives a file descriptor (integer) as an argument, and calls fdget_pos to convert it to a struct fd.
fdget_pos calls __fdget_pos, which calls __fdget, which calls __fget_light. __fget_light uses current->files, the file descriptor table of the current process, to look up the struct file which corresponds to the passed file descriptor number.
Back in the syscall handler, the file struct is passed to vfs_read, at fs/read_write.c:478.
vfs_read calls __vfs_read, which calls file->f_op->read. From here on, you are in filesystem-specific code.
So the VFS doesn't really bother "identifying" the filesystem which a file lives on; it simply uses the table of "file operation" function pointers which is stored in its struct file. When that struct file is initialized, it is given the correct f_op function pointer table which implements all the filesystem-specific operations for its filesystem.
Each filesystem registers itself with the VFS. When a filesystem is mounted, its superblock is read and the VFS superblock is populated with this information. The function pointer table for this filesystem is also populated at this time. When the file->f_op->read call happens, the function registered by the filesystem is what actually gets called. You can refer to the text at http://www.science.unitn.it/~fiorella/guidelinux/tlk/node102.html
I need to know how to write a system call that blocks (locks) and unblocks (unlocks) a file (inode) or a partition (super_block) for read and write operations.
Example: these function are in fs.h
lock_super(struct super_block *);
unlock_super(struct super_block *);
How to obtain the super_block (/dev/sda1 for example)?
The lock_super and unlock_super calls are not meant to be controlled directly by user-level processes. They are meant to be called only by the VFS layer, when a user process invokes an operation on the filesystem (an operation on an inode). If you still wish to do that, you have to write your own device driver and expose the desired functionality (locking and unlocking of the inode) to user level.
There are currently no system calls that would allow you to lock or unlock inodes. There are many reasons why it is unwise to implement a new system call without due consideration. But if you wish to do that, you would need to write the handler for your own system call in the kernel. It seems you want fine-grained control over the filesystem; perhaps you are implementing a user-level filesystem.
As for how to get the super_block: every filesystem module registers itself with the VFS (Virtual File System). The VFS acts as an intermediate layer between the user and the actual filesystem, so it is the VFS that knows about the function pointers to the lock_super and unlock_super methods. The VFS superblock contains the "device info" and a "set of pointers to the file-system superblock". You can get those pointers from there and call them. But remember that, because the actual filesystem is managed by the VFS, you could corrupt its data by doing so.