how list_head structure used for scheduling in kernel - linux

I have read a few things from which I can make out that instead of scheduling a task with a scheduling policy it is better that we schedule an entity with a scheduling policy. The advantages being that you can schedule many things with the same scheduling policy. So there are two entities defined for two scheduling policies(CFS and RT) namely as sched_entity and sched_rt_entity. The code for CFS entity is (from v3.5.4)
struct sched_entity {
struct load_weight load; /* for load-balancing */
struct rb_node run_node;
struct list_head group_node;
unsigned int on_rq;
u64 exec_start;
u64 sum_exec_runtime;
u64 vruntime;
u64 prev_sum_exec_runtime;
u64 nr_migrations;
#ifdef CONFIG_SCHEDSTATS
struct sched_statistics statistics;
#endif
#ifdef CONFIG_FAIR_GROUP_SCHED
struct sched_entity *parent;
/* rq on which this entity is (to be) queued: */
struct cfs_rq *cfs_rq;
/* rq "owned" by this entity/group: */
struct cfs_rq *my_q;
#endif
};
and for RT(real time) entity is
struct sched_rt_entity {
struct list_head run_list;
unsigned long timeout;
unsigned int time_slice;
struct sched_rt_entity *back;
#ifdef CONFIG_RT_GROUP_SCHED
struct sched_rt_entity *parent;
/* rq on which this entity is (to be) queued: */
struct rt_rq *rt_rq;
/* rq "owned" by this entity/group: */
struct rt_rq *my_q;
#endif
};
Both of these uses the list_head structures defined in ./include/linux/types.h
struct list_head {
struct list_head *next, *prev;
};
I honestly do not understand how any such thing is going to be scheduled. Can anyone explain how this is working.
P.S.:
Moreover, I am really having a hard time understanding the meaning of the names of data members. Can anyone suggest a good read for understanding kernel structures so that I can figure out these things a bit easily. Most of the time I spend is wasted in searching what a data member could mean.

Scheduling entities were introduced in order to implement group scheduling, so that CFS (or RT scheduler) will provide fair CPU time for individual tasks but also fair CPU time to groups of tasks. Scheduling entity may be either a task or group of tasks.
struct list_head is just Linux way to implement linked list. In the code you posted fields group_node and run_list allow to create lists of struct sched_entity and struct sched_rt_entity. More information can be found here.
Using these list_heads scheduling entities are stored in certain scheduler related data structures, for example cfs_rq.cfs_tasks if an entity is a task enqueued using account_entity_enqueue().
Always up to date documentation of Linux kernel can be found within its sources. In this case you should check this directory and especially this file which describes CFS. There is also an explanation of task groups.
EDIT: task_struct contains a field se of type struct sched_entity. Then, having an address to a sched_entity object using container_of macro it is possible to retrieve an address to the task_struct object, see task_of(). (address of sched_entity object - offset of se in task_struct = address of task_struct object) This is quite common trick used also in the implementation of lists I mentioned earlier in this answer.

Related

Judge whether the current process is running when SMP is turned on

I found that there is a function called task_running in the kernel, and its judgment logic is as follows
static inline int task_current(struct rq *rq, struct task_struct *p)
{
return rq->curr == p;
}
static inline int task_running(struct rq *rq, struct task_struct *p)
{
#ifdef CONFIG_SMP
return p->on_cpu;
#else
return task_current(rq, p);
#endif
}
Is there any difference between rq->curr and p->on_cpu? I think they both mean that the process is being scheduled by the current cpu. Why are separate judgments required under SMP?
On a multiprocessor system (CONFIG_SMP) the scheduler usually needs to continuously acquire and releases the spinlocks p->pi_lock (for tasks i.e. struct task_struct) and rq->lock (for runqueues i.e. struct rq). Since lock contention is one of the main factors of slowdown in the case of multiprocessor systems, the on_cpu field of task_struct was added to avoid acquiring rq->lock to look at rq->curr.
On a uniprocessor system (no CONFIG_SMP), there is only one CPU with one runqeueue, and there is no need to acquire runqueue locks when checking the running task. Since the scheduler will not do any locking anyway, we can also avoid having the on_cpu field in task_struct, saving some memory and also dumping all the code that deals with it. In fact, if you take a look at the code for struct task_struct, you can see:
struct task_struct {
/* ... */
#ifdef CONFIG_SMP
int on_cpu;
/* ... */
#endif
/* ... */
}
Here's the relevant commit that implemented this optimization. It was part of this patchwork by Peter Zijlstra to reduce lock contention in the scheduler (expand the "related" field to list everything).

How can I retrieve a task's sessionid in an eBPF program?

I want to retrieve the sessionid from a task struct in an eBPF program. I have the following code in my eBPF program:
struct task_struct *task;
u32 sessionid;
task = (struct task_struct *)bpf_get_current_task();
sessionid = task->sessionid;
This runs, but the sessionid always ends up being -1. I read in this answer that I can use task_session to retrieve it, but I get an error about invalid memory access. I believe I need to use bpf_probe_read to move the task_struct that task points to onto the stack, but I can't get it to work. Is there anything I'm missing?
After a bit more digging through the task_struct struct I realised you could do this:
struct task_struct *task;
struct pid_link pid_link;
struct pid pid;
unsigned int sessionid;
task = (struct task_struct *)bpf_get_current_task();
bpf_probe_read(&pid_link, sizeof(pid_link), (void *)&task->group_leader->pids[PIDTYPE_SID]);
bpf_probe_read(&pid, sizeof(pid), (void *)pid_link.pid);
sessionid = pid.numbers[0].nr;

"current" in Linux kernel code

As I was going through the below chunk of Linux char driver code, I found the structure pointer current in printk.
I want to know what structure the current is pointing to and its complete elements.
What purpose does this structure serve?
ssize_t sleepy_read (struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
printk(KERN_DEBUG "process %i (%s) going to sleep\n",
current->pid, current->comm);
wait_event_interruptible(wq, flag != 0);
flag = 0;
printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
return 0;
}
It is a pointer to the current process ie, the process which has issued the system call.
From the docs:
The Current Process
Although kernel modules don't execute sequentially as applications do,
most actions performed by the kernel are related to a specific
process. Kernel code can know the current process driving it by
accessing the global item current, a pointer to struct task_struct,
which as of version 2.4 of the kernel is declared in
<asm/current.h>, included by <linux/sched.h>. The current pointer
refers to the user process currently executing. During the execution
of a system call, such as open or read, the current process is the one
that invoked the call. Kernel code can use process-specific
information by using current, if it needs to do so. An example of this
technique is presented in "Access Control on a Device File", in
Chapter 5, "Enhanced Char Driver Operations".
Actually, current is not properly a global variable any more, like it
was in the first Linux kernels. The developers optimized access to the
structure describing the current process by hiding it in the stack
page. You can look at the details of current in <asm/current.h>. While
the code you'll look at might seem hairy, we must keep in mind that
Linux is an SMP-compliant system, and a global variable simply won't
work when you are dealing with multiple CPUs. The details of the
implementation remain hidden to other kernel subsystems though, and a
device driver can just include and refer to the
current process.
From a module's point of view, current is just like the external
reference printk. A module can refer to current wherever it sees fit.
For example, the following statement prints the process ID and the
command name of the current process by accessing certain fields in
struct task_struct:
printk("The process is \"%s\" (pid %i)\n",
current->comm, current->pid);
The command name stored in current->comm is the base name of the
program file that is being executed by the current process.
Here is the complete structure the "current" is pointing to
task_struct
Each task_struct data structure describes a process or task in the system.
struct task_struct {
/* these are hardcoded - don't touch */
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
long counter;
long priority;
unsigned long signal;
unsigned long blocked; /* bitmap of masked signals */
unsigned long flags; /* per process flags, defined below */
int errno;
long debugreg[8]; /* Hardware debugging registers */
struct exec_domain *exec_domain;
/* various fields */
struct linux_binfmt *binfmt;
struct task_struct *next_task, *prev_task;
struct task_struct *next_run, *prev_run;
unsigned long saved_kernel_stack;
unsigned long kernel_stack_page;
int exit_code, exit_signal;
/* ??? */
unsigned long personality;
int dumpable:1;
int did_exec:1;
int pid;
int pgrp;
int tty_old_pgrp;
int session;
/* boolean value for session group leader */
int leader;
int groups[NGROUPS];
/*
* pointers to (original) parent process, youngest child, younger sibling,
* older sibling, respectively. (p->father can be replaced with
* p->p_pptr->pid)
*/
struct task_struct *p_opptr, *p_pptr, *p_cptr,
*p_ysptr, *p_osptr;
struct wait_queue *wait_chldexit;
unsigned short uid,euid,suid,fsuid;
unsigned short gid,egid,sgid,fsgid;
unsigned long timeout, policy, rt_priority;
unsigned long it_real_value, it_prof_value, it_virt_value;
unsigned long it_real_incr, it_prof_incr, it_virt_incr;
struct timer_list real_timer;
long utime, stime, cutime, cstime, start_time;
/* mm fault and swap info: this can arguably be seen as either
mm-specific or thread-specific */
unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
int swappable:1;
unsigned long swap_address;
unsigned long old_maj_flt; /* old value of maj_flt */
unsigned long dec_flt; /* page fault count of the last time */
unsigned long swap_cnt; /* number of pages to swap on next pass */
/* limits */
struct rlimit rlim[RLIM_NLIMITS];
unsigned short used_math;
char comm[16];
/* file system info */
int link_count;
struct tty_struct *tty; /* NULL if no tty */
/* ipc stuff */
struct sem_undo *semundo;
struct sem_queue *semsleeping;
/* ldt for this task - used by Wine. If NULL, default_ldt is used */
struct desc_struct *ldt;
/* tss for this task */
struct thread_struct tss;
/* filesystem information */
struct fs_struct *fs;
/* open file information */
struct files_struct *files;
/* memory management info */
struct mm_struct *mm;
/* signal handlers */
struct signal_struct *sig;
#ifdef __SMP__
int processor;
int last_processor;
int lock_depth; /* Lock depth.
We can context switch in and out
of holding a syscall kernel lock... */
#endif
};

How many items can be queued in the waitqueu in linux?

I want to know what is maximum size of waitqueue. How to know upper limit of waitqueue
regards
Sobin
According to the source , the waitqueue is just a task list, so there is not max size:
struct __wait_queue_head {
spinlock_t lock;
struct list_head task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

What is cron job's server usage?

Alright,
I'm thinking of creating a webscript that depends on cronjob.. I'm wondering, would it ever make any server damages for the amount of crontabs ?
lets say i have 50 crontabs to be done everyday, would it ever hurt the server ?
if no, what's the max amount of crontabs to be added in a linux server # 512MB memory
When you create a new job the cron daemon call the function job_add (job.c), this function alloc the memory to the job and add it to the tail of the job list.
The job is allocated on the heap, so theorically you're limited just by the RAM installed on your machine.
Some notes from the CRON code:
The job structure:
typedef struct _job {
struct _job *next;
entry *e;
user *u;
} job;
Each user crontab entry is defined by:
typedef struct _entry {
struct _entry *next;
uid_t uid;
gid_t gid;
char **envp;
char *cmd;
bitstr_t bit_decl(minute, MINUTE_COUNT);
bitstr_t bit_decl(hour, HOUR_COUNT);
bitstr_t bit_decl(dom, DOM_COUNT);
bitstr_t bit_decl(month, MONTH_COUNT);
bitstr_t bit_decl(dow, DOW_COUNT);
int flags;
#define DOM_STAR 0x01
#define DOW_STAR 0x02
#define WHEN_REBOOT 0x04
} entry;
And the user struct:
typedef struct _user {
struct _user *next, *prev; /* links */
char *name;
time_t mtime; /* last modtime of crontab */
entry *crontab; /* this person's crontab */
} user;
You can see that this structs does not cosume a lot of memory.
If you're curious about how the implementation of cron work, you can see the code here : cron ubuntu source.

Resources