This question already has an answer here:
volatile variable using in making application
(1 answer)
Closed 8 years ago.
I am new in this field. Previously i was doing microcontroller programming. where I used in volatile variable to avoid compiler optimization. But I never saw such volatile declaration before variable declaration.Does it mean compilation is done without any optimization in arago build. Here I have two doubts.
How can I enable different types of optimization during compilation
like speed and space optimization in angstrom build?
If it already optimization compilation, why do not we need volatile declaration?
Optimization is typically controlled via compiler settings - such as the compiler command line. It is not controlled in code.
However for doing optimization the compiler assumes that variables behave like "normal variables" while the code is not interrupted.
This may lead to the following error: Some example code:
int a;
void myFunc(void)
{
a=1;
/* Wait until the interrupt sets a back to 0 */
while(a==1);
}
void interruptHandler(void)
{
/* Some hardware interrupt */
if(a==1) doSomeAction();
a=0;
}
The compiler assumes that there are no interrupts. Therefore it would see that
"a" is set to 1 and never changed before the "while" loop
The while loop is an endless loop because "a" does not change whithin this loop
"a" is never read before this endless loop
Therefore the optimizing compiler may change the code internally like this:
void myFunc(void)
{
while(1);
}
Leaving the "volatile" away may work but may not work.
If you do not have hardware interrupts (and no parallel threads, multi-core CPUs etc.) "volatile" makes the code only slower and has no benefit because it is not required.
Related
Kprobe has a pre-handler function vaguely documented as followed:
User's pre-handler (kp->pre_handler)::
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int pre_handler(struct kprobe *p, struct pt_regs *regs);
Called with p pointing to the kprobe associated with the breakpoint,
and regs pointing to the struct containing the registers saved when
the breakpoint was hit. Return 0 here unless you're a Kprobes geek.
I was wondering if one can use this function (or any other Kprobe feature) to prevent a process from being executed \ forked.
As documented in the kernel documentation, you can change the execution path by changing the appropriate register (e.g., IP register in x86):
Changing Execution Path
-----------------------
Since kprobes can probe into a running kernel code, it can change the
register set, including instruction pointer. This operation requires
maximum care, such as keeping the stack frame, recovering the execution
path etc. Since it operates on a running kernel and needs deep knowledge
of computer architecture and concurrent computing, you can easily shoot
your foot.
If you change the instruction pointer (and set up other related
registers) in pre_handler, you must return !0 so that kprobes stops
single stepping and just returns to the given address.
This also means post_handler should not be called anymore.
Note that this operation may be harder on some architectures which use
TOC (Table of Contents) for function call, since you have to setup a new
TOC for your function in your module, and recover the old one after
returning from it.
So you might be able to block a process' execution by jumping over some code. I wouldn't recommend it; you're more likely to cause a kernel crash than to succeed in stopping the execution of a new process.
seccomp-bpf is probably better suited for your use case. This StackOverflow answer gives you all the information you need to leverage seccomp-bpf.
The following implementation from Wikipedia:
volatile unsigned int produceCount = 0, consumeCount = 0;
TokenType buffer[BUFFER_SIZE];
void producer(void) {
while (1) {
while (produceCount - consumeCount == BUFFER_SIZE)
sched_yield(); // buffer is full
buffer[produceCount % BUFFER_SIZE] = produceToken();
// a memory_barrier should go here, see the explanation above
++produceCount;
}
}
void consumer(void) {
while (1) {
while (produceCount - consumeCount == 0)
sched_yield(); // buffer is empty
consumeToken(buffer[consumeCount % BUFFER_SIZE]);
// a memory_barrier should go here, the explanation above still applies
++consumeCount;
}
}
says that a memory barrier must be used between the line that accesses the buffer and the line that updates the Count variable.
This is done to prevent the CPU from reordering the instructions above the fence along-with that below it. The Count variable shouldn't be incremented before it is used to index into the buffer.
If a fence is not used, won't this kind of reordering violate the correctness of code? The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?
Thanks
If a fence is not used, won't this kind of reordering violate the correctness of code? The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?
Good question.
In c++, unless some form of memory barrier is used (atomic, mutex, etc), the compiler assumes that the code is single-threaded. In which case, the as-if rule says that the compiler may emit whatever code it likes, provided that the overall observable effect is 'as if' your code was executed sequentially.
As mentioned in the comments, volatile does not necessarily alter this, being merely an implementation-defined hint that the variable may change between accesses (this is not the same as being modified by another thread).
So if you write multi-threaded code without memory barriers, you get no guarantees that changes to a variable in one thread will even be observed by another thread, because as far as the compiler is concerned that other thread should not be touching the same memory, ever.
What you will actually observe is undefined behaviour.
It seems, that your question is "can incrementing Count and assigment to buffer be reordered without changing code behavior?".
Consider following code tansformation:
int count1 = produceCount++;
buffer[count1 % BUFFER_SIZE] = produceToken();
Notice that code behaves exactly as original one: one read from volatile variable, one write to volatile, read happens before write, state of program is the same. However, other threads will see different picture regarding order of produceCount increment and buffer modifications.
Both compiler and CPU can do that transformation without memory fences, so you need to force those two operations to be in correct order.
If a fence is not used, won't this kind of reordering violate the correctness of code?
Nope. Can you construct any portable code that can tell the difference?
The CPU shouldn't perform increment of Count before it is used to index into buffer. Does the CPU not take care of data dependency while instruction reordering?
Why shouldn't it? What would the payoff be for the costs incurred? Things like write combining and speculative fetching are huge optimizations and disabling them is a non-starter.
If you're thinking that volatile alone should do it, that's simply not true. The volatile keyword has no defined thread synchronization semantics in C or C++. It might happen to work on some platforms and it might happen not to work on others. In Java, volatile does have defined thread synchronization semantics, but they don't include providing ordering for accesses to non-volatiles.
However, memory barriers do have well-defined thread synchronization semantics. We need to make sure that no thread can see that data is available before it sees that data. And we need to make sure that a thread that marks data as able to be overwritten is not seen before the thread is finished with that data.
When I was discussing the behavior of spinlocks in uni- and SMP kernels with some colleagues, we dived into the code and found a line that really surprised us, and we can’t figure out why it’s done this way.
short calltrace to show where we’re coming from:
spin_lock calls raw_spin_lock,
raw_spin_lock calls _raw_spin_lock, and
on a uni-processor system, _raw_spin_lock is #defined as __LOCK
__LOCK is a define:
#define __LOCK(lock) \
do { preempt_disable(); ___LOCK(lock); } while (0)
So far, so good. We disable preemption by increasing the kernel task’s lock counter. I assume this is done to improve performance: since you should not hold a spinlock for more than a very short time, you should just finish your critical section instead of being interrupted and potentially have another task spin its scheduling slice away while waiting for you to finish.
However, now we finally come to my question. The corresponding unlock code looks like this:
#define __UNLOCK(lock) \
do { preempt_enable(); ___UNLOCK(lock); } while (0)
Why would you call preempt_enable() before ___UNLOCK? This seems very unintuitive to us, because you might get preempted immediately after calling preempt_enable, without ever having the chance to release your spinlock. It feels like this renders the whole preempt_disable/preempt_enable logic somewhat ineffective, especially since preempt_disable specifically checks during its call whether the lock counter is 0 again, and then calls the scheduler. It seems to us like it would make so much more sense to first release the lock, then decrease the lock counter and thus potentially enable scheduling again.
What are we missing? What is the idea behind calling preempt_enable before ___UNLOCK instead of the other way round?
You're looking at the uni-processor defines. As the comment in spinlock_api_up.h says (http://lxr.free-electrons.com/source/include/linux/spinlock_api_up.h#L21):
/*
* In the UP-nondebug case there's no real locking going on, so the
* only thing we have to do is to keep the preempt counts and irq
* flags straight, to suppress compiler warnings of unused lock
* variables, and to add the proper checker annotations:
*/
The ___LOCK and ___UNLOCK macros are there for annotation purposes, and unless __CHECKER__ is defined (It is defined by sparse), it ends up to be compiled out.
In other words, preempt_enable() and preempt_disable() are the ones doing the locking in a single processor case.
The following macro is defined in ./kernel/sched/sched.h
#define sched_feat(x) (static_branch_##x(&sched_feat_keys[__SCHED_FEAT_##x]))
#else /* !(SCHED_DEBUG && HAVE_JUMP_LABEL) */
#define sched_feat(x) (sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
#endif /* SCHED_DEBUG && HAVE_JUMP_LABEL */
I am unable to understand what role does it play.
The sched_feat() macro is used in scheduler code to test if a certain scheduler feature is enabled. For example, in kernel/sched/core.c, there is a snippet of code
int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
{
if (!sched_feat(OWNER_SPIN))
return 0;
which is testing whether the "spin-wait on mutex acquisition if the mutex owner is running" feature is set. You can see the full list of scheduler features in kernel/sched/features.h but a short summary is that they are tunables settable at runtime without rebuilding the kernel through /sys/kernel/debug/sched_features.
For example if you have not changed the default settings on your system, you will see "OWNER_SPIN" in your /sys/kernel/debug/sched_features, which means the !sched_feat(OWNER_SPIN) in the snippet above will evaluate to false and the scheduler code will continue on into the rest of the code in mutex_spin_on_owner().
The reason that the macro definition you partially copied is more complicated than you might expect is that it uses the jump labels feature when available and needed to eliminate the overhead of these conditional tests in frequently run scheduler code paths. (The jump label version is only used when HAVE_JUMP_LABEL is set in the config, for obvious reasons, and when SCHED_DEBUG is set because otherwise the scheduler feature bits can't change at runtime) You can follow the link above to lwn.net for more details, but in a nutshell jump labels are a way to use runtime binary patching to make conditional tests of flags much cheaper at the cost of making changing the flags much more expensive.
You can also look at the scheduler commit that introduced jump label use to see how the code used to be a bit simpler but not quite as efficient.
Is there any tool available for tracing communication among threads;
1. running in a single process
2. running in different processes (IPC)
I am presuming you need to trace this for debugging. Under normal circumstances it's hard to do this, without custom written code. For a similar problem that I faced, I had a per-processor tracing buffer, which used to record briefly the time and interesting operation that was performed by the running thread. The log was a circular trace which used to store data like this:
struct trace_data {
int op;
void *data;
struct time t;
union {
struct {
int op1_field1;
int op1_field2;
} d1;
struct {
int op2_field1;
int op2_field2;
} d2
} u;
}
The trace log was an array of these structures of length 1024, one for each processor. Each thread used to trace operations, as well as time to determine causality of events. The fields which were used to store data in the "union" depended upon the operation being done. The "data" pointer's meaning depended upon the "op" as well. When the program used to crash, I'd open the core in gdb and I had a gdb script which would go through the logs in each processor and print out the ops and their corresponding data, to find out the history of events.
For different processes you could do such logging to a file instead - one per process. This example is in C, but you can do this in whatever language you want to use, as long as you can figure out the CPU id on which the thread is running currently.
You might be looking for something like the Intel Thread Checker as long as you're using pthreads in (1).
For communication between different processes (2), you can use Aspect-Oriented Programming (AOP) if you have the source code, or write your own wrapper for the IPC functions and LD_PRELOAD it.
Edit: Whoops, you said tracing, not checking.
It will depend so much on the operating system and development environment that you are using. If you're with Visual Studio, look at the tools in Visual Studio 2010.