Delay generated from the for loop

// say delay_ms = 1
void Delay(const unsigned int delay_ms)
{
    unsigned int x, y;
    for (x = 0; x < delay_ms; x++)
    {
        for (y = 0; y < 120; y++);
    }
}
I am trying to use the C code above for my 8051 microcontroller. I would like to know what delay time the code above generates. I am using a 12 MHz oscillator.

This is a truly lousy way to generate a time delay.
If you look at the assembler generated by the compiler then, from the data sheet for the processor variant that you are using, you can look up the clock cycles required for each instruction in the listing. Add these up and you will get the minimum delay time that this code will produce.
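As a rough, hedged illustration only (the real figure depends entirely on what your compiler emits): on a classic 8051, one machine cycle is 12 oscillator clocks, so at 12 MHz each machine cycle takes 1 µs. If the inner for (y = 0; y < 120; y++); loop compiles down to something like a 2-cycle DJNZ per iteration, that loop alone costs roughly 120 × 2 µs ≈ 240 µs, so delay_ms = 1 would give you something on the order of a quarter of a millisecond rather than 1 ms - but only the actual assembler listing can tell you for sure.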
If you have interrupts enabled on your processor then the delay time will be extended by the execution time of any of the interrupt handlers that are triggered during the delay. These will add an essentially random amount of time to each delay function call depending upon the frequency and processing requirements of each interrupt.
The 8051 has hardware timer/counters that are designed to produce a signal after a user-programmable delay. Their counting is not affected by interrupt processing (although the servicing of their trigger events may be delayed by another interrupt source), so they give a far more reliable delay duration.
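For illustration only, here is a minimal sketch of a timer-based busy-wait delay, assuming a classic 12-clocks-per-machine-cycle 8051 at 12 MHz (1 µs per machine cycle) and Keil-style SFR declarations from <reg51.h>; the reload value and register names may need adjusting for your particular part and toolchain:
#include <reg51.h>

// Busy-wait for ms milliseconds using Timer 0 in 16-bit mode (mode 1).
void Delay_Timer0(unsigned int ms)
{
    while (ms--) {
        TMOD &= 0xF0;      // clear the Timer 0 mode bits
        TMOD |= 0x01;      // Timer 0, mode 1 (16-bit)
        TH0 = 0xFC;        // reload = 65536 - 1000 = 0xFC18 -> 1000 machine cycles
        TL0 = 0x18;
        TF0 = 0;           // clear the overflow flag
        TR0 = 1;           // start Timer 0
        while (!TF0);      // wait for overflow (~1 ms)
        TR0 = 0;           // stop Timer 0
    }
}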

Related

Analyzing Context Switch in Multithread

I want to calculate the context switch time, and I am thinking of using a mutex and condition variables to signal between 2 threads so that only one thread runs at a time. I can use CLOCK_MONOTONIC to measure the entire execution time and CLOCK_THREAD_CPUTIME_ID to measure how long each thread runs.
Then the context switch time is (total_time - thread_1_time - thread_2_time).
To get a more accurate result, I can just loop over it and take the average.
Is this a correct way to approximate the context switch time? I can't think of anything that might go wrong, but I am getting answers that are under 1 nanosecond.
I forgot to mention that the more times I loop and take the average, the smaller the results get.
Edit
Here is a snippet of the code that I have:
typedef struct
{
    struct timespec start;
    struct timespec end;
} thread_time;
...
// each thread function looks similar to this
void* thread_1_func(void* time)
{
    thread_time* t = (thread_time*) time;   // renamed so it doesn't shadow the type name
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(t->start));
    for (long x = 0; x < loop; ++x)
    {
        // where it switches to another thread
    }
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(t->end));
    return NULL;
}
void* thread_2_func(void* time)
{
    // similar to the above
}

int main()
{
    ...
    pthread_t thread_1;
    pthread_t thread_2;
    thread_time thread_1_time;
    thread_time thread_2_time;
    struct timespec start, end;

    // stamp the start time
    clock_gettime(CLOCK_MONOTONIC, &start);

    // create two threads with the time structs as the arguments
    pthread_create(&thread_1, NULL, &thread_1_func, (void*) &thread_1_time);
    pthread_create(&thread_2, NULL, &thread_2_func, (void*) &thread_2_time);

    // wait for the two threads to terminate
    pthread_join(thread_1, NULL);
    pthread_join(thread_2, NULL);

    // stamp the end time
    clock_gettime(CLOCK_MONOTONIC, &end);

    // then I calculate the difference between the total execution time
    // and the sum of the two threads' execution times
}
First of all, using CLOCK_THREAD_CPUTIME_ID is probably very wrong; this clock gives the CPU time consumed by that thread itself. However, the context switch does not happen while that thread is running, so you'd want to use another clock. Also, on multiprocessing systems the clocks can give different values from one processor to another! Thus I suggest you use CLOCK_REALTIME or CLOCK_MONOTONIC instead. However, be warned that even if you read either of these twice in rapid succession, the timestamps will usually already be tens of nanoseconds apart.
As for context switches - there are many kinds of context switch. The fastest approach is to switch from one thread to another entirely in software. This just means that you push the old registers onto the stack, set the task-switched flag so that the SSE/FP registers will be saved lazily, save the stack pointer, load the new stack pointer and return from that function - since the other thread had done the same, the return from that function happens in the other thread.
This thread-to-thread switch is quite fast; its overhead is about the same as for any system call. Switching from one process to another is much slower: the user-space page tables must be flushed and switched by loading a new page-table base into the CR3 register, and this causes misses in the TLB, the cache that maps virtual addresses to physical ones.
However the <1 ns context switch/system call overhead does not really seem plausible - it is very probable that there is either hyperthreading or 2 CPU cores here, so I suggest that you set the CPU affinity on that process so that Linux only ever runs it on say the first CPU core:
#define _GNU_SOURCE   // needed for the CPU_* macros and sched_setaffinity
#include <sched.h>

cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(0, &mask);
int result = sched_setaffinity(0, sizeof(mask), &mask);
Then you should be pretty sure that the time you're measuring comes from a real context switch. Also, to measure the time spent switching the floating point / SSE state (this happens lazily), you should have some floating point variables and do calculations on them prior to the context switch, then add, say, 0.1 to some volatile floating point variable after the context switch to see whether it has an effect on the switching time.
This is not straightforward, but as usual someone has already done a lot of the work. (I'm not including the source here because I cannot see any license mentioned.)
https://github.com/tsuna/contextswitch/blob/master/timetctxsw.c
If you copy that file to a Linux machine as context_switch_time.c, you can compile and run it like this:
gcc -D_GNU_SOURCE -Wall -O3 -std=c11 context_switch_time.c -lpthread
./a.out
I got the following result on a small VM
2000000 thread context switches in 2178645536ns (1089.3ns/ctxsw)
This question has come up before... for Linux you can find some material here.
Write a C program to measure time spent in context switch in Linux OS
Note that while the user in the link above was running the test, they were also hammering the machine with games and compilation, which is why the context switches were taking so long. Some more info here...
how can you measure the time spent in a context switch under java platform
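If you would rather roll a rough measurement yourself instead of using the linked tool, a common technique (this is only a sketch, not the tool's code, and the result includes pipe read/write overhead) is to ping-pong a byte between a parent and a child process over a pair of pipes, with both pinned to the same CPU so that every round trip forces two context switches:
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    enum { ITERS = 200000 };
    int p2c[2], c2p[2];
    char b = 0;
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);   /* pin to CPU 0; the child inherits this */

    if (pipe(p2c) || pipe(c2p)) { perror("pipe"); return 1; }

    if (fork() == 0) {                           /* child: echo every byte straight back */
        for (int i = 0; i < ITERS; i++) {
            if (read(p2c[0], &b, 1) != 1) _exit(1);
            if (write(c2p[1], &b, 1) != 1) _exit(1);
        }
        _exit(0);
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {            /* parent: drive the ping-pong */
        if (write(p2c[1], &b, 1) != 1) return 1;
        if (read(c2p[0], &b, 1) != 1) return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* each round trip forces two switches, plus the pipe syscall overhead */
    printf("~%.0f ns per switch (including pipe overhead)\n", ns / (2.0 * ITERS));
    return 0;
}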

difference between update_rq_clock and update_rq_clock_task

I understand the notion of update_rq_clock: it updates the run queue clock periodically on the system tick. But this function calls update_rq_clock_task(). What is the purpose of that function?
Within update_rq_clock, the difference between the current CPU timestamp and the run queue clock is calculated (the rq->clock variable holds the last clock value read from the CPU). That difference is added to rq->clock and, via update_rq_clock_task, to rq->clock_task (which is rq->clock minus time spent in interrupts and stolen time).
There are a couple of options within the function, which you can activate with kernel build options. But basically it breaks down to:
...
rq->clock_task += delta;
...
update_rq_clock_pelt(rq, delta);
...
So, together the two functions update the run queue clock and the run queue task clock, the latter excluding time spent on interrupts and stolen time (provided you enabled that accounting through the kernel config options), i.e. the actual time that the tasks themselves used.
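Roughly, and heavily simplified (the exact code differs between kernel versions and config options), the pair looks like this:
void update_rq_clock(struct rq *rq)
{
    s64 delta;

    /* ... lockdep and skip-update checks elided ... */
    delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;  /* time since the last update */
    if (delta < 0)
        return;
    rq->clock += delta;                               /* raw run queue clock */
    update_rq_clock_task(rq, delta);
}

static void update_rq_clock_task(struct rq *rq, s64 delta)
{
    /* with IRQ / paravirt time accounting enabled, delta is first reduced
       by interrupt time and stolen time before being added */
    rq->clock_task += delta;
    /* ... */
    update_rq_clock_pelt(rq, delta);
}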

Using perf to get events counts depending on the occurrence of other events

Is there any way I can read the values of event counters depending on the occurrence of other events?
For example: I want to know the values of the performance counters each time a specific counter reaches a specific value.
You can do that with perf_event_open, but AFAIK not directly with the current version of perf record.
I want to know the value of performance counters each time a specific counter reaches a specific value.
Use a group of events in which the "specific counter" is the group leader. For this event you set:
struct perf_event_attr leader;
memset(&leader, 0, sizeof(leader));   // unused fields must be zero
leader.size = sizeof(leader);
leader.sample_type = PERF_SAMPLE_TIME | PERF_SAMPLE_READ;
leader.sample_period = specific_value;
// set leader.type / leader.config accordingly
leader.read_format = PERF_FORMAT_GROUP;
int group_fd = syscall(__NR_perf_event_open, &leader, tid, cpu, -1, 0);
...
struct perf_event_attr other;
memset(&other, 0, sizeof(other));
other.size = sizeof(other);
other.sample_period = 0;              // doesn't trigger overflows
// set other.type / other.config accordingly
syscall(__NR_perf_event_open, &other, tid, cpu, group_fd, 0);
// do the mmap dance, ioctl etc. with the fd you get for the leader
// read values from both the leader and the other counter in your mmap buffer
This isn't a great or complete answer, but it's too big for a comment.
IDK if that's possible with the perf utility itself, but in theory yes you could get that for legacy events that trigger an interrupt every time their counter overflows (at a programmable overflow count; this is how event sampling granularity works). You can then read the values from the counters for other events. Probably using the same API that perf does, you could write code that does this from user-space.
But on x86 for PEBS (precise events), you probably can't, because counter overflows put an event in a buffer instead of triggering an interrupt right then where you could do arbitrary other things. So if the event you want to use is only available as a precise event, you will need a different solution to your ultimate problem.
(Low level bonus reading about interrupts / exceptions in general, including performance events vs. PEBS: When an interrupt occurs, what happens to instructions in the pipeline?)
You probably want to know something about how events are correlated with each other. Wanting to sample other events when one overflows may be an X-Y problem, if you can't implement it easily.
perf record --timestamp will put a timestamp on each event. This may give you the raw data you need to learn what you want to know. Collecting the data for a particular process from PMU for every 1 milli second is related, and suggests using perf script to do something with the results of perf record --all-cpus --timestamp.
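For example, something along these lines (./your_program is just a placeholder, and the field list is only a suggestion worth checking against your perf version) records two events with per-sample timestamps and then dumps them so the two streams can be correlated offline:
perf record -e cycles -e cache-misses --timestamp -- ./your_program
perf script -F comm,pid,time,event,period > samples.txt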

How do I block all other processes on a Linux machine for XXX milliseconds?

I'm working with a Linux embedded SMP system that does audio I/O using ALSA and an external USB audio device, using a 3.6.6 kernel. Problem: I'm getting infrequent (once every few weeks) system hiccups that are causing the audio stream to die. Although it's tough to be sure, the hiccups look like they lock up the entire system for a few dozen milliseconds.
I can write ALSA code to recover after one of these hiccups, but since it's ALSA, some trial and error will be required. Add that to having to wait weeks for a recurrence, and I'll be up a creek with a crowbar. I really need a way to cause the problem on demand.
I'd like to write a C program that runs as root and blocks all other processes on the system for a given number of milliseconds. I imagine it would involve disabling interrupts, doing a delay loop (since the timers will probably fail), and then restoring interrupts. But, I have to do it in such a way that the whole system doesn't go belly up.
Any ideas on how I would write such a program?
You could try raising the priority of your process and then using one of the "realtime" scheduling algorithms (e.g. SCHED_FIFO). This will help make sure that your process gets scheduled more consistently, even if other processes are running.
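A minimal sketch of that suggestion (the priority value 80 is an arbitrary choice, and SCHED_FIFO requires root or CAP_SYS_NICE):
#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 80 };  // valid range is 1..99 for SCHED_FIFO

    // Make the calling process a fixed-priority "realtime" task;
    // it will then preempt all normal SCHED_OTHER tasks when runnable.
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");

    // ... audio work goes here ...
    return 0;
}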
Well, based on CL's tip, and on information from http://www.tldp.org/HOWTO/text/IO-Port-Programming, I wrote the following code:
#include <stdio.h>
#include <sys/io.h>   // for iopl()

int main(int argc, char *argv[]) {
    volatile long i, j;   // volatile keeps the delay loop from being optimized into oblivion
    printf("About to lock system!\n");
    // Boost I/O privilege level (requires root)
    iopl(3);
    // Clear interrupt flag, masking interrupts
    asm volatile ("cli");
    // Wait about a second
    j = 1;
    for (i = 0; i < 250000000; i++) {
        j *= i;
    }
    // Restore interrupt flag, re-enabling interrupts
    asm volatile ("sti");
    // Restore I/O privilege level
    iopl(0);
    printf("Phew! Survived!\n");
    return 0;
}
When run as root, it works! Although not everything is suspended (and it's not clear to me what is and what isn't), enough locks up that my ALSA stream fails quite nicely. So, now I can trigger the problem on demand and ensure my code can handle it.
One note: I'd assumed that between the CLI and STI, system timing routines would fail due to the lack of interrupts. However, when just for the heck of it I tried usleep(), the timing code worked! But, the code as a whole actually didn't, because the call re-enabled interrupts, making the tool useless. Hence the use of a simple delay loop.

interrupt switch (PIC)

#define SW1 RB5

int IOFlag = 2; // while in out

void SW(){
    if(!RB5)
        __delay_ms(50);
    while(!RB5);
    __delay_ms(50);
    IOFlag++;
}

void main(){
    SW();
    while(IOFlag % 2 != 0){
        SW();
        //some routines..
    }
}
I am using a PIC16F73, with RB5 as the switch input.
When some of the routines are running, the switch does not operate properly.
It should be possible to fix this using an interrupt, but I don't know how to use one properly.
You need to understand the difference between polling and interrupts.
With polling (what you appear to be doing), you periodically check the state of some "thing" and act on it.
With interrupts, the "thing" happening causes your main thread of execution to be suspended, and an interrupt service routine (ISR) run.
Polling has the disadvantage of potentially long latency, the time between the thing happening and you finding out about it. In fact, you can even lose events if the thing is, for example, a momentary switch - you switch it on and then off, and by the time the code checks for it, it's off again.
Now you can still use polling if you wish, provided you understand these implications. Sometimes the easiest solution is to poll more often.
For example, if one of your //some routines.. jobs is a long running loop, you can poll from within there:
for (int i = 0; i < numThings; i++) {
    doSomethingQuickWith (thing[i]);
    SW(); // Poll here as well
}
// Rather than here.
However, for minimal latency, using interrupts is usually better and is reasonably simple once you wrap your head around the concept.
Your ISR (which will run on the given event, interrupting the main thread of execution) simply has to store the fact that the event has happened and communicate that to your main thread somehow.
For situations where you don't care how many times the event has happened, a flag will do the job. Your ISR simply sets the flag and your main thread of execution checks it periodically to see if it's been set, then clears it (with interrupts disabled so as to avoid race conditions). That would be something like (pseudo-code):
global val switchHit = false

main:
    interrupt (7, intFn)       // call intFn() on interrupt 7
    while true:
        disableInts()          // disallow interrupts for a short while
        if switchHit:
            handleSwitch()     // switch was hit, do something (quickly)
            switchHit = false  // mark as not hit
        enableInts()           // and re-allow interrupts
        doLotsOfOtherStuff()

intFn:
    switchHit = true           // notify main
Note that I'm not worried about race conditions within the ISR; interrupts are generally disabled automatically there.
More complicated information transfer may involve a count rather than a flag, or even a message queue of some sort, flowing from the ISR to the main thread of execution.
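As a concrete, hedged sketch of the flag approach for your PIC16F73 (assuming the XC8 compiler; RB5 falls under the PORTB-change interrupt on RB4..RB7, configuration bits are omitted, and the register/bit names should be checked against your toolchain and datasheet):
#include <xc.h>

volatile unsigned char switchHit = 0;
unsigned char IOFlag = 2;

void __interrupt() isr(void)
{
    if (INTCONbits.RBIF) {           // PORTB change interrupt (RB4..RB7)
        unsigned char pb = PORTB;    // reading PORTB ends the mismatch condition
        if (!(pb & (1 << 5)))        // RB5 pulled low -> switch pressed
            switchHit = 1;
        INTCONbits.RBIF = 0;         // clear the interrupt flag
    }
}

void main(void)
{
    TRISBbits.TRISB5 = 1;            // RB5 as input
    INTCONbits.RBIE = 1;             // enable the PORTB change interrupt
    INTCONbits.GIE = 1;              // global interrupt enable

    while (1) {
        if (switchHit) {
            INTCONbits.GIE = 0;      // brief critical section while clearing the flag
            switchHit = 0;
            INTCONbits.GIE = 1;
            IOFlag++;                // act on the press
        }
        // ...some routines...
    }
}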

Resources