Clock frequency setting doesn't change simulation speed - emulation

I'm trying to run the following AVR program on SimAVR:
#include <avr/io.h>
#include <util/delay.h>
int main ()
{
DDRB |= _BV(DDB5);
for (;;)
{
PORTB ^= _BV(PB5);
_delay_ms(2000);
}
}
I've compiled it with F_CPU=16000000. The SimAVR runner is as follows:
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include "sim_avr.h"
#include "avr_ioport.h"
#include "sim_elf.h"
avr_t * avr = NULL;
static void* avr_run_thread(void * ignore)
{
for (;;) {
avr_run(avr);
}
return NULL;
}
void led_changed_hook(struct avr_irq_t* irq, uint32_t value, void* param)
{
printf("led_changed_hook %d %d\n", irq->irq, value);
}
int main(int argc, char *argv[])
{
elf_firmware_t f;
elf_read_firmware("image.elf", &f);
f.frequency = 16e6;
const char *mmcu = "atmega328p";
avr = avr_make_mcu_by_name(mmcu);
if (!avr) {
fprintf(stderr, "%s: AVR '%s' not known\n", argv[0], mmcu);
exit(1);
}
avr_init(avr);
avr_load_firmware(avr, &f);
avr_irq_register_notify(
avr_io_getirq(avr, AVR_IOCTL_IOPORT_GETIRQ('B'), 5),
led_changed_hook,
NULL);
pthread_t run;
pthread_create(&run, NULL, avr_run_thread, NULL);
for (;;) {}
}
The problem is that I see from the output of led_changed_hook that it runs at ~4x speed. Moreover, changing f.frequency doesn't seem to have any effect on the simulation speed whatsoever.
How do I ensure that SimAVR runs the simulation at the correct real-time speed?

It turns out SimAVR doesn't support timing-accurate simulation of opcodes, so the wall-clock time it takes to simulate the _delay_ms busy-wait to completion is unrelated both to how long it would take on the real MCU and to the clock frequency of the simulated MCU.
The correct solution is to use a timer interrupt and then put the MCU to sleep. The simulator correctly simulates the timer counters, and the sleep suspends the simulation until the timer fires.
#include <avr/interrupt.h>
#include <avr/power.h>
#include <avr/sleep.h>
int main ()
{
DDRB |= _BV(DDB5);
TCCR1A = 0;
TCCR1B = 0;
TCNT1 = 0;
TIMSK1 |= (1 << OCIE1A);
sei();
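/* Set TIMER1 to 0.5 Hz: CTC mode, 16 MHz / 1024 = 15625 ticks per second */
TCCR1B |= (1 << WGM12);
OCR1A = 31249; /* the timer counts 0..OCR1A, so 2 s is 31250 ticks */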
TCCR1B |= ((1 << CS12) | (1 << CS10));
set_sleep_mode(SLEEP_MODE_IDLE);
sleep_enable();
for (;;)
{
sleep_mode();
}
}
ISR(TIMER1_COMPA_vect){
PORTB ^= _BV(PB5);
}
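As a cross-check of the compare value, here is a minimal sketch of the arithmetic for CTC mode, where the counter runs from 0 to the compare value, so one period lasts (compare value + 1) prescaled ticks. The helper name ctc_top and its parameters are made up for illustration; this is host-side arithmetic, not part of SimAVR or avr-libc.

```cpp
#include <cstdint>

// Compare value for CTC mode. The counter runs 0..top, so one period is
// (top + 1) timer ticks, and a tick lasts prescaler / f_cpu_hz seconds.
// ctc_top, f_cpu_hz, prescaler and period_ms are illustrative names.
constexpr std::uint32_t ctc_top(std::uint64_t f_cpu_hz,
                                std::uint32_t prescaler,
                                std::uint32_t period_ms)
{
    return static_cast<std::uint32_t>(
        f_cpu_hz * period_ms / (static_cast<std::uint64_t>(prescaler) * 1000u) - 1u);
}

// 16 MHz with a /1024 prescaler gives 15625 ticks per second.
static_assert(ctc_top(16000000ULL, 1024, 1000) == 15624, "1 s period");
static_assert(ctc_top(16000000ULL, 1024, 2000) == 31249, "2 s period");
```

With these settings one tick is 64 µs, so a compare value that is off by one shifts each period by 64 µs. The result must also fit in the 16-bit OCR1A register, which limits how long a period a given prescaler can reach.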

Related

Pause thread execution without using condition variable or other various synchronization primitives

Problem
I wish to be able to pause the execution of a thread from a different thread. Note the thread paused should not have to cooperate. The pausing of the target thread does not have to occur as soon as the pauser thread wants to pause. Delaying the pausing is allowed.
I cannot seem to find any information on this, as all searches yielded me results that use condition variables...
Ideas
use the scheduler and kernel syscalls to stop the thread from being scheduled again
use debugger syscalls to stop the target thread
OS-agnostic is preferable, but not a requirement. This likely will be very OS-dependent, as messing with scheduling and threads is a pretty low-level operation.
On a Unix-like OS, there's pthread_kill() which delivers a signal to a specified thread. You can arrange for that signal to have a handler which waits until told in some manner to resume.
Here's a simple example, where the "pause" just sleeps for a fixed time before resuming.
#include <unistd.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h> /* nanosleep, struct timespec */
void safe_print(const char *s) {
int saved_errno = errno;
if (write(1, s, strlen(s)) < 0) {
exit(1);
}
errno = saved_errno;
}
void sleep_msec(int msec) {
struct timespec t = {
.tv_sec = msec / 1000,
.tv_nsec = (msec % 1000) * 1000 * 1000
};
nanosleep(&t, NULL);
}
void *work(void *unused) {
(void) unused;
for (;;) {
safe_print("I am running!\n");
sleep_msec(100);
}
return NULL;
}
void handler(int sig) {
(void) sig;
safe_print("I am stopped.\n");
sleep_msec(500);
}
int main(void) {
pthread_t thr;
pthread_create(&thr, NULL, work, NULL);
sigset_t empty;
sigemptyset(&empty);
struct sigaction sa = {
.sa_handler = handler,
.sa_flags = 0,
};
sigemptyset(&sa.sa_mask);
sigaction(SIGUSR1, &sa, NULL);
for (int i = 0; i < 5; i++) {
sleep_msec(1000);
pthread_kill(thr, SIGUSR1);
}
pthread_cancel(thr);
pthread_join(thr, NULL);
return 0;
}
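To pause until told to resume rather than for a fixed time, the handler can block in sigsuspend() until a second signal arrives. The following is a minimal sketch under stated assumptions: SIGUSR1 means "pause", SIGUSR2 means "resume", there is a single worker thread, and all names (on_pause, on_resume, demo_pause_resume, the g_ variables) are invented for illustration.

```cpp
#include <atomic>
#include <cerrno>
#include <pthread.h>
#include <signal.h>
#include <time.h>

static volatile sig_atomic_t g_paused = 0;     // written only by the handlers
static std::atomic<unsigned long> g_ticks{0};  // progress made by the worker
static std::atomic<bool> g_stop{false};

static void sleep_ms(long ms) {
    struct timespec t;
    t.tv_sec = ms / 1000;
    t.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&t, nullptr);  // may return early on a signal; fine for a demo
}

// SIGUSR1 handler: park the interrupted thread until SIGUSR2 arrives.
static void on_pause(int) {
    int saved_errno = errno;          // sigsuspend() sets errno to EINTR
    sigset_t wait_mask;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR2);   // only the resume signal ends the wait
    g_paused = 1;
    while (g_paused) sigsuspend(&wait_mask);
    errno = saved_errno;
}

// SIGUSR2 handler: clear the flag so on_pause() can return.
static void on_resume(int) { g_paused = 0; }

static void install_handlers() {
    struct sigaction sa = {};
    sa.sa_handler = on_pause;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);  // keep SIGUSR2 blocked outside sigsuspend()
    sigaction(SIGUSR1, &sa, nullptr);
    sa.sa_handler = on_resume;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, nullptr);
}

static void* worker(void*) {
    while (!g_stop.load()) { ++g_ticks; sleep_ms(1); }
    return nullptr;
}

// Pauses the worker, checks that it makes no progress, then resumes it.
bool demo_pause_resume() {
    install_handlers();
    pthread_t thr;
    if (pthread_create(&thr, nullptr, worker, nullptr) != 0) return false;
    sleep_ms(50);
    pthread_kill(thr, SIGUSR1);                // pause
    sleep_ms(50);
    unsigned long before = g_ticks.load();
    sleep_ms(100);
    unsigned long during = g_ticks.load();
    pthread_kill(thr, SIGUSR2);                // resume
    sleep_ms(100);
    unsigned long after = g_ticks.load();
    g_stop.store(true);
    pthread_join(thr, nullptr);
    return before == during && after > during;
}
```

Blocking SIGUSR2 in the pause handler's sa_mask and unblocking it only inside sigsuspend() closes the window in which a resume could slip in between setting the flag and starting to wait. The sketch does not handle a resume that is delivered before its matching pause, and a thread paused this way keeps holding any locks it held when the signal arrived.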

Using eBPF to measure CPU mode switch overhead incurred by making a system call

As the title says, but the measurement result is unreasonable. Let me describe the current status.
I'm using the getuid syscall as the measurement target. I started by measuring the complete overhead with two clock_gettime calls around it, then measured the entry overhead (what the SYSCALL instruction does before executing the actual getuid code) and the exit overhead separately, with an eBPF program hooked onto the entry and exit points.
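For reference, the "complete overhead" baseline described above can be sketched as follows. This is not the asker's code: the function names now_ns and avg_getuid_ns are made up, and this version times a whole loop rather than each call, so the cost of the two clock_gettime calls is amortized over all iterations. It assumes Linux, where SYS_getuid is available through syscall(2).

```cpp
#include <cstdint>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Current CLOCK_MONOTONIC time in nanoseconds.
static std::uint64_t now_ns() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ULL +
           static_cast<std::uint64_t>(ts.tv_nsec);
}

// Average wall-clock cost of one raw getuid syscall, in nanoseconds.
double avg_getuid_ns(int iterations) {
    for (int i = 0; i < 1000; i++)        // warm up caches and branch predictors
        syscall(SYS_getuid);
    std::uint64_t start = now_ns();
    for (int i = 0; i < iterations; i++)
        syscall(SYS_getuid);              // raw syscall, as in the measurement code
    std::uint64_t end = now_ns();
    return static_cast<double>(end - start) / iterations;
}
```

The figure this returns includes the user-space syscall() wrapper and loop overhead as well as the mode switch, so it is an upper bound rather than the switch cost itself.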
The result for the complete overhead is ~65ns, while the entry and exit overheads are ~77ns and ~70ns respectively.
Obviously my measurement adds some overhead on top of the mode switch itself. However, that is odd: clock_gettime is a vDSO call, so it should have barely noticeable overhead, and BPF, which is a lightweight instrumentation tool in Linux these days (JIT-compiled, etc.), shouldn't have noticeable overhead either.
Does anyone have an idea what additional overhead my measurement incurs?
Following is my measurement code:
userland (measuring the return-from-kernel overhead):
#define _GNU_SOURCE
#include <bpf.h>
#include <libbpf.h>
#include <stdlib.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <string.h>
#include <asm/errno.h>
#include <linux/if_link.h>
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>
#include <asm/unistd.h>
#include <time.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sched.h>
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
#define TEST_CNT 1000000
#define BPF_FILE_NAME "mkern.o"
#define BPF_MAP_NAME "msys"
static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
int cpu, int group_fd,
unsigned long flags)
{
attr->size = sizeof(*attr);
return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
static int attach_kprobe(int prog_fd)
{
int err, fd, id;
char buf[32];
struct perf_event_attr attr = {};
err = system("echo 'r:kp_sys_batch __x64_sys_getuid' > /sys/kernel/debug/tracing/kprobe_events");
if (err < 0) {
fprintf(stderr, "Failed to create kprobe, error '%s'\n", strerror(errno));
return -1;
}
fd = open("/sys/kernel/debug/tracing/events/kprobes/kp_sys_batch/id", O_RDONLY, 0);
if (fd < 0) {
fprintf(stderr, "Failed to open event %s\n", "sys_batch");
return -1;
}
err = read(fd, buf, sizeof(buf));
if (err < 0 || err >= sizeof(buf)) {
fprintf(stderr, "read from '%s' failed '%s'\n", "sys_batch", strerror(errno));
return -1;
}
close(fd);
buf[err] = 0;
id = atoi(buf);
attr.config = id;
attr.type = PERF_TYPE_TRACEPOINT;
attr.sample_type = PERF_SAMPLE_RAW;
attr.sample_period = 1;
attr.wakeup_events = 1;
fd = sys_perf_event_open(&attr, 0/*this process*/, -1/*any cpu*/, -1/*group leader*/, 0);
if (fd < 0) {
perror("sys_perf_event_open");
fprintf(stderr, "Failed to open perf_event (id: %llu)\n", attr.config);
return -1;
}
err = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
if (err < 0) {
fprintf(stderr, "ioctl PERF_EVENT_IOC_ENABLE failed err %s\n",
strerror(errno));
return -1;
}
err = ioctl(fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
if (err < 0) {
fprintf(stderr, "ioctl PERF_EVENT_IOC_SET_BPF failed: %s\n",
strerror(errno));
return -1;
}
return 0;
}
static void maxi_memlock_rlimit(void)
{
struct rlimit rlim_new = {
.rlim_cur = RLIM_INFINITY,
.rlim_max = RLIM_INFINITY,
};
if (setrlimit(RLIMIT_MEMLOCK, &rlim_new)) {
fprintf(stderr, "Failed to increase RLIMIT_MEMLOCK limit!\n");
exit(-1);
}
}
static int find_map_fd(struct bpf_object *bpf_obj, const char *mapname)
{
struct bpf_map *map;
int map_fd = -1;
map = bpf_object__find_map_by_name(bpf_obj, mapname);
if (!map) {
fprintf(stderr, "Failed finding map by name: %s\n", mapname);
exit(-1);
}
map_fd = bpf_map__fd(map);
return map_fd;
}
int main(int argc, char **argv)
{
int bpf_map_fd;
int bpf_prog_fd = -1;
int err;
int key = 0;
struct timespec tp;
struct bpf_object *bpf_obj;
struct reals map;
struct bpf_prog_load_attr xattr = {
.prog_type = BPF_PROG_TYPE_KPROBE,
.file = BPF_FILE_NAME,
};
maxi_memlock_rlimit();
err = bpf_prog_load_xattr(&xattr, &bpf_obj, &bpf_prog_fd);
if (err) {
fprintf(stderr, "Failed loading bpf object file\n");
exit(-1);
}
if (attach_kprobe(bpf_prog_fd)) {
fprintf(stderr, "Failed attaching kprobe\n");
exit(-1);
}
bpf_map_fd = find_map_fd(bpf_obj, BPF_MAP_NAME);
if (bpf_map_fd < 0) { /* check the returned fd, not the function pointer */
fprintf(stderr, "Failed finding map fd\n");
exit(-1);
}
/* warm up */
for (int i = 0; i < TEST_CNT; i++) {
syscall(__NR_getuid); /* dummy call */
clock_gettime(CLOCK_MONOTONIC, &tp);
if (unlikely(bpf_map_lookup_elem(bpf_map_fd, &key, &map))) {
fprintf(stderr, "Failed to lookup map element\n");
perror("lookup");
exit(-1);
}
}
uint64_t delta = 0;
for (int i = 0; i < TEST_CNT; i++) {
syscall(__NR_getuid); /* dummy call */
clock_gettime(CLOCK_MONOTONIC, &tp);
if (unlikely(bpf_map_lookup_elem(bpf_map_fd, &key, &map))) {
fprintf(stderr, "Failed to lookup map element\n");
perror("lookup");
exit(-1);
}
delta += (1000000000 * tp.tv_sec + tp.tv_nsec) - map.ts;
}
printf("avg: %fns\n", (double) delta / TEST_CNT);
return 0;
}
userland (measuring the enter-kernel overhead; almost the same as the above, except for the parts shown):
err = system("echo 'p:kp_sys_batch sys_batch' > /sys/kernel/debug/tracing/kprobe_events");
...
clock_gettime(CLOCK_MONOTONIC, &tp);
syscall(__NR_getuid); /* dummy call */
...
delta += map.ts - (1000000000 * tp.tv_sec + tp.tv_nsec);
kernel land:
SEC("getuid")
int kp_sys_batch(struct pt_regs *ctx)
{
__u32 i = 0;
struct reals *r;
r = bpf_map_lookup_elem(&reals, &i);
if (!r)
return 1;
r->ts = bpf_ktime_get_ns();
return 0;
}
Apart from the additional overhead I mentioned above: in the return-from-kernel measurement code, if echo 'r:kp_sys_batch sys_batch' is changed to echo 'p:kp_sys_batch sys_batch' (so that the measurement also covers the syscall execution itself), the result is ~48ns. That result should include both the syscall execution and the return from the kernel. Any idea why it could be only ~48ns?
Thanks!

Using CLOCK_MONOTONIC type in the 'condition variable' wait_for() notify() mechanism

I am running code on an ARM processor (not Intel). I ran the C++11 code example (CODE A) from http://www.cplusplus.com/reference/condition_variable/condition_variable/wait_for/ to test the wait_for() mechanism. It does not work correctly: it looks like wait_for() does not wait. On Intel it works fine. After some research, I found that using the pthread library directly and selecting CLOCK_MONOTONIC solves the issue (CODE B).
(Running on ARM is not the issue.)
My question is:
How can I force the C++11 wait_for() API to work with CLOCK_MONOTONIC?
I would actually like to stay with CODE A, but with CLOCK_MONOTONIC support.
Thanks
CODE A
// condition_variable::wait_for example
#include <iostream> // std::cout
#include <thread> // std::thread
#include <chrono> // std::chrono::seconds
#include <mutex> // std::mutex, std::unique_lock
#include <condition_variable> // std::condition_variable, std::cv_status
std::condition_variable cv;
int value;
void read_value() {
std::cin >> value;
cv.notify_one();
}
int main ()
{
std::cout << "Please, enter an integer (I'll be printing dots): \n";
std::thread th (read_value);
std::mutex mtx;
std::unique_lock<std::mutex> lck(mtx);
while (cv.wait_for(lck,std::chrono::seconds(1))==std::cv_status::timeout)
{
std::cout << '.' << std::endl;
}
std::cout << "You entered: " << value << '\n';
th.join();
return 0;
}
CODE B
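#include <pthread.h> // pthread_mutex_*, pthread_cond_*
#include <time.h> // clock_gettime, CLOCK_MONOTONIC
#include <cstdio> // printf
#include <sys/time.h>
#include <unistd.h>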
#include <iostream> // std::cout
#include <thread> // std::thread
#include <chrono> // std::chrono::seconds
#include <mutex> // std::mutex, std::unique_lock
#include <condition_variable> // std::condition_variable, std::cv_status
const size_t NUMTHREADS = 1;
pthread_mutex_t mutex;
pthread_cond_t cond;
int value;
bool done = false;
void* read_value( void* id )
{
const int myid = (long)id; // force the pointer to be a 64bit integer
std::cin >> value;
done = true;
printf( "[thread %d] done is now %d. Signalling cond.\n", myid, done );
pthread_cond_signal( &cond );
return NULL; // a function returning void* must return a value
}
int main ()
{
struct timeval now;
pthread_mutexattr_t Attr;
pthread_mutexattr_init(&Attr);
pthread_mutexattr_settype(&Attr, PTHREAD_MUTEX_RECURSIVE);
pthread_mutex_init(&mutex, &Attr);
pthread_condattr_t CaAttr;
pthread_condattr_init(&CaAttr);
pthread_condattr_setclock(&CaAttr, CLOCK_MONOTONIC);
pthread_cond_init(&cond, &CaAttr);
std::cout << "Please, enter an integer:\n";
pthread_t threads[NUMTHREADS];
int t = 0;
pthread_create( &threads[t], NULL, read_value, (void*)(long)t );
struct timespec ts;
pthread_mutex_lock( &mutex );
int rt = 0;
while( !done )
{
clock_gettime(CLOCK_MONOTONIC, &ts);
ts.tv_sec += 1;
rt = pthread_cond_timedwait( & cond, & mutex, &ts );
std::cout << "..." << std::endl;
}
pthread_mutex_unlock( & mutex );
std::cout << "You entered: " << value << '\n';
return 0;
}
The documentation for std::condition_variable::wait_for says:
A steady clock is used to measure the duration.
std::chrono::steady_clock:
Class std::chrono::steady_clock represents a monotonic clock. The time points of this clock cannot decrease as physical time moves forward.
Unfortunately, because of GCC Bug 41861 ("(DR 887)(C++0x) does not use monotonic_clock"), condition variables use system_clock instead of steady_clock.
One solution is to use the wait_until function (be sure to read its Notes section), which lets you specify the deadline as a time point on a specific clock. E.g.:
cv.wait_until(lck, std::chrono::steady_clock::now() + std::chrono::seconds(1))
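A slightly fuller sketch of the same idea, with the deadline computed once on steady_clock and a predicate guarding against spurious wakeups. The Signal struct and the wait_ready and set_ready helpers are names invented for this example. Whether the underlying wait really runs on the monotonic clock still depends on the standard library version in use.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Shared state for one "ready" flag (illustrative type).
struct Signal {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
};

// Waits until s.ready becomes true or the timeout elapses.
// Returns true if the flag was set, false on timeout.
bool wait_ready(Signal& s, std::chrono::milliseconds timeout) {
    // Compute the deadline once so repeated wakeups do not extend the wait.
    auto deadline = std::chrono::steady_clock::now() + timeout;
    std::unique_lock<std::mutex> lk(s.m);
    return s.cv.wait_until(lk, deadline, [&s] { return s.ready; });
}

// Sets the flag under the mutex, then wakes one waiter.
void set_ready(Signal& s) {
    {
        std::lock_guard<std::mutex> lk(s.m);
        s.ready = true;
    }
    s.cv.notify_one();
}
```

Unlike CODE A, the flag is written under the same mutex the waiter holds, so a notification sent before the waiter starts waiting is not lost.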

C++ thread and mutex and condition variable

Find the least common multiple of 10 million numbers, where the queue must not exceed 10,000 elements.
I have spent 2 days trying to sort this out, but I just do not understand it! Please help me.
#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>
#include <queue>
#include <chrono>
#include <cmath>
#include <map>
#include <cstdlib>
#include <fstream>
#include <ctime>
using namespace std;
int main()
{
std::map <int, int> NOK;
map<int, int> snok;
std::queue<int> oche;
std::mutex m;
std::condition_variable cond_var;
bool done = false;
bool notified = false;
std::thread filev([&]() {
//std::unique_lock<std::mutex> lock(m);
ifstream in; // the stream "in" will be used for reading
int ch;
in.open("/home/akrasikov/prog/output.txt");
while(!in.eof()){
if (oche.size()>9999){
std::this_thread::sleep_for(std::chrono::milliseconds(3));
std::unique_lock<std::mutex> lock(m);
} else {
in>>ch;
oche.push(ch);
}
}
notified = true;
cond_var.notify_one();
done = true;
cond_var.notify_one();
});
std::thread nok([&]() {
std::unique_lock<std::mutex> lock(m);
while (!done) {
while (!notified) { // loop to avoid spurious wakeups
cond_var.wait(lock);
}
while (!oche.empty()) {
ch=oche.front();
oche.pop();
int j=2;
while (j < sqrt((double)ch)+1){
int s=0;
while(!(ch%j)){
s++;
ch/=j;
}
if (s > 0 && NOK[j] < s){
NOK[j] = s;
}
j++;
}
if (NOK[ch] == 0) NOK[ch]++;
}
long int su=1;
int temp=-1;
int step=0;
int sa=1;
std::cout << " NOK= ";
for (std::map<int, int>::iterator it=NOK.begin(); it!=NOK.end(); it++){
for (int i=0; i<it->second; i++){
su*=it->first;
sa=it->first;
if (temp<sa && sa >1){
temp=sa;
step=1;
} else {
if(sa>1)
step++;
}
}
cout<< temp << "^"<< step << " * " ;
}
std::cout << "su = " << su << '\n';
}
notified = false;
});
filev.join();
nok.join();
}
This program does not work. Why? What's wrong? It just starts and hangs, but if I remove this code
if (oche.size()>9999){
std::this_thread::sleep_for(std::chrono::milliseconds(3));
std::unique_lock<std::mutex> lock(m);
} else {
and
while (!done) {
while (!notified) { // loop to avoid spurious wakeups
cond_var.wait(lock);
}
everything works. Please help.
From what I understand of your question, you have 3 problems:
Compute the least common multiple of a list of 1M elements
Have one thread that produces the elements and one that consumes them, transferring them through a buffer (a queue in your case)
Keep the queue from exceeding 10K elements
In my implementation I'm generating the numbers randomly and using condition variables to coordinate between the threads.
Note that the LCM is associative and commutative, so you can compute it incrementally, one element at a time, no matter what the order is.
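To illustrate that property on its own, here is a small sketch that folds an LCM over a list. The names gcd_u64, lcm_u64 and lcm_of are made up for this example, and 64-bit unsigned integers are an assumption: they still overflow for large inputs, which a real solution would have to handle with big integers or prime exponents.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Euclid's algorithm.
std::uint64_t gcd_u64(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        std::uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// lcm(a, b) = a / gcd(a, b) * b; dividing first keeps intermediates small.
std::uint64_t lcm_u64(std::uint64_t a, std::uint64_t b) {
    if (a == 0 || b == 0) return 0;
    return a / gcd_u64(a, b) * b;
}

// Folds lcm over the list starting from 1, the identity element for lcm.
std::uint64_t lcm_of(const std::vector<std::uint64_t>& xs) {
    return std::accumulate(xs.begin(), xs.end(), std::uint64_t{1}, lcm_u64);
}
```

For example, lcm_of({4, 6, 10}) and lcm_of({10, 4, 6}) both give 60, which is why the consumer can update a running LCM as elements arrive instead of storing them all.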
Here is the code, but please do not post messy code like that next time, or people will be much less willing to help.
#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>
#include <queue>
#include <chrono>
#include <cmath>
#include <map>
#include <cstdlib>
#include <fstream>
#include <ctime>
#include <atomic>
#include <random>
using namespace std;
std::mutex mutRandom;//use for multithreading for random variables
int getNextRandom()
{
std::lock_guard<std::mutex> lock(mutRandom);
// C++11 random number generator; static so it is seeded once per run, not on every call
static std::mt19937 eng (time(NULL)); // Mersenne Twister generator with a different seed at each run
static std::uniform_int_distribution<int> dist (1, 1000000);
return dist(eng);
}
//thread coordination
std::mutex mut;
std::queue<int> data_queue;
std::condition_variable data_cond;
std::atomic<int> nbData{0};
std::atomic<int> currLCM{1};//current LCM (note: an int overflows quickly for large inputs)
const unsigned int nbMaxData=100000;
const unsigned int queueMaxSize=10000;
//Arithmetic function, nothing to do with threads
//greatest common divider
int gcd(int a, int b)
{
for (;;)
{
if (a == 0) return b;
b %= a;
if (b == 0) return a;
a %= b;
}
}
//least common multiple
int lcm(int a, int b)
{
int temp = gcd(a, b);
return temp ? (a / temp * b) : 0;
}
/// Thread related part
//for producing the data
void produceData()
{
for (unsigned int i = 0; i < nbMaxData; ++i)
{
int next = getNextRandom();
std::unique_lock<std::mutex> lk(mut);
//wait until the consumer has made room in the queue
data_cond.wait(lk,[]{
return data_queue.size()<queueMaxSize;
});
cout<<nbData<<endl;
++nbData;
data_queue.push(next);
lk.unlock();
data_cond.notify_one();
}
cout<<"Producer done \n";
}
//for consuming the data
void consumeData()
{
//consume exactly as many elements as the producer pushes
for (unsigned int i = 0; i < nbMaxData; ++i)
{
std::unique_lock<std::mutex> lk(mut);
data_cond.wait(lk,[]{
return !data_queue.empty();
});
int currData=data_queue.front();
data_queue.pop();
lk.unlock();
data_cond.notify_one();//wake the producer if it was waiting for room
currLCM = lcm(currLCM,currData);
}
cout<<"Consumer done \n";
}
int main()
{
std::thread thProduce(&produceData);
std::thread thConsume(&consumeData);
thProduce.join();//to wait for the producing thread to finish before the program closes
thConsume.join();//same thing for the consuming one
return 0;
}
Hope that helps,

Logging a message from SIGTERM

What's the proper way to log a shutdown message when an application (a C++ daemon, in my case) receives a SIGTERM or SIGINT?
According to CERT and the signal(7) manpage, many functions (including, presumably, those used by most logging libraries) aren't safe to call from signal handlers.
Vlad Lazarenko wrote a great blog post earlier this year on this very topic. On Linux it boils down to creating a signal descriptor with signalfd(2) and using an event loop such as poll(2) or epoll_wait(2). Here is Vlad's example reading from the descriptor:
#include <sys/signalfd.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)
int
main(int argc, char *argv[])
{
sigset_t mask;
int sfd;
struct signalfd_siginfo fdsi;
ssize_t s;
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
sigaddset(&mask, SIGQUIT);
/* Block signals so that they aren't handled
according to their default dispositions */
if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
handle_error("sigprocmask");
sfd = signalfd(-1, &mask, 0);
if (sfd == -1)
handle_error("signalfd");
for (;;) {
s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
if (s != sizeof(struct signalfd_siginfo))
handle_error("read");
if (fdsi.ssi_signo == SIGINT) {
printf("Got SIGINT\n");
} else if (fdsi.ssi_signo == SIGQUIT) {
printf("Got SIGQUIT\n");
exit(EXIT_SUCCESS);
} else {
printf("Read unexpected signal\n");
}
}
}
This example can easily be extended to integrate into an event loop.
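As a rough sketch of that integration, the descriptor can be handed to poll(2) next to the daemon's other file descriptors. The helper names make_signal_fd and wait_for_signal are invented for this example, and the choice of SIGTERM and SIGINT is an assumption based on the question. Since the signal arrives as ordinary readable data, the code that handles it runs in normal context and may call any logging function.

```cpp
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

// Blocks SIGTERM and SIGINT for the calling thread and returns a descriptor
// that becomes readable when one of them is pending. Returns -1 on error.
int make_signal_fd() {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGTERM);
    sigaddset(&mask, SIGINT);
    if (sigprocmask(SIG_BLOCK, &mask, nullptr) == -1)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

// Waits up to timeout_ms for a signal on sfd.
// Returns the signal number, 0 on timeout, or -1 on error.
int wait_for_signal(int sfd, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = sfd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    // A real event loop would pass its sockets in the same pollfd array.
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) return -1;
    if (n == 0) return 0;
    struct signalfd_siginfo info;
    ssize_t got = read(sfd, &info, sizeof info);
    if (got != static_cast<ssize_t>(sizeof info)) return -1;
    return static_cast<int>(info.ssi_signo);
}
```

A daemon would call make_signal_fd() before starting other threads, so that they inherit the blocked mask, and log its shutdown message when wait_for_signal() returns SIGTERM.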
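Logging can be done after the handler returns rather than inside it:
#include <signal.h>
#include <unistd.h>
/* log() stands for the application's own logging function */
volatile sig_atomic_t received_sigterm = 0; /* the only type safe to write from a handler */
void
sigterm_handler(int sig)
{
(void) sig;
received_sigterm = 1;
}
void
loop(void)
{
for(;;) {
sleep(1); /* returns early when a signal is handled */
if (received_sigterm) {
log("finish\n");
break; /* leave the loop so the program can shut down */
}
}
}
int
main()
{
log("start\n");
signal(SIGTERM, sigterm_handler);
loop();
}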
The concept is borrowed from openssh-6.1 sources.

Resources