Thread Sleeps Too Long on OS X When In Background - multithreading

I have a background thread in a Qt App on OS X used for collecting data. The thread is supposed to sleep for 100 ms between each iteration, but it doesn't always work properly. When the app is the topmost OS X app, the sleep works fine. But when it isn't, the sleep lasts an arbitrary amount of time, up to ~10 seconds, after about a minute of operation.
Here is a simple Cocoa app that demonstrates the problem (note .mm for objc++)
AppDelegate.mm:
#import "AppDelegate.h"
#include <iostream>
#include <thread>
#include <chrono>
#include <cstdint>
#include <libgen.h>
using namespace std::chrono;
#define DEFAULT_COLLECTOR_SAMPLING_FREQUENCY 10
namespace Helpers {
uint64_t time_ms() {
return duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
}
}
std::thread _collectorThread;
bool _running;
@interface AppDelegate ()
@end
@implementation AppDelegate
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
_running = true;
uint64_t start = Helpers::time_ms();
_collectorThread =
std::thread (
[&]{
while(_running) {
uint64_t t1, t2;
t1 = Helpers::time_ms();
std::this_thread::sleep_for((std::chrono::duration<int, std::milli>)(1000 / DEFAULT_COLLECTOR_SAMPLING_FREQUENCY));
t2 = Helpers::time_ms();
std::cout << (int)((t1 - start)/1000) << " TestSleep: sleep lasted " << t2-t1 << " ms" << std::endl;
}
});
}
- (void)applicationWillTerminate:(NSNotification *)aNotification {
_running = false;
_collectorThread.join();
}
@end
stdout:
0 TestSleep: sleep lasted 102 ms. // Window is in background
0 TestSleep: sleep lasted 101 ms. // behind Xcode window
0 TestSleep: sleep lasted 104 ms
0 TestSleep: sleep lasted 104 ms
0 TestSleep: sleep lasted 105 ms
0 TestSleep: sleep lasted 105 ms
0 TestSleep: sleep lasted 105 ms
0 TestSleep: sleep lasted 104 ms
0 TestSleep: sleep lasted 102 ms
0 TestSleep: sleep lasted 102 ms
1 TestSleep: sleep lasted 105 ms
1 TestSleep: sleep lasted 105 ms
1 TestSleep: sleep lasted 104 ms
1 TestSleep: sleep lasted 101 ms
1 TestSleep: sleep lasted 101 ms
1 TestSleep: sleep lasted 100 ms
...
...
52 TestSleep: sleep lasted 102 ms
52 TestSleep: sleep lasted 101 ms
52 TestSleep: sleep lasted 104 ms
52 TestSleep: sleep lasted 105 ms
52 TestSleep: sleep lasted 104 ms
52 TestSleep: sleep lasted 100 ms
52 TestSleep: sleep lasted 322 ms. // after ~1 minute,
53 TestSleep: sleep lasted 100 ms. // sleep gets way off
53 TestSleep: sleep lasted 499 ms
53 TestSleep: sleep lasted 1093 ms
54 TestSleep: sleep lasted 1086 ms
56 TestSleep: sleep lasted 1061 ms
57 TestSleep: sleep lasted 1090 ms
58 TestSleep: sleep lasted 1100 ms
59 TestSleep: sleep lasted 1099 ms
60 TestSleep: sleep lasted 1096 ms
61 TestSleep: sleep lasted 390 ms
61 TestSleep: sleep lasted 100 ms
61 TestSleep: sleep lasted 102 ms // click on app window
62 TestSleep: sleep lasted 102 ms // to bring it to foreground
62 TestSleep: sleep lasted 105 ms
On the other hand, the following complete program does not slow down:
#include <iostream>
#include <thread>
#include <chrono>
#include <cstdint>
#include <libgen.h>
using namespace std::chrono;
#define DEFAULT_COLLECTOR_SAMPLING_FREQUENCY 10
namespace Helpers {
uint64_t time_ms() {
return duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
}
}
int main(int argc, char *argv[])
{
bool _running = true;
uint64_t start = Helpers::time_ms();
std::thread collectorThread = std::thread (
[&]{
while(_running) {
uint64_t t1, t2;
t1 = Helpers::time_ms();
std::this_thread::sleep_for((std::chrono::duration<int, std::milli>)(1000 / DEFAULT_COLLECTOR_SAMPLING_FREQUENCY));
t2 = Helpers::time_ms();
std::cout << (int)((t1 - start)/1000) << " TestSleep: sleep lasted " << t2-t1 << " ms" << std::endl;
}
});
collectorThread.join();
return 0;
}
// clang++ -std=c++14 -o testc++ main.cpp
stdout:
0 TestSleep: sleep lasted 100 ms
0 TestSleep: sleep lasted 101 ms
0 TestSleep: sleep lasted 105 ms
0 TestSleep: sleep lasted 105 ms
0 TestSleep: sleep lasted 100 ms
0 TestSleep: sleep lasted 100 ms
0 TestSleep: sleep lasted 101 ms
0 TestSleep: sleep lasted 104 ms
0 TestSleep: sleep lasted 101 ms
0 TestSleep: sleep lasted 104 ms
1 TestSleep: sleep lasted 102 ms
1 TestSleep: sleep lasted 101 ms
1 TestSleep: sleep lasted 101 ms
1 TestSleep: sleep lasted 101 ms
1 TestSleep: sleep lasted 100 ms
...
...
99 TestSleep: sleep lasted 101 ms
99 TestSleep: sleep lasted 105 ms
99 TestSleep: sleep lasted 104 ms
100 TestSleep: sleep lasted 104 ms
100 TestSleep: sleep lasted 101 ms
100 TestSleep: sleep lasted 104 ms
My original app is a QML app; it shows the same slowing-down behavior.
TestSleep.pro:
QT += quick
CONFIG += c++11
SOURCES += \
main.cpp
RESOURCES += qml.qrc
main.qml:
import QtQuick 2.9
import QtQuick.Controls 2.2
ApplicationWindow {
visible: true
width: 640
height: 480
title: qsTr("Scroll")
ScrollView {
anchors.fill: parent
ListView {
width: parent.width
model: 20
delegate: ItemDelegate {
text: "Item " + (index + 1)
width: parent.width
}
}
}
}
main.cpp:
#include <QGuiApplication>
#include <QQmlApplicationEngine>
#include <QThread>
#include <QUrl>
#include <iostream>
#include <chrono>
#include <cstdint>
using namespace std::chrono;
#define DEFAULT_COLLECTOR_SAMPLING_FREQUENCY 10
namespace Helpers {
uint64_t time_ms() {
return duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
}
}
int main(int argc, char *argv[])
{
bool _running = true;
QThread *collectorThread = QThread::create(
// std::thread collectorThread = std::thread (
[&]{
while(_running) {
uint64_t t1;
t1 = Helpers::time_ms();
QThread::msleep(1000 / DEFAULT_COLLECTOR_SAMPLING_FREQUENCY);
// std::this_thread::sleep_for((std::chrono::duration<int, std::milli>)(1000 / DEFAULT_COLLECTOR_SAMPLING_FREQUENCY));
t1 = Helpers::time_ms() - t1;
std::cout << "TestUSleep: sleep lasted " << t1 << " ms" << std::endl;
}
});
collectorThread->start();
collectorThread->setPriority(QThread::TimeCriticalPriority);
QCoreApplication::setAttribute(Qt::AA_EnableHighDpiScaling);
QGuiApplication app(argc, argv);
QQmlApplicationEngine engine;
engine.load(QUrl(QStringLiteral("qrc:/main.qml")));
if (engine.rootObjects().isEmpty())
return -1;
int returnValue = app.exec();
// collectorThread.join();
collectorThread->quit();
collectorThread->wait();
collectorThread->deleteLater();
return returnValue;
}
stdout:
0 TestSleep: sleep lasted 101 ms
0 TestSleep: sleep lasted 100 ms
0 TestSleep: sleep lasted 101 ms
0 TestSleep: sleep lasted 100 ms
0 TestSleep: sleep lasted 102 ms
0 TestSleep: sleep lasted 100 ms
0 TestSleep: sleep lasted 102 ms
0 TestSleep: sleep lasted 101 ms
0 TestSleep: sleep lasted 101 ms
0 TestSleep: sleep lasted 101 ms
1 TestSleep: sleep lasted 100 ms
1 TestSleep: sleep lasted 101 ms
1 TestSleep: sleep lasted 101 ms
1 TestSleep: sleep lasted 101 ms
...
...
63 TestSleep: sleep lasted 100 ms
63 TestSleep: sleep lasted 101 ms
63 TestSleep: sleep lasted 102 ms
63 TestSleep: sleep lasted 101 ms
63 TestSleep: sleep lasted 101 ms
63 TestSleep: sleep lasted 7069 ms # slows down
70 TestSleep: sleep lasted 235 ms
70 TestSleep: sleep lasted 10100 ms
80 TestSleep: sleep lasted 7350 ms
88 TestSleep: sleep lasted 10100 ms
98 TestSleep: sleep lasted 3566 ms
101 TestSleep: sleep lasted 100 ms
102 TestSleep: sleep lasted 3242 ms
105 TestSleep: sleep lasted 2373 ms
107 TestSleep: sleep lasted 100 ms # click on main window
107 TestSleep: sleep lasted 101 ms # to put app on top
107 TestSleep: sleep lasted 101 ms # and back to normal
107 TestSleep: sleep lasted 101 ms # behavior
108 TestSleep: sleep lasted 101 ms
108 TestSleep: sleep lasted 102 ms
...
The behavior is the same when using std::thread instead of QThread (commented out in code).

What you are seeing is the effect of Apple's power-saving App Nap feature.
You can verify that App Nap is the culprit by running Apple's Activity Monitor application and looking in the "App Nap" column (you may need to right-click the process table's header bar to make that column visible first). If your program is being app-napped, you will see "Yes" in that column for your program's row in the table.
If you want to programmatically disable App Nap for your program, you can put this Objective-C++ file into your project and call the disable_app_nap() function at the top of main():
#import <Foundation/Foundation.h>
#import <Foundation/NSProcessInfo.h>
void disable_app_nap(void)
{
if ([[NSProcessInfo processInfo] respondsToSelector:@selector(beginActivityWithOptions:reason:)])
{
[[NSProcessInfo processInfo] beginActivityWithOptions:0x00FFFFFF reason:@"Not sleepy and don't want to nap"];
}
}
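As a usage note: beginActivityWithOptions:reason: returns an activity object; if you keep a reference to it, you can later pass it to NSProcessInfo's endActivity: method to let the process nap again.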

This is due to App Nap, and I can reproduce the issue on macOS 10.13.4. The example below reproduces it when reproduce is set to true; when it is set to false, the LatencyCriticalLock takes care of ensuring that App Nap is not active.
Also note that sleeping for a fixed interval does not ensure that your operation runs with the stated period: if the operation itself takes any time at all, even due to system load and latencies, the period will be longer than intended. System timers on most platforms keep the average period correct, whereas sleep-for-based pacing always runs at a period longer than desired. A drift-free sleep-based alternative is sketched after the example.
// https://github.com/KubaO/stackoverflown/tree/master/questions/appnap-49677034
#if !defined(__APPLE__)
#error This example is for macOS
#endif
#include <QtWidgets>
#include <mutex>
#include <objc/runtime.h>
#include <objc/message.h>
// see https://stackoverflow.com/a/49679984/1329652
namespace detail { struct LatencyCriticalLock {
int count = {};
id activity = {};
id processInfo = {};
id reason = {};
std::unique_lock<std::mutex> mutex_lock() {
init();
return std::unique_lock<std::mutex>(mutex);
}
private:
std::mutex mutex;
template <typename T> static T check(T i) {
return (i != nil) ? i : throw std::runtime_error("LatencyCriticalLock init() failed");
}
void init() {
if (processInfo != nil) return;
auto const NSProcessInfo = check(objc_getClass("NSProcessInfo"));
processInfo = check(objc_msgSend((id)NSProcessInfo, sel_getUid("processInfo")));
reason = check(objc_msgSend((id)objc_getClass("NSString"), sel_getUid("alloc")));
reason = check(objc_msgSend(reason, sel_getUid("initWithUTF8String:"), "LatencyCriticalLock"));
}
}; }
class LatencyCriticalLock {
static detail::LatencyCriticalLock d;
bool locked = {};
public:
struct NoLock {};
LatencyCriticalLock &operator=(const LatencyCriticalLock &) = delete;
LatencyCriticalLock(const LatencyCriticalLock &) = delete;
LatencyCriticalLock() { lock(); }
explicit LatencyCriticalLock(NoLock) {}
~LatencyCriticalLock() { unlock(); }
void lock() {
if (locked) return;
auto l = d.mutex_lock();
assert(d.count >= 0);
if (!d.count) {
assert(d.activity == nil);
/* Start activity that tells App Nap to mind its own business: */
/* NSActivityUserInitiatedAllowingIdleSystemSleep */
/* | NSActivityLatencyCritical */
d.activity = objc_msgSend(d.processInfo, sel_getUid("beginActivityWithOptions:reason:"),
0x00FFFFFFULL | 0xFF00000000ULL, d.reason);
assert(d.activity != nil);
}
d.count ++;
locked = true;
assert(d.count > 0 && locked);
}
void unlock() {
if (!locked) return;
auto l = d.mutex_lock();
assert(d.count > 0);
if (d.count == 1) {
assert(d.activity != nil);
objc_msgSend(d.processInfo, sel_getUid("endActivity:"), d.activity);
d.activity = nil;
}
d.count--;
locked = false; // this instance no longer holds the lock, even if other instances still do
assert(d.count >= 0);
}
bool isLocked() const { return locked; }
};
detail::LatencyCriticalLock LatencyCriticalLock::d;
int main(int argc, char *argv[]) {
struct Thread : QThread {
bool reproduce = {};
void run() override {
LatencyCriticalLock lock{LatencyCriticalLock::NoLock()};
if (!reproduce)
lock.lock();
const int period = 100;
QElapsedTimer el;
el.start();
QTimer timer;
timer.setTimerType(Qt::PreciseTimer);
timer.start(period);
connect(&timer, &QTimer::timeout, [&el]{
auto const duration = el.restart();
if (duration >= 1.1*period) qWarning() << duration << " ms";
});
QEventLoop().exec();
}
~Thread() {
quit();
wait();
}
} thread;
QApplication app{argc, argv};
thread.reproduce = false;
thread.start();
QPushButton msg;
msg.setText("Click to close");
msg.showMinimized();
msg.connect(&msg, &QPushButton::clicked, &msg, &QWidget::close);
return app.exec();
}
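As a side note on the pacing issue mentioned above: if sleep-based pacing is kept, sleeping until absolute deadlines removes the cumulative drift that sleep_for introduces. The following is only a minimal, self-contained sketch of that idea (the 100 ms period and the names are mine, not taken from the question):
#include <chrono>
#include <iostream>
#include <thread>
int main() {
    using namespace std::chrono;
    const auto period = milliseconds(100);
    auto next = steady_clock::now() + period;   // first absolute deadline
    for (int i = 0; i < 50; ++i) {
        // ... do the periodic work here ...
        std::this_thread::sleep_until(next);    // wake at the deadline, not "period after now"
        auto now = steady_clock::now();
        std::cout << "late by "
                  << duration_cast<milliseconds>(now - next).count()
                  << " ms" << std::endl;
        next += period;                         // deadlines advance by a fixed step, so delays do not accumulate
    }
}
Note that App Nap can still delay the wake-up itself; sleep_until only removes the drift that the work and scheduling latency would otherwise add to every period.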

An alternative that also worked in this case is to raise the thread's priority with the C function pthread_setschedparam, useful if for some reason you want an app that naps but a background thread that does not:
int priority_max = sched_get_priority_max(SCHED_RR);
struct sched_param sp;
sp.sched_priority = priority_max;
pthread_setschedparam(_collectorThread.native_handle(), SCHED_RR, &sp);
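For completeness, here is a self-contained sketch of the same idea with error checking. The helper name raise_to_realtime is mine; it assumes the std::thread object is accessible at the call site (for the question's code, it would be called right after _collectorThread is created):
#include <iostream>
#include <pthread.h>
#include <sched.h>
#include <thread>
// Raise a std::thread to the maximum SCHED_RR priority.
// May require elevated privileges, depending on the platform and configuration.
static bool raise_to_realtime(std::thread &t) {
    sched_param sp{};
    sp.sched_priority = sched_get_priority_max(SCHED_RR);
    int rc = pthread_setschedparam(t.native_handle(), SCHED_RR, &sp);
    if (rc != 0) {
        std::cerr << "pthread_setschedparam failed with error " << rc << std::endl;
        return false;
    }
    return true;
}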

Related

How to filter Zero division expression when I want to generate random expression

I have written a program to calculate math expressions with the four basic operators. Now I want to write a program that generates random expressions to test my calculator, but some of the generated expressions divide by zero, which raises an error. I tried to kill the subprocess when it causes the error, but I failed. I don't know how to avoid this problem.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
#include <string.h>

static char buf[65536] = {};
static char code_buf[65536 + 128] = {}; // a little larger than `buf`
static char *code_format =
"#include <stdio.h>\n"
"int main() { "
" unsigned result = %s; "
" printf(\"%%u\", result); "
" return 0; "
"}";

static int loc = 0; // used in gen_rand_expr()

/* ... gen_rand_expr() and other helpers omitted ... */

int main(int argc, char *argv[]) {
int seed = time(0);
srand(seed);
int loop = 1;
if (argc > 1) {
sscanf(argv[1], "%d", &loop);
}
int i;
for (i = 0; i < loop; i ++) {
gen_rand_expr(); // generate random expression

loc = 0;
sprintf(code_buf, code_format, buf);

FILE *fp = fopen("/tmp/.code.c", "w");
assert(fp != NULL);
fputs(code_buf, fp);
fclose(fp);

int ret = system("gcc /tmp/.code.c -o /tmp/.expr");
if (ret != 0) continue;
fp = popen("/tmp/.expr", "r");
assert(fp != NULL);

int result;
fscanf(fp, "%d", &result);
pclose(fp);

printf("%u %s\n", result, buf);
}
return 0;
}

First requests not sending in the start of the goroutine func

I am running goroutines in my code. Say, if I set my threads to 50, it will not run the first 49 requests, but it will run the 50th request and continue with the rest. I am not really sure how to describe the issue I am having, and it gives no errors. This has only happened while using fasthttp, and it works fine with net/http. Could it be an issue with fasthttp? (This is not my whole code, just the area where I think the issues are occurring.)
threads := 50
var Lock sync.Mutex
semaphore := make(chan bool, threads)
for len(userArray) != 0 {
semaphore <- true
go func() {
Lock.Lock()
var values []byte
defer func() { <-semaphore }()
fmt.Println(len(userArray))
if len(userArray) == 0 {
return
}
values, _ = json.Marshal(userArray[0])
currentArray := userArray[0]
userArray = userArray[1:]
client := &fasthttp.Client{
Dial: fasthttpproxy.FasthttpHTTPDialerTimeout(proxy, time.Second * 5),
}
time.Sleep(1 * time.Nanosecond)
Lock.Unlock()
This is the output I get (the numbers are the number of requests left):
200
199
198
197
196
195
194
193
192
191
190
189
188
187
186
185
184
183
182
181
180
179
178
177
176
175
174
173
172
171
170
169
168
167
166
165
164
163
162
161
160
159
158
157
156
155
154
153
152
151
(10 lines of output from req 151)
150
(10 lines of output from req 150)
cont.
Sorry if my explanation is confusing; I honestly don't know how to explain this error.
I think the problem is with the scoping of the variables. To represent the queueing, I'd have a pool of parallel workers that all pull from the same channel, and then wait for them using a WaitGroup.
The exact code might need to be adapted, as I don't have a Go compiler at hand, but the idea is like this:
threads := 50
queueSize := 100 // trying to add more into the queue will block
jobQueue := make(chan MyItemType, queueSize) // MyItemType stands for the element type of userArray
var wg sync.WaitGroup
processQueue := func(jobQueue <-chan MyItemType) {
defer wg.Done()
for item := range jobQueue {
values, _ := json.Marshal(item)
client := &fasthttp.Client{
Dial: fasthttpproxy.FasthttpHTTPDialerTimeout(proxy, time.Second * 5),
}
// send the request here using client and values
_, _ = values, client
}
}
for i := 0; i < threads; i++ {
wg.Add(1)
go processQueue(jobQueue)
}
for _, item := range userArray {
jobQueue <- item // feed the queue
}
close(jobQueue) // close only after all items have been queued
wg.Wait()
Items put into jobQueue are processed by one of these workers; close the channel only once everything has been queued, then wait for the workers to drain it.

How to properly use if else statements and while loops with a child process in C

I'm new to C and I've been trying to create a program that takes a user-input integer and makes a sequence depending on whether the number is even or odd:
n / 2 if n is even
3 * n + 1 if n is odd
A new number will be computed until the sequence reaches 1. For example if a user inputs 35:
35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1
For some reason my code doesn't work after the scanf statement in the child process. I've put my code and sample output below.
Code:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
int main()
{
pid_t pid;
int i = 0;
int j = 0;
/* fork a child process */
pid = fork();
if (pid < 0) { /* error occurred */
fprintf(stderr, "Fork Failed\n");
return 1;
}
else if (pid == 0) { /* child process */
printf("I am the child %d\n",pid);
printf("Enter a value: \n");
scanf("%d", i);
while (i < 0) {
printf("%d is not a positive integer. Please try again.\n", i);
printf("Enter a value: \n");
scanf("%d", i);
}
// can add a print i here
while (i != 1) {
if (i % 2 == 0) { // if the inputted number is even
j = i / 2;
}
else {
j = 3 * i + 1;
}
printf("%d", j);
}
}
else { /* parent process */
/* parent will wait for the child to complete */
printf("I am the parent %d\n",pid);
wait(NULL); // wait(NULL) will wait for the child process to complete and takes the status code of the child process as a parameter
printf("Child Complete\n");
}
return 0;
}
Output I'm getting on terminal in Linux (Debian):
oscreader@OSC:~/osc9e-src/ch3$ gcc newproc-posix.c
oscreader@OSC:~/osc9e-src/ch3$ ./a.out
I am the parent 16040
I am the child 0
Enter a value:
10
Child Complete
oscreader@OSC:~/osc9e-src/ch3$
Transferring comments into a semi-coherent answer.
Your calls to scanf() require a pointer argument; you give it an integer argument. Use scanf("%d", &i); — and it would be a good idea to check that scanf() returns 1 before testing the result.
My compiler told me about your bug. Why didn't your compiler do so too? Make sure you enable every warning you can! Your comment indicates that you're using gcc (or perhaps clang) — I routinely compile with:
gcc -std=c11 -O3 -g -Werror -Wall -Wextra -Wstrict-prototypes …
Indeed, for code from SO, I add -Wold-style-declaration -Wold-style-definition to make sure functions are declared and defined properly. It's often a good idea to add -pedantic to avoid accidental use of GCC extensions.
In the loop, you don't need j — you should be changing and printing i instead.
cz17.c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
int i = 0;
pid_t pid = fork();
if (pid < 0)
{
fprintf(stderr, "Fork Failed\n");
return 1;
}
else if (pid == 0)
{
printf("I am the child %d\n", pid);
printf("Enter a value: \n");
if (scanf("%d", &i) != 1)
{
fprintf(stderr, "failed to read an integer\n");
return 1;
}
while (i <= 0 || i > 1000000)
{
printf("value %d out of range 1..1000000. Try again.\n", i);
printf("Enter a value: \n");
if (scanf("%d", &i) != 1)
{
fprintf(stderr, "failed to read an integer\n");
return 1;
}
}
while (i != 1)
{
if (i % 2 == 0)
{
i = i / 2;
}
else
{
i = 3 * i + 1;
}
printf(" %d", i);
fflush(stdout);
}
putchar('\n');
}
else
{
printf("I am the parent of %d\n", pid);
int status;
int corpse = wait(&status);
printf("Child Complete (%d - 0x%.4X)\n", corpse, status);
}
return 0;
}
Compilation:
gcc -O3 -g -std=c11 -Wall -Wextra -Werror -Wmissing-prototypes -Wstrict-prototypes cz17.c -o cz17
Sample output:
$ cz17
I am the parent of 41838
I am the child 0
Enter a value:
2346
1173 3520 1760 880 440 220 110 55 166 83 250 125 376 188 94 47 142 71 214 107 322 161 484 242 121 364 182 91 274 137 412 206 103 310 155 466 233 700 350 175 526 263 790 395 1186 593 1780 890 445 1336 668 334 167 502 251 754 377 1132 566 283 850 425 1276 638 319 958 479 1438 719 2158 1079 3238 1619 4858 2429 7288 3644 1822 911 2734 1367 4102 2051 6154 3077 9232 4616 2308 1154 577 1732 866 433 1300 650 325 976 488 244 122 61 184 92 46 23 70 35 106 53 160 80 40 20 10 5 16 8 4 2 1
Child Complete (41838 - 0x0000)
$

Unexpected task switch on Linux despite real-time priority and nice -20

I have a program that needs to execute with 100% performance but I see that it is sometimes paused for more than 20 uSec. I've struggled with this for a while and can't find the reason/explanation.
So my question is:
Why is my program "paused"/"stalled" for 20 uSec every now and then?
To investigate this I wrote the following small program:
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <iostream>
#include <signal.h>
using namespace std;
unsigned long long get_time_in_ns(void)
{
struct timespec tmp;
if (clock_gettime(CLOCK_MONOTONIC, &tmp) == 0)
{
return tmp.tv_sec * 1000000000 + tmp.tv_nsec;
}
else
{
exit(0);
}
}
bool go_on = true;
static void Sig(int sig)
{
(void)sig;
go_on = false;
}
int main()
{
unsigned long long t1=0;
unsigned long long t2=0;
unsigned long long t3=0;
unsigned long long t4=0;
unsigned long long t5=0;
unsigned long long t2saved=0;
unsigned long long t3saved=0;
unsigned long long t4saved=0;
unsigned long long t5saved=0;
struct sigaction sig;
memset(&sig, 0, sizeof(sig));
sig.sa_handler = Sig;
if (sigaction(SIGINT, &sig, 0) < 0)
{
cout << "sigaction failed" << endl;
return 0;
}
while (go_on)
{
t1 = get_time_in_ns();
t2 = get_time_in_ns();
t3 = get_time_in_ns();
t4 = get_time_in_ns();
t5 = get_time_in_ns();
if ((t2-t1)>t2saved) t2saved = t2-t1;
if ((t3-t2)>t3saved) t3saved = t3-t2;
if ((t4-t3)>t4saved) t4saved = t4-t3;
if ((t5-t4)>t5saved) t5saved = t5-t4;
cout <<
t1 << " " <<
t2-t1 << " " <<
t3-t2 << " " <<
t4-t3 << " " <<
t5-t4 << " " <<
t2saved << " " <<
t3saved << " " <<
t4saved << " " <<
t5saved << endl;
}
cout << endl << "Closing..." << endl;
return 0;
}
The program simply tests how long it takes to call the function get_time_in_ns. It does this 5 times in a row and also tracks the longest time measured.
Normally a call takes about 30 ns, but sometimes it takes as long as 20000 ns, which I don't understand.
A little part of the program output is:
8909078678739 37 29 28 28 17334 17164 17458 18083
8909078680355 36 30 29 28 17334 17164 17458 18083
8909078681947 38 28 28 27 17334 17164 17458 18083
8909078683521 37 29 28 27 17334 17164 17458 18083
8909078685096 39 27 28 29 17334 17164 17458 18083
8909078686665 37 29 28 28 17334 17164 17458 18083
8909078688256 37 29 28 28 17334 17164 17458 18083
8909078689827 37 27 28 28 17334 17164 17458 18083
The output shows that the normal call time is approx. 30 ns (columns 2 to 5) but that the largest time is nearly 20000 ns (columns 6 to 9).
I start the program like this:
chrt -f 99 nice -n -20 myprogram
Any ideas why the call sometimes takes 20000ns when it normally takes 30ns?
The program is executed on a dual Xeon (8 cores each) machine.
I connect using SSH.
top shows:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8107 root rt -20 16788 1448 1292 S 3.0 0.0 0:00.88 myprogram
2327 root 20 0 69848 7552 5056 S 1.3 0.0 0:37.07 sshd
Even the lowest niceness value is not a real-time priority: the task is still in the SCHED_OTHER policy, the normal time-sharing policy. You need to switch to a real-time scheduling policy with sched_setscheduler(), either SCHED_FIFO or SCHED_RR as required.
Note that this will still not give you an absolute 100% of the CPU if yours isn't the only task running: by default Linux reserves a small percentage of CPU time for non-real-time tasks, so that a runaway RT task cannot effectively hang the machine. Of course, a real-time task needing 100% of the CPU is unlikely to perform correctly anyway.
Edit: As pointed out, the process already runs with an RT scheduler (nice values are only relevant to SCHED_OTHER, so setting them in addition is pointless); the rest of my answer still applies as to how and why other tasks still get to run (remember that there are also a number of kernel tasks).
The only thing better than this is probably dedicating one CPU core to the task to get the most out of it. Obviously this only works on multi-core CPUs. There is a related question here: Whole one core dedicated to single process
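To make both suggestions concrete, here is a minimal Linux-only sketch (mine, not part of the original answer) that switches the calling thread to SCHED_FIFO and optionally pins it to one core; the priority value and core number are arbitrary examples:
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sched.h>
int main() {
    // 1. Switch the calling thread to the SCHED_FIFO real-time policy.
    sched_param sp{};
    sp.sched_priority = 50;                      // example value; Linux allows 1..99
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        std::cerr << "sched_setscheduler: " << std::strerror(errno) << std::endl;
    // 2. Optionally pin the thread to a single core (core 3 as an example),
    //    ideally one that has been isolated from the general scheduler.
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        std::cerr << "sched_setaffinity: " << std::strerror(errno) << std::endl;
    // ... time-critical work goes here ...
}
Both calls typically require root privileges or the CAP_SYS_NICE capability.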

Deadlock in MCS lock implementation

Hardware:
Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64 x86_64
atomics.hpp
#ifndef ATOMIC_UTILS_H
#define ATOMIC_UTILS_H

#include <cstddef>

#define BARRIER() __asm__ volatile ( "": : :"memory" )

#define CPU_RELAX() __asm__ volatile( "pause\n\t": : :"memory" )

#define STORE_FENCE() __asm__ volatile("mfence" ::: "memory");

class AtomicUtils
{
public:

/**
* check if the value at addr is equal to oldval, if so replace it with newval
* and return the oldval
*/
inline static size_t compareAndExchange( volatile size_t* addr, size_t oldval, size_t newval )
{
size_t ret;
__asm__ volatile( "lock cmpxchgq %2, %1\n\t"
:"=a"(ret), "+m"(*addr)
: "r"(newval), "0"(oldval)
: "memory" );
return ret;
}

/**
* Atomically stores x into addr and returns the previous value
* stored in addr
*/
inline static size_t loadAndStore( size_t x, volatile size_t* addr )
{
size_t ret;
__asm__ volatile( "lock xchgq %1, %0\n\t"
: "+m"(*addr), "=r"(ret)
: "1"(x) );
return ret;
}

};

#endif
mcs.hpp
1 #ifndef MCS_LOCK_H
2 #define MCS_LOCK_H
3
4 #include "atomics.hpp"
5 #include <boost/thread/tss.hpp>
6
7 class MCSLock
8 {
9 struct mcs_lock_t
10 {
11 mcs_lock_t():next(0), locked(false){}
12 struct mcs_lock_t* next;
13 bool locked;
14 };
15
16 public:
17 typedef struct mcs_lock_t mcs_lock;
18
19 private:
20 mcs_lock** tail;
21 static boost::thread_specific_ptr<mcs_lock> tls_node;
22
23 public:
24 MCSLock( mcs_lock** lock_tail ):tail( lock_tail )
25 {
26 if( tls_node.get() == 0 )
27 tls_node.reset( new mcs_lock() );
28 }
29
30 void lock()
31 {
32 mcs_lock* thread_node = tls_node.get();
33 thread_node->next = 0;
34 thread_node->locked = true;
35
36 volatile mcs_lock* pred = reinterpret_cast<volatile mcs_lock*>(
37 AtomicUtils::loadAndStore(
38 reinterpret_cast<size_t>( thread_node ),
39 reinterpret_cast<volatile size_t*>( tail )
40 )
41 );
42 if( pred != 0 )
43 {
44 pred->next = *tail;
45
46 STORE_FENCE();
47 //BARRIER(); // Required to prevent reordering between pred->next = *tail and thread_node->locked. ( WR hazard )
48
49 // Spin on a local variable. Someone unlock me plz !!
50 while( thread_node->locked )
51 CPU_RELAX();
52
53 }
54 }
55
56 void unlock()
57 {
58 mcs_lock* thread_node = tls_node.get();
59 if( thread_node->next == 0 )
60 {
61 // If false, then a new thread has requested the lock. Now release the lock for the new thread
62 if(
63 AtomicUtils::compareAndExchange(
64 reinterpret_cast<volatile size_t*>( tail ),
65 reinterpret_cast<size_t>( thread_node ),
66 0
67 ) == reinterpret_cast<size_t>( thread_node )
68 )
69 {
70 return;
71 }
72
73 while( thread_node->next == 0 )
74 CPU_RELAX();
75 }
76
77 thread_node->next->locked = false;
78 }
79 };
80
81 boost::thread_specific_ptr<MCSLock::mcs_lock> MCSLock::tls_node;
82 #endif
mcs_test.cpp
#include "mcs.hpp"
#include <iostream>
#include <pthread.h>
#include <vector>
#define NUM_THREADS 16
#define NUM_ITERATIONS 100

std::vector<int> elements;
MCSLock::mcs_lock *tail = 0;

void* thread_run( void* data )
{
MCSLock lock( &tail );
for( int i = 0; i < NUM_ITERATIONS; ++i )
{
lock.lock();
elements.push_back( i );
lock.unlock();
}

return 0;
}

int main()
{
pthread_t threads[ NUM_THREADS ];
elements.reserve( NUM_THREADS * NUM_ITERATIONS );

{
for( int i = 0; i < NUM_THREADS; ++i )
pthread_create( &threads[i], NULL, thread_run, NULL );

for( int i = 0; i < NUM_THREADS; ++i )
pthread_join( threads[i], NULL );

std::cout <<"\nExiting main thread: " << std::endl;
}
}
The above code is compiled using clang
Problem:
I see that 1 or 2 threads are stuck in lock() at line 50. Apart from the main thread and the threads stuck in lock(), no other threads are alive. This means that the other threads somehow exited from unlock() without setting locked = false on the waiting threads' nodes.
Any pointers on debugging this please ?
Stuck on this for many hours and no clues.
Doesn't clang have builtins for these inline-asm blocks (like gcc's __sync_val_compare_and_swap)? Why reinvent the wheel?
Second, I'd really think about adding the memory clobber to loadAndStore. You need to make sure that any writes the compiler is holding in registers get flushed to memory before doing the xchgq. Similarly, the clobber prevents the compiler from moving memory reads to before the xchgq. Either would be bad.
Third, I'd examine the asm output for your while loops (on thread_node->locked and thread_node->next). Since these variables are not volatile, the compiler may optimize the loops to perform the read only once.
These may not solve your problem, but that's where I'd start.
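To make the first point concrete, here is a sketch (mine, not from the answer) of AtomicUtils rewritten on top of the GCC/Clang __atomic builtins, keeping the original signatures; these builtins are full sequentially-consistent operations, which also covers the missing memory clobber:
#include <cstddef>
class AtomicUtils {
public:
    // Compare *addr with oldval; if equal, store newval. Returns the previous value of *addr.
    inline static size_t compareAndExchange( volatile size_t* addr, size_t oldval, size_t newval ) {
        size_t expected = oldval;
        __atomic_compare_exchange_n( addr, &expected, newval, /*weak=*/false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST );
        return expected;  // on failure this holds the value that was actually in *addr
    }
    // Atomically store x into *addr and return the previous value.
    inline static size_t loadAndStore( size_t x, volatile size_t* addr ) {
        return __atomic_exchange_n( addr, x, __ATOMIC_SEQ_CST );
    }
};
For the spin loops, reading the flags through __atomic_load_n (or making them std::atomic) keeps the compiler from hoisting the load out of the loop, which is the third point above.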
