I've found this post about changing the name of a thread.
I tried the prctl() and pthread_setname_np() functions. Both change the name of ALL my threads. In other words, it doesn't seem to work as expected.
I used:
pthread_setname_np(pthread_self(), "thread ONE");
and
pthread_setname_np(pthread_self(), "thread TWO");
Depending on which runs first, both threads say "thread ONE" or "thread TWO". I was expecting one of them to be "thread ONE" and the other to be "thread TWO".
Am I doing something wrong?
As proposed by Tzig in a comment, I tested the example as shown in the pthread_setname_np() documentation. However, I needed to test with at least two threads so I changed the code as follow to have a thread1 and thread2.
By default, I can start htop and use F4 to only show threads/processes with names including THREAD (I can also use the command line to use a different name: ./a.out MULTIFOO MULTIBAR and then use the word MULTI as the filter).
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#define NAMELEN 16
#define errExitEN(en, msg) \
do { errno = en; perror(msg); \
exit(EXIT_FAILURE); } while (0)
static void *
threadfunc(void *parm)
{
sleep(15); // allow main program to set the thread name
return NULL;
}
int
main(int argc, char **argv)
{
pthread_t thread1, thread2;
int rc;
char thread_name[NAMELEN];
rc = pthread_create(&thread1, NULL, threadfunc, NULL);
if (rc != 0)
errExitEN(rc, "pthread_create");
rc = pthread_create(&thread2, NULL, threadfunc, NULL);
if (rc != 0)
errExitEN(rc, "pthread_create");
// test name of thread1
//
rc = pthread_getname_np(thread1, thread_name, NAMELEN);
if (rc != 0)
errExitEN(rc, "pthread_getname_np");
printf("Created thread1. Default name is: %s\n", thread_name);
rc = pthread_setname_np(thread1, (argc > 1) ? argv[1] : "THREADFOO");
if (rc != 0)
errExitEN(rc, "pthread_setname_np");
sleep(2);
rc = pthread_getname_np(thread1, thread_name, NAMELEN);
if (rc != 0)
errExitEN(rc, "pthread_getname_np");
printf("The thread1 name after setting it is %s.\n", thread_name);
// test name of thread2
//
rc = pthread_getname_np(thread2, thread_name, NAMELEN);
if (rc != 0)
errExitEN(rc, "pthread_getname_np");
printf("Created thread2. Default name is: %s\n", thread_name);
rc = pthread_setname_np(thread2, (argc > 2) ? argv[2] : "THREADBAR");
if (rc != 0)
errExitEN(rc, "pthread_setname_np");
sleep(2);
rc = pthread_getname_np(thread2, thread_name, NAMELEN);
if (rc != 0)
errExitEN(rc, "pthread_getname_np");
printf("The thread2 name after setting it is %s.\n", thread_name);
// thread1 name changed too?
//
rc = pthread_getname_np(thread1, thread_name, NAMELEN);
if (rc != 0)
errExitEN(rc, "pthread_getname_np");
printf("The thread1 name after setting thread2 name is %s.\n", thread_name);
rc = pthread_join(thread1, NULL);
if (rc != 0)
errExitEN(rc, "pthread_join");
rc = pthread_join(thread2, NULL);
if (rc != 0)
errExitEN(rc, "pthread_join");
printf("Done\n");
exit(EXIT_SUCCESS);
}
P.S. There is a bug in the original:
rc = pthread_getname_np(thread, thread_name,
(argc > 2) ? atoi(argv[1]) : NAMELEN);
Notice that the atoi() uses argv[1] instead of argv[2]. (I have reported the bug to the man-pages maintainers.)
In my example, I use the second argument as the name of the second thread and always use NAMELEN as the length of my buffer. I have no reason for reducing that amount.
RESULTS:
As expected, the pthread_getname_np() works. Great!
However, the htop or cat /proc/self/task/<tid>/comm all return the last name that was set. I guess that's a bug in the Linux kernel... Yet, my process has other threads created by the NVidia driver and those have different names.
Just in case, I tried the functions found in Linux - how to change info of forked processes in C which did seem wrong since it says "fork()'ed". But since each task has its own entry under /proc... but the issue, I suppose, is that the threads share the same memory as their main process and there is only one location for the argv[0] data. In other words, they implemented a pthread_setname_np() which works internally, but does not reflect that name in tools such as ps and htop.
Okay, I found out how to make it work. You want to write the name in the proc file directly. Than it works as expected. Each thread gets its own name.
First, we need to know the thread identifier (it's number, not the pthread_id). Under Linux, you can get that information with the following function:
pid_t gettid()
{
return static_cast<pid_t>(syscall(SYS_gettid));
}
Now to setup the thread name open the comm file in write mode and write the name there:
void set_thread_name(std::string const & name)
{
if(name.length() > 15)
throw std::range_error("thread name is limited to 15 chars");
pid_t const tid(gettid());
std::ofstream comm("/proc/" + std::to_string(tid) + "/comm");
comm << name;
}
I use C++ which simplifies things. You can, of course, do the same thing in C:
void set_thread_name(const char * name)
{
pid_t tid;
char filename[6 + 5 + 5 + 1];
if(strlen(name) > 15)
{
errno = EINVAL;
return -1;
}
tid = gettid();
snprintf(filename, sizeof(filename), "/proc/%d/comm", tid);
FILE * comm(fopen(filename, "w"));
fprintf(comm, "%s", name);
fclose(comm);
}
Now each one of my thread has a different name. Thanks to Nate who gave me the idea of trying this (even if his comment does not quite read this way).
You may also want to use the pthread_setname_np(). Somehow that function will not update the name you directly wrote to the comm file.
Related
What would be your suggestion in order to create a single instance application, so that only one process is allowed to run at a time? File lock, mutex or what?
A good way is:
#include <sys/file.h>
#include <errno.h>
int pid_file = open("/var/run/whatever.pid", O_CREAT | O_RDWR, 0666);
int rc = flock(pid_file, LOCK_EX | LOCK_NB);
if(rc) {
if(EWOULDBLOCK == errno)
; // another instance is running
}
else {
// this is the first instance
}
Note that locking allows you to ignore stale pid files (i.e. you don't have to delete them). When the application terminates for any reason the OS releases the file lock for you.
Pid files are not terribly useful because they can be stale (the file exists but the process does not). Hence, the application executable itself can be locked instead of creating and locking a pid file.
A more advanced method is to create and bind a unix domain socket using a predefined socket name. Bind succeeds for the first instance of your application. Again, the OS unbinds the socket when the application terminates for any reason. When bind() fails another instance of the application can connect() and use this socket to pass its command line arguments to the first instance.
Here is a solution in C++. It uses the socket recommendation of Maxim. I like this solution better than the file based locking solution, because the file based one fails if the process crashes and does not delete the lock file. Another user will not be able to delete the file and lock it. The sockets are automatically deleted when the process exits.
Usage:
int main()
{
SingletonProcess singleton(5555); // pick a port number to use that is specific to this app
if (!singleton())
{
cerr << "process running already. See " << singleton.GetLockFileName() << endl;
return 1;
}
... rest of the app
}
Code:
#include <netinet/in.h>
class SingletonProcess
{
public:
SingletonProcess(uint16_t port0)
: socket_fd(-1)
, rc(1)
, port(port0)
{
}
~SingletonProcess()
{
if (socket_fd != -1)
{
close(socket_fd);
}
}
bool operator()()
{
if (socket_fd == -1 || rc)
{
socket_fd = -1;
rc = 1;
if ((socket_fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
{
throw std::runtime_error(std::string("Could not create socket: ") + strerror(errno));
}
else
{
struct sockaddr_in name;
name.sin_family = AF_INET;
name.sin_port = htons (port);
name.sin_addr.s_addr = htonl (INADDR_ANY);
rc = bind (socket_fd, (struct sockaddr *) &name, sizeof (name));
}
}
return (socket_fd != -1 && rc == 0);
}
std::string GetLockFileName()
{
return "port " + std::to_string(port);
}
private:
int socket_fd = -1;
int rc;
uint16_t port;
};
For windows, a named kernel object (e.g. CreateEvent, CreateMutex). For unix, a pid-file - create a file and write your process ID to it.
You can create an "anonymous namespace" AF_UNIX socket. This is completely Linux-specific, but has the advantage that no filesystem actually has to exist.
Read the man page for unix(7) for more info.
Avoid file-based locking
It is always good to avoid a file based locking mechanism to implement the singleton instance of an application. The user can always rename the lock file to a different name and run the application again as follows:
mv lockfile.pid lockfile1.pid
Where lockfile.pid is the lock file based on which is checked for existence before running the application.
So, it is always preferable to use a locking scheme on object directly visible to only the kernel. So, anything which has to do with a file system is not reliable.
So the best option would be to bind to a inet socket. Note that unix domain sockets reside in the filesystem and are not reliable.
Alternatively, you can also do it using DBUS.
It's seems to not be mentioned - it is possible to create a mutex in shared memory but it needs to be marked as shared by attributes (not tested):
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_t *mutex = shmat(SHARED_MEMORY_ID, NULL, 0);
pthread_mutex_init(mutex, &attr);
There is also shared memory semaphores (but I failed to find out how to lock one):
int sem_id = semget(SHARED_MEMORY_KEY, 1, 0);
No one has mentioned it, but sem_open() creates a real named semaphore under modern POSIX-compliant OSes. If you give a semaphore an initial value of 1, it becomes a mutex (as long as it is strictly released only if a lock was successfully obtained).
With several sem_open()-based objects, you can create all of the common equivalent Windows named objects - named mutexes, named semaphores, and named events. Named events with "manual" set to true is a bit more difficult to emulate (it requires four semaphore objects to properly emulate CreateEvent(), SetEvent(), and ResetEvent()). Anyway, I digress.
Alternatively, there is named shared memory. You can initialize a pthread mutex with the "shared process" attribute in named shared memory and then all processes can safely access that mutex object after opening a handle to the shared memory with shm_open()/mmap(). sem_open() is easier if it is available for your platform (if it isn't, it should be for sanity's sake).
Regardless of the method you use, to test for a single instance of your application, use the trylock() variant of the wait function (e.g. sem_trywait()). If the process is the only one running, it will successfully lock the mutex. If it isn't, it will fail immediately.
Don't forget to unlock and close the mutex on application exit.
It will depend on which problem you want to avoid by forcing your application to have only one instance and the scope on which you consider instances.
For a daemon — the usual way is to have a /var/run/app.pid file.
For user application, I've had more problems with applications which prevented me to run them twice than with being able to run twice an application which shouldn't have been run so. So the answer on "why and on which scope" is very important and will probably bring answer specific on the why and the intended scope.
Here is a solution based on sem_open
/*
*compile with :
*gcc single.c -o single -pthread
*/
/*
* run multiple instance on 'single', and check the behavior
*/
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <semaphore.h>
#include <unistd.h>
#include <errno.h>
#define SEM_NAME "/mysem_911"
int main()
{
sem_t *sem;
int rc;
sem = sem_open(SEM_NAME, O_CREAT, S_IRWXU, 1);
if(sem==SEM_FAILED){
printf("sem_open: failed errno:%d\n", errno);
}
rc=sem_trywait(sem);
if(rc == 0){
printf("Obtained lock !!!\n");
sleep(10);
//sem_post(sem);
sem_unlink(SEM_NAME);
}else{
printf("Lock not obtained\n");
}
}
One of the comments on a different answer says "I found sem_open() rather lacking". I am not sure about the specifics of what's lacking
Based on the hints in maxim's answer here is my POSIX solution of a dual-role daemon (i.e. a single application that can act as daemon and as a client communicating with that daemon). This scheme has the advantage of providing an elegant solution of the problem when the instance started first should be the daemon and all following executions should just load off the work at that daemon. It is a complete example but lacks a lot of stuff a real daemon should do (e.g. using syslog for logging and fork to put itself into background correctly, dropping privileges etc.), but it is already quite long and is fully working as is. I have only tested this on Linux so far but IIRC it should be all POSIX-compatible.
In the example the clients can send integers passed to them as first command line argument and parsed by atoi via the socket to the daemon which prints it to stdout. With this kind of sockets it is also possible to transfer arrays, structs and even file descriptors (see man 7 unix).
#include <stdio.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/un.h>
#define SOCKET_NAME "/tmp/exampled"
static int socket_fd = -1;
static bool isdaemon = false;
static bool run = true;
/* returns
* -1 on errors
* 0 on successful server bindings
* 1 on successful client connects
*/
int singleton_connect(const char *name) {
int len, tmpd;
struct sockaddr_un addr = {0};
if ((tmpd = socket(AF_UNIX, SOCK_DGRAM, 0)) < 0) {
printf("Could not create socket: '%s'.\n", strerror(errno));
return -1;
}
/* fill in socket address structure */
addr.sun_family = AF_UNIX;
strcpy(addr.sun_path, name);
len = offsetof(struct sockaddr_un, sun_path) + strlen(name);
int ret;
unsigned int retries = 1;
do {
/* bind the name to the descriptor */
ret = bind(tmpd, (struct sockaddr *)&addr, len);
/* if this succeeds there was no daemon before */
if (ret == 0) {
socket_fd = tmpd;
isdaemon = true;
return 0;
} else {
if (errno == EADDRINUSE) {
ret = connect(tmpd, (struct sockaddr *) &addr, sizeof(struct sockaddr_un));
if (ret != 0) {
if (errno == ECONNREFUSED) {
printf("Could not connect to socket - assuming daemon died.\n");
unlink(name);
continue;
}
printf("Could not connect to socket: '%s'.\n", strerror(errno));
continue;
}
printf("Daemon is already running.\n");
socket_fd = tmpd;
return 1;
}
printf("Could not bind to socket: '%s'.\n", strerror(errno));
continue;
}
} while (retries-- > 0);
printf("Could neither connect to an existing daemon nor become one.\n");
close(tmpd);
return -1;
}
static void cleanup(void) {
if (socket_fd >= 0) {
if (isdaemon) {
if (unlink(SOCKET_NAME) < 0)
printf("Could not remove FIFO.\n");
} else
close(socket_fd);
}
}
static void handler(int sig) {
run = false;
}
int main(int argc, char **argv) {
switch (singleton_connect(SOCKET_NAME)) {
case 0: { /* Daemon */
struct sigaction sa;
sa.sa_handler = &handler;
sigemptyset(&sa.sa_mask);
if (sigaction(SIGINT, &sa, NULL) != 0 || sigaction(SIGQUIT, &sa, NULL) != 0 || sigaction(SIGTERM, &sa, NULL) != 0) {
printf("Could not set up signal handlers!\n");
cleanup();
return EXIT_FAILURE;
}
struct msghdr msg = {0};
struct iovec iovec;
int client_arg;
iovec.iov_base = &client_arg;
iovec.iov_len = sizeof(client_arg);
msg.msg_iov = &iovec;
msg.msg_iovlen = 1;
while (run) {
int ret = recvmsg(socket_fd, &msg, MSG_DONTWAIT);
if (ret != sizeof(client_arg)) {
if (errno != EAGAIN && errno != EWOULDBLOCK) {
printf("Error while accessing socket: %s\n", strerror(errno));
exit(1);
}
printf("No further client_args in socket.\n");
} else {
printf("received client_arg=%d\n", client_arg);
}
/* do daemon stuff */
sleep(1);
}
printf("Dropped out of daemon loop. Shutting down.\n");
cleanup();
return EXIT_FAILURE;
}
case 1: { /* Client */
if (argc < 2) {
printf("Usage: %s <int>\n", argv[0]);
return EXIT_FAILURE;
}
struct iovec iovec;
struct msghdr msg = {0};
int client_arg = atoi(argv[1]);
iovec.iov_base = &client_arg;
iovec.iov_len = sizeof(client_arg);
msg.msg_iov = &iovec;
msg.msg_iovlen = 1;
int ret = sendmsg(socket_fd, &msg, 0);
if (ret != sizeof(client_arg)) {
if (ret < 0)
printf("Could not send device address to daemon: '%s'!\n", strerror(errno));
else
printf("Could not send device address to daemon completely!\n");
cleanup();
return EXIT_FAILURE;
}
printf("Sent client_arg (%d) to daemon.\n", client_arg);
break;
}
default:
cleanup();
return EXIT_FAILURE;
}
cleanup();
return EXIT_SUCCESS;
}
All credits go to Mark Lakata. I merely did some very minor touch up only.
main.cpp
#include "singleton.hpp"
#include <iostream>
using namespace std;
int main()
{
SingletonProcess singleton(5555); // pick a port number to use that is specific to this app
if (!singleton())
{
cerr << "process running already. See " << singleton.GetLockFileName() << endl;
return 1;
}
// ... rest of the app
}
singleton.hpp
#include <netinet/in.h>
#include <unistd.h>
#include <cerrno>
#include <string>
#include <cstring>
#include <stdexcept>
using namespace std;
class SingletonProcess
{
public:
SingletonProcess(uint16_t port0)
: socket_fd(-1)
, rc(1)
, port(port0)
{
}
~SingletonProcess()
{
if (socket_fd != -1)
{
close(socket_fd);
}
}
bool operator()()
{
if (socket_fd == -1 || rc)
{
socket_fd = -1;
rc = 1;
if ((socket_fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
{
throw std::runtime_error(std::string("Could not create socket: ") + strerror(errno));
}
else
{
struct sockaddr_in name;
name.sin_family = AF_INET;
name.sin_port = htons (port);
name.sin_addr.s_addr = htonl (INADDR_ANY);
rc = bind (socket_fd, (struct sockaddr *) &name, sizeof (name));
}
}
return (socket_fd != -1 && rc == 0);
}
std::string GetLockFileName()
{
return "port " + std::to_string(port);
}
private:
int socket_fd = -1;
int rc;
uint16_t port;
};
#include <windows.h>
int main(int argc, char *argv[])
{
// ensure only one running instance
HANDLE hMutexH`enter code here`andle = CreateMutex(NULL, TRUE, L"my.mutex.name");
if (GetLastError() == ERROR_ALREADY_EXISTS)
{
return 0;
}
// rest of the program
ReleaseMutex(hMutexHandle);
CloseHandle(hMutexHandle);
return 0;
}
FROM: HERE
On Windows you could also create a shared data segment and use an interlocked function to test for the first occurence, e.g.
#include <Windows.h>
#include <stdio.h>
#include <conio.h>
#pragma data_seg("Shared")
volatile LONG lock = 0;
#pragma data_seg()
#pragma comment(linker, "/SECTION:Shared,RWS")
void main()
{
if (InterlockedExchange(&lock, 1) == 0)
printf("first\n");
else
printf("other\n");
getch();
}
I have just written one, and tested.
#define PID_FILE "/tmp/pidfile"
static void create_pidfile(void) {
int fd = open(PID_FILE, O_RDWR | O_CREAT | O_EXCL, 0);
close(fd);
}
int main(void) {
int fd = open(PID_FILE, O_RDONLY);
if (fd > 0) {
close(fd);
return 0;
}
// make sure only one instance is running
create_pidfile();
}
Just run this code on a seperate thread:
void lock() {
while(1) {
ofstream closer("myapplock.locker", ios::trunc);
closer << "locked";
closer.close();
}
}
Run this as your main code:
int main() {
ifstream reader("myapplock.locker");
string s;
reader >> s;
if (s != "locked") {
//your code
}
return 0;
}
Parent receives SIGPIPE sending chars to aborted child process through FIFO pipe.
I am trying to avoid this, using select() function. In the attached sample code,
select() retruns OK even after the child at the other end of pipe having been terminated.
Tested in
RedHat EL5 (Linux 2.6.18-194.32.1.el5)
GNU C Library stable release version 2.5
Any help appreciated. Thnak you.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <unistd.h>
static void sigpipe_fct();
main()
{
struct stat st;
int i, fd_out, fd_in, child;
char buf[1024];
#define p_out "/tmp/pout"
signal(SIGPIPE, sigpipe_fct);
if (stat(p_out, &st) != 0) {
mknod(p_out, S_IFIFO, 0);
chmod(p_out, 0666);
}
/* start receiving process */
if ((child = fork()) == 0) {
if ((fd_in = open(p_out, O_RDONLY)) < 0) {
perror(p_out);
exit(1);
}
while(1) {
i = read(fd_in, buf, sizeof(buf));
fprintf(stderr, "child %d read %.*s\n", getpid(), i, buf);
lseek(fd_in, 0, 0);
}
}
else {
fprintf(stderr,
"reading from %s - exec \"kill -9 %d\" to test\n", p_out, child);
if ((fd_out = open(p_out, O_WRONLY + O_NDELAY)) < 0) { /* output */
perror(p_out);
exit(1);
}
while(1) {
if (SelectChkWrite(fd_out) == fd_out) {
fprintf(stderr, "SelectChkWrite() success write abc\n");
write(fd_out, "abc", 3);
}
else
fprintf(stderr, "SelectChkWrite() failed\n");
sleep(3);
}
}
}
static void sigpipe_fct()
{
fprintf(stderr, "SIGPIPE received\n");
exit(-1);
}
SelectChkWrite(ch)
int ch;
{
#include <sys/select.h>
fd_set writefds;
int i;
FD_ZERO(&writefds);
FD_SET (ch, &writefds);
i = select(ch + 1, NULL, &writefds, NULL, NULL);
if (i == -1)
return(-1);
else if (FD_ISSET(ch, &writefds))
return(ch);
else
return(-1);
}
From the Linux select(3) man page:
A descriptor shall be considered ready for writing when a call to an
output function with O_NONBLOCK clear would not block, whether or not
the function would transfer data successfully.
When the pipe is closed, it won't block, so it is considered "ready" by select.
BTW, having #include <sys/select.h> inside your SelectChkWrite() function is extremely bad form.
Although select() and poll() are both in the POSIX standard, select() is much older and more limited than poll(). In general, I recommend people use poll() by default and only use select() if they have a good reason. (See here for one example.)
I want to use eventfd as a way to signal simple events between kernelspace and userspace. eventfd will be used as a way to signal and the actual data will be transferred using ioctl.
Before going ahead with implementing this, I wrote a simple program to see how eventfd behaves with select(). It seems that if you use select to wait on an eventfd, it wont return when u write to it in a separate thread. In the code I wrote, the writing thread waits for 5 seconds beginning from program start before writing to the eventfd twice. I would expect the select() to return in the reading thread immediately following this write but this does not happen. The select() returns only after the timeout of 10 seconds and returns zero. Regardless of this return zero, when I try to read the eventfd after 10 seconds, I get the correct value.
I use Ubuntu 12.04.1 (3.2.0-29-generic-pae) i386
Any idea why this is so? It seems to me that select() is not working as it should.
PS: This question is similar to linux - Can't get eventfd to work with epoll together
Is anyone else facing similar issues?
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h> //Definition of uint64_t
#include <pthread.h> //One thread writes to fd, other waits on it and then reads it
#include <time.h> //Writing thread uses delay before writing
#include <sys/eventfd.h>
int efd; //Event file descriptor
void * writing_thread_func() {
uint64_t eftd_ctr = 34;
ssize_t s;
printf("\n%s: now running...",__func__);
printf("\n%s: now sleeping for 5 seconds...",__func__);
fflush(stdout); //must call fflush before sleeping to ensure previous printf() is executed
sleep(5);
printf("\n%s: Writing %lld to eventfd...",__func__,eftd_ctr);
s = write(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)) {
printf("\n%s: eventfd writing error. Exiting...",__func__);
exit(EXIT_FAILURE);
}
eftd_ctr = 99;
printf("\n%s: Writing %lld to eventfd...",__func__,eftd_ctr);
s = write(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)) {
printf("\n%s: eventfd writing error. Exiting...",__func__);
exit(EXIT_FAILURE);
}
printf("\n%s: thread exiting...",__func__);
pthread_exit(0);
}
void * reading_thread_func() {
ssize_t s;
uint64_t eftd_ctr;
int retval; //for select()
fd_set rfds; //for select()
struct timeval tv; //for select()
printf("\n%s: now running...",__func__);
printf("\n%s: now waiting on select()...",__func__);
//Watch efd
FD_ZERO(&rfds);
FD_SET(efd, &rfds);
//Wait up to 10 seconds
tv.tv_sec = 10;
tv.tv_usec = 0;
retval = select(1, &rfds, NULL, NULL, &tv);
if (retval == -1){
printf("\n%s: select() error. Exiting...",__func__);
exit(EXIT_FAILURE);
} else if (retval > 0) {
printf("\n%s: select() says data is available now. Exiting...",__func__);
printf("\n%s: returned from select(), now executing read()...",__func__);
s = read(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)){
printf("\n%s: eventfd read error. Exiting...",__func__);
exit(EXIT_FAILURE);
}
printf("\n%s: Returned from read(), value read = %lld",__func__, eftd_ctr);
} else if (retval == 0) {
printf("\n%s: select() says that no data was available even after 10 seconds...",__func__);
printf("\n%s: but lets try reading efd count anyway...",__func__);
s = read(efd, &eftd_ctr, sizeof(uint64_t));
if (s != sizeof(uint64_t)){
printf("\n%s: eventfd read error. Exiting...",__func__);
exit(EXIT_FAILURE);
}
printf("\n%s: Returned from read(), value read = %lld",__func__, eftd_ctr);
exit(EXIT_FAILURE);
}
printf("\n%s: thread exiting...",__func__);
pthread_exit(0);
}
int main() {
pthread_t writing_thread_var, reading_thread_var;
//Create eventfd
efd = eventfd(0,0);
if (efd == -1){
printf("\n%s: Unable to create eventfd! Exiting...",__func__);
exit(EXIT_FAILURE);
}
printf("\n%s: eventfd created. value = %d. Spawning threads...",__func__,efd);
//Create threads
pthread_create(&writing_thread_var, NULL, writing_thread_func, NULL);
pthread_create(&reading_thread_var, NULL, reading_thread_func, NULL);
//Wait for threads to terminate
pthread_join(writing_thread_var, NULL);
pthread_join(reading_thread_var, NULL);
printf("\n%s: closing eventfd. Exiting...",__func__);
close(efd);
exit(EXIT_SUCCESS);
}
So it was a silly mistake:
I changed:
retval = select(1, &rfds, NULL, NULL, &tv);
to:
retval = select(efd+1, &rfds, NULL, NULL, &tv);
and it worked.
Thanks again #Steve-o
I wrote a code to create some threads and whenever one of the threads finish a new thread is created to replace it. As I was not able to create very large number of threads (>450) using pthreads, I used clone system call instead. (Please note that I am aware of the implication of having such a huge number of threads, but this program is meant to only stress the system).
As clone() requires the stack space for the child thread to be specified as parameter, I malloc the required chunk of stack space for each thread and free it up when the thread finishes. When a thread finishes I send a signal to the parent to notify it of the same.
The code is given below:
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <errno.h>
#define NUM_THREADS 5
unsigned long long total_count=0;
int num_threads = NUM_THREADS;
static int thread_pids[NUM_THREADS];
static void *thread_stacks[NUM_THREADS];
int ppid;
int worker() {
int i;
union sigval s={0};
for(i=0;i!=99999999;i++);
if(sigqueue(ppid, SIGUSR1, s)!=0)
fprintf(stderr, "ERROR sigqueue");
fprintf(stderr, "Child [%d] done\n", getpid());
return 0;
}
void sigint_handler(int signal) {
char fname[35]="";
FILE *fp;
int ch;
if(signal == SIGINT) {
fprintf(stderr, "Caught SIGINT\n");
sprintf(fname, "/proc/%d/status", getpid());
fp = fopen(fname,"r");
while((ch=fgetc(fp))!=EOF)
fprintf(stderr, "%c", (char)ch);
fclose(fp);
fprintf(stderr, "No. of threads created so far = %llu\n", total_count);
exit(0);
} else
fprintf(stderr, "Unhandled signal (%d) received\n", signal);
}
int main(int argc, char *argv[]) {
int rc, i; long t;
void *chld_stack, *chld_stack2;
siginfo_t siginfo;
sigset_t sigset, oldsigset;
if(argc>1) {
num_threads = atoi(argv[1]);
if(num_threads<1) {
fprintf(stderr, "Number of threads must be >0\n");
return -1;
}
}
signal(SIGINT, sigint_handler);
/* Block SIGUSR1 */
sigemptyset(&sigset);
sigaddset(&sigset, SIGUSR1);
if(sigprocmask(SIG_BLOCK, &sigset, &oldsigset)==-1)
fprintf(stderr, "ERROR: cannot block SIGUSR1 \"%s\"\n", strerror(errno));
printf("Number of threads = %d\n", num_threads);
ppid = getpid();
for(t=0,i=0;t<num_threads;t++,i++) {
chld_stack = (void *) malloc(148*512);
chld_stack2 = ((char *)chld_stack + 148*512 - 1);
if(chld_stack == NULL) {
fprintf(stderr, "ERROR[%ld]: malloc for stack-space failed\n", t);
break;
}
rc = clone(worker, chld_stack2, CLONE_VM|CLONE_FS|CLONE_FILES, NULL);
if(rc == -1) {
fprintf(stderr, "ERROR[%ld]: return code from pthread_create() is %d\n", t, errno);
break;
}
thread_pids[i]=rc;
thread_stacks[i]=chld_stack;
fprintf(stderr, " [index:%d] = [pid:%d] ; [stack:0x%p]\n", i, thread_pids[i], thread_stacks[i]);
total_count++;
}
sigemptyset(&sigset);
sigaddset(&sigset, SIGUSR1);
while(1) {
fprintf(stderr, "Waiting for signal from childs\n");
if(sigwaitinfo(&sigset, &siginfo) == -1)
fprintf(stderr, "- ERROR returned by sigwaitinfo : \"%s\"\n", strerror(errno));
fprintf(stderr, "Got some signal from pid:%d\n", siginfo.si_pid);
/* A child finished, free the stack area allocated for it */
for(i=0;i<NUM_THREADS;i++) {
fprintf(stderr, " [index:%d] = [pid:%d] ; [stack:%p]\n", i, thread_pids[i], thread_stacks[i]);
if(thread_pids[i]==siginfo.si_pid) {
free(thread_stacks[i]);
thread_stacks[i]=NULL;
break;
}
}
fprintf(stderr, "Search for child ended with i=%d\n",i);
if(i==NUM_THREADS)
continue;
/* Create a new thread in its place */
chld_stack = (void *) malloc(148*512);
chld_stack2 = ((char *)chld_stack + 148*512 - 1);
if(chld_stack == NULL) {
fprintf(stderr, "ERROR[%ld]: malloc for stack-space failed\n", t);
break;
}
rc = clone(worker, chld_stack2, CLONE_VM|CLONE_FS|CLONE_FILES, NULL);
if(rc == -1) {
fprintf(stderr, "ERROR[%ld]: return code from clone() is %d\n", t, errno);
break;
}
thread_pids[i]=rc;
thread_stacks[i]=chld_stack;
total_count++;
}
fprintf(stderr, "Broke out of infinite loop. [total_count=%llu] [i=%d]\n",total_count, i);
return 0;
}
I have used couple of arrays to keep track of the child processes' pid and the stack area base address (for freeing it).
When I run this program it terminates after sometime. Running with gdb tells me that one of the thread gets a SIGSEGV (segmentation fault). But it doesn't gives me any location, the output is similar to the following:
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 15864]
0x00000000 in ?? ()
I tried running it under valgrind with the following commandline:
valgrind --tool=memcheck --leak-check=yes --show-reachable=yes -v --num-callers=20 --track-fds=yes ./a.out
But it keeps running without any issues under valgrind.
I am puzzled as to how to debug this program. I felt that this might be some stack overflow or something but increasing the stack size (upto 74KB) didn't solved the problem.
My only query is why and where is the segmentation fault or how to debug this program.
Found the actual issue.
When the worker thread signals the parent process using sigqueue(), the parent sometimes gets the control immediately and frees up the stack before the child executes the return statement. When the same child thread uses return statement, it causes segmentation fault as the stack got corrupted.
To solve this I replaced
exit(0)
instead of
return 0;
I think i found the answer
Step 1
Replace this:
static int thread_pids[NUM_THREADS];
static void *thread_stacks[NUM_THREADS];
By this:
static int *thread_pids;
static void **thread_stacks;
Step 2
Add this in the main function (after checking arguments):
thread_pids = malloc(sizeof(int) * num_threads);
thread_stacks = malloc(sizeof(void *) * num_threads);
Step 3
Replace this:
chld_stack2 = ((char *)chld_stack + 148*512 - 1);
By this:
chld_stack2 = ((char *)chld_stack + 148*512);
In both places you use it.
I dont know if its really your problem, but after testing it i didnt get any segmentation fault. Btw i did only get segmentation faults when using more than 5 threads.
Hope i helped!
edit: tested with 1000 threads and runs perfectly
edit2: Explanation why the static allocation of thread_pids and thread_stacks causes an error.
The best way to do this is with an example.
Assume num_threads = 10;
The problem occurs in the following code:
for(t=0,i=0;t<num_threads;t++,i++) {
...
thread_pids[i]=rc;
thread_stacks[i]=chld_stack;
...
}
Here you try to access memory which does not belong to you (0 <= i <= 9, but both arrays have a size of 5). That can cause either segmentation fault or data corruption. Data corruption may happen if both arrays are allocated one after the other, resulting in writing to the other array. Segmentation can happen if you write in memory you dont have allocated (statically or dynamically).
You may be lucky and have no errors at all, but the code is surely not safe.
About the non-aligned pointer: I think i dont have to explain more than in my comment.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <semaphore.h>
void *thread_function(void *arg);
sem_t bin_sem;
#define WORK_SIZE 1024
char work_area[WORK_SIZE];
int main() {
int res;
pthread_t a_thread;
void *thread_result;
res = sem_init(&bin_sem, 0, 0);
if (res != 0) {
perror(“Semaphore initialization failed”);
exit(EXIT_FAILURE);
}
res = pthread_create(&a_thread, NULL, thread_function, NULL);
if (res != 0) {
perror(“Thread creation failed”);
exit(EXIT_FAILURE);
}
printf(“Input some text. Enter ‘end’ to finish\n”);
while(strncmp(“end”, work_area, 3) != 0) {
fgets(work_area, WORK_SIZE, stdin);
sem_post(&bin_sem);
}
printf(“\nWaiting for thread to finish...\n”);
res = pthread_join(a_thread, &thread_result);
if (res != 0) {
perror(“Thread join failed”);
exit(EXIT_FAILURE);
}
printf(“Thread joined\n”);
sem_destroy(&bin_sem);
exit(EXIT_SUCCESS);
}
void *thread_function(void *arg) {
sem_wait(&bin_sem);
while(strncmp(“end”, work_area, 3) != 0) {
printf(“You input %d characters\n”, strlen(work_area) -1);
sem_wait(&bin_sem);}
pthread_exit(NULL);
}
In the program above, when the semaphore is released using sem_post(), is it
possible that the fgets and the counting function in thread_function execute
simultaneously .And I think this program fails in allowing the second thread
to count the characters before the main thread reads the keyboard again.
Is that right?
The second thread will only read characters after sem_wait has returned, signaling that a sem_post has been called somewhere, so I think that is fine.
As for fgets and the counting function, those two could be running simultaneously.
I would recommend a mutex lock on the work_area variable in this case, because if the user is editing the variable in one thread while it is being read in another thread, problems will occur.
You can either use a mutex or you can use a semaphore and set the initial count on it to 1.
If you implement a mutex or use a semaphore like that though, make sure to put the mutex_lock after sema_wait, or else a deadlock may occur.
In this example you want to have a mutex around the read & writes of the shared memory.
I know this is an example, but the following code:
fgets(work_area, WORK_SIZE, stdin);
Should really be:
fgets(work_area, sizeof(work_area), stdin);
If you change the size of work_area in the future (to some other constant, etc), it's quite likely that changing this second WORK_SIZE could be missed.