call pthread_self() from a single-threaded application


On Linux, ps -Lf displays a thread ID in the LWP column and the number of threads in the NLWP column. Any single-threaded process will have the same PID and LWP values.
What should pthread_self() return on a single threaded application? Initially I was expecting that its value should be the same as a process ID, executing this call, but results were different. Then I read man pthread_self and man gettid and learned that the value returned by pthread_self() is not the same as gettid() result.
So can I even trust pthread_self() output executed in a non-threaded environment (process)?

pthread_self() is defined to return the calling thread's ID regardless of whether the program
is multi-threaded or single-threaded.
As you found, the return value of pthread_self() isn't the same as the LWP on Linux (gettid), and as such it has no meaning outside the process; pthread_t is
an opaque type. Related: The thread ID returned by pthread_self() is not the same thing as the kernel thread ID returned by a call to gettid(2)
Its utility is limited, as there's not much practical use for a pthread_t in a single-threaded program. You can pass it to pthread_setschedparam(), for example.
But if you are asking whether it returns a valid value in a single-threaded program, then the answer is yes.

Related

Which Linux syscall is used to get a thread's ID?

I have to implement a wrapper function that serves as pthread_self() to get a pthread ID, but I've been searching and haven't found which syscall does this. From another Stack Overflow post I know clone() is used to create threads, and that I can trace the syscalls with ptrace(), but before tracing it by hand... does anyone know which syscall it is?

There are 3 different IDs for a Linux process thread: pid, pthread id, and tid.
The 'pid' is global and is the ID of the process that owns the thread; it is easily obtained with 'getpid()'. This value is unique, but only for the lifetime of the process it is assigned to. It may be 'recycled' for a new process after the original process terminates and new ones are spawned. This value is the same across all threads within a process. This value is what you'll see in top, htop, 'ps -ef', and pidstat.
The 'pthread id' is reported by pthread_create() and pthread_self(). This value is unique only within the process, and only for the lifetime of the thread it is assigned to. This value may be 'recycled' as threads are terminated and spawned. This value is not unique across the system, nor across threads that have been terminated and started. This value is NOT visible outside of a program. This value is opaque and may be a pointer or structure depending on the platform.
The 'tid' (thread ID) is reported by gettid(). It was introduced in Linux 2.4 and does not appear to be available on other platforms. This value is unique within the process and across the system. I'm not 100% certain, but suspect this value can be 'recycled' as threads and processes are terminated and spawned. This is the value that appears in the Linux tools 'top', 'htop', 'pidstat -t', and 'ps -efL' when they show threads.
Documentation for gettid: linux.die.net/gettid
You can obtain 'gettid()' through:
#include <unistd.h>       // syscall()
#include <sys/types.h>    // pid_t
#include <sys/syscall.h>  // SYS_gettid
#include <pthread.h>
My CentOS 6.5 system is missing the gettid prototype even with the above #includes; glibc only ships a gettid() wrapper from version 2.30 onward. Here is a macro that mimics 'gettid':
#ifndef gettid
// equivalent to: pid_t gettid(void)
#define gettid() ((pid_t)syscall(SYS_gettid))
#endif
Be aware that since this is a syscall(), you'll gain efficiency by caching the result instead of calling syscall() repeatedly.

How about syscall 0xe0 (224, the gettid number on 32-bit x86; the number differs on other architectures), gettid()?
gettid() returns the caller's thread ID (TID). In a single-threaded process, the thread ID is equal to the process ID (PID, as returned by getpid(2)). In a multithreaded process, all threads have the same PID, but each one has a unique TID. For further details, see the discussion of CLONE_THREAD in clone(2).

In glibc, pthread_self() does not make any system call; it returns a pointer to the calling thread's struct pthread, which lives in the thread-local storage (TLS) area.

This might be helpful.
pid_t tid = (pid_t)syscall(SYS_gettid);

Can the thread ID of a multithreaded process be the same as the process ID of another running process?

I'm trying to find a way to uniquely identify threads in a multi-process environment. I have a server that keeps track of the different processes connecting to it, some of which are multi-threaded and some of which are not. To identify the threads from multi-threaded connections I'm using the thread ID as a unique identifier (there will be a maximum of 1 multi-threaded process connected at any given time). My question is: is it possible that the thread ID of one of these threads could be the same as the process ID of another process running on the system?
Thanks in advance for the help!

The TID (as returned by the sys_gettid() system call) is unique across all threads on the system [1], and for a single-threaded process the PID and TID are equal. This means that a TID will never clash with a PID from another process.
[1] With the caveat that if PID namespaces are in use, TIDs and PIDs are only unique within the same PID namespace.

According to the pthreads man page, the thread ID is unique only within the creating process, so yes, another thread or process could have the same ID. However, since it is unique within a process and a process ID is unique in the system, you may be able to use a combination of the two as a unique identifier.
Each of the threads in a process has a unique thread identifier
(stored in the type pthread_t). This identifier is returned to the
caller of pthread_create(3), and a thread can obtain its own thread
identifier using pthread_self(3). Thread IDs are only guaranteed to
be unique within a process.

While the pthread ID might not be unique, in an implementation where threads map to tasks, the task ID (as seen in /proc/PID/task) will in fact be unique system-wide, and have a form similar to an actual PID.

Well, I came across the same problem just now and here is my program for validation.
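#include <pthread.h>
#include <stdio.h>
int main(void) {
    /* pthread_t is opaque; the cast assumes it is an integer type (true on Linux/glibc) */
    printf("%lu\n", (unsigned long)pthread_self());
    return 0;
}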
clang -pthread test.c && strace ./a.out
Part of the output is as follows.
...
arch_prctl(ARCH_SET_FS, 0x7f53259be740) = 0
...
write(1, "139995089987392\n", 16139995089987392
) = 16
...
So 0x7f53259be740 equals 139995089987392, and the second argument of arch_prctl must lie within the process address space (see man arch_prctl). That is to say, the thread ID is in fact a virtual address. So if you use pthread_self() to identify threads in a multi-process environment, collisions may happen, although the chance is small.

pthread_equal(id1, id2)
It compares the IDs of two threads and returns a non-zero value if they are the same and 0 if they are different.

Are process ids non-negative in Linux?

I am implementing a syscall that is called from user space, let's say by foo.
The syscall accesses foo's task_struct (through the global pointer current), prints its name and pid, then moves on to foo's parent process, foo's parent's parent, etc. It prints all their names and pids up to and including the init process's.
The pid=1 is reserved for init, the pid=0 is reserved for swapper.
According to swapper's task_struct, its parent process is itself.
Does swapper (or sched) always have pid=0, and is it always init's parent process?
Are all pids non-negative? Is it ok for me to make that assumption?

To answer your questions more concisely:
IBM's Inside the Linux boot process describes the swapper process as the process having a PID value of 0. At startup this process runs /sbin/init (or another process given as parameter by the bootloader), which will typically be the process with PID 1.
In Unix systems PID values are allocated sequentially, starting from the first process and up to a maximum value specified by /proc/sys/kernel/pid_max. Therefore you can safely go under the assumption that all valid PIDs have a positive value, while negatives boil down to error values and such.
A good idea would also be accounting for zombie processes, since they can receive "special treatment" in the process tree when/if they are adopted by init.

It is always positive or 0. The kernel sources define it as pid_t, which is a signed type so that calls such as fork can return a negative number on error, even though valid IDs themselves are never negative.

Yes, they are actually always positive.
You can verify this with several POSIX system calls, such as waitpid, that use negative values to represent things such as all child processes of your own, or the like, while only positive values represent valid PIDs.
Example: http://linux.die.net/man/2/wait

when is the system call set_tid_address used?

I have been trying to understand system calls, and want to understand how set_tid_address works. Basically, from what I have read, it returns the pid of the program or process that is executed.
I have tested this with ls; however, with some commands like uptime, top, etc. I don't see set_tid_address being used. Why is that?

The clone() syscall can take a CLONE_CHILD_CLEARTID flag, which causes the value at child_tidptr (another clone() argument) to be cleared and a wake-up to be performed on the associated futex when the child thread exits. This is used to implement pthread_join() (the joining thread waits on the futex).
set_tid_address() sets that same pointer for the calling thread, which makes it possible to pthread_join() the initial thread. More information in the following LKML threads:
[patch] threading fix, tid-2.5.47-A3
[patch] user-vm-unlock-2.5.31-A2
As to why some programs call set_tid_address() and others don't, the answer is easy. Programs linked (directly or indirectly) to libpthread call set_tid_address(). ls is linked to librt, which is linked to libpthread, so it runs the NPTL initialization.

According to the Linux Programmer's Manual, set_tid_address is used to:
set pointer to thread ID
When it is finished, it returns the caller's thread ID, which for the initial thread of a process is the same as the PID. Unfortunately the manual is vague as to when you would actually want to use this system call.
In any case, why do you think that these commands are using set_tid_address?

Difference between PID and TID

What is the difference between PID and TID?
The standard answer would be that PID is for processes while TID is for threads. However, I have seen that some commands use them interchangeably. For example, htop has a column for PIDs, in which PIDs for threads of the same process are shown (with different values). So when does a PID represent a thread or a process?

It is complicated: pid is process identifier; tid is thread identifier.
But as it happens, the kernel doesn't make a real distinction between them: threads are just like processes but they share some things (memory, fds...) with other instances of the same group.
So, a tid is actually the identifier of the schedulable object in the kernel (thread), while the pid is the identifier of the group of schedulable objects that share memory and fds (process).
But to make things more interesting, when a process has only one thread (the initial situation and in the good old times the only one) the pid and the tid are always the same. So any function that works with a tid will automatically work with a pid.
It is worth noting that many functions/system calls/command line utilities documented to work with pid actually use tids. But if the effect is process-wide you will simply not notice the difference.

Actually, each thread in a Linux process is a Light Weight Process (LWP). So people may call a thread a process... But there is surely a difference.
Each thread in a process has a different thread ID (TID), and all of them share the same process ID (PID).
If you are working with pthread library functions, note that these functions don't use these TIDs, because TIDs are kernel/OS-level thread IDs.

Just to add to other answers, according to man gettid:
The thread ID returned by this call is not the same thing as a POSIX thread ID (i.e., the opaque value returned by pthread_self(3)).
So there are two different things one could mean by TID!

pid and tid are the same except when a process is created with a call to clone with CLONE_THREAD (per the man pages of gettid). In this case, you get a unique thread id but all threads belonging to the same thread group share the same process id.
However, I also recall reading (though I can't find the source) that the values returned from getpid may be cached.
[UPDATE]
See the NOTES section here for a discussion on the effects of caching pids.
