Why only __add_wait_queue(q, wait) when wait is empty? - linux

https://elixir.bootlin.com/linux/v4.5/source/kernel/sched/wait.c#L172
void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
    unsigned long flags;
    wait->flags &= ~WQ_FLAG_EXCLUSIVE;
    spin_lock_irqsave(&q->lock, flags);
    if (list_empty(&wait->task_list))
        __add_wait_queue(q, wait);
    set_current_state(state);
    spin_unlock_irqrestore(&q->lock, flags);
}
In the code above, __add_wait_queue(q, wait) is executed only when list_empty(&wait->task_list) is true.
Why doesn't wait need to be added to q (wait_queue_head_t) when &wait->task_list is not empty?
Does that mean that if wait (wait_queue_t) is already in a q (wait_queue_head_t), it is left unchanged?

Yes, the branch
if (list_empty(&wait->task_list))
    __add_wait_queue(q, wait);
means that wait is added to the wait queue q only if wait does not already belong to any queue.
Otherwise, if wait is found to already belong to some wait queue, it is assumed to belong to q specifically, and it is not added again.
There is a subtlety in calling list_empty on an object that can be a list element (rather than the head of a list).
list_empty always returns false if the object belongs to a list.
But if the object does not belong to any list, the return value is generally unspecified (and in most cases it is false too).
The exceptions are objects initialized with the INIT_LIST_HEAD function or the LIST_HEAD_INIT macro, or removed from a list with the list_del_init function: for these, list_empty is guaranteed to return true.
Looking at where INIT_LIST_HEAD, LIST_HEAD_INIT and list_del_init are used in the wait.h header, one finds that prepare_to_wait may be used only for a wait object that was:
Created with the DEFINE_WAIT macro or one of the DEFINE_WAIT_* macros.
Initialized with the init_wait function, which is called, e.g., from one of the wait_event_* macros.
Previously passed to the finish_wait function.
But prepare_to_wait cannot be used for a wait object created with the DECLARE_WAITQUEUE macro: this macro initializes the task_list field to {NULL, NULL}, so list_empty returns false for it (as if the wait object had already been added to a wait queue).
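As a sketch of why the check matters, the code below models only the list handling in user space. The list_head helpers are simplified copies of the kernel's, and wq_head, wq_entry and model_prepare_to_wait are hypothetical stand-ins (no locking, no task state). In the kernel, prepare_to_wait() is typically called on every iteration of a wait loop, and DEFINE_WAIT installs autoremove_wake_function, which unlinks the entry with list_del_init() when it wakes the task; so on the next iteration the entry may or may not still be queued, and the list_empty() test re-adds it only when it is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space copy of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

/* Same test as the kernel: a node is "empty" only if it points to itself. */
static bool list_empty(const struct list_head *head)
{
    return head->next == head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);
}

/* Hypothetical stand-ins for wait_queue_head_t and wait_queue_t. */
struct wq_head { struct list_head task_list; };
struct wq_entry { struct list_head task_list; };

/* Models only the list handling of prepare_to_wait(). */
static void model_prepare_to_wait(struct wq_head *q, struct wq_entry *wait)
{
    if (list_empty(&wait->task_list))
        list_add(&wait->task_list, &q->task_list);
}

static int queue_length(const struct wq_head *q)
{
    int n = 0;
    for (const struct list_head *p = q->task_list.next; p != &q->task_list; p = p->next)
        n++;
    return n;
}

/* Walks through the cases described in the answer. */
static void demo(void)
{
    struct wq_head q;
    INIT_LIST_HEAD(&q.task_list);

    struct wq_entry w;                 /* like DEFINE_WAIT() / init_wait() */
    INIT_LIST_HEAD(&w.task_list);
    model_prepare_to_wait(&q, &w);
    model_prepare_to_wait(&q, &w);     /* next loop iteration: no double add */
    assert(queue_length(&q) == 1);

    list_del_init(&w.task_list);       /* what finish_wait() does */
    assert(list_empty(&w.task_list) && queue_length(&q) == 0);

    struct wq_entry d = { { NULL, NULL } };  /* like DECLARE_WAITQUEUE() */
    assert(!list_empty(&d.task_list));
    model_prepare_to_wait(&q, &d);     /* wrongly treated as already queued */
    assert(queue_length(&q) == 0);
}
```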

Related

Using fetch-and-add as lock

I am trying to understand how fetch-and-add can be used as a lock. Here is what the book (Operating Systems: Three Easy Pieces) says:
The basic operation is pretty simple: when a thread wishes to acquire a lock, it first does an atomic fetch-and-add on the ticket value; that value is now considered this thread’s “turn” (myturn). The globally shared lock->turn is then used to determine which thread’s turn it is; when (myturn == turn) for a given thread, it is that thread’s turn to enter the critical section.
What I do not understand is how the thread checks whether the lock is held by another process before entering the critical section. All I can read is that the value will be incremented; there is no mention of checks!
Another part says:
Unlock is accomplished simply by incrementing the turn such that the next waiting thread (if there is one) can now enter the critical section.
I cannot read this in any way other than that no checks are performed, which cannot be right because it would defeat the whole purpose of locking critical sections. What am I missing here? Thanks.
What I do not understand is how the thread checks whether the lock is held by another process before entering the critical section.
You need an "atomic fetch" for this, maybe something like "while( atomic_fetch(currently_serving) != my_ticket) { /* wait */ }".
If you have "atomic fetch and add", then you can implement "atomic fetch" by doing "atomic fetch and add the value zero", maybe something like "while( atomic_fetch_and_add(currently_serving, 0) != my_ticket) { /* wait */ }".
For reference, the full sequence could be something like:
my_ticket = atomic_fetch_and_add(ticket_counter, 1);
while( atomic_fetch_and_add(currently_serving, 0) != my_ticket) {
    /* wait */
}
/* Critical section (lock successfully acquired). */
atomic_fetch_and_add(currently_serving, 1); /* Release the lock */
Of course, you might have a better atomic fetch you can use instead (e.g., on some CPUs any normal aligned load is atomic).
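Putting that sequence into C11 atomics gives a sketch like the following, assuming a C11 compiler with <stdatomic.h> and POSIX threads; the struct and function names are illustrative. The "check" the question asks about is the loop in ticket_lock_acquire: fetch-and-add only hands out a ticket, and the thread then waits until currently_serving equals that ticket. Here atomic_load plays the role of atomic_fetch_and_add(currently_serving, 0).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Ticket lock: the two counters from the answer, as C11 atomics. */
struct ticket_lock {
    atomic_uint ticket_counter;    /* next ticket to hand out */
    atomic_uint currently_serving; /* ticket allowed into the critical section */
};

static void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->ticket_counter, 0);
    atomic_init(&l->currently_serving, 0);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Fetch-and-add hands each caller a unique ticket ("myturn"). */
    unsigned my_ticket = atomic_fetch_add(&l->ticket_counter, 1);
    /* The check: wait until the shared "turn" reaches our ticket. */
    while (atomic_load(&l->currently_serving) != my_ticket)
        sched_yield();   /* give other threads a chance while spinning */
}

static void ticket_lock_release(struct ticket_lock *l)
{
    /* Pass the turn to the holder of the next ticket, if any. */
    atomic_fetch_add(&l->currently_serving, 1);
}

/* Usage: THREADS threads each increment a plain counter ITERATIONS times. */
enum { THREADS = 4, ITERATIONS = 10000 };
static struct ticket_lock lock;
static long shared_counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        ticket_lock_acquire(&lock);
        shared_counter++;              /* critical section */
        ticket_lock_release(&lock);
    }
    return NULL;
}

static long run_workers(void)
{
    pthread_t t[THREADS];
    ticket_lock_init(&lock);
    shared_counter = 0;
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return shared_counter;   /* THREADS * ITERATIONS if mutual exclusion held */
}
```

Because tickets are served strictly in order, this lock is also FIFO-fair, which a plain test-and-set spinlock does not guarantee.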

linux socket: lifetime of ancillary data for sendmsg

I use a cmsg to enable TX timestamping on a Linux socket.
ssize_t sendWithOptions(int sd, std::vector<uint8_t> &payload, uint32_t destIP, int flags)
{
    msghdr msg { };
    .... // filling standard
    // control buffer: CMSG_SPACE bytes, aligned for cmsghdr
    alignas(cmsghdr) std::array<uint8_t, CMSG_SPACE(sizeof(__u32))> buf { };
    msg.msg_control = buf.data();
    msg.msg_controllen = buf.size();
    auto cmsg { CMSG_FIRSTHDR(&msg) };
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SO_TIMESTAMPING;
    cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));   // header + payload, without trailing padding
    *reinterpret_cast<__u32 *>(CMSG_DATA(cmsg)) = static_cast<__u32>(flags);
    return sendmsg(sd, &msg, MSG_DONTWAIT);
}
When the function returns, buf is automatically destroyed. Does sendmsg need this buffer to live longer?
Do I have a guarantee that the function no longer needs this buffer once it has returned the number of bytes sent?
Except for specific interfaces, it is generally the case that operating system calls do not rely on user-space to maintain data structures affecting their operation after they are finished. The exceptions will be spelled out in the manual pages.
With sendmsg in particular, the kernel is done with the control buffer by the time the call returns, whether it succeeded or not, and (unless you opt into MSG_ZEROCOPY) with the payload buffer too. It is therefore fine to use an automatic (stack) buffer as you're doing and let it be destroyed right after the call.
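To make the lifetime point observable, here is a minimal sketch that is not the asker's code: it swaps SO_TIMESTAMPING for SCM_RIGHTS over an AF_UNIX socketpair, because then the receiver can check that the ancillary data arrived intact. Both the payload and the control buffer live on the stack of a helper that has already returned before anything is received. The helper names send_text_with_fd and recv_text_with_fd are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends `text` plus one fd as SCM_RIGHTS ancillary data. Payload and
   control buffers are automatic and die when this function returns. */
static ssize_t send_text_with_fd(int sd, const char *text, int fd_to_pass)
{
    char payload[64];
    size_t len = strlen(text) + 1;
    if (len > sizeof payload)
        return -1;
    memcpy(payload, text, len);

    union {                          /* union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;
    memset(&control, 0, sizeof control);

    struct iovec iov = { .iov_base = payload, .iov_len = len };
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof control.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(sd, &msg, MSG_DONTWAIT);
}

/* Receives one message into `out`; returns the passed fd, or -1. */
static int recv_text_with_fd(int sd, char *out, size_t out_len)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;
    struct iovec iov = { .iov_base = out, .iov_len = out_len };
    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof control.buf;

    if (recvmsg(sd, &msg, 0) < 0)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

Note the buffer sizing: the buffer holds CMSG_SPACE(n) bytes, while cmsg_len is set to CMSG_LEN(n).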
As an example of one exception, aio_write(3) is specifically intended to allow user space to queue a write operation that will be completed asynchronously. For this call, the data is not consumed until it can be successfully written. Hence, you must not modify the data structures provided in the call until you have confirmed it is complete. That caveat is called out in the NOTES section of the manual page:
... The control block must not be changed while the write operation is in progress. The buffer area being written out must not be accessed during the operation or undefined results may occur. The memory areas involved must remain valid.
In summary: check the manual page for the system call. But most of the time, you don't need to worry about it.

Safely close an indefinitely running thread

First, I realize that if my code were in a loop, I could use a do-while loop to check a variable that is set when I want the thread to close, but in this case that does not seem possible:
DWORD WINAPI recvThread(LPVOID random) {
    recv(ClientSocket, recvbuffer, recvbuflen, 0);
    return 1;
}
In the above, recv() is a blocking function.
How would I go about terminating this thread since it never closes but never loops?
Thanks,
~P
Among other solutions, you can:
a) set a receive timeout on the socket and handle timeouts correctly by checking the return values and/or errors in an appropriate loop (on Windows, timeout is a DWORD in milliseconds; on POSIX systems it is a struct timeval):
setsockopt(ClientSocket, SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout, sizeof(timeout));
b) close the socket from another thread, which makes the blocked recv() return with an error.
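A minimal POSIX sketch of option a) follows. The question's code is Winsock, so treat this as an illustration of the loop structure rather than a drop-in; the struct and thread names are hypothetical. The receive thread wakes up at least every 100 ms to check a stop flag set by the main thread, so it can be joined instead of being killed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical context shared by the main thread and the receive thread. */
struct timed_receiver {
    int sock;
    atomic_bool stop;        /* set by the main thread to ask for exit */
    long bytes_received;     /* read by the main thread only after join */
};

/* recv() now waits at most 100 ms, so the loop re-checks `stop` regularly. */
static void *timed_receiver_thread(void *arg)
{
    struct timed_receiver *r = arg;
    struct timeval timeout = { .tv_sec = 0, .tv_usec = 100 * 1000 };
    if (setsockopt(r->sock, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof timeout) != 0)
        return NULL;

    char buf[512];
    while (!atomic_load(&r->stop)) {
        ssize_t n = recv(r->sock, buf, sizeof buf, 0);
        if (n > 0)
            r->bytes_received += n;   /* process the data here */
        else if (n == 0)
            break;                    /* peer closed the connection */
        else if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            break;                    /* a real error */
        /* otherwise the timeout expired: loop and re-check `stop` */
    }
    return NULL;
}
```

The cost of this approach is latency: shutdown can take up to one timeout period, and the thread wakes periodically even when idle.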
You can use poll() before recv() to check whether there is something to receive (on Windows, the equivalent call is WSAPoll()).
struct pollfd pfd;   // must not be named "poll", or it hides the poll() function
int res;
pfd.fd = ClientSocket;
pfd.events = POLLIN;
res = poll(&pfd, 1, 1000); // 1000 ms timeout
if (res == 0)
{
    // timeout
}
else if (res == -1)
{
    // error
}
else
{
    // res > 0: normally (pfd.revents & POLLIN) != 0, but POLLERR/POLLHUP are possible too
    recv(ClientSocket, recvbuffer, recvbuflen, 0); // we can read ...
}
The way I handle this problem is to never block inside recv(): preferably by setting the socket to non-blocking mode, but you may also be able to get away with only calling recv() when you know the socket currently has some bytes available to read.
That leads to the next question: if you don't block inside recv(), how do you prevent CPU-spinning? The answer to that question is to call select() (or poll()) with the correct arguments so that you'll block there until the socket has bytes ready to recv().
Then comes the third question: if your thread is now blocked (possibly forever) inside select(), aren't we back to the original problem again? Not quite, because now we can implement a variation of the self-pipe trick. In particular, because select() (or poll()) can 'watch' multiple sockets at the same time, we can tell the call to block until either of two sockets has data ready-to-read. Then, when we want to shut down the thread, all the main thread has to do is send a single byte of data to the second socket, and that will cause select() to return immediately. When the thread sees that it is this second socket that is ready-for-read, it should respond by exiting, so that the main thread's blocking call to WaitForSingleObject(theThreadHandle) will return, and then the main thread can clean up without any risk of race conditions.
The final question is: how do you set up a socket pair so that your main thread can call send() on one of the pair's sockets, and your recv thread will see the sent data appear on the other socket? Under POSIX it's easy: there is a socketpair() function that does exactly that. Under Windows, socketpair() does not exist, but you can roll your own implementation of it as shown here.
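A minimal POSIX sketch of this self-pipe variation, using socketpair() and poll(), follows. On Windows the same structure would use select() or WSAPoll() together with a hand-made socket pair; the names below are hypothetical. The receive thread blocks in poll() on both the data socket and the wake socket, and exits as soon as the main thread writes one byte to the other end of the wake pair.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical context: data_sock is the real connection; wake_sock is one
   end of a socketpair() whose only job is to interrupt poll(). */
struct wakeable_receiver {
    int data_sock;
    int wake_sock;
    long bytes_received;     /* read by the main thread only after join */
};

static void *wakeable_receiver_thread(void *arg)
{
    struct wakeable_receiver *r = arg;
    char buf[512];
    for (;;) {
        struct pollfd fds[2] = {
            { .fd = r->data_sock, .events = POLLIN },
            { .fd = r->wake_sock, .events = POLLIN },
        };
        /* Block with no timeout (no CPU spinning) until a socket is readable. */
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (fds[1].revents & POLLIN)
            break;                        /* wake-up byte arrived: exit */
        if (fds[0].revents & (POLLIN | POLLHUP | POLLERR)) {
            ssize_t n = recv(r->data_sock, buf, sizeof buf, 0);
            if (n <= 0)
                break;                    /* peer closed, or an error */
            r->bytes_received += n;       /* process the data here */
        }
    }
    return NULL;
}

/* Main-thread side: one byte on the other end of the pair wakes poll(). */
static int request_shutdown(int wake_peer)
{
    return send(wake_peer, "x", 1, 0) == 1 ? 0 : -1;
}
```

Once pthread_join() returns, the receive thread is gone, so the main thread can close all four sockets without racing it.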

Assigning return value of an atomic function

I'm trying to implement a barrier function, such that when a thread calls waitBarrier() it will wait until all other n threads have called the function, after which all will proceed, i.e. a sort of synchronization construct.
I have following code:
int i = 0; // Shared variable. Initialized as 0 at the beginning.
waitBarrier() {
    // CAS = Compare-and-swap, the first argument holds "old_val" the second the new
    i = CAS(i, i+1);
    // Spin until all n threads (number of all threads known prior) have been "here"
    while (i != n) {}
}
If this gets accessed by n threads, will this function work? Is the assignment of the return value of an atomic function atomic? Or could race conditions occur?
First of all, you have to specify the address of the memory location whose value you are comparing and swapping. This can be done with either of the following:
CAS(int* reg, int oldValue, int newValue)
or
reg.CAS(int oldValue, int newValue)
Assuming your line now would be:
i = i.CAS(i, i+1)
Imagine two threads calling waitBarrier() at the same time.
Because the arguments of an atomic function are evaluated separately from the atomic operation itself, both threads may end up calling i.CAS(0, 1).
Whichever atomic call executes first will successfully set the shared variable i to 1.
Since CAS always returns the old value, the assignment i = OLD_VALUE_OF_i actually resets the shared variable back to 0.
Moreover, even if you omitted that assignment and just made the CAS call, whichever thread executes its CAS second would compare the current value of the shared variable (now 1) with the value of i at the time its arguments were evaluated (0). That comparison fails, so the shared variable is incremented only once!
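Both problems can be replayed deterministically in a single thread with C11 atomics. This is a sketch, not real concurrency: it simply performs the steps in the interleaving described above. Note that C11's atomic_compare_exchange_strong returns a success flag and reports the previous value through its `expected` argument, rather than returning the old value like the CAS in this answer; the function names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>

/* Replays the interleaving: both "threads" evaluate their CAS arguments
   (they read i == 0) before either CAS runs. Returns the final i. */
static int replay_cas_without_retry(void)
{
    atomic_int i;
    atomic_init(&i, 0);
    int seen_by_a = atomic_load(&i);   /* thread A evaluates its arguments */
    int seen_by_b = atomic_load(&i);   /* thread B does the same: also 0 */

    int expected = seen_by_a;
    atomic_compare_exchange_strong(&i, &expected, seen_by_a + 1); /* succeeds: i == 1 */

    expected = seen_by_b;
    atomic_compare_exchange_strong(&i, &expected, seen_by_b + 1); /* fails: i is 1, not 0 */

    return atomic_load(&i);            /* two "increments", yet i == 1 */
}

/* The other bug: writing CAS's old value back into i, as in i = CAS(...). */
static int replay_assigning_old_value(void)
{
    atomic_int i;
    atomic_init(&i, 0);
    int old = 0;
    /* On success, C11 leaves `old` unchanged, i.e. holding the previous value (0). */
    atomic_compare_exchange_strong(&i, &old, 1);   /* i == 1 */
    atomic_store(&i, old);                         /* i = OLD_VALUE_OF_i: back to 0 */
    return atomic_load(&i);
}
```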
Taking those two aspects into consideration, your code would have to look as follows:
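int i = 0;
waitBarrier() {
    // Atomically increment the shared value i by 1: retry until no other
    // thread changed i between our read and our CAS
    int value;
    do {
        value = i;
    } while (CAS(&i, value, value + 1) != value); // CAS returns the old value
    // Wait until all n threads have reached the barrier
    // (i must be read atomically here, or the compiler may hoist the load)
    while (i != n) {}
}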
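For comparison, here is a one-shot spin barrier in C11 with POSIX threads, assuming the thread count n is known up front; the names spin_barrier, wait_barrier and run_barrier_demo are illustrative. atomic_compare_exchange_weak refreshes `value` with the current count when it fails, so the retry loop needs no separate re-read. A single atomic_fetch_add would perform the same increment; the CAS loop is kept to mirror the answer. Like the original, this barrier cannot be reused without being reset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* One-shot spin barrier for a known number of threads n. */
struct spin_barrier {
    atomic_int arrived;   /* how many threads have reached the barrier */
    int n;
};

static void spin_barrier_init(struct spin_barrier *b, int n)
{
    atomic_init(&b->arrived, 0);
    b->n = n;
}

static void wait_barrier(struct spin_barrier *b)
{
    int value = atomic_load(&b->arrived);
    while (!atomic_compare_exchange_weak(&b->arrived, &value, value + 1)) {
        /* another thread got in first; `value` now holds the current count */
    }
    /* Spin on an atomic load until every thread has arrived. */
    while (atomic_load(&b->arrived) != b->n)
        sched_yield();
}

/* Usage: each worker records its arrival, waits, then checks that every
   worker had arrived before anyone was let through. */
enum { N_THREADS = 4 };
static struct spin_barrier barrier;
static atomic_int before_count;
static atomic_int saw_all;

static void *barrier_worker(void *arg)
{
    (void)arg;
    atomic_fetch_add(&before_count, 1);
    wait_barrier(&barrier);
    if (atomic_load(&before_count) == N_THREADS)
        atomic_fetch_add(&saw_all, 1);
    return NULL;
}

static int run_barrier_demo(void)
{
    pthread_t t[N_THREADS];
    spin_barrier_init(&barrier, N_THREADS);
    atomic_store(&before_count, 0);
    atomic_store(&saw_all, 0);
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, barrier_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&saw_all);   /* N_THREADS if nobody passed early */
}
```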

Behavior of WaitForMultipleObjects when multiple handles signal at the same time

Given: I fill an array of handles with auto-reset events and pass it to WaitForMultipleObjects with bWaitAll = FALSE.
From MSDN:
“When bWaitAll is FALSE, this function checks the handles in the array in order starting with index 0, until one of the objects is signaled. If multiple objects become signaled, the function returns the index of the first handle in the array whose object was signaled.”
So, now if multiple objects signal I’ll get the index of the first one. Do I have to loop through my array to see if any others have signaled?
Right now I have a loop that’s along the lines of:
for (;;)
{
    WaitForMultipleObjects(…)
    if (not failed)
    {
        Process the object that signaled.
        Remove the handle that signaled from the array.
        Compact the array.
    }
}
So, now if multiple objects signal I’ll get the index of the first one. Do I have to loop through my array to see if any others have signaled?
Why not just go back round into the Wait()? If multiple objects signalled, they will still be signalled when you come back round. Of course, if you have a very rapidly firing first object in the wait object array, it will starve the others; what you do is order your objects in the wait object array by frequency of firing, with the least frequent being first.
BTW, where you're using an endless for(), you could use a goto. If you really are not leaving a loop, an unconditional goto most properly expresses your intent.
Yes. One alternative is to call WaitForSingleObject(handle, 0) on each remaining handle; it returns immediately and tells you whether that handle is signaled. Note that for an auto-reset event, a successful wait also resets the event, so process it right away.
EDIT: Here's sample pseudocode for what I mean:
ret = WaitForMultipleObjects()
if (ret >= WAIT_OBJECT_0 && ret < WAIT_OBJECT_0 + (count))
{
    firstSignaled = ret - WAIT_OBJECT_0;
    // handles[firstSignaled] guaranteed signalled!!
    for (i = firstSignaled + 1; i < count; i++)
    {
        if (WaitForSingleObject(handles[i], 0) == WAIT_OBJECT_0)
        {
            // handles[i] Signaled!
        }
    }
}
One other option you might have is to use RegisterWaitForSingleObject. The idea is that the callback function flags the signaled state of each event in a secondary array and then signals a master event, which wakes up your primary thread (the one calling WaitForSingleObject on the master event).
Obviously, you'd have to make sure the secondary array is protected against concurrent access by the callbacks and the main thread, but it would work.
Only the auto-reset event that ended the wait (whose index is returned) will be reset. If the wait times out, no events will be reset.
cf
https://blogs.msdn.microsoft.com/oldnewthing/20150409-00/?p=44273
