This question already has answers here:
Atomicity on x86
(2 answers)
Why is integer assignment on a naturally aligned variable atomic on x86?
(5 answers)
Can num++ be atomic for 'int num'?
(13 answers)
Is x86 CMPXCHG atomic, if so why does it need LOCK?
(3 answers)
How are atomic operations implemented at a hardware level?
(4 answers)
Closed 2 years ago.
I'm learning about threads and locks, which may rely on atomic operations at the CPU level. Which instructions are atomic inside an Intel CPU, from a hardware perspective? Can you give some examples?
This question already has answers here:
Spinlock with XCHG unlocking
(2 answers)
C++ How is release-and-acquire achieved on x86 only using MOV?
(2 answers)
Locks around memory manipulation via inline assembly
(1 answer)
how are barriers/fences and acquire, release semantics implemented microarchitecturally?
(1 answer)
Closed 1 year ago.
I have read a bunch of spinlock implementations for the x86/amd64 architecture, including
glibc's pthread_spin_lock/pthread_spin_unlock. Roughly speaking, they use the cmpxchg
instruction to acquire the lock and a plain MOV instruction to release it. Why is there
no need to flush the store buffer before the lock is released?
Consider the following two threads running on different cores, where statement s100 runs
immediately after s3.
thread 1:
s1: pthread_spin_lock(&mylock)
s2: x = 100
s3: pthread_spin_unlock(&mylock) // call a function which contains "MOV mylock, 1"
thread 2:
s100: pthread_spin_lock(&mylock)
s200: assert(x == 100)
s300: pthread_spin_unlock(&mylock)
Is s200 guaranteed to be true? Is it possible that by the time s100 acquires the lock,
x's value has still not been flushed from the store buffer to the cache?
I'm wondering:
Is the call overhead (of pthread_spin_unlock()) sufficient to cover the time needed to
flush the store buffer to the cache?
Does cmpxchg, or any instruction with an implicit or explicit LOCK prefix, magically
flush the store buffers on other cores?
If s200 is not guaranteed to be true, what is the least expensive way to fix it:
insert an mfence instruction before the MOV instruction,
replace the MOV instruction with an atomic fetch-and-and/-or instruction,
or something else?
Profuse thanks in advance!
This question already has answers here:
How to get an ideal number of threads in parallel programs in Java?
(3 answers)
Closed 3 years ago.
What is the best number of threads for parallel programs in Java?
NoT <= noc / (1 - bf)
NoT = number of threads
noc = number of cores
bf = blocking factor
For best performance, should it be the number of cores - 1?
This question already has answers here:
During an x86 software interrupt, when exactly is a context switch made?
(1 answer)
Context switch in Interrupt handlers
(2 answers)
Closed 5 years ago.
A process's virtual address space contains 1 GB of kernel space:
Now I assume that this 1 GB of kernel space points to data and code related to the kernel (including the Interrupt Descriptor Table (IDT)).
Now let's say that some process is being executed by the CPU, and this process made a system call (fired the interrupt 0x80 (int 0x80)). What will happen is that the CPU will go to the IDT and execute the interrupt handler associated with the interrupt number 0x80.
Now, will the CPU stay in the current process and execute the interrupt handler from the kernel space of the current process (so that no context switch occurs)?
This question already has answers here:
Why does a std::atomic store with sequential consistency use XCHG?
(1 answer)
Are loads and stores the only instructions that gets reordered?
(2 answers)
Which is a better write barrier on x86: lock+addl or xchgl?
(5 answers)
Does lock xchg have the same behavior as mfence?
(1 answer)
Closed 4 years ago.
What is the difference, in logic and performance, between the x86 instructions LOCK XCHG and MOV+MFENCE for performing a sequentially consistent store?
(We ignore the load result of the XCHG; compilers other than gcc use it for the store + memory-barrier effect.)
Is it true that, for a sequentially consistent atomic operation, LOCK XCHG locks only a single cache line, whereas MOV+MFENCE locks the whole L3 cache (LLC)?
The difference is in the purpose of usage.
MFENCE (or SFENCE or LFENCE) is useful when we are locking a memory region accessible from two or more threads. Once we have atomically taken the lock for this region, we can use plain non-atomic instructions, because they are faster. But we must issue an SFENCE (or MFENCE) before unlocking the memory region, to ensure that the protected memory is correctly visible to all other threads.
If we are changing only a single aligned variable, we use an atomic instruction such as LOCK XCHG, so no locking of a memory region is needed.
This question already has answers here:
In Node.js how does the event loop work? [closed]
(1 answer)
Nodejs Event Loop
(8 answers)
Closed 9 years ago.
Node.js is known for the simplicity of its "single thread" model. As I understand it, this single thread handles requests through an event loop. I want to ask:
Does Node.js really have only one thread? For example, if there are a billion users on a website, will the thread loop through a billion times? Or are there in fact some "small threads" that the main thread uses to do different things?
Thank you!