Can the go block below cause a memory leak in Clojure if data is no longer put on the in channel? Do I need to find a way to close the channel?
(defn printer
  [in]
  (go (while true (println (<! in)))))
Given there is no way for this go block to end, yes, it will continue to use some extremely small amount of memory in the system to keep track of it. However, as it's a go block, it will not consume a thread.
I understand the idea behind a ring buffer and how it avoids having to shift elements around. However, I'm curious how one would best deal with a variable-length buffer that is thread-safe and offers similar advantages to the ring buffer. Could we double the size upon reaching capacity and have one thread do the copy-over within a mutex? Would this variable-sized buffer just be a queue implemented to be thread-safe? What would be the best approach, and what are the advantages and disadvantages of the alternative solutions to this kind of concurrent read/write access?
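For reference, a minimal sketch of the "double the size upon reaching capacity, under a mutex" variant the question describes; the type and member names (growable_ring, buf_, head_) are made up for illustration, and the obvious cost is that every other thread blocks while the copy-over runs.

#include <cstddef>
#include <mutex>
#include <vector>

// Ring buffer whose push/pop are serialized by a single mutex and which
// copies its elements into a twice-as-large array when it fills up.
template <typename T>
class growable_ring {
public:
    explicit growable_ring(std::size_t cap = 16) : buf_(cap) {}

    void push(T v) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (size_ == buf_.size())
            grow();                                   // copy-over happens here, under the lock
        buf_[(head_ + size_) % buf_.size()] = std::move(v);
        ++size_;
    }

    bool pop(T& out) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (size_ == 0)
            return false;
        out = std::move(buf_[head_]);
        head_ = (head_ + 1) % buf_.size();
        --size_;
        return true;
    }

private:
    void grow() {
        std::vector<T> bigger(buf_.empty() ? 1 : buf_.size() * 2);
        for (std::size_t i = 0; i < size_; ++i)       // unwrap old contents into the new array
            bigger[i] = std::move(buf_[(head_ + i) % buf_.size()]);
        buf_ = std::move(bigger);
        head_ = 0;
    }

    std::mutex mtx_;
    std::vector<T> buf_;
    std::size_t head_ = 0;   // index of the oldest element
    std::size_t size_ = 0;   // number of stored elements
};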
For a multi-thread producer/consumer application, a single circular buffer usually stops being a good idea when you need it to grow.
I would usually switch to a lock-free singly linked list of single-use, fixed-size FIFO buffers, with the unused buffers recycled through a lock-free stack.
The non-blocking queue described here is simple and practical: https://www.cs.rochester.edu/u/scott/papers/1996_PODC_queues.pdf
A linked list might be better, as no copy-over would be needed when the buffer has to grow.
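A minimal sketch of that "linked list of fixed-size FIFO buffers" layout; for clarity it uses one mutex instead of the lock-free CAS loops from the Michael & Scott paper linked above, and the names (chunked_queue, ChunkSize) are made up.

#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>

// Unbounded queue built from a chain of fixed-size chunks: growing means
// linking a fresh chunk, so existing elements are never copied over.
template <typename T, std::size_t ChunkSize = 256>
class chunked_queue {
    struct chunk {
        T items[ChunkSize];
        std::size_t head = 0;            // next slot to pop from
        std::size_t tail = 0;            // next slot to push into
        std::unique_ptr<chunk> next;     // following chunk, if any
    };

public:
    chunked_queue() : head_(std::make_unique<chunk>()), tail_(head_.get()) {}

    void push(T value) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (tail_->tail == ChunkSize) {                   // current chunk is full:
            tail_->next = std::make_unique<chunk>();      // grow by linking a new one
            tail_ = tail_->next.get();
        }
        tail_->items[tail_->tail++] = std::move(value);
    }

    std::optional<T> pop() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (head_->head == head_->tail && head_.get() == tail_)
            return std::nullopt;                          // queue is empty
        if (head_->head == ChunkSize)                     // head chunk drained: recycle it
            head_ = std::move(head_->next);
        return std::move(head_->items[head_->head++]);
    }

private:
    std::mutex mtx_;
    std::unique_ptr<chunk> head_;   // chunk being popped from
    chunk* tail_;                   // chunk being pushed into
};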
Let's suppose I have a big memory buffer used as a framebuffer, which is constantly written to by a thread (or even multiple threads, with the guarantee that no two threads write the same byte concurrently). These writes are non-deterministic in time, scattered throughout the codebase, and cannot be blocked.
I have another single thread which periodically reads out (copies) the whole buffer to generate a display frame. This read should not be blocked either. Tearing is not a problem in my case. In other words, my only goal is that every change made by the writer thread(s) should eventually appear in the reading thread. Ordering, or some delay (negligible compared to the display refresh rate), does not matter.
Reading and writing the same memory location concurrently is a data race, which results in undefined behavior in C++11, and this article lists some really dreadful examples where the optimizing compiler generates code for a memory read that alters the memory contents in the presence of a data race.
Still, I need some solution without completely redesigning this legacy code. Any advice counts on what is safe from a practical standpoint, independent of whether it is theoretically correct. I am also open to not-fully-portable solutions.
Aside from the data race itself, I can easily force the visibility of the buffer changes in the reading thread by establishing a synchronizes-with relation between the threads (acquire/release on an atomic guard variable used for nothing else), or by adding platform-specific memory fence calls at key points in the writer thread(s).
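For concreteness, a minimal sketch of that guard-variable idea; the names (framebuffer, frame_guard, publish_writes, copy_frame) and the frame size are made up, and, as noted above, the guard only forces visibility and ordering while the plain byte accesses formally remain a data race.

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kFrameBytes = 1920 * 1080 * 4;          // hypothetical frame size
static std::vector<std::uint8_t> framebuffer(kFrameBytes);    // written by the writer thread(s)
static std::atomic<std::uint64_t> frame_guard{0};             // used for nothing else

// Called by a writer thread after it has finished a batch of pixel writes.
void publish_writes() {
    frame_guard.fetch_add(1, std::memory_order_release);
}

// Called periodically by the single reader thread.
void copy_frame(std::uint8_t* dst) {
    frame_guard.load(std::memory_order_acquire);              // pairs with the release above
    std::memcpy(dst, framebuffer.data(), kFrameBytes);
}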
My ideas to target the data race:
Use assembly for the reading thread. I would try to avoid that.
Make the memory buffer volatile, thus preventing the compiler from performing the nasty optimizations described in the referenced article.
Put the reading thread's code in a separate compilation unit and compile it with -O0.
Leave everything as is and cross my fingers (as I currently do not notice issues). :)
Which of the options above is the safest? Do you see a better solution?
FYI, the target platform is ARM (with multiple cores) and x86 (for testing).
(This question concretizes a previous one that was a little too generic.)
A server program keeps long-running TCP connections with many clients. Each client connection is served by a thread created with forkIO. The server takes up a lot of memory when running, so naturally I did some profiling to hunt down possible space leaks. However, with around 10k clients (hence 10k threads), the result shows that a major portion of the heap is actually STACK allocated by threads. If I understand correctly, this is not surprising, since the stack of a thread starts at 1k by default and grows in 32k chunks. As these are long-running threads, this memory won't be GCed.
My question: STACK takes up too much space; is there a way to reduce it?
I had some thoughts on this. Previously I could use the event notification APIs from GHC to write the program without using threads; however, it seems this option is no longer possible, as GHC has stopped exporting some of the event handling functions such as loop. On the other hand, such a change would mean a major shift in concurrency model (threads vs. events), which is very undesirable since Haskell threads are simply so enjoyable to work with.
Another way that came to my mind is to split/rewrite the threads so that one thread does all the handshaking and authentication, creates a new thread, and then exits. The new thread, which will keep looping, hopefully doesn't require more STACK space. However, I'm not sure whether this idea is correct or doable.
Let's consider this sentence (Total Store Ordering):
reads are ordered before reads, writes before writes, and reads before writes, but not writes before reads.
I think I almost get the basics:
Each thread has its own program order (code as it is written)
In general, CPU may reorder instructions and we must constrain it to exclude incorrect orderings
CPU may also reorder memory loads and stores and we must constrain those as well
Current hardware implementations have "serializing instructions", like mfence, which are invoked by all threads to address both of these problems.
Hardware typically allows only one dirty cache, so it is all about flushing that cache:
Storing thread flushes dirty cache
Loading thread requests and blocks until there is no dirty cache
Kernel developers care about devices other than CPU accessing memory but I don't.
Yet I still fail to understand what "reads are ordered before reads" really means. It probably means that there are implicit barriers and serializing instructions on those architectures, but I can't really tell.
I am pretty sure I heard this in the OS course at my uni in Greece; damn, I can still read it in the voice of the prof. :) Since nobody has answered, I will attempt an answer.
Imagine you are the OS: every thread/program wants to perform reads and writes. Now, since we are talking about multithreading, a thread may read something another thread has written, such as the value of a variable.
Now, if thread 1 wants to read a memory address x, and thread 2 wants to read x too, it's OK to allow them to read x in any order. That's what it means, I think!
Hope it helps somehow, since I know it's not the best answer one could give! :/
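For what it's worth, the quoted sentence is usually illustrated with the classic store-buffer litmus test below: the one reordering TSO does allow is a later read passing an earlier write to a different location, so without a fence both r1 and r2 can end up 0. This is an illustrative sketch, not part of the original thread.

#include <atomic>
#include <cstdio>
#include <thread>

// Plain (relaxed) atomics compile to ordinary loads and stores on x86.
// Each thread's store can sit in its own store buffer while the following
// load goes to memory, so r1 == 0 && r2 == 0 is a possible outcome under
// TSO. Reads/reads, writes/writes and reads/writes keep program order;
// only this write-then-read pattern needs mfence (or seq_cst) to forbid it.
std::atomic<int> x{0}, y{0};
int r1, r2;

void thread1() {
    x.store(1, std::memory_order_relaxed);    // write x ...
    r1 = y.load(std::memory_order_relaxed);   // ... the later read of y may pass it
}

void thread2() {
    y.store(1, std::memory_order_relaxed);
    r2 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread a(thread1), b(thread2);
    a.join();
    b.join();
    std::printf("r1=%d r2=%d\n", r1, r2);     // "r1=0 r2=0" is allowed
}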
Is it a good idea to use a linked list to store overlapped structures?
My overlapped structure looks like this:
typedef struct _PER_IO_CONTEXT
{
WSAOVERLAPPED Overlapped;
WSABUF wsabuf;
/* ... some other per-operation data ... */
struct _PER_IO_CONTEXT *pOverlappedForward;
struct _PER_IO_CONTEXT *pOverlappedBack;
} PER_IO_CONTEXT, *PPER_IO_CONTEXT;
When the IOCP server starts, I allocate (for example) 5000 of them in a linked list. The start of this list is stored in a global variable, PPER_IO_CONTEXT OvlList. I must add that I use this linked list only in the case where I have to send data to all connected clients.
When a WSASend is posted or GQCS gets a notification, the linked list is updated (I use EnterCriticalSection for synchronization).
Thanks in advance for your tips, opinions, and suggestions for a better way to store (cache) overlapped structures.
I assume the proposed use case is that you wish to cache the "per operation" overlapped structure to avoid repeated allocation and release of dynamic memory which could lead to both contention on the allocator and heap fragmentation.
Using a single 'pool' reduces the contention from 'contention between all threads using the allocator that is used for allocating and destroying the overlapped structures' to 'contention between all threads issuing or handling I/O operations', which is usually a good thing to do. You're right that you need to synchronise around access; a critical section, or perhaps an SRW lock in exclusive mode, is probably best (the latter is fractionally faster for uncontended access).
The reduction in heap fragmentation is also worth achieving in a long running system.
Using a 'standard' non-invasive list, such as a std::deque, looks like the obvious choice at first, but the problem with non-invasive collections is that they tend to allocate and deallocate memory for each operation (so you're back to your original contention). Far better, IMHO, to put a pointer in each overlapped structure and simply chain them together. This requires no additional memory to be allocated or released on pool access and means that your contention is back down to just the threads that use the pool.
Personally I find that I only need a singly linked list of per-operation structures for a pool (a free list), which is actually just a stack, and a doubly linked list if I want to maintain a list of 'in use' per-operation data, which is sometimes useful (though not something that I now do).
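As a rough sketch of that singly linked free list (really a stack), reusing the question's pOverlappedForward member as the link and guarding the head with an SRW lock; the struct here is a trimmed copy of the one in the question, and the function names are made up.

#include <winsock2.h>
#include <windows.h>

/* Trimmed copy of the question's structure; only pOverlappedForward is
   needed here, doubling as the free-list link, so push/pop touch no extra memory. */
typedef struct _PER_IO_CONTEXT
{
    WSAOVERLAPPED Overlapped;
    WSABUF wsabuf;
    /* ... other per-operation data ... */
    struct _PER_IO_CONTEXT *pOverlappedForward;
} PER_IO_CONTEXT, *PPER_IO_CONTEXT;

static PPER_IO_CONTEXT g_freeList = NULL;          /* head of the free-list stack */
static SRWLOCK g_freeListLock = SRWLOCK_INIT;      /* a CRITICAL_SECTION works too */

/* Return a context to the pool, e.g. from the GetQueuedCompletionStatus handler. */
void ReleaseIoContext(PPER_IO_CONTEXT ctx)
{
    AcquireSRWLockExclusive(&g_freeListLock);
    ctx->pOverlappedForward = g_freeList;
    g_freeList = ctx;
    ReleaseSRWLockExclusive(&g_freeListLock);
}

/* Grab a context before posting a WSASend/WSARecv; NULL means the pool is empty. */
PPER_IO_CONTEXT AcquireIoContext(void)
{
    AcquireSRWLockExclusive(&g_freeListLock);
    PPER_IO_CONTEXT ctx = g_freeList;
    if (ctx != NULL)
        g_freeList = ctx->pOverlappedForward;
    ReleaseSRWLockExclusive(&g_freeListLock);
    return ctx;
}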
The next step may then be to have several pools, but this will depend on the design of your system and how your I/O works.
If you can have multiple pending send and receive operations for a given connection it may be useful to have a small pool at the connection level. This can dramatically reduce contention for your single shared pool as each connection would first attempt to use the per-connection pool and if that is empty (or full) fall back to using the global pool. This tends to result in far less contention for the lock on the global pool.
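A sketch of that per-connection fast path, building on the free-list helpers above; the PER_CONNECTION type and AcquireForConnection name are hypothetical.

/* Hypothetical per-connection structure holding its own small free list. */
typedef struct _PER_CONNECTION
{
    PPER_IO_CONTEXT pFreeList;   /* contexts reserved for this connection */
    SRWLOCK Lock;                /* only this connection's I/O contends here */
    /* ... socket handle, buffers, etc. ... */
} PER_CONNECTION, *PPER_CONNECTION;

PPER_IO_CONTEXT AcquireForConnection(PPER_CONNECTION conn)
{
    /* Fast path: the per-connection pool, with little or no contention. */
    AcquireSRWLockExclusive(&conn->Lock);
    PPER_IO_CONTEXT ctx = conn->pFreeList;
    if (ctx != NULL)
        conn->pFreeList = ctx->pOverlappedForward;
    ReleaseSRWLockExclusive(&conn->Lock);

    /* Slow path: fall back to the shared global pool sketched earlier. */
    if (ctx == NULL)
        ctx = AcquireIoContext();
    return ctx;
}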