I have a large matrix, say 20000*20000, whose elements change on every iteration. The matrix is produced in Fortran, and Fortran calls a C++ function that processes the matrix into block diagonal form. I would like the C++ function to create two threads (using C++11), each handling a 10000*10000 block. I can easily break the matrix into two parts since it's a special matrix. Because the matrix elements change on every iteration, creating and joining (killing) the two threads each iteration makes the overhead far too expensive, and the point of multi-threading is lost. I decided to do the iterations inside the threads; however, I am not sure whether I can keep the threads waiting for the updated matrix so they can solve the next iteration (we need to go back to Fortran to calculate the new matrix, though).
The point I am stuck at is the following:
When I create the two threads from the C++ function, that function returns to Fortran and the function instance is destroyed (right?). What happens to the two threads that are still waiting for the new matrix?
I want to understand the intrinsic difference between a thread and a long-running go block in Clojure. In particular, I want to figure out which one I should use in my context.
As I understand it, a go block runs on a so-called thread pool whose default size is 8, whereas thread creates a new thread.
In my case, there is an input stream that takes values from somewhere, and each value is used as an input. Some calculations are performed and the result is put onto a result channel. In short, we have an input and an output channel, and the calculation is done in a loop. To achieve concurrency I have two choices: use a go block or use thread.
I wonder what the intrinsic difference between these two is. (We may assume there is no I/O during the calculations.) The sample code looks like the following:
(go-loop []
  (when-let [input (<! input-stream)]
    ... ; calculations here
    (>! result-chan result))
  (recur))
(thread
  (loop []
    (when-let [input (<!! input-stream)]
      ... ; calculations here
      (put! result-chan result))
    (recur)))
I realize that the number of threads that can run simultaneously is exactly the number of CPU cores. In that case, is there no difference between go blocks and threads once I create more than 8 of either?
I could measure the performance difference on my own laptop, but the production environment is quite different from it, so I could not draw any conclusions.
By the way, the calculation is not very heavy. If the inputs are not too large, 8,000 loops can run in 1 second.
Another consideration is whether the choice between go block and thread has an impact on GC performance.
There are a few things to note here.
Firstly, the thread pool that clojure.core.async/thread creates its threads on is what is known as a cached thread pool: although it re-uses recently used threads inside that pool, it's essentially unbounded. That of course means it could hog a lot of system resources if left unchecked.
But given that what you're doing inside each asynchronous process is very lightweight, threads seem a little overkill to me. Of course, it's also important to take into account the number of items you expect to hit the input stream. If this number is large, you could overwhelm core.async's thread pool for go macros, potentially to the point where we're waiting for a thread to become available.
You also didn't mention precisely where you're getting the input values from. Are the inputs a fixed data set that remains constant from the start of the program, or are inputs continuously fed into the input stream from some source over time?
If it's the former then I would suggest you lean more towards transducers and I would argue that a CSP model isn't a good fit for your problem since you aren't modelling communication between separate components in your program, rather you're just processing data in parallel.
If it's the latter then I presume you have some other process that's listening to the result channel and doing something important with those results, in which case I would say your usage of go-blocks is perfectly acceptable.
Let's say I have a generator that produces a random number on each call. I want the combinations of these numbers in pairs.
import itertools
import random

def generate():
    while True:  # Note this is not actually infinite, just an example
        yield random.randint(1, 10)

for combo in itertools.combinations(generate(), 2):
    # DO AN OPERATION WITH A COMBINATION
    # HOW DO I MULTI-THREAD THIS?
    pass
But my generator is going to yield a total of n numbers, where n is 24000+. So I need to process the combinations as they are produced instead of storing them in a list (in memory).
I also need to multithread this operation by dividing the combinations among at least 4 threads.
I thought of doing this round robin, i.e. assigning 4 queues with each thread responsible for one queue.
Do you guys have any other recommendations? I need the script to finish executing as soon as possible.
EDIT:
OK, I just wrote both versions of this program (list based and generator based).
My list-based version is actually using less RAM. How is this possible?
EDIT 2:
It was because I tried to plot points using pyplot point by point. This caused the graph to be re-rendered on every call.
I have a ConcurrentLinkedQueue and I want to split it into two halves and let two separate threads handle each. I have tried using Spliterator but I do not understand how to get the partitioned queues.
ConcurrentLinkedQueue<int[]> q = // contains a large number of elements
Spliterator<int[]> p1 = q.spliterator();
Spliterator<int[]> p2 = p1.trySplit();
p1.getQueue();
p2.getQueue();
I want to but cannot do p1.getQueue() etc.
Please let me know the correct way to do it.
You can't split it in half in general: to split it in half, the queue must have a known size at each point in time. While CLQ does have a size() method, its documentation is pretty clear that this requires an O(n) traversal, and because this is a concurrent queue its size might not be accurate at all (it is named concurrent for a reason, after all). From what I can see, the current Spliterator from CLQ splits it in batches.
If you want to split it in half logically and process the elements, then I would suggest moving to a BlockingQueue implementation, which has a drainTo method. That way you could drain the elements to an ArrayList, for example, which will split much better (in half, then in half again, and so on).
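A minimal sketch of the drain-then-split idea, assuming the queue can be swapped for a LinkedBlockingQueue. The class name DrainAndSplit and the helper drainAndSplit are hypothetical names chosen for illustration; drainTo and subList are the standard library calls. The println calls stand in for whatever per-half processing is actually needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DrainAndSplit {
    /**
     * Moves every element currently in the queue into a list and returns
     * the two halves of that list. Hypothetical helper, for illustration.
     */
    static <T> List<List<T>> drainAndSplit(BlockingQueue<T> queue) {
        List<T> snapshot = new ArrayList<>();
        queue.drainTo(snapshot); // removes the drained elements from the queue
        int mid = snapshot.size() / 2;
        List<List<T>> halves = new ArrayList<>();
        halves.add(new ArrayList<>(snapshot.subList(0, mid)));
        halves.add(new ArrayList<>(snapshot.subList(mid, snapshot.size())));
        return halves;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<int[]> q = new LinkedBlockingQueue<>();
        for (int i = 0; i < 10; i++) {
            q.add(new int[] {i});
        }
        List<List<int[]>> halves = drainAndSplit(q);

        // One thread per half; the println stands in for the real work.
        Thread first = new Thread(
                () -> System.out.println("first half: " + halves.get(0).size()));
        Thread second = new Thread(
                () -> System.out.println("second half: " + halves.get(1).size()));
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

Note that drainTo only captures what is in the queue at the moment of the call; elements added afterwards would need another drain.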
On a side note, why would you want to do the processing in different threads yourself? This seems very counter-intuitive; the Spliterator is designed to work for parallel streams. Calling trySplit once is probably not even enough: you have to call it until it returns null. Either way, doing these things on your own sounds like a very bad idea to me.
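As a rough illustration of letting a parallel stream drive the Spliterator instead of calling trySplit by hand, here is a small sketch. The class ParallelQueueSum and the summing step are hypothetical stand-ins for the real per-element work; parallelStream() is the standard Collection method. How well the work actually spreads across threads depends on how the queue's Spliterator splits, so this is not a performance claim.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

class ParallelQueueSum {
    /** Sums every int in every array; a stand-in for real per-element work. */
    static long sumAll(ConcurrentLinkedQueue<int[]> queue) {
        return queue.parallelStream()
                .mapToLong(arr -> {
                    long sum = 0;
                    for (int v : arr) {
                        sum += v;
                    }
                    return sum;
                })
                .sum();
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<int[]> q = new ConcurrentLinkedQueue<>();
        q.add(new int[] {1, 2});
        q.add(new int[] {3});
        q.add(new int[] {4, 5, 6});
        System.out.println(sumAll(q)); // prints 21
    }
}
```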
I have a large-scale gradient descent optimization problem that I am running in Matlab. The code has two parts:
(1) A sequential update part that fires every iteration and updates the parameter vector.
(2) A validation error computation part that fires every 10 iterations or so, using the parameter value at the end of the iteration in which it's fired.
The way I am running this now is to do (1) and (2) sequentially. But (2) takes a lot of time and it's not the core part of my routine; I wrote it just to check progress and plot the error of my model. Is it possible in Matlab to run (2) in parallel with (1)? Please note that (1) cannot be parallelised since it performs a sequential update. So simply using 'parfor' is not a solution, unless there is a really smart way of doing that.
I don't think Matlab has any way of multi-threading outside of the (rather restricted) Parallel Computing Toolbox. There is a workaround which may help you, though:
Open 2 sessions of Matlab, sessions A and B (or instances, or workspaces, whatever you call them).
Matlab session A:
  Calculate the 10 iterations of your sequential process (1).
  Save the result in a file (adequately and uniquely named).
  Go on to calculate the next 10 iterations (basically back to the top of this loop).
In parallel:
Matlab session B:
  Check periodically for the existence of the file written by process A (define a timer that does this at whatever interval makes sense for your process, a few seconds or a few minutes ...).
  If the file exists, load it, then do the validation computation (your process (2)) and display/report the results.
Note: this only works if process (1) doesn't need the result of process (2) to run its iterations; if it does, I don't know how you could parallelise it anyway.
If you have multiple cores on your machine this should run smoothly; if you have a single core, the 2 sessions will have to share it and you will see a performance impact.
I was asked to reverse a singly linked list with as many as 7 million nodes, using threads efficiently. Recursion doesn't look feasible with that many nodes, so I opted for divide and conquer: each thread is given a chunk of the linked list and reverses it by making each node's pointer point back to the previous node (keeping references to the current, next and previous nodes), and the reversed chunks from the different threads are joined afterwards. But the interviewer insisted that the size of the linked list is not known, and that it can be done efficiently without finding the size. I couldn't figure it out. How would you go about it?
I like to implement such questions "top-down":
Assume that you already have a class that implements Runnable or extends Thread, from which you can create instances and run them. Each instance receives two parameters: a pointer to a Node in the list and the number of nodes to reverse.
Your main traverses all 7 million nodes and "marks" the starting points for your threads. Say we have 7 threads; the marked points will be 1, 1,000,000, 2,000,000, ... Save the marked nodes in an array or whichever data structure you like.
After you have finished "marking" the starting points, create the threads and give each one of them its starting point and the count 1,000,000.
After all the threads are done, "glue" each of the marked points to point back to the last node of the previous thread (which should be saved in another "static" ordered data structure).
Now that we have a plan, all that's left to do is implement a (considerably easy) algorithm that, given a number N and a Node x, reverses the next N nodes (including x) in a singly linked list :)
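The last step could be sketched roughly as follows. ListNode, ChunkReverser and reverseNext are hypothetical names, and the sketch assumes each chunk is reversed independently, with the "glue" step left to the caller as in the plan above. The main method runs the two chunks one after the other on a six-node list just to show the pointer work; in the threaded version each reverseNext call would run in its own thread.

```java
class ListNode {
    final int value;
    ListNode next;

    ListNode(int value) {
        this.value = value;
    }
}

class ChunkReverser {
    /**
     * Reverses up to n nodes starting at x (inclusive) and returns the new
     * head of that chunk, i.e. the chunk's original last node. Afterwards
     * x is the chunk's tail and x.next is null, ready to be "glued".
     */
    static ListNode reverseNext(ListNode x, int n) {
        ListNode prev = null;
        ListNode current = x;
        while (current != null && n > 0) {
            ListNode next = current.next; // remember the rest of the list
            current.next = prev;          // point back to the previous node
            prev = current;
            current = next;
            n--;
        }
        return prev;
    }

    public static void main(String[] args) {
        // Build the list 1 -> 2 -> 3 -> 4 -> 5 -> 6.
        ListNode[] nodes = new ListNode[6];
        for (int i = 5; i >= 0; i--) {
            nodes[i] = new ListNode(i + 1);
            if (i < 5) {
                nodes[i].next = nodes[i + 1];
            }
        }
        // Marked starting points: node 1 and node 4, three nodes per chunk.
        ListNode firstHead = reverseNext(nodes[0], 3);
        ListNode secondHead = reverseNext(nodes[3], 3);
        // Glue: the start of chunk 2 points back to the last node of chunk 1.
        nodes[3].next = firstHead;
        for (ListNode p = secondHead; p != null; p = p.next) {
            System.out.print(p.value + " "); // prints 6 5 4 3 2 1
        }
        System.out.println();
    }
}
```

Each call only writes to the next fields of nodes inside its own chunk, which is what lets the chunks be handled by separate threads before gluing.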