I have a uniform 2D coordinate grid stored in a numpy array. The values of this array are assigned by a function that looks roughly like the following:
def update_grid(grid):
    n, m = grid.shape
    for i in range(n):
        for j in range(m):
            # assignment
Calling this function takes 5-10 seconds for a 100x100 grid, and it needs to be called several hundred times during the execution of my main program. It is the rate-limiting step, so I want to reduce its execution time as much as possible.
I believe that the assignment expression inside can be split up in a manner which accommodates multiprocessing. The value at each gridpoint is independent of the others, so the assignments can be split something like this:
def update_grid(grid):
    n, m = grid.shape
    for i in range(n):
        for j in range(m):
            p = Process(target=assignment)  # assignment for gridpoint (i, j)
            p.start()
So my questions are:
1. Does the above loop structure ensure each process will only operate on a single gridpoint? Do I need anything else to allow each process to write to the same array, even if they're writing to different places in that array?
2. The assignment expression requires a set of parameters. These are constant, but each process will be reading them at the same time. Is this okay?
3. To explicitly write the code I've structured above, I would need to define my assignment expression as another function inside update_grid, correct?
4. Is this actually worthwhile?
Thanks in advance.
Edit:
I still haven't figured out how to speed up the assignment, but I was able to avoid the problem by changing my main program. It no longer needs to update the entire grid with each iteration, and instead tracks changes and only updates what has changed. This cut the execution time of my program down from an hour to less than a minute.
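A minimal sketch of that change-tracking idea, with a hypothetical compute_value standing in for the real assignment expression:
import numpy as np

def compute_value(i, j):
    # hypothetical stand-in for the real per-gridpoint assignment
    return i + j

def update_changed(grid, changed_points):
    # only recompute the gridpoints flagged as changed since the last iteration
    for i, j in changed_points:
        grid[i, j] = compute_value(i, j)

grid = np.zeros((100, 100))
changed = {(0, 0), (5, 7), (99, 99)}  # points touched in this iteration
update_changed(grid, changed)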
Related
I cannot understand if what I want to do in Dask is possible...
Currently, I have a long list of heavy files.
I am using the multiprocessing library to process every entry of the list. My function opens an entry, operates on it, saves the result as a binary file on disk, and returns None. Everything works fine. I did this essentially to reduce RAM usage.
I would like to do "the same" in Dask, but I cannot figure out how to save binary data in parallel. In my mind, it should be something like:
for element in list:
    new_value = func(element)
    new_value.tofile('filename.binary')
where only N elements are loaded at once (N being the number of workers), and each element is used and forgotten at the end of each cycle.
Is it possible?
Thanks a lot for any suggestion!
That does sound like a feasible task:
from dask import delayed, compute

@delayed
def myfunc(element):
    new_value = func(element)
    new_value.tofile('filename.binary')  # you might want to
    # change the destination for each element...

delayeds = [myfunc(e) for e in list]
results = compute(delayeds)
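Since the snippet above writes every result to the same 'filename.binary', a small variant that gives each element its own output file might look like this (the index-based naming scheme is just an assumption for illustration; func and list are the objects from the question):
from dask import delayed, compute

@delayed
def myfunc(element, idx):
    new_value = func(element)
    new_value.tofile(f'result_{idx}.binary')  # one file per element

delayeds = [myfunc(e, i) for i, e in enumerate(list)]
results = compute(delayeds)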
If you want fine control over tasks, you might want to explicitly specify the number of workers by starting a LocalCluster:
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(n_workers=3)
client = Client(cluster)
There is a lot more that can be done to customize the settings/workflow, but perhaps the above will work for your use case.
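Putting the two pieces together, a sketch of how the cluster limits concurrency (threads_per_worker=1 is an assumption so that each worker handles one element at a time; delayeds is the list built above):
from dask import compute
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=3, threads_per_worker=1)  # at most 3 elements in flight
client = Client(cluster)  # registers itself as the default scheduler

results = compute(delayeds)  # the delayed tasks now run on the cluster's workers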
I am very confused why the concurrent.futures module is giving me different results each time. I have a function, say foo(), which runs on segments of a larger set of data d.
I consistently break this larger data set d into parts and make a list
d_parts = [d1, d2, d3, ...]
Then following the documentation, I do the following
results = [executor.submit(foo, d) for d in d_parts]
which is supposed to give me a list of "futures" objects in the order of foo(d1), foo(d2), and so on.
However, when I try to compile results with
done, _ = concurrent.futures.wait(results)
The list of results stored in done seems to be out of order; i.e., they are not the returns of foo(d1), foo(d2), ... but some different ordering. Hence, running this program on the same data set multiple times yields different results because of the indeterminacy of which job finishes first (the d1, d2, ... are roughly the same size). Is there a reason why, since it seems that wait() should preserve the ordering in which the jobs were submitted?
Thanks!
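A minimal sketch of the behaviour in question (foo and d_parts are placeholders here): wait() hands back the completed futures as an unordered set, but the futures in the submitted list keep their submission order, so collecting results from that list returns them in order.
import concurrent.futures

def foo(part):
    # placeholder for the real computation
    return sum(part)

d_parts = [[1, 2], [3, 4], [5, 6]]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(foo, part) for part in d_parts]
    concurrent.futures.wait(futures)         # 'done' is a set: no ordering
    ordered = [f.result() for f in futures]  # submission order preserved
print(ordered)  # [3, 7, 11]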
I am writing Python 3 code in which I have two functions. The first function, insertBlock(), inserts data into MongoDB collection 1; the second function, insertTransactionData(), takes data from collection 1 and inserts it into collection 2. The amount of data is very large, so I use threading to increase performance. But when I use threading, it takes more time to insert data than without threading. I am confused about how exactly threading works in my code and how to increase performance. Here is the main function:
if __name__ == '__main__':
    t1 = threading.Thread(target=insertBlock())
    t1.start()
    t2 = threading.Thread(target=insertTransactionData())
    t2.start()
From the python documentation for threading:
target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.
So the correct usage is
threading.Thread(target=insertBlock)
(without the () after insertBlock), because otherwise insertBlock is called immediately, executed normally (blocking the main thread), and target is set to its return value None. This causes t1.start() to do nothing, so you don't get any performance improvement.
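A minimal corrected sketch of that pattern, with placeholder functions standing in for the MongoDB work and join() added so the main thread waits for both:
import threading

def insertBlock():
    print('inserting block data...')        # placeholder for the real work

def insertTransactionData():
    print('inserting transaction data...')  # placeholder for the real work

if __name__ == '__main__':
    t1 = threading.Thread(target=insertBlock)            # note: no ()
    t2 = threading.Thread(target=insertTransactionData)  # note: no ()
    t1.start()
    t2.start()
    t1.join()
    t2.join()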
Warning:
Be aware that multithreading gives you no guarantee about the order of execution across threads. Inside insertTransactionData you cannot rely on the data that insertBlock has inserted into the database, because at the time insertTransactionData uses that data, you cannot be sure it was already inserted. So multithreading may not work at all for this code, or you may need to restructure it and only parallelize those parts that do not depend on each other.
I solved this problem by merging these two functionalities into one new function, insertBlockAndTransaction(startrange, endrange). Since the two functionalities depend on each other, I insert the transaction information immediately after the block information is inserted (the block number was common and needed for both). Then I used multithreading by creating 10 threads for this single function:
for i in range(10):
    print('thread:', i)
    t1 = threading.Thread(target=insertBlockAndTransaction,
                          args=(5000000 + i*10000, 5000000 + (i+1)*10000))
    t1.start()
This helped me deal with the growing execution time for more than 100,000 (1 lakh) records.
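A small refinement under the same assumptions (the real insertBlockAndTransaction replaces the placeholder below): keeping the Thread objects in a list lets the main program join() them and know when all ten ranges are done.
import threading

def insertBlockAndTransaction(startrange, endrange):
    # placeholder for the real insert logic
    print('processing', startrange, 'to', endrange)

threads = []
for i in range(10):
    t = threading.Thread(target=insertBlockAndTransaction,
                         args=(5000000 + i*10000, 5000000 + (i+1)*10000))
    t.start()
    threads.append(t)

for t in threads:
    t.join()  # wait until every range has been processed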
I have to calculate the result of a stochastic algorithm multiple times. In the end I want to have all the results in an array. The executions of the algorithm are independent of one another. In Julia this can be parallelized easily with a parallel for-loop like this:
@parallel (vcat) for i = 1:10
    rand() # or any other algorithm yielding a number
end
But this seems a little inefficient if one thread has to fetch the result of another thread and the two results are merged after every iteration of the for loop.
Is this correct? In that case, could one thread end up holding a 100-element array while another holds a 200-element array, and the two get merged into a 300-element array?
Could I somehow avoid this and rewrite the above code to prevent multiple array allocations, perhaps by putting the result calculated inside the for-loop into a pre-allocated array?
Or can I make the reduction operator smarter somehow?
You could use pmap for this. It can distribute the work in parallel over your workers, and then store the results of each job as a separate element in an array. You can then combine this array at the end.
Consider this example, where each job is to create a random vector of differing length, all of which are combined at the end:
addprocs(3)
Results = pmap(rand, 1:10)
Result = vcat(Results...) ## array of length 55.
pmap will assign each worker a job as soon as it finishes the job it is working on. As such, it can be more efficient than @parallel if your jobs are of variable length (see here for details).
The ... syntax breaks the elements of Results (i.e. the 10 vectors of varying length) into separate arguments to feed to the vcat function.
So I've got this multithreaded, recursive application. It's coded in Pharo Smalltalk, but the logical solution to the issue is likely to be the same across most languages.
I have 4 of the same process running relatively simultaneously. It's the last iteration of a recursive call. I'd like to print the result calculated by my recursive function (it's a dictionary being modified in the argument of the recursive function/message). The issue I'm facing right now is that the print is called in the base case terminator of the recursion, so the result is printed 4 times.
I tried setting a global variable which allows me to print the result of the process which finishes first, but of course that means the result is wrong. It needs to print the result of the last process to execute of all the processes in that last iteration of the recursion.
How could I go about this without going too deep into the Process class? Thanks for any help.
Do you know the number of threads? (Supposedly, 4.)
Then you can use an atomic counter (an AtomicLong in Java, for example):
AtomicLong myAtomicLong = new AtomicLong(0);
...
...
// do my work
if (myAtomicLong.incrementAndGet() == totalThreadCount)
{
    // my print
}
The increment-and-get is atomic, so the condition will only be true for the last thread to reach it, after all other threads have finished their jobs. Please note that it is important to place the increment and check after the job is done.