I'm new to multi-threading in Matlab, so I guess what I need to do will be simple for anyone with a little experience in it.
I have two functions f1 and f2 such that:
f1 - runs for about 10 seconds and returns accurate results.
f2 - returns estimated results immediately.
Both functions get the same input and return the same output (but in f1 the output is more accurate).
The functions get new inputs all the time and run constantly.
I want to run the functions in two different threads so that the user never needs to wait, and every time one of them finishes, the other one uses its results.
The goal is to get more accurate results than running f2 all the time. Running f1 all the time would be optimal, but then the user has to wait 10 seconds each time. So I want to combine the two functions and get something in between.
What is the best way to implement the above?
I read about batch, but I'm not sure it's what I need (at least I couldn't find how to do the above with batch).
Can you just point me to the relevant functions, and I'll read about them and learn how to use them?
Thanks
When using ParameterManager.recalculate() to get the updated values of all the parameterized exchanges in my database, the functions ActivityParameter.recalculate() and ActivityParameter.recalculate_exchanges() are applied to all the parameter groups. But it seems that ActivityParameter.recalculate_exchanges() is run twice, because it is also called inside ActivityParameter.recalculate(). When I delete one of the calls, I get the same results twice as fast (which is what I was looking for, because otherwise my calculation is a bit long). Is there a reason for running the function twice? Is it safe to delete one call to get faster results? Would there be a way to reduce the duration of this calculation?
You are totally right - ParameterManager.recalculate calls both ActivityParameter.recalculate and ActivityParameter.recalculate_exchanges, but ActivityParameter.recalculate already calls ActivityParameter.recalculate_exchanges. This makes things slower, but doesn't break anything.
This duplication has been removed; you can safely do the same.
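In other words, the call pattern described in the question looks roughly like this (a sketch of the shape of the problem only, with a placeholder loop over groups - not the actual brightway2 source):

# Sketch of the structure described in the question, not the real ParameterManager code
for group in parameter_groups:  # placeholder iterable of group names
    ActivityParameter.recalculate(group)            # already recalculates the group's exchanges internally
    ActivityParameter.recalculate_exchanges(group)  # redundant second pass; dropping it halves the runtime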
I have a function that receives a list of integers; for each integer it builds a JSON payload, consumes an API, runs some validations, and then inserts the result into a database. The problem is that with a list of 2 values the function takes seconds, but with a list of 200 values it takes more than 8 minutes. Is it possible to optimize that time when sending that many values? For example, could I send a batch of values so the function does its job while, at the same time, I send another batch to the same function - say, 50 values and then another 50 simultaneously - to save time? Or what would be the ideal way to optimize it?
Generally there are two approaches: optimizing the process for a single call, and running multiple calls at once, also known as parallelization.
For optimizing a single call, you should start by profiling the function. Python has a built-in profiler, cProfile. There are other packages as well, such as py-spy, whose results can be formatted for https://www.speedscope.app/, which is a nice visual tool.
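For example, a minimal profiling run could look something like this (process_values and values are placeholders for your own function and input list, not names from your code):

import cProfile
import pstats

# Profile one call of the (placeholder) function and dump the stats to a file
profiler = cProfile.Profile()
profiler.enable()
process_values(values)
profiler.disable()
profiler.dump_stats('profile_output')

# Show the 20 entries with the largest cumulative time
pstats.Stats('profile_output').sort_stats('cumulative').print_stats(20)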
Regarding parallelization, it depends on how this function is used. You could take a look at asyncio. However, if you're hitting the database, that's probably going to prove difficult, and you'll need to integrate Django Channels or something similar. Django is still working on proper async DB functionality. Alternatively, you can use something like Celery and wrap this function in a task. Then you can schedule calls with various arguments and scale up the workers as needed.
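If you end up going the Celery route, the rough shape is something like the following sketch (the task body, broker URL, and chunk size of 50 are placeholder assumptions, not taken from your code):

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')  # placeholder broker URL

@app.task
def process_values(values):
    # build the JSON, call the API, validate, insert into the database
    ...

# Elsewhere: split the full list into chunks and queue them, so several workers run at once
chunks = [all_values[i:i + 50] for i in range(0, len(all_values), 50)]
jobs = [process_values.delay(chunk) for chunk in chunks]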
Hopefully this helps you on your search and understanding of how to build what you need.
So, I have a chain of tasks in Python 3 that a Celery worker runs. Currently, I use the following piece of code to get and print the final result of the chain:
while not result.ready():
    pass
print(result.get())
I have run the code with and without the while-loop, and it seems that the while-loop is redundant.
My question is: "is it necessary to have that while-loop?"
If by redundant you mean that the code works fine without the while loop, then I would venture to say that the loop is not necessary. If, however, you get an error without the loop because you're trying to print something that doesn't exist yet, then you should keep it. This can be a problem, though, because an empty while loop checks the same variable as fast as your computer can physically handle it, which tends to eat up your CPU. I recommend something like the following:
import time

t = 1  # the number of seconds you want to wait between checking if the result is ready
while not result.ready():
    time.sleep(t)
print(result.get())
You can set t to whatever makes sense. If the task you're running takes several hours, maybe set it to 60, and you'll get the result within a minute. If you want the result faster, you can make the interval smaller. This will keep the program from dragging down the rest of your computer. However, if you don't mind your fans blowing and you absolutely need to know the moment the result is ready, ignore all of the above and leave your code the way it is :)
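As a side note, if result here is a standard Celery AsyncResult, then result.get() itself blocks until the result is ready and accepts a timeout, so you may be able to drop the polling loop entirely:

# get() blocks until the result is available; here it raises TimeoutError after an hour
print(result.get(timeout=3600))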
If I have two datasets (with an equal number of rows and columns) and I wish to run a piece of code that I have written, then there are obviously two options: either go with sequential execution or with parallel programming.
Now, the algorithm (code) that I have written is a big one and consists of multiple for loops. Is there any way to use it directly on both datasets, or will I have to transform the code in some way? A heads-up would be great.
To answer your question: you do not have to transform the code to run it on two datasets in parallel; it should work fine as it is.
The need for parallel processing usually arises in two ways (for most users, I would imagine):
You have code you can run sequentially, but you would like to do it in parallel.
You have a function that is taking very long to execute on a large dataset, and you would like to run it in parallel to speed it up.
For the first case, you do not have to do anything: you can just execute it in parallel using one of the libraries designed for it, or just run two instances of R on the same computer and run the same code but with a different dataset in each of them.
It doesn't matter how many for loops you have in there, and you don't even need to have the same number of rows and columns in the datasets.
If it runs fine sequentially, it means there will be no dependence between the parallel chains and thus no problem.
Since your question falls in the first case, you can run it in parallel.
If you have the second case, you can sometimes turn it into the first case by splitting your dataset into pieces (where you can run each of the pieces sequentially) and then you run it in parallel. This is easier said than done, and won't always be possible. It is also why not all functions just have a run.in.parallel=TRUE option: it is not always obvious how you should split the data, nor is it always possible.
So you have already done most of the work by writing the functions, and splitting the data.
Here is a general way of doing parallel processing with one function, on two datasets:
library(doParallel)

cl <- makeCluster(2)     # for 2 processors, i.e. 2 parallel chains
registerDoParallel(cl)

datalist <- list(mydataset1, mydataset2)

# now start the chains
nchains <- 2             # for two processors
results_list <- foreach(i = 1:nchains,
                        .packages = c('packages_you_need')) %dopar% {
  result <- find.string(datalist[[i]])   # your function, applied to dataset i
  return(result)
}

stopCluster(cl)          # release the workers when you are done
The result will be a list with two elements, each containing the results from a chain. You can then combine it as you wish, or use a .combine function. See the foreach help for details.
You can use this code any time you have a case like number 1 described above. Most of the time you can also use it for cases like number 2, if you spend some time thinking about how you want to divide the data, and then combine the results. Think of it as a "parallel wrapper".
It should work in Windows, GNU/Linux, and Mac OS, but I haven't tested it on all of them.
I keep this script handy whenever I need a quick speed-up, but I still always start out by writing code I can run sequentially. Thinking in parallel hurts my brain.
I have a large-scale gradient descent optimization problem that I am running using Matlab. The code has two parts:
A sequential update part that fires every iteration and updates the parameter vector.
A validation-error computation part that fires every 10 iterations or so, using the parameter vector at the end of the iteration in which it is fired.
The way I am running this now is to do (1) and (2) sequentially. But (2) takes a lot of time and it's not the core part of my routine - I made it just to check the progress and plot the error of my model. Is it possible in Matlab to run (2) in parallel with (1)? Please note that (1) cannot be run in parallel, since it performs a sequential update. So a simple 'parfor' is not a solution, unless there is a really smart way of doing it.
I don't think Matlab has any way of multi-threading outside of the (rather restricted) Parallel Computing Toolbox. There is a workaround that may help you, though:
Open 2 sessions of Matlab, sessions A and B (or instances, or workspaces, whatever you call them).
Matlab session A:
Calculates 10 iterations of your sequential process (1)
Saves the result in a file (adequately and uniquely named)
Goes on to calculate the next 10 iterations (back to the top of this loop, basically)
In parallel:
Matlab session B:
Checks periodically for the existence of the file written by session A (define a timer that does this at a time interval that makes sense for your process: a few seconds or a few minutes ...)
If the file exists, loads it, then does the validation computation (your process (2)) and displays/reports the results.
Note: this only works if process (1) doesn't need the result of process (2) to run its iterations; if it did, I don't know how you could parallelize it anyway.
If you have multiple cores on your machine, that should run smoothly; if you have a single core, the 2 sessions will have to share it and you will see a performance impact.