Program contains 2 nested loops which contain 4 if conditionals. How many paths? - combinatorics

In Roger Pressman's book there is an example of a program with 2 nested loops, the inner loop enclosing four if statements. The two loops can execute up to 20 times. He states that this makes about 10^14 paths. To get a number this large, it seems the paths inside the loops are multiplied by 2^40, i.e. 2^20 times 2^20, to account for all the possibilities of going through the two loops. I can't see why this factor is not just 400, i.e. 20 times 20. Can someone shed some light? It will help if you have the ppt slides and can see the program graph. Thanks.

The inner block would be multiplied by 20*20 if the loops executed exactly 20 times each, since the inner block would then be run through a fixed 20*20 times, and all that would matter is the path taken through it each time. You said they execute "up to 20 times", so you also need to account for the control-flow changes when the inner loop executes 19 times, 18 times, etc., and then the same for the outer one.
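To see why the loop bound ends up in an exponent rather than a plain multiplier, here is a toy sketch in Python (deliberately much smaller than Pressman's program: a single loop with one if inside, so the numbers stay readable). For a fixed iteration count i there are 2^i paths, one branch choice per pass, and since the iteration count is itself a path decision, the totals for i = 1..n are summed:
def count_paths(n, branches_per_iteration=2):
    # one loop that may run 1..n times, each pass choosing one of
    # `branches_per_iteration` branches independently
    return sum(branches_per_iteration ** i for i in range(1, n + 1))

print(count_paths(3))    # 2 + 4 + 8 = 14, not 2 * 3 = 6
print(count_paths(20))   # already about 2.1 million for just one loop and one if
With four if statements per pass of an inner loop that is itself nested inside another loop, the same reasoning is what pushes the count toward Pressman's 10^14 rather than anything near 400.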

Related

How to use multiple threads to execute the same code and speed it up?

I'm facing performance issues executing a fuzzy match based on the Levenshtein distance algorithm.
I'm comparing two lists, a small one with 1k lines and a second one with 10k lines.
I have split the bigger list into 10 files of 1,000 lines each to check performance, but I noticed that Python is using only 1 thread.
I have googled many articles, and people explain how to execute TWO different functions in parallel.
I would like to know how to execute the SAME code in multiple threads.
For example: it's taking 1 second to compare 1 word against 1,000 lines. I would like to split this work across 4 threads.
Is it possible?
Sorry for the long text and thanks a lot for your help!
Running identical copies of the same code in two or more threads won't assist performance. What you can do is split the work up, e.g. into 4 chunks of 250 lines, have each worker handle one of those chunks, and then combine the results at the end.
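Here is a minimal sketch of that idea (the list contents and the best_match helper are placeholders, not the asker's code). One caveat for this particular workload: in CPython, threads will not run CPU-bound work such as Levenshtein comparisons in parallel because of the GIL, so the sketch uses worker processes via multiprocessing instead of threads.
from functools import partial
from multiprocessing import Pool

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution
        prev = cur
    return prev[-1]

def best_match(word, candidates):
    # the candidate from the big list closest to `word`
    return word, min(candidates, key=lambda c: levenshtein(word, c))

if __name__ == "__main__":
    small_list = ["colour", "centre"]            # stand-in for the 1k-line list
    big_list = ["color", "center", "theater"]    # stand-in for the 10k-line list
    with Pool(processes=4) as pool:              # 4 worker processes
        results = pool.map(partial(best_match, candidates=big_list), small_list)
    print(results)
Pool.map takes care of splitting small_list into chunks and handing them to the 4 workers, which is the "each handles a chunk, combine at the end" step described above.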

Cross-checking each item in a list with every other item in the same list in a parallel fashion

I have a list of several thousand items. Each item has an attribute called "address range". I have a function that verifies the correctness of the items in the list by making sure that none of their address ranges overlap with the address ranges of any other items in the list (each item has precisely one address range). If N is the number of entries in the list, I essentially have to run (N-1)*(N/2) address range overlap checks. In other words, if the number of items in the list doubles, the number of overlap checks quadruples.
Months ago, such a list would only have a few thousand items, and the whole operation would finish relatively quickly, but over time the number of items has grown, and now it takes several minutes to run all the cross-checks.
I've been trying to parallelize the cross-checks, but I have yet to think of a feasible approach. My problem is that if I want to distribute the cross-checks to perform over say 8 threads (to fully exploit the CPUs on the computer), I would have to split the possible cross-check combinations into 8 independent chunks.
To use an example, say we have 5 items in our list: ( A, B, C, D, E ). Using the formula (N-1)*(N/2), we can see that this requires (5-1)*(5/2)=10 cross-checks:
A vs B
A vs C
A vs D
A vs E
B vs C
B vs D
B vs E
C vs D
C vs E
D vs E
The only way I can think of to distribute the cross-check combinations across a given number of threads is to first create a list of all cross-check combination pairs and then split that list into evenly sized chunks. That would work in principle, but even for just 20,000 items that list would already contain (20,000-1)*(20,000/2) = 199,990,000 entries!!
So my question is, is there some super-sophisticated algorithm that would allow me to pass the entire list of items to each thread and then have each individual thread figure out by itself which cross-checks it should run so that no 2 threads would repeat the same cross-checks?
I'm programming this in Perl, but really the problem is independent from any particular programming language.
EDIT: Hmmm, I'm now wondering if I've been going about this the wrong way altogether. If I could sort the items by their address ranges, I could just walk through the sorted list and check if any item overlaps with its successor item. I'll try that and see if that speeds things up.
UPDATE: Oh my God, this actually works!!! :D Using a pre-sorted list, the entire operation takes 0.7 seconds for 11,700 items, where my previous naive implementation would take 2 to 3 minutes!
UPDATE AFTER usr's comment: As usr has noted, just checking each item against its immediate successor is not enough. As I'm walking through the sorted list, I'm dragging along an additional (initially empty) list in which I keep track of all items involved in the current overlap. Each time an item is found to overlap with its successor item, the successor item is added to the list (if the list was previously empty, the current item itself is also added). As soon as an item does NOT overlap with its successor item, I locally cross-check all items in my additional list against each other and then clear that list (the same operation is performed if there are still any items in my additional list after I've finished walking the list of all items).
My unit tests seem to confirm that this algorithm works; at least with all the examples I've fed it so far.
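For reference, a sketch of that sorted-sweep idea in Python (the question is in Perl, and the plain (start, end) tuple representation is an assumption). It is a slight variant of the update above: instead of only comparing each item with its immediate successor, it tracks the furthest end address seen in the current overlap cluster, which also covers the case usr pointed out where a long range overlaps items that are not adjacent to it in the sorted order.
def find_overlaps(ranges):
    if not ranges:
        return []
    ranges = sorted(ranges)                  # sort by start address
    overlaps = []
    cluster = [ranges[0]]                    # items in the current overlap cluster
    cluster_end = ranges[0][1]               # furthest end address seen so far
    for current in ranges[1:]:
        if current[0] <= cluster_end:        # still touching the cluster
            cluster.append(current)
            cluster_end = max(cluster_end, current[1])
        else:                                # gap: cross-check the finished cluster
            overlaps.extend(cross_check(cluster))
            cluster = [current]
            cluster_end = current[1]
    overlaps.extend(cross_check(cluster))    # don't forget the last cluster
    return overlaps

def cross_check(cluster):
    # pairwise check within one (usually tiny) cluster
    return [(a, b) for i, a in enumerate(cluster)
                   for b in cluster[i + 1:]
                   if a[0] <= b[1] and b[0] <= a[1]]
Sorting costs O(N log N) and each cluster is normally tiny, which is consistent with the drop from minutes to under a second reported above.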
It seems like you could create N threads where N = number of cores on the computer. Each of these threads is identical and consumes items from the queue until there are no more items. Each item is the comparison pair that the thread should work on. Since an item can only be consumed once, you will get no duplicate work.
On the producer side, simply send every valid combination to the queue (just the pairs of items); the threads are what do the work on each item. Thus there is no need to split the items into chunks.
It would be great if each thread could be pinned to a core, but whatever OS you're running on will most likely do a good enough job at scheduling that you won't need to worry about that.
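A compact way to get that pattern in Python (for illustration only; the question itself is language-agnostic, and check_pair here is a stand-in for the real address-range comparison) is a process pool: the pool's internal task queue plays the role of the queue described above, and feeding it straight from itertools.combinations means the ~200 million pairs never have to be materialised as a list up front.
import itertools
from multiprocessing import Pool, cpu_count

def check_pair(pair):
    a, b = pair
    overlap = a[0] <= b[1] and b[0] <= a[1]     # placeholder overlap test
    return pair if overlap else None

if __name__ == "__main__":
    items = [(0, 10), (5, 15), (20, 30)]        # toy (start, end) ranges
    with Pool(processes=cpu_count()) as pool:
        # combinations() is lazy, so pairs are generated as the workers consume them
        hits = [p for p in pool.imap_unordered(check_pair,
                                               itertools.combinations(items, 2),
                                               chunksize=1000)
                if p is not None]
    print(hits)                                 # [((0, 10), (5, 15))]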

Parallelization slows down execution in MatLab

I am using Matlab on a Mac OS X running on a Pentium processor with 4 real cores.
I want to analyse magnetic resonance images (MRI) and fit the signal from these images using optimisation. For every pixel I have 35 values (i.e. the same image acquired 35 times during different conditions), and I want to fit these values to some function.
Below, I have stripped my code down to the very basic loop that calls the fitting function:
ticid1 = tic;
for x = a:1:b
    [a, b, c, d] = FitSignal(Volume(y,x,:));
end;
toc(ticid1);
Here Volume is a 3D matrix holding all the MRI images, about 9 MB in size. FitSignal thus gets an array holding 35 values for a specific pixel, and the optimisation finds the best fit. The loop in this case runs 120 times (b-a = 120), which is once for every pixel on a horizontal line in the image.
Timing the above code using tic and toc, the entire loop takes about 50 seconds.
I thought executing the code in parallel may provide some speed up. So I opened 3 workers and ran the loop with parfor but found only marginal (20-30%) speedup.
Then I reduced the number of workers to 1. Now running the code with parfor took about 90 seconds. So with 1 worker the code is approximately twice as slow as when running without parallelization. This is consistent with the small benefit seen with 3 workers.
I then tried timing inside the function FitSignal and found that without parallelization it takes approximately 0.4 seconds, while with parallelization it takes 0.7 seconds.
I understand that parallelization comes with overhead, but in this case it seems excessive to me. Besides, once inside the function FitSignal, and when there is only one worker, it should not matter whether the function runs on the main process or within a worker - right? However, running inside a sole worker, the function runs noticeably slower!
Can anyone tell me what is wrong, and importantly, how to change the code to take advantage of any possible speedup from parallel execution?
Thanks in advance
PS: I have checked my system. Memory pressure is low; I even issued "purge" in the terminal to free memory. CPU does not exceed 15% during the run.
When running on a single machine, Matlab automatically parallelises built-in vector operations across cores, except when you are running explicit parallelisation such as parfor.
So what is happening here is that in normal, non-parfor mode you are already getting roughly a 100% speedup from parallelised vector operations, based on your numbers.
When you run in parfor mode, you lose that vector-operations boost but gain the parallelisation from parfor: each iteration runs at about half speed, but the work is split over three workers, so the whole loop takes about two thirds of the time.
The above is a rough estimate based on the numbers in the question; for other problems the relative speedups will naturally vary due to factors such as the amount of vectorised code and the overhead of parfor.

How can I accomplish parallel processing in R?

If I have two datasets (having an equal number of rows and columns) and I wish to run a piece of code that I have made, then there are obviously two options: sequential execution or parallel programming.
Now, the algorithm (code) that I have made is a big one and consists of multiple for loops. I wish to ask: is there any way to use it directly on both of them, or will I have to transform the code in some way? A heads-up would be great.
To answer your question: you do not have to transform the code to run it on two datasets in parallel, it should work fine like it is.
The need for parallel processing usually arises in two ways (for most users, I would imagine):
You have code you can run sequentially, but you would like to do it in parallel.
You have a function that is taking very long to execute on a large dataset, and you would like to run it in parallel to speed it up.
For the first case, you do not have to do anything, you can just execute it in parallel using one of the libraries designed for it, or just run two instances of R on the same computer and run the same code but with different datasets in each of them.
It doesn't matter how many for loops you have in there, and you don't even need to have the same number of rows and columns in the datasets.
If it runs fine sequentially, it means there will be no dependence between the parallel chains and thus no problem.
Since your question falls in the first case, you can run it in parallel.
If you have the second case, you can sometimes turn it into the first case by splitting your dataset into pieces (where you can run each of the pieces sequentially) and then you run it in parallel. This is easier said than done, and won't always be possible. It is also why not all functions just have a run.in.parallel=TRUE option: it is not always obvious how you should split the data, nor is it always possible.
So you have already done most of the work by writing the functions, and splitting the data.
Here is a general way of doing parallel processing with one function, on two datasets:
library(doParallel)
cl <- makeCluster(2)       # for 2 processors, i.e. 2 parallel chains
registerDoParallel(cl)
datalist <- list(mydataset1, mydataset2)
# now start the chains
nchains <- 2               # for two processors
results_list <- foreach(i = 1:nchains,
                        .packages = c('packages_you_need')) %dopar% {
    result <- find.string(datalist[[i]])
    return(result)
}
stopCluster(cl)            # release the workers when finished
The result will be a list with two elements, each containing the results from a chain. You can then combine it as you wish, or use a .combine function. See the foreach help for details.
You can use this code any time you have a case like number 1 described above. Most of the time you can also use it for cases like number 2, if you spend some time thinking about how you want to divide the data, and then combine the results. Think of it as a "parallel wrapper".
It should work in Windows, GNU/Linux, and Mac OS, but I haven't tested it on all of them.
I keep this script handy whenever I need a quick speed-up, but I still always start out by writing code I can run sequentially. Thinking in parallel hurts my brain.

Modified activity selection

In activity selection we sort on the finish time of the activities and then apply the constraint that no two activities can overlap. I want to know whether we can do it by sorting on start time and then checking that activities do not overlap.
I was going through http://www.geeksforgeeks.org/dynamic-programming-set-20-maximum-length-chain-of-pairs/
This link has a dynamic programming solution for finding the maximum length chain of pairs of numbers. To me this is another formulation of the activity selection problem, but I have searched the net and also read Cormen, and everywhere they ask to sort on finish times.
I guess it shouldn't matter which times (start or finish) we sort on, but I just want to confirm.
In a greedy algorithm we always try to maximize our result. Thus, in activity selection we try to accommodate as many processes as we can in a given time interval without any of them overlapping.
If you sort on start time then your solution might not be optimal. Let's take an example:
Process    Start time    Finish time
A          1             9
B          3             5
C          6             8
Sorted on start time:
If you execute process A because it starts the earliest, no other process can be executed, because both B and C overlap with A. Therefore, for the given time interval you can execute only one process.
Sorted on finish time:
If you execute process B because it finishes the earliest, you can execute process C after it. Therefore, for the same time interval you can execute two processes.
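For completeness, a small sketch of the finish-time greedy (standard interval scheduling) in Python, run on the A/B/C example above; the (start, finish) tuple representation is just for illustration.
def max_activities(activities):
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):   # sort by finish time
        if start >= last_finish:          # does not overlap the last chosen activity
            selected.append((start, finish))
            last_finish = finish
    return selected

acts = {"A": (1, 9), "B": (3, 5), "C": (6, 8)}
print(max_activities(acts.values()))      # [(3, 5), (6, 8)], i.e. B then C
Sorting by start time and greedily taking the first compatible activity would pick A first and stop at one activity, which is exactly the counterexample in the table.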
