How to handle n not a multiple of P in worker processes in matrix multiplication? - multithreading

I am working on a problem regarding pseudocode for matrix multiplication using worker processes. w is the worker index, P is the number of workers (one per processor) and n is the size of the n-by-n matrices.
The pseudocode computes the result matrix by dividing the n rows into P strips of n/P rows each.
process worker[w = 1 to P] {
    int first = (w-1) * n/P;
    int last = first + n/P - 1;
    for [i = first to last] {
        for [j = 0 to n-1] {
            c[i,j] = 0.0;
            for [k = 0 to n-1]
                c[i,j] = c[i,j] + a[i,k]*b[k,j];
        }
    }
}
My question is how to handle the case where n is not a multiple of P, which can often happen.

The simplest solution is to give the last worker all the remaining rows (there are at most P-1 of them). This assumes each strip has n/P rows with integer division, i.e. first = (w-1) * (n/P):
if w == P {
last += n mod P
}
n mod P is the remainder of the division of n by P.
Or change the calculation of first and last like this:
int first = ((w-1) * n)/P
int last = (w * n)/P - 1
This automatically takes care of the case when n is not divisible by P. The parentheses are not really necessary in most languages, where * and / have the same precedence and are left-associative. The point is that the multiplication by n should happen before the division by P.
Example: n = 11, P = 3:
w = 1: first = 0, last = 2 (3 rows)
w = 2: first = 3, last = 6 (4 rows)
w = 3: first = 7, last = 10 (4 rows)
This is a better solution as it spreads the remainder of the division evenly among the workers.
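To make the second scheme concrete, here is a minimal Python sketch, assuming square matrices stored as lists of lists. The names `strip_bounds` and `multiply_in_strips` are hypothetical, and Python threads are used only to mirror the worker structure (in CPython they will not speed up pure-Python arithmetic because of the GIL).

```python
import threading

def strip_bounds(w, n, P):
    """Inclusive row range (first, last) for worker w in 1..P.

    Multiplying by n before dividing by P spreads the n mod P extra rows
    over the workers; a worker gets an empty range (first > last) if P > n.
    """
    first = ((w - 1) * n) // P
    last = (w * n) // P - 1
    return first, last

def multiply_in_strips(a, b, P):
    """Compute c = a * b for n-by-n lists of lists, one thread per strip."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]

    def worker(w):
        first, last = strip_bounds(w, n, P)
        for i in range(first, last + 1):
            for j in range(n):
                total = 0.0
                for k in range(n):
                    total += a[i][k] * b[k][j]
                c[i][j] = total  # each worker writes only its own rows

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(1, P + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c
```

For n = 11 and P = 3, `[strip_bounds(w, 11, 3) for w in (1, 2, 3)]` gives `[(0, 2), (3, 6), (7, 10)]`, matching the example above.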

Related

What is the worst case for binary search

Where should an element be located in the array so that the run time of the Binary search algorithm is O(log n)?
The first or last element will give the worst-case complexity in binary search, as you'll have to do the maximum number of comparisons.
Example:
1 2 3 4 5 6 7 8 9
Here searching for 9 (the last element) gives the worst case, with the result found on the 4th pass (compare with 5, 7, 8 and finally 9). Searching for 1 needs only 3 passes (5, 2 and finally 1).
1 2 3 4 5 6 7 8
In this case, searching for 8 gives the worst case, with the result coming in 4 passes.
Note that here, too, searching for 1 (the first element) can be done in just 3 passes (compare 1 & 4, compare 1 & 2 and finally 1).
So, when the midpoint is rounded down, the last element always needs the maximum number of passes; the first element ties it only when n is one less than a power of two (e.g. n = 7).
This assumes all arrays are 0-indexed, and it happens because the mid is computed as the floor of (start + end)/2.
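The pass counts in these examples can be checked mechanically. A small Python sketch, assuming the same rounded-down midpoint as the Java code below; `passes_to_find` is a hypothetical helper that counts loop iterations instead of returning the index:

```python
def passes_to_find(arr, x):
    """Count loop iterations (passes) an iterative binary search needs to find x.

    Uses the rounded-down midpoint m = l + (r - l) // 2.
    Returns None if x is absent.
    """
    l, r = 0, len(arr) - 1
    passes = 0
    while l <= r:
        passes += 1
        m = l + (r - l) // 2
        if arr[m] == x:
            return passes
        if arr[m] < x:
            l = m + 1   # x is greater: ignore the left half
        else:
            r = m - 1   # x is smaller: ignore the right half
    return None
```

With `nine = list(range(1, 10))`, `[passes_to_find(nine, x) for x in nine]` gives `[3, 2, 3, 4, 1, 3, 2, 3, 4]`, so the last element is among the slowest and the first is not.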
// Java implementation of iterative Binary Search
class BinarySearch
{
    // Returns index of x if it is present in arr[],
    // else return -1
    int binarySearch(int arr[], int x)
    {
        int l = 0, r = arr.length - 1;
        while (l <= r)
        {
            int m = l + (r - l) / 2;
            // Check if x is present at mid
            if (arr[m] == x)
                return m;
            // If x is greater, ignore left half
            if (arr[m] < x)
                l = m + 1;
            // If x is smaller, ignore right half
            else
                r = m - 1;
        }
        // if we reach here, then element was
        // not present
        return -1;
    }

    // Driver method to test above
    public static void main(String args[])
    {
        BinarySearch ob = new BinarySearch();
        int arr[] = {2, 3, 4, 10, 40};
        int x = 10;
        int result = ob.binarySearch(arr, x);
        if (result == -1)
            System.out.println("Element not present");
        else
            System.out.println("Element found at index " + result);
    }
}
Time Complexity:
The time complexity of Binary Search can be written as
T(n) = T(n/2) + c
The above recurrence can be solved using either the Recurrence Tree method or the Master method. It falls into case II of the Master method, and the solution of the recurrence is Θ(log n).
Auxiliary Space: O(1) for the iterative implementation. For a recursive implementation, O(log n) recursion call stack space.
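For comparison with the iterative Java version, a minimal recursive sketch in Python (the function name and default-argument style are illustrative choices, not from the original). Each call halves the remaining range, so the number of nested calls, and hence the stack space, grows like log n:

```python
def binary_search_recursive(arr, x, l=0, r=None):
    """Return the index of x in the sorted list arr, or -1 if it is absent."""
    if r is None:
        r = len(arr) - 1
    if l > r:
        return -1                     # empty range: x is not present
    m = l + (r - l) // 2
    if arr[m] == x:
        return m
    if arr[m] < x:
        return binary_search_recursive(arr, x, m + 1, r)   # search right half
    return binary_search_recursive(arr, x, l, m - 1)       # search left half
```

With the driver's data, `binary_search_recursive([2, 3, 4, 10, 40], 10)` returns 3.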

Find the number of subsequences of an n-digit number that are divisible by 8

The number has 1 to 10^5 digits and is given as a string in decimal format.
Example: if n = 968, then out of all its subsequences, i.e. 9, 6, 8, 96, 68, 98 and 968, there are 3 that are divisible by 8 (968, 96 and 8). So, the answer is 3.
Since the answer can be very large, print the answer modulo (10^9 + 7).
You can use dynamic programming. Let f(len, sum) be the number of subsequences of the prefix of length len whose value modulo 8 is sum (sum ranges from 0 to 7).
The value of f for len = 1 is obvious. The transitions go as follows:
We can start a new subsequence at the new position: f(len, a[i] % 8) += 1.
We can continue any subsequence from the shorter prefix:
for old_sum = 0..7
f(len, (old_sum * 10 + a[i]) % 8) += f(len - 1, old_sum) // take the new element
f(len, old_sum) += f(len - 1, old_sum) // ignore the new element
Of course, you can perform all computations modulo 10^9 + 7 and use a standard integer type.
The answer is f(n, 0) (all elements are taken into account and the sum modulo 8 is 0).
The time complexity of this solution is O(n) (as there are O(n) states and two transitions from each of them).
Note: if the numbers can't have leading zeros, you can just add one more parameter to the state: a flag that indicates whether the first element of the subsequence is zero (such sequences should never be extended). The rest of the solution stays the same.
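The transitions above translate almost line by line into code. A minimal Python sketch, assuming leading zeros are allowed (so "08" counts as a subsequence with value 8); only the row f(len, ·) for the current prefix is kept, and `count_div8_subsequences` is a hypothetical name:

```python
MOD = 10**9 + 7

def count_div8_subsequences(digits: str) -> int:
    """f[s] = number of non-empty subsequences of the prefix so far with value = s (mod 8)."""
    f = [0] * 8
    for ch in digits:
        d = int(ch)
        g = f[:]                                   # ignore the new digit
        for s in range(8):                         # append the digit to every old subsequence
            k = (s * 10 + d) % 8
            g[k] = (g[k] + f[s]) % MOD
        g[d % 8] = (g[d % 8] + 1) % MOD            # start a new subsequence at this digit
        f = g
    return f[0]
```

`count_div8_subsequences("968")` returns 3, as in the example.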
Note: This answer assumes you mean contiguous subsequences (substrings).
A number with at least three digits is divisible by 8 if and only if the number formed by its last three digits is divisible by 8 (because 1000 is a multiple of 8). Using this, a simple O(n) algorithm can be obtained, where n is the number of digits in the number.
Let N = a_0 a_1 ... a_(n-1) be the decimal representation of N with n digits.
Let the number of sequences found so far be s = 0.
For each window of three digits a_i a_(i+1) a_(i+2), check whether that number is divisible by 8. If so, add i + 1 to the number of sequences, i.e. s = s + (i + 1). This is because all strings a_k..a_(i+2) will be divisible by 8 for k ranging from 0..i.
Loop i from 0 to n-3.
So, for 1424968, the three-digit windows divisible by 8 are at:
i=1 (424 yielding i+1 = 2 numbers: 424 and 1424)
i=3 (496 yielding i+1 = 4 numbers: 496, 2496, 42496, 142496)
i=4 (968 yielding i+1 = 5 numbers: 968, 4968, 24968, 424968, 1424968)
Hence the number of such substrings with at least three digits = 2 + 4 + 5 = 11, and the total complexity is O(n), where n is the number of digits.
Note that some small modifications are needed to also count the substrings shorter than three digits (here 8, 24 and 96).
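Putting the window rule and the short-substring fix-up together, a minimal Python sketch (hypothetical function name; leading zeros such as "08" are counted, as in the window rule):

```python
def count_div8_substrings(digits: str) -> int:
    """Count contiguous substrings of `digits` whose value is divisible by 8."""
    n = len(digits)
    total = 0
    # Length >= 3: if digits[i:i+3] is divisible by 8, so are all i + 1
    # substrings digits[k:i+3] for k = 0..i (1000 is a multiple of 8).
    for i in range(n - 2):
        if int(digits[i:i + 3]) % 8 == 0:
            total += i + 1
    # Length 1 and 2: check each substring directly.
    for i in range(n):
        if int(digits[i]) % 8 == 0:
            total += 1
    for i in range(n - 1):
        if int(digits[i:i + 2]) % 8 == 0:
            total += 1
    return total
```

`count_div8_substrings("1424968")` returns 14: the 11 substrings from the example plus "8", "24" and "96".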
One can use the fact that for any three-digit number abc the following holds:
abc % 8 = ((ab % 8) * 10 + c) % 8
In other words, the test for a number with a fixed start index can be cascaded:
int div8(String s) {
    int total = 0, mod = 0;
    for (int i = 0; i < s.length(); i++)
    {
        mod = (mod * 10 + s.charAt(i) - '0') % 8;
        if (mod == 0)
            total++;
    }
    return total;
}
But we don't have fixed start-indices!
Well, that's pretty easy to fix:
Suppose two sequences a and b such that int(a) % 8 = int(b) % 8 and b is a suffix of a. No matter how the sequences continue, their values modulo 8 will always remain equal. Thus it's sufficient to keep track of the number of sequences that share the same value modulo 8.
final int RESULTMOD = 1000000000 + 7;
int div8(String s) {
    int total = 0;
    // modTable[i] is the number of substrings ending at the previous index with int(substring) % 8 = i
    int[] modTable = new int[8];
    for (int i = 0; i < s.length(); i++) {
        int digit = s.charAt(i) - '0';
        int[] nextTable = new int[8];
        // extend every substring from the last loop run by this digit;
        // buckets j and j + 4 map to the same index, so counts must be added, not assigned
        for (int j = 0; j < 8; j++) {
            int k = (j * 10 + digit) % 8;
            nextTable[k] = (nextTable[k] + modTable[j]) % RESULTMOD;
        }
        // add the substring that starts at this index to the appropriate bucket
        nextTable[digit % 8] = (nextTable[digit % 8] + 1) % RESULTMOD;
        // add the count of all substrings with int(substring) % 8 = 0 to the result
        total = (total + nextTable[0]) % RESULTMOD;
        // table for the next run
        modTable = nextTable;
    }
    return total;
}
Runtime is O(n).
There are 10 possible states a subsequence can be in. The first is empty. The second is a lone leading 0 (which cannot be extended). The other 8 are an ongoing number that is 0-7 mod 8. You start at the beginning of the string with 1 way of being empty and no way to be anything else. At the end of the string, your answer is the number of ways to have a leading 0 plus the number of ways to have an ongoing number that is 0 mod 8.
The transition table should be obvious. The rest is just normal dynamic programming.
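One way to write out that transition table, as a minimal Python sketch under the assumption that a subsequence may not start with 0 unless it is exactly "0" (the function name is hypothetical). Each digit can be skipped (every state is kept) or taken (the state moves):

```python
MOD = 10**9 + 7

def count_div8_subsequences_no_leading_zero(digits: str) -> int:
    empty = 1          # one way to have chosen nothing so far; skipping keeps it at 1
    zero = 0           # ways to be exactly "0" (taking more digits is not allowed)
    ongoing = [0] * 8  # ongoing[r]: numbers without a leading zero, value = r (mod 8)
    for ch in digits:
        d = int(ch)
        new_zero = zero
        new_ongoing = ongoing[:]                  # skip this digit
        for r in range(8):                        # append d to an ongoing number
            k = (r * 10 + d) % 8
            new_ongoing[k] = (new_ongoing[k] + ongoing[r]) % MOD
        if d == 0:
            new_zero = (new_zero + empty) % MOD   # "0" on its own
        else:
            new_ongoing[d % 8] = (new_ongoing[d % 8] + empty) % MOD
        zero, ongoing = new_zero, new_ongoing
    return (zero + ongoing[0]) % MOD
```

For "800" this counts 8, 0, 0, 80, 80 and 800 (6 in total) while rejecting "00".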

Time Complexity of dependent nested loop

I've looked at similar questions that have been asked, and have asked my classmates for advice, but I am questioning the answer.
What's the time complexity of this algorithm?
for (i = 1; i < n; i *= 2)
    for (j = 1; j < i; j *= 2)
        // c elementary operations
I have been told O((log n)^2), but from what I've read and tried it looks like O(log(n)*log(log(n))). Any help?
The inner loop repeats itself log_2(i) times for each iteration of the outer loop.
Let's sum that up then
(1) T(n) = log_2(1) + log_2(2) + log_2(4) + log_2(8) + ... + log_2(n)
(2) T(n) = sum { log_2(2^i) | i=0,1,..,log_2(n) }
(3) T(n) = sum { i * log_2(2) | i=0,1,...,log_2(n) }
(4) T(n) = 0 + 1 + ... + log_2(n)
(5) T(n) = (log_2(n) + 1)(log_2(n))/2
(6) T(n) is in O(log_2(n)^2)
Explanation:
(1) -> (2) is simply summation shorthand
(2) -> (3) is because log(a^b) = b*log(a)
(3) -> (4) log_2(2) = 1
(4) -> (5) Sum of arithmetic progression
(5) -> (6) is giving asymptotic notation
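The sum can be checked by simply counting iterations. A small Python sketch (hypothetical helper name) that runs the two loops and counts how often the inner body executes:

```python
def inner_iterations(n: int) -> int:
    """Count executions of the inner body of the two doubling loops."""
    count = 0
    i = 1
    while i < n:          # i = 1, 2, 4, ... while i < n
        j = 1
        while j < i:      # j = 1, 2, 4, ... while j < i: log_2(i) steps
            count += 1
            j *= 2
        i *= 2
    return count
```

For n = 2^k the outer loop stops at i = 2^(k-1), so the count is 0 + 1 + ... + (k-1) = k(k-1)/2 (e.g. 45 for n = 1024). That is slightly below the bound in (5), which sums up to log_2(n), but it is still in O(log_2(n)^2).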

loops and Factorial calculation in C#

I am just starting to learn C#. I tried calculating the factorial of a number, but I don't understand how many times the loop runs. Here is the code for the function:
static int func(int p)
{
    int l = 1;
    while (p > 0)
    {
        l = l * p;
        p--;
    }
    return l;
}
Here is the explanation:
int l = 1 // 1 is the starting value of the product
while (p > 0) // while the number whose factorial you want is greater than 0, everything inside the loop executes
Imagine that p = 3, then:
1st time - l * 3 = 3, and subtract 1 from p.
2nd time - Now p = 2, so l * 2 = 6, and subtract 1 from p.
3rd time - Now p = 1, so l * 1 = 6, and subtract 1 from p.
Now p = 0, so the loop ends.
**The while loop executes 3 times if p = 3.
If p = 10, it executes 10 times.**
Your function will be something like this:
public int GetFactorial(int number)
{
    int start = 1;
    while (number > 0)
    {
        start = start * number;
        number--;
    }
    return start;
}
Better now? :)
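To see the iteration count directly, here is a Python sketch that mirrors the C# loop and also counts the passes (the counter and function name are additions for illustration). One difference worth knowing: Python integers do not overflow, whereas a C# `int` is 32-bit, so 13! = 6227020800 exceeds `int.MaxValue` (2147483647) and, in the default unchecked context, silently wraps around.

```python
def factorial_with_count(p: int):
    """Return (p!, number of while-loop iterations), mirroring the C# loop."""
    l = 1
    iterations = 0
    while p > 0:
        l = l * p
        p -= 1
        iterations += 1
    return l, iterations
```

`factorial_with_count(3)` returns `(6, 3)` and `factorial_with_count(10)` returns `(3628800, 10)`: the loop body runs exactly p times.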

Minimum number of comparisons to find the median of 3 numbers

I was implementing quicksort and wanted to set the pivot to the median of three numbers: the first element, the middle element and the last element.
Could I possibly find the median with fewer comparisons?
int median(int a[], int p, int r)
{
    int m = (p + r) / 2;
    if (a[p] < a[m])
    {
        if (a[p] >= a[r])
            return a[p];
        else if (a[m] < a[r])
            return a[m];
    }
    else
    {
        if (a[p] < a[r])
            return a[p];
        else if (a[m] >= a[r])
            return a[m];
    }
    return a[r];
}
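If the concern is only comparisons, then this could be used (note that the subtractions and products can overflow for large values):
int getMedian(int a, int b, int c) {
    int x = a - b;
    int y = b - c;
    int z = a - c;
    if (x * y > 0) return b;  // b is strictly between a and c
    if (x * z > 0) return c;  // a is the max or min, and b is not the median
    return a;
}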
#include <algorithm>
#include <cstdint>

int32_t FindMedian(const int n1, const int n2, const int n3) {
    auto _min = std::min(n1, std::min(n2, n3));
    auto _max = std::max(n1, std::max(n2, n3));
    return (n1 + n2 + n3) - _min - _max;
}
You can't do it in one, and you're only using two or three, so I'd say you've got the minimum number of comparisons already.
Rather than just computing the median, you might as well put them in place. Then you can get away with just 3 comparisons all the time, and you've got your pivot closer to being in place.
<T extends Comparable<? super T>> T median(T[] a, int low, int high)
{
    int middle = (low + high) / 2;
    if (a[middle].compareTo(a[low]) < 0)
        swap(a, low, middle);
    if (a[high].compareTo(a[low]) < 0)
        swap(a, low, high);
    if (a[high].compareTo(a[middle]) < 0)
        swap(a, middle, high);
    return a[middle];
}
I know that this is an old thread, but I had to solve exactly this problem on a microcontroller that has very little RAM and no hardware multiplication unit :). In the end I found that the following works well:
static char medianIndex[] = { 1, 1, 2, 0, 0, 2, 1, 1 };
signed short getMedian(const signed short num[])
{
return num[medianIndex[(num[0] > num[1]) << 2 | (num[1] > num[2]) << 1 | (num[0] > num[2])]];
}
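The table can be checked by porting it to Python, where `True << 2` is 4 because `bool` is a subclass of `int`. This is an illustrative sketch only; the names are hypothetical and the point is to show which bit means what:

```python
# Index bits: bit 2 = num[0] > num[1], bit 1 = num[1] > num[2], bit 0 = num[0] > num[2].
# Keys 1 and 6 correspond to impossible orderings, so their entries are never used.
MEDIAN_INDEX = (1, 1, 2, 0, 0, 2, 1, 1)

def median3_table(num):
    key = (num[0] > num[1]) << 2 | (num[1] > num[2]) << 1 | (num[0] > num[2])
    return num[MEDIAN_INDEX[key]]
```

For example, key 5 means num[0] > num[1], num[1] <= num[2] and num[0] > num[2], so num[1] <= num[2] < num[0] and the median is num[2], matching entry 2.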
If you're not afraid to get your hands a little dirty with compiler intrinsics you can do it with exactly 0 branches.
The same question was discussed before on:
Fastest way of finding the middle value of a triple?
Though, I have to add that in the context of a naive quicksort implementation with a lot of elements, reducing the number of branches when finding the median is not so important, because the branch predictor will choke either way once you start tossing elements around the pivot. More sophisticated implementations (which don't branch in the partition operation, and avoid WAW hazards) will benefit from this greatly.
Remove the max and min values from the total sum:
int med3(int a, int b, int c)
{
    int tot_v = a + b + c;
    int max_v = max(a, max(b, c));
    int min_v = min(a, min(b, c));
    return tot_v - max_v - min_v;
}
There is actually a clever way to isolate the median element from three using a careful analysis of the 6 possible permutations (of low, median, high). In python:
def med(a, start, mid, last):
    # put the median of a[start], a[mid], a[last] in the a[start] position
    SM = a[start] < a[mid]
    SL = a[start] < a[last]
    if SM != SL:
        return
    ML = a[mid] < a[last]
    m = mid if SM == ML else last
    a[start], a[m] = a[m], a[start]
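A third of the time (when a[start] is already the median) you need only two comparisons; otherwise you need 3 (an average of 8/3 ≈ 2.67 for distinct values in random order). And you only swap the median element once, when needed (2/3 of the time).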
Full python quicksort using this at:
https://github.com/mckoss/labs/blob/master/qs.py
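To count the comparisons exactly, here is the same logic as `med` instrumented to return how many comparisons it made (`med_counting` is a hypothetical name), run over all six orderings of three distinct values:

```python
def med_counting(a, start, mid, last):
    """Same logic as med(); returns the number of comparisons performed."""
    comparisons = 2
    SM = a[start] < a[mid]
    SL = a[start] < a[last]
    if SM != SL:
        return comparisons      # a[start] is already the median
    comparisons += 1
    ML = a[mid] < a[last]
    m = mid if SM == ML else last
    a[start], a[m] = a[m], a[start]
    return comparisons
```

Over the six permutations of [0, 1, 2] the counts are two 2s and four 3s, i.e. 16 comparisons in total (average 16/6 = 8/3), and afterwards a[0] always holds the median 1.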
You can write up all the permutations:
1 0 2
1 2 0
0 1 2
2 1 0
0 2 1
2 0 1
Then we want to find the position of the 1. We could do this with two comparisons, if our first comparison could split out a group of equal positions, such as the first two lines.
The issue seems to be that the first two lines are different on any comparison we have available: a<b, a<c, b<c. Hence we have to fully identify the permutation, which requires 3 comparisons in the worst case.
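Given that premise (the permutation has to be identified completely), the standard decision-tree counting argument states the bound in one line: each comparison has two outcomes, so k comparisons can distinguish at most 2^k of the 3! orderings.

```latex
2^{k} \ge 3! = 6 \;\Longrightarrow\; k \ge \lceil \log_2 6 \rceil = 3
```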
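Using the bitwise XOR operator, the median of three integers can be found (max and min still compare internally). Since the max and min are two of a, b and c, they cancel out and only the median remains:
def median(a, b, c):
    m = max(a, b, c)
    n = min(a, b, c)
    ans = m ^ n ^ a ^ b ^ c
    return ans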
