Calculating speedup and efficiency of a parallel program - multithreading

Let's say the shortest time taken by a parallel program is 25 ms with 32 threads, and it takes 400 ms with 1 thread. How can I find the speedup and efficiency of the program?
speedup = 400 / 25 = 16
efficiency = speedup / no. of threads = 16 / 32 = 0.5 = 50 %
Technically, if we calculate the efficiency for 1 thread, it comes out to 100 %.
Even when I calculated the efficiency for other thread counts that took more time, their efficiency was higher than that of the "optimum" run with 32 threads.
So are my calculations correct? And how can I tell which configuration is the most efficient?
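For reference, here is a minimal sketch of these two formulas in Python (the 8-thread timing is a hypothetical value, not a measurement from the question):

# Speedup and efficiency from measured runtimes.
def speedup(t_serial_ms, t_parallel_ms):
    return t_serial_ms / t_parallel_ms

def efficiency(t_serial_ms, t_parallel_ms, threads):
    return speedup(t_serial_ms, t_parallel_ms) / threads

t1 = 400.0  # single-thread time in ms
# (threads, time in ms); the 8-thread figure is hypothetical
for n, t in [(1, 400.0), (8, 80.0), (32, 25.0)]:
    print(f"{n:2d} threads: speedup = {speedup(t1, t):5.2f}, efficiency = {efficiency(t1, t, n):.0%}")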

Related

difference between counting packets and counting the total number of bytes in the packets

I'm reading perfbook. In Chapter 5.2, the book gives some examples of statistical counters. These examples can solve the network packet counting problem.
Quick Quiz 5.2: Network-packet counting problem. Suppose that you need
to collect statistics on the number of networking packets (or total
number of bytes) transmitted and/or received. Packets might be
transmitted or received by any CPU on the system. Suppose further that
this large machine is capable of handling a million packets per
second, and that there is a system-monitoring package that reads out
the count every five seconds. How would you implement this statistical
counter?
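For context, the scheme perfbook builds toward in that chapter, and which the quoted answer below refers to, keeps one counter per thread (or CPU) and sums them on read. A minimal sketch of the idea, in Python rather than the book's C:

# Per-thread statistical counter: each updater increments only its own slot;
# a reader sums all slots, accepting a slightly inconsistent snapshot.
class StatisticalCounter:
    def __init__(self, nthreads):
        self.slots = [0] * nthreads     # one counter per updater thread

    def add(self, thread_id, n=1):
        self.slots[thread_id] += n      # no shared counter, so no contention

    def read(self):
        return sum(self.slots)          # approximate: slots may change while summing

counter = StatisticalCounter(nthreads=4)
counter.add(0, 3)   # thread 0 adds 3, as in the quoted answer below
counter.add(1, 5)   # thread 1 adds 5
print(counter.read())   # 8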
There is one Quick Quiz that asks about the difference between counting packets and counting the total number of bytes in the packets.
I can't understand the answer. After reading it, I still don't know the difference.
In the example in the "To see this" paragraph, if the numbers 3 and 5 are changed to 1, what difference does it make?
Please help me understand it.
Quick Quiz 5.26: What fundamental difference is there between counting
packets and counting the total number of bytes in the packets, given
that the packets vary in size?
Answer: When counting packets, the
counter is only incremented by the value one. On the other hand, when
counting bytes, the counter might be incremented by largish numbers.
Why does this matter? Because in the increment-by-one case, the value
returned will be exact in the sense that the counter must necessarily
have taken on that value at some point in time, even if it is
impossible to say precisely when that point occurred. In contrast,
when counting bytes, two different threads might return values that are
inconsistent with any global ordering of operations.
To see this, suppose that thread 0 adds the value three to its counter,
thread 1 adds the value five to its counter, and threads 2 and 3 sum the
counters. If the system is “weakly ordered” or if the compiler uses
aggressive optimizations, thread 2 might find the sum to be three and
thread 3 might find the sum to be five. The only possible global orders
of the sequence of values of the counter are 0,3,8 and 0,5,8, and
neither order is consistent with the results obtained.
If you missed this one, you are not alone. Michael Scott used this
question to stump Paul E. McKenney during Paul’s Ph.D. defense.
I may be wrong, but I presume the idea behind it is the following. Suppose there are 2 separate processes, each keeping its own counter, and the counters are summed up to get a total value. Now suppose some events occur simultaneously in both processes: for example, a packet of size 10 arrives at the first process at the same time as a packet of size 20 arrives at the second, and after some period of time a packet of size 30 arrives at the first process at the same time as a packet of size 60 arrives at the second. So here is the sequence of events:
             Time point #1   Time point #2
Process 1:        10              30
Process 2:        20              60
Now let's build a list of the possible total counter states after time points #1 and #2, for a weakly ordered system, assuming the previous total value was 0:
Time point#1
0 + 10 (process 1 wins) = 10
0 + 20 (process 2 wins) = 20
0 + 10 + 20 = 30
Time point#2
10 + 30 = 40 (process 1 wins)
10 + 60 = 70 (process 2 wins)
20 + 30 = 50 (process 1 wins)
20 + 60 = 80 (process 2 wins)
30 + 30 = 60 (process 1 wins)
30 + 60 = 90 (process 2 wins)
30 + 30 + 60 = 120
Now, presuming that some time passes between time point #1 and time point #2, let's assess which values reflect a real state of the system. All states after time point #1 can be treated as valid, since there was some precise moment in time when the total received size was 10, 20 or 30 (we ignore the fact that the final value may not be the current one; at least it is a value that was actual at some moment of the system's operation). For the possible states after time point #2 the picture is different: the system has never been in the states 40, 50, 70 or 80, yet we risk getting these values from the second collection.
Now let's take a look at the situation from the number of packets perspective. Our matrix of events is:
             Time point #1   Time point #2
Process 1:         1               1
Process 2:         1               1
The possible total states:
Time point#1
0 + 1 (process 1 wins) = 1
0 + 1 (process 2 wins) = 1
0 + 1 + 1 = 2
Time point#2
1 + 1 (process 1 wins) = 2
1 + 1 (process 2 wins) = 2
2 + 1 (process 1 wins) = 3
2 + 1 (process 2 wins) = 3
2 + 2 = 4
In that case all possible values (1, 2, 3, 4) reflect a state in which the system definitely was at some point in time.
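Here is a small sketch of this enumeration (my own illustration of the argument, not code from perfbook). It models a reader that looks at each process's counter once, at an arbitrary point, so for each process it sees some prefix of that process's increments:

from itertools import product

def observable_sums(per_process_increments):
    """Sums a reader might report: for each process it observes some prefix of
    that process's increments, and the prefixes need not be consistent in time."""
    prefix_values = []
    for incs in per_process_increments:
        vals, running = [0], 0
        for inc in incs:
            running += inc
            vals.append(running)
        prefix_values.append(vals)
    return sorted({sum(combo) for combo in product(*prefix_values)})

# Byte counting from the example: process 1 adds 10 then 30, process 2 adds 20 then 60.
print(observable_sums([[10, 30], [20, 60]]))
# -> [0, 10, 20, 30, 40, 60, 80, 90, 120]; 40 and 80 are values the true running
#    total (0 -> 10 or 20 -> 30 -> 60 or 90 -> 120) never takes on.

# Packet counting: every increment is 1, so every reported sum is a value the
# true running total really had at some instant.
print(observable_sums([[1, 1], [1, 1]]))   # -> [0, 1, 2, 3, 4]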

How to calculate pacing time in LoadRunner

I have to run 100 iterations with 50 users. The total duration of the test is 1 hour. One user can do 2 iterations, and the number of transactions in the script is 6.
How do I calculate the pacing time?
Example:
1000 Users, 10000 Full Iterations per hour
10,000/1,000 = 10 iterations per user per hour
3,600 seconds per hour / 10 iterations per user per hour = one iteration every 360 seconds (six minutes) on average
The random algorithm in LoadRunner is based upon the C rand() function, which is approximately (but not exactly) uniform for large datasets. So I take the average pacing interval from the start of one iteration to the next and then adjust it by plus/minus 20%.
So your 360-second (0:06:00) pacing becomes a range from 288 seconds (0:04:48) to 432 seconds (0:07:12).
You would run these calculations for each business process you want to stage.
For think time, look at your production logs for information on how long users take to go from page X to page X+1. This is easily achievable since each top-level page request carries the REFERER, i.e. the previous page it came from. A comparison of the timestamps, grouped by client IP, can provide the range you need for think times.
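For what it's worth, a quick sketch of that arithmetic in Python (the user and iteration counts are just the example figures above):

# Pacing interval from a target iteration rate, plus the +/-20% randomization
# band suggested above (example figures: 1000 users, 10000 iterations per hour).
users = 1000
iterations_per_hour = 10_000

per_user_per_hour = iterations_per_hour / users   # 10 iterations per user per hour
pacing = 3600 / per_user_per_hour                 # 360 s from one iteration start to the next
low, high = pacing * 0.8, pacing * 1.2            # 288 s .. 432 s

print(f"pacing = {pacing:.0f} s, randomized range = {low:.0f} s .. {high:.0f} s")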
Always apply Little's Law to calculate pacing, think time and the number of VUsers.
From Little's Law: No. of VUsers = Throughput * (Response_Time + Think_Time)
Explanation:
Throughput = total no. of transactions / time in seconds
Pacing = (Response_Time + Think_Time)
From your requirements:
Total no. of iterations is 100 and 1 iteration has 6 transactions, so total no. of transactions = 600
Throughput for 1 minute: 600 / 60 = 10
Throughput for 1 second: 10 / 60 ≈ 0.167
According to the formula: 50 = 0.167 * (Pacing)
Pacing ≈ 300 seconds
To achieve 100 iterations in 1 hour you have to set a pacing of about 300 seconds. Make sure Pacing = Response_Time + Think_Time.
Pacing is the 'inter-iteration' gap, and it is used to control the rate of iterations during the test. If the goal for 1 user is to complete 2 iterations per hour, that results in a pacing of 1800 seconds (Little's Law, mentioned above). Now, as long as the sum of the response times of those 6 transactions plus the think time between them stays below 1800 s, you will be able to achieve the desired rate.
NOTE: an iteration is not equal to a transaction, unless the iteration has just one transaction. Refer to this for a pictorial understanding:
https://theperformanceengineer.com/2013/09/11/loadrunner-how-to-calculate-transaction-per-second-tps/
Pacing is the wait time between iterations, so I agree with #CyberNinja: in your use case the pacing is 1800 s, because that is the maximum duration of one iteration of your script that still achieves your goal of producing 100 iterations with 50 users in an hour.
Pacing is not Response_time + Think_Time!
According to Little's Law:
No. of concurrent users (N) = Throughput (X) * [Response Time (RT) + Think Time (TT) + Pacing (P)]
Here RT + TT is the script execution time (SET), which you can calculate by running the script once and adding up the response times of all transactions plus all the think times.
Assume SET to be 60 seconds.
As per your question:
total iterations in 1 hr = 50 (users) * 2 (iterations per user) = 100
total transactions in 1 hr = 100 (iterations) * 6 (transactions per iteration) = 600
Since SET and Pacing apply per iteration, use the iteration throughput:
X = 100 iterations / 3600 seconds ≈ 0.028 iterations/sec
Now putting all values in Little's Law:
50 = (100 / 3600) * (60 + Pacing)
60 + Pacing = 50 * 3600 / 100 = 1800
Pacing = 1800 - 60 = 1740 secs (approx).
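A small sketch of that calculation (the pacing_seconds helper and the 60-second SET are just illustrations of the formula above, not LoadRunner functionality):

# Pacing from Little's Law: N = X * (SET + P), where X is the iteration
# throughput and SET is the response time + think time of one iteration.
def pacing_seconds(users, iterations_per_hour, set_seconds):
    x = iterations_per_hour / 3600.0   # iterations per second
    cycle = users / x                  # SET + pacing for one user, in seconds
    return cycle - set_seconds

# Figures from the question: 50 users, 100 iterations per hour, SET assumed 60 s.
print(pacing_seconds(users=50, iterations_per_hour=100, set_seconds=60))  # 1740.0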

Constraining the range of decision variables based on other decision variables

I have a regular classroom assignment problem with course sizes and room capacities. The decision variables are binary. The model allows assigning one course to more than one room as long as the total capacity assigned is at least the course size. The constraint I want to add is to make sure that the sizes of the rooms assigned to each course are within a reasonable range (say 20 seats) of each other. How can this be done in a linear way? How can I prevent the model from assigning a course of 60 students to 2 rooms of capacities 10 and 50, and instead make sure their sizes are close together (preferably even equal)?
I'm using Excel with OpenSolver.
Here's some sample data:
Course/Room    324A  321D  124B  328   Course Size   Capacity Assigned   Wasted
Management        0     0     0    1            15                  25       10
Engineering       1     0     0    0            20                  20        0
Science           0     1     1    0            60                  75       15
Room Sizes       20    40    35   25
The objective is to minimize the total space wasted (which is 25 seats in this example).
Introduce variables minseat and maxseat and form the inequalities:
minseat(course) <= seats(room) + (1 - assign(course,room)) * M
maxseat(course) >= seats(room) - (1 - assign(course,room)) * M
maxseat(course) - minseat(course) <= 20
Alternatively put maxseat(course)-minseat(course) in the objective with some cost factor. Choose M judiciously.
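For illustration, here is a sketch of the same big-M idea in Python with PuLP, using the sample data from the question (OpenSolver would carry the same constraints in spreadsheet form; the room-exclusivity constraint is an assumption implied by the sample table):

import pulp

# Sample data from the question.
rooms = {"324A": 20, "321D": 40, "124B": 35, "328": 25}
courses = {"Management": 15, "Engineering": 20, "Science": 60}
M = max(rooms.values())    # big-M, larger than any seat count
MAX_SPREAD = 20            # allowed seat difference between rooms of one course

prob = pulp.LpProblem("classroom_assignment", pulp.LpMinimize)
assign = pulp.LpVariable.dicts("assign", (courses, rooms), cat="Binary")
minseat = pulp.LpVariable.dicts("minseat", courses, lowBound=0)
maxseat = pulp.LpVariable.dicts("maxseat", courses, lowBound=0)

# Objective: minimize wasted seats (assigned capacity minus course sizes).
prob += pulp.lpSum(assign[c][r] * rooms[r] for c in courses for r in rooms) - sum(courses.values())

for c, size in courses.items():
    # Enough assigned capacity for the course.
    prob += pulp.lpSum(assign[c][r] * rooms[r] for r in rooms) >= size
    for r, seats in rooms.items():
        # Big-M constraints: only rooms actually assigned bound minseat/maxseat.
        prob += minseat[c] <= seats + (1 - assign[c][r]) * M
        prob += maxseat[c] >= seats - (1 - assign[c][r]) * M
    # Rooms assigned to the same course must be within MAX_SPREAD seats of each other.
    prob += maxseat[c] - minseat[c] <= MAX_SPREAD

# Each room is used by at most one course (as in the sample table).
for r in rooms:
    prob += pulp.lpSum(assign[c][r] for c in courses) <= 1

prob.solve()
for c in courses:
    print(c, [r for r in rooms if assign[c][r].value() == 1])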

Find the minimum number of tanks to hold the maximum quantity of wine, with each tank filled as close to its capacity as possible

My business is in wine reselling, and we have a problem I've been trying to solve. We have 50 - 70 types of wine to be stored at any time, and around 500 tanks of various capacities. Each tank can only hold 1 type of wine. My job is to determine the minimum number of tanks to hold the maximum number of types of wine, each tank filled as close to its maximum capacity as possible, i.e. 100 l of wine should not be stored in a 200 l tank if two tanks of 60 l and 40 l also exist.
I've been doing the job by hand in Excel and want to automate the process, but macros and array formulas quickly get out of hand. I can write a simple program in C and Swift, but I'm stuck at finding a general algorithm. Any pointer on where I can start is much appreciated. A full solution and I will send you a bottle ;)
Edit: for clarification, I do know how many types of wine I have and their total quantities, e.g. Pinot at 700 l, Merlot at 2000 l, etc. These change every week. The tanks, however, come in many different capacities (40, 60, 80, 100, 200 liters, etc.) and change at irregular intervals, since they have to be taken out for cleaning and replaced. Simply using 70 tanks to hold 70 types is not possible.
Also, the total quantity of wine never matches the total tank capacity, and I need to use the minimum number of tanks to hold the maximum amount of wine. In case of insufficient capacity, the amount of wine left over must be as small as possible (it will spoil quickly). If there is leftover wine, the amount left over of each type must be proportional to its quantity.
A simplified example of the problem is this:
Wine:
----------
Merlot 100
Pinot 120
Tocai 230
Chardonay 400
Total: 850L
Tanks:
----------
T1 10
T2 20
T3 60
T4 150
T5 80
T6 80
T7 90
T8 80
T9 50
T10 110
T11 50
T12 50
Total: 830L
The following greedy/DP approach attempts a proportional split: for example, if you have 700 l of Pinot, 2000 l of Merlot and tank capacities of 40, 60, 80, 100 and 200, that is a total capacity of 480 l.
700 / (700 + 2000) = 0.26
2000 / (700 + 2000) = 0.74
0.26 * 480 = 125
0.74 * 480 = 355
So we will attempt to store 125l of the Pinot and 355l of the Merlot, to make the storage proportional to the amounts we have.
Obviously this isn't fully possible, because you cannot mix wines, but we should be able to get close enough.
To store the Pinot, the closest would be to use tanks 1 (40l) and 3 (80l), then use the rest for the Merlot.
This can be implemented as a subset sum problem:
# Subset-sum DP over tank capacities:
# d[s] is True if some subset of the tanks processed so far sums to exactly s.
d = [False] * (sum(tank_capacities) + 1)
d[0] = True
sum_of_tanks = 0
for cap in tank_capacities:
    sum_of_tanks += cap
    # Go downwards so each tank is used at most once.
    for s in range(sum_of_tanks, cap - 1, -1):
        d[s] = d[s] or d[s - cap]
Compute the proportions, then run this for each type of wine you have (removing the tanks already chosen, which you can recover from the d array; I can give more details if you want). Look around d[computed_proportion] to find the closest achievable sum for each wine type.
This should be fast enough for a few hundred tanks, which I'm guessing don't have capacities larger than a few thousand liters.
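A rough end-to-end sketch of this approach in Python (my own illustration of the answer above, not production code; names such as closest_subset and allocate_tanks are made up). It computes each wine's proportional target, then runs the subset-sum DP with parent tracking over the tanks still available and takes the achievable sum closest to (but not above) the target:

def closest_subset(capacities, target):
    """Subset-sum DP with parent tracking: returns (best_sum, chosen_indices)
    for the achievable sum closest to, but not above, target."""
    target = min(target, sum(capacities))
    # parent[s] = (previous sum, tank index) that first made sum s reachable.
    parent = {0: None}
    for i, cap in enumerate(capacities):
        for s in sorted(parent.keys(), reverse=True):   # snapshot: each tank used once
            if s + cap not in parent:
                parent[s + cap] = (s, i)
    best = max((s for s in parent if s <= target), default=0)
    chosen, s = [], best
    while parent[s] is not None:
        s, i = parent[s]
        chosen.append(i)
    return best, chosen

def allocate_tanks(wines, tanks):
    """Proportionally split the total tank capacity among the wines, largest first."""
    total_wine = sum(wines.values())
    total_cap = sum(tanks)
    remaining = list(enumerate(tanks))                  # (original tank index, capacity)
    plan = {}
    for name, qty in sorted(wines.items(), key=lambda kv: -kv[1]):
        target = min(qty, round(qty / total_wine * total_cap))
        caps = [cap for _, cap in remaining]
        best, chosen = closest_subset(caps, target)
        plan[name] = [remaining[i][0] for i in chosen]  # 0-based indices into tanks
        chosen_set = set(chosen)
        remaining = [t for i, t in enumerate(remaining) if i not in chosen_set]
    return plan

wines = {"Merlot": 100, "Pinot": 120, "Tocai": 230, "Chardonay": 400}
tanks = [10, 20, 60, 150, 80, 80, 90, 80, 50, 110, 50, 50]
print(allocate_tanks(wines, tanks))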

Theoretical question about multithreading (scaling)

I need to answer the following question:
A server needs to do 15 ms of work per request for a file. If the file is not in the cache, the hard disk must be accessed and the thread sleeps for 75 ms. This happens in 1/3 of the cases.
a) How many requests can the server process per second with 1 thread?
-> 15 ms + 1/3 * 75 ms = 40 ms per request -> 1000 ms / 40 ms = 25 requests per second
b) How many with multiple threads?
Is there a formula for this?
For 2 threads I got 40.625 requests per second:
25 ms pause on average -> 25/40 = 0.625 -> 25 * 1.625 = 40.625 requests per second
What about 3 or more threads?
I know I'm doing your homework, but it is interesting because the problem statement is flawed. It can't be answered as-is because an important piece of info is missing: the number of cores the machine has available. Running more threads than you've got cores doesn't improve throughput. Assuming J jobs, T threads and C cores, the amount of time spent on them is
time = J x 15 msec / min(T, C) + J x 75 msec / 3
Solving for J per second:
rate = 1000 / (15 / min(T, C) + 25)
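A tiny sketch of that formula (the core count is a parameter you would have to supply; the thread counts below are just examples):

# Requests per second from the formula above:
# rate = 1000 / (15 / min(T, C) + 25), with T threads and C cores.
def rate(threads, cores):
    return 1000 / (15 / min(threads, cores) + 25)

for t in (1, 2, 3, 4, 8):
    print(f"{t} thread(s) on 4 cores: {rate(t, cores=4):.1f} requests/sec")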
