Finding requests per second for distributed system - a textbook query - multithreading

Found a question in Pradeep K. Sinha's book. (The setup, as worked below: a server takes 20 ms to handle a request when the answer is in its cache, which happens 70% of the time, and 100 ms otherwise, while the thread sleeps waiting on the file server. How many requests per second can it handle?)
From my understanding it is safe to assume that as many threads as needed are available. But how could we compute the time?

Single-threaded:
We want to figure out how many requests per second the system can support. This is represented as n below.
1 second
= 1000 milliseconds
= 0.7n(20) + 0.3n(100)
Since 70% of the requests hit the cache, we represent the time spent handling requests that hit the cache with 0.7n(20). We represent the requests that miss the cache with 0.3n(100). Since the thread goes to sleep when there is a cache miss and it contacts the file server, we don't need to worry about interleaving the handling for the next request with the current one.
Solving for n:
1000
= 0.7n(20) + 0.3n(100)
= 0.7n(20) + 1.5n(20)
= 2.2n(20)
= 44n
=> n = 1000/44 ≈ 22.73.
Therefore, a single thread can handle 22.73 requests per second.
Multi-threaded:
The question does not give much detail about the multi-threaded case, apart from the context-switch cost. The answer depends on several factors:
How many cores does the computer have?
How many threads can exist at once?
When there is a cache miss, how much time does the computer spend servicing the request and how much time does the computer spend sleeping?
I am going to make the following assumptions:
There is 1 core.
There is no bound on how many threads can exist at once.
On a cache miss, the computer spends 20 milliseconds servicing the request (e.g. checking the cache, contacting the file server, and forwarding the response to the client) and 80 milliseconds sleeping.
I can now solve for n:
1000 milliseconds
= 0.7n(20) + 0.3n(20).
On a cache miss, a thread spends 20 milliseconds doing work and 80 milliseconds sleeping. When the thread is sleeping, another thread can run and do useful work. Thus, on a cache miss, the thread only uses the CPU for 20 milliseconds, whereas when the process was single-threaded, the next request was blocked from being serviced for 100 milliseconds.
Solving for n:
1000 milliseconds
= 0.7n(20) + 0.3n(20)
= 1.0n(20)
= 20n
=> n = 1000/20 = 50.
Therefore, a multi-threaded process can handle 50 requests per second given the assumptions above.
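As a sanity check, here is the same arithmetic in a few lines of Python (the hit rate, miss rate, and per-request costs are exactly the assumptions above):

hit_rate, miss_rate = 0.7, 0.3
hit_ms, miss_ms = 20, 100   # service time per request, single-threaded
cpu_ms_on_miss = 20         # multi-threaded case: a miss only occupies the CPU for 20 ms

single = 1000 / (hit_rate * hit_ms + miss_rate * miss_ms)        # ~22.73 req/s
multi = 1000 / (hit_rate * hit_ms + miss_rate * cpu_ms_on_miss)  # 50 req/s
print(round(single, 2), round(multi, 2))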

Related

Handling RabbitMQ heartbeats when CPU is loaded 100% for a long time

I'm using pika 1.1 and graph-tool 3.4 in my Python application. It consumes tasks from RabbitMQ, which are then used to build graphs with graph-tool, and then it runs some calculations.
Some of the calculations, such as betweenness, take a lot of CPU power, which makes CPU usage hit 100% for a long time. Sometimes the RabbitMQ connection drops, which causes the task to start over from the beginning.
Even though the calculations run in a separate process, my guess is that while the CPU is loaded at 100%, pika never gets an opportunity to send a heartbeat to RabbitMQ, which causes the connection to terminate. This doesn't happen every time, which suggests that by chance it does manage to send heartbeats now and then. This is only my guess; I am not sure what else could cause this.
I tried lowering the priority of the calculation process using nice(19), which didn't work. I'm assuming it doesn't affect the processes spawned by graph-tool, which parallelizes work on its own.
Since it's just one line of code, graph.calculate_betweenness(..., I don't have a place to manually send heartbeats or slow the execution down to create a chance for heartbeats.
Can my guess about heartbeats not getting sent because cpu is super busy be correct?
If yes, how can I handle this scenario?
Answering your questions:
Yes, that's basically it.
The solution we use is to create a separate process for the CPU-intensive tasks.
import time
from multiprocessing import Process

import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.exchange_declare(exchange='logs', exchange_type='fanout')

result = channel.queue_declare(queue='', exclusive=True)
queue_name = result.method.queue
channel.queue_bind(exchange='logs', queue=queue_name)

def cpu_intensive_task(ch, method, properties, body):
    def work(body):
        time.sleep(60)  # If I remember well, the default heartbeat is 30 seconds
        print(" [x] %r" % body)
    p = Process(target=work, args=(body,))
    p.start()
    # Important to notice: if you do p.join() you will have the same problem.

channel.basic_consume(
    queue=queue_name, on_message_callback=cpu_intensive_task, auto_ack=True)

channel.start_consuming()
I wonder if this is the best solution to this problem, or whether RabbitMQ is the right tool for CPU-intensive tasks at all. (For really long CPU-intensive tasks, more than 30 minutes, if you send a manual ACK you will also need to deal with the acknowledgement timeout: https://www.rabbitmq.com/consumers.html#acknowledgement-timeout)
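If you do need manual acks, a common pattern (sketched below for pika 1.x; the queue name 'tasks' and the do_cpu_intensive_work function are hypothetical placeholders) is to run the job off the connection's thread and hand the ack back to the I/O loop via add_callback_threadsafe, the thread-safe way to interact with a BlockingConnection:

import functools
import threading

import pika

def ack_message(channel, delivery_tag):
    # Runs on the connection's I/O-loop thread, where channel operations are safe.
    if channel.is_open:
        channel.basic_ack(delivery_tag)

def work(connection, channel, delivery_tag, body):
    do_cpu_intensive_work(body)  # placeholder for the real job
    connection.add_callback_threadsafe(
        functools.partial(ack_message, channel, delivery_tag))

def on_message(channel, method, properties, body, connection):
    threading.Thread(
        target=work,
        args=(connection, channel, method.delivery_tag, body)).start()

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='tasks')  # hypothetical queue name
channel.basic_consume(
    queue='tasks',
    on_message_callback=functools.partial(on_message, connection=connection))
channel.start_consuming()

Note that a pure-Python CPU-bound job in a thread still holds the GIL and can starve the I/O loop, so for work like betweenness you would combine this with the separate-process approach above.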

Node.js performance question for outgoing requests

I remember the default thread pool size for Node is 4 (or based on CPU count). That brings up my question.
For a very basic, simplified case: I'm writing a service1 in Node which sends requests to service2, waits until it finishes the computation, and then continues. service2, on another server, can handle 1000 requests at the same time; it takes a while, and it's a blocking call (which is out of my control).
If I do it the Java way, I can create 1000 threads from GlassFish, so the first 1000 requests can be processed at the same time. The 1001st may need to wait a little bit.
1000 incoming req -> java server1 -> 1000 threads -> 1000 outgoing req -> server2
But in Node, if the thread pool size is 4 (given it's a 4-core CPU machine), does that mean the Node app will be slower than Java in this case? What happens if I increase the pool size to 1000? Can I increase it to 1000?
1000 incoming req -> node server1 -> ~4 threads -> 1000 outgoing req -> server2
I don't see an easy way for Node. Or I could let Node handle most things and, for the above blocking call, add a small Java server and dispatch outgoing requests to that? Any suggestions?
UPDATE: found this: "We use setTimeout(function(){}, 0); to create asynchronous functions in JavaScript!"
https://medium.com/from-the-scratch/javascript-writing-your-own-non-blocking-asynchronous-functions-60091ceacc79
Guess if I convert the blocking call into an async function it will solve my issue. I hope!
Node hands its I/O tasks off to the operating system, which handles them with its own (generally multi-threaded) machinery. It takes the approach of not waiting for requests to finish (by blocking a thread), because that wastes time sitting idle; Node hands the task off and asks to be poked when it's done. In particular, outgoing network sockets don't go through the libuv thread pool at all (the pool is mainly used for file I/O, DNS, and crypto), so the pool size of 4 does not cap concurrent outgoing HTTP requests. There is a very good related question:
How, in general, does Node.js handle 10,000 concurrent requests?
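To make the event-loop model concrete, here is a minimal sketch of the same idea in Python's asyncio (the 1-second sleep is a stand-in for the slow call to service2):

import asyncio

async def call_service2(i):
    # Stand-in for a slow network call; `await` yields to the event loop,
    # just as Node's non-blocking I/O does.
    await asyncio.sleep(1.0)
    return i

async def main():
    # 1000 in-flight "requests" share one thread: the total time is about
    # 1 second, not 1000 seconds, and no pool of 1000 threads is needed.
    results = await asyncio.gather(*(call_service2(i) for i in range(1000)))
    print(len(results), "responses")

asyncio.run(main())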

How can I improve performance with FutureTasks

The problem seems simple: I have a (huge) number of operations to run, and the main thread can only proceed when all of them have returned their results. I tried in a single thread, and each operation took, let's say, 2 to 10 seconds at most; in the end the whole thing took about 2.5 minutes. Then I tried with FutureTasks, submitting them all to an ExecutorService. All of them were processed at the same time, but each of them took, let's say, 40 to 150 seconds. At the end of the day the full process took about 2.1 minutes.
If I'm right, all the threads were nothing but a way to execute everything at once while sharing the processor's power. What I thought I would get was the processor working hard enough that all the tasks executed at the same time, each taking the same time it takes to execute in a single thread.
Question is: is there a way I can reach this? (Maybe not with FutureTasks, maybe with something else, I don't know.)
Detail: I don't need them to work at exactly the same time; what really matters is the performance.
You might have created way too many threads. As a consequence, the CPU was constantly switching between them, generating noticeable overhead. CPU-bound work cannot finish faster than (total CPU time / number of cores), so beyond roughly one thread per core, extra threads only add switching overhead.
You probably need to limit the number of running threads, and then you can simply submit your tasks to execute concurrently.
Something like:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

ExecutorService es = Executors.newFixedThreadPool(8);  // ~ number of cores
List<Future<?>> futures = new ArrayList<>(runnables.size());
for (Runnable r : runnables) {
    futures.add(es.submit(r));  // keep the Future so we can wait on it
}
// wait until they all finish (get() throws InterruptedException /
// ExecutionException, which the caller must handle or declare):
for (Future<?> f : futures) {
    f.get();
}
// all done
es.shutdown();

Performance when calling fsync on multiple files vs one file

I have multiple threads each accepting requests, doing some processing, storing the result in a commit log, and returning the results. In order to guarantee that at most x seconds worth of data is lost, this commit log needs to be fsync'd every x seconds.
I would like to avoid synchronization between threads, which means they each need to have their own commit log rather than a shared log - is it possible to fsync all these different commit logs regularly in a performant way?
This is on Linux, ext4 (or ext3)
(Note: due to the nature of the code, even during normal processing the threads need to re-read some of their own recent data from the commit log (but never other threads commit log data), so I believe it would be impractical to use a shared log since many threads need to read/write to it)
If you only need flushing to happen every few seconds, do you need to fsync() at all? That is, the OS should do it for you fairly regularly (unless the system is under heavy load and disk I/O is in short supply).
Otherwise, have your threads do something like:
if (high_resolution_time() % n == 0) {
    fsync(fd);
}
Where n is a value that would be, e.g., 3 if high_resolution_time() returned Unix epoch time (which is expressed in seconds). That would make the thread flush the file every 3 seconds.
The problem, of course, is that a thread passing this code section several times within the same clock tick would flush its file multiple times in quick succession, so you want a much higher clock resolution. I don't know what programming language you use, but in C on Linux you could use gettimeofday:
#include <sys/time.h>
#include <unistd.h>

struct timeval tv;
gettimeofday(&tv, NULL);
long long x = (long long)tv.tv_sec * 1000000LL + (long long)tv.tv_usec;
if (x % 3000000 == 0) {  /* fsync every 3 seconds */
    fsync(fd);  /* fd is the commit log's file descriptor */
}
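An alternative the answer doesn't spell out, sketched here in Python with hypothetical file names: dedicate one background thread to fsyncing every per-worker commit log on a fixed interval, so the workers never block on fsync and need no synchronization among themselves:

import os
import threading

stop = threading.Event()

def flusher(fds, interval=3.0):
    # Wake up every `interval` seconds and fsync each per-thread commit log;
    # at most `interval` seconds of acknowledged data can be lost.
    while not stop.wait(interval):
        for fd in fds:
            os.fsync(fd)

fds = [os.open("commit-%d.log" % i, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
       for i in range(4)]  # one log per worker thread (hypothetical)
threading.Thread(target=flusher, args=(fds,), daemon=True).start()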

ColdFusion limit to total number of threads

I've got some code that is trying to create 100 threaded HTTP calls. It seems to be getting capped at about 40.
When I do threadJoin I'm only getting 38-40 sets of results from my HTTP calls, despite the loop running from 1 to 100.
// thread http calls
pages = 100;
for (page = "1"; page <= pages; page++) {
    thread name="req#page#" {
        grabber.setURL('http://site.com/search.htm');
        // request headers
        grabber.addParam(type="url", name="page", value="#page#");
        results = grabber.send().getPrefix();
        arrayAppend(VARIABLES.arrResults, results.fileContent);
    }
}
// rejoin threads
for (page = "2"; page <= pages; page++) {
    threadJoin('req#page#', 10000);
}
Is there a limit to the number of threads that CF can create? Is it to do with Java running in the background? Or can it not handle that many http requests?
Is there a just a much better way for me to do this than threaded HTTP calls?
The result you're seeing is likely because your variables aren't thread safe.
grabber.addParam(type="url",name="page",value="#page#");
That line accesses Variables.Page, which is shared by all of the spawned threads. Because threads start at different times, the value of page is often different from what you expect, and multiple threads can end up with the same value for page.
Instead, if you pass page as an attribute to the thread, then each thread will have its own version of the variable, and you will end up with 100 unique values. (1-100).
Additionally, you're writing to a shared variable as well.
arrayAppend(VARIABLES.arrResults, results.fileContent);
ArrayAppend is not thread safe, and you will be overwriting versions of VARIABLES.arrResults with other versions of itself instead of appending each item.
You want to set the result to a thread variable, and then access that once the joins are complete.
thread name="req#page#" page=Variables.page {
    grabber.setURL('http://site.com/search.htm');
    // request headers
    grabber.addParam(type="url", name="page", value="#Attributes.page#");
    results = grabber.send().getPrefix();
    thread.Result = results.fileContent;
}
And the join:
// rejoin threads
for (page = "1"; page <= pages; page++) {
    threadJoin('req#page#', 10000);
    arrayAppend(VARIABLES.arrResults, CFThread['req#page#'].Result);
}
In the ColdFusion Administrator there's a setting for how many threads will run concurrently; mine defaults to 10. The rest are queued. As Phantom42 mentions, you can raise the number of running CF threads; however, with 100 or more threads you may run into other problems.
In a 32-bit process, the whole process can only use 2 GB of memory. Each thread uses a certain amount of stack memory, which isn't part of the heap. We've had problems running out of memory with high thread counts, as the Java binary + heap + non-heap (PermGen) + (threads × 512 KB) can easily go over the 2 GB limit.
You'd also have to allow enough threads to handle the code above as well as other requests coming into your app, which may bog down the app as a whole.
I would suggest changing your code to create N threads, each of which does more than one request. It's more work, but it breaks the N requests = N threads problem. There are a couple of approaches you can take (see the sketch after this list):
If you think each request will take roughly the same time, you can split up the work and give each thread a portion before you start each one up.
Or each thread picks a URL off a list and processes it, and you then join all N threads. You'd need to put locking around whatever counter you use to track progress, though.
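A language-agnostic sketch of that second approach (written in Python purely for illustration; the URL, page count, and worker count are made up): N worker threads pull pages off a shared queue and append results under a lock:

import queue
import threading
import urllib.request

work = queue.Queue()
for page in range(1, 101):         # the 100 pages from the question
    work.put("http://site.com/search.htm?page=%d" % page)

results = []
results_lock = threading.Lock()    # appends to the shared list must be locked

def worker():
    while True:
        try:
            url = work.get_nowait()
        except queue.Empty:
            return                 # no pages left; this worker is done
        body = urllib.request.urlopen(url).read()
        with results_lock:
            results.append(body)

threads = [threading.Thread(target=worker) for _ in range(10)]  # N = 10, not 100
for t in threads:
    t.start()
for t in threads:
    t.join()                       # the equivalent of threadJoin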
Check your Maximum number of running JRun threads setting in ColdFusion Administrator under the Request Tuning tab. The default is 50.
