Why multithreading doesn't make the execution faster

So I have this experimental code:
import java.util.concurrent.locks.Condition
import java.util.concurrent.locks.ReentrantLock

class WorkLoader : Runnable {
    private val id: Int
    private val listener: Listener?
    private val lock: ReentrantLock
    private val condition: Condition
    private val counter: Counter?
    private var isFinished: Boolean

    constructor(counter: Counter? = null, listener: Listener? = null) {
        id = IdGenerator.getId()
        isFinished = false
        lock = ReentrantLock()
        condition = lock.newCondition()
        this.counter = counter
        this.listener = listener
    }

    interface Listener {
        fun onWorkStarted(id: Int)
        fun onWorkFinished(id: Int, s: String, elapsed: Long)
    }

    override fun run() {
        listener?.onWorkStarted(id)
        val startTime = System.currentTimeMillis()
        // The loop below just loads the CPU with useless work; it does nothing important
        var s = ""
        for (i in 1..10_000_000) {
            counter?.add()
            val c: Char = (i % 95 + 32).toChar()
            s += c
            if (s.length > 200) {
                s = s.substring(1)
            }
        }
        val elapsedTime = System.currentTimeMillis() - startTime
        listener?.onWorkFinished(id, s, elapsedTime)
        // signal completion under the lock so waitTillFinished() can't miss it
        lock.lock()
        isFinished = true
        condition.signal()
        lock.unlock()
    }

    fun waitTillFinished() {
        lock.lock()
        while (!isFinished) {
            condition.await()
        }
        lock.unlock()
    }
}
And the main function that simultaneously runs 6 instances of WorkLoader in 6 separate threads:
fun main(arguments: Array<String>) {
    println("Hello World!")
    val workListener = WorkLoaderListener()
    val workers = ArrayList<WorkLoader>()
    for (i in 1..6) {
        val workLoader = WorkLoader(counter = null, workListener)
        workers.add(workLoader)
        val thread = Thread(workLoader)
        thread.start()
    }
    for (worker in workers) {
        worker.waitTillFinished()
    }
    println("End of main thread")
}
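As an aside, the lock/condition based waitTillFinished() could be replaced with the standard Thread.join(), which blocks until a thread terminates. A minimal sketch of the same main loop (assuming the same WorkLoader and workListener as above):
val threads = ArrayList<Thread>()
for (i in 1..6) {
    val thread = Thread(WorkLoader(counter = null, listener = workListener))
    threads.add(thread)
    thread.start()
}
// join() blocks the main thread until each worker thread has terminated
threads.forEach { it.join() }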
class WorkLoaderListener : WorkLoader.Listener {
    override fun onWorkStarted(id: Int) {
        println("Work started, id:$id ${getFormattedTime()}")
    }

    override fun onWorkFinished(id: Int, s: String, elapsed: Long) {
        println("Work ENDED, id:$id ${getFormattedTime()}, in ${elapsed / 1000} s")
    }
}
It takes 8s to get all 6 threads to finish execution. Here is the output:
Hello World!
Work started, id:1 21:12:26.577
Work started, id:0 21:12:26.577
Work started, id:2 21:12:26.577
Work started, id:4 21:12:26.577
Work started, id:5 21:12:26.577
Work started, id:3 21:12:26.577
Work ENDED, id:2 21:12:35.137, in 8 s
Work ENDED, id:1 21:12:35.137, in 8 s
Work ENDED, id:3 21:12:35.215, in 8 s
Work ENDED, id:0 21:12:35.215, in 8 s
Work ENDED, id:5 21:12:35.215, in 8 s
Work ENDED, id:4 21:12:35.231, in 8 s
End of main thread
However!!! A single instance of WorkLoader running in a separate thread executes in just 1 second, which makes it more efficient to run those threads one by one rather than launching them all simultaneously.
Like this:
for (i in 1..6) {
    val workLoader = WorkLoader(counter = null, workListener)
    workers.add(workLoader)
    val thread = Thread(workLoader)
    thread.start()
    // just one extra line to wait for the termination before starting another workLoader
    workLoader.waitTillFinished() // I understand that the workLoader thread might still be running
    // when this method returns, but it doesn't matter, the thread is about to die anyway
}
output:
Hello World!
Work started, id:0 21:23:33.622
Work ENDED, id:0 21:23:35.411, in 1 s
Work started, id:1 21:23:35.411
Work ENDED, id:1 21:23:36.545, in 1 s
Work started, id:2 21:23:36.545
Work ENDED, id:2 21:23:37.576, in 1 s
Work started, id:3 21:23:37.576
Work ENDED, id:3 21:23:38.647, in 1 s
Work started, id:4 21:23:38.647
Work ENDED, id:4 21:23:39.687, in 1 s
Work started, id:5 21:23:39.687
Work ENDED, id:5 21:23:40.726, in 1 s
End of main thread
So in this case the execution of the whole program finished in about 6 or 7 seconds. I have a 6-core Intel CPU with 12 logical threads, so I was expecting all 6 threads to finish in about 2 seconds at most (when launched all at once). In the first case (all threads at once) the CPU spiked to 100% utilization and stayed there for the entire execution. In the second case (one thread at a time) the CPU spiked to 47% for a brief moment, and the whole execution went slightly faster.
So what's the point of multithreading? Why is this happening? It feels like there is no point in having more than one worker thread, since any additional thread makes all the other threads slower, regardless of how many CPU cores you have at your disposal. And if a single thread is able to use all the cores of the CPU, then why didn't my CPU spike to 100% load in the second case?

@Tenfour04, thank you! Your comment directed me to the right answer. The purpose of multithreading is SAVED!
So, apparently my String manipulations were indeed not multithreading friendly, most likely because s += c allocates a brand-new String object on every iteration, so all the threads end up contending for the allocator, the garbage collector and memory bandwidth instead of doing independent CPU work.
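For comparison, here is a sketch of the original loop rewritten with a StringBuilder, which mutates in place instead of allocating a new String each time through (untested, just to illustrate the idea):
// Same sliding-window workload, but StringBuilder mutates in place,
// so there is no per-iteration String allocation for the threads to contend on
val sb = StringBuilder(201)
for (i in 1..10_000_000) {
    sb.append((i % 95 + 32).toChar())
    if (sb.length > 200) {
        sb.deleteCharAt(0) // drop the oldest char, keeping a 200-char window
    }
}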
So I changed my CPU loading code to this:
val cArr = arrayOfNulls<Char>(200)
for (i in 1..20_000_000) {
    val cValue: Char = (i % 95 + 32).toChar()
    if (cArr[0] == null) {
        cArr[0] = cValue
    } else {
        // insert the new char at the front, shifting the rest right;
        // once the array is full, the last char simply falls off
        var tempC = cValue
        for (ci in cArr.indices) {
            val temp = cArr[ci]
            cArr[ci] = tempC
            if (temp == null) {
                break
            }
            tempC = temp
        }
    }
}
Now this code executes in 3 seconds on 1 thread, as you can see in the output below:
Hello World!
All threads initiated
Work started, id:0 02:16:58.000
Work ENDED, id:0 02:17:01.189, in 3 s
End of main thread
Now 6 threads:
Hello World!
All threads initiated
Work started, id:0 02:18:48.830
Work started, id:2 02:18:48.830
Work started, id:1 02:18:48.830
Work started, id:5 02:18:48.830
Work started, id:4 02:18:48.830
Work started, id:3 02:18:48.830
Work ENDED, id:1 02:18:53.090, in 4 s
Work ENDED, id:0 02:18:53.168, in 4 s
Work ENDED, id:3 02:18:53.230, in 4 s
Work ENDED, id:4 02:18:53.246, in 4 s
Work ENDED, id:2 02:18:53.340, in 4 s
Work ENDED, id:5 02:18:53.340, in 4 s
End of main thread
6 times the amount of work, but it adds just 1 second to the total execution time. And wait, my CPU has 12 logical threads, so...
Hello World!
All threads initiated
Work started, id:1 02:22:43.299
Work started, id:8 02:22:43.299
Work started, id:3 02:22:43.299
Work started, id:5 02:22:43.299
Work started, id:7 02:22:43.299
Work started, id:9 02:22:43.299
Work started, id:11 02:22:43.299
Work started, id:10 02:22:43.299
Work started, id:0 02:22:43.299
Work started, id:2 02:22:43.299
Work started, id:4 02:22:43.299
Work started, id:6 02:22:43.299
Work ENDED, id:4 02:22:50.115, in 6 s
Work ENDED, id:11 02:22:50.132, in 6 s
Work ENDED, id:7 02:22:50.148, in 6 s
Work ENDED, id:1 02:22:50.148, in 6 s
Work ENDED, id:6 02:22:50.148, in 6 s
Work ENDED, id:9 02:22:50.148, in 6 s
Work ENDED, id:10 02:22:50.163, in 6 s
Work ENDED, id:5 02:22:50.163, in 6 s
Work ENDED, id:0 02:22:50.179, in 6 s
Work ENDED, id:8 02:22:50.195, in 6 s
Work ENDED, id:2 02:22:50.195, in 6 s
Work ENDED, id:3 02:22:50.210, in 6 s
End of main thread
Now if I go over 12 threads, which is more than the number of logical threads of my CPU, the time increases considerably and it becomes inefficient. I won't post the output for this one, too many lines, so just believe me.
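A common way to avoid oversubscribing the CPU like this is to size a thread pool to the number of logical processors. A minimal sketch with the standard library (not the exact code I used; WorkLoader as above):
import java.util.concurrent.Executors

// never run more CPU-bound workers concurrently than the hardware can execute
val pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors())
repeat(12) { pool.submit(WorkLoader(counter = null, listener = workListener)) }
pool.shutdown() // stop accepting new tasks; already-submitted work still runs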

Related

Only a portion of multiple non-returning function calls run concurrently with async/await

I have a test case that attempts to run async methods concurrently. My code attempts to start 50 of these, but when I run it, only about 12 print the statement at the beginning of the async method.
What do I need to do to get them all running concurrently?
use futures::executor;
use futures::future;
use async_std;

async fn _spin(i: u32) {
    println!("starting {}", i);
    loop {
        // do work
    }
}

fn main() {
    let mut futures = vec![];
    for i in 0..50 {
        futures.push(_spin(i));
    }
    let handles = futures.into_iter().map(async_std::task::spawn).collect::<Vec<_>>();
    let joined = future::join_all(handles);
    let _results = executor::block_on(joined);
}
Output (note that the ones that do run seem to be picked at random):
starting 0
starting 7
starting 12
starting 2
starting 11
starting 16
starting 10
starting 17
starting 1
starting 15
starting 3
starting 13
If you call a "blocking" function, i.e. a function that does a lot of work without calling await, then you should use spawn_blocking instead of spawn so that the function gets a dedicated thread. Otherwise, how tasks are scheduled depends on the executor. In async-std, the default executor is a thread pool with at most as many threads as there are logical cores. You can however spawn tasks on other executors, including thread pools with a different number of threads:
use futures::executor::ThreadPool;

let pool = ThreadPool::builder()
    .pool_size(2)
    .create()
    .unwrap();
pool.spawn_ok(async { /* do work */ });
/* Wait for tasks to complete */

Running blocking CPU bound tasks on Kotlin coroutines

I have been experimenting with Kotlin and running blocking CPU tasks on Kotlin coroutines. When things are blocking, such as big CPU-intensive computations, we don't really have suspension; rather, we need to launch things on different threads and let them run in parallel.
I managed to get the following code working as expected with async + the Default dispatcher, but I wondered whether it would also work with withContext, and it did not.
fun cpuBlockingTasks() = runBlocking {
    val time = measureTimeMillis {
        val t1 = cpuTask(id = 1, blockTime = 500)
        val t2 = cpuTask(id = 2, blockTime = 2000)
        println("The answer is ${t1 + t2}")
    }
    println("Time taken: $time")
}

suspend fun cpuTask(id: Int, blockTime: Long): Int = withContext(Dispatchers.Default) {
    println("work $id start ${getThreadName()}")
    val res = doSomeCpuIntensiveTask(blockTime)
    println("work $id end ${getThreadName()}")
    res
}

fun doSomeCpuIntensiveTask(time: Long): Int {
    Thread.sleep(time) // to mimic actual thread blocking / CPU work
    return 1
}
This code completes in >2500 ms and runs sequentially on the same thread. I was expecting it to kick off the first coroutine on a thread, immediately return to the caller, and kick off the second on a different thread, but it did not work like that. Does anyone know why that is, and how it can be fixed without launching an async coroutine in the caller function?
This is the output:
work 1 start ForkJoinPool.commonPool-worker-5 #coroutine#1
work 1 end ForkJoinPool.commonPool-worker-5 #coroutine#1
work 2 start ForkJoinPool.commonPool-worker-5 #coroutine#1
work 2 end ForkJoinPool.commonPool-worker-5 #coroutine#1
The answer is 2
Time taken: 2523
You are not creating a new coroutine in cpuTask 1 and cpuTask 2. You are just switching context. It can be easily fixed with async:
fun cpuBlockingTasks() = runBlocking {
    val time = measureTimeMillis {
        val t1 = async { cpuTask(id = 1, blockTime = 500) }
        val t2 = async { cpuTask(id = 2, blockTime = 2000) }
        println("The answer is ${t1.await() + t2.await()}")
    }
    println("Time taken: $time") // Time taken: 2026
}
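To make the difference concrete, here is a minimal self-contained sketch (timings approximate, not from the original post): withContext suspends the caller until its block finishes, so two calls in a row run sequentially, while async returns a Deferred immediately, so the two blocks can run in parallel:
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

fun main() = runBlocking {
    // sequential: each withContext suspends until its block completes (~1000 ms total)
    val sequential = measureTimeMillis {
        withContext(Dispatchers.Default) { Thread.sleep(500) }
        withContext(Dispatchers.Default) { Thread.sleep(500) }
    }
    // parallel: async starts child coroutines immediately (~500 ms total)
    val parallel = measureTimeMillis {
        val a = async(Dispatchers.Default) { Thread.sleep(500) }
        val b = async(Dispatchers.Default) { Thread.sleep(500) }
        a.await()
        b.await()
    }
    println("sequential: $sequential ms, parallel: $parallel ms")
}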

CPU-bound process blocks worker pool while using Child Process in NestJS HTTP server

Node version: v10.13.0
I'm trying a very simple test on NodeJS request concurrency involving heavy CPU-calculation. I understand NodeJS is not the best tool for CPU-bound processes, and that a child process should not be spawned systematically, but this code is for the sake of testing how the child process works. Also this is written in TypeScript, using NestJS.
src/app.controller.ts
import { Get, Param, Controller } from '@nestjs/common';
import fork = require('child_process');

@Controller()
export class AppController {
  @Get()
  async root(): Promise<string> {
    let promise = new Promise<string>(
      (resolve, reject) => {
        // spawn new child process
        const process = fork.fork('./src/cpu-intensive.ts');
        process.on('message', (message) => {
          // when the process finishes, resolve
          resolve(message.result);
        });
        process.send({});
      }
    );
    return await promise;
  }
}
src/cpu-intensive.ts
process.on('message', async (message) => {
  // simulates a 10s-long process
  let now = new Date().getTime();
  let waittime = 10000; // 10 seconds
  while (new Date().getTime() < now + waittime) { /* do nothing */ }
  // send response to master process
  process.send({ result: 'Process ended' });
});
Such a long process, if executed without spawning new child processes, leads to this timeline of results with 5 concurrent requests (noted #1 to #5). Each process blocks the event loop, so each request has to wait for the previous ones to complete before being answered.
Time 0 10 20 30 40 50
#1 +----+
#2 +----+----+
#3 +----+----+----+
#4 +----+----+----+----+
#5 +----+----+----+----+----+
While spawning new child processes, I was expecting each process would be handled concurrently by a different logical core on my CPU (mine has 8 logical cores), leading to this predicted timeline:
Time 0 10 20 30 40 50
#1 +----+
#2 +----+
#3 +----+
#4 +----+
#5 +----+
However, I observe this strange result on every test:
Time 0 10 20 30 40 50
#1 +----+
#2 +----+----+
#3 +----+----+----+
#4 +----+----+----++
#5 +----+----+----+-+
The first 3 requests act as if the worker pool was starved, though I'd assume that 3 different pools would have been created. The last 2 requests are very confusing, as they act as if they were running concurrently with request #3.
I'm currently looking for an explanation for:
why the first 3 requests don't act as if running concurrently
why the last 3 requests act as if running concurrently
Please note that if I add another 'fast' method as follows:
@Get('fast')
async fast(): Promise<string> {
  return 'Fast process ended.';
}
this method is not impacted by the CPU-intensive processes running concurrently, and it always replies instantly.
I performed the test case on my machine and it's working fine; can you check it on your machine?
Node version: v8.11.2, OS: macOS High Sierra 10.13.4, 8 cores
child-process-test.js
const child_process = require('child_process');

for (let i = 0; i < 8; i++) {
  console.log('Start Child Process:', i, (new Date()));
  let worker_process = child_process.fork("cpu-intensive-child.js", [i]);
  worker_process.on('close', function (code) {
    console.log('End Child Process:', i, (new Date()), code);
  });
}
cpu-intensive-child.js
const fs = require('fs');

// simulates a 10s-long process
let now = new Date().getTime();
let waittime = 10000; // 10 seconds
while (new Date().getTime() < now + waittime) { /* do nothing */ }
// send response to master process
// process.send({ result: 'Process ended' });
Output
You can see in the output that the difference is only 10 seconds across all the processes. You can run this test case on your machine and let me know; maybe it helps.

How do I kill linux spawnProcess when the main process suddenly dies?

I have come across a problem with my application and spawnProcess.
If the main application dies or is killed for some reason, the spawned processes seem to live on, and I can't reach them unless I kill them in a terminal via their PIDs. My goal is that if the main application dies, the spawned processes should somehow be killed too.
My code is like this
auto appPid = spawnProcess("path/to/process");
scope(exit) {
    auto exitcode = wait(appPid);
    stderr.writeln(...);
}
And if I use the same approach for when the main process dies, using wait(thisProcessID), I get an error: "No overload matches". Any ideas how to solve this problem?
Here's some code that will do it on Linux. It doesn't have all the features of the stdlib's spawnProcess, it just shows the bare basics, but expanding it from here isn't hard if you need more.
import core.sys.posix.unistd;

version(linux) {
    // this function is Linux-specific
    import core.stdc.config;
    import core.sys.posix.signal;

    // we can tell the kernel to send our child process a signal
    // when the parent dies...
    extern(C) int prctl(int, c_ulong, c_ulong, c_ulong, c_ulong);

    // the constant I pulled out of the C headers
    enum PR_SET_PDEATHSIG = 1;
}

pid_t mySpawnProcess(string process) {
    if(auto pid = fork()) {
        // this branch is the parent, it can return the child pid
        // you can:
        //   import core.sys.posix.sys.wait;
        //   waitpid(this_ret_value, &status, 0);
        // if you want the parent to wait for the child to die
        return pid;
    } else {
        // child
        // first, tell it to terminate when the parent dies
        prctl(PR_SET_PDEATHSIG, SIGTERM, 0, 0, 0);

        // then, exec our process
        char*[2] args;
        char[255] buffer;
        // gotta copy the string into another buffer
        // so we zero terminate it and have a C style char**...
        buffer[0 .. process.length] = process[];
        buffer[process.length] = 0;
        args[0] = buffer.ptr;

        // then call exec to run the new program
        execve(args[0], args.ptr, null);
        assert(0); // never reached
    }
}

void main() {
    mySpawnProcess("/usr/bin/cat");
    // parent process sleeps for one second, then exits
    usleep(1_000_000);
}
So the lower-level functions need to be used, but Linux does have a function that does what you need.
Of course, since it sends a signal, your child might want to handle that to close more gracefully than the default termination. But try this program and run ps while it sleeps to see cat running, then notice that cat dies when the parent exits.

Delete pending task in SimGrid

I have a process worker which launches an executor. The executor is a process which creates a 10-second task and executes it. But after 2 seconds the worker kills the executor process. SimGrid gives me this log after killing the executor:
[ 2.000000] (0:maestro#) dp_objs: 1 pending task?
How should I properly destroy the task and its task_data when another process kills the currently running process?
int worker(int argc, char *argv[])
{
    msg_process_t x = MSG_process_create("", executor, NULL, MSG_host_self());
    MSG_process_sleep(2);
    MSG_process_kill(x);
    return 0;
}

int executor()
{
    MSG_process_on_exit(my_onexit, NULL);
    task = MSG_task_create("", 1e10, 10, NULL);
    MSG_task_execute(task);
    return 0;
}

int my_onexit()
{
    MSG_task_cancel(task);
    XBT_INFO("Exiting now (done sleeping or got killed).");
    return 0;
}
UPD:
I declared a global variable msg_task_t task. Now when I run the code I get:
[ 2.000000] (0:maestro#) Oops ! Deadlock or code not perfectly clean.
[ 2.000000] (0:maestro#) 1 processes are still running, waiting for something.
[ 2.000000] (0:maestro#) Legend of the following listing: "Process <pid> (<name>#<host>): <status>"
[ 2.000000] (0:maestro#) Process 2 (#Worker2)
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
I expected SimGrid to show the XBT_INFO message, but it didn't; it aborted with SIGABRT instead.
You should MSG_task_cancel() the task that you want to "kill". You could do that in a function that is registered with the MSG_process_on_exit() callback.
Thinking again about it, the message that you see is not an error message but merely a warning; you can safely ignore it. I am pretty sure that executing tasks are automatically canceled when the process is killed.
So you don't have to do anything to get it working, I'd say. Just ignore that message.
