Parallel execution with data-provider-thread-count and thread-count - multithreading

I have used DataProviders in my tests. I want to execute them in parallel (@DataProvider(parallel = true)).
I set parallel = methods, data-provider-thread-count = 1 and thread-count = 2.
The total number of threads I want running at any given time is 2. I want the DataProviders to pick up the next input whenever there is an idle thread. Currently the DataProvider uses the same thread for one input after another, which is effectively sequential.
If I set data-provider-thread-count = 2 and thread-count = 2, then 2 × 2 = 4 threads run in parallel. This will increase the load when there are 100 DataProvider tests.
Is there a way to stop the DataProvider threads from spawning a separate thread pool, so that DataProvider inputs can be picked up by the regular threads for parallel execution?
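The thread count described in the question (thread-count × data-provider-thread-count) can be reproduced with a small Python model. This is not TestNG code: it assumes, as the question observes, that each parallel test method gets its own private pool for its data rows (nested pools), and compares that with putting every (method, data row) pair into one shared pool, which is the behavior the question asks for. PeakCounter, run_nested and run_shared are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class PeakCounter:
    """Tracks how many simulated test invocations run at the same moment."""
    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def run(self, seconds=0.05):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        time.sleep(seconds)  # stand-in for one data-driven test invocation
        with self._lock:
            self.current -= 1

def run_nested(methods, rows, thread_count, dp_thread_count):
    """Each method owns a private pool for its rows: up to thread_count * dp_thread_count at once."""
    counter = PeakCounter()

    def run_method():
        with ThreadPoolExecutor(max_workers=dp_thread_count) as inner:
            for _ in range(rows):
                inner.submit(counter.run)

    with ThreadPoolExecutor(max_workers=thread_count) as outer:
        for _ in range(methods):
            outer.submit(run_method)
    return counter.peak

def run_shared(methods, rows, thread_count):
    """Every (method, row) pair goes into one pool: never more than thread_count at once."""
    counter = PeakCounter()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        for _ in range(methods * rows):
            pool.submit(counter.run)
    return counter.peak

if __name__ == "__main__":
    print("nested peak:", run_nested(methods=4, rows=4, thread_count=2, dp_thread_count=2))
    print("shared peak:", run_shared(methods=4, rows=4, thread_count=2))
```

In the shared-pool design the total concurrency is bounded by a single number, which is the property the question is after; the nested design multiplies the two settings.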

Related

Are system calls run on the same thread?

When using the multi-threaded approach to solve IO-bound problems in Python, the speedup comes from the GIL being released. Suppose Thread1 takes 10 seconds to read a file; during those 10 seconds it does not need the GIL, so Thread2 can execute code. Thread1 and Thread2 are effectively running in parallel, because Thread1 is doing system calls and can proceed independently of Thread2; however, Thread1 is still executing code.
Now, suppose we have a setup using asyncio or any asynchronous programming code. When we do something such as,
file_content = await ten_second_long_file_read()
While we are awaiting, system calls are made to read the content of the file; when they finish, an event is sent back and execution can continue later. During the time we are awaiting, other code can run.
My confusion comes from the fact that asynchronous programming is primarily single-threaded. With the multi-threaded approach, when T1 is reading from a file it is still executing code; it simply frees the GIL so that work can proceed in parallel with another thread. With asynchronous programming, however, how can the program perform other tasks while it is waiting, and also read the data, all in a single thread? I understand the multi-threaded idea, but not the asynchronous one, because the system calls are still performed in a single thread. With asynchronous programming there is nowhere to release the GIL to, since there is only one thread. Is asyncio secretly using threads?
File handles are independent of the GIL and of threads. The POSIX select() documentation gives a good idea of the separate mechanism the OS provides for waiting on file handles.
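To make the select() idea concrete, here is a minimal sketch using Python's standard selectors module, which wraps select/poll/epoll/kqueue. It uses socket pairs rather than the .txt files, because readiness APIs do not help with regular files (select() and poll() always report them as ready). read_all_ready is a hypothetical helper. A single thread registers several handles and asks the OS which ones are ready, which is roughly what asyncio's selector-based event loop does for sockets.

```python
import selectors
import socket

def read_all_ready(payloads):
    """Read one message from each socket pair using a single thread and the OS selector."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in payloads]
    results = [None] * len(payloads)
    try:
        for index, ((reader, writer), data) in enumerate(zip(pairs, payloads)):
            reader.setblocking(False)
            # The data slot remembers which input this handle belongs to.
            sel.register(reader, selectors.EVENT_READ, data=index)
            writer.sendall(data)  # the kernel buffers it; no thread waits here
        pending = len(pairs)
        while pending:
            # One call asks the OS which registered handles are readable right now.
            for key, _mask in sel.select(timeout=1.0):
                results[key.data] = key.fileobj.recv(1024)
                sel.unregister(key.fileobj)  # finished with this handle
                pending -= 1
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
    return results

print(read_all_ready([b"one", b"two", b"three"]))
```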
To illustrate, I created three files: 1.txt, 2.txt and 3.txt. Each holds two lines; 1.txt is just:
1
one
Obviously, opening the files for reading is fine, but not for writing. To simulate a ten-second read, I held the file handle open for ten seconds: read the first line, wait 10 seconds, then read the second line.
asyncio version
import asyncio
from threading import active_count

do = ['1.txt', '2.txt', '3.txt']

async def ten_second_long_file_read():
    while do:
        doing = do.pop()
        with open(doing, 'r') as f:
            print(f.readline().strip())
            await asyncio.sleep(10)
            print(f"threads {active_count()}")
            print(f.readline().strip())

async def main():
    await asyncio.gather(asyncio.create_task(ten_second_long_file_read()),
                         asyncio.create_task(ten_second_long_file_read()))

asyncio.run(main())
This produces a very predictable output and as expected, one thread only.
3
2
threads 1
three
1
threads 1
two
threads 1
one
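One caveat about the asyncio version: the event loop only switches tasks at await asyncio.sleep(10); the readline() calls themselves are ordinary blocking calls. The standard library's asyncio has no native asynchronous file I/O, so genuinely slow blocking work is commonly pushed onto a worker thread with asyncio.to_thread (Python 3.9+). A minimal sketch, where slow_read is a hypothetical stand-in for a blocking read that reports which thread ran it:

```python
import asyncio
import threading
import time

def slow_read():
    """Hypothetical blocking read; returns the id of the thread that ran it."""
    time.sleep(0.1)  # stands in for a slow, blocking file read
    return threading.get_ident()

async def main():
    loop_thread = threading.get_ident()
    # to_thread runs slow_read in the loop's default thread pool executor,
    # so the event loop thread stays free to run other tasks meanwhile.
    worker_threads = await asyncio.gather(
        asyncio.to_thread(slow_read),
        asyncio.to_thread(slow_read),
    )
    return loop_thread, worker_threads

loop_thread, worker_threads = asyncio.run(main())
print(loop_thread in worker_threads)  # False: the reads ran off the loop thread
```

So asyncio itself stays on one thread, but when blocking work is handed to to_thread (or run_in_executor), that work does run on other threads.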
threading - changes
Remove async, of course, and swap asyncio.sleep(10) for time.sleep(10). The main change is the calling code:
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as e:
    e.submit(ten_second_long_file_read)
    e.submit(ten_second_long_file_read)
Also a fairly predictable output, however you cannot rely on this.
3
2
threads 3
three
threads 3
two
1
threads 2
one
Running the same threaded version in debug mode, the output is a bit random; on one run on my computer it was:
23
threads 3threads 3
twothree
1
threads 2
one
This highlights a difference with threads: the running thread is pre-emptively switched, which creates a whole bundle of complexity under the heading of thread safety. This issue does not exist in asyncio, as there is a single thread.
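The fused lines such as "23" and "threads 3threads 3" in the debug run come from two threads writing output at the same time. A minimal sketch of the usual fix, a threading.Lock around the section that must not be interleaved; log_lock, lines and worker are hypothetical names, and the same idea applies to any shared state such as the do list.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

log_lock = threading.Lock()
lines = []  # shared state touched by every worker

def worker(name, count):
    for i in range(count):
        # Without the lock, another thread could run between the two appends,
        # splitting a "message" the way the debug run split its output.
        with log_lock:
            lines.append(f"{name} start {i}")
            lines.append(f"{name} end {i}")

with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(worker, "a", 1000)
    pool.submit(worker, "b", 1000)

# Every "start" entry is immediately followed by its own "end" entry.
pairs_intact = all(
    lines[k].replace("start", "end") == lines[k + 1] for k in range(0, len(lines), 2)
)
print(len(lines), pairs_intact)  # 4000 True
```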
multi-processing
Similar to the threaded code; however, the if __name__ == '__main__': guard is required, and the process pool executor gives each worker a snapshot of the context.
import concurrent.futures

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as e:
        e.submit(ten_second_long_file_read)
        e.submit(ten_second_long_file_read)

if __name__ == '__main__':  # required for executor
    main()
Two big differences. First, there is no shared understanding of the do list, so everything is done twice: processes don't know what the other process has done. Second, more CPU power is available, but more work is required to manage the load.
Three processes are required for this (the parent plus two workers), so the overhead is large; however, each process has only one thread.
3
3
threads 1
threads 1
three
three
2
2
threads 1
threads 1
two
two
1
1
threads 1
threads 1
one
one
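Because each process works on its own copy of do, one common way to avoid doing everything twice is to hand each worker its item explicitly, for example with executor.map, instead of having every process pop from a module-level list. A minimal sketch; process_file is a hypothetical worker that only returns the name and its process id instead of reading the .txt files.

```python
import concurrent.futures
import os

def process_file(name):
    """Hypothetical worker: handles exactly the one item it is given."""
    return name, os.getpid()

def main(names):
    # map() sends each name to exactly one worker process, so no item is
    # processed twice and no list has to be shared between processes.
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as e:
        return list(e.map(process_file, names))

if __name__ == "__main__":  # still required for the process pool executor
    for name, pid in main(["1.txt", "2.txt", "3.txt"]):
        print(name, "handled by process", pid)
```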

How to execute longest tasks first with TBB

I have 10000 tasks that I am trying to schedule with tbb across N threads. 9900 tasks take O(1) unit time for execution whereas the remaining 100 tasks take O(100)-O(1000) time for execution. I want tbb to schedule these tasks such that the top 100 longest tasks are scheduled first on the threads, so that we maximize efficiency. If some threads finish faster, they can then run the shorter jobs at the end.
I have one (hypothetical) example: Out of 10000 tasks, I have one super long task which takes 1111 units, remaining 9999 tasks all take just 1 unit, and I have 10 threads. I want thread j to run this super long task in 1111 units of time, and the other 9 threads run the remaining 9999 tasks which take 1 unit each so those 9 threads run 9999/9=1111 tasks in 1111 units of time. Which means I am using my threads at 100% efficiency (ignore any overhead).
What I have is a function which does something like this:
bool run( Worker& worker, size_t numTasks, WorkerData& workerData ) {
    xtbbTaskData<Worker,WorkerData> taskData( worker, workerData, numThreads );
    arena.execute( [&]{
        tbb::parallel_for( tbb::blocked_range<size_t>( 0, numTasks ),
                           xtbbTask<Worker,WorkerData>( taskData ) );
    } );
    return true;
}
where I have an xtbb::arena arena created with numThreads. Worker is my worker class with the 10000 tasks, workerData holds the data needed to run each task, and the template class xtbbTaskData takes the worker and workerData and eventually provides the operator() that calls run on each task in the worker.
What syntax should I use to schedule these tasks so that the longest task gets scheduled first? I have come across task priorities, task arenas, enqueue, etc., but I am not finding good examples of how to code this.
Do I need to create multiple arenas? Or multiple workers? Or put the longest tasks at the end of the vector in worker? Or something else?
If someone can point me to examples or templates that are already doing this, that would be great.
A task_group_context represents a group of tasks that can be canceled, or have their priority level set, together. Refer to page 378 of the Pro TBB: C++ Parallel Programming with Threading Building Blocks textbook for examples.
We can also define priority as an attribute of the task arena. Refer to page 494 of the Pro TBB: C++ Parallel Programming with Threading Building Blocks textbook for an example.
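The efficiency argument in the question is the classic longest-processing-time-first (LPT) heuristic. The following is a Python model, not TBB code, of greedy list scheduling: each task, in dispatch order, goes to the currently least-loaded thread, and the makespan is the finishing time of the busiest thread. It shows why order matters for the hypothetical example from the question. It models an idealized dispatcher only; tbb::parallel_for splits its range recursively and does not promise to start iterations in index order, so sorting the vector alone is not guaranteed to reproduce this in TBB.

```python
import heapq

def makespan(durations, num_threads):
    """Greedy list scheduling: each task goes to the least-loaded thread, in the given order."""
    loads = [(0, t) for t in range(num_threads)]  # (current load, thread id)
    heapq.heapify(loads)
    for d in durations:
        load, t = heapq.heappop(loads)
        heapq.heappush(loads, (load + d, t))
    return max(load for load, _ in loads)

# Hypothetical example from the question: one 1111-unit task, 9999 unit tasks, 10 threads.
tasks = [1] * 9999 + [1111]

longest_last = makespan(tasks, 10)                          # long task dispatched last
longest_first = makespan(sorted(tasks, reverse=True), 10)  # LPT order
print(longest_last, longest_first)  # 2110 1111
```

Dispatched last, the long task lands on a thread that already has 999 units of work (2110 in total); dispatched first, the other nine threads absorb the 9999 short tasks in exactly 1111 units, matching the 100% efficiency in the question.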

What is the strategy by thread assignment in Kafka streams?

I am doing more or less the following setup in my code:
// loop over the inTopicName(s) {
    KStream<String, String> stringInput = kBuilder.stream( STRING_SERDE, STRING_SERDE, inTopicName );
    stringInput.filter( streamFilter::passOrFilterMessages ).map( processor_i ).to( outTopicName );
// } end of loop
streams = new KafkaStreams( kBuilder, streamsConfig );
streams.cleanUp();
streams.start();
If, e.g., num.stream.threads > 1, how are the tasks prepared and assigned in the loop distributed across the threads?
I suppose (I am not sure) there is a thread pool and tasks are assigned to threads with some kind of round-robin policy, but this could be done either fully dynamically at runtime or once at the beginning, when the filtering/mapping structure is created.
I am especially interested in the situation where one topic receives compute-intensive tasks and the other does not. Is it possible that the application will starve because all threads are assigned to the time-consuming processor?
Let's play a bit with a scenario: num.stream.threads=2, partitions per topic=4, topics=2 (huge_topic and slim_topic).
The loop in my question runs once, at application startup. Suppose the loop defines 2 topics, and I know that one topic carries heavyweight messages (huge_topic) and the other carries lightweight messages (slim_topic).
Is it possible that both threads from num.stream.threads will be busy only with tasks coming from huge_topic, so that messages from slim_topic have to wait to be processed?
Internally, Kafka Streams creates tasks based on partitions. Going with your loop example, assume you have 3 input topics A, B, C with 2, 4, and 3 partitions respectively. For this, you will get 4 tasks (ie, the max number of partitions over all topics) with the following partition-to-task assignment:
t0: A-0, B-0, C-0
t1: A-1, B-1, C-1
t2:        B-2, C-2
t3:        B-3
Partitions are grouped "by number" and assigned to the corresponding task. This is determined at runtime (ie, after you call KafkaStreams#start()), because before that the number of partitions per topic is unknown.
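The "by number" grouping can be written down as a tiny Python model: partition p of every input topic goes to task p, and the number of tasks is the largest partition count. This only restates the rule described in this answer; it is not the Kafka Streams implementation, and group_partitions is a hypothetical name.

```python
def group_partitions(partition_counts):
    """Model of the grouping above: partition p of every topic goes to task t<p>."""
    num_tasks = max(partition_counts.values())
    tasks = {f"t{i}": [] for i in range(num_tasks)}
    for topic, count in partition_counts.items():
        for p in range(count):
            tasks[f"t{p}"].append(f"{topic}-{p}")
    return tasks

for task, partitions in group_partitions({"A": 2, "B": 4, "C": 3}).items():
    print(task, partitions)
```

For the example topics this prints the same four rows as the table above.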
It is not recommended to change the partition grouping if you don't understand all the internal details of Kafka Streams; you can very easily break things! (This interface is already deprecated and will be removed in the upcoming 3.0 release.)
With regard to threads: tasks limit the number of threads. For our example, this means you can have at most 4 threads (if you have more, the extra threads will be idle, as there is no task left to assign to them). How you distribute those threads is up to you: you can have 4 single-threaded application instances, or one single application instance with 4 threads (or anything in between).
If you have more tasks than threads, tasks will be assigned in a load-balanced way, based on the number of tasks (all tasks are assumed to have the same load).
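A simplified Python model of the two thread rules above: tasks are spread evenly by count, treating every task as equal load, and threads beyond the number of tasks get nothing. This illustrates the rules only; Kafka's real assignor considers more than task counts, and assign_tasks is a hypothetical name.

```python
def assign_tasks(tasks, num_threads):
    """Round-robin model: spreads tasks evenly by count, all tasks treated as equal load."""
    assignment = {f"thread-{i}": [] for i in range(num_threads)}
    for i, task in enumerate(tasks):
        assignment[f"thread-{i % num_threads}"].append(task)
    return assignment

tasks = ["t0", "t1", "t2", "t3"]
print(assign_tasks(tasks, 2))  # {'thread-0': ['t0', 't2'], 'thread-1': ['t1', 't3']}
idle = [t for t, assigned in assign_tasks(tasks, 6).items() if not assigned]
print(idle)  # ['thread-4', 'thread-5']
```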
"If there is e.g. num.stream.threads > 1, how tasks are assigned to the prepared and assigned (in the loop) threads?"
Tasks are assigned to threads using a partition grouper. You can read about it here. AFAIK it's called after a rebalance, so it's not a very dynamic process. That said, I'd argue that starvation is not possible.

Understanding Threads Swift

I sort of understand threads, correct me if I'm wrong.
Is a single thread allocated to a piece of code until that code has completed?
Are the threads prioritised to whichever piece of code is run first?
What is the difference between main queue and thread?
My most important question:
Can threads run at the same time? If so, how can I specify which parts of my code should run on a selected thread?
Let me start this way. Unless you are writing a special kind of application (and you will know if you are), forget about threads. Working with threads is complex and tricky. Use dispatch queues… it's simpler and easier.
Dispatch queues run tasks. Tasks are closures (blocks) or functions. When you need to run a task off the main dispatch queue, you call one of the dispatch_ functions, the primary one being dispatch_async(). When you call dispatch_async(), you need to specify which queue to run the task on. To get a queue, you call dispatch_queue_create() or one of the dispatch_get_ functions, the primary one being dispatch_get_global_queue().
NOTE: Swift 3 changed this from a function model to an object model. The dispatch_ functions became instance methods of DispatchQueue, and the dispatch_get_ functions became class methods/properties of DispatchQueue.
// Swift 3
DispatchQueue.global(qos: .background).async {
    let calculation = arc4random()
}
// Swift 2
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)) {
    let calculation = arc4random()
}
The catch is that any and all tasks which update the UI must run on the main thread. This is usually done by calling dispatch_async() on the main queue (dispatch_get_main_queue()).
// Swift 3
DispatchQueue.global(qos: .background).async {
    let calculation = arc4random()
    DispatchQueue.main.async {
        print("\(calculation)")
    }
}
// Swift 2
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)) {
    let calculation = arc4random()
    dispatch_async(dispatch_get_main_queue()) {
        print("\(calculation)")
    }
}
The gory details are messy. To keep it simple, dispatch queues manage thread pools. It is up to the dispatch queue to create, run, and eventually dispose of threads. The main queue is a special queue which has only 1 thread. The operating system is tasked with assigning threads to a processor and executing the task running on the thread.
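The background-then-main pattern can be modeled outside Swift with Python's concurrent.futures: a multi-threaded pool stands in for a global queue, and a single-worker executor stands in for the main queue, since the main queue is serial and runs everything on one thread. This is an analogy, not GCD; main_queue, background, update_ui and calculate are hypothetical names.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

main_queue = ThreadPoolExecutor(max_workers=1, thread_name_prefix="main")  # serial, one thread
background = ThreadPoolExecutor(max_workers=4)  # a pool of threads, like a global queue

def update_ui(value):
    # Stand-in for UI work, which must always run on the one "main" thread.
    return value, threading.current_thread().name

def calculate():
    value = random.getrandbits(32)  # stand-in for arc4random()
    # Hop back to the serial "main" queue to deliver the result.
    return main_queue.submit(update_ui, value).result()

futures = [background.submit(calculate) for _ in range(5)]
results = [f.result() for f in futures]
ui_threads = {name for _value, name in results}
print(len(ui_threads))  # 1: every UI update ran on the same serial thread

background.shutdown()
main_queue.shutdown()
```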
With all that out of the way, now I will answer your questions.
Is a single thread allocated to a piece of code until that code has completed?
A task will run in a single thread.
Are the threads prioritised to whichever piece of code is run first?
Tasks are assigned to a thread. A task will not change which thread it runs on. If a task needs to run in another thread, then it creates a new task and assigns that new task to the other thread.
What is the difference between main queue and thread?
The main queue is a dispatch queue which has 1 thread. This single thread is also known as the main thread.
Can threads run at the same time?
Threads are assigned to execute on processors by the operating system. If your device has multiple processors (they all do nowadays), then multiple threads are executing at the same time.
If so how can I specify which parts of my code should run at a selected thread?
Break your code into tasks. Dispatch the tasks on a dispatch queue.

Play Framework: thread-pool-executor vs fork-join-executor

Let's say we have the action below in our controller. performLogin will be called on each request, by many users.
def performLogin() = {
  Async {
    // API call to datasource1
    val id = databaseService1.getIdForUser()
    // API call to another data source, different from the one above;
    // this call depends on the id returned by the call above
    val user = databaseService2.getUserGivenId(id)
    // Very CPU-intensive task
    val token = performProcess(user)
    // Very CPU-intensive calculations
    val hash = encrypt(user)
    Future.successful(hash)
  }
}
I roughly know what the fork-join-executor does. Basically, from the main thread that receives a request, it spawns multiple worker threads, which in turn divide the work into a few chunks. Eventually the main thread joins those results and returns from the function.
On the other hand, if I were to choose the thread-pool-executor, my understanding is that a thread is chosen from the thread pool, this selected thread will do the work, then go back to the thread pool to listen to more work to do. So no sub dividing of the task happening here.
In my opinion, parallelism via the fork-join executor is not possible in the code above: each call requires something from the previous step. If I were to choose the fork-join executor, how would that benefit me? How would the execution of the code above differ between the fork-join and thread-pool executors?
This isn't parallel code; everything inside your Async call will run on one thread. In fact, Play! never spawns new threads in response to requests. It's event-based: there is an underlying thread pool that handles whatever work needs to be done.
The executor handles scheduling the work from Akka actors and from most Futures (not those created with Future.successful or Future.failed). In this case, each request will be a separate task that the executor has to schedule onto a thread.
The fork-join-executor replaced the thread-pool-executor because it allows work stealing, which improves efficiency. There is no difference in what can be parallelized with the two executors.
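The claim that everything inside one Async block runs on one thread, with each request being a separate task that the executor schedules, can be illustrated with a small Python model. A thread pool stands in for the executor, and perform_login with its step helper are hypothetical stand-ins for the controller code in the question. Python's standard library has no fork-join-executor counterpart, so this shows only how independent requests are scheduled, not work stealing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def perform_login(user_id):
    """Hypothetical request handler: each step depends on the previous one."""
    seen = []

    def step(value):
        seen.append(threading.get_ident())  # record which thread ran this step
        return value + 1                    # stand-in for a DB call or CPU work

    user_id = step(user_id)   # getIdForUser
    user = step(user_id)      # getUserGivenId
    token = step(user)        # performProcess
    hashed = step(token)      # encrypt
    return hashed, seen

with ThreadPoolExecutor(max_workers=4) as executor:
    # Each request is one task; the executor picks a pool thread for it.
    results = list(executor.map(perform_login, range(20)))

# Within each request, all the dependent steps ran on a single thread.
print(all(len(set(seen)) == 1 for _hashed, seen in results))  # True
```

Whichever executor is configured, a chain of dependent steps like this stays sequential; the executor choice only affects how the many independent requests are spread over threads.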
