ThreadPoolExecutor with Observables

Hi, I am currently using Schedulers.io() with Observables for my API, which makes network calls. I am concerned that in production it might create a lot of threads if there is a huge volume of requests. I am expecting about 500k to 700k requests per day. Is Schedulers.io() a good candidate in this scenario,
or should I create a custom ThreadPoolExecutor and use it as Schedulers.from(myExecutor)?
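//Sample (corePoolSize, maxPoolSize and poolKeepAliveInMillis are defined elsewhere)
ThreadPoolExecutor myExecutor = new ThreadPoolExecutor(corePoolSize,
        maxPoolSize, poolKeepAliveInMillis, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(corePoolSize));
Scheduler myScheduler = Schedulers.from(myExecutor);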
All the examples that I saw online used a fixed thread pool like Schedulers.from(Executors.newFixedThreadPool(n)).
Do RxJava Schedulers support ThreadPoolExecutor? Which is the best approach? Please advise.

RxJava can use various kinds of schedulers, for instance Schedulers.from(threadPoolExecutor), which wraps any Executor, including a ThreadPoolExecutor. In one project, there was a strong requirement that threads have a given name, often with a number, and an uncaught-exception handler. It was easy to use Schedulers.from() to repurpose such an executor as a scheduler.
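The following is a minimal sketch of that kind of executor, using only java.util.concurrent. Threads get a numbered name and an uncaught-exception handler, and the queue is bounded. The class name NamedPoolFactory, the 60-second keep-alive and the CallerRunsPolicy choice are illustrative assumptions, not taken from the answer. With RxJava on the classpath, passing the result to Schedulers.from(executor) would turn it into a Scheduler. That call is left out so the sketch compiles without RxJava.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: a bounded pool whose threads are named "<prefix>-<n>"
// and report uncaught exceptions instead of dying silently.
class NamedPoolFactory {
    static ThreadPoolExecutor create(String prefix, int core, int max, int queueCapacity) {
        AtomicInteger counter = new AtomicInteger(1);
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
            t.setUncaughtExceptionHandler((thread, error) ->
                    System.err.println(thread.getName() + " failed: " + error));
            return t;
        };
        return new ThreadPoolExecutor(
                core, max,
                60, TimeUnit.SECONDS,                       // idle non-core threads exit after 60 s
                new ArrayBlockingQueue<>(queueCapacity),     // bounded queue: no unbounded backlog
                factory,
                new ThreadPoolExecutor.CallerRunsPolicy());  // when saturated, the submitting thread runs the task
    }
}
```

ThreadPoolExecutor only grows beyond the core size once the queue is full, so the queue capacity and the maximum pool size interact. The uncaught-exception handler fires for tasks passed to execute(). Tasks passed to submit() capture their exception in the returned Future instead.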
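Also, the Schedulers.io() scheduler recycles threads fairly efficiently: idle worker threads are cached and reused. You can take a thread dump after a lot of transactions to see that the number of threads stays relatively limited. Keep in mind, though, that the pool behind Schedulers.io() is unbounded, so many simultaneous blocking calls can still create many threads.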
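Here is a rough back-of-the-envelope check that puts the question's volume in perspective. 700k requests per day averages about 8 requests per second. By Little's law, the average number of in-flight calls is the arrival rate times the latency, and for blocking calls that is also the number of busy threads. The 250 ms latency and the 10x peak factor below are assumptions for illustration, not figures from the question.

```java
// Back-of-the-envelope sizing with Little's law: in-flight = arrival rate * time in system.
// The latency (0.25 s) and the peak factor (10x) are illustrative assumptions.
class CapacityEstimate {
    static double averageRatePerSecond(long requestsPerDay) {
        return requestsPerDay / 86_400.0;   // seconds in a day
    }

    static double concurrentCalls(double ratePerSecond, double latencySeconds) {
        return ratePerSecond * latencySeconds;
    }

    public static void main(String[] args) {
        double avg = averageRatePerSecond(700_000);   // ~8.1 requests per second
        double peak = avg * 10;                       // assumed burst of 10x the daily average
        System.out.printf("average %.1f req/s, in flight ~%.1f%n", avg, concurrentCalls(avg, 0.25));
        System.out.printf("peak %.1f req/s, in flight ~%.1f%n", peak, concurrentCalls(peak, 0.25));
    }
}
```

Even under generous assumptions, this comes to a few dozen concurrent blocking calls, which is a modest thread count. The real risk with an unbounded scheduler comes from bursts, or from slow downstream services that stretch latency and therefore the number of in-flight calls.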

Related

Spawning 10,000 threads doesn't seem like the right approach; alternative ideas?

I am trying to simulate a decentralized system, but I am having trouble simulating it with real-life parameters.
Real-world:
Each module has its own computer and they communicate over a network
There can be hundreds of thousands of modules
They will communicate with each other to perform a certain action
Simulation:
Each module is its own thread, since the modules work asynchronously
Can't really spawn more than 1,000 threads
The thread-to-module ratio is 1:1
Is spawning a thread per module the right approach? In theory, this seems like the right approach, but in practice it hits limitations at around 1,000 threads.

Your scenario is a perfect match for the actor model:
https://en.wikipedia.org/wiki/Actor_model
It is impossible to explain fully in one answer, so start from the wiki link and look for an implementation in the language you are using. It does what you need: you can simulate millions of "isolated states" and manage the concurrency of their mutations using very few resources (you should be able to run 1K actors on very few threads, possibly even 2).
Also, nowadays a lot of languages offer (in their own flavour) some form of lightweight threads that can be used to reduce the number of real threads used (goroutines, Kotlin coroutines, Java fibers, etc.).
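The following is a minimal sketch of the actor idea in plain Java, not any particular actor library. Each actor has private state and a mailbox. At most one "drain" task per actor runs at a time on a shared executor, so 10,000 actors can share 2 threads. The names Actor and ActorDemo and the message-counting behaviour are hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical minimal actor: a mailbox processed one message at a time on a shared executor.
class Actor<M> {
    private final Queue<M> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final ExecutorService executor;
    private final Consumer<M> behavior;

    Actor(ExecutorService executor, Consumer<M> behavior) {
        this.executor = executor;
        this.behavior = behavior;
    }

    void send(M message) {
        mailbox.add(message);
        trySchedule();
    }

    private void trySchedule() {
        // Only one drain task per actor at a time, so its state is never mutated concurrently.
        if (scheduled.compareAndSet(false, true)) {
            executor.execute(this::drain);
        }
    }

    private void drain() {
        M message;
        while ((message = mailbox.poll()) != null) {
            behavior.accept(message);
        }
        scheduled.set(false);
        // A message may have arrived after the last poll but before the flag was cleared.
        if (!mailbox.isEmpty()) {
            trySchedule();
        }
    }
}

class ActorDemo {
    // Hypothetical driver: `actorCount` actors share `threads` threads; each gets `perActor` messages.
    // Returns the total number of messages processed.
    static long run(int actorCount, int perActor, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(actorCount * perActor);
        long[] counts = new long[actorCount];   // each slot is only touched by its own actor
        List<Actor<Integer>> actors = new ArrayList<>();
        for (int i = 0; i < actorCount; i++) {
            final int id = i;
            actors.add(new Actor<Integer>(pool, msg -> { counts[id]++; done.countDown(); }));
        }
        for (int round = 0; round < perActor; round++) {
            for (Actor<Integer> actor : actors) {
                actor.send(round);
            }
        }
        try {
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        long total = 0;
        for (long c : counts) total += c;
        return total;
    }
}
```

In a real simulation, each module would be an actor whose behaviour updates its own state and sends messages to other modules' actors. The number of threads then stays fixed no matter how many modules exist.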

What is so great about TPL

I have done this POC and verified that when you create 4 threads and run them on a quad-core machine, all cores get busy. So the CLR is already scheduling threads on different cores effectively; why, then, the Task class?
I agree Task simplifies the creation and use of threads, but apart from that, what? It's just a wrapper around threads and thread pools, right? Or does it in some way help schedule threads on multicore machines?
I am specifically looking at what Task offers with respect to multicore that wasn't there with threads in .NET 2.0.

"I agree Task simplifies the creation and use of threads"
Isn't that enough? Isn't it fabulous that it provides higher-level building blocks so that we mere mortals can build lock-free multithreaded code which is safe because really smart people like Joe Duffy have done the work for us?
If TPL really just consisted of a way of starting a new task, it wouldn't be much use. The work-stealing etc. is nice, but probably not crucial to most of us. It's the building blocks around tasks, and in particular around the idea of a "future", which provide the value. Do you really want to write Parallel.ForEach yourself? Do you want to work out how to perform partitioning efficiently yourself? I know that if I tried doing that, it would take me a long time and I'd certainly do a worse job of it than the PFX team.
Many of the advances in development haven't been about making it possible to do something which was impossible before - they've been about raising the abstraction level so that a problem can be solved once and then that solution reused. Do you feel the same way about the CLR itself? You could do the same thing in assembly yourself, obviously... but by raising the abstraction level, the CLR and C# make us more productive.
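The "future" building block is not specific to .NET. As a rough Java analogue (not the TPL itself), CompletableFuture lets you start work, combine results and attach continuations without managing threads, locks or joins yourself. The fetchPrice and fetchQuantity methods are hypothetical stand-ins for real work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FutureDemo {
    // Hypothetical stand-ins for slow work (e.g. a service call or a computation).
    static int fetchPrice() { return 42; }
    static int fetchQuantity() { return 3; }

    static int totalCost() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Integer> price = CompletableFuture.supplyAsync(FutureDemo::fetchPrice, pool);
            CompletableFuture<Integer> quantity = CompletableFuture.supplyAsync(FutureDemo::fetchQuantity, pool);
            // Combine both futures once they complete, then attach a further continuation.
            return price.thenCombine(quantity, (p, q) -> p * q)
                        .thenApply(total -> total + 1)   // hypothetical fixed fee
                        .join();
        } finally {
            pool.shutdown();
        }
    }
}
```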

Although you could do everything equivalently with the TPL or the ThreadPool, the TPL is preferred over the ThreadPool for its better abstraction and scalability patterns. But it is up to the programmer: if you know exactly what you are doing, and depending on how your scheduling and synchronization requirements play out in your specific application, you could use the ThreadPool more effectively. There are some things you get for free with the TPL that you have to code up yourself when using the ThreadPool, like the following few I can think of now:
work stealing
worker-thread-local work queues
scheduling groups of actions like Parallel.For
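Java's ForkJoinPool also provides work stealing with per-worker local queues, so a recursive sum can illustrate the pattern. This is a Java analogue, not the TPL implementation. Each worker pushes forked subtasks onto its own deque, and idle workers steal from the others. The 10,000-element threshold is an arbitrary assumption.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer sum over an array range, run on a work-stealing ForkJoinPool.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;   // below this size, just loop sequentially
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        left.fork();                                        // pushed onto this worker's deque; may be stolen
        long right = new SumTask(data, mid, to).compute();  // keep working on the other half
        return right + left.join();
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```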

The TPL lets you program in terms of tasks, not threads. Thinking of tasks solely as a better thread abstraction would be a mistake. Tasks allow you to specify the work you want executed, not how you want the work executed (threads). This allows you to express the potential parallelism of your application and have the TPL runtime (the scheduler and thread pool) handle how that work gets executed. This means the TPL takes much of the burden of ensuring the best performance on a wide variety of hardware, with different numbers of cores, off your application.
For example, the TPL makes it easy to implement several key design patterns that allow you to express the potential parallelism of your application.
http://msdn.microsoft.com/en-us/library/ff963553.aspx
These include futures (mentioned by Jon), as well as pipelines and parallel loops.

Thread pooling and multi core systems

Do you think the thread-pooling design pattern is the way to go for the multi-core future?
A thread-pooling library, for instance, if used extensively, makes/forces the application writer
(1) to break the problem into separate parallel jobs, hence promoting (enforcing :) ) parallelism
(2) to work at a level abstracted from all the low-level OS calls, synchronization, etc., which makes the programmer's life easier. (Especially for C programmers :) )
I strongly believe it's the best way (or one of the "best" ways :) ) for the multi-core future...
So, my question is: am I right in thinking so, or am I under some delusion? :)
Regards,
Microkernel

Thread pooling is a technique that involves a queue and a number of threads that take jobs from the queue and process them. This is in contrast to the technique of starting a new thread whenever a new task arrives.
The benefits are that the maximum number of threads is limited, to avoid too much threading, and that there is less overhead for each new task (the thread is already running and just picks up the task; no thread start-up is needed).
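Below is a minimal Java sketch of that mechanism: one shared queue and a fixed set of long-lived workers. SimplePool is a hypothetical illustration. In practice you would use java.util.concurrent's ExecutorService rather than hand-rolling this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical minimal pool: N worker threads repeatedly take jobs from one shared queue.
class SimplePool {
    private static final Runnable POISON = () -> { };   // sentinel telling a worker to exit
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();

    SimplePool(int threads) {
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Runnable job = queue.take();   // blocks while the queue is empty
                        if (job == POISON) return;
                        try {
                            job.run();                 // the thread is reused; no per-job thread start
                        } catch (RuntimeException e) {
                            System.err.println("job failed: " + e);   // keep the worker alive
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
    }

    void submit(Runnable job) {
        queue.add(job);
    }

    // Lets already-queued jobs finish, then stops every worker.
    void shutdownAndWait() {
        for (int i = 0; i < workers.size(); i++) queue.add(POISON);
        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```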
Whether this is a good design depends heavily on your problem. If you have many short jobs that arrive at your program at a very fast rate, this is a good idea, because the lower overhead is a real benefit. If you have extremely large numbers of concurrent tasks, it is a good idea because it keeps your scheduler from having to do too much work.
There are many areas where thread pooling is just not helpful, so you cannot generalize. Sometimes multithreading is not possible at all. Or it is not even desired, as multithreading adds an unpredictable element (race conditions) to your code, which is extremely hard to debug.
A thread-pooling library can hardly "force" you to use it. You still need to think things through, and if you just start one thread... it won't help.

As with almost every computing topic, the answer is: it depends.
The pooling system is fine for embarrassingly parallel problems: http://en.wikipedia.org/wiki/Embarrassingly_parallel
For other tasks, where you need more synchronization between threads, it's not as good.

On Windows NT, thread pools are usually much less efficient than I/O Completion Ports (IOCPs). These are covered extensively in numerous questions and answers here. IOCPs enable event-driven processing: multiple threads can wait on the IOCP until an I/O completion (IOC), i.e. a finished read or write on a socket or handle, is queued to it. The IOCP then pairs a waiting thread with the completion event and releases the thread for processing. After the thread has processed the event and initiated a new I/O, it returns to the IOCP to wait for the next event (which may or may not be the completion of the I/O it just initiated).
Completions may also be signalled artificially, by posting one explicitly (PostQueuedCompletionStatus) without any underlying I/O.
Using IOCPs is not polling. The optimal IOCP implementation has as many threads waiting on the IOCP as there are cores in the system. The threads may all execute the same physical code if that is deemed efficient. Since a thread runs from an I/O completion until it issues its next I/O, it does nothing that forces it to wait for other resources, except perhaps competing for access to thread-safe areas. It is a natural way to move away from the "one handle per thread" paradigm. IOCP-controlled threads are therefore as efficient as the programmer is able to construct them.

I like the answer by @yaankee a lot, except that I would argue a thread pool is almost always the right way to go. The reason: a thread pool can degenerate into a simple static work-partitioning model for problems like matrix-matrix multiply. OpenMP's guided scheduling is somewhat along those lines.
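The following Java sketch illustrates that "degenerate" case; it is not code from the answer. A fixed pool receives exactly one task per thread, and each task owns a contiguous block of rows of the result. That is a static partition of a matrix-matrix multiply. BlockedMatMul is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockedMatMul {
    // C = A * B, with the rows of C split into `threads` contiguous blocks, one task per block.
    static double[][] multiply(double[][] a, double[][] b, int threads) {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> parts = new ArrayList<>();
            int chunk = (n + threads - 1) / threads;   // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int lo = start, hi = Math.min(n, start + chunk);
                parts.add(pool.submit(() -> {
                    // Each task writes only its own rows, so no locking is needed.
                    for (int i = lo; i < hi; i++) {
                        for (int j = 0; j < m; j++) {
                            double sum = 0;
                            for (int x = 0; x < k; x++) sum += a[i][x] * b[x][j];
                            c[i][j] = sum;
                        }
                    }
                }));
            }
            for (Future<?> part : parts) part.get();   // wait for every block
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return c;
    }
}
```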

Check number of idle cores when creating .Net 4.0 Parallel Task

My question might sound a bit naive but I'm pretty new with multi-threaded programming.
I'm writing an application which processes incoming external data. For each data that arrives a new task is created in the following way:
System.Threading.Tasks.Task.Factory.StartNew(() => methodToActivate(data));
The items of data arrive very fast (every second, every half second, etc.), so many tasks are created. Handling each task might take around a minute. When testing it, I saw that the number of threads keeps increasing. How can I limit the number of tasks created, so that the number of actual working threads is stable and efficient? My computer is only dual-core.
Thanks!

One of your issues is that the default scheduler sees tasks that last for a minute and assumes that they are blocked on other tasks that have yet to be executed. To try to unblock things, it schedules more pending tasks, hence the thread growth. There are a couple of things you can do here:
Make your tasks shorter (probably not an option).
Write a scheduler that deals with this scenario and doesn't add more threads.
Use ThreadPool.SetMaxThreads to prevent unbounded thread pool growth.
See the section on Thread Injection here:
http://msdn.microsoft.com/en-us/library/ff963549.aspx

You should look into using the producer/consumer pattern with a BlockingCollection<T> around a ConcurrentQueue<T>, where you set the bounded capacity (passed to the BlockingCollection<T> constructor and exposed through its BoundedCapacity property) to something that makes sense given the characteristics of your workload. You can make the capacity configurable and then tweak it as you run through some profiling sessions to find the sweet spot.
While it's true that the TPL will take care of queueing up the tasks you create, creating too many tasks does not come without penalties. Also, what's the point in producing more work than you can consume? You want to produce enough work that the consumers are never starved, but you don't want to get too far ahead of yourself, because that just wastes resources and potentially steals those very same resources from your consumers.
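A Java analogue of this pattern (not the .NET API): an ArrayBlockingQueue plays the role of a bounded BlockingCollection<T>. put() blocks the producer while the queue is full, so production can never run far ahead of the consumers. The capacity, the consumer count and the per-item work (summing the values) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class BoundedPipeline {
    private static final int END = -1;   // sentinel: no more items

    // Pushes the values 1..items through a queue of `capacity` to `consumers` worker threads.
    // Returns the sum of the processed values; summing stands in for the real per-item work.
    static long run(int items, int capacity, int consumers) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        AtomicLong processed = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            Thread t = new Thread(() -> {
                try {
                    for (int item = queue.take(); item != END; item = queue.take()) {
                        processed.addAndGet(item);   // placeholder for real processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }
        try {
            for (int i = 1; i <= items; i++) {
                queue.put(i);                        // blocks while the queue is full (back-pressure)
            }
            for (int c = 0; c < consumers; c++) queue.put(END);   // one sentinel per consumer
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }
}
```

The fixed consumer count bounds the number of working threads, and the queue capacity bounds how far the producer can get ahead. These are the two knobs the answer suggests tuning by profiling.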

You can create a custom TaskScheduler for the Task Parallel Library and then schedule tasks on it by passing an instance of it to the TaskFactory constructor.
Here's one example of how to do that: Task Scheduler with a maximum degree of parallelism.
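The same idea of a scheduler with a maximum degree of parallelism can be sketched in Java as a wrapper Executor (an analogue only, not the .NET TaskScheduler sample). It forwards work to an underlying executor but never occupies more than maxParallel slots at once, queueing the rest. LimitedConcurrencyExecutor is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical wrapper: runs tasks on `inner` but never more than `maxParallel` at a time.
class LimitedConcurrencyExecutor implements Executor {
    private final Executor inner;
    private final int maxParallel;
    private final Queue<Runnable> pending = new ArrayDeque<>();   // guarded by `this`
    private int running = 0;                                      // guarded by `this`

    LimitedConcurrencyExecutor(Executor inner, int maxParallel) {
        this.inner = inner;
        this.maxParallel = maxParallel;
    }

    @Override
    public synchronized void execute(Runnable task) {
        pending.add(task);
        if (running < maxParallel) {
            running++;
            inner.execute(this::drain);   // start another slot only while under the limit
        }
    }

    // Each drain loop occupies one slot and keeps pulling queued tasks until none are left.
    private void drain() {
        while (true) {
            Runnable next;
            synchronized (this) {
                next = pending.poll();
                if (next == null) {
                    running--;
                    return;
                }
            }
            try {
                next.run();
            } catch (RuntimeException e) {
                System.err.println("task failed: " + e);   // keep the slot alive
            }
        }
    }
}
```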

Multithreading in .NET 4.0 and performance

I've been toying around with the Parallel library in .NET 4.0. Recently, I developed a custom ORM for some unusual read/write operations one of our large systems has to use. This allows me to decorate an object with attributes and have reflection figure out what columns it has to pull from the database, as well as what XML it has to output on writes.
Since I envision this wrapper to be reused in many projects, I'd like to squeeze as much speed out of it as possible. This library will mostly be used in .NET web applications. I'm testing the framework using a throwaway console application to poke at the classes I've created.
I've now learned a lesson about the overhead that multithreading comes with: multithreading makes my code run slower. From reading around, it seems this is intuitive to people who've been doing it for a long time, but it's actually counter-intuitive to me: how can running a method 30 times at the same time be slower than running it 30 times sequentially?
I don't think I'm causing problems by multiple threads having to fight over the same shared object (though I'm not good enough at it yet to tell for sure or not), so I assume the slowdown is coming from the overhead of spawning all those threads and the runtime keeping them all straight. So:
Though I'm doing it mainly as a learning exercise, is this pessimization? For trivial, non-IO tasks, is multithreading overkill? My main goal is speed, not responsiveness of the UI or anything.
Would running the same multithreading code in IIS cause it to speed up because of already-created threads in the thread pool, whereas right now I'm using a console app, which I assume would be single-threaded until I told it otherwise? I'm about to run some tests, but I figure there's some base knowledge I'm missing to know why it would be one way or the other. My console app is also running on my desktop with two cores, whereas a server for a web app would have more, so I might have to use that as a variable as well.

Threads don't actually all run concurrently.
On a desktop machine, I'm presuming you have a dual-core CPU (maybe a quad at most). This means only 2 (or 4) threads can be running at the same time.
If you have spawned 30 threads, the OS is going to have to context switch between those 30 threads to keep them all running. Context switches are quite costly, hence the slowdown.
As a basic suggestion, I'd aim for 1 thread per CPU if you are trying to optimise calculations. Any more than this and you're not really doing any extra work; you are just swapping threads in and out on the same CPU. Try to think of your computer as having a limited number of workers inside: you can't do more work concurrently than the number of workers you have available.
Some of the new features in the .NET 4.0 Task Parallel Library let you write code that scales with the number of CPUs. For example, you can create a bunch of tasks, and the TPL will internally figure out how many CPUs you have available and adjust the number of threads it creates and uses so as not to overload the CPUs. So you could create 30 tasks, but on a dual-core machine the TPL might use only around 2 threads and queue the rest. Obviously, this scales very nicely when you run it on a bigger machine. Or you can use something like ThreadPool.QueueUserWorkItem(...) to queue up a bunch of work items, and the pool will automatically manage how many threads it uses to perform them.
Yes, there is a lot of overhead to thread creation, but if you are using the .NET thread pool (or the Task Parallel Library in 4.0), .NET will be managing your thread creation, and you may actually find it creates fewer threads than the number of tasks you have created. It will internally swap your tasks around on the available threads. If you actually want to control the explicit creation of actual threads, you would need to use the Thread class.
[Some CPUs can do clever stuff with threads and can have multiple threads running per core (see hyperthreading), but check out your Task Manager; I'd be very surprised if you have more than 4-8 virtual CPUs on today's desktops]
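This is a Java analogue of the "one thread per CPU" advice; the answer itself is about .NET. The sketch sizes a fixed pool from Runtime.availableProcessors(), which counts logical CPUs, including hyperthreads. It then submits 30 CPU-bound tasks, which are queued and run on at most that many threads. The busyWork method is a hypothetical placeholder for real computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CpuBoundDemo {
    // Hypothetical CPU-bound placeholder work (an LCG loop).
    static long busyWork(int seed) {
        long x = seed;
        for (int i = 0; i < 1_000_000; i++) x = x * 6364136223846793005L + 1442695040888963407L;
        return x;
    }

    // Runs `tasks` jobs on a pool with one thread per logical CPU; returns how many distinct threads ran them.
    static int distinctThreadsUsed(int tasks) {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final int seed = t;
                results.add(pool.submit(() -> {
                    threadNames.add(Thread.currentThread().getName());
                    return busyWork(seed);
                }));
            }
            for (Future<Long> r : results) r.get();   // wait for all tasks
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return threadNames.size();
    }
}
```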

There are so many issues with this that it pays to understand what is happening under the covers. I would highly recommend the "Concurrent Programming on Windows" book by Joe Duffy and the "Java Concurrency in Practice" book. The latter talks about processor architecture at the level you need to understand it when writing multithreaded code. One issue you are going to hit that's going to hurt your code is caching, or more likely the lack of it.
As has been stated, there is an overhead to scheduling and running threads, but you may find that there is a larger overhead when you share data across threads. That data may be flushed from the processor cache into main memory, and that will cause serious slowdowns in your code.
This is the sort of low-level stuff that managed environments are supposed to protect us from, however, when writing highly parallel code, this is exactly the sort of issue you have to deal with.
A colleague of mine recorded a screencast about the performance issue with Parallel.For and Parallel.ForEach which may help:
http://rocksolidknowledge.com/ScreenCasts.mvc/Watch?video=ParallelLoops.wmv

You're speaking of an ORM, so I presume some amount of I/O is going on. If this is the case, the overhead of thread creation and context switching is going to be comparatively non-existent.
Most likely, you're experiencing I/O contention: it can be slower (particularly on rotational hard drives, but also on other storage devices) to read the same set of data if you read it out of order than if you read it in-order. So, if you're executing 30 database queries, it's possible they'll run faster sequentially than in parallel if they're all backed by the same I/O device and the queries aren't in cache. Running them in parallel may cause the system to have a bunch of I/O read requests almost simultaneously, which may cause the OS to read little bits of each in turn - causing your drive head to jump back and forth, wasting precious milliseconds.
But that's just a guess; it's not possible to really determine what's causing your slowdown without knowing more.
Although thread creation is "extremely expensive" when compared to, say, adding two numbers, it's not usually something you'll easily overdo. If your operations are extremely short (say, a millisecond or less), using a thread pool rather than new threads will noticeably save time. Generally, though, if your operations are that short, you should reconsider the granularity of parallelism anyhow; perhaps you're better off splitting the computation into bigger chunks, for instance by having a fairly low number of worker tasks which handle entire batches of smaller work items at a time rather than each item separately.
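The following Java sketch illustrates that coarser granularity. Instead of submitting one task per tiny item, it splits the items into a few contiguous batches and gives each task a whole batch. The squaring step and the batch count are illustrative assumptions, and BatchedWork is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchedWork {
    // Sums item*item over all items using `batches` tasks, each handling a contiguous slice.
    static long processInBatches(int[] items, int batches) {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int size = (items.length + batches - 1) / batches;   // ceiling division
            for (int start = 0; start < items.length; start += size) {
                final int lo = start, hi = Math.min(items.length, start + size);
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = lo; i < hi; i++) sum += (long) items[i] * items[i];   // tiny per-item work
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> p : partials) total += p.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With, say, 4 batches, only 4 tasks are created however many items there are, so per-task scheduling overhead is paid 4 times rather than once per item.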
