From an Apache Storm bolt, I am using Elasticsearch's transport client to connect remotely to an ES cluster. When I take a jstack dump of the Storm process, I notice nearly 1000 threads with an ES stack trace like:
"elasticsearch[Flying Tiger][transport_client_worker][T#22]{New I/O worker #269}" daemon prio=10 tid=0x00007f80ac3cb000 nid=0x356b runnable [0x00007f7e96b2a000]
java.lang.Thread.State: RUNNABLE
at Method)
- locked <0x00000000d148d138> (a$2)
- locked <0x00000000d148d128> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000d148c9b8> (a
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
I am using a single instance of the ES transport client across my Storm topology, which has about 18 output streams invoking the ES client to write data to the ES cluster.
Why does the ES transport client spawn so many threads? Is there any way I can tune this? I looked through the ES documentation, but it gives no internal details on the transport client's threading mechanism, nor an option to tune the client's thread count.
I had a very similar experience before. As you mentioned, one transport client creates tens of threads, including timers and so on.
What you have to check is whether there is exactly one transport client per worker process. In my earlier days, when I used 32 transport clients, there were more than a thousand threads; after I changed it to a singleton instance, the number of threads dropped to fewer than 200 (including all other threads created in my topology).
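One way to guarantee a single client per worker JVM is the initialization-on-demand holder idiom. The following is a minimal sketch, not the Elasticsearch API: `EsClient` is a hypothetical stand-in for the real transport client, and `SharedEsClient` and `createdAfterConcurrentUse` are illustrative names.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Holds exactly one EsClient per JVM (that is, per Storm worker process). */
final class SharedEsClient {
    private SharedEsClient() {}

    // The JVM initializes Holder lazily and exactly once, even under contention.
    private static final class Holder {
        static final EsClient INSTANCE = new EsClient();
    }

    static EsClient get() {
        return Holder.INSTANCE;
    }

    /** Calls get() from many threads and reports how many clients were built. */
    static int createdAfterConcurrentUse(int threads) {
        Thread[] callers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            callers[i] = new Thread(SharedEsClient::get);
            callers[i].start();
        }
        for (Thread t : callers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return EsClient.CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("clients created: " + createdAfterConcurrentUse(32));
    }
}

/** Hypothetical stand-in for an expensive client such as a transport client. */
final class EsClient {
    static final AtomicInteger CREATED = new AtomicInteger();

    EsClient() {
        // A real client would start its worker and timer threads here.
        CREATED.incrementAndGet();
    }
}
```

Every bolt would then call `SharedEsClient.get()` instead of constructing its own client, so the thread cost is paid once per process however many executors run in it.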
You can define the system property "es.processors.override" or the setting "processors", based on the source code of org.elasticsearch.common.util.concurrent.EsExecutors. I tried this and successfully limited the number of worker threads.
/**
 * Settings key to manually set the number of available processors.
 * This is used to adjust thread pools sizes etc. per node.
 */
public static final String PROCESSORS = "processors";
/** Useful for testing */
public static final String DEFAULT_SYSPROP = "es.processors.override";
Also, from the related question "limit number of thread in ThreadPool while creating TransportClient in elasticsearch":
Settings settings = ImmutableSettings.settingsBuilder()
I have a Play Scala application running on Play 2.7. It is used as middleware for our frontend and exposes REST endpoints.
I am running two instances in the cloud, with nginx in front load-balancing between the two servers round-robin.
My problem is that the servers go down quite often, about 3 times a day, and interestingly both servers go down at the same time. When I looked, both servers reported an out-of-memory exception. I tried to get a Java heap dump on out-of-memory but got no dump. I am still analysing the thread dump to figure out the actual cause, but what puzzles me is why the two servers go down at the same time.
In the thread dump I see 7707 threads in the sleeping state. Here is one:
"Connection evictor" #146 daemon prio=5 os_prio=0 cpu=2.33ms elapsed=1822.02s tid=0x00007f8a840c4800 nid=0x194 waiting on condition [0x00007f8a58a5e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(java.base#11/Native Method)
at org.apache.http.impl.client.IdleConnectionEvictor$
This is what I see when the server goes down:
[35966.967s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
Uncaught error from thread []: unable to create native thread: possibly out of memory or process/resource limits reached, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[application]
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(
at org.apache.http.impl.client.IdleConnectionEvictor.start(
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(
at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(
at org.apache.solr.client.solrj.impl.HttpSolrClient$
at com.github.takezoe.solr.scala.SolrClient$.$anonfun$$lessinit$greater$default$2$1(SolrClient.scala:11)
at com.github.takezoe.solr.scala.SolrClient.<init>(SolrClient.scala:14)
at service.tvt.solr.SolrPolygonService.getSuburbBoundary(SolrPolygonService.scala:212)
at service.bto.business_categories.MeedssCountService.$anonfun$suburbMeedssCount$2(MeedssCountService.scala:81)
at service.bto.business_categories.MeedssCountService.suburbMeedssCount(MeedssCountService.scala:80)
at controllers.bto.industry_categories.meedss.MeedssController.$anonfun$suburbMeedssCount$1(MeedssController.scala:38)
at play.api.mvc.ActionBuilder.$anonfun$apply$11(Action.scala:368)
at scala.Function1.$anonfun$andThen$1(Function1.scala:52)
at play.api.mvc.ActionBuilderImpl.invokeBlock(Action.scala:489)
at play.api.mvc.ActionBuilderImpl.invokeBlock(Action.scala:487)
at play.api.mvc.ActionBuilder$$anon$9.invokeBlock(Action.scala:336)
at play.api.mvc.ActionBuilder$$anon$9.invokeBlock(Action.scala:331)
at play.api.mvc.ActionBuilder$$anon$10.apply(Action.scala:426)
at play.api.mvc.Action.$anonfun$apply$2(Action.scala:98)
at play.api.libs.streams.StrictAccumulator.$anonfun$mapFuture$4(Accumulator.scala:184)
at scala.util.Try$.apply(Try.scala:209)
at play.api.libs.streams.StrictAccumulator.$anonfun$mapFuture$3(Accumulator.scala:184)
at play.api.libs.streams.Accumulator$.$anonfun$futureToSink$2(Accumulator.scala:262)
at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:303)
at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
at play.api.libs.streams.Execution$trampoline$.execute(Execution.scala:72)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
at scala.concurrent.impl.Promise$DefaultPromise.dispatchOrAddCallback(Promise.scala:312)
at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:303)
at scala.concurrent.impl.Promise.transformWith(Promise.scala:36)
at scala.concurrent.impl.Promise.transformWith$(Promise.scala:34)
at scala.concurrent.impl.Promise$DefaultPromise.transformWith(Promise.scala:183)
at scala.concurrent.Future.flatMap(Future.scala:302)
at scala.concurrent.Future.flatMap$(Future.scala:302)
at scala.concurrent.impl.Promise$DefaultPromise.flatMap(Promise.scala:183)
at play.api.libs.streams.Accumulator$.$anonfun$futureToSink$1(Accumulator.scala:261)
at play.core.server.AkkaHttpServer.$anonfun$runAction$4(AkkaHttpServer.scala:441)
at akka.http.scaladsl.util.FastFuture$.strictTransform$1(FastFuture.scala:41)
at akka.http.scaladsl.util.FastFuture$.$anonfun$transformWith$3(FastFuture.scala:51)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:92)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
at akka.dispatch.BatchingExecutor$
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(
Any quick pointers will be really helpful
Levi Ramsey was right: the cause was the takezoe Solr client library we were using. We were creating a client for every new request and never closing it. In the end we created a connection pool with a limited number of active connections, and that fixed it.
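The leak and its fix can be reproduced without Solr. This is a minimal sketch under stated assumptions: `LeakyClient` is a hypothetical class that, like the HTTP client in the trace above, starts a background "Connection evictor" thread in its constructor and stops it only in `close()`. None of these names come from SolrJ.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ClientLeakDemo {
    /** Number of evictor threads that have been started and not yet stopped. */
    static final AtomicInteger LIVE_EVICTORS = new AtomicInteger();

    /** Hypothetical client: one background thread per instance until close(). */
    static final class LeakyClient implements AutoCloseable {
        private final Thread evictor;

        LeakyClient() {
            LIVE_EVICTORS.incrementAndGet();
            evictor = new Thread(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE); // idle until interrupted
                } catch (InterruptedException ignored) {
                    // close() was called
                } finally {
                    LIVE_EVICTORS.decrementAndGet();
                }
            }, "Connection evictor");
            evictor.setDaemon(true);
            evictor.start();
        }

        String query(String q) {
            return "result for " + q;
        }

        @Override
        public void close() {
            evictor.interrupt();
            try {
                evictor.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Anti-pattern: a new client per request, never closed. Returns threads leaked. */
    static int leakPerRequest(int requests) {
        int before = LIVE_EVICTORS.get();
        for (int i = 0; i < requests; i++) {
            new LeakyClient().query("q" + i);
        }
        return LIVE_EVICTORS.get() - before;
    }

    /** Fix: try-with-resources closes each client. Returns threads leaked. */
    static int closePerRequest(int requests) {
        int before = LIVE_EVICTORS.get();
        for (int i = 0; i < requests; i++) {
            try (LeakyClient client = new LeakyClient()) {
                client.query("q" + i);
            }
        }
        return LIVE_EVICTORS.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("leaked without close: " + leakPerRequest(5));
        System.out.println("leaked with close: " + closePerRequest(5));
    }
}
```

Closing per request stops the leak; sharing one long-lived client (or a small pool) across requests also avoids paying the setup cost on every call.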
I am creating an application which requires multiple processes to run in parallel. The number of processes to run is dynamic; it depends on the input received.
E.g., if the user wants information about three different things [car, bike, auto], then I need three separate threads to run them in parallel.
numberOfThreadsNeeded = getNumberOfThingsFromInput();
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreadsNeeded);
Code Snippet:
public class ConsoleController {
private static final Log LOG = LogFactory.getLog(ConsoleController.class);
ConsoleCache consoleCache;
Metrics metrics;
public List<Feature> getConsoleData(List<String> featureIds, Map<String, Object> input, Metrics metrics) {
this.metrics = metrics;
List<FeatureModel> featureModels =
Integer numberOfThreadsNeeded = getThreadCount(featureModels);
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreadsNeeded);
.map(f -> (Callable<Result>) () -> f.fetchData(input, metrics))
The number of threads to be created varies from 1 to 100. Is it safe to define the thread pool size during initialization?
And also I want to know whether it is safe to run 100 threads in parallel?
Java itself imposes no hard limit, but the JVM implementation or the operating system may, so in practice there is one. There is also a point beyond which adding more threads makes performance worse, not better, and you can run out of memory.
The way you use ExecutorService is not the way it was intended to be used. Normally you would create a single ExecutorService with a thread limit best suited to your environment.
Keep in mind that even if you really want all your tasks to execute in parallel, you won't be able to achieve it anyway given the hardware/software concurrency limitations.
BTW, if you still want to create an ExecutorService per request, don't forget to call its shutdown() method; otherwise the JVM won't be able to exit gracefully, as there will be threads still hanging around.
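A single shared pool could look like the sketch below. It assumes the per-feature work can be expressed as a `Callable`; `fetchAll` and the `"data for ..."` strings are hypothetical stand-ins for the `f.fetchData(input, metrics)` call in the question. The pool is sized to the machine, not to the request, and uses daemon threads so that a forgotten `shutdown()` does not keep the JVM alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedPoolDemo {
    // One pool for the whole application; tasks beyond its size simply queue up.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors(),
            runnable -> {
                Thread t = new Thread(runnable, "shared-pool");
                t.setDaemon(true);
                return t;
            });

    /** Runs one task per item on the shared pool; results come back in input order. */
    static List<String> fetchAll(List<String> items) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String item : items) {
            tasks.add(() -> "data for " + item); // hypothetical stand-in for fetchData
        }
        List<String> results = new ArrayList<>();
        try {
            // invokeAll blocks until every task is done and preserves task order.
            for (Future<String> future : POOL.invokeAll(tasks)) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of("car", "bike", "auto")));
        POOL.shutdown();
    }
}
```

With this shape a request for 100 features still completes: the tasks run a few at a time instead of forcing 100 threads into existence per request.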
I have the following entry in my conf file, but I'm not sure whether this dispatcher setting is being picked up or what parallelism value is ultimately used.
default-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  throughput = 3
  fork-join-executor {
    parallelism-min = 40
    parallelism-factor = 10
    parallelism-max = 100
  }
}
I have an 8-core machine, so I expect 80 parallel threads to be in the ready state:
40 (min) < 80 (8 cores * factor 10) < 100 (max). I'd like to see what value Akka is actually using for the maximum number of parallel threads.
I created 45 child actors, and in my logs I'm printing the thread id [], but I don't see more than 20 threads running in parallel.
To max out the parallelism, all the actors need to be processing messages at the same time. Are you sure this is the case in your application?
Take for example the following code
object Test extends App {
val system = ActorSystem()
(1 to 80).foreach{ _ =>
val ref = system.actorOf(Props[Sleeper])
ref ! "hello"
class Sleeper extends Actor {
override def receive: Receive = {
case msg =>
With your config and 8 cores, you will see only a small number of threads being spawned (4 or 5?), as the messages are processed too quickly for any real parallelism to build up.
By contrast, if you keep your actors busy by uncommenting the nasty Thread.sleep, you will see the number of threads go up to 80. However, this will only last 1 minute, after which the threads will gradually be retired from the pool.
I guess the main trick is: don't think of each actor as running on a separate thread. It's only when one or more messages appear in an actor's mailbox that the dispatcher wakes up and, indeed, dispatches the message-processing task to a designated pool.
Assuming you have an ActorSystem instance, you can check the values set in its configuration. This is how you could get your hands on the values you've set in the config file:
val system = ActorSystem()
val config = system.settings.config.getConfig("")
I hope this helps. You can also consult this page for more details on specific configuration settings.
I've dug up a bit more in Akka to find out exactly what it uses for your settings. As you might already expect it uses a ForkJoinPool. The parallelism used to build it is given by:
object ThreadPoolConfig {
def scaledPoolSize(floor: Int, multiplier: Double, ceiling: Int): Int =
math.min(math.max((Runtime.getRuntime.availableProcessors * multiplier).ceil.toInt, floor), ceiling)
This function is used at some point to build a ForkJoinExecutorServiceFactory:
new ForkJoinExecutorServiceFactory(
Anyway, this is the parallelism that will be used to create the ForkJoinPool, which is a java.util.concurrent.ForkJoinPool (or a bundled copy of that class in older Akka versions). Now we have to ask: how many threads does this pool use? The short answer is that it will use the whole capacity (80 threads in our case) only if it needs it.
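The clamping arithmetic in scaledPoolSize can be checked by hand. Below is a small Java transcription; as an assumption made for illustration, the core count is passed in as a parameter instead of being read from `Runtime.getRuntime().availableProcessors()`, so the result does not depend on the machine running it. `PoolSizing` is an illustrative name.

```java
final class PoolSizing {
    /** Clamps ceil(cores * multiplier) into the range [floor, ceiling]. */
    static int scaledPoolSize(int cores, int floor, double multiplier, int ceiling) {
        int scaled = (int) Math.ceil(cores * multiplier);
        return Math.min(Math.max(scaled, floor), ceiling);
    }

    public static void main(String[] args) {
        // Settings from the question: min = 40, factor = 10, max = 100.
        System.out.println(scaledPoolSize(8, 40, 10.0, 100));  // 80: factor result is in range
        System.out.println(scaledPoolSize(2, 40, 10.0, 100));  // 40: floor wins over 20
        System.out.println(scaledPoolSize(16, 40, 10.0, 100)); // 100: ceiling wins over 160
    }
}
```

The result depends on how many processors the JVM reports, so printing `Runtime.getRuntime().availableProcessors()` on the target machine is a cheap first check when the observed parallelism looks lower than expected.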
To illustrate this, I ran a couple of tests with various uses of Thread.sleep inside the actor. I found that it can use anywhere from around 10 threads (if no sleep call is made) to around the maximum of 80 threads (if I call sleep for 1 second). The tests were run on a machine with 8 cores.
To sum up, you need to check the implementation used by Akka to see exactly how that parallelism is used, which is why I looked into ForkJoinPool. Other than looking at the config file and then inspecting that particular implementation, I don't think there is much you can do, unfortunately :(
I hope this clarifies the answer - initially I thought you wanted to see how the actor system's dispatcher is configured.
Many people say that modern REST APIs should be "async". Their main argument is that on some platforms, Java for example, the "blocking" way of doing things produces many threads, while the "async" way limits thread count and overhead.
What I don't understand is how this is achieved.
Suppose I have an app in a framework like Vert.x (it doesn't really matter; you can think of Node.js as well) and, say, 1_000_000 concurrent connections to a service that makes a request to a database. The framework lets each request be processed asynchronously around long-running I/O operations, so the database exchange looks syntactically asynchronous in the business logic code. BUT, as I understand it, the DB request is not made in a vacuum: it is processed in some other thread, and that thread actually blocks until the DB request finishes. So although the request's business logic looks async and non-blocking, the long-running operations called from it are actually blocking somewhere under the hood of the framework, and the more such operations there are, the more threads are consumed anyway (for Node.js, think of threads created in the C++ code of the framework itself).
So, as I see the big picture: in the async approach there is only one thread that processes all the requests, which is fine, but there is a bunch of threads doing the actual I/O work in the background anyway. If one doesn't limit their count, the number of threads will be the same as for a blocking approach, plus one. On the other hand, if you limit the size of the background thread pool programmatically, what is the benefit compared to a blocking approach that combines a queue for user requests with a limit on the number of request-processing threads?
Since you're asking a fairly low level question I'll answer with a low level answer. Hope you're comfortable with C.
First, a disclaimer: I'll be talking mostly about networking code, because the only widely used database I know of that uses file I/O is sqlite. Since you're asking about postgres, I assume you're interested in how socket I/O (be it TCP sockets or Unix local sockets) can work with only one thread.
At the core of almost all async systems and libraries is a piece of code that looks like this:
while (1) {
    read_fd_set = active_fd_set;
    // This blocks until we receive a packet or until timeout expires:
    select(FD_SETSIZE, &read_fd_set, NULL, NULL, timeout);
    // Process timed events:
    timeout = process_timeout();
    // Process I/O:
    for (i = 0; i < FD_SETSIZE; ++i) {
        if (FD_ISSET(i, &read_fd_set)) {
            if (i == sock) {
                /* Connection arriving on listening socket */
                int new;
                size = sizeof(clientname);
                new = accept(sock, (struct sockaddr *) &clientname, &size);
                FD_SET(new, &active_fd_set);
            }
            else {
                /* Data arriving on an already-connected socket. */
                if (read_from_client(i) < 0) {
                    close(i);
                    FD_CLR(i, &active_fd_set);
                }
            }
        }
    }
}
(code example paraphrased from a GNU socket programming example)
As you can see, the code above uses no threading whatsoever. Yet it can handle many connections simultaneously. If you take a look at the for loop it is also obvious that it is basically a simple state machine that processes sockets one at a time if they have any packets waiting to be read (if not it is skipped by the if (FD_ISSET...) statement).
Non-I/O events can logically only come from timed events. And that's where the timeout management (details not shown for clarity) comes in. All I/O related stuff (basically almost all your async code) gets called back from the read_from_client() function (again, details omitted for clarity).
There is zero code running in parallel.
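The same single-threaded pattern is available on the JVM through `java.nio.channels.Selector`, which frameworks in the Vert.x family build on indirectly. The sketch below is illustrative only and assumes loopback networking is available; `SelectorLoopDemo` and `serveOnOneThread` are made-up names. It opens several client connections and then accepts and reads all of them from the calling thread, with no worker threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SelectorLoopDemo {
    /** Serves `clients` connections on the calling thread; returns messages read. */
    static int serveOnOneThread(int clients) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Client side: connect and send one small message each.
            List<SocketChannel> sockets = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                SocketChannel client = SocketChannel.open(server.getLocalAddress());
                client.write(ByteBuffer.wrap(("hello " + i).getBytes(StandardCharsets.UTF_8)));
                sockets.add(client);
            }

            int messages = 0;
            ByteBuffer buffer = ByteBuffer.allocate(64);
            long deadline = System.nanoTime() + 5_000_000_000L; // safety stop after 5 s
            while (messages < clients && System.nanoTime() < deadline) {
                selector.select(1000); // blocks until a channel is ready or 1 s passes
                Iterator<SelectionKey> ready = selector.selectedKeys().iterator();
                while (ready.hasNext()) {
                    SelectionKey key = ready.next();
                    ready.remove();
                    if (key.isAcceptable()) {
                        // Connection arriving on the listening socket.
                        SocketChannel conn = server.accept();
                        if (conn != null) {
                            conn.configureBlocking(false);
                            conn.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        // Data arriving on an already-connected socket.
                        SocketChannel conn = (SocketChannel) key.channel();
                        buffer.clear();
                        if (conn.read(buffer) > 0) {
                            messages++;
                        }
                        conn.close(); // one message per connection in this sketch
                    }
                }
            }
            for (SocketChannel client : sockets) {
                client.close();
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("messages handled on one thread: " + serveOnOneThread(5));
    }
}
```

As in the C loop, the only blocking call is the wait for readiness; accepting and reading never block, which is why one thread can interleave many connections.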
Where does the parallelization come from?
Basically the server you're connecting to. Most databases support some form of parallelism. Some support multithreading. Some even support node.js or vert.x style parallelism by supporting asynchronous disk I/O (like postgres). Some configurations of databases allow a higher level of parallelism by storing data on more than one server via partitioning and/or sharding and/or master/slave servers.
That's where the big parallelism comes from -- parallel computing. Most databases have very strong support for read parallelism but weaker support for write parallelism (master/slave setups for example allow you to write only to the master database). But this is still a big win because most apps read more data than they write.
Where does disk parallelism come from?
The hardware. Mostly this has to do with DMA, which can transfer data without the CPU. DMA is not one thing; it is more of a concept. Different systems like the PCI bus, SATA, USB and even the CPU RAM bus itself have various kinds of DMA to transfer data directly to RAM (and in the case of RAM, to transfer data higher up to the various levels of CPU cache) or to a faster buffer.
While waiting for the DMA to complete, the CPU is not doing anything. And while it is doing nothing, if a network packet comes in or a setTimeout() expires, the code that handles them can be executed on the CPU, all while a file is being read into RAM.
But Node.js docs keep mentioning I/O threads
Only for disk I/O. It's not impossible to do async disk I/O with a single thread. Tcl has done that for years, and many other programming languages and frameworks have too. It's just very, very messy, since BSD does it differently from Linux, which does it differently from Windows, and even OS X may be subtly different from BSD even though it is derived from it, and so on.
For the sake of simplicity and solid reliability node developers have opted to process disk I/O in separate threads.
Note that even for socket I/O it is not as simple as the code example I gave above. Since select() has some limitations (for example, you're forced to loop over ALL sockets to check for incoming data even though most won't have incoming data), people have come up with better APIs. And obviously different OSes do it differently. That is why there are a lot of libraries created to handle cross platform event processing like libevent and libuv (the one node.js uses).
OK. But postgres still runs on my PC
Asynchronous, event-oriented systems do not automagically give you performance superpowers. What they DO give you is choice: the app server is blazing fast, so where you put your database servers and what database you use is up to you.
OK. But I can do this with threads. Why async?
Since 1999, many people have run many benchmarks and in the majority of cases single threaded (or low thread count), event-oriented systems have outperformed simple multithreaded systems. It was especially true in the old days of single CPU, single core servers. It is still partly true now (since cores are still limited).
That is why Apache was re-written into Apache2 to use a thread pool of async listeners and why Nginx was written from scratch to use a thread pool of async code.
Yes, on modern servers ideally you'd still want some threads in order to use all your CPUs. The alternative is a process pool like how the cluster module works in node.js. But you'd want the number of threads/processes to be constant or as constant as possible to avoid the overhead of context switching and thread creation.
This is true of some async frameworks, where the JDBC client is still synchronous.
When querying a DB in Vert.x, you reuse the same application threads.
Please see the following example:
public void testMultipleThreads() throws InterruptedException {
Vertx vertx = Vertx.vertx();
System.out.println("Before starting server: " + Thread.activeCount());
// Start server
requestHandler(httpServerRequest -> {
// System.out.println("Request");
listen(8080, o -> {
System.out.println("Server ready");
// Start counting threads
vertx.setPeriodic(500, (o) -> {
// Create requests
HttpClient client = vertx.createHttpClient();
int loops = 1_000_000;
CountDownLatch latch = new CountDownLatch(loops);
for (int i = 0; i < loops; i++) {
client.getNow(8080, "localhost", "/", httpClientResponse -> {
// System.out.println("Response received");
You'll notice that the number of threads doesn't change, even though you serve as many connections as you would like. You can also add Vert.x JDBC client to test it.
I would like my IHandleMessages<X>.Handle(X x) methods to be called concurrently by NSB. Even when configuring the default host AsA_Client - which turns off transactions - and providing two or more threads (NumberOfWorkerThreads="3" in App.Config), the following handler is called twice sequentially when there are two messages on the queue:
public void Handle(EventMessage message)
Logger.Info(string.Format("Subscriber 1 received EventMessage with Id {0}.", message.EventId));
Logger.Info(string.Format("Message time: {0}.", message.Time));
Logger.Info(string.Format("Message duration: {0}.", message.Duration));
This is merely a modified version of the PubSub demo that is supplied with NSB. No matter what settings I provide - I've also tried tweaking the IsolationLevel, to no avail - this handler blocks concurrent calls.
In practice, this is not desirable for one specific set of handlers that we are writing. The desired behavior would be - at minimum - to let concurrent threads into the Handle method and we would manually mediate access to state with software locks.
Is this not possible or am I missing a trick?
The most likely cause is that you're using the free Express Edition of NServiceBus which is limited to a single thread. The commercially available Standard Edition allows you to run multiple threads.
NOTE: NServiceBus now performs at full speed in the free trial - no more performance throttling.