I have some questions about thread names in the Play Framework.
I've been developing a REST API service on Play for about 5 months.
The app simply accesses MySQL and sends JSON-formatted data back to clients.
I already understand the pitfalls of blocking I/O, so
I created a thread pool for blocking I/O and use it in all the Future blocks that
block thread execution.
The definition of the thread pool is as follows.
akka {
  actor-system = "myActorSystem"
  blocking-io-dispatcher {
    type = Dispatcher
    executor = "thread-pool-executor"
    thread-pool-executor {
      fixed-pool-size = 64
    }
    throughput = 10
  }
}
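Roughly, the Futures use this pool as follows (a simplified sketch, not my actual code; the class and method names are made up, and it assumes the dispatcher is looked up by the config path 'akka.blocking-io-dispatcher' from the block above):
import akka.actor.ActorSystem
import scala.concurrent.{ExecutionContext, Future}

class BlockingRepository(system: ActorSystem) {
  // Look up the pool defined above by its config path and use it as the
  // ExecutionContext, so blocking JDBC work stays off the default dispatcher.
  private implicit val blockingEc: ExecutionContext =
    system.dispatchers.lookup("akka.blocking-io-dispatcher")

  def findUser(id: Long): Future[Option[String]] = Future {
    // blocking MySQL access goes here
    None
  }
}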
I checked the log file and confirmed that all non-blocking logic
runs under threads named 'application-akka.actor.default-dispatcher-#', where # is an
integer, and that all blocking logic runs under threads named
'application-blocking-io-dispatcher-#'.
Then I checked all the thread names and counts using JConsole.
The number of threads named 'application-akka.actor.default-dispatcher-#' is
always under 13, and the number of threads named 'application-blocking-io-dispatcher-#'
is always under 30.
However, the total thread count of the JVM under which my app runs increases
constantly; the total number of threads is now more than 10,000.
There are many threads whose names start with 'default-scheduler-' or
'default-akka.actor.default-dispatcher-'.
My questions are:
a. What's the difference between 'application-akka.actor.default-dispatcher-'
and 'default-akka.actor.default-dispatcher-'?
b. Is there any reason the thread count keeps increasing?
I want to solve this issue.
Here's my environment.
OS : Windows 10 Pro. 64bit
CPU : Intel(R) Core i7 @ 3.5GHz
RAM : 64GB
JVM : 1.8.0_162 64bit
PlayFramework : 2.6
RDBMS : MySQL 5.7.21
Any suggestions will be greatly appreciated. Thanks in advance.
Finally, I solved the problem. There was a bug where an instance of
Akka's Materializer was never shut down. After modifying the code, the thread count in the JVM stays stable.
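For reference, the leak followed this general pattern (a simplified sketch, not the actual code): a Materializer created per call and never shut down keeps the actors and resources it creates alive indefinitely, whereas calling shutdown() once the work is finished, or sharing one long-lived Materializer, releases them. (For what it's worth, an ActorSystem created without an explicit name is called "default", which is the usual source of thread prefixes like 'default-akka.actor.default-dispatcher-'.)
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer

object MaterializerLifecycleSketch {
  val system = ActorSystem("myActorSystem")

  // Leaky pattern: a fresh materializer per call that is never shut down,
  // so the actors and resources it creates pile up over time.
  def leaky(): Unit = {
    val mat = ActorMaterializer()(system)
    // ... run a stream with `mat` ...
  }

  // Fixed pattern: shut the materializer down once the work is done
  // (or, better still, reuse a single long-lived materializer).
  def fixed(): Unit = {
    val mat = ActorMaterializer()(system)
    try {
      // ... run a stream with `mat` ...
    } finally {
      mat.shutdown()
    }
  }
}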
Thanks.
Related
I have the following entry in my conf file, but I'm not sure whether this dispatcher setting is being picked up or what the ultimate parallelism value being used is.
akka {
  actor {
    default-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      throughput = 3
      fork-join-executor {
        parallelism-min = 40
        parallelism-factor = 10
        parallelism-max = 100
      }
    }
  }
}
I have an 8-core machine, so I expect 80 parallel threads to be in the ready state:
40 (min) < 80 (8 cores * factor of 10) < 100 (max). I'd like to see what value Akka is actually using for the maximum number of parallel threads.
I created 45 child actors, and in my logs I'm printing the thread id [application-akka.actor.default-dispatcher-xx]; I don't see more than 20 threads running in parallel.
In order to max out the parallelism factor, all the actors need to be processing messages at the same time. Are you sure this is the case in your application?
Take, for example, the following code:
import akka.actor.{Actor, ActorSystem, Props}

object Test extends App {
  val system = ActorSystem()
  (1 to 80).foreach { _ =>
    val ref = system.actorOf(Props[Sleeper])
    ref ! "hello"
  }
}

class Sleeper extends Actor {
  override def receive: Receive = {
    case msg =>
      //Thread.sleep(60000)
      println(msg)
  }
}
If you consider your config and 8 cores, you will see only a small number of threads being spawned (4, 5?), because the processing of the messages is too quick for any real parallelism to build up.
On the contrary, if you keep your actors busy by uncommenting the nasty Thread.sleep, you will see the number of threads bump up to 80. However, this will only last one minute, after which the threads will gradually be retired from the pool.
I guess the main trick is: don't think of each actor as being run on a separate thread. It's whenever one or more messages appear in an actor's mailbox that the dispatcher awakes and - indeed - dispatches the message-processing task to a designated pool.
Assuming you have an ActorSystem instance, you can check the values set in its configuration. This is how you could get your hands on the values you've set in the config file:
val system = ActorSystem()
val config = system.settings.config.getConfig("akka.actor.default-dispatcher")
config.getString("type")
config.getString("executor")
config.getString("throughput")
config.getInt("fork-join-executor.parallelism-min")
config.getInt("fork-join-executor.parallelism-max")
config.getDouble("fork-join-executor.parallelism-factor")
I hope this helps. You can also consult this page for more details on specific configuration settings.
Update
I've dug a bit more into Akka to find out exactly what it uses for your settings. As you might already expect, it uses a ForkJoinPool. The parallelism used to build it is given by:
object ThreadPoolConfig {
  ...
  def scaledPoolSize(floor: Int, multiplier: Double, ceiling: Int): Int =
    math.min(math.max((Runtime.getRuntime.availableProcessors * multiplier).ceil.toInt, floor), ceiling)
  ...
}
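Plugging the settings from the question into that formula on an 8-core machine gives the following (a quick worked check, not Akka source):
// parallelism-factor = 10, parallelism-min = 40, parallelism-max = 100, 8 cores
val parallelism = math.min(math.max((8 * 10.0).ceil.toInt, 40), 100)
// = math.min(math.max(80, 40), 100) = 80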
This function is used at some point to build a ForkJoinExecutorServiceFactory:
new ForkJoinExecutorServiceFactory(
  validate(tf),
  ThreadPoolConfig.scaledPoolSize(
    config.getInt("parallelism-min"),
    config.getDouble("parallelism-factor"),
    config.getInt("parallelism-max")),
  asyncMode)
Anyway, this is the parallelism that will be used to create the ForkJoinPool, which is actually an instance of Java's ForkJoinPool. Now we have to ask: how many threads does this pool use? The short answer is that it will use its whole capacity (80 threads in our case) only if it needs to.
To illustrate this scenario, I ran a couple of tests with various uses of Thread.sleep inside the actor. What I found is that it can use anywhere from around 10 threads (if no sleep call is made) up to around the maximum of 80 threads (if I call sleep for 1 second). The tests were made on a machine with 8 cores.
Summing it up, you need to check the implementation used by Akka to see exactly how that parallelism is used; this is why I looked into ForkJoinPool. Other than looking at the config file and then inspecting that particular implementation, I don't think there is much more you can do, unfortunately :(
I hope this clarifies the answer - initially I thought you wanted to see how the actor system's dispatcher is configured.
I have a CPU-intensive task (looping through some data and evaluating results). I want to make use of multiple cores for this, but my performance is consistently worse than when just using a single core.
I've tried:
Creating multiple processes on different ports with express and sending the tasks to these processes
Using webworker-threads to run the tasks in different threads using the thread pool
I'm measuring the results by counting the total number of iterations I can complete and dividing by the amount of time I spent working on the problem. When using a single core, my results are significantly better.
Some points of interest:
I can identify when I am using just one core and when I am using multiple cores through Task Manager. I am using the expected number of cores.
I have lots of RAM.
I've tried running on just 2 or 3 cores.
I added nextTick calls, which don't seem to have any impact in this case.
The tasks take several seconds each, so I don't feel like I'm losing a lot to overhead.
Any idea as to what is going on here?
Update for threads: I suspect a bug in webworker-threads.
Skipping express for now, I think the issue may have to do with my thread loop. What I'm doing is creating threads and then trying to run them continuously while sending data back and forth between them. Even though both of the threads are using up CPU, only thread 0 is returning values. My assumption was that any.emit would generally end up emitting the message to the thread that had been idle the longest, but that does not seem to be the case. My setup looks like this.
Within threadtask.js
thread.on('init', function() {
  thread.emit('ready');
  thread.on('start', function(data) {
    console.log("THREAD " + thread.id + ": execute task");
    //...
    console.log("THREAD " + thread.id + ": emit result");
    thread.emit('result', otherData);
  });
});
main.js
var tp = Threads.createPool(NUM_THREADS);
tp.load(threadtaskjsFilePath);
var readyCount = 0;
tp.on('ready', function() {
  readyCount++;
  if (readyCount == tp.totalThreads()) {
    console.log('MAIN: Sending first start event');
    tp.all.emit('start', JSON.stringify(data));
  }
});
tp.on('result', function(eresult) {
  var result = JSON.parse(eresult);
  console.log('MAIN: result from thread ' + result.threadId);
  //...
  console.log('MAIN: emit start' + result.threadId);
  tp.any.emit('start' + result.threadId, data);
});
tp.all.emit("init", JSON.stringify(data2));
The output of this disaster:
MAIN: Sending first start event
THREAD 0: execute task
THREAD 1: execute task
THREAD 1: emit result
MAIN: result from thread 1
THREAD 0: emit result
THREAD 0: execute task
THREAD 0: emit result
MAIN: result from thread 0
MAIN: result from thread 0
THREAD 0: execute task
THREAD 0: emit result
THREAD 0: execute task
THREAD 0: emit result
MAIN: result from thread 0
MAIN: result from thread 0
THREAD 0: execute task
THREAD 0: emit result
THREAD 0: execute task
THREAD 0: emit result
MAIN: result from thread 0
MAIN: result from thread 0
I also tried another approach where I emit to all threads but have each thread listen for a message that only it can answer, e.g. thread.on('start' + thread.id, function() { ... }). This doesn't work either: in the result handler, when I do tp.all.emit('start' + result.threadId, ...), the message never gets picked up.
MAIN: Sending first start event
THREAD 0: execute task
THREAD 1: execute task
THREAD 1: emit result
THREAD 0: emit result
Nothing more happens after that.
Update for multiple express servers: I'm getting improvements but smaller than expected
I revisited this solution and had more luck. I think my original measurement may have been flawed. New results:
Single process: 3.3 iterations/second
Main process + 2 servers: 4.2 iterations/second
Main process + 3 servers: 4.9 iterations/second
One thing I find a little odd is that I'm not seeing around 6 iterations/second for 2 servers and 9 for 3. I understand that there are some losses to networking, but if I increase my task time to be sufficiently high, the network losses should be pretty minor, I would think.
You shouldn't be pushing your Node.js processes to run multiple threads for performance improvements. Running on a quad-core processor, having 1 express process handling general requests and 3 express processes handling the CPU-intensive requests would probably be the most effective setup, which is why I would suggest that you design your express processes to refrain from using Web Workers and simply block until they produce a result. This gets you down to running a single process with a single thread, as Node.js is designed, most likely yielding the best results.
I do not know the intricacies of how the webworker-threads package handles synchronization, how it affects the Node.js I/O thread pools that live in C space, etc., but I believe you would generally introduce Web Workers to be able to manage more blocking tasks at the same time without severely affecting other requests that require no threading or system I/O, or that can otherwise be responded to quickly. It doesn't necessarily mean that applying this yields improved performance for the particular tasks being performed. If you run 4 processes with 4 threads each performing I/O, you might be locking yourself into wasting time continuously switching between thread contexts outside the application space.
I have played around with Akka for two weeks and am still very confused about some basic concepts.
I have a very simple pattern which contains three kinds of actors:
master
worker
reporter
I configure these actors as follows:
Master
The master uses the following dispatcher with RoundRobinRouter(10):
mailbox-capacity = 10000
executor = "fork-join-executor"
fork-join-executor {
  parallelism-min = 0
  parallelism-max = 600
  parallelism-factor = 3.0
}
Worker
I have several workers (refs) in this system; they all receive messages from the master, and each of them uses a router of RoundRobinRouter(10).
type = Dispatcher
executor = "fork-join-executor"
fork-join-executor {
  parallelism-min = 0
  parallelism-max = 600
  parallelism-factor = 3.0
}
mailbox-capacity = 100000
Notifier
The notifier is just an actor used to receive results from the workers and count them up.
It uses the same dispatcher as the workers.
I have made some adjustments to the parallelism parameters and the routers, but the performance does not seem to change. It takes 80 seconds to consume 10 million tasks, each of which takes at least 500 ms to finish.
So this got me wondering: if a dispatcher acts like a thread pool, what happens when an actor uses this dispatcher without a router, meaning there is only one instance? Will the code in its receive block be executed in parallel?
Just in case something else in my code messed things up:
Gist
And this is the virtual machine that runs this program:
32-bit Ubuntu 12.04 LTS
Memory: 5.0 GiB
Intel® Core™ i5-2500 CPU @ 3.30GHz × 4
Sorry for putting this question so unclearly.
If there is anything that can improve this performance, please tell me.
Any advice is welcome. Thanks in advance.
Update
Sorry!
It was 10 million tasks, not 100 million!
My bad, sorry!
In one of my actors' react methods I output the thread pool by doing:
val scalaThreadSet = asScalaSet(Thread.getAllStackTraces().keySet())
scalaThreadSet.foreach(element =>
  Console.println("Thread=" + element + ",state=" + element.getState()))
I see a bunch of threads:
Thread=Thread[ForkJoinPool-1-worker-6,5,main],state=WAITING
Thread=Thread[Signal Dispatcher,9,system],state=RUNNABLE
Thread=Thread[ForkJoinPool-1-worker-10,5,main],state=RUNNABLE
Thread=Thread[ForkJoinPool-1-worker-13,5,main],state=WAITING
Thread=Thread[ForkJoinPool-1-worker-7,5,main],state=WAITING
Thread=Thread[ForkJoinPool-1-worker-9,5,main],state=WAITING
Thread=Thread[ForkJoinPool-1-worker-14,5,main],state=WAITING
I wish to reduce the size of the thread pool to one and only see a single thread, so I pass in:
-Dactors.maxPoolSize=1
as a VM argument.
My expectation is that I should now only see one thread, but I still see loads. Any ideas?
Short answer
Try running the VM with
-Dactors.corePoolSize=1
Explanation
The ForkJoinScheduler, which is the default scheduler on most OSes running Java 1.6 or later, uses a DrainableForkJoinPool under the covers, which, as far as I can tell, ignores the maxPoolSize property. See the makeNewPool method of ForkJoinScheduler and the constructor of DrainableForkJoinPool.
I've got some code that is trying to create 100 threaded HTTP calls. It seems to be getting capped at about 40.
When I do threadJoin I'm only getting 38-40 sets of results from my HTTP calls, despite the loop running from 1 to 100.
// thread http calls
pages = 100;
for (page="1"; page <= pages; page++) {
    thread name="req#page#" {
        grabber.setURL('http://site.com/search.htm');
        // request headers
        grabber.addParam(type="url",name="page",value="#page#");
        results = grabber.send().getPrefix();
        arrayAppend(VARIABLES.arrResults,results.fileContent);
    }
}
// rejoin threads
for (page="2"; page <= pages; page++) {
    threadJoin('req#page#',10000);
}
Is there a limit to the number of threads that CF can create? Is it to do with Java running in the background? Or can it not handle that many HTTP requests?
Is there just a much better way for me to do this than threaded HTTP calls?
The result you're seeing is likely because your variables aren't thread safe.
grabber.addParam(type="url",name="page",value="#page#");
That line is accessing Variables.Page, which is shared by all of the spawned threads. Because threads start at different times, the value of page is often different from the value you think it is, which leads to multiple threads having the same value for page.
Instead, if you pass page as an attribute to the thread, then each thread will have its own copy of the variable, and you will end up with 100 unique values (1-100).
Additionally you're writing to a shared variable as well.
arrayAppend(VARIABLES.arrResults,results.fileContent);
ArrayAppend is not thread safe and you will be overwriting versions of VARIABLES.arrResults with other versions of itself, instead of appending each bit.
You want to set the result to a thread variable, and then access that once the joins are complete.
thread name="req#page#" page=Variables.page {
    grabber.setURL('http://site.com/search.htm');
    // request headers
    grabber.addParam(type="url",name="page",value="#Attributes.page#");
    results = grabber.send().getPrefix();
    thread.Result = results.fileContent;
}
And the join:
// rejoin threads
for (page="2"; page <= pages; page++) {
    threadJoin('req#page#',10000);
    arrayAppend(VARIABLES.arrResults, CFThread['req#page#'].Result);
}
In the ColdFusion administrator, there's a setting for how many will run concurrently; mine defaulted to 10. The rest apparently are queued. As Phantom42 mentions, you can up the number of running CF threads; however, with 100 or more threads, you may run into other problems.
In a 32-bit process, your whole process can only use 2 GB of memory. Each thread uses up an amount of stack memory, which isn't part of the heap. We've had problems with running out of memory at high thread counts, as your Java binary + heap + non-heap (PermGen) + (threads * 512 KB) can easily go over the 2 GB limit.
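As a rough, back-of-the-envelope illustration (example figures only, not measurements from this application):
// ~512 KB of stack per thread, as noted above
val stackPerThreadMb = 0.5
println(100 * stackPerThreadMb)   // 100 threads  -> ~50 MB of stack
println(4000 * stackPerThreadMb)  // 4000 threads -> ~2000 MB, roughly the entire 32-bit budget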
You'd also have to allow enough threads to handle your code above, as well as other requests coming into your app, which may bog down the app as a whole.
I would suggest changing your code to create N threads, each of which does more than 1 request. It's more work, but you break the N requests = N threads problem. There are a couple of approaches you can take:
If you think that each request is going to take roughly the same time, then you can split up the work and give each thread a portion to work on before you start each one up.
Or each thread picks a URL off a list and processes it, you can then join to all N threads. You'd need to make sure you put locking around whatever counter you used to track progress though.
Check your Maximum number of running JRun threads setting in ColdFusion Administrator under the Request Tuning tab. The default is 50.