I'm building a system in Kotlin that needs to schedule a lot of jobs at the same time (sometimes even a few thousand jobs). Most of the jobs are not complex; they simply make a few HTTP requests with different libraries.
I'm currently working with the Quartz scheduler. The problem is that it sometimes misfires a job, and that can be critical for me. I know Quartz has different settings you can apply to handle misfires, but I didn't find anything suitable.
I know that Kotlin has coroutines, which can be considered lightweight threads. Is there any scheduler like Quartz for Kotlin that executes the jobs as coroutines instead of threads, with the same (or at least most of the same) features? I'm asking because I believe that such a scheduler could increase my system's performance.
Follow-up question: if such a scheduler exists, will it be safe to use it even if I don't know the implementation of the libraries I'm using inside the job? (For example, making sure there are no Thread.sleep() calls.)
Out of pure curiosity: is there any case where a web worker could run on a separate thread when the CPU has only one hardware thread available, perhaps through some virtualization or by using the GPU?
Thanks!
There seem to be two premises behind your question: firstly, that web workers use threads; and secondly that multiple threads require multiple cores. But neither is really true.
On the first: there’s no actual requirement that web workers be implemented with threads. User agents are free to use processes, threads or any “equivalent construct” [see the web worker specification]. They could use multitasking within a single thread if they wanted to. Web worker scripts are run concurrently but not necessarily parallel to browser JavaScript.
On the second: it’s quite possible for multiple threads to run on a single CPU. It works a lot like concurrent async functions do in single threaded JavaScript.
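To illustrate the comparison with async functions, here is a minimal sketch (the `task` helper and the names A and B are made up for this example). Two tasks share one JavaScript thread, and each `await` is a point where the runtime can switch to the other one, much as an OS scheduler switches between threads on a single core.

```javascript
// Two async tasks sharing one JavaScript thread.
const log = [];

// Hypothetical helper: performs `steps` units of work, yielding after each.
async function task(name, steps) {
  for (let i = 1; i <= steps; i++) {
    log.push(`${name}${i}`);
    // Yield to the event loop so the other pending task gets a turn.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
}

const done = Promise.all([task('A', 3), task('B', 3)]).then(() => {
  console.log(log.join(' ')); // steps of A and B appear interleaved
  return log;
});
```

Neither task runs in parallel with the other, yet both make progress "at the same time" from the caller's point of view.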
So yes, in answer to your question: web workers do run properly on a single core client. You will lose some of the performance benefits but the code will still behave as it would in a multi core system.
I've seen some older posts touching on this topic but I wanted to know what the current, modern approach is.
The use case is: (1) assume you want to run a long-running task, for example processing a video file or running jspm install, that can take up to 60 seconds. (2) You can NOT subdivide the task.
Other requirements include:
need to know when a task finishes
nice to be able to stop a running task
stability: if one task dies, it doesn't bring down the server
needs to be able to handle 100s of simultaneous requests
I've seen these solutions mentioned:
nodejs child process
webworkers
fibers - not used for CPU-bound tasks
generators - not used for CPU-bound tasks
https://adambom.github.io/parallel.js/
https://github.com/xk/node-threads-a-gogo
any others?
Which is the modern, standards-based approach? Also, if nodejs isn't suited for this type of task, then that's also a valid answer.
The short answer is: it depends.
If you mean a nodejs server on its own, then the answer is no for this use case. Nodejs's single-threaded event loop can't handle CPU-bound tasks, so it makes sense to outsource the work to another process or thread. And because the CPU-bound task here runs for a long time, it also makes sense to find some way of queueing tasks, i.e. to use a worker queue.
However, for this particular use case of running JS code (jspm API), it makes sense to use a worker queue that uses nodejs. Hence, the solution is: (1) use a nodejs server that does nothing but queue tasks in the worker queue. (2) use a nodejs worker queue (like kue) to do the actual work. Use cluster to spread the work across different CPUs. The result is a simple, single server that can handle hundreds of requests (w/o choking). (Well, almost, see the note below...)
Note:
the above solution uses processes. I did not investigate thread solutions because it seems that these have fallen out of favor for node.
the worker queue + cluster give you the equivalent of a thread pool.
yes, in the worst case, the 100th parallel request will take 25 minutes to complete on a 4-core machine. The solution is to spin up another worker queue server (if I'm not mistaken, with a db-backed worker queue like kue this is trivial: just point each server at the same db).
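As a rough illustration of the "queue plus worker processes" idea, here is a minimal sketch. It is NOT kue or cluster: `ProcessQueue` is a hypothetical in-memory stand-in, and each job is just a snippet of Node.js source run in its own child process via `child_process.spawn`. A real setup would use a persistent queue instead.

```javascript
const { spawn } = require('child_process');
const os = require('os');

// Hypothetical in-memory queue: runs at most `limit` child processes at once.
class ProcessQueue {
  constructor(limit = os.cpus().length) {
    this.limit = limit;
    this.running = 0;
    this.pending = [];
  }

  // Queue a Node.js script (as source text); resolves with its exit code.
  push(script) {
    return new Promise((resolve, reject) => {
      this.pending.push({ script, resolve, reject });
      this.drain();
    });
  }

  // Start queued jobs while there is spare capacity.
  drain() {
    while (this.running < this.limit && this.pending.length > 0) {
      const { script, resolve, reject } = this.pending.shift();
      this.running++;
      const child = spawn(process.execPath, ['-e', script], { stdio: 'ignore' });
      let settled = false;
      const finish = (settle, value) => {
        if (settled) return;
        settled = true;
        this.running--;
        settle(value);
        this.drain();
      };
      child.on('error', (err) => finish(reject, err));
      child.on('exit', (code) => finish(resolve, code));
    }
  }
}

const queue = new ProcessQueue(2);
const results = Promise.all([
  queue.push('process.exit(0)'),
  queue.push('throw new Error("job crashed")'), // the child dies, the parent survives
  queue.push('process.exit(3)'),
]);
results.then((codes) => console.log(codes));
```

The promise returned by `push` tells you when a job finishes, and a crashing job only takes down its own process. The worst-case figure above follows from the same model: 100 jobs of 60 seconds each on 4 workers take about 100 / 4 × 60 s = 1500 s, i.e. 25 minutes.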
You're describing a CPU-bound, long-running task, and that's definitely not a node.js thing. You also mention hundreds of simultaneous tasks.
You might take a look at something like Gearman job server for things like that - it's a dedicated solution.
Alternatively, you can still have Node.js manage the requests, just not do the actual job execution.
If lower than optimal performance is acceptable and you want to keep your code in JavaScript, you can still do it, but you should have some sort of job queue; something like Redis or RabbitMQ comes to mind.
I think a job queue is a must-have for long-running tasks arriving at hundreds per second, regardless of your runtime. The exception is if you can run these jobs on other servers, services or machines: then your Node.js API is just a front and management layer for the job cluster, Node.js is perfectly OK for that role, and you need to focus on the job cluster itself, which would make for a better question.
Node.js can still be useful for you here: it can help manage and hold those hundreds of tasks, depending on where they come from (e.g. you might only allow requests from certain users to go through to your job server, or limit the "pause" functionality to others, etc.).
Easily run long-running processes concurrently using a simple ConcurrentQueue. Feel free to improve it and share feedback.
Create your own custom ConcurrentExecutor and set your concurrency limit.
🔥 Boom, all your long-running processes now run concurrently.
To understand how it works, have a look at:
Concurrent Process Executor Queue
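The post does not show how its ConcurrentExecutor works, so the following is only a sketch of the general idea of a concurrency limit, not that library's API. `runWithLimit` is a hypothetical helper that runs async task functions with at most `limit` of them in flight.

```javascript
// Hypothetical helper: run async task functions, at most `limit` at a time.
// Results are returned in the same order as the input tasks.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps claiming the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Calling `runWithLimit(tasks, 2)` with five task functions starts two immediately and launches each remaining one as soon as a slot frees up. This limits in-process async work only; CPU-bound jobs would still need separate processes or threads.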
I have a web application that simply acts as a Front Controller using Spring Boot to call other remote REST services, where I am combining Spring's DeferredResult with Observables subscribed on Schedulers.computation().
We are also using JMeter to stress-test the web application, and we have noticed that requests start to fail with a 500 status, no response data and no logs anywhere once the number of concurrent threads scheduled in JMeter rises above 25, which obviously is a very "manageable" number for Tomcat.
Digging into the issue with the use of VisualVM to analyze how the threads were being created and used, we realized that the use of rx.Schedulers was somehow impacting the number of threads created by Tomcat NIO. Let me summarize our tests based on the rx.Scheduler used and a test in JMeter with 100 users (threads):
SCHEDULERS.COMPUTATION()
As we're using Schedulers.computation() and my local machine has 4 available processors, 4 event-loop threads are created by RxJava (named RxComputationThreadPool-XXX) and ONLY 10 by Tomcat (named http-nio-8080-exec-XXX), as per VisualVM:
http://screencast.com/t/7C9La6K4Kt6
SCHEDULERS.IO() / SCHEDULERS.NEWTHREAD()
Schedulers.io() seems to act basically like Schedulers.newThread(), so a new thread is always created when required. Again, we can see lots of threads created by RxJava (named RxNewThreadScheduler-XXX), but ONLY 10 for Tomcat (named http-nio-8080-exec-XXX), as per VisualVM:
http://screencast.com/t/K7VWhkxci09o
SCHEDULERS.IMMEDIATE() / NO SCHEDULER
If we disable the creation of new threads in RxJava, either by setting the Schedulers.immediate() or removing it from the Observable, then we see the expected behaviour from Tomcat's threads, i.e. 100 http-nio-8080-exec corresponding to the number of users defined for the JMeter test:
http://screencast.com/t/n9TLVZGJ
Therefore, based on our testing, it's clear to us that the combination of RxJava with Schedulers and Tomcat 8 is somehow constraining the number of threads created by Tomcat... And we have no idea why or how this is happening.
Any help would be much appreciated as this is blocking our development so far.
Thanks in advance.
We have several Linux processes implemented in various technologies (Java, C++, etc.). They interact with each other by passing messages over WebSphere MQ. If any process crashes, we would like it to be restarted automatically, up to a configured number of times.
Would it involve a change in the applications, such as periodically raising a heartbeat to indicate that the application is in good health?
Thanks,
Yash
At my previous job we had a similar problem.
We developed our own solution. We implemented a watcher program in two different technologies: one in Java and one in C++ using Qt. Each had a list of programs to watch. For every program that a watcher watched, we configured the maximum time allowed between two heartbeats, which program to run on every heartbeat, and which program to run if that maximum time was exceeded.
The Java watcher had an entry for the C++ watcher and vice versa.
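The core bookkeeping of such a watcher can be sketched as follows. This is not the original Java or C++ code. `Watcher` and its methods are hypothetical names, and how heartbeats actually arrive (a message on the queue, a socket, a file touch) is left open.

```javascript
// Hypothetical watcher: remembers the last heartbeat per program and fires
// a timeout action when a program has been silent for too long.
class Watcher {
  // `now` is injectable so the logic can be exercised with a fake clock.
  constructor(now = Date.now) {
    this.now = now;
    this.entries = new Map();
  }

  // maxSilenceMs: maximum allowed time between two heartbeats.
  // onTimeout: action to run when that time is exceeded (e.g. a restart).
  watch(name, maxSilenceMs, onTimeout) {
    this.entries.set(name, { maxSilenceMs, onTimeout, lastBeat: this.now() });
  }

  // Record a heartbeat from a watched program; unknown names are ignored.
  heartbeat(name) {
    const entry = this.entries.get(name);
    if (entry) entry.lastBeat = this.now();
  }

  // Call periodically (e.g. from setInterval); returns the expired names.
  check() {
    const expired = [];
    for (const [name, entry] of this.entries) {
      if (this.now() - entry.lastBeat > entry.maxSilenceMs) {
        expired.push(name);
        entry.lastBeat = this.now(); // restart the timer after acting
        entry.onTimeout(name);
      }
    }
    return expired;
  }
}
```

With two watchers, each one simply registers the other through `watch`, so that either can restart its peer. A restart counter inside `onTimeout` would cover the "configured number of times" requirement from the question.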