How does "Poll SCM" in Jenkins work? I am seeking an explanation of how it creates a thread or a process (if it does) to poll continuously, say every 5 minutes. Does polling only create a new thread in an existing process, or does it create new processes? And what happens to these created processes/threads once the polling returns a change or no change?
I suspect that too much polling is keeping my Jenkins server busy and causing performance issues. However, I could not find proper documentation or discussion that explains this process. I would like to be educated on this.
In Node.js cluster mode, if multiple jobs exist in the event loop for one process and the current job crashes the process, what happens to the remaining jobs?
I'm assuming the remaining jobs in the event loop would go unfulfilled or return a server error. My question is: why is this an acceptable risk? Why would someone opt to use Node.js cluster mode in production rather than something like PHP, where there is no risk of this because PHP handles each request in its own process?
Edit:
Obviously this doesn't just apply to Node.js cluster mode. It can happen on a single instance too, in which case the end user would just get a server error. Cluster mode just happens to be my personal use case.
I'm looking for a way to pick a job back up from the queue should a previous job cause the process to exit before the subsequent job gets a chance to be fulfilled. I am currently reading about how you can use a tool like RabbitMQ to handle the job queue outside of the Node.js cluster, with each cluster instance pulling jobs from the RabbitMQ queue. If anyone has any input on that, it would also be greatly appreciated.
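For reference, here is roughly the pattern I'm reading about: a minimal sketch assuming amqplib and a local RabbitMQ broker (the queue name and job payload are made up). Because a job is only acknowledged after it completes, a crash mid-job means the broker redelivers it to another worker:

const amqp = require('amqplib');

async function startWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('jobs', { durable: true });
  ch.prefetch(1); // each worker holds at most one unacknowledged job

  ch.consume('jobs', async (msg) => {
    const job = JSON.parse(msg.content.toString());
    await handleJob(job); // if the process crashes before ack(),
    ch.ack(msg);          // RabbitMQ redelivers the job to another worker
  }, { noAck: false });
}

async function handleJob(job) {
  console.log('processing', job);
}

startWorker().catch(console.error);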
If multiple jobs exist in the event loop for one process, what happens to the remaining jobs if the current job crashes the process?
If a node.js process crashes, the same thing happens to it that happens to any other process. All open sockets get automatically disconnected and the client will receive an immediate close on their socket (socket connection dropped essentially).
If you were using a Java server that was in the middle of handling 10 requests (perhaps in threads) and it crashed, the consequences would be the same. All 10 socket connections would get dropped.
If process isolation from one request to another is your #1 criterion for selecting a server environment, then I guess you wouldn't pick any environment that ever serves multiple requests from the same process. But you would give up a lot to get that. One of the reasons for the node.js design is that it scales really, really well for a high number of concurrent connections that are all doing mostly I/O things (disk, networking, database stuff, etc...), which happens to be the workload of most web servers. A design that fires up a new process for every incoming connection does not scale as well to a large number of concurrent connections, because a process is a much more heavyweight thing in the eyes of the operating system (memory usage, other system resource usage, task-switching overhead, etc...) than the way node.js does things.
And, there are obviously hundreds of other considerations too when choosing a server environment. So, you kind of have to look at the whole picture of what you're designing for and make the best set of tradeoffs.
In general, I wouldn't put this issue anywhere on the radar for why you should choose one over the other unless you expect to be running risky code (perhaps out of your control) that crashes a lot and this issue is therefore more important in your deployment than all the other differences. And, if that was the case, I'd probably isolate the risky code to its own process (even when using nodejs) to alleviate any pain from that crash. You could have a process pool waiting to process risky things. For example, if you were running code submitted by a user, I might run that code in its own isolated VM.
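As a rough sketch of that isolation idea, using only the core child_process module (risky-task.js is a hypothetical script named here just for illustration):

const { fork } = require('child_process');

function runRisky(payload) {
  return new Promise((resolve, reject) => {
    // risky-task.js is a hypothetical script that does the dangerous work
    // and reports its result back over the built-in IPC channel.
    const child = fork('./risky-task.js');
    child.send(payload);
    child.once('message', (result) => resolve(result));
    child.once('exit', (code) => {
      // A crash here only kills the child; the main server keeps serving.
      if (code !== 0) reject(new Error('risky task died with code ' + code));
    });
  });
}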
If you're just worried about your own code crashing a lot, then you probably have bigger problems and need more extensive unit testing, more robust error handling, and to take advantage of other tools such as a linter and other code-analysis tools to find potential problem areas. With proper design, implementation, and error handling, you should be able to keep a single incoming request from harming anything other than itself. That's certainly the philosophy of every server environment that serves multiple requests from the same process, and of the people/companies deploying those servers.
I am using a BPMN process which is already running on a thread, and I am also using Spring FTP, where the task-scheduler thread is running, but I found that the application cannot switch between the threads. Is there any way to invoke the task-scheduler process without any interruption? I am using an InboundChannelAdapter to copy files from FTP. Please suggest a feasible way to resolve the issue.
I don't see any issue described in your question, and to be honest it isn't fully clear.
Please be more specific; sharing some code/config/logs/stack-trace is often really useful. More info means a better chance of a quick and proper answer.
I guess your problem is that you download files from FTP and, in the same thread, run a BPM process which might eventually block waiting for some actor action.
For this purpose you should shift the Spring Integration flow behind the <poller> onto a different thread so that it doesn't steal task-scheduler resources, which are expensive for the whole system. Consider using a sufficiently large ThreadPoolTaskExecutor for the task-executor reference on the <poller>. There is also an ExecutorChannel for you, with similar thread-shifting capabilities.
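A minimal config sketch of that idea (ids, directories, and pool sizes are illustrative, and an existing ftpSessionFactory bean is assumed):

<!-- Hand polled work off to its own executor instead of the shared task scheduler. -->
<task:executor id="ftpPollerExecutor" pool-size="10" queue-capacity="25"/>

<int-ftp:inbound-channel-adapter id="ftpInbound"
        channel="ftpChannel"
        session-factory="ftpSessionFactory"
        remote-directory="/remote"
        local-directory="/local">
    <int:poller fixed-rate="5000" task-executor="ftpPollerExecutor"/>
</int-ftp:inbound-channel-adapter>

<!-- Alternatively, an ExecutorChannel shifts downstream processing
     (e.g. the BPM process) onto its own thread pool. -->
<int:channel id="ftpChannel">
    <int:dispatcher task-executor="ftpPollerExecutor"/>
</int:channel>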
I currently have an API running that delegates an ingestion job to a short-polling bash script. I want to know whether using something like the native adapter of strong-mq would consume fewer resources than the short-polling implementation.
https://github.com/strongloop/strong-mq
If message queuing is less resource-intensive than short polling, how exactly is this possible, considering that a message queue implementation is a separately running microservice, i.e., yet another node process or "cluster" running on the operating system?
It comes down to polling frequency and the overhead for each poll.
If you aren't polling very frequently, then it may consume less resources to do something like a bash script in a crontab because the resources are only consumed during the poll.
If you are polling so frequently that your polling script spends more time running than sleeping, then it may make more sense to use something like a message queue and pay a smaller but more constant overhead tax.
"short polling" using a shell script
Each cycle involves starting a bash process, and that bash process forks off a handful of child processes to perform tasks like reading a file and running the results through grep, so you've got some measurable overhead. If the result of the checks is positive, you fork off some other script or process to perform the queued action.
message queue polling
Each cycle involves inspecting a value in memory against some sort of conditional. If the condition is met, send a notification message/packet to a connected client. The client then performs whatever action was queued.
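To make the contrast concrete, here is a minimal sketch of the push style in Node.js, using only core modules; the in-memory queue and the TCP "clients" are illustrative stand-ins:

const net = require('net');

const queue = [];            // pending jobs live in memory
const clients = new Set();   // connected workers waiting for jobs

const server = net.createServer((socket) => {
  clients.add(socket);
  socket.on('close', () => clients.delete(socket));
  drain();                   // a new client may pick up waiting jobs
});
server.listen(9000);

function enqueue(job) {
  queue.push(job);
  drain();
}

// Each "cycle" is just a cheap in-memory check plus a write to an
// already-open socket: no process creation, no fork/exec overhead.
function drain() {
  for (const socket of clients) {
    const job = queue.shift();
    if (!job) return;
    socket.write(JSON.stringify(job) + '\n');
  }
}

enqueue({ task: 'ingest', file: 'video.mp4' });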
I've seen some older posts touching on this topic but I wanted to know what the current, modern approach is.
The use case is: (1) assume you want to do a long-running task on a video file, or say a jspm install, that can take up to 60 seconds. (2) you can NOT subdivide the task.
Other requirements include:
need to know when a task finishes
nice to be able to stop a running task
stability: if one task dies, it doesn't bring down the server
needs to be able to handle 100s of simultaneous requests
I've seen these solutions mentioned:
nodejs child process
webworkers
fibers - not suited for CPU-bound tasks
generators - not suited for CPU-bound tasks
https://adambom.github.io/parallel.js/
https://github.com/xk/node-threads-a-gogo
any others?
Which is the modern, standards-based approach? Also, if nodejs isn't suited for this type of task, then that's also a valid answer.
The short answer is: it depends.
If you mean a plain nodejs server, then the answer is no for this use case. Node.js's single-threaded event loop can't handle CPU-bound tasks, so it makes sense to outsource the work to another process or thread. However, for this use case, where the CPU-bound task runs for a long time, it also makes sense to find some way of queueing tasks, i.e., to use a worker queue.
However, for this particular use case of running JS code (the jspm API), it makes sense to use a worker queue that itself uses nodejs. Hence, the solution is: (1) use a nodejs server that does nothing but queue tasks in the worker queue; (2) use a nodejs worker queue (like kue) to do the actual work, and use cluster to spread the work across different CPUs. The result is a simple, single server that can handle hundreds of requests (without choking). (Well, almost; see the note below...)
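A minimal sketch of that split, assuming kue with a local Redis instance and using the core cluster module (the job type and the 60-second stand-in task are illustrative):

const cluster = require('cluster');
const os = require('os');
const http = require('http');
const kue = require('kue');

const queue = kue.createQueue(); // connects to redis://localhost:6379 by default

if (cluster.isMaster) {
  // The front server does nothing but accept requests and enqueue jobs.
  http.createServer((req, res) => {
    const job = queue.create('transcode', { file: 'upload.mp4' }).save((err) => {
      res.end(err ? 'enqueue failed' : 'queued job ' + job.id);
    });
  }).listen(3000);

  // One worker per CPU does the actual long-running work.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
} else {
  // Each worker pulls one job at a time from the Redis-backed queue.
  queue.process('transcode', (job, done) => {
    runLongTask(job.data.file, done);
  });
}

// Stand-in for the real, non-subdividable 60-second task.
function runLongTask(file, done) {
  setTimeout(done, 60 * 1000);
}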
Note:
the above solution uses processes. I did not investigate thread solutions because it seems that these have fallen out of favor for node.
the worker queue + cluster combination gives you the equivalent of a thread pool.
yea, in the worst case, the 100th parallel request will take 25 minutes to complete on a 4-core machine. The solution is to spin up another worker-queue server (if I'm not mistaken, with a db-backed worker queue like kue this is trivial: just make each server point to the same db).
You're mentioning a CPU-bound task, and a long-running one; that's definitely not a node.js thing. You also mention hundreds of simultaneous tasks.
You might take a look at something like the Gearman job server for things like that - it's a dedicated solution.
Alternatively, you can still have Node.js manage the requests, just not do the actual job execution.
If it's relatively acceptable to have lower-than-optimal performance, and you want to keep your code in JavaScript, you can still do it, but you should have some sort of job queue; something like Redis or RabbitMQ comes to mind.
I think a job queue will be a must-have requirement for long-running tasks arriving hundreds of times per second, regardless of your runtime. Unless you can spawn these jobs on other servers/services/machines; then you don't care: your Node.js API is just a front and management layer for the job cluster, Node.js is perfectly OK for the job, and you should focus on that job cluster (at which point you could ask a better, more focused question).
Now, node.js can still be useful for you here; it can help manage and hold those hundreds of tasks, depending on where they come from (i.e., you might only allow requests through to your job server for certain users, or limit the "pause" functionality to others, etc.).
Easily perform concurrent execution of long-running processes using a simple ConcurrentQueue. Feel free to improve and share feedback.
👨🏻💻 Create your own custom ConcurrentExecutor and set your concurrency limit.
🔥 Boom, you've got all your long-running processes running in concurrent mode.
For understanding, you can have a look:
Concurrent Process Executor Queue
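A minimal sketch of what such a concurrency-limited executor could look like (the class name and the limit are illustrative, not the actual linked library):

class ConcurrentExecutor {
  constructor(limit) {
    this.limit = limit;   // max tasks running at once
    this.running = 0;
    this.pending = [];    // tasks waiting for a free slot
  }

  add(task) {             // task: () => Promise
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this._next();
    });
  }

  _next() {
    if (this.running >= this.limit || this.pending.length === 0) return;
    const { task, resolve, reject } = this.pending.shift();
    this.running++;
    task()
      .then(resolve, reject)
      .finally(() => { this.running--; this._next(); });
  }
}

// Usage: at most 3 long-running jobs execute concurrently.
const executor = new ConcurrentExecutor(3);
for (let i = 0; i < 10; i++) {
  executor.add(() => new Promise((r) => setTimeout(r, 1000)))
          .then(() => console.log('job ' + i + ' done'));
}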
I have a site which sometimes takes particularly long to process a request (and that's not a defect). 99% of the time it's pretty quick because it does almost no processing.
I want to show a "Loading" message when the site takes long to process the request. My site uses mod_wsgi and Apache. The way I see it, I would respond with "Loading" before completing the processing, and right before that do one of two things:
- spawn a (daemon) thread to take care of the processing, or
- communicate through a socket with another process and tell it to take care of the processing (most likely by sending a request to http://localhost:8080/do_processing).
What are the pros and cons of one approach vs the other?
Using a separate process is better, and contrary to what another answer suggests, it does not have to be hard at all, because you can use an existing system for doing exactly that, such as Celery (http://celeryproject.org/). Relying on in-process threads is not necessarily a good idea unless you are going to implement an internal job-queueing system of your own to prevent blowing out the number of threads. Also, in a multiprocess server configuration you can't be guaranteed that a request comes back to the same process, so it is not easy to get the status of a running operation. Finally, the web server processes could get killed off, and thus your background task could also be killed before it finishes; you would need a mechanism for holding state which can survive such an event if that was important. It is far easier to use something like Celery.
The process route requires quite a bit of system processing: creating a separate process is relatively expensive and slow. However, if your process crashes it doesn't affect your main governing process (you will receive the exit status code and will have an opportunity to respawn a new worker process). You will also need some sort of inter-process communication layer (a socket, pipe, shared memory, etc...), which adds complexity to your project.
Threads are lightweight and cheap. All you need to do is manage concurrent access to shared resources. So it really depends on the task you have in mind, but threads will more likely be the appropriate way to implement it.