Spring IntegrationFlow Disallow Concurrent Execution

Is there a way we could instruct the Spring IntegrationFlow DSL to avoid concurrent execution, for example, when the first run has not yet completed but, per the poller, it is already time for the second run? Similar to the @DisallowConcurrentExecution annotation we use with the Quartz scheduler in Spring Batch.
Thanks

To avoid concurrent execution from the polling channel adapter, you must not use anything that could lead to parallelism. First: don't use a TaskExecutor for the polling tasks. Second: don't use fixedRate, only fixedDelay. The latter schedules the next polling task only when the previous one has finished. And if you don't shift the work to another thread (see that TaskExecutor), everything is performed on the same scheduler thread.
Technically, what you are asking for is there by default if you just use fixedDelay for the poller and no other options are configured.
Although you need to keep in mind that the rest of your flow must be direct as well: no ExecutorChannel or QueueChannel in use!
Also see docs for conditional polling if you still cannot make your flow direct and blocking: https://docs.spring.io/spring-integration/docs/current/reference/html/core.html#conditional-pollers
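For illustration, a minimal Java DSL sketch of that default arrangement (assuming Spring Integration 5.x; the configuration class and the someSource message source are hypothetical placeholders):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;

@Configuration
public class NonConcurrentFlowConfig {

    @Bean
    public IntegrationFlow nonConcurrentFlow(MessageSource<?> someSource) {
        return IntegrationFlows.from(someSource,
                        // fixedDelay: the next poll is scheduled only after the
                        // previous one finishes; no taskExecutor() is configured,
                        // so the work stays on the single scheduler thread.
                        e -> e.poller(Pollers.fixedDelay(5000)))
                // Only direct channels downstream: no ExecutorChannel or QueueChannel.
                .handle(m -> System.out.println("Processed: " + m.getPayload()))
                .get();
    }
}

Because the entire flow is direct, a poll cycle cannot overlap with the previous one.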

Related

Spring Integration MDC for Async Flow and Task Executors

I have a flow that starts with a poller and hands off the message to several async flows downstream using task-executors to execute in parallel for a given dataset. A downstream aggregator completes the flow and notifies the poller that the flow is complete.
I would like to track every execution of the poller by using MDC so that the logs can be mapped to a particular execution of the flow.
I started by adding MDC to the poller thread (using Advice), however with this approach there could be a couple of issues:
How do I stamp the MDC on the executor thread when the async hand-off happens?
Since the executor uses a thread pool, do I need to clear the MDC before the thread returns to the pool? Will there be any side effects?
Another approach would be to add the MDC to the Message header and set it manually on the new thread during the async handoff. How would I do that? For example, if I turn on debug logs, the MDC should be stamped right from the beginning of the new thread's execution, not from the point where my logic starts in the service activator.
How to set this on the task-executor thread (and probably also remove before returning to the pool) using XML configuration? Something like an MdcAwareThreadPoolExecutor seen here.
Also, I would not want the MDC logic to be spread across all the async handoff endpoints; maybe there is some generic way to configure it?
Is there a better way to achieve this? Any known solutions?
I would like to track every execution of the poller by using MDC so that the logs can be mapped to a particular execution of the flow.
That is essentially "you would like to track the message's journey through your flow". As you noticed, there is a way to set a message header. So, why not just correlate your logs by that specific header?
You can take a look at the Message History pattern for how to gather the whole path of a message; then, in the logs, you can trace it back by looking at the message headers.
See here: https://docs.spring.io/spring-integration/docs/5.3.2.RELEASE/reference/html/system-management.html#message-history
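For annotation configuration, enabling the pattern looks roughly like this (the class name is illustrative):

import org.springframework.context.annotation.Configuration;
import org.springframework.integration.config.EnableIntegration;
import org.springframework.integration.config.EnableMessageHistory;

// Each named component appends an entry to the message history header as the
// message travels through the flow, so logs can be correlated afterwards.
@Configuration
@EnableIntegration
@EnableMessageHistory
public class HistoryConfig {
}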
If you really still insist on the MDC, then you definitely need to take a look at some MDCDelegatingExecutorDecorator. You can borrow a sample from Spring Security and its DelegatingSecurityContextExecutor: https://docs.spring.io/spring-security/site/docs/5.4.0/reference/html5/#concurrency
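If you do go the MDC route, here is a minimal sketch of such a decorator in the spirit of DelegatingSecurityContextExecutor (the MdcDelegatingExecutor class name is illustrative; MDC is SLF4J's):

import java.util.Map;
import java.util.concurrent.Executor;

import org.slf4j.MDC;

public class MdcDelegatingExecutor implements Executor {

    private final Executor delegate;

    public MdcDelegatingExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        // Snapshot the submitting thread's MDC at hand-off time.
        Map<String, String> context = MDC.getCopyOfContextMap();
        this.delegate.execute(() -> {
            if (context != null) {
                MDC.setContextMap(context);
            }
            try {
                task.run();
            }
            finally {
                // Clear so the pooled thread does not leak context entries.
                MDC.clear();
            }
        });
    }
}

Wrapping the hand-off executor this way also addresses the "clear the MDC before the thread returns to the pool" concern from the question: the finally block guarantees it.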

How does splitting in Spring Integration work for a web container?

I want to use Spring Integration for HTTP inbound message processing.
I know that the Spring Integration channel would run on a container thread, but if I want to use splitters, what threads would be used?
How would the result of the split be returned to the initial web request thread?
(Note: I am not 100% sure I understand your use case, but as a general remark:)
The Spring Integration splitter splits a message into multiple "smaller" messages. This is unrelated to multi-threading; that is, it does not per se imply that the smaller messages are processed in parallel. It is still a sequential stream of smaller messages.
You can then process the smaller messages in parallel by defining a handler with a given parallelism, and you can define that this handler uses a dedicated thread pool; a sketch follows below.
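A minimal Java DSL sketch of the default, single-threaded case (the gateway and flow names are illustrative, and Spring Boot's integration auto-configuration is assumed): with only direct channels, the split/aggregate round trip runs entirely on the calling web thread, and the aggregated result is returned to that thread as the gateway's reply.

import java.util.List;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.MessagingGateway;
import org.springframework.integration.dsl.IntegrationFlow;

// The web controller calls this gateway; the servlet thread blocks here
// until the aggregated reply comes back.
@MessagingGateway(defaultRequestChannel = "splitFlow.input")
interface ItemsGateway {

    List<String> process(List<String> items);
}

@Configuration
class SplitAggregateConfig {

    @Bean
    public IntegrationFlow splitFlow() {
        return f -> f
                .split()          // one message per list element, still sequential
                .<String, String>transform(String::toUpperCase)
                .aggregate();     // regroups by the sequence headers added by the splitter
        // To parallelize instead, insert an executor channel after .split(),
        // e.g. .channel(c -> c.executor(someTaskExecutor)).
    }
}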
(Sorry if this does not answer your question, please clarify).

Does Node.js need a job queue?

Say I have an express service which sends email:
const app = require('express')();
// sendEmailAsync(body) is defined elsewhere and returns a Promise

app.post('/send', function (req, res) {
    // fire-and-forget: respond before the email send completes
    sendEmailAsync(req.body).catch(console.error);
    res.send('ok');
});
This works.
I'd like to know: what's the advantage of introducing a job queue here, like Kue?
Does Node.js need a job queue?
Not generically.
A job queue solves a specific problem, usually when there is more to do than a single node.js process can handle at once, so you "queue" up things to do and may even dole them out to other processes to handle.
You may even have priorities for different types of jobs, or want to control the rate at which jobs are executed (suppose you have a rate-limit cap you have to stay below on some external server, or you just don't want to overwhelm some other server). You can also use node.js clustering to increase the number of tasks your server can handle. So, a queue is about controlling the execution of some CPU- or resource-intensive task when you have more of it to do than your server can easily execute at once. A queue gives you control over the flow of execution.
I don't see any reason for the code you show to use a job queue unless you were doing a lot of these all at once.
The specific Kue library you mention (or the similar https://github.com/OptimalBits/bull library) lists these features on its NPM page:
Delayed jobs
Distribution of parallel work load
Job event and progress pubsub
Job TTL
Optional retries with backoff
Graceful workers shutdown
Full-text search capabilities
RESTful JSON API
Rich integrated UI
Infinite scrolling
UI progress indication
Job specific logging
So, I think it goes without saying that you'd add a queue if you needed some specific queuing features and you'd use the Kue library if it had the best set of features for your particular problem.
In case it matters, your code sends res.send('ok') before it finishes with the async task and before you know whether it succeeded or not. Sometimes there are reasons for doing that, but sometimes you want to communicate back whether the operation was successful (which your code is not doing).
Basically, the point of a queue here would simply be to give you more control over the execution of those sends.
This could be for things like throttling how many you send at once, giving priority to other actions first, or evening out the flow (i.e., if 10,000 are requested at the same time, you don't try to send all 10,000 at once and kill your server).
What exactly you use your queue for, and whether it would be of any benefit, depends on your actual situation and use cases. At the end of the day, it's just about controlling the flow.

Transaction boundaries without using pollers

Our project has the following flow pattern:
<input-flow> | <routing-flow> | <output-flow>
Where the pipes symbolize the transaction boundaries, and all flows are multi-threaded using TaskExecutors. In the input-flow the transaction is started by the message-driven-channel-adapter, but in the routing-flow and output-flow it is currently started by a poller, which causes latency.
To avoid the poller latency, I would like to create the transaction boundaries using ExecutorChannels, but the ExecutorChannel does not start a transaction for the flow.
Are there other possibilities to achieve this?
You can avoid the latency by reducing the polling interval (even to 0) and increasing the receive timeout (at the expense of tying up a scheduler thread to wait for messages).
For an executor channel, you can insert a transactional gateway in the flow (see this answer for an example), or use AOP to start the transaction on a direct channel send() somewhere downstream of the executor.
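A rough sketch of that transactional gateway idea (the interface, method, and channel names are hypothetical, and this assumes annotation-driven transaction management advises the gateway proxy):

import org.springframework.integration.annotation.Gateway;
import org.springframework.integration.annotation.MessagingGateway;
import org.springframework.transaction.annotation.Transactional;

// Calling route(...) on the executor thread opens a transaction that spans
// the downstream direct flow and commits (or rolls back) when it returns.
@MessagingGateway
public interface TxBoundaryGateway {

    @Gateway(requestChannel = "routingFlowInput")
    @Transactional
    Object route(Object payload);
}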

Configure a task executor for parallel processing of intermediate steps under transaction

The SI workflow starts with an inbound-channel-adapter which runs under a new transaction started by a poller.
The adapter triggers a data processing flow which kind of fans out; for example, the adapter polls a database and gets a few rows, splits them, and the next adapter makes another DB call for each row as input, which produces several other rows, and so on.
Right now it runs with single-threaded behavior, as I only want to commit the original transaction when everything went well.
Now I want to speed up the processing by running it on more threads, so if my original adapter produced 3 rows, I want to process them simultaneously in the downstream flow, and so on.
Is this possible? If yes, how can I define a global task executor with some configuration and allow the processing at various stages to be executed on it?
Thank you
You could insert a <gateway/> in the flow between the inbound adapter and the rest of the flow; the poller thread will be suspended in the gateway awaiting a reply. You would likely need an aggregator to send the reply to the gateway when the other tasks are complete. You would discard the reply.
<service-activator input-channel="fromAdapter" ref="gw" output-channel="nullChannel" />
<gateway id="gw" default-request-channel="toRestOfFlow" />
The problem is that if the reply is never received, the thread will wait forever. If you add a reply-timeout, the transaction will commit after the timeout. So you may need some additional logic to handle that, perhaps adding your own bean that invokes the gateway and detects a null reply; see the sketch below.
Of course, in all of these scenarios, the async tasks cannot participate in the original transaction.
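A rough sketch of that guard bean (the FlowGateway interface backing the <gateway/> and all names here are hypothetical):

// Service interface backed by the <gateway/> above.
interface FlowGateway {

    Object send(Object payload);
}

public class GatewayInvoker {

    private final FlowGateway gateway;

    public GatewayInvoker(FlowGateway gateway) {
        this.gateway = gateway;
    }

    public Object invoke(Object payload) {
        // Blocks for up to the configured reply-timeout.
        Object reply = this.gateway.send(payload);
        if (reply == null) {
            // A null reply means the timeout elapsed; throw so the poller's
            // transaction rolls back instead of silently committing.
            throw new IllegalStateException("No reply received within the timeout");
        }
        return reply;
    }
}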
