Concurrent consumers in the context of Spring Integration - spring-integration

In a Spring Integration application, I am using concurrent consumers to consume and process multiple messages at a time.
In my application, I configured all beans as singletons. I assume that if I parallelize processing by using concurrent consumers, multiple messages will enter the same integration components.
Does this lead to data collisions between two objects?

Does this lead to data collisions between two objects?
No, it doesn't. If you don't do any state management in your components, there are not going to be any collisions, because one thread can perform only one task at a time. So, if you use the same component from different threads to perform stateless work, there is no inter-thread interaction: each thread gets its own call stack, so method arguments and local variables are never shared.
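For illustration, a minimal sketch (the handler class and channel name are hypothetical): a singleton handler stays safe under concurrent consumers as long as it holds no mutable instance state.

```java
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.stereotype.Component;

@Component
public class OrderHandler {

    // Safe under concurrent consumers: all data lives in parameters and
    // local variables, which are private to each thread's call stack.
    @ServiceActivator(inputChannel = "ordersChannel") // hypothetical channel
    public String handle(String payload) {
        return payload.trim().toUpperCase();
    }

    // Unsafe: a mutable instance field would be shared by every thread
    // entering this singleton, and that is where collisions come from.
    // private final StringBuilder buffer = new StringBuilder();
}
```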

Related

Kinesis Streams and Spring Integration Channels

I am developing a consumer which consumes events from multiple Kinesis streams. I have some questions about best practices.
Should I create one channel per stream? What factors should be considered when deciding between "channel per stream" and "one channel for all streams"?
Which channel type fits my case best performance-wise? There are different channel types, such as PollableChannel, SubscribableChannel, and DirectChannel.
Thank you
The KinesisMessageDrivenChannelAdapter is an active component: it performs consumption and message sending on its task executor. Therefore you should consider not shifting messages to a QueueChannel or an ExecutorChannel - the logic is already asynchronous and already involves enough threads on the machine. It is really much better not to shift processing to a separate thread: keep the consumption thread busy so the adapter doesn't poll more records from Kinesis into memory.
One KinesisMessageDrivenChannelAdapter can do essentially the same work for several streams as several separate adapters for different streams would - either way, the thread capacity of the machine is what gets used.
You need separate channel adapters only in the case of different processing logic, different data types, or different Kinesis client options. In all other cases a single instance is sufficient.
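A minimal sketch of that setup, assuming the spring-integration-aws KinesisMessageDrivenChannelAdapter constructor that accepts several stream names (the stream names here are hypothetical): one adapter, several streams, and a DirectChannel so processing stays on the consumption threads.

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.aws.inbound.kinesis.KinesisMessageDrivenChannelAdapter;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.messaging.MessageChannel;

@Configuration
public class KinesisConfig {

    // DirectChannel: downstream processing runs on the adapter's own
    // consumption threads, with no extra hand-off to a queue or executor.
    @Bean
    public MessageChannel kinesisChannel() {
        return new DirectChannel();
    }

    // One adapter serving several streams.
    @Bean
    public KinesisMessageDrivenChannelAdapter kinesisAdapter(AmazonKinesis amazonKinesis) {
        KinesisMessageDrivenChannelAdapter adapter =
                new KinesisMessageDrivenChannelAdapter(amazonKinesis, "streamA", "streamB");
        adapter.setOutputChannel(kinesisChannel());
        return adapter;
    }
}
```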

What abstraction to use for an asynchronous data collection driver

I would like to implement a mechanism in my server application, but I'm not sure which OTL abstraction would be most appropriate.
My application collects data from various types of equipment.
Some of them use synchronous communication, generating Delphi events in my server application (push-like).
Some of them use asynchronous communication, requiring my application to periodically request the latest available data (pull-like).
Because I want my server application to stay responsive while polling for newly available data as frequently as possible, I want to put that "pull driver" in a separate thread that requests all the configured data points one by one.
I'd like my main thread to spawn this OTL object and then receive the results as a Delphi event in the main thread. This would emulate the push-like behavior that my server's main code is already written for.
Think of it as a thread you launch that periodically requests a value you want to monitor and only sends you an event when that value has changed.
Which OTL abstraction (high-level? low-level?) do you think would be appropriate for this behavior?
Thank you.
I'm not sure OTL gives you much benefit here at all, to be honest. I write a lot of classes for managing hardware devices, and the class model is almost invariably a plain TThread descendant. OTL is nice for spinning off tasks and work packages, queues, parallel calculations, etc. In this case, however, you don't want to do any of that. What you do want is a class that models your device and encapsulates the functions it can perform.
This is going to be a single worker thread dedicated to pumping reads and writes to the device. It is going to be a long-lived thread that persists as long as the class that encapsulates the device remains alive - TThread makes sense for this. Your thread is going to be a simple loop that runs continuously, polling all the required data and flushing any write requests.
The class will also serve as a data cache for the device parameters, and you will need some sort of synchronization primitives (mutex, critical section, etc.) to protect reads and writes to those fields through properties, so again it makes sense that these sync objects also exist as class fields and that your thread and class model live together in a single entity. If you want event notifications, these too wrap conveniently into the same model. One device, one thread, one class. It's a perfect job for a TThread descendant.
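Although the question is about Delphi and OTL, here is the same "one device, one thread, one class" shape sketched in Java for illustration (all names are hypothetical): a long-lived worker thread, a lock-protected value cache, and a change-only notification.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

// One device, one thread, one class: a long-lived worker polls the
// device, caches the last value under a lock, and fires a callback
// only when the value changes.
public class DeviceMonitor implements AutoCloseable {

    private final Object lock = new Object();
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final IntConsumer onChange;
    private final Thread worker;
    private int cachedValue;

    public DeviceMonitor(IntConsumer onChange) {
        this.onChange = onChange;
        this.worker = new Thread(this::pollLoop, "device-monitor");
        this.worker.start();
    }

    // Thread-safe read of the cached parameter.
    public int value() {
        synchronized (lock) {
            return cachedValue;
        }
    }

    private void pollLoop() {
        while (running.get()) {
            int latest = readFromDevice();   // placeholder for real device I/O
            boolean changed;
            synchronized (lock) {
                changed = latest != cachedValue;
                cachedValue = latest;
            }
            if (changed) {
                onChange.accept(latest);     // the "value changed" event
            }
            try {
                Thread.sleep(100);           // polling interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private int readFromDevice() {
        return 0; // hypothetical: replace with the actual device read
    }

    @Override
    public void close() {
        running.set(false);
        worker.interrupt();
    }
}
```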

Re-engineer POJOs to EJBs or client transaction

I have a couple of questions regarding EJB transactions. I have a situation where a process has become longer running than originally intended and is sometimes failing due to server timeouts being exceeded. While I initially increased the timeouts (both total transaction and max transaction), I know that for a long-running process it makes more sense to segment the work as much as possible into smaller units that don't fail based on timeout. As a result, I'm looking for thoughts or references regarding the next course of action, based on the background below and the questions that follow.
Environment:
EJB 3.1, JPA 2.0, WebSphere 8.5
Background:
I built a set of POJOs to do some batch-oriented work for an enterprise application. They are non-EJB POJOs that were intended to implement several business processes (5 related, sequential processes, each depending on its predecessor). The POJOs are in a plain Java project, not an EJB project.
However, these POJOs access an EJB facade for database access via JPA. The abstract core of the 5 business processes does the JNDI lookup for the EJB facade in order to return the domain objects for processing. Originally, the design was to run entirely on the server; however, a need arose to initiate these processes externally. As a result, I created an EJB wrapper so that the processes could be called remotely (individually or as a single process based on a common strategy interface). Unfortunately, the size of the data, both row width and row count, has grown well beyond the original intent.
The processing time required to complete these batch processes has increased significantly (from around a couple of hours to around half a day, and it could increase beyond that). Only one of the 5 processes made sense to multi-thread (I did implement it multi-threaded). Since I have the wrapper EJB to initiate one or all of them, I have decided to create a new container transaction for each process, as opposed to the single default "required" transaction when I run all of them as a single process. Since the one process is multi-threaded, it would make sense to attempt to create a new transaction per thread; however, being a group of POJOs, they have no transaction capability.
Question:
So my question is: what makes more sense, and why? Re-engineer the POJOs to be EJBs themselves and have the wrapper EJB instantiate each process as a child process, where each can have its own transaction and, more importantly, the multi-threaded process can create a transaction per thread? Or does it make more sense to create a UserTransaction in the POJOs from a JNDI lookup in the container and try to manage it as if it were a bean-managed transaction (if that's even a viable solution)? I know this may be application-dependent, but what is reasonable with regard to timeouts for a Java EE container? Obviously, I don't want runaway processes, but I want to make sure that I can complete these batch processes.
Unfortunately, this application has already been deployed as a production system. Re-engineering, though it may be little more than assembling the strategy logic in EJBs, is a large change to the functionality.
I did look around for some other threads here and via general internet searches, but thought I would see if anyone had compelling arguments for one over the other or another solution entirely. Additional links that talk about a topic such as this are appreciated. I wrestled with whether to post this since some may construe this as subjective, however, I felt the narrowed topic was worth the post and potentially relevant to others attempting processes like this.
This is not direct answer to your question, but something you could consider.
WebSphere 8.5 provides a batch container especially for these kinds of (batch) applications. The batch function accommodates applications that must perform batch work alongside transactional applications. Batch work might take hours or even days to finish and uses large amounts of memory or processing power while it runs. You can reuse your Java classes in batch applications, batch steps can be run in parallel across a cluster, and the container provides transaction checkpoint management.
Take a look at following resources:
IBM Education Assistant - Batch applications
Getting started with the batch environment
Since I really didn't get a whole lot of response or thoughts for this question over the past couple of weeks, I figured I would answer this question to hopefully help others in making a decision if they run across this or a similar situation.
Ultimately, I re-engineered one of the POJOs into an EJB that acted as a wrapper to call the other POJOs. The wrapper EJB performed the same activity as when it was just a POJO, except that I added transaction semantics (REQUIRES_NEW) on the primary method. The primary method calls the other POJOs based on a strategy pattern, so each call (or POJO) gets its own transaction. Other methods in the EJB that call the primary method were defined with NOT_SUPPORTED so that I could separate the transactions for each call to the primary method and not join an existing transaction.
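A minimal sketch of that shape (all names are hypothetical). Note that for REQUIRES_NEW to take effect on a call from within the same bean, the primary method has to be invoked through the container rather than by a plain this call, e.g. via SessionContext.getBusinessObject:

```java
import javax.annotation.Resource;
import javax.ejb.SessionContext;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;

// Hypothetical strategy interface implemented by each POJO process.
interface BatchProcess {
    void run();
}

@Stateless
public class BatchWrapperBean {

    @Resource
    private SessionContext ctx;

    // Each call gets its own container transaction.
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void runProcess(BatchProcess process) {
        process.run();
    }

    // Runs outside any transaction, so every runProcess call below starts
    // a fresh one; the self-call goes through the container proxy so the
    // REQUIRES_NEW attribute is actually applied.
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void runAll(Iterable<BatchProcess> processes) {
        BatchWrapperBean self = ctx.getBusinessObject(BatchWrapperBean.class);
        for (BatchProcess p : processes) {
            self.runProcess(p);
        }
    }
}
```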
Full disclosure: the original addition of transaction semantics significantly increased the processing time (on the order of days), but the process no longer failed by exceeding transaction timeouts. The slowdown was the result of some unexpected problems with JPA Many-To-One relationships that were bringing back too much data retrieved through the relationship. As I mentioned originally, some of my data row widths increased unexpectedly; that increase was in the related table object, and the query did not need that data at the time. I corrected those issues by changing my queries (creating objects for SELECT NEW queries, changing relationships to FetchType.LAZY, etc.).
Going forward, if I am able to dedicate enough time, I will transform the rest of those POJOs into EJBs. The threaded POJO that does the most significant amount of work has been implemented as a Callable that is run via an ExecutorService. If I can transform that one, the plan will be to give each thread its own transaction. However, while I'm not sure yet, it appears that my container may already be creating transactions for each thread group (of 10 threads), based on status updates I'm seeing. I will have to do more investigation.

JMS MDB or ScheduledThreadPoolExecutor for asynchronous tasks

I've been using JMS message-driven beans for a while, and they work great for asynchronous tasks. I know that there are many ways to handle asynchronous processes, but I am curious: what are the benefits of a JMS message-driven bean compared to a ScheduledThreadPoolExecutor?
For example, I have a web service which handles some tasks asynchronously. I see two main differences. If I used a ScheduledThreadPoolExecutor, I wouldn't need an application server; I could use a servlet container, e.g. Tomcat, because I would not be using any EJB features. For an MDB I need an application server, e.g. GlassFish. But in terms of handling the actual asynchronous processing, what are the advantages of each, ScheduledThreadPoolExecutor versus MDB?
ScheduledThreadPoolExecutor is used to schedule tasks; the abstraction best corresponding to an MDB is ExecutorService. But back to your question.
MDB is more heavyweight, its API is much more complex, and in principle it was designed for transferring data, not logic. ExecutorService, on the other hand, is a thin layer on top of an actual thread pool. So if you need performance, low latency, and small overhead, go for an ordinary thread pool.
The only reason for MDB and JMS is when you need durability and transaction support. That of course introduces even bigger overhead, as each message needs to be persisted. But you won't lose any queued tasks, and even tasks in the middle of processing are not lost if the server crashes.
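For contrast, a minimal sketch of the plain thread-pool side: fast and simple, but anything queued in memory is lost if the JVM crashes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncTasks {
    public static void main(String[] args) throws InterruptedException {
        // In-memory queue + worker threads: low overhead, low latency,
        // but tasks vanish if the JVM dies before they finish.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        pool.submit(() -> System.out.println("processing task"));
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```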

How to process data in multiple threads using EJB3?

Sometimes it would be useful to distribute the processing of some data to several threads in an EJB3 session bean.
Let's say that a stateless session bean fetches a lot of data from the database, splits it into several partitions, and wants to spawn the processing of those partitions in their own parallel threads. What is the best way to accomplish this? Using message-driven beans?
EDIT:
I would also need to somehow be informed when all the MDBs have finished processing their data, so that the results could be combined and sent back to the requester.
Yes. MDB. You are not permitted to start your own threads in an EJB, according to the spec.
Just a reminder: the EJB 3 framework does all thread management for you. For the developer, it is single-threaded, thread-safe programming. You are not allowed to create your own threads.
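A minimal sketch of the MDB approach (the queue name and payload class are hypothetical): the session bean sends one JMS message per partition, and the container dispatches them to pooled MDB instances in parallel.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.ObjectMessage;

// Each JMS message carries one partition; the container's MDB pool
// processes the messages concurrently.
@MessageDriven(mappedName = "jms/partitionQueue", activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue")
})
public class PartitionWorkerBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            Partition partition = (Partition) ((ObjectMessage) message).getObject();
            process(partition);
            // For the "all done" requirement, each worker could send a
            // completion message to a reply queue that the requester counts,
            // or decrement a shared counter in the database.
        } catch (Exception e) {
            throw new RuntimeException(e); // rollback triggers redelivery
        }
    }

    private void process(Partition partition) {
        // work on one partition of the data
    }
}

// Hypothetical serializable payload describing one partition.
class Partition implements java.io.Serializable {
    long fromId;
    long toId;
}
```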
