We have a Spring Integration application which polls from a database, marshals the results into XML, and then sends that XML as a message to a downstream system. The application works, but the issue we face is that some of the result sets can be massive. In such cases, Spring Integration appears unable to handle the transformation because the result set is too big to hold in memory. I do not see a StAX marshaller in Spring Integration, as there is in, say, Spring Batch. That actually makes sense, because messaging usually means working with many small messages, not large files.
One option we have is to develop a Spring Batch application instead.
Is there a design we could adopt for Spring Integration to handle this? Does Spring Integration have any notion of streaming? For instance, would it be possible to read the result set in chunks, transform each chunk, and send each chunk as a separate message that is part of a set? Or is Spring Batch simply a better fit?
Thanks very much
You can set max-rows-per-poll on the JDBC inbound channel adapter to limit the size of each result set.
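For illustration, here is what that could look like in Java configuration; the query, channel, and row cap are invented, and the setter has been renamed across versions (setMaxRowsPerPoll in older releases, setMaxRows in Spring Integration 5):

@Bean
@InboundChannelAdapter(channel = "resultChannel", poller = @Poller(fixedDelay = "60000"))
public MessageSource<Object> jdbcSource(DataSource dataSource) {
    JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(
            dataSource, "SELECT * FROM staging_orders WHERE processed = 0");
    adapter.setMaxRowsPerPoll(500); // cap each poll at 500 rows
    return adapter;
}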
If that's not practical, then you can use a combination of Spring Batch and Spring Integration: have the batch ItemWriter send the chunks into a Spring Integration flow via a <gateway/>.
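A rough sketch of that combination, assuming the annotation style; the gateway, channel, and item type are invented, and the List-based write signature is the pre-Spring-Batch-5 one:

@MessagingGateway
public interface ChunkGateway {

    @Gateway(requestChannel = "chunkChannel")
    void send(List<OrderXml> chunk); // each chunk becomes one message

}

public class GatewayItemWriter implements ItemWriter<OrderXml> {

    private final ChunkGateway gateway;

    public GatewayItemWriter(ChunkGateway gateway) {
        this.gateway = gateway;
    }

    @Override
    public void write(List<? extends OrderXml> items) {
        gateway.send(new ArrayList<>(items)); // hand the chunk to the integration flow
    }

}

Downstream of chunkChannel you can marshal and send each chunk exactly as your current flow does, just one bounded piece at a time.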
I manage a now fairly large Spring Integration application. Recently, we wanted to add a batch job using Spring Batch (rationale: administration, monitoring, scheduling, ability to restart in case of failure anywhere along the way).
Since we have developed a lot of connectors to the company services, we planned to reuse them in the batch, so I have been looking closely at how Spring Batch and Spring Integration work together (among other things: Spring Batch Integration). What we had in mind was to somehow implement Spring Batch tasklets using (already available) Spring Integration components.
I may have missed something fundamental at the Spring core level, but I wasn't able to figure out a simple way to "invoke" a Spring Integration endpoint from a Spring Batch tasklet (setting aside complicated plumbing like RMI calls, "remote chunking", "remote partitioning"...).
Did I miss something?
This is a very common use case (invoking an integration flow from a batch job/step).
Simply wire an integration gateway into the tasklet.
In recent applications, it's quite common to use the @MessagingGateway annotation or the Java DSL. But if you're more comfortable with XML, then a <gateway/> will work well too.
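For example, a minimal sketch of the annotation style; the gateway, channel, and payload are invented:

@MessagingGateway
public interface CompanyServiceGateway {

    @Gateway(requestChannel = "companyServiceChannel")
    String invoke(String payload);

}

@Bean
public Tasklet integrationTasklet(CompanyServiceGateway gateway) {
    return (contribution, chunkContext) -> {
        gateway.invoke("run"); // blocks until the integration flow replies
        return RepeatStatus.FINISHED;
    };
}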
I have been reading a lot of Spring Cloud DataFlow and related documentation in order to produce a data-ingest solution that will run in my organization's Cloud Foundry deployment. The goal is to poll an HTTP service for data, perhaps three times per day for the sake of discussion, and insert/update that data in a PostgreSQL database. The HTTP service seems to provide tens of thousands of records per day.
One point of confusion thus far is the best practice, in the context of a DataFlow pipeline, for deduplicating polled records. The source data have no timestamp field to aid in tracking polling, only a coarse day-level date field. I also have no guarantee that records are never updated retroactively. The records do appear to have a unique ID, so I can deduplicate them that way, but based on the documentation I am just not sure how best to implement that logic in DataFlow. As far as I can tell, the Spring Cloud Stream starters do not provide this out of the box. I was reading about Spring Integration's smart polling, but I'm not sure that's meant to address my concern either.
My intuition is to create a custom Processor Java component in a DataFlow Stream that performs a database query to determine whether polled records have already been inserted, then inserts the appropriate records into the target database, or passes them on down the stream. Is querying the target database in an intermediate step acceptable in a Stream app? Alternatively, I could implement this all in a Spring Cloud Task as a batch operation which triggers based on some schedule.
What is the best way to proceed with respect to a DataFlow app? What are common/best practices for achieving deduplication as I described above in a DataFlow/Stream/Task/Integration app? Should I copy the setup of a starter app or just start from scratch, because I am fairly certain I'll need to write custom code? Do I even need Spring Cloud DataFlow, because I'm not sure I'll be using its DSL at all? Apologies for all the questions, but being new to Cloud Foundry and all these Spring projects, it's daunting to piece it all together.
Thanks in advance for any help.
You are on the right track; given your requirements, you will most likely need to create a custom processor. You need to keep track of what has been inserted in order to avoid duplication.
There's nothing preventing you from writing such a processor in a stream app; however, performance may take a hit, since you will issue a DB query for each record.
If order is not important, you could parallelize the queries and process several messages concurrently, but in the end your DB would still pay the price.
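To make that concrete, a sketch using the Spring Cloud Stream annotation model; the Record type, its getId(), and the table are invented:

@EnableBinding(Processor.class)
public class DedupProcessor {

    private final JdbcTemplate jdbcTemplate;

    public DedupProcessor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public Record process(Record record) {
        Integer seen = jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM records WHERE id = ?",
                Integer.class, record.getId());
        // a null reply produces no output message, so duplicates are dropped
        return (seen == null || seen == 0) ? record : null;
    }

}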
Another approach would be to use a Bloom filter, which can speed up the check for already-inserted records quite a lot.
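For instance, with Guava's BloomFilter; the sizing numbers and the existsInDatabase fallback are placeholders, and only a possible hit needs the exact DB check:

private final BloomFilter<String> seenIds =
        BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

public boolean isDuplicate(String id) {
    if (!seenIds.mightContain(id)) {
        seenIds.put(id);             // definitely new: remember it, no DB query needed
        return false;
    }
    return existsInDatabase(id);     // possible false positive: confirm against the DB
}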
You can start by cloning the starter apps: have a poller trigger an HTTP client processor that fetches your data, pass it through your custom processor, and finally write to a jdbc sink. Something like:
stream create time --trigger.cron=<CRON_EXPRESSION> | httpclient --httpclient.url-expression=<remote_endpoint> | customProcessor | jdbc
One of the advantages of using SCDF is that you could independently scale your custom processor via deployment properties such as deployer.customProcessor.count=8
Spring Cloud Data Flow builds integration streams for data based on Spring Cloud Stream, which, in turn, is fully based on Spring Integration. So all the principles that exist in Spring Integration can be applied at the SCDF level as well.
It really might be the case that you won't be able to avoid some coding, but what you need is known in EIP terms as the Idempotent Receiver. And Spring Integration provides an implementation for us:
@ServiceActivator(inputChannel = "processChannel")
@IdempotentReceiver("idempotentReceiverInterceptor")
public void handle(Message<?> message) {
    // process the message; the interceptor marks or discards duplicates
}
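The interceptor it references can be as simple as a MetadataStoreSelector keyed on your record ID; the header name and the in-memory store below are just for illustration (a JDBC-backed metadata store would survive restarts):

@Bean
public IdempotentReceiverInterceptor idempotentReceiverInterceptor() {
    MetadataStoreSelector selector = new MetadataStoreSelector(
            message -> message.getHeaders().get("recordId", String.class),
            new SimpleMetadataStore());
    return new IdempotentReceiverInterceptor(selector);
}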
I am looking to understand the best way to persist the output of an integration flow containing an aggregator (with a release strategy spanning approx. 3 minutes) to a datasource.
I've been considering JDBC, but I am open to other options as well. Ideally I could use a Spring Integration component that deals well with buffered writes, maintains JDBC connections with low overhead, and can operate at the Spring Data level (I want to work with a domain model, not with raw SQL updates).
JdbcOutboundGateway and JdbcMessageHandler seem to be limited to direct SQL interaction.
I considered JdbcChannelMessageStore, but it seems that would be limited to a binary representation of the messages.
Is a service activator a better-suited choice? Would I have to read the service activator's input from a pollable queue in order to ensure better write performance?
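To illustrate the service-activator idea I have in mind; the channel capacity, poller settings, repository, and payload type are placeholders for our domain model:

@Bean
public QueueChannel aggregatedChannel() {
    return new QueueChannel(10_000); // buffer between the aggregator and the writer
}

@ServiceActivator(inputChannel = "aggregatedChannel",
        poller = @Poller(fixedDelay = "1000", maxMessagesPerPoll = "100"))
public void persist(AggregatedPayload payload) {
    repository.save(payload); // a Spring Data repository, not raw SQL
}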
Transactional constraints are also under consideration for this integration flow, with the goal of accommodating about 10K messages/hour.
Any directions are highly appreciated. I have had trouble finding resources about JDBC output from integration flows (mostly mentions of JDBC message handlers and gateways, without specific consideration of performance and data-model flexibility).
I have a large number of email addresses, and I need to write a scheduler that sends messages to them. The messages differ per recipient. I use Spring Framework 4.x.
I could write a simple class that connects to an SMTP server, but in that case I would also have to write my own threading code to send the emails in parallel.
Does Spring already have a library that gives me a more flexible way to do this? I do not want to manage threads myself. It would be nice if Spring already had this functionality.
Do I need Spring Integration for this?
Best regards,
Yes, you definitely can do that with Spring Integration, because there is an ExecutorChannel implementation which can be supplied with a TaskExecutor from Spring Core:
<channel id="sendEmailChannel">
<dispatcher task-executor="threadPoolTaskExecutor"/>
</channel>
<int-mail:outbound-channel-adapter channel="sendEmailChannel" mail-sender="mailSender"/>
But keep in mind that all Spring Integration components are plain Java underneath, and an ExecutorService is used in the background anyway.
On the other hand, if mail sending is all you need from Spring Integration, it would be overkill; you can simply use core Spring Framework facilities: a JavaMailSender bean plus @Async on the sendMail method to achieve your parallelism requirement.
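For example, a minimal sketch of that alternative, assuming @EnableAsync is on a configuration class and a JavaMailSenderImpl bean is configured:

@Service
public class MailService {

    private final JavaMailSender mailSender;

    public MailService(JavaMailSender mailSender) {
        this.mailSender = mailSender;
    }

    @Async // each invocation runs on the async TaskExecutor
    public void sendMail(String to, String subject, String text) {
        SimpleMailMessage message = new SimpleMailMessage();
        message.setTo(to);
        message.setSubject(subject);
        message.setText(text);
        mailSender.send(message);
    }

}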
UPDATE
could you tell me whether I need JMS for this situation?
I don't see any JMS-related requirements here. You don't have (or at least don't show) any real integration points in your solution. I could say the same about using Spring Integration just for email sending. However, with Spring Boot your SI config will be short enough. On the other hand, if you study Spring Integration further, you'll eventually find more value in relying on integration components for your systems, both internally and externally with other systems through JMS, AMQP, Kafka, etc.
To be honest: many years ago, my first acquaintance with Spring Integration was due to a requirement to get files from FTP with the ability to pick up new files automatically. I found the solution only in Spring Integration 1.0.0.M1. After that short XML config for the <int-ftp:inbound-channel-adapter>, I fell in love with Spring Integration, and since then it has been a part of my life. :-)
So, it's up to you whether to go ahead with Spring Integration in your simple app, or to follow the more conventional path of using JavaMailSender directly.
You could use the Java executors framework. For example, you can write something like the code below:
// from java.util.concurrent
ExecutorService executor = Executors.newWorkStealingPool();
executor.execute(() -> mailSender.send(mail)); // each send runs as its own task
Spring Integration is used to make communication between two systems easier.
So does that mean that if two systems are talking via JMS queues, we can remove the queues and integrate the two systems using the Spring Integration framework?
It looks like you should study the matter a bit more. I mean the EIP book, Spring Integration in Action, or at least the Reference Manual from the Spring IO site.
The main goal of Spring Integration is integration, and JMS is only one way to achieve it.
If your two systems can deal with JMS, nothing stops you from integrating them using Spring Integration: just provide JMS adapters for both of them.
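For example, with the Java DSL a message-driven adapter on one side could look like this; the ConnectionFactory bean and queue name are illustrative:

@Bean
public IntegrationFlow fromSystemA(ConnectionFactory connectionFactory) {
    return IntegrationFlows
            .from(Jms.messageDrivenChannelAdapter(connectionFactory)
                    .destination("system-a-queue"))
            .handle(message -> System.out.println("from system A: " + message.getPayload()))
            .get();
}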
As per my understanding,
Spring Integration is a framework-level solution to the design patterns listed at http://eaipatterns.com/.
It might be an alternative to an ESB, which is like one big tunnel that everything passes through.
Spring Integration provides end-to-end messaging between decoupled, disparate connecting points.
On the other hand, JMS is an API in the Java EE spec which can be used in conjunction with Spring Integration.
You may also want to read about AMQP, which is a messaging protocol.