We have a requirement where we need to process multiple files from a folder (each file having 50K-100K records) and after doing some calculation store the data in database. Once the content of the file is processed then we need to trigger a second stream to do some further processing.
The way we have designed the solution is
File Reader -- File Splitter -- Processor -- Re-sequencer (release strategy per record) -- Sink (Load Data in Database)
The issue with the above design is from re-sequencer and Sink we are not able to figure out if all the contents of a file are successfully stored in database without which we cannot invoke the second stream.
We have done some benchmark activities with aggregator and it seems to be much slower than re-sequencer. Are there any known performance issues with aggregator? Are there any serious performance degradation if we have a release strategy of releasing only when all the data is with re-sequencer
Thanks,
Jayadeep
Processor 1 - Splits the lines
Processor 2 - Calculation on each message
<int:service-activator input-channel="input" ref="singlePointCalculator" method="onMessage"/>
<bean id="singlePointCalculator" class="com.processor.SinglePointCalculator">
<constructor-arg index="0" ref="output"/>
</bean>
Sink
<int:resequencer input-channel='input' output-channel='reseqChannel' release-strategy-expression="size() == 1" ></int:resequencer>
<int:transformer ref="mapTransformer" input-channel="reseqChannel" output-channel="mapChannel"/>
<bean id="mapTransformer" class="com.sink.MapTransformer">
<property name="columns" value=" side,value" />
</bean>
<int-jdbc:outbound-channel-adapter
data-source="dataSource"
channel="mapChannel"
query=" ">
</int-jdbc:outbound-channel-adapter>
Sink Using Aggregator
<int:aggregator input-channel='input' output-channel='aggChannel' " ></int:aggregator >
<int:transformer ref="mapTransformer" input-channel="aggChannel" output-channel="mapChannel"/>
<bean id="mapTransformer" class="com.sink.MapTransformer">
<property name="columns" value=" side,value" />
</bean>
<int-jdbc:outbound-channel-adapter
data-source="dataSource"
channel="mapChannel"
query=" ">
</int-jdbc:outbound-channel-adapter>
Related
Referring to my earlier question at URL - Spring integration multithreading requirement - I think I may have figured out the root cause of the issue.
My requirement in brief -
Poll the database after a fixed delay of 1 sec and then publish very limited data to Tibco EMS queue. Now from this EMS queue I have to do the following tasks all in multithreaded fashion :- i) consume the messages, ii) fetch the full data now from the database and iii) converting this data into json format.
My design -
`<int:channel id="dbchannel"/>
<int-jdbc:inbound-channel-adapter id="dbchanneladapter"
channel="dbchannel" data-source="datasource"
query="${selectquery}" update="${updatequery}"
max-rows-per-poll="1000">
<int:poller id="dbchanneladapterpoller"
fixed-delay="1000">
<int:transactional transaction-manager="transactionmanager" />
</int:poller>
</int-jdbc:inbound-channel-adapter>
<int:service-activator input-channel="dbchannel"
output-channel="publishchannel" ref="jdbcmessagehandler" method="handleJdbcMessage" />
<bean id="jdbcmessagehandler" class="com.citigroup.handler.JdbcMessageHandler" />
<int:publish-subscribe-channel id="publishchannel"/>
<int-jms:outbound-channel-adapter id="publishchanneladapter"
channel="publishchannel" jms-template="publishrealtimefeedinternaljmstemplate" />
<int:channel id="subscribechannel"/>
<int-jms:message-driven-channel-adapter
id="subscribechanneladapter" destination="subscriberealtimeinternalqueue"
connection-factory="authenticationconnectionfactory" channel="subscribechannel"
concurrent-consumers="5" max-concurrent-consumers="5" />
<int:service-activator input-channel="subscribechannel"
ref="subscribemessagehandler" method="logJMSMessage" />
<bean id="subscribemessagehandler" class="com.citigroup.handler.SubscribeJMSMessageHandler" />
</beans>
<bean id="authenticationconnectionfactory"
class="org.springframework.jms.connection.UserCredentialsConnectionFactoryAdapter">
<property name="targetConnectionFactory" ref="connectionFactory" />
<property name="username" value="test" />
<property name="password" value="test123" />
</bean>
<bean id="connectionFactory" class="org.springframework.jndi.JndiObjectFactoryBean">
<property name="jndiTemplate">
<ref bean="jndiTemplate" />
</property>
<property name="jndiName" value="app.jndi.testCF" />
</bean>
<bean id="subscriberealtimeinternalqueue" class="org.springframework.jndi.JndiObjectFactoryBean">
<property name="jndiTemplate">
<ref bean="jndiTemplate" />
</property>
<property name="jndiName"
value="app.queue.testQueue" />
</bean>
<bean id="jndiTemplate" class="org.springframework.jndi.JndiTemplate">
<property name="environment">
<props>
<prop key="java.naming.factory.initial">com.tibco.tibjms.naming.TibjmsInitialContextFactory
</prop>
<prop key="java.naming.provider.url">tibjmsnaming://test01d.nam.nsroot.net:7222</prop>
</props>
</property>
</bean>`
Issue -
Using message-driven-channel with concurrent consumers value set to 5. However, it looks like just one consumer thread (container-2) is created and is picking up the messages from EMS queue. Please find below the log4j log -
16 Aug 2018 11:31:12,077 INFO SubscribeJMSMessageHandler [subscribechanneladapter.container-2][]: Total count of records read from Queue at this moment is 387
record#1:: [ID=7694066395] record#2:: [ID=7694066423] .. .. .. record#387:: [ID=6147457333]
Probable root cause here -
May be its the first step in the configuration where I am polling the database to fetch the data after a fixed-delay that's causing this multithreading issue. Referring to the logs above, my assumption here is since the number of records fetched is 387 and all these are bundled into a List object (List> message), it is being considered as just 1 message/payload instead of 387 different messages and that's why just one thread/container/consumer is picking up this bundled message. Reason for this assumption is the logs below -
GenericMessage [payload=[{"ID":7694066395},{"ID":7694066423},{"ID":6147457333}], headers={json__ContentTypeId__=class org.springframework.util.LinkedCaseInsensitiveMap, jms_redelivered=false, json__TypeId__=class java.util.ArrayList, jms_destination=Queue[app.queue.testQueue], id=e034ba73-7781-b62c-0307-170099263068, priority=4, jms_timestamp=1534820792064, contentType=application/json, jms_messageId=ID:test.21415B667C051:40C149C0, timestamp=1534820792481}]
Question -
Is my understanding of the root cause correct? If yes then what can be done to treat these 387 messages as individual messages (and not one List object of messages) and publish them one by one without impacting the transaction management??
I had discussed this issue with https://stackoverflow.com/users/2756547/artem-bilan in my earlier post on stackoverflow and I had to check this design by replacing Tibco EMS with ActiveMQ. However, ActiveMQ infrastructure is is still being analysed by our architecture team and so can't be used till its approved.
Oh! Now I see what is your problem. The int-jdbc:inbound-channel-Adapter indeed returns a list of records it could select from the DB. And this whole list is sent as a single message to the JMS. That’s the reason how you see only one thread in the consumer side: there is just only one message to get from the queue.
If you would like to have separate messages for each pulled record, you need to consider to use a <splitter> in between JDBC polling operation and sending to JMS.
We are implementing a flow where a <int-sftp:inbound-streaming-channel-adapter/> polls a directory for a file and when found it passes the stream to a service activator.
The issue is we will have multiple instances of the app running and we would like to lock the process so that only one instance can pick up the file.
Looking at the documentation, Redis Lock Registry looks to be the solution, is there an example of this being used in xml?
All I can find is a few references to it and the source code for it.
http://docs.spring.io/spring-integration/reference/html/redis.html point 24.1
Added info:
Ive added the RedisMetaDataStore and SftpSimplePatternFileListFilter. It does work but it does have one oddity, when sftpInboundAdapter is activated by the poller it adds an entry for each file in the metadatastore. Say there are 10 files, there would be 10 entries in the datastore, but it does not process all 10 files in "1 go", only 1 file is processed per poll from the adapter, which would be fine, but in a multi instance environment if the server which picked up the files went down after processing 5 files, another server doesn't seem to pick up the remaining 5 files unless the files are "touched".
Is the behaviour of picking up 1 file per poll correct or should it process all valid files during one poll.
Below is my XML
<int:channel id="sftpInbound"/> <!-- To Java -->
<int:channel id="sftpOutbound"/>
<int:channel id="sftpStreamTransformer"/>
<int-sftp:inbound-streaming-channel-adapter id="sftpInboundAdapter"
channel="sftpInbound"
session-factory="sftpSessionFactory"
filter="compositeFilter"
remote-file-separator="/"
remote-directory="${sftp.directory}">
<int:poller cron="${sftp.cron}"/>
</int-sftp:inbound-streaming-channel-adapter>
<int:stream-transformer input-channel="sftpStreamTransformer" output-channel="sftpOutbound"/>
<bean id="compositeFilter"
class="org.springframework.integration.file.filters.CompositeFileListFilter">
<constructor-arg>
<list>
<bean
class="org.springframework.integration.sftp.filters.SftpSimplePatternFileListFilter">
<constructor-arg value="Receipt*.txt" />
</bean>
<bean id="SftpPersistentAcceptOnceFileListFilter" class="org.springframework.integration.sftp.filters.SftpPersistentAcceptOnceFileListFilter">
<constructor-arg ref="metadataStore" />
<constructor-arg value="ReceiptLock_" />
</bean>
</list>
</constructor-arg>
</bean>
<bean id="redisConnectionFactory"
class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory">
<property name="port" value="${redis.port}" />
<property name="password" value="${redis.password}" />
<property name="hostName" value="${redis.host}" />
</bean>
No; you need to use a SftpPersistentAcceptOnceFileListFilter (docs here) with a Redis (or some other) metadata store, not a lock registry.
EDIT
Regarding your comment below.
Yes, it's a known issue; in the next release we've added a max-fetch-size for exactly this reason - so the instances can each retrieve some of the files rather than the first instance grabbing them all.
(The inbound adapter works by first copying files found, that are not already in the store, to the local disk, and then emits them one at a time).
5.0 only available as a milestone right now M2 at the time of writing, but the current version and milestone repo can be found here; it won't be released for a few more months.
Another alternative would be to use outbound gateways - one to LS the files and one to GET individual files; your app would have to use the metadata store itself, though, to determine which file(s) can be fetched.
I am using Spring Batch 3.2 for bulk migration of data from XML to Database.
My XML contains around 140K users and I want to dump it into DB.
I do not want to proceed it in a single thread.
I tried using TaskExecutor but not able to succeed due to below error.
at java.lang.Thread.run(Thread.java:724)
Caused by: com.ctc.wstx.exc.WstxParsingException: Unexpected close tag </consumerName>; expected </first>.
at [row,col {unknown-source}]: [4814,26]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:606)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:479)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:464)
at com.ctc.wstx.sr.BasicStreamReader.reportWrongEndElem(BasicStreamReader.java:3283)
at com.ctc.wstx.sr.BasicStreamReader.readEndElem(BasicStreamReader.java:3210)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2829)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1072)
at org.codehaus.stax2.ri.Stax2EventReaderImpl.peek(Stax2EventReaderImpl.java:367)
at org.springframework.batch.item.xml.stax.DefaultFragmentEventReader.nextEvent(DefaultFragmentEventReader.java:114)
at org.springframework.batch.item.xml.stax.DefaultFragmentEventReader.markFragmentProcessed(DefaultFragmentEventReader.java:184)
where consumerName and first are XML nodes.
I knew StaxEventItemReader is not a thread safe and multiple threads are using XML and due to that issue, there is some problem in marking fragments as processed and I am not able to get unique record as well as complete fragment to process.
Can any one suggest me how Can I use multi-threading/partitioning in my case.
What I want
By using multi-threading, how can I make sure that each thread get unique chuck i.e (Thread 1 - fragement 1-100, Thread 2 - fragement 101-200.... so on)
Each thread process unique chuck and dump into DB.
My configuration
<batch:job id="sampleJob">
<batch:step id="multiThreadStep" allow-start-if-complete="true">
<batch:tasklet transaction-manager="transactionManager" task-executor="taskExecutor" throttle-limit="10">
<batch:chunk reader="xmlItemReader" writer="itemWriter"
processor="itemProcessor" commit-interval="10" skip-limit="1500000">
<batch:skippable-exception-classes>
<batch:include class="java.lang.Exception" />
</batch:skippable-exception-classes>
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
<!-- <bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor">
<property name="concurrencyLimit" value="10"/>
</bean> -->
<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="10" />
<property name="maxPoolSize" value="10" />
<property name="allowCoreThreadTimeOut" value="true" />
</bean>
<bean id="xmlItemReader" class="org.springframework.batch.item.xml.StaxEventItemReader">
<property name="fragmentRootElementName" value="userItem" />
<property name="unmarshaller" ref="userDetailUnmarshaller" />
<property name="saveState" value="false" />
</bean>
Sample XML
<userItems>
<userItem>
<...>
<...>
</userItem>
<userItem>
<...>
<...>
</userItem>
...
...
</userItems>
I have following requirement to implement for each incoming message into our system
Convert incoming to to XML file and save in database
Read that file from data base and Send it SFTP
Delete the entry once File send is sucessful.
I have implemented Step-1 successfully :
<int-jdbc:outbound-channel-adapter query="insert into table_name(id, payload) values (:headers[fileName], :payload)" channel="outputChannel" jdbc-operations="jdbcTemplate">
<int:poller fixed-rate="1000" />
</int-jdbc:outbound-channel-adapter>
While implementing Step-2, if I read all entries together, I get the payload in List which spring-sftp doesn't support it. If read each record to and send to SFTP, its a performance problem. Any suggestion please?
<int-jdbc:inbound-channel-adapter row-mapper="rowMapper" query="select * from table_name" channel="sftpChannel" jdbc-operations="jdbcTemplate">
<int:poller fixed-rate="1000" />
</int-jdbc:inbound-channel-adapter>
How to delete the record upon file send successful?. I guess payload.delete() won't work here.
<sftp:outbound-channel-adapter auto-startup="true" id="sftpOutboundAdapter" session-factory="sftpSessionFactory" channel="sftpChannel" charset="UTF-8" remote-directory="${spectrum.ftp.path}" remote-filename-generator-expression="headers[fileName] + '.xml'">
<sftp:request-handler-advice-chain>
<bean class="org.springframework.integration.handler.advice.ExpressionEvaluatingRequestHandlerAdvice">
<property name="onSuccessExpression" value="payload.delete()" />
</bean>
</sftp:request-handler-advice-chain>
</sftp:outbound-channel-adapter>
I have an application relying on Spring Integration (4.0.4.RELEASE) and RabbitMQ. My flow is as follow:
Messages are put in queue via a process (they do not expect any answer):
Gateway -> Channel -> RabbitMQ
And then drained by another process:
RabbitMQ --1--> inbound-channel-adapter A --2--> chain B --3--> aggregator C --4--> service-activator D --5--> final service-activator E
Explanations & context
The specific thing is that nowhere in my application I am using a splitter: aggregator C just waits for enough messages to come, or for a timeout to expire, and then forwards the batch to service D. Messages can get stuck in aggregator C for quite a long time, and should NOT be considered as consumed there. They should only be consumed once service D successfully completes. Therefore, I am using MANUAL acknowledgement on inbound-channel-adapter A and service E is in charge of acknowledging the batch.
Custom aggregator
I solved the acknowledgement issue I had when set to AUTO by redefining the aggregator. Indeed, messages are acknowledged immediately if any asynchronous process occurs in the flow (see question here). Therefore, I switched to MANUAL acknowledgement and implemented the aggregator like this:
<bean class="org.springframework.integration.config.ConsumerEndpointFactoryBean">
<property name="inputChannel" ref="channel3"/>
<property name="handler">
<bean class="org.springframework.integration.aggregator.AggregatingMessageHandler">
<constructor-arg name="processor">
<bean class="com.test.AMQPAggregator"/>
</constructor-arg>
<property name="correlationStrategy">
<bean class="com.test.AggregatorDefaultCorrelationStrategy" />
</property>
<property name="releaseStrategy">
<bean class="com.test.AggregatorMongoReleaseStrategy" />
</property>
<property name="messageStore" ref="messageStoreBean"/>
<property name="expireGroupsUponCompletion" value="true"/>
<property name="sendPartialResultOnExpiry" value="true"/>
<property name="outputChannel" ref="channel4"/>
</bean>
</property>
</bean>
<bean id="messageStoreBean" class="org.springframework.integration.store.SimpleMessageStore"/>
<bean id="messageStoreReaperBean" class="org.springframework.integration.store.MessageGroupStoreReaper">
<property name="messageGroupStore" ref="messageStore" />
<property name="timeout" value="${myapp.timeout}" />
</bean>
<task:scheduled-tasks>
<task:scheduled ref="messageStoreReaperBean" method="run" fixed-rate="2000" />
</task:scheduled-tasks>
I wanted indeed to aggregate the headers in a different way, and keep the highest value of all the amqp_deliveryTag for later multi-acknoledgement in service E (see this thread). This works great so far, apart from the fact that it is far more verbose than the typical aggregator namespace (see this old Jira ticket).
Services
I am just using basic configurations:
chain-B
<int:chain input-channel="channel2" output-channel="channel3">
<int:header-enricher>
<int:error-channel ref="errorChannel" /> // Probably useless
</int:header-enricher>
<int:json-to-object-transformer/>
<int:transformer ref="serviceABean"
method="doThis" />
<int:transformer ref="serviceBBean"
method="doThat" />
</int:chain>
service-D
<int:service-activator ref="serviceDBean"
method="doSomething"
input-channel="channel4"
output-channel="channel5" />
Error management
As I rely on MANUAL acknowledgement, I need to manually reject messages as well in case an exception occurs. I have the following definition for inbound-channel-adapter A:
<int-amqp:inbound-channel-adapter channel="channel2"
queue-names="si.queue1"
error-channel="errorChannel"
mapped-request-headers="*"
acknowledge-mode="MANUAL"
prefetch-count="${properties.prefetch_count}"
connection-factory="rabbitConnectionFactory"/>
I use the following definition for errorChannel:
<int:chain input-channel="errorChannel">
<int:transformer ref="errorUnwrapperBean" method="unwrap" />
<int:service-activator ref="amqpAcknowledgerBean" method="rejectMessage" />
</int:chain>
ErrorUnwrapper is based on this code and the whole exception detection and message rejection works well until messages reach aggregator C.
Problem
If an exception is raised while processing the messages in service-activator D, then I see this exception but errorChannel does not seem to receive any message, and my ErrorUnwrapper unwrap() method is not called. The tailored stack traces I see when an Exception("ahahah") is thrown are as follow:
2014-09-23 16:41:18,725 ERROR o.s.i.s.SimpleMessageStore:174: Exception in expiry callback
org.springframework.messaging.MessageHandlingException: java.lang.Exception: ahahaha
at org.springframework.integration.handler.MethodInvokingMessageProcessor.processMessage(MethodInvokingMessageProcessor.java:78)
at org.springframework.integration.handler.ServiceActivatingHandler.handleRequestMessage(ServiceActivatingHandler.java:71)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:170)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78)
(...)
Caused by: java.lang.Exception: ahahaha
at com.myapp.ServiceD.doSomething(ServiceD.java:153)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
(...)
2014-09-23 16:41:18,733 ERROR o.s.s.s.TaskUtils$LoggingErrorHandler:95: Unexpected error occurred in scheduled task.
org.springframework.messaging.MessageHandlingException: java.lang.Exception: ahahaha
(...)
Question
How can one tell the services that process messages coming from such an aggregator to publish errors to errorChannel? I tried to specify in the header via a header-enricher the error-channel with no luck. I am using the default errorChannel definition, but I tried as well to change its name and redefine it. I am clueless here, and even though I found this and that, I have not managed to get it to work. Thanks in advance for your help!
As you see by StackTrace your process is started from the MessageGroupStoreReaper Thread, which is initiated from the default ThreadPoolTaskScheduler.
So, you must provide a custom bean for that:
<bean id="scheduler" class="org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler">
<property name="errorHandler">
<bean class="org.springframework.integration.channel.MessagePublishingErrorHandler">
<property name="defaultErrorChannel" ref="errorChannel"/>
</bean>
</property>
</bean>
<task:scheduled-tasks scheduler="scheduler">
<task:scheduled ref="messageStoreReaperBean" method="run" fixed-rate="2000" />
</task:scheduled-tasks>
However I see the benefits from having the error-channel on the <aggregator>, where we really have several points from different detached Threads, with wich we can't get deal normally.