We are implementing a flow where a <int-sftp:inbound-streaming-channel-adapter/> polls a directory for a file and when found it passes the stream to a service activator.
The issue is we will have multiple instances of the app running and we would like to lock the process so that only one instance can pick up the file.
Looking at the documentation, Redis Lock Registry looks to be the solution. Is there an example of it being used in XML?
All I can find is a few references to it and the source code for it.
http://docs.spring.io/spring-integration/reference/html/redis.html point 24.1
Added info:
I've added the RedisMetadataStore and SftpSimplePatternFileListFilter. It works, but with one oddity: when sftpInboundAdapter is activated by the poller, it adds an entry for each file to the metadata store. Say there are 10 files: there would be 10 entries in the store, but it does not process all 10 files in "1 go"; only 1 file is processed per poll from the adapter. That would be fine, but in a multi-instance environment, if the server which picked up the files went down after processing 5 files, another server doesn't seem to pick up the remaining 5 files unless the files are "touched".
Is the behaviour of picking up 1 file per poll correct, or should it process all valid files during one poll?
Below is my XML
<int:channel id="sftpInbound"/> <!-- To Java -->
<int:channel id="sftpOutbound"/>
<int:channel id="sftpStreamTransformer"/>
<int-sftp:inbound-streaming-channel-adapter id="sftpInboundAdapter"
channel="sftpInbound"
session-factory="sftpSessionFactory"
filter="compositeFilter"
remote-file-separator="/"
remote-directory="${sftp.directory}">
<int:poller cron="${sftp.cron}"/>
</int-sftp:inbound-streaming-channel-adapter>
<int:stream-transformer input-channel="sftpStreamTransformer" output-channel="sftpOutbound"/>
<bean id="compositeFilter"
class="org.springframework.integration.file.filters.CompositeFileListFilter">
<constructor-arg>
<list>
<bean
class="org.springframework.integration.sftp.filters.SftpSimplePatternFileListFilter">
<constructor-arg value="Receipt*.txt" />
</bean>
<bean id="SftpPersistentAcceptOnceFileListFilter" class="org.springframework.integration.sftp.filters.SftpPersistentAcceptOnceFileListFilter">
<constructor-arg ref="metadataStore" />
<constructor-arg value="ReceiptLock_" />
</bean>
</list>
</constructor-arg>
</bean>
<bean id="redisConnectionFactory"
class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory">
<property name="port" value="${redis.port}" />
<property name="password" value="${redis.password}" />
<property name="hostName" value="${redis.host}" />
</bean>
No; you need to use a SftpPersistentAcceptOnceFileListFilter (docs here) with a Redis (or some other) metadata store, not a lock registry.
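The XML in the question references a metadataStore bean but does not show its definition. A minimal sketch of what it might look like, assuming the spring-integration-redis module is on the classpath and reusing the redisConnectionFactory bean from the question:

```xml
<!-- Shared store consulted by SftpPersistentAcceptOnceFileListFilter on every instance.
     The bean id "metadataStore" matches the constructor-arg ref used in the question. -->
<bean id="metadataStore"
      class="org.springframework.integration.redis.metadata.RedisMetadataStore">
    <constructor-arg ref="redisConnectionFactory"/>
</bean>
```

Because every instance points at the same Redis server, a file accepted by one instance is already recorded in the store when another instance lists the directory.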
EDIT
Regarding your comment below.
Yes, it's a known issue; in the next release we've added a max-fetch-size for exactly this reason - so the instances can each retrieve some of the files rather than the first instance grabbing them all.
(The inbound adapter works by first copying files found, that are not already in the store, to the local disk, and then emits them one at a time).
5.0 is only available as a milestone right now (M2 at the time of writing), but the current version and milestone repo can be found here; it won't be released for a few more months.
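As a rough sketch of how that setting might be applied to the adapter from the question, assuming the attribute is spelled max-fetch-size as named above and that the 5.0 streaming adapter accepts it (the value 1 is only an illustration):

```xml
<!-- Limit how many remote files a single poll may claim, so other instances
     can pick up the rest. Everything except max-fetch-size is copied from the question. -->
<int-sftp:inbound-streaming-channel-adapter id="sftpInboundAdapter"
        channel="sftpInbound"
        session-factory="sftpSessionFactory"
        filter="compositeFilter"
        max-fetch-size="1"
        remote-file-separator="/"
        remote-directory="${sftp.directory}">
    <int:poller cron="${sftp.cron}"/>
</int-sftp:inbound-streaming-channel-adapter>
```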
Another alternative would be to use outbound gateways - one to LS the files and one to GET individual files; your app would have to use the metadata store itself, though, to determine which file(s) can be fetched.
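A sketch of that two-gateway arrangement follows. The channel names, gateway ids, and the ${sftp.local.directory} placeholder are hypothetical; the step between the two gateways (deciding which file to claim, for example with the metadata store's putIfAbsent) is application code and is not shown.

```xml
<!-- Step 1: list the remote directory named in the request payload.
     The -1 option asks for file names only rather than full file entries. -->
<int-sftp:outbound-gateway id="lsGateway"
        session-factory="sftpSessionFactory"
        request-channel="lsRequests"
        command="ls"
        command-options="-1"
        expression="payload"
        reply-channel="lsResults"/>

<!-- Step 2: fetch a single file. The request payload is the remote path of a
     file this instance has successfully claimed in the metadata store. -->
<int-sftp:outbound-gateway id="getGateway"
        session-factory="sftpSessionFactory"
        request-channel="getRequests"
        command="get"
        expression="payload"
        local-directory="${sftp.local.directory}"
        reply-channel="fetchedFiles"/>
```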
Referring to my earlier question at URL - Spring integration multithreading requirement - I think I may have figured out the root cause of the issue.
My requirement in brief -
Poll the database after a fixed delay of 1 sec and publish very limited data to a Tibco EMS queue. From this EMS queue I then have to do the following tasks, all in multithreaded fashion: i) consume the messages, ii) fetch the full data from the database, and iii) convert this data into JSON format.
My design -
`<int:channel id="dbchannel"/>
<int-jdbc:inbound-channel-adapter id="dbchanneladapter"
channel="dbchannel" data-source="datasource"
query="${selectquery}" update="${updatequery}"
max-rows-per-poll="1000">
<int:poller id="dbchanneladapterpoller"
fixed-delay="1000">
<int:transactional transaction-manager="transactionmanager" />
</int:poller>
</int-jdbc:inbound-channel-adapter>
<int:service-activator input-channel="dbchannel"
output-channel="publishchannel" ref="jdbcmessagehandler" method="handleJdbcMessage" />
<bean id="jdbcmessagehandler" class="com.citigroup.handler.JdbcMessageHandler" />
<int:publish-subscribe-channel id="publishchannel"/>
<int-jms:outbound-channel-adapter id="publishchanneladapter"
channel="publishchannel" jms-template="publishrealtimefeedinternaljmstemplate" />
<int:channel id="subscribechannel"/>
<int-jms:message-driven-channel-adapter
id="subscribechanneladapter" destination="subscriberealtimeinternalqueue"
connection-factory="authenticationconnectionfactory" channel="subscribechannel"
concurrent-consumers="5" max-concurrent-consumers="5" />
<int:service-activator input-channel="subscribechannel"
ref="subscribemessagehandler" method="logJMSMessage" />
<bean id="subscribemessagehandler" class="com.citigroup.handler.SubscribeJMSMessageHandler" />
</beans>
<bean id="authenticationconnectionfactory"
class="org.springframework.jms.connection.UserCredentialsConnectionFactoryAdapter">
<property name="targetConnectionFactory" ref="connectionFactory" />
<property name="username" value="test" />
<property name="password" value="test123" />
</bean>
<bean id="connectionFactory" class="org.springframework.jndi.JndiObjectFactoryBean">
<property name="jndiTemplate">
<ref bean="jndiTemplate" />
</property>
<property name="jndiName" value="app.jndi.testCF" />
</bean>
<bean id="subscriberealtimeinternalqueue" class="org.springframework.jndi.JndiObjectFactoryBean">
<property name="jndiTemplate">
<ref bean="jndiTemplate" />
</property>
<property name="jndiName"
value="app.queue.testQueue" />
</bean>
<bean id="jndiTemplate" class="org.springframework.jndi.JndiTemplate">
<property name="environment">
<props>
<prop key="java.naming.factory.initial">com.tibco.tibjms.naming.TibjmsInitialContextFactory
</prop>
<prop key="java.naming.provider.url">tibjmsnaming://test01d.nam.nsroot.net:7222</prop>
</props>
</property>
</bean>`
Issue -
I am using a message-driven-channel-adapter with concurrent-consumers set to 5. However, it looks like just one consumer thread (container-2) is created and is picking up the messages from the EMS queue. Please find the log4j log below -
16 Aug 2018 11:31:12,077 INFO SubscribeJMSMessageHandler [subscribechanneladapter.container-2][]: Total count of records read from Queue at this moment is 387
record#1:: [ID=7694066395] record#2:: [ID=7694066423] .. .. .. record#387:: [ID=6147457333]
Probable root cause here -
Maybe it is the first step in the configuration, where I poll the database to fetch the data after a fixed delay, that is causing this multithreading issue. Referring to the logs above, my assumption is that since the 387 fetched records are all bundled into a List object (List> message), they are being treated as just 1 message/payload instead of 387 different messages, and that's why just one thread/container/consumer is picking up this bundled message. The reason for this assumption is the log below -
GenericMessage [payload=[{"ID":7694066395},{"ID":7694066423},{"ID":6147457333}], headers={json__ContentTypeId__=class org.springframework.util.LinkedCaseInsensitiveMap, jms_redelivered=false, json__TypeId__=class java.util.ArrayList, jms_destination=Queue[app.queue.testQueue], id=e034ba73-7781-b62c-0307-170099263068, priority=4, jms_timestamp=1534820792064, contentType=application/json, jms_messageId=ID:test.21415B667C051:40C149C0, timestamp=1534820792481}]
Question -
Is my understanding of the root cause correct? If yes, what can be done to treat these 387 records as individual messages (and not one List object of messages) and publish them one by one without impacting the transaction management?
I had discussed this issue with https://stackoverflow.com/users/2756547/artem-bilan in my earlier post on Stack Overflow, and I was to check this design by replacing Tibco EMS with ActiveMQ. However, the ActiveMQ infrastructure is still being analysed by our architecture team and so can't be used until it's approved.
Oh! Now I see what your problem is. The int-jdbc:inbound-channel-adapter indeed returns a list of the records it could select from the DB, and this whole list is sent as a single message to JMS. That's why you see only one thread on the consumer side: there is only one message to get from the queue.
If you would like a separate message for each pulled record, consider using a <splitter> between the JDBC polling operation and sending to JMS.
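A minimal sketch of where the splitter could sit in the configuration from the question. The splitchannel name is hypothetical, and this assumes handleJdbcMessage returns a List (as the logged ArrayList payload suggests):

```xml
<int:channel id="splitchannel"/>

<!-- Same service activator as in the question, now replying to splitchannel -->
<int:service-activator input-channel="dbchannel"
        output-channel="splitchannel" ref="jdbcmessagehandler" method="handleJdbcMessage"/>

<!-- With no ref or expression, the default splitter emits one message
     per element of a collection payload -->
<int:splitter input-channel="splitchannel" output-channel="publishchannel"/>
```

Since these are all direct (non-executor) channels, the split and the JMS sends should still run on the poller thread, so the poller's transaction boundary is unchanged; each record simply becomes its own JMS message that any of the five consumers can pick up.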
I am new to GridGain and we are doing a POC with it. We did some simple examples using a partitioned cache and it works well; however, we found that when we bring a node down, the cache data from that node is gone. So my question is: if we keep using partitioned mode, is there any way to redistribute the cache when a node (or several nodes) is undeployed? If not, is there any good way to do it? Thanks!
configuration Code:
<context:component-scan base-package="com.test" />
<bean id="hostGrid" class="org.gridgain.grid.GridSpringBean">
<property name="configuration">
<bean class="org.gridgain.grid.GridConfiguration">
<property name="localHost" value="127.0.0.1"/>
<property name="peerClassLoadingEnabled" value="false"/>
<property name="marshaller">
<bean class="org.gridgain.grid.marshaller.optimized.GridOptimizedMarshaller">
<property name="requireSerializable" value="false"/>
</bean>
</property>
<property name="cacheConfiguration">
<list>
<bean class="org.gridgain.grid.cache.GridCacheConfiguration">
<property name="name" value="CACHE"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="store" >
<bean class="com.test.CacheJdbcPOCStore"></bean>
</property>
</bean>
</list>
</property>
</bean>
</property>
</bean>
We deployed the same WAR (using the above configuration) to 3 Tomcat 7 servers. We did not specify the number of backups, so it should be 1 by default.
follow up
I solved this problem by setting backups=1 in the configuration. It looks like it did not create a backup copy previously, although it should make 1 copy by default. Also, when I tried to bring down 2 nodes at the same time, part of the cache was gone, so I set backups=2 and found no cache loss this time. So it looks like, in the very bad case where all nodes except the main node crash, we need (number of nodes - 1) backups to prevent data loss. But if I do that, it is just like replicated mode, and replicated mode has fewer restrictions on queries and transactions. So my question is: if we want to take advantage of parallel computation and at the same time prevent data loss when nodes crash, what is the best practice?
Thanks!
Number of backups is 0 by default. The documentation has been fixed.
You are right about REPLICATED mode. If you are worried about any data loss, REPLICATED mode is the only way to guarantee against it. The disadvantage here is that writes will get slower, as all the nodes in the cluster will be updated. The advantage is that the data is available on every node, so you can easily access it from your computations without worrying which node to send them to.
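For reference, a sketch of the cache bean from the question with the backup count set explicitly, as the follow-up describes. The value 1 is only an example; switching cacheMode to REPLICATED instead would make the backups setting unnecessary.

```xml
<bean class="org.gridgain.grid.cache.GridCacheConfiguration">
    <property name="name" value="CACHE"/>
    <property name="cacheMode" value="PARTITIONED"/>
    <!-- Keep one backup copy of each entry on another node (the default is 0) -->
    <property name="backups" value="1"/>
    <property name="store">
        <bean class="com.test.CacheJdbcPOCStore"/>
    </property>
</bean>
```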
Three of the web services that I am working on use Spring's SimpleMessageStore for storing messages. For some reason it is causing a memory leak in the production environment, and I am unable to reproduce it in the lower environments. I am new to Spring Integration and need help understanding what might be causing this.
the spring config code looks like this:
<!-- MESSAGE STORES -->
<bean id="monitoringHeaderRequestMsgStore" class="org.springframework.integration.store.SimpleMessageStore"/>
<bean id="gbqHeaderRequestMsgStore" class="org.springframework.integration.store.SimpleMessageStore"/>
<bean id="bondAgreementResponseMsgStore" class="org.springframework.integration.store.SimpleMessageStore"/>
<bean id="bondWIthRulesRequestMsgStore" class="org.springframework.integration.store.SimpleMessageStore"/>
<bean id="ProcessVariableMessageStores" class="com.aviva.uklife.investment.impl.ProcessVariableMessageStores">
<property name="_monitoringHeaderRequestMsgStore" ref="monitoringHeaderRequestMsgStore"/>
<property name="_gbqHeaderRequestMsgStore" ref="gbqHeaderRequestMsgStore"/>
<property name="_bondWIthRulesRequestMsgStore" ref="bondWIthRulesRequestMsgStore"/>
<property name="_bondAgreementResponseMsgStore" ref="bondAgreementResponseMsgStore"/>
</bean>
<!-- Retrieve stored MonitoringHeaderRequest -->
<int:transformer expression="headers.get('#{T(.....Constants).MONITORING_HEADER_REQUEST_CLAIM_CHECK_ID}')"/>
<int:claim-check-out message-store="monitoringHeaderRequestMsgStore" remove-message="false"/>
<!-- Store HeaderRequest -->
<int:gateway request-channel="header-req-store-channel"/>
<!-- PROCESS VARIABLES STORAGE IN STORE CHANNELS WITH KEY OR CLAIMCHECK ID -->
<int:chain input-channel="monitoring-header-req-store-channel">
<int:claim-check-in message-store="monitoringHeaderRequestMsgStore"/>
<int:header-enricher>
<int:header name="#{T(....Constants).MONITORING_HEADER_REQUEST_CLAIM_CHECK_ID}" expression="payload"/>
</int:header-enricher>
<int:claim-check-out message-store="monitoringHeaderRequestMsgStore" remove-message="false"/>
</int:chain>
thank you
To be honest, it isn't recommended to use SimpleMessageStore in a production environment. It keeps everything in memory, so it leaks, as you noticed, if you don't clear the MessageStore periodically.
Right, there might be some cases when you need to keep messages in the MessageStore for a long time. In that case, consider replacing SimpleMessageStore with a persistent MessageStore.
On the other hand, we need more info on the matter to provide better help.
Maybe you just have several aggregators and don't use expire-groups-upon-completion = "true"...
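Two hedged sketches of settings that stop entries from accumulating in a store. The aggregator, its channel names, and someMessageStore are hypothetical, since the question does not show an aggregator. The claim-check-out line reuses the store from the question and assumes the stored message is not needed again after it has been retrieved.

```xml
<!-- Hypothetical aggregator: a completed group is removed from the store
     instead of being kept around -->
<int:aggregator input-channel="toAggregate"
        output-channel="aggregated"
        message-store="someMessageStore"
        expire-groups-upon-completion="true"/>

<!-- Claim check: delete the stored message once it has been checked out.
     Only suitable if nothing checks out the same message a second time. -->
<int:claim-check-out message-store="monitoringHeaderRequestMsgStore" remove-message="true"/>
```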
I have a scenario where I must send messages in order to a REST service, and I plan to use a resequencer. The behaviour of this resequencer must be:
Order messages by time of day (hh:mm:ss), which is data on the message
Release messages only after they have stayed in the bus for a period of time (e.g. 2 minutes)
As the default resequencer didn't serve this purpose, I decided to develop a custom one, replacing the ResequencerMessageGroupProcessor with a CustomResequencerMessageGroupProcessor.
I succeeded using a service activator, but I had to explicitly define the output channel as a property. Isn't there a way to use the output-channel attribute in the XML declaration?
When I use the output-channel attribute the following error occurs:
Caused by: java.lang.IllegalArgumentException: no outputChannel or replyChannel header available
at org.springframework.util.Assert.notNull(Assert.java:112)
at org.springframework.integration.aggregator.AbstractCorrelatingMessageHandler.sendReplies(AbstractCorrelatingMessageHandler.java:616)
at org.springframework.integration.aggregator.AbstractCorrelatingMessageHandler.completeGroup(AbstractCorrelatingMessageHandler.java:597)
at org.springframework.integration.aggregator.AbstractCorrelatingMessageHandler.handleMessageInternal(AbstractCorrelatingMessageHandler.java:405)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78)
... 46 more
here's my example:
<int:channel id="resequencerChannel"/>
<int:service-activator id="customResequencer" ref="resequencingMessageHandler"
input-channel="resequencerChannel" />
<int:channel id="aggregatedMessageChannel" />
<bean id="resequencingMessageHandler" class="org.springframework.integration.aggregator.ResequencingMessageHandler">
<constructor-arg name="releaseStrategy" ref="timeoutReleaseStrategy"/>
<constructor-arg name="processor" ref="customResequencerMessageGroupProcessor"/>
<constructor-arg name="store" ref="redisMessageStore"/>
<constructor-arg name="correlationStrategy" ref="customCorrelationStrategy"/>
<property name="outputChannel" ref="aggregatedMessageChannel"/>
<property name="sendPartialResultOnExpiry" value="true"></property>
</bean>
<bean id="customResequencerMessageGroupProcessor" class="test.resequencer.CustomResequencerMessageGroupProcessor">
<constructor-arg name="timeout" value="10000"/>
</bean>
<bean id="timeoutReleaseStrategy" class="org.springframework.integration.aggregator.TimeoutCountSequenceSizeReleaseStrategy" >
<constructor-arg name="threshold" value="100000"></constructor-arg>
<constructor-arg name="timeout" value="10000"/>
</bean>
<bean id="customCorrelationStrategy" class="org.springframework.integration.aggregator.HeaderAttributeCorrelationStrategy" >
<constructor-arg name="attributeName" value="correlationId"/>
</bean>
Also, if you think there is a better way to do this, I would appreciate you telling me.
Thanks in advance!
Regards
Guzman
When referencing (ref) a MessageHandler from a <service-activator/> the XML output-channel is only applied if the referenced handler is an AbstractReplyProducingMessageHandler (ARPMH).
Components such as routers, aggregators, resequencers, are not considered to be ARPMHs because they sometimes produce a reply, sometimes don't and, in the case of a router, might produce multiple "replies" which doesn't fit the service activator model.
We could probably refactor the aggregator/resequencer to be ARPMHs because they only produce 0 or 1 "reply". We could also add some smarts to the ServiceActivatorFactoryBean to inject the output channel if the reference is an AbstractCorrelatingMessageHandler. Feel free to open an Improvement JIRA Issue.
In the meantime, your solution is the correct work-around.
We have a requirement where we need to process multiple files from a folder (each file having 50K-100K records) and after doing some calculation store the data in database. Once the content of the file is processed then we need to trigger a second stream to do some further processing.
The way we have designed the solution is
File Reader -- File Splitter -- Processor -- Re-sequencer (release strategy per record) -- Sink (Load Data in Database)
The issue with the above design is that, from the re-sequencer and Sink, we are not able to figure out whether all the contents of a file have been successfully stored in the database, and without that we cannot invoke the second stream.
We have done some benchmarking with the aggregator and it seems to be much slower than the re-sequencer. Are there any known performance issues with the aggregator? Is there any serious performance degradation if we use a release strategy that releases only when all the data is with the re-sequencer?
Thanks,
Jayadeep
Processor 1 - Splits the lines
Processor 2 - Calculation on each message
<int:service-activator input-channel="input" ref="singlePointCalculator" method="onMessage"/>
<bean id="singlePointCalculator" class="com.processor.SinglePointCalculator">
<constructor-arg index="0" ref="output"/>
</bean>
Sink
<int:resequencer input-channel='input' output-channel='reseqChannel' release-strategy-expression="size() == 1" ></int:resequencer>
<int:transformer ref="mapTransformer" input-channel="reseqChannel" output-channel="mapChannel"/>
<bean id="mapTransformer" class="com.sink.MapTransformer">
<property name="columns" value=" side,value" />
</bean>
<int-jdbc:outbound-channel-adapter
data-source="dataSource"
channel="mapChannel"
query=" ">
</int-jdbc:outbound-channel-adapter>
Sink Using Aggregator
<int:aggregator input-channel='input' output-channel='aggChannel'></int:aggregator>
<int:transformer ref="mapTransformer" input-channel="aggChannel" output-channel="mapChannel"/>
<bean id="mapTransformer" class="com.sink.MapTransformer">
<property name="columns" value=" side,value" />
</bean>
<int-jdbc:outbound-channel-adapter
data-source="dataSource"
channel="mapChannel"
query=" ">
</int-jdbc:outbound-channel-adapter>