How to get a file daily via SFTP using Spring Integration with Java config? - spring-integration

I need to get a file daily via SFTP. I would like to use Spring Integration with Java config. The file is generally available at a specific time each day. The application should try to get the file near that time each day. If the file is not available, it should continue to retry for x attempts. After x attempts, it should send an email to let the admin know that the file is still not available on the SFTP site.
One option is to use SftpInboundFileSynchronizingMessageSource. In the MessageHandler, I can kick off a job to process the file. However, I really don't need synchronization with the remote file system; after all, it is a scheduled delivery of the file. Plus, I need to delay at most 15 minutes before the next retry, and polling every 15 minutes seems a bit overkill for a daily file. I guess I could use this, but I would need some mechanism to send an email after a certain time has elapsed with no file received.
The other option seems to be using the get command of the SFTP outbound gateway, but the only examples I can find use XML config.
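For reference, a minimal Java-config sketch of such a gateway is shown below. This is only a sketch under assumptions: the channel names and the sftpSessionFactory bean are illustrative and not part of the original question.
@Bean
@ServiceActivator(inputChannel = "sftpGetChannel")
public MessageHandler sftpGetGateway(SessionFactory<ChannelSftp.LsEntry> sftpSessionFactory) {
    // "get" command; the SpEL expression resolves the remote file path from the message payload
    SftpOutboundGateway gateway = new SftpOutboundGateway(sftpSessionFactory, "get", "payload");
    gateway.setLocalDirectory(new File("sftp-inbound"));   // illustrative local directory
    gateway.setAutoCreateLocalDirectory(true);
    gateway.setOutputChannelName("fileResults");           // downstream channel, also illustrative
    return gateway;
}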
Update
Adding code after using help provided by Artem Bilan's answer below:
Configuration class:
@Bean
@InboundChannelAdapter(autoStartup = "true", channel = "sftpChannel", poller = @Poller("pollerMetadata"))
public SftpInboundFileSynchronizingMessageSource sftpMessageSource(ApplicationProperties applicationProperties, PropertiesPersistingMetadataStore store) {
    SftpInboundFileSynchronizingMessageSource source =
            new SftpInboundFileSynchronizingMessageSource(sftpInboundFileSynchronizer(applicationProperties));
    source.setLocalDirectory(new File("ftp-inbound"));
    source.setAutoCreateLocalDirectory(true);
    FileSystemPersistentAcceptOnceFileListFilter local = new FileSystemPersistentAcceptOnceFileListFilter(store, "test");
    source.setLocalFilter(local);
    source.setCountsEnabled(true);
    return source;
}

@Bean
public PollerMetadata pollerMetadata() {
    PollerMetadata pollerMetadata = new PollerMetadata();
    List<Advice> adviceChain = new ArrayList<Advice>();
    adviceChain.add(retryCompoundTriggerAdvice());
    pollerMetadata.setAdviceChain(adviceChain);
    pollerMetadata.setTrigger(compoundTrigger());
    return pollerMetadata;
}

@Bean
public RetryCompoundTriggerAdvice retryCompoundTriggerAdvice() {
    return new RetryCompoundTriggerAdvice(compoundTrigger(), secondaryTrigger());
}

@Bean
public CompoundTrigger compoundTrigger() {
    CompoundTrigger compoundTrigger = new CompoundTrigger(primaryTrigger());
    return compoundTrigger;
}

@Bean
public Trigger primaryTrigger() {
    return new CronTrigger("*/60 * * * * *");
}

@Bean
public Trigger secondaryTrigger() {
    return new PeriodicTrigger(10000);
}

@Bean
@ServiceActivator(inputChannel = "sftpChannel")
public MessageHandler handler(PropertiesPersistingMetadataStore store) {
    return new MessageHandler() {

        @Override
        public void handleMessage(Message<?> message) throws MessagingException {
            System.out.println(message.getPayload());
            store.flush();
        }
    };
}
RetryCompoundTriggerAdvice class:
public class RetryCompoundTriggerAdvice extends AbstractMessageSourceAdvice {

    private final CompoundTrigger compoundTrigger;

    private final Trigger override;

    private int count = 0;

    public RetryCompoundTriggerAdvice(CompoundTrigger compoundTrigger, Trigger overrideTrigger) {
        Assert.notNull(compoundTrigger, "'compoundTrigger' cannot be null");
        this.compoundTrigger = compoundTrigger;
        this.override = overrideTrigger;
    }

    @Override
    public boolean beforeReceive(MessageSource<?> source) {
        return true;
    }

    @Override
    public Message<?> afterReceive(Message<?> result, MessageSource<?> source) {
        if (result == null && count <= 5) {
            count++;
            this.compoundTrigger.setOverride(this.override);
        }
        else {
            this.compoundTrigger.setOverride(null);
            if (count > 5) {
                // send email
            }
            count = 0;
        }
        return result;
    }

}
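The "// send email" placeholder above is left open in the original code; a minimal, hedged sketch of one way to fill it in with Spring's mail support follows (the JavaMailSender and the addresses are assumptions, not part of the original class):
// Hypothetical addition to RetryCompoundTriggerAdvice, given a configured JavaMailSender
private void sendAlertEmail(JavaMailSender mailSender) {
    SimpleMailMessage message = new SimpleMailMessage();
    message.setTo("admin@example.com");                        // illustrative recipient
    message.setSubject("Daily SFTP file still not available");
    message.setText("The expected file was not found after " + count + " retry attempts.");
    mailSender.send(message);
}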

Since Spring Integration 4.3 there is CompoundTrigger:
* A {@link Trigger} that delegates the {@link #nextExecutionTime(TriggerContext)}
* to one of two Triggers. If the {@link #setOverride(Trigger) override} trigger is
* {@code null}, the primary trigger is invoked; otherwise the override trigger is
* invoked.
In combination with CompoundTriggerAdvice:
* An {@link AbstractMessageSourceAdvice} that uses a {@link CompoundTrigger} to adjust
* the poller - when a message is present, the compound trigger's primary trigger is
* used to determine the next poll. When no message is present, the override trigger is
* used.
it can be used to achieve your task:
The primaryTrigger can be a CronTrigger to run the task only once a day.
The override can be a PeriodicTrigger with the desired short period for retries.
For the retry logic, you can add one more Advice to the poller, or just extend that CompoundTriggerAdvice with count logic to eventually send an email.
Since there is no file, there is no message to kick off the flow, so we have no choice but to work around it via the poller infrastructure.

Related

Spring Integration AWS Kinesis, message aggregator, Release Strategy

This is a follow-up question to Spring Integration AWS RabbitMQ Kinesis.
I have the following configuration. I am noticing that when I send a message to the input channel named kinesisSendChannel for the first time, the aggregator and release strategy are invoked and messages are sent to Kinesis Streams. I put debug breakpoints at different places and could verify this behavior. But when I publish messages to the same input channel again, the release strategy and the outbound processor are not invoked and messages are not sent to Kinesis. I am not sure why the aggregator flow is invoked only the first time and not for subsequent messages. For testing purposes, the TimeoutCountSequenceSizeReleaseStrategy is set with a count of 1 and a time of 60 seconds. No specific MessageStore is used. Could you help identify the issue?
#Bean(name = "kinesisSendChannel")
public MessageChannel kinesisSendChannel() {
return MessageChannels.direct().get();
}
#Bean(name = "resultChannel")
public MessageChannel resultChannel() {
return MessageChannels.direct().get();
}
#Bean
#ServiceActivator(inputChannel = "kinesisSendChannel")
public MessageHandler aggregator(TestMessageProcessor messageProcessor,
MessageChannel resultChannel,
TimeoutCountSequenceSizeReleaseStrategy timeoutCountSequenceSizeReleaseStrategy) {
AggregatingMessageHandler handler = new AggregatingMessageHandler(messageProcessor);
handler.setCorrelationStrategy(new ExpressionEvaluatingCorrelationStrategy("headers['foo']"));
handler.setReleaseStrategy(timeoutCountSequenceSizeReleaseStrategy);
handler.setOutputProcessor(messageProcessor);
handler.setOutputChannel(resultChannel);
return handler;
}
#Bean
#ServiceActivator(inputChannel = "resultChannel")
public MessageHandler kinesisMessageHandler1(#Qualifier("successChannel") MessageChannel successChannel,
#Qualifier("errorChannel") MessageChannel errorChannel, final AmazonKinesisAsync amazonKinesis) {
KinesisMessageHandler kinesisMessageHandler = new KinesisMessageHandler(amazonKinesis);
kinesisMessageHandler.setSync(true);
kinesisMessageHandler.setOutputChannel(successChannel);
kinesisMessageHandler.setFailureChannel(errorChannel);
return kinesisMessageHandler;
}
public class TestMessageProcessor extends AbstractAggregatingMessageGroupProcessor {
#Override
protected Object aggregatePayloads(MessageGroup group, Map<String, Object> defaultHeaders) {
final PutRecordsRequest putRecordsRequest = new PutRecordsRequest().withStreamName("test-stream");
final List<PutRecordsRequestEntry> putRecordsRequestEntry = group.getMessages().stream()
.map(message -> (PutRecordsRequestEntry) message.getPayload()).collect(Collectors.toList());
putRecordsRequest.withRecords(putRecordsRequestEntry);
return putRecordsRequestEntry;
}
}
I believe the problem is here: handler.setCorrelationStrategy(new ExpressionEvaluatingCorrelationStrategy("headers['foo']")). All your messages come with the same foo header, so all of them form the same message group. As long as you release the group but don't remove it, all new messages are going to be discarded.
Please review the aggregator documentation to familiarize yourself with all the possible behaviors: https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#aggregator
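If the intent is that each released group should be removed so that later messages with the same foo header start a new group, one option is to expire groups upon completion. This is only a hedged sketch of that setting applied to the handler above, assuming that behavior is what you want:
AggregatingMessageHandler handler = new AggregatingMessageHandler(messageProcessor);
handler.setCorrelationStrategy(new ExpressionEvaluatingCorrelationStrategy("headers['foo']"));
handler.setReleaseStrategy(timeoutCountSequenceSizeReleaseStrategy);
// remove the group after release so a new group can form for the same correlation key
handler.setExpireGroupsUponCompletion(true);
handler.setOutputProcessor(messageProcessor);
handler.setOutputChannel(resultChannel);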

Spring Batch API request on wait

I have written a simple Spring Batch project where:
- An API to execute the job returns the job ID on job launch; the job reads/processes/writes from/to the DB with multithreaded parallel processing. (The job is launched asynchronously so I can get the job ID in advance and poll the status of the job from another API request.)
- An API polls the status of the job for the job ID passed.
The polling API works smoothly if the job step's throttle limit is 7 or less. However, if the throttle limit is more than 7, job execution continues but the polling API waits until the read/process threads release. I have also tried a simple API that just returns a String instead of polling, but that waits too.
A sample of the code is shown below:
@Configuration
@EnableBatchProcessing
public class SpringBatchConfig {

    private int core = 200;

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get(SC_Constants.JOB)
                .incrementer(new RunIdIncrementer())
                .listener(new Listener(transDAO))
                .start(step3_processRecords())
                .build();
    }

    @Bean
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
        threadPoolTaskExecutor.setCorePoolSize(this.core);
        threadPoolTaskExecutor.setMaxPoolSize(this.core);
        threadPoolTaskExecutor.setQueueCapacity(this.core);
        threadPoolTaskExecutor.setThreadNamePrefix("threadExecutor");
        return threadPoolTaskExecutor;
    }

    @Bean
    @StepScope
    public JdbcPagingItemReader<Transaction> itemReader(...) {
        JdbcPagingItemReader<Transaction> itemReader = new JdbcPagingItemReader<Transaction>();
        ...
        return itemReader;
    }

    @Bean
    @StepScope
    public ItemProcessor<Transaction, Transaction> processor() {
        return new Processor();
    }

    @Bean
    @StepScope
    public ItemWriter<Transaction> writer(...) {
        return new Writer();
    }

    @Bean
    public Step step3_processRecords() throws Exception {
        return stepBuilderFactory.get(SC_Constants.STEP_3_PROCESS_RECORDS)
                .<Transaction, Transaction>chunk(this.chunk)
                .reader(itemReader(null, null, null))
                .processor(processor())
                .writer(writer(null, null, null))
                .taskExecutor(taskExecutor())
                .throttleLimit(20)
                .build();
    }
}
The class that extends DefaultBatchConfigurer has the following:
@Override
public JobLauncher getJobLauncher() {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    SimpleAsyncTaskExecutor exec = new SimpleAsyncTaskExecutor();
    exec.setConcurrencyLimit(concurrency_limit);
    jobLauncher.setTaskExecutor(exec);
    return jobLauncher;
}
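For context, with this asynchronous launcher a job-launch endpoint can return the job ID immediately. The original launch API is not shown in the post, so the following is only a hypothetical sketch; the endpoint name and injected beans are illustrative:
@POST
@Path("/launchJob")
public Long launchJob() throws Exception {
    JobParameters params = new JobParametersBuilder()
            .addLong("time", System.currentTimeMillis())
            .toJobParameters();
    // with a SimpleAsyncTaskExecutor, run() returns as soon as the job has been started
    JobExecution execution = jobLauncher.run(job, params);
    return execution.getId();
}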
Edit:
Polling API code snippet:
@POST
@Consumes(MediaType.APPLICATION_JSON)
@Path("/getJobStatus")
public Response getJobStatus(@RequestBody String body) {
    JSONObject jsonObject = new JSONObject(body);
    Long jobId = jsonObject.getLong("jobId");
    jobExecution = jobExplorer.getJobExecution(jobId);
    batchStatus = jobExecution.getStatus().getBatchStatus();
    write_count = jobExecution.getStepExecutions().iterator().next().getWriteCount();
    responseDto.setJob_id(jobId);
    responseDto.setWrite_count(write_count);
    responseDto.setStatus(batchStatus.name());
    return Response.ok(responseDto).build();
}
Second edit:
Sharing a snippet of the job repository setup: a Postgres JDBC job repository is used.
@Component
public class SpringBatchConfigurer extends DefaultBatchConfigurer {
    ...
    @PostConstruct
    public void initialize() {
        try {
            BasicDataSource dataSource = new BasicDataSource();
            dataSource.setDriverClassName(driverClassName);
            dataSource.setUsername(username);
            dataSource.setPassword(password);
            dataSource.setUrl(dsUrl + "?currentSchema=public");
            dataSource.setInitialSize(3);
            dataSource.setMinIdle(1);
            dataSource.setMaxIdle(3);
            dataSource.addConnectionProperty("maxConnLifetimeMillis", "30000");
            this.transactionManager = new DataSourceTransactionManager(dataSource);
            JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
            factory.setDataSource(dataSource);
            factory.setTransactionManager(transactionManager);
            factory.afterPropertiesSet();
            this.jobRepository = factory.getObject();
            SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
            jobLauncher.setJobRepository(jobRepository);
            jobLauncher.afterPropertiesSet();
            this.jobLauncher = jobLauncher;
        } catch (Exception e) {
            throw new BatchConfigurationException(e);
        }
    }
Third edit: I tried passing the task executor as a local variable inside the step. Polling now works, but job execution does not happen: no threads are generated and no processing takes place.
@Bean
public Step step3_processRecords() throws Exception {
    ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
    threadPoolTaskExecutor.setCorePoolSize(this.core_size);
    threadPoolTaskExecutor.setMaxPoolSize(this.max_pool_size);
    threadPoolTaskExecutor.setQueueCapacity(this.queue_capacity);
    threadPoolTaskExecutor.setThreadNamePrefix("threadExecutor");
    return stepBuilderFactory.get("step3")
            .<Transaction, Transaction>chunk(this.chunk)
            .reader(itemReader(null, null, null))
            .processor(processor())
            .writer(writer(null, null, null))
            .taskExecutor(threadPoolTaskExecutor)
            .throttleLimit(20)
            .build();
}

StreamingMessageSource keeps firing when a filter is applied

I am trying to poll an FTP directory for a certain kind of file. Polling the directory works, but whenever I apply a filter to select files by extension, the message source keeps spamming messages about the file with no regard for the polling delay. Without the filters everything works fine; once I enable them, my application authenticates with the FTP server, downloads the file, and sends the message nonstop over and over again. I have the following beans:
/**
 * Factory that creates the remote connection
 *
 * @return DefaultSftpSessionFactory
 */
@Bean
public DefaultSftpSessionFactory sftpSessionFactory(@Value("${ftp.host}") String ftpHost,
        @Value("${ftp.port}") int ftpPort,
        @Value("${ftp.user}") String ftpUser,
        @Value("${ftp.pass}") String ftpPass) {
    DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory();
    factory.setAllowUnknownKeys(true);
    factory.setHost(ftpHost);
    factory.setPort(ftpPort);
    factory.setUser(ftpUser);
    factory.setPassword(ftpPass);
    return factory;
}

/**
 * Template to handle remote files
 *
 * @param sessionFactory SessionFactory bean
 * @return SftpRemoteFileTemplate
 */
@Bean
public SftpRemoteFileTemplate fileTemplate(DefaultSftpSessionFactory sessionFactory) {
    SftpRemoteFileTemplate template = new SftpRemoteFileTemplate(sessionFactory);
    template.setAutoCreateDirectory(true);
    template.setUseTemporaryFileName(false);
    return template;
}

/**
 * To listen to multiple directories, declare multiples of this bean with the same inbound channel
 *
 * @param fileTemplate FileTemplate bean
 * @return MessageSource
 */
@Bean
@InboundChannelAdapter(channel = "deeplinkAutomated", poller = @Poller(fixedDelay = "6000", maxMessagesPerPoll = "-1"))
public MessageSource inboundChannelAdapter(SftpRemoteFileTemplate fileTemplate) {
    SftpStreamingMessageSource source = new SftpStreamingMessageSource(fileTemplate);
    source.setRemoteDirectory("/upload");
    source.setFilter(new CompositeFileListFilter<>(
            Arrays.asList(new AcceptOnceFileListFilter<>(), new SftpSimplePatternFileListFilter("*.trg"))
    ));
    return source;
}

/**
 * Listener that activates on new messages on the specified input channel
 *
 * @return MessageHandler
 */
@Bean
@ServiceActivator(inputChannel = "deeplinkAutomated")
public MessageHandler handler(JobLauncher jobLauncher, Job deeplinkBatch) {
    return message -> {
        Gson gson = new Gson();
        SFTPFileInfo info = gson.fromJson((String) message.getHeaders().get("file_remoteFileInfo"), SFTPFileInfo.class);
        System.out.println("File to download: " + info.getFilename().replace(".trg", ".xml"));
    };
}
I think AcceptOnceFileListFilter is not suitable for SFTP files: the returned LsEntry doesn't match the one previously stored in the HashSet, simply because their hashes are different.
Consider using an SftpPersistentAcceptOnceFileListFilter instead.
Also, it would be better to configure the DefaultSftpSessionFactory with isSharedSession:
/**
 * @param isSharedSession true if the session is to be shared.
 */
public DefaultSftpSessionFactory(boolean isSharedSession) {
to avoid session recreation on each polling task.
You don't have a 6-second delay between calls because you have maxMessagesPerPoll = "-1". That means the poller keeps fetching remote files as long as they are present in the remote directory. In your case, with the AcceptOnceFileListFilter, you always end up with the same file because of the hash issue.
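Putting that advice together, a hedged sketch of the adjusted beans might look like the following (the SimpleMetadataStore and the "deeplink-" key prefix are assumptions; any ConcurrentMetadataStore and prefix would do):
@Bean
public DefaultSftpSessionFactory sftpSessionFactory(@Value("${ftp.host}") String ftpHost,
        @Value("${ftp.port}") int ftpPort,
        @Value("${ftp.user}") String ftpUser,
        @Value("${ftp.pass}") String ftpPass) {
    // true = shared session, so the session is not recreated on every poll
    DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory(true);
    factory.setAllowUnknownKeys(true);
    factory.setHost(ftpHost);
    factory.setPort(ftpPort);
    factory.setUser(ftpUser);
    factory.setPassword(ftpPass);
    return factory;
}

@Bean
// maxMessagesPerPoll = "1" restores the 6-second delay between polls
@InboundChannelAdapter(channel = "deeplinkAutomated", poller = @Poller(fixedDelay = "6000", maxMessagesPerPoll = "1"))
public MessageSource<InputStream> inboundChannelAdapter(SftpRemoteFileTemplate fileTemplate) {
    SftpStreamingMessageSource source = new SftpStreamingMessageSource(fileTemplate);
    source.setRemoteDirectory("/upload");
    source.setFilter(new CompositeFileListFilter<>(Arrays.asList(
            // persistent filter keyed by file name and modified time, so re-listing the same entry is not treated as new
            new SftpPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(), "deeplink-"),
            new SftpSimplePatternFileListFilter("*.trg"))));
    return source;
}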

FTP - Using Spring Integration task-scheduler process stops after certain period

When starting the jar separately on a Unix machine, the task-scheduler thread stops listening after some time, but it works fine on a Windows machine. The application also works on Linux at startup, but after running for a while it stops working. Please let me know if there is any way to avoid this issue.
@Bean
@InboundChannelAdapter(value = "inputChannel", poller = @Poller(fixedDelay = "1000", maxMessagesPerPoll = "1"))
public MessageSource<?> receive() {
    FtpInboundFileSynchronizingMessageSource messageSource = new FtpInboundFileSynchronizingMessageSource(synchronizer());
    File temp = new File(TEMP_FOLDER);
    messageSource.setLocalDirectory(temp);
    messageSource.setAutoCreateLocalDirectory(true);
    return messageSource;
}

private AbstractInboundFileSynchronizer<FTPFile> synchronizer() {
    AbstractInboundFileSynchronizer<FTPFile> fileSynchronizer = new FtpInboundFileSynchronizer(sessionFactory());
    fileSynchronizer.setRemoteDirectory(ftpFileLocation);
    fileSynchronizer.setDeleteRemoteFiles(false);
    Pattern pattern = Pattern.compile(".*\\.xml$");
    FtpRegexPatternFileListFilter ftpRegexPatternFileListFilter = new FtpRegexPatternFileListFilter(pattern);
    fileSynchronizer.setFilter(ftpRegexPatternFileListFilter);
    return fileSynchronizer;
}

@Bean(name = "sessionFactory")
public SessionFactory<FTPFile> sessionFactory() {
    DefaultFtpSessionFactory sessionFactory = new DefaultFtpSessionFactory();
    sessionFactory.setHost(ftpHostName);
    sessionFactory.setUsername(ftpUserName);
    sessionFactory.setPassword(ftpPassWord);
    return sessionFactory;
}

@Bean(name = "inputChannel")
public PollableChannel inputChannel() {
    return new QueueChannel();
}

@Bean(name = PollerMetadata.DEFAULT_POLLER)
public PollerMetadata defaultPoller() {
    PollerMetadata pollerMetadata = new PollerMetadata();
    pollerMetadata.setTrigger(new PeriodicTrigger(100));
    return pollerMetadata;
}

@ServiceActivator(inputChannel = "inputChannel")
public void transferredFilesFromFTP(File payload) {
    callWork(payload);
}
There is no reason to have one poller immediately after another one; I mean you don't need that QueueChannel.
It's really interesting what that magic callWork(payload) code does. Does it block for a long time? Even though it looks like void (nothing is returned to wait on), you may have some thread-starvation code in there which steals all the threads from the default TaskScheduler (10 by default).
Looks like this is closely related to your other question, Spring Integration ftp Thread process.
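If the intermediate QueueChannel really isn't needed, a hedged sketch of the change is simply to make inputChannel a DirectChannel, so the service activator runs on the inbound adapter's own polling thread and the extra default poller (and its TaskScheduler thread) goes away:
@Bean(name = "inputChannel")
public MessageChannel inputChannel() {
    // DirectChannel: transferredFilesFromFTP() is invoked directly on the adapter's polling thread
    return new DirectChannel();
}

// The defaultPoller() bean with PeriodicTrigger(100) can then be removed;
// the @Poller(fixedDelay = "1000") on the inbound channel adapter remains the only poller.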

Spring Integration Cassandra persistence workflow

I am trying to implement the following workflow with Spring Integration:
1) Poll a REST API
2) Store the POJO in a Cassandra cluster
It's my first attempt with Spring Integration, so I'm still a bit overwhelmed by the mass of information in the reference documentation. After some research, I was able to get the following working:
1) Poll the REST API
2) Transform the mapped POJO JSON result into a string
3) Save the string to a file
Here's the code:
@Configuration
public class ConsulIntegrationConfig {

    @InboundChannelAdapter(value = "consulHttp", poller = @Poller(maxMessagesPerPoll = "1", fixedDelay = "1000"))
    public String consulAgentPoller() {
        return "";
    }

    @Bean
    public MessageChannel consulHttp() {
        return MessageChannels.direct("consulHttp").get();
    }

    @Bean
    @ServiceActivator(inputChannel = "consulHttp")
    MessageHandler consulAgentHandler() {
        final HttpRequestExecutingMessageHandler handler =
                new HttpRequestExecutingMessageHandler("http://localhost:8500/v1/agent/self");
        handler.setExpectedResponseType(AgentSelfResult.class);
        handler.setOutputChannelName("consulAgentSelfChannel");
        LOG.info("Created bean 'consulAgentHandler'");
        return handler;
    }

    @Bean
    public MessageChannel consulAgentSelfChannel() {
        return MessageChannels.direct("consulAgentSelfChannel").get();
    }

    @Bean
    public MessageChannel consulAgentSelfFileChannel() {
        return MessageChannels.direct("consulAgentSelfFileChannel").get();
    }

    @Bean
    @ServiceActivator(inputChannel = "consulAgentSelfFileChannel")
    MessageHandler consulAgentFileHandler() {
        final Expression directoryExpression = new SpelExpressionParser().parseExpression("'./'");
        final FileWritingMessageHandler handler = new FileWritingMessageHandler(directoryExpression);
        handler.setFileNameGenerator(message -> "../../agent_self.txt");
        handler.setFileExistsMode(FileExistsMode.APPEND);
        handler.setCharset("UTF-8");
        handler.setExpectReply(false);
        return handler;
    }
}
@Component
public final class ConsulAgentTransformer {

    @Transformer(inputChannel = "consulAgentSelfChannel", outputChannel = "consulAgentSelfFileChannel")
    public String transform(final AgentSelfResult json) throws IOException {
        final String result = new StringBuilder(json.toString()).append("\n").toString();
        return result;
    }
}
This works fine!
But now, instead of writing the object to a file, I want to store it in a Cassandra cluster with spring-data-cassandra. For that, I commented out the file handler in the config, returned the POJO from the transformer, and created the following:
@MessagingGateway(name = "consulCassandraGateway", defaultRequestChannel = "consulAgentSelfFileChannel")
public interface CassandraStorageService {

    @Gateway(requestChannel = "consulAgentSelfFileChannel")
    void store(AgentSelfResult agentSelfResult);
}

@Component
public final class CassandraStorageServiceImpl implements CassandraStorageService {

    @Override
    public void store(AgentSelfResult agentSelfResult) {
        // use spring-data-cassandra repository to store
        LOG.info("Received 'AgentSelfResult': {} in Cassandra cluster...");
        LOG.info("Trying to store 'AgentSelfResult' in Cassandra cluster...");
    }
}
But this seems to be the wrong approach; the service method is never triggered.
So my question is: what would be a correct approach for my use case? Do I have to implement the MessageHandler interface in my service component and use a @ServiceActivator in my config? Or is there something missing in my current gateway approach? Or maybe there is another solution that I'm not able to see.
As mentioned before, I'm new to Spring Integration, so this may be a stupid question...
Nevertheless, thanks a lot in advance!
It's not clear how you are wiring in your CassandraStorageService bean.
The Spring Integration Cassandra Extension Project has a message-handler implementation.
The Cassandra Sink in spring-cloud-stream-modules uses it with Java configuration, so you can use that as an example.
So I finally made it work. All I needed to do was:
@Component
public final class CassandraStorageServiceImpl implements CassandraStorageService {

    @ServiceActivator(inputChannel = "consulAgentSelfFileChannel")
    @Override
    public void store(AgentSelfResult agentSelfResult) {
        // use spring-data-cassandra repository to store
        LOG.info("Received 'AgentSelfResult': {}...");
        LOG.info("Trying to store 'AgentSelfResult' in Cassandra cluster...");
    }
}
The CassandraMessageHandler and spring-cloud-stream seemed to be too big an overhead for my use case, and I didn't really understand them yet. With this solution, I keep control over what happens in my Spring component.
