I am trying to build a Spring Batch application that starts a job only after a file arrives in a directory. For that I need a file poller and something like this snippet from the Spring reference manual:
public class FileMessageToJobRequest {

    private Job job;

    private String fileParameterName;

    public void setFileParameterName(String fileParameterName) {
        this.fileParameterName = fileParameterName;
    }

    public void setJob(Job job) {
        this.job = job;
    }

    @Transformer
    public JobLaunchRequest toRequest(Message<File> message) {
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        jobParametersBuilder.addString(fileParameterName,
                message.getPayload().getAbsolutePath());
        return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
    }

}
I would like to manage everything with configuration classes, but I can't really figure out how to make it work.
Your question isn't clear. It would be better to show something that already works for you, or your own PoC or attempt at the task.
Anyway, it looks like you would like to avoid XML configuration and use only Java & annotation configuration.
For this purpose I suggest you take a look into the Reference Manual and find this sample in the File Support chapter, too:
@Bean
@InboundChannelAdapter(value = "fileInputChannel", poller = @Poller(fixedDelay = "1000"))
public MessageSource<File> fileReadingMessageSource() {
    FileReadingMessageSource source = new FileReadingMessageSource();
    source.setDirectory(new File(INBOUND_PATH));
    source.setFilter(new SimplePatternFileListFilter("*.txt"));
    return source;
}
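To complete the flow, the message from fileInputChannel has to be transformed into a JobLaunchRequest and handed to a JobLaunchingGateway, along the lines of the Spring Batch reference manual. A minimal sketch, assuming the adapter above plus existing Job and JobLauncher beans (channel and parameter names are illustrative):

@Bean
public FileMessageToJobRequest fileMessageToJobRequest(Job job) {
    FileMessageToJobRequest transformer = new FileMessageToJobRequest();
    transformer.setJob(job);
    transformer.setFileParameterName("input.file.name");
    return transformer;
}

@Bean
public JobLaunchingGateway jobLaunchingGateway(JobLauncher jobLauncher) {
    // launches the job described by the incoming JobLaunchRequest
    // and replies with the resulting JobExecution
    return new JobLaunchingGateway(jobLauncher);
}

@Bean
public IntegrationFlow fileToJobFlow(FileMessageToJobRequest fileMessageToJobRequest,
        JobLaunchingGateway jobLaunchingGateway) {
    return IntegrationFlows.from("fileInputChannel")
            .transform(fileMessageToJobRequest)
            .handle(jobLaunchingGateway)
            .get();
}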
I have written a simple Spring Batch project where:
an API executes the job and returns the job ID on launch;
the job reads/processes/writes from/to the DB with multithreaded parallel processing
(the job is launched asynchronously so that the job ID is available in advance and I can poll the status of the job from another API request);
another API polls the status of the job for the job ID passed.
The polling API works smoothly if the job step's throttle limit is 7 or less. However, if the throttle limit is more than 7, job execution continues, but the polling API waits until the reader/processor releases threads. I have also tried a simple API that just returns a String instead of polling, but that waits too.
A sample of the code is shown below:
@Configuration
@EnableBatchProcessing
public class SpringBatchConfig {

    private int core = 200;

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get(SC_Constants.JOB)
                .incrementer(new RunIdIncrementer())
                .listener(new Listener(transDAO))
                .start(step3_processRecords())
                .build();
    }

    @Bean
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
        threadPoolTaskExecutor.setCorePoolSize(this.core);
        threadPoolTaskExecutor.setMaxPoolSize(this.core);
        threadPoolTaskExecutor.setQueueCapacity(this.core);
        threadPoolTaskExecutor.setThreadNamePrefix("threadExecutor");
        return threadPoolTaskExecutor;
    }

    @Bean
    @StepScope
    public JdbcPagingItemReader<Transaction> itemReader(...) {
        JdbcPagingItemReader<Transaction> itemReader = new JdbcPagingItemReader<Transaction>();
        ...
        return itemReader;
    }

    @Bean
    @StepScope
    public ItemProcessor<Transaction, Transaction> processor() {
        return new Processor();
    }

    @Bean
    @StepScope
    public ItemWriter<Transaction> writer(...) {
        return new Writer();
    }

    @Bean
    public Step step3_processRecords() throws Exception {
        return stepBuilderFactory.get(SC_Constants.STEP_3_PROCESS_RECORDS)
                .<Transaction, Transaction>chunk(this.chunk)
                .reader(itemReader(null, null, null))
                .processor(processor())
                .writer(writer(null, null, null))
                .taskExecutor(taskExecutor())
                .throttleLimit(20)
                .build();
    }
}
The class that extends DefaultBatchConfigurer has the following:
@Override
public JobLauncher getJobLauncher() {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    SimpleAsyncTaskExecutor exec = new SimpleAsyncTaskExecutor();
    exec.setConcurrencyLimit(concurrency_limit);
    jobLauncher.setTaskExecutor(exec);
    return jobLauncher;
}
Edit: polling API code snippet:
@POST
@Consumes(MediaType.APPLICATION_JSON)
@Path("/getJobStatus")
public Response getJobStatus(@RequestBody String body) {
    JSONObject jsonObject = new JSONObject(body);
    Long jobId = jsonObject.getLong("jobId");
    JobExecution jobExecution = jobExplorer.getJobExecution(jobId);
    BatchStatus batchStatus = jobExecution.getStatus();
    long write_count = jobExecution.getStepExecutions().iterator().next().getWriteCount();
    responseDto.setJob_id(jobId);
    responseDto.setWrite_count(write_count);
    responseDto.setStatus(batchStatus.name());
    return Response.ok(responseDto).build();
}
Second edit: sharing a snippet of the job repository setup; it is a Postgres JDBC job repository.
@Component
public class SpringBatchConfigurer extends DefaultBatchConfigurer {

    ...

    @PostConstruct
    public void initialize() {
        try {
            BasicDataSource dataSource = new BasicDataSource();
            dataSource.setDriverClassName(driverClassName);
            dataSource.setUsername(username);
            dataSource.setPassword(password);
            dataSource.setUrl(dsUrl + "?currentSchema=public");
            dataSource.setInitialSize(3);
            dataSource.setMinIdle(1);
            dataSource.setMaxIdle(3);
            dataSource.addConnectionProperty("maxConnLifetimeMillis", "30000");
            this.transactionManager = new DataSourceTransactionManager(dataSource);

            JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
            factory.setDataSource(dataSource);
            factory.setTransactionManager(transactionManager);
            factory.afterPropertiesSet();
            this.jobRepository = factory.getObject();

            SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
            jobLauncher.setJobRepository(jobRepository);
            jobLauncher.afterPropertiesSet();
            this.jobLauncher = jobLauncher;
        } catch (Exception e) {
            throw new BatchConfigurationException(e);
        }
    }
}
Third edit: I tried passing the executor as a local variable inside the step. Polling works now, but job execution does not happen: no threads are created and no processing takes place.
@Bean
public Step step3_processRecords() throws Exception {
    ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
    threadPoolTaskExecutor.setCorePoolSize(this.core_size);
    threadPoolTaskExecutor.setMaxPoolSize(this.max_pool_size);
    threadPoolTaskExecutor.setQueueCapacity(this.queue_capacity);
    threadPoolTaskExecutor.setThreadNamePrefix("threadExecutor");
    return stepBuilderFactory.get("step3")
            .<Transaction, Transaction>chunk(this.chunk)
            .reader(itemReader(null, null, null))
            .processor(processor())
            .writer(writer(null, null, null))
            .taskExecutor(threadPoolTaskExecutor)
            .throttleLimit(20)
            .build();
}
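An aside (my assumption, not confirmed in the thread): a ThreadPoolTaskExecutor created as a plain local variable is never initialized by the container; for a @Bean the container calls afterPropertiesSet() automatically, but here nothing ever creates the underlying thread pool. If that is the cause, initializing it explicitly should help:

ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(this.core_size);
threadPoolTaskExecutor.setMaxPoolSize(this.max_pool_size);
threadPoolTaskExecutor.setQueueCapacity(this.queue_capacity);
threadPoolTaskExecutor.setThreadNamePrefix("threadExecutor");
// required when the executor is not managed as a Spring bean
threadPoolTaskExecutor.initialize();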
I am quite new to Spring Batch and first ran the job single-threaded. Now I need to add multithreading to the step with the configuration below, but the parallel processing hangs after some time, with no trace on the console after it processes some records. For the single-threaded version I used JdbcCursorItemReader and then switched to JdbcPagingItemReader for a thread-safe reader.
The reader reads entries from a Postgres DB; the processor (which calls another REST web service and returns the response to the writer) and the writer (which creates a new file and updates status data in the DB) can execute in parallel.
@Bean
public Job job(JobBuilderFactory jobBuilderFactory,
        StepBuilderFactory stepBuilderFactory,
        ItemReader<OrderRequest> itemReader,
        ItemProcessor<OrderRequest, OrderResponse> dataProcessor,
        ItemWriter<OrderResponse> fileWriter, JobExecutionListener jobListener,
        ItemReadListener<OrderRequest> stepItemReadListener,
        SkipListener<OrderRequest, OrderResponse> stepSkipListener, TaskExecutor taskExecutor) {
    Step step1 = stepBuilderFactory.get("Process-Data")
            .<OrderRequest, OrderResponse>chunk(10)
            .listener(stepItemReadListener)
            .reader(itemReader)
            .processor(dataProcessor)
            .writer(fileWriter)
            .faultTolerant()
            .processorNonTransactional()
            .skipLimit(5)
            .skip(CustomException.class)
            .listener(stepSkipListener)
            .taskExecutor(taskExecutor)
            .throttleLimit(5)
            .build();
    return jobBuilderFactory.get("Batch-Job")
            .incrementer(new RunIdIncrementer())
            .listener(jobListener)
            .start(step1)
            .build();
}
@StepScope
@Bean
public JdbcPagingItemReader<OrderRequest> jdbcPagingItemReader(@Qualifier("postgresDataSource") DataSource dataSource,
        @Value("#{jobParameters[customerId]}") String customerId, OrderRequestRowMapper rowMapper) {
    // reading database records using JDBC in a paging fashion
    JdbcPagingItemReader<OrderRequest> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setFetchSize(1000);
    reader.setRowMapper(rowMapper);
    // sort keys
    Map<String, Order> sortKeys = new HashMap<>();
    sortKeys.put("OrderRequestID", Order.ASCENDING);
    // Postgres implementation of a PagingQueryProvider using database-specific features
    PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
    queryProvider.setSelectClause("*");
    queryProvider.setFromClause("FROM OrderRequest");
    queryProvider.setWhereClause("CUSTOMER = '" + customerId + "'");
    queryProvider.setSortKeys(sortKeys);
    reader.setQueryProvider(queryProvider);
    return reader;
}
@StepScope
@Bean
public SynchronizedItemStreamReader<OrderRequest> itemReader(JdbcPagingItemReader<OrderRequest> jdbcPagingItemReader) {
    return new SynchronizedItemStreamReaderBuilder<OrderRequest>().delegate(jdbcPagingItemReader).build();
}
@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(5);
    taskExecutor.setMaxPoolSize(5);
    taskExecutor.setQueueCapacity(0);
    return taskExecutor;
}
@StepScope
@Bean
ItemProcessor<OrderRequest, OrderResponse> dataProcessor() {
    return new BatchDataFileProcessor();
}

@StepScope
@Bean
ItemWriter<OrderResponse> fileWriter() {
    return new BatchOrderFileWriter();
}

@StepScope
@Bean
public ItemReadListener<OrderRequest> stepItemReadListener() {
    return new StepItemReadListener();
}

@Bean
public JobExecutionListener jobListener() {
    return new JobListener();
}

@StepScope
@Bean
public SkipListener<OrderRequest, OrderResponse> stepSkipListener() {
    return new StepSkipListener();
}
What is the problem with the multithreading configuration here?
The batch works fine, one record at a time, when using JdbcCursorItemReader and no TaskExecutor bean:
@StepScope
@Bean
public JdbcCursorItemReader<OrderRequest> jdbcCursorItemReader(@Qualifier("postgresDataSource") DataSource dataSource,
        @Value("#{jobParameters[customerId]}") String customerId, OrderRequestRowMapper rowMapper) {
    return new JdbcCursorItemReaderBuilder<OrderRequest>()
            .name("jdbcCursorItemReader")
            .dataSource(dataSource)
            .queryArguments(customerId)
            .sql(CommonConstant.FETCH_QUERY)
            .rowMapper(rowMapper)
            .saveState(true)
            .build();
}
After changing the TaskExecutor as follows, it is working now:
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(concurrencyLimit);
    return taskExecutor;
}
I still don't understand what the problem was with the earlier configuration.
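A guess at the reason (not verified): SimpleAsyncTaskExecutor never queues or rejects work, while a ThreadPoolTaskExecutor with core = max = 5 and queueCapacity = 0 rejects a submission as soon as all five threads are busy. If you want to keep a bounded pool, a sketch along these lines avoids that failure mode (the sizes are illustrative):

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(5);
    taskExecutor.setMaxPoolSize(5);
    // give the pool some queue headroom instead of capacity 0
    taskExecutor.setQueueCapacity(10);
    // run the task in the caller when saturated rather than rejecting it
    taskExecutor.setRejectedExecutionHandler(new java.util.concurrent.ThreadPoolExecutor.CallerRunsPolicy());
    return taskExecutor;
}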
When a large file is uploaded to a polled directory and an attempt is made to unzip it, an error occurs because the file is not yet complete.
How can the poll be retried after some time?
@Bean
@InboundChannelAdapter(value = "inputChannel", poller = @Poller(fixedRate = "1500"))
public FileReadingMessageSource poll() {
    FileReadingMessageSource source = new FileReadingMessageSource();
    source.setScanEachPoll(true);
    source.setDirectory(new File(pathConfig.getIncomingDirPath()));
    source.setUseWatchService(true);
    source.setFilter(new SimplePatternFileListFilter("*.zip"));
    return source;
}

@Transformer(inputChannel = "inputChannel", outputChannel = "unzipChannel")
public Message<?> convert(Message<File> fileMessage) {
    UnZipTransformer unzipTransformer = new UnZipTransformer();
    unzipTransformer.setZipResultType(ZipResultType.FILE);
    unzipTransformer.setWorkDirectory(new File(pathConfig.getWorkDirPath()));
    unzipTransformer.setDeleteFiles(false);
    unzipTransformer.afterPropertiesSet();
    return unzipTransformer.transform(fileMessage);
}
EDIT: I can't get LastModifiedFileListFilter to work with large files. It works fine with small ones, but when I upload a large one, it does not react any more.
EDIT 2: Thanks to advice from Artem and Gary, here is the solution that worked for me.
@Bean
@InboundChannelAdapter(value = "inputChannel", poller = @Poller(fixedDelay = "1500"))
public FileReadingMessageSource poll() {
    FileReadingMessageSource source = new FileReadingMessageSource();
    source.setScanEachPoll(true);
    source.setDirectory(new File(pathConfig.getIncomingDirPath()));
    source.setUseWatchService(true);
    FileListFilter simplePatternFileListFilter = new SimplePatternFileListFilter("*.zip");
    source.setFilter(new ChainFileListFilter<>().addFilter(simplePatternFileListFilter));
    return source;
}

@Transformer(inputChannel = "inputChannel", outputChannel = "unzipChannel",
        adviceChain = "retryOnIncompleteData")
public Message<?> convertZip(Message<File> fileMessage) {
    UnZipTransformer unzipTransformer = new UnZipTransformer();
    unzipTransformer.setZipResultType(ZipResultType.FILE);
    unzipTransformer.setWorkDirectory(new File(pathConfig.getWorkDirPath()));
    unzipTransformer.setDeleteFiles(false);
    unzipTransformer.afterPropertiesSet();
    return unzipTransformer.transform(fileMessage);
}
@Bean
public Advice retryOnIncompleteData() {
    RequestHandlerRetryAdvice advice = new RequestHandlerRetryAdvice();
    RetryTemplate template = createRetryTemplate();
    advice.setRetryTemplate(template);
    return advice;
}

private RetryTemplate createRetryTemplate() {
    RetryTemplate template = new RetryTemplate();
    SimpleRetryPolicy policy = new SimpleRetryPolicy();
    policy.setMaxAttempts(15);
    template.setRetryPolicy(policy);
    FixedBackOffPolicy backOffPolicy = new FixedBackOffPolicy();
    backOffPolicy.setBackOffPeriod(25000L);
    template.setBackOffPolicy(backOffPolicy);
    return template;
}
Well, actually, since you use only a SimplePatternFileListFilter, all of your files are going to be re-fetched on each poll.
To avoid fetching the same file over and over (if you don't delete it at the end of the process), we recommend using an AcceptOnceFileListFilter as part of a CompositeFileListFilter.
In case of an error like yours, you can use an ExpressionEvaluatingRequestHandlerAdvice whose onFailureExpression calls the ResettableFileListFilter.remove() implementation of the AcceptOnceFileListFilter.
On the other hand, instead of the ExpressionEvaluatingRequestHandlerAdvice, you can consider using a RequestHandlerRetryAdvice to retry the unzipping process: that way you don't need to re-fetch the file from the local file system.
These AOP advices should be applied to the @Transformer.
See the Reference Manual for more information.
BTW, I would say fixedRate is not good for large files, especially when you have errors like this; fixedDelay would be better. See their JavaDocs for more information.
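A minimal sketch of the suggested filter combination, reusing the *.zip pattern from the question (the bean name is illustrative):

@Bean
public CompositeFileListFilter<File> zipFilter() {
    CompositeFileListFilter<File> filter = new CompositeFileListFilter<>();
    filter.addFilter(new SimplePatternFileListFilter("*.zip"));
    // remember files that have already been emitted so they are not re-fetched on every poll
    filter.addFilter(new AcceptOnceFileListFilter<>());
    return filter;
}

The AcceptOnceFileListFilter is the ResettableFileListFilter mentioned above, so a failure advice can call its remove(File) to let the file be picked up again.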
When I start the jar separately on a Unix machine, the task-scheduler thread stops listening after some time, but it works fine on a Windows machine. The application also works on Linux at startup, but after running for a while it stops working. Please let me know if there is any way to avoid this issue.
@Bean
@InboundChannelAdapter(value = "inputChannel", poller = @Poller(fixedDelay = "1000", maxMessagesPerPoll = "1"))
public MessageSource<?> receive() {
    FtpInboundFileSynchronizingMessageSource messageSource = new FtpInboundFileSynchronizingMessageSource(synchronizer());
    File temp = new File(TEMP_FOLDER);
    messageSource.setLocalDirectory(temp);
    messageSource.setAutoCreateLocalDirectory(true);
    return messageSource;
}

private AbstractInboundFileSynchronizer<FTPFile> synchronizer() {
    AbstractInboundFileSynchronizer<FTPFile> fileSynchronizer = new FtpInboundFileSynchronizer(sessionFactory());
    fileSynchronizer.setRemoteDirectory(ftpFileLocation);
    fileSynchronizer.setDeleteRemoteFiles(false);
    Pattern pattern = Pattern.compile(".*\\.xml$");
    FtpRegexPatternFileListFilter ftpRegexPatternFileListFilter = new FtpRegexPatternFileListFilter(pattern);
    fileSynchronizer.setFilter(ftpRegexPatternFileListFilter);
    return fileSynchronizer;
}

@Bean(name = "sessionFactory")
public SessionFactory<FTPFile> sessionFactory() {
    DefaultFtpSessionFactory sessionFactory = new DefaultFtpSessionFactory();
    sessionFactory.setHost(ftpHostName);
    sessionFactory.setUsername(ftpUserName);
    sessionFactory.setPassword(ftpPassWord);
    return sessionFactory;
}

@Bean(name = "inputChannel")
public PollableChannel inputChannel() {
    return new QueueChannel();
}

@Bean(name = PollerMetadata.DEFAULT_POLLER)
public PollerMetadata defaultPoller() {
    PollerMetadata pollerMetadata = new PollerMetadata();
    pollerMetadata.setTrigger(new PeriodicTrigger(100));
    return pollerMetadata;
}

@ServiceActivator(inputChannel = "inputChannel")
public void transferredFilesFromFTP(File payload) {
    callWork(payload);
}
There is no reason to have one poller immediately after another; I mean, you don't need that QueueChannel.
It's really interesting what that magic callWork(payload) code does. Doesn't it block for some long time? Even though it looks like a void method (nothing returned to wait on), you may have thread-starvation code in there, which steals all the threads from the default TaskScheduler (10 by default).
This looks fully related to your other question: Spring Integration ftp Thread process.
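If callWork(payload) really does block for a long time, one option (my assumption, not something from the original posts) is to give the default poller its own executor, so the scheduler's threads are handed back immediately:

@Bean(name = PollerMetadata.DEFAULT_POLLER)
public PollerMetadata defaultPoller() {
    PollerMetadata pollerMetadata = new PollerMetadata();
    pollerMetadata.setTrigger(new PeriodicTrigger(100));
    // run the polled work on a dedicated pool instead of a TaskScheduler thread
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4); // illustrative size
    executor.initialize();
    pollerMetadata.setTaskExecutor(executor);
    return pollerMetadata;
}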
I am trying to realize the following workflow with Spring Integration:
1) Poll a REST API
2) Store the resulting POJO in a Cassandra cluster
It's my first try with Spring Integration, so I'm still a bit overwhelmed by the mass of information in the reference. After some research, I could make the following work:
1) Poll the REST API
2) Transform the mapped POJO JSON result into a string
3) Save the string to a file
Here's the code:
@Configuration
public class ConsulIntegrationConfig {

    @InboundChannelAdapter(value = "consulHttp", poller = @Poller(maxMessagesPerPoll = "1", fixedDelay = "1000"))
    public String consulAgentPoller() {
        return "";
    }

    @Bean
    public MessageChannel consulHttp() {
        return MessageChannels.direct("consulHttp").get();
    }

    @Bean
    @ServiceActivator(inputChannel = "consulHttp")
    MessageHandler consulAgentHandler() {
        final HttpRequestExecutingMessageHandler handler =
                new HttpRequestExecutingMessageHandler("http://localhost:8500/v1/agent/self");
        handler.setExpectedResponseType(AgentSelfResult.class);
        handler.setOutputChannelName("consulAgentSelfChannel");
        LOG.info("Created bean 'consulAgentHandler'");
        return handler;
    }

    @Bean
    public MessageChannel consulAgentSelfChannel() {
        return MessageChannels.direct("consulAgentSelfChannel").get();
    }

    @Bean
    public MessageChannel consulAgentSelfFileChannel() {
        return MessageChannels.direct("consulAgentSelfFileChannel").get();
    }

    @Bean
    @ServiceActivator(inputChannel = "consulAgentSelfFileChannel")
    MessageHandler consulAgentFileHandler() {
        final Expression directoryExpression = new SpelExpressionParser().parseExpression("'./'");
        final FileWritingMessageHandler handler = new FileWritingMessageHandler(directoryExpression);
        handler.setFileNameGenerator(message -> "../../agent_self.txt");
        handler.setFileExistsMode(FileExistsMode.APPEND);
        handler.setCharset("UTF-8");
        handler.setExpectReply(false);
        return handler;
    }
}
@Component
public final class ConsulAgentTransformer {

    @Transformer(inputChannel = "consulAgentSelfChannel", outputChannel = "consulAgentSelfFileChannel")
    public String transform(final AgentSelfResult json) throws IOException {
        return json.toString() + "\n";
    }
}

This works fine!
But now, instead of writing the object to a file, I want to store it in a Cassandra cluster with spring-data-cassandra. For that, I commented out the file handler in the config, returned the POJO from the transformer, and created the following:
@MessagingGateway(name = "consulCassandraGateway", defaultRequestChannel = "consulAgentSelfFileChannel")
public interface CassandraStorageService {

    @Gateway(requestChannel = "consulAgentSelfFileChannel")
    void store(AgentSelfResult agentSelfResult);
}

@Component
public final class CassandraStorageServiceImpl implements CassandraStorageService {

    @Override
    public void store(AgentSelfResult agentSelfResult) {
        //use spring-data-cassandra repository to store
        LOG.info("Received 'AgentSelfResult': {} in Cassandra cluster...");
        LOG.info("Trying to store 'AgentSelfResult' in Cassandra cluster...");
    }
}
But this seems to be the wrong approach; the service method is never triggered.
So my question is: what would be a correct approach for my use case? Do I have to implement the MessageHandler interface in my service component and use a @ServiceActivator in my config? Or is something missing in my current gateway approach? Or maybe there is another solution that I'm not able to see...
Like I mentioned before, I'm new to SI, so this may be a stupid question...
Nevertheless, thanks a lot in advance!
It's not clear how you are wiring in your CassandraStorageService bean.
The Spring Integration Cassandra Extension project has a message-handler implementation.
The Cassandra Sink in spring-cloud-stream-modules uses it with Java configuration, so you can use that as an example.
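If you'd rather stay with plain Spring Integration, here is a minimal sketch of the @ServiceActivator approach mentioned in the question; AgentSelfResultRepository is an assumed spring-data-cassandra repository, not something from the original post:

@Bean
@ServiceActivator(inputChannel = "consulAgentSelfFileChannel")
public MessageHandler cassandraStoreHandler(AgentSelfResultRepository repository) {
    // persist each incoming AgentSelfResult payload via the repository
    return message -> repository.save((AgentSelfResult) message.getPayload());
}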
So I finally made it work. All I needed to do was the following:
@Component
public final class CassandraStorageServiceImpl implements CassandraStorageService {

    @ServiceActivator(inputChannel = "consulAgentSelfFileChannel")
    @Override
    public void store(AgentSelfResult agentSelfResult) {
        //use spring-data-cassandra repository to store
        LOG.info("Received 'AgentSelfResult': {}...");
        LOG.info("Trying to store 'AgentSelfResult' in Cassandra cluster...");
    }
}
The CassandraMessageHandler and spring-cloud-stream seemed like too big an overhead for my use case, and I didn't really understand them yet... And with this solution, I keep control over what happens in my Spring component.