spring batch getting stuck in parallel processing where works fine in serial processing - multithreading

I am quite new to Spring Batch and tried to run Spring batch with single thread. Now I need to add multithreading in step and have below configuration, but parallel processing is getting hang after some time and no trace on console after it processes some records. Earlier for single thread I used JdbcCursorItemReader and then switch to JdbcPagingItemReader for thread safe reader.
Reader is reading entries from postgres DB and then processor (which calls other rest webservice and return response to writer) and writer (which creates new file and update status data in DB) can execute parallelly.
#Bean
public Job job(JobBuilderFactory jobBuilderFactory,
StepBuilderFactory stepBuilderFactory,
ItemReader<OrderRequest> itemReader,
ItemProcessor<OrderRequest, OrderResponse> dataProcessor,
ItemWriter<OrderResponse> fileWriter, JobExecutionListener jobListener,
ItemReadListener<OrderRequest> stepItemReadListener,
SkipListener<OrderRequest, OrderResponse> stepSkipListener, TaskExecutor taskExecutor) {
Step step1 = stepBuilderFactory.get("Process-Data")
.<OrderRequest, OrderResponse>chunk(10)
.listener(stepItemReadListener)
.reader(itemReader)
.processor(dataProcessor)
.writer(fileWriter)
.faultTolerant()
.processorNonTransactional()
.skipLimit(5)
.skip(CustomException.class)
.listener(stepSkipListener)
.taskExecutor(taskExecutor)
.throttleLimit(5)
.build();
return jobBuilderFactory.get("Batch-Job")
.incrementer(new RunIdIncrementer())
.listener(jobListener)
.start(step1)
.build();
}
#StepScope
#Bean
public JdbcPagingItemReader<OrderRequest> jdbcPagingItemReader(#Qualifier("postgresDataSource") DataSource dataSource,
#Value("#{jobParameters[customerId]}") String customerId, OrderRequestRowMapper rowMapper) {
// reading database records using JDBC in a paging fashion
JdbcPagingItemReader<OrderRequest> reader = new JdbcPagingItemReader<>();
reader.setDataSource(dataSource);
reader.setFetchSize(1000);
reader.setRowMapper(rowMapper);
// Sort Keys
Map<String, Order> sortKeys = new HashMap<>();
sortKeys.put("OrderRequestID", Order.ASCENDING);
// Postgres implementation of a PagingQueryProvider using database specific features.
PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
queryProvider.setSelectClause("*");
queryProvider.setFromClause("FROM OrderRequest");
queryProvider.setWhereClause("CUSTOMER = '" + customerId + "'");
queryProvider.setSortKeys(sortKeys);
reader.setQueryProvider(queryProvider);
return reader;
}
#StepScope
#Bean
public SynchronizedItemStreamReader<OrderRequest> itemReader(JdbcPagingItemReader<OrderRequest> jdbcPagingItemReader) {
return new SynchronizedItemStreamReaderBuilder<OrderRequest>().delegate(jdbcPagingItemReader).build();
}
#Bean
public TaskExecutor taskExecutor() {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setCorePoolSize(5);
taskExecutor.setMaxPoolSize(5);
taskExecutor.setQueueCapacity(0);
return taskExecutor;
}
#StepScope
#Bean
ItemProcessor<OrderRequest, OrderResponse> dataProcessor() {
return new BatchDataFileProcessor();
}
#StepScope
#Bean
ItemWriter<OrderResponse> fileWriter() {
return new BatchOrderFileWriter();
}
#StepScope
#Bean
public ItemReadListener<OrderRequest> stepItemReadListener() {
return new StepItemReadListener();
}
#Bean
public JobExecutionListener jobListener() {
return new JobListener();
}
#StepScope
#Bean
public SkipListener<OrderRequest, OrderResponse> stepSkipListener() {
return new StepSkipListener();
}
What is problem with multithreading configuration here?
Batch works fine with single record at a time when used JdbcCursorItemReader and no TaskExecutor bean:
#StepScope
#Bean
public JdbcCursorItemReader<OrderRequest> jdbcCursorItemReader(#Qualifier("postgresDataSource") DataSource dataSource,
#Value("#{jobParameters[customerId]}") String customerId, OrderRequestRowMapper rowMapper) {
return new JdbcCursorItemReaderBuilder<OrderRequest>()
.name("jdbcCursorItemReader")
.dataSource(dataSource)
.queryArguments(customerId)
.sql(CommonConstant.FETCH_QUERY)
.rowMapper(rowMapper)
.saveState(true)
.build();
}

After changing TaskExecutor as follows its working now:
#Bean
public TaskExecutor taskExecutor() {
SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
taskExecutor.setConcurrencyLimit(concurrencyLimit);
return taskExecutor;
}
Didn't get what was the problem with earlier.

Related

spring batch api request on wait

I have written a simple spring batch project where
API to execute job: returns job ID on job launch
reads/processes/writes from/to DB in multithread parallel processing
(Launching the job asynchronously to get the job ID in advance so I can poll the status of the job from another API request.)
API to poll the status of the job with respect to the job ID passed.
Polling api works smoothly if job step's throttle limit is 7 or less.
However, if throttle limit is more than 7, job execution continues but polling api will be on wait till read/process releases.
Have also tried a simple api which simply returns String instead of polling but that goes on wait too.
Sample of the code as shown below:
#Configuration
#EnableBatchProcessing
public class SpringBatchConfig {
private int core = 200;
#Bean
public Job job() throws Exception {
return jobBuilderFactory.get(SC_Constants.JOB)
.incrementer(new RunIdIncrementer())
.listener(new Listener(transDAO))
.start(step_processRecords()
.build();
}
#Bean
public ThreadPoolTaskExecutor taskExecutor(){
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(this.core);
threadPoolTaskExecutor.setMaxPoolSize(this.core);
threadPoolTaskExecutor.setQueueCapacity(this.core);
threadPoolTaskExecutor.setThreadNamePrefix("threadExecutor");
return threadPoolTaskExecutor;
}
#Bean
#StepScope
public JdbcPagingItemReader<Transaction> itemReader(...) {
JdbcPagingItemReader<Transaction> itemReader = new JdbcPagingItemReader<Transaction>();
...
return itemReader;
}
#Bean
#StepScope
public ItemProcessor<Transaction,Transaction> processor() {
return new Processor();
}
#Bean
#StepScope
public ItemWriter<Transaction> writer(...) {
return new Writer();
}
#Bean
public Step step3_processRecords() throws Exception {
return stepBuilderFactory.get(SC_Constants.STEP_3_PROCESS_RECORDS)
.<Transaction,Transaction>chunk(this.chunk)
.reader(itemReader(null,null,null))
.processor(processor())
.writer(writer(null,null,null))
.taskExecutor(taskExecutor())
.throttleLimit(20)
.build();
}
}
file that extends DefaultBatchConfigurer has below:
#Override
public JobLauncher getJobLauncher() {
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(jobRepository);
SimpleAsyncTaskExecutor exec = new SimpleAsyncTaskExecutor();
exec.setConcurrencyLimit(concurrency_limit);
jobLauncher.setTaskExecutor(exec);
return jobLauncher;
}
Edit:
polling api code snippet
#POST
#Consumes(MediaType.APPLICATION_JSON)
#Path("/getJobStatus")
public Response getJobStatus(#RequestBody String body){
JSONObject jsonObject = new JSONObject(body);
Long jobId = jsonObject.get("jobId");
jobExecution = jobExplorer.getJobExecution(jobId);
batchStatus = jobExecution.getStatus().getBatchStatus();
write_count = jobExecution.getStepExecutions().iterator().next().getWriteCount();
responseDto.setJob_id(jobId);
responseDto.setWrite_count(write_count);
responseDto.setStatus(batchStatus.name());
return responseDto;
}
Second edit:
sharing a snippet of the jobrepository setting: using postgres jdbc job repository.
#Component
public class SpringBatchConfigurer extends DefaultBatchConfigurer{
...
#PostConstruct
public void initialize() {
try {
BasicDataSource dataSource = new BasicDataSource();
dataSource.setDriverClassName(driverClassName);
dataSource.setUsername(username);
dataSource.setPassword(password);
dataSource.setUrl(dsUrl + "?currentSchema=public");
dataSource.setInitialSize(3);
dataSource.setMinIdle(1);
dataSource.setMaxIdle(3);
dataSource.addConnectionProperty("maxConnLifetimeMillis", "30000");
this.transactionManager = new DataSourceTransactionManager(dataSource);
JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
factory.setDataSource(dataSource);
factory.setTransactionManager(transactionManager);
factory.afterPropertiesSet();
this.jobRepository = factory.getObject();
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(jobRepository);
jobLauncher.afterPropertiesSet();
this.jobLauncher = jobLauncher;
} catch (Exception e) {
throw new BatchConfigurationException(e);
}
}
Third Edit: Tried passing it as a local variable under this step. polling works but now, job execution is not happening. No threads generated. No processing is happening.
#Bean
public Step step3_processRecords() throws Exception {
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(this.core_size);
threadPoolTaskExecutor.setMaxPoolSize(this.max_pool_size);
threadPoolTaskExecutor.setQueueCapacity(this.queue_capacity);
threadPoolTaskExecutor.setThreadNamePrefix("threadExecutor");
return stepBuilderFactory.get("step3")
.<Transaction,Transaction>chunk(this.chunk)
.reader(itemReader(null,null,null))
.processor(processor())
.writer(writer(null,null,null))
.taskExecutor(threadPoolTaskExecutor)
.throttleLimit(20)
.build();
}

Spring Kafka, manual committing in different threads with multiple acks?

I am trying to ack kafka Message consumed via a batchListener in a separate thread; Using #Async for the called method.
#KafkaListener( topics = "${topic.name}" ,containerFactory = "kafkaListenerContainerFactoryBatch", id ="${kafkaconsumerprefix}")
public void consume(List<ConsumerRecord<String, String>> records,Acknowledgment ack) {
records.forEach(record -> asynchttpCaller.posttoHttpsURL(record,ack));
}
and my Async code is below where KafkaConsumerException extends BatchListenerFailedException
#Async
public void posttoHttpsURL(ConsumerRecord<String, String> record,Acknowledgment ack)
{
try {
//post to http
ack.acknowledge();
}
catch(Exception ex){
throw new KafkaConsumerException("Exception occured in sending via HTTPS",record);
}
}
With the below Configuration
#Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
bootstrapServers);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG,
"read_committed");
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 10000);
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);
props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,
maxpollRecords);
return props;
}
#Bean
public ConsumerFactory<Object, Object> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}
/**
* Batch Listener */
#Bean
#Primary
public ConcurrentKafkaListenerContainerFactory<Object, Object>
kafkaListenerContainerFactoryBatch (
ConcurrentKafkaListenerContainerFactoryConfigurer configurer,
ConsumerFactory<Object, Object> kafkaConsumerFactory,
KafkaOperations<? extends Object, ? extends Object> template ) {
ConcurrentKafkaListenerContainerFactory<Object, Object>
factory = new ConcurrentKafkaListenerContainerFactory<>();
configurer.configure(factory, consumerFactory());
factory.setBatchListener(true);
factory.getContainerProperties().setAckMode(AckMode.MANUAL);
DeadLetterPublishingRecoverer recoverer = new
DeadLetterPublishingRecoverer(template);
ExponentialBackOff fbo = new ExponentialBackOff();
fbo.setMaxElapsedTime(maxElapsedTime);
fbo.setInitialInterval(initialInterval);
fbo.setMultiplier(multiplier);
RecoveringBatchErrorHandler errorHandler = new
RecoveringBatchErrorHandler(recoverer, fbo);
factory.setBatchErrorHandler(errorHandler);
factory.setConcurrency(setConcurrency);
return factory;
}
This ack.acknowledge() acknowledges every record in that batch if using AckMode as MANUAL_IMMEDIATE and will ack only if all are success when AckMode is MANUAL.
The Scenario I have is --> there will be certain httpcalls that results in success and certain that gets a timeout both in the same batch; if the errored Messages has a greater offset than the successful one ;even the succesful one is not getting acknowledged and is being duplicated.
Not sure why BatchListenerFailedException always throws the whole batch though I give specifically the record that errored.
Any suggestions on how to implement this ?
You should not process asynchronously because offsets could be committed out-of-sequence.
BatchListenerFailedException will only work if thrown on the listener thread.

spring batch integration file poller

I am trying to build a spring batch application that starts a job only after a file comes into a directory. For that I need a file poller and something like the snippet found in Spring reference manual:
public class FileMessageToJobRequest {
private Job job;
private String fileParameterName;
public void setFileParameterName(String fileParameterName) {
this.fileParameterName = fileParameterName;
}
public void setJob(Job job) {
this.job = job;
}
#Transformer
public JobLaunchRequest toRequest(Message<File> message) {
JobParametersBuilder jobParametersBuilder =
new JobParametersBuilder();
jobParametersBuilder.addString(fileParameterName,
message.getPayload().getAbsolutePath());
return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
}
}
I would like to manage everything with configuration classes, but I can't really figure out how to make it work.
Your question isn't clear. Would be better to have something that works, then some your own PoC or attempt to reach the task.
But anyway that looks like you would like to avoid XML configuration and be only with Java & Annotation Configuration.
For this purpose I suggest you to take a look into Reference Manual and find this sample in the File Support chapter, too:
#Bean
#InboundChannelAdapter(value = "fileInputChannel", poller = #Poller(fixedDelay = "1000"))
public MessageSource<File> fileReadingMessageSource() {
FileReadingMessageSource source = new FileReadingMessageSource();
source.setDirectory(new File(INBOUND_PATH));
source.setFilter(new SimplePatternFileListFilter("*.txt"));
return source;
}

FTP - Using Spring Integration task-scheduler process stops after certain period

When trying to start the jar seperately in Unix machine the Thread for task-schedular is not listnening after some time but it is working fine in Windows machine.Even the application is working in linux on startup but going further sometime it is not working.Please let me know Is there any way to avoid the issue.
#Bean
#InboundChannelAdapter(value = "inputChannel", poller = #Poller(fixedDelay = "1000", maxMessagesPerPoll = "1"))
public MessageSource<?> receive() {
FtpInboundFileSynchronizingMessageSource messageSource = new FtpInboundFileSynchronizingMessageSource(synchronizer());
File Temp = new File(TEMP_FOLDER);
messageSource.setLocalDirectory(Temp);
messageSource.setAutoCreateLocalDirectory(true);
return messageSource;
}
private AbstractInboundFileSynchronizer<FTPFile> synchronizer() {
AbstractInboundFileSynchronizer<FTPFile> fileSynchronizer = new FtpInboundFileSynchronizer(sessionFactory());
fileSynchronizer.setRemoteDirectory(ftpFileLocation);
fileSynchronizer.setDeleteRemoteFiles(false);
Pattern pattern = Pattern.compile(".*\\.xml$");
FtpRegexPatternFileListFilter ftpRegexPatternFileListFilter = new FtpRegexPatternFileListFilter(pattern);
fileSynchronizer.setFilter(ftpRegexPatternFileListFilter);
return fileSynchronizer;
}
#Bean(name = "sessionFactory")
public SessionFactory<FTPFile> sessionFactory() {
DefaultFtpSessionFactory sessionFactory = new DefaultFtpSessionFactory();
sessionFactory.setHost(ftpHostName);
sessionFactory.setUsername(ftpUserName);
sessionFactory.setPassword(ftpPassWord);
return sessionFactory;
}
#Bean(name = "inputChannel")
public PollableChannel inputChannel() {
return new QueueChannel();
}
#Bean(name = PollerMetadata.DEFAULT_POLLER)
public PollerMetadata defaultPoller() {
PollerMetadata pollerMetadata = new PollerMetadata();
pollerMetadata.setTrigger(new PeriodicTrigger(100));
return pollerMetadata;
}
#ServiceActivator(inputChannel = "inputChannel")
public void transferredFilesFromFTP(File payload) {
callWork(payload);
}
There is no reason to have one poller immediately after another one. I mean you don't need that QueueChannel.
It's really interesting what does that magic callWork(payload); code do. Doesn't it have anything blocking for some long time? Even if that looks like void (without returning something to wait), but you may have there some thread starvation code, which steals all the thread from the default TaskScheduler (10 by default).
Looks like this is fully related to your another question Spring Integration ftp Thread process

Large data processing using Spring Batch Multi-threaded Step and RepositoryItemWriter/ RepositoryItemReader

I am trying to write a batch processing application using spring batch with multi-thread step.this is simple application reading data from a table and writing to another table but data is large around 2 million record .
I am using RepositoryItemReader & RepositoryItemWriter for reading and writing data. But after processing some data it failing due to Unable to acquire JDBC Connection.
//Config.Java
#Bean
public TaskExecutor taskExecutor() {
SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
taskExecutor.setConcurrencyLimit(10);
return taskExecutor;
}
#Bean(name = "personJob")
public Job personKeeperJob() {
Step step = stepBuilderFactory.get("step-1")
.<User, Person> chunk(1000)
.reader(userReader)
.processor(jpaProcessor)
.writer(personWriter)
.taskExecutor(taskExecutor())
.throttleLimit(10)
.build();
Job job = jobBuilderFactory.get("person-job")
.incrementer(new RunIdIncrementer())
.listener(this)
.start(step)
.build();
return job;
}
//Processor.Java
#Override
public Person process(User user) throws Exception {
Optional<User> userFromDb = userRepo.findById(user.getUserId());
Person person = new Person();
if(userFromDb.isPresent()) {
person.setName(userFromDb.get().getName());
person.setUserId(userFromDb.get().getUserId());
person.setDept(userFromDb.get().getDept());
}
return person;
}
//Reader.Java
#Autowired
public UserItemReader(final UserRepository repository) {
super();
this.repository = repository;
}
#PostConstruct
protected void init() {
final Map<String, Sort.Direction> sorts = new HashMap<>();
sorts.put("userId", Direction.ASC);
this.setRepository(this.repository);
this.setSort(sorts);
this.setMethodName("findAll");
}
//Writer.Java
#PostConstruct
protected void init() {
this.setRepository(repository);
}
#Transactional
public void write(List<? extends Person> persons) throws Exception {
repository.saveAll(persons);
}
application.properties
# Datasource
spring.datasource.platform=h2
spring.datasource.url=jdbc:h2:mem:batchdb
spring.main.allow-bean-definition-overriding=true
spring.datasource.hikari.maximum-pool-size=500
Error :
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:447)
......................
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48)
............................
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30927ms.
You run out of connections.
Try to set the Hikari Connection Pool to a bigger number:
spring.datasource.hikari.maximum-pool-size=20

Resources