I am using Spring Batch remote partitioning. My partitioned steps are not running in parallel; instead they run sequentially. What is the root cause of this issue?
I'm relatively new to Spring Batch, but I had a similar problem when I first tried writing my own partitioned step.
In my case, the problem was my taskExecutor, which wasn't asynchronous.
I added a @Bean that initialized an asyncTaskExecutor and chained that to my partition step. Eureka, it worked.
Here is an example:
private Step partitionStep() throws SQLException {
    return stepBuilderFactory.get("example_partitionstep")
            .partitioner(step().getName(), columnRangePartitioner(partitionColumn, tableName))
            .partitionHandler(taskExecutorPartitionHandler(step()))
            .build();
}
For the step:
private Step step() throws SQLException {
    return stepBuilderFactory.get("example_step")
            // the item types here are placeholders; substitute your actual input/output types
            .<MyItem, MyItem>chunk(1000)
            .reader(cursorItemReader(0L, 0L))
            .processor(compositeItemProcessor())
            .writer(itemWriter())
            .build();
}
And for the TaskExecutor:
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(6);
    return taskExecutor;
}
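For reference, a minimal sketch of what the taskExecutorPartitionHandler referenced above could look like; the grid size of 6 is an assumption matching the concurrency limit:
private PartitionHandler taskExecutorPartitionHandler(Step step) {
    TaskExecutorPartitionHandler handler = new TaskExecutorPartitionHandler();
    handler.setTaskExecutor(taskExecutor()); // the asynchronous executor is what enables parallel partitions
    handler.setStep(step);                   // the worker step executed for each partition
    handler.setGridSize(6);                  // assumed number of partitions
    return handler;
}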
I have exposed two APIs: /endpoint/A and /endpoint/B.
#GetMapping("/endpoint/A")
public ResponseEntity<ResponseA> controllerA() throws InterruptedException {
ResponseA responseA = serviceA.responseClient();
return ResponseEntity.ok().body(responseA);
}
#GetMapping("/endpoint/B")
public ResponseEntity<ResponseA> controllerB() throws InterruptedException {
ResponseA responseB = serviceB.responseClient();
return ResponseEntity.ok().body(responseB);
}
The service behind /endpoint/A internally calls an external /endpoint/C, and the service behind /endpoint/B internally calls /endpoint/D.
Because the external call made on behalf of /endpoint/A is slow, requests to /endpoint/A take a long time, all worker threads end up stuck, and that affects /endpoint/B as well.
I tried to solve this using an executor service with the following implementation:
@Bean(name = "serviceAExecutor")
public ThreadPoolTaskExecutor serviceAExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(100);
    taskExecutor.setMaxPoolSize(120);
    taskExecutor.setQueueCapacity(50);
    taskExecutor.setKeepAliveSeconds(120);
    taskExecutor.setThreadNamePrefix("serviceAExecutor");
    return taskExecutor;
}
Even after implementing this, if I receive more than 200 simultaneous requests on /endpoint/A (more than the default maximum number of threads in the Tomcat server), I get no responses from /endpoint/B, because all threads are busy getting responses for /endpoint/A or waiting in the queue.
Can someone please suggest a way to apply bucketing at the level of each exposed endpoint, allowing only a limited number of requests to be processed at a time and putting the rest into a bucket/queue, so that requests on other endpoints keep working?
Edit: the following is the solution approach.
#GetMapping("/endpoint/A")
public CompletableFuture<ResponseEntity<ResponseA>> controllerA() throws InterruptedException {
return CompletableFuture.supplyAsync(()->controllerHelperA());
}
#GetMapping("/endpoint/B")
public CompletableFuture<ResponseEntity<ResponseB>> controllerB() throws InterruptedException {
return CompletableFuture.supplyAsync(()->controllerHelperB());
}
private ResponseEntity<ResponseA> controllerHelperA(){
ResponseA responseA = serviceA.responseClient();
return ResponseEntity.ok().body(responseA);
}
private ResponseEntity<ResponseB> controllerHelperB(){
ResponseB responseB = serviceB.responseClient();
return ResponseEntity.ok().body(responseB);
}
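One caveat: supplyAsync without an executor argument runs on ForkJoinPool.commonPool(), not on the serviceAExecutor defined earlier. To route the work onto the dedicated pool, the two-argument overload can be used; a sketch, assuming the executor bean is injected as a serviceAExecutor field:
@GetMapping("/endpoint/A")
public CompletableFuture<ResponseEntity<ResponseA>> controllerA() {
    // run the helper on the dedicated pool instead of the common pool
    return CompletableFuture.supplyAsync(this::controllerHelperA, serviceAExecutor);
}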
Spring MVC supports the asynchronous servlet API introduced in Servlet 3.0. To make this easier, when your controller returns a Callable, CompletableFuture or DeferredResult, the work will run in a background thread, freeing the request-handling thread for further processing.
#GetMapping("/endpoint/A")
public CompletableFuture<ResponseEntity<ResponseA>> controllerA() throws InterruptedException {
return () {
return controllerHelperA();
}
}
private ResponseEntity<ResponseA> controllerHelperA(){
ResponseA responseA = serviceA.responseClient();
return ResponseEntity.ok().body(responseA);
}
Now this will be executed in a background thread. Depending on your version of Spring Boot, and on whether you have configured your own TaskExecutor, it will either
use the SimpleAsyncTaskExecutor (which will issue a warning in your logs),
use the default ThreadPoolTaskExecutor provided by Spring Boot, which is configurable through the spring.task.execution namespace, or
use your own TaskExecutor, which requires additional configuration.
If you don't have a custom TaskExecutor defined and are on a relatively recent version of Spring Boot (2.1 or up, IIRC), you can use the following properties to configure the TaskExecutor:
spring.task.execution.pool.core-size=20
spring.task.execution.pool.max-size=120
spring.task.execution.pool.queue-capacity=50
spring.task.execution.pool.keep-alive=120s
spring.task.execution.thread-name-prefix=async-web-thread
Generally this will be used to execute Spring MVC tasks in the background as well as regular @Async tasks.
If you want to explicitly configure which TaskExecutor to use for your web processing you can create a WebMvcConfigurer and implement the configureAsyncSupport method.
@Configuration
public class AsyncWebConfigurer implements WebMvcConfigurer {

    private final AsyncTaskExecutor taskExecutor;

    public AsyncWebConfigurer(AsyncTaskExecutor taskExecutor) {
        this.taskExecutor = taskExecutor;
    }

    @Override
    public void configureAsyncSupport(AsyncSupportConfigurer configurer) {
        configurer.setTaskExecutor(taskExecutor);
    }
}
You could use an @Qualifier on the constructor argument to specify which TaskExecutor you want to use.
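For example, a sketch of the constructor (Spring Boot names its auto-configured executor bean applicationTaskExecutor):
public AsyncWebConfigurer(@Qualifier("applicationTaskExecutor") AsyncTaskExecutor taskExecutor) {
    this.taskExecutor = taskExecutor;
}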
I have the following Spring Integration flow:
It gathers records from one database, converts them to JSON, and sends them to another database.
The idea is to have 10 pollers (channel0 to 9), each one a pollingFlowChanN bean. But I suspect they are all sharing the same thread.
How do I make the polling multi-threaded in this scenario?
private IntegrationFlow getChannelPoller(final int channel, final int pollSize, final long delay) {
    return IntegrationFlows.from(jdbcMessageSource(channel, pollSize),
                    c -> c.poller(Pollers.fixedDelay(delay)
                            .transactional(transactionManager)))
            .split()
            .handle(intControleToJson())
            .handle(pgsqlSink)
            .get();
}
@Bean
public IntegrationFlow pollingFlowChan0() {
    return getChannelPoller(0, properties.getChan0PollSize(), properties.getChan0Delay());
}

@Bean
public IntegrationFlow pollingFlowChan1() {
    return getChannelPoller(1, properties.getChan1PollSize(), properties.getChan1Delay());
}
....
I assume you are using the latest Spring Boot, which has a TaskScheduler auto-configured with one thread: https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#features.spring-integration. That's the best guess as to why your tasks use the same thread.
See also the answer here: Why does the SFTP Outbound Gateway not start working as soon as I start its Integration Flow?
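If that is the cause, one fix (a sketch, assuming Spring Boot's auto-configured scheduler) is to give the scheduler enough threads for all ten pollers:
spring.task.scheduling.pool.size=10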
I am quite new to Spring Batch. I first ran my batch single-threaded, and now I need to add multithreading in the step with the configuration below; but the parallel processing hangs after some time, with no trace on the console after it processes some records. For the single-threaded run I used JdbcCursorItemReader, and I switched to JdbcPagingItemReader for a thread-safe reader.
The reader reads entries from a Postgres DB; then the processor (which calls another REST web service and returns the response to the writer) and the writer (which creates a new file and updates status data in the DB) can execute in parallel.
@Bean
public Job job(JobBuilderFactory jobBuilderFactory,
               StepBuilderFactory stepBuilderFactory,
               ItemReader<OrderRequest> itemReader,
               ItemProcessor<OrderRequest, OrderResponse> dataProcessor,
               ItemWriter<OrderResponse> fileWriter, JobExecutionListener jobListener,
               ItemReadListener<OrderRequest> stepItemReadListener,
               SkipListener<OrderRequest, OrderResponse> stepSkipListener, TaskExecutor taskExecutor) {
    Step step1 = stepBuilderFactory.get("Process-Data")
            .<OrderRequest, OrderResponse>chunk(10)
            .listener(stepItemReadListener)
            .reader(itemReader)
            .processor(dataProcessor)
            .writer(fileWriter)
            .faultTolerant()
            .processorNonTransactional()
            .skipLimit(5)
            .skip(CustomException.class)
            .listener(stepSkipListener)
            .taskExecutor(taskExecutor)
            .throttleLimit(5)
            .build();
    return jobBuilderFactory.get("Batch-Job")
            .incrementer(new RunIdIncrementer())
            .listener(jobListener)
            .start(step1)
            .build();
}
@StepScope
@Bean
public JdbcPagingItemReader<OrderRequest> jdbcPagingItemReader(@Qualifier("postgresDataSource") DataSource dataSource,
        @Value("#{jobParameters[customerId]}") String customerId, OrderRequestRowMapper rowMapper) {
    // reading database records using JDBC in a paging fashion
    JdbcPagingItemReader<OrderRequest> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setFetchSize(1000);
    reader.setRowMapper(rowMapper);
    // sort keys
    Map<String, Order> sortKeys = new HashMap<>();
    sortKeys.put("OrderRequestID", Order.ASCENDING);
    // Postgres implementation of a PagingQueryProvider using database-specific features
    PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
    queryProvider.setSelectClause("*");
    queryProvider.setFromClause("FROM OrderRequest");
    queryProvider.setWhereClause("CUSTOMER = '" + customerId + "'");
    queryProvider.setSortKeys(sortKeys);
    reader.setQueryProvider(queryProvider);
    return reader;
}
@StepScope
@Bean
public SynchronizedItemStreamReader<OrderRequest> itemReader(JdbcPagingItemReader<OrderRequest> jdbcPagingItemReader) {
    return new SynchronizedItemStreamReaderBuilder<OrderRequest>().delegate(jdbcPagingItemReader).build();
}
@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(5);
    taskExecutor.setMaxPoolSize(5);
    taskExecutor.setQueueCapacity(0);
    return taskExecutor;
}
@StepScope
@Bean
ItemProcessor<OrderRequest, OrderResponse> dataProcessor() {
    return new BatchDataFileProcessor();
}

@StepScope
@Bean
ItemWriter<OrderResponse> fileWriter() {
    return new BatchOrderFileWriter();
}

@StepScope
@Bean
public ItemReadListener<OrderRequest> stepItemReadListener() {
    return new StepItemReadListener();
}

@Bean
public JobExecutionListener jobListener() {
    return new JobListener();
}

@StepScope
@Bean
public SkipListener<OrderRequest, OrderResponse> stepSkipListener() {
    return new StepSkipListener();
}
What is the problem with the multithreading configuration here?
The batch works fine processing one record at a time when using JdbcCursorItemReader and no TaskExecutor bean:
@StepScope
@Bean
public JdbcCursorItemReader<OrderRequest> jdbcCursorItemReader(@Qualifier("postgresDataSource") DataSource dataSource,
        @Value("#{jobParameters[customerId]}") String customerId, OrderRequestRowMapper rowMapper) {
    return new JdbcCursorItemReaderBuilder<OrderRequest>()
            .name("jdbcCursorItemReader")
            .dataSource(dataSource)
            .queryArguments(customerId)
            .sql(CommonConstant.FETCH_QUERY)
            .rowMapper(rowMapper)
            .saveState(true)
            .build();
}
After changing the TaskExecutor as follows, it is working now:
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(concurrencyLimit);
    return taskExecutor;
}
I didn't get what the problem was with the earlier configuration.
I want to set a Sync/Async TaskExecutor in Spring Batch. Is that possible?
I want to configure my step as follows:
<job id="myJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="step1">
        <tasklet task-executor="MyTaskExecutor">
            <chunk reader="ReaderFile" processor="ProcessFile" writer="WriterFile"
                   commit-interval="10" />
        </tasklet>
    </step>
</job>
Then create the bean "MyTaskExecutor" as follows:
<bean id="MyTaskExecutor" scope="step" class="batch.app.util.MyTaskExecutor"/>
Then in my class I configure the TaskExecutor (currently working as async):
package batch.app.util;

import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;

public class MyTaskExecutor extends SimpleAsyncTaskExecutor {

    public TaskExecutor taskExecutor() {
        return new SimpleAsyncTaskExecutor("spring_batch");
    }
}
I would like MyTaskExecutor to extend either SimpleAsyncTaskExecutor or SyncTaskExecutor depending on a condition... Or, if that is not possible, to be async, but to check that condition before executing the step and throw an error depending on the TaskExecutor executing that step.
I've been looking for a way to obtain the class of the TaskExecutor from the reader (or the processor or the writer), but didn't find anything.
Thank you very much.
You can use a condition inside your job config to pick the custom task executor. Below is a small snippet with annotation-driven bean creation for reference; you can use similar logic in your configuration approach as well.
The condition on the TaskExecutor is resolved at construction time, so we can create custom executors and add them to the step config:
Job job = jobBuilderFactory.get("testJob").incrementer(new RunIdIncrementer())
        .start(testStep()).next(testStep1()).end()
        .listener(jobCompletionListener()).build();

@Bean
public Step testStep() {
    boolean sync = false; // resolve this condition however you need
    AbstractTaskletStepBuilder<SimpleStepBuilder<String, Test>> stepConfig = stepBuilderFactory
            .get("testStep").<String, Test>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .listener(testListener());
    if (sync) {
        stepConfig.taskExecutor(syncTaskExecutor());
    } else {
        stepConfig.taskExecutor(asyncTaskExecutor());
    }
    return stepConfig.build();
}
@Bean
public TaskExecutor asyncTaskExecutor() {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(10);
    return taskExecutor;
}

// Similarly, the other TaskExecutor can have its own bean config
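For instance, a minimal sketch of the matching synchronous bean (SyncTaskExecutor runs each task in the calling thread):
@Bean
public TaskExecutor syncTaskExecutor() {
    // executes every task synchronously in the calling thread
    return new SyncTaskExecutor();
}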
I have a JdbcPollingChannelAdapter which reads data via JDBC. I want to make it poll manually (using a commandChannel). It should never poll automatically, and it should run immediately when I trigger a manual poll.
Below I am using a poller which runs every 24 hours just to get the channel running at all. I cannot use a cron expression that never fires (as in "Quartz: Cron expression that will never execute") since Pollers.cronExpression() takes no year field.
@Bean
public MessageSource<Object> jdbcMessageSource() {
    return new JdbcPollingChannelAdapter(this.dataSource, "SELECT...");
}

@Bean
public IntegrationFlow jdbcFlow() {
    return IntegrationFlows
            .from(jdbcMessageSource(),
                    spec -> spec.poller(Pollers.fixedRate(24, TimeUnit.HOURS)))
            .handle(System.out::println)
            .get();
}
Well, you are going the right way with the JdbcPollingChannelAdapter and commandChannel, but you shouldn't configure a SourcePollingChannelAdapter, as you do with that IntegrationFlows.from(jdbcMessageSource()).
What you really need is the jdbcMessageSource(), but to poll it manually you should configure a command-based flow:
@Bean
public IntegrationFlow jdbcFlow() {
    return IntegrationFlows
            .from("commandChannel")
            .handle(jdbcMessageSource(), "receive")
            .handle(System.out::println)
            .get();
}
Exactly that receive() is what the SourcePollingChannelAdapter calls on a timing basis; here it is invoked on demand instead.
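To trigger a poll on demand, send any message to commandChannel; a minimal sketch (the surrounding class and injection style are assumptions):
@Autowired
@Qualifier("commandChannel")
private MessageChannel commandChannel;

public void runPollNow() {
    // the payload is ignored; the flow calls jdbcMessageSource().receive() and prints the result
    commandChannel.send(new GenericMessage<>("trigger"));
}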