Spring Batch thread-safe Map job repository - multithreading

the Spring Batch docs say of the Map-backed job repository:
Note that the in-memory repository is volatile and so does not allow restart between JVM instances. It also cannot guarantee that two job instances with the same parameters are launched simultaneously, and is not suitable for use in a multi-threaded Job, or a locally partitioned Step. So use the database version of the repository wherever you need those features.
I would like to use a Map job repository, and I do not care about restarting, prevention of concurrent job executions, etc. but I do care about being able to use multi-threading and local partitioning.
My batch application has some partitioned steps, and at first glance it seems to run just fine with a Map-backed job repository.
What is the reason this is said to be not possible with MapJobRepositoryFactoryBean? Looking at the implementation of the Map DAOs, they use ConcurrentHashMap. Is this not thread-safe?

I would advise you to follow the documentation, rather than relying on implementation details. Even if the maps are individually thread-safe, there might be race conditions in changes that involve more than one of these maps.
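To illustrate the kind of problem (this is not Spring Batch's actual code, just a sketch): even if each map is thread-safe on its own, a compound update that spans two maps is not atomic.
import java.util.concurrent.ConcurrentHashMap;

class TwoMapRace {
    // Each map is thread-safe on its own...
    static final ConcurrentHashMap<Long, String> executions = new ConcurrentHashMap<>();
    static final ConcurrentHashMap<Long, String> stepExecutions = new ConcurrentHashMap<>();

    // ...but this compound operation is not: a concurrent reader can observe the
    // first map already updated while the second still lacks the matching entry,
    // and another writer can interleave its own puts between the two calls.
    static void saveBoth(long id, String job, String step) {
        executions.put(id, job);
        // <-- inconsistent state is visible here
        stepExecutions.put(id, step);
    }
}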
You can use an in-memory database very easily. Example:
@Grapes([
    @Grab('org.springframework:spring-jdbc:4.0.5.RELEASE'),
    @Grab('com.h2database:h2:1.3.175'),
    @Grab('org.springframework.batch:spring-batch-core:3.0.6.RELEASE'),
    // must be passed with -cp, for whatever reason the GroovyClassLoader
    // is not used for com.thoughtworks.xstream.io.json.JettisonMappedXmlDriver
    //@Grab('org.codehaus.jettison:jettison:1.2'),
])
import org.h2.jdbcx.JdbcDataSource
import org.springframework.batch.core.Job
import org.springframework.batch.core.JobParameters
import org.springframework.batch.core.Step
import org.springframework.batch.core.StepContribution
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory
import org.springframework.batch.core.launch.JobLauncher
import org.springframework.batch.core.scope.context.ChunkContext
import org.springframework.batch.core.step.tasklet.Tasklet
import org.springframework.batch.repeat.RepeatStatus
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.context.annotation.AnnotationConfigApplicationContext
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.core.io.ResourceLoader
import org.springframework.jdbc.datasource.init.DatabasePopulatorUtils
import org.springframework.jdbc.datasource.init.ResourceDatabasePopulator
import javax.annotation.PostConstruct
import javax.sql.DataSource
@Configuration
@EnableBatchProcessing
class AppConfig {

    @Autowired
    private JobBuilderFactory jobs

    @Autowired
    private StepBuilderFactory steps

    @Bean
    public Job job() {
        return jobs.get("myJob").start(step1()).build()
    }

    @Bean
    Step step1() {
        this.steps.get('step1')
                .tasklet(new MyTasklet())
                .build()
    }

    @Bean
    DataSource dataSource() {
        new JdbcDataSource().with {
            url = 'jdbc:h2:mem:temp_db;DB_CLOSE_DELAY=-1'
            user = 'sa'
            password = 'sa'
            it
        }
    }

    @Bean
    BatchSchemaPopulator batchSchemaPopulator() {
        new BatchSchemaPopulator()
    }
}
class BatchSchemaPopulator {

    @Autowired
    ResourceLoader resourceLoader

    @Autowired
    DataSource dataSource

    @PostConstruct
    void init() {
        def populator = new ResourceDatabasePopulator()
        populator.addScript(
                resourceLoader.getResource(
                        'classpath:/org/springframework/batch/core/schema-h2.sql'))
        DatabasePopulatorUtils.execute populator, dataSource
    }
}
class MyTasklet implements Tasklet {

    @Override
    RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        println 'TEST!'
        return RepeatStatus.FINISHED
    }
}
def ctx = new AnnotationConfigApplicationContext(AppConfig)
def launcher = ctx.getBean(JobLauncher)
def jobExecution = launcher.run(ctx.getBean(Job), new JobParameters([:]))
println "Status is: ${jobExecution.status}"

Related

Spring reactive file integration

I am trying to use spring-integration-file to poll a directory and create a reactive stream from files placed in this directory. This is working for the most part, but when I place a file while no subscriber is in place, I get an exception. To demonstrate the problem I have written a small demo application:
import org.reactivestreams.Publisher;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.http.MediaType;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;
import org.springframework.messaging.Message;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;
import java.io.File;
@SpringBootApplication
@RestController
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    @Bean
    public Publisher<Message<File>> reactiveSource() {
        return IntegrationFlows
                .from(Files.inboundAdapter(new File("."))
                                .patternFilter("*.csv"),
                        e -> e.poller(Pollers.fixedDelay(1000)))
                .channel("processFileChannel")
                .toReactivePublisher();
    }

    @GetMapping(value = "/files", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> files() {
        return Flux.from(reactiveSource())
                .map(message -> message.getPayload().getAbsolutePath());
    }
}
So if I now do a curl to localhost:8080/files and then place a CSV file in the root directory of the project, everything is fine; I see the path of the file as the response to my curl. But when I don't do a curl and then place a file in the root directory, I get the following exception:
java.lang.IllegalStateException: The [bean 'reactiveSource.channel#0'; defined in: 'com.example.demo.DemoApplication';
from source: 'bean method reactiveSource'] doesn't have subscribers to accept messages
at org.springframework.util.Assert.state(Assert.java:97)
at org.springframework.integration.channel.FluxMessageChannel.doSend(FluxMessageChannel.java:61)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:570)
... 38 more
I thought one of the attributes of reactive streams was that when there was no subscriber the stream would not start due to the stream being lazy. But apparently this is not the case. Could someone explain what I would need to do to have the stream not start if there is no subscriber?
If you use one of the latest versions, you can use a FluxMessageChannel instead of that DirectChannel for the "processFileChannel". This way the SourcePollingChannelAdapter becomes reactive, and the source is indeed not polled until a subscription happens on that FluxMessageChannel.
You then create a Flux in your files() API from this FluxMessageChannel - no need for .toReactivePublisher().
See more in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/reactive-streams.html#source-polling-channel-adapter
The point is that .toReactivePublisher() turns the integration flow into a Publisher exactly at that point. Everything before this point runs in the regular, imperative way and works independently of the downstream logic.
UPDATE
Something like this:
@Bean
FluxMessageChannel filesChannel() {
    return new FluxMessageChannel();
}

@Bean
public IntegrationFlow reactiveSource() {
    return IntegrationFlows
            .from(Files.inboundAdapter(new File("."))
                            .patternFilter("*.csv"),
                    e -> e.poller(Pollers.fixedDelay(1000)))
            .channel(filesChannel())
            .get();
}

@GetMapping(value = "/files", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> files() {
    return Flux.from(filesChannel())
            .map(message -> ((File) message.getPayload()).getAbsolutePath());
}

Operation APPEND failed with HTTP500?

package org.apache.spark.examples.kafkaToflink;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import com.microsoft.azure.datalake.store.ADLException;
import com.microsoft.azure.datalake.store.ADLFileOutputStream;
import com.microsoft.azure.datalake.store.ADLStoreClient;
import com.microsoft.azure.datalake.store.IfExists;
import com.microsoft.azure.datalake.store.oauth2.AccessTokenProvider;
import com.microsoft.azure.datalake.store.oauth2.ClientCredsTokenProvider;
import scala.util.parsing.combinator.testing.Str;
public class App {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.1.72:9092");
        properties.setProperty("group.id", "test");

        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer010<String>("tenant", new SimpleStringSchema(), properties), "Kafka_Source");
        stream.addSink(new ADLSink()).name("Custom_Sink").setParallelism(128);
        env.execute("App");
    }
}
class ADLSink<String> extends RichSinkFunction<String> {

    private java.lang.String clientId = "***********";
    private java.lang.String authTokenEndpoint = "***************";
    private java.lang.String clientKey = "*****************";
    private java.lang.String accountFQDN = "****************";
    private java.lang.String filename = "/Bitfinex/ETHBTC/ORDERBOOK/ORDERBOOK.json";

    @Override
    public void invoke(String value) {
        AccessTokenProvider provider = new ClientCredsTokenProvider(authTokenEndpoint, clientId, clientKey);
        ADLStoreClient client = ADLStoreClient.createClient(accountFQDN, provider);
        try {
            client.setPermission(filename, "744");
            ADLFileOutputStream stream = client.getAppendStream(filename);
            System.out.println(value);
            stream.write(value.toString().getBytes());
            stream.close();
        } catch (ADLException e) {
            System.out.println(e.requestId);
        } catch (Exception e) {
            System.out.println(e.getMessage());
            System.out.println(e.getCause());
        }
    }
}
I am continuously trying to append to a file in Azure Data Lake Store using a while loop, but sometimes it fails with "Operation APPEND failed with HTTP500", either right at the start or sometimes after 10 minutes. I am using Java.
Anubhav, Azure Data Lake streams are single-writer streams - i.e., you cannot write to the same stream from multiple threads, unless you do some form of synchronization between these threads. This is because each write specifies the offset it is writing to, and with multiple threads, the offsets are not consistent.
You seem to be writing from multiple threads (the .setParallelism(128) call in your code).
In your case, you have two choices:
Write to a different file in each thread (see the sketch just after these two options). I do not know your use case, but we have found that for a lot of cases that is the natural use of different threads - writing to different files.
If it is important to have all the threads write to the same file, then you will need to refactor the sink a little so that all the instances have a reference to the same ADLFileOutputStream, and you will need to make sure the calls to write() and close() are synchronized.
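For the first option, a rough sketch of what a per-subtask sink could look like (the class name, file naming scheme, and placeholder credential fields are illustrative, not from the original code; the stream is opened once per subtask in open() instead of once per record):
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import com.microsoft.azure.datalake.store.ADLFileOutputStream;
import com.microsoft.azure.datalake.store.ADLStoreClient;
import com.microsoft.azure.datalake.store.IfExists;
import com.microsoft.azure.datalake.store.oauth2.AccessTokenProvider;
import com.microsoft.azure.datalake.store.oauth2.ClientCredsTokenProvider;

class PerSubtaskADLSink extends RichSinkFunction<String> {

    private final String clientId = "***********";
    private final String authTokenEndpoint = "***************";
    private final String clientKey = "*****************";
    private final String accountFQDN = "****************";

    private transient ADLFileOutputStream stream;

    @Override
    public void open(Configuration parameters) throws Exception {
        AccessTokenProvider provider = new ClientCredsTokenProvider(authTokenEndpoint, clientId, clientKey);
        ADLStoreClient client = ADLStoreClient.createClient(accountFQDN, provider);
        // One file per parallel subtask, so no two writers ever share a stream (or its lease).
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        stream = client.createFile(
                "/Bitfinex/ETHBTC/ORDERBOOK/ORDERBOOK-" + subtask + ".json", IfExists.OVERWRITE);
    }

    @Override
    public void invoke(String value) throws Exception {
        // Reuse the stream opened in open() instead of re-opening it for every record.
        stream.write(value.getBytes());
    }

    @Override
    public void close() throws Exception {
        if (stream != null) {
            stream.close();
        }
    }
}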
Now, there is one more issue here - the error you got should have been an HTTP 4xx error (indicating a lease conflict, since ADLFileOutputStream acquires a lease), rather than HTTP 500, which says there was a server-side problem. To troubleshoot that, I will need to know your account name and time of access. That info is not safe to share on StackOverflow, so please open a support ticket for that and reference this SO question, so the issue eventually gets routed to me.

Cannot get AppInsights working under Spring Boot

I followed https://learn.microsoft.com/en-us/azure/application-insights/app-insights-java-get-started, but still without success.
I have the applicationinsights-web dependency in place via Maven
I added ApplicationInsights.xml to main/resources with a hardcoded instrumentation key and even with <SDKLogger /> inside
I added the scan path: @ComponentScan({...., "com.microsoft.applicationinsights.web.spring"})
Results:
I see no logs about looking up the configuration file, even if I make a syntax error in it or remove it completely
in debug I see that RequestNameHandlerInterceptorAdapter is instantiated via com.microsoft.applicationinsights.web.spring.internal.InterceptorRegistry, and during calls the preHandle method is called, but calls to ThreadContext.getRequestTelemetryContext() always return null and nothing more happens
It looks like something obvious, but I have no idea what. What part/classes are responsible for loading the configuration file?
I was a little bit confused by the documentation. As mentioned by yonisha, the filter does the whole magic. The following configuration class takes care of creating and adding the filter in a Spring Boot application.
import com.microsoft.applicationinsights.web.internal.WebRequestTrackingFilter;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import javax.servlet.Filter;
@Configuration
@ComponentScan("com.microsoft.applicationinsights.web.spring")
public class ApplicationInsightsConfiguration {

    @Bean
    public FilterRegistrationBean someFilterRegistration() {
        FilterRegistrationBean registration = new FilterRegistrationBean();
        registration.setFilter(appInsightsWebRequestTrackingFilter());
        registration.addUrlPatterns("/*");
        registration.setName("webRequestTrackingFilter");
        registration.setOrder(1);
        return registration;
    }

    @Bean(name = "appInsightsWebRequestTrackingFilter")
    public Filter appInsightsWebRequestTrackingFilter() {
        return new WebRequestTrackingFilter();
    }
}
Important: It will work nicely if you set the server.context-path property to some value. If not, AI initialization will fail with the error
AI: ERROR 03-04-2017 14:11, 20: WebApp name is not found, unable to register WebApp
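For example, a single entry in application.properties is enough (the path value here is only an illustration):
server.context-path=/myapp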
In order to keep the servlet context path empty, I had to implement wrappers for the filter and 2 other classes to override it, but it was a very dirty fix... It would be great if the name could be passed as a parameter to the filter, but that is not yet possible (https://github.com/Microsoft/ApplicationInsights-Java/issues/359)
In Spring Boot, we need to configure WebRequestTrackingFilter by extending WebSecurityConfigurerAdapter and overriding configure(HttpSecurity httpSecurity):
@Bean
public WebRequestTrackingFilter applicationInsightsFilterBean() throws Exception {
    WebRequestTrackingFilter webRequestTrackingFilter = new WebRequestTrackingFilter();
    return webRequestTrackingFilter;
}

@Override
public void configure(HttpSecurity httpSecurity) throws Exception {
    // other stuff...
    httpSecurity.addFilterBefore(applicationInsightsFilterBean(), UsernamePasswordAuthenticationFilter.class);
}
You also need the configuration below:
the applicationinsights-web dependency in place via Maven
ApplicationInsights.xml added to main/resources
Here is a newer guide for Spring Boot Application Insights integration that worked well for me:
https://github.com/AzureCAT-GSI/DevCamp/tree/master/HOL/java/06-appinsights
The idea is basically what Tomasz has above with some minor differences.
package devCamp.WebApp.configurations;
import javax.servlet.Filter;
import org.springframework.boot.context.embedded.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import com.microsoft.applicationinsights.TelemetryConfiguration;
import com.microsoft.applicationinsights.web.internal.WebRequestTrackingFilter;
@Configuration
public class AppInsightsConfig {

    @Bean
    public String telemetryConfig() {
        String telemetryKey = System.getenv("APPLICATION_INSIGHTS_IKEY");
        if (telemetryKey != null) {
            TelemetryConfiguration.getActive().setInstrumentationKey(telemetryKey);
        }
        return telemetryKey;
    }

    @Bean
    public FilterRegistrationBean aiFilterRegistration() {
        FilterRegistrationBean registration = new FilterRegistrationBean();
        registration.setFilter(new WebRequestTrackingFilter());
        registration.addUrlPatterns("/**");
        registration.setOrder(1);
        return registration;
    }

    @Bean(name = "WebRequestTrackingFilter")
    public Filter WebRequestTrackingFilter() {
        return new WebRequestTrackingFilter();
    }
}
The guide at the link above has a full set of instructions and includes client-side JS and a Java log appender example as well. Hope this helps.
All of the above methods work! However, you can try the whole new seamless experience using the Application Insights Spring Boot Starter.
https://github.com/Microsoft/ApplicationInsights-Java/blob/master/azure-application-insights-spring-boot-starter/README.md
This is currently in BETA

spring-integration amqp outbound adapter race condition?

We've got a rather complicated spring-integration-amqp use case in one of our production applications and we've been seeing some "org.springframework.integration.MessageDispatchingException: Dispatcher has no subscribers" exceptions on startup. After the initial errors on startup, we don't see those exceptions anymore from the same components. This looks like some kind of startup race condition on components that depend on AMQP outbound adapters and that end up using them early in the lifecycle.
I can reproduce this by calling a gateway that sends to a channel wired to an outbound adapter in a PostConstruct method.
config:
package gadams;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.annotation.IntegrationComponentScan;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.amqp.Amqp;
import org.springframework.integration.dsl.channel.MessageChannels;
import org.springframework.messaging.MessageChannel;
@SpringBootApplication
@IntegrationComponentScan
public class RabbitRace {

    public static void main(String[] args) {
        SpringApplication.run(RabbitRace.class, args);
    }

    @Bean(name = "HelloOut")
    public MessageChannel channelHelloOut() {
        return MessageChannels.direct().get();
    }

    @Bean
    public Queue queueHello() {
        return new Queue("hello.q");
    }

    @Bean(name = "helloOutFlow")
    public IntegrationFlow flowHelloOutToRabbit(RabbitTemplate rabbitTemplate) {
        return IntegrationFlows.from("HelloOut").handle(Amqp.outboundAdapter(rabbitTemplate).routingKey("hello.q"))
                .get();
    }
}
gateway:
package gadams;
import org.springframework.integration.annotation.Gateway;
import org.springframework.integration.annotation.MessagingGateway;
@MessagingGateway
public interface HelloGateway {

    @Gateway(requestChannel = "HelloOut")
    void sendMessage(String message);
}
component:
package gadams;
import javax.annotation.PostConstruct;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.DependsOn;
import org.springframework.stereotype.Component;
@Component
@DependsOn("helloOutFlow")
public class HelloPublisher {

    @Autowired
    private HelloGateway helloGateway;

    @PostConstruct
    public void postConstruct() {
        helloGateway.sendMessage("hello");
    }
}
In my production use case, we have a component with a PostConstruct method where we're using a TaskScheduler to schedule a bunch of components, some of which depend on AMQP outbound adapters, and some of those end up executing immediately. I've tried putting bean names on the IntegrationFlows that involve an outbound adapter and using @DependsOn on the beans that use the gateways and/or the gateway itself, but that doesn't get rid of the errors on startup.
This is all about Lifecycle. Any Spring Integration endpoint starts listening for, or producing, messages only when its start() is performed.
Typically, for the standard default autoStartup = true, that is done in ApplicationContext.finishRefresh() as:
// Propagate refresh to lifecycle processor first.
getLifecycleProcessor().onRefresh();
Starting to produce messages to the channel from a @PostConstruct (afterPropertiesSet()) is really very early, because it happens long before finishRefresh().
You really should reconsider your producing logic and move that implementation into the SmartLifecycle.start() phase.
See more info in the Reference Manual.
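A minimal sketch of what that could look like for the HelloPublisher above, reusing the same gateway (the phase value is an arbitrary choice; the point is only that start() runs once the lifecycle processor has already started the lower-phase endpoints, including the AMQP outbound adapter):
package gadams;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.SmartLifecycle;
import org.springframework.stereotype.Component;

@Component
public class HelloPublisher implements SmartLifecycle {

    @Autowired
    private HelloGateway helloGateway;

    private volatile boolean running;

    @Override
    public void start() {
        // Runs during finishRefresh(), after endpoints in lower phases have been started,
        // so the "HelloOut" channel already has its subscriber.
        helloGateway.sendMessage("hello");
        this.running = true;
    }

    @Override
    public void stop() {
        this.running = false;
    }

    @Override
    public boolean isRunning() {
        return this.running;
    }

    @Override
    public boolean isAutoStartup() {
        return true;
    }

    @Override
    public void stop(Runnable callback) {
        stop();
        callback.run();
    }

    @Override
    public int getPhase() {
        return Integer.MAX_VALUE; // start as late as possible
    }
}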

Unit tests - run task programmatically

I'm creating a custom task for Gradle. I don't know how to create a task which will use my custom task class. Is it possible? I want to create this task for functional tests which will be run on Jenkins.
This is my custom task:
package pl.gradle
import org.gradle.api.DefaultTask
import org.gradle.api.tasks.TaskAction
class MyCustomTask extends DefaultTask {

    public MyCustomTask() {
        // do something
    }

    @TaskAction
    def build() {
        ant.echo(message: "only for tests")
    }
}
And this is my test class:
package pl.gradle
import static org.junit.Assert.*
import org.gradle.testfixtures.ProjectBuilder
import org.gradle.api.Project
import org.junit.Before;
import org.junit.Test
class MyCustomTaskTest {

    private Project project;
    def task

    @Before
    public void setUp() {
        project = ProjectBuilder.builder().build()
        task = project.task("build", type: MyCustomTask)
    }

    @Test
    public void taskCreatedProperly() {
        assertTrue(task instanceof MyCustomTask)
    }

    @Test
    public void shouldRunTask() {
        // task.execute() // how to run this task? I want to run the build() method from MyCustomTask, which is the @TaskAction
    }
}
ProjectBuilder is meant for lower-level tests that don't execute tasks. Its sweet spot is testing plugins. In addition you'd write higher-level tests that execute real builds (and therefore also tasks). You'd typically use the Gradle tooling API to kick off these builds. Check out the tooling API samples in the full Gradle distribution.
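For illustration, such a higher-level test could kick off a real build with the Tooling API along these lines (the project directory and task name are assumptions, not from the answer):
import java.io.File;
import org.gradle.tooling.GradleConnector;
import org.gradle.tooling.ProjectConnection;

class ToolingApiSmokeTest {
    public static void main(String[] args) {
        // Connect to a real test project directory that applies the plugin/task under test.
        ProjectConnection connection = GradleConnector.newConnector()
                .forProjectDirectory(new File("build/functionalTestProject"))
                .connect();
        try {
            // Runs the task through a full Gradle build, exercising the @TaskAction for real.
            connection.newBuild().forTasks("build").run();
        } finally {
            connection.close();
        }
    }
}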
Call Action.execute yourself for each of the task actions:
project.tasks.getByName("myTask").with { Task task ->
    assertEquals task.group, 'My Group'
    task.actions.each { Action action ->
        action.execute task
    }
}
I've previously mentioned this here: https://stackoverflow.com/a/63613564/410939
