Spring Cloud Data Flow does not flow - spring-integration

I have a stream DSL (shown below) that ends in "log", so the JSON produced by the JDBC source should be logged, but it is not.
The Supplier reads a database queue and produces a JSON array from the rows.
If I turn on debug logging, the SybaseSupplierConfiguration.this.logger.debug("Json: {}", json); line is printed.
Why is it not flowing to "log"?
So far I have tried:
Downgraded Spring Boot to 2.2.9 (was using 2.3.2)
Fixed the return value of jsonSupplier (to a JSON string)
Disabled Prometheus/Grafana
Explicitly configured the poller: spring.cloud.stream.poller.fixed-delay=10
Used the RabbitMQ binder and Docker image
Offered some booze to the Spring Cloud Data Flow god.
None worked.
The Docker setup:
export DATAFLOW_VERSION=2.6.0
export SKIPPER_VERSION=2.5.0
docker-compose -f ./docker-compose.yml -f ./docker-compose-prometheus.yml up -d
The pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>company-cloud-dataflow-apps</artifactId>
<groupId>br.com.company.cloud.dataflow.apps</groupId>
<version>1.0.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>jdbc-sybase-supplier</artifactId>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>${spring-cloud.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream-test-support</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.junit.vintage</groupId>
<artifactId>junit-vintage-engine</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>${lombok.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-configuration-processor</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud.stream.app</groupId>
<artifactId>app-starters-micrometer-common</artifactId>
<version>${app-starters-micrometer-common.version}</version>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer.prometheus</groupId>
<artifactId>prometheus-rsocket-spring</artifactId>
<version>${prometheus-rsocket.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream-binder-kafka</artifactId>
</dependency>
<dependency>
<groupId>net.sourceforge.jtds</groupId>
<artifactId>jtds</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-integration</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.integration</groupId>
<artifactId>spring-integration-jdbc</artifactId>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-jdk8</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-jsr310</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-parameter-names</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-jaxb-annotations</artifactId>
<version>${jackson.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
The configuration:
....
spring.cloud.stream.function.bindings.jsonSupplier-out-0=output
spring.cloud.function.definition=jsonSupplier
The implementation:
@SpringBootApplication
@EnableConfigurationProperties(SybaseSupplierProperties.class)
public class SybaseSupplierConfiguration {
private final DataSource dataSource;
private final SybaseSupplierProperties properties;
private final ObjectMapper objectMapper;
private final JdbcTemplate jdbcTemplate;
private final Logger logger = LoggerFactory.getLogger(getClass());
@Autowired
public SybaseSupplierConfiguration(DataSource dataSource,
JdbcTemplate jdbcTemplate,
SybaseSupplierProperties properties) {
this.dataSource = dataSource;
this.jdbcTemplate = jdbcTemplate;
this.properties = properties;
objectMapper = new ObjectMapper().registerModule(new ParameterNamesModule())
.registerModule(new Jdk8Module())
.registerModule(new JavaTimeModule())
.disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);
}
public static void main(String[] args) {
SpringApplication.run(SybaseSupplierConfiguration.class, args);
}
@Value
static class IntControle {
long cdIntControle;
String deTabela;
}
@Bean
public MessageSource<Object> jdbcMessageSource() {
String query = "select cd_int_controle, de_tabela from int_controle rowlock readpast " +
"where id_status = 0 order by cd_int_controle";
JdbcPollingChannelAdapter adapter =
new JdbcPollingChannelAdapter(dataSource, query) {
@Override
protected Object doReceive() {
Object object = super.doReceive();
if (object == null) {
return null;
}
@SuppressWarnings("unchecked")
List<IntControle> ints = (List<IntControle>) object;
try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
try (JsonGenerator jen = objectMapper.createGenerator(out)) {
jen.writeStartArray();
for (IntControle itm : ints) {
String qry = String.format("select * from vw_integ_%s where cd_int_controle = %d",
itm.getDeTabela(), itm.getCdIntControle());
List<Map<String, Object>> maps = jdbcTemplate.queryForList(qry);
for (Map<String, Object> l : maps) {
jen.writeStartObject();
for (Map.Entry<String, Object> entry : l.entrySet()) {
String k = entry.getKey();
Object v = entry.getValue();
jen.writeFieldName(k);
if (v == null) {
jen.writeNull();
} else {
// if a specific item type is needed, see:
// https://stackoverflow.com/questions/6514876/most-efficient-conversion-of-resultset-to-json
jen.writeObject(v);
}
}
jen.writeEndObject();
}
}
jen.writeEndArray();
}
String json = out.toString();
SybaseSupplierConfiguration.this.logger.debug("Json: {}", json);
return json;
} catch (IOException e) {
throw new IllegalArgumentException("Error converting to JSON", e);
}
}
};
adapter.setMaxRows(properties.getPollSize());
adapter.setUpdatePerRow(true);
adapter.setRowMapper((RowMapper<IntControle>) (rs, i) -> new IntControle(rs.getLong(1), rs.getString(2)));
adapter.setUpdateSql("update int_controle set id_status = 1 where cd_int_controle = :cdIntControle");
return adapter;
}
@Bean
public Supplier<Message<?>> jsonSupplier() {
return jdbcMessageSource()::receive;
}
}
The shell setup:
app register --name jdbc-postgresql-sink --type sink --uri maven://br.com.company.cloud.dataflow.apps:jdbc-postgresql-sink:1.0.0-SNAPSHOT --force
app register --name jdbc-sybase-supplier --type source --uri maven://br.com.company.cloud.dataflow.apps:jdbc-sybase-supplier:1.0.0-SNAPSHOT --force
stream create --name sybase_to_pgsql --definition "jdbc-sybase-supplier | log "
stream deploy --name sybase_to_pgsql
The log:
....
2020-08-02 00:40:18.644 INFO 81 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 0 endpoint(s) beneath base path '/actuator'
2020-08-02 00:40:18.793 INFO 81 --- [ main] o.s.i.endpoint.EventDrivenConsumer : Adding {logging-channel-adapter:_org.springframework.integration.errorLogger} as a subscriber to the 'errorChannel' channel
2020-08-02 00:40:18.793 INFO 81 --- [ main] o.s.i.channel.PublishSubscribeChannel : Channel 'application.errorChannel' has 1 subscriber(s).
2020-08-02 00:40:18.793 INFO 81 --- [ main] o.s.i.endpoint.EventDrivenConsumer : started bean '_org.springframework.integration.errorLogger'
2020-08-02 00:40:18.793 INFO 81 --- [ main] o.s.i.endpoint.EventDrivenConsumer : Adding {router} as a subscriber to the 'jsonSupplier_integrationflow.channel#0' channel
2020-08-02 00:40:18.793 INFO 81 --- [ main] o.s.integration.channel.DirectChannel : Channel 'application.jsonSupplier_integrationflow.channel#0' has 1 subscriber(s).
2020-08-02 00:40:18.794 INFO 81 --- [ main] o.s.i.endpoint.EventDrivenConsumer : started bean 'jsonSupplier_integrationflow.org.springframework.integration.config.ConsumerEndpointFactoryBean#0'
2020-08-02 00:40:18.795 INFO 81 --- [ main] o.s.c.s.binder.DefaultBinderFactory : Creating binder: kafka
2020-08-02 00:40:19.235 INFO 81 --- [ main] o.s.c.s.binder.DefaultBinderFactory : Caching the binder: kafka
2020-08-02 00:40:19.235 INFO 81 --- [ main] o.s.c.s.binder.DefaultBinderFactory : Retrieving cached binder: kafka
2020-08-02 00:40:19.362 INFO 81 --- [ main] o.s.c.s.b.k.p.KafkaTopicProvisioner : Using kafka topic for outbound: sybase_to_pgsql.jdbc-sybase-supplier
2020-08-02 00:40:19.364 INFO 81 --- [ main] o.a.k.clients.admin.AdminClientConfig : AdminClientConfig values:
bootstrap.servers = [PLAINTEXT://kafka-broker:9092]
client.dns.lookup = default
client.id =
connections.max.idle.ms = 300000
default.api.timeout.ms = 60000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
2020-08-02 00:40:19.572 INFO 81 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka version: 2.5.0
2020-08-02 00:40:19.572 INFO 81 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 66563e712b0b9f84
2020-08-02 00:40:19.572 INFO 81 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1596328819571
2020-08-02 00:40:20.403 INFO 81 --- [ main] o.a.k.clients.producer.ProducerConfig : ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [PLAINTEXT://kafka-broker:9092]
buffer.memory = 33554432
client.dns.lookup = default
client.id = producer-1
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
2020-08-02 00:40:20.477 INFO 81 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka version: 2.5.0
2020-08-02 00:40:20.477 INFO 81 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 66563e712b0b9f84
2020-08-02 00:40:20.477 INFO 81 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1596328820477
2020-08-02 00:40:20.573 INFO 81 --- [ad | producer-1] org.apache.kafka.clients.Metadata : [Producer clientId=producer-1] Cluster ID: um9lJtXTQUmURh9cwOkqxA
2020-08-02 00:40:20.574 INFO 81 --- [ main] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 30000 ms.
2020-08-02 00:40:20.622 INFO 81 --- [ main] o.s.c.s.m.DirectWithAttributesChannel : Channel 'application.output' has 1 subscriber(s).
2020-08-02 00:40:20.625 INFO 81 --- [ main] o.s.i.e.SourcePollingChannelAdapter : started bean 'jsonSupplier_integrationflow.org.springframework.integration.config.SourcePollingChannelAdapterFactoryBean#0'
2020-08-02 00:40:20.654 INFO 81 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 20031 (http) with context path ''
2020-08-02 00:40:20.674 INFO 81 --- [ main] b.c.c.d.a.s.SybaseSupplierConfiguration : Started SybaseSupplierConfiguration in 12.982 seconds (JVM running for 14.55)
2020-08-02 00:40:21.160 INFO 81 --- [ask-scheduler-1] o.a.k.clients.producer.ProducerConfig : ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [PLAINTEXT://kafka-broker:9092]
buffer.memory = 33554432
client.dns.lookup = default
client.id = producer-2
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
2020-08-02 00:40:21.189 INFO 81 --- [ask-scheduler-1] o.a.kafka.common.utils.AppInfoParser : Kafka version: 2.5.0
2020-08-02 00:40:21.189 INFO 81 --- [ask-scheduler-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 66563e712b0b9f84
2020-08-02 00:40:21.189 INFO 81 --- [ask-scheduler-1] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1596328821189
2020-08-02 00:40:21.271 INFO 81 --- [ad | producer-2] org.apache.kafka.clients.Metadata : [Producer clientId=producer-2] Cluster ID: um9lJtXTQUmURh9cwOkqxA

If you are using function-based applications in SCDF, you will have to supply extra configuration when deploying streams. Please have a look at the recipe that walks through the function-based deployment scenario.
Specifically, look at the application-specific function bindings and the property overrides for the time-source and log-sink applications.
app.time-source.spring.cloud.stream.function.bindings.timeSupplier-out-0=output
app.log-sink.spring.cloud.stream.function.bindings.logConsumer-in-0=input
The input/output channel bindings require an explicit mapping to the function binding that you have in your custom source. You will have to override the custom source's function binding to the output channel, and everything should come together then.
In v2.6, we are attempting to automate this explicit binding in SCDF, so there will be one less thing to configure in the future.
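For the stream in the question, the deploy-time override would be along these lines (the app name comes from the registration above and the function name from the supplier's configuration; this is a sketch, not verified against this exact setup):
stream deploy --name sybase_to_pgsql --properties "app.jdbc-sybase-supplier.spring.cloud.stream.function.bindings.jsonSupplier-out-0=output"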

Related

JHipster Integration Tests with LocalStack testcontainers

I have a JHipster (7.9.3) application with Kafka and Postgres TestContainers used in the integration tests. I want to integrate my application with S3 storage. For this purpose, I want to write some Integration tests using LocalStack testcontainer.
I have created a new annotation:
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
public @interface EmbeddedS3 {
}
and added the LocalStack dependency to the project:
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>localstack</artifactId>
<scope>test</scope>
</dependency>
created a LocalStackTestContainer as:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.InitializingBean;
import org.testcontainers.containers.localstack.LocalStackContainer;
import org.testcontainers.containers.output.Slf4jLogConsumer;
import org.testcontainers.utility.DockerImageName;
public class LocalStackTestContainer implements InitializingBean, DisposableBean {
private LocalStackContainer localStackContainer;
private static final Logger log = LoggerFactory.getLogger(LocalStackTestContainer.class);
@Override
public void destroy() {
if (null != localStackContainer) {
localStackContainer.close();
}
}
@Override
public void afterPropertiesSet() {
if (null == localStackContainer) {
localStackContainer =
new LocalStackContainer(DockerImageName.parse("localstack/localstack:1.2.0"))
.withServices(LocalStackContainer.Service.S3)
.withLogConsumer(new Slf4jLogConsumer(log))
.withReuse(true)
;
}
if (!localStackContainer.isRunning()) {
localStackContainer.start();
}
}
public LocalStackContainer getLocalStackContainer() {
return localStackContainer;
}
}
and adjusted the TestContainersSpringContextCustomizerFactory.createContextCustomizer method with:
EmbeddedS3 s3LocalStackAnnotation = AnnotatedElementUtils.findMergedAnnotation(testClass, EmbeddedS3.class);
if (null != s3LocalStackAnnotation) {
log.debug("detected the EmbeddedS3 annotation on class {}", testClass.getName());
log.info("Warming up the localstack S3");
if (null == localStackTestContainer) {
localStackTestContainer = beanFactory.createBean(LocalStackTestContainer.class);
beanFactory.registerSingleton(LocalStackTestContainer.class.getName(), localStackTestContainer);
// ((DefaultListableBeanFactory) beanFactory).registerDisposableBean(LocalStackTestContainer.class.getName(), localStackTestContainer);
}
}
Added the @EmbeddedS3 annotation to the @IntegrationTest annotation as:
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@SpringBootTest(classes = {AgentMarlinApp.class, AsyncSyncConfiguration.class, TestSecurityConfiguration.class, TestLocalStackConfiguration.class})
@EmbeddedKafka
@EmbeddedSQL
@EmbeddedS3
@DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_CLASS)
@ActiveProfiles({"testdev", "it-test"})
public @interface IntegrationTest {
// 5s is the spring default https://github.com/spring-projects/spring-framework/blob/29185a3d28fa5e9c1b4821ffe519ef6f56b51962/spring-test/src/main/java/org/springframework/test/web/reactive/server/DefaultWebTestClient.java#L106
String DEFAULT_TIMEOUT = "PT5S";
String DEFAULT_ENTITY_TIMEOUT = "PT5S";
}
To initialize the AmazonS3 client I have a @TestConfiguration:
@Bean
public AmazonS3 amazonS3(LocalStackTestContainer localStackTestContainer) {
LocalStackContainer localStack = localStackTestContainer.getLocalStackContainer();
return AmazonS3ClientBuilder
.standard()
.withEndpointConfiguration(
new AwsClientBuilder.EndpointConfiguration(
localStack.getEndpointOverride(LocalStackContainer.Service.S3).toString(),
localStack.getRegion()
)
)
.withCredentials(
new AWSStaticCredentialsProvider(
new BasicAWSCredentials(localStack.getAccessKey(), localStack.getSecretKey())
)
)
.build();
}
I have 2 integration test classes (ending with *IT). When the first class's tests are executed, I see that the testcontainers are started:
2022-10-30 14:34:42.031 DEBUG 27208 --- [ main] 🐳 [testcontainers/ryuk:0.3.3] : Starting container: testcontainers/ryuk:0.3.3
2022-10-30 14:34:42.032 DEBUG 27208 --- [ main] 🐳 [testcontainers/ryuk:0.3.3] : Trying to start container: testcontainers/ryuk:0.3.3 (attempt 1/1)
2022-10-30 14:34:42.033 DEBUG 27208 --- [ main] 🐳 [testcontainers/ryuk:0.3.3] : Starting container: testcontainers/ryuk:0.3.3
2022-10-30 14:34:42.033 INFO 27208 --- [ main] 🐳 [testcontainers/ryuk:0.3.3] : Creating container for image: testcontainers/ryuk:0.3.3
2022-10-30 14:34:42.371 INFO 27208 --- [ main] 🐳 [testcontainers/ryuk:0.3.3] : Container testcontainers/ryuk:0.3.3 is starting: 5505472cec1608db3383ebeeee99a8d02b48331a53f5d53613a0a53c1cd51986
2022-10-30 14:34:43.271 INFO 27208 --- [ main] 🐳 [testcontainers/ryuk:0.3.3] : Container testcontainers/ryuk:0.3.3 started in PT1.2706326S
2022-10-30 14:34:43.282 INFO 27208 --- [ main] o.t.utility.RyukResourceReaper : Ryuk started - will monitor and terminate Testcontainers containers on JVM exit
2022-10-30 14:34:43.283 INFO 27208 --- [ main] org.testcontainers.DockerClientFactory : Checking the system...
2022-10-30 14:34:43.283 INFO 27208 --- [ main] org.testcontainers.DockerClientFactory : ✔︎ Docker server version should be at least 1.6.0
2022-10-30 14:34:43.284 INFO 27208 --- [ main] 🐳 [localstack/localstack:1.2.0] : HOSTNAME_EXTERNAL environment variable set to localhost (to match host-routable address for container)
2022-10-30 14:34:43.284 DEBUG 27208 --- [ main] 🐳 [localstack/localstack:1.2.0] : Starting container: localstack/localstack:1.2.0
2022-10-30 14:34:43.285 DEBUG 27208 --- [ main] 🐳 [localstack/localstack:1.2.0] : Trying to start container: localstack/localstack:1.2.0 (attempt 1/1)
2022-10-30 14:34:43.285 DEBUG 27208 --- [ main] 🐳 [localstack/localstack:1.2.0] : Starting container: localstack/localstack:1.2.0
2022-10-30 14:34:43.285 INFO 27208 --- [ main] 🐳 [localstack/localstack:1.2.0] : Creating container for image: localstack/localstack:1.2.0
2022-10-30 14:34:44.356 WARN 27208 --- [ main] 🐳 [localstack/localstack:1.2.0] : Reuse was requested but the environment does not support the reuse of containers
To enable reuse of containers, you must set 'testcontainers.reuse.enable=true' in a file located at C:\Users\artjo\.testcontainers.properties
2022-10-30 14:34:44.581 INFO 27208 --- [ main] 🐳 [localstack/localstack:1.2.0] : Container localstack/localstack:1.2.0 is starting: d09c4e105058444699a29338b85d7294efec29941857e581daf634d391395869
2022-10-30 14:34:48.321 INFO 27208 --- [ main] 🐳 [localstack/localstack:1.2.0] : Container localstack/localstack:1.2.0 started in PT5.0370436S
2022-10-30 14:34:48.321 INFO 27208 --- [ main] ContainersSpringContextCustomizerFactory : Warming up the sql database
2022-10-30 14:34:48.327 DEBUG 27208 --- [ main] 🐳 [postgres:14.5] : Starting container: postgres:14.5
2022-10-30 14:34:48.327 DEBUG 27208 --- [ main] 🐳 [postgres:14.5] : Trying to start container: postgres:14.5 (attempt 1/1)
2022-10-30 14:34:48.327 DEBUG 27208 --- [ main] 🐳 [postgres:14.5] : Starting container: postgres:14.5
2022-10-30 14:34:48.327 INFO 27208 --- [ main] 🐳 [postgres:14.5] : Creating container for image: postgres:14.5
2022-10-30 14:34:48.328 WARN 27208 --- [ main] 🐳 [postgres:14.5] : Reuse was requested but the environment does not support the reuse of containers
To enable reuse of containers, you must set 'testcontainers.reuse.enable=true' in a file located at C:\Users\artjo\.testcontainers.properties
2022-10-30 14:34:48.415 INFO 27208 --- [ main] 🐳 [postgres:14.5] : Container postgres:14.5 is starting: 5e4fcf583b44345651aad9d5f939bc913d62eafe16a73017379c9e43f2028ff4
But when the second IT class is started, the LocalStackTestContainer bean is not available anymore.
How do I need to configure this so the LocalStack container bean remains available during the whole test execution?
######################## UPDATE 01.11.2022 ########################
The other testcontainers are configured the same way (no changes from the auto-generated code):
public class TestContainersSpringContextCustomizerFactory implements ContextCustomizerFactory {
private Logger log = LoggerFactory.getLogger(TestContainersSpringContextCustomizerFactory.class);
private static LocalStackTestContainer localStackTestContainer;
private static KafkaTestContainer kafkaBean;
private static SqlTestContainer devTestContainer;
private static SqlTestContainer prodTestContainer;
@Override
public ContextCustomizer createContextCustomizer(Class<?> testClass, List<ContextConfigurationAttributes> configAttributes) {
return (context, mergedConfig) -> {
ConfigurableListableBeanFactory beanFactory = context.getBeanFactory();
TestPropertyValues testValues = TestPropertyValues.empty();
// EmbeddedS3 s3LocalStackAnnotation = AnnotatedElementUtils.findMergedAnnotation(testClass, EmbeddedS3.class);
// if (null != s3LocalStackAnnotation) {
// log.debug("detected the EmbeddedS3 annotation on class {}", testClass.getName());
// log.info("Warming up the localstack S3");
//
// if (null == localStackTestContainer) {
// localStackTestContainer = beanFactory.createBean(LocalStackTestContainer.class);
// beanFactory.registerSingleton(LocalStackTestContainer.class.getName(), localStackTestContainer);
//// ((DefaultListableBeanFactory) beanFactory).registerDisposableBean(LocalStackTestContainer.class.getName(), localStackTestContainer);
// }
// }
EmbeddedSQL sqlAnnotation = AnnotatedElementUtils.findMergedAnnotation(testClass, EmbeddedSQL.class);
if (null != sqlAnnotation) {
log.debug("detected the EmbeddedSQL annotation on class {}", testClass.getName());
log.info("Warming up the sql database");
if (
Arrays
.asList(context.getEnvironment().getActiveProfiles())
.contains("test" + JHipsterConstants.SPRING_PROFILE_DEVELOPMENT)
) {
if (null == devTestContainer) {
try {
Class<? extends SqlTestContainer> containerClass = (Class<? extends SqlTestContainer>) Class.forName(
this.getClass().getPackageName() + ".PostgreSqlTestContainer"
);
devTestContainer = beanFactory.createBean(containerClass);
beanFactory.registerSingleton(containerClass.getName(), devTestContainer);
// ((DefaultListableBeanFactory)beanFactory).registerDisposableBean(containerClass.getName(), devTestContainer);
} catch (ClassNotFoundException e) {
throw new RuntimeException(e);
}
}
testValues =
testValues.and(
"spring.r2dbc.url=" + devTestContainer.getTestContainer().getJdbcUrl().replace("jdbc", "r2dbc") + ""
);
testValues = testValues.and("spring.r2dbc.username=" + devTestContainer.getTestContainer().getUsername());
testValues = testValues.and("spring.r2dbc.password=" + devTestContainer.getTestContainer().getPassword());
testValues = testValues.and("spring.liquibase.url=" + devTestContainer.getTestContainer().getJdbcUrl() + "");
}
if (
Arrays
.asList(context.getEnvironment().getActiveProfiles())
.contains("test" + JHipsterConstants.SPRING_PROFILE_PRODUCTION)
) {
if (null == prodTestContainer) {
try {
Class<? extends SqlTestContainer> containerClass = (Class<? extends SqlTestContainer>) Class.forName(
this.getClass().getPackageName() + ".PostgreSqlTestContainer"
);
prodTestContainer = beanFactory.createBean(containerClass);
beanFactory.registerSingleton(containerClass.getName(), prodTestContainer);
// ((DefaultListableBeanFactory)beanFactory).registerDisposableBean(containerClass.getName(), prodTestContainer);
} catch (ClassNotFoundException e) {
throw new RuntimeException(e);
}
}
testValues =
testValues.and(
"spring.r2dbc.url=" + prodTestContainer.getTestContainer().getJdbcUrl().replace("jdbc", "r2dbc") + ""
);
testValues = testValues.and("spring.r2dbc.username=" + prodTestContainer.getTestContainer().getUsername());
testValues = testValues.and("spring.r2dbc.password=" + prodTestContainer.getTestContainer().getPassword());
testValues = testValues.and("spring.liquibase.url=" + prodTestContainer.getTestContainer().getJdbcUrl() + "");
}
}
EmbeddedKafka kafkaAnnotation = AnnotatedElementUtils.findMergedAnnotation(testClass, EmbeddedKafka.class);
if (null != kafkaAnnotation) {
log.debug("detected the EmbeddedKafka annotation on class {}", testClass.getName());
log.info("Warming up the kafka broker");
if (null == kafkaBean) {
kafkaBean = beanFactory.createBean(KafkaTestContainer.class);
beanFactory.registerSingleton(KafkaTestContainer.class.getName(), kafkaBean);
// ((DefaultListableBeanFactory)beanFactory).registerDisposableBean(KafkaTestContainer.class.getName(), kafkaBean);
}
testValues =
testValues.and(
"spring.cloud.stream.kafka.binder.brokers=" +
kafkaBean.getKafkaContainer().getHost() +
':' +
kafkaBean.getKafkaContainer().getMappedPort(KafkaContainer.KAFKA_PORT)
);
}
testValues.applyTo(context);
};
}
}
######################## UPDATE 02.11.2022 ########################
A reproducible example project can be found here: GITHUB

Join two Spark DStreams with complex nested structure

I have implemented a Spark custom receiver to receive DStreams from an HTTP/REST endpoint, as follows:
val mem1Total:ReceiverInputDStream[String] = ssc.receiverStream(new CustomReceiver("httpURL1"))
val dstreamMem1:DStream[String] = mem1Total.window(Durations.seconds(30), Durations.seconds(10))
val mem2Total:ReceiverInputDStream[String] = ssc.receiverStream(new CustomReceiver("httpURL2"))
val dstreamMem2:DStream[String] = mem2Total.window(Durations.seconds(30), Durations.seconds(10))
Each stream has the following schema
val schema = StructType(Seq(
StructField("status", StringType),
StructField("data", StructType(Seq(
StructField("resultType", StringType),
StructField("result", ArrayType(StructType(Array(
StructField("metric", StructType(Seq(StructField("application", StringType),
StructField("component", StringType),
StructField("instance", StringType)))),
StructField("value", ArrayType(StringType))
))))
)
))))
Here is how far I could go in extracting the features that I need from dstreamMem1:
dstreamMem1.foreachRDD { rdd =>
import sparkSession.implicits._
val df = rdd.toDS()
.selectExpr("cast (value as string) as myData")
.select(from_json($"myData", schema).as("myDataEvent"))
.select($"myDataEvent.data.*")
.select(explode($"result").as("flat"))
.select($"flat.metric.*", $"flat.value".getItem(0).as("value1"), $"flat.value".getItem(1).as("value2"))
}
However, I can't figure out how to join dstreamMem1 with dstreamMem2 while also dealing with the complex structure. I can do a union of dstreamMem1 and dstreamMem2, but that won't work in my case because the "value" fields represent different things in each stream. Any ideas, please?
Edit#1
Based on the following resources
How to create a custom streaming data source?
https://github.com/apache/spark/pull/21145
https://github.com/hienluu/structured-streaming-sources/tree/master/streaming-sources/src/main/scala/org/structured_streaming_sources/twitter
I have been able to create the following class:
class SSPStreamMicroBatchReader(options: DataSourceOptions) extends MicroBatchReader with Logging {
private val httpURL = options.get(SSPStreamingSource.HTTP_URL).orElse("") //.toString()
private val numPartitions = options.get(SSPStreamingSource.NUM_PARTITIONS).orElse("5").toInt
private val queueSize = options.get(SSPStreamingSource.QUEUE_SIZE).orElse("512").toInt
private val debugLevel = options.get(SSPStreamingSource.DEBUG_LEVEL).orElse("debug").toLowerCase
private val NO_DATA_OFFSET = SSPOffset(-1)
private var startOffset: SSPOffset = new SSPOffset(-1)
private var endOffset: SSPOffset = new SSPOffset(-1)
private var currentOffset: SSPOffset = new SSPOffset(-1)
private var lastReturnedOffset: SSPOffset = new SSPOffset(-2)
private var lastOffsetCommitted : SSPOffset = new SSPOffset(-1)
private var incomingEventCounter = 0;
private var stopped:Boolean = false
private var acsURLConn:HttpURLConnection = null
private var worker:Thread = null
private val sspList:ListBuffer[StreamingQueryStatus] = new ListBuffer[StreamingQueryStatus]()
private var sspQueue:BlockingQueue[StreamingQueryStatus] = null
initialize()
private def initialize(): Unit = synchronized {
log.warn(s"Inside initialize ....")
sspQueue = new ArrayBlockingQueue(queueSize)
new Thread("Socket Receiver") { log.warn(s"Inside thread ....")
override def run() {
log.warn(s"Inside run ....")
receive() }
}.start()
}
private def receive(): Unit = {
log.warn(s"Inside recieve() ....")
var userInput: String = null
acsURLConn = new AccessACS(httpURL).getACSConnection();
// Until stopped or connection broken continue reading
val reader = new BufferedReader(
new InputStreamReader(acsURLConn.getInputStream(), java.nio.charset.StandardCharsets.UTF_8))
userInput = reader.readLine()
while(!stopped) {
// poll tweets from queue
val tweet:StreamingQueryStatus = sspQueue.poll(100, TimeUnit.MILLISECONDS)
if (tweet != null) {
sspList.append(tweet);
currentOffset = currentOffset + 1
incomingEventCounter = incomingEventCounter + 1;
}
}
reader.close()
}
override def planInputPartitions(): java.util.List[InputPartition[org.apache.spark.sql.catalyst.InternalRow]] = {
synchronized {
log.warn(s"Inside planInputPartitions ....")
//initialize()
val startOrdinal = startOffset.offset.toInt + 1
val endOrdinal = endOffset.offset.toInt + 1
internalLog(s"createDataReaderFactories: sOrd: $startOrdinal, eOrd: $endOrdinal, " +
s"lastOffsetCommitted: $lastOffsetCommitted")
val newBlocks = synchronized {
val sliceStart = startOrdinal - lastOffsetCommitted.offset.toInt - 1
val sliceEnd = endOrdinal - lastOffsetCommitted.offset.toInt - 1
assert(sliceStart <= sliceEnd, s"sliceStart: $sliceStart sliceEnd: $sliceEnd")
sspList.slice(sliceStart, sliceEnd)
}
newBlocks.grouped(numPartitions).map { block =>
new SSPStreamBatchTask(block).asInstanceOf[InputPartition[org.apache.spark.sql.catalyst.InternalRow]]
}.toList.asJava
}
}
override def setOffsetRange(start: Optional[Offset],
end: Optional[Offset]): Unit = {
if (start.isPresent && start.get().asInstanceOf[SSPOffset].offset != currentOffset.offset) {
internalLog(s"setOffsetRange: start: $start, end: $end currentOffset: $currentOffset")
}
this.startOffset = start.orElse(NO_DATA_OFFSET).asInstanceOf[SSPOffset]
this.endOffset = end.orElse(currentOffset).asInstanceOf[SSPOffset]
}
override def getStartOffset(): Offset = {
internalLog("getStartOffset was called")
if (startOffset.offset == -1) {
throw new IllegalStateException("startOffset is -1")
}
startOffset
}
override def getEndOffset(): Offset = {
if (endOffset.offset == -1) {
currentOffset
} else {
if (lastReturnedOffset.offset < endOffset.offset) {
internalLog(s"** getEndOffset => $endOffset)")
lastReturnedOffset = endOffset
}
endOffset
}
}
override def commit(end: Offset): Unit = {
internalLog(s"** commit($end) lastOffsetCommitted: $lastOffsetCommitted")
val newOffset = SSPOffset.convert(end).getOrElse(
sys.error(s"SSPStreamMicroBatchReader.commit() received an offset ($end) that did not " +
s"originate with an instance of this class")
)
val offsetDiff = (newOffset.offset - lastOffsetCommitted.offset).toInt
if (offsetDiff < 0) {
sys.error(s"Offsets committed out of order: $lastOffsetCommitted followed by $end")
}
sspList.trimStart(offsetDiff)
lastOffsetCommitted = newOffset
}
override def stop(): Unit = {
log.warn(s"There is a total of $incomingEventCounter events that came in")
stopped = true
if (acsURLConn != null) {
try {
//acsURLConn.disconnect()
} catch {
case e: IOException =>
}
}
}
override def deserializeOffset(json: String): Offset = {
SSPOffset(json.toLong)
}
override def readSchema(): StructType = {
SSPStreamingSource.SCHEMA
}
private def internalLog(msg:String): Unit = {
debugLevel match {
case "warn" => log.warn(msg)
case "info" => log.info(msg)
case "debug" => log.debug(msg)
case _ =>
}
}
}
object SSPStreamingSource {
val HTTP_URL = "httpURL"
val DEBUG_LEVEL = "debugLevel"
val NUM_PARTITIONS = "numPartitions"
val QUEUE_SIZE = "queueSize"
val SCHEMA = StructType(Seq(
StructField("status", StringType),
StructField("data", StructType(Seq(
StructField("resultType", StringType),
StructField("result", ArrayType(StructType(Array(
StructField("application", StringType),
StructField("component", StringType),
StructField("instance", StringType)))),
StructField("value", ArrayType(StringType))
))))
)
))))
}
class SSPStreamBatchTask(sspList:ListBuffer[StreamingQueryStatus]) extends InputPartition[Row] {
override def createPartitionReader(): InputPartitionReader[Row] = new SSPStreamBatchReader(sspList)
}
class SSPStreamBatchReader(sspList:ListBuffer[StreamingQueryStatus]) extends InputPartitionReader[Row] {
private var currentIdx = -1
override def next(): Boolean = {
// Return true as long as the new index is in the seq.
currentIdx += 1
currentIdx < sspList.size
}
override def get(): Row = {
val tweet = sspList(currentIdx)
Row(tweet.json)
}
override def close(): Unit = {}
}
Further, this class is used as follows:
val a = sparkSession.readStream
.format(providerClassName)
.option(SSPStreamingSource.HTTP_URL, httpMemTotal)
.load()
a.printSchema()
a.writeStream
.outputMode(OutputMode.Append())
.option("checkpointLocation", "/home/localCheckpoint1") //local
.start("/home/sparkoutput/aa00a01")
Here is the error. I'm yet to crack this :(
18/11/29 13:33:28 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
18/11/29 13:33:28 WARN SSPStreamingSource: Inside createMicroBatchReader() ....
18/11/29 13:33:28 WARN SSPStreamMicroBatchReader: Inside initialize ....
18/11/29 13:33:28 WARN SSPStreamMicroBatchReader: Inside thread ....
18/11/29 13:33:28 WARN SSPStreamMicroBatchReader: There is a total of 0 events that came in
18/11/29 13:33:28 WARN SSPStreamMicroBatchReader: Inside run ....
18/11/29 13:33:28 WARN SSPStreamMicroBatchReader: Inside recieve() ....
root
|-- status: string (nullable = true)
|-- data: struct (nullable = true)
| |-- resultType: string (nullable = true)
| |-- result: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- metric: struct (nullable = true)
| | | | |-- application: string (nullable = true)
| | | | |-- component: string (nullable = true)
| | | | |-- instance: string (nullable = true)
| | | |-- value: array (nullable = true)
| | | | |-- element: string (containsNull = true)
18/11/29 13:33:30 INFO MicroBatchExecution: Starting [id = f15252df-96d8-45b4-a6db-83fd4c7aed71, runId = 65a6dc28-5eb4-468a-80c3-f547504689d7]. Use file:///home/localCheckpoint1 to store the query checkpoint.
18/11/29 13:33:30 WARN SSPStreamingSource: Inside createMicroBatchReader() ....
18/11/29 13:33:30 WARN SSPStreamMicroBatchReader: Inside initialize ....
18/11/29 13:33:30 ERROR StreamingContext: Error starting the context, marking it as stopped
java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:168)
at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:513)
at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:573)
at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:572)
at myproject.spark.predictive_monitoring.predictmyproject$.run(predictmyproject.scala:99)
at myproject.spark.predictive_monitoring.predictmyproject$.main(predictmyproject.scala:31)
at myproject.spark.predictive_monitoring.predictmyproject.main(predictmyproject.scala)
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:168)
at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:513)
at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:573)
at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:572)
at myproject.spark.predictive_monitoring.predictmyproject$.run(predictmyproject.scala:99)
at myproject.spark.predictive_monitoring.predictmyproject$.main(predictmyproject.scala:31)
at myproject.spark.predictive_monitoring.predictmyproject.main(predictmyproject.scala)
18/11/29 13:33:30 INFO SparkContext: Invoking stop() from shutdown hook
18/11/29 13:33:30 WARN SSPStreamMicroBatchReader: Inside thread ....
18/11/29 13:33:30 WARN SSPStreamMicroBatchReader: Inside run ....
18/11/29 13:33:30 WARN SSPStreamMicroBatchReader: Inside recieve() ....
18/11/29 13:33:30 INFO MicroBatchExecution: Using MicroBatchReader [myproject.spark.predictive_monitoring.SSPStreamMicroBatchReader#74cc1ddc] from DataSourceV2 named 'myproject.spark.predictive_monitoring.SSPStreamingSource' [myproject.spark.predictive_monitoring.SSPStreamingSource#7e503c3]
18/11/29 13:33:30 INFO SparkUI: Stopped Spark web UI at http://172.16.221.232:4040
18/11/29 13:33:30 ERROR MicroBatchExecution: Query [id = f15252df-96d8-45b4-a6db-83fd4c7aed71, runId = 65a6dc28-5eb4-468a-80c3-f547504689d7] terminated with error
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:838)
org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:85)
myproject.spark.predictive_monitoring.predictmyproject$.run(predictmyproject.scala:37)
myproject.spark.predictive_monitoring.predictmyproject$.main(predictmyproject.scala:31)
myproject.spark.predictive_monitoring.predictmyproject.main(predictmyproject.scala)
The currently active SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:838)
org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:85)
myproject.spark.predictive_monitoring.predictmyproject$.run(predictmyproject.scala:37)
myproject.spark.predictive_monitoring.predictmyproject$.main(predictmyproject.scala:31)
myproject.spark.predictive_monitoring.predictmyproject.main(predictmyproject.scala)
at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:100)
at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:91)
at org.apache.spark.sql.SparkSession.cloneSession(SparkSession.scala:256)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:268)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
18/11/29 13:33:30 WARN SSPStreamMicroBatchReader: There is a total of 0 events that came in
18/11/29 13:33:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/11/29 13:33:30 INFO MemoryStore: MemoryStore cleared
18/11/29 13:33:30 INFO BlockManager: BlockManager stopped
18/11/29 13:33:30 INFO BlockManagerMaster: BlockManagerMaster stopped
18/11/29 13:33:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/11/29 13:33:31 INFO SparkContext: Successfully stopped SparkContext

print() in Spark Streaming not working

I have written a simple Spark Streaming receiver, but I have trouble processing the stream. The data is received but it's not processed by Spark Streaming.
public class JavaCustomReceiver extends Receiver<String> {
private static final Pattern SPACE = Pattern.compile(" ");
public static void main(String[] args) throws Exception {
// if (args.length < 2) {
// System.err.println("Usage: JavaCustomReceiver <hostname> <port>");
// System.exit(1);
// }
// StreamingExamples.setStreamingLogLevels();
LogManager.getRootLogger().setLevel(Level.WARN);
Log log = LogFactory.getLog("EXECUTOR-LOG:");
// Create the context with a 1 second batch size
SparkConf sparkConf = new SparkConf().setAppName("JavaCustomReceiver").setMaster("local[4]");
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(10000));
// Create an input stream with the custom receiver on target ip:port and count the
// words in input stream of \n delimited text (eg. generated by 'nc')
JavaReceiverInputDStream<String> lines = ssc.receiverStream(
new JavaCustomReceiver("localhost", 9999));
// JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(""))).iterator();
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(" ")).iterator());
words.foreachRDD( x-> {
x.collect().stream().forEach(n-> System.out.println("item of list: "+n));
});
words.foreachRDD( rdd -> {
if (!rdd.isEmpty()) System.out.println("its empty"); });
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(s -> new Tuple2<>(s, 1))
.reduceByKey((i1, i2) -> i1 + i2);
wordCounts.count();
System.out.println("WordCounts == " + wordCounts);
wordCounts.print();
log.warn("This is a test message");
log.warn(wordCounts.count());
ssc.start();
ssc.awaitTermination();
}
// ============= Receiver code that receives data over a socket
// ==============
String host = null;
int port = -1;
public JavaCustomReceiver(String host_, int port_) {
super(StorageLevel.MEMORY_AND_DISK_2());
host = host_;
port = port_;
}
@Override
public void onStart() {
// Start the thread that receives data over a connection
new Thread(this::receive).start();
}
@Override
public void onStop() {
// There is nothing much to do as the thread calling receive()
// is designed to stop by itself isStopped() returns false
}
/** Create a socket connection and receive data until receiver is stopped */
private void receive() {
try {
Socket socket = null;
BufferedReader reader = null;
try {
// connect to the server
socket = new Socket(host, port);
reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
// Until stopped or connection broken continue reading
String userInput;
while (!isStopped() && (userInput = reader.readLine()) != null) {
System.out.println("Received data '" + userInput + "'");
store(userInput);
}
} finally {
Closeables.close(reader, /* swallowIOException = */ true);
Closeables.close(socket, /* swallowIOException = */ true);
}
// Restart in an attempt to connect again when server is active
// again
restart("Trying to connect again");
} catch (ConnectException ce) {
// restart if could not connect to server
restart("Could not connect", ce);
} catch (Throwable t) {
restart("Error receiving data", t);
}
}
}
Here are my logs. You can see the test data being displayed, but after that I don't see the contents of the result being printed at all.
I have set the master to local[2]/local[4] but nothing works.
Received data 'testdata'
17/10/04 11:43:14 INFO MemoryStore: Block input-0-1507131793800 stored as values in memory (estimated size 80.0 B, free 912.1 MB)
17/10/04 11:43:14 INFO BlockManagerInfo: Added input-0-1507131793800 in memory on 10.99.1.116:50088 (size: 80.0 B, free: 912.2 MB)
17/10/04 11:43:14 WARN BlockManager: Block input-0-1507131793800 replicated to only 0 peer(s) instead of 1 peers
17/10/04 11:43:14 INFO BlockGenerator: Pushed block input-0-1507131793800
17/10/04 11:43:20 INFO JobScheduler: Added jobs for time 1507131800000 ms
17/10/04 11:43:20 INFO JobScheduler: Starting job streaming job 1507131800000 ms.0 from job set of time 1507131800000 ms
17/10/04 11:43:20 INFO SparkContext: Starting job: collect at JavaCustomReceiver.java:61
17/10/04 11:43:20 INFO DAGScheduler: Got job 44 (collect at JavaCustomReceiver.java:61) with 1 output partitions
17/10/04 11:43:20 INFO DAGScheduler: Final stage: ResultStage 59 (collect at JavaCustomReceiver.java:61)
17/10/04 11:43:20 INFO DAGScheduler: Parents of final stage: List()
17/10/04 11:43:20 INFO DAGScheduler: Missing parents: List()
17/10/04 11:43:20 INFO DAGScheduler: Submitting ResultStage 59 (MapPartitionsRDD[58] at flatMap at JavaCustomReceiver.java:59), which has no missing parents
17/10/04 11:43:20 INFO MemoryStore: Block broadcast_32 stored as values in memory (estimated size 2.7 KB, free 912.1 MB)
17/10/04 11:43:20 INFO MemoryStore: Block broadcast_32_piece0 stored as bytes in memory (estimated size 1735.0 B, free 912.1 MB)
17/10/04 11:43:20 INFO BlockManagerInfo: Added broadcast_32_piece0 in memory on 10.99.1.116:50088 (size: 1735.0 B, free: 912.2 MB)
17/10/04 11:43:20 INFO SparkContext: Created broadcast 32 from broadcast at DAGScheduler.scala:1012
17/10/04 11:43:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 59 (MapPartitionsRDD[58] at flatMap at JavaCustomReceiver.java:59)
17/10/04 11:43:20 INFO TaskSchedulerImpl: Adding task set 59.0 with 1 tasks
17/10/04 11:43:20 INFO TaskSetManager: Starting task 0.0 in stage 59.0 (TID 60, localhost, partition 0, ANY, 5681 bytes)
17/10/04 11:43:20 INFO Executor: Running task 0.0 in stage 59.0 (TID 60)
Found the answer:
Why doesn't the Flink SocketTextStreamWordCount work?
I changed my program to save the output to a text file, and it was saving the streaming data perfectly.
Thanks
Adi
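For reference, a minimal sketch of that change against the question's code: instead of relying on print(), each non-empty micro-batch of wordCounts is written to disk before ssc.start() (the output path is an assumption):
// replaces wordCounts.print(): write each non-empty batch to a
// timestamped output directory so the results are visible on disk
wordCounts.foreachRDD((rdd, time) -> {
    if (!rdd.isEmpty()) {
        rdd.saveAsTextFile("/tmp/wordcounts-" + time.milliseconds());
    }
});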

Geomesa + SparkSQL integration issue

My setup is a 3-node cluster running in AWS. I already ingested my data (30 million rows) and have no problems running queries using a Jupyter notebook. But now I am trying to run a query using Spark and Java, as seen in the following snippet:
public class SparkSqlTest {
private static final Logger log = Logger.getLogger(SparkSqlTest.class);
public static void main(String[] args) {
Map<String, String> dsParams = new HashMap<>();
dsParams.put("instanceId", "gis");
dsParams.put("zookeepers", "server ip");
dsParams.put("user", "root");
dsParams.put("password", "secret");
dsParams.put("tableName", "posiciones");
try {
DataStoreFinder.getDataStore(dsParams);
SparkConf conf = new SparkConf();
conf.setAppName("testSpark");
conf.setMaster("yarn");
SparkContext sc = SparkContext.getOrCreate(conf);
SparkSession ss = SparkSession.builder().config(conf).getOrCreate();
Dataset<Row> df = ss.read()
.format("geomesa")
.options(dsParams)
.option("geomesa.feature", "posicion")
.load();
df.createOrReplaceTempView("posiciones");
long t1 = System.currentTimeMillis();
Dataset<Row> rows = ss.sql("select count(*) from posiciones where id_equipo = 148 and fecha_hora >= '2015-04-01' and fecha_hora <= '2015-04-30'");
long t2 = System.currentTimeMillis();
rows.show();
log.info("Tiempo de la consulta: " + ((t2 - t1) / 1000) + " segundos.");
} catch (Exception e) {
e.printStackTrace();
}
}
}
I upload the code to my master EC2 box (inside the Jupyter notebook image) and run it using the following commands:
docker cp myjar-0.1.0.jar jupyter:myjar-0.1.0.jar
docker exec jupyter sh -c '$SPARK_HOME/bin/spark-submit --master yarn --class mypackage.SparkSqlTest file:///myjar-0.1.0.jar --jars $GEOMESA_SPARK_JARS'
But I got the following error:
17/09/15 19:45:01 INFO HSQLDB4AD417742A.ENGINE: dataFileCache open start
17/09/15 19:45:02 INFO execution.SparkSqlParser: Parsing command: posiciones
17/09/15 19:45:02 INFO execution.SparkSqlParser: Parsing command: select count(*) from posiciones where id_equipo = 148 and fecha_hora >= '2015-04-01' and fecha_hora <= '2015-04-30'
java.lang.RuntimeException: Could not find a SpatialRDDProvider
at org.locationtech.geomesa.spark.GeoMesaSpark$$anonfun$apply$2.apply(GeoMesaSpark.scala:33)
at org.locationtech.geomesa.spark.GeoMesaSpark$$anonfun$apply$2.apply(GeoMesaSpark.scala:33)
Any ideas why this happens?
I finally sorted it out; my problem was that I did not include the following entries in my pom.xml (without them, no SpatialRDDProvider implementation ends up on the classpath):
<dependency>
<groupId>org.locationtech.geomesa</groupId>
<artifactId>geomesa-accumulo-spark_2.11</artifactId>
<version>${geomesa.version}</version>
</dependency>
<dependency>
<groupId>org.locationtech.geomesa</groupId>
<artifactId>geomesa-spark-converter_2.11</artifactId>
<version>${geomesa.version}</version>
</dependency>

How to check the number of entries on the local member

My prime member:
public static void main(String[] args) throws InterruptedException {
Config config = new Config();
config.setProperty(GroupProperty.ENABLE_JMX, "true");
config.setProperty(GroupProperty.BACKPRESSURE_ENABLED, "true");
config.setProperty(GroupProperty.SLOW_OPERATION_DETECTOR_ENABLED, "true");
config.getSerializationConfig().addPortableFactory(1, new MyPortableFactory());
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
IMap<Integer, Rule> ruleMap = hz.getMap("ruleMap");
// TODO generate rule map data ; more than 100,000 entries
generateRuleMapData(ruleMap);
logger.info("generate rule finised!");
// TODO rule map index
// health check
PartitionService partitionService = hz.getPartitionService();
LocalMapStats mapStatistics = ruleMap.getLocalMapStats();
while (true) {
logger.info("isClusterSafe:{},isLocalMemberSafe:{},number of entries owned on this node = {}",
partitionService.isClusterSafe(), partitionService.isLocalMemberSafe(),
mapStatistics.getOwnedEntryCount());
Thread.sleep(1000);
}
}
Logs:
2016-06-28 13:53:05,048 INFO [main] b.PrimeMember (PrimeMember.java:41) - isClusterSafe:true,isLocalMemberSafe:true,number of entries owned on this node = 997465
2016-06-28 13:53:06,049 INFO [main] b.PrimeMember (PrimeMember.java:41) - isClusterSafe:true,isLocalMemberSafe:true,number of entries owned on this node = 997465
2016-06-28 13:53:07,050 INFO [main] b.PrimeMember (PrimeMember.java:41) - isClusterSafe:true,isLocalMemberSafe:true,number of entries owned on this node = 997465
My slave member:
public static void main(String[] args) throws InterruptedException {
Config config = new Config();
config.setProperty(GroupProperty.ENABLE_JMX, "true");
config.setProperty(GroupProperty.BACKPRESSURE_ENABLED, "true");
config.setProperty(GroupProperty.SLOW_OPERATION_DETECTOR_ENABLED, "true");
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
IMap<Integer, Rule> ruleMap = hz.getMap("ruleMap");
PartitionService partitionService = hz.getPartitionService();
LocalMapStats mapStatistics = ruleMap.getLocalMapStats();
while (true) {
logger.info("isClusterSafe:{},isLocalMemberSafe:{},number of entries owned on this node = {}",
partitionService.isClusterSafe(), partitionService.isLocalMemberSafe(),
mapStatistics.getOwnedEntryCount());
Thread.sleep(1000);
}
}
Logs:
2016-06-28 14:05:53,543 INFO [main] b.SlaveMember (SlaveMember.java:31) - isClusterSafe:false,isLocalMemberSafe:false,number of entries owned on this node = 412441
2016-06-28 14:05:54,556 INFO [main] b.SlaveMember (SlaveMember.java:31) - isClusterSafe:false,isLocalMemberSafe:false,number of entries owned on this node = 412441
2016-06-28 14:05:55,563 INFO [main] b.SlaveMember (SlaveMember.java:31) - isClusterSafe:false,isLocalMemberSafe:false,number of entries owned on this node = 412441
2016-06-28 14:05:56,578 INFO [main] b.SlaveMember (SlaveMember.java:31) - isClusterSafe:false,isLocalMemberSafe:false,number of entries owned on this node = 412441
My question is:
Why is the number of entries owned on the prime member not changing after the cluster adds one slave member?
I should fetch the statistics every second, i.e. call getLocalMapStats() inside the loop:
while (true) {
LocalMapStats mapStatistics = ruleMap.getLocalMapStats();
logger.info(
"isClusterSafe:{},isLocalMemberSafe:{},rulemap.size:{}, number of entries owned on this node = {}",
partitionService.isClusterSafe(), partitionService.isLocalMemberSafe(), ruleMap.size(),
mapStatistics.getOwnedEntryCount());
Thread.sleep(1000);
}
Another option is to make use of localKeySet(), which returns the locally owned set of keys:
ruleMap.localKeySet().size()
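A minimal, self-contained sketch of that approach, assuming Hazelcast 3.x as in the question (the class name is made up; the map name is reused from the question's code):
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class LocalEntryCount {
    public static void main(String[] args) throws InterruptedException {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<Integer, Object> ruleMap = hz.getMap("ruleMap");
        while (true) {
            // localKeySet() is evaluated on each call, so its size reflects
            // the entries currently owned by this member
            System.out.println("entries owned on this node = " + ruleMap.localKeySet().size());
            Thread.sleep(1000);
        }
    }
}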
