Spark Streaming: how not to restart receiver after receiver's failure - apache-spark

We are using a custom spark receiver that reads streamed data from a provided http link. If the provided http link is incorrect, the receiver fails. The problem is that spark will continuously restart the receiver, and the application will never terminate. The question is how to tell Spark to terminate the application if the receiver fails.
This is an extract of our custom receiver:
def onStart() {
  // Start the thread that receives data over a connection
  new Thread("Receiver") {
    override def run() { receive() }
  }.start()
}

private def receive(): Unit = {
  ....
  val response: CloseableHttpResponse = httpclient.execute(req)
  try {
    val sl = response.getStatusLine()
    if (sl.getStatusCode != 200) {
      val errorMsg = "Error: " + sl.getStatusCode
      val thrw = new RuntimeException(errorMsg)
      stop(errorMsg, thrw)
    } else {
      ...
      store(doc)
    }
We have a spark streaming application that uses this receiver:
val ssc = new StreamingContext(sparkConf, duration)
val changes = ssc.receiverStream(new CustomReceiver(...
...
ssc.start()
ssc.awaitTermination()
Everything works as expected if the receiver doesn't have errors. If the receiver fails (e.g. with a wrong http link), spark will continuously restart it and the application will never terminate.
16/05/31 17:03:38 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
16/05/31 17:03:38 ERROR ReceiverTracker: Receiver has been stopped. Try to restart it.
We just want to terminate the whole application if a receiver fails.

There is a way to control the life cycle of custom-receiver-based Spark Streaming applications: define a job progress listener for your application and keep track of what is happening.
class CustomReceiverListener extends StreamingJobProgressListener {
    private boolean receiverStopped = false;

    public CustomReceiverListener(StreamingContext ssc) { super(ssc); }

    public boolean isReceiverStopped() {
        return receiverStopped;
    }

    @Override
    public void onReceiverStopped(StreamingListenerReceiverStopped receiverStopped) {
        LOG.info("Update the flag field");
        this.receiverStopped = true;
    }
}
And in your driver, initialize a thread to monitor the state of the receiverStopped flag. The driver will stop the streaming application when this thread finishes. (A better approach is to define a callback in the driver that the listener invokes to stop the streaming application; a sketch of that variant follows the note below.)
CustomReceiverListener listener = new CustomReceiverListener(ssc);
ssc.addStreamingListener(listener);
ssc.start();
Thread thread = new Thread(() -> {
    while (!listener.isReceiverStopped()) {
        LOG.info("Sleepy head...");
        try {
            Thread.sleep(2 * 1000); /* check after 2 seconds */
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
});
thread.start();
thread.join();
LOG.info("Listener asked to die! Going to commit suicide :(");
ssc.stop(true, false);
Note: In case of multiple instances of your receivers, change the implementation of CustomReceiverListener to make sure all the receiver instances are stopped.
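For the callback variant mentioned above, one possible sketch (untested; CallbackReceiverListener and stopApp are hypothetical names layered on top of the CustomReceiverListener shown here) is to hand the listener a Runnable and fire it from onReceiverStopped, stopping the context from a separate thread so the listener bus is not blocked:
// Hypothetical subclass of the CustomReceiverListener above; not part of the original answer.
class CallbackReceiverListener extends CustomReceiverListener {
    private final Runnable onStopped;

    public CallbackReceiverListener(StreamingContext ssc, Runnable onStopped) {
        super(ssc);
        this.onStopped = onStopped;
    }

    @Override
    public void onReceiverStopped(StreamingListenerReceiverStopped receiverStopped) {
        super.onReceiverStopped(receiverStopped); // keep the flag behaviour
        onStopped.run();                          // notify the driver
    }
}

// Driver wiring: stop on a separate thread so the listener bus thread is never blocked.
Runnable stopApp = () -> new Thread(() -> ssc.stop(true, false)).start();
ssc.addStreamingListener(new CallbackReceiverListener(ssc, stopApp));
ssc.start();
ssc.awaitTermination();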

It seems that the scheduling in Spark Streaming works in such a way that the ReceiverTracker keeps restarting a failed receiver until the ReceiverTracker itself is stopped.
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L618
To stop the ReceiverTracker, we need to stop the whole application. Thus, there seems to be no way to control this process from the receiver itself.

Related

Why SingleThreadExecutor throws OutOfMemoryError in Java

I have a message producer (RabbitMQ) and a Spring Boot service that receives messages from a queue (RabbitMQ). The amount of messages from this queue is unknown, as it depends on the traffic or the number of messages pushed to RabbitMQ. After messages have been received from RabbitMQ into my Spring Boot service, I store them locally in an ArrayDeque. Every message that comes through is stored in this local queue and then sent over a socket to another application. These messages have to be sent in the order in which they arrived from RabbitMQ.
Here is a snippet of my code.
public void addMessageToQueue(CML cml) throws ParseException {
    if (cml != null) {
        AgentEventData agentEventData = setAgentEventData(cml);
        log.info("Populated AgentEventData: {} ", agentEventData);
        MessageProcessor.getMessageQueue().getMessageQueue().add(agentEventData);
        // ExecutorService executorService = Executors.newFixedThreadPool(MessageProcessor.getMessageQueue().getMessageQueue().size());
        log.info("Message QUEUE Size: {}", MessageProcessor.getMessageQueue().getMessageQueue().size());
        QUEUE_MONITOR.setCachedQueue(MessageProcessor.getMessageQueue());
        /**
         * Queue has already methods for monitoring events, no need for a seperate object
         */
        executeTasks();
    } else {
        log.error("CML Message is NULL, Message Cannot be added to the Message Queue.");
    }
}

private static void executeTasks() {
    ExecutorService executorService = Executors.newSingleThreadExecutor();
    try {
        executorService.execute(new MessageProcessor());
    } catch (Exception e) {
        log.error("Exception when executing Task: {}", e.getMessage());
    }
    log.info("Shutting down Executor Service........");
    executorService.shutdown();
    log.info("Executor Service Shutdown : {}", executorService.isShutdown());
}
I tried using newSingleThreadExecutor as shown in the executeTasks() method, but after my app has been running on the server for some time, I get a consumer-thread error: java.lang.OutOfMemoryError, unable to create native thread. Possibly out of memory or process/resource limits reached.
I then tried Executors.newFixedThreadPool(10) and still get the same error after some time.
What is it that I am doing wrong, and which approach best fits my app/service?
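One thing that stands out in the snippet above is that executeTasks() builds a brand-new ExecutorService (and therefore a new native thread) for every single message; under sustained traffic that alone can exhaust the process thread limit. A minimal sketch of reusing one long-lived single-thread executor (MessageProcessor and the general shape are taken from the question; the MessageDispatcher class name is illustrative):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MessageDispatcher {
    // One executor for the lifetime of the service, created exactly once.
    // A single worker thread also preserves the FIFO ordering the question requires.
    private static final ExecutorService EXECUTOR = Executors.newSingleThreadExecutor();

    // Called for every incoming message; no new executor (or thread) is created here.
    static void executeTasks() {
        EXECUTOR.execute(new MessageProcessor()); // MessageProcessor as in the question
    }

    // Shut the executor down exactly once, e.g. from a @PreDestroy hook, not per message.
    static void shutdown() {
        EXECUTOR.shutdown();
    }
}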

Socket Time Out Exception on Async Java Http Service calls using ThreadPoolTaskExecutor

I am new to using ThreadPoolTaskExecutor and CompletableFuture for async operations. Since the service I am developing should be fast and its performance should not be impacted, I had to leverage async calls. However, I am not sure what's wrong in the code: the service performs fine on individual requests, but once it comes under load or a load test is performed, it stops responding. Requests sent from Postman after the load get stuck, and the SoapUI load test results show "Socket Timeout Exception: Read Time out".
Below is the service level operation:
@Async
public CompletableFuture<FindWorkOrdersResponse> findWorkOrders(BusinessUnit businessUnit,
        FindWorkOrdersRequest findWorkOrdersRequest) {
    CompletableFuture<FindWorkOrdersResponse> result = null;
    try {
        result = CompletableFuture.supplyAsync(() -> {
            return ibsAdapter.findWorkOrders(businessUnit, findWorkOrdersRequest);
        }, taskExecutor).thenApplyAsync(findWorkOrdersResp -> {
            if (findWorkOrdersResp.getWorkOrders() != null && !findWorkOrdersResp.getWorkOrders().isEmpty()) {
                findWorkOrdersResp.getWorkOrders().parallelStream().forEach(wo -> {
                    wo.setInstallerInfo(mustangAdapter.getCustomer(businessUnit, wo.getServiceProviderId()));
                });
            }
            return findWorkOrdersResp;
        }, taskExecutor);
    } catch (RejectedExecutionException e) {
        logger.warn(
                "FindWorkOrders was rejected for async execution, falling back to non-critical execution mode: {}",
                e.getMessage());
    }
    return result;
}
The data returned from the first service call is a list of size 50, and the second service call is made for each element.
The task executor settings are below:
@Bean("threadPoolTaskExecutor")
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setThreadNamePrefix("Async-");
    executor.setCorePoolSize(MdcConstants.CORE_POOL_SIZE);
    executor.setMaxPoolSize(MdcConstants.MAX_POOL_SIZE);
    executor.setQueueCapacity(MdcConstants.QUEUE_CAPACITY);
    //executor.setWaitForTasksToCompleteOnShutdown(true);
    executor.setRejectedExecutionHandler(new RejectedExecutionHandlerImpl());
    executor.setAllowCoreThreadTimeOut(true);
    executor.setKeepAliveSeconds(2);
    executor.setTaskDecorator(runnable -> {
        Map<String, String> mdcContext = MDC.getCopyOfContextMap();
        return () -> {
            try {
                if (mdcContext != null) {
                    MDC.setContextMap(mdcContext);
                }
                runnable.run();
            } finally {
                MDC.clear();
            }
        };
    });
    return executor;
}
The core pool size and max pool size are set to the values below:
public static final Integer CORE_POOL_SIZE=2;
public static final Integer MAX_POOL_SIZE=100;
public static final Integer QUEUE_CAPACITY=0;
The socket timeout of the ibs adapter service call is set to 600000 ms; increasing it doesn't fix the issue. The second service call is very small and returns data in milliseconds, so it is probably not the cause. Can anyone please help me fix this? I thought increasing the core pool size, max pool size, or socket timeout would fix it, but it didn't; a socket read timeout should not occur after the socket timeout has been raised, yet it still does.
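One detail worth noting about the settings above: in Spring's ThreadPoolTaskExecutor a queueCapacity of 0 means a SynchronousQueue, so every burst either grows the pool toward MAX_POOL_SIZE or goes straight to the rejection handler, and the parallelStream inside thenApplyAsync runs its 50 blocking calls on the common ForkJoinPool rather than on taskExecutor. As a hedged sketch only (the numbers are illustrative, not taken from the question), a more forgiving configuration could look like this:
// Sketch with illustrative values; ThreadPoolExecutor is java.util.concurrent.ThreadPoolExecutor.
@Bean("threadPoolTaskExecutor")
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setThreadNamePrefix("Async-");
    executor.setCorePoolSize(10);       // keep a warm core instead of 2
    executor.setMaxPoolSize(100);
    executor.setQueueCapacity(500);     // buffer bursts instead of using a SynchronousQueue
    executor.setKeepAliveSeconds(60);   // let idle threads linger instead of dying after 2 s
    // Run rejected work on the caller's thread instead of dropping it.
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    return executor;
}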

How to read messages from MQs (e.g. ZeroMQ, RabbitMQ) using Spark Streaming?

As the Spark docs say, Kafka is supported as a data streaming source, but I use ZeroMQ and there is no ZeroMQUtils. How can I use it, and more generally, what about other MQs? I am totally new to Spark and Spark Streaming, so I am sorry if the question is stupid. Could anyone give me a solution? Thanks.
BTW, I use Python.
Update: I finally did it in Java with a custom receiver. Below is my solution:
public class ZeroMQReceiver<T> extends Receiver<T> {
    private static final ObjectMapper mapper = new ObjectMapper();

    public ZeroMQReceiver() {
        super(StorageLevel.MEMORY_AND_DISK_2());
    }

    @Override
    public void onStart() {
        // Start the thread that receives data over a connection
        new Thread(this::receive).start();
    }

    @Override
    public void onStop() {
        // There is nothing much to do as the thread calling receive()
        // is designed to stop by itself once isStopped() returns true
    }

    /** Create a socket connection and receive data until receiver is stopped */
    private void receive() {
        String message = null;
        try {
            ZMQ.Context context = ZMQ.context(1);
            ZMQ.Socket subscriber = context.socket(ZMQ.SUB);
            subscriber.connect("tcp://ip:port");
            subscriber.subscribe("".getBytes());

            // Until stopped or connection broken continue reading
            while (!isStopped() && (message = subscriber.recvStr()) != null) {
                List<T> results = mapper.readValue(message,
                        new TypeReference<List<T>>() {});
                for (T item : results) {
                    store(item);
                }
            }

            // Restart in an attempt to connect again when server is active again
            restart("Trying to connect again");
        } catch (Throwable t) {
            // restart if there is any other error
            restart("Error receiving data", t);
        }
    }
}
I assume you are talking about Structured Streaming.
I am not familiar with ZeroMQ, but an important point in Spark Structured Streaming sources is replayability (in order to ensure fault tolerance), which, if I understand correctly, ZeroMQ doesn't deliver out-of-the-box.
A practical approach would be to buffer the data either in Kafka (and use the KafkaSource) or as files in a directory (local FS/NFS, HDFS, S3) and use the FileSource for reading. Cf. the Spark docs. If you use the FileSource, make sure not to append anything to an existing file in the FileSource's input directory; instead, move files into the directory atomically.
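For the FileSource route mentioned above, a minimal Structured Streaming sketch in Java (the directory path and schema are placeholders, not from the question) could look like this:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class FileSourceExample {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("FileSourceExample")
                .getOrCreate();

        // Streaming file sources need an explicit schema.
        StructType schema = new StructType()
                .add("id", "string")
                .add("payload", "string");

        // Each file moved atomically into /data/incoming is picked up exactly once.
        Dataset<Row> stream = spark.readStream()
                .schema(schema)
                .json("/data/incoming");

        stream.writeStream()
                .format("console")
                .start()
                .awaitTermination();
    }
}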

How can I consume Kafka in parallel with Spark Streaming? I set concurrentJobs but got an error [duplicate]

The Kafka documentation describes the following approach:
One Consumer Per Thread: A simple option is to give each thread its own consumer instance.
My code:
public class KafkaConsumerRunner implements Runnable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final CloudKafkaConsumer consumer;
    private final String topicName;

    public KafkaConsumerRunner(CloudKafkaConsumer consumer, String topicName) {
        this.consumer = consumer;
        this.topicName = topicName;
    }

    @Override
    public void run() {
        try {
            this.consumer.subscribe(topicName);
            ConsumerRecords<String, String> records;
            while (!closed.get()) {
                synchronized (consumer) {
                    records = consumer.poll(100);
                }
                for (ConsumerRecord<String, String> tmp : records) {
                    System.out.println(tmp.value());
                }
            }
        } catch (WakeupException e) {
            // Ignore exception if closing
            System.out.println(e);
            //if (!closed.get()) throw e;
        }
    }

    // Shutdown hook which can be called from a separate thread
    public void shutdown() {
        closed.set(true);
        consumer.wakeup();
    }

    public static void main(String[] args) {
        CloudKafkaConsumer kafkaConsumer = KafkaConsumerBuilder.builder()
                .withBootstrapServers("172.31.1.159:9092")
                .withGroupId("test")
                .build();
        ExecutorService executorService = Executors.newFixedThreadPool(5);
        executorService.execute(new KafkaConsumerRunner(kafkaConsumer, "log"));
        executorService.execute(new KafkaConsumerRunner(kafkaConsumer, "log.info"));
        executorService.shutdown();
    }
}
but it doesn't work and throws an exception:
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
Furthermore, I read the source of Flink (an open source platform for distributed stream and batch data processing). Flink's multi-threaded consumer is similar to mine:
long pollTimeout = Long.parseLong(flinkKafkaConsumer.properties.getProperty(KEY_POLL_TIMEOUT, Long.toString(DEFAULT_POLL_TIMEOUT)));

pollLoop: while (running) {
    ConsumerRecords<byte[], byte[]> records;
    //noinspection SynchronizeOnNonFinalField
    synchronized (flinkKafkaConsumer.consumer) {
        try {
            records = flinkKafkaConsumer.consumer.poll(pollTimeout);
        } catch (WakeupException we) {
            if (running) {
                throw we;
            }
            // leave loop
            continue;
        }
    }
Flink's multi-threaded consumer code
What's wrong?
The Kafka consumer is not thread safe. As you pointed out in your question, the documentation states that
A simple option is to give each thread its own consumer instance
But in your code, you have the same consumer instance wrapped by different KafkaConsumerRunner instances, so multiple threads are accessing the same consumer instance. The Kafka documentation clearly states:
The Kafka consumer is NOT thread-safe. All network I/O happens in the
thread of the application making the call. It is the responsibility of
the user to ensure that multi-threaded access is properly
synchronized. Un-synchronized access will result in
ConcurrentModificationException.
That's exactly the exception you received.
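A hedged sketch of the "one consumer per thread" option, assuming the question's KafkaConsumerBuilder and CloudKafkaConsumer wrappers can simply be instantiated once per topic:
// Sketch: build a separate consumer instance per topic/thread,
// so no consumer is ever shared across threads.
ExecutorService executorService = Executors.newFixedThreadPool(2);

for (String topic : new String[] {"log", "log.info"}) {
    CloudKafkaConsumer consumer = KafkaConsumerBuilder.builder()
            .withBootstrapServers("172.31.1.159:9092")
            .withGroupId("test")
            .build();
    executorService.execute(new KafkaConsumerRunner(consumer, topic));
}
executorService.shutdown();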
It is throwing the exception on your call to subscribe: this.consumer.subscribe(topicName);
Move that block into a synchronized block like this:
@Override
public void run() {
    try {
        synchronized (consumer) {
            this.consumer.subscribe(topicName);
        }
        ConsumerRecords<String, String> records;
        while (!closed.get()) {
            synchronized (consumer) {
                records = consumer.poll(100);
            }
            for (ConsumerRecord<String, String> tmp : records) {
                System.out.println(tmp.value());
            }
        }
    } catch (WakeupException e) {
        // Ignore exception if closing
        System.out.println(e);
        //if (!closed.get()) throw e;
    }
}
Maybe this is not your case, but if you are merging the processing of data from several topics, then you can read data from multiple topics with the same consumer (see the sketch below). If not, it is preferable to create separate jobs consuming each topic.
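A minimal sketch of subscribing a single plain KafkaConsumer to several topics, using the standard Kafka client API rather than the question's wrapper (broker address and topic names reuse the question's values):
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MultiTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "172.31.1.159:9092");
        props.put("group.id", "test");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // One consumer, one thread, several topics; loops forever for brevity.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("log", "log.info"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.topic() + ": " + record.value());
                }
            }
        }
    }
}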

Azure web jobs - parallel message processing from queues not working properly

I need to provision SharePoint Online team rooms using Azure queues and WebJobs.
I have created a console application and published it as a continuous WebJob with the following settings:
config.Queues.BatchSize = 1;
config.Queues.MaxDequeueCount = 4;
config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(15);
JobHost host = new JobHost();
host.RunAndBlock();
The trigger function looks like this:
public static void TriggerFunction([QueueTrigger("messagequeue")] CloudQueueMessage message)
{
    ProcessQueueMsg(message.AsString);
}
Inside the ProcessQueueMsg function I'm deserialising the received JSON message into a class and running the following operations:
I'm creating a sub site in an existing site collection;
Using the PnP provisioning engine I'm provisioning content in the sub site (lists, file uploads, permissions, quick launch, etc.).
If there is only one message in the queue to process, everything works correctly.
However, when I send two messages to the queue with a few seconds' delay, the second one overwrites the class properties while the first is still being processed, before the first message has finished.
I tried running each message in a separate thread, but the trigger functions are marked as succeeded before my function finishes processing the message. This way I have no control over potential exceptions / message dequeueing.
I also tried limiting the number of threads to 1 and using a semaphore, but had the same behavior:
private const int NrOfThreads = 1;
private static readonly SemaphoreSlim semaphore_ = new SemaphoreSlim(NrOfThreads, NrOfThreads);

// Inside TriggerFunction
try
{
    semaphore_.Wait();
    new Thread(ThreadProc).Start();
}
catch (Exception e)
{
    Console.Error.WriteLine(e);
}

public static void ThreadProc()
{
    try
    {
        DoWork();
    }
    catch (Exception e)
    {
        Console.Error.WriteLine(">>> Error: {0}", e);
    }
    finally
    {
        // release a slot for another thread
        semaphore_.Release();
    }
}

public static void DoWork()
{
    Console.WriteLine("This is a web job invocation: Process Id: {0}, Thread Id: {1}.", System.Diagnostics.Process.GetCurrentProcess().Id, Thread.CurrentThread.ManagedThreadId);
    ProcessQueueMsg();
    Console.WriteLine(">> Thread Done. Processing next message.");
}
Is there a way I can run my processing function for parallel messages in order to provision my sites without interfering?
Please let me know if you need more details.
Thank you in advance!
You're not passing in the config object to your JobHost on construction - that's why your config settings aren't having an effect. Change your code to:
JobHost host = new JobHost(config);
host.RunAndBlock();
