JeroMQ Subscriber in a Runnable - multithreading

I'm trying to embed a ZMQ subscriber in a Runnable.
I'm able to start the Runnable for the first time and everything seems okay.
The problem is that when I interrupt the Thread and start a new Thread, the subscriber no longer receives any messages. For example:
I have a publisher Runnable:
class ZMQPublisherRunnable() extends Runnable {
  override def run() {
    val ZMQcontext = ZMQ.context(1)
    val publisher = ZMQcontext.socket(ZMQ.PUB)
    var count = 0

    publisher.connect(s"tcp://127.0.0.1:16666")

    while (!Thread.currentThread().isInterrupted) {
      try {
        println(s"PUBLISHER -> $count")
        publisher.send(s"PUBLISHER -> $count")
        count += 1
        Thread.sleep(1000)
      }
      catch {
        case e: Exception =>
          println(e.getMessage)
          publisher.disconnect(s"tcp://127.0.0.1:16666")
          ZMQcontext.close()
      }
    }
  }
}
I have a Subscriber Runnable:
class ZMQSubscriberRunnable1() extends Runnable {
  override def run() {
    println("STARTING SUBSCRIBER")
    val ZMQcontext = ZMQ.context(1)
    val subscriber = ZMQcontext.socket(ZMQ.SUB)
    subscriber.subscribe("".getBytes)
    subscriber.bind(s"tcp://127.0.0.1:16666")

    while (!Thread.currentThread().isInterrupted) {
      try {
        println("waiting")
        val mesg = new String(subscriber.recv(0))
        println(s"SUBSCRIBER -> $mesg")
      }
      catch {
        case e: Exception =>
          println(e.getMessage)
          subscriber.unbind("tcp://127.0.0.1:16666")
          subscriber.close()
          ZMQcontext.close()
      }
    }
  }
}
My main code looks like this:
object Application extends App {
  val zmqPUB = new ZMQPublisherRunnable
  val zmqThreadPUB = new Thread(zmqPUB, "MY_PUB")
  zmqThreadPUB.setDaemon(true)
  zmqThreadPUB.start()

  val zmqRunnable = new ZMQSubscriberRunnable1
  val zmqThread = new Thread(zmqRunnable, "MY_TEST")
  zmqThread.setDaemon(true)
  zmqThread.start()

  Thread.sleep(10000)

  zmqThread.interrupt()
  zmqThread.join()

  Thread.sleep(2000)

  val zmqRunnable_2 = new ZMQSubscriberRunnable1
  val zmqThread_2 = new Thread(zmqRunnable_2, "MY_TEST_2")
  zmqThread_2.setDaemon(true)
  zmqThread_2.start()

  Thread.sleep(10000)

  zmqThread_2.interrupt()
  zmqThread_2.join()
}
The first time I start the Subscriber, I'm able to receive all messages:
STARTING SUBSCRIBER
PUBLISHER -> 0
waiting
PUBLISHER -> 1
SUBSCRIBER -> PUBLISHER -> 1
waiting
PUBLISHER -> 2
SUBSCRIBER -> PUBLISHER -> 2
waiting
PUBLISHER -> 3
SUBSCRIBER -> PUBLISHER -> 3
waiting
...
Once I interrupt that Thread and start a new one from the same Runnable class, I'm no longer able to read messages. It waits forever:
STARTING SUBSCRIBER
waiting
PUBLISHER -> 13
PUBLISHER -> 14
PUBLISHER -> 15
PUBLISHER -> 16
PUBLISHER -> 17
...
Any insights about what I'm doing wrong?
Thanks

JeroMQ is not Thread.interrupt safe.
To work around it, you have to terminate the ZMQ context before you call Thread.interrupt:
1. Instantiate the ZMQ context outside the Runnable.
2. Pass the context as an argument to the ZMQ Runnable (you can also use it as a global variable).
3. Call zmqContext.term().
4. Call zmqSubThread.interrupt().
5. Call zmqSubThread.join().
For more details take a look at: https://github.com/zeromq/jeromq/issues/116
My subscriber Runnable looks like:
import org.zeromq.{ZMQ, ZMQException}

class ZMQSubscriberRunnable(zmqContext: ZMQ.Context, port: Int, ip: String, topic: String) extends Runnable {
  override def run() {
    var contextTerminated = false
    val subscriber = zmqContext.socket(ZMQ.SUB)
    subscriber.subscribe(topic.getBytes)
    subscriber.bind(s"tcp://$ip:$port")

    while (!contextTerminated && !Thread.currentThread().isInterrupted) {
      try {
        println(new String(subscriber.recv(0)))
      }
      catch {
        // once the context is terminated from the main thread, the blocked
        // recv() throws a ZMQException with error code ETERM
        case e: ZMQException if e.getErrorCode == ZMQ.Error.ETERM.getCode =>
          contextTerminated = true
          subscriber.close()
        case e: Exception =>
          zmqContext.term()
          subscriber.close()
      }
    }
  }
}
To interrupt the Thread:
zmqContext.term()
zmqSubThread.interrupt()
zmqSubThread.join()
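Putting the steps together, here is a minimal sketch of the main-side wiring (the port, IP, and topic values are illustrative):

import org.zeromq.ZMQ

object Main extends App {
  // 1. Instantiate the context outside the Runnable
  val zmqContext = ZMQ.context(1)

  // 2. Pass the context to the subscriber Runnable
  val zmqSubRunnable = new ZMQSubscriberRunnable(zmqContext, 16666, "127.0.0.1", "")
  val zmqSubThread = new Thread(zmqSubRunnable, "MY_SUB")
  zmqSubThread.setDaemon(true)
  zmqSubThread.start()

  Thread.sleep(10000)

  // 3-5. Terminate the context first: this makes the blocking recv()
  // throw ETERM so the run loop can exit, then interrupt and join.
  zmqContext.term()
  zmqSubThread.interrupt()
  zmqSubThread.join()
}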

Related

Get return value from thread, is this Kotlin code thread safe?

I would like to run some threads, wait till all of them are finished, and get the results.
A possible way to do that is shown in the code below. Is it thread safe, though?
import kotlin.concurrent.thread

sealed class Errorneous<R>
data class Success<R>(val result: R) : Errorneous<R>()
data class Fail<R>(val error: Exception) : Errorneous<R>()

fun <R> thread_with_result(fn: () -> R): (() -> Errorneous<R>) {
    var r: Errorneous<R>? = null
    val t = thread {
        r = try { Success(fn()) } catch (e: Exception) { Fail(e) }
    }
    return {
        t.join()
        r!!
    }
}

fun main() {
    val tasks = listOf({ 2 * 2 }, { 3 * 3 })
    val results = tasks
        .map { thread_with_result(it) }
        .map { it() }
    println(results)
}
P.S.
Are there better built-in tools in Kotlin to do that? Like processing 10000 tasks with a pool of 10 threads?
It should be threads, not coroutines, as it will be used with legacy code and I don't know if it works well with coroutines.
It seems Java has Executors that do exactly that:
import java.util.concurrent.Callable
import java.util.concurrent.Executors

fun <R> execute_in_parallel(tasks: List<() -> R>, threads: Int): List<Errorneous<R>> {
    val executor = Executors.newFixedThreadPool(threads)
    try {
        // invokeAll blocks until every task has completed
        val fresults = executor.invokeAll(tasks.map { task ->
            Callable<Errorneous<R>> {
                try { Success(task()) } catch (e: Exception) { Fail(e) }
            }
        })
        return fresults.map { future -> future.get() }
    } finally {
        executor.shutdown() // release the pool threads once done
    }
}

Spark Streaming: Exception thrown while writing record: BatchAllocationEvent

I shut down a Spark StreamingContext with the following code.
Essentially a thread monitors a boolean switch and then calls StreamingContext.stop(true, true).
Everything seems to process and all my data appears to have been collected. However, I get the following exception on shutdown.
Can I ignore it? It looks like there is potential for data loss.
18/03/07 11:46:40 WARN ReceivedBlockTracker: Exception thrown while
writing record: BatchAllocationEvent(1520452000000
ms,AllocatedBlocks(Map(0 -> ArrayBuffer()))) to the WriteAheadLog.
java.lang.IllegalStateException: close() was called on
BatchedWriteAheadLog before write request with time 1520452000001
could be fulfilled.
at org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:86)
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:234)
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.allocateBlocksToBatch(ReceivedBlockTracker.scala:118)
at org.apache.spark.streaming.scheduler.ReceiverTracker.allocateBlocksToBatch(ReceiverTracker.scala:213)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
The Thread
var stopScc = false

private def stopSccThread(): Unit = {
  val thread = new Thread {
    override def run {
      var continueRun = true
      while (continueRun) {
        logger.debug("Checking status")
        if (stopScc == true) {
          getSparkStreamingContext(fieldVariables).stop(true, true)
          logger.info("Called Stop on Streaming Context")
          continueRun = false
        }
        Thread.sleep(50)
      }
    }
  }
  thread.start
}
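One caveat worth noting (my observation, not part of the original question): stopScc is written by the streaming job's thread and read by this monitor thread, so without a memory barrier the monitor may never see the update. Declaring the flag @volatile guarantees visibility:

@volatile var stopScc = false // writes are now visible to the monitor thread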
The Stream
@throws(classOf[IKodaMLException])
def startStream(ip: String, port: Int): Unit = {
  try {
    val ssc = getSparkStreamingContext(fieldVariables)
    ssc.checkpoint("./ikoda/cp")

    val lines = ssc.socketTextStream(ip, port, StorageLevel.MEMORY_AND_DISK_SER)
    lines.print

    val lmap = lines.map {
      l =>
        if (l.contains("IKODA_END_STREAM")) {
          stopScc = true
        }
        l
    }

    lmap.foreachRDD {
      r =>
        if (r.count() > 0) {
          logger.info(s"RECEIVED: ${r.toString()} first: ${r.first().toString}")
          r.saveAsTextFile("./ikoda/test/test")
        }
        else {
          logger.info("Empty RDD. No data received")
        }
    }

    ssc.start()
    ssc.awaitTermination()
  }
  catch {
    case e: Exception =>
      logger.error(e.getMessage, e)
      throw new IKodaMLException(e.getMessage, e)
  }
}
I had the same issue and calling close() instead of stop() fixed it.
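As an aside (my addition, not part of the original answer), Spark Streaming can also stop the context gracefully on JVM shutdown by itself, which avoids a hand-rolled monitor thread entirely; set spark.streaming.stopGracefullyOnShutdown when building the context:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("ikoda-stream") // illustrative app name
  .set("spark.streaming.stopGracefullyOnShutdown", "true") // graceful stop via shutdown hook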

Number of threads in Akka keeps increasing. What could be wrong?

Why does the thread count keep increasing? (See the bottom right of the attached profiler screenshot, not reproduced here.)
The overall flow is like this:
Akka HTTP Server API
-> on http request, sendMessageTo DataProcessingActor
-> sendMessageTo StorageActor
-> sendMessageTo DataBaseActor
-> sendMessageTo IndexActor
This is the definition of Akka HTTP API ( in pseudo-code ):
Main {
path("input/") {
post {
dataProcessingActor forward message
}
}
}
Below are the actor definitions (in pseudo-code):
DataProcessingActor {
  case message =>
    message = parse message
    storageActor ! message
}

StorageActor {
  case message =>
    indexActor ! message
    databaseActor ! message
}

DataBaseActor {
  case message =>
    val c = get mongoCollection
    c.store(message)
}

IndexActor {
  case message =>
    elasticSearch.index(message)
}
After I run this setup and send multiple HTTP requests to the "input/" endpoint, I get errors:
for (i <- 0 until 1000000) {
  post("input/", someMessage + i)
}
Error:
[ERROR] [04/22/2016 13:20:54.016] [Main-akka.actor.default-dispatcher-15] [akka.tcp://Main#127.0.0.1:2558/system/IO-TCP/selectors/$a/0] Accept error: could not accept new connection
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at akka.io.TcpListener.acceptAllPending(TcpListener.scala:107)
at akka.io.TcpListener$$anonfun$bound$1.applyOrElse(TcpListener.scala:82)
at akka.actor.Actor$class.aroundReceive(Actor.scala:480)
at akka.io.TcpListener.aroundReceive(TcpListener.scala:32)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
EDIT 1
Here is the application.conf file being used:
akka {
  loglevel = "INFO"
  stdout-loglevel = "INFO"
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"

  actor {
    default-dispatcher {
      throughput = 10
    }
  }

  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }

  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2558
    }
  }
}
I figured out that ElasticSearch was the problem. I was using the Java API for ElasticSearch, and it was leaking sockets because of the way it was being used. Now resolved as described below.
Below is the ElasticSearch client service using the Java API:
trait ESClient { def getClient(): Client }

case class ElasticSearchService() extends ESClient {
  def getClient(): Client = {
    val client = new TransportClient().addTransportAddress(
      new InetSocketTransportAddress(Config.ES_HOST, Config.ES_PORT)
    )
    client
  }
}
This is the actor which was causing the leak:
class IndexerActor() extends Actor {
  val elasticSearchSvc = new ElasticSearchService()
  lazy val client = elasticSearchSvc.getClient()

  override def preStart = {
    // initialize index, mappings, etc.
  }

  def receive() = {
    case message =>
      // do indexing here, using this instance's own client
      indexMessage(client, message)
  }
}
NOTE: every time an actor instance was created, a new connection was being made.
Every invocation of new ElasticSearchService() was creating a new connection to ElasticSearch. I moved that into a separate object as shown below, and the actor now uses this object instead:
object ES {
  val elasticSearchSvc = new ElasticSearchService()
  lazy val client = elasticSearchSvc.getClient()
}

class IndexerActor() extends Actor {
  override def preStart = {
    // initialize index, mappings, etc.
  }

  def receive() = {
    case message =>
      // do indexing here, via the single shared client
      indexMessage(ES.client, message)
  }
}
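An alternative to the global object (my suggestion, not from the original post) is to build the client once and inject it through the actor's constructor via Props, which keeps the shared dependency explicit and the actor easy to test. Client, ElasticSearchService, and indexMessage are the names from the post above; the ActorSystem is assumed to be in scope as system:

import akka.actor.{Actor, Props}

class SharedClientIndexerActor(client: Client) extends Actor {
  def receive = {
    case message =>
      indexMessage(client, message) // every instance reuses the one shared client
  }
}

// build a single client up front and hand it to each indexer instance
val sharedClient = ElasticSearchService().getClient()
val indexer = system.actorOf(Props(new SharedClientIndexerActor(sharedClient)), "indexer")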

Nicifying an execution context's thread pool output for logging/debugging in Scala

Is there a nice way to rename a pool in/for an execution context, to produce nicer output in logs while debugging? So a thread shows up not as ForkJoinPool-2-worker-7 (the "2" tells nothing about the pool's purpose in the app) but as WorkForkJoinPool-2-worker-7, without creating a new WorkForkJoinPool class for it?
Example:
import java.util.concurrent.Executor
import scala.concurrent.{ExecutionContext, Future}
import akka.actor.ActorSystem
import akka.event.Logging

object LogSample extends App {
  val ex1 = ExecutionContext.global
  val ex2 = ExecutionContext.fromExecutor(null: Executor) // another global ex context

  val system = ActorSystem("system")
  val log = Logging(system.eventStream, "my.nice.string")

  Future {
    log.info("1")
  }(ex1)

  Future {
    log.info("2")
  }(ex2)

  Thread.sleep(1000)

  // output, like this:
  /*
  [INFO] [09/14/2015 21:53:34.897] [ForkJoinPool-2-worker-7] [my.nice.string] 2
  [INFO] [09/14/2015 21:53:34.897] [ForkJoinPool-1-worker-7] [my.nice.string] 1
  */
}
You need to implement a custom thread factory, something like this:
import java.lang.Thread.UncaughtExceptionHandler
import java.util.concurrent.{ForkJoinPool, ForkJoinWorkerThread}
import scala.concurrent.{ExecutionContext, Future}
import akka.actor.ActorSystem
import akka.event.Logging

class CustomThreadFactory(prefix: String) extends ForkJoinPool.ForkJoinWorkerThreadFactory {
  def newThread(fjp: ForkJoinPool): ForkJoinWorkerThread = {
    val thread = new ForkJoinWorkerThread(fjp) {}
    thread.setName(prefix + "-" + thread.getName) // prepend our prefix to the default name
    thread
  }
}

val threadFactory = new CustomThreadFactory("custom prefix here")
val uncaughtExceptionHandler = new UncaughtExceptionHandler {
  override def uncaughtException(t: Thread, e: Throwable) = e.printStackTrace()
}
val executor = new ForkJoinPool(10, threadFactory, uncaughtExceptionHandler, true)

val ex2 = ExecutionContext.fromExecutor(executor) // another global ex context
val system = ActorSystem("system")
val log = Logging(system.eventStream, "my.nice.string")

Future {
  log.info("2") // [INFO] [09/15/2015 18:22:43.728] [custom prefix here-ForkJoinPool-1-worker-29] [my.nice.string] 2
}(ex2)

Thread.sleep(1000)
OK, it seems this is not possible (particularly for the default global implementation) due to the current Scala ExecutionContext implementation.
What I could do is just copy that implementation and change:
class DefaultThreadFactory(daemonic: Boolean) ... {
  def wire[T <: Thread](thread: T): T = {
    thread.setName("My" + thread.getId) // ! add this one (make 'My' a variable)
    thread.setDaemon(daemonic)
    thread.setUncaughtExceptionHandler(uncaughtExceptionHandler)
    thread
  }...
because the threadFactory there
val threadFactory = new DefaultThreadFactory(daemonic = true)
is hardcoded ...
(It seems Vladimir Petrosyan was the first to show the nicer way :) )
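As an aside (my addition): for pools other than ForkJoinPool the same idea is simpler, since a plain java.util.concurrent.ThreadFactory can set the name directly. A minimal sketch:

import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.ExecutionContext

class NamedThreadFactory(prefix: String) extends ThreadFactory {
  private val counter = new AtomicInteger(0)
  def newThread(r: Runnable): Thread = {
    val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
    t.setDaemon(true)
    t
  }
}

// threads now show up as e.g. "work-pool-1" in logs and thread dumps
val workEc = ExecutionContext.fromExecutor(
  Executors.newFixedThreadPool(4, new NamedThreadFactory("work-pool")))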

How to make the program pause while an actor is running

For example
import scala.actors.Actor
import scala.actors.Actor._

object Main {

  class Pong extends Actor {
    def act() {
      var pongCount = 0
      while (true) {
        receive {
          case "Ping" =>
            if (pongCount % 1000 == 0)
              Console.println("Pong: ping " + pongCount)
            sender ! "Pong"
            pongCount = pongCount + 1
          case "Stop" =>
            Console.println("Pong: stop")
            exit()
        }
      }
    }
  }

  class Ping(count: Int, pong: Actor) extends Actor {
    def act() {
      var pingsLeft = count - 1
      pong ! "Ping"
      while (true) {
        receive {
          case "Pong" =>
            if (pingsLeft % 1000 == 0)
              Console.println("Ping: pong")
            if (pingsLeft > 0) {
              pong ! "Ping"
              pingsLeft -= 1
            } else {
              Console.println("Ping: stop")
              pong ! "Stop"
              exit()
            }
        }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val pong = new Pong
    val ping = new Ping(100000, pong)
    ping.start
    pong.start
    println("???")
  }
}
I'm trying to print "???" after the two actors call exit(), but right now it is printed before "Ping: stop" and "Pong: stop".
I have tried keeping a flag in the actor (false while the actor is running, true once it stops) and a loop in main such as while (actor.flag == false) {}, but that doesn't work; it is an endless loop.
So, please give me some advice.
If you need synchronous calls in Akka, use the ask pattern, e.g.:
Await.result(ping ? "ping", timeout.duration)
Also, you'd better use an actor system to create actors.
import akka.actor.{ActorRef, Props, Actor, ActorSystem}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object Test extends App {
  implicit val timeout = Timeout(3.seconds)

  val system = ActorSystem("ActorSystem")

  class Pong extends Actor {
    def receive: Receive = {
      case "Ping" =>
        println("ping")
        sender() ! "Pong" // reply so the ask future completes before we stop
        context.stop(self)
    }
  }

  lazy val pong = system.actorOf(Props(new Pong), "Pong")

  val x = pong.ask("Ping")
  val res = Await.result(x, timeout.duration)
  println("????")
  system.shutdown()
}
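If you want to stay with plain threads or the original scala.actors code rather than move to Akka, a java.util.concurrent.CountDownLatch gives the same "wait until done" behavior (a standalone sketch of my own, not from the answer above; in the original code, Pong would call countDown() in its "Stop" case just before exit()):

import java.util.concurrent.CountDownLatch

object LatchDemo {
  val done = new CountDownLatch(1)

  def main(args: Array[String]): Unit = {
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        println("Pong: stop") // the worker finishes its job...
        done.countDown()      // ...and signals completion
      }
    })
    worker.start()
    done.await()   // main blocks here until the worker counts down
    println("???") // guaranteed to print after "Pong: stop"
  }
}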
