Number of threads in Akka keeps increasing. What could be wrong?

Why does the thread count keep increasing? (Look at the bottom right in this image.)
The overall flow is like this:
Akka HTTP Server API
-> on http request, sendMessageTo DataProcessingActor
-> sendMessageTo StorageActor
-> sendMessageTo DataBaseActor
-> sendMessageTo IndexActor
This is the definition of the Akka HTTP API (in pseudo-code):
Main {
  path("input/") {
    post {
      dataProcessingActor forward message
    }
  }
}
Below are the actor definitions (in pseudo-code):
DataProcessingActor {
  case message =>
    message = parse message
    storageActor ! message
}
StorageActor {
  case message =>
    indexActor ! message
    databaseActor ! message
}
DataBaseActor {
  case message =>
    val c = get mongoCollection
    c.store(message)
}
IndexActor {
  case message =>
    elasticSearch.index(message)
}
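For reference, a minimal sketch of what one stage of this chain could look like in real (non-pseudo) Akka code; the actor names come from the pseudo-code above and the parsing step is only a placeholder:
import akka.actor.{Actor, ActorRef, Props}

class DataProcessingActor(storageActor: ActorRef) extends Actor {
  // placeholder for the real parsing step
  private def parse(raw: String): String = raw.trim

  def receive = {
    case raw: String => storageActor ! parse(raw)
  }
}

object DataProcessingActor {
  def props(storageActor: ActorRef) = Props(new DataProcessingActor(storageActor))
}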
After I run this setup and send multiple HTTP requests to the "input/" endpoint, I get errors:
for (i <- 0 until 1000000) {
  post("input/", someMessage + i)
}
Error:
[ERROR] [04/22/2016 13:20:54.016] [Main-akka.actor.default-dispatcher-15] [akka.tcp://Main#127.0.0.1:2558/system/IO-TCP/selectors/$a/0] Accept error: could not accept new connection
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at akka.io.TcpListener.acceptAllPending(TcpListener.scala:107)
at akka.io.TcpListener$$anonfun$bound$1.applyOrElse(TcpListener.scala:82)
at akka.actor.Actor$class.aroundReceive(Actor.scala:480)
at akka.io.TcpListener.aroundReceive(TcpListener.scala:32)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
EDIT 1
Here is the application.conf file being used:
akka {
  loglevel = "INFO"
  stdout-loglevel = "INFO"
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"

  actor {
    default-dispatcher {
      throughput = 10
    }
  }

  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }

  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2558
    }
  }
}

I figured out that ElasticSearch was the problem. I am using the Java API for ElasticSearch, and it was leaking sockets because of the way it was being used. Now resolved as described below.
Below is the ElasticSearch client service using the Java API:
trait ESClient { def getClient(): Client }

case class ElasticSearchService() extends ESClient {
  def getClient(): Client = {
    val client = new TransportClient().addTransportAddress(
      new InetSocketTransportAddress(Config.ES_HOST, Config.ES_PORT)
    )
    client
  }
}
This is the actor which was causing the leak:
class IndexerActor() extends Actor {
  val elasticSearchSvc = new ElasticSearchService()
  lazy val client = elasticSearchSvc.getClient()

  override def preStart = {
    // initialize index, mappings, etc.
  }

  def receive = {
    case message =>
      // do indexing here
      indexMessage(client, message)
  }
}
NOTE: every time an actor instance was created, a new connection was opened, because each invocation of new ElasticSearchService() creates a new connection to ElasticSearch. I moved the client into a separate object, shown below, and the actor now uses that object instead:
object ES {
  val elasticSearchSvc = new ElasticSearchService()
  lazy val client = elasticSearchSvc.getClient()
}

class IndexerActor() extends Actor {
  override def preStart = {
    // initialize index, mappings, etc.
  }

  def receive = {
    case message =>
      // do indexing here
      indexMessage(ES.client, message)
  }
}
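One small follow-up worth considering (not part of the original fix): because the client is now shared for the life of the JVM, nothing closes it on shutdown. A minimal sketch, assuming the same ES object shown above:
object ES {
  val elasticSearchSvc = new ElasticSearchService()
  lazy val client = elasticSearchSvc.getClient()

  // Close the shared transport client when the JVM exits so its
  // sockets are released cleanly. Error handling omitted for brevity.
  sys.addShutdownHook {
    client.close()
  }
}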

Related

The connection was inactive for more than the allowed 60000 milliseconds and is closed by container

I have an azure function that sends a message to the service bus queue. Since a recent deployment, I see an exception occurring frequently: The connection was inactive for more than the allowed 60000 milliseconds and is closed by container.
I looked into this GitHub issue: https://github.com/Azure/azure-service-bus-java/issues/280; it says this is a warning. Is there a way to increase this timeout? Or any suggestions on how to resolve this? Here is my code:
namespace Repositories.ServiceBusQueue
{
    public class MembershipServiceBusRepository : IMembershipServiceBusRepository
    {
        private readonly QueueClient _queueClient;

        public MembershipServiceBusRepository(string serviceBusNamespacePrefix, string queueName)
        {
            var msiTokenProvider = TokenProvider.CreateManagedIdentityTokenProvider();
            _queueClient = new QueueClient($"https://{serviceBusNamespacePrefix}.servicebus.windows.net", queueName, msiTokenProvider);
        }

        public async Task SendMembership(GroupMembership groupMembership, string sentFrom = "")
        {
            if (groupMembership.SyncJobPartitionKey == null) { throw new ArgumentNullException("SyncJobPartitionKey must be set."); }
            if (groupMembership.SyncJobRowKey == null) { throw new ArgumentNullException("SyncJobRowKey must be set."); }

            foreach (var message in groupMembership.Split().Select(x => new Message
            {
                Body = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(x)),
                SessionId = groupMembership.RunId.ToString(),
                ContentType = "application/json",
                Label = sentFrom
            }))
            {
                await _queueClient.SendAsync(message);
            }
        }
    }
}
This could be due to a deadlock in the thread pool; check whether you are calling an async method from a sync method (for example, blocking on a Task with .Result or .Wait()).

JeroMQ Subscriber in a Runnable

I'm trying to embed ZMQ subscriber in a Runnable.
I'm able to start the Runnable for the first time and everything seems okay.
The problem is when I interrupt the Thread and try to start a new Thread, the subscriber does not get any messages. For example:
I have a publisher runnable
class ZMQPublisherRunnable() extends Runnable {
  override def run() {
    val ZMQcontext = ZMQ.context(1)
    val publisher = ZMQcontext.socket(ZMQ.PUB)
    var count = 0

    publisher.connect(s"tcp://127.0.0.1:16666")

    while (!Thread.currentThread().isInterrupted) {
      try {
        println(s"PUBLISHER -> $count")
        publisher.send(s"PUBLISHER -> $count")
        count += 1
        Thread.sleep(1000)
      }
      catch {
        case e: Exception =>
          println(e.getMessage)
          publisher.disconnect(s"tcp://127.0.0.1:16666")
          ZMQcontext.close()
      }
    }
  }
}
I have a Subscriber Runnable:
class ZMQSubscriberRunnable1() extends Runnable {
  override def run() {
    println("STARTING SUBSCRIBER")
    val ZMQcontext = ZMQ.context(1)
    val subscriber = ZMQcontext.socket(ZMQ.SUB)

    subscriber.subscribe("".getBytes)
    subscriber.bind(s"tcp://127.0.0.1:16666")

    while (!Thread.currentThread().isInterrupted) {
      try {
        println("waiting")
        val mesg = new String(subscriber.recv(0))
        println(s"SUBSCRIBER -> $mesg")
      }
      catch {
        case e: Exception =>
          println(e.getMessage)
          subscriber.unbind("tcp://127.0.0.1:16666")
          subscriber.close()
          ZMQcontext.close()
      }
    }
  }
}
My main code looks like this:
object Application extends App {
  val zmqPUB = new ZMQPublisherRunnable
  val zmqThreadPUB = new Thread(zmqPUB, "MY_PUB")
  zmqThreadPUB.setDaemon(true)
  zmqThreadPUB.start()

  val zmqRunnable = new ZMQSubscriberRunnable1
  val zmqThread = new Thread(zmqRunnable, "MY_TEST")
  zmqThread.setDaemon(true)
  zmqThread.start()

  Thread.sleep(10000)
  zmqThread.interrupt()
  zmqThread.join()

  Thread.sleep(2000)

  val zmqRunnable_2 = new ZMQSubscriberRunnable1
  val zmqThread_2 = new Thread(zmqRunnable_2, "MY_TEST_2")
  zmqThread_2.setDaemon(true)
  zmqThread_2.start()

  Thread.sleep(10000)
  zmqThread_2.interrupt()
  zmqThread_2.join()
}
The first time I start the Subscriber, I'm able to receive all messages:
STARTING SUBSCRIBER
PUBLISHER -> 0
waiting
PUBLISHER -> 1
SUBSCRIBER -> PUBLISHER -> 1
waiting
PUBLISHER -> 2
SUBSCRIBER -> PUBLISHER -> 2
waiting
PUBLISHER -> 3
SUBSCRIBER -> PUBLISHER -> 3
waiting
...
Once I interrupt the thread and start a new one from the same Runnable class, I'm no longer able to read messages. It waits forever:
STARTING SUBSCRIBER
waiting
PUBLISHER -> 13
PUBLISHER -> 14
PUBLISHER -> 15
PUBLISHER -> 16
PUBLISHER -> 17
...
Any insights about what I'm doing wrong?
Thanks
JeroMQ is not Thread.interrupt safe.
To work around it you have to stop the ZMQContext before you call the Thread.interrupt
Instantiate the ZMQContext outside the Runnable
Pass the ZMQContext as an argument to the ZMQ Runnable (you can also use it as a global variable)
Call zmqContext.term()
Call zmqSubThread.interrupt()
Call zmqSubThread.join()
For more details take a look at: https://github.com/zeromq/jeromq/issues/116
My subscriber Runnable looks like:
class ZMQSubscriberRunnable(zmqContext: ZMQ.Context, port: Int, ip: String, topic: String) extends Runnable {
  override def run() {
    var contextTerminated = false
    val subscriber = zmqContext.socket(ZMQ.SUB)

    subscriber.subscribe(topic.getBytes)
    subscriber.bind(s"tcp://$ip:$port")

    while (!contextTerminated && !Thread.currentThread().isInterrupted) {
      try {
        println(new String(subscriber.recv(0)))
      }
      catch {
        case e: ZMQException if e.getErrorCode == ZMQ.Error.ETERM.getCode =>
          contextTerminated = true
          subscriber.close()
        case e: Exception =>
          zmqContext.term()
          subscriber.close()
      }
    }
  }
}
To interrupt the Thread:
zmqContext.term()
zmqSubThread.interrupt()
zmqSubThread.join()
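Putting it together, the calling side could look roughly like this (a sketch assuming the constructor shown above; the port, IP and topic values are placeholders):
import org.zeromq.ZMQ

object SubscriberMain extends App {
  // the context is created outside the Runnable and shared with it
  val zmqContext = ZMQ.context(1)

  val zmqSubThread = new Thread(
    new ZMQSubscriberRunnable(zmqContext, 16666, "127.0.0.1", ""), "MY_TEST")
  zmqSubThread.setDaemon(true)
  zmqSubThread.start()

  Thread.sleep(10000)

  // terminate the context first so recv() unblocks with ETERM,
  // then interrupt and join the thread
  zmqContext.term()
  zmqSubThread.interrupt()
  zmqSubThread.join()
}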

How to read InputStream only once using CustomReceiver

I have written a custom receiver to receive the stream that is being generated by one of our applications. The receiver starts the process, gets the stream, and then calls store. However, the receive method gets called multiple times; I have written what I believe is a proper loop break condition, but it does not work. How do I ensure it reads the stream only once and does not re-read already processed data?
Here is my custom receiver code:
class MyReceiver() extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {

  def onStart() {
    new Thread("Splunk Receiver") {
      override def run() { receive() }
    }.start()
  }

  def onStop() {
  }

  private def receive() {
    try {
      /* My code to run a process and get the stream */
      val reader = new ResultsReader(job.getResults()); // ResultsReader is the reader for the application
      var event: String = reader.getNextLine;
      while (!isStopped || event != null) {
        store(event);
        event = reader.getNextLine;
      }
      reader.close()
    } catch {
      case t: Throwable =>
        restart("Error receiving data", t)
    }
  }
}
Where did I go wrong?
Problems
1) The job and stream reading happen every 2 seconds, and the same data piles up. So, for 60 lines of data, I sometimes get 1800 or more in total.
Streaming Code:
val conf = new SparkConf
conf.setAppName("str1");
conf.setMaster("local[2]")
conf.set("spark.driver.allowMultipleContexts", "true");
val ssc = new StreamingContext(conf, Minutes(2));
val customReceiverStream = ssc.receiverStream(new MyReceiver)
println(" searching ");
//if(customReceiverStream.count() > 0 ){
customReceiverStream.foreachRDD(x => {println("=====>"+ x.count());x.count()});
//}
ssc.start();
ssc.awaitTermination()
Note: I am trying this in my local cluster, and with master as local[2].

spark-streaming and connection pool implementation

The spark-streaming website at https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations-on-dstreams mentions the following code:
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection) // return to the pool for future reuse
  }
}
I have tried to implement this using org.apache.commons.pool2 but running the application fails with the expected java.io.NotSerializableException:
15/05/26 08:06:21 ERROR OneForOneStrategy: org.apache.commons.pool2.impl.GenericObjectPool
java.io.NotSerializableException: org.apache.commons.pool2.impl.GenericObjectPool
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
...
I am wondering how realistic it is to implement a connection pool that is serializable. Has anyone succeeded in doing this?
Thank you.
To address this "local resource" problem, what's needed is a singleton object, i.e. an object that's guaranteed to be instantiated once and only once in the JVM. Luckily, Scala's object provides this functionality out of the box.
The second thing to consider is that this singleton will provide a service to all tasks running on the same JVM where it's hosted, so it MUST take care of concurrency and resource management.
Let's try to sketch(*) such a service:
class ManagedSocket(private val pool: ObjectPool[Socket], val socket: Socket) {
  def release() = pool.returnObject(socket)
}

// singleton object
object SocketPool {
  var hostPortPool: Map[(String, Int), ObjectPool[Socket]] = Map()

  sys.addShutdownHook {
    hostPortPool.values.foreach { pool =>
      // terminate each pool
    }
  }

  // factory method
  def apply(host: String, port: Int): ManagedSocket = {
    val pool = hostPortPool.getOrElse((host, port), {
      val p = ??? // create new pool for (host, port)
      hostPortPool += (host, port) -> p
      p
    })
    new ManagedSocket(pool, pool.borrowObject())
  }
}
Then usage becomes:
val host = ???
val port = ???

stream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    val mSocket = SocketPool(host, port)
    partition.foreach { elem =>
      val os = mSocket.socket.getOutputStream()
      // do stuff with os + elem
    }
    mSocket.release()
  }
}
I'm assuming that the GenericObjectPool used in the question takes care of concurrency. Otherwise, access to each pool instance needs to be guarded with some form of synchronization.
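If that assumption does not hold, the cheapest guard is to take a lock around the map lookup and pool creation in the factory method, e.g. (same sketch-level assumptions as the SocketPool object above):
// Same factory method as in the sketch above, but the map lookup/update
// and pool creation now happen while holding the SocketPool monitor.
def apply(host: String, port: Int): ManagedSocket = SocketPool.synchronized {
  val pool = hostPortPool.getOrElse((host, port), {
    val p = ??? // create new pool for (host, port)
    hostPortPool += (host, port) -> p
    p
  })
  new ManagedSocket(pool, pool.borrowObject())
}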
(*) code provided to illustrate the idea of how to design such an object; it needs additional effort to be converted into a working version.
Below answer is wrong!
I'm leaving the answer here for reference, but it is wrong for the following reason: socketPool is declared as a lazy val, so it will get instantiated on first access. Since the SocketPool case class is not Serializable, this means it will get instantiated within each partition, which makes the connection pool useless, because we want to keep connections across partitions and RDDs. It makes no difference whether this is implemented as a companion object or as a case class. Bottom line: the connection pool must be Serializable, and Apache Commons Pool is not.
import java.io.PrintStream
import java.net.Socket
import org.apache.commons.pool2.{PooledObject, BasePooledObjectFactory}
import org.apache.commons.pool2.impl.{DefaultPooledObject, GenericObjectPool}
import org.apache.spark.streaming.dstream.DStream

/**
 * Publish a Spark stream to a socket.
 */
class PooledSocketStreamPublisher[T](host: String, port: Int)
  extends Serializable {

  lazy val socketPool = SocketPool(host, port)

  /**
   * Publish the stream to a socket.
   */
  def publishStream(stream: DStream[T], callback: (T) => String) = {
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val socket = socketPool.getSocket
        val out = new PrintStream(socket.getOutputStream)
        partition.foreach { event =>
          val text: String = callback(event)
          out.println(text)
          out.flush()
        }
        out.close()
        socketPool.returnSocket(socket)
      }
    }
  }
}
class SocketFactory(host: String, port: Int) extends BasePooledObjectFactory[Socket] {

  def create(): Socket = {
    new Socket(host, port)
  }

  def wrap(socket: Socket): PooledObject[Socket] = {
    new DefaultPooledObject[Socket](socket)
  }
}

case class SocketPool(host: String, port: Int) {

  val socketPool = new GenericObjectPool[Socket](new SocketFactory(host, port))

  def getSocket: Socket = {
    socketPool.borrowObject
  }

  def returnSocket(socket: Socket) = {
    socketPool.returnObject(socket)
  }
}
which you can invoke as follows:
val socketStreamPublisher = new PooledSocketStreamPublisher[MyEvent](host = "10.10.30.101", port = 29009)
socketStreamPublisher.publishStream(myEventStream, (e: MyEvent) => Json.stringify(Json.toJson(e)))

Scala synchronized consumer producer

I want to implement something like the producer-consumer problem (with only one piece of information transmitted at a time), but I want the producer to wait for someone to take its message before returning.
Here is an example that doesn't block the producer but works otherwise.
class Channel[T]
{
  private var _msg: Option[T] = None

  def put(msg: T): Unit =
  {
    this.synchronized
    {
      waitFor(_msg == None)
      _msg = Some(msg)
      notifyAll
    }
  }

  def get(): T =
  {
    this.synchronized
    {
      waitFor(_msg != None)
      val ret = _msg.get
      _msg = None
      notifyAll
      return ret
    }
  }

  private def waitFor(b: => Boolean) =
    while (!b) wait
}
How can I change it so that the producer gets blocked (as the consumer is)?
I tried to add another waitFor at the end of put, but sometimes my producer doesn't get released.
For instance, if I have put ; get || get ; put, most of the time it works, but sometimes the first put does not terminate and the left thread never even runs the get method (I print something once the put call terminates, and in this case it never gets printed).
This is why you should use a standard class, in this case SynchronousQueue.
If you really want to work through your problematic code, start by giving us a failing test case or a stack trace from when the put is blocking.
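For comparison, a minimal sketch of the same blocking hand-off built directly on java.util.concurrent.SynchronousQueue (put blocks until a consumer takes the element, take blocks until a producer offers one; the class name is illustrative only):
import java.util.concurrent.SynchronousQueue

class SyncChannel[T] {
  private val q = new SynchronousQueue[T]()

  // blocks the producer until a consumer is ready to take the message
  def put(msg: T): Unit = q.put(msg)

  // blocks the consumer until a producer has offered a message
  def get(): T = q.take()
}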
You can do this by means of a BlockingQueue descendant whose producer put() method creates a semaphore/event object that is queued up with the passed message; the producer thread then waits on it.
The consumer get() method extracts a message from the queue and signals its semaphore, allowing its original producer to run on.
This allows a 'synchronous queue' with actual queueing functionality, should that be what you want?
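A rough sketch of that idea (a CountDownLatch stands in for the semaphore/event object; the names are illustrative only):
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}

class AckChannel[T] {
  // each queued element carries the message plus a latch the producer waits on
  private val queue = new LinkedBlockingQueue[(T, CountDownLatch)]()

  def put(msg: T): Unit = {
    val taken = new CountDownLatch(1)
    queue.put((msg, taken))
    taken.await() // block until a consumer has taken this message
  }

  def get(): T = {
    val (msg, taken) = queue.take()
    taken.countDown() // release the producer that queued this message
    msg
  }
}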
I came up with something that appears to be working.
class Channel[T]
{
  class Transfer[A]
  {
    protected var _msg: Option[A] = None

    def msg_=(a: A) = _msg = Some(a)

    def msg: A =
    {
      // Reading the message destroys it
      val ret = _msg.get
      _msg = None
      return ret
    }

    def isEmpty = _msg == None
    def notEmpty = !isEmpty
  }

  object Transfer {
    def apply[A](msg: A): Transfer[A] =
    {
      var t = new Transfer[A]()
      t.msg = msg
      return t
    }
  }

  // Hacky but Transfer has to be invariant
  object Idle extends Transfer[T]

  protected var offer: Transfer[T] = Idle
  protected var request: Transfer[T] = Idle

  def put(msg: T): Unit =
  {
    this.synchronized
    {
      // push an offer as soon as possible
      waitFor(offer == Idle)
      offer = Transfer(msg)
      // request the transfer
      requestTransfer
      // wait for the transfer to go (ie the msg to be absorbed)
      waitFor(offer isEmpty)
      // delete the completed offer
      offer = Idle
      notifyAll
    }
  }

  def get(): T =
  {
    this.synchronized
    {
      // push a request as soon as possible
      waitFor(request == Idle)
      request = new Transfer()
      // request the transfer
      requestTransfer
      // wait for the transfer to go (ie the msg to be delivered)
      waitFor(request notEmpty)
      val ret = request.msg
      // delete the completed request
      request = Idle
      notifyAll
      return ret
    }
  }

  protected def requestTransfer()
  {
    this.synchronized
    {
      if (offer != Idle && request != Idle)
      {
        request.msg = offer.msg
        notifyAll
      }
    }
  }

  protected def waitFor(b: => Boolean) =
    while (!b) wait
}
It has the advantage of respecting symmetry between producer and consumer but it is a bit longer than what I had before.
Thanks for your help.
Edit: It is better but still not safe…
