Scala Chat Application, separate threads for local IO and socket IO - multithreading

I'm writing a chat application in Scala and the problem is with the clients: each client reads from StdIn (which blocks) before sending the data to the echo server, so when multiple clients are connected they don't receive data from the server until the read from StdIn has completed. I think local IO (reading from StdIn) and socket IO (reading from and writing to the socket) should be on separate threads, but I can't think of a way to do this. Below is the Client singleton code:
import java.net._
import scala.io._
import java.io._
import java.security._

object Client {

  var msgAcc = ""

  def main(args: Array[String]): Unit = {
    val conn = new ClientConnection(InetAddress.getByName(args(0)), args(1).toInt)
    val server = conn.connect()
    println("Enter a username")
    val user = new User(StdIn.readLine())
    println("Welcome to the chat " + user.username)
    sys.addShutdownHook(this.shutdown(conn, server))
    while (true) {
      val txMsg = StdIn.readLine() // should be on a separate thread?
      if (txMsg != null) {
        conn.sendMsg(server, user, txMsg)
        val rxMsg = conn.getMsg(server)
        val parser = new JsonParser(rxMsg)
        val formattedMsg = parser.formatMsg(parser.toJson())
        println(formattedMsg)
        msgAcc = msgAcc + formattedMsg + "\n"
      }
    }
  }

  def shutdown(conn: ClientConnection, server: Socket): Unit = {
    conn.close(server)
    val fileWriter = new BufferedWriter(new FileWriter(new File("history.txt"), true))
    fileWriter.write(msgAcc)
    fileWriter.close()
    println("Leaving chat, thanks for using")
  }
}
Below is the ClientConnection class used in conjunction with the Client singleton:
import javax.net.ssl.SSLSocket
import javax.net.ssl.SSLSocketFactory
import javax.net.SocketFactory
import java.net.Socket
import java.net.InetAddress
import java.net.InetSocketAddress
import java.security._
import java.io._
import scala.io._
import java.util.GregorianCalendar
import java.util.Calendar
import java.util.Date
import com.sun.net.ssl.internal.ssl.Provider
import scala.util.parsing.json._

class ClientConnection(host: InetAddress, port: Int) {

  def connect(): Socket = {
    Security.addProvider(new Provider())
    val sslFactory = SSLSocketFactory.getDefault()
    val sslSocket = sslFactory.createSocket(host, port).asInstanceOf[SSLSocket]
    sslSocket
  }

  def getMsg(server: Socket): String =
    new BufferedSource(server.getInputStream()).getLines().next()

  def sendMsg(server: Socket, user: User, msg: String): Unit = {
    val out = new PrintStream(server.getOutputStream())
    out.println(this.toMinifiedJson(user.username, msg))
    out.flush()
  }

  private def toMinifiedJson(user: String, msg: String): String =
    s"""{"time":"${this.getTime()}","username":"$user","msg":"$msg"}"""

  private def getTime(): String = {
    val cal = Calendar.getInstance()
    cal.setTime(new Date())
    "(" + cal.get(Calendar.HOUR_OF_DAY) + ":" + cal.get(Calendar.MINUTE) + ":" + cal.get(Calendar.SECOND) + ")"
  }

  def close(server: Socket): Unit = server.close()
}

You can add concurrency by using Akka Actors (or the older Scala Actors). As of this writing the current Scala version is 2.11.8. See the Actors documentation here:
http://docs.scala-lang.org/overviews/core/actors.html
This chat example is old, but it demonstrates a technique for handling in the neighborhood of a million simultaneous clients using Actors:
http://doc.akka.io/docs/akka/1.3.1/scala/tutorial-chat-server.html
Finally, you could also look at Twitter's Finagle project, which is written in Scala and provides concurrent servers, though it takes a fair amount of work to learn.
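If full-blown Actors feel like overkill, here is a minimal sketch of the thread split the question asks about, using a plain background thread. It reuses the ClientConnection, User and JsonParser classes from the question (everything else is illustrative): a daemon thread loops on conn.getMsg and prints incoming messages, while the main thread keeps blocking on StdIn and only sends.

import java.net._
import scala.io._

object ThreadedClient {
  var msgAcc = ""

  def main(args: Array[String]): Unit = {
    val conn = new ClientConnection(InetAddress.getByName(args(0)), args(1).toInt)
    val server = conn.connect()
    println("Enter a username")
    val user = new User(StdIn.readLine())

    // Receiver thread: blocks on the socket, never on StdIn.
    val receiver = new Thread(new Runnable {
      def run(): Unit = {
        while (!server.isClosed) {
          val rxMsg = conn.getMsg(server)
          val parser = new JsonParser(rxMsg)
          val formattedMsg = parser.formatMsg(parser.toJson())
          println(formattedMsg)
          msgAcc = msgAcc + formattedMsg + "\n"
        }
      }
    })
    receiver.setDaemon(true) // don't keep the JVM alive at shutdown
    receiver.start()

    // Main thread: blocks on StdIn and only sends.
    while (true) {
      val txMsg = StdIn.readLine()
      if (txMsg != null) conn.sendMsg(server, user, txMsg)
    }
  }
}

Note that msgAcc is now written from the receiver thread, so a real version should synchronize access to it (or drop the history accumulation from the sketch).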

Related

Is it possible for the database to block parallel table accesses in Scala threads?

In my Scala application, I make several threads. In each thread, I write different data from the array to the same PostgreSQL table. I noticed that some threads did not write data to the PostgreSQL table. However, there are no errors in the application logs. Is it possible for the database to block parallel table accesses? What can be the cause of this behavior?
MainApp.scala:
val postgreSQL = new PostgreSQL(configurations)
val semaphore = new Semaphore(5)

for (item <- array) {
  semaphore.acquire()
  val thread = new Thread(new CustomThread(postgreSQL, semaphore, item))
  thread.start()
}
CustomThread.scala:
import java.util.concurrent.Semaphore
import java.util.UUID.randomUUID

import utils.PostgreSQL

class CustomThread(postgreSQL: PostgreSQL, semaphore: Semaphore, item: Item) extends Runnable {
  override def run(): Unit = {
    try {
      // Create the unique filename.
      val filename: String = randomUUID().toString
      // Write to the database the filename of the item.
      postgreSQL.changeItemFilename(filename, item.id)
      // Change the status type of the item.
      postgreSQL.changeItemStatusType(3, item.id)
    } catch {
      case e: Throwable =>
        e.printStackTrace()
    } finally {
      semaphore.release()
    }
  }
}
PostgreSQL.scala:
package utils

import java.sql.{Connection, DriverManager, PreparedStatement, ResultSet}
import java.util.Properties

class PostgreSQL(configurations: Map[String, String]) {
  val host: String = configurations("postgresql.host")
  val port: String = configurations("postgresql.port")
  val user: String = configurations("postgresql.user")
  val password: String = configurations("postgresql.password")
  val db: String = configurations("postgresql.db")
  val url: String = "jdbc:postgresql://" + host + ":" + port + "/" + db
  val driver: String = "org.postgresql.Driver"
  val properties = new Properties()
  val connection: Connection = getConnection
  var statement: PreparedStatement = _

  def getConnection: Connection = {
    properties.setProperty("user", user)
    properties.setProperty("password", password)
    var connection: Connection = null
    try {
      Class.forName(driver)
      connection = DriverManager.getConnection(url, properties)
    } catch {
      case e: Exception =>
        e.printStackTrace()
    }
    connection
  }

  def changeItemFilename(filename: String, id: Int): Unit = {
    try {
      statement = connection.prepareStatement(
        "UPDATE REPORTS SET FILE_NAME = ? WHERE ID = ?;",
        ResultSet.TYPE_SCROLL_INSENSITIVE,
        ResultSet.CONCUR_READ_ONLY)
      statement.setString(1, filename)
      statement.setInt(2, id)
      statement.execute()
    } catch {
      case e: Exception =>
        e.printStackTrace()
    }
  }
}
Just for your interest: by default, JDBC is synchronous, meaning it blocks your thread until the operation on a given connection has completed. So if you try to do multiple things on a single connection at the same time, the actions are executed sequentially instead.
More on that:
https://dzone.com/articles/myth-asynchronous-jdbc
That's the first and most probable reason. The second possible reason is that the database blocks modifying actions on rows that are being updated by another transaction; exactly how depends on the isolation level.
https://www.sqlservercentral.com/articles/isolation-levels-in-sql-server
Last but not least, it is not necessary to use bare threads in Scala. For concurrent/asynchronous programming there are libraries such as cats-effect, Monix, and ZIO, and there are database-access libraries built on top of them, such as Slick and Doobie.
They are usually a better choice than bare threads, for numerous reasons.
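To illustrate the first reason, here is a minimal sketch (reusing the Item type and the semaphore from the question; the worker class name is hypothetical) in which each worker opens its own JDBC connection instead of all threads sharing the single connection held by the PostgreSQL wrapper:

import java.sql.{Connection, DriverManager}
import java.util.UUID.randomUUID
import java.util.concurrent.Semaphore

class CustomThreadOwnConnection(url: String, user: String, password: String,
                                semaphore: Semaphore, item: Item) extends Runnable {
  override def run(): Unit = {
    var connection: Connection = null
    try {
      // One connection per thread: updates no longer queue up behind each
      // other on a single shared connection.
      connection = DriverManager.getConnection(url, user, password)
      val statement = connection.prepareStatement("UPDATE REPORTS SET FILE_NAME = ? WHERE ID = ?;")
      statement.setString(1, randomUUID().toString)
      statement.setInt(2, item.id)
      statement.executeUpdate()
      statement.close()
    } catch {
      case e: Exception => e.printStackTrace()
    } finally {
      if (connection != null) connection.close()
      semaphore.release()
    }
  }
}

In practice you would draw connections from a pool (e.g. HikariCP) or use one of the libraries above rather than opening a fresh connection per task.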

Calling a function periodically in Scala while another expensive function is computing

I have a function that takes a long time to compute
def longFunc = {Thread.sleep(30000); true}
While this function is computing, I need to keep pinging a server so it keeps waiting for the value of my function. For the sake of argument, let's say I need to run the following function every 5 seconds while longFunc is running:
def shortFunc = println("pinging server! I am alive!")
To do this I have the following snippet; it works, but I wonder if there is a better pattern for this scenario:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import java.util.{Timer, TimerTask}
import scala.concurrent.ExecutionContext.Implicits.global

def shortFunc = println("pinging server! I am alive!")
def longFunc = { Thread.sleep(30000); true }

val funcFuture = Future { longFunc }

val timer = new Timer()
val pinger = new TimerTask {
  def run(): Unit = shortFunc
}
timer.schedule(pinger, 0L, 5000L) // ping the server every 5 seconds to say you are still working

val done = Await.result(funcFuture, 1.minute)
pinger.cancel()
I'm not actually sure whether this is a more elegant pattern or just for fun:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

def waiter[T](futureToWait: Future[_], waitFunc: => T, timer: Duration) = Future {
  while (!futureToWait.isCompleted) {
    Try(Await.ready(futureToWait, timer))
    waitFunc
  }
}
def longFunc = { Thread.sleep(30000); true }
def shortFunc = println("pinging server! I am alive!")

val funcFuture = Future { longFunc }
waiter(funcFuture, shortFunc, 5.seconds)
val done = Await.result(funcFuture, 1.minute)
The same but shorter:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

def longFunc = { Thread.sleep(30000); true }
def shortFunc = println("pinging server! I am alive!")

val funcFuture = Future { longFunc }

def ff: Future[_] = Future {
  shortFunc
  Try(Await.ready(funcFuture, 5.seconds)).getOrElse(ff)
}
ff

val done = Await.result(funcFuture, 1.minute)
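Not from the answers above, just a variant of the same heartbeat idea: a ScheduledExecutorService is usually preferred over java.util.Timer, because an uncaught exception in one task doesn't kill the whole timer thread. A sketch:

import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def longFunc = { Thread.sleep(30000); true }
def shortFunc = println("pinging server! I am alive!")

val scheduler = Executors.newSingleThreadScheduledExecutor()
val funcFuture = Future { longFunc }

// Ping every 5 seconds while the long computation runs.
val pinger = scheduler.scheduleAtFixedRate(
  new Runnable { def run(): Unit = shortFunc }, 0L, 5L, TimeUnit.SECONDS)

val done = Await.result(funcFuture, 1.minute)
pinger.cancel(false)
scheduler.shutdown()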

Corda RPC Connection Pooling/Caching

Does Corda have a connection pooling feature? How should multiple RPC user connections be pooled?
I'd appreciate it if you could point me to any open-source implementation/guide for RPC connection pooling/caching.
Here's an example of how to pool the node RPC connections:
import net.corda.client.rpc.CordaRPCClient
import net.corda.client.rpc.CordaRPCConnection
import net.corda.core.utilities.NetworkHostAndPort
import net.corda.core.utilities.contextLogger
import net.corda.core.utilities.getOrThrow
import net.corda.node.services.Permissions
import net.corda.testing.driver.DriverParameters
import net.corda.testing.driver.NodeParameters
import net.corda.testing.driver.driver
import net.corda.testing.node.User
import org.junit.Test
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.LinkedBlockingQueue

data class UserParams(val username: String, val password: String)

class PooledRpcConnections(address: NetworkHostAndPort) : AutoCloseable {
    val userToPool = ConcurrentHashMap<UserParams, LinkedBlockingQueue<CordaRPCConnection>>()
    val client = CordaRPCClient(address)

    fun <A> withConnection(userParams: UserParams, block: (CordaRPCConnection) -> A): A {
        val queue = userToPool.getOrPut(userParams) { LinkedBlockingQueue() }
        val connection = queue.poll() ?: client.start(userParams.username, userParams.password)
        return try {
            block(connection)
        } finally {
            queue.add(connection)
        }
    }

    override fun close() {
        for (queue in userToPool.values) {
            do {
                val connection = queue.poll()
                connection?.close()
            } while (connection != null)
        }
    }
}

class Test {
    companion object {
        val log = contextLogger()
    }

    @Test
    fun poolWorks() {
        val users = ('a'..'f').map { User(it.toString(), it.toString(), setOf(Permissions.all())) }
        val userParams = users.map { UserParams(it.username, it.password) }
        driver(DriverParameters(startNodesInProcess = true, notarySpecs = emptyList())) {
            log.info("Starting node for users ${users.map { it.username }}")
            val node = startNode(NodeParameters(rpcUsers = users)).getOrThrow()
            log.info("Starting pool")
            PooledRpcConnections(node.rpcAddress).use { pool ->
                val N = 1000
                log.info("Making $N requests using pooled connections")
                (1..N).toList().parallelStream().forEach { i ->
                    val user = userParams[i % users.size]
                    pool.withConnection(user) { connection ->
                        log.info("USER[${user.username}] CONNECTION[${connection.hashCode()}] NODE_TIME[${connection.proxy.currentNodeTime()}]")
                    }
                }
                log.info("Done! Number of connections used per user: ${pool.userToPool.map { it.key.username to it.value.size }}")
            }
        }
    }
}

GitBlit add a hook

I have a GitBlit instance on a Windows server, and I want to set up a post-receive hook to start a GitLab CI pipeline on another server.
I have already set up a GitLab CI trigger that works well, but my hook doesn't. Here is the build-gitlab-ci.groovy file:
import com.gitblit.GitBlit
import com.gitblit.Keys
import com.gitblit.models.RepositoryModel
import com.gitblit.models.UserModel
import com.gitblit.utils.JGitUtils
import org.eclipse.jgit.lib.Repository
import org.eclipse.jgit.revwalk.RevCommit
import org.eclipse.jgit.transport.ReceiveCommand
import org.eclipse.jgit.transport.ReceiveCommand.Result
import org.slf4j.Logger

logger.info("Gitlab-CI hook triggered by ${user.username} for ${repository.name}")

// POST:
def sendPostRequest(urlString, paramString) {
    def url = new URL(urlString)
    def conn = url.openConnection()
    conn.setDoOutput(true)

    def writer = new OutputStreamWriter(conn.getOutputStream())
    writer.write(paramString)
    writer.flush()

    String line
    def reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))
    while ((line = reader.readLine()) != null) {
        println line
    }
    writer.close()
    reader.close()
}

sendPostRequest("https://xxxxx/api/v4/projects/1/trigger/pipeline", "token=xxxxxxxx&ref=master")
The project configuration is shown in a screenshot (not reproduced here).
Moreover, I don't know where logger.info writes its output, so I can't tell whether my script ran at all. Thanks for any help.
I found my problem: it was a self-signed SSL certificate issue. I added this code to ignore it:
import com.gitblit.GitBlit
import com.gitblit.Keys
import com.gitblit.models.RepositoryModel
import com.gitblit.models.UserModel
import com.gitblit.utils.JGitUtils
import org.eclipse.jgit.lib.Repository
import org.eclipse.jgit.revwalk.RevCommit
import org.eclipse.jgit.transport.ReceiveCommand
import org.eclipse.jgit.transport.ReceiveCommand.Result
import org.slf4j.Logger

logger.info("Gitlab-CI hook triggered by ${user.username} for ${repository.name}")

def nullTrustManager = [
    checkClientTrusted: { chain, authType -> },
    checkServerTrusted: { chain, authType -> },
    getAcceptedIssuers: { null }
]

def nullHostnameVerifier = [
    verify: { hostname, session -> hostname.startsWith('yuml.me') }
]

javax.net.ssl.SSLContext sc = javax.net.ssl.SSLContext.getInstance("SSL")
sc.init(null, [nullTrustManager as javax.net.ssl.X509TrustManager] as javax.net.ssl.X509TrustManager[], null)
javax.net.ssl.HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory())
javax.net.ssl.HttpsURLConnection.setDefaultHostnameVerifier(nullHostnameVerifier as javax.net.ssl.HostnameVerifier)

def url = new URL("https://xxxx/api/v4/projects/{idProject}/trigger/pipeline")
def conn = url.openConnection()
conn.setDoOutput(true)

def writer = new OutputStreamWriter(conn.getOutputStream())
writer.write("token={token}&ref={branch}")
writer.flush()

String line
def reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))
while ((line = reader.readLine()) != null) {
    println line
}
writer.close()
reader.close()
I identified the error by checking the logs in E:\gitblit-1.7.1\logs\gitblit-stdout.{date}.log.
NB: the stdout file's date can be quite old; GitBlit doesn't create a new file every day. Mine had a name from four months earlier.

spark-streaming and connection pool implementation

The spark-streaming website at https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations-on-dstreams mentions the following code:
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection) // return to the pool for future reuse
  }
}
I have tried to implement this using org.apache.commons.pool2 but running the application fails with the expected java.io.NotSerializableException:
15/05/26 08:06:21 ERROR OneForOneStrategy: org.apache.commons.pool2.impl.GenericObjectPool
java.io.NotSerializableException: org.apache.commons.pool2.impl.GenericObjectPool
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
...
I am wondering how realistic it is to implement a connection pool that is serializable. Has anyone succeeded in doing this?
Thank you.
To address this "local resource" problem what's needed is a singleton object, i.e. an object that's guaranteed to be instantiated once and only once per JVM. Luckily, a Scala object provides this functionality out of the box.
The second thing to consider is that this singleton will provide a service to all tasks running on the same JVM where it's hosted, so it MUST take care of concurrency and resource management.
Let's try to sketch(*) such a service:
class ManagedSocket(private val pool: ObjectPool, val socket: Socket) {
  def release() = pool.returnObject(socket)
}

// singleton object
object SocketPool {
  var hostPortPool: Map[(String, Int), ObjectPool] = Map()

  sys.addShutdownHook {
    hostPortPool.values.foreach { pool => ??? } // terminate each pool
  }

  // factory method
  def apply(host: String, port: Int): ManagedSocket = {
    val pool = hostPortPool.getOrElse((host, port), {
      val p = ??? // create new pool for (host, port)
      hostPortPool += (host, port) -> p
      p
    })
    new ManagedSocket(pool, pool.borrowObject)
  }
}
Then usage becomes:
val host = ???
val port = ???

stream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    val mSocket = SocketPool(host, port)
    partition.foreach { elem =>
      val os = mSocket.socket.getOutputStream()
      // do stuff with os + elem
    }
    mSocket.release()
  }
}
I'm assuming that the GenericObjectPool used in the question takes care of concurrency. Otherwise, access to each pool instance needs to be guarded with some form of synchronization.
(*) code provided to illustrate the idea on how to design such object - needs additional effort to be converted into a working version.
The answer below is wrong!
I'm leaving it here for reference, but it is wrong for the following reason: socketPool is declared as a lazy val, so it gets instantiated on first access. Since the SocketPool case class is not Serializable, this means it gets instantiated within each partition, which makes the connection pool useless, because we want to keep connections across partitions and RDDs. It makes no difference whether this is implemented as a companion object or as a case class. Bottom line: the connection pool must be Serializable, and Apache Commons Pool is not.
import java.io.PrintStream
import java.net.Socket

import org.apache.commons.pool2.{PooledObject, BasePooledObjectFactory}
import org.apache.commons.pool2.impl.{DefaultPooledObject, GenericObjectPool}
import org.apache.spark.streaming.dstream.DStream

/**
 * Publish a Spark stream to a socket.
 */
class PooledSocketStreamPublisher[T](host: String, port: Int)
  extends Serializable {

  lazy val socketPool = SocketPool(host, port)

  /**
   * Publish the stream to a socket.
   */
  def publishStream(stream: DStream[T], callback: (T) => String) = {
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val socket = socketPool.getSocket
        val out = new PrintStream(socket.getOutputStream)
        partition.foreach { event =>
          val text: String = callback(event)
          out.println(text)
          out.flush()
        }
        out.close()
        socketPool.returnSocket(socket)
      }
    }
  }
}

class SocketFactory(host: String, port: Int) extends BasePooledObjectFactory[Socket] {
  def create(): Socket = {
    new Socket(host, port)
  }
  def wrap(socket: Socket): PooledObject[Socket] = {
    new DefaultPooledObject[Socket](socket)
  }
}

case class SocketPool(host: String, port: Int) {
  val socketPool = new GenericObjectPool[Socket](new SocketFactory(host, port))
  def getSocket: Socket = {
    socketPool.borrowObject
  }
  def returnSocket(socket: Socket) = {
    socketPool.returnObject(socket)
  }
}
which you can invoke as follows:
val socketStreamPublisher = new PooledSocketStreamPublisher[MyEvent](host = "10.10.30.101", port = 29009)
socketStreamPublisher.publishStream(myEventStream, (e: MyEvent) => Json.stringify(Json.toJson(e)))
