I can't get my head around this one.
I am a beginner in Scala, just a few weeks in, and I have tried but failed.
I have read about and tried Actors, Futures, etc., but they didn't work for me.
Could you supply the code for a server/client example (or at least the server side)?
It is supposed to open a connection using a socket that receives a string (i.e. a file path) from several clients and processes each one in a thread.
import java.io.{BufferedReader, InputStreamReader}
import java.net.{ServerSocket, Socket}
import java.util.concurrent.{ExecutorService, Executors}

class NetworkService(port: Int, poolSize: Int) extends Runnable {
  val serverSocket = new ServerSocket(port)
  val pool: ExecutorService = Executors.newFixedThreadPool(poolSize)

  def run(): Unit = {
    try {
      while (true) {
        // This will block until a connection comes in.
        val socket = serverSocket.accept()
        // Reading here happens on the accepting thread, so a slow client blocks all others.
        val in = new BufferedReader(new InputStreamReader(socket.getInputStream)).readLine()
        /* var f = new FileSplit(in) // FileSplit is another class; I would like each
           // client's sent string to be passed to an instance of it
           f.move */
        pool.execute(new Handler(socket))
      }
    } finally {
      pool.shutdown()
    }
  }
}

class Handler(socket: Socket) extends Runnable {
  def message: Array[Byte] = (Thread.currentThread.getName + "\n").getBytes

  def run(): Unit = {
    socket.getOutputStream.write(message)
    socket.getOutputStream.close()
  }
}

object MyServer {
  def main(args: Array[String]): Unit = {
    new NetworkService(2030, 2).run()
  }
}
You have several options. You could write a plain Java-style app, basically using just the Java standard libraries with Scala syntax.
Maybe this helps: Scala equivalent of python echo server/client example?
You would just need to write logic that handles each socket (the one you get from accept()) in its own thread.
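A minimal sketch of that plain approach, written in Java but using only standard-library classes that Scala can call directly: the accept loop hands each socket to a fixed thread pool, and the blocking readLine() happens inside the worker rather than in the accept loop. The process method is a hypothetical stand-in for the FileSplit class, and port 0 (any free port) is used only so the demo can run anywhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class PathServer {
    // Hypothetical stand-in for the asker's FileSplit step: it only echoes the path.
    static String process(String path) {
        return "received " + path;
    }

    // Runs on a pool thread: read one line (the file path), reply, close the socket.
    static void handle(Socket socket) {
        try (Socket s = socket;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(process(in.readLine()));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Accept loop: only accept() runs here, so one slow client cannot block the others.
    static void serve(ServerSocket server, ExecutorService pool) {
        try {
            while (true) {
                Socket socket = server.accept();
                pool.execute(() -> handle(socket));
            }
        } catch (IOException e) {
            // accept() throws once the server socket is closed; treat that as shutdown.
        } finally {
            pool.shutdown();
        }
    }

    // Minimal client: send one line and return the one-line reply.
    static String ask(int port, String line) {
        try (Socket s = new Socket("127.0.0.1", port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(line);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts the server on a free port, lets several clients connect in parallel, then stops.
    static List<String> demo(String... paths) {
        try (ServerSocket server = new ServerSocket(0)) {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            new Thread(() -> serve(server, pool)).start();
            int port = server.getLocalPort();
            return Arrays.stream(paths).parallel()
                    .map(p -> ask(port, p))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo("/tmp/a.txt", "/tmp/b.txt").forEach(System.out::println);
    }
}
```

Applied to the Scala code in the question, the change is the same: move the readLine call into Handler.run so that serverSocket.accept() is the only blocking call on the accepting thread.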
However, I would not recommend using the plain old Java approach directly. There are great libraries that can handle this for you, for example Akka:
http://doc.akka.io/docs/akka/2.3.3/scala/io-tcp.html
I would also urge you to read about futures, as they are very useful for doing things asynchronously.
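As a rough illustration of the futures idea, here is a Java sketch with CompletableFuture (Java's counterpart to Scala's Future, used here only as an analogy): each path is processed as its own future on a pool, and the results are gathered at the end. The work method is hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class FuturesDemo {
    // Hypothetical per-request work; a real server would process the file here.
    static int work(String path) {
        return path.length();
    }

    static List<Integer> processAll(List<String> paths) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Every future is started (collected eagerly) before any of them is awaited,
            // so the work items run concurrently on the pool.
            List<CompletableFuture<Integer>> futures = paths.stream()
                    .map(p -> CompletableFuture.supplyAsync(() -> work(p), pool))
                    .collect(Collectors.toList());
            // join() waits for each result; results keep the input order.
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("/tmp/a.txt", "/tmp/bb.txt")));
    }
}
```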
I am building an API with Kotlin and Ktor that should be able to receive a normal POST request.
Upon receiving it, the server should keep the request open and carry out a series of asynchronous exchanges with other systems over WebSocket.
Only at the end of these exchanges, after receiving certain information, will it be able to respond to the POST request.
Needless to say, the request must be kept alive.
I'm not sure how to make this possible.
I have looked into coroutines and threads, but my inexperience prevents me from understanding which would be the best solution.
By default, code inside a coroutine runs sequentially: each suspending call finishes before the next line runs, without blocking a thread. So you can put your WebSocket communication code inside the route's handler and send the response at the end. Here is an example:
import io.ktor.client.*
import io.ktor.client.engine.okhttp.*
import io.ktor.client.plugins.websocket.*
import io.ktor.client.plugins.websocket.WebSockets
import io.ktor.server.application.*
import io.ktor.server.engine.*
import io.ktor.server.netty.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import io.ktor.server.websocket.*
import io.ktor.websocket.*

fun main() {
    val client = HttpClient(OkHttp) {
        install(WebSockets)
    }

    embeddedServer(Netty, port = 12345) {
        routing {
            post("/") {
                // The POST request stays open while this handler suspends on the WebSocket session
                client.webSocket("ws://127.0.0.1:5050/ws") {
                    outgoing.send(Frame.Text("Hello"))
                    val frame = incoming.receive()
                    println((frame as Frame.Text).readText())
                    println("Websockets is done")
                }
                // Only reached after the WebSocket session has finished
                call.respondText { "Done" }
            }
        }
    }.start(wait = false)

    embeddedServer(Netty, port = 5050) {
        install(io.ktor.server.websocket.WebSockets)
        routing {
            webSocket("/ws") {
                outgoing.send(Frame.Text("Hello from server"))
            }
        }
    }.start()
}
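The same pattern can be sketched outside Ktor with Java's CompletableFuture, assuming two hypothetical helpers (askSystemA, askSystemB) stand in for the WebSocket exchanges: the handler composes the exchanges and builds the response only when the last one completes. This is an analogy to the suspending handler above, not Ktor code; an HTTP layer that accepts an asynchronous result could complete the pending POST when the returned future finishes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DeferredResponse {
    // Hypothetical stand-ins for the WebSocket exchanges with other systems.
    static CompletableFuture<String> askSystemA(String request, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> "A(" + request + ")", io);
    }

    static CompletableFuture<String> askSystemB(String fromA, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> "B(" + fromA + ")", io);
    }

    // The POST handler returns a future; the response body exists only once it completes.
    static CompletableFuture<String> handlePost(String body, ExecutorService io) {
        return askSystemA(body, io)
                .thenCompose(a -> askSystemB(a, io)) // second exchange starts after the first answers
                .thenApply(b -> "Done: " + b);       // build the response from the final answer
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newFixedThreadPool(2);
        try {
            System.out.println(handlePost("order-1", io).join());
        } finally {
            io.shutdown();
        }
    }
}
```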
I'm new to Spark and am currently struggling with saving the result of a Spark stream to a file after a fixed period of time. The issue is: I want a query to run for 60 seconds, save all the input it reads during that time to a file, and also be able to define the file name for later processing.
Initially I thought the code below would be the way to go:
sc.socketTextStream("localhost", 12345)
.foreachRDD(rdd -> {
rdd.saveAsTextFile("./test");
});
However, after running it, I realized two problems. First, it saved a separate output for each batch it read instead of one file with all the values from the 60-second window: imagine random numbers being generated at a random pace on that port; if it read one number in a given second, that output contained one number, and if it read more, it contained all of them. Second, I wasn't able to name the file, since the argument to saveAsTextFile is the target directory.
So I would like to ask whether there is a Spark-native solution, so I don't have to solve it with "Java tricks" like this:
sc.socketTextStream("localhost", 12345)
.foreachRDD(rdd -> {
PrintWriter out = new PrintWriter("./logs/votes["+dtf.format(LocalDateTime.now().minusMinutes(2))+","+dtf.format(LocalDateTime.now())+"].txt");
List<String> l = rdd.collect();
for(String voto: l)
out.println(voto + " "+dtf.format(LocalDateTime.now()));
out.close();
});
I searched the Spark documentation for similar problems but was unable to find a solution :/
Thanks for your time :)
Below is a template for consuming socket stream data using the newer Structured Streaming API.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

object ReadSocket {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ReadSocket").getOrCreate()
    // Start reading from the socket
    val dfStream = spark.readStream
      .format("socket")
      .option("host", "127.0.0.1") // Replace with your socket host
      .option("port", "9090")
      .load()
    dfStream.writeStream
      .trigger(Trigger.ProcessingTime("1 minute")) // Fires one micro-batch per minute
      .outputMode(OutputMode.Append) // Each batch holds only the rows that arrived since the previous trigger
      .foreachBatch { (ds: DataFrame, id: Long) => // Explicit types avoid the Scala/Java overload ambiguity
        ds.collect().foreach { row => // Iterate over this batch's rows on the driver
          // Put your file generation logic around this
          println(row.getString(0)) // That's your record
        }
      }
      .start()
      .awaitTermination()
  }
}
For an explanation of the code, please read the inline comments.
Java version:
import java.util.concurrent.TimeoutException;
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQueryException;
import org.apache.spark.sql.streaming.Trigger;

public class ReadSocketJ {
    // start() declares TimeoutException in newer Spark versions, so main declares it too
    public static void main(String[] args) throws StreamingQueryException, TimeoutException {
        SparkSession spark = SparkSession.builder().appName("ReadSocketJ").getOrCreate();
        Dataset<Row> lines = spark
                .readStream()
                .format("socket")
                .option("host", "127.0.0.1") // Replace with your socket host
                .option("port", "9090")
                .load();
        lines.writeStream()
                .trigger(Trigger.ProcessingTime("5 seconds")) // One micro-batch every 5 seconds
                .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (v1, v2) -> {
                    // v1 is the batch, v2 its id; collect the single "value" column on the driver
                    v1.as(Encoders.STRING())
                            .collectAsList().forEach(System.out::println);
                }).start().awaitTermination();
    }
}
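A sketch of the "file generation logic" that could sit inside foreachBatch, assuming each micro-batch is small enough to collect to the driver: all records of one batch go into a single file named after its time window. The class name, the votes_ prefix (borrowed from the question) and the timestamp pattern are illustrative choices; colons are left out of the name so it is also valid on Windows.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

class BatchFileWriter {
    // Timestamp pattern without colons, so the file name is portable.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Writes every record of one micro-batch into a single file named after its time window.
    static Path writeBatch(Path dir, List<String> records, LocalDateTime start, LocalDateTime end)
            throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve("votes_" + FMT.format(start) + "_" + FMT.format(end) + ".txt");
        return Files.write(file, records, StandardCharsets.UTF_8); // one line per record
    }

    public static void main(String[] args) throws IOException {
        LocalDateTime end = LocalDateTime.now();
        Path file = writeBatch(Paths.get("logs"), List.of("1", "2", "3"), end.minusMinutes(1), end);
        System.out.println("wrote " + file);
    }
}
```

With a one-minute trigger, the Java foreachBatch lambda might call it as BatchFileWriter.writeBatch(Paths.get("./logs"), v1.as(Encoders.STRING()).collectAsList(), LocalDateTime.now().minusMinutes(1), LocalDateTime.now()). For large batches, writing with Spark itself (for example v1.coalesce(1).write().text(dir)) avoids the collect, but Spark then chooses the part-file name inside that directory.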
I'm setting up a Kotlin coroutine-based networking framework for the JVM. The Client and Server classes implement CoroutineScope, and the override for coroutineContext uses Dispatchers.IO, as I am fairly sure that's the correct dispatcher for this case. However, I want to handle read packets on the main thread, or at least provide that option. Without reading the documentation, I used Dispatchers.Main, which I now realize is for the Android UI thread. Is there a dispatcher I can use to get a coroutine running on the main thread? If not, how would I go about making one?
I have looked through the Kotlin documentation on how to create a dispatcher based on a single thread, but I couldn't find anything besides newSingleThreadContext, which creates a new thread. I also found out that it is possible to create a dispatcher from a Java Executor, but I'm still not sure how to limit it to an already existing thread.
class AbstractNetworkComponent : CoroutineScope {
    private val packetProcessor = PacketProcessor()
    private val job = Job()
    override val coroutineContext = job + Dispatchers.IO
}

class PacketProcessor : CoroutineScope {
    private val job = Job()
    override val coroutineContext = job + Dispatchers.Main // Android only!
    private val packetHandlers = mutableMapOf<Opcode, PacketHandlerFunc>()

    fun handlePacket(opcode: Opcode, packet: ReceivablePacket, networker: Writable) {
        launch(coroutineContext) {
            packetHandlers[opcode]?.invoke(packet, networker)
        }
    }
}
With Dispatchers.Main I get an IllegalStateException because the Android component is missing. Is there a way to create a dispatcher that blocks the main thread until completion (like runBlocking does)? Thanks!
runBlocking is exactly what you need. It creates a dispatcher and sets it in the coroutine context. You can access the dispatcher with
coroutineContext[ContinuationInterceptor] as CoroutineDispatcher
and then pass it to an object that implements CoroutineScope, or do whatever else you need with it. Here's some sample code:
import kotlinx.coroutines.*
import kotlinx.coroutines.Dispatchers.IO
import kotlin.coroutines.ContinuationInterceptor

fun main() {
    println("Top-level: current thread is ${Thread.currentThread().name}")
    runBlocking {
        val dispatcher = coroutineContext[ContinuationInterceptor]
                as CoroutineDispatcher
        ScopedObject(dispatcher).launchMe().join()
    }
}

class ScopedObject(dispatcher: CoroutineDispatcher) : CoroutineScope {
    override val coroutineContext = Job() + dispatcher

    fun launchMe() = launch {
        val result = withContext(IO) {
            "amazing"
        }
        println("Launched coroutine: " +
                "current thread is ${Thread.currentThread().name}, " +
                "result is $result")
    }
}
This will print
Top-level: current thread is main
Launched coroutine: current thread is main, result is amazing
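If you would rather build the main-thread loop yourself, the idea can be sketched as a queue-backed Executor whose tasks run on whichever thread calls runLoop(), for example the main thread; this is conceptually similar to the event loop runBlocking runs on its calling thread. The names MainThreadExecutor, runLoop and stop are hypothetical; in Kotlin, an Executor like this could then be wrapped with the asCoroutineDispatcher() extension from kotlinx.coroutines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;

// Queue-backed Executor: tasks run on the thread that calls runLoop().
class MainThreadExecutor implements Executor {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;

    @Override
    public void execute(Runnable task) {
        queue.add(task);
    }

    // Enqueues a stop marker, so tasks queued before it still run (FIFO order).
    public void stop() {
        execute(() -> stopped = true);
    }

    // Blocks the calling thread, running queued tasks until the stop marker is processed.
    public void runLoop() throws InterruptedException {
        while (!stopped) {
            queue.take().run();
        }
    }

    // Demo: a background "io" thread hands packets to the calling thread for handling.
    static List<String> demo() throws InterruptedException {
        MainThreadExecutor loop = new MainThreadExecutor();
        List<String> handledOn = new ArrayList<>(); // only touched by the loop thread
        Thread io = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                int packet = i;
                loop.execute(() -> handledOn.add(packet + "@" + Thread.currentThread().getName()));
            }
            loop.stop();
        }, "io-thread");
        io.start();
        loop.runLoop(); // the calling thread processes the packets here
        io.join();
        return handledOn;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```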
As per the Guide to UI programming with coroutines, kotlinx.coroutines has three modules that provide coroutine contexts for different UI application libraries:
kotlinx-coroutines-android -- Dispatchers.Main context for Android applications.
kotlinx-coroutines-javafx -- Dispatchers.JavaFx context for JavaFX UI applications.
kotlinx-coroutines-swing -- Dispatchers.Swing context for Swing UI applications.
Also, the UI dispatcher is available via Dispatchers.Main from kotlinx-coroutines-core, and the corresponding implementation (Android, JavaFx or Swing) is discovered through the ServiceLoader API. For example, if you are writing a JavaFx application, you can use either Dispatchers.Main or the Dispatchers.JavaFx extension; it will be the same object.
I basically want to write an event callback in my driver program that restarts the Spark Streaming application when that event arrives.
My driver program sets up the streams and the execution logic by reading configurations from a file.
Whenever the file changes (new configs are added), the driver program has to do the following steps in sequence:
1. Restart.
2. Read the config file (as part of the main method).
3. Set up the streams.
What is the best way to achieve this?
In some cases you may want to reload the streaming context dynamically (for example, to reload the streaming operations).
In that case you can do the following (Scala example):
import java.util.concurrent.atomic.AtomicBoolean
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkContext = new SparkContext()
// Set these flags from your event callback (e.g. when the config file changes)
val stopEvent = new AtomicBoolean(false)
val shouldReload = new AtomicBoolean(false)
var streamingContext = Option.empty[StreamingContext]

val processThread = new Thread {
  override def run(): Unit = {
    while (!stopEvent.get) {
      if (streamingContext.isEmpty) {
        // new context
        streamingContext = Option(new StreamingContext(sparkContext, Seconds(1)))
        // create DStreams
        val lines = streamingContext.get.socketTextStream(...)
        // your transformations and actions
        // and decision to reload streaming context
        // ...
        streamingContext.get.start()
      } else if (shouldReload.getAndSet(false)) {
        // keep the SparkContext alive, stop only the streaming part
        streamingContext.get.stop(stopSparkContext = false, stopGracefully = true)
        streamingContext.get.awaitTermination()
        streamingContext = None
      } else {
        Thread.sleep(1000)
      }
    }
    // stopEvent was set: shut everything down
    streamingContext.foreach { ssc =>
      ssc.stop(stopSparkContext = true, stopGracefully = true)
      ssc.awaitTermination()
    }
  }
}
// and start it in a separate thread
processThread.start()
processThread.join()
or in Python:
import threading
import time

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

spark_context = SparkContext()
stop_event = threading.Event()
spark_streaming_context = None
should_reload = False  # set to True from your event callback

def process():
    global spark_streaming_context, should_reload
    while not stop_event.is_set():
        if spark_streaming_context is None:
            # new context
            spark_streaming_context = StreamingContext(spark_context, 0.5)
            # create DStreams
            lines = spark_streaming_context.socketTextStream(...)
            # your transformations and actions
            # and decision to reload streaming context
            # ...
            spark_streaming_context.start()
        elif should_reload:  # TODO move to config
            # keep the SparkContext alive, stop only the streaming part
            spark_streaming_context.stop(stopSparkContext=False, stopGraceFully=True)
            spark_streaming_context.awaitTermination()
            spark_streaming_context = None
            should_reload = False
        else:
            time.sleep(1)
    # stop_event was set: shut everything down
    if spark_streaming_context is not None:
        spark_streaming_context.stop(stopGraceFully=True)
        spark_streaming_context.awaitTermination()

# and start it in a separate thread
process_thread = threading.Thread(target=process)
process_thread.start()
process_thread.join()
If you want to protect your code from crashes and restart the streaming context from where it left off, use the checkpointing mechanism.
It allows you to restore your job's state after a failure.
The best way to restart Spark depends on your environment, but using the spark-submit console is generally advisable.
You can background the spark-submit process like any other Linux process, by putting it into the background in the shell. In your case, the spark-submit job then actually runs the driver on YARN, so it is babysitting a process that is already running asynchronously on another machine via YARN.
Cloudera blog
One approach we explored recently (at a Spark meetup here) was to use ZooKeeper in tandem with Spark. In a nutshell, Apache Curator watches for changes on ZooKeeper (changes to the config in ZK, which can be triggered by your external event), and those changes cause a listener to restart the application.
The referenced code base is here; you will find that a change in config causes the Watcher (a Spark Streaming app) to reboot after a graceful shutdown and reload the changes. Hope this is a useful pointer!
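For a plain local config file rather than ZooKeeper, a similar watch-and-restart trigger can be sketched with java.nio.file.WatchService. This is a minimal sketch, assuming the driver's own stop, reload and set-up steps are wired in where the comment marks a hypothetical hook; WatchService watches a directory, so events are filtered by the config file's name.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

class ConfigWatcher {
    // Blocks until the config file is created or modified, or the timeout elapses.
    // Returns true if a change was seen.
    static boolean awaitChange(Path configFile, long timeoutSeconds)
            throws IOException, InterruptedException {
        Path dir = configFile.toAbsolutePath().getParent();
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY);
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) return false;
                WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
                if (key == null) return false; // timed out
                for (WatchEvent<?> event : key.pollEvents()) {
                    // Event contexts are paths relative to the watched directory.
                    if (configFile.getFileName().equals(event.context())) return true;
                }
                key.reset(); // re-arm the key for further events
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path config = Paths.get(args.length > 0 ? args[0] : "app.conf");
        while (true) {
            // Hypothetical hook: stop the running StreamingContext gracefully,
            // re-read the config file and set up the streams again.
            System.out.println("starting streams from " + config);
            while (!awaitChange(config, 3600)) {
                // No change within the hour; keep waiting.
            }
            System.out.println("config changed, restarting");
        }
    }
}
```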
I am currently solving this issue as follows:
1. Listen to external events by subscribing to an MQTT topic.
2. In the MQTT callback, stop the streaming context with ssc.stop(true, true), which gracefully shuts down the streams and the underlying SparkContext.
3. Start the Spark application again by creating a new SparkConf and setting up the streams by reading the config file.
// Contents of startSparkApplication() method
sparkConf = new SparkConf().setAppName("SparkAppName")
ssc = new StreamingContext(sparkConf, Seconds(1))
val myStream = MQTTUtils.createStream(ssc,...) //provide other options
myStream.print()
ssc.start()
The application is built as a Spring Boot application.
In Scala, stopping the StreamingContext may involve stopping the SparkContext. I have found that when a receiver hangs, it is best to restart both the SparkContext and the StreamingContext.
I am sure the code below can be written much more elegantly, but it allows the SparkContext and StreamingContext to be restarted programmatically. Once this is done, you can restart your receivers programmatically as well.
package coname.utilobjects
import com.typesafe.config.ConfigFactory
import grizzled.slf4j.Logging
import coname.conameMLException
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable
object SparkConfProviderWithStreaming extends Logging
{
val sparkVariables: mutable.HashMap[String, Any] = new mutable.HashMap
}
trait SparkConfProviderWithStreaming extends Logging{
private val keySSC = "SSC"
private val keyConf = "conf"
private val keySparkSession = "spark"
lazy val packagesversion=ConfigFactory.load("streaming").getString("streaming.cassandraconfig.packagesversion")
lazy val sparkcassandraconnectionhost=ConfigFactory.load("streaming").getString("streaming.cassandraconfig.sparkcassandraconnectionhost")
lazy val sparkdrivermaxResultSize=ConfigFactory.load("streaming").getString("streaming.cassandraconfig.sparkdrivermaxResultSize")
lazy val sparknetworktimeout=ConfigFactory.load("streaming").getString("streaming.cassandraconfig.sparknetworktimeout")
@throws(classOf[conameMLException])
def intitializeSpark(): Unit =
{
getSparkConf()
getSparkStreamingContext()
getSparkSession()
}
@throws(classOf[conameMLException])
def getSparkConf(): SparkConf = {
try {
if (!SparkConfProviderWithStreaming.sparkVariables.get(keyConf).isDefined) {
logger.info("\n\nLoading new conf\n\n")
val conf = new SparkConf().setMaster("local[4]").setAppName("MLPCURLModelGenerationDataStream")
conf.set("spark.streaming.stopGracefullyOnShutdown", "true")
conf.set("spark.cassandra.connection.host", sparkcassandraconnectionhost)
conf.set("spark.driver.maxResultSize", sparkdrivermaxResultSize)
conf.set("spark.network.timeout", sparknetworktimeout)
SparkConfProviderWithStreaming.sparkVariables.put(keyConf, conf)
logger.info("Loaded new conf")
getSparkConf()
}
else {
logger.info("Returning initialized conf")
SparkConfProviderWithStreaming.sparkVariables.get(keyConf).get.asInstanceOf[SparkConf]
}
}
catch {
case e: Exception =>
logger.error(e.getMessage, e)
throw new conameMLException(e.getMessage)
}
}
@throws(classOf[conameMLException])
def killSparkStreamingContext: Unit =
{
try
{
SparkConfProviderWithStreaming.sparkVariables.get(keySSC).foreach { ssc =>
// Stop the streaming context and its SparkContext so fresh ones can be created later
ssc.asInstanceOf[StreamingContext].stop(stopSparkContext = true, stopGracefully = true)
SparkConfProviderWithStreaming.sparkVariables -= keySSC
SparkConfProviderWithStreaming.sparkVariables -= keyConf
}
SparkSession.clearActiveSession()
SparkSession.clearDefaultSession()
}
catch {
case e: Exception =>
logger.error(e.getMessage, e)
throw new conameMLException(e.getMessage)
}
}
@throws(classOf[conameMLException])
def getSparkStreamingContext(): StreamingContext = {
try {
if (!SparkConfProviderWithStreaming.sparkVariables.get(keySSC).isDefined) {
logger.info("\n\nLoading new streaming\n\n")
SparkConfProviderWithStreaming.sparkVariables.put(keySSC, new StreamingContext(getSparkConf(), Seconds(6)))
logger.info("Loaded streaming")
getSparkStreamingContext()
}
else {
SparkConfProviderWithStreaming.sparkVariables.get(keySSC).get.asInstanceOf[StreamingContext]
}
}
catch {
case e: Exception =>
logger.error(e.getMessage, e)
throw new conameMLException(e.getMessage)
}
}
def getSparkSession():SparkSession=
{
if(!SparkSession.getActiveSession.isDefined)
{
SparkSession.builder.config(getSparkConf()).getOrCreate()
}
else
{
SparkSession.getActiveSession.get
}
}
}
I am trying to write to multiple files concurrently using the Akka framework. First I created a class called MyWriter that writes to a file; then, using futures, I call the object twice, hoping that two files will be created, but when I monitor the execution of the program, it first populates the first file and then the second one (blocking/synchronously).
Q: How can I make the code below run non-blocking/asynchronously?
import akka.actor._
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

class my_controler {
}

object Main extends App {
  val system = ActorSystem("HelloSystem")
  val myobj = system.actorOf(Props(new MyWriter), name = "myobj")
  implicit val timeout: Timeout = Timeout(50.seconds)
  val future2 = Future { myobj ! save("lots of content") }
  val future1 = Future { myobj ! save("even more lots of content") }
}
The MyWriter code:
case class save(startval: String)

class MyWriter extends Actor {
  def receive = {
    case save(startval) => save_to_file(startval) // save_to_file is defined elsewhere
  }
}
Any ideas why the code does not execute concurrently?
Why are you wrapping the message send in an additional Future? Sending with ! (tell) already returns immediately, and ask (?) returns a Future anyway, so wrapping it in another Future is most likely not what you wanted.
The second issue I see is that you are sending two messages to the same actor instance and expecting them to be processed in parallel. An actor instance processes its mailbox serially, one message at a time. If you want the messages processed concurrently, you will need two instances of your MyWriter actor: start up another instance of MyWriter and send it the second message.
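The mailbox behaviour can be illustrated without Akka: in this Java sketch each hypothetical MailboxWriter owns a single-thread executor, so, like one actor instance, it handles its messages one after another. Sending both saves to one writer runs them in sequence on the same thread; using two writers lets them run on separate threads. This is an analogy, not Akka's API.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Each writer owns one thread, so its "mailbox" is processed serially, in order.
class MailboxWriter {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Queues a save; the returned future yields the name of the thread that wrote the file.
    Future<String> save(Path file, String content) {
        return mailbox.submit(() -> {
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return Thread.currentThread().getName();
        });
    }

    void shutdown() {
        mailbox.shutdown();
    }

    // One writer: both saves queue behind each other on the same thread.
    // Two writers: each save runs on its own thread, so they can overlap.
    static String[] demo(Path dir, boolean twoWriters) throws Exception {
        MailboxWriter first = new MailboxWriter();
        MailboxWriter second = twoWriters ? new MailboxWriter() : first;
        try {
            Future<String> a = first.save(dir.resolve("file1.txt"), "lots of content");
            Future<String> b = second.save(dir.resolve("file2.txt"), "even more lots of content");
            return new String[] { a.get(), b.get() };
        } finally {
            first.shutdown();
            second.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("writers");
        System.out.println(String.join(", ", demo(dir, false)));
        System.out.println(String.join(", ", demo(dir, true)));
    }
}
```

In Akka terms, the equivalent is creating a second actor, e.g. system.actorOf(Props(new MyWriter), name = "myobj2"), and sending it the second save message; the extra Future around ! can be dropped.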