I have a use case where I use Camunda (7.12) for a stateless DMN evaluation. So it's just a big DMN table used for mapping values. The application is written in Spring Boot in Kotlin and is exposed as a REST service.
The code looks like this:
companion object {
private const val DMN_NAME = "mappingRules.dmn"
private const val DECISION_NAME = "mappingRules"
}
val dmnEngine: DmnEngine
val dmnTableStream: InputStream
val dmnDecision: DmnDecision
init {
dmnEngine = DmnEngineConfiguration.createDefaultDmnEngineConfiguration().buildEngine()
dmnTableStream = MyMappingService::class.java.classLoader.getResourceAsStream(DMN_NAME)
dmnDecision = dmnEngine.parseDecision(DECISION_NAME, dmnTableStream)
}
fun mapValue(source: String) {
val decisionResult: DmnDecisionTableResult = dmnEngine
.evaluateDecisionTable(dmnDecision, Variables.createVariables()
.putValue("source", source)
)
return decisionResult.firstResult.getValue("target") as String
}
mapValue and then dmnEngine.evaluateDecisionTable may be executed by multiple threads concurrently, is this Camunda method thread-safe? I could not find the information in the official documentation about a usage of stateless DMN evaluation and thread safety.
According to an answer in the Camunda forums, both DmnEngine and DmnDecision are supposed to be thread-safe: https://forum.camunda.org/t/re-use-caching-of-dmn-objects-in-multi-threaded-application/1576/2
Related
I have a very simple Flink application in Scala. I have 2 simple streams. I am broadcasting one of my stream to the other stream. Broadcasted stream is containing rules and just checking whether the other is stream's tuples are inside of rules or not. Everything is working fine and my code is like below.
This is an infinite running application. I wonder if there is any possibility for JVM to collect my rules object as garbage or not.
Does anyone has any idea? Many thanks in advance.
object StreamBroadcasting extends App {
val env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI()
val stream = env
.socketTextStream("localhost", 9998)
.flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
.keyBy(l => l)
val ruleStream = env
.socketTextStream("localhost", 9999)
.flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
val broadcastStream: DataStream[String] = ruleStream.broadcast
stream.connect(broadcastStream)
.flatMap(new SimpleConnect)
.print
class SimpleConnect extends RichCoFlatMapFunction[String, String, (String, Boolean)] {
private var rules: Set[String] = Set.empty[String] // Can JVM collect this object after a long time?
override def open(parameters: Configuration): Unit = {}
override def flatMap1(value: String, out: Collector[(String, Boolean)]): Unit = {
out.collect(value, rules.contains(value))
}
override def flatMap2(value: String, out: Collector[(String, Boolean)]): Unit = {
rules = rules.+(value)
}
}
env.execute("flink-broadcast-streams")
}
No, the Set of rules will not be garbage collected. It will stick around forever. (Of course, since you're not using Flink's broadcast state, the rules won't survive an application restart.)
I intend to use apache flink for read/write data into cassandra using flink. I was hoping to use flink-connector-cassandra, I don't find good documentation/examples for the connector.
Can you please point me to the right way for read and write data from cassandra using Apache Flink. I see only sink example which are purely for write ? Is apache flink meant for reading data too from cassandra similar to apache spark ?
I had the same question, and this is what I was looking for. I don't know if it is over simplified for what you need, but figured I should show it none the less.
ClusterBuilder cb = new ClusterBuilder() {
#Override
public Cluster buildCluster(Cluster.Builder builder) {
return builder.addContactPoint("urlToUse.com").withPort(9042).build();
}
};
CassandraInputFormat<Tuple2<String, String>> cassandraInputFormat = new CassandraInputFormat<>("SELECT * FROM example.cassandraconnectorexample", cb);
cassandraInputFormat.configure(null);
cassandraInputFormat.open(null);
Tuple2<String, String> testOutputTuple = new Tuple2<>();
cassandraInputFormat.nextRecord(testOutputTuple);
System.out.println("column1: " + testOutputTuple.f0);
System.out.println("column2: " + testOutputTuple.f1);
The way I figured this out was thanks to finding the code for the "CassandraInputFormat" class and seeing how it worked (http://www.javatips.net/api/flink-master/flink-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/batch/connectors/cassandra/CassandraInputFormat.java). I honestly expected it to just be a format and not the full class of reading from Cassandra based on the name, and I have a feeling others might be thinking the same thing.
ClusterBuilder cb = new ClusterBuilder() {
#Override
public Cluster buildCluster(Cluster.Builder builder) {
return builder.addContactPoint("localhost").withPort(9042).build();
}
};
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
InputFormat inputFormat = new CassandraInputFormat<Tuple3<Integer, Integer, Integer>>("SELECT * FROM test.example;", cb);//, TypeInformation.of(Tuple3.class));
DataStreamSource t = env.createInput(inputFormat, TupleTypeInfo.of(new TypeHint<Tuple3<Integer, Integer,Integer>>() {}));
tableEnv.registerDataStream("t1",t);
Table t2 = tableEnv.sql("select * from t1");
t2.printSchema();
You can use RichFlatMapFunction to extend a class
class MongoMapper extends RichFlatMapFunction[JsonNode,JsonNode]{
var userCollection: MongoCollection[Document] = _
override def open(parameters: Configuration): Unit = {
// do something here like opening connection
val client: MongoClient = MongoClient("mongodb://localhost:10000")
userCollection = client.getDatabase("gp_stage").getCollection("users").withReadPreference(ReadPreference.secondaryPreferred())
super.open(parameters)
}
override def flatMap(event: JsonNode, out: Collector[JsonNode]): Unit = {
// Do something here per record and this function can make use of objects initialized via open
userCollection.find(Filters.eq("_id", somevalue)).limit(1).first().subscribe(
(result: Document) => {
// println(result)
},
(t: Throwable) =>{
println(t)
},
()=>{
out.collect(event)
}
)
}
}
}
Basically open function executes once per worker and flatmap executes it per record. The example is for mongo but can be similarly used for cassandra
In your case as I understand the first step of your pipeline is reading data from Cassandra rather than writing a RichFlatMapFunction you should write your own RichSourceFunction
As a reference you can have a look at simple implementation of WikipediaEditsSource.
I have a simple method that writes a line of data to a File followed by a new line that is executed asynchronously.
def writeToFile(bw: BufferedWriter, str: String) = {
bw.write(str)
bw.newLine
}
When my program runs I'm getting "mixed up" rows in the file due to the async nature of the calls. For instance...say writeToFile(bw, "foo") is executed 3 times asynchronously I may get:
correct output
foo
foo
foo
possible incorrect output
foofoo
foo
I'm able to avoid this possibility by using synchronized method like this:
def writeToFile(bw: BufferedWriter, str: String) = synchronized {
bw.write(str)
bw.newLine
}
From what I researched I can't determine how "safe" this is in regards to scaling my application. The only examples I can find using synchronized is when accessing collections, not writing to a file. My application is built in the Play! Framework 2.4.2.
I personally would create an akka actor for each BufferedWriter what will encapsulate it completely.
import java.io.BufferedWriter
import akka.actor._
import playground.BufferedWriterActor.WriteToBuffer
object BufferedWriterActor {
val name = "BufferedWriterActor"
def props(bw: BufferedWriter) = Props(classOf[BufferedWriterActor], bw)
case class WriteToBuffer(str: String)
}
class BufferedWriterActor(bw: BufferedWriter) extends Actor {
def receive: Actor.Receive = {
case WriteToBuffer(str) =>
bw.write(str)
bw.newLine()
}
}
Use it like this:
import akka.actor.{ActorSystem, Props}
object HelloWorld {
def main(args: Array[String]): Unit = {
val system = ActorSystem("mySystem")
// Share this actor across all your threads.
val myActor = system.actorOf(BufferedWriterActor.props(bw), BufferedWriterActor.name)
// Send messages to this actor from all you threads.
myActor ! BufferedWriterActor.WriteToBuffer("The Text")
}
}
This will chain all calls to this buffer in a single thread.
More info on akka and its actors is here:
http://akka.io/
http://doc.akka.io/docs/akka/snapshot/scala/actors.html
Also play framework itself uses akka so you should be able to use its default ActorSystem, but I do not remember how exactly, sorry.
I have an Actor and some other object:
object Config {
val readValueFromConfig() = { //....}
}
class MyActor extends Actor {
val confValue = Config.readValueFromConfig()
val initValue = Future {
val a = confValue // sometimes it's null
val a = Config.readValueFromConfig() //always works well
}
//..........
}
The code above is a very simplified version of what I actually have. The odd thing is that sometimes val a = confValue returns null, whereas if I replace it with val a = Config.readValueFromConfig() then it always works well.
I wonder, is this due to the fact that the only way to interact with an actor is sending it a message? Therefore, since val confValue is not a local variable, I must either use val a = Config.readValueFromConfig() (a different object, not an actor) or val a = self ! GetConfigValue and read the result afterwards?
val readValueFromConfig() = { //....}
This gives me a compile error. I assume you mean without parentheses?
val readValueFromConfig = { //....}
Same logic with different timing gives different result = a race condition.
val confValue = Config.readValueFromConfig() is always executed during construction of MyActor objects (because it's a field of MyActor). Sometimes this is returning null.
val a = Config.readValueFromConfig() //always works well is always executed later - after MyActor is constructed, when the Future initValue is executed by it's Executor. It seems this never returns null.
Possible causes:
Could be explained away if the body of readValueFromConfig was dependent upon another
parallel/async operation having completed. Any chance you're reading the config asynchronously? Given the name of this method, it probably just reads synchronously from a file - meaning this is not the cause.
Singleton objects are not threadsafe?? I compiled your code. Here's the decompilation of your singleton object java class:
public final class Config
{
public static String readValueFromConfig()
{
return Config..MODULE$.readValueFromConfig();
}
}
public final class Config$
{
public static final MODULE$;
private final String readValueFromConfig;
static
{
new ();
}
public String readValueFromConfig()
{
return this.readValueFromConfig;
}
private Config$()
{
MODULE$ = this;
this.readValueFromConfig = // ... your logic here;
}
}
Mmmkay... Unless I'm mistaken, that ain't thread-safe.
IF two threads are accessing readValueFromConfig (say Thread1 accesses it first), then inside method private Config$(), MODULE$ is unsafely published before this.readValueFromConfig is set (reference to this prematurely escapes the constructor). Thread2 which is right behind can read MODULE$.readValueFromConfig before it is set. Highly likely to be a problem if '... your logic here' is slow and blocks the thread - which is precisely what synchronous I/O does.
Moral of story: avoid stateful singleton objects from Actors (or any Threads at all, including Executors) OR make them become thread-safe through very careful coding style. Work-Around: change to a def, which internally caches the value in a private val.
I wonder, is this due to the fact that the only way to interact with an actor is sending it a message? Therefore, since val confValue is not a local variable, I must either use val a = Config.readValueFromConfig() (a different object, not an actor)
Just because it's not an actor, doesn't mean it's necessarily safe. It probably isn't.
or val a = self ! GetConfigValue and read the result afterwards?
That's almost right. You mean self ? GetConfigValue, I think - that will return a Future, which you can then map over. ! doesn't return anything.
You cannot read from an actor's variables directly inside a Future because (in general) that Future could be running on any thread, on any processor core, and you don't have any memory barrier there to force the CPU caches to reload the value from main memory.
I frequently want to parallelize a task that relies on a non-threadsafe shared resource. Consider the following non-threadsafe class. I want to do a map over a data: Vector[String].
class Processor { def apply: String => String }
Basically, I want to create n threads, each with an instance of Processor and a partition of the data. Scala parallel collections have spoiled me into thinking the parallelization should be dirt simple. However, they don't seem well suited for this problem. Yes, I can use actors but Scala actors might become deprecated and Akka seems like overkill.
The first thing that comes to mind is to have a synchronized map Thread -> Processor and then use parallel collections, looking up my Processor in this thread-safe map. Is there a better way?
Instead of building your own synchronized map, you can use ThreadLocal. That will guarantee a unique Processor per thread.
val processors = new ThreadLocal[Processor] {
def initialValue() = new Processor
}
data.par.map(x => processors.get.apply(x))
Alternatively you try using an executor service configured to use specified number of threads explicitly:
val processors = new ThreadLocal[Processor] {
override def initialValue() = new Processor
}
val N = 4
// create an executor with fixed number of threads
val execSvc = Executors.newFixedThreadPool(N)
// create the tasks
data foreach {
loopData =>
execSvc.submit(new Runnable() {
def run = processors.get().apply(loopData)
})
}
// await termination
execSvc.shutdown()
while(!execSvc.awaitTermination(1, TimeUnit.SECONDS)) {
;
}
// processing complete!