Terminate request to database in flink operation - cassandra

I am trying to work with Flink and Cassandra. Both are massively parallel environments, but I am having difficulty making them work together.
Right now I need an operation that reads from Cassandra in parallel over different token ranges, with the possibility of terminating the query after N objects have been read.
Batch mode suits me better, but DataStreams are also possible.
I tried LongCounter (see below), but it did not work as I expected: I could not get the global sum from it, only the local values.
Async mode is not necessary, since the CassandraRequester operation runs in a parallel context with a parallelism of about 64 or 128.
This is my attempt:
class CassandraRequester<T> (val klass: Class<T>, private val context: FlinkCassandraContext):
    RichFlatMapFunction<CassandraTokenRange, T>() {
    companion object {
        private val session = ApplicationContext.session!!
        private var preparedStatement: PreparedStatement? = null
        private val manager = MappingManager(session)
        private var mapper: Mapper<*>? = null
        private val log = LoggerFactory.getLogger(CassandraRequesterStateless::class.java)
        public const val COUNTER_ROWS_NUMBER = "flink-cassandra-select-count"
    }
    private lateinit var counter: LongCounter
    override fun open(parameters: Configuration?) {
        super.open(parameters)
        if(preparedStatement == null)
            preparedStatement = session.prepare(context.prepareQuery()).setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
        if(mapper == null) {
            mapper = manager.mapper<T>(klass)
        }
        counter = runtimeContext.getLongCounter(COUNTER_ROWS_NUMBER)
    }
    override fun flatMap(tokenRange: CassandraTokenRange, collector: Collector<T>) {
        val bs = preparedStatement!!.bind(tokenRange.start, tokenRange.end)
        val rs = session.execute(bs)
        val resultSelect = mapper!!.map(rs)
        val iter = resultSelect.iterator()
        while (iter.hasNext()) when {
            this.context.maxRowsExtracted == 0L || counter.localValue < context.maxRowsExtracted -> {
                counter.add(1)
                collector.collect(iter.next() as T)
            }
            else -> {
                collector.close()
                return
            }
        }
    }
}
Is it possible to terminate the query in such a case?

Related

How to compare against a previous list and update a field in a multi-threaded context

I have a local cache where I store each runner's lap info. While displaying the current lap information, I need to show whether the runner's current lap was better or worse than the previous lap.
data class RunInfo(
    val runnerId: String,
    val lapTime: Double,
    var betterThanLastLap: BETTERTHANLASTLAP
)
enum class BETTERTHANLASTLAP {
    NA, YES, NO
}
object RunDB {
    private var listOfRunners: MutableList<RunInfo> =
        java.util.Collections.synchronizedList(mutableListOf())
    private var previousList: MutableList<RunInfo> = mutableListOf()
    fun save(runList: MutableList<RunInfo>) {
        previousList = listOfRunners.toMutableList()
        listOfRunners.clear()
        listOfRunners.addAll(runList)
        listOfRunners.forEach { runner ->
            previousList.forEach { previousLap ->
                if (runner.runnerId == previousLap.runnerId) {
                    runner.betterThanLastLap =
                        when {
                            previousLap.lapTime == 0.0 -> BETTERTHANLASTLAP.NA
                            runner.lapTime >= previousLap.lapTime -> BETTERTHANLASTLAP.YES
                            else -> BETTERTHANLASTLAP.NO
                        }
                }
            }
        }
    }
}
This seems to do the job, but I often get a ConcurrentModificationException. Is there a better way of solving this problem?
I don't recommend combining mutable lists with read-write var properties. Making something mutable in two different ways creates ambiguity and is error-prone. Since you're just clearing and replacing the list contents, I would make it a read-only list held in a read-write property.
You also need to synchronize the whole function so that only one thread can execute it at a time.
object RunDB {
    private var listOfRunners: List<RunInfo> = listOf()
    private var previousList: List<RunInfo> = listOf()
    fun save(runList: List<RunInfo>) {
        // Only one thread at a time may swap the lists and compare laps.
        synchronized(this) {
            previousList = listOfRunners.toList()
            listOfRunners = runList.toList()
            listOfRunners.forEach { runner ->
                previousList.forEach { previousLap ->
                    if (runner.runnerId == previousLap.runnerId) {
                        runner.betterThanLastLap =
                            when {
                                previousLap.lapTime == 0.0 -> BETTERTHANLASTLAP.NA
                                runner.lapTime >= previousLap.lapTime -> BETTERTHANLASTLAP.YES
                                else -> BETTERTHANLASTLAP.NO
                            }
                    }
                }
            }
        }
    }
}
It also feels error-prone to have a mutable data class in these lists that you're copying and shuffling around. I recommend making it immutable:
data class RunInfo(
    val runnerId: String,
    val lapTime: Double,
    val betterThanLastLap: BETTERTHANLASTLAP
)
object RunDB {
    private var listOfRunners: List<RunInfo> = listOf()
    private var previousList: List<RunInfo> = listOf()
    fun save(runList: List<RunInfo>) {
        synchronized(this) {
            previousList = listOfRunners.toList()
            listOfRunners = runList.map { runner ->
                // Look up this runner's previous lap, if there is one.
                val previousLap = previousList.find { it.runnerId == runner.runnerId }
                runner.copy(betterThanLastLap = when {
                    previousLap == null || previousLap.lapTime == 0.0 -> BETTERTHANLASTLAP.NA
                    runner.lapTime >= previousLap.lapTime -> BETTERTHANLASTLAP.YES
                    else -> BETTERTHANLASTLAP.NO
                })
            }
        }
    }
}

Error with Gson().fromJson - "Failed to invoke public com.keikakupet.PetStatus() with no args"

I'm trying to store an instance of my Kotlin class into a JSON file using the Gson library. However, when I run Gson().fromJson, I'm receiving the following error:
java.lang.RuntimeException: Failed to invoke public com.keikakupet.PetStatus() with no args
My understanding of the error is that Gson requires that my class has a primary constructor which takes no arguments so it can construct the desired object (in this case, a PetStatus object). However, I have such a constructor. I'm not sure if part of the problem lies in the fact that I'm running the method from the init. Does anyone know how I might fix this error?
My code:
package com.keikakupet
import android.content.Context
import android.util.Log
import com.google.gson.Gson
import java.io.File
import java.util.*
import java.io.BufferedReader
class PetStatus constructor(){
    var maxHealth: Int = 10
    var currentHealth: Int = 10
    var healthIncrementer: Int = 2 // amount by which health increments when a task is completed
    var healthDecrementer: Int = 0 // amount by which health decrements when user approaches / misses deadline
    var isHungry: Boolean = false
    var isTired: Boolean = false
    var isSick: Boolean = false
    var isAlive: Boolean = true
    init{
        //if a json exists, use it to update PetStatus
        val context = getContext()
        var file = File(context.getFilesDir(), "PetStatus.json")
        if(file.exists()){
            Log.d("FILE_EXISTS", "File exists!")
            val bufferedReader: BufferedReader = file.bufferedReader()
            val json = bufferedReader.readText()
            val retrievedStatus = Gson().fromJson(json, PetStatus::class.java)
            Log.d("JSON_RETRIEVED", json)
        }
        else
            Log.d("FILE_DNE", "File does not exist!")
        updateJson()
    }
    // method to update pet's health and ailment upon completing a task
    fun processCompletedTask(){
        incrementHealth()
        removeAilment()
        Log.d("TASK_COMPLETE", "completed task processed")
    }
    fun getHealthPercent(): Int {
        return currentHealth / maxHealth
    }
    // method to update pet's health and ailment upon missing a task
    fun processMissedTask(){
        decrementHealth()
        addAilment()
        Log.d("TASK_MISSED", "missed task processed")
    }
    /*
    Potentially creating another method to update pet's
    health and status because of an approaching deadline.
    */
    // method to decrement the pet's health
    private fun decrementHealth(){
        currentHealth-=healthDecrementer
        if(currentHealth <= 0)
            isAlive = false
        updateJson()
    }
    // method to increment the pet's health
    private fun incrementHealth(){
        val sum = currentHealth + healthIncrementer
        if(sum > maxHealth)
            currentHealth = maxHealth
        else
            currentHealth = sum
        updateJson()
    }
    // method to add an ailment to the pet
    private fun addAilment(){
        // if no ailment, randomly assign hungry or tired
        if(!isHungry && !isTired && !isSick){
            val rand = Random()
            val randBool = rand.nextBoolean()
            if(randBool)
                isHungry = true
            else
                isTired = true
            healthDecrementer = 1
        }
        // otherwise, if hungry XOR tired, assign the other
        else if(isHungry && !isTired){
            isTired = true
            healthDecrementer = 2
        }
        else if(isTired && !isHungry){
            isHungry = true
            healthDecrementer = 2
        }
        // otherwise, if both hungry AND tired, assign sick
        else if(isHungry && isTired){
            isSick = true
            healthDecrementer = 3
        }
        updateJson()
    }
    // method to remove an ailment from the pet
    private fun removeAilment(){
        // if sick, remove sick
        if(isSick){
            isSick = false
            healthDecrementer = 2
        }
        // otherwise, if hungry and tired, remove one of the two randomly
        else if(isHungry && isTired){
            val rand = Random()
            val randBool = rand.nextBoolean()
            if(randBool)
                isHungry = false
            else
                isTired = false
            healthDecrementer = 1
        }
        // otherwise, if hungry XOR tired, remove relevant ailment
        else if(isHungry && !isTired){
            isHungry = false
            healthDecrementer = 0
        }
        else if(isTired){
            isTired = false
            healthDecrementer = 0
        }
        updateJson()
    }
    private fun updateJson(){
        val gson = Gson()
        var json: String = gson.toJson(this)
        Log.d("JSON_UPDATE", json)
        val context = getContext()
        var file = File(context.getFilesDir(), "PetStatus.json")
        file.writeText(json)
        val bufferedReader: BufferedReader = file.bufferedReader()
        json = bufferedReader.readText()
        Log.d("JSON_FROM_FILE", json)
    }
    companion object {
        private lateinit var context: Context
        fun setContext(con: Context) {
            context=con
        }
        fun getContext() : Context {
            return context
        }
    }
}
Logcat information:
Caused by: java.lang.RuntimeException: Failed to invoke public com.keikakupet.PetStatus() with no args
at com.google.gson.internal.ConstructorConstructor$3.construct(ConstructorConstructor.java:118)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:212)
at com.google.gson.Gson.fromJson(Gson.java:927)
at com.google.gson.Gson.fromJson(Gson.java:892)
at com.google.gson.Gson.fromJson(Gson.java:841)
at com.google.gson.Gson.fromJson(Gson.java:813)
at com.keikakupet.PetStatus.<init>(PetStatus.kt:32)
at java.lang.reflect.Constructor.newInstance0(Native Method)
at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
at com.google.gson.internal.ConstructorConstructor$3.construct(ConstructorConstructor.java:110)
... 4981 more
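Gson does find your no-argument constructor. The problem is that the init block itself calls Gson().fromJson, which reflectively invokes that constructor, which runs the init block again, and so on. The stack trace shows this recursion: PetStatus.<init> calls Gson.fromJson, which calls Constructor.newInstance, followed by "... 4981 more" frames. The loading logic has to move out of the constructor.
Restructure your class like this: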
class PetStatus private constructor(
    var maxHealth: Int = 10,
    var currentHealth: Int = 10,
    var healthIncrementer: Int = 2, // amount by which health increments when a task is completed
    var healthDecrementer: Int = 0, // amount by which health decrements when user approaches / misses deadline
    var isHungry: Boolean = false,
    var isTired: Boolean = false,
    var isSick: Boolean = false,
    var isAlive: Boolean = true) {
    /**
     * This is an optional fix, since it is a design guideline
     * recommendation; you can retain your original function as well:
     * fun getHealthPercent(): Int {
     *     return currentHealth / maxHealth
     * }
     */
    val healthPercent: Int
        get() = currentHealth / maxHealth
    ...
    companion object {
        lateinit var context: Context // getters and setters for Java are generated automatically
        operator fun invoke(): PetStatus {
            // If a JSON file exists, load the PetStatus from it; otherwise start from the defaults.
            val file = File(context.getFilesDir(), "PetStatus.json")
            val status = if (file.exists()) {
                Log.d("FILE_EXISTS", "File exists!")
                val json = file.bufferedReader().readText()
                Log.d("JSON_RETRIEVED", json)
                Gson().fromJson(json, PetStatus::class.java)
            } else {
                Log.d("FILE_DNE", "File does not exist!")
                PetStatus()
            }
            // Assumes updateJson() is kept in the part elided above.
            status.updateJson()
            return status
        }
        operator fun invoke(maxHealth: Int, currentHealth: Int, healthIncrementer: Int, healthDecrementer: Int, isHungry: Boolean, isTired: Boolean, isSick: Boolean, isAlive: Boolean): PetStatus
            = PetStatus(maxHealth, currentHealth, healthIncrementer, healthDecrementer, isHungry, isTired, isSick, isAlive)
    }
}
Now you can still create an instance the same way:
PetStatus()
operator fun invoke() is a somewhat hacky approach. I recommend taking the loading code out of the class and instantiating it from outside.

Is it possible for the database to block parallel table accesses in Scala threads?

In my Scala application, I make several threads. In each thread, I write different data from the array to the same PostgreSQL table. I noticed that some threads did not write data to the PostgreSQL table. However, there are no errors in the application logs. Is it possible for the database to block parallel table accesses? What can be the cause of this behavior?
MainApp.scala:
val postgreSQL = new PostgreSQL(configurations)
val semaphore = new Semaphore(5)
for (item <- array) {
  semaphore.acquire()
  val thread = new Thread(new CustomThread(postgreSQL, semaphore, item))
  thread.start()
}
CustomThread.scala:
import java.util.concurrent.Semaphore
import java.util.UUID.randomUUID
import utils.PostgreSQL
class CustomThread(postgreSQL: PostgreSQL, semaphore: Semaphore, item: Item) extends Runnable {
  override def run(): Unit = {
    try {
      // Create the unique filename.
      val filename: String = randomUUID().toString
      // Write to the database the filename of the item.
      postgreSQL.changeItemFilename(filename, item.id)
      // Change the status type of the item.
      postgreSQL.changeItemStatusType(3, item.id)
    } catch {
      case e: Throwable =>
        e.printStackTrace()
    } finally {
      semaphore.release()
    }
  }
}
PostgreSQL.scala:
package utils
import java.sql.{Connection, DriverManager, PreparedStatement, ResultSet}
import java.util.Properties
class PostgreSQL(configurations: Map[String, String]) {
  val host: String = configurations("postgresql.host")
  val port: String = configurations("postgresql.port")
  val user: String = configurations("postgresql.user")
  val password: String = configurations("postgresql.password")
  val db: String = configurations("postgresql.db")
  val url: String = "jdbc:postgresql://" + host + ":" + port + "/" + db
  val driver: String = "org.postgresql.Driver"
  val properties = new Properties()
  val connection: Connection = getConnection
  var statement: PreparedStatement = _
  def getConnection: Connection = {
    properties.setProperty("user", user)
    properties.setProperty("password", password)
    var connection: Connection = null
    try {
      Class.forName(driver)
      connection = DriverManager.getConnection(url, properties)
    } catch {
      case e:Exception =>
        e.printStackTrace()
    }
    connection
  }
  def changeItemFilename(filename: String, id: Int): Unit = {
    try {
      statement = connection.prepareStatement("UPDATE REPORTS SET FILE_NAME = ? WHERE ID = ?;", ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY)
      statement.setString(1, filename)
      statement.setInt(2, id)
      statement.execute()
    } catch {
      case e: Exception =>
        e.printStackTrace()
    }
  }
}
For your information, JDBC is synchronous by default: a call blocks your thread until the operation on that connection completes. So if you try to do several things on a single connection at the same time, they are executed sequentially instead.
More on that:
https://dzone.com/articles/myth-asynchronous-jdbc
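A minimal sketch of one way to avoid sharing a single connection (and the mutable statement field) between threads: each call opens and closes its own connection and statement. ReportUpdater and its parameters are hypothetical names for illustration; the table and column names are taken from the question. In a real application a connection pool would normally replace DriverManager.getConnection.

```scala
import java.sql.DriverManager
import java.util.Properties

// Hypothetical helper, not part of the question's code base.
object ReportUpdater {
  val UpdateSql = "UPDATE REPORTS SET FILE_NAME = ? WHERE ID = ?"

  // Uses a connection and statement that are local to this call, so
  // concurrent callers never touch each other's JDBC objects.
  // Returns the number of rows the UPDATE affected.
  def changeItemFilename(url: String, props: Properties, filename: String, id: Int): Int = {
    val connection = DriverManager.getConnection(url, props)
    try {
      val statement = connection.prepareStatement(UpdateSql)
      try {
        statement.setString(1, filename)
        statement.setInt(2, id)
        statement.executeUpdate()
      } finally statement.close()
    } finally connection.close()
  }
}
```

Because executeUpdate returns the affected row count, the caller can log a warning when it is 0 instead of silently assuming the write happened.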
That is the first and most probable reason. The second possible reason is that the database blocks modifications to rows that are being updated by another transaction; how exactly depends on the isolation level.
https://www.sqlservercentral.com/articles/isolation-levels-in-sql-server
Last but not least, it is not necessary to use bare threads in Scala. Many libraries for concurrent/asynchronous programming have been developed, such as cats-effect, Monix, and ZIO, and there are dedicated database access libraries such as Slick or doobie.
It is better to use them than bare threads, since they take care of thread pooling and error handling for you.
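Even without any of those libraries, the standard library's Future on a bounded pool can replace the Thread plus Semaphore combination. This is only a sketch of that swapped-in technique, not of cats-effect, Monix, or ZIO; BoundedFutures, runAll, and the one-minute timeout are assumptions made for illustration.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical helper for illustration.
object BoundedFutures {
  // Runs `work` for every item on a pool of `parallelism` threads and
  // waits for all results. If any task throws, the exception is rethrown
  // here instead of being lost inside a worker thread.
  def runAll[A, B](items: Seq[A], parallelism: Int)(work: A => B): Seq[B] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = items.map(item => Future(work(item)))
      Await.result(Future.sequence(futures), 1.minute)
    } finally pool.shutdown()
  }
}
```

For example, BoundedFutures.runAll(array.toSeq, 5)(item => ...) would process at most five items at a time, which is what the Semaphore(5) in the question was doing by hand.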

Nicifying execution context's thread pool's output for logging/debugging in Scala

Is there a nice way to rename a pool in/for an execution context to produce nicer output in logs/while debugging? Not to look like ForkJoinPool-2-worker-7 (because ~2 tells nothing about the pool's purpose in the app) but like WorkForkJoinPool-2-worker-7, without creating a new WorkForkJoinPool class for it?
Example:
object LogSample extends App {
  val ex1 = ExecutionContext.global
  val ex2 = ExecutionContext.fromExecutor(null:Executor) // another global ex context
  val system = ActorSystem("system")
  val log = Logging(system.eventStream, "my.nice.string")
  Future {
    log.info("1")
  }(ex1)
  Future {
    log.info("2")
  }(ex2)
  Thread.sleep(1000)
  // output, like this:
  /*
  [INFO] [09/14/2015 21:53:34.897] [ForkJoinPool-2-worker-7] [my.nice.string] 2
  [INFO] [09/14/2015 21:53:34.897] [ForkJoinPool-1-worker-7] [my.nice.string] 1
  */
}
You need to implement a custom thread factory, something like this:
class CustomThreadFactory(prefix: String) extends ForkJoinPool.ForkJoinWorkerThreadFactory {
  def newThread(fjp: ForkJoinPool): ForkJoinWorkerThread = {
    val thread = new ForkJoinWorkerThread(fjp) {}
    thread.setName(prefix + "-" + thread.getName)
    thread
  }
}
val threadFactory = new CustomThreadFactory("custom prefix here")
val uncaughtExceptionHandler = new UncaughtExceptionHandler {
  override def uncaughtException(t: Thread, e: Throwable) = e.printStackTrace()
}
val executor = new ForkJoinPool(10, threadFactory, uncaughtExceptionHandler, true)
val ex2 = ExecutionContext.fromExecutor(executor) // another global ex context
val system = ActorSystem("system")
val log = Logging(system.eventStream, "my.nice.string")
Future {
  log.info("2") //[INFO] [09/15/2015 18:22:43.728] [custom prefix here-ForkJoinPool-1-worker-29] [my.nice.string] 2
}(ex2)
Thread.sleep(1000)
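If a fork-join pool is not a requirement, the same idea can be sketched with a plain java.util.concurrent.ThreadFactory and a fixed thread pool. NamedThreadFactory and NamedPoolDemo are hypothetical names, and the pool size of 2 is arbitrary.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical factory: names threads "<prefix>-1", "<prefix>-2", ...
class NamedThreadFactory(prefix: String) extends ThreadFactory {
  private val counter = new AtomicInteger(0)
  override def newThread(r: Runnable): Thread = {
    val thread = new Thread(r, prefix + "-" + counter.incrementAndGet())
    thread.setDaemon(true) // do not keep the JVM alive for idle workers
    thread
  }
}

object NamedPoolDemo {
  // Runs one task on a freshly named pool and returns the worker thread's name.
  def threadNameOnPool(prefix: String): String = {
    val pool = Executors.newFixedThreadPool(2, new NamedThreadFactory(prefix))
    val ec = ExecutionContext.fromExecutor(pool)
    try Await.result(Future(Thread.currentThread().getName)(ec), 10.seconds)
    finally pool.shutdown()
  }
}
```

The trade-off is that a fixed pool does not do work stealing, so this suits blocking or coarse-grained tasks better than many small recursive ones.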
OK. It seems this is not possible (particularly for the default global implementation) with the current Scala ExecutionContext implementation.
What I could do is just copy that implementation and replace:
class DefaultThreadFactory(daemonic: Boolean) ... {
  def wire[T <: Thread](thread: T): T = {
    thread.setName("My" + thread.getId) // ! add this line (make "My" a variable)
    thread.setDaemon(daemonic)
    thread.setUncaughtExceptionHandler(uncaughtExceptionHandler)
    thread
  }...
because the threadFactory there
val threadFactory = new DefaultThreadFactory(daemonic = true)
is hardcoded ...
(it seems Vladimir Petrosyan was the first to show a nicer way :) )

scala parallel collections: Idiomatic way of having thread-local-variables for worker threads

The progress function below is my worker function. I need to give it access to some classes which are costly to create or acquire. Is there any standard machinery for thread-local variables in the libraries for this? Or will I have to write an object pool manager myself?
object Start extends App {
  def progress {
    val current = counter.getAndIncrement
    if(current % 100 == 0) {
      val perc = current.toFloat * 100 / totalPosts
      print(f"\r$perc%4.2f%%")
    }
  }
  val lexicon = new Global()
  def processTopic(forumId: Int, topicId: Int) {
    val(topic, posts) = ProcessingQueries.getTopicAndPosts(forumId,topicId)
    progress
  }
  val (fid, tl) = ProcessingQueries.getAllTopics("hon")
  val totalPosts = tl.size
  val counter = new AtomicInteger(0)
  val par = tl.par
  par.foreach { topic_id =>
    processTopic(fid,topic_id)
  }
}
Replaced the previous answer. This does the trick nice and tidy
object MyAnnotator extends ThreadLocal[StanfordCoreNLP] {
  val props = new Properties()
  props.put("annotators", "tokenize,ssplit,pos,lemma,parse")
  props.put("ssplit.newlineIsSentenceBreak", "two")
  props.put("parse.maxlen", "40")
  override def initialValue = new StanfordCoreNLP(props)
}
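To illustrate how such a holder behaves, here is a small sketch with a stand-in resource instead of StanfordCoreNLP. ExpensiveResource, ResourceHolder, and ThreadLocalDemo are hypothetical names; the point is only that get() builds one instance per thread and reuses it on later calls from that thread.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Stand-in for a costly object such as the annotator above (hypothetical).
class ExpensiveResource(val id: Int)

object ResourceHolder extends ThreadLocal[ExpensiveResource] {
  // Counts how many instances have been built across all threads.
  val created = new AtomicInteger(0)
  override def initialValue(): ExpensiveResource =
    new ExpensiveResource(created.incrementAndGet())
}

object ThreadLocalDemo {
  // Starts `threads` fresh threads, calls get() `callsPerThread` times on
  // each, and returns how many resources were built during the run.
  def instancesBuilt(threads: Int, callsPerThread: Int): Int = {
    val before = ResourceHolder.created.get()
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        override def run(): Unit =
          (1 to callsPerThread).foreach(_ => ResourceHolder.get())
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    ResourceHolder.created.get() - before
  }
}
```

Inside a worker function, the call would simply be MyAnnotator.get(). With a parallel collection, the number of instances is then bounded by the number of worker threads in the pool rather than by the number of elements.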

Resources