Writing tests for a Kafka consumer in a multi-threaded environment

I am trying to create a Kafka consumer that runs in a separate thread and consumes data from a Kafka topic. For this, I extended the ShutdownableThread abstract class and implemented its doWork method. My code looks like this:
abstract class MyConsumer(topic: String) extends ShutdownableThread(topic) {
val props: Properties = ???
private val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(List(topic).asJava)
def process(value: String): Unit // Abstract method defining what to do with each record
override def doWork(): Unit = {
for (record <- consumer.poll(Duration.ofMillis(1000)).asScala)
process(record.value())
}
}
In my tests, I create a consumer with an implementation of process() that just mutates a variable, and then call start() on it to start the thread.
var mutVar = "initial_value"
val consumer = new MyConsumer("test_topic") {
override def process(value: String): Unit = mutVar = "updated_value"
}
consumer.start()
assert(mutVar === "updated_value")
The consumer does consume the message from Kafka, but it does not update the variable before the test finishes, so the test fails. I then tried putting the main thread to sleep, but that throws a ConcurrentModificationException with the message "KafkaConsumer is not safe for multi-threaded access".
Any idea what is wrong with my approach? Thanks in advance.

I had to put the main thread to sleep for a few seconds so that the consumer has time to consume the message from the Kafka topic and store it in the mutable variable. I added Thread.sleep(5000) after starting the consumer.
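A fixed sleep makes the test either slow or flaky, so one alternative is to poll the shared value until it changes or a timeout expires. The sketch below is only an illustration of that idea: `AwaitCondition`, `eventually`, and `startFakeConsumer` are hypothetical names, and the fake consumer is a plain thread standing in for the Kafka consumer so the example runs without a broker. ScalaTest's Eventually trait offers a similar facility.

```scala
import java.util.concurrent.atomic.AtomicReference

object AwaitCondition {
  /** Polls `condition` every `intervalMs` until it holds or `timeoutMs` elapses. */
  def eventually(timeoutMs: Long, intervalMs: Long = 50)(condition: => Boolean): Boolean = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    var ok = condition
    while (!ok && System.nanoTime() < deadline) {
      Thread.sleep(intervalMs)
      ok = condition
    }
    ok
  }

  /** Hypothetical stand-in for the consumer thread: updates the shared value after a delay. */
  def startFakeConsumer(target: AtomicReference[String], delayMs: Long): Thread = {
    val t = new Thread(() => {
      Thread.sleep(delayMs)
      target.set("updated_value")
    })
    t.setDaemon(true)
    t.start()
    t
  }
}
```

An AtomicReference (or a @volatile var) is used for the shared value so that the write made by the background thread is guaranteed to be visible to the test thread.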

Related

How do we run multiple functions concurrently in Scala?

This is Scala code I'm writing to run a function concurrently with two different parameters. However, I notice that the calls are executed one after another rather than at the same time.
class Method1 extends Thread {
override def run(): Unit ={
println("Hello, Current running is Thread No. " + Thread.currentThread().getName )
Function("Parameter 1")
Function("Parameter 2")
}
}
object Obj extends App {
for (x <- 1 to 4){
val th1 = new Method1()
th1.setName(x.toString)
th1.start()
}
}
val a = Array("Justanormalarray")
Obj.main(a)
How can I achieve this? All the tutorials I have seen only explain, at a very basic level, that multithreading can be achieved by extending the Thread class or implementing the Runnable interface, but they don't make clear how to actually execute multiple things at the same time.
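In the code above, each thread calls Function twice in a row inside run(), so within a single thread the two calls are necessarily sequential. One possible approach, sketched here under assumptions, is to start each call as its own Future and wait for both. `ConcurrentCalls`, `work`, and `runBoth` are hypothetical names; `work` merely stands in for the question's Function.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ConcurrentCalls {
  // Hypothetical stand-in for the question's `Function`.
  def work(param: String): String = {
    Thread.sleep(100) // simulate some work
    s"done: $param"
  }

  // Starts both calls before waiting on either, so they can run at the same time.
  def runBoth(): Seq[String] = {
    val first = Future(work("Parameter 1"))
    val second = Future(work("Parameter 2"))
    // Future.sequence keeps the results in the order of the input futures.
    Await.result(Future.sequence(Seq(first, second)), 5.seconds)
  }
}
```

Both futures are created before Await.result is called; creating and awaiting them one at a time would make the calls sequential again.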

run several coroutines in parallel (with return value)

I'm new to Kotlin and I'm trying to run several requests to a web API in parallel threads.
So far I have:
class HttpClient {
private val DEFAULT_BASE_URL = "https://someapi"
fun fetch(endPoint: String, page: Int): String {
FuelManager.instance.basePath = DEFAULT_BASE_URL
val (_, response, _) = endPoint.httpGet(listOf("page" to page)).response()
return String(response.data)
}
fun headers(endPoint: String): Headers {
FuelManager.instance.basePath = DEFAULT_BASE_URL
val (_, response, _) = endPoint.httpGet(listOf("page" to 1)).response()
return response.headers
}
}
and the class that runs the whole process
class Fetcher(private val page: Int) {
suspend fun run(): String = coroutineScope {
async {
HttpClient().fetch(DEFAULT_ENDPOINT, page)
}
}.await()
companion object {
private const val DEFAULT_ENDPOINT = "endpoint"
suspend fun fetchAll(): MutableList<String> {
val totalThreads = (totalCount() / pageSize()) + 1
return runBlocking {
var deck: MutableList<String> = mutableListOf()
for (i in 1..totalThreads) {
deck.add(Fetcher(i).run())
}
deck
}
}
private fun pageSize(): Int {
return HttpClient().headers(DEFAULT_ENDPOINT)["page-size"].first().toInt()
}
private fun totalCount(): Int {
return HttpClient().headers(DEFAULT_ENDPOINT)["total-count"].first().toInt()
}
}
}
I'm looking for the equivalent of Java's Thread.join(). Could you give me some pointers on how to improve my code to achieve that?
Also, if it's not too much to ask, could you suggest a book or a set of examples on this subject?
Thanks for your help in advance!
A few points:
If you're going to be using coroutines in a project, you'll mostly want to be exposing suspending functions instead of blocking functions. I don't use Fuel, but I see it has a coroutines library with suspend function versions of its blocking functions. Usually, suspend functions that unwrap an asynchronous result have the word "await" in them. I don't know for sure what response() is since I don't use Fuel, but if I had to guess, you can use awaitResponse() instead and then make the functions suspend functions.
Not related to coroutines, but there's almost no reason to ever use the String constructor to wrap another String, since Strings are immutable. (The only reason you would ever need to copy a String in memory like that is maybe if you were using it in some kind of weird collection that uses identity comparison instead of == comparison, and you need it to be treated as a different value.)
Also not related to coroutines, but HttpClient in your case should be a singleton object since it holds no state. Then you won't need to instantiate it when you use it or worry about holding a reference to one in a property.
Never use runBlocking in a suspend function. A suspend function must never block. runBlocking creates a blocking function. The only two places runBlocking should ever appear in an application are at the top level main function of a CLI app, or in an app that has both coroutines and some other thread-management library and you need to convert suspend functions into blocking non-suspend functions so they can be used by the non-coroutine-based code.
There's no reason to immediately follow async() with await() if you aren't doing it in parallel with something else. You could just use withContext instead. If you don't need to use a specific dispatcher to call the code, which you don't if it's a suspend function, then you don't even need withContext. You can just call suspend functions directly in your coroutine.
There's no reason to use coroutineScope { } to wrap a single child coroutine. It's for running multiple child coroutines and waiting for all of them.
So, if we change HttpClient's functions into suspend functions, then Fetcher.run becomes very simple.
I also think that it's kind of weird that Fetcher is a class with a single property that is only used in a one-off fashion with its only function. Instead, it would be more straight-forward for Fetcher to be a singleton object and for run to have the parameter it needs. Then you won't need a companion object either since Fetcher as an object can directly host those functions.
Finally, the part you were actually asking about: to run parallel tasks in a coroutine, use coroutineScope { } and then launch async coroutines inside it and await them. The map function is handy for doing this with something you can iterate, and then you can use awaitAll(). You can also get totalCount and pageSize in parallel.
Bringing that all together:
object HttpClient {
private val DEFAULT_BASE_URL = "https://someapi"
suspend fun fetch(endPoint: String, page: Int): String {
FuelManager.instance.basePath = DEFAULT_BASE_URL
val (_, response, _) = endPoint.httpGet(listOf("page" to page)).awaitResponse()
return response.data
}
suspend fun headers(endPoint: String): Headers {
FuelManager.instance.basePath = DEFAULT_BASE_URL
val (_, response, _) = endPoint.httpGet(listOf("page" to 1)).awaitResponse()
return response.headers
}
}
object Fetcher {
suspend fun run(page: Int): String =
HttpClient.fetch(DEFAULT_ENDPOINT, page)
private const val DEFAULT_ENDPOINT = "endpoint"
suspend fun fetchAll(): List<String> {
val totalThreads = coroutineScope {
val totalCount = async { totalCount() }
val pageSize = async { pageSize() }
(totalCount.await() / pageSize.await()) + 1
}
return coroutineScope {
(1..totalThreads).map { i ->
async { run(i) }
}.awaitAll()
}
}
private suspend fun pageSize(): Int {
return HttpClient.headers(DEFAULT_ENDPOINT)["page-size"].first().toInt()
}
private suspend fun totalCount(): Int {
return HttpClient.headers(DEFAULT_ENDPOINT)["total-count"].first().toInt()
}
}
I changed MutableList to List, since it's simpler, and usually you don't need a MutableList. If you really need one you can call toMutableList() on it.

Replace a Thread -- that has state variable -- with a Coroutine

To restate the title, I'm wondering if there is a way to convert the MyThread class below to a Kotlin Coroutine.
If you look closely, you will notice that the MyThread class has a property called someObject that can be modified from inside both the run and the cancel methods. In this case SomeObject is completely encapsulated inside MyThread, and I want to keep it that way. Is there a way to convert MyThread to a coroutine, or do I already have the most elegant version of the code?
class MyCancellable: Thread(){
val someObject= SomeObject()
override fun run() {
super.run()
while(someObject.keepGoing){
someObject.value++
}
}
fun cancel(){
someObject.keepGoing=false
}
}
A reusable coroutine is a suspend function where the only parameter is CoroutineScope, so something roughly equivalent to what you have is:
suspend fun CoroutineScope.cancellableCounter() = withContext(Dispatchers.Default) {
    val someObject = SomeObject()
    while (someObject.keepGoing) {
        yield() // suspension point, so cancelling the Job can stop the loop
        someObject.value++
    }
}
The function can be called from inside another coroutine, or it can be passed to async or launch, such as myScope.launch(::cancellableCounter). The returned Job can be cancelled by calling cancel() on it.
But as mentioned in the comments, there may be a better way to design it depending on how SomeObject is supposed to be used.
Edit: Maybe for the ServerSocket you'd need to do something like this. I haven't tested it, so not totally sure. But I don't think you want to directly call accept() in a coroutine because it blocks for potentially a long time and does not cooperate with cancellation. So I'm suggesting you still need a dedicated thread. suspendCancellableCoroutine can bridge this to a suspend function.
suspend fun awaitSomeSocket(): Socket = suspendCancellableCoroutine { continuation ->
val socket: ServerSocket = generateSocket()
continuation.invokeOnCancellation { socket.close() }
thread {
runCatching {
val result = socket.use(ServerSocket::accept)
continuation.resume(result)
}
}
}
I think you want a class that can start its own coroutine? That seems like the equivalent, something like:
class MyCancellable(private val scope: CoroutineScope) {
private var job: Job? = null
val someObject = SomeObject()
fun run() {
if (job != null) return
job = scope.launch {
while(someObject.keepGoing) {
someObject.value++
}
}
}
fun cancel() {
someObject.keepGoing = false
}
}
Typically you'd do job.cancel() instead, and check isActive in the while loop - I don't think it matters here, but it might be worth doing it "properly" (and it is technically different to someObject.keepGoing going false for some other reason). And if you're doing that, maybe TenFour04's suggestion is better, since the only reason you need a class/object is so you can put externally visible run and cancel functions in it. If the coroutine just runs anyway, and you call cancel on the Job it returns, it's all good!

How to understand "new {}" syntax in Scala?

I am learning multi-threaded programming in Scala and wrote a simple program by following a tutorial:
object ThreadSleep extends App {
def thread(body: =>Unit): Thread = {
val t = new Thread {
override def run() = body
}
t.start()
t
}
val t = thread{println("New Thread")}
t.join
}
I can't understand why {} is used in the new Thread {} statement. I thought it should be new Thread or new Thread(). How should I understand this syntax?
This question is not an exact duplicate of this one, because my question is specifically about the "new {}" syntax.
This is a shortcut for
new Thread() { ... }
This is called an anonymous class, and it works just like in Java:
Here you are creating a new thread with an overridden run method. This is useful because you don't have to define a named class if you only use it once.
This needs confirmation, but you should be able to override, add, or redefine any method or attribute you want.
See here for more details: https://docs.oracle.com/javase/tutorial/java/javaOO/anonymousclasses.html
By writing new Thread {} you're creating an anonymous subclass of Thread in which you override the run method. Normally, you'd prefer to create an instance of Runnable and create a thread with it instead of subclassing Thread:
val r = new Runnable { override def run(): Unit = body }
new Thread(r).start()
This is usually semantically more correct, since you'd want to subclass Thread only if you were specializing the Thread class further, for example with an AbortableThread. If you just want to run a task on a thread, the Runnable approach is more appropriate.
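To make the two styles discussed above easier to compare, here is a minimal sketch that puts them side by side. `AnonymousThreadDemo`, `viaSubclass`, and `viaRunnable` are hypothetical names chosen for illustration; both helpers take the task as a by-name parameter, as the `thread` helper in the question does.

```scala
object AnonymousThreadDemo {
  // Anonymous subclass of Thread: the braces after `new Thread` hold the class body.
  def viaSubclass(body: => Unit): Thread = {
    val t = new Thread {
      override def run(): Unit = body
    }
    t.start()
    t
  }

  // Anonymous implementation of Runnable, handed to an ordinary Thread.
  def viaRunnable(body: => Unit): Thread = {
    val r = new Runnable {
      override def run(): Unit = body
    }
    val t = new Thread(r)
    t.start()
    t
  }
}
```

In both cases the caller can join the returned thread to wait for the task to finish.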

How to configure a fine tuned thread pool for futures?

How large is Scala's thread pool for futures?
My Scala application makes many millions of future {}s and I wonder if there is anything I can do to optimize them by configuring a thread pool.
Thank you.
This comes from a comment by monkjack on the accepted answer. It is easy to miss, so I'm reposting it here.
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(10))
If you just need to change the size of the thread pool, keep the global executor and pass the following system properties.
-Dscala.concurrent.context.numThreads=8 -Dscala.concurrent.context.maxThreads=8
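As a rough illustration of the fixed-pool approach, the sketch below builds an ExecutionContext from a fixed thread pool, runs a batch of futures on it, and shuts the pool down afterwards. `FixedPoolExample` and `sumOfSquares` are hypothetical names, and the pool size and timeout are arbitrary values, not recommendations.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object FixedPoolExample {
  // Computes 1^2 + ... + n^2 with one Future per term, all running on a dedicated pool.
  def sumOfSquares(n: Int, poolSize: Int): Int = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(pool)
    try {
      val futures = (1 to n).map(i => Future(i * i))
      Await.result(Future.sequence(futures), 10.seconds).sum
    } finally {
      // Threads in a fixed pool are non-daemon by default, so shut the pool down when done.
      ec.shutdown()
    }
  }
}
```

fromExecutorService is used here instead of fromExecutor only because the returned context also exposes shutdown(); in a long-running application the pool would typically be created once and shared.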
You can specify your own ExecutionContext that your futures will run in, instead of importing the global implicit ExecutionContext.
import java.util.concurrent.Executors
import scala.concurrent._
implicit val ec = new ExecutionContext {
  val threadPool = Executors.newFixedThreadPool(1000)
  // Explicit ": Unit =" replaces the deprecated procedure syntax.
  def execute(runnable: Runnable): Unit = {
    threadPool.submit(runnable)
  }
  def reportFailure(t: Throwable): Unit = {}
}
Best way to specify a thread pool for Scala futures:
implicit val ec = new ExecutionContext {
  val threadPool = Executors.newFixedThreadPool(conf.getInt("5"))
  override def reportFailure(cause: Throwable): Unit = {}
  override def execute(runnable: Runnable): Unit = threadPool.submit(runnable)
  def shutdown() = threadPool.shutdown()
}
class ThreadPoolExecutionContext(val executionContext: ExecutionContext)
object ThreadPoolExecutionContext {
val executionContextProvider: ThreadPoolExecutionContext = {
try {
val executionContextExecutor: ExecutionContextExecutor = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(25))
new ThreadPoolExecutionContext(executionContextExecutor)
} catch {
case exception: Exception => {
Log.error("Failed to create thread pool", exception)
throw exception
}
}
}
}
