running Scala threads - multithreading

I am an absolute beginner in Scala, but I have this problem to solve.
I have a list of parameters:
val itemList = List("abc", "def", "ghi", "jkl", "mno", "pqr")
and these 3 parameterized queries:
val q1 = "env='dev1'&id='123'&listitem='xyz'"
val q2 = "env='dev2'&id='1234'&listitem='xyz'"
val q3 = "env='dev3'&id='12345'&listitem='xyz'"
val report1 = getReport(q1)
val report2 = getReport(q2)
val report3 = getReport(q3)
I am trying to loop through the list, replace the listitem parameter in q1, q2 and q3 with each list item, and then run the HTTP report request for each item in the list.
Since each getReport request is asynchronous, I need to wait for it, so I cannot simply move on to the next item in the list as I would in a plain loop.
So I would like to start 3 threads for each item in the list and then combine the 3 reports into 1 final one; alternatively, I could do it sequentially.
How would I go about doing it with 3 threads for each item in the list?
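This is my idea (assuming getReport returns a String):
val reportToken = List(q1, q2, q3)
val results = new Array[String](reportToken.size)
val threads = reportToken.indices.map { i =>
  new Thread {
    override def run(): Unit = {
      results(i) = getReport(reportToken(i)) // each thread fills its own slot
    }
  }
}
threads.foreach(_.start())
threads.foreach(_.join()) // wait for all three threads instead of sleeping
val concat = results.mkString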

You can wrap each of your tasks in a Future, apply map/recover to handle the successful/failed Futures, and use Future.sequence to transform the list of Futures into a Future of list. Here's a trivialized example:
import scala.concurrent.{Future, Await}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
def getReport(q: String) = q match {
case "q2" => throw new Exception()
case q => s"got $q"
}
val reportToken = Seq("q1", "q2", "q3")
val listTasks = reportToken.map( q => Future{ getReport(q) } )
// listTasks: Seq[scala.concurrent.Future[String]] = ...
val f = Future.sequence(
listTasks.map(_.map(Some(_)).recover{case _ => None})
)
// f: scala.concurrent.Future[Seq[Option[String]]] = ...
Await.result(f, Duration.Inf)
// res1: Seq[Option[String]] = List(Some(got q1), None, Some(got q3))
For more details about Futures, see the Scala documentation on Futures and Promises.

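Assuming def getReport(str: String): Future[HttpResponse], start the three requests for each item before combining them, so that they run concurrently:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
Future.sequence(itemList.map { item =>
  // calling getReport inside the for-comprehension would chain the requests one after another
  val f1 = getReport(q1.replace("xyz", item))
  val f2 = getReport(q2.replace("xyz", item))
  val f3 = getReport(q3.replace("xyz", item))
  for {
    report1 <- f1
    report2 <- f2
    report3 <- f3
  } yield (report1, report2, report3)
}).onComplete {
  case Success(res) => // res: List[(HttpResponse, HttpResponse, HttpResponse)]
  case Failure(err) => // handle error
}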
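Combining the ideas from both answers, here is a self-contained sketch that fires the three queries for each item concurrently and merges them into one report per item. getReport is a hypothetical stub returning Future[String] in place of the real HTTP call, and joining the three reports with newlines is just an assumed way of "combining" them:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ReportSketch {
  val itemList = List("abc", "def", "ghi", "jkl", "mno", "pqr")
  val queries = List(
    "env='dev1'&id='123'&listitem='xyz'",
    "env='dev2'&id='1234'&listitem='xyz'",
    "env='dev3'&id='12345'&listitem='xyz'"
  )

  // Hypothetical stand-in for the real HTTP call: it just echoes the query.
  def getReport(q: String): Future[String] = Future(s"report($q)")

  // For one item: substitute it into all three queries, start the three
  // requests at once (traverse calls getReport for every query immediately)
  // and merge the three results into a single report.
  def reportFor(item: String): Future[String] =
    Future
      .traverse(queries)(q => getReport(q.replace("xyz", item)))
      .map(_.mkString("\n"))

  // One merged report per item; no thread is blocked while requests run.
  def allReports: Future[List[String]] = Future.traverse(itemList)(reportFor)
}
```

Only the caller that finally needs the values should block, for example with Await.result(ReportSketch.allReports, 30.seconds); everything before that stays asynchronous.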

Related

How to remove repeated elements from an iterator?

This is a follow-up to my previous question.
How would you write a function to filter out adjacent duplicates from a given iterator?
def remove[A](it: Iterator[A]): Iterator[A] = ???
remove("aaabccbbad".iterator).mkString // "abcbad"
P.S. The function should work even when the whole input does not fit in memory. That's why the function uses iterators.

You can use the following:
"aaabccbbad"
.map(ch => s"${ch}")
.reduce((s1, s2) => if(s1.takeRight(1) == s2) s1 else s1 + s2)
This results in
res0: String = abcbad
First, for convenience, I'm converting the chars to strings. Then I compare the last character of what I have accumulated so far with each following character, and append it only if they differ.
More generally, for any element type:
import scala.collection.mutable.ListBuffer
stream
  .map(el => ListBuffer(el))
  .reduce((lb1, lb2) => if (lb1.last == lb2.last) lb1 else lb1 ++= lb2) // reduce throws on an empty input
  .toList // note: the whole result is built in memory
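The compare-with-the-previous-element idea from this answer can also be kept lazy, so that it works on an iterator that does not fit in memory, by pairing each element with its predecessor through a sliding window of size 2. A sketch (the object name DedupSketch is just for illustration):

```scala
object DedupSketch {
  // Pairs every element with the one before it (None for the first element)
  // and keeps an element only when it differs from its predecessor.
  // sliding(2) holds just two elements at a time, so the input is consumed lazily.
  def remove[A](it: Iterator[A]): Iterator[A] = {
    val withPrev: Iterator[Option[A]] =
      Iterator.single(Option.empty[A]) ++ it.map(a => Some(a))
    withPrev.sliding(2).collect {
      case Seq(prev, cur @ Some(a)) if prev != cur => a
    }
  }
}
```

For an empty input the only window is the one-element Seq(None), which the pattern does not match, so the result is empty as well.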

A bit low-level, but this ensures that elements are only consumed as they are needed.
def remove[A](it: Iterator[A]): Iterator[A] =
new Iterator[A] {
private[this] var current: Option[A] = None
override def hasNext: Boolean =
it.hasNext || (current ne None)
override def next(): A = {
@annotation.tailrec
def loop(): A = (it.nextOption(), current) match {
case (Some(a), Some(c)) if (a == c) =>
loop()
case (sa @ Some(a), Some(c)) =>
current = sa
c
case (sa @ Some(a), None) =>
current = sa
loop()
case (None, Some(c)) =>
current = None
c
case (None, None) =>
Iterator.empty[A].next()
}
loop()
}
}
More or less the same as above, but using unfold instead.
def remove[A](it: Iterator[A]): Iterator[A] = {
type State = (Option[A], Option[A]) // value -> current
def process(state: State): Option[State] = state match {
case (Some(a), sc @ Some(c)) if (a == c) =>
Some(None -> sc)
case (sa @ Some(a), sc @ Some(c)) =>
Some(sc -> sa)
case (sa @ Some(a), None) =>
Some(None -> sa)
case (None, sc @ Some(c)) =>
Some(sc -> None)
case (None, None) =>
None
}
Iterator.unfold(it.nextOption() -> Option.empty[A]) { state =>
process(state).map {
case (value, current) =>
(value -> (it.nextOption() -> current))
}
} collect {
case Some(a) => a
}
}
(These could be made more efficient by using null instead of Option, but that requires special handling of primitives.)

I would transform the input iterator into an iterator of lists of duplicates and then just map every list to its head. To do that, I would use two functions from my previous questions:
function splitDupes: Iterator[A] => (List[A], Iterator[A]) (suggested here) to split out a prefix of duplicates and return a pair of the prefix and the rest
function split: Iterator[A] => Iterator[List[A]] (suggested here) to transform a given iterator into an iterator of lists of duplicates using splitDupes.
Thanks a lot to Kolmar for these suggestions. Using them, I can implement remove like this:
def remove[A](it: Iterator[A]): Iterator[A] = split(it).flatMap(_.headOption)
Here are the implementations of splitDupes and split, for reference:
def splitDupes[A](it: Iterator[A]): (List[A], Iterator[A]) = {
if (it.isEmpty) {
(Nil, Iterator.empty)
} else {
val head = it.next()
val (dupes, rest) = it.span(_ == head)
(head +: dupes.toList, rest)
}
}
def split[A](it: Iterator[A]): Iterator[List[A]] = {
Iterator.iterate(splitDupes(it))(x => splitDupes(x._2)).map(_._1).takeWhile(_.nonEmpty)
}

I think it's easier with tail recursion:
def remove[A](it: Iterator[A]): Iterator[A] = {
def removeLoop(result: Iterator[A], remaining: Iterator[A]): Iterator[A] = {
if(remaining.isEmpty) {
result
} else {
val e = remaining.next()
removeLoop(result ++ Iterator(e), remaining.dropWhile(a => a == e))
}
}
removeLoop(Iterator.empty[A], it)
}
remove("aaabccbbad".iterator).toList.map(_.toString).mkString // "abcbad"
Edit: As pointed out in the comments, the above implementation is not lazy.
A possible lazy implementation could look something like this:
import scala.collection.AbstractIterator
def remove[A](it: Iterator[A]): Iterator[A] = new AbstractIterator[A] {
  // buffered lets us peek at the next element without consuming it
  private[this] val buf = it.buffered
  def hasNext: Boolean = buf.hasNext
  def next(): A = {
    val e = buf.next()
    // drop the rest of e's run eagerly, so hasNext never reports a duplicate as a new element
    @scala.annotation.tailrec
    def skipDupes(): Unit =
      if (buf.hasNext && buf.head == e) { buf.next(); skipDupes() }
    skipDupes()
    e
  }
}
remove("aaabccbbad".iterator).take(2).foreach(print) // "ab"
remove("aaabccbbad".iterator).foreach(print) // "abcbad"

Scala slick retrieve data tables in parallel

I need to read data from two different tables (each with more than 100k rows) in the same database. I tried running the two queries in two separate Futures with a connection pool size of 50, but the performance doesn't seem to improve (the total time is around 5 seconds). Then I found this article:
So if you want to run multiple queries in parallel: no problem, just start them in separate Futures. However you won't have performance benefits, JDBC simply blocks a different Thread, not your main Thread of execution.
This suggests that all the threads will be stuck in JDBC and processed sequentially. Is this true even if my connection pool size is 50? If so, could you suggest an efficient way to deal with tables this large (for example, loading the data in less than 2 seconds)?
Here is my piece of code:
// plus the simple._ import of your Slick 2.x driver
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
case class User(name: String, age: Int)
class Users(tag: Tag) extends Table[User](tag, "User") {
  def user_id = column[Long]("user_id")
  def name = column[String]("user_name")
  def age = column[Int]("user_age")
  // user_id is not part of the case class, so only name and age are mapped
  def * = (name, age) <> (User.tupled, User.unapply)
}
val users = TableQuery[Users]
case class Patron(name: String, patronType: Int)
class Patrons(tag: Tag) extends Table[Patron](tag, "Patron") {
  def patron_id = column[Long]("patron_id")
  def name = column[String]("patron_name")
  def patronType = column[Int]("patron_type") // `type` is a reserved word in Scala
  def * = (name, patronType) <> (Patron.tupled, Patron.unapply)
}
val patrons = TableQuery[Patrons]
def getUsers(implicit session: Session): Future[Map[String, Int]] = Future {
  val allUserQuery = for (user <- users) yield (user.name, user.age)
  allUserQuery.run.toMap
}
def getPatrons(implicit session: Session): Future[Map[String, Int]] = Future {
  val allPatronQuery = for (patron <- patrons) yield (patron.name, patron.patronType)
  allPatronQuery.run.toMap
}
val (userMap, patronMap) = Await.result(
  for {
    userData <- getUsers
    patronData <- getPatrons
  } yield (userData, patronData),
  10.seconds)
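One thing worth checking in this code: inside the for-comprehension, getPatrons is only called in the flatMap callback of getUsers, i.e. after the users query has finished, so the two Futures run one after the other regardless of the pool size. A minimal sketch of the difference, using hypothetical stubs that simulate blocking queries with Thread.sleep instead of real Slick calls (the object and method names are illustrative only):

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelQueriesSketch {
  // Hypothetical stand-ins for getUsers / getPatrons: each simulates a blocking
  // JDBC call; `blocking` tells the global pool that the thread is parked.
  def getUsers(): Future[Map[String, Int]] = Future {
    blocking(Thread.sleep(300))
    Map("alice" -> 30, "bob" -> 25)
  }

  def getPatrons(): Future[Map[String, Int]] = Future {
    blocking(Thread.sleep(300))
    Map("carol" -> 1)
  }

  // Sequential: getPatrons() is only invoked once getUsers() has completed
  // (roughly 600 ms here).
  def sequential: Future[(Map[String, Int], Map[String, Int])] =
    for {
      u <- getUsers()
      p <- getPatrons()
    } yield (u, p)

  // Concurrent: both futures are created, and start running, before they are
  // combined, so the two waits overlap (roughly 300 ms here).
  def concurrent: Future[(Map[String, Int], Map[String, Int])] = {
    val usersF = getUsers()
    val patronsF = getPatrons()
    for {
      u <- usersF
      p <- patronsF
    } yield (u, p)
  }
}
```

usersF.zip(patronsF) is an equivalent way to combine the two already-started futures. Separately, both futures in the question share the same implicit Session, i.e. a single JDBC connection, so even when started together they may not run in parallel; giving each Future its own session (or, in Slick 3, calling db.run for each query, which takes connections from the pool) is probably needed as well.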

Defining and reading a nullable date column in Slick 3.x

I have a table with a column of type date. This column accepts null values, so I declared it as an Option (see the field perDate below). The problem is that the implicit conversion from/to java.time.LocalDate/java.sql.Date apparently doesn't handle this: reading from the table when perDate is null fails with the error:
slick.SlickException: Read NULL value (null) for ResultSet column <computed>
This is the Slick table definition, including the implicit function:
import java.sql.Date
import java.time.LocalDate
class FormulaDB(tag: Tag) extends Table[Formula](tag, "formulas") {
def sk = column[Int]("sk", O.PrimaryKey, O.AutoInc)
def name = column[String]("name")
def descrip = column[Option[String]]("descrip")
def formula = column[Option[String]]("formula")
def notes = column[Option[String]]("notes")
def periodicity = column[Int]("periodicity")
def perDate = column[Option[LocalDate]]("per_date")(localDateColumnType)
def * = (sk, name, descrip, formula, notes, periodicity, perDate) <>
((Formula.apply _).tupled, Formula.unapply)
implicit val localDateColumnType = MappedColumnType.base[Option[LocalDate], Date](
{
case Some(localDate) => Date.valueOf(localDate)
case None => null
},{
sqlDate => if (sqlDate != null) Some(sqlDate.toLocalDate) else None
}
)
}

Actually your implicit conversion from/to java.time.LocalDate/java.sql.Date is not incorrect.
I faced the same error, and after some research I found that the node created by the Slick SQL compiler is actually of type MappedJdbcType[Scala.Option -> LocalDate], and not Option[LocalDate].
That is why, when the mapping compiler creates the column converter for your def perDate, it creates a BaseResultConverter rather than an Option result converter.
Here is the Slick code for the base converter:
def base[T](ti: JdbcType[T], name: String, idx: Int) = (ti.scalaType match {
case ScalaBaseType.byteType => new BaseResultConverter[Byte](ti.asInstanceOf[JdbcType[Byte]], name, idx)
case ScalaBaseType.shortType => new BaseResultConverter[Short](ti.asInstanceOf[JdbcType[Short]], name, idx)
case ScalaBaseType.intType => new BaseResultConverter[Int](ti.asInstanceOf[JdbcType[Int]], name, idx)
case ScalaBaseType.longType => new BaseResultConverter[Long](ti.asInstanceOf[JdbcType[Long]], name, idx)
case ScalaBaseType.charType => new BaseResultConverter[Char](ti.asInstanceOf[JdbcType[Char]], name, idx)
case ScalaBaseType.floatType => new BaseResultConverter[Float](ti.asInstanceOf[JdbcType[Float]], name, idx)
case ScalaBaseType.doubleType => new BaseResultConverter[Double](ti.asInstanceOf[JdbcType[Double]], name, idx)
case ScalaBaseType.booleanType => new BaseResultConverter[Boolean](ti.asInstanceOf[JdbcType[Boolean]], name, idx)
case _ => new BaseResultConverter[T](ti.asInstanceOf[JdbcType[T]], name, idx) {
override def read(pr: ResultSet) = {
val v = ti.getValue(pr, idx)
if(v.asInstanceOf[AnyRef] eq null) throw new SlickException("Read NULL value ("+v+") for ResultSet column "+name)
v
}
}
}).asInstanceOf[ResultConverter[JdbcResultConverterDomain, T]]
Unfortunately I have no real fix for this problem; as a workaround, I suggest mapping your perDate property as follows:
import java.sql.Date
import java.time.LocalDate
class FormulaDB(tag: Tag) extends Table[Formula](tag, "formulas") {
def sk = column[Int]("sk", O.PrimaryKey, O.AutoInc)
def name = column[String]("name")
def descrip = column[Option[String]]("descrip")
def formula = column[Option[String]]("formula")
def notes = column[Option[String]]("notes")
def periodicity = column[Int]("periodicity")
def perDate = column[Option[Date]]("per_date")
def toLocalDate(time: Option[Date]): Option[LocalDate] = time.map(t => t.toLocalDate)
def toSQLDate(localDate: Option[LocalDate]): Option[Date] = localDate.map(ld => Date.valueOf(ld))
private type FormulaEntityTupleType = (Int, String, Option[String], Option[String], Option[String], Int, Option[Date])
private val formulaShapedValue = (sk, name, descrip, formula, notes, periodicity, perDate).shaped[FormulaEntityTupleType]
private val toFormulaRow: (FormulaEntityTupleType => Formula) = { formulaTuple => {
Formula(formulaTuple._1, formulaTuple._2, formulaTuple._3, formulaTuple._4, formulaTuple._5, formulaTuple._6, toLocalDate(formulaTuple._7))
}
}
private val toFormulaTuple: (Formula => Option[FormulaEntityTupleType]) = { formulaRow =>
Some((formulaRow.sk, formulaRow.name, formulaRow.descrip, formulaRow.formula, formulaRow.notes, formulaRow.periodicity, toSQLDate(formulaRow.perDate)))
}
def * = formulaShapedValue <> (toFormulaRow, toFormulaTuple)
Hopefully this answer doesn't come too late.

I'm pretty sure the problem is that you're returning null from your mapping function instead of None.
Try rewriting your mapping function as a function from LocalDate to Date:
implicit val localDateColumnType = MappedColumnType.base[LocalDate, Date](
{
localDate => Date.valueOf(localDate)
},{
sqlDate => sqlDate.toLocalDate
}
)
Alternatively, mapping from Option[LocalDate] to Option[Date] should work:
implicit val localDateColumnType =
MappedColumnType.base[Option[LocalDate], Option[Date]](
{
localDateOption => localDateOption.map(Date.valueOf)
},{
sqlDateOption => sqlDateOption.map(_.toLocalDate)
}
)

repeat action several times and collect the result

I have an action which I need to perform on an object several times, and then collect the result of each action on that object.
Basically it looks like this:
def one_action = { obj ->
def eval_object = process(obj)
eval_object.processed = true
return eval_object
}
def multiple_actions = { obj, n, action ->
def result = []
n.times {
result << action(obj)
}
return result
}
println multiple_actions(object, 10, one_action)
Is there a way to omit declaration of def result = [] and return the list directly from the closure?

You can collect the range, starting from zero:
def one_action = { obj ->
"a $obj"
}
def multiple_actions = { obj, n, action ->
(0..<n).collect { action obj }
}
assert multiple_actions("b", 3, one_action) == ["a b"] * 3

scala parallel collections: Idiomatic way of having thread-local-variables for worker threads

The progress function below is my worker function. I need to give it access to some classes which are costly to create/acquire. Is there any standard machinery for thread-local variables in the libraries for this? Or will I have to write an object pool manager myself?
import java.util.concurrent.atomic.AtomicInteger
object Start extends App {
def progress(): Unit = {
val current = counter.getAndIncrement
if (current % 100 == 0) {
val perc = current.toFloat * 100 / totalPosts
print(f"\r$perc%4.2f%%")
}
}
val lexicon = new Global()
def processTopic(forumId: Int, topicId: Int): Unit = {
val (topic, posts) = ProcessingQueries.getTopicAndPosts(forumId, topicId)
progress()
}
val (fid, tl) = ProcessingQueries.getAllTopics("hon")
val totalPosts = tl.size
val counter = new AtomicInteger(0)
val par = tl.par
par.foreach { topic_id =>
processTopic(fid,topic_id)
}
}

Replaced the previous answer. This does the trick nicely and tidily:
import java.util.Properties
import edu.stanford.nlp.pipeline.StanfordCoreNLP
object MyAnnotator extends ThreadLocal[StanfordCoreNLP] {
val props = new Properties()
props.put("annotators", "tokenize,ssplit,pos,lemma,parse")
props.put("ssplit.newlineIsSentenceBreak", "two")
props.put("parse.maxlen", "40")
override def initialValue = new StanfordCoreNLP(props)
}
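To show how such a ThreadLocal is used by the workers, here is a self-contained sketch: each job calls get() on the thread-local, so the expensive object is built at most once per worker thread and then reused. java.security.MessageDigest stands in for StanfordCoreNLP, and a fixed pool of two threads stands in for the worker threads behind .par; the names ThreadLocalSketch, Digest and run are hypothetical.

```scala
import java.security.MessageDigest
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ThreadLocalSketch {
  // Counts how many digests were built, to show it is at most one per thread.
  val created = new AtomicInteger(0)

  // MessageDigest stands in for a costly, non-thread-safe resource such as a
  // StanfordCoreNLP pipeline: each thread lazily gets its own instance.
  object Digest extends ThreadLocal[MessageDigest] {
    override def initialValue(): MessageDigest = {
      created.incrementAndGet()
      MessageDigest.getInstance("SHA-256")
    }
  }

  // Runs `tasks` jobs on `threads` worker threads; every job reuses the
  // instance belonging to whichever thread executes it.
  def run(tasks: Int, threads: Int): List[Int] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val jobs = Future.traverse((1 to tasks).toList) { i =>
        Future(Digest.get().digest(s"post $i".getBytes("UTF-8")).length)
      }
      Await.result(jobs, 10.seconds)
    } finally pool.shutdown()
  }
}
```

In the question's code, the equivalent would be calling MyAnnotator.get() inside processTopic; with 100 jobs on 2 threads, created ends up at 1 or 2 rather than 100.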
