What does the Queue() function do in Chisel? - riscv

I was reading the source code of rocket-chip. In rocc.scala, under rocket/src/main/scala/, there is an example, AccumulatorExample, of using RoCC. In the first part of the code there is a call to Queue() whose purpose I couldn't figure out:
val n = 4
val regfile = Mem(UInt(width = params(XprLen)), n)
val busy = Vec.fill(n){Reg(init=Bool(false))}
val cmd = Queue(io.cmd)
val funct = cmd.bits.inst.funct
val addr = cmd.bits.inst.rs2(log2Up(n)-1,0)
val doWrite = funct === UInt(0)
val doRead = funct === UInt(1)
val doLoad = funct === UInt(2)
val doAccum = funct === UInt(3)
val memRespTag = io.mem.resp.bits.tag(log2Up(n)-1,0)
Thanks

Queue is a Module providing a hardware FIFO queue. The Queue(...) factory instantiates that module, connects the given DecoupledIO to its enqueue side, and returns the dequeue side. So your code is setting io.cmd as the source of the queue, and cmd is the queue's output interface. Hope this helps!
Constructor:
  Queue(enq: DecoupledIO, entries: Int)
    enq: DecoupledIO source for the queue
    entries: size of the queue
Interface:
  .io.enq: DecoupledIO source (flipped)
  .io.deq: DecoupledIO sink
  .io.count: UInt count of elements currently in the queue
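For illustration, here is a minimal self-contained sketch of the same pattern in current Chisel 3 syntax (the rocket-chip snippet above uses the older Chisel 2 API; the module and signal names here are made up):
import chisel3._
import chisel3.util._

class QueueExample extends Module {
  val io = IO(new Bundle {
    val in  = Flipped(Decoupled(UInt(8.W))) // producer side
    val out = Decoupled(UInt(8.W))          // consumer side
  })
  // Queue(enq, entries) instantiates a 4-entry FIFO, connects io.in to its
  // enqueue port, and returns the dequeue port as a DecoupledIO, the same
  // way Queue(io.cmd) returns the dequeued command stream in the RoCC example.
  io.out <> Queue(io.in, 4)
}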

Smart cast to 'Bitmap!' is impossible, because 'textBitmap' is a local variable that is captured by a changing closure

Whenever I build my project I get this error. Here is the Kotlin class code:
var textBitmap: Bitmap? = null
dynamicItem.dynamicText[imageKey]?.let { drawingText ->
    dynamicItem.dynamicTextPaint[imageKey]?.let { drawingTextPaint ->
        drawTextCache[imageKey]?.let {
            textBitmap = it
        } ?: kotlin.run {
            textBitmap = Bitmap.createBitmap(drawingBitmap.width, drawingBitmap.height, Bitmap.Config.ARGB_8888)
            val drawRect = Rect(0, 0, drawingBitmap.width, drawingBitmap.height)
            val textCanvas = Canvas(textBitmap)
            drawingTextPaint.isAntiAlias = true
            val fontMetrics = drawingTextPaint.getFontMetrics()
            val top = fontMetrics.top
            val bottom = fontMetrics.bottom
            val baseLineY = drawRect.centerY() - top / 2 - bottom / 2
            textCanvas.drawText(drawingText, drawRect.centerX().toFloat(), baseLineY, drawingTextPaint)
            drawTextCache.put(imageKey, textBitmap as Bitmap)
        }
    }
}
I can't figure out how to fix it.
Instead of nesting let like that, I would prefer guard clauses:
val drawingText = dynamicItem.dynamicText[imageKey] ?: return // or you could assign an empty string `?: ""`
val drawingTextPaint = dynamicItem.dynamicTextPaint[imageKey] ?: return
val textBitmap: Bitmap = drawTextCache[imageKey]
    ?: Bitmap.createBitmap(drawingBitmap.width, drawingBitmap.height, Bitmap.Config.ARGB_8888).applyCanvas {
        val drawRect = Rect(0, 0, drawingBitmap.width, drawingBitmap.height)
        val fontMetrics = drawingTextPaint.getFontMetrics()
        val top = fontMetrics.top
        val bottom = fontMetrics.bottom
        val baseLineY = drawRect.centerY() - top / 2 - bottom / 2
        drawingTextPaint.isAntiAlias = true
        drawText(drawingText, drawRect.centerX().toFloat(), baseLineY, drawingTextPaint)
    }
drawTextCache.put(imageKey, textBitmap)
Basically Kotlin can't smart cast textBitmap to a non-null Bitmap inside that lambda. You're probably getting the error on the Canvas(textBitmap) call, which can't take a null parameter, and the compiler can't guarantee textBitmap isn't null at that moment.
It's a limitation of lambdas referencing external vars that can be changed. I think it's because a lambda could potentially be run at some other time, so no guarantees can be made about what's happening to that external variable and whether something else could have modified it. I don't know the details; there's some chat here if you like.
The fix is pretty easy though, if all you're doing is creating a textBitmap variable and assigning something to it:
// Assign it as a result of the expression - no need to create a var first and keep
// changing the value, no need for a temporary null value, it can just be a val
val textBitmap: Bitmap? =
    dynamicItem.dynamicText[imageKey]?.let { drawingText ->
        dynamicItem.dynamicTextPaint[imageKey]?.let { drawingTextPaint ->
            drawTextCache[imageKey]
                ?: Bitmap.createBitmap(drawingBitmap.width, drawingBitmap.height, Bitmap.Config.ARGB_8888).apply {
                    val drawRect = Rect(0, 0, drawingBitmap.width, drawingBitmap.height)
                    val textCanvas = Canvas(this)
                    drawingTextPaint.isAntiAlias = true
                    val fontMetrics = drawingTextPaint.getFontMetrics()
                    val top = fontMetrics.top
                    val bottom = fontMetrics.bottom
                    val baseLineY = drawRect.centerY() - top / 2 - bottom / 2
                    textCanvas.drawText(drawingText, drawRect.centerX().toFloat(), baseLineY, drawingTextPaint)
                    drawTextCache.put(imageKey, this)
                }
        }
    }
I'd recommend breaking the bitmap creation part out into its own function for readability, and personally I'd avoid the nested lets (because it's not immediately obvious what you get in which situation), but that's a style choice.

treeAggregate use case explanation

I am trying to understand treeAggregate, but there aren't many examples online.
Does the following code merge the elements within each partition and then call makeSummary, doing the same for each partition in parallel, and then sum the partition results and summarize them again? And with depth set to, say, 5, is this repeated 5 times?
The result I want is to keep summarizing the arrays until only one remains.
val summary = input.transform(rdd => {
  rdd.treeAggregate(initialSet)(addToSet, mergePartitionSets, 5)
  // this returns Array[Double], not an RDD, but still
})

val initialSet = Array.empty[Double]

def addToSet = (s: Array[Double], v: (Int, Array[Double])) => {
  val p = s ++ v._2
  val ret = makeSummary(p, 10000)
  ret
}

val mergePartitionSets = (p1: Array[Double], p2: Array[Double]) => {
  val p = p1 ++ p2
  val ret = makeSummary(p, 10000)
  ret
}
// makeSummary randomly selects half of the points of p
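For reference, here is a minimal self-contained sketch of how the two functions are applied (the SparkContext setup and the numbers are made up for the example). Note that per the Spark docs, depth is the suggested depth of the combine tree, not a repetition count:
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("treeAggregateDemo").setMaster("local[4]"))
val rdd = sc.parallelize(1 to 100, numSlices = 8)

// seqOp folds elements into each partition's accumulator; combOp then merges
// the eight partial results pairwise up a tree of height at most `depth`.
val sum = rdd.treeAggregate(0)(
  (acc, x) => acc + x, // seqOp: runs inside each partition
  (a, b) => a + b,     // combOp: merges partial results up the tree
  depth = 2
)
// sum == 5050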

Why am I getting a race condition in multi-threading scala?

I am trying to parallelise a p-norm calculation over an array.
To achieve that I try the following. I understand I can solve this differently, but I am interested in understanding where the race condition occurs:
val toSum = Array(0, 1, 2, 3, 4, 5, 6)

// Calculate the sum over a segment of an array
def sumSegment(a: Array[Int], p: Double, s: Int, t: Int): Int = {
  val res = { for (i <- s until t) yield scala.math.pow(a(i), p) }.reduceLeft(_ + _)
  res.toInt
}

// Calculate the p-norm over an Array a
def parallelpNorm(a: Array[Int], p: Double): Double = {
  var acc = 0L
  // The worker that should calculate the sum over a slice of the array
  class sumSegmenter(s: Int, t: Int) extends Thread {
    override def run() {
      // Calculate the sum over the slice
      val subsum = sumSegment(a, p, s, t)
      // Add the sum of the slice to the accumulator in a synchronized fashion
      val x = new AnyRef {}
      x.synchronized {
        acc = acc + subsum
      }
    }
  }
  val split = a.size / 2
  val seg_one = new sumSegmenter(0, split)
  val seg_two = new sumSegmenter(split, a.size)
  seg_one.start
  seg_two.start
  seg_one.join
  seg_two.join
  scala.math.pow(acc, 1.0 / p)
}

println(parallelpNorm(toSum, 2))
Expected output is 9.5393920142 but instead some runs give me 9.273618495495704 or even 2.23606797749979.
Any recommendations as to where the race condition could be happening?
The problem has been explained in the other answer, but a better way to avoid this race condition and improve performance is to use an AtomicInteger:
import java.util.concurrent.atomic.AtomicInteger

// Calculate the p-norm over an Array a
def parallelpNorm(a: Array[Int], p: Double): Double = {
  val acc = new AtomicInteger(0)
  // The worker that should calculate the sum over a slice of the array
  class sumSegmenter(s: Int, t: Int) extends Thread {
    override def run() {
      // Calculate the sum over the slice
      val subsum = sumSegment(a, p, s, t)
      // Add the sum of the slice to the accumulator atomically
      acc.getAndAdd(subsum)
    }
  }
  val split = a.length / 2
  val seg_one = new sumSegmenter(0, split)
  val seg_two = new sumSegmenter(split, a.length)
  seg_one.start()
  seg_two.start()
  seg_one.join()
  seg_two.join()
  scala.math.pow(acc.get, 1.0 / p)
}
Modern processors can perform atomic operations without blocking, which can be much faster than explicit synchronisation. In my tests this runs twice as fast as the original code (with correct placement of x).
Move val x = new AnyRef {} outside sumSegmenter (that is, into parallelpNorm). The problem is that each thread creates and locks its own mutex rather than sharing one, so the synchronized blocks never actually exclude each other.
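A minimal sketch of that fix, showing only the changed body of parallelpNorm:
def parallelpNorm(a: Array[Int], p: Double): Double = {
  var acc = 0L
  val x = new AnyRef {} // one lock object, created once and shared by both threads
  class sumSegmenter(s: Int, t: Int) extends Thread {
    override def run() {
      val subsum = sumSegment(a, p, s, t)
      x.synchronized { acc = acc + subsum } // both threads now contend on the same monitor
    }
  }
  val split = a.size / 2
  val seg_one = new sumSegmenter(0, split)
  val seg_two = new sumSegmenter(split, a.size)
  seg_one.start()
  seg_two.start()
  seg_one.join()
  seg_two.join()
  scala.math.pow(acc, 1.0 / p)
}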

Spark,Graphx program does not utilize cpu and memory

I have a function that takes the neighbors of a node (for the neighbors I use a broadcast variable) and the id of the node itself, and calculates the closeness centrality for that node. I map each node of the graph to the result of that function. When I open the task manager, the CPU is not utilized at all, as if nothing is running in parallel, and the same goes for memory. Yet every node should execute the function in parallel, the data is large, and it takes time to complete, so it is not as if it doesn't need the resources. Any help is truly appreciated, thank you.
For loading the graph i use val graph = GraphLoader.edgeListFile(sc, path).cache
object ClosenessCentrality {

  case class Vertex(id: VertexId)

  def run(graph: Graph[Int, Float], sc: SparkContext): Unit = {
    // Have to reverse edges and make the graph undirected because it is bipartite
    val neighbors = CollectNeighbors.collectWeightedNeighbors(graph).collectAsMap()
    val bNeighbors = sc.broadcast(neighbors)
    val result = graph.vertices.map(f => shortestPaths(f._1, bNeighbors.value))
    //result.coalesce(1)
    result.count()
  }

  def shortestPaths(source: VertexId, neighbors: Map[VertexId, Map[VertexId, Float]]): Double = {
    val predecessors = new mutable.HashMap[VertexId, ListBuffer[VertexId]]()
    val distances = new mutable.HashMap[VertexId, Double]()
    val q = new FibonacciHeap[Vertex]
    val nodes = new mutable.HashMap[VertexId, FibonacciHeap.Node[Vertex]]()
    distances.put(source, 0)
    for (w <- neighbors) {
      if (w._1 != source)
        distances.put(w._1, Int.MaxValue)
      predecessors.put(w._1, ListBuffer[VertexId]())
      val node = q.insert(Vertex(w._1), distances(w._1))
      nodes.put(w._1, node)
    }
    while (!q.isEmpty) {
      val u = q.minNode
      val node = u.data.id
      q.removeMin()
      // discover paths
      //println("Current node is:" + node + " " + neighbors(node).size)
      for (w <- neighbors(node).keys) {
        //print("Neighbor is" + w)
        val alt = distances(node) + neighbors(node)(w)
        // if (distances(w) > alt) {
        //   distances(w) = alt
        //   q.decreaseKey(nodes(w), alt)
        // }
        // if (distances(w) == alt)
        //   predecessors(w).+=(node)
        if (alt < distances(w)) {
          distances(w) = alt
          predecessors(w).+=(node)
          q.decreaseKey(nodes(w), alt)
        }
      } // for
    }
    val sum = distances.values.sum
    sum
  }
}
To provide somewhat of an answer to your original question, I suspect that your RDD only has a single partition and is thus processed on a single core.
The edgeListFile method has an argument to specify the minimum number of partitions you want.
Also, you can use repartition to get more partitions.
You mentioned coalesce, but by default that only reduces the number of partitions; see this question: Spark Coalesce More Partitions
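A short sketch of both options (the partition count of 16 is an arbitrary example; recent GraphX versions name the loader argument numEdgePartitions):
// Ask GraphLoader for more edge partitions up front...
val graph = GraphLoader.edgeListFile(sc, path, numEdgePartitions = 16).cache()

// ...or repartition the vertex RDD before the expensive map:
val result = graph.vertices
  .repartition(16)
  .map(f => shortestPaths(f._1, bNeighbors.value))
result.count()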

Scala Future/Promise fast-fail pipeline

I want to launch two or more Futures/Promises in parallel and fail as soon as even one of the launched Futures/Promises fails, without waiting for the rest to complete.
What is the most idiomatic way to compose this pipeline in Scala?
EDIT: more contextual information.
I have to launch two external processes, one writing to a fifo file and another reading from it. Say the writer process fails: the reader thread might then hang forever waiting for any input from the file. So I want to launch both processes in parallel and fail fast if either Future/Promise fails, without waiting for the completion of the other.
Below is sample code to be more precise; the commands are not exactly cat and tail, I have used them for brevity.
val future1 = Future { executeShellCommand("cat file.txt > fifo.pipe") }
val future2 = Future { executeShellCommand("tail fifo.pipe") }
If I understand the question correctly, what we are looking for is a fail-fast sequence implementation, which is akin to a failure-biased version of firstCompletedOf
Here, we eagerly register a failure callback in case one of the futures fails early on, ensuring that we fail as soon as any of the futures fail.
import scala.concurrent.{Future, Promise}
import scala.util.{Success, Failure}
import scala.concurrent.ExecutionContext.Implicits.global
def failFast[T](futures: Seq[Future[T]]): Future[Seq[T]] = {
  val promise = Promise[Seq[T]]()
  // tryFailure rather than failure: if several futures fail, only the first
  // failure completes the promise instead of throwing IllegalStateException
  futures.foreach { f => f.onFailure { case ex => promise.tryFailure(ex) } }
  val res = Future.sequence(futures)
  promise.completeWith(res).future
}
In contrast to Future.sequence, this implementation will fail as soon as any of the futures fail, regardless of ordering.
Let's show that with an example:
import scala.util.Try

// helper method to measure time
def resilientTime[T](t: => T): (Try[T], Long) = {
  val t0 = System.currentTimeMillis
  val res = Try(t)
  (res, System.currentTimeMillis - t0)
}
import scala.concurrent.duration._
import scala.concurrent.Await
The first future will fail (failure in 2 seconds):
val f1 = Future[Int]{Thread.sleep(2000); throw new Exception("boom")}
val f2 = Future[Int]{Thread.sleep(5000); 42}
val f3 = Future[Int]{Thread.sleep(10000); 101}
val res = failFast(Seq(f1,f2,f3))
resilientTime(Await.result(res, 10.seconds))
// res: (scala.util.Try[Seq[Int]], Long) = (Failure(java.lang.Exception: boom),1998)
The last future will fail, also in 2 seconds (note the order in the sequence construction):
val f1 = Future[Int]{Thread.sleep(2000); throw new Exception("boom")}
val f2 = Future[Int]{Thread.sleep(5000); 42}
val f3 = Future[Int]{Thread.sleep(10000); 101}
val res = failFast(Seq(f3,f2,f1))
resilientTime(Await.result(res, 10.seconds))
// res: (scala.util.Try[Seq[Int]], Long) = (Failure(java.lang.Exception: boom),1998)
Comparing with Future.sequence where failure depends on the ordering (failure in 10 seconds):
val f1 = Future[Int]{Thread.sleep(2000); throw new Exception("boom")}
val f2 = Future[Int]{Thread.sleep(5000); 42}
val f3 = Future[Int]{Thread.sleep(10000); 101}
val seq = Seq(f3,f2,f1)
resilientTime(Await.result(Future.sequence(seq), 10.seconds))
//res: (scala.util.Try[Seq[Int]], Long) = (Failure(java.lang.Exception: boom),10000)
Use Future.sequence:
val both = Future.sequence(Seq(
  firstFuture,
  secondFuture))
This is the correct way to aggregate two or more futures, where the failure of one fails the aggregated future and the aggregated future completes when all inner futures complete. An older version of this answer suggested a for-comprehension which, while very common, would not reject immediately if one of the futures rejects, but rather wait for it.
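For contrast, here is a sketch of that for-comprehension pattern (firstFuture and secondFuture stand for already-started futures; the Int element type is assumed for the sketch):
// The comprehension desugars to flatMap, so a failure of secondFuture is only
// observed after firstFuture completes: even if secondFuture fails right away,
// `both` cannot fail before firstFuture finishes.
val both: Future[(Int, Int)] = for {
  a <- firstFuture
  b <- secondFuture
} yield (a, b)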
Zip the futures
val f1 = Future { doSomething() }
val f2 = Future { doSomething() }
val resultF = f1 zip f2
The resultF future fails if either f1 or f2 fails.
Time taken to resolve is min(f1 time, f2 time):
scala> import scala.util._
import scala.util._
scala> import scala.concurrent._
import scala.concurrent._
scala> import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.ExecutionContext.Implicits.global
scala> val f = Future { Thread.sleep(10000); throw new Exception("f") }
f: scala.concurrent.Future[Nothing] = scala.concurrent.impl.Promise$DefaultPromise#da1f03e
scala> val g = Future { Thread.sleep(20000); throw new Exception("g") }
g: scala.concurrent.Future[Nothing] = scala.concurrent.impl.Promise$DefaultPromise#634a98e3
scala> val x = f zip g
x: scala.concurrent.Future[(Nothing, Nothing)] = scala.concurrent.impl.Promise$DefaultPromise#3447e854
scala> x onComplete { case Success(x) => println(x) case Failure(th) => println(th)}
result: java.lang.Exception: f
