How to use the content of a self-defined register in an accelerator? - riscv

I am confused about how to retrieve a register's content. The following is my code:
val one = Reg(UInt(5.W))
val two = Reg(UInt(5.W))
val three = Reg(UInt(5.W))
How can I implement this:
val addend = cmd.bits.rs1 + "value in register three"
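In Chisel a Reg is read simply by referring to its Scala val name, so a minimal sketch (assuming cmd and the three registers are in scope of the same module) would be:
val addend = cmd.bits.rs1 + three   // uses the value currently held in register `three`
// The register is written with `:=` and the update takes effect on the next clock edge, e.g.:
three := three + 1.U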

Related

Join apache spark dataframes properly with scala avoiding null values

Hello everyone!
I have two DataFrames in Apache Spark (2.3) and I want to join them properly. I will explain below what I mean by 'properly'. First of all, the two dataframes hold the following information:
nodeDf: ( id, year, title, authors, journal, abstract )
edgeDf: ( srcId, dstId, label )
The label is 1 if node1 is connected to node2 and 0 otherwise.
I want to combine these two dataframes into one dataframe with the following information:
JoinedDF: ( id_from, year_from, title_from, journal_from, abstract_from, id_to, year_to, title_to, journal_to, abstract_to, time_dist )
time_dist = abs(year_from - year_to)
By 'properly' I mean that the query must be as fast as possible and the result must not contain null rows or cells (values in a row).
I have tried the following, but it took 500-540 seconds to execute the query and the final dataframe contains null values. I don't even know if the dataframes were joined correctly.
I want to mention that the node file from which I create the nodeDF has 27770 rows and the edge file (edgeDf) has 615512 rows.
Code:
val spark = SparkSession.builder().master("local[*]").appName("Logistic Regression").getOrCreate()
val sc = spark.sparkContext
val data = sc.textFile("resources/data/training_set.txt").map(line =>{
val fields = line.split(" ")
(fields(0),fields(1), fields(2).toInt)
})
val data2 = sc.textFile("resources/data/test_set.txt").map(line =>{
val fields = line.split(" ")
(fields(0),fields(1))
})
import spark.implicits._
val trainingDF = data.toDF("srcId","dstId", "label")
val testDF = data2.toDF("srcId","dstId")
val infoRDD = spark.read.option("header","false").option("inferSchema","true").format("csv").load("resources/data/node_information.csv")
val infoDF = infoRDD.toDF("srcId","year","title","authors","jurnal","abstract")
println("Showing linksDF sample...")
trainingDF.show(5)
println("Rows of linksDF: ",trainingDF.count())
println("Showing infoDF sample...")
infoDF.show(2)
println("Rows of infoDF: ",infoDF.count())
println("Joining linksDF and infoDF...")
var joinedDF = trainingDF.as("a").join(infoDF.as("b"),$"a.srcId" === $"b.srcId")
println(joinedDF.count())
joinedDF = joinedDF.select($"a.srcId",$"a.dstId",$"a.label",$"b.year",$"b.title",$"b.authors",$"b.jurnal",$"b.abstract")
joinedDF.show(5)
val graphX = new GraphX()
val pageRankDf =graphX.computePageRank(spark,"resources/data/training_set.txt",0.0001)
println("Joining joinedDF and pageRankDf...")
joinedDF = joinedDF.as("a").join(pageRankDf.as("b"),$"a.srcId" === $"b.nodeId")
var dfWithRanks = joinedDF.select("srcId","dstId","label","year","title","authors","jurnal","abstract","rank").withColumnRenamed("rank","pgRank")
dfWithRanks.show(5)
println("Renameming joinedDF...")
dfWithRanks = dfWithRanks
.withColumnRenamed("srcId","id_from")
.withColumnRenamed("dstId","id_to")
.withColumnRenamed("year","year_from")
.withColumnRenamed("title","title_from")
.withColumnRenamed("authors","authors_from")
.withColumnRenamed("jurnal","jurnal_from")
.withColumnRenamed("abstract","abstract_from")
var infoDfRenamed = dfWithRanks
.withColumnRenamed("id_from","id_from")
.withColumnRenamed("id_to","id_to")
.withColumnRenamed("year_from","year_to")
.withColumnRenamed("title_from","title_to")
.withColumnRenamed("authors_from","authors_to")
.withColumnRenamed("jurnal_from","jurnal_to")
.withColumnRenamed("abstract_from","abstract_to").select("id_to","year_to","title_to","authors_to","jurnal_to","jurnal_to")
var finalDF = dfWithRanks.as("a").join(infoDF.as("b"),$"a.id_to" === $"b.srcId")
finalDF = finalDF
.withColumnRenamed("year","year_to")
.withColumnRenamed("title","title_to")
.withColumnRenamed("authors","authors_to")
.withColumnRenamed("jurnal","jurnal_to")
.withColumnRenamed("abstract","abstract_to")
println("Dropping unused columns from joinedDF...")
finalDF = finalDF.drop("srcId")
finalDF.show(5)
Here are my results!
Avoid all calculations and code related to pgRank! Is there any proper way to make this join work?
You can filter your data first and then join; that way you will avoid nulls:
df.filter($"ColumnName".isNotNull)
Use the <=> operator in your join column condition:
var joinedDF = trainingDF.as("a").join(infoDF.as("b"),$"a.srcId" <=> $"b.srcId")
In Spark 2.1 or greater there is the function eqNullSafe:
var joinedDF = trainingDF.join(infoDF,trainingDF("srcId").eqNullSafe(infoDF("srcId")))
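Putting these pieces together for the original goal, here is a minimal sketch (using the question's trainingDF and infoDF; the column-suffixing step is illustrative, not the asker's code) that joins the node information onto both endpoints with inner joins, so no null rows survive, and derives time_dist:
import org.apache.spark.sql.functions.{abs, col}
// Rename the node columns once per endpoint so the two joins do not collide.
val nodesFrom = infoDF.toDF(infoDF.columns.map(_ + "_from"): _*)  // srcId_from, year_from, ...
val nodesTo   = infoDF.toDF(infoDF.columns.map(_ + "_to"): _*)    // srcId_to, year_to, ...
val joined = trainingDF
  .join(nodesFrom, trainingDF("srcId") === nodesFrom("srcId_from"))  // inner join drops unmatched rows
  .join(nodesTo,   trainingDF("dstId") === nodesTo("srcId_to"))
  .withColumn("time_dist", abs(col("year_from") - col("year_to")))
  .drop("srcId_from", "srcId_to")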

Spark Testing Base returning a tuple of Dstream

I'm using spark-testing-base and I can't get the code to compile.
test("Testging") {
val inputInsert = A("data2")
val inputDelete = A("data1")
val outputInsert = B(1)
val outputDelete = C(1)
val input = List(List(inputInsert), List(inputDelete))
val output = (List(List(outputInsert)), List(List(outputDelete)))
//Why doesn't it compile?? I have tried many things here.
testOperation[A,(B,C)](input, service.processing _, output)
}
My method is:
def processing(avroDstream: DStream[A]) : (DStream[B],DStream[C]) ={...}
What does the "_" mean in this case?
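The trailing underscore is Scala's explicit eta-expansion: it turns the method processing into a function value so it can be passed to testOperation. A minimal, self-contained sketch of the same mechanism (the names below are illustrative, not part of spark-testing-base):
object Service {
  def twice(x: Int): Int = x * 2            // an ordinary method
}
// `Service.twice _` converts the method into a function value of type Int => Int,
// which is what higher-order functions such as testOperation expect.
val asFunction: Int => Int = Service.twice _
List(1, 2, 3).map(asFunction)               // List(2, 4, 6)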

create a register dynamic dataframe as temptable in spark

I am trying to register temp tables from dynamic dataframes.
I end up with a string as output, and I am not sure if there is a way to execute it, or to convert the string into the dataframe it names, so that the temp table can be created.
Here are the steps to replicate this issue :
import org.apache.spark.sql._
val contact_df = sc.makeRDD(1 to 5).map(i => (i, i * i)).toDF("value", "square")
val acct_df = sc.makeRDD(1 to 5).map(i => (i, i / i)).toDF("value", "devide")
val dataframeJoins = Array(
Row("x","","","" ,"Y","",1,"contact_hotline_df","contact_df","acct_nbr","hotline_df","tm49_acct_nbr"),
Row("x","","","","Y","",2,"contact_hotline_acct_df","acct_df","tm06_acct_nbr" ,"contact_hotline_df","acct_nbr")
)
val dfJoinbroadcast = sc.broadcast(dataframeJoins)
val DFJoins1 = for ( row <- dfJoinbroadcast.value ) yield {
(row(8)+".registerTempTable(\""+row(8)+"\")" )
}
for (rows <- 0 until DFJoins1.size ){
println(DFJoins1(rows) )
DFJoins1(rows)
}
Here is the output of the above for loop :
contact_df.registerTempTable("contact_df")
acct_df.registerTempTable("acct_df")
I am not getting any error, but the table is not getting created: when I run sqlContext.sql("select * from contact_df") I get an error that the table does not exist.
Is there a way to resolve the string to its dataframe and call it so that the temp table gets created?
Please suggest.
Thanks,
Sreehari
Your code only concatenates the strings and prints the result; that's it. The registerTempTable method is never actually called, which is why you can't use the table in the SQL query. Try this instead:
// assuming we have this string to object mapping
val tableNameToDf = Map("contact_df" -> contact_df, "acct_df" -> acct_df)
you could restructure your for loop into something like:
val dfJoins = for (row <- dfJoinbroadcast.value) yield {
  val wannabeTable = row(8).toString                        // e.g. "contact_df"
  tableNameToDf(wannabeTable).createOrReplaceTempView(wannabeTable)
  wannabeTable
}
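Once the loop has run, the view names resolve, so (assuming a Spark 2.x SparkSession named spark, which createOrReplaceTempView implies) the query works:
spark.sql("select * from contact_df").show()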

Spark 1.6 - Remove itemsets with only 1 item

I have the following code:
val df = sqlContext.sql("SELECT Transaction_ID,Product_ID FROM Transactions as tmp")
val rawDict = df.select('Product_ID).distinct().sort('Product_ID)
val dictCounts = rawDict.groupBy('Product_ID).count().filter(col("count") >= 2)
val sigCounts = dictCounts.filter('count === 1)
val dupCounts = dictCounts.filter('count > 1)
val sigDescs = rawDict.join(sigCounts, "Product_ID").drop('count)
val invoiceToStockCode = df.select('Transaction_ID, 'Product_ID).distinct()
val baskets = invoiceToStockCode.groupBy('Transaction_ID).agg(collect_list('Product_ID).as('StockCodes)).cache()
And I'm trying to extract some association rules. For that I need to guarantee that all the transactions are grouped by more than one product, but with my code I'm getting transactions with only one product.
How can I filter those out?
Thanks!
Although I'm not 100% sure I got your requirement, I think this should work:
val rawDict = df.select('Product_ID).sort('Product_ID)
val dictCounts = rawDict.groupBy('Product_ID).count()
val valid = dictCounts.filter('count > 1).drop('count) // only consider counts > 1
val singleRemoved = df.join(valid, Seq("Product_ID")).distinct()
val groupedTransactions = singleRemoved.groupBy("Transaction_ID").agg(collect_list("Product_ID"))
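If the goal is specifically to drop baskets that still contain a single product after grouping, a small sketch using size on the collected list (assuming the question's df with Transaction_ID and Product_ID columns, and that collect_list is available as in the original code):
import org.apache.spark.sql.functions.{col, collect_list, size}
val baskets = df.distinct()
  .groupBy("Transaction_ID")
  .agg(collect_list("Product_ID").as("StockCodes"))
  .filter(size(col("StockCodes")) > 1)   // keep only multi-product transactions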

How can I output the content of a RDD in the console

val sparkConf = new SparkConf().setAppName("SparkKMeans")
val sc = new SparkContext(sparkConf)
val lines = sc.textFile(args(0))
val data = lines.map(parseVector _).cache()
val K = args(1).toInt
val convergeDist = args(2).toDouble
val kPoints = data.takeSample(withReplacement = false, K, 42).toArray
For example, how can I output the content of "lines" or "data"? That's the result I got (kPoints): [Lbreeze.linalg.Vector;#3bfc6a5e
You asked the wrong question.
Your question is not how to get the content - you already did that with RDD.takeSample. And you also know how to print it to stdout, otherwise you wouldn't have got something like [Lbreeze.linalg.Vector;#3bfc6a5e.
What you really need is a readable format. In most cases kPoints.map(_.toString) should do it. If not, you need to write a conversion yourself; I guess you can try converting the Vector to an Array[Double] and then calling toString.
And: inspecting the vectors' raw values is of limited use - why not use some statistical method to observe them?
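For completeness, a minimal sketch of printing the question's RDDs to the console (collect/take pull data back to the driver, so only do this for small samples):
lines.take(10).foreach(println)                    // first 10 raw input lines
data.take(10).foreach(v => println(v.toString))    // parsed vectors in a readable form
println(kPoints.map(_.toString).mkString("\n"))    // kPoints is already a local Array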
