Spark AccumulatorV2 with HashMap - apache-spark

I am trying to create a custom AccumulatorV2 backed by a hash map: the input would be a HashMap and the output would be a map of HashMaps.
My intention is to have K -> (K1, V), where the value increments. I am confused by the Scala syntax for overriding AccumulatorV2 for Map; has anyone had any luck with this?
class CustomAccumulator extends AccumulatorV2[java.util.Map[String, String], java.util.Map[String,java.util.Map[String, Double]]]

I'm assuming that this is the scenario that needs to be implemented.
Input:
HashMap<String, String>
Output:
Should output a HashMap<String, HashMap<String, Double>>, where the inner HashMap holds the count of each value seen for the corresponding key.
Example:
Inputs (the following HashMaps are added to the accumulator):
Input HashMap1 -> {"key1", "value1"}, {"key2", "value1"}, {"key3", "value3"}
Input HashMap2 -> {"key1", "value1"}, {"key2", "value1"}
Input HashMap3 -> {"key2", "value1"}
Output:
{"key1", {"value1", 2}}, {"key2", {"value1", 3}}, {"key3", {"value3", 1}}
Code below:
import java.util
import java.util.{HashMap, Map}
import java.util.function.BiFunction
import scala.collection.JavaConversions._
import org.apache.spark.util.AccumulatorV2

class CustomAccumulator extends AccumulatorV2[Map[String, String], Map[String, Map[String, Double]]] {

  // Accumulated state: key -> (value -> count)
  private var hashmap: Map[String, Map[String, Double]] = new HashMap[String, Map[String, Double]]

  override def isZero: Boolean = hashmap.size() == 0

  override def copy(): AccumulatorV2[util.Map[String, String], util.Map[String, util.Map[String, Double]]] = {
    val customAccumulatorcopy = new CustomAccumulator()
    customAccumulatorcopy.merge(this)
    customAccumulatorcopy
  }

  override def reset(): Unit = {
    this.hashmap = new HashMap[String, Map[String, Double]]
  }

  // For every (key, value) pair in the input map, increment the count of `value` under `key`.
  override def add(v: util.Map[String, String]): Unit = {
    v.foreach(kv => {
      val unitValueDouble: Double = 1
      if (this.hashmap.containsKey(kv._1)) {
        val innerMap = this.hashmap.get(kv._1)
        innerMap.merge(kv._2, unitValueDouble, addFunction)
      } else {
        val innerMap: Map[String, Double] = new HashMap[String, Double]()
        innerMap.put(kv._2, unitValueDouble)
        this.hashmap.put(kv._1, innerMap)
      }
    })
  }

  // Fold another accumulator's state into this one, key by key.
  override def merge(otherAccumulator: AccumulatorV2[util.Map[String, String], util.Map[String, util.Map[String, Double]]]): Unit = {
    otherAccumulator.value.foreach(kv => {
      this.hashmap.merge(kv._1, kv._2, mergeMapsFunction)
    })
  }

  override def value: util.Map[String, util.Map[String, Double]] = this.hashmap

  // Combines two inner (value -> count) maps by summing the counts.
  // Serializable is mixed in so the accumulator can be shipped to executors.
  val mergeMapsFunction = new BiFunction[Map[String, Double], Map[String, Double], Map[String, Double]] with Serializable {
    override def apply(oldMap: Map[String, Double], newMap: Map[String, Double]): Map[String, Double] = {
      newMap.foreach(kv => {
        oldMap.merge(kv._1, kv._2, addFunction)
      })
      oldMap
    }
  }

  // Sums two counts; also Serializable for the same reason.
  val addFunction = new BiFunction[Double, Double, Double] with Serializable {
    override def apply(oldValue: Double, newValue: Double): Double = oldValue + newValue
  }
}
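For reference, a minimal usage sketch (assuming an existing SparkContext named sc; the accumulator name and the sample data below are illustrative, not from the original post):
import java.util.Collections
val acc = new CustomAccumulator()
sc.register(acc, "keyValueCounts")
val rows = Seq(
  Collections.singletonMap("key1", "value1"),
  Collections.singletonMap("key2", "value1")
)
sc.parallelize(rows).foreach(row => acc.add(row))
// Back on the driver after the action: {key1={value1=1.0}, key2={value1=1.0}} (map ordering may vary)
println(acc.value)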
Thanks!!!

Related

Error: retrofit2.DefaultCallAdapterFactory$ExecutorCallbackCall#94952f

I want to fetch data using Retrofit2, but the request fails with the error above and I don't know how to solve it. The error message itself is not informative.
data class Beers(
val id : String,
val name : String,
val ph : Int,
val first_brewed : String
)
val retrofit = Retrofit.Builder()
.baseUrl("https://api.punkapi.com/v2/")
.addConverterFactory(GsonConverterFactory.create())
.build()
val service = retrofit.create(BeerService::class.java)
service.getBeers().enqueue(object : Callback<List<Beers>>{
override fun onResponse(call: Call<List<Beers>>, response: Response<List<Beers>>) {
if (response.isSuccessful) {
val body = response.body()
binding.tvRe.text = body.toString()
Log.d("Response :", response?.body().toString())
// body?.let {
//
// }
// Log.d("Response :", response?.body().toString())
}
}
override fun onFailure(call: Call<List<Beers>>, t: Throwable) {
Log.d("Error", call.toString())
}
})
}
I fixed it by changing the data class; the API returns ph as a decimal value, so it cannot be deserialized into an Int:
val ph : Int --> val ph : Float

Retrieve nested Data from Firebase Database android

Snapshot of my firebase realtime database
I want to extract all the data under the "Orders" node. How should I model my data classes for Android in Kotlin?
I tried this type of modeling, after getting the reference to (Orders/uid/):
Order.kt
data class Order(
val items:ArrayList<Myitems>=ArrayList(),
val timeStamp:Long=0,
val totalCost:Int=0
)
MyItems.kt
data class MyItems(
val Item:ArrayList<Menu>=ArrayList()
)
Menu.kt
data class Menu(
val menCategory:String="",
val menName:String="",
val menImage:String="",
val menId:String="",
val menQuantity:Int=0,
val menCost:Int=0
)
After a lot of thinking and research online, I was finally able to model my classes and attach a value event listener to them. Here it goes:
Order.kt
data class Order(
val items: ArrayList<HashMap<String, Any>> = ArrayList(),
val timeStamp: Long = 0,
val totalCost: Int = 0
)
OItem.kt
data class OItem(
val menCategory: String = "",
val menId: String = "",
val menImage: String = "",
val menName: String = "",
val menPrice: Int = 0,
var menQuantity: Int = 0
)
MainActivity.kt
val uid = FirebaseAuth.getInstance().uid
val ref = FirebaseDatabase.getInstance().getReference("Orders/$uid")
ref.addListenerForSingleValueEvent(object : ValueEventListener {
override fun onCancelled(error: DatabaseError) {
//
}
override fun onDataChange(p0: DataSnapshot) {
p0.children.forEach {
val order = it.getValue(Order::class.java)
ordList.add(order!!)
}
Log.d("hf", ordList.toString())
}
})

Same value was passed as the nextKey in two sequential Pages loaded from a PagingSource in Paging Library 3 Android

I migrated from Paging 2 to Paging 3 and tried to port an ItemKeyedDataSource from Paging 2 to Paging library 3. The problem I was facing is that the same value (currentJodId) was passed as the nextKey in two sequential pages loaded, and after that the app crashes. If I add "keyReuseSupported = true" in the DataSource, the app does not crash, but it keeps passing the same item id as the nextKey.
JobSliderRestApi.kt
@GET("job/list/slides")
fun getDetailOfSelectedJob(
    @Query("current_job") currentJodId: Int?,
    @Query("limit") jobLimit: Int?,
    @Query("search_in") fetchType: String?
): Single<Response<JobViewResponse>>
JobViewResponse.kt
data class JobViewResponse(
    @SerializedName("data") val data: ArrayList<JobDetail>?
) : BaseResponse()
JobDetail.kt
data class JobDetail(
    @SerializedName("job_id") val jobId: Int,
    @SerializedName("tuition_type") val jobType: String?,
    @SerializedName("class_image") val jobImage: String,
    @SerializedName("salary") val salary: String,
    @SerializedName("no_of_student") val noOfStudent: Int,
    @SerializedName("student_gender") val studentGender: String,
    @SerializedName("tutor_gender") val preferredTutor: String,
    @SerializedName("days_per_week") val daysPerWeek: String?,
    @SerializedName("other_req") val otherReq: String?,
    @SerializedName("latitude") val latitude: Double?,
    @SerializedName("longitude") val longitude: Double?,
    @SerializedName("area") val area: String,
    @SerializedName("tutoring_time") val tutoringTime: String?,
    @SerializedName("posted_date") val postedDate: String?,
    @SerializedName("subjects") val subjects: String,
    @SerializedName("title") val title: String
)
JodSliderDataSource.kt
class JodSliderDataSource @Inject constructor(
    private val jobSliderRestApi: JobSliderRestApi
): RxPagingSource<Int, JobDetail>() {

    // override val keyReuseSupported = true

    @ExperimentalPagingApi
    override fun getRefreshKey(state: PagingState<Int, JobDetail>): Int? {
        return state.anchorPosition?.let {
            state.closestItemToPosition(it)?.jobId
        }
    }

    override fun loadSingle(params: LoadParams<Int>): Single<LoadResult<Int, JobDetail>> {
        return jobSliderRestApi.getDetailOfSelectedJob(42673, 2, "next").toSingle()
            .subscribeOn(Schedulers.io())
            .map { jobResponse -> toLoadResult(jobResponse.data) }
            .onErrorReturn { LoadResult.Error(it) }
    }

    private fun toLoadResult(data: ArrayList<JobDetail>): LoadResult<Int, JobDetail> {
        return LoadResult.Page(data = data, prevKey = null, nextKey = data.lastOrNull()?.jobId)
    }
}
I was getting the same error and this is what worked for me: in the JodSliderDataSource class, in the toLoadResult method, set the nextKey parameter value by taking the page number from the response data and adding one.
private fun toLoadResult(
    data: ArrayList<JobDetail>
): LoadResult<Int, JobDetail> {
    return LoadResult.Page(
        data = data,
        prevKey = null,
        nextKey = data.lastOrNull()?.jobId?.plus(1) // Add one to the page number here; ?.plus keeps the addition null-safe.
    )
}

Spark SQL doesn't call my UDT equals/hashcode methods

I want to implement my comparison operators (equals, hashCode, ordering) in a data type that I define in Spark SQL. Although Spark SQL UDTs still remain private, I followed some examples like this one to work around that restriction.
I have a class called MyPoint:
@SQLUserDefinedType(udt = classOf[MyPointUDT])
case class MyPoint(x: Double, y: Double) extends Serializable {
  override def hashCode(): Int = {
    println("hash code")
    31 * (31 * x.hashCode()) + y.hashCode()
  }
  override def equals(other: Any): Boolean = {
    println("equals")
    other match {
      case that: MyPoint => this.x == that.x && this.y == that.y
      case _ => false
    }
  }
}
Then, I have the UDT class:
private class MyPointUDT extends UserDefinedType[MyPoint] {
override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)
override def serialize(obj: MyPoint): ArrayData = {
obj match {
case features: MyPoint =>
new GenericArrayData2(Array(features.x, features.y))
}
}
override def deserialize(datum: Any): MyPoint = {
datum match {
case data: ArrayData if data.numElements() == 2 => {
val arr = data.toDoubleArray()
new MyPoint(arr(0), arr(1))
}
}
}
override def userClass: Class[MyPoint] = classOf[MyPoint]
override def asNullable: MyPointUDT = this
}
Then I create a simple DataFrame:
val p1 = new MyPoint(1.0, 2.0)
val p2 = new MyPoint(1.0, 2.0)
val p3 = new MyPoint(10.0, 20.0)
val p4 = new MyPoint(11.0, 22.0)
val points = Seq(
("P1", p1),
("P2", p2),
("P3", p3),
("P4", p4)
).toDF("label", "point")
points.registerTempTable("points")
spark.sql("SELECT Distinct(point) FROM points").show()
The problem is: why doesn't the SQL query execute the equals method inside the MyPoint class? How are comparisons being made? How can I implement my comparison operators in this example?

Exception when using UDT in Spark DataFrame

I'm trying to create a user-defined type in Spark SQL, but I receive:
com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType, even when using their example. Has anyone made this work?
My code:
test("udt serialisation") {
val points = Seq(new ExamplePoint(1.3, 1.6), new ExamplePoint(1.3, 1.8))
val df = SparkContextForStdout.context.parallelize(points).toDF()
}
@SQLUserDefinedType(udt = classOf[ExamplePointUDT])
case class ExamplePoint(val x: Double, val y: Double)
/**
* User-defined type for [[ExamplePoint]].
*/
class ExamplePointUDT extends UserDefinedType[ExamplePoint] {
override def sqlType: DataType = ArrayType(DoubleType, false)
override def pyUDT: String = "pyspark.sql.tests.ExamplePointUDT"
override def serialize(obj: Any): Seq[Double] = {
obj match {
case p: ExamplePoint =>
Seq(p.x, p.y)
}
}
override def deserialize(datum: Any): ExamplePoint = {
datum match {
case values: Seq[_] =>
val xy = values.asInstanceOf[Seq[Double]]
assert(xy.length == 2)
new ExamplePoint(xy(0), xy(1))
case values: util.ArrayList[_] =>
val xy = values.asInstanceOf[util.ArrayList[Double]].asScala
new ExamplePoint(xy(0), xy(1))
}
}
override def userClass: Class[ExamplePoint] = classOf[ExamplePoint]
}
The useful part of the stack trace is this:
com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType
java.lang.ClassCastException: com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
at org.apache.spark.sql.SQLContext$implicits$.rddToDataFrameHolder(SQLContext.scala:254)
It seems that the UDT needs to be used inside another class to work (as the type of a field). One way to use it directly is to wrap it in a Tuple1:
test("udt serialisation") {
val points = Seq(new Tuple1(new ExamplePoint(1.3, 1.6)), new Tuple1(new ExamplePoint(1.3, 1.8)))
val df = SparkContextForStdout.context.parallelize(points).toDF()
df.collect().foreach(println(_))
}
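For the first approach (using the UDT as the type of a field in another class), a sketch could look like this; it assumes the same ExamplePoint/ExamplePointUDT definitions and the SparkContextForStdout helper from above, and LabelledPoint is just an illustrative name:
// Wrapping the point in a case class gives createDataFrame a StructType at the top level.
case class LabelledPoint(label: String, point: ExamplePoint)

test("udt as a field of a wrapper class") {
  val points = Seq(LabelledPoint("a", new ExamplePoint(1.3, 1.6)), LabelledPoint("b", new ExamplePoint(1.3, 1.8)))
  val df = SparkContextForStdout.context.parallelize(points).toDF()
  df.collect().foreach(println(_))
}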
