Exception when using UDT in Spark DataFrame

Exception when using UDT in Spark DataFrame - apache-spark

I'm trying to create a user defined type in spark sql, but I receive:
com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType even when using their example. Has anyone made this work?
My code:
test("udt serialisation") {
val points = Seq(new ExamplePoint(1.3, 1.6), new ExamplePoint(1.3, 1.8))
val df = SparkContextForStdout.context.parallelize(points).toDF()
}
#SQLUserDefinedType(udt = classOf[ExamplePointUDT])
case class ExamplePoint(val x: Double, val y: Double)
/**
* User-defined type for [[ExamplePoint]].
*/
class ExamplePointUDT extends UserDefinedType[ExamplePoint] {
override def sqlType: DataType = ArrayType(DoubleType, false)
override def pyUDT: String = "pyspark.sql.tests.ExamplePointUDT"
override def serialize(obj: Any): Seq[Double] = {
obj match {
case p: ExamplePoint =>
Seq(p.x, p.y)
}
}
override def deserialize(datum: Any): ExamplePoint = {
datum match {
case values: Seq[_] =>
val xy = values.asInstanceOf[Seq[Double]]
assert(xy.length == 2)
new ExamplePoint(xy(0), xy(1))
case values: util.ArrayList[_] =>
val xy = values.asInstanceOf[util.ArrayList[Double]].asScala
new ExamplePoint(xy(0), xy(1))
}
}
override def userClass: Class[ExamplePoint] = classOf[ExamplePoint]
}
The usefull stackstrace is this:
com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType
java.lang.ClassCastException: com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
at org.apache.spark.sql.SQLContext$implicits$.rddToDataFrameHolder(SQLContext.scala:254)

It seems that the UDT needs to be used inside of another class to work (as the type of a field). One solution to use it directly is to wrap it into a Tuple1:
test("udt serialisation") {
val points = Seq(new Tuple1(new ExamplePoint(1.3, 1.6)), new Tuple1(new ExamplePoint(1.3, 1.8)))
val df = SparkContextForStdout.context.parallelize(points).toDF()
df.collect().foreach(println(_))
}

Related

Retrieve nested Data from Firebase Database android

Snapshot of my firebase realtime database
I want to extract the entire data under the "Orders" node, please tell me how should I model my data class for android in Kotlin?
I tried with this type of modeling,
After getting the reference of (Orders/uid/)
Order.kt
data class Order(
val items:ArrayList<Myitems>=ArrayList(),
val timeStamp:Long=0,
val totalCost:Int=0
)
MyItems.kt
data class MyItems(
val Item:ArrayList<Menu>=ArrayList()
)
Menu.kt
data class Menu(
val menCategory:String="",
val menName:String="",
val menImage:String="",
val menId:String="",
val menQuantity:Int=0,
val menCost:Int=0
)

After a lot of thinking and research online. I was finally able to model my classes and call add value event listener to it. Here it goes:
Order.kt
data class Order(
val items: ArrayList<HashMap<String, Any>> = ArrayList(),
val timeStamp: Long = 0,
val totalCost: Int = 0
)
OItem.kt
data class OItem(
val menCategory: String = "",
val menId: String = "",
val menImage: String = "",
val menName: String = "",
val menPrice: Int = 0,
var menQuantity: Int = 0
)
MainActivity.kt
val uid = FirebaseAuth.getInstance().uid
val ref = FirebaseDatabase.getInstance().getReference("Orders/$uid")
ref.addListenerForSingleValueEvent(object : ValueEventListener {
override fun onCancelled(error: DatabaseError) {
//
}
override fun onDataChange(p0: DataSnapshot) {
p0.children.forEach {
val order = it.getValue(Order::class.java)
ordList.add(order!!)
}
Log.d("hf", ordList.toString())
}
})

Spark SQL doesn't call my UDT equals/hashcode methods

I want to implement my comparison operators(equals, hashcode, ordering) in a data type defined by me in Spark SQL. Although Spark SQL UDT's still remains private, I follow some examples like this, to workaround this situation.
I have a class called MyPoint:
#SQLUserDefinedType(udt = classOf[MyPointUDT])
case class MyPoint(x: Double, y: Double) extends Serializable {
override def hashCode(): Int = {
println("hash code")
31 * (31 * x.hashCode()) + y.hashCode()
}
override def equals(other: Any): Boolean = {
println("equals")
other match {
case that: MyPoint => this.x == that.x && this.y == that.y
case _ => false
}
}
Then, I have the UDT class:
private class MyPointUDT extends UserDefinedType[MyPoint] {
override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)
override def serialize(obj: MyPoint): ArrayData = {
obj match {
case features: MyPoint =>
new GenericArrayData2(Array(features.x, features.y))
}
}
override def deserialize(datum: Any): MyPoint = {
datum match {
case data: ArrayData if data.numElements() == 2 => {
val arr = data.toDoubleArray()
new MyPoint(arr(0), arr(1))
}
}
}
override def userClass: Class[MyPoint] = classOf[MyPoint]
override def asNullable: MyPointUDT = this
}
Then I create a simple DataFrame:
val p1 = new MyPoint(1.0, 2.0)
val p2 = new MyPoint(1.0, 2.0)
val p3 = new MyPoint(10.0, 20.0)
val p4 = new MyPoint(11.0, 22.0)
val points = Seq(
("P1", p1),
("P2", p2),
("P3", p3),
("P4", p4)
).toDF("label", "point")
points.registerTempTable("points")
spark.sql("SELECT Distinct(point) FROM points").show()
The problem is: Why the SQL query doesn't execute the equals method inside MyPoint class? How comparasions are being made? How can I implement my comparasion operators in this example?

Custom Partitioner in Scala for RDD is not working as expected

I define a spark partitioner and want it to partition data by key,in my sample,the result data should be three different file(not null and their key are "aaa","aa" and "a"),but the reslut only two part
class Mypartitioner2( num:Int) extends org.apache.spark.Partitioner{
override def numPartitions: Int = num
override def getPartition(key: Any): Int = {
if(key.toString.size ==3){
2
}
if(key.toString.size ==2){
1
}
else {
0
}
}
}
object PersonalPartitioner {
def main(args: Array[String]): Unit = {
val spark =SparkSession.builder().config(new SparkConf()).getOrCreate()
val sc =spark.sparkContext
val data =sc.parallelize(Array(
("aaa",2),("aaa",3),("aaa",1),("aaa",0),("aaa",4),
("aa",2),("aa",3),("aa",1),("aa",0),("aa",4),
("a",2),("a",3),("a",1),("a",0),("a",4) ))
data.partitionBy(new Mypartitioner2(3)).saveAsTextFile("develop/wangdaopeng/lab4")
}
}
but the result is
enter image description here
key of “aaa” and "a" was in the same partition

Clause "else" missed in "Mypartitioner2" between two "if"s.

Defining and reading a nullable date column in Slick 3.x

I have a table with a column type date. This column accepts null values, therefore, I declared it as an Option (see field perDate below). The issue is that apparently the implicit conversion from/to java.time.LocalDate/java.sql.Date is incorrect as reading from this table when perDate is null fails with the error:
slick.SlickException: Read NULL value (null) for ResultSet column <computed>
This is the Slick table definition, including the implicit function:
import java.sql.Date
import java.time.LocalDate
class FormulaDB(tag: Tag) extends Table[Formula](tag, "formulas") {
def sk = column[Int]("sk", O.PrimaryKey, O.AutoInc)
def name = column[String]("name")
def descrip = column[Option[String]]("descrip")
def formula = column[Option[String]]("formula")
def notes = column[Option[String]]("notes")
def periodicity = column[Int]("periodicity")
def perDate = column[Option[LocalDate]]("per_date")(localDateColumnType)
def * = (sk, name, descrip, formula, notes, periodicity, perDate) <>
((Formula.apply _).tupled, Formula.unapply)
implicit val localDateColumnType = MappedColumnType.base[Option[LocalDate], Date](
{
case Some(localDate) => Date.valueOf(localDate)
case None => null
},{
sqlDate => if (sqlDate != null) Some(sqlDate.toLocalDate) else None
}
)
}

Actually your implicit conversion from/to java.time.LocalDate/java.sql.Date is not incorrect.
I have faced the same error, and doing some research I found that the Node created by the Slick SQL Compiler is actually of type MappedJdbcType[Scala.Option -> LocalDate], and not Option[LocalDate].
That is the reason why when the mapping compiler create the column converter for your def perDate it is creating a Base ResultConverterand not a Option ResultConverter
Here is the Slick code for the base converter:
def base[T](ti: JdbcType[T], name: String, idx: Int) = (ti.scalaType match {
case ScalaBaseType.byteType => new BaseResultConverter[Byte](ti.asInstanceOf[JdbcType[Byte]], name, idx)
case ScalaBaseType.shortType => new BaseResultConverter[Short](ti.asInstanceOf[JdbcType[Short]], name, idx)
case ScalaBaseType.intType => new BaseResultConverter[Int](ti.asInstanceOf[JdbcType[Int]], name, idx)
case ScalaBaseType.longType => new BaseResultConverter[Long](ti.asInstanceOf[JdbcType[Long]], name, idx)
case ScalaBaseType.charType => new BaseResultConverter[Char](ti.asInstanceOf[JdbcType[Char]], name, idx)
case ScalaBaseType.floatType => new BaseResultConverter[Float](ti.asInstanceOf[JdbcType[Float]], name, idx)
case ScalaBaseType.doubleType => new BaseResultConverter[Double](ti.asInstanceOf[JdbcType[Double]], name, idx)
case ScalaBaseType.booleanType => new BaseResultConverter[Boolean](ti.asInstanceOf[JdbcType[Boolean]], name, idx)
case _ => new BaseResultConverter[T](ti.asInstanceOf[JdbcType[T]], name, idx) {
override def read(pr: ResultSet) = {
val v = ti.getValue(pr, idx)
if(v.asInstanceOf[AnyRef] eq null) throw new SlickException("Read NULL value ("+v+") for ResultSet column "+name)
v
}
}
}).asInstanceOf[ResultConverter[JdbcResultConverterDomain, T]]
Unfortunately I have no solution for this problem, what I suggest as a workaround, is to map your perDate property as follows:
import java.sql.Date
import java.time.LocalDate
class FormulaDB(tag: Tag) extends Table[Formula](tag, "formulas") {
def sk = column[Int]("sk", O.PrimaryKey, O.AutoInc)
def name = column[String]("name")
def descrip = column[Option[String]]("descrip")
def formula = column[Option[String]]("formula")
def notes = column[Option[String]]("notes")
def periodicity = column[Int]("periodicity")
def perDate = column[Option[Date]]("per_date")
def toLocalDate(time : Option[Date]) : Option[LocalDate] = time.map(t => t.toLocalDate))
def toSQLDate(localDate : Option[LocalDate]) : Option[Date] = localDate.map(localDate => Date.valueOf(localDate)))
private type FormulaEntityTupleType = (Int, String, Option[String], Option[String], Option[String], Int, Option[Date])
private val formulaShapedValue = (sk, name, descrip, formula, notes, periodicity, perDate).shaped[FormulaEntityTupleType]
private val toFormulaRow: (FormulaEntityTupleType => Formula) = { formulaTuple => {
Formula(formulaTuple._1, formulaTuple._2, formulaTuple._3, formulaTuple._4, formulaTuple._5, formulaTuple._6, toLocalDate(formulaTuple._7))
}
}
private val toFormulaTuple: (Formula => Option[FormulaEntityTupleType]) = { formulaRow =>
Some((formulaRow.sk, formulaRow.name, formulaRow.descrip, formulaRow.formula, formulaRow.notes, formulaRow.periodicity, toSQLDate(formulaRow.perDate)))
}
def * = formulaShapedValue <> (toFormulaRow, toFormulaTuple)
Hopefully the answer comes not too late.

I'm pretty sure the problem is that your'e returning null from your mapping function instead of None.
Try rewriting your mapping function as a function from LocalDate to Date:
implicit val localDateColumnType = MappedColumnType.base[LocalDate, Date](
{
localDate => Date.valueOf(localDate)
},{
sqlDate => sqlDate.toLocalDate
}
)
Alternately, mapping from Option[LocalDate] to Option[Date] should work:
implicit val localDateColumnType =
MappedColumnType.base[Option[LocalDate], Option[Date]](
{
localDateOption => localDateOption.map(Date.valueOf)
},{
sqlDateOption => sqlDateOption.map(_.toLocalDate)
}
)

Custom unpickler for Enumeration

I have an enumeration that I'm trying to pickle and unpickle with pickling 0.8.0 and scala 2.11:
object CommandType extends Enumeration {
val Push, Pop = Value
}
Pickling cannot do it automagically at the moment. The custom pickler-unpickler looks like this:
class CommandTypePickler(implicit val format: PickleFormat)
extends SPickler[CommandType.Value] with Unpickler[CommandType.Value] with LazyLogging {
def pickle(picklee: CommandType.Value, builder: PBuilder): Unit = {
builder.beginEntry(picklee)
builder.putField("commandType", b =>
b.hintTag(stringTag).beginEntry(picklee.toString).endEntry()
)
builder.endEntry()
}
override def unpickle(tag: => FastTypeTag[_], reader: PReader): CommandType.Value = {
val ctReader = reader.readField("commandType")
val tag = ctReader.beginEntry()
logger.debug(s"tag is ${tag.toString}")
val value = stringUnpickler.unpickle(tag, ctReader).asInstanceOf[String]
ctReader.endEntry()
CommandType.withName(value)
}
}
Serialized enumeration:
{
"tpe": "scala.Enumeration.Value",
"commandType": {
"tpe": "java.lang.String",
"value": "Push"
}
}
When unpickling, this throws the following: ScalaReflectionException: : class scala.Enumeration.Value in JavaMirror with sun.misc.Launcher$AppClassLoader#5c3eeab3 of type class sun.misc.Launcher$AppClassLoader with classpath ... not found. What am I doing wrong?

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Exception when using UDT in Spark DataFrame - apache-spark

Related

Retrieve nested Data from Firebase Database android

Spark SQL doesn't call my UDT equals/hashcode methods

Custom Partitioner in Scala for RDD is not working as expected

Defining and reading a nullable date column in Slick 3.x

Custom unpickler for Enumeration

Categories

Resources