Defining and reading a nullable date column in Slick 3.x - slick

I have a table with a column of type date. This column accepts null values, so I declared it as an Option (see the field perDate below). The issue is that the implicit conversion between java.time.LocalDate and java.sql.Date apparently is incorrect, because reading from this table fails with the following error whenever perDate is null:
slick.SlickException: Read NULL value (null) for ResultSet column <computed>
This is the Slick table definition, including the implicit function:
import java.sql.Date
import java.time.LocalDate

class FormulaDB(tag: Tag) extends Table[Formula](tag, "formulas") {

  def sk = column[Int]("sk", O.PrimaryKey, O.AutoInc)
  def name = column[String]("name")
  def descrip = column[Option[String]]("descrip")
  def formula = column[Option[String]]("formula")
  def notes = column[Option[String]]("notes")
  def periodicity = column[Int]("periodicity")
  def perDate = column[Option[LocalDate]]("per_date")(localDateColumnType)

  def * = (sk, name, descrip, formula, notes, periodicity, perDate) <>
    ((Formula.apply _).tupled, Formula.unapply)

  implicit val localDateColumnType = MappedColumnType.base[Option[LocalDate], Date](
    {
      case Some(localDate) => Date.valueOf(localDate)
      case None => null
    }, {
      sqlDate => if (sqlDate != null) Some(sqlDate.toLocalDate) else None
    }
  )
}
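The Formula case class itself is not shown above, but for the * projection to compile it must have this shape (the field names here are assumed to match the column names):

// Assumed shape of Formula, inferred from the * projection above.
case class Formula(
  sk: Int,
  name: String,
  descrip: Option[String],
  formula: Option[String],
  notes: Option[String],
  periodicity: Int,
  perDate: Option[LocalDate]
)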

Actually, your implicit conversion between java.time.LocalDate and java.sql.Date is not incorrect.
I have faced the same error, and after some research I found that the Node created by the Slick SQL compiler is actually of type MappedJdbcType[Scala.Option -> LocalDate], and not Option[LocalDate].
That is the reason why, when the mapping compiler creates the column converter for your def perDate, it creates a Base ResultConverter and not an Option ResultConverter.
Here is the Slick code for the base converter:
def base[T](ti: JdbcType[T], name: String, idx: Int) = (ti.scalaType match {
  case ScalaBaseType.byteType => new BaseResultConverter[Byte](ti.asInstanceOf[JdbcType[Byte]], name, idx)
  case ScalaBaseType.shortType => new BaseResultConverter[Short](ti.asInstanceOf[JdbcType[Short]], name, idx)
  case ScalaBaseType.intType => new BaseResultConverter[Int](ti.asInstanceOf[JdbcType[Int]], name, idx)
  case ScalaBaseType.longType => new BaseResultConverter[Long](ti.asInstanceOf[JdbcType[Long]], name, idx)
  case ScalaBaseType.charType => new BaseResultConverter[Char](ti.asInstanceOf[JdbcType[Char]], name, idx)
  case ScalaBaseType.floatType => new BaseResultConverter[Float](ti.asInstanceOf[JdbcType[Float]], name, idx)
  case ScalaBaseType.doubleType => new BaseResultConverter[Double](ti.asInstanceOf[JdbcType[Double]], name, idx)
  case ScalaBaseType.booleanType => new BaseResultConverter[Boolean](ti.asInstanceOf[JdbcType[Boolean]], name, idx)
  case _ => new BaseResultConverter[T](ti.asInstanceOf[JdbcType[T]], name, idx) {
    override def read(pr: ResultSet) = {
      val v = ti.getValue(pr, idx)
      if(v.asInstanceOf[AnyRef] eq null) throw new SlickException("Read NULL value ("+v+") for ResultSet column "+name)
      v
    }
  }
}).asInstanceOf[ResultConverter[JdbcResultConverterDomain, T]]
Unfortunately I have no solution for this problem. As a workaround, I suggest mapping your perDate property as follows:
import java.sql.Date
import java.time.LocalDate

class FormulaDB(tag: Tag) extends Table[Formula](tag, "formulas") {

  def sk = column[Int]("sk", O.PrimaryKey, O.AutoInc)
  def name = column[String]("name")
  def descrip = column[Option[String]]("descrip")
  def formula = column[Option[String]]("formula")
  def notes = column[Option[String]]("notes")
  def periodicity = column[Int]("periodicity")
  def perDate = column[Option[Date]]("per_date")

  def toLocalDate(time: Option[Date]): Option[LocalDate] = time.map(t => t.toLocalDate)
  def toSQLDate(localDate: Option[LocalDate]): Option[Date] = localDate.map(localDate => Date.valueOf(localDate))

  private type FormulaEntityTupleType = (Int, String, Option[String], Option[String], Option[String], Int, Option[Date])

  private val formulaShapedValue = (sk, name, descrip, formula, notes, periodicity, perDate).shaped[FormulaEntityTupleType]

  private val toFormulaRow: (FormulaEntityTupleType => Formula) = { formulaTuple =>
    Formula(formulaTuple._1, formulaTuple._2, formulaTuple._3, formulaTuple._4, formulaTuple._5, formulaTuple._6, toLocalDate(formulaTuple._7))
  }

  private val toFormulaTuple: (Formula => Option[FormulaEntityTupleType]) = { formulaRow =>
    Some((formulaRow.sk, formulaRow.name, formulaRow.descrip, formulaRow.formula, formulaRow.notes, formulaRow.periodicity, toSQLDate(formulaRow.perDate)))
  }

  def * = formulaShapedValue <> (toFormulaRow, toFormulaTuple)
}
Hopefully this answer doesn't come too late.

I'm pretty sure the problem is that you're returning null from your mapping function instead of None.
Try rewriting your mapping function as a function from LocalDate to Date:
implicit val localDateColumnType = MappedColumnType.base[LocalDate, Date](
  {
    localDate => Date.valueOf(localDate)
  }, {
    sqlDate => sqlDate.toLocalDate
  }
)
Alternatively, mapping from Option[LocalDate] to Option[Date] should also work:
implicit val localDateColumnType =
  MappedColumnType.base[Option[LocalDate], Option[Date]](
    {
      localDateOption => localDateOption.map(Date.valueOf)
    }, {
      sqlDateOption => sqlDateOption.map(_.toLocalDate)
    }
  )
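With the non-optional LocalDate <-> Date mapping in scope, the column can stay declared as Option[LocalDate] and Slick lifts the mapping into the Option for you. A minimal sketch of what that looks like (the table is trimmed down here, and the profile import is an assumption; substitute your own):

import java.sql.Date
import java.time.LocalDate
import slick.jdbc.H2Profile.api._ // assumption: use your actual profile

// Non-optional mapping; Slick handles the Option/null lifting for nullable columns.
implicit val localDateColumnType: BaseColumnType[LocalDate] =
  MappedColumnType.base[LocalDate, Date](localDate => Date.valueOf(localDate), _.toLocalDate)

// Trimmed-down, hypothetical table: only the key and the nullable date column.
case class FormulaDate(sk: Int, perDate: Option[LocalDate])

class FormulaDates(tag: Tag) extends Table[FormulaDate](tag, "formulas") {
  def sk = column[Int]("sk", O.PrimaryKey, O.AutoInc)
  def perDate = column[Option[LocalDate]]("per_date") // no explicit converter needed
  def * = (sk, perDate) <> (FormulaDate.tupled, FormulaDate.unapply)
}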

Related

Spark SQL doesn't call my UDT equals/hashcode methods

I want to implement my comparison operators (equals, hashCode, ordering) in a data type that I define in Spark SQL. Although Spark SQL UDTs still remain private, I followed some examples like this one to work around that restriction.
I have a class called MyPoint:
@SQLUserDefinedType(udt = classOf[MyPointUDT])
case class MyPoint(x: Double, y: Double) extends Serializable {

  override def hashCode(): Int = {
    println("hash code")
    31 * (31 * x.hashCode()) + y.hashCode()
  }

  override def equals(other: Any): Boolean = {
    println("equals")
    other match {
      case that: MyPoint => this.x == that.x && this.y == that.y
      case _ => false
    }
  }
}
Then, I have the UDT class:
private class MyPointUDT extends UserDefinedType[MyPoint] {

  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  override def serialize(obj: MyPoint): ArrayData = {
    obj match {
      case features: MyPoint =>
        new GenericArrayData2(Array(features.x, features.y))
    }
  }

  override def deserialize(datum: Any): MyPoint = {
    datum match {
      case data: ArrayData if data.numElements() == 2 => {
        val arr = data.toDoubleArray()
        new MyPoint(arr(0), arr(1))
      }
    }
  }

  override def userClass: Class[MyPoint] = classOf[MyPoint]

  override def asNullable: MyPointUDT = this
}
Then I create a simple DataFrame:
val p1 = new MyPoint(1.0, 2.0)
val p2 = new MyPoint(1.0, 2.0)
val p3 = new MyPoint(10.0, 20.0)
val p4 = new MyPoint(11.0, 22.0)
val points = Seq(
  ("P1", p1),
  ("P2", p2),
  ("P3", p3),
  ("P4", p4)
).toDF("label", "point")
points.registerTempTable("points")
spark.sql("SELECT Distinct(point) FROM points").show()
The problem is: why doesn't the SQL query execute the equals method inside the MyPoint class? How are comparisons being made? How can I implement my comparison operators in this example?

In Spark, how do I read a field by its name instead of by its index

I use Spark 1.3.
My data has 50 or more attributes, hence I went for a custom class.
How do I access a field from a custom class by its name, not by its position?
Currently, every time I need a field, I have to invoke a method like productElement(0).
Also, I am not supposed to use a case class, hence I am using a custom class for the schema.
class OnlineEvents(gsm_id: String,
                   attribution_id: String,
                   event_date: String,
                   event_timestamp: String,
                   event_type: String
                  ) extends Product {

  override def productElement(n: Int): Any = n match {
    case 0 => gsm_id
    case 1 => attribution_id
    case 2 => event_date
    case 3 => event_timestamp
    case 4 => event_type
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  override def productArity: Int = 5

  override def canEqual(that: Any): Boolean = that.isInstanceOf[OnlineEvents]
}
My Spark code:
val onlineRDD = sc.textFile("/user/cloudera/input_files/online_events.txt")
val schemaRDD = onlineRDD.map(record => {
  val arr: Array[String] = record.split(",")
  new OnlineEvents(arr(0), arr(1), arr(2), arr(3), arr(4))
})
val keyvalueRDD = schemaRDD.map(online => ((online.productElement(0).toString, online.productElement(4).toString), online))
If I try to access any field from OnlineEvents, I need to use productElement() (i.e. online.productElement(0) for gsm_id).
Can I directly access the fields as online.gsm_id ... online.event_type, so that my code is more readable?
How do I directly access a field by its name when I use a custom class for the schema?
According to my understanding of your question, you need to define some accessor functions inside OnlineEvents that return the fields. So your solution could be:
class OnlineEvents(gsm_id: String,
                   attribution_id: String,
                   event_date: String,
                   event_timestamp: String,
                   event_type: String
                  ) extends Product {

  def get_gsm_id(): String = gsm_id

  def get_attribution_id(): String = attribution_id

  def get_event_date(): String = event_date

  def get_event_timestamp(): String = event_timestamp

  def get_event_type(): String = event_type

  override def productElement(n: Int): Any = n match {
    case 0 => gsm_id
    case 1 => attribution_id
    case 2 => event_date
    case 3 => event_timestamp
    case 4 => event_type
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  override def productArity: Int = 5

  override def canEqual(that: Any): Boolean = that.isInstanceOf[OnlineEvents]
}
And call the functions as below:
val keyvalueRDD = schemaRDD.map(online => ((online.get_gsm_id().toString, online.get_event_type().toString), online))
I strongly recommend using a case class per use case (which together cover all the use cases that use the data).
A single use case would then be a single case class that would save you a lot of thinking about how to maintain the 50+ fields.
Yeah, you'd "trade" a single big 50-or-more-field class for 10 5-field case classes, but given how easy it is to create a case class and how nicely they would describe your data I think it's worth the hassle.
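For illustration, here is what one such case class could look like (a sketch; the OnlineEvent name is hypothetical, the fields are taken from the question) — the fields become accessible by name with no Product boilerplate:

// Sketch: the same schema as a case class; fields are accessible by name.
case class OnlineEvent(gsm_id: String,
                       attribution_id: String,
                       event_date: String,
                       event_timestamp: String,
                       event_type: String)

val eventsRDD = sc.textFile("/user/cloudera/input_files/online_events.txt").map { record =>
  val arr = record.split(",")
  OnlineEvent(arr(0), arr(1), arr(2), arr(3), arr(4))
}
val keyValueRDD = eventsRDD.map(event => ((event.gsm_id, event.event_type), event))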

Slick 3.0 best practices

Coming from JPA and Hibernate, I find using Slick pretty straightforward, apart from some joins and aggregate queries.
Now, are there some best practices I could employ when defining my tables and the case classes they map to? I currently have a single class that holds all the queries, the table definitions, and the case classes to which I map when I query the database. This class deals with, say, 10 to 15 tables, and the file has become quite big.
I'm now wondering if I should split this into different packages for readability. What do you guys think?
You can separate the Slick mapping tables from the Slick queries. Put each mapping table into a trait, and mix that trait in wherever you want to write queries or join tables. For example:
package com.knol.db.repo

import com.knol.db.connection.DBComponent
import scala.concurrent.Future

trait BankRepository extends BankTable { this: DBComponent =>
  import driver.api._

  def create(bank: Bank): Future[Int] = db.run { bankTableAutoInc += bank }

  def update(bank: Bank): Future[Int] = db.run { bankTableQuery.filter(_.id === bank.id.get).update(bank) }

  def getById(id: Int): Future[Option[Bank]] = db.run { bankTableQuery.filter(_.id === id).result.headOption }

  def getAll(): Future[List[Bank]] = db.run { bankTableQuery.to[List].result }

  def delete(id: Int): Future[Int] = db.run { bankTableQuery.filter(_.id === id).delete }
}

private[repo] trait BankTable { this: DBComponent =>
  import driver.api._

  private[BankTable] class BankTable(tag: Tag) extends Table[Bank](tag, "bank") {
    val id = column[Int]("id", O.PrimaryKey, O.AutoInc)
    val name = column[String]("name")
    def * = (name, id.?) <> (Bank.tupled, Bank.unapply)
  }

  protected val bankTableQuery = TableQuery[BankTable]

  protected def bankTableAutoInc = bankTableQuery returning bankTableQuery.map(_.id)
}

case class Bank(name: String, id: Option[Int] = None)
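The self-type in these traits assumes a DBComponent that supplies the Slick profile and the database. Its exact shape isn't shown in the answer, but a minimal sketch (member names taken from the example; the real one lives in com.knol.db.connection) could look like this:

import slick.jdbc.JdbcProfile

// Minimal sketch of the DBComponent assumed by the self-types above.
trait DBComponent {
  val driver: JdbcProfile   // the Slick profile to use
  import driver.api._
  val db: Database          // the database to run actions against
}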
For joining two tables:
package com.knol.db.repo

import com.knol.db.connection.DBComponent
import scala.concurrent.Future

trait BankInfoRepository extends BankInfoTable { this: DBComponent =>
  import driver.api._

  def create(bankInfo: BankInfo): Future[Int] = db.run { bankTableInfoAutoInc += bankInfo }

  def update(bankInfo: BankInfo): Future[Int] = db.run { bankInfoTableQuery.filter(_.id === bankInfo.id.get).update(bankInfo) }

  def getById(id: Int): Future[Option[BankInfo]] = db.run { bankInfoTableQuery.filter(_.id === id).result.headOption }

  def getAll(): Future[List[BankInfo]] = db.run { bankInfoTableQuery.to[List].result }

  def delete(id: Int): Future[Int] = db.run { bankInfoTableQuery.filter(_.id === id).delete }

  def getBankWithInfo(): Future[List[(Bank, BankInfo)]] =
    db.run {
      (for {
        info <- bankInfoTableQuery
        bank <- info.bank
      } yield (bank, info)).to[List].result
    }

  def getAllBankWithInfo(): Future[List[(Bank, Option[BankInfo])]] =
    db.run {
      bankTableQuery.joinLeft(bankInfoTableQuery).on(_.id === _.bankId).to[List].result
    }
}

private[repo] trait BankInfoTable extends BankTable { this: DBComponent =>
  import driver.api._

  private[BankInfoTable] class BankInfoTable(tag: Tag) extends Table[BankInfo](tag, "bankinfo") {
    val id = column[Int]("id", O.PrimaryKey, O.AutoInc)
    val owner = column[String]("owner")
    val bankId = column[Int]("bank_id")
    val branches = column[Int]("branches")
    def bank = foreignKey("bank_product_fk", bankId, bankTableQuery)(_.id)
    def * = (owner, branches, bankId, id.?) <> (BankInfo.tupled, BankInfo.unapply)
  }

  protected val bankInfoTableQuery = TableQuery[BankInfoTable]

  protected def bankTableInfoAutoInc = bankInfoTableQuery returning bankInfoTableQuery.map(_.id)
}

case class BankInfo(owner: String, branches: Int, bankId: Int, id: Option[Int] = None)
For more explanation, see the blog post and the GitHub repository. It might be helpful!
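To use a repository you mix it with a concrete DBComponent. A hypothetical wiring (the PostgresDBComponent name and the config key are assumptions, not part of the answer):

import slick.jdbc.PostgresProfile

// Hypothetical concrete component; adapt the profile and config key to your setup.
trait PostgresDBComponent extends DBComponent {
  val driver = PostgresProfile
  import driver.api._
  val db: Database = Database.forConfig("db")
}

object BankRepository extends BankRepository with PostgresDBComponent

// Usage (hypothetical):
// val allBanks: Future[List[Bank]] = BankRepository.getAll()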

Slick DBIO: master-detail

I need to collect report data from a master-detail relation. Here is a simplified example:
case class Person(id: Int, name: String)
case class Order(id: String, personId: Int, description: String)

class PersonTable(tag: Tag) extends Table[Person](tag, "person") {
  def id = column[Int]("id")
  def name = column[String]("name")
  override def * = (id, name) <> (Person.tupled, Person.unapply)
}

class OrderTable(tag: Tag) extends Table[Order](tag, "order") {
  def id = column[String]("id")
  def personId = column[Int]("personId")
  def description = column[String]("description")
  override def * = (id, personId, description) <> (Order.tupled, Order.unapply)
}

val persons = TableQuery[PersonTable]
val orders = TableQuery[OrderTable]

case class PersonReport(nameToDescription: Map[String, Seq[String]])

/** Some complex function that cannot be expressed in SQL and
  * in slick's #join.
  */
def myScalaCondition(person: Person): Boolean =
  person.name.contains("1")

// Doesn't compile:
// val reportDbio1: DBIO[PersonReport] =
//   (for { allPersons <- persons.result
//          person <- allPersons
//          if myScalaCondition(person)
//          descriptions <- orders.
//            filter(_.personId === person.id).
//            map(_.description).result
//        } yield (person.name, descriptions)
//   ).map(s => PersonReport(s.toMap))

val reportDbio2: DBIO[PersonReport] =
  persons.result.flatMap { allPersons =>
    val dbios = allPersons.
      filter(myScalaCondition).map { person =>
        orders.
          filter(_.personId === person.id).
          map(_.description).result.map { seq => (person.name, seq) }
      }
    DBIO.sequence(dbios)
  }.map(ps => PersonReport(ps.toMap))
It looks far from straightforward. When I need to collect master-detail data across three levels, it becomes incomprehensible.
Is there a better way?

Exception when using UDT in Spark DataFrame

I'm trying to create a user-defined type in Spark SQL, but I receive the following error, even when using their example:
com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType
Has anyone made this work?
My code:
test("udt serialisation") {
val points = Seq(new ExamplePoint(1.3, 1.6), new ExamplePoint(1.3, 1.8))
val df = SparkContextForStdout.context.parallelize(points).toDF()
}
@SQLUserDefinedType(udt = classOf[ExamplePointUDT])
case class ExamplePoint(val x: Double, val y: Double)

/**
 * User-defined type for [[ExamplePoint]].
 */
class ExamplePointUDT extends UserDefinedType[ExamplePoint] {

  override def sqlType: DataType = ArrayType(DoubleType, false)

  override def pyUDT: String = "pyspark.sql.tests.ExamplePointUDT"

  override def serialize(obj: Any): Seq[Double] = {
    obj match {
      case p: ExamplePoint =>
        Seq(p.x, p.y)
    }
  }

  override def deserialize(datum: Any): ExamplePoint = {
    datum match {
      case values: Seq[_] =>
        val xy = values.asInstanceOf[Seq[Double]]
        assert(xy.length == 2)
        new ExamplePoint(xy(0), xy(1))
      case values: util.ArrayList[_] =>
        val xy = values.asInstanceOf[util.ArrayList[Double]].asScala
        new ExamplePoint(xy(0), xy(1))
    }
  }

  override def userClass: Class[ExamplePoint] = classOf[ExamplePoint]
}
The useful part of the stack trace is this:
com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType
java.lang.ClassCastException: com.ubs.ged.risk.stdout.spark.ExamplePointUDT cannot be cast to org.apache.spark.sql.types.StructType
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
at org.apache.spark.sql.SQLContext$implicits$.rddToDataFrameHolder(SQLContext.scala:254)
It seems that the UDT needs to be used inside another class (as the type of a field) to work. One solution to use it directly is to wrap it in a Tuple1:
test("udt serialisation") {
val points = Seq(new Tuple1(new ExamplePoint(1.3, 1.6)), new Tuple1(new ExamplePoint(1.3, 1.8)))
val df = SparkContextForStdout.context.parallelize(points).toDF()
df.collect().foreach(println(_))
}
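Equivalently, a small wrapper case class does the same job as Tuple1 and gives the DataFrame column a readable name instead of _1 (a sketch; the PointRow name is hypothetical):

// Hypothetical wrapper: the UDT field lives inside a named case class.
case class PointRow(point: ExamplePoint)

test("udt serialisation via wrapper") {
  val points = Seq(PointRow(new ExamplePoint(1.3, 1.6)), PointRow(new ExamplePoint(1.3, 1.8)))
  val df = SparkContextForStdout.context.parallelize(points).toDF()
  df.collect().foreach(println(_))
}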
