How to use java.time.LocalDate in Cassandra query from Spark?

We have a table in Cassandra with a column start_time of type date.
When we execute the following code:
val resultRDD = inputRDD.joinWithCassandraTable(KEY_SPACE,TABLE)
.where("start_time = ?", java.time.LocalDate.now)
We get the following error:
com.datastax.spark.connector.types.TypeConversionException: Cannot convert object 2016-10-13 of type class java.time.LocalDate to com.datastax.driver.core.LocalDate.
at com.datastax.spark.connector.types.TypeConverter$$anonfun$convert$1.apply(TypeConverter.scala:45)
at com.datastax.spark.connector.types.TypeConverter$$anonfun$convert$1.apply(TypeConverter.scala:43)
at com.datastax.spark.connector.types.TypeConverter$LocalDateConverter$$anonfun$convertPF$14.applyOrElse(TypeConverter.scala:449)
at com.datastax.spark.connector.types.TypeConverter$class.convert(TypeConverter.scala:43)
at com.datastax.spark.connector.types.TypeConverter$LocalDateConverter$.com$datastax$spark$connector$types$NullableTypeConverter$$super$convert(TypeConverter.scala:439)
at com.datastax.spark.connector.types.NullableTypeConverter$class.convert(TypeConverter.scala:56)
at com.datastax.spark.connector.types.TypeConverter$LocalDateConverter$.convert(TypeConverter.scala:439)
at com.datastax.spark.connector.types.TypeConverter$OptionToNullConverter$$anonfun$convertPF$29.applyOrElse(TypeConverter.scala:788)
at com.datastax.spark.connector.types.TypeConverter$class.convert(TypeConverter.scala:43)
at com.datastax.spark.connector.types.TypeConverter$OptionToNullConverter.com$datastax$spark$connector$types$NullableTypeConverter$$super$convert(TypeConverter.scala:771)
at com.datastax.spark.connector.types.NullableTypeConverter$class.convert(TypeConverter.scala:56)
at com.datastax.spark.connector.types.TypeConverter$OptionToNullConverter.convert(TypeConverter.scala:771)
at com.datastax.spark.connector.writer.BoundStatementBuilder$$anonfun$8.apply(BoundStatementBuilder.scala:93)
I've tried to register custom converters according to the documentation:
object JavaLocalDateToCassandraLocalDateConverter extends TypeConverter[com.datastax.driver.core.LocalDate] {
  def targetTypeTag = typeTag[com.datastax.driver.core.LocalDate]
  def convertPF = {
    case ld: java.time.LocalDate => com.datastax.driver.core.LocalDate.fromYearMonthDay(ld.getYear, ld.getMonthValue, ld.getDayOfMonth)
    case _ => com.datastax.driver.core.LocalDate.fromYearMonthDay(1971, 1, 1)
  }
}

object CassandraLocalDateToJavaLocalDateConverter extends TypeConverter[java.time.LocalDate] {
  def targetTypeTag = typeTag[java.time.LocalDate]
  def convertPF = {
    case ld: com.datastax.driver.core.LocalDate => java.time.LocalDate.of(ld.getYear(), ld.getMonth(), ld.getDay())
    case _ => java.time.LocalDate.now
  }
}
TypeConverter.registerConverter(JavaLocalDateToCassandraLocalDateConverter)
TypeConverter.registerConverter(CassandraLocalDateToJavaLocalDateConverter)
But it didn't help.
How can I use JDK8 Date/Time classes in Cassandra queries executed from Spark?

I think the simplest thing to do in a where clause like this is to just call
sc
.cassandraTable("test","test")
.where("start_time = ?", java.time.LocalDate.now.toString)
.collect
And just pass in the string, since that is a well-defined conversion.
There seems to be an issue in the TypeConverters where your converter is not taking precedence over the built-in converter. I'll take a quick look.
--Edit--
It seems like the registered converters are not being properly transferred to the executors. In local mode the code works as expected, which makes me think this is a serialization issue. I would open a ticket on the Spark Cassandra Connector for this issue.
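Another hedged workaround (untested sketch, reusing the KEY_SPACE and TABLE names from the question): convert the java.time.LocalDate to the driver's com.datastax.driver.core.LocalDate yourself before it reaches the connector, so only the built-in converters are involved:

// Untested sketch: hand the connector the driver's own LocalDate type,
// so no custom TypeConverter needs to reach the executors.
import com.datastax.driver.core.{LocalDate => DriverLocalDate}

val today = java.time.LocalDate.now
val driverDate = DriverLocalDate.fromYearMonthDay(
  today.getYear, today.getMonthValue, today.getDayOfMonth)

val resultRDD = inputRDD
  .joinWithCassandraTable(KEY_SPACE, TABLE)
  .where("start_time = ?", driverDate)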

The Cassandra date/time format is yyyy-MM-dd HH:mm:ss.SSS, so if you are using Java 8 you can use the code below to convert the Cassandra value to a LocalDateTime, and then apply your logic.
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
val dateTime = LocalDateTime.parse(cassandraDateTime, formatter)
Or you can convert the LocalDate to the Cassandra date format and compare it that way.
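For that direction, a minimal sketch (assuming the column is a plain date, whose string form is the ISO yyyy-MM-dd shown in the exception above):

import java.time.LocalDate
import java.time.format.DateTimeFormatter

// produces e.g. "2016-10-13", matching the value shown in the error message
val asCassandraDate: String = LocalDate.now.format(DateTimeFormatter.ISO_LOCAL_DATE)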

Related

How to retrieve a particular string from a text in Kotlin?

I am getting the scan result in a string like this:
DriverId=60cb1daa20056c0c92ebe457,Amount=10.0
I want to retrieve the driver id and the amount from this string.
How can I do that? Please help...
It depends on your overall format. Basic operations like substrings, as suggested by @iLoveYou3000, can work fine if you really have this fixed format.
If the keys are dynamic, or could be changed in the future, you could also use more general approaches, for instance using split():
val attributeStrings = input.split(",")
val attributesMap = attributeStrings.map { it.split("=") }.associate { it[0] to it[1] }
val driverId = attributesMap["DriverId"]
val amount = attributesMap["Amount"]?.toDouble() // or ?.toBigDecimal()
This is one of the possible ways that I could think of.
val driverID= str.substringAfter("DriverId=", "").substringBefore(",", "")
val amount = str.substringAfter("Amount=", "")

How to create an update statement where a UDT value needs to be updated using QueryBuilder

I have the following UDT type
CREATE TYPE tag_partitions(
year bigint,
month bigint);
and the following table
CREATE TABLE ${tableName} (
tag text,
partition_info set<FROZEN<tag_partitions>>,
PRIMARY KEY ((tag))
)
The table schema is mapped using the following model
case class TagPartitionsInfo(year:Long, month:Long)
case class TagPartitions(tag:String, partition_info:Set[TagPartitionsInfo])
I have written a function which should create an Update.IfExists query, but I don't know how to update the UDT value. I tried to use set but it isn't working.
def updateValues(tableName:String, model:TagPartitions, id:TagPartitionKeys):Update.IfExists = {
  val partitionInfoType:UserType = session.getCluster().getMetadata
    .getKeyspace("codingjedi").getUserType("tag_partitions")
  //create value
  //the logic below assumes that there is only one element in the set
  val partitionsInfoSet:Set[UDTValue] = model.partition_info.map((partitionInfo:TagPartitionsInfo) => {
    partitionInfoType.newValue()
      .setLong("year", partitionInfo.year)
      .setLong("month", partitionInfo.month)
  })
  println("partition info converted to UDTValue: " + partitionsInfoSet)
  QueryBuilder.update(tableName)
    .`with`(QueryBuilder.WHAT_TO_DO_HERE_TO_UPDATE_UDT("partition_info", partitionsInfoSet))
    .where(QueryBuilder.eq("tag", id.tag)).ifExists()
}
The mistake was that I was passing partitionsInfoSet, which is a Scala Set, to the statement; it needs to be converted to a Java Set with setAsJavaSet:
QueryBuilder.update(tableName)
  .`with`(QueryBuilder.set("partition_info", setAsJavaSet(partitionsInfoSet)))
  .where(QueryBuilder.eq("tag", id.tag))
  .ifExists()
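For completeness, setAsJavaSet is presumably imported from scala.collection.JavaConversions; the non-deprecated route is JavaConverters with .asJava. A small sketch of both, assuming the same partitionsInfoSet as above:

// Option 1: explicit conversion (JavaConversions is deprecated in newer Scala versions)
import scala.collection.JavaConversions.setAsJavaSet
val javaSet: java.util.Set[UDTValue] = setAsJavaSet(partitionsInfoSet)

// Option 2: JavaConverters and .asJava
import scala.collection.JavaConverters._
QueryBuilder.update(tableName)
  .`with`(QueryBuilder.set("partition_info", partitionsInfoSet.asJava))
  .where(QueryBuilder.eq("tag", id.tag))
  .ifExists()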
Although it doesn't answer your exact question, wouldn't it be easier to use the Object Mapper for this? Something like this (I didn't modify it heavily to match your code):
@UDT(name = "scala_udt")
case class UdtCaseClass(id: Integer, @(Field @field)(name = "t") text: String) {
  def this() {
    this(0, "")
  }
}

@Table(name = "scala_test_udt")
case class TableObjectCaseClassWithUDT(@(PartitionKey @field) id: Integer,
                                       udts: java.util.Set[UdtCaseClass]) {
  def this() {
    this(0, new java.util.HashSet[UdtCaseClass]())
  }
}
and then just create the case class instances and use mapper.save on them. (Also note that you need to use Java collections until you've imported the Scala codecs.)
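A hedged sketch of what that could look like with the Java driver's MappingManager (the session variable, the key value, and the text are placeholders):

import com.datastax.driver.mapping.MappingManager

// session is an already-connected com.datastax.driver.core.Session
val manager = new MappingManager(session)
val mapper = manager.mapper(classOf[TableObjectCaseClassWithUDT])

val udts = new java.util.HashSet[UdtCaseClass]()
udts.add(UdtCaseClass(1, "some text"))

mapper.save(TableObjectCaseClassWithUDT(1, udts))
val loaded = mapper.get(Integer.valueOf(1))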
The primary reasons for using the Object Mapper are ease of use and better performance, because it uses prepared statements under the hood instead of built statements, which are much less efficient.
You can find more information about the Object Mapper with Scala in an article that I wrote recently.

Unable to map values of a Set into a model

I have the following model
case class TagPartitionsInfo(
  year: Int,
  month: Int
)

case class TagPartitions(
  tag: String,
  partition_info: Set[TagPartitionsInfo]
)
The data in the Cassandra table is stored like follows:
tag | partition_info
------------+--------------------------------------------------
javascript | {{year: 2018, month: 1}, {year: 2018, month: 2}}
When I query the table, I try to create the TagPartitions from the ResultSet as follows, but my code isn't compiling. The issue seems to be the way I am extracting the Set from the row:
TagPartitions(row.getString("tag"),row.getSet[TagPartitionsInfo]("partition_info",TagPartitionsInfo.getClass))
The error is Cannot resolve symbol getSet.
I also tried row.getSet("partition_info",TagPartitionsInfo.getClass) but then I see the error Type mismatch, expected Set[TagPartitionsInfo], actual util.Set[Any]
What am I doing wrong?
This worked. As I am using a UDT, I have to use UserType and UDTValue to convert the UDT into my model:
import scala.collection.JavaConverters._

val tag = row.getString("tag")
val partitionInfoType: UserType = session.getCluster().getMetadata.getKeyspace("codingjedi").getUserType("tag_partitions")
//the logic below assumes that there is only one element in the set
val partitionsInfo = row.getSet("partition_info", partitionInfoType.newValue().getClass)
println("tag is " + tag + " and partition info converted to UDTValue: " + partitionsInfo)
val udtValueScalaSet: Set[UDTValue] = partitionsInfo.asScala.toSet
//convert Set[UDTValue] to Set[TagPartitionsInfo]
val partitionInfoSet: Set[TagPartitionsInfo] = udtValueScalaSet.map(partition => TagPartitionsInfo(partition.getLong("year").toInt, partition.getLong("month").toInt))
return TagPartitions(tag, partitionInfoSet)

Sorting a table by date in Slick 3.1.x

I have the following Slick class that includes a date:
import java.sql.Date
import java.time.LocalDate

class ReportDateDB(tag: Tag) extends Table[ReportDateVO](tag, "report_dates") {
  def reportDate = column[LocalDate]("report_date")(localDateColumnType)

  def * = (reportDate) <> (ReportDateVO.apply, ReportDateVO.unapply)

  implicit val localDateColumnType = MappedColumnType.base[LocalDate, Date](
    d => Date.valueOf(d),
    d => d.toLocalDate
  )
}
When I attempt to sort the table by date:
val query = TableQuery[ReportDateDB]
val action = query.sortBy(_.reportDate).result
I get the following compilation error
not enough arguments for method sortBy: (implicit evidence$2: slick.lifted.Rep[java.time.LocalDate] ⇒
slick.lifted.Ordered)slick.lifted.Query[fdic.ReportDateDB,fdic.ReportDateDB#TableElementType,Seq].
Unspecified value parameter evidence$2.
No implicit view available from slick.lifted.Rep[java.time.LocalDate] ⇒ slick.lifted.Ordered.
How do I specify the implicit default ordering?
You need to make your implicit val localDateColumnType available where you run the query. For example, this will work:
implicit val localDateColumnType = MappedColumnType.base[LocalDate, Date](
d => Date.valueOf(d),
d => d.toLocalDate)
val query = TableQuery[ReportDateDB]
val action = query.sortBy(_.reportDate).result
I'm not sure where the best place to put this is, but I usually put all these conversions in a package object.
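For instance, a rough sketch of such a package object (untested; it assumes the MySQL profile that another answer below imports, and the fdic package from the error message):

package fdic

import java.sql.Date
import java.time.LocalDate
import slick.driver.MySQLDriver.api._

package object db {
  // keep the LocalDate <-> java.sql.Date mapping in one place
  implicit val localDateColumnType = MappedColumnType.base[LocalDate, Date](
    d => Date.valueOf(d),
    d => d.toLocalDate
  )
}

Anything that needs the mapping (the table definition and the query call sites) can then import fdic.db._.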
It should work as described here:
implicit def localDateOrdering: Ordering[LocalDate] = Ordering.fromLessThan(_ isBefore _)
Try adding this line to your import list:
import slick.driver.MySQLDriver.api._

Spark UDF with a DataFrame

I am using Spark 1.3. I have a dataset where the dates in a column (ordering_date) are in yyyy/MM/dd format. I want to do some calculations with the dates, and therefore I want to use Joda-Time for some conversions/formatting. Here is the UDF that I have:
val return_date = udf((str: String, dtf: DateTimeFormatter) => dtf.formatted(str))
Here is the code where the UDF is called. However, I get an error saying "Not Applicable". Do I need to register this UDF or am I missing something here?
val user_with_dates_formatted = users.withColumn(
  "formatted_date",
  return_date(users("ordering_date"), DateTimeFormat.forPattern("yyyy/MM/dd"))
)
I don't believe you can pass in the DateTimeFormatter as an argument to the UDF. You can only pass in a Column. One solution would be to do:
val return_date = udf((str: String, format: String) => {
  DateTimeFormat.forPattern(format).formatted(str)
})
And then:
val user_with_dates_formatted = users.withColumn(
"formatted_date",
return_date(users("ordering_date"), lit("yyyy/MM/dd"))
)
Honestly, though, both this and your original approach have the same problem: they both parse yyyy/MM/dd using forPattern for every record. Better would be to create a singleton object wrapped around a Map[String, DateTimeFormatter], maybe like this (thoroughly untested, but you get the idea):
object DateFormatters {
  var formatters = Map[String, DateTimeFormatter]()

  def getFormatter(format: String): DateTimeFormatter = {
    if (formatters.get(format).isEmpty) {
      formatters = formatters + (format -> DateTimeFormat.forPattern(format))
    }
    formatters.get(format).get
  }
}
Then you would change your UDF to:
val return_date = udf((str: String, format: String) => {
  DateFormatters.getFormatter(format).formatted(str)
})
That way, DateTimeFormat.forPattern(...) is only called once per format per executor.
One thing to note about the singleton object solution is that you can't define the object in the spark-shell; you have to package it up in a JAR file and pass it to spark-shell with the --jars option if you want to use the DateFormatters object in the shell.
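For example, something like spark-shell --jars /path/to/date-formatters.jar would do it, where the jar path here is just a placeholder for whatever artifact you build the object into.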
