I'm trying to catch an SQL error in Slick 3.x. The code below doesn't print anything, but if traced under debug, it works fine (it prints the failure). What's wrong with this code?
object TestSlick extends App {
  val db = Database.forConfig("dbconfig")
  val sql = "update table_does_not_exist set zzz=1 where ccc=2"
  val q = sqlu"#$sql"

  db.run(q.asTry).map { result =>
    result match {
      case Success(r) => println(r)
      case Failure(e) =>
        println(s"SQL Error, ${e.getMessage}")
        println("command:" + sql)
        throw e
    }
  }
}
This works; the future has to be awaited (thanks to lxx for the tip). db.run returns a Future, and under App the main thread exits before that future completes, so nothing is printed unless you block on the result. That is also why it appears to work in the debugger: stepping through gives the future time to complete.
val future = db.run(q.asTry).map { result =>
  result match {
    case Success(r) => println(r)
    case Failure(e) =>
      println(s"SQL Error, ${e.getMessage}")
      println("command:" + sql)
      throw e
  }
}
Await.result(future, Duration.Inf)
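For reference, here is a minimal set of imports this snippet assumes (the profile is a guess on my part; use whichever Slick 3.x profile matches your database):

import scala.concurrent.Await
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.util.{Success, Failure}

// Brings Database, sqlu and the action combinators into scope.
// PostgresProfile is only an example; substitute your own profile.
import slick.jdbc.PostgresProfile.api._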
object Test extends App {
  val x = 0
  val threadHello = (1 to 5).map(_ => new Thread(() => {
    println("Hello")
    println(x) // this results in join never resolving "collecting data"
  }))
  threadHello.foreach(_.start())
  threadHello.foreach(_.join())
  println(x)
}
I'm still learning about concurrency in Scala but I'm having an issue where thread.join() never resolves and the program ends up running forever, UNLESS I comment out the println(x) statement.
Debugging reveals that the thread is never able to access the value of x, and I'm not sure why this is an issue.
(Screenshot: the problem highlighted when debugging in IntelliJ.)
Your code actually runs just fine for me, in Scala 2.13.
That said, I suspect your problem has to do with the initialization order of vals in scala.App. If so, you should be able to work around it by making x lazy, like so:
object Test extends App {
  lazy val x = 0
  val threadHello = (1 to 5).map(_ => new Thread(() => {
    println("Hello")
    println(x)
  }))
  threadHello.foreach(_.start())
  threadHello.foreach(_.join())
  println(x)
}
Alternatively, just don't use scala.App:
object Main {
  def main(args: Array[String]): Unit = {
    val x = 0
    val threadHello = (1 to 5).map(_ => new Thread(() => {
      println("Hello")
      println(x)
    }))
    threadHello.foreach(_.start())
    threadHello.foreach(_.join())
    println(x)
  }
}
I have the following UDF, used to convert a time stored as a string into a timestamp.
val hmsToTimeStampUdf = udf((dt: String) => {
  if (dt == null) null
  else {
    val formatter = DateTimeFormat.forPattern("HH:mm:ss")
    try {
      new Timestamp(formatter.parseDateTime(dt).getMillis)
    } catch {
      case t: Throwable => throw new RuntimeException("hmsToTimeStampUdf,dt=" + dt, t)
    }
  }
})
The UDF is applied to convert the String value into a Timestamp:
outputDf.withColumn(schemaColumn.name, hmsToTimeStampUdf(col(schemaColumn.name)))
But some CSV files have invalid values in this column, which causes the RuntimeException. I want to find which rows contain these broken records. Is it possible to access row information inside the UDF?
Instead of throwing a RuntimeException that kills your CSV parsing, a better approach might be to have the UDF return a (well-formed, corrupted) tuple. Then you can easily segregate good/bad rows by selecting the is null / is not null subsets.
def safeConvert(dt: String): (Timestamp, String) = {
  if (dt == null)
    (null, null)
  else {
    val formatter = DateTimeFormat.forPattern("HH:mm:ss")
    try {
      (new Timestamp(formatter.parseDateTime(dt).getMillis), null)
    } catch {
      case e: Exception =>
        (null, dt)
    }
  }
}
val safeConvertUDF = udf(safeConvert(_: String))

val df = Seq(("00:01:02"), ("03:04:05"), ("67:89:10")).toDF("dt")
df.withColumn("temp", safeConvertUDF($"dt"))
  .withColumn("goodData", $"temp".getItem("_1"))
  .withColumn("badData", $"temp".getItem("_2"))
  .drop($"temp").show(false)
+--------+-------------------+--------+
|dt |goodData |badData |
+--------+-------------------+--------+
|00:01:02|1970-01-01 00:01:02|null |
|03:04:05|1970-01-01 03:04:05|null |
|67:89:10|null |67:89:10|
+--------+-------------------+--------+
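From there, as described above, the broken records are simply the rows where the second tuple element is non-null. A quick sketch, reusing the column names from the example:

val converted = df.withColumn("temp", safeConvertUDF($"dt"))
  .withColumn("goodData", $"temp".getItem("_1"))
  .withColumn("badData", $"temp".getItem("_2"))
  .drop($"temp")

// The corrupted input rows -- badData holds the original, unparseable string.
val badRows = converted.filter($"badData".isNotNull)
// The well-formed rows.
val goodRows = converted.filter($"badData".isNull)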
You can add the row as a second input parameter to the udf:
val hmsToTimeStampUdf = udf((dt: String, r: Row) => {
  if (dt == null) null
  else {
    val formatter = DateTimeFormat.forPattern("HH:mm:ss")
    try {
      new Timestamp(formatter.parseDateTime(dt).getMillis)
    } catch {
      case t: Throwable =>
        println(r) // do some error handling
        null
    }
  }
})
When calling the udf, pass a struct containing all columns of the dataframe as the second parameter (thanks to this answer):
df.withColumn("dt", hmsToTimeStampUdf(col("dt"), struct(df.columns.map(df(_)) : _*)))
I want to put Avro messages from Kafka topics into Elasticsearch using a Spark job (and SchemaRegistry with many defined schemas). I was able to read and deserialize records into String (JSON) format successfully, with these two methods:
// Deserialize Avro to a JSON String
def avroToJsonString(record: GenericRecord): String = {
  val baos = new ByteArrayOutputStream
  try {
    val schema = record.getSchema
    val jsonEncoder = EncoderFactory.get.jsonEncoder(schema, baos, false)
    val avroWriter = new SpecificDatumWriter[GenericRecord](schema)
    avroWriter.write(record, jsonEncoder)
    jsonEncoder.flush()
    baos.flush()
    new String(baos.toByteArray)
  } catch {
    case ex: IOException =>
      throw new IllegalStateException(ex)
  } finally {
    if (baos != null) baos.close()
  }
}
// Parse a JSON String
val parseJsonStream = (inStream: String) => {
  try {
    val parsed = Json.parse(inStream)
    Option(parsed)
  } catch {
    case e: Exception =>
      System.err.println("Exception while parsing JSON: " + inStream)
      e.printStackTrace()
      None
  }
}
I'm reading record by record and I can see the deserialized JSON strings in the debugger, so everything looks fine, but for some reason I can't save them into Elasticsearch, because I guess an RDD is needed to call the saveToEs method. This is how I read the Avro records from Kafka:
val kafkaStream: InputDStream[ConsumerRecord[String, GenericRecord]] =
  KafkaUtils.createDirectStream[String, GenericRecord](ssc, PreferBrokers,
    Subscribe[String, GenericRecord](KAFKA_AVRO_TOPICS, kafkaParams))

val kafkaStreamParsed = kafkaStream.foreachRDD(rdd => {
  rdd.foreach(x => {
    val jsonString: String = avroToJsonString(x.value())
    parseJsonStream(jsonString)
  })
})
When I was reading JSON (not Avro) records, I was able to do it with:
EsSparkStreaming.saveToEs(kafkaStreamParsed, ELASTICSEARCH_EVENTS_INDEX + "/" + ELASTICSEARCH_TYPE)
Now I get an error on the saveToEs method saying
Cannot resolve overloaded method 'saveToEs'
I tried to make an RDD with sc.makeRDD(), but had no luck either. How should I put all these records from the batch job into an RDD and then into Elasticsearch, or am I doing it all wrong?
UPDATE
I tried this solution:
val messages: DStream[Unit] = kafkaStream
  .map(record => record.value)
  .flatMap(record => {
    val record1 = avroToJsonString(record)
    JSON.parseFull(record1).map(rawMap => {
      val map = rawMap.asInstanceOf[Map[String, String]]
    })
  })
This fails again with the same error (cannot resolve overloaded method).
UPDATE2
val kafkaStreamParsed: DStream[Any] = kafkaStream.map(rdd => {
  val eventJSON = avroToJsonString(rdd.value())
  parseJsonStream(eventJSON)
})

try {
  EsSparkStreaming.saveToEs(kafkaStreamParsed, ELASTICSEARCH_EVENTS_INDEX + "/" + ELASTICSEARCH_TYPE)
} catch {
  case e: Exception =>
    EsSparkStreaming.saveToEs(kafkaStreamParsed, ELASTICSEARCH_FAILED_EVENTS)
    e.printStackTrace()
}
Now I get the records in ES.
Using Spark 2.3.0 and Scala 2.11.8
I've managed to do it:
val kafkaStream: InputDStream[ConsumerRecord[String, GenericRecord]] =
  KafkaUtils.createDirectStream[String, GenericRecord](ssc, PreferBrokers,
    Subscribe[String, GenericRecord](KAFKA_AVRO_EVENT_TOPICS, kafkaParams))

val kafkaStreamParsed: DStream[Any] = kafkaStream.map(rdd => {
  val eventJSON = avroToJsonString(rdd.value())
  parseJsonStream(eventJSON)
})

try {
  EsSparkStreaming.saveToEs(kafkaStreamParsed, ELASTICSEARCH_EVENTS_INDEX + "/" + ELASTICSEARCH_TYPE)
} catch {
  case e: Exception =>
    EsSparkStreaming.saveToEs(kafkaStreamParsed, ELASTICSEARCH_FAILED_EVENTS)
    e.printStackTrace()
}
I shut down a Spark StreamingContext with the following code. Essentially, a thread monitors a boolean switch and then calls StreamingContext.stop(true, true).
Everything seems to process and all my data appears to have been collected. However, I get the following exception on shutdown.
Can I ignore it? It looks like there is potential for data loss.
18/03/07 11:46:40 WARN ReceivedBlockTracker: Exception thrown while writing record: BatchAllocationEvent(1520452000000 ms,AllocatedBlocks(Map(0 -> ArrayBuffer()))) to the WriteAheadLog.
java.lang.IllegalStateException: close() was called on BatchedWriteAheadLog before write request with time 1520452000001 could be fulfilled.
  at org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:86)
  at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:234)
  at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.allocateBlocksToBatch(ReceivedBlockTracker.scala:118)
  at org.apache.spark.streaming.scheduler.ReceiverTracker.allocateBlocksToBatch(ReceiverTracker.scala:213)
  at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
The Thread
var stopScc = false

private def stopSccThread(): Unit = {
  val thread = new Thread {
    override def run {
      var continueRun = true
      while (continueRun) {
        logger.debug("Checking status")
        if (stopScc == true) {
          getSparkStreamingContext(fieldVariables).stop(true, true)
          logger.info("Called Stop on Streaming Context")
          continueRun = false
        }
        Thread.sleep(50)
      }
    }
  }
  thread.start
}
The Stream
@throws(classOf[IKodaMLException])
def startStream(ip: String, port: Int): Unit = {
  try {
    val ssc = getSparkStreamingContext(fieldVariables)
    ssc.checkpoint("./ikoda/cp")

    val lines = ssc.socketTextStream(ip, port, StorageLevel.MEMORY_AND_DISK_SER)
    lines.print

    val lmap = lines.map { l =>
      if (l.contains("IKODA_END_STREAM")) {
        stopScc = true
      }
      l
    }

    lmap.foreachRDD { r =>
      if (r.count() > 0) {
        logger.info(s"RECEIVED: ${r.toString()} first: ${r.first().toString}")
        r.saveAsTextFile("./ikoda/test/test")
      } else {
        logger.info("Empty RDD. No data received")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  } catch {
    case e: Exception =>
      logger.error(e.getMessage, e)
      throw new IKodaMLException(e.getMessage, e)
  }
}
I had the same issue; calling close() instead of stop() fixed it.
This code works fine, but I want to manage the threads with Future. The sendSMS method normally takes 3 to 5 seconds to execute. I wrapped it in a Future in one place, but I want to know whether that is enough:
val c = for {
  t <- Future { doSendSms("+9178787878787", "i scare with threads") }
} yield t

c.map { res =>
  res match {
    case e: Error =>
      Ok(write(Map("result" -> "error")))
    case Success() =>
      Ok(write(Map("result" -> "success")))
  }
}

def doSendSms(recipient: String, body: String): SentSmsResult = {
  try {
    sendSMS(recipient, body)
    Success()
  } catch {
    case twilioEx: TwilioRestException =>
      return Error(twilioEx.toString)
    case e: Exception =>
      return Error(e.toString)
  }
}

// sending sms from twilio, this method takes 3 to 5 seconds to execute
def sendSMS(smsTo: String, body: String) = {
  val params = Map("To" -> smsTo, "From" -> twilioNumber, "Body" -> body)
  val messageFactory = client.getAccount.getSmsFactory
  messageFactory.create(params)
}
If not, how should I manage the Future in this code?
I would use recover:
val c = for {
  t <- doSendSms("+9178787878787", "i scare with threads")
} yield t

def doSendSms(recipient: String, body: String): Future[SentSmsResult] =
  Future {
    sendSMS(recipient, body)
  }.recover {
    case twilioEx: TwilioRestException => Error(twilioEx.toString)
    case e: Exception => Error(e.toString)
  }
recover will catch exceptions thrown during the future's execution, allowing you to return a new result wrapped in a Future, as the documentation states:
The recover combinator creates a new future which holds the same result as the original future if it completed successfully. If it did not then the partial function argument is applied to the Throwable which failed the original future.
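With doSendSms now returning a Future[SentSmsResult], the controller code from the question can simply map over it (Error, Success, Ok and write are the question's own definitions, and an implicit ExecutionContext needs to be in scope):

doSendSms("+9178787878787", "i scare with threads").map {
  case _: Error  => Ok(write(Map("result" -> "error")))
  case Success() => Ok(write(Map("result" -> "success")))
}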