Transform an object into another object manually

I have the following method:
fun get(browsePlayerContext: BrowsePlayerContext): Single<List<Conference>>
which returns a Single<List<Conference>>, where the Conference object has the following structure:
data class Conference(
    val label: String,
    val uid: UID?,
    val action: BrowsePlayerAction?,
    val image: String
)
But I need to transform this response into a:
Single<List<EntityBrowse>>
The EntityBrowse class has the same structure, i.e.:
data class EntityBrowse(
    val label: String,
    val uid: UID?,
    val action: BrowsePlayerAction?,
    val image: String
)
I am doing the transformation manually, but I need a more sophisticated way, because I am going to get different kinds of objects and will have to apply the same transformation to EntityBrowse.
Any ideas?

You could use the .map function to transform the Conference objects into EntityBrowse objects:
val conferences: List<Conference> = getConferences()
val entities: List<EntityBrowse> = conferences.map { conference ->
    EntityBrowse(conference.label, conference.uid, conference.action, conference.image)
}

You can use the map function on the Single object to transform Single<List<Conference>> into Single<List<EntityBrowse>>:
val result: Single<List<EntityBrowse>> = get(context).map { conferences: List<Conference> ->
    // transform List<Conference> to List<EntityBrowse> using the `conferences` variable
    conferences.map { EntityBrowse(it.label, it.uid, it.action, it.image) }
}

Related

BSON to Play JSON support for Long values

I've started using the play-json/play-json-compat libraries with reactivemongo 0.20.11.
This lets me use Play JSON reads/writes by importing the 'reactivemongo.play.json._' package and then easily fetch data from a JSONCollection instead of a BSONCollection.
In most cases this works great, but for Long fields it doesn't :(
For example:
case class TestClass(name: String, age: Long)

object TestClass {
  implicit val reads = Json.reads[TestClass]
}
If I try querying using the following function:
def getData: Map[String, Long] = {
  val res = collection.find(emptyDoc)
    .cursor[TestClass]()
    .collect[List](-1, Cursor.ContOnError[List[TestClass]] { case (_, t) =>
      failureLogger.error(s"Failed deserializing TestClass from Mongo", t)
    })
    .map { items =>
      items map { item =>
        item.name -> item.age
      } toMap
    }
  Await.result(res, 10 seconds)
}
Then I get the following error:
play.api.libs.json.JsResultException: JsResultException(errors:List((/age,List(ValidationError(List(error.expected.jsnumber),WrappedArray())))))
I've debugged the reading of the document and noticed that when it first converts the BSON to a JsObject, the long field looks like this:
"age": {"$long": 1526389200000}
I found a way to make this work but I really don't like it:
case class MyBSONLong(`$long`: Long)

object MyBSONLong {
  implicit val longReads = Json.reads[MyBSONLong]
}

case class TestClass(name: String, age: Long)

object TestClass {
  implicit val reads = (
    (__ \ "name").read[String] and
    (__ \ "age").read[MyBSONLong].map(_.`$long`)
  ) (apply _)
}
So this works, but it's a very ugly solution.
Is there a better way to do this?
Thanks in advance :)
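A possibly tidier spelling of the same workaround would be to fold the $long handling into a reusable Reads[Long]. This is only a sketch using plain play-json combinators and has not been verified against reactivemongo's BSON-to-JSON conversion:
import play.api.libs.json._
import play.api.libs.functional.syntax._

// Accept either a plain JSON number or the {"$long": n} wrapper seen above.
val bsonLongReads: Reads[Long] =
  Reads.LongReads.orElse((__ \ "$long").read[Long])

case class TestClass(name: String, age: Long)

object TestClass {
  implicit val reads: Reads[TestClass] = (
    (__ \ "name").read[String] and
    (__ \ "age").read[Long](bsonLongReads)
  )(TestClass.apply _)
}
This keeps the unwrapping in one place, so any other case class with Long fields can reuse bsonLongReads instead of defining its own wrapper type.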

How to parse a JSON containing string property representing JSON

I have many JSONs with the following structure:
{
  "p1": "v1",
  "p2": "v2",
  "p3": "v3",
  "modules": "{ \"nest11\":\"n1v1\", \"nest12\":\"n1v2\", \"nest13\": { \"nest21\": \"n2v1\" } }"
}
How to parse it to this?
v1, v2, v3, n1v1, n1v2, n2v1
It is not a problem to extract "v1, v2, v3", but how do I access "n1v1, n1v2, n2v1" with the Spark DataFrame API?
One approach is to use the DataFrameFlattener implicit class found on the official Databricks site.
First you will need to define the JSON schema for the modules column, and then you flatten the DataFrame as shown below. Here I assume that the file test_json.txt has the following content:
{
  "p1": "v1",
  "p2": "v2",
  "p3": "v3",
  "modules": "{ \"nest11\":\"n1v1\", \"nest12\":\"n1v2\", \"nest13\": { \"nest21\": \"n2v1\" } }"
}
Here is the code:
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.types.{DataType, StructType, StringType}
import spark.implicits._

implicit class DataFrameFlattener(df: DataFrame) {
  def flattenSchema: DataFrame = {
    df.select(flatten(Nil, df.schema): _*)
  }

  protected def flatten(path: Seq[String], schema: DataType): Seq[Column] = schema match {
    case s: StructType => s.fields.flatMap(f => flatten(path :+ f.name, f.dataType))
    case other => col(path.map(n => s"`$n`").mkString(".")).as(path.mkString(".")) :: Nil
  }
}

val schema = (new StructType)
  .add("nest11", StringType)
  .add("nest12", StringType)
  .add("nest13", (new StructType).add("nest21", StringType, false))

val df = spark.read
  .option("multiLine", true).option("mode", "PERMISSIVE")
  .json("C:\\temp\\test_json.txt")

df.withColumn("modules", from_json($"modules", schema))
  .select($"*")
  .flattenSchema
And this should be the output:
+--------------+--------------+---------------------+---+---+---+
|modules.nest11|modules.nest12|modules.nest13.nest21|p1 |p2 |p3 |
+--------------+--------------+---------------------+---+---+---+
|n1v1          |n1v2          |n2v1                 |v1 |v2 |v3 |
+--------------+--------------+---------------------+---+---+---+
Please let me know if you need further clarification.
All you need to do is parse the JSON string into an actual JavaScript object:
const originalJSON = {
  "p1": "v1",
  "p2": "v2",
  "p3": "v3",
  "modules": "{ \"nest11\":\"n1v1\", \"nest12\":\"n1v2\", \"nest13\": { \"nest21\": \"n2v1\" } }"
}

const { modules, ...rest } = originalJSON

const result = {
  ...rest,
  modules: JSON.parse(modules)
}
console.log(result)
console.log(result.modules.nest11)
When you retrieve the "modules" element, you are actually retrieving a string. You have to parse that string as a new JSON object. I don't know what language you're using, but you generally do something like:
String modules_str = originalJSON.get("modules");
JSON modulesJSON = new JSON(modules_str);
String nest11_str = modulesJSON.get("nest11");
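For completeness, here is a rough Scala sketch of the same idea using play-json (which appears elsewhere on this page); originalJsonString stands in for the raw JSON text from the question:
import play.api.libs.json._

// Parse the outer document, then re-parse the "modules" string value as JSON.
val outer: JsValue = Json.parse(originalJsonString)
val modules: JsValue = Json.parse((outer \ "modules").as[String])
val nest11: String = (modules \ "nest11").as[String] // "n1v1"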

Scala - Remove all elements in a list/map of strings from a single String

Working on an internal website where the URL contains the source reference from other systems. This is a business requirement and cannot be changed.
i.e. "http://localhost:9000/source.address.com/7808/project/repo"
"http://localhost:9000/build.address.com/17808/project/repo"
I need to remove these strings from the "project/repo" string/variables using a trait so it can be used natively from multiple services. I also want to be able to add more sources to this list (which already exists) without modifying the method.
"def normalizePath" is the method accessed by services; I have made two non-ideal but reasonable attempts so far. I'm getting stuck on using foldLeft, which I would like some help with, or a simpler way of doing what's described. Code samples below.
First attempt, using an if-else (not ideal, as I'd need to add more if/else statements down the line, and it's less readable than a pattern match):
trait NormalizePath {
  def normalizePath(path: String): String = {
    if (path.startsWith("build.address.com/17808")) {
      path.substring("build.address.com/17808".length, path.length)
    } else {
      path
    }
  }
}
And a second attempt (not ideal, as more patterns will likely be added, and it generates more bytecode than the if/else):
trait NormalizePath {
  val pattern = "build.address.com/17808/"
  val pattern2 = "source.address.com/7808/"

  def normalizePath(path: String) = path match {
    case s if s.startsWith(pattern) => s.substring(pattern.length, s.length)
    case s if s.startsWith(pattern2) => s.substring(pattern2.length, s.length)
    case _ => path
  }
}
The last attempt is to use an address list (it already exists elsewhere but is defined here as an MWE) to remove occurrences from the path string, and it doesn't work:
trait NormalizePath {
  val replacements = (
    "build.address.com/17808",
    "source.address.com/7808/")

  private def remove(path: String, string: String) = {
    path-string
  }

  def normalizePath(path: String): String = {
    replacements.foldLeft(path)(remove)
  }
}
Appreciate any help on this!
If you are just stripping out those strings:
val replacements = Seq(
  "build.address.com/17808",
  "source.address.com/7808/")

replacements.foldLeft("http://localhost:9000/source.address.com/7808/project/repo") {
  case (path, toReplace) => path.replaceAll(toReplace, "")
}
// http://localhost:9000/project/repo
If you are replacing those strings with something else:
val replacementsMap = Seq(
  "build.address.com/17808" -> "one",
  "source.address.com/7808/" -> "two/")

replacementsMap.foldLeft("http://localhost:9000/source.address.com/7808/project/repo") {
  case (path, (toReplace, replacement)) => path.replaceAll(toReplace, replacement)
}
// http://localhost:9000/two/project/repo
The replacements collection can come from elsewhere in the code, so adding a new source does not require modifying the method:
// method replacing matches with the empty string
def normalizePath(path: String) = {
  replacements.foldLeft(path) {
    case (startingPoint, toReplace) => startingPoint.replaceAll(toReplace, "")
  }
}
normalizePath("foobar/build.address.com/17808/project/repo")
// foobar/project/repo
normalizePath("whateverPath")
// whateverPath
normalizePath("build.address.com/17808build.address.com/17808/project/repo")
// /project/repo
A very simple replacement could be made as follows:
val replacements = Seq(
  "build.address.com/17808",
  "source.address.com/7808/")

def normalizePath(path: String): String = {
  replacements.find(path.startsWith(_))            // find the first matching prefix
    .map(prefix => path.substring(prefix.length))  // remove the prefix
    .getOrElse(path)                               // if not found, return the original string
}
Since the expected replacements are very similar, have you tried to generalize them and use regex matching?
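For illustration, a minimal sketch of that idea; the pattern below assumes every source prefix has the shape <name>.address.com/<digits>/ and would need adjusting otherwise:
import scala.util.matching.Regex

// Matches prefixes like "build.address.com/17808/" or "source.address.com/7808/".
val sourcePrefix: Regex = """[\w.]+\.address\.com/\d+/?""".r

def normalizePath(path: String): String =
  sourcePrefix.replaceFirstIn(path, "") // removes the first matching source prefix, if any

normalizePath("build.address.com/17808/project/repo")  // "project/repo"
normalizePath("source.address.com/7808/project/repo")  // "project/repo"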
There are a million and one ways to extract /project/repo from a String in Scala. Here are a few I came up with:
val list = List("build.address.com/17808", "source.address.com/7808") // etc

def normalizePath(path: String) = {
  path.stripPrefix(list.find(x => path.contains(x)).getOrElse(""))
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: String = /project/repo
val list = List("build.address.com/17808", "source.address.com/7808") // etc

def normalizePath(path: String) = {
  list.map(x => if (path.contains(x)) {
    path.takeRight(path.length - x.length)
  }).filter(y => y != ()).head
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: Any = /project/repo
val list = List("build.address.com/17808", "source.address.com/7808") // etc

def normalizePath(path: String) = {
  list.foldLeft(path)((a, b) => a.replace(b, ""))
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: String = /project/repo
Depends how complicated you want your code to look (or how silly you want to be), really. Note that the second example has return type Any, which might not be ideal for your scenario. Also, these examples aren't meant to be able to just take the String out of the middle of your path... they can be fairly easily modified if you want to do that though. Let me know if you want me to add some examples just stripping things like build.address.com/17808 out of a String - I'd be happy to do so.

Convert RDD[Array[Row]] to RDD[Row]

How to convert RDD[Array[Row]] to RDD[Row]?
Details:
I have a use case where my parsing function returns Array[Row] for some data and Row for other data. How do I convert both of these to RDD[Row] for further use?
CODE SAMPLE
private def getRows(rdd: RDD[String], parser: Parser): RDD[Row] = {
  val processedLines = rdd.map { line => parser.processBeacon(line) }

  val rddOfRowsList = processedLines.map { x =>
    x match {
      case Right(obj) => obj.map { p =>
        MyRow.getValue(p)
      } // I can use flatMap here
      case Left(obj) =>
        MyRow.getValue(obj)
    } // Can't use flatMap here
  }
  // Here I have to convert rddOfRowsList to RDD[Row]
  // ?????
  val rowsRdd = ?????
  rowsRdd
}

def processLine(logMap: Map[String, String]): Either[Map[String, Object], Array[Map[String, Object]]] = {
  // process
}
Use flatMap;
rdd.flatMap(identity)
You can use flatMap to get a new RDD, and then use union to compose them.
Use flatMap to flatten the contents of the RDD.
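Applied to the code sample in the question (reusing its names, which are taken from the question rather than verified), a sketch that normalizes both Either branches and flattens them in a single flatMap could look like:
val rowsRdd: RDD[Row] = processedLines.flatMap {
  case Right(objs) => objs.map(p => MyRow.getValue(p)).toSeq // Array of rows -> many elements
  case Left(obj)   => Seq(MyRow.getValue(obj))               // single row -> one-element Seq
}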

Can't call a function from Spark Streaming 'RDD.foreachPartition' but copying all lines of the function works

I am trying to stream a Spark RDD from a worker node instead of collecting it at the driver first, so I created the following code:
def writeToKafka[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)], topic: String, keySerializerClass: String, valueSerializerClass: String, brokers: String = producerBroker) = {
  rdd.foreachPartition { partitionOfRecords =>
    val producer = new KafkaProducer[K, V](getProducerProps(keySerializerClass, valueSerializerClass, brokers))
    partitionOfRecords.foreach { message =>
      producer.send(new ProducerRecord[K, V](topic, message._1, message._2))
    }
    producer.close()
  }
}

def getProducerProps(keySerializerClass: String, valueSerializerClass: String, brokers: String): Properties = {
  val producerProps: Properties = new Properties
  producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
  producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, keySerializerClass)
  producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, valueSerializerClass)
  producerProps
}
Running this code causes the following exception
15/09/01 15:13:00 ERROR JobScheduler: Error running job streaming job 1441120380000 ms.3
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:805)
at com.company.opt.detector.StreamingDetector.writeToKafka(StreamingDetector.scala:84)
at com.company.opt.MyClass.MyClass$$anonfun$doStreamingWork$3.apply(MyClass.scala:47)
at com.company.opt.MyClass.MyClass$$anonfun$doStreamingWork$3.apply(MyClass.scala:47)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:534)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:534)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:42)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:176)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:176)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:176)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException: com.company.opt.MyClass.MyClass$
Serialization stack:
- object not serializable (class: com.company.opt.MyClass.MyClass$, value: com.company.opt.MyClass.MyClass$@7e2bb5e0)
- field (class: com.company.opt.detector.StreamingDetector$$anonfun$writeToKafka$1, name: $outer, type: class com.company.opt.detector.StreamingDetector)
- object (class com.company.opt.detector.StreamingDetector$$anonfun$writeToKafka$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
... 21 more
However, when I just copy the code from the getProducerProps function directly into my writeToKafka function, as follows, everything works correctly:
def writeToKafka[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)], topic: String, keySerializerClass: String, valueSerializerClass: String, brokers: String = producerBroker) = {
  rdd.foreachPartition { partitionOfRecords =>
    val producerProps: Properties = new Properties
    producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, keySerializerClass)
    producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, valueSerializerClass)
    val producer = new KafkaProducer[K, V](producerProps)
    partitionOfRecords.foreach { message =>
      producer.send(new ProducerRecord[K, V](topic, message._1, message._2))
    }
    producer.close()
  }
}
Can someone explain why this happens? Thanks
Given that getProducerProps is a method of the class enclosing it, calling it from the closure is equivalent to calling this.getProducerProps(...).
Then the issue becomes evident: this is being pulled into the closure and needs to be serialized, together with all its other fields. Some member of that class is not serializable and causes this exception.
A good practice is to put such methods in a separate object:
object ProducerUtils extends Serializable {
  def getProducerProps(keySerializerClass: String, valueSerializerClass: String, brokers: String): Properties = ???
}
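Filled in with the property-building code from the question, that object might look like the following sketch (imports assumed to match the question's existing ones):
import java.util.Properties
import org.apache.kafka.clients.producer.ProducerConfig

object ProducerUtils extends Serializable {
  def getProducerProps(keySerializerClass: String, valueSerializerClass: String, brokers: String): Properties = {
    val producerProps = new Properties
    producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, keySerializerClass)
    producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, valueSerializerClass)
    producerProps
  }
}
The closure then only references the top-level ProducerUtils object, so nothing from the enclosing class has to be serialized.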
Another way is to make that method a function and assign it to a val. Then the value of that val is inlined and therefore will not pull the whole instance into the serialized closure:
val producerProps: (String, String, String) => Properties = ???
I agree with maasg's answer; you may also find this post interesting, as it explores how Spark decides which data in a closure gets serialized.
