Cannot Generate unpickler for case class with immutable HashMaps - scala-pickling

I've recently come across scala-pickling and I'm trying to see how I can use it in a project, so I've been working through a simple example of a case class with immutable HashMaps. In this example, scala-pickling doesn't generate an unpickler and I can't figure out why. The following REPL session demonstrates the issue:
scala> case class Foo(a: HashMap[Symbol,Symbol], b: HashMap[Symbol,Double], c: Symbol, d: Double)
defined class Foo
scala> val bar = Foo(new HashMap[Symbol,Symbol](), new HashMap[Symbol,Double](), 'A, 1.4)
bar: Foo = Foo(Map(),Map(),'A,1.4)
scala> val pickled = bar.pickle
pickled: scala.pickling.json.JSONPickle =
JSONPickle({
"tpe": "Foo",
"c": {
"name": "A"
},
"d": 1.4,
"a": {
},
"b": {
}
})
scala> val unpickled = pickled.unpickle[Foo]
<console>:18: error: Cannot generate an unpickler for Foo. Recompile with -Xlog-implicits for details
val unpickled = pickled.unpickle[Foo]
Can anybody point out what I'm doing wrong? Or is there an issue with scala-pickling?
EDIT: Actually, the same seems to happen with a class whose single attribute is just a Symbol (REPL session below). Is there a special way to deal with Symbols in scala-pickling?
scala> case class Foo(symb: Symbol)
defined class Foo
scala> val foo = Foo('A)
foo: Foo = Foo('A)
scala> val pick = foo.pickle
pick: scala.pickling.json.JSONPickle =
JSONPickle({
"tpe": "Foo",
"symb": {
"name": "A"
}
})
scala> val unpick = pick.unpickle[Foo]
<console>:17: error: Cannot generate an unpickler for Foo. Recompile with -Xlog-implicits for details
val unpick = pick.unpickle[Foo]

I know this is an old post, but try
import scala.pickling._, Defaults._, json._
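With those imports in scope (Defaults._ is what brings the implicit pickler/unpickler generation into scope), the round trip should compile. A minimal sketch, assuming scala-pickling 0.10.x:

import scala.collection.immutable.HashMap
import scala.pickling._, Defaults._, json._

case class Foo(a: HashMap[Symbol, Symbol], b: HashMap[Symbol, Double], c: Symbol, d: Double)

val bar = Foo(HashMap('k -> 'v), HashMap('x -> 1.0), 'A, 1.4)
val pickled = bar.pickle              // JSONPickle({ "tpe": "Foo", ... })
val unpickled = pickled.unpickle[Foo] // back to a Foo, including the Symbol fields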

Related

Unable to use String.contains() in Kotlin 'if' expression

Kotlin
I have created a list of things and want to skip those that do not contain the character 'o'. For this I am using the str.contains() function, but it is throwing an error for me.
fun main(args: Array<String>) {
    val list = listOf<Any>("book", "table", "laptop", "pen")
    for (str in list) {
        if (!str.contains('o')) {
            continue
        }
        println(list)
    }
}
When you use listOf<Any>, you are essentially saying that this list can hold any type of object: Int, String, Boolean, and so on.
For example, this is perfectly valid:
val list = listOf<Any>(1, "table", 'c', true, 3.0)
println(list)
-------------
[1, table, c, true, 3.0]
But with that, the compiler loses the ability to infer the specific type of object this list might contain.
So, when you do this:
val list = listOf("my", "username", "is", "saiful103a")
or
val list = listOf<String>("my", "username", "is", "saiful103a")
The compiler can infer the element type and you will not face any problem:
for (item in list) {
    println(item.contains('m'))
}
With that being said, there are two solutions.
Using smart cast:
val list = listOf<Any>(1, "table", 'c', true, 3.0)
for (item in list) {
    if (item is String) {
        println(item.contains('t'))
    }
}
The other is to specify the type, or let the compiler infer it, using
listOf("my", "username", "is", "saiful103a") or listOf<String>("my", "username", "is", "saiful103a").
contains is a function on CharSequence. Your str is of type Any, and Any does not have a contains function.
One solution is to change listOf<Any> to listOf<String>; String is a CharSequence.

BSON to Play JSON support for Long values

I've started using the play-json/play-json-compat libraries with reactivemongo 0.20.11, so I can use Play JSON reads/writes while importing the 'reactivemongo.play.json._' package and then easily fetch data from a JSONCollection instead of a BSONCollection.
For most cases this works great, but for Long fields it doesn't :(
For example:
case class TestClass(name: String, age: Long)
object TestClass {
  implicit val reads = Json.reads[TestClass]
}
If I try querying using the following function:
def getData: Map[String, Long] = {
  val res = collection.find(emptyDoc)
    .cursor[TestClass]()
    .collect[List](-1, Cursor.ContOnError[List[TestClass]] { case (_, t) =>
      failureLogger.error(s"Failed deserializing TestClass from Mongo", t)
    })
    .map { items =>
      items map { item =>
        item.name -> item.age
      } toMap
    }
  Await.result(res, 10 seconds)
}
Then I get the following error:
play.api.libs.json.JsResultException: JsResultException(errors:List((/age,List(ValidationError(List(error.expected.jsnumber),WrappedArray())))))
I've debugged the reading of the document and noticed that when it first converts the BSON to a JsObject, the Long field comes out like this:
"age": {"$long": 1526389200000}
I found a way to make this work, but I really don't like it:
case class MyBSONLong(`$long`: Long)
object MyBSONLong {
  implicit val longReads = Json.reads[MyBSONLong]
}

case class TestClass(name: String, age: Long)
object TestClass {
  implicit val reads = (
    (__ \ "name").read[String] and
    (__ \ "age").read[MyBSONLong].map(_.`$long`)
  ) (apply _)
}
So this works, but it's a very ugly solution.
Is there a better way to do this?
Thanks in advance :)
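For what it's worth, the same unwrapping can be written without the intermediate case class by giving the age field a custom Reads[Long] that understands the $long wrapper. A hedged sketch of that idea, not a verified answer (it assumes plain play-json syntax and the {"$long": n} shape shown above):

import play.api.libs.json._

// accept either a plain JSON number or the {"$long": n} wrapper
// produced by the BSON-to-JSON conversion
implicit val bsonLongReads: Reads[Long] = Reads {
  case JsNumber(n) => JsSuccess(n.toLong)
  case obj: JsObject => (obj \ "$long") match {
    case JsDefined(JsNumber(n)) => JsSuccess(n.toLong)
    case _ => JsError("error.expected.jsnumber")
  }
  case _ => JsError("error.expected.jsnumber")
}

case class TestClass(name: String, age: Long)
object TestClass {
  // the macro picks up the Reads[Long] in scope for the age field
  implicit val reads: Reads[TestClass] = Json.reads[TestClass]
}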

How to parse a JSON containing string property representing JSON

I have many JSONs with a structure like the following.
{
"p1":"v1",
"p2":"v2",
"p3":"v3",
"modules": "{ \"nest11\":\"n1v1\", \"nest12\":\"n1v2\", \"nest13\": { \"nest21\": \"n2v1\" } }"
}
How to parse it to this?
v1, v2, v3, n1v1, n1v2, n2v1
It is not a problem to extract "v1, v2, v3", but how do I access "n1v1, n1v2, n2v1" with the Spark DataFrame API?
One approach is to use the DataFrameFlattener implicit class found on the official Databricks site. First you will need to define the JSON schema for the modules column, and then you flatten the dataframe as shown below. Here I assume that the file test_json.txt has the following content:
{
"p1":"v1",
"p2":"v2",
"p3":"v3",
"modules": "{ \"nest11\":\"n1v1\", \"nest12\":\"n1v2\", \"nest13\": { \"nest21\": \"n2v1\" } }"
}
Here is the code:
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.types.{DataType, StringType, StructType}
import spark.implicits._ // for the $"..." column syntax; spark is the active SparkSession

implicit class DataFrameFlattener(df: DataFrame) {
  def flattenSchema: DataFrame = {
    df.select(flatten(Nil, df.schema): _*)
  }
  protected def flatten(path: Seq[String], schema: DataType): Seq[Column] = schema match {
    case s: StructType => s.fields.flatMap(f => flatten(path :+ f.name, f.dataType))
    case other => col(path.map(n => s"`$n`").mkString(".")).as(path.mkString(".")) :: Nil
  }
}

val schema = (new StructType)
  .add("nest11", StringType)
  .add("nest12", StringType)
  .add("nest13", (new StructType).add("nest21", StringType, false))

val df = spark.read
  .option("multiLine", true).option("mode", "PERMISSIVE")
  .json("C:\\temp\\test_json.txt")

df.withColumn("modules", from_json($"modules", schema))
  .select($"*")
  .flattenSchema
And this should be the output:
+--------------+--------------+---------------------+---+---+---+
|modules.nest11|modules.nest12|modules.nest13.nest21|p1 |p2 |p3 |
+--------------+--------------+---------------------+---+---+---+
|n1v1 |n1v2 |n2v1 |v1 |v2 |v3 |
+--------------+--------------+---------------------+---+---+---+
Please let me know if you need further clarification.
All you need to do is parse the JSON string to an actual JavaScript object:
const originalJSON = {
  "p1": "v1",
  "p2": "v2",
  "p3": "v3",
  "modules": "{ \"nest11\":\"n1v1\", \"nest12\":\"n1v2\", \"nest13\": { \"nest21\": \"n2v1\" } }"
}

const { modules, ...rest } = originalJSON

const result = {
  ...rest,
  modules: JSON.parse(modules)
}

console.log(result)
console.log(result.modules.nest11)
console.log(result)
console.log(result.modules.nest11)
When you retrieve the "modules" element, you are actually retrieving a string. You have to instantiate this string as a new JSON object. I don't know what language you're using, but you generally do something like:
String modules_str = originalJSON.get("modules");
JSON modulesJSON = new JSON(modules_str);
String nest11_str = modulesJSON.get("nest11");
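For completeness, here is the same two-step parse in Scala, as a minimal sketch using play-json (the library choice is an assumption; any JSON parser with a string entry point works the same way):

import play.api.libs.json._

val original = Json.parse(
  """{ "p1":"v1", "p2":"v2", "p3":"v3",
    |  "modules": "{ \"nest11\":\"n1v1\", \"nest12\":\"n1v2\", \"nest13\": { \"nest21\": \"n2v1\" } }" }""".stripMargin)

// "modules" is just a String, so parse it again as JSON
val modules = Json.parse((original \ "modules").as[String])

(original \ "p1").as[String]               // v1
(modules \ "nest11").as[String]            // n1v1
(modules \ "nest13" \ "nest21").as[String] // n2v1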

Scala - Remove all elements in a list/map of strings from a single String

Working on an internal website where the URL contains the source reference from other systems. This is a business requirement and cannot be changed.
e.g. "http://localhost:9000/source.address.com/7808/project/repo"
"http://localhost:9000/build.address.com/17808/project/repo"
I need to remove these source strings from the path to leave the "project/repo" part, using a trait so this can be used natively from multiple services. I also want to be able to add more sources to the list (which already exists) and not modify the method.
"def normalizePath" is the method accessed by services; below are two non-ideal but reasonable attempts so far. I'm getting stuck on a third attempt using foldLeft, which I would like some help with, or a simpler way of doing the above. Code samples below.
First attempt, using if-else (not ideal, as I'd need to add more if/else statements down the line, and it is less readable than a pattern match):
trait NormalizePath {
  def normalizePath(path: String): String = {
    if (path.startsWith("build.address.com/17808")) {
      path.substring("build.address.com/17808".length, path.length)
    } else {
      path
    }
  }
}
Second attempt (not ideal, as more patterns will likely be added, and it generates more bytecode than if/else):
trait NormalizePath {
  val pattern = "build.address.com/17808/"
  val pattern2 = "source.address.com/7808/"

  def normalizePath(path: String) = path match {
    case s if s.startsWith(pattern) => s.substring(pattern.length, s.length)
    case s if s.startsWith(pattern2) => s.substring(pattern2.length, s.length)
    case _ => path
  }
}
The last attempt uses an address list (it already exists elsewhere, but is defined here as an MWE) to remove occurrences from the path string, and it doesn't work:
trait NormalizePath {
  val replacements = (            // note: these parentheses build a Tuple2, not a collection
    "build.address.com/17808",
    "source.address.com/7808/")

  private def remove(path: String, string: String) = {
    path-string                   // does not compile: String has no - operator
  }

  def normalizePath(path: String): String = {
    replacements.foldLeft(path)(remove)  // does not compile: Tuple2 has no foldLeft
  }
}
Appreciate any help on this!
If you are just stripping out those strings:
val replacements = Seq(
  "build.address.com/17808",
  "source.address.com/7808/")

replacements.foldLeft("http://localhost:9000/source.address.com/7808/project/repo") {
  case (path, toReplace) => path.replaceAll(toReplace, "")
}
// http://localhost:9000/project/repo
If you are replacing those strings with something else:
val replacementsMap = Seq(
  "build.address.com/17808" -> "one",
  "source.address.com/7808/" -> "two/")

replacementsMap.foldLeft("http://localhost:9000/source.address.com/7808/project/repo") {
  case (path, (toReplace, replacement)) => path.replaceAll(toReplace, replacement)
}
// http://localhost:9000/two/project/repo
The replacements collection can come from elsewhere in the code and will not need to be redeployed.
// method replacing by empty string
def normalizePath(path: String) = {
  replacements.foldLeft(path) {
    case (startingPoint, toReplace) => startingPoint.replaceAll(toReplace, "")
  }
}
normalizePath("foobar/build.address.com/17808/project/repo")
// foobar/project/repo
normalizePath("whateverPath")
// whateverPath
normalizePath("build.address.com/17808build.address.com/17808/project/repo")
// /project/repo
A very simple replacement could be made as follows:
val replacements = Seq(
  "build.address.com/17808",
  "source.address.com/7808/")

def normalizePath(path: String): String = {
  replacements.find(path.startsWith(_))           // find the first matching prefix
    .map(prefix => path.substring(prefix.length)) // remove the prefix
    .getOrElse(path)                              // if not found, return the original string
}
Since the expected replacements are very similar, have you tried to generalize them and use regex matching?
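For illustration, a minimal sketch of that regex idea (the pattern below is an assumption generalized from the two example prefixes):

// strip any "<host>.address.com/<digits>/" source reference in one pass
val sourceRef = """[\w.]+\.address\.com/\d+/?""".r

def normalizePath(path: String): String =
  sourceRef.replaceAllIn(path, "")

normalizePath("http://localhost:9000/source.address.com/7808/project/repo")
// http://localhost:9000/project/repo

The trade-off is that one pattern covers every current and future source of that shape, at the cost of possibly matching hosts you did not intend; the explicit list is safer when the prefixes are not uniform.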
There are a million and one ways to extract /project/repo from a String in Scala. Here are a few I came up with:
val list = List("build.address.com/17808", "source.address.com/7808") // etc.

def normalizePath(path: String) = {
  path.stripPrefix(list.find(x => path.contains(x)).getOrElse(""))
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: String = /project/repo
val list = List("build.address.com/17808", "source.address.com/7808") // etc.

def normalizePath(path: String) = {
  list.map(x => if (path.contains(x)) {
    path.takeRight(path.length - x.length)
  }).filter(y => y != ()).head
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: Any = /project/repo
val list = List("build.address.com/17808", "source.address.com/7808") // etc.

def normalizePath(path: String) = {
  list.foldLeft(path)((a, b) => a.replace(b, ""))
}
Output:
scala> normalizePath("build.address.com/17808/project/repo")
res0: String = /project/repo
Depends how complicated you want your code to look (or how silly you want to be), really. Note that the second example has return type Any, which might not be ideal for your scenario. Also, these examples aren't meant to take the String out of the middle of your path, though they can be fairly easily modified if you want to do that. Let me know if you want me to add some examples that just strip things like build.address.com/17808 out of a String - I'd be happy to do so.

How can I retrieve the alias for a DataFrame in Spark

I'm using Spark 2.0.2. I have a DataFrame that has an alias on it, and I'd like to be able to retrieve that. A simplified example of why I'd want that is below.
def check(ds: DataFrame) = {
  assert(ds.count > 0, s"${ds.getAlias} has zero rows!")
}
The above code of course fails because DataFrame has no getAlias function. Is there a way to do this?
You can try something like this, but I wouldn't go so far as to claim it is supported:
Spark < 2.1:
import org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias
import org.apache.spark.sql.Dataset
def getAlias(ds: Dataset[_]) = ds.queryExecution.analyzed match {
  case SubqueryAlias(alias, _) => Some(alias)
  case _ => None
}
Spark 2.1+:
def getAlias(ds: Dataset[_]) = ds.queryExecution.analyzed match {
  case SubqueryAlias(alias, _, _) => Some(alias)
  case _ => None
}
Example usage:
val plain = Seq((1, "foo")).toDF
getAlias(plain)
Option[String] = None
val aliased = plain.alias("a dataset")
getAlias(aliased)
Option[String] = Some(a dataset)
Disclaimer: as stated above, this code relies on undocumented APIs subject to change. It works as of Spark 2.3.
After much digging into mostly undocumented Spark methods, here is the full code to pull the list of fields, along with the table alias for a dataframe in PySpark:
def schema_from_plan(df):
    plan = df._jdf.queryExecution().analyzed()
    all_fields = _schema_from_plan(plan)
    iterator = plan.output().iterator()
    output_fields = {}
    while iterator.hasNext():
        field = iterator.next()
        queryfield = all_fields.get(field.exprId().id(), {})
        if queryfield:
            tablealias = queryfield["tablealias"]
        else:
            tablealias = ""
        output_fields[field.exprId().id()] = {
            "tablealias": tablealias,
            "dataType": field.dataType().typeName(),
            "name": field.name()
        }
    return list(output_fields.values())

def _schema_from_plan(root, tablealias=None, fields=None):
    # default to None rather than {}: a mutable default would be shared across calls
    if fields is None:
        fields = {}
    child_iterator = root.children().iterator()
    while child_iterator.hasNext():
        node = child_iterator.next()
        nodeClass = node.getClass().getSimpleName()
        if nodeClass == "SubqueryAlias":
            # get the alias and process the subnodes with this alias
            _schema_from_plan(node, node.alias(), fields)
        else:
            if tablealias:
                # add all the fields, along with their unique IDs and a tablealias entry;
                # a separate name here keeps the outer child iterator from being clobbered
                field_iterator = node.output().iterator()
                while field_iterator.hasNext():
                    field = field_iterator.next()
                    fields[field.exprId().id()] = {
                        "tablealias": tablealias,
                        "dataType": field.dataType().typeName(),
                        "name": field.name()
                    }
            _schema_from_plan(node, tablealias, fields)
    return fields

# example: fields = schema_from_plan(df)
# example: fields = schema_from_plan(df)
For Java:
As @veinhorn mentioned, it is also possible to get the alias in Java. Here is an example utility method:
public static <T> Optional<String> getAlias(Dataset<T> dataset) {
    final LogicalPlan analyzed = dataset.queryExecution().analyzed();
    if (analyzed instanceof SubqueryAlias) {
        SubqueryAlias subqueryAlias = (SubqueryAlias) analyzed;
        return Optional.of(subqueryAlias.alias());
    }
    return Optional.empty();
}
