Implement HashMap with different value types in Kotlin

Is it possible to have a hashmap in Kotlin that takes different value types?
I've tried this:
val template = "Hello {{world}} - {{count}} - {{tf}}"
val context = HashMap<String, Object>()
context.put("world", "John")
context.put("count", 1)
context.put("tf", true)
... but that gives me a type mismatch (apparently "John", 1, and true are not Objects).
In Java you can get around this by wrapping the values explicitly: new String("John"), new Integer(1), Boolean.TRUE. I've tried the equivalent in Kotlin, but I still get the type mismatch error.
context.put("tf", Boolean(true))
Any ideas?

In Kotlin, Any is the supertype of all the other types, and you should replace Java Object with it:
val context = HashMap<String, Any>()
context.put("world", "John")
context.put("count", 1)
context.put("tf", true)

For new visitors, it can also be done like this:
val a = hashMapOf<Any, Any>(1 to Exception(), 2 to Throwable(), Object() to 33)
Here both the keys and the values can be of any type.

Related

Class[_] equivalent in python

I want to convert the following Scala code into Python 3:
class xyz {
  def abc(): Unit = {
    val clazz: Class[_] = this.getClass()
    var fields: List[String] = getFields(clazz)
    val method = clazz.getDeclaredMethods()
    val methodname = method.getName()
    val supper = clazz.getSuperclass()
    println(clazz)
    println(fields)
    println(method)
  }
}
Class[_] equivalent in python
Class[_] is a static type. Python doesn't have static types, so there is no equivalent to the static type Class[_] in Python.
I want to convert the following Scala code into Python 3:
class xyz {
  def abc(): Unit = {
    val clazz: Class[_] = this.getClass()
    var fields: List[String] = getFields(clazz)
    val method = clazz.getDeclaredMethods()
    val methodname = method.getName()
    val supper = clazz.getSuperclass()
  }
  def mno(): Unit = {
    println("hello")
  }
}
abc is effectively a no-op (*). mno just prints to stdout. So the equivalent in Python is:
class xyz:
    def abc(self):
        pass

    def mno(self):
        print("hello")
Note that I made abc and mno instance methods, even though it makes no sense. (But that's the same for the Scala version.)
(*) Someone who knows more about the corner cases and side effects of Java Reflection can correct me here. Maybe this triggers some kind of classloader refresh or something like that?
You can't get one-to-one correspondence simply because Python classes are organized very differently from JVM classes.
The equivalent of getClass() is type;
there is no equivalent to Class#getFields because fields aren't necessarily defined on a class in Python, but see How to list all fields of a class (and no methods)?.
Similarly for getSuperclass(): Python classes can have more than one superclass, so __bases__ returns a tuple of base classes instead of just one.

Scala interning: how does different initialisation affect comparison?

I am new to Scala but I know Java. As far as I understand, == in Scala acts like .equals in Java, meaning it compares values; and eq in Scala acts like == in Java, meaning it compares reference identity rather than values.
However, after running the code below:
val greet_one_v1 = "Hello"
val greet_two_v1 = "Hello"
println(
  (greet_one_v1 == greet_two_v1),
  (greet_one_v1 eq greet_two_v1)
)

val greet_one_v2 = new String("Hello")
val greet_two_v2 = new String("Hello")
println(
  (greet_one_v2 == greet_two_v2),
  (greet_one_v2 eq greet_two_v2)
)
I get the following output:
(true,true)
(true,false)
My theory is that the initialisation of these strings differs. Hence, how is val greet_one_v1 = "Hello" different from val greet_one_v2 = new String("Hello")? Or, if my theory is incorrect, why do I have different outputs?
As correctly answered by Luis Miguel Mejía Suárez, the answer lies in string interning, which the JVM (Java Virtual Machine) performs automatically for string literals. To get a genuinely new String object it has to be created explicitly with new, as in my example above; otherwise the JVM reuses the same memory for identical literal values as an optimisation.
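To illustrate, here is a small sketch of that behaviour (not part of the original answer; the expected results are shown as comments and rely on the JVM pooling string literals):
val literal = "Hello"                 // literals are taken from the JVM's intern pool
val fresh = new String("Hello")       // new String(...) allocates a distinct object
println(literal == fresh)             // true  -> same value
println(literal eq fresh)             // false -> different references
println(literal eq fresh.intern())    // true  -> intern() returns the pooled instance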

Apache Spark: pass Column as Transformer parameter

I defined a pipeline Transformer like this:
class MyTransformer(condition: Column) extends SparkTransformer {
override def transform(dataset: Dataset[_]): DataFrame = {...}
}
which is then used in a pipeline:
val pipeline = new Pipeline()
pipeline.setStages(Array(new MyTransformer(col("test") === lit("value"))))
pipeline.fit(df).transform(mydf)
In my transformer, I want to apply a transformation only on rows that verify the condition.
It results in a serialization issue:
Serialization stack:
- object not serializable (class: org.apache.spark.sql.Column, value: (test = value))
- field (class: my.project.MyTransformer, name: condition, type: class org.apache.spark.sql.Column)
- ...
In my understanding, the Transformers are serialized and dispatched to the executors, so every parameter should be serializable.
How can I bypass it? Is there a workaround?
Thx.
This question seems a bit old...
I don't know if my (untested) idea matches your needs.
A solution could be to use an SQL expression (a String instance):
val pipeline = new Pipeline()
pipeline.setStages(Array(new MyTransformer("test = 'value'")))
pipeline.fit(df).transform(mydf)
and to use functions.expr() to convert the expression String into a Column instance inside the Transformer.transform method.
This way, the condition is Serializable and the non-serializable objects are only created when they are needed, on the executors.
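For illustration only, a rough sketch of that idea (untested; it assumes the same SparkTransformer base type used in the question, omits the usual uid/copy/transformSchema plumbing, and uses a filter as a stand-in for the real row-level transformation):
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.expr

// The condition is kept as a plain String, which is Serializable;
// the Column is only built inside transform(), on demand.
class MyTransformer(conditionExpr: String) extends SparkTransformer {
  override def transform(dataset: Dataset[_]): DataFrame = {
    val condition = expr(conditionExpr)   // String -> Column
    dataset.toDF().filter(condition)      // placeholder for the real logic
  }
}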

Transform JavaPairDStream to Tuple3 in Java

I am experimenting with a Spark job that streams data from Kafka and writes to Cassandra.
The sample I am working with takes a bunch of words in a given time interval and publishes the word count to Cassandra. I am also trying to publish the timestamp along with the word and its count.
What I have so far is as follows:
JavaPairReceiverInputDStream<String, String> messages =
KafkaUtils.createStream(jssc, zkQuorum, groupId, topicMap);
JavaDStream<String> lines = messages.map(Tuple2::_2);
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(x)).iterator());
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(s -> new Tuple2<>(s, 1))
.reduceByKey((i1, i2) -> i1 + i2);
Now I am trying to append to these records the timestamp. What I have tried is something like this:
Tuple3<String, Date, Integer> finalRecord =
wordCounts.map(s -> new Tuple3<>(s._1(), new Date().getTime(), s._2()));
Which of course is shown as wrong in my IDE. I am completely new to working with the Spark libraries and to writing functions in this (I guess lambda-based) style.
Can someone help me correct this error and achieve what I am trying to do?
After some searching on the web and studying some examples, I was able to achieve what I wanted as follows.
In order to append the timestamp attribute to the existing Tuple2 of two values, I had to create a simple bean which represents my Cassandra row.
public static class WordCountRow implements Serializable {
    String word = "";
    long timestamp;
    Integer count = 0;
    public WordCountRow(String word, long timestamp, Integer count) {
        this.word = word; this.timestamp = timestamp; this.count = count;
    }
}
Then, I had to map the (word, count) Tuple2 objects in the JavaPairDStream to a JavaDStream that holds objects of the above WordCountRow class.
JavaDStream<WordCountRow> wordCountRows = wordCounts.map((Function<Tuple2<String, Integer>, WordCountRow>)
tuple -> new WordCountRow(tuple._1, new Date().getTime(), tuple._2));
Finally, I could call the foreachRDD method on this stream (which yields WordCountRow objects) and write each of them to Cassandra.
wordCountRows.foreachRDD((VoidFunction2<JavaRDD<WordCountRow>, Time>) (rdd, time) -> {
    final SparkConf sc = rdd.context().getConf();
    final CassandraConnector cc = CassandraConnector.apply(sc);
    rdd.foreach((VoidFunction<WordCountRow>) wordCount -> {
        try (Session session = cc.openSession()) {
            String query = String.format(Joiner.on(" ").join(
                    "INSERT INTO test_keyspace.word_count",
                    "(word, ts, count)",
                    "VALUES ('%s', %s, %s);"),
                wordCount.word, wordCount.timestamp, wordCount.count);
            session.execute(query);
        }
    });
});
Thanks

controlling fields nullability in spark-sql and dataframes

I'm using spark-sql's DataFrame to implement a generic data integration component.
The basic idea: the user configures fields by naming them and mapping them to simple SQL fragments (ones that can appear in a select clause); the component adds these columns and groups them into struct fields (using struct from the Column DSL).
Later processing takes some of these struct fields and groups them into an array. At this point I hit an issue: one of the fields is nullable in one tuple but not nullable in the other.
Since the fields are grouped in a struct, I was able to extract the struct type, modify it, and use the Column.cast method to apply it back to the entire tuple. I'm not sure this approach would work for top-level fields (by the way, the SQL cast syntax doesn't allow specifying a field's nullability).
My question is: is there a better way to achieve this? Something like a nullable() function that can be applied to an expression in order to tag it as nullable, similar to the way cast works.
Sample code:
val df = (1 to 8).map(x => (x,x+1)).toDF("x","y")
val df6 = df.select(
functions.struct( $"x" + 1 as "x1", $"y" + 1 as "y1" ) as "struct1",
functions.struct( $"x" + 1 as "x1", functions.lit(null).cast( DataTypes.IntegerType ) as "y1" ) as "struct2"
)
val df7 = df6.select( functions.array($"struct1", $"struct2") as "arr" )
this fails with this exception:
org.apache.spark.sql.AnalysisException: cannot resolve 'array(struct1,struct2)' due to data type mismatch:
input to function array should all be the same type, but it's [struct, struct];
and the fix looks like this:
//val df7 = df6.select( functions.array($"struct1", $"struct2") as "arr" )
val df7 = df6.select( functions.array($"struct1" cast df6.schema("struct2").dataType, $"struct2" ) as "arr" )
You can make this a little cleaner with a udf that creates an Option[Int]:
val optionInt = udf[Option[Int],Int](i => Option(i))
Then you need to use optionInt($"y" + 1) when you create y1 for struct1. Everything else stays the same (although edited for conciseness).
val df6 = df.select(
struct($"x" + 1 as "x1", optionInt($"y" + 1) as "y1" ) as "struct1",
struct($"x" + 1 as "x1", lit(null).cast(IntegerType) as "y1" ) as "struct2"
)
Then df6.select(array($"struct1", $"struct2") as "arr" ) works fine.
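As a quick sanity check (a sketch reusing df6 from the snippets above; the expected output is an assumption based on both y1 fields now being nullable):
// Both struct columns should now report identical data types, nullability included,
// which is why array() stops complaining about a type mismatch.
println(df6.schema("struct1").dataType == df6.schema("struct2").dataType) // expected: true
df6.select(array($"struct1", $"struct2") as "arr").printSchema()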
