Pair RDD with Array Values - apache-spark

I'm new to Spark. I'm trying to flatten an RDD from the format below:
rdd=((key),((value1,value2),Some((value3,value4))))
to
(key,value1,value2,value3,value4)
I tried to map the values with a case class, as below:
case class outdata(Key: String, Value1: String, Value2: String, Value3:String, Value4:String)
rdd.map{case(x,y)=>outdata(x,y._1._1,y._1._2,y._2._1,y._2._2)}
but I get an error that y._2._1 is not a member.

Scala's pattern matching is expressive enough to do this without a case class:
rdd.map {
  case (key: String, ((value1: String, value2: String), Some((value3: String, value4: String)))) =>
    (key, value1, value2, value3, value4)
}
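The pattern above only matches records whose second element is a Some; a None in that position would throw a MatchError at runtime. Below is a minimal sketch, using a plain Scala Seq instead of an RDD so it runs without Spark, that also handles None by dropping such records. Dropping them is an assumed policy (emitting placeholder values would be another option), and the names FlattenExample and flatten are illustrative only.

```scala
object FlattenExample {
  // Shape of one record: (key, ((value1, value2), Option[(value3, value4)]))
  type Nested = (String, ((String, String), Option[(String, String)]))

  // Flattens one record; a None in the Option position yields no output
  def flatten(record: Nested): Option[(String, String, String, String, String)] =
    record match {
      case (key, ((v1, v2), Some((v3, v4)))) => Some((key, v1, v2, v3, v4))
      case (_, (_, None))                    => None
    }

  def main(args: Array[String]): Unit = {
    val data: Seq[Nested] = Seq(
      ("k1", (("a", "b"), Some(("c", "d")))),
      ("k2", (("e", "f"), None))
    )
    // flatMap keeps the Some results and discards the None ones
    data.flatMap(r => flatten(r)).foreach(println)
  }
}
```

On a real RDD the same function should be usable as rdd.flatMap(r => flatten(r)), since Scala converts the Option to a collection inside the lambda.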

Related

Groovy - Replace ${} within a string

I receive a string like this: "The code is ${code}. Your currency is ${currency}".
The ${...} characters are already part of the string, not variables.
I have a map of key-value pairs, and I would like to replace every occurrence of ${key} with the corresponding value:
def myMap = [
'code' : '123456',
'currency' : 'CHF'
]
myMap.each{
fieldValue.replaceAll("${$.key}", it.value)
}
The expected result is: "The code is 123456. Your currency is CHF".
I'm stuck because I don't know how to escape the ${} characters. I can escape the $ character, but not {}. Any idea how to achieve the expected result?
You need to escape the outer $ and the braces (replaceAll takes a regex) and use it.key: "\\\$\\{${it.key}\\}"
Also, you can't use each here: each is for side effects, and replaceAll
does not modify the string (strings in Java are immutable). So you need
something like this:
def result = myMap.inject(fieldValue){ acc, kv ->
acc.replaceAll("\\\$\\{${kv.key}\\}", kv.value)
}
Or by using a regexp:
fieldValue.replaceAll(/\$\{(.+?)\}/, { _, k -> myMap[k] })
It also works with a closure and its delegate: you can evaluate the string in the context of the map. See this example:
def myMap = [
'code' : '123456',
'currency' : 'CHF'
]
closure = { "The code is ${code}. Your currency is ${currency}" }
closure.delegate = myMap
println closure()

Expression of Type Scala.Predef.String doesn't conform to expected type String

I have a function, validateCell, that takes a function, func, as one of its input parameters. It is as follows:
def validateCell[String](cellKey: String, cell: Option[String], func:(String) => Boolean): Tuple2[Boolean, String] = {
cell match {
case Some(cellContents) => (func(cellContents), s"$cellContents is not valid.")
case None => (false, s"$cellKey, not found.")
}
}
I call the function as follows:
val map = Map("Well" -> "A110")
validateCell("Well", map.get("Well"), BarcodeWell.isValidWell)
The function that is passed in this case is as follows, though I don't think it's related to the problem:
def isValidWell(w: String): Boolean = {
val row: String = w.replaceAll("[^A-Za-z]", "")
val col: Int = w.replaceAll("[^0-9]+", "").toInt
isValidRow(row) && isValidColumn(col)
}
I am expecting validateCell to return a Tuple(Boolean, String), but I get the following error:
Error:(5, 55) type mismatch;
found : java.lang.String
required: String(in method validateCell)
case Some(cellContents) => (func(cellContents), s"$cellContents is not valid.")
I can make this error go away by converting the Java strings returned in each tuple by the case statements to Scala strings, like so:
s"$cellContents is not valid.".asInstanceOf[String]
s"$cellKey, not found.".asInstanceOf[String]
This seems really silly. What am I missing here? Shouldn't this conversion be handled by Scala automatically and why are my strings being cast as Java strings in the first place?
There is no difference between Scala strings and Java strings. In fact, Predef.String is an alias for java.lang.String. However, you're working with neither of these things; you're working with a type parameter.
def validateCell[String](cellKey: String, cell: Option[String], func:(String) => Boolean): Tuple2[Boolean, String] = {
This is a generic method that takes a type argument whose name is String. Inside the method body, String refers to that type parameter, not to java.lang.String; each caller decides what it is, so it could be anything, and the compiler cannot accept a real string such as s"$cellContents is not valid." where a value of that arbitrary type is required. My guess is that you're misunderstanding the point of the brackets and that you meant to write
def validateCell(cellKey: String, cell: Option[String], func:(String) => Boolean): Tuple2[Boolean, String] = {
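A minimal sketch of the corrected version, with the [String] type parameter removed so that String means java.lang.String again. The isValidWell below is a hypothetical stand-in (one or more letters followed by digits), because the question's isValidRow and isValidColumn are not shown.

```scala
object ValidateCellExample {
  // No type parameter: String here is java.lang.String (Predef.String)
  def validateCell(cellKey: String, cell: Option[String], func: String => Boolean): (Boolean, String) =
    cell match {
      case Some(cellContents) => (func(cellContents), s"$cellContents is not valid.")
      case None               => (false, s"$cellKey, not found.")
    }

  // Hypothetical stand-in for BarcodeWell.isValidWell: letters followed by digits
  def isValidWell(w: String): Boolean = w.matches("[A-Za-z]+[0-9]+")

  def main(args: Array[String]): Unit = {
    val map = Map("Well" -> "A110")
    // The method isValidWell is eta-expanded to a String => Boolean function
    println(validateCell("Well", map.get("Well"), isValidWell))
    println(validateCell("Plate", map.get("Plate"), isValidWell))
  }
}
```

The message strings are kept exactly as in the question, so a valid cell still carries the "is not valid." text alongside true.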

Spark. Keep partitioner after modifying key

First of all, sorry if this is a dumb question; I'm fairly new to Spark.
I am trying to do some group operations in Spark, and I'm trying to avoid an extra shuffle when modifying the key of my RDD.
The original RDDs contain JSON strings.
Simplifying the logic, my code looks like this:
case class Key1 (a: String, b: String)
val grouped1: RDD[(Key1, String)] = rdd1.keyBy(generateKey1(_))
val grouped2: RDD[(Key1, String)] = rdd2.keyBy(generateKey2(_))
val joined: RDD[(Key1, (String, String))] = grouped1.join(grouped2)
Now I want to include a new field in the key and do some reduce operations. So I have something like:
case class Key2 (a: String, b: String, c: String)
val withNewKey: RDD[(Key2, (String, String))] = joined.map{ case (key, (val1, val2)) => {
val newKey = Key2(key.a, key.b, extractWhatever(val2))
(newKey, (val1, val2))
}}
withNewKey.reduceByKey.....
If I'm not mistaken, because the key has changed, the partitioner is lost, so the reduce operation will probably shuffle the data. That seems unnecessary: the key was only extended, so no shuffle should be needed.
Am I missing something? How can I avoid that shuffle?
Thanks
You can use mapPartitions with preservesPartitioning set to true:
joined.mapPartitions(
_.map{ case (key, (val1, val2)) => ... },
true
)
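Why no shuffle is needed: every Key2 is built from exactly one Key1, so all records sharing a Key2 already sit in the same Key1-based partition, and reducing inside each partition gives the same result as a full reduce. The sketch below checks this with plain Scala collections rather than Spark; partitionOf only mimics hash partitioning, and extractWhatever is a hypothetical stand-in for the question's helper. One caveat, as far as I understand it: with preservesPartitioning set to true the RDD keeps the old Key1 partitioner, which does not describe where Key2 keys would hash, so joining the result with another RDD that is hash-partitioned on Key2 may silently give wrong results.

```scala
object LocalReduceSketch {
  final case class Key1(a: String, b: String)
  final case class Key2(a: String, b: String, c: String)

  // Mimics a hash partitioner: non-negative hashCode modulo numPartitions
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Hypothetical stand-in for extractWhatever: first character of the value
  def extractWhatever(value: String): String = value.take(1)

  // Extends each key to Key2 and sums the counts per extended key
  def reduceByKey2(records: Seq[(Key1, (String, Int))]): Map[Key2, Int] =
    records
      .map { case (k1, (v, n)) => (Key2(k1.a, k1.b, extractWhatever(v)), n) }
      .groupBy(_._1)
      .map { case (k2, pairs) => k2 -> pairs.map(_._2).sum }

  // Reduces inside each Key1-based partition only; nothing moves between partitions
  def localReduce(records: Seq[(Key1, (String, Int))], numPartitions: Int): Map[Key2, Int] = {
    val partitions = records.groupBy { case (k1, _) => partitionOf(k1, numPartitions) }
    // Each Key2 comes from one Key1, hence from one partition,
    // so concatenating the per-partition maps never overwrites a key
    partitions.values.map(reduceByKey2).foldLeft(Map.empty[Key2, Int])(_ ++ _)
  }

  val sample: Seq[(Key1, (String, Int))] = Seq(
    (Key1("x", "1"), ("apple", 1)),
    (Key1("x", "1"), ("avocado", 2)),
    (Key1("x", "1"), ("banana", 3)),
    (Key1("y", "2"), ("apple", 4)),
    (Key1("z", "3"), ("cherry", 5))
  )

  def main(args: Array[String]): Unit = {
    println(localReduce(sample, numPartitions = 3))
    // Same result as reducing over all records at once
    println(localReduce(sample, numPartitions = 3) == reduceByKey2(sample))
  }
}
```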

'(NSObject, AnyObject)' is not convertible to 'String'

How do I convert an object of type (NSObject, AnyObject) to the type String?
At the end of the first line of the method below, as String causes the compiler error:
'(NSObject, AnyObject)' is not convertible to 'String'
Casting street to NSString instead of String compiles, but I'm casting street to String because I want to compare it to placemark.name, which has the type String!, not NSString.
I know name and street are optionals, but I'm assuming they're not nil for now because all the places returned from MKLocalSearch seem to have non-nil names and streets.
func formatPlacemark(placemark: CLPlacemark) -> (String, String) {
let street = placemark.addressDictionary["Street"] as String
if placemark.name == street {
// Do something
}
}
A String is not an object, so you do need to cast it to an NSString. I would recommend the following syntax to cast it and unwrap it at the same time. Don't worry about comparing it to a variable of type String! since they are compatible. This will work:
func formatPlacemark(placemark: CLPlacemark) -> (String, String) {
if let street = placemark.addressDictionary["Street"] as? NSString {
if placemark.name == street {
// Do something
}
}
}
This has the added benefits that if "Street" is not a valid key in your dictionary or if the object type is something other than NSString, this will not crash. It just won't enter the block.
If you really want street to be a String you could do this:
if let street:String = placemark.addressDictionary["Street"] as? NSString
but it doesn't buy you anything in this case.
The return type of a Swift dictionary subscript lookup has to be an optional, since there may be no value for the given key.
Therefore you must use:
as String?
I think it may have to do with addressDictionary being an NSDictionary.
If you convert addressDictionary to a Swift dictionary, it should work.
let street = (placemark.addressDictionary as Dictionary<String, String>)["Street"]

Evaluating string inside closure in groovy

sql.eachRow( query ) { columns ->
println columns.firstname //executes well
Eval.me( "columns.firstname" ) //No such property: columns
}
How do I evaluate the String containing the closure variable columns?
You can use the three-parameter form of Eval.me:
Eval.me( 'columns', columns, 'columns.firstname' )
