How to convert RDD Structure - apache-spark

How to convert
RDD[(String, (((A, B), C), D))]
to
RDD[(String, (A, B, C, D))]
Do I need to use flatMapValues? I have no idea how to use it.
Can anybody help with this?

You can just use mapValues and select the values from the tuple:
rdd.mapValues(x => (x._1._1._1, x._1._1._2, x._1._2, x._2))

This is more a Scala question than a Spark one. Alternatively, try a pattern match like:
rdd.mapValues { case (((a, b), c), d) => (a, b, c, d) }
mapValues is important as it preserves the partitioner of the RDD, if any (a plain map would discard it).

Related

Flattening tuples in Haskell

In Haskell we can flatten a list of lists (see Flatten a list of lists).
For simple cases of tuples, I can see how we would flatten certain tuples, as in the following examples:
flatten :: (a, (b, c)) -> (a, b, c)
flatten x = (fst x, fst(snd x), snd(snd x))
flatten2 :: ((a, b), c) -> (a, b, c)
flatten2 x = (fst(fst x), snd(fst x), snd x)
However, I'm after a function that accepts as input any nested tuple and which flattens that tuple.
Can such a function be created in Haskell?
If one cannot be created, why is this the case?
No, it's not really possible. There are two hurdles to clear.
The first is that all the different sizes of tuples are different type constructors. (,) and (,,) are not really related to each other at all, except in that they happen to be spelled with a similar sequence of characters. Since there are infinitely many such constructors in Haskell, having a function which did something interesting for all of them would require a typeclass with infinitely many instances. Whoops!
The second is that there are some very natural expectations we naively have about such a function, and these expectations conflict with each other. Suppose we managed to create such a function, named flatten. Any one of the following chunks of code seems very natural at first glance, if taken in isolation:
flattenA :: ((Int, Bool), Char) -> (Int, Bool, Char)
flattenA = flatten
flattenB :: ((a, b), c) -> (a, b, c)
flattenB = flatten
flattenC :: ((Int, Bool), (Char, String)) -> (Int, Bool, Char, String)
flattenC = flatten
But taken together, they seem a bit problematic: flattenB = flatten can't possibly be type-correct if both flattenA and flattenC are! Both of the input types for flattenA and flattenC unify with the input type to flattenB -- they are both pairs whose first component is itself a pair -- but flattenA and flattenC return outputs with differing numbers of components. In short, the core problem is that when we write (a, b), we don't yet know whether a or b is itself a tuple and should be "recursively" flattened.
With sufficient effort, it is possible to do enough type-level programming to put together something that sometimes works on limited-size tuples. But it takes 1. a lot of up-front effort, for 2. very little long-term payoff, and 3. even at use sites it requires a fair amount of boilerplate. That's a bad combo; if there's use-site boilerplate anyway, you might as well just write the function you cared about in the first place, since it's generally so short to do so.
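To make the first hurdle concrete, here is a minimal sketch of the per-shape typeclass approach (the Flatten class and its functional dependency are illustrative names, not a library API):
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- One class, with one instance per concrete tuple shape.
class Flatten nested flat | nested -> flat where
  flatten :: nested -> flat

-- Left-nested pair of pairs.
instance Flatten ((a, b), c) (a, b, c) where
  flatten ((a, b), c) = (a, b, c)
Trying to add a second instance for (((a, b), c), d) is rejected with a functional dependency conflict: that type also matches the instance head above (with a instantiated to a pair), and the two instances disagree about what the flattened result should be. This is exactly the flattenA/flattenC tension described earlier.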

Extract (a, b, c) from (Value a, Value b, Value c)

I'm using esqueleto for making SQL queries, and I have one query which returns data with type (Value a, Value b, Value c). I want to extract (a, b, c) from it. I know that I can use pattern matching like that:
let (Value a, Value b, Value c) = queryResult
But I'd like to avoid repeating Value for every tuple element. This is particularly annoying when the tuple has many more elements (say 10). Is there any way to simplify this? Is there a function I could use like this:
let (a, b, c) = someFunction queryResult
Data.Coerce from base provides coerce, which acts as your someFunction.
coerce "exchanges" newtypes for the underlying type they wrap (and visa-versa). This works even if they are wrapped deeply within other types. This is also done with zero overhead, since newtypes have the exact same runtime representation as the type they wrap.
There is a little bit more complexity with type variable roles that you can read about on the Wiki page if you're interested, but an application like this turns out to be straightforward since the package uses the "default" role for Value's type variable argument.
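For concreteness, here is a minimal self-contained sketch, with a stand-in newtype in place of esqueleto's Value (which is likewise a newtype):
import Data.Coerce (coerce)

-- Stand-in for esqueleto's Value newtype.
newtype Value a = Value { unValue :: a }

-- coerce strips all three wrappers in one go, at zero runtime cost.
someFunction :: (Value a, Value b, Value c) -> (a, b, c)
someFunction = coerce

main :: IO ()
main = print (someFunction (Value (1 :: Int), Value 'x', Value True))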
The library appears to have an unValue function, so you just need to choose a way to map over arbitrary length tuples. Then someFunction can become
import Control.Lens (over, each)
someFunction = over each unValue
If you want to try some other ways to map tuples without a lens dependency, you could check out this question: Haskell: how to map a tuple?
edit: As danidiaz points out, this only works for tuples with at most 8 fields. I'm not sure if there's a better way to generalise it.
If your tuple has all the same element type:
all3 :: (a -> b) -> (a, a, a) -> (b, b, b)
all3 f (x, y, z) = (f x, f y, f z)
This case can be abstracted over with lenses, using over each as described in @Zpalmtree's answer.
But if your tuple has different element types, you can make the f argument of this function polymorphic using the RankNTypes extension:
all3 :: (forall a. c a -> a) -> (c x, c y, c z) -> (x, y, z)
all3 f (x, y, z) = (f x, f y, f z)
Then assuming you have unValue :: Value a -> a, you can write:
(a, b, c) = all3 unValue queryResult
However, you would need to write separate functions all4, all5, …, all10 if you have large tuples. In that case you could cut down on the boilerplate by generating them with Template Haskell. This is part of the reason that large tuples are generally avoided in Haskell, since they’re awkward to work with and can’t be easily abstracted over.
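For example, the next size up follows the same mechanical pattern (all4 here is just a hypothetical continuation, not a library function):
{-# LANGUAGE RankNTypes #-}

all4 :: (forall a. c a -> a) -> (c w, c x, c y, c z) -> (w, x, y, z)
all4 f (w, x, y, z) = (f w, f x, f y, f z)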

Generalized tuple reduce

How can I write a function that reduces an n-tuple to an (n-m)-tuple?
For example, I have (a, b, c, d, e) and want to get (a, b, c)
which is used like
let ntup = (1, "a", "b", 5, "c")
nmtup = reduce ntup 3
It appears there are some solutions to similar problems (e.g., Manipulating "arbitrary" tuples), but I'd strongly advise you to consider changing data types instead, because tuples are not meant to be used in a context such as this one. Tuples are not meant for iterating over their elements, but rather for pattern matching against a fixed number of them.
An alternative could be an HList data type, as mentioned in one of the answers to the question linked above; a minimal sketch follows.
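As a rough sketch of what that alternative looks like in Haskell (the HList and ::: definitions below are illustrative; real libraries such as HList differ in detail):
{-# LANGUAGE DataKinds, GADTs, TypeOperators, KindSignatures #-}
import Data.Kind (Type)

-- A minimal heterogeneous list: the element types are tracked in a
-- type-level list, so elements may all have different types.
data HList (ts :: [Type]) where
  HNil  :: HList '[]
  (:::) :: t -> HList ts -> HList (t ': ts)
infixr 5 :::

-- Dropping an element is then ordinary pattern matching on the head.
hTail :: HList (t ': ts) -> HList ts
hTail (_ ::: xs) = xs

-- The 5-element example from the question:
example :: HList '[Int, String, String, Int, String]
example = 1 ::: "a" ::: "b" ::: 5 ::: "c" ::: HNil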

Scala Map, implicit symbol key to string

UPDATE
All the answers here are good, but @senia's does so most directly, without the need for additional steps. Will this lead to bugs? Possibly, but when using the Map[Symbol, T] convention in hundreds of methods, a one-step implicit conversion prior to map creation is preferred (it avoids permgen storage for Symbol map keys). At any rate, here's the pimp:
class SymbolProvidesPair(i: Symbol) { def ->[T](s: T) = (i.toString.tail, s) }
@inline implicit def symbol2String(i: Symbol) = new SymbolProvidesPair(i)
Original
It bothers me a bit using string keys in Maps, just slows me down and is, IMO, not as syntactically easy on the eyes as symbol keys.
val map: Map[String, Int] = Map("strings" -> 1, "blow" -> 2)
val map: Map[String, Int] = Map('symbols -> 1, 'rock -> 2)
So, I created an implicit to scratch my itch:
implicit def symbolKey2String[A <: Symbol, B](x:(A,B)) = (x._1.toString, x._2)
Couple things:
1) Is this the correct signature? The above works, but I take A <: Symbol to mean something that derives from Symbol, rather than something that equals Symbol.
2) I'll be using this when I manually type out Maps; i.e. just for convenience. Am I going to hit any snags with this implicit? It seems edge case enough to not cause issues (like string2Int, for example), but not sure if I'm missing something.
Thanks
EDIT
OK, well, for #1 I can just say what I mean: [Symbol, B] instead of [A <: Symbol, B].
But now I find myself with another issue, the symbol-to-string implicit boxes me into a corner of sorts as I then have to explicitly define Map[String, Type] for all new Maps (i.e. lose the nice compiler type inference) in order to be able to use symbol keys.
How then to get the best of both worlds, Map symbol keys, but with inferred [String, Type] when not specifying the type signature? i.e. have the compiler infer Map[String, Int] when I do:
val map = Map('foo -> 1)
You don't need to specify map's type explicitly:
scala> class SymbolToPair(i: Symbol) { def ->[T](s: T) = (i.toString().tail, s)}
defined class SymbolToPair
scala> implicit def symbolToPair(i: Symbol) = new SymbolToPair(i)
symbolToPair: (i: Symbol)SymbolToPair
scala> 'Symbol -> "String"
res0: (String, String) = (Symbol,String)
scala> Map('Symbol -> "String")
res1: scala.collection.immutable.Map[String,String] = Map(Symbol -> String)
scala> Map('Symbol -> 1)
res2: scala.collection.immutable.Map[String,Int] = Map(Symbol -> 1)
This kind of behavior could surprise other developers. Maybe it would be better to replace -> with some other word? For example :-> or ~>.
As you noted, there is no need for A. You probably want to drop the first character as well, which is always a '
implicit def symbolKeyToString[B](x: (Symbol, B)) = (x._1.toString.tail, x._2)
As for snags, well, you have to type out the signature of the map every time, and your keys can't contain spaces or operator characters. This is not something I would do myself...
Edit: if you don't want to type out the signature each time, use an alternative to Map.apply and forget implicits:
object Map2 {
  def apply[B](xs: (Symbol, B)*) =
    xs.map { case (k, v) => (k.toString.tail, v) }.toMap
}
I have a couple of warnings about the current solutions.
First of all, you're changing the meaning of 'sym -> x, and it will mean something different from ('sym, x). I would find this confusing.
You also make it difficult to mix code that uses this conversion with code that actually needs Map[Symbol, _].
Instead of converting the symbols to strings before putting them into a map, I recommend just converting the map. Seems much more straightforward to me.
scala> implicit def symMap2strMap[T](m: Map[Symbol, T]): Map[String, T] = m.map {
| case (key, value) => key.toString.tail -> value
| }
symMap2strMap: [T](m: Map[Symbol,T])scala.collection.immutable.Map[String,T]
scala> val sym = Map('foo -> 1, 'bar -> 2)
sym: scala.collection.immutable.Map[Symbol,Int] = Map('foo -> 1, 'bar -> 2)
scala> sym: Map[String, Int]
res0: Map[String,Int] = Map(foo -> 1, bar -> 2)
Edit:
You should never have to specify the type to explicitly convert Map[Symbol, T] to Map[String, T]. Just leave it as a Map[Symbol, T] until you hit an API which requires string keys, then let Scala implicitly convert it to the type you want.

Dropping the first element of a 3-element tuple

Is there a way to drop the first element of a 3-element tuple, so that I get a 2-element tuple, without having to define another function for this purpose?
(a,b,c)->(b,c)
Basically I have to use a function, which creates a 3 element tuple and then I have to use a function that only uses the last two element of it.
Thank you for your answers.
Your question almost has the required function itself!
\(a,b,c)->(b,c)
is the function you need. It is an "anonymous" function defined on the fly, you need not give it a name. So for example if you have
someFunc :: someType -> (Int, Char, Bool)
You could do
(\(a,b,c)->(b,c)) (someFunc someValue)
to get the second and third component of someFunc someValue.
You can do pattern matching with several syntactic constructs like let. So you could do something like:
let (a, b, c) = triTuple in fn (b, c)
You can use case:
case x of (a, b, c) -> (b, c)
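Putting these together, a small self-contained sketch (someFunc is a hypothetical stand-in for the 3-tuple-producing function):
-- Hypothetical stand-in for the function that produces a 3-element tuple.
someFunc :: Int -> (Int, Char, Bool)
someFunc n = (n, 'x', even n)

main :: IO ()
main = do
  -- inline anonymous function
  print ((\(_, b, c) -> (b, c)) (someFunc 3))
  -- let-based pattern match
  let (_, b, c) = someFunc 4
  print (b, c)
  -- case expression
  print (case someFunc 5 of (_, y, z) -> (y, z))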
