Generalized tuple reduce - haskell

How can I write a function that reduces an n-tuple to an (n-m)-tuple?
For example, I have (a, b, c, d, e) and want to get (a, b, c), with the function used like this:
let ntup = (1, "a", "b", 5, "c")
    nmtup = reduce ntup 3

It appears there are some solutions to similar problems (e.g., Manipulating "arbitrary" tuples), but I'd strongly advise you to consider changing data types instead, because tuples are not meant to be used in a context such as this one. A tuple is not something you iterate over; rather, you pattern match against its (fixed number of) elements.
An alternative could be an HList data type, as mentioned in one of the answers to the question linked above.
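To make the HList suggestion a little more concrete, here is a minimal sketch of a hand-rolled heterogeneous list. The names HList, HNil, (:&) and takeThree are made up for this illustration and are not taken from any particular library; a real HList package will look different.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)

-- A heterogeneous list: the type-level list `ts` records each element's type.
-- (Hypothetical definition, written out here only for illustration.)
data HList (ts :: [Type]) where
  HNil :: HList '[]
  (:&) :: t -> HList ts -> HList (t ': ts)

infixr 5 :&

-- Keep the first three elements of any HList that has at least three.
-- The signature states the length requirement, so shorter lists are
-- rejected at compile time.
takeThree :: HList (a ': b ': c ': rest) -> HList '[a, b, c]
takeThree (x :& y :& z :& _) = x :& y :& z :& HNil
```

With this, takeThree (1 :& "a" :& "b" :& 5 :& "c" :& HNil) keeps the first three elements, whatever their types are. Dropping a number of elements chosen at run time would need more type-level machinery than this sketch shows.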

Related

Manipulating Tuples in Haskell

I'm new to Haskell and have a question regarding tuples. Is there not a way to traverse a tuple? I understand that traversal is very easy with lists, but if the input is given as a tuple, is there not a way to check the entire tuple as you do with a list? If that's not the case, would it be possible to just extract the values from the tuple into a list and perform the traversal that way?
In Haskell, it’s not considered idiomatic (nor is it really possible) to use the tuple as a general-purpose traversable container. Any tuple you deal with is going to have a fixed number of elements, with the types of these elements also being fixed. (This is quite different from how tuples are idiomatically used in, for example, Python.) You ask about a situation where “the input is given as a tuple” but if the input is going to have a flexible number of elements then it definitely won’t be given as a tuple—a list is a much more likely choice.
This makes tuples seem less flexible than in some other languages. The upside is that you can examine them using pattern matching. For example, if you want to evaluate some predicate for each element of a tuple and return True if the predicate passes for all of them, you would write something like
all2 :: (a -> Bool) -> (a, a) -> Bool
all2 predicate (x, y) = predicate x && predicate y
Or, for three-element tuples,
all3 :: (a -> Bool) -> (a, a, a) -> Bool
all3 predicate (x, y, z) = predicate x && predicate y && predicate z
You might be thinking, “Wait, you need a separate function for each tuple size?!” Yes, you do, and you can start to see why there’s not a lot of overlap between the use cases for tuples and the use cases for lists. The advantages of tuples are exactly that they are kind of inflexible: you always know how many values they contain, and what type those values have. The former is not really true for lists.
Is there not a way to traverse a tuple?
As far as I know, there’s no built-in way to do this. It would be easy enough to write down instructions for traversing a 2-tuple, traversing a 3-tuple, and so on, but this would have the big limitation that you’d only be able to deal with tuples whose elements all have the same type.
Think about the map function as a simple example. You can apply map to a list of type [a] as long as you have a function a -> b. In this case map looks at each a value in turn, passes it to the function, and assembles the list of resulting b values. But with a tuple, you might have three elements whose values are all different types. Your function for converting as to bs isn’t sufficient if the tuple consists of two a values and a c! If you try to start writing down the Foldable instance or the Traversable instance even just for two-element tuples, you quickly realize that those typeclasses aren’t designed to handle containers whose values might have different types.
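As a contrast, here is a small sketch of what does work once both components are forced to share a type. The Pair type and allPair are hypothetical names introduced only for this example.

```haskell
{-# LANGUAGE DeriveTraversable #-}

-- A pair whose two components are required to have the same type.
-- (Hypothetical type, not part of base.)
data Pair a = Pair a a
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- `all` comes from Foldable, so no hand-written per-size helper is needed.
allPair :: (a -> Bool) -> Pair a -> Bool
allPair = all
```

Here fmap (* 10) (Pair 1 2) gives Pair 10 20 and sum (Pair 1 2) gives 3. The instances that base does define for the ordinary pair only ever touch the second component, which is why fmap (+ 1) ('x', 1) gives ('x', 2) and length (1, 2) is 1.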
Would it be possible to just extract the values from the tuple into a list?
Yes, but you would need a separate function for each possible size of the input tuple. For example,
tupleToList2 :: (a, a) -> [a]
tupleToList2 (x, y) = [x, y]
tupleToList3 :: (a, a, a) -> [a]
tupleToList3 (x, y, z) = [x, y, z]
The good news, of course, is that you’re never going to get a situation where you have to deal with tuples of arbitrary size, because that isn’t a thing that can happen in Haskell. Think about the type signature of a function that accepted a tuple of any size: how could you write that?
In any situation where you’re accepting a tuple as input, it’s probably not necessary to convert the tuple to a list first, because the pattern-matching syntax means that you can just address each element of the tuple individually—and you always know exactly how many such elements there are going to be.
If your tuple is homogeneous and you don't mind using a third-party package, then lens provides functions to traverse each element of a tuple.
ghci> :m +Control.Lens
ghci> over each (*10) (1, 2, 3, 4, 5) --traverse each element
(10,20,30,40,50)
Control.Lens.Tuple provides lenses (_1 through _19) to get and set the nth element, up to the 19th.
You can explore the lens package for more information. If you want to learn the lens package, Optics By Example by Chris Penner is a good book.

Flattening tuples in Haskell

In Haskell we can flatten a list of lists (see "Flatten a list of lists").
For simple cases of tuples, I can see how we would flatten certain tuples, as in the following examples:
flatten :: (a, (b, c)) -> (a, b, c)
flatten x = (fst x, fst(snd x), snd(snd x))
flatten2 :: ((a, b), c) -> (a, b, c)
flatten2 x = (fst(fst x), snd(fst x), snd x)
However, I'm after a function that accepts as input any nested tuple and which flattens that tuple.
Can such a function be created in Haskell?
If one cannot be created, why is this the case?
No, it's not really possible. There are two hurdles to clear.
The first is that all the different sizes of tuples are different type constructors. (,) and (,,) are not really related to each other at all, except in that they happen to be spelled with a similar sequence of characters. Since there are infinitely many such constructors in Haskell, having a function which did something interesting for all of them would require a typeclass with infinitely many instances. Whoops!
The second is that there are some very natural expectations we naively have about such a function, and these expectations conflict with each other. Suppose we managed to create such a function, named flatten. Any one of the following chunks of code seems very natural at first glance, if taken in isolation:
flattenA :: ((Int, Bool), Char) -> (Int, Bool, Char)
flattenA = flatten
flattenB :: ((a, b), c) -> (a, b, c)
flattenB = flatten
flattenC :: ((Int, Bool), (Char, String)) -> (Int, Bool, Char, String)
flattenC = flatten
But taken together, they seem a bit problematic: flattenB = flatten can't possibly be type-correct if both flattenA and flattenC are! Both of the input types for flattenA and flattenC unify with the input type to flattenB -- they are both pairs whose first component is itself a pair -- but flattenA and flattenC return outputs with differing numbers of components. In short, the core problem is that when we write (a, b), we don't yet know whether a or b is itself a tuple and should be "recursively" flattened.
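The conflict can be seen concretely by writing the two readings as separate functions and applying both to the same value. This is only an illustrative sketch; flattenLeft, flattenBoth and sample are names invented here.

```haskell
-- Reading 1: only the first component is treated as a nested pair.
flattenLeft :: ((a, b), c) -> (a, b, c)
flattenLeft ((x, y), z) = (x, y, z)

-- Reading 2: both components are treated as nested pairs.
flattenBoth :: ((a, b), (c, d)) -> (a, b, c, d)
flattenBoth ((w, x), (y, z)) = (w, x, y, z)

-- One input that both functions accept.
sample :: ((Int, Bool), (Char, String))
sample = ((1, True), ('x', "s"))
```

flattenLeft sample is (1, True, ('x', "s")), a triple whose last component is still a pair, while flattenBoth sample is (1, True, 'x', "s"). A single flatten would have to pick one of these results from the type of its argument alone.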
With sufficient effort, it is possible to do enough type-level programming to put together something that sometimes works on limited-size tuples. But it is 1. a lot of up-front effort, 2. very little long-term programming efficiency payoff, and 3. even at use sites requires a fair amount of boilerplate. That's a bad combo; if there's use-site boilerplate, then you might as well just write the function you cared about in the first place, since it's generally so short to do so anyway.

Apply function to all pairs efficiently

I need a second order function pairApply that applies a binary function f to all unique pairs of a list-like structure and then combines them somehow. An example / sketch:
pairApply (+) f [a, b, c] = f a b + f a c + f b c
Some research leads me to believe that Data.Vector.Unboxed will probably have good performance (I will also need fast access to specific elements); it is also necessary for Statistics.Sample, which would come in handy further down the line.
With this in mind I have the following, which almost compiles:
import qualified Data.Vector.Unboxed as U
pairElement :: (U.Unbox a, U.Unbox b)
            => U.Vector a
            -> (a -> a -> b)
            -> Int
            -> a
            -> U.Vector b
pairElement v f idx el =
  U.map (f el) $ U.drop (idx + 1) v
pairUp :: (U.Unbox a, U.Unbox b)
       => (a -> a -> b)
       -> U.Vector a
       -> U.Vector (U.Vector b)
pairUp f v = U.imap (pairElement v f) v
pairApply :: (U.Unbox a, U.Unbox b)
          => (b -> b -> b)
          -> b
          -> (a -> a -> b)
          -> U.Vector a
          -> b
pairApply combine neutral f v =
  folder $ U.map folder (pairUp f v)
  where
    folder = U.foldl combine neutral
The reason this doesn't compile is that there is no Unbox instance for U.Vector (U.Vector a). I have been able to create new Unbox instances in other cases using Data.Vector.Unboxed.Deriving, but I'm not sure it would be so easy in this case (perhaps by transforming it into a pair whose first element is all the inner vectors concatenated and whose second element holds the lengths of the vectors, to know how to unpack?)
My question can be stated in two parts:
Does the above implementation make sense at all, or is there some library function that could do this much more easily?
If so, is there a better way to make an unboxed vector of vectors than the one sketched above?
Note that I'm aware that foldl is probably not the best choice; once I've got the implementation sorted I plan to benchmark with a few different folds.
There is no way to define a classical instance for Unbox (U.Vector b), because that would require preallocating a memory area in which each element (i.e. each subvector!) has the same fixed amount of space. But in general, each of them may be arbitrarily big, so that's not feasible at all.
It might in principle be possible to define that instance by storing only a flattened form of the nested vector plus an extra array of indices (where each subvector starts). I once briefly gave this a try; it actually seems somewhat promising as far as immutable vectors are concerned, but a G.Vector instance also requires a mutable implementation, and that's hopeless for such an approach (because any mutation that changes the number of elements in one subvector would require shifting everything behind it).
Usually, it's just not worth it, because if the individual element vectors aren't very small the overhead of boxing them won't matter, i.e. often it makes sense to use B.Vector (U.Vector b).
For your application, however, I would not do that at all: there's no need to ever wrap the upper element-choices in a single triangular array. (And it would be really bad for performance to do that, because it makes the algorithm take O(n²) memory rather than the O(n) that is all that's needed.)
I would just do the following:
pairApply combine neutral f v
  = U.ifoldl' (\acc i p -> U.foldl' (\acc' q -> combine acc' $ f p q)
                                    acc
                                    (U.drop (i+1) v))
              neutral v
This corresponds pretty much to the obvious nested-loops imperative implementation
pairApply(combine, b, f, v):
    for(i in 0..length(v)-1):
        for(j in i+1..length(v)-1):
            b = combine(b, f(v[i], v[j]));
    return b;
My answer is basically the same as leftaroundabout's nested-loops imperative implementation:
import Data.List (foldl')
import Data.Vector.Unboxed (Vector, (!))  -- assumed: the question's unboxed vectors
import qualified Data.Vector.Unboxed as U
pairApply :: (Int -> Int -> Int) -> Vector Int -> Int
pairApply f v = foldl' (+) 0 [f (v ! i) (v ! j) | i <- [0..(n-1)], j <- [(i+1)..(n-1)]]
  where n = U.length v
As far as I can tell, this implementation has no performance issues. It is non-polymorphic for simplicity.
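For readers who want to try the nested-loop idea without installing the vector package, the same traversal can be sketched on plain lists. This is an assumption-laden variant rather than either answerer's code: pairApplyList is a made-up name, and lists give up the fast indexed access the question asks for.

```haskell
import Data.List (foldl', tails)

-- Combine f x y over every unordered pair (x, y) with x occurring before y.
-- `tails` yields each element together with everything after it, which
-- plays the role of the inner loop over j > i.
pairApplyList :: (b -> b -> b) -> b -> (a -> a -> b) -> [a] -> b
pairApplyList combine neutral f xs =
  foldl' combine neutral [f x y | (x : ys) <- tails xs, y <- ys]
```

For example, pairApplyList (+) 0 (*) [1, 2, 3] evaluates to 1*2 + 1*3 + 2*3 = 11.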

Lens setter to add element to end of tuple

Is there any lens that will help me do the following transformation for a tuple of any length (say up to 10-15 elements, at least):
(a, b, c) -> d -> (a, b, c, d)
To get a lens you need getter and setter functions. Unfortunately, there is no way to obtain a fourth element from a triple (except by returning Nothing, or a value of some unit type). So you end up with a bunch of setters, which are trivial pattern-matching functions (one for each n-tuple), but not a lens.
Perhaps you need a simple list, or some free construction, if you really need a non-uniform container?
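The "one trivial function per tuple size" approach might be packaged in a type class so that all sizes share one name. The sketch below is only an illustration under that assumption: TupleSnoc and tupleSnoc are hypothetical names, not part of lens or any other library, and only three sizes are written out.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}

-- `t` is the input tuple, `x` the new last element, `r` the result tuple.
-- The functional dependency says the result type is determined by t and x.
class TupleSnoc t x r | t x -> r where
  tupleSnoc :: t -> x -> r

-- One instance per tuple size; each is a plain pattern match.
instance TupleSnoc (a, b) c (a, b, c) where
  tupleSnoc (a, b) c = (a, b, c)

instance TupleSnoc (a, b, c) d (a, b, c, d) where
  tupleSnoc (a, b, c) d = (a, b, c, d)

instance TupleSnoc (a, b, c, d) e (a, b, c, d, e) where
  tupleSnoc (a, b, c, d) e = (a, b, c, d, e)
```

With these instances, tupleSnoc (1 :: Int, 'a') True is (1, 'a', True). Supporting 10-15 elements means writing that many instances by hand or generating them.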

What is the name for the contrary of Tuple or Either with more than two options?

There is a Tuple as a Product of any number of types and there is an Either as a Sum of two types. What is the name for a Sum of any number of types, something like this
data Thing a b c d ... = Thing1 a | Thing2 b | Thing3 c | Thing4 d | ...
Is there any standard implementation?
Before I make the suggestion against using such types, let me explain some background.
Either is a sum type, and a pair or 2-tuple is a product type. Sums and products can exist over arbitrarily many underlying types (sets). However, in Haskell, only tuples come in a variety of sizes out of the box. Either, on the other hand, can be (arbitrarily) nested to achieve that: Either Foo (Either Bar Baz).
Of course it's easy to instead define e.g. the types Either3 and Either4 etc, in the spirit of 3-tuples, 4-tuples and so on.
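data Either3 a b c = Left3 a | Middle3 b | Right3 c
data Either4 a b c d = LeftMost4 a | Left4 b | Right4 c | RightMost4 d
...if you really want (the constructor names carry a numeric suffix so that they do not clash with each other or with the Prelude's Left and Right). Or you can find a library that does this, but I doubt you could call it "standard" by any standards...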
However, if you do define your own generic sum and product types, they will be completely isomorphic to any type that is structurally equivalent, regardless of where it is defined. This means that you can, with relative ease, nicely adapt your code to interface with any other code that uses an alternative definition.
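As a minimal sketch of that isomorphism, a hand-written three-way sum can be converted to nested Either and back without losing information. The constructor names First, Second and Third, and the functions toNested and fromNested, are chosen here purely for illustration.

```haskell
-- A custom three-way sum type (hypothetical constructor names).
data Either3 a b c = First a | Second b | Third c
  deriving (Show, Eq)

-- Convert to the structurally equivalent nested Either.
toNested :: Either3 a b c -> Either a (Either b c)
toNested (First a)  = Left a
toNested (Second b) = Right (Left b)
toNested (Third c)  = Right (Right c)

-- Convert back; fromNested . toNested is the identity, and so is
-- toNested . fromNested.
fromNested :: Either a (Either b c) -> Either3 a b c
fromNested (Left a)          = First a
fromNested (Right (Left b))  = Second b
fromNested (Right (Right c)) = Third c
```

Adapting code that expects the nested form is then a matter of composing with toNested or fromNested at the boundary.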
Furthermore, it is very likely to be beneficial, because that way you can give more meaningful, descriptive names to your sum and product types instead of going with the generic tuple and Either. In fact, some people advise using custom types because doing so essentially adds static type safety. This also applies to non-sum/product types, e.g.:
employment :: Bool -- so which one is unemployed and which one is employed?
data Empl = Employed | Unemployed
employment' :: Empl -- no ambiguity
or
person :: (Name, Age) -- yeah but when you see ("Erik", 29), is it just some random pair of name and age, or does it represent a person?
data Person = Person { name :: Name, age :: Age }
person' :: Person -- no ambiguity
Above, Person really encodes a product type, but with more meaning attached to it. You can also write newtype Person = Person (Name, Age), which is essentially equivalent. So I always prefer a nice, intention-revealing custom type. The same goes for Either and custom sum types.
So basically, Haskell gives you all the tools necessary to quickly build your own custom types with very clean and readable syntax, so it's best to use them rather than resort to generic types like tuples and Either. However, it's nice to know about this isomorphism, for example in the context of generic programming. If you want to know more about that, you can search for "scrap your boilerplate", "template your boilerplate" and just "(datatype) generic programming".
P.S. The reason they are called sum and product types respectively is that they correspond to the disjoint union (sum) and the Cartesian product of sets. Therefore, the number of values (or unique instances if you will) in the set that is described by the product type (a, b) is the product of the number of values in a and the number of values in b. For example (Bool, Bool) has exactly 2*2 values: (True, True), (False, False), (True, False), (False, True).
However Either Bool Bool has 2+2 values, Left True, Left False, Right True, Right False. So it happens to be the same number but that's obviously not the case in general.
But of course this can also be said about our custom Person product type, so again, there is little reason to use Either and tuples.
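The counting argument from the P.S. can be checked by listing the values explicitly. The sketch below pairs Bool (2 values) with Ordering (3 values) so that the sum and the product come out different; the names bools, orderings, productValues and sumValues are invented for this example.

```haskell
-- All values of the two component types.
bools :: [Bool]
bools = [minBound .. maxBound]

orderings :: [Ordering]
orderings = [minBound .. maxBound]

-- Product type: one value for every combination, so 2 * 3 of them.
productValues :: [(Bool, Ordering)]
productValues = [(b, o) | b <- bools, o <- orderings]

-- Sum type: every Bool tagged Left plus every Ordering tagged Right,
-- so 2 + 3 of them.
sumValues :: [Either Bool Ordering]
sumValues = map Left bools ++ map Right orderings
```

length productValues is 6 and length sumValues is 5.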
There are some predefined versions in the HaXml package, with OneOfN, TwoOfN, ... constructors.
In a generic context, this is usually done inductively, using Either or
data (:+:) f g a = L1 (f a) | R1 (g a)
The latter is defined in GHC.Generics to match the funny way it handles things.
In fact, the generic approach is to break every algebraic datatype down into (:+:) and
data (:*:) f g a = f a :*: g a
along with some extra stuff. That is, it turns everything into binary sums and binary products.
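One way to see the binary sums at work is to walk a type's generic representation and count its constructors. The following is a rough sketch assuming GHC with DeriveGeneric; the class GConCount, the function conCount and the Shape type are hypothetical names written for this example, while Generic, Rep, V1, C1, D1 and (:+:) are the real GHC.Generics names.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.Generics (C1, D1, Generic, Rep, V1, (:+:))

-- Count constructors by recursing over the representation type.
class GConCount (f :: Type -> Type) where
  gConCount :: Proxy f -> Int

-- A type with no constructors.
instance GConCount V1 where
  gConCount _ = 0

-- A binary sum: add up both sides.
instance (GConCount f, GConCount g) => GConCount (f :+: g) where
  gConCount _ = gConCount (Proxy :: Proxy f) + gConCount (Proxy :: Proxy g)

-- A single constructor, whatever fields it has.
instance GConCount (C1 c f) where
  gConCount _ = 1

-- The datatype wrapper: look inside.
instance GConCount f => GConCount (D1 c f) where
  gConCount _ = gConCount (Proxy :: Proxy f)

conCount :: forall a. (Generic a, GConCount (Rep a)) => Proxy a -> Int
conCount _ = gConCount (Proxy :: Proxy (Rep a))

-- An example type with three constructors (hypothetical).
data Shape = Circle Double | Rect Double Double | Dot
  deriving (Generic)
```

conCount (Proxy :: Proxy Shape) is 3 even though the representation only ever nests the binary (:+:), and conCount (Proxy :: Proxy Bool) is 2.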
In a more concrete context, you're almost always better off using a custom algebraic datatype for things bigger than pairs or with more options than Either, as others have discussed. Slightly larger tuples (triples and maybe 4-tuples) can be useful for local one-off constructs, but it's hard to see how you'd use larger general sum types as one-offs.
Such a type is usually called a sum, variant, union, or tagged union type. Because the capability is a built-in feature of data types in Haskell, there's no name for it widely used in Haskell code. The Report only calls them "algebraic datatypes" (usually abbreviated to ADT), so that's the name you'll see most often in comments, but this name includes types with only one data constructor, which are only sum types in the trivial sense.

Resources