I'm trying to implement a Fisher-Yates shuffle of some data. This algorithm is easy to implement for one-dimensional arrays. However, I need to be able to shuffle data in a two-dimensional matrix.
An approach which I think could generalize nicely to higher dimensional arrays is to convert my arbitrarily dimensioned matrix to a single-dimensional array of indices, shuffle those, and then reorganize the matrix by swapping the element at each index of this index array with the element at the index of the element of the index array. In other words, to take a 2x2 matrix such as:
1 2
3 4
I would convert this into this "array":
[(0, (0,0)), (1, (0,1)), (2, (1,0)), (3, (1,1))]
This I would then scramble per normal into, say,
[(0, (1,0)), (1, (0,1)), (2, (1,1)), (3, (0,0))]
Once reorganized, the original matrix would become:
2 3
4 1
My basic approach here is that I want to have a type class that looks something like this:
class Shufflable a where
  indices :: a -> Array Int b
  reorganize :: a -> Array Int b -> a
Then I'll have a function to perform the shuffle which looks like this:
fisherYates :: (RandomGen g) => g -> Array Int b -> (Array Int b, g)
The thinking is that (minus the RandomGen plumbing) I should be able to shuffle a shuffleable thing like so:
shuffle :: (Shufflable a, RandomGen g) => a -> g -> (a, g)
shuffle array = reorganize array (fisherYates (indices array))
Here's what I have so far:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
module Shuffle where
import Data.Array hiding (indices)
import System.Random
fisherYates :: (RandomGen g) => Array Int e -> g -> (Array Int e, g)
fisherYates arr gen = go max gen arr
  where
    (_, max) = bounds arr
    go 0 g arr = (arr, g)
    go i g arr = go (i-1) g' (swap arr i j)
      where
        (j, g') = randomR (0, i) g
class Shuffle a b | a -> b where
  indices :: a -> Array Int b
  reorganize :: a -> Array Int b -> a
shuffle :: (Shuffle a b, RandomGen g) => a -> g -> (a, g)
shuffle a gen = (reorganize a indexes, gen')
  where
    (indexes, gen') = fisherYates (indices a) gen
instance (Ix ix) => Shuffle (Array ix e) ix where
  reorganize a = undefined
  indices a = array (0, maxIdx) (zip [0..maxIdx] (range bound))
    where
      bound = bounds a
      maxIdx = rangeSize bound - 1
swap :: Ix i => Array i e -> i -> i -> Array i e
swap arr i j = arr // [ (i, i'), (j, j') ]
  where
    i' = arr!j
    j' = arr!i
My problems:
I feel like this is a lot of language extensions for solving a simple problem. Would it be easier to understand or write another way?
I feel like the community is moving towards type families over functional dependencies. Is there a way to use that instead to solve this problem?
A part of me wonders if the fisherYates function can be moved into the Shuffle typeclass somehow. Would it be possible and/or worth doing to set this up so that you either implement shuffle or implement both indices and reorganize?
Thanks!
You might want to look into repa, which offers n-dimensional arrays which encode their shape (dimensions) into the type; you can code generic operations that work on arrays of any shape with it.
I think you could avoid a typeclass entirely by constructing the array with backpermute or fromFunction and translating the indices (it's more efficient than it looks, since it gets turned into an unboxed array when you force it; in fact, backpermute is implemented in terms of fromFunction under the hood).
repa itself uses quite a few language extensions, but you might find it preferable to the standard library's arrays for reasons of both performance (repa's arrays are unboxed, and the standard operations offered do nice things like automatic parallelisation) and convenience (IMO repa has a nicer API than the standard arrays).
Here's a good introduction to repa.
Admittedly, none of this directly simplifies your code. But if repa's arrays are a good fit for you, then the code you end up with will probably avoid many of the complexities of your current solution.
That said, turning your use of functional dependencies into a type family is really simple; the Shuffle class becomes
class Shuffle a where
  type Elt a
  indices :: a -> Array Int (Elt a)
  reorganize :: a -> Array Int (Elt a) -> a
the instance becomes
instance (Ix ix) => Shuffle (Array ix e) where
  type Elt (Array ix e) = ix
  ...
and the Shuffle a b constraint becomes Shuffle a.
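For what it's worth, here is a minimal sketch of how the type-family version could look with reorganize filled in. It is only one reading of the question's "reorganize" step (a gather: slot k of the result takes the element stored at the k-th shuffled index), and the body of reorganize is my assumption, not something from the question. It needs only the array package and leaves the RandomGen plumbing out.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Array (Array, Ix, bounds, elems, listArray, range, rangeSize, (!))

class Shuffle a where
  type Elt a
  indices    :: a -> Array Int (Elt a)
  reorganize :: a -> Array Int (Elt a) -> a

instance Ix ix => Shuffle (Array ix e) where
  type Elt (Array ix e) = ix
  -- Flatten the index space into a 0-based, one-dimensional array of indices.
  indices a = listArray (0, rangeSize (bounds a) - 1) (range (bounds a))
  -- Assumed semantics (gather): the k-th slot of the result, in index order,
  -- receives the element of the original array found at the k-th shuffled index.
  reorganize a idxs = listArray (bounds a) [a ! i | i <- elems idxs]
```

With the 2x2 matrix listArray ((0,0),(1,1)) [1,2,3,4] and the shuffled index array [(1,0),(0,1),(1,1),(0,0)], elems of the reorganized matrix is [3,2,4,1] under this gather reading.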
I'm trying to get some basic information on the performance characteristics of branches in SBV.
Let's suppose I have an SInt16 and a very sparse lookup table Map Int16 a. I can implement the lookup with nested ite:
sCase :: (Mergeable a) => SInt16 -> a -> Map Int16 a -> a
sCase x def = go . toList
  where
    go [] = def
    go ((k,v):kvs) = ite (x .== literal k) v (go kvs)
However, this means the generated tree will be very deep.
Does that matter?
If yes, is it better to instead generate a balanced tree of branches, effectively mirroring the Map's structure? Or is there some other scheme that would give even better performance?
If there are fewer than 256 entries in the map, would it change anything to "compress" it so that sCase works on an SInt8 and a Map Int8 a?
Is there some built-in SBV combinator for this use case that works better than iterated ite?
EDIT: It turns out that it matters a lot what my a is, so let me add some more detail to that. I am currently using sCase to branch in a stateful computation modeled as an RWS r w s a, with the following instances:
instance forall a. Mergeable a => Mergeable (Identity a) where
  symbolicMerge force cond thn els = Identity $ symbolicMerge force cond (runIdentity thn) (runIdentity els)
instance (Mergeable s, Mergeable w, Mergeable a, forall a. Mergeable a => Mergeable (m a)) => Mergeable (RWST r w s m a) where
  symbolicMerge force cond thn els = Lazy.RWST $
    symbolicMerge force cond (runRWST thn) (runRWST els)
So stripping away all the newtypes, I'd like to branch into something of type r -> s -> (a, s, w) s.t. Mergeable s, Mergeable w and Mergeable a.
Symbolic look-ups are expensive
Symbolic array lookup will be expensive regardless of what data structure you use. It boils down to the fact that there's no information available to the symbolic execution engine to cut down the state space, so it ends up doing more or less what you coded yourself.
SMTLib Arrays
However, the best solution in these cases is to actually use SMT's support for arrays: http://smtlib.cs.uiowa.edu/theories-ArraysEx.shtml
SMTLib arrays are different from what you'd consider an array in a regular programming language: they do not have bounds. In that sense, an SMTLib array is more of a map from inputs to outputs, spanning the entire domain (i.e., it is equivalent to a function). But SMT solvers have custom theories to deal with arrays and thus they can handle problems involving arrays much more efficiently. (On the downside, there's no notion of index-out-of-bounds or of otherwise controlling the range of elements you can access. You can code those up yourself on top of the abstraction though, leaving it up to you to decide how you want to handle such invalid accesses.)
If you are interested in learning more about how SMT solvers deal with arrays, the classic reference is: http://theory.stanford.edu/~arbrad/papers/arrays.pdf
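To make the "arrays are functions" view concrete, here is a small pure-Haskell model. It is only an illustration of the semantics, not SBV or SMTLib code; the names FunArray, constArray, readFA, writeFA and readBounded are made up for this sketch.

```haskell
-- A pure model of an unbounded array: a total function from indices to values.
newtype FunArray k v = FunArray (k -> v)

-- Every index initially maps to the same default value.
constArray :: v -> FunArray k v
constArray def = FunArray (const def)

readFA :: FunArray k v -> k -> v
readFA (FunArray f) = f

-- Writing wraps the old function: the written index now yields the new value,
-- every other index falls through to the previous contents.
writeFA :: Eq k => FunArray k v -> k -> v -> FunArray k v
writeFA (FunArray f) k v = FunArray (\k' -> if k' == k then v else f k')

-- Bounds are not part of the abstraction; a check can be layered on top,
-- and the caller decides what an invalid access means (here: Nothing).
readBounded :: Ord k => (k, k) -> FunArray k v -> k -> Maybe v
readBounded (lo, hi) arr k
  | lo <= k && k <= hi = Just (readFA arr k)
  | otherwise          = Nothing
```

Reading index 5 after writing 2 there gives 2, while any other index still gives the default.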
Arrays in SBV
SBV supports arrays, through the SymArray class: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SymArray
The SFunArray type actually does not use SMTLib arrays. This was designed to support solvers that didn't understand Arrays, such as ABC: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SFunArray
The SArray type fully supports SMTLib arrays: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SArray
There are some differences between these types, and the above links describe them. However, for most purposes, you can use them interchangeably.
Converting a Haskell map to an SBV array
Going back to your original question, I'd be tempted to use an SArray to model such a look up. I'd code it as:
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
import qualified Data.Map as M
import Data.Int
-- Fill an SBV array from a map
mapToSArray :: (SymArray array, SymVal a, SymVal b) => M.Map a (SBV b) -> array a b -> array a b
mapToSArray m a = foldl (\arr (k, v) -> writeArray arr (literal k) v) a (M.toList m)
And use it as:
g :: Symbolic SBool
g = do let def = 0
       -- get a symbolic array, initialized with def
       arr <- newArray "myArray" (Just def)
       let m :: M.Map Int16 SInt16
           m = M.fromList [(5, 2), (10, 5)]
       -- Fill the array from the map
       let arr' :: SArray Int16 Int16 = mapToSArray m arr
       -- A simple problem:
       idx1 <- free "idx1"
       idx2 <- free "idx2"
       pure $ 2 * readArray arr' idx1 + 1 .== readArray arr' idx2
When I run this, I get:
*Main> sat g
Satisfiable. Model:
idx1 = 5 :: Int16
idx2 = 10 :: Int16
You can run it as satWith z3{verbose=True} g to see the SMTLib output it generates, which avoids costly lookups by simply delegating those tasks to the backend solver.
Efficiency
The question of whether this will be "efficient" really depends on how many elements your map has that you're constructing the array from. The larger the number of elements and the trickier the constraints, the less efficient it will be. In particular, if you ever write to an index that is symbolic, I'd expect slow-downs in solving time. If they're all constants, it should be relatively performant. As is usual in symbolic programming, it's really hard to predict any performance without seeing the actual problem and experimenting with it.
Arrays in the query context
The function newArray works in the symbolic context. If you're in a query context, instead use freshArray: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV-Control.html#v:freshArray
I'm having trouble printing the contents of a custom matrix type I made. When I try to, the compiler tells me
Ambiguous occurrence `show'
It could refer to either `MatrixShow.show',
defined at Matrices.hs:6:9
or `Prelude.show',
imported from `Prelude' at Matrices.hs:1:8-17
Here is the module I'm importing:
module Matrix (Matrix(..), fillWith, fromRule, numRows, numColumns, at, mtranspose, mmap) where
newtype Matrix a = Mat ((Int,Int), (Int,Int) -> a)
fillWith :: (Int,Int) -> a -> (Matrix a)
fillWith (n,m) k = Mat ((n,m), (\(_,_) -> k))
fromRule :: (Int,Int) -> ((Int,Int) -> a) -> (Matrix a)
fromRule (n,m) f = Mat ((n,m), f)
numRows :: (Matrix a) -> Int
numRows (Mat ((n,_),_)) = n
numColumns :: (Matrix a) -> Int
numColumns (Mat ((_,m),_)) = m
at :: (Matrix a) -> (Int, Int) -> a
at (Mat ((n,m), f)) (i,j)| (i > 0) && (j > 0) || (i <= n) && (j <= m) = f (i,j)
mtranspose :: (Matrix a) -> (Matrix a)
mtranspose (Mat ((n,m),f)) = (Mat ((m,n),\(j,i) -> f (i,j)))
mmap :: (a -> b) -> (Matrix a) -> (Matrix b)
mmap h (Mat ((n,m),f)) = (Mat ((n,m), h.f))
This is my module:
module MatrixShow where
import Matrix
instance (Show a) => Show (Matrix a) where
show (Mat ((x,y),f)) = show f
Also is there some place where I can figure this out on my own, some link with instructions or some tutorial or something to learn how to do this.
The problem is with your indentation. The definition of show needs to be indented relative to the line instance (Show a) => Show (Matrix a) where. As it is, it appears that you are trying to define a new top-level function called show, unrelated to the Show class, which you can't do.
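A minimal sketch of the corrected layout, with the Matrix type copied in so the snippet stands alone. The body is a placeholder of my own that only prints the dimensions (the stored function itself has no Show instance, so show f would not compile anyway); rendering the elements is left open.

```haskell
newtype Matrix a = Mat ((Int, Int), (Int, Int) -> a)

instance Show a => Show (Matrix a) where
  -- Indented under the instance head, so this defines the class method
  -- instead of a new top-level function that clashes with Prelude.show.
  show (Mat ((rows, cols), _)) = "<" ++ show rows ++ "x" ++ show cols ++ " matrix>"
```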
@dfeuer, whose name I continue to have trouble spelling, has given you the direct answer - Haskell is sensitive to layout - but I'm going to try to help you with the underlying question that you've alluded to in the comments, without giving you the full answer.
You mentioned that you were confused about how matrices are represented. Read the source, Luke:
newtype Matrix a = Mat ((Int,Int), (Int,Int) -> a)
This newtype declaration tells you that a Matrix is formed from a pair ((Int,Int), (Int,Int) -> a). If you split up the tuple, that's an (Int, Int) pair and a function of type (Int, Int) -> a (a function that takes a pair of integers and returns something of arbitrary type a). This suggests to me that the first part of the tuple represents the size of the matrix, and the second part is a function mapping coordinates onto elements. This hypothesis seems to be confirmed by some of the example code your professor has given you - have a look at at or mtranspose, for example.
So, the question is - given the width and height of the matrix, and a function which will give you the element at a given coordinate, how do we give a string showing the items in the matrix?
The first thing we need to do is enumerate all the possible coordinates for the given width and height of the matrix. Haskell provides some useful syntactic constructs for this sort of operation - we can write [x .. y] to enumerate all the values between x and y, and use a list comprehension to unpack those enumerations in a nested loop.
coords :: (Int, Int)      -- (width, height)
       -> [(Int, Int)]    -- (x, y) pairs
coords (w, h) = [(x, y) | x <- [0 .. w], y <- [0 .. h]]
For example:
ghci> coords (2, 4)
[(0,0),(0,1),(0,2),(0,3),(0,4),(1,0),(1,1),(1,2),(1,3),(1,4),(2,0),(2,1),(2,2),(2,3),(2,4)]
Now that we've worked out how to list all the possible coordinates in a matrix, how do we turn coordinates into elements of type a? Well, the Mat constructor contains a function (Int, Int) -> a which gives you the element associated with a single coordinate. We need to apply that function to each of the coordinates in the list which we just enumerated. This is what map does.
elems :: Matrix a -> [a]
elems (Mat (size, f)) = map f $ coords size
So, there's the code to enumerate the elements of a matrix. Can you figure out how to modify this code so that a) it shows the elements as a string and b) it shows them in a row-by-row fashion? You'll probably need to adjust both of these functions.
I suppose the broader point I'd like to make is that even though it feels like your professor has thrown you into the deep end, it's always possible to do a little detective work and figure out for yourself what something means. Many - most? - of the people answering questions on this site are self-taught programmers, myself included. We persevered!
After all, it's just code. If a computer's going to understand it then it must be written down on the page, and that means that you can understand it, too.
Is there any recommended way to use typeclasses to emulate OCaml-like parametrized modules?
For instance, I need a module that implements a complex
generic computation that may be parametrized with different
misc. types, functions, etc. To be more specific, let it be
a kMeans implementation that could be parametrized with different
types of values, vector types (list, unboxed vector, vector, tuple, etc.),
and distance calculation strategy.
For convenience, to avoid a crazy amount of intermediate types, I want to
have this computation polymorphic by a DataSet class that contains all
required interfaces. I also tried to use TypeFamilies to avoid a lot
of typeclass parameters (that cause problems as well):
{-# Language MultiParamTypeClasses
, TypeFamilies
, FlexibleContexts
, FlexibleInstances
, EmptyDataDecls
, FunctionalDependencies
#-}
module Main where
import qualified Data.List as L
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Distances
-- contains instances for Euclid distance
-- import Distances.Euclid as E
-- contains instances for Kullback-Leibler "distance"
-- import Distances.Kullback as K
class ( Num (Elem c)
      , Ord (TLabel c)
      , WithDistance (TVect c) (Elem c)
      , WithDistance (TBoxType c) (Elem c)
      )
      => DataSet c where
  type Elem c :: *
  type TLabel c :: *
  type TVect c :: * -> *
  data TDistType c :: *
  data TObservation c :: *
  data TBoxType c :: * -> *
  observations :: c -> [TObservation c]
  measurements :: TObservation c -> [Elem c]
  label :: TObservation c -> TLabel c
  distance :: TBoxType c (Elem c) -> TBoxType c (Elem c) -> Elem c
  distance = distance_
instance DataSet () where
  type Elem () = Float
  type TLabel () = Int
  data TObservation () = TObservationUnit [Float]
  data TDistType ()
  type TVect () = V.Vector
  data TBoxType () v = VectorBox (V.Vector v)
  observations () = replicate 10 (TObservationUnit [0,0,0,0])
  measurements (TObservationUnit xs) = xs
  label (TObservationUnit _) = 111
kMeans :: ( Floating (Elem c)
          , DataSet c
          ) => c
            -> [TObservation c]
kMeans s = undefined -- here the implementation
  where
    labels = map label (observations s)
    www = L.map (V.fromList.measurements) (observations s)
    zzz = L.zipWith distance_ www www
    wtf1 = L.foldl wtf2 0 (observations s)
    wtf2 acc xs = acc + L.sum (measurements xs)
    qq = V.fromList [1,2,3 :: Float]
    l = distance (VectorBox qq) (VectorBox qq)
instance Floating a => WithDistance (TBoxType ()) a where
  distance_ xs ys = undefined
instance Floating a => WithDistance V.Vector a where
  -- Euclidean distance: square root of the summed squared differences
  distance_ xs ys = sqrt $ V.sum (V.zipWith (\x y -> (x-y)**2) xs ys)
This code somehow compiles and works, but it's pretty ugly and hacky.
The kMeans should be parametrized by value type (number, floating-point number, anything),
box type (vector, list, unboxed vector, maybe tuple) and distance calculation strategy.
There are also types for Observation (that's the type of sample provided by the user;
there should be a lot of them) and for the measurements contained in each observation.
So the problems are:
1) If the function does not contain the parametric types in its signature,
types will not be deduced
2) Still no idea how to declare the typeclass WithDistance to have different instances
for different distance types (Euclid, Kullback, anything else via phantom types).
Right now WithDistance is just polymorphic by box type and value type, so if we need
different strategies, we may only put them in different modules and import the required
module. But this is a hack and a non-typed approach, right?
All of this may be done pretty easily in OCaml with its modules. What is the proper approach
to implement such things in Haskell?
Typeclasses with TypeFamilies somehow look similar to parametric modules, but they
work differently. I really need something like that.
It is really the case that Haskell lacks useful features found in *ML module systems.
There is ongoing effort to extend Haskell's module system: http://plv.mpi-sws.org/backpack/
But I think you can get a bit further without those ML modules.
Your design follows the God class anti-pattern, and that is why it is anti-modular.
A type class is useful only if every type needs at most one instance of that class. E.g. the DataSet () instance fixes type TVect () = V.Vector, and you can't easily create a similar instance with TVect = U.Vector.
You need to start by implementing the kMeans function, then generalize it by replacing concrete types with type variables and constraining those type variables with type classes when needed.
Here is a little example. At first you have some non-general implementation:
kMeans :: Int -> [(Double,Double)] -> [[(Double,Double)]]
kMeans k points = ...
Then you generalize it by distance calculation strategy:
kMeans
  :: Int
  -> ((Double,Double) -> (Double,Double) -> Double)
  -> [(Double,Double)]
  -> [[(Double,Double)]]
kMeans k distance points = ...
Now you can generalize it by type of points, but this requires introducing a class that will capture some properties of points that are used by distance computation e.g. getting list of coordinates:
kMeans
  :: Point p
  => Int -> (p -> p -> Coord p) -> [p]
  -> [[p]]
kMeans k distance points = ...
class Num (Coord p) => Point p where
  type Coord p
  coords :: p -> [Coord p]
euclidianDistance
  :: (Point p, Floating (Coord p))
  => p -> p -> Coord p
euclidianDistance a b
  = sqrt $ sum $ map (**2) $ zipWith (-) (coords a) (coords b)
Now you may wish to make it a bit faster by replacing lists with vectors:
kMeans
  :: (Point p, DataSet vec p)
  => Int -> (p -> p -> Coord p) -> vec p
  -> [vec p]
kMeans k distance points = ...
class DataSet vec p where
  map :: ...
  foldl' :: ...
instance Unbox p => DataSet U.Vector p where
  map = U.map
  foldl' = U.foldl'
And so on.
The suggested approach is to generalize the various parts of the algorithm and constrain those parts with small, loosely coupled type classes (when required).
It is bad style to collect everything in a single monolithic type class.
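As a rough, self-contained sketch of this style (my own illustration, not a full kMeans): a small Point class, two interchangeable distance strategies passed as ordinary function arguments, and a single assignment step that is generic in the distance. The names euclidean, manhattan and nearest and the pair instance are assumptions made for the example.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts, FlexibleInstances #-}

import Data.List (minimumBy)
import Data.Ord (comparing)

-- A small class capturing only what the distance functions need.
class Num (Coord p) => Point p where
  type Coord p
  coords :: p -> [Coord p]

instance Point (Double, Double) where
  type Coord (Double, Double) = Double
  coords (x, y) = [x, y]

-- Two distance strategies; callers pick one by passing it as an argument.
euclidean :: (Point p, Floating (Coord p)) => p -> p -> Coord p
euclidean a b = sqrt (sum (map (^ (2 :: Int)) (zipWith (-) (coords a) (coords b))))

manhattan :: Point p => p -> p -> Coord p
manhattan a b = sum (map abs (zipWith (-) (coords a) (coords b)))

-- One assignment step of k-means: index of the closest centroid.
-- Partial: assumes the centroid list is non-empty.
nearest :: Ord d => (p -> p -> d) -> [p] -> p -> Int
nearest dist centroids x =
  fst (minimumBy (comparing snd) (zip [0 ..] (map (dist x) centroids)))
```

Swapping the strategy is then just a different argument, e.g. nearest manhattan centroids x instead of nearest euclidean centroids x, with no change to the class or to nearest itself.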
The problem
I have a vector a of size N holding sample data, and another vector b of size M (N>M) holding indices. I would like to obtain a vector c of size M containing the filtered elements from a based on the indices in b.
The question
Is it possible to implement the desired function without using list comprehension, just basic higher-order functions like map, zipWith, filter, etc. (more precisely, their equivalents mapV, zipWithV, filterV, etc.)
Prerequisites:
I am using a Haskell Embedded Domain Specific Language (ForSyDe, module ForSyDe.Shallow.Vector), limited to a set of hardware synthesize-able functions. In order to respect the design methodology, I am allowed to use only the provided functions (thus I cannot use list comprehensions, etc.)
Disclaimer:
I did not test this code because my cabal installation was misbehaving. It worked for plain lists, and since I convert every vector to a list it should work here too, although problems may arise.
Try this:
indexFilter :: (Num b, Eq b, Enum b) => Vector a -> Vector b -> Vector a
indexFilter v indices = vector (map fst (filter (\x -> snd x `elem` fromVector indices) vectorMap))
  where
    -- the argument is named v so that it does not shadow the library's vector function
    vectorMap = zip (fromVector v) [0..]
indexFilter builds a list of tuples of the form (<element>, <index>), keeps the tuples whose index occurs in the vector b, and returns a vector of their elements. vectorMap is just a zip of the elements of a with their indices.
Although the answer provided by ThreeFx is a correct answer to the question, it did not solve my problem due to several constraints enforced by the design methodology (ForSyDe), which were not mentioned:
lists cannot be used (they cannot be synthesized to other backends). ForSyDe provides two data containers: Signal (for temporal span) and Vector (for spatial span). This should ensure analyzability for system synthesis.
elem does not have a ForSyDe.Shallow.Vector implementation
Solution 1
Using only what the library provides, the shortest solution I found is:
indexFilter1 :: (Num b, Eq b, Enum b) => Vector a
             -> Vector b
             -> Vector (Vector a)
indexFilter1 v = mapV (\idx -> selectV idx 1 1 v)
The output vector can further be unwrapped, depending on the further usage.
Solution 2
Translating ThreeFx's solution to satisfy the constraints mentioned:
indexFilter :: (Num b, Eq b, Enum b) => Vector a
            -> Vector b
            -> Vector a
indexFilter v idx = mapV (fst) (filterV (\x -> elemV (snd x) idx) vectorMap)
  where
    vectorMap = zipWithV (\a b -> (b, a)) (iterateV size (+1) 0) v
    size = lengthV v
    elemV a = foldlV (\acc x -> if x == a then True else acc) False
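To check what Solution 2 computes without a ForSyDe installation, the same pipeline can be modelled on plain lists. This is only a behavioural model under the assumption that mapV, filterV, zipWithV, iterateV and foldlV behave like their list counterparts; it is not synthesizable and is not meant to replace the Vector version. The names elemL and indexFilterL are made up here.

```haskell
-- List stand-in for elemV: membership test written as a left fold.
elemL :: Eq a => a -> [a] -> Bool
elemL a = foldl (\acc x -> if x == a then True else acc) False

-- List stand-in for indexFilter: pair every element with its position,
-- keep the pairs whose position occurs in idx, then drop the positions.
indexFilterL :: [a] -> [Int] -> [a]
indexFilterL v idx = map fst (filter (\x -> elemL (snd x) idx) indexed)
  where
    indexed = zipWith (\i e -> (e, i)) (take (length v) (iterate (+ 1) 0)) v
```

Note that the result follows the order of the data vector, not the order of the index vector: indexFilterL "abcde" [3,1] gives "bd".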
I have a problem with writing a simple function without too much repeating myself, below is a simplified example. The real program I am trying to write is a port of an in-memory database for a BI server from Python. In reality there are more different types (around 8) and much more logic, that is mostly expressible as functions operating on polymorphic types, like Vector a, but still some logic must deal with different types of values.
Wrapping each value separately (using a [(Int, WrappedValue)] type) is not an option for efficiency reasons - in real code I am using unboxed vectors.
type Vector a = [(Int, a)] -- always sorted by fst
data WrappedVector = -- in fact there are 8 of them
    FloatVector (Vector Float)
  | IntVector (Vector Int)
  deriving (Eq, Show)
query :: [WrappedVector] -> [WrappedVector] -- equal length
query vectors = map (filterIndexW commonIndices) vectors
  where
    commonIndices = intersection [mapFstW vector | vector <- vectors]
intersection :: [[Int]] -> [Int]
intersection = head -- dummy impl. (intersection of sorted vectors)
filterIndex :: Eq a => [Int] -> Vector a -> Vector a
filterIndex indices vector = -- sample inefficient implementation
  filter (\(idx, _) -> idx `elem` indices) vector
mapFst :: Vector a -> [Int]
mapFst = map fst
-- ideally I would stop here, but I must repeat this for all possible types
-- and kinds of wrapped containers and functions:
filterIndexW :: [Int] -> WrappedVector -> WrappedVector
filterIndexW indices vw = case vw of
  FloatVector v -> FloatVector $ filterIndex indices v
  IntVector v -> IntVector $ filterIndex indices v
mapFstW :: WrappedVector -> [Int]
mapFstW vw = case vw of
  FloatVector v -> map fst v
  IntVector v -> map fst v
-- sample usage of query
main = putStrLn $ show $ query [FloatVector [(1, 12), (2, -2)],
                                IntVector [(2, 17), (3, -10)]]
How can I express such code without wrapping and unwrapping like in mapFstW and filterIndexW functions?
If you're willing to work with a few compiler extensions, ExistentialQuantification solves your problem nicely.
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE StandaloneDeriving #-}
module VectorTest where
type PrimVector a = [(Int, a)]
data Vector = forall a . Show a => Vector (PrimVector a)
deriving instance Show Vector
query :: [Vector] -> [Vector] -- equal length
query vectors = map (filterIndex commonIndices) vectors
  where
    commonIndices = intersection [mapFst vector | vector <- vectors]
intersection :: [[Int]] -> [Int]
intersection = head -- dummy impl. (intersection of sorted vectors)
filterIndex :: [Int] -> Vector -> Vector
filterIndex indices (Vector vector) = -- sample inefficient implementation
  Vector $ filter (\(idx, _) -> idx `elem` indices) vector
mapFst :: Vector -> [Int]
mapFst (Vector l) = map fst l
-- sample usage of query
main = putStrLn $ show $ query [Vector [(1, 12), (2, -2)],
                                Vector [(2, 17), (3, -10)]]
The StandaloneDeriving requirement can be removed if you write a manual Show instance for Vector, e.g.
instance Show Vector where
  show (Vector v) = show v
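If the closed WrappedVector type from the question has to stay (for example to keep one unboxed representation per element type), a different technique from the existential one above may also cut the repetition: a pair of rank-2 helpers that do the case analysis once. This is a sketch of my own using RankNTypes; overW and withW are hypothetical names, and the list-based Vector alias is copied from the question.

```haskell
{-# LANGUAGE RankNTypes #-}

type Vector a = [(Int, a)] -- always sorted by fst

data WrappedVector
  = FloatVector (Vector Float)
  | IntVector (Vector Int)
  deriving (Eq, Show)

-- Apply one element-type-agnostic transformation under whichever constructor is present.
overW :: (forall a. Vector a -> Vector a) -> WrappedVector -> WrappedVector
overW f (FloatVector v) = FloatVector (f v)
overW f (IntVector v)   = IntVector (f v)

-- Consume the payload with one element-type-agnostic function.
withW :: (forall a. Vector a -> r) -> WrappedVector -> r
withW f (FloatVector v) = f v
withW f (IntVector v)   = f v

filterIndex :: [Int] -> Vector a -> Vector a
filterIndex indices = filter (\(idx, _) -> idx `elem` indices)

-- The wrapped versions no longer repeat the case analysis.
filterIndexW :: [Int] -> WrappedVector -> WrappedVector
filterIndexW indices = overW (filterIndex indices)

mapFstW :: WrappedVector -> [Int]
mapFstW = withW (map fst)
```

The constructors are still listed once per helper, but every new polymorphic operation becomes a one-liner. Operations that need a class constraint on the element type would need a constrained variant of the helpers.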
The standard option for wrapping a single type without a performance hit is to do
{-# LANGUAGE GeneralizedNewtypeDeriving #-} -- so we can derive Num
newtype MyInt = My Int deriving (Eq,Ord,Show,Num)
newtype AType a = An a deriving (Show, Eq)
This works because a newtype creates a difference only at the type level: the data representation is identical, because the wrapper gets compiled away. You can even specify that values are unboxed, BUT... this doesn't help you here because you're wrapping multiple types.
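One way to see that the representation really is shared, as a small sketch: coerce from Data.Coerce (in base) converts between a newtype and its underlying type, even inside a list, without traversing anything. The helper names unwrapAll and wrapAll are made up for the example.

```haskell
import Data.Coerce (coerce)

newtype MyInt = My Int deriving (Eq, Ord, Show)

-- No traversal happens here: the list of wrappers and the list of Ints
-- share the same runtime representation.
unwrapAll :: [MyInt] -> [Int]
unwrapAll = coerce

wrapAll :: [Int] -> [MyInt]
wrapAll = coerce
```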
The real problem is that you're trying to represent a dynamically typed solution in a statically typed language. There is necessarily a performance hit for dynamic typing which is hidden from you in a dynamic language but made explicit here in tagging.
You have two solutions:
Accept that dynamic typing involves additional runtime checks over static typing, and live with the ugly.
Reject the need for dynamic typing, accepting that polymorphic typing tidies up all the code and moves the type checking to compile time and data acquisition.
I feel that 2 is by far the best solution, and you should give up trying to list and program all the types you want to use, instead programming to use any type. It's neat, clear and efficient. You check validity and handle it once, then stop worrying.