Haskell Access Tuple Data Inside List Comprehension - haskell

I have defined a custom type as follows:
-- Atom reference number, x coordinate, y coordinate, z coordinate, element symbol,
-- atom name, residue sequence number, amino acid abbreviation
type Atom = (Int, Double, Double, Double, Word8, ByteString, Int, ByteString)
I would like to gather all of the atoms with a certain residue sequence number nm.
This would be nice:
[x | x <- p, d == nm]
where
(_, _, _, _, _, _, d, _) = x
where p is a list of atoms.
However, this does not work because I cannot access the variable x outside of the list comprehension, nor can I think of a way to access a specific tuple value from inside the list comprehension.
Is there a tuple method I am missing, or should I be using a different data structure?
I know I could write a recursive function that unpacks and checks every tuple in the list p, but I am actually trying to use this nested inside an already recursive function, so I would rather not need to introduce that complexity.

This works:
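[x | x@(_, _, _, _, _, _, d, _) <- p, d == nm]
(The as-pattern x@(...) binds the whole tuple to x while also matching its components, so d is in scope for the guard.)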
However, you should really define your own data type here. A three-element tuple is suspicious; an eight-element tuple is very bad news indeed. Tuples are difficult to work with and less type-safe than data types (if you represent two different kinds of data with two tuples with the same element types, they can be used interchangeably). Here's how I'd write Atom as a record:
data Point3D = Point3D Double Double Double
    deriving (Eq, Show)  -- needed so that Atom can derive Eq and Show
data Atom = Atom
    { atomRef :: Int
    , atomPos :: Point3D
    , atomSymbol :: Word8
    , atomName :: ByteString
    , atomSeqNum :: Int
    , atomAcidAbbrev :: ByteString
    } deriving (Eq, Show)
(The "atom" prefix is to avoid clashing with the names of fields in other records.)
You can then write the list comprehension as follows:
[x | x <- p, atomSeqNum x == nm]
As a bonus, your definition of Atom becomes self-documenting, and you reap the benefits of increased type safety. Here's how you'd create an Atom using this definition:
myAtom = Atom
{ atomRef = ...
, atomPos = ...
, ... etc. ...
}
By the way, it's probably a good idea to make some of the fields of these types strict, which can be done by putting an exclamation mark before the type of the field; this helps avoid space leaks from unevaluated thunks building up. For instance, since it doesn't make much sense to evaluate a Point3D without also evaluating all its components, I would instead define Point3D as:
data Point3D = Point3D !Double !Double !Double
It would probably be a good idea to make all the fields of Atom strict too, although perhaps not all of them; for example, the ByteString fields should be left non-strict if they're generated by the program, not always accessed and possibly large. On the other hand, if their values are read from a file, then they should probably be made strict.
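To tie the record definition, the strictness annotations, and the list comprehension together, here is a minimal self-contained sketch. The helper name atomsInResidue and the sample data are made up for illustration, and the choice of which fields to mark strict is only one plausible reading of the advice above.

```haskell
import Data.ByteString.Char8 (ByteString, pack)
import Data.Word (Word8)

data Point3D = Point3D !Double !Double !Double
  deriving (Eq, Show)

-- Numeric fields are strict; the ByteString fields are left lazy here
-- (an assumption; pick whatever suits how your data is produced).
data Atom = Atom
  { atomRef        :: !Int
  , atomPos        :: !Point3D
  , atomSymbol     :: !Word8
  , atomName       :: ByteString
  , atomSeqNum     :: !Int
  , atomAcidAbbrev :: ByteString
  } deriving (Eq, Show)

-- All atoms whose residue sequence number equals nm.
atomsInResidue :: Int -> [Atom] -> [Atom]
atomsInResidue nm p = [x | x <- p, atomSeqNum x == nm]

-- Hypothetical sample data, for illustration only.
sampleAtoms :: [Atom]
sampleAtoms =
  [ Atom 1 (Point3D 0 0 0) 78 (pack "N")  1 (pack "MET")
  , Atom 2 (Point3D 1 0 0) 67 (pack "CA") 1 (pack "MET")
  , Atom 3 (Point3D 2 1 0) 78 (pack "N")  2 (pack "ALA")
  ]
```

In GHCi, map atomRef (atomsInResidue 1 sampleAtoms) picks out the first two atoms. Because the filter is an ordinary function, it can be called from inside an already recursive function without adding another layer of explicit recursion.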

You should definitely use a different structure. Instead of using a tuple, take a look at records.
data Atom = Atom { reference :: Int
, position :: (Double, Double, Double)
, symbol :: Word8
, name :: ByteString
, residue :: Int
, abbreviation :: ByteString
}
You can then do something like this:
a = Atom ...
a {residue=10} -- a new Atom: a copy of a with residue set to 10 (a itself is unchanged)

Related

QuickCheck limit to only certain data constructor

I have a data type definition:
data Point = Point {x :: Int, y :: Int} | EmptyPoint
In my property test, I would like to restrict the test to values built with the Point constructor. For example, point1 - point2 = Point 0 0. This presumes that the accessor x is defined, which is not the case for EmptyPoint.
In other words, I don't want EmptyPoint to be generated.
Is there a way to do that?
Instead of automatically deriving the Arbitrary class for your type (which is what, I assume, you're doing at the moment), you can just write one manually and make it generate your points however you want, for example:
instance Arbitrary Point where
  arbitrary = Point <$> arbitrary <*> arbitrary
Or in a slightly more verbose way if you like:
-- needs the NamedFieldPuns extension; the punned names must match the record's field names
instance Arbitrary Point where
  arbitrary = do
    x <- arbitrary
    y <- arbitrary
    pure Point { x, y }

Do newtypes incur no cost even when you cannot pattern-match on them?

Context
Most Haskell tutorials I know (e.g. LYAH) introduce newtypes as a cost-free idiom that allows enforcing more type safety. For instance, this code will type-check:
type Speed = Double
type Length = Double
computeTime :: Speed -> Length -> Double
computeTime v l = l / v
but this won't:
newtype Speed = Speed { getSpeed :: Double }
newtype Length = Length { getLength :: Double }
-- wrong!
computeTime :: Speed -> Length -> Double
computeTime v l = l / v
and this will:
-- right
computeTime :: Speed -> Length -> Double
computeTime (Speed v) (Length l) = l / v
In this particular example, the compiler knows that Speed is just a Double, so the pattern-matching is moot and will not generate any executable code.
Question
Are newtypes still cost-free when they appear as arguments of parametric types? For instance, consider a list of newtypes:
computeTimes :: [Speed] -> Length -> [Double]
computeTimes vs l = map (\v -> getLength l / getSpeed v) vs
I could also pattern-match on speed in the lambda:
computeTimes' :: [Speed] -> Length -> [Double]
computeTimes' vs (Length l) = map (\(Speed v) -> l / v) vs
In either case, for some reason, I feel that real work is getting done! I start to feel even more uncomfortable when the newtype is buried within a deep tree of nested parametric datatypes, e.g. Map Speed [Set Speed]; in this situation, it may be difficult or impossible to pattern-match on the newtype, and one would have to resort to accessors like getSpeed.
TL;DR
Will the use of a newtype never ever incur a cost, even when the newtype appears as a (possibly deeply-buried) argument of another parametric type?
On their own, newtypes are cost-free. Applying their constructor, or pattern matching on them has zero cost.
When used as parameter for other types e.g. [T] the representation of [T] is precisely the same as the one for [T'] if T is a newtype for T'. So, there's no loss in performance.
However, there are two main caveats I can see.
newtypes and instances
First, newtype is frequently used to introduce new instances of type classes. Clearly, when these are user-defined, there's no guarantee that they have the same cost as the original instances. E.g., when using
newtype Op a = Op a deriving Eq  -- Eq is a superclass of Ord
instance Ord a => Ord (Op a) where
  compare (Op x) (Op y) = compare y x
comparing two Op Int will cost slightly more than comparing Int, since the arguments need to be swapped. (I am neglecting optimizations here, which might make this cost free when they trigger.)
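The same idea ships with base as Data.Ord.Down, a newtype whose Ord instance flips the comparison much like Op above. A minimal sketch of using it to pick a different instance (the function names ascending and descending are made up for illustration):

```haskell
import Data.List (sort, sortOn)
import Data.Ord (Down (..))

-- Uses Int's own Ord instance.
ascending :: [Int] -> [Int]
ascending = sort

-- Wraps each key in Down, so the flipped Ord instance is selected.
descending :: [Int] -> [Int]
descending = sortOn Down
```

The wrapper itself adds no representation cost; any difference in running time comes from the instance that gets called.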
newtypes used as type arguments
The second point is more subtle. Consider the following two implementations of the identity [Int] -> [Int]
id1, id2 :: [Int] -> [Int]
id1 xs = xs
id2 xs = map (\x->x) xs
The first one has constant cost. The second has a linear cost (assuming no optimization triggers). A smart programmer should prefer the first implementation, which is also simpler to write.
Suppose now we introduce newtypes on the argument type, only:
id1, id2 :: [Op Int] -> [Int]
id1 xs = xs -- error!
id2 xs = map (\(Op x)->x) xs
We can no longer use the constant cost implementation because of a type error. The linear cost implementation still works, and is the only option.
Now, this is quite bad. The input representation for [Op Int] is exactly, bit by bit, the same for [Int]. Yet, the type system forbids us to perform the identity in an efficient way!
To overcome this issue, safe coercions were introduced in Haskell.
id3 :: [Op Int] -> [Int]
id3 = coerce
The magic coerce function (from Data.Coerce), under certain hypotheses, removes or inserts newtypes as needed to make the types match, even inside other types, as for [Op Int] above. Further, it is a zero-cost function.
Note that coerce works only under certain conditions (the compiler checks for them). One of these is that the newtype constructor must be visible: if a module does not export Op :: a -> Op a you can not coerce Op Int to Int or vice versa. Indeed, if a module exports the type but not the constructor, it would be wrong to make the constructor accessible anyway through coerce. This makes the "smart constructors" idiom still safe: modules can still enforce complex invariants through opaque types.
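As a small sketch of the difference between unwrapping with map and unwrapping with coerce, using the Speed newtype from the question (the function names are made up for illustration):

```haskell
import Data.Coerce (coerce)

newtype Speed = Speed { getSpeed :: Double }
  deriving (Eq, Show)

-- Walks the list and rebuilds it (unless an optimisation removes the map).
unwrapByMap :: [Speed] -> [Double]
unwrapByMap = map getSpeed

-- Reuses the list as it is; no traversal is needed.
unwrapByCoerce :: [Speed] -> [Double]
unwrapByCoerce = coerce

-- coerce also wraps, and it reaches through nested type constructors.
wrapNested :: [Maybe Double] -> [Maybe Speed]
wrapNested = coerce
```

Both unwrapping functions return the same values; only the work done to get there differs. This compiles because the Speed constructor is in scope in the module that calls coerce.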
It doesn't matter how deeply buried a newtype is in a stack of (fully) parametric types. At runtime, the values v :: Speed and w :: Double are completely indistinguishable – the wrapper is erased by the compiler, so even v is really just a pointer to a single 64-bit floating-point number in memory. Whether that pointer is stored in a list or tree or whatever doesn't make a difference either. getSpeed is a no-op and will not appear at runtime in any way at all.
So what do I mean by “fully parametric”? The thing is, newtypes can obviously make a difference at compile time, via the type system. In particular, they can guide instance resolution, so a newtype that invokes a different class method may certainly have worse (or, just as easily, better!) performance than the wrapped type. For example,
class Integral n => Fibonacci n where
  fib :: n -> Integer
instance Fibonacci Int where
  fib = (fibs !!)
   where fibs = [ if i<2 then 1
                         else fib (i-2) + fib (i-1)
                | i<-[0::Int ..] ]
This implementation is pretty slow, because it uses a lazy list (and performs lookups in it over and over again) for memoisation. On the other hand,
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import qualified Data.Vector as Arr
-- | A number between 0 and 753
newtype SmallInt = SmallInt { getSmallInt :: Int }
  deriving (Eq, Ord, Enum, Num, Real, Integral)  -- Integral is the superclass of Fibonacci
instance Fibonacci SmallInt where
  fib = (fibs Arr.!) . getSmallInt
   where fibs = Arr.generate 754 $
           \i -> if i<2 then 1
                        else fib (SmallInt $ i-2) + fib (SmallInt $ i-1)
This fib is much faster, because thanks to the input being limited to a small range, it is feasible to strictly allocate all of the results and store them in a fast O(1) lookup array, not needing the spine-laziness.
This of course applies again regardless of what structure you store the numbers in. But the different performance only comes about because different method instantiations are called – at runtime this means simply, completely different functions.
Now, a fully parametric type constructor must be able to store values of any type. In particular, it cannot impose any class restrictions on the contained data, and hence also not call any class methods. Therefore this kind of performance difference cannot happen if you're just dealing with generic [a] lists or Map Int a maps. It can, however, occur when you're dealing with GADTs. In this case, even the actual memory layout might be completely different; for instance,
{-# LANGUAGE GADTs #-}
import qualified Data.Vector as Arr
import qualified Data.Vector.Unboxed as UArr
data Array a where
  BoxedArray :: Arr.Vector a -> Array a
  UnboxArray :: UArr.Unbox a => UArr.Vector a -> Array a
might allow you to store Double values more efficiently than Speed values, because the former can be stored in a cache-optimised unboxed array. This is only possible because the UnboxArray constructor is not fully parametric.

Haskell--Manipulating data within a tuple

I'm attempting to simulate a checkers game using Haskell. I am given a 4-tuple named checkersState that I would like to manipulate with a few different functions. So far, I have a function, onemove, that receives input from checkersState and should return a tuple of the modified data:
The Input Tuple:
( 3600
, ""
, [ "----------"
  , "------r---"
  , "----------"
  , "----------"
  , "---r-r----"
  , "------r---"
  , "---w---w-w"
  , "----------"
  , "----------"
  , "------w---"
  ]
, (49, 43)
)
So far I have something similar to the signature below for my function, but I am unsure how to access the individual members of the tuple checkersState. The function takes a time, a list of captured pieces, a board, and a move to make, and returns a time, a list of captured pieces, and a board. Currently, I would like to modify the time (an Int) in the tuple depending on the state of the board:
onemove :: (Int,[Char],[[Char]],(Int,Int)) -> (Int,[Char],[[Char]])
Thanks in advance!
You can use pattern-matching to pull out the elements, do whatever changes need to be made, and pack them back into a tuple. For example, if you wanted to increment the first value, you could:
onemove (a,b,c,d) = (a + 1,b,c)  -- d (the move) is dropped, matching the 3-tuple result type in the question
If you find yourself doing this a lot, you might reconsider using a tuple and instead use a data type:
data CheckersState = CheckersState { time :: Int -- field names are just
, steps :: [Char] -- guesses; change them
, board :: [[Char]] -- to something that
, pos :: (Int, Int) -- makes sense
} deriving (Eq, Read, Show)
Then you can update it with a much more convenient syntax:
onemove state = state { time = time state + 1 }
If you want to stick with tuples and you happen to be using lenses, there’s another easy way to update your tuple:
onemove = over _1 (+1)
Or if you’re using lenses and your own data type (with an appropriately-defined accessor like the one provided), you can do something similar:
_time :: Lens' CheckersState Int
_time f state = (\newTime -> state { time = newTime }) <$> f (time state)
onemove = over _time (+1)
So there are plenty of fancy ways to do it. But the most general way is to use pattern-matching.
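The plain pattern-matching and record-update approaches can be put side by side in a small sketch. The names tickTuple, tickRecord, and fromTuple are made up for illustration, and the field names are the same guesses as in the record definition above.

```haskell
data CheckersState = CheckersState
  { time  :: Int
  , steps :: [Char]
  , board :: [[Char]]
  , pos   :: (Int, Int)
  } deriving (Eq, Read, Show)

-- Tuple version: unpack by pattern matching, change the time, repack.
-- The move is consumed, so the result is a 3-tuple.
tickTuple :: (Int, [Char], [[Char]], (Int, Int)) -> (Int, [Char], [[Char]])
tickTuple (t, captured, rows, _move) = (t + 1, captured, rows)

-- Record version: only the field that changes is mentioned.
tickRecord :: CheckersState -> CheckersState
tickRecord state = state { time = time state + 1 }

-- Convert the 4-tuple from the question into the record.
fromTuple :: (Int, [Char], [[Char]], (Int, Int)) -> CheckersState
fromTuple (t, captured, rows, move) = CheckersState t captured rows move
```

With the record, adding a fifth field later does not require touching tickRecord, whereas every tuple pattern would have to change.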
As icktoofay is saying, using tuples is a code smell, and records with named components are way better.
Also, using Char (and String) is a code smell. To repair it, define a data type that precisely describes what you expect in a cell of the board, like data Colour = None | Red | Black, but see the next item.
And using lists is also a code smell. You actually want something like type Board = Data.Map.Map Pos Colour or Data.Map.Map Pos (Maybe Colour') with data Colour' = Red | Black.
Oh, and Int is also a code smell. You could define newtype Row = Row Int ; newtype Col = Col Int ; type Pos = (Row,Col). Possibly deriving Num for the newtypes, but it's not clear that you want that: e.g., you don't want to multiply row numbers. Perhaps deriving (Eq,Ord,Enum) is enough; with Enum you get pred and succ.
(Ah - this Pos is using a tuple, thus it's smelly? Well, no, 2-tuples are allowed, sometimes.)
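A rough sketch of what the Map-based board could look like, converting the list-of-strings board from the question. The constructor White (for the 'w' cells in the sample board), the zero-based (row, column) convention, and the function names are all assumptions; plain Int coordinates are used to keep it short, although the newtypes suggested above would be safer.

```haskell
import qualified Data.Map as Map

data Colour = Red | White
  deriving (Eq, Show)

type Pos = (Int, Int)          -- (row, column), both counted from 0
type Board = Map.Map Pos Colour

-- Interpret one character of the textual board.
cellColour :: Char -> Maybe Colour
cellColour 'r' = Just Red
cellColour 'w' = Just White
cellColour _   = Nothing

-- Keep only the occupied cells; empty cells are simply absent from the map.
parseBoard :: [String] -> Board
parseBoard rows = Map.fromList
  [ ((r, c), colour)
  | (r, row) <- zip [0 ..] rows
  , (c, ch)  <- zip [0 ..] row
  , Just colour <- [cellColour ch]
  ]
```

Looking up a square is then Map.lookup (r, c) board, which returns Nothing for an empty square instead of requiring a sentinel character.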
You use pattern matching to decompose the tuple into variables.
onemove (i, c, board, (x, y)) = <do something> (i, c, board)
However, you should define a separate data structure for the board to make your intention clear. I don't know what the first two values mean. See: http://learnyouahaskell.com/making-our-own-types-and-typeclasses

How to have an operator which adds/subtracts both absolute and relative values, in Haskell

(Apologies for the weird title, but I could not think of a better one.)
For a personal Haskell project I want to have the concepts of 'absolute values' (like a frequency) and relative values (like the ratio between two frequencies). In my context, it makes no sense to add two absolute values: one can add relative values to produce new relative values, and add a relative value to an absolute one to produce a new absolute value (and likewise for subtraction).
I've defined type classes for these: see below. However, note that the operators ##+ and #+ have a similar structure (and likewise for ##- and #-). Therefore I would prefer to merge these operators, so that I have a single addition operator, which adds a relative value (and likewise a single subtraction operator, which results in a relative value). UPDATE: To clarify, my goal is to unify my ##+ and #+ into a single operator. My goal is not to unify this with the existing (Num) + operator.
However, I don't see how to do this with type classes.
Question: Can this be done, and if so, how? Or should I not be trying?
The following is what I currently have:
{-# LANGUAGE MultiParamTypeClasses #-}
class Abs a where
  nullPoint :: a
class Rel r where
  zero :: r
  (##+) :: r -> r -> r
  neg :: r -> r
(##-) :: Rel r => r -> r -> r
r ##- s = r ##+ neg s
class (Abs a, Rel r) => AbsRel a r where
  (#+) :: a -> r -> a
  (#-) :: a -> a -> r
I think you're looking for a concept called a torsor. A torsor consists of a set of values, a set of differences, and an operator which adds a difference to a value. Additionally, the set of differences must form an additive group, so differences can also be added together.
Interestingly, torsors are everywhere. Common examples include
Points and Vectors
Dates and date-differences
Files and diffs
etc.
One possible Haskell definition is:
class Torsor a where
  type TorsorOf a :: *
  (.-) :: a -> a -> TorsorOf a
  (.+) :: a -> TorsorOf a -> a
Here are a few example instances:
instance Torsor UTCTime where
  type TorsorOf UTCTime = NominalDiffTime
  a .- b = diffUTCTime a b
  a .+ b = addUTCTime b a
instance Torsor Double where
  type TorsorOf Double = Double
  a .- b = a - b
  a .+ b = a + b
instance Torsor Int where
  type TorsorOf Int = Int
  a .- b = a - b
  a .+ b = a + b
In the last case, notice that the set of values and the set of differences don't need to be different sets, which makes adding your relative values together simple.
For more information, see the much nicer description on Roman Cheplyaka's blog.
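Applied to the question, a torsor class of this shape gives a single .+ that adds a relative value to either an absolute or a relative one. The sketch below repeats the class so that it is self-contained (with Type from Data.Kind in place of *); the types Pitch and Interval are hypothetical stand-ins for the asker's absolute and relative values.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Kind (Type)

class Torsor a where
  type TorsorOf a :: Type
  (.-) :: a -> a -> TorsorOf a
  (.+) :: a -> TorsorOf a -> a

newtype Pitch    = Pitch Int    deriving (Eq, Show)  -- absolute value
newtype Interval = Interval Int deriving (Eq, Show)  -- relative value

-- Absolute values: differences are intervals.
instance Torsor Pitch where
  type TorsorOf Pitch = Interval
  Pitch a .- Pitch b    = Interval (a - b)
  Pitch a .+ Interval d = Pitch (a + d)

-- Relative values are their own differences.
instance Torsor Interval where
  type TorsorOf Interval = Interval
  Interval a .- Interval b = Interval (a - b)
  Interval a .+ Interval b = Interval (a + b)
```

With these instances, Pitch 60 .+ Interval 7 and Interval 3 .+ Interval 4 both type-check, while Pitch 60 .+ Pitch 7 is rejected by the compiler, which is the behaviour the question asks for.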
I don't think you should be trying to unify these operators. Subtracting two vectors and subtracting two points are fundamentally different operations. The fact that it's difficult to represent them as the same thing in the type system is not the type system being awkward - it's because these two concepts really are different things!
The mathematical framework behind what you're working with is the affine space.
These are already available in Haskell in the vector-space package (do cabal install vector-space at the command prompt). Rather than using multi parameter type classes, they use type families to associate a vector (relative) type with each point (absolute) type.
Here's a minimal example showing how to define your own absolute and relative data types, and their interaction:
{-# LANGUAGE TypeFamilies #-}
import Data.VectorSpace
import Data.AffineSpace
data Point = Point { px :: Float, py :: Float }
data Vec = Vec { vx :: Float, vy :: Float }
instance AdditiveGroup Vec where
  zeroV = Vec 0 0
  negateV (Vec x y) = Vec (-x) (-y)
  Vec x y ^+^ Vec x' y' = Vec (x+x') (y+y')
instance AffineSpace Point where
  type Diff Point = Vec
  Point x y .-. Point x' y' = Vec (x-x') (y-y')
  Point x y .+^ Vec x' y' = Point (x+x') (y+y')
You have two answers telling you what you should do, here's another answer telling you how to do what you asked for (which might not be a good idea). :)
class Add a b c | a b -> c where
  (#+) :: a -> b -> c
instance Add AbsTime RelTime AbsTime where
  (#+) = ...
instance Add RelTime RelTime RelTime where
  (#+) = ...
The overloading for (#+) makes it very flexible. Too flexible, IMO. The only constraint is that the result type is determined by the argument types (without this functional dependency the operator becomes almost unusable because it constrains nothing).
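For completeness, a compilable sketch of this approach. The AbsTime and RelTime newtypes over Double and the instance bodies are assumptions made for illustration; the answer above leaves them unspecified.

```haskell
{-# LANGUAGE FunctionalDependencies #-}

-- Hypothetical carrier types; any pair of absolute/relative types would do.
newtype AbsTime = AbsTime Double deriving (Eq, Show)
newtype RelTime = RelTime Double deriving (Eq, Show)

-- The functional dependency says: the argument types fix the result type.
class Add a b c | a b -> c where
  (#+) :: a -> b -> c

instance Add AbsTime RelTime AbsTime where
  AbsTime t #+ RelTime d = AbsTime (t + d)

instance Add RelTime RelTime RelTime where
  RelTime a #+ RelTime b = RelTime (a + b)
```

There is deliberately no instance for AbsTime with AbsTime, so adding two absolute values is a type error. Nothing stops someone else from adding such an instance later, though, which is the flexibility the answer warns about.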

how can I add an unboxed array to a Haskell record?

I want to write some Monte Carlo simulations. Because of the nature of the simulation, I'll get much better performance if I use mutable state. I think that unboxed mutable arrays are the way to go. There's a bunch of items I'll want to keep track of, so I've created a record type to hold the state.
import Control.Monad.State
import Data.Array.ST
data Board = Board {
    x :: Int
  , y :: Int
  , board :: STUArray (Int,Int) Int
  } deriving Show
b = Board {
    x = 5
  , y = 5
  , board = newArray ((1,1),(10,10)) 37 :: STUArray (Int,Int) Int
  }
growBoard :: State Board Int
growBoard = do s <- get
               let xo = x s
                   yo = y s in
                   put s{x=xo*2, y=yo*2}
               return (1)
main = print $ runState growBoard b
If I leave out the "board" field from the record, everything else works fine. But with it, I get a type error:
`STUArray (Int, Int) Int' is not applied to enough type arguments
Expected kind `?', but `STUArray (Int, Int) Int' has kind `* -> *'
In the type `STUArray (Int, Int) Int'
In the definition of data constructor `Board'
In the data type declaration for `Board'
I've read through the Array page, and I can get STUArray examples working. But as soon as I try to add one to my State record, I get the error about the unexpected kind. I'm guessing I need a monad transformer of some kind, but I don't know where to start.
How should I declare an unboxed array inside a record? How should I initialize it?
I see a lot of examples of unboxed ST arrays, but they're mostly program fragments, so I feel like I'm missing context.
Also, where can I learn more about "kinds"? I know kinds are "type types" but the abstract nature of that is making it hard to grasp.
STUArray is a mutable array, designed to be used internally from within the ST monad to implement externally-pure code. Just like STRef and all the other structures used in the ST monad, STUArray takes an additional parameter representing a state thread.
The kind error you're getting is simply telling you that you missed an argument: at the value level, you might get an error "expected b but got a -> b" to tell you that you missed an argument; at the type level, it looks like "expected ? but got * -> *", where * represents a plain, "fully-applied" type (like Int). (You can pretend ? is the same as *; it's just there to support unboxed types, which are a GHC-specific implementation detail.)
Basically, you can think of kinds as coming in two shapes:
*, representing a concrete type, like Int, Double, or [(Float, String)];
k -> l, where k and l are both kinds, representing a type constructor, like Tree, [], IO, and STUArray. Such a type constructor takes a type of kind k, and returns a type of kind l.
If you want to use ST arrays, you'll need to add a type parameter to Board:
data Board s = Board {
    x :: Int
  , y :: Int
  , board :: STUArray s (Int,Int) Int
  }  -- no "deriving Show" here: STUArray has no Show instance
and use StateT (Board s) (ST s) as your monad rather than just State Board.
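To see the state-thread parameter in action without the rest of the record, here is a minimal sketch that builds an array mutably and then freezes it with runSTUArray. The function name diagonal and what it computes are invented for illustration.

```haskell
import Data.Array.ST (newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, (!))

-- Build an n-by-n array of zeros, write 1 along the diagonal in place,
-- and return the result as an immutable unboxed array.
diagonal :: Int -> UArray (Int, Int) Int
diagonal n = runSTUArray (do
  arr <- newArray ((1, 1), (n, n)) 0
  mapM_ (\i -> writeArray arr (i, i) 1) [1 .. n]
  return arr)
```

Inside the do block, arr has type STUArray s (Int, Int) Int for a state thread s chosen by runSTUArray; that s is the extra type argument the kind error was complaining about.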
However, I don't see any reason to use ST or mutable structures in general here, and I would instead suggest using a simple immutable array, and mutating it in the same way as the rest of your state, with the State monad:
data Board = Board {
    x :: Int
  , y :: Int
  , board :: UArray (Int,Int) Int
  } deriving Show
(using Data.Array.Unboxed.UArray)
This can be "modified" just like any other element of your record, by transforming it with the pure functions from the immutable array interface.
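A sketch of what that looks like with plain functions, leaving the State monad out to keep it dependency-free. The names initialBoard, setCell, and growBoard, and the initial values, are taken from or modelled on the question and are otherwise assumptions.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!), (//))

data Board = Board
  { x     :: Int
  , y     :: Int
  , board :: UArray (Int, Int) Int
  } deriving Show

-- A 10-by-10 board with every cell set to 37, as in the question.
initialBoard :: Board
initialBoard = Board
  { x = 5
  , y = 5
  , board = listArray ((1, 1), (10, 10)) (replicate 100 37)
  }

-- "Modify" one cell: (//) returns a new array with the given updates.
setCell :: (Int, Int) -> Int -> Board -> Board
setCell p v b = b { board = board b // [(p, v)] }

-- Pure counterpart of growBoard from the question.
growBoard :: Board -> Board
growBoard b = b { x = x b * 2, y = y b * 2 }
```

Note that (//) copies the array, so many single-cell updates in a tight loop can be costly; batching the updates into one list, or switching to the ST version, is the usual remedy if profiling shows this matters.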
