I've been learning Haskell and I noticed that many of the built-in functions take their parameters in an order that feels counterintuitive to me. For example:
replicate :: Int -> a -> [a]
If I want to replicate 7 twice, I would write replicate 2 7. But when read out loud in English, the function call feels like it is saying "replicate 2, 7 times". If I had written the function myself, I would have swapped the first and second arguments so that replicate 7 2 would read "replicate 7, 2 times".
Some other examples appeared when I was going through 99 Haskell Problems. I had to write a function:
dropEvery :: [a] -> Int -> [a]
It takes a list as its first argument and an Int as its second. Intuitively, I would have written the header as dropEvery :: Int -> [a] -> [a] so that dropEvery 3 [1..100] would read as: "drop every third element in the list [1..100]". But in the question's example, it would look like: dropEvery [1..100] 3.
I've also seen this with other functions that I cannot find right now. Is it common to write functions in such a way due to a practical reason or is this all just in my head?
It's common practice in Haskell to order function parameters so that parameters which "configure" an operation come first, and the "main thing being operated on" comes last. This is often counterintuitive coming from other languages, since it tends to mean you end up passing the "least important" information first. It's especially jarring coming from OO, where the "main" argument is usually the object on which the method is being invoked, occurring so early in the call that it's out of the parameter list entirely!
There's a method to our madness though. The reason we do this is that partial application (through currying) is so easy and so widely used in Haskell. Say I have functions like foo :: Some -> Config -> Parameters -> DataStructure -> DataStructure and bar :: Different -> Config -> DataStructure -> DataStructure. When you're not used to higher-order thinking, you just see these as things you call to transform a data structure. But you can also use either of them as a factory for "DataStructure transformers": functions of the type DataStructure -> DataStructure.
It's very likely that there are other operations that are configured by such DataStructure -> DataStructure functions; at the very least there's fmap for turning transformers of DataStructures into transformers of functors of DataStructures (lists, Maybes, IOs, etc).
We can take this a bit further sometimes too. Consider foo :: Some -> Config -> Parameters -> DataStructure -> DataStructure again. If I expect that callers of foo will often call it many times with the same Some and Config, but varying Parameters, then even-more-partial applications become useful.
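As a small sketch of this idea (clampAll and clampPercent are made-up names, not library functions): putting the stable "configuration" arguments first lets a partial application bake them in once, yielding a reusable list transformer.

```haskell
-- Hypothetical example: the "configuration" (the bounds) comes first,
-- and the data being operated on comes last.
clampAll :: Int -> Int -> [Int] -> [Int]
clampAll lo hi = map (max lo . min hi)

-- Partially applying the stable arguments yields a reusable
-- [Int] -> [Int] transformer.
clampPercent :: [Int] -> [Int]
clampPercent = clampAll 0 100
```

Because clampPercent has the type [Int] -> [Int], it slots directly into fmap, function composition, and anything else that expects a plain transformer.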
Of course, even if the parameters are in the "wrong" order for my partial application I can still do it, using combinators like flip and/or creating wrapper functions/lambdas. But this results in a lot of "noise" in my code, meaning that a reader has to be able to puzzle out what is the "important" thing being done and what's just adapting interfaces.
So the basic theory is for a function writer to try to anticipate the usage patterns of the function, and list its arguments in order from "most stable" to "least stable". This isn't the only consideration of course, and often there are conflicting patterns and no clear "best" order.
But "the order the parameters would be listed in an English sentence describing the function call" would not be something I would give much weight to in designing a function (and not in other languages either). Haskell code just does not read like English (nor does code in most other programming languages), and trying to make it closer in a few cases doesn't really help.
For your specific examples:
For replicate, it seems to me like the a parameter is the "main" argument, so I would put it last, as the standard library does. There's not a lot in it though; it doesn't seem very much more useful to choose the number of replications first and have an a -> [a] function than it would be to choose the replicated element first and have an Int -> [a] function.
dropEvery indeed seems to take its arguments in a wonky order, but not because we say in English "drop every Nth element in a list". Functions that take a data structure and return a "modified version of the same structure" should almost always take the data structure as their last argument, with the parameters that configure the "modification" coming first.
One of the reasons functions are written this way is because their curried forms turn out to be useful.
For example, consider the functions map and filter:
map :: (a -> b) -> [a] -> [b]
filter :: (a -> Bool) -> [a] -> [a]
If I wanted to keep the even numbers in a list and then divide them by 2, I could write:
myfunc :: [Int] -> [Int]
myfunc as = map (`div` 2) (filter even as)
which may also be written this way:
myfunc = map (`div` 2) . filter even
\___ 2 ____/ \___ 1 ___/
Envision this as a pipeline going from right to left:
first we keep the even numbers (step 1)
then we divide each number by 2 (step 2)
The . operator acts as a way of joining pipeline segments together - much like how the | operator works in the Unix shell.
This is all possible because the list arguments for map and filter are the last parameters to those functions.
If you write your dropEvery with this signature:
dropEvery :: Int -> [a] -> [a]
then we can include it in one of these pipelines, e.g.:
myfunc2 = dropEvery 3 . map (`div` 2) . filter even
To add to the other answers, there's also often an incentive to make the last argument be the one whose construction is likely to be most complicated and/or to be a lambda abstraction. This way one can write
f some little bits $
big honking calculation
over several lines
rather than having the big calculation surrounded by parentheses and a few little arguments trailing off at the end.
If you wish to flip arguments, just use the flip function from the Prelude:
replicate' = flip replicate
> :t replicate'
replicate' :: a -> Int -> [a]
As a Haskell beginner, I'm curious about best practices. In particular, in the absence of other requirements, is it better to associate related function arguments using tuples, or keep them "naked"?
E.g.
vector :: Float -> Float -> Float -> Vector
vs.
vector :: (Float, Float, Float) -> Vector
The reason I ask is that sometimes aspects of a parameter (e.g. x coordinate in a 2D or 3D point or vector) are normally bound up with other parameters (e.g. the y & z coordinates). I can see how pattern-matching can be used in both cases, but I'm curious to know whether there are serious implications "down the track" to using tuples or distinct parameters.
When other parameters are involved, the use of tuples seems to make it clear that a certain set of parameters are associated with each other. But it also makes the code more verbose when functions take just the tuple as a parameter.
I would recommend, as a rule of thumb, to never put tuples in the arguments of a function signature.
Why? Well, if the point is to group stuff together, then tuples do a rather measly job at it. Sure, you could use nested tuples and type synonyms to explain what they mean, but all of that is brittle, and it is much better and safer done with proper record types.

As you've identified, the x- and y-components of a vector usually come together. Not only that: in many a sense it is a good idea to keep the x- and y-components completely hidden from any interesting code. That's exactly what the Vector type should accomplish. (Which should probably be called Vector3 or ℝ³ instead.) And the only purpose of the vector function should be to assemble one of those from the components. If that's the only thing it does, then the three components are the only arguments, and there's no point grouping them together any further - that's basically just putting a single suitcase into another transport box. Better to use the right container right away, as a single wrapper.
vector3 :: Float -> Float -> Float -> Vector3
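As a minimal sketch of that record-based approach (the field names vx, vy, vz are illustrative, not from the question):

```haskell
-- Group the components in a proper record type rather than a bare tuple.
data Vector3 = Vector3 { vx, vy, vz :: Float }
  deriving (Show, Eq)

-- The constructor function takes the components as curried arguments;
-- the record itself is the only "grouping" needed.
vector3 :: Float -> Float -> Float -> Vector3
vector3 = Vector3
```

The record fields also give you named accessors for free, which a tuple never can.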
An example of a tuple in a signature of a commonly used function is
randomR :: (Random a, RandomGen g) => (a,a) -> g -> (a,g)
Why is this a bad idea? Well, you're using a tuple to denote an interval... but also in the result to denote something completely different, a grouping of the obtained random value with the updated generator. The proper way to do this is to either have a type that properly expresses what it is
data Interval a = Interval {lowerBound, upperBound :: a}
randomR :: (Random a, RandomGen g) => Interval a -> g -> (a,g)
...or better, separate the concerns, i.e. that manual state-threading should be hidden in a suitable monad – such as RVar. At that point the range limits become the only arguments, thus you don't need to group them together anymore!
uniform :: Distribution Uniform a => a -> a -> RVar a
That doesn't mean you should never use tuples at all. For result values, the currying mechanism doesn't work as easily†, so if you have a function that gives back two results but there's not really any meaningful interpretation for what those two values represent together, well, give back a tuple.
Furthermore, if you're grouping together completely abstract types, you can't possibly have an interpretation for what they mean together. That's the reason why zip :: [a] -> [b] -> [(a,b)] gives a list of tuples.
†You can also have multi-result functions without tuples. For that, you need to use continuation-passing style; for example, splitAt :: Int -> [a] -> ([a],[a]) becomes splitAt' :: Int -> [a] -> ([a] -> [a] -> r) -> r.
There are no implications down the line. A function that can accept one argument first and then another one later is said to be curried. A function that accepts a tuple as an argument is said to be uncurried. You can convert between the two using curry and uncurry. Feel free to extend this definition to three parameters and define new functions curry3 f a b c = f (a, b, c) and uncurry3 f (a, b, c) = f a b c.
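Spelled out with explicit type signatures, those three-parameter versions look like this:

```haskell
-- Convert an uncurried 3-tuple function into curried form.
curry3 :: ((a, b, c) -> d) -> a -> b -> c -> d
curry3 f a b c = f (a, b, c)

-- And back again: a curried 3-argument function into tuple form.
uncurry3 :: (a -> b -> c -> d) -> (a, b, c) -> d
uncurry3 f (a, b, c) = f a b c
```

They are mutual inverses, just like curry and uncurry from the Prelude.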
In this case, I would go for a named datatype for most uses. In fact, you already seem to have a Vector type. Making your constructor, vector, accept a triple seems like an excellent idea. That way, those who try to use it to construct a 2D vector will get the most helpful message from the type checker.
Haskell newbie here. It is my observation that:
zip and zip3 are important functions - they are included in the Prelude, implemented by many other languages, and represent a common operation in mathematics (transposition)
they are not generic with respect to parameter structure
they are easy to implement in traditional languages - C or C++ (say 20 hours' work); Python already has it as a built-in
Why is zip so restricted? Is there an abstraction, generalizing it? Something wrong with n-sized tuples?
Because the suggested duplicates answer most of this, I will focus on the questions in your followup comment.
1) why is the standard implementation for fixed n = 2
zipWith is for 2 arguments, and repeat is for 0 arguments. This is enough to get arbitrary-arity zips. For example, the 1 argument version (also called map) can be implemented as
map f = zipWith ($) (repeat f)
and the 3 argument version as
zipWith3 f = (.) (zipWith ($)) . zipWith f
and so on. There is a pretty pattern to the implementations of larger zips (admittedly not obvious from this small sample size). This result is analogous to the one in CT which says that any category with 0-ary and 2-ary products has all finitary products.
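As a quick sanity check of the pattern above (primed names so they don't clash with the Prelude):

```haskell
-- The 1-argument zip (i.e. map), built from zipWith and repeat.
map' :: (a -> b) -> [a] -> [b]
map' f = zipWith ($) (repeat f)

-- The 3-argument zip, built from two uses of zipWith:
-- zipWith f pairs up the first two lists, then zipWith ($)
-- applies the resulting partial applications to the third.
zipWith3' :: (a -> b -> c -> d) -> [a] -> [b] -> [c] -> [d]
zipWith3' f = (.) (zipWith ($)) . zipWith f
```

Expanding zipWith3' f as bs cs gives zipWith ($) (zipWith f as bs) cs, which is exactly the arity-3 zip.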
The other half of the answer, I suppose, is that type-level numbers (which are the most frequent implementation technique for arbitrary-arity zips) are possible but annoying to use, and avoiding them tends to reduce both term- and type-level noise.
2) I need to pass the number of lists, that's unwieldy
Use ZipList. You don't need to pass the number of lists (though you do need to write one infix operator per list -- a very light requirement, I think, as even in Python you need a comma between each list).
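For example, a small sketch using ZipList from Control.Applicative: each additional list costs exactly one more <*>, with no arity-specific function in sight.

```haskell
import Control.Applicative (ZipList (..))

-- Zip three lists via the applicative ZipList wrapper.
-- Adding a fourth list would just mean one more <*> ZipList ...
zipped3 :: [(Int, Char, Bool)]
zipped3 = getZipList $
  (,,) <$> ZipList [1, 2]
       <*> ZipList "ab"
       <*> ZipList [True, False]
```

This behaves like zip3 [1,2] "ab" [True,False], truncating to the shortest list.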
Empirically: I have not found arbitrary-arity zips such a common need that I would label it "unwieldy".
3) even if I define my own zip, there will be collisions with Prelude.zip.
So pick another name...?
Because the type signatures would be different. For example, compare the signatures of zip and zip3:
zip :: [a] -> [b] -> [(a,b)]
zip3 :: [a] -> [b] -> [c] -> [(a,b,c)]
zip3 takes one more argument than zip, and its type is correspondingly different; Haskell does not allow polymorphism over different numbers of arguments, because of currying. Here is an explanation of what currying is on SO.
I would like to define a type for infinite number sequences in Haskell. My idea is:
type MySeq = Natural -> Ratio Integer
However, I would also like to be able to define some properties of the sequence at the type level. A simple example would be a non-decreasing sequence like this one. Is it possible to do this with the current dependent-type capabilities of GHC?
EDIT: I came up with the following idea:
type PositiveSeq = Natural -> Ratio Natural
data IncreasingSeq = IncreasingSeq {
start :: Ratio Natural,
diff :: PositiveSeq}
type IKnowItsIncreasing = [Ratio Natural]
getSeq :: IncreasingSeq -> IKnowItsIncreasing
getSeq s = scanl (+) (start s) [diff s i | i <- [1..]]
Of course, it's basically a hack and not actually type safe at all.
This isn't doing anything very fancy with types, but you could change how you interpret a sequence of naturals to get essentially the same guarantee.
I think you are thinking along the right lines in your edit to the question. Consider
data IncreasingSeq = IncreasingSeq (Integer -> Ratio Natural)
where each ratio represents how much it has increased from the previous number (starting with 0).
Then you can provide a single function
applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq s) = f . drop 1 $ scanl (+) 0 (map (s $) [0..])
This should let you deconstruct it in any way, without allowing the function to inspect the real structure.
You just need a way to construct it: probably a fromList that just sorts it and an insert that performs a standard ordered insertion.
It pains part of me to say this, but I don't think you'd gain anything over this using fancy type tricks: there are only three functions that could ever possibly go wrong, and they are fairly simple to correctly implement. The implementation is hidden so anything that uses those is correct as a result of those functions being correct. Just don't export the data constructor for IncreasingSeq.
I would also suggest considering making [Ratio Natural] be the underlying representation. It simplifies things and guarantees that there are no "gaps" in the sequence (so it is guaranteed to be a sequence).
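A sketch of that hidden-representation approach over a finite list (the names are illustrative; in a real module you would export fromList, insertSeq, and toList but not the IncreasingSeq data constructor):

```haskell
import Data.List (insert, sort)

-- Invariant: the wrapped list is always non-decreasing.
-- The constructor is kept private so only the trusted
-- functions below can build one.
newtype IncreasingSeq = IncreasingSeq [Rational]
  deriving (Show, Eq)

-- Trusted constructors that establish the invariant.
fromList :: [Rational] -> IncreasingSeq
fromList = IncreasingSeq . sort

insertSeq :: Rational -> IncreasingSeq -> IncreasingSeq
insertSeq x (IncreasingSeq xs) = IncreasingSeq (insert x xs)

-- Deconstruction cannot break the invariant.
toList :: IncreasingSeq -> [Rational]
toList (IncreasingSeq xs) = xs
```

Correctness of every IncreasingSeq then reduces to the correctness of just fromList and insertSeq.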
If you want more safety and can take the performance hit, you can use data Nat = Z | S Nat instead of Natural.
I will say that if this was Coq, or a similar language, instead of Haskell I would be more likely to suggest doing some fancier type-level stuff (depending on what you are trying to accomplish) for a couple reasons:
In systems like Coq, you are usually proving theorems about the code. Because of this, it can be useful to have a type-level proof that a certain property holds. Since Haskell doesn't really have a built-in way to prove those sorts of theorems, the utility diminishes.
On the other hand, we can (sometimes) construct data types that essentially must have the properties we want, using a small number of trusted functions and a hidden implementation. In the context of a system with more theorem-proving capability, like Coq, it might be harder to convince the theorem prover of the property this way than if we used a dependent type (possibly, at least). In Haskell, however, we don't have that issue in the first place.
The Prelude shows examples for take and drop with negative arguments:
take (-1) [1,2] == []
drop (-1) [1,2] == [1,2]
Why are these defined the way they are, when e.g. x !! (-1) does the "safer" thing and crashes? It seems like a hackish and very un-Haskell-like way to make these functions total, even when the argument doesn't make sense. Is there some greater design philosophy behind this that I'm not seeing? Is this behavior guaranteed by the standard, or is this just how GHC decided to implement it?
There would be mainly one good reason to make take partial: it could guarantee that the result list, if there is one, always has the requested number of elements.
Now, take already violates this in the other direction: when you try to take more elements than there are in the list, it simply takes as many as there are, i.e. fewer than requested. Perhaps not the most elegant thing to do, but in practice this tends to work out quite usefully.
The main invariant for take is combined with drop:
take n xs ++ drop n xs ≡ xs
and that holds true even if n is negative.
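Written as a QuickCheck-style property (just a plain Boolean function here, so it can be checked on any inputs by hand):

```haskell
-- The take/drop round-trip invariant; note that it holds
-- even for negative n and for n larger than the list length.
prop_takeDrop :: Int -> [Int] -> Bool
prop_takeDrop n xs = take n xs ++ drop n xs == xs
```

If take crashed on negative arguments, this property would no longer be total, and every caller would inherit the partiality.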
A good reason not to check the length of the list is that it makes the functions perform nicely on lazy infinite lists: for instance,
take hugeNum [1..] ++ 0 : drop hugeNum [1..]
will immediately give 1 as the first result element. This would not be possible if take and drop first had to check whether there are enough elements in the input.
I think it's a matter of design choice here.
The current definition ensures that the property
take x list ++ drop x list == list
holds for any x, including negative ones as well as those larger than length list.
I can however see the value in a variant of take/drop which errors out: sometimes a crash is preferred to a wrong result.
x !! (-1) does the "safer" thing and crashes
Crashing is not safe. Making a function non-total destroys your ability
to reason about the behaviour of a function based on its type.
Let us imagine that take and drop did have "crash on negative" behaviour. Consider their type:
take, drop :: Int -> [a] -> [a]
One thing this type definitely doesn't tell you is that this function could crash! It's helpful to reason about code as though we were using a total language, even though we are not - an idea called fast and loose reasoning - but to be able to do that, you have to avoid using (and writing) non-total functions as much as possible.
What to do, then, about operations that might fail or have no result? Types are the answer! A truly safe variant of (!!) would have a type that models the failure case, like:
safeIndex :: [a] -> Int -> Maybe a
This is preferable to the type of (!!),
(!!) :: [a] -> Int -> a
which, by simple observation, can have no (total) inhabitants - you cannot "invent" an a if the list is empty!
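One possible total implementation of safeIndex (a sketch; any equivalent definition works):

```haskell
-- Model the failure cases (negative index, index past the end)
-- with Maybe instead of crashing.
safeIndex :: [a] -> Int -> Maybe a
safeIndex xs n
  | n < 0     = Nothing
  | otherwise = case drop n xs of
      (x : _) -> Just x
      []      -> Nothing
```

Using drop keeps it working lazily on infinite lists, just like (!!).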
Finally, let us return to take and drop. Although their type doesn't fully say what the behaviour is, coupled with their names (and ideally a few QuickCheck properties) we get a pretty good idea. As other responders have pointed out, this behaviour is appropriate in many cases. If you truly have a need to reject negative length inputs, you don't have to choose between non-totality (crashing) or the possibility of surprising behaviour (negative length accepted) - model the possible outcomes responsibly with types.
This type makes it clear that there is "no result"
for some inputs:
takePos, dropPos :: Int -> [a] -> Maybe [a]
Better still, this type uses natural numbers;
functions with this type cannot even be
applied to a negative number!
takeNat, dropNat :: Nat -> [a] -> [a]
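The Maybe-returning variants might look like this (a sketch; the Nat-based versions would additionally need a natural-number type such as Numeric.Natural):

```haskell
-- Reject negative lengths in the result type rather than crashing
-- or silently accepting them.
takePos, dropPos :: Int -> [a] -> Maybe [a]
takePos n xs
  | n < 0     = Nothing
  | otherwise = Just (take n xs)
dropPos n xs
  | n < 0     = Nothing
  | otherwise = Just (drop n xs)
```

Callers are now forced by the type checker to decide what a negative length should mean at each use site.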
Let's say there is a list of all possible things:
all3PStrategies :: [Strategy3P]
all3PStrategies = [strategyA, strategyB, strategyC, strategyD] -- could be longer, maybe even infinite, but this is good enough for demonstration
Now we have another function that takes an integer N and two strategies, uses the first strategy N times, then uses the second strategy N times, and continues to repeat for as long as needed.
What happens if N is 0? I want to return a random strategy, since that case breaks the purpose of the function, but it must ultimately apply a particular strategy.
rotatingStrategy [] [] _ = chooseRandom all3PStrategies
rotatingStrategy strategy3P1 strategy3P2 n
  | … -- other code for what really happens
So I am trying to get a random strategy from the list. I think this will do it:
chooseRandom :: [a] -> RVar a
But how do I test it using Haddock/doctest?
-- >>> chooseRandom all3PStrategies
-- What goes here, since I can't guarantee what will be returned...?
I think random functions kind of go against the Haskell idea of being functional, but I am also likely mistaken. In imperative languages the random function uses various parameters (like Time in Java) to determine the random number, so can't I just plug in particular parameters to ensure which random number I will get?
If you do this: chooseRandom :: [a] -> RVar a, then you won't be able to use IO. You need to be able to include the IO monad throughout the type declaration, including the test cases.
Said more plainly, as soon as you use the IO monad, all return types must include the type of the IO monad, which is not likely to be included in the list that you want returned, unless you edit the structure of the list to accommodate items that have the IO Type included.
There are several ways to implement chooseRandom. If you use a version that returns RVar Strategy3P, you will still need to sample the RVar using runRVar to get a Strategy3P that you can actually execute.
You can also solve the problem using the IO monad, which is really no different: instead of thinking of chooseRandom as a function that returns a probability distribution that we can sample as necessary, we can think of it as a function that returns a computation that we can evaluate as necessary. Depending on your perspective, this might make things more or less confusing, but at least it avoids the need to install the rvar package. One implementation of chooseRandom using IO is the pick function from this blog post:
import System.Random (randomRIO)
pick :: [a] -> IO a
pick xs = randomRIO (0, length xs - 1) >>= return . (xs !!)
This code is arguably buggy: it crashes at runtime when you give it the empty list. If you're worried about that, you can make the possibility of failure explicit in the type by wrapping the result in Maybe, but if you know that your strategy list will never be empty (for example, because it's hard-coded) then it's probably not worth bothering.
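If you do want the total version, a sketch might look like this (pickSafe is a made-up name, and System.Random comes from the random package):

```haskell
import System.Random (randomRIO)

-- A total variant of pick: the empty list yields Nothing
-- instead of an out-of-range (!!) crash.
pickSafe :: [a] -> IO (Maybe a)
pickSafe [] = pure Nothing
pickSafe xs = do
  i <- randomRIO (0, length xs - 1)
  pure (Just (xs !! i))
```

Callers then pattern-match on the Maybe, so the empty-list case can't be forgotten.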
It probably follows that it's not worth testing either, but there are a number of solutions to the fundamental problem, which is how to test monadic functions. In other words, given a monadic value m a, how can we interrogate it in our testing framework (ideally by reusing functions that work on the raw value a)? This is a complex problem, addressed in the QuickCheck library and its associated research paper, Testing Monadic Code with QuickCheck.
However, it doesn't look like it would be easy to integrate QuickCheck with doctest, and the problem is really too simple to justify investing in a whole new testing framework! Given that you just need some quick-and-dirty testing code (that won't actually be part of your application), it's probably OK to use unsafePerformIO here, even though many Haskellers would consider it a code smell:
{-|
>>> let xs = ["cat", "dog", "fish"]
>>> elem (unsafePerformIO $ pick xs) xs
True
-}
pick :: [a] -> IO a
Just make sure you understand why using unsafePerformIO is "unsafe" (it's non-deterministic in general), and why it doesn't really matter for this case in particular (because failure of the standard RNG isn't really a big enough risk, for this application, to justify the extra work we'd require to capture it in the type system).