The Prelude shows examples for take and drop with negative arguments:
take (-1) [1,2] == []
drop (-1) [1,2] == [1,2]
Why are these defined the way they are, when e.g. x !! (-1) does the "safer" thing and crashes? It seems like a hackish and very un-Haskell-like way to make these functions total, even when the argument doesn't make sense. Is there some greater design philosophy behind this that I'm not seeing? Is this behavior guaranteed by the standard, or is this just how GHC decided to implement it?
There would be mainly one good reason to make take partial: it could guarantee that the result list, if there is one, always has the requested number of elements.
Now, take already violates this in the other direction: when you try to take more elements than there are in the list, it simply takes as many as there are, i.e. fewer than requested. Perhaps not the most elegant thing to do, but in practice this tends to work out quite usefully.
The main invariant for take is combined with drop:
take n xs ++ drop n xs ≡ xs
and that holds true even if n is negative.
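This invariant is easy to spot-check with QuickCheck; a minimal sketch (the property name is mine):

import Test.QuickCheck

-- Holds for every n QuickCheck generates, including negative n
-- and n greater than the length of xs.
prop_takeDrop :: Int -> [Int] -> Bool
prop_takeDrop n xs = take n xs ++ drop n xs == xs

Running quickCheck prop_takeDrop in GHCi should pass.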
A good reason not to check the length of the list is that it makes the functions perform nicely on lazy infinite lists: for instance,
take hugeNum [1..] ++ 0 : drop hugeNum [1..]
will immediately give 1 as the first result element. This would not be possible if take and drop first had to check whether there are enough elements in the input.
I think it's a matter of design choice here.
The current definition ensures that the property
take x list ++ drop x list == list
holds for any x, including negative ones as well as those larger than length list.
I can however see the value in a variant of take/drop which errors out: sometimes a crash is preferred to a wrong result.
x !! (-1) does the "safer" thing and crashes
Crashing is not safe. Making a function non-total destroys your ability
to reason about the behaviour of a function based on its type.
Let us imagine that take and drop did have "crash on negative" behaviour. Consider their type:
take, drop :: Int -> [a] -> [a]
One thing this type definitely doesn't tell you is that this function could crash! It's helpful to reason about code as though we were using a total language, even though we are not - an idea called fast and loose reasoning - but to be able to do that, you have to avoid using (and writing) non-total functions as much as possible.
What to do, then, about operations that might fail or have no result? Types are the answer! A truly safe variant of (!!) would have a type that models the failure case, like:
safeIndex :: [a] -> Int -> Maybe a
This is preferable to the type of (!!),
(!!) :: [a] -> Int -> a
which, by simple observation, can have no (total) inhabitants - you cannot "invent" an a if the list is empty!
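A minimal sketch of such a safeIndex (the name is illustrative; this function is not in the Prelude):

safeIndex :: [a] -> Int -> Maybe a
safeIndex xs n
  | n < 0     = Nothing
  | otherwise = case drop n xs of
      (x:_) -> Just x
      []    -> Nothing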
Finally, let us return to take and drop. Although their type doesn't fully say what the behaviour is, coupled with their names (and ideally a few QuickCheck properties) we get a pretty good idea. As other responders have pointed out, this behaviour is appropriate in many cases. If you truly have a need to reject negative length inputs, you don't have to choose between non-totality (crashing) or the possibility of surprising behaviour (negative length accepted) - model the possible outcomes responsibly with types.
This type makes it clear that there is "no result"
for some inputs:
takePos, dropPos :: Int -> [a] -> Maybe [a]
Better still, this type uses natural numbers;
functions with this type cannot even be
applied to a negative number!
takeNat, dropNat :: Nat -> [a] -> [a]
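For illustration, minimal sketches of both variants, assuming Nat is Numeric.Natural.Natural (dropPos and dropNat would follow the same pattern):

import Numeric.Natural (Natural)

-- Reject negative lengths explicitly in the result type.
takePos :: Int -> [a] -> Maybe [a]
takePos n xs
  | n < 0     = Nothing
  | otherwise = Just (take n xs)

-- A negative argument is unrepresentable here.
takeNat :: Natural -> [a] -> [a]
takeNat 0 _      = []
takeNat _ []     = []
takeNat n (x:xs) = x : takeNat (n - 1) xs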
I would like to define a type for infinite number sequence in haskell. My idea is:
type MySeq = Natural -> Ratio Integer
However, I would also like to be able to define some properties of the sequence on the type level. A simple example would be a non-decreasing sequence like this. Is it possible to do this with the current dependent-type capabilities of GHC?
EDIT: I came up with the following idea:
type PositiveSeq = Natural -> Ratio Natural
data IncreasingSeq = IncreasingSeq
  { start :: Ratio Natural
  , diff  :: PositiveSeq
  }
type IKnowItsIncreasing = [Ratio Natural]
getSeq :: IncreasingSeq -> IKnowItsIncreasing
getSeq s = scanl (+) (start s) [diff s i | i <- [1..]]
Of course, it's basically a hack and not actually type safe at all.
This isn't doing anything very fancy with types, but you could change how you interpret a sequence of naturals to get essentially the same guarantee.
I think you are thinking along the right lines in your edit to the question. Consider
data IncreasingSeq = IncreasingSeq (Integer -> Ratio Natural)
where each ratio represents how much it has increased from the previous number (starting with 0).
Then you can provide a single function
applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq s) = f . drop 1 $ scanl (+) 0 (map s [0..])
This should let you deconstruct it in any way, without allowing the function to inspect the real structure.
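For instance (a hypothetical use), extracting the first ten partial sums:

firstTen :: IncreasingSeq -> [Ratio Natural]
firstTen = applyToIncreasing (take 10)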
You just need a way to construct it: probably a fromList that just sorts it and an insert that performs a standard ordered insertion.
It pains part of me to say this, but I don't think you'd gain anything over this using fancy type tricks: there are only three functions that could ever possibly go wrong, and they are fairly simple to correctly implement. The implementation is hidden so anything that uses those is correct as a result of those functions being correct. Just don't export the data constructor for IncreasingSeq.
I would also suggest considering making [Ratio Natural] be the underlying representation. It simplifies things and guarantees that there are no "gaps" in the sequence (so it is guaranteed to be a sequence).
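A minimal sketch of that design, assuming the [Ratio Natural] representation (the module layout and the name insertElem are illustrative):

module IncreasingSeq (IncreasingSeq, fromList, insertElem, applyToIncreasing) where

import Data.List (insert, sort)
import Data.Ratio (Ratio)
import Numeric.Natural (Natural)

newtype IncreasingSeq = IncreasingSeq [Ratio Natural]

-- Sorting establishes the ordering invariant once, at construction.
fromList :: [Ratio Natural] -> IncreasingSeq
fromList = IncreasingSeq . sort

-- Data.List.insert performs an ordered insertion, preserving the invariant.
insertElem :: Ratio Natural -> IncreasingSeq -> IncreasingSeq
insertElem x (IncreasingSeq xs) = IncreasingSeq (insert x xs)

-- Deconstruction never exposes the constructor, so callers cannot break it.
applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq xs) = f xs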
If you want more safety and can take the performance hit, you can use data Nat = Z | S Nat instead of Natural.
I will say that if this was Coq, or a similar language, instead of Haskell I would be more likely to suggest doing some fancier type-level stuff (depending on what you are trying to accomplish) for a couple reasons:
In systems like Coq, you are usually proving theorems about the code. Because of this, it can be useful to have a type-level proof that a certain property holds. Since Haskell doesn't really have a builtin way to prove those sorts of theorems, the utility diminishes.
On the other hand, we can (sometimes) construct data types that essentially must have the properties we want, using a small number of trusted functions and a hidden implementation. In a system with more theorem-proving capability, like Coq, it might be harder to convince the theorem prover of such a property than if we had used a dependent type (possibly, at least). In Haskell, however, we don't have that issue in the first place.
I've been learning Haskell and I noticed that many of the built in functions accept parameters in an order counter intuitive to what I would expect. For example:
replicate :: Int -> a -> [a]
If I want to replicate 7 twice, I would write replicate 2 7. But when read out loud in English, the function call feels like it is saying "Replicate 2, 7 times". If I would have written the function myself, I would have swapped the first and second arguments so that replicate 7 2 would read "replicate 7, 2 times".
Some other examples appeared when I was going through 99 Haskell Problems. I had to write a function:
dropEvery :: [a] -> Int -> [a]
It takes a list as its first argument and an Int as its second. Intuitively, I would have written the header as dropEvery :: Int -> [a] -> [a] so that dropEvery 3 [1..100] would read as: "drop every third element in the list [1..100]". But in the question's example, it would look like: dropEvery [1..100] 3.
I've also seen this with other functions that I cannot find right now. Is it common to write functions in such a way due to a practical reason or is this all just in my head?
It's common practice in Haskell to order function parameters so that parameters which "configure" an operation come first, and the "main thing being operated on" comes last. This is often counterintuitive coming from other languages, since it tends to mean you end up passing the "least important" information first. It's especially jarring coming from OO, where the "main" argument is usually the object on which the method is being invoked, occurring so early in the call that it's out of the parameter list entirely!
There's a method to our madness though. The reason we do this is that partial application (through currying) is so easy and so widely used in Haskell. Say I have functions like foo :: Some -> Config -> Parameters -> DataStructure -> DataStructure and bar :: Different -> Config -> DataStructure -> DataStructure. When you're not used to higher-order thinking you just see these as things you call to transform a data structure. But you can also use either of them as a factory for "DataStructure transformers": functions of the type DataStructure -> DataStructure.
It's very likely that there are other operations that are configured by such DataStructure -> DataStructure functions; at the very least there's fmap for turning transformers of DataStructures into transformers of functors of DataStructures (lists, Maybes, IOs, etc).
We can take this a bit further sometimes too. Consider foo :: Some -> Config -> Parameters -> DataStructure -> DataStructure again. If I expect that callers of foo will often call it many times with the same Some and Config, but varying Parameters, then even-more-partial applications become useful.
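A small sketch of that shape (all the types and values here are placeholders, just to show the partial applications):

data Some = Some
data Config = Config
data Parameters = ParamsA | ParamsB
data DataStructure = DataStructure

foo :: Some -> Config -> Parameters -> DataStructure -> DataStructure
foo _ _ _ d = d -- placeholder body

-- Fix the stable arguments once...
fooWith :: Parameters -> DataStructure -> DataStructure
fooWith = foo Some Config

-- ...and vary only the Parameters at each call site:
transformA, transformB :: DataStructure -> DataStructure
transformA = fooWith ParamsA
transformB = fooWith ParamsB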
Of course, even if the parameters are in the "wrong" order for my partial application I can still do it, using combinators like flip and/or creating wrapper functions/lambdas. But this results in a lot of "noise" in my code, meaning that a reader has to be able to puzzle out what is the "important" thing being done and what's just adapting interfaces.
So the basic theory is for a function writer to try to anticipate the usage patterns of the function, and list its arguments in order from "most stable" to "least stable". This isn't the only consideration of course, and often there are conflicting patterns and no clear "best" order.
But "the order the parameters would be listed in an English sentence describing the function call" would not be something I would give much weight to in designing a function (and not in other languages either). Haskell code just does not read like English (nor does code in most other programming languages), and trying to make it closer in a few cases doesn't really help.
For your specific examples:
For replicate, it seems to me like the a parameter is the "main" argument, so I would put it last, as the standard library does. There's not a lot in it though; it doesn't seem very much more useful to choose the number of replications first and have an a -> [a] function than it would be to choose the replicated element first and have an Int -> [a] function.
dropEvery indeed seems to take its arguments in a wonky order, but not because we say in English "drop every Nth element in a list". Functions that take a data structure and return a "modified version of the same structure" should almost always take the data structure as their last argument, with the parameters that configure the "modification" coming first.
One of the reasons functions are written this way is that their curried forms turn out to be useful.
For example, consider the functions map and filter:
map :: (a -> b) -> [a] -> [b]
filter :: (a -> Bool) -> [a] -> [a]
If I wanted to keep the even numbers in a list and then divide them by 2, I could write:
myfunc :: [Int] -> [Int]
myfunc as = map (`div` 2) (filter even as)
which may also be written this way:
myfunc = map (`div` 2) . filter even
\___ 2 ____/ \___ 1 ___/
Envision this as a pipeline going from right to left:
first we keep the even numbers (step 1)
then we divide each number by 2 (step 2)
The . operator acts as a way of joining pipeline segments together - much like how the | operator works in the Unix shell.
This is all possible because the list arguments for map and filter are the last parameters to those functions.
If you write your dropEvery with this signature:
dropEvery :: Int -> [a] -> [a]
then we can include it in one of these pipelines, e.g.:
myfunc2 = dropEvery 3 . map (`div` 2) . filter even
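One possible implementation with that signature (a sketch; it assumes n > 0, since n = 0 would divide by zero):

-- Number the elements from 1, then keep those whose index is not a multiple of n.
dropEvery :: Int -> [a] -> [a]
dropEvery n = map snd . filter ((/= 0) . (`mod` n) . fst) . zip [1..]

-- e.g. dropEvery 3 [1..10] == [1,2,4,5,7,8,10]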
To add to the other answers, there's also often an incentive to make the last argument be the one whose construction is likely to be most complicated and/or to be a lambda abstraction. This way one can write
f some little bits $
big honking calculation
over several lines
rather than having the big calculation surrounded by parentheses and a few little arguments trailing off at the end.
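A small made-up example of the shape this enables:

main :: IO ()
main = writeFile "out.txt" $
  unlines
    [ "a big honking calculation"
    , "built over several lines"
    ]

Without $, the whole unlines expression would need enclosing parentheses.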
If you wish to flip arguments, just use flip function from Prelude
replicate' = flip replicate
> :t replicate'
replicate' :: a -> Int -> [a]
Let's say there is a list of all possible things
all3PStrategies :: [Strategy3P]
all3PStrategies = [strategyA, strategyB, strategyC, strategyD] -- could be longer, maybe even infinite, but this is good enough for demonstrating
Now we have another function that takes an integer N and two strategies, and uses the first strategy N times, then uses the second strategy N times, and continues to repeat for as long as needed.
What happens if N is 0? I want to return a random strategy, since that breaks the purpose of the function, but it must ultimately apply a particular strategy.
rotatingStrategy [] [] _ = chooseRandom all3PStrategies
rotatingStrategy strategy3P1 strategy3P2 n
  | … -- other code for what really happens
So I am trying to get a random strategy from the list. I think this will do it:
chooseRandom :: [a] -> RVar a
But how do I test it using Haddock/doctest?
-- >>> chooseRandom all3PStrategies
-- What goes here, since I can't guarantee what will be returned...?
I think random functions kind of go against the Haskell idea of being functional, but I am also likely mistaken. In imperative languages the random function uses various parameters (like Time in Java) to determine the random number, so can't I just plug in the particular parameters to ensure which random number I will get?
If you do this: chooseRandom :: [a] -> RVar a, then you won't be able to use IO. You need to be able to include the IO monad throughout the type declaration, including the test cases.
Said more plainly: as soon as you use the IO monad, every return type must be wrapped in IO, which is probably not the type of the list you want returned, unless you change the structure of the list to accommodate values of IO type.
There are several ways to implement chooseRandom. If you use a version that returns RVar Strategy3P, you will still need to sample the RVar using runRVar to get a Strategy3P that you can actually execute.
You can also solve the problem using the IO monad, which is really no different: instead of thinking of chooseRandom as a function that returns a probability distribution that we can sample as necessary, we can think of it as a function that returns a computation that we can evaluate as necessary. Depending on your perspective, this might make things more or less confusing, but at least it avoids the need to install the rvar package. One implementation of chooseRandom using IO is the pick function from this blog post:
import System.Random (randomRIO)
pick :: [a] -> IO a
pick xs = randomRIO (0, (length xs - 1)) >>= return . (xs !!)
This code is arguably buggy: it crashes at runtime when you give it the empty list. If you're worried about that, you can force callers to handle the failure by wrapping the result in Maybe, but if you know that your strategy list will never be empty (for example, because it's hard-coded) then it's probably not worth bothering.
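A sketch of that Maybe-wrapped variant (the name pickSafe is illustrative; it assumes the same System.Random import):

pickSafe :: [a] -> IO (Maybe a)
pickSafe [] = return Nothing
pickSafe xs = Just . (xs !!) <$> randomRIO (0, length xs - 1)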
It probably follows that it's not worth testing either, but there are a number of solutions to the fundamental problem, which is how to test monadic functions. In other words, given a monadic value m a, how can we interrogate it in our testing framework (ideally by reusing functions that work on the raw value a)? This is a complex problem addressed in the QuickCheck library and its associated research paper, Testing Monadic Code with QuickCheck.
However, it doesn't look like it would be easy to integrate QuickCheck with doctest, and the problem is really too simple to justify investing in a whole new testing framework! Given that you just need some quick-and-dirty testing code (that won't actually be part of your application), it's probably OK to use unsafePerformIO here, even though many Haskellers would consider it a code smell:
{-|
>>> let xs = ["cat", "dog", "fish"]
>>> elem (unsafePerformIO $ pick xs) xs
True
-}
pick :: [a] -> IO a
Just make sure you understand why using unsafePerformIO is "unsafe" (it's non-deterministic in general), and why it doesn't really matter for this case in particular (because failure of the standard RNG isn't really a big enough risk, for this application, to justify the extra work we'd require to capture it in the type system).
Note to other potential contributors: Please don't hesitate to use abstract or mathematical notations to make your point. If I find your answer unclear, I will ask for elucidation, but otherwise feel free to express yourself in a comfortable fashion.
To be clear: I am not looking for a "safe" head, nor is the choice of head in particular exceptionally meaningful. The meat of the question follows the discussion of head and head', which serve to provide context.
I've been hacking away with Haskell for a few months now (to the point that it has become my main language), but I am admittedly not well-informed about some of the more advanced concepts nor the details of the language's philosophy (though I am more than willing to learn). My question then is not so much a technical one (unless it is and I just don't realize it) as it is one of philosophy.
For this example, I am speaking of head.
As I imagine you'll know,
Prelude> head []
*** Exception: Prelude.head: empty list
This follows from head :: [a] -> a. Fair enough. Obviously one cannot return an element of (hand-wavingly) no type. But at the same time, it is simple (if not trivial) to define
head' :: [a] -> Maybe a
head' [] = Nothing
head' (x:xs) = Just x
I've seen a little discussion of this in the comment sections of certain answers here. Notably, one Alex Stangl says
'There are good reasons not to make everything "safe" and to throw exceptions when preconditions are violated.'
I do not necessarily question this assertion, but I am curious as to what these "good reasons" are.
Additionally, a Paul Johnson says,
'For instance you could define "safeHead :: [a] -> Maybe a", but now instead of either handling an empty list or proving it can't happen, you have to handle "Nothing" or prove it can't happen.'
The tone that I read from that comment suggests that this is a notable increase in difficulty/complexity/something, but I am not sure that I grasp what he's putting out there.
One Steven Pruzina says (in 2011, no less),
"There's a deeper reason why e.g 'head' can't be crash-proof. To be polymorphic yet handle an empty list, 'head' must always return a variable of the type which is absent from any particular empty list. It would be Delphic if Haskell could do that...".
Is polymorphism lost by allowing empty list handling? If so, how so, and why? Are there particular cases which would make this obvious? This section was amply answered by Russell O'Connor. Any further thoughts are, of course, appreciated.
I'll edit this as clarity and suggestion dictates. Any thoughts, papers, etc., you can provide will be most appreciated.
Is polymorphism lost by allowing empty list handling? If so, how so, and why? Are there particular cases which would make this obvious?
The free theorem for head states that
f . head = head . map f
Applying this theorem to [] implies that
f (head []) = head (map f []) = head []
This theorem must hold for every f, so in particular it must hold for const True and const False. This implies
True = const True (head []) = head [] = const False (head []) = False
Thus if head is properly polymorphic and head [] were a total value, then True would equal False.
PS. I have some other comments about the background to your question, to the effect that if you have a precondition that your list is non-empty, then you should enforce it by using a non-empty list type in your function signature instead of using a list.
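With Data.List.NonEmpty (in base since GHC 8.0), the precondition lives in the type and the corresponding head is total; a minimal sketch:

import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NE

example :: NonEmpty Int
example = 1 :| [2, 3]

-- NE.head is total: an empty NonEmpty simply cannot be constructed.
firstElem :: Int
firstElem = NE.head example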
Why does anyone use head :: [a] -> a instead of pattern matching? One of the reasons is because you know that the argument cannot be empty and do not want to write the code to handle the case where the argument is empty.
Of course, your head' of type [a] -> Maybe a is defined in the standard library as Data.Maybe.listToMaybe. But if you replace a use of head with listToMaybe, you have to write the code to handle the empty case, which defeats this purpose of using head.
I am not saying that using head is a good style. It hides the fact that it can result in an exception, and in this sense it is not good. But it is sometimes convenient. The point is that head serves some purposes which cannot be served by listToMaybe.
The last quotation in the question (about polymorphism) simply means that it is impossible to define a function of type [a] -> a which returns a value on the empty list (as Russell O'Connor explained in his answer).
It's only natural to expect the following to hold: xs === head xs : tail xs - a list is identical to its first element, followed by the rest. Seems logical, right?
Now, let's count the number of conses (applications of :), disregarding the actual elements, when applying the purported 'law' to []: [] should be identical to foo : bar, but the former has 0 conses, while the latter has (at least) one. Uh oh, something's not right here!
Haskell's type system, for all its strengths, is not up to expressing the fact that you should only call head on a non-empty list (and that the 'law' is only valid for non-empty lists). Using head shifts the burden of proof to the programmer, who should make sure it's not used on empty lists. I believe dependently typed languages like Agda can help here.
Finally, a slightly more operational-philosophical description: how should head ([] :: [a]) :: a be implemented? Conjuring a value of type a out of thin air is impossible (think of uninhabited types such as data Falsum), and would amount to proving anything (via the Curry-Howard isomorphism).
There are a number of different ways to think about this. So I am going to argue both for and against head':
Against head':
There is no need to have head': Since lists are a concrete data type, everything that you can do with head' you can do by pattern matching.
Furthermore, with head' you're just trading off one functor for another. At some point you want to get down to brass tacks and get some work done on the underlying list element.
In defense of head':
But pattern matching obscures what's going on. In Haskell we are interested in calculating functions, which is better accomplished by writing them in point-free style using compositions and combinators.
Furthermore, thinking about the [] and Maybe functors, head' allows you to move back and forth between them (note that pure for [] builds a singleton list, paralleling Just for Maybe).
If in your use case an empty list makes no sense at all, you can always opt to use NonEmpty instead, where neHead is safe to use. If you see it from that angle, it's not the head function that is unsafe, it's the whole list data-structure (again, for that use case).
I think this is a matter of simplicity and beauty. Which is, of course, in the eye of the beholder.
If coming from a Lisp background, you may be aware that lists are built of cons cells, each cell having a data element and a pointer to next cell. The empty list is not a list per se, but a special symbol. And Haskell goes with this reasoning.
In my view, it is both cleaner, simpler to reason about, and more traditional, if empty list and list are two different things.
...I may add - if you are worried about head being unsafe - don't use it, use pattern matching instead:
sum [] = 0
sum (x:xs) = x + sum xs
So, I have a function with multiple definitions (guards), and depending on which one it matches, I'm trying to have it return either an (a,b) or a [(a,b)]; however, the compiler is throwing up errors because they're different types. I was trying to use Either to solve this, but I'm probably not using it right :P. Any help?
Either -- or a custom data type equivalent to it -- is the only way to do this. Here's a dumb example:
stuff :: Int -> Either (Int,Int) [(Int,Int)]
stuff 0 = Left (0, 0)
stuff n = Right [ (x,x) | x <- [0..n] ]
Then when somebody calls this function, they can pattern match to find out which of the two types it returned:
foo n = case stuff n of
Left (a,b) -> ...
Right pairs -> ...
However, knowing nothing about your problem, in general I would recommend thinking a bit more about the meaning of your function. What does it take, what does it return? Be precise, mathematical. The simpler the answer, the more smoothly this function will work with the rest of your program and the concepts of Haskell. For me, in such descriptions, Either rarely turns up. How can you unify the two results? Maybe you just return the singleton list [(a,b)] instead of Left (a,b), if that makes sense for your function.
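For instance, that singleton-list unification would look like this (a sketch based on the stuff example above):

stuff' :: Int -> [(Int, Int)]
stuff' 0 = [(0, 0)]
stuff' n = [ (x, x) | x <- [0..n] ]

Callers now handle a single shape; the n == 0 case is just a one-element list.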
Haskell does not play well with functions that try to be too smart, the type that you may be used to from Python or jQuery. Keep it dumb and precise -- get your complexity from composing these simple pieces. If you are curious about this, ask another question with more details about your problem, what you are trying to accomplish, and why you want it to work that way. Sorry for the preaching :-)
Functions in Haskell can only return one type, so Either will work for you, since it would have your function return the type Either (a,b) [(a,b)].
I am not sure what went wrong with how you are using Either but here is a simple example of its usage:
test :: Bool -> b -> Either (Bool, b) [(Bool, b)]
test a b =
  if a
    then Left (a, b)
    else Right [(a, b)]