Getting constant-time length retrieval with immutable lists in a functional programming context - haskell

I am currently facing the problem of having to make my calculations based on the length of a given list. Having to iterate over all the elements of the list to know its size is a big performance penalty as I'm using rather big lists.
What are the suggested approaches to the problem?
I guess I could always carry a size value together with the list, so I know its size beforehand without having to compute it at the call site, but that seems a brittle approach. I could also define my own list type where each node stores the size of the list as a property, but then I'd lose the leverage provided by my programming language's libraries for standard lists.
How do you guys handle this on your daily routine?
I am currently using F#. I am aware I can use .NET's mutable (array) lists, which would solve the problem. I am way more interested, though, in the purely immutable functional approach.

The built-in F# list type doesn't have any caching of the length and there is no way to add that in some clever way, so you'll need to define your own type. I think that writing a wrapper for the existing F# list type is probably the best option.
This way, you can avoid explicit conversions - when you wrap the list, it will not actually copy it (as in svick's implementation), but the wrapper can easily cache the Length property:
open System.Collections
type LengthList<'T>(list:list<'T>) =
  let length = lazy list.Length
  member x.Length = length.Value
  member x.List = list
  interface IEnumerable with
    member x.GetEnumerator() = (list :> IEnumerable).GetEnumerator()
  interface seq<'T> with //'
    member x.GetEnumerator() = (list :> seq<_>).GetEnumerator()
[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
module LengthList =
  let ofList l = LengthList<_>(l)
  let ofSeq s = LengthList<_>(List.ofSeq s)
  let toList (l:LengthList<_>) = l.List
  let length (l:LengthList<_>) = l.Length
The best way to work with the wrapper is to use LengthList.ofList to create a LengthList from a standard F# list, and to use LengthList.toList (or just the List property) before calling any functions from the standard List module.
However, it depends on the complexity of your code - if you only need length in a couple of places, then it may be easier to keep it separately and use a tuple list<'T> * int.

How do you guys handle this on your daily routine?
We don't, because this isn't a problem in daily routine. It sounds like a problem perhaps in limited domains.
If you created the lists recently, then you've probably already done O(N) work, so walking the list to get its length is probably not a big deal.
If you're making a few very large lists that are then not 'changing' much (obviously never changing, but I mean changing set of references to heads of lists that are used in your domain/algorithm), then it may make sense to just have a dictionary off to the side of reference-to-list-head*length tuples, and consult the dictionary when asking for lengths (doing the real work to walk them when needed, but caching results for future asks about the same list).
Finally, if you really are dealing with some algorithm that needs to be constantly updating the lists in play and constantly consulting the lengths, then create your own list-like data type (yes, you'll also need to write map/filter and any others).
(Very generally, I think it is typically best to use the built-in data structures 99.99% of the time. In the 0.01% of the time where you are developing an algorithm or bit of code that needs to be very highly optimized, you almost always need to abandon the built-in data structures (which are good enough for most cases) and use a custom data structure designed to solve the exact problem you are working on. Look to Wikipedia or Okasaki's "Purely Functional Data Structures" for ideas and inspiration in that case. But rarely go to that case.)

I don't see why carrying the length around is a brittle approach. Try something like this (Haskell):
data NList a = NList Int [a]
nNil :: NList a
nNil = NList 0 []
nCons :: a -> NList a -> NList a
nCons x (NList n xs) = NList (n+1) (x:xs)
nHead :: NList a -> a
nHead (NList _ (x:_)) = x
nTail :: NList a -> NList a
nTail (NList n (_:xs)) = NList (n-1) xs
convert :: [a] -> NList a
convert xs = NList (length xs) xs
and so on. If this is in a library or module, you can make it safe (I think) by not exporting the constructor NList.
It may also be possible to coerce GHC into memoizing length, but I'm not sure how or when.
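To make the idea concrete, here is a slightly extended sketch of the same type. The names nLength, fromList, toList and nMap are not from the answer above; they are illustrative additions, and the strict Int field is an assumption about what you would want in practice.

```haskell
-- A length-carrying list. In a real module you would export the type but
-- not the NList constructor, so the cached length cannot get out of sync.
data NList a = NList !Int [a]

nNil :: NList a
nNil = NList 0 []

nCons :: a -> NList a -> NList a
nCons x (NList n xs) = NList (n + 1) (x : xs)

-- O(1): just read the cached count.
nLength :: NList a -> Int
nLength (NList n _) = n

-- O(n), paid once at the boundary with ordinary lists.
fromList :: [a] -> NList a
fromList xs = NList (length xs) xs

-- O(1): hand the underlying list to any standard list function.
toList :: NList a -> [a]
toList (NList _ xs) = xs

-- map preserves length, so the count is reused rather than recomputed.
nMap :: (a -> b) -> NList a -> NList b
nMap f (NList n xs) = NList n (map f xs)
```

Operations that change the length in a data-dependent way (such as a filter) would still have to recount, so this only pays off when most operations are conses, tails and maps.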

In F#, most List functions have equivalent Seq functions. That means you can just implement your own immutable linked list that carries the length with each node. Something like this:
type MyList<'T>(item : Option<'T * MyList<'T>>) =
    let length =
        match item with
        | None -> 0
        | Some (_, tail) -> tail.Length + 1
    member this.Length = length
    member private this.sequence =
        match item with
        | None -> Seq.empty
        | Some (x, tail) ->
            seq {
                yield x
                yield! tail.sequence
            }
    interface seq<'T> with
        member this.GetEnumerator() =
            (this.sequence).GetEnumerator()
        member this.GetEnumerator() =
            (this.sequence :> System.Collections.IEnumerable).GetEnumerator()
module MyList =
    let rec ofList list =
        match list with
        | [] -> MyList None
        | head::tail -> MyList(Some (head, ofList tail))

Related

case-of / case expression with or in pattern matching possible?

I am learning Haskell on my own, and was working on implementing a custom List data type using basic lists and case-of.
So the data structure is something similar to this
data List = List [String] | EmptyList deriving Show
Now if I am doing case expressions for the base case, I have to do two matchings. A simple example would be the size function
size :: List -> Int
size lst = case lst of
(List []) -> 0
EmptyList -> 0
(List (x:xs)) -> 1 + size (List xs)
Can't I do something like combining the two base cases of list being empty (List []) and EmptyList somehow to reduce redundancy?
size :: List -> Int
size lst = case lst of
(List []) | EmptyList -> 0
(List (x:xs)) -> 1 + size (List xs)
I have tried searching all over the net for this, but unfortunately wasn't able to find anything concrete over matching multiple patterns in one case.
First of all you should consider why you have separate constructors for List and EmptyList in the first place. The empty list clearly is already a special case of a list anyway, so this is an awkward redundancy. If anything, you should make it
import Data.List.NonEmpty
data List' a = NEList (NonEmpty a) | EmptyList
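As a sketch of how that reshaped type removes the redundancy: with NonEmpty there is exactly one way to represent "empty", so size needs only one base case. The size function and the sample values below are illustrative, not part of the answer.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

data List' a = NEList (NonEmpty a) | EmptyList

-- Only one empty representation exists, so there is a single base case.
size :: List' a -> Int
size EmptyList   = 0
size (NEList xs) = NE.length xs

-- Sample values: a three-element list and the empty list.
threeItems :: List' String
threeItems = NEList ("a" :| ["b", "c"])

noItems :: List' String
noItems = EmptyList
```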
Another option that would work for this specific example is to make the empty case into a “catch-all pattern”:
size :: List -> Int
size lst = case lst of
(List (x:xs)) -> 1 + size (List xs)
_ -> 0
BTW there's no reason to use case here, you can also just write two function clauses:
size :: List -> Int
size (List (x:xs)) = 1 + size (List xs)
size _ = 0
Anyways – this is generally discouraged, because catch-all clauses are an easy place for hard to detect bugs to creep in if you extend your data type in the future.
Also possible, but even worse style is to use a boolean guard match – this can easily use lookups in a list of options, like
size lst | lst `elem` [EmptyList, List []] = 0
size (List (x:xs)) = 1 + size (List xs)
Equality checks should be avoided if possible; they introduce an Eq constraint which, quite needlessly, will require the elements to be equality-comparable. An equality check is also often more computationally expensive than a pattern match.
Another option, if you can't change the data structure itself but would like to work with it as if List [] and EmptyList were the same thing, would be to write custom pattern synonyms. This is a relatively recent feature of Haskell; it kind of pretends the data structure is actually different – like List' – from how it's really laid out.
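A minimal sketch of the pattern-synonym approach, assuming GHC with the PatternSynonyms and ViewPatterns extensions. The helper view and the synonym names Empty and Cons are hypothetical choices for illustration; the synonyms here are matching-only (unidirectional).

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

data List = List [String] | EmptyList deriving Show

-- Hypothetical helper: one view of the data in which both empty
-- representations look the same.
view :: List -> Maybe (String, List)
view (List (x : xs)) = Just (x, List xs)
view (List [])       = Nothing
view EmptyList       = Nothing

-- Matching-only pattern synonyms built on that view.
pattern Empty :: List
pattern Empty <- (view -> Nothing)

pattern Cons :: String -> List -> List
pattern Cons x xs <- (view -> Just (x, xs))

-- Tell the exhaustiveness checker that these two patterns cover every List.
{-# COMPLETE Empty, Cons #-}

size :: List -> Int
size Empty       = 0
size (Cons _ xs) = 1 + size xs
```

Consumers such as size now see a single empty case, while the underlying data declaration stays untouched.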
In the comments, you say
there are no such functions [which should return different results for EmptyList and List []]
therefore I recommend merging these two constructors in the type itself:
data List = List [String] deriving Show
Now you no longer need to distinguish between EmptyList and List [] in functions that consume a List.
...in point of fact, I would go even further and elide the definition entirely, simply using [String] everywhere instead. There is one exception to this: if you need to define an instance for a class that differs in behavior from [String]'s existing instance. In that exceptional case, defining a new type is sensible; but I would use newtype instead of data, for the usual efficiency and semantics reasons.

The meaning of the universal quantification

I am trying to understand the meaning of universal quantification from the following page: http://dev.stephendiehl.com/hask/#universal-quantification.
I am not sure whether I understand this sentence correctly:
The essence of universal quantification is that we can express
functions which operate the same way for a set of types and whose
function behavior is entirely determined only by the behavior of all
types in this span.
Let's take the function from the example:
-- ∀a. [a]
example1 :: forall a. [a]
example1 = []
What I can do with example1 is use every function that is defined for the list type.
But I did not get the exact purpose of universal quantification in Haskell.
I need a collection of numbers, and I need to be able to easily insert into the middle of the list, so I decide on making a linked list. Being a savvy Hask-- programmer (Hask-- being the variant of Haskell that does not have universal quantification!), I quickly whip up a type and a length function without trouble:
data IntLinkedList = IntNil | IntCons Int IntLinkedList
length_IntLinkedList :: IntLinkedList -> Int
length_IntLinkedList IntNil = 0
length_IntLinkedList (IntCons _ tail) = 1 + length_IntLinkedList tail
Later I realize it would be handy to have a variant type that can store numbers not quite as big as 1 and not quite as small as 0. No problem...
data FloatLinkedList = FloatNil | FloatCons Float FloatLinkedList
length_FloatLinkedList :: FloatLinkedList -> Int
length_FloatLinkedList FloatNil = 0
length_FloatLinkedList (FloatCons _ tail) = 1 + length_FloatLinkedList tail
Boy that code looks awfully familiar! And if, later, I discover it would be nice to have a variant that can store Strings I am once again left copying and pasting the exact same code, and tweaking the exact same places that are specific to the contained type. Wouldn't it be nice if there were a way to just cook up a linked list once and for all that could contain elements of any single type, and a length function that worked uniformly no matter what elements it had? After all, our length functions above didn't even care what values the elements had. In Haskell, this is exactly what universal quantification gives you: a way to write a single function which works with an entire collection of types.
Here's how it looks:
data LinkedList a = Nil | Cons a (LinkedList a)
length_LinkedList :: forall a. LinkedList a -> Int
length_LinkedList Nil = 0
length_LinkedList (Cons _ tail) = 1 + length_LinkedList tail
The forall says that this function works for all variants of linked lists -- linked lists of Ints, linked lists of Floats, linked lists of Strings, linked lists of functions that take FibbledyGibbets and return linked lists of tuples of Grazbars and WonkyNobbers, ...
How nice! Now instead of separate IntLinkedList and FloatLinkedList types, we can just use LinkedList Int and LinkedList Float for that, and length_LinkedList, implemented once, works for both.
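A small runnable sketch of that claim. The sample values ints and names are made up for illustration, and the ExplicitForAll pragma is there only so the explicit forall is accepted; the variable is called rest rather than tail to avoid shadowing the Prelude function.

```haskell
{-# LANGUAGE ExplicitForAll #-}

data LinkedList a = Nil | Cons a (LinkedList a)

-- One definition, usable at every element type.
length_LinkedList :: forall a. LinkedList a -> Int
length_LinkedList Nil           = 0
length_LinkedList (Cons _ rest) = 1 + length_LinkedList rest

-- The same function applied at two different element types.
ints :: LinkedList Int
ints = Cons 1 (Cons 2 (Cons 3 Nil))

names :: LinkedList String
names = Cons "Grazbar" (Cons "WonkyNobber" Nil)
```

Because the function never inspects an element, nothing about its body has to change between length_LinkedList ints and length_LinkedList names.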

Using different Ordering for Sets

I was reading Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
  compare (ReverseInt a) (ReverseInt b)
    | a == b = EQ
    | a < b  = GT
    | a > b  = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?
No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: [Int] into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't be: you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down (from Data.Ord). To be explicit, you could use Down Int instead of ReverseInt, and you get the instance you wrote out for free!
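A short sketch of the Down approach, assuming the containers package (which ships with GHC) is available. The names descending and largestFirst are illustrative. Note that coerce is applied to plain lists on the way in and out, never to the Set itself.

```haskell
import Data.Coerce (coerce)
import Data.Ord (Down (..))
import qualified Data.Set as Set

-- A set of Ints ordered largest-first, with no hand-written Ord instance.
-- coerce turns [Int] into [Down Int] without traversing the list.
descending :: Set.Set (Down Int)
descending = Set.fromList (coerce ([1 .. 5] :: [Int]))

-- "Ascending" order with respect to Down is descending order on the Ints.
largestFirst :: [Int]
largestFirst = coerce (Set.toAscList descending)
```

Here Set.findMin descending is Down 5, because the reversed ordering makes the largest Int the smallest element of the set.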
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisonKey :: a -> a -> Ordering
                 , ...
                 }
Then, every time you make a Set, you would have to pass in the comparison key.
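To illustrate what carrying the comparison function looks like in practice, here is a deliberately naive sketch backed by a sorted list rather than a tree. The type CmpSet and all of its operations are hypothetical; this is not how Data.Set is implemented, and the elided fields in the record above are not filled in here.

```haskell
-- A toy set that stores its own comparison function next to the data.
data CmpSet a = CmpSet
  { compareElems :: a -> a -> Ordering
  , elements     :: [a]   -- kept sorted and duplicate-free w.r.t. compareElems
  }

-- The comparison has to be supplied whenever a set is created.
empty :: (a -> a -> Ordering) -> CmpSet a
empty cmp = CmpSet cmp []

-- O(n) sorted insertion using the stored comparison.
insert :: a -> CmpSet a -> CmpSet a
insert x (CmpSet cmp xs) = CmpSet cmp (go xs)
  where
    go [] = [x]
    go (y : ys) = case cmp x y of
      LT -> x : y : ys
      EQ -> y : ys
      GT -> y : go ys

toList :: CmpSet a -> [a]
toList = elements
```

For example, toList (foldr insert (empty (flip compare)) [3, 1, 2, 3]) gives [3,2,1]. The type CmpSet Int no longer tells you which ordering is in use, which is the readability cost described above, and operations such as union would have to decide what to do when two sets carry different comparisons.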

How to modify this Haskell square root function to take an array

I have a function that takes an int and returns its square root. However, now I want to modify it so that it takes an array of integers and gives back an array with the square roots of the elements of the first array. I know Haskell does not use loops, so how can this modification be done? Thanks.
intSquareRoot :: Int -> Int
intSquareRoot n = try n where
  try i | i*i > n  = try (i - 1)
        | i*i <= n = i
Don't.
The idea of “looping through some collection”, putting each result in the corresponding slot of its input, is a somewhat trivial, extremely common pattern. Patterns are for OO programmers. In Haskell, when there's a pattern, we want to abstract over it, i.e. give it a simple name that we can always re-use without extra boilerplate.
This particular “pattern” is the functor operation [1]. For lists it's called
map :: (a->b) -> [a]->[b]
more generally (e.g. it'll also work with real arrays; lists aren't actually arrays),
class Functor f where
fmap :: (a->b) -> f a->f b
So instead of defining an extra function
intListSquareRoot :: [Int] -> [Int]
intListSquareRoot = ...
you simply use map intSquareRoot right where you wanted to use that function.
Of course, you could also define that “lifted” version of intSquareRoot,
intListSquareRoot = map intSquareRoot
but that gains you practically nothing over simply inlining the map call right where you need it.
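A minimal runnable sketch of that advice. It reuses intSquareRoot from the question (with the second guard written as otherwise, and assuming a non-negative argument); the names roots and maybeRoot and the sample inputs are made up for illustration.

```haskell
-- Integer square root by counting down from n (assumes n >= 0).
intSquareRoot :: Int -> Int
intSquareRoot n = try n
  where
    try i
      | i * i > n = try (i - 1)
      | otherwise = i

-- No new list function is needed: map applies it to every element.
roots :: [Int]
roots = map intSquareRoot [0, 1, 4, 10, 16, 26]

-- fmap is the same idea for any Functor, here Maybe.
maybeRoot :: Maybe Int
maybeRoot = fmap intSquareRoot (Just 17)
```

Here roots evaluates to [0,1,2,3,4,5] and maybeRoot to Just 4.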
If you insist
That said... it's of course valid to wonder how map itself works. Well, you can manually “loop” through a list by recursion:
map' :: (a->b) -> [a]->[b]
map' _ [] = []
map' f (x:xs) = f x : map' f xs
Now, you could inline your specific function here
intListSquareRoot' :: [Int] -> [Int]
intListSquareRoot' [] = []
intListSquareRoot' (x:xs) = intSquareRoot x : intListSquareRoot' xs
This is not only much more clunky and awkward than quickly inserting the map magic word, it will also often be slower: compilers such as GHC can make better optimisations when they work on higher-level concepts [2] such as folds, than when they have to work again and again with manually defined recursion.
[1] Not to be confused with what many C++ programmers call a “functor”. Haskell uses the word in the correct mathematical sense, which comes from category theory.
[2] This is why languages such as Matlab and APL actually achieve decent performance for special applications, although they are dynamically-typed, interpreted languages: they have this special case of “vector looping” hard-coded into their very syntax. (Unfortunately, this is pretty much the only thing they can do well...)
You can use map:
arraySquareRoot = map intSquareRoot

Counting number of elements in a list that satisfy the given predicate

Does the Haskell standard library have a function that, given a list and a predicate, returns the number of elements satisfying that predicate? Something with type (a -> Bool) -> [a] -> Int. My hoogle search didn't return anything interesting. Currently I am using length . filter pred, which I don't find to be a particularly elegant solution. My use case seems to be common enough to have a better library solution than that. Is that the case or is my premonition wrong?
The length . filter p implementation isn't nearly as bad as you suggest. In particular, it has only constant overhead in memory and speed, so yeah.
For things that use stream fusion, like the vector package, length . filter p will actually be optimized so as to avoid creating an intermediate vector. Lists, however, use what's called foldr/build fusion at the moment, which is not quite smart enough to optimize length . filter p without creating linearly large thunks that risk stack overflows.
For details on stream fusion, see this paper. As I understand it, the reason that stream fusion is not currently used in the main Haskell libraries is that (as described in the paper) about 5% of programs perform dramatically worse when implemented on top of stream-based libraries, while foldr/build optimizations can never (AFAIK) make performance actively worse.
No, there is no predefined function that does this, but I would say that length . filter pred is, in fact, an elegant implementation; it's as close as you can get to expressing what you mean without just invoking the concept directly, which you can't do if you're defining it.
The only alternatives would be a recursive function or a fold, which IMO would be less elegant, but if you really want to:
import Data.List (foldl')
foo :: (a -> Bool) -> [a] -> Int
foo p = foldl' (\n x -> if p x then n+1 else n) 0
This is basically just inlining length into the definition. As for naming, I would suggest count (or perhaps countBy, since count is a reasonable variable name).
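For comparison, here are both formulations side by side under the suggested name. countBy follows the naming advice above; countBy' is simply a label for the length . filter version so the two can be checked against each other.

```haskell
import Data.List (foldl')

-- Strict left fold: one pass, accumulator forced at every step.
countBy :: (a -> Bool) -> [a] -> Int
countBy p = foldl' step 0
  where
    step n x
      | p x       = n + 1
      | otherwise = n

-- The compositional version from the question.
countBy' :: (a -> Bool) -> [a] -> Int
countBy' p = length . filter p
```

Both return the same count for any finite list, for example 5 for countBy even [1 .. 10].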
Haskell is a high-level language. Rather than provide one function for every possible combination of circumstances you might ever encounter, it provides you with a smallish set of functions that cover all of the basics, and you then glue these together as required to solve whatever problem is currently at hand.
In terms of simplicity and conciseness, this is as elegant as it gets. So yes, length . filter pred is absolutely the standard solution. As another example, consider elem, which (as you may know) tells you whether a given item is present in a list. The standard reference implementation for this is actually
elem :: Eq x => x -> [x] -> Bool
elem x = foldr (||) False . map (x ==)
In other words, compare every element in the list to the target element, creating a new list of Bools. Then fold the logical-OR function over this new list.
If this seems inefficient, try not to worry about it. In particular,
The compiler can often optimise away temporary data structures created by code like this. (Remember, this is the standard way to write code in Haskell, so the compiler is tuned to deal with it.)
Even if it can't be optimised away, laziness often makes such code fairly efficient anyway.
(In this specific example, the OR function will terminate the loop as soon as a match is seen - just like what would happen if you hand-coded it yourself.)
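One way to see the early exit is to run the same definition on an infinite list. The sketch below renames the function to elem' so it does not clash with the Prelude's elem; foundInInfinite is an illustrative name.

```haskell
-- Same definition as above, renamed to avoid the Prelude's elem.
elem' :: Eq x => x -> [x] -> Bool
elem' x = foldr (||) False . map (x ==)

-- Terminates even though the list is infinite: (||) never looks at its
-- second argument once the first is True, so the fold stops at the match.
foundInInfinite :: Bool
foundInInfinite = elem' 3 [1 ..]
```

The flip side is that searching an infinite list for an element that is not there never terminates, exactly as a hand-written loop would not.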
As a general rule, write code by gluing together pre-existing functions. Change this only if performance isn't good enough.
This is my amateurish solution to a similar problem. Count the number of negative integers in a list l
nOfNeg l = length(filter (<0) l)
main = print(nOfNeg [0,-1,-2,1,2,3,4] ) --2
No, there isn't!
As of 2020, there is indeed no such idiom in the Haskell standard library yet! One could (and should) however insert an idiom howMany (resembling good old any)
howMany p xs = sum [ 1 | x <- xs, p x ]
-- howMany=(length.).filter
main = print $ howMany (/=0) [0..9]
Try howMany=(length.).filter
I'd do manually
howmany :: (a -> Bool) -> [a] -> Int
howmany _ [] = 0
howmany pred (x:xs) = if pred x then 1 + howmany pred xs
                      else howmany pred xs
