Efficient Rational Resampling with lazy semantics - haskell

To change the sampling rate of a signal, one needs to upsample, filter, then downsample. Doing this naively means inserting zeros into the input signal, correlating with the filter's impulse response, then discarding all but every nth sample of the convolution.
The problem with the naive approach is that there is a lot of useless computation. When convolving with the filter, most of the filter taps are multiplied by zero, and computing the value of samples that will be discarded in the downsampling phase is useless. That's why efficient rational resampling uses polyphase filter banks, where only the computations that are needed are performed.
I wonder if it would be possible to use lazy evaluation to avoid the useless multiplications, while also avoiding explicitly constructing the polyphase filter banks. My ideal solution would be something that resembles the naive approach (upsample, then correlate, then downsample), but does the same computations as the explicit polyphase filter approach.
The downsampling is easy, since values that aren't needed won't be calculated. But I can't figure out how to avoid the multiplications-by-zero in the correlation part. The best I've come up with is to use the Maybe type and upsample with Nothings (instead of zeros):
import Data.Maybe (catMaybes)
import Data.List (tails)

-- upsample by n: emit each input sample followed by (n - 1) Nothings
upsample n xs = upsample' n xs 0
  where upsample' _ []     _       = []
        upsample' n (x:xs) 0       = Just x  : upsample' n xs (n - 1)
        upsample' n xs     counter = Nothing : upsample' n xs (counter - 1)

correlate xs ys = sum $ catMaybes $ zipWith (fmap . (*)) xs ys

firFilter taps signal = map (correlate taps) (tails signal)

downsample _ []     = []
downsample n (x:xs) = x : downsample n (drop (n - 1) xs)

upfirdn up down taps = downsample down . firFilter taps . upsample up
The upfirdn function is indeed just the straightforward approach, and laziness in the downsampling avoids computation, but I think the processor still needs to check whether each value is Nothing in the correlation step.
Is there a way to use laziness to get the same computational savings as the polyphase filter approach? If not, is there a fundamental reason it can't be done?

I don't think laziness is helpful for this kind of problem for two reasons:
In Haskell, laziness is achieved by building unevaluated thunks in memory. This means laziness is not completely free: you still incur the cost of creating the thunk. This cost can be negligible if the evaluation of the thunk is expensive.
However, in your case, for every thunk you are saving yourself a multiplication and an addition, which is only a few CPU instructions. The cost of creating a thunk is probably of the same order of magnitude.
Laziness is helpful when you don't know a priori which elements will be used -- often because the choice depends on the input/environment in some complicated or unknown way, so you would rather defer the decision until later.
In your case, you know exactly which elements will be used: the elements must have indices divisible by n. Therefore, it's going to be more efficient to just iterate through [0, n, 2 * n, 3 * n, ...].
A naive way to add laziness would be to define a lazy multiply-add operation:
(+*) :: (Eq a, Num a) => a -> (a, a) -> a
z +* (_, 0) = z
z +* (x, y) = z + x * y
The operation is biased so that if y is zero the calculation is skipped.
Now, when generating the mask via upsample, there is no need to use Maybe: just yield zero instead of Nothing. Then, to calculate the sum, simply use:
correlate xs ys = foldl' (+*) 0 (zip xs ys)
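To make this concrete, here is a small self-contained sketch (upsampleZ is my own name, not from the question) that pads with literal zeros and wires it up with (+*). Note that it still has to inspect every zero; it merely skips the multiply-add for them:

import Data.List (foldl')

-- Hypothetical zero-padding upsampler: each sample is followed by (n - 1) zeros.
upsampleZ :: Num a => Int -> [a] -> [a]
upsampleZ n = concatMap (\x -> x : replicate (n - 1) 0)

-- The lazy multiply-add and correlation from above, repeated so the snippet
-- stands alone.  Matching on the literal 0 requires the Eq constraint.
(+*) :: (Eq a, Num a) => a -> (a, a) -> a
z +* (_, 0) = z
z +* (x, y) = z + x * y

correlate :: (Eq a, Num a) => [a] -> [a] -> a
correlate xs ys = foldl' (+*) 0 (zip xs ys)

-- ghci> correlate [1, 2, 3] (upsampleZ 2 [10, 20])
-- 70   -- the zero positions hit the first equation of (+*), so no multiplication is done for them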

One does not need to upsample and downsample to resample.
If efficiency and performance are not important, you can resample by simple interpolation at each sample point in the new array of equally spaced samples, and just recompute the needed phases or values of a (low-pass/anti-alias) interpolation polynomial at every new interpolation point (instead of precomputing and caching in a poly-phase table).
This also allows "lazy resampling" by only computing the new samples as needed.
There is a "quick-and-dirty" example of how to do this, using a computed von-Hann-Windowed-Sinc interpolation kernel, in Basic, on my DSP blog here:
http://www.nicholson.com/rhn/dsp.html#3
Since this is just an array function computation at each new sample point, it should not be too difficult to convert this procedural Basic into functional Haskell.
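For readers who would rather see Haskell than Basic, here is a rough, unoptimised sketch of the same idea (my own code, not a port of the linked example): for each new output position, sum the input samples that fall inside the support of a von Hann windowed sinc kernel evaluated on the fly.

-- ratio     = input rate / output rate
-- halfWidth = kernel half-width, in input-sample units
resample :: Double -> Double -> [Double] -> [Double]
resample ratio halfWidth xs =
  [ interpAt (fromIntegral k * ratio) | k <- [0 .. outLen - 1] ]
  where
    n      = length xs
    outLen = floor (fromIntegral n / ratio) :: Int
    interpAt t =
      sum [ (xs !! i) * kernel (t - fromIntegral i)
          | i <- [ max 0 (ceiling (t - halfWidth))
                 .. min (n - 1) (floor (t + halfWidth)) ] ]
    kernel 0 = 1
    kernel d = (sin (pi * d) / (pi * d))              -- sinc
             * (0.5 * (1 + cos (pi * d / halfWidth))) -- von Hann window

Because the output is an ordinary lazy list, only the output samples that are actually demanded get computed, which is the "lazy resampling" mentioned above.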

Related

Use of folding in defining functions

I was introduced to the use of fold in defining functions. I have an idea of how it works, but I'm not sure why one should do it. To me, it feels like it just simplifies the names of the data type and data values... It would be great if you could show me examples where it is significant to use fold.
data List a = Empty | (:-:) a (List a)

-- The elements involved:
-- List a :: *
-- Empty  :: List a
-- (:-:)  :: a -> List a -> List a

foldrList :: (a -> b -> b) -> b -> List a -> b
foldrList f e Empty      = e
foldrList f e (x :-: xs) = f x (foldrList f e xs)
The idea of folding is a powerful one. The fold functions (foldr and foldl in the Haskell base library) come from a family of functions called Higher-Order Functions (for those who don't know - these are functions which take functions as parameters or return functions as their output).
This allows for greater code clarity as the intention of the program is more clearly expressed. A function written using fold functions strongly indicates that there is an intention to iterate over the list and apply a function repeatedly to obtain an output. Using the standard recursive method is fine for simple programs but when complexity increases it can become difficult to understand quickly what is happening.
Greater code re-use can be achieved with folding due to the nature of passing in a function as the parameter. If a program has some behaviour that is affected by the passing of a Boolean or enumeration value then this behaviour can be abstracted away into a separate function. The separate function can then be used as an argument to fold. This achieves greater flexibility and simplicity (as there are 2 simpler functions versus 1 more complex function).
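As a small sketch of that reuse (my own example, built on the List type from the question): the recursion is written once in foldrList, and each new function merely supplies a different combining function.

data List a = Empty | (:-:) a (List a)

foldrList :: (a -> b -> b) -> b -> List a -> b
foldrList _ e Empty      = e
foldrList f e (x :-: xs) = f x (foldrList f e xs)

-- Three different behaviours, one traversal:
sumList :: Num a => List a -> a
sumList = foldrList (+) 0

anyList :: (a -> Bool) -> List a -> Bool
anyList p = foldrList (\x b -> p x || b) False

toList :: List a -> [a]   -- convert to a built-in list
toList = foldrList (:) []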
Higher-Order Functions are also essential for Monads.
Credit to the comments for this question as well for being varied and informative.
Higher-order functions like foldr, foldl, map, zipWith, &c. capture common patterns of recursion so you can avoid writing manually recursive definitions. This makes your code higher-level and more readable: instead of having to step through the code and infer what a recursive function is doing, the programmer can reason about compositions of higher-level components.
For a somewhat extreme example, consider a manually recursive calculation of standard deviation:
standardDeviation numbers = step1 numbers
  where
    -- Calculate length and sum to obtain mean
    step1 = loop 0 0
      where
        loop count sum (x : xs) = loop (count + 1) (sum + x) xs
        loop count sum [] = step2 sum count numbers

    -- Calculate squared differences with mean
    step2 sum count = loop []
      where
        loop diffs (x : xs) = loop ((x - (sum / count)) ^ 2 : diffs) xs
        loop diffs [] = step3 count diffs

    -- Calculate final total and return square root
    step3 count = loop 0
      where
        loop total (x : xs) = loop (total + x) xs
        loop total [] = sqrt (total / count)
(To be fair, I went a little overboard by also inlining the summation, but this is roughly how it may typically be done in an imperative language—manually looping.)
Now consider a version using a composition of calls to standard functions, some of which are higher-order:
standardDeviation numbers            -- The standard deviation
  = sqrt                             -- is the square root
  . mean                             -- of the mean
  . map (^ 2)                        -- of the squares
  . map (subtract                    -- of the differences
      (mean numbers))                --   with the mean
  $ numbers                          -- of the input numbers
  where                              -- where
    mean xs                          -- the mean
      = sum xs                       -- is the sum
      / fromIntegral (length xs)     -- over the length.
This more declarative code is also, I hope, much more readable—and without the heavy commenting, could be written neatly in two lines. It’s also much more obviously correct than the low-level recursive version.
Furthermore, sum, map, and length can all be implemented in terms of folds, as well as many other standard functions like product, and, or, concat, and so on. Folding is an extremely common operation on not only lists, but all kinds of containers (see the Foldable typeclass), because it captures the pattern of computing something incrementally from all elements of a container.
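As a sketch of that claim (the primed names are mine, to avoid clashing with the Prelude):

sum'    :: Num a => [a] -> a
sum'    = foldr (+) 0

length' :: [a] -> Int
length' = foldr (\_ n -> n + 1) 0

map'    :: (a -> b) -> [a] -> [b]
map' f  = foldr (\x acc -> f x : acc) []

and', or' :: [Bool] -> Bool
and'    = foldr (&&) True
or'     = foldr (||) False

concat' :: [[a]] -> [a]
concat' = foldr (++) []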
A final reason to use folds instead of manual recursion is performance: thanks to laziness and the optimisations GHC knows how to perform on fold-based functions, the compiler may fuse a series of folds (maps, &c.) into a single pass over the data.

How does GHC know how to cache one function but not the others?

I'm reading Learn You a Haskell (loving it so far) and it teaches how to implement elem in terms of foldl, using a lambda. The lambda solution seemed a bit ugly to me so I tried to think of alternative implementations (all using foldl):
import qualified Data.Set as Set
import qualified Data.List as List
-- LYAH implementation
elem1 :: (Eq a) => a -> [a] -> Bool
y `elem1` ys =
  foldl (\acc x -> if x == y then True else acc) False ys

-- When I thought about stripping duplicates from a list
-- the first thing that came to my mind was the mathematical set
elem2 :: (Eq a) => a -> [a] -> Bool
y `elem2` ys =
  head $ Set.toList $ Set.fromList $ filter (==True) $ map (==y) ys

-- Then I discovered `nub` which seems to be highly optimized:
elem3 :: (Eq a) => a -> [a] -> Bool
y `elem3` ys =
  head $ List.nub $ filter (==True) $ map (==y) ys
I loaded these functions in GHCi and did :set +s and then evaluated a small benchmark:
3 `elem1` [1..1000000] -- => (0.24 secs, 160,075,192 bytes)
3 `elem2` [1..1000000] -- => (0.51 secs, 168,078,424 bytes)
3 `elem3` [1..1000000] -- => (0.01 secs, 77,272 bytes)
I then tried to do the same on a (much) bigger list:
3 `elem3` [1..10000000000000000000000000000000000000000000000000000000000000000000000000]
elem1 and elem2 took a very long time, while elem3 was instantaneous (almost identical to the first benchmark).
I think this is because GHC knows that 3 is a member of [1..1000000], and the big number I used in the second benchmark is bigger than 1000000, hence 3 is also a member of [1..bigNumber] and GHC doesn't have to compute the expression at all.
But how is it able to automatically cache (or memoize, a term that Land of Lisp taught me) elem3 but not the two other ones?
Short answer: this has nothing to do with caching; in the first two implementations you force Haskell to iterate over all the elements of the list.
No, this is because foldl works left to right, and it will therefore keep iterating over the list until the list is exhausted.
You are therefore better off using foldr: from the moment it finds a 3 in the list, it cuts off the search.
This is because foldr is defined as:
foldr f z [x1, x2, x3] = f x1 (f x2 (f x3 z))
whereas foldl is implemented as:
foldl f z [x1, x2, x3] = f (f (f z x1) x2) x3
Note that the outermost f is applied to x3, the last element, so even if laziness means some operands are never evaluated, foldl still has to walk to the end of the list before it can return anything.
If we implement the foldl and foldr version, we get:
y `elem1l` ys = foldl (\acc x -> if x == y then True else acc) False ys
y `elem1r` ys = foldr (\x acc -> if x == y then True else acc) False ys
We then get:
Prelude> 3 `elem1l` [1..1000000]
True
(0.25 secs, 112,067,000 bytes)
Prelude> 3 `elem1r` [1..1000000]
True
(0.03 secs, 68,128 bytes)
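A slight variant (my own sketch, not part of the original answer) makes the short-circuiting even more explicit by folding with (||), which never demands its second argument once the left one is True:

elemR :: Eq a => a -> [a] -> Bool
elemR y = foldr (\x acc -> x == y || acc) False

-- Unfolding elemR 3 [1, 2, 3, 4, ...] gives:
--   1 == 3 || (2 == 3 || (3 == 3 || <fold of the rest>))
-- (||) returns True at the third element, so <fold of the rest> is never built.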
Stripping the duplicates from the list will not improve the efficiency. What improves the efficiency here is that you use map, which works left-to-right. Note furthermore that nub is lazy, so nub is effectively a no-op here: since you are only interested in the head, Haskell never needs to perform membership checks against the already-seen elements.
The performance is almost identical:
Prelude List> 3 `elem3` [1..1000000]
True
(0.03 secs, 68,296 bytes)
If you work with a Set, however, you do not perform the deduplication lazily: Set.fromList first fetches all the elements of the list, so again you iterate over all the elements instead of cutting off the search after the first hit.
Explanation
foldl starts with the first element of the list (the innermost one in the nesting below), applies the computation, and then does the same recursively with the result and the next element, and so on.
foldl f z [x1, x2, ..., xn] == (...((z `f` x1) `f` x2) `f`...) `f` xn
So in order to produce the result, it has to traverse the whole list.
Conversely, in your function elem3, as everything is lazy, nothing gets computed at all until you call head.
But in order to compute that value, you only need the first element of the (filtered) list, so you only need to go as far as the first 3 encountered in your big list, which happens very soon, so the list is not fully traversed. If you asked for the 1000000th element, elem3 would perform about as badly as the other ones.
Laziness
Laziness ensures that your language is always composable: breaking a function into subfunctions does not change what is computed.
What you are seeing (and what, mishandled, can also lead to space leaks) is really about how control flow works in a lazy language. In both strict and lazy languages your code decides what gets evaluated, but with a subtle difference:
In a strict language, the builder of the function chooses, because it forces evaluation of its arguments: whoever is called is in charge.
In a lazy language, the consumer of the function chooses: whoever calls is in charge. It may choose to evaluate only the first element (by calling head), or every element, provided its own caller chooses to evaluate its own computation as well. There is a whole chain of command deciding what to do.
In that reading, your foldl-based elem function uses that "inversion of control" in an essential way: elem is asked to produce a value; foldl goes deep inside the list; if the element it is looking at is y, it returns the trivial computation True; if not, it forwards the request to the computation acc. In other words, what you read as the values acc, x or even True are really placeholders for computations, which you receive and yield back. And indeed, acc may be some unbelievably complex computation (or a divergent one like undefined); as long as you transfer control to the computation True, your caller will never see the existence of acc.
foldr vs foldl vs foldl' vs speed
As suggested in another answer, foldr probably best matches your intent for how to traverse the list, and it will shield you from space leaks (foldl' will also prevent space leaks if you really want to traverse the other way; plain foldl can lead to a buildup of complex unevaluated computations, although that laziness can be very useful, for circular computations for instance).
But the speed issue is really an algorithmic one. There might be a better data structure for set membership, if and only if you know beforehand that you have a certain pattern of usage.
For instance, it might be useful to pay some upfront cost to build a Set and then have fast membership queries, but that is only worthwhile if you know you will have such a pattern: a few sets and lots of queries against them. Other data structures are optimal for other patterns, and it is interesting to note that, from an API/specification/interface point of view, they usually look the same to the consumer. That is a general phenomenon in any language, and it is why many people love abstract data types/modules in programming.
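A tiny sketch of that "pay upfront, query often" pattern (my own example, not from the answer):

import qualified Data.Set as Set

-- Building the set costs O(n log n) once; each query afterwards is O(log n).
lookupMany :: [Int] -> [Int] -> [Bool]
lookupMany universe queries =
  let s = Set.fromList universe
  in  map (`Set.member` s) queries

-- ghci> lookupMany [1 .. 1000000] [3, -1, 999999]
-- [True,False,True]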
Using foldr and expecting it to be faster really encodes the assumption that, given your static knowledge of your future access pattern, the values whose membership you are likely to test will sit near the beginning of the list. Using foldl would be fine if you expect your values to sit near the end.
Note that with foldl you might walk the spine of the entire list, but you do not construct the element values themselves until you need them, for instance to test for equality, as long as you have not found the searched-for element.

haskell - is my partitions function lazy?

For example
partitions [1,2,3] =
[([],[1,2,3])
,([1],[2,3])
,([1,2],[3])
,([1,2,3],[])]
partitions :: [a] -> [([a], [a])]
partitions []     = [([], [])]
partitions (x:xs) = ([], x:xs) : [(x:ys, rs) | (ys, rs) <- partitions xs]
I wonder whether this is a lazy solution. For example, partitions [1..] is infinite, and the pairs in take 5 $ partitions [1..] still contain infinite lists. I think that is obvious, given that the result of this function is infinite. However, I am not sure whether it is lazy, if I understand laziness correctly.
There are different degrees of laziness.
One might say that your function is strict since partitions undefined triggers an exception, but that would be too pedantic.
Chances are that by "lazy" you actually mean "it will produce a part the output after having accessed only a part of the input". Several degrees of laziness then arise, depending on how much input is needed for each part of the output.
In your case, the shape of the function is as follows:
foo [] = (some constant value)
foo (x:xs) = C expression1 ... expressionN
where C is a value constructor. More precisely, C = (:) and N=2. Since Haskell constructors are lazy (unless bang annotations are involved), the result of foo (x:xs) will always be non-bottom: consuming an element in the input list is enough to produce an element of the output list.
You might be confused by the output of partitions [1..] being an infinite list of pairs (xs, ys) where each ys is an infinite list. This makes the notion of laziness much more complex, since you might now wonder, e.g., "how much input is accessed for me to take the 100th pair of the output, and then access the 500th element of its second component?". Such questions are perfectly legitimate, and tricky to answer in general.
Still, your code will never demand the full input list to output a finite portion of the output. This makes it "lazy".
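A quick GHCi check (my own, repeating the definition with its base case so it can be pasted directly) confirms this:

partitions :: [a] -> [([a], [a])]
partitions []     = [([], [])]
partitions (x:xs) = ([], x:xs) : [(x:ys, rs) | (ys, rs) <- partitions xs]

-- ghci> map (take 3 . snd) (take 4 (partitions [1..]))
-- [[1,2,3],[2,3,4],[3,4,5],[4,5,6]]
-- ghci> fst (partitions [1..] !! 3)
-- [1,2,3]
-- Both answers come back after inspecting only a finite prefix of [1..].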
For completeness, let me show a non lazy function:
partitions (x:xs) = case partitions xs of
  []     -> expression0
  (y:ys) -> expression1 : expression2
Above, the result of the recursive call is demanded before the head of the output list is produced. That will demand the whole input before any part of the output is generated.

A pipeline of maps/folds where each "transformation layer" is run in parallel in Haskell? ("Vertical" parallelism as opposed to "horizontal" parMap.)

Most questions I've seen regarding parallel list processing are concerned with the kind of parallelism achieved by chunking the list and processing each chunk in parallel to each other.
My question is different.
I have something simpler/more stupid in mind concerning a sequence of maps and folds: what if we want to simply set up a job for the first map which should be done in parallel to the second map?
The structure of the computation I'm thinking of:
xs -- initial data
ys = y1 : y2 : ... : yn -- y1 = f x1, ... and so on.
-- computed in one parallel job.
zs = z1 : z2 : ... : zn -- z1 = g y1, ... and so on.
-- computed in another job (the applications of `g`), i.e., the "main" job.
Will something in the spirit of the following code work?
ys = map f xs
zs = ys `par` map g' ys
  where g' y = y `pseq` g y
I'd only need to say that ys should be evaluated with a kind of
deepSeq instead of simply writing:
ys `par` ...
So while the main job would be busy with computing a g, we are also forcing the premature computation of ys in parallel.
Is there anything wrong with this approach?
The documentation and examples for par and pseq are a bit too scarce for me to understand how this will work out. The difference between my code and the examples I have seen is that, in my code, the values on the left-hand sides of par and pseq are different values.
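Concretely, here is a minimal sketch of what I mean by "a kind of deepSeq", using force from the deepseq package (whether this actually buys anything is exactly what I am unsure about):

import Control.Parallel (par)
import Control.DeepSeq (NFData, force)

-- Spark the full evaluation of ys while the calling thread maps g over it.
pipeline :: NFData b => (a -> b) -> (b -> c) -> [a] -> [c]
pipeline f g xs = force ys `par` map g ys
  where
    ys = map f xs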
Discussion
I can think of similar parallelization for other kinds of transformations (fold, scan, and more complex compositions).
For one thing, I'm afraid that the elements of ys could be evaluated
twice if g is too quick...
This should give a fixed two-times speedup with 2 cores.
And if there are more such costly transformation nodes (say, N) in my pipeline, I'd get a fixed N-times speedup.
As for my vertical parallelization vs. their (1, 2, etc.) horizontal one (achieved with parMap): I want to get faster streaming. In other words, I want to see the intermediate results (the incremental inits of zs) sooner.
CORRECTION
It seems that I didn't understand pseq. Consider my old code from above:
zs = ys `par` map g' ys
where g' y = y `pseq` g y
and re-read the documentation for pseq:
seq is strict in both its arguments, so the compiler may, for
example, rearrange
a `seq` b
into ... . ... it can be a problem when annotating code for
parallelism, because we need more control over the order of
evaluation; we may want to evaluate a before b, because we know
that b has already been sparked in parallel with par.
So, in my case, y is a part of the value which I want to have sparked and forced with par. So it's like b, and there is no need/sense to put it under a pseq, right?
But I'm still a bit afraid that its computation could accidentally be duplicated if we are too fast in the map...
I've also had a look at Ch.2 of Parallel and Concurrent Programming in Haskell, but they talk about rpar and rseq...
And they seem to imply that it is OK to do rpar/rpar or rpar/rseq/rseq, without extra worrying about waiting for the value in the rpar/rpar case. Have I got something wrong?

Complexity of two cumulative sum (cumsum) functions in Haskell

Consider the following two cumulative sum (cumsum) functions:
cumsum :: Num a => [a] -> [a]
cumsum [] = []
cumsum [x] = [x]
cumsum (x:y:ys) = x : (cumsum $ (x+y) : ys)
and
cumsum' :: Num a => [a] -> [a]
cumsum' x = [sum $ take k x | k <- [1..length x]]
Of course, I prefer the definition of cumsum to that of cumsum' and I understand that the former has linear complexity.
But just why does cumsum' also have linear complexity? take itself has linear complexity in the length of its argument and k runs from 1 to length x. Therefore I'd have expected quadratic complexity for cumsum'.
Moreover, the constant of cumsum' is lower than that of cumsum. Is that due to the recursive list appending of the latter?
NOTE: welcoming any smart definition of a cumulative sum.
EDIT: I'm measuring execution times using (after enabling :set +s in GHCi):
last $ cumsum [1..n]
This is a measurement error caused by laziness.
Every value in Haskell is lazy: it isn't evaluated until necessary. This includes sub-structure of values - so for example when we see a pattern (x:xs) this only forces evaluation of the list far enough to identify that the list is non-empty, but it doesn't force the head x or the tail xs.
The definition of last is something like:
last [x] = x
last (x:xs) = last xs
So when last is applied to the result of cumsum', it inspects the list comprehension recursively, but only enough to track down the last entry. It doesn't force any of the entries, but it does return the last one.
When this last entry is printed in ghci or whatever, then it is forced which takes linear time as expected. But the other entries are never calculated so we don't see the "expected" quadratic behaviour.
Using maximum instead of last does demonstrate that cumsum' is quadratic whereas cumsum is linear.
[Note: this explanation is somewhat hand-wavy: really, evaluation is entirely driven by what's needed for the final result, so even last is only evaluated at all because its result is needed. Search for things like "Haskell evaluation order" and "Weak Head Normal Form" to get a more precise explanation.]
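As an aside on the question's NOTE: a common idiomatic cumulative sum is the prefix scan scanl1 (+), which is linear, streams lazily, and avoids the explicit recursion:

cumsumScan :: Num a => [a] -> [a]
cumsumScan = scanl1 (+)

-- ghci> cumsumScan [1, 2, 3, 4]
-- [1,3,6,10]
-- ghci> take 5 (cumsumScan [1..])
-- [1,3,6,10,15]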

Resources