Updating a value in Data.ByteString - haskell

The C language provides a very handy way of updating the nth element of an array: array[n] = new_value. My understanding of the Data.ByteString type is that it provides functionality very similar to a C array of uint8_t - access by index via index :: ByteString -> Int -> Word8. It appears that the opposite operation - updating a value - is not that easy.
My initial approach was to use the take, drop and singleton functions, concatenated in the following way:
updateValue :: ByteString -> Int -> Word8 -> ByteString
updateValue bs n value = concat [take n bs, singleton value, drop (n+1) bs]
(this is a very naive implementation as it does not handle edge cases)
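For illustration, a bounds-checked variant might look like the sketch below; this is just my own sketch (updateValueSafe is a made-up name), returning the input unchanged for an out-of-range index:
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Word (Word8)
-- Hypothetical bounds-checked variant: out-of-range indices leave the
-- ByteString unchanged instead of silently corrupting it.
updateValueSafe :: ByteString -> Int -> Word8 -> ByteString
updateValueSafe bs n value
    | n < 0 || n >= B.length bs = bs
    | otherwise = B.concat [B.take n bs, B.singleton value, B.drop (n + 1) bs]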
Coming from a C background, it feels a bit too heavyweight to call 4 functions to update one value. Theoretically, the operation complexity is not that bad:
take is O(1)
drop is O(1)
singleton is O(1)
concat is O(n), but here I am not sure whether n is the total length of the concatenated result or just, in our case, 3 (the number of pieces).
My second approach was to ask Hoogle for a function with a similar type signature: ByteString -> Int -> a -> ByteString, but nothing appropriate appeared.
Am I missing something very obvious, or is really that complex to update the value?
I would like to note that I understand that a ByteString is immutable and that changing any of its elements will result in a new ByteString instance.
EDIT:
A possible solution that I found while reading about the Control.Lens library uses the ix traversal together with the set operator (.~). The following is an excerpt from a GHCi session (with module names omitted):
> import Data.ByteString
> import Control.Lens
> let clock = pack [116, 105, 99, 107]
> clock
"tick"
> let clock2 = clock & ix 1 .~ 111
> clock2
"tock"

One solution is to convert the ByteString to a Storable Vector, then modify that:
import Data.ByteString (ByteString)
import Data.Vector.Storable (modify)
import Data.Vector.Storable.ByteString -- provided by the "spool" package
import Data.Vector.Storable.Mutable (write)
import Data.Word (Word8)
updateAt :: Int -> Word8 -> ByteString -> ByteString
updateAt n x = vectorToByteString . modify inner . byteStringToVector
  where
    inner v = write v n x
See the documentation for spool and vector.
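If pulling in an extra package is undesirable, the same copy-then-poke idea can be expressed with bytestring's own internal and unsafe modules. This is only an untested sketch of mine (updateAt' is a made-up name) and it assumes n is a valid index:
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (castPtr)
import Foreign.Storable (pokeByteOff)
-- Copy the whole ByteString once, then overwrite the nth byte in the copy.
updateAt' :: Int -> Word8 -> B.ByteString -> B.ByteString
updateAt' n x s = BI.unsafeCreate len $ \dst ->
    BU.unsafeUseAsCStringLen s $ \(src, _) -> do
        copyBytes dst (castPtr src) len
        pokeByteOff dst n x
  where
    len = B.length s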

Related

Inverse of `Data.Text.Encoding.decodeLatin1`?

Is there a function f :: Text -> Maybe ByteString such that forall x:
f (decodeLatin1 x) == Just x
Note, decodeLatin1 has the signature:
decodeLatin1 :: ByteString -> Text
I'm concerned that encodeUtf8 is not what I want, as I'm guessing what it does is just dump the UTF-8 string out as a ByteString, not reverse the changes that decodeLatin1 made on the way in to characters in the upper half of the character set.
I understand that f has to return a Maybe, because in general there are Unicode characters that aren't in the Latin character set, but I just want this to round trip at least, in that if we start with a ByteString we should get back to it.
DISCLAIMER: consider this a long comment rather than a solution, because I haven't tested it.
I think you can do it with the witch library. It is a general-purpose type-converter library with a fair amount of type safety. There is a type class called TryFrom for conversions between types that might fail.
Luckily, witch provides conversions to/from encodings too, with an instance TryFrom Text (ISO_8859_1 ByteString), meaning that you can convert between Text and a latin1-encoded ByteString. So I think (not tested!!) this should work:
{-# LANGUAGE TypeApplications #-}
import Witch (tryInto, ISO_8859_1)
import Data.Tagged (Tagged(unTagged))
import Data.Text (Text)
import Data.ByteString (ByteString)
f :: Text -> Maybe ByteString
f s = case tryInto @(ISO_8859_1 ByteString) s of
    Left err -> Nothing
    Right bs -> Just (unTagged bs)
Notice that tryInto returns an Either TryFromException value, so if you want to handle errors you can do it with Either. Up to you.
Also, the witch docs point out that this conversion is done via the String type, so probably there is an out-of-the-box solution without depending on the witch package. I don't know of such a solution, and looking at the source code hasn't helped.
Edit:
Having read the witch source code, apparently this should work:
import Data.Char (isLatin1)
import Data.Text (Text)
import qualified Data.Text as T
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as C
f :: Text -> Maybe ByteString
f t = if allCharsAreLatin then Just (C.pack str) else Nothing
  where str = T.unpack t
        allCharsAreLatin = all isLatin1 str
The latin1 encoding is pretty damn simple -- codepoint X maps to byte X, whenever that's in range of a byte. So just unpack and repack immediately.
import Control.Monad
import qualified Data.Text as T
import qualified Data.ByteString.Char8 as BS
latin1EncodeText :: T.Text -> Maybe BS.ByteString
latin1EncodeText t = BS.pack (T.unpack t) <$ guard (T.all (<'\256') t)
It's possible to avoid the intermediate String, but you should probably make sure this is your bottleneck before trying for that.
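For what it's worth, one untested sketch of mine that skips the intermediate String folds the Text directly into a ByteString Builder; the truncating fromIntegral is only safe because of the guard:
import Control.Monad (guard)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BSL
import qualified Data.Text as T
-- Encode each Char as one byte via a Builder; the guard rejects any
-- codepoint above 255 before anything is truncated.
latin1EncodeText' :: T.Text -> Maybe BS.ByteString
latin1EncodeText' t =
    encoded <$ guard (T.all (< '\256') t)
  where
    encoded = BSL.toStrict (BB.toLazyByteString (T.foldr step mempty t))
    step c acc = BB.word8 (fromIntegral (fromEnum c)) <> acc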

Can I exploit lazy evaluation to reference future values without space leaks?

I'm looking to run a moderately expensive function on a large list of inputs, using part of the output of that function as one of its inputs. The code runs as expected; unfortunately, it consumes a large amount of memory in the process (just under 22 GiB allocated on the heap, just over 1 GiB maximum residency). Here is a simplified example of what I mean:
{-# LANGUAGE OverloadedStrings #-}
import Data.List (foldl')
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TL
import qualified Data.Text.Lazy.Builder as TB
main :: IO ()
main = TL.putStr $ TB.toLazyText showInts
showInts :: TB.Builder
showInts = foldMap fst shownLines
  where
    shownLines = map (showInt maxwidth) [0..10^7]
    maxwidth = foldl' (\n -> max n . snd) 0 shownLines
showInt :: Int -> Int -> (TB.Builder, Int)
showInt maxwidth n = (builder, len)
  where
    builder = TB.fromText "This number: "
           <> TB.fromText (T.replicate (maxwidth - len) " ") <> thisText
           <> TB.singleton '\n'
    (thisText, len) = expensiveShow n
expensiveShow :: Int -> (TB.Builder, Int)
expensiveShow n = (TB.fromText text, T.length text)
  where text = T.pack (show n)
Note that in the where clause of showInts, showInt takes maxwidth as an argument, where maxwidth itself depends on the output of running showInt maxwidth on the whole list.
If, on the other hand, I do the naive thing and replace the definition of maxwidth with foldl' max 0 $ map (snd . expensiveShow) [0..10^7], then maximum residency falls to just 44KiB. I would hope that performance like this would be achievable without workarounds like precomputing expensiveShow and then zipping it with the list [0..10^7].
I tried consuming the list strictly (using the foldl package), but this did not improve the situation.
I'm trying to have my cake and eat it too: exploiting laziness, while also making things strict enough that we don't build up a mountain of thunks. Is this possible to do? Or is there a better technique for accomplishing this?
You can't do it like this.
The problem is that your showInts has to traverse the list twice, first to find the longest number, second to print the numbers with the necessary format. That means the list has to be held in memory between the first and second passes. This isn't a problem with unevaluated thunks; it is simply that the whole list, completely evaluated, is being traversed twice.
The only solution is to generate the same list twice. In this case it is trivial; just have two [0..10^7] values, one for the maximum length and the second to format them. I suspect in your real application you are reading them from a file or something, in which case you need to read the file twice.
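For concreteness, a minimal sketch of that two-pass shape, reusing showInt and expensiveShow from the question; writing [0..10^7] out twice means neither traversal forces the other's list to be retained:
showInts :: TB.Builder
showInts = foldMap (fst . showInt maxwidth) [0..10^7]
  where
    maxwidth = foldl' (\n -> max n . snd . expensiveShow) 0 [0..10^7]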

Efficient bitstreams in Haskell

In an ongoing endeavour to efficiently fiddle with bits (e.g. see this SO question) the newest challenge is the efficient streaming and consumption of bits.
As a first simple task I chose to find the longest sequence of identical bits in a bitstream generated by /dev/urandom. A typical incantation would be head -c 1000000 </dev/urandom | my-exe. The actual goal is to stream bits and decode, for example, an Elias gamma code, i.e. codes that are not whole bytes or multiples thereof.
For such codes of variable length it is nice to have the take, takeWhile, group, etc. language for list manipulation. Since a BitStream.take would actually consume part of the bitstream, some monad would probably come into play.
The obvious starting point is the lazy bytestring from Data.ByteString.Lazy.
A. Counting bytes
This very simple Haskell program performs on par with a C program, as is to be expected.
import qualified Data.ByteString.Lazy as BSL
main :: IO ()
main = do
    bs <- BSL.getContents
    print $ BSL.length bs
B. Adding bytes
Once I start using unpack things should get worse.
main = do
    bs <- BSL.getContents
    print $ sum $ BSL.unpack bs
Surprisingly, Haskell and C show almost the same performance.
C. Longest sequence of identical bits
As a first nontrivial task the longest sequence of identical bits can be found like this:
module Main where
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Lazy as BSL
import Data.List (group)
import Data.Word8 (Word8)
splitByte :: Word8 -> [Bool]
splitByte w = Prelude.map (\i-> (w `shiftR` i) .&. 1 == 1) [0..7]
bitStream :: BSL.ByteString -> [Bool]
bitStream bs = concat $ map splitByte (BSL.unpack bs)
main :: IO ()
main = do
    bs <- BSL.getContents
    print $ maximum $ length <$> (group $ bitStream bs)
The lazy bytestring is converted to a list [Word8] and then, using shifts, each Word8 is split into its bits, giving a [Bool] per byte. This list of lists is then flattened with concat. Having obtained a (lazy) list of Bool, use group to split the list into sequences of identical bits and then map length over it. Finally maximum gives the desired result. Quite simple, but not very fast:
# C
real 0m0.606s
# Haskell
real 0m6.062s
This naive implementation is exactly one order of magnitude slower.
Profiling shows that quite a lot of memory gets allocated (about 3GB for parsing 1MB of input). There is no massive space leak to be observed, though.
From here I start poking around:
There is a bitstream package that promises "Fast, packed, strict bit streams (i.e. list of Bools) with semi-automatic stream fusion." Unfortunately it is not up to date with the current vector package; see here for details.
Next, I investigate streaming. I don't quite see why I should need 'effectful' streaming that brings some monad into play - at least until I start with the reverse of the posed task, i.e. encoding and writing bitstreams to file.
How about just fold-ing over the ByteString? I'd have to introduce state to keep track of consumed bits. That's not quite the nice take, takeWhile, group, etc. language that is desirable.
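For concreteness, such a fold might look like the sketch below (entirely my own and unbenchmarked); it trades the list vocabulary for a strict accumulator, which anticipates the answers that follow:
import Data.Bits (testBit)
import qualified Data.ByteString.Lazy as BSL
import Data.List (foldl')
import Data.Word (Word8)
-- Strict accumulator: previous bit, current run length, best run length.
data Run = Run !Bool !Int !Int
longestRunFold :: BSL.ByteString -> Int
longestRunFold bs = case BSL.foldl' stepByte (Run False 0 0) bs of
    Run _ _ best -> best
  where
    stepByte :: Run -> Word8 -> Run
    stepByte st w = foldl' stepBit st (map (testBit w) [0..7])
    stepBit (Run prev run best) b =
        let run' = if b == prev then run + 1 else 1
        in Run b run' (max best run')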
And now I'm not quite sure where to go.
Update:
I figured out how to do this with streaming and streaming-bytestring. I'm probably not doing this right because the result is catastrophically bad.
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Streaming as BSS
import Data.Word8 (Word8)
import qualified Streaming as S
import Streaming.Prelude (Of, Stream)
import qualified Streaming.Prelude as S
splitByte :: Word8 -> [Bool]
splitByte w = (\i-> (w `shiftR` i) .&. 1 == 1) <$> [0..7]
bitStream :: Monad m => Stream (Of Word8) m () -> Stream (Of Bool) m ()
bitStream s = S.concat $ S.map splitByte s
main :: IO ()
main = do
    let bs = BSS.unpack BSS.getContents :: Stream (Of Word8) IO ()
        gs = S.group $ bitStream bs :: Stream (Stream (Of Bool) IO) IO ()
    maxLen <- S.maximum $ S.mapped S.length gs
    print $ S.fst' maxLen
This will test your patience with anything beyond a few thousand bytes of input from stdin. The profiler says it spends an insane amount of time (quadratic in the input size) in Streaming.Internal.>>=.loop and Data.Functor.Of.fmap. I'm not quite sure what the first one is, but the fmap indicates (?) that the juggling of these Of a b isn't doing us any good and because we're in the IO monad it can't be optimised away.
I also have the streaming equivalent of the byte adder here: SumBytesStream.hs, which is slightly slower than the simple lazy ByteString implementation, but still decent. Since streaming-bytestring is proclaimed to be "bytestring io done right" I expected better. I'm probably not doing it right, then.
In any case, all these bit-computations shouldn't be happening in the IO monad. But BSS.getContents forces me into the IO monad because getContents :: MonadIO m => ByteString m () and there's no way out.
Update 2
Following the advice of @dfeuer I used the streaming package at master#HEAD. Here's the result.
longest-seq-c 0m0.747s (C)
longest-seq 0m8.190s (Haskell ByteString)
longest-seq-stream 0m13.946s (Haskell streaming-bytestring)
The O(n^2) problem of Streaming.concat is solved, but we're still not getting closer to the C benchmark.
Update 3
Cirdec's solution produces performance on par with C. The construct that is used is called "Church encoded lists"; see this SO answer or the Haskell Wiki on rank-N types.
Source files:
All the source files can be found on github. The Makefile has all the various targets to run the experiments and the profiling. The default make will just build everything (create a bin/ directory first!) and then make time will do the timing on the longest-seq executables. The C executables get a -c appended to distinguish them.
Intermediate allocations and their corresponding overhead can be removed when operations on streams fuse together. The GHC Prelude provides foldr/build fusion for lazy lists in the form of rewrite rules. The general idea is that if one function produces a list in build form (a function of type (a -> b -> b) -> b -> b which, applied to (:) and [], recovers the list) and another function consumes that list with a foldr, constructing the intermediate list can be eliminated.
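As a toy illustration of that shape (my own example, not part of the answer), both the producer and the consumer here participate in the Prelude's foldr/build rules, so with optimisations on the intermediate list should be fused away:
-- The enumeration and map are "build"-style producers; foldr consumes them,
-- so GHC's rewrite rules can eliminate the intermediate [Int].
doubleSum :: Int -> Int
doubleSum n = foldr (+) 0 (map (* 2) [1 .. n])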
For your problem I'm going to build something similar, but using strict left folds (foldl') instead of foldr. Instead of using rewrite rules that try to detect when something looks like a foldl, I'll use a data type that forces lists to look like left folds.
-- A list encoded as a strict left fold.
newtype ListS a = ListS {build :: forall b. (b -> a -> b) -> b -> b}
Since I've started by abandoning lists we'll be re-implementing part of the prelude for lists.
Strict left folds can be created from the foldl' functions of both lists and bytestrings.
{-# INLINE fromList #-}
fromList :: [a] -> ListS a
fromList l = ListS (\c z -> foldl' c z l)
{-# INLINE fromBS #-}
fromBS :: BSL.ByteString -> ListS Word8
fromBS l = ListS (\c z -> BSL.foldl' c z l)
The simplest example of using one is to find the length of a list.
{-# INLINE length' #-}
length' :: ListS a -> Int
length' l = build l (\z a -> z+1) 0
We can also map and concatenate left folds.
{-# INLINE map' #-}
-- fmap renamed so it can be inlined
map' :: (a -> b) -> ListS a -> ListS b
map' f l = ListS (\c z -> build l (\z a -> c z (f a)) z)
{-# INLINE concat' #-}
concat' :: ListS (ListS a) -> ListS a
concat' ll = ListS (\c z -> build ll (\z l -> build l c z) z)
For your problem we need to be able to split a word into bits.
{-# INLINE splitByte #-}
splitByte :: Word8 -> [Bool]
splitByte w = Prelude.map (\i-> (w `shiftR` i) .&. 1 == 1) [0..7]
{-# INLINE splitByte' #-}
splitByte' :: Word8 -> ListS Bool
splitByte' = fromList . splitByte
And a ByteString into bits
{-# INLINE bitStream' #-}
bitStream' :: BSL.ByteString -> ListS Bool
bitStream' = concat' . map' splitByte' . fromBS
To find the longest run we'll keep track of the previous value, the length of the current run, and the length of the longest run. We make the fields strict so that the strictness of the fold prevents chains of thunks from being accumulated in memory. Making a strict data type for a state is an easy way to get control over both its memory representation and when its fields are evaluated.
data LongestRun = LongestRun !Bool !Int !Int
{-# INLINE extendRun #-}
extendRun :: LongestRun -> Bool -> LongestRun
extendRun (LongestRun previous run longest) x = LongestRun x current (max current longest)
  where
    current = if x == previous then run + 1 else 1
{-# INLINE longestRun #-}
longestRun :: ListS Bool -> Int
longestRun l = longest
  where
    (LongestRun _ _ longest) = build l extendRun (LongestRun False 0 0)
And we're done
main :: IO ()
main = do
    bs <- BSL.getContents
    print $ longestRun $ bitStream' bs
This is much faster, but not quite the performance of C.
longest-seq-c 0m00.12s (C)
longest-seq 0m08.65s (Haskell ByteString)
longest-seq-fuse 0m00.81s (Haskell ByteString fused)
The program allocates about 1 MB to read 1,000,000 bytes of input.
total alloc = 1,173,104 bytes (excludes profiling overheads)
Updated github code
I found another solution that is on par with C. The Data.Vector.Fusion.Stream.Monadic module has a stream implementation based on the Coutts, Leshchinskiy, Stewart 2007 stream fusion paper. The idea behind it is destroy/unfoldr stream fusion.
Recall that list's unfoldr :: (b -> Maybe (a, b)) -> b -> [a] creates a list by repeatedly applying (unfolding) a step-forward function, starting with an initial value. A Stream is just an unfoldr function with starting state. (The Data.Vector.Fusion.Stream.Monadic library uses GADTs to create constructors for Step that can be pattern-matched conveniently. It could just as well be done without GADTs, I think.)
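As a small aside (my own example, not part of the solution), the bits of a single byte can already be produced with the Prelude's unfoldr, least significant bit first, matching splitByte above:
import Data.Bits (shiftR, (.&.))
import Data.List (unfoldr)
import Data.Word (Word8)
-- Unfold eight bits out of one byte: the state is the remaining byte
-- plus a count of bits already emitted.
byteBits :: Word8 -> [Bool]
byteBits w0 = unfoldr step (w0, 0 :: Int)
  where
    step (w, i)
        | i >= 8    = Nothing
        | otherwise = Just (w .&. 1 == 1, (w `shiftR` 1, i + 1))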
The central piece of the solution is the mkBitstream :: BSL.ByteString -> Stream Bool function that turns a ByteString into a stream of Bool. Basically, we keep track of the current ByteString, the current byte, and how much of the current byte is still unconsumed. Whenever a byte is used up, another byte is chopped off the ByteString. When nothing is left, the stream is Done.
The longestRun function is taken straight from @Cirdec's solution.
Here's the etude:
{-# LANGUAGE CPP #-}
#define PHASE_FUSED [1]
#define PHASE_INNER [0]
#define INLINE_FUSED INLINE PHASE_FUSED
#define INLINE_INNER INLINE PHASE_INNER
module Main where
import Control.Monad.Identity (Identity)
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Lazy as BSL
import Data.Functor.Identity (runIdentity)
import qualified Data.Vector.Fusion.Stream.Monadic as S
import Data.Word8 (Word8)
type Stream a = S.Stream Identity a -- no need for any monad, really
data Step = Step BSL.ByteString !Word8 !Word8 -- could use tuples, but this is faster
mkBitstream :: BSL.ByteString -> Stream Bool
mkBitstream bs' = S.Stream step (Step bs' 0 0) where
    {-# INLINE_INNER step #-}
    step (Step bs w n)
        | n == 0 = case BSL.uncons bs of
            Nothing -> return S.Done
            Just (w', bs') -> return $
                S.Yield (w' .&. 1 == 1) (Step bs' (w' `shiftR` 1) 7)
        | otherwise = return $
            S.Yield (w .&. 1 == 1) (Step bs (w `shiftR` 1) (n-1))
data LongestRun = LongestRun !Bool !Int !Int
{-# INLINE extendRun #-}
extendRun :: LongestRun -> Bool -> LongestRun
extendRun (LongestRun previous run longest) x = LongestRun x current (max current longest)
    where current = if x == previous then run + 1 else 1
{-# INLINE longestRun #-}
longestRun :: Stream Bool -> Int
longestRun s = runIdentity $ do
    (LongestRun _ _ longest) <- S.foldl' extendRun (LongestRun False 0 0) s
    return longest
main :: IO ()
main = do
    bs <- BSL.getContents
    print $ longestRun (mkBitstream bs)

Rechunk a conduit into larger chunks using combinators

I am trying to construct a Conduit that receives as input ByteStrings (of around 1kb per chunk in size) and produces as output concatenated ByteStrings of 512kb chunks.
This seems like it should be simple to do, but I'm having a lot of trouble: most of the strategies I've tried have only succeeded in dividing the chunks into smaller ones; I haven't succeeded in concatenating them into larger chunks.
I started out trying isolate, then takeExactlyE and eventually conduitVector, but to no avail. Eventually I settled on this:
import qualified Data.Conduit as C
import qualified Data.Conduit.Combinators as C
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Control.Monad (when)
import Data.Monoid ((<>))
chunksOfAtLeast :: Monad m => Int -> C.Conduit B.ByteString m BL.ByteString
chunksOfAtLeast chunkSize = loop BL.empty chunkSize
  where
    loop buffer n = do
      mchunk <- C.await
      case mchunk of
        Nothing ->
          -- Yield last remaining bytes
          when (n < chunkSize) (C.yield buffer)
        Just chunk -> do
          -- Yield when the buffer has been filled and start over
          let buffer' = buffer <> BL.fromStrict chunk
              l = B.length chunk
          if n <= l
            then C.yield buffer' >> loop BL.empty chunkSize
            else loop buffer' (n - l)
P.S. I decided not to split larger chunks for this function, but this was just a convenient simplification.
However, this seems very verbose given all the conduit functions that deal with chunking[1,2,3,4]. Please help! There must surely be a better way to do this using combinators, but I am missing some piece of intuition!
P.P.S. Is it ok to use lazy bytestring for the buffer as I've done? I'm a bit unclear about the internal representation for bytestring and whether this will help, especially since I'm using BL.length which I guess might evaluate the thunk anyway?
Conclusion
Just to elaborate on Michael's answer and comments, I ended up with this conduit:
import qualified Data.Conduit as C
import qualified Data.Conduit.Combinators as C
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
-- | "Strict" rechunk of a chunked conduit
chunksOfE' :: (MonadBase base m, PrimMonad base)
           => Int
           -> C.Conduit ByteString m ByteString
chunksOfE' chunkSize = C.vectorBuilder chunkSize C.mapM_E =$= C.map fromByteVector
My understanding is that vectorBuilder will pay the cost for concatenating the smaller chunks early on, producing the aggregated chunks as strict bytestrings.
From what I can tell, an alternative implementation that produces lazy bytestring chunks (i.e. "chunked chunks") might be desirable when the aggregated chunks are very large and/or feed into a naturally streaming interface like a network socket. Here's my best attempt at the "lazy bytestring" version:
import qualified Data.Sequences.Lazy as SL
import qualified Data.Sequences as S
import qualified Data.Conduit.List as CL
-- | "Lazy" rechunk of a chunked conduit
chunksOfE :: (Monad m, SL.LazySequence lazy strict)
          => S.Index lazy
          -> C.Conduit strict m lazy
chunksOfE chunkSize = CL.sequence C.sinkLazy =$= C.takeE chunkSize
How about this?
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit
chunksOfAtLeast :: Monad m => Int -> Conduit ByteString m LByteString
chunksOfAtLeast chunkSize =
    loop
  where
    loop = do
        lbs <- takeCE chunkSize =$= sinkLazy
        unless (null lbs) $ do
            yield lbs
            loop
main :: IO ()
main =
    yieldMany ["hello", "there", "world!"]
        $$ chunksOfAtLeast 3
        =$ mapM_C print
There are lots of other approaches that you could take depending on your goals. If you wanted to have a strict buffer, then using blaze-builder or vectorBuilder would make a lot of sense. But this keeps the same type signature you have already.
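For comparison only, here is a sketch of mine of a strict-buffer rechunker written directly against the modern Data.Conduit API (ConduitT), with no extra packages; it buffers strict chunks in reverse and concatenates once the threshold is reached:
import qualified Data.ByteString as B
import Data.Conduit (ConduitT, await, yield)
rechunk :: Monad m => Int -> ConduitT B.ByteString B.ByteString m ()
rechunk chunkSize = go [] 0
  where
    go acc len = do
        mx <- await
        case mx of
            Nothing -> flush acc
            Just x  -> do
                let acc' = x : acc
                    len' = len + B.length x
                if len' >= chunkSize
                    then flush acc' >> go [] 0
                    else go acc' len'
    -- Concatenate the buffered chunks (stored newest-first) and emit them.
    flush [] = return ()
    flush cs = yield (B.concat (reverse cs))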

Efficient hash map container in Haskell?

I want to count unique blocks stored in a file using Haskell.
A block is just 512 consecutive bytes, and the target file has a size of at least 1 GB.
This is my initial try.
import Control.Monad
import qualified Data.ByteString.Lazy as LB
import Data.Foldable
import Data.HashMap
import Data.Int
import qualified Data.List as DL
import System.Environment
type DummyDedupe = Map LB.ByteString Int64
toBlocks :: Int64 -> LB.ByteString -> [LB.ByteString]
toBlocks n bs | LB.null bs = []
              | otherwise  = let (block, rest) = LB.splitAt n bs
                             in block : toBlocks n rest
dedupeBlocks :: [LB.ByteString] -> DummyDedupe -> DummyDedupe
dedupeBlocks = flip $ DL.foldl' (\acc block -> insertWith (+) block 1 $! acc)
dedupeFile :: FilePath -> DummyDedupe -> IO DummyDedupe
dedupeFile fp dd = LB.readFile fp >>= return . (`dedupeBlocks` dd) . toBlocks 512
main :: IO ()
main = do
    dd <- getArgs >>= (`dedupeFile` empty) . head
    putStrLn . show . (*512) . size $ dd
    putStrLn . show . (*512) . foldl' (+) 0 $ dd
It works, but I got frustrated with its execution time and memory usage. Especially compared with the C++ and even the Python implementation listed below, it was 3~5x slower and consumed 2~3x more memory.
import os
import os.path
import sys
def dedupeFile(dd, fp):
    fd = os.open(fp, os.O_RDONLY)
    for block in iter(lambda: os.read(fd, 512), ''):
        dd.setdefault(block, 0)
        dd[block] = dd[block] + 1
    os.close(fd)
    return dd
dd = {}
dedupeFile(dd, sys.argv[1])
print(len(dd) * 512)
print(sum(dd.values()) * 512)
I thought it was mainly due to the hashmap implementation, and tried other implementations such as hashmap, hashtables and unordered-containers.
But there wasn't any noticeable difference.
Please help me to improve this program.
I don't think you will be able to beat the performance of Python dictionaries. They are implemented in C with years of optimization behind them, while hashmap is newer and not nearly as optimized, so being only about 3x slower is, in my opinion, good enough. You can optimize your Haskell code in certain places, but it still won't matter much. If you are still adamant about increasing performance, I think you should use a highly optimized C library via the FFI.
Here are some similar discussions:
haskell beginners
This may be completely irrelevant depending on your usage, but I am slightly worried about insertWith (+) block 1. If your counts reach high numbers, you will accumulate thunks in the cells of the hash map. It doesn't matter that you used ($!), that only forces the spine -- the values are likely still lazy.
Data.HashMap provides no strict insertWith' the way Data.Map does, but you can implement it:
insertWith' :: (Hashable k, Ord k) => (a -> a -> a) -> k -> a
            -> HashMap k a -> HashMap k a
insertWith' f k v m = maybe id seq maybeval m'
  where
    (maybeval, m') = insertLookupWithKey (const f) k v m
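Plugged into the question's fold, the strict helper would be used like this (my sketch, assuming the insertWith' above is in scope alongside the question's definitions):
dedupeBlocks :: [LB.ByteString] -> DummyDedupe -> DummyDedupe
dedupeBlocks = flip $ DL.foldl' (\acc block -> insertWith' (+) block 1 acc)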
Also, you may want to output (but not input) a list of strict ByteStrings from toBlocks, which will make hashing faster.
That's all I've got -- I'm no performance guru, though.
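To make the point about strict keys concrete, here is a sketch of mine of toBlocks emitting strict ByteStrings; alternatively, unordered-containers' Data.HashMap.Strict.insertWith keeps the counts evaluated without hand-rolling insertWith'.
import qualified Data.ByteString as SB
import qualified Data.ByteString.Lazy as LB
import Data.Int (Int64)
-- Read lazily, but hand the map strict 512-byte keys.
toStrictBlocks :: Int64 -> LB.ByteString -> [SB.ByteString]
toStrictBlocks n bs
    | LB.null bs = []
    | otherwise  = let (block, rest) = LB.splitAt n bs
                   in LB.toStrict block : toStrictBlocks n rest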
