How to convert between ByteString and Storable Vector? - haskell

What is the best way to convert between Storable.Vector Word8 and a strict ByteString?
Of course a non-copying (no-op) way would be most appreciated.
Should I just unsafeCoerce or is there a library function for that (I couldn't find one)?
Also, will the approach be the same for an Unboxed.Vector Word8?

A simple unsafeCoerce will not work, as the layout of the data constructors is different:
data StorableArray i e = StorableArray !i !i Int !(ForeignPtr e)
vs.
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
{-# UNPACK #-} !Int -- offset
{-# UNPACK #-} !Int -- length
You can import Data.Array.Storable.Internals and Data.ByteString.Internal to get access to the raw constructors and then construct one out of the other without copying the data:
> let bs = pack [1,2,3]
> bs
"\SOH\STX\ETX"
> let sa = case bs of (PS ptr 0 n) -> StorableArray 0 (n-1) n ptr
> :t sa
sa :: StorableArray Int GHC.Word.Word8
> Data.Array.MArray.readArray sa 1
2
> Data.Array.MArray.readArray sa 0
1
> Data.Array.MArray.readArray sa 3
*** Exception: Ix{Int}.index: Index (3) out of range ((0,2))
(I removed the rather long prompt of Prelude Data.Array.Storable.Internals Data.ByteString.Internal Data.ByteString>).
This will not work for Data.Vector.Unboxed, because here the data is on the Haskell heap and managed by the GHC runtime, while the other two manage the data outside the Haskell heap.

byteStringToVector :: (Storable a) => BS.ByteString -> V.Vector a
byteStringToVector bs = vec where
vec = V.unsafeFromForeignPtr (castForeignPtr fptr) (scale off) (scale len)
(fptr, off, len) = BS.toForeignPtr bs
scale = (`div` sizeOfElem vec)
sizeOfElem vec = sizeOf (undefined `asTypeOf` V.head vec)
http://hackage.haskell.org/packages/archive/spool/0.1/doc/html/Data-Vector-Storable-ByteString.html

Related

How to evaluate a list into its normal form with a good running speed?

I am working with an implement of SHA-1, using Haskell. However, I got into trouble with seq, which slows down the speed of the following code.
The following code is simplified but still able to show my problem. (The code about IO actions is reserved, because only the large data makes efforts)
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as B
import Data.Word
import Data.List
import Data.Bits
import System.IO
import Text.Printf
main :: IO ()
main = do
let filePath2 = "/path/to/a/large/file"
content <- L.readFile filePath2
print $ sha1Run content
type ByteSeq = [Word8]
type HsPair = (ByteSeq, ByteSeq, ByteSeq, ByteSeq, ByteSeq)
data ChunkContext = ChunkContext !Int !Int !B.ByteString !HsPair
deriving Show
initHs :: HsPair -- some initial values, normal form
initHs = let
h0 = [0x67, 0x45, 0x23, 0x01]
h1 = [0xef, 0xcd, 0xab, 0x89]
h2 = [0x98, 0xba, 0xdc, 0xfe]
h3 = [0x10, 0x32, 0x54, 0x76]
h4 = [0xc3, 0xd2, 0xe1, 0xf0]
in (h0, h1, h2, h3, h4)
seqList :: [a] -> [a]
seqList xs = foldr ((.) . seq) id xs xs
sha1Run :: L.ByteString -> ChunkContext
sha1Run = L.foldlChunks chunkRoll initCtx
where
initCtx = ChunkContext 0 0 B.empty initHs
-- Something needed to ensure just 512 bits are passed into chunkUpdate
chunkRoll :: ChunkContext -> B.ByteString -> ChunkContext
chunkRoll ctx#(ChunkContext accuSize restSize rest hs) chunk
| B.null chunk = ctx
| sizeTaken < sizeToTake = ChunkContext 0 0 (rest <> m) hs
| sizeTaken >= sizeToTake = chunkRoll (ChunkContext 0 0 B.empty (chunkUpdate hs chunkData)) ms
where
chunkData = B.unpack chunk
!sizeToTake = 64 - restSize
!sizeTaken = B.length m
(!m, !ms) = B.splitAt sizeToTake chunk
chunkUpdate :: HsPair -> ByteSeq -> HsPair
chunkUpdate hs bs = foldl' step hs [0..79] -- iteration for 80 times
where
step zh i = roundUpdate cz zh
chunkData = bs
cz = take 13 chunkData
-- really simple function
roundUpdate :: ByteSeq -> HsPair -> HsPair
roundUpdate wt hs#(a, b, c, d, e) = (na, nb, nc, nd, ne)
where
!na = seqList $ a
!nb = seqList $ b
!nc = seqList $ c
!nd = seqList $ d
!ne = seqList $ e
The key is seqList! I try to apply seqList to a list to limit the expanding "thunk". When there is no seqList and some extra operation for lists in the function roundUpdate, the program would quickly eat so much memory that I have to force to stop the process. seq is needed, in my view.
But this program runs really slowly when seqList added, especially filePath2 is a "large" file (about 1MB), though the constant menory is allocated.
A profiling report has been generated by ghc -prof, following this guide.
The amazing seqList!
COST CENTRE MODULE SRC %time %alloc
seqList Main lab.hs:39:1-39 93.3 97.3
chunkUpdate Main lab.hs:(58,1)-(62,26) 2.8 1.6
roundUpdate Main lab.hs:(65,1)-(71,21) 1.6 1.0
How solve the performance problem in the code above? How to design the function seqList for the reasonable memory usage and executing speed.

Efficient bitstreams in Haskell

In an ongoing endeavour to efficiently fiddle with bits (e.g. see this SO question) the newest challenge is the efficient streaming and consumption of bits.
As a first simple task I choose to find the longest sequence of identical bits in a bitstream generated by /dev/urandom. A typical incantation would be head -c 1000000 </dev/urandom | my-exe. The actual goal is to stream bits and decode an Elias gamma code, for example, i.e. codes that are not chunks of bytes or multiples thereof.
For such codes of variable length it is nice to have the take, takeWhile, group, etc. language for list manipulation. Since a BitStream.take would actually consume part of the bistream some monad would probably come into play.
The obvious starting point is the lazy bytestring from Data.ByteString.Lazy.
A. Counting bytes
This very simple Haskell program performs on par with a C program, as is to be expected.
import qualified Data.ByteString.Lazy as BSL
main :: IO ()
main = do
bs <- BSL.getContents
print $ BSL.length bs
B. Adding bytes
Once I start using unpack things should get worse.
main = do
bs <- BSL.getContents
print $ sum $ BSL.unpack bs
Suprisingly, Haskell and C show the almost same performance.
C. Longest sequence of identical bits
As a first nontrivial task the longest sequence of identical bits can be found like this:
module Main where
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Lazy as BSL
import Data.List (group)
import Data.Word8 (Word8)
splitByte :: Word8 -> [Bool]
splitByte w = Prelude.map (\i-> (w `shiftR` i) .&. 1 == 1) [0..7]
bitStream :: BSL.ByteString -> [Bool]
bitStream bs = concat $ map splitByte (BSL.unpack bs)
main :: IO ()
main = do
bs <- BSL.getContents
print $ maximum $ length <$> (group $ bitStream bs)
The lazy bytestring is converted to a list [Word8] and then, using shifts, each Word is split into the bits, resulting in a list [Bool]. This list of lists is then flattened with concat. Having obtained a (lazy) list of Bool, use group to split the list into sequences of identical bits and then map length over it. Finally maximum gives the desired result. Quite simple, but not very fast:
# C
real 0m0.606s
# Haskell
real 0m6.062s
This naive implementation is exactly one order of magnitude slower.
Profiling shows that quite a lot of memory gets allocated (about 3GB for parsing 1MB of input). There is no massive space leak to be observed, though.
From here I start poking around:
There is a bitstream package that promises "Fast, packed, strict bit streams (i.e. list of Bools) with semi-automatic stream fusion.". Unfortunately it is not up-to-date with the current vector package, see here for details.
Next, I investigate streaming. I don't quite see why I should need 'effectful' streaming that brings some monad into play - at least until I start with the reverse of the posed task, i.e. encoding and writing bitstreams to file.
How about just fold-ing over the ByteString? I'd have to introduce state to keep track of consumed bits. That's not quite the nice take, takeWhile, group, etc. language that is desirable.
And now I'm not quite sure where to go.
Update:
I figured out how to do this with streaming and streaming-bytestring. I'm probably not doing this right because the result is catastrophically bad.
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Streaming as BSS
import Data.Word8 (Word8)
import qualified Streaming as S
import Streaming.Prelude (Of, Stream)
import qualified Streaming.Prelude as S
splitByte :: Word8 -> [Bool]
splitByte w = (\i-> (w `shiftR` i) .&. 1 == 1) <$> [0..7]
bitStream :: Monad m => Stream (Of Word8) m () -> Stream (Of Bool) m ()
bitStream s = S.concat $ S.map splitByte s
main :: IO ()
main = do
let bs = BSS.unpack BSS.getContents :: Stream (Of Word8) IO ()
gs = S.group $ bitStream bs :: Stream (Stream (Of Bool) IO) IO ()
maxLen <- S.maximum $ S.mapped S.length gs
print $ S.fst' maxLen
This will test your patience with anything beyond a few thousand bytes of input from stdin. The profiler says it spends an insane amount of time (quadratic in the input size) in Streaming.Internal.>>=.loop and Data.Functor.Of.fmap. I'm not quite sure what the first one is, but the fmap indicates (?) that the juggling of these Of a b isn't doing us any good and because we're in the IO monad it can't be optimised away.
I also have the streaming equivalent of the byte adder here: SumBytesStream.hs, which is slightly slower than the simple lazy ByteString implementation, but still decent. Since streaming-bytestring is proclaimed to be "bytestring io done right" I expected better. I'm probably not doing it right, then.
In any case, all these bit-computations shouldn't be happening in the IO monad. But BSS.getContents forces me into the IO monad because getContents :: MonadIO m => ByteString m () and there's no way out.
Update 2
Following the advice of #dfeuer I used the streaming package at master#HEAD. Here's the result.
longest-seq-c 0m0.747s (C)
longest-seq 0m8.190s (Haskell ByteString)
longest-seq-stream 0m13.946s (Haskell streaming-bytestring)
The O(n^2) problem of Streaming.concat is solved, but we're still not getting closer to the C benchmark.
Update 3
Cirdec's solution produces a performance on par with C. The construct that is used is called "Church encoded lists", see this SO answer or the Haskell Wiki on rank-N types.
Source files:
All the source files can be found on github. The Makefile has all the various targets to run the experiments and the profiling. The default make will just build everything (create a bin/ directory first!) and then make time will do the timing on the longest-seq executables. The C executables get a -c appended to distinguish them.
Intermediate allocations and their corresponding overhead can be removed when operations on streams fuse together. The GHC prelude provides foldr/build fusion for lazy streams in the form of rewrite rules. The general idea is if one function produces a result that looks like a foldr (it has the type (a -> b -> b) -> b -> b applied to (:) and []) and another function consumes a list that looks like a foldr, constructing the intermediate list can be removed.
For your problem I'm going to build something similar, but using strict left folds (foldl') instead of foldr. Instead of using rewrite rules that try to detect when something looks like a foldl, I'll use a data type that forces lists to look like left folds.
-- A list encoded as a strict left fold.
newtype ListS a = ListS {build :: forall b. (b -> a -> b) -> b -> b}
Since I've started by abandoning lists we'll be re-implementing part of the prelude for lists.
Strict left folds can be created from the foldl' functions of both lists and bytestrings.
{-# INLINE fromList #-}
fromList :: [a] -> ListS a
fromList l = ListS (\c z -> foldl' c z l)
{-# INLINE fromBS #-}
fromBS :: BSL.ByteString -> ListS Word8
fromBS l = ListS (\c z -> BSL.foldl' c z l)
The simplest example of using one is to find the length of a list.
{-# INLINE length' #-}
length' :: ListS a -> Int
length' l = build l (\z a -> z+1) 0
We can also map and concatenate left folds.
{-# INLINE map' #-}
-- fmap renamed so it can be inlined
map' f l = ListS (\c z -> build l (\z a -> c z (f a)) z)
{-# INLINE concat' #-}
concat' :: ListS (ListS a) -> ListS a
concat' ll = ListS (\c z -> build ll (\z l -> build l c z) z)
For your problem we need to be able to split a word into bits.
{-# INLINE splitByte #-}
splitByte :: Word8 -> [Bool]
splitByte w = Prelude.map (\i-> (w `shiftR` i) .&. 1 == 1) [0..7]
{-# INLINE splitByte' #-}
splitByte' :: Word8 -> ListS Bool
splitByte' = fromList . splitByte
And a ByteString into bits
{-# INLINE bitStream' #-}
bitStream' :: BSL.ByteString -> ListS Bool
bitStream' = concat' . map' splitByte' . fromBS
To find the longest run we'll keep track of the previous value, the length of the current run, and the length of the longest run. We make the fields strict so that the strictness of the fold prevents chains of thunks from being accumulated in memory. Making a strict data type for a state is an easy way to get control over both its memory representation and when its fields are evaluated.
data LongestRun = LongestRun !Bool !Int !Int
{-# INLINE extendRun #-}
extendRun (LongestRun previous run longest) x = LongestRun x current (max current longest)
where
current = if x == previous then run + 1 else 1
{-# INLINE longestRun #-}
longestRun :: ListS Bool -> Int
longestRun l = longest
where
(LongestRun _ _ longest) = build l extendRun (LongestRun False 0 0)
And we're done
main :: IO ()
main = do
bs <- BSL.getContents
print $ longestRun $ bitStream' bs
This is much faster, but not quite the performance of c.
longest-seq-c 0m00.12s (C)
longest-seq 0m08.65s (Haskell ByteString)
longest-seq-fuse 0m00.81s (Haskell ByteString fused)
The program allocates about 1 Mb to read 1000000 bytes from input.
total alloc = 1,173,104 bytes (excludes profiling overheads)
Updated github code
I found another solution that is on par with C. The Data.Vector.Fusion.Stream.Monadic has a stream implementation based on this Coutts, Leshchinskiy, Stewart 2007 paper. The idea behind it is to use a destroy/unfoldr stream fusion.
Recall that list's unfoldr :: (b -> Maybe (a, b)) -> b -> [a] creates a list by repeatedly applying (unfolding) a step-forward function, starting with an initial value. A Stream is just an unfoldr function with starting state. (The Data.Vector.Fusion.Stream.Monadic library uses GADTs to create constructors for Step that can be pattern-matched conveniently. It could just as well be done without GADTs, I think.)
The central piece of the solution is the mkBitstream :: BSL.ByteString -> Stream Bool function that turns a BytesString into a stream of Bool. Basically, we keep track of the current ByteString, the current byte, and how much of the current byte is still unconsumed. Whenever a byte is used up another byte is chopped off ByteString. When Nothing is left, the stream is Done.
The longestRun function is taken straight from #Cirdec's solution.
Here's the etude:
{-# LANGUAGE CPP #-}
#define PHASE_FUSED [1]
#define PHASE_INNER [0]
#define INLINE_FUSED INLINE PHASE_FUSED
#define INLINE_INNER INLINE PHASE_INNER
module Main where
import Control.Monad.Identity (Identity)
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Lazy as BSL
import Data.Functor.Identity (runIdentity)
import qualified Data.Vector.Fusion.Stream.Monadic as S
import Data.Word8 (Word8)
type Stream a = S.Stream Identity a -- no need for any monad, really
data Step = Step BSL.ByteString !Word8 !Word8 -- could use tuples, but this is faster
mkBitstream :: BSL.ByteString -> Stream Bool
mkBitstream bs' = S.Stream step (Step bs' 0 0) where
{-# INLINE_INNER step #-}
step (Step bs w n) | n==0 = case (BSL.uncons bs) of
Nothing -> return S.Done
Just (w', bs') -> return $
S.Yield (w' .&. 1 == 1) (Step bs' (w' `shiftR` 1) 7)
| otherwise = return $
S.Yield (w .&. 1 == 1) (Step bs (w `shiftR` 1) (n-1))
data LongestRun = LongestRun !Bool !Int !Int
{-# INLINE extendRun #-}
extendRun :: LongestRun -> Bool -> LongestRun
extendRun (LongestRun previous run longest) x = LongestRun x current (max current longest)
where current = if x == previous then run + 1 else 1
{-# INLINE longestRun #-}
longestRun :: Stream Bool -> Int
longestRun s = runIdentity $ do
(LongestRun _ _ longest) <- S.foldl' extendRun (LongestRun False 0 0) s
return longest
main :: IO ()
main = do
bs <- BSL.getContents
print $ longestRun (mkBitstream bs)

maxIndex of an MVector

Data.Vector includes a function maxIndex with type maxIndex :: (Ord a) => Vector a -> Int that returns the index of the maximum value in that Vector. I'm working with mutable Vectors, however, and MVector doesn't have maxIndex defined for it.
What's the best way of getting the data I want out of the MVector I have? My code currently is:
import qualified Data.Vector.Unboxed.Mutable as MV
import Control.Monad.ST
import Control.Monad (mapM_)
type MaxIndex = Int
step :: forall s. MV.MVector s Int -> MaxIndex -> ST s ()
step vec i = do
n <- MV.unsafeRead vec i
MV.write vec i 0
let l = MV.length vec
(k, x) = n `divMod` l
mapM_ (\j -> MV.modify vec (+k) j) [0..l-1] -- side note, this is just
-- fmap (+k) vec, but MVector is not
-- a functor. Is there a better way?
mapM_ (\j -> MV.modify vec (+1) (j `mod` l)) [i+1..i+x]
where i is the index I'm looking to derive inside step. I'm doing this because the actions here need to eventually be wrapped inside an until and repeated until a predicate is satisfied, and freezing and thawing every cycle sounds ludicrously expensive.
I see lots of talk about unsafe freezing which seems suspect since you plan to mutate this memory later, thus violating the assurance you are implicitly giving when calling unsafeFreeze.
My suggestion is to just write an imperative-style maxIndex function. The below is typed but not tested:
import qualified Data.Vector.Unboxed.Mutable as MV
import Control.Monad.ST
import Control.Monad (mapM_)
maxIndex :: (Ord a, MV.Unbox a) => MV.MVector s a -> ST s (Maybe Int)
maxIndex mv | len == 0 = pure Nothing
| otherwise = Just <$> go 0 0
where
len = MV.length mv
go n i | i >=len = pure n
| otherwise = do
nVal <- MV.unsafeRead mv n
iVal <- MV.unsafeRead mv i
if nVal < iVal then go i (i+1)
else go n (i+1)
Have you considered freezing the vector with unsafeFreeze which is supposed to be fast (i.e. Θ(1))? For example you can define maxIndex for mutable vectors like this:
maxIndex = fmap V.maxIndex . V.unsafeFreeze
This assumes that you have imported the following:
import qualified Data.Vector.Unboxed as V
unsafeFreeze doesn't actually copy any data and should be fast, but it would be interesting to run a criterion benchmark to see if this approach is actually faster compared to an explicit loop.

Haskell: Efficient way to bitwise cast String to Integer?

What's the most efficient way to cast String (or ByteString) to an integral type bitwise, for example:
smallEndianStringToInt32 :: ByteString -> Int32
smallEndianStringToInt32 str =
case str of
[a,b,c,d] -> shiftL d 24 .|. shiftL c 16 .|. shiftL b 8 .|. a
_ -> 0
In C we can simply do it like this, really low-cost:
char* some_string = "....";
int32_t x = *(int32_t*)some_string;
But the former implementation (smallEndianStringToInt32) doesn't look as fast as the C code.
How can this be done? (Or it actually IS as fast?)
I think this does what you're looking for, but like the other commenters, I'd recommend you clarify what you're actually trying to do, as this is very non-idiomatic Haskell:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Internal
import Foreign.Storable
import Foreign.ForeignPtr
import Foreign.Ptr
import Data.Word (Word32)
main :: IO ()
main = withForeignPtr fptr $ \ptr -> do
i <- peek $ castPtr (ptr `plusPtr` off)
print (i :: Word32)
where
PS fptr off len = "A\0\0\0"
As should be painfully obvious, this code is quite susceptible to segfaults, for example.

Declare all instances of a typeclass are in another typeclass without modifying the original class declarations

There is an Crypto.Random API inside the crypto-api package that specifies what it means for something to be a "pseudorandom number generator".
I have implemented this API using an instance of System.Random's RandomGen class, namely, StdGen:
instance CryptoRandomGen StdGen where
newGen bs = Right $ mkStdGen $ shift e1 24 + shift e2 16 + shift e3 8 + e4
where (e1 : e2 : e3 : e4 : _) = Prelude.map fromIntegral $ unpack bs
genSeedLength = Tagged 4
genBytes n g = Right $ genBytesHelper n empty g
where genBytesHelper 0 partial gen = (partial, gen)
genBytesHelper n partial gen = genBytesHelper (n-1) (partial `snoc` nextitem) newgen
where (nextitem, newgen) = randomR (0, 255) gen
reseed bs _ = newGen bs
However, this implementation is only for the StdGen type, but it would really work for anything in System.Random's RandomGen typeclass.
Is there a way to say that everything in RandomGen is a member of CryptoRandomGen using the given shim functions? I'd like to be able to do this in my own code, without having to change the source of either of those two libraries. My instincts would be to change the first line to something like
instance (RandomGen a) => CryptoRandomGen a where
but that doesn't appear to be syntactically correct.
Crypto-API author here. Please don't do this - it's really a violation of the implicit properties of CryptoRandomGen.
That said, here's how I'd do it: Just make a newtype that wraps your RandomGen and make that newtype an instance of CryptoRandomGen.
newtype AsCRG g = ACRG { unACRG :: g}
instance RandomGen g => CryptoRandomGen (AsCRG g) where
newGen = -- This is not possible to implement with only a 'RandomGen' constraint. Perhaps you want a 'Default' instance too?
genSeedLength = -- This is also not possible from just 'RandomGen'
genBytes nr g =
let (g1,g2) = split g
randInts :: [Word32]
randInts = B.concat . map Data.Serialize.encode
. take ((nr + 3) `div` 4)
$ (randoms g1 :: [Word32])
in (B.take nr randInts, g2)
reseed _ _ = -- not possible w/o more constraints
newGenIO = -- not possible w/o more constraints
So you see, you can split the generator (or manage many intermediate generators), make the right number of Ints (or in my case, Word32s), encode them, and return the bytes.
Because RandomGen is limited to just generation (and splitting), there isn't any straight-forward way to support instatiation, reinstantiation, or querying properties such as the seed length.
As far as I know, this is impossible, unless you're willing to turn on UndecidableInstances (which, of course, can make the typechecker go in an infinite loop). Here's an example that makes every instance of Monad an instance of Functor:
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}
module Main
where
import Control.Monad (liftM)
instance (Monad a) => Functor a where
fmap = liftM
-- Test code
data MyState a = MyState { unM :: a }
deriving Show
instance Monad MyState where
return a = MyState a
(>>=) m k = k (unM m)
main :: IO ()
main = print . fmap (+ 1) . MyState $ 1
Testing:
*Main> :main
MyState { unM = 2 }
In your case, this translates to:
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}
instance (RandomGen a) => CryptoRandomGen a where
newGen = ...
genSeedLength = ...
genBytes = ...
reseed = ...
As an aside, I once asked how to implement this without UndecidableInstances on haskell-cafe and got this answer (the same workaround that Thomas proposed; I consider it ugly).

Resources