I want to use a lazy ByteString to represent a stream of bits. I need to be able to take arbitrary slices of bits from this stream efficiently. For example, I might have a ByteString of length 10, and I'd like to slice out a new ByteString consisting of bits 24-36 of the original ByteString.
The problem is that ByteStrings are arrays of Word8s, so taking ranges that are not multiples of 8 is difficult. The best I've been able to come up with is this, using Data.Binary and Data.Binary.Bits. Note that get32BitRange is specifically for ranges <= 32.
get32BitRange :: Int -> Int -> ByteString -> ByteString
get32BitRange lo hi = runPut . putWord32be
                    . runGet (runBitGet . block $ word8 (lo `rem` 8) *> word32be len)
                    . drop offset
  where len    = hi - lo
        lo'    = lo `div` 8
        offset = fromIntegral lo'
The algorithm is:
find the index of the first Word8 containing the bits I want
drop from the ByteString up to that index
if the low end of the bit range is not a multiple of 8, there will be some excess bits at the beginning of the Word8, so skip those
get (hi - lo) bits, and store in a Word32
put that Word32 into a ByteString
It looks more than a little ugly. Is there a more efficient way to grab arbitrary slices of bits from a ByteString?
EDIT: here is a more efficient version
get32BitRange :: Int -> Int -> ByteString -> Word32
get32BitRange lo hi = runGet get
  where get = runBitGet . block $ byteString byteOff *> word8 bitOff *> word32be len
        len = hi - lo
        (byteOff, bitOff) = lo `quotRem` 8
I think other solutions are way better, but you can use the Internal module to get at the underlying structure: http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/src/Data-ByteString-Internal.html#ByteString
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length
Then you can use standard pointer tools to generate ByteStrings pointing exactly where you want, by manipulating the ForeignPtr directly...
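For what it's worth, here is a minimal sketch of that idea (my code, not part of the answer): it assumes the pre-0.11 representation shown above, works on strict ByteString chunks, and only handles byte-aligned offsets, so bit-level slicing would still need extra bookkeeping on top.
import qualified Data.ByteString as BS
import Data.ByteString.Internal (ByteString (PS))
-- Hypothetical zero-copy slice at byte granularity: only the offset and
-- length fields change; no bytes are copied.
unsafeByteSlice :: Int -> Int -> BS.ByteString -> BS.ByteString
unsafeByteSlice off len (PS fptr o l)
    | off >= 0 && len >= 0 && off + len <= l = PS fptr (o + off) len
    | otherwise = error "unsafeByteSlice: slice out of range"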
You can't make this efficient with ByteString as your API type, because it doesn't carry the information that the "bits" you want really start at some offset into the first byte.
Best bet is to make a wrapper type:
data BitStream =
  BitStream {
    info :: ByteString,
    -- values from 0-7: ignore all bits in the first byte up to
    -- but not including this offset
    firstBitOffset :: !Int,
    -- values from 0-7: ignore all bits in the last byte after
    -- but not including this offset
    lastBitOffset :: !Int
  }
Then you can design a bit-based API around this.
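For instance, one operation of such an API could look like the following (a rough sketch of my own, not part of the answer; it assumes info is a strict ByteString, MSB-first bit order, and the field meanings given in the comments above).
import Data.Bits (testBit)
import qualified Data.ByteString as BS
-- Index the i-th usable bit (0-based), skipping the ignored leading bits.
bitAt :: BitStream -> Int -> Bool
bitAt (BitStream bs first _) i =
    let (byteIx, bitIx) = (first + i) `quotRem` 8
    in testBit (BS.index bs byteIx) (7 - bitIx)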
I'm going to mark this as resolved. This is what I ended up using:
get32BitRange :: Int -> Int -> ByteString -> Word32
get32BitRange lo hi = assert (lo < hi) $
    runGet (runBitGet bitGet)
  where bitGet = block $ byteString byteOff
                      *> word8 bitOff
                      *> word32be len
        len = hi - lo
        (byteOff, bitOff) = lo `quotRem` 8
In an ongoing endeavour to efficiently fiddle with bits (e.g. see this SO question) the newest challenge is the efficient streaming and consumption of bits.
As a first simple task I chose to find the longest sequence of identical bits in a bitstream generated by /dev/urandom. A typical incantation would be head -c 1000000 </dev/urandom | my-exe. The actual goal is to stream bits and decode, for example, an Elias gamma code, i.e. codes that are not chunks of bytes or multiples thereof.
For such codes of variable length it is nice to have the take, takeWhile, group, etc. language for list manipulation. Since a BitStream.take would actually consume part of the bitstream, some monad would probably come into play.
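To make that thought concrete, here is one shape such an interface could take (purely illustrative; the names and the choice of State over a plain list of bits are my own, not an existing library):
import Control.Monad.State (State, state)
-- A bit consumer: State over a (lazy) list of bits.
type BitParser a = State [Bool] a
takeBits :: Int -> BitParser [Bool]
takeBits n = state (splitAt n)
takeWhileBits :: (Bool -> Bool) -> BitParser [Bool]
takeWhileBits p = state (span p)
An Elias gamma decoder would then be a small program in this monad; the open question is how to produce the bits efficiently in the first place.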
The obvious starting point is the lazy bytestring from Data.ByteString.Lazy.
A. Counting bytes
This very simple Haskell program performs on par with a C program, as is to be expected.
import qualified Data.ByteString.Lazy as BSL
main :: IO ()
main = do
    bs <- BSL.getContents
    print $ BSL.length bs
B. Adding bytes
Once I start using unpack things should get worse.
main = do
    bs <- BSL.getContents
    print $ sum $ BSL.unpack bs
Surprisingly, Haskell and C show almost the same performance.
C. Longest sequence of identical bits
As a first nontrivial task the longest sequence of identical bits can be found like this:
module Main where
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Lazy as BSL
import Data.List (group)
import Data.Word8 (Word8)
splitByte :: Word8 -> [Bool]
splitByte w = Prelude.map (\i-> (w `shiftR` i) .&. 1 == 1) [0..7]
bitStream :: BSL.ByteString -> [Bool]
bitStream bs = concat $ map splitByte (BSL.unpack bs)
main :: IO ()
main = do
    bs <- BSL.getContents
    print $ maximum $ length <$> (group $ bitStream bs)
The lazy bytestring is converted to a list [Word8] and then, using shifts, each Word is split into the bits, resulting in a list [Bool]. This list of lists is then flattened with concat. Having obtained a (lazy) list of Bool, use group to split the list into sequences of identical bits and then map length over it. Finally maximum gives the desired result. Quite simple, but not very fast:
# C
real 0m0.606s
# Haskell
real 0m6.062s
This naive implementation is exactly one order of magnitude slower.
Profiling shows that quite a lot of memory gets allocated (about 3GB for parsing 1MB of input). There is no massive space leak to be observed, though.
From here I start poking around:
There is a bitstream package that promises "Fast, packed, strict bit streams (i.e. list of Bools) with semi-automatic stream fusion". Unfortunately it is not up to date with the current vector package; see here for details.
Next, I investigate streaming. I don't quite see why I should need 'effectful' streaming that brings some monad into play, at least until I start on the reverse of the posed task, i.e. encoding and writing bitstreams to a file.
How about just folding over the ByteString? I'd have to introduce state to keep track of consumed bits. That's not quite the nice take, takeWhile, group, etc. language that is desirable (a rough sketch of such a fold follows below).
And now I'm not quite sure where to go.
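For concreteness, here is a rough sketch of such a stateful fold (my own code, using the same LSB-first bit order as splitByte above; in a serious version the state should be a strict data type, as the answers below show):
import Data.Bits (testBit)
import qualified Data.ByteString.Lazy as BSL
-- Strict left fold with explicit state: (previous bit, current run, longest run).
longestRunFold :: BSL.ByteString -> Int
longestRunFold = result . BSL.foldl' stepByte (False, 0, 0)
  where
    result (_, _, best) = best
    stepByte st w = foldl stepBit st [testBit w i | i <- [0 .. 7]]
    stepBit (prev, run, best) b =
        let run' = if b == prev then run + 1 else 1
        in (b, run', max best run')
It works, but it abandons the take/group vocabulary entirely, which is exactly the complaint above.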
Update:
I figured out how to do this with streaming and streaming-bytestring. I'm probably not doing this right because the result is catastrophically bad.
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Streaming as BSS
import Data.Word8 (Word8)
import qualified Streaming as S
import Streaming.Prelude (Of, Stream)
import qualified Streaming.Prelude as S
splitByte :: Word8 -> [Bool]
splitByte w = (\i-> (w `shiftR` i) .&. 1 == 1) <$> [0..7]
bitStream :: Monad m => Stream (Of Word8) m () -> Stream (Of Bool) m ()
bitStream s = S.concat $ S.map splitByte s
main :: IO ()
main = do
    let bs = BSS.unpack BSS.getContents :: Stream (Of Word8) IO ()
        gs = S.group $ bitStream bs :: Stream (Stream (Of Bool) IO) IO ()
    maxLen <- S.maximum $ S.mapped S.length gs
    print $ S.fst' maxLen
This will test your patience with anything beyond a few thousand bytes of input from stdin. The profiler says it spends an insane amount of time (quadratic in the input size) in Streaming.Internal.>>=.loop and Data.Functor.Of.fmap. I'm not quite sure what the first one is, but the fmap indicates (?) that the juggling of these Of a b isn't doing us any good and because we're in the IO monad it can't be optimised away.
I also have the streaming equivalent of the byte adder here: SumBytesStream.hs, which is slightly slower than the simple lazy ByteString implementation, but still decent. Since streaming-bytestring is proclaimed to be "bytestring io done right" I expected better. I'm probably not doing it right, then.
In any case, all these bit-computations shouldn't be happening in the IO monad. But BSS.getContents forces me into the IO monad because getContents :: MonadIO m => ByteString m () and there's no way out.
Update 2
Following the advice of @dfeuer I used the streaming package at master#HEAD. Here's the result.
longest-seq-c 0m0.747s (C)
longest-seq 0m8.190s (Haskell ByteString)
longest-seq-stream 0m13.946s (Haskell streaming-bytestring)
The O(n^2) problem of Streaming.concat is solved, but we're still not getting closer to the C benchmark.
Update 3
Cirdec's solution produces a performance on par with C. The construct that is used is called "Church encoded lists", see this SO answer or the Haskell Wiki on rank-N types.
Source files:
All the source files can be found on github. The Makefile has all the various targets to run the experiments and the profiling. The default make will just build everything (create a bin/ directory first!) and then make time will do the timing on the longest-seq executables. The C executables get a -c appended to distinguish them.
Intermediate allocations and their corresponding overhead can be removed when operations on streams fuse together. The GHC prelude provides foldr/build fusion for lazy streams in the form of rewrite rules. The general idea is that if one function produces a result that looks like a foldr (it has the type (a -> b -> b) -> b -> b applied to (:) and []) and another function consumes a list that looks like a foldr, then constructing the intermediate list can be removed.
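To illustrate the idea (this example is mine, not part of the answer): build abstracts a list over its constructors, and the rewrite rule foldr c n (build g) = g c n lets a producer and a consumer meet without the intermediate list being constructed.
import GHC.Exts (build)
-- A producer written in build form...
upTo :: Int -> [Int]
upTo n = build (\cons nil ->
    let go i | i > n     = nil
             | otherwise = cons i (go (i + 1))
    in go 0)
-- ...consumed by a foldr; when the rule fires, no [Int] is ever materialised.
sumUpTo :: Int -> Int
sumUpTo n = foldr (+) 0 (upTo n)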
For your problem I'm going to build something similar, but using strict left folds (foldl') instead of foldr. Instead of using rewrite rules that try to detect when something looks like a foldl, I'll use a data type that forces lists to look like left folds.
-- A list encoded as a strict left fold.
newtype ListS a = ListS {build :: forall b. (b -> a -> b) -> b -> b}
Since I've started by abandoning lists we'll be re-implementing part of the prelude for lists.
Strict left folds can be created from the foldl' functions of both lists and bytestrings.
{-# INLINE fromList #-}
fromList :: [a] -> ListS a
fromList l = ListS (\c z -> foldl' c z l)
{-# INLINE fromBS #-}
fromBS :: BSL.ByteString -> ListS Word8
fromBS l = ListS (\c z -> BSL.foldl' c z l)
The simplest example of using one is to find the length of a list.
{-# INLINE length' #-}
length' :: ListS a -> Int
length' l = build l (\z a -> z+1) 0
We can also map and concatenate left folds.
{-# INLINE map' #-}
-- fmap renamed so it can be inlined
map' f l = ListS (\c z -> build l (\z a -> c z (f a)) z)
{-# INLINE concat' #-}
concat' :: ListS (ListS a) -> ListS a
concat' ll = ListS (\c z -> build ll (\z l -> build l c z) z)
For your problem we need to be able to split a word into bits.
{-# INLINE splitByte #-}
splitByte :: Word8 -> [Bool]
splitByte w = Prelude.map (\i-> (w `shiftR` i) .&. 1 == 1) [0..7]
{-# INLINE splitByte' #-}
splitByte' :: Word8 -> ListS Bool
splitByte' = fromList . splitByte
And a ByteString into bits
{-# INLINE bitStream' #-}
bitStream' :: BSL.ByteString -> ListS Bool
bitStream' = concat' . map' splitByte' . fromBS
To find the longest run we'll keep track of the previous value, the length of the current run, and the length of the longest run. We make the fields strict so that the strictness of the fold prevents chains of thunks from being accumulated in memory. Making a strict data type for a state is an easy way to get control over both its memory representation and when its fields are evaluated.
data LongestRun = LongestRun !Bool !Int !Int
{-# INLINE extendRun #-}
extendRun (LongestRun previous run longest) x = LongestRun x current (max current longest)
  where
    current = if x == previous then run + 1 else 1
{-# INLINE longestRun #-}
longestRun :: ListS Bool -> Int
longestRun l = longest
  where
    (LongestRun _ _ longest) = build l extendRun (LongestRun False 0 0)
And we're done
main :: IO ()
main = do
    bs <- BSL.getContents
    print $ longestRun $ bitStream' bs
This is much faster, but still not quite the performance of C.
longest-seq-c 0m00.12s (C)
longest-seq 0m08.65s (Haskell ByteString)
longest-seq-fuse 0m00.81s (Haskell ByteString fused)
The program allocates about 1 MB to read 1,000,000 bytes from input.
total alloc = 1,173,104 bytes (excludes profiling overheads)
Updated github code
I found another solution that is on par with C. The Data.Vector.Fusion.Stream.Monadic module has a stream implementation based on this Coutts, Leshchinskiy, Stewart 2007 paper. The idea behind it is to use destroy/unfoldr stream fusion.
Recall that list's unfoldr :: (b -> Maybe (a, b)) -> b -> [a] creates a list by repeatedly applying (unfolding) a step-forward function, starting with an initial value. A Stream is just an unfoldr function with starting state. (The Data.Vector.Fusion.Stream.Monadic library uses GADTs to create constructors for Step that can be pattern-matched conveniently. It could just as well be done without GADTs, I think.)
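As a tiny list-level example of that pattern (mine, mirroring the splitByte used above, least significant bit first):
import Data.Bits (shiftR, (.&.))
import Data.List (unfoldr)
import Data.Word (Word8)
-- Unfold the eight bits of a byte from a (remaining byte, bits left) state.
bitsOf :: Word8 -> [Bool]
bitsOf w0 = unfoldr step (w0, 8 :: Int)
  where
    step (_, 0) = Nothing
    step (w, n) = Just (w .&. 1 == 1, (w `shiftR` 1, n - 1))
The Stream version below does essentially the same thing, only with the Step state made explicit and strict.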
The central piece of the solution is the mkBitstream :: BSL.ByteString -> Stream Bool function that turns a ByteString into a stream of Bool. Basically, we keep track of the current ByteString, the current byte, and how much of the current byte is still unconsumed. Whenever a byte is used up, another byte is chopped off the ByteString. When Nothing is left, the stream is Done.
The longestRun function is taken straight from @Cirdec's solution.
Here's the etude:
{-# LANGUAGE CPP #-}
#define PHASE_FUSED [1]
#define PHASE_INNER [0]
#define INLINE_FUSED INLINE PHASE_FUSED
#define INLINE_INNER INLINE PHASE_INNER
module Main where
import Control.Monad.Identity (Identity)
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString.Lazy as BSL
import Data.Functor.Identity (runIdentity)
import qualified Data.Vector.Fusion.Stream.Monadic as S
import Data.Word8 (Word8)
type Stream a = S.Stream Identity a -- no need for any monad, really
data Step = Step BSL.ByteString !Word8 !Word8 -- could use tuples, but this is faster
mkBitstream :: BSL.ByteString -> Stream Bool
mkBitstream bs' = S.Stream step (Step bs' 0 0) where
    {-# INLINE_INNER step #-}
    step (Step bs w n)
        | n==0 = case (BSL.uncons bs) of
            Nothing        -> return S.Done
            Just (w', bs') -> return $
                S.Yield (w' .&. 1 == 1) (Step bs' (w' `shiftR` 1) 7)
        | otherwise = return $
            S.Yield (w .&. 1 == 1) (Step bs (w `shiftR` 1) (n-1))
data LongestRun = LongestRun !Bool !Int !Int
{-# INLINE extendRun #-}
extendRun :: LongestRun -> Bool -> LongestRun
extendRun (LongestRun previous run longest) x = LongestRun x current (max current longest)
  where current = if x == previous then run + 1 else 1
{-# INLINE longestRun #-}
longestRun :: Stream Bool -> Int
longestRun s = runIdentity $ do
    (LongestRun _ _ longest) <- S.foldl' extendRun (LongestRun False 0 0) s
    return longest
main :: IO ()
main = do
    bs <- BSL.getContents
    print $ longestRun (mkBitstream bs)
I'm thinking about using the FFI to call some C functions from Haskell.
If a memory buffer that is allocated "manually" is used to hold some data which is then used in Haskell computations, can I somehow rely on the garbage collector to free it when it is not needed anymore?
As for the manual allocations, there are basically two ways (but the difference doesn't seem to be essential for my question):
allocating a buffer in Haskell, then passing it to a C function, like in fdRead
allocating a buffer in C (with malloc, like in GNU's asprintf), then returning the pointer to Haskell
In both examples (fdRead or asprintf) there is also the problem that the data type stored in the buffer is not suitable for a Haskell program, therefore it is copied and converted for use in Haskell (with peekCString). (I'll put the code below.) After the copying and conversion happen, the buffer is freed (in both cases).
However, I'm thinking about a more efficient interface, where Haskell would directly use the data as it is stored by a C function (without a conversion). (I haven't yet explored, say, alternative implementations of String and related functions, to see whether any of them can work directly with some kind of C string.)
If I follow this route, then there is one global problem: how to control the disposal of the allocated buffers. (For functions free of side effects, apart from the allocation itself, I could even wrap the calls in unsafePerformIO or declare them so that they are not in IO.)
Examples with conversion and immediate freeing
allocating in Haskell:
fdRead (here allocaBytes takes care of the freeing):
-- -----------------------------------------------------------------------------
-- fd{Read,Write}
-- | Read data from an 'Fd' and convert it to a 'String' using the locale encoding.
-- Throws an exception if this is an invalid descriptor, or EOF has been
-- reached.
fdRead :: Fd
       -> ByteCount              -- ^How many bytes to read
       -> IO (String, ByteCount) -- ^The bytes read, how many bytes were read.
fdRead _fd 0 = return ("", 0)
fdRead fd nbytes = do
    allocaBytes (fromIntegral nbytes) $ \ buf -> do
        rc <- fdReadBuf fd buf nbytes
        case rc of
            0 -> ioError (ioeSetErrorString (mkIOError EOF "fdRead" Nothing Nothing) "EOF")
            n -> do
                s <- peekCStringLen (castPtr buf, fromIntegral n)
                return (s, n)
-- | Read data from an 'Fd' into memory. This is exactly equivalent
-- to the POSIX #read# function.
fdReadBuf :: Fd
          -> Ptr Word8     -- ^ Memory in which to put the data
          -> ByteCount     -- ^ Maximum number of bytes to read
          -> IO ByteCount  -- ^ Number of bytes read (zero for EOF)
fdReadBuf _fd _buf 0 = return 0
fdReadBuf fd buf nbytes =
    fmap fromIntegral $
        throwErrnoIfMinus1Retry "fdReadBuf" $
            c_safe_read (fromIntegral fd) (castPtr buf) nbytes

foreign import ccall safe "read"
    c_safe_read :: CInt -> Ptr CChar -> CSize -> IO CSsize
allocating in C
getValue.c:
#define _GNU_SOURCE
#include <stdio.h>
#include "getValue.h"
char * getValue(int key) {
    char * value;
    asprintf(&value, "%d", key); // TODO: No error handling!
    // If memory allocation wasn't possible, or some other error occurs, these functions will
    // return -1, and the contents of strp is undefined.
    return value;
}
GetValue.hs (here I explicitly call free after the conversion is actually done):
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign hiding (unsafePerformIO)
import Foreign.Ptr
import Foreign.C.Types
import Foreign.C.String(peekCString)
import System.IO.Unsafe
getValue :: Int -> IO String
getValue key = do
    valptr <- c_safe_getValue (fromIntegral key)
    value <- peekCString valptr
    c_safe_free valptr
    return value
foreign import ccall safe "getValue.h getValue" c_safe_getValue :: CInt -> IO (Ptr CChar)
foreign import ccall safe "stdlib.h free" c_safe_free :: Ptr a -> IO ()
value :: Int -> String
value = unsafePerformIO . getValue -- getValue has no side-effects, so we wrap it.
{- A simple test: -}
main1 = putStrLn (value 2)
{- A test with an infinite list, which employs laziness: -}
keys = [-5..]
results = map value keys
main = foldr (>>)
             (return ())
             (map putStrLn (take 20 results))
If there weren't the (inefficient) conversion and copying step, I would need to rely on the garbage collector for the freeing, but I have no idea how to define such things in Haskell.
The ForeignPtr type acts as a Ptr with an attached finalizer. When the ForeignPtr gets garbage collected, the finalizer is run, and can call the C side to free the pointer using the proper function.
Since the pointer is no longer accessible from Haskell, this is typically the right moment to free it.
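A minimal sketch of that approach, wired up against the getValue example from the question (my glue code, not a drop-in replacement; finalizerFree is C's free, which matches asprintf's malloc):
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, peekCString)
import Foreign.C.Types (CInt (..))
import Foreign.ForeignPtr (newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree)
foreign import ccall safe "getValue.h getValue"
    c_getValue :: CInt -> IO CString
getValueGC :: Int -> IO String
getValueGC key = do
    p  <- c_getValue (fromIntegral key)
    fp <- newForeignPtr finalizerFree p   -- free runs when fp becomes unreachable
    withForeignPtr fp peekCString         -- still copies into a String at this point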
I have binary files of float data (single-precision 32-bit IEEE) that I would like to work on.
How can I best load them for further use, ideally as an IOArray Int Float? I am looking for something like:
bytesToFloats :: ByteString -> [Float]
bytesToFloatArray :: ByteString -> IOArray Int Float
If you've got bog standard single-precision floats, and you just want to work them over in Haskell, you can always be down and dirty about it:
import Data.ByteString.Internal as BS
import qualified Data.Vector.Storable as V
bytesToFloats :: BS.ByteString -> V.Vector Float
bytesToFloats = V.unsafeCast . aux . BS.toForeignPtr
  where aux (fp,offset,len) = V.unsafeFromForeignPtr fp offset len
I think you might be happier with Data.Vector:
http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Vector_Tutorial#Parsing_Binary_Data
You could also use the cereal library, for example:
import Control.Applicative
import Data.ByteString
import Data.Serialize
floatsToBytes :: [Float] -> ByteString
floatsToBytes = runPut . mapM_ putFloat32le
-- | Parses the input and returns either the result or an error description.
bytesToFloat :: ByteString -> Either String [Float]
bytesToFloat = runGet $ many getFloat32le
If you can convert 4 bytes to a Word32, you can use the function wordToFloat in the data-binary-ieee754 package to convert it to a float. You could then load this into any kind of list-like structure you want to manipulate it.
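A sketch of that route (my own wiring; wordToFloat is the only piece from data-binary-ieee754, the byte gathering is ad hoc and does no bounds checking):
import Data.Binary.IEEE754 (wordToFloat)
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word32)
-- Assemble a big-endian Word32 from the four bytes starting at index i.
word32At :: Int -> BS.ByteString -> Word32
word32At i bs = foldl step 0 [0 .. 3]
  where step acc k = (acc `shiftL` 8) .|. fromIntegral (BS.index bs (i + k))
floatAt :: Int -> BS.ByteString -> Float
floatAt i = wordToFloat . word32At i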
What is the best way to convert between Storable.Vector Word8 and a strict ByteString?
Of course a non-copying (no-op) way would be most appreciated.
Should I just unsafeCoerce or is there a library function for that (I couldn't find one)?
Also, will the approach be the same for an Unboxed.Vector Word8?
A simple unsafeCoerce will not work, as the layout of the data constructors is different:
data StorableArray i e = StorableArray !i !i Int !(ForeignPtr e)
vs.
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length
You can import Data.Array.Storable.Internals and Data.ByteString.Internal to get access to the raw constructors and then construct one out of the other without copying the data:
> let bs = pack [1,2,3]
> bs
"\SOH\STX\ETX"
> let sa = case bs of (PS ptr 0 n) -> StorableArray 0 (n-1) n ptr
> :t sa
sa :: StorableArray Int GHC.Word.Word8
> Data.Array.MArray.readArray sa 1
2
> Data.Array.MArray.readArray sa 0
1
> Data.Array.MArray.readArray sa 3
*** Exception: Ix{Int}.index: Index (3) out of range ((0,2))
(I removed the rather long prompt of Prelude Data.Array.Storable.Internals Data.ByteString.Internal Data.ByteString>).
This will not work for Data.Vector.Unboxed, because here the data is on the Haskell heap and managed by the GHC runtime, while the other two manage the data outside the Haskell heap.
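If an Unboxed.Vector Word8 is really what's needed, a copying conversion is straightforward (a small sketch of mine):
import qualified Data.ByteString as BS
import qualified Data.Vector.Unboxed as U
import Data.Word (Word8)
-- O(n): copies the bytes onto the Haskell heap.
byteStringToUnboxed :: BS.ByteString -> U.Vector Word8
byteStringToUnboxed bs = U.generate (BS.length bs) (BS.index bs)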
import qualified Data.ByteString.Internal as BS (ByteString, toForeignPtr)
import qualified Data.Vector.Storable as V
import Foreign.ForeignPtr (castForeignPtr)
import Foreign.Storable (Storable, sizeOf)

byteStringToVector :: (Storable a) => BS.ByteString -> V.Vector a
byteStringToVector bs = vec where
    vec = V.unsafeFromForeignPtr (castForeignPtr fptr) (scale off) (scale len)
    (fptr, off, len) = BS.toForeignPtr bs
    scale = (`div` sizeOfElem vec)
    sizeOfElem vec = sizeOf (undefined `asTypeOf` V.head vec)
http://hackage.haskell.org/packages/archive/spool/0.1/doc/html/Data-Vector-Storable-ByteString.html
I need to read a binary format in Haskell. The format is fairly simple: four octets indicating the length of the data, followed by the data. The four octets represent an integer in network byte-order.
How can I convert a ByteString of four bytes to an integer? I want a direct cast (in C, that would be *(int*)&data), not a lexicographical conversion. Also, how should I deal with endianness? The serialized integer is in network byte order, but the machine may use a different byte order.
I tried Googling, but that only yielded results about lexicographical conversion.
The binary package contains tools to get integer types of various sizes and endianness from ByteStrings.
λ> :set -XOverloadedStrings
λ> import qualified Data.Binary.Get as B
λ> B.runGet B.getWord32be "\STX\SOH\SOH\SOH"
33620225
λ> B.runGet B.getWord32be "\STX\SOH\SOH\SOHtrailing characters are ignored"
33620225
λ> B.runGet B.getWord32be "\STX\SOH\SOH" -- remember to use `catch`:
*** Exception: Data.Binary.Get.runGet at position 0: not enough bytes
CallStack (from HasCallStack):
error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.5.1:Data.Binary.Get
I assume you can use a fold, and then use either foldl or foldr to choose the endianness you want (I forget which is which).
foldl :: (a -> Word8 -> a) -> a -> ByteString -> a
I think this will work for the binary operator:
foo :: Int -> Word8 -> Int
foo prev v = prev * 256 + fromIntegral v
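Completing that idea (my own sketch, using foo from above as the step function): a strict left fold over the first four bytes gives the big-endian interpretation; folding from the right instead would give little-endian.
import qualified Data.ByteString as B
readBE32 :: B.ByteString -> Int
readBE32 = B.foldl' foo 0 . B.take 4  -- foldl' is the strict left fold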
I'd just extract the first four bytes and merge them into a single 32bit integer using the functions in Data.Bits:
import qualified Data.ByteString.Char8 as B
import Data.Char (chr, ord)
import Data.Bits (shift, (.|.))
import Data.Int (Int32)
readInt :: B.ByteString -> Int32
readInt bs = (byte 0 `shift` 24)
         .|. (byte 1 `shift` 16)
         .|. (byte 2 `shift` 8)
         .|.  byte 3
  where byte n = fromIntegral $ ord (bs `B.index` n)
sample = B.pack $ map chr [0x01, 0x02, 0x03, 0x04]
main = print $ readInt sample -- prints 16909060