Is there any hope to cast ForeignPtr to ByteArray# (for a function :: ByteString -> Vector) - haskell

For performance reasons I would like a zero-copy cast of ByteString (strict, for now) to a Vector. Since Vector is just a ByteArray# under the hood and ByteString is a ForeignPtr, this might look something like:
caseBStoVector :: ByteString -> Vector a
caseBStoVector (BS fptr off len) =
  withForeignPtr fptr $ \ptr -> do
    let ptr' = plusPtr ptr off
        p = alignPtr ptr' (alignment (undefined :: a))
        barr = ptrToByteArray# p len -- I want this function, or something similar
        barr' = ByteArray barr
        alignI = minusPtr p ptr
        size = (len-alignI) `div` sizeOf (undefined :: a)
    return (Vector 0 size barr')
That certainly isn't right. Even with the missing function ptrToByteArray#, this seems to need to escape the ptr outside of the withForeignPtr scope. So my questions are:
This post probably advertises my primitive understanding of ByteArray#; if anyone can talk a bit about ByteArray#, its representation, how it is managed (GCed), etc., I'd be grateful.
The fact that ByteArray# lives on the GCed heap and ForeignPtr is external seems to be a fundamental issue: all the access operations are different. Perhaps I should look at redefining Vector from = ByteArray !Int !Int to something with another indirection? Something like = Location !Int !Int where data Location = LocBA ByteArray | LocFPtr ForeignPtr, and provide wrapping operations for both those types? This indirection might hurt performance too much, though.
Failing to marry these two together, maybe I can just access arbitrary element types in a ForeignPtr in a more efficient manner. Does anyone know of a library that treats ForeignPtr (or ByteString) as an array of arbitrary Storable or Primitive types? This would still lose me the stream fusion and tuning from the Vector package.

Disclaimer: everything here is an implementation detail and specific to GHC and the internal representations of the libraries in question at the time of posting.
This response is a couple of years after the fact, but it is indeed possible to get a pointer to bytearray contents. It's problematic because the GC likes to move data around in the heap, and things outside of the GC heap can leak, which isn't necessarily ideal. GHC solves this with:
newPinnedByteArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #)
Primitive bytearrays (internally typedef'd C char arrays) can be pinned to a fixed address when they are allocated. The GC guarantees not to move them. You can convert a bytearray reference to a pointer with this function:
byteArrayContents# :: ByteArray# -> Addr#
The address type forms the basis of Ptr and ForeignPtr types. Ptrs are addresses marked with a phantom type and ForeignPtrs are that plus optional references to GHC memory and IORef finalizers.
Disclaimer: This will only work if your ByteString was built in Haskell. Otherwise, you can't get a reference to the bytearray: an arbitrary Addr# cannot be turned back into a ByteArray#. Don't try to cast or coerce your way to a bytearray; that way lies segfaults. Example:
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- GHC.Exts re-exports the primops, I# and unsafeCoerce#.
import GHC.Exts
import GHC.IO (IO (..), unIO)
main :: IO ()
main = test
-- The word-sized Int primops are used instead of the Int64 ones, whose
-- element type became Int64# (no longer Int#) in GHC 9.4.
test :: IO ()
test = IO $ \s0 ->
  -- Create the test array (8 bytes = one Int on a 64-bit platform).
  case newPinnedByteArray# 8# s0 of { (# s1, mbarr# #) ->
  -- Write something and read it back as baseline.
  case writeIntArray# mbarr# 0# 1# s1 of { s2 ->
  case readIntArray# mbarr# 0# s2 of { (# s3, x# #) ->
  -- Print it. Should match what was written.
  case unIO (print (I# x#)) s3 of { (# s4, _ #) ->
  -- Convert bytearray to pointer.
  case byteArrayContents# (unsafeCoerce# mbarr#) of { addr# ->
  -- Dereference the pointer.
  case readIntOffAddr# addr# 0# s4 of { (# s5, x'# #) ->
  -- Print what's read. Should match the above.
  case unIO (print (I# x'#)) s5 of { (# s6, _ #) ->
  -- Coerce the pointer into an array and try to read.
  case readIntArray# (unsafeCoerce# addr#) 0# s6 of { (# s7, y# #) ->
  -- Haskell is not C. Arrays are not pointers.
  -- This won't match. It might segfault. At best, it's garbage.
  case unIO (print (I# y#)) s7 of { (# s8, _ #) ->
  -- Keep the array alive until we are done with addr#.
  case touch# mbarr# s8 of { s9 ->
  (# s9, () #) }}}}}}}}}}
Output:
1
1
(some garbage value)
To get the bytearray from a ByteString, you need to import the constructor from Data.ByteString.Internal and pattern match.
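data ByteString = PS !(ForeignPtr Word8) !Int !Int
-- bytestring >= 0.11 defines BS !(ForeignPtr Word8) !Int instead; PS survives as a pattern synonym.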
(\(PS foreignPointer offset length) -> foreignPointer)
Now we need to rip the goods out of the ForeignPtr. This part is entirely implementation-specific. For GHC, import from GHC.ForeignPtr.
data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents
(\(ForeignPtr addr# foreignPointerContents) -> foreignPointerContents)
data ForeignPtrContents
  = PlainForeignPtr !(IORef (Finalizers, [IO ()]))
  | MallocPtr (MutableByteArray# RealWorld) !(IORef (Finalizers, [IO ()]))
  | PlainPtr (MutableByteArray# RealWorld)
In GHC, ByteString is built with PlainPtrs, which are wrapped around pinned byte arrays. They carry no finalizers. They are GC'd like regular Haskell data when they fall out of scope. An Addr# does not keep anything alive, though: GHC assumes it points to something outside of the GC heap. If the bytearray itself falls out of scope, you're left with a dangling pointer.
-- PlainPtr is a constructor of ForeignPtrContents, not a type of its own:
PlainPtr :: MutableByteArray# RealWorld -> ForeignPtrContents
(\(PlainPtr mutableByteArray#) -> mutableByteArray#)
MutableByteArray# and ByteArray# have the same runtime representation. If you want true zero-copy construction, make sure you either unsafeCoerce# the MutableByteArray# or use unsafeFreezeByteArray# to get a ByteArray#. A safe freeze would copy the whole array instead.
mbarrTobarr :: MutableByteArray# s -> ByteArray#
mbarrTobarr = unsafeCoerce#
And now you have the raw contents of the ByteString ready to be turned into a vector.

You might be able to hack together something of type ForeignPtr a -> Maybe ByteArray#, but there is nothing you can do in general.
You should look at the Data.Vector.Storable module. It includes a function unsafeFromForeignPtr :: ForeignPtr a -> Int -> Int -> Vector a. It sounds like what you want.
There is also a Data.Vector.Storable.Mutable variant.
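A minimal sketch of that approach for the original ByteString -> Vector problem, assuming the bytestring and vector packages (bsToVector is a hypothetical name). The vector shares the ByteString's buffer, so nothing is copied, and it is only safe as long as neither side is mutated. For element types wider than Word8, Data.Vector.Storable.unsafeCast can reinterpret the result, but to my knowledge it does not check alignment.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.Vector.Storable as VS
import Data.Word (Word8)

-- Zero-copy view of a strict ByteString as a storable vector.
-- The vector points into the ByteString's buffer, so neither may be mutated.
bsToVector :: B.ByteString -> VS.Vector Word8
bsToVector bs = VS.unsafeFromForeignPtr fptr off len
  where
    -- toForeignPtr returns (buffer, offset, length)
    (fptr, off, len) = BI.toForeignPtr bs
```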

Related

Copying GHC ByteArray# to Ptr

I am trying to write the following function:
memcpyByteArrayToPtr ::
     ByteArray# -- ^ source
  -> Int        -- ^ start
  -> Int        -- ^ length
  -> Ptr a      -- ^ destination
  -> IO ()
The behavior should be to internally use memcpy to copy the contents of a ByteArray# to the Ptr. There are two techniques I have seen for doing something like this, but it's difficult for me to reason about their safety.
The first is found in the memory package. There is an auxiliary function withPtr defined as:
data Bytes = Bytes (MutableByteArray# RealWorld)
withPtr :: Bytes -> (Ptr p -> IO a) -> IO a
withPtr b@(Bytes mba) f = do
  a <- f (Ptr (byteArrayContents# (unsafeCoerce# mba)))
  touchBytes b
  return a
But, I'm pretty sure that this is only safe because the only way to construct Bytes is by using a smart constructor that calls newAlignedPinnedByteArray#. An answer given to a similar question and the docs for byteArrayContents# indicate that it is only safe when dealing with pinned ByteArray#s. In my situation, I'm dealing with the ByteArray#s that the text library uses internally, and they are not pinned, so I believe this would be unsafe.
The second possibility I've stumbled across is in text itself. At the bottom of the Data.Text.Array source code, there is an ffi function memcpyI:
foreign import ccall unsafe "_hs_text_memcpy" memcpyI
  :: MutableByteArray# s -> CSize -> ByteArray# -> CSize -> CSize -> IO ()
This is backed by the following c code:
void _hs_text_memcpy(void *dest, size_t doff, const void *src, size_t soff, size_t n)
{
    memcpy(dest + (doff<<1), src + (soff<<1), n<<1);
}
Because it's part of text, I trust that this is safe. It looks dangerous because it gets a memory location from an unpinned ByteArray#, the very thing that the byteArrayContents# documentation warns against. I suspect that it's OK because the FFI call is marked as unsafe, which I think prevents the GC from moving the ByteArray# during the FFI call.
That's the research I've done so far. My best guess is that I can just copy what's been done in text. The big difference would be that, instead of passing in MutableByteArray# and ByteArray# as the two pointers, I would be passing in ByteArray# and Ptr a (or maybe Addr#; I'm not sure which of those you typically use with the FFI).
Is what I have suggested safe? Is there a better way that would allow me to avoid using the ffi? Is there something in base that does this? Feel free to correct any incorrect assumptions I've made, and thanks for any suggestions or guidance.
copyByteArrayToAddr# :: ByteArray# -> Int# -> Addr# -> Int# -> State# s -> State# s
looks like the right primop. You just need to be sure not to try to copy it into memory it occupies. So you should probably be safe with
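{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (ByteArray#, Int (I#), Ptr (Ptr), copyByteArrayToAddr#)
import GHC.ST (ST (..))
copyByteArrayToPtr :: ByteArray# -> Int -> Ptr a -> Int -> ST s ()
copyByteArrayToPtr ba (I# x) (Ptr p) (I# y) = ST $ \ s ->
  (# copyByteArrayToAddr# ba x p y s, () #)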
Unfortunately, the documentation gives me no clue what each Int# is supposed to mean, but I imagine you can figure that out through trial and segfault.
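For what it's worth, as I read the primop's type and its documentation in GHC.Exts, the first Int# is the byte offset into the source array and the second is the number of bytes to copy. Below is a sketch of an IO wrapper with named arguments plus a tiny usage check; copyToPtrIO, BA, filledBA and demo are hypothetical helpers, not library functions. Since the copy is a single primop, the GC cannot move the source array partway through it, so as far as I know an unpinned array (like the one filledBA makes) is fine here.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Marshal.Utils (fillBytes)

-- copyToPtrIO src off dst len: copy len bytes, starting off bytes into src, to dst.
copyToPtrIO :: ByteArray# -> Int -> Ptr a -> Int -> IO ()
copyToPtrIO src (I# off) (Ptr dst) (I# len) = IO $ \s ->
  (# copyByteArrayToAddr# src off dst len s, () #)

-- A lifted wrapper so the unlifted ByteArray# can be returned from IO.
data BA = BA ByteArray#

-- An unpinned n-byte array with every byte set to v.
filledBA :: Int -> Int -> IO BA
filledBA (I# n) (I# v) = IO $ \s0 ->
  case newByteArray# n s0 of
    (# s1, mba #) ->
      case setByteArray# mba 0# n v s1 of
        s2 -> case unsafeFreezeByteArray# mba s2 of
          (# s3, ba #) -> (# s3, BA ba #)

-- Copy 4 bytes out of an 8-byte array of 7s into a zeroed 6-byte buffer.
demo :: IO [Word8]
demo = do
  BA ba <- filledBA 8 7
  allocaBytes 6 $ \p -> do
    fillBytes p 0 6
    copyToPtrIO ba 2 p 4
    peekArray 6 p
```

The last two bytes stay zero, which shows that the final argument is a byte count and not an end offset.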

Haskell vector C++ push_back analogue

I've discovered that Haskell's Data.Vector.* lack the functionality of C++'s std::vector::push_back. There is grow/unsafeGrow, but they seem to have O(n) complexity.
Is there a way to grow vectors in O(1) amortized time for an element?
No, there really is no such facility in Data.Vector. It isn't too difficult to implement this from scratch using MutableArray like Data.Vector.Mutable does (see my implementation below), but there are some significant drawbacks. In particular, all of its operations end up happening inside some state context, usually ST or IO. This has the downsides that
Any code that manipulates such a data structure ends up having to be monadic
The compiler is much less likely to be able to optimize. For example, libraries like vector use something really clever called fusion to optimize away intermediate allocations. This sort of thing is not possible in a state context.
Parallelism is going to be a lot tougher: in ST I can't even have two threads and in IO I will have race conditions all over the place. The nasty bit here is that any sharing is going to have to happen in IO.
As if all this wasn't enough, garbage collection also performs better inside pure code.
What do I do then?
It isn't particularly often that you need exactly this behaviour. Usually you are better off using an immutable data structure that does something similar (thereby avoiding all of the aforementioned problems). Limiting ourselves to containers, which comes with GHC, some alternatives include:
if you are almost always just using push_back, maybe you just want a stack (a plain old [a]).
if you anticipate doing more push_back than lookups, Data.Sequence gives you O(1) appending to either end and O(log n) lookup.
if you are interested in a lot of operations especially hashmap-like, Data.IntMap is pretty optimized. Even if the theoretical cost of those operations is O(log n), you will need a pretty big IntMap to start feeling those costs.
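For example, the push_back-style loop with Data.Sequence from containers might look like this (pushAll is a hypothetical helper name). (|>) appends at the right end in constant time, while Seq.index is logarithmic rather than O(1) as with std::vector.

```haskell
import Data.Foldable (toList)
import Data.List (foldl')
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- push_back analogue: append each element to the right end of a Seq.
pushAll :: [a] -> Seq a
pushAll = foldl' (|>) Seq.empty
```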
Making something like C++ vector
Of course, if one doesn't care about the restrictions mentioned initially, there is no reason not to have a C++ like vector. Just for fun, I went ahead and implemented this from scratch (needs packages data-default and primitive).
The reason this code is probably not already in some library is that it goes against much of the spirit of Haskell (I do this with the intent of conforming to a C++ style vector).
The only operation that actually makes a new vector is newVector - everything else "modifies" an existing vector. Since pushBack doesn't return a new GrowVector, it has to modify the existing one (including its length and/or capacity), so length and capacity have to be "pointers". In turn, that means that even getting the length is a monadic operation.
While this isn't unboxed, it would not be too difficult to replicate vector's data family approach; it is just tedious (see footnote 1).
With that said:
module GrowVector (
    GrowVector, newEmpty, size, read, write, pushBack, popBack
  ) where
import Data.Primitive.Array
import Data.Primitive.MutVar
import Data.Default
import Control.Monad
import Control.Monad.Primitive (PrimState, PrimMonad)
import Prelude hiding (length, read)
data GrowVector s a = GrowVector
  { underlying :: MutVar s (MutableArray s a) -- ^ underlying array
  , length     :: MutVar s Int                -- ^ perceived length of vector
  , capacity   :: MutVar s Int                -- ^ actual capacity
  }
type GrowVectorIO = GrowVector (PrimState IO)
-- | Make a new empty vector with the given capacity. O(n)
newEmpty :: (Default a, PrimMonad m) => Int -> m (GrowVector (PrimState m) a)
newEmpty cap = do
  arr <- newArray cap def
  GrowVector <$> newMutVar arr <*> newMutVar 0 <*> newMutVar cap
-- | Read an element in the vector (unchecked). O(1)
read :: PrimMonad m => GrowVector (PrimState m) a -> Int -> m a
g `read` i = do arr <- readMutVar (underlying g); arr `readArray` i
-- | Find the size of the vector. O(1)
size :: PrimMonad m => GrowVector (PrimState m) a -> m Int
size g = readMutVar (length g)
-- | Double the vector capacity (a capacity of 0 becomes 1). O(n)
resize :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> m ()
resize g = do
  curCap <- readMutVar (capacity g)          -- read current capacity
  curArr <- readMutVar (underlying g)        -- read current array
  curLen <- readMutVar (length g)            -- read current length
  let newCap = max 1 (2 * curCap)            -- doubling 0 would never grow
  newArr <- newArray newCap def              -- allocate the bigger array
  copyMutableArray newArr 0 curArr 0 curLen  -- copy elements 0 .. curLen-1 over
  underlying g `writeMutVar` newArr          -- use the new array in the vector
  capacity g `writeMutVar` newCap            -- update the capacity in the vector
-- | Write an element to the array (unchecked). O(1)
write :: PrimMonad m => GrowVector (PrimState m) a -> Int -> a -> m ()
write g i x = do arr <- readMutVar (underlying g); writeArray arr i x
-- | Pop an element off the vector, mutating it (unchecked). O(1)
popBack :: PrimMonad m => GrowVector (PrimState m) a -> m a
popBack g = do
  s <- size g
  x <- g `read` (s - 1)
  length g `modifyMutVar'` subtract 1
  pure x
-- | Push an element. (Amortized) O(1)
pushBack :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> a -> m ()
pushBack g x = do
  s <- readMutVar (length g)    -- read current size
  c <- readMutVar (capacity g)  -- read current capacity
  when (s == c) (resize g)      -- array is full: resize first
  write g s x                   -- slots 0 .. s-1 are in use, so s is the back
  length g `modifyMutVar'` (+1) -- increase the length
Current semantics of grow
I think the github issue does a pretty good job of explaining the semantics:
I think the intended semantics are that it may do a realloc, but not guaranteed to, and all the current implementations do the simpler copying semantics because for on heap allocations the cost should be roughly the same.
Basically you should use grow when you want a new mutable vector of an increased size, starting with the elements of the old vector (and no longer care about the old vector). This is quite useful - for example one could implement GrowVector using MVector and grow.
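A rough sketch of that idea, assuming the vector package: the logical length is kept next to the MVector in an IORef, and grow doubles the capacity whenever the vector is full, which gives amortized O(1) pushes. Buf, newBuf, pushBackBuf and toListBuf are hypothetical names; in line with the quote above, the old vector is never used again after grow.

```haskell
import Data.IORef
import qualified Data.Vector.Mutable as MV

-- Used length plus backing store; slots at index >= the length are unused.
data Buf a = Buf !Int !(MV.IOVector a)

newBuf :: IO (IORef (Buf a))
newBuf = do
  v <- MV.new 4                     -- arbitrary initial capacity
  newIORef (Buf 0 v)

pushBackBuf :: IORef (Buf a) -> a -> IO ()
pushBackBuf ref x = do
  Buf n v <- readIORef ref
  v' <- if n == MV.length v
          then MV.grow v (max 1 (MV.length v))  -- full: double the capacity
          else pure v
  MV.write v' n x
  writeIORef ref (Buf (n + 1) v')

toListBuf :: IORef (Buf a) -> IO [a]
toListBuf ref = do
  Buf n v <- readIORef ref
  mapM (MV.read v) [0 .. n - 1]
```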
Footnote 1: the approach is that for every new type of unboxed vector you want to have, you make a data instance that "expands" your type into a fixed number of unboxed arrays (or other unboxed vectors). This is the point of data family: to allow different instantiations of a type to have totally different runtime representations, and to also be extensible (you can add your own data instance if you want).

Add with carry on Word8

I can't find a function addWithCarry :: Word8 -> Word8 -> (Word8, Bool) already defined in base. The only function documented as caring about carries seems to be addIntC# in GHC.Prim but it seems to never be pushed upwards through the various abstraction layers.
I could obviously roll my own by testing whether the output value is in range, and that's in fact what I am currently doing, but I'd rather reuse an already defined (potentially more efficient) one.
Is there such a thing?
If you look at the source for Word8's Num instance, you'll see that everything is done by converting to a Word# unboxed value and performing operations on that, and then narrowing down to a 8-bit value. I suspected that doing comparison on that Word# value would be more efficient, so I implemented such a thing. It's available on lpaste (which I find easier to read than StackOverflow).
Note that it includes both a test suite and Criterion benchmark. On my system, all of the various tests take ~31ns for the boxed version (user5402's implementation) and ~24ns for the primops versions.
The important function from the lpaste above is primops, which is:
-- Written for GHC < 9.2, where W8# wraps a full Word# (it wraps a Word8# since 9.2).
primops :: Word8 -> Word8 -> (Word8, Bool)
primops (W8# x#) (W8# y#) =
  (W8# (narrow8Word# z#), isTrue# (gtWord# z# 255##))
  where
    z# = plusWord# x# y#
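On GHC 9.2 and later, the same widen, add, compare idea can be written without primops by widening to Word. This is a sketch of my own, not the benchmarked code, and addWithCarryW is a hypothetical name; the sum fits in a Word, so it overflowed a Word8 exactly when it exceeds 255.

```haskell
import Data.Word (Word8)

-- Add in a wider type, then narrow; the carry is set iff the sum exceeds 255.
addWithCarryW :: Word8 -> Word8 -> (Word8, Bool)
addWithCarryW x y = (fromIntegral z, z > 255)
  where
    z = fromIntegral x + fromIntegral y :: Word
```

For example, addWithCarryW 200 100 gives (44, True), since 300 - 256 = 44.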
A way to do this is:
addWithCarry :: Word8 -> Word8 -> (Word8, Bool)
addWithCarry x y = (z, carry)
  where z = x + y
        carry = z < x

What are hashes (#) used for in the library's source?

I was trying to figure out how MVars work, and I came across this bit of code:
-- |Create an 'MVar' which is initially empty.
newEmptyMVar :: IO (MVar a)
newEmptyMVar = IO $ \ s# ->
    case newMVar# s# of
         (# s2#, svar# #) -> (# s2#, MVar svar# #)
Besides being confusingly mutually recursive with newMVar, it's also littered with hashes (#).
Between the two, I can't figure out how it works. I know that this is basically just a pseudo-constructor for MVar, but the rest of the module (most of the library, actually) contains them, and I can't find anything on them. Googling "Haskell hashs" didn't yield anything relevant.
They're (literally) magic hashes. They mark GHC's primitives, like addition, unboxed types, and unboxed tuples. You can enable writing them with
{-# LANGUAGE MagicHash #-}
Now you can import the stubs that let you use them with
import GHC.Exts
unboxed :: Int# -> Int# -> Int#
unboxed a# b# = a# +# b#
boxed :: Int -> Int -> Int
boxed (I# a#) (I# b#) = I# (unboxed a# b#)
This is actually kind of nifty when you think about it: by wrapping the magical and strict primitives like this, we can handle lazy Ints and Chars uniformly at the runtime system level.
Because primitives are not boxed, they're segregated at the kind level. This means that Int# doesn't have the kind * like normal types, which also means something like
kindClash :: Int# -> Int#
kindClash = id -- id expects boxed types
Won't compile.
To further elaborate on your code, newMVar# (with the hash) is a compiler primitive in GHC that allocates a new mutable variable; it is not the newMVar function. So newEmptyMVar is not mutually recursive so much as a thin wrapper over a compiler call. There's also some darkness gathering at the corners of this function, since we're treating IO as a perverse state monad, but let's not look too closely at that. I like my sanity too much.
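To make the state passing concrete, here is the same wrapper written out by hand, assuming GHC's internal modules GHC.IO and GHC.MVar export the IO and MVar constructors (as they currently do); myNewEmptyMVar and roundTrip are hypothetical names. IO is a newtype around a function from the world token State# RealWorld to an unboxed pair of a new token and the result, which is exactly what the lambda builds.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Control.Concurrent.MVar (putMVar, takeMVar)
import GHC.Exts (newMVar#)
import GHC.IO (IO (..))
import GHC.MVar (MVar (..))

-- Thread the world token s0 into the primop and wrap the raw MVar#.
myNewEmptyMVar :: IO (MVar a)
myNewEmptyMVar = IO $ \s0 ->
  case newMVar# s0 of
    (# s1, mvar# #) -> (# s1, MVar mvar# #)

-- Round-trip a value through a freshly made MVar.
roundTrip :: Int -> IO Int
roundTrip n = do
  m <- myNewEmptyMVar
  putMVar m n
  takeMVar m
```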
I don't use primitives in everyday code, nor should you. They come up when implementing crazy optimized hotspots, or near primitive abstractions like what you're looking at.

Passing list of different typed elements to a C function

I have a function written in C I’d like to call from a Haskell program. The function type is:
foo :: Int -> Ptr a -> IO ()
It takes a size and a pointer to anything and puts the whole thing somewhere in memory. It’s intended to be used with mixed types: you can put n floats then m bools and so on (in C).
The most convenient way to represent such a situation in Haskell would be – in my opinion – something like ([a],[b]) for instance. But, I need the whole thing to fit in a Ptr a (it’s actually a void* in C). I can try to write a function like ([a],[b]) -> Ptr c, but I need some help around it. The desired final function would be:
withArrayLen magicArray foo
Things that can be stored in memory are instances of type class Storable (in Foreign.Storable). So, given the raw FFI prototype
foreign import ccall "foo" c_foo :: CInt -> Ptr a -> IO ()
you could write something like this for homogenous lists:
homfoo :: Storable a => [a] -> IO ()
homfoo items = withArray items $ \ptr -> c_foo (fromIntegral len) ptr
  where len = length items * sizeOf (head items)
But you've said the function is intended to work with mixed types, so we need some kind of type-constrained heterogeneous list for the nice Haskell wrapper. Here is one way to do this:
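{-# LANGUAGE GADTs #-}
import Control.Monad (zipWithM_)
import Data.List (mapAccumL)
import Foreign.C.Types (CInt (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))
foreign import ccall "foo" c_foo :: CInt -> Ptr a -> IO ()
data DynStorable where
  MkStorable :: Storable a => a -> DynStorable
foo :: [DynStorable] -> IO ()
foo items =
  let (requiredSize, offsets) = mapAccumL sizeFold 0 items in
  allocaBytes requiredSize $ \ptr -> do
    -- write each item at its aligned offset
    zipWithM_
      (\offset (MkStorable x) -> pokeByteOff ptr offset x)
      offsets items
    c_foo (fromIntegral requiredSize) ptr
  where
    -- round each offset up to the item's alignment, C-struct style
    sizeFold offset (MkStorable x) =
      let unalignment = offset `mod` alignment x
          offset' = if unalignment /= 0
                      then offset + alignment x - unalignment
                      else offset
      in (offset' + sizeOf x, offset')
main :: IO ()
main = do
  foo [MkStorable (2 :: Int), MkStorable (3.0 :: Double), MkStorable True]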
The C function has no means to distinguish item boundaries in the received chunk of data, but it wouldn't be hard to include length prefixes or type codes if required.
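The offset computation is the subtle part, so here it is pulled out into a pure function that can be checked without the C side. layout, alignUp and demo are hypothetical names, and alignUp is an equivalent round-up formula rather than the exact code above. With a Word8, a Word32 and a Word8, the Word32 is padded to offset 4, giving 9 bytes in total.

```haskell
{-# LANGUAGE GADTs #-}
import Data.List (mapAccumL)
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Storable (Storable (..))

data DynStorable where
  MkStorable :: Storable a => a -> DynStorable

-- Round off up to the next multiple of al (al > 0).
alignUp :: Int -> Int -> Int
alignUp off al = (off + al - 1) `div` al * al

-- Total byte size plus the offset of each item, padded like a C struct.
layout :: [DynStorable] -> (Int, [Int])
layout = mapAccumL step 0
  where
    step off (MkStorable x) =
      let off' = alignUp off (alignment x)
      in (off' + sizeOf x, off')

-- Pack the items into a buffer and read the Word32 back from offset 4.
demo :: IO ((Int, [Int]), Word32)
demo = do
  let items = [MkStorable (1 :: Word8), MkStorable (2 :: Word32), MkStorable (3 :: Word8)]
      (total, offs) = layout items
  w <- allocaBytes total $ \p -> do
    mapM_ (\(o, MkStorable x) -> pokeByteOff p o x) (zip offs items)
    peekByteOff p 4
  pure ((total, offs), w)
```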
