Copying GHC ByteArray# to Ptr - haskell

I am trying to write the following function:
memcpyByteArrayToPtr ::
ByteArray# -- ^ source
-> Int -- ^ start
-> Int -- ^ length
-> Ptr a -- ^ destination
-> IO ()
The behavior should be to internally use memcpy to copy the contents of a ByteArray# to the Ptr. There are two techniques I have seen for doing something like this, but it's difficult for me to reason about their safety.
The first is found in the memory package. There is an auxiliary function withPtr defined as:
data Bytes = Bytes (MutableByteArray# RealWorld)
withPtr :: Bytes -> (Ptr p -> IO a) -> IO a
withPtr b#(Bytes mba) f = do
a <- f (Ptr (byteArrayContents# (unsafeCoerce# mba)))
touchBytes b
return a
But, I'm pretty sure that this is only safe because the only way to construct Bytes is by using a smart constructor that calls newAlignedPinnedByteArray#. An answer given to a similar question and the docs for byteArrayContents# indicate that it is only safe when dealing with pinned ByteArray#s. In my situation, I'm dealing with the ByteArray#s that the text library uses internally, and they are not pinned, so I believe this would be unsafe.
The second possibility I've stumbled across is in text itself. At the bottom of the Data.Text.Array source code, there is an ffi function memcpyI:
foreign import ccall unsafe "_hs_text_memcpy" memcpyI
:: MutableByteArray# s -> CSize -> ByteArray# -> CSize -> CSize -> IO ()
This is backed by the following c code:
void _hs_text_memcpy(void *dest, size_t doff, const void *src, size_t soff, size_t n)
{
memcpy(dest + (doff<<1), src + (soff<<1), n<<1);
}
Because its a part of text, I trust that this is safe. It looks like it's dangerous because is that it's getting a memory location from an unpinned ByteArray#, the very thing that the byteArrayContents# documentation warns against. I suspect that it's ok because the ffi call is marked as unsafe, which I think prevents the GC from moving the ByteArray# during the ffi call.
That's the research I've done far. So far, my best guess is that I can just copy what's been done in text. The big difference would be that, instead of passing in MutableByteArray# and ByteArray# as the two pointers, I would be passing in ByteArray# and Ptr a (or maybe Addr#, I'm not sure which of those you typically use with the ffi).
Is what I have suggested safe? Is there a better way that would allow me to avoid using the ffi? Is there something in base that does this? Feel free to correct any incorrect assumptions I've made, and thanks for any suggestions or guidance.

copyByteArrayToAddr# :: ByteArray# -> Int# -> Addr# -> Int# -> State# s -> State# s
looks like the right primop. You just need to be sure not to try to copy it into memory it occupies. So you should probably be safe with
copyByteArrayToPtr :: ByteArray# -> Int -> Ptr a -> Int -> ST s ()
copyByteArrayToPtr ba (I# x) (Ptr p) (I# y) = ST $ \ s ->
(# copyByteArrayToAddr# ba x p y s, () #)
Unfortunately, the documentation gives me no clue what each Int# is supposed to mean, but I imagine you can figure that out through trial and segfault.

Related

FFI: returning three Int#'s on the stack

Is there any way to return three Int# on the stack from a foreign function call? Here is the C code (or C code that generates an object file equivalent to the one I'm actually interested in):
struct Tuple {
int bezout1, bezout2, gcd;
}
struct Tuple extendedGcd(int a, int b) {
/* elided */
}
This won't compile since (# Int#, Int#, Int# #) isn't supported by ccall:
foreign import ccall "extendedGcd"
extendedGcd :: Int# -> Int# -> (# Int#, Int#, Int# #)
All my uses of the following (which compiles, albeit with MagicHash, GHCForeignImportPrim, ForeignFunctionInterface, UnboxedTuples, and MagicHash) seg-fault:
foreign import prim "extendedGcd"
extendedGcd :: Int# -> Int# -> (# Int#, Int#, Int# #)
I can find some evidence that there was some effort to get around this sort of problem:
thread on the mailing list
a proposed (but abandoned) CStructures GHC extension (by the same person)
I imagine this is a common issue, yet I can't find any working examples. Is this at all possible? If it isn't does anyone care about this anymore (is there a Trac ticket somewhere?)
If you can implement your procedure in assembly directly, rather than C, it's pretty easy to do this. See: http://brandon.si/code/almost-inline-asm-in-haskell-with-foreign-import-prim/ . Maybe for a GCD algorithm this will be okay for you.
There is a complete cabal project I put together here.

Using alloca- with c2hs

Consider from the c2hs documentation that this:
{#fun notebook_query_tab_label_packing as ^
`(NotebookClass nb, WidgetClass cld)' =>
{notebook `nb' ,
widget `cld' ,
alloca- `Bool' peekBool*,
alloca- `Bool' peekBool*,
alloca- `PackType' peekEnum*} -> `()'#}
generates in Haskell
notebookQueryTabLabelPacking :: (NotebookClass nb, WidgetClass cld)
=> nb -> cld -> IO (Bool, Bool, PackType)
which binds the following C function:
void gtk_notebook_query_tab_label_packing (GtkNotebook *notebook,
GtkWidget *child,
gboolean *expand,
gboolean *fill,
GtkPackType *pack_type);
Problem: Here I'm confused what the effect of alloca- has on the left side of, i.e., `Bool'.
Now I know that, in practice, what is happening is that alloca is somehow generating a Ptr Bool that peekBool can convert into an output argument. But what I'm terribly confused about is how alloca is doing this, given its type signature alloca :: Storable a => (Ptr a -> IO b) -> IO b. More specifically:
Question 1. When someone calls notebookQueryTabLabelPacking in Haskell, what does c2hs provide as argument for alloca's first parameter (Ptr a -> IO b)?
Question 2. In this case, what is the concrete type signature for alloca's first paramater (Ptr a -> IO b)? Is it (Ptr CBool -> IO CBool)?
c2hs will call alloca 3 times to allocate space for the 3 reference arguments. Each call will be provided a lambda that binds the pointer to a name, and returns an IO action that either allocates the space for the next reference argument, or, when all space has been allocated, calls the underlying function, reads the values of the reference arguments, and packages them into a tuple. Something like:
notebookQueryTabLabelPacking nb cld =
alloca $ \a3 -> do
alloca $ \a4 -> do
alloca $ \a5 -> do
gtk_notebookQueryTabLabelPacking nb cld a3 a4 a5
a3' <- peekRep a3
a4' <- peekRep a4
a5' <- peekRep a5
return (a3', a4', a5')
Each alloca is nested in the IO action of the previous one.
Note that the lambdas all take a pointer to their allocated storage, and return an IO (Bool, Bool, PackType).

Add with carry on Word8

I can't find a function addWithCarry :: Word8 -> Word8 -> (Word8, Bool) already defined in base. The only function documented as caring about carries seems to be addIntC# in GHC.Prim but it seems to never be pushed upwards through the various abstraction layers.
I could obviously roll out my own by testing whether the output value is in range and it's in fact what I am currently doing but I'd rather reuse an (potentially more efficient) already defined one.
Is there such a thing?
If you look at the source for Word8's Num instance, you'll see that everything is done by converting to a Word# unboxed value and performing operations on that, and then narrowing down to a 8-bit value. I suspected that doing comparison on that Word# value would be more efficient, so I implemented such a thing. It's available on lpaste (which I find easier to read than StackOverflow).
Note that it includes both a test suite and Criterion benchmark. On my system, all of the various tests take ~31ns for the boxed version (user5402's implementation) and ~24ns for the primops versions.
The important function from the lpaste above is primops, which is:
primops :: Word8 -> Word8 -> (Word8, Bool)
primops (W8# x#) (W8# y#) =
(W8# (narrow8Word# z#), isTrue# (gtWord# z# 255##))
where
z# = plusWord# x# y#
A way to do this is:
addWithCarry :: Word8 -> Word8 -> (Word8, Bool)
addWithCarry x y = (z, carry)
where z = x + y
carry = z < x

Haskell function with type IO Int -> Int, without using unsafePerformIO

I have a homework question asking me:
Can you write a Haskell function with type IO Int -> Int (without using unsafePerformIO)? If yes, give the function; if not, explain the reason.
I have tried to write such a function:
test :: IO Int -> Int
test a = do
x <- a
return x
But this does not work. I have tried to make it work for a while, and I can't, so I gather that the answer to the question is no. But, I do not understand why it is not possible. Why doesn't it work?
The only functions of type IO Int -> Int are uninteresting, since they must ignore their argument. that is, they must be equivalent to
n :: Int
n = ...
f :: IO Int -> Int
f _ = n
(Technically, there's also f x = x `seq` n, as #Keshav points out).
This is because there's no way to escape the IO monad (unlike most other monads). This is by design. Consider
getInt :: IO Int
getInt = fmap read getLine
which is a function which reads an integer from stdin. If we could write
n :: Int
n = f getInt
this would be an integer value which can "depend" on the IO action getInt... but must be pure, i.e. must do no IO at all. How can we use an IO action if we must do no IO at all? It turns out that we can not use it in any way.
The only way to do such operation meaningfully is to allow the programmer break the "purity" contract, which is the main thing behind Haskell. GHC gives the programmer a few "unsafe" operations if the programmer is bold enough to declare "trust me, I know what I am doing". One of them is "unsafePerformIO". Another is accessing the realWorld# low-level stuff in GHC.* modules (as #Michael shows above). Even the FFI can import a C function claiming it to be pure, essentially enabling the user to write their own unsafePerformIO.
Well, there are two ways...
If you are happy with a constant function, use this:
test :: IO Int -> Int
test a = 42
Otherwise, you have to do the same thing that unsafePerformIO does (or more specifically unsafeDupablePerformIO).
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
import GHC.Base (IO(..), realWorld#)
unIO (IO m) = case m realWorld# of (# _, r #) -> r
test :: IO Int -> Int
test a = (unIO a) + 1
Have fun...
Disclaimer: this is all really hacky ;)

Is there any hope to cast ForeignPtr to ByteArray# (for a function :: ByteString -> Vector)

For performance reasons I would like a zero-copy cast of ByteString (strict, for now) to a Vector. Since Vector is just a ByteArray# under the hood, and ByteString is a ForeignPtr this might look something like:
caseBStoVector :: ByteString -> Vector a
caseBStoVector (BS fptr off len) =
withForeignPtr fptr $ \ptr -> do
let ptr' = plusPtr ptr off
p = alignPtr ptr' (alignment (undefined :: a))
barr = ptrToByteArray# p len -- I want this function, or something similar
barr' = ByteArray barr
alignI = minusPtr p ptr
size = (len-alignI) `div` sizeOf (undefined :: a)
return (Vector 0 size barr')
That certainly isn't right. Even with the missing function ptrToByteArray# this seems to need to escape the ptr outside of the withForeignPtr scope. So my quesetions are:
This post probably advertises my primitive understanding of ByteArray#, if anyone can talk a bit about ByteArray#, it's representation, how it is managed (GCed), etc I'd be grateful.
The fact that ByteArray# lives on the GCed heap and ForeignPtr is external seems to be a fundamental issue - all the access operations are different. Perhaps I should look at redefining Vector from = ByteArray !Int !Int to something with another indirection? Someing like = Location !Int !Int where data Location = LocBA ByteArray | LocFPtr ForeignPtr and provide wrapping operations for both those types? This indirection might hurt performance too much though.
Failing to marry these two together, maybe I can just access arbitrary element types in a ForeignPtr in a more efficient manner. Does anyone know of a library that treats ForeignPtr (or ByteString) as an array of arbitrary Storable or Primitive types? This would still lose me the stream fusion and tuning from the Vector package.
Disclaimer: everything here is an implementation detail and specific to GHC and the internal representations of the libraries in question at the time of posting.
This response is a couple years after the fact, but it is indeed possible to get a pointer to bytearray contents. It's problematic as the GC likes to move data in the heap around, and things outside of the GC heap can leak, which isn't necessarily ideal. GHC solves this with:
newPinnedByteArray# :: Int# -> State# s -> (#State# s, MutableByteArray# s#)
Primitive bytearrays (internally typedef'd C char arrays) can be statically pinned to an address. The GC guarantees not to move them. You can convert a bytearray reference to a pointer with this function:
byteArrayContents# :: ByteArray# -> Addr#
The address type forms the basis of Ptr and ForeignPtr types. Ptrs are addresses marked with a phantom type and ForeignPtrs are that plus optional references to GHC memory and IORef finalizers.
Disclaimer: This will only work if your ByteString was built Haskell. Otherwise, you can't get a reference to the bytearray. You cannot dereference an arbitrary addr. Don't try to cast or coerce your way to a bytearray; that way lies segfaults. Example:
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.IO
import GHC.Prim
import GHC.Types
main :: IO()
main = test
test :: IO () -- Create the test array.
test = IO $ \s0 -> case newPinnedByteArray# 8# s0 of {(# s1, mbarr# #) ->
-- Write something and read it back as baseline.
case writeInt64Array# mbarr# 0# 1# s1 of {s2 ->
case readInt64Array# mbarr# 0# s2 of {(# s3, x# #) ->
-- Print it. Should match what was written.
case unIO (print (I# x#)) s3 of {(# s4, _ #) ->
-- Convert bytearray to pointer.
case byteArrayContents# (unsafeCoerce# mbarr#) of {addr# ->
-- Dereference the pointer.
case readInt64OffAddr# addr# 0# s4 of {(# s5, x'# #) ->
-- Print what's read. Should match the above.
case unIO (print (I# x'#)) s5 of {(# s6, _ #) ->
-- Coerce the pointer into an array and try to read.
case readInt64Array# (unsafeCoerce# addr#) 0# s6 of {(# s7, y# #) ->
-- Haskell is not C. Arrays are not pointers.
-- This won't match. It might segfault. At best, it's garbage.
case unIO (print (I# y#)) s7 of (# s8, _ #) -> (# s8, () #)}}}}}}}}
Output:
1
1
(some garbage value)
To get the bytearray from a ByteString, you need to import the constructor from Data.ByteString.Internal and pattern match.
data ByteString = PS !(ForeignPtr Word8) !Int !Int
(\(PS foreignPointer offset length) -> foreignPointer)
Now we need to rip the goods out of the ForeignPtr. This part is entirely implementation-specific. For GHC, import from GHC.ForeignPtr.
data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents
(\(ForeignPtr addr# foreignPointerContents) -> foreignPointerContents)
data ForeignPtrContents = PlainForeignPtr !(IORef (Finalizers, [IO ()]))
| MallocPtr (MutableByteArray# RealWorld) !(IORef (Finalizers, [IO ()]))
| PlainPtr (MutableByteArray# RealWorld)
In GHC, ByteString is built with PlainPtrs which are wrapped around pinned byte arrays. They carry no finalizers. They are GC'd like regular Haskell data when they fall out of scope. Addrs don't count, though. GHC assumes they point to things outside of the GC heap. If the bytearray itself falls out of the scope, you're left with a dangling pointer.
data PlainPtr = (MutableByteArray# RealWorld)
(\(PlainPtr mutableByteArray#) -> mutableByteArray#)
MutableByteArrays are identical to ByteArrays. If you want true zero-copy construction, make sure you either unsafeCoerce# or unsafeFreeze# to a bytearray. Otherwise, GHC creates a duplicate.
mbarrTobarr :: MutableByteArray# s -> ByteArray#
mbarrTobarr = unsafeCoerce#
And now you have the raw contents of the ByteString ready to be turned into a vector.
Best Wishes,
You might be able to hack together something :: ForeignPtr -> Maybe ByteArray#, but there is nothing you can do in general.
You should look at the Data.Vector.Storable module. It includes a function unsafeFromForeignPtr :: ForeignPtr a -> Int -> Int -> Vector a. It sounds like what you want.
There is also a Data.Vector.Storable.Mutable variant.

Resources