Intro
I'm wrapping a C numerical library with inline-c; some of its functions accept callbacks for step routines (think optimization or time integration of ODEs).
In the native C usage, these callbacks operate on contiguous arrays, modify them through a pointer, and hand them back to an opaque (distributed) data structure.
So it's a mutable-data problem, and I'd like to represent it on the Haskell side. My understanding is that, in the callback, we should freeze the array, work on it e.g. as a Data.Vector.Storable.Vector or with repa, thaw the result, get the foreign pointer and pass it back.
The internals: newtype Vec = Vec (Ptr Vec) deriving Storable, together with the associated entry in an inline-c Context, represents a pointer to an opaque C data structure; vecGetArray takes a Vec and yields the pointer to its contiguous memory, while vecRestoreArray hands that pointer back to the Vec.
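Roughly, the opaque type is wired up like this (a sketch only; petscCtx and the module layout are simplified stand-ins, assuming inline-c's ctxTypesTable mechanism):

{-# LANGUAGE GeneralizedNewtypeDeriving, OverloadedStrings, TemplateHaskell #-}
import qualified Data.Map as Map
import           Foreign.Ptr (Ptr)
import           Foreign.Storable (Storable)
import           Language.C.Inline.Context (Context (..), baseCtx)
import qualified Language.C.Types as CT

-- Opaque handle to the C library's Vec; only ever manipulated through its pointer.
newtype Vec = Vec (Ptr Vec) deriving Storable

-- Context entry mapping the C type name "Vec" to the Haskell Vec above,
-- so the $(Vec v) antiquotes in the inline-c blocks type-check.
petscCtx :: Context
petscCtx = baseCtx <> mempty
  { ctxTypesTable = Map.fromList [ (CT.TypeName "Vec", [t| Vec |]) ] }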
Q:
I noticed that, while the returned Vector is correct, the modified Vec (the "side effect") appears unchanged after this function returns. GHC doesn't recompute it (laziness?). How do I make it recompute it? What is the idiomatic way in Haskell to work with mutable data across the FFI?
* Fixed * (see answer below). Thanks!
import qualified Data.Vector.Storable as V
import qualified Data.Vector.Storable.Mutable as VM

withVecGetVectorM ::
     Vec
  -> (V.Vector PetscScalar_ -> IO (V.Vector PetscScalar_))
  -> IO (V.Vector PetscScalar_)
withVecGetVectorM v f = do
  p <- vecGetArrayPtr v
  pf <- newForeignPtr_ p
  vImm <- V.freeze (VM.unsafeFromForeignPtr0 pf len)
  vImmOut <- f vImm
  vMutOut <- V.thaw vImmOut
  let (fpOut, _, _) = VM.unsafeToForeignPtr vMutOut
      pOut = unsafeForeignPtrToPtr fpOut
  vecRestoreArrayPtr v pOut
  return vImmOut
  where
    len = vecSize v
Vec.hs:
vecGetArrayPtr :: Vec -> IO (Ptr PetscScalar_)
vecGetArrayPtr v = chk1 (vecGetArrayPtr' v)
vecRestoreArrayPtr :: Vec -> Ptr PetscScalar_ -> IO ()
vecRestoreArrayPtr v ar = chk0 (vecRestoreArrayPtr' v ar)
InlineC.hs:
-- PETSC_EXTERN PetscErrorCode VecGetArray(Vec,PetscScalar**);
vecGetArrayPtr' :: Vec -> IO (Ptr PetscScalar_, CInt)
vecGetArrayPtr' v = withPtr $ \p -> vga v p
  where
    vga v p = [C.exp|int{VecGetArray($(Vec v), $(PetscScalar** p))}|]

-- PETSC_EXTERN PetscErrorCode VecRestoreArray(Vec,PetscScalar**);
vecRestoreArrayPtr' :: Vec -> Ptr PetscScalar_ -> IO CInt
vecRestoreArrayPtr' v c = with c $ \pc -> vra v pc
  where
    vra w pc = [C.exp|int{VecRestoreArray($(Vec w), $(PetscScalar** pc))}|]
Moreover, IIUC, the code makes two additional copies of the vector, one at the freeze and one at the thaw, which I suspect is inefficient. Can someone suggest improvements or simplifications?
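One way to avoid those copies entirely (a sketch, not part of the original code; it reuses the vecGetArrayPtr / vecRestoreArrayPtr helpers defined below) would be to hand the callback a mutable view of the raw array and let it modify the data in place:

import qualified Data.Vector.Storable.Mutable as VM

withVecMutableM :: Vec -> (VM.IOVector PetscScalar_ -> IO a) -> IO a
withVecMutableM v f = do
  p  <- vecGetArrayPtr v
  fp <- newForeignPtr_ p                              -- no finalizer: the library owns the memory
  x  <- f (VM.unsafeFromForeignPtr0 fp (vecSize v))   -- a view of the array, not a copy
  vecRestoreArrayPtr v p
  return x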
Fixed:
I was making a trivial mistake regarding which pointer to handle; here's the fix. This function has the same signature as the buggy one, which suggests to me that there has to be a typeful way to indicate which pointer we are modifying. If anyone has suggestions regarding this, do not hesitate to get in touch.
vecRestoreVector :: Vec -> V.Vector PetscScalar_ -> IO ()
vecRestoreVector v w = do
  p <- vecGetArrayPtr v
  pf <- newForeignPtr_ p
  V.copy (VM.unsafeFromForeignPtr0 pf len) w
  vecRestoreArrayPtr v p
  where
    len = vecSize v
Related
Data.Vector includes a function maxIndex with type maxIndex :: (Ord a) => Vector a -> Int that returns the index of the maximum value in that Vector. I'm working with mutable Vectors, however, and MVector doesn't have maxIndex defined for it.
What's the best way of getting the data I want out of the MVector I have? My code currently is:
import qualified Data.Vector.Unboxed.Mutable as MV
import Control.Monad.ST
import Control.Monad (mapM_)
type MaxIndex = Int
step :: forall s. MV.MVector s Int -> MaxIndex -> ST s ()
step vec i = do
  n <- MV.unsafeRead vec i
  MV.write vec i 0
  let l = MV.length vec
      (k, x) = n `divMod` l
  mapM_ (\j -> MV.modify vec (+k) j) [0..l-1]   -- side note, this is just
                                                -- fmap (+k) vec, but MVector is not
                                                -- a functor. Is there a better way?
  mapM_ (\j -> MV.modify vec (+1) (j `mod` l)) [i+1..i+x]
Here, i is the index I'm looking to derive inside step. I'm doing this because these actions eventually need to be wrapped inside an until-style loop and repeated until a predicate is satisfied, and freezing and thawing on every cycle sounds ludicrously expensive.
I see lots of talk about unsafe freezing, which seems suspect here since you plan to mutate this memory later, violating the assurance you implicitly give when calling unsafeFreeze.
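To illustrate that hazard (a small sketch, not from the original discussion): unsafeFreeze makes no copy, so the "immutable" result aliases the mutable vector and silently changes along with it.

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

aliasingDemo :: IO ()
aliasingDemo = do
  mv <- V.thaw (V.fromList [5, 1, 2 :: Int])
  frozen <- V.unsafeFreeze mv    -- shares mv's memory, no copy is made
  print (V.maxIndex frozen)      -- prints 0
  MV.write mv 2 99               -- mutate *after* the freeze...
  print (V.maxIndex frozen)      -- ...prints 2: the supposedly pure value changed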
My suggestion is to just write an imperative-style maxIndex function. The below is typed but not tested:
import qualified Data.Vector.Unboxed.Mutable as MV
import Control.Monad.ST
import Control.Monad (mapM_)

maxIndex :: (Ord a, MV.Unbox a) => MV.MVector s a -> ST s (Maybe Int)
maxIndex mv | len == 0  = pure Nothing
            | otherwise = Just <$> go 0 0
  where
    len = MV.length mv
    go n i | i >= len  = pure n
           | otherwise = do
               nVal <- MV.unsafeRead mv n
               iVal <- MV.unsafeRead mv i
               if nVal < iVal then go i (i+1)
                              else go n (i+1)
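For example (a usage sketch on a thawed copy of a pure vector; largestAt is an illustrative name, not part of the answer above):

import qualified Data.Vector.Unboxed as V

largestAt :: V.Vector Int -> Maybe Int
largestAt v = runST (V.thaw v >>= maxIndex)
-- largestAt (V.fromList [3, 1, 4, 1, 5]) == Just 4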
Have you considered freezing the vector with unsafeFreeze which is supposed to be fast (i.e. Θ(1))? For example you can define maxIndex for mutable vectors like this:
maxIndex = fmap V.maxIndex . V.unsafeFreeze
This assumes that you have imported the following:
import qualified Data.Vector.Unboxed as V
unsafeFreeze doesn't actually copy any data and should be fast, but it would be interesting to run a criterion benchmark to see if this approach is actually faster compared to an explicit loop.
I have a computation that is best described as iterative mutations on a vector; the final result is the final state of the vector.
The "idiomatic" approach to making this functional, I think, is to simply pass on a new vector object along whenever it is "modified". So your iterative method would be operate_on_vector :: Vector -> Vector, which takes in a vector and outputs the modified vector, which is then fed through the method again.
This method is pretty straightforward and I had no problems implementing it, even being new to Haskell.
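Concretely, the pure pattern I mean looks something like this (operateOnVector below is just an illustrative stand-in for the real update step):

import qualified Data.Vector.Unboxed as V

operateOnVector :: V.Vector Double -> V.Vector Double
operateOnVector = V.map (* 0.99)                  -- placeholder for the actual computation

runSteps :: Int -> V.Vector Double -> V.Vector Double
runSteps n v0 = iterate operateOnVector v0 !! n   -- every step allocates a fresh vector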
Alternatively, one could encapsulate all of this in a State monad and pass along a constantly re-created and modified vector as the state value.
However, I suffer a huge, huge performance cost, as these calculations are pretty intensive, the iterations many (on the order of millions), and the data vectors can get pretty large (on the order of at least thousands of primitives). Re-creating a new vector in memory at every step of the iteration seems pretty costly, garbage collection or not.
Then I considered how IO works -- it can be seen as basically like State, except the state value is the "World", which is constantly changing.
Maybe I could use something that is like IO to "operate" on a "world"? And the "world" would be the vector in-memory? Sort of like a database query, but everything is in memory.
For example, with IO you could do

do
  putStrLn "enter something"
  something <- getLine
  putStrLn $ "you entered " ++ something
which can be seen as "performing" putStrLn and "modifying" the World object, returning a new World object and feeding it into the next function, which queries the World object for a string that is the result of the modification, and then returns another World object after another modification.
Is there anything like that that can do this for mutable vectors?
do
  putInVec 0 9 -- index 0, value 9
  val <- getFromVec 0
  putInVec 0 (val + 1)

that is, with "impure", "mutable" vectors, instead of passing along a new, modified vector at each step.
I believe you can do this using mutable vector and a thin wrapper over Reader + ST (or IO) monad.
It can look like this:
type MyVector = IOVector $x -- Use your own element type here instead of $x

newtype VectorIO a = VectorIO (ReaderT MyVector IO a)
  deriving (Functor, Applicative, Monad, MonadReader MyVector, MonadIO)
-- You will need the GeneralizedNewtypeDeriving extension here

-- Run your computation over an existing vector
runComputation :: MyVector -> VectorIO a -> IO MyVector
runComputation vector (VectorIO action) = runReaderT action vector >> return vector

-- Run your computation over a new vector of the specified length
runNewComputation :: Int -> VectorIO a -> IO MyVector
runNewComputation n action = do
  vector <- new n
  runComputation vector action

putInVec :: Int -> $x -> VectorIO ()
putInVec idx val = do
  v <- ask
  liftIO $ write v idx val

getFromVec :: Int -> VectorIO $x
getFromVec idx = do
  v <- ask
  liftIO $ read v idx
That's really all. You can use VectorIO monad to perform your computations, just like you wanted in your example. If you do not want IO but want pure computations, you can use ST monad; modifications to the code above will be trivial.
Update
Here is an ST-based version:
{-# LANGUAGE GeneralizedNewtypeDeriving, FlexibleInstances, MultiParamTypeClasses, Rank2Types #-}
module Main where

import Control.Monad
import Control.Monad.Trans.Class
import Control.Monad.Reader
import Control.Monad.Reader.Class
import Control.Monad.ST
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Your type of the elements
type E = Int

-- Mutable vector which will be used as a context
type MyVector s = MV.STVector s E

-- Immutable vector compatible with MyVector in its type
type MyPureVector = V.Vector E

-- Simple monad stack consisting of a reader with the mutable vector as a context
-- and of an ST action
newtype VectorST s a = VectorST (ReaderT (MyVector s) (ST s) a)
  deriving (Functor, Applicative, Monad)

-- Make the VectorST a reader monad
instance MonadReader (MyVector s) (VectorST s) where
  ask = VectorST ask
  local f (VectorST a) = VectorST $ local f a
  reader = VectorST . reader

-- Lift an ST action to a VectorST action
liftST :: ST s a -> VectorST s a
liftST = VectorST . lift

-- Run your computation over an existing vector
runComputation :: MyVector s -> VectorST s a -> ST s (MyVector s)
runComputation vector (VectorST action) = runReaderT action vector >> return vector

-- Run your computation over a new vector of the specified length
runNewComputation :: Int -> VectorST s a -> ST s (MyVector s)
runNewComputation n action = do
  vector <- MV.new n
  runComputation vector action

-- Run a computation on a new mutable vector and then freeze it to an immutable one
runComputationPure :: Int -> (forall s. VectorST s a) -> MyPureVector
runComputationPure n action = runST $ do
  vector <- runNewComputation n action
  V.unsafeFreeze vector

-- Put an element into the current vector
putInVec :: Int -> E -> VectorST s ()
putInVec idx val = do
  v <- ask
  liftST $ MV.write v idx val

-- Retrieve an element from the current vector
getFromVec :: Int -> VectorST s E
getFromVec idx = do
  v <- ask
  liftST $ MV.read v idx
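For example, the put/get program from the question can be run against this monad like so (a usage sketch that is not part of the original answer; exampleVec and the initialisation step are assumptions for illustration):

exampleVec :: MyPureVector
exampleVec = runComputationPure 3 prog
  where
    prog :: VectorST s ()
    prog = do
      mapM_ (\i -> putInVec i 0) [0, 1, 2]  -- initialise every slot: boxed MV.new leaves elements undefined
      putInVec 0 9
      val <- getFromVec 0
      putInVec 0 (val + 1)                  -- final contents: [10, 0, 0]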
I'm reading about the math foundation behind Haskell - I've learned about how closures can be used to save state in a function.
I was wondering whether Haskell allows closures, and how they can work, given that they don't seem to be pure functions.
If a function modifies its closed-over state, it will be capable of giving different outputs on identical inputs.
How is this not a problem in Haskell? Is it because you can't reassign a variable after you initially assign it a value?
You actually can simulate closures in Haskell, but not the way you might think. First, I will define a closure type:
data Closure i o = Respond (i -> (o, Closure i o))
This defines a type that at each "step" takes a value of type i which is used to compute a response of type o.
So, let's define a "closure" that accepts empty inputs and answers with integers, i.e.:
incrementer :: Closure () Int
This closure's behavior will vary from request to request. I'll keep it simple and make it so that it responds with 0 to the first request and then increments its response for each successive request:
incrementer = go 0
  where
    go n = Respond $ \() -> (n, go (n + 1))
We can then repeatedly query the closure, which yields a result and a new closure:
query :: i -> Closure i o -> (o, Closure i o)
query i (Respond f) = f i
Notice that the second half of the above type resembles a common pattern in Haskell, which is the State monad:
newtype State s a = State { runState :: s -> (a, s) }
It can be imported from Control.Monad.State. So we can wrap query in this State monad:
query :: i -> State (Closure i o) o
query i = state $ \(Respond f) -> f i
... and now we have a generic way to query any closure using the State monad:
someQuery :: State (Closure () Int) (Int, Int)
someQuery = do
  n1 <- query ()
  n2 <- query ()
  return (n1, n2)
Let's pass it our closure and see what happens:
>>> evalState someQuery incrementer
(0, 1)
Let's write a different closure that returns some arbitrary pattern:
weirdClosure :: Closure () Int
weirdClosure = Respond (\() -> (42, Respond (\() -> (666, weirdClosure))))
... and test it:
>>> evalState someQuery weirdClosure
(42, 666)
Now, writing closures by hand seems pretty awkward. Wouldn't it be nice if we could use do notation to write the closure? Well, we can! We only have to make one change to our closure type:
data Closure i o r = Done r | Respond (i -> (o, Closure i o r))
Now we can define a Monad instance (from Control.Monad) for Closure i o:
instance Monad (Closure i o) where
  return = Done
  (Done r) >>= f = f r
  (Respond k) >>= f = Respond $ \i -> let (o, c) = k i in (o, c >>= f)
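(On GHC 7.10 and later, the Monad instance also needs Functor and Applicative superclass instances; a minimal pair, using ap from Control.Monad, would be:)

instance Functor (Closure i o) where
  fmap f (Done r)    = Done (f r)
  fmap f (Respond k) = Respond $ \i -> let (o, c) = k i in (o, fmap f c)

instance Applicative (Closure i o) where
  pure  = Done
  (<*>) = ap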
And we can write a convenience function which corresponds to servicing a single request:
answer :: (i -> o) -> Closure i o ()
answer f = Respond $ \i -> (f i, Done ())
... which we can use to rewrite all our old closures:
incrementer :: Closure () Int ()
incrementer = forM_ [1..] $ \n -> answer (\() -> n)
weirdClosure :: Closure () Int r
weirdClosure = forever $ do
  answer (\() -> 42)
  answer (\() -> 666)
Now we just change our query function to:
query :: i -> StateT (Closure i o r) (Either r) o
query i = StateT $ \x -> case x of
  Respond f -> Right (f i)
  Done r    -> Left r
... and use it to write queries:
someQuery :: StateT (Closure () Int ()) (Either ()) (Int, Int)
someQuery = do
  n1 <- query ()
  n2 <- query ()
  return (n1, n2)
Now test it!
>>> evalStateT someQuery incrementer
Right (1, 2)
>>> evalStateT someQuery weirdClosure
Right (42, 666)
>>> evalStateT someQuery (return ())
Left ()
However, I still don't consider this a truly elegant approach, so I'll conclude by shamelessly plugging the Proxy type from my pipes library as a more general and more structured way of writing closures and their consumers. The Server type represents a generalized closure and the Client type represents a generalized consumer of a closure.
A closure just 'adds' additional variables to a function, so there is nothing more you can do with it than with a 'normal' one; in particular, you certainly cannot modify the captured state.
Read more:
Closures (in Haskell)
As others have said, Haskell does not allow the "state" in a closure to be altered. This prevents you from doing anything that might break function purity.
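For instance, an ordinary Haskell closure just captures an immutable binding (makeAdder below is purely illustrative):

-- makeAdder returns a function that has "closed over" n; since n can never be
-- reassigned, makeAdder 3 denotes the same function on every call.
makeAdder :: Int -> (Int -> Int)
makeAdder n = \x -> x + n

addThree :: Int -> Int
addThree = makeAdder 3   -- addThree 10 == 13, always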
I collect realtime signals, compute derived signals, and store both raw and derived data in a circular buffer, so I hold only the last million samples.
Sometimes I need to serialize current values for all signals. So I need something like:
type D0 a = M.Map SignalType (D1 a)

data D1 a = D1
  { foo :: M.Map DoorType a
  , bar :: D2 a
  , baz :: a
  }

data D2 a = D2
  { quux :: a
  , zoo  :: a
  }

type MyData = D0 SignalBuffer
type CurrentSignals = D0 SignalValue
SignalBuffer is a sequence of SignalValue. It can be an unboxed array of floats. Haskell can derive Functor instances for me, so I can use fmap to fetch the last SignalValue from every SignalBuffer and pass the structure to Aeson to serialize.
How do I implement a circular buffer API for SignalBuffer so I can push new values to all the buffers when new ticks arrive? I'd like to conserve memory, so I think I have to use unboxed arrays. Is it advantageous to use mutable unboxed arrays (STUArray?) so array updates don't pile up in memory? Is it possible to use mutable arrays in this setting at all? I'm ready to change MyData and CurrentSignals to whatever does the job.
I know how to implement circular buffers, the question is how to elegantly apply the updates to MyData.
I'm thinking of something like
type UpdateFunc a = MyData -> SignalValue -> Modifier SignalBuffer
updateAllBuffers :: D0 UpdateFunc -> Modifier MyData
Some signals are "convolutions" of other signals (not real convolutions, but a similar kind of processing). To update a buffer for a signal I need to access buffers of other signals - that's why UpdateFunc accepts MyData and SignalValue and returns a buffer modification function.
updateAllBuffers then "zips" D0 UpdateFunc and MyData to get new MyData.
Of course I'm ready to use whatever Modifier fits my task - it can be a function, a monadic value etc.
I do not entirely understand what you are trying to do with the code above, but you can use IOVector from Data.Vector.Unboxed.Mutable for a high-performance array to make a circular buffer:
{-# LANGUAGE GADTs #-}
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Vector.Unboxed.Mutable (IOVector, Unbox)
import qualified Data.Vector.Unboxed.Mutable as V
data CircularBuffer a where
  CircularBuffer :: Unbox a =>
    { start :: IORef Int -- index for getting an element
    , end   :: IORef Int -- index for putting an element
    , array :: IOVector a
    } -> CircularBuffer a

newCircularBuffer :: (Unbox a) => Int -> IO (CircularBuffer a)
newCircularBuffer size = CircularBuffer <$> newIORef 0 <*> newIORef 0 <*> V.new size

putCircularBuffer :: Unbox a => a -> CircularBuffer a -> IO ()
putCircularBuffer newEndValue (CircularBuffer _start end array) = do
  endIndex <- readIORef end
  V.write array endIndex newEndValue
  writeIORef end $! (endIndex + 1) `mod` V.length array

getCircularBuffer :: Unbox a => CircularBuffer a -> IO a
getCircularBuffer (CircularBuffer start _end array) = do
  startIndex <- readIORef start
  startValue <- V.read array startIndex
  writeIORef start $! (startIndex + 1) `mod` V.length array
  pure startValue
You can then make a map-like function (it would be in IO, though) that applies a function to every item in the CircularBuffer's array.
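Such a function might look like this (a sketch; mapCircularBuffer is a name assumed here, not part of the code above):

-- Apply f to every slot of the buffer's backing array, in place.
mapCircularBuffer :: (a -> a) -> CircularBuffer a -> IO ()
mapCircularBuffer f (CircularBuffer _start _end array) =
  mapM_ (V.modify array f) [0 .. V.length array - 1]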
I came across the following piece of code as part of this Reddit sub-thread discussing an implementation of the Fisher-Yates shuffle:
randomIs g n = fill g 0
  where
    v = enumFromN 0 n
    fill g i = when (i < n) $ do
      let (x,g') = randomR (i, n-1) g
      G.swap v i x
      fill g' (i+1)
(I guess G refers to Data.Vector.Generic.Mutable... right?). Having never created vectors monadically before, I'm struggling to grasp this, especially with no type annotations. Doesn't v have type Data.Vector Int? How come one can pass it to G.swap then? Won't it have to be thawed first?
I might have just misunderstood Data.Vector.Generic, but if someone could clarify the above (by adding type annotations, perhaps?), I'd appreciate it.
Addendum: Here's my own attempt at adding type annotations:
import qualified Data.Vector.Unboxed as UVect
import qualified Data.Vector.Unboxed.Mutable as UMVect
import qualified System.Random as R
import Control.Monad
import Control.Monad.ST
randomPermutation :: forall a. (R.RandomGen a) => a -> Int -> UVect.Vector Int
randomPermutation g n = runST newVect
  where
    newVect :: ST s (UVect.Vector Int)
    newVect = UVect.unsafeThaw (UVect.enumFromN 0 n) >>= \v ->
              fill v 0 g >>
              UVect.unsafeFreeze v
    fill x i gen = when (i < n) $
      let (j, gen') = R.randomR (i, n-1) gen in
      UMVect.unsafeSwap x i j >>
      fill x (i+1) gen'
As you can see, I'm avoiding Data.Vector.Generic to rule out the error source caused by perhaps not understanding it right. I'm also doing things in the ST monad.
In my head, the type of fill should be
UMVect.MVector (ST s (UVect.Vector Int)) Int -> Int -> a -> ST s ()
but GHC objects. Any hints? Again: It typechecks if I don't annotate fill.
Sidenote: I'd also like randomPermutation to return the updated random number generator. Thus, I'd need fill to also handle the generator's state. With my current type confusion, I don't see how to do that neatly. Any hints?
The compile error is telling us:
Expected type: ST s (UMVect.MVector (ST s (UVect.Vector Int)) Int)
Actual type: ST s (UMVect.MVector (Control.Monad.Primitive.PrimState (ST s)) Int)
So, changing the type signature of fill to UMVect.MVector (PrimState (ST s)) Int -> Int -> a -> ST s () (adding import Control.Monad.Primitive too) solves the problem!
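As for the sidenote about also returning the updated generator: one option (a sketch adapting the code above, not taken from the answer) is to have fill hand back the final generator and return it alongside the frozen vector:

randomPermutation' :: (R.RandomGen g) => g -> Int -> (UVect.Vector Int, g)
randomPermutation' g0 n = runST $ do
  v <- UVect.unsafeThaw (UVect.enumFromN 0 n)
  let fill i gen
        | i < n = do
            let (j, gen') = R.randomR (i, n - 1) gen
            UMVect.unsafeSwap v i j
            fill (i + 1) gen'
        | otherwise = pure gen
  gFinal <- fill 0 g0
  frozen <- UVect.unsafeFreeze v
  pure (frozen, gFinal)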