Efficient way to do n-ary branch / tabulated functions? - haskell

I'm trying to get some basic information on the performance characteristics of branches in SBV.
Let's suppose I have an SInt16 and a very sparse lookup table Map Int16 a. I can implement the lookup with nested ite:
sCase :: (Mergeable a) => SInt16 -> a -> Map Int16 a -> a
sCase x def = go . toList
  where
    go [] = def
    go ((k,v):kvs) = ite (x .== literal k) v (go kvs)
However, this means the generated tree will be very deep.
Does that matter?
If yes, is it better to instead generate a balanced tree of branches, effectively mirroring the Map's structure? Or is there some other scheme that would give even better performance?
If there are fewer than 256 entries in the map, would it change anything to "compress" it so that sCase works on an SInt8 and a Map Int8 a?
Is there some built-in SBV combinator for this use case that works better than iterated ite?
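For reference, here's what I mean by a balanced tree of branches: binary search on the sorted keys, giving a log-depth ite tree instead of a linear chain (an untested sketch along the lines of sCase, assuming the same Data.Map imports):
sCaseTree :: (Mergeable a) => SInt16 -> a -> Map Int16 a -> a
sCaseTree x def = go . toAscList
  where
    go []       = def
    go [(k, v)] = ite (x .== literal k) v def
    go kvs      = let (lo, hi) = splitAt (length kvs `div` 2) kvs
                      (k0, _)  = head hi
                  in ite (x .< literal k0) (go lo) (go hi)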
EDIT: It turns out that it matters a lot what my a is, so let me add some more detail to that. I am currently using sCase to branch in a stateful computation modeled as an RWS r w s a, with the following instances:
instance forall a. Mergeable a => Mergeable (Identity a) where
  symbolicMerge force cond thn els = Identity $ symbolicMerge force cond (runIdentity thn) (runIdentity els)

instance (Mergeable s, Mergeable w, Mergeable a, forall a. Mergeable a => Mergeable (m a)) => Mergeable (RWST r w s m a) where
  symbolicMerge force cond thn els = Lazy.RWST $
    symbolicMerge force cond (runRWST thn) (runRWST els)
So stripping away all the newtypes, I'd like to branch into something of type r -> s -> (a, s, w) s.t. Mergeable s, Mergeable w and Mergeable a.

Symbolic look-ups are expensive
Symbolic array lookup will be expensive regardless of what data structure you use. It boils down to the fact that there's no information available to the symbolic execution engine to cut down the state space, so it ends up doing more or less what you coded yourself.
SMTLib Arrays
However, the best solution in these cases is to actually use SMT's support for arrays: http://smtlib.cs.uiowa.edu/theories-ArraysEx.shtml
SMTLib arrays are different from what you'd consider an array in a regular programming language: they have no bounds. In that sense, an SMTLib array is more of a map from inputs to outputs, spanning the entire domain. (That is, they are equivalent to functions.) But SMT solvers have custom theories for dealing with arrays, and thus can handle problems involving them much more efficiently. (On the downside, there's no notion of index-out-of-bounds, nor any way of restricting the range of elements you can access. You can code those up yourself on top of the abstraction, though, leaving it up to you to decide how you want to handle such invalid accesses.)
If you are interested in learning more about how SMT solvers deal with arrays, the classic reference is: http://theory.stanford.edu/~arbrad/papers/arrays.pdf
Arrays in SBV
SBV supports arrays, through the SymArray class: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SymArray
The SFunArray type actually does not use SMTLib arrays. This was designed to support solvers that didn't understand Arrays, such as ABC: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SFunArray
The SArray type fully supports SMTLib arrays: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV.html#t:SArray
There are some differences between these types, and the above links describe them. However, for most purposes, you can use them interchangeably.
Converting a Haskell map to an SBV array
Going back to your original question, I'd be tempted to use an SArray to model such a lookup. I'd code it as:
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
import qualified Data.Map as M
import Data.Int
-- Fill an SBV array from a map
mapToSArray :: (SymArray array, SymVal a, SymVal b) => M.Map a (SBV b) -> array a b -> array a b
mapToSArray m a = foldl (\arr (k, v) -> writeArray arr (literal k) v) a (M.toList m)
And use it as:
g :: Symbolic SBool
g = do let def = 0
       -- get a symbolic array, initialized with def
       arr <- newArray "myArray" (Just def)
       let m :: M.Map Int16 SInt16
           m = M.fromList [(5, 2), (10, 5)]
       -- Fill the array from the map
       let arr' :: SArray Int16 Int16 = mapToSArray m arr
       -- A simple problem:
       idx1 <- free "idx1"
       idx2 <- free "idx2"
       pure $ 2 * readArray arr' idx1 + 1 .== readArray arr' idx2
When I run this, I get:
*Main> sat g
Satisfiable. Model:
idx1 = 5 :: Int16
idx2 = 10 :: Int16
You can run it as satWith z3{verbose=True} g to see the SMTLib output it generates, which avoids costly lookups by simply delegating those tasks to the backend solver.
Efficiency
The question of whether this will be "efficient" really depends on how many elements your map has that you're constructing the array from. The larger the number of elements and the trickier the constraints, the less efficient it will be. In particular, if you ever write to an index that is symbolic, I'd expect slow-downs in solving time. If they're all constants, it should be relatively performant. As is usual in symbolic programming, it's really hard to predict any performance without seeing the actual problem and experimenting with it.
Arrays in the query context
The function newArray works in the symbolic context. If you're in a query context, instead use freshArray: https://hackage.haskell.org/package/sbv-8.7/docs/Data-SBV-Control.html#v:freshArray
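For example, a query-mode version of the lookup above might look like this (an untested sketch against sbv-8.7; gQuery is my name, and freshVar/getValue are the query-mode counterparts of free and model extraction):
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Int (Int16)
import Data.SBV
import Data.SBV.Control

gQuery :: IO ()
gQuery = runSMT $ query $ do
  -- freshArray plays the role newArray played in the symbolic context
  arr :: SArray Int16 Int16 <- freshArray "myArray" (Just 0)
  let arr' = writeArray arr 5 2
  idx <- freshVar "idx"
  constrain $ readArray arr' idx .== 2
  cs <- checkSat
  case cs of
    Sat -> io . print =<< getValue idx
    _   -> io $ putStrLn "no model found"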

Related

The simplest way to generically traverse a tree in haskell

Suppose I used the language-javascript library to build an AST in Haskell. The AST has nodes of different types, and each node can have fields of those different types.
And each type can have numerous constructors. (All the types instantiate Data, Eq and Show).
I would like to count each type's constructor occurrences in the tree. I could use toConstr to get the constructor, and ideally I'd make a Tree -> [Constr] function first (then counting is easy).
There are different ways to do that. Obviously pattern matching is too verbose (imagine around 3 types with 9-28 constructors).
So I'd like to use a generic traversal, and I tried to find the solution in SYB library.
There is an everywhere function, which doesn't suit my needs since I don't need a Tree -> Tree transformation.
There is gmapQ, which seems suitable in terms of its type, but as it turns out it's not recursive.
The most viable option so far is everywhereM. It still does the useless transformation, but I can use a Writer to collect toConstr results. Still, this way doesn't really feel right.
Is there an alternative that will not perform a useless (for this task) transformation and still deliver the list of constructors? (The order of their appearance in the tree doesn't matter for now)
Not sure if it's the simplest, but:
> data T = L | B T T deriving Data
> everything (++) (const [] `extQ` (\x -> [toConstr (x::T)])) (B L (B (B L L) L))
[B,L,B,B,L,L,L]
Here ++ says how to combine the results from subterms.
const [] is the base case for subterms that are not of type T. For those of type T, instead, we apply \x -> [toConstr (x::T)].
If you have multiple tree types, you'll need to extend the query using
const [] `extQ` (handleType1) `extQ` (handleType2) `extQ` ...
This is needed to identify the types for which we want to take the constructors. If there are a lot of types, probably this can be made shorter in some way.
Note that the code above is not very efficient on large trees, since using ++ in this way can lead to quadratic complexity. Performance-wise, it would be better to return a Data.Map.Map Constr Int. (Even if we'd need to define some Ord Constr for that.)
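For instance, keying the map on the constructor's show representation sidesteps the missing Ord Constr instance entirely; a sketch (countConstrs is my name):
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics (Data, everything, extQ, toConstr)
import qualified Data.Map.Strict as Map

data T = L | B T T deriving Data

-- Map.unionWith (+) merges the counts coming from the subterms.
countConstrs :: T -> Map.Map String Int
countConstrs = everything (Map.unionWith (+)) (const Map.empty `extQ` go)
  where
    go :: T -> Map.Map String Int
    go x = Map.singleton (show (toConstr x)) 1

-- e.g. countConstrs (B L (B (B L L) L)) == Map.fromList [("B",3),("L",4)]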
universe from the Data.Generics.Uniplate.Data module can give you a list of all the sub-trees of the same type. So using Ilya's example:
data T = L | B T T deriving (Data, Show)
tree :: T
tree = B L (B (B L L) L)
λ> import Data.Generics.Uniplate.Data
λ> universe tree
[B L (B (B L L) L),L,B (B L L) L,B L L,L,L,L]
λ> fmap toConstr $ universe tree
[B,L,B,B,L,L,L]

Haskell vector C++ push_back analogue

I've discovered that Haskell's Data.Vector.* modules lack C++ std::vector::push_back's functionality. There is grow/unsafeGrow, but they seem to have O(n) complexity.
Is there a way to grow vectors in O(1) amortized time for an element?
No, there really is no such facility in Data.Vector. It isn't too difficult to implement this from scratch using MutableArray like Data.Vector.Mutable does (see my implementation below), but there are some significant drawbacks. In particular, all of its operations end up happening inside some state context, usually ST or IO. This has the downsides that:
Any code that manipulates such a data structure ends up having to be monadic
The compiler is much less likely to be able to optimize. For example, libraries like vector use something really clever called fusion to optimize away intermediate allocations. This sort of thing is not possible in a state context.
Parallelism is going to be a lot tougher: in ST I can't even have two threads and in IO I will have race conditions all over the place. The nasty bit here is that any sharing is going to have to happen in IO.
As if all this wasn't enough, garbage collection also performs better inside pure code.
What do I do then?
It isn't particularly often that you have a need for exactly this behaviour - usually you are better off using an immutable data structure (thereby avoiding all of the aforementioned problems) which does something similar. Just limiting ourselves to the containers package (which comes with GHC), some alternatives include:
if you are almost always just using push_back, maybe you just want a stack (a plain old [a]).
if you anticipate doing more push_back than lookups, Data.Sequence gives you O(1) appending to either end and O(log n) lookup (see the sketch just after this list).
if you are interested in a lot of operations especially hashmap-like, Data.IntMap is pretty optimized. Even if the theoretical cost of those operations is O(log n), you will need a pretty big IntMap to start feeling those costs.
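For the Data.Sequence option, for example, a push_back is literally the (|>) operator:
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (|>))

-- O(1) append at the right end; no mutation, no monad.
pushBack :: Seq a -> a -> Seq a
pushBack = (|>)

-- e.g. foldl pushBack Seq.empty [1,2,3] == Seq.fromList [1,2,3]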
Making something like C++ vector
Of course, if one doesn't care about the restrictions mentioned initially, there is no reason not to have a C++ like vector. Just for fun, I went ahead and implemented this from scratch (needs packages data-default and primitive).
The reason this code is probably not already in some library is that it goes against much of the spirit of Haskell (I do this with the intent of conforming to a C++ style vector).
The only operation that actually makes a new vector is newEmpty - everything else "modifies" an existing vector. Since pushBack doesn't return a new GrowVector, it has to modify the existing one (including its length and/or capacity), so length and capacity have to be "pointers". In turn, that means that even getting the length is a monadic operation.
While this isn't unboxed, it would not be too difficult to replicate vector's data family approach - it is just tedious1.
With that said:
module GrowVector (
  GrowVector, newEmpty, size, read, write, pushBack, popBack
) where

import Data.Primitive.Array
import Data.Primitive.MutVar
import Data.Default
import Control.Monad
import Control.Monad.Primitive (PrimState, PrimMonad)
import Prelude hiding (length, read)

data GrowVector s a = GrowVector
  { underlying :: MutVar s (MutableArray s a) -- ^ underlying array
  , length :: MutVar s Int                    -- ^ perceived length of vector
  , capacity :: MutVar s Int                  -- ^ actual capacity
  }
type GrowVectorIO = GrowVector (PrimState IO)
-- | Make a new empty vector with the given capacity. O(n)
newEmpty :: (Default a, PrimMonad m) => Int -> m (GrowVector (PrimState m) a)
newEmpty cap = do
  arr <- newArray cap def
  GrowVector <$> newMutVar arr <*> newMutVar 0 <*> newMutVar cap
-- | Read an element in the vector (unchecked). O(1)
read :: PrimMonad m => GrowVector (PrimState m) a -> Int -> m a
g `read` i = do arr <- readMutVar (underlying g); arr `readArray` i
-- | Find the size of the vector. O(1)
size :: PrimMonad m => GrowVector (PrimState m) a -> m Int
size g = readMutVar (length g)
-- | Double the vector capacity. O(n)
resize :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> m ()
resize g = do
  curCap <- readMutVar (capacity g)          -- read current capacity
  curArr <- readMutVar (underlying g)        -- read current array
  curLen <- readMutVar (length g)            -- read current length
  newArr <- newArray (2 * curCap) def        -- allocate a new array twice as big
  copyMutableArray newArr 0 curArr 0 curLen  -- copy the old array over, from index 0
  underlying g `writeMutVar` newArr          -- use the new array in the vector
  capacity g `modifyMutVar'` (*2)            -- update the capacity in the vector
-- | Write an element to the array (unchecked). O(1)
write :: PrimMonad m => GrowVector (PrimState m) a -> Int -> a -> m ()
write g i x = do arr <- readMutVar (underlying g); writeArray arr i x
-- | Pop an element of the vector, mutating it (unchecked). O(1)
popBack :: PrimMonad m => GrowVector (PrimState m) a -> m a
popBack g = do
  s <- size g
  x <- g `read` (s - 1)
  length g `modifyMutVar'` (subtract 1)
  pure x
-- | Push an element. (Amortized) O(1)
pushBack :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> a -> m ()
pushBack g x = do
  s <- readMutVar (length g)    -- read current size
  c <- readMutVar (capacity g)  -- read current capacity
  when (s == c) (resize g)      -- if the array is full, resize
  write g s x                   -- write to the back of the array (index s)
  length g `modifyMutVar'` (+1) -- increase the length
Current semantics of grow
I think the github issue does a pretty good job of explaining the semantics:
I think the intended semantics are that it may do a realloc, but not guaranteed to, and all the current implementations do the simpler copying semantics because for on heap allocations the cost should be roughly the same.
Basically you should use grow when you want a new mutable vector of an increased size, starting with the elements of the old vector (and no longer care about the old vector). This is quite useful - for example one could implement GrowVector using MVector and grow.
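To illustrate that last remark, here is a sketch (untested, names are mine) of an amortized push_back built on Data.Vector.Mutable and grow, doubling the capacity whenever the logical length catches up with it:
import Control.Monad.Primitive (PrimMonad, PrimState)
import Data.Primitive.MutVar
import qualified Data.Vector.Mutable as MV

data MVecStack m a = MVecStack
  { msLen :: MutVar (PrimState m) Int                          -- logical length
  , msVec :: MutVar (PrimState m) (MV.MVector (PrimState m) a) -- backing store
  }

newStack :: PrimMonad m => m (MVecStack m a)
newStack = MVecStack <$> newMutVar 0 <*> (newMutVar =<< MV.new 0)

push :: PrimMonad m => MVecStack m a -> a -> m ()
push (MVecStack lenRef vecRef) x = do
  n <- readMutVar lenRef
  v <- readMutVar vecRef
  v' <- if n == MV.length v
          then do v2 <- MV.grow v (max 1 (MV.length v)) -- double (at least 1)
                  writeMutVar vecRef v2
                  pure v2
          else pure v
  MV.write v' n x
  writeMutVar lenRef (n + 1)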
1 The approach is that for every new type of unboxed vector you want to have, you make a data instance that "expands" your type into a fixed number of unboxed arrays (or other unboxed vectors). This is the point of data families: to allow different instantiations of a type to have totally different runtime representations, and to also be extensible (you can add your own data instance if you want).

Haskell: Filtering based on an index vector, using only basic higher-order functions

The problem
I have a vector a of size N holding sample data, and another vector b of size M (N>M) holding indices. I would like to obtain a vector c of size M containing the filtered elements from a based on the indices in b.
The question
Is it possible to implement the desired function without using list comprehension, just basic higher-order functions like map, zipWith, filter, etc. (more precisely, their equivalents mapV, zipWithV, filterV, etc.)?
Prerequisites:
I am using a Haskell Embedded Domain Specific Language (ForSyDe, module ForSyDe.Shallow.Vector), limited to a set of hardware-synthesizable functions. In order to respect the design methodology, I am allowed to use only the provided functions (thus I cannot use list comprehensions, etc.).
Disclaimer:
I did not test this code for functionality because cabal started acting up. It worked well for lists, and since I transform every vector to a list it should work fine, although problems may arise.
Try this:
indexFilter :: (Num b, Eq b, Enum b) => Vector a -> Vector b -> Vector a
indexFilter v indices = vector (map fst (filter (\x -> snd x `elem` fromVector indices) vectorMap))
  where
    vectorMap = zip (fromVector v) [0..]
vectorMap is just a zip of the elements of a with their indices in the vector, i.e. a list of tuples of the form (<element>, <index>). indexFilter then returns a vector of all elements whose index occurs in the vector b.
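For example (assuming ForSyDe's vector/fromVector conversions), filtering a four-element vector by indices 0 and 2 should keep the first and third elements:
fromVector (indexFilter (vector [10, 20, 30, 40]) (vector [0, 2]))
-- == [10, 30]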
Although the answer provided by ThreeFx is a correct answer to the question, it did not solve my problem due to several constraints enforced by the design methodology (ForSyDe), which were not mentioned:
lists cannot be used (they cannot be synthesized to other backends). ForSyDe provides two data containers: Signal (for temporal span) and Vector (for spatial span). This should ensure analyzability for system synthesis.
elem does not have a ForSyDe.Shallow.Vector implementation
Solution 1
Using only what the library provides, the shortest solution I found is:
indexFilter1 :: (Num b, Eq b, Enum b) => Vector a
             -> Vector b
             -> Vector (Vector a)
indexFilter1 v = mapV (\idx -> selectV idx 1 1 v)
The output vector can further be unwrapped, depending on the further usage.
Solution 2
Translating ThreeFx's solution to satisfy the constraints mentioned:
indexFilter :: (Num b, Eq b, Enum b) => Vector a
            -> Vector b
            -> Vector a
indexFilter v idx = mapV fst (filterV (\x -> elemV (snd x) idx) vectorMap)
  where
    vectorMap = zipWithV (\a b -> (b, a)) (iterateV size (+1) 0) v
    size = lengthV v
    elemV a = foldlV (\acc x -> acc || x == a) False

Haskell: shuffling data without functional dependencies

I'm trying to implement a Fisher-Yates shuffle of some data. This algorithm is easy to implement for one-dimensional arrays. However, I need to be able to shuffle data in a two-dimensional matrix.
An approach which I think could generalize nicely to higher-dimensional arrays is to convert my arbitrarily dimensioned matrix to a one-dimensional array of indices, shuffle those, and then reorganize the matrix by moving the element at each matrix position to the position recorded in the shuffled index array. In other words, to take a 2x2 matrix such as:
1 2
3 4
I would convert this into this "array":
[(0, (0,0)), (1, (0,1)), (2, (1,0)), (3, (1,1))]
This I would then scramble per normal into, say,
[(0, (1,0)), (1, (0,1)), (2, (1,1)), (3, (0,0))]
Once reorganized, the original matrix would become:
2 3
4 1
My basic approach here is that I want to have a type class that looks something like this:
class Shufflable a where
  indices :: a -> Array Int b
  reorganize :: a -> Array Int b -> a
Then I'll have a function to perform the shuffle which looks like this:
fisherYates :: (RandomGen g) => g -> Array Int b -> (Array Int b, g)
The thinking is that (minus the RandomGen plumbing) I should be able to shuffle a shuffleable thing like so:
shuffle :: (Shufflable a, RandomGen g) => a -> g -> (a, g)
shuffle array = reorganize array (fisherYates (indices array))
Here's what I have so far:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
module Shuffle where

import Data.Array hiding (indices)
import System.Random

fisherYates :: (RandomGen g) => Array Int e -> g -> (Array Int e, g)
fisherYates arr gen = go max gen arr
  where
    (_, max) = bounds arr
    go 0 g arr = (arr, g)
    go i g arr = go (i-1) g' (swap arr i j)
      where
        (j, g') = randomR (0, i) g

class Shuffle a b | a -> b where
  indices :: a -> Array Int b
  reorganize :: a -> Array Int b -> a

shuffle :: (Shuffle a b, RandomGen g) => a -> g -> (a, g)
shuffle a gen = (reorganize a indexes, gen')
  where
    (indexes, gen') = fisherYates (indices a) gen

instance (Ix ix) => Shuffle (Array ix e) ix where
  reorganize a = undefined
  indices a = array (0, maxIdx) (zip [0..maxIdx] (range bound))
    where
      bound = bounds a
      maxIdx = rangeSize bound - 1

swap :: Ix i => Array i e -> i -> i -> Array i e
swap arr i j = arr // [ (i, i'), (j, j') ]
  where
    i' = arr!j
    j' = arr!i
My problems:
I feel like this is a lot of language extensions for solving a simple problem. Would it be easier to understand or write another way?
I feel like the community is moving towards type families over functional dependencies. Is there a way to use that instead to solve this problem?
A part of me wonders if the fisherYates function can be moved into the Shuffle typeclass somehow. Would it be possible and/or worth doing to set this up so that you either implement shuffle or implement both indices and reorganize?
Thanks!
You might want to look into repa, which offers n-dimensional arrays which encode their shape (dimensions) into the type; you can code generic operations that work on arrays of any shape with it.
I think you could avoid a typeclass entirely by constructing the array with backpermute or fromFunction and translating the indices (it's more efficient than it looks, since it gets turned into an unboxed array when you force it; in fact, backpermute is implemented in terms of fromFunction under the hood).
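To sketch that idea (untested; permute is my name): given a flat vector of source indices, backpermute reads each output slot i from the array at position perm[i], so a Fisher-Yates shuffle of the indices induces a shuffle of the elements:
import qualified Data.Array.Repa as R
import Data.Array.Repa (Z(..), (:.)(..), DIM1)

-- Reorder arr according to a permutation of its flat indices.
permute :: R.Source r e => R.Array R.U DIM1 Int -> R.Array r DIM1 e -> R.Array R.D DIM1 e
permute perm arr = R.backpermute (R.extent arr)
                                 (\(Z :. i) -> Z :. (perm R.! (Z :. i)))
                                 arr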
repa itself uses quite a few language extensions, but you might find it preferable to the standard library's arrays for reasons of both performance (repa's arrays are unboxed, and the standard operations offered do nice things like automatic parallelisation) and convenience (IMO repa has a nicer API than the standard arrays).
Here's a good introduction to repa.
Admittedly, none of this directly simplifies your code. But if repa's arrays are a good fit for you, then the code you end up with will probably avoid many of the complexities of your current solution.
That said, turning your use of functional dependencies into a type family is really simple; the Shuffle class becomes
class Shuffle a where
  type Elt a
  indices :: a -> Array Int (Elt a)
  reorganize :: a -> Array Int (Elt a) -> a
the instance becomes
instance (Ix ix) => Shuffle (Array ix e) where
  type Elt (Array ix e) = ix
  ...
and the Shuffle a b constraint becomes Shuffle a.

Random-Pivot Quicksort in Haskell

Is it possible to implement a quicksort in Haskell (with RANDOM-PIVOT) that still has a simple Ord a => [a]->[a] signature?
I'm starting to understand Monads, and, for now, I'm kind of interpreting monads as something like a 'command pattern', which works great for IO.
So, I understand that a function that returns a random number should actually return a monadic value like IO, because, otherwise, it would break referential transparency. I also understand that there should be no way to 'extract' the random integer from the returned monadic value, because, otherwise, it would, again, break referential transparency.
But yet, I still think that it should be possible to implement a 'pure' [a] -> [a] quicksort function, even if it uses a random pivot, because it IS referentially transparent. From my point of view, the random pivot is just an implementation detail, and shouldn't change the function's signature.
OBS: I'm not actually interested in the specific quicksort problem (so, I don't want to sound rude, but I'm not looking for "use mergesort" or "a random pivot doesn't increase performance in practice" kinds of answers). I'm actually interested in how to implement a 'pure' function that uses 'impure' functions inside it, in cases like quicksort, where I can assure that the function actually is a pure one.
Quicksort is just a good example.
You are making the false assumption that picking the pivot point is just an implementation detail. Consider a partial ordering on a set, like a quicksort on cards where card a < card b if the face value is less, but where any comparison across suits comes out False:
4 spades < 4 hearts (false)
4 hearts < 4 spades (false)
4 hearts = 4 spades (false)
In that case the choice of pivots would determine the final ordering of the cards. In precisely the same way, the output of a program like
a = get random integer
b = a + 3
print b
is determined by a. If you are randomly choosing something, then your computation is, or could be, non-deterministic.
OK, check this out.
Select portions copied from the hashable package, and voodoo magic language pragmas:
{-# LANGUAGE FlexibleInstances, UndecidableInstances, NoMonomorphismRestriction, OverlappingInstances #-}
import System.Random (mkStdGen, next, split)
import Data.List (foldl')
import Data.Bits (shiftL, xor)

class Hashable a where
  hash :: a -> Int

instance (Integral a) => Hashable a where
  hash = fromIntegral

instance Hashable Char where
  hash = fromEnum

instance (Hashable a) => Hashable [a] where
  hash = foldl' combine 0 . map hash

-- ask the authors of the hashable package about this if interested
combine h1 h2 = (h1 + h1 `shiftL` 5) `xor` h2
OK, so now we can take a list of anything Hashable and turn it into an Int. I've provided Char and Integral a instances here; more and better instances are in the hashable package, which also allows salting and stuff.
This is all just so we can make a number generator.
genFromHashable = mkStdGen . hash
So now the fun part. Let's write a function that takes a random number generator, a comparator function, and a list. Then we'll sort the list by consulting the generator to select a pivot, and the comparator to partition the list.
qSortByGen _ _ [] = []
qSortByGen g f xs = qSortByGen g'' f l ++ mid ++ qSortByGen g''' f r
  where (l, mid, r) = partition (`f` pivot) xs
        pivot = xs !! (pivotLoc `mod` length xs)
        (pivotLoc, g') = next g
        (g'', g''') = split g'

partition f = foldl' step ([],[],[])
  where step (l,mid,r) x = case f x of
          LT -> (x:l,mid,r)
          EQ -> (l,x:mid,r)
          GT -> (l,mid,x:r)
Library functions: next grabs an Int from the generator, and produces a new generator. split forks the generator into two distinct generators.
My functions: partition uses f :: a -> Ordering to partition the list into three lists. If you know folds, it should be quite clear. (Note that it does not preserve the initial ordering of the elements in the sublists; it reverses them. Using a foldr could remedy this were it an issue.) qSortByGen works just like I said before: consult the generator for the pivot, partition the list, fork the generator for use in the two recursive calls, recursively sort the left and right sides, and concatenate it all together.
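The reversal is easy to see with the definitions above loaded:
ghci> partition (`compare` 5) [1,5,9,3,7]
([3,1],[5],[7,9])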
Convenience functions are easy to compose from here
qSortBy f xs = qSortByGen (genFromHashable xs) f xs
qSort = qSortBy compare
Notice the final function's signature.
ghci> :t qSort
qSort :: (Ord a, Hashable a) => [a] -> [a]
The type inside the list must implement both Hashable and Ord. There's the "pure" function you were asking for, with one logical added requirement. The more general functions are less restrictive in their requirements.
ghci> :t qSortBy
qSortBy :: (Hashable a) => (a -> a -> Ordering) -> [a] -> [a]
ghci> :t qSortByGen
qSortByGen
:: (System.Random.RandomGen t) =>
t -> (a -> a -> Ordering) -> [a] -> [a]
Final notes
qSort will behave exactly the same way for all inputs. The "random" pivot selection is, in fact, deterministic. But it is obscured by hashing the list and then seeding a random number generator, making it "random" enough for me. ;)
qSort also only works for lists with length less than maxBound :: Int, which ghci tells me is 9,223,372,036,854,775,807. I thought there would be an issue with negative indexes, but in my ad-hoc testing I haven't run into it yet.
Or, you can just live with the IO monad for "truer" randomness.
qSortIO xs = do g <- getStdGen -- add getStdGen to your imports
                return $ qSortByGen g compare xs
ghci> :t qSortIO
qSortIO :: (Ord a) => [a] -> IO [a]
ghci> qSortIO "Hello world"
" Hdellloorw"
ghci> qSort "Hello world"
" Hdellloorw"
In such cases, where you know that the function is referentially transparent but you can't prove it to the compiler, you may use the function unsafePerformIO :: IO a -> a from the module System.IO.Unsafe.
For instance, you may use unsafePerformIO to get an initial random state and then do anything using just this state.
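A sketch of that pattern (the NOINLINE pragma matters: without it GHC may inline and re-run the IO action, breaking even this dubious use):
import System.IO.Unsafe (unsafePerformIO)
import System.Random (StdGen, newStdGen)

{-# NOINLINE initialGen #-}
initialGen :: StdGen
initialGen = unsafePerformIO newStdGen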
But please note: don't use it if it's not really needed, and even then, think twice about it. unsafePerformIO is somewhat the root of all evil, since its consequences can be dramatic - anything is possible, from coercing different types to crashing the RTS.
Haskell provides the ST monad to perform non-referentially-transparent actions with a referentially transparent result.
Note that it doesn't enforce referential transparency; it just ensures that potentially non-referentially-transparent temporary state can't leak out. Nothing can prevent you from returning manipulated pure input data that was rearranged in a non-reproducible way. Best is to implement the same thing in both ST and pure ways and use QuickCheck to compare them on random inputs.
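A minimal example of the ST pattern, with the mutation sealed inside runST and a pure result outside:
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0                        -- mutable accumulator, local to runST
  mapM_ (\x -> modifySTRef' ref (+ x)) xs  -- imperative-style loop
  readSTRef ref                            -- a pure Int falls out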
