Properly exploit parallelism when building a map of expensive keys? - haskell

I'm writing a toy implementation of a rainbow table in Haskell. The main datastructure is a strict Map h c, containing a large amount of pairs, generated from random values c:
import qualified Data.Map as M
import System.Random
table :: (RandomGen g, Random c) => Int -> g -> Map h c
table n = M.fromList . map (\c -> (chain c, c)) . take n . randoms
where chain is very expensive to compute. The part that dominates the computation time is embarrassingly parallel, so I would expect to get a quasi-linear speedup in the number of cores if it runs in parallel.
However, I would like the computed pairs to be added to the table straight away, rather than accumulated in a list in memory. It should be noted that collisions may occur, and in that case, the redundant chains should be dropped as soon as possible. Heap profiling confirms that this is the case.
I've found parMap from Control.Parallel.Strategies, and tried to apply it to my table-building function:
table n = M.fromList . parMap (evalTuple2 rseq rseq) (\c -> (chain c, c)) . take n . randoms
but, running with -N, I get to 1.3 core usage at best. Heap profiling indicates, at least, that the intermediate list does not reside in memory, but '-s' also reports 0 sparks created. How is this possible with my usage of parMap ? What is the proper way to do this ?
EDIT: chain is defined as:
chain :: (c -> h) -> [h -> c] -> c -> h
chain h = h . flip (foldl' (flip (.h)))
where (c -> h) is the target hash function, from cleartext to hash,
and [h -> c] is a family of reducer functions. I want the implementation to stay generic over c and h, but for benchmarking I use strict bytestrings for both.

Here is what I came up with. Let me know how the benchmarks work out:
#!/usr/bin/env stack
{- stack --resolver lts-14.1 script --optimize
--package scheduler
--package containers
--package random
--package splitmix
--package deepseq
-}
{-# LANGUAGE BangPatterns #-}
import Control.DeepSeq
import Control.Scheduler
import Data.Foldable as F
import Data.IORef
import Data.List (unfoldr)
import Data.Map.Strict as M
import System.Environment (getArgs)
import System.Random as R
import System.Random.SplitMix
-- for simplicity
chain :: Show a => a -> String
chain = show
makeTable :: Int -> SMGen -> (SMGen, M.Map String Int)
makeTable = go M.empty
where go !acc i gen
| i > 0 =
let (c, gen') = R.random gen
in go (M.insert (chain c) c acc) (i - 1) gen'
| otherwise = (gen, acc)
makeTablePar :: Int -> SMGen -> IO (M.Map String Int)
makeTablePar n0 gen0 = do
let gens = unfoldr (Just . splitSMGen) gen0
gensState <- initWorkerStates Par (\(WorkerId wid) -> newIORef (gens !! wid))
tables <-
withSchedulerWS gensState $ \scheduler -> do
let k = numWorkers (unwrapSchedulerWS scheduler)
(q, r) = n0 `quotRem` k
forM_ ((if r == 0 then [] else [r]) ++ replicate k q) $ \n ->
scheduleWorkState scheduler $ \genRef -> do
gen <- readIORef genRef
let (gen', table) = makeTable n gen
writeIORef genRef gen'
table `deepseq` pure table
pure $ F.foldl' M.union M.empty tables
main :: IO ()
main = do
[n] <- fmap read <$> getArgs
gen <- initSMGen
print =<< makeTablePar n gen
Few notes on implementation:
Don't use generator from random, it is hella slow, splitmix is x200 faster
In makeTable, if you want duplicate results to be discarded right away, then manual loop or unfold is required. But since we need the generator returned, I opted for the manual loop.
In order to minimize synchronization between threads, independent maps will be built up per thread, and at the end duplicates get removed, when resulting maps are merged together.

Related

Parallelize computation of mutable vector in ST

How can computations done in ST be made to run in parallel?
I have a vector which needs to be filled in by random access, hence the use of ST, and the computation runs correctly single-threaded, but have been unable to figure out how to use more than one core.
Random access is needed because of the meaning of the indices into the vector. There are n things and every possible way of choosing among n things has an entry in the vector, as in the choice function. Each of these choices corresponds to a binary number (conceptually, a packed [Bool]) and these Int values are the indices. If there are n things, then the size of the vector is 2^n. The natural way the algorithm runs is for every entry corresponding to "n choose 1" to be filled in, then every entry for "n choose 2," etc. The entries corresponding to "n choose k" depends on the entries corresponding to "n choose (k-1)." The integers for the different choices do not occur in numerical order, and that's why random access is needed.
Here's a pointless (but slow) computation that follows the same pattern. The example function shows how I tried to break the computation up so that the bulk of the work is done in a pure world (no ST monad). In the code below, bogus is where most of the work is done, with the intent of calling that in parallel, but only one core is ever used.
import qualified Data.Vector as Vb
import qualified Data.Vector.Mutable as Vm
import qualified Data.Vector.Generic.Mutable as Vg
import qualified Data.Vector.Generic as Gg
import Control.Monad.ST as ST ( ST, runST )
import Data.Foldable(forM_)
import Data.Char(digitToInt)
main :: IO ()
main = do
putStrLn $ show (example 9)
example :: Int -> Vb.Vector Int
example n = runST $ do
m <- Vg.new (2^n) :: ST s (Vm.STVector s Int)
Vg.unsafeWrite m 0 (1)
forM_ [1..n] $ \i -> do
p <- prev m n (i-1)
let newEntries = (choiceList n i) :: [Int]
forM_ newEntries $ \e -> do
let v = bogus p e
Vg.unsafeWrite m e v
Gg.unsafeFreeze m
choiceList :: Int -> Int -> [Int]
choiceList _ 0 = [0]
choiceList n 1 = [ 2^k | k <- [0..(n-1) ] ]
choiceList n k
| n == k = [2^n - 1]
| otherwise = (choiceList (n-1) k) ++ (map ((2^(n-1)) +) $ choiceList (n-1) (k-1))
prev :: Vm.STVector s Int -> Int -> Int -> ST s Integer
prev m n 0 = return 1
prev m n i = do
let chs = choiceList n i
v <- mapM (\k -> Vg.unsafeRead m k ) chs
let e = map (\k -> toInteger k ) v
return (sum e)
bogus :: Integer -> Int -> Int
bogus prior index = do
let f = fac prior
let g = (f^index) :: Integer
let d = (map digitToInt (show g)) :: [Int]
let a = fromIntegral (head d)^2
a
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)
If anyone tests this, using more than 9 or 10 in show (example 9) will take much longer than you want to wait for such a pointless sequence of numbers.
Just do it in IO. If you need to use the result in pure code, then unsafePerformIO is available.
The following version runs about 3-4 times faster with +RTS -N16 than +RTS -N1. My changes involved converting the ST vectors to IO, changing the forM_ to forConcurrently_, and adding a bang annotation to let !v = bogus ....
Full code:
import qualified Data.Vector as Vb
import qualified Data.Vector.Mutable as Vm
import qualified Data.Vector.Generic.Mutable as Vg
import qualified Data.Vector.Generic as Gg
import Control.Monad.ST as ST ( ST, runST )
import Data.Foldable(forM_)
import Data.Char(digitToInt)
import Control.Concurrent.Async
import System.IO.Unsafe
main :: IO ()
main = do
let m = unsafePerformIO (example 9)
putStrLn $ show m
example :: Int -> IO (Vb.Vector Int)
example n = do
m <- Vg.new (2^n)
Vg.unsafeWrite m 0 (1)
forM_ [1..n] $ \i -> do
p <- prev m n (i-1)
let newEntries = (choiceList n i) :: [Int]
forConcurrently_ newEntries $ \e -> do
let !v = bogus p e
Vg.unsafeWrite m e v
Gg.unsafeFreeze m
choiceList :: Int -> Int -> [Int]
choiceList _ 0 = [0]
choiceList n 1 = [ 2^k | k <- [0..(n-1) ] ]
choiceList n k
| n == k = [2^n - 1]
| otherwise = (choiceList (n-1) k) ++ (map ((2^(n-1)) +) $ choiceList (n-1) (k-1))
prev :: Vm.IOVector Int -> Int -> Int -> IO Integer
prev m n 0 = return 1
prev m n i = do
let chs = choiceList n i
v <- mapM (\k -> Vg.unsafeRead m k ) chs
let e = map (\k -> toInteger k ) v
return (sum e)
bogus :: Integer -> Int -> Int
bogus prior index = do
let f = fac prior
let g = (f^index) :: Integer
let d = (map digitToInt (show g)) :: [Int]
let a = fromIntegral (head d)^2
a
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)
I think this can not be done in a safe way. In the general case, it seems it would break Haskell's referential transparency.
If we could perform multi-threaded computations within ST s, then we could spawn two threads that race over the same STRef s Bool. Let's say one thread is writing False and the other one True.
After we use runST on the computation, we get an expression of type Bool which is sometimes False and sometimes True. That should not be possible.
If you are absolutely certain that your parallelization does not break referential transparency, you could try using unsafe primitives like unsafeIOToST to spawn new threads. Use with extreme care.
There might be safer ways to achieve something similar. Outside ST, we do have some parallelism available in Control.Parallel.Strategies.
There are a number of ways to do parallelization in Haskell. Usually they will give comparable performance improvements, however some are better then the others and it mostly depends on problem that needs parallelization. This particular use case looked very interesting to me, so I decided to investigate a few approaches.
Approaches
vector-strategies
We are using a boxed vector, therefore we can utilize laziness and built-in spark pool for parallelization. One very simple approach is provided by vector-strategies package, which can iterate over any immutable boxed vector and evaluate all of the thunks in parallel. It is also possible to split the vector in chunks, but as it turns out the chunk size of 1 is the optimal one:
exampleParVector :: Int -> Vb.Vector Int
exampleParVector n = example n `using` parVector 1
parallel
parVector uses par underneath and requires one extra iteration over the vector. In this case we are already iterating over thee vector, thus it would actually make more sense to use par from parallel directly. This would allow us to perform computation in parallel while continue using ST monad:
import Control.Parallel (par)
...
forM_ [1..n] $ \i -> do
p <- prev m n (i-1)
let newEntries = choiceList n i :: [Int]
forM_ newEntries $ \e -> do
let v = bogus p e
v `par` Vg.unsafeWrite m e v
It is important to note that the computation of each element of the vector is expensive when compared to the total number of elements in the vector. That is why using par is a very good solution here. If it was the opposite, namely the vector was very large, but elements weren't too expensive to compute, it would be better to use an unboxed vector and switch it to a different parallelization method.
async
Another way was described by #K.A.Buhr. Switch to IO from ST and use async:
import Control.Concurrent.Async (forConcurrently_)
...
forM_ [1..n] $ \i -> do
p <- prev m n (i-1)
let newEntries = choiceList n i :: [Int]
forConcurrently_ newEntries $ \e -> do
let !v = bogus p e
Vg.unsafeWrite m e v
The concern that #chi has raised is a valid one, however in this particular implementation it is safe to use unsafePerformIO instead of runST, because parallelization does not violate the invariant of deterministic computation. Namely, we can promise that regardless of the input supplied to example function, the output will always be exactly the same.
scheduler
Green threads are pretty cheap in Haskell, but they aren't free. The solution above with async package has one slight drawback: it will spin up at least as many threads as there are elements in the newEntries list each time forConcurrently_ is called. It would be better to spin up as many threads as there are capabilities (the -N RTS option) and let them do all the work. For this we can use scheduler package, which is a work stealing scheduler:
import Control.Scheduler (Comp(Par), runBatch_, withScheduler_)
...
withScheduler_ Par $ \scheduler ->
forM_ [1..n] $ \i -> runBatch_ scheduler $ \_ -> do
p <- prev m n (i-1)
let newEntries = choiceList n i :: [Int]
forM_ newEntries $ \e -> scheduleWork_ scheduler $ do
let !v = bogus p e
Vg.unsafeWrite m e v
Spark pool in GHC also uses a work stealing scheduler, which is built into RTS and is unrelated to the package above in any shape or form, but the idea is very similar: few threads with many units of computation.
Benchmarks
Here are some benchmarks on a 16-core machine for all of the approaches with example 7 (value 9 takes on the order of seconds, which introduces too much noise for criterion). We only get about x5 speedup, because a significant part of the algorithm is sequential in nature and can't be parallelized.

Why is recursion slower than filter with this Random number counter

I am writing a function that generates a million random numbers of 1 or 0 and then counts how many 0s were generated.
import System.Random
import Control.Monad
countZeros :: Int -> IO Int
countZeros n = (length . filter (==0)) <$> (replicateM n $ randomRIO (0,1 :: Int))
countZeros' :: Int -> IO Int
countZeros' n = go n 0
where
go :: Int -> Int -> IO Int
go x acc = do
r <- randomRIO (0,1 :: Int)
case x of
0 -> pure acc
_ -> let acc' = if r == 0 then succ acc else acc
in go (pred x) acc'
when I run the functions with an input of 1000000
>λ= countZeros 1000000
499716
(0.93 secs, 789,015,080 bytes)
>λ= countZeros' 1000000
500442
(2.02 secs, 1,109,569,560 bytes)
I don't understand why the prime function is twice as slow as the other. I assumed that they are essentially doing the same thing behind the scenes.
I am using GHCi.
What am I missing?
With bang patterns, and proper compilation with -O2, the "prime" function is faster:
{-# LANGUAGE BangPatterns #-}
module Main where
import System.Random
import Control.Monad
import System.Environment
countZeros :: Int -> IO Int
countZeros n = (length . filter (==0)) <$> (replicateM n $ randomRIO (0,1 :: Int))
countZeros' :: Int -> IO Int
countZeros' n = go n 0
where
go :: Int -> Int -> IO Int
go !x !acc = do
r <- randomRIO (0,1 :: Int)
case x of
0 -> pure acc
_ -> let acc' = if r == 0 then succ acc else acc
in go (pred x) acc'
main :: IO ()
main = do
[what] <- getArgs
let n = 1000 * 1000 * 10
fun = case what of
"1" -> countZeros
"2" -> countZeros'
_ -> error "arg not a number"
putStrLn "----"
print =<< fun n
putStrLn "----"
Compiled with
$ stack ghc -- RandomPerf.hs -O2 -Wall
$ stack ghc -- --version
The Glorious Glasgow Haskell Compilation System, version 8.6.3
Tests:
$ time ./RandomPerf.exe 1
----
4999482
----
real 0m3.329s
user 0m0.000s
sys 0m0.031s
$ time ./RandomPerf.exe 2
----
5001089
----
real 0m2.338s
user 0m0.000s
sys 0m0.046s
Repeating the tests gives comparable results, so this is not a fluke.
Result: the countZeros' function is significantly faster.
Using Criterion and running a proper benchmark is left as an exercise.
You probably used GHCi to assess performance, which prevents the optimizer to do its job. GHCi sacrifices proper optimization to load files faster, and be more usable in an interactive way.
These actually work in different ways from each other, at a level that matters. And both are slow.
The version using replicateM is bad because replicateM in IO can't stream its results. The entire list will be constructed at once, before filter and length get to start operating on it. The reason it's faster is that length is strict in its accumulator, so it doesn't generate a massive nested chain of thinks the way your other version does. And that's even worse for performance.
The recursive version doesn't use a strict accumulator. This means that the value it returns is a giant chain of nested thunks, holding on to all the generated entries and a bunch of indirect calls via list indexing. This is even more memory used than the filter version, because it's holding on to a bunch of closures as well as all the values. But even with that fixed, it would still be slow. Using !! just wrecks performance. It's recursive when a simple if would do the same job much more efficiently.

What is the fastest way to parse line with lots of Ints?

I'm learning Haskell for two years now and I'm still confused, whats the best (fastest) way to read tons of numbers from a single input line.
For learning I registered into hackerearth.com trying to solve every challenge in Haskell. But now I'm stuck with a challenge because I run into timeout issues. My program is just too slow for beeing accepted by the site.
Using the profiler I found out it takes 80%+ of the time for parsing a line with lots of integers. The percentage gets even higher when the number of values in the line increases.
Now this is the way, I'm reading numbers from an input line:
import qualified Data.ByteString.Char8 as C8
main = do
scores <- fmap (map (fst . fromJust . C8.readInt) . C8.words) C8.getLine :: IO [Int]
Is there any way to get the data faster into the variable?
BTW: The biggest testcase consist of a line with 200.000 9-digits values. Parsing takes incredible long (> 60s).
It's always difficult to declare a particular approach "the fastest", since there's almost always some way to squeeze out more performance. However, an approach using Data.ByteString.Char8 and the general method you suggest should be among the fastest methods for reading numbers. If you encounter a case where performance is poor, the problem likely lies elsewhere.
To give some concrete results, I generated a 191Meg file of 20 million 9-digit numbers, space-separate on a single line. I then tried several general methods of reading a line of numbers and printing their sum (which, for the record, was 10999281565534666). The obvious approach using String:
reader :: IO [Int]
reader = map read . words <$> getLine
sum' xs = sum xs -- work around GHC ticket 10992
main = print =<< sum' <$> reader
took 52secs; a similar approach using Text:
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
readText = map parse . T.words <$> T.getLine
where parse s = let Right (n, _) = T.decimal s in n
ran in 2.4secs (but note that it would need to be modified to handle negative numbers!); and the same approach using Char8:
import qualified Data.ByteString.Char8 as C
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
where parse s = let Just (n, _) = C.readInt s in n
ran in 1.4secs. All examples were compiled with -O2 on GHC 8.0.2.
As a comparison benchmark, a scanf-based C implementation:
/* GCC 5.4.0 w/ -O3 */
#include <stdio.h>
int main()
{
long x, acc = 0;
while (scanf(" %ld", &x) == 1) {
acc += x;
}
printf("%ld\n", acc);
return 0;
}
ran in about 2.5secs, on par with the Text implementation.
You can squeeze a bit more performance out of the Char8 implementation. Using a hand-rolled parser:
readChar8' :: IO [Int]
readChar8' = parse <$> C.getLine
where parse = unfoldr go
go s = do (n, s1) <- C.readInt s
let s2 = C.dropWhile C.isSpace s1
return (n, s2)
runs in about 0.9secs -- I haven't tried to determine why there's a difference, but the compiler must be missing an opportunity to perform some optimization of the words-to-readInt pipeline.
Haskell Code for Reference
Make some numbers with Numbers.hs:
-- |Generate 20M 9-digit numbers:
-- ./Numbers 20000000 100000000 999999999 > data1.txt
import qualified Data.ByteString.Char8 as C
import Control.Monad
import System.Environment
import System.Random
main :: IO ()
main = do [n, a, b] <- map read <$> getArgs
nums <- replicateM n (randomRIO (a,b))
let _ = nums :: [Int]
C.putStrLn (C.unwords (map (C.pack . show) nums))
Find their sum with Sum.hs:
import Data.List
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
import qualified Data.Char8 as C
import qualified Data.ByteString.Char8 as C
import System.Environment
-- work around https://ghc.haskell.org/trac/ghc/ticket/10992
sum' xs = sum xs
readString :: IO [Int]
readString = map read . words <$> getLine
readText :: IO [Int]
readText = map parse . T.words <$> T.getLine
where parse s = let Right (n, _) = T.decimal s in n
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
where parse s = let Just (n, _) = C.readInt s in n
readHand :: IO [Int]
readHand = parse <$> C.getLine
where parse = unfoldr go
go s = do (n, s1) <- C.readInt s
let s2 = C.dropWhile C.isSpace s1
return (n, s2)
main = do [method] <- getArgs
let reader = case method of
"string" -> readString
"text" -> readText
"char8" -> readChar8
"hand" -> readHand
print =<< sum' <$> reader
where:
./Sum string <data1.txt # 54.3 secs
./Sum text <data1.txt # 2.29 secs
./Sum char8 <data1.txt # 1.34 secs
./Sum hand <data1.txt # 0.91 secs

Random number sequence un Haskell and State Monad, what am I doing wrong?

As part of my Haskell journey, I am implementing a raytracer and I need to be able to draw sequences of random numbers at several places in the code. Typically I would like to be able to get say 64 samples for each pixels and pixels are computed in parallel.
I was looking at the state monad to achieve that and I was guided by this answer Sampling sequences of random numbers in Haskell but the code I wrote does not terminate and it's memory consumption explodes.
Here is the abstracted part of the code:
I was hopping to be able to call sampleUniform several time in the code to get new lists of random numbers but if I do runhaskell test.hs, it outputs the first character of the lis [ and then it is stuck in an apparently infinite loop.
module Main (main
, computeArray) where
import Control.Monad
import Control.Monad.State (State, evalState, get, put)
import System.Random (StdGen, mkStdGen, random)
import Control.Applicative ((<$>))
type Rnd a = State StdGen a
runRandom :: Rnd a -> Int -> a
runRandom action seed = evalState action $ mkStdGen seed
rand :: Rnd Double
rand = do
gen <- get
let (r, gen') = random gen
put gen'
return r
{- Uniform distributions -}
uniform01 :: Rnd [Double]
uniform01 = mapM (\_ -> rand) $ repeat ()
{- Get n samples uniformly distributed between 0 and 1 -}
sampleUniform :: Int -> Rnd [Double]
sampleUniform n = liftM (take n) uniform01
computeArray :: Rnd [Bool]
computeArray = do
samples1 <- sampleUniform 10
samples2 <- sampleUniform 10
let dat = zip samples1 samples2
return $ uncurry (<) <$> dat
main :: IO ()
main = do
let seed = 48
let res = runRandom computeArray seed
putStrLn $ show res
uniform01 threads your state through an infinite number of computations, which means that although it produces its result lazily, there is no hope of retrieving a final state at the end to use for the next sampling. liftM (take n) only affects the final value, not the state effects used to compute it. Therefore as written, you can only use uniform01/sampleUniform once.
Instead you can thread the state through only as many rand actions as you use, e.g. with
sampleUniform n = mapM (\_ -> rand) $ replicate n ()
or simpler
sampleUniform n = sequence $ replicate n rand

Haskell: simplifying a mutable vector type signature provided by :t

I can compile the following, but when I use :t qsort I get the long, complex type signature below. However, when I add that, the program will no longer type check, without additional imports (see answer below). My best guess at defining the type is also pasted below, but doen't type check (types and monads are bewildering).
Ok, so I could add an extra import statement and it type checks, but I've never seen a code signature this complex on any code published online. So my question is:
I can drop Ord t and exhange Int for t and it checks, but how do I substitute in ST or IO for Control.Monad.Primitive.PrimMonad, and what do I do with v.
I'm trying to write a blog post that provides an example of how to use these various Monads, and by contrast for Arrays the type signature of qsort is a more manageable qsort :: (STArray s Int Int) -> Int -> Int -> ST s (). (For those that want to understand s there are lots of explanations on line, all of which passed slightly above my head - I get that it is a clever trick to get the type checker itself to prevent the author writing code where data from the Monad leaks out and thus leads to impurity.)
import Control.Monad
import Control.Monad.ST
import qualified Data.Vector as V
import qualified Data.Vector.Generic.Mutable as MV
main = do
lines <- BS.lines `fmap` BS.readFile "10.txt"
let
inputData = Prelude.map (maybe (error "can't read Int") fst . BS.readInt) lines
initialImmutableVector = V.fromList inputData
print $ runST $ do
state <- V.thaw initialImmutableVector
qsort state 0 (Prelude.length inputData - 1)
frozen <- V.freeze state
return frozen
--qsort :: (MV.MVector s Int) -> Int -> Int -> ST s ()
--qsort
-- :: (Ord t, Control.Monad.Primitive.PrimMonad m, MV.MVector v t) =>
-- v (Control.Monad.Primitive.PrimState m) t -> Int -> Int -> m ()
qsort vec min mx =
if mx - min < 1 then
return ()
else do
p <- MV.read vec min
final_i <- foldM (partitioner p) (min+1) [(min+1)..mx]
swap min (final_i - 1)
qsort vec min (final_i-2)
qsort vec final_i mx
where
swap i j = do
vec_i <- MV.read vec i
vec_j <- MV.read vec j
MV.write vec i vec_j
MV.write vec j vec_i
partitioner p i acc = do
vec_acc <- MV.read vec acc
if vec_acc > p then
return i
else do
swap i acc
return $ i+1
Importing function(-s) does not import their types. If your code explicitly refers to the type, it has to be imported. You can use imported functions without importing the types of their arguments or return values as long as the code does not explicitly refer to those types. Once you start using types or classes explicitly, you have to import them, this is intentional.
It looks like your last attempt is correct, but perhaps you didn't import everything you needed? The code you pasted as-is doesn't compile with or without type signatures. Here's a very slightly modified version of what you have that works just fine for me (with GHC 7.8.3):
import Control.Monad
import Control.Monad.ST
import qualified Data.Vector as V
import qualified Data.Vector.Generic.Mutable as MV
import qualified Data.ByteString.Char8 as BS
import Control.Monad.Primitive (PrimState, PrimMonad)
import Prelude hiding (lines, min)
main :: IO ()
main = do
lines <- BS.lines `fmap` BS.readFile "10.txt"
let
inputData = map (maybe (error "can't read Int") fst . BS.readInt) lines
initialImmutableVector = V.fromList inputData
print $ runST $ do
state <- V.thaw initialImmutableVector
qsort state 0 (Prelude.length inputData - 1)
frozen <- V.freeze state
return frozen
qsort :: (Ord t, PrimMonad m, MV.MVector v t)
=> v (PrimState m) t -> Int -> Int -> m ()
qsort vec min mx =
if mx - min < 1 then
return ()
else do
p <- MV.read vec min
final_i <- foldM (partitioner p) (min+1) [(min+1)..mx]
swap min (final_i - 1)
qsort vec min (final_i-2)
qsort vec final_i mx
where
swap i j = do
vec_i <- MV.read vec i
vec_j <- MV.read vec j
MV.write vec i vec_j
MV.write vec j vec_i
partitioner p i acc = do
vec_acc <- MV.read vec acc
if vec_acc > p then
return i
else do
swap i acc
return $ i+1

Resources