Can Haskell's Control.Concurrent.Async.mapConcurrently have a limit?

Can Haskell's Control.Concurrent.Async.mapConcurrently have a limit? - haskell

I'm attempting to run multiple downloads in parallel in Haskell, which I would normally just use the Control.Concurrent.Async.mapConcurrently function for. However, doing so opens ~3000 connections, which causes the web server to reject them all. Is it possible to accomplish the same task as mapConcurrently, but only have a limited number of connections open at a time (i.e. only 2 or 4 at a time)?

A quick solution would be to use a semaphore to restrict the number of concurrent actions. It's not optimal (all threads are created at once and then wait), but works:
import Control.Concurrent.MSem
import Control.Concurrent.Async
import Control.Concurrent (threadDelay)
import qualified Data.Traversable as T
mapPool :: T.Traversable t => Int -> (a -> IO b) -> t a -> IO (t b)
mapPool max f xs = do
sem <- new max
mapConcurrently (with sem . f) xs
-- A little test:
main = mapPool 10 (\x -> threadDelay 1000000 >> print x) [1..100]

You may also try the pooled-io package where you can write:
import qualified Control.Concurrent.PooledIO.Final as Pool
import Control.DeepSeq (NFData)
import Data.Traversable (Traversable, traverse)
mapPool ::
(Traversable t, NFData b) =>
Int -> (a -> IO b) -> t a -> IO (t b)
mapPool n f = Pool.runLimited n . traverse (Pool.fork . f)

This is really easy to do using the Control.Concurrent.Spawn library:
import Control.Concurrent.Spawn
type URL = String
type Response = String
numMaxConcurrentThreads = 4
getURLs :: [URL] -> IO [Response]
getURLs urlList = do
wrap <- pool numMaxConcurrentThreads
parMapIO (wrap . fetchURL) urlList
fetchURL :: URL -> IO Response

Chunking the threads may be inefficient if a few of them last significantly longer than the others. Here is a smoother, yet more complex, solution:
{-# LANGUAGE TupleSections #-}
import Control.Concurrent.Async (async, waitAny)
import Data.List (delete, sortBy)
import Data.Ord (comparing)
concurrentlyLimited :: Int -> [IO a] -> IO [a]
concurrentlyLimited n tasks = concurrentlyLimited' n (zip [0..] tasks) [] []
concurrentlyLimited' _ [] [] results = return . map snd $ sortBy (comparing fst) results
concurrentlyLimited' 0 todo ongoing results = do
(task, newResult) <- waitAny ongoing
concurrentlyLimited' 1 todo (delete task ongoing) (newResult:results)
concurrentlyLimited' n [] ongoing results = concurrentlyLimited' 0 [] ongoing results
concurrentlyLimited' n ((i, task):otherTasks) ongoing results = do
t <- async $ (i,) <$> task
concurrentlyLimited' (n-1) otherTasks (t:ongoing) results
Note : the above code could be made more generic using an instance of MonadBaseControl IO in place of IO, thanks to lifted-async.

If you have actions in a list, this one has less dependencies
import Control.Concurrent.Async (mapConcurrently)
import Data.List.Split (chunksOf)
mapConcurrentChunks :: Int -> (a -> IO b) -> [a] -> IO [b]
mapConcurrentChunks n ioa xs = concat <$> mapM (mapConcurrently ioa) (chunksOf n xs)
Edit: Just shortened a bit

Related

Lazy list wrapped in IO

Suppose the code
f :: IO [Int]
f = f >>= return . (0 :)
g :: IO [Int]
g = f >>= return . take 3
When I run g in ghci, it cause stackoverflow. But I was thinking maybe it could be evaluated lazily and produce [0, 0, 0] wrapped in IO. I suspect IO is to blame here, but I really have no idea. Obviously the following works:
f' :: [Int]
f' = 0 : f'
g' :: [Int]
g' = take 3 f'
Edit: In fact I am not interested in having such a simple function f, original code looked more along the lines:
h :: a -> IO [Either b c]
h a = do
(r, a') <- h' a
case r of
x#(Left _) -> h a' >>= return . (x :)
y#(Right _) -> return [y]
h' :: IO (Either b c, a)
-- something non trivial
main :: IO ()
main = mapM_ print . take 3 =<< h a
h does some IO computations and stores invalid (Left) responses in a list until a valid response (Right) is produced. The attempt is to construct the list lazily even though we are in the IO monad. So that someone reading the result of h can start consuming the list even before it is complete (because it may even be infinite). And if the one reading the results cares only for the first 3 entries no matter what, the rest of the list does not even have to be constructed. And I am getting the feeling that this will not be possible :/.

Yes, IO is to blame here. >>= for IO is strict in the "state of the world". If you write m >>= h, you'll get an action that first performs the action m, then applies h to the result, and finally performs the action h yields. It doesn't matter that your f action doesn't "do anything"; it has to be performed anyway. Thus you end up in an infinite loop starting the f action over and over.
Thankfully, there is a way around this, because IO is an instance of MonadFix. You can "magically" access the result of an IO action from within that action. Critically, that access must be sufficiently lazy, or you'll throw yourself into an infinite loop.
import Control.Monad.Fix
import Data.Functor ((<$>))
f :: IO [Int]
f = mfix (\xs -> return (0 : xs))
-- This `g` is just like yours, but prettier IMO
g :: IO [Int]
g = take 3 <$> f
There's even a bit of syntactic sugar in GHC for this letting you use do notation with the rec keyword or mdo notation.
{-# LANGUAGE RecursiveDo #-}
f' :: IO [Int]
f' = do
rec res <- (0:) <$> (return res :: IO [Int])
return res
f'' :: IO [Int]
f'' = mdo
res <- f'
return (0 : res)
For more interesting examples of ways to use MonadFix, see the Haskell Wiki.

It sounds like you want a monad that mixes the capabilities of lists and IO. Luckily, that's just what ListT is for. Here's your example in that form, with an h' that computes the Collatz sequence and asks the user how they feel about each element in the sequence (I couldn't really think of anything convincing that fit the shape of your outline).
import Control.Monad.IO.Class
import qualified ListT as L
h :: Int -> L.ListT IO (Either String ())
h a = do
(r, a') <- liftIO (h' a)
case r of
x#(Left _) -> L.cons x (h a')
y#(Right _) -> return y
h' :: Int -> IO (Either String (), Int)
h' 1 = return (Right (), 1)
h' n = do
putStrLn $ "Say something about " ++ show n
s <- getLine
return (Left s, if even n then n `div` 2 else 3*n + 1)
main = readLn >>= L.traverse_ print . L.take 3 . h
Here's how it looks in ghci:
> main
2
Say something about 2
small
Left "small"
Right ()
> main
3
Say something about 3
prime
Left "prime"
Say something about 10
not prime
Left "not prime"
Say something about 5
fiver
Left "fiver"
I suppose modern approaches would use pipes or conduits or iteratees or something, but I don't know enough about them to talk about the tradeoffs compared to ListT.

I'm not sure if this is an appropriate usage, but unsafeInterleaveIO would get you the behavior you're asking for, by deferring the IO actions of f until the value inside of f is asked for:
module Tmp where
import System.IO.Unsafe (unsafeInterleaveIO)
f :: IO [Int]
f = unsafeInterleaveIO f >>= return . (0 :)
g :: IO [Int]
g = f >>= return . take 3
*Tmp> g
[0,0,0]

Haskell: Random number

I have written a function to get a pair from [-10,10] by random.
import System.Random
main =
do {
s <- randomNumber
; b <- randomNumber
; print (head s,head b)}
randomNumber :: IO [Int]
randomNumber = sequence $ replicate 1 $ randomRIO (-10,10)
Now I want to take a list like [(1,2),(2,3),(2,3)], all the number is come from the randomNumber. How can I do that? I don't know how to achieve that.
I have tried to use state to get random, but somehow I can't use state on my computer.
I did this :
import System.Random
import Control.Monad.State
randomSt :: (RandomGen g, Random a) => State g a
randomSt = State random
But when I compiled it, it showed: Not in scope: data constructor ‘State’

So if all you want is a function
randomPairs :: IO [(Int, Int)]
then we can do something like
randomList :: IO [Int]
randomList = randomRs (-10, 10) `fmap` newStdGen
randomPairs = ??? randomList randomList
where ??? takes two IO [Int] and "zips" them together to form a IO [(Int, Int)]. We now turn to hoogle and query for a function [a] -> [a] -> [(a, a]) and we find a function zip :: [a] -> [b] -> [(a, b)] we now just need to "lift" zip into the IO monad to work with it across IO lists so we end up with
randomPairs = liftM2 zip randomList randomList
or if we want to be really fancy, we could use applicatives instead and end up with
import Control.Applicative
randomPairs = zip <$> randomList <*> randomList
But judging from your randomNumber funciton, you really just want one pair. The idea is quite similar. Instead of generating a list, we generate just one random number with randomRIO (-10, 10) and lift (,) :: a -> b -> (a, b) resulting in
randomPair = (,) <$> randomRIO (-10, 10) <*> randomRIO (-10, 10)
Finally, the State data constructor went away a while ago because the MTL moved from having separate State and StateT types to making State a type synonym. Nowadays you need to use the lowercase state :: (s -> (s, a)) -> State s a
To clarify, my final code is
import System.Random
import Control.Monad
randomList :: IO [Int]
randomList = randomRs (-10, 10) `fmap` newStdGen
pairs :: IO [(Int, Int)]
pairs = liftM2 zip randomList randomList
somePairs n = take n `fmap` pairs
main = somePairs 10 >>= print

Frequency of characters

I am trying to find frequency of characters in file using Haskell. I want to be able to handle files ~500MB size.
What I've tried till now
It does the job but is a bit slow as it parses the file 256 times
calculateFrequency :: L.ByteString -> [(Word8, Int64)]
calculateFrequency f = foldl (\acc x -> (x, L.count x f):acc) [] [255, 254.. 0]
I have also tried using Data.Map but the program runs out of memory (in ghc interpreter).
import qualified Data.ByteString.Lazy as L
import qualified Data.Map as M
calculateFrequency' :: L.ByteString -> [(Word8, Int64)]
calculateFrequency' xs = M.toList $ L.foldl' (\m word -> M.insertWith (+) word 1 m) (M.empty) xs

Here's an implementation using mutable, unboxed vectors instead of higher level constructs. It also uses conduit for reading the file to avoid lazy I/O.
import Control.Monad.IO.Class
import qualified Data.ByteString as S
import Data.Conduit
import Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import qualified Data.Vector.Unboxed.Mutable as VM
import Data.Word (Word8)
type Freq = VM.IOVector Int
newFreq :: MonadIO m => m Freq
newFreq = liftIO $ VM.replicate 256 0
printFreq :: MonadIO m => Freq -> m ()
printFreq freq =
liftIO $ mapM_ go [0..255]
where
go i = do
x <- VM.read freq i
putStrLn $ show i ++ ": " ++ show x
addFreqWord8 :: MonadIO m => Freq -> Word8 -> m ()
addFreqWord8 f w = liftIO $ do
let index = fromIntegral w
oldCount <- VM.read f index
VM.write f index (oldCount + 1)
addFreqBS :: MonadIO m => Freq -> S.ByteString -> m ()
addFreqBS f bs =
loop (S.length bs - 1)
where
loop (-1) = return ()
loop i = do
addFreqWord8 f (S.index bs i)
loop (i - 1)
-- | The main entry point.
main :: IO ()
main = do
freq <- newFreq
runResourceT
$ sourceFile "random"
$$ CL.mapM_ (addFreqBS freq)
printFreq freq
I ran this on 500MB of random data and compared with #josejuan's UArray-based answer:
conduit based/mutable vectors: 1.006s
UArray: 17.962s
I think it should be possible to keep much of the elegance of josejuan's high-level approach yet keep the speed of the mutable vector implementation, but I haven't had a chance to try implementing something like that yet. Also, note that with some general purpose helper functions (like Data.ByteString.mapM or Data.Conduit.Binary.mapM) the implementation could be significantly simpler without affecting performance.
You can play with this implementation on FP Haskell Center as well.
EDIT: I added one of those missing functions to conduit and cleaned up the code a bit; it now looks like the following:
import Control.Monad.Trans.Class (lift)
import Data.ByteString (ByteString)
import Data.Conduit (Consumer, ($$))
import qualified Data.Conduit.Binary as CB
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as VM
import System.IO (stdin)
freqSink :: Consumer ByteString IO (V.Vector Int)
freqSink = do
freq <- lift $ VM.replicate 256 0
CB.mapM_ $ \w -> do
let index = fromIntegral w
oldCount <- VM.read freq index
VM.write freq index (oldCount + 1)
lift $ V.freeze freq
main :: IO ()
main = (CB.sourceHandle stdin $$ freqSink) >>= print
The only difference in functionality is how the frequency is printed.

#Alex answer is good but, with only 256 values (indexes) an array should be better
import qualified Data.ByteString.Lazy as L
import qualified Data.Array.Unboxed as A
import qualified Data.ByteString as B
import Data.Int
import Data.Word
fq :: L.ByteString -> A.UArray Word8 Int64
fq = A.accumArray (+) 0 (0, 255) . map (\c -> (c, 1)) . concat . map B.unpack . L.toChunks
main = L.getContents >>= print . fq
#alex code take (for my sample file) 24.81 segs, using array take 7.77 segs.
UPDATED:
although Snoyman solution is better, an improvement avoiding unpack maybe
fq :: L.ByteString -> A.UArray Word8 Int64
fq = A.accumArray (+) 0 (0, 255) . toCounterC . L.toChunks
where toCounterC [] = []
toCounterC (x:xs) = toCounter x (B.length x) xs
toCounter _ 0 xs = toCounterC xs
toCounter x i xs = (B.index x i', 1): toCounter x i' xs
where i' = i - 1
with ~50% speedup.
UPDATED:
Using IOVector as Snoyman is as Conduit version (a bit faster really, but this is a raw code, better use Conduit)
import Data.Int
import Data.Word
import Control.Monad.IO.Class
import qualified Data.ByteString.Lazy as L
import qualified Data.Array.Unboxed as A
import qualified Data.ByteString as B
import qualified Data.Vector.Unboxed.Mutable as V
fq :: L.ByteString -> IO (V.IOVector Int64)
fq xs =
do
v <- V.replicate 256 0 :: IO (V.IOVector Int64)
g v $ L.toChunks xs
return v
where g v = toCounterC
where toCounterC [] = return ()
toCounterC (x:xs) = toCounter x (B.length x) xs
toCounter _ 0 xs = toCounterC xs
toCounter x i xs = do
let i' = i - 1
w = fromIntegral $ B.index x i'
c <- V.read v w
V.write v w (c + 1)
toCounter x i' xs
main = do
v <- L.getContents >>= fq
mapM_ (\i -> V.read v i >>= liftIO . putStr . (++", ") . show) [0..255]

This works for me on my computer:
module Main where
import qualified Data.HashMap.Strict as M
import qualified Data.ByteString.Lazy as L
import Data.Word
import Data.Int
calculateFrequency :: L.ByteString -> [(Word8, Int64)]
calculateFrequency xs = M.toList $ L.foldl' (\m word -> M.insertWith (+) word 1 m) M.empty xs
main = do
bs <- L.readFile "E:\\Steam\\SteamApps\\common\\Sid Meier's Civilization V\\Assets\\DLC\\DLC_Deluxe\\Behind the Scenes\\Behind the Scenes.wmv"
print (calculateFrequency bs)
Doesn't run out of memory, or even load the whole file in, but takes forever (about a minute) on 600mb+ files! I compiled this using ghc 7.6.3.
I should point out that the code is basically identical save for the strict HashMap instead of the lazy Map.
Note that insertWith is twice as fast with HashMap than Map in this case. On my machine, the code as written executes in 54 seconds, while the version using Map takes 107.

My two cents (using an STUArray). Can't compare it to other solutions here. Someone might be willing to try it...
module Main where
import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
import Data.Array.Unboxed (UArray)
import qualified Data.ByteString.Lazy as L (ByteString, unpack, getContents)
import Data.Word
import Data.Int
import Control.Monad (forM_)
calculateFrequency :: L.ByteString -> UArray Word8 Int64
calculateFrequency bs = runSTUArray $ do
a <- newArray (0, 255) 0
forM_ (L.unpack bs) $ \i -> readArray a i >>= writeArray a i . succ
return a
main = L.getContents >>= print . calculateFrequency

MonadFix instance for Rand monad

I would like to generate infinite stream of numbers with Rand monad from System.Random.MWC.Monad. If only there would be a MonadFix instance for this monad, or instance like this:
instance (PrimMonad m) => MonadFix m where
...
then one could write:
runWithSystemRandom (mfix (\ xs -> uniform >>= \x -> return (x:xs)))
There isn't one though.
I was going through MonadFix docs but I don't see an obvious way of implementing this instance.

You can write a MonadFix instance. However, the code will not generate an infinite stream of distinct random numbers. The argument to mfix is a function that calls uniform exactly once. When the code is run, it will call uniform exactly once, and create an infinite list containing the result.
You can try the equivalent IO code to see what happens:
import System.Random
import Control.Monad.Fix
main = print . take 10 =<< mfix (\xs -> randomIO >>= (\x -> return (x : xs :: [Int])))
It seems that you want to use a stateful random number generator, and you want to run the generator and collect its results lazily. That isn't possible without careful use of unsafePerformIO. Unless you need to produce many random numbers quickly, you can use a pure RNG function such as randomRs instead.

A question: how do you wish to generate your initial seed?
The problem is that MWS is built on the "primitive" package which abstracts only IO and strict (Control.Monad.ST.ST s). It does not also abstract lazy (Control.Monad.ST.Lazy.ST s).
Perhaps one could make instances for "primitive" to cover lazy ST and then MWS could be lazy.
UPDATE: I can make this work using Control.Monad.ST.Lazy by using strictToLazyST:
module Main where
import Control.Monad(replicateM)
import qualified Control.Monad.ST as S
import qualified Control.Monad.ST.Lazy as L
import qualified System.Random.MWC as A
foo :: Int -> L.ST s [Int]
foo i = do rest <- foo $! succ i
return (i:rest)
splam :: A.Gen s -> S.ST s Int
splam = A.uniformR (0,100)
getS :: Int -> S.ST s [Int]
getS n = do gen <- A.create
replicateM n (splam gen)
getL :: Int -> L.ST s [Int]
getL n = do gen <- createLazy
replicateM n (L.strictToLazyST (splam gen))
createLazy :: L.ST s (A.Gen s)
createLazy = L.strictToLazyST A.create
makeLots :: A.Gen s -> L.ST s [Int]
makeLots gen = do x <- L.strictToLazyST (A.uniformR (0,100) gen)
rest <- makeLots gen
return (x:rest)
main = do
print (S.runST (getS 8))
print (L.runST (getL 8))
let inf = L.runST (foo 0) :: [Int]
print (take 10 inf)
let inf3 = L.runST (createLazy >>= makeLots) :: [Int]
print (take 10 inf3)

(This would be better suited as a comment to Heatsink's answer, but it's a bit too long.)
MonadFix instances must adhere to several laws. One of them is left shrinking/thightening:
mfix (\x -> a >>= \y -> f x y) = a >>= \y -> mfix (\x -> f x y)
This law allows to rewrite your expression as
mfix (\xs -> uniform >>= \x -> return (x:xs))
= uniform >>= \x -> mfix (\xs -> return (x:xs))
= uniform >>= \x -> mfix (return . (x :))
Using another law, purity mfix (return . h) = return (fix h), we can further simplify to
= uniform >>= \x -> return (fix (x :))
and using the standard monad laws and rewriting fix (x :) as repeat x
= liftM (\x -> fix (x :)) uniform
= liftM repeat uniform
Therefore, the result is indeed one invocation of uniform and then just repeating the single value indefinitely.

Collecting IO outputs into list

How can I issue multiple calls to SDL.pollEvent :: IO Event until the output is SDL.NoEvent and collect all the results into a list?
In imperative terms something like this:
events = []
event = SDL.pollEvent
while ( event != SDL.NoEvent ) {
events.add( event )
event = SDL.pollEvent
}

James Cook was so kind to extend monad-loops with this function:
unfoldWhileM :: Monad m => (a -> Bool) -> m a -> m [a]
used with SDL:
events <- unfoldWhileM (/= SDL.NoEvent) SDL.pollEvent

You could use something like:
takeWhileM :: (a -> Bool) -> IO a -> IO [a]
takeWhileM p act = do
x <- act
if p x
then do
xs <- takeWhileM p act
return (x : xs)
else
return []
Instead of:
do
xs <- takeWhileM p act
return (x : xs)
you can also use:
liftM (x:) (takeWhileM p act) yielding:
takeWhileM :: (a -> Bool) -> IO a -> IO [a]
takeWhileM p act = do
x <- act
if p x
then liftM (x:) (takeWhileM p act)
else return []
Then you can use: takeWhileM (/=SDL.NoEvent) SDL.pollEvent

You can use monadic lists:
import Control.Monad.ListT (ListT)
import Control.Monad.Trans.Class (lift) -- transformers, not mtl
import Data.List.Class (takeWhile, repeat, toList)
import Prelude hiding (takeWhile, repeat)
getEvents :: IO [Event]
getEvents =
toList . takeWhile (/= NoEvent) $ do
repeat ()
lift pollEvent :: ListT IO Event
ListT from the "List" package on hackage.

Using these stubs for Event and pollEvent
data Event = NoEvent | SomeEvent
deriving (Show,Eq)
instance Random Event where
randomIO = randomRIO (0,1) >>= return . ([NoEvent,SomeEvent] !!)
pollEvent :: IO Event
pollEvent = randomIO
and a combinator, borrowed and adapted from an earlier answer, that stops evaluating the first time the predicate fails
spanM :: (Monad m) => (a -> Bool) -> m a -> m [a]
spanM p a = do
x <- a
if p x then do xs <- spanM p a
return (x:xs)
else return [x]
allows this ghci session, for example:
*Main> spanM (/= NoEvent) pollEvent
[SomeEvent,SomeEvent,NoEvent]

i eventually stumbled over this code snippet in an actual SDL game from hackage
getEvents :: IO Event -> [Event] -> IO [Event]
getEvents pEvent es = do
e <- pEvent
let hasEvent = e /= NoEvent
if hasEvent
then getEvents pEvent (e:es)
else return (reverse es)
thanks for your answers btw!

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Can Haskell's Control.Concurrent.Async.mapConcurrently have a limit? - haskell

Related

Lazy list wrapped in IO

Haskell: Random number

Frequency of characters

MonadFix instance for Rand monad

Collecting IO outputs into list

Categories

Resources