Filtering / branching enumeratee

I am using enumerator-0.4.10, and I need to distribute processing of
different parts of the incoming stream to different iteratees (I am
parsing a huge XML file, and different sub-trees have different
processing logic). Only a single iteratee will be active at a time
since the sub-trees don't intersect.
I wrote a simple example that filters the stream and passes the result
to one iteratee; please see below. However, with multiple nested
iteratees it seems to me that I can no longer use an enumeratee. Do I
need to write my own multi-enumeratee that holds multiple inner
iteratees? Any better ideas?
Here is my (beginner's) code for a single nested iteratee:
module Main ( main ) where

import qualified Data.Enumerator as E ( Enumeratee, Step(..), Stream(..),
    checkDone, checkDoneEx, continue, enumList, joinI, run_, yield )
import Data.Enumerator ( ($$), (>>==) )
import qualified Data.Enumerator.List as EL ( consume )

-- cribbed from EL.concatMap
concatMapAccum :: Monad m => (s -> ao -> (s, [ai])) -> s ->
    E.Enumeratee ao ai m b
concatMapAccum f s0 = E.checkDone (E.continue . step s0)
  where
    step _ k E.EOF = E.yield (E.Continue k) E.EOF
    step s k (E.Chunks xs) = loop s k xs
    loop s k [] = E.continue (step s k)
    loop s k (x:xs) = case f s x of
        (s', ais) -> k (E.Chunks $ ais) >>==
            E.checkDoneEx (E.Chunks xs) (\k' -> loop s' k' xs)

passFromTo :: Monad m => ((a -> Bool), (a -> Bool)) -> Bool -> E.Enumeratee a a m b
passFromTo (from, to) pass0 =
    concatMapAccum updatePass pass0
  where
    updatePass pass el = case (pass, from el, to el) of
        (True, _, to_el) -> (not to_el, [el])
        (False, True, _) -> (True, [el])
        (False, False, _) -> (False, [])

main :: IO ()
main = do
    E.run_ (E.enumList 3 [1..20] $$
        E.joinI $ passFromTo ((\e -> e == 3 || e == 13), (\e -> e == 7 || e == 17)) False $$
        EL.consume) >>= print
$ ./dist/build/StatefulEnumeratee/StatefulEnumeratee
[3,4,5,6,7,13,14,15,16,17]
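The pass/skip state machine inside passFromTo can be checked without the enumerator library. The following is only a list-based sketch of the same updatePass logic, built on mapAccumL; the name passFromToList is made up for this illustration and is not part of enumerator.

```haskell
import Data.List (mapAccumL)

-- List-based model of passFromTo (hypothetical helper, not enumerator code).
-- The Bool state records whether elements are currently being passed through.
passFromToList :: (a -> Bool, a -> Bool) -> Bool -> [a] -> [a]
passFromToList (from, to) pass0 = concat . snd . mapAccumL updatePass pass0
  where
    updatePass pass el = case (pass, from el, to el) of
      (True,  _,     toEl) -> (not toEl, [el])  -- passing: emit, stop after a "to" element
      (False, True,  _)    -> (True, [el])      -- a "from" element switches passing on
      (False, False, _)    -> (False, [])       -- outside any region: drop the element
```

Applied to [1..20] with the same two predicates as in main, this yields the same ten elements as the enumeratee version.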

Yes, you need an enumeratee that passes the stream to multiple iteratees, like Data.Iteratee.sequence_ and Data.Iteratee.Parallel.psequence_ from iteratee-0.8.6. sequence_ takes a list of iteratees to run simultaneously, and handles each input chunk by mapM across that list. psequence_ takes similar arguments, but runs each input iteratee in a separate forkIO thread.
There has been some discussion on haskell-cafe and the iteratee mailing lists about these over the past year (e.g. http://www.haskell.org/pipermail/haskell-cafe/2011-January/088319.html). The main thing to be careful about is handling errors from the inner iteratees: in your application, if one inner iteratee fails, do you want to terminate all iteratees or just that one, and how do you want to propagate those errors?

Related

How to randomly shuffle a list

I have random number generator
rand :: Int -> Int -> IO Int
rand low high = getStdRandom (randomR (low,high))
and a helper function to remove an element from a list
removeItem _ [] = []
removeItem x (y:ys) | x == y = removeItem x ys
                    | otherwise = y : removeItem x ys
I want to shuffle a given list by randomly picking an item from the list, removing it and adding it to the front of the list. I tried
shuffleList :: [a] -> IO [a]
shuffleList [] = []
shuffleList l = do
    y <- rand 0 (length l)
    return( y:(shuffleList (removeItem y l) ) )
But can't get it to work. I get
hw05.hs:25:33: error:
* Couldn't match expected type `[Int]' with actual type `IO [Int]'
* In the second argument of `(:)', namely
....
Any ideas?
Thanks!

Since shuffleList :: [a] -> IO [a], we have shuffleList (xs :: [a]) :: IO [a].
Obviously, we can't use (:) :: a -> [a] -> [a] to cons an element of type a onto an IO [a] value; instead we want to cons it onto the list [a] whose computation that IO [a] value describes:
do
    y <- rand 0 (length l)
    -- return ( y : (shuffleList (removeItem y l) ) )
    shuffled <- shuffleList (removeItem y l)
    return (y : shuffled)
In do notation, values to the right of <- have types M a, M b, etc., for some monad M (here, IO), and values to the left of <- have the corresponding types a, b, etc.
The x :: a in x <- mx gets bound to the pure value of type a produced / computed by the M-type computation which the value mx :: M a denotes, when that computation is actually performed, as a part of the combined computation represented by the whole do block, when that combined computation is performed as a whole.
And if e.g. the next line in that do block is y <- foo x, it means that a pure function foo :: a -> M b is applied to x and the result is calculated which is a value of type M b, denoting an M-type computation which then runs and produces / computes a pure value of type b to which the name y is then bound.
The essence of Monad is thus this interleaving of the pure inside / between the (potentially) impure: two timelines run side by side, one of pure calculations and one of potentially impure computations, with the pure world safely separated and isolated from the impurities of the real world. Or, seen from the other side, the pure code is run by the real impure code that interacts with the real world (in case M is IO). Which is what computer programs must do, after all.
Your removeItem is wrong. You should pick and remove items positionally, i.e. by index, not by value; and in any case not remove more than one item after having picked one item from the list.
The y in y <- rand 0 (length l) is indeed an index. Treat it as such. Rename it to i, too, as a simple mnemonic.
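Putting these points together, here is a minimal sketch of an index-based version. It assumes the random picker is passed in as an argument with an inclusive upper bound (the question's rand has that shape); the names pickAt and shuffleWith are invented for this example.

```haskell
-- Remove the element at index i, returning it together with the remaining list.
-- Returns Nothing when i is out of range.
pickAt :: Int -> [a] -> Maybe (a, [a])
pickAt i xs
  | i < 0 = Nothing
  | otherwise = case splitAt i xs of
      (before, y : after) -> Just (y, before ++ after)
      _                   -> Nothing

-- Repeatedly pick a random index, remove that one element, and put it at the front
-- of the shuffled remainder. The picker takes inclusive (low, high) bounds.
shuffleWith :: Monad m => (Int -> Int -> m Int) -> [a] -> m [a]
shuffleWith _ [] = return []
shuffleWith pick l = do
  i <- pick 0 (length l - 1)            -- note the - 1: valid indices are 0 .. length l - 1
  case pickAt i l of
    Nothing -> return l                 -- picker misbehaved; leave the rest unchanged (arbitrary fallback)
    Just (y, rest) -> do
      shuffled <- shuffleWith pick rest -- bind the result of the IO-like action first ...
      return (y : shuffled)             -- ... then cons onto the plain list
```

With the rand function from the question, shuffleWith rand [1..10] would run in IO. Passing the picker as a parameter also makes it possible to test the function with a deterministic picker.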

Generally, with Haskell it works better to maximize the amount of functional code at the expense of non-functional (IO or randomness-related) code.
In your situation, your “maximum” functional component is not removeItem but rather a version of shuffleList that takes the input list and (as mentioned by Will Ness) a deterministic integer position. The list function splitAt :: Int -> [a] -> ([a], [a]) can come in handy here. Like this:
funcShuffleList :: Int -> [a] -> [a]
funcShuffleList _ [] = []
funcShuffleList pos ls =
    if (pos <=0) || (length(take (pos+1) ls) < (pos+1))
      then ls -- pos is zero or out of bounds, so leave list unchanged
      else let (left,right) = splitAt pos ls
           in (head right) : (left ++ (tail right))
Testing:
λ>
λ> funcShuffleList 4 [0,1,2,3,4,5,6,7,8,9]
[4,0,1,2,3,5,6,7,8,9]
λ>
λ> funcShuffleList 5 "#ABCDEFGH"
"E#ABCDFGH"
λ>
Once you've got this, you can introduce randomness concerns in a simpler fashion. And you do not need to involve IO explicitly, as any randomness-friendly monad will do:
shuffleList :: MonadRandom mr => [a] -> mr [a]
shuffleList [] = return []
shuffleList ls =
    do
        let maxPos = (length ls) - 1
        pos <- getRandomR (0, maxPos)
        return (funcShuffleList pos ls)
... IO being just one instance of MonadRandom.
You can run the code using the default IO-hosted random number generator:
main = do
    let inpList = [0,1,2,3,4,5,6,7,8]::[Integer]
    putStrLn $ "inpList = " ++ (show inpList)
    -- mr automatically instantiated to IO:
    outList1 <- shuffleList inpList
    putStrLn $ "outList1 = " ++ (show outList1)
    outList2 <- shuffleList outList1
    putStrLn $ "outList2 = " ++ (show outList2)
Program output:
$ pickShuffle
inpList = [0,1,2,3,4,5,6,7,8]
outList1 = [6,0,1,2,3,4,5,7,8]
outList2 = [8,6,0,1,2,3,4,5,7]
$
$ pickShuffle
inpList = [0,1,2,3,4,5,6,7,8]
outList1 = [4,0,1,2,3,5,6,7,8]
outList2 = [2,4,0,1,3,5,6,7,8]
$
The output is not reproducible here, because the default generator is seeded by its launch time in nanoseconds.
If what you need is a full random permutation, have a look at the Knuth (a.k.a. Fisher-Yates) shuffle algorithm.

Writing an assembler in Haskell - mapM with state?

I'm writing a very simple two-pass assembler in Haskell and I've come across a scenario that I don't yet have the experience to solve. I think the solution is likely to involve monad transformers, which I don't really understand.
The assembler parses the assembly code into a list of Statements, which are either instructions or labels. Some Statements may refer to labels. The assembler needs to convert the Statements into Instructions, which involves eliminating the labels and substituting the label references with an appropriate value.
I have written the first pass of the assembler, which produces a [(String, Int)] representing a map from labels to addresses. I have also written the following function for translating a Statement into an Instruction:
stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]
stmtToInstruction addr labels stmt = case stmt of
    ADD d s1 s2 -> Right [I.ADD d s1 s2]
    BEQL s1 s2 l -> case do label <- find (\e -> fst e == l) labels
                            let labelAddr = snd label
                            let relativeAddr = I.ImmS $ fromIntegral (labelAddr - addr)
                            return (I.BEQ s1 s2 relativeAddr) of
        Just i -> Right [i]
        Nothing -> Left $ "Label " ++ l ++ " not defined"
    LABEL _ -> Right []
I've omitted several cases for brevity, but you can see all the possible results here:
ADD always succeeds and produces an instruction
BEQL can either succeed or fail, depending on whether a label is found
LABEL always succeeds, even though it produces no actual instructions
This works as expected. The problem I now have is writing this function:
replaceLabels :: [Statement] -> Either String [I.Instruction]
replaceLabels takes a list of statements, and runs stmtToInstruction on each one. The addr argument to stmtToInstruction must be the length of the [Instruction] accumulated so far. The output may either be a Left String, if one of the label references was invalid, or a Right [I.Instruction], if there were no errors.
mapM :: Monad m => (a -> m b) -> [a] -> m [b] gets us some of the way there, but provides no way to inject the current address into the (a -> m b) function. How do I make this work?

You're right: the StateT monad transformer will do the trick:
imapM :: (Traversable t, Monad m)
      => (Int -> a -> m b) -> t a -> m (t b)
imapM f = flip evalStateT 0 .   -- evalStateT discards the final counter
    mapM (\a ->
        do
            count <- get
            put $! count + 1
            lift (f count a))   -- lift runs the underlying action inside StateT
But writing the specialized version for lists might be better:
itraverse :: Applicative f
          => (Int -> a -> f b) -> [a] -> f [b]
itraverse f = go 0 where
    go !_ [] = pure []
    go !count (x:xs) = (:) <$> f count x <*> go (count + 1) xs
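Note that both functions above pass the element index, whereas the question asks for the running address (the number of instructions emitted so far). The following sketch adapts the StateT idea to that case. It assumes the transformers package; mapWithAddr is a name made up for this example.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (evalStateT, get, put)

-- Thread a running address through a traversal (hypothetical helper).
-- f produces the outputs for one element; the address then advances by
-- the number of outputs that element produced.
mapWithAddr :: Monad m => (Int -> a -> m [b]) -> [a] -> m [b]
mapWithAddr f xs = concat <$> evalStateT (mapM step xs) 0
  where
    step x = do
      addr <- get
      out <- lift (f addr x)     -- run the underlying action (e.g. Either String)
      put $! addr + length out   -- advance by the number of emitted items
      return out
```

With m instantiated to Either String, the first Left returned by f aborts the whole traversal, which is the behaviour replaceLabels needs.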

I've implemented a recursive solution that I'm sure is very inefficient. I'd still be interested to see the 'proper' way of doing this.
replaceLabels :: [Statement] -> Either String [I.Instruction]
replaceLabels [] = Right []
replaceLabels stmts@(s:ss) = replaceLabels' labels stmts 0
    where labels = process stmts
replaceLabels' :: [(String, Int)] -> [Statement] -> Int -> Either String [I.Instruction]
replaceLabels' _ [] _ = Right []
replaceLabels' labels (s:ss) addr = do
    instructions <- stmtToInstruction addr labels s
    restInstructions <- replaceLabels' labels ss (addr + length instructions)
    return (instructions ++ restInstructions)

I would start by changing
stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]
into
stmtToInstruction :: [(String, Int)] -> Statement -> Either String (Int -> [I.Instruction])
That is, moving the function that takes the address into the Right branch of the Either. The reason is that label reference errors seem to be independent of addresses, so it's better to handle reference errors first and then worry about the address stuff in isolation.
This function resolves the references:
resolveRefs :: [(String,Int)] -> [Statement] -> Either String [Int -> [Instruction]]
resolveRefs environment = traverse (stmtToInstruction environment)
(traverse is equivalent to mapM but it only requires an Applicative constraint. They are different functions merely for historical reasons.)
OK, after having handled the errors, let's now focus on the [Int -> [Instruction]] list. It seems that we have to map over it from the left while carrying an accumulated address that we must supply to each function. The mapAccumL function is perfect for this:
resolveAddrs :: [Int -> [Instruction]] -> [Instruction]
resolveAddrs funcs = mconcat . snd $ accumulate funcs
  where
    accumulate :: [Int -> [Instruction]] -> (Int,[[Instruction]])
    accumulate = mapAccumL step 0
    step address func = let is = func address in (address + length is,is)
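To see the two stages working together, here is a small self-contained sketch. The Statement and Instruction types below are simplified stand-ins invented for this example (the question's real types live in its own modules), and replaceLabels here takes the label table as an argument instead of computing it in a first pass.

```haskell
import Data.List (mapAccumL)

-- Hypothetical, simplified stand-ins for the question's Statement and I.Instruction.
data Statement = ADD Int Int Int | BEQL Int Int String | LABEL String

data Instruction = IAdd Int Int Int | IBeq Int Int Int
  deriving (Eq, Show)

-- Stage 1: resolve label references; the address is supplied later.
stmtToInstruction :: [(String, Int)] -> Statement -> Either String (Int -> [Instruction])
stmtToInstruction labels stmt = case stmt of
  ADD d s1 s2  -> Right (const [IAdd d s1 s2])
  BEQL s1 s2 l -> case lookup l labels of
    Just labelAddr -> Right (\addr -> [IBeq s1 s2 (labelAddr - addr)])
    Nothing        -> Left ("Label " ++ l ++ " not defined")
  LABEL _      -> Right (const [])

-- Stage 2: feed each function the address accumulated so far.
resolveAddrs :: [Int -> [Instruction]] -> [Instruction]
resolveAddrs = concat . snd . mapAccumL step 0
  where
    step address func = let is = func address in (address + length is, is)

replaceLabels :: [(String, Int)] -> [Statement] -> Either String [Instruction]
replaceLabels labels stmts = resolveAddrs <$> traverse (stmtToInstruction labels) stmts
```

For example, with labels [("loop", 0)], the statements LABEL "loop", ADD 1 2 3, BEQL 1 2 "loop" produce two instructions, and the branch at address 1 gets the relative offset -1.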

Conduit: Join two sources into one

I have two conduit sources A and B which I want to merge into one, yielding:
data Result = Left Int | Value Int | Right Int
merge :: Monad m => Source m Int -> Source m Int -> Source m Result
merge a b = undefined
such that it:
Consumes values from both a and b
Performs some computation to produce Value Int
As a result of computation either a or b could have a leftover
When one of the sequences is exhausted the result source should keep producing either Left or Right values, (depending on which original source still has values) until both sources are exhausted
I tried implementing it with ZipSource such as:
getZipSource (ZipSource (a =$= CL.map Left) <* ZipSource (b =$= CL.map Right))
but I couldn't figure out how to make it alternate between sources (when I do two awaits) and how to deal with leftovers in a way that I described above.
I also looked at sequenceSources but it doesn't seem to help either.
Can something like that be built with Conduit?
A concrete example would be:
Have two (assumed sorted) Int sources
Get values from both and compare
Yield the smaller value, subtract it from the larger one and put the remainder back on the stream it came from
Repeat.
The expected output would be:
runConduit $ merge (CL.sourceList [10, 20, 30]) (CL.sourceList [6, 4, 20]) $$ CL.take 10
Value 6 -- 10-6 = 4, 6 yielded, 4 goes back to "a"
Value 4 -- 4-4 = 0, both values are fully consumed
Value 20 -- 20-20 = 0, both values are fully consumed
Left 30 -- "b" has no values, but "a" still yielding
[UPDATE]
The best way I found so far is to write something similar to zipSources tweaking its internals as:
go (Done ()) (HaveOutput src close y) = HaveOutput (go (Done ()) src) close (Nothing, Just y)
go (HaveOutput src close x) (Done ()) = HaveOutput (go src (Done ())) close (Just x, Nothing)
Would it be the right way to go?

I ended up doing it this way:
data MergedValue a v b = BackL a v | MergedValue v | BackR v b
data JoinResult a v b = LeftoverL a | JoinValue v | LeftoverR b
joinSources :: Monad m
            => (a -> b -> MergedValue a v b)
            -> Source m a
            -> Source m b
            -> Source m (JoinResult a v b)
joinSources f as bs =
    go (newResumableSource as) (newResumableSource bs)
  where
    go ras rbs = do
        (ras', ma) <- lift $ ras $$++ await
        (rbs', mb) <- lift $ rbs $$++ await
        case (ma, mb) of
            (Nothing, Nothing) -> pure ()
            (Nothing, Just b) -> yield (LeftoverR b) >> go ras' rbs'
            (Just a, Nothing) -> yield (LeftoverL a) >> go ras' rbs'
            (Just a, Just b) -> case f a b of
                BackL x v -> do
                    yield (JoinValue v)
                    (nxt, _) <- lift $ ras' $$++ leftover x
                    go nxt rbs'
                BackR v x -> do
                    yield (JoinValue v)
                    (nxt, _) <- lift $ rbs' $$++ leftover x
                    go ras' nxt
                MergedValue v -> yield (JoinValue v) >> go ras' rbs'
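The merge rule from the question's concrete example can be checked on plain lists before wiring it into Conduit. This is only a list-based model, not Conduit code; the constructors are named L and R here (instead of Left and Right) to avoid clashing with Either, and consing the remainder back onto a list plays the role of leftover.

```haskell
-- List-based model of the merge in the question's example (not Conduit code).
data Result = L Int | Value Int | R Int
  deriving (Eq, Show)

merge :: [Int] -> [Int] -> [Result]
merge [] [] = []
merge as [] = map L as                    -- right side exhausted: drain the left
merge [] bs = map R bs                    -- left side exhausted: drain the right
merge (a:as) (b:bs) = case compare a b of
  EQ -> Value a : merge as bs             -- both values fully consumed
  GT -> Value b : merge (a - b : as) bs   -- remainder goes back on the left stream
  LT -> Value a : merge as (b - a : bs)   -- remainder goes back on the right stream
```

On the inputs [10, 20, 30] and [6, 4, 20] this gives Value 6, Value 4, Value 20, L 30, matching the expected output listed in the question.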

Haskell: put in State monad seems to be elided

I'm writing a program to allocate pizzas to people; each person will get one pizza, ideally of their favorite type, unless stock has run out, in which case they are given their next favorite type recursively.
My approach is to compute a ((User, Pizza), Int) for the amount a person would like said pizza, sort those, and recurse through using a state monad to keep inventory counts.
The program is written and type checks:
allocatePizzasImpl :: [((User, Pizza), Int)]
                   -> State [(Pizza, Int)] [(User, Pizza)]
allocatePizzasImpl [] = return []
allocatePizzasImpl ((user, (flavor, _)):ranks) =
    do inventory <- get
       -- this line is never hit
       put $ updateWith inventory (\i -> if i <= 0
                                         then Nothing
                                         else Just $ i - 1) flavor
       next <- allocatePizzasImpl $ filter ((/= user) . fst) ranks
       return $ (user, flavor) : next
and I have a helper function to extract the result:
allocatePizzas :: [Pizza]
               -> [((User, Pizza), Int)]
               -> [(User, Pizza)]
allocatePizzas pizzas rank = fst
                           . runState (allocatePizzasImpl rank)
                           $ buildQuotas pizzas
but the line indicated by -- this line is never hit is... never hit by any GHCI breakpoints; furthermore, if I break on the return call, GHCI says inventory isn't in scope.
When run, the result is assigning the same pizza (with one inventory count) to all users. Something is going wrong, but I have absolutely no idea how to proceed. I'm new to Haskell, so any comments on style would be appreciated as well =)
Thanks!
PS: For completeness, updateWith is defined as:
updateWith :: (Eq a, Eq b)
           => [(a, b)] -- inventory
           -> (b -> Maybe b) -- update function; Nothing removes it
           -> a -- key to update
           -> [(a, b)]
updateWith set update key =
    case lookup key set of
        Just b -> replace set
                      (unwrapPair (key, update b))
                      (fromMaybe 0 $ elemIndex (key, b) set)
        Nothing -> set
    where replace :: [a] -> Maybe a -> Int -> [a]
          replace [] _ _ = []
          replace (_:xs) (Just val) 0 = val:xs
          replace (_:xs) Nothing 0 = xs
          replace (x:xs) val i = x : (replace xs val $ i - 1)
unwrapPair :: Monad m => (a, m b) -> m (a, b)
unwrapPair (a, mb) = do b <- mb
                        return (a, b)

I think your function replace is broken:
replace (_:xs) (Just val) 0 = val:xs
This doesn't pay any attention to the value it's replacing. Wasn't your intention to replace just the pair corresponding to key?
I think you want
updateWith [] e k = []
updateWith ((k', v):kvs) e k
    | k' == k = case e v of
        Just v' -> (k, v'):kvs
        Nothing -> kvs
    | otherwise = (k', v) : updateWith kvs e k

The issue (ignoring other conceptual things mentioned by the commenters) turned out to be that using fst to extract the result from the State would, for some reason, not cause the State to actually be computed. Running the result through seq fixed it.
I'd be interested in knowing why this is the case, though!
Edit: As Daniel Wagner pointed out in the comments, I wasn't actually using inventory, which turned out to be the real bug. Marking this as accepted.
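For illustration, here is a minimal sketch of an allocation loop that does consult the inventory before assigning a pizza. It is not the original program: it assumes the preferences are already sorted best-first and drops the Int score, uses String for both User and Pizza, and relies on the transformers package. The helper names are made up for this example.

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, get, put)

type User = String
type Pizza = String

-- Walk the ranked (user, pizza) preferences, best first, assigning a pizza
-- only when the inventory still has stock for it.
allocate :: [(User, Pizza)] -> State [(Pizza, Int)] [(User, Pizza)]
allocate [] = return []
allocate ((user, flavor) : ranks) = do
  inventory <- get
  case lookup flavor inventory of
    Just n | n > 0 -> do
      put (decrement flavor inventory)
      -- this user is served, so drop their remaining preferences
      rest <- allocate (filter ((/= user) . fst) ranks)
      return ((user, flavor) : rest)
    _ -> allocate ranks   -- out of stock or unknown pizza: try the next preference

-- Reduce the stock count of one pizza by one.
decrement :: Pizza -> [(Pizza, Int)] -> [(Pizza, Int)]
decrement flavor inv = [ (p, if p == flavor then n - 1 else n) | (p, n) <- inv ]

allocatePizzas :: [(Pizza, Int)] -> [(User, Pizza)] -> [(User, Pizza)]
allocatePizzas stock ranks = evalState (allocate ranks) stock
```

Because each step inspects the inventory it reads with get, the state is actually demanded as the result list is built, and the second user falls through to their next choice once the first choice runs out.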

State transformer: need a Nothing in a (Maybe Bool)

This is from the nicta course (hence List = [], Optional = Maybe, ...), so I'm not looking for a full solution, but I am stuck on a State Transformer question. The aim is to filter duplicates from a List and completely fail if passed any number > 100.
-- filtering :: Applicative f => (a -> f Bool) -> List a -> f (List a)
distinctF :: (Ord a, Num a) => List a -> Optional (List a)
distinctF lst = case runStateT (filtering go lst) S.empty of
    Full (val, _) -> Full val
    Empty -> Empty
  where
    --go :: a -> StateT (S.Set a) Optional Bool
    go x = do
        s <- getT
        if x > 100 then do
            return *?*Empty / False*?*
This typechecks while go = undefined, but I'm struggling to put Empty in, as return wraps e.g. False in a Full/Just. fail gets me some way forward, but I don't think that is the solution.
In practice I am probably missing a more important issue and would welcome enlightenment.

If the goal is to write a function that both filters out duplicates and fails on large input at the same time, you got the skeleton quite right:
distinctF :: (Ord a, Num a) => List a -> Optional (List a)
distinctF lst = evalStateT (go lst) S.empty -- evalStateT is your case runStateT part
  where -- on empty input we just return empty list
    go [] = return []
    -- otherwise
    go (x:xs)
      -- we check whether we should 'fail'
      -- for that we *lift* the value from underlying monad (Optional) into our StateT Optional
      | x > 100 = lift $ Empty
      | otherwise = do
          -- the stuff to do
          -- get the state, do nothing if x is in there
          -- otherwise add x to the state and recurse
So for your question, you need to lift Empty, not return it.
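As a rough illustration of the same idea with standard library types, the sketch below uses Maybe in place of the course's Optional, ordinary lists in place of List, filterM in place of filtering, and StateT from the transformers package. It is not the course's reference solution.

```haskell
import Control.Monad (filterM)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (evalStateT, get, put)
import qualified Data.Set as S

-- Keep the first occurrence of each element; fail entirely on any element > 100.
distinctF :: (Ord a, Num a) => [a] -> Maybe [a]
distinctF xs = evalStateT (filterM go xs) S.empty
  where
    go x
      | x > 100   = lift Nothing          -- lift the inner monad's failure into StateT
      | otherwise = do
          seen <- get
          put (S.insert x seen)
          return (S.notMember x seen)     -- keep x only if it was not seen before
```

Here lift Nothing is a StateT action that ignores the state and fails in the inner Maybe, so the whole traversal short-circuits to Nothing.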

OK, so I finally found a way, by realising that I could construct precisely the correct return type, rather than trying to rely on return:
go x = do
    if x > 100 then
        StateT (\_ -> Empty) -- `return` a fail
    else do
        st <- getT
However, I am still not quite sure how <- unwraps both the StateT and the inner monadic container.
