Chunk data with Conduit

Chunk data with Conduit - haskell

Here is an example of a conduit combinator that supposed to yield downstream when a complete message is received from upstream:
import qualified Data.ByteString as BS
import Data.Conduit
import Data.Conduit.Combinators
import Data.Conduit.Network
message :: Monad m => ConduitT BS.ByteString BS.ByteString m ()
message = loop
where
loop = await >>= maybe (return ()) go
go x = if (BS.isSuffixOf "|" x)
then yield (BS.init x) >> loop
else leftover x
Server code itself looks like following:
main :: IO ()
main = do
runTCPServer (serverSettings 5000 "!4") $ \ appData -> runConduit $
(appSource appData)
.| message
.| (appSink appData)
For some reason telnet 127.0.0.1 5000 disconnects after sending any message:
telnet 127.0.0.1 5000
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
123|
Connection closed by foreign host.
Please advice, what am I doing wrong here?
Update
More importantly what I try doing here is wait for completion signal | and then yield the complete message downstream. Here is the evolution of message combinator:
message :: Monad m => ConduitT BS.ByteString BS.ByteString m ()
message = do
minput <- await
case minput of
Nothing -> return ()
Just input -> do
case BS.breakSubstring "|" input of
("", "") -> return ()
("", "|") -> return ()
("", xs) -> leftover $ BS.tail xs
(x, "") -> leftover x -- problem is in this leftover
(x, xs) -> do
yield x
leftover $ BS.tail xs
message
The idea I had is that if there is nothing coming from the upstream combinator will have to wait until there will be something, such that it can send a complete message downstream. But it seams that conduit starts spinning on CPU a lot on that leftover call in the above message combinator.

Finally figured out that it was necessary to await instead of leftover on the base case. Here is how working message combinator looks like:
message :: Monad m => ConduitT BS.ByteString BS.ByteString m ()
message = do
minput <- await
case minput of
Nothing -> return ()
Just input -> process input >> message
where
process input =
case BS.breakSubstring "|" input of
("", "") -> return ()
("", "|") -> return ()
("", xs) -> leftover $ BS.tail xs
(x, "") -> do
minput <- await
case minput of
Nothing -> return ()
Just newInput -> process $ BS.concat [x, newInput]
(x, xs) -> do
yield x
leftover $ BS.tail xs
A bit of boilerplate that can probably be cleaned up, but it works.

Print x in go to debug.
...
go x = do
liftIO (Prelude.print x)
if ...
The socket receives a bytestring that ends with \r\n, so you go to the else branch, which terminates the session.

Related

pipes `take` with default value

I have a producer p with type Producer Message IO (Producer SB.ByteString IO ()).
Now I need optionally skip some messages and optionally process certain number of messages:
let p =
(processMBFile f >->
P.drop (optSkip opts)
>->
(case optLimit opts of
Nothing -> cat
Just n -> ptake n)
)
in
do
restp <- runEffect $ for p processMessage
I could not use take from Pipes.Prelude because it returns () while I need to return an empty producer. The way I quickly haked it is by replacing take with my own implementation ptake:
emptyP :: Producer SB.ByteString IO ()
emptyP = return ()
ptake :: Monad m => Int -> Pipe a a m (Producer SB.ByteString IO ())
ptake = go
where
go 0 = return emptyP
go n = do
a <- await
yield a
go (n-1)
My question is: if there is a more elegant way to do this?

ptake differs from take only in the return value, so it can be implemented in terms of it using (<$):
ptake n = emptyP <$ take n

How can I poll a process for it's stdout / stderrr output? Blocked by isEOF

The following example requires the packages of:
- text
- string-conversions
- process
Code:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE LambdaCase #-}
module Example where
import qualified Data.Text as T
import Data.Text (Text)
import Data.Monoid
import Control.Monad.Identity
import System.Process
import GHC.IO.Handle
import Debug.Trace
import Data.String.Conversions
runGhci :: Text -> IO Text
runGhci _ = do
let expr = "print \"test\""
let inputLines = (<> "\n") <$> T.lines expr :: [Text]
print inputLines
createProcess ((proc "ghci" ["-v0", "-ignore-dot-ghci"]) {std_in=CreatePipe, std_out=CreatePipe, std_err=CreatePipe}) >>= \case
(Just pin, Just pout, Just perr, ph) -> do
output <-
forM inputLines (\i -> do
let script = i <> "\n"
do
hPutStr pin $ cs $ script
hFlush pin
x <- hIsEOF pout >>= \case
True -> return ""
False -> hGetLine pout
y <- hIsEOF perr >>= \case
True -> return ""
False -> hGetLine perr
let output = cs $! x ++ y
return $ trace "OUTPUT" $ output
)
let f i o = "ghci>" <> i <> o
let final = T.concat ( zipWith f (inputLines :: [Text]) (output :: [Text]) :: [Text])
print final
terminateProcess ph
pure $ T.strip $ final
_ -> error "Invaild GHCI process"
If I attempt to run the above:
stack ghci src/Example.hs
ghci> :set -XOverloadedStrings
ghci> runGhci ""
["print \"test\"\n"]
It appears to be blocking on hIsEOF perr, according to https://stackoverflow.com/a/26510673/1663462 it sounds like I shouldn't call this function unless there is 'some output' ready to be flushed / read... However how do I handle the case where it does not have any output at that stage? I don't mind periodically 'checking' or having a timeout.
How can I prevent the above from hanging? I've tried various approaches involving hGetContents, hGetLine however they all seem to end up blocking (or closing the handle) in this situation...

I had to use additional threads, MVars, as well as timeouts:
runGhci :: Text -> IO Text
runGhci _ = do
let expr = "123 <$> 123"
let inputLines = filter (/= "") (T.lines expr)
print inputLines
createProcess ((proc "ghci" ["-v0", "-ignore-dot-ghci"]) {std_in=CreatePipe, std_out=CreatePipe, std_err=CreatePipe}) >>= \case
(Just pin, Just pout, Just perr, ph) -> do
output <- do
forM inputLines
(\i -> do
let script = "putStrLn " ++ show magic ++ "\n"
++ cs i ++ "\n"
++ "putStrLn " ++ show magic ++ "\n"
do
stdoutMVar <- newEmptyMVar
stderrMVar <- newMVar ""
hPutStr pin script
hFlush pin
tOutId <- forkIO $ extract' pout >>= putMVar stdoutMVar
tErrId <- forkIO $ do
let f' = hGetLine perr >>= (\l -> modifyMVar_ stderrMVar (return . (++ (l ++ "\n"))))
forever f'
x <- timeout (1 * (10^6)) (takeMVar stdoutMVar) >>= return . fromMaybe "***ghci timed out"
y <- timeout (1 * (10^6)) (takeMVar stderrMVar) >>= return . fromMaybe "***ghci timed out"
killThread tOutId
killThread tErrId
return $ trace "OUTPUT" $ cs $! x ++ y
)
let final = T.concat ( zipWith f (inputLines :: [Text]) (output :: [Text]) :: [Text])
print final
terminateProcess ph
pure $ T.strip $ cs $ final
_ -> error "Invaild GHCI process"

How can I use REPL with CPS function?

I've just encountered withSession :: (Session -> IO a) -> IO a of wreq package. I want to evaluate the continuation line by line, but I can't find any way for this.
import Network.Wreq.Session as S
withSession $ \sess -> do
res <- S.getWith opts sess "http://stackoverflow.com/questions"
-- print res
-- .. other things
In above snippet how can I evaluate print res in ghci? In other words, can I get Session type in ghci?

Wonderful question.
I am aware of no methods that can re-enter the GHCi REPL, so that we can use that in CPS functions. Perhaps others can suggest some way.
However, I can suggest an hack. Basically, one can exploit concurrency to turn CPS inside out, if it is based on the IO monad as in this case.
Here's the hack: use this in a GHCi session
> sess <- newEmptyMVar :: IO (MVar Session)
> stop <- newEmptyMVar :: IO (MVar ())
> forkIO $ withSession $ \s -> putMVar sess s >> takeMVar stop
> s <- takeMVar sess
> -- use s here as if you were inside withSession
> let s = () -- recommended
> putMVar stop ()
> -- we are now "outside" withSession, don't try to access s here!
A small library to automatize the hack:
data CPSControl b = CPSControl (MVar ()) (MVar b)
startDebugCps :: ((a -> IO ()) -> IO b) -> IO (a, CPSControl b)
startDebugCps cps = do
cpsVal <- newEmptyMVar
retVal <- newEmptyMVar
stop <- newEmptyMVar
_ <- forkIO $ do
x <- cps $ \c -> putMVar cpsVal c >> takeMVar stop
putMVar retVal x
s <- takeMVar cpsVal
return (s, CPSControl stop retVal)
stopDebugCps :: CPSControl b -> IO b
stopDebugCps (CPSControl stop retVal) = do
putMVar stop ()
takeMVar retVal
testCps :: (String -> IO ()) -> IO String
testCps act = do
putStrLn "testCps: begin"
act "here's some string!"
putStrLn "testCps: end"
return "some return value"
A quick test:
> (x, ctrl) <- startDebugCps testCps
testCps: begin
> x
"here's some string!"
> stopDebugCps ctrl
testCps: end
"some return value"

MVars are blocking indefinitely; but only in certain scenarios.

First, because this is about a specific case, I haven't reduced the code at all, so it will be quite long, and in 2 parts (Helper module, and the main).
SpawnThreads in ConcurHelper takes a list of actions, forks them, and gets an MVar containing the result of the action. It them combines the results, and returns the resulting list. It works fine in certain cases, but blocks indefinitely on others.
If I give it a list of putStrLn actions, it executes them fine, then returns the resulting ()s (yes, I know running print commands on different threads at the same time is bad in most cases).
If I try running multiTest in Scanner though (which takes either scanPorts or scanAddresses, the scan range, and the number of threads to use; then splits the scan range over the threads, and passes the list of actions to SpawnThreads), it will block indefinitely. The odd thing is, according to the debug prompts scattered around ConcurHelper, on each thread, ForkIO is returning before the MVar is filled. This would make sense if it wasn't in a do block, but shouldn't the actions be performed sequentially? (I don't know if this is related to the problem or not; it's just something I noticed while attempting to debug it).
I've thought it out step by step, and if it's executing in the order laid out in spawnThreads, the following should happen:
An empty MVar should be created inside forkIOReturnMVar, and passed to mVarWrapAct.
mVarWrapAct should execute the action, and put the result in the MVar (this is where the problem seems to lie. "MVar filled" is never shown, suggesting the MVar is never put into)
getResults should then take from the resulting list of MVars, and return the results
If point #2 isn't the issue, I can see where the problem would be (and if it is the issue, I can't see why putMVar never executes. Inside the scanner module, the only real function of interest for this question is multiTest. I only included the rest so it could be run).
To do a simple test, you can run the following:
spawnThreads [putStrLn "Hello", putStrLn "World"] (should return [(),()])
multiTest (scanPorts "127.0.0.1") 1 (0,5) (Creates the MVar, hangs for a sec, then crashes with the aforementioned error)
Any help in understanding whats going on here would be appreciated. I can't see what the difference between the 2 use cases are.
Thank you
(And I'm using this atrocious exception handling system because IO errors don't give codes for specific network exceptions, so I've been left with parsing messages to find out what happened)
Main:
module Scanner where
import Network
import Network.Socket
import System.IO
import Control.Exception
import Control.Concurrent
import ConcurHelper
import Data.Maybe
import Data.Char
import NetHelp
data NetException = NetNoException | NetTimeOut | NetRefused | NetHostUnreach
| NetANotAvail | NetAccessDenied | NetAddrInUse
deriving (Show, Eq)
diffExcept :: Either SomeException Handle -> Either NetException Handle
diffExcept (Right h) = Right h
diffExcept (Left (SomeException m))
| err == "WSAETIMEDOUT" = Left NetTimeOut
| err == "WSAECONNREFUSED" = Left NetRefused
| err == "WSAEHOSTUNREACH" = Left NetHostUnreach
| err == "WSAEADDRNOTAVAIL" = Left NetANotAvail
| err == "WSAEACCESS" = Left NetAccessDenied
| err == "WSAEADDRINUSE" = Left NetAddrInUse
| otherwise = error $ show m
where
err = reverse . dropWhile (== ')') . reverse . dropWhile (/='W') $ show m
extJust :: Maybe a -> a
extJust (Just a) = a
selectJusts :: IO [Maybe a] -> IO [a]
selectJusts mayActs = do
mays <- mayActs; return . map extJust $ filter isJust mays
scanAddresses :: Int -> Int -> Int -> IO [String]
scanAddresses port minAddr maxAddr =
selectJusts $ mapM (\addr -> do
let sAddr = "192.168.1." ++ show addr
print $ "Trying " ++ sAddr ++ " " ++ show port
connection <- testConn sAddr port
if isJust connection
then do hClose $ extJust connection; return $ Just sAddr
else return Nothing) [minAddr..maxAddr]
scanPorts :: String -> Int -> Int -> IO [Int]
scanPorts addr minPort maxPort =
selectJusts $ mapM (\port -> do
--print $ "Trying " ++ addr ++ " " ++ show port
connection <- testConn addr port
if isJust connection
then do hClose $ extJust connection; return $ Just port
else return Nothing) [minPort..maxPort]
main :: IO ()
main = do
withSocketsDo $ do
putStrLn "Scan Addresses or Ports? (a/p)"
choice <- getLine
if (toLower $ head choice) == 'a'
then do
putStrLn "On what port?"
sPort <- getLine
addrs <- scanAddresses (read sPort :: Int) 0 255
print addrs
else do
putStrLn "At what address?"
address <- getLine
ports <- scanPorts address 0 9999
print ports
main
testConn :: HostName -> Int -> IO (Maybe Handle)
testConn host port = do
result <- try $ timedConnect 1 host port
let result' = diffExcept result
case result' of
Left e -> do putStrLn $ "\t" ++ show e; return Nothing
Right h -> return $ Just h
setPort :: AddrInfo -> Int -> AddrInfo
setPort addInf nPort = case addrAddress addInf of
(SockAddrInet _ host) -> addInf { addrAddress = (SockAddrInet (fromIntegral nPort) host)}
getHostAddress :: HostName -> Int -> IO SockAddr
getHostAddress host port = do
addrs <- getAddrInfo Nothing (Just host) Nothing
let adInfo = head addrs
newAdInfo = setPort adInfo port
return $ addrAddress newAdInfo
timedConnect :: Int -> HostName -> Int -> IO Handle
timedConnect time host port = do
s <- socket AF_INET Stream defaultProtocol
setSocketOption s RecvTimeOut time; setSocketOption s SendTimeOut time
addr <- getHostAddress host port
connect s addr
socketToHandle s ReadWriteMode
multiTest :: (Int -> Int -> IO a) -> Int -> (Int, Int) -> IO [a]
multiTest partAction threads (mi,ma) =
spawnThreads $ recDiv [mi,perThread..ma]
where
perThread = ((ma - mi) `div` threads) + 1
recDiv [] = []
recDiv (curN:restN) =
partAction (curN + 1) (head restN) : recDiv restN
Helper:
module ConcurHelper where
import Control.Concurrent
import System.IO
spawnThreads :: [IO a] -> IO [a]
spawnThreads actions = do
ms <- mapM (\act -> do m <- forkIOReturnMVar act; return m) actions
results <- getResults ms
return results
forkIOReturnMVar :: IO a -> IO (MVar a)
forkIOReturnMVar act = do
m <- newEmptyMVar
putStrLn "Created MVar"
forkIO $ mVarWrapAct act m
putStrLn "Fork returned"
return m
mVarWrapAct :: IO a -> MVar a -> IO ()
mVarWrapAct act m = do a <- act; putMVar m a; putStrLn "MVar filled"
getResults :: [MVar a] -> IO [a]
getResults mvars = do
unpacked <- mapM (\m -> do r <- takeMVar m; return r) mvars
putStrLn "MVar taken from"
return unpacked

Your forkIOReturnMVar isn't exception safe: whenever act throws, the MVar isn't going to be filled.
Minimal example
import ConcurHelper
main = spawnThreads [badOperation]
where badOperation = do
error "You're never going to put something in the MVar"
return True
As you can see, badOperation throws, and therefore the MVar won't get filled in mVarWrapAct.
Fix
Fill the MVar with an appropriate value if you encounter an exception. Since you cannot provide a default value for all possible types a, it's better to use MVar (Maybe a) or MVar (Either b a) as you already do in your network code.
In order to catch the exceptions, use one of the operations provided in Control.Exception. For example, you could use onException:
mVarWrapAct :: IO a -> MVar (Maybe a) -> IO ()
mVarWrapAct act m = do
onException (act >>= putMVar m . Just) (putMVar m Nothing)
putStrLn "MVar filled"
However, you might want to preserve the actual exception for more information. In this case you could simply use catch together with Either SomeException a :
mVarWrapAct :: IO a -> MVar (Either SomeException a) -> IO ()
mVarWrapAct act m = do
catch (act >>= putMVar m . Right) (putMVar m . Left)
putStrLn "MVar filled"

How to access the response code in happstack?

I'm trying to store a counter of all 200 response codes in my happstack application.
module Main where
import Happstack.Server
import Control.Concurrent
import Control.Monad.IO.Class ( liftIO )
import Control.Monad
main :: IO ()
main = do
counter <- (newMVar 0) :: IO (MVar Integer)
simpleHTTP nullConf $ countResponses counter (app counter)
countResponses :: MVar Integer -> ServerPart Response -> ServerPart Response
countResponses counter r = do
resp <- r
liftIO $ putStrLn $ show resp
-- TODO: Does not work, response code always 200
if rsCode resp == 200
then liftIO $ (putMVar counter . (+) 1) =<< takeMVar counter
else liftIO $ putStrLn $ "Unknown code: " ++ (show $ rsCode resp)
return resp
app counter = do
c <- liftIO $ readMVar counter
msum
[ dir "error" $ notFound $ toResponse $ "NOT HERE"
, ok $ toResponse $ "Hello, World! " ++ (show c)
]
The problem, as far as I can tell, is that notFound adds a filter that sets the code, which hasn't been run at the time I am inspecting the response.
I can't hook in with my own filter, since it has type Response -> Response and I need to be in the IO monad to access the mvar. I found mapServerPartT which looks like it could be possible to hook in my own code, but I'm not quite sure whether that's overkill in this scenario.
I did find simpleHttp'' which seems to directly call runWebT, which then runs appFilterToResp outside of any code I can hook. Perhaps I have to build my own version of simpleHttp''?
UPDATE: This works, is it the best way?
-- Use this instead of simpleHTTP
withMetrics :: (ToMessage a) => MVar Integer -> Conf -> ServerPartT IO a -> IO ()
withMetrics counter conf hs =
Listen.listen conf (\req -> (simpleHTTP'' (mapServerPartT id hs) req) >>=
runValidator (fromMaybe return (validator conf)) >>=
countResponses counter)
A possibly related question: I also want to be able to time requests, which means I would have to hook in at probably the same spot at the end of the request cycle.
UPDATE 2: I was able to get timings for requests:
logMessage x = logM "Happstack.Server.AccessLog.Combined" INFO x
withMetrics :: (ToMessage a) => Conf -> ServerPartT IO a -> IO ()
withMetrics conf hs =
Listen.listen conf $ \req -> do
startTime <- liftIO $ getCurrentTime
resp <- simpleHTTP'' (mapServerPartT id hs) req
validatedResp <- runValidator (fromMaybe return (validator conf)) resp
endTime <- liftIO $ getCurrentTime
logMessage $ rqUri req ++ " " ++ show (diffUTCTime endTime startTime)
return validatedResp

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Chunk data with Conduit - haskell

Print x in go to debug. ... go x = do liftIO (Prelude.print x) if ... The socket receives a bytestring that ends with \r\n, so you go to the else branch, which terminates the session.

Related

pipes `take` with default value

How can I poll a process for it's stdout / stderrr output? Blocked by isEOF

How can I use REPL with CPS function?

MVars are blocking indefinitely; but only in certain scenarios.

How to access the response code in happstack?

Categories

Resources