How to implement data streaming with the Snap framework? - haskell

I'd like to implement streaming of large data (in both directions) with the Snap server. To explore the possibilities I created a sample program that has two endpoints - reading and writing. There is a very simple internal buffer that holds one ByteString, and whatever is written to the writing endpoint appears in the reading one. (Currently there is no way to terminate the stream, but that's fine for this purpose.)
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative
import Control.Concurrent.MVar.Lifted
import Control.Monad
import Data.ByteString (ByteString)
import Blaze.ByteString.Builder (Builder, fromByteString)
import Data.Enumerator
import qualified Data.Enumerator.List as E
import Data.Enumerator.Binary (enumFile, iterHandle)
import Snap.Core
import Snap.Http.Server
main :: IO ()
main = do
    buf <- newEmptyMVar
    quickHttpServe (site buf)

site :: MVar ByteString -> Snap ()
site buf =
    route [ ("read",  modifyResponse (setBufferingMode False
                                      . setResponseBody (fromBuf buf)))
          , ("write", runRequestBody (toBuf buf))
          ]

fromBuf :: MVar ByteString -> Enumerator Builder IO a
fromBuf buf = E.repeatM (liftM fromByteString $ takeMVar buf)

toBuf :: MVar ByteString -> Iteratee ByteString IO ()
toBuf buf = E.mapM_ (putMVar buf)
Then I run in different terminals
curl http://localhost:8000/read >/dev/null
and
dd if=/dev/zero bs=1M count=100 | \
    curl --data-binary @- http://localhost:8000/write
But the writing part fails with An exception escaped to toplevel: Too many bytes read. This is obviously an instance of TooManyBytesReadException, but I couldn't find where it's thrown. Writing a smaller amount of data, like 1 MB, works as expected.
My questions are:
Where/how to fix the reading limit?
Will this stream data, without loading the whole POST request in memory? If not, how to fix it?

It will work if you add any content type that's not "application/x-www-form-urlencoded" to your /write request, e.g.:

dd if=/dev/zero bs=1M count=100 | \
    curl -H "Content-Type: application/json" --data-binary @- http://localhost:8000/write
This bit in Snap does something like:

if contentType == Just "application/x-www-form-urlencoded"
    then readData maximumPOSTBodySize
  where
    maximumPOSTBodySize = 10*1024*1024

and x-www-form-urlencoded is curl's default.

To follow up on the previous answer: because forms of type application/x-www-form-urlencoded are so common, as a convenience Snap auto-decodes them for you and puts them into a parameters map in the request. The idea is similar in spirit to e.g. $_POST from PHP.
However, since these maps are read into RAM, naively decoding unbounded amounts of this data would allow an attacker to trivially DoS a server by sending it arbitrary amounts of this input until heap exhaustion. For this reason snap-server limits the amount of data it is willing to read in this way.
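To illustrate the two body-handling styles side by side, here is a minimal sketch (the handler names and routes are mine): getParam reads from the decoded parameters map, while runRequestBody streams the raw body without buffering it whole.

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Enumerator.List as E
import Snap.Core
import Snap.Http.Server

-- Small urlencoded forms: Snap has already decoded the body (up to its
-- size cap) into the parameters map, so we just look the field up.
smallForm :: Snap ()
smallForm = do
    mname <- getParam "name"
    writeBS (maybe "<missing>" id mname)

-- Large uploads: consume the raw body chunk by chunk; the client must
-- send a Content-Type other than application/x-www-form-urlencoded.
largeUpload :: Snap ()
largeUpload = do
    runRequestBody (E.mapM_ (\_chunk -> return ()))  -- discard each chunk
    writeBS "done"

main :: IO ()
main = quickHttpServe (route [("form", smallForm), ("upload", largeUpload)])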

Related

Haskell broken pipe error when working with streams

I'm trying to build a player using streams. The main idea is to have a thread running a player that reads bytes coming from another thread that concurrently downloads YouTube audio. The code works for a while and the content is streamed correctly, but after a few seconds I always get this error:
Exception: fd:13: hPutBuf: resource vanished (Broken pipe).
I guess I'm missing something, because even when using the connect function the result is the same. Here's the code (simplified):
import Control.Concurrent
import Prelude hiding (read)
import System.IO.Streams
import Data.ByteString (ByteString)

main = do
    (sink,_,_,_) <- runInteractiveCommand "mplayer -novideo -cache 5096 -"
    mainSink <- lockingOutputStream sink -- main audio stream, goes straight to the player
    (_,source,_,_) <- runInteractiveCommand "yt-dlp \"https://www.youtube.com/watch?v=uFtfDK39ZhI\" -f bv+ba -o -"
    loop mainSink source

loop :: OutputStream ByteString -> InputStream ByteString -> IO ()
loop sink src = do
    sourceBytes <- peek src
    case sourceBytes of
        Nothing -> loop sink src
        Just _  -> do
            audioBytes <- read src
            write audioBytes sink
            loop sink src
The problem appears to be that mplayer is generating its usual verbose terminal output on stdout and stderr, while yt-dlp is similarly generating output on stderr. Since you toss these handles away and never drain them, eventually the pipe buffers fill, and the processes get stuck. I can't say precisely why one or both of the processes dies instead of just hanging, but that's what's happening. Here's a simple example that redirects the unneeded output to /dev/null and appears to work:
import System.IO.Streams

main = do
    (sink,_,_,_) <- runInteractiveCommand "mplayer -cache 5096 - 2>/dev/null >&2"
    (_,source,_,_) <- runInteractiveCommand "yt-dlp \"https://www.youtube.com/watch?v=uFtfDK39ZhI\" -f bv+ba -o - 2>/dev/null"
    connect source sink
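Alternatively, you can keep the handles and drain them from Haskell rather than the shell. A sketch using io-streams' skipToEof in background threads:

import Control.Concurrent (forkIO)
import Control.Monad (void)
import System.IO.Streams

main :: IO ()
main = do
    (sink, out, err, _) <- runInteractiveCommand "mplayer -cache 5096 -"
    (_, source, errDl, _) <- runInteractiveCommand "yt-dlp \"https://www.youtube.com/watch?v=uFtfDK39ZhI\" -f bv+ba -o -"
    -- discard the processes' chatter so the pipe buffers never fill
    void $ forkIO $ skipToEof out
    void $ forkIO $ skipToEof err
    void $ forkIO $ skipToEof errDl
    connect source sink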

Warp / Scotty not terminating thread / resources at end of request

I'm running into an issue where my Scotty app does not seem to terminate old HTTP request threads. Eventually, after a large number (10-20) of concurrent requests, I run into an error about too many DB connections: libpq: failed (FATAL: sorry, too many clients already).
{-# LANGUAGE OverloadedStrings #-}
import Web.Scotty
import Database.PostgreSQL.Simple
import Control.Monad.IO.Class
connection :: IO Connection
connection = connect defaultConnectInfo
    { connectHost = "localhost", connectUser = "postgres", connectPassword = "mysecretpassword" }

main :: IO ()
main = scotty 8000 $ do
    get "/" $ do
        c <- liftIO $ connection
        text "test"
This also happens with a plain Warp application (which Scotty is built on):
{-# LANGUAGE OverloadedStrings #-}
import Network.Wai
import Network.Wai.Handler.Warp (run)
import Network.HTTP.Types (status200)
import Network.HTTP.Types.Header (hContentType)
import Database.PostgreSQL.Simple
import Control.Monad.IO.Class
connection :: IO Connection
connection = connect defaultConnectInfo
    { connectHost = "localhost", connectUser = "postgres", connectPassword = "mysecretpassword" }

main = run 8000 app

app :: Application
app req respond = do
    respond $ responseStream status200 [] $ \write flush -> do
        print "test"
        con <- connection
        flush
        write $ "World\n"
Why is this happening? Is there a simple way to "finalize" the request at the end?
I can manually close the connection, but ideally the thread would be killed together with any other related resources.
I've verified that it keeps the connection open by running the following in postgres:
SELECT sum(numbackends) FROM pg_stat_database;
Scotty seems to take a few seconds until it closes it automatically (after the request is completed).
postgresql-simple provides the close :: Connection -> IO () function, which closes the connection and frees the associated resources. You need to close the connection after you are done with it.
But a common problem is the following: what happens if the code between opening and closing the connection throws an exception? How to ensure that the connection is closed even in that case, so that we don't leak connections?
In Haskell, this is solved using the bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c function. You pass it an IO action that allocates some resource (in this case, a connection), a function that frees the allocated resource once we are done with it (in our case, the close function), and a function that says what we actually want to do with the allocated resource (in this case, perform a query).
bracket is somewhat similar to try-with-resources in Java or the "using" statement in C#.
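Here is a minimal sketch of the handler above rewritten with bracket (the SELECT 1 query is just a placeholder):

{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (bracket)
import Control.Monad.IO.Class (liftIO)
import Database.PostgreSQL.Simple
import Web.Scotty

connection :: IO Connection
connection = connect defaultConnectInfo
    { connectHost = "localhost", connectUser = "postgres"
    , connectPassword = "mysecretpassword" }

main :: IO ()
main = scotty 8000 $
    get "/" $ do
        -- close runs even if the query throws, so no connection leaks
        _rows <- liftIO $ bracket connection close $ \c ->
            (query_ c "SELECT 1" :: IO [Only Int])
        text "test"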
Instead of opening and closing a connection on each request, a better approach would be to use some kind of connection pool shared between request threads. persistent-postgresql uses resource-pool, for example.
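A sketch of that pooled variant, assuming the classic createPool API from resource-pool:

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (liftIO)
import Data.Pool (createPool, withResource)
import Database.PostgreSQL.Simple
import Web.Scotty

main :: IO ()
main = do
    -- one stripe, 60 s idle timeout, at most 10 connections kept open
    pool <- createPool (connect defaultConnectInfo) close 1 60 10
    scotty 8000 $
        get "/" $ do
            _rows <- liftIO $ withResource pool $ \c ->
                (query_ c "SELECT 1" :: IO [Only Int])
            text "test"

Each request borrows a connection and returns it when the action finishes, so the number of open connections can never exceed the pool's maximum.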

Make Network.Wreq.Proxy from Network.HTTP.Proxy.Proxy

Network.HTTP.Proxy has a nice function called fetchProxy:
fetchProxy :: Bool -> IO Proxy

fetchProxy flg gets the local proxy settings and parses the string into a Proxy value. If you want to be informed of ill-formed proxy configuration strings, supply True for flg. Proxy settings are sourced from the HTTP_PROXY environment variable [...]
I want to use the Proxy obtained this way with the Wreq library, which has its own Proxy, which I import from http-client like this:
import Network.HTTP.Client.Internal (Proxy(..), Response)
There appears to be a type mismatch between Network.HTTP.Proxy.Proxy and Network.Wreq.Proxy, though I presume they must be identical.
I import both like this:
import Network.Wreq
import Network.HTTP.Proxy (fetchProxy)
How can I use HTTP.Proxy.Proxy with Wreq and why does GHC see them as different types?
It's likely that the Wreq authors were just unaware of the other Proxy as they seem to be storing equivalent information. It'll be tricky to get them to talk to each other, however, since fetchProxy stores the host:port as a string and Wreq's Proxy wants the individual host and port. You'll have to do some URI parsing:
import Control.Lens
import Data.Text.Strict.Lens
import Network.HTTP.Proxy
import Network.Wreq
import URI.ByteString
main :: IO ()
main = do
    Network.HTTP.Proxy.Proxy host _ <- fetchProxy True
    case parseURI strictURIParserOptions (host ^. packed . re utf8) of
        Left e -> do
            putStrLn "uh oh"
            print e
        Right uri ->
            case ( uri ^? uriAuthorityL . _Just . authorityHostL . hostBSL
                 , uri ^? uriAuthorityL . _Just . authorityPortL . _Just . portNumberL
                 ) of
                (Just host_, Just port_) -> do
                    let opts = defaults & proxy ?~ httpProxy host_ port_
                    response <- getWith opts "http://example.com"
                    print response
                _ ->
                    putStrLn "uh oh"
I'm using lens here to do the boring bits and pieces of packing/unpacking strings, encoding UTF8, and talking to the uri-bytestring package for URI parsing. But the general idea is that datatypes in Haskell can be sliced and diced simply by pattern matching on the constructor; once extracted, the host:port string is funneled down into the httpProxy call, which returns Wreq's Proxy type. By qualifying the name of the constructor (Network.HTTP.Proxy.Proxy) I've let the compiler know which module I want that name from.
It would also not be too difficult, and probably less code to boot, to manually parse the proxy information from the environment variables yourself. You could even have a separate environment variable for host and port, which would obviate the need for URI parsing. URIs have such massively low entropy that they're an awful format for storing configuration information.
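For instance, a sketch of that approach (PROXY_HOST and PROXY_PORT are hypothetical variable names, not any standard):

{-# LANGUAGE OverloadedStrings #-}
import Control.Lens ((&), (?~))
import qualified Data.ByteString.Char8 as B
import Network.Wreq
import System.Environment (lookupEnv)

main :: IO ()
main = do
    mhost <- lookupEnv "PROXY_HOST"  -- hypothetical variable names
    mport <- lookupEnv "PROXY_PORT"
    case (,) <$> mhost <*> mport of
        Just (h, p) -> do
            let opts = defaults & proxy ?~ httpProxy (B.pack h) (read p)
            r <- getWith opts "http://example.com"
            print r
        Nothing -> putStrLn "PROXY_HOST and PROXY_PORT must both be set"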

Accepting specific certificate with http-client-tls or tls?

I'm probably just overlooking something basic in the documentation of http-client-tls and tls, but: how can I establish an HTTPS connection to a server and only accept one particular certificate, specified by me, that is potentially not in the system certificate store?
I see that this is an old question, but I just spent some time writing code to do this and figured I'd post it here for posterity... and in the hopes of getting some code review from the community. Snoyman's comment is helpful, but there are so many code interdependencies here, and X.509 and TLS are such boil-the-ocean standards, that it's hard to debug and to know for sure you're not screwing something up without digging pretty deep into the various libraries. I figured a more complete explanation with working code was in order.
Anyways, here's what I've come up with (this is a stack script so you can run it easily yourself) --
#!/usr/bin/env stack
{- stack --resolver lts-7.16 runghc -}
import qualified Data.ByteString as B
import Data.ByteString.Lazy (ByteString)
import Data.Default.Class (def)
import Data.String (fromString)
import Data.X509.CertificateStore (CertificateStore, readCertificateStore)
import Network.HTTP.Client (httpLbs, newManager, ManagerSettings)
import Network.HTTP.Client.TLS (mkManagerSettings)
import Network.Connection (TLSSettings(TLSSettings))
import qualified Network.TLS as TLS
import qualified Network.TLS.Extra.Cipher as TLS
import System.Environment (getArgs, getProgName)
managerSettings :: CertificateStore -> ManagerSettings
managerSettings store = mkManagerSettings settings Nothing
  where
    settings = TLSSettings params
    params = (TLS.defaultParamsClient "" B.empty)
        { TLS.clientUseServerNameIndication = True
        , TLS.clientShared = def
            { TLS.sharedCAStore = store }
        , TLS.clientSupported = def
            { TLS.supportedCiphers = TLS.ciphersuite_default }
        }

get :: FilePath -> String -> IO ()
get ca url = do
    mstore <- readCertificateStore ca
    case mstore of
        Just store -> do
            manager <- newManager $ managerSettings store
            response <- httpLbs (fromString url) manager
            putStrLn (show response)
        Nothing ->
            putStrLn $ "error: invalid certificate store " ++ ca

main :: IO ()
main = do
    args <- getArgs
    case args of
        ca:url:[] -> get ca url
        _ -> do
            name <- getProgName
            putStrLn $ "usage: " ++ name ++ " ca url"
A couple notes:
The TLS.sharedCAStore setting is where the magic happens. If you want to add your CA to the system store (vs. using only your CA), you can load the system store using getSystemCertificateStore from System.X509, then use Data.X509.CertificateStore to convert back and forth between CertificateStore and [SignedCertificate] to create a store with the system certificates along with your own.
TLS.defaultParamsClient takes a hostname and server id, used for TLS server name indication (SNI), a TLS extension that allows a server to host multiple sites on a single IP (similar to how HTTP/1.1 host headers work). We don't necessarily know what to set this to when we're creating a manager. Fortunately, Network.Connection (used by http-client-tls) appears to override whatever settings we use, so it doesn't matter.
The default for TLS.supportedCiphers is an empty list, so this setting is required (unless you turn off validation or something). Network.Connection defaults to ciphersuite_all but that includes some "not recommended last resource cipher suites" so I opted to use ciphersuite_default instead.
I think you're looking for ClientHooks. You can create a TLSSettings value with that by using the TLSSettings constructor, and then create a ManagerSettings using mkManagerSettings.
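A rough sketch of that approach, assuming tls's onServerCertificate hook and validateDefault from x509-validation (the actual pinning check is left as a comment):

import qualified Data.ByteString as B
import Data.Default.Class (def)
import Data.X509.Validation (validateDefault)
import Network.Connection (TLSSettings (TLSSettings))
import Network.HTTP.Client (Manager, newManager)
import Network.HTTP.Client.TLS (mkManagerSettings)
import qualified Network.TLS as TLS
import qualified Network.TLS.Extra.Cipher as TLS

pinnedSettings :: TLSSettings
pinnedSettings = TLSSettings params
  where
    params = (TLS.defaultParamsClient "example.com" B.empty)
        { TLS.clientHooks = def
            { TLS.onServerCertificate = \store cache sid chain ->
                -- compare `chain` against your pinned certificate here,
                -- then (or instead) run the default validation
                validateDefault store cache sid chain
            }
        , TLS.clientSupported = def
            { TLS.supportedCiphers = TLS.ciphersuite_default }
        }

newPinnedManager :: IO Manager
newPinnedManager = newManager (mkManagerSettings pinnedSettings Nothing)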

Signing temporary s3 upload URLs w/ Haskell

I'm trying to upload files from a web form directly to Amazon S3 asynchronously. In order to do this I must authenticate the client's upload request on the server.
By digitally signing an upload request w/ my AWS secret key I can create a temporary authenticated URL that the client can use to upload files to an S3 bucket.
The Amazon S3 docs specify that the signature must be generated as follows:
Signature = URL-Encode( Base64( HMAC-SHA1( YourSecretAccessKeyID,
UTF-8-Encoding-Of( StringToSign ) ) ) );
I'm using Haskell on the server so my implementation looks like:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Base64.Lazy as B64
import qualified Data.Digest.Pure.SHA as SHA
import qualified Data.ByteString.Lazy.Char8 as BL8
sign :: BL8.ByteString -> BL8.ByteString
sign = B64.encode . SHA.bytestringDigest . SHA.hmacSha1 secret
  where secret = "aws-secret-key"
The format from the Amazon docs requires that StringToSign look like:

StringToSign = HTTP-VERB + "\n" +
    Content-MD5 + "\n" +
    Content-Type + "\n" +
    Expires + "\n" +
    CanonicalizedAmzHeaders +
    CanonicalizedResource;
Another example from Amazon:
GET\n
\n
\n
1175139620\n
/johnsmith/photos/puppy.jpg
So my string looks like:
"PUT\n\n\n1384330538\n/bucketname/objname"
I sign the string above (w/ the sign function) and craft a url that looks like:
https://s3.amazonaws.com/bucketname/objname?AWSAccessKeyId=accessskey&Signature=signature=&Expires=1384330979
This is then sent to the client via an AJAX request before an upload. I have updated the CORS policy on the bucket as well to allow for PUT requests.
The problem is that every time I try to upload something with the above signed url I get this message (in an XML doc).
The request signature we calculated does not match the signature you
provided. Check your key and signing method.
So I'm not sure where I went wrong. Note: I can upload if I use the public URL (https://s3.amazonaws.com/bucketname/objname), but this shouldn't be possible; I only want users to upload blobs, not read or delete them, etc.
As someone who's done this dance a lot, I can say it's very difficult to build software that correctly signs an HTTP-digest authenticated request like this. In particular, if you rely only on the server response to guide you, it will take a long time. For security reasons, the servers are deliberately cryptic when rejecting you.
My best tip is to (a) get an alternative implementation that you know works, (b) build your Haskell interface to be pure so that it's easy to make it exactly replicate a request from that other framework, and (c) make sure you can get both the exact request text and the exact String-To-Sign from both the alternative framework and your own code. In particular, you'll often have to impute exact timestamps and nonces and pay close attention to percent encodings.
With these tools in hand, just create a variety of successful requests from the alternative implementation and see if you can replicate the exact String-To-Sign and exact request text using your own framework.
Most often my own errors involved improper encoding, missing quotes, not including all of the proper parameters (or the wrong ones), or using the hmac function incorrectly.
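As a concrete starting point, here's a tiny harness (reusing the sign function and the example string from the question) that prints both the String-To-Sign and the resulting signature, so you can diff them against a known-good implementation:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Base64.Lazy as B64
import qualified Data.ByteString.Lazy.Char8 as BL8
import qualified Data.Digest.Pure.SHA as SHA

sign :: BL8.ByteString -> BL8.ByteString
sign = B64.encode . SHA.bytestringDigest . SHA.hmacSha1 "aws-secret-key"

main :: IO ()
main = do
    let sts = "PUT\n\n\n1384330538\n/bucketname/objname"
    BL8.putStrLn sts        -- the exact String-To-Sign
    BL8.putStrLn (sign sts) -- the signature your code produces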
Here is my upload URL code; I might have missed a couple of imports since I pulled it out of a larger codebase.
{-# LANGUAGE OverloadedStrings, FlexibleContexts, TypeFamilies, DeriveDataTypeable, TemplateHaskell, QuasiQuotes #-}
import qualified Aws
import qualified Aws.Core as Aws
import qualified Aws.S3 as S3
import qualified Data.Text as T
import qualified Data.Text.Encoding as E
import qualified Codec.Binary.Base64 as B64
import qualified Data.ByteString as BS
import Text.Shakespeare.Text (st)
import qualified Codec.Binary.Url as Url
import System.Posix.Time (epochTime)
import Crypto.MAC.HMAC (hmac)
import Crypto.Hash.SHA1 (hash)

data Cfg = Cfg { baseCfg  :: Aws.Configuration
               , s3Cfg    :: S3.S3Configuration Aws.NormalQuery
               , s3Bucket :: S3.Bucket
               }

uploadUrl :: Cfg -> T.Text -> T.Text -> IO T.Text
uploadUrl cfg mime filename = do
    time <- epochTime
    let expires = show $ time + 600
        msg = E.encodeUtf8 $ [st|PUT

#{mime}
#{expires}
x-amz-acl:public-read
/#{s3Bucket cfg}/#{filename}|] -- the gap (the empty Content-MD5 line) is necessary
        key = Aws.secretAccessKey $ Aws.credentials $ baseCfg cfg
        accessid = T.pack $ Url.encode $ BS.unpack $ Aws.accessKeyID $ Aws.credentials $ baseCfg cfg
        signature = encode . T.pack $ B64.encode $ BS.unpack $ hmac hash 64 key msg
        encode = T.pack . Url.encode . BS.unpack . E.encodeUtf8
    return $ [st|http://#{s3Bucket cfg}.s3.amazonaws.com/#{filename}?AWSAccessKeyId=#{accessid}&Expires=#{expires}&Signature=#{signature}|]
