Many types of String (ByteString) - haskell

I wish to compress my application's network traffic.
According to the (latest?) "Haskell Popularity Rankings", zlib seems to be a pretty popular solution. zlib's interface uses ByteStrings:
compress :: ByteString -> ByteString
decompress :: ByteString -> ByteString
I am using regular Strings, which are also the data types used by read, show, and Network.Socket:
sendTo :: Socket -> String -> SockAddr -> IO Int
recvFrom :: Socket -> Int -> IO (String, Int, SockAddr)
So to compress my strings, I need some way to convert a String to a ByteString and vice-versa.
With hoogle's help, I found:
Data.ByteString.Char8 pack :: String -> ByteString
Trying to use it:
Prelude Codec.Compression.Zlib Data.ByteString.Char8> compress (pack "boo")
<interactive>:1:10:
Couldn't match expected type `Data.ByteString.Lazy.Internal.ByteString'
against inferred type `ByteString'
In the first argument of `compress', namely `(pack "boo")'
In the expression: compress (pack "boo")
In the definition of `it': it = compress (pack "boo")
Fails, because (?) there are different types of ByteString ?
So basically:
Are there several types of ByteString? What types, and why?
What's "the" way to convert Strings to ByteStrings?
Btw, I found that it does work with Data.ByteString.Lazy.Char8's ByteString, but I'm still intrigued.

There are two kinds of bytestrings: strict (defined in Data.Bytestring.Internal) and lazy (defined in Data.Bytestring.Lazy.Internal). zlib uses lazy bytestrings, as you've discovered.

The function you're looking for is:
import Data.ByteString as BS
import Data.ByteString.Lazy as LBS
lazyToStrictBS :: LBS.ByteString -> BS.ByteString
lazyToStrictBS x = BS.concat $ LBS.toChunks x
I expect it can be written more concisely without the x. (i.e. point-free, but I'm new to Haskell.)

A more efficient mechanism might be to switch to a full bytestring-based layer:
network.bytestring for bytestring sockets
lazy bytestrings for compressoin
binary of bytestring-show to replace Show/Read

Related

Inverse of `Data.Text.Encoding.decodeLatin1`?

Is there a function f :: Text -> Maybe ByteString such that forall x:
f (decodeLatin1 x) == Just x
Note, decodeLatin1 has the signature:
decodeLatin1 :: ByteString -> Text
I'm concerned that encodeUtf8 is not what I want, as I'm guessing what it does is just dump the UTF-8 string out as a ByteString, not reverse the changes that decodeLatin1 made on the way in to characters in the upper half of the character set.
I understand that f has to return a Maybe, because in general there's Unicode characters that aren't in the Latin character set, but I just want this to round trip at least, in that if we start with a ByteString we should get back to it.
DISCLAIMER: consider this a long comment rather than a solution, because I haven't tested.
I think you can do it with witch library. It is a general purpose type converter library with a fair amount of type safety. There is a type class called TryFrom to perform conversion between types that might fail to cast.
Luckily witch provides conversions from/to encondings too, having an instance TryFrom Text (ISO_8859_1 ByteString), meaning that you can convert between Text and latin1 encoded ByteString. So I think (not tested!!) this should work
{-# LANGUAGE TypeApplications #-}
import Witch (tryInto, ISO_8859_1)
import Data.Tagged (Tagged(unTagged))
f :: Text -> Maybe ByteString
f s = case tryInto #(ISO_8859_1 ByteString) s of
Left err -> Nothing
Right bs -> Just (unTagged bs)
Notice that tryInto returns a Either TryFromException s, so if you want to handle errors you can do it with Either. Up to you.
Also, witch docs points out that this conversion is done via String type, so probably there is an out-of-the-box solution without the need of depending on witch package. I don't know such a solution, and looking to the source code hasn't helped
Edit:
Having read witch source code aparently this should work
import qualified Data.Text as T
import Data.Char (isLatin1)
import qualified Data.ByteString.Char8 as C
f :: Text -> Maybe ByteString
f t = if allCharsAreLatin then Just (C.pack str) else Nothing
where str = T.unpack t
allCharsAreLatin = all isLatin1 str
The latin1 encoding is pretty damn simple -- codepoint X maps to byte X, whenever that's in range of a byte. So just unpack and repack immediately.
import Control.Monad
import qualified Data.Text as T
import qualified Data.ByteString.Char8 as BS
latin1EncodeText :: T.Text -> Maybe BS.ByteString
latin1EncodeText t = BS.pack (T.unpack t) <$ guard (T.all (<'\256') t)
It's possible to avoid the intermediate String, but you should probably make sure this is your bottleneck before trying for that.

Haskell Bytestring to Float Array

Hi have binaries of float data (single-precision 32-bit IEEE) that I would like to work on.
How can I best load this for further use, ideally as (IOArray Int Float).
bytesToFloats :: ByteString -> [Float]
bytesToFloatArray :: ByteString -> IOArray Int Float
If you've got bog standard single-precision floats, and you just want to work them over in Haskell, you can always be down and dirty about it:
import Data.ByteString.Internal as BS
import qualified Data.Vector.Storable as V
bytesToFloats :: BS.ByteString -> V.Vector Float
bytesToFloats = V.unsafeCast . aux . BS.toForeignPtr
where aux (fp,offset,len) = V.unsafeFromForeignPtr fp offset len
I think you might be happier with Data.Vector:
http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Vector_Tutorial#Parsing_Binary_Data
You could also use cereal library, for example:
import Control.Applicative
import Data.ByteString
import Data.Serialize
floatsToBytes :: [Float] -> ByteString
floatsToBytes = runPut . mapM_ putFloat32le
-- | Parses the input and returns either the result or an error description.
bytesToFloat :: ByteString -> Either String [Float]
bytesToFloat = runGet $ many getFloat32le
If you can convert 4 bytes to a Word32, you can use the function wordToFloat in the data-binary-ieee754 package to convert it to a float. You could then load this into any kind of list-like structure you want to manipulate it.

Haskell csv-conduit in GHCi

I've been suggested csv-conduit as a good Haskell package to work with CSV files. I want to learn how it works, but the documentation is too terse for a newbie Haskell programmer.
Is there a way for me to figure out how it works by trial-and-error in GHCi?
More specifically, should I load modules and files from GHCi or should I write a simple HS file to load them and then move around interactively?
I mentioned csv-conduit, but I'm opened to using any CSV package. I just need to get my hands on one and fool around with it, until I feel at ease (much like I would do in IDLE).
Take a look at the following function: readCSVFile :: :: (MonadResource m, CSV ByteString a) => CSVSettings -> FilePath -> m [a]
Its relatively simple to call, as we just need a CSVSettings, such as defCSVSettings, and a FilePath (aka String), "file.csv" or something.
Thus, after the call, we get (MonadResource m, CSV ByteString a). We can resolve this one at a time to figure out an appropriate type for this. We are performing IO in this operation, so for MonadResource m, m should just be ResourceT IO, which happens to be an instance of MonadBaseControl IO as required by runResourceT. This is a conduit specific thing.
For the CSV ByteString a, we need to find what instances of CSV. To do so, go to http://hackage.haskell.org/packages/archive/csv-conduit/0.2.1.1/doc/html/Data-CSV-Conduit.html#t:CSV (where the documentation for the package is in my opinion somewhat obnoxiously all stuffed into the typeclass...) and click on Instances to see what available instances we have of the form CSV ByteString a. The two options are CSV ByteString ByteString and CSV ByteString Text.
Of the two of these, Text is preferable because it handles unicode and CSV is unlikely to be containing binary data. ByteString is more or less similar to a [Word8] while Text is more similar to [Char] which is probably what you want. Hence, a should be Text (although ByteString will still work).
This means the result of the function call is ResourceT IO [Row Text]. We can't do much with this, but because ResourceT is a monad transformer, we can easily "pop" off the monad transformation layer with the function runResourceT. Thus,
readFile :: FilePath -> IO [Row Text]
readFile = runResourceT . readCSVFile defCSVSettings
which is easily usable within, say, main to get at the [Row Text] which you can then iterate over with a map or a fold to get your hands on the individual rows.
To run this sort of thing in GHCI you absolutely have to specifically point out the type. The reason is that the result class instance is not dependent on any of the parameters; thus, for any set of CSVSettings and FilePath, readCSVFile could return any number of different types as long as they as m is an instance of MonadResource m and a is an instance of CSV ByteString a. Thus, we have to explicitly point out to GHCi which type you want.
Have you tried Text.CSV? It might be more appropriate if you're just starting out with Haskell, as it's much simpler.
As for exploring new modules, you can just load it into GHCi, there's no need to write an additional file.
This works with the latest version of the csv-conduit package (version 0.6.3). Note the signature of readCsv without which I couldn't compile.
{-# LANGUAGE OverloadedStrings #-}
import Data.CSV.Conduit
import Data.Text (Text)
import qualified Data.Vector as V
import qualified Data.ByteString as B
csvset :: Char -> CSVSettings
csvset c = CSVSettings {csvSep = c, csvQuoteChar = Just '"'}
readCsv :: String -> Char -> IO (V.Vector (Row Text))
readCsv fp del = readCSVFile (csvset del) fp
main = readCsv "C:\\mydir\\myfile.csv" ';'

The type signature of Parsec function 'parse' and the class 'Stream'

What does the constraint (Stream s Identity t) mean in the following type declaration?
parse :: (Stream s Identity t)
=> Parsec s () a -> SourceName -> s -> Either ParseError a
What is Stream in the following class declaration, what does it mean. I'm totally lost.
class Monad m => Stream s m t | s -> t where
When I use Parsec, I get into a jam with the type-signatures (xxx :: yyy) all the time. I always skip the signatures, load the src into ghci, and then copy the type-signature back to my .hs file. It works, but I still don't understand what all these signatures are.
EDIT: more about the point of my question.
I'm still confused about the 'context' of type-signature:
(Show a) =>
means a must be a instance of class Show.
(Stream s Identity t) =>
what's the meaning of this 'context', since t never showed after the =>
I have a lot of different parser to run, so I write a warp function to run any of those parser with real files. but here comes the problem:
Here is my code, It cannot be loaded, how can I make it work?
module RunParse where
import System.IO
import Data.Functor.Identity (Identity)
import Text.Parsec.Prim (Parsec, parse, Stream)
--what should I write "runIOParse :: ..."
--runIOParse :: (Stream s Identity t, Show a) => Parsec s () a -> String -> IO ()
runIOParse pa filename =
do
inh <- openFile filename ReadMode
outh <- openFile (filename ++ ".parseout") WriteMode
instr <- hGetContents inh
let result = show $ parse pa filename instr
hPutStr outh result
hClose inh
hClose outh
the constraint: (Stream s Identity t) means what?
It means that the input s your parser works on (i.e. [Char]) must be an instance of the Stream class. In the documentation you see that [Char] is indeed an instance of Stream, since any list is.
The parameter t is the token type, which is normally Char and is determinded by s, as states the functional dependency s -> t.
But don't worry too much about this Stream typeclass. It's used only to have a unified interface for any Stream-like type, e.g. lists or ByteStrings.
what is Stream
A Stream is simply a typeclass. It has the uncons function, which returns the head of the input and the tail in a tuple wrapped in Maybe. Normally you won't need this function. As far as I can see, it's only needed in the most basic parsers like tokenPrimEx.
Edit:
what's the meaning of this 'context', since t never showed after the =>
Have a look at functional dependencies. The t never shows after the ´=>´, because it is determiend by s.
And it means that you can use uncons on whatever s is.
Here is my code, It cannot be loaded, how can I make it work?
Simple: Add an import statement for Text.Parsec.String, which defines the missing instance for Stream [tok] m tok. The documentation could be a bit clearer here, because it looks as if this instance was defined in Text.Parsec.Prim.
Alternatively import the whole Parsec library (import Text.Parsec) - this is how I always do it.
The Stream type class is an abstraction for list-like data structures. Early versions of Parsec only worked for parsing lists of tokens (for example, String is a synonym for [Char], so Char is the token type), which can be a very inefficient representation. These days, most substantial input in Haskell is handled as Text or ByteString types, which are not lists, but can act a lot like them.
So, for example, you mention
parse :: (Stream s Identity t)
=> Parsec s () a -> SourceName -> s -> Either ParseError a
Some specializations of this type would be
parse1 :: Parsec String () a -> SourceName -> String -> Either ParseError a
parse2 :: Parsec Text () a -> SourceName -> Text -> Either ParseError a
parse3 :: Parsec ByteString () a -> SourceName -> ByteString -> Either ParseError a
or even, if you have a separate lexer with a token type MyToken:
parse4 :: Parsec [MyToken] () a -> SourceName -> [MyToken] -> Either ParseError a
Of these, only the first and last use actual lists for the input, but the middle two use other Stream instances that act enough like lists for Parsec to work with them.
You can even declare your own Stream instance, so if your input is in some other type that acts sort of list-like, you can write an instance, implement the uncons function, and Parsec will work with your type, as well.

Haskell ByteString / Data.Binary.Get question

Attempting to use Data.Binary.Get and ByteString and not understanding what's happening. My code is below:
getSegmentParams :: Get (Int, L.ByteString)
getSegmentParams = do
seglen <- liftM fromIntegral getWord16be
params <- getByteString (seglen - 2)
return (seglen, params)
I get the following error against the third item of the return tuple, ie payload:
Couldn't match expected type `L.ByteString'
against inferred type `bytestring-0.9.1.4:Data.ByteString.Internal.ByteString'
Someone please explain to me the interaction between Data.Binary.Get and ByteStrings and how I can do what I'm intending. Thanks.
It says you expect the second element of the tuple to be a L.ByteString (I assume that L is from Data.ByteString.Lazy) but the getByteString routine returns a strict ByteString from Data.ByteString. You probably want to use getLazyByteString.
There are two ByteString data types: one is in Data.ByteString.Lazy and one is in Data.ByteString.
Given the L qualifying your ByteString, I presume you want the lazy variety, but getByteString is giving you a strict ByteString.
Lazy ByteStrings are internally represented by a list of strict ByteStrings.
Fortunately Data.ByteString.Lazy gives you a mechanism for turning a list of strict ByteStrings into a lazy ByteString.
If you define
import qualified Data.ByteString as S
strictToLazy :: S.ByteString -> L.ByteString
strictToLazy = L.fromChunks . return
you can change your code fragment to
getSegmentParams :: Get (Int, L.ByteString)
getSegmentParams = do
seglen <- liftM fromIntegral getWord16be
params <- getByteString (seglen - 2)
return (seglen, strictToLazy params)
and all should be right with the world.

Resources