Most elegant way to do string conversion in Haskell

See this related SO question: Automatic conversion between String and Data.Text in haskell
Given a string of type Text, I want to produce a lazy bytestring.
This works, but I wondered whether it's optimal, given the fact that both Text and the lazy bytestring have the property of being "string-like" and I still use the not-generic unpack:
import Data.ByteString.Lazy (ByteString)
import Data.String (IsString, fromString)
import Data.Text (Text, unpack)
-- Goes through String: unpack the Text, then rebuild via IsString.
convert :: IsString str => Text -> str
convert = fromString . unpack
I found the package string-conversions that offers the polymorphic function
convertString :: a -> b
as part of the ConvertibleStrings typeclass.
While it works fine, I am suspicious: Why would I need an extra package for that? Couldn't there be already a typeclass like IsString that offers a toString method and in combination a universal convert function fromString . toString?

[OK, while I was editing my question, a possible answer dawned on me]
On the hackage-page of string-conversions it says:
Assumes UTF-8 encoding for both types of ByteStrings.
So there are assumptions that go along with conversions and a universal conversion of string-like types might not be desirable.
Also performance probably depends on the input and output types and a universal conversion would pretend that it's all the same.
So my take on best practice is now this, being explicit rather than polymorphic:
import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy as ByteString
import Data.Text (Text)
import qualified Data.Text.Encoding as Text
-- Encode as UTF-8 explicitly, then wrap the strict result as a lazy ByteString.
convert :: Text -> ByteString
convert = ByteString.fromStrict . Text.encodeUtf8
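Going the other way makes the encoding assumption visible: arbitrary bytes are not guaranteed to be valid UTF-8, so a total decoder has to be able to report failure. A minimal sketch, assuming UTF-8 in both directions; the names toLazyUtf8 and fromLazyUtf8 are made up for illustration, only the library calls come from text and bytestring:

```haskell
import qualified Data.ByteString.Lazy as LBS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Hypothetical helper: encode strict Text as UTF-8 and wrap it as a lazy ByteString.
toLazyUtf8 :: T.Text -> LBS.ByteString
toLazyUtf8 = LBS.fromStrict . TE.encodeUtf8

-- Hypothetical helper: decode a lazy ByteString as UTF-8.
-- Invalid input is returned as a Left instead of being thrown as an exception.
fromLazyUtf8 :: LBS.ByteString -> Either UnicodeException T.Text
fromLazyUtf8 = TE.decodeUtf8' . LBS.toStrict
```

Encoding can never fail, decoding can, which is one more reason a single polymorphic convert function hides something you probably want to see in the types.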

Related

Inverse of `Data.Text.Encoding.decodeLatin1`?

Is there a function f :: Text -> Maybe ByteString such that forall x:
f (decodeLatin1 x) == Just x
Note, decodeLatin1 has the signature:
decodeLatin1 :: ByteString -> Text
I'm concerned that encodeUtf8 is not what I want, as I'm guessing what it does is just dump the UTF-8 string out as a ByteString, not reverse the changes that decodeLatin1 made on the way in to characters in the upper half of the character set.
I understand that f has to return a Maybe, because in general there are Unicode characters that aren't in the Latin-1 character set, but I just want this to round trip at least, in that if we start with a ByteString we should get back to it.
DISCLAIMER: consider this a long comment rather than a solution, because I haven't tested.
I think you can do it with the witch library. It is a general-purpose type conversion library with a fair amount of type safety. It has a type class called TryFrom for conversions between types that might fail.
Luckily, witch provides conversions to and from encodings too: it has an instance TryFrom Text (ISO_8859_1 ByteString), meaning that you can convert between Text and Latin-1 encoded ByteString. So I think (not tested!!) this should work:
{-# LANGUAGE TypeApplications #-}
import Data.ByteString (ByteString)
import Data.Tagged (Tagged(unTagged))
import Data.Text (Text)
import Witch (tryInto, ISO_8859_1)
f :: Text -> Maybe ByteString
f s = case tryInto @(ISO_8859_1 ByteString) s of
  Left _ -> Nothing
  Right bs -> Just (unTagged bs)
Notice that tryInto returns an Either (TryFromException source target) target, so if you want to handle errors you can work with the Either directly. Up to you.
Also, the witch docs point out that this conversion is done via the String type, so there is probably an out-of-the-box solution that doesn't depend on the witch package. I don't know of such a solution, and looking at the source code hasn't helped.
Edit:
Having read the witch source code, apparently this should work:
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as C
import Data.Char (isLatin1)
import Data.Text (Text)
import qualified Data.Text as T
f :: Text -> Maybe ByteString
f t = if allCharsAreLatin then Just (C.pack str) else Nothing
  where
    str = T.unpack t
    allCharsAreLatin = all isLatin1 str
The latin1 encoding is pretty damn simple -- codepoint X maps to byte X, whenever that's in range of a byte. So just unpack and repack immediately.
import Control.Monad
import qualified Data.Text as T
import qualified Data.ByteString.Char8 as BS
latin1EncodeText :: T.Text -> Maybe BS.ByteString
latin1EncodeText t = BS.pack (T.unpack t) <$ guard (T.all (<'\256') t)
It's possible to avoid the intermediate String, but you should probably make sure this is your bottleneck before trying for that.
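For what it's worth, here is one way the String-free version could look. This is a sketch only: latin1EncodeDirect is a made-up name, and it assumes Data.ByteString.unfoldrN is an acceptable way to fill the buffer one byte per character. I have not benchmarked it against the unpack/pack version.

```haskell
import qualified Data.ByteString as B
import Data.Char (ord)
import qualified Data.Text as T

-- Hypothetical name: same check-then-encode idea, but without T.unpack.
latin1EncodeDirect :: T.Text -> Maybe B.ByteString
latin1EncodeDirect t
  | T.all (<= '\255') t = Just (fst (B.unfoldrN (T.length t) step t))
  | otherwise = Nothing
  where
    -- Emit one byte per character; the guard above ensures every
    -- code point fits in a byte, so fromIntegral does not truncate.
    step s = case T.uncons s of
      Nothing -> Nothing
      Just (c, rest) -> Just (fromIntegral (ord c), rest)
```

Since decodeLatin1 maps byte X to code point X, feeding its output through this function should give back the original bytes.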

String -> ByteString and reverse

In my Haskell Program I need to work with Strings and ByteStrings:
import Data.ByteString.Lazy as BS (ByteString)
import Data.ByteString.Char8 as C8 (pack)
import Data.Char (chr)
stringToBS :: String -> ByteString
stringToBS str = C8.pack str
bsToString :: BS.ByteString -> String
bsToString bs = map (chr . fromEnum) . BS.unpack $ bs
bsToString works fine, but stringToBS results in the following error when compiling:
Couldn't match expected type ‘ByteString’
with actual type ‘Data.ByteString.Internal.ByteString’
NB: ‘ByteString’ is defined in ‘Data.ByteString.Lazy.Internal’
‘Data.ByteString.Internal.ByteString’
is defined in ‘Data.ByteString.Internal’
In the expression: pack str
In an equation for ‘stringToBS’: stringToBS str = pack str
But I need it to be the ByteString from Data.ByteString.Lazy as BS (ByteString) so that other functions in my code keep working.
Any idea how to solve my problem?
You are working with both strict ByteStrings and lazy ByteStrings which are two different types.
This import:
import Data.ByteString.Lazy as BS (ByteString)
makes ByteString refer to lazy ByteStrings, so the type signature of your stringToBS doesn't match its definition:
stringToBS :: String -> ByteString -- refers to lazy ByteStrings
stringToBS str = C8.pack str -- refers to strict ByteStrings
I think it would be a better idea to use import qualified like this:
import qualified Data.ByteString.Lazy as LBS
import qualified Data.ByteString.Char8 as BS
and use BS.ByteString and LBS.ByteString to refer to strict / lazy ByteStrings.
You can convert between lazy and non-lazy versions using fromStrict, and toStrict (both functions are in the lazy bytestring module).
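Putting that together, a minimal sketch of the two conversions the question asks for could look like this. The function names are made up for illustration, and note that the Char8 functions keep only the low 8 bits of each Char:

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy as LBS
import qualified Data.ByteString.Lazy.Char8 as LC8

-- Pack into a strict ByteString first, then convert it to a lazy one.
stringToLazy :: String -> LBS.ByteString
stringToLazy = LBS.fromStrict . BS.pack

-- Alternative: pack directly with the lazy Char8 module.
stringToLazy' :: String -> LBS.ByteString
stringToLazy' = LC8.pack

-- Back to String, one Char per byte.
lazyToString :: LBS.ByteString -> String
lazyToString = LC8.unpack
```

Either variant gives you the lazy type, so the signature String -> LBS.ByteString now matches the definition.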

Is there an equivalent of the Show typeclass for Data.Text?

Everyone knows Show. But what about:
class ShowText a where
  showText :: a -> Text
I can't find this anywhere. Why?
The problem with creating the Text directly is you still need to know the overall size of the strict Text block before filling it in. You can do better with a Builder scheme and using Data.Text.Lazy. Dan Doel does this in bytestring-show, but I'm not aware of an equivalent for Text.
The library text-show exists now and solves exactly this problem.
Update (2016 Feb 12)
The show function provided in the basic-prelude library also renders straight to text:
show :: Show a => a -> Text
basic-prelude also has fewer dependencies than text-show. If you want to use basic-prelude, save yourself compilation headaches by adding the following to the top of your source file:
{-# LANGUAGE NoImplicitPrelude #-}
For the particular case of Int values, here's the code to convert them into strict Text values without using Strings in an intermediate stage:
import qualified Data.Text as T
import Data.Text.Lazy (toStrict)
import Data.Text.Lazy.Builder (toLazyText)
import Data.Text.Lazy.Builder.Int (decimal)
showIntegral :: Integral a => a -> T.Text
showIntegral = toStrict . toLazyText . decimal
Module Data.Text.Lazy.Builder.RealFloat offers similar functionality for floating point values.
With these we can define our own version of the Show typeclass:
import Data.Text (Text)
import Data.Text.Lazy (toStrict)
import Data.Text.Lazy.Builder (toLazyText)
import Data.Text.Lazy.Builder.Int (decimal)
import Data.Text.Lazy.Builder.RealFloat (realFloat)
class ShowText a where
  showText :: a -> Text
instance ShowText Int where
  showText = toStrict . toLazyText . decimal
instance ShowText Float where
  showText = toStrict . toLazyText . realFloat
Then we can start adding more instances (one for tuples would be useful for example).
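A tuple instance is easier to write if the class method returns a Builder, so that nested values are concatenated once at the end instead of creating a strict Text per component. The following is a sketch of that variant, not the class exactly as defined above; the method name showBuilder is an assumption of this example:

```haskell
import qualified Data.Text as T
import Data.Text.Lazy (toStrict)
import Data.Text.Lazy.Builder (Builder, singleton, toLazyText)
import Data.Text.Lazy.Builder.Int (decimal)
import Data.Text.Lazy.Builder.RealFloat (realFloat)

-- Variant class (hypothetical): instances produce a Builder.
class ShowText a where
  showBuilder :: a -> Builder

instance ShowText Int where
  showBuilder = decimal

instance ShowText Double where
  showBuilder = realFloat

-- Pairs are rendered as "(a,b)" by gluing the component builders together.
instance (ShowText a, ShowText b) => ShowText (a, b) where
  showBuilder (a, b) =
    singleton '(' <> showBuilder a <> singleton ',' <> showBuilder b <> singleton ')'

-- Run the builder once and convert the lazy result to strict Text.
showText :: ShowText a => a -> T.Text
showText = toStrict . toLazyText . showBuilder
```

With this, showText (1 :: Int, 2 :: Int) yields the Text "(1,2)", and nested pairs work through the same instance.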
It's trivial to write your own function piggybacking off Show:
showText :: Show a => a -> Text
showText = pack . show
In both basic-prelude and classy-prelude there is a tshow function now.
tshow :: Show a => a -> Text
If you're using the standard prelude, try the text-show library.

Best way to convert between [Char] and [Word8]?

I'm new to Haskell and I'm trying to use a pure SHA1 implementation in my app (Data.Digest.Pure.SHA) with a JSON library (AttoJSON).
AttoJSON uses Data.ByteString.Char8 bytestrings, SHA uses Data.ByteString.Lazy bytestrings, and some of my string literals in my app are [Char].
Haskell Prime's wiki page on Char types seems to indicate this is something still being worked out in the Haskell language/Prelude.
And this blog post on Unicode support lists a few libraries, but it's a couple of years old.
What is the current best way to convert between these types, and what are some of the tradeoffs?
Thanks!
Here's what I have, without using ByteString's internal functions.
import Data.ByteString as S (ByteString, unpack)
import Data.ByteString.Char8 as C8 (pack)
import Data.Char (chr)
strToBS :: String -> S.ByteString
strToBS = C8.pack
bsToStr :: S.ByteString -> String
bsToStr = map (chr . fromEnum) . S.unpack
S.unpack on a ByteString gives us [Word8]; we then map (chr . fromEnum) over it, which converts each Word8 (or any other Enum value) to a character. By composing all of them together we get the function we want!
For conversion between Char8 and Word8 you should be able to use toEnum/fromEnum conversions, as they represent the same data.
For Char and Strings you might be able to get away with Data.ByteString.Char8.pack/unpack or some sort of combination of map, toEnum and fromEnum, but that throws out data if you're using anything other than ASCII.
For strings which could contain more than just ASCII a popular choice is UTF8 encoding. I like the utf8-string package for this:
http://hackage.haskell.org/packages/archive/utf8-string/0.3.6/doc/html/Codec-Binary-UTF8-String.html
Char8 and normal bytestrings are the same thing, just with different interfaces depending on which module you import. Mainly you want to convert between strict and lazy bytestrings, for which you use toChunks and fromChunks.
To put chars into bytestrings, use pack.
Also note that if your chars include code points which have multibyte representations in UTF-8, then there will be problems.
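If the strings can contain non-ASCII characters, one option that avoids those problems is to go through Text and encode explicitly. A minimal sketch, assuming UTF-8 is the encoding you want; stringToUtf8 and utf8ToString are made-up names:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Hypothetical helper: encode a String as UTF-8 bytes via Text.
stringToUtf8 :: String -> [Word8]
stringToUtf8 = B.unpack . TE.encodeUtf8 . T.pack

-- Hypothetical helper: decode UTF-8 bytes back to a String.
-- Returns Nothing if the bytes are not valid UTF-8.
utf8ToString :: [Word8] -> Maybe String
utf8ToString = either (const Nothing) (Just . T.unpack) . TE.decodeUtf8' . B.pack
```

Unlike Char8.pack, this never silently truncates a character; a code point above 127 simply becomes more than one byte.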
Note : This answers the question in a very specific case (calling functions on hard-coded strings).
This may seem a minor problem because conversion functions exist as detailed in previous answers.
But I wanted a method to reduce administrative code, i.e. the code that you have to write just to get functions working together.
The solution to reducing type-handling code for strings is to use the OverloadedStrings pragma and import the relevant module(s)
{-# LANGUAGE OverloadedStrings #-}
module Dummy where
import Data.ByteString.Lazy.Char8 (ByteString, append)
bslHandling :: ByteString -> ByteString
bslHandling = (append myWord8List)
myWord8List = "I look like a String, but I'm actually a ByteString"
Note : the type of myWord8List is inferred by the compiler.
If you do not use it in bslHandling, then the above declaration will yield a classical [Char] type.
It does not solve the problem of passing from one specific type to another
Hope it helps
Maybe you want to do this:
import Data.ByteString.Internal (unpackBytes)
import Data.ByteString.Char8 (pack)
import GHC.Word (Word8)
strToWord8s :: String -> [Word8]
strToWord8s = unpackBytes . pack
Assuming that Char and Word8 are the same,
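import Data.Word ( Word8 )
import Unsafe.Coerce ( unsafeCoerce )
-- Caution: Char and Word8 are distinct types, so this coercion is not guaranteed to be safe;
-- fromIntegral . fromEnum is a safe way to get the low byte of a Char.
toWord8 :: Char -> Word8
toWord8 = unsafeCoerce
strToWord8 :: String -> [Word8]
strToWord8 = map toWord8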

Reading in a binary file in haskell

How could I write a function with a definition something like...
readBinaryFile :: Filename -> IO Data.ByteString
I've got the functional parts of Haskell down, but the type system and monads still make my head hurt. Can someone write and explain how that function works to me?
import Data.ByteString.Lazy
readFile fp
easy as pie man. Knock off the lazy if you don't want the string to be lazy.
import Data.ByteString.Lazy as BS
import Data.Word
import Data.Bits
fileToWordList :: String -> IO [Word8]
fileToWordList fp = do
  contents <- BS.readFile fp
  return $ unpack contents
readBinaryFile :: Filename -> IO Data.ByteString
This is simply the Data.ByteString.readFile function, which you should never have to write, since it is in the bytestring package.
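To spell out the types, here is a minimal sketch using the strict variant. FilePath is just a synonym for String, and the result lives in IO because reading a file is an effect. The helper firstBytes is made up for illustration:

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Read the whole file strictly as raw bytes; no text decoding is applied.
readBinaryFile :: FilePath -> IO B.ByteString
readBinaryFile = B.readFile

-- Hypothetical helper: the first n bytes of a file as a list of Word8.
-- fmap (<$>) applies the pure function to the result of the IO action.
firstBytes :: Int -> FilePath -> IO [Word8]
firstBytes n path = B.unpack . B.take n <$> readBinaryFile path
```

Inside a do block you would write bytes <- readBinaryFile "some/path" and then work with bytes as an ordinary ByteString value.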

Resources