I have binary files of float data (single-precision 32-bit IEEE) that I would like to work on.
How can I best load them for further use, ideally as an (IOArray Int Float)?
bytesToFloats :: ByteString -> [Float]
bytesToFloatArray :: ByteString -> IOArray Int Float
If you've got bog standard single-precision floats, and you just want to work them over in Haskell, you can always be down and dirty about it:
import Data.ByteString.Internal as BS
import qualified Data.Vector.Storable as V
bytesToFloats :: BS.ByteString -> V.Vector Float
bytesToFloats = V.unsafeCast . aux . BS.toForeignPtr
  where aux (fp,offset,len) = V.unsafeFromForeignPtr fp offset len
I think you might be happier with Data.Vector:
http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Vector_Tutorial#Parsing_Binary_Data
You could also use the cereal library, for example:
import Control.Applicative
import Data.ByteString
import Data.Serialize
floatsToBytes :: [Float] -> ByteString
floatsToBytes = runPut . mapM_ putFloat32le
-- | Parses the input and returns either the result or an error description.
bytesToFloat :: ByteString -> Either String [Float]
bytesToFloat = runGet $ many getFloat32le
If you can convert 4 bytes to a Word32, you can use the function wordToFloat in the data-binary-ieee754 package to convert it to a float. You could then load this into any kind of list-like structure you want to manipulate it.
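As a rough sketch of that last idea using only base, bytestring and array: the code below assumes the file is little-endian, and it uses castWord32ToFloat from GHC.Float (available in base 4.10 and later) in place of wordToFloat. The helper name word32le is made up for illustration. Because an IOArray can only be created inside IO, bytesToFloatArray returns IO (IOArray Int Float) rather than the pure type sketched in the question.

```haskell
import qualified Data.ByteString as BS
import Data.Array.IO (IOArray, newListArray)
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)
import GHC.Float (castWord32ToFloat)

-- Assemble bytes, least significant first, into a Word32 (little-endian).
-- Hypothetical helper; meant to be called on exactly four bytes.
word32le :: BS.ByteString -> Word32
word32le = BS.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Decode consecutive 4-byte groups; an incomplete trailing group is dropped.
bytesToFloats :: BS.ByteString -> [Float]
bytesToFloats bs
  | BS.length bs < 4 = []
  | otherwise =
      let (chunk, rest) = BS.splitAt 4 bs
      in castWord32ToFloat (word32le chunk) : bytesToFloats rest

-- Copy the decoded values into a mutable boxed array indexed from 0.
bytesToFloatArray :: BS.ByteString -> IO (IOArray Int Float)
bytesToFloatArray bs = newListArray (0, length floats - 1) floats
  where floats = bytesToFloats bs
```

For large files the unboxed vector approach above is likely to be much cheaper, since this version goes through an intermediate list.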
The C language provides a very handy way of updating the nth element of an array: array[n] = new_value. My understanding of the Data.ByteString type is that it provides a very similar functionality to a C array of uint8_t - access via index :: ByteString -> Int -> Word8. It appears that the opposite operation - updating a value - is not that easy.
My initial approach was to use the take, drop and singleton functions, concatenated in the following way:
updateValue :: ByteString -> Int -> Word8 -> ByteString
updateValue bs n value = concat [take n bs, singleton value, drop (n+1) bs]
(this is a very naive implementation as it does not handle edge cases)
Coming with a C background, it feels a bit too heavyweight to call 4 functions to update one value. Theoretically, the operation complexity is not that bad:
take is O(1)
drop is O(1)
singleton is O(1)
concat is O(n), but here I am not sure if the n is the total length of the concatenated result or if it's just, in our case, 3.
My second approach was to ask Hoogle for a function with a similar type signature: ByteString -> Int -> a -> ByteString, but nothing appropriate appeared.
Am I missing something very obvious, or is it really that complex to update the value?
I would like to note that I understand that a ByteString is immutable and that changing any of its elements will result in a new ByteString instance.
EDIT:
A possible solution that I found while reading about the Control.Lens library uses lenses to set the value. The following is an excerpt from GHCi with module names omitted:
> import Data.ByteString
> import Control.Lens
> let clock = pack [116, 105, 99, 107]
> clock
"tick"
> let clock2 = clock & ix 1 .~ 111
> clock2
"tock"
One solution is to convert the ByteString to a Storable Vector, then modify that:
import Data.ByteString (ByteString)
import Data.Vector.Storable (modify)
import Data.Vector.Storable.ByteString -- provided by the "spool" package
import Data.Vector.Storable.Mutable (write)
import Data.Word (Word8)
updateAt :: Int -> Word8 -> ByteString -> ByteString
updateAt n x = vectorToByteString . modify inner . byteStringToVector
  where
    inner v = write v n x
See the documentation for spool and vector.
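For comparison, here is a minimal sketch that needs only the bytestring package. It handles the out-of-range case by returning Nothing, which is a design choice made for this example rather than anything from the answers above. It still copies the whole string, so each update is O(n) in the length of the ByteString.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Replace the byte at index n; Nothing if n is out of range.
updateAt :: Int -> Word8 -> BS.ByteString -> Maybe BS.ByteString
updateAt n x bs
  | n < 0 || n >= BS.length bs = Nothing
  | otherwise =
      let (before, after) = BS.splitAt n bs
      in Just (BS.concat [before, BS.singleton x, BS.drop 1 after])
```

If many updates are needed, doing them all in one pass over a mutable vector, as above, avoids copying the string once per update.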
I'm trying to process some point cloud data with Haskell, and it seems to use a LOT of memory. The code I'm using is below; it basically parses the data into a format I can work with. The dataset is 440 MB with 10M rows. When I run it with runhaskell, it uses up all the RAM in a short time (~3-4 GB) and then crashes. If I compile it with -O2 and run it, it goes to 100% CPU and takes a long time to finish (~3 minutes). I should mention that I'm using an i7 CPU with 4 GB RAM and an SSD, so there should be plenty of resources. How can I improve the performance of this?
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (lines, readFile)
import Data.Text.Lazy (Text, splitOn, unpack, lines)
import Data.Text.Lazy.IO (readFile)
import Data.Maybe (fromJust)
import Text.Read (readMaybe)
filename :: FilePath
filename = "sample.txt"
readTextMaybe = readMaybe . unpack
data Classification = Classification
  { id :: Int, description :: Text
  } deriving (Show)
data Point = Point
  { x :: Int, y :: Int, z :: Int, classification :: Classification
  } deriving (Show)
type PointCloud = [Point]
maybeReadPoint :: Text -> Maybe Point
maybeReadPoint text = parse $ splitOn "," text
  where toMaybePoint :: Maybe Int -> Maybe Int -> Maybe Int -> Maybe Int -> Text -> Maybe Point
        toMaybePoint (Just x) (Just y) (Just z) (Just cid) cdesc = Just (Point x y z (Classification cid cdesc))
        toMaybePoint _ _ _ _ _ = Nothing
        parse :: [Text] -> Maybe Point
        parse [x, y, z, cid, cdesc] = toMaybePoint (readTextMaybe x) (readTextMaybe y) (readTextMaybe z) (readTextMaybe cid) cdesc
        parse _ = Nothing
readPointCloud :: Text -> PointCloud
readPointCloud = map (fromJust . maybeReadPoint) . lines
main = (readFile filename) >>= (putStrLn . show . sum . map x . readPointCloud)
The reason this uses all your memory when compiled without optimization is most likely because sum is defined using foldl. Without the strictness analysis that comes with optimization, that will blow up badly. You can try using this function instead:
import Data.List (foldl')
sum' :: Num n => [n] -> n
sum' = foldl' (+) 0
The reason this is slow when compiled with optimization seems likely related to the way you parse the input. A cons will be allocated for each character when reading in the input, and again when breaking the input into lines, and probably yet again when splitting on commas. Using a proper parsing library (any of them) will almost certainly help; using one of the streaming ones like pipes or conduit may or may not be best (I'm not sure).
Another issue, not related to performance: fromJust is rather poor form in general, and is a really bad idea when dealing with user input. You should instead mapM over the list in the Maybe monad, which will produce a Maybe [Point] for you.
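To make the mapM suggestion concrete, here is a small sketch. parseRow is a hypothetical stand-in for maybeReadPoint that parses a single Int per line, so that the example stays self-contained; the names parseRows and sumRows are likewise made up.

```haskell
import Data.List (foldl')
import Text.Read (readMaybe)

-- Hypothetical stand-in for maybeReadPoint: parse one line as an Int.
parseRow :: String -> Maybe Int
parseRow = readMaybe

-- mapM in the Maybe monad: Just all results, or Nothing if any row fails.
parseRows :: [String] -> Maybe [Int]
parseRows = mapM parseRow

-- Strict left fold over the parsed rows, so the running total is
-- evaluated at each step instead of building a chain of thunks.
sumRows :: [String] -> Maybe Int
sumRows = fmap (foldl' (+) 0) . parseRows
```

One trade-off to be aware of: mapM cannot return Just until every row has parsed, so the whole list is held in memory before the sum starts. For a 10M-row file that may matter, and a streaming parser is probably the better fit.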
I need to read a binary format in Haskell. The format is fairly simple: four octets indicating the length of the data, followed by the data. The four octets represent an integer in network byte-order.
How can I convert a ByteString of four bytes to an integer? I want a direct cast (in C, that would be *(int*)&data), not a lexicographical conversion. Also, how would I go about endianness? The serialized integer is in network byte-order, but the machine may use a different byte-order.
I tried Googling, but that only yielded results about lexicographical conversion.
The binary package contains tools to get integer types of various sizes and endianness from ByteStrings.
λ> :set -XOverloadedStrings
λ> import qualified Data.Binary.Get as B
λ> B.runGet B.getWord32be "\STX\SOH\SOH\SOH"
33620225
λ> B.runGet B.getWord32be "\STX\SOH\SOH\SOHtrailing characters are ignored"
33620225
λ> B.runGet B.getWord32be "\STX\SOH\SOH" -- remember to use `catch`:
*** Exception: Data.Binary.Get.runGet at position 0: not enough bytes
CallStack (from HasCallStack):
error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.5.1:Data.Binary.Get
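Instead of catching the exception, the same package offers runGetOrFail, which reports failure as a Left value. The sketch below applies it to the format from the question (a four-byte big-endian length followed by the data); the names getFrame and parseFrame are made up for this example.

```haskell
import qualified Data.Binary.Get as G
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- One frame: a 32-bit big-endian length, then that many payload bytes.
getFrame :: G.Get BS.ByteString
getFrame = do
  len <- G.getWord32be
  G.getByteString (fromIntegral len)

-- Returns the payload, or the parser's error message on short input.
parseFrame :: BL.ByteString -> Either String BS.ByteString
parseFrame input =
  case G.runGetOrFail getFrame input of
    Left (_, _, err)      -> Left err
    Right (_, _, payload) -> Right payload
```

Note that runGet and runGetOrFail take a lazy ByteString. The first component of the Right tuple, discarded here, is the unconsumed remainder, which is useful when several frames follow one another.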
I assume you can use a fold, and then use either foldl or foldr to determine which endianness you want (I forget which is which).
foldl :: (a -> Word8 -> a) -> a -> ByteString -> a
I think this will work for the binary operator:
foo :: Int -> Word8 -> Int
foo prev v = (prev * 256) + fromIntegral v
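To settle which fold gives which byte order, here is a short sketch with both directions spelled out. The function names are made up, and Word32 is used as the result type on the assumption that the input is four bytes long.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Big-endian (network order): the first byte is the most significant,
-- so a left fold shifts the accumulator up before adding each byte.
fromBigEndian :: BS.ByteString -> Word32
fromBigEndian = BS.foldl' (\acc b -> acc * 256 + fromIntegral b) 0

-- Little-endian: the first byte is the least significant, so a right
-- fold processes the last byte first.
fromLittleEndian :: BS.ByteString -> Word32
fromLittleEndian = BS.foldr (\b acc -> acc * 256 + fromIntegral b) 0
```

So foldl matches network byte order. Inputs longer than four bytes silently wrap around in Word32, so the caller should take the first four bytes before folding.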
I'd just extract the first four bytes and merge them into a single 32bit integer using the functions in Data.Bits:
import qualified Data.ByteString.Char8 as B
import Data.Char (chr, ord)
import Data.Bits (shift, (.|.))
import Data.Int (Int32)
readInt :: B.ByteString -> Int32
readInt bs = (byte 0 `shift` 24)
         .|. (byte 1 `shift` 16)
         .|. (byte 2 `shift` 8)
         .|. byte 3
  where byte n = fromIntegral $ ord (bs `B.index` n)
sample = B.pack $ map chr [0x01, 0x02, 0x03, 0x04]
main = print $ readInt sample -- prints 16909060
main :: IO ()
main = do
  let a = ("teeeeeeeeeeeeest","teeeeeeeeeeeest")
  b <- app a
  print b
app expects (ByteString, ByteString), not ([Char], [Char]).
How can I convert it?
You can convert Strings to ByteStrings with Data.ByteString.Char8.pack (or the lazy ByteString version thereof) if your String contains only ASCII values or you are interested only in the low eight bits of each Char:
import qualified Data.ByteString.Char8 as C
main :: IO ()
main = do
  let a = ("teeeeeeeeeeeeest","teeeeeeeeeeeest")
  b <- app $ (\(x,y) -> (C.pack x, C.pack y)) a
  print b
If your String contains non-ASCII Chars and you are interested in more than only the last eight bits, you will need some other encoding, like Data.ByteString.UTF8.fromString.
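As one possible alternative to Data.ByteString.UTF8, the text package can do the UTF-8 encoding. This is a sketch under that assumption, not the approach named above; toUtf8 and fromUtf8 are made-up names.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Encode every Char as UTF-8; non-ASCII characters become several bytes.
toUtf8 :: String -> BS.ByteString
toUtf8 = TE.encodeUtf8 . T.pack

-- Decode UTF-8 bytes. decodeUtf8 throws on invalid input, so this sketch
-- is only safe for bytes that are known to be valid UTF-8.
fromUtf8 :: BS.ByteString -> String
fromUtf8 = T.unpack . TE.decodeUtf8
```

For untrusted input, Data.Text.Encoding also provides decodeUtf8', which returns an Either instead of throwing.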
You could try:
import qualified Data.ByteString.Char8 as B --to prevent name clash with Prelude
B.pack "Hello, world"
A lot of useful functions can be found here:
http://www.haskell.org/ghc/docs/latest/html/libraries/bytestring/Data-ByteString-Char8.html
You could also use Data.ByteString.Lazy.Char8 for lazy bytestrings:
http://hackage.haskell.org/packages/archive/bytestring/latest/doc/html/Data-ByteString-Lazy-Char8.html#v:pack
I wish to compress my application's network traffic.
According to the (latest?) "Haskell Popularity Rankings", zlib seems to be a pretty popular solution. zlib's interface uses ByteStrings:
compress :: ByteString -> ByteString
decompress :: ByteString -> ByteString
I am using regular Strings, which are also the data types used by read, show, and Network.Socket:
sendTo :: Socket -> String -> SockAddr -> IO Int
recvFrom :: Socket -> Int -> IO (String, Int, SockAddr)
So to compress my strings, I need some way to convert a String to a ByteString and vice-versa.
With hoogle's help, I found:
Data.ByteString.Char8 pack :: String -> ByteString
Trying to use it:
Prelude Codec.Compression.Zlib Data.ByteString.Char8> compress (pack "boo")
<interactive>:1:10:
Couldn't match expected type `Data.ByteString.Lazy.Internal.ByteString'
against inferred type `ByteString'
In the first argument of `compress', namely `(pack "boo")'
In the expression: compress (pack "boo")
In the definition of `it': it = compress (pack "boo")
It fails, apparently because there are different types of ByteString?
So basically:
Are there several types of ByteString? What types, and why?
What's "the" way to convert Strings to ByteStrings?
Btw, I found that it does work with Data.ByteString.Lazy.Char8's ByteString, but I'm still intrigued.
There are two kinds of bytestrings: strict (defined in Data.ByteString.Internal) and lazy (defined in Data.ByteString.Lazy.Internal). zlib uses lazy bytestrings, as you've discovered.
The function you're looking for is:
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
lazyToStrictBS :: LBS.ByteString -> BS.ByteString
lazyToStrictBS x = BS.concat $ LBS.toChunks x
I expect it can be written more concisely without the x. (i.e. point-free, but I'm new to Haskell.)
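For what it's worth, the point-free form would look like the sketch below. strictToLazyBS is an extra helper added here for the opposite direction and is not part of the answer above.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS

-- Point-free form of the definition above.
lazyToStrictBS :: LBS.ByteString -> BS.ByteString
lazyToStrictBS = BS.concat . LBS.toChunks

-- The reverse direction: wrap a strict ByteString as a single lazy chunk.
strictToLazyBS :: BS.ByteString -> LBS.ByteString
strictToLazyBS bs = LBS.fromChunks [bs]
```

Recent versions of bytestring also export toStrict and fromStrict from Data.ByteString.Lazy, which do the same jobs.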
A more efficient mechanism might be to switch to a full bytestring-based layer:
network-bytestring for bytestring sockets
lazy bytestrings for compression
binary or bytestring-show to replace Show/Read