Rsync's rolling checksum in haskell


I'm starting to learn both Haskell and remote delta compression. My first step is to implement rsync's version of the rolling checksum in Haskell. Is a chunk the same thing as X(i) in those formulas? If so, I'm confused.
Haskell can turn a ByteString into an array of bytes.
How do I turn an array of Word8s into one big chunk, something like a Word32768? I mean, what if X(i) is a list of Word8s?
After that, how do I do arithmetic on unsigned integers that are 4 KB in size?
Also, my current implementation just slides the window by 1 byte (one Word8) at a time.

It's very easy to turn a ByteString into [Word8] using unpack, which should be sufficient for performing this algorithm (although not necessarily the most efficient).
Why do you need to turn Word8s into a Word32768? Why do you need a 2^15 bit number? That would be very difficult to represent, but you could use a list or array of Word8s, which is very easy to represent in memory and is equivalent.
For performing arithmetic, functions like map, zipWith, folds, and scans are very useful. For example, performing the first step of the algorithm:
import qualified Data.ByteString as BS
-- a(k, l): sum of the bytes X_k .. X_l (inclusive), modulo M = 2^16
a :: Int -> Int -> BS.ByteString -> Int
a k l x
  = (`mod` m)
  $ sum
  $ map fromIntegral
  $ take (l - k + 1)
  $ drop k
  $ BS.unpack x
  where m = 2 ^ 16
Implementing the function b is only marginally more difficult: you just have to figure out the sequence of l - i + 1 for i = k to l, then apply zipWith (*) with that sequence to the output of map fromIntegral, before the sum. After that, it's quite trivial to implement s, although it could certainly be done more efficiently if you factor out the common steps of take (l - k + 1) $ drop k $ BS.unpack x.
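As a rough illustration of the a, b and s formulas discussed above, here is a minimal sketch on plain lists of Word8. The names checksumA, checksumB, combine and rollOne are made up for this example and are not from rsync or any library; the modulus 2^16 is taken from the snippet above.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- Modulus M = 2^16 (assumed, as in the snippet above).
modulus :: Word32
modulus = 65536

-- a: sum of the bytes in the window, mod M.
checksumA :: [Word8] -> Word32
checksumA window = sum (map fromIntegral window) `mod` modulus

-- b: each byte weighted by (l - i + 1), so the first byte of the window
-- gets the largest weight and the last byte gets weight 1.
checksumB :: [Word8] -> Word32
checksumB window = sum (zipWith (*) weights (map fromIntegral window)) `mod` modulus
  where
    n = fromIntegral (length window) :: Word32
    weights = [n, n - 1 .. 1]

-- s = a + 2^16 * b
combine :: Word32 -> Word32 -> Word32
combine a b = a .|. (b `shiftL` 16)

-- Slide a window of n bytes forward by one: drop `old`, append `new`.
-- Word32 arithmetic wraps modulo 2^32, which is a multiple of M, so the
-- subtraction is safe even when it wraps around.
rollOne :: Word32 -> (Word32, Word32) -> Word8 -> Word8 -> (Word32, Word32)
rollOne n (a, b) old new = (a', b')
  where
    o  = fromIntegral old
    a' = (a - o + fromIntegral new) `mod` modulus
    b' = (b - n * o + a') `mod` modulus
```

For the window "abc" (bytes 97, 98, 99), checksumA gives 294 and checksumB gives 586; rolling to "bcd" with rollOne 3 (294, 586) 97 100 gives (297, 592), the same as recomputing both sums from scratch.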

A chunk is not the same as X(i) in the formula in the link provided; chunks mostly come up in data deduplication. A rolling checksum can also be used to create chunks, identify chunk boundaries, etc.
My current implementation of rsync's rolling checksum is as follows. Next I'm going to implement a cyclic polynomial rolling checksum, then read some books on data deduplication.
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy.Char8 as B8
import Data.Word
import Data.Bits
import Data.Int
type CheckSumPartial = Word16
type CheckSumA = CheckSumPartial
type CheckSumB = CheckSumPartial
type WindowSize = Int64
type CheckSum = Word32
type Byte = Word8
main :: IO ()
main = do
  let str = B8.pack "abcdef"
  let s1 = roll 3 str
  let s2 = withoutRoll 3 str
  print s1
  print s2
  return ()
-- Checksums of every window, updating a and b incrementally
roll :: WindowSize -> B.ByteString -> [CheckSum]
roll w str =
  let
    (a,b,s) = newABS w str
    h = B.head str
    t = B.tail str
  in if fromIntegral (B.length t) < w
       then [s]
       else s : rollNext w t h a b
-- Same result, but recomputing every window from scratch
withoutRoll :: WindowSize -> B.ByteString -> [CheckSum]
withoutRoll w str =
  let
    (_,_,s) = newABS w str
    t = B.tail str
  in if fromIntegral (B.length t) < w
       then [s]
       else s : withoutRoll w t
newA :: WindowSize -> B.ByteString -> CheckSumA
newA w str =
  let block = B.take w str
  in B.foldr aSum (0::CheckSumA) block
  where
    aSum x acc = acc + (fromIntegral x :: CheckSumA)
newB :: WindowSize -> B.ByteString -> CheckSumB
newB w str =
  let block = B.take w str
  in fst $ B.foldl bSum (0::CheckSumB, w) block
  where
    -- foldl walks the block front to back, so the first byte gets weight w
    -- and the last byte gets weight 1, which is what rollB assumes
    bSum (acc,l) x = (acc + fromIntegral l * (fromIntegral x :: CheckSumB), l-1)
rollA :: CheckSumA -> Byte -> Byte -> CheckSumA
rollA prevA prevHead curLast = prevA - fromIntegral prevHead + fromIntegral curLast
rollB :: CheckSumA -> Byte -> WindowSize -> CheckSumB -> CheckSumB
rollB curA prevHead w prevB = prevB - fromIntegral w * fromIntegral prevHead + curA
calculateS :: CheckSumA -> CheckSumB -> CheckSum
calculateS a b = (fromIntegral a :: Word32) .|. shift (fromIntegral b :: Word32) 16
rollNext :: WindowSize -> B.ByteString -> Byte -> CheckSumA -> CheckSumB -> [CheckSum]
rollNext w str prevHead prevA prevB =
  let
    curBlock = B.take (fromIntegral w) str
    curLast = B.last curBlock
    h = B.head str
    t = B.tail str
    a = rollA prevA prevHead curLast
    b = rollB a prevHead w prevB
    s = calculateS a b
  in if fromIntegral (B.length t) < w
       then [s]
       else s : rollNext w t h a b
newABS :: WindowSize -> B.ByteString -> (CheckSumA, CheckSumB, CheckSum)
newABS w str =
  let a = newA w str
      b = newB w str
      s = calculateS a b
  in (a,b,s)

Related

Optimize this bit of conduit code for speed

I am implementing an approximate counting algorithm where we:
Maintain t counters {X_1, ..., X_t} using log(log n) bits for each
Initialize all counters to 0
When an item arrives, increase each X_i by 1 independently with probability (1/2)^{X_i}
When the stream is over, output Z = (1/t) (2^{X_1} - 1 + ... + 2^{X_t} - 1)
Repeat the above steps m times independently and in parallel and output the median.
Here it is in haskell, using the conduit library.
import Data.Random
import Data.Conduit
import Data.List
import Data.Ord (comparing)
import qualified Data.Conduit.List as Cl
import Control.Monad.Identity
type Prob = Double
type Counter = Float
type Delta = Double
type Eps = Double
-- * Run Morris alpha on stream inputs `xs`
morrisA :: [a] -> IO Counter
morrisA xs = flip runRVar StdRandom $ Cl.sourceList xs $$ alpha
-- * Run Morris beta on stream inputs `xs` for `t` independent trials and average
morrisB :: Int -> [a] -> IO Counter
morrisB t = fmap rmean . replicateM t . morrisA
-- * final morris algorithm
-- * Run on stream inputs `xs` for t independent trials for `t = 1/eps`,
-- * and `m` times in parallel, for `m = 1/(e^2 * d)`
-- * and take the median
morris :: Eps -> Delta -> [a] -> IO Counter
morris e d = fmap rmedian . replicateM m . morrisB t
where (t,m) = (round $ 1/(e^2*d), round $ 1/d)
-- * Utils * --
-- * A step in morris Algorithm alpha
alpha :: Sink a RVar Counter
alpha = (\x -> 2^(round x) - 1) <$> Cl.foldM (\x _ -> incr x) 0
-- * Increment a counter `x` with probability 1/2^x
incr :: Counter -> RVar Counter
incr x = do
h <- (\q -> q <= (0.5^(round x) :: Prob)) <$> uniform 0 1
return $ if h then (seq () succ x) else seq () x
rmean, rmedian :: (Floating a, Ord a, RealFrac a) => [a] -> Float
rmean = fromIntegral . round . mean
rmedian = fromIntegral . round . median
-- |Numerically stable mean
mean :: Floating a => [a] -> a
mean x = fst $ foldl' (\(!m, !n) x -> (m+(x-m)/(n+1),n+1)) (0,0) x
-- |Median
median :: (Floating a, Ord a) => [a] -> a
median x | odd n = head $ drop (n `div` 2) x'
| even n = mean $ take 2 $ drop i x'
where i = (length x' `div` 2) - 1
x' = sort x
n = length x
The problem is that the runtime of morris is linear in both the length of the stream and the number of iterations t*m. For example, morrisA takes about 100μs for 100 items. If we want to be 95% confident of 5% error, we have to run morrisA n = 160000 times.
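To make the 160000 figure concrete, here is a small sketch that reproduces the arithmetic from the where clause of morris. The names trialCounts and totalRuns are invented for this example; the formulas are copied from the code above, not derived independently.

```haskell
-- (t, m) exactly as computed in the where clause of `morris` above.
trialCounts :: Double -> Double -> (Int, Int)
trialCounts e d = (round (1 / (e ^ (2 :: Int) * d)), round (1 / d))

-- Total number of times morrisA ends up being run: t * m.
totalRuns :: Double -> Double -> Int
totalRuns e d = t * m
  where (t, m) = trialCounts e d
```

With e = 0.05 and d = 0.05, trialCounts gives (8000, 20), so totalRuns 0.05 0.05 is 160000.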
Could someone suggest how to optimize this code? maybe:
implement morris in something other than conduit
some faster way to repeat morrisA other than replicateM.

SHA-1 in haskell producing wrong hashes

I wrote a program to perform SHA-1 in haskell, and while it does produce hashes, they do not match with the ones produced by other SHA-1 programs
Example: cat
hashes to: b5be86bc8bccfc24b01b093228ebb96fc92fa804 but is supposed to hash to 9d989e8d27dc9e0ec3389fc855f142c3d40f0c50
My code is:
(old code omitted)
I have no idea what is wrong. Can someone tell me where I made a mistake?
Edit:
I fixed the stuff that was pointed out, however it is still not working. It works correctly up until the inner loop.
I cleaned up the code so the functions for the inner loop are available as f1, f2 and f3
cat now interestingly hashes to ebe6c9fa1afa0ef5a0ca80bab251fd41cc29127e.
Code:
import Data.Word
import Data.Bits
import Data.Char (ord, intToDigit)
import Data.Binary (encode, decode)
import Numeric (showHex, showIntAtBase)
import System.IO (stdin)
import Data.Sequence ((<|), (|>))
import qualified Data.Sequence as S
import qualified Data.ByteString.Lazy as B
type Quintuple32 = (Word32, Word32, Word32, Word32, Word32)
addQuintuple (a, b, c, d, e) (f, g, h, i, j) =
  (a + f, b + g, c + h, d + i, e + j)
shower :: Quintuple32 -> String
shower (a, b, c, d, e) = concatMap (`showHex` "") [a, b, c, d, e]
hash :: Int -> S.Seq Word32 -> Quintuple32 -> Quintuple32
hash i w h@(a, b, c, d, e)
  | i < 20 = hash (i + 1) w (newhash (f1 h + k1))
  | i < 40 = hash (i + 1) w (newhash (f2 h + k2))
  | i < 60 = hash (i + 1) w (newhash (f3 h + k3))
  | i < 80 = hash (i + 1) w (newhash (f2 h + k4))
  | otherwise = h
  where (k1, k2, k3, k4) = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)
        newhash a' = (rotate a 5 + a' + e + (w `S.index` i), a, rotate b 30, c, d)
f1 :: Quintuple32 -> Word32
f1 (_, b, c, _, _) = (b .&. c) .|. (complement b .&. c)
f2 :: Quintuple32 -> Word32
f2 (_, b, c, d, _) = b `xor` c `xor` d
f3 :: Quintuple32 -> Word32
f3 (_, b, c, d, _) = (b .&. c) .|. (b .&. d) .|. (c .&. d)
starting :: Quintuple32
starting = (0x67452301
, 0xEFCDAB89
, 0x98BADCFE
, 0x10325476
, 0xC3D2E1F0)
hasher :: Quintuple32 -> S.Seq Word32 -> Quintuple32
hasher acc x = addQuintuple acc (hash 0 (extend x) acc)
process :: B.ByteString -> Quintuple32
process = foldl hasher starting . chunks . pad
extend :: S.Seq Word32 -> S.Seq Word32
extend = extend' 16
extend' :: Int -> S.Seq Word32 -> S.Seq Word32
extend' 80 a = a
extend' i a = extend' (i + 1) (a |> xored)
where xored = rotate ((a `S.index` (i - 3)) `xor`
(a `S.index` (i - 8)) `xor`
(a `S.index` (i - 14)) `xor`
(a `S.index` (i - 16))) 1
toBytes :: String -> B.ByteString
toBytes = B.pack . map (fromIntegral . ord)
splitEvery n xs
| B.null xs = S.empty
| otherwise = B.take n xs <| splitEvery n (B.drop n xs)
chunks :: B.ByteString -> [S.Seq Word32]
chunks xs
| B.null xs = []
| otherwise = x : chunks (B.drop 64 xs)
where x = fmap decode (splitEvery 4 (B.take 64 xs))
pad :: B.ByteString -> B.ByteString
pad xs = B.append (add0 $ add1 xs) length64
where length64 = encode (fromIntegral (8 * B.length xs) :: Word64)
add1 :: B.ByteString -> B.ByteString
add1 = flip B.append (B.singleton 128)
add0 :: B.ByteString -> B.ByteString
add0 xs
| modulo /= 448 = add0 $ B.append xs (B.singleton 0)
| otherwise = xs
where modulo = (B.length xs * 8) `rem` 512
Also, a small question: is something like (a, b) = (8, 9) an acceptable thing to do to set multiple variables?

Oh, another one of these!
Two errors jump out at me immediately:
pad :: B.ByteString -> B.ByteString
pad xs = B.append (add0 $ add1 xs) length64
where length64 = encode (fromIntegral (B.length xs) :: Word64)
Notice the length you append is supposed to be the bit length, not the byte length.
add1 :: B.ByteString -> B.ByteString
add1 = flip B.append (B.singleton 255)
Notice 255 /= 0b10000000, and the pad is supposed to be the latter.
In general you debug these by 1) going over the spec again and again, and 2) comparing to another implementation, such as Adam Wick's SHA package, checking for equality at as fine-grained a level as possible.
EDIT: There are two more bugs, basically transcription errors. Look around a bit and shout if you're still stuck.
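For reference, the padding rule described above (a single 0x80 byte, zero bytes up to 56 mod 64, then the bit length as a big-endian 64-bit integer) can be sketched on a plain list of bytes. This is only an illustration of the two fixes; padMessage is a made-up name and the list representation is an assumption, not the asker's ByteString code.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word64)

-- Pad a message so that its length becomes a multiple of 64 bytes.
padMessage :: [Word8] -> [Word8]
padMessage msg = msg ++ [0x80] ++ replicate zeros 0 ++ lengthBytes
  where
    len = length msg
    -- after the 0x80 byte, fill with zeros until the length is 56 mod 64
    zeros = (55 - len) `mod` 64
    -- the length field counts bits, not bytes
    bitLen = 8 * fromIntegral len :: Word64
    -- big-endian encoding of the 64-bit bit length
    lengthBytes = [fromIntegral (bitLen `shiftR` s) | s <- [56, 48 .. 0]]
```

For the three bytes of "cat", the result is 64 bytes long, byte 3 is 0x80, and the final byte is 24 (the length in bits).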

Generating sequence from Markov chain in Haskell

I would like to generate random sequences from a Markov chain. To generate the Markov chain I use the following code.
module Main where
import qualified Control.Monad.Random as R
import qualified Data.List as L
import qualified Data.Map as M
type TransitionMap = M.Map (String, String) Int
type MarkovChain = M.Map String [(String, Int)]
addTransition :: (String, String) -> TransitionMap -> TransitionMap
addTransition k = M.insertWith (+) k 1
fromTransitionMap :: TransitionMap -> MarkovChain
fromTransitionMap m =
M.fromList [(k, frequencies k) | k <- ks]
where ks = L.nub $ map fst $ M.keys m
frequencies a = map reduce $ filter (outboundFor a) $ M.toList m
outboundFor a k = fst (fst k) == a
reduce e = (snd (fst e), snd e)
After collecting the statistics and generating a Markov Chain object I would like to generate random sequences. I could imagine this method could look something like that (pseudo-code)
generateSequence mc s
| s == "." = s
| otherwise = s ++ " " ++ generateSequence mc s'
where s' = drawRandomlyFrom $ R.fromList $ mc ! s
I would greatly appreciate if someone could explain to me, how I should implement this function.
Edit
If anyone's interested it wasn't as difficult as I thought.
module Main where
import qualified Control.Monad.Random as R
import qualified Data.List as L
import qualified Data.Map as M
type TransitionMap = M.Map (String, String) Rational
type MarkovChain = M.Map String [(String, Rational)]
addTransition :: TransitionMap -> (String, String) -> TransitionMap
addTransition m k = M.insertWith (+) k 1 m
fromTransitionMap :: TransitionMap -> MarkovChain
fromTransitionMap m =
M.fromList [(k, frequencies k) | k <- ks]
where ks = L.nub $ map fst $ M.keys m
frequencies a = map reduce $ filter (outboundFor a) $ M.toList m
outboundFor a k = fst (fst k) == a
reduce e = (snd (fst e), snd e)
generateSequence :: (R.MonadRandom m) => MarkovChain -> String -> m String
generateSequence m s
| not (null s) && last s == '.' = return s
| otherwise = do
s' <- R.fromList $ m M.! s
ss <- generateSequence m s'
return $ if null s then ss else s ++ " " ++ ss
fromSample :: [String] -> MarkovChain
fromSample ss = fromTransitionMap $ foldl addTransition M.empty $ concatMap pairs ss
where pairs s = let ws = words s in zipWith (,) ("":ws) ws
sample :: [String]
sample = [ "I am a monster."
, "I am a rock star."
, "I want to go to Hawaii."
, "I want to eat a hamburger."
, "I have a really big headache."
, "Haskell is a fun language."
, "Go eat a big hamburger."
, "Markov chains are fun to use."
]
main = do
s <- generateSequence (fromSample sample) ""
print s
The only tiny annoyance is the fake "" starting node.

Not sure if this is what you're looking for. This compiles though:
generateSequence :: (R.MonadRandom m) => MarkovChain -> String -> m String
generateSequence mc s
  | s == "." = return s
  | otherwise = do
      s' <- R.fromList $ rationalize (mc M.! s)
      s'' <- generateSequence mc s'
      return $ s ++ " " ++ s''
rationalize :: [(String,Int)] -> [(String,Rational)]
rationalize = map (\(x,i) -> (x, toRational i))

All random number generation needs to happen in either the Random monad or the IO monad. For your purpose, it's probably easiest to understand how to do that in the IO monad, using evalRandIO. In the example below, getRandom is the function we want to use. Now getRandom operates in the Random monad, but we can use evalRandIO to lift it to the IO monad, like this:
main :: IO ()
main = do
  x <- evalRandIO getRandom :: IO Double
  putStrLn $ "Your random number is " ++ show x
Note: The reason we have to add the type signature to the line that binds x is because in this particular example there are no other hints to tell the compiler what type we want x to be. However, if we used x in some way that makes it clear that we want it to be a Double (e.g., multiplying by another Double), then the type signature wouldn't be necessary.
Using your MarkovChain type, for a current state you can trivially get the available transitions in the form [(nextState,probability)]. (I'm using the word "probability" loosely, it doesn't need to be a true probability; any numeric weight is fine). This is what fromList in Control.Monad.Random is designed for. Again, it operates in the Random monad, but we can use evalRandIO to lift it to the IO monad. Suppose transitions is your list of transitions, having the type [(nextState,probability)]. Then, in the IO monad you can call:
nextState <- evalRandIO $ fromList transitions
You might instead want to create your own function that operates in the Random monad, like this:
-- assumes the Rational-weighted MarkovChain and `import qualified Data.Map as M` from the question
getRandomTransition :: RandomGen g => MarkovChain -> String -> Rand g String
getRandomTransition chain currState = do
  let transitions = chain M.! currState
  fromList transitions
Then you can call this function in the IO monad using evalRandIO, e.g.
nextState <- evalRandIO $ getRandomTransition chain currState
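To show what a weighted choice over [(nextState, weight)] amounts to, here is a pure sketch that picks an element given a point r in the range [0, total weight). The function pickWeighted is hypothetical and is not how Control.Monad.Random implements fromList; it only illustrates the idea, with the random draw left to the caller.

```haskell
-- Walk the list, subtracting each weight from r until r falls inside
-- the current element's share. Returns Nothing if r is out of range.
pickWeighted :: [(a, Rational)] -> Rational -> Maybe a
pickWeighted [] _ = Nothing
pickWeighted ((x, w) : rest) r
  | r < w     = Just x
  | otherwise = pickWeighted rest (r - w)
```

With transitions [("am", 2), ("want", 2), ("have", 1)], a point of 0 or 1 selects "am", 2 or 3 selects "want", and 4 selects "have", so each state is chosen in proportion to its weight.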

No stream fusion with unsafeUpdate_ in unboxed vector

Is it possible to maintain stream fusion when processing a vector if unsafeUpdate_ function is used to update some elements of a vector? The answer seems to be no in the test I did. For the code below, temporary vector is generated in upd function, as confirmed in the core:
module Main where
import Data.Vector.Unboxed as U
upd :: Vector Int -> Vector Int
upd v = U.unsafeUpdate_ v (U.fromList [0]) (U.fromList [2])
sum :: Vector Int -> Int
sum = U.sum . upd
main = print $ Main.sum $ U.fromList [1..3]
In the core, $wupd function is used in sum - as seen below, it generates new bytearray:
$wupd :: Vector Int -> Vector Int
$wupd =
\ (w :: Vector Int) ->
case w `cast` ... of _ { Vector ipv ipv1 ipv2 ->
case main11 `cast` ... of _ { Vector ipv3 ipv4 ipv5 ->
case main7 `cast` ... of _ { Vector ipv6 ipv7 ipv8 ->
runSTRep
(\ (@ s) (s :: State# s) ->
case >=# ipv1 0 of _ {
False -> case main6 ipv1 of wild { };
True ->
case newByteArray# (*# ipv1 8) (s `cast` ...)
of _ { (# ipv9, ipv10 #) ->
case (copyByteArray# ipv2 (*# ipv 8) ipv10 0 (*# ipv1 8) ipv9)
`cast` ...
There is a nice, tight loop in the core for sum function but just before that loop, there is a call to $wupd function, and so, a temporary generation.
Is there a way to avoid temporary generation in the example here? The way I think about it, updating a vector in index i is the case of parsing a stream but only acting on the stream in index i (skipping the rest), and replacing the element there with another element. So, updating a vector in an arbitrary location shouldn't break stream fusion, right?

I can't be 100% sure, because with vector it's turtles all the way down (you never really reach the actual implementation, there's always another indirection), but as far as I understand it, the update variants force a new temporary through cloning:
unsafeUpdate_ :: (Vector v a, Vector v Int) => v a -> v Int -> v a -> v a
{-# INLINE unsafeUpdate_ #-}
unsafeUpdate_ v is w
= unsafeUpdate_stream v (Stream.zipWith (,) (stream is) (stream w))
unsafeUpdate_stream :: Vector v a => v a -> Stream (Int,a) -> v a
{-# INLINE unsafeUpdate_stream #-}
unsafeUpdate_stream = modifyWithStream M.unsafeUpdate
and modifyWithStream calls clone (and new),
modifyWithStream :: Vector v a
=> (forall s. Mutable v s a -> Stream b -> ST s ())
-> v a -> Stream b -> v a
{-# INLINE modifyWithStream #-}
modifyWithStream p v s = new (New.modifyWithStream p (clone v) s)
new :: Vector v a => New v a -> v a
{-# INLINE_STREAM new #-}
new m = m `seq` runST (unsafeFreeze =<< New.run m)
-- | Convert a vector to an initialiser which, when run, produces a copy of
-- the vector.
clone :: Vector v a => v a -> New v a
{-# INLINE_STREAM clone #-}
clone v = v `seq` New.create (
do
mv <- M.new (length v)
unsafeCopy mv v
return mv)
and I see no way that vector would get rid of that unsafeCopy again.

If you need to change one or very few elements, there are nice solutions in the repa and yarr libraries. They preserve fusion (though I'm not sure about repa) and are idiomatic Haskell.
Repa, using fromFunction:
upd arr = fromFunction (extent arr) ix
  where ix (Z :. 0) = 2
        ix i = index arr i
Yarr, using Delayed:
upd arr = Delayed (extent arr) (touchArray arr) (force arr) ix
  where ix 0 = return 2
        ix i = index arr i
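The same "compute each element from its index" idea can also be tried within vector itself using imap, which goes through the stream machinery rather than cloning the input. This is a sketch for the single-index case from the question; updAt0 and sumUpdated are made-up names, and whether the intermediate vector really disappears should be confirmed by inspecting the Core for your GHC and vector versions.

```haskell
import qualified Data.Vector.Unboxed as U

-- Replace the element at index 0 with 2, leaving the rest untouched.
updAt0 :: U.Vector Int -> U.Vector Int
updAt0 = U.imap (\i x -> if i == 0 then 2 else x)

-- Sum over the updated vector, as in the question's `sum`.
sumUpdated :: U.Vector Int -> Int
sumUpdated = U.sum . updAt0
```

For the question's input, sumUpdated (U.fromList [1..3]) evaluates to 7. The trade-off is an index comparison on every element, which is only attractive when very few positions change.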

How to print integer literals in binary or hex in haskell?

How to print integer literals in binary or hex in haskell?
printBinary 5 => "0101"
printHex 5 => "05"
Which libraries/functions allow this?
I came across the Numeric module and its showIntAtBase function but have been unable to use it correctly.
> :t showIntAtBase
showIntAtBase :: (Integral a) => a -> (Int -> Char) -> a -> String -> String

The Numeric module includes several functions for showing an Integral type at various bases, including showIntAtBase. Here are some examples of use:
import Numeric (showHex, showIntAtBase)
import Data.Char (intToDigit)
putStrLn $ showHex 12 "" -- prints "c"
putStrLn $ showIntAtBase 2 intToDigit 12 "" -- prints "1100"

You may also use printf of the printf package to format your output with c style format descriptors:
import Text.Printf
main = do
  let i = 65535 :: Int
  putStrLn $ printf "The value of %d in hex is: 0x%08x" i i
  putStrLn $ printf "The html color code would be: #%06X" i
  putStrLn $ printf "The value of %d in binary is: %b" i i
Output:
The value of 65535 in hex is: 0x0000ffff
The html color code would be: #00FFFF
The value of 65535 in binary is: 1111111111111111

If you import the Numeric and Data.Char modules, you can do this:
showIntAtBase 2 intToDigit 10 "" => "1010"
showIntAtBase 16 intToDigit 1023 "" => "3ff"
This will work for any bases up to 16, since this is all that intToDigit works for. The reason for the extra empty string argument in the examples above is that showIntAtBase returns a function of type ShowS, which will concatenate the display representation onto an existing string.
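To get the zero-padded output asked for in the question, showIntAtBase can be combined with a small padding helper. In this sketch showPadded is a made-up name, and the widths (4 binary digits, 2 hex digits) are assumptions read off the question's examples.

```haskell
import Data.Char (intToDigit)
import Numeric (showIntAtBase)

-- Show a non-negative number in the given base (2 to 16), left-padded
-- with zeros to at least `width` digits.
showPadded :: Int -> Int -> Int -> String
showPadded width base n = replicate (width - length digits) '0' ++ digits
  where digits = showIntAtBase base intToDigit n ""

printBinary, printHex :: Int -> String
printBinary = showPadded 4 2
printHex = showPadded 2 16
```

Here printBinary 5 gives "0101" and printHex 5 gives "05". Numbers that need more digits than the width are shown unpadded, and negative inputs make showIntAtBase raise an error.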

You can convert integer to binary with something like the following:
decToBin x = reverse $ decToBin' x
  where
    decToBin' 0 = []
    decToBin' y = let (a,b) = quotRem y 2 in [b] ++ decToBin' a
usage in GHCi:
Prelude> decToBin 10
[1,0,1,0]

Hex can be written with 0x and binary with 0b prefix e.g.:
> 0xff
255
>:set -XBinaryLiterals
> 0b11
3
Note that binary requires the BinaryLiterals extension.

You could define your own recursive functions like:
import Data.Char (digitToInt)
import Data.Char (intToDigit)
-- generic function from base to decimal
toNum :: [Char] -> Int -> (Char -> Int) -> Int
toNum [] base map = 0
toNum s base map = base * toNum (init(s)) base map + map(last(s))
-- generic function from decimal to base k
toKBaseNum :: Int -> Int -> (Int -> Char) -> [Char]
toKBaseNum x base map | x < base = [map x]
| otherwise = toKBaseNum (x `div` base) base map ++ [map(x `mod` base)]
-- mapping function for hex to decimal
mapHexToDec :: Char -> Int
mapHexToDec x | x == 'A' = 10
| x == 'B' = 11
| x == 'C' = 12
| x == 'D' = 13
| x == 'E' = 14
| x == 'F' = 15
| otherwise = digitToInt(x) :: Int
-- map decimal to hex
mapDecToHex :: Int -> Char
mapDecToHex x | x < 10 = intToDigit(x)
| x == 10 = 'A'
| x == 11 = 'B'
| x == 12 = 'C'
| x == 13 = 'D'
| x == 14 = 'E'
| x == 15 = 'F'
-- hex to decimal
hexToDec :: String -> Int
hexToDec [] = 0
hexToDec s = toNum s 16 mapHexToDec
-- binary to decimal
binToDec :: String -> Int
binToDec [] = 0
binToDec s = toNum s 2 (\x -> if x == '0' then 0 else 1)
-- decimal to binary
decToBin :: Int -> String
decToBin x = toKBaseNum x 2 (\x -> if x == 1 then '1' else '0')
-- decimal to hex
decToHex :: Int -> String
decToHex x = toKBaseNum x 16 mapDecToHex
Explanation:
As you can see, the toNum function converts a k-based value to decimal, using the given base and a mapping function. The mapping function will map special characters to a decimal value (for ex. A=10, B=11, ... in hex). For binary mapping you could also use a lambda expression like you see in binToDec.
The toKBaseNum function does the opposite, converting a decimal to a k-based value. Again we need a mapping function, this time in the other direction: from a decimal to the corresponding special character of the k-based value.
As a test you can type:
binToDec(decToBin 7) = 7
Suppose you want to convert from decimal to octal:
-- decimal to octal
decToOct :: Int -> String
decToOct x = toKBaseNum x 8 (\x -> intToDigit(x))
Again, I use just a lambda expression, because the mapping is simple: just int to digit.
Hope that helps! Good programming!

Silly solution for one-liner fans:
(\d -> let fix f = let {x = f x} in x in fmap (\n -> "0123456789abcdef" !! n) (fix (\f l n -> if n == 0 then l :: [Int] else let (q, r) = quotRem n 16 in f (r:l) q) [] d)) 247
The nucleus of the one-liner is:
quotRem 247 16
For the sake of clarity, you can, alternatively, put the following in a file:
#!/usr/bin/env stack
{- stack script --resolver lts-12.1 -}
-- file: DecToHex.hs
module Main where
import System.Environment
fix :: (a -> a) -> a
fix f = let {x = f x} in x
ff :: ([Int] -> Int -> [Int]) -> [Int] -> Int -> [Int]
ff = \f l n ->
if n == 0
then l
else
let (q, r) = quotRem n 16
in f (r:l) q
decToHex :: Int -> String
decToHex d =
fmap (\n -> "0123456789abcdef" !! n)
(fix ff [] d)
main :: IO ()
main =
getArgs >>=
putStrLn . show . decToHex . read . head
And execute the script with:
stack runghc -- DecToHex.hs 247
I used fixed-point operator just so it is an example with fixed-point operator; also because it allowed me to construct the one-liner strictly bottom-up. (Note: bottom-up development is to be discouraged.)
References: stack script syntax, Command line arguments, fix operator definition.

Here is a simple, efficient, base-agnostic, Unlicenced implementation:
convertToBase :: Word8 -> Integer -> String
convertToBase b n
  | n < 0 = '-' : convertToBase b (-n)
  | n < fromIntegral b = [(['0'..'9'] ++ ['A' .. 'Z']) !! fromIntegral n]
  | otherwise = let (d, m) = n `divMod` fromIntegral b in convertToBase b d ++ convertToBase b m
You have to import Data.Word to use Word8 (which limits the values as much as reasonably possible), and you will often need fromIntegral (if only automatic type conversions were a thing...).

Using the FiniteBits class:
import Data.Bits (FiniteBits, finiteBitSize, testBit, shiftR)
showBits :: FiniteBits a => a -> String
showBits bits =
  go (finiteBitSize bits - 1) where
    go shift =
      if shift >= 0
        then
          let bit = if testBit (shiftR bits shift) 0 then '1' else '0'
          in bit : go (pred shift)
        else
          ""
Examples:
showBits (4 :: Word8) => "00000100"
showBits (50 :: Int16) => "0000000000110010"
showBits (-127 :: Int32) => "11111111111111111111111110000001"

When working with text, I recommend using the text-show package which includes:
showbBin :: (Integral a, TextShow a) => a -> Builder
showbHex :: (Integral a, TextShow a) => a -> Builder
showbOct :: (Integral a, TextShow a) => a -> Builder
showbIntAtBase :: (Integral a, TextShow a) => a -> (Int -> Char) -> a -> Builder
For example, converting an Integer to Text in binary:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import TextShow (toText)
import TextShow.Data.Integral (showbBin)
toBinary :: Integer -> Text
toBinary n = toText (showbBin n)
> toBinary 6 == "110"
Perhaps you want to add a Text prefix. Builder allows you to efficiently construct Text; it is a monoid.
toBinaryWithPrefix :: Text -> Integer -> Text
toBinaryWithPrefix prefix n = toText $ fromText prefix <> showbBin n
For more information see the TextShow and TextShow.Data.Integral modules available on Hackage.
