Working with ST and Data.UnionFind.ST

Working with ST and Data.UnionFind.ST - haskell

I'm trying to use the UnionFind package because I need this structure for my exercise (clustering nodes, numbered 1 .. 500) (notwithstanding this blog post suggesting that it does not help) and to learn about ST.
My first problem is that the package does not seem to have a function for bulk loading initial data. I therefore created the following (based on the foldST example in http://www.haskell.org/haskellwiki/Monad/ST) to initialise the data structure, but I can't find a convincing way to call initialClustering and there is no easy way to see whether data structure was created.
import Control.Monad
import Control.Monad.ST
import qualified Data.UnionFind.ST as UF
type Start = Int
--Union-find: clusters of Nodes
type Cluster = UF.Point [Start]
main = do
let
-- next line is clearly wrong but can't find another way to run function
iCluster = initialClustering [1..500]
print $ UF.repr [5] -- i
print $ UF.repr (UF.Pt [5]) -- ii
return ()
initialClustering :: [t] -> ()
initialClustering xs = runST $ do
forM_ xs $ \x -> do
UF.fresh [x]
But I am doing something wrong as compilation fails with i)
Couldn't match expected type `UF.Point s0 a0'
with actual type `[t1]'
In the first argument of `UF.repr', namely `[5]'
and ii)
Not in scope: data constructor `UF.Point'
This reflects a more fundamental lack of understanding about ST and the newtype used to create the UnionFind.ST library.

Related

Haskell: How to use a HashMap in a main function

I beg for your help, speeding up the following program:
main = do
jobsToProcess <- fmap read getLine
forM_ [1..jobsToProcess] $ \_ -> do
[r, k] <- fmap (map read . words) getLine :: IO [Int]
putStrLn $ doSomeReallyLongWorkingJob r k
There could(!) be a lot of identical jobs to do, but it's not up to me modifying the inputs, so I tried to use Data.HashMap for backing up already processed jobs. I already optimized the algorithms in the doSomeReallyLongWorkingJob function, but now it seems, it's quite as fast as C.
But unfortunately it seems, I'm not able to implement a simple cache without producing a lot of errors. I need a simple cache of Type HashMap (Int, Int) Int, but everytime I have too much or too few brackets. And IF I manage to define the cache, I'm stuck in putting data into or retrieving data from the cache cause of lots of errors.
I already Googled for some hours but it seems I'm stuck. BTW: The result of the longrunner is an Int as well.

It's pretty simple to make a stateful action that caches operations. First some boilerplate:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State
import Data.Map (Map)
import qualified Data.Map as M
import Debug.Trace
I'll use Data.Map, but of course you can substitute in a hash map or any similar data structure without much trouble. My long-running computation will just add up its arguments. I'll use trace to show when this computation is executed; we'll hope not to see the output of the trace when we enter a duplicate input.
reallyLongRunningComputation :: [Int] -> Int
reallyLongRunningComputation args = traceShow args $ sum args
Now the caching operation will just look up whether we've seen a given input before. If we have, we'll return the precomputed answer; otherwise we'll compute the answer now and store it.
cache :: (MonadState (Map a b) m, Ord a) => (a -> b) -> a -> m b
cache f x = do
mCached <- gets (M.lookup x)
case mCached of
-- depending on your goals, you may wish to force `result` here
Nothing -> modify (M.insert x result) >> return result
Just cached -> return cached
where
result = f x
The main function now just consists of calling cache reallyLongRunningComputation on appropriate inputs.
main = do
iterations <- readLn
flip evalStateT M.empty . replicateM_ iterations
$ liftIO getLine
>>= liftIO . mapM readIO . words
>>= cache reallyLongRunningComputation
>>= liftIO . print
Let's try it in ghci!
> main
5
1 2 3
[1,2,3]
6
4 5
[4,5]
9
1 2
[1,2]
3
1 2
3
1 2 3
6
As you can see by the bracketed outputs, reallyLongRunningComputation was called the first time we entered 1 2 3 and the first time we entered 1 2, but not the second time we entered these inputs.

I hope i'm not too far off base, but first you need a way to carry around the past jobs with you. Easiest would be to use a foldM instead of a forM.
import Control.Monad
import Data.Maybe
main = do
jobsToProcess <- fmap read getLine
foldM doJobAcc acc0 [1..jobsToProcess]
where
acc0 = --initial value of some type of accumulator, i.e. hash map
doJobAcc acc _ = do
[r, k] <- fmap (map read . words) getLine :: IO [Int]
case getFromHash acc (r,k) of
Nothing -> do
i <- doSomeReallyLongWorkingJob r k
return $ insertNew acc (r,k) i
Just i -> do
return acc
Note, I don't actually use the interface for putting and getting the hash table key. It doesn't actually have to be a hash table, Data.Map from containers could work. Or even a list if its going to be a small one.
Another way to carry around the hash table would be to use a State transformer monad.

I am just adding this answer since I feel like the other answers are diverging a bit from the original question, namely using hashtable constructs in Main function (inside IO monad).
Here is a minimal hashtable example using hashtables module. To install the module with cabal, simply use
cabal install hashtables
In this example, we simply put some values in a hashtable and use lookup to print a value retrieved from the table.
import qualified Data.HashTable.IO as H
main :: IO ()
main = do
t <- H.new :: IO (H.CuckooHashTable Int String)
H.insert t 22 "Hello world"
H.insert t 5 "No problem"
msg <- H.lookup t 5
print msg
Notice that we need to use explicit type annotation to specify which implementation of the hashtable we wish to use.

How do I get lazy streaming into the foldl'?

How does one make their own streaming code? I was generating about 1,000,000,000 random pairs of war decks, and I wanted them to be lazy streamed into a foldl', but I got a space leak! Here is the relevant section of code:
main = do
games <- replicateM 1000000000 $ deal <$> sDeck --Would be a trillion, but Int only goes so high
let res = experiment Ace games --experiment is a foldl'
print res --res is tiny
When I run it with -O2, it first starts freezing up my computer, and then the program dies and the computer comes back to life (and Google Chrome then has the resources it needs to yell at me for using up all its resources.)
Note: I tried unsafeInterleaveIO, and it didn't work.
Full code is at: http://lpaste.net/109977

replicateM doesn't do lazy streaming. If you need to stream results from monadic actions, you should use a library such as conduit or pipes.
Your example code could be written to support streaming with conduits like this:
import Data.Conduit
import qualified Data.Conduit.Combinators as C
main = do
let games = C.replicateM 1000000 $ deal <$> sDeck
res <- games $$ C.foldl step Ace
-- where step is the function you want to fold with
print res
The Data.Conduit.Combinators module is from the conduit-combinators package.
As a quick-and-dirty solution you could implement a streaming version of replicateM using lazy IO.
import System.IO.Unsafe
lazyReplicateIO :: Integer -> IO a -> IO [a] --Using Integer so I can make a trillion copies
lazyReplicateIO 0 _ = return []
lazyReplicateIO n act = do
a <- act
rest <- unsafeInterleaveIO $ lazyReplicateIO (n-1) act
return $ a : rest
But I recommend using a proper streaming library.

The equivalent pipes solution is:
import Pipes
import qualified Pipes.Prelude as Pipes
-- Assuming the following types
action :: IO A
acc :: S
step :: S -> A -> S
done :: S -> B
main = do
b <- Pipes.fold step acc done (Pipes.replicateM 1000000 action)
print (b :: B)

Reading a file into an array of strings

I'm pretty new to Haskell, and am trying to simply read a file into a list of strings. I'd like one line of the file per element of the list. But I'm running into a type issue that I don't understand. Here's what I've written for my function:
readAllTheLines hdl = (hGetLine hdl):(readAllTheLines hdl)
That compiles fine. I had thought that the file handle needed to be the same one returned from openFile. I attempted to simply show the list from the above function by doing the following:
displayFile path = show (readAllTheLines (openFile path ReadMode))
But when I try to compile it, I get the following error:
filefun.hs:5:43:
Couldn't match expected type 'Handle' with actual type 'IO Handle'
In the return type of a call of 'openFile'
In the first argument of 'readAllTheLines', namely
'(openFile path ReadMode)'
In the first argument of 'show', namely
'(readAllTheLines (openFile path ReadMode))'
So it seems like openFile returns an IO Handle, but hGetLine needs a plain old Handle. Am I misunderstanding the use of these 2 functions? Are they not intended to be used together? Or is there just a piece I'm missing?

Use readFile and lines for a better alternative.
readLines :: FilePath -> IO [String]
readLines = fmap lines . readFile
Coming back to your solution openFile returns IO Handle so you have to run the action to get the Handle. You also have to check if the Handle is at eof before reading something from that. It is much simpler to just use the above solution.
import System.IO
readAllTheLines :: Handle -> IO [String]
readAllTheLines hndl = do
eof <- hIsEOF hndl
notEnded eof
where notEnded False = do
line <- hGetLine hndl
rest <- readAllTheLines hndl
return (line:rest)
notEnded True = return []
displayFile :: FilePath -> IO [String]
displayFile path = do
hndl <- openFile path ReadMode
readAllTheLines hndl

To add on to Satvik's answer, the example below shows how you can utilize a function to populate an instance of Haskell's STArray typeclass in case you need to perform computations on a truly random access data type.
Code Example
Let's say we have the following problem. We have lines in a text file "test.txt", and we need to load it into an array and then display the line found in the center of that file. This kind of computation is exactly the sort situation where one would want to use a random access array over a sequentially structured list. Granted, in this example, there may not be a huge difference between using a list and an array, but, generally speaking, list accesses will cost O(n) in time whereas array accesses will give you constant time performance.
First, let's create our sample text file:
test.txt
This
is
definitely
a
test.
Given the file above, we can use the following Haskell program (located in the same directory as test.txt) to print out the middle line of text, i.e. the word "definitely."
Main.hs
{-# LANGUAGE BlockArguments #-} -- See footnote 1
import Control.Monad.ST (runST, ST)
import Data.Array.MArray (newArray, readArray, writeArray)
import Data.Array.ST (STArray)
import Data.Foldable (for_)
import Data.Ix (Ix) -- See footnote 2
populateArray :: (Integral i, Ix i) => STArray s i e -> [e] -> ST s () -- See footnote 3
populateArray stArray es = for_ (zip [0..] es) (uncurry (writeArray stArray))
middleWord' :: (Integral i, Ix i) => i -> STArray s i String -> ST s String
middleWord' arrayLength = flip readArray (arrayLength `div` 2)
middleWord :: [String] -> String
middleWord ws = runST do
let len = length ws
array <- newArray (0, len - 1) "" :: ST s (STArray s Int String)
populateArray array ws
middleWord' len array
main :: IO ()
main = do
ws <- words <$> readFile "test.txt"
putStrLn $ middleWord ws
Explanation
Starting with the top of Main.hs, the ST s monad and its associated function runST allow us to extract pure values from imperative-style computations with in-place updates in a referentially transparent manner. The module Data.Array.MArray exports the MArray typeclass as an interface for instantiating mutable array data types and provides helper functions for creating, reading, and writing MArrays. These functions can be used in conjunction with STArrays since there is an instance of MArray defined for STArray.
The populateArray function is the crux of our example. It uses for_ to "applicatively" loop over a list of tuples of indices and list elements to fill the given STArray with those list elements, producing a value of type () in the ST s monad.
The middleWord' helper function uses readArray to produce a String (wrapped in the ST s monad) that corresponds to the middle element of a given STArray of Strings.
The middleWord function instantiates a new STArray, uses populateArray to fill the array with values from a provided list of strings, and calls middleWord' to obtain the middle string in the array. runST is applied to this whole ST s monadic computation to extract the pure String result.
We finally use our middleWord function in main to find the middle word in the text file "test.txt".
Further Reading
Haskell's STArray is not the only way to work with arrays in Haskell. There are in fact Arrays, IOArrays, DiffArrays and even "unboxed" versions of all of these array types that avoid using the indirection of pointers to simply store "raw" values. There is a page on the Haskell Wikibook on this topic that may be worth some study. Before that, however, looking at the Wikibook page on mutable objects may give you some insight as to why the ST s monad allows us to safely compute pure values from functions that use imperative/destructive operations.
Footnotes
1 The BlockArguments language extension is what allows us to pass a do block directly to a function without any parentheses or use of the function application operator $.
2 As suggested by the Hackage documentation, Ix is a typeclass mainly meant to be used to specify types for indexing arrays.
3 The use of the Integral and Ix type constraints may be a bit of overkill, but it's used to make our type signatures as general as possible.

Trouble with Template Haskell stage restriction

I just start learning Template Haskell, and stuck on simple problem with splicing.
In one module I've implemented function tupleN which replies N-th element of the tuple:
tupleN :: Lift a => a -> Int -> Q Exp
tupleN a n = do
(TupE as) <- lift a
return $ as !! n
In the Main module I have:
main :: IO ()
main = do
let tup = (1::Int,'a',"hello")
putStrLn $ show $(tupleN $tup 1)
This seems to be working, but it wouldn't. Compiler prints error:
GHC stage restriction: `tup'
is used in a top-level splice or annotation,
and must be imported, not defined locally
In the expression: tup
In the first argument of `tupleN', namely `$tup'
In the expression: tupleN ($tup) 1
If I put tuple description right into spliced expression, code become working:
main :: IO ()
main = do
putStrLn $ show $(tupleN (1::Int,'a',"hello") 1)
What I missing with the first variant?

You've tried to use tup as a splice, but tup is just an ordinary value. You don't want to prefix it with a $.
Moreover, as the compile error states, since Template Haskell runs during the compilation process, GHC really needs to know what it's doing before it has finished compiling the current module. That means your splice expression can't depend on tup, because that's still being compiled. Inside splices, you can only use literals, imported values, and the special 'name and ''TypeName forms (which you can think of as a sort of literal, I suppose). You can get at some of the information from this compilation by using e.g. reify, but even that can only give you data that's available at compile time – if you want a function you can pass user input, or data constructed from user input, to, that's just impossible.
In short, you can't do exactly what you want to do using Template Haskell. You could, however, define a splice that expands to a function to get the ith element of a tuple of size sz:
import Control.Monad (unless)
import Language.Haskell.TH
tupleN :: Int -> Int -> Q Exp
tupleN sz i = do
unless (i < sz) . reportError $ "tupleN: index " ++ show i
++ " out of bounds for " ++ show sz ++ "-tuple"
lamE
[tupP (replicate i wildP
++ [varP (mkName "x")]
++ replicate (sz - i - 1) wildP)]
(varE (mkName "x"))

Dice Game in Haskell

I'm trying to spew out randomly generated dice for every roll that the user plays. The user has 3 rolls per turn and he gets to play 5 turns (I haven't implemented this part yet and I would appreciate suggestions).
I'm also wondering how I can display the colors randomly. I have the list of tuples in place, but I reckon I need some function that uses random and that list to match those colors. I'm struggling as to how.
module Main where
import System.IO
import System.Random
import Data.List
diceColor = [("Black",1),("Green",2),("Purple",3),("Red",4),("White",5),("Yellow",6)]
{-
randomList :: (RandomGen g) -> Int -> g -> [Integer]
random 0 _ = []
randomList n generator = r : randomList (n-1) newGenerator
where (r, newGenerator) = randomR (1, 6) generator
-}
rand :: Int -> [Int] -> IO ()
rand n rlst = do
num <- randomRIO (1::Int, 6)
if n == 0
then doSomething rlst
else rand (n-1) (num:rlst)
doSomething x = putStrLn (show (sort x))
main :: IO ()
main = do
--hSetBuffering stdin LineBuffering
putStrLn "roll, keep, score?"
cmd <- getLine
doYahtzee cmd
--rand (read cmd) []
doYahtzee :: String -> IO ()
doYahtzee cmd = do
if cmd == "roll"
then rand 5 []
else do print "You won"

There's really a lot of errors sprinkled throughout this code, which suggests to me that you tried to build the whole thing at once. This is a recipe for disaster; you should be building very small things and testing them often in ghci.
Lecture aside, you might find the following facts interesting (in order of the associated errors in your code):
List is deprecated; you should use Data.List instead.
No let is needed for top-level definitions.
Variable names must begin with a lower case letter.
Class prerequisites are separated from a type by =>.
The top-level module block should mainly have definitions; you should associate every where clause (especially the one near randomList) with a definition by either indenting it enough not to be a new line in the module block or keeping it on the same line as the definition you want it to be associated with.
do introduces a block; those things in the block should be indented equally and more than their context.
doYahtzee is declared and used as if it has three arguments, but seems to be defined as if it only has one.
The read function is used to parse a String. Unless you know what it does, using read to parse a String from another String is probably not what you want to do -- especially on user input.
putStrLn only takes one argument, not four, and that argument has to be a String. However, making a guess at what you wanted here, you might like the (!!) and print functions.
dieRoll doesn't seem to be defined anywhere.
It's possible that there are other errors, as well. Stylistically, I recommend that you check out replicateM, randomRs, and forever. You can use hoogle to search for their names and read more about them; in the future, you can also use it to search for functions you wish existed by their type.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Working with ST and Data.UnionFind.ST - haskell

Related

Haskell: How to use a HashMap in a main function

How do I get lazy streaming into the foldl'?

Reading a file into an array of strings

Trouble with Template Haskell stage restriction

Dice Game in Haskell

Categories

Resources