Haskell - Rename duplicate values in a list of lists

I have a list of lists of strings, e.g.:
[["h","e","l","l","o"], ["g","o","o","d"], ["w","o","o","r","l","d"]]
I want to rename values that already occurred in an earlier sublist: every such repetition should be replaced by a new randomly generated value that does not already exist anywhere in the list, and the same original value must get the same replacement throughout its sublist. A possible result might be:
[["h","e","l","l","o"], ["g","t","t","d"], ["w","s","s","r","z","f"]]
I already have a function, randomStr, that randomly generates a string of length one:
randomStr :: String
randomStr = take 1 $ randomRs ('a','z') $ unsafePerformIO newStdGen

Presuming you want to do what I've outlined in my comment below, it's best to break this problem up into several smaller parts to tackle one at a time. I would also recommend leveraging common modules in base and containers, since it will make the code much simpler and faster. In particular, the modules Data.Map and Data.Sequence are very useful in this case. Data.Map I would say is the most useful here, as it has some very useful functions that would otherwise be difficult to write by hand. Data.Sequence is used for efficiency purposes at the end, as you'll see.
First, imports:
import Data.List (nub)
import Data.Map (Map)
import Data.Sequence (Seq, (|>), (<|))
import qualified Data.Map as Map
import qualified Data.Sequence as Seq
import Data.Foldable (toList)
import System.Random (randomRIO)
import Control.Monad (forM, foldM)
import Control.Applicative ((<$>))
Data.Foldable.toList is needed since Data.Sequence does not have a toList function, but Foldable provides one that will work. On to the code. We first want to be able to take a list of Strings and find all the unique elements in it. For this, we can use nub:
lettersIn :: [String] -> [String]
lettersIn = nub
I like providing my own names for functions like this; it can make the code more readable.
Now that we can get all the unique characters, we want to be able to assign each a random character:
makeRandomLetterMap :: [String] -> IO (Map String String)
makeRandomLetterMap letters
    = fmap Map.fromList
    $ forM (lettersIn letters) $ \l -> do
        newL <- randomRIO ('a', 'z')
        return (l, [newL])
Here we get a new random character and essentially zip it up with our list of letters, then we fmap (<$>) Map.fromList over that result. Next, we need to be able to use this map to replace letters in a list. If a letter isn't found in the Map, we just want the letter back. Luckily, Data.Map has the findWithDefault function which is perfect for this situation:
replaceLetter :: Map String String -> String -> String
replaceLetter m letter = Map.findWithDefault letter letter m
replaceAllLetters :: Map String String -> [String] -> [String]
replaceAllLetters m letters = map (replaceLetter m) letters
Since we want to be able to update this map with new letters that have been encountered in each sublist, overwriting previously encountered letters as needed, we can use Data.Map.union. Since union favors its first argument, we need to flip it:
updateLetterMap :: Map String String -> [String] -> IO (Map String String)
updateLetterMap m letters = flip Map.union m <$> makeRandomLetterMap letters
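To see why the flip matters, here is a small sketch with two hand-written maps (the names older, newer, keepOld and preferNew are made up for illustration). Map.union keeps the value from its first argument when a key occurs in both maps, so putting the freshly generated map first is what lets new entries overwrite stale ones:

```haskell
import qualified Data.Map as Map

-- Two illustrative maps that clash on the key "o".
older, newer :: Map.Map String String
older = Map.fromList [("o", "t"), ("l", "q")]
newer = Map.fromList [("o", "s"), ("d", "f")]

-- Map.union is left-biased: on a key clash the first argument wins.
keepOld :: Map.Map String String
keepOld = Map.union older newer

-- flip puts the newer map in first position, so its entries win instead.
preferNew :: Map.Map String String
preferNew = flip Map.union older newer
```

Here Map.lookup "o" keepOld gives Just "t", while Map.lookup "o" preferNew gives Just "s"; keys that occur in only one of the maps survive in both results.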
Now we have all the tools needed to tackle the problem at hand:
replaceDuplicatesRandomly :: [[String]] -> IO [[String]]
replaceDuplicatesRandomly [] = return []
For the base case, just return an empty list.
replaceDuplicatesRandomly (first:rest) = do
    m <- makeRandomLetterMap first
For a non-empty list, build the initial map from the first sublist.
    (_, seqTail) <- foldM go (m, Seq.empty) rest
Fold over the rest, starting with an empty sequence and the first map, and extract the resulting sequence.
    return $ toList $ first <| seqTail
Then convert the sequence to a list after prepending the first sublist (it doesn't get changed by this function). The go function is pretty simple too:
  where
    go (m, acc) letters = do
        let newLetters = replaceAllLetters m letters
        newM <- updateLetterMap m letters
        return (newM, acc |> newLetters)
It takes the current map m and an accumulation of all the sublists processed so far acc along with the current sublist letters, replaces the letters in said sublist, builds a new map for the next iteration (newM), and then returns the new map along with the accumulation of everything processed, i.e. acc |> newLetters. All together, the function is
replaceDuplicatesRandomly :: [[String]] -> IO [[String]]
replaceDuplicatesRandomly [] = return []
replaceDuplicatesRandomly (first:rest) = do
    m <- makeRandomLetterMap first
    (_, seqTail) <- foldM go (m, Seq.empty) rest
    return $ toList $ first <| seqTail
  where
    go (m, acc) letters = do
        let newLetters = replaceAllLetters m letters
        newM <- updateLetterMap m letters
        return (newM, acc |> newLetters)

It's always better to keep impure and pure computations separated.
You cannot use letters that are already in the list as replacements, so you need a string of fresh letters:
fresh :: [String] -> String
fresh xss = ['a'..'z'] \\ foldr union [] xss
This function replaces one letter with another in a string:
replaceOne :: Char -> Char -> String -> String
replaceOne y y' = map (\x -> if x == y then y' else x)
This function replaces one letter each time with a new letter for every string in a list of strings:
replaceOnes :: Char -> String -> [String] -> (String, [String])
replaceOnes y = mapAccumL (\(y':ys') xs ->
    if y `elem` xs
        then (ys', replaceOne y y' xs)
        else (y':ys', xs))
For example
replaceOnes 'o' "ijklmn" ["hello", "good", "world"]
returns
("lmn",["helli","gjjd","wkrld"])
A slightly trickier one:
replaceMany :: String -> String -> [String] -> (String, [String])
replaceMany ys' ys xss = runState (foldM (\ys' y -> state $ replaceOnes y ys') ys' ys) xss
This function replaces each letter from ys each time with a new letter from ys' for every string in xss.
For example
replaceMany "mnpqstuvxyz" "lod" ["hello", "good", "world"]
returns
("vxyz",["hemmp","gqqt","wsrnu"])
i.e.
'l's in "hello" are replaced by the first letter in "mnpqstuvxyz"
'l' in "world" is replaced by the second letter in "mnpqstuvxyz"
'o' in "hello" is replaced by the third letter in "mnpqstuvxyz"
'o's in "good" are replaced by the fourth letter in "mnpqstuvxyz"
...
'd' in "world" is replaced by the seventh letter in "mnpqstuvxyz"
This function goes through a list of strings and, for each string in the rest of the list, replaces every letter that occurs in the head with a fresh letter taken from ys'.
replaceDuplicatesBy :: String -> [String] -> [String]
replaceDuplicatesBy ys' [] = []
replaceDuplicatesBy ys' (ys:xss) = ys : uncurry replaceDuplicatesBy (replaceMany ys' ys xss)
I.e. it does what you want, but without any randomness — just picks fresh letters from a list.
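As a way to try the pure part without Control.Monad.State, the sketch below restates it using only base. Two things differ from the code above and are assumptions of this sketch: replaceMany threads the supply of fresh letters with a plain foldl instead of foldM over State, and replaceOnes leaves a string untouched when the supply has run out instead of failing on the pattern match. The name demo is made up for illustration.

```haskell
import Data.List (mapAccumL, union, (\\))

-- Letters of the alphabet that occur nowhere in the input.
fresh :: [String] -> String
fresh xss = ['a'..'z'] \\ foldr union [] xss

replaceOne :: Char -> Char -> String -> String
replaceOne y y' = map (\x -> if x == y then y' else x)

-- Replace y in every string that contains it, consuming one fresh
-- letter per affected string. Strings are left alone if the supply is empty.
replaceOnes :: Char -> String -> [String] -> (String, [String])
replaceOnes y = mapAccumL step
  where
    step supply xs
      | y `elem` xs, (y' : rest) <- supply = (rest, replaceOne y y' xs)
      | otherwise = (supply, xs)

-- Same result as the State-based replaceMany, written as a left fold
-- over the letters to replace.
replaceMany :: String -> String -> [String] -> (String, [String])
replaceMany supply ys xss =
  foldl (\(s, acc) y -> replaceOnes y s acc) (supply, xss) ys

replaceDuplicatesBy :: String -> [String] -> [String]
replaceDuplicatesBy _ [] = []
replaceDuplicatesBy supply (ys : xss) =
  ys : uncurry replaceDuplicatesBy (replaceMany supply ys xss)

-- Deterministic run: fresh letters are handed out in alphabetical order.
demo :: [String]
demo = replaceDuplicatesBy (fresh input) input
  where input = ["hello", "good", "world"]
```

With the unshuffled supply, demo evaluates to ["hello","gbbd","wcraf"], and replaceMany "mnpqstuvxyz" "lod" ["hello", "good", "world"] reproduces the ("vxyz",["hemmp","gqqt","wsrnu"]) result shown earlier.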
All described functions are pure. Here is an impure one:
replaceDuplicates :: [String] -> IO [String]
replaceDuplicates xss = flip replaceDuplicatesBy xss <$> shuffle (fresh xss)
I.e. generate a random permutation of the string of fresh letters and pass it to replaceDuplicatesBy.
You can take the shuffle function from https://www.haskell.org/haskellwiki/Random_shuffle
And the final test:
main = replicateM_ 3 $ replaceDuplicates ["hello", "good", "world"] >>= print
prints
["hello","gxxd","wcrzy"]
["hello","gyyd","wnrmf"]
["hello","gmmd","wvrtx"]
The whole code (without shuffle): http://lpaste.net/115763

I think this is bound to raise more questions than it answers.
import Control.Monad.State
import Data.List
import System.Random
mapAccumLM _ s [] = return (s, [])
mapAccumLM f s (x:xs) = do
  (s', y) <- f s x
  (s'', ys) <- mapAccumLM f s' xs
  return (s'', y:ys)
pick excluded for w = do
  a <- pick' excluded
  putStrLn $ "replacement for " ++ show for ++ " in " ++ show w ++ " excluded: " ++ show excluded ++ " = " ++ show a
  return a
-- | XXX -- can loop indefinitely
pick' excluded = do
  a <- randomRIO ('a','z')
  if elem a excluded
    then pick' excluded
    else return a
transform w = do
  globallySeen <- get
  let go locallySeen ch =
        case lookup ch locallySeen of
          Nothing -> if elem ch globallySeen
            then do
              let excluded = globallySeen ++ (map snd locallySeen)
              a <- lift $ pick excluded ch w
              return ( (ch, a):locallySeen, a)
            else return ( (ch,ch):locallySeen, ch )
          Just ch' -> return (locallySeen, ch')
  (locallySeen, w') <- mapAccumLM go [] w
  let globallySeen' = w' ++ globallySeen
  put globallySeen'
  return w'
doit ws = runStateT (mapM transform ws) []
main = do
  ws' <- doit [ "hello", "good", "world" ]
  print ws'

Related

Get all string splits

Say I have a string:
"abc7de7f77ghij7"
I want to split it by a substring, 7 in this case, and get all the left-right splits:
[ ("abc", "de7f77ghij7")
, ("abc7de", "f77ghij7")
, ("abc7de7f", "7ghij7")
, ("abc7de7f7", "ghij7")
, ("abc7de7f77ghij", "")
]
Sample implementation:
{-# LANGUAGE OverloadedStrings #-}
module StrSplits where
import qualified Data.Text as T
splits :: T.Text -> T.Text -> [(T.Text, T.Text)]
splits d s =
  let run a l r =
        case T.breakOn d r of
          (x, "") -> reverse a
          (x, y) ->
            let
              rn = T.drop (T.length d) y
              an = (T.append l x, rn) : a
              ln = l `T.append` x `T.append` d
            in run an ln rn
  in run [] "" s
main = do
  print $ splits "7" "abc7de7f77ghij7"
  print $ splits "8" "abc7de7f77ghij7"
with expected result:
[("abc","de7f77ghij7"),("abc7de","f77ghij7"),("abc7de7f","7ghij7"),("abc7de7f7","ghij7"),("abc7de7f77ghij","")]
[]
I'm not too happy about the manual recursion and let/case/let nesting. If my feeling that it doesn't look too good is right, is there a better way to write it?
Is there a generalized approach to solving these kinds of problems in Haskell similar to how recursion can be replaced with fmap and folds?
How about this?
import Data.Bifunctor (bimap)
splits' :: T.Text -> T.Text -> [(T.Text, T.Text)]
splits' delimiter string = mkSplit <$> [1..numSplits]
  where
    sections = T.splitOn delimiter string
    numSplits = length sections - 1
    mkSplit n = bimap (T.intercalate delimiter) (T.intercalate delimiter) $ splitAt n sections
I like to believe there's a way that doesn't involve indices, but you get the general idea. First split the string by the delimiter. Then split that list of strings in two everywhere possible, rejoining each side with the delimiter.
Not the most efficient, though. You can probably do something similar with indices from Data.Text.Internal.Search if you want it to be fast. In this case, you wouldn't need to do the additional rejoining. I didn't experiment with it since I didn't understand what the function was returning.
Here's an indexless one.
import Data.List (isPrefixOf, unfoldr)
type ListZipper a = ([a],[a])
moveRight :: ListZipper a -> Maybe (ListZipper a)
moveRight (_, []) = Nothing
moveRight (ls, r:rs) = Just (r:ls, rs)
-- As Data.List.iterate, but generates a finite list ended by Nothing.
unfoldr' :: (a -> Maybe a) -> a -> [a]
unfoldr' f = unfoldr (\x -> (,) x <$> f x)
-- Get all ways to split a list with nonempty suffix
-- Prefix is reversed for efficiency
-- [1,2,3] -> [([],[1,2,3]), ([1],[2,3]), ([2,1],[3])]
splits :: [a] -> [([a],[a])]
splits xs = unfoldr' moveRight ([], xs)
-- This is the function you want.
splitsOn :: (Eq a) => [a] -> [a] -> [([a],[a])]
splitsOn sub xs = [(reverse l, drop (length sub) r) | (l, r) <- splits xs, sub `isPrefixOf` r]
Basically, traverse a list zipper to come up with a list of candidates for the split. Keep only those that are indeed splits on the desired item, then (un)reverse the prefix portion of each passing candidate.
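The same candidate-then-filter idea can also be sketched with inits and tails from Data.List instead of a hand-rolled zipper. The name splitsOnZip is made up for this sketch, and it assumes a non-empty delimiter; the prefixes come out in the right order, so no reversing is needed.

```haskell
import Data.List (inits, isPrefixOf, tails)

-- Pair every prefix with the matching suffix, keep the positions where the
-- suffix starts with the delimiter, and drop the delimiter from the suffix.
splitsOnZip :: Eq a => [a] -> [a] -> [([a], [a])]
splitsOnZip sub xs =
  [ (pre, drop (length sub) post)
  | (pre, post) <- zip (inits xs) (tails xs)
  , sub `isPrefixOf` post
  ]
```

For example, splitsOnZip "7" "abc7de7f77ghij7" yields the five splits listed in the question, and splitsOnZip "8" "abc7de7f77ghij7" yields the empty list.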

Haskell: Convert String to [(String,Double)]

I parse an XML file and get a String like this:
"resourceA,3-resourceB,1-,...,resourceN,x"
I want to map that String into a list of tuples (String,Double), like this:
[(resourceA,3),(resourceB,1),...,(resourceN,x)]
How is it possible to do this? I've looked into the map function and also the split one. I am able to split the string by "-", but nothing beyond that.
This is the code i have so far:
split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s
It is just a function to split my string into a list of Strings, but then I don't know how to continue.
What I want to do now is loop over the new list that I created with the split function and create a tuple for each element. I have tried the map function, but I can't even get it to compile.
So in Haskell you don't really mutate any value; instead you create a new list of pairs from the string you've described, so the solution would look something like the following:
import Data.List.Split
xmlList = splitOn "-" "resourceA,3-resourceB,4-resourceC,6"
commaSplit :: String -> [String]
commaSplit = splitOn ","
xmlPair :: [String] -> [(String, Double)] -- might be more efficient to use Text instead of String
xmlPair [] = [] -- an empty input yields no pairs instead of a pattern-match failure
xmlPair [x] = [(\x' -> ((head x') :: String, (read (last x')) :: Double )) (commaSplit x)]
xmlPair (x:xs) = xmlPair [x] ++ xmlPair xs
main :: IO ()
main = mapM_ (\(a,b) -> putStrLn (show a++" = "++ show b)) (xmlPair $ xmlList)
This is my quick and dirty way of showing things but I'm sure someone can always add a more detailed answer.
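For input that might be malformed, a variation that avoids head, last and read could look like the sketch below. It uses only base: splitOnChar is a hand-written stand-in for splitOn from the split package, and parsePair and parseResources are names made up for this sketch. It assumes each item has exactly one comma, and since "-" is the item separator it cannot represent negative amounts.

```haskell
import Text.Read (readMaybe)

-- Split a string on a single separator character.
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar d rest

-- "resourceA,3" becomes Just ("resourceA", 3.0); anything else is Nothing.
parsePair :: String -> Maybe (String, Double)
parsePair item = case splitOnChar ',' item of
  [name, amount] -> (,) name <$> readMaybe amount
  _              -> Nothing

-- Parse the whole "-"-separated string, failing if any item is malformed.
parseResources :: String -> Maybe [(String, Double)]
parseResources = traverse parsePair . splitOnChar '-'
```

parseResources "resourceA,3-resourceB,1" gives Just [("resourceA",3.0),("resourceB",1.0)], while an item such as "resourceA,x" makes the whole result Nothing rather than raising an exception.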

Is there any way to not use explicit recursion in this algorithm?

So the problem I'm working on is matching a pattern to a list, like this:
match "abba" "redbluebluered" -> True or
match "abba" "redblueblue" -> False, etc. I wrote up an algorithm that works, and I think it's reasonably understandable, but I'm not sure if there's a better way to do this without explicit recursion.
import Data.Hashable (Hashable)
import Data.HashMap.Strict as M
import Data.List (inits, stripPrefix, tails)
match :: (Eq a, Eq k, Hashable k) => [k] -> [a] -> HashMap k [a] -> Bool
match [] [] _ = True
match [] _ _ = False
match _ [] _ = False
match (p:ps) s m =
  case M.lookup p m of
    Just v ->
      case stripPrefix v s of
        Just post -> match ps post m
        Nothing -> False
    Nothing -> any f . tail . splits $ s
  where f (pre, post) = match ps post $ M.insert p pre m
        splits xs = zip (inits xs) (tails xs)
I would call this like match "abba" "redbluebluered" empty. The actual algorithm is simple. The map contains the patterns already matched; at the end it is [a -> "red", b -> "blue"]. If the next pattern is one we've seen before, just try matching its substring and recurse if it matches; otherwise fail and return False.
If the next pattern is new, just try mapping the new pattern to every single prefix in the string and recursing down.
This is very similar to a parsing problem, so let's take a hint from the parser monad:
match should return a list of all of the possible continuations of the parse
if matching fails it should return the empty list
the current set of assignments will be state that has to be carried through the computation
To see where we are headed, let's suppose we have this magic monad. Attempting to match "abba" against a string will look like:
matchAbba = do
  var 'a'
  var 'b'
  var 'b'
  var 'a'
  return () -- or whatever you want to return
test = runMatch matchAbba "redbluebluered"
It turns out this monad is the State monad over the List monad. The List monad provides for backtracking and the State monad carries the current assignments and input around.
Here's the code:
import Data.List
import Control.Monad
import Control.Monad.State
import Control.Monad.Trans
import Data.Maybe
import qualified Data.Map as M
import Data.Monoid
type Assigns = M.Map Char String
splits xs = tail $ zip (inits xs) (tails xs)
var p = do
  (assigns,input) <- get
  guard $ (not . null) input
  case M.lookup p assigns of
    Nothing -> do
      (a,b) <- lift $ splits input
      let assigns' = M.insert p a assigns
      put (assigns', b)
      return a
    Just t -> do
      guard $ isPrefixOf t input
      let inp' = drop (length t) input
      put (assigns, inp')
      return t
matchAbba :: StateT (Assigns, String) [] Assigns
matchAbba = do
  var 'a'
  var 'b'
  var 'b'
  var 'a'
  (assigns,_) <- get
  return assigns
test1 = evalStateT matchAbba (M.empty, "xyyx")
test2 = evalStateT matchAbba (M.empty, "xyy")
test3 = evalStateT matchAbba (M.empty, "redbluebluered")
matches :: String -> String -> [Assigns]
matches pattern input = evalStateT monad (M.empty,input)
  where
    monad :: StateT (Assigns, String) [] Assigns
    monad = do
      sequence $ map var pattern
      (assigns,_) <- get
      return assigns
Try, for instance:
matches "ab" "xyz"
-- [fromList [('a',"x"),('b',"y")],fromList [('a',"x"),('b',"yz")],fromList [('a',"xy"),('b',"z")]]
Another thing to point out is that code which transforms a string like "abba" into the monadic value do var 'a'; var 'b'; var 'b'; var 'a' is simply:
sequence $ map var "abba"
Update: As @Sassa NF points out, to match the end of input you'll want to define:
matchEnd :: StateT (Assigns,String) [] ()
matchEnd = do
  (assigns,input) <- get
  guard $ null input
and then insert it into the monad:
monad = do
  sequence $ map var pattern
  matchEnd
  (assigns,_) <- get
  return assigns
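For comparison, the backtracking that StateT over the list monad provides can be spelled out by hand. The sketch below is not the code above but a restatement under that assumption: each step maps one (assignments, remaining input) state to the list of all states reachable by matching one pattern variable, and the end-of-input check is folded into the final filter. The names step and matchesAll are made up for this sketch.

```haskell
import Data.List (inits, isPrefixOf, tails)
import qualified Data.Map as M

type Assigns = M.Map Char String

-- All ways to match one pattern variable against the front of the input.
step :: Char -> (Assigns, String) -> [(Assigns, String)]
step p (assigns, input) =
  case M.lookup p assigns of
    -- Already bound: the bound text must be a prefix of the input.
    Just t  -> [ (assigns, drop (length t) input) | t `isPrefixOf` input ]
    -- Unbound: try every non-empty prefix as the binding.
    Nothing -> [ (M.insert p pre assigns, post)
               | (pre, post) <- drop 1 (zip (inits input) (tails input)) ]

-- Run every variable in turn and keep the states that consumed all input.
matchesAll :: String -> String -> [Assigns]
matchesAll pat input =
  [ assigns | (assigns, "") <- foldl advance [(M.empty, input)] pat ]
  where
    advance states p = concatMap (step p) states
```

matchesAll "abba" "redbluebluered" returns the single assignment a -> "red", b -> "blue", and matchesAll "abba" "redblueblue" returns the empty list.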
I would like to modify your signature and return more than Bool. Your solution then becomes:
match :: (Eq a, Ord k) => [k] -> [a] -> Maybe (M.Map k [a])
match = m M.empty where
  m kvs (k:ks) vs@(v:_) =
    let splits xs = zip (inits xs) (tails xs)
        f (pre, post) t =
          case m (M.insert k pre kvs) ks post of
            Nothing -> t
            x -> x
    in case M.lookup k kvs of
         Nothing -> foldr f Nothing . tail . splits $ vs
         Just p -> stripPrefix p vs >>= m kvs ks
  m kvs [] [] = Just kvs
  m _ _ _ = Nothing
Using the known trick of folding to produce a function we can obtain:
match ks vs = foldr f end ks M.empty vs where
  end m [] = Just m
  end _ _ = Nothing
  splits xs = zip (inits xs) (tails xs)
  f k g kvs vs =
    let h (pre, post) = (g (M.insert k pre kvs) post <|>)
    in case M.lookup k kvs of
         Nothing -> foldr h Nothing $ tail $ splits vs
         Just p -> stripPrefix p vs >>= g kvs
Here match is the function folding all keys to produce a function taking a Map and a string of a, which returns a Map of matches of the keys to substrings. The condition for matching the string of a in its entirety is tracked by the last function applied by foldr - end. If end is supplied with a map and an empty string of a, then the match is successful.
The list of keys is folded using function f, which is given four arguments: the current key, the function g matching the remainder of the list of keys (i.e. either f folded, or end), the map of keys already matched, and the remainder of the string of a. If the key is already found in the map, then just strip the prefix and feed the map and the remainder to g. Otherwise, try feeding g the modified map and the remaining input for each possible split. The combinations are tried lazily, continuing only as long as g produces Nothing in h.
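The "fold to produce a function" trick may be easier to see on a smaller problem. In this sketch (numberFrom is a made-up example, not part of the answer) foldr does not build a list directly; it builds a function that still expects a counter, much as the fold above builds a function that still expects the map and the remaining input:

```haskell
-- Pair each element with a running index, threading the index through
-- the functions that foldr builds.
numberFrom :: Int -> [a] -> [(Int, a)]
numberFrom start xs = foldr step (const []) xs start
  where
    -- x is the current element, next handles the rest of the list,
    -- and n is the state passed from left to right.
    step x next n = (n, x) : next (n + 1)
```

numberFrom 0 "abc" evaluates to [(0,'a'),(1,'b'),(2,'c')]. The const [] plays the role of end, and step plays the role of f.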
Here is another solution, more readable, I think, and as inefficient as other solutions:
import Data.Either
import Data.List
import Data.Maybe
import Data.Functor
splits xs = zip (inits xs) (tails xs)
subst :: Char -> String -> Either Char String -> Either Char String
subst p xs (Left q) | p == q = Right xs
subst p xs q = q
match' :: [Either Char String] -> String -> Bool
match' [] [] = True
match' (Left p : ps) xs = or [ match' (map (subst p ixs) ps) txs
                             | (ixs, txs) <- tail $ splits xs]
match' (Right s : ps) xs = fromMaybe False $ match' ps <$> stripPrefix s xs
match' _ _ = False
match = match' . map Left
main = mapM_ (print . uncurry match)
  [ ("abba" , "redbluebluered" ) -- True
  , ("abba" , "redblueblue" ) -- False
  , ("abb" , "redblueblue" ) -- True
  , ("aab" , "redblueblue" ) -- False
  , ("cbccadbd", "greenredgreengreenwhiteblueredblue") -- True
  ]
The idea is simple: instead of having a Map, store both patterns and matched substrings in a list. When we encounter a pattern (Left p), we substitute all occurrences of this pattern with a substring and call match' recursively with this substring stripped, and we repeat this for each substring that belongs to the inits of the processed string. If we encounter an already matched substring (Right s), we just try to strip this substring, then call match' recursively if that succeeds or return False otherwise.

Reading multiline user's input

I want to lazily read user input and do something with it line by line. But if the user ends a line with , (comma) followed by any number of spaces (including zero), I want to give them the opportunity to finish the input on the next line.
And here is what I've got:
import System.IO
import Data.Char
chop :: String -> [String]
chop = f . map (++ "\n") . lines
  where
    f [] = []
    f [x] = [x]
    f (x : y : xs) =
      if (p . tr) x
        then f ((x ++ y) : xs)
        else x : f (y : xs)
    p x = (not . null) x && ((== ',') . last) x
    tr xs | all isSpace xs = ""
    tr (x : xs) = x : tr xs
main :: IO ()
main =
  do putStrLn "Welcome to hell, version 0.1.3!"
     putPrompt
     mapM_ process . takeWhile (/= "quit\n") . chop =<< getContents
  where process str = putStr str >> putPrompt
        putPrompt = putStr ">>> " >> hFlush stdout
Sorry, it doesn't work at all. Bloody mess.
P.S. I want to preserve \n characters on end of every chunk. Currently I add them manually with map (++ "\n") after lines.
How about changing the type of chop a little:
readMultiLine :: IO [String]
readMultiLine = do
  ln <- getLine
  if (endswith (rstrip ln) ",") then
    liftM (ln:) readMultiLine
  else
    return [ln]
Now you know that if the last list is not empty, then the user didn't finish typing (the last input ended with ',').
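Of course, either import endswith and rstrip from Data.String.Utils, or write your own. If you import them, check the argument order of endswith first; the call above assumes the string comes first and the suffix second, as in this hand-written version. It could be as simple as:
endswith xs ys = (length xs >= length ys)
  && (and $ zipWith (==) (reverse xs) (reverse ys))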
rstrip = reverse . dropWhile isSpace . reverse
But I missed the point at first. Here's the actual thing.
unfoldM :: (Monad m) => (a -> Maybe (m b, m a)) -> a -> m [b]
unfoldM f z = case f z of
  Nothing -> return []
  Just (x, y) -> liftM2 (:) x $ y >>= unfoldM f
main = unfoldM (\x -> if (x == ["quit"]) then Nothing
                      else Just (print x, readMultiLine)) =<< readMultiLine
The reason is that you need to be able to insert the "action" to be performed on the input between reading one multi-line input and the next. Here print x is the action inserted between two readMultiLine calls.
Since you have questions about getContents, let me add. Even though getContents provides a lazy String, its effectful changes to the world are ordered with the subsequent effects of processing the list. But the processing of the list attempts to insert effects between effects of reading particular list items. To do that, you need a function that exposes the chain of effects, so you can insert your own effects between them.
You can do this using pipes, preserving the laziness of the user's input
import Data.Char (isSpace)
import Pipes
import qualified Pipes.Prelude as Pipes
endsWithComma :: String -> Bool
endsWithComma str =
  case (dropWhile isSpace $ reverse str) of
    ',':_ -> True
    _ -> False
finish :: Monad m => Pipe String String m ()
finish = do
  str <- await
  yield str
  if endsWithComma str
    then do
      str' <- await
      yield str'
    else finish
user :: Producer String IO ()
user = Pipes.stdinLn >-> finish
You can then hook up the user Producer to any downstream Consumer. For example, to echo the stream back out you can write:
main = runEffect (user >-> Pipes.stdoutLn)
To learn more about pipes you can read the tutorial.
Sorry, I wrote something wrong in a comment and I thought that now that I understood what you were trying to do, I'd give an answer with a little more substance. The core idea is that you're going to need a state buffer while you loop through the string, as far as I can tell. You have f :: [String] -> [String] but you'll need an extra string of buffer before you can solve this puzzle.
So let me assume an answer which looks like:
chop = joinCommas "" . map (++ "\n") . lines
Then the structure of joinCommas is going to look like:
import Data.List (isSuffixOf)
-- override with however you want to handle the ",\n" between lines.
joinLines = (++)
incomplete = isSuffixOf ",\n"
joinCommas :: String -> [String] -> [String]
joinCommas prefix (line : rest)
  | incomplete prefix = joinCommas (joinLines prefix line) rest
  | otherwise = prefix : joinCommas line rest
joinCommas prefix []
  | incomplete prefix = error "Incomplete input"
  | otherwise = [prefix]
The prefix stores up lines until it doesn't end with ",\n" at which point it emits the prefix and continues with the rest of the lines. On EOF we process the last line unless that line is incomplete.
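The question also asks for a comma followed by any number of spaces to count as a continuation, which a plain isSuffixOf ",\n" test does not cover. A possible variation, sketched here with base only, works on the output of lines (so without the trailing "\n") and ignores trailing whitespace when testing for the comma. The name joinContinued is made up for this sketch, and unlike joinCommas it emits an unfinished last chunk as is instead of calling error.

```haskell
import Data.Char (isSpace)

-- True when the last non-space character is a comma.
endsWithComma :: String -> Bool
endsWithComma str = case dropWhile isSpace (reverse str) of
  ',' : _ -> True
  _       -> False

-- Merge each line that ends with a comma with the line that follows it,
-- and put the "\n" back at the end of every emitted chunk.
joinContinued :: [String] -> [String]
joinContinued [] = []
joinContinued (l : ls)
  | endsWithComma l, (next : rest) <- ls =
      joinContinued ((l ++ "\n" ++ next) : rest)
  | otherwise = (l ++ "\n") : joinContinued ls
```

joinContinued ["a,", "b", "c"] gives ["a,\nb\n", "c\n"], and a line such as "x,  " with trailing spaces is still treated as unfinished. It would be plugged in as joinContinued . lines in place of chop.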

How do I parse a matrix of integers in Haskell?

So I've read the theory, now trying to parse a file in Haskell - but am not getting anywhere. This is just so weird...
Here is how my input file looks:
m n
k1, k2...
a11, ...., an
a21,.... a22
...
am1... amn
Where m,n are just integers, K = [k1, k2...] is a list of integers, and a11..amn is a "matrix" (a list of lists): A=[[a11,...a1n], ... [am1... amn]]
Here is my quick python version:
def parse(filename):
    """
    Input of the form:
    m n
    k1, k2...
    a11, ...., an
    a21,.... a22
    ...
    am1... amn
    """
    f = open(filename)
    (m,n) = f.readline().split()
    m = int(m)
    n = int(n)
    K = [int(k) for k in f.readline().split()]
    # Matrix - list of lists
    A = []
    for i in range(m):
        row = [float(el) for el in f.readline().split()]
        A.append(row)
    return (m, n, K, A)
And here is how (not very) far I got in Haskell:
import System.Environment
import Data.List
main = do
  (fname:_) <- getArgs
  putStrLn fname --since putStrLn goes to IO ()monad we can't just apply it
  parsed <- parse fname
  putStrLn parsed
parse fname = do
  contents <- readFile fname
  -- ,,,missing stuff... ??? how can I get first "element" and match on it?
  return contents
I am getting confused by monads (and the context that they trap me into!), and the do statement. I really want to write something like this, but I know it's wrong:
firstLine <- contents.head
(m,n) <- map read (words firstLine)
because contents is not a list - but a monad.
Any help on the next step would be great.
So I've just discovered that you can do:
liftM lines . readFile
to get a list of lines from a file. However, the example still only transforms the ENTIRE file, and doesn't use just the first or the second line...
The very simple version could be:
import Control.Monad (liftM)
-- this operates purely on list of strings
-- and also will fail horribly when passed something that doesn't
-- match the pattern
parse_lines :: [String] -> (Int, Int, [Int], [[Int]])
parse_lines (mn_line : ks_line : matrix_lines) = (m, n, ks, matrix)
  where [m, n] = read_ints mn_line
        ks = read_ints ks_line
        matrix = parse_matrix matrix_lines
-- this here is to loop through remaining lines to form a matrix
parse_matrix :: [String] -> [[Int]]
parse_matrix lines = parse_matrix' lines []
  where parse_matrix' [] acc = reverse acc
        parse_matrix' (l : ls) acc = parse_matrix' ls $ (read_ints l) : acc
-- this here is to give proper signature for read
read_ints :: String -> [Int]
read_ints = map read . words
-- this reads the file contents and lifts the result into IO
parse_file :: FilePath -> IO (Int, Int, [Int], [[Int]])
parse_file filename = do
  file_lines <- (liftM lines . readFile) filename
  return $ parse_lines file_lines
You might want to look into Parsec for fancier parsing, with better error handling.
*Main Control.Monad> parse_file "test.txt"
(3,3,[1,2,3],[[1,2,3],[4,5,6],[7,8,9]])
An easy to write solution
import Control.Monad (replicateM)
-- Read space separated words on a line from stdin
readMany :: Read a => IO [a]
readMany = fmap (map read . words) getLine
parse :: IO (Int, Int, [Int], [[Int]])
parse = do
  [m, n] <- readMany
  ks <- readMany
  xss <- replicateM m readMany
  return (m, n, ks, xss)
Let's try it:
*Main> parse
2 2
123 321
1 2
3 4
(2,2,[123,321],[[1,2],[3,4]])
The code I presented is quite expressive, that is, you get work done quickly with little code, but it has some bad properties. Still, I think that if you are still learning Haskell and haven't started with parser libraries, this is the way to go.
Two bad properties of my solution:
All code is in IO, nothing is testable in isolation
The error handling is very bad, as you see the pattern matching is very aggressive in [m, n]. What happens if we have 3 elements on the first line of the input file?
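Both points can be addressed without a parsing library by keeping the parser pure and reporting failures through Either. The following is only a sketch under a few assumptions: numbers are separated by whitespace (as in the answers here, not by the commas shown in the question's format), the names Parsed, readInts and parseMatrix are made up, and the error messages are illustrative.

```haskell
import Text.Read (readMaybe)

type Parsed = (Int, Int, [Int], [[Int]])

-- Parse one line of whitespace-separated integers, or explain why not.
readInts :: String -> Either String [Int]
readInts line =
  maybe (Left ("not a row of integers: " ++ show line)) Right
        (traverse readMaybe (words line))

-- Pure parser for the whole input: no IO, so it can be tested directly.
parseMatrix :: String -> Either String Parsed
parseMatrix input =
  case lines input of
    header : ksLine : rest -> do
      dims <- readInts header
      case dims of
        [m, n] -> do
          ks   <- readInts ksLine
          rows <- traverse readInts (take m rest)
          if length rows == m && all ((== n) . length) rows
            then Right (m, n, ks, rows)
            else Left "matrix does not have the declared dimensions"
        _ -> Left "first line must contain exactly two integers"
    _ -> Left "expected at least two lines"
```

In IO this would be used as fmap parseMatrix (readFile fname) or fmap parseMatrix getContents. A first line with three numbers now produces a Left with a message instead of a pattern-match failure.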
liftM is not magic! You would think it does some arcane thing to lift a function f into a monad but it is actually just defined as:
liftM f x = do
  y <- x
  return (f y)
We could actually use liftM to do what you wanted to, that is:
[m,n] <- liftM (map read . words . head . lines) (readFile fname)
but what you are looking for are let statements:
parseLine = map read . words
parse fname = do
  (x:y:xs) <- liftM lines (readFile fname)
  let [m,n] = parseLine x
  let ks = parseLine y
  let matrix = map parseLine xs
  return (m,n,ks,matrix)
As you can see, we can use let to mean variable assignment rather than monadic computation. In fact, let statements are just let expressions when we desugar the do notation:
parse fname =
  liftM lines (readFile fname) >>= (\(x:y:xs) ->
    let [m,n] = parseLine x
        ks = parseLine y
        matrix = map parseLine xs
    in return (m,n,ks,matrix) )
A Solution Using a Parsing Library
Since you'll probably have a number of people responding with code that parses strings of Ints into [[Int]] (map (map read . words) . lines $ contents), I'll skip that and introduce one of the parsing libraries. If you were to do this task for real work you'd probably use such a library that parses ByteString (instead of String, which means your IO reads everything into a linked list of individual characters).
import System.Environment
import Control.Monad
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString as B
First, I imported the Attoparsec and bytestring libraries. You can see these libraries and their documentation on hackage and install them using the cabal tool.
main = do
  (fname:_) <- getArgs
  putStrLn fname
  parsed <- parseX fname
  print parsed
main is basically unchanged.
parseX :: FilePath -> IO (Int, Int, [Int], [[Int]])
parseX fname = do
  bs <- B.readFile fname
  let res = parseOnly parseDrozzy bs
  -- We spew the error messages right here
  either (error . show) return res
parseX (renamed from parse to avoid name collision) uses the bytestring library's readFile, which reads the file in as packed, contiguous bytes instead of into cells of a linked list. After parsing I use a little shorthand to return the result if the parser returned Right result, or raise an error if the parser returned Left someErrorMessage.
-- Helper functions, more basic than you might think, but lets ignore it
sint = skipSpace >> int
int = liftM floor number
parseDrozzy :: Parser (Int, Int, [Int], [[Int]])
parseDrozzy = do
  m <- sint
  n <- sint
  skipSpace
  ks <- manyTill sint endOfLine
  arr <- count m (count n sint)
  return (m,n,ks,arr)
The real work then happens in parseDrozzy. We get our m and n Int values using the above helper. In most Haskell parsing libraries we must explicitly handle whitespace - so I skip the newline after n to get to our ks. ks is just all the int values before the next newline. Now we can actually use the previously specified number of rows and columns to get our array.
Technically speaking, that final bit arr <- count m (count n sint) doesn't follow your format. It will grab n ints even if it means going to the next line. We could copy Python's behavior (not verifying the number of values in a row) using count m (manyTill sint endOfLine) or we could check for each end of line more explicitly and return an error if we are short on elements.
From Lists to a Matrix
Lists of lists are not 2 dimensional arrays - the space and performance characteristics are completely different. Let's pack our list into a real matrix using Data.Array.Repa (import Data.Array.Repa). This will allow us to access the elements of the array efficiently as well as perform operations on the entire matrix, optionally spreading the work among all the available CPUs.
Repa defines the dimensions of your array using a slightly odd syntax. If your row and column lengths are in variables m and n then Z :. m :. n is much like the C declaration int arr[m][n]. For the one dimensional example, ks, we have:
fromList (Z :. (length ks)) ks
Which changes our type from [Int] to Array DIM1 Int.
For the two dimensional array we have:
let matrix = fromList (Z :. m :. n) (concat arr)
And change our type from [[Int]] to Array DIM2 Int.
So there you have it. A parsing of your file format into an efficient Haskell data structure using production-oriented libraries.
What about something simple like this?
parse :: String -> (Int, Int, [Int], [[Int]])
parse stuff = (m, n, ks, xss)
  where (line1:line2:rest) = lines stuff
        readMany = map read . words
        (m:n:_) = readMany line1
        ks = readMany line2
        xss = take m $ map (take n . readMany) rest
main :: IO ()
main = do
  stuff <- getContents
  let (m, n, ks, xss) = parse stuff
  print m
  print n
  print ks
  print xss
