Haskell Selective Text Obfuscation - haskell

I would like to obfuscate a text file report without obscuring certain keywords, like report titles, column headers, etc. I've built such a program using newLisp. I'm trying to implement the functionality in Haskell from scratch. Here is the code I've got so far, which compiles and runs successfully for the case of simple obfuscation.
module Main where
import Data.Char (isAlpha, isNumber, isUpper, toUpper)
import System.Environment (getArgs)
import System.Random (getStdGen, randomR, StdGen)
helpMessage = [ "Usage: cat filename(s) | obfuscate [-x filename] > filename",
"",
"Obfuscates text files. This obliterates the text--there is no recovery. This",
"is not encryption. It's simple, if slow, obfuscation.",
"",
"To include a list of words not to obfuscate, use the -x option. List one word",
"per line in the file.",
"" ]
data CLOpts = CLOpts { help :: Bool
, exceptionFileP :: Bool
, exceptionFile :: String }
main = do
args <- getArgs
if length args > 0
then do let opts = parseCL args CLOpts { help=False, exceptionFileP=False, exceptionFile="" }
if help opts
then do putStrLn $ unlines helpMessage
else do if exceptionFileP opts
then do exceptions <- readFile $ exceptionFile opts
obf complexObfuscation $ lines exceptions
else do obf simpleObfuscation []
else do obf simpleObfuscation []
where obf f xs = do
g <- getStdGen
c <- getContents
putStrLn $ f xs g c
parseCL :: [String] -> CLOpts -> CLOpts
parseCL [] opts = opts
parseCL ("-x":f:xs) opts = parseCL xs opts { exceptionFileP=True, exceptionFile=f }
parseCL (_:xs) opts = parseCL xs opts { help=True }
simpleObfuscation xs = obfuscate
complexObfuscation exceptions g c = undefined
obfuscate :: StdGen -> String -> String
obfuscate g = obfuscate' g []
where
obfuscate' _ a [] = reverse a
obfuscate' g a text#(c:cs)
| isAlpha c = obf obfuscateAlpha g a text
| isNumber c = obf obfuscateDigit g a text
| otherwise = obf id g a text
obf f g a (c:cs) = let (x,g') = f (c,g) in obfuscate' g' (x:a) cs
obfuscateAlpha, obfuscateDigit :: (Char, StdGen) -> (Char, StdGen)
obfuscateAlpha (c,g) = obfuscateChar g range
where range
| isUpper c = ('A','Z')
| otherwise = ('a','z')
obfuscateDigit (c,g) = obfuscateChar g ('0','9')
obfuscateChar :: StdGen -> (Char, Char) -> (Char, StdGen)
obfuscateChar = flip randomR
I cannot get my head around how to obfuscate all text except words passed in as exceptions. My newLisp implementation relied on it's built-in regular expression handling. I've not had much luck using regular expressions in Haskell. Probably old libraries or something.
I've tried splitting the text up into lines and words and creating what in J would be called a fret. That approach is quickly getting unwieldy. I tried to use a parser, but I think that's going to get pretty hairy, too.
Does anyone have suggestion(s) on a simple, straight-forward approach to identifying exception words in the text and how not to send those to the obfuscate function? Haskell is such a brilliant language, surely I'm missing something right under my nose.
I tried Google, but it seems my desire for providing an exception list of words not to obfuscate is novel. Otherwise, obfuscation is quite simple.
Update
Following the idea I marked as the answer, I created my own words function:
words' :: String -> [String]
words' text = f text [] []
where f [] wa ta = reverse $ wa:ta
f (c:cs) wa ta =
if isAlphaNum c
then f cs (c:wa) ta
else f cs [] $ if length wa > 0 then [c]:(reverse wa):ta else [c]:ta
Using break didn't work. I think mutual recursion with break and span would have worked, but I went with the code above before I thought of trying that.
Then I implemented complexObfuscation as follows:
complexObfuscation exceptions g = unlines . map obfuscateLine . lines
where obfuscateLine = concatMap obfuscateWord . words'
obfuscateWord word =
if word `elem` exceptions
then word
else obfuscate g word
This accomplished what I was after. Unfortunately, I did not anticipate that the same generator would generate the same characters with every call to obfuscate. So each word starts with the same characters. Lol. A problem for another day.

Read the exceptions file and build a Data.Set.Set.
After splitting the input file into lines, split it further into words.
Then, obfuscate each word individually. If a word is an element of the Set you built before, leave it as it is. Otherwise, apply your obfuscate function to each character.

Related

Searching through a file takes forever when action is phrased in a certain way

I have a textfile containing about 45k words and I want to check some simple property about the words.
Here is the body of my program
import Data.Char
import Data.List
import qualified Data.Set as Set
vowel x = elem x "aeiou"
nontrivial w = not $ null [x | x <- w, vowel x]
toPigLatin :: String -> String
toPigLatin word
| vowel (head word) = word ++ "ay"
| otherwise = let (v, c) = span (not . vowel) word in
if c == "" then word
else c ++ v ++ "ay"
pigExists :: String -> Set.Set String -> Bool
pigExists word set = Set.member (toPigLatin word) set
But here are two possible ways to write the main action:
main :: IO ()
main = do
allWords <- words <$> readFile "/tmp/linuxwords2" --https://users.cs.duke.edu/~ola/ap/linuxwords
let pigHits = filter (\word -> nontrivial word && pigExists word (Set.fromList allWords)) allWords
sequence_ $ map putStrLn pigHits
and
main :: IO ()
main = do
allWords <- words <$> readFile "/tmp/linuxwords2" --https://users.cs.duke.edu/~ola/ap/linuxwords
let pigHits = filter (\word -> nontrivial word && pigExists word (Set.fromList allWords)) allWords
putStrLn $ unlines pigHits
I have tried compiling with ghc -O2 and the former works fine, but the latter takes forever to produce an answer.
What is the real difference between these two styles? In general, how should I write efficient code that involves IO?
This problem was discussed first in Code Review SE
I can reproduce the problem in GHC 7.10.2. Using GHC 8.2.2 the problem is not present. Solution: upgrade your compiler.
It isn't surprising to see the second version performing worse - after all it is creating a rather long linked list by concatenating a large set of strings - this takes time and memory. The first version does not make the intermediate list at the cost of many (relatively fast) calls to putStrLn. EDIT: This wasn't ment to say it should take this long. I see now that output list, which I redirected to null, is extremely short. This is certainly a bug in the older GHC.

Cutting specific chunks from a Haskell String

I'm trying to cut chunks from a list, with a given predicate. I would have preferred to use a double character, e.g. ~/, but have resolved to just using $. What I essentially want to do is this...
A: "Hello, my $name is$ Danny and I $like$ Haskell"
What I want to turn this into is this:
B: "Hello, my Danny and I Haskell"
So I want to strip everything in between the given symbol, $, or my first preference was ~/, if I can figure it out. What I tried was this:
s1 :: String -> String
s1 xs = takeWhile (/= '$') xs
s2 :: String -> String
s2 xs = dropWhile (/= '$') xs
s3 :: String -> String
s3 xs = s3 $ s2 $ s1 xs
This solution seems to just bug my IDE out (possibly infinite looping).
Solution:
s3 :: String -> String
s3 xs
|'$' `notElem` xs = xs
|otherwise = takeWhile (/= '$') xs ++ (s3 $ s1 xs)
s1 :: String -> String
s1 xs = drop 1 $ dropWhile (/= '$') $ tail $ snd $ break ('$'==) xs
This seems like a nice application for parsers. A solution using trifecta:
import Control.Applicative
import Data.Foldable
import Data.Functor
import Text.Trifecta
input :: String
input = "Hello, my $name is$ Danny and I $like$ Haskell"
cutChunk :: CharParsing f => f String
cutChunk = "" <$ (char '$' *> many (notChar '$') <* char '$')
cutChunk matches $, followed by 0 or more (many) non-$ characters, then another $. Then we use ("" <$) to make this parser's value always be the empty string, thus discarding all the characters that this parser matches.
includeChunk :: CharParsing f => f String
includeChunk = some (notChar '$')
includeChunk matches the text that we want to include in the result, which is anything that's not the $ character. It's important that we use some (matching one or more characters) and not many (matching zero or more characters) because we're going to include this parser within another many expression next; if this parser matched on the empty string, then that could loop infinitely.
chunks :: CharParsing f => f String
chunks = fold <$> many (cutChunk <|> includeChunk)
chunks is the parser for everything. Read <|> as "or", as in "parse either a cutChunk or an includeChunk". many (cutChunk <|> includeChunk) is a parser that produces a list of chunks e.g. Success ["Hello, my ",""," Danny and I ",""," Haskell"], so we fold the output to concatenate those chunks together into a single string.
result :: Result String
result = parseString chunks mempty input
The result:
Success "Hello, my Danny and I Haskell"
Your infinite loop comes from calling s3 recursively with no base case:
s3 :: String -> String
s3 xs = s3 $ s2 $ s1 xs
Adding a base case corrects the infinite loop:
s3 xs
| '$' `notElem` xs = xs
| otherwise = ...
This is not the whole answer. Think about what s1 actually does and where you use its return value:
s1 "hello $my name is$ ThreeFx" == "hello "
For further reference, see the break function:
break :: (a -> Bool) -> [a] -> ([a], [a])
I think your logic is wrong, perhaps easier to write it in an elementary way
Prelude> let pr xs = go xs True
Prelude| where go [] _ = []
Prelude| go (x:xs) f | x=='$' = go xs (not f)
Prelude| | f = x : go xs f
Prelude| | otherwise = go xs f
Prelude|
Prelude> pr "Hello, my $name is$ Danny and I $like$ Haskell"
"Hello, my Danny and I Haskell"
Explanation The flag f keeps track of the state (either pass mode or not). If the current char is a token skip and switch state.

Haskell - Rename duplicate values in a list of lists

I have a list of lists of strings e.g;
[["h","e","l","l","o"], ["g","o","o","d"], ["w","o","o","r","l","d"]]
And I want to rename repeated values outside a sublist so that all the repetitions are set to new randomly generated values throughout a sublist that are not pre-existing in the list but the same inside the same sublist so that a possible result might be:
[["h","e","l","l","o"], ["g","t","t","d"], ["w","s","s","r","z","f"]]
I already have a function that can randomly generate a string of size one called randomStr:
randomStr :: String
randomStr = take 1 $ randomRs ('a','z') $ unsafePerformIO newStdGen
Presuming you want to do what I've outlined in my comment below, it's best to break this problem up into several smaller parts to tackle one at a time. I would also recommend leveraging common modules in base and containers, since it will make the code much simpler and faster. In particular, the modules Data.Map and Data.Sequence are very useful in this case. Data.Map I would say is the most useful here, as it has some very useful functions that would otherwise be difficult to write by hand. Data.Sequence is used for efficiency purposes at the end, as you'll see.
First, imports:
import Data.List (nub)
import Data.Map (Map)
import Data.Sequence (Seq, (|>), (<|))
import qualified Data.Map as Map
import qualified Data.Sequence as Seq
import Data.Foldable (toList)
import System.Random (randomRIO)
import Control.Monad (forM, foldM)
import Control.Applicative ((<$>))
Data.Foldable.toList is needed since Data.Sequence does not have a toList function, but Foldable provides one that will work. On to the code. We first want to be able to take a list of Strings and find all the unique elements in it. For this, we can use nub:
lettersIn :: [String] -> [String]
lettersIn = nub
I like providing my own names for functions like this, it can make the code more readable.
Now that we can get all the unique characters, we want to be able to assign each a random character:
makeRandomLetterMap :: [String] -> IO (Map String String)
makeRandomLetterMap letters
= fmap Map.fromList
$ forM (lettersIn letters) $ \l -> do
newL <- randomRIO ('a', 'z')
return (l, [newL])
Here we get a new random character and essentially zip it up with our list of letters, then we fmap (<$>) Map.fromList over that result. Next, we need to be able to use this map to replace letters in a list. If a letter isn't found in the Map, we just want the letter back. Luckily, Data.Map has the findWithDefault function which is perfect for this situation:
replaceLetter :: Map String String -> String -> String
replaceLetter m letter = Map.findWithDefault letter letter m
replaceAllLetters :: Map String String -> [String] -> [String]
replaceAllLetters m letters = map (replaceLetter m) letters
Since we want to be able to update this map with new letters that have been encountered in each sublist, overwriting previously encountered letters as needed, we can use Data.Map.union. Since union favors its first argument, we need to flip it:
updateLetterMap :: Map String String -> [String] -> IO (Map String String)
updateLetterMap m letters = flip Map.union m <$> makeRandomLetterMap letters
Now we have all the tools needed to tackle the problem at hand:
replaceDuplicatesRandomly :: [[String]] -> IO [[String]]
replaceDuplicatesRandomly [] = return []
For the base case, just return an empty list.
replaceDuplicatesRandomly (first:rest) = do
m <- makeRandomLetterMap first
For a non-empty list, make the initial map off the first sublist
(_, seqTail) <- foldM go (m, Seq.empty) rest
Fold over the rest, starting with an empty sequence and the first map, and extract the resulting sequence
return $ toList $ first <| seqTail
Then convert the sequence to a list after prepending the first sublist (it doesn't get changed by this function). The go function is pretty simple too:
where
go (m, acc) letters = do
let newLetters = replaceAllLetters m letters
newM <- updateLetterMap m letters
return (newM, acc |> newLetters)
It takes the current map m and an accumulation of all the sublists processed so far acc along with the current sublist letters, replaces the letters in said sublist, builds a new map for the next iteration (newM), and then returns the new map along with the accumulation of everything processed, i.e. acc |> newLetters. All together, the function is
replaceDuplicatesRandomly :: [[String]] -> IO [[String]]
replaceDuplicatesRandomly [] = return []
replaceDuplicatesRandomly (first:rest) = do
m <- makeRandomLetterMap first
(_, seqTail) <- foldM go (m, Seq.empty) rest
return $ toList $ first <| seqTail
where
go (m, acc) letters = do
let newLetters = replaceAllLetters m letters
newM <- updateLetterMap m letters
return (newM, acc |> newLetters)
It's always better to keep impure and pure computations separated.
You cannot replace by letters, which are already in a list, so you need to get a string of fresh letters:
fresh :: [String] -> String
fresh xss = ['a'..'z'] \\ foldr union [] xss
This function replaces one letter with another in a string:
replaceOne :: Char -> Char -> String -> String
replaceOne y y' = map (\x -> if x == y then y' else x)
This function replaces one letter each time with a new letter for every string in a list of strings:
replaceOnes :: Char -> String -> [String] -> (String, [String])
replaceOnes y = mapAccumL (\(y':ys') xs ->
if y `elem` xs
then (ys', replaceOne y y' xs)
else (y':ys', xs))
For example
replaceOnes 'o' "ijklmn" ["hello", "good", "world"]
returns
("lmn",["helli","gjjd","wkrld"])
A bit tricky one:
replaceMany :: String -> String -> [String] -> (String, [String])
replaceMany ys' ys xss = runState (foldM (\ys' y -> state $ replaceOnes y ys') ys' ys) xss
This function replaces each letter from ys each time with a new letter from ys' for every string in xss.
For example
replaceMany "mnpqstuvxyz" "lod" ["hello", "good", "world"]
returns
("vxyz",["hemmp","gqqt","wsrnu"])
i.e.
'l's in "hello" are replaced by the first letter in "mnpqstuvxyz"
'l' in "world" is replaced by the second letter in "mnpqstuvxyz"
'o' in "hello" is replaced by the third letter in "mnpqstuvxyz"
'o's in "good" are replaced by the fourth letter in "mnpqstuvxyz"
...
'd' in "world" is replaced by the seventh letter in "mnpqstuvxyz"
This function goes through a list of strings and replaces all letters from the head by fresh letters, that ys' contains, for each string in the rest of the list.
replaceDuplicatesBy :: String -> [String] -> [String]
replaceDuplicatesBy ys' [] = []
replaceDuplicatesBy ys' (ys:xss) = ys : uncurry replaceDuplicatesBy (replaceMany ys' ys xss)
I.e. it does what you want, but without any randomness — just picks fresh letters from a list.
All described functions are pure. Here is an impure one:
replaceDuplicates :: [String] -> IO [String]
replaceDuplicates xss = flip replaceDuplicatesBy xss <$> shuffle (fresh xss)
I.e. generate a random permutation of a string, that contains fresh letters, and pass it to replaceDuplicatesBy.
You can take the shuffle function from https://www.haskell.org/haskellwiki/Random_shuffle
And the final test:
main = replicateM_ 3 $ replaceDuplicates ["hello", "good", "world"] >>= print
prints
["hello","gxxd","wcrzy"]
["hello","gyyd","wnrmf"]
["hello","gmmd","wvrtx"]
The whole code (without shuffle): http://lpaste.net/115763
I think this is bound to raise more questions than it answers.
import Control.Monad.State
import Data.List
import System.Random
mapAccumLM _ s [] = return (s, [])
mapAccumLM f s (x:xs) = do
(s', y) <- f s x
(s'', ys) <- mapAccumLM f s' xs
return (s'', y:ys)
pick excluded for w = do
a <- pick' excluded
putStrLn $ "replacement for " ++ show for ++ " in " ++ show w ++ " excluded: " ++ show excluded ++ " = " ++ show a
return a
-- | XXX -- can loop indefinitely
pick' excluded = do
a <- randomRIO ('a','z')
if elem a excluded
then pick' excluded
else return a
transform w = do
globallySeen <- get
let go locallySeen ch =
case lookup ch locallySeen of
Nothing -> if elem ch globallySeen
then do let excluded = globallySeen ++ (map snd locallySeen)
a <- lift $ pick excluded ch w
return ( (ch, a):locallySeen, a)
else return ( (ch,ch):locallySeen, ch )
Just ch' -> return (locallySeen, ch')
(locallySeen, w') <- mapAccumLM go [] w
let globallySeen' = w' ++ globallySeen
put globallySeen'
return w'
doit ws = runStateT (mapM transform ws) []
main = do
ws' <- doit [ "hello", "good", "world" ]
print ws'

Reading multiline user's input

I want to lazily read user input and do something with it line by line. But if user ends a line with , (comma) followed by any number of spaces (including zero), I want give him opportunity to finish his input on the next line.
And here is what I've got:
import System.IO
import Data.Char
chop :: String -> [String]
chop = f . map (++ "\n") . lines
where f [] = []
f [x] = [x]
f (x : y : xs) = if (p . tr) x
then f ((x ++ y) : xs)
else x : f (y : xs)
p x = (not . null) x && ((== ',') . last) x
tr xs | all isSpace xs = ""
tr (x : xs) = x :tr xs
main :: IO ()
main =
do putStrLn "Welcome to hell, version 0.1.3!"
putPrompt
mapM_ process . takeWhile (/= "quit\n") . chop =<< getContents
where process str = putStr str >> putPrompt
putPrompt = putStr ">>> " >> hFlush stdout
Sorry, it doesn't work at all. Bloody mess.
P.S. I want to preserve \n characters on end of every chunk. Currently I add them manually with map (++ "\n") after lines.
How about changing the type of chop a little:
readMultiLine :: IO [String]
readMultiLine = do
ln <- getLine
if (endswith (rstrip ln) ",") then
liftM (ln:) readMultiLine
else
return [ln]
Now you know that if the last list is not empty, then the user didn't finish typing (the last input ended with ',').
Of course, either import Data.String.Utils, or write your own. Could be as simple as:
endswith xs ys = (length xs >= length ys)
&& (and $ zipWith (==) (reverse xs) (reverse ys))
rstrip = reverse . dropWhile isSpace . reverse
But I missed the point at first. Here's the actual thing.
unfoldM :: (Monad m) => (a -> Maybe (m b, m a)) -> a -> m [b]
unfoldM f z = case f z of
Nothing -> return []
Just (x, y) -> liftM2 (:) x $ y >>= unfoldM f
main = unfoldM (\x -> if (x == ["quit"]) then Nothing
else Just (print x, readMultiLine)) =<< readMultiLine
The reason is, you need to be able to insert the "action" to be done on input between reading one multi-line input and the next. Here print x is the action inserted between two readMultiLine
Since you have questions about getContents, let me add. Even though getContents provides a lazy String, its effectful changes to the world are ordered with the subsequent effects of processing the list. But the processing of the list attempts to insert effects between effects of reading particular list items. To do that, you need a function that exposes the chain of effects, so you can insert your own effects between them.
You can do this using pipes, preserving the laziness of the user's input
import Data.Char (isSpace)
import Pipes
import qualified Pipes.Prelude as Pipes
endsWithComma :: String -> Bool
endsWithComma str =
case (dropWhile isSpace $ reverse str) of
',':_ -> True
_ -> False
finish :: Monad m => Pipe String String m ()
finish = do
str <- await
yield str
if endsWithComma str
then do
str' <- await
yield str'
else finish
user :: Producer String IO ()
user = Pipes.stdinLn >-> finish
You can then hook up the user Producer to any downstream Consumer. For example, to echo the stream back out you can write:
main = runEffect (user >-> Pipes.stdoutLn)
To learn more about pipes you can read the tutorial.
Sorry, I wrote something wrong in a comment and I thought that now that I understood what you were trying to do, I'd give an answer with a little more substance. The core idea is that you're going to need a state buffer while you loop through the string, as far as I can tell. You have f :: [String] -> [String] but you'll need an extra string of buffer before you can solve this puzzle.
So let me assume an answer which looks like:
chop = joinCommas "" . map (++ "\n") . lines
Then the structure of joinCommas is going to look like:
import Data.List (isSuffixOf)
-- override with however you want to handle the ",\n" between lines.
joinLines = (++)
incomplete = isSuffixOf ",\n"
joinCommas :: String -> [String] -> [String]
joinCommas prefix (line : rest)
| incomplete prefix = joinCommas (joinLines prefix line) rest
| otherwise = prefix : joinCommas line rest
joinCommas prefix []
| incomplete prefix = error "Incomplete input"
| otherwise = [prefix]
The prefix stores up lines until it doesn't end with ",\n" at which point it emits the prefix and continues with the rest of the lines. On EOF we process the last line unless that line is incomplete.

How do I parse a matrix of integers in Haskell?

So I've read the theory, now trying to parse a file in Haskell - but am not getting anywhere. This is just so weird...
Here is how my input file looks:
m n
k1, k2...
a11, ...., an
a21,.... a22
...
am1... amn
Where m,n are just intergers, K = [k1, k2...] is a list of integers, and a11..amn is a "matrix" (a list of lists): A=[[a11,...a1n], ... [am1... amn]]
Here is my quick python version:
def parse(filename):
"""
Input of the form:
m n
k1, k2...
a11, ...., an
a21,.... a22
...
am1... amn
"""
f = open(filename)
(m,n) = f.readline().split()
m = int(m)
n = int(n)
K = [int(k) for k in f.readline().split()]
# Matrix - list of lists
A = []
for i in range(m):
row = [float(el) for el in f.readline().split()]
A.append(row)
return (m, n, K, A)
And here is how (not very) far I got in Haskell:
import System.Environment
import Data.List
main = do
(fname:_) <- getArgs
putStrLn fname --since putStrLn goes to IO ()monad we can't just apply it
parsed <- parse fname
putStrLn parsed
parse fname = do
contents <- readFile fname
-- ,,,missing stuff... ??? how can I get first "element" and match on it?
return contents
I am getting confused by monads (and the context that the trap me into!), and the do statement. I really want to write something like this, but I know it's wrong:
firstLine <- contents.head
(m,n) <- map read (words firstLine)
because contents is not a list - but a monad.
Any help on the next step would be great.
So I've just discovered that you can do:
liftM lines . readFile
to get a list of lines from a file. However, still the example only only transforms the ENTIRE file, and doesn't use just the first, or the second lines...
The very simple version could be:
import Control.Monad (liftM)
-- this operates purely on list of strings
-- and also will fail horribly when passed something that doesn't
-- match the pattern
parse_lines :: [String] -> (Int, Int, [Int], [[Int]])
parse_lines (mn_line : ks_line : matrix_lines) = (m, n, ks, matrix)
where [m, n] = read_ints mn_line
ks = read_ints ks_line
matrix = parse_matrix matrix_lines
-- this here is to loop through remaining lines to form a matrix
parse_matrix :: [String] -> [[Int]]
parse_matrix lines = parse_matrix' lines []
where parse_matrix' [] acc = reverse acc
parse_matrix' (l : ls) acc = parse_matrix' ls $ (read_ints l) : acc
-- this here is to give proper signature for read
read_ints :: String -> [Int]
read_ints = map read . words
-- this reads the file contents and lifts the result into IO
parse_file :: FilePath -> IO (Int, Int, [Int], [[Int]])
parse_file filename = do
file_lines <- (liftM lines . readFile) filename
return $ parse_lines file_lines
You might want to look into Parsec for fancier parsing, with better error handling.
*Main Control.Monad> parse_file "test.txt"
(3,3,[1,2,3],[[1,2,3],[4,5,6],[7,8,9]])
An easy to write solution
import Control.Monad (replicateM)
-- Read space seperated words on a line from stdin
readMany :: Read a => IO [a]
readMany = fmap (map read . words) getLine
parse :: IO (Int, Int, [Int], [[Int]])
parse = do
[m, n] <- readMany
ks <- readMany
xss <- replicateM m readMany
return (m, n, ks, xss)
Let's try it:
*Main> parse
2 2
123 321
1 2
3 4
(2,2,[123,321],[[1,2],[3,4]])
While the code I presented is quite expressive. That is, you get work done quickly with little code, it has some bad properties. Though I think if you are still learning haskell and haven't started with parser libraries. This is the way to go.
Two bad properties of my solution:
All code is in IO, nothing is testable in isolation
The error handling is very bad, as you see the pattern matching is very aggressive in [m, n]. What happens if we have 3 elements on the first line of the input file?
liftM is not magic! You would think it does some arcane thing to lift a function f into a monad but it is actually just defined as:
liftM f x = do
y <- x
return (f y)
We could actually use liftM to do what you wanted to, that is:
[m,n] <- liftM (map read . words . head . lines) (readFile fname)
but what you are looking for are let statements:
parseLine = map read . words
parse fname = do
(x:y:xs) <- liftM lines (readFile fname)
let [m,n] = parseLine x
let ks = parseLine y
let matrix = map parseLine xs
return (m,n,ks,matrix)
As you can see we can use let to mean variable assignment rather then monadic computation. In fact let statements are you just let expressions when we desugar the do notation:
parse fname =
liftM lines (readFile fname) >>= (\(x:y:xs) ->
let [m,n] = parseLine x
ks = parseLine y
matrix = map parseLine xs
in return matrix )
A Solution Using a Parsing Library
Since you'll probably have a number of people responding with code that parses strings of Ints into [[Int]] (map (map read . words) . lines $ contents), I'll skip that and introduce one of the parsing libraries. If you were to do this task for real work you'd probably use such a library that parses ByteString (instead of String, which means your IO reads everything into a linked list of individual characters).
import System.Environment
import Control.Monad
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString as B
First, I imported the Attoparsec and bytestring libraries. You can see these libraries and their documentation on hackage and install them using the cabal tool.
main = do
(fname:_) <- getArgs
putStrLn fname
parsed <- parseX fname
print parsed
main is basically unchanged.
parseX :: FilePath -> IO (Int, Int, [Int], [[Int]])
parseX fname = do
bs <- B.readFile fname
let res = parseOnly parseDrozzy bs
-- We spew the error messages right here
either (error . show) return res
parseX (renamed from parse to avoid name collision) uses the bytestring library's readfile, which reads in the file packed, in contiguous bytes, instead of into cells of a linked list. After parsing I use a little shorthand to return the result if the parser returned Right result or print an error if the parser returned a value of Left someErrorMessage.
-- Helper functions, more basic than you might think, but lets ignore it
sint = skipSpace >> int
int = liftM floor number
parseDrozzy :: Parser (Int, Int, [Int], [[Int]])
parseDrozzy = do
m <- sint
n <- sint
skipSpace
ks <- manyTill sint endOfLine
arr <- count m (count n sint)
return (m,n,ks,arr)
The real work then happens in parseDrozzy. We get our m and n Int values using the above helper. In most Haskell parsing libraries we must explicitly handle whitespace - so I skip the newline after n to get to our ks. ks is just all the int values before the next newline. Now we can actually use the previously specified number of rows and columns to get our array.
Technically speaking, that final bit arr <- count m (count n sint) doesn't follow your format. It will grab n ints even if it means going to the next line. We could copy Python's behavior (not verifying the number of values in a row) using count m (manyTill sint endOfLine) or we could check for each end of line more explicitly and return an error if we are short on elements.
From Lists to a Matrix
Lists of lists are not 2 dimensional arrays - the space and performance characteristics are completely different. Let's pack our list into a real matrix using Data.Array.Repa (import Data.Array.Repa). This will allow us to access the elements of the array efficiently as well as perform operations on the entire matrix, optionally spreading the work among all the available CPUs.
Repa defines the dimensions of your array using a slightly odd syntax. If your row and column lengths are in variables m and n then Z :. n :. m is much like the C declaration int arr[m][n]. For the one dimensional example, ks, we have:
fromList (Z :. (length ks)) ks
Which changes our type from [Int] to Array DIM1 Int.
For the two dimensional array we have:
let matrix = fromList (Z :. m :. n) (concat arr)
And change our type from [[Int]] to Array DIM2 Int.
So there you have it. A parsing of your file format into an efficient Haskell data structure using production-oriented libraries.
What about something simple like this?
parse :: String -> (Int, Int, [Int], [[Int]])
parse stuff = (m, n, ks, xss)
where (line1:line2:rest) = lines stuff
readMany = map read . words
(m:n:_) = readMany line1
ks = readMany line2
xss = take m $ map (take n . readMany) rest
main :: IO ()
main = do
stuff <- getContents
let (m, n, ks, xss) = parse stuff
print m
print n
print ks
print xss

Resources