Haskell Processing text from a file

Haskell Processing text from a file - string

Hi Guys,
1. What do I want to do?
I have a one-line file containing the text
"Bangabang [Just 3, Nothing, Just 1, Nothing] [Nothing, Nothing, Nothing, Nothing] [Nothing, Nothing, Just 4, Nothing] [Nothing, Just 3, Nothing, Nothing]"
I want to read this text from a file and convert it to:
[[Just 3, Nothing, Just 1, Nothing], [Nothing, Nothing, Nothing, Nothing], [Nothing, Nothing, Just 4, Nothing], [Nothing, Just 3, Nothing, Nothing]]
which has type [[Maybe Integer]].
2. What have I already done?
I can already convert a plain String into Maybe Integer values.
My String:
xxx = "Bangabang [Just 3, Nothing, Just 1, Nothing] [Nothing, Nothing, Nothing, Nothing] [Nothing, Nothing, Just 4, Nothing] [Nothing, Just 3, Nothing, Nothing]"
After executing stripChars ",]" $ drop 10 xxx I get:
"Just 3 Nothing Just 1 Nothing [Nothing Nothing Nothing Nothing [Nothing Nothing Just 4 Nothing [Nothing Just 3 Nothing Nothing"
After the next step, map (splitOn " ") $ splitOn "[", I have:
[["Just","3","Nothing","Just","1","Nothing",""],["Nothing","Nothing","Nothing","Nothing",""],["Nothing","Nothing","Just","4","Nothing",""],["Nothing","Just","3","Nothing","Nothing"]]
Now I have to remove the empty strings "" using cleany
and finally change [[String]] to [[Maybe Integer]] using cuty (called convert in the code below):
[[Just 3,Nothing,Just 1,Nothing],[Nothing,Nothing,Nothing,Nothing],[Nothing,Nothing,Just 4,Nothing],[Nothing,Just 3,Nothing,Nothing]]
That is what I wanted to have!
3. The problem is...
...how can I execute this method:
parse xxx = cuty $ cleany $ map (splitOn " ") $ splitOn "[" $ stripChars ",]" $ drop 10 xxx
on text read from file (which is IO String type)?
This is my first Haskell project, so my functions may reinvent the wheel or do worse things :/
Used functions:
-- my attempt at main (this is the part that does not work)
main do
    text <- readFile "test.txt"
    let l = lines
    map parse . l
-- deletes unwanted characters from a String
stripChars :: String -> String -> String
stripChars = filter . flip notElem
-- converts String to Maybe a
maybeRead :: Read a => String -> Maybe a
maybeRead s = case reads s of
    [(x,"")] -> Just x
    _ -> Nothing
-- convert (with subfunction conv, because I don't know how to make it one function)
conv :: [String] -> [Maybe Integer]
conv [] = []
conv (x:xs) = if x == "Just" then conv xs
              else maybeRead x : conv xs
convert :: [[String]] -> [[Maybe Integer]]
convert [] = []
convert (x:xs) = conv x : convert xs
-- cleany (with subfunction clean, because I don't know how to make it one function)
clean :: [String] -> [String]
clean [] = []
clean (x:xs) = if x == "" then clean xs
               else x : clean xs
cleany :: [[String]] -> [[String]]
cleany [] = []
cleany (x:xs) = clean x : cleany xs

I'll assume you're ok with a parser that does zero to minimal error checking. Haskell has great libraries for parsing, and later I'll amend my answer with some alternatives you should look at.
Instead of using splitOn I would recommend writing these functions:
takeList :: String -> (String, String)
-- returns the match text and the text following the match
-- e.g. takeList " [1,2,3] ..." returns ("[1,2,3]", " ...")
takeLists :: String -> [String]
-- parses a sequence of lists separated by spaces
-- into a list of matches
I'll leave takeList as an exercise. I like to use span and break from Data.List for these kinds of simple parsers.
In terms of takeList, here is how you might write takeLists:
takeLists :: String -> [String]
takeLists str =
  let s1 = dropWhile (/= '[') str
  in if null s1
       then []
       else let (s2, s3) = takeList s1
            in s2 : takeLists s3
For example, takeLists " [123] [4,5,6] [7,8] " will return:
[ "[123]", "[4,5,6]", "[7,8]" ]
Finally, to convert each string in this list to Haskell values, just use read.
answer :: [ [Int] ]
answer = map read (takeLists " [123] [4,5,6] [7,8] ")
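The takeList exercise left open above could be filled in along the following lines. This is only a sketch: it assumes the lists contain no nested brackets, and parseRows is a hypothetical helper name that is not part of the answer. takeLists is repeated so that the block is self-contained.

```haskell
-- A minimal sketch of takeList; assumes no nested brackets.
takeList :: String -> (String, String)
takeList s =
  let s1 = dropWhile (/= '[') s                   -- skip to the opening bracket
  in case break (== ']') s1 of
       (body, ']' : rest) -> (body ++ "]", rest)  -- keep the closing bracket
       (body, rest)       -> (body, rest)         -- no closing bracket found

-- Repeated from the answer above.
takeLists :: String -> [String]
takeLists str =
  let s1 = dropWhile (/= '[') str
  in if null s1
       then []
       else let (s2, s3) = takeList s1
            in s2 : takeLists s3

-- Hypothetical helper: read every bracketed list as [Maybe Integer].
-- 'read' throws on malformed input, so this does no error checking.
parseRows :: String -> [[Maybe Integer]]
parseRows = map read . takeLists
```

Because the leading "Bangabang" contains no '[', parseRows can be applied to the whole line from the question without dropping the prefix first.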
Update
Using the ReadP and ReadS parsers available in the base libraries:
import Text.ParserCombinators.ReadP
bang :: ReadP [[Maybe Int]]
bang = do string "Bangabang"
          skipSpaces
          xs <- sepBy1 (readS_to_P reads) skipSpaces
          eof
          return xs
input = "Bangabang [Just 3, Nothing, Just 1, Nothing] [Nothing, Nothing, Nothing, Nothing] [Nothing, Nothing, Just 4, Nothing] [Nothing, Just 3, Nothing, Nothing]"
runParser p input = case (readP_to_S p) input of
                      [] -> error "no parses"
                      ((a,_):_) -> print a
example = runParser bang input

You can use the derived Read instance directly.
data Bangabang = Bangabang [Maybe Integer]
                           [Maybe Integer]
                           [Maybe Integer]
                           [Maybe Integer] deriving (Read, Show)
Now you can use all the Read machinery (read, reads, readIO, ...), with the result type inferred from the types. E.g.
readBangabang :: String -> Bangabang
readBangabang = read
If the data comes from a file:
readFile "foo.txt" >>= print . readBangabang
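If a crash on malformed input is not acceptable, the same derived instance can be used through readMaybe from Text.Read. The sketch below assumes the file always holds exactly four lists; toRows, parseBangabang and loadRows are hypothetical names introduced here for illustration.

```haskell
import Text.Read (readMaybe)

data Bangabang = Bangabang [Maybe Integer]
                           [Maybe Integer]
                           [Maybe Integer]
                           [Maybe Integer] deriving (Read, Show)

-- Hypothetical helper: flatten the four fields into the list of rows.
toRows :: Bangabang -> [[Maybe Integer]]
toRows (Bangabang a b c d) = [a, b, c, d]

-- readMaybe returns Nothing instead of throwing on malformed input.
parseBangabang :: String -> Maybe [[Maybe Integer]]
parseBangabang = fmap toRows . readMaybe

-- Read a file and parse its contents in one step.
loadRows :: FilePath -> IO (Maybe [[Maybe Integer]])
loadRows path = parseBangabang <$> readFile path
```

Inside main this would be used as rows <- loadRows "test.txt", followed by a case on the resulting Maybe. A trailing newline in the file is not a problem, since readMaybe skips surrounding whitespace.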

Related

How do I get the parameters out of the Maybe wrapper correctly?

I want to get the values out of a Maybe wrapper.
The wrapper contains one or more approximation results:
data CalculatedPoints =
    Linear {xPoints :: [Maybe Double], yPoints :: [Maybe Double] }
  | Segment {xPoints :: [Maybe Double], yPoints :: [Maybe Double]}
  deriving Show
Trying to get values
main = do
  resultPoints <- parseInput
  case resultPoints of
    [Nothing, Nothing] -> putStrLn "Calculation failed. Perhaps you did not specify a method."
    [lpoints, Nothing] -> do
      putStrLn "Linear Aproxiation"
      let (xl, yl) = fromMaybe (Nothing, Nothing) lpoints -- THIS
      -- prettyPoints xl yl
      print $ xl
    [Nothing, spoints] -> do
      putStrLn "Segment Aproxiation"
      print spoints
    [lpoints, spoints] -> do
      putStrLn "Linear Aproxiation"
      print lpoints
      putStrLn "Segment Aproxiation"
      print spoints
I get this error
warning: [-Wdeferred-type-errors]
    • Couldn't match type ‘CalculatedPoints’ with ‘(Maybe a, Maybe a1)’
      Expected: Maybe (Maybe a, Maybe a1)
        Actual: Maybe CalculatedPoints
    • In the second argument of ‘fromMaybe’, namely ‘lpoints’
      In the expression: fromMaybe (Nothing, Nothing) lpoints
      In a pattern binding:
        (xl, yl) = fromMaybe (Nothing, Nothing) lpoints
P.S
I decided that parseInput was important for the context, so it turns out that calculatePoints is also necessary, so I included them
calculatePoints :: Interval -> Double -> Points -> Bool -> Bool -> [Maybe CalculatedPoints]
calculatePoints interval step points True True =
  [ Just (linearApproximation interval step points),
    Just (segmentApproximation interval step points)
  ]
calculatePoints interval step points True False = [Just (linearApproximation interval step points), Nothing]
calculatePoints interval step points False True = [Nothing, Just (segmentApproximation interval step points)]
calculatePoints _ _ _ False False = [Nothing, Nothing]
parseInput :: IO [Maybe CalculatedPoints]
parseInput = do
  input <- cmdArgs inputOptions
  points <- case file input of
    "" -> getPointsNoFile []
    path -> getPointsFile path
  return $ calculatePoints (left input, right input) 0.95 points (lm input) (sm input)

Notice that fromMaybe needs a default value of the same type as the value inside the Just. Below is your code, with comments marking the mistake:
main = do
  resultPoints <- parseInput -- This has type [Maybe CalculatedPoints]
  case resultPoints of
    -- This is the case where the input is a list with two Nothings. Looks good :)
    [Nothing, Nothing] -> putStrLn "Calculation failed. Perhaps you did not specify a method."
    -- This case is misleading. First, you can match directly on (Just calcpoints),
    -- because the case where lpoints is Nothing has already been matched above,
    -- so there is no need for an irrefutable pattern (i.e. a variable ignoring the shape).
    -- Nevertheless, the error is below, not here.
    [lpoints, Nothing] -> do
      putStrLn "Linear Aproxiation"
      -- fromMaybe has type a -> Maybe a -> a (summary: you provide a default value
      -- for Nothing, otherwise it extracts whatever is within Just).
      -- The default value (Nothing, Nothing) has type "(Maybe a, Maybe a)" => a tuple of Maybes.
      -- lpoints has type (Maybe CalculatedPoints).
      let (xl, yl) = fromMaybe (Nothing, Nothing) lpoints -- THIS
      -- The pattern (xl, yl) also has the type of a tuple of Maybes. Therefore GHC is complaining:
      -- "you want me to extract a tuple, but the value you provide is a Maybe CalculatedPoints.
      -- How the heck do I get a tuple from that??"
      prettyPoints xl yl
      print $ xl
      .
      .
      .
Now, if you want your code to type check, it should be something like the code below. Notice that it isn't very good or idiomatic Haskell.
main = do
  resultPoints <- parseInput
  case resultPoints of
    [Nothing, Nothing] -> putStrLn "Calculation failed. Perhaps you did not specify a method."
    [lpoints, Nothing] -> do
      putStrLn "Linear Aproxiation"
      -- Provide a default value of the same type as lpoints has "inside" the Just
      let points = fromMaybe (Linear [] []) lpoints
      -- not sure about this... But you can work it out
      prettyPoints points??
      print $ xPoints points
Now, if you want more idiomatic code, I'd suggest:
main = do
  resultPoints <- parseInput
  case resultPoints of
    [Nothing, Nothing] -> putStrLn "Calculation failed. Perhaps you did not specify a method."
    [Just (Linear xl yl), Nothing] -> do
      putStrLn "Linear Aproxiation"
      prettyPoints xl yl
      print $ xl
    .
    .
    .
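The typing rule behind the error can be seen in isolation in the small sketch below. It reuses the CalculatedPoints type from the question (with Eq added only so that results can be compared); pointsOrEmpty and xsOf are hypothetical helper names, not part of the original program.

```haskell
import Data.Maybe (fromMaybe)

data CalculatedPoints
  = Linear  { xPoints :: [Maybe Double], yPoints :: [Maybe Double] }
  | Segment { xPoints :: [Maybe Double], yPoints :: [Maybe Double] }
  deriving (Show, Eq)

-- fromMaybe :: a -> Maybe a -> a, so the default must itself be a CalculatedPoints.
pointsOrEmpty :: Maybe CalculatedPoints -> CalculatedPoints
pointsOrEmpty = fromMaybe (Linear [] [])

-- Pattern matching avoids the default entirely and gives direct access to the fields.
xsOf :: Maybe CalculatedPoints -> [Maybe Double]
xsOf (Just cp) = xPoints cp
xsOf Nothing   = []
```

For example, pointsOrEmpty Nothing evaluates to Linear [] [], while xsOf (Just (Segment [Just 1] [Just 2])) evaluates to [Just 1.0].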

I think I'd just delete most of this code. Here's a much simpler main with the same behavior:
main = do
  input <- cmdArgs inputOptions
  points <- case file input of
    "" -> getPointsNoFile []
    path -> getPointsFile path
  let interval = (left input, right input)
  unless (lm input || sm input) (putStrLn "Calculation failed. Perhaps you did not specify a method.")
  when (lm input) $ do
    putStrLn "Linear Aproxiation" -- sic
    print (linearApproximation interval 0.95 points)
  when (sm input) $ do
    putStrLn "Segment Aproxiation"
    print (segmentApproximation interval 0.95 points)
No Maybe needed; no CalculatedPoints needed (...probably. unless linearApproximation and segmentApproximation are weirder than their names sound); no fragile guaranteed-length-two list needed; and less code repetition between cases. There is still a little bit of repetition between the linear and segment printing code; if you really want, you could abstract those a little bit.
data ApproximationMethod = AM
  { name :: String
  , approx :: Interval -> Double -> Points -> Points
  , active :: InputOptions -> Bool
  }
allMethods :: [ApproximationMethod]
allMethods = [AM "Linear" linearApproximation lm, AM "Segment" segmentApproximation sm]
main = do
  {- ... -}
  for_ allMethods $ \am -> when (active am input) $ do
    putStrLn $ name am ++ " Aproxiation"
    print (approx am interval 0.95 points)
But for this little repeated code, that seems like overkill.

How far does "try" back track?

So ... I messed up a recording in CSV format:
23,95489,0,20,9888
Due to language settings, floating point numbers were written with commas as the separator ... in a comma-separated value file ...
The problem is that the file does not have consistent formatting for every float. Some have no point at all, and the number of digits after the point varies too.
My idea was to build a MegaParsec parser that would try to read every possible floating point format, move on, and backtrack if it finds an error.
E.g. for the example above:
1. read 23,95489 -> good
2. read 0,20 -> good (so far)
3. read 9888 -> error (because the value is too high for the column (checked by guard))
4. (backtracking to 2.) read 0 -> good again
5. read 20,9888 -> good
6. done
I've implemented that as (pseudo code here):
floatP = try pointyFloatP <|> unpointyFloatP
lineP = (,,) <$> floatP <* comma <*> floatP <* comma <*> floatP <* comma
My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?
And if so ... how would I go about implementing further back tracking?

"How far does “try” back track?"
The parser try p consumes exactly as much input as p if p parses successfully, otherwise it does not consume any input at all. So if you look at that in terms of backtracking, it backtracks to the point where you were when you invoked it.
"My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?"
Yes, try does not "unconsume" input. All it does is to recover from a failure in the parser you give it without consuming any input. It does not undo the effects of any parsers that you've applied previously, nor does it affect subsequent parsers that you apply after try p succeeded.
"And if so ... how would I go about implementing further back tracking?"
Basically what you want is to know not only whether pointyFloatP succeeds on the current input, but also whether the rest of your lineP would succeed after pointyFloatP has succeeded, and if it doesn't, you want to backtrack to before you applied pointyFloatP. So basically you want the parser for the whole remaining line in the try, not just the float parser.
To achieve that you can make floatP take the parser for the remaining line as an argument like this:
floatP restP = try (pointyFloatP <*> restP) <|> unpointyFloatP <*> restP
Note that this kind of backtracking isn't going to be very efficient (but I assume you knew that going in).

Update: Include a custom monadic parser for more complex rows.
Using the List Monad for Simple Parsing
The list monad makes a better backtracking "parser" than Megaparsec. For example, to parse the cells:
row :: [String]
row = ["23", "95489", "0", "20", "9888"]
into exactly three columns of values satisfying a particular bound (e.g., less than 30), you can generate all possible parses with:
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Control.Applicative
rowResults :: [String] -> [[Double]]
rowResults = cols 3
  where cols :: Int -> [String] -> [[Double]]
        cols 0 [] = pure [] -- good, finished on time
        cols 0 _ = empty -- bad, didn't use all the data
        -- otherwise, parse exactly #n# columns from cells #xs#
        cols n xs = do
          -- form #d# from one or two cells
          (d, ys) <- num1 xs <|> num2 xs
          -- only accept #d < 30#
          guard $ d < 30
          ds <- cols (n-1) ys
          return $ d : ds
        -- read number from a single cell
        num1 (x:xs) | ok1 x = pure (read x, xs)
        num1 _ = empty
        -- read number from two cells
        num2 (x:y:zs) | ok1 x && ok2 y = pure (read (x ++ "." ++ y), zs)
        num2 _ = empty
        -- first cell: "0" is okay, but otherwise can't start with "0"
        ok1 "0" = True
        ok1 (c:_) | c /= '0' = True
        ok1 _ = False
        -- second cell: can't end with "0" (or *be* "0")
        ok2 xs = last xs /= '0'
The above list-based parser tries to reduce ambiguity by assuming that if "xxx,yyy" is a number, the "xxx" won't start with zeros (unless it's just "0"), and the "yyy" won't end with a zero (or, for that matter, be a single "0"). If this isn't right, just modify ok1 and ok2 as appropriate.
Applied to row, this gives the single unambiguous parse:
> rowResults row
[[23.95489,0.0,20.9888]]
Applied to an ambiguous row, it gives all parses:
> rowResults ["0", "12", "5", "0", "8601"]
[[0.0,12.5,0.8601],[0.12,5.0,0.8601]]
Anyway, I'd suggest using a standard CSV parser to parse your file into a matrix of String cells like so:
dat :: [[String]]
dat = [ ["23", "95489", "0", "20", "9888"]
, ["0", "12", "5", "0", "8601"]
, ["23", "2611", "2", "233", "14", "422"]
]
and then use rowResults above to get the row numbers of rows that were ambiguous:
> map fst . filter ((>1) . snd) . zip [1..] . map (length . rowResults) $ dat
[2]
>
or unparsable:
> map fst . filter ((==0) . snd) . zip [1..] . map (length . rowResults) $ dat
[]
>
Assuming there are no unparsable rows, you can regenerate one possible fixed file, even if some rows are ambiguous, by just grabbing the first successful parse for each row:
> putStr $ unlines . map (intercalate "," . map show . head . rowResults) $ dat
23.95489,0.0,20.9888
0.0,12.5,0.8601
23.2611,2.233,14.422
>
Using a Custom Monad based on the List Monad for More Complex Parsing
For more complex parsing, for example if you wanted to parse a row like:
type Stream = [String]
row0 :: Stream
row0 = ["Apple", "15", "1", "5016", "2", "5", "3", "1801", "11/13/2018", "X101"]
with a mixture of strings and numbers, it's actually not that difficult to write a monadic parser, based on the list monad, that generates all possible parses.
The key idea is to define a parser as a function that takes a stream and generates a list of possible parses, with each possible parse represented as a tuple of the object successfully parsed from the beginning of the stream paired with the remainder of the stream. Wrapped in a newtype, our parallel parser would look like:
newtype PParser a = PParser (Stream -> [(a, Stream)]) deriving (Functor)
Note the similarity to the type ReadS from Text.ParserCombinators.ReadP, which is also technically an "all possible parses" parser (though you usually only expect one, unambiguous parse back from a reads call):
type ReadS a = String -> [(a, String)]
Anyway, we can define a Monad instance for PParser like so:
instance Applicative PParser where
  pure x = PParser (\s -> [(x, s)])
  (<*>) = ap
instance Monad PParser where
  PParser p >>= f = PParser $ \s1 -> do -- in list monad
    (x, s2) <- p s1
    let PParser q = f x
    (y, s3) <- q s2
    return (y, s3)
There's nothing too tricky here: pure x returns a single possible parse, namely the result x with an unchanged stream s, while p >>= f applies the first parser p to generate a list of possible parses, takes them one by one within the list monad to calculate the next parser q to use (which, as per usual for a monadic operation, can depend on the result of the first parse), and generates a list of possible final parses that are returned.
The Alternative and MonadPlus instances are pretty straightforward -- they just lift emptiness and alternation from the list monad:
instance Alternative PParser where
  empty = PParser (const empty)
  PParser p <|> PParser q = PParser $ \s -> p s <|> q s
instance MonadPlus PParser where
To run our parser, we have:
parse :: PParser a -> Stream -> [a]
parse (PParser p) s = map fst (p s)
and now we can introduce primitives:
-- read a token as-is
token :: PParser String
token = PParser $ \s -> case s of
  (x:xs) -> pure (x, xs)
  _ -> empty
-- require an end of stream
eof :: PParser ()
eof = PParser $ \s -> case s of
  [] -> pure ((), s)
  _ -> empty
and combinators:
-- combinator to convert a String to any readable type
convert :: (Read a) => PParser String -> PParser a
convert (PParser p) = PParser $ \s1 -> do
  (x, s2) <- p s1 -- for each possible String
  (y, "") <- reads x -- get each possible full read
                     -- (normally only one)
  return (y, s2)
and parsers for various "terms" in our CSV row:
-- read a string from a single cell
str :: PParser String
str = token
-- read an integer (any size) from a single cell
int :: PParser Int
int = convert (mfilter ok1 token)
-- read a double from one or two cells
dbl :: PParser Double
dbl = dbl1 <|> dbl2
  where dbl1 = convert (mfilter ok1 token)
        dbl2 = convert $ do
          t1 <- mfilter ok1 token
          t2 <- mfilter ok2 token
          return $ t1 ++ "." ++ t2
-- read a double that's < 30
dbl30 :: PParser Double
dbl30 = do
  x <- dbl
  guard $ x < 30
  return x
-- rules for first cell of numbers:
-- "0" is okay, but otherwise can't start with "0"
ok1 :: String -> Bool
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- rules for second cell of numbers:
-- can't be "0" or end in "0"
ok2 :: String -> Bool
ok2 xs = last xs /= '0'
Then, for a particular row schema, we can write a row parser as we normally would with a monadic parser:
-- a row
data Row = Row String Int Double Double Double
               Int String String deriving (Show)
rowResults :: PParser Row
rowResults = Row <$> str <*> int <*> dbl30 <*> dbl30 <*> dbl30
                 <*> int <*> str <*> str <* eof
and get all possible parses:
> parse rowResults row0
[Row "Apple" 15 1.5016 2.0 5.3 1801 "11/13/2018" "X101"
,Row "Apple" 15 1.5016 2.5 3.0 1801 "11/13/2018" "X101"]
>
The full program is:
{-# LANGUAGE DeriveFunctor #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Control.Applicative
type Stream = [String]
newtype PParser a = PParser (Stream -> [(a, Stream)]) deriving (Functor)
instance Applicative PParser where
  pure x = PParser (\s -> [(x, s)])
  (<*>) = ap
instance Monad PParser where
  PParser p >>= f = PParser $ \s1 -> do -- in list monad
    (x, s2) <- p s1
    let PParser q = f x
    (y, s3) <- q s2
    return (y, s3)
instance Alternative PParser where
  empty = PParser (const empty)
  PParser p <|> PParser q = PParser $ \s -> p s <|> q s
instance MonadPlus PParser where
parse :: PParser a -> Stream -> [a]
parse (PParser p) s = map fst (p s)
-- read a token as-is
token :: PParser String
token = PParser $ \s -> case s of
  (x:xs) -> pure (x, xs)
  _ -> empty
-- require an end of stream
eof :: PParser ()
eof = PParser $ \s -> case s of
  [] -> pure ((), s)
  _ -> empty
-- combinator to convert a String to any readable type
convert :: (Read a) => PParser String -> PParser a
convert (PParser p) = PParser $ \s1 -> do
  (x, s2) <- p s1 -- for each possible String
  (y, "") <- reads x -- get each possible full read
                     -- (normally only one)
  return (y, s2)
-- read a string from a single cell
str :: PParser String
str = token
-- read an integer (any size) from a single cell
int :: PParser Int
int = convert (mfilter ok1 token)
-- read a double from one or two cells
dbl :: PParser Double
dbl = dbl1 <|> dbl2
  where dbl1 = convert (mfilter ok1 token)
        dbl2 = convert $ do
          t1 <- mfilter ok1 token
          t2 <- mfilter ok2 token
          return $ t1 ++ "." ++ t2
-- read a double that's < 30
dbl30 :: PParser Double
dbl30 = do
  x <- dbl
  guard $ x < 30
  return x
-- rules for first cell of numbers:
-- "0" is okay, but otherwise can't start with "0"
ok1 :: String -> Bool
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- rules for second cell of numbers:
-- can't be "0" or end in "0"
ok2 :: String -> Bool
ok2 xs = last xs /= '0'
-- a row
data Row = Row String Int Double Double Double
               Int String String deriving (Show)
rowResults :: PParser Row
rowResults = Row <$> str <*> int <*> dbl30 <*> dbl30 <*> dbl30
                 <*> int <*> str <*> str <* eof
row0 :: Stream
row0 = ["Apple", "15", "1", "5016", "2", "5", "3", "1801", "11/13/2018", "X101"]
main = print $ parse rowResults row0
Off-the-shelf Solutions
I find it a little surprising I can't find an existing parser library out there that provides this kind of "all possible parses" parser. The stuff in Text.ParserCombinators.ReadP takes the right approach, but it assumes that you're parsing characters from a String rather than arbitrary tokens from some other stream (in our case, Strings from a [String]).
Maybe someone else can point out an off-the-shelf solution that would save you from having to roll your own parser type, instances, and primitives.
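For the purely numeric line from the question, one option that needs nothing beyond base is to run ReadP on the raw characters instead of on pre-split cells, since its (+++) choice keeps every alternative alive. The following is only a sketch under stated assumptions: three columns, every value below 30, and none of the ok1/ok2 leading- or trailing-zero rules from above. The names digits, pointy, unpointy, bounded, line3 and parseLine are made up for this example.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- one or more decimal digits (munch1 is greedy, so a cell is never split)
digits :: ReadP String
digits = munch1 isDigit

-- "23,95489" read as 23.95489
pointy :: ReadP Double
pointy = do
  a <- digits
  _ <- char ','
  b <- digits
  pure (read (a ++ "." ++ b))

-- "23" read as 23.0
unpointy :: ReadP Double
unpointy = read <$> digits

-- try both readings and keep only values below the assumed column bound
bounded :: ReadP Double
bounded = do
  x <- pointy +++ unpointy
  if x < 30 then pure x else pfail

-- exactly three comma-separated values, then end of input
line3 :: ReadP (Double, Double, Double)
line3 = do
  a <- bounded
  _ <- char ','
  b <- bounded
  _ <- char ','
  c <- bounded
  eof
  pure (a, b, c)

-- all complete parses of a line
parseLine :: String -> [(Double, Double, Double)]
parseLine = map fst . readP_to_S line3
```

Applied to the line from the question, parseLine "23,95489,0,20,9888" yields the single parse [(23.95489,0.0,20.9888)]; an ambiguous line would yield several results, which can be detected in the same way as with rowResults.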

How do I convert a string to a list of Maybe Int

How would I go about converting a string like this "13.2..2" to a list like this
[Just 1, Just 3, Nothing, Just 2, Nothing, Nothing, Just 2]
I have had a look at digitToInt but it does not take care of Maybe Int. Is there a way I could maybe modify digitToInt to handle Maybe Int?

You can use isDigit to test whether digitToInt will succeed.
λ> fmap (\c -> if isDigit c then Just (digitToInt c) else Nothing) "13.2..2" :: [Maybe Int]
[Just 1, Just 3, Nothing, Just 2, Nothing, Nothing, Just 2]
We can introduce a new function to clean this up a bit:
digitToIntMay :: Char -> Maybe Int
digitToIntMay c = if isDigit c then Just (digitToInt c) else Nothing
λ> fmap digitToIntMay "13.2..2" :: [Maybe Int]
[Just 1, Just 3, Nothing, Just 2, Nothing, Nothing, Just 2]

If you would like all non digits to be converted to Nothing, you can use guards and fmap
import Data.Char
charToMaybeInt :: Char -> Maybe Int
charToMaybeInt x
  | isDigit x = Just $ digitToInt x
  | otherwise = Nothing
main = putStrLn $ show $ fmap charToMaybeInt "13.2..2"
Using guards is, from my non-expert understanding, a bit more idiomatic than using if/else.

If you're certain the data coming is definitely a digit or . (or are happy with exceptions being thrown if the data is different), you can use pattern matching and fmap
import Data.Char
charToMaybeInt :: Char -> Maybe Int
charToMaybeInt '.' = Nothing
charToMaybeInt x = Just $ digitToInt x
main = putStrLn $ show $ fmap charToMaybeInt "13.2..2"
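One detail that may matter when choosing between these versions: digitToInt also accepts hexadecimal digits, so the pattern-matching version maps 'a' to Just 10 instead of failing, and it throws an exception for a character such as 'x'. A total version, sketched here with the hypothetical name digitMay, checks isDigit first:

```haskell
import Data.Char (digitToInt, isDigit)

-- Total version: anything that is not a decimal digit becomes Nothing.
digitMay :: Char -> Maybe Int
digitMay c
  | isDigit c = Just (digitToInt c)
  | otherwise = Nothing

-- Convert a whole string, one character at a time.
digitsMay :: String -> [Maybe Int]
digitsMay = map digitMay
```

With this definition, digitsMay "13.2..2" gives [Just 1,Just 3,Nothing,Just 2,Nothing,Nothing,Just 2], and digitMay 'a' gives Nothing even though digitToInt 'a' is 10.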

Haskell - Rename duplicate values in a list of lists

I have a list of lists of strings, e.g.:
[["h","e","l","l","o"], ["g","o","o","d"], ["w","o","o","r","l","d"]]
I want to rename values that have already appeared in an earlier sublist. Within each later sublist, every such repeated value should be replaced by a new, randomly generated value that does not already occur anywhere in the list, and the same replacement should be used for all occurrences inside that sublist. A possible result might be:
[["h","e","l","l","o"], ["g","t","t","d"], ["w","s","s","r","z","f"]]
I already have a function that can randomly generate a string of size one called randomStr:
randomStr :: String
randomStr = take 1 $ randomRs ('a','z') $ unsafePerformIO newStdGen

Presuming you want to do what I've outlined in my comment below, it's best to break this problem up into several smaller parts to tackle one at a time. I would also recommend leveraging common modules in base and containers, since it will make the code much simpler and faster. In particular, the modules Data.Map and Data.Sequence are very useful in this case. Data.Map I would say is the most useful here, as it has some very useful functions that would otherwise be difficult to write by hand. Data.Sequence is used for efficiency purposes at the end, as you'll see.
First, imports:
import Data.List (nub)
import Data.Map (Map)
import Data.Sequence (Seq, (|>), (<|))
import qualified Data.Map as Map
import qualified Data.Sequence as Seq
import Data.Foldable (toList)
import System.Random (randomRIO)
import Control.Monad (forM, foldM)
import Control.Applicative ((<$>))
Data.Foldable.toList is needed since Data.Sequence does not have a toList function, but Foldable provides one that will work. On to the code. We first want to be able to take a list of Strings and find all the unique elements in it. For this, we can use nub:
lettersIn :: [String] -> [String]
lettersIn = nub
I like providing my own names for functions like this, it can make the code more readable.
Now that we can get all the unique characters, we want to be able to assign each a random character:
makeRandomLetterMap :: [String] -> IO (Map String String)
makeRandomLetterMap letters
    = fmap Map.fromList
    $ forM (lettersIn letters) $ \l -> do
        newL <- randomRIO ('a', 'z')
        return (l, [newL])
Here we get a new random character and essentially zip it up with our list of letters, then we fmap (<$>) Map.fromList over that result. Next, we need to be able to use this map to replace letters in a list. If a letter isn't found in the Map, we just want the letter back. Luckily, Data.Map has the findWithDefault function which is perfect for this situation:
replaceLetter :: Map String String -> String -> String
replaceLetter m letter = Map.findWithDefault letter letter m
replaceAllLetters :: Map String String -> [String] -> [String]
replaceAllLetters m letters = map (replaceLetter m) letters
Since we want to be able to update this map with new letters that have been encountered in each sublist, overwriting previously encountered letters as needed, we can use Data.Map.union. Since union favors its first argument, we need to flip it:
updateLetterMap :: Map String String -> [String] -> IO (Map String String)
updateLetterMap m letters = flip Map.union m <$> makeRandomLetterMap letters
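The left bias of union is easy to check on two small maps. The sketch below uses made-up example data (oldMap and newMap are hypothetical) to show which value survives a key clash with and without the flip:

```haskell
import qualified Data.Map as Map

-- Made-up example data: both maps contain the key "o".
oldMap, newMap :: Map.Map String String
oldMap = Map.fromList [("o", "t"), ("l", "q")]
newMap = Map.fromList [("o", "s"), ("d", "f")]

-- union is left-biased: on a key clash the first argument's value wins.
keepOld :: Map.Map String String
keepOld = Map.union oldMap newMap

-- flip Map.union oldMap newMap == Map.union newMap oldMap, so new entries win.
preferNew :: Map.Map String String
preferNew = flip Map.union oldMap newMap
```

Here Map.lookup "o" keepOld is Just "t", while Map.lookup "o" preferNew is Just "s", which is the overwriting behaviour updateLetterMap relies on.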
Now we have all the tools needed to tackle the problem at hand:
replaceDuplicatesRandomly :: [[String]] -> IO [[String]]
replaceDuplicatesRandomly [] = return []
For the base case, just return an empty list.
replaceDuplicatesRandomly (first:rest) = do
    m <- makeRandomLetterMap first
For a non-empty list, make the initial map off the first sublist
    (_, seqTail) <- foldM go (m, Seq.empty) rest
Fold over the rest, starting with an empty sequence and the first map, and extract the resulting sequence
    return $ toList $ first <| seqTail
Then convert the sequence to a list after prepending the first sublist (it doesn't get changed by this function). The go function is pretty simple too:
    where
        go (m, acc) letters = do
            let newLetters = replaceAllLetters m letters
            newM <- updateLetterMap m letters
            return (newM, acc |> newLetters)
It takes the current map m and an accumulation of all the sublists processed so far acc along with the current sublist letters, replaces the letters in said sublist, builds a new map for the next iteration (newM), and then returns the new map along with the accumulation of everything processed, i.e. acc |> newLetters. All together, the function is
replaceDuplicatesRandomly :: [[String]] -> IO [[String]]
replaceDuplicatesRandomly [] = return []
replaceDuplicatesRandomly (first:rest) = do
    m <- makeRandomLetterMap first
    (_, seqTail) <- foldM go (m, Seq.empty) rest
    return $ toList $ first <| seqTail
    where
        go (m, acc) letters = do
            let newLetters = replaceAllLetters m letters
            newM <- updateLetterMap m letters
            return (newM, acc |> newLetters)

It's always better to keep impure and pure computations separated.
You cannot replace by letters, which are already in a list, so you need to get a string of fresh letters:
fresh :: [String] -> String
fresh xss = ['a'..'z'] \\ foldr union [] xss
This function replaces one letter with another in a string:
replaceOne :: Char -> Char -> String -> String
replaceOne y y' = map (\x -> if x == y then y' else x)
This function replaces one letter each time with a new letter for every string in a list of strings:
replaceOnes :: Char -> String -> [String] -> (String, [String])
replaceOnes y = mapAccumL (\(y':ys') xs ->
    if y `elem` xs
        then (ys', replaceOne y y' xs)
        else (y':ys', xs))
For example
replaceOnes 'o' "ijklmn" ["hello", "good", "world"]
returns
("lmn",["helli","gjjd","wkrld"])
A slightly trickier one:
replaceMany :: String -> String -> [String] -> (String, [String])
replaceMany ys' ys xss = runState (foldM (\ys' y -> state $ replaceOnes y ys') ys' ys) xss
This function replaces each letter from ys each time with a new letter from ys' for every string in xss.
For example
replaceMany "mnpqstuvxyz" "lod" ["hello", "good", "world"]
returns
("vxyz",["hemmp","gqqt","wsrnu"])
i.e.
'l's in "hello" are replaced by the first letter in "mnpqstuvxyz"
'l' in "world" is replaced by the second letter in "mnpqstuvxyz"
'o' in "hello" is replaced by the third letter in "mnpqstuvxyz"
'o's in "good" are replaced by the fourth letter in "mnpqstuvxyz"
...
'd' in "world" is replaced by the seventh letter in "mnpqstuvxyz"
This function goes through a list of strings and replaces all letters from the head by fresh letters, that ys' contains, for each string in the rest of the list.
replaceDuplicatesBy :: String -> [String] -> [String]
replaceDuplicatesBy ys' [] = []
replaceDuplicatesBy ys' (ys:xss) = ys : uncurry replaceDuplicatesBy (replaceMany ys' ys xss)
I.e. it does what you want, but without any randomness — just picks fresh letters from a list.
All described functions are pure. Here is an impure one:
replaceDuplicates :: [String] -> IO [String]
replaceDuplicates xss = flip replaceDuplicatesBy xss <$> shuffle (fresh xss)
I.e. generate a random permutation of a string, that contains fresh letters, and pass it to replaceDuplicatesBy.
You can take the shuffle function from https://www.haskell.org/haskellwiki/Random_shuffle
And the final test:
main = replicateM_ 3 $ replaceDuplicates ["hello", "good", "world"] >>= print
prints
["hello","gxxd","wcrzy"]
["hello","gyyd","wnrmf"]
["hello","gmmd","wvrtx"]
The whole code (without shuffle): http://lpaste.net/115763

I think this is bound to raise more questions than it answers.
import Control.Monad.State
import Data.List
import System.Random
mapAccumLM _ s [] = return (s, [])
mapAccumLM f s (x:xs) = do
  (s', y) <- f s x
  (s'', ys) <- mapAccumLM f s' xs
  return (s'', y:ys)
pick excluded for w = do
  a <- pick' excluded
  putStrLn $ "replacement for " ++ show for ++ " in " ++ show w ++ " excluded: " ++ show excluded ++ " = " ++ show a
  return a
-- | XXX -- can loop indefinitely
pick' excluded = do
  a <- randomRIO ('a','z')
  if elem a excluded
    then pick' excluded
    else return a
transform w = do
  globallySeen <- get
  let go locallySeen ch =
        case lookup ch locallySeen of
          Nothing -> if elem ch globallySeen
            then do
              let excluded = globallySeen ++ (map snd locallySeen)
              a <- lift $ pick excluded ch w
              return ( (ch, a):locallySeen, a)
            else return ( (ch,ch):locallySeen, ch )
          Just ch' -> return (locallySeen, ch')
  (locallySeen, w') <- mapAccumLM go [] w
  let globallySeen' = w' ++ globallySeen
  put globallySeen'
  return w'
doit ws = runStateT (mapM transform ws) []
main = do
  ws' <- doit [ "hello", "good", "world" ]
  print ws'
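Regarding the "can loop indefinitely" note on pick': it retries until it draws an unused letter, so it never returns once all 26 letters are excluded. A minimal sketch of a terminating variant follows, assuming it is acceptable to get Nothing back in that case. The name pickFresh is made up for this example, and the caller in transform would have to handle the Nothing.

```haskell
import Data.List ((\\))
import System.Random (randomRIO)

-- | Pick a random lowercase letter that is not in the excluded list.
-- Returns Nothing when every letter 'a'..'z' is excluded, instead of retrying forever.
pickFresh :: [Char] -> IO (Maybe Char)
pickFresh excluded =
  case ['a' .. 'z'] \\ excluded of
    [] -> return Nothing
    candidates -> do
      i <- randomRIO (0, length candidates - 1)
      return (Just (candidates !! i))
```

Drawing from the list of remaining candidates also means a single random draw per pick, however many letters are already taken.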

Non-exhaustive patterns in lambda

I am getting "Non-exhaustive patterns in lambda" and I am not sure of the cause yet. Can anyone explain how to fix it? The code is below:
import Control.Monad
import Data.List
time_spent h1 h2 = max (abs (fst h1 - fst h2)) (abs (snd h1 - snd h2))
meeting_point xs = foldl' (find_min_time) maxBound xs
  where
    time_to_point p = foldl' (\tacc p' -> tacc + (time_spent p p')) 0 xs
    find_min_time min_time p = let x = time_to_point p in if x < min_time then x else min_time
main = do
  n <- readLn :: IO Int
  points <- fmap (map (\[x,y] -> (x,y)) . map (map (read :: String->Int)) . map words . lines) getContents
  putStrLn $ show $ meeting_point points

This is the lambda with the non-exhaustive patterns: \[x,y] -> (x,y).
The pattern is non-exhaustive because the argument pattern you've specified, [x,y], doesn't match every possible list; it only matches lists with exactly two elements.
I would suggest replacing it with a separate function with an error case to print out the unexpected data in an error message so you can debug further, e.g.:
f [x,y] = (x, y)
f l = error $ "Unexpected list: " ++ show l
...
points <- fmap (map f . map ...)
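One way such a list can show up, assuming the input contains a blank line somewhere (a trailing one, for instance), is that words returns an empty list for it. This is only a guess at the cause, but it is easy to check with a small sketch. The names tokenize and toPair are chosen for illustration.

```haskell
-- Same tokenization as in the question: one list of words per input line.
tokenize :: String -> [[String]]
tokenize = map words . lines

-- Total replacement for the lambda: reports the offending list instead of
-- failing with a pattern-match error.
toPair :: Show a => [a] -> (a, a)
toPair [x, y] = (x, y)
toPair l = error $ "Unexpected list: " ++ show l
```

In GHCi, tokenize "1 2\n\n3 4" evaluates to [["1","2"],[],["3","4"]]. The empty list in the middle is exactly what the two-element pattern rejects.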

As an addition to @GaneshSittampalam's answer, you could also do this with more graceful error handling using the Maybe monad, the mapM function from Control.Monad, and readMaybe from Text.Read. I would also recommend refactoring your code so that the parsing is its own function; it makes your main function much cleaner and easier to debug.
import Control.Monad (mapM)
import Text.Read (readMaybe)
toPoint :: [a] -> Maybe (a, a)
toPoint [x, y] = Just (x, y)
toPoint _ = Nothing
This is just a simple pattern matching function that returns Nothing if it gets a list with length not 2. Otherwise it turns it into a 2-tuple and wraps it in Just.
parseData :: String -> Maybe [(Int, Int)]
parseData text = do
  -- returns Nothing if a non-Int is encountered
  values <- mapM (mapM readMaybe . words) . lines $ text
  -- returns Nothing if a line doesn't have exactly 2 values
  mapM toPoint values
Your parsing can actually be simplified significantly by using mapM and readMaybe. The type of readMaybe is Read a => String -> Maybe a, and in this case since we've specified the type of parseData to return Maybe [(Int, Int)], the compiler can infer that readMaybe should have the local type of String -> Maybe Int. We still use lines and words in the same way, but now since we use mapM the type of the right hand side of the <- is Maybe [[Int]], so the type of values is [[Int]]. What mapM also does for us is if any of those actions fails, the overall computation exits early with Nothing. Then we simply use mapM toPoint to convert values into a list of points, but also with the failure mechanism built in. We actually could use the more general signature of parseData :: Read a => String -> Maybe [(a, a)], but it isn't necessary.
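To illustrate the more general signature mentioned above, here is a minimal sketch. The name parsePoints is made up to avoid clashing with parseData; otherwise the body is the same, and the caller's type annotation decides what readMaybe parses.

```haskell
import Text.Read (readMaybe)

toPoint :: [a] -> Maybe (a, a)
toPoint [x, y] = Just (x, y)
toPoint _ = Nothing

-- Generalized variant: works for any element type with a Read instance.
parsePoints :: Read a => String -> Maybe [(a, a)]
parsePoints text = do
  -- Nothing if any token fails to parse as the requested type
  values <- mapM (mapM readMaybe . words) (lines text)
  -- Nothing if any line does not have exactly 2 values
  mapM toPoint values
```

For example, parsePoints "1 2\n3 4" :: Maybe [(Int, Int)] gives Just [(1,2),(3,4)], while parsePoints "1.5 2\n3 4" gives Nothing at type Maybe [(Int, Int)] and Just [(1.5,2.0),(3.0,4.0)] at type Maybe [(Double, Double)].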
main = do
  n <- readLn :: IO Int
  points <- fmap parseData getContents
  case points of
    Just ps -> print $ meeting_point ps
    Nothing -> putStrLn "Invalid data!"
Now we just use fmap parseData on getContents, making points have the type Maybe [(Int, Int)]. Finally, we pattern match on points to print out the result of the meeting_point computation or print a helpful message if something went wrong.
If you wanted even better error handling, you could leverage the Either monad in a similar fashion:
toPoint :: [a] -> Either String (a, a)
toPoint [x, y] = Right (x, y)
toPoint _ = Left "Invalid number of points"
readEither :: Read a => String -> Either String a
readEither text = maybe (Left $ "Invalid parse: " ++ text) Right $ readMaybe text
--                       ^ default value                   ^ Wraps output on success
-- Same definition with different type signature and `readEither`
parseData :: String -> Either String [(Int, Int)]
parseData text = do
  values <- mapM (mapM readEither . words) . lines $ text
  mapM toPoint values
main = do
  -- read the leading count line first, as in the Maybe version above
  n <- readLn :: IO Int
  points <- fmap parseData getContents
  case points of
    Right ps -> print $ meeting_point ps
    Left err -> putStrLn $ "Error: " ++ err
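As a side note, Text.Read in base also exports a readEither with the same type as the hand-written one above, so the two would clash if you imported Text.Read unqualified. A minimal sketch of building on the library version instead follows. The names readToken and parsePoints are made up for this example, and it does not rely on the exact wording of the library's error message; it only appends the offending token to it.

```haskell
import qualified Text.Read as R

toPoint :: [a] -> Either String (a, a)
toPoint [x, y] = Right (x, y)
toPoint _ = Left "Invalid number of points"

-- Wrap base's readEither so the error also names the offending token.
readToken :: Read a => String -> Either String a
readToken tok = either (\e -> Left (e ++ ": " ++ show tok)) Right (R.readEither tok)

parsePoints :: String -> Either String [(Int, Int)]
parsePoints text = do
  -- stops at the first token that fails to parse
  values <- mapM (mapM readToken . words) (lines text)
  -- stops at the first line without exactly 2 values
  mapM toPoint values
```

With this, parsePoints "1 2\n3 4" gives Right [(1,2),(3,4)] and parsePoints "1 2\n3" gives Left "Invalid number of points"; a non-numeric token gives a Left carrying the library's message followed by the token.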
