Strange behaviour of `ReadP` with regards to `fmap head` - haskell

Consider the following repl session:
λ import Text.ParserCombinators.ReadP
λ x $$ y = readP_to_S x y
-- This auxiliary function makes things tidier.
λ many get $$ "abc"
[("","abc"),("a","bc"),("ab","c"),("abc","")]
-- This is reasonable.
λ fmap head (many get) $$ "abc"
[(*** Exception: Prelude.head: empty list
-- Wut?
λ fmap last (many get) $$ "abc"
[(*** Exception: Prelude.last: empty list
-- This does not work either.
λ fmap id (many get) $$ "abc"
[("","abc"),("a","bc"),("ab","c"),("abc","")]
-- The list is there until I try to chop its head!
My questions:
What is happening here?
How can I extract a single (preferably longest) parse result?
P.S. My goal is to construct a parser combinator that greedily returns the repetitive application of a given parser. (get in this instance, but in actuality I have a more involved logic.) Chopping the list of intermediate results is one approach I thought would do, but I am fine with any, except that it is preferable not to convert to ReadS and back.
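A likely explanation, offered as a sketch rather than a definitive answer: the first result of `many get` is the parse with zero applications, whose value is the empty list, so `fmap head` applies `head` to `[]` as soon as that first result is printed. For the greedy combinator, ReadP's left-biased choice `(<++)` can be used, since it only tries its right argument when the left one yields no result at all. The name `greedy` below is a hypothetical helper, not part of the library, and it assumes the argument parser consumes input whenever it succeeds (otherwise it loops).

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical helper (not part of ReadP): apply p as often as possible
-- and keep only the longest run. (<++) falls back to `return []` only
-- when one more application of p produces no result.
greedy :: ReadP a -> ReadP [a]
greedy p = ((:) <$> p <*> greedy p) <++ return []

main :: IO ()
main = do
  print (readP_to_S (greedy get) "abc")                 -- [("abc","")]
  print (readP_to_S (greedy (satisfy isDigit)) "12ab")  -- [("12","ab")]
```

This stays inside ReadP, so there is no conversion to ReadS and back.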

Related

Haskell: processing a sequence of IO actions, filtering their results in real time and performing some IO actions at certain moments

I want to process an infinite sequence of IO actions, filtering their results in real time and performing some IO actions at certain moments.
We have a function for reducing sequences (see my question "haskell elegant way to filter (reduce) sequences of duplicates from infinte list of numbers"):
f :: Eq a => [a] -> [a]
f = map head . group
and expression
join $ sequence <$> ((\l -> (print <$> l)) <$> (f <$> (sequence $ replicate 6 getLine)))
If we run this, the user can enter any sequence of numbers, for example:
1
2
2
3
3
"1"
"2"
"3"
[(),(),()]
This means that all the getLine actions are performed first (6 times in the example), and only at the end are the IO actions for the filtered list performed. But I want each IO action to be performed at exactly the moment when the reduction of a subsequence of equal numbers is finished.
How can I achieve this output:
1
2
"1"
2
3
"2"
3
3
"3"
[(),(),()]
So I want this expression not to hang:
join $ sequence <$> ((\l -> (print <$> l)) <$> (f <$> (sequence $ repeat getLine)))
How can I achieve real-time output as described above without blocking on infinite lists?
Without a 3rd-party library, you can lazily read the contents of standard input, appending a dummy string to the end of the expected input to force output. (There's probably a better solution that I'm stupidly overlooking.)
import System.IO
print_unique :: (String, String) -> IO ()
print_unique (last, current)
  | last == current = return ()
  | otherwise       = print last
main = do
  contents <- take 6 <$> lines <$> hGetContents stdin
  traverse print_unique (zip <*> tail $ (contents ++ [""]))
zip <*> tail produces tuples consisting of the ith and i+1st lines without blocking. print_unique then immediately outputs a line if the following line is different.
Essentially, you are sequencing the output actions as the input is executed, rather than sequencing the input actions.
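The pairing trick can be tried out on plain lists, without any IO. The following is a small sketch; `pairs` and `changes` are hypothetical names introduced only for illustration.

```haskell
-- Pair every element with its successor. In the function applicative,
-- (zip <*> tail) xs is the same as zip xs (tail xs).
pairs :: [a] -> [(a, a)]
pairs = zip <*> tail

-- Keep an element only when the next one differs; the dummy "" at the
-- end makes sure the last run is reported too.
changes :: [String] -> [String]
changes xs = [prev | (prev, next) <- pairs (xs ++ [""]), prev /= next]

main :: IO ()
main = do
  print (pairs [1, 2, 3 :: Int])               -- [(1,2),(2,3)]
  print (changes ["1", "2", "2", "3", "3"])    -- ["1","2","3"]
```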
This seems like a job for a streaming library, like streaming.
{-# LANGUAGE ImportQualifiedPost #-}
module Main where
import Streaming
import Streaming.Prelude qualified as S
main :: IO ()
main =
  S.mapM_ print
    . S.catMaybes
    . S.mapped S.head
    . S.group
    $ S.replicateM 6 getLine
"streaming" has an API reminiscent of that of lists, but it works with effectful sequences.
The nice thing about streaming's version of group is that it doesn't force you to keep the whole group in memory if it isn't needed.
The least intuitive function in this answer is mapped, because it's very general. It's not obvious that streaming's version of head fits as its parameter. The key idea is that the Stream type can represent both normal effectful sequences, and sequences of elements on which groups have been demarcated. This is controlled by changing a functor type parameter (Of in the first case, a nested Stream (Of a) m in the case of grouped Streams).
mapped lets you transform that functor parameter while having some effect in the underlying monad (here IO). head processes the inner Stream (Of a) m groups, getting us back to an Of (Maybe a) functor parameter.
I found a nice solution with iterateUntilM
iterateUntilM (\_->False) (\pn -> getLine >>= (\n -> if n==pn then return n else (if pn/="" then print pn else return ()) >> return n) ) ""
I don't like the verbosity of this part:
(if pn/="" then print pn else return ())
If you know how to shorten it, please comment.
P.S.
Notably, I once made a video about this function :)
And still could not immediately apply it :(

Why is sequence [getLine, getLine, getLine] not evaluated lazily?

main = do
  input <- sequence [getLine, getLine, getLine]
  mapM_ print input
Let's see this program in action:
m#m-X555LJ:~$ runhaskell wtf.hs
asdf
jkl
powe
"asdf"
"jkl"
"powe"
Surprisingly to me, there seems to be no laziness here. Instead, all 3 getLines are evaluated eagerly, the read values are stored in memory and then, not before, all are printed.
Compare to this:
main = do
  input <- fmap lines getContents
  mapM_ print input
Let's see this in action:
m#m-X555LJ:~$ runhaskell wtf.hs
asdf
"asdf"
lkj
"lkj"
power
"power"
Totally different stuff. Lines are read one by one and printed one by one. Which is odd to me because I don't really see any differences between these two programs.
From LearnYouAHaskell:
When used with I/O actions, sequenceA is the same thing as sequence!
It takes a list of I/O actions and returns an I/O action that will
perform each of those actions and have as its result a list of the
results of those I/O actions. That's because to turn an [IO a] value
into an IO [a] value, to make an I/O action that yields a list of
results when performed, all those I/O actions have to be sequenced so
that they're then performed one after the other when evaluation is
forced. You can't get the result of an I/O action without performing
it.
I'm confused. I don't need to perform ALL IO actions to get the results of just one.
A few paragraphs earlier the book shows a definition of sequence:
sequenceA :: (Applicative f) => [f a] -> f [a]
sequenceA [] = pure []
sequenceA (x:xs) = (:) <$> x <*> sequenceA xs
Nice recursion; nothing here hints to me that this recursion should not be lazy. Just like in any other recursion, to get the head of the returned list Haskell doesn't have to go down through ALL steps of the recursion!
Compare:
rec :: Int -> [Int]
rec n = n:(rec (n+1))
main = print (head (rec 5))
In action:
m#m-X555LJ:~$ runhaskell wtf.hs
5
m#m-X555LJ:~$
Clearly, the recursion here is performed lazily, not eagerly.
Then why is the recursion in the sequence [getLine, getLine, getLine] example performed eagerly?
As to why it is important that IO actions are run in order
regardless of the results: Imagine an action createFile :: IO () and
writeToFile :: IO (). When I do a sequence [createFile,
writeToFile] I'd hope that they're both done and in order, even
though I don't care about their actual results (which are both the
very boring value ()) at all!
I'm not sure how this applies to this Q.
Maybe I'll word my Q this way...
In my mind this:
do
  input <- sequence [getLine, getLine, getLine]
  mapM_ print input
should desugar to something like this:
do
  input <- do
    input <- concat ( map (fmap (:[])) [getLine, getLine, getLine] )
    return input
  mapM_ print input
Which, in turn, should desugar to something like this (pseudocode, sorry):
do
  [ perform print on the result of getLine,
    perform print on the result of getLine,
    perform print on the result of getLine
  ] and discard the results of those prints, since print was applied with mapM_, which discards the results, unlike mapM
getContents is lazy, getLine isn't. Lazy IO isn't a feature of Haskell per se, it's a feature of some particular IO actions.
I'm confused. I don't need to perform ALL IO actions to get the results of just one.
Yes you do! That is one of the most important features of IO, that if you write a >> b or equivalently,
do a
   b
then you can be sure that a is definitely "run" before b (see footnote). getContents is actually the same: it "runs" before whatever comes after it, but the result it returns is a sneaky value that does more IO when you try to evaluate it. That is the surprising bit, and it can lead to some very interesting results in practice (such as the file you're reading being deleted or changed while you're still processing the results of getContents). In practical programs you probably shouldn't use it; it mostly exists for convenience in programs where you don't care about such things (code golf, throwaway scripts, or teaching, for instance).
As to why it is important that IO actions are run in order regardless of the results: Imagine an action createFile :: IO () and writeToFile :: IO (). When I do a sequence [createFile, writeToFile] I'd hope that they're both done and in order, even though I don't care about their actual results (which are both the very boring value ()) at all!
Addressing the edit:
should desugar to something like this:
do
  input <- do
    input <- concat ( map (fmap (:[])) [getLine, getLine, getLine] )
    return input
  mapM_ print input
No, it actually turns into something like this:
do
  input <- do
    x <- getLine
    y <- getLine
    z <- getLine
    return [x,y,z]
  mapM_ print input
The actual definition of sequence is more or less this:
sequence [] = return []
sequence (a:as) = do
  x <- a
  fmap (x:) $ sequence as
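If the aim is the interleaved behaviour the question expected (each line printed as soon as it is read), one way to get it without lazy IO is to sequence read-then-print pairs instead of sequencing all the reads first. A minimal sketch: `readEach` is a hypothetical helper, written against any Monad only so that the ordering can also be checked without real input.

```haskell
import Control.Monad (replicateM_)

-- Run "read one value, then handle it" n times, so each value is
-- handled before the next read happens.
readEach :: Monad m => Int -> m a -> (a -> m ()) -> m ()
readEach n readOne handle = replicateM_ n (readOne >>= handle)

main :: IO ()
main = readEach 3 getLine print
```

Run interactively, this echoes every line right after it is typed, unlike `sequence [getLine, getLine, getLine]` followed by `mapM_ print`.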
Technically, in
sequenceA (x:xs) = (:) <$> x <*> sequenceA xs
we find <*>, which first runs the action on the left, then the action on the right, and finally applies their results together. This is what makes the first effect in the list occur first, and so on.
Indeed, on monads, f <*> x is equivalent to
do theF <- f
   theX <- x
   return (theF theX)
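The left-to-right ordering can be observed without real IO. In this sketch the pair applicative from base (a log paired with a result) stands in for an effectful action; `sequenceA'` repeats the definition quoted in the question and `step` is a hypothetical helper.

```haskell
-- Same definition as the sequenceA quoted from the book.
sequenceA' :: Applicative f => [f a] -> f [a]
sequenceA' []     = pure []
sequenceA' (x:xs) = (:) <$> x <*> sequenceA' xs

-- Stand-in for an action: the first component logs the "effect",
-- the second component is the result.
step :: String -> Int -> ([String], Int)
step name result = ([name], result)

main :: IO ()
main = print (sequenceA' [step "first" 1, step "second" 2, step "third" 3])
-- prints (["first","second","third"],[1,2,3])
```

All three effects appear in the log, in list order, even if only the first result is needed afterwards.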
More generally, note that IO actions are executed in order, first to last (see below for a few rare exceptions). Doing IO in a completely lazy way would be a nightmare for the programmer. For instance, consider:
do let aX = print "x" >> return 4
       aY = print "y" >> return 10
   x <- aX
   y <- aY
   print (x+y)
Haskell guarantees that the output is "x", "y", 14, in that order. If we had completely lazy IO we could also get "y", "x", 14, depending on which argument is forced first by +. In that case, we would need to know exactly the order in which the lazy thunks are demanded by every operation, which is something the programmer definitely does not want to care about. Under such detailed semantics, x + y is no longer equivalent to y + x, breaking equational reasoning in many cases.
Now, if we wanted to force IO to be lazy we could use one of the forbidden functions, e.g.
import System.IO.Unsafe (unsafeInterleaveIO)
do let aX = unsafeInterleaveIO (print "x" >> return 4)
       aY = unsafeInterleaveIO (print "y" >> return 10)
   x <- aX
   y <- aY
   print (x+y)
The above code makes aX and aY lazy IO actions, and the order of the output is now at the whim of the compiler and the library implementation of +. This is in general dangerous, hence the unsafeness of lazy IO.
Now, about the exceptions. Some IO actions which only read from the environment, like getContents, were implemented with lazy IO (unsafeInterleaveIO). The designers felt that for such reads, lazy IO can be acceptable, and that the precise timing of the reads is not that important in many cases.
Nowadays, this is controversial. While it can be convenient, lazy IO can be too unpredictable in many cases. For instance, we can't know where the file will be closed, and that could matter if we're reading from a socket. We also need to be very careful not to force the reads too early: that often leads to a deadlock when reading from a pipe. Today, it is usually preferred to avoid lazy IO, and resort to some library like pipes or conduit for "streaming"-like operations, where there is no ambiguity.

String variable as Haskell command

When I do the following, it works:
print [1..5]
and the result is [1,2,3,4,5].
But why does the following not work?
let x = "[1..5]"
print x
I want to process a string variable as a Haskell command. Can someone please help me with this?
Note that your second example:
let x = "[1..5]"
print x
works just fine, it just says something different than you intended.
If you wish to consider some string as a valid Haskell expression then you'll need to interpret that string via some Haskell interpreter. The most common interpreter is accessed via the ghc-api. A clean wrapper for the ghc-api is the hint package.
A simple example of using hint is (via ghci):
import Language.Haskell.Interpreter
let x = "[1..5]"
Right result <- runInterpreter $ setImports ["Prelude"] >> eval x
print result
The above code will:
Import an Interpreter module from the hint package
Set a string, x, which is the expression you desire to evaluate
Run the interpreter on the expression
Print the result (which is already a string, so you might prefer putStrLn result).
If you just want to get a list as a string, take advantage of the show function
let x = show [1..5]
print x
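Going the other way, a string produced by show can be parsed back with `readMaybe` from `Text.Read`, but this only understands literal syntax, not arbitrary expressions such as a range. A minimal sketch; `parseInts` is a hypothetical helper name.

```haskell
import Text.Read (readMaybe)

-- Parse a list literal of Ints; Nothing means the string is not a literal.
parseInts :: String -> Maybe [Int]
parseInts = readMaybe

main :: IO ()
main = do
  print (parseInts (show [1 .. 5 :: Int]))  -- Just [1,2,3,4,5]
  print (parseInts "[1..5]")                -- Nothing
```

So evaluating "[1..5]" itself still needs an interpreter, as described above.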
Your first example works because [1..5] is an ordinary Haskell expression: Haskell evaluates it to produce the list [1,2,3,4,5] and passes this to the print function. In the second example the same characters are just the contents of a string.
It looks like you're looking for System.Eval.Haskell.eval or one of its variants. In this case, I believe that
import System.Eval.Haskell
do x <- eval "[1..5]" [] :: IO (Maybe [Int])
   putStrLn (maybe "" show x)
will do what you want.

Outputting Haskell GHCi command results to a txt file

I am new to Haskell.
I am having a really difficult time outputting command results from GHCi to a file. I was wondering if someone can give me a simple explanation on how to do this? The examples I have found online so far seem over complicated.
This post on Reddit describes how to colorize your GHCi output (GHC >= 7.6). Instead of a prettyprinter, you could specify a logging function. For example, add the following to your .ghci.conf:
:{
let logFile = "/home/david/.ghc/ghci.log"
    maxLogLength = 1024 -- max length of a single write
    logPrint x = appendFile logFile (take maxLogLength (show x) ++ "\n") >> print x
:}
:set -interactive-print=logPrint
This will log GHCi's output to ghci.log.
The directory for the log file must already exist; appendFile creates the file itself if it is missing, but not its parent directory.
It has to fit in a let statement, otherwise GHCi will reject it. Use :{ :} to add multiline support in GHCi.
Apparently, using :l gets rid of all imports you've made in your ghci.conf, therefore you're limited to Prelude functions. The Reddit post mentions that you can somehow redefine :l, but I don't know anything about that. (If you know how to do this, you can of course automatically generate the logfile if it doesn't exist.)
Let's suppose you have a function mungeData and you do
ghci> mungeData [1..5]
[5,2,5,2,4,6,7,4,6,78,4,7,5,3,57,7,4,67,4,6,7,4,67,4]
writeFile
You can write this to file like this:
ghci> writeFile "myoutput.txt" (show (mungeData [1..5]))
I'd be inclined to write
ghci> writeFile "myoutput.txt" $ show $ mungeData [1..5]
to get rid of a few brackets.
Reading it back in
You could get that back using
ghci> fmap (read::String -> [Int]) $ readFile "myoutput.txt"
One number per line
You could output it a line per number like this:
ghci> writeFile "myoutput'.txt" $ unlines.map show $ mungeData [1..5]
which reads back in as
ghci> fmap (map read.lines::String -> [Int]) $ readFile "myoutput'.txt"
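The one-number-per-line format can be checked as a round trip. This is a small sketch; `encodeLines`, `decodeLines` and the file name are hypothetical, chosen only for illustration.

```haskell
-- One number per line, as in the writeFile example above.
encodeLines :: [Int] -> String
encodeLines = unlines . map show

-- Inverse of encodeLines; read fails at runtime on a malformed line.
decodeLines :: String -> [Int]
decodeLines = map read . lines

main :: IO ()
main = do
  let path = "numbers.txt"  -- hypothetical file name
  writeFile path (encodeLines [5, 2, 78])
  contents <- readFile path
  print (decodeLines contents)  -- [5,2,78]
```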

Unable to understand a mutual recursion

I am reading Programming In Haskell, in the 8th chapter, the author gives an example of writing parsers.
The full source is here: http://www.cs.nott.ac.uk/~gmh/Parsing.lhs
I can't understand the following part: many permits zero or more applications of p,
whereas many1 requires at least one successful application:
many :: Parser a -> Parser [a]
many p = many1 p +++ return []
many1 :: Parser a -> Parser [a]
many1 p = do v <- p
             vs <- many p
             return (v : vs)
How the recursive call happens at
vs <- many p
vs is the result value of many p, but many p calls many1 p, and all many1 has in its definition is a do block, which again binds v and vs. When does the recursive call return?
Why can the following snippet return [("123","abc")]?
> parse (many digit) "123abc"
[("123", "abc")]
The recursion stops at the v <- p line. The monadic behavior of the Parser will just propagate a [] to the end of the computation when p cannot be parsed anymore.
p >>= f = P (\inp -> case parse p inp of
                       [] -> [] -- this line here does not call f
                       [(v,out)] -> parse (f v) out)
The second function is written in do-notation, which is just a nice syntax for the following:
many1 p = p >>= (\v -> many p >>= (\vs -> return (v : vs)))
If parsing p produces an empty list [] the function \v -> many p >>= (\vs -> return (v : vs)) will not be called, stopping the recursion.
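To watch the recursion stop, here is a self-contained miniature in the spirit of the book's parser. It is a sketch, not the book's exact source: the instances follow current GHC conventions, and the failing branch of `digit` is written out by hand.

```haskell
import Data.Char (isDigit)

newtype Parser a = P { parse :: String -> [(a, String)] }

instance Functor Parser where
  fmap f p = p >>= (return . f)

instance Applicative Parser where
  pure v    = P (\inp -> [(v, inp)])
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad Parser where
  p >>= f = P (\inp -> case parse p inp of
                         []           -> []   -- failure: f is never called
                         (v, out) : _ -> parse (f v) out)

-- Choice: use q only if p fails.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = P (\inp -> case parse p inp of
                       [] -> parse q inp
                       rs -> rs)

item :: Parser Char
item = P (\inp -> case inp of
                    []     -> []
                    x : xs -> [(x, xs)])

digit :: Parser Char
digit = do
  c <- item
  if isDigit c then return c else P (const [])

many :: Parser a -> Parser [a]
many p = many1 p +++ return []

many1 :: Parser a -> Parser [a]
many1 p = do
  v  <- p
  vs <- many p
  return (v : vs)

main :: IO ()
main = do
  print (parse (many digit) "123abc")   -- [("123","abc")]
  print (parse (many1 digit) "abc")     -- []
```

On "abc" the first `digit` fails, so the rest of `many1` never runs and `many` falls back to `return []`.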
For the last question:
> parse (many digit) "123abc"
[("123", "abc")]
This means that parsing was successful, as at least one result has been returned in the answer list. Hutton parsers always return a list, and the empty list means parsing failure.
The result ("123", "abc") means that parsing has found three digits "123" and stopped at 'a' which is not a digit - so the "rest of the input" is "abc".
Note that many means "as many as possible", not "one or more". If it were "one or more" you'd get this result instead:
[("1", "23abc"), ("12", "3abc"), ("123", "abc")]
This behaviour wouldn't be very good for deterministic parsing, though it might sometimes be needed for natural language parsing.
Let me strip this down to the barest bones to make absolutely clear why do-blocks can be misunderstood if they're read simply as imperative code. Consider this snippet:
doStuff :: Maybe Int
doStuff = do
  a <- Nothing
  doStuff
It looks like doStuff will recurse forever, after all, it's defined to do a sequence of things ending with doStuff. But the sequence of lines in a do-block is not simply a sequence of operations that is performed in order. If you're at a point in a do-block, the way the rest of the block is handled is determined by the definition of >>=. In my example, the second argument to >>= is only used if the first argument isn't Nothing. So the recursion never happens.
Something similar can happen in many different monads. Your example is just a little more complex: when there are no more ways to parse something, the stuff after the >>= is ignored.
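Written without do-notation, the Maybe example looks like this. A sketch: `bindMaybe` is a hypothetical stand-in that mirrors how >>= behaves for Maybe.

```haskell
-- Mirrors >>= for Maybe: on Nothing the continuation is never used.
bindMaybe :: Maybe a -> (a -> Maybe b) -> Maybe b
bindMaybe Nothing  _ = Nothing
bindMaybe (Just x) k = k x

-- The do-block from above, desugared by hand.
doStuff :: Maybe Int
doStuff = (Nothing :: Maybe ()) `bindMaybe` \_ -> doStuff

main :: IO ()
main = print doStuff  -- Nothing
```

Because the first case of `bindMaybe` ignores its second argument, the recursive reference to `doStuff` is never evaluated.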
