how to parse a uniprot-file with parsec? - haskell

I am a newbie to Haskell, but it seems like a powerful language that I want to learn. I was adapting some code from the Parsec chapter of Real World Haskell, and tried to make my own version that parses the content of a UniProt file. This is a file that consists of records (each starting with ">"), where each record consists of lines. My code seems very close to the example, but I am getting a lot of errors, mostly about types. Among other things, I suspect the problem is that I am passing the output of readFile (an IO String) where a String is expected. I would appreciate it if someone could help me understand what is wrong with my approach.
import Text.ParserCombinators.Parsec
main :: IO ()
parseSprot :: IO String -> Either ParseError [[String]]
parseSprot input = parse uniprotFile "(unknown)" input
    where
        uniprotFile = endBy record eol
        record = sepBy lines (char '>')
        lines = many (noneOf ",\n")
        eol = char '\n'
main = do
    parseSprot $ readFile "uniprot_sprot.fasta"
    putStrLn "hey"

parseSprot doesn't need an IO in its signature.
parseSprot :: String -> Either ParseError [[String]]
...
The result of readFile is an IO String. You can do something with this String by binding the result of the readFile action into a new IO action. In do notation you can bind the result to a variable with <-
main = do
    fileContents <- readFile "uniprot_sprot.fasta"
The parseSprot function doesn't return a result in IO, so you can use it anywhere. In do notation, different syntax distinguishes a result bound to a variable from a declaration: x <- ... binds the result of an action to a variable, while let x = ... declares x to be whatever is on the right-hand side.
main = do
    fileContents <- readFile "uniprot_sprot.fasta"
    let parsedContents = parseSprot fileContents
To test what your parser is doing, you might want to print the value returned from parse.
main = do
    fileContents <- readFile "uniprot_sprot.fasta"
    let parsedContents = parseSprot fileContents
    print parsedContents
Without do notation you can write this as
main = readFile "uniprot_sprot.fasta" >>= print . parseSprot
>>= takes the result of the first computation and feeds it into a function to decide what to do next.
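Putting the pieces together, here is a rough end-to-end sketch: the parser stays pure and only main deals with IO. This is not the asker's grammar. The record shape (a '>' header line followed by non-empty sequence lines, each ending in a newline) and the names record, sequenceLine and parseRecords are assumptions made for illustration, and the sample string stands in for the contents of the file.

```haskell
import Text.ParserCombinators.Parsec

-- One record: a '>' header line followed by zero or more sequence lines.
-- (Assumed grammar, for illustration only.)
record :: Parser (String, [String])
record = do
  _      <- char '>'
  header <- many (noneOf "\n")
  _      <- newline
  body   <- many sequenceLine
  return (header, body)

-- A non-empty line that does not start a new record.
sequenceLine :: Parser String
sequenceLine = do
  first <- noneOf ">\n"
  rest  <- many (noneOf "\n")
  _     <- newline
  return (first : rest)

-- Pure: the argument is a String, not an IO String.
parseRecords :: String -> Either ParseError [(String, [String])]
parseRecords = parse (many record <* eof) "(unknown)"

main :: IO ()
main = do
  -- Stand-in for: fileContents <- readFile "uniprot_sprot.fasta"
  let fileContents = ">sp|P1|DEMO\nMKV\nLLA\n>sp|P2|DEMO\nGGS\n"
  print (parseRecords fileContents)
```

Swapping the let line for the readFile binding shown above gives the file-based version; a real file whose last line lacks a trailing newline would need a slightly more forgiving sequenceLine.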

Related

Read from file or from command line arguments haskell

I am designing the IO for a program I have written in Haskell. I would like to be able to read some arguments from file using -f or from the command line by default. The command line arguments get split up into convenient chunks meaning I do not need to parse them. However this does not happen when reading from file. So I wrote a simple little parser to do what I wanted.
parseFile :: String -> [String]
parseFile [] = [""]
parseFile ('"':xs) = parseString xs '"'
parseFile ('\'':xs) = parseString xs '\''
parseFile (' ':xs) = "":parseFile xs
parseFile ('\t':xs) = "":parseFile xs
parseFile ('\n':xs) = "":parseFile xs
parseFile (s:xs) = (\(a:xa)->(a++[s]):xa)$ parseFile xs
parseString :: String -> Char -> [String]
parseString (s:xs) a
    | s == a = "":parseFile xs
    | otherwise = (\(a:xa)->(a++[s]):xa)$ parseString xs a
I thought this would be pretty simple I would do something like
let myInput = if (flagDetected) then (parseFile $ readFile $ last args) else (args)
However, the result of readFile is an IO action and not a string, so I cannot parse it. I've tried a number of configurations, all of which have failed, mostly for typing reasons. I've tried assigning the results before parsing, which resulted in a type mismatch between args, which is a [[Char]] (it is the result of using getArgs from the System.Environment module), and the result of readFile, which is still an IO String.
I tried wrapping args with a return which of course doesn't fix this problem because of a type mismatch. I'm really at a loss of ideas now. I feel like this is probably a problem with the way I am thinking about the issue.
How can I get the behavior I desire?
Here's a related question I asked earlier. The root of the problem is the same but the answers on the last one were far too specific to help me here. It seems this is a frequent problem for me.
You need to invert the control flow of your data. Haskell's IO type exists to manage the effects of I/O operations. Reading a string from a file is not guaranteed to succeed: the file might not exist, might be empty, and so on. For that reason main is an IO action. To "extract" the value wrapped in an IO, use a do block with <- and return:
inputIO = if flagDetected
    then do
        cnt <- readFile path
        return (parseFile cnt)
    else return args
main = do
    input <- inputIO
    yourProgram input  -- assuming yourProgram :: [String] -> IO ()
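A self-contained variant of the same idea might look like the sketch below. It makes a few assumptions that are not in the question: the flag is recognised only when the arguments are exactly -f followed by a path, splitArgs is a simplified stand-in for parseFile (it ignores quoting), and loadArgs is a hypothetical helper name. The key point is that both branches return IO [String], so the types line up.

```haskell
import System.Environment (getArgs)

-- Simplified stand-in for the question's parseFile: split on whitespace.
splitArgs :: String -> [String]
splitArgs = words

-- Both clauses produce IO [String], so callers treat them uniformly.
loadArgs :: [String] -> IO [String]
loadArgs ["-f", path] = splitArgs <$> readFile path  -- read arguments from a file
loadArgs args         = return args                  -- use the command line as-is

main :: IO ()
main = do
  args  <- getArgs
  input <- loadArgs args
  mapM_ putStrLn input
```

Wrapping the plain args with return is what makes the second clause match the type of the first; the wrapping happens on the pure side, rather than trying to unwrap the IO side.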

Read a single int from a file and print it

How do I create a program that reads a line from a file, parses it as an Int and prints it (ignoring exceptions, of course)? Is there anything like "read" but for IO String?
I've got this so far but I couldn't get around the IO types:
readFromFile = do
    inputFile <- openFile "catalogue.txt" ReadMode
    isbn <- read( hGetLine inputFile)
    hClose inputFile
You can specify the type explicitly, change the read line to
isbn <- fmap read (hGetLine inputFile) :: IO Int
As hGetLine inputFile is of type IO String, you should use fmap to get "inside" to read as an Int.
You can use the readFile function to convert your file to a string.
main = do
    contents <- readFile "theFile"
    let value = read $ head $ lines contents :: Int
    print value
You should add better error detection, or this program will fail if there isn't a first line, or if the value is malformed, but this is the basic flow....
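One possible way to add that error detection is sketched below, using readMaybe from Text.Read and listToMaybe from Data.Maybe instead of read and head. The helper name firstInt and the error message are made up for this example, and a missing file would still raise an exception from readFile.

```haskell
import Data.Maybe (listToMaybe)
import Text.Read (readMaybe)

-- Parse the first line of the input as an Int.
-- Nothing if there is no first line or it is not a valid Int.
firstInt :: String -> Maybe Int
firstInt contents = listToMaybe (lines contents) >>= readMaybe

main :: IO ()
main = do
  contents <- readFile "theFile"
  case firstInt contents of
    Just n  -> print n
    Nothing -> putStrLn "no valid integer on the first line"
```

Here >>= is used in the Maybe monad: readMaybe only runs if listToMaybe found a first line.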
First, observe that reading stuff and then immediately printing it can result in mysterious errors:
GHCi, version 8.0.0.20160421: http://www.haskell.org/ghc/ :? for help
Prelude λ read "123"
*** Exception: Prelude.read: no parse
The reason is that you don't specify what type you want to read. You can counter this by using type annotations:
Prelude λ read "123" :: Integer
123
but it is sometimes easier to introduce a little helper function:
Prelude λ let readInteger = read :: String -> Integer
Prelude λ readInteger "123"
123
Now to the main problem. read (hGetLine inputFile) doesn't work because hGetLine inputFile returns an IO String and read needs a String. This can be solved in two steps:
line <- hGetLine inputFile
let isbn = readInteger line
Note two different constructs <- and let .. =, they do different things. Can you figure out exactly what?
As shown in another answer, you can do it in a less verbose manner:
isbn <- fmap readInteger (hGetLine inputFile)
which is great if you do a simple thing like read. But it is often desirable to explicitly name intermediate results. You can use <- and let .. = constructs in such cases.

Read in multiple lines from standard input with arguments in Haskell

I'm trying to read in multiple lines from standard input in Haskell, plus one argument, then do something with the current line and write something to the standard output.
In my case I am trying to normalize lambda expressions. The program may receive 1 or more lambda expressions to normalize and then it has to write the result (normalized form or error) to the standard output. And the program may receive an argument (the max number of reductions). Here is the main function:
main :: IO ()
main = do
    params <- getArgs
    fullLambda <- getLine
    let lambda = convertInput fullLambda
    let redNum | (length params) == 1 = read (head params)
               | otherwise = 100
    case (parsing lambda) of
        Left errorExp -> putStrLn ("ERROR: " ++ lambda)
        Right lambdaExp -> do
            let normalizedLambdaExp = reduction lambdaExp redNum
            if (isNormalForm normalizedLambdaExp) && (isClosed lambdaExp)
                then putStrLn ("OK: " ++ show normalizedLambdaExp)
                else putStrLn ("ERROR: " ++ lambda)
    where
        convertInput :: String -> String
        convertInput ('\"':xs) = take ((length xs) - 2) xs
        convertInput input = input
So this code handles one line and completes the reductions and then writes something to the standard output. How can I change this to handle multiple lines? I've read about replicateM but I can't seem to grasp it. My mind is very OO so I was thinking maybe some looping somehow, but that is surely not the preferred way.
Also, this program has to be able to run like this:
echo "(\x.x) (\x.x)" | Main 25
And will produce:
OK: (\x.x)
And if there are multiple lines, it has to produce the same kind of output for each line, in new lines.
But also has to work without the argument, and has to handle multiple lines. I spent time on google and here, but I'm not sure how the argument reading will happen. I need to read in the argument once and the line(s) once or many times. Does someone know a not too lengthy solution to this problem?
I've tried it like this, too (imperatively):
main :: IO ()
main = do
    params <- getArgs
    mainHelper params
    main
mainHelper :: [String] -> IO ()
mainHelper params = do
    fullLambda <- getLine
And so on, but then it puts this to the standard output as well:
Main: <stdin>: hGetLine: end of file
Thank you in advance!
It appears you want to:
Parse a command line option which may or may not exist.
For each line of input process it with some function.
Here is an approach using lazy IO:
import System.Environment
import Control.Monad
main = do args <- getArgs
          let option = case args of
                         [] -> ... the default value...
                         (a:_) -> read a
          contents <- getContents
          forM_ (lines contents) $ \aline -> do
            process option aline
I am assuming your processing function has type process :: Int -> String -> IO (). For instance, it could look like:
process :: Int -> String -> IO ()
process option str = do
    if length str < option
        then putStrLn $ "OK: " ++ str
        else putStrLn $ "NOT OK: line too long"
Here's how it works:
contents <- getContents reads all of standard input into the variable contents
lines contents breaks up the input into lines
forM_ ... iterates over each line, passing the line to the process function
The trick is that getContents reads standard input lazily so that you'll get some output after each line is read.
You should be aware that there are issues with lazy IO which you may run into when your program becomes more complex. However, for this simple use case lazy IO is perfectly fine and works well.
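For reference, a complete version of this approach could look like the sketch below. The default of 100 is borrowed from the question's code; parseLimit and report are hypothetical names, and report is only a placeholder for the real lambda normalizer. Falling back to the default when the argument is not a number is a design choice of this sketch, not something the question specifies.

```haskell
import Control.Monad (forM_)
import System.Environment (getArgs)
import Text.Read (readMaybe)

-- Take the limit from the first argument, falling back to 100
-- when the argument is missing or not a number.
parseLimit :: [String] -> Int
parseLimit (a:_) | Just n <- readMaybe a = n
parseLimit _ = 100

-- Placeholder per-line worker standing in for the lambda normalizer.
report :: Int -> String -> String
report limit line
  | length line < limit = "OK: " ++ line
  | otherwise           = "NOT OK: line too long"

main :: IO ()
main = do
  limit    <- parseLimit <$> getArgs   -- the argument is read once
  contents <- getContents              -- lazily reads all of stdin
  forM_ (lines contents) (putStrLn . report limit)
```

Because the loop runs over lines contents rather than calling getLine repeatedly, it simply stops at end of input instead of raising the hGetLine: end of file error.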

First Haskell IO program isn't working

Sorry, this is probably really dumb, but can someone explain to me why this program doesn't compile? I get Couldn't match expected type 'a1 -> String' with actual type 'IO String'.
import System.Environment
main = do
    [first, last] <- getArgs
    firstnames <- lines . readFile "firstnames_male"
    lastnames <- lines . readFile "lastnames"
    print firstnames
You can't do lines . readFile "lastnames".
The readFile function returns an IO String, not a String.
You can, however, use the fmap function (or the <$> operator) to achieve this:
main = do
    [first, last] <- getArgs
    firstnames <- lines `fmap` readFile "firstnames_male"
    ...
This works because IO is a functor.
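The small sketch below compares the spellings that work. To keep it runnable without any files, fakeReadFile is a made-up stand-in for readFile "firstnames_male" that returns a fixed string.

```haskell
-- Made-up stand-in for readFile, so the sketch runs without files.
fakeReadFile :: IO String
fakeReadFile = return "Anna\nBen\nCleo\n"

main :: IO ()
main = do
  viaFmap     <- fmap lines fakeReadFile   -- prefix form
  viaOperator <- lines <$> fakeReadFile    -- operator form of fmap
  contents    <- fakeReadFile              -- bind first, then apply purely
  let viaLet = lines contents
  print (viaFmap == viaOperator && viaOperator == viaLet)
```

All three produce the same list. The original lines . readFile "lastnames" fails because (.) composes two functions, and readFile "lastnames" is an IO String rather than a function.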

Haskell Let/In in main function

My code:
import System.IO
main :: IO ()
main = do
    inFile <- openFile "file.txt" ReadMode
    content <- hGetContents inFile
    let
        someValue = someFunction(content)
      in
        print(anotherFunction(someValue))
        print(anotherFunction2(someValue))
    hClose inFile
My error:
- Type error in application
*** Expression : print (anotherFunction2(someValue))
*** Term : print
*** Type : e -> IO ()
*** Does not match : a -> b -> c -> d
I need to print two or more lines with functions that require "someValue".
How can I fix it?
The cause of that error message is that when you write
let
    someValue = someFunction(content)
  in
    print(anotherFunction(someValue))
    print(anotherFunction2(someValue))
the two print statements are actually parsed as one:
print (anotherFunction (someValue)) print (anotherFunction2 (someValue))
In other words, it thinks the second print as well as (anotherFunction2 (someValue)) are also arguments to the first print. This is why it complains that e -> IO () (the actual type of print) does not match a -> b -> c -> d (a function taking three arguments).
You can fix this by adding a do after the in to make it parse the two statements as separate:
let
    someValue = someFunction(content)
  in do
    print(anotherFunction(someValue))
    print(anotherFunction2(someValue))
Though, it's better to use the do-notation form of let here, without any in:
import System.IO
main :: IO ()
main = do
    inFile <- openFile "file.txt" ReadMode
    content <- hGetContents inFile
    let someValue = someFunction content
    print (anotherFunction someValue)
    print (anotherFunction2 someValue)
    hClose inFile
I also got rid of some redundant parentheses in the above code. Remember, they are only used for grouping, not for function application in Haskell.
When you use let binding in a do block, don't use the in keyword.
main :: IO ()
main = do
    inFile <- openFile "file.txt" ReadMode
    content <- hGetContents inFile
    let someValue = someFunction(content)
    print(anotherFunction(someValue))
    print(anotherFunction2(someValue))
    hClose inFile
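To make the two forms of let concrete, here is a minimal sketch that needs no file; area is a hypothetical helper used only for illustration.

```haskell
-- Outside a do block, let is an expression and needs `in`.
area :: Int -> Int
area side =
  let squared = side * side
  in squared

main :: IO ()
main = do
  -- Inside a do block, let is a statement: there is no `in`, and the
  -- binding stays in scope for every statement that follows.
  let someValue = area 4
  print someValue
  print (someValue + 1)
```

The expression form scopes the binding over a single expression, which is why two print calls placed after in were read as one application in the question's code.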

Resources