How to read text files into a Haskell program?

I have a text file which contains two lists on each line. Each list can contain any number of alphanumeric arguments.
e.g. [t1,t2,...] [m1,m2,...]
I can read the file into ghc, but how can I read this into another main file and how can the main file recognise each argument separately to then process it?

I think it's best for you to figure out most of this for yourself, but I've got some pointers for you.
Firstly, try not to deal with the file access until you've got the rest of the code working, otherwise you might end up having IO all over the place. Start with some sample data:
sampleData = "[m1,m2,m3][x1,x2,x3,x4]\n[f3,f4,f5][y7,y8,y123]\n[m4,m5,m6][x5,x6,x7,x8]"
You should not mention sampleData anywhere else in your code, but you should use it in ghci for testing.
Once you have a function that does everything you want, e.g. processLists :: String -> [(String,String)], you can replace readFile "data.txt" :: IO String with
readInLists :: FilePath -> IO [(String,String)]
readInLists filename = fmap processLists (readFile filename)
If fmap makes no sense to you, you could read a tutorial I accidentally wrote.
If they really are alphanumeric, you can split them quite easily. Here are some handy functions, with examples.
tail :: [a] -> [a]
tail "(This)" = "This)"
You can use that to throw away something you don't want at the front of your string.
break :: (Char->Bool) -> String -> (String,String)
break (== ' ') "Hello Mum" = ("Hello"," Mum")
So break uses a test to find the first character of the second string, and breaks the string just before it.
Notice that the break character is still there at the front of the next string. span is the same but uses a test for what to have in the first list, so
span :: (Char->Bool) -> String -> (String,String)
span (/= ' ') "Hello Mum" = ("Hello"," Mum")
You can use these functions with things like (==','), or isAlphaNum (you'll have to import Data.Char at the top of your file to use it).
You might want to look at the functions splitWith and splitOn that I have in this answer. They're based on the definitions of split and words from the Prelude.
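To make the pointers above concrete, here is a minimal sketch of a processLists built only from break and drop (drop 1 stands in for tail so an empty string doesn't crash). It is one possible approach, not the answerer's own. It assumes each line holds exactly two bracketed lists and returns the items as lists, so its type is String -> [([String], [String])] rather than the [(String,String)] suggested above; readInLists is the answer's wrapper with its type adjusted to match.

```haskell
-- Parse the first bracketed list in a string:
-- "[m1,m2,m3][x1]" gives (["m1","m2","m3"], "[x1]").
-- Anything before the '[' (such as a separating space) is skipped.
parseList :: String -> ([String], String)
parseList s =
  let afterOpen      = drop 1 (dropWhile (/= '[') s) -- skip past the '['
      (inside, rest) = break (== ']') afterOpen       -- stop at the ']'
  in (splitCommas inside, drop 1 rest)                -- drop the ']' too

-- Split "m1,m2,m3" into ["m1","m2","m3"].
splitCommas :: String -> [String]
splitCommas "" = []
splitCommas s  =
  let (item, rest) = break (== ',') s
  in item : splitCommas (drop 1 rest)

-- Each line holds two lists, optionally separated by whitespace.
processLists :: String -> [([String], [String])]
processLists = map parseLine . lines
  where
    parseLine l =
      let (xs, rest) = parseList l
          (ys, _)    = parseList rest
      in (xs, ys)

-- The file-reading wrapper from the answer, with its type adjusted.
readInLists :: FilePath -> IO [([String], [String])]
readInLists filename = fmap processLists (readFile filename)
```

Because processLists is pure, you can try it on sampleData in ghci before touching any files.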

Related

How to read three consecutive integers from stdin in Haskell?

I want to read an input like 12 34 56 into three integers using Haskell.
For a single integer, one might use myInteger <- readLn. But for this case, I have not found any solution except first reading a line, then replacing all spaces with commas (using something like:
spaceToCommas str =
  let repl ' ' = ','
      repl c = c
  in map repl str
) and then calling read ("[" ++ str ++ "]"), which feels very hackish. Also, it does not let me state that I want to read three integers; it will attempt to read any number of integers from stdin.
There has to be a better way.
Note that I would like a solution that does not rely on external packages. Using e.g. Parsec is of course great, but this simple example should not require the use of a full-fledged Parser Combinator framework, right?
What about converting the string like:
convert :: Read a => String -> [a]
convert = map read . words
words splits the given string into a list of strings (the "words") and then we perform a read on every element using map.
and for instance use it like:
main = do
  line <- getLine
  let [a,b,c] = convert line :: [Int] in putStrLn (show (c,a,b))
or if you for instance want to read the first three elements and don't care about the rest (yes this apparently requires super-creativity skills):
main = do
  line <- getLine
  let (a:b:c:_) = convert line :: [Int] in putStrLn (show (c,a,b))
Here I return the tuple rotated one place to the right, just to show that the parsing worked.
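A variant that really does state "exactly three integers" is sketched below, assuming only base: readMaybe from Text.Read replaces read, so bad input produces Nothing instead of a runtime error, and the pattern Just [a, b, c] rejects lines with the wrong number of values. parseThree and readThree are illustrative names, not library functions.

```haskell
import Text.Read (readMaybe)

-- Exactly three whitespace-separated Ints, or Nothing.
parseThree :: String -> Maybe (Int, Int, Int)
parseThree s = case mapM readMaybe (words s) of
  Just [a, b, c] -> Just (a, b, c)
  _              -> Nothing

-- Read one line from stdin and parse it.
readThree :: IO (Maybe (Int, Int, Int))
readThree = fmap parseThree getLine
```

mapM readMaybe turns the list of words into Maybe [Int], failing as a whole if any single word is not a number.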

How to read line by line from a file in Haskell

I'm trying to make a program that reads a file line by line, checks whether each line is a palindrome, and prints it if it is.
I'm really new to Haskell, so the only thing I could do is print out each line, with this code:
main :: IO ()
main = do
  filecontent <- readFile "palindrom.txt"
  mapM_ putStrLn (lines filecontent)

isPalindrom w = w == reverse w
The thing is, I don't know how to go line by line and check whether each line is a palindrome (note that in my file, each line contains only one word). Thanks for any help.
Here is one suggested approach
main :: IO ()
main = do
  filecontent <- readFile "palindrom.txt"
  putStrLn (unlines $ filter isPalindrome $ lines filecontent)

isPalindrome w = w == reverse w
The part in parentheses is pure code; viewed as a function of filecontent, it has type String -> String. It is generally a good idea to isolate pure code as much as possible, because that code tends to be the easiest to reason about, and is often more easily reusable.
You can think of data as flowing from right to left in that section, broken apart by the ($) operators. First you split the content into separate lines, then filter only the palindromes, finally rebuild the full output as a string. Also, because Haskell is lazy, even though it looks like it is treating the input as a single String in memory, it actually is only pulling the data as needed.
Edit: some extra detail.
OK, so the heart of the solution is the pure portion:
unlines $ filter isPalindrome $ lines filecontent
($) is just function application with the lowest precedence: f $ g $ x means f (g x), so everything to the right of a ($) becomes the argument of the function on its left. In this case, filecontent is the full input from the file (a String, including newline characters), and the result is the full output (also a String including newline characters) that gets printed to stdout.
Let's follow sample input through this process, "abcba\n1234\nK"
unlines $ filter isPalindrome $ lines "abcba\n1234\nK"
First, lines breaks this into a list of lines
unlines $ filter isPalindrome ["abcba", "1234", "K"]
Note that the output of lines is being fed into the input for filter.
So, what does filter do? Notice its type
filter :: (a -> Bool) -> [a] -> [a]
This takes two parameters: the first is a function (which isPalindrome is), the second a list of items. It tests each item in the list with the function and returns the input list minus the items for which the function returned False. In our case, the first and third items are palindromes and the second is not, so our expression evaluates as follows
unlines ["abcba", "K"]
Finally, unlines is (almost) the opposite of lines: it concatenates the items again, appending a newline after each one.
"abcba\nK\n"
Since putStrLn takes a String, this is ready to be printed (putStrLn adds one more newline at the end).
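The walkthrough above can be captured as a single pure function, written point-free with (.) instead of ($). keepPalindromes and printPalindromes are hypothetical names. Using putStr rather than putStrLn avoids the extra blank line at the end, since unlines already ends every line with a newline. Note that an empty line counts as a palindrome here.

```haskell
isPalindrome :: String -> Bool
isPalindrome w = w == reverse w

-- The pure part of the program: file contents in, palindromic lines out.
keepPalindromes :: String -> String
keepPalindromes = unlines . filter isPalindrome . lines

-- The only IO: read the file, run it through the pure function, print.
printPalindromes :: FilePath -> IO ()
printPalindromes path = readFile path >>= putStr . keepPalindromes
```

With this, main could simply be printPalindromes "palindrom.txt".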
Note that it is perfectly OK to output a list of Strings one item at a time using IO actions, as follows (forM_ comes from Control.Monad):
forM_ ["1", "2", "3"] $ \item -> do
  putStrLn item
This method however mixes pure and impure code, and is considered slightly less idiomatic Haskell code than the former. You will still see this type of thing a lot though!
Have a look at the filter function. You may not want to put all processing on a single line, but use a let expression. Also, your indentation is off:
main :: IO ()
main = do
  filecontent <- readFile "palindrom.txt"
  let selected = filter ... filecontent
  ...

Can I create a function in Haskell that will encapsulate reading data from file and returning me a simple list of data?

Consider the code below taken from a working example I've built to help me learn Haskell. This code parses a CSV file containing stock quotes downloaded from Yahoo into a nice simple list of bars with which I can then work.
My question: how can I write a function that will take a file name as its parameter and return an OHLCBarList so that the first four lines inside main can be properly encapsulated?
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
I've tried to do this myself but with my limited Haskell knowledge, I'm failing miserably.
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C
import Data.Attoparsec.ByteString.Char8 (Parser, char, decimal, double, parseOnly)
import Data.Time (UTCTime)

-- Bar, Bar1Day and timeParser come from elsewhere in the original program.
type Filename = String

getContentsOfFile :: Filename -> IO BS.ByteString
-- (definition omitted in the question)

barParser :: Parser Bar
barParser = do
  time <- timeParser
  char ','
  open <- double
  char ','
  high <- double
  char ','
  low <- double
  char ','
  close <- double
  char ','
  volume <- decimal
  char ','
  return $ Bar Bar1Day time open high low close volume

type OHLCBar = (UTCTime, Double, Double, Double, Double)
type OHLCBarList = [OHLCBar]

barsToBarList :: [Either String Bar] -> OHLCBarList
-- (definition omitted in the question)

main :: IO ()
main = do
  contents :: C.ByteString <- getContentsOfFile "PriceData/Daily/yhoo1.csv" --PriceData/Daily/Yhoo.csv"
  let lineList :: [C.ByteString] = C.lines contents -- Break the contents into a list of lines
  let bars :: [Either String Bar] = map (parseOnly barParser) lineList -- Using the attoparsec
  let ohlcBarList :: OHLCBarList = barsToBarList bars -- Now I have a nice simple list of tuples with which to work
  --- Now I can do simple operations like
  print $ ohlcBarList !! 0
If you really want your function to have type Filename -> OHLCBarList, it can't be done.* Reading the contents of a file is an IO operation, and Haskell's IO monad is specifically designed so that values in the IO monad can never leave. If this restriction were broken, a "pure" function could return different results for the same argument (for example, if the file changed), which would undermine the guarantees the rest of the language relies on. Instead, you have two options: make the type of getBarsFromFile be Filename -> IO OHLCBarList, thus essentially copying the first four lines of main, or write a function with type C.ByteString -> OHLCBarList that the output of getContentsOfFile can be piped through to encapsulate lines 2 through 4 of main.
* Technically, it can be done, but you really, really, really shouldn't even try, especially if you're new to Haskell.
Others have explained that the correct type of your function has to be Filename -> IO OHLCBarList, I'd like to try and give you some insight as to why the compiler imposes this draconian measure on you.
Imperative programming is all about managing state: "do certain operations to certain bits of memory in sequence". When they grow large, procedural programs become brittle; we need a way of limiting the scope of state changes. OO programs encapsulate state in classes but the paradigm is not fundamentally different: you can call the same method twice and get different results. The output of the method depends on the (hidden) state of the object.
Functional programming goes all the way and bans mutable state entirely. A Haskell function, when called with certain inputs, will always produce the same output. Simple examples of pure functions are mathematical operators like + and *, or most of the list-processing functions like map. Pure functions are all about the inputs and outputs, not managing internal state.
This allows the compiler to be very smart in optimising your program (for example, it can safely collapse duplicated code for you), and helps the programmer not to make mistakes: you can't put the system in an invalid state if there is none! We like pure functions.
The exception to the rule is IO. Code that performs IO is impure by definition: you could call getLine a hundred times and never get the same result, because it depends on what the user typed. Haskell handles this using the type system: all impure functions are marked with the IO type. IO can be thought of as a dependency on the state of the real world, sort of like World -> (NewWorld, a).
To summarise: pure functions are good because they are easy to reason about; this is why Haskell makes functions pure by default. Any impure code has to be labelled as such with an IO type signature; this tells the compiler and the reader to be careful with this function. So your function which reads from a file (a fundamentally impure action) but returns a pure value can't exist.
Addendum in response to your comment
You can still write pure functions to operate on data that was obtained impurely. Consider the following straw-man:
main :: IO ()
main = do
  putStrLn "Enter the numbers you want me to process, separated by spaces"
  line <- getLine
  let numberStrings = words line
  let numbers = map read numberStrings
  putStrLn $ "The result of the calculation is " ++ (show $ foldr1 (*) numbers + 10)
Lots of code inside IO here. Let's extract some functions:
import Prelude hiding (product) -- otherwise our own product below clashes with the Prelude's

main :: IO ()
main = do
  putStrLn "Enter the numbers you want me to process, separated by spaces"
  result <- fmap processLine getLine -- fmap :: (a -> b) -> IO a -> IO b
                                     -- runs an impure result through a pure function
                                     -- without leaving IO
  putStrLn $ "The result of the calculation is " ++ result

processLine :: String -> String -- look ma, no IO!
processLine = show . calculate . readNumbers

readNumbers :: String -> [Int]
readNumbers = map read . words

calculate :: [Int] -> Int
calculate numbers = product numbers + 10

product :: [Int] -> Int
product = foldr1 (*)
I've pulled logic out of main into pure functions which are easier to read, easier for the compiler to optimise, and more reusable (and so more testable). The program as a whole still lives inside IO because the data is obtained impurely (see the last part of this answer for a more thorough treatment of this argument). Impure data can be piped through pure functions using fmap and other combinators; you should try to put as little logic in main as possible.
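One caveat with the extracted version: foldr1 (*) fails at runtime on an empty list, and read fails on any word that isn't a number. A minimal sketch of a more defensive pure pipeline, assuming Text.Read.readMaybe (in base) and the Prelude's own product (which returns 1 for an empty list), could look like this; readNumbersSafe and processLineSafe are illustrative names. Like processLine, it needs no IO and can be checked directly in ghci.

```haskell
import Text.Read (readMaybe)

-- Nothing if any word is not a valid Int, instead of a runtime error from read.
readNumbersSafe :: String -> Maybe [Int]
readNumbersSafe = mapM readMaybe . words

-- The Prelude's product returns 1 for [], whereas foldr1 (*) errors on [].
calculate :: [Int] -> Int
calculate numbers = product numbers + 10

-- Still pure: String in, String out.
processLineSafe :: String -> String
processLineSafe line = case readNumbersSafe line of
  Just numbers -> show (calculate numbers)
  Nothing      -> "Please enter whole numbers separated by spaces"
```

In main, fmap processLineSafe getLine would then replace fmap processLine getLine.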
Your code does seem to be most of the way there; as others have suggested you could extract lines 2-4 of your main into another function.
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
You cannot do this without getting all sorts of errors about IO stuff, because this type for getBarsFromFile is missing an IO. That's probably what the errors about IO stuff are trying to tell you. Did you try understanding and fixing the errors?
In your situation, I would start by abstracting over the second to fourth line of your main in a function:
parseBars :: ByteString -> OHLCBarList
And then I would combine this function with getContentsOfFile to get:
getBarsFromFile :: FilePath -> IO OHLCBarList
This I would call in main.
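A self-contained sketch of that split follows. It deliberately simplifies the question's setup, so treat it as an illustration rather than a drop-in replacement: it reads a String with readFile instead of a ByteString, parses fields with readMaybe instead of attoparsec, keeps the date as a raw String instead of UTCTime, and assumes a Yahoo-style header line and column order (Date, Open, High, Low, Close, ...). parseBar and splitCommas are hypothetical helpers. Only getBarsFromFile touches IO; parseBars is the pure counterpart of lines 2 to 4 of main.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Assumption: the date is kept as a String (the question uses UTCTime),
-- and only the Date, Open, High, Low and Close columns are used.
type OHLCBar = (String, Double, Double, Double, Double)
type OHLCBarList = [OHLCBar]

-- Split one CSV line on commas (no support for quoted fields).
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, ',' : rest) -> field : splitCommas rest
  (field, _)          -> [field]

-- Pure: one line to one bar; the header row or a malformed line gives Nothing.
parseBar :: String -> Maybe OHLCBar
parseBar line = case splitCommas line of
  (date : o : h : l : c : _) ->
    (,,,,) date <$> readMaybe o <*> readMaybe h <*> readMaybe l <*> readMaybe c
  _ -> Nothing

-- Pure: the whole file's contents to a list of bars.
parseBars :: String -> OHLCBarList
parseBars = mapMaybe parseBar . lines

-- Impure: the only function that mentions IO.
getBarsFromFile :: FilePath -> IO OHLCBarList
getBarsFromFile path = fmap parseBars (readFile path)
```

main then shrinks to bars <- getBarsFromFile "PriceData/Daily/yhoo1.csv" followed by whatever you do with bars.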

Load property file into Map structure

I have Java-like property text files with key-value pairs. What are some good approaches for loading that data into Haskell and then accessing it?
The files look like:
XXXX=vvvvv
YYYY=uuuuu
I want to be able to access the "XXXX" key.
You could use a parser library like the excellent Parsec (part of the Haskell Platform). Writing a parser for a format that simple would only take a few minutes.
However, if it's really that simple, you could use the split package: split the string into lines using the standard lines function (or Data.List.Split if you need to handle blank lines, etc.), and then use the Data.List.Split functions to split each line on '='.
The simplest solution would be rolling your own with break:
import Control.Arrow

parse :: String -> [(String, String)]
parse = map parseField . lines
  where parseField = second (drop 1) . break (== '=')
However, this doesn't handle whitespace, blank lines, or anything like that.
As for looking up by key, once you have a structure like [(String, String)], it's easy to put it into a Map (with fromList) and operate on that.
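Putting ehird's parse together with Data.Map, a minimal sketch might look like this (loadProperties is a hypothetical name). Map.fromList keeps the last value when a key appears more than once, and Map.lookup returns a Maybe, so a missing key gives Nothing rather than an error.

```haskell
import Control.Arrow (second)
import qualified Data.Map as Map

-- ehird's parser from above: split each line at the first '='.
parse :: String -> [(String, String)]
parse = map parseField . lines
  where parseField = second (drop 1) . break (== '=')

-- Build the lookup table; with duplicate keys, the later line wins.
loadProperties :: String -> Map.Map String String
loadProperties = Map.fromList . parse
```

Reading the file is then just fmap loadProperties (readFile "config.txt"), and Map.lookup "XXXX" on the result gives Just "vvvvv" for the sample file.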
Exploring a few details ehird didn't mention, and a slightly different approach:
import qualified Data.Map as Map

type Key = String
type Val = String

main = do
  -- get contents of file (contents :: String)
  contents <- readFile "config.txt"
  -- split into lines (optionList :: [String])
  let optionList = lines contents
  -- parse into map (optionMap :: Map Key Val)
  let optionMap = optionMapFromList optionList
  doStuffWith optionMap -- doStuffWith stands for whatever your program does with the map

optionMapFromList :: [String] -> Map.Map Key Val
optionMapFromList = foldr step Map.empty
  where step line map = case parseOpt line of
          Just (key, val) -> Map.insert key val map
          Nothing -> map

parseOpt :: String -> Maybe (Key, Val)
parseOpt = undefined
I've expressed my solution to your problem as a fold: taking the list of lines in the file, and turning it into the desired map. Each step of the fold involves inspecting a single line, attempting to parse it into a key/value pair, and when successful, inserting it into the map.
I've left parseOpt undefined; you could use an approach like ehird's parseField, or whatever you like. Perhaps you would prefer to only parse specific options:
import Data.List (find, isPrefixOf)

interestingOpts = ["XXXX", "YYYY"]
parseOpt line = case find (`isPrefixOf` line) interestingOpts of
  Just key -> Just (key, drop 1 $ dropWhile (/= '=') line)
  Nothing -> Nothing
Using the prefix testing approach isn't always the best idea, though, if you have (for example) an option "XX" and an option "XXXX". Play around and see what approach suits your data best. If you need high performance, look into using Data.Text instead of Strings.
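As one possible way to fill in parseOpt, here is a sketch that also addresses the whitespace and blank-line caveats mentioned earlier. It assumes lines starting with '#' or '!' are comments, as in Java .properties files, and splits only at the first '=', so values may themselves contain '='. trim is a helper introduced here. It does not handle other parts of the Java format such as ':' separators, escapes, or line continuations.

```haskell
import Data.Char (isSpace)

type Key = String
type Val = String

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropEnd . dropWhile isSpace
  where dropEnd = reverse . dropWhile isSpace . reverse

-- Skip blank and comment lines; otherwise split at the first '='
-- and trim both sides. Lines without '=' are ignored.
parseOpt :: String -> Maybe (Key, Val)
parseOpt line = case trim line of
  ""        -> Nothing
  ('#' : _) -> Nothing
  ('!' : _) -> Nothing
  stripped  -> case break (== '=') stripped of
    (key, '=' : val) -> Just (trim key, trim val)
    _                -> Nothing
```

This drops straight into optionMapFromList above in place of undefined.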

How do I get a search match from a list of strings in Haskell?

module Main
where
import List
import IO
import Monad

getLines = liftM lines . readFile

main = do
  putStrLn "Please enter your name: "
  name <- getLine
  list <- getLines "list.txt"
  -- mapM_ putStrLn list -- this part is to list out the input of lists
The first thing to do, the all-important first principle, is to get as much of the thinking out of main or out of IO as possible. main should where possible contain all the IO and maybe nothing but IO decorated with pure terms you define elsewhere in the module. Your getLines is mixing them unnecessarily.
So, to get that out of the way, we should have a main that is something like
main =
  do putStrLn "What is your name?"
     name <- getLine
     names <- readFile "names.txt"
     putStrLn (frankJ name names)
-- or maybe the more austere segregation of IO from all else that we get from:
main =
  do putStrLn greeting
     name <- getLine
     names <- readFile nameFile
     putStrLn (frankJ name names)
together with the 'pure' terms:
greeting, nameFile :: String
greeting = "What is your name?"
nameFile = "names.txt"
Either way, we are now really in Haskell-land: the problem is now to figure out what the pure function:
frankJ :: String -> String -> String
should be.
We might start with a simple matching function: we get a match when the first string appears on a list of strings:
match :: String -> [String] -> Bool
match name namelist = name `elem` namelist
-- pretty clever, that!
or we might want to normalize a bit, so that white space at the beginning and end of the name we are given and the names on the list doesn't affect the match. Here's a rather shabby way to do that:
clean :: String -> String
clean = reverse . omitSpaces . reverse . omitSpaces
  where omitSpaces = dropWhile (== ' ')
Then we can improve on our old match, i.e. elem:
matchClean :: String -> [String] -> Bool
matchClean name namelist = match (clean name) (map clean namelist)
Now we need to follow the types, figuring out how to fit the type of, say, matchClean:: String -> [String] -> Bool with that of frankJ :: String -> String -> String. We want to fit it inside our definition of frankJ.
Thus, to 'provide input' for matchClean, we need a function to take us from a long string with newlines to the list of strings (the names) that matchClean needs: that's the Prelude function lines.
But we also need to decide what to do with the Bool that matchClean yields as value; frankJ, as we have it, returns a String. Let us continue with simple-minded decomposition of the problem:
response :: Bool -> String
response False = "We're sorry, your name does not appear on the list, please leave."
response True = "Hey, you're on the A-list, welcome!"
Now we have materials we can compose into a reasonable candidate for the function frankJ :: String -> String -> String that we are feeding into our IO machine defined in main:
frankJ name nametext = response (matchClean name (lines nametext))
-- or maybe the fancier:
-- frankJ name = response . matchClean name . lines
-- given a name, this
-- - pipes the nametext through the lines function, splitting it,
-- - decides whether the given name matches, and then
-- - calculates the 'response' string
So here, almost everything is a matter of pure functions, and it is easy to see how to amend things for further refinement. For example, maybe the name entered and the lines of the text file should be further normalized: internal spaces could be collapsed to a single space before the comparison. Or maybe there is a comma in lines on the list since people are listed as "lastname, firstname", etc. etc. Or maybe we want the response function to use the person's name:
personalResponse :: String -> Bool -> String
personalResponse name False = name ++ " is a loser, as far as I can tell, get out!"
personalResponse name True = "Ah, our old friend " ++ name ++ "! Welcome!"
together with
frankJpersonal name = personalResponse name . matchClean name . lines
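The first refinement mentioned above (collapsing internal spaces before comparing) can be sketched with unwords . words, which splits on any whitespace, drops empty pieces, and rejoins with single spaces, so it also trims both ends. normalize, matchNormalized and frankJNormalized are illustrative names; response is copied from above so the block stands alone.

```haskell
-- Collapse every run of whitespace to one space and trim both ends.
normalize :: String -> String
normalize = unwords . words

matchNormalized :: String -> [String] -> Bool
matchNormalized name namelist = normalize name `elem` map normalize namelist

response :: Bool -> String
response False = "We're sorry, your name does not appear on the list, please leave."
response True  = "Hey, you're on the A-list, welcome!"

frankJNormalized :: String -> String -> String
frankJNormalized name = response . matchNormalized name . lines
```

Since words splits on tabs and other whitespace too, this is a little more forgiving than clean, which only strips ' ' characters.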
Of course there are a million ways of going about this. For example, there are regex libraries. The excellent and simple Data.List.Split from Hackage might also be of use, but I'm not sure it can be used by Hugs, which you might be using.
I note that you are using old-fashioned names for the imported modules. What I have written uses only the Prelude, so imports are unnecessary, but the other modules are now called "System.IO", "Data.List" and "Control.Monad", in accordance with the hierarchical naming system. I wonder if you are using an old tutorial or manual. Maybe the pleasant 'Learn You a Haskell' site would be better? Its author says he uses GHC, but I don't think that will matter much.
If you want a list of all lines in your list.txt that contain the name,
you can simply use (with isInfixOf imported from Data.List)
filter (isInfixOf name) list
but I'm not sure I understood your question correctly.
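A small sketch of that idea, plus a case-insensitive variant, might look like this. matchingLines, matchingLinesCI and searchFile are hypothetical names; isInfixOf comes from Data.List and toLower from Data.Char. The matching stays pure and only searchFile touches IO.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Lines that contain the name anywhere (case-sensitive).
matchingLines :: String -> [String] -> [String]
matchingLines name = filter (name `isInfixOf`)

-- Case-insensitive variant: compare lowercased copies.
matchingLinesCI :: String -> [String] -> [String]
matchingLinesCI name = filter (\l -> lower name `isInfixOf` lower l)
  where lower = map toLower

-- Reading the file stays in IO; the matching itself is pure.
searchFile :: FilePath -> String -> IO [String]
searchFile path name = fmap (matchingLines name . lines) (readFile path)
```

For example, searching for "ann" matches "Joanne" in both versions, but "Anna" only in the case-insensitive one.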

Resources