I have a Java-like properties text file with key-value pairs. What are some good approaches for loading that data into Haskell and then accessing it?
The file looks like:
XXXX=vvvvv
YYYY=uuuuu
I want to be able to access the "XXXX" key.
You could use a parser library like the excellent Parsec (part of the Haskell Platform). Writing a parser for a format that simple would only take a few minutes.
However, if it's really that simple, you could use split; split the string into lines using the standard lines function (or use Data.List.Split if you need to handle blank lines, etc.), and then use the Data.List.Split functions to split it on '='.
The simplest solution would be rolling your own with break:
import Control.Arrow (second)

parse :: String -> [(String, String)]
parse = map parseField . lines
  where parseField = second (drop 1) . break (== '=')
However, this doesn't handle whitespace, blank lines, or anything like that.
As for looking up by key, once you have a structure like [(String, String)], it's easy to put it into a Map (with fromList) and operate on that.
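To make the last two points concrete, here is a minimal sketch that trims whitespace, skips blank lines and lines starting with '#', and puts the result into a Map. The names trim, parseLine and parseProps are made up for this example, and treating '#' as the only comment marker is an assumption about your file format, not something the real Java properties format guarantees.

```haskell
import           Data.Char  (isSpace)
import           Data.List  (dropWhileEnd)
import qualified Data.Map   as Map
import           Data.Maybe (mapMaybe)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Parse one "key=value" line. Blank lines, '#' comment lines (an assumed
-- convention) and lines without '=' or with an empty key yield Nothing.
parseLine :: String -> Maybe (String, String)
parseLine raw
  | null line || take 1 line == "#" = Nothing
  | otherwise = case break (== '=') line of
      (key, '=' : val) | not (null (trim key)) -> Just (trim key, trim val)
      _                                        -> Nothing
  where
    line = trim raw

-- Parse a whole file's contents into a Map for lookup by key.
parseProps :: String -> Map.Map String String
parseProps = Map.fromList . mapMaybe parseLine . lines
```

With this, something like Map.lookup "XXXX" . parseProps <$> readFile "config.txt" should give you a Maybe String for the key you asked about. Data.Map comes from the containers package, which ships with GHC.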
Exploring a few details ehird didn't mention, and a slightly different approach:
import qualified Data.Map as Map
type Key = String
type Val = String
main = do
  -- get contents of file (contents :: String)
  contents <- readFile "config.txt"
  -- split into lines (optionList :: [String])
  let optionList = lines contents
  -- parse into map (optionMap :: Map Key Val)
  let optionMap = optionMapFromList optionList
  doStuffWith optionMap

optionMapFromList :: [String] -> Map.Map Key Val
optionMapFromList = foldr step Map.empty
  where step line map = case parseOpt line of
          Just (key, val) -> Map.insert key val map
          Nothing         -> map

parseOpt :: String -> Maybe (Key, Val)
parseOpt = undefined
I've expressed my solution to your problem as a fold: taking the list of lines in the file, and turning it into the desired map. Each step of the fold involves inspecting a single line, attempting to parse it into a key/value pair, and when successful, inserting it into the map.
I've left parseOpt undefined; you could use an approach like ehird's parseField, or whatever you like. Perhaps you would prefer to only parse specific options:
import Data.List (find, isPrefixOf)

interestingOpts = ["XXXX", "YYYY"]

parseOpt line = case find (`isPrefixOf` line) interestingOpts of
  Just key -> Just (key, drop 1 $ dropWhile (/= '=') line)
  Nothing  -> Nothing
Using the prefix testing approach isn't always the best idea, though, if you have (for example) an option "XX" and an option "XXXX". Play around and see what approach suits your data best. If you need high performance, look into using Data.Text instead of Strings.
I want to read an input like 12 34 56 into three integers using Haskell.
For a single integer, one might use myInteger <- readLn. But for this case, I have not found any solution except first reading a line, then replacing all spaces with commas, using something like:
spaceToCommas str =
  let repl ' ' = ','
      repl c   = c
  in map repl str
and then calling read ("[" ++ str ++ "]"), which feels very hackish. Also, it does not let me state that I want to read exactly three integers; it will attempt to read any number of integers from stdin.
There has to be a better way.
Note that I would like a solution that does not rely on external packages. Using e.g. Parsec is of course great, but this simple example should not require the use of a full-fledged Parser Combinator framework, right?
What about converting the string like:
convert :: Read a => String -> [a]
convert = map read . words
words splits the given string into a list of strings (the "words") and then we perform a read on every element using map.
and for instance use it like:
main = do
  line <- getLine
  let [a,b,c] = convert line :: [Int] in putStrLn (show (c,a,b))
or if you for instance want to read the first three elements and don't care about the rest (yes this apparently requires super-creativity skills):
main = do
  line <- getLine
  let (a:b:c:_) = convert line :: [Int] in putStrLn (show (c,a,b))
Here I print a tuple that is rotated one place to the right, to show that the parsing is done.
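Both versions above crash on malformed input, because read and the list patterns are partial. If you want to insist on exactly three integers without pulling in any external package, a possible sketch uses readMaybe from Text.Read (part of base). The name parseThree is made up for this example.

```haskell
import Text.Read (readMaybe)

-- Parse exactly three whitespace-separated Ints.
-- Returns Nothing if any word fails to parse or the count is not three.
parseThree :: String -> Maybe (Int, Int, Int)
parseThree s = case traverse readMaybe (words s) of
  Just [a, b, c] -> Just (a, b, c)
  _              -> Nothing
```

You could then write parseThree <$> getLine in main and pattern match on the Maybe to report bad input instead of crashing.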
I have probably just spent a day of computation time in vain :)
The problem is that I (naively) wrote about 3.5GB of (compressed) [(Text, HashMap Text Int)] data to a file and at that point my program crashed. Of course there is no final ] at the end of the data and the sheer size of it makes editing it by hand impossible.
The data was formatted via Prelude.show, and only now do I realize that Prelude.read will need to load the whole dataset into memory (impossible) before any data is returned.
Now ... is there a way to recover the data without resorting to writing a parser manually?
Update 1
import qualified Data.Map as M

main = do
  s <- getContents
  let hs = read s :: [(String, M.Map String Integer)]
  print $ head hs
This I tried ... but it just keeps consuming more memory until it gets killed by the OS.
Sort of. You will still be writing a parser manually... but it is a very short and very easy-to-write parser, because almost all of it will ship out to read. The idea is this: read is strict, but reads, when working on a single element, is lazyish. So we just need to strip out the bits that reads isn't expecting when working on a single element. Here's an example to get you started:
> let s = "[3,4,5," ++ undefined
> reads (drop 1 s) :: [(Int, String)]
[(3,",4,5,*** Exception: Prelude.undefined
I included the undefined at the end as evidence that it is in fact not reading the entire String before producing the parsed 3 at the head of the list.
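Daniel's answer can be extended to parse the whole list at once using the function below. You can then access the result directly as a lazy list. Note that it calls head and tail, so it will throw an error once it reaches the end of the input or an element that fails to parse.
lazyread :: Read a => [Char] -> [a]
lazyread xs = go (tail xs)
  where go xs = a : go (tail b)
          where (a,b) = head $ reads xs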
Manually delete the opening '['. After that you might be able to use reads (note the s) to incrementally access getContents.
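Since the file in the question is truncated, here is a sketch of the same reads idea that stops instead of crashing when an element fails to parse. The name recoverList is made up, and it assumes the input is exactly what show produced for a list: a leading '[' and a single ',' between elements. One caveat: if the data was cut off in the middle of a token that still parses (a number cut short, for instance), the last recovered element may be wrong, so check it by hand.

```haskell
-- Lazily recover the elements of a (possibly truncated) `show`-formatted list.
-- Stops at the first element that does not parse, e.g. an incomplete tail.
recoverList :: Read a => String -> [a]
recoverList = go . drop 1            -- drop the opening '['
  where
    go s = case reads s of
      ((x, rest) : _) -> x : go (drop 1 rest)  -- drop the ',' (or final ']')
      []              -> []                    -- nothing more parses: stop
```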
I'm pretty new to Haskell, and am trying to simply read a file into a list of strings. I'd like one line of the file per element of the list. But I'm running into a type issue that I don't understand. Here's what I've written for my function:
readAllTheLines hdl = (hGetLine hdl):(readAllTheLines hdl)
That compiles fine. I had thought that the file handle needed to be the same one returned from openFile. I attempted to simply show the list from the above function by doing the following:
displayFile path = show (readAllTheLines (openFile path ReadMode))
But when I try to compile it, I get the following error:
filefun.hs:5:43:
Couldn't match expected type 'Handle' with actual type 'IO Handle'
In the return type of a call of 'openFile'
In the first argument of 'readAllTheLines', namely
'(openFile path ReadMode)'
In the first argument of 'show', namely
'(readAllTheLines (openFile path ReadMode))'
So it seems like openFile returns an IO Handle, but hGetLine needs a plain old Handle. Am I misunderstanding the use of these 2 functions? Are they not intended to be used together? Or is there just a piece I'm missing?
Use readFile and lines for a better alternative.
readLines :: FilePath -> IO [String]
readLines = fmap lines . readFile
Coming back to your solution: openFile returns an IO Handle, so you have to run the action to get the Handle. You also have to check whether the Handle is at EOF before reading from it. It is much simpler to just use the solution above.
import System.IO

readAllTheLines :: Handle -> IO [String]
readAllTheLines hndl = do
    eof <- hIsEOF hndl
    notEnded eof
  where
    notEnded False = do
      line <- hGetLine hndl
      rest <- readAllTheLines hndl
      return (line:rest)
    notEnded True = return []

displayFile :: FilePath -> IO [String]
displayFile path = do
  hndl <- openFile path ReadMode
  readAllTheLines hndl
To add on to Satvik's answer, the example below shows how you can use a function to populate an STArray (a type with an instance of the MArray typeclass) in case you need to perform computations on a truly random-access data type.
Code Example
Let's say we have the following problem. We have lines in a text file "test.txt", and we need to load them into an array and then display the line found in the center of that file. This kind of computation is exactly the sort of situation where one would want to use a random access array over a sequentially structured list. Granted, in this example, there may not be a huge difference between using a list and an array, but, generally speaking, list accesses will cost O(n) in time whereas array accesses will give you constant time performance.
First, let's create our sample text file:
test.txt
This
is
definitely
a
test.
Given the file above, we can use the following Haskell program (located in the same directory as test.txt) to print out the middle line of text, i.e. the word "definitely."
Main.hs
{-# LANGUAGE BlockArguments #-} -- See footnote 1
import Control.Monad.ST (runST, ST)
import Data.Array.MArray (newArray, readArray, writeArray)
import Data.Array.ST (STArray)
import Data.Foldable (for_)
import Data.Ix (Ix) -- See footnote 2
populateArray :: (Integral i, Ix i) => STArray s i e -> [e] -> ST s () -- See footnote 3
populateArray stArray es = for_ (zip [0..] es) (uncurry (writeArray stArray))
middleWord' :: (Integral i, Ix i) => i -> STArray s i String -> ST s String
middleWord' arrayLength = flip readArray (arrayLength `div` 2)
middleWord :: [String] -> String
middleWord ws = runST do
  let len = length ws
  array <- newArray (0, len - 1) "" :: ST s (STArray s Int String)
  populateArray array ws
  middleWord' len array

main :: IO ()
main = do
  ws <- words <$> readFile "test.txt"
  putStrLn $ middleWord ws
Explanation
Starting with the top of Main.hs, the ST s monad and its associated function runST allow us to extract pure values from imperative-style computations with in-place updates in a referentially transparent manner. The module Data.Array.MArray exports the MArray typeclass as an interface for instantiating mutable array data types and provides helper functions for creating, reading, and writing MArrays. These functions can be used in conjunction with STArrays since there is an instance of MArray defined for STArray.
The populateArray function is the crux of our example. It uses for_ to "applicatively" loop over a list of tuples of indices and list elements to fill the given STArray with those list elements, producing a value of type () in the ST s monad.
The middleWord' helper function uses readArray to produce a String (wrapped in the ST s monad) that corresponds to the middle element of a given STArray of Strings.
The middleWord function instantiates a new STArray, uses populateArray to fill the array with values from a provided list of strings, and calls middleWord' to obtain the middle string in the array. runST is applied to this whole ST s monadic computation to extract the pure String result.
We finally use our middleWord function in main to find the middle word in the text file "test.txt".
Further Reading
Haskell's STArray is not the only way to work with arrays in Haskell. There are in fact Arrays, IOArrays, DiffArrays and even "unboxed" versions of all of these array types that avoid using the indirection of pointers to simply store "raw" values. There is a page on the Haskell Wikibook on this topic that may be worth some study. Before that, however, looking at the Wikibook page on mutable objects may give you some insight as to why the ST s monad allows us to safely compute pure values from functions that use imperative/destructive operations.
Footnotes
1 The BlockArguments language extension is what allows us to pass a do block directly to a function without any parentheses or use of the function application operator $.
2 As suggested by the Hackage documentation, Ix is a typeclass mainly meant to be used to specify types for indexing arrays.
3 The use of the Integral and Ix type constraints may be a bit of overkill, but it's used to make our type signatures as general as possible.
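For comparison, if no in-place updates are needed, the immutable Array type mentioned under Further Reading may be enough. The following is only a sketch under that assumption; the name middleElem is made up here, and it returns Nothing for an empty list rather than indexing out of bounds.

```haskell
import Data.Array (listArray, (!))

-- Build an immutable array from the list and index its middle element.
-- Returns Nothing for an empty list.
middleElem :: [a] -> Maybe a
middleElem [] = Nothing
middleElem xs = Just (arr ! (n `div` 2))
  where
    n   = length xs
    arr = listArray (0, n - 1) xs
```

Applied to the five words of test.txt, middleElem picks the element at index 2, which is "definitely".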
I have a text file which contains two lists on each line. Each list can contain any number of alphanumeric arguments.
eg [t1,t2,...] [m1,m2,...]
I can read the file into ghc, but how can I read this into another main file and how can the main file recognise each argument separately to then process it?
I think it's best for you to figure out most of this for yourself, but I've got some pointers for you.
Firstly, try not to deal with the file access until you've got the rest of the code working, otherwise you might end up having IO all over the place. Start with some sample data:
sampleData = "[m1,m2,m3][x1,x2,x3,x4]\n[f3,f4,f5][y7,y8,y123]\n[m4,m5,m6][x5,x6,x7,x8]"
You should not mention sampleData anywhere else in your code, but you should use it in ghci for testing.
Once you have a function that does everything you want, e.g. processLists :: String -> [(String,String)], you can replace readFile "data.txt" :: IO String with
readInLists :: FilePath -> IO [(String,String)]
readInLists filename = fmap processLists (readFile filename)
If fmap makes no sense to you, you could read a tutorial I accidentally wrote.
If they really are alphanumeric, you can split them quite easily. Here are some handy functions, with examples.
tail :: [a] -> [a]
tail "(This)" = "This)"
You can use that to throw away something you don't want at the front of your string.
break :: (Char->Bool) -> String -> (String,String)
break (== ' ') "Hello Mum" = ("Hello"," Mum")
So break uses a test to find the first character of the second string, and breaks the string just before it.
Notice that the break character is still there at the front of the next string. span is the same but uses a test for what to have in the first list, so
span :: (Char->Bool) -> String -> (String,String)
span (/= ' ') "Hello Mum" = ("Hello"," Mum")
You can use these functions with things like (==','), or isAlphaNum (you'll have to import Data.Char at the top of your file to use it).
You might want to look at the functions splitWith and splitOn that I have in this answer. They're based on the definitions of split and words from the Prelude.
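As a rough illustration of how break can be combined for this, here is one possible sketch for a single line holding two bracketed lists. The names splitOnComma, bracketed and parseLine are made up, and it assumes items never contain ',' or ']'. It is one way to do it, not the only one, so do still try it yourself first.

```haskell
-- Split "a,b,c" on commas using break.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (item, [])       -> [item]
  (item, _ : rest) -> item : splitOnComma rest

-- Read one "[...]" group from the front of the string,
-- returning its items and whatever follows the closing bracket.
bracketed :: String -> Maybe ([String], String)
bracketed ('[' : s) = case break (== ']') s of
  (inner, ']' : rest) -> Just (if null inner then [] else splitOnComma inner, rest)
  _                   -> Nothing
bracketed _ = Nothing

-- Parse a line with two bracketed lists, optionally separated by spaces.
parseLine :: String -> Maybe ([String], [String])
parseLine line = do
  (xs, rest) <- bracketed (dropWhile (== ' ') line)
  (ys, _)    <- bracketed (dropWhile (== ' ') rest)
  Just (xs, ys)
```

You can test it in ghci with map parseLine (lines sampleData) before touching any file IO.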
First off, sorry for doing the typical thing of 'where do I begin', but I'm totally lost.
I've been reading the 'Learn you a haskell for great good' site for what feels like an age now (pretty much half a semester). I'm just about to finish the 'Input and Output' chapter, and I still have no clue how to write a multi line program.
I've seen the do statement, and that you can only use it to concat IO actions into a single function, but I can't see how I'm gonna go about writing a realistic application.
Can someone point me in the right direction?
I'm from a C background, and basically I'm using haskell for one of my modules this semester at uni, I want to compare C++ against haskell (in many aspects). I'm looking to create a series of searching and sorting programs so that I can comment on how easy they are in the respective languages versus their speed.
However, I'm really starting to lose my faith in using Haskell as it's been six weeks, and I still have no idea how to write a complete application, and the chapters in the site I'm reading seem to be getting longer and longer.
I basically need to create a basic object which will be stored in the structure (which I know how to do), more what I'm struggling with is, how do I create a program which reads data in from some text file, and populates the structure with that data in the first place, then goes on to process it. As haskell seems to split IO and other operations and it won't just let me write multiple lines in a program, I'm looking for something like this:
main = data <- getContent
let allLines = lines data
let myStructure = generateStruct allLines
sort/search/etc
print myStructure
how do I go about this? any good tutorials which will help me get going with realistic programs?
-A
You mentioned seeing do notation; now it's time to learn how to use do. In your example, main is an IO action, so you should be using do syntax or binds:
main = do
  dat <- getContents
  let allLines = lines dat
      myStructure = generateStruct allLines
      sorted = mySort myStructure
      searchResult = mySearch myStructure
  print myStructure
  print sorted
  print searchResult
So now you have a main that gets stdin, turns it into [String] via lines, presumably parses it into a structure and runs sorting and searches on that structure. Notice the interesting code is all pure: mySort, mySearch, and generateStruct don't need to be IO (and can't be, being inside a let binding), so you are actually properly using pure and effectful code together.
I suggest you look at how bind works (>>=) and how do notation desugars into bind. This SO question should help.
See also Explaining Haskell IO without Monads by Neil Mitchell.
I'll try to start with a simplified example. Let's say this is what we want to do:
Open a file which contains a list of integers and return it.
Sort this list
Let's also reverse the list
Print the result on the screen
Let's also say that we have these functions that we can use:
getContent :: IO [Int]
sort :: [Int] -> [Int]
reverse :: [Int] -> [Int]
show :: Show a => a -> String
putStrLn :: String -> IO ()
Just so we are clear, I'll have a word about these functions:
getContent: I made up this function, but if there were such a function, that would be its signature (you can use getContent = return [3,7,2,1] for testing purposes). I'm sure you've seen such a signature before and at least vaguely understand that, since it does IO, its signature cannot be just getContent :: [Int].
sort: It's a function defined in the Data.List module; usage is simple: sort [3,1,2] returns [1,2,3]
reverse: Available from the Prelude (and also exported by Data.List): reverse [1,3,2] returns [2,3,1]
show: don't need to import anything, just use it: show 11 returns the string "11"; show [1,2,3] returns the string "[1,2,3]", etc.
putStrLn: takes a string, puts it on the screen and returns IO (). Again, since it does IO, its signature cannot be just putStrLn :: String -> ().
OK, now we have all we need to create our program, the problem now is about connecting these functions together. Let's start with connecting functions:
getContent :: IO [Int] with sort :: [Int] -> [Int]
I think if you get this part, you'll easily get the rest as well. So, the problem is that since getContent returns IO [Int] and not just [Int], you can't just ignore or get rid of the IO part and shove it into sort. That is, this is what you can not do to connect these functions:
sort (getRidOfIO getContent)
Here is where the (>>=) :: m a -> (a -> m b) -> m b operation comes to the rescue. Notice that m, a and b are type variables, so if we substitute IO for m, [Int] for a and [Int] for b, we get the signature:
(>>=) :: IO [Int] -> ([Int] -> IO [Int]) -> IO [Int]
Have a look again at those getContent and sort functions and their signatures and try to think about how they'll fit into the >>=. I'm sure you'll notice that you can use getContent directly as the first argument to >>=. What >>= will do is take the [Int] out of getContent and shove it into the function provided as the second argument. But what will be the function in the second argument? We can't use sort :: [Int] -> [Int] directly; the next best thing we can try is
\listOfInts -> sort listOfInts
but that still has signature [Int] -> [Int], so that did not help much. Here is where the other hero comes into play,
return :: a -> m a.
Again, a and m are type variables; let's substitute them and we will get
return :: [Int] -> IO [Int]
so putting \listOfInts -> sort listOfInts and return together we will get:
(\listOfInts -> return $ sort listOfInts) :: [Int] -> IO [Int]
Which is exactly what we want to put as the second argument to >>=. So let's finally connect getContent and sort using our glue:
getContent >>= (\listOfInts -> return $ sort listOfInts)
which is the same thing as (using the do notation):
do listOfInts <- getContent
   return $ sort listOfInts
There, that is the end of the most terrifying part. And now comes possibly one of the aha moments, try to think about what is the result type of the connection we just made up. I'll spoil it for you,... the type of
getContent >>= (\listOfInts -> return $ sort listOfInts) is IO [Int] again.
Let's summarize: we took something of type IO [Int] and something of type [Int] -> [Int], glued those two things together and got again something of type IO [Int]!
Now go ahead and try exactly the same thing: Take the IO [Int] object we have just created and glue it together (using >>= and return) with reverse :: [Int] -> [Int].
I think I wrote way too much, but let me know if anything was not clear or if you need help with the rest.
What I've described so far can look something like this:
import Data.List (sort)

getContent :: IO [Int]
getContent = return [5,2,1,7]

main :: IO ()
main = do
  listOfInts <- getContent
  return $ sort listOfInts
  return () -- This is only to satisfy the signature of main
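If you get stuck on the remaining steps, one way the whole thing might end up looking is sketched below: sort, then reverse, then print. getContent is still the made-up stand-in from above, and the names sortedThenReversed and printResult are invented for this sketch.

```haskell
import Data.List (sort)

-- Made-up stand-in for reading a list of integers from a file.
getContent :: IO [Int]
getContent = return [5, 2, 1, 7]

-- Glue sort, then reverse, onto the IO action using >>= and return.
sortedThenReversed :: IO [Int]
sortedThenReversed =
  getContent
    >>= (\listOfInts -> return (sort listOfInts))
    >>= (\sorted -> return (reverse sorted))

-- The same pipeline in do notation, printing the result at the end.
printResult :: IO ()
printResult = do
  listOfInts <- getContent
  let result = reverse (sort listOfInts)
  putStrLn (show result)
```

Setting main = printResult gives a complete program. Note that the pure steps can simply live in a let, which is usually tidier than wrapping each one in return.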
If it is a question of reading from stdin and writing a result to stdout, with no further intervening user input -- as your mention of getContents suggests -- then the ancient interact :: (String -> String) -> IO (), or the several other versions, e.g. Data.ByteString.interact :: (ByteString -> ByteString) -> IO () or Data.Text.interact :: (Text -> Text) -> IO (), are all that are needed. interact is basically the 'make a little unix tool out of this function' function -- it maps pure functions of the right type to executable actions (i.e. values of the type IO ()). All Haskell tutorials should mention it on the third or fourth page, with instructions on compilation.
So if you write
main = interact arthur
arthur :: String -> String
arthur = reverse
and compile with ghc --make -O2 Reverse.hs -o reverse then whatever you pipe to ./reverse will be understood as a list of characters and emerge reversed. Similarly, whatever you pipe to
main = interact (unlines . meredith . lines)
meredith :: [String] -> [String]
meredith = filter (not.null)
will emerge with the empty lines omitted. More interestingly,
main = interact ( unlines . map show . luther . map read . lines)
luther :: [Int] -> [Int]
luther = filter even
will take a stream of characters separated by newlines, read them as Ints, removing the odd ones, and yielding the suitably filtered stream.
main = interact ((++ "\n") . show . emma . map read . lines)

emma :: [Int] -> Int
emma = sum . map square
  where square x = x * x
will print the sum of the squares of the newline-separated numerals.
In these last two cases, luther and emma the internal 'data structure' is [Int], which is pretty dull, and the function applied to it is idiot simple, of course. The main point is to let one of the forms of interact take care of all of the IO, and thus get images like 'populating a structure' and 'processing it' out of your head. To use interact you need to use composition to make the whole yield some sort of String -> String function. But even here, as in the runt first example arthur:: String -> String you are defining a genuine function in something more like the mathematical sense. Values in the types String and ByteString are just as pure as those in Bool or Int.
In more complicated cases of this basic interact type, your task is thus, first, to think how the desired pure values of the function you will be focussing on can be mapped to String values (here, it's just show for an Int or unlines . map show for a [Int]). interact knows what to "do" with the string. -- And then to figure out how to define a pure mapping from Strings or ByteString (which will contain your 'raw' data) to values in the type or types your principal function takes as arguments. Here I was just using map read . lines resulting in a [Int]. If you are working on some more complicated, say tree structure you'd need a function from [Int] to MyTree Int. A more elaborate function to put in this position would be a Parser, of course.
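One practical upside of this split is that the String -> String function can be tried out in ghci without any IO at all. As a small sketch (the names evensOnly and run are made up, and skipping unparseable lines via readMaybe is my own choice here, where the examples above use read and would crash instead):

```haskell
import Data.Maybe (mapMaybe)
import Text.Read  (readMaybe)

-- Pure core: parse each line as an Int (skipping lines that do not parse),
-- keep the even numbers, and render them one per line.
evensOnly :: String -> String
evensOnly = unlines . map show . filter even . parseInts . lines
  where
    parseInts :: [String] -> [Int]
    parseInts = mapMaybe readMaybe

-- The executable wrapper: all of the IO is delegated to interact.
run :: IO ()
run = interact evensOnly
```

Setting main = run turns it into the same kind of little unix filter as the luther example.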
Then you can go to town, in this sort of case: there is really no reason to think of yourself as 'programming', 'populating' and 'processing' at all. This is where all the cool devices of LYAH kick in. Your duty is to define a mapping within the specific definitional discipline. In the last two cases, these are from [Int] to [Int] and from [Int] to Int, but here is a similar example derived from the excellent, still incomplete, tutorial on the super-excellent Vector package where the initial numerical structure one is dealing with is Vector Int
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Vector.Unboxed as U
import System.Environment

main = L.interact (L.pack . (++"\n") . show . roman . parse)
  where
    parse :: L.ByteString -> U.Vector Int
    parse bytestr = U.unfoldr step bytestr
    step !s = case L.readInt s of
      Nothing       -> Nothing
      Just (!k, !t) -> Just (k, L.tail t)

-- now the IO and stringy nonsense is out of the way
-- so we can calculate properly:
roman :: U.Vector Int -> Int
roman = U.sum
Here again roman is moronic; any function from a Vector of Ints to an Int, however complex, can take its place. Writing a better roman will never be a question of "populating", "multi-line programming", "processing" etc., though of course we speak this way; it is just a question of defining a genuine function by composition of the functions in Data.Vector and elsewhere. The sky is the limit, check out that tutorial too.