Count `Pat`s in a Module

I need to count the number of Pats in a Haskell Module. I know the simplest way is to pattern match on each level of the AST, which will result in a huge function that mirrors the entire AST. I believe there's some way to take advantage of type classes like Functor or the State monad to lean on some existing function that walks the tree (like prettyPrint) and thread a counter through it, but I'm not sure how that works exactly.

It's very easy using uniplate:
import Data.Data
import Data.Generics.Uniplate.Data
import Control.Monad
import Language.Haskell.Exts
-- Modern haskell-src-exts annotates every node, so Pat takes a SrcSpanInfo parameter
findPats :: Data a => a -> [Pat SrcSpanInfo]
findPats = universeBi
test :: IO ()
test = do
    content <- readFile "Simple.hs"
    case parseModule content of
        ParseFailed _ e -> error e
        ParseOk a -> do
            forM_ (findPats a) $ \p ->
                putStrLn $ "got a pat: " ++ show p
            putStrLn $ "total: " ++ show (length (findPats a))
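Essentially it's just the universeBi function. It returns every value of the requested type (here Pat) found anywhere in the module, including patterns nested inside other patterns, so the count you asked for is simply length (findPats a).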
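The same idea can be sketched with nothing but base's Data.Data. The toy AST below (Pat, Expr, Decl, Module) is hypothetical and only stands in for the haskell-src-exts types. collect is a hand-rolled helper, not uniplate's actual implementation. It gathers every value of the requested type in a pre-order traversal using gmapQ and cast, so counting patterns is just the length of its result.

```haskell
{-# LANGUAGE DeriveDataTypeable, ScopedTypeVariables #-}
import Data.Data (Data, Typeable, cast, gmapQ)

-- Hypothetical toy AST standing in for haskell-src-exts' Module and Pat types.
data Pat = PVar String | PTuple [Pat]
  deriving (Show, Data)

data Expr = Var String | Lam [Pat] Expr | App Expr Expr
  deriving (Show, Data)

data Decl = FunBind String [Pat] Expr
  deriving (Show, Data)

newtype Module = Module [Decl]
  deriving (Show, Data)

-- Every value of type b anywhere inside x (x itself included), in pre-order.
-- Same idea as uniplate's universeBi, but written directly with gmapQ.
collect :: forall b a. (Typeable b, Data a) => a -> [b]
collect x = here ++ concat (gmapQ collect x)
  where
    -- keep x if it happens to be a b, then recurse into all immediate children
    here = maybe [] (: []) (cast x :: Maybe b)

countPats :: Module -> Int
countPats m = length (collect m :: [Pat])

-- Hypothetical input corresponding to: f x (a, b) = \y -> y
sample :: Module
sample =
  Module [FunBind "f" [PVar "x", PTuple [PVar "a", PVar "b"]] (Lam [PVar "y"] (Var "y"))]
```

With this input, countPats sample should be 5: x, the tuple pattern, a, b and y. The nested a and b are counted, just as universeBi would also return them.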

Related

Haskell: how to extract a value from a Right

I am building up a simple script to parse a two-items-per-row CSV file:
-- Main.hs
module Main where
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector as V
import Data.Csv
type Row = (BL.ByteString, BL.ByteString)
main :: IO ()
main = do
    csvData <- BL.readFile "csvs/twostringsperrow.csv"
    let v = decode NoHeader csvData :: Either String (V.Vector Row)
    putStrLn "All done"
The script works. Obviously it doesn't do much at the moment, but it works, which is reassuring.
I now want to interact with this in GHCi, so I run those couple of lines:
$ stack ghci
...
*Main> csvData <- BL.readFile "csvs/twostringsperrow.csv"
*Main> let v = decode NoHeader csvData :: Either String (V.Vector Row)
*Main> v
Right [("1","2"),("3","4")]
At this point I can see that the parsing has been successful and would like to get the [("1","2"),("3","4")] out of the Right into a variable called df so that I can have a play with it. i.e.:
*Main> let df = <something here> v
*Main> df
[("1","2"),("3","4")]
How do I do that?

You can use pattern matching logic here. For example:
let Right df = v
This unwraps the data from the Right data constructor. Be aware that the pattern is partial: if v is a Left, evaluating df fails at runtime.
You can, for example, write a function that handles both the Left and the Right case, since it is typically better to implement total functions (functions that can process every value their type admits).
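Below is a minimal sketch of that advice. It assumes simplified types: String pairs in a list rather than the question's ByteString pairs in a Vector. describe handles both constructors explicitly. rowsOrEmpty uses fromRight from Data.Either (base 4.10 and later) as a total replacement for let Right df = v.

```haskell
import Data.Either (fromRight)

-- Simplified stand-in for the question's rows: Strings in a list
-- instead of lazy ByteStrings in a Vector.
type Row = (String, String)

-- Total: both constructors are handled, so no input can crash it.
describe :: Either String [Row] -> String
describe (Left err)   = "parse failed: " ++ err
describe (Right rows) = "parsed " ++ show (length rows) ++ " rows"

-- Total replacement for `let Right df = v`: fall back to [] on Left.
rowsOrEmpty :: Either String [Row] -> [Row]
rowsOrEmpty = fromRight []

-- The same thing written with `either` from the Prelude.
rowsOrEmpty' :: Either String [Row] -> [Row]
rowsOrEmpty' = either (const []) id
```

In the question's GHCi session, the equivalent would be something like let df = either (const V.empty) id v. That yields an empty vector instead of crashing when decoding fails.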

A basic approach is to use a case expression:
do ...
   x <- parse ...
   case x of
       Left e -> putStrLn ("Parse error: " ++ show e)
       Right y -> putStrLn ("Parse OK! " ++ show y)
Don't forget that we cannot, in general, "remove a Right" in a safe way, since a value of type Either ParseError T is not necessarily a Right, but could also be a Left.
Indeed, the parsing library returns such a sum type in order to force us to handle the error, and consider both cases.
There are some dangerous partial functions that indeed "remove Right" but it is better to avoid them.

Can I drop the IO monad on this pure function prettily?

It is quite hard to formulate good question titles as a newbie. Please make this question search-friendly =)
Trying to write my first "real" Haskell program (i.e. not only Project Euler stuff), I am trying to read and parse my configuration file with nice error messages. So far, I have this:
import Prelude hiding (readFile)
import System.FilePath (FilePath)
import System.Directory (doesFileExist)
import Data.Aeson
import Control.Monad.Except
import Data.ByteString.Lazy (ByteString, readFile)
-- Type definitions without real educational value here
loadConfiguration :: FilePath -> ExceptT String IO Configuration
loadConfiguration path = do
    fileContent <- readConfigurationFile path
    configuration <- parseConfiguration fileContent
    return configuration
readConfigurationFile :: FilePath -> ExceptT String IO ByteString
readConfigurationFile path = do
    fileExists <- liftIO $ doesFileExist path
    if fileExists then do
        fileContent <- liftIO $ readFile path
        return fileContent
    else
        throwError $ "Configuration file not found at " ++ path ++ "."
parseConfiguration :: ByteString -> ExceptT String IO Configuration
parseConfiguration raw = do
    let result = eitherDecode raw :: Either String Configuration
    case result of
        Left message -> throwError $ "Error parsing configuration file: " ++ message
        Right configuration -> return configuration
This works, but the IO monad in parseConfiguration is not necessary and should go away. But I can't just drop it, of course, and I have not yet found a way to make parseConfiguration pure while keeping the prettiness of loadConfiguration.
What is the correct way to write this? If this is answered in the documentation, I am sorry, but I did not find it. I think reading the Hackage documentation is a skill that grows as slowly as the rest of my Haskell skills. =)
P.S.: Comments on other style mistakes are, of course, very welcome!

If you are already using mtl, then the solution given by bheklilr in his comment is a good one. Make parseConfiguration work on any monad that implements MonadError.
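A minimal sketch of the MonadError approach follows, assuming mtl is available. Configuration and decodeConfig are hypothetical stand-ins for the question's type and for aeson's eitherDecode (here the "file" is just a number read with reads), so the sketch runs without aeson. parseConfiguration only asks for MonadError String m. That means the same function can be used purely in Either String or inside ExceptT String IO.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except (MonadError, throwError)
import Control.Monad.Trans.Except (ExceptT, runExceptT)

-- Hypothetical stand-in for the question's Configuration type.
newtype Configuration = Configuration { port :: Int }
  deriving (Show, Eq)

-- Hypothetical stand-in for aeson's eitherDecode.
decodeConfig :: String -> Either String Configuration
decodeConfig s = case reads s of
  [(n, "")] -> Right (Configuration n)
  _         -> Left ("not a number: " ++ show s)

-- No IO in the type: any monad that can throw String errors will do.
parseConfiguration :: MonadError String m => String -> m Configuration
parseConfiguration raw = case decodeConfig raw of
  Left message        -> throwError ("Error parsing configuration file: " ++ message)
  Right configuration -> return configuration

-- The same function used unchanged from the IO side.
loadFromString :: String -> ExceptT String IO Configuration
loadFromString = parseConfiguration
```

For instance, parseConfiguration "8080" :: Either String Configuration is a pure call. runExceptT (loadFromString "8080") runs the very same code in ExceptT String IO.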
If for whatever reason you are not using mtl, but only transformers, then you'll need a function with a type like Monad n => Except e a -> ExceptT e n a that "hoists" an Except into an ExceptT over some monad.
We can build this function using mapExceptT :: (m (Either e a) -> n (Either e' b)) -> ExceptT e m a -> ExceptT e' n b, a function that can change the base monad of an ExceptT transformer.
Except is really ExceptT Identity, so what we want is to unwrap the Identity and return the value in the new monad:
hoistExcept :: Monad n => Except e a -> ExceptT e n a
hoistExcept = mapExceptT (return . runIdentity)
You could also define it this way:
hoistExcept :: Monad n => Except e a -> ExceptT e n a
hoistExcept = ExceptT . return . runIdentity . runExceptT
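Here is a self-contained version, assuming only transformers and base. parsePort and loadPort are hypothetical examples: a pure parser in Except and an IO-side caller in ExceptT String IO. The second definition, hoistExcept', swaps in except and runExcept from Control.Monad.Trans.Except and should behave the same as the mapExceptT one.

```haskell
import Control.Monad.Trans.Except
  (Except, ExceptT, except, mapExceptT, runExcept, runExceptT, throwE)
import Data.Functor.Identity (runIdentity)

-- The definition from the answer, via mapExceptT.
hoistExcept :: Monad n => Except e a -> ExceptT e n a
hoistExcept = mapExceptT (return . runIdentity)

-- An equivalent formulation: unwrap to Either, then rewrap in any monad.
hoistExcept' :: Monad n => Except e a -> ExceptT e n a
hoistExcept' = except . runExcept

-- Hypothetical pure parser: no IO anywhere in its type.
parsePort :: String -> Except String Int
parsePort s = case reads s of
  [(n, "")] -> return n
  _         -> throwE ("not a port: " ++ s)

-- Hypothetical caller that lives in ExceptT String IO.
loadPort :: String -> ExceptT String IO Int
loadPort raw = hoistExcept (parsePort raw)
```

The point of the design is that parsePort stays free of IO. Only the caller, loadPort, has to mention the base monad.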

Preventing "getCurrentDirectory: resource exhausted (Too many open files)" error

I am trying to run a Parsec parser over a whole bunch of small files, and getting an error saying I have too many open files. I understand that I need to use strict IO, but I'm not sure how to do that. This is the problematic code:
files = getDirectoryContents historyFolder
hands :: IO [Either ParseError [Hand]]
hands = join $ sequence <$> parseFromFile (many hand) <<$>> files
Note: my <<$>> function is this:
(<<$>>) :: (Functor f1, Functor f2) => (a -> b) -> f1 (f2 a) -> f1 (f2 b)
a <<$>> b = (a <$>) <$> b

I don't know what your parseFromFile function looks like right now (it's probably a good idea to include that in the question), but I'm guessing you're using Prelude.readFile, which, as @Markus1189 points out, does lazy I/O. Parsec's own Text.Parsec.String.parseFromFile is built on it as well. To get strict I/O, you just need a strict readFile, such as Data.Text.IO.readFile.
A streaming data library like pipes or conduit would let you avoid reading the entire file into memory at once, though, to my knowledge, parsec doesn't provide a streaming interface that allows this. attoparsec, on the other hand, does include such a streaming interface, and both pipes and conduit have attoparsec adapter libraries (e.g., Data.Conduit.Attoparsec).
tl;dr: You probably just need the following helper function:
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
readFileStrict :: FilePath -> IO String
readFileStrict = fmap T.unpack . TIO.readFile
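Here is a sketch of how that helper could stand in for parseFromFile, assuming parsec and text are installed. parseFromFileStrict, parseAll and numberFile are hypothetical names. Each file is read completely by Data.Text.IO.readFile, which closes the handle, before parse runs. mapM then processes the paths strictly one after another.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Parsec (ParseError, digit, many1, parse)
import Text.Parsec.String (Parser)

readFileStrict :: FilePath -> IO String
readFileStrict = fmap T.unpack . TIO.readFile

-- Hypothetical strict counterpart to parseFromFile: the whole file is
-- read (and its handle closed) before the parser ever sees the input.
parseFromFileStrict :: Parser a -> FilePath -> IO (Either ParseError a)
parseFromFileStrict p path = parse p path <$> readFileStrict path

-- Parse every file in turn; at most one file is open at any moment.
parseAll :: Parser a -> [FilePath] -> IO [Either ParseError a]
parseAll p = mapM (parseFromFileStrict p)

-- Hypothetical example parser: a file containing just a number.
numberFile :: Parser Int
numberFile = read <$> many1 digit
```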

You can use the BangPatterns language extension to enforce strictness of your IO operations, in this case parseFromFile. For example, the function hands can be changed to:
hands :: [String] -> IO [Either ParseError [Hand]]
hands [] = return []
hands (f:fs) = do
    !res <- parseFromFile hand f
    others <- hands fs
    return (res:others)
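This version of hands forces each parseFromFile result to Left or Right before moving to the next file in the list. Each file is therefore parsed before the next one is opened, and a parse that reaches eof has read its file to the end, which closes the lazily read handle. Once you have this, the problem should disappear. A full working toy example is: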
{-# LANGUAGE BangPatterns #-}
import Control.Monad
import Control.Applicative hiding (many)
import Data.Char (isDigit)
import System.Directory (getDirectoryContents)
import System.FilePath ((</>))
import Text.ParserCombinators.Parsec
data Hand = Hand Int deriving Show
hand :: GenParser Char st [Hand]
hand = do
    string "I'm file "
    num <- many digit
    newline
    eof
    return [Hand $ read num]
files :: IO [String]
files = map ("manyfiles" </>)
      . filter (all isDigit) <$> getDirectoryContents "manyfiles"
hands :: [String] -> IO [Either ParseError [Hand]]
hands [] = return []
hands (f:fs) = do
    -- the bang forces this file's parse before the next file is opened
    !res <- parseFromFile hand f
    others <- hands fs
    return (res:others)
main :: IO ()
main = do
    results <- files >>= hands
    print results

Feeding a monadic expression into unless or when

I often find myself writing code that looks like this:
import System.Directory (doesFileExist)
import Control.Monad (unless)
example = do
    fileExists <- doesFileExist "wombat.txt"
    unless fileExists $ putStrLn "Guess I should create the file, huh?"
Perhaps a better way is:
example2 =
    doesFileExist "wombat.txt" >>=
        (\b -> unless b $ putStrLn "Guess I should create the file, huh?")
What's the best approach here?

I could define a helper function:
unlessM :: Monad m => m Bool -> m () -> m ()
unlessM b s = b >>= (\t -> unless t s)
example3 = unlessM (doesFileExist "wombat.txt") $
    putStrLn "Guess I should create the file, huh?"
It seems like unlessM would be very useful. But the fact that I don't see anything like unlessM (or with that type signature) on Hackage makes me think that there's some better way to handle this situation, one that I haven't discovered yet. What do the cool kids do?

I have made use of flip unless for such cases, but these kinds of combinators can get a little noisy. With the LambdaCase extension, you can at least avoid naming the result of doesFileExist. You then have to pattern match on True and False, though, which can look a little strange (depending on whether or not you believe `if` is unnecessary).
{-# LANGUAGE LambdaCase #-}
import System.Directory (doesFileExist)
import Control.Monad (unless)
example' =
    doesFileExist "wombat.txt" >>=
        flip unless (putStrLn "Guess I should create the file, huh?")
example'' =
    doesFileExist "wombat.txt" >>= \case
        True -> return ()
        False -> putStrLn "Guess I should create the file, huh?"
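Here is a sketch of the combinators under discussion, written for any monad so they can be exercised without touching the file system. whenM and unlessM follow the question's definition. unlessViaBool is a swapped-in variant using bool from Data.Bool (bool onFalse onTrue b), which avoids both the lambda and LambdaCase. report is a hypothetical pure stand-in for the doesFileExist example that logs the message through a Writer.

```haskell
import Control.Monad (unless, when)
import Control.Monad.Trans.Writer (Writer, execWriter, tell)
import Data.Bool (bool)

-- Run the action only if the monadic condition yields True.
whenM :: Monad m => m Bool -> m () -> m ()
whenM cond action = cond >>= \b -> when b action

-- Run the action only if the monadic condition yields False.
unlessM :: Monad m => m Bool -> m () -> m ()
unlessM cond action = cond >>= \b -> unless b action

-- Same as unlessM, using bool: False selects the action, True does nothing.
unlessViaBool :: Monad m => m Bool -> m () -> m ()
unlessViaBool cond action = cond >>= bool action (return ())

-- Hypothetical pure stand-in for the doesFileExist example.
report :: Bool -> Writer [String] ()
report exists =
  unlessM (return exists) (tell ["Guess I should create the file, huh?"])
```

In IO the usage is the same as the question's example3, e.g. unlessViaBool (doesFileExist "wombat.txt") (putStrLn "Guess I should create the file, huh?").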

How do I avoid memory problems when writing to file using the Writer monad?

I am building some moderately large DIMACS files. However, with the method used below, the memory usage is rather large compared to the size of the generated files, and on some of the larger files I need to generate I run into out-of-memory problems.
import Control.Monad.State.Strict
import Control.Monad.Writer.Strict
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad
import qualified Text.Show.ByteString as BS
import Data.List
main = printDIMACS "test.cnf" test
test = do
    xs <- freshs 100000
    forM_ (zip xs (tail xs))
        (\(x,y) -> addAll [[negate x, negate y],[x,y]])
type Var = Int
type Clause = [Var]
data DIMACSS = DS{
    nextFresh :: Int,
    numClauses :: Int
    } deriving (Show)
type DIMACSM a = StateT DIMACSS (Writer B.ByteString) a
freshs :: Int -> DIMACSM [Var]
freshs i = do
    next <- gets nextFresh
    let toRet = [next..next+i-1]
    modify (\s -> s{nextFresh = next+i})
    return toRet
fresh :: DIMACSM Int
fresh = do
    i <- gets nextFresh
    modify (\s -> s{nextFresh = i+1})
    return i
addAll :: [Clause] -> DIMACSM ()
addAll c = do
    tell
        (B.concat .
        intersperse (B.pack " 0\n") .
        map (B.unwords . map BS.show) $ c)
    tell (B.pack " 0\n")
    modify (\s -> s{numClauses = numClauses s + length c})
add h = addAll [h]
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
    writeFile file ""
    appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
    B.appendFile file b
  where
    (s,b) = runWriter (execStateT f (DS 1 0))
    i = nextFresh s - 1
    j = numClauses s
I would like to keep the monadic building of clauses since it is very handy, but I need to overcome the memory problem. How do I optimize the above program so that it doesn't use too much memory?

If you want good memory behavior, you need to make sure that you write out the clauses as you generate them, instead of collecting them in memory and dumping them all at once. You can do that either by relying on laziness or with a more explicit approach such as conduits, enumerators, pipes or the like.
The main obstacle to that approach is that the DIMACS format expects the number of clauses and variables in the header. This prevents the naive implementation from being sufficiently lazy. There are two possibilities:
The pragmatic one is to write the clauses first to a temporary location. After that the numbers are known, so you write them to the real file and append the contents of the temporary file.
The prettier approach is possible if the generation of clauses has no side effects (besides the effects offered by your DIMACSM monad) and is sufficiently fast: Run it twice, first throwing away the clauses and just calculating the numbers, print the header line, run the generator again; now printing the clauses.
(This is from my experience with implementing SAT-Britney, where I took the second approach, because it fitted better with other requirements in that context.)
Also, in your code, addAll is not lazy enough: The list c needs to be retained even after writing (in the MonadWriter sense) the clauses. This is another space leak. I suggest you implement add as the primitive operation and then addAll = mapM_ add.

As explained in Joachim Breitner's answer, the problem was that DIMACSM was not lazy enough. The strict versions of the monads were used, and the numbers of variables and clauses are needed before the ByteString can be written to the file. The solution is to use the lazy versions of the monads and execute them twice. It turns out that it is also necessary to have WriterT be the outer monad:
import Control.Monad.State
import Control.Monad.Writer
...
type DIMACSM a = WriterT B.ByteString (State DIMACSS) a
...
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
    writeFile file ""
    appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
    B.appendFile file b
  where
    s = execState (execWriterT f) (DS 1 0)
    b = evalState (execWriterT f) (DS 1 0)
    i = nextFresh s - 1
    j = numClauses s
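This sketch combines Joachim Breitner's suggestion (add as the primitive, addAll = mapM_ add) with the lazy WriterT-over-State layout from the answer above. It assumes String output instead of lazy ByteString so that only transformers is needed. example is a hypothetical, much smaller generator in the style of the question's test. The sketch makes no claims about its exact memory profile.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (State, evalState, execState, gets, modify)
import Control.Monad.Trans.Writer (WriterT, execWriterT, tell)

type Var = Int
type Clause = [Var]

data DIMACSS = DS { nextFresh :: Int, numClauses :: Int }
  deriving Show

-- Lazy WriterT on the outside, lazy State inside.
-- String output stands in for lazy ByteString to keep this dependency-free.
type DIMACSM a = WriterT String (State DIMACSS) a

freshs :: Int -> DIMACSM [Var]
freshs i = do
  next <- lift (gets nextFresh)
  lift (modify (\s -> s { nextFresh = next + i }))
  return [next .. next + i - 1]

-- add is the primitive: emit one clause line, then bump the clause counter.
add :: Clause -> DIMACSM ()
add clause = do
  tell (unwords (map show clause) ++ " 0\n")
  lift (modify (\s -> s { numClauses = numClauses s + 1 }))

-- Built from add, so no clause list is kept around just for `length`.
addAll :: [Clause] -> DIMACSM ()
addAll = mapM_ add

-- Run the generator twice: once for the header counts, once for the body.
renderDIMACS :: DIMACSM a -> String
renderDIMACS f = header ++ body
  where
    s = execState (execWriterT f) (DS 1 0)
    body = evalState (execWriterT f) (DS 1 0)
    header = "p cnf " ++ show (nextFresh s - 1) ++ " " ++ show (numClauses s) ++ "\n"

-- Hypothetical small generator in the style of the question's test.
example :: DIMACSM ()
example = do
  xs <- freshs 3
  addAll (zipWith (\x y -> [negate x, y]) xs (tail xs))
```

Here renderDIMACS example should produce the header p cnf 3 2 followed by the clause lines -1 2 0 and -2 3 0. Writing it out would then be writeFile file (renderDIMACS f).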
