I'm trying to understand how to use Shake and how to build new rules. As an exercise, I've decided to implement what I call a backup rule.
The idea is to generate a file if it doesn't exist OR if it's too old (let's say more than 24 hours). I like to store long commands in a makefile and run them on demand. An example is a mysql backup. The only problem is that when the backup already exists, make doesn't do anything. To solve this, I can either:
remove the previous backup before making a new one,
make the backup target phony, or
add a fictive force dependency, which I can touch manually or in cron.
What I would like is to redo the backup if it's older than 24 hours (which I can do with a touch force in cron). Anyway, it's only an example to play with Shake. What I would like is something like:
expirable "my_backup" 24 \out -> do
    cmd "mysqldump" backup_parameter out
I read the doc, but I have no idea how to do this or define a rule and what an Action is.
I understand that I need to instantiate a Rule class but I can't figure out what is what.
Clarification
I don't want the backup to run automatically; it should run only on demand, and at most once per 24 hours.
An example scenario is:
I have a production database on a remote machine and a local copy, and I run some time-consuming reports locally. The normal workflow is:
download production backup
refresh the local database with it
create some denormalized tables on a local warehouse database
generate some report.
I don't run the report every day, but only when I need it, so I don't want to run the report every 24 hours. It's easy to do with a makefile except for the timing bit. There are workarounds, but once again it's a contrived example to understand deeply how Shake works.
So, when I first do make report, it backs up the db, runs everything and generates the report.
Now, I want to modify the report (because I'm testing it). I don't need the backup to be regenerated (nor the local database to be refreshed): it's the evening, and I know that nothing will change on production until the next day.
Then the next day, or next month, I rerun the report. This time I need the backup to be done again, and everything that depends on it to be rerun as well.
Basically the rule I need is instead of
redo timestamp = timestamp < old
is
redo timestamp = timestamp < old || now > timestamp + 24*3600
But I have no idea where to put this rule.
The question is more where to put it than how to write it (it's above).
If it's easier (to explain), I can have a rule which asks the user (getLine) 'do you want to redo this target (yes/no)?'.
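For reference, the predicate above can be written as a small pure function. This is only a sketch: the names maxAge and needsRedo are made up for illustration, and it says nothing yet about where Shake would call it.

```haskell
import Data.Time

-- | Maximum age before a target counts as expired (24 hours, in seconds).
-- Hypothetical constant, chosen to match the question.
maxAge :: NominalDiffTime
maxAge = 24 * 3600

-- | Hypothetical predicate: redo when the target is older than its newest
-- dependency, or when it is more than 'maxAge' old relative to 'now'.
needsRedo :: UTCTime -> UTCTime -> UTCTime -> Bool
needsRedo now newestDep target =
  target < newestDep || diffUTCTime now target > maxAge
```

The first disjunct is the usual make-style timestamp check; the second is the extra expiry condition.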
Later I will also need a rule depending on the last update of the database (or a specific table). I know how to get the information from the database but not how to integrate it in Shake.
I might be confused about what a Rule is. In make, a rule is about how to make a target (so it's more of a recipe), which I think is what the Action is in Shake. Whereas when I say rule, I mean the rule which decides whether to remake the target or not, not how to do it. In make, you don't have the choice (it's the timestamp), so there is no such concept.
There are two senses of "writing rules" in Shake: 1) using *> or similar to define the rules specific to your build system; 2) defining new types of rules, e.g. defining operators such as *> yourself. Most users of Shake do 1 a lot, and never do 2. Your question seems entirely concerned with 2, which is certainly possible (all rules are written outside the core of Shake) but rarer.
To define something which runs while checking the build, you need to use the Development.Shake.Rule module, and define an instance of the type class Rule. You typically want to sugar up the apply1 function so people can use your rule in a type-safe way. If you are writing a simple rule (e.g. look up a modification date, see if it's changed) then it isn't too hard. If you are doing a more complex rule (e.g. check a file is no more than 1 day old) it's a bit trickier, but still possible: it needs more care thinking about what gets stored where. Taking your "rebuild if file is older than some number of seconds" example, we can define:
{-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable, MultiParamTypeClasses #-}
module MaximumAgeRule(maximumAge, includeMaximumAge) where

import Data.Maybe
import Development.Shake.Rule
import Development.Shake.Classes
import Development.Shake
import System.Directory as IO
import Data.Time

newtype MaxAgeQ = MaxAgeQ (FilePath, Double)
    deriving (Show,Binary,NFData,Hashable,Typeable,Eq)

instance Rule MaxAgeQ Double where
    storedValue _ (MaxAgeQ (file, secs)) = do
        exists <- IO.doesFileExist file
        if not exists then return Nothing else do
            mtime <- getModificationTime file
            now <- getCurrentTime
            return $ Just $ fromRational (toRational $ diffUTCTime now mtime)
    equalValue _ (MaxAgeQ (_, t)) old new = if new < t then EqualCheap else NotEqual

-- | Define that the file must be no more than N seconds old
maximumAge :: FilePath -> Double -> Action ()
maximumAge file secs = do
    apply1 $ MaxAgeQ (file, secs) :: Action Double
    return ()

includeMaximumAge :: Rules ()
includeMaximumAge = do
    rule $ \q@(MaxAgeQ (_, secs)) -> Just $ do
        opts <- getShakeOptions
        liftIO $ fmap (fromMaybe $ secs + 1) $ storedValue opts q
We can then use the rule with:
import Development.Shake
import MaximumAgeRule

main = shakeArgs shakeOptions $ do
    includeMaximumAge
    want ["output.txt"]
    "output.txt" *> \out -> do
        maximumAge out (24*60*60)
        liftIO $ putStrLn "rerunning"
        copyFile' "input.txt" "output.txt"
Now the file input.txt will be copied to output.txt every time it changes. In addition, if output.txt is more than one day old, it will be copied afresh.
How the usage works: Since we are using a custom rule, we have to declare that with includeMaximumAge (which is ugly, but unavoidable). We then call maximumAge when producing output.txt, saying that the file output.txt must be no more than 1 day old. If it is older, the rule reruns. Simple and reusable.
How the definition works: The definition is a bit complex, but I don't expect many people to define rules, so a StackOverflow question per rule definition seems reasonable :). We have to define a key and a value for the rule, where the key produces the value. For the key we declare a fresh type (as you always should for keys) which stores the filename and how old it is allowed to be. For the value, we store how old the file is. The storedValue function retrieves the value from the key by querying the file. The equalValue function looks at the value and decides if the value is EqualCheap (don't rebuild) or NotEqual (do rebuild). Normally equalValue does old == new as its main test, but here we don't care what the value was last time (we ignore old), but we do care what the threshold in MaxAgeQ is, and we compare it to the value.
The maximumAge function just invokes apply1 to add a dependency on MaxAgeQ, and includeMaximumAge defines what apply1 calls.
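The age lookup and the freshness test described above can also be tried outside Shake. The following is a minimal sketch using only the directory and time packages; fileAgeSeconds and isFresh are hypothetical names that mirror what storedValue and equalValue do, not part of Shake's API.

```haskell
import Data.Time (diffUTCTime, getCurrentTime)
import System.Directory (doesFileExist, getModificationTime)

-- | Age of a file in seconds, or Nothing if the file does not exist.
-- Mirrors the storedValue logic from the rule above.
fileAgeSeconds :: FilePath -> IO (Maybe Double)
fileAgeSeconds file = do
  exists <- doesFileExist file
  if not exists
    then return Nothing
    else do
      mtime <- getModificationTime file
      now <- getCurrentTime
      return (Just (realToFrac (diffUTCTime now mtime)))

-- | Hypothetical helper mirroring the equalValue decision: a file is fresh
-- only if it exists and its age is below the limit (in seconds).
isFresh :: Double -> Maybe Double -> Bool
isFresh limit = maybe False (< limit)
```

A missing file is treated as not fresh, which matches the rule falling back to secs + 1 when storedValue returns Nothing.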
Here's a solution that partially works:
import Development.Shake
import Control.Monad
import System.Directory as IO
import Data.Time

buildBackupAt :: FilePath -> Action ()
buildBackupAt out = cmd "mysqldump" "-backup" out {- Or whatever -}

-- Argument order chosen for partial application
buildEvery :: NominalDiffTime -> (FilePath -> Action ()) -> FilePath -> Action ()
buildEvery secs act file = do
    alwaysRerun
    exists <- liftIO $ IO.doesFileExist file
    rebuild <- if not exists then return True else do
        mtime <- liftIO $ getModificationTime file
        now <- liftIO $ getCurrentTime
        return $ diffUTCTime now mtime > secs
    when rebuild $ act file

myRules :: Rules ()
myRules = "my_backup" *> buildEvery (24*60*60) buildBackupAt
-- File name is a FilePattern that shake turns into a FilePath; no wildcard here,
-- so it's simple, but you can wildcard, too, as long as your action pays attention
-- to the FilePath passed in.
This will rebuild the backup every day, but will not rebuild if the dependencies declared in buildBackupAt change.
Related
I'm making a program using Haskell that requires simple save and load functions. When I call the save function, I need to put a string into a text file. When I call load, I need to pull the string out of the text file.
I'm aware of the complexities surrounding IO in Haskell. From some reading around online I have discovered that it is possible through a 'main' function. However, I seem to only be able to implement either save, or load... not both.
For example, I have the following function at the moment for reading from the file.
main = do
    contents <- readFile "Test.txt"
    putStrLn contents
How can I also implement a write function? Does it have to be within the same function? Or can I separate it? Also, is there a way of me being able to name the functions load/save? Having to call 'main' when I actually want to call 'load' or 'save' is rather annoying.
I can't find any examples online of someone implementing both, and any implementations I've found of either always go through a main function.
Any advice will be greatly appreciated.
I'm aware of the complexities surrounding IO in Haskell.
It's actually not that complex. It might seem a little intimidating at first but you'll quickly get the hang of it.
How can I also implement a write function?
The same way
Or can I separate it?
Yes
Also, is there a way of me being able to name the functions load/save?
Yes, for example you could do your loading like this:
load :: IO String
load = readFile "Test.txt"
All Haskell programs start inside main, but they don't have to stay there, so you can use it like this:
main :: IO ()
main = do
    contents <- load -- notice we're using the thing we just defined above
    putStrLn contents
Note that main is always what your program does, but main doesn't have to do only a single thing. It could just as well do many things, including, for instance, reading a value and then deciding what to do. Here's a more complicated (complete) example. I expect you won't understand all parts of it right off the bat, but it should at least give you something to play around with:
data Choice = Save | Load

pickSaveOrLoad :: IO Choice
pickSaveOrLoad = do
    putStr "Do you want to save or load? "
    answer <- getLine
    case answer of
        "save" -> return Save
        "load" -> return Load
        _ -> do
            putStrLn "Invalid choice (must pick 'save' or 'load')"
            pickSaveOrLoad

save :: IO ()
save = do
    putStrLn "You picked save"
    putStrLn "<put your saving stuff here>"

load :: IO ()
load = do
    putStrLn "You picked load"
    putStrLn "<put your loading stuff here>"

main :: IO ()
main = do
    choice <- pickSaveOrLoad
    case choice of
        Save -> save
        Load -> load
Of course it's a bit odd to want to do either save or load. Most programs that can do these things want to do both, but I don't know what exactly you're going for, so I kept it generic.
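To fill in the placeholders, here is one possible sketch of save and load as separate, named functions that actually touch a file. The file name Test.txt comes from the question; the saveFile constant and the exact signatures are assumptions for illustration.

```haskell
-- | File used for saving and loading (taken from the question's example).
saveFile :: FilePath
saveFile = "Test.txt"

-- | Write the given string to the save file, replacing any old contents.
save :: String -> IO ()
save = writeFile saveFile

-- | Read the whole save file back as a string.
load :: IO String
load = readFile saveFile
```

From main you could then run something like save "hello" followed by load and putStrLn. One thing to watch for: readFile is lazy, so calling save on the same file before the loaded string has been fully consumed can fail because the file is still open.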
What is the recommended way of running some Action if part of a file changes?
My use-case is given a file that I know exists (concretely elm-package.json), run a shell command (elm package install --yes) if part of the file changes (the dependencies field).
It seems that the Oracle abstraction exposes comparing a value to the last (via Eq). So I tried a newtype like:
newtype ElmDependencies = ElmDependencies () deriving ...
type instance RuleResult ElmDependencies = String
But now, I get stuck actually using this function of type ElmDependencies -> Action String, since the rule I want to write doesn't actually care what the returned String is, it simply wants to be called if the String changes.
In other words,
action $ do
    _ <- askOracle (ElmDependencies ())
    cmd_ "elm package install --yes"
at the top-level doesn't work; it will run the action every time.
Your askOracle approach is pretty close, but Shake needs to be able to identify the "output" of the action, so it can give it a persistent name between runs, so other steps can depend on it, and use that persistent name to avoid recomputing. One way to do that is to make the action create a stamp file, e.g.:
"packages.stamp" *> \out -> do
    _ <- askOracle $ ElmDependencies ()
    cmd_ "elm package install --yes"
    writeFile' out ""
want ["packages.stamp"]
Separately, an alternative to using Oracle is to have a file elm-package-dependencies.json which you generate from elm-package.json, write using writeFileChanged (which gives you Eq for files), and depend on that file in packages.stamp. That way you get Eq on files, and can also easily debug it or delete the -dependencies.json file to force a rerun.
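To illustrate why a write-only-if-changed step gives you equality on files, here is a minimal sketch in plain IO. The name writeIfChanged is hypothetical and this is not Shake's implementation; it only shows the idea that an unchanged file is never rewritten, so its modification time (and anything depending on it) stays put.

```haskell
import Control.Exception (evaluate)
import System.Directory (doesFileExist)

-- | Write the contents to the path only when they differ from what is
-- already on disk. Returns True if the file was (re)written.
writeIfChanged :: FilePath -> String -> IO Bool
writeIfChanged path new = do
  exists <- doesFileExist path
  unchanged <-
    if not exists
      then return False
      else do
        old <- readFile path
        -- Force the lazy read so the file handle is closed before any write.
        _ <- evaluate (length old)
        return (old == new)
  if unchanged
    then return False
    else writeFile path new >> return True
```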
I want to rename a file in Haskell without overwriting an already existing one. In case the target file exists I want to deal with that in my code (by appending something to the file name).
The description of renameFile from System.Directory says:
renameFile old new changes the name of an existing file system object from old to new. If the new object already exists, it is atomically replaced by the old object. Neither path may refer to an existing directory.
Is there any existing module or command that would let me rename without overwriting?
I know I can do the checks myself. I'd just feel much better if there was a function written by someone experienced. Overwritten files are gone for good.
Update
I want to rename photos, videos, live photos by creation date from either EXIF (similar to jhead) or the file system timestamp normalized to the timezone the photo was taken in. It might be that two photos were taken at exactly the same time and would end up with the same name: 2017-01-12 – 11-12-11.jpg. This must not happen. The second photo should be called something like 2017-01-12 – 11-12-11a.jpg.
POSIX can create a new file exclusively: it atomically checks whether a file exists and creates it only if it does not, via the O_EXCL flag to open(). This lets you avoid the race condition in the more obvious implementation, in which two processes may check that a file doesn't exist before either of them creates it, causing one process to overwrite the other's file. This can help here: the idea is to exclusively create an empty file at the target, and then overwrite it with a rename only if the exclusive creation succeeded. If the exclusive creation failed, then another process already created the file. This is exposed in Haskell's unix package, via the openFd function, which either succeeds or else throws an IOException. It can be used like this:
module RenameNoOverwrite where

import Control.Exception
import Control.Monad
import Data.Bits
import System.Directory
import System.Posix.Files
import System.Posix.IO

renameFileNoOverwrite :: FilePath -> FilePath -> IO Bool
renameFileNoOverwrite old new = do
    created <- handle handleIOException $ bracket createNewFile closeFd $ pure $ pure True
    when created $ renameFile old new
    return created
  where
    createNewFile = openFd new WriteOnly (Just defaultMode) defaultFileFlags {exclusive = True}
    defaultMode = ownerReadMode .|. ownerWriteMode .|. groupReadMode .|. otherReadMode
    handleIOException :: IOException -> IO Bool
    handleIOException _ = return False
The key part is the {exclusive = True} option, which sets the O_EXCL flag on the resulting call to open().
Windows has a similar ability, exposed via the CREATE_NEW flag to CreateFile. There's also a MOVEFILE_REPLACE_EXISTING flag to MoveFileEx that looks like it might be useful, but I've never used it and the documentation is not 100% clear to me. These are exposed in Haskell's Win32 package.
Unfortunately there doesn't currently seem to be a portable way of doing this.
Here is one potential solution:
import System.Directory (doesFileExist, renameFile)

-- | Rename a src file as tgt file, safely. If the tgt file exists, don't
-- rename and return False. Otherwise, rename src to tgt and return True.
renameSafely :: FilePath -> FilePath -> IO Bool
renameSafely src tgt = do
    exists <- doesFileExist tgt
    if not exists
        then (renameFile src tgt >> return True)
        else return False
(Disclaimer: I didn't run this through GHC to ensure that it compiles; the ">>" in the then clause might be an issue.)
As noted in the comments, there is a potential race condition in the file system with two processes trying to create or rename a file with the same name at the same time. However, as you pointed out, that is unlikely to be an issue for you.
If renameSafely returns False, then simply try another name. :-)
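The "try another name" step could look roughly like the sketch below. It assumes a rename primitive of type FilePath -> FilePath -> IO Bool (such as renameSafely or renameFileNoOverwrite above) and appends a letter before the extension, as in the question's 11-12-11a.jpg example. The names candidateNames and renameFirstFree are made up for illustration.

```haskell
import System.FilePath (splitExtension)

-- | The original target name followed by variants with a letter suffix
-- inserted before the extension: photo.jpg, photoa.jpg, photob.jpg, ...
candidateNames :: FilePath -> [FilePath]
candidateNames path = path : [base ++ [c] ++ ext | c <- ['a' .. 'z']]
  where
    (base, ext) = splitExtension path

-- | Try each candidate in turn with the supplied rename primitive and
-- return the first name that was accepted, or Nothing if all were taken.
renameFirstFree ::
  (FilePath -> FilePath -> IO Bool) -> FilePath -> FilePath -> IO (Maybe FilePath)
renameFirstFree tryRename src tgt = go (candidateNames tgt)
  where
    go [] = return Nothing
    go (name : rest) = do
      ok <- tryRename src name
      if ok then return (Just name) else go rest
```

Passing the rename primitive as an argument keeps the retry logic independent of whether the underlying check is atomic.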
I thought there would already be a question about this, but I can't find one.
I want my program to print out the date it was compiled on. What's the easiest way to set that up?
I can think of several possibilities, but none of them are what you'd call "easy". Ideally I'd like to be able to just do ghc --make Foo and have Foo print out the compilation date each time I run it.
Various non-easy possibilities that spring to mind:
Learn Template Haskell. Figure out how to use Data.Time to fetch today's date. Find a way to transform that into a string. (Now my program requires TH in order to work. I also need to convince it to recompile that module every time, otherwise I get the compilation date for that module [which never changes] rather than the whole program.)
Write a shell script that generates a tiny Haskell module containing the system date. (Now I have to use that shell script rather than compile my program directly. Also, shell scripting on Windows leaves much to be desired!)
Sit down and write some Haskell code which generates a tiny Haskell module containing the date. (More portable than previous idea - but still requires extra build steps or the date printed will be incorrect.)
There might be some way to do this through Cabal - but do I really want to package up this little program just to get a date facility?
Does anybody have any simpler suggestions?
Using Template Haskell for this is relatively simple.
You just need to:
Run an IO action within the Template Haskell monad:
runIO :: IO a -> Q a
Then create a string literal with:
stringE :: String -> ExpQ
Put the whole expression within a splice:
$( ... )
This program will print time of its compilation:
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH
import Data.Time
main = print $(stringE =<< runIO (show `fmap` Data.Time.getCurrentTime))
You may put the relevant fragment into a module that imports all other modules to make sure it is recompiled.
Or take current revision information from your versioning system. See: TemplateHaskell and IO
The preprocessor helpfully defines __DATE__ and __TIME__ macros (just like in C), so this works:
{-# LANGUAGE CPP #-}
main = putStrLn (__DATE__ ++ " " ++ __TIME__)
This is probably simpler than Michal's suggestion of Template Haskell, but doesn't let you choose the format of the date.
This is something of an extension to this question:
Dispatching to correct function with command line arguments in Haskell
So, as it turns out, I don't have a good solution yet for dispatching "commands" from the command line to other functions. So, I'd like to extend the approach in the question above. It seems cumbersome to have to manually add functions to the table and apply the appropriate transformation function to each function so that it takes a list of the correct size instead of its normal arguments. Instead, I'd like to build a table where I'll add functions and "tag" them with the number of arguments it needs to take from the command line. The "add" procedure, should then take care of composing with the correct "takesXarguments" procedure and adding it to the table.
I'd like to be able to install "packages" of functions into the table, which makes me think I need to be able to keep track of the state of the table, since it will change when packages get installed. Is the Reader Monad or the State Monad what I'm looking for?
No monad necessary. Your tagging idea is on the right track, but that information is encoded probably in a different way than you expected.
I would start with a definition of a command:
type Command = [String] -> IO ()
Then you can make "command maker" functions:
mkCommand1 :: (String -> IO ()) -> Command
mkCommand2 :: (String -> String -> IO ()) -> Command
...
Which serves as the tag. If you don't like the proliferation of functions, you can also make a "command lambda":
arg :: (String -> Command) -> Command
arg f (x:xs) = f x xs
arg f [] = fail "Wrong number of arguments"
So that you can write commands like:
printHelloName :: Command
printHelloName = arg $ \first -> arg $ \last -> \_rest -> do  -- ignore leftover arguments
    putStrLn $ "Hello, Mr(s). " ++ last
    putStrLn $ "May I call you " ++ first ++ "?"
Of course mkCommand1 etc. can be easily written in terms of arg, for the best of both worlds.
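As a rough sketch of that last remark, mkCommand1 and mkCommand2 might be defined on top of arg like this. The done helper is an assumption added here for illustration: it runs the action only when no arguments are left over.

```haskell
type Command = [String] -> IO ()

-- | Consume one argument and pass it to the rest of the command.
arg :: (String -> Command) -> Command
arg f (x : xs) = f x xs
arg _ [] = fail "Wrong number of arguments"

-- | Hypothetical helper: run the action once all arguments are consumed.
done :: IO () -> Command
done act [] = act
done _ _ = fail "Wrong number of arguments"

mkCommand1 :: (String -> IO ()) -> Command
mkCommand1 f = arg $ \x -> done (f x)

mkCommand2 :: (String -> String -> IO ()) -> Command
mkCommand2 f = arg $ \x -> arg $ \y -> done (f x y)
```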
As for packages, Command sufficiently encapsulates choices between multiple subcommands, but they don't compose. One option here is to change Command to:
type Command = [String] -> Maybe (IO ())
Which allows you to compose multiple Commands into a single one by taking the first action that does not return Nothing. Now your packages are just values of type Command as well. (In general with Haskell we are very interested in these compositions -- rather than packages and lists, think about how you can take two of some object to make a composite object)
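A small sketch of what that composition could look like follows. The names orElse, named and package are hypothetical; the only thing taken from the text above is the Maybe-returning Command type and the "first command that does not return Nothing wins" idea.

```haskell
type Command = [String] -> Maybe (IO ())

-- | Try the first command; fall back to the second if it declines.
orElse :: Command -> Command -> Command
orElse a b args = case a args of
  Nothing -> b args
  found -> found

-- | Hypothetical helper: accept only argument lists that start with the
-- given command name, and hand the remaining arguments to the handler.
named :: String -> ([String] -> IO ()) -> Command
named name run (x : xs) | x == name = Just (run xs)
named _ _ _ = Nothing

-- | A "package" is just several commands folded into one.
package :: [Command] -> Command
package = foldr orElse (const Nothing)
```

Since a package is itself a Command, installing one package into another is just another use of orElse, with no table state to track.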
To save you from the desire you have surely built up: (1) there is no reasonable way to detect the number of arguments a function takes*, and (2) there is no way to make a type depend on a number, so you won't be able to create a mkCommand which takes as its first argument an Int for the number of arguments.
Hope this helped.
* In this case, it turns out that there is, but I recommend against it and think it is a bad habit -- when things get more abstract the technique breaks down. But I'm something of a purist; the more duct-tapey Haskellers might disagree with me.