Running an Action if part of a file changes - haskell

What is the recommended way of running some Action if part of a file changes?
My use case: given a file that I know exists (concretely, elm-package.json), run a shell command (elm package install --yes) if part of the file (the dependencies field) changes.
It seems that the Oracle abstraction lets you compare a value to the previous one (via Eq). So I tried a newtype like:
newtype ElmDependencies = ElmDependencies () deriving ...
type instance RuleResult ElmDependencies = String
But now, I get stuck actually using this function of type ElmDependencies -> Action String, since the rule I want to write doesn't actually care what the returned String is, it simply wants to be called if the String changes.
In other words,
action $ do
    _ <- askOracle (ElmDependencies ())
    cmd_ "elm package install --yes"
at the top-level doesn't work; it will run the action every time.

Your askOracle approach is pretty close, but Shake needs to be able to identify the "output" of the action, so it can give it a persistent name between runs, so other steps can depend on it, and use that persistent name to avoid recomputing. One way to do that is to make the action create a stamp file, e.g.:
-- %> is the current name of the operator older Shake versions spelled *>
"packages.stamp" %> \out -> do
    _ <- askOracle $ ElmDependencies ()
    cmd_ "elm package install --yes"
    writeFile' out ""
want ["packages.stamp"]
Separately, an alternative to using an Oracle is to have a file elm-package-dependencies.json which you generate from elm-package.json, write using writeFileChanged (which only rewrites the file when its contents actually change), and depend on that file in packages.stamp. That way you get Eq on files, and can also easily debug it or delete the -dependencies.json file to force a rerun.
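For reference, here is a minimal sketch of the stamp-file approach as a complete rules definition, assuming a Shake version that provides `addOracle`, `RuleResult` and the `%>` operator (older versions spelled it `*>`). The `dependenciesSection` helper is hypothetical and deliberately naive: it grabs the text from the `"dependencies"` key up to the first closing brace instead of parsing the JSON, so treat it as a placeholder.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable, TypeFamilies #-}
import Data.List (isPrefixOf, tails)
import Development.Shake
import Development.Shake.Classes

-- The oracle key; its answer is the dependencies section as a String.
newtype ElmDependencies = ElmDependencies ()
  deriving (Show, Typeable, Eq, Hashable, Binary, NFData)
type instance RuleResult ElmDependencies = String

-- Hypothetical, naive extractor: the text from the "dependencies" key up to
-- the first '}'. A real build should parse elm-package.json as JSON.
dependenciesSection :: String -> String
dependenciesSection src =
  case dropWhile (not . isPrefixOf "\"dependencies\"") (tails src) of
    (rest : _) -> takeWhile (/= '}') rest
    []         -> ""

rules :: Rules ()
rules = do
  want ["packages.stamp"]
  -- The oracle is evaluated on every run; rules that asked for it only
  -- rerun when its answer differs from the stored one.
  _ <- addOracle $ \(ElmDependencies ()) ->
    dependenciesSection <$> readFile' "elm-package.json"
  "packages.stamp" %> \out -> do
    _ <- askOracle (ElmDependencies ())
    cmd_ "elm package install --yes"
    writeFile' out ""
```

Hook it up with `main = shakeArgs shakeOptions rules`. The stamp file is what gives the action a persistent name; the oracle answer is what decides whether that name is out of date.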


Adding an extra dependency in new Rules to existing Rules

I am writing a Shakefile with the aim of making it extensible with new Rules. Its interface is a function mainFor :: Rules () -> IO (), the idea being that client projects would only need to define main = mainFor myCustomRules to get the whole thing working. mainFor customRules is defined as a bunch of Shake Rules followed by a call to customRules.
This works as long as the custom rules passed to mainFor are for new targets.
However, some of my stock (non-custom) rules are basically of the form "run this big opaque proprietary external script with this input and hope for the best"; and there can be extra files used by the external script depending on its input. For example, imagine I have a rule of the following form:
"_build/output.bin" %> \out -> do
    need ["_build/script.scr", "_build/src/generated.src"]
    runExternalScript
For a particular client project, maybe the generated source code contains references to another file _build/src/extrainput.src. So in the custom rules passed to mainFor, not only do I need extra rules for this file, but the existing rule should also be modified to mark that it needs this input:
main = mainFor $ do
    "_build/src/extrainput.src" %> \out -> do
        generateExtraSrc
    "_build/output.bin" %> \out -> do
        need ["_build/src/extrainput.src"]
but this, unsurprisingly, fails because both the stock rule in mainFor and the second custom rule passed in the customRules argument are for the same target. Note that I do not want to fully override the stock rule, only extend it to add the extra dependency.
There is currently no way to do this using Shake. The possibilities are:
Add it to Shake. Whether that's the right thing depends on how common this requirement is - and my guess is relatively rare - but that needs validating. The fact you want the dependencies run before the rule is more concerning - it's somehow less compositional than just providing multiple actions that together produce a result.
Do it on the outside. My straw man would be to write the "extras" as some kind of FilePath -> Action () function, then define your own %> that also applied that function to the output. It would only work with pre-selected extension points, but if you redefine %> at the top of the file it can hit all your instances.
If you really want to hide it more, use shakeExtra to store the state in some way.
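A minimal sketch of the "do it on the outside" option, assuming the client's extras can be expressed as a plain `FilePath -> [FilePath]` function listing extra files to `need` (a narrower variant of the `FilePath -> Action ()` idea, chosen here because it is easy to test). `mainFor` takes the extras as an additional argument, and every stock rule is defined through a local wrapper around `%>` that needs those files first. `runExternalScript` and the `run-external-script` command are hypothetical placeholders.

```haskell
import Development.Shake

-- Extra inputs a client project wants added to a stock rule's output.
type Extras = FilePath -> [FilePath]

-- Hypothetical placeholder for the opaque external script.
runExternalScript :: FilePath -> Action ()
runExternalScript out = cmd_ "run-external-script" ["_build/script.scr", out]

mainFor :: Extras -> Rules () -> IO ()
mainFor extras customRules = shakeArgs shakeOptions $ do
  -- Like %>, but first needs whatever extra inputs the client declared.
  let stock pat act = pat %> \out -> need (extras out) >> act out
  stock "_build/output.bin" $ \out -> do
    need ["_build/script.scr", "_build/src/generated.src"]
    runExternalScript out
  customRules

-- Client side: output.bin additionally depends on extrainput.src.
clientExtras :: Extras
clientExtras "_build/output.bin" = ["_build/src/extrainput.src"]
clientExtras _ = []
```

The client then writes `main = mainFor clientExtras myCustomRules`, where `myCustomRules` only contains the rule that generates `_build/src/extrainput.src`, so each target still has exactly one rule.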

How can I ensure the destination file does not exist when renaming a file with Haskell, to avoid overwriting it?

I want to rename a file in Haskell without overwriting an already existing one. In case the target file exists I want to deal with that in my code (by appending something to the file name).
The description of renameFile from System.Directory says:
renameFile old new changes the name of an existing file system object from old to new. If the new object already exists, it is atomically replaced by the old object. Neither path may refer to an existing directory.
Is there any existing module or command that would let me rename without overwriting?
I know I can do the checks myself. I'd just feel much better if there was a function written by someone experienced. Overwritten files are gone for good.
Update
I want to rename photos, videos and live photos by creation date, taken either from EXIF (similar to jhead) or from the file system timestamp normalized to the timezone the photo was taken in. Two photos might have been taken at exactly the same time and would then end up with the same name: 2017-01-12 – 11-12-11.jpg. This must not happen. The second photo should be called something like 2017-01-12 – 11-12-11a.jpg.
POSIX has the ability to create a new file exclusively: atomically check whether a file exists and create it only if it does not, via the O_EXCL flag to open(). This lets you avoid the race condition in the more obvious implementation, in which two processes may both check that a file doesn't exist before either of them creates it, causing one process to overwrite the other's file. This can help here: the idea is to exclusively create an empty file at the target, and then overwrite it with a rename only if the exclusive creation succeeded. If the exclusive creation failed, then another process already created the file. This is exposed in Haskell's unix package via the openFd function, which either succeeds or throws an IOException. It can be used like this:
module RenameNoOverwrite where

import Control.Exception
import Control.Monad
import Data.Bits
import System.Directory
import System.Posix.Files
import System.Posix.IO

renameFileNoOverwrite :: FilePath -> FilePath -> IO Bool
renameFileNoOverwrite old new = do
    -- Exclusively create a placeholder at the target; False if it already exists.
    created <- handle handleIOException $ bracket createNewFile closeFd $ pure $ pure True
    -- Only replace our own placeholder, never someone else's file.
    when created $ renameFile old new
    return created
  where
    -- This uses the openFd signature from older versions of unix; newer
    -- versions moved the file mode into OpenFileFlags, so check yours.
    createNewFile = openFd new WriteOnly (Just defaultMode) defaultFileFlags {exclusive = True}
    defaultMode = ownerReadMode .|. ownerWriteMode .|. groupReadMode .|. otherReadMode
    handleIOException :: IOException -> IO Bool
    handleIOException _ = return False
The key part is the {exclusive = True} option, which sets the O_EXCL flag on the resulting call to open().
Windows has a similar ability, exposed via the CREATE_NEW flag to CreateFile. There's also a MOVEFILE_REPLACE_EXISTING flag to MoveFileEx that looks like it might be useful, but I've never used it and the documentation is not 100% clear to me. These are exposed in Haskell's Win32 package.
Unfortunately there doesn't currently seem to be a portable way of doing this.
Here is one potential solution:
import System.Directory (doesFileExist, renameFile)

-- | Rename a src file as tgt file, safely. If the tgt file exists, don't
-- rename and return False. Otherwise, rename src to tgt and return True.
renameSafely :: FilePath -> FilePath -> IO Bool
renameSafely src tgt = do
    exists <- doesFileExist tgt
    if not exists
        then (renameFile src tgt >> return True)
        else return False
(Disclaimer: I didn't run this through GHC to ensure that it compiles; the ">>" in the then clause might be an issue.)
As noted in the comments, there is a potential race condition in the file system with two processes trying to create or rename a file with the same name at the same time. However, as you pointed out, that is unlikely to be an issue for you.
If renameSafely returns False, then simply try another name. :-)
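Following that "try another name" idea, here is a sketch of the retry loop for the photo-naming case in the question, producing names like `2017-01-12 – 11-12-11a.jpg`. It takes the rename function as a parameter, so either `renameSafely` or the race-free `renameFileNoOverwrite` can be plugged in. The `a`..`z` suffix scheme and the names `candidateNames` and `renameWithSuffix` are assumptions made for illustration.

```haskell
import System.FilePath (splitExtension)

-- | The target itself, then the target with a single letter a..z appended
-- before the extension ("p.jpg", "pa.jpg", "pb.jpg", ...).
candidateNames :: FilePath -> [FilePath]
candidateNames target = target : [base ++ [c] ++ ext | c <- ['a' .. 'z']]
  where
    (base, ext) = splitExtension target

-- | Try each candidate with the supplied no-overwrite rename function and
-- return the name that was used, or Nothing if every candidate was taken.
renameWithSuffix :: (FilePath -> FilePath -> IO Bool) -> FilePath -> FilePath -> IO (Maybe FilePath)
renameWithSuffix tryRename src target = go (candidateNames target)
  where
    go [] = pure Nothing
    go (c : cs) = do
      ok <- tryRename src c
      if ok then pure (Just c) else go cs
```

For example, `renameWithSuffix renameFileNoOverwrite "IMG_0001.jpg" "2017-01-12 – 11-12-11.jpg"` would try the plain name first and then the lettered variants.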

How to define a timer rule in Shake

I'm trying to understand how to use Shake and how to build new rules. As an exercise, I've decided to implement what I call a backup rule.
The idea is to generate a file if it doesn't exist OR if it's too old (let's say more than 24 hours). I like to store long commands in a makefile and run them on demand. An example is a MySQL backup. The only problem is that when the backup already exists, make doesn't do anything. To solve this, I can either
remove the previous backup before redoing a new one,
make the backup target phony
add a fictive force dependency, which I can touch manually or in cron.
What I would like is to redo the backup if it's older than 24 hours (which I can do with a touch force in cron). Anyway, it's only an example to play with Shake. What I would like is something like:
expirable "my_backup" 24 $ \out -> do
    cmd "mysqldump" backup_parameter out
I read the docs, but I have no idea how to do this, how to define a rule, or what an Action is.
I understand that I need to instantiate a Rule class but I can't figure out what is what.
Clarification
I don't want the backup to be run automatically, but only on demand, and at most once every 24 hours.
An example scenario:
I have a production database on a remote machine, keep a local copy, and run some time-consuming reports locally. The normal workflow is
download production backup
refresh the local database with it
create some denormalized tables on a local warehouse database
generate some report.
I don't run the report every day, only when I need it, so I don't want to run the report every 24 hours. It's easy to do with a makefile except for the timing bit; there are workarounds, but once again it's a contrived example to understand deeply how Shake works.
So, when I first run make report, it backs up the database, runs everything and generates the report.
Now I want to modify the report (because I'm testing it). I don't need the backup to be regenerated (nor the local database to be refreshed): it's evening, and I know that nothing will change on production until the next day.
Then the next day, or next month, I rerun the report. This time I need the backup to be done again, and everything that depends on it to be rerun as well.
Basically the rule I need is instead of
redo timestamp = timestamp < old
is
redo timestamp = timestamp < old || now > timestamp + 24*3600
But I have no idea where to put this rule.
The question is more where to put it than how to write it (it's above).
If it's easier (to explain), I can have a rule which asks the user (getLine) 'Do you want to redo this target (yes/no)?'.
Later I will also need a rule depending on the last update of the database (or a specific table). I know how to get the information from the database but not how to integrate it in Shake.
I might be confused about what a Rule is. In make, a rule is about how to make a target (so it's more of a recipe), which I think is what an Action is in Shake. Whereas when I say rule, I mean the rule which decides whether to remake the target or not, not how to do it. In make, you don't have the choice (it's timestamps), so there is no such concept.
There are two senses of "writing rules" in Shake: 1) using %> (spelled *> in older versions) or similar to define the rules specific to your build system; 2) defining new types of rules, e.g. defining operators such as %> yourself. Most users of Shake do 1 a lot, and never do 2. Your question seems entirely concerned with 2, which is certainly possible (all rules are written outside the core of Shake) but rarer.
To define something which runs while checking the build, you need to use the Development.Shake.Rule module, and define an instance of the type class Rule. You typically want to sugar up the apply1 function so people can use your rule in a type-safe way. If you are writing a simple rule (e.g. look up a modification date, see if it's changed) then it isn't too hard. If you are doing a more complex rule (e.g. check a file is no more than 1 day old) it's a bit trickier, but still possible; it needs more care thinking about what gets stored where. Taking your "rebuild if file is older than some number of seconds" example, we can define:
{-# LANGUAGE MultiParamTypeClasses, GeneralizedNewtypeDeriving, DeriveDataTypeable #-}
-- Written against the Rule type class from older Shake releases; newer
-- versions replaced it with addBuiltinRule, so this needs an older Shake.
module MaximumAgeRule(maximumAge, includeMaximumAge) where

import Data.Maybe
import Development.Shake.Rule
import Development.Shake.Classes
import Development.Shake
import System.Directory as IO
import Data.Time

-- | Key: a file and the maximum age (in seconds) it is allowed to have.
newtype MaxAgeQ = MaxAgeQ (FilePath, Double)
    deriving (Show,Binary,NFData,Hashable,Typeable,Eq)

-- | Value: how old the file currently is, in seconds.
instance Rule MaxAgeQ Double where
    storedValue _ (MaxAgeQ (file, secs)) = do
        exists <- IO.doesFileExist file
        if not exists then return Nothing else do
            mtime <- getModificationTime file
            now <- getCurrentTime
            return $ Just $ fromRational (toRational $ diffUTCTime now mtime)
    -- Ignore the old value; rebuild whenever the age exceeds the threshold.
    equalValue _ (MaxAgeQ (_, t)) old new = if new < t then EqualCheap else NotEqual

-- | Define that the file must be no more than N seconds old
maximumAge :: FilePath -> Double -> Action ()
maximumAge file secs = do
    apply1 $ MaxAgeQ (file, secs) :: Action Double
    return ()

includeMaximumAge :: Rules ()
includeMaximumAge = do
    rule $ \q@(MaxAgeQ (_, secs)) -> Just $ do
        opts <- getShakeOptions
        liftIO $ fmap (fromMaybe $ secs + 1) $ storedValue opts q
We can then use the rule with:
import Development.Shake
import MaximumAgeRule

main = shakeArgs shakeOptions $ do
    includeMaximumAge
    want ["output.txt"]
    "output.txt" %> \out -> do
        maximumAge out (24*60*60)
        liftIO $ putStrLn "rerunning"
        copyFile' "input.txt" "output.txt"
Now the file input.txt will be copied to output.txt every time it changes. In addition, if output.txt is more than one day old, it will be copied afresh.
How the usage works: Since we are using a custom rule, we have to declare that with includeMaximumAge (which is ugly, but unavoidable). We then call maximumAge when producing output.txt, saying that the file output.txt must be no more than 1 day old. If it is older than that, the rule reruns. Simple and reusable.
How the definition works: The definition is a bit complex, but I don't expect many people to define rules, so a StackOverflow question per rule definition seems reasonable :). We have to define a key and a value for the rule, where the key produces the value. For the key we declare a fresh type (as you always should for keys) which stores the filename and how old it is allowed to be. For the value, we store how old the file is. The storedValue function retrieves the value from the key by querying the file. The equalValue function looks at the value and decides if the value is EqualCheap (don't rebuild) or NotEqual (do rebuild). Normally equalValue does old == new as its main test, but here we don't care what the value was last time (we ignore old), but we do care what the threshold in MaxAgeQ is, and we compare it to the value.
The maximumAge function just invokes apply1 to add a dependency on MaxAgeQ, and includeMaximumAge defines what apply1 calls.
Here's a solution that partially works:
import Development.Shake
import Control.Monad
import System.Directory as IO
import Data.Time

buildBackupAt :: FilePath -> Action ()
buildBackupAt out = cmd "mysqldump" "-backup" out {- Or whatever -}

-- Argument order chosen for partial application
buildEvery :: NominalDiffTime -> (FilePath -> Action ()) -> FilePath -> Action ()
buildEvery secs act file = do
    alwaysRerun
    exists <- liftIO $ IO.doesFileExist file
    rebuild <- if not exists then return True else do
        mtime <- liftIO $ getModificationTime file
        now <- liftIO $ getCurrentTime
        return $ diffUTCTime now mtime > secs
    when rebuild $ act file

myRules :: Rules ()
myRules = "my_backup" %> buildEvery (24*60*60) buildBackupAt
-- The file name is a FilePattern that Shake matches against the FilePath being
-- built; there is no wildcard here, so it's simple, but you can use wildcards
-- too, as long as your action pays attention to the FilePath passed in.
This will rebuild the backup every day, but will not rebuild if the dependencies declared in buildBackupAt change.
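The staleness test at the heart of `buildEvery` (and of the `redo` pseudocode in the question) can be pulled out into a pure function, which makes the 24-hour rule easy to check without touching the file system. This is a sketch; the name `isStale` and the use of `Nothing` for a missing file are choices made here, not part of Shake.

```haskell
import Data.Time

-- | Should a file be rebuilt? 'Nothing' means the file does not exist yet;
-- otherwise the argument is its modification time.
isStale :: NominalDiffTime -> UTCTime -> Maybe UTCTime -> Bool
isStale _ _ Nothing = True
isStale maxAge now (Just mtime) = diffUTCTime now mtime > maxAge
```

Inside `buildEvery`, `rebuild` would then be `isStale secs now mtime`, with `mtime` set to `Nothing` when `doesFileExist` returns False.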

Which Monad do I need?

This is something of an extension to this question:
Dispatching to correct function with command line arguments in Haskell
So, as it turns out, I don't have a good solution yet for dispatching "commands" from the command line to other functions. So, I'd like to extend the approach in the question above. It seems cumbersome to have to manually add functions to the table and apply the appropriate transformation function to each function so that it takes a list of the correct size instead of its normal arguments. Instead, I'd like to build a table where I'll add functions and "tag" them with the number of arguments they need to take from the command line. The "add" procedure should then take care of composing with the correct "takesXarguments" procedure and adding it to the table.
I'd like to be able to install "packages" of functions into the table, which makes me think I need to be able to keep track of the state of the table, since it will change when packages get installed. Is the Reader Monad or the State Monad what I'm looking for?
No monad necessary. Your tagging idea is on the right track, but that information is probably encoded in a different way than you expected.
I would start with a definition of a command:
type Command = [String] -> IO ()
Then you can make "command maker" functions:
mkCommand1 :: (String -> IO ()) -> Command
mkCommand2 :: (String -> String -> IO ()) -> Command
...
Which serves as the tag. If you don't like the proliferation of functions, you can also make a "command lambda":
arg :: (String -> Command) -> Command
arg f (x:xs) = f x xs
arg f [] = fail "Wrong number of arguments"
So that you can write commands like:
printHelloName :: Command
printHelloName = arg $ \first -> arg $ \last _ -> do  -- _ ignores leftover arguments
    putStrLn $ "Hello, Mr(s). " ++ last
    putStrLn $ "May I call you " ++ first ++ "?"
Of course mkCommand1 etc. can be easily written in terms of arg, for the best of both worlds.
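Here is a runnable sketch of that: `mkCommand1` and `mkCommand2` written in terms of `arg`, plus a small helper `done` (an addition, not from the answer) that rejects leftover arguments, so a command built with it only runs when it receives exactly the right number.

```haskell
type Command = [String] -> IO ()

-- | Consume one argument and pass it to the rest of the command.
arg :: (String -> Command) -> Command
arg f (x : xs) = f x xs
arg _ []       = fail "Wrong number of arguments"

-- | Finish a command: run the action only if no arguments are left over.
done :: IO () -> Command
done act [] = act
done _ _    = fail "Wrong number of arguments"

mkCommand1 :: (String -> IO ()) -> Command
mkCommand1 f = arg $ \a -> done (f a)

mkCommand2 :: (String -> String -> IO ()) -> Command
mkCommand2 f = arg $ \a -> arg $ \b -> done (f a b)

greet :: Command
greet = mkCommand2 $ \first lastName ->
  putStrLn ("Hello, " ++ first ++ " " ++ lastName)
```

The "tag" is simply which `mkCommandN` (or how many nested `arg`s) you use; no runtime arity information is needed.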
As for packages, Command sufficiently encapsulates choices between multiple subcommands, but they don't compose. One option here is to change Command to:
type Command = [String] -> Maybe (IO ())
Which allows you to compose multiple Commands into a single one by taking the first action that does not return Nothing. Now your packages are just values of type Command as well. (In general with Haskell we are very interested in these compositions -- rather than packages and lists, think about how you can take two of some object to make a composite object)
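A sketch of that composable version, assuming the `Maybe` variant of `Command`: `sub` and `done` are illustrative helpers (not from the answer), and `choice` composes commands by taking the first one that accepts the arguments, using `<|>` on `Maybe`. A "package" is then just another `Command`.

```haskell
import Control.Applicative ((<|>))

type Command = [String] -> Maybe (IO ())

-- | Consume one argument, or refuse if there is none.
arg :: (String -> Command) -> Command
arg f (x : xs) = f x xs
arg _ []       = Nothing

-- | Accept only when every argument has been consumed.
done :: IO () -> Command
done act [] = Just act
done _ _    = Nothing

-- | Match a literal subcommand name, then run the rest of the command.
sub :: String -> Command -> Command
sub name c (x : xs) | x == name = c xs
sub _ _ _ = Nothing

-- | Compose commands: the first one that accepts the arguments wins.
choice :: [Command] -> Command
choice cs args = foldr (\c rest -> c args <|> rest) Nothing cs

greetPackage :: Command
greetPackage = choice
  [ sub "hello" (arg $ \name -> done (putStrLn ("Hello, " ++ name)))
  , sub "bye" (done (putStrLn "Goodbye"))
  ]
```

A top-level dispatcher could then be `choice [greetPackage, otherPackage]` (with `otherPackage` standing for any other package), run with something like `maybe (putStrLn "Unknown command") id (dispatcher args)`.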
To save you from the desire you have surely built up: (1) there is no reasonable way to detect the number of arguments a function takes*, and (2) there is no way to make a type depend on a number, so you won't be able to create a mkCommand which takes as its first argument an Int for the number of arguments.
Hope this helped.
* In this case, it turns out that there is, but I recommend against it and think it is a bad habit -- when things get more abstract the technique breaks down. But I'm something of a purist; the more duct-tapey Haskellers might disagree with me.

FastCGI Haskell script to make use of Pandoc text conversion

1. Motivation
I'm writing my own mini-wiki. I want to be able to easily convert from Markdown to LaTeX/HTML and vice versa. After some searching I discovered Pandoc, which is written in Haskell, and found that I could use the FastCGI module to run a Haskell program on my Apache server.
2. Problem/ Question
I'm not sure what protocol I should use to send my FastCGI script the input/output variables (POST or GET?), and how this is done exactly. Any ideas, suggestions, solutions?
3. Steps taken
3.1 Attempt
Here is what I've done so far (based on example code). Note, I have no experience in Haskell and at the moment I don't have too much time to learn the language. I'd just love to be able to use the pandoc text format conversion tool.
module Main ( main ) where
import Control.Concurrent
import Network.FastCGI
import Text.Pandoc
--initialize Variables/ functions
fastcgiResult :: CGI CGIResult
markdownToHTML:: String -> String
--implement conversion function
markdownToHTML s = writeLaTeX defaultWriterOptions {writerReferenceLinks = True} (readMarkdown defaultParserState s)
--main action
fastcgiResult = do
    setHeader "Content-type" "text/plain"
    n <- queryString
    output $ (markdownToHTML n)
main :: IO ()
main = runFastCGIConcurrent' forkIO 10 fastcgiResult
This code reads the string after the question mark in the request URL. But this is not a good solution, as certain characters are dropped (e.g. '#') and spaces are replaced by "%20".
Thanks in advance.
3.2 Network.CGI
Documentation found here. Under the heading "Input" there are a number of methods to get input. Which one is right for me?
Is it :
Get the value of an input variable, for example from a form. If the variable has multiple values, the first one is returned. Example:
query <- getInput "query"
So let's say I have an HTML POST form with name='Joe': can I grab this using getInput? And if so, how do I handle the Maybe String type?
The fastCGI package is actually an extension of the cgi package, which includes the protocol types for receiving request data and returning result pages. I'd suggest using CGI to start with, and then moving to fastCGI once you know what you are doing.
You might also want to look at this tutorial.
Edit to answer questions about the tutorial:
"Maybe a" is a type that can either contain "Just a" or "Nothing". Most languages use a null pointer to indicate that there is no data, but Haskell doesn't have null pointers. So we have an explicit "Maybe" type instead for cases when the data might be null. The two constructors ("Just" and "Nothing") along with the type force you to explicitly allow for the null case when it might happen, but also let you ignore it when it can't happen.
The "maybe" function is the universal extractor for Maybe types. The signature is:
maybe :: b -> (a -> b) -> Maybe a -> b
Taking the arguments from front to back, the "Maybe a" third argument is the value you are trying to work with. The second argument is a function called if the third argument is "Just v", in which case the result is "f v". The first argument is the default, returned if the third is "Nothing".
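As a small illustration (the name `greeting` is made up for the example), `maybe` lets a handler supply a default when a form field is missing and otherwise build the reply from the value:

```haskell
-- | Reply text for an optional "name" form field.
greeting :: Maybe String -> String
greeting = maybe "Please enter your name." (\name -> "Hello " ++ name)
```

Here `greeting Nothing` is `"Please enter your name."` and `greeting (Just "Joe Bloggs")` is `"Hello Joe Bloggs"`.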
In this case, the trick is that the "cgiMain" function is called twice. If it finds an input field "name" then the "mn" variable will be set to (Just "Joe Bloggs"), otherwise it will be set to (Nothing). (I'm using brackets to delimit values now because quotes are being used for strings).
So the "maybe" call returns the page to render. The first time through no name is provided, so "mn" is (Nothing) and the default "inputForm" page is returned for rendering. When the user clicks Submit the same URL is requested, but this time with the "name" field set, so now you get the "greet" function called with the name as an argument, so it says "Hello Joe Bloggs".
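Applied to the original question, a handler along these lines reads a form field with `getInput` (which in the cgi package covers both GET query strings and POST form data) and falls back to a message when the field is absent. The field name `text` and the `convert` function are placeholders; `convert` stands in for the Pandoc call, whose API has changed across Pandoc versions.

```haskell
import Network.CGI

-- Placeholder for the real conversion (e.g. a Pandoc reader/writer pair).
convert :: String -> String
convert = id

-- | Pure part of the handler: what to send back for an optional field.
respond :: Maybe String -> String
respond = maybe "No 'text' field was submitted.\n" convert

cgiMain :: CGI CGIResult
cgiMain = do
  setHeader "Content-type" "text/plain"
  mText <- getInput "text"   -- Nothing if the form did not send "text"
  output (respond mText)
```

Run it as a plain CGI program with `main = runCGI (handleErrors cgiMain)`; once that works, the question's `runFastCGIConcurrent' forkIO 10 cgiMain` can serve the same handler over FastCGI.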
