Moving from static configuration to dynamic configuration - haskell

I am working on a Haskell project where the settings currently live in a file called Setting.hs, so they are checked at compile time and can be accessed globally.
However, since that is a bit too static, I was considering reading the configuration at runtime. The codebase is huge, and it seems it would take considerable effort to pass the settings, e.g. as an argument, through the whole program flow, since they may be accessed from anywhere.
Are there any design patterns, libraries or even GHC extensions that can help here without refactoring the whole code?

Thanks for the hints! I came up with a minimal example which shows how I will go about it with the reflection package:
{-# LANGUAGE Rank2Types, FlexibleContexts, UndecidableInstances #-}
import Data.Reflection
data GlobalConfig = MkGlobalConfig {
    getVal1 :: Int
  , getVal2 :: Double
  , getVal3 :: String
  }
main :: IO ()
main = do
  let config = MkGlobalConfig 1 2.0 "test"
  -- initialize the program flow via 'give'
  print $ give config (doSomething 2)
  -- this works too, the type is properly inferred
  print $ give config (3 + 3)
  -- and this as well
  print $ give config (addInt 7 3)
-- We need the Given constraint, because we call 'somethingElse', which finally
-- calls 'given' to retrieve the configuration. So it has to be propagated up
-- the program flow.
doSomething :: (Given GlobalConfig) => Int -> Int
doSomething = somethingElse "abc"
-- since we call 'given' inside the function to retrieve the configuration,
-- we need the Given constraint
somethingElse :: (Given GlobalConfig) => String -> Int -> Int
somethingElse str x
  | str == "something" = x + getVal1 given
  | getVal3 given == "test" = 0 + getVal1 given
  | otherwise = round (fromIntegral x * getVal2 given)
-- no need for Given constraint here, since this does not use 'given'
-- or any other functions that would
addInt :: Int -> Int -> Int
addInt = (+)
The Given class is a bit easier to work with and perfectly suitable for a global configuration model. All functions that do not make use of given (which gets the value) don't seem to need the class constraint. That means I only have to change functions that actually access the global configuration.
That's what I was looking for.

What you are asking for, if it were possible, would break referential transparency, at least for pure functions: the result of a pure function can depend on global constants, but it cannot depend on the contents of a config file read at runtime.
Usually people avoid this situation by passing the configuration implicitly as data via a monad. Alternatively (if you are happy to refactor your code a bit) you can use the implicit parameters extension, which in theory was made to solve this type of problem but in practice doesn't really work.
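For illustration, here is a minimal sketch of the implicit-parameter approach mentioned above. The Config record and all function names are hypothetical and not taken from the question's codebase. Note that every function on the path to the configuration still needs the constraint in its signature, which may be part of why the extension helps less than one would hope.

```haskell
{-# LANGUAGE ImplicitParams #-}

-- Hypothetical configuration record, for illustration only.
data Config = Config { verbose :: Bool, scale :: Double }

-- The constraint (?config :: Config) acts like a hidden argument.
scaled :: (?config :: Config) => Double -> Double
scaled x = x * scale ?config

-- A caller of 'scaled' has to carry the same constraint.
report :: (?config :: Config) => Double -> String
report x = (if verbose ?config then "value: " else "") ++ show (scaled x)

-- Bind the implicit parameter once, near the top of the program.
runWith :: Config -> Double -> String
runWith cfg x = let ?config = cfg in report x
```

Here runWith would be called from main after the configuration has been read.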
However, if you really need to, you can use unsafePerformIO and an IORef to get top-level mutable state, which is dirty and frowned upon. You need top-level mutable state because you have to be able to modify ("mutate") your initial config when you load it.
Then you get things like this:
import Data.IORef (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)
myGlobalVar :: IORef Int
{-# NOINLINE myGlobalVar #-}
myGlobalVar = unsafePerformIO (newIORef 17)
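A slightly fuller sketch of how such a top-level IORef might be used for configuration, assuming a hypothetical Config record and helper names (loadConfig, greet) that do not come from the original answer. Code that reads the reference still lives in IO, so this does not make the configuration available to pure functions.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical configuration record.
newtype Config = Config { greeting :: String }

-- Top-level mutable cell holding a default until the real config is loaded.
globalConfig :: IORef Config
{-# NOINLINE globalConfig #-}
globalConfig = unsafePerformIO (newIORef (Config "default"))

-- Called once from main, after parsing the configuration file.
loadConfig :: Config -> IO ()
loadConfig = writeIORef globalConfig

-- Any IO code can now read the configuration without taking it as an argument.
greet :: String -> IO String
greet name = do
  cfg <- readIORef globalConfig
  return (greeting cfg ++ ", " ++ name)
```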

Related

Changing a single record field to be strict leads to worse performance

I have a program that uses haskell-src-exts, and to improve performance I decided to make some record fields strict. This resulted in much worse performance.
Here's the complete module that I'm changing:
{-# LANGUAGE DeriveDataTypeable, BangPatterns #-}
module Cortex.Hackage.HaskellSrcExts.Language.Haskell.Exts.SrcSpan(
    SrcSpan, srcSpan, srcSpanFilename, srcSpanStartLine,
    srcSpanStartColumn, srcSpanEndLine, srcSpanEndColumn,
  ) where
import Control.DeepSeq
import Data.Data
data SrcSpan = SrcSpanX
  { srcSpanFilename :: String
  , srcSpanStartLine :: Int
  , srcSpanStartColumn :: Int
  , srcSpanEndLine :: Int
  , srcSpanEndColumn :: Int
  }
  deriving (Eq,Ord,Show,Typeable,Data)
srcSpan :: String -> Int -> Int -> Int -> Int -> SrcSpan
srcSpan fn !sl !sc !el !ec = SrcSpanX fn sl sc el ec
instance NFData SrcSpan where
  rnf (SrcSpanX x1 x2 x3 x4 x5) = rnf x1
Note that the only way to construct a SrcSpan is by using the srcSpan function which is strict in all the Ints.
With this code my program (sorry, I can't share it) runs in 163s.
Now change a single line, e.g.,
, srcSpanStartLine :: !Int
I.e., the srcSpanStartLine field is now marked as strict. My program now takes 198s to run. So making that one field strict increases the running time by about 20%.
How is this possible? The code for the srcSpan function should be the same regardless since it is already strict. The code for the srcSpanStartLine selector should be a bit simpler since it no longer has to evaluate.
I've experimented with -funbox-strict-fields and -funbox-small-strict-fields on and off. It doesn't make any noticeable difference. I'm using GHC 7.8.3.
Has anyone seen something similar? Any bright ideas what might cause it?
With some more investigation I can answer my own question. The short answer is uniplate.
Slightly longer answer. In one place I used uniplate to get the children of a Pat (haskell-src-exts type for patterns). The call looked like children p and the type of this instance of children was Pat SrcSpanInfo -> [Pat SrcSpanInfo]. So it's doing no recursion, just returning the immediate children of a node.
Uniplate uses two very different methods depending on whether there are strict fields in the type you're operating on. Without strict fields it is reasonably fast; with strict fields it switches to using gfoldl and is incredibly slow. And even though my use of uniplate didn't directly involve a strict field, it slowed down.
Conclusion: Beware uniplate if you have a strict field anywhere in sight!

StateT with Q monad from template haskell

I would like to create a function that takes some declarations of type Dec (which I get from [d| ... |]) and modifies them. The modifications will depend on previous declarations, so I would like to be able to store them in a map hidden in a State monad: mainly I create records and class instances and add fields from previous records to them. (This way I want to mimic OOP, but this is probably not relevant to my question.) I would like to splice the results of my computations into the code after I have processed (and changed) all declarations.
I tried composing StateT with Q in every possible way, but I can't get it right.
Edit
My idea was to create functions that collect class declarations (I am aware that Haskell is not an object-oriented language; I read some papers about how you can encode OOP in Haskell, and I am trying to implement it with Template Haskell as a small assignment).
So I would like to be able to write something like this:
declare [d| class A a where
              attr1 :: a -> Int
              attr2 :: a -> Int |]
declareInherit [d| class B b where
                     attr3 :: b -> Int |] ''A
This is an encoding of, for example, the following C++ code:
struct A {
    int attr1;
    int attr2;
};
struct B : A {
    int attr3;
};
And I would like to generate two records with Template Haskell:
data A_data = A_data { attr1' :: Int, attr2' :: Int}
data B_data = B_data { attr1'' :: Int, attr2'' :: Int, attr3'' :: Int}
and instances of the classes:
instance A A_data where
  attr1 = attr1'
  attr2 = attr2'
instance A B_data where
  attr1 = attr1''
  attr2 = attr2''
instance B B_data where
  attr3 = attr3''
This is how my encoding of OO works, but I would like to be able to generate it automatically instead of writing it by hand.
I had a problem interacting with DecsQ in the declare function; I would probably like it to live in something like this:
data Env = Env {classes :: Map.Map Name Dec }
type QS a = (StateT Env Q) a
I also have a problem with how to run a computation in QS.
A problem with Template Haskell is that its API is not as polished as in most of other Haskell libraries. The Q monad is overloaded with problems: it reifies, it renders, it manages the state of local names. But it is never a good idea to interleave problem domains, since it's been proven that human beings can only think about one thing at a time (in other words, we're "single core"). This means that mixing problems together is not scalable. That is why it is hard to reason about Q already, and you want to add yet another problem on top of it: your state. Bad idea.
As well as any other problem you should approach this with separation of concerns. I.e., you should extract smaller problem domains from your main one and work with each of them in isolation. Concerning Template Haskell there are several evident domains: reification of existing ASTs, their analysis and rendering of new ASTs. For communication between those domains you may also need some "lingua franca" data model. Kinda reminds of something, doesn't it? Yep, it's MVC.
There are certain properties of the extracted domains which you can then exploit to your benefit: you only need to remain in the Q monad for reification, while the rendering and analysis can be done in a pure environment, with all of its benefits. You can purify quasi-quotes using unsafePerformIO . runQ, as long as the quote does not rely on operations such as reify, which are only available while GHC is running a splice.
For real-life examples I can refer you to some of my projects, in which I apply this approach:
https://github.com/nikita-volkov/type-structure/blob/404017df89d3432481ed118776713cbd660ba220/src/TypeStructure/TH.hs
https://github.com/nikita-volkov/graph-db/blob/ba6ca0343ce73011b1c801d74aa3e17e0250d8a4/library/GraphDB/Macros.hs
At least within one file, you can pass state from one DecsQ splice to the one lower down in the file by storing it in:
{-# NOINLINE env #-}
env :: IORef Env
env = unsafePerformIO (newIORef (Env mempty))
and then do things like runIO (readIORef env) :: Q Env.
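As for the StateT Env Q stack from the question, one possible shape is sketched below, assuming the mtl, containers and template-haskell packages. The names declare, knownClasses and runQS are illustrative guesses at what the asker wants, not an established API. Q actions are lifted into the stack with lift, and evalStateT brings the whole computation back to plain Q so it can be spliced.

```haskell
import Control.Monad.State (StateT, evalStateT, gets, modify)
import Control.Monad.Trans.Class (lift)
import qualified Data.Map as Map
import Language.Haskell.TH

-- Environment from the question: class declarations indexed by name.
data Env = Env { classes :: Map.Map Name Dec }

type QS a = StateT Env Q a

-- Run a quotation in Q, remember every class declaration it contains,
-- and hand the declarations back unchanged.
declare :: Q [Dec] -> QS [Dec]
declare decsQ = do
  decs <- lift decsQ
  mapM_ remember decs
  return decs
  where
    remember d@(ClassD _ name _ _ _) =
      modify (\env -> env { classes = Map.insert name d (classes env) })
    remember _ = return ()

-- Names of all classes recorded so far.
knownClasses :: QS [Name]
knownClasses = gets (Map.keys . classes)

-- Discharge the state layer, starting from an empty environment.
runQS :: QS a -> Q a
runQS action = evalStateT action (Env Map.empty)
```

A splice such as $(runQS (declare [d| ... |])) would then have to live in a different module from these definitions because of the stage restriction, and the state only survives within a single runQS call, so all declarations that should see each other need to go through the same splice.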

Access environment in a function

In main I can read my config file, and supply it as runReader (somefunc) myEnv just fine. But somefunc doesn't need access to the myEnv the reader supplies, nor do the next couple in the chain. The function that needs something from myEnv is a tiny leaf function.
How do I get access to the environment in a function without tagging all the intervening functions as (Reader Env)? That can't be right because otherwise you'd just pass myEnv around in the first place. And passing unused parameters through multiple levels of functions is just ugly (isn't it?).
There are plenty of examples I can find on the net but they all seem to have only one level between runReader and accessing the environment.
I'm accepting Chris Taylor's because it's the most thorough and I can see it being useful to others. Thanks too to Heatsink who was the only one who attempted to actually directly answer my question.
For the test app in question I'll probably just ditch the Reader altogether and pass the environment around. It doesn't buy me anything.
I must say I'm still puzzled by the idea that providing static data to function h changes not only its type signature but also those of g which calls it and f which calls g. All this even though the actual types and computations involved are unchanged. It seems like implementation details are leaking all over the code for no real benefit.
You do give everything the return type of Reader Env a, although this isn't as bad as you think. The reason that everything needs this tag is that if f depends on the environment:
import Control.Monad.Reader
type Env = Int
f :: Int -> Reader Int Int
f x = do
  env <- ask
  return (x + env)
and g calls f:
g x = do
  y <- f x
  return (x + y)
then g also depends on the environment - the value bound in the line y <- f x can be different, depending on what environment is passed in, so the appropriate type for g is
g :: Int -> Reader Int Int
This is actually a good thing! The type system is forcing you to explicitly recognise the places where your functions depend on the global environment. You can save yourself some typing pain by defining a shortcut for the phrase Reader Int:
type Global = Reader Int
so that now your type annotations are:
f, g :: Int -> Global Int
which is a little more readable.
The alternative to this is to explicitly pass the environment around to all of your functions:
f :: Env -> Int -> Int
f env x = x + env
g :: Env -> Int -> Int
g env x = x + f env x
This can work, and in fact syntax-wise it's not any worse than using the Reader monad. The difficulty comes when you want to extend the semantics. Say you also depend on having an updatable state of type Int which counts function applications. Now you have to change your functions to:
type Counter = Int
f :: Env -> Counter -> Int -> (Int, Counter)
f env counter x = (x + env, counter + 1)
g :: Env -> Counter -> Int -> (Int, Counter)
g env counter x = let (y, newcounter) = f env counter x
                  in (x + y, newcounter + 1)
which is decidedly less pleasant. On the other hand, if we are taking the monadic approach, we simply redefine
type Global = ReaderT Env (State Counter)
The old definitions of f and g continue to work without any trouble. To update them to have application-counting semantics, we simply change them to
f :: Int -> Global Int
f x = do
  modify (+1)
  env <- ask
  return (x + env)
g :: Int -> Global Int
g x = do
  modify (+1)
  y <- f x
  return (x + y)
and they now work perfectly. Compare the two methods:
Explicitly passing the environment and state required a complete rewrite when we wanted to add new functionality to our program.
Using a monadic interface required a change of three lines - and the program continued to work even after we had changed the first line, meaning that we could do the refactoring incrementally (and test it after each change) which reduces the likelihood that the refactor introduces new bugs.
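To make the monadic version concrete, here is a self-contained sketch that assembles the pieces above and adds a way to run them, assuming the mtl package. The helper runGlobal is a hypothetical name introduced here; it supplies the environment and starts the counter at zero.

```haskell
import Control.Monad.Reader (ReaderT, runReaderT, ask)
import Control.Monad.State (State, runState, modify)

type Env = Int
type Counter = Int
type Global = ReaderT Env (State Counter)

f :: Int -> Global Int
f x = do
  modify (+1)          -- count this application
  env <- ask           -- read the environment
  return (x + env)

g :: Int -> Global Int
g x = do
  modify (+1)
  y <- f x
  return (x + y)

-- Hypothetical helper: run a Global computation with a given environment,
-- returning the result together with the final counter.
runGlobal :: Env -> Global a -> (a, Counter)
runGlobal env action = runState (runReaderT action env) 0
```

With an environment of 10, runGlobal 10 (g 1) evaluates to (12, 2): f adds the environment to 1, g adds 1 again, and each of the two functions bumps the counter once.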
Nope. You totally do tag all the intervening functions as Reader Env, or at least as running in some monad with an Env environment. And it totally does get passed around everywhere. That's perfectly normal -- albeit not as inefficient as you might think, and the compiler will often optimize such things away in many places.
Basically, anything that uses the Reader monad -- even if it's very far down -- should be a Reader itself. (If something doesn't use the Reader monad, and doesn't call anything else that does, it doesn't have to be a Reader.)
That said, using the Reader monad means that you don't have to pass the environment around explicitly -- it's handled automatically by the monad.
(Remember, it's just a pointer to the environment getting passed around, not the environment itself, so it's quite cheap.)
Another technique which might be useful is to pass the leaf function itself, partially applied with the value from the config file. Of course this only makes sense if being able to replace the leaf function is somehow to your advantage.
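A small sketch of that idea, with a hypothetical Config record and made-up function names. Only the leaf function isLarge knows about the configuration; the intermediate functions just receive an ordinary predicate.

```haskell
-- Hypothetical configuration record.
newtype Config = Config { threshold :: Int }

-- Leaf function: the only place that inspects the configuration.
isLarge :: Config -> Int -> Bool
isLarge cfg n = n > threshold cfg

-- Intermediate functions take the already-configured leaf as a parameter.
countLarge :: (Int -> Bool) -> [Int] -> Int
countLarge p = length . filter p

summary :: (Int -> Bool) -> [[Int]] -> [Int]
summary p = map (countLarge p)

-- Top level: partially apply the leaf to the config once and pass it down.
run :: Config -> [[Int]] -> [Int]
run cfg = summary (isLarge cfg)
```

The intermediate functions still carry an extra argument, but it is a meaningful one (a predicate), and a different predicate can be swapped in for testing.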
These are truly global variables, since they are initialized exactly once in main. For this situation, it's appropriate to use global variables. You have to use unsafePerformIO to write them if IO is required.
If you're only reading a configuration file, it's pretty easy:
config :: Config
{-# NOINLINE config #-}
config = unsafePerformIO readConfigurationFile
If there are some dependencies on other code, so that you have to control when the configuration file is loaded, it's more complicated:
import Control.Concurrent.MVar
import Control.Monad (when)
import System.IO.Unsafe (unsafePerformIO)
globalConfig :: MVar Config
{-# NOINLINE globalConfig #-}
globalConfig = unsafePerformIO newEmptyMVar
-- Call this from 'main'
initializeGlobalConfig :: Config -> IO ()
initializeGlobalConfig x = putMVar globalConfig x
config :: Config
{-# NOINLINE config #-}
config = unsafePerformIO $ do
  -- isEmptyMVar is an IO action, so bind its result before testing it
  empty <- isEmptyMVar globalConfig
  when empty $ fail "Configuration has not been loaded"
  readMVar globalConfig
See also:
Proper way to treat global flags in Haskell
Global variables via unsafePerformIO in Haskell
If you don't want to make everything down to the tiny leaf function be in the Reader monad, does your data allow you to extract the necessary item(s) out of the Reader monad at the top level, and then pass them as ordinary parameters down through to the leaf function? That would eliminate the need for everything in between to be in Reader, although if the leaf function does need to know that it's inside Reader in order to use Reader's facilities then you can't get away from having to run it inside your Reader instance.

haskell load module in list

Hey haskellers and haskellettes,
Is it possible to load a module's functions into a list?
In my concrete case I have a list of functions, all checked with or:
checkRules :: [Nucleotide] -> Bool
checkRules nucs = or $ map ($ nucs) [checkRule1, checkRule2]
I import checkRule1 and checkRule2 from a separate module, and I don't know if I will need more of them in the future.
I'd like the same functionality to look something like this:
-- import all functions from Rules as rules where
-- :t rules ~~> [([Nucleotide] -> Bool)]
checkRules :: [Nucleotide] -> Bool
checkRules nucs = or $ map ($ nucs) rules
The program sorts pseudo nucleotide sequences into viable and nonviable sequences according to given rules.
thanks in advance ε/2
Addendum:
So am I thinking right that I need:
genList :: File -> TypeSignature -> [TypeSignature]
chckfun :: (a->b) -> TypeSignature -> Bool
at compile time.
But I can't generate a list of all functions in the module, as they most probably will not all have the same type signature and hence will not all fit in one list. So I cannot filter the given list with chckfun.
In order to do this I want to check either the type signatures written in the source file (?) or the types inferred by the compiler (?).
Another problem that comes to mind: not every function written in the source file might get exported?
Is this a problem a Haskell beginner should try to solve after 5 months of learning? My brain is shaped like a Klein bottle after all this "compile time thinking".
There is a nice package on Hackage just for this: language-haskell-extract. In particular, the Template Haskell function functionExtractor takes a regular expression and returns a list of the matching top level bindings as (name, value) pairs. As long as they all have matching types, you're good to go.
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.Extract
myFoo = "Hello"
myBar = "World"
allMyStuff = $(functionExtractor "^my")
main = print allMyStuff
Output:
[("myFoo","Hello"),("myBar","World")]
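If Template Haskell feels too heavy for this, a lower-tech alternative is to keep the list next to the rules and extend it by hand whenever a rule is added. The sketch below assumes a hypothetical Nucleotide type and two placeholder rules; only checkRules and the rule names come from the question.

```haskell
-- Hypothetical nucleotide type; the real one lives in the asker's code.
data Nucleotide = A | C | G | T deriving (Eq, Show)

-- Placeholder rules, standing in for the real ones.
checkRule1, checkRule2 :: [Nucleotide] -> Bool
checkRule1 nucs = take 1 nucs == [A]   -- sequence starts with A
checkRule2 nucs = length nucs >= 4     -- sequence has at least four elements

-- The single place to register a new rule.
rules :: [[Nucleotide] -> Bool]
rules = [checkRule1, checkRule2]

-- Same behaviour as 'or $ map ($ nucs) rules', written with 'any'.
checkRules :: [Nucleotide] -> Bool
checkRules nucs = any ($ nucs) rules
```

Since every element of rules must have the same type, the compiler rejects a function with the wrong signature as soon as it is added to the list.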

SML conversions to Haskell

A few basic questions, for converting SML code to Haskell.
1) I am used to having local embedded expressions in SML code, for example test expressions, prints, etc., which function as local tests and produce output when the code is loaded (evaluated).
In Haskell it seems that the only way to get results (evaluation) is to add code in a module, and then go to main in another module and add something to invoke and print results.
Is this right? in GHCi I can type expressions and see the results, but can this be automated?
Having to go to the top level main for each test evaluation seems inconvenient to me - maybe just need to shift my paradigm for laziness.
2) in SML I can do pattern matching and unification on a returned result, e.g.
val myTag(x) = somefunct(a,b,c);
and get the value of x after a match.
Can I do something similar in Haskell easily, without writing separate extraction functions?
3) How do I do a constructor with a tuple argument, i.e. uncurried.
in SML:
datatype Thing = Info of int * int;
but in Haskell, I tried;
data Thing = Info ( Int Int)
which fails. ("Int is applied to too many arguments in the type:A few Int Int")
The curried version works fine,
data Thing = Info Int Int
but I wanted un-curried.
Thanks.
This question is a bit unclear -- you're asking how to evaluate functions in Haskell?
If it is about inserting debug and tracing into pure code, this is typically only needed for debugging. To do this in Haskell, you can use Debug.Trace.trace, in the base package.
If you're concerned about calling functions, Haskell programs evaluate from main downwards, in dependency order. In GHCi you can, however, import modules and call any top-level function you wish.
You can return the original argument to a function, if you wish, by making it part of the function's result, e.g. with a tuple:
f x = (x, y)
  where y = g a b c
Or do you mean to return either one value or another? Then use a tagged union (sum type), such as Either:
f x = if x > 0 then Left x
               else Right (g a b c)
How do I do a constructor with a tuple argument, i.e. uncurried in SML
Using the (,) constructor. E.g.
data T = T (Int, Int)
though more Haskell-like would be:
data T = T Int Bool
and those should probably be strict fields in practice:
data T = T !Int !Bool
Debug.Trace allows you to print debug messages inline. However, since these functions use unsafePerformIO, they might behave in unexpected ways compared to a call-by-value language like SML.
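A minimal sketch of inline tracing with Debug.Trace.trace from base; the factorial function is just a stand-in example. The messages go to stderr, and they appear when (and only if) the traced expression is actually evaluated, so the order can differ from what a strict language would produce.

```haskell
import Debug.Trace (trace)

-- Each recursive step emits a trace message before its value is computed.
factorial :: Integer -> Integer
factorial n
  | n <= 1    = 1
  | otherwise = trace ("factorial " ++ show n) (n * factorial (n - 1))
```

Evaluating factorial 5 in GHCi prints the trace lines and then the result 120.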
I think the @ syntax (an as-pattern) is what you're looking for here:
data MyTag = MyTag Int Bool String
someFunct :: MyTag -> (MyTag, Int, Bool, String)
someFunct x@(MyTag a b c) = (x, a, b, c) -- x is bound to the entire argument
In Haskell, tuple types are separated by commas, e.g., (t1, t2), so what you want is:
data Thing = Info (Int, Int)
Reading the other answers, I think I can provide a few more example and one recommendation.
data ThreeConstructors = MyTag Int | YourTag (String,Double) | HerTag [Bool]
someFunct :: Char -> Char -> Char -> ThreeConstructors
MyTag x = someFunct 'a' 'b' 'c'
This is like the "let MyTag x = someFunct a b c" examples, but it is at the top level of the module.
As you have noticed, Haskell's top level can define commands, but there is no way to automatically run any code merely because your module has been imported by another module. This is entirely different from Scheme or SML. In Scheme the file is interpreted as being executed form by form, but Haskell's top level consists only of declarations. Thus libraries cannot do normal things like run initialization code when loaded; they have to provide a "pleaseRunMe :: IO ()" kind of command to do any initialization.
As you point out, this means running all the tests requires some boilerplate code to list them all. You can look under Hackage's Testing group for libraries to help, such as test-framework-th.
For #2, yes, Haskell's pattern matching does the same thing. Both let and where do pattern matching. You can do
let MyTag x = someFunct a b c
in  ...
or
...
  where MyTag x = someFunct a b c
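Putting points 2 and 3 together, here is a small runnable sketch with a constructor that takes a tuple and both binding forms used to take its result apart. The function names mkThing, sumOf and productOf are made up for this example.

```haskell
-- Constructor with a single tuple argument (the uncurried form).
data Thing = Info (Int, Int) deriving (Eq, Show)

-- Hypothetical function returning a tagged value.
mkThing :: Int -> Int -> Thing
mkThing a b = Info (a + b, a * b)

-- Pattern matching on the result with 'let'.
sumOf :: Int -> Int -> Int
sumOf a b =
  let Info (s, _) = mkThing a b
  in  s

-- The same idea with 'where'.
productOf :: Int -> Int -> Int
productOf a b = p
  where Info (_, p) = mkThing a b
```

Such pattern bindings are lazy: if the function could return a different constructor, the mismatch would only surface as a runtime error when the bound variable is used.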
