How to explicitly instantiate/specialise a polymorphic Haskell function?

I was wondering whether it is possible to explicitly instantiate/specialise a polymorphic function in Haskell? What I mean is, imagine I've a function like the following:
parseFile :: FromJSON a => FilePath -> IO Either String a
The structure into which it attempts to parse the file's contents will depend on the type of a. Now, I know it's possible to specify a by annotation:
parseFile myPath :: IO Either String MyType
What I was wondering was whether it's possible to specialise parseFile more explicitly, for instance with something like (specialise parseFile MyType) to turn it into parseFile :: FilePath -> IO Either String MyType
The reason I ask is that the method of annotation can become clumsy with larger functions. For instance, imagine parseFile gets called by foo which gets called by bar, and bar's return value has a complex type like
:: FromJSON a => IO (([Int],String), (Int, String, Int), a, (Double, [String]))
This means that if I want to call bar with a as MyType, I have to annotate the call with
:: IO (([Int],String), (Int, String, Int), MyType, (Double, [String]))
If I want to call bar multiple times to process different types, I end up writing this annotation multiple times, which seems like unnecessary duplication.
res1 <- bar inputA :: IO (([Int],String), (Int, String, Int), MyType, (Double, [String]))
res2 <- bar inputB :: IO (([Int],String), (Int, String, Int), OtherType, (Double, [String]))
res3 <- bar inputC :: IO (([Int],String), (Int, String, Int), YetAnotherType, (Double, [String]))
Is there a way to avoid this? I'm aware that I could bind the result of bar inputA and use it in a function expecting a MyType, letting the type checker infer that a is MyType without an explicit annotation. This seems to sacrifice type safety, however: if I accidentally used the result of bar inputB (an OtherType) in a function that expects a MyType, the type system wouldn't complain. Instead, the program would fail at runtime when attempting to parse inputB into a MyType, because inputB contains an OtherType.

First, a small correction: the type should be
parseFile :: FromJSON a => FilePath -> IO (Either String a)
The parentheses are important and necessary.
There are a couple of ways around this. For example, if you had a function
useMyType :: MyType -> IO ()
useMyType = undefined
and then used parseFile as
main = do
  result <- parseFile "data.json"
  case result of
    Left err -> putStrLn err
    Right mt -> useMyType mt
No extra type annotations are required; GHC can infer the type of mt from its use with useMyType.
Another way is to simply assign it to a concretely typed name:
parseMyTypeFromFile :: FilePath -> IO (Either String MyType)
parseMyTypeFromFile = parseFile
main = do
  result <- parseMyTypeFromFile "data.json"
  case result of
    Left err -> putStrLn err
    Right mt -> useMyType mt
Wherever you use parseMyTypeFromFile, no explicit annotation is necessary. This is the same as a common practice for specifying the type of read:
readInt :: String -> Int
readInt = read
For solving the bar problem, if you have a type that complex I would at least suggest creating an alias for it, if not its own data type entirely, possibly with record fields and whatnot. Something similar to
type BarType a = (([Int], String), (Int, String, Int), a, (Double, [String]))
Then you can write bar as
bar :: FromJSON a => InputType -> IO (BarType a)
bar input = undefined -- implementation details
which makes bar nicer to read too. Then you can just do
res1 <- bar inputA :: IO (BarType MyType)
res2 <- bar inputB :: IO (BarType OtherType)
res3 <- bar inputC :: IO (BarType YetAnotherType)
I would consider this perfectly clear and idiomatic Haskell, personally. Not only is it immediately readable and clear what you're doing, but by having a name to refer to the complex type, you minimize the chance of typos, take advantage of IDE autocompletion, and can put documentation on the type itself to let others (and your future self) know what all those fields mean.

You can't turn a polymorphic function that is defined elsewhere with an explicit annotation into a more restricted version with the same name. But you can do something like:
parseFileOfMyType :: FilePath -> IO (Either String MyType)
parseFileOfMyType = parseFile
A surprising number of useful functions in various libraries are similar type-specific aliases of unassuming functions like id. Anyway, you should be able to make type-constrained versions of those examples using this technique.
Another solution to the verbosity problem would be to create type aliases:
type MyInputParse a = IO (([Int],String), (Int, String, Int), a, (Double, [String]))
res1 <- bar inputA :: MyInputParse MyType
res2 <- bar inputB :: MyInputParse OtherType
res3 <- bar inputC :: MyInputParse YetAnotherType
GHC 7.10 and later also provide partial type signatures (the PartialTypeSignatures extension), which let you leave a hole (written _) in the type signature for inference to fill in while you make the part you're interested in specific.
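On a GHC that ships the PartialTypeSignatures extension (7.10 or later, as far as I know), the idea can be sketched roughly as follows. This is only an illustration under stated assumptions: barLike is a hypothetical stand-in for bar, Read replaces FromJSON so that nothing outside base is needed, and the tuple is shortened.

```haskell
{-# LANGUAGE PartialTypeSignatures #-}
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}

import Text.Read (readMaybe)

-- Hypothetical stand-in for the question's `bar`: only the middle
-- component is polymorphic, and `Read` replaces `FromJSON`.
barLike :: Read a => String -> Maybe (([Int], String), a, (Double, [String]))
barLike input = do
  parsed <- readMaybe input
  pure (([1, 2], "meta"), parsed, (0.5, ["tag"]))

-- Each `_` is a wildcard that GHC infers; only `Int` is written by hand.
asInt :: Maybe (_, Int, _)
asInt = barLike "42"

-- Same call shape, but the middle component is pinned to `Bool`.
asBool :: Maybe (_, Bool, _)
asBool = barLike "True"

-- Project out the component that the signature pinned down.
middle :: (a, b, c) -> b
middle (_, x, _) = x
```

The wildcards keep the annotation short without giving up the check on the component you care about: using asBool where an Int is expected is still a compile-time error.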

Related

How can I make the signature of this function more precise

I have two functions:
prompt :: Text -> (Text -> Either Text a) -> IO a
subPrompt :: Text -> (Text -> Bool) -> IO a -> IO (Maybe (Text, a))
subPrompt takes a second prompt (argument 3) and displays it if the function in argument 2 comes back as true after running the first prompt.
What I don't like is that argument 3 is IO a. I would like it to be something more like:
subPrompt :: Text -> (Text -> Bool) -> prompt -> IO (Maybe (Text, a))
But I know I can't do that. I'm stuck trying to think of a way to make it clearer from the signature what the third argument is. Is there some way I can define a clearer type? Or maybe I'm overthinking it and IO a is actually fine; I'm pretty new to Haskell.
One way is to reify the two things as a data structure. So:
{-# LANGUAGE GADTs #-}
data Prompt a where
  Prompt :: Text -> (Text -> Either Text a) -> Prompt a
  SubPrompt :: Text -> (Text -> Bool) -> Prompt a -> Prompt (Maybe (Text, a))
Now because the third argument to SubPrompt is a Prompt, you know it must either be a call to SubPrompt or Prompt -- definitely not some arbitrary IO action that might do filesystem access or some other nasty thing.
Then you can write an interpreter for this tiny DSL into IO:
runPrompt :: Prompt a -> IO a
runPrompt (Prompt cue validator) = {- what your old prompt used to do -}
runPrompt (SubPrompt cue deeper sub) = {- what your old subPrompt used to do,
calling runPrompt on sub where needed -}
Besides the benefit of being sure you don't have arbitrary IO as an argument to SubPrompt, this has the side benefit that it makes testing easier. Later you could implement a second interpreter that is completely pure; say, something like this, which takes a list of texts to be treated as user inputs and returns a list of texts that the prompt output:
data PromptResult a = Done a | NeedsMoreInput (Prompt a)
purePrompt :: Prompt a -> [Text] -> ([Text], PromptResult a)
purePrompt = {- ... -}
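To make the two interpreters concrete, here is a minimal sketch under some assumptions: String stands in for Text so that only base is needed, runPrompt re-asks until the validator accepts the input (the original prompt may behave differently), and simulate is a hypothetical, simplified cousin of purePrompt that returns Maybe a instead of a PromptResult.

```haskell
{-# LANGUAGE GADTs #-}

-- String stands in for Text here (an assumption, to stay within base).
data Prompt a where
  Prompt    :: String -> (String -> Either String a) -> Prompt a
  SubPrompt :: String -> (String -> Bool) -> Prompt a -> Prompt (Maybe (String, a))

-- Interpreter into IO: re-asks until the validator accepts the answer.
runPrompt :: Prompt a -> IO a
runPrompt (Prompt cue validate) = do
  putStrLn cue
  answer <- getLine
  case validate answer of
    Left complaint -> putStrLn complaint >> runPrompt (Prompt cue validate)
    Right value    -> pure value
runPrompt (SubPrompt cue deeper sub) = do
  putStrLn cue
  answer <- getLine
  if deeper answer
    then do
      value <- runPrompt sub
      pure (Just (answer, value))
    else pure Nothing

-- Pure interpreter: feeds scripted inputs, returns everything that would
-- have been printed plus the result (Nothing if the inputs ran out).
simulate :: Prompt a -> [String] -> ([String], Maybe a)
simulate (Prompt cue _) [] = ([cue], Nothing)
simulate (Prompt cue validate) (x : xs) =
  case validate x of
    Right value -> ([cue], Just value)
    Left complaint ->
      let (out, result) = simulate (Prompt cue validate) xs
      in (cue : complaint : out, result)
simulate (SubPrompt cue _ _) [] = ([cue], Nothing)
simulate (SubPrompt cue deeper sub) (x : xs)
  | deeper x =
      let (out, result) = simulate sub xs
      in (cue : out, fmap (\value -> Just (x, value)) result)
  | otherwise = ([cue], Just Nothing)

-- Example prompts (hypothetical).
askAge :: Prompt Int
askAge = Prompt "Age?" parse
  where
    parse s = case reads s of
      [(n, "")] -> Right n
      _         -> Left "Not a number"

askMore :: Prompt (Maybe (String, Int))
askMore = SubPrompt "Continue? (y/n)" (== "y") askAge
```

Because simulate never touches IO, a test can script the user's answers as a plain list and compare both the output and the result.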
There is nothing wrong with making the second prompt a simple IO a, especially if you document what it is somewhere.
That said, yes, it is good practice to make the types as self-explanatory as possible; you can create an alias:
type Prompt a = IO a
and then use it in subPrompt's signature:
subPrompt :: Text -> (Text -> Bool) -> Prompt a -> IO (Maybe (Text, a))
This makes the signature more self-explanatory, while still allowing you to pass any IO a as the third parameter (the keyword type just creates an alias).
But wait, there is more: you'd rather not accidentally pass any IO a that isn't actually a prompt! You don't want to pass it an IO action that, say, launches the missiles...
So, we can declare an actual Prompt type (not just an alias, but a real type):
newtype Prompt a = Prompt { getPrompt :: IO a }
This allows you to wrap any value of type IO a inside a type, ensuring it doesn't get mixed up with other functions with the same type, but different semantics.
The signature of subPrompt remains the same as before:
subPrompt :: Text -> (Text -> Bool) -> Prompt a -> IO (Maybe (Text, a))
But now you cannot pass just any old IO a to it; to pass your prompt, for example, you have to wrap it:
subPrompt "Do we proceed?" askYesNo (Prompt (prompt "Please enter your name" processName))
(subPrompt won't be able to call it directly, but will have to extract "prompt" from inside the wrapper: let actualPrompt = getPrompt wrappedPrompt)
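The unwrapping step might look roughly like this. It is a sketch, not the asker's actual subPrompt: subPromptWith is a hypothetical variant that receives the first answer from an IO String action rather than displaying a cue, and String replaces Text to avoid extra dependencies.

```haskell
newtype Prompt a = Prompt { getPrompt :: IO a }

-- Hypothetical variant of subPrompt: the first answer comes from an
-- `IO String` action instead of a cue, and String replaces Text.
subPromptWith :: IO String -> (String -> Bool) -> Prompt a -> IO (Maybe (String, a))
subPromptWith ask deeper wrapped = do
  answer <- ask
  if deeper answer
    then do
      value <- getPrompt wrapped -- unwrap the newtype, then run the inner prompt
      pure (Just (answer, value))
    else pure Nothing
```

Passing a bare IO a as the third argument is now a type error; the caller has to write Prompt explicitly, which is the point of the wrapper.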

Haskell: function signature

This program compiles without problems:
bar :: MonadIO m
    => m String
bar = undefined
run2IO :: MonadIO m
       => m String
       -> m String
run2IO foo = liftIO bar
When I change bar to foo (argument name),
run2IO :: MonadIO m
       => m String
       -> m String
run2IO foo = liftIO foo
I get:
Couldn't match type ‘m’ with ‘IO’
‘m’ is a rigid type variable bound by
the type signature for run2IO :: MonadIO m => m String -> m String
...
Expected type: IO String
Actual type: m String ...
Why are the two cases not equivalent?
Remember the type of liftIO:
liftIO :: MonadIO m => IO a -> m a
Importantly, the first argument must be a concrete IO value. That means when you have an expression liftIO x, then x must be of type IO a.
When a Haskell function is universally quantified (using an implicit or explicit forall), then that means the function caller chooses what the type variable is replaced by. As an example, consider the id function: it has type a -> a, but when you evaluate the expression id True, then id takes the type Bool -> Bool because a is instantiated as the Bool type.
Now, consider your first example again:
run2IO :: MonadIO m => m Integer -> m Integer
run2IO foo = liftIO bar
The foo argument is completely irrelevant here, so all that actually matters is the liftIO bar expression. Since liftIO requires its first argument to be of type IO a, then bar must be of type IO a. However, bar is polymorphic: it actually has type MonadIO m => m Integer.
Fortunately, IO has a MonadIO instance, so the bar value is instantiated using IO to become IO Integer, which is okay, because bar is universally quantified, so its instantiation is chosen by its use.
Now, consider the other situation, in which liftIO foo is used, instead. This seems like it’s the same, but it actually isn’t at all: this time, the MonadIO m => m Integer value is an argument to the function, not a separate value. The quantification is over the entire function, not the individual value. To understand this more intuitively, it might be helpful to consider id again, but this time, consider its definition:
id :: a -> a
id x = x
In this case, x cannot be instantiated to be Bool within its definition, since that would mean id could only work on Bool values, which is obviously wrong. Effectively, within the implementation of id, x must be used completely generically—it cannot be instantiated to a specific type because that would violate the parametricity guarantees.
Therefore, in your run2IO function, foo must be used completely generically as an arbitrary MonadIO value, not a specific MonadIO instance. The liftIO call attempts to use the specific IO instance, which is disallowed, since the caller might not provide an IO value.
It is possible, of course, that you might want the argument to the function to be quantified in the same way as bar is; that is, you might want its instantiation to be chosen by the implementation, not the caller. In that case, you can use the RankNTypes language extension to specify a different type using an explicit forall:
{-# LANGUAGE RankNTypes #-}
run3IO :: MonadIO m => (forall m1. MonadIO m1 => m1 Integer) -> m Integer
run3IO foo = liftIO foo
This will typecheck, but it’s not a very useful function.
In the first, you're using liftIO on bar. That actually requires bar :: IO String. Now, IO happens to be (trivially) an instance of MonadIO, so this works – the compiler simply throws away the polymorphism of bar.
In the second case, the compiler doesn't get to decide what particular monad to use as the type of foo: it's fixed by the environment, i.e. the caller can decide what MonadIO instance it should be. To again get the freedom to choose IO as the monad, you'd need the following signature:
{-# LANGUAGE Rank2Types, UnicodeSyntax #-}
run2IO' :: MonadIO m
        => (∀ m' . MonadIO m' => m' String)
        -> m String
run2IO' foo = liftIO foo
... however, I don't think you really want that: you might as well write
run2IO' :: MonadIO m => IO String -> m String
run2IO' foo = liftIO foo
or simply run2IO = liftIO.
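The three options discussed above can be put side by side in one small module. This is a sketch assuming a base that exports Control.Monad.IO.Class (GHC 8.0 or later); greeting, passThrough and fromIO are hypothetical names used only for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Monad.IO.Class (MonadIO, liftIO)

-- A polymorphic action, playing the role of the question's `bar`.
greeting :: MonadIO m => m String
greeting = liftIO (pure "hello")

-- Rank-1: the caller picks `m`, so the argument can only be used as-is.
passThrough :: MonadIO m => m String -> m String
passThrough foo = foo

-- Rank-2: the implementation picks the instantiation (IO) and lifts it.
run3IO :: MonadIO m => (forall n. MonadIO n => n String) -> m String
run3IO foo = liftIO foo

-- The simpler signature that does the same job for IO arguments.
fromIO :: MonadIO m => IO String -> m String
fromIO = liftIO
```

All three accept greeting as an argument, but only passThrough also accepts an action that is already fixed to some non-IO monad, and only fromIO accepts a plain IO String such as getLine.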

How to store arbitrary values in a recursive structure or how to build an extensible software architecture?

I'm working on a basic UI toolkit and am trying to figure out the overall architecture.
I am considering using WAI's structure for extensibility. A reduced example of the core structure for my UI:
run :: Application -> IO ()
type Application = Event -> UI -> (Picture, UI)
type Middleware = Application -> Application
In WAI, arbitrary values for Middleware are saved in the vault. I think this is a bad hack for saving arbitrary values, because it isn't transparent, but I can't think of a sufficiently simple structure to replace the vault and give every Middleware a place to save arbitrary values.
I considered recursively storing tuples in tuples:
run :: (Application, x) -> IO ()
type Application = Event -> UI -> (Picture, UI)
type Middleware y x = (Application, x) -> (Application, (y,x))
Or using only lazy lists, which provide a level at which there is no need to separate values (this gives more freedom, but also causes more problems):
run :: Application -> IO ()
type Application = [Event -> UI -> (Picture, UI)]
type Middleware = Application -> Application
Actually, I would use a modified lazy list solution. Which other solutions might work?
Note that:
I prefer not to use lens at all.
I know UI -> (Picture, UI) could be defined as State UI Picture .
I'm not aware of a solution regarding monads, transformers or FRP. It would be great to see one.
Lenses provide a general way to reference data type fields so that you can extend or refactor your data set without breaking backwards compatibility. I'll use the lens-family and lens-family-th libraries to illustrate this, since they are lighter dependencies than lens.
Let's begin with a simple record with two fields:
{-# LANGUAGE TemplateHaskell #-}
import Lens.Family2
import Lens.Family2.TH
data Example = Example
    { _int :: Int
    , _str :: String
    }
makeLenses ''Example
-- This creates these lenses:
int :: Lens' Example Int
str :: Lens' Example String
Now you can write Stateful code that references fields of your data structure. You can use Lens.Family2.State.Strict for this purpose:
import Lens.Family2.State.Strict
-- Everything here also works for `StateT Example IO`
example :: State Example Int
example = do
    s <- use str      -- Read the `String`
    str .= s ++ "!"   -- Set the `String`
    int += 2          -- Modify the `Int`
    zoom int $ do     -- This sub-`do` block has type: `State Int Int`
        m <- get
        return (m + 1)
The key thing to note is that I can update my data type, and the above code will still compile. Add a new field to Example and everything will still work:
data Example = Example
    { _int :: Int
    , _str :: String
    , _char :: Char
    }
makeLenses ''Example
int :: Lens' Example Int
str :: Lens' Example String
char :: Lens' Example Char
However, we can actually go a step further and completely refactor our Example type like this:
data Example = Example
    { _example2 :: Example2
    , _char :: Char
    }
data Example2 = Example2
    { _int2 :: Int
    , _str2 :: String
    }
makeLenses ''Example
char :: Lens' Example Char
example2 :: Lens' Example Example2
makeLenses ''Example2
int2 :: Lens' Example2 Int
str2 :: Lens' Example2 String
Do we have to break our old code? No! All we have to do is add the following two lenses to support backwards compatibility:
int :: Lens' Example Int
int = example2 . int2
str :: Lens' Example String
str = example2 . str2
Now all the old code still works without any changes, despite the intrusive refactoring of our Example type.
In fact, this works for more than just records. You can do the exact same thing for sum types, too (a.k.a. algebraic data types or enums). For example, suppose we have this type:
data Example3 = A String | B Int
makeTraversals ''Example3
-- This creates these `Traversals'`:
_A :: Traversal' Example3 String
_B :: Traversal' Example3 Int
Many of the things that we did with sum types can similarly be re-expressed in terms of Traversal's. There's a notable exception of pattern matching: it's actually possible to implement pattern matching with totality checking with Traversals, but it's currently verbose.
However, the same point holds: if you express all your sum type operations in terms of Traversal's, then you can greatly refactor your sum type and just update the appropriate Traversal's to preserve backwards compatibility.
Finally: note that the true analog of sum type constructors are Prisms (which let you build values using the constructors in addition to pattern matching). Those are not supported by the lens-family family of libraries, but they are provided by lens and you can implement them yourself using just a profunctors dependency if you want.
Also, if you're wondering what the lens analog of a newtype is, it's an Iso', and that also minimally requires a profunctors dependency.
Also, everything I've said works for referencing multiple fields of recursive types (using Folds). Literally anything you can imagine wanting to reference in a data type in a backwards-compatible way is encompassed by the lens library.
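For readers who would rather not pull in a lens library at all, the backwards-compatibility trick can be reproduced with hand-written lenses. The sketch below is an illustration using only base: Lens', view and over are defined locally in the van Laarhoven style (they are not the lens-family definitions), and bump is a hypothetical piece of "old code".

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Locally defined van Laarhoven lens type (not imported from a library).
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens' s a -> s -> a
view l = getConst . l Const

over :: Lens' s a -> (a -> a) -> s -> s
over l f = runIdentity . l (Identity . f)

-- The refactored types from the answer.
data Example2 = Example2 { _int2 :: Int, _str2 :: String }
data Example = Example { _example2 :: Example2, _char :: Char }

-- Hand-written versions of what makeLenses would generate.
example2 :: Lens' Example Example2
example2 f e = fmap (\x -> e { _example2 = x }) (f (_example2 e))

int2 :: Lens' Example2 Int
int2 f e = fmap (\x -> e { _int2 = x }) (f (_int2 e))

str2 :: Lens' Example2 String
str2 f e = fmap (\x -> e { _str2 = x }) (f (_str2 e))

-- Compatibility lenses: the old names, now defined by composition.
int :: Lens' Example Int
int = example2 . int2

str :: Lens' Example String
str = example2 . str2

-- "Old" code written against int and str keeps compiling unchanged.
bump :: Example -> Example
bump = over int (+ 2) . over str (++ "!")
```

bump only mentions int and str, so it does not care whether those fields live directly in Example or one level down in Example2.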

Can I use type declaration inside Haskell code

I am new to Haskell, and I have a little question about function type declaration. Suppose there are bunch of integers, we need to sum it and print it out. I am aware this works:
main = do
  a <- fmap (map read . words) getContents :: IO [Int]
  print $ sum a
but a <- fmap (map (read :: Int) . words) getContents failed. Why did it fail? We know getContents is IO String, and words takes a String and returns a [String] for map (read :: Int) to work on. I thought it would work, because we declared read to produce an Int, but it failed.
Is it impossible to use a type declaration inside a line of code, or am I using it the wrong way? Thanks.
The problem is that read doesn't have the type Int, it has the type String -> Int (for your purposes). The map function only accepts a function as its first argument, and you're trying to say that read has type Int, which would mean it's not a function. There's also no way you can coerce the type Read a => String -> a to Int, so it would error on both of these problems.
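An inline annotation does work once it names the whole function type. A small sketch (sumInts and sumInts' are hypothetical helper names):

```haskell
-- `read` is annotated with its full function type, String -> Int.
sumInts :: String -> Int
sumInts = sum . map (read :: String -> Int) . words

-- Alternative: annotate the result of applying `read` instead.
sumInts' :: String -> Int
sumInts' = sum . map (\w -> read w :: Int) . words
```

With either version, the original program could be written as fmap sumInts getContents >>= print.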

Simplest way to join functions of same meaning but different return value type

I'm writing a small "hello world" type of program, which groups duplicate files by different "reasons", e.g. same size, same content, same checksum etc.
So, I've got to the point when I want to write a function like this (DuplicateReason is an algebraic type which states the reason why two files are identical):
getDuplicatesByMethods :: (Eq a) => [((FilePath -> a), DuplicateReason)] -> IO [DuplicateGroup]
In each tuple, the first component is a function that takes a file's path and returns some (Eq a) value, such as a ByteString with the content, a Word32 with the checksum, or an Int with the size.
Clearly, Haskell doesn't like that these functions are of different types, so I need to somehow gather them.
The only way I see is to create a type like
data GroupableValue = GroupString String | GroupInt Int | GroupWord32 Word32
and then, to make life easier, a typeclass like
class GroupableValueClass a where
  toGroupableValue :: a -> GroupableValue
  fromGroupableValue :: GroupableValue -> a
and implement an instance for each value type I'm going to get.
Question: am I doing it right and (if not) is there a simpler way to solve this task?
Update:
Here's full minimal code that should describe what I want (simplified, with no IO etc.):
data DuplicateGroup = DuplicateGroup
-- method for "same size" -- returns size
m1 :: String -> Int
m1 content = 10
-- method for "same content" -- returns content
m2 :: String -> String
m2 content = "sample content"
groupByMethods :: (Eq a) => [(String -> a)] -> [DuplicateGroup]
groupByMethods predicates = undefined
main :: IO ()
main = do
  let groups = (groupByMethods [m1, m2])
  return ()
Lists are always homogeneous, so you can't put items with a different a into the same list (as you noticed). There are several ways to design around this, but I usually prefer using GADTs. For example:
{-# LANGUAGE GADTs #-}
import Data.ByteString (ByteString)
import Data.Word
data DuplicateReason = Size | Checksum | Content
data DuplicateGroup
data DuplicateTest where
  DuplicateTest :: Eq a => (FilePath -> IO a) -> DuplicateReason -> DuplicateTest
getSize :: FilePath -> IO Integer
getSize = undefined
getChecksum :: FilePath -> IO Word32
getChecksum = undefined
getContent :: FilePath -> IO ByteString
getContent = undefined
getDuplicatesByMethods :: [DuplicateTest] -> IO [DuplicateGroup]
getDuplicatesByMethods = undefined
This solution still needs a new type, but at least you don't have to specify all cases in advance or create boilerplate type-classes. Now, since the generic type a is essentially "hidden" inside the GADT, you can define a list that contains functions with different return types, wrapped in the DuplicateTest GADT.
getDuplicatesByMethods
  [ DuplicateTest getSize Size
  , DuplicateTest getChecksum Checksum
  , DuplicateTest getContent Content
  ]
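How such a list might be consumed is sketched below. Several things here are assumptions made for illustration: DuplicateGroup is given a concrete shape (reason plus paths), the list of paths is passed in explicitly, and findDuplicates, runTest and classes are hypothetical names rather than the getDuplicatesByMethods from the answer.

```haskell
{-# LANGUAGE GADTs #-}

import Data.List (partition)

data DuplicateReason = Size | Checksum | Content
  deriving (Eq, Show)

-- Hypothetical shape for a result group: the reason plus the matching paths.
data DuplicateGroup = DuplicateGroup DuplicateReason [FilePath]
  deriving (Eq, Show)

data DuplicateTest where
  DuplicateTest :: Eq a => (FilePath -> IO a) -> DuplicateReason -> DuplicateTest

-- Split key/value pairs into classes of values that share an equal key.
classes :: Eq k => [(k, v)] -> [[v]]
classes [] = []
classes ((k, v) : rest) = (v : map snd same) : classes other
  where
    (same, other) = partition ((== k) . fst) rest

-- Matching on the constructor brings the hidden `Eq a` into scope.
runTest :: [FilePath] -> DuplicateTest -> IO [DuplicateGroup]
runTest paths (DuplicateTest key reason) = do
  keyed <- mapM (\p -> fmap (\k -> (k, p)) (key p)) paths
  pure [DuplicateGroup reason g | g <- classes keyed, length g > 1]

-- Run every test over the same paths and collect all groups.
findDuplicates :: [FilePath] -> [DuplicateTest] -> IO [DuplicateGroup]
findDuplicates paths tests = concat <$> mapM (runTest paths) tests
```

The hidden type a never escapes runTest: the keys are compared inside the pattern match and only paths come out, which is why the result type needs no type variable.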
You can also solve this without using any language extensions or introducing new types by simply re-thinking your functions. The main intention is to group files according to some property a, so we could define getDuplicatesByMethods as
getDuplicatesByMethods :: [([FilePath] -> IO [[FilePath]], DuplicateReason)] -> IO [DuplicateGroup]
I.e. we take in a function that groups files according to some criteria. Then we can define a helper function
groupWith :: Eq a => (FilePath -> IO a) -> [FilePath] -> IO [[FilePath]]
and call getDuplicatesByMethods like this
getDuplicatesByMethods
  [ (groupWith getSize, Size)
  , (groupWith getChecksum, Checksum)
  , (groupWith getContent, Content)
  ]
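The answer gives only the signature of groupWith; one possible implementation is sketched here. It is quadratic in the number of files because it relies on Eq alone, and it keeps only groups with more than one member, which is an assumption about what "duplicates" should mean.

```haskell
import Data.List (partition)

-- Compute a key for every path, then gather paths whose keys are equal.
groupWith :: Eq a => (FilePath -> IO a) -> [FilePath] -> IO [[FilePath]]
groupWith key paths = do
  keyed <- mapM (\p -> fmap (\k -> (k, p)) (key p)) paths
  -- Keep only real duplicates, i.e. classes with at least two paths.
  pure (filter ((> 1) . length) (classes keyed))
  where
    classes [] = []
    classes ((k, v) : rest) =
      let (same, other) = partition ((== k) . fst) rest
      in (v : map snd same) : classes other
```

Since the key type is consumed inside groupWith, every groupWith f has the same type [FilePath] -> IO [[FilePath]] and the resulting functions fit in an ordinary list.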
