Simplest way to join functions of same meaning but different return value type - haskell

I'm writing a small "hello world" type of program that groups identical files by different "reasons", e.g. same size, same content, same checksum, etc.
So, I've got to the point where I want to write a function like this (DuplicateReason is an algebraic type that states the reason why two files are considered identical):
getDuplicatesByMethods :: (Eq a) => [((FilePath -> a), DuplicateReason)] -> IO [DuplicateGroup]
In each tuple, the first component is a function that, given a file's path, returns some (Eq a) value, such as a ByteString with the content, a Word32 with the checksum, or an Int with the size.
Clearly, Haskell doesn't like that these functions are of different types, so I need to somehow gather them.
The only way I see is to create a type like
data GroupableValue = GroupString String | GroupInt Int | GroupWord32 Word32
And then, to make life easier, to make a typeclass like
class GroupableValueClass a where
toGroupableValue :: a -> GroupableValue
fromGroupableValue :: GroupableValue -> a
and implement instance for each value I'm going to get.
Question: am I doing it right and (if no) is there a simpler way to solve this task?
Update:
Here's full minimal code that should describe what I want (simplified, with no IO etc.):
data DuplicateGroup = DuplicateGroup
-- method for "same size" -- returns size
m1 :: String -> Int
m1 content = 10
-- method for "same content" -- returns content
m2 :: String -> String
m2 content = "sample content"
groupByMethods :: (Eq a) => [(String -> a)] -> [DuplicateGroup]
groupByMethods predicates = undefined
main :: IO ()
main = do
let groups = (groupByMethods [m1, m2])
return ()

Lists are always homogeneous, so you can't put items with a different a into the same list (as you noticed). There are several ways to design around this, but I usually prefer using GADTs. For example:
{-# LANGUAGE GADTs #-}
import Data.ByteString (ByteString)
import Data.Word
data DuplicateReason = Size | Checksum | Content
data DuplicateGroup
data DuplicateTest where
DuplicateTest :: Eq a => (FilePath -> IO a) -> DuplicateReason -> DuplicateTest
getSize :: FilePath -> IO Integer
getSize = undefined
getChecksum :: FilePath -> IO Word32
getChecksum = undefined
getContent :: FilePath -> IO ByteString
getContent = undefined
getDuplicatesByMethods :: [DuplicateTest] -> IO [DuplicateGroup]
getDuplicatesByMethods = undefined
This solution still needs a new type, but at least you don't have to specify all cases in advance or create boilerplate type-classes. Now, since the generic type a is essentially "hidden" inside the GADT, you can define a list that contains functions with different return types, wrapped in the DuplicateTest GADT.
getDuplicatesByMethods
[ DuplicateTest getSize Size
, DuplicateTest getChecksum Checksum
, DuplicateTest getContent Content
]
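To consume such a list, pattern matching on the DuplicateTest constructor brings the hidden Eq a instance back into scope, so the results of the wrapped function can be compared even though the caller never learns their type. A minimal sketch of that idea; the helpers sameBy, reasonOf and matchingReasons are hypothetical names, and pathLength and firstChar are stand-ins for real file inspection:

```haskell
{-# LANGUAGE GADTs #-}

data DuplicateReason = Size | Content
  deriving (Show, Eq)

data DuplicateTest where
  DuplicateTest :: Eq a => (FilePath -> IO a) -> DuplicateReason -> DuplicateTest

-- Hypothetical helper: do two paths agree under the given test?
-- Matching on the constructor makes the Eq a instance available.
sameBy :: DuplicateTest -> FilePath -> FilePath -> IO Bool
sameBy (DuplicateTest key _) p q = (==) <$> key p <*> key q

-- Hypothetical helper: the reason attached to a test.
reasonOf :: DuplicateTest -> DuplicateReason
reasonOf (DuplicateTest _ r) = r

-- Hypothetical helper: every reason for which two paths look identical.
matchingReasons :: [DuplicateTest] -> FilePath -> FilePath -> IO [DuplicateReason]
matchingReasons tests p q = do
  hits <- mapM (\t -> sameBy t p q) tests
  return [reasonOf t | (t, True) <- zip tests hits]

-- Stand-ins for real file inspection (assumptions, no actual I/O).
pathLength :: FilePath -> IO Int
pathLength = return . length

firstChar :: FilePath -> IO String
firstChar = return . take 1

main :: IO ()
main = do
  reasons <- matchingReasons
    [DuplicateTest pathLength Size, DuplicateTest firstChar Content]
    "a.txt" "b.txt"
  print reasons  -- prints [Size]
```

The two stand-in functions return an Int and a String respectively, yet both fit in the same list because the result type is hidden inside the constructor.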
You can also solve this without using any language extensions or introducing new types by simply re-thinking your functions. The main intention is to group files according to some property a, so we could define getDuplicatesByMethods as
getDuplicatesByMethods :: [([FilePath] -> IO [[FilePath]], DuplicateReason)] -> IO [DuplicateGroup]
I.e. we take in a function that groups files according to some criteria. Then we can define a helper function
groupWith :: Eq a => (FilePath -> IO a) -> [FilePath] -> IO [[FilePath]]
and call getDuplicatesByMethods like this
getDuplicatesByMethods
[ (groupWith getSize, Size)
, (groupWith getChecksum, Checksum)
, (groupWith getContent, Content)
]
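The answer gives only the signature of groupWith. One possible implementation, as a sketch: it computes the key for every path and then groups paths with equal keys. Because only Eq is available, the grouping is quadratic. The helper collect and the stand-in fakeSize are hypothetical names introduced for illustration.

```haskell
import Data.List (partition)

-- Compute the key of every path, then group paths whose keys are equal.
groupWith :: Eq a => (FilePath -> IO a) -> [FilePath] -> IO [[FilePath]]
groupWith key paths = do
  keyed <- mapM (\p -> do k <- key p; return (p, k)) paths
  return (collect keyed)

-- Pure grouping step (hypothetical helper). Only Eq is assumed,
-- so each element is compared against all remaining ones.
collect :: Eq k => [(FilePath, k)] -> [[FilePath]]
collect [] = []
collect ((p, k) : rest) = (p : map fst same) : collect others
  where
    (same, others) = partition ((== k) . snd) rest

-- Stand-in for a real size lookup (assumption, no actual I/O).
fakeSize :: FilePath -> IO Int
fakeSize = return . length

main :: IO ()
main = do
  groups <- groupWith fakeSize ["a", "bb", "c", "dd", "eee"]
  print groups  -- prints [["a","c"],["bb","dd"],["eee"]]
```

Singleton groups are kept here; a caller interested only in duplicates could filter for groups with more than one element.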

Related

Haskell: How to write a type of function from a specific type to any type?

In Scala, I could write the following trait:
trait Consumer[A] {
def apply(a: A): Unit
}
And Scala would convert whatever I want to Unit, i.e., it would discard the type. Equivalently, I could have said that apply returns Any and ignore the result.
However, in Haskell, if I defined the type as type Consumer = a -> IO (), I wouldn't be able to pass an Int -> IO Int function, as Int isn't ().
There are two ways I know of to solve this issue, but neither is satisfactory:
Use Data.Functor.void at the call site to manually change IO a to IO (). This is annoying as an API user.
Define type Consumer a b = a -> IO b, but then every time I want to use Consumer in a signature, I have to carry the useless type b.
Is there any way to define the Consumer type as a function from a to "IO Any"? As far as I know, Haskell does not support something like exists x. a -> IO x.
Using forall results in the opposite of what I want, e.g.,
type Consumer a = forall b. a -> IO b
foo :: Int -> IO Int
foo = undefined
bar :: Consumer Int
bar = foo
results in the error:
• Couldn't match type ‘b’ with ‘Int’
‘b’ is a rigid type variable bound by
the type signature for:
bar :: Consumer Int
Expected type: Int -> IO b
Actual type: Int -> IO Int
• In the expression: foo
In an equation for ‘bar’: bar = foo
• Relevant bindings include
bar :: Int -> IO b
Note that I specifically want Consumer to be a type alias, and not a data constructor, as is described here: Haskell function returning existential type. I wouldn't mind if Consumer were a class if anyone knows how to make that work.
To get an existentially-quantified type in Haskell, you need to write down a data declaration (as opposed to a newtype declaration or a type alias declaration, like you used.).
Here's a Consumer type that fits your purposes:
{-# LANGUAGE ExistentialQuantification #-}
data Consumer input = forall output. Consumer { runIgnoringOutput :: input -> IO output }
And, analogously, here is what your example would look like with the new type:
f :: Int -> IO Int
f = undefined
g :: Consumer Int
g = Consumer f
This doesn't really avoid your concerns about client code needing an extra call, though. (I mean, this is no better than exporting a consumer = Data.Functor.void binding from your library.) Also, it complicates how clients will be able to use a consumer, too:
consumer :: Consumer Int
consumer = Consumer (\x -> return [x])
{- This doesn't typecheck -}
main1 :: IO ()
main1 = runIgnoringOutput consumer 4
{- This doesn't typecheck (!!!) -}
main2 :: IO ()
main2 = void (runIgnoringOutput consumer 4)
{- Only this typechecks :( -}
main3 :: IO ()
main3 =
case consumer of
Consumer f -> Data.Functor.void (f 4)
So it would probably make sense to have an apply function in your library that did the dirty work, just as there was an apply function in the Scala library.
apply :: Consumer a -> a -> IO ()
apply (Consumer f) x = void (f x)
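One thing the existential wrapper does buy is the ability to keep consumers with different result types in a single list. A small sketch under the same definitions (the constructor is written without record syntax here, and the list consumers is a made-up example):

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Functor (void)

data Consumer input = forall output. Consumer (input -> IO output)

-- Run a consumer and throw away whatever it returns.
apply :: Consumer a -> a -> IO ()
apply (Consumer f) x = void (f x)

-- Three consumers of Int whose underlying functions return
-- (), Int and String respectively.
consumers :: [Consumer Int]
consumers =
  [ Consumer print
  , Consumer (\n -> return (n + 1))
  , Consumer (\n -> return (show n))
  ]

main :: IO ()
main = mapM_ (\c -> apply c 3) consumers  -- only the first one prints: 3
```

The wrapping with Consumer still happens at each definition site, so this does not remove the extra call the question wanted to avoid.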
I wouldn't mind if Consumer were a class if anyone knows how to make that work.
You can simulate existential types for classes with an associated type family.
But Haskell doesn't allow ambiguous types in classes without using something like a GADT existential wrapper, so you would still have the type information there somewhere.
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses #-}
class Consumer c a where
type Output c
consume :: c -> a -> IO (Output c)
c is necessary here to allow for the reconstruction of the type of Output c, so it is not strictly an existential. But you can now write
{-# LANGUAGE FlexibleInstances, InstanceSigs #-}
instance Consumer (a -> IO b) a where
type Output (a -> IO b) = b
consume :: (a -> IO b) -> a -> IO b
consume = id
This may not fit your use case, because there will not be a type signature that can express Consumer a in a truly existential way. But it is possible to write
... :: (Consumer c a) => c -> ...
(You could also make use of FunctionalDependencies here to clarify the class somewhat.)
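As a sketch of that last suggestion: the variation below drops the associated Output type, has consume discard the result directly, and adds a functional dependency so that the consumer type determines its input type. This is an assumption about how the class could be reshaped, not the class from the answer above; double, printInt and feedAll are hypothetical names.

```haskell
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}
import Data.Functor (void)

-- The dependency c -> a says: knowing the consumer type fixes the input type.
class Consumer c a | c -> a where
  consume :: c -> a -> IO ()

-- Any function into IO is a consumer of its argument type.
instance Consumer (a -> IO b) a where
  consume f x = void (f x)

-- A generic client that only mentions the class, never the result type.
feedAll :: Consumer c a => c -> [a] -> IO ()
feedAll c = mapM_ (consume c)

double :: Int -> IO Int
double n = return (n * 2)

printInt :: Int -> IO ()
printInt = print

main :: IO ()
main = do
  consume double 21          -- result discarded; the literal is inferred as Int
  feedAll printInt [1, 2, 3] -- prints 1, 2 and 3 on separate lines
```

Thanks to the dependency, the literal arguments need no annotation: the type of double or printInt already fixes them to Int.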

RPC (Or: How do I disambiguate function application based on TypeRep values?)

I'm building some infrastructure for doing remote procedure calls in Haskell, and for reasons that are too long to explain here, I cannot reuse existing libraries.
So here's the setup: I have a type class for serializing and deserializing data:
class Serializable a where
encode :: a -> B.ByteString
decode :: B.ByteString -> Maybe a
maxSize :: a -> Int
where B is Data.ByteString.
I can use this to implement serialization of integers, booleans, lists of serializables, tuples of serializables, etc.
Now I want to send some arguments across a network to a server, which then performs a computation based on these arguments, and sends back a result. So I create an existential type representing things that can be serialized:
data SerializableExt = forall t . Serializable t => SerializableExt t
because I want to send something of type [SerializableExt].
So, of course, I need to create an instance Serializable SerializableExt. This is where the problem starts:
In order to implement decode :: B.ByteString -> Maybe SerializableExt I need to know the concrete type that the existential type SerializableExt wraps.
So I implement encode :: SerializableExt -> B.ByteString as serializing the concrete type along with the value:
encode (SerializableExt x) = encode (typeOf x, x)
using typeOf from Data.Typeable. The problem is now the implementation of decode :: B.ByteString -> Maybe SerializableExt:
decode bs =
let (tyenc, xenc) = splitPair bs -- Not really important. It just splits bs into the two components
in case (decode tyenc :: Maybe TypeRep) of
Just ty -> SerializableExt <$> _ -- Somehow invoke decode xenc, where the choice of which decode to execute depends on the value of ty.
_ -> Nothing
But I can't see how to fill in the hole here. Because of Haskell's separation of the value level and the type level I can't use the value of ty to disambiguate the invocation of decode xenc, right?
Is there a way to solve this issue, and actually put something in the hole which will do what I want? Or can you come up with another design?
EDIT: One way of doing it would be the following:
decode bs =
  let (tyenc, xenc) = splitPair bs
  in case (decode tyenc :: Maybe TypeRep) of
       Just ty
         | ty == typeRep (Proxy :: Proxy Int) -> SerializableExt <$> (decode xenc :: Maybe Int)
         | ty == typeRep (Proxy :: Proxy ()) -> SerializableExt <$> (decode xenc :: Maybe ())
         | ...
       _ -> Nothing
but this is bad for several reasons:
It's tedious to extend.
It cannot handle pairs (or generally: tuples) generically; every combination of types needs to be handled.
It's not very Haskelly.
Data.Dynamic lets us put arbitrary Haskell values into a single container, and get them out again in a type-safe way. That's a good start towards inter-process communication; I'll come back to serialization below.
We can write a program that takes a list of Dynamic values, checks for the number & types it needs, and returns a result in the same way.
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
-- | Experiments with type-safe serialization.
module Main where
import Data.Proxy
import Data.Dynamic
import Data.Foldable
import Data.Type.Equality
import Type.Reflection
foo :: Int -> String -> String
foo i s = concat (replicate i s)
actor :: [Dynamic] -> Either String [Dynamic]
actor (di : ds : _) = case (fromDynamic di, fromDynamic ds) of
(Just i, Just s) -> Right [toDyn (foo i s)]
_ -> Left "Wrong types of arguments"
actor _ = Left "Not enough arguments"
caller :: Either String [Dynamic]
caller = actor [ toDyn (3::Int), toDyn "bar" ]
main :: IO ()
main = case caller of
Left err -> putStrLn err
Right dyns -> for_ dyns (\d -> case fromDynamic d of
Just s -> putStrLn s
Nothing -> print d)
We can use a TypeRep to guide selection of a class instance. (For ease of testing my code, I used String.)
class Serial a where
encode :: a -> String
decode :: String -> Maybe a
decodeAs :: Serial a => TypeRep a -> String -> Maybe a
decodeAs _ s = decode s
Finally, we'd like to serialize the TypeRep, and when decoding, check that the encoded type matches the type that we're decoding at.
instance Serial SomeTypeRep
encodeDyn :: (Typeable a, Serial a) => a -> (String, String)
encodeDyn a = (encode (SomeTypeRep (typeOf a)), encode a)
decodeDynamic :: forall a. (Typeable a, Serial a) => String -> String -> Maybe a
decodeDynamic tyStr aStr = case decode tyStr of
Nothing -> Nothing
Just (SomeTypeRep ty) ->
case eqTypeRep ty (typeRep :: TypeRep a) of
Nothing -> Nothing
Just HRefl -> decodeAs ty aStr
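The Serial SomeTypeRep instance is left without a body above. One way to fill it in, as a sketch: encode a type tag with show and decode it by looking the string up in a closed registry of types that are expected on the wire. The registry knownTypes and the Serial Int instance are assumptions made for this example, and matching tags by their shown name is only reasonable for a small, fixed set of types.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Type.Reflection

class Serial a where
  encode :: a -> String
  decode :: String -> Maybe a

instance Serial Int where
  encode = show
  decode s = case reads s of
    [(n, "")] -> Just n
    _         -> Nothing

-- Assumption: a closed registry of the types that may appear on the wire.
knownTypes :: [SomeTypeRep]
knownTypes =
  [ SomeTypeRep (typeRep :: TypeRep Int)
  , SomeTypeRep (typeRep :: TypeRep Bool)
  ]

-- Type tags are written with show and read back via the registry.
instance Serial SomeTypeRep where
  encode = show
  decode s = lookup s [(show t, t) | t <- knownTypes]

-- Decode the payload only if the encoded tag matches the requested type.
decodeDynamic :: forall a. (Typeable a, Serial a) => String -> String -> Maybe a
decodeDynamic tyStr aStr = case decode tyStr of
  Nothing -> Nothing
  Just (SomeTypeRep ty) ->
    case eqTypeRep ty (typeRep :: TypeRep a) of
      Nothing    -> Nothing
      Just HRefl -> decode aStr

main :: IO ()
main = do
  let tyStr  = encode (SomeTypeRep (typeOf (42 :: Int)))
      valStr = encode (42 :: Int)
  print (decodeDynamic tyStr valStr :: Maybe Int)   -- prints Just 42
  print (decodeDynamic "Bool" valStr :: Maybe Int)  -- prints Nothing (tag mismatch)
```

A tag that is not in the registry fails to decode at all, which keeps unknown types from being silently accepted.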

Best practices for talking to an API

I'm trying to create some bindings for an API in Haskell. I noticed some functions have a tremendous number of arguments, e.g.
myApiFunction :: Key -> Account -> Int -> String -> Int -> Int -> IO (MyType)
It's not necessarily bad, per se, to have this many arguments. But as a user I don't like long argument functions. However, each of these args is absolutely 100% necessary.
Is there a more Haskell-ish way to abstract over the common parts of these functions? Everything past account here is used to build a URL, so I would need it available, and what it stands for depends entirely on the function. Certain things are consistent though, like Key and Account, and I'm wondering what the best way to abstract over these arguments is.
Thank you!
You can combine these into more descriptive data types:
data Config = Config
{ cKey :: Key
, cAccount :: Account
}
Then maybe have types or newtypes to make the other arguments more descriptive:
-- I have no idea what these actually should be, I'm just making up something
type Count = Int
type Name = String
type Position = (Int, Int)
myApiFunction :: Config -> Count -> Name -> Position -> IO MyType
myApiFunction conf count name (x, y) =
myPreviousApiFunction (cKey conf)
(cAccount conf)
count
name
x
y
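Plain type aliases document intent but do not stop a caller from swapping two Int arguments. If that matters, newtypes make the compiler enforce the distinction. A minimal sketch, assuming String stand-ins for Key and Account and a made-up URL layout (requestPath is a hypothetical name, not part of any real API):

```haskell
-- Distinct wrapper types: a Count can no longer be passed where
-- a coordinate is expected, even though both wrap an Int.
newtype Count = Count Int
newtype Name  = Name String

data Position = Position { posX :: Int, posY :: Int }

-- Assumption: Key and Account are represented as plain strings here.
data Config = Config
  { cKey     :: String
  , cAccount :: String
  }

-- Hypothetical URL layout, for illustration only.
requestPath :: Config -> Count -> Name -> Position -> String
requestPath conf (Count n) (Name s) pos = concat
  [ "/", cAccount conf, "/", s
  , "?count=", show n
  , "&x=", show (posX pos)
  , "&y=", show (posY pos)
  , "&key=", cKey conf
  ]

main :: IO ()
main = putStrLn (requestPath (Config "k123" "acct") (Count 2) (Name "foo") (Position 1 2))
-- prints /acct/foo?count=2&x=1&y=2&key=k123
```

The cost is some wrapping and unwrapping at the call sites, so this is a trade-off rather than a strict improvement over aliases.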
If the Config is always needed, then I would recommend working in a Reader monad, which you can easily do as
myApiFunction
:: (MonadReader Config io, MonadIO io)
=> Count -> Name -> Position
-> io MyType
myApiFunction count name (x, y) = do
conf <- ask
liftIO $ myPreviousApiFunction
(cKey conf)
(cAccount conf)
count
name
x
y
This uses the mtl library for monad transformers. If you don't want to have to type that constraint over and over, you can also use the ConstraintKinds extension to alias it:
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE FlexibleContexts #-}
...
type ApiCtx io = (MonadReader Config io, MonadIO io)
...
myApiFunction
:: ApiCtx io
=> Count -> Name -> Position
-> io MyType
myApiFunction ...
Depending on your specific application, you could also split it up into multiple functions. I've seen plenty of APIs before that had something like
withCount :: ApiCtx io => Count -> io a -> io a
withName :: ApiCtx io => Name -> io a -> io a
withPos :: ApiCtx io => Position -> io a -> io a
(&) :: a -> (a -> b) -> b
request :: ApiCtx io => io MyType
> :set +m -- Multi-line input
> let r = request & withCount 1
| & withName "foo"
| & withPos (1, 2)
> runReaderT r (Config key acct)
These are just a handful of techniques; there are others out there as well, but they generally start becoming more complex after this. Others will have different preferences on how to do this, and I'm sure plenty would disagree with me on whether some of these are even good practice (specifically ConstraintKinds, which isn't universally accepted).
If you often find yourself with type signatures that are too large, even after applying some of these techniques, then maybe you're approaching the problem from the wrong direction. Maybe those functions can be broken down into simpler intermediate steps, maybe some of the arguments can be grouped logically into more specific data types, or maybe you just need a larger record structure to handle setting up complex operations. It's pretty open-ended right now.

Deserializing many network messages without using an ad-hoc parser implementation

I have a question pertaining to deserialization. I can envision a solution using Data.Data, Data.Typeable, or with GHC.Generics, but I'm curious if it can be accomplished without generics, SYB, or meta-programming.
Problem Description:
Given a list of [String] that is known to contain the fields of a locally defined algebraic data type, I would like to deserialize the [String] to construct the target data type. I could write a parser to do this, but I'm looking for a generalized solution that will deserialize to an arbitrary number of data types defined within the program without writing a parser for each type. With knowledge of the number and type of value constructors an algebraic type has, it's as simple as performing a read on each string to yield the appropriate values necessary to build up the type. However, I don't want to use generics, reflection, SYB, or meta-programming (unless it's otherwise impossible).
Say I have around 50 types defined similar to this (all simple algebraic types composed of basic primitives: no nested or recursive types, just different combinations and orderings of primitives):
data NetworkMsg = NetworkMsg { field1 :: Int, field2 :: Int, field3 :: Double}
data NetworkMsg2 = NetworkMsg2 { field1 :: Double, field2 :: Int, field3 :: Double }
I can determine the data-type to be associated with a [String] I've received over the network using a tag id that I parse before each [String].
Possible conjectured solution path:
Since data constructors are first-class values in Haskell and actually have a type, can the NetworkMsg constructor be thought of as a function, such as:
NetworkMsg :: Int -> Int -> Double -> NetworkMsg
Could I transform this function into a function on tuples using uncurryN then copy the [String] into a tuple of the same shape the function now takes?
NetworkMsg' :: (Int, Int, Double) -> NetworkMsg
I don't think this would work because I'd need knowledge of the value constructors and type information, which would require Data.Typeable, reflection, or some other metaprogramming technique.
Basically, I'm looking for automatic deserialization of many types without writing type instance declarations or analyzing the type's shape at run-time. If it's not feasible, I'll do it an alternative way.
You are correct that constructors are essentially just functions, so you can write generic instances for any number of types by just writing instances for the functions. You'll still need to write a separate instance for each number of arguments, though.
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Text.Read
import Control.Applicative
class FieldParser p r where
parseFields :: p -> [String] -> Maybe r
instance Read a => FieldParser (a -> r) r where
parseFields con [a] = con <$> readMaybe a
parseFields _ _ = Nothing
instance (Read a, Read b) => FieldParser (a -> b -> r) r where
parseFields con [a, b] = con <$> readMaybe a <*> readMaybe b
parseFields _ _ = Nothing
instance (Read a, Read b, Read c) => FieldParser (a -> b -> c -> r) r where
parseFields con [a, b, c] = con <$> readMaybe a <*> readMaybe b <*> readMaybe c
parseFields _ _ = Nothing
{- etc. for as many arguments as you need -}
Now you can use this type class to parse any message based on the constructor, as long as the type-checker is able to figure out the resulting message type from context (i.e. it is not able to deduce it simply from the given constructor for this sort of multi-param type class instance).
data Test1 = Test1 {fieldA :: Int} deriving Show
data Test2 = Test2 {fieldB ::Int, fieldC :: Float} deriving Show
test :: String -> [String] -> IO ()
test tag fields = case tag of
"Test1" -> case parseFields Test1 fields of
Just (a :: Test1) -> putStrLn $ "Successfully parsed " ++ show a
Nothing -> putStrLn "Parse error"
"Test2" -> case parseFields Test2 fields of
Just (a :: Test2) -> putStrLn $ "Successfully parsed " ++ show a
Nothing -> putStrLn "Parse error"
I'd like to know how exactly you use the message types in the application, though, because having each message as its separate type makes it very difficult to have any sort of generic message handler.
Is there some reason why you don't simply have a single message data type? Such as
data NetworkMsg
= NetworkMsg1 {fieldA :: Int}
| NetworkMsg2 {fieldB :: Int, fieldC :: Float}
Now, while the instances are built in pretty much the same way, you get much better type inference since the result type is always known.
instance Read a => MessageParser (a -> NetworkMsg) where
parseMsg con [a] = con <$> readMaybe a
instance (Read a, Read b) => MessageParser (a -> b -> NetworkMsg) where
parseMsg con [a, b] = con <$> readMaybe a <*> readMaybe b
instance (Read a, Read b, Read c) => MessageParser (a -> b -> c -> NetworkMsg) where
parseMsg con [a, b, c] = con <$> readMaybe a <*> readMaybe b <*> readMaybe c
parseMessage :: String -> [String] -> Maybe NetworkMsg
parseMessage tag fields = case tag of
"NetworkMsg1" -> parseMsg NetworkMsg1 fields
"NetworkMsg2" -> parseMsg NetworkMsg2 fields
_ -> Nothing
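The MessageParser class itself is not shown above. A self-contained sketch of how the pieces could fit together, assuming the class has a single method parseMsg whose result type is fixed to NetworkMsg (the fallback clauses returning Nothing are added here so that a wrong number of fields does not crash):

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Text.Read (readMaybe)

data NetworkMsg
  = NetworkMsg1 { fieldA :: Int }
  | NetworkMsg2 { fieldB :: Int, fieldC :: Float }
  deriving (Show, Eq)

-- Assumed class definition: the result type is always NetworkMsg,
-- so only the constructor's function type is a class parameter.
class MessageParser p where
  parseMsg :: p -> [String] -> Maybe NetworkMsg

instance Read a => MessageParser (a -> NetworkMsg) where
  parseMsg con [a] = con <$> readMaybe a
  parseMsg _ _ = Nothing

instance (Read a, Read b) => MessageParser (a -> b -> NetworkMsg) where
  parseMsg con [a, b] = con <$> readMaybe a <*> readMaybe b
  parseMsg _ _ = Nothing

-- Dispatch on the tag, then let the constructor's type pick the instance.
parseMessage :: String -> [String] -> Maybe NetworkMsg
parseMessage tag fields = case tag of
  "NetworkMsg1" -> parseMsg NetworkMsg1 fields
  "NetworkMsg2" -> parseMsg NetworkMsg2 fields
  _             -> Nothing

main :: IO ()
main = do
  print (parseMessage "NetworkMsg1" ["7"])
  print (parseMessage "NetworkMsg2" ["1", "2.5"])
  print (parseMessage "NetworkMsg2" ["oops", "2.5"])  -- prints Nothing
```

No annotations are needed at the call sites because each constructor's type selects exactly one instance.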
I'm also not sure why you want to do type-generic programming specifically without actually using any of the tools meant for generics. GHC.Generics, SYB or Template Haskell is usually the best solution for this kind of problem.

How to explicitly instantiate/specialise a polymorphic Haskell function?

I was wondering whether it is possible to explicitly instantiate/specialise a polymorphic function in Haskell? What I mean is, imagine I've a function like the following:
parseFile :: FromJSON a => FilePath -> IO Either String a
The structure into which it attempts to parse the file's contents will depend on the type of a. Now, I know it's possible to specify a by annotation:
parseFile myPath :: IO Either String MyType
What I was wondering was whether it's possible to specialise parseFile more explicitly, for instance with something like (specialise parseFile MyType) to turn it into parseFile :: FilePath -> IO Either String MyType
The reason I ask is that the method of annotation can become clumsy with larger functions. For instance, imagine parseFile gets called by foo which gets called by bar, and bar's return value has a complex type like
:: FromJSON a => IO (([Int],String), (Int, String, Int), a, (Double, [String]))
This means that if I want to call bar with a as MyType, I have to annotate the call with
:: IO (([Int],String), (Int, String, Int), MyType, (Double, [String]))
If I want to call bar multiple times to process different types, I end up writing this annotation multiple times, which seems like unnecessary duplication.
res1 <- bar inputA :: IO (([Int],String), (Int, String, Int), MyType, (Double, [String]))
res2 <- bar inputB :: IO (([Int],String), (Int, String, Int), OtherType, (Double, [String]))
res3 <- bar inputC :: IO (([Int],String), (Int, String, Int), YetAnotherType, (Double, [String]))
Is there a way to avoid this? I'm aware it would be possible to bind the result of bar inputA and use it in a function expecting a MyType, allowing the type engine to infer that the a in question was a MyType without requiring explicit annotation. This seems to sacrifice type safety however, as if I accidentally used the result of the above bar inputB (an OtherType) in a function that expects a MyType, for instance, the type system wouldn't complain, instead the program would fail at runtime when attempting to parse inputB into a MyType, as inputB contains an OtherType, not a MyType.
First, a small correction, the type should be
parseFile :: FromJSON a => FilePath -> IO (Either String a)
The parentheses are important and necessary.
There are a couple ways around this. For example, if you had a function
useMyType :: MyType -> IO ()
useMyType = undefined
Then you used parseFile as
main = do
result <- parseFile "data.json"
case result of
Left err -> putStrLn err
Right mt -> useMyType mt
No extra type annotations are required; GHC can infer the type of mt from its use with useMyType.
Another way is to simply assign it to a concretely typed name:
parseMyTypeFromFile :: FilePath -> IO (Either String MyType)
parseMyTypeFromFile = parseFile
main = do
result <- parseMyTypeFromFile "data.json"
case result of
Left err -> putStrLn err
Right mt -> useMyType mt
And wherever you use parseMyTypeFromFile, no explicit annotation is necessary. This is the same as a common practice for specifying the type of read:
readInt :: String -> Int
readInt = read
For solving the bar problem, if you have a type that complex I would at least suggest creating an alias for it, if not its own data type entirely, possibly with record fields and whatnot. Something similar to
type BarType a = (([Int], String), (Int, String, Int), a, (Double, [String]))
Then you can write bar as
bar :: FromJSON a => InputType -> IO (BarType a)
bar input = undefined -- implementation details
which makes bar nicer to read too. Then you can just do
res1 <- bar inputA :: IO (BarType MyType)
res2 <- bar inputB :: IO (BarType OtherType)
res3 <- bar inputC :: IO (BarType YetAnotherType)
I would consider this perfectly clear and idiomatic Haskell, personally. Not only is it immediately readable and clear what you're doing, but by having a name to refer to the complex type, you minimize the chance of typos, take advantage of IDE autocompletion, and can put documentation on the type itself to let others (and your future self) know what all those fields mean.
You can't make a polymorphic function provided elsewhere and given an explicit annotation into a more restricted version with the same name. But you can do something like:
parseFileOfMyType :: FilePath -> IO (Either String MyType)
parseFileOfMyType = parseFile
A surprising number of useful functions in various libraries are similar type-specific aliases of unassuming functions like id. Anyway, you should be able to make type-constrained versions of those examples using this technique.
Another solution to the verbosity problem would be to create type aliases:
type MyInputParse a = IO (([Int],String), (Int, String, Int), a, (Double, [String]))
res1 <- bar inputA :: MyInputParse MyType
res2 <- bar inputB :: MyInputParse OtherType
res3 <- bar inputC :: MyInputParse YetAnotherType
GHC has since gained a mechanism for partial type signatures (the PartialTypeSignatures extension, available since GHC 7.10), which lets you leave a wildcard hole in the type signature for inference to fill in while you make the part you're interested in specific.
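On GHC 8.0 or later, another option that comes close to the (specialise parseFile MyType) idea from the question is the TypeApplications extension, which lets you pass the type argument explicitly at the call site. A minimal sketch using readMaybe from base as a stand-in for parseFile, since both are polymorphic only in their result type:

```haskell
{-# LANGUAGE TypeApplications #-}
import Text.Read (readMaybe)

main :: IO ()
main = do
  -- readMaybe :: Read a => String -> Maybe a
  -- The @Type syntax instantiates the type variable a explicitly.
  print (readMaybe @Int "42")     -- prints Just 42
  print (readMaybe @Double "42")  -- prints Just 42.0
  print (readMaybe @Int "4.2")    -- prints Nothing
```

Under the same assumption, a call such as parseFile @MyType myPath would fix a without restating the rest of the result type, provided a is the first type variable in parseFile's signature.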
