Suppose I have the following data type which maps my database schema.
data Object = Object
{ classification :: Text
, country :: Text
, numberOfParts :: Int
-- Lots of other fields
}
I want to extract statistics for all objects in the database. For example, I want to extract the frequency of every field in the Person data constructor. So I would have the following function :
-- In the return type, the key of the Map is the field name.
-- Each map value represents possible values with frequency
-- (ex. "classification" -> [("table", 10), ("chair", 3), ("computer", 2)])
generateStats :: [Object] -> Map Text [(Text, Integer)]
This function would calculate the frequency of every field, so I would have to call id object, classification object, country object, etc. If the datatype has 50 fields, I would have to call 50 functions to access those fields.
Is there a way to generalize this ?
Can it be generalized to any data constructor ?
Is there a more elegant way to solve this type of problem ?
This sort of problem can be solved with generics. Usually, the syb package (Data.Generics or Data.Data or SYB or "scrap your boilerplate" generics) is the easiest to use, so it's worth trying it first and moving on to more complicated libraries only if you can't get it to work for a particular task.
Here, syb provides a straightforward way of retrieving the list of field names from a record constructor. If you derive a Data instance for some Object:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
import Data.Text (Text)
import qualified Data.Text as Text
data Object = Object
{ classification :: Text
, country :: Text
, numberOfParts :: Int
} deriving (Data)
then you can fetch the field names at runtime with the following function:
-- Get field names (empty list if not record constructor)
getnames :: Data object => object -> [Text]
getnames = map Text.pack . constrFields . toConstr
like so:
λ> :set -XOverloadedStrings
λ> getnames $ Object "prime" "Canada" 5
["classification","country","numberOfParts"]
You can fetch field values as Text at runtime using a generic query gmapQ and writing a generic helper function toText that converts field values of various types to Text:
-- Get field values as Text.
getfields :: Data object => object -> [Text]
getfields = gmapQ toText
The toText function has type:
toText :: (Data a) => a -> Text
and needs to be prepared to handle any possible field encountered. A limitation of Data.Data generics is that you can only handle a fixed set of explicit types with a default value for "the rest". Here, we handle Text, String, Int, and Double types and throw an error with unknown for "the rest":
{-# LANGUAGE TypeApplications #-}
toText = mkQ unknown -- make a query with default value "unknown"
id -- handle: id :: Text -> Text
`extQ` Text.pack -- extend to: pack :: String -> Text
`extQ` tshow #Int -- extend to: tshow :: Int -> Text
`extQ` tshow #Double -- extend to: tshow :: Double -> Text
where tshow :: (Show a) => a -> Text
tshow = Text.pack . show
unknown = error "unsupported type"
If you wanted to handle all types with a Show (or some other) instance, then syb won't do the job. (If you tried dropping the type application above and writing `extQ` tshow to handle all Show cases, you'd get an error.) Instead, you'd need need to upgrade to syb-with-class or some other generics library to handle this.
With all that in place, getting a list of key/value pairs from any object is straightword:
getpairs :: Data object => object -> [(Text,Text)]
getpairs = zip <$> getnames <*> getfields
This works on Objects:
λ> concatMap getpairs [Object "prime" "Canada" 5, Object "substandard" "Fakeistan" 100]
[("classification","prime"),("country","Canada"),("numberOfParts","5")
,("classification","substandard"),("country","Fakeistan"),("numberOfParts","100")]
or anything else with a Data instance. Sum types and record-less constructors should work okay. With the type:
data OtherObject = Foo { foo :: String, factor :: Double }
| Bar { bar :: Int }
| NotARecord Int Int Int
deriving (Data)
we get:
λ> getpairs $ Foo "exchange" 0.75
[("foo","exchange"),("factor","0.75")]
λ> getpairs $ Bar 42
[("bar","42")]
λ> getpairs $ NotARecord 1 2 3
[]
Here's a complete code example:
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeApplications #-}
import Data.Generics
import Data.Text (Text)
import qualified Data.Text as Text
data Object = Object
{ classification :: Text
, country :: Text
, numberOfParts :: Int
} deriving (Data)
data OtherObject = Foo { foo :: String, factor :: Double }
| Bar { bar :: Int }
| NotARecord Int Int Int
deriving (Data)
-- Get field names (empty list if not record constructor)
getnames :: Data object => object -> [Text]
getnames = map Text.pack . constrFields . toConstr
-- Get field vales as Text.
getfields :: Data object => object -> [Text]
getfields = gmapQ toText
-- Generic function to convert one field.
toText :: (Data a) => a -> Text
toText = mkQ unknown -- make a query with default value "unknown"
id -- handle: id :: Text -> Text
`extQ` Text.pack -- extend to: pack :: String -> Text
`extQ` tshow #Int -- extend to: tshow :: Int -> Text
`extQ` tshow #Double -- extend to: tshow :: Double -> Text
where tshow :: (Show a) => a -> Text
tshow = Text.pack . show
unknown = error "unsupported type"
-- Get field name/value pairs from any `Data` object.
getpairs :: Data object => object -> [(Text,Text)]
getpairs = zip <$> getnames <*> getfields
main :: IO ()
main = mapM_ print $
[ getpairs $ Object "prime" "Canada" 5
, getpairs $ Foo "exchange" 0.75
, getpairs $ Bar 42
, getpairs $ NotARecord 1 2 3
]
This solution depends on the generics machinery from generics-sop and the streaming sinks from foldl.
Some required pragmas and imports:
{-# LANGUAGE DeriveGeneric,DeriveAnyClass,ScopedTypeVariables,FlexibleContexts,
GADTs,TypeApplications,OverloadedStrings,StandaloneDeriving, TypeOperators #-}
module Main (main) where
import qualified GHC.Generics as GHC
import Generics.SOP (All,And,IsProductType,productTypeFrom,
DatatypeInfo(..),datatypeInfo,
ConstructorInfo(..),FieldInfo(..),FieldName,
projections, I(..), K(..),type (-.->)(Fn),type (:.:)(Comp),
Generic,HasDatatypeInfo)
import Generics.SOP.NP -- All the *_NP functions come form here
import Generics.SOP.Dict (Dict(..),zipAll)
import qualified Control.Foldl as L
import Data.Proxy
import Data.Text (Text)
import qualified Data.Map.Strict as Map
Datatypes and functions to calculate histograms, which aren't tied to any concrete record:
newtype Histogram a = Histogram (Map.Map a Int) deriving Show
-- Hides the exact type of the key behind an existential
data SomeHistogram = forall a. (Ord a, Show a) => SomeHistogram (Histogram a)
deriving instance Show SomeHistogram
-- Streaming sink for a single field
histogram :: Ord a => L.Fold a (Histogram a)
histogram = (L.Fold step Map.empty Histogram)
where
step m a = Map.insertWith (+) a 1 m
-- For any record with Generics.SOP.Generic instance,
-- create a streaming sink that accepts record values and
-- returns a list of histograms, one for each field
recordHistogram :: forall r xs . (IsProductType r xs, All Ord xs, All Show xs)
=> L.Fold r [SomeHistogram]
recordHistogram =
let productOfFolds =
cliftA_NP
(Proxy #Ord)
(\(Fn proj) ->
Comp (L.premap (\o -> let np = productTypeFrom #r #xs o
I r = proj (K np)
in r)
histogram))
(projections #xs)
foldToProduct = sequence'_NP productOfFolds -- pull the Fold outward
-- convince GHC that we have a combination of Ord and Show for all fields
ordAndShow = zipAll (Dict #(All Ord) #xs) (Dict #(All Show) #xs)
foldToList = case ordAndShow of -- collapse result of Fold into a list
Dict -> collapse_NP . cliftA_NP (Proxy #(Ord `And` Show)) (K . SomeHistogram)
<$>
foldToProduct
in foldToList
In case we want a list of field names to zip with the list of histograms:
fieldNamesOf :: forall r xs. (IsProductType r xs, HasDatatypeInfo r)
=> Proxy r
-> [FieldName]
fieldNamesOf _ =
case datatypeInfo (Proxy #r) of
ADT _ _ ((Record _ fields) :* Nil) _ ->
collapse_NP (liftA_NP (\(FieldInfo i) -> K i) fields)
_ -> error "hey, not a record!"
Putting it all to work with Object:
data Object = Object
{ classification :: Text
, country :: Text
, numberOfParts :: Int
} deriving (GHC.Generic,Generic,HasDatatypeInfo)
-- Generic and HasDatatypeInfo from generics-sop
main :: IO ()
main = print $ L.fold recordHistogram [Object "foo" "Spain" 4, Object "bar" "France" 4]
This solution has two potential problems:
Internally, recordHistogram uses n-ary products from generics-sop. Constructing and traversing these products might incur in some overhead.
There might be some space leak in the streaming sink (the Fold) returned by recordHistogram. Some extra strictness might be necessary.
(UPDATED)
I have made an interface using a Free Monad to a generic data store. I want to place the specific interpreter (:: DataStore a -> IO a) chosen by the user at run time into a state monad along with some other information. I cannot seem to put anything into this field in the data structure.
How do I put a value into a field defined as a higher rank type?
Below is a minimum example:
{-# LANGUAGE RankNTypes, DeriveFunctor #-}
data ProgramState = PS { -- line 3
[...]
, storageInterface :: (forall a. DataStore a -> IO a)
}
data DataStoreF next =
Create Asset ( String -> next)
| Read String ( Asset -> next)
| Update Asset ( Bool -> next)
| UpdateAll [Asset] ( Bool -> next)
| [...]
deriving Functor
type DataStore = Free DataStoreF
runMemory :: (IORef (Map String Asset)) -> DataStore a -> IO a
runMemory ms (Pure a) = return a
runMemory ms (Free Create asset next) = [...]
runMemory ms (Free Read str next) = [...]
[...]
pickStorageInterface :: IO (DataStore a -> IO a)
pickStorageInterface = do
opts <- parseOptions
case (storage opts) of
MemoryStorage ->
ms <- readAssetsFromDisk
return $ runMemory ms
SomeOtherStorage -> [...]
restOfProgram :: StateT ProgramState IO
restOfProgram = [...]
main = do
si <- pickStorageInterface
let programState = PS { storageInterface = si} -- line 21
evalState restOfProgram programState
When I try to do this GHC complains that:
Main.hs: << Line 21 >>
Couldn't match type `a0' with `a'
because type variable `a' would escape its scope
This (rigid, skolem) type variable is bound by
a type expected by the context: DataStore a -> IO a
at Main.hs <<line 3>>
Expected type: DataStore a -> IO a
Actual type: DataStore a0 -> IO a0
In the `storageInterface' field of a record
[...]
UPDATE
My original minimal example was to minimal. Some further experimentation shows that the problem arises when I need to load the interface in an the IO monad so I can read the command line options. I've updated the example to include that issue. Knowing this I may be able to code around it.
Interesting GHCI tells me that the results of a function of type IO (DataStore a -> IO a) is DataStore GHC.Prim.Any -> IO GHC.Prim.Any which is not what I expected.
The issue here is that
pickStorageInterface :: forall a. IO (DataStore a -> IO a)
while we would need the (impredicative) type
pickStorageInterface :: IO (forall a. DataStore a -> IO a)
for the code above to work. Alas, the impredicative types are in a sad state now in GHC, and are best to be avoided.
You can work around that using a newtype wrapper around the universally quantified type:
newtype SI = SI { runSI :: forall a. DataStore a -> IO a }
pickStorageInterface :: IO SI
pickStorageInterface = do
opts <- parseOptions
case (storage opts) of
MemoryStorage ->
ms <- readAssetsFromDisk
return $ SI $ runMemory ms
...
main = do
si <- pickStorageInterface
let programState = PS { storageInterface = runSI si}
...
Sorry for asking as potentially silly question, but returning to Haskell to do some conversion from one database package to a different one, I find myself a bit puzzled about how to do this properly.
In the Database.SQLite3 module, there is an execWithCallback with type
execWithCallback :: Database -> Text -> ExecCallback -> IO ()
Now, the callback is defined as
type ExecCallback = ColumnCount -> [Text]-> [Maybe Text] -> IO ()
that is, a function with type ExecCallback
My silly test code compiles and runs correctly:
{-# LANGUAGE OverloadedStrings #-}
import Database.SQLite3
import Data.Text
cb :: ColumnCount -> [Text] -> [Maybe Text] -> IO ()
cb n cnl ct = do print $ cnl !! 1
return ()
main = do
dh <- open "fileinfo.sqlite"
execWithCallback dh "select * from files;" cb
close dh
but then, what is the point of the type??? And, how do I specify that cb is an ExecCallback??
In Haskell, with type you define a type synonym. In your example that means that ExecCallback is just an alias for the type ColumnCount -> [Text]-> [Maybe Text] -> IO (), they are interchangeable.
You could change the following lines
cb :: ColumnCount -> [Text] -> [Maybe Text] -> IO ()
cb n cnl ct = do print $ cnl !! 1
return ()
to
cb :: ExecCallback
cb n cnl ct = do print $ cnl !! 1
return ()
and everything would still work as is. It can make your code shorter and more readable.
One other good example is
type String = [Char]
in Prelude. I bet you normally use String instead of [Char] in most cases. But you're absolutely free to use either.
Another (completely unrelated) example is the conduit package where some type synonyms make a major difference:
type Sink i = ConduitM i Void
type Consumer i m r = forall o. ConduitM i o m r
For something that's a sink for values of any type i, Sink i seems way more readable than ConduitM i Void. Same for Consumer.
When there is a function like:
some_type_t* some_type_create(const char* name, char** errptr);
is there a way to get C2HS to generate a Haskell binding with the following signature?
someTypeCreate :: String -> IO (SomeTypeFPtr, String)
Here is what I can get so far:
{#fun some_type_create as ^
{`String', alloca- `Ptr CChar' peek*} -> `SomeTypeFPtr' #}
and it works in a way that I get
someTypeCreate :: String -> IO (SomeTypeFPtr, (Ptr CChar))
but how do I make it return IO (SomeTypeFPtr, String)
(or better IO (Either String SomeTypeFPtr)) since String represents the error)?
I assume that I should use/write a different marshaller to use instead of peek which would convert the result type but I don't quite understand how to do it.
I think I've figured it out, I just wrote the following marshallers:
nullableM :: (Ptr a -> IO b) -> Ptr a -> IO (Maybe b)
nullableM f ptr = if ptr == nullPtr
then return Nothing
else liftM Just $ f ptr
{-# INLINE nullableM #-}
toStringMaybe :: CString -> IO (Maybe String)
toStringMaybe = nullableM peekCString
{-# INLINE toStringMaybe #-}
peekStringMaybe :: Ptr CString -> IO (Maybe String)
peekStringMaybe x = peek x >>= toStringMaybe
{-# INLINE peekStringMaybe #-}
1) I need to pass a field constructor parameter to a function. I made some tests but i was unable to do so. Is it possible? Otherwise, is it possible with lens package?
2) Is it possible in a MonadState to modify a field using modify? (I made a few attempts, but without success. For example: modify (second = "x") does not work.
import Control.Monad.State
data Test = Test {first :: Int, second :: String} deriving Show
dataTest = Test {first = 1, second = ""}
test1 = runStateT modif1 dataTest -- OK
test2 = runStateT (modif2 "!") dataTest -- OK
test3 = runStateT (modif3 second) dataTest -- WRONG
-- modif1 :: StateT Test IO ()
modif1 = do
st <- get
r <- lift getLine
put $ st {second = "x" ++ r}
-- modif2 :: String -> StateT Test IO ()
modif2 s = do
stat <- get
r <- lift getLine
put $ stat {second = "x" ++ r ++ s}
-- modif3 :: ???? -> StateT Test IO ()
modif3 fc = do
stat <- get
r <- lift getLine
put $ stat {fc = "x" ++ r}
-- When i try to load the module, this is the result:
-- ghc > Failed:
-- ProvaRecord.hs:33:16:`fc' is not a (visible) constructor field name
As you said, you're probably looking for lenses. A lens is a value that allows to read, set or modify a given field. Usually with Control.Lens, you define fields with underscores and you use makeLenses to create full-featured lenses.
There are many combinators that allow lenses to be used together within MonadState. In your case we can use %=, which in this case would be specialized to type
(MonadState s m) => Lens' s b -> (b -> b) -> m ()
which modifies a state value using a given lens and a function that operates on the inside value.
Your example could be rewritten using lenses as follows:
{-# LANGUAGE TemplateHaskell, RankNTypes #-}
import Control.Lens
import Control.Monad.State
data Test = Test { _first :: Int
, _second :: String
}
deriving Show
-- Generate `first` and `second` lenses.
$(makeLenses ''Test)
-- | An example of a universal function that modifies any lens.
-- It reads a string and appends it to the existing value.
modif :: Lens' a String -> StateT a IO ()
modif l = do
r <- lift getLine
l %= (++ r)
dataTest :: Test
dataTest = Test { _first = 1, _second = "" }
test :: IO Test
test = execStateT (modif second) dataTest