How can I get a nested property using Data.Aeson?
For example, when decoding an arbitrary JSON string using Value like this:
decode "{\"foo\":{\"bar0\":\"foobar0\",
\"bar1\":\"foobar1\"}}" :: Maybe Value
I end up with this:
Just (Object (fromList [("foo",Object (fromList [("bar1",String "foobar1"),("bar0",String "foobar0")]))]))
Now, how can I write a function [String] -> Object -> Maybe Value that will extract the Value, if any, arrived at by following the provided list of properties?
This function should be used like so:
extractProperty ["foo", "bar0"] (Object (fromList [("foo",Object (fromList [("bar1",String "foobar1"),("bar0",String "foobar0")]))]))
==> Just (String "foobar0")
extractProperty ["foo", "bar0", "baz"] (Object (fromList [("foo",Object (fromList [("bar1",String "foobar1"),("bar0",String "foobar0")]))]))
==> Nothing
The following solution makes use of the lens and lens-aeson packages:
{-# LANGUAGE FlexibleInstances #-}
import Control.Lens (view,pre,ix) -- from lens
import Control.Lens.Reified (ReifiedTraversal(..))
import Data.Aeson -- from aeson
import Data.Aeson.Lens (_Object) -- from lens-aeson
import Data.Text -- form text
instance Monoid (ReifiedTraversal s s s s) where
mempty = Traversal id
mappend (Traversal t1) (Traversal t2) = Traversal (t1 . t2)
extractProperty :: [Text] -> Object -> Maybe Value
extractProperty keys o =
view (pre telescope) (Object o)
where
telescope = runTraversal $ foldMap (\k -> Traversal $ _Object . ix k) keys
ReifiedTraversal is simply a newtype around a Traversal, we define a Monoid instance on it to allow easy composition of traversals that start and end in the same type (similar to how the Endo monoid works).
In our case, the traversal _Object . ix k goes from Value to Value. ix comes from Control.Lens.At and indexes on the map of properties of an Object.
We extract the first result of the composed traversal (if it exists) with the pre function.
Edit: As #cchalmers mentions in his comment, there's no need to declare an orphan instance, it works fine just with Endo. Also key k is the same as _Object . ix k.
Here's a version of extractProperty that doesn't use lens and instead relies on composing a list of kleisli arrows Value -> Maybe Value using foldr:
import qualified Data.HashMap.Strict as HM
extractProperty :: [T.Text] -> Object -> Maybe Value
extractProperty keys o = telescope keys (Object o)
where
telescope = foldr (>=>) return . map maybeKey
maybeKey k v = case v of
Object o -> HM.lookup k o
_ -> Nothing
Perhaps lens was a bit overkill in this case.
Another approach, using monadic bind:
import Data.Text (Text)
import qualified Data.HashMap.Strict as HM
extractProperty :: [Text] -> Value -> Maybe Value
extractProperty [] v = Just v
extractProperty (k:ks) (Object o) = HM.lookup k o >>= prop ks
extractProperty _ _ = Nothing
Related
Unless I'm missing something it seems that there are only two ways to "traverse" a Pandoc data-structure:
Manually pattern-matching on Block and Inline constructors
Via the Walkable type-class and related utility function
Using the Walkable type-class, is there an efficient way to search for the first matching element (preferably in a breadth-first manner), and stop the traversal as soon as its found? It seems to me that all functions around the Walkable type-class are going to traverse the entire data structure no matter what.
If not, I guess the only way is to pattern-match the Block and Inline constructors and build this on my own.
The other answer points out the useful query function. I'd add that there's a package of pandoc lenses. You asked about breadth-first traversal too, so here's both.
import Data.Semigroup (First (..))
dfsFirstLink :: Walkable Inline b => b -> Maybe Text
dfsFirstLink = fmap getFirst . query (preview $ _Link . _2 . _1 . to First)
bfsFirstLink :: Walkable Inline b => b -> Maybe Text
bfsFirstLink = fmap getFirst . getConst . traverseOf (levels query . folded) (Const . preview (_Link . _2 . _1 . to First))
-- Construct a walkable value where dfs != bfs
p :: Pandoc
p = Pandoc mempty [Plain [Note [Plain [Link mempty [] ("a","b")]]],Plain [Link mempty [] ("c","d")]]
>> dfsFirstLink p
Just "a"
>> bfsFirstLink p
Just "c"
Though unfortunately some ad-hoc experiments suggest it may not be as lazy as one might hope.
The Walkable typeclass contains a function called query with the following type signature:
query :: Monoid c => (a -> c) -> b -> c
In Data.Semigroup, there's a type called First, with a semigroup instance where the accumulating behavior is to return the "leftmost value".
This can be combined with the Monoid on Maybe, which turns any Semigroup into a Monoid with mempty of Nothing, to give the behavior you want.
For example, adapting a function from Inline -> Maybe String, to Pandoc -> Maybe String, can be done like so:
import Text.Pandoc
import Text.Pandoc.Walk (query)
import Data.Semigroup
findUrl :: Inline -> Maybe String
findUrl (Link _ _ target) = Just $ fst target
findUrl _ = Nothing
findFirstUrl :: Pandoc -> Maybe String
findFirstUrl = (fmap getFirst) . (query findUrl')
where
findUrl' :: Inline -> Maybe (First String)
findUrl' = (fmap First) . findUrl
With regards to your concern that this will traverse the entire data structure: Haskell is lazy; it shouldn't traverse any further than it needs to.
As pointed out in the comments, it's also possible to write this by specializing query to the List Monoid:
import Text.Pandoc
import Text.Pandoc.Walk (query)
import Data.Maybe (listToMaybe)
findUrl :: Inline -> [String]
findUrl (Link _ _ target) = [fst target]
findUrl _ = []
findFirstUrl :: Pandoc -> Maybe String
findFirstUrl = listToMaybe . (query findUrl)
I have an json value of:
{
"name": "xyz1",
"extra": [
{
"this_string_A": "Hello"
},
{
"this_string_B": "World"
}
]
}
And a data type of:
data Abc = Abc
{ name :: String
, a :: Maybe String
, b :: Maybe String
} deriving (Generic, Show)
In the above case I would want it to parse with a result of Abc "xyz1" (Just "Hello") (Just "World").
I can't figure out how to conditionally parse the values within extra (which is a JSON array) within the aeson Parser context. How can I get extra[0].this_string_a for example? I
What I tried:
I thought I could create my own Parser (Maybe String) function but ran into confusing errors:
instance FromJSON Abc where
parseJSON = withObject "Abc" $ \v -> Abc
<$> v .: "name"
<*> myParse v
<*> myParse v
myParse :: Object -> Parser (Maybe String)
myParse x = withArray "extra" myParse2 (x)
myParse2 :: Array -> Parser (Maybe String)
myParse2 = undefined
typecheck fails with:
• Couldn't match type ‘unordered-containers-0.2.10.0:Data.HashMap.Base.HashMap
text-1.2.3.1:Data.Text.Internal.Text Value’
with ‘Value’
Expected type: Value
Actual type: Object
• In the third argument of ‘withArray’, namely ‘(x)’
And if I replace x with Object x then I get parse error of:
Left "Error in $: parsing extra failed, expected Array, but encountered Object"
Full example (run test function to test):
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
module Example where
import GHC.Generics
import Data.Aeson
import Data.Aeson.Types
data Abc = Abc
{ name :: String
, a :: Maybe String
, b :: Maybe String
} deriving (Generic, Show)
instance FromJSON Abc where
parseJSON = withObject "Abc" $ \v -> Abc
<$> v .: "name"
<*> (v.: "extra") -- find where object has key of this_string_a ??
<*> (v.: "extra") -- find where object has key of this_string_b ??
test :: Either String Abc
test = eitherDecode exampleJson
exampleJson = "{ \"name\": \"xyz1\", \"extra\": [ { \"this_string_A\": \"Hello\" }, { \"this_string_B\": \"World\" } ] }"
The withXXX "helpers" make everything kind of awkward, but here goes.
The Aeson Parser type is misnamed, and this causes confusion.
The idea with Aeson Parser objects is that they represent a monadic parse result. (This is different from the Parser objects you find in Parsec, etc., which represent actual monadic parsers.) So, you should think of a Parser a as isomorphic to an Either ParseError a -- a monadic result with the possibility of failure.
These parse results are usually combined applicatively. So if you have a parser like:
data Xyz = Xyz { x :: String, y :: String }
instance FromJSON Xyz where
parseJSON = withObject "Xyz" $ \v ->
Xyz <$> v .: "x" <*> v .: "y"
the parse results v .: "x" and v .: "y" have type Parser String which is really like Either ParseError a, and the last line of that instance is the usual method of combining successful and unsuccessful results in an applicative manner, along the lines of:
Xyz <$> Right "value_x" <*> Left "while parsing Xyz: key y was missing"
Now, the function parseJSON has type Value -> Parser a. This is what should properly be called a parser, but to avoid confusion, let's call it a "parse function". A parse function takes a JSON representation (a Value, or an Object or some other JSON thingy) and returns a parse result. The withXXX family of functions are used to adapt parse functions between JSON thingies. If you have a parse function that expects an Object, like:
\v -> Xyz <$> v .: "x" <*> v .: "y" :: Object -> Parser Xyz
and you want to adapt it to parseJSON :: Value -> Parser Xyz, you use withObject "str" :: (Object -> Parser Xyz) -> (Value -> Parser Xyz) to do it.
Getting back to your problem, if you'd like to write a core parser that looks like:
\v -> Abc <$> v .: "name" <*> extra .:? "this_string_A"
<*> extra .:? "this_string_B"
you want extra to be an Object, and you want to extract it monadically from the overall JSON object v :: Object, using appropriate withXXX helpers to adapt parse functions from one input JSON thingy type to another. So, let's write a monadic function (a parse function, in fact) to do that:
getExtra :: Object -> Parser Object
getExtra v = do
First, we monadically extract the optional "extra" component from v. We use the conditional form here, so mextra :: Maybe Value.
mextra <- v .:? "extra"
Second, let's monadically create our final Object out of "mextra". This will be the JSON Object whose keys are "this_string_A" and "this_string_B" with the array layer removed. Note the type of this case expression will be Parser Object, a parse result of type Object = HashMap key value. For the Just case, we have a Value that we expect to be an array, so let's use the withArray helper to ensure that. Note that the withArray "str" helper function takes our parse function of type \arr -> do ... :: Array -> Parser Object and adapts it to Value -> Parser Object so it can be applied to vv :: Value.
case mextra of
Just vv -> vv & withArray "Abc.extra" (\arr -> do
Now, arr is an Array = Vector Value. We hope it's an array of Objects. Let's pull the Values out as a list:
let vallst = toList arr
and then monadically traverse the list with the help of withObject to ensure they're all Objects as expected. Note the use of pure here, since we want to extract the Objects as-is without any additional processing:
objlst <- traverse (withObject "Abc.extra[..]" pure) vallst
Now, we have an objlst :: [Object]. They're a set of singleton hashmaps with disjoint keys, and the Object / hashmap we want is their union, so let's return that. The parenthesis here ends the withArray expression that's being applied to vv:
return $ HashMap.unions objlst)
For the Nothing case ("extra" not found), we merely return an empty hashmap:
Nothing -> return HashMap.empty
The full function looks like this:
getExtra :: Object -> Parser Object
getExtra v = do
mextra <- v .:? "extra"
case mextra of
Just vv -> vv & withArray "Abc.extra" (\arr -> do
let vallst = toList arr
objlst <- traverse (withObject "Abc.extra[..]" pure) vallst
return $ HashMap.unions objlst)
Nothing -> return HashMap.empty
and you use it in your parser instance like so:
instance FromJSON Abc where
parseJSON =
withObject "Abc" $ \v -> do
extra <- getExtra v
Abc <$> v .: "name" <*> extra .:? "this_string_A" <*> extra .:? "this_string_B"
With a test case:
example :: BL.ByteString
example = "{\"name\": \"xyz1\", \"extra\": [{\"this_string_A\": \"Hello\"}, {\"this_string_B\": \"World\"}]}"
main = print (eitherDecode example :: Either String Abc)
it works like so:
λ> main
Right (Abc {name = "xyz1", a = Just "Hello", b = Just "World"})
The full code:
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (eitherDecode, FromJSON, Object, parseJSON, withArray, withObject, (.:), (.:?))
import Data.Aeson.Types (Parser)
import GHC.Generics (Generic)
import qualified Data.ByteString.Lazy as BL (ByteString)
import qualified Data.HashMap.Strict as HashMap (empty, unions)
import Data.Function ((&))
import Data.Foldable (toList)
data Abc = Abc
{ name :: String
, a :: Maybe String
, b :: Maybe String
} deriving (Generic, Show)
instance FromJSON Abc where
parseJSON =
withObject "Abc" $ \v -> do
extra <- getExtra v
Abc <$> v .: "name" <*> extra .:? "this_string_A" <*> extra .:? "this_string_B"
getExtra :: Object -> Parser Object
getExtra v = do
mextra <- v .:? "extra"
case mextra of
Just vv -> vv & withArray "Abc.extra" (\arr -> do
let vallst = toList arr
objlst <- traverse (withObject "Abc.extra[..]" pure) vallst
return $ HashMap.unions objlst)
Nothing -> return HashMap.empty
example :: BL.ByteString
example = "{\"name\": \"xyz1\", \"extra\": [{\"this_string_A\": \"Hello\"}, {\"this_string_B\": \"World\"}]}"
main = print (eitherDecode example :: Either String Abc)
Partial answer...
instance FromJSON Abc where
parseJSON = withObject "Abc" $ \v -> Abc
<$> v .: "name"
<*> (v .: "extra" >>= myParse)
<*> (v .: "extra" >>= myParse)
myParse :: Array -> Parser (Maybe String)
myParse x = withArray "extra" (lookupDictArray "this_string_a") (Array x)
lookupDictArray :: Text -> Array -> Parser (Maybe String)
lookupDictArray k a = do
let v = Vector.find (maybe False (HashMap.member k) . parseMaybe parseJSON) a
case v of
Just v' -> withObject "grrrrrrrrrrr" (\v -> v .: k) v'
Nothing -> pure Nothing
Fails to typecheck with:
src/Example.hs:32:69-77: error:
• Ambiguous type variable ‘a0’ arising from a use of
‘parseJSON’
prevents the constraint ‘(FromJSON a0)’ from being
solved.
Probable fix: use a type annotation to specify
what ‘a0’ should be.
These potential instances exist:
instance FromJSON DotNetTime
-- Defined in ‘aeson-1.4.4.0:Data.Aeson.Types.FromJSON’
instance FromJSON Value
-- Defined in ‘aeson-1.4.4.0:Data.Aeson.Types.FromJSON’
instance (FromJSON a, FromJSON b) => FromJSON
(Either a b)
-- Defined in ‘aeson-1.4.4.0:Data.Aeson.Types.FromJSON’
...plus 29 others
...plus 60 instances involving out-of-scope types
(use -fprint-potential-instances to see them all)
• In the first argument of ‘parseMaybe’, namely
‘parseJSON’
In the second argument of ‘(.)’, namely
‘parseMaybe parseJSON’
In the first argument of ‘Vector.find’, namely
‘(maybe False (member k) . parseMaybe
parseJSON)’
|
32 | let v = (Vector.find (maybe False (HashMap.member
k) . parseMaybe parseJSON) a)
Suppose I want to implement FromJSON for a data type. Below are the complete source code:
{-# LANGUAGE
NamedFieldPuns
, OverloadedStrings
, TupleSections
, ViewPatterns
#-}
module Main
( main
) where
import Data.Aeson
import Control.Monad
import qualified Data.HashMap.Strict as HM
import qualified Data.Map.Strict as M
import qualified Data.Text as T
data Foo
= Foo
{ aaa :: Int
, bbb :: T.Text
, ccc :: Maybe (Int, Int)
, extra :: M.Map T.Text T.Text
}
instance FromJSON Foo where
parseJSON = withObject "Foo" $ \obj -> do
aaa <- obj .: "aaa"
bbb <- obj .: "bbb"
ccc <- obj .:? "ccc"
let existingFields = T.words "aaa bbb ccc"
obj' =
-- for sake of simplicity, I'm not using the most efficient approach.
filter ((`notElem` existingFields) . fst)
. HM.toList
$ obj
(M.fromList -> extra) <- forM obj' $ \(k,v) ->
withText "ExtraText" (pure . (k,)) v
pure Foo {aaa,bbb,ccc,extra}
main :: IO ()
main = pure ()
This data type Foo has a bunch of fields of potentially different types and in the end there is extra to collect all remaining fields.
Obviously no one would enjoy updating existingFields every time some fields get add/remove/update-ed, any recommended approach on collecting unused fields?
An alternative that I can think of is to stack a StateT on top with obj (converted to Map) as the initial state, and use something like Data.Map.splitLookup to "discharge" used fields. But I'm reluctant to do so as it will involve some lifting around monad stacks and it doesn't sound very good performance-wise removing elements one at a time from Map in comparison to filtering through HashMap in one pass in the end.
no one would enjoy updating existingFields every time some fields get
add/remove/update-ed
Consider this function
import Data.Aeson.Types (Parser)
import Data.Text (Text)
import Control.Monad.Trans.Writer
import Data.Functor.Compose
keepName :: (Object -> Text -> Parser x)
-> Object -> Text -> Compose (Writer [Text]) Parser x
keepName f obj fieldName = Compose $ do
tell [fieldName]
pure (f obj fieldName)
It takes as input an operator like .: or .:? and "enriches" its result value so that, instead of returning a Parser, it returns a Parser nested inside a Writer that serves to accumulate the supplied field names. The composition is wrapped in the Compose newtype, which automatically gives us an Applicative instance because, as mentioned in the docs:
(Applicative f, Applicative g) => Applicative (Compose f g)
(The composition is not a Monad though. Also take note that we are using Writer and not WriterT. We are nesting Applicatives, not applying monad transformers).
The rest of the code doesn't change that much:
{-# LANGUAGE ApplicativeDo #-}
instance FromJSON Foo where
parseJSON = withObject "Foo" $ \obj -> do
let Compose (runWriter -> (parser,existingFields)) =
do aaa <- keepName (.:) obj "aaa"
bbb <- keepName (.:) obj "bbb"
ccc <- keepName (.:?) obj "ccc"
pure Foo {aaa,bbb,ccc,extra = mempty}
obj' =
filter ((`notElem` existingFields) . fst)
. HM.toList
$ obj
(M.fromList -> extra) <- forM obj' $ \(k,v) ->
withText "ExtraText" (pure . (k,)) v
r <- parser
pure $ r { extra }
Suppose I have the following data type which maps my database schema.
data Object = Object
{ classification :: Text
, country :: Text
, numberOfParts :: Int
-- Lots of other fields
}
I want to extract statistics for all objects in the database. For example, I want to extract the frequency of every field in the Person data constructor. So I would have the following function :
-- In the return type, the key of the Map is the field name.
-- Each map value represents possible values with frequency
-- (ex. "classification" -> [("table", 10), ("chair", 3), ("computer", 2)])
generateStats :: [Object] -> Map Text [(Text, Integer)]
This function would calculate the frequency of every field, so I would have to call id object, classification object, country object, etc. If the datatype has 50 fields, I would have to call 50 functions to access those fields.
Is there a way to generalize this ?
Can it be generalized to any data constructor ?
Is there a more elegant way to solve this type of problem ?
This sort of problem can be solved with generics. Usually, the syb package (Data.Generics or Data.Data or SYB or "scrap your boilerplate" generics) is the easiest to use, so it's worth trying it first and moving on to more complicated libraries only if you can't get it to work for a particular task.
Here, syb provides a straightforward way of retrieving the list of field names from a record constructor. If you derive a Data instance for some Object:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
import Data.Text (Text)
import qualified Data.Text as Text
data Object = Object
{ classification :: Text
, country :: Text
, numberOfParts :: Int
} deriving (Data)
then you can fetch the field names at runtime with the following function:
-- Get field names (empty list if not record constructor)
getnames :: Data object => object -> [Text]
getnames = map Text.pack . constrFields . toConstr
like so:
λ> :set -XOverloadedStrings
λ> getnames $ Object "prime" "Canada" 5
["classification","country","numberOfParts"]
You can fetch field values as Text at runtime using a generic query gmapQ and writing a generic helper function toText that converts field values of various types to Text:
-- Get field values as Text.
getfields :: Data object => object -> [Text]
getfields = gmapQ toText
The toText function has type:
toText :: (Data a) => a -> Text
and needs to be prepared to handle any possible field encountered. A limitation of Data.Data generics is that you can only handle a fixed set of explicit types with a default value for "the rest". Here, we handle Text, String, Int, and Double types and throw an error with unknown for "the rest":
{-# LANGUAGE TypeApplications #-}
toText = mkQ unknown -- make a query with default value "unknown"
id -- handle: id :: Text -> Text
`extQ` Text.pack -- extend to: pack :: String -> Text
`extQ` tshow #Int -- extend to: tshow :: Int -> Text
`extQ` tshow #Double -- extend to: tshow :: Double -> Text
where tshow :: (Show a) => a -> Text
tshow = Text.pack . show
unknown = error "unsupported type"
If you wanted to handle all types with a Show (or some other) instance, then syb won't do the job. (If you tried dropping the type application above and writing `extQ` tshow to handle all Show cases, you'd get an error.) Instead, you'd need need to upgrade to syb-with-class or some other generics library to handle this.
With all that in place, getting a list of key/value pairs from any object is straightword:
getpairs :: Data object => object -> [(Text,Text)]
getpairs = zip <$> getnames <*> getfields
This works on Objects:
λ> concatMap getpairs [Object "prime" "Canada" 5, Object "substandard" "Fakeistan" 100]
[("classification","prime"),("country","Canada"),("numberOfParts","5")
,("classification","substandard"),("country","Fakeistan"),("numberOfParts","100")]
or anything else with a Data instance. Sum types and record-less constructors should work okay. With the type:
data OtherObject = Foo { foo :: String, factor :: Double }
| Bar { bar :: Int }
| NotARecord Int Int Int
deriving (Data)
we get:
λ> getpairs $ Foo "exchange" 0.75
[("foo","exchange"),("factor","0.75")]
λ> getpairs $ Bar 42
[("bar","42")]
λ> getpairs $ NotARecord 1 2 3
[]
Here's a complete code example:
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeApplications #-}
import Data.Generics
import Data.Text (Text)
import qualified Data.Text as Text
data Object = Object
{ classification :: Text
, country :: Text
, numberOfParts :: Int
} deriving (Data)
data OtherObject = Foo { foo :: String, factor :: Double }
| Bar { bar :: Int }
| NotARecord Int Int Int
deriving (Data)
-- Get field names (empty list if not record constructor)
getnames :: Data object => object -> [Text]
getnames = map Text.pack . constrFields . toConstr
-- Get field vales as Text.
getfields :: Data object => object -> [Text]
getfields = gmapQ toText
-- Generic function to convert one field.
toText :: (Data a) => a -> Text
toText = mkQ unknown -- make a query with default value "unknown"
id -- handle: id :: Text -> Text
`extQ` Text.pack -- extend to: pack :: String -> Text
`extQ` tshow #Int -- extend to: tshow :: Int -> Text
`extQ` tshow #Double -- extend to: tshow :: Double -> Text
where tshow :: (Show a) => a -> Text
tshow = Text.pack . show
unknown = error "unsupported type"
-- Get field name/value pairs from any `Data` object.
getpairs :: Data object => object -> [(Text,Text)]
getpairs = zip <$> getnames <*> getfields
main :: IO ()
main = mapM_ print $
[ getpairs $ Object "prime" "Canada" 5
, getpairs $ Foo "exchange" 0.75
, getpairs $ Bar 42
, getpairs $ NotARecord 1 2 3
]
This solution depends on the generics machinery from generics-sop and the streaming sinks from foldl.
Some required pragmas and imports:
{-# LANGUAGE DeriveGeneric,DeriveAnyClass,ScopedTypeVariables,FlexibleContexts,
GADTs,TypeApplications,OverloadedStrings,StandaloneDeriving, TypeOperators #-}
module Main (main) where
import qualified GHC.Generics as GHC
import Generics.SOP (All,And,IsProductType,productTypeFrom,
DatatypeInfo(..),datatypeInfo,
ConstructorInfo(..),FieldInfo(..),FieldName,
projections, I(..), K(..),type (-.->)(Fn),type (:.:)(Comp),
Generic,HasDatatypeInfo)
import Generics.SOP.NP -- All the *_NP functions come form here
import Generics.SOP.Dict (Dict(..),zipAll)
import qualified Control.Foldl as L
import Data.Proxy
import Data.Text (Text)
import qualified Data.Map.Strict as Map
Datatypes and functions to calculate histograms, which aren't tied to any concrete record:
newtype Histogram a = Histogram (Map.Map a Int) deriving Show
-- Hides the exact type of the key behind an existential
data SomeHistogram = forall a. (Ord a, Show a) => SomeHistogram (Histogram a)
deriving instance Show SomeHistogram
-- Streaming sink for a single field
histogram :: Ord a => L.Fold a (Histogram a)
histogram = (L.Fold step Map.empty Histogram)
where
step m a = Map.insertWith (+) a 1 m
-- For any record with Generics.SOP.Generic instance,
-- create a streaming sink that accepts record values and
-- returns a list of histograms, one for each field
recordHistogram :: forall r xs . (IsProductType r xs, All Ord xs, All Show xs)
=> L.Fold r [SomeHistogram]
recordHistogram =
let productOfFolds =
cliftA_NP
(Proxy #Ord)
(\(Fn proj) ->
Comp (L.premap (\o -> let np = productTypeFrom #r #xs o
I r = proj (K np)
in r)
histogram))
(projections #xs)
foldToProduct = sequence'_NP productOfFolds -- pull the Fold outward
-- convince GHC that we have a combination of Ord and Show for all fields
ordAndShow = zipAll (Dict #(All Ord) #xs) (Dict #(All Show) #xs)
foldToList = case ordAndShow of -- collapse result of Fold into a list
Dict -> collapse_NP . cliftA_NP (Proxy #(Ord `And` Show)) (K . SomeHistogram)
<$>
foldToProduct
in foldToList
In case we want a list of field names to zip with the list of histograms:
fieldNamesOf :: forall r xs. (IsProductType r xs, HasDatatypeInfo r)
=> Proxy r
-> [FieldName]
fieldNamesOf _ =
case datatypeInfo (Proxy #r) of
ADT _ _ ((Record _ fields) :* Nil) _ ->
collapse_NP (liftA_NP (\(FieldInfo i) -> K i) fields)
_ -> error "hey, not a record!"
Putting it all to work with Object:
data Object = Object
{ classification :: Text
, country :: Text
, numberOfParts :: Int
} deriving (GHC.Generic,Generic,HasDatatypeInfo)
-- Generic and HasDatatypeInfo from generics-sop
main :: IO ()
main = print $ L.fold recordHistogram [Object "foo" "Spain" 4, Object "bar" "France" 4]
This solution has two potential problems:
Internally, recordHistogram uses n-ary products from generics-sop. Constructing and traversing these products might incur in some overhead.
There might be some space leak in the streaming sink (the Fold) returned by recordHistogram. Some extra strictness might be necessary.
This is a simple question, but I cannot find the way to use the PSQ library.
The code below is messy, but seems to find PSQ and fromList, but fails to find Binding (Error: Not in scope: data constructor 'Data.PSQueue.Binding'). LearnYouAHaskell does not cover how to use non-standard libraries and I can't find any simple examples that just show PSQ being implemented.
import qualified Data.PSQueue (Binding, PSQ, fromList)
{-
data Binding k p
k :-> p binds the key k with the priority p.
Constructors
k :-> p
data PSQ k p
A mapping from keys k to priorites p.
-}
type VertHeap = Data.PSQueue.PSQ Int Int
main = do
--fromList :: (Ord k, Ord p) => [Binding k p] -> PSQ k p
return $ Data.PSQueue.fromList $ map (\k -> Data.PSQueue.Binding k 1000000) [2..10]
It can be easy to miss, but the data constructor for the Binding type is :->.
So this import should work:
import qualified Data.PSQueue (PSQ,Binding(..),fromList)
and later:
return $ Data.PSQueue.fromList $ map (\k -> k Data.PSQueue.:-> 1000000) [2..10]
Using Binding(..) will import all of the data constructors for the Binding data type.
Edit: :-> is just an infix operator defined by Data.PSQueue. Data.PSQueue.:-> is the fully qualified name for it.
Once I understood how to refer to Binding, I could use a more familiar pattern
import qualified Data.PSQueue as PSQ
type VertHeap = PSQ.PSQ Int Int
main = do
return $ PSQ.fromList $ map (\k -> k PSQ.:-> 1000000) [2..10]