How to parse a JSON string using Aeson that can be one of two different types


I'm currently struggling to parse some JSON data using the aeson library. There are a number of properties that have the value false when the data for that property is absent. So if the property's value is typically an array of integers and there happens to be no data for that property, instead of providing an empty array or null, the value is false. (The way that this data is structured isn't my doing so I'll have to work with it somehow.)
Ideally, I would like to end up with an empty list in cases where the value is a boolean. I've created a small test case below for demonstration. Because my Group data constructor expects a list, it fails to parse when it encounters false.
{-# LANGUAGE OverloadedStrings, QuasiQuotes #-}
import Data.Aeson
import Data.ByteString.Lazy (ByteString)
import Text.RawString.QQ (r) -- assuming the r quasi-quoter from raw-strings-qq
data Group = Group [Int] deriving (Eq, Show)
jsonData1 :: ByteString
jsonData1 = [r|
{
  "group" : [1, 2, 4]
}
|]
jsonData2 :: ByteString
jsonData2 = [r|
{
  "group" : false
}
|]
instance FromJSON Group where
  parseJSON = withObject "group" $ \g -> do
    items <- g .:? "group" .!= []
    return $ Group items
test1 :: Either String Group
test1 = eitherDecode jsonData1
-- returns "Right (Group [1,2,4])"
test2 :: Either String Group
test2 = eitherDecode jsonData2
-- returns "Left \"Error in $.group: expected [a], encountered Boolean\""
I was initially hoping that the (.!=) operator would allow it to default to an empty list but that only works if the property is absent altogether or null. If it were "group": null, it would parse successfully and I would get Right (Group []).
Any advice for how to get it to successfully parse and return an empty list in these cases where it's false?

One way to solve this problem is to pattern match on the JSON data constructors that are valid for your dataset and reject all others as a type mismatch.
For instance, you could write something like this for that particular field, keeping in mind that parseJSON is a function from Value -> Parser a:
-- needs: import Data.Aeson.Types (typeMismatch) and import Data.Vector (Vector)
instance FromJSON Group where
  parseJSON (Bool False) = pure (Group [])
  parseJSON (Array arr) = pure (Group $ parseListOfInt arr)
  parseJSON invalid = typeMismatch "Group" invalid
parseListOfInt :: Vector Value -> [Int]
parseListOfInt = undefined -- build this function
You can see an example of this in the Aeson docs, which are pretty good (but you kind of have to read them closely and a few times through).
I would probably then define a separate record to represent the top-level object that this key comes in and rely on generic deriving, but others may have a better suggestion there:
-- requires {-# LANGUAGE DeriveGeneric #-} and import GHC.Generics (Generic)
data GroupObj = GroupObj { group :: Group } deriving (Eq, Show, Generic)
instance FromJSON GroupObj
One thing to always keep in mind when working with Aeson is the set of core Value constructors (there are only six) and the underlying data structures (HashMap for Object and Vector for Array, for instance).
For example, when you pattern match on Array arr above, arr is a Vector Value, and there is still some work to do to turn it into a list of integers. That is why I left parseListOfInt undefined: building it is probably a good exercise.
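For reference, one possible way to finish the idea is sketched below. Instead of converting the Vector by hand, it hands the Array value back to aeson's existing FromJSON [Int] instance. The helper name parseItems is hypothetical (it is not part of aeson), and the instance is written against the enclosing object rather than the bare field so that it works with the test data from the question. It also keeps the original behaviour of treating a missing or null key as an empty list, which is an assumption about what is wanted.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import Data.Aeson.Types (Parser, typeMismatch)

newtype Group = Group [Int] deriving (Eq, Show)

-- Hypothetical helper: parse the value stored under the "group" key.
parseItems :: Value -> Parser [Int]
parseItems (Bool False) = pure []       -- false stands for "no data"
parseItems v@(Array _)  = parseJSON v   -- reuse the FromJSON [Int] instance
parseItems invalid      = typeMismatch "Array or false" invalid

instance FromJSON Group where
  parseJSON = withObject "Group" $ \o -> do
    mv <- o .:? "group"                 -- Nothing if the key is absent or null
    Group <$> maybe (pure []) parseItems mv
```

Loaded into GHCi with OverloadedStrings enabled, eitherDecode "{\"group\": false}" :: Either String Group should give Right (Group []), while "group": true is still rejected.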

Related

How to properly constrain `arbitrary` UUID-Generation?

I'm trying to create Arbitrary instances for some of my types to be used in QuickCheck property testing. I need randomly generated UUIDs, with the constraint that all-zero (nil) UUIDs are disallowed - that is, 00000000-0000-0000-0000-000000000000. Therefore, I set up the following generator:
nonzeroIdGen :: Gen UUID.UUID
nonzeroIdGen = arbitrary `suchThat` (not . UUID.null)
Which I use in an Arbitrary instance as follows:
instance Arbitrary E.EventId where
  arbitrary = do
    maybeEid <- E.mkEventId <$> nonzeroIdGen
    return $ fromJust maybeEid
In general, this is unsafe code; but for testing, with supposedly guaranteed nonzero UUIDs, I thought the fromJust to be ok.
mkEventId is defined as
mkEventId :: UUID.UUID -> Maybe EventId
mkEventId uid = EventId <$> validateId uid
with EventId a newtype wrapper around UUID.UUID, and
validateId :: UUID.UUID -> Maybe UUID.UUID
validateId uuid = if UUID.null uuid then Nothing else Just uuid
To my surprise, I get failing tests because of all-zero UUIDs generated by the above code. A trace in mkEventId shows the following:
00000001-0000-0001-0000-000000000001
Just (EventId {getEventId = 00000001-0000-0001-0000-000000000001})
00000000-0000-0000-0000-000000000000
Nothing
Create valid Events. FAILED [1]
The first generated ID is fine, the second one is all-zero, despite my nonzeroIdGen generator from above. What am I missing?

I generally find that in cases like this, using newtypes to define instances of Arbitrary composes better. Here's one I made for valid UUID values:
newtype NonNilUUID = NonNilUUID { getNonNilUUID :: UUID } deriving (Eq, Show)
instance Arbitrary NonNilUUID where
  arbitrary = NonNilUUID <$> arbitrary `suchThat` (/= nil)
You can then compose other Arbitrary instances from this one, like I do here with a Reservation data type:
newtype ValidReservation =
  ValidReservation { getValidReservation :: Reservation } deriving (Eq, Show)
instance Arbitrary ValidReservation where
  arbitrary = do
    (NonNilUUID rid) <- arbitrary
    (FutureTime d) <- arbitrary
    n <- arbitrary
    e <- arbitrary
    (QuantityWithinCapacity q) <- arbitrary
    return $ ValidReservation $ Reservation rid d n e q
Notice the pattern match (NonNilUUID rid) <- arbitrary to deconstruct rid as a UUID value.
You may notice that I've also created a ValidReservation newtype for my Reservation data type. I consistently do this to avoid orphan instances, and to avoid polluting my domain model with a QuickCheck dependency. (I have nothing against QuickCheck, but test-specific capabilities don't belong in the 'production' code.)
All the code shown here is available in context on GitHub.

Accessing record fields as 'Maybe' values in Haskell

In Haskell, if I specify field names for a type with a single constructor, the compiler should generate appropriate functions MyType -> fieldType. This breaks down if MyType has multiple constructors with different arities or types however. I want to know if there is some way I can tell the compiler to give these functions the signature MyType -> Maybe fieldType. i.e. instead of:
data MyType = Empty | Record { st :: String, ui :: Word }
-- where
-- st Empty == undefined
-- ui Empty == undefined
-- I have
data MyType = Empty | Record { st :: String, ui :: Word }
-- where
-- st :: MyType -> Maybe String
-- st Empty = Nothing
-- st (Record s _) = Just s
--
-- ui Empty = Nothing
-- ui (Record _ n) = Just n
I want to avoid the default behaviour of having expressions like st Empty returning undefined, because if st Empty returns Nothing, I can use pattern matching to decide on what to do next, rather than having to catch the exception further up the callstack in impure code. I realise this is not part of Haskell by default, so I wonder if there is a compiler extension that allows this? Alternately, could I implement something like this myself using templating?

No, there's no way to do this with record selectors. To understand why, remember that they can be used for record updates, not just as functions. If x = Empty, then there's still nothing reasonable that x { st = "foo" } could be. If you don't care about the functions actually being record selectors, then you could use Template Haskell to generate just the functions you want, though.
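As a rough illustration of what such generated functions would amount to, here is a hand-written sketch. The names stMay and uiMay are hypothetical; Template Haskell would only automate producing definitions of this shape, and the ordinary selectors st and ui remain partial.

```haskell
data MyType = Empty | Record { st :: String, ui :: Word }
  deriving (Eq, Show)

-- Total counterparts of the partial selectors st and ui.
stMay :: MyType -> Maybe String
stMay (Record s _) = Just s
stMay Empty        = Nothing

uiMay :: MyType -> Maybe Word
uiMay (Record _ n) = Just n
uiMay Empty        = Nothing
```

Callers can then pattern match on the Maybe result instead of catching an exception from st Empty.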

Filtering for values in Haskell

I've been doing some Haskell exercises from a Haskell book, and one of the tasks is for me to filter for values of a certain type and return them as a list.
import Data.Time
data Item = DbString String
          | DbNumber Integer
          | DbDate UTCTime
          deriving (Eq, Ord, Show)
database :: [Item]
database =
  [
    DbDate (UTCTime (fromGregorian 1911 5 1) (secondsToDiffTime 34123)),
    DbNumber 9001,
    DbString "Hello World!",
    DbDate (UTCTime (fromGregorian 1921 5 1) (secondsToDiffTime 34123))
  ]
That's the code I am given to work with, and for my first task:
Write a function that filters for DbDate values and returns a list of the UTCTime values inside them. The template for the function is:
filterDate :: [Item] -> [UTCTime]
filterDate = undefined
I have to use folds here, since folds are the subject of the exercise.
I looked up the Data.Time module on Hoogle and that didn't really help since I couldn't understand how to interact with the module. Maybe I'm looking at this from the wrong perspective, because I don't think it has anything to do with the filter function, and I don't think it has anything to do with type-casting via :: either.
How do I get UTCTime values, and how do I filter for them?

OK, my Haskell-fu is extremely weak but I'm going to have a stab at an answer. You're looking to define a function that walks across a list and filters it. If the value is a DbDate then you return <that value> : <output list>, otherwise you return <output list>. By folding over the input you produce a filtered output. There's a relevant question at How would you define map and filter using foldr in Haskell? which might explain this better.
This breaks down to something like:
filterFn :: Item -> [UTCTime] -> [UTCTime]
filterFn (DbDate x) xs = x:xs
filterFn _ xs = xs
(this might be a syntax fail). This function takes an item off our [Item] and pattern matches.
If it matches DbDate x then x is a UTCTime and we prepend it to the list accumulated so far.
If it doesn't, we ignore it and return the accumulated list unchanged.
We can then fold:
filterDate = foldr filterFn []
Does that get you to an answer?
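Put together with the data from the question, the pieces above could look like the following self-contained sketch. The local name keepDate is just an illustrative choice for the folding function.

```haskell
import Data.Time

data Item = DbString String
          | DbNumber Integer
          | DbDate UTCTime
          deriving (Eq, Ord, Show)

database :: [Item]
database =
  [ DbDate (UTCTime (fromGregorian 1911 5 1) (secondsToDiffTime 34123))
  , DbNumber 9001
  , DbString "Hello World!"
  , DbDate (UTCTime (fromGregorian 1921 5 1) (secondsToDiffTime 34123))
  ]

-- Walk the list from the right, keeping only the payload of DbDate items.
filterDate :: [Item] -> [UTCTime]
filterDate = foldr keepDate []
  where
    keepDate (DbDate t) acc = t : acc
    keepDate _          acc = acc
```

Because foldr conses onto the result of the rest of the list, the two dates come back in their original order.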

Item is defined as a sum type, which means a value can be a DbString, a DbNumber or a DbDate.
data Item = DbString String
          | DbNumber Integer
          | DbDate UTCTime
          deriving (Eq, Ord, Show)
You can use pattern matching to get only the value you're interested in. You need to match on an item, check whether it is a DbDate and, if that's the case, extract the UTCTime value it holds.
You said you want to use a fold so you need an accumulator where you can put the values you want to keep and a function to populate it.
filterDate items = foldl accumulate [] items
  where extractTime item = case item of
          DbDate time -> [time]
          _ -> []
        -- foldl passes the accumulator first, then the current item
        accumulate accumulator item = accumulator ++ extractTime item
In the code above you have extractTime that pattern matches over an item and either returns a list containing the time or it returns an empty list. The accumulate function just puts together the values you got from the previous steps (they're stored in accumulator) and the value you got applying extractTime to the current item.

Parse top-level value with Aeson

I'm trying to parse JSON values with Aeson and I have no problem (so far) parsing objects or arrays, but I can't get Aeson to parse JSON documents that are just strings.
As I understand, since RFC 7159 values are legal JSON documents, and Aeson supports that since 0.9.0.0 (I'm using 0.9.0.1), so it should work. For example, I'm wrapping an API that returns strings as top-level JSON documents for many of its calls, and would like to newtype those strings for some static typing safety:
newtype Bar = Bar String deriving (Eq, Show)
instance FromJSON Bar where
  parseJSON (String v) = pure (Bar $ T.unpack v)
  parseJSON _ = mzero
If I try to decode something:
decode "JustSomeRandomString" :: Maybe Bar
all I get is Nothing in return.
Any ideas what I'm doing wrong? Of course, I could handle API calls that return strings as JSON documents without Aeson, but would like to keep things uniform!

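The input has to be a valid JSON document, and a JSON string includes its surrounding double quotes, so the quotes must be part of the data you pass in. Try decode "\"JustSomeRandomString\"" :: Maybe Bar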
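A small sketch of the difference, reusing the Bar type from the question. The names unquoted and quoted are only for illustration, and OverloadedStrings is assumed so that the literals can be used as lazy ByteStrings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (mzero)
import Data.Aeson
import qualified Data.Text as T

newtype Bar = Bar String deriving (Eq, Show)

instance FromJSON Bar where
  parseJSON (String v) = pure (Bar (T.unpack v))
  parseJSON _          = mzero

-- A bare word is not valid JSON, so decoding fails before parseJSON runs.
unquoted :: Maybe Bar
unquoted = decode "JustSomeRandomString"

-- The double quotes are part of the JSON document itself.
quoted :: Maybe Bar
quoted = decode "\"JustSomeRandomString\""
```

Here unquoted is Nothing, while quoted is Just (Bar "JustSomeRandomString").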

Set-like Data Structure without `Ord`?

Given the following types:
import Data.Set as Set
-- http://json.org/
type Key = String
data Json = JObject Key (Set JValue)
          | JArray JArr
          deriving Show
data JObj = JObj Key JValue
          deriving Show
data JArr = Arr [JValue] deriving Show
data Null = Null deriving Show
data JValue = Num Double
            | S String
            | B Bool
            | J JObj
            | Array JArr
            | N Null
            deriving Show
I created a JObject Key (Set JValue) with a single element:
ghci> JObject "foo" (Set.singleton (B True))
JObject "foo" (fromList [B True])
But, when I tried to create a 2-element Set, I got a compile-time error:
ghci> JObject "foo" (Set.insert (Num 5.5) $ Set.singleton (B True))
<interactive>:159:16:
No instance for (Ord JValue) arising from a use of ‘insert’
In the expression: insert (Num 5.5)
In the second argument of ‘JObject’, namely
‘(insert (Num 5.5) $ singleton (B True))’
In the expression:
JObject "foo" (insert (Num 5.5) $ singleton (B True))
So I asked, "Why is it necessary for JValue to implement the Ord typeclass?"
The docs on Data.Set answer that question.
The implementation of Set is based on size balanced binary trees (or trees of bounded balance)
But, is there a Set-like, i.e. non-ordered, data structure that does not require Ord's implementation that I can use?

You will pretty much always need at least Eq to implement a set (or at least the ability to write an Eq instance, whether or not one exists). Having only Eq will give you a horrifyingly inefficient one. You can improve this with Ord or with Hashable.
One thing you might want to do here is use a trie, which will let you take advantage of the nested structure instead of constantly fighting it.
You can start by looking at generic-trie. This does not appear to offer anything for your Array pieces, so you may have to add some things.
Why Eq is not good enough
The simplest way to implement a set is using a list:
type Set a = [a]
member _ [] = False
member a (x:xs) | a == x = True
                | otherwise = member a xs
insert a xs | member a xs = xs
            | otherwise = a:xs
This is no good (unless there are very few elements), because you may have to traverse the entire list to see if something is a member.
To improve matters, we need to use some sort of tree:
data Set a = Node a (Set a) (Set a) | Tip
There are a lot of different kinds of trees we can make, but in order to use them, we must be able, at each node, to decide which of the branches to take. If we only have Eq, there is no way to choose the right one. If we have Ord (or Hashable), that gives us a way to choose.
The trie approach structures the tree based on the structure of the data. When your type is deeply nested (a list of arrays of records of lists...), either hashing or comparison can be very expensive, so the trie will probably be better.
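To make the role of Ord concrete, here is a minimal sketch of membership and insertion on the tree type above. It is deliberately unbalanced, so it only illustrates how the comparison picks a branch; it is not how Data.Set is actually implemented. The fromList helper is added purely for convenience.

```haskell
data Set a = Node a (Set a) (Set a) | Tip

-- compare tells us which branch to descend into at every node.
member :: Ord a => a -> Set a -> Bool
member _ Tip = False
member a (Node x l r) = case compare a x of
  LT -> member a l
  GT -> member a r
  EQ -> True

insert :: Ord a => a -> Set a -> Set a
insert a Tip = Node a Tip Tip
insert a t@(Node x l r) = case compare a x of
  LT -> Node x (insert a l) r
  GT -> Node x l (insert a r)
  EQ -> t                       -- already present, keep the tree as it is

-- Convenience helper: build a set by inserting every element of a list.
fromList :: Ord a => [a] -> Set a
fromList = foldr insert Tip
```

With only Eq, the case expression would have no LT or GT to go on, which is exactly why every node would have to be visited.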
Side note on Ord
Although I don't think you should use the Ord approach here, it very often is the right one. In some cases, your particular type may not have a natural ordering, but there is some efficient way to order its elements. In this case you can play a trick with newtype:
newtype WrappedThing = Wrap Thing
instance Ord WrappedThing where
  ....
newtype ThingSet = ThingSet (Set WrappedThing)
insertThing thing (ThingSet s) = ThingSet (insert (Wrap thing) s)
memberThing thing (ThingSet s) = member (Wrap thing) s
...
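A concrete version of this trick might look like the sketch below. Thing, its fields, and emptyThings are hypothetical, and the instances assume that thingId identifies a Thing uniquely, which is what makes it a safe basis for both Eq and Ord.

```haskell
import Data.Ord (comparing)
import qualified Data.Set as Set

-- Hypothetical type with no Eq or Ord instance of its own.
data Thing = Thing { thingId :: Int, thingName :: String }

newtype WrappedThing = Wrap Thing

-- Assumption: thingId is unique per Thing, so it can drive both instances.
instance Eq WrappedThing where
  Wrap a == Wrap b = thingId a == thingId b

instance Ord WrappedThing where
  compare (Wrap a) (Wrap b) = comparing thingId a b

newtype ThingSet = ThingSet (Set.Set WrappedThing)

emptyThings :: ThingSet
emptyThings = ThingSet Set.empty

insertThing :: Thing -> ThingSet -> ThingSet
insertThing thing (ThingSet s) = ThingSet (Set.insert (Wrap thing) s)

memberThing :: Thing -> ThingSet -> Bool
memberThing thing (ThingSet s) = Set.member (Wrap thing) s
```

The ordering lives only on the wrapper, so Thing itself never gains an Ord instance that other code could come to rely on.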
Yet another approach, in some cases, is to define a "base type" that is an Ord instance, but only export a newtype wrapper around it; you can use the base type for all your internal functions, but the exported type is completely abstract (and not an Ord instance).
