Using file name in json parsing (Haskell Aeson) - haskell

I have no experience in Haskell. I'm trying to parse many .json files to a data structure in Haskell using aeson. However, by reasons beyond my control, I need to store the name of the file from where the data was parsed as one of the fields in my data. A simple example of what I have so far is:
data Observation = Observation { id :: Integer
, value :: Integer
, filename :: String}
instance FromJSON Observation where
parseJson (Object v) =
Observation <$> (read <$> v .: "id")
<*> v .: "value"
<*> ????
My question is: what is a smart way to be able to serialize my data when parsing a json file having access to the name of the file?
What comes in my mind is to define another data like NotNamedObservation, initialize it and then having a function that converts NotNamedObservation -> String -> Observation (where String is the filename) but that sounds like a very poor approach.
Thanks.

Just make your instance a function from file path to Observation:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import qualified Data.ByteString.Lazy as LBS
import System.Environment
data Observation = Observation { ident :: Integer
, value :: Integer
, filename :: FilePath
} deriving (Show)
instance FromJSON (FilePath -> Observation) where
parseJSON (Object v) =
do i <- read <$> v .: "id"
l <- v .: "value"
pure $ Observation i l
main :: IO ()
main = do
files <- getArgs
fileContents <- traverse LBS.readFile files
print fileContents
let fs = map (maybe (error "Invalid json") id . decode) fileContents
jsons :: [Observation]
jsons = zipWith ($) fs files
print jsons

When you don't control the data definition and you have strict requirements about the format to parse, it's better to write the (de)serializer explicitly.
If external information is required to fully construct values, avoid the FromJSON/ToJSON type classes, and just write standalone parsers.
aeson's deriving mechanism is more suited to applications that talk to themselves (and thus only care about round-tripping between parseJSON and toJSON), or where there is flexibility either in defining the JSON format or the Haskell types.
If you still have to use these classes for some reason, one option is of course to just put undefined in those missing fields. To rely on the type system more, you can also parameterize types by a "phase" (that assumes again you can tweak the data type), which is a type constructor that wraps some fields.
Functor functors
One way to keep that style compatible with regular records is to use this HKD/defunctionalization design pattern.
data Observation' p = Observation
{ id :: Integer
, value :: Integer
, filename :: p String }
-- This is isomorphic to the original Observation data type
type Observation = Observation Identity
-- When we don't have the filename available, we keep the field empty with Proxy
instance FromJSON (Observation' Proxy) where
...
mkObservation :: FileName -> Observation' Proxy -> Observation

Related

What is best practice for handling data integrity when using Haskell's Persistent library?

Lets say I have a basic model in my application for handling users:
User
username Text
password Text
email Text
Based on these types I can store whatever I like (as long as it is Text obviously) to any of these fields. Say I want to begin enforcing some rules around what can and cannot be considered an email. I create this email module:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Model.EmailAddress
( EmailAddress
, parseEmailAddress
) where
import Data.Aeson
import Data.Text (Text)
import qualified Data.Text.Encoding as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TL
import Database.Persist
import Database.Persist.Sql
import Text.Shakespeare.Text (ToText (..))
newtype EmailAddress = EmailAddress (Either InvalidEmailAddress ValidEmailAddress)
deriving (Show, Eq)
instance ToJSON EmailAddress where
toJSON = String . toStrictText
instance FromJSON EmailAddress where
parseJSON = withText "EmailAddress" (pure . parseEmailAddress)
instance ToText EmailAddress where
toText (EmailAddress e) = either toText toText e
newtype ValidEmailAddress = ValidEmailAddress Text
deriving (Show, Eq)
instance ToText ValidEmailAddress where
toText (ValidEmailAddress t) = toText t
data InvalidEmailAddress = InvalidEmailAddress
{ invalidEmailAddressValue :: Text
, invalidEmailAddressReason :: Text
} deriving (Show, Eq)
instance ToText InvalidEmailAddress where
toText = toText . invalidEmailAddressValue
parseEmailAddress :: Text -> EmailAddress
parseEmailAddress t =
-- super basic validation example
EmailAddress $ case T.count "#" t of
0 -> Left (InvalidEmailAddress t "Must contain # symbol")
1 -> Right (ValidEmailAddress t)
_ -> Left (InvalidEmailAddress t "Contains more than one # symbol")
instance PersistFieldSql EmailAddress where
sqlType _ = SqlString
instance PersistField EmailAddress where
fromPersistValue =
decodePersistTextValue parseEmailAddress
toPersistValue =
PersistText . toStrictText
decodePersistTextValue :: PersistField a => (Text -> a) -> PersistValue -> Either Text a
decodePersistTextValue f = \case
PersistByteString bs ->
fromPersistValue $ PersistText (T.decodeUtf8 bs)
v ->
f <$> fromPersistValue v
toStrictText :: ToText t => t -> Text
toStrictText = builderToStrictText . toText
builderToStrictText :: TL.Builder -> Text
builderToStrictText = TL.toStrict . TL.toLazyText
I could now define my model like so:
User
username Text
password Text
email EmailAddress
However, this does not stop me from writing a bad email address to the database. If the rules around what constitutes a valid email address changes, I can read them from the DB still, and previously valid emails will come out of the DB as invalid, and this is fine, I like this. However I cannot enforce (at the persistent level anyway) preventing writing an invalid email to the database, because toPersistValue has the type a -> PersistValue, I suppose I would have to enforce this a layer up (when data is coming into the app, form decoders, json decoders etc) to ensure when a Left InvalidEmailAddress appears the user is given an error.
This leads me to thinking, is this the best practice around enforcing data integrity when using persistent? is there a better approach? Should persistent not be involved at all? ie the field remains as a Text and it is down to the app to convert these fields when they're being accessed?
I would really like a way of writing my entities to the database in a type safe way that allows me to determine what to do in the situation where an invalid email is used to try and save my User type, I could throw an error in the toPersistValue implementation but this feels a bit dirty to me, and not very Haskell'y.

Making ToJSON use Show Instance

If I have a data type that looks like this:
data SumType = ABC | DEF deriving (Generic, ToJSON)
data MyType = MyType {field1 :: String, field2 :: SumType} deriving (Generic, ToJSON)
The above will generate a JSON that looks like: {"field1": "blah", "field2":"ABC"}
In practice, MyType is a fairly complex type and I would like to keep the ToJSON deriving but want to adjust just one field to use the show instance.
instance Show SumType where
show ABC = "abc-blah"
show DEF = "def-xyz"
Unfortunately, the above Show instance does not get picked up by ToJSON (I don't know if it is supposed to). Hand rolling ToJSON for SumType does not seem to work because it expects a key-value pair (Maybe there is another of way doing it?). In other words, the JSON will be like: {"field1": "blah", "field2":{"field3": "ABC"}} -- I just want to change the way the value is stringified and not create a new object there.
Any suggestions on how I can change the output string of SumType without manually creating ToJSON for MyType? So the output is {"field1": "blah", "field2":"abc-blah"}
Thanks!
I do not see what the problem is with defining a ToJSON instance for the SumType. You can do this with:
import Data.Aeson(ToJSON(toJSON), Value(String))
import Data.Text(pack)
instance ToJSON SumType where
toJSON = String . pack . show
Or if you want to use other strings for the ToJSON than Show:
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson(ToJSON(toJSON), Value(String))
instance ToJSON SumType where
toJSON ABC = String "ABC for JSON"
toJSON DEF = String "DEF for JSON"
Now Haskell will JSON-encode the SumType as a JSON string:
Prelude Data.Aeson> encode ABC
"\"ABC for JSON\""
Prelude Data.Aeson> encode DEF
"\"DEF for JSON\""
You can do the same with FromJSON to parse the JSON string back into a SumType object:
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson(FromJSON(parseJSON), withText)
instance FromJSON SumType where
parseJSON = withText "SumType" f
where f "ABC for JSON" = return ABC
f "DEF for JSON" = return DEF
f _ = fail "Can not understand what you say!"
If we then parse back the JSON string, we get:
Prelude Data.Aeson> decode "\"ABC for JSON\"" :: Maybe SumType
Just ABC
Prelude Data.Aeson> decode "\"DEF for JSON\"" :: Maybe SumType
Just DEF
Prelude Data.Aeson> decode "\"other JSON string\"" :: Maybe SumType
Nothing
Prelude Data.Aeson> decode "{}" :: Maybe SumType
Nothing
So in case we do not decode a JSON string that follows one of the patterns we have defined, the parsing will fail. The same happens if we do not provide a JSON string, but for instance an empty JSON object.
Additional notes:
Since SumType here has two values, you can also use a JSON boolean to encode the values.
you can also encode on different JSON objects. You can for instance use the JSON string for ABC, and an integer for DEF.
Although I would advice not to do this until there are good reasons
(if for instance ABC carries only a string, and DEF ony an
integer).
usually the more complex you make encoding, the more complex decoding will be.

Haskell Type Polymorphism -- Mapping to String

I am new to Haskell, so maybe I am missing some fundamental concepts here (or maybe failed to find the appropriate extension). I was wondering if there was a way to optimize or further abstract the following scenario. This code seems very redundant.
Let's say I have the following data classes:
data Person = Person
{ personName :: !String
, personAge :: !Int
} deriving Show
data Dog = Dog
{ dogName :: !String
, dogAge :: !Int
} deriving Show
Let's say I have a service and I'm only concerned with outputing records as strings. In reality, the strings will probably be JSON and the records fetched from the DB, but let's take a simpler case. I basically need a URL token to fetch an appropriate object (say, the string "dog" will get me a Dog, or even just the Haskell "show" string, without expressly declaring it as (value)::Dog).
I have attempted to implement this in several ways...the only thing that seems to work is the following:
data Creature = DogC Dog
| PersonC Person
deriving Show
fromString :: String -> Maybe Creature
fromString "dog" = Just $ DogC $ Dog "muffin" 8
fromString "person" = Just $ PersonC $ Person "John" 22
fromString _ = Nothing
main :: IO ()
main = do
putStrLn $ show $ fromString "dog"
I'm not entirely fond of the new type, nor the list of fromString declarations. And to benefit from the original data declarations, I would probably need to write a similarly tedious expression (eg, "fromCreature") to revert Creature back into my original types. This information might change, so I would probably need TH for a few of the declarations...
Is there a way around some of this? I fiddled with GADTs and classes, but both seemed to be dependent on type- rather than value- based polymorphism (A string identifier tends to cause issues with ambiguous instances). It would be nice to map the constructor to a string (Say, with Data.Map), but constructors often have different kinds.
Update
So, I went with an approach that isn't exactly relevant to the question I had asked, but it may be useful to someone. I did want to maintain some record types, but most didn't add much value and were getting in my way. The steps I had followed went something like:
Use a different/lower-level DB driver, that returns workable types (eg, [ColumnDef] and [[SQLValue]] instead of tuples and records...).
Create ToJSON instances for SQLValue -- most of the types were covered, except a few ByteString types, and I had to handle the conversion of SQLNull to Null. To maintain compatibility with some record types, my default handler looked like: toJSON = genericToJSON defaultOptions { sumEncoding = UnTaggedValue} The untagged value should allow one to read the JSON into defined data types (eg, Dog / Person ) if desired....
Given that column name is accessible from ColumnDef, I wrote an expression that zips [ColumnDef] and [SqlValue] to a list of Aeson-compatible key-value pairs, eg: toJsPairs :: [ColumnDef] -> [SqlValue] -> [(Text,Value)]
Then, I wrote an expression to fetch the JSON from a table name, which more or less serves as my "universal dispatcher." It references a list of authorized tables, so it's less crazy than it might sound.
The code looked a bit like this (using mysql-haskell).
{-# LANGUAGE OverloadedStrings #-}
import qualified Control.Applicative as App
import Database.MySQL.Base
import qualified System.IO.Streams as Streams
import Data.Aeson (FromJSON, ToJSON)
import Data.Aeson.Encode.Pretty (encodePretty)
import Data.Aeson.Types
import Data.Text.Encoding
import Data.String (fromString)
import Data.ByteString.Internal
import qualified Data.ByteString.Lazy.Internal as BLI
import Data.HashMap.Strict (fromList)
appConnectInfo = defaultConnectInfo {
ciUser = "some_user"
, ciPassword = "some_password"
, ciDatabase = "some_db"
}
instance FromJSON ByteString where
parseJSON (String s) = pure $ encodeUtf8 s
parseJSON _ = App.empty
instance ToJSON ByteString where
toJSON = String . decodeUtf8
instance ToJSON MySQLValue where
toJSON (MySQLNull) = Null
toJSON x = genericToJSON defaultOptions
{ sumEncoding = UntaggedValue } x
-- This expression should fail on dimensional mismatch.
-- It's stupidly lenient, but really dimensional mismatch should
-- never occur...
toJsPairs :: [ColumnDef] -> [MySQLValue] -> [(Text,Value)]
toJsPairs [] _ = []
toJsPairs _ [] = []
toJsPairs (x:xs) (y:ys) = (txt x, toJSON y):toJsPairs xs ys
where
-- Implement any modifications to the key names here
txt = decodeUtf8.columnName
listRecords :: String -> IO BLI.ByteString
listRecords tbl = do
conn <- connect appConnectInfo
-- This is clearly an injection vulnerability.
-- Implemented, however, the values for 'tbl' are intensely
-- vetted. This is just an example.
(defs, is) <- query_ conn $ fromString ( "SELECT * FROM `" ++ tbl ++ "` LIMIT 100")
rcrds <- Streams.toList is
return $ encodePretty $ map (jsnobj defs) rcrds
where
jsnobj :: [ColumnDef] -> [MySQLValue] -> Value
jsnobj defs x = Object $ fromList $ toJsPairs defs x
If what you want to consume at the end is json value - it might make sense to
represent result as json value using aeson library:
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson
import GHC.Generics
data Dog = Dog Int String deriving (Show, Generic)
data Cat = Cat Int String deriving (Show, Generic)
-- here I'm using instance derived with generics, but you can write one by
-- hands
instance ToJSON Dog
instance ToJSON Cat
-- actions to get stuff from db
getDog :: Monad m => Int -> m Dog
getDog i = return (Dog i (show i))
getCat :: Monad m => Int -> m Cat
getCat i = return (Cat i (show i))
-- dispatcher - picks which action to use
getAnimal :: Monad m => String -> Int -> m (Maybe Value)
getAnimal "dog" i = Just . toJSON <$> getDog i
getAnimal "cat" i = Just . toJSON <$> getCat i
getAnimal _ _ = return Nothing
main :: IO ()
main = do
getAnimal "dog" 2 >>= print
getAnimal "cat" 3 >>= print
getAnimal "chupakabra" 12 >>= print
High energy magic version
class Monad m => MonadAnimal m where
-- basically you want something that fetches extra argumets from HTTP or
-- whatevere, perform DB query and so on.
class Animal a where
animalName :: Proxy a -> String
animalGetter :: MonadAnimal m => m a
locateAnimals :: MonadAnimal m => Q [(String, m Value)]
locateAnimals -- implement using TH (reify function is your friend). It should look for
-- all the animal instances in scope and make a list from them with serialized
-- fetcher.
-- with that in place dispatcher should be easy to implement

How to conditionally parse JSON based on settings in a Reader environment?

Is there any way to pass down a reader environment to the JSON (de)serialisation functions of Aeson? Here's a real-life example of why this could be required?
-- JSON instances for decimal -- ORPHAN instances
defaultPrecision :: Word8
defaultPrecision = fromInteger 2
instance ToJSON Data.Decimal.Decimal where
toJSON d = toJSON $ show d
instance FromJSON Data.Decimal.Decimal where
-- TODO: New problem! How do we get the precision dynamically, based on
-- the currency settings of the logged-in user
parseJSON (Number n) = return $ Data.Decimal.realFracToDecimal defaultPrecision n
parseJSON x = fail $ "Expectig a number in the JSON to parse to a Decimal. Received " ++ (show x)
If instance depends on some runtime value then what you really want is an ability to create instances at runtime. You can implement FromJSON for Reader as it's done in your gist. But as you correctly noticed, you can't do the same for ToJSON because you don't know this precision. The easiest solution would be just store precision as separate field in data type. Like this:
data DecimalWithPrecision = MkDWP
{ value :: Decimal
, precision :: Word8
}
If you store this data type in your data base and query it after user login, then this is the easiest solution and doesn't require type level tricks from you.
If you don't know precision in advance, for example user inputs precision through console (I don't know why, but lets assume this), then this wouldn't work for you. As we all know that «type classes are just syntax sugar for data types», you can replace ToJSON/FromJSON constraints with JsonDict for Money_ in the following manner:
newtype Money_ = Money_ (Reader Word8 Decimal)
data JsonDict a = JsonDict
{ jdToJSON :: a -> Value
, jdParseJSON :: Value -> Parser a
}
mkJsonDict :: Word8 -- precision
-> JsonDict Money_
You can create such dictionary (or something similar to it) on-the-fly using Word8 from context and just pass it to function which needs it. See this blog post by Gabriel Gonzalez for details.
If you really want to have toJSON implementation inside instance, you can use reflection library. Precision is a natural number which gives you ability to use this library. Using it you basically can create instances in runtime as in previous approach, but you still have your type classes. See this blog post where similar technique was applied to make Arbitrary instances depend on runtime values. In your case this will look like this:
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE UndecidableInstances #-}
import Control.Monad.Reader (Reader, ask)
import Data.Aeson (FromJSON (..), Result (..), ToJSON (..),
Value, fromJSON, withNumber)
import Data.Aeson.Types (Parser)
import Data.Decimal (Decimal, realFracToDecimal)
import Data.Proxy (Proxy (..))
import Data.Reflection (Reifies (reflect), reify)
import Data.Word8 (Word8)
newtype PreciseDecimal s = PD Decimal
instance Reifies s Int => FromJSON (PreciseDecimal s) where
parseJSON = withNumber "a number" $ \n -> do
let precision = fromIntegral $ reflect (Proxy :: Proxy s)
pure $ PD $ realFracToDecimal precision n
instance Reifies s Int => ToJSON (PreciseDecimal s) where
toJSON (PD decimal) =
let precision = reflect (Proxy :: Proxy s)
ratDec = realToFrac decimal :: Double
in toJSON ratDec -- use precision if needed
makeMoney :: Decimal -> Reader Word8 (Value, Decimal)
makeMoney value = do
precision <- fromIntegral <$> ask
let jsoned = reify precision $ \(Proxy :: Proxy s) ->
toJSON (PD value :: PreciseDecimal s)
let parsed = reify precision $ \(Proxy :: Proxy s) ->
let Success (PD res :: PreciseDecimal s)
= fromJSON jsoned in res
pure (jsoned, parsed)
And then you can run it like this to test:
ghci> runReader (makeMoney 3.12345) 2
(Number 3.12345,3.12)

Aeson Example not working

I'm following the Aeson library documentation but their example doesn't seem to work for me:
Code:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text
import Data.Aeson
import Control.Applicative ((<$>),(<*>))
import Control.Monad
instance FromJSON Person where
parseJSON (Object v) = Person <$>
v .: "name" <*>
v .: "age"
-- A non-Object value is of the wrong type, so fail.
parseJSON _ = mzero
data Person = Person
{ name :: Text
, age :: Int
} deriving Show
Error report:
ghci> decode "{\"name\":\"Joe\",\"age\":12}" :: Maybe Person
Couldn't match expected type `Data.ByteString.Lazy.Internal.ByteString'
with actual type `[Char]'
In the first argument of `decode', namely
`"{\"name\":\"Joe\",\"age\":12}"'
In the expression:
decode "{\"name\":\"Joe\",\"age\":12}" :: Maybe Person
In an equation for `a':
a = decode "{\"name\":\"Joe\",\"age\":12}" :: Maybe Person
Am i doing something wrong here ?
The problem is that decode expects a ByteString and you are passing a String.
Try this in ghci:
:m +Data.ByteString.Lazy.Char8
decode $ pack "{\"name\":\"Joe\",\"age\":12}" :: Maybe Person
In real code you shouldn't use the Char8 module as it just truncates Chars to 8 bits without taking any account of encoding. Generally you should aim to start out with a ByteString, e.g. by reading it from disk using the functions in Data.ByteString.Lazy.

Resources