Haskell Parsec and Unordered Properties

Haskell Parsec and Unordered Properties - haskell

I am trying to use Parsec to parse something like this:
property :: CharParser SomeObject
property = do
name
parameters
value
return SomeObjectInstance { fill in records here }
I am implementing the iCalendar spec and on every like there is a name:parameters:value triplet, very much like the way that XML has a name:attributes:content triplet. Infact you could very easily convert an iCalendar into XML format (thought I can't really see the advantages).
My point is that the parameters do not have to come in any order at all and each paramater may have a different type. One parameter may be a string while the other is the numeric id of another element. They may share no similarity yet, in the end, I want to place them correctly in the right record fields for whatever 'SomeObjectInstance' that I wanted the parser to return. How do I go about doing this sort of thing (or can you point me to an example of where somebody had to parse data like this)?
Thankyou, I know that my question is probably a little confused but that reflects my level of understanding of what I need to do.
Edit: I was trying to avoid giving the expected output (because it is large, not because it is hidden) but here is an example of an input file (from wikipedia):
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//hacksw/handcal//NONSGML v1.0//EN
BEGIN:VEVENT
UID:uid1#example.com
DTSTAMP:19970714T170000Z
ORGANIZER;CN=John Doe:MAILTO:john.doe#example.com
DTSTART:19970714T170000Z
DTEND:19970715T035959Z
SUMMARY:Bastille Day Party
END:VEVENT
END:VCALENDAR
As you can see it contains one VEvent inside a VCalendar, I have made data structures that represent them here.
I am trying to write a parser that parses that type of file into my data structures and I am stuck on the bit where I need to handle properties coming in any order with any type; date, time, int, string, uid, ect. I hope that makes more sense without repeating the entire iCalendar spec.

Parsec has the Parsec.Perm module precisely to parse unordered but linear (i.e. at the same level in the syntax tree) elements such as attribute tags in XML files.
Unfortunately the Perm module is mostly undocumented. The best reference is the Parsing Permutation Phrases paper which the Haddock doc page refers to, but even that is largely a description of the technique rather than how to use it.

Ok, so between BEGIN:VEVENT and END:VEVENT, you have many key value pairs. So write a rule keyValuePair that returns (key, value). Now inside the rule for VEVENT you do many KeyValuePair to get a list of pairs. Once you've done that you use a fold to populate a VEVENT record with the given values. In the function you give to fold, you use pattern matching to find out in which field to store the value. As the starting value for the accumulator you use a VEvent record where the optional fields are set to Nothing. Example:
pairs <- many keyValuePairs
vevent = foldr f (VEvent {sequence = Nothing}) pairs
where f ("SUMMARY", v) ve = ve {summary = v}
f ("DSTART", v) ve = ve {dstart = read v}
...and so on. Do the same for the other components.
Edit: Here's some runnable example code for the fold:
data VEvent = VEvent {
summary :: String,
dstart :: String,
sequenceSt :: Maybe String
} deriving Show
vevent pairs = foldr f (VEvent {sequenceSt = Nothing}) pairs
where f ("SUMMARY", v) ve = ve {summary = v}
f ("DSTART", v) ve = ve {dstart = v}
f ("SEQUENCEST", v) ve = ve {sequenceSt = Just v}
main = do print $ vevent [("SUMMARY", "lala"), ("DSTART", "lulu")]
print $ vevent [("SUMMARY", "lala"), ("DSTART", "lulu"), ("SEQUENCEST", "lili")]
Output:
VEvent {summary = "lala", dstart = "lulu", sequenceSt = Nothing}
VEvent {summary = "lala", dstart = "lulu", sequenceSt = Just "lili"}
Note that this will produce a warning when compiled. To avoid the warning, initialize all non-optional fields to undefined explicitly.

Related

Need help storing the previous element of a list (Haskell)

I'm currently working on an assignment. I have a function called gamaTipo that converts the values of a tuple into a data type previously defined by my professor.
The problem is: in order for gamaTipo to work, it needs to receive some preceding element. gamaTipo is defined like this: gamaTipo :: Peca -> (Int,Int) -> Peca where Peca is the data type defined by my professor.
What I need to do is to create a funcion that takes a list of tuples and converts it into Peca data type. The part that im strugling with is taking the preceding element of the list. i.e : let's say we have a list [(1,2),(3,4)] where the first element of the list (1,2) always corresponds to Dirt Ramp (data type defined by professor). I have to create a function convert :: [(Int,Int)] -> [Peca] where in order to calculate the element (3,4) i need to first translate (1,2) into Peca, and use it as the previous element to translate (3,4)
Here's what I've tried so far:
updateTuple :: [(Int,Int)] -> [Peca]
updateTuple [] = []
updateTuple ((x,y):xs) = let previous = Dirt Ramp
in (gamaTipo previous (x,y)): updateTuple xs
Although I get no error messages with this code, the expected output isn't correct. I'm also sorry if it's not easy to understand what I'm asking, English isn't my native tongue and it's hard to express my self. Thank you in advance! :)

If I understand correctly, your program needs to have a basic structure something like this:
updateTuple :: [(Int, Int)] -> [Peca]
updateTuple = go initialValue
where
go prev (xy:xys) =
let next = getNextValue prev xy
in prev : (go next xys)
go prev [] = prev
Basically, what’s happening here is:
updateTuple is defined in terms of a helper function go. (Note that ‘helper function’ isn’t standard terminology, it’s just what I’ve decided to call it).
go has an extra argument, which is used to store the previous value.
The implementation of go can then make use of the previous value.
When go recurses, the recursive call can then pass the newly-calculated value as the new ‘previous value’.
This is a reasonably common pattern in Haskell: if a recursive function requires an extra argument, then a new function (often named go) can be defined which has that extra argument. Then the original function can be defined in terms of go.

Finding list entry with the highest count

I have an Entry data type
data Entry = Entry {
count :: Integer,
name :: String }
Then I want to write a function, that takes the name and a list of Entrys as arguments an give me the Entrys with the highest count. What I have so far is
searchEntry :: String -> [Entry] -> Maybe Integer
searchEntry _ [] = Nothing
searchEntry name1 (x:xs) =
if name x == name1
then Just (count x)
else searchEntry name xs
That gives me the FIRST Entry that the function finds, but I want the Entry with the highest count. How can I implement that?

My suggestion would be to break the problem into two parts:
Find all entries matching a given name
Find the entry with the highest count
You could set it up as
entriesByName :: String -> [Entry] -> [Entry]
entriesByName name entries = undefined
-- Use Maybe since the list might be empty
entryWithHighestCount :: [Entry] -> Maybe Entry
entryWithHighestCount entries = undefined
entryByNameWithHighestCount :: String -> [Entry] -> Maybe Entry
entryByNameWithHighestCount name entires = entryWithHighestCount $ entriesByName name entries
All you have to do is implement the relatively simple functions that are used to implement getEntryByNameWithHighestCount.

You need to add an inner method that takes a current result as a parameter and returns that instead of Nothing when reaching the end of the method.
Also you would need to update your result found logic to compare a potentially existing function and the found value.

I would consider changing the signature of the function to String->Maybe Entry (or String->[Entry]) if you indeed want to return the "Entry" items with the highest count.
Otherwise, you can actually do what you want as a oneliner using some pretty common Haskell functions....
As Bheklilr mentioned, the name filter can be done first, and it is really easy to do this using the filter function....
filter (hasName theName) entries
Note that hasName can be written out fully as a separate function, but Haskell also offers you the following shortcut.
hasName = (== theName) . name
Now you just need the maximum value.... Haskell has a maximum function, but it only works on the Ord class. You can make Entry an instance of Ord, or you can just use the related maximumBy function, that takes an extra ordering function
maximumBy orderFunction entries2
Again, you can write orderFunction yourself (which you might want to do as an excercise), but haskell again offers a shortcut.
orderFunction = compare `on` count
You will need to import some libs to get this all to work (Data.Function, Data.List). You also will need to put in some extra code to account for the Nothing case.
It might be worth it to write out the functions longhand first, but I recommend that you use Hoogle to lookup and understand compare, on, and maximumBy.... Using tricks like this can really shorten your code.
Putting it all together, you can get the entry with the maximum count like this
maxEntry = maximumBy (compare `on` count) $ filter ((theName ==) . name) $ entries
You will need to modify this to account for the Nothing case, or if you want to return all max Entries (this just chooses one), or if you really wanted to return count, and not the entry.

Haskell: Defining a proper interface for data types with many fields

For the representation of a DSL syntax tree I have data types that represent this tree. At several places, within this tree I get quite a number of subelements that are optional and/or have a "*" multiplicity. So one data type might look something like
data SpecialDslExpression = MyExpression String [Int] Double [String] (Maybe Bool)
What I am looking for is a possibility to construct such a type without having to specify all of the parameters, assuming I have a valid default for each of them. The usage scenario is such that I need to create many instances of the type with all kinds of combinations of its parameters given or omitted (most of the time two or three), but very rarely all of them. Grouping the parameters into subtypes won't get me far as the parameter combinations don't follow a pattern that would have segmentation improve matters.
I could define functions with different parameter combinations to create the type using defaults for the rest, but I might end up with quite a number of them that would become hard to name properly, as there might be no possibility to give a proper name to the idea of createWithFirstAndThirdParameter in a given context.
So in the end the question boils down to: Is it possible to create such a data type or an abstraction over it that would give me something like optional parameters that I can specify or omit at wish?

I would suggest a combinations of lenses and a default instance. If you are not already importing Control.Lens in half of your modules, now is the time to start! What the heck are lenses, anyway? A lens is a getter and a setter mashed into one function. And they are very composable. Any time you need to access or modify parts of a data structure but you think record syntax is unwieldy, lenses are there for you.
So, the first thing you need to do – enable TH and import Control.Lens.
{-# LANGUAGE TemplateHaskell #-}
import Control.Lens
The modification you need to do to your data type is adding names for all the fields, like so:
data SpecialDslExpression = MyExpression { _exprType :: String
, _exprParams :: [Int]
, _exprCost :: Double
, _exprComment :: [String]
, _exprLog :: Maybe Bool
} deriving Show
The underscores in the beginning of the field names are important, for the following step. Because now we want to generate lenses for the fields. We can ask GHC to do that for us with Template Haskell.
$(makeLenses ''SpecialDslExpression)
Then the final thing that needs to be done is constructing an "empty" instance. Beware that nobody will check statically that you actually fill all the required fields, so you should preferably add an error to those fields so you at least get a run-time error. Something like this:
emptyExpression = MyExpression (error "Type field is required!") [] 0.0 [] Nothing
Now you are ready to roll! You cannot use an emptyExpression, and it will fail at run-time:
> emptyExpression
MyExpression {_exprType = "*** Exception: Type field is required!
But! As long as you populate the type field, you will be golden:
> emptyExpression & exprType .~ "Test expression"
MyExpression { _exprType = "Test expression"
, _exprParams = []
, _exprCost = 0.0
, _exprComment = []
, _exprLog = Nothing
}
You can also fill several fields at once, if you want to.
> emptyExpression & exprType .~ "Test expression"
| & exprLog .~ Just False
| & exprComment .~ ["Test comment"]
MyExpression { _exprType = "Test expression"
, _exprParams = []
, _exprCost = 0.0
, _exprComment = ["Test comment"]
, _exprLog = Just False
}
You can also use lenses to apply a function to a field, or look inside a field of a field, or modify any other existing expression and so on. I definitely recommend taking a look at what you can do!

Alright I'll actually expand upon my comment. Firstly, define your data type as a record (and throw in a few type synonyms).
data Example = E {
one :: Int,
two :: String,
three :: Bool,
four :: Double
}
next you create a default instance
defaultExample = Example 1 "foo" False 1.4
and then when a user wants to tweak a field in the default to make their own data they can do this:
myData = defaultExample{four=2.8}
Finally, when they want to pattern match just one item, they can use
foo MyData{four=a} = a

Haskell: Confusion with own data types. Record syntax and unique fields

I just uncovered this confusion and would like a confirmation that it is what it is. Unless, of course, I am just missing something.
Say, I have these data declarations:
data VmInfo = VmInfo {name, index, id :: String} deriving (Show)
data HostInfo = HostInfo {name, index, id :: String} deriving (Show)
vm = VmInfo "vm1" "01" "74653"
host = HostInfo "host1" "02" "98732"
What I always thought and what seems to be so natural and logical is this:
vmName = vm.name
hostName = host.name
But this, obviously, does not work. I got this.
Questions
So my questions are.
When I create a data type with record syntax, do I have to make sure that all the fields have unique names? If yes - why?
Is there a clean way or something similar to a "scope resolution operator", like :: or ., etc., so that Haskell distinguishes which data type the name (or any other none unique fields) belongs to and returns the correct result?
What is the correct way to deal with this if I have several declarations with the same field names?
As a side note.
In general, I need to return data types similar to the above example.
First I returned them as tuples (seemed to me the correct way at the time). But tuples are hard to work with as it is impossible to extract individual parts of a complex type as easy as with the lists using "!!". So next thing I thought of the dictionaries/hashes.
When I tried using dictionaries I thought what is the point of having own data types then?
Playing/learning data types I encountered the fact that led me to the above question.
So it looks like it is easier for me to use dictionaries instead of own data types as I can use the same fields for different objects.
Can you please elaborate on this and tell me how it is done in real world?

Haskell record syntax is a bit of a hack, but the record name emerges as a function, and that function has to have a unique type. So you can share record-field names among constructors of a single datatype but not among distinct datatypes.
What is the correct way to deal with this if I have several declarations with the same field names?
You can't. You have to use distinct field names. If you want an overloaded name to select from a record, you can try using a type class. But basically, field names in Haskell don't work the way they do in say, C or Pascal. Calling it "record syntax" might have been a mistake.
But tuples are hard to work with as it is impossible to extract individual parts of a complex type
Actually, this can be quite easy using pattern matching. Example
smallId :: VmInfo -> Bool
smallId (VmInfo { vmId = n }) = n < 10
As to how this is done in the "real world", Haskell programmers tend to rely heavily on knowing what type each field is at compile time. If you want the type of a field to vary, a Haskell programmer introduces a type parameter to carry varying information. Example
data VmInfo a = VmInfo { vmId :: Int, vmName :: String, vmInfo :: a }
Now you can have VmInfo String, VmInfo Dictionary, VmInfo Node, or whatever you want.
Summary: each field name must belong to a unique type, and experienced Haskell programmers work with the static type system instead of trying to work around it. And you definitely want to learn about pattern matching.

There are more reasons why this doesn't work: lowercase typenames and data constructors, OO-language-style member access with .. In Haskell, those member access functions actually are free functions, i.e. vmName = name vm rather than vmName = vm.name, that's why they can't have same names in different data types.
If you really want functions that can operate on both VmInfo and HostInfo objects, you need a type class, such as
class MachineInfo m where
name :: m -> String
index :: m -> String -- why String anyway? Shouldn't this be an Int?
id :: m -> String
and make instances
instance MachineInfo VmInfo where
name (VmInfo vmName _ _) = vmName
index (VmInfo _ vmIndex _) = vmIndex
...
instance MachineInfo HostInfo where
...
Then name machine will work if machine is a VmInfo as well as if it's a HostInfo.

Currently, the named fields are top-level functions, so in one scope there can only be one function with that name. There are plans to create a new record system that would allow having fields of the same name in different record types in the same scope, but that's still in the design phase.
For the time being, you can make do with unique field names, or define each type in its own module and use the module-qualified name.

Lenses can help take some of the pain out of dealing with getting and setting data structure elements, especially when they get nested. They give you something that looks, if you squint, kind of like object-oriented accessors.
Learn more about the Lens family of types and functions here: http://lens.github.io/tutorial.html
As an example for what they look like, this is a snippet from the Pong example found at the above github page:
data Pong = Pong
{ _ballPos :: Point
, _ballSpeed :: Vector
, _paddle1 :: Float
, _paddle2 :: Float
, _score :: (Int, Int)
, _vectors :: [Vector]
-- Since gloss doesn't cover this, we store the set of pressed keys
, _keys :: Set Key
}
-- Some nice lenses to go with it
makeLenses ''Pong
That makes lenses to access the members without the underscores via some TemplateHaskell magic.
Later on, there's an example of using them:
-- Update the paddles
updatePaddles :: Float -> State Pong ()
updatePaddles time = do
p <- get
let paddleMovement = time * paddleSpeed
keyPressed key = p^.keys.contains (SpecialKey key)
-- Update the player's paddle based on keys
when (keyPressed KeyUp) $ paddle1 += paddleMovement
when (keyPressed KeyDown) $ paddle1 -= paddleMovement
-- Calculate the optimal position
let optimal = hitPos (p^.ballPos) (p^.ballSpeed)
acc = accuracy p
target = optimal * acc + (p^.ballPos._y) * (1 - acc)
dist = target - p^.paddle2
-- Move the CPU's paddle towards this optimal position as needed
when (abs dist > paddleHeight/3) $
case compare dist 0 of
GT -> paddle2 += paddleMovement
LT -> paddle2 -= paddleMovement
_ -> return ()
-- Make sure both paddles don't leave the playing area
paddle1 %= clamp (paddleHeight/2)
paddle2 %= clamp (paddleHeight/2)
I recommend checking out the whole program in its original location and looking through the rest of the lens material; it's very interesting even if you don't end up using them.

Yes, you cannot have two records in the same module with the same field names. The field names are added to the module's scope as functions, so you would use name vm rather than vm.name. You could have two records with the same field names in different modules and import one of the modules qualified as some name, but this is probably awkward to work with.
For a case like this, you should probably just use a normal algebraic data type:
data VMInfo = VMInfo String String String
(Note that the VMInfo has to be capitalized.)
Now you can access the fields of VMInfo by pattern matching:
myFunc (VMInfo name index id) = ... -- name, index and id are bound here

Any nice record Handling tricks in Haskell?

I'm aware of partial updates for records like :
data A a b = A { a :: a, b :: b }
x = A { a=1,b=2 :: Int }
y = x { b = toRational (a x) + 4.5 }
Are there any tricks for doing only partial initialization, creating a subrecord type, or doing (de)serialization on subrecord?
In particular, I found that the first of these lines works but the second does not :
read "A {a=1,b=()}" :: A Int ()
read "A {a=1}" :: A Int ()
You could always massage such input using a regular expression, but I'm curious what Haskell-like options exist.

Partial initialisation works fine: A {a=1} is a valid expression of type A Int (); the Read instance just doesn't bother parsing anything the Show instance doesn't output. The b field is initialised to error "...", where the string contains file/line information to help with debugging.
You generally shouldn't be using Read for any real-world parsing situations; it's there for toy programs that have really simple serialisation needs and debugging.
I'm not sure what you mean by "subrecord", but if you want serialisation/deserialisation that can cope with "upgrades" to the record format to contain more information while still being able to process old (now "partial") serialisations, then the safecopy library does just that.

You cannot leave some value in Haskell "uninitialized" (it would not be possible to "initialize" it later anyway, since Haskell is pure). If you want to provide "default" values for the fields, then you can make some "default" value for your record type, and then do a partial update on that default value, setting only the fields you care about. I don't know how you would implement read for this in a simple way, however.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string