Haskell data type function parameter

What is the significance of the parentheses in a function definition in Haskell, with respect to the data types of the parameters?
For example:
doStuff :: Name -> Age -> String
doStuff (NameConstr a) (AgeConstr b) = "Nom: " ++ a ++ ", age: " ++ b
with the following defined somewhere beforehand:
data Name = NameConstr String
data Age = AgeConstr Integer
Could the function parameters a and b be captured in a way that negates the need for parentheses here?
FYI, I'm working through:
http://yannesposito.com/Scratch/en/blog/Haskell-the-Hard-Way/#type-construction
http://learnyouahaskell.com/types-and-typeclasses,
and I just can't seem to grasp this finer detail yet.

Without parentheses, the definition would be deemed to have four parameters. Off the top of my head, I can't think of a case where omitting the parentheses would actually be ambiguous, though.
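To illustrate the first point, here is what dropping the parens looks like (a sketch; the exact error wording depends on your GHC version):
doStuff :: Name -> Age -> String
doStuff NameConstr a AgeConstr b = "Nom: " ++ a ++ ", age: " ++ b
-- rejected: NameConstr and AgeConstr are parsed as zero-argument
-- constructor patterns, so GHC sees four parameters and complains
-- that each constructor should have one argument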
If you want, you can redefine your types as follows:
data Name = NameConstr { getName :: String }
data Age = AgeConstr { getAge :: Integer }
so that your function can become:
doStuff n a = "Nom: " ++ getName n ++ ", age: " ++ show (getAge a)
(I fixed the last part; the age is an Integer, which cannot be concatenated onto a String without show)
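For instance, with these definitions (results shown in comments):
getName (NameConstr "Alice") -- "Alice"
getAge (AgeConstr 30) -- 30
doStuff (NameConstr "Alice") (AgeConstr 30) -- "Nom: Alice, age: 30"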

Indeed, it is possible to parse a simple grammar for (even nested) patterns without parens at all. Suppose one like this:
<PAT>  ::= <WILD> | <VAR> | <CON0> | <CON1> <PAT> | <CON2> <PAT> <PAT> ...
<VAR>  ::= <LNAME>
<CON*> ::= <UNAME>
<WILD> ::= "_"
where LNAME is a name that starts with a lowercase letter and UNAME one that starts with an uppercase letter. While parsing, we would have to look up each constructor name to find out its arity, and then parse the constructor's fields using that arity information. But this lookup would significantly complicate and slow down parsing itself. Haskell also has much more complex patterns (view patterns, as-patterns, records, infix constructors with arbitrary fixity, etc.), and omitting parens there can lead to ambiguity.
Though there is another reason not to do that. Consider the following code:
data Bar = Bar Int
data Foo = Foo Int
libFunction Foo a Bar b = a + b
someUse bar foo = libFunction foo bar
Next imagine we change datatypes a bit:
data Bar = Bar
data Foo = Foo Int Bar Int
The modified code might still typecheck, but the function will not do what we expect. Not a real-world example, but nevertheless: since Haskell has type classes, it can be pretty hard to find out where we got something wrong.
In other words, we would lose quality of error messages, and parens defend us from unexpected behaviour after changes.

It's slightly silly, but in this case there is actually a way to avoid parentheses:
doStuff :: Name -> Age -> String
NameConstr a `doStuff` AgeConstr b = "Nom: " ++ a ++ ", age: " ++ show b
This is exactly the same way you define an infix operator, and using backticks to infix-ify a non-operator identifier works just as well when defining it as when applying it.
I don't recommend actually doing this with functions you don't expect to be used in backtick-y infix style, though.
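Note that defining it infix doesn't restrict how you call it; with the question's data definitions, both of these work and give the same result:
greeting1 = doStuff (NameConstr "Alice") (AgeConstr 30)
greeting2 = NameConstr "Alice" `doStuff` AgeConstr 30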

Related

Apply function to argument before performing pattern matching

I'm new to Haskell and trying to not instinctively think imperatively. I have a function which does all the work in a pattern matching block, however it needs to pattern match on the argument after a function is applied to it.
Doing this in a functional way gets me to this:
foo :: Int -> String
foo n = bar $ show n
bar :: String -> String
bar [] = ""
bar (c:s) = "-" ++ bar s
Where foo is the function I'm trying to implement but bar is where all the work gets done. foo only exists to provide the right type signature and perform the precursor show transformation before calling bar. In practice, bar could get quite complicated, but still, I have no reason to expose it as a separate function.
What's the Haskell way to perform a simple function like show and "then" pattern match on the result of that?
I tried changing the pattern matching to a case statement, but it didn't permit the all-important recursion, because there was no function to call recursively. For the same reason, using a where clause applied to multiple patterns also doesn't work.
I remembered that Learn You A Haskell often seemed to emphasise that where and let are more powerful than they first seem because "everything is a function" and the clauses are expressions themselves.
That prompted me to see if I could push where a bit harder and use it to essentially define the helper function bar. Turns out I can:
foo :: Int -> String
foo n = bar $ show n
  where bar [] = ""
        bar (c:s) = "-" ++ bar s
Unless there's a better way, I think this is the solution I'm after. It took a bit of beating back my imperative tendencies to see this, but it's starting to look logical and much more "core" than I was imagining.
Please provide an alternative answer if these assumptions are leading me off course!
It depends on whether the pattern-matching function is recursive or not.
In your example, it is called bar and is recursive.
foo :: Int -> String
foo n = bar $ show n
bar :: String -> String
bar [] = ""
bar (c:s) = "-" ++ bar s
Here, the (arguably) best solution is the one you found: use where (or let) and define it locally to foo:
foo :: Int -> String
foo n = bar $ show n
  where
    bar :: String -> String -- optional type annotation
    bar [] = ""
    bar (c:s) = "-" ++ bar s
The type annotation for the inner function is optional. Many Haskellers think that the top-level function foo should have its signature (and GHC with -Wall warns if you do not provide it), but also believe that the inner function does not have to be annotated. For what it is worth, I like to add it when I think it's non-obvious from the context. Feel free to include or omit it.
When bar is not recursive, we have other options. Consider this code:
foo :: Int -> String
foo n = bar $ show n
bar :: String -> String
bar [] = "empty"
bar (c:s) = "nonempty " ++ c : s
Here, we can use case of:
foo :: Int -> String
foo n = case show n of
  []    -> "empty"
  (c:s) -> "nonempty " ++ c : s
This calls the function show first, and then pattern-matches on its result. I think this is easier to read than adding a where to define bar.
Theoretically speaking, we could follow the case approach even in the recursive case, and leverage fix (from Data.Function) to close the recursion. I do not recommend doing this, since defining bar using where (or let) is more readable. I'm adding this less readable option here only for the sake of completeness.
import Data.Function (fix)

foo :: Int -> String
foo n = fix (\bar x -> case x of
          []    -> ""
          (c:s) -> "-" ++ bar s
        ) $ show n
This is equivalent to the first recursive code snippet, but it requires much more time to read. The helper fix has its uses, but if I read this in actual production code I'd think the programmer is trying to show they are "clever" instead of writing simple, readable code.

Why is (.) called infix as just . rather than `(.)`

I learned that functions can be invoked in two ways: prefix and infix. For example, say I've created this function:
example :: [Char] -> [Char] -> [Char]
example x y = x ++ " " ++ y
I can call it prefix like so:
example "Hello" "World"
or infix like so:
"Hello" `example` "World"
Both of these result in the list of chars representing the string "Hello World".
However, I am now learning about function composition, and have come across the function defined like so:
(.) :: (b -> c) -> (a -> b) -> a -> c
So, say I wanted to compose negate with multiplication by three. I would write the prefix invocation like:
negateComposedWithMultByThree = (.) negate (*3)
And the infix invocation like:
negateComposedWithMultByThree = negate `(.)` (*3)
But, whilst the prefix invocation compiles, the infix invocation does not and instead gives me the error message:
error: parse error on input `('
It seems, in order to call compose infix, I need to omit the brackets and call it like so:
negateComposedWithMultByThree = negate . (*3)
Can anyone shed any light on this? Why does "Hello" `example` "World" work, whilst negate `(.)` (*3) does not?
In addition, if I try to make my own function with a signature like this:
(,) :: Int -> Int
(,) x = 1
It does not compile, with the error:
"Invalid type signature (,) : ... Should be of form :: "
There's nothing deep here. There are just two kinds of identifiers, with different rules about how they're parsed: by-default-infix and by-default-prefix. You can tell which is which, because by-default-infix identifiers contain only punctuation, while by-default-prefix identifiers contain only numbers, letters, apostrophes, and underscores.
Recognizing that the default isn't always the right choice, the language provides conversions away from the default behavior. So there are two separate syntax rules, one that converts a by-default-infix identifier to prefix (add parentheses), and one that converts a by-default-prefix identifier to infix (add backticks). You cannot nest these conversions: a by-default-infix identifier converted to prefix form is not a by-default-prefix identifier.
That's it. Nothing fundamentally interesting -- all of them become just function applications once parsed -- it's just syntax sugar.
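For example, both conversions in action (plain Prelude functions):
(+) 1 2 -- 3; (+) is punctuation, so it's infix by default, and parentheses make it prefix
5 `div` 2 -- 2; div is alphanumeric, so it's prefix by default, and backticks make it infix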

Why the restriction on newtype?

I read that "newtype has exactly one constructor with exactly one field inside it." Does this restriction add any advantage? If the value constructor is limited to only one field, why can't I use the field directly in my code, instead of wrapping it with newtype?
newtype is a tool for creating data abstraction that has no runtime cost.
What do I mean by abstraction?
Suppose you have:
greetPerson :: String -> String -> String
greetPerson greeting name = greeting ++ " " ++ name
greetPerson "Hello" "Mike" => "Hello Mike"
This works fine, but it opens opportunities for misuse:
greetPerson "Mike" "Hello" => "Mike Hello"
The problem is that you're using the same type everywhere (String), carrying no semantic meaning. Let's use a newtype:
newtype Name = Name String
greetPerson :: String -> Name -> String
greetPerson greeting (Name name) = greeting ++ " " ++ name
greetPerson "Hello" (Name "Mike") => "Hello Mike"
We end up with the same functionality, but now the type signature carries more meaning and the compiler can tell us when we misuse it.
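For example (the exact compiler message varies by GHC version):
greetPerson "Hello" (Name "Mike") => "Hello Mike"
greetPerson (Name "Mike") "Hello" => compile-time type error (a Name where a String was expected, and vice versa)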
What do I mean by no runtime cost?
The newtype from my example exists only at the type level and the compiler generates exactly the same code as if I used String throughout.
This hints at why newtype is only allowed for one constructor with one field.
Imagine you tried to make newtype work for more than one constructor. How would you distinguish which one you have at runtime? You'd have to store some additional information.
Same with more than one field. You'd need some way to bundle two fields together.
Either would add some runtime cost, which newtype promises not to do.
If you want more than one field or more than one constructor, simply use data:
data Foo = Bar String | Baz Int Bool
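Incidentally, you can see the zero-cost guarantee in action with coerce from Data.Coerce (in base), which converts between a newtype and its underlying type, even under a list, without traversing anything. A minimal sketch, reusing a Name-style newtype:
import Data.Coerce (coerce)

newtype Name = Name String

-- O(1): no traversal and no rewrapping, because the run-time
-- representations of [String] and [Name] are identical
toNames :: [String] -> [Name]
toNames = coerce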

Parsec not consuming all Input?

I'm writing some code to parse commands from the Simple Imperative Language defined in
Theory of Programming Languages (Reynolds, 1998).
I have a lexer module that, given a string, extracts the tokens from it if it's a valid language expression; I then pass that list of tokens to the parser, which should build an internal representation of the command (defined as an algebraic data type).
These are my Tokens:
--Tokens for the parser
data Token = Kw Keyword
           | Num Int
           | Op Operator
           | Str String
           | Sym Symbol
           deriving Show
I'm having trouble with binary operators. I'll use sum as an example, but the same happens with all of them, whether boolean or integer.
For example, if I run parse "x:=2+3"
I should get the following list of tokens from the lexer
[Str "x", Op Colon, Op Equal, Num 2, OP, Plus, Num 3]
which is actually what I'm getting.
But then the parser should return the command
Assign "x" (Ibin Plus (Const 2) (Const 3)
which is the correct representation of the command. But instead of that I'm getting the following representation:
Assign "x" (Const 2)
I guess I screwed up at some point in the pIntExp function, because the variable identifier and the := of the assignment are parsed OK, and it's not parsing the last elements. Here are the relevant parsers for this example, so someone can point out what I'm doing wrong.
-- Integer expressions
data IntExpr = Const Int
             | Var Iden -- Iden = String
             | Neg IntExpr
             | IBin OpInt IntExpr IntExpr
             deriving Show
type TParser = Parsec [Token] ()
--Internal representation of the commands
data Comm = Skip
          | Assign Iden IntExpr
          | If Assert Comm Comm
          | Seq Comm Comm
          | While Assert Comm
          | Newvar Iden IntExpr Comm
          deriving Show
--Parser for non-sequential commands
pComm' :: TParser Comm
pComm' = choice [pif, pskip, pAssign, pwhile, pNewVar]
--Parser for the assignment command
pAssign :: TParser Comm
pAssign = do v <- pvar
             _ <- silentOp Colon
             _ <- silentOp Equal
             e <- pIntExp
             return $ Assign v e
-- Integer expressions parser
pIntExp :: TParser IntExpr
pIntExp = choice [ var' -- an intexp is either a variable
                 , num  -- or a numeric constant
                 , pMul -- or <intexp>*<intexp>
                 , pSum -- or <intexp>+<intexp>
                 , pRes -- or <intexp>-<intexp>
                 , pDiv -- division
                 , pMod -- modulus
                 , pNeg -- unary "-"
                 ]
-- Parser for <intexp>+<intexp>
pSum :: TParser IntExpr
pSum = do
  e <- pIntExp
  _ <- silentOp Lexer.Plus
  e' <- pIntExp
  return $ IBin Lang.Plus e e'
UPDATE TAKING INTO ACCOUNT AndrewC's ANSWER
Unfortunately, moving the var' parser down in the choice list didn't work; it yields the same result. But I took AndrewC's answer into account and tried to "manually" trace the execution (I'm not familiar with ghci's debugger and ended up doing a lot of single steps, eventually getting lost).
This is how I reason it:
I got this token list from the lexer:
[Str "x", Op Colon, Op Equal, Num 2, OP Plus, Num 3]
So, the pComm' parser fails with pif and pskip, but succeds with pAssign, consuming Str "x", Op Colon and Op Equal and trying to parse
[Num 2, OP Plus, Num 3] with pIntExp (!!)
The pIntExp parser then tries the var' parser and fails, but succeds with the num parser consuming the Num 2 token and therefore returning the erroneous result Assign "x" (Const 2).
So with AndrewC's advice in mind about choice, I moved the num parser down the list too. For the sake of simplicity I'll consider pIntExp as
choice [pSum, num, var'], which is what's relevant for this particular example.
The first part of the reasoning remains the same. So I'll restart from (!!) where we had
[Num 2, Op Plus, Num 3] to be parsed by pIntExp
pIntExp now tries pSum first, which in turn "calls" pIntExp again, which will try pSum again, and so on: the program hangs. I tried it, and it indeed hangs and never ends.
So I was wondering: is there a way to make the pSum parser "look ahead" for the Op Plus token and then parse the corresponding expressions?
UPDATE 2: After googling a little more, now that I've identified the problem, I found that the parser combinators chainl1 and/or chainl might be just what I need.
I'll be playing with these and, if I work it out, will post the solution.
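For the record, here is roughly what that looks like (a sketch against the question's parsers; chainl1 comes from Text.Parsec, and Lang.Times/Lexer.Times are made-up names for the multiplication case):
-- chainl1 parses "p (op p)*" and folds the results left-associatively,
-- so there is no left recursion and 3-4+1 groups as (3-4)+1
pIntExp :: TParser IntExpr
pIntExp = pTerm `chainl1` pAddOp
  where
    pTerm   = pFactor `chainl1` pMulOp  -- * binds tighter than +
    pFactor = choice [num, var']        -- atoms only, so no left recursion
    pAddOp  = IBin Lang.Plus  <$ silentOp Lexer.Plus
    pMulOp  = IBin Lang.Times <$ silentOp Lexer.Times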
The choice function tries the parsers it's given in the order they appear in the list.
Since your parser for variables appears before your parser for the more complicated addition expression, it succeeds before the other is tried.
To solve this problem, put the variable parser after any expressions that start with a variable (and think through any other substring-matching issues when using choice).
Similar problems include 3 - 4 + 1 evaluating to -2. People expect left association in the absence of other priorities (so sum - term instead of term - sum).
You also might not want 1 + 10 * 5 to evaluate to 55, so you'll have to be careful around + and * etc. if you want to implement operator precedence. You can achieve this by parsing an expression made up of multiplications as a term, and then an additive expression as a sum of terms.

Create a type that can contain an int and a string in either order

I'm following this introduction to Haskell, and I'm finding this particular place (user defined types 2.2) particularly obscure. To the point that I don't even understand what part of it is code, and what part is the thoughts of the author. (What is Pt? It is never defined anywhere.) Needless to say, I can't execute / compile it.
As an example that would make it easier for me to understand, I wanted to define a type, which is a pair of an Integer and a String, or a String and an Integer, but nothing else.
The theoretical function that would use it would look like so:
combine :: StringIntPair -> String
combine a b = (show a) ++ b
combine a b = a ++ (show b)
If you need working code that does the same, here's CL code for doing it:
(defgeneric combine (a b)
(:documentation "Combines strings and integers"))
(defmethod combine ((a string) (b integer))
(concatenate 'string a (write-to-string b)))
(defmethod combine ((a integer) (b string))
(concatenate 'string (write-to-string a) b))
(combine 100 "500")
Here's one way to define the datatype:
data StringIntPair = StringInt String Int
                   | IntString Int String
                   deriving (Show, Eq, Ord)
Note that I've defined two constructors for type StringIntPair, and they are StringInt and IntString.
Now in the definition of combine:
combine :: StringIntPair -> String
combine (StringInt s i) = s ++ (show i)
combine (IntString i s) = (show i) ++ s
I'm using pattern matching to match the constructors and select the correct behavior.
Here are some examples of usage:
*Main> let y = StringInt "abc" 123
*Main> let z = IntString 789 "a string"
*Main> combine y
"abc123"
*Main> combine z
"789a string"
*Main> :t y
y :: StringIntPair
*Main> :t z
z :: StringIntPair
A few things to note about the examples:
StringIntPair is a type; doing :t <expression> in the interpreter shows the type of an expression
StringInt and IntString are constructors of the same type
the vertical bar (|) separates constructors
a well-written function should match each constructor of its argument's types; that's why I've written combine with two patterns, one for each constructor
data StringIntPair = StringInt String Int
                   | IntString Int String
combine :: StringIntPair -> String
combine (StringInt s i) = s ++ (show i)
combine (IntString i s) = (show i) ++ s
So it can be used like this:
> combine $ StringInt "asdf" 3
"asdf3"
> combine $ IntString 4 "fasdf"
"4fasdf"
Since Haskell is strongly typed, you always know what type a variable has. Additionally, you will never know more. For instance, consider the function length that calculates the length of a list. It has the type:
length :: [a] -> Int
That is, it takes a list of arbitrary a (although all elements have the same type) and returns an Int. The function may never look inside one of the list's nodes and inspect what is stored there, since it doesn't have, and can't get, any information about the stored values' type. This makes Haskell pretty efficient, since, as opposed to typical OOP languages such as Java, no type information has to be stored at runtime.
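For instance, the same length works on lists of any element type, precisely because it never inspects the elements:
length [1, 2, 3] -- 3
length "abc" -- 3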
To make it possible to have different types of variables in one parameter, one can use an algebraic data type (ADT). One that stores either a String and an Int, or an Int and a String, can be defined as:
data StringIntPair = StringInt String Int
                   | IntString Int String
You can find out which of the two you have by pattern matching on the parameter. (Notice that you have only one parameter, since both the string and the int are encapsulated in the ADT):
combine :: StringIntPair -> String
combine (StringInt str int) = str ++ show int
combine (IntString int str) = show int ++ str
