I've recently picked up Haskell at uni and I'm working my way through a set of exercises. Here's a snippet of one that I can't make sense of:
"Consider the following grammar for a simple, prefix calculator language:"
num ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
int ::= num | num int
expr ::= int | - expr | + expr expr | * expr expr
I'm confused as to how to translate this into Haskell syntax (I'm a complete beginner in both Haskell and functional programming, so please be gentle).
I suspect that num, int and expr are all supposed to be types/values that can be declared using data or type, and that they impose constraints on the calculator. However, I can't make sense of either: how do I declare a type or data (not a variable) for fixed values, namely 0-9? Also, how can I put symbols like - or + in a declaration?
Don't confuse a string in the grammar for the AST that represents it. Compare the string
"+ + 3 4 5"
which is a string in the grammar you've been given with
Plus (Plus (Literal 3) (Literal 4)) (Literal 5)
which would be a sensible Haskell value for the AST that String could get parsed to.
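To make the string/AST distinction concrete, here is a minimal sketch of a parser and evaluator for the prefix grammar. It assumes tokens are separated by spaces; the names Expr, Negate, Times, parseExpr, parsePrefix and eval are illustrative choices, not part of the exercise.

```haskell
import Data.Char (isDigit)

-- AST for the prefix calculator (constructor names are assumptions).
data Expr
  = Literal Integer
  | Negate Expr
  | Plus Expr Expr
  | Times Expr Expr
  deriving (Eq, Show)

-- Parse one expression from a token list, returning the unused tokens.
parseExpr :: [String] -> Maybe (Expr, [String])
parseExpr ("-" : ts) = do
  (e, rest) <- parseExpr ts
  Just (Negate e, rest)
parseExpr ("+" : ts) = do
  (l, rest)  <- parseExpr ts
  (r, rest') <- parseExpr rest
  Just (Plus l r, rest')
parseExpr ("*" : ts) = do
  (l, rest)  <- parseExpr ts
  (r, rest') <- parseExpr rest
  Just (Times l r, rest')
parseExpr (t : ts)
  | not (null t), all isDigit t = Just (Literal (read t), ts)
parseExpr _ = Nothing

-- Succeeds only if the whole input is exactly one expression.
parsePrefix :: String -> Maybe Expr
parsePrefix s = case parseExpr (words s) of
  Just (e, []) -> Just e
  _            -> Nothing

-- Evaluate the AST to a number.
eval :: Expr -> Integer
eval (Literal n) = n
eval (Negate e)  = negate (eval e)
eval (Plus l r)  = eval l + eval r
eval (Times l r) = eval l * eval r
```

With these definitions, parsePrefix "+ + 3 4 5" yields Just (Plus (Plus (Literal 3) (Literal 4)) (Literal 5)).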
How do I declare type or data(not a variable) for fixed values, namely 0-9?
You can define a type, like
data Digit = Zero | One | Two | Three | Four | Five | Six | Seven | Eight | Nine deriving (Eq, Show)
This represents the num in your problem. Obviously we cannot use 0, 1, 2, 3, ... since they are already interpreted as numbers in Haskell.
Then, you can define
data Number = Single Digit | Many Digit Number deriving (Eq, Show)
which is equivalent to int in your problem. This type represents one (Single ...) or more (Many ...) digits, which together make up one decimal number. For example, with these data types the number 361 would be Many Three (Many Six (Single One)).
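As a rough sketch of how such a digit list maps back to an ordinary number, the following converts a Number to an Integer. It assumes Digit additionally derives Enum (not in the declaration above), and digits and numberValue are hypothetical helper names.

```haskell
-- Same types as above, with Enum added to Digit (an assumption)
-- so that fromEnum gives each digit its numeric value.
data Digit = Zero | One | Two | Three | Four | Five | Six | Seven | Eight | Nine
  deriving (Eq, Show, Enum)

data Number = Single Digit | Many Digit Number
  deriving (Eq, Show)

-- Flatten a Number into its digits, most significant first.
digits :: Number -> [Digit]
digits (Single d) = [d]
digits (Many d n) = d : digits n

-- Fold the digits into a decimal value.
numberValue :: Number -> Integer
numberValue = foldl step 0 . digits
  where
    step acc d = acc * 10 + toInteger (fromEnum d)
```

For example, numberValue (Many Three (Many Six (Single One))) evaluates to 361.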
Also, how can I put symbols like - or + in a declaration?
You can't use - or + themselves as constructor names in type or data declarations. You can, however, use names for the operations, like Sum, Sub and Mul. Then the expr of the grammar of your problem would translate to
data Expr = Lit Number
          | Sub Expr
          | Sum Expr Expr
          | Mul Expr Expr
          deriving (Eq, Show)
Note that Sub takes a single argument, because - is a unary operator in the grammar (- expr). Given the string "+ - 25 3", which represents an expression in the prefix calculator language of your problem, it would be parsed to Sum (Sub (Lit (Many Two (Single Five)))) (Lit (Single Three)).
If it is just an exercise about modeling data (without code), the answer consists of adding constructor names to your grammar (and changing the literal digits to names). Something like
import Prelude hiding (Num, Int) -- these names clash with the Prelude
data Num = Zero | One | Two | Three | Four | Five
         | Six | Seven | Eight | Nine
data Int = Single Num | Multiple Num Int
data Exp = ExpInt Int | ExpMinus Exp | ExpPlus Exp Exp
         | ExpMul Exp Exp
From that, you can write all sorts of code to parse and evaluate expressions.
Years ago, I got clever, and I declared my AST type an instance of Num, Eq and Ord, then defined the mathematical and comparison operators for AST expressions, so that expr1 + expr2 would yield a valid AST. Using sevenj’s declarations, this would be written like (+) x y = Sum x y, where the right-hand side is the constructor of an AST expression. For brevity, one = Lit (Single One) and two = Lit (Single Two). Then you might write one + one == two and the operators would generate your AST with the correct precedence. Between that and abuse of the let { ... } in ... syntax to allow for arbitrary indentation, I had a way to write ASTs that was almost just the toy imperative language itself, with some boilerplate above, below and on the left.
The TA grading my assignment, though, was not amused, and wrote, “This is not Haskell!”
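A minimal sketch of that trick, assuming a simplified AST whose literals hold an Integer directly (this is not the original assignment's code, and the constructor names are illustrative):

```haskell
-- Simplified AST; Lit carries an Integer here for brevity (an assumption).
data Expr
  = Lit Integer
  | Neg Expr
  | Sum Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- The arithmetic operators build syntax trees instead of computing numbers.
instance Num Expr where
  (+)         = Sum
  (*)         = Mul
  negate      = Neg
  fromInteger = Lit
  abs _       = error "abs is not part of this toy language"
  signum _    = error "signum is not part of this toy language"

-- Ordinary operator precedence decides the shape of the tree.
example :: Expr
example = 1 + 2 * 3
```

Here example is Sum (Lit 1) (Mul (Lit 2) (Lit 3)), because * binds tighter than + in Haskell's usual fixity declarations.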
I am trying to build an interpreter for a C-like language in Haskell. I have so far written and combined small monadic parsers following this paper, hence so far I can generate an AST representation of a program. I defined the abstract syntax as follows:
data LangType = TypeReal | TypeInt | TypeBool | TypeString deriving (Show)
type Id = String
data AddOp = Plus | Minus | Or deriving (Show)
data RelOp = LT | GT | LTE | GTE | NEq | Eq deriving (Show)
data MultOp = Mult | Div | And deriving (Show)
data UnOp = UnMinus | UnNot deriving (Show)
data BinOp = Rel RelOp | Mul MultOp | Add AddOp deriving (Show)
data AST = Program [Statement] deriving (Show)
data Block = StatsBlock [Statement] deriving (Show)
data Statement = VariableDecl Id LangType Expression
               | Assignment Id Expression
               | PrintStatement Expression
               | IfStatement Expression Block Block
               | WhileStatement Expression Block
               | ReturnStatement Expression
               | FunctionDecl Id LangType FormalParams Block
               | BlockStatement Block
               deriving (Show)
data Expression = RealLiteral Double
                | IntLiteral Int
                | BoolLiteral Bool
                | StringLiteral String
                | Unary UnOp Expression
                | Binary BinOp Expression Expression
                | FuncCall Id [Expression]
                | Var Id
                deriving (Show)
data FormalParams = IdentifierType [(Id, LangType)] deriving (Show)
I have yet to type-check my AST and build the interpreter to evaluate expressions and execute statements. My questions are the following:
Does the abstract syntax make sense/can it be improved? In particular, I've been running into a recurring problem. In the EBNF of this language I'm trying to interpret, a WhileStatement consists of an Expression (which I have no problem with) and a Block, which in the EBNF happens to be a Statement just like WhileStatement, and so I cannot refer to Block from my WhileStatement. I've worked around this by defining a separate data type Block (as is shown in the above code), but am not sure if this is the best way. I'm finding defining data types quite confusing.
Since I have to type-check my AST and evaluate/execute, do I implement these separately or can I define some function which does them both at the same time?
Any general tips on how I should go about type-checking and interpreting the language would also be greatly appreciated. Since the language has variable and function declarations, I am thinking of implementing some sort of symbol table, although again I am struggling with defining the type for this. So far I've tried
import qualified Data.Map as M
data Value = RealLit Double | IntLit Int | BoolLit Bool | StringLit String | Func [FormalParams] String
    deriving (Show)
type TermEnv = M.Map String Value
but I'm unsure whether I should be using my LangType from before.
Addressing your question in the comments about how to proceed with type checking and evaluation.
If you don't have to do inference or polymorphism, type checking is pretty simple. Also type checking and evaluation mirror each other pretty closely in these conditions.
Begin by defining a monad with the features you need. For a type checker, you will need
A type environment, i.e. a Reader (Map Id LangType) component, to keep track of the types of local variables.
An error ability, e.g. Except String.
So you could define a monad like
type TypeEnv = Map.Map Id LangType
type TC = ReaderT TypeEnv (Except String)
And then your typechecker function would look like:
typeCheck :: AST -> TC ()
(We return () because there is nothing interesting to be gained from the typechecking process besides knowing whether the program passed.)
This will be largely structurally inductive, e.g.
typeCheck (Program stmts) = ... -- typeCheckStmt each statement
typeCheckStmt :: Statement -> TC ()
typeCheckStmt (VariableDecl v ty defn) = ...
typeCheckStmt (Assignment v exp) = do
    mt <- asks (Map.lookup v)
    -- report an undeclared variable instead of failing a pattern match
    t  <- maybe (throwError ("Undeclared variable: " ++ v)) return mt
    t' <- typeCheckExp exp
    when (t /= t') $ throwError "Types do not match" -- needs deriving Eq on LangType
...
-- Return the type of a composite expression to use elsewhere
typeCheckExp :: Expression -> TC LangType
...
There will be a bit of finesse required to make sure that variable declarations in a list of statements can be seen by later statements in the same list. I will leave that as a puzzle. (Hint: see the local function to provide an updated environment within a scope.)
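For the expression side, a small sketch of what typeCheckExp might look like follows. It uses a cut-down Expression type with hypothetical Add and And constructors in place of the question's Binary BinOp, assumes the mtl and containers packages, and leaves the statement-scoping puzzle untouched; binary and runTC are illustrative helper names.

```haskell
import Control.Monad (unless)
import Control.Monad.Except (Except, runExcept, throwError)
import Control.Monad.Reader (ReaderT, asks, runReaderT)
import qualified Data.Map as Map

type Id = String

data LangType = TypeInt | TypeBool deriving (Eq, Show)

-- Cut-down expression type (Add/And are simplifying assumptions).
data Expression
  = IntLiteral Int
  | BoolLiteral Bool
  | Var Id
  | Add Expression Expression
  | And Expression Expression
  deriving (Show)

type TypeEnv = Map.Map Id LangType
type TC = ReaderT TypeEnv (Except String)

-- Compute the type of an expression, or fail with a message.
typeCheckExp :: Expression -> TC LangType
typeCheckExp (IntLiteral _)  = return TypeInt
typeCheckExp (BoolLiteral _) = return TypeBool
typeCheckExp (Var v) = do
  mt <- asks (Map.lookup v)
  maybe (throwError ("Undeclared variable: " ++ v)) return mt
typeCheckExp (Add l r) = binary TypeInt l r
typeCheckExp (And l r) = binary TypeBool l r

-- Both operands must have the wanted type, which is also the result type.
binary :: LangType -> Expression -> Expression -> TC LangType
binary want l r = do
  tl <- typeCheckExp l
  tr <- typeCheckExp r
  unless (tl == want && tr == want) $
    throwError ("Expected operands of type " ++ show want)
  return want

-- Run a check against a starting environment.
runTC :: TypeEnv -> TC a -> Either String a
runTC env m = runExcept (runReaderT m env)
```

For instance, runTC Map.empty (typeCheckExp (Add (IntLiteral 1) (IntLiteral 2))) gives Right TypeInt.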
Evaluation is a similar story. You're correct that you need a type of run-time values. Without some cleverness that you are probably not ready for (and is of questionable utility even if you were) there is not really a way to use LangType in Value, so you're on the right track.
You will need a monad that supports keeping track of the values of variables and the ability to do whatever else your language needs. To start I recommend
type Eval = StateT (Map Id Value) IO
and proceed structurally as before. There will again be some finesse required when handling variable scopes and shadowing, and you may need to change the environment type or mess with your Value type a bit to accommodate these subtleties, but thinking through these problems is important. Start simple; don't try to implement typechecking and evaluation for your whole language at once.
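A minimal sketch of such an evaluator, restricted to assignments, printing and integer addition, might look like this. The Value constructors, the Add constructor, and the names evalExp, exec and runProgram are assumptions for illustration; scoping and functions are deliberately ignored, and the mtl and containers packages are assumed.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.State (StateT, execStateT, gets, modify)
import qualified Data.Map as Map

type Id = String

data Value = IntVal Int | BoolVal Bool deriving (Eq, Show)

-- Cut-down syntax (Add is a simplifying assumption).
data Expression
  = IntLiteral Int
  | BoolLiteral Bool
  | Var Id
  | Add Expression Expression

data Statement
  = Assignment Id Expression
  | PrintStatement Expression

type Eval = StateT (Map.Map Id Value) IO

-- Evaluate an expression against the current variable store.
evalExp :: Expression -> Eval Value
evalExp (IntLiteral n)  = return (IntVal n)
evalExp (BoolLiteral b) = return (BoolVal b)
evalExp (Var v) = do
  mv <- gets (Map.lookup v)
  maybe (liftIO (ioError (userError ("Unbound variable: " ++ v)))) return mv
evalExp (Add l r) = do
  a <- evalExp l
  b <- evalExp r
  case (a, b) of
    (IntVal x, IntVal y) -> return (IntVal (x + y))
    _ -> liftIO (ioError (userError "Add expects two integers"))

-- Execute one statement, updating the store or printing.
exec :: Statement -> Eval ()
exec (Assignment v e) = do
  val <- evalExp e
  modify (Map.insert v val)
exec (PrintStatement e) = evalExp e >>= liftIO . print

-- Run a list of statements from an empty store; return the final store.
runProgram :: [Statement] -> IO (Map.Map Id Value)
runProgram stmts = execStateT (mapM_ exec stmts) Map.empty
```

Run-time type errors are raised as IO exceptions here only to keep the sketch short; a type checker that runs first would rule most of them out.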
I've specified precedence and associativity like this:
expr
: LB expr RB
| <assoc=right> (SUB | NOT) expr
| expr op=(MULTI | DIV | MOD | AND) expr
| expr op=(ADD | SUB | OR) expr
| expr comparator expr
| expr op=(ANDTHEN | ORELSE) expr
| INTLIT
;
But it also accepts ( 1 and 2 ). I want an expression to be valid only when its operands are all integers (for + - * /) or all booleans (for AND, OR). How can I do that?
That's not a precedence issue, it's a type issue and should thus be handled by the type checker.
You might be tempted to separate your grammar into rules such as integerExpression and booleanExpression, and it's certainly possible to create a grammar that rejects 1 and 2 this way. But this approach makes your grammar needlessly complicated and will reach its limits once your language becomes even slightly more powerful. When you introduce variables, for example, you'd want to allow a and b if and only if a and b are both Boolean variables, but that's not something you can tell just by looking at the expression. So in that scenario (and many others), you'll need Java (or whichever language you're using) code to check the types anyway.
So in conclusion, you should leave your grammar as-is and reject 1 and 2 in the type checker.
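As an illustration of that division of labour, here is a small Haskell sketch (rather than Java) of a checker that runs after parsing. The AST, the Env map and the names typeOf and operands are all hypothetical; the point is only that a and b can be judged once the variables' types are known.

```haskell
import qualified Data.Map as Map

data Type = TInt | TBool deriving (Eq, Show)

-- A tiny AST that a permissive grammar might produce (names are assumptions).
data Expr
  = IntLit Int
  | Var String
  | Add Expr Expr
  | And Expr Expr
  deriving (Show)

-- Declared variable types, known only after parsing.
type Env = Map.Map String Type

typeOf :: Env -> Expr -> Either String Type
typeOf _   (IntLit _) = Right TInt
typeOf env (Var x)    = maybe (Left ("unknown variable " ++ x)) Right (Map.lookup x env)
typeOf env (Add l r)  = operands env TInt l r
typeOf env (And l r)  = operands env TBool l r

-- Both operands must have the wanted type.
operands :: Env -> Type -> Expr -> Expr -> Either String Type
operands env want l r = do
  tl <- typeOf env l
  tr <- typeOf env r
  if tl == want && tr == want
    then Right want
    else Left ("operands must have type " ++ show want)
```

With this, And (IntLit 1) (IntLit 2) is rejected, while And (Var "a") (Var "b") is accepted exactly when the environment maps both variables to TBool.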
I have the following data structures defined:
data Operator = Plus | Times | Minus deriving (Eq,Show)
data Variable = A | B | C deriving (Eq,Show)
newtype Const = D Numeral deriving (Eq,Show)
data CVO = Const | Variable | Operator deriving (Eq,Show)
type Expr = [CVO]
I have defined the following function:
eval2 :: Expr -> Integer
eval2 x = helper x
I would like to check if an element of the CVO list (Expr) is either an instance of Const, Variable or Operator (this works) and I would like to implement varying code for the specific type of the instance (e.g. Plus, Times, Minus for Operator).
helper :: Expr -> Integer
helper [] = 2
helper (x:xs)
    | x == Operator && x == Plus = 1
I cannot compare x to Plus, because it expects x to be of type CVO.
Couldn't match expected type ‘CVO’ with actual type ‘Operator’
Is it somehow possible to cast x to be an instance of Operator in order to do the comparison?
A value can't have two different types at the same time. If x is a CVO you can't use == to compare it to Plus which is an Operator.
At the moment the type CVO consists of three constant values called Const, Variable and Operator. I'm guessing you actually wanted it to contain values of the type Const, Variable or Operator. You do that by declaring arguments to the constructors.
data CVO = Const Const   -- a constructor whose name is Const and contains a value of type Const
         | Var Variable  -- a constructor named Var containing a Variable
         | Op Operator   -- a constructor named Op containing an Operator
A given value of type CVO must have been built from one of those three constructors, containing a value of the correct type. You can test which constructor was used to create the CVO, and simultaneously unpack the value, using pattern matching. Something like this:
helper :: Expr -> Integer
helper [] = 0
helper (Op o:xs)  -- because we matched Op, we know o :: Operator
    | o == Plus = 1
    | otherwise = 2
helper _ = 3
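Pattern matching can also look inside the wrapped value in one step, which avoids the == comparisons entirely. The sketch below assumes Numeral is a synonym for Integer (the question does not define it), and describe is a hypothetical function name.

```haskell
data Operator = Plus | Times | Minus deriving (Eq, Show)
data Variable = A | B | C deriving (Eq, Show)

type Numeral = Integer  -- assumption: Numeral is not defined in the question

newtype Const = D Numeral deriving (Eq, Show)

data CVO = Const Const | Var Variable | Op Operator deriving (Eq, Show)

-- Nested patterns select both the CVO constructor and the value inside it.
describe :: CVO -> String
describe (Const (D n)) = "constant " ++ show n
describe (Var v)       = "variable " ++ show v
describe (Op Plus)     = "addition"
describe (Op Times)    = "multiplication"
describe (Op Minus)    = "subtraction"
```

For example, map describe [Const (D 3), Op Plus, Var A] gives ["constant 3", "addition", "variable A"].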
I want to make a String type that holds only uppercase characters. I know that String is [Char]. I have tried something like type a = ['A'..'Z'], but it did not work. Any help?
What you're wanting is dependent types, which Haskell doesn't have. Dependent types are those that depend on values, so using dependent types you could encode at the type level a vector with length 5 as
only5 :: Vector 5 a -> Vector 10 a
only5 vec = concatenate vec vec
Again, Haskell does not have dependent types, but languages like Agda, Coq and Idris do support them. Instead, you could just use a "smart constructor"
module MyModule
    ( Upper    -- export type only, not constructor
    , mkUpper  -- export the smart constructor
    ) where
import Data.Char (isUpper)
newtype Upper = Upper String deriving (Eq, Show, Read, Ord)
mkUpper :: String -> Maybe Upper
mkUpper s = if all isUpper s then Just (Upper s) else Nothing
Here the constructor Upper is not exported, just the type, and then users of this module have to use the mkUpper function that safely rejects non-uppercase strings.
For clarification, and to show how awesome dependent types can be, consider the mysterious concatenate function from above. If I were to define this with dependent types, it would actually look something like
concatenate :: Vector n a -> Vector m a -> Vector (n + m) a
concatenate v1 v2 = undefined
Wait, what's arithmetic doing in a type signature? It's actually performing type-system level computations on the values that this type is dependent on. This removes a lot of potential boilerplate in Haskell, and it makes guarantees at compilation time that, e.g., arrays can't have negative length.
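For this particular signature, GHC can get fairly close without full dependent types by using the DataKinds, GADTs and TypeFamilies extensions. The following is a sketch under that assumption; Nat, Vec, Add, append and toList are names defined here for illustration, not a standard library API.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

import Data.Kind (Type)

-- Peano naturals, promoted to the type level by DataKinds.
data Nat = Z | S Nat

-- A list whose length is tracked in its type.
data Vec (n :: Nat) (a :: Type) where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Type-level addition, playing the role of (n + m) in the signature above.
type family Add (n :: Nat) (m :: Nat) :: Nat where
  Add 'Z     m = m
  Add ('S n) m = 'S (Add n m)

-- The compiler checks that the result length is the sum of the inputs.
append :: Vec n a -> Vec m a -> Vec (Add n m) a
append VNil         ys = ys
append (VCons x xs) ys = VCons x (append xs ys)

-- Forget the length again.
toList :: Vec n a -> [a]
toList VNil         = []
toList (VCons x xs) = x : toList xs
```

This covers lengths, but it does not help with the original question of restricting characters to a range, which is why the smart constructor remains the practical choice there.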
Most desires for dependent types can be filled either using smart constructors (see bheklilr's answer), generating Haskell from an external tool (Coq, Isabelle, Inch, etc), or using an exact representation. You probably want the first solution.
To exactly represent just the capitals then you could write a data type that includes a constructor for each letter and conversion to/from strings:
data Capital = CA | CB | CC | CD | CE | CF | CG | CH | CI | CJ | CK | CL | CM | CN | CO | CP | CQ | CR | CS | CT | CU | CV | CW | CX | CY | CZ deriving (Eq, Ord, Enum)
toString :: [Capital] -> String
toString = map (toEnum . (+ (fromEnum 'A')) . fromEnum)
You can even go a step further and allow conversion from string literals, "Anything in quotes", to the type Capitals by using the OverloadedStrings extension. Just add {-# LANGUAGE OverloadedStrings, FlexibleInstances #-} to the top of your file, be sure to import Data.String, and write the instance:
type Capitals = [Capital]
instance IsString Capitals where
fromString = map (toEnum . (subtract (fromEnum 'A')) . fromEnum) . filter (\x -> 'A' <= x && x <= 'Z')
After that, you can type capitals all you want!
*Main> toString ("jfoeaFJOEW" :: Capitals)
"FJOEW"
*Main>
bheklilr is correct but perhaps for your purposes the following could be OK:
import Data.Char(toUpper)
newtype UpperChar = UpperChar Char
    deriving (Show)
upperchar :: Char -> UpperChar
upperchar = UpperChar . toUpper
You can alternatively make UpperChar an alias of Char (use type instead of newtype), which would allow you to form lists of both Char and UpperChar. The problem with an alias, however, is that you could feed a Char into a function expecting an UpperChar...
One way to do something similar which will work well for the Latin script of your choice but not so well as a fully general solution is to use a custom type to represent upper case letters. Something like this should do the trick:
data UpperChar = A|B|C|D| (fill in the rest) | Y | Z deriving (Enum, Eq, Ord, Show)
newtype UpperString = UpperString [UpperChar]
instance Show UpperString where
    show (UpperString s) = concatMap show s
The members of this type are not Haskell Strings, but you can convert between them as needed.
I have the data type:
data Expr = Num Int
          | Expression Expr Operator Expr
In the context of the problem, the numbers that (Num Int) will represent are single digit only. Is there a way to ensure that restriction within the type declaration?
Of course we could define a function to test whether the Expr is valid, but it would be nice to have the type system handle it.
You can use an abstract data type with a smart constructor:
newtype Digit = Digit { digitVal :: Int }
    deriving (Eq, Ord, Show)
mkDigit :: Int -> Maybe Digit
mkDigit n
    | n >= 0 && n < 10 = Just (Digit n)
    | otherwise = Nothing
If you put this in another module and don't export the Digit constructor, then client code can't construct values of type Digit outside of the range [0,9], but you have to manually wrap and unwrap it to use it. You could define a Num instance that does modular arithmetic, if that would be helpful; that would also let you use numeric literals to construct Digits. (Similarly for Enum and Bounded.)
However, this doesn't ensure that you never try to create an invalid Digit, just that you never do. If you want more assurance, then the manual solution Jan offers is better, at the cost of being less convenient. (And if you define a Num instance for that Digit type, it will end up just as "unsafe", because you'd be able to write 42 :: Digit thanks to the numeric literal support you'd get.)
(If you don't know what newtype is, it's basically data for data-types with a single, strict field; a newtype wrapper around T will have the same runtime representation as T. It's basically just an optimisation, so you can pretend it says data for the purpose of understanding this.)
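The modular-arithmetic Num instance mentioned above could be sketched as follows. This is one possible reading of "modular arithmetic" (everything is reduced mod 10); the choices for negate and signum are assumptions.

```haskell
newtype Digit = Digit { digitVal :: Int }
  deriving (Eq, Ord, Show)

-- Smart constructor: only 0..9 are accepted.
mkDigit :: Int -> Maybe Digit
mkDigit n
  | n >= 0 && n < 10 = Just (Digit n)
  | otherwise        = Nothing

-- Arithmetic wraps around modulo 10, so results stay in range.
instance Num Digit where
  Digit a + Digit b = Digit ((a + b) `mod` 10)
  Digit a * Digit b = Digit ((a * b) `mod` 10)
  negate (Digit a)  = Digit (negate a `mod` 10)
  abs               = id
  signum (Digit a)  = Digit (signum a)
  fromInteger n     = Digit (fromInteger (n `mod` 10))
```

With this instance the literal 42 :: Digit silently becomes Digit 2, which illustrates the loss of safety described above: out-of-range literals are accepted and wrapped instead of rejected.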
Edit: For the more theory-oriented, 100% solution, see the rather cramped comment section of this answer.
Since there are only ten possibilities, you could use Enum to specify all of them.
data Digit = Zero | One | Two deriving (Enum, Show)
Then you'd have to use fromEnum to treat them as numbers.
1 == fromEnum One
Similarly, using toEnum you can get a Digit from a number.
toEnum 2 :: Digit
We can go even further and implement Num.
data Digit = Zero | One | Two deriving (Enum, Show, Eq)
instance Num Digit where
    fromInteger x = toEnum (fromInteger x) :: Digit
    x + y = toEnum $ fromEnum x + fromEnum y
    x * y = toEnum $ fromEnum x * fromEnum y
    abs = id
    signum _ = 1
Zero + 1 + One == Two