Classify lexemes of an input String

Given a String "3 + a * 6", how do I determine the lexemes one by one? I know that my code is missing the classify xs part, but I don't know where to put it. Can anyone help me with this?
(The language is Haskell.)
classify :: String -> String
classify (x:xs)
  | x == '+' = "PLUS"
  | x == '-' = "MINUS"
  | x == '*' = "MULT"
  | x == '/' = "DIV"
  | x == '(' = "LP"
  | x == ')' = "RP"
  | isAlpha x = "VAR"
  | isDigit x = "CONST"
  | otherwise = error "Cannot determine lexeme"

This kind of tokenisation is best left to lexer generators or parser combinators. You can try Alex, at http://www.haskell.org/alex/ , or Parsec, at http://www.haskell.org/haskellwiki/Parsec .
These tools are designed specifically to make tokenisation/scanning (and parsing, in the case of Parsec) easy to use.

If you really only need a tokenizer, here's how you could do it without parsec. I defined an additional ADT for the token types (you can of course convert that back to strings), and had to change the return type, since you get a sequence of tokens.
import Data.Char (isAlpha, isDigit, isSeparator)
type Error = String
data Token = Plus | Minus | Mult | Div | Lp | Rp
           | Var | Const | Whitespace deriving (Show, Eq)
tokenTable :: [(Char, Token)]
tokenTable = [('+', Plus), ('-', Minus), ('*', Mult), ('/', Div), ('(', Lp), (')', Rp)]
tokenize :: String -> Either Error [Token]
tokenize "" = Right []
tokenize (x:xs) = case lookup x tokenTable of
    Just t  -> fmap (t:) (tokenize xs)
    Nothing -> recognize x
  where
    -- characters that are not in the table: variables, constants, whitespace
    recognize c
      | isAlpha c = fmap (Var:) (tokenize xs)
      | isDigit c = fmap (Const:) (tokenize xs)
      | isSeparator c = fmap (Whitespace:) (tokenize xs)
      | otherwise = Left "Cannot determine lexeme"
However, this quickly becomes tedious. It arguably already is, since we have to lift the list consing into Either using fmap. Imagine how you would report the location of an error. Going further essentially means building a monad stack and reimplementing a parser combinator library like Parsec. That's why it's often recommended to use a combinator library directly, and to let it do the lexing as well.
And if you can't or don't want to use full Parsec, it's not too difficult to implement the basic functionality by yourself.
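As a rough illustration of those last points, here is a minimal sketch (it is neither Parsec nor the code above) that uses traverse from the Prelude instead of threading fmap by hand, and pairs each character with its index so that the error message can name a position. The names classifyAt and tokenizeAt are made up for this example, and silently skipping whitespace is an assumption.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

data Token = Plus | Minus | Mult | Div | Lp | Rp | Var | Const
  deriving (Show, Eq)

type Error = String

-- Classify a single character, given its 0-based position in the input.
-- (Hypothetical helper, not part of any library.)
classifyAt :: (Int, Char) -> Either Error Token
classifyAt (pos, c)
  | c == '+'  = Right Plus
  | c == '-'  = Right Minus
  | c == '*'  = Right Mult
  | c == '/'  = Right Div
  | c == '('  = Right Lp
  | c == ')'  = Right Rp
  | isAlpha c = Right Var
  | isDigit c = Right Const
  | otherwise = Left ("Cannot determine lexeme " ++ show c
                      ++ " at position " ++ show pos)

-- Number the characters, drop whitespace (an assumption), then classify
-- each remaining character; traverse stops at the first Left.
tokenizeAt :: String -> Either Error [Token]
tokenizeAt = traverse classifyAt . filter (not . isSpace . snd) . zip [0 ..]
```

With these definitions, tokenizeAt "3 + a * 6" evaluates to Right [Const,Plus,Var,Mult,Const], and tokenizeAt "3 ? a" to Left "Cannot determine lexeme '?' at position 2". Like the version above, this still treats every character as its own lexeme, so multi-digit numbers are not grouped.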

You don't need to parse spaces in general. Here is a combination of your and phg's solutions:
import Data.Char
data Token = Plus | Minus | Mult | Div | Lp | Rp | Var | Digit | Undefined
  deriving Show
tokenMap :: String -> Token
tokenMap "+" = Plus
tokenMap "-" = Minus
tokenMap "*" = Mult
tokenMap "/" = Div
tokenMap "(" = Lp
tokenMap ")" = Rp
tokenMap [c]
  | isAlpha c = Var
  | isDigit c = Digit
tokenMap _ = Undefined
classify :: String -> [Token]
classify = map tokenMap . words

Related

Assigning Integers to Variables in Haskell

I'm sorry that this is a beginner question. I am just trying to let x = 5, y = 2, and set all other variables to zero in a function s. The goal is to verify that p computes 5! in the code below, where p represents
y:=1 ; while ¬(x=1) do (y:=y*x; x:=x-1);
type Num = Integer
type Var = String
type Z = Integer
type T = Bool
type State = Var -> Z
data Aexp = N Num | V Var | Add Aexp Aexp | Mult Aexp Aexp | Sub Aexp Aexp deriving (Show, Eq, Read)
data Bexp = TRUE | FALSE | Eq Aexp Aexp | Le Aexp Aexp | Neg Bexp | And Bexp Bexp deriving (Show, Eq, Read)
data Stm = Ass Var Aexp | Skip | Comp Stm Stm | If Bexp Stm Stm | While Bexp Stm deriving (Show, Eq, Read)
p::Stm
p = (Comp(Ass "y" (N 1))(While(Neg(Eq (V "x") (N 1)))(Comp (Ass "y" (Mult (V "y") (V "x")))(Ass "x" (Sub (V "x") (N 1))))))
s :: State {- It has to be Var -> Int-}
s x = 5
s y = 2
When I try to compile this, GHCi reports that the pattern matches overlap. I know this is quite a simple question, but I could not find much information online to solve it. Could you give me any hints? Thanks!

You are pattern matching. This is typically done using different values of the argument type.
didSayHello :: String -> Bool
didSayHello "hello" = True
didSayHello x = False
This matches top-down, reading "if the string argument is 'hello', then True" and "if it is any random String argument (excluding 'hello'), then False"
Your matches are overlapping because in both patterns you're referring to any random String place holder. The one just happened to be called "x" and the other "y".

The two alternatives are doing exactly the same thing: matching on any value passed in, and calling it x. Since the first case matches anything it always succeeds and returns 5.
To do what you seem to be trying to do, you cannot use pattern matching, you need to use equality (via the Eq class):
s :: Var -> Int
s v | v == x = 5
    | v == y = 2
You cannot pattern match against other values, only against patterns. x and y are values when you define them using let, but in the argument position of a function definition x and y are patterns (matching absolutely any input value).
You mentioned that Var is a String, so this won't work if x and y are set with something like let x = 5, y = 2: there is no Num String instance, so x and y won't be Strings and can't be tested for equality with a Var.
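One way to get the state the question describes is to match on string literals, which are patterns, and to finish with a wildcard for every other variable. This is a minimal sketch that reuses the Var, Z and State synonyms from the question; returning 0 for all other names follows the question's "set all other variables to zero".

```haskell
type Var = String
type Z = Integer
type State = Var -> Z

-- "x" and "y" are string literals here, so each clause matches exactly
-- one variable name; the wildcard covers every other name.
s :: State
s "x" = 5
s "y" = 2
s _   = 0
```

Here s "x" is 5, s "y" is 2, and any other name such as s "z" is 0. The clauses no longer overlap, because a literal pattern only matches that one string.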

It's pretty unclear what you are trying to do, but know that assignment does not really exist in Haskell. Perhaps you want to look at
s string =
  let
    x = 5
    y = 2
  in
    -- something using x and y, e.g.
    x + y
In this case s will return 7 no matter what string you pass it, i.e.:
s "alpha" => 7
s "beta" => 7

Convert a string to a list of "grades"

I want to make a function that takes in a string of multiple "grades" of varying length and convert it to a list of grades.
Grade is just a data structure that looks like this (just an arbitrary grading system):
data Grade = A+ | A | A- | B+ | B | B- | P | F
  deriving (Show, Eq)
As you can see, the grades have varying length. If they had length 1 or consistent length, this would have been much easier.
Here is the function that I want to make:
This is what the string input looks like "PA+FABA+B-A"
stringToGrade :: String -> Grade
stringToGrade stringGrade
  | stringGrade == "A+" = A+
  | stringGrade == "A" = A
  -- and so on
extractGrades :: String -> [Grade]
extractGrades stringGrades = case stringGrades of
  [] -> []
  x:y:ys
    | x == "A" && y == "+" -> [stringToGrade (x : y)] : extractGrades ys
    | x == "A" -> [stringToGrade x] : extractGrades y:ys
  -- and so on
As you can see, this is not going anywhere.
Is there an elegant and easy way I can do this instead of hard-coding everything?

We can use pattern matching to match a string prefix. Here's an example:
foo :: String -> [Int]
foo [] = []
foo ('h':'e':'l':'l':'o':rest) = 1 : foo rest
foo ('b':'o':'b':rest) = 2 : foo rest
foo ('b':rest) = 3 : foo rest
foo _ = error "foo: invalid input syntax"
Sample usage:
foo "hellobbobbobhello" ==> [1,3,2,2,1]
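Applied to the grades from the question, the same prefix-matching idea might look like the sketch below. Note the assumption: A+ and A- are not valid Haskell constructor names, so the constructors are renamed here to APlus, AMinus, BPlus and BMinus. The two-character patterns must come before the one-character ones, otherwise 'A' would match first and leave a stray '+'.

```haskell
-- Renamed constructors (an assumption): "A+" cannot be a constructor name.
data Grade = APlus | A | AMinus | BPlus | B | BMinus | P | F
  deriving (Show, Eq)

extractGrades :: String -> [Grade]
extractGrades []             = []
-- Longer prefixes first, so "A+" is not read as "A" followed by "+".
extractGrades ('A':'+':rest) = APlus  : extractGrades rest
extractGrades ('A':'-':rest) = AMinus : extractGrades rest
extractGrades ('A':rest)     = A      : extractGrades rest
extractGrades ('B':'+':rest) = BPlus  : extractGrades rest
extractGrades ('B':'-':rest) = BMinus : extractGrades rest
extractGrades ('B':rest)     = B      : extractGrades rest
extractGrades ('P':rest)     = P      : extractGrades rest
extractGrades ('F':rest)     = F      : extractGrades rest
extractGrades (c:_)          =
  error ("extractGrades: unexpected character " ++ show c)
```

With this, extractGrades "PA+FABA+B-A" gives [P,APlus,F,A,B,APlus,BMinus,A].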

You can split the string into tokens using a combination of the split functions from Data.List.Split (in the split package).
split (keepDelimsR $ oneOf "+-") "PA+FABA+B-A"
creates this form, where the suffixes stay attached to the preceding text:
["PA+","FABA+","B-","A"]
Now you can split this further with a custom splitter:
splitInit [] = []
splitInit [x] = [[x]]
splitInit [x,y] | y `elem` "+-" = [[x,y]] -- keep a grade together with its suffix
splitInit (x:xs) = [x] : splitInit xs
Combining the two gives
concatMap splitInit $ split (keepDelimsR $ oneOf "+-") "PA+FABA+B-A"
["P","A+","F","A","B","A+","B-","A"]
which you can then map to your constructors.

Tokenizer identifier in Haskell

I'm writing this small program to identify each input token as an operator, a parenthesis, or an int.
However, I encountered a problem stating that
Not in scope: data constructor `Integer'
Here's what I have so far (Data.Char only defines isDigit, nothing else)
import Data.Char (isDigit)
data Token = TPlus | TTimes | TParenLeft | TParenRight | TNumber Integer | TError
  deriving (Show, Eq)
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
  | c == '+' = TPlus : tokenize cs
  | c == '*' = TTimes : tokenize cs
  | c == '(' = TParenLeft : tokenize cs
  | c == ')' = TParenRight : tokenize cs
  | isDigit c = TNumber Integer (read c) : tokenize cs
  | otherwise = TError : tokenize cs
Some example expected output:
*Main> tokenize "( 1 + 2 )"
should give
[TParenLeft,TNumber 1,TPlus,TNumber 2,TParenRight]
and
*Main> tokenize "abc"
should expect TError, but I'm getting
[TError,TError,TError]
I'd appreciate if anyone could shed some light on these two issues.

For the Not in scope: data constructor 'Integer' part, the problem is that you have an extra Integer in the line
isDigit c = TNumber Integer (read c) : tokenize cs
which should be
isDigit c = TNumber (read [c]) : tokenize cs
The [c] part is needed because read has type read :: Read a => String -> a, and c is a Char, but [c] is a String containing only the char c.
tokenize "abc" is returning [TError, TError, TError] because of your error treatment policy:
| otherwise = TError : tokenize cs
This leads us to:
tokenize "abc"
-- c = 'a', cs = "bc"
TError : tokenize "bc"
TError : (TError : tokenize "c")
TError : TError : TError : []
[TError, TError, TError]
If you want to group all of your errors into a single TError, then you should drop the TErrors that immediately follow it:
| otherwise = TError : (dropWhile (\o -> o == TError) (tokenize cs))
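Putting both fixes together, a complete version might look like the following sketch. Two things in it go beyond what the answer states and are assumptions: whitespace is skipped (the expected output for "( 1 + 2 )" contains no tokens for the spaces), and span is used to read a whole run of digits, so "12" becomes one TNumber 12 rather than two tokens.

```haskell
import Data.Char (isDigit, isSpace)

data Token = TPlus | TTimes | TParenLeft | TParenRight | TNumber Integer | TError
  deriving (Show, Eq)

tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c:cs)
  | c == '+'  = TPlus       : tokenize cs
  | c == '*'  = TTimes      : tokenize cs
  | c == '('  = TParenLeft  : tokenize cs
  | c == ')'  = TParenRight : tokenize cs
  | isSpace c = tokenize cs                    -- assumption: ignore whitespace
  | isDigit c = let (digits, rest) = span isDigit s
                in TNumber (read digits) : tokenize rest
  -- collapse a run of unrecognised characters into a single TError
  | otherwise = TError : dropWhile (== TError) (tokenize cs)
```

With these definitions, tokenize "( 1 + 2 )" gives [TParenLeft,TNumber 1,TPlus,TNumber 2,TParenRight] and tokenize "abc" gives [TError].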

When constructing a TNumber, you don't need to (and shouldn't) include the types of the constructor's arguments. In addition, read expects a String, so the Char c has to be wrapped in a list. Thus, you need to change this:
| isDigit c = TNumber Integer (read c) : tokenize cs
to this:
| isDigit c = TNumber (read [c]) : tokenize cs

How can I use more guards with small tricks?

When I compile my code in GHCi, there is no problem; it compiles correctly. However, if I try to compile it in Hugs, I get the error "compiled code too complex". I think the problem is due to the many | conditions.
If I change it to use if/else, there is no problem. I could add if/else statements 100 times, but that would be very tiresome and annoying. Instead, I tried to put if/else statements after 20-30 | conditions, but I cannot make | work inside if statements, as shown below:
f x y z
  | cond1 = e1
  | cond2 = e2
  ...
  if (1)
    then
      | cond30 = e30
      | cond31 = e31
      ...
    else
      | cond61 = e61
      | cond62 = e62
How can I fix the code with the least effort? The complete code is on hpaste because it is longer than StackOverflow's question size limit.

Avoiding repetitive guards
Firstly, you can rewrite
function input
  | this && that && third thing && something else = ... -- you only actually needed brackets for (head xs)
  | this && that && third thing && something different = ....
  | this && that && a change && ...
  ...
  | notthis && ....
with
function input | this = function2 input'
               | notthis = function4 input'
function2 input | that = function3 input''
                | notthat = ...
That should simplify your 200 lines of copo code down, but it's still the wrong approach.
Use a function to deal with the same problem just once, not every time
The 4 cases for dealing with operations that you deal with time after time could be replaced with one function, perhaps like:
operation :: Fractional a => Char -> a -> a -> a -- Fractional because of (/)
operation x = case x of
  '+' -> (+)
  '-' -> (-)
  '*' -> (*)
  '/' -> (/)
  _ -> error ("operation: expected an operation (+-*/) but got " ++ [x])
Use list functions instead of testing characters one at a time
You should use some standard functions to help reduce all the single character checks into just grabbing as much number as is there. takeWhile :: (a -> Bool) -> [a] -> [a], so
takeWhile isDigit "354*243" = "354"
takeWhile isDigit "+245" = ""
and there's the corresponding dropWhile:
dropWhile isDigit "354*1111" = "*1111"
dropWhile isDigit "*1111" = "*1111"
So the most dramatic shortening of your code would be to start copo with
copo xs = let
    numText = takeWhile isDigit xs
    theRest = dropWhile isDigit xs
    num = read numText
    ....
  in answer....
but there's a shortcut if you want both takeWhile and dropWhile, called span, because span p xs == (takeWhile p xs, dropWhile p xs):
copo xs = let
    (numText,theRest) = span isDigit xs
    num = read numText
    ....
  in answer....
Use recursion instead of repeating code
You deal with 234 then 234*56 then 234*56/23 then ....
You could replace this with a recursive call to copo, or produce a tree. This depends on whether you're supposed to obey the normal operator precedence (* or / before + or -) or not.
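To make the recursive idea concrete, here is a minimal sketch that combines span with an operation lookup. It is not the original copo (whose code is not shown here), and it rests on several assumptions: the input contains only non-negative integer literals and the four operators with no spaces, results are computed as Double, and operators are applied strictly left to right, ignoring normal precedence. The name evalLeftToRight is made up for this example.

```haskell
import Data.Char (isDigit)

-- Map an operator character to the corresponding function.
operation :: Char -> Double -> Double -> Double
operation c = case c of
  '+' -> (+)
  '-' -> (-)
  '*' -> (*)
  '/' -> (/)
  _   -> error ("operation: expected an operation (+-*/) but got " ++ [c])

-- Evaluate strictly left to right (assumption: no operator precedence).
-- Malformed input such as "2+" makes read fail with a runtime error.
evalLeftToRight :: String -> Double
evalLeftToRight xs =
  let (numText, rest) = span isDigit xs
  in go (read numText) rest
  where
    -- acc is the value so far; each step consumes one operator and one number
    go acc []       = acc
    go acc (op:ys)  =
      let (numText, rest) = span isDigit ys
      in go (operation op acc (read numText)) rest
```

For example, evalLeftToRight "2+3*4" is 20.0 (that is, (2+3)*4), and evalLeftToRight "10/4" is 2.5. Respecting precedence would need a tree or a second level of recursion, as mentioned above.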

If you insist on guards, instead of
foo a b c d
  | cond1, cond2, cond3 = ...
  | cond1, cond2, cond4 = ...
  | cond5, cond6, cond7 = ...
  | cond5, cond6, cond8 = ...
write
foo a b c d
  | cond1, cond2 = case () of
      () | cond3 -> ...
         | cond4 -> ...
  | cond5, cond6 = case () of
      () | cond7 -> ...
         | cond8 -> ...
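A small runnable instance of the case () trick, with made-up conditions (the function describe and its messages are purely illustrative). Note that guards inside a case alternative use -> rather than =.

```haskell
-- The outer guard is checked once; the inner guards refine it.
describe :: Int -> Int -> String
describe x y
  | x > 0, y > 0 = case () of
      () | x > y     -> "both positive, x larger"
         | otherwise -> "both positive, x not larger"
  | otherwise = "not both positive"
```

Here describe 3 2 is "both positive, x larger", describe 2 3 is "both positive, x not larger", and describe 0 5 is "not both positive".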

Do some replacement in Haskell List Comprehensions

My question is: if I pass in a string such as Hello, today is a Nice Day!!, how can I get rid of spaces and punctuation and also replace the uppercase letters with lowercase ones?
I know how to delete elements, but not how to replace them, or how to get rid of the punctuation.
Sorry, I don't know how to work with strings, only numbers.
testList xs = [if x = [,|.|?|!] then " " | x<-xs]

import Data.Char
If you want to convert the punctuation to spaces and the characters from upper case to lower case:
testList xs = [if x `elem` ",.?!" then ' ' else toLower x | x<-xs]
Example: testList "TeST,LiST!" == "test list "
If you want to delete the punctuation and convert the characters from upper case to lower case:
testList2 xs = [toLower x | x<-xs, not (x `elem` ",.?!")]
Example: testList2 "Te..S,!t LiS?T" == "test list"
If you don't want to or cannot import Data.Char, this is an implementation of toLower (for ASCII letters only):
toLower' :: Char -> Char
toLower' char
  | isNotUppercase = char -- no change required
  | otherwise = toEnum (codeChar + diffLowerUpperChar) -- char lowered
  where
    codeChar = fromEnum char -- each character has a numeric code
    code_A = 65
    code_Z = 90
    code_a = 97
    isNotUppercase = codeChar < code_A || codeChar > code_Z
    diffLowerUpperChar = code_a - code_A

I haven't written any Haskell in a long time, but the following should replace the invalid characters with a space and also convert the characters from uppercase to lowercase:
import Data.Char
replace invalid xs = [if elem x invalid then ' ' else toLower x | x <- xs]
Another way of doing the same:
repl invalid [] = []
repl invalid (x:xs) | elem x invalid = ' ' : repl invalid xs
                    | otherwise = toLower x : repl invalid xs
You can call the replace (or repl) function like this:
replace ",.?!" "Hello, today is a Nice Day!!"
The above code will return (each punctuation character becomes its own space):
"hello  today is a nice day  "
Edit: I'm using the toLower function from Data.Char, but if you want to write it yourself, that question has been asked before on Stack Overflow.

You will find the functions you need in Data.Char:
import Data.Char
process str = [toLower c | c <- str , isAlpha c]
Though personally, I think the function compositional approach is clearer:
process = map toLower . filter isAlpha

To get rid of the punctuation you can use a filter like this one
[x | x<-[1..10], x `mod` 2 == 0]
The "if" you are using won't filter. Putting an if in the "map" part of a list comprehension only serves to choose between two options; you can't filter elements out there.
As for converting things to lowercase, it's the same trick you can already pull off with numbers:
[x*2 | x <- [1..10]]

Here's a version without importing modules, using fromEnum and toEnum to choose which characters to allow:
testList xs =
  filter (\x -> elem (fromEnum x) ([97..122] ++ [32] ++ [48..57])) $ map toLower' xs
  where toLower' x = if elem (fromEnum x) [65..90]
                       then toEnum (fromEnum x + 32)::Char
                       else x
OUTPUT:
*Main> testList "Hello, today is a Nice Day!!"
"hello today is a nice day"
For a module-less replace function, something like this might work:
myReplace toReplace xs = map myReplace' xs where
  myReplace' x
    | elem (fromEnum x) [65..90] = toEnum (fromEnum x + 32)::Char
    | elem x toReplace = ' '
    | otherwise = x
OUTPUT:
*Main> myReplace "!," "Hello, today is a Nice Day!! 123"
"hello  today is a nice day   123"

Using Applicative Style
A verbatim quote from the book "Learn You a Haskell for Great Good!":
Using the applicative style on lists is often a good replacement for
list comprehensions. In the second chapter, we wanted to see all the
possible products of [2,5,10] and [8,10,11], so we did this:
[ x*y | x <- [2,5,10], y <- [8,10,11]]
We're just drawing from two lists and applying a function between
every combination of elements. This can be done in the applicative
style as well:
(*) <$> [2,5,10] <*> [8,10,11]
This seems clearer to me, because it's easier to see that we're just
calling * between two non-deterministic computations. If we wanted all
possible products of those two lists that are more than 50, we'd just
do:
filter (>50) $ (*) <$> [2,5,10] <*> [8,10,11]
-- [55,80,100,110]
Functors, Applicative Functors and Monoids
