Tokenizer identifier in Haskell


I'm writing a small program that classifies each input token as an operator, a parenthesis, or an integer.
However, it fails to compile with the error
Not in scope: data constructor `Integer'
Here's what I have so far (I import only isDigit from Data.Char, nothing else):
import Data.Char (isDigit)
data Token = TPlus | TTimes | TParenLeft | TParenRight | TNumber Integer | TError
  deriving (Show, Eq)
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
  | c == '+' = TPlus : tokenize cs
  | c == '*' = TTimes : tokenize cs
  | c == '(' = TParenLeft : tokenize cs
  | c == ')' = TParenRight : tokenize cs
  | isDigit c = TNumber Integer (read c) : tokenize cs
  | otherwise = TError : tokenize cs
Some example expected output:
*Main> tokenize "( 1 + 2 )"
should give
[TParenLeft,TNumber 1,TPlus,TNumber 2,TParenRight]
and
*Main> tokenize "abc"
should give a single TError, but I'm getting
[TError,TError,TError]
I'd appreciate it if anyone could shed some light on these two issues.

For the Not in scope: data constructor 'Integer' part, the problem is that you have an extra Integer in the line
isDigit c = TNumber Integer (read c) : tokenize cs
which should be
isDigit c = TNumber (read [c]) : tokenize cs
The [c] part is needed because read has type read :: Read a => String -> a, and c is a Char, but [c] is a String containing only the char c.
tokenize "abc" returns [TError, TError, TError] because of how the otherwise branch handles errors:
| otherwise = TError : tokenize cs
Every unrecognised character produces its own TError, so the evaluation goes:
tokenize "abc"
-- c = 'a', cs = "bc"
TError : tokenize "bc"
TError : (TError : tokenize "c")
TError : TError : TError : []
[TError, TError, TError]
If you want to collapse a run of errors into a single TError, drop the TErrors that immediately follow the one you emit:
| otherwise = TError : dropWhile (== TError) (tokenize cs)
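Putting both fixes together, here is a minimal sketch of the whole tokenizer. It goes beyond the question in two ways, both of which are my assumptions about what is wanted: whitespace is skipped (the expected output for "( 1 + 2 )" contains no tokens for the spaces), and a run of digits is read as one number by using span rather than one digit at a time.

```haskell
import Data.Char (isDigit, isSpace)

data Token = TPlus | TTimes | TParenLeft | TParenRight | TNumber Integer | TError
  deriving (Show, Eq)

tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c:cs)
  | c == '+'  = TPlus : tokenize cs
  | c == '*'  = TTimes : tokenize cs
  | c == '('  = TParenLeft : tokenize cs
  | c == ')'  = TParenRight : tokenize cs
  -- Assumption: whitespace only separates tokens and produces no token itself.
  | isSpace c = tokenize cs
  -- Assumption: consecutive digits form a single number.
  | isDigit c = let (digits, rest) = span isDigit s
                in TNumber (read digits) : tokenize rest
  -- Collapse a run of errors into one TError.
  | otherwise = TError : dropWhile (== TError) (tokenize cs)
```

With this version, tokenize "( 1 + 2 )" gives [TParenLeft,TNumber 1,TPlus,TNumber 2,TParenRight] and tokenize "abc" gives [TError]. Because whitespace is skipped before the errors are collapsed, "a b" also yields a single TError.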

When constructing a TNumber, you don't need to (and shouldn't) include the types of the constructor's arguments. Thus, you need to change this:
| isDigit c = TNumber Integer (read c) : tokenize cs
to this:
| isDigit c = TNumber (read [c]) : tokenize cs
Note the [c]: read expects a String, so the Char c has to be wrapped in a one-element list.

Related

Haskell: Replace a subString in a String without Data.List package

I'm very new to Haskell and I want to know how I can replace a given word in a String with another word. This is my code so far; I know it can't work as written:
treat :: String -> String
treat text = text
main :: IO ()
main = do argv <- getArgs
          texte <- readFile "intputText"
          print (separation text)
          print (treat text)
separation :: String -> [String]
separation [] = [""]
separation (c:cs) | c == "\Graph" = "Graphic : " : rest
                  | c == '}' = "" : rest
                  | c == '{' = "" : rest
                  | otherwise = (c : head rest) : tail rest
  where rest = separation cs
I know I can't compare c with a String in the first guard (c == "\Graph"), so I want to know
how I can replace every occurrence of the word "\Graph" in my String text with "Graphic".
I want to be able to do that without importing any package.
If anyone can help me out I'd really appreciate it.
Thank you very much!

replace :: String -> String -> String -> String
replace [] _ _ = []
-- an empty token can never match, so return the input unchanged
replace str [] _ = str
replace str@(s:ss) token@(t:_) repl
  -- check whether the first char of the string equals the first char of the token
  | s == t = case validateToken token str of
      Just rest -> repl ++ replace rest token repl
      Nothing   -> s : replace ss token repl
  -- if not equal, continue with the rest of the string
  | otherwise = s : replace ss token repl
  where
    -- check whether the token matches the chars at the front of the string:
    -- returns Nothing if the token does not match,
    -- returns the remaining string after the token if it does
    validateToken :: String -> String -> Maybe String
    validateToken (_:_) [] = Nothing
    validateToken [] list = Just list
    validateToken (a:as) (x:xs)
      | a == x = validateToken as xs
      | otherwise = Nothing
example = replace "yourString" "token" "new"
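The same idea can be written a little more compactly by separating "does the token match here?" from the traversal. The following is only a sketch: dropPrefix and replaceAll are hypothetical names, and the argument order (token, replacement, text) is my choice, not the answer's. dropPrefix does the job of stripPrefix from Data.List, written out by hand because the question asks for a solution without imports.

```haskell
-- Hypothetical helper: remove `prefix` from the front of a string.
-- Returns Nothing when the string does not start with the prefix.
dropPrefix :: String -> String -> Maybe String
dropPrefix [] rest = Just rest
dropPrefix _ [] = Nothing
dropPrefix (p:ps) (c:cs)
  | p == c    = dropPrefix ps cs
  | otherwise = Nothing

-- Replace every occurrence of `token` in the text with `repl`.
replaceAll :: String -> String -> String -> String
replaceAll [] _ str = str          -- an empty token never matches
replaceAll _ _ [] = []
replaceAll token repl str@(c:cs) =
  case dropPrefix token str of
    Just rest -> repl ++ replaceAll token repl rest
    Nothing   -> c : replaceAll token repl cs
```

For the case in the question, replaceAll "\\Graph" "Graphic" text does the substitution; the backslash has to be escaped inside a Haskell string literal.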

Convert a string to a list of "grades"

I want to write a function that takes a string of multiple "grades" of varying length and converts it to a list of grades.
Grade is just a data structure that looks like this (just an arbitrary grading system):
data Grade = A+ | A | A- | B+ | B | B- | P | F
deriving (Show, Eq)
As you can see, the grades have varying length. If they had length 1 or consistent length, this would have been much easier.
Here is the function that I want to make:
This is what the string input looks like "PA+FABA+B-A"
stringToGrade :: String -> Grade
stringToGrade stringGrade
  | stringGrade == "A+" = A+
  | stringGrade == "A" = A
  -- and so on
extractGrades :: String -> [Grade]
extractGrades stringGrades = case stringGrades of
  [] -> []
  x:y:ys
    | x == "A" && y == "+" -> [stringToGrade (x : y)] : extractGrades ys
    | x == "A" -> [stringToGrade x] : extractGrades y:ys
    -- and so on
As you can see, this is not going anywhere.
Is there an elegant and easy way I can do this instead of hard-coding everything?

We can use pattern matching to match a string prefix. Here's an example:
foo :: String -> [Int]
foo [] = []
foo ('h':'e':'l':'l':'o':rest) = 1 : foo rest
foo ('b':'o':'b':rest) = 2 : foo rest
foo ('b':rest) = 3 : foo rest
foo _ = error "foo: invalid input syntax"
Sample usage:
foo "hellobbobbobhello" ==> [1,3,2,2,1]
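Applied to the grades from the question, the prefix-matching approach could look like the sketch below. Constructor names cannot contain + or -, so the names APlus, AMinus, BPlus and BMinus are my substitutes for A+, A-, B+ and B-. The two-character patterns must come before the one-character ones, otherwise 'A' would match first and the '+' would be left over.

```haskell
data Grade = APlus | A | AMinus | BPlus | B | BMinus | P | F
  deriving (Show, Eq)

extractGrades :: String -> [Grade]
extractGrades [] = []
-- Longer prefixes first, so that "A+" is not read as A followed by '+'.
extractGrades ('A':'+':rest) = APlus  : extractGrades rest
extractGrades ('A':'-':rest) = AMinus : extractGrades rest
extractGrades ('A':rest)     = A      : extractGrades rest
extractGrades ('B':'+':rest) = BPlus  : extractGrades rest
extractGrades ('B':'-':rest) = BMinus : extractGrades rest
extractGrades ('B':rest)     = B      : extractGrades rest
extractGrades ('P':rest)     = P      : extractGrades rest
extractGrades ('F':rest)     = F      : extractGrades rest
extractGrades (c:_) = error ("extractGrades: unexpected character " ++ show c)
```

For the sample input, extractGrades "PA+FABA+B-A" gives [P,APlus,F,A,B,APlus,BMinus,A].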

You can split the string into tokens using a combination of the split functions from Data.List.Split (the split package).
split (keepDelimsR $ oneOf "+-") "PA+FABA+B-A"
will create this form, where the suffixes stay attached to the preceding chunk:
["PA+","FABA+","B-","A"]
Now you can split each chunk further with a custom splitter that keeps the last two characters together only when the second one is a suffix:
splitInit [] = []
splitInit [x] = [[x]]
splitInit [x,y] | y `elem` "+-" = [[x,y]]
splitInit (x:xs) = [x] : splitInit xs
Combining the two gives you
concatMap splitInit $ split (keepDelimsR $ oneOf "+-") "PA+FABA+B-A"
["P","A+","F","A","B","A+","B-","A"]
which you can then map over to turn each string into the matching constructor.

how to replace a letter in string with Haskell

I have to write a Haskell function called markDups that processes a string, replacing all repeated occurrences of a character with the underscore character, '_'.
Here is the code I have so far.
makeBar :: Char -> [Char] -> [Char]
makeBar c (x:xs) | c == x = '_' : makeBar c xs    -- turn into a "_"
                 | otherwise = x : makeBar c xs   -- ignore and move on
when I run this, here is my output with error message
output should be like this
what should I do?

This seems to work:
import Data.Set (Set, empty, insert, member)
main = putStrLn (markDups "hello world" empty)
-- the Set holds the characters seen so far
markDups :: [Char] -> Set Char -> [Char]
markDups [] _ = []
markDups (x:rest) set
  | member x set = '_' : markDups rest set
  | otherwise = x : markDups rest (insert x set)
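If the extra Set argument is inconvenient for callers, it can be hidden in a local helper. This is just a sketch of that variation; the helper name go and the qualified import are my choices, not part of the answer above.

```haskell
import qualified Data.Set as Set

markDups :: String -> String
markDups = go Set.empty
  where
    -- `seen` holds every character encountered so far.
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = '_' : go seen xs
      | otherwise           = x : go (Set.insert x seen) xs
```

Here markDups "hello world" evaluates to "hel_o w_r_d": the space is kept because it only occurs once.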

Having trouble with isUpper function in use

Is it OK to write the otherwise part this way? The function should lower the uppercase letters and put the space in front. It keeps giving an error.
functionl s
  | s==[] = error "empty"
  | otherwise = [ if isUpper c then (" " ++ toLower c) else c | c <-read s::[Char] ]

First, note that (" " ++ toLower c) would be a String ([Char]) if it type checked, but it doesn't: toLower c is a Char, and ++ needs a list on both sides.
Second, the else branch of this list comprehension returns c, which is a single Char.
Both branches of an if must have the same type.
This might be a suitable replacement: concat [ if isUpper c then " " ++ [toLower c] else [c] | c <- s ]
It also iterates over s directly; read s is not needed because s is already a String.

Your list comprehension is almost right, as @Arnon has shown, but you could definitely implement this function more easily using recursion:
import Data.Char (isUpper, toLower)
-- A descriptive name and a type signature help
-- tell other programmers what this function does
camelCaseToWords :: String -> String
camelCaseToWords [] = []
camelCaseToWords (c:cs)
  | isUpper c = ' ' : toLower c : camelCaseToWords cs
  | otherwise = c : camelCaseToWords cs
Now, this pattern can be abstracted to use a fold, which is Haskell's equivalent of a basic for-loop:
camelCaseToWords cs = foldr replacer [] cs
  where
    replacer c xs
      | isUpper c = ' ' : toLower c : xs
      | otherwise = c : xs
Here each step of the iteration is performed by replacer, which takes the current character c, an accumulated value xs and returns a new value to be used in the next iteration. The fold is seeded with an initial value of [], and then performed over the entire string.
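The list comprehension wrapped in concat from the first answer can also be written with concatMap, which maps each character to a short string and joins the results. A minimal sketch, where the helper name expand is my own:

```haskell
import Data.Char (isUpper, toLower)

camelCaseToWords :: String -> String
camelCaseToWords = concatMap expand
  where
    -- Each character becomes a one- or two-character string.
    expand c
      | isUpper c = [' ', toLower c]
      | otherwise = [c]
```

For example, camelCaseToWords "helloWorldFoo" gives "hello world foo". As with the recursive and foldr versions, an input that starts with an uppercase letter produces a leading space.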

Classify lexeme of a input String

Given a String "3 + a * 6", how do I classify its lexemes one by one? I know that my code is missing the recursive classify xs part, but I don't know where to put it. Can anyone help me with this?
(the language is in Haskell)
classify :: String -> String
classify (x:xs)
  | x == '+' = "PLUS"
  | x == '-' = "MINUS"
  | x == '*' = "MULT"
  | x == '/' = "DIV"
  | x == '(' = "LP"
  | x == ')' = "RP"
  | isAlpha x = "VAR"
  | isDigit x = "CONST"
  | otherwise = error "Cannot determine lexeme"

This kind of tokenisation is best left to lexer generators or parser combinators. You can try Alex, at http://www.haskell.org/alex/ , or Parsec, at http://www.haskell.org/haskellwiki/Parsec .
These tools are designed specifically to make tokenisation/scanning (and parsing, in the case of Parsec) easy to use.

If you really only need a tokenizer, here's how you could do it without Parsec. I defined an additional ADT for the token types (you can of course convert them back to strings) and had to change the return type, since you get a sequence of tokens.
import Data.Char (isAlpha, isDigit, isSeparator)
type Error = String
data Token = Plus | Minus | Mult | Div | Lp | Rp
           | Var | Const | Whitespace deriving (Show, Eq)
tokenTable = [('+', Plus), ('-', Minus), ('*', Mult), ('/', Div), ('(', Lp), (')', Rp)]
tokenize :: String -> Either Error [Token]
tokenize "" = Right []
tokenize (x:xs) = case lookup x tokenTable of
    Just t -> fmap (t:) (tokenize xs)
    Nothing -> recognize x
  where
    recognize x
      | isAlpha x = fmap (Var:) (tokenize xs)
      | isDigit x = fmap (Const:) (tokenize xs)
      | isSeparator x = fmap (Whitespace:) (tokenize xs)
      | otherwise = Left "Cannot determine lexeme"
However, this quickly becomes tedious. It already is, in a way, since we have to lift the list consing into Either using fmap. Imagine how you would report the location of an error. Going further essentially means building a monad stack and reimplementing a parser combinator library like Parsec. That's why it's often recommended to use a combinator library directly, and to let it do the lexing as well.
And if you can't or don't want to use full Parsec, it's not too difficult to implement the basic functionality by yourself.
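To give a feel for the error-location point, here is a sketch of the hand-written tokenizer extended to report where it failed. The name tokenizeWithPos, the 0-based column, and the decision to skip whitespace instead of emitting a Whitespace token are all my assumptions, not part of the answer above.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

data Token = Plus | Minus | Mult | Div | Lp | Rp | Var | Const
  deriving (Show, Eq)

tokenTable :: [(Char, Token)]
tokenTable = [('+', Plus), ('-', Minus), ('*', Mult), ('/', Div), ('(', Lp), (')', Rp)]

-- On failure, Left carries the 0-based column of the offending character
-- together with a message.
tokenizeWithPos :: String -> Either (Int, String) [Token]
tokenizeWithPos = go 0
  where
    go _ [] = Right []
    go pos (x:xs)
      | Just t <- lookup x tokenTable = (t :) <$> go (pos + 1) xs
      | isAlpha x = (Var :) <$> go (pos + 1) xs
      | isDigit x = (Const :) <$> go (pos + 1) xs
      | isSpace x = go (pos + 1) xs   -- assumption: whitespace yields no token
      | otherwise = Left (pos, "Cannot determine lexeme: " ++ show x)
```

tokenizeWithPos "3 + a * 6" gives Right [Const,Plus,Var,Mult,Const], while tokenizeWithPos "3 + ?" gives Left (4,"Cannot determine lexeme: '?'"). Even this small addition means threading the position through every branch by hand, which is the bookkeeping a combinator library takes over.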

You don't need to tokenize the spaces at all. Here is a combination of your solution and phg's, which assumes the tokens are separated by whitespace:
import Data.Char
data Token = Plus | Minus | Mult | Div | Lp | Rp | Var | Digit | Undefined
  deriving Show
tokenMap :: String -> Token
tokenMap "+" = Plus
tokenMap "-" = Minus
tokenMap "*" = Mult
tokenMap "/" = Div
tokenMap "(" = Lp
tokenMap ")" = Rp
tokenMap [c]
  | isAlpha c = Var
  | isDigit c = Digit
-- anything else, including multi-character words, falls through to here
tokenMap _ = Undefined
classify :: String -> [Token]
classify = map tokenMap . words
