I'm a newbie in Haskell and I'd like some opinions about improving this script. This is a code generator and requires a command line argument to generate the sql script.
./GenCode "people name:string age:integer"
Code:
import Data.List
import System.Environment (getArgs)
create_table :: String -> String
create_table str = "CREATE TABLE " ++ h (words str)
  where h (x:xs) = let cab = x
                       final = xs
                   in x ++ "( " ++ create_fields xs ++ ")"
create_fields (x:xs) = takeWhile (/=':') x ++ type x ++ sig
  where sig | length xs > 0 = "," ++ create_fields xs
            | otherwise = " " ++ create_fields xs
create_fields [] = ""
type x | isInfixOf "string" x = " CHARACTER VARYING"
       | isInfixOf "integer" x = " INTEGER"
       | isInfixOf "date" x = " DATE"
       | isInfixOf "serial" x = " SERIAL"
       | otherwise = ""
main = mainWith
  where mainWith = do
          args <- getArgs
          case args of
            [] -> putStrLn $ "You need one argument"
            (x:xs) -> putStrLn $ (create_table x)
I think you understand how to write functional code already. Here are some small style notes:
Haskell usually uses camelCase, not under_score_separation
In create_table, cab and final are not used.
Usually a list-recursive function like create_fields puts the empty list case first.
I would not make create_fields recursive anyway. The comma-joining code is quite complicated and should be separated from the typing code. Instead do something like Data.List.intercalate "," (map create_field xs). Then create_field x can just be takeWhile (/=':') x ++ type x
Especially if there are a lot of types to be translated, you might put them into a map
Like so:
types = Data.Map.fromList [("string", "CHARACTER VARYING")
                          ,("integer", "INTEGER")
                          -- etc
                          ]
Then type can be Data.Maybe.fromMaybe "" (Data.Map.lookup x types)
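Putting these notes together, here is a minimal sketch of how the generator might look. It is only one possible arrangement: the names createField, createTable and sqlTypes are made up for illustration, and it swaps Data.Map for a plain association list with the Prelude's lookup so that it needs nothing beyond base.

```haskell
import Data.List (intercalate)
import Data.Maybe (fromMaybe)

-- Hypothetical lookup table: only the four type names from the question.
sqlTypes :: [(String, String)]
sqlTypes =
  [ ("string",  "CHARACTER VARYING")
  , ("integer", "INTEGER")
  , ("date",    "DATE")
  , ("serial",  "SERIAL")
  ]

-- Turn one "name:type" argument into a column definition.
-- Unknown types fall back to "", as in the original code.
createField :: String -> String
createField field = name ++ " " ++ fromMaybe "" (lookup typeName sqlTypes)
  where
    name     = takeWhile (/= ':') field
    typeName = drop 1 (dropWhile (/= ':') field)

-- The first word is the table name; the remaining words are the fields.
createTable :: String -> String
createTable str = case words str of
  []               -> ""
  (table : fields) ->
    "CREATE TABLE " ++ table ++ " (" ++ intercalate ", " (map createField fields) ++ ")"
```

With the example argument from the question, createTable "people name:string age:integer" should give CREATE TABLE people (name CHARACTER VARYING, age INTEGER).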
Code can appear in any order, so it's nice to have main up front. (This is personal preference though)
You don't need mainWith.
Just say
main = do
  args <- getArgs
  case args of
    [] -> ...
You don't need the dollar for the calls to putStrLn. In the first call, the argument wouldn't require parentheses anyway, and in the second, you supply the parentheses. Alternatively, you could keep the second dollar and drop the parentheses.
Don't use length xs > 0 (in sig); it needlessly counts the length of xs when all you really wanted to know is whether it's empty. Use null xs to check for an empty list:
...
  where sig | null xs = ... -- Empty case
            | otherwise = ... -- Non-empty case
or add an argument to sig and pattern match:
...
  where sig (y:ys) = ...
        sig [] = ...
Although Nathan Sanders' advice to replace the whole recursive thing with intercalate is excellent and makes this a moot point.
You're also identifying the type by passing the whole "var:type" string into type, so it is testing
"string" `isInfixOf` "name:string"
etc.
You could use break or span instead of takeWhile to separate the name and type earlier:
create_fields (x:xs) = xname ++ type xtype ++ sig
  where
    (xname, _:xtype) = break (==':') x
    sig = ...
and then type can compare for string equality, or look up values using a Map.
A quick explanation of that use of break:
break (==':') "name:string" == ("name", ":string")
Then when binding
(xname, _:xtype) to ("name", ":string"),
xname -> "name"
_ -> ':' (discarded)
xtype -> "string"
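One caveat with the (xname, _:xtype) pattern: it fails at run time when a field contains no colon at all. As a hedged sketch (splitField is a made-up helper name, not part of the original code), the split can be made total by dropping the separator with drop 1 instead of pattern matching on it:

```haskell
-- Split "name:type" at the first ':'.
-- A field without ':' yields an empty type instead of a pattern-match failure.
splitField :: String -> (String, String)
splitField field = (name, drop 1 rest)
  where
    (name, rest) = break (== ':') field
```

For example, splitField "name:string" gives ("name","string") and splitField "name" gives ("name","").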
For my first line of Haskell I thought it'd be a nice case to produce a "natural listing" of items (of which the type supports show to get a string representation). By "natural listing" I mean summing up all items separated with , except the last one, which should read and lastitem. Ideally, I'd also like to not have a , before the "and".
To spice it up a bit (to show off the compactness of haskell), I wanted to have an "inline" solution, such that I can do
"My listing: " ++ ... mylist ... ++ ", that's our listing."
(Obviously for "production" making a function for that would be better in all ways, and allow for recursion naturally, but that's the whole point of my "inline" restriction for this exercise.)
For now I came up with:
main = do
  -- hello
  nicelist
nicelist = do
  let is = [1..10]
  putStrLn $ "My listing: " ++ concat [ a++b | (a,b) <- zip (map show is) (take (length is -1) $ repeat ", ") ++ [("and ", show $ last is)]] ++ ", that's our listing."
  let cs = ["red", "green", "blue", "yellow"]
  putStrLn $ "My listing: " ++ concat [ a++b | (a,b) <- zip (map show cs) (take (length cs -1) $ repeat ", ") ++ [("and ", show $ last cs)]] ++ ", that's our listing."
but this hardly seems optimal or elegant.
I'd love to hear your suggestions for a better solution.
EDIT:
Inspired by the comments and answer, I dropped the inline requirement and came up with the following, which seems pretty sleek. Would that be about as "haskellic" as we can get, or would there be improvements?
main = do
  putStrLn $ "My listing: " ++ myListing [1..10] ++ ", that's the list!"
  putStrLn $ "My listing: " ++ myListing ["red", "green", "blue", "yellow"] ++ ", that's the list!"
myListing :: (Show a) => [a] -> String
myListing [] = "<nothing to list>"
myListing [x] = "only " ++ (show x)
myListing [x, y] = (show x) ++ " and " ++ (show y)
myListing (h:t) = (show h) ++ ", " ++ myListing t
Here's how I would write it:
import Data.List
niceShow' :: [String] -> String
niceShow' [] = "<empty>"
niceShow' [a] = a
niceShow' [a, b] = a ++ " and " ++ b
niceShow' ls = intercalate ", " (init ls) ++ ", and " ++ last ls
niceShow :: [String] -> String
niceShow ls = "My listing: " ++ niceShow' ls ++ ", that's our listing."
niceList :: IO ()
niceList = do
  putStrLn $ niceShow $ show <$> [1..10]
  putStrLn $ niceShow ["red", "green", "blue", "yellow"]
Steps:
Create niceShow to create your string
Replace list comprehensions with good old function calls
Know about intercalate and init
Add type signatures to top levels
Format nicely
niceShow can only be inlined if you know the size of the list beforehand, otherwise, you'd be skipping the edge cases.
Another way to state the rules for punctuating a list (without an Oxford comma) is this:
Append a comma after every element except the last two
Append “and” after the second-to-last element
Leave the final element unchanged
This can be implemented by zipping the list with a “pattern” list containing the functions to perform the modifications, which repeats on one end. We want something like:
repeat (<> ",") <> [(<> " and"), id]
But of course this is just an infinite list of the comma function, so it will never get past the commas and on to the “and”. One solution is to reverse both the pattern list and the input list, and use zipWith ($) to combine them. But we can avoid the repeated reversals by using foldr to zip “in reverse” (actually, just right-associatively) from the tail end of the input. Then the result is simple:
punctuate :: [String] -> [String]
punctuate = zipBack
  $ [id, (<> " and")] <> repeat (<> ",")
zipBack :: [a -> b] -> [a] -> [b]
zipBack fs0 = fst . foldr
  (\ x (acc, f : fs) -> (f x : acc, fs))
  ([], fs0)
Example uses:
> test = putStrLn . unwords . punctuate . words
> test "this"
this
> test "this that"
this and that
> test "this that these"
this, that and these
> test "this that these those them"
this, that, these, those and them
There are several good ways to generalise this:
zipBack is partial—it assumes the function list is infinite, or at least as long as the string list; consider different ways you could make it total, e.g. by modifying fs0 or the lambda
The punctuation and conjunction can be made into parameters, so you could use e.g. semicolons and “or”
zipBack could work for more general types of lists, Foldable containers, and functions (i.e. zipBackWith)
String could be replaced with an arbitrary Semigroup or Monoid
There’s also a cute specialisation possible—if you want to add the option to include an Oxford comma, its presence in the “pattern” (function list) depends on the length of the final list, because it should not be included for lists of 2 elements. Now, if only we could refer to the eventual result of a computation while computing it…
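As a rough sketch of the first two generalisations (one possible answer among several, not the only one): the version below makes zipBack total by leaving elements unchanged once the function list runs out, which narrows its type to [a -> a], and turns the separator and conjunction into parameters. The name punctuateWith is an assumption made for this example.

```haskell
-- Total variant of zipBack: when the function list is exhausted,
-- the remaining (leftmost) elements are kept as they are.
zipBack :: [a -> a] -> [a] -> [a]
zipBack fs0 = fst . foldr step ([], fs0)
  where
    step x (acc, f : fs) = (f x : acc, fs)
    step x (acc, [])     = (x : acc, [])

-- Separator and conjunction as parameters (hypothetical name).
punctuateWith :: String -> String -> [String] -> [String]
punctuateWith sep conj =
  zipBack ([id, (<> (" " <> conj))] <> repeat (<> sep))
```

For instance, unwords (punctuateWith ";" "or" (words "a b c")) evaluates to "a; b or c".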
I want to know how to make modifications to a text that is full of special characters and codes and replace those codes with strings.
I have the following text:
text=
"#chomsky/syntactic structures/chomskySynt/: published in 1957. #bloomfield/language/bloomfieldLan/: published in 1933. #chomsky/aspects of a theory of syntax/chomskyAsp/: published in 1965. ... #see/chomskySynt/ is considered the starting point of generative linguistics.... Another hypothesis was introduced in #see/chomskyAsp/."
I want to turn it into=
"Chomsky 1: Syntactic structures : published in 1957. Bloomfield 1: Language : published in 1933. Chomsky 2: Aspects of a theory of syntax : published in 1965. ... Chomsky 1 is considered the starting point of generative linguistics ... Another hypothesis was introduced in Chomsky 2..."
Explanation of the special characters and codes: the information on a book starts with # followed by the name of the author (chomsky for example) followed by / then title of the book / then the special code for the book (chomskyAsp) then /
The citation of a book starts with #see followed by / then the special code of the book (e.g. chomskySynt) then /
The modifications are:
To count how many times an author is cited and concatenate the number to the name: Chomsky 1, for example.
Author name will start with a capital letter
Remove the special code : chomskySynt which serves only as an identification code.
Replace the reference #see/chomskyAsp/ with Chomsky 2; that is, replace the reference with the actual author and number.
Here is my code:
removeSlash = myReplace "/" " " text
removeDash = map lines $ (filter(any isLetter) . groupBy ( (==) `on` (=='#'))) $ removeSlash
flattenList= concat removeDash
splitIntoWords = map words flattenList
And here is the myReplace function:
myReplace _ _ [] = []
myReplace a b s@(x:xs) = if isPrefixOf a s
                         then b ++ myReplace a b (drop (length a) s)
                         else x : myReplace a b xs
Here is the result so far:
[["chomsky syntactic structures chomskySynt published in 1957. "], ["bloomfield language bloomfieldLan published in 1933. "],["chomsky aspects of a theory of syntax chomskyAsp published in 1965. ... "],["see chomskySynt is considered the starting point of generative linguistics.... Another hypothesis was introduced in "],["see chomskyAsp"]]
The reason I flattened the list and split it into words is now if I do:
map head splitIntoWords
I get ["chomsky","bloomfield","chomsky","see","see"]
I am stuck at this stage. How do I count how many times an author is cited and concatenate the number to the name. I thought of using the zip function:
zipChomsky =zip [1, 2][x | x <- diviser,(head x) == "chomsky"]
This gives:
[(1,["chomsky","syntactic","structures","chomskySynt","published","in","1957."]),(2,["chomsky","aspects","of","a","theory","of","syntax","chomskyAsp","published","in","1965.","..."])]
But the result is very different from: Chomsky 1: ...
EDIT: I didn't mean to make the answer this long, but the problem turned out a non-trivial task, and I'm not quite sure how much detail I should put in the answer. In case you understand all the tools I'm using, the full code is just at the end of this answer.
In your case, you'll need:
an approach to parse your input document
a suitable data structure to store the input information
displaying the data as output format
For the parsing part, perhaps Regex is enough (maybe), but I guess the Parsec library is a better choice. For detailed usage of Parsec please refer to the link, and I'll only try to show how to use it in your case:
First, import Text.ParserCombinators.Parsec.
A document is a list of
a literal string
a definition, with format #<Author>/<Title>/<Code>/, as in "#chomsky/syntactic structures/chomskySynt/"
a citation, with format #see/<Code>/, as in "#see/chomskyAsp/"
Hence we define
data Index = Index {
    getAuthor :: String,
    getTitle :: String,
    getSpecialCode :: String,
    getAuthorCount :: Int
    -- For counting author later.
  } deriving (Show)
data Content = Def Index
             | Cite String Index
             -- We'll fill in Index later.
             | Literal String
             deriving (Show)
and our input document will just be turned into [Content].
Correspondingly, we'll use the following function (actually, parser) to parse the input:
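-- cite is tried before def: "#see/<Code>/" also begins like a definition,
-- so def would otherwise claim it.
document = many (try cite <|> try def <|> literal)
literal = Literal <$> many1 (noneOf "#")
def = do
  char '#'
  author <- many1 $ noneOf "/"
  char '/'
  title <- many1 $ noneOf "/"
  char '/'
  code <- many1 $ noneOf "/"
  char '/'
  -- The author count is filled in later, so start it at 0.
  return $ Def $ Index author title code 0
cite = do
  try $ string "#see/"
  code <- many1 $ noneOf "/"
  char '/'
  return $ Cite code nullIndex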
A short explanation:
A document is many (def or cite or literal), with operator <|> combining parsers.
A literal is a string, stopping at '#', with at least 1 char (using many1); a parser inside many should not accept empty input, think of why!
A def is #<Author>/<Title>/<Code>/, and we can write in do-notation since Parser is a monad.
A cite goes similarly.
def, cite, and string "#see/" each parse multiple characters, so they can fail after having consumed some input; therefore, we wrap them in the combinator try, which backtracks on failure.
By the way, nullIndex is just a placeholder before we actually fill this record:
nullIndex :: Index
nullIndex = Index "" "" "" 0
Now we only need a function with signature [Content] -> String.
We can start with capitalizing the author name:
capitalizeAuthor :: Content -> Content
capitalizeAuthor (Def x) = Def (x {getAuthor = author'}) where
  author' = toUpper (head author) : tail author
  author = getAuthor x
capitalizeAuthor y = y
The other tasks are not local, since the relation between Contents should be observed, hence we will use a foldl across the list.
Define
import Data.Map.Strict ((!), (!?))
import qualified Data.Map.Strict as M
type CodeDict = M.Map String Index
-- Map Code Index
type AuthorDict = M.Map String Int
-- Map Author Count
type Fold = (CodeDict, AuthorDict, [Content])
emptyFold :: Fold
emptyFold = (M.empty, M.empty, [])
The Fold type will store the state when we modify along the original [Content].
(I realize that the code will be much clearer if I use the State monad, but I'm not sure if I need to explain it then ...)
In addition, a folding function for foldl
accum :: Fold -> Content -> Fold
accum (c,a,ls) (Def x) = (c',a',Def x':ls) where
  a' = M.insertWith (+) author 1 a
  c' = M.insert code x' c
  x' = x {getAuthorCount = count}
  count = maybe 1 (+1) $ a !? author
  author = getAuthor x
  code = getSpecialCode x
accum (c,a,ls) (Cite code _) = (c,a,Cite code (c ! code) : ls)
accum (c,a,ls) y = (c,a,y:ls)
After foldl, the resulting list will contain the contents with
getAuthorCount correctly filled
each Cite filled with the Index of its Def, since they share the same output format.
The resulting list is reversed, so you'll need Data.List.reverse.
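To isolate just the counting step, here is a small self-contained sketch. It is not the code used in this answer: it swaps foldl plus reverse for Data.List.mapAccumL, which threads the author dictionary through the list and keeps the original order. The name numberAuthors is made up for the example, and it works on bare author names rather than on Content values.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as M

-- Pair every author with a running count of how often that author
-- has appeared so far (1 for the first occurrence, 2 for the second, ...).
numberAuthors :: [String] -> [(String, Int)]
numberAuthors = snd . mapAccumL step M.empty
  where
    step seen author =
      let n = M.findWithDefault 0 author seen + 1
      in (M.insert author n seen, (author, n))
```

For the three definitions in the question, numberAuthors ["chomsky","bloomfield","chomsky"] yields [("chomsky",1),("bloomfield",1),("chomsky",2)].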
Finally, you can define your own version of Show for Content. For example,
instance Show Index where
  show x = getAuthor x ++ " "
        ++ show (getAuthorCount x) ++ ": "
        ++ getTitle x ++ " "
instance Show Content where
  show (Def idx) = show idx
  show (Cite x idx) = getAuthor idx ++ " "
                   ++ show (getAuthorCount idx)
  show (Literal x) = x
as I figured out from your output sample.
The full length code:
import Data.Char
import Data.List (reverse)
import Data.Map.Strict ((!),(!?))
import qualified Data.Map.Strict as M
import Text.ParserCombinators.Parsec
data Index = Index {
    getAuthor :: String,
    getTitle :: String,
    getSpecialCode :: String,
    getAuthorCount :: Int
    -- For counting author later.
  }
nullIndex :: Index
nullIndex = Index "" "" "" 0
instance Show Index where
  show x = getAuthor x ++ " "
        ++ show (getAuthorCount x) ++ ": "
        ++ getTitle x ++ " "
data Content = Def Index
             | Cite String Index
             | Literal String
instance Show Content where
  show (Def idx) = show idx
  show (Cite x idx) = getAuthor idx ++ " "
                   ++ show (getAuthorCount idx)
  show (Literal x) = x
document = many (try cite <|> try def <|> literal)
literal = Literal <$> many1 (noneOf "#")
def = do
  char '#'
  author <- many1 $ noneOf "/"
  char '/'
  title <- many1 $ noneOf "/"
  char '/'
  code <- many1 $ noneOf "/"
  char '/'
  return $ Def $ Index author title code 0
cite = do
  try $ string "#see/"
  code <- many1 $ noneOf "/"
  char '/'
  return $ Cite code nullIndex
capitalizeAuthor :: Content -> Content
capitalizeAuthor (Def x) = Def (x {getAuthor = author'}) where
  author' = toUpper (head author) : tail author
  author = getAuthor x
capitalizeAuthor y = y
type CodeDict = M.Map String Index
-- Map Code Index
type AuthorDict = M.Map String Int
-- Map Author Count
type Fold = (CodeDict, AuthorDict, [Content])
emptyFold :: Fold
emptyFold = (M.empty, M.empty, [])
accum :: Fold -> Content -> Fold
accum (c,a,ls) (Def x) = (c',a',Def x':ls) where
  a' = M.insertWith (+) author 1 a
  c' = M.insert code x' c
  x' = x {getAuthorCount = count}
  count = maybe 1 (+1) $ a !? author
  author = getAuthor x
  code = getSpecialCode x
accum (c,a,ls) (Cite code _) = (c,a,Cite code (c ! code) : ls)
accum (c,a,ls) y = (c,a,y:ls)
main :: IO ()
main = do
  line <- getLine
  let parsed = parse document "" line
  case parsed of
    Left x -> print x
    Right cs -> do
      let cs1 = map capitalizeAuthor cs
      let (_,_,cs2) = foldl accum emptyFold cs1
      let output = concatMap show $ reverse cs2
      putStrLn output
My goal is to find the number of times a substring exists within a string.
The substring I'm looking for will be of type "[n]", where n can be any variable.
My attempt involved splitting the string up using the words function,
then create a new list of strings if the 'head' of a string was '[' and
the 'last' of the same string was ']'
The problem I ran into was that I entered a String which when split using
the function words, created a String that looked like this "[2],"
Now, I still want this to count as an occurrence of the type "[n]"
An example would be I would want this String,
asdf[1]jkl[2]asdf[1]jkl
to return 3.
Here's the code I have:
-- String that will be tested on references function
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
"get to their goal, and in the end the thing they want the most ends " ++
"up destroying them. In case of [2], this is a whale..."
-- Function that will take a list of Strings and return a list that contains
-- any String of the type [n], where n is a variable
ref :: [String] -> [String]
ref [] = []
ref xs = [x | x <- xs, head x == '[', last x == ']']
-- Function takes a text with references in the format [n] and returns
-- the total number of references.
-- Example : ghci> references txt -- -> 3
references :: String -> Integer
references txt = len (ref (words txt))
If anyone can enlighten me on how to search for a substring within a string
or how to parse a string given a substring, that would be greatly appreciated.
I would just use a regular expression, and write it like this:
import Text.Regex.Posix
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
"get to their goal, and in the end the thing they want the most ends " ++
"up destroying them. In case of [2], this is a whale..."
-- references counts the number of references in the input string
references :: String -> Int
references str = str =~ "\\[[0-9]*\\]"
main = putStrLn $ show $ references txt -- outputs 3
regex is huge overkill for such a simple problem.
references = length . consume
consume [] = []
consume ('[':xs) = let (v,rest) = consume' xs in v:consume rest
consume (_ :xs) = consume xs
consume' [] = ([], [])
consume' (']':xs) = ([], xs)
consume' (x :xs) = let (v,rest) = consume' xs in (x:v, rest)
consume waits for a [ , then calls consume', which gathers everything until a ].
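The same idea can also be written with dropWhile and break from the Prelude. This is only a sketch, with bracketed as a made-up name, and it differs from consume in one respect: an opening bracket that is never closed is not counted as a reference.

```haskell
-- Collect the text between each '[' and the next ']'.
-- An unterminated '[' ends the search without producing a result.
bracketed :: String -> [String]
bracketed s = case dropWhile (/= '[') s of
  []         -> []
  (_ : rest) -> case break (== ']') rest of
    (inner, _ : after) -> inner : bracketed after
    (_, [])            -> []

-- Count the references, as in the question.
references :: String -> Int
references = length . bracketed
```

On the example from the question, references "asdf[1]jkl[2]asdf[1]jkl" gives 3.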
Here's a solution with sepCap.
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Either
import Data.Maybe
import Data.Void
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
"get to their goal, and in the end the thing they want the most ends " ++
"up destroying them. In case of [2], this is a whale..."
pattern = single '[' *> anySingle <* single ']' :: Parsec Void String Char
length $ rights $ fromJust $ parseMaybe (sepCap pattern) txt
3
My question is: if I put in a string such as Hello, today is a Nice Day!!, how could I get rid of spaces and punctuation and also replace the uppercase letters with lowercase?
I know how to delete them but not how to replace them.
Also to get rid of the punctuation.
Sorry I don't know how to mess around with strings, only numbers.
testList xs = [if x = [,|.|?|!] then " " | x<-xs]
import Data.Char
If you want convert the punctuation to space and the characters from upper case to lower case:
testList xs = [if x `elem` ",.?!" then ' ' else toLower x | x<-xs]
Example: testList "TeST,LiST!" == "test list "
If you want to delete the punctuation and convert the characters from upper case to lower case:
testList2 xs = [toLower x | x<-xs, not (x `elem` ",.?!")]
Example: testList2 "Te..S,!t LiS?T" == "test list"
If you don't want or can not import Data.Char, this is an implementation of toLower:
toLower' :: Char -> Char
toLower' char
  | isNotUppercase = char -- no change required
  | otherwise = toEnum (codeChar + diffLowerUpperChar) -- char lowered
  where
    codeChar = fromEnum char -- each character has a numeric code
    code_A = 65
    code_Z = 90
    code_a = 97
    isNotUppercase = codeChar < code_A || codeChar > code_Z
    diffLowerUpperChar = code_a - code_A
I haven't written any Haskell for a long time, but the following should replace the invalid characters with a space and also convert the characters from uppercase to lowercase:
import Data.Char
replace invalid xs = [if elem x invalid then ' ' else toLower x | x <- xs]
Another way of doing the same:
repl invalid [] = []
repl invalid (x:xs) | elem x invalid = ' ' : repl invalid xs
                    | otherwise = toLower x : repl invalid xs
You can call the replace (or repl) function like this:
replace ",.?!" "Hello, today is a Nice Day!!"
The above code will return (each punctuation character becomes its own space, so some spaces are doubled):
"hello  today is a nice day  "
Edit: I'm using the toLower function from Data.Char in Haskell, but if you want to write it by yourself, check here on Stack Overflow. That question has been asked before.
You will find the functions you need in Data.Char:
import Data.Char
process str = [toLower c | c <- str , isAlpha c]
Though personally, I think the function compositional approach is clearer:
process = map toLower . filter isAlpha
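Note that filter isAlpha removes the spaces as well, which is what the question asks for. If the word boundaries should be kept instead, a possible variant (a sketch only, with normalize as a made-up name) replaces every non-letter with a space and then collapses runs of spaces with words and unwords:

```haskell
import Data.Char (isAlpha, toLower)

-- Lowercase letters, turn everything else into a space,
-- then squeeze repeated spaces and trim both ends.
normalize :: String -> String
normalize = unwords . words . map clean
  where
    clean c
      | isAlpha c = toLower c
      | otherwise = ' '
```

Here normalize "Hello, today is a Nice Day!!" gives "hello today is a nice day", whereas process on the same input gives "hellotodayisaniceday".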
To get rid of the punctuation you can use a filter like this one
[x | x<-[1..10], x `mod` 2 == 0]
The "if" you are using won't filter. Putting an if in the "map" part of a list comprehension will only serve to choose between two options, but you can't filter elements out there.
As for converting things to lowercase, it's the same trick you can already pull off with numbers:
[x*2 | x <- [1..10]]
Here's a version without importing modules, using fromEnum and toEnum to choose which characters to allow:
testList xs =
  filter (\x -> elem (fromEnum x) ([97..122] ++ [32] ++ [48..57])) $ map toLower' xs
  where toLower' x = if elem (fromEnum x) [65..90]
                       then toEnum (fromEnum x + 32)::Char
                       else x
OUTPUT:
*Main> testList "Hello, today is a Nice Day!!"
"hello today is a nice day"
For a module-less replace function, something like this might work:
myReplace toReplace xs = map myReplace' xs where
  myReplace' x
    | elem (fromEnum x) [65..90] = toEnum (fromEnum x + 32)::Char
    | elem x toReplace = ' '
    | otherwise = x
OUTPUT:
*Main> myReplace "!," "Hello, today is a Nice Day!! 123"
"hello  today is a nice day   123"
Using Applicative Style
A textual quote from book "Learn You a Haskell for Great Good!":
Using the applicative style on lists is often a good replacement for
list comprehensions. In the second chapter, we wanted to see all the
possible products of [2,5,10] and [8,10,11], so we did this:
[ x*y | x <- [2,5,10], y <- [8,10,11]]
We're just drawing from two lists and applying a function between
every combination of elements. This can be done in the applicative
style as well:
(*) <$> [2,5,10] <*> [8,10,11]
This seems clearer to me, because it's easier to see that we're just
calling * between two non-deterministic computations. If we wanted all
possible products of those two lists that are more than 50, we'd just
do:
filter (>50) $ (*) <$> [2,5,10] <*> [8,10,11]
-- [55,80,100,110]
Functors, Applicative Functors and Monoids
The following two functions behave differently when given an empty string:
guardMatch l@(x:xs)
  | x == '-' = "negative " ++ xs
  | otherwise = l
patternMatch ('-':xs) = "negative " ++ xs
patternMatch l = l
Here is my output:
*Main> guardMatch ""
"*** Exception: matching.hs:(1,1)-(3,20): Non-exhaustive patterns in function guardMatch
*Main> patternMatch ""
""
Question: why doesn't the 'otherwise' clause catch the empty string?
The otherwise is within the scope of the pattern l@(x:xs), which can only match a non-empty string. It might help to see what this (effectively) translates to internally:
guardMatch l = case l of
  (x :xs) -> if x == '-' then "negative " ++ xs else l
patternMatch l = case l of
  ('-':xs) -> "negative " ++ xs
  _ -> l
(Actually, I think the if is translated to a case + guard instead of the other way around.)
A guard is always evaluated after the pattern. That is, the guard is tried if and only if the pattern succeeds. In your case, the pattern (x:xs) excludes the empty string, so the guards are not even tried, as the pattern fails.
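One way to make guardMatch total, shown here as a minimal sketch, is to add a separate equation for the empty string, because no guard under the (x:xs) pattern can ever see it:

```haskell
guardMatch :: String -> String
guardMatch l@(x:xs)
  | x == '-'  = "negative " ++ xs
  | otherwise = l
-- The empty string does not match (x:xs), so it needs its own equation.
guardMatch [] = []
```

With this extra equation, guardMatch "" returns "" just like patternMatch "".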
The other two answers are totally right of course, but here's another way to think about it: What if you had written this?
guardMatch l@(x:xs)
  | x == '-' = "negative " ++ xs
  | otherwise = [x]
What would you expect guardMatch "" to be?