Haskell: successive modifications of a text - haskell

I want to know how to make modifications to a text that is full of special characters and codes and replace those codes with strings.
I have the following text:
text=
"#chomsky/syntactic structures/chomskySynt/: published in 1957. #bloomfield/language/bloomfieldLan/: published in 1933. #chomsky/aspects of a theory of syntax/chomskyAsp/: published in 1965. ... #see/chomskySynt/ is considered the starting point of generative linguistics.... Another hypothesis was introduced in #see/chomskyAsp/."
I want to turn it into=
"Chomsky 1: Syntactic structures : published in 1957. Bloomfield 1: Language : published in 1933. Chomsky 2: Aspects of a theory of syntax : published in 1965. ... Chomsky 1 is considered the starting point of generative linguistics ... Another hypothesis was introduced in Chomsky 2..."
Explanation of the special characters and codes: the information on a book starts with # followed by the name of the author (chomsky for example) followed by / then title of the book / then the special code for the book (chomskyAsp) then /
The citation of a book starts with #see followed by / then the special code of the book (e.g. chomskySynt) then /
The modifications are:
To count how many times an author is cited and concatenate the number to the name: Chomsky 1, for example.
Author name will start with a capital letter
Remove the special code : chomskySynt which serves only as an identification code.
Replace the reference #see/chomskyAsp/ with Chomsky 2. That is, replace the reference with the actual author and number.
Here is my code:
removeSlash = myReplace "/" " " text
removeDash = map lines $ (filter(any isLetter) . groupBy ( (==) `on` (=='#'))) $ removeSlash
flattenList= concat removeDash
splitIntoWords = map words flattenList
And here is the myReplace function:
myReplace _ _ [] = []
myReplace a b s@(x:xs) = if isPrefixOf a s
    then b ++ myReplace a b (drop (length a) s)
    else x : myReplace a b xs
Here is the result so far:
[["chomsky syntactic structures chomskySynt published in 1957. "], ["bloomfield language bloomfieldLan published in 1933. "],["chomsky aspects of a theory of syntax chomskyAsp published in 1965. ... "],["see chomskySynt is considered the starting point of generative linguistics.... Another hypothesis was introduced in "],["see chomskyAsp"]]
The reason I flattened the list and split it into words is now if I do:
map head splitIntoWords
I get ["chomsky","bloomfield","chomsky","see","see"]
I am stuck at this stage. How do I count how many times an author is cited and concatenate the number to the name. I thought of using the zip function:
zipChomsky = zip [1, 2] [x | x <- splitIntoWords, head x == "chomsky"]
This gives:
[(1,["chomsky","syntactic","structures","chomskySynt","published","in","1957."]),(2,["chomsky","aspects","of","a","theory","of","syntax","chomskyAsp","published","in","1965.","..."])]
But the result is very different from: Chomsky 1: ...

EDIT: I didn't mean to make the answer this long, but the problem turned out to be a non-trivial task, and I'm not quite sure how much detail I should put in the answer. If you already understand all the tools I'm using, the full code is at the end of this answer.
In your case, you'll need:
an approach to parse your input document
a suitable data structure to store the input information
displaying the data as output format
For the parsing part, regular expressions might be enough, but I think the Parsec library is a better choice. For detailed usage of Parsec please refer to its documentation; I'll only show how to use it in your case:
First, import Text.ParserCombinators.Parsec.
A document is a list of
a literal string
a definition, with format #<Author>/<Title>/<Code>/, as in "#chomsky/syntactic structures/chomskySynt/"
a citation, with format #see/<Code>/, as in "#see/chomskyAsp/"
Hence we define
data Index = Index {
    getAuthor :: String,
    getTitle :: String,
    getSpecialCode :: String,
    getAuthorCount :: Int
    -- For counting author later.
  } deriving (Show)
data Content = Def Index
             | Cite String Index
             -- We'll fill in Index later.
             | Literal String
             deriving (Show)
and our input document will just be turned into [Content].
Correspondingly, we'll use the following function (actually, parser) to parse the input:
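-- cite is tried before def: otherwise "#see/<Code>/" followed by any later
-- '/' in the text would be accepted as a def whose author is "see".
document = many (try cite <|> try def <|> literal)
literal = Literal <$> many1 (noneOf "#")
def = do
    char '#'
    author <- many1 $ noneOf "/"
    char '/'
    title <- many1 $ noneOf "/"
    char '/'
    code <- many1 $ noneOf "/"
    char '/'
    -- The author count is filled in later, so it starts at 0.
    return $ Def $ Index author title code 0
cite = do
    try $ string "#see/"
    code <- many1 $ noneOf "/"
    char '/'
    return $ Cite code nullIndex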
A short explanation:
A document is many (def or cite or literal), with operator <|> combining parsers.
A literal is a string, stopping at '#', with at least 1 char (using many1); a parser inside many should not accept empty input, think of why!
A def is #<Author>/<Title>/<Code>/, and we can write in do-notation since Parser is a monad.
A cite goes similarly.
def, cite, and string "#see/" each consume several characters, so they can fail after having already consumed some input; the combinator try makes such a failure backtrack as if nothing had been consumed.
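For readers who do not have Parsec at hand, the same three-way grammar can be sketched with Text.ParserCombinators.ReadP from base. This is only an illustration, not the parser used in this answer: Piece, field, piece and parseDoc are made-up names, and the left-biased <++ is used so that the citation form is tried before the definition form.

```haskell
import Text.ParserCombinators.ReadP

-- Hypothetical simplified document pieces (not the Content type of the answer).
data Piece
  = DefP String String String   -- author, title, code
  | CiteP String                -- code
  | LitP String                 -- plain text
  deriving (Eq, Show)

-- One slash-terminated field: at least one non-'/' character, then '/'.
field :: ReadP String
field = munch1 (/= '/') <* char '/'

citeP, defP, litP, piece :: ReadP Piece
citeP = CiteP <$> (string "#see/" *> field)
defP  = DefP <$> (char '#' *> field) <*> field <*> field
litP  = LitP <$> munch1 (/= '#')

-- <++ is left-biased: the right side is only tried if the left side fails.
piece = citeP <++ defP <++ litP

-- Succeeds only if the whole input is consumed.
parseDoc :: String -> Maybe [Piece]
parseDoc s = case readP_to_S (many piece <* eof) s of
  (ps, _) : _ -> Just ps
  []          -> Nothing
```

Because litP requires at least one character (munch1), the parser inside many never succeeds on empty input, which is the same point made above about many1.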
By the way, nullIndex is just a placeholder before we actually fill this record:
nullIndex :: Index
nullIndex = Index "" "" "" 0
Now we only need a function with signature [Content] -> String.
We can start with capitalizing the author name:
capitalizeAuthor :: Content -> Content
capitalizeAuthor (Def x) = Def (x {getAuthor = author'}) where
    author' = toUpper (head author) : tail author
    author = getAuthor x
capitalizeAuthor y = y
The other tasks are not local, since the relation between Contents should be observed, hence we will use a foldl across the list.
Define
import Data.Map.Strict ((!), (!?))
import qualified Data.Map.Strict as M
type CodeDict = M.Map String Index
-- Map Code Index
type AuthorDict = M.Map String Int
-- Map Author Count
type Fold = (CodeDict, AuthorDict, [Content])
emptyFold :: Fold
emptyFold = (M.empty, M.empty, [])
The Fold type will store the state when we modify along the original [Content].
(I realize that the code will be much clearer if I use the State monad, but I'm not sure if I need to explain it then ...)
In addition, a folding function for foldl
accum :: Fold -> Content -> Fold
accum (c,a,ls) (Def x) = (c',a',Def x':ls) where
    a' = M.insertWith (+) author 1 a
    c' = M.insert code x' c
    x' = x {getAuthorCount = count}
    count = maybe 1 (+1) $ a !? author
    author = getAuthor x
    code = getSpecialCode x
accum (c,a,ls) (Cite code _) = (c,a,Cite code (c ! code) : ls)
accum (c,a,ls) y = (c,a,y:ls)
After foldl, the resulting list will contain the contents with
getAuthorCount correctly filled
Cites carrying the Index of the Def they refer to, so both can be printed in the same format.
The resulting list is reversed, so you'll need reverse.
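The counting step on its own can also be written with Data.List.mapAccumL, which threads the dictionary through the list and returns the results in the original order, so no reverse is needed. The sketch below is a simplified stand-in for accum, not the code of this answer: it works on bare author names, and numberAuthors and label are hypothetical helpers.

```haskell
import Data.Char (toUpper)
import Data.List (mapAccumL)
import qualified Data.Map.Strict as M

-- Pair every author with the running count of how often it has appeared so far.
numberAuthors :: [String] -> [(String, Int)]
numberAuthors = snd . mapAccumL step M.empty
  where
    step seen author =
      let n = M.findWithDefault 0 author seen + 1
      in (M.insert author n seen, (author, n))

-- Render one numbered author, capitalizing the first letter.
label :: (String, Int) -> String
label (author, n) = capitalize author ++ " " ++ show n
  where
    capitalize []       = []
    capitalize (c : cs) = toUpper c : cs
```

For the heads extracted in the question, map label (numberAuthors ["chomsky","bloomfield","chomsky"]) gives ["Chomsky 1","Bloomfield 1","Chomsky 2"].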
Finally, you can define your own version of Show for Content (drop the deriving (Show) clauses first, otherwise the instances are duplicated). For example,
instance Show Index where
    show x = getAuthor x ++ " "
          ++ show (getAuthorCount x) ++ ": "
          ++ getTitle x ++ " "
instance Show Content where
    show (Def idx) = show idx
    show (Cite x idx) = getAuthor idx ++ " "
                     ++ show (getAuthorCount idx)
    show (Literal x) = x
as I figured out from your output sample.
The full length code:
import Data.Char
import Data.List (reverse)
import Data.Map.Strict ((!),(!?))
import qualified Data.Map.Strict as M
import Text.ParserCombinators.Parsec
data Index = Index {
    getAuthor :: String,
    getTitle :: String,
    getSpecialCode :: String,
    getAuthorCount :: Int
    -- For counting author later.
  }
nullIndex :: Index
nullIndex = Index "" "" "" 0
instance Show Index where
    show x = getAuthor x ++ " "
          ++ show (getAuthorCount x) ++ ": "
          ++ getTitle x ++ " "
data Content = Def Index
             | Cite String Index
             | Literal String
instance Show Content where
    show (Def idx) = show idx
    show (Cite x idx) = getAuthor idx ++ " "
                     ++ show (getAuthorCount idx)
    show (Literal x) = x
document = many (try cite <|> try def <|> literal)
literal = Literal <$> many1 (noneOf "#")
def = do
    char '#'
    author <- many1 $ noneOf "/"
    char '/'
    title <- many1 $ noneOf "/"
    char '/'
    code <- many1 $ noneOf "/"
    char '/'
    return $ Def $ Index author title code 0
cite = do
    try $ string "#see/"
    code <- many1 $ noneOf "/"
    char '/'
    return $ Cite code nullIndex
capitalizeAuthor :: Content -> Content
capitalizeAuthor (Def x) = Def (x {getAuthor = author'}) where
    author' = toUpper (head author) : tail author
    author = getAuthor x
capitalizeAuthor y = y
type CodeDict = M.Map String Index
-- Map Code Index
type AuthorDict = M.Map String Int
-- Map Author Count
type Fold = (CodeDict, AuthorDict, [Content])
emptyFold :: Fold
emptyFold = (M.empty, M.empty, [])
accum :: Fold -> Content -> Fold
accum (c,a,ls) (Def x) = (c',a',Def x':ls) where
    a' = M.insertWith (+) author 1 a
    c' = M.insert code x' c
    x' = x {getAuthorCount = count}
    count = maybe 1 (+1) $ a !? author
    author = getAuthor x
    code = getSpecialCode x
accum (c,a,ls) (Cite code _) = (c,a,Cite code (c ! code) : ls)
accum (c,a,ls) y = (c,a,y:ls)
main :: IO ()
main = do
    line <- getLine
    let parsed = parse document "" line
    case parsed of
        Left x -> print x
        Right cs -> do
            let cs1 = map capitalizeAuthor cs
            let (_,_,cs2) = foldl accum emptyFold cs1
            let output = concatMap show $ reverse cs2
            putStrLn output

Related

How to replace multiple characters in a string in Haskell?

I am making a program that converts text written in the Esperanto X-system to proper Esperanto letters, so I need it to transform "cx" to "ĉ", "sx" to "ŝ", "gx" to "ĝ", "jx" to "ĵ", and "ux" to "ŭ", and the same for uppercase letters.
Currently it converts "a" to "b", and "c" to "d". The method I am currently using only works for replacing a single character, not multiple characters. So how do I replace multiple characters (like "cx") instead of a single one (like "a")?
replaceChar :: Char -> Char
replaceChar char = case char of
    'a' -> 'b'
    'c' -> 'd'
    _ -> char
xSistemo :: String -> String
xSistemo = map replaceChar
So currently "cats" will transform to "dbts".
As @AJFarmar pointed out, you are probably implementing Esperanto's X-system. Here all items that are translated are digraphs that end with x, and the letter x is not used in Esperanto itself. We can, for example, use explicit recursion for this:
xSistemo :: String -> String
xSistemo (x:'x':xs) = replaceChar x : xSistemo xs
xSistemo (x:xs) = x : xSistemo xs
xSistemo [] = []
where we have a function replaceChar :: Char -> Char, like:
replaceChar :: Char -> Char
replaceChar 's' = 'ŝ'
-- ...
This then yields:
Prelude> xSistemo "sxi"
"\349i"
Prelude> putStrLn (xSistemo "sxi")
ŝi
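The recursion above can be filled out into a complete, self-contained sketch. The table below is an assumption based on the standard X-system digraphs (it also includes hx, which the question does not list), and unlike the short version above this variant keeps an x that follows a letter outside the table.

```haskell
-- Assumed X-system table: base letter -> accented letter, both cases.
xTable :: [(Char, Char)]
xTable =
  [ ('c', 'ĉ'), ('g', 'ĝ'), ('h', 'ĥ'), ('j', 'ĵ'), ('s', 'ŝ'), ('u', 'ŭ')
  , ('C', 'Ĉ'), ('G', 'Ĝ'), ('H', 'Ĥ'), ('J', 'Ĵ'), ('S', 'Ŝ'), ('U', 'Ŭ')
  ]

xSistemo :: String -> String
xSistemo (c : x : rest)
  -- Only rewrite when the second character is an x AND the first is in the table;
  -- if the guard fails, matching falls through to the next clause.
  | x `elem` "xX", Just c' <- lookup c xTable = c' : xSistemo rest
xSistemo (c : rest) = c : xSistemo rest
xSistemo []         = []
```

With this definition a word such as "taxi" passes through unchanged, because 'a' has no entry in xTable.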
A generic method:
The problem looks similar to question 48571481.
So you could try to leverage the power of Haskell regular expressions.
Borrowing from question 48571481, you can use foldl to loop through the various partial substitutions.
This code seems to work:
-- for stackoverflow question 57548358
-- about Esperanto diacritical characters
import qualified Text.Regex as R
esperantize :: [(String,String)] -> String -> String
esperantize substList st =
    let substRegex = R.subRegex
        replaceAllIn = foldl (\acc (k, v) -> substRegex (R.mkRegex k) acc v)
    in
        replaceAllIn st substList
esperSubstList1 = [("cx","ĉ"), ("sx","ŝ"), ("jx","ĵ"), ("ux","ŭ")]
esperantize1 :: String -> String
esperantize1 = esperantize esperSubstList1 -- just bind first argument
main = do
    let sta = "abcxrsxdfuxoojxii"
    putStrLn $ "st.a = " ++ sta
    let ste = esperantize1 sta
    putStrLn $ "st.e = " ++ ste
Program output:
st.a = abcxrsxdfuxoojxii
st.e = abĉrŝdfŭooĵii
We can shorten the code, and also optimize it a little bit by keeping the Regex objects around, like this:
import qualified Text.Regex as R
esperSubstList1_raw = [("cx","ĉ"), ("sx","ŝ"), ("jx","ĵ"), ("ux","ŭ")]
-- try to "compile" the substitution list into regex things as far as possible:
esperSubstList1 = map (\(sa, se) -> (R.mkRegex sa, se)) esperSubstList1_raw
-- use 'flip' as we want the input string to be the rightmost argument for
-- currying purposes:
applySubstitutionList :: [(R.Regex,String)] -> String -> String
applySubstitutionList = flip $ foldl (\acc (re, v) -> R.subRegex re acc v)
esperantize1 :: String -> String
esperantize1 = applySubstitutionList esperSubstList1 -- just bind first argument
main = do
    let sta = "abcxrsxdfuxoojxiicxtt"
    putStrLn $ "st.a = " ++ sta
    let ste = esperantize1 sta
    putStrLn $ "st.e = " ++ ste
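If pulling in a regex package feels heavy, the same list-driven substitution can be sketched with base only, using Data.List.stripPrefix. This is a different technique from the regex version above: it makes a single left-to-right pass and applies the first pattern that matches at each position. replaceAll is a hypothetical name, and empty patterns are skipped so the recursion always makes progress.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)

-- Replace every occurrence of any (old, new) pair in one left-to-right pass.
replaceAll :: [(String, String)] -> String -> String
replaceAll _ [] = []
replaceAll subs s@(c : cs) =
  case listToMaybe (mapMaybe match subs) of
    Just (new, rest) -> new ++ replaceAll subs rest
    Nothing          -> c : replaceAll subs cs
  where
    -- Try one substitution at the current position.
    match (old, new)
      | null old  = Nothing                       -- an empty pattern would never advance
      | otherwise = (,) new <$> stripPrefix old s
```

Used with the substitution list from above, replaceAll [("cx","ĉ"), ("sx","ŝ"), ("jx","ĵ"), ("ux","ŭ")] "abcxrsxdfuxoojxii" yields "abĉrŝdfŭooĵii", the same result as the regex program.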

Cutting specific chunks from a Haskell String

I'm trying to cut chunks from a list, with a given predicate. I would have preferred to use a double character, e.g. ~/, but have resolved to just using $. What I essentially want to do is this...
A: "Hello, my $name is$ Danny and I $like$ Haskell"
What I want to turn this into is this:
B: "Hello, my Danny and I Haskell"
So I want to strip everything in between the given symbol, $, or my first preference was ~/, if I can figure it out. What I tried was this:
s1 :: String -> String
s1 xs = takeWhile (/= '$') xs
s2 :: String -> String
s2 xs = dropWhile (/= '$') xs
s3 :: String -> String
s3 xs = s3 $ s2 $ s1 xs
This solution seems to just bug my IDE out (possibly infinite looping).
Solution:
s3 :: String -> String
s3 xs
    | '$' `notElem` xs = xs
    | otherwise = takeWhile (/= '$') xs ++ (s3 $ s1 xs)
s1 :: String -> String
s1 xs = drop 1 $ dropWhile (/= '$') $ tail $ snd $ break ('$'==) xs
This seems like a nice application for parsers. A solution using trifecta:
import Control.Applicative
import Data.Foldable
import Data.Functor
import Text.Trifecta
input :: String
input = "Hello, my $name is$ Danny and I $like$ Haskell"
cutChunk :: CharParsing f => f String
cutChunk = "" <$ (char '$' *> many (notChar '$') <* char '$')
cutChunk matches $, followed by 0 or more (many) non-$ characters, then another $. Then we use ("" <$) to make this parser's value always be the empty string, thus discarding all the characters that this parser matches.
includeChunk :: CharParsing f => f String
includeChunk = some (notChar '$')
includeChunk matches the text that we want to include in the result, which is anything that's not the $ character. It's important that we use some (matching one or more characters) and not many (matching zero or more characters) because we're going to include this parser within another many expression next; if this parser matched on the empty string, then that could loop infinitely.
chunks :: CharParsing f => f String
chunks = fold <$> many (cutChunk <|> includeChunk)
chunks is the parser for everything. Read <|> as "or", as in "parse either a cutChunk or an includeChunk". many (cutChunk <|> includeChunk) is a parser that produces a list of chunks e.g. Success ["Hello, my ",""," Danny and I ",""," Haskell"], so we fold the output to concatenate those chunks together into a single string.
result :: Result String
result = parseString chunks mempty input
The result:
Success "Hello, my Danny and I Haskell"
Your infinite loop comes from calling s3 recursively with no base case:
s3 :: String -> String
s3 xs = s3 $ s2 $ s1 xs
Adding a base case corrects the infinite loop:
s3 xs
    | '$' `notElem` xs = xs
    | otherwise = ...
This is not the whole answer. Think about what s1 actually does and where you use its return value:
s1 "hello $my name is$ ThreeFx" == "hello "
For further reference, see the break function:
break :: (a -> Bool) -> [a] -> ([a], [a])
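Following the hint about break, here is one way the whole function could look, generalized to a multi-character delimiter such as the "~/" the question originally wanted. It is a sketch under assumptions: breakOn and cutBetween are made-up names, and an opening delimiter without a closing one is treated as cutting everything up to the end of the string.

```haskell
import Data.List (isPrefixOf)

-- Like break, but splits in front of the first occurrence of a whole delimiter.
breakOn :: String -> String -> (String, String)
breakOn delim = go
  where
    go [] = ([], [])
    go s@(c : cs)
      | delim `isPrefixOf` s = ([], s)
      | otherwise            = let (a, b) = go cs in (c : a, b)

-- Remove every chunk enclosed between two occurrences of the delimiter.
cutBetween :: String -> String -> String
cutBetween [] xs = xs                       -- an empty delimiter cuts nothing
cutBetween delim xs =
  case breakOn delim xs of
    (before, [])   -> before                -- base case: no delimiter left
    (before, rest) ->
      case breakOn delim (drop n rest) of
        (_, [])    -> before                -- assumption: unterminated chunk is dropped
        (_, rest') -> before ++ cutBetween delim (drop n rest')
  where
    n = length delim
```

Note that the spaces on both sides of a removed chunk are kept, so cutBetween "~/" "a ~/b~/ c" gives "a  c" with two spaces.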
I think your logic is wrong, perhaps easier to write it in an elementary way
Prelude> let pr xs = go xs True
Prelude|       where go [] _ = []
Prelude|             go (x:xs) f | x=='$' = go xs (not f)
Prelude|                         | f = x : go xs f
Prelude|                         | otherwise = go xs f
Prelude|
Prelude> pr "Hello, my $name is$ Danny and I $like$ Haskell"
"Hello, my Danny and I Haskell"
Explanation: the flag f keeps track of the state (either pass-through mode or skipping mode). If the current char is the delimiter, skip it and switch state.

How can I get the value of a Monad without System.IO.Unsafe? [duplicate]

I just started learning Haskell and got my first project working today. It's a small program that uses Network.HTTP.Conduit and Graphics.Rendering.Chart (haskell-chart) to plot the number of Google search results for a specific question with a changing number in it.
My problem is that simpleHttp from the conduit package returns a monad (I hope I understood the concept of monads right...), but I only want to use the ByteString inside of it, which contains the HTML code of the website. So until now I use download = unsafePerformIO $ simpleHttp url to use it later without caring about the monad. I guess that's not the best way to do that.
So: Is there any better solution so that I don't have to carry the monad with me the whole evaluation? Or would it be better to leave it the way the result is returned (with the monad)?
Here's the full program - the mentioned line is in getResultCounter. If things are coded not-so-well and could be done way better, please remark that too:
import System.IO.Unsafe
import Network.HTTP.Conduit (simpleHttp)
import qualified Data.ByteString.Lazy.Char8 as L
import Graphics.Rendering.Chart.Easy
import Graphics.Rendering.Chart.Backend.Cairo
numchars :: [Char]
numchars = "1234567890"
isNum :: Char -> Bool
isNum = (\x -> x `elem` numchars)
main = do
    putStrLn "Please input your Search (The first 'X' is going to be replaced): "
    search <- getLine
    putStrLn "X ranges from: "
    from <- getLine
    putStrLn "To: "
    to <- getLine
    putStrLn "In steps of (Only whole numbers are accepted):"
    step <- getLine
    putStrLn "Please have some patience..."
    let range = [read from,(read from + read step)..read to] :: [Int]
    let searches = map (replaceX search) range
    let res = map getResultCounter searches
    plotList search ([(zip range res)] :: [[(Int,Integer)]])
    putStrLn "Done."
-- Creates a plot from the given data
plotList name dat = toFile def (name++".png") $ do
    layout_title .= name
    plot (line "Results" dat)
-- Calls the Google-site and returns the number of results
getResultCounter :: String -> Integer
getResultCounter search = read $ filter isNum $ L.unpack parse :: Integer
    where url = "http://www.google.de/search?q=" ++ search
          download = unsafePerformIO $ simpleHttp url -- Not good
          parse = takeByteStringUntil "<"
                    $ dropByteStringUntil "id=\"resultStats\">" download
-- Drops a ByteString until the desired String is found
dropByteStringUntil :: String -> L.ByteString -> L.ByteString
dropByteStringUntil str cont = helper str cont 0
    where helper s bs n | (bs == L.empty) = L.empty
                        | (n >= length s) = bs
                        | ((s !! n) == L.head bs) = helper s (L.tail bs) (n+1)
                        | ((s !! n) /= L.head bs) = helper s (L.tail bs) 0
-- Takes a ByteString until the desired String is found
takeByteStringUntil :: String -> L.ByteString -> L.ByteString
takeByteStringUntil str cont = helper str cont 0
    where helper s bs n | bs == L.empty = bs
                        | n >= length s = L.empty
                        | s !! n == L.head bs = L.head bs `L.cons`
                                                  helper s (L.tail bs) (n + 1)
                        | s !! n /= L.head bs = L.head bs `L.cons`
                                                  helper s (L.tail bs) 0
-- Replaces the first 'X' in a string with the show value of the given value
replaceX :: (Show a) => String -> a -> String
replaceX str x | str == "" = ""
               | head str == 'X' = show x ++ tail str
               | otherwise = head str : replaceX (tail str) x
This is a lie:
getResultCounter :: String -> Integer
The type signature above is promising that the resulting integer only depends on the input string, when this is not the case: Google can add/remove results from one call to the other, affecting the output.
Making the type more honest, we get
getResultCounter :: String -> IO Integer
This honestly admits it's going to interact with the external world. The code then is easily adapted to:
getResultCounter search = do
    let url = "http://www.google.de/search?q=" ++ search
    download <- simpleHttp url -- perform IO here
    let parse = takeByteStringUntil "<"
                  $ dropByteStringUntil "id=\"resultStats\">" download
    return (read $ filter isNum $ L.unpack parse :: Integer)
Above, I tried to preserve the original structure of the code.
Now, in main we can no longer do
let res = map getResultCounter searches
but we can do
res <- mapM getResultCounter searches
No extra import is needed for this: mapM is exported by the Prelude.
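To see the shape of this change without any network access, here is a small sketch in which a hypothetical fakeDownload stands in for simpleHttp and plain String replaces the lazy ByteString. The names fakeDownload and extractCount are assumptions made for illustration. The point is the split: the parsing stays a pure function, only the download lives in IO, and mapM collects the results.

```haskell
import Data.Char (isDigit)

-- Pure part: pull the digits out of a result line (0 if there are none).
extractCount :: String -> Integer
extractCount s = case filter isDigit s of
  "" -> 0
  ds -> read ds

-- Hypothetical stand-in for simpleHttp: pretends to fetch a result line.
fakeDownload :: String -> IO String
fakeDownload query = return ("About " ++ show (length query * 1000) ++ " results")

-- Honest type: the count depends on the outside world, so it lives in IO.
getResultCounter :: String -> IO Integer
getResultCounter query = extractCount <$> fakeDownload query

-- mapM runs the action for every query and collects the plain Integers.
countAll :: [String] -> IO [Integer]
countAll = mapM getResultCounter
```

Inside a do block, res <- countAll searches then binds an ordinary [Integer] that can be zipped and plotted exactly as before.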

Haskell extract substring within a string

My goal is to find the number of times a substring exists within a string.
The substring I'm looking for will be of type "[n]", where n can be any variable.
My attempt involved splitting the string up using the words function,
then create a new list of strings if the 'head' of a string was '[' and
the 'last' of the same string was ']'
The problem I ran into was that I entered a String which when split using
the function words, created a String that looked like this "[2],"
Now, I still want this to count as an occurrence of the type "[n]"
An example would be I would want this String,
asdf[1]jkl[2]asdf[1]jkl
to return 3.
Here's the code I have:
-- String that will be tested on references function
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
-- Function that will take a list of Strings and return a list that contains
-- any String of the type [n], where n is a variable
ref :: [String] -> [String]
ref [] = []
ref xs = [x | x <- xs, head x == '[', last x == ']']
-- Function takes a text with references in the format [n] and returns
-- the total number of references.
-- Example : ghci> references txt -- -> 3
references :: String -> Integer
references txt = fromIntegral (length (ref (words txt)))
If anyone can enlighten me on how to search for a substring within a string
or how to parse a string given a substring, that would be greatly appreciated.
I would just use a regular expression, and write it like this:
import Text.Regex.Posix
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
-- references counts the number of references in the input string
references :: String -> Int
references str = str =~ "\\[[0-9]*\\]"
main = putStrLn $ show $ references txt -- outputs 3
A regex is huge overkill for such a simple problem.
references = length . consume
consume [] = []
consume ('[':xs) = let (v,rest) = consume' xs in v:consume rest
consume (_ :xs) = consume xs
consume' [] = ([], [])
consume' (']':xs) = ([], xs)
consume' (x :xs) = let (v,rest) = consume' xs in (x:v, rest)
consume waits for a [ , then calls consume', which gathers everything until a ].
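A stricter variant of the same idea, sketched here as an assumption about what counts as a reference: only a '[' followed by one or more digits and a closing ']' is counted, so "[]" or "[x]" are ignored. countRefs is a hypothetical name; it uses span from the Prelude and a pattern guard instead of a helper function.

```haskell
import Data.Char (isDigit)

-- Count occurrences of "[n]" where n is a non-empty run of digits.
countRefs :: String -> Int
countRefs [] = 0
countRefs ('[' : rest)
  -- span splits off the leading digits; the guard demands at least one digit
  -- and an immediately following ']'. Otherwise the next clause is tried.
  | (_ : _, ']' : rest') <- span isDigit rest = 1 + countRefs rest'
countRefs (_ : rest) = countRefs rest
```

Because it scans characters rather than words, a trailing comma as in "[2]," does not prevent the match.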
Here's a solution with sepCap.
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Either
import Data.Maybe
import Data.Void
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
pattern = single '[' *> anySingle <* single ']' :: Parsec Void String Char
length $ rights $ fromJust $ parseMaybe (sepCap pattern) txt
3

Improve a haskell script

I'm a newbie in Haskell and I'd like some opinions about improving this script. This is a code generator and requires a command line argument to generate the sql script.
./GenCode "people name:string age:integer"
Code:
import Data.List
import System.Environment (getArgs)
create_table :: String -> String
create_table str = "CREATE TABLE " ++ h (words str)
    where h (x:xs) = let cab = x
                         final = xs
                     in x ++ "( " ++ create_fields xs ++ ")"
create_fields (x:xs) = takeWhile (/=':') x ++ type x ++ sig
    where sig | length xs > 0 = "," ++ create_fields xs
              | otherwise = " " ++ create_fields xs
create_fields [] = ""
type x | isInfixOf "string" x = " CHARACTER VARYING"
       | isInfixOf "integer" x = " INTEGER"
       | isInfixOf "date" x = " DATE"
       | isInfixOf "serial" x = " SERIAL"
       | otherwise = ""
main = mainWith
    where mainWith = do
            args <- getArgs
            case args of
                [] -> putStrLn $ "You need one argument"
                (x:xs) -> putStrLn $ (create_table x)
I think you understand how to write functional code already. Here are some small style notes:
Haskell usually uses camelCase, not under_score_separation
In create_table, cab and final are not used.
Usually a list-recursive function like create_fields puts the empty list case first.
I would not make create_fields recursive anyway. The comma-joining code is quite complicated and should be separated from the typing code. Instead do something like Data.List.intercalate "," (map create_field xs). Then create_field x can just be takeWhile (/=':') x ++ type x
Especially if there are a lot of types to be translated, you might put them into a map
Like so:
types = Data.Map.fromList [("string", "CHARACTER VARYING")
                          ,("integer", "INTEGER")
                          -- etc
                          ]
Then type can be Data.Maybe.fromMaybe "" (Data.Map.lookup x types)
Code can appear in any order, so it's nice to have main up front. (This is personal preference though)
You don't need mainWith.
Just say
main = do
    args <- getArgs
    case args of
        [] -> ...
You don't need the dollar for the calls to putStrLn. In the first call, the argument wouldn't require parentheses anyway, and in the second, you supply the parentheses. Alternatively, you could keep the second dollar and drop the parentheses.
Don't use length xs > 0 (in sig); it needlessly counts the length of xs when all you really wanted to know is whether it's empty. Use null xs to check for an empty list:
...
    where sig | null xs = ... -- Empty case
              | otherwise = ... -- Non-empty case
or add an argument to sig and pattern match:
...
    where sig (y:ys) = ...
          sig [] = ...
Although Nathan Sanders' advice to replace the whole recursive thing with intercalate is excellent and makes this a moot point.
You're also identifying the type by passing the whole "var:type" string into type, so it is testing
"string" `isInfixOf` "name:string"
etc.
You could use break or span instead of takeWhile to separate the name and type earlier:
create_fields (x:xs) = xname ++ type xtype ++ sig
    where
        (xname, _:xtype) = break (==':') x
        sig = ...
and then type can compare for string equality, or look up values using a Map.
A quick explanation of that use of break:
break (==':') "name:string" == ("name", ":string")
Then when binding
(xname, _:xtype) to ("name", ":string"),
xname -> "name"
_ -> ':' (discarded)
xtype -> "string"
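Putting the review comments together (camelCase names, intercalate for the commas, break to split name and type, a Map for the type table), the generator could be sketched as follows. This is one possible arrangement rather than the reviewers' exact code: createField, createTable and sqlTypes are assumed names, the output spacing is a choice made here, and an unknown type is rendered as no type at all, as in the original otherwise branch.

```haskell
import Data.List (intercalate)
import qualified Data.Map as Map

-- Lookup table from the short type names to SQL types.
sqlTypes :: Map.Map String String
sqlTypes = Map.fromList
  [ ("string",  "CHARACTER VARYING")
  , ("integer", "INTEGER")
  , ("date",    "DATE")
  , ("serial",  "SERIAL")
  ]

-- Turn one "name:type" argument into "name SQLTYPE".
createField :: String -> String
createField field = name ++ sqlType
  where
    (name, rest) = break (== ':') field
    -- drop 1 removes the ':' (and is safe when there is no ':' at all).
    sqlType = maybe "" (' ' :) (Map.lookup (drop 1 rest) sqlTypes)

-- First word is the table name, the remaining words are the columns.
createTable :: String -> String
createTable spec =
  case words spec of
    []             -> ""
    (table : cols) ->
      "CREATE TABLE " ++ table ++ " ("
        ++ intercalate ", " (map createField cols) ++ ")"
```

For the command line from the question, createTable "people name:string age:integer" produces "CREATE TABLE people (name CHARACTER VARYING, age INTEGER)".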

Resources