Haskell. TagSoup library with OverloadedStrings - haskell

Good day. This is a rather basic question, but I am stuck on it.
I decided to migrate from plain Strings to Text in my project and ran into a problem: after adding {-# LANGUAGE OverloadedStrings #-}, every string literal in the source produces a compilation error. For example, a snippet like:
dropWhile (~/= "<li>") tags
now leads to
Ambiguous type variable `t' in the constraints:
  `Data.String.IsString t'
    arising from the literal `"<li>"'
             at ParserOx.hs:93:42-47
  `TagRep t'
    arising from a use of `~=='
What could be wrong here?
UPD:
And yes, all my functions have signatures, e.g.:
getContainer :: [Tag Text] -> [Tag Text]
getContainer tags =
    h
  where
    (h:t) = sections (~== "<div id=\"itemscontainer\">") tags

The problem is that the literal has an ambiguous type with two constraints: the IsString constraint introduced by OverloadedStrings, and the TagRep constraint that TagSoup uses to let you pass either tags or strings to its matchers. Two methods of "overloading" strings (one general, one specific to TagSoup's matchers) run into each other, and the compiler cannot decide which type you mean. Either turn off OverloadedStrings in the offending file, or give the literals an explicit String type in the code (i.e. (~/= ("<li>"::String))). Rather than inline type signatures, you can do the following to force the type more quietly:
s :: String -> String
s = id
.... (~/= s "<li>") ...

One option is to define a wrapper around ~== which pins down parts of the type. For example, you could define:
(~===) a b = a ~== (b :: String)
Then you can just write (~=== "<div id=\"itemscontainer\">") with no further annotations at the use site.
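To see why pinning the literal's type helps, here is a minimal sketch that does not depend on TagSoup. The class Rep and the functions matches and matchesStr are hypothetical stand-ins for TagRep, (~==) and the (~===) wrapper, invented for illustration; only the shape of the constraints is meant to mirror the real library.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverloadedStrings #-}

-- Hypothetical stand-in for TagSoup's TagRep class (not the real API).
class Rep a where
  render :: a -> String

instance Rep String where
  render = id

-- Hypothetical matcher playing the role of (~==).
matches :: Rep a => String -> a -> Bool
matches input pat = input == render pat

-- Identity on String: pins a literal's type, like `s` in the answer above.
s :: String -> String
s = id

-- Wrapper that fixes the pattern type once, like (~===) above.
matchesStr :: String -> String -> Bool
matchesStr input pat = matches input (pat :: String)

-- Left unannotated, the second literal only carries the constraints
-- (IsString t, Rep t), and GHC is expected to report t as ambiguous:
-- ambiguous = matches "<li>" "<li>"

viaAnnotation, viaHelper, viaWrapper :: Bool
viaAnnotation = matches "<li>" ("<li>" :: String)
viaHelper     = matches "<li>" (s "<li>")
viaWrapper    = matchesStr "<li>" "<li>"
```

All three working variants do the same thing: they give the type checker one concrete type for the literal, so neither constraint is left dangling.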

The compiler can't find out what kind of String you want to have. Try to give your functions explicit signatures.

Related

How can I save a variable as a bytestring?

I know this is a dumb question, but if I have this:
a :: B.ByteString
a = "a"
I get an error that says "Couldn't match type B.ByteString with type [Char]". I know what the problem is, but I don't know how to fix it. Could you help? Thanks.
Character string literals in Haskell are, by default, always treated as String, which is equivalent to [Char]. Most string-like data structures define a function called pack to convert from String, and the bytestring package is no exception. (Note that this is pack from Data.ByteString.Char8; the one in Data.ByteString converts from [Word8].)
import qualified Data.ByteString as B
import Data.ByteString.Char8 (pack)

a :: B.ByteString
a = pack "a"
However, GHC also supports an extension called OverloadedStrings, and ByteString has an instance of the IsString typeclass that the extension relies on. With the extension enabled, the type of a string literal like "a" is no longer [Char] but forall a. IsString a => a (similar to how the type of a numeric literal like 3 is forall a. Num a => a). The literal then specializes to ByteString wherever a ByteString is expected.
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B

a :: B.ByteString
a = "a"
If you go this route, make sure you understand the proviso listed in the docs for this instance. For ASCII characters, it won't pose a problem, but if your string has Unicode characters outside the ASCII range, you need to be aware of it.
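As a rough illustration of that proviso, the sketch below uses pack from Data.ByteString.Char8, whose documentation describes truncating each character to 8 bits; the assumption here is that the proviso on the IsString instance refers to the same kind of truncation. The names asciiBytes and truncatedBytes are made up for the example.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import Data.Word (Word8)

-- 'a' is code point 97 and fits in a single byte.
asciiBytes :: [Word8]
asciiBytes = B.unpack (C.pack "a")

-- "\955" is the Greek letter lambda (U+03BB). Only the low 8 bits of the
-- code point are kept, so the result is the single byte 0xBB, not the
-- two-byte UTF-8 encoding of the character.
truncatedBytes :: [Word8]
truncatedBytes = B.unpack (C.pack "\955")
```

If the text can contain characters outside the ASCII range, encoding it explicitly (for example with encodeUtf8 from Data.Text.Encoding) avoids this silent loss.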

How does OverloadedStrings language extension work?

I am trying to understand the language extension OverloadedStrings from the page https://ocharles.org.uk/posts/2014-12-17-overloaded-strings.html.
When the OverloadedStrings is enabled, then String becomes a type Data.String.IsString a => a:
Prelude Data.String> :t fromString "Foo"
fromString "Foo" :: IsString a => a
In the description, the author has mentioned the following:
By enabling this extension, string literals are now a call to the
fromString function, which belongs to the IsString type class.
What does "string literals are now a call to the fromString function" mean?
and also the author has mentioned:
This polymorphism is extremely powerful, and it allows us to write
embedded domain specific languages in Haskell source code, without
having to introduce new constructs for otherwise normal values.
What does "without having to introduce new constructs for otherwise normal values" mean?
When the OverloadedStrings is enabled, then String becomes a type Data.String.IsString a => a
No, that is incorrect. A String remains a String. The extension only affects string literals; variables of type String are untouched, and a literal can still be used as a String.
What does "string literals are now a call to the fromString function" mean?
It means that if you write a string literal, like "foo", Haskell implicitly writes fromString "foo", and thus you can use the literal as a value of any type that has an IsString instance.
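A small sketch of both points, using only base. The newtype Name and the binding names are hypothetical, chosen just for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Hypothetical wrapper type with an IsString instance.
newtype Name = Name String deriving (Show, Eq)

instance IsString Name where
  fromString = Name

-- The literal is implicitly passed through fromString ...
viaLiteral :: Name
viaLiteral = "foo"

-- ... so it means the same as the explicit call.
viaCall :: Name
viaCall = fromString ("foo" :: String)

-- A String stays a String, even with the extension on.
plain :: String
plain = "foo"

-- A String variable is not overloaded: writing `plain` where a Name is
-- expected is a type error, so the conversion has to be explicit.
fromVariable :: Name
fromVariable = fromString plain
```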
What does "without having to introduce new constructs for otherwise normal values" mean?
It means that we can make our own types for which we can write some sort of "mini-parser", and thus write these objects as string literals in our code. For example if we make a datatype like:
newtype BoolList = BoolList [Bool] deriving Show
then we can write our own parser:
instance IsString BoolList where
    fromString = BoolList . map toBool
      where toBool '1' = True
            toBool _   = False
Now we can for example define a list of Bools as:
myboollist :: BoolList
myboollist = "10110010001"
So then we get:
Prelude Data.String> myboollist
BoolList [True,False,True,True,False,False,True,False,False,False,True]
Here we wrote the string literal "10110010001", which means that we implicitly wrote fromString "10110010001". Since the type of myboollist is BoolList, it is clear which type the string literal is parsed into.
This can be useful if some data types are complex, or would take a lot of code to construct.
The fromString call is only evaluated at runtime, however, and frequently not every possible string maps to a value of the type. (Here every string does, although it is debatable whether it is good to fill in False for everything other than '1'.) An instance can therefore raise errors at runtime when a literal turns out to be "unparsable".
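To make the runtime-failure point concrete, here is a sketch of a stricter variant. StrictBoolList is a hypothetical type introduced for this example; it rejects anything other than '0' and '1' by calling error.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Hypothetical stricter variant of the BoolList above.
newtype StrictBoolList = StrictBoolList [Bool] deriving (Show, Eq)

instance IsString StrictBoolList where
  fromString = StrictBoolList . map toBool
    where
      toBool '1' = True
      toBool '0' = False
      toBool c   = error ("StrictBoolList: unexpected character " ++ show c)

-- A well-formed literal.
good :: StrictBoolList
good = "101"

-- This compiles without complaint. The bad character is only detected
-- when the value is forced at runtime, for example by printing it.
bad :: StrictBoolList
bad = "1x1"
```

The type checker accepts both definitions, which is exactly the trade-off described above: the convenience of literals comes at the cost of moving validation from compile time to runtime.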
What does "without having to introduce new constructs for otherwise normal values" mean?
The next sentence says
So why should string literals be any different?
so this one refers primarily to number literals. Consider e.g. a type defining polynomials. Because + and * can only be applied to arguments of the same type, if we want
2*x^3 + 3*x :: Poly Int
to be legal, 2 and 3 have to be of type Poly Int; otherwise you'd need either
a separate operator to multiply a polynomial by a number: 2.*x^3 + 3.*x, or
a constructor for a constant polynomial: (C 2)*x^3 + (C 3)*x
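A minimal sketch of how this works for number literals. The Poly representation (a coefficient list, lowest degree first) and the helpers addCoeffs and mulCoeffs are assumptions made for this example, not something prescribed by the article.

```haskell
-- Hypothetical polynomial type: coefficients, lowest degree first.
newtype Poly a = Poly [a] deriving (Show, Eq)

-- The indeterminate x, i.e. 0 + 1*x.
x :: Num a => Poly a
x = Poly [0, 1]

-- Coefficient-wise addition; the longer tail is kept as is.
addCoeffs :: Num a => [a] -> [a] -> [a]
addCoeffs (p:ps) (q:qs) = p + q : addCoeffs ps qs
addCoeffs ps     []     = ps
addCoeffs []     qs     = qs

-- Schoolbook multiplication: scale qs by each coefficient and shift.
mulCoeffs :: Num a => [a] -> [a] -> [a]
mulCoeffs []     _  = []
mulCoeffs (p:ps) qs = addCoeffs (map (p *) qs) (0 : mulCoeffs ps qs)

instance Num a => Num (Poly a) where
  Poly ps + Poly qs = Poly (addCoeffs ps qs)
  Poly ps * Poly qs = Poly (mulCoeffs ps qs)
  negate (Poly ps)  = Poly (map negate ps)
  -- This is what lets a literal such as 2 stand for a constant polynomial.
  fromInteger n     = Poly [fromInteger n]
  abs    = error "abs is not defined for Poly in this sketch"
  signum = error "signum is not defined for Poly in this sketch"

-- 2 and 3 are read as Poly Int via fromInteger; no extra operator or
-- constructor is needed at the use site.
example :: Poly Int
example = 2 * x ^ 3 + 3 * x
```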
An example for string literals is given at the end:
However, SQL queries are notorious for injection attacks when we concatenate strings. Interestingly, postgresql-simple provides a Query type that only has a IsString instance. This means that it’s very lightweight to write a literal query, but the moment we want to start concatenating strings for our query, we have to be very explicit.

Why can't I use record selectors with an existentially quantified type?

When using existential types, we have to use pattern-matching syntax to extract the forall-quantified value. We can't use the ordinary record selectors as functions. GHC reports an error and suggests using pattern matching for this definition of yALL:
{-# LANGUAGE ExistentialQuantification #-}
data ALL = forall a. Show a => ALL { theA :: a }
-- data ok
xALL :: ALL -> String
xALL (ALL a) = show a
-- pattern matching ok
-- ABOVE: heaven
-- BELOW: hell
yALL :: ALL -> String
yALL all = show $ theA all
-- record selector failed
forall.hs:11:19:
Cannot use record selector `theA' as a function due to escaped type variables
Probable fix: use pattern-matching syntax instead
In the second argument of `($)', namely `theA all'
In the expression: show $ theA all
In an equation for `yALL': yALL all = show $ theA all
Some of my data types have more than 5 fields. It's hard to maintain the code if I use pattern-matching:
func1 (BigData _ _ _ _ elemx _ _) = func2 elemx
Is there a good method to make code like that maintainable or to wrap it up so that I can use some kind of selectors?
Existential types work in a more elaborate manner than regular types. GHC is (rightly) forbidding you from using theA as a function. But imagine there was no such prohibition. What type would that function have? It would have to be something like this:
-- Not a real type signature!
theA :: ALL -> t -- for a fresh type t on each use of theA; t is an instance of Show
To put it very crudely, forall makes GHC "forget" the type of the constructor's arguments; all that the type system knows is that this type is an instance of Show. So when you try to extract the value of the constructor's argument, there is no way to recover the original type.
What GHC does, behind the scenes, is what the comment to the fake type signature above says—each time you pattern match against the ALL constructor, the variable bound to the constructor's value is assigned a unique type that's guaranteed to be different from every other type. Take for example this code:
case ALL "foo" of
    ALL x -> show x
The variable x gets a unique type that is distinct from every other type in the program and cannot be matched with any type variable. These unique types are not allowed to escape to the top level—which is the reason why theA cannot be used as a function.
You can use record syntax in pattern matching,
func1 BigData{ someField = elemx } = func2 elemx
works and is much less typing for huge types.
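A fuller sketch of that suggestion. The BigData definition and its field names below are hypothetical, since the question does not show the real type; the assumption is that only one field is existentially quantified.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical record: one existential field among ordinary ones.
data BigData = forall a. Show a => BigData
  { label   :: String
  , count   :: Int
  , payload :: a      -- existentially quantified field
  , note    :: String
  }

-- Record pattern: name only the field that is needed. Adding or
-- reordering other fields later does not affect this definition.
showPayload :: BigData -> String
showPayload BigData{ payload = p } = show p

-- Fields whose types do not mention the quantified variable can still be
-- used as ordinary selector functions.
labelOf :: BigData -> String
labelOf = label

sample :: BigData
sample = BigData "demo" 1 True "n/a"
```

Using payload as a function would be rejected just like theA in the question; the record pattern works because the quantified type never leaves the scope of the match.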

Why does the Data.String.IsString typeclass only define one conversion?

Why does the Haskell base package only define the IsString class to have a conversion from String to 'like-string' value, and not define the inverse transformation, from 'like-string' value to String?
The class should be defined as:
class IsString a where
    fromString :: String -> a
    toString :: a -> String
ref: http://hackage.haskell.org/packages/archive/base/4.4.0.0/doc/html/Data-String.html
The reason is, IMHO, that IsString's primary purpose is to be used for string literals in Haskell source code (or (E)DSLs -- see also Paradise: A two-stage DSL embedded in Haskell) via the OverloadedStrings language extension, in an analogous way to how other polymorphic literals work (e.g. via fromRational for floating point literals or fromInteger for integer literals).
The term IsString might be a bit misleading, as it suggests that the type-class represents string-like structures, whereas it's really just to denote types which have a quoted-string-representation in Haskell source code.
If you want toString :: a -> String, I think you're simply forgetting about show, whose full type is show :: Show a => a -> String.
If you want to operate on a type that supports both a -> String and String -> a, you can simply put both type-class constraints on the function:
doubleConstraintedFunction :: (Show a, IsString a) => a -> .. -> .. -> a
Note that, as a design principle, we avoid defining type classes whose functions could just as well be split into two separate classes. Therefore we don't put toString in IsString.
Finally, I should also mention Read, which provides read :: Read a => String -> a. You use read and show for very simple serialization. fromString from IsString has a different purpose: together with the OverloadedStrings language pragma, it lets you conveniently write code like "This is not a string" :: Text. (Text is an efficient data structure for strings.)

Template Haskell type quoting problems

The Template Haskell documentation describes two single quotes ('') as the way to get the Name of a type:
> ''String
GHC.Base.String
This works fine for this type (name). However, I can't find a way to make it work nicely for e.g. Maybe String:
> ''Maybe String -- interprets String as a data constructor
> ''Maybe ''String -- wants to apply ''String to the Name type
I know I can work around this by using [t| Maybe String |], but that lives in the Q monad and requires type changes, and I think it is not type-checked at that point, only when spliced in.
I can also work around by first defining a type alias, type MaybeString = Maybe String, and then using ''MaybeString, but this is also cumbersome.
Any way to directly get what I want simply via the '' quotation?
'' is used to quote names, not types. Maybe is a name, Maybe String is not. It is therefore not too surprising that you have to give your type a name by defining a type alias, before you can quote that name.
[t| |] on the other hand, quotes types. Note the difference here.
Prelude> :t ''String
''String :: Language.Haskell.TH.Syntax.Name
Prelude> :t [t| String |]
[t| String |]
:: Language.Haskell.TH.Syntax.Q Language.Haskell.TH.Syntax.Type
So I'm afraid you cannot use '' for what you're trying to do.
I think what you're looking for is:
ConT ''Maybe `AppT` ConT ''String
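As a self-contained sketch of that last suggestion: the binding names maybeString, nested and rendered are made up for the example, while ConT, AppT, Type and pprint come from Language.Haskell.TH.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

-- ''Maybe and ''String are Names. ConT turns each Name into a Type, and
-- AppT applies the first type to the second, giving Maybe String as a
-- plain Type value outside the Q monad.
maybeString :: Type
maybeString = ConT ''Maybe `AppT` ConT ''String

-- The same idea one level deeper: Maybe (Maybe String).
nested :: Type
nested = ConT ''Maybe `AppT` maybeString

-- Human-readable rendering of the constructed type; the exact text
-- (module qualification of the names) can vary between GHC versions.
rendered :: String
rendered = pprint maybeString
```

Because function application binds tighter than a backquoted operator, the expression parses as (ConT ''Maybe) `AppT` (ConT ''String) without extra parentheses.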
