In Haskell, the () type has two values, namely, () and bottom. If you have an expression e :: (), there's no point in actually inspecting it, since either it's e = () or by inspecting it you're crashing a program which could otherwise have not crashed.
Hence, I figured that perhaps operations on values of type () would not inspect the value and would not distinguish between () and bottom.
However, this is wildly untrue:
λ ghci
GHCi, version 9.0.2: https://www.haskell.org/ghc/ :? for help
ghci> u = (undefined :: ())
ghci> show u
"*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
ghci> () == u
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
ghci> f () = "ok"
ghci> f u
"*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:75:14 in base:GHC.Err
undefined, called at <interactive>:1:6 in interactive:Ghci1
What is the reason for this? Here are some conjectures:
For some reason that I can't think of, it's useful to be non-lazy on (). Sometimes we want that bottom to propagate.
Haskell semantics are written in such a way that destructuring any ADTs, even trivial ones, inspects them. This means that having case (undefined :: ()) of { () -> ... } not throw would be a violation of language semantics
() is an extremely special case and simply isn't worth the attention to eke out this tiny extra bit of safety in a massive language like Haskell
There's also a possible combined explanation of 2 and 3: Haskell could have had semantics dictating that an expression case e of ... inspects e unless e is of type (), but that would pollute the language spec for relatively low benefit
I will address this part:
For some reason that I can't think of, it's useful to be non-lazy on (). Sometimes we want that bottom to propagate.
Let's have a look at Control.Parallel.Strategies (version 1, an older version). This is a module for parallel evaluation. Let's focus on one of its functions for the sake of simplicity:
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
The result of parMap strat f xs is the same as map f xs, except that the list is computed in parallel. What is the strat argument? Well,
strat :: Strategy b
means
strat :: b -> ()
There are only two things you can do with strat:
call it and ignore the result, which by laziness amounts to not calling it at all;
call it and force the result, even if you know it's () or a bottom.
parMap does the latter, in parallel. This allows the caller to specify a strat argument that evaluates the list values of type b as needed. For example
parMap (\(x,y) -> ()) f xs
parMap (\(x,y) -> x `seq` ()) f xs
parMap (\(x,y) -> x `seq` y `seq` ()) f xs
are valid calls, and will cause parMap to evaluate each element of the new list of pairs only enough to expose the pair constructor, to additionally force the first component, or to force both components, respectively.
Hence, forcing the () result of strat in this case allows the user to control how much evaluation to perform during parMap, i.e. how much to force the result (in parallel), and consequently which parts of the result should be left unevaluated. (By comparison, map f xs would leave the result fully unevaluated -- it is completely lazy. parMap cannot do that, otherwise it would no longer be parallel.)
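For concreteness, here is a minimal sketch of how such a parMap could force the strategy's () result in parallel. parMap' is a made-up name and this is not the library's actual implementation; it only assumes par and pseq from the parallel package:

import Control.Parallel (par, pseq)

type Strategy a = a -> ()

-- Spark `strat y` for each element: every list element gets evaluated,
-- in parallel, exactly as deeply as the strategy dictates.
parMap' :: Strategy b -> (a -> b) -> [a] -> [b]
parMap' strat f = go
  where
    go []     = []
    go (x:xs) = let y  = f x
                    ys = go xs
                in strat y `par` (ys `pseq` (y : ys))

If strat always returned () without being forced, the par spark would do no work at all, and the parallelism would silently disappear.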
Minor digression: note that the GADT
data a :~: b where
Refl :: t :~: t
has one constructor, like (). Here, it is mandatory that such values are forced, as in:
foo :: Int :~: String -> Int -> String
foo Refl x = x ++ " hello"
Here the first argument must be a bottom. By forcing that, we make the function error out with an exception. If we did not force that, we would get a very nasty undefined behavior like those in C and C++, completely breaking type safety. Haskell will correctly reject any attempt to circumvent that:
foo :: Int :~: String -> Int -> String
foo _ x = x ++ " hello"
triggers a type error at compile time.
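Here is a self-contained version of this digression's example (a sketch defining the GADT locally; the standard library provides the same type in Data.Type.Equality):

{-# LANGUAGE GADTs, TypeOperators #-}

data a :~: b where
  Refl :: t :~: t

foo :: Int :~: String -> Int -> String
foo Refl x = x ++ " hello"   -- under the Refl match, Int ~ String is in scope

main :: IO ()
main = putStrLn (foo undefined 3)
-- *** Exception: Prelude.undefined  (a clean crash, not memory corruption)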
I don't know for sure, but I suspect it's none of the things you said. Instead, this is so that the language is predictable and consistent.
There are, essentially, two things you observed, and I consider them to be separate things. The first is that checking whether a value x is indeed () with a case statement forces evaluation of x; the second is that the instances (of Show and Eq) are written to use a case statement.
Pattern matching: the predictable, consistent rule here is that if you write case <e0> of <pat> -> <e1>, then e0 is evaluated far enough to check whether the constructors in pat are in fact in the given places. Well, okay, there are some wrinkles here to do with irrefutable patterns; let's say instead that e0 is evaluated far enough to check whether pat actually does match! For the () type, that means that the pattern () causes full evaluation -- because you've specified the full value that you expect it to be -- while the pattern x or _ can match without further evaluation.
Class instances: the natural inductive way to specify what the various class instances do is to always have an outermost case that matches against each available constructor with simple variable patterns for the fields, then does something (presumably recursive calls) on each of the fields in turn. That is, simplifying a bit, the show implementation goes like:
show x = case x of
    <Con0> field00 field01 field02 <...> ->
        "<Con0>"
        ++ " " ++ show field00
        ++ " " ++ show field01
        ++ " " ++ show field02
        ++ <...>
    <Con1> field10 field11 field12 <...> ->
        "<Con1>"
        ++ " " ++ show field10
        ++ " " ++ show field11
        ++ " " ++ show field12
        ++ <...>
    <...>
It is very natural for the specialization of this scheme to the single-constructor, zero-field type () to go:
show x = case x of
    () -> "()"
(Additionally, the Report specifies that (==) is always strict in both arguments; but that property would also arise naturally from the obvious way of writing a generic Eq instance derivation algorithm.) Therefore the path of least surprise is for class instances to pattern match on their argument(s).
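To make the recipe concrete, here is a local clone of () (a sketch; Unit and MkUnit are made-up names, and the real instances live in base) whose instances follow exactly this pattern-matching scheme:

data Unit = MkUnit

instance Show Unit where
  show MkUnit = "MkUnit"      -- the constructor pattern forces the argument

instance Eq Unit where
  MkUnit == MkUnit = True     -- strict in both arguments, as the Report requires

-- Both `show (undefined :: Unit)` and `MkUnit == undefined` diverge,
-- mirroring the behavior observed for () at the top of this page.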
#2 is definitely true.
The () type is just a nullary data type with special type/data constructor syntax:
data () = ()
As a result, the Haskell 2010 report, while only providing an informal semantics, makes it pretty clear in section 3.17.2 Informal Semantics of Pattern Matching that the expression:
case undefined of () -> "ack!"
will be evaluated as per rule #5:
Matching the pattern con pat1 … patn against a value, where con is a constructor defined by data, depends on the value:
If the value is of the form con v1 … vn, sub-patterns are matched left-to-right against the components of the data value; if all matches succeed, the overall match succeeds; the first to fail or diverge causes the overall match to fail or diverge, respectively.
If the value is of the form con′ v1 … vm, where con is a different constructor to con′, the match fails.
If the value is ⊥, the match diverges.
Here, the value of undefined is ⊥, so the third bullet point applies, and the match diverges. And if the match diverges, the program diverges, and if the program diverges it must terminate with an error or -- at worst -- loop forever. It cannot continue as if nothing has happened. Admittedly, this last part is not explicitly stated, but it is the only reasonable interpretation for the semantics of a divergent evaluation of an expression.
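These rules are easy to observe directly (a small sketch; strictUnit and lazyUnit are hypothetical names, the latter using an irrefutable pattern, which by rule matches without any evaluation):

strictUnit :: () -> String
strictUnit () = "forced"       -- strictUnit undefined diverges, per the rule above

lazyUnit :: () -> String
lazyUnit ~() = "not forced"    -- lazyUnit undefined = "not forced"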
Related
In GHC.Prim, we find a magical function named dataToTag#:
dataToTag# :: a -> Int#
It turns a value of any type into an integer based on the data constructor it uses. This is used to speed up derived implementations of Eq, Ord, and Enum. In the GHC source, the docs for dataToTag# explain that the argument should already be evaluated:
The dataToTag# primop should always be applied to an evaluated argument.
The way to ensure this is to invoke it via the 'getTag' wrapper in GHC.Base:
getTag :: a -> Int#
getTag !x = dataToTag# x
It makes total sense to me that we need to force x's evaluation before dataToTag# is called. What I do not get is why the bang pattern is sufficient. The definition of getTag is just syntactic sugar for:
getTag :: a -> Int#
getTag x = x `seq` dataToTag# x
But let's turn to the docs for seq:
A note on evaluation order: the expression seq a b does not guarantee that a will be evaluated before b. The only guarantee given by seq is that both a and b will be evaluated before seq returns a value. In particular, this means that b may be evaluated before a. If you need to guarantee a specific order of evaluation, you must use the function pseq from the "parallel" package.
In the Control.Parallel module from the parallel package, the docs elaborate further:
... seq is strict in both its arguments, so the compiler may, for example, rearrange a `seq` b into b `seq` a `seq` b ...
How is it that getTag is guaranteed to work, given that seq is insufficient for controlling evaluation order?
GHC tracks certain information about each primop. One key datum is whether the primop "can_fail". The original meaning of this flag is that a primop can fail if it can cause a hard fault. For example, array indexing can cause a segmentation fault if the index is out of range, so indexing operations can fail.
If a primop can fail, GHC will restrict certain transformations around it, and in particular won't float it out of any case expressions. It would be rather bad, for example, if
if n < bound
    then unsafeIndex v n
    else error "out of range"
were compiled to
case unsafeIndex v n of
    !x -> if n < bound
              then x
              else error "out of range"
One of these bottoms is an exception; the other is a segfault.
dataToTag# is marked can_fail. So GHC sees (in Core) something like
getTag = \x -> case x of
    y -> dataToTag# y
(Note that case is strict in Core.) Because dataToTag# is marked can_fail, it won't be floated out of any case expressions.
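To see the machinery in action, here is a small sketch (assuming a GHC version where GHC.Exts still exposes dataToTag# with the type above; Colour and tagOf are made-up names):

{-# LANGUAGE MagicHash, BangPatterns #-}
import GHC.Exts (Int (I#), dataToTag#)

data Colour = Red | Green | Blue

-- Mirrors getTag: the bang pattern forces x, and the can_fail flag
-- prevents GHC from floating dataToTag# out of the resulting case.
tagOf :: Colour -> Int
tagOf !x = I# (dataToTag# x)

main :: IO ()
main = print (map tagOf [Red, Green, Blue])   -- [0,1,2]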
When I try to pattern-match a GADT in an proc syntax (with Netwire and Vinyl):
sceneRoot = proc inputs -> do
    let (Identity camera :& Identity children) = inputs
    returnA -< (<*>) (map (rGet draw) children) . pure
I get the (rather odd) compiler error below, from ghc-7.6.3:
My brain just exploded
I can't handle pattern bindings for existential or GADT data constructors.
Instead, use a case-expression, or do-notation, to unpack the constructor.
In the pattern: Identity cam :& Identity childs
I get a similar error when I put the pattern in the proc (...) pattern. Why is this? Is it unsound, or just unimplemented?
Consider the GADT
data S a where
    S :: Show a => S a
and the execution of the code
foo :: S a -> a -> String
foo s x = case s of
    S -> show x
In a dictionary-based Haskell implementation, one would expect that the value s is carrying a class dictionary, and that the case extracts the show function from said dictionary so that show x can be performed.
If we execute
foo undefined (\(x :: Int) -> 4 :: Int)
we get an exception. Operationally, this is expected, because we cannot access the dictionary.
More generally, case (undefined :: T) of K -> ... is going to produce an error because it forces the evaluation of undefined (provided that T is not a newtype).
Consider now the code (let's pretend that this compiles)
bar :: S a -> a -> String
bar s x = let S = s in show x
and the execution of
bar undefined (\(x :: Int) -> 4 :: Int)
What should this do? One might argue that it should generate the same exception as with foo. If this were the case, referential transparency would imply that
let S = undefined :: S (Int -> Int) in show (\(x :: Int) -> 4 :: Int)
fails as well with the same exception. This would mean that the let is evaluating the undefined expression, very unlike e.g.
let [] = undefined :: [Int] in 5
which evaluates to 5.
Indeed, the patterns in a let are lazy: they do not force the evaluation of the expression, unlike case. This is why e.g.
let (x,y) = undefined :: (Int,Char) in 5
successfully evaluates to 5.
One might want to make let S = e in e' evaluate e if a show is needed in e', but it feels rather weird. Also, when evaluating let S = e1 ; S = e2 in show ... it would be unclear whether to evaluate e1, e2, or both.
GHC at the moment chooses to forbid all these cases with a simple rule: no lazy patterns when eliminating a GADT.
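The rule is easy to observe in a self-contained example (ok and bad are hypothetical names; the rejected binding is left commented out because it does not compile):

{-# LANGUAGE GADTs #-}

data S a where
    S :: Show a => S a

-- Strict elimination via case is accepted:
ok :: S a -> a -> String
ok s x = case s of S -> show x

-- A lazy pattern binding is rejected at compile time:
-- bad :: S a -> a -> String
-- bad s x = let S = s in show x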
What is the precise promise/guarantee the Haskell language provides with respect to referential transparency? At least the Haskell report does not mention this notion.
Consider the expression
(7^7^7`mod`5`mod`2)
And I want to know whether or not this expression is 1. For safety, I will perform this twice:
( (7^7^7`mod`5`mod`2)==1, [False,True]!!(7^7^7`mod`5`mod`2) )
which now gives (True,False) with GHCi 7.4.1.
Evidently, this expression is now referentially opaque. How can I tell whether or not a program is subject to such behavior? I can inundate the program with :: annotations all over, but that does not make it very readable. Is there any class of Haskell programs in between that I am missing? That is, between a fully annotated and an unannotated one?
(Apart from the only somewhat related question I found on SO there must be something else on this)
I do not think there's any guarantee that evaluating a polymorphically typed expression such as 5 at different types will produce "compatible" results, for any reasonable definition of "compatible".
GHCi session:
> class C a where num :: a
> instance C Int where num = 0
> instance C Double where num = 1
> num + length [] -- length returns an Int
0
> num + 0 -- GHCi defaults to Double for some reason
1.0
This looks as if it's breaking referential transparency, since length [] and 0 should be equal, but under the hood it's num that's being used at different types.
Also,
> "" == []
True
> [] == [1]
False
> "" == [1]
*** Type error
where one could have expected False in the last line.
So, I think referential transparency only holds when the exact types are specified to resolve polymorphism. An explicit type parameter application à la System F would make it possible to always substitute a variable with its definition without altering the semantics: as far as I understand, GHC internally does exactly this during optimization to ensure that semantics is unaffected. Indeed, GHC Core has explicit type arguments which are passed around.
The problem is overloading, which does indeed sort of violate referential transparency. You have no idea what something like (+) does in Haskell; it depends on the type.
When a numeric type is unconstrained in a Haskell program the compiler uses type defaulting to pick some suitable type. This is for convenience, and usually doesn't lead to any surprises. But in this case it did lead to a surprise. In ghc you can use -fwarn-type-defaults to see when the compiler has used defaulting to pick a type for you. You can also add the line default () to your module to stop all defaulting.
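For example, here is a tiny module (a sketch; the exact warning wording varies across GHC versions) that triggers the warning and shows where default () would go:

{-# OPTIONS_GHC -fwarn-type-defaults #-}

-- Adding a `default ()` declaration to this module would turn the
-- defaulting below into an ambiguity error instead of a silent choice.

main :: IO ()
main = print (7^7^7 `mod` 5 `mod` 2)   -- warns that the type defaults to Integer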
I thought of something which might help clarify things...
The expression mod (7^7^7) 5 has type Integral a => a, so there are two common ways to convert it to an Int:
Perform all of the arithmetic using Integer operations and types and then convert the result to an Int.
Perform all of the arithmetic using Int operations.
If the expression is used in an Int context Haskell will perform method #2. If you want to force Haskell to use #1 you have to write:
fromInteger (mod (7^7^7) 5)
This will ensure that all of the arithmetic operations will be performed using Integer operations and types.
When you entered the expression at the GHCi REPL, the defaulting rules typed the expression as an Integer, so method #1 was used. When you used the expression with the !! operator, it was typed as an Int, so it was computed via method #2.
My original answer:
In Haskell the evaluation of an expression like
(7^7^7`mod`5`mod`2)
depends entirely on which Integral instance is being used, and this is something that every Haskell programmer learns to accept.
The second thing that every programmer (in any language) has to be aware of is that numeric operations are subject to overflow, underflow, loss of precision, etc., and thereby the laws of arithmetic may not always hold. For instance, x+1 > x is not always true; addition and multiplication of floating-point numbers are not always associative; the distributive law does not always hold; etc. When you create an overflowing expression you enter the realm of undefined behavior.
Also, in this particular case there are better ways to go about evaluating this expression which preserve more of our expectations of what the result should be. In particular, if you want to efficiently and accurately compute a^b mod c you should be using the "power mod" algorithm.
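For illustration, here is a hypothetical powMod (square-and-multiply); nothing in base has this exact name, but the technique is standard, and no intermediate value grows much beyond c^2:

-- powMod b e c computes b^e `mod` c without building the full power.
-- Assumes e >= 0.
powMod :: Integer -> Integer -> Integer -> Integer
powMod b e c
  | e == 0    = 1 `mod` c
  | even e    = let h = powMod b (e `div` 2) c in (h * h) `mod` c
  | otherwise = (b `mod` c * powMod b (e - 1) c) `mod` c

-- powMod 7 (7^7) 5 == 3, matching the Integer row in the output below,
-- without ever materializing the ~700,000-digit intermediate.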
Update: Run the following program to see how the choice of Integral instance affects what an expression evaluates to:
import Data.Int
import Data.Word
import Data.LargeWord -- cabal install largeword
expr :: Integral a => a
expr = (7^e `mod` 5)
  where e = 823543 :: Int

main :: IO ()
main = do
    putStrLn $ "as an Integer: " ++ show (expr :: Integer)
    putStrLn $ "as an Int64: " ++ show (expr :: Int64)
    putStrLn $ "as an Int: " ++ show (expr :: Int)
    putStrLn $ "as an Int32: " ++ show (expr :: Int32)
    putStrLn $ "as an Int16: " ++ show (expr :: Int16)
    putStrLn $ "as a Word8: " ++ show (expr :: Word8)
    putStrLn $ "as a Word16: " ++ show (expr :: Word16)
    putStrLn $ "as a Word32: " ++ show (expr :: Word32)
    putStrLn $ "as a Word128: " ++ show (expr :: Word128)
    putStrLn $ "as a Word192: " ++ show (expr :: Word192)
    putStrLn $ "as a Word224: " ++ show (expr :: Word224)
    putStrLn $ "as a Word256: " ++ show (expr :: Word256)
and the output (compiled with GHC 7.8.3, 64-bit):
as an Integer: 3
as an Int64: 2
as an Int: 2
as an Int32: 3
as an Int16: 3
as a Word8: 4
as a Word16: 3
as a Word32: 3
as a Word128: 4
as a Word192: 0
as a Word224: 2
as a Word256: 1
What is the precise promise/guarantee the Haskell language provides with respect to referential transparency? At least the Haskell report does not mention this notion.
Haskell does not provide a precise promise or guarantee. There exist many functions like unsafePerformIO or traceShow which are not referentially transparent. The extension called Safe Haskell however provides the following promise:
Referential transparency — Functions in the safe language are deterministic, evaluating them will not cause any side effects. Functions in the IO monad are still allowed and behave as usual. Any pure function though, as according to its type, is guaranteed to indeed be pure. This property allows a user of the safe language to trust the types. This means, for example, that the unsafePerformIO :: IO a -> a function is disallowed in the safe language.
Haskell provides an informal promise outside of this: the Prelude and base libraries tend to be free of side effects and Haskell programmers tend to label things with side effects as such.
Evidently, this expression is now referentially opaque. How can I tell whether or not a program is subject to such behavior? I can inundate the program with :: annotations all over, but that does not make it very readable. Is there any class of Haskell programs in between that I am missing? That is, between a fully annotated and an unannotated one?
As others have said, the problem emerges from this behavior:
Prelude> ( (7^7^7`mod`5`mod`2)==1, [False,True]!!(7^7^7`mod`5`mod`2) )
(True,False)
Prelude> 7^7^7`mod`5`mod`2 :: Integer
1
Prelude> 7^7^7`mod`5`mod`2 :: Int
0
This happens because 7^7^7 is a huge number (about 700,000 decimal digits) which easily overflows a 64-bit Int type, but the problem will not be reproducible on 32-bit systems:
Prelude> :m + Data.Int
Prelude Data.Int> 7^7^7 :: Int64
-3568518334133427593
Prelude Data.Int> 7^7^7 :: Int32
1602364023
Prelude Data.Int> 7^7^7 :: Int16
8823
If you use rem (7^7^7) 5, the remainder for Int64 is reported as -3, but since -3 is equivalent to +2 modulo 5, mod reports +2.
The Integer answer is used on the left due to the defaulting rules for Integral classes; the platform-specific Int type is used on the right due to the type of (!!) :: [a] -> Int -> a. If you use the appropriate indexing operator for Integral a you instead get something consistent:
Prelude> :m + Data.List
Prelude Data.List> ((7^7^7`mod`5`mod`2) == 1, genericIndex [False,True] (7^7^7`mod`5`mod`2))
(True,True)
The problem here is not referential transparency, because the function that we're calling, ^, is actually two different functions (as they have different types). What has tripped you up is typeclasses, which are an implementation of constrained ambiguity in Haskell; you have discovered that this ambiguity (unlike unconstrained ambiguity -- i.e. parametric types) can deliver counterintuitive results. This shouldn't be too surprising, but it's definitely a little strange at times.
Another type has been chosen, because !! requires an Int. The full computation now uses Int instead of Integer.
λ> ( (7^7^7`mod`5`mod`2 :: Int)==1, [False,True]!!(7^7^7`mod`5`mod`2) )
(False,False)
What do you think this has to do with referential transparency? Your uses of 7, ^, mod, 5, 2, and == are applications of those variables to dictionaries, yes, but I don't see why you think that fact makes Haskell referentially opaque. Often applying the same function to different arguments produces different results, after all!
Referential transparency has to do with this expression:
let x :: Int = 7^7^7`mod`5`mod`2 in (x == 1, [False, True] !! x)
x is here a single value, and should always have that same single value.
By contrast, if you say:
let x :: forall a. Num a => a; x = 7^7^7`mod`5`mod`2 in (x == 1, [False, True] !! x)
(or use the expression inline, which is equivalent), x is now a function, and can return different values depending on the Num argument you supply to it. You might as well complain that let f = (+1) in map f [1, 2, 3] is [2, 3, 4], but let f = (+3) in map f [1, 2, 3] is [4, 5, 6] and then say "Haskell gives different values for map f [1, 2, 3] depending on the context so it's referentially opaque"!
Probably another type-inference and referential-transparency related thing is the "dreaded" monomorphism restriction (its absence, to be exact). A direct quote:
An example, from "A History of Haskell":
Consider the genericLength function, from Data.List
genericLength :: Num a => [b] -> a
And consider the function:
f xs = (len, len)
  where
    len = genericLength xs
len has type Num a => a and, without the monomorphism restriction, it could be computed twice.
Notice that in this case the types of both expressions are the same. The results are too, but the substitution isn't always possible.
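A self-contained sketch of the quote's point (the explicit signature on len keeps it polymorphic, so it is instantiated, and potentially computed, once per use site):

import Data.List (genericLength)

f :: [b] -> (Int, Double)
f xs = (len, len)           -- len is used at Int and at Double
  where
    len :: Num a => a
    len = genericLength xs

main :: IO ()
main = print (f "abc")      -- (3,3.0)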
I have been wondering about this a lot, but I haven't been able to find anything about it.
When using the seq function, how does it really work? Everywhere, it is just explained by saying that seq a b evaluates a, discards the result, and returns b.
But what does that really mean? Would the following result in strict evaluation:
foo s t = seq q (bar q t) where
    q = s*t
What I mean is, is q strictly evaluated before being used in bar? And would the following be equivalent:
foo s t = seq (s*t) (bar (s*t) t)
I find it a little hard getting specifics on the functionality of this function.
You're not alone. seq is probably one of the most difficult Haskell functions to use properly, for a few different reasons. In your first example:
foo s t = seq q (bar q t) where
    q = s*t
q is evaluated before bar q t is evaluated. If bar q t is never evaluated, q won't be either. So if you have
main = do
    let val = foo 10 20
    return ()
as val is never used, it won't be evaluated. So q won't be evaluated either. If you instead have
main = print (foo 10 20)
the result of foo 10 20 is evaluated (by print), so within foo q is evaluated before the result of bar.
This is also why this doesn't work:
myseq x = seq x x
Semantically, this means the first x will be evaluated before the second x is evaluated. But if the second x is never evaluated, the first one doesn't need to be either. So seq x x is exactly equivalent to x.
Your second example may or may not be the same thing. Here, the expression s*t will be evaluated before bar's output, but it may not be the same s*t as the first parameter to bar. If the compiler performs common sub-expression elimination, it may common-up the two identical expressions. GHC can be fairly conservative about where it does CSE though, so you can't rely on this. If I define bar q t = q*t it does perform the CSE and evaluate s*t before using that value in bar. It may not do so for more complex expressions.
You might also want to know what is meant by strict evaluation. seq evaluates the first argument to weak head normal form (WHNF), which for data types means unpacking the outermost constructor. Consider this:
baz xs y = seq xs (map (*y) xs)
xs must be a list, because of map. When seq evaluates it, it will essentially transform the code into
case xs of
    []    -> map (*y) xs
    (_:_) -> map (*y) xs
This means it will determine if the list is empty or not, then return the second argument. Note that none of the list values are evaluated. So you can do this:
Prelude> seq [undefined] 4
4
but not this
Prelude> seq undefined 5
*** Exception: Prelude.undefined
Whatever data type you use for seq's first argument, evaluating to WHNF will go far enough to figure out the constructor and no further. Unless the data type has components that are marked as strict with a strictness annotation (!); then all the strict fields will also be evaluated to WHNF.
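For example (a sketch; P is a hypothetical type with one strict and one lazy field):

data P = P !Int Int

main :: IO ()
main = do
    print (P 0 undefined `seq` "ok")   -- "ok": the lazy field is untouched
    print (P undefined 0 `seq` "ok")   -- diverges: the strict field is forced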
Edit: (thanks to Daniel Wagner for suggestion in comments)
For functions, seq will evaluate the expression until the function "has a lambda showing", which means that it's ready for application. Here are some examples that might demonstrate what this means:
-- ok, lambda is outermost
Prelude> seq (\x -> undefined) 'a'
'a'
-- not ok. Because of the inner seq, `undefined` must be evaluated before
-- the lambda is showing
Prelude> seq (seq undefined (\x -> x)) 'b'
*** Exception: Prelude.undefined
If you think of a lambda binding as a (built-in) data constructor, seq on functions is perfectly consistent with using it on data.
Also, "lambda binding" subsumes all types of function definitions, whether defined by lambda notation or as a normal function.
The Controversy section of the HaskellWiki's seq page has a little about some of the consequences of seq in relation to functions.
You can think of seq as:
seq a b = case a of
    _ -> b
This will evaluate a to weak head normal form (WHNF) and then continue with evaluating b.
Edit after augustss comment: this case ... of is the strict, GHC Core one, which always forces its argument.
I have the following function:
appendMsg :: String -> (String, Integer) -> Map.Map (String, Integer) [String] -> Map.Map (String, Integer) [String]
appendMsg a (b,c) m = do
    let Just l = length . concat <$> Map.lookup (b,c) m
        l' = l + length a
    --guard $ l' < 1400
    --return $ Map.adjust (++ [a]) (b, c) m
    if l' < 1400 then let m2 = Map.adjust (++ [a]) (b, c) m in return m2 else return (m)
In case l' < 1400 the value m2 should be created and returned; in the other case I eventually want to call a second function, but for now it is sufficient to return nothing or m.
I started with guards and was immediately running in the error
No instance for (Monad (Map.Map (String, Integer)))
  arising from a use of `return'
Then I tried if-then-else, had to fix some misunderstanding and eventually ended up with the very same error.
I want to know how to fix this. I do understand what a Monad is and that data in Haskell is immutable. However, after reading two books on Haskell, I still feel quite far away from being in a position to code something useful. There is always one more Monad... :D
This looks very much like you're confused about what return does. Unlike in many imperative languages, return is not the way you return a value from a function.
In Haskell you define a function by saying something like f a b c = some_expression. The right hand side is the expression that the function "returns". There's no need for a "return statement" as you would use in imperative languages, and in fact it wouldn't make sense. In imperative languages where a function is a sequence of statements that are executed, it fits naturally to determine what result the function evaluates to with a return statement which terminates the function and passes a value back up. In Haskell, the right hand side of your function is a single expression, and it doesn't really make sense to suddenly put a statement in the middle of that expression that somehow terminates evaluation of the expression and returns something else instead.
In fact, return is itself a function, rather than a statement. It was perhaps poorly named, given that it isn't what imperative programmers learning Haskell would expect. return foo is how you wrap foo up into a monadic value; it doesn't have any effect on flow control, but just evaluates to foo in the context of whatever monad you're using. It doesn't look like you're using monads at all (certainly the type for your function is wrong if you are).
It's also worth noting that if/then/else isn't a statement in Haskell either; the whole if/then/else block is an expression, which evaluates to either the then expression or the else expression, depending on the value of the condition.
So take away the do, which just provides convenient syntax for working with monads. Then you don't need to return anything, just have m2 as your then branch (with the let binding, or since you only use m2 once you could just replace your whole then expression with Map.adjust (++ [a]) (b, c) m), and m in your else branch. You then wrap that in a let ... in ... block to get your bindings of l and l'. And the whole let ... in ... construct is also an expression, so it can just be the right hand side of your function definition as-is. No do, no return.
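Putting that advice together, the function might look like this (a sketch, not necessarily the only fix; the partial Just pattern from the question is kept, so it still assumes the key is present in the map):

import qualified Data.Map as Map

appendMsg :: String -> (String, Integer) -> Map.Map (String, Integer) [String] -> Map.Map (String, Integer) [String]
appendMsg a (b, c) m =
    let Just l = length . concat <$> Map.lookup (b, c) m
        l'     = l + length a
    in if l' < 1400
          then Map.adjust (++ [a]) (b, c) m
          else m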
You state that the result of your function is
Map.Map (String, Integer) [String]
and, through the use of return, state that it is also monadic. Hence the compiler thinks there should be a Monad instance for Map (String, Integer).
But I am quite sure this will compile if you drop return and do and write in before the if. Or at least it will give you more meaningful error messages.