Why are if expressions frowned upon in Haskell? - haskell

This has been a question I've been wondering about for a while. if statements are staples in most programming languages (at least the ones I've worked with), but in Haskell it seems to be quite frowned upon. I understand that for complex situations Haskell's pattern matching is much cleaner than a bunch of ifs, but is there any real difference?
For a simple example, take a homemade version of sum (yes, I know it could just be foldr (+) 0):
sum :: [Int] -> Int
-- separate all the cases out
sum [] = 0
sum (x:xs) = x + sum xs
-- guards
sum xs
  | null xs   = 0
  | otherwise = (head xs) + sum (tail xs)
-- case
sum xs = case xs of
  [] -> 0
  _  -> (head xs) + sum (tail xs)
-- if statement
sum xs = if null xs then 0 else (head xs) + sum (tail xs)
As a second question, which one of these options is considered "best practice" and why? My professor way back when always used the first method whenever possible, and I'm wondering if that's just his personal preference or if there was something behind it.

The problem with your examples is not the if expressions, it's the use of partial functions like head and tail. If you try to call either of these with an empty list, it throws an exception.
> head []
*** Exception: Prelude.head: empty list
> tail []
*** Exception: Prelude.tail: empty list
If you make a mistake when writing code using these functions, the error will not be detected until run time. If you make a mistake with pattern matching, your program will not compile.
For example, let's say you accidentally switched the then and else parts of your function.
-- Compiles, throws error at run time.
sum xs = if null xs then (head xs) + sum (tail xs) else 0
-- Doesn't compile. Also stands out more visually.
sum [] = x + sum xs
sum (x:xs) = 0
Note that your example with guards has the same problem.
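For what it's worth, a hedged sketch: the guard style can also avoid head and tail by using a pattern guard, which binds names just like an ordinary pattern (sumGuarded is renamed here only to avoid clashing with Prelude.sum). It still doesn't get the exhaustiveness check that separate equations give you, but the partial functions are gone:
sumGuarded :: [Int] -> Int
sumGuarded xs
  | (y:ys) <- xs = y + sumGuarded ys  -- the pattern guard binds y and ys on success
  | otherwise    = 0                  -- covers the empty list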

I think the Boolean Blindness article answers this question very well. The problem is that boolean values have lost all their semantic meaning as soon as you construct them. That makes them a great source for bugs and also makes the code more difficult to understand.
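A small sketch of what "boolean blindness" means in practice (describe and describe' are invented names): once null xs has produced a Bool, the fact it encodes, namely that the list has a first element, is not visible to the compiler, so you are back to the partial head; a pattern match hands you that first element directly.
describe :: [Int] -> String
describe xs =
  if null xs
    then "empty"
    else "starts with " ++ show (head xs)  -- the Bool carries no evidence; head is still partial

describe' :: [Int] -> String
describe' []    = "empty"
describe' (x:_) = "starts with " ++ show x -- the match itself supplies the first element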

Your first version, the one preferred by your prof, has the following advantages compared to the others:
no mention of null
list components are named in the pattern, so no mention of head and tail.
I do think that this one is considered "best practice".
What's the big deal? Why would we want to avoid head and tail in particular? Well, everybody knows that those functions are not total, so one automatically tries to make sure that all cases are covered. Not only does a pattern match on [] stand out more than null xs, but a series of pattern matches can also be checked by the compiler for completeness. Hence, the idiomatic version with a complete pattern match is easier to grasp (for the trained Haskell reader) and easier for the compiler to prove exhaustive.
The second version is slightly better than the last one because one sees at once that all cases are handled. Still, in the general case the RHS of the second equation could be longer, and there could be a where clause with a couple of definitions, the last of which could be something like:
where
  ... many definitions here ...
  head xs = ... alternative redefinition of head ...
To be absolutely sure you understand what the RHS does, you have to make sure that common names have not been redefined.
The 3rd version is the worst one IMHO: (a) the second alternative fails to deconstruct the list and still uses head and tail; (b) the case is slightly more verbose than the equivalent notation with two equations.
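To be fair to case, objection (a) is about how this particular case was written rather than about case itself. A hedged sketch of the same function with the alternative deconstructing the list directly (renamed sumCase here only to avoid clashing with Prelude.sum):
sumCase :: [Int] -> Int
sumCase xs = case xs of
  []     -> 0
  (y:ys) -> y + sumCase ys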

In many programming languages, if-statements are fundamental primitives, and things like switch-blocks are just syntax sugar to make deeply-nested if-statements easier to write.
Haskell does it the other way around. Pattern matching is the fundamental primitive, and an if-expression is literally just syntax sugar for pattern matching. Similarly, constructs like null and head are simply user-defined functions, which are all ultimately implemented using pattern matching. So pattern matching is the thing at the bottom of it all. (And therefore potentially more efficient than calling user-defined functions.)
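Concretely, the Haskell Report translates if c then t else e into a case on the two constructors of Bool. A small sketch of that translation written as an ordinary function (ifThenElse is just an illustrative name):
ifThenElse :: Bool -> a -> a -> a
ifThenElse c t e =
  case c of
    True  -> t
    False -> e
So an if expression adds no expressive power; it is the special case of matching on Bool.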
In many cases - such as the ones you list above - it's simply a matter of style. The compiler can almost certainly optimise things to the point where all versions are roughly equal in performance. But generally [not always!] pattern matching makes it clearer exactly what you're trying to achieve.
(It's annoyingly easy to write an if-expression where you get the two alternatives the wrong way around. You'd think it would be a rare mistake, but it's surprisingly common. With a pattern match, there's little chance of making that specific mistake, although there's still plenty of other things to screw up.)

Each call to null, head and tail entails a pattern match. But the 1st version in your question does just one pattern match, and reuses its results through the named components of the pattern.
Just for that, it is better. But it is also more visually apparent, more readable.

Pattern matching is better than a string of if-then-else statements for (at least) the following reasons:
it is more declarative
it interacts well with sum-types
Pattern matching helps to reduce the amount of "accidental complexity" in your code - that is, code that is really more about implementation details rather than the essential logic of your program.
In most other languages, when the compiler/run-time sees a string of if-then-else statements it has no choice but to test the conditions in exactly the order the programmer specified them. But pattern matching encourages the programmer to focus more on describing what should happen rather than how it should be performed. Thanks to purity and the immutability of values in Haskell, the compiler can consider the collection of patterns as a whole and decide how best to implement them.
An analogy would be C's switch statement. If you dump the assembly code for various switch statements you will see that sometimes the compiler will generate a chain/tree of comparisons and in other cases it will generate a jump table. The programmer uses the same syntax in both cases - the compiler chooses the implementation based on what the comparison values are. If they form a contiguous block of values the jump table method is used, otherwise a comparison tree is used. And this separation of concerns allows the compiler to implement even more strategies in the future if other patterns among the comparison values are detected.
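To make the sum-type point concrete, here is a hedged sketch with an invented Shape type: because the compiler knows every constructor of the type, it can warn (with -Wincomplete-patterns) when a case analysis misses one, which a chain of boolean tests cannot offer.
data Shape
  = Circle Double             -- radius
  | Rectangle Double Double   -- width and height

area :: Shape -> Double
area (Circle r)      = pi * r * r
area (Rectangle w h) = w * h
-- If a Triangle constructor is later added to Shape, the compiler can point at
-- every function like area that forgot to handle it.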

Related

Haskell style: Pattern matching vs. more intuitive solutions

I'm just starting out with Haskell, so I'm trying to wrap my head around the "Haskell way of thinking." Is there a reason to use pattern matching to solve Problem 1 here basically by unwrapping the whole list and calling the function recursively, instead of just retrieving the last element directly like myLast lst = lst !! ((length lst) - 1)? It seems almost brute-force, but I assume it's just my lack of familiarity here.
A few things I can think of:
(!!) and length are ultimately implemented using recursion over the structure of the list. That being so, it can be a worthwhile learning exercise to implement those basic functions using explicit recursion.
Keep in mind that, under the hood, the retrieval of the last element is not direct. Since we are dealing with linked lists, length has to go through all elements of the lists, and (!!) has to go through all elements up to the desired index. That being so, lst !! (length lst - 1) runs through the whole list twice, rather than once. (This is one of the reasons why, as a rule of thumb, length is better avoided unless you actually need to know the number of elements in and of itself, and not just as a proxy to something else.)
Pattern matching is a neat way of stating facts about the structure of data types. If, while consuming a list recursively, you match a [x] pattern (or, equivalently, x : [] -- an element consed to the empty list), you know that x is the last element. In a way, matching [x] involves one less level of indirection than accessing the list element at index length lst - 1, as it only deals with the structure of the list, without requiring an indexing scheme to be bolted on the top of it.
With all that said, there is something fundamentally right about your feeling that explicit recursion feels "almost brute-force". In time, you'll find out about folds, mapping functions, and other ways to capture and abstract common recursive patterns, making it possible to write in a more fluent manner.
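For concreteness, a hedged sketch of both styles mentioned above: a direct recursive myLast in which the [x] pattern says "this is the last element", and a fold-based variant that captures the same recursion (myLast' is just an illustrative name):
myLast :: [a] -> a
myLast [x]    = x                           -- one-element list: x is the last element
myLast (_:xs) = myLast xs                   -- otherwise drop the head and keep going
myLast []     = error "myLast: empty list"  -- or return a Maybe instead

myLast' :: [a] -> a
myLast' = foldl1 (\_ x -> x)                -- keep only the most recently seen element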

what's the meaning of "you do computations in Haskell by declaring what something is instead of declaring how you get it"?

Recently I've been trying to learn a functional programming language, and I chose Haskell.
Now I am reading Learn You a Haskell, and here is a description that seems like Haskell's philosophy, which I am not sure I understand exactly: you do computations in Haskell by declaring what something is instead of declaring how you get it.
Suppose I want to get the sum of a list.
In a declaring how you get it way:
get the total sum by adding all the elements, so the code will be like this (not Haskell, but Python):
sum = 0
for i in l:
    sum += i
print(sum)
In a what something is way:
the total sum is the sum of the first element and the sum of the rest of the elements, so the code will be like this:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs
But I am not sure whether I get it or not. Can someone help? Thanks.
Imperative and functional are two different ways to approach problem solving.
Imperative (Python) gives you actions which you need to use to get what you want. For example, you may tell the computer "knead the dough. Then put it in the oven. Turn the oven on. Bake for 10 minutes.".
Functional (Haskell, Clojure) gives you solutions. You'd be more likely to tell the computer "I have flour, eggs, and water. I need bread". The computer happens to know dough, but it doesn't know bread, so you tell it "bread is dough that has been baked". The computer, knowing what baking is, knows now how to make bread. You sit at the table for 10 minutes while the computer does the work for you. Then you enjoy delicious bread fresh from the oven.
You can see a similar difference in how engineers and mathematicians work. The engineer is imperative, looking at the problem and giving workers a blueprint to solve it. The mathematician defines the problem (solve for x) and the solution (x = -----) and may use any number of tried and true solutions to smaller problems (2x - 1 = ----- => 2x = ----- + 1) until he finally finds the desired solution.
It is not a coincidence that functional languages are used largely by people in universities, not because it is difficult to learn, but because there are not many mathematical thinkers outside of universities. In your quotation, they tried to define this difference in thought process by cleverly using how and what. I personally believe that everybody understands words by turning them into things they already understand, so I'd imagine my bread metaphor should clarify the difference for you.
EDIT: It is worth noting that when you imperatively command the computer, you don't know if you'll have bread at the end (maybe you cooked it too long and it's burnt, or you didn't add enough flour). This is not a problem in functional languages where you know exactly what each solution gives you. There is no need for trial and error in a functional language because everything you do will be correct (though not always useful, like accidentally solving for t instead of x).
The missing part of the explanations is the following.
The imperative example shows you step by step how to compute the sum. At no stage can you convince yourself that it is indeed the sum of the elements of a list. For example, there is no way of knowing why sum = 0 at first, whether it should be 0 at all, whether you loop through the right indices, or what sum += i gives you.
sum = 0       # why? it may become clear if you consider what happens in the loop,
              # but not on its own
for i in l:
    sum += i  # what do we get? it will become clear only after the loop ends
              # at no step of the iteration do you have *the sum of the list*,
              # so the step on its own is not meaningful
The declarative example is very different in this respect. In this particular case you start by declaring that the sum of an empty list is 0. This is already part of the answer to what the sum is. Then you add a statement about non-empty lists: the sum of a non-empty list is the sum of the tail with the head element added to it. This is the declaration of what the sum is. You can demonstrate inductively that it finds the solution for any list.
Note this proof part. In this case it is obvious. In more complex algorithms it is not obvious, so the proof of correctness is a substantial part - and remember that the imperative case only makes sense as a whole.
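To make "declaring what the sum is" concrete, here is the kind of equational reasoning the definition supports, unfolding sum' from the question one equation at a time:
sum' [1,2,3]
  = 1 + sum' [2,3]            -- second equation, x = 1, xs = [2,3]
  = 1 + (2 + sum' [3])        -- second equation again
  = 1 + (2 + (3 + sum' []))   -- and again
  = 1 + (2 + (3 + 0))         -- first equation: sum' [] = 0
  = 6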
Another way to compute the sum, where, hopefully, the declarativeness and provability become clearer:
sum [] = 0                   -- the sum of the empty list is 0
sum [x] = x                  -- the sum of a list with 1 element is that element
sum xs = sum $ p xs where    -- the sum of any other list is
                             -- the sum of the list reduced with p
  p (x:y:xs) = x+y : p xs    -- p reduces the list by replacing a pair of elements
                             -- with their sum
  p xs = xs                  -- if there was no pair of elements, leave the list as is
Here we can convince ourselves that:
1. p makes the list ever shorter, so the computation of the sum will terminate;
2. p produces a list of sums, so by summing ever shorter lists we get a list of just one element;
3. because (+) is associative, the value produced by repeatedly applying p is the same as the sum of all elements in the original list;
4. we can demonstrate the number of applications of (+) is smaller than in the straightforward implementation.
In other words, the order of adding the elements doesn't matter, so we can sum the elements ([a,b,c,d,e]) in pairs first (a+b, c+d), which gives us a shorter list [a+b,c+d,e], whose sum is the same as the sum of the original list, and which now can be reduced in the same way: [(a+b)+(c+d),e], then [((a+b)+(c+d))+e].
Robert Harper claims in his blog that "declarative" has no meaning. I suppose he is talking about a clear definition there, which I usually think of as narrower than meaning, but the post is still worth checking out and hints that you might not find as clear an answer as you would wish.
Still, everybody talks about "declarative", and it feels like when we do we usually talk about the same thing. That is, give a number of people two different APIs/languages/programs and ask them which is the most declarative one, and they will usually pick the same one.
The confusing part to me at first was that your declarative sum
sum' [] = 0
sum' (x:xs) = x + sum' xs
can also be seen as an instruction on how to get the result. It's just a different one.
It's also worth noting that the function sum in the prelude isn't actually defined like that
since that particular way of calculating the sum is inefficient. So clearly something is
fishy.
So, the "what, not how" explanation seem unsatisfactory to me. I think of it instead as
declarative being a "how" which in addition have some nice properties. My current intuition
about what those properties are is something similar to:
A thing is more declarative if it doesn't mutate any state.
A thing is more declarative if you can do mathy transformations on it and the meaning of
the thing sort of remains intact. So given your declarative sum again, if we knew that
+ is commutative there is some justification for thinking that writing it like
sum' xs + x should yield the same result.
A declarative thing can be decomposed into smaller things and still have some meaning. Like
x and sum' xs still have the same meaning when taken separately, but trying to do the
same with Python's sum += i doesn't work as well.
A thing is more declarative if it's independent of the flow of time. For example css
doesn't describe the styling of a web page at page load. It describes the styling of the
web page at any time, even if the page would change.
A thing is more declarative if you don't have to think about program flow.
Other people might have different intuitions, or even a definition that I'm not aware of,
but hopefully these are somewhat helpful regardless.

What requirements does non-strict semantics of Haskell have on the evaluation strategy?

The Haskell language specification states that it is a non-strict language, but says nothing about the evaluation strategy (like when and how an expression is evaluated, and to what level). It does mention the word "evaluate" several times when talking about pattern matching.
I have read a wonderful tutorial about lazy evaluation and weak head normal form, but that is just the implementation strategy of one compiler, which I should not depend on when writing code.
I come from a strict-language background and I just don't feel right if I don't understand how my code is executed. I wonder why the language specification does not define the evaluation strategy.
I hope someone can enlighten me. Thanks!
I would argue that trying to care about evaluation order is counterproductive in Haskell. Not only is the language designed to make evaluation order irrelevant, but the evaluation order can jump all over the place in strange and confusing ways. Additionally, the implementation has substantial freedom to execute things differently[1] or to vastly restructure your program[2] if the end result is still the same, so different parts of the program might be evaluated in different ways.
The only real restriction is what evaluation strategies you can't use. For example, you can't always use strict evaluation because that would cause valid programs to crash or enter infinite loops.
const 17 undefined
take 10 (repeat 17)
That said, if you really care, one valid strategy you could possibly use to implement all of Haskell is lazy evaluation with thunks. Each value is represented in a box that either contains the value or a thunk subroutine that can be used to compute the value when you finally need to use it.
So when you write
let xs = 1 : []
You would be kind of doing this:
xs --> {thunk}
If you never inspect the contents of xs then the thunk stays unevaluated. However, if you ever do some pattern matching on it, then you need to evaluate the thunk to see which branch to take:
case xs of
  []     -> ...
  (y:ys) -> ...
Now xs's thunk is evaluated and the resulting value is stored in the box in case you ever need it again. This avoids having to recompute the thunk. Note that in the following diagram I'm using Cons to stand for the (:) list constructor.
xs ---> {Cons {thunk} {thunk}}
                 ^       ^
                 |       |
                 y       ys
Of course, just the presence of the pattern match isn't enough; something needs to force the evaluation of that pattern match in the first place. Ultimately, this boils down to your main function needing to print a value or something like that.
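A tiny sketch of that chain of demands in a stand-alone program (the names are arbitrary): nothing about xs is computed until the case expression, reached from main, asks for it.
main :: IO ()
main = do
  let xs = 1 : []       -- xs is bound to a thunk; nothing is demanded yet
  case xs of            -- choosing a branch forces xs to weak head normal form
    []      -> putStrLn "empty"
    (y : _) -> print y  -- y's own thunk is forced only when print needs the number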
Another thing to point out is that we didn't immediately evaluate the contents of the Cons constructor when we first evaluate it. You can check this by running a program that doesn't use the contents of the list:
length [undefined, undefined]
Of course, when we actually use y for something then its corresponding thunk gets evaluated.
Additionally, you can mark constructor fields as strict with a strictness annotation (!) so they get evaluated as soon as the constructor itself is evaluated. (Strictness annotations on constructor fields are standard Haskell; the BangPatterns extension is only needed for bangs in ordinary patterns, as sketched further below.)
data LazyBox = LazyBox Int
data StrictBox = StrictBox !Int
case (LazyBox undefined) of (LazyBox _) -> 17 -- Gives 17
case (StrictBox undefined) of (StrictBox _) -> 17 -- *** Exception: Prelude.undefined
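For completeness, the bang-pattern form proper (which does require GHC's BangPatterns extension) forces an argument at the pattern rather than at a constructor field; a minimal sketch:
{-# LANGUAGE BangPatterns #-}

ignoreButForce :: Int -> Int
ignoreButForce !x = 17   -- the bang forces x even though x is never used
-- ignoreButForce undefined now throws Prelude.undefined,
-- whereas without the bang it would simply return 17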
[1]: One important optimization that GHC does is using strict evaluation in the sections that the strictness analyzer determines to be strict.
[2]: One of the most radical examples would be deforestation.

Why does Haskell's `head` crash on an empty list (or why *doesn't* it return an empty list)? (Language philosophy)

Note to other potential contributors: Please don't hesitate to use abstract or mathematical notations to make your point. If I find your answer unclear, I will ask for elucidation, but otherwise feel free to express yourself in a comfortable fashion.
To be clear: I am not looking for a "safe" head, nor is the choice of head in particular exceptionally meaningful. The meat of the question follows the discussion of head and head', which serve to provide context.
I've been hacking away with Haskell for a few months now (to the point that it has become my main language), but I am admittedly not well-informed about some of the more advanced concepts nor the details of the language's philosophy (though I am more than willing to learn). My question then is not so much a technical one (unless it is and I just don't realize it) as it is one of philosophy.
For this example, I am speaking of head.
As I imagine you'll know,
Prelude> head []
*** Exception: Prelude.head: empty list
This follows from head :: [a] -> a. Fair enough. Obviously one cannot return an element of (hand-wavingly) no type. But at the same time, it is simple (if not trivial) to define
head' :: [a] -> Maybe a
head' [] = Nothing
head' (x:xs) = Just x
I've seen some little discussion of this here in the comment section of certain statements. Notably, one Alex Stangl says
'There are good reasons not to make everything "safe" and to throw exceptions when preconditions are violated.'
I do not necessarily question this assertion, but I am curious as to what these "good reasons" are.
Additionally, a Paul Johnson says,
'For instance you could define "safeHead :: [a] -> Maybe a", but now instead of either handling an empty list or proving it can't happen, you have to handle "Nothing" or prove it can't happen.'
The tone that I read from that comment suggests that this is a notable increase in difficulty/complexity/something, but I am not sure that I grasp what he's putting out there.
One Steven Pruzina says (in 2011, no less),
"There's a deeper reason why e.g 'head' can't be crash-proof. To be polymorphic yet handle an empty list, 'head' must always return a variable of the type which is absent from any particular empty list. It would be Delphic if Haskell could do that...".
Is polymorphism lost by allowing empty list handling? If so, how so, and why? Are there particular cases which would make this obvious? This section was amply answered by Russell O'Connor. Any further thoughts are, of course, appreciated.
I'll edit this as clarity and suggestion dictates. Any thoughts, papers, etc., you can provide will be most appreciated.
Is polymorphism lost by allowing empty list handling? If so, how so, and why? Are there particular cases which would make this obvious?
The free theorem for head states that
f . head = head . map f
Applying this theorem to [] implies that
f (head []) = head (map f []) = head []
This theorem must hold for every f, so in particular it must hold for const True and const False. This implies
True = const True (head []) = head [] = const False (head []) = False
Thus if head is properly polymorphic and head [] were a total value, then True would equal False.
PS. I have some other comments about the background to your question to the effect of if you have a precondition that your list is non-empty then you should enforce it by using a non-empty list type in your function signature instead of using a list.
Why does anyone use head :: [a] -> a instead of pattern matching? One of the reasons is because you know that the argument cannot be empty and do not want to write the code to handle the case where the argument is empty.
Of course, your head' of type [a] -> Maybe a is defined in the standard library as Data.Maybe.listToMaybe. But if you replace a use of head with listToMaybe, you have to write the code to handle the empty case, which defeats this purpose of using head.
I am not saying that using head is a good style. It hides the fact that it can result in an exception, and in this sense it is not good. But it is sometimes convenient. The point is that head serves some purposes which cannot be served by listToMaybe.
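For illustration, a sketch of what "handling the empty case" looks like at a listToMaybe call site (firstOrZero is a made-up name); the default value is the decision you are forced to write down:
import Data.Maybe (fromMaybe, listToMaybe)

firstOrZero :: [Int] -> Int
firstOrZero xs = fromMaybe 0 (listToMaybe xs)  -- the 0 is the explicit empty-list decision
With head there is no such decision to spell out, which is exactly why it is convenient when you believe the empty case cannot arise.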
The last quotation in the question (about polymorphism) simply means that it is impossible to define a function of type [a] -> a which returns a value on the empty list (as Russell O'Connor explained in his answer).
It's only natural to expect the following to hold: xs === head xs : tail xs - a list is identical to its first element, followed by the rest. Seems logical, right?
Now, let's count the number of conses (applications of :), disregarding the actual elements, when applying the purported 'law' to []: [] should be identical to foo : bar, but the former has 0 conses, while the latter has (at least) one. Uh oh, something's not right here!
Haskell's type system, for all its strengths, is not up to expressing the fact that you should only call head on a non-empty list (and that the 'law' is only valid for non-empty lists). Using head shifts the burden of proof to the programmer, who should make sure it's not used on empty lists. I believe dependently typed languages like Agda can help here.
Finally, a slightly more operational-philosophical description: how should head ([] :: [a]) :: a be implemented? Conjuring a value of type a out of thin air is impossible (think of uninhabited types such as data Falsum), and would amount to proving anything (via the Curry-Howard isomorphism).
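Spelling that last point out as code, using the Falsum type mentioned above: any attempt at a total head has nothing it could put on the right-hand side of the empty-list equation (headF is an invented name).
data Falsum   -- no constructors: no ordinary value of this type exists

headF :: [Falsum] -> Falsum
headF (x:_) = x
-- headF []  = ???   -- nothing (other than error, undefined, or an infinite loop)
--                      could possibly go here; GHC's incompleteness warning is the point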
There are a number of different ways to think about this. So I am going to argue both for and against head':
Against head':
There is no need to have head': Since lists are a concrete data type, everything that you can do with head' you can do by pattern matching.
Furthermore, with head' you're just trading off one functor for another. At some point you want to get down to brass tacks and get some work done on the underlying list element.
In defense of head':
But pattern matching obscures what's going on. In Haskell we are interested in calculating functions, which is better accomplished by writing them in point-free style using compositions and combinators.
Furthermore, thinking about the [] and Maybe functors, head' allows you to move back and forth between them (In particular the Applicative instance of [] with pure = replicate.)
If in your use case an empty list makes no sense at all, you can always opt to use NonEmpty instead, where neHead is safe to use. If you see it from that angle, it's not the head function that is unsafe, it's the whole list data-structure (again, for that use case).
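A sketch of that option using Data.List.NonEmpty from base (the answer's neHead corresponds to Data.List.NonEmpty.head in the current API); the names below are only illustrative:
import Data.List.NonEmpty (NonEmpty ((:|)))
import qualified Data.List.NonEmpty as NE

firstGreeting :: NonEmpty String -> String
firstGreeting (g :| _) = g   -- matching on :| can never fail: there is no empty NonEmpty

firstGreeting' :: NonEmpty String -> String
firstGreeting' = NE.head     -- same thing; NE.head is total, unlike Prelude.head

example :: String
example = firstGreeting ("hello" :| ["world"])  -- the type demands at least one element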
I think this is a matter of simplicity and beauty. Which is, of course, in the eye of the beholder.
If you are coming from a Lisp background, you may be aware that lists are built of cons cells, each cell having a data element and a pointer to the next cell. The empty list is not a list per se, but a special symbol, and Haskell goes along with this reasoning.
In my view it is cleaner, simpler to reason about, and more traditional if the empty list and a non-empty list are two different things.
...I may add - if you are worried about head being unsafe - don't use it, use pattern matching instead:
sum [] = 0
sum (x:xs) = x + sum xs

What is the difference between Pattern Matching and Guards?

I am very new to Haskell and to functional programming in general. My question is pretty basic. What is the difference between Pattern Matching and Guards?
Function using pattern matching
check :: [a] -> String
check [] = "Empty"
check (x:xs) = "Contains Elements"
Function using guards
check_ :: [a] -> String
check_ lst
  | length lst < 1 = "Empty"
  | otherwise      = "Contains elements"
To me it looks like Pattern Matching and Guards are fundamentally the same. Both evaluate a condition, and if true will execute the expression hooked to it. Am I correct in my understanding?
In this example I can either use pattern matching or guards to arrive at the same result. But something tells me I am missing out on something important here. Can we always replace one with the other?
Could someone give examples where pattern matching is preferred over guards and vice versa?
Actually, they're fundamentally quite different! At least in Haskell, at any rate.
Guards are both simpler and more flexible: They're essentially just special syntax that translates to a series of if/then expressions. You can put arbitrary boolean expressions in the guards, but they don't do anything you couldn't do with a regular if.
Pattern matches do several additional things: They're the only way to deconstruct data, and they bind identifiers within their scope. In the same sense that guards are equivalent to if expressions, pattern matching is equivalent to case expressions. Declarations (either at the top level, or in something like a let expression) are also a form of pattern match, with "normal" definitions being matches with the trivial pattern, a single identifier.
Pattern matches also tend to be the main way stuff actually happens in Haskell--attempting to deconstruct data in a pattern is one of the few things that forces evaluation.
By the way, you can actually do pattern matching in top-level declarations:
square = (^2)
(one:four:nine:_) = map square [1..]
This is occasionally useful for a group of related definitions.
GHC also provides the ViewPatterns extension which sort of combines both; you can use arbitrary functions in a binding context and then pattern match on the result. This is still just syntactic sugar for the usual stuff, of course.
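A small sketch of what a view pattern looks like (lastTwo is an invented example): the function to the left of -> is applied to the scrutinee, and the pattern to the right matches its result.
{-# LANGUAGE ViewPatterns #-}

lastTwo :: [a] -> Maybe (a, a)
lastTwo (reverse -> (y:x:_)) = Just (x, y)  -- y is the last element, x the one before it
lastTwo _                    = Nothing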
As for the day-to-day issue of which to use where, here are some rough guidelines:
Definitely use pattern matching for anything that can be matched directly one or two constructors deep, where you don't really care about the compound data as a whole, but do care about most of the structure. The @ syntax (as-patterns) lets you bind the overall structure to a variable while also pattern matching on it, but doing too much of that in one pattern can get ugly and unreadable quickly.
Definitely use guards when you need to make a choice based on some property that doesn't correspond neatly to a pattern, e.g. comparing two Int values to see which is larger.
If you need only a couple of pieces of data from deep inside a large structure, particularly if you also need to use the structure as a whole, guards and accessor functions are usually more readable than some monstrous pattern full of @ and _.
If you need to do the same thing for values represented by different patterns, but with a convenient predicate to classify them, using a single generic pattern with a guard is usually more readable. Note that if a set of guards is non-exhaustive, anything that fails all the guards will drop down to the next pattern (if any). So you can combine a general pattern with some filter to catch exceptional cases, then do pattern matching on everything else to get the details you care about (a short sketch of this fall-through follows after these guidelines).
Definitely don't use guards for things that could be trivially checked with a pattern. Checking for empty lists is the classic example, use a pattern match for that.
In general, when in doubt, just stick with pattern matching by default, it's usually nicer. If a pattern starts getting really ugly or convoluted, then stop to consider how else you could write it. Besides using guards, other options include extracting subexpressions as separate functions or putting case expressions inside the function body in order to push some of the pattern matching down onto them and out of the main definition.
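One concrete illustration of the fall-through behaviour mentioned above (classify is a made-up name): if the guard on the first equation fails, matching simply continues with the next equation.
classify :: Int -> String
classify n
  | n < 0   = "negative"   -- a property test: a guard is the natural fit
classify 0  = "zero"       -- an exact shape: a pattern is the natural fit
classify _  = "positive"   -- everything that fell through the guard and is not 0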
For one, you can put boolean expressions within a guard.
For example:
Just as with list comprehensions, boolean expressions can be freely mixed in among the pattern guards. For example:
f x | [y] <- x
    , y > 3
    , Just z <- h y
    = ...
Update
There is a nice quote from Learn You a Haskell about the difference:
Whereas patterns are a way of making sure a value conforms to some form and deconstructing it, guards are a way of testing whether some property of a value (or several of them) are true or false. That sounds a lot like an if statement and it's very similar. The thing is that guards are a lot more readable when you have several conditions and they play really nicely with patterns.
To me it looks like Pattern Matching and Guards are fundamentally the same. Both evaluate a condition, and if true will execute the expression hooked to it. Am I correct in my understanding?
Not quite. First, pattern matching cannot evaluate arbitrary conditions. It can only check whether a value was created using a given constructor.
Second, pattern matching can bind variables. So while the pattern [] might be equivalent to the guard null lst (not using length because that would not be equivalent - more on that later), the pattern x:xs most certainly is not equivalent to the guard not (null lst), because the pattern binds the variables x and xs, which the guard does not.
A note on using length: using length to check whether a list is empty is very bad practice, because to calculate the length it needs to go through the whole list, which takes O(n) time, while just checking whether the list is empty takes O(1) time with null or pattern matching. Furthermore, using length just plain does not work on infinite lists.
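A quick sketch of that last point: null only has to look at the outermost constructor, so it even works on a list that never ends, whereas length would loop forever.
emptyCheck :: Bool
emptyCheck = null [1 ..]             -- False, and it returns immediately

-- badCheck = length [1 ..] < 1      -- would never finish: length walks the entire list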
In addition to the other good answers, I'll try to be specific about guards: Guards are just syntactic sugar. If you think about it, you will often have the following structure in your programs:
f y = ...
f x =
  if p x then A else B
That is, if a pattern matches, it is followed right after by an if-then-else discrimination. A guard folds this discrimination into the pattern match directly:
f y = ...
f x | p x       = A
    | otherwise = B
(otherwise is defined to be True in the standard library.) This is more convenient than an if-then-else chain, and sometimes it also makes the code much simpler variant-wise, so it is easier to write than the if-then-else construction.
In other words, it is sugar on top of another construction in a way which greatly simplifies your code in many cases. You will find that it eliminates a lot of if-then-else chains and make your code more readable.
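As a slightly larger illustration of that readability difference (grade is an invented example), here is the same function written as an if-then-else chain and as guards:
grade :: Int -> String
grade score =
  if score >= 90
    then "A"
    else if score >= 80
      then "B"
      else "C"

grade' :: Int -> String
grade' score
  | score >= 90 = "A"
  | score >= 80 = "B"
  | otherwise   = "C"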
