Is my understanding of *mapping* correct? - haskell

I'm learning Haskell. When I read books (Russian translations of them) I often see the word *mapping*... I'm not sure I understand it correctly.
As I understand it, a mapping is the production of a new value based on some old value. So it is the result of any function with at least one parameter, or of a data constructor. The new value is not obliged to have the same type as the old one.
For example:
-- mapping samples:
func a b = a + b
func' a = show a
func'' a = a
func''' a = Just a
Am I right?

Yes, what you have understood is correct. Mapping means getting a new value based on an old value by applying some function to it. The new value may or may not have the same type as the old value. In mathematics, the words *mapping* and *function* are actually used interchangeably.
There is another concept related to mapping: map
map is a famous higher-order function which can perform mapping on a list of values.
λ> map (+ 1) [1,2,3]
[2,3,4]
In the previous example, you are using the map function to apply the function (+ 1) to each element of the list.
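The result type doesn't have to match the input type either; a quick (hypothetical) GHCi session illustrating a type-changing mapping:
λ> map show [1,2,3]
["1","2","3"]
Here each Int is mapped to its String representation.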

Haskell, stuck on understanding type synonyms

From the LYAH book, on parameterizing type synonyms:
I would understand:
type MyName = String
But this example I don't get:
type IntMap v = Map Int v -- let alone it can be shortened
'Map' is a function, not a type, right? This has had me going in loops in the book constantly. Besides that: Map would require a function and a list to work, correct? If so, and 'v' is the list, what is the 'Int'?
Map is the name of a type. It's an associative array: a Map k v maps keys of type k to values of type v. So IntMap v is the type of a Map which has Int keys and v values. Maps are also known as dictionaries in some languages. They're implemented by hash tables, balanced trees or other, more exotic data structures.
It's an unfortunate name collision that there's also the map function. They kind of do the same thing, only in different contexts. map transforms values in the input by applying a function to them, whereas Map transforms input keys to output values.
There is a type named Map (with a capital "M"). There is also a function named map (with a lowercase "m"). These are unrelated other than having kind-of similar names. Try not to confuse them. ;-)
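To make the distinction concrete, here is a minimal sketch using Data.Map (the ages name and its contents are made up for illustration):
import qualified Data.Map as Map
import Data.Map (Map)

-- The parameterized synonym from the question:
type IntMap v = Map Int v

-- An IntMap String maps Int keys to String values.
ages :: IntMap String
ages = Map.fromList [(1, "bob"), (2, "gertrude")]

main :: IO ()
main = print (Map.lookup 2 ages)  -- prints: Just "gertrude"
Note that v stays a parameter of the synonym, so IntMap String, IntMap Bool, and so on are all valid instantiations.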

How to construct an array with multiple possible lengths using immutability and functional programming practices?

We're in the process of converting our imperative brains to a mostly-functional paradigm. This function is giving me trouble. I want to construct an array that EITHER contains two pairs or three pairs, depending on a condition (whether refreshToken is null). How can I do this cleanly using a FP paradigm? Of course with imperative code and mutation, I would just conditionally .push() the extra value onto the end which looks quite clean.
Is this an example of the "local mutation is ok" FP caveat?
(We're using ReadonlyArray in TypeScript to enforce immutability, which makes this somewhat more ugly.)
const itemsToSet = [
  [JWT_KEY, jwt],
  [JWT_EXPIRES_KEY, tokenExpireDate.toString()],
  [REFRESH_TOKEN_KEY, refreshToken /* could be null */],
].filter(item => item[1] != null) as ReadonlyArray<ReadonlyArray<string>>;

AsyncStorage.multiSet(itemsToSet.map(roArray => [...roArray]));
What's wrong with itemsToSet as given in the OP? It looks functional to me, but it may be because of my lack of knowledge of TypeScript.
In Haskell, there's no null, but if we use Maybe for the second element, I think that itemsToSet could be translated to this:
itemsToSet :: [(String, String)]
itemsToSet = foldr folder [] values
  where
    values =
      [ (jwt_key, jwt)
      , (jwt_expires_key, tokenExpireDate)
      , (refresh_token_key, refreshToken)
      ]
    folder (key, Just value) acc = (key, value) : acc
    folder _ acc = acc
Here, jwt, tokenExpireDate, and refreshToken are all of the type Maybe String.
itemsToSet performs a right fold over values, pattern-matching the Maybe String elements against Just and (implicitly) Nothing. If it's a Just value, it conses the (key, value) pair onto the accumulator acc. If not, folder just returns acc.
foldr traverses the values list from right to left, building up the accumulator as it visits each element. The initial accumulator value is the empty list [].
You don't need 'local mutation' in functional programming. In general, you can refactor from 'local mutation' to proper functional style by using recursion and introducing an accumulator value.
While foldr is a built-in function, you could implement it yourself using recursion.
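For illustration, a minimal hand-rolled version (named myFoldr here to avoid clashing with the Prelude):
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z                     -- empty list: return the accumulator
myFoldr f z (x:xs) = f x (myFoldr f z xs)  -- combine the head with the folded tail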
In Haskell, I'd just create an array with three elements and, depending on the condition, pass it on either as-is or pass on just a slice of two elements. Thanks to laziness, no computation effort will be spent on the third element unless it's actually needed. In TypeScript, you probably will get the cost of computing the third element even if it's not needed, but perhaps that doesn't matter.
Alternatively, if you don't need the structure to be an actual array (for String elements, performance probably isn't that critical, and the O(n) direct-access cost isn't an issue if the length is limited to three elements), I'd use a singly-linked list instead. Create the list with two elements and, depending on the condition, cons the third onto the front. This does not require any mutation: the 3-element list simply contains the unchanged 2-element list as a substructure.
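A minimal sketch of that idea (itemsFor and its keys are illustrative names, not taken from the original TypeScript):
itemsFor :: String -> String -> Maybe String -> [(String, String)]
itemsFor jwt expires mRefresh =
  let base = [("JWT_KEY", jwt), ("JWT_EXPIRES_KEY", expires)]
  in case mRefresh of
       Just tok -> ("REFRESH_TOKEN_KEY", tok) : base  -- 3-element list sharing base
       Nothing  -> base                               -- the 2-element list, unchanged

main :: IO ()
main = print (itemsFor "abc" "2024-01-01" Nothing)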
Based on the description, I don't think arrays are the best solution simply because you know ahead of time that they contain either 2 values or 3 values depending on some condition. As such, I would model the problem as follows:
type alias Pair = (String, String)

type TokenState
    = WithoutRefresh (Pair, Pair)
    | WithRefresh (Pair, Pair, Pair)

itemsToTokenState : String -> Date -> Maybe String -> TokenState
itemsToTokenState jwtKey jwtExpiry maybeRefreshToken =
    case maybeRefreshToken of
        Just refreshToken ->
            WithRefresh (("JWT_KEY", jwtKey), ("JWT_EXPIRES_KEY", toString jwtExpiry), ("REFRESH_TOKEN_KEY", refreshToken))

        Nothing ->
            WithoutRefresh (("JWT_KEY", jwtKey), ("JWT_EXPIRES_KEY", toString jwtExpiry))
This way you are leveraging the type system more effectively, and it could be improved further by doing something more ergonomic than returning tuples.

Pass by Reference in Haskell?

Coming from a C# background, I would say that the ref keyword is very useful in certain situations where changes to a method parameter are desired to directly influence the passed value, for value types or for setting a parameter to null.
Also, the out keyword can come in handy when returning a multitude of various logically unconnected values.
My question is: is it possible to pass a parameter to a function by reference in Haskell? If not, what is the direct alternative (if any)?
There is no difference between "pass-by-value" and "pass-by-reference" in languages like Haskell and ML, because it's not possible to assign to a variable in these languages. Since there are no "changes to a method parameter" in the first place, there is nothing that could influence a passed variable.
It depends on context. Without any context, no, you can't (at least not in the way you mean). With context, you may very well be able to do this if you want. In particular, if you're working in IO or ST, you can use IORef or STRef respectively, as well as mutable arrays, vectors, hash tables, weak hash tables (IO only, I believe), etc. A function can take one or more of these and produce an action that (when executed) will modify the contents of those references.
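As a minimal sketch of the IORef approach (the function names are illustrative):
import Data.IORef

-- "Pass by reference": the function mutates the contents of the IORef.
tripleRef :: IORef Int -> IO ()
tripleRef ref = modifyIORef ref (* 3)

main :: IO ()
main = do
  r <- newIORef 4
  tripleRef r
  readIORef r >>= print  -- prints 12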
Another sort of context, StateT, gives the illusion of a mutable "state" value implemented purely. You can use a compound state and pass around lenses into it, simulating references for certain purposes.
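A sketch of the State-monad flavor of this (using Control.Monad.State from the mtl package):
import Control.Monad.State

-- Looks like mutation, but the state is threaded purely behind the scenes.
triple :: State Int ()
triple = modify (* 3)

main :: IO ()
main = print (execState triple 4)  -- prints 12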
My question is: is it possible to pass a parameter to a function by reference in Haskell? If not, what is the direct alternative (if any)?
No, values in Haskell are immutable (well, the do notation can create some illusion of mutability, but it all happens inside a function and is an entirely different topic). If you want to change the value, you will have to return the changed value and let the caller deal with it. For instance, see the random number generating function next that returns the value and the updated RNG.
Also, the out keyword can come in handy when returning a multitude of various logically unconnected values.
Consequently, you can't have out either. If you want to return several entirely disconnected values (at which point you should probably think why are disconnected values being returned from a single function), return a tuple.
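For example, a minimal sketch of the tuple-returning style (divAndMod is a made-up name; the Prelude's divMod already does this):
-- Returns two values at once instead of using C#-style out parameters.
divAndMod :: Int -> Int -> (Int, Int)
divAndMod n d = (n `div` d, n `mod` d)

main :: IO ()
main = print (divAndMod 17 5)  -- prints (3,2)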
No, it's not possible, because Haskell variables are immutable; the creators of Haskell presumably reasoned that there's no point in passing a reference that cannot be changed.
Consider a Haskell variable:
let x = 37
In order to "change" this, we need to make a temporary variable and then rebind the first name to the temporary variable (with modifications).
let tripleX = x * 3
let x = tripleX
If Haskell had pass by reference, could we do this?
The answer is no.
Suppose we tried:
tripleVar :: Int -> IO ()
tripleVar var = do
  let times_3 = var * 3
  let var = times_3  -- binds a NEW local 'var'; the caller's value is untouched
  return ()
The problem with this code is the last let binding: although we can imagine the parameter being passed by reference, the new binding doesn't touch it. In other words, we're introducing a new local variable with the same name.
Take another look at that last binding:
let var = times_3
Haskell doesn't know that we want to "change" the original variable; since we can't reassign it, we are creating a new variable with the same name in the local scope, thus not changing the passed value. :-(
tripleVar :: Int -> IO ()
tripleVar var = do
  let tripleVar = var
  let var = tripleVar * 3  -- again, only a local shadow
  return ()

main :: IO ()
main = do
  let x = 4
  tripleVar x
  print x -- 4 :(

Why does concatenation of lists take O(n)?

According to the theory of algebraic data types (ADTs), the concatenation of two lists has to take O(n), where n is the length of the first list. You basically have to recursively iterate through the first list until you find its end.
From a different point of view, one can argue that the second list can simply be linked to the last element of the first. This would take constant time, if the end of the first list is known.
What am I missing here ?
Operationally, a Haskell list is typically represented by a pointer to the first cell of a singly-linked list (roughly). This way, tail just returns the pointer to the next cell (it does not have to copy anything), and consing x : onto the front of the list allocates a new cell, makes it point to the old list, and returns the new pointer. The list accessed through the old pointer is unchanged, so there's no need to copy it.
If you instead append a value with ++ [x], then you cannot modify the original linked list by changing its last pointer unless you know that the original list will never be accessed. More concretely, consider
x = [1..5]
n = length (x ++ [6]) + length x
If you modified x in place when evaluating x ++ [6], the value of n would turn out to be 12, which is wrong: the second use of x refers to the unchanged list, which has length 5, so the result must be 11.
Practically, you can't expect the compiler to optimize this, even in those cases in which x is no longer used and could, theoretically, be updated in place (a "linear" use). What happens is that the evaluation of x ++ [6] must be prepared for the worst case, in which x is reused afterwards, and so it must copy the whole list x.
As #Ben notes, saying "the list is copied" is imprecise. What actually happens is that the cells with the pointers are copied (the so-called "spine" of the list), but the elements are not. For instance,
x = [[1,2],[2,3]]
y = x ++ [[3,4]]
requires allocating the inner lists [1,2], [2,3], [3,4] only once. The lists of lists x and y will share pointers to those lists of integers, which do not have to be duplicated.
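For reference, (++) recurses over the first argument only, which is exactly where the O(n) spine copy comes from. A sketch of essentially how the Prelude defines it (renamed append to avoid shadowing):
append :: [a] -> [a] -> [a]
append []     ys = ys                 -- the second list is reused as-is
append (x:xs) ys = x : append xs ys   -- one fresh cell per element of the first list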
What you're asking for is related to a question I wrote for TCS Stackexchange some time back: the data structure that supports constant-time concatenation of functional lists is a difference list.
A way of handling such lists in a functional programming language was worked out by Yasuhiko Minamide in the 90s; I effectively rediscovered it a while back. However, the good run-time guarantees require language-level support that's not available in Haskell.
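The basic idea can still be sketched in plain Haskell, even without those run-time guarantees: represent a list as the function that prepends it, so that concatenation becomes function composition:
-- A difference list is "the function that prepends this list".
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList = (++)

toList :: DList a -> [a]
toList dl = dl []

-- O(1) concatenation: just compose the two prepending functions.
appendD :: DList a -> DList a -> DList a
appendD = (.)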
It's because of immutable state. A list is an object + a pointer, so if we imagined a list as a Tuple it might look like this:
let tupleList = ("a", ("b", ("c", [])))
Now let's get the first item in this "list" with a "head" function. This head function takes O(1) time because we can use fst:
> fst tupleList
If we want to swap out the first item in the list with a different one we could do this:
let tupleList2 = ("x",snd tupleList)
Which can also be done in O(1). Why? Because absolutely no other element in the list stores a reference to the first entry. Because of immutable state, we now have two lists, tupleList and tupleList2. When we made tupleList2 we didn't copy the whole list. Because the original pointers are immutable we can continue to reference them but use something else at the start of our list.
Now let's try to get the last element of our 3 item list:
> fst . snd $ snd tupleList
That took 3 steps, which is equal to the length of our list, i.e. O(n).
But couldn't we store a pointer to the last element in the list and access that in O(1)? To do that we would need an array, not a list. An array allows O(1) lookup of any element because it is laid out as a contiguous block of memory.
(ASIDE: If you're unsure why we would use a linked list instead of an array, you should do some more reading about data structures, algorithms on data structures, and the Big-O time complexity of operations like get, poll, insert, delete, sort, etc.)
Now that we've established that, let's look at concatenation. Let's concatenate tupleList with a new list, ("e", ("f", [])). To do this we have to traverse the whole list, just like getting the last element:
tupleList3 = (fst tupleList, (fst $ snd tupleList, (fst . snd $ snd tupleList, ("e", ("f", [])))))
The above operation is actually worse than O(n) time, because for each element in the list we have to re-read the list up to that index. But if we ignore that for a moment and focus on the key aspect: in order to get to the last element in the list, we must traverse the entire structure.
You may be asking, why don't we just store in memory what the last list item is? That way appending to the end of the list would be done in O(1). But not so fast, we can't change the last list item without changing the entire list. Why?
Let's take a stab at how that might look:
data Queue a = Queue { last :: Queue a, head :: a, next :: Queue a } | Empty

appendEnd :: a -> Queue a -> Queue a
appendEnd a2 (Queue l h n) = ????
If I modify last, which is an immutable field, I won't actually be modifying the pointer to the last item in the queue; I will be creating a copy of the last item. Everything else that referenced the original item will continue referencing the original item.
So in order to update the last item in the queue, I have to update everything that holds a reference to it, which takes O(n) time at best.
So in our traditional list, we have our final item:
List a []
But if we want to change it, we make a copy of it. Now the second last item has a reference to an old version. So we need to update that item.
List a (List a [])
But if we update the second last item we make a copy of it. Now the third last item has an old reference. So we need to update that. Repeat until we get to the head of the list. And we come full circle. Nothing keeps a reference to the head of the list so editing that takes O(1).
This is the reason that Haskell's standard lists aren't doubly linked. It's also why a "Queue" (or at least a FIFO queue) can't be implemented in the traditional way. Making a queue in Haskell involves some serious rethinking of traditional data structures.
If you become even more curious about how all of this works, consider getting the book Purely Functional Data Structures by Chris Okasaki.
EDIT: If you've ever seen this: http://visualgo.net/list.html you might notice that in the visualization "Insert Tail" happens in O(1). But in order to do that, it modifies the final entry in the list to give it a new pointer. Updating a pointer mutates state, which is not allowed in a purely functional language. Hopefully the rest of my post made clear why that is.
In order to concatenate two lists (call them xs and ys), we need to modify the final node in xs in order to link it to (i.e. point at) the first node of ys.
But Haskell lists are immutable, so we have to create a copy of xs first. This operation is O(n) (where n is the length of xs).
Example:
xs
|
v
1 -> 2 -> 3

1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
^              ^
|              |
xs ++ ys       ys

What makes a good name for a helper function?

Consider the following problem: given a list of length three of tuples (String,Int), is there a pair of elements having the same "Int" part? (For example, [("bob",5),("gertrude",3),("al",5)] contains such a pair, but [("bob",5),("gertrude",3),("al",1)] does not.)
This is how I would implement such a function:
import Data.List (sortBy)
import Data.Function (on)
hasPair :: [(String, Int)] -> Bool
hasPair = napkin . sortBy (compare `on` snd)
  where
    napkin [(_, a), (_, b), (_, c)]
      | a == b    = True
      | b == c    = True
      | otherwise = False
I've used pattern matching to bind names to the "Int" part of the tuples, but I want to sort first (in order to group like members), so I've put the pattern-matching function inside a where clause. But this brings me to my question: what's a good strategy for picking names for functions that live inside where clauses? I want to be able to think of such names quickly. For this example, "hasPair" seems like a good choice, but it's already taken! I find that pattern comes up a lot - the natural-seeming name for a helper function is already taken by the outer function that calls it. So I have, at times, called such helper functions things like "op", "foo", and even "helper" - here I have chosen "napkin" to emphasize its use-it-once, throw-it-away nature.
So, dear Stackoverflow readers, what would you have called "napkin"? And more importantly, how do you approach this issue in general?
General rules for locally-scoped variable naming.
f, k, g, h for super-simple, local, semi-anonymous things
go for (tail-)recursive helpers (precedent)
n, m, i, j for lengths, sizes and other numeric values
v for results of map lookups and other dictionary types
s and t for strings
a:as and x:xs and y:ys for lists
(a,b,c,_) for tuple fields
These generally only apply for arguments to HOFs. For your case, I'd go with something like k or eq3.
Use apostrophes sparingly, for derived values.
I tend to call boolean valued functions p for predicate. pred, unfortunately, is already taken.
In cases like this, where the inner function is basically the same as the outer function, but with different preconditions (requiring that the list is sorted), I sometimes use the same name with a prime, e.g. hasPairs'.
However, in this case, I would rather try to break down the problem into parts that are useful by themselves at the top level. That usually also makes naming them easier.
import Data.List (sort)

hasPair :: [(String, Int)] -> Bool
hasPair = hasDuplicate . map snd

hasDuplicate :: Ord a => [a] -> Bool
hasDuplicate = not . isStrictlySorted . sort

isStrictlySorted :: Ord a => [a] -> Bool
isStrictlySorted xs = and $ zipWith (<) xs (tail xs)
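A quick (hypothetical) GHCi check against the examples from the question:
λ> hasPair [("bob",5),("gertrude",3),("al",5)]
True
λ> hasPair [("bob",5),("gertrude",3),("al",1)]
False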
My strategy follows Don's suggestions fairly closely:
If there is an obvious name for it, use that.
Use go if it is the "worker" or otherwise very similar in purpose to the original function.
Follow personal conventions based on context, e.g. step and start for args to a fold.
If all else fails, just go with a generic name, like f.
There are two techniques that I personally avoid. One is using the apostrophe version of the original function, e.g. hasPair' in the where clause of hasPair. It's too easy to accidentally write one when you meant the other; I prefer to use go in such cases. But this isn't a huge deal as long as the functions have different types. The other is using names that might connote something, but not anything that has to do with what the function actually does. napkin would fall into this category. When you revisit this code, this naming choice will probably baffle you, as you will have forgotten the original reason that you named it napkin. (Because napkins have 4 corners? Because they are easily folded? Because they clean up messes? They're found at restaurants?) Other offenders are things like bob and myCoolFunc.
If you have given a function a name that is more descriptive than go or h, then you should be able to look at either the context in which it is used, or the body of the function, and in both situations get a pretty good idea of why that name was chosen. This is where my point #3 comes in: personal conventions. Much of Don's advice applies. If you are using Haskell in a collaborative situation, then coordinate with your team and decide on certain conventions for common situations.
