How to use groupBy on a list of tuples? - haskell

How can I group this list by second element of tuples:
[(3,2),(17,2),(50,3),(64,3)]
to get something like:
[[(3,2),(17,2)],[(50,3),(64,3)]]
I'm actually a newcomer to Haskell and seem to be falling in love with it. I hope you can help me find an efficient way.

It sounds like you've already identified that you want Data.List.groupBy. The type of this function is
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
So it takes a binary predicate, i.e. an equivalence relation determining how to group elements. You want to group elements by equality on the second term of a pair, so you want
groupBy (\x y -> snd x == snd y) myList
Where snd is a built-in function that gets the second element of a pair.
Incidentally, this pattern of "apply a function to two arguments and then apply a binary function to the results" is very common, especially when calling Data.List functions, so Data.Function provides on.
on :: (b -> b -> c) -> (a -> b) -> a -> a -> c
Weird signature, but the use case is just what we want.
((+) `on` f) x y = f x + f y
So your desired groupBy can be written as
groupBy ((==) `on` snd)
Note that groupBy only finds consecutive equal elements. You didn't indicate whether you wanted consecutive equal elements or all equal elements, but if you want the latter, then I don't believe Haskell base provides that function, though you could certainly write it recursively yourself.
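A minimal sketch of both variants, using the list from the question. The names groupBySnd and groupAllBySnd are made up for illustration. The second one sorts on the key first so that equal keys become adjacent, which is one common way to collect all equal elements; it assumes the key type has an Ord instance and that reordering the input by key is acceptable.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Groups only runs of adjacent pairs that share a second component.
groupBySnd :: Eq b => [(a, b)] -> [[(a, b)]]
groupBySnd = groupBy ((==) `on` snd)

-- Sorts by the second component first (sortOn is stable), so every pair
-- with the same key ends up in the same group, adjacent or not.
groupAllBySnd :: Ord b => [(a, b)] -> [[(a, b)]]
groupAllBySnd = groupBySnd . sortOn snd
```

For the list in the question, groupBySnd [(3,2),(17,2),(50,3),(64,3)] gives [[(3,2),(17,2)],[(50,3),(64,3)]].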

Related

Calculating from multiple tuples

I'm trying to take a list of lists of tuple and turn each list of tuples into a single tuple. Like this:
Currently have:
[[("Erikson,Ann",2.0,3),("Erikson,Ann",3.33,3)],[("Lewis,Buck",2.66,1),
("Lewis,Buck",2.0,3)],[("Smith,John",0.0,1),("Smith,John",1.66,3),
("Smith,John",1.33,3)],[("Torvell,Sarah",4.0,3)]]
And I want the form to be a single list of tuples. One tuple for each persons name.
Along with combining the list of tuples into a single tuple I want to use the second and third elements of each tuple to calculate the gpa of the person. The second number is the grade point for that class and the third number is the credits for the class.
What I need to do is take the sum of credits * gradepoint for each tuple and then divide that sum by the sum of all the credits in each tuple.
What I have so far, which doesn't work, is this:
calcGPA :: MyType2 -> MyType2 -> (String, Double, Double)
calcGPA (a,b,c) (d,e,f) = (a, ((b*(fromIntegral c))+(e*(fromIntegral f))/(b+e)), (b+e))
Where I am passing in the list of lists shown at the top of this post.
Am I going in the right direction to solve this problem? Any tips or help would be appreciated.
Thank you
EDIT
Thank you for the help! It helped me understand what was actually going on. I wrote the cumulativeSums function as follows:
cumulativeSums :: (Double, Int) -> (String, Double, Int) -> (Double, Int)
cumulativeSums (a,b) (c,d,e) = (a+(d*e), b+e)
I'm confused on the chunk of code you have above with the let. Where does this go? Do I put it in its own function that I call passing in the list of list of tuples?
Thank you
Now that I'm trying to output credits also:
calcGPA :: [(String, Double, Int)] -> (String, Double, Int)
calcGPA grades = let name = (\ (name, _, _) -> name) (head grades)
                     (name, weightedSum, sumOfWeights) = foldl cumulativeSums (name, 0, 0) grades
                     gpa = weightedSum / sumOfWeights
                 in (name, gpa, credits)
You're going in the right direction if you were planning on using foldl or foldr with your calcGPA function.
What we do when folding is we have a function with the result-so-far, the next element in a list, and the result-just-after. With foldl, which is most appropriate for sums, the type and arguments, as far as lists are concerned is:
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f startingResult items = …
We see that your function will need to be type (b -> a -> b). Looking elsewhere in the type signature, the ultimate result of foldl is type b. The type of elements in the list is type a.
So the function you provide to foldl takes two arguments: the result-so-far and the next item in the list. foldl expects it to give back the result-just-after.
You "fold" in a new item to the result each time your function is run on the next element in the list. So let's look at what our list element type is and what our result type will be.
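One way to watch the result-so-far evolve is scanl, which takes the same arguments as foldl but returns every intermediate result. This is only an illustrative sketch on plain Ints; the names runningTotals and total are made up here.

```haskell
-- scanl has the same shape as foldl but keeps every intermediate
-- "result-so-far", beginning with the starting value.
runningTotals :: [Int] -> [Int]
runningTotals = scanl (+) 0

-- foldl with the same function and starting value yields only the final result.
total :: [Int] -> Int
total = foldl (+) 0
```

For [1,2,3], runningTotals gives [0,1,3,6] and total gives 6, the last of those.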
Our list element type is something like (String, Double, Int). Our result type is (Double, Int). So the type signature for our folding function is:
cumulativeSums :: (Double, Int) -> (String, Double, Int) -> (Double, Int)
So far so good. Now what about the other arguments to foldl? We know the items argument: it's our sublist for one person's grades. We know f, it's our cumulativeSums function we're going to write. What is startingResult? Well, both sums should start with 0, so it's (0, 0). We have:
let name = (\ (name, _, _) -> name) (head grades)
    (weightedSum, sumOfWeights) = foldl cumulativeSums (0, 0) grades
    gpa = weightedSum / fromIntegral sumOfWeights
in (name, gpa)
Now we write cumulativeSums. Remember, we're getting told the result-so-far and the item from the list. We just need to give back the result-just-after. See if you can write that part.
For the code already provided, I'd recommend writing your own version of it. There are some type errors related to mixing Ints and Doubles in the code above.
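For readers who want to compare against their own attempt, here is one possible way to fill in the folding function, offered as a sketch and not as the only solution. It assumes the (String, Double, Int) element type used above, converts credits with fromIntegral before multiplying, and returns a Maybe so that an empty sub-list does not need head. The strict foldl' is swapped in for foldl; that is a choice made here, not something the answer requires.

```haskell
import Data.List (foldl')

-- Result-so-far is (weighted grade points, total credits).
cumulativeSums :: (Double, Int) -> (String, Double, Int) -> (Double, Int)
cumulativeSums (weighted, credits) (_, gradePoint, c) =
  (weighted + gradePoint * fromIntegral c, credits + c)

-- One tuple per person: name, credit-weighted GPA, total credits.
-- Nothing for an empty group, since there is no name to report.
calcGPA :: [(String, Double, Int)] -> Maybe (String, Double, Int)
calcGPA [] = Nothing
calcGPA grades@((name, _, _) : _) =
  Just (name, weightedSum / fromIntegral totalCredits, totalCredits)
  where
    (weightedSum, totalCredits) = foldl' cumulativeSums (0, 0) grades
```

Mapping calcGPA over the outer list then gives one Maybe result per group.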
You need to go over each sub-list so you can accumulate values. Something like this:
import Data.Monoid (Sum (..))
averageGdp :: [[(String, Double, Double)]] -> [(String, Double, Double)]
averageGdp = fmap f
  where
    f = (,,) <$> fst . head <*> totalAvg <*> totalCredit
    fst (a, _, _) = a
    totalCredit = getSum . foldMap (\(_, _, c) -> pure c)
    total = getSum . foldMap (\(_, b, c) -> pure $ b * c)
    totalAvg = (/) <$> total <*> totalCredit
f takes the inner list as its input and produces a triple. You then map f over the outer list.
With this sort of grouping problem, I think it's a bit of a red herring that the data already looks grouped. Can you always be sure of that? What if the data looks like the following?
[[("Erikson,Ann",2.0,3),("Erikson,Ann",3.33,3),("Lewis,Buck",2.66,1)],
[("Lewis,Buck",2.0,3)]]
Or like this?
[[("Erikson,Ann",2.0,3),("Erikson,Ann",3.33,3),("Lewis,Buck",2.66,1)], []]
Notice that in the first example, one entry for "Lewis,Buck" is grouped together with entries for "Erikson,Ann". The second example, on the other hand, contains an empty list.
Most attempts I've seen at solving problems like this do so by using unsafe (i.e. non-total) functions like head. This can lead to wrong implementations or run-time crashes.
Haskell is a great language exactly because you can use the type system to keep you honest. If the original input wasn't already grouped, it'd be safer to use ungrouped data. Otherwise, you can flatten the input using concat. I'm here assuming that the example data in the OP is called sample:
*Q52527030> concat sample
[("Erikson,Ann",2.0,3.0),("Erikson,Ann",3.33,3.0),("Lewis,Buck",2.66,1.0),
("Lewis,Buck",2.0,3.0),("Smith,John",0.0,1.0),("Smith,John",1.66,3.0),
("Smith,John",1.33,3.0),("Torvell,Sarah",4.0,3.0)]
This gives you a nice flat list on which you can perform a custom grouping operation:
import qualified Data.Map.Strict as Map
arrangeByFst :: Ord a => [(a, b, c)] -> [(a, [(b, c)])]
arrangeByFst = Map.toList . foldl updateMap Map.empty
  where updateMap m (x, y, z) = Map.insertWith (++) x [(y, z)] m
Here I've chosen to take a shortcut and use the built-in Map module, but otherwise, writing a function similar to Map.insertWith on a list of tuples isn't too hard.
This function takes a flat list of triples and groups them into pairs keyed by the first element, but with the other element being a list of data.
If you apply that to the flattened sample input, you get this:
*Q52527030> arrangeByFst $ concat sample
[("Erikson,Ann",[(3.33,3.0),(2.0,3.0)]),("Lewis,Buck",[(2.0,3.0),(2.66,1.0)]),
("Smith,John",[(1.33,3.0),(1.66,3.0),(0.0,1.0)]),("Torvell,Sarah",[(4.0,3.0)])]
This is a more robust approach because it doesn't rely on any particular assumptions about how data is ordered.
Each element in this list is a pair, where the first element is the name, and the second element is a list of grades. You can add a function to calculate the GPA of such a pair:
calculateGPA :: Fractional b => (a, [(b, b)]) -> (a, b)
calculateGPA (n, ts) = (n, sumOfGrades ts / numberOfGrades ts)
  where
    sumOfGrades grades = sum $ map (\(gp, c) -> gp * c) grades
    numberOfGrades grades = fromIntegral (length grades)
This function takes as input a tuple where the second element is a list of tuples ts. It calculates sumOfGrades by mapping each tuple of grade points gp and credits c into the product of the two, and then taking the sum of those numbers. It then divides that number by the length of the list of grades.
You can now map the list produced in the previous step to calculate the GPA of each person:
*Q52527030> map calculateGPA $ arrangeByFst $ concat sample
[("Erikson,Ann",7.995),("Lewis,Buck",4.33),("Smith,John",2.9899999999999998),
("Torvell,Sarah",12.0)]
Apart from using Data.Map.Strict, I've deliberately attempted to strike a balance between keeping things basic, but still safe. A more sophisticated approach could have used fmap instead of map, join instead of concat, more point-free style, and so on. There's always room for improvement.
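Note that calculateGPA divides by the number of grades, whereas the question defines the GPA as the sum of credits * gradepoint divided by the sum of all credits. A sketch of that credit-weighted variant, on the same (name, [(gradePoint, credits)]) shape, could look as follows. The name creditWeightedGPA is hypothetical, and returning Nothing for a person without grades is an assumption made here to keep the function total.

```haskell
-- Credit-weighted GPA: sum of (grade point * credits) over sum of credits.
creditWeightedGPA :: Fractional b => (a, [(b, b)]) -> Maybe (a, b)
creditWeightedGPA (_, []) = Nothing
creditWeightedGPA (n, ts) = Just (n, weighted / credits)
  where
    weighted = sum [gp * c | (gp, c) <- ts]
    credits = sum [c | (_, c) <- ts]
```

It slots into the same pipeline, for example map creditWeightedGPA $ arrangeByFst $ concat sample.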
A one-liner to do what you asked about:
import Control.Category ( (>>>) )
g :: [[(t, Double, Double)]] -> [(t, Double, Double)]
g = filter (not . null) >>>
    map (unzip3 >>> \ (a,b,c) -> (head a, sum (zipWith (*) b c) / sum c, sum c))
unzip3 :: [(a, b, c)] -> ([a], [b], [c]) is in the Prelude.
>>> is the left-to-right function composition, (f >>> g) x = g (f x).
filter makes sure all empty groups are removed before further processing.

Apply map, but only for temporary use

Given a function:
func :: [Int] -> Int
func x = minimum (map (+5) x)
And an input: func [1,10].
I'm trying to get the output 1, as 1+5 is lower than 10+5. However, I can only work out how to output the value after the mapping function has been applied, whereas I want the mapping to apply only while finding the minimum, and the output to be one of the original inputs.
How can I use a map temporarily, until I've found what I wanted, then return the pre-mapped version of that value?
There are several ways, but the best is probably to use Data.List.minimumBy. It takes a function that can compare two elements, then finds the smallest element using that comparison function. It's pretty much purpose-built for your situation. Its type is
> :type minimumBy
minimumBy :: (a -> a -> Ordering) -> [a] -> a
Where
> :info Ordering
data Ordering = LT | EQ | GT -- Defined in 'GHC.Types'
-- A bunch of instances that don't really matter here
so Ordering is just a basic sum type with three no-argument constructors. Their names are pretty self explanatory, so all you need to do is pass it a function that returns one of these values:
comparer :: Int -> Int -> Ordering
comparer x y = ...
I'll leave the implementation to you. You can then use it as
func x = minimumBy comparer x
Or simply
func = minimumBy comparer
Use minimumBy (comparing f), where f would be (+5) for your example:
minimumBy :: (a -> a -> Ordering) -> [a] -> a
comparing :: Ord a => (b -> a) -> b -> b -> Ordering
An option is to use a custom comparison function and minimumBy, as others commented. Note that in this case your transformation (+5) will be called twice for each element in the list (roughly). E.g.
[4,3,2,1]
leads to
compare 4 3 => compare (4+5) (3+5) => compare 9 8 => GT
compare 3 2 => compare (3+5) (2+5) => compare 8 7 => GT
compare 2 1 => compare (2+5) (1+5) => compare 7 6 => GT
Note that (3+5),(2+5) are computed twice, above.
If the operation is expensive, and not simply (+5), it may be beneficial to resort to a different approach. We can pre-compute the modified values while keeping the original ones if we build a list of pairs:
map (\x -> (expensive x, x)) [4,3,2,1]
Above, function expensive plays the role of (+5) in the original code. We can then take the minimum as follows:
snd $ minimumBy (comparing fst) $ map (\x -> (expensive x, x)) [4,3,2,1]
or even
snd $ minimum $ map (\x -> (expensive x, x)) [4,3,2,1]
The latter is slightly less general in that it requires that the list [4,3,2,1] is made of comparable elements, while the former only assumes that expensive produces something comparable.
However, note that the latter will not just return a random element of the list [4,3,2,1] which minimizes expensive, but it will be the minimum such element. That is, when expensive x == expensive y, the minimum function will break the tie comparing x and y directly. The former makes no such guarantee.
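The two approaches can be put side by side as a small sketch. The names argMinBy and argMinCached are invented for illustration, and f stands in for the possibly expensive transformation. Like minimumBy itself, both fail on an empty list.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Re-applies f to both operands on every comparison.
argMinBy :: Ord b => (a -> b) -> [a] -> a
argMinBy f = minimumBy (comparing f)

-- Applies f once per element, pairs each result with its original value,
-- compares only the cached results, then drops them again.
argMinCached :: Ord b => (a -> b) -> [a] -> a
argMinCached f = snd . minimumBy (comparing fst) . map (\x -> (f x, x))
```

With the input from the question, argMinBy (+5) [1,10] returns the original element 1 and not the mapped value 6.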

Directly generating specific subsets of a powerset?

Haskell's expressiveness enables us to rather easily define a powerset function:
import Control.Monad (filterM)
powerset :: [a] -> [[a]]
powerset = filterM (const [True, False])
To be able to perform my task it is crucial for said powerset to be sorted by a specific function, so my implementation kind of looks like this:
import Data.List (sortBy)
import Data.Ord (comparing)
powersetBy :: Ord b => ([a] -> b) -> [a] -> [[a]]
powersetBy f = sortBy (comparing f) . powerset
Now my question is whether there is a way to only generate a subset of the powerset given a specific start and endpoint, where f(start) < f(end) and |start| < |end|. For example, my parameter is a list of integers ([1,2,3,4,5]) and they are sorted by their sum. Now I want to extract only the subsets in a given range, let's say 3 to 7. One way to achieve this would be to filter the powerset to only include my range, but this seems (and is) inefficient when dealing with larger sets:
badFunction :: Ord b => b -> b -> ([a] -> b) -> [a] -> [[a]]
badFunction start end f = filter (\x -> f x >= start && f x <= end) . powersetBy f
badFunction 3 7 sum [1,2,3,4,5] produces [[1,2],[3],[1,3],[4],[1,4],[2,3],[5],[1,2,3],[1,5],[2,4],[1,2,4],[2,5],[3,4]].
Now my question is whether there is a way to generate this list directly, without having to generate all 2^n subsets first, since it will improve performance drastically by not having to check all elements but rather generating them "on the fly".
If you want to allow for completely general ordering functions, then there can't be a way around checking all elements of the powerset. (After all, how would you know there isn't a special clause built in that gives, say, the particular set [6,8,34,42] a completely different ranking from its neighbours?)
However, you could already make the algorithm drastically faster by:
Only sorting after filtering: sorting is O (n · log n), so you want to keep n low here; for the O (n) filtering step it matters less. (And anyway, the number of elements doesn't change through sorting.)
Applying the ordering function only once to each subset.
So
import Control.Arrow ((&&&))
lessBadFunction :: Ord b => (b,b) -> ([a]->b) -> [a] -> [[a]]
lessBadFunction (start,end) f
    = map snd . sortBy (comparing fst)
    . filter (\(k,_) -> k>=start && k<=end)
    . map (f &&& id)
    . powerset
Basically, let's face it, powersets of anything but a very small basis are infeasible. The particular application “sum in a certain range” is pretty much a packaging problem; there are quite efficient ways to do that kind of thing, but you'll have to give up the idea of perfect generality and of quantification over general subsets.
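As an illustration of what giving up generality can buy, here is a sketch specialised to the "sum in a certain range" case. It assumes the cost function is sum and that all elements are non-negative Ints, so a partial sum that already exceeds the upper bound can never come back into range and the whole branch can be skipped. The name subsetsInRange is hypothetical, and the result is not sorted by sum.

```haskell
-- Enumerates subsets whose sum lies in [lo, hi], building them element by
-- element and abandoning a branch as soon as the running sum exceeds hi.
-- Assumption: every element is non-negative, so sums never decrease.
subsetsInRange :: Int -> Int -> [Int] -> [[Int]]
subsetsInRange lo hi = go 0
  where
    go acc [] = [[] | acc >= lo && acc <= hi]
    go acc (x : xs)
      | acc > hi = []                                  -- prune: cannot recover
      | otherwise = map (x :) (go (acc + x) xs) ++ go acc xs
```

For subsetsInRange 3 7 [1,2,3,4,5] this yields the same 13 subsets as badFunction, in a different order; a final sortBy (comparing sum) would restore the ordering if it is needed.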
Since your problem is essentially a constraint satisfaction problem, using an external SMT solver might be the better alternative here; assuming you can afford the extra IO in the type and the need for such a solver to be installed. The SBV library allows construction of such problems. Here's one encoding:
import Data.SBV
-- c is the cost type
-- e is the element type
pick :: (Num e, SymWord e, SymWord c) => c -> c -> ([SBV e] -> SBV c) -> [e] -> IO [[e]]
pick begin end cost xs = do
    solutions <- allSat constraints
    return $ map extract $ extractModels solutions
  where extract ts = [x | (t, x) <- zip ts xs, t]
        constraints = do tags <- mapM (const free_) xs
                         let tagged    = zip tags xs
                             finalCost = cost [ite t (literal x) 0 | (t, x) <- tagged]
                         solve [finalCost .>= literal begin, finalCost .<= literal end]
test :: IO [[Integer]]
test = pick 3 7 sum [1,2,3,4,5]
We get:
Main> test
[[1,2],[1,3],[1,2,3],[1,4],[1,2,4],[1,5],[2,5],[2,3],[2,4],[3,4],[3],[4],[5]]
For large lists, this technique will beat out generating all subsets and filtering; assuming the cost function generates reasonable constraints. (Addition will be typically OK, if you've multiplications, the backend solver will have a harder time.)
(As a side note, you should never use filterM (const [True, False]) to generate power-sets to start with! While that expression is cute and fun, it is extremely inefficient!)
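One commonly suggested alternative is a plain recursive definition that computes the powerset of the tail once and reuses it for both halves of the result. This is a sketch only: the name powerset' is made up here, and whether it is actually faster for a given workload is an assumption that would need measuring.

```haskell
-- Powerset by direct recursion: the subsets of the tail are computed once
-- and shared between the "with x" and "without x" halves.
powerset' :: [a] -> [[a]]
powerset' [] = [[]]
powerset' (x : xs) = map (x :) rest ++ rest
  where
    rest = powerset' xs
```

For [1,2] it produces [[1,2],[1],[2],[]].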

Stuck - Practice exam Q for Haskell coding: Return the longest String in a list of Strings

The full practice exam question is:
Using anonymous functions and mapping functions, define Haskell
functions which return the longest String in a list of Strings, e.g.
for ["qw", "asd", "fghj", "kl"] the function should return "fghj".
I tried doing this and keep failing and moving onto others, but I would really like to know how to tackle this. I have to use mapping functions and anonymous functions it seems, but I don't know how to write code to make each element check with each to find the highest one.
I know using a mapping function like "foldr" can make you perform repeating operations to each element and return one result, which is what we want to do with this question (check each String in the list of Strings for the longest, then return one string).
But with foldr I don't know how to use it to make checks between elements to see which is "longest"... Any help will be gladly appreciated.
So far I've just been testing if I can even use foldr to test the length of each element but it doesn't even work:
longstr :: [String] -> String
longstr lis = foldr (\n -> length n > 3) 0 lis
I'm quite new to Haskell, as this is a 3 month course and it's only been 1 month, and we have a small exam coming up.
I'd say they're looking for a simple solution:
longstr xs = foldr (\x acc -> if length x > length acc then x else acc) "" xs
foldr is like a loop that iterates on every element of the list xs. It receives 2 arguments: x is the element and acc (for accumulator) in this case is the longest string so far.
In the condition, if the element is longer than the longest string so far, we switch to the element; otherwise we keep the longest string so far.
Another idea:
(1) Convert to a list of tuples: (length, string)
(2) Take the maximum of that list (which is some pair).
(3) Return the string of the pair returned by (2).
Haskell will compare pairs (a,b) lexicographically, so the pair returned by (2) will come from the string with largest length.
Now you just have to write a maximum function:
maximum :: Ord a => [a] -> a
and this can be written using foldr (or just plain recursion.)
To write the maximum function using recursion, fill in the blanks:
maximum [a] = ??? -- maximum of a single element
maximum (a:as) = ??? -- maximum of a value a and a list as (hint: use recursion)
The base case for maximum begins with a single element list since maximum [] doesn't make sense here.
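Put together, the pair-based idea might look like the sketch below. It uses an anonymous function with map and a foldr with max in place of a hand-written maximum; the name longestByPairs and the choice of (0, "") as the starting value (so that an empty list gives "") are assumptions made for this example. When several strings share the greatest length, the lexicographic comparison of pairs picks the greatest of those strings.

```haskell
-- Steps (1) to (3): tag each string with its length, take the largest pair,
-- then return the string component of that pair.
longestByPairs :: [String] -> String
longestByPairs xs = snd (foldr max (0, "") (map (\s -> (length s, s)) xs))
```

For the exam example, longestByPairs ["qw", "asd", "fghj", "kl"] returns "fghj".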
You can map the list to a list of tuples, consisting of (length, string). Sort by length (largest first) and return the string of the first element.
https://stackoverflow.com/a/9157940/127059 has an answer as well.
Here's an example of building what you want from the bottom up.
maxBy :: Ord b => (a -> b) -> a -> a -> a
maxBy f x y = case compare (f x) (f y) of
    LT -> y
    _ -> x
maximumBy :: Ord b => (a -> b) -> [a] -> Maybe a
maximumBy _ [] = Nothing
maximumBy f l = Just . fst $ foldr1 (maxBy snd) pairs
  where
    pairs = map (\e -> (e, f e)) l
testData :: [String]
testData = ["qw", "asd", "fghj", "kl"]
test :: Maybe String
test = maximumBy length testData
main :: IO ()
main = print test

How can I iterate over a string without recursion?

isTogether' :: String -> Bool
isTogether' (x:xs) = isTogether (head xs) (head (tail xs))
For the above code, I want to go through every character in the string. I am not allowed to use recursion.
If I've got it right, you are interested in getting consecutive character pairs from some string. So, for example, for abcd you need to test (a,b), (b,c), (c,d) with some (Char,Char) -> Bool or Char -> Char -> Bool function.
Zip could be helpful here:
> let x = "abcd"
> let pairs = zip x (tail x)
it :: [(Char, Char)]
And for some f :: Char -> Char -> Bool function we can get uncurry f :: (Char, Char) -> Bool.
And then it's easy to get [Bool] value of results with map (uncurry f) pairs :: [Bool].
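The steps above can be sketched as two small functions. The names adjacentPairs and checkPairs are invented here, and drop 1 is used in place of tail so that the empty string is handled without an error; the predicate is whatever two-character test you need.

```haskell
-- Pairs each element with its successor; zip stops at the shorter list,
-- so no explicit length handling is needed.
adjacentPairs :: [a] -> [(a, a)]
adjacentPairs xs = zip xs (drop 1 xs)

-- Applies a two-argument predicate to every adjacent pair of characters.
checkPairs :: (Char -> Char -> Bool) -> String -> [Bool]
checkPairs f = map (uncurry f) . adjacentPairs
```

The resulting [Bool] can then be collapsed with and, or, all or any, depending on whether every pair or just one pair has to satisfy the predicate.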
In Haskell, a String is just a list of characters ([Char]). Thus, all of the normal higher-order list functions like map work on strings. So you can use whichever higher-order function is most applicable to your problem.
Note that these functions themselves are defined recursively; in fact, there is no way to go through the entire list in Haskell without either recursing explicitly or using a function that directly or indirectly recurses.
To do this without recursion, you will need to use a higher order function or a list comprehension. I don't understand what you're trying to accomplish so I can only give generic advice. You probably will want one of these:
map :: (a -> b) -> [a] -> [b]
Map converts a list of one type into another. Using map lets you perform the same action on every element of the list, given a function that operates on the kinds of things you have in the list.
filter :: (a -> Bool) -> [a] -> [a]
Filter takes a list and a predicate, and gives you a new list with only the elements that satisfy the predicate. Just with these two tools, you can do some pretty interesting things:
import Data.Char
map toUpper (filter isLower "A quick test") -- => "QUICKTEST"
Then you have folds of various sorts. A fold is really a generic higher order function for doing recursion on some type, so using it takes a bit of getting used to, but you can accomplish pretty much any recursive function on a list with a fold instead. The basic type of foldr looks like this:
foldr :: (a -> b -> b) -> b -> [a] -> b
It takes three arguments: an inductive step, a base case and a value you want to fold. Or, in less mathematical terms, you could think of it as taking an initial state, a function to take the next item and the previous state to produce the next state, and the list of values. It then returns the final state it arrived at. You can do some pretty surprising things with fold, but let's say you want to detect if a list has a run of two or more of the same item. This would be hard to express with map and filter (impossible?), but it's easy with recursion:
hasTwins :: (Eq a) => [a] -> Bool
hasTwins (x:y:xs) | x == y = True
hasTwins (x:y:xs) | otherwise = hasTwins (y:xs)
hasTwins _ = False
Well, you can express this with a fold like so:
hasTwins :: (Eq a) => [a] -> Bool
hasTwins = snd . foldr step (Nothing, False)
  where
    -- prev is the element handled just before x (Nothing at the very start)
    step x (prev, seenTwins) = (Just x, prev == Just x || seenTwins)
So my "state" in this fold is the previous value and whether we've already seen a pair of identical values. The function has no explicit recursion, but my step function passes the current x value along to the next invocation through the state as the previous value. But you don't have to be happy with the last state you have; this function takes the second value out of the state and returns that as the overall return value—which is the boolean whether or not we've seen two identical values next to each other.
