How to group and count in haskell?


Given a list (for instance [1,2,2,3,3,4,5,6]) how is it possible to group and count them according to bins/range? I would like to be able to specify a range, so that:
With range=2, the previous list would give [1, 4, 2, 1]: one value is 0 or 1, four are 2 or 3, two are 4 or 5, and one is 6 or 7.
With range=4, the previous list would give [5, 3]: five values fall in 0 to 3 and three fall in 4 to 7.
I have looked into group and groupBy but not found appropriate predicates, and also the histogram-fill library. The latter seems very nice to create bins, but I could not find out how to load data into those bins.
How can I achieve this?
My attempt on one of the suggestions below:
import Data.List
import Data.Function
quantize range n = n `div` range
main = print (groupBy ((==) `on` quantize 4) [1,2,3,4,2])
The output is [[1,2,3],[4],[2]] when it should have been [[1,2,2,3],[4]]. Both suggestions below work only on sorted lists, so sorting first fixes it:
main = print (groupBy ((==) `on` quantize 4) (sort [1,2,3,4,2]))

You can achieve this using the groupBy and div functions. Let's say we have a range N. If we take the integer division (div) by N of N consecutive numbers starting at a multiple of N, all the results are equal. For example, with N=3: 0 div 3 = 0, 1 div 3 = 0, 2 div 3 = 0, 3 div 3 = 1, 4 div 3 = 1, 5 div 3 = 1, 6 div 3 = 2.
Knowing this, we can look at groupBy :: (a -> a -> Bool) -> [a] -> [[a]] and use the function:
sameGroup :: Integral a => a -> a -> a -> Bool
sameGroup range a b = a `div` range == b `div` range
To write our own grouping function
groupings :: Integral a => a -> [a] -> [[a]]
groupings range = groupBy (sameGroup range)
Which should look something like groupings 2 [1, 2, 2, 3, 3, 4, 5, 6] == [[1], [2, 2, 3, 3], [4, 5], [6]]. Now we just have to count it to have the final function
groupAndCount :: Integral a => a -> [a] -> [Int]
groupAndCount range list = map length $ groupings range list
Which should mirror the wanted behavior.
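The functions above rely on groupBy, which only merges neighbouring elements, so they assume the input is already sorted. A minimal sketch that drops this assumption by mapping each element to its bin index and sorting before grouping (binCounts is a hypothetical name, not part of the answer):

```haskell
import Data.List (group, sort)

-- Hypothetical helper: number of elements in each non-empty bin of width `range`,
-- ordered by bin. Bins that contain no elements are not reported.
binCounts :: Integral a => a -> [a] -> [Int]
binCounts range = map length . group . sort . map (`div` range)
```

With the question's list, binCounts 2 [1,2,2,3,3,4,5,6] gives [1,4,2,1] and binCounts 4 gives [5,3]. Note that empty bins are skipped, so the positions in the result only line up with bin numbers when every bin has at least one element.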

You'll need to quantize in order to get the definitions of the bins.
-- `quantize range n` rounds n down to the nearest multiple of range
quantize :: Int -> Int -> Int
quantize range n = (n `div` range) * range
groupBy takes a "predicate" argument*, which identifies whether two items should be placed in the same bin. So:
groupBy (\n m -> quantize range n == quantize range m) :: [Int] -> [[Int]]
will group elements by whether they are in the same bin, without changing the elements. If range is 2, that will give you something like
[[1],[2,2,3,3],[4,5],[6]]
Then you just have to take the length of each sublist.
* There's a neat function called on which allows you to write the predicate more succinctly
groupBy ((==) `on` quantize range)
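If the input may be unsorted, or you want to know which bin each count belongs to, one option is to accumulate counts in a map keyed by the quantized value. This is only a sketch: binHistogram is a hypothetical name, and it assumes the containers package (which ships with GHC) is available.

```haskell
import qualified Data.Map.Strict as Map

-- Rounds n down to the nearest multiple of range (same meaning as in the answer).
quantize :: Int -> Int -> Int
quantize range n = (n `div` range) * range

-- Hypothetical helper: pairs of (start of bin, number of elements in that bin),
-- in ascending order of bin start. Input order does not matter.
binHistogram :: Int -> [Int] -> [(Int, Int)]
binHistogram range xs =
  Map.toList (Map.fromListWith (+) [(quantize range n, 1) | n <- xs])
```

For example, binHistogram 2 [1,2,2,3,3,4,5,6] gives [(0,1),(2,4),(4,2),(6,1)]; taking map snd of that yields the counts alone.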

Related

Return the first iteration of a sequence from a list in Haskell

I have an array of sequences, with infinite iterations (e.g. [6,6,6,6,6] or [23, 24, 23, 24] or [1, 2, 3, 4, 1, 2, 3, 4])
How do I iterate through each such list in Haskell and return only the first iteration? In case of the above examples: [6]; [23, 24]; [1, 2, 3, 4]
Thanks!
Edit: Sorry, I wasn't precise. The lists are indeed infinite. My goal is to return a list of the aliquot sequence of a given Integer. I have a function which returns the sum of the divisors. I started a recursive call with the first sum, and constructed the list. That resulted in lists like [6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6...].
First I wanted to solve the problem by taking the first part of the list, up to the second occurrence of the original Int. But then it hit me: It's easier to check with elem if the sum exists in the list. If yes, return the list as-is. Otherwise, append the sum and go on.
Edit 2: The code that produces the (in my definition at least) infinite list is the following chunk:
aliquot :: (Integral a) => a -> [a]
aliquot 0 = []
aliquot 1 = [1]
aliquot n = n : (aliquot $ sum $ divisors n)
divisors :: (Integral a) => a -> [a]
divisors n = filter ((0 ==) . (n `mod`)) [1 .. (n `div` 2)]

This isn't possible. Consider this sequence: [1,1,1,1,1,1,1... It looks like the answer here is [1], right? Wrong. That sequence was actually the greatest common factor of (x^17 + 9) and ((x + 1)^17 + 9), which stops being all 1s once you get to the 8424432925592889329288197322308900672459420460792433rd element. Or consider this other sequence: [1,1,1,1,1,1,1.... The world's greatest mathematicians aren't sure what the answer is for this sequence. It's the smallest number ever reached by the Collatz sequence for each starting value, and although we've never found a number where this isn't 1, we haven't been able to actually prove that it will be 1 for all numbers.
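For the specific aliquot case, the elem-based idea from the question's edit avoids the general problem, because the list is built step by step and construction stops at the first value already seen. A minimal sketch of that idea (aliquotUntilRepeat is a hypothetical name; divisors is the function from the question):

```haskell
-- Proper divisors of n, as defined in the question.
divisors :: Integral a => a -> [a]
divisors n = filter ((0 ==) . (n `mod`)) [1 .. n `div` 2]

-- Hypothetical helper: follow the aliquot sequence from n and stop as soon as
-- the next value is 0 or has already appeared in the sequence.
aliquotUntilRepeat :: Integral a => a -> [a]
aliquotUntilRepeat = go []
  where
    go seen n
      | n == 0 || n `elem` seen = reverse seen
      | otherwise               = go (n : seen) (sum (divisors n))
```

For example, aliquotUntilRepeat 6 gives [6] and aliquotUntilRepeat 220 gives [220,284]. This only terminates when the sequence actually reaches 0 or repeats, and elem makes each step linear in the length of the sequence so far.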

Haskell - Is there a better way of evenly distributing an element over a list

Given a matrix like this
matrix_table =
[[ 0, 0, 0, 0]
,[ 0, 0, 0, 0]
,[ 0, 0, 0, 0]
,[ 0, 0, 0, 0]
]
and a list position_list = [2, 3, 2, 10]
output of a function
distribute_ones :: [[Int]] -> [Int] -> [[Int]]
distribute_ones matrix_table position_list
should look like this
[[ 0, 1, 0, 1] -- 2 '1's in the list
,[ 0, 1, 1, 1] -- 3 '1's in the list
,[ 0, 1, 0, 1] -- 2 '1's in the list
,[ 1, 1, 1, 1] -- Since 10 > 4, all '1's in the list
]
What I tried:
I generated list of lists, the base matrix with
replicate 4 (replicate 4 0)
then divided inner lists with chunksOf from Data.List.Split library to make cut-outs of 4 - (position_list !! nth).
Finally appending and concatenating with 1 like this
take 4 . concat . map (1 :)
Although I think it's not exactly the best approach.
Is there a better way of doing that?

For evenly distributing elements, I recommend Bjorklund's algorithm. Bjorklund's algorithm takes two sequences to merge, and repeatedly:
Merges as much of the prefix of the two as it can, taking one from each, then
recursively calls itself with the merged elements as one sequence and the leftovers from the longer input as the other sequence.
In code:
bjorklund :: [[a]] -> [[a]] -> [a]
bjorklund xs ys = case zipMerge xs ys of
    ([], leftovers) -> concat leftovers
    (merged, leftovers) -> bjorklund merged leftovers
zipMerge :: [[a]] -> [[a]] -> ([[a]], [[a]])
zipMerge [] ys = ([], ys)
zipMerge xs [] = ([], xs)
zipMerge (x:xs) (y:ys) = ((x++y):merged, leftovers) where
    ~(merged, leftovers) = zipMerge xs ys
Here are some examples in ghci:
> bjorklund (replicate 2 [1]) (replicate 2 [0])
[1,0,1,0]
> bjorklund (replicate 5 [1]) (replicate 8 [0])
[1,0,0,1,0,1,0,0,1,0,0,1,0]
If you like, you could write a small wrapper that takes just the arguments you care about.
ones len numOnes = bjorklund
    (replicate (len - numOnes) [0])
    (replicate (min len numOnes) [1])
In ghci:
> map (ones 4) [2,3,2,10]
[[0,1,0,1],[0,1,1,1],[0,1,0,1],[1,1,1,1]]

Here's an alternate algorithm to distribute itemCount items across rowLength cells within a single row. Initialize currentCount to 0. Then for each cell:
Add itemCount to currentCount.
If the new currentCount is less than rowLength, use the original value of the cell.
If the new currentCount is at least rowLength, subtract rowLength, and increment the value of the cell by one.
This algorithm produces the output you expect from the input you provide.
We can write the state required for this as a simple data structure:
data Distribution = Distribution { currentCount :: Int
                                 , itemCount :: Int
                                 , rowLength :: Int
                                 } deriving (Eq, Show)
At each step of the algorithm we need to know whether we're emitting an output (and incrementing the value), and what the next state value will be.
nextCount :: Distribution -> Int
nextCount d = currentCount d + itemCount d
willEmit :: Distribution -> Bool
willEmit d = (nextCount d) >= (rowLength d)
nextDistribution :: Distribution -> Distribution
nextDistribution d = d { currentCount = (nextCount d) `mod` (rowLength d) }
To keep this as running state, we can package it in the State monad. Then we can write the "for each cell" list above as a single function:
-- Needs: import Control.Monad.State (State, evalState, gets, modify), from the mtl package
distributeCell :: Int -> State Distribution Int
distributeCell x = do
    emit <- gets willEmit
    modify nextDistribution
    return $ if emit then x + 1 else x
To run this over a whole row, we can use the traverse function from the standard library. This takes some sort of "container" and a function that maps single values to monadic results, and creates a "container" of the results inside the same monad. Here the "container" type is [a] and the "monad" type is State Distribution a, so the specialized type signature of traverse is
traverse :: (Int -> State Distribution Int)
         -> [Int]
         -> State Distribution [Int]
We don't actually care about the final state, we just want the resulting [Int] out, which is what evalState does. This would produce:
distributeRow :: [Int] -> Int -> [Int]
distributeRow row count =
    evalState
        (traverse distributeCell row :: State Distribution [Int])
        (Distribution 0 count (length row))
Applying this to the whole matrix is a simple application of zipWith (given two lists and a function, call the function repeatedly with pairs of items from the two lists, returning the list of results):
distributeOnes :: [[Int]] -> [Int] -> [[Int]]
distributeOnes = zipWith distributeRow
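The same running-count algorithm can also be written without the State monad, threading the counter through Data.List.mapAccumL instead. This is a sketch of an alternative formulation, not the answer's code; distributeRowPure and distributeOnesPure are hypothetical names, and the wrap-around uses mod in the same way as nextDistribution above.

```haskell
import Data.List (mapAccumL)

-- Hypothetical pure variant of distributeRow: the accumulator is the running count.
distributeRowPure :: [Int] -> Int -> [Int]
distributeRowPure row itemCount = snd (mapAccumL step 0 row)
  where
    rowLength = length row
    step current cell
      | next >= rowLength = (next `mod` rowLength, cell + 1)  -- emit: bump the cell
      | otherwise         = (next, cell)                      -- keep the cell as-is
      where
        next = current + itemCount

-- Hypothetical pure variant of distributeOnes: one count per row.
distributeOnesPure :: [[Int]] -> [Int] -> [[Int]]
distributeOnesPure = zipWith distributeRowPure
```

On the question's input, distributeOnesPure (replicate 4 (replicate 4 0)) [2, 3, 2, 10] gives [[0,1,0,1],[0,1,1,1],[0,1,0,1],[1,1,1,1]]. The trade-off is mostly stylistic: mapAccumL keeps everything in base, while the State version scales more comfortably if the per-cell logic grows.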

Get n elements of list having the highest property

I'm new to Haskell and trying to implement some genetic algorithms.
Currently I'm failing at selecting the n best elements of a list of individuals (where each individual is itself a list).
An individual is created as follows:
ind1 :: [Int]
ind1 = [1, 1, 1, 1, 1, 1, 1]
ind2 :: [Int]
ind2 = [0, 0, 0, 0, 0, 0, 0]
The appropriate population consists of a list of those individuals:
pop :: [[Int]]
pop = [ind1, ind2]
What I want to achieve is to get the best n individuals of the population, where the "best" is determined by the sum of its elements, e.g.,
> sum ind1
7
> sum ind2
0
I started creating a function for creating tuples with individual and its quality:
f x = [(ind, sum ind) | ind <- x]
so at least I got something like this:
[([1, 1, 1, 1, 1, 1, 1], 7), ([0, 0, 0, 0, 0, 0, 0], 0)]
How do I get from here to the expected result? I do not even manage to get the "fst" of the tuple where "snd == max".
I started with recursive approaches as seen in different topics, but unfortunately without reasonable result.
Any suggestions, probably also where to read?
Thank you!

The best choice here is to use sortBy from Data.List:
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
The sortBy function is higher order, so it takes a function as one of its arguments. The function it needs is one that takes two elements and returns an Ordering value (LT, EQ or GT). You can write your own custom comparison function, but the Data.Ord module has comparing, which exists to help with writing these comparison functions:
comparing :: Ord b => (a -> b) -> (a -> a -> Ordering)
Hopefully you can see how comparing pairs with sortBy: you pass it a function that converts your type to a known comparable type, and you get back a function of the right type to pass to sortBy. So in practice you can do
import Data.List (sortBy)
import Data.Ord (comparing)
-- Some types to make things more readable
type Individual = [Int]
type Fitness = Int
-- Here's our fitness function (change as needed)
fitness :: Individual -> Fitness
fitness = sum
-- Redefining so it can be used with `map`
f :: Individual -> (Individual, Fitness)
f ind = (ind, fitness ind)
-- If you do want to see the fitness of the top n individuals
solution1 :: Int -> [Individual] -> [(Individual, Fitness)]
solution1 n inds = take n $ sortBy (flip $ comparing snd) $ map f inds
-- If you just want the top n individuals
solution2 :: Int -> [Individual] -> [Individual]
solution2 n inds = take n $ sortBy (flip $ comparing fitness) inds
The flip in the arguments to sortBy forces the sort to be descending instead of the default ascending, so the first n values returned from sortBy will be the n values with the highest fitness in descending order. If you wanted to try out different fitness functions then you could do something like
fittestBy :: (Individual -> Fitness) -> Int -> [Individual] -> [Individual]
fittestBy fit n = take n . sortBy (flip $ comparing fit)
Then you'd have
solution2 = fittestBy sum
But you could also have
solution3 = fittestBy product
if you wanted to change your fitness function to be the product rather than the sum.

Use sortBy and on.
> take 2 $ sortBy (flip compare `on` sum) [[1,2],[0,4],[1,1]]
[[0,4],[1,2]]
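An equivalent way to get a descending sort, shown here only as an alternative sketch, is sortOn with the Down wrapper from Data.Ord, which reverses the ordering of the key. The name topNBy is hypothetical.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical helper: the n elements with the highest score, best first.
topNBy :: Ord b => (a -> b) -> Int -> [a] -> [a]
topNBy score n = take n . sortOn (Down . score)
```

For example, topNBy sum 2 [[1,2],[0,4],[1,1]] gives [[0,4],[1,2]], matching the sortBy version. sortOn computes the score once per element, which can matter if the fitness function is expensive.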

Comparing List Elements in Haskell

I'm just learning Haskell and am kind of stuck.
I'd like to compare list elements and measure the difference between them and return the highest one.
Unfortunately, I do not know how to approach that problem.
Usually, I'd just iterate over the list and compare the neighbours, but that does not seem to be the way to go in Haskell.
I already tried using map but as I said I do not really know how you can solve that problem.
I'd be thankful for every kind of advice!
Best wishes
Edit: My idea is to first zip all pairs like this pairs a = zip a (tail a). Then I'd like to get all differences (maybe with map?) and then just chose the highest one. I just can't handle the Haskell syntax.

I don't know what you mean by "measure the discrepancy" between list elements, but if you want to calculate the "largest" element in a list, you'd use the built-in maximum function:
maximum :: Ord a => [a] -> a
This function takes a list of values that can be ordered, so all numbers, chars, and strings, among others.
If you want to get the difference between the maximum value and the minimum value, you can use the similar function minimum, then just subtract the two. Sure, there might be a slightly faster solution whereby you only traverse the list once, or you could sort the list then take the first and last elements, but for most cases doing diff xs = maximum xs - minimum xs is plenty fast enough and makes the most sense to someone else.
So what you want to do is compute a difference between successive elements, not calculate the minimum and maximum of each element. You don't need to index directly, but rather use a handy function called zipWith. It takes a binary operation and two lists, and "zips" them together using that binary operation. So something like
zipWith (+) [1, 2, 3] [4, 5, 6] = [1 + 4, 2 + 5, 3 + 6] = [5, 7, 9]
It is rather handy because if one of the lists runs out early, it just stops there. So you could do something like
diff xs = zipWith (-) xs ???
But how do we offset the list by 1? Well, the easy (and safe) way is to use drop 1. You could use tail, but it throws an error and crashes your program if xs is an empty list, whereas drop does not:
diff xs = zipWith (-) xs $ drop 1 xs
So an example would be
diff [1, 2, 3, 4] = zipWith (-) [1, 2, 3, 4] $ drop 1 [1, 2, 3, 4]
                  = zipWith (-) [1, 2, 3, 4] [2, 3, 4]
                  = [1 - 2, 2 - 3, 3 - 4]
                  = [-1, -1, -1]
This function will return positive and negative values, and we're interested only in the magnitude, so we can then use the abs function:
maxDiff xs = ??? $ map abs $ diff xs
And then using the function I highlighted above:
maxDiff xs = maximum $ map abs $ diff xs
And you're done! If you want to be fancy, you could even write this in point-free notation as
maxDiff = maximum . map abs . diff
Now, this will in fact raise an error on an empty list because maximum [] throws an error, but I'll let you figure out a way to solve that.
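One possible way to handle the empty case, given here as a sketch rather than the intended solution to that exercise, is to return a Maybe and only call maximum when there is at least one difference. The name maxDiffSafe is hypothetical.

```haskell
-- Hypothetical total variant of maxDiff: Nothing when the list has fewer than
-- two elements (so there are no neighbouring pairs to compare).
maxDiffSafe :: (Num a, Ord a) => [a] -> Maybe a
maxDiffSafe xs =
  case zipWith (\a b -> abs (a - b)) xs (drop 1 xs) of
    []    -> Nothing
    diffs -> Just (maximum diffs)
```

For example, maxDiffSafe [1, 4, 2, 10] is Just 8, while maxDiffSafe [] and maxDiffSafe [5] are both Nothing.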

As mentioned by bheklilr, maximum is the quick and easy solution.
If you want some of the background though, here's a bit. What we're trying to do is take a list of values and reduce it to a single value. This is known as a fold, and is possible with (among others) the foldl function, which has the signature foldl :: (a -> b -> a) -> a -> [b] -> a.
The (a -> b -> a) section of foldl is a function which takes two values and returns one of the first type. In our case, this should be our comparison function:
myMax :: Ord a => a -> a -> a
myMax x y | x > y = x
          | otherwise = y
(note that Ord a is required so that we can compare our values).
So, we can say
-- This doesn't work!
myMaximum :: Ord a => [a] -> a
myMaximum list = foldl myMax _ list
But what is _? It doesn't make sense to have a starting value for this function, so we turn instead to foldl1, which does not require a starting value (instead it uses the first element of the list as the starting value and folds over the rest). That makes our maximum function
myMaximum :: Ord a => [a] -> a
myMaximum list = foldl1 myMax list
or, in pointfree format,
myMaximum :: Ord a => [a] -> a
myMaximum = foldl1 myMax
If you look at the definition of maximum for lists in the Haskell Report, you'll see it uses this same method (foldl1 max).

map maps a function over a list. It transforms each thing1 in a list to a thing2.
What you want is to find the biggest difference between two neighbours, which you can't do with map alone. I'll assume you're only looking at numbers for now, because that's just easier.
diffs :: (Num a) => [a] -> [a]
diffs [] = []
diffs [x] = []
diffs (x1:x2:xs) = abs(x1-x2) : (diffs$x2:xs)
mnd :: (Num a, Ord a) => [a] -> a
mnd [] = 0
mnd [x] = 0
mnd xs = maximum$diffs xs
So diffs takes each list item one at a time and gets the absolute difference between it and its neighbour, then puts that at the front of a list it creates as it goes along (the : operator puts an individual element at the front of a list).
mnd is just a wrapper around maximum$diffs xs that stops exceptions being thrown.

Haskell: surprising behavior of "groupBy"

I'm trying to figure out the behavior of the library function groupBy (from Data.List), which purports to group elements of a list by an "equality test" function passed in as the first argument. The type signature suggests that the equality test just needs to have type
(a -> a -> Bool)
However, when I use (<) as the "equality test" in GHCi 6.6, the results are not what I expect:
ghci> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Instead I'd expect runs of strictly increasing numbers, like this:
[[1,2,3],[2,4],[1,5,9]]
What am I missing?

Have a look at the ghc implementation of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
    where (ys,zs) = span (eq x) xs
Now compare these two outputs:
Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
[[8],[2,3],[2,4],[1,5,9]]
In short, what happens is this: groupBy assumes that the given function (the first argument) tests for equality, and thus assumes that the comparison function is reflexive, transitive and symmetric (see equivalence relation). The problem here is that the less-than relation is not reflexive, nor symmetric.
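To make the mechanism concrete, here is the first span call that groupBy (<) performs on each of the two lists above, written out as plain values (the names firstStep1 and firstStep8 are only for illustration). Every element is compared with the head of the group, not with its left neighbour.

```haskell
-- List starting with 1: everything up to the next element that is not greater
-- than 1 joins the first group.
firstStep1 :: ([Int], [Int])
firstStep1 = span (1 <) [2, 3, 2, 4, 1, 5, 9]
-- firstStep1 == ([2,3,2,4],[1,5,9])

-- List starting with 8: no following element is greater than 8, so the first
-- group is just [8].
firstStep8 :: ([Int], [Int])
firstStep8 = span (8 <) [2, 3, 2, 4, 1, 5, 9]
-- firstStep8 == ([],[2,3,2,4,1,5,9])
```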
Edit: The following implementation only assumes transitivity:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' _ [x] = [[x]]
groupBy' cmp (x:xs@(x':_)) | cmp x x' = (x:y):ys
                           | otherwise = [x]:r
    where r@(y:ys) = groupBy' cmp xs

The fact that "<" isn't an equality test.
You might expect some behavior because you'd implement it differently, but that isn't what it promises.
To see why its output is a reasonable answer, imagine that it first sweeps through the list, doing
[1, 2, 3, 2, 4, 1, 5, 9] ->
[[1,2,3], [2,4], [1,5,9]]
Now it has 3 groups of equal elements. So it checks if any of them are in fact the same:
Since it knows all elements in each group are equal, it can just look at the first element in each: 1, 2 and 1.
1 < 2? Yes! So it merges the first two groups.
1 < 1? No! So it leaves the last group be.
And now it's compared all elements for equality.
...only, you didn't pass it the kind of function it expected.
In short, when it wants an equality test, give it an equality test.

The problem is that the reference implementation of groupBy in the Haskell Report compares elements against the first element, so the groups are not strictly increasing (they just have to be all bigger than the first element). What you want instead is a version of groupBy that tests on adjacent elements, like the implementation here.
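A minimal sketch of such an adjacent-element version, written here as an illustration and not taken from the linked implementation (groupAdjacentBy is a hypothetical name):

```haskell
-- Hypothetical variant of groupBy that compares each element with its right-hand
-- neighbour instead of with the first element of the group.
groupAdjacentBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupAdjacentBy rel = foldr step []
  where
    -- x joins the group to its right when it is related to that group's head.
    step x ((y:ys):rest) | rel x y = (x : y : ys) : rest
    -- Otherwise (or when there is no group yet) x starts a new group.
    step x groups = [x] : groups
```

With this definition, groupAdjacentBy (<) [1, 2, 3, 2, 4, 1, 5, 9] gives [[1,2,3],[2,4],[1,5,9]], the runs of strictly increasing numbers the question expected.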

I'd just like to point out that groupBy only groups adjacent elements, so equal elements end up in the same group only if they are already next to each other (for example, because the list was sorted first).
For example:
equalityOp :: Eq a => (a, b1) -> (a, b2) -> Bool
equalityOp x y = fst x == fst y
testData = [(1, 2), (1, 4), (2, 3)]
correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]
otherTestData = [(1, 2), (2, 3), (1, 4)]
incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]
This behaviour comes about because groupBy is using span in its definition. To get reasonable behaviour which doesn't rely on us having the underlying list in any particular order we can define a function:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' eq [] = []
groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
    where similarResults = filter (eq x) xs
          differentResults = filter (not . eq x) xs
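The filter-based version compares every element against every remaining one, which is quadratic in the length of the list. When the key has an Ord instance, a common alternative is to sort by the key first and then use the ordinary groupBy. A sketch for the pair example above (groupByKey is a hypothetical name):

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical helper: group pairs by their first component, regardless of the
-- order they arrive in. sortOn is stable, so pairs with the same key keep their
-- relative order.
groupByKey :: Ord k => [(k, v)] -> [[(k, v)]]
groupByKey = groupBy ((==) `on` fst) . sortOn fst
```

For example, groupByKey [(1, 2), (2, 3), (1, 4)] gives [[(1,2),(1,4)],[(2,3)]]. Unlike the filter version, the groups come out ordered by key, not by first appearance.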
