This is my FIRST Haskell program! "wordCount" takes in a list of words and returns a list of tuples, pairing each case-insensitive word with its usage count. Any suggestions for improvement on either code readability or performance?
import List;
import Char;
uniqueCountIn ns xs = map (\x -> length (filter (==x) xs)) ns
nubl (xs) = nub (map (map toLower) xs) -- to lowercase
wordCount ws = zip ns (uniqueCountIn ns ws)
    where ns = nubl ws
Congrats on your first program!
For cleanliness: lose the semicolons. Use the new hierarchical module names instead (Data.List, Data.Char). Add type signatures. As you get more comfortable with function composition, eta contract your function definitions (remove rightmost arguments). e.g.
nubl :: [String] -> [String]
nubl = nub . map (map toLower)
If you want to be really rigorous, use explicit import lists:
import Data.List (nub)
import Data.Char (toLower)
For performance: use a Data.Map to store the associations instead of nub and filter. In particular, see fromListWith and toList. Using those functions you can simplify your implementation and improve performance at the same time.
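A minimal sketch of that suggestion, assuming the input is already a list of words as in the question. The name wordCountMap is made up for illustration, and Data.Map comes from the containers package that ships with GHC:

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (toLower)

-- Hypothetical name: counts case-insensitive occurrences of each word.
wordCountMap :: [String] -> [(String, Int)]
wordCountMap ws =
    -- Pair every lowercased word with 1, then let the map add up the 1s per key.
    Map.toList (Map.fromListWith (+) [(map toLower w, 1) | w <- ws])
```

For example, wordCountMap ["The", "cat", "the"] gives [("cat",1),("the",2)]. Unlike the nub version, the result is ordered by word rather than by first appearance.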
One of the ways to improve readability is to get used to the standard functions. Hoogle is one of the tools that sets Haskell apart from the rest of the world ;)
import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&))
wordCount :: String -> [(String, Int)]
wordCount = map (head &&& length) . group . sort . words . map toLower
EDIT: Explanation: Think of it as a chain of mappings, applied from right to left:
(map toLower) :: String -> String lowercases the entire text, for the purpose of case insensitivity
words :: String -> [String] splits a piece of text into words
sort :: Ord a => [a] -> [a] sorts
group :: Eq a => [a] -> [[a]] gathers adjacent identical elements into sublists (which is why we sort first), for example, group [1,1,2,3,3] -> [[1,1],[2],[3,3]]
&&& :: (a -> b) -> (a -> c) -> (a -> (b, c)) applies two functions to the same piece of data, then returns the tuple of results. For example: (head &&& length) ["word","word","word"] -> ("word", 3) (actually &&& is a little more general, but the simplified explanation works for this example)
EDIT: Or actually, look for the "multiset" package on Hackage.
It is always good to ask more experienced developers for feedback. In addition, you could use hlint to get feedback on small-scale issues. It'll tell you about hierarchical imports, unnecessary parentheses, alternative higher-order functions, etc.
Regarding the function nubl: if you don't yet want to follow luqui's advice to remove the parameter altogether, I would at least remove the parentheses around xs on the left side of the equation.
Related
I know and love my filter, map and reduce, which happen to be part of more and more languages that are not really purely functional.
I found myself needing a similar function though: something like map, but instead of one to one it would be one to many.
I.e. one element of the original list might be mapped to multiple elements in the target list.
Is there already something like this out there or do I have to roll my own?
This is exactly what >>= specialized to lists does.
> [1..6] >>= \x -> take (x `mod` 3) [1..]
[1,1,2,1,1,2]
It's concatenating together the results of
> map (\x -> take (x `mod` 3) [1..]) [1..6]
[[1],[1,2],[],[1],[1,2],[]]
You do not have to roll your own. There are many relevant functions here, but I'll highlight three.
First of all, there is the concat function, which already comes in the Prelude (the standard library that's loaded by default). What this function does, when applied to a list of lists, is return the list that contains concatenated contents of the sublists.
EXERCISE: Write your own version of concat :: [[a]] -> [a].
So using concat together with map, you could write this function:
concatMap :: (a -> [b]) -> [a] -> [b]
concatMap f = concat . map f
...except that you don't actually need to write it, because it's such a common pattern that the Prelude already has it (at a more general type than what I show here—the library version takes any Foldable, not just lists).
Finally, there is also the Monad instance for list, which can be defined this way:
instance Monad [] where
    return a = [a]
    as >>= f = concatMap f as
So the >>= operator (the centerpiece of the Monad class), when working with lists, is exactly the same thing as concatMap.
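To make that equivalence concrete, here is a small sketch that writes the same one-to-many mapping three ways. The function names are invented for this example, and everything used comes from the Prelude:

```haskell
-- Repeat each number n exactly n times, written three equivalent ways.
viaBind, viaConcatMap, viaDo :: [Int] -> [Int]
viaBind xs      = xs >>= \x -> replicate x x          -- monadic bind on lists
viaConcatMap xs = concatMap (\x -> replicate x x) xs  -- explicit concatMap
viaDo xs        = do { x <- xs; replicate x x }       -- do-notation desugars to >>=
```

Each of them maps [1,2,3] to [1,2,2,3,3,3].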
EXERCISE: Skim through the documentation of the Data.List module. Figure out how to import the module into your code and play around with some of the functions.
Suppose I have a tree represented as a list of parents and I want to reverse the edges, obtaining a list of children for each node. For this tree - http://i.stack.imgur.com/uapqT.png - transformation would look like:
[0,0,0,1,1,2,5,4,4] -> [[2,1],[4,3],[5],[],[8,7],[6],[],[],[]]
It's not limited to graph transposing, however. I have a few other problems that I would solve in an imperative language in the following way: traverse some source data array and non-sequentially update a resulting array as I learn more about it.
Essentially, my question is "what is Haskell's idiomatic way to solve things like this?". As I understand it, I can do this in an imperative way by means of mutable vectors, but isn't there some purely functional method? If not, how would I properly use mutables?
Finally, I need it to work fast, that is O(n) complexity, and non-standard packages are not an option for me.
It's worth considering the pure functions in Data.Vector or Data.Array that internally use mutation in order to be more efficient (the accum-s in both libraries, plus the unfolds and construct-s in vector).
The accum-s are great when we don't care about intermediate states of an array during construction. They're nicely applicable for transposing graphs, although we have to provide a range for the node keys:
{-# LANGUAGE TupleSections #-}
import qualified Data.Array as A
type Graph = [(Int, [Int])]
transpose :: (Int, Int) -> Graph -> Graph
transpose range g =
    A.assocs $ A.accumArray (flip (:)) [] range (do {(i, ns) <- g; map (,i) ns})
Here we first unroll the graph into a list of edges, but with the pairs of indices swapped, and then accumulate them into an array. It's roughly as fast as a standard imperative loop over a mutable array, and it's more convenient than the ST monad.
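Applied to the parent-list representation from the question, the same accumArray idea might look like the sketch below. The name childrenAccum is hypothetical, and the code assumes that node 0 is the root and lists itself as its own parent, which is what the example in the question suggests:

```haskell
import Data.Array (accumArray, elems)

-- Hypothetical helper: list position = node, value = that node's parent.
-- Assumes node 0 is the root and is recorded as its own parent.
childrenAccum :: [Int] -> [[Int]]
childrenAccum parents =
    elems (accumArray (flip (:)) [] (0, n - 1) edges)
  where
    n     = length parents
    -- (parent, child) pairs; drop 1 skips the root's self-edge.
    edges = drop 1 (zip parents [0 ..])
```

With the list from the question, childrenAccum [0,0,0,1,1,2,5,4,4] yields [[2,1],[4,3],[5],[],[8,7],[6],[],[],[]]. A parent index outside 0..n-1 makes accumArray fail with an out-of-bounds error.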
Alternatively, we can just use IntMap, likely alongside the State monad, and just port our imperative algorithms as they are, and the performance will be satisfactory for most purposes.
Fortunately IntMap provides a lot of higher-order functions, so we're not (always) forced to program in an imperative style with it. There's an analogue for accum, for instance:
import qualified Data.IntMap.Strict as IM
transpose :: Graph -> Graph
transpose g =
    IM.assocs $ IM.fromListWith (++) (do {(i, ns) <- g; (i,[]) : map (,[i]) ns})
A purely functional way would be to use a map to store the information, producing an O(n log n) algorithm:
import qualified Data.IntMap as IM
import Data.Maybe (fromMaybe)
childrenMap :: [Int] -> IM.IntMap [Int]
childrenMap xs = foldr addChild IM.empty $ zip xs [0..]
  where
    addChild :: (Int, Int) -> IM.IntMap [Int] -> IM.IntMap [Int]
    addChild (parent, child) = IM.alter (Just . (child :) . fromMaybe []) parent
You could also use an imperative solution and keep things pure using the ST monad, which is obviously O(n), but the imperative code somewhat obscures the main idea:
import Control.Monad (forM_)
import Data.Array
import Data.Array.MArray
import Data.Array.ST
childrenST :: [Int] -> [[Int]]
childrenST xs = elems $ runSTArray $ do
    let l = length xs
    arr <- newArray (0, l - 1) []
    let add (parent, child) =
            writeArray arr parent . (child :) =<< readArray arr parent
    forM_ (zip xs [0..]) add
    return arr
One drawback of this approach is that if an index is out of bounds, it just fails.
Another is that you traverse the list twice. However, if you used arrays instead of lists everywhere, this wouldn't matter.
I've been using this page on the Haskell website all day and it's been really helpful for learning list functions: http://www.haskell.org/haskellwiki/How_to_work_on_lists
My task at the moment is to write a single-line statement that returns the number of distinct characters (a-Z) that are used in a string. I can't seem to find any help on the above page or anywhere else on the internet.
I know how to count characters in a string by using length nameoflist, but I'm not sure how I would go about counting the number of a-Z characters that have been used, for example 'starT to' should return 6
Any help is appreciated, thanks
An alternative to @Sibi's perfectly fine answer is to use a combination of sort and group from Data.List:
numUnique :: Ord a => [a] -> Int
numUnique = length . group . sort
This imposes the tighter restriction of Ord instead of just Eq, but it should be faster on larger inputs: sorting is O(n log n), whereas nub is O(n^2) in the worst case. You can also use a very similar function to count the occurrences of each unique element in the list:
elemFrequency :: Ord a => [a] -> [(a, Int)]
elemFrequency = map (\s -> (head s, length s)) . group . sort
Or if you want to use the more elegant Control.Arrow form
elemFrequency = map (head &&& length) . group . sort
It can be used as
> elemFrequency "hello world"
[(' ',1),('d',1),('e',1),('h',1),('l',3),('o',2),('r',1),('w',1)]
You can remove the duplicate elements using nub and find the length of the resulting list.
import Data.List (nub)
numL :: Eq a => [a] -> Int
numL xs = length $ nub xs
Demo in ghci:
ghci > numL "starTto"
6
In case you don't want to consider a whitespace in the String, then remove it using a filter or any other appropriate function.
There are a few ways to do this, depending on what structure you want to use.
If you want to use Eq structure, you can do this with nub. If the inputs denote a small set of characters, then this is fairly good. However, if there are a lot of distinct alphabetic characters (remember that "Å" and "Ω" are both alphabetic, according to isAlpha), then this technique will have poor performance (quadratic running time).
import Data.Char (isAlpha)
import Data.List (nub)
distinctAlpha :: String -> Int
distinctAlpha = length . nub . filter isAlpha
You can increase performance for larger sets of alphabetic characters by using additional structure. Ord is the first choice, and allows you to use Data.Set, which gives O(N log N) asymptotic performance.
import Data.Char (isAlpha)
import Data.Set (size, fromList)
distinctAlpha :: String -> Int
distinctAlpha = size . fromList . filter isAlpha
First, filter the list in order to remove any non a-Z characters; second, remove duplicate elements; third, calculate its length.
import Data.Char (isAlpha)
import Data.List (nub)
count = length . nub . filter isAlpha
numberOfCharacters = length . Data.List.nub . filter Data.Char.isAlpha
I am converting the zxcvbn password strength algorithm to Haskell.
I have two functions that check for all characters being ASCII and that a brute force attack is possible:
filterAscii :: [String] -- ^terms to filter
-> [String] -- ^filtered terms
filterAscii = filter $ all (\ chr -> ord chr < 128)
and
filterShort :: [String] -- ^terms to filter
-> [String] -- ^filtered terms
filterShort terms = map fst $ filter long $ zip terms [1..]
    where long (term, index) = (26 ^ length term) > index
I composed these into a single function:
filtered :: [String] -- ^terms to filter
-> [String] -- ^filtered terms
filtered = filterAscii . filterShort
I now have need to compose these with a third filter to check if the terms are not null:
filter (not . null) terms
It has occurred to me that I am creating a chain of filters and that it would make more sense to create a single function that takes a list of filter functions and composes them in the order given.
If I recall from my reading, this is a job for an applicative functor, I believe. Can I use applicatives for this?
I am not sure how to handle the filterShort function where I need to zip each item with its one-based index before filtering.
You can use the Endo wrapper from Data.Monoid to get a monoid instance that will allow you to use mconcat like so:
Prelude> :m + Data.Monoid
Prelude Data.Monoid> :t appEndo $ mconcat [Endo filterAscii, Endo filterShort]
appEndo $ mconcat [Endo filterAscii, Endo filterShort] :: [String] -> [String]
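As a sketch of how that could generalize to any list of list transformations (composeAll, keepAscii and keepNonEmpty are names invented for this example, not part of the question's code):

```haskell
import Data.Char (ord)
import Data.Monoid (Endo (..))

-- Hypothetical helper: compose a list of list transformations.
-- mconcat on Endo composes with (.), so the LAST function in the list runs first.
composeAll :: [[a] -> [a]] -> [a] -> [a]
composeAll = appEndo . mconcat . map Endo

-- Two simple filters in the spirit of the question.
keepAscii, keepNonEmpty :: [String] -> [String]
keepAscii    = filter (all (\c -> ord c < 128))
keepNonEmpty = filter (not . null)
```

Here composeAll [keepAscii, keepNonEmpty] ["abc", "", "xyz"] evaluates to ["abc","xyz"]. The same function can be written without Endo as foldr (.) id.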
In other words, you want:
filters :: [a -> Bool] -> [a] -> [a]
filters fs = filter (\a -> and $ map ($ a) fs)
But you should also know that a pipeline of filters is very likely to be optimized by GHC anyway (as far as I know), so it may not be worth creating this function. Note that filterShort does not fit this scheme directly, since it is not a plain predicate filter: whether a term is kept depends on its position in the list, not only on the term itself.
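One possible way to bring the index-dependent check into the same scheme is to pair every term with its one-based index first and let all predicates inspect the pair. This is only a sketch: filterIndexed and the three predicate names are made up, and it assumes the index always refers to the position in the original, unfiltered list:

```haskell
import Data.Char (ord)

-- Hypothetical helper: keep the elements whose (index, element) pair
-- passes every predicate. Indices are one-based positions in the input.
filterIndexed :: [(Int, a) -> Bool] -> [a] -> [a]
filterIndexed ps xs = [x | (i, x) <- zip [1 ..] xs, all (\p -> p (i, x)) ps]

isAsciiTerm, isLongEnough, isNonEmpty :: (Int, String) -> Bool
isAsciiTerm  (_, term) = all (\c -> ord c < 128) term
-- Integer arithmetic avoids overflow of 26 ^ n for long terms.
isLongEnough (i, term) = 26 ^ length term > toInteger i
isNonEmpty   (_, term) = not (null term)
```

For instance, filterIndexed [isAsciiTerm, isLongEnough, isNonEmpty] ["a", "", "abc"] returns ["a","abc"].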
Is GHC intelligent enough to run multiple operations on lists in 'semi-parallel'?
Consider this (simplified) code:
findElements bigList = do
    let special = head . filter isSpecial $ bigList
    let others = filter isSpecialOrNormal $ bigList
    return (special, others)
(Monad due to original code)
I guess GHC will run the first list operation and keep all elements in memory so that the second operation is able to work on them.
My problem is that I am running into a space leak when dealing with larger files, but I believe it should be possible to run this in constant space. Is there a way to achieve this?
Update 1
Having written it down like this, the solution to this particular problem is of course to change the order of the two lines.
But my question remains: is GHC intelligent enough to figure out this semi-parallel processing when it is not done in a monad?
I don't think GHC is smart enough to merge these two traversals, or, as is usually the case, GHC could be smart enough, but there are cases where you don't want this behavior, so GHC doesn't do it.
Here's how I would do it, using monoids and foldMap.
import Data.Monoid
import Data.Foldable
First, here's how to write special with foldMap, using the First monoid.
specialF :: a -> First a
specialF a = First $ if isSpecial a then Just a else Nothing
special :: [a] -> a
special as = let (First (Just s)) = foldMap specialF as in s
And similar for specialOrNormal, using the list monoid.
specialOrNormalF :: a -> [a]
specialOrNormalF a = if isSpecialOrNormal a then [a] else []
specialOrNormal :: [a] -> [a]
specialOrNormal = foldMap specialOrNormalF
One neat thing about monoids is that a tuple of monoids is also a monoid, which makes merging these folds easy:
findElements :: [a] -> (a, [a])
findElements bigList =
    let (First (Just s), son) =
            foldMap (\a -> (specialF a, specialOrNormalF a)) bigList
    in (s, son)
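For experimenting, here is a self-contained variant of the merged fold. The two predicates are placeholders assumed only for this example, and findBoth is a made-up name. It returns a Maybe rather than pattern matching on Just, so a list without a special element does not crash:

```haskell
import Data.Monoid (First (..))

-- Placeholder predicates, assumed only for this example.
isSpecial, isSpecialOrNormal :: Int -> Bool
isSpecial n         = n > 100
isSpecialOrNormal n = n > 10

-- One foldMap builds both results via the Monoid instance for pairs.
-- Nothing signals that no special element exists.
findBoth :: [Int] -> (Maybe Int, [Int])
findBoth xs = (getFirst firstSpecial, others)
  where
    (firstSpecial, others) = foldMap classify xs
    classify x =
      ( First (if isSpecial x then Just x else Nothing)
      , [x | isSpecialOrNormal x] )
```

With these placeholders, findBoth [5, 20, 200, 7, 300] gives (Just 200,[20,200,300]). Whether this actually runs in constant space depends on laziness and on how the two results are consumed, so it is worth profiling on real input.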
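And if you like point-free code, you can write the whole thing like this (it additionally needs first and &&& from Control.Arrow, mfilter from Control.Monad, and fromJust from Data.Maybe):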
findElements :: [a] -> (a, [a])
findElements =
    first (fromJust . getFirst) .
    foldMap
        ( First . mfilter isSpecial . return
          &&& mfilter isSpecialOrNormal . return
        )