Haskell - Returning the number of a-Z characters used in a string - string

I've been using this page on the Haskell website all day and it's been really helpful for learning list functions: http://www.haskell.org/haskellwiki/How_to_work_on_lists
My task at the moment is to write a single-line statement that returns the number of distinct characters (a-Z) used in a string. I can't seem to find any help on the above page or anywhere else on the internet.
I know how to count the characters in a string using length nameoflist, but I'm not sure how to count the number of distinct a-Z characters that have been used. For example, "starT to" should return 6.
Any help is appreciated, thanks.

An alternative to @Sibi's perfectly fine answer is to use a combination of sort and group from Data.List:
import Data.List (group, sort)
numUnique :: Ord a => [a] -> Int
numUnique = length . group . sort
This imposes the tighter restriction of Ord instead of just Eq, but I believe it might be somewhat faster, since nub is not known for its efficiency. You can also use a very similar function to count the number of occurrences of each unique element in the list:
elemFrequency :: Ord a => [a] -> [(a, Int)]
elemFrequency = map (\s -> (head s, length s)) . group . sort
Or, if you want to use the more elegant Control.Arrow form (with import Control.Arrow ((&&&))):
elemFrequency = map (head &&& length) . group . sort
It can be used as
> elemFrequency "hello world"
[(' ',1),('d',1),('e',1),('h',1),('l',3),('o',2),('r',1),('w',1)]

You can remove the duplicate elements using nub and find the length of the resulting list.
import Data.List (nub)
numL :: Eq a => [a] -> Int
numL xs = length $ nub xs
Demo in ghci:
ghci > numL "starTto"
6
If you don't want to count whitespace in the String, remove it first using filter or any other appropriate function.
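A minimal sketch of that filtering step, assuming that whitespace is the only thing to ignore. The name numLNoSpace is made up for illustration and is not part of the answer above.

```haskell
import Data.Char (isSpace)
import Data.List (nub)

-- Hypothetical helper: count distinct characters, ignoring whitespace.
numLNoSpace :: String -> Int
numLNoSpace = length . nub . filter (not . isSpace)
```

With this definition, numLNoSpace "starT to" gives 6, because 't' and 'T' are counted as different characters.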

There are a few ways to do this, depending on what structure you want to use.
If you want to use Eq structure, you can do this with nub. If the inputs denote a small set of characters, then this is fairly good. However, if there are a lot of distinct alphabetic characters (remember that "Å" and "Ω" are both alphabetic, according to isAlpha), then this technique will have poor performance (quadratic running time).
import Data.Char (isAlpha)
import Data.List (nub)
distinctAlpha :: String -> Int
distinctAlpha = length . nub . filter isAlpha
You can increase performance for larger sets of alphabetic characters by using additional structure. Ord is the first choice, and allows you to use Data.Set, which gives O(N log N) asymptotic performance.
import Data.Char (isAlpha)
import Data.Set (size, fromList)
distinctAlpha :: String -> Int
distinctAlpha = size . fromList . filter isAlpha
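Since isAlpha also accepts letters such as "Å" and "Ω", a stricter reading of "a-Z" may be wanted. The following is only a sketch under the assumption that "a-Z" means the 52 ASCII letters; the names isAsciiLetter and distinctAsciiAlpha are hypothetical.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)
import qualified Data.Set as Set

-- Assumption: "a-Z" means the ASCII letters a-z and A-Z only.
isAsciiLetter :: Char -> Bool
isAsciiLetter c = isAsciiLower c || isAsciiUpper c

-- Count distinct ASCII letters; upper and lower case are counted separately.
distinctAsciiAlpha :: String -> Int
distinctAsciiAlpha = Set.size . Set.fromList . filter isAsciiLetter
```

To treat 't' and 'T' as the same letter, one option is to put map toLower (from Data.Char) in front of the filter.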

First, filter the list in order to remove any non a-Z characters; second, remove duplicate elements; third, calculate its length.
import Data.Char (isAlpha)
import Data.List (nub)
count = length . nub . filter isAlpha

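-- In a source file, the qualified names below still need `import Data.List` and `import Data.Char`.
numberOfCharacters = length . Data.List.nub . filter Data.Char.isAlpha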

Related

How to generate strings drawn from every possible character?

At the moment I'm generating strings like this:
arbStr :: Gen String
arbStr = listOf $ elements (alpha ++ digits)
  where alpha = ['a'..'z']
        digits = ['0'..'9']
But obviously this only generates strings from alphanumeric chars. How can I generate strings from all possible chars?
Char is an instance of both the Enum and Bounded typeclasses, so you can make use of the arbitraryBoundedEnum :: (Bounded a, Enum a) => Gen a function:
import Test.QuickCheck(Gen, arbitraryBoundedEnum, listOf)
arbStr :: Gen String
arbStr = listOf arbitraryBoundedEnum
For example:
Prelude Test.QuickCheck> sample arbStr
""
""
"\821749"
"\433465\930384\375110\256215\894544"
"\431263\866378\313505\1069229\238290\882442"
""
"\126116\518750\861881\340014\42369\89768\1017349\590547\331782\974313\582098"
"\426281"
"\799929\592960\724287\1032975\364929\721969\560296\994687\762805\1070924\537634\492995\1079045\1079821"
"\496024\32639\969438\322614\332989\512797\447233\655608\278184\590725\102710\925060\74864\854859\312624\1087010\12444\251595"
"\682370\1089979\391815"
Or you can make use of arbitrary from the Arbitrary instance for Char:
import Test.QuickCheck(Gen, arbitrary, listOf)
arbStr :: Gen String
arbStr = listOf arbitrary
Note that the arbitrary for Char is implemented such that ASCII characters are (three times) more common than non-ASCII characters, so the "distribution" is different.
Since Char is an instance of Bounded as well as Enum (confirm this by asking GHCI for :i Char), you can simply write
[minBound..maxBound] :: [Char]
to get a list of all legal characters. Obviously this will not lead to efficient random access, though! So you could instead convert the bounds to Int with Data.Char.ord :: Char -> Int, use QuickCheck's feature to select from a range of integers, and then map back to a character with Data.Char.chr :: Int -> Char.
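A sketch of the ord/chr round trip described above, using only base. The names charBounds and charFromIndex are hypothetical, and the index is assumed to come from whatever random source you use (for example, a generator over the same Int range).

```haskell
import Data.Char (chr, ord)

-- Inclusive Int bounds of the Char range.
charBounds :: (Int, Int)
charBounds = (ord minBound, ord maxBound)

-- Hypothetical helper: map an index back to a Char without building the full list.
-- Returns Nothing for indices outside the range, where chr would raise an error.
charFromIndex :: Int -> Maybe Char
charFromIndex i
  | i >= lo && i <= hi = Just (chr i)
  | otherwise          = Nothing
  where
    (lo, hi) = charBounds
```

This avoids walking a list of more than a million elements just to pick one character.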
If we evaluate
λ> length ([minBound..maxBound] :: [Char])
1114112
we get the number of all characters, which is a lot. If you think the list is too big, you can always use something like drop x . take y to limit the range.
Accordingly, if you need n random characters, just shuffle the list with some shuffle :: [a] -> IO [a] and take n from the shuffled list.
Edit:
Well, of course, since shuffling could be expensive, it's best to choose a cleverer strategy. It would be ideal to randomly limit the list of all characters. So:
1. Make a limits = liftM sort . mapM randomRIO $ replicate 2 (0,1114112) :: (Ord a, Random a, Num a) => IO [a]
2. limits >>= \[min,max] -> return . drop min . take max $ ([minBound..maxBound] :: [Char])
3. Finally, take n random Chars with something like liftM (take n) on the result of item 2.

most frequently occurring string (element) in a list?

I have made a function which prints every possible subsequence of a string. Now I need to make a function which prints the most common one. Any ideas on where I can start? I'm not asking for fully coded functions, just a place to start. Also, I can only use Prelude functions (including base).
For example, if I enter "jonjo" my function will return ["jonjo","jonj","jon","jo","j","onjo","onj"...] etc. The most common substring would be "jo".
In the case where there would be two or more most occurring substrings, only the longest would be printed. If still equal, any one of the substrings will suffice.
The problem as it is stated can be reduced to finding the most frequent character, since it is obvious that, for example, the first character in any "most frequent substring" will be AT LEAST as frequent as the substring itself.
I suggest you take a look at
sort :: Ord a => [a] -> [a] (from Data.List in base)
group :: Eq a => [a] -> [[a]] (from Data.List in base)
length :: [a] -> Int (from Prelude and Data.List in base)
and
maximum :: Ord a => [a] -> a (from Prelude and Data.List in base)
If you can really only use Prelude functions, then I suggest you implement these yourself, or design a data structure to make this efficient, such as a trie.
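One way the functions listed above might be combined is sketched below. It swaps maximum for maximumBy with comparing (both in base), the name mostFrequent is hypothetical, and no tie-breaking rule is implemented: when several elements are equally frequent, the result is whichever one maximumBy happens to pick.

```haskell
import Data.List (group, maximumBy, sort)
import Data.Ord (comparing)

-- Most frequent element of a list, or Nothing for an empty list.
-- sort brings equal elements together, group collects them,
-- and maximumBy picks the longest group.
mostFrequent :: Ord a => [a] -> Maybe a
mostFrequent [] = Nothing
mostFrequent xs =
  case maximumBy (comparing length) (group (sort xs)) of
    (x:_) -> Just x
    []    -> Nothing  -- unreachable: group never produces empty groups
```

Because it works on any Ord type, the same function can be applied to a String (most frequent character) or to a list of substrings.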

First attempt at Haskell: Converting lower case letters to upper case

I have recently started learning Haskell, and I've tried creating a function to convert a lower case word to an upper case word. It works, but I don't know how good it is, and I have some questions.
Code:
lowerToUpperImpl element list litereMari litereMici =
  do
    if not (null list) then
      if (head list) == element then
        ['A'..'Z'] !! (length ['A'..'Z'] - length (tail list ) -1)
      else
        lowerToUpperImpl element (tail list) litereMari litereMici
    else
      '0' --never to be reached

lowerToUpper element = lowerToUpperImpl element ['a'..'z'] ['A'..'Z'] ['a'..'z']

lowerToUpperWordImpl word =
  do
    if not (null word) then
      lowerToUpper (head (word)):(lowerToUpperWordImpl (tail word))
    else
      ""
I don't like the way I have passed the upper case and lower case letters. Couldn't I just declare global variables or something?
What would your approach be in filling the dead else branch?
What would your suggestions on improving this be?
Firstly, if/else is generally seen as a crutch in functional programming languages, because conditionals there are expressions that return a value, not branch statements. Also remember that lists in Haskell don't know their own lengths, so computing a length is an O(n) step, and length never terminates on an infinite list.
I would write it more like this (if I didn't import any libraries):
uppercase :: String -> String
uppercase = map (\c -> if c >= 'a' && c <= 'z' then toEnum (fromEnum c - 32) else c)
Let me explain. This code makes use of the Enum and Ord typeclasses that Char satisfies. fromEnum c translates c to its character code (which for these letters is the ASCII code) and toEnum takes a code back to the equivalent character. The function I supply to map simply checks that the character is lowercase and subtracts 32 (the difference between 'a' and 'A') if it is, and leaves it alone otherwise.
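The question also asked about "global variables" and the dead else branch. One possible sketch, which is not part of the answer above and uses hypothetical names, keeps the two alphabets as top-level definitions and returns any character that is not a lowercase ASCII letter unchanged, so no unreachable branch is needed.

```haskell
import Data.Maybe (fromMaybe)

-- Top-level definitions play the role of "global variables" here.
lowerLetters, upperLetters :: String
lowerLetters = ['a'..'z']
upperLetters = ['A'..'Z']

-- Look the character up in a table of (lower, upper) pairs;
-- anything without an entry is returned unchanged.
upperChar :: Char -> Char
upperChar c = fromMaybe c (lookup c (zip lowerLetters upperLetters))

uppercaseViaTable :: String -> String
uppercaseViaTable = map upperChar
```

The table lookup is linear in the alphabet size, which is fine for 26 letters but is only meant to illustrate the structure.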
Of course, you could always just write:
import Data.Char
uppercase :: String -> String
uppercase = map toUpper
Hope this helps!
The things I always recommend to people in your circumstances are these:
1. Break the problem down into smaller pieces, and write separate functions for each piece.
2. Use library functions wherever you can to solve the smaller subproblems.
3. As an exercise after you're done, figure out how to write on your own the library functions you used.
In this case, we can apply the points as follows. First, since String in Haskell is a synonym for [Char] (list of Char), we can break your problem into these two pieces:
1. Turn a character into its uppercase counterpart.
2. Transform a list by applying a function separately to each of its members.
Second point: as Alex's answer points out, the Data.Char standard library module comes with a function toUpper that performs the first task, and the Prelude library comes with map which performs the second. So using those two together solves your problem immediately (and this is exactly the code Alex wrote earlier):
import Data.Char
uppercase :: String -> String
uppercase = map toUpper
I'd say that this is the best solution (shortest and clearest), and as a beginner, this is the first answer you should try.
Applying my third point: after you've come up with the standard solution, it is enormously educational to try and write your own versions of the library functions you used. The point is that this way you learn three things:
1. How to break down problems into easier, smaller pieces, preferably reusable ones;
2. The contents of the standard libraries of the language;
3. How to write the simple "foundation" functions that the library provides.
So in this case, you can try writing your own versions of toUpper and map. I'll provide a skeleton for map:
map :: (a -> b) -> [a] -> [b]
map f [] = ???
map f (x:xs) = ???

How to define String of characters?

I'm new to Haskell and hope someone can help me. I need to define a data structure for a string of characters (an alphabet) which will represent a substitution cipher.
Since this is for representing a substitution cipher, a list of character pairs will do:
type Cypher = [(Char, Char)]
makeCypher :: String -> Cypher
makeCypher s = zip ['a' .. 'z'] s
Here you just pass a string representing each new letter positionally, so "f.." will map a to f. It returns a list of pairs [('a', 'f')...].
Then to use it,
import Data.Maybe
encrypt :: Cypher -> String -> String
encrypt cyph = mapMaybe (flip lookup cyph)
Which just looks up each character in the list of pairs.
Another option is to use Data.Map, which can be used almost identically to the above: wrap the zipped pairs in fromList and use the map's own lookup.
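A rough sketch of the Data.Map variant. The names CypherMap, makeCypherMap and encryptWith are hypothetical, and passing unmapped characters through unchanged is an assumption made here; the mapMaybe-based encrypt above drops them instead.

```haskell
import qualified Data.Map as Map

type CypherMap = Map.Map Char Char

-- Pair each letter of the alphabet with the corresponding letter of the key.
makeCypherMap :: String -> CypherMap
makeCypherMap s = Map.fromList (zip ['a' .. 'z'] s)

-- Assumption: characters without a mapping (spaces, digits, ...) are kept as-is.
encryptWith :: CypherMap -> String -> String
encryptWith cyph = map (\c -> Map.findWithDefault c c cyph)
```

For example, with the reversed alphabet as the key, encryptWith (makeCypherMap (reverse ['a'..'z'])) "abc xyz" gives "zyx cba".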
How about a list of Char?
Prelude> let alphabet = ['a'..'z']
Prelude> alphabet
"abcdefghijklmnopqrstuvwxyz"

Simple word count in haskell

This is my FIRST Haskell program! "wordCount" takes in a list of words and returns a list of tuples, with each case-insensitive word paired with its usage count. Any suggestions for improvement on either code readability or performance?
import List;
import Char;
uniqueCountIn ns xs = map (\x -> length (filter (==x) xs)) ns
nubl (xs) = nub (map (map toLower) xs) -- to lowercase
wordCount ws = zip ns (uniqueCountIn ns ws)
  where ns = nubl ws
Congrats on your first program!
For cleanliness: lose the semicolons. Use the new hierarchical module names instead (Data.List, Data.Char). Add type signatures. As you get more comfortable with function composition, eta contract your function definitions (remove rightmost arguments). e.g.
nubl :: [String] -> [String]
nubl = nub . map (map toLower)
If you want to be really rigorous, use explicit import lists:
import Data.List (nub)
import Data.Char (toLower)
For performance: use a Data.Map to store the associations instead of nub and filter. In particular, see fromListWith and toList. Using those functions you can simplify your implementation and improve performance at the same time.
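A sketch of what the fromListWith and toList suggestion could look like. The name wordCountMap is hypothetical, and this is one possible arrangement rather than the answerer's own code.

```haskell
import Data.Char (toLower)
import qualified Data.Map as Map

-- Case-insensitive word counts: every word contributes a count of 1,
-- and fromListWith (+) adds up the counts of equal keys.
wordCountMap :: [String] -> [(String, Int)]
wordCountMap ws = Map.toList (Map.fromListWith (+) [(map toLower w, 1) | w <- ws])
```

The result comes back sorted by word, since Map.toList returns keys in ascending order.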
One of the ways to improve readability is to try to get used to the standard functions. Hoogle is one of the tools that sets Haskell apart from the rest of the world ;)
import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&))
wordCount :: String -> [(String, Int)]
wordCount = map (head &&& length) . group . sort . words . map toLower
EDIT: Explanation: think of it as a chain of mappings:
(map toLower) :: String -> String lowercases the entire text, for the purpose of case insensitivity
words :: String -> [String] splits a piece of text into words
sort :: Ord a => [a] -> [a] sorts
group :: Eq a => [a] -> [[a]] gathers identical adjacent elements in a list, for example, group [1,1,2,3,3] -> [[1,1],[2],[3,3]]
&&& :: (a -> b) -> (a -> c) -> (a -> (b, c)) applies two functions to the same piece of data, then returns the tuple of results. For example: (head &&& length) ["word","word","word"] -> ("word", 3) (actually &&& is a little more general, but the simplified explanation works for this example)
EDIT: Or actually, look for the "multiset" package on Hackage.
It is always good to ask more experienced developers for feedback. Nevertheless, you could use hlint to get feedback on some small-scale issues. It'll tell you about hierarchical imports, unnecessary parentheses, alternative higher-order functions, etc.
Regarding the function nubl: if you don't yet follow luqui's advice to remove the parameter altogether, I would at least remove the parentheses around xs on the left side of the equation.

Resources