How to find the frequency of characters in a string in Haskell? - haskell

How can I count the frequency of characters in a string and then output them in sort of a table?
For example, if I input the word "happy" the result would be
h 1
a 1
p 2
y 1
If this could be ordered in ASCII order too that would be brilliant.
I know I need to use the count function, any other hints would be appreciated.
EDIT: All the answers are brilliant, only I'm such a beginner at Haskell that I don't actually understand what they are doing.

The simplest solution is to use a Data.Map to store the intermediate mapping from character to frequency. You can then construct the counts easily using fromListWith. Since Data.Map is sorted, you get them in ASCII order for free.
λ> :m + Data.Map
λ> let input = "happy"
λ> toList $ fromListWith (+) [(c, 1) | c <- input]
[('a',1),('h',1),('p',2),('y',1)]
So what's happening here?
The idea is to build a Data.Map (a tree map) using the characters as keys and the frequencies as values.
First, we take the input string and make tuples of each character with a 1 to indicate one occurrence.
λ> [(c, 1) | c <- input]
[('h',1),('a',1),('p',1),('p',1),('y',1)]
Next, we use fromListWith to build a sorted map from these key-value pairs by repeatedly inserting each key-value pair into a map. We also give it a function which will be used when a key was already in the map. In our case, we use (+) so that when a character is seen multiple times, we add the count to the existing sum.
Finally, we convert the map back into a list of key-value tuples using toList.
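Putting the steps above together, here is a minimal sketch that also renders the table format the question asks for. The names charFreq and freqTable are made up for illustration; only fromListWith and toList come from the answer.

```haskell
import qualified Data.Map as Map

-- Count how often each character occurs.
-- Map.toList returns the entries in ascending key order.
charFreq :: String -> [(Char, Int)]
charFreq input = Map.toList (Map.fromListWith (+) [(c, 1) | c <- input])

-- Render one "character count" line per entry (hypothetical helper).
freqTable :: String -> String
freqTable = unlines . map (\(c, n) -> c : ' ' : show n) . charFreq
```

In GHCi, putStr (freqTable "happy") should print one line per distinct character, starting with a 1.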

There's probably something shorter, but this works:
Prelude> import Data.List
Prelude Data.List> map (\x -> (head x, length x)) $ group $ sort "happy"
[('a',1),('h',1),('p',2),('y',1)]

func xs = map (\a -> (head a, length a)) $ group $ sort xs
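Since the question mentions being new to Haskell, it may help to see the intermediate values of the sort/group pipeline. This is only a sketch; the names sorted, grouped and counted are invented for illustration.

```haskell
import Data.List (group, sort)

-- Step 1: sorting brings equal characters next to each other.
sorted :: String
sorted = sort "happy"      -- "ahppy"

-- Step 2: group splits the string into runs of equal characters.
grouped :: [String]
grouped = group sorted     -- ["a","h","pp","y"]

-- Step 3: pair the first character of each run with the run's length.
-- The pattern (c:_) avoids calling head on a list.
counted :: [(Char, Int)]
counted = [(c, length run) | run@(c:_) <- grouped]
```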

Use list comprehension, no need for any imports or sorting.
[ (x,c) | x<-['A'..'z'], let c = (length.filter (==x)) "happy", c>0 ]
Result:
[('a',1),('h',1),('p',2),('y',1)]
The above is a filtered version (keeping only characters with a count > 0) of:
[(x,(length.filter (==x)) "happy" ) | x<-['A'..'z']]
Explanation:
Make a list of all characters that match a given character (A..z).
For each character, count this list (==length)
Put this count in a tuple with the character

I'll sketch a solution step by step. A shorter solution is possible using standard functions.
You want a sorted result, therefore
result = sort cs
where
cs would be a list of tuples, where the first element is the character and the second element is the number of times it appears.
cs = counts "happy"
counts [] = []
counts (c:cs) = (c, length otherc + 1) : counts nonc where
(otherc, nonc) = partition (c==) cs
That's all.
Interestingly, counts works on any list of items that support the == operator.
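Assembled into one file, the sketch above might look like this. The type signature and the local names same and rest are additions for readability; the logic is the answer's.

```haskell
import Data.List (partition, sort)

-- Count each distinct element, in order of first appearance.
counts :: Eq a => [a] -> [(a, Int)]
counts [] = []
counts (c:cs) = (c, length same + 1) : counts rest
  where
    -- same: further occurrences of c; rest: everything else
    (same, rest) = partition (c ==) cs

-- Sorting the pairs orders them by character.
result :: [(Char, Int)]
result = sort (counts "happy")
```

Note that counts needs only Eq, while sorting the result additionally needs Ord.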

import Data.Array (Ix, accumArray, assocs)
eltDist :: (Bounded a, Ix a, Eq b, Num b) => [a] -> [(a, b)]
eltDist str = filter ((/=0) . snd ) $
assocs (accumArray (+) 0 (minBound, maxBound) [(i, 1) | i <- str])
"minBound" and "maxBound" are going to depend on the range of the type inferred for i. For Char it will be 0 - 1,114,111, which is extravagant but not impossible. It would be especially convenient if you were counting Unicode chars. If you are only interested in ASCII strings, then (0, 255) would do. A nice thing about arrays is that they can be indexed by any type that can be mapped to an integer. See Ix.
assocs pulls the indices and counts out of the array into a list of pairs and filter disposes of the unused ones.
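If only ASCII input matters, the bounds can be narrowed. The following is a sketch under that assumption: asciiDist is a made-up name, the bounds are given as Char values because the array is indexed by Char, and ASCII proper ends at code 127. Characters outside the bounds are dropped first, because accumArray raises an error for out-of-range indices.

```haskell
import Data.Array (accumArray, assocs)

-- Frequency of ASCII characters only; anything above '\127' is ignored.
asciiDist :: String -> [(Char, Int)]
asciiDist str =
  filter ((/= 0) . snd) $
    assocs (accumArray (+) 0 ('\0', '\127') [(c, 1) | c <- str, c <= '\127'])
```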

//Count the frequency of character in a string
package waytocreateobject;
import java.util.Scanner;
public class Freqchara {
public static void main(String[] args) {
int c = 0, x = 0, loop = 26, space = 0;
int count[] = new int[loop];
//Arrays.fill(count, x);
Scanner sc = new Scanner(System.in);
String str =sc.nextLine();
char[] charr = str.toCharArray();
int aa = str.length();
//System.out.println(str.charAt(10));
//System.out.println(str.charAt(11));
for (int mm = 0; mm < aa; mm++) {
if (str.charAt(c) != '\0') { //Considering characters from 'a' to 'z' only and ignoring others.
if ((str.charAt(c) >= 'a') && (str.charAt(c) <= 'z')) {
x = (int) str.charAt(c) - (int) 'a';
// System.out.println(x);
count[x] = count[x] + 1;
}
c++;
} else {}
}
// printing all the characters
int i = 97;
for (int j = 0; j < loop; j++) {
char ch = (char) (i + j);
System.out.println(ch + " occurs " + count[j] + " times in the string");
}
System.out.println(" occurs " + space);
}
}

Related

How can Haskell code utilise brackets to separate or tidy up information while using list comprehensions?

I am trying to write a function where the expression:
crosswordFind letter inPosition len words
should return all the items from words which
(i) are of the given length len and
(ii) have letter in the position inPosition.
For example, for seven-letter words that have 'k' in position 1, the expression:
crosswordFind 'k' 1 7 ["funky", "fabulous", "kite", "icky", "ukelele"]
will return
["ukelele"]
Here is what I have so far:
crosswordFind :: Char -> Int -> Int -> [String] -> [String]
crosswordFind letter pos len words =
if isAlpha **words** == letter &&
**words** !! letter == pos &&
length **pos** == len
then words
else []
The code above is after altering to remove the brackets that I placed to separate each condition. The code below is the original one (which is wrong):
crosswordFind :: Char -> Int -> Int -> [String] -> [String]
crosswordFind letter pos len words =
[ if [isAlpha x == letter] &&
[xs !! n == pos] &&
[length x == len]
then words
else [] ]
I understand why it is wrong (because a list of length 1 will be returned), but why can't brackets like these be used to section off code in Haskell?
How can this question be solved using list comprehensions? And I'm wondering what to put in to replace the bolded words as well to make the code run normally.
You can filter with a condition that should satisfy two criteria:
the word has the given length; and
the character on position pos is the given letter.
For a word w of the words we thus check if length w == len and w !! pos == letter.
We thus can implement this with:
crosswordFind :: Eq a => a -> Int -> Int -> [[a]] -> [[a]]
crosswordFind letter pos len words = filter (\w -> length w == len && w !! pos == letter) words
we can also omit the words variable and work with:
crosswordFind :: Eq a => a -> Int -> Int -> [[a]] -> [[a]]
crosswordFind letter pos len = filter (\w -> length w == len && w !! pos == letter)
The above is not very safe: if the pos is greater than or equal to the length, then w !! pos == letter will raise an error. Furthermore for infinite strings (lists of Chars), length will loop forever. I leave it as an exercise to introduce safer variants. You can determine these with recursive functions.
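One possible safer variant, written as a list comprehension since the question asks for one. This is a sketch, not the answer author's solution: it avoids !! by using drop and take, and it bounds the length check with take so that it also terminates on infinite strings. The argument is named ws here to avoid shadowing the Prelude function words.

```haskell
crosswordFind :: Char -> Int -> Int -> [String] -> [String]
crosswordFind letter pos len ws =
  [ w
  | w <- ws
  , length (take (len + 1) w) == len   -- inspects at most len + 1 characters
  , pos >= 0                            -- drop treats negative counts as 0
  , [c] <- [take 1 (drop pos w)]        -- no match if pos is out of range
  , c == letter
  ]
```

The parentheses here only group expressions, while the square brackets always build or match lists, which is why bracketing each condition in the question's attempt does not type-check.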
Square brackets are part of list syntax. Nothing else.
Lists.
You can freely utilize round parentheses ( ) for the grouping of expressions.
Some expression types, like let and do, have their own separators {, ;, }, which can also be used, especially to prevent whitespace brittleness.

Retrieve value from tuple by giving number or letter in Haskell

I am very new to Haskell and am trying to retrieve 'a' if I give 0, 'b' if I give 1 and so on...
This is my code so far:
alpha = ['a'..'z']
numb = [0..25]
zippedChars = zip alpha numb
and this gives the list of pairs [('a',0),('b',1),('c',2), and so on up to ('z',25)].
I want to make something like: getCharfromNumb, and if I type getCharfromNumb 0 I should receive 'a'.
And getNumbfromChar 'a' should give me 0.
import Data.List
import Data.Tuple
getCharFromNum :: Int -> Maybe Char
getCharFromNum n = lookup n $ swap <$> zippedChars
getNumFromChar :: Char -> Maybe Int
getNumFromChar c = lookup c zippedChars
See Data.List.lookup, Data.Tuple.swap, and Data.Functor.<$>
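As a self-contained sketch of the same idea, with the list defined inline: because lookup returns a Maybe, a missing key produces Nothing rather than an error. The explicit type on zippedChars is an addition.

```haskell
import Data.Tuple (swap)

-- Pairs each letter with its index: ('a',0), ('b',1), ...
zippedChars :: [(Char, Int)]
zippedChars = zip ['a' .. 'z'] [0 ..]

-- Look up by number: swap each pair so the Int becomes the key.
getCharFromNum :: Int -> Maybe Char
getCharFromNum n = lookup n (map swap zippedChars)

-- Look up by letter: the Char is already the key.
getNumFromChar :: Char -> Maybe Int
getNumFromChar c = lookup c zippedChars
```

Here getCharFromNum 0 gives Just 'a', and getCharFromNum 26 gives Nothing.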

OCaml equivalent to Haskell's @ in pattern matching (a.k.a. as-pattern)

In Haskell, while pattern matching, I can use @ to get the entire structure in a pattern. (For easier Googling, this structure is known as an as-pattern.)
For example, x:xs decomposes a list into a head and a tail; I can also get the entire list with xxs@(x:xs).
Does OCaml have something like this?
You can use as:
let f = function
| [] -> (* ... *)
| (x::xs) as l ->
(*
here:
- x is the head
- xs is the tail
- l is the whole list
*)
Let me extend Etienne's answer a little bit with some examples:
let x :: xs as list = [1;2;3];;
val x : int = 1
val xs : int list = [2; 3]
val list : int list = [1; 2; 3]
When you write <pattern> as <name>, the variable <name> is bound to the whole pattern on the left; in other words, the scope of as extends as far to the left as possible (speaking more technically, as has lower priority than constructors, i.e., the constructors bind tighter). So, in the case of deep pattern matching, you might need to use parentheses to limit the scope, e.g.,
let [x;y] as fst :: ([z] as snd) :: xs as list = [[1;2];[3]; [4]];;
val x : int = 1
val y : int = 2
val fst : int list = [1; 2]
val z : int = 3
val snd : int list = [3]
val xs : int list list = [[4]]
val list : int list list = [[1; 2]; [3]; [4]]
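For comparison, here is the Haskell as-pattern that the question starts from, as a small sketch. The function name describe is invented for illustration.

```haskell
-- Returns the head, the tail, and the whole list.
-- whole@(x:xs) binds whole to the entire matched list.
describe :: [Int] -> Maybe (Int, [Int], [Int])
describe [] = Nothing
describe whole@(x:xs) = Just (x, xs, whole)
```

In Haskell the name comes before the pattern (whole@(x:xs)), whereas in OCaml it follows it ((x::xs) as l).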

Haskell: how to read values from stdin line-by-line and add them to a map?

I want to read strings from stdin and store them into a map, where key is the input string and value is the number of previous occurrences of this string. In Java I would have done something like this:
for (int i = 0; i < numberOfLines; i++) {
input = scanner.nextLine();
if (!map.containsKey(input)) {
map.put(input, 0);
System.out.println(input);
} else {
int num = map.get(input) + 1;
map.remove(input);
map.put(input, num);
System.out.println(input.concat(String.valueOf(num)));
}
}
I've tried doing the same in Haskell by using forM_ but had no luck.
import Control.Monad
import qualified Data.Map as Map
import Data.Maybe
main = do
input <- getLine
let n = read input :: Int
let dataset = Map.empty
forM_ [1..n] (\i -> do
input <- getLine
let a = Map.lookup input dataset
let dataset' =
if isNothing a then
Map.insert input 0 dataset
else
Map.insert input num (Map.delete input dataset)
where num = ((read (fromJust a) :: Int) + 1)
let dataset = dataset'
let output = if isNothing a then
input
else
input ++ fromJust a
putStrLn output)
The contents of dataset in the above code does not change at all.
The Map defined in Data.Map is an immutable data type. Calling Map.insert returns a modified Map, it does not change the one you already have. What you want to do is iteratively apply updates in a loop. Something more like
import qualified Data.Map as M
import Data.Map (Map)
-- Adds one to an existing value, or sets it to 0 if it isn't present
updateMap :: Map String Int -> String -> Map String Int
updateMap dataset str = M.insertWith updater str 0 dataset
where
updater _ 0 = 1
updater _ old = old + 1
-- Loops n times, returning the final data set when n == 0
loop :: Int -> Map String Int -> IO (Map String Int)
loop 0 dataset = return dataset
loop n dataset = do
str <- getLine
let newSet = updateMap dataset str
loop (n - 1) newSet -- recursively pass in the new map
main :: IO ()
main = do
n <- fmap read getLine :: IO Int -- Combine operations into one
dataset <- loop n M.empty -- Start with an empty map
print dataset
Notice how this is actually less code (it'd be even shorter if you just counted the number of occurrences, then updateMap dataset str = M.insertWith (+) str 1 dataset), and it separates the pure code from the impure.
In this case, you don't actually want to use forM_, because each step of the computation depends on the previous. It's preferred to write a recursive function that exits at a condition. If you so desired, you could also write loop as
loop :: Int -> IO (Map String Int)
loop n = go n M.empty
where
go 0 dataset = return dataset
go n dataset = getLine >>= go (n - 1) . updateMap dataset
Here I've compressed the body of the old loop into a single line and then put it inside go, this allows you to call it as
main :: IO ()
main = do
n <- fmap read getLine :: IO Int
dataset <- loop n
print dataset
This removes the need to know that you must pass in M.empty into loop for the first call, unless you have a use case to call loop multiple times on the same map.
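Explicit recursion is not the only way to thread the map through the loop. As an alternative sketch (not from the answer), Control.Monad's foldM passes the accumulated map from one step to the next. The name countLines is hypothetical, and this version simply counts occurrences with insertWith (+).

```haskell
import Control.Monad (foldM)
import qualified Data.Map as M

-- Read n lines from stdin, counting how often each line occurs.
countLines :: Int -> IO (M.Map String Int)
countLines n = foldM step M.empty [1 .. n]
  where
    -- The second argument is only a loop counter and is ignored.
    step :: M.Map String Int -> Int -> IO (M.Map String Int)
    step acc _ = do
      str <- getLine
      return (M.insertWith (+) str 1 acc)
```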
Your problem is that Map.insert does not behave like map.put in Java. Map.insert returns a new Map which has the element in it, but you are simply throwing this new Map away. This is how nearly all Haskell data structures work; for instance, the code:
main = do
let x = []
y = 5 : x
print x
prints the empty list []. The cons : operator does not destructively modify the empty list but returns a new list containing 5. Map.insert does the same but with Maps instead of lists.
First, regarding your Java code: you do not need to remove from the map before inserting a new value.
Regarding Haskell, the language does not work the way you think it does: your let trick is not updating a value; everything is basically immutable in Haskell.
Using only the basic getLine, one way to do it is to use recursion:
import qualified Data.Map as Map
type Dict = Map.Map String Int
makeDict ::Dict -> Int -> IO Dict
makeDict d remain = if remain == 0 then return d else do
l <- getLine
let newd = Map.insertWith (+) l 1 d
makeDict newd (remain - 1)
newDict count = makeDict Map.empty count
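None of the answers reproduces the per-line output of the Java loop in the question (the line itself on first sight, otherwise the line followed by a running count). One way to do that purely is sketched below using mapAccumL from Data.List. The name labelLines is made up, and the map stores the number of times each line has been seen so far.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as M

-- For each line: emit it unchanged the first time it is seen,
-- otherwise emit it followed by the number of earlier occurrences.
labelLines :: [String] -> [String]
labelLines = snd . mapAccumL step M.empty
  where
    step :: M.Map String Int -> String -> (M.Map String Int, String)
    step seen str =
      case M.lookup str seen of
        Nothing -> (M.insert str 1 seen, str)
        Just k  -> (M.insert str (k + 1) seen, str ++ show k)
```

Because labelLines is pure, it can be tried in GHCi on a plain list of strings before being connected to stdin.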

What can be improved on my first haskell program?

Here is my first Haskell program. What parts would you write in a better way?
-- Multiplication table
-- Returns n*n multiplication table in base b
import Text.Printf
import Data.List
import Data.Char
-- Returns n*n multiplication table in base b
mulTable :: Int -> Int -> String
mulTable n b = foldl (++) (verticalHeader n b w) (map (line n b w) [0..n])
where
lo = 2* (logBase (fromIntegral b) (fromIntegral n))
w = 1+fromInteger (floor lo)
verticalHeader :: Int -> Int -> Int -> String
verticalHeader n b w = (foldl (++) tableHeader columnHeaders)
++ "\n"
++ minusSignLine
++ "\n"
where
tableHeader = replicate (w+2) ' '
columnHeaders = map (horizontalHeader b w) [0..n]
minusSignLine = concat ( replicate ((w+1)* (n+2)) "-" )
horizontalHeader :: Int -> Int -> Int -> String
horizontalHeader b w i = format i b w
line :: Int -> Int -> Int -> Int -> String
line n b w y = (foldl (++) ((format y b w) ++ "|" )
(map (element b w y) [0..n])) ++ "\n"
element :: Int -> Int -> Int -> Int -> String
element b w y x = format (y * x) b w
toBase :: Int -> Int -> [Int]
toBase b v = toBase' [] v where
toBase' a 0 = a
toBase' a v = toBase' (r:a) q where (q,r) = v `divMod` b
toAlphaDigits :: [Int] -> String
toAlphaDigits = map convert where
convert n | n < 10 = chr (n + ord '0')
| otherwise = chr (n + ord 'a' - 10)
format :: Int -> Int -> Int -> String
format v b w = concat spaces ++ digits ++ " "
where
digits = if v == 0
then "0"
else toAlphaDigits ( toBase b v )
l = length digits
spaceCount = if (l > w) then 0 else (w-l)
spaces = replicate spaceCount " "
Here are some suggestions:
To make the tabularity of the computation more obvious, I would pass the list [0..n] to the line function rather than passing n.
I would further split out the computation of the horizontal and vertical axes so that they are passed as arguments to mulTable rather than computed there.
Haskell is higher-order, and almost none of the computation has to do with multiplication. So I would change the name of mulTable to binopTable and pass the actual multiplication in as a parameter.
Finally, the formatting of individual numbers is repetitious. Why not pass \x -> format x b w as a parameter, eliminating the need for b and w?
The net effect of the changes I am suggesting is that you build a general higher-order function for creating tables for binary operators. Its type becomes something like
binopTable :: (i -> String) -> (i -> i -> i) -> [i] -> [i] -> String
and you wind up with a much more reusable function—for example, Boolean truth tables should be a piece of cake.
Higher-order and reusable is the Haskell Way.
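A minimal sketch of what such a function could look like, using the type given above. The body is an assumption, not the answer author's code: it expects the formatting function to produce cells of equal width and derives the blank corner cell from the first row label.

```haskell
-- Build a table for any binary operator.
-- fmt formats one cell; xs label the columns, ys label the rows.
binopTable :: (i -> String) -> (i -> i -> i) -> [i] -> [i] -> String
binopTable fmt op xs ys = unlines (header : map row ys)
  where
    -- Blank cell as wide as the first row label (empty if there are no rows).
    corner = map (const ' ') (concatMap fmt (take 1 ys))
    header = unwords (corner : map fmt xs)
    row y  = unwords (fmt y : map (fmt . op y) xs)
```

With this shape, putStr (binopTable show (&&) [False, True] [False, True]) would print a truth table, although show does not pad False and True to the same width.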
You don't use anything from import Text.Printf.
Stylistically, you use more parentheses than necessary. Haskellers tend to find code more readable when it's cleaned of extraneous stuff like that. Instead of something like h x = f (g x), write h = f . g.
Nothing here really requires Int; (Integral a) => a ought to do.
foldl (++) x xs == concat $ x : xs: I trust the built-in concat to work better than your implementation.
Also, you should prefer foldr when the function is lazy in its second argument, as (++) is – because Haskell is lazy, this reduces stack space (and also works on infinite lists).
Also, unwords and unlines are shortcuts for intercalate " " and concat . map (++ "\n") respectively, i.e. "join with spaces" and "join with newlines (plus trailing newline)"; you can replace a couple things by those.
Unless you use big numbers, w = length $ takeWhile (<= n) $ iterate (* b) 1 is probably faster. Or, in the case of a lazy programmer, let w = length $ toBase b n.
concat (replicate ((w+1) * (n+2)) "-") == replicate ((w+1) * (n+2)) '-' – not sure how you missed this one, you got it right just a couple lines up.
You do the same thing with concat spaces, too. However, wouldn't it be easier to actually use the Text.Printf import and write printf "%*s " w digits?
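The equivalences listed above can be checked on small inputs. This is an illustrative sketch; all names and sample values are made up.

```haskell
-- foldl (++) x xs and concat (x : xs) build the same string.
joinedFold, joinedConcat :: String
joinedFold   = foldl (++) "ab" ["cd", "ef"]
joinedConcat = concat ("ab" : ["cd", "ef"])

-- unlines appends a newline to every element, including the last.
rows :: String
rows = unlines ["1 2", "2 4"]

-- replicate on a Char yields the String directly; no concat is needed.
dashes :: String
dashes = replicate 5 '-'

-- Digit count of n in base b, by counting the powers of b that are <= n.
-- Assumes b >= 2 and n >= 1; a smaller base would never exceed n.
width :: Int -> Int -> Int
width b n = length (takeWhile (<= n) (iterate (* b) 1))
```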
Norman Ramsey gave excellent high-level (design) suggestions; below are some low-level ones:
First, consult with HLint. HLint is a friendly program that gives you rudimentary advice on how to improve your Haskell code!
In your case HLint gives 7 suggestions (mostly about redundant brackets).
Modify your code according to HLint's suggestions until it likes what you feed it.
More HLint-like stuff:
concat (replicate i "-"). Why not replicate i '-'?
Consult with Hoogle whenever there is reason to believe that a function you need is already available in Haskell's libraries. Haskell comes with tons of useful functions so Hoogle should come in handy quite often.
Need to concatenate strings? Search for [String] -> String, and voila you found concat. Now go replace all those folds.
The previous search also suggested unlines. Actually, this even better suits your needs. It's magic!
Optional: pause and thank Neil M in your heart for making Hoogle and HLint, and thank others for making other good stuff like Haskell, bridges, tennis balls, and sanitation.
Now, for every function that takes several arguments of the same type, make it clear which means what, by giving them descriptive names. This is better than comments, but you can still use both.
So
-- Returns n*n multiplication table in base b
mulTable :: Int -> Int -> String
mulTable n b =
becomes
mulTable :: Int -> Int -> String
mulTable size base =
To soften the extra characters blow of the previous suggestion: When a function is only used once, and is not very useful by itself, put it inside its caller's scope in its where clause, where it could use the callers' variables, saving you the need to pass everything to it.
So
line :: Int -> Int -> Int -> Int -> String
line n b w y =
concat
$ format y b w
: "|"
: map (element b w y) [0 .. n]
element :: Int -> Int -> Int -> Int -> String
element b w y x = format (y * x) b w
becomes
line :: Int -> Int -> Int -> Int -> String
line n b w y =
concat
$ format y b w
: "|"
: map element [0 .. n]
where
element x = format (y * x) b w
You can even move line into mulTable's where clause; imho, you should.
If you find a where clause nested inside another where clause troubling, then I suggest changing your indentation habits. My recommendation is to use consistent indentation of always 2 or always 4 spaces. Then you can easily see, everywhere, where the where in the other where is at.
Below's what it looks like (with a few other changes in style):
import Data.List
import Data.Char
mulTable :: Int -> Int -> String
mulTable size base =
unlines $
[ vertHeaders
, minusSignsLine
] ++ map line [0 .. size]
where
vertHeaders =
concat
$ replicate (cellWidth + 2) ' '
: map horizontalHeader [0 .. size]
horizontalHeader i = format i base cellWidth
minusSignsLine = replicate ((cellWidth + 1) * (size + 2)) '-'
cellWidth = length $ toBase base (size * size)
line y =
concat
$ format y base cellWidth
: "|"
: map element [0 .. size]
where
element x = format (y * x) base cellWidth
toBase :: Integral i => i -> i -> [i]
toBase base
= reverse
. map (`mod` base)
. takeWhile (> 0)
. iterate (`div` base)
toAlphaDigit :: Int -> Char
toAlphaDigit n
| n < 10 = chr (n + ord '0')
| otherwise = chr (n + ord 'a' - 10)
format :: Int -> Int -> Int -> String
format v b w =
spaces ++ digits ++ " "
where
digits
| v == 0 = "0"
| otherwise = map toAlphaDigit (toBase b v)
spaces = replicate (w - length digits) ' '
0) add a main function :-) at least rudimentary
import System.Environment (getArgs)
import Control.Monad (liftM)
main :: IO ()
main = do
args <- liftM (map read) $ getArgs
case args of
(n:b:_) -> putStrLn $ mulTable n b
_ -> putStrLn "usage: nntable n base"
1) run ghc or runhaskell with -Wall; run through hlint.
While hlint doesn't suggest anything special here (only some redundant brackets), ghc will tell you that you don't actually need Text.Printf here...
2) try running it with base = 1 or base = 0 or base = -1
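With a base below 2, repeated division by the base either never reaches zero or divides by zero, so the digit conversion is where such inputs need to be rejected. One possible guard is sketched below, assuming a Maybe result is acceptable; toBaseSafe is a hypothetical name, and its body reuses the iterate/takeWhile approach shown in an earlier answer.

```haskell
-- Digits of v in the given base, most significant first.
-- Returns Nothing for bases below 2 and for negative values.
toBaseSafe :: Int -> Int -> Maybe [Int]
toBaseSafe base v
  | base < 2 || v < 0 = Nothing
  | otherwise =
      Just (reverse (map (`mod` base) (takeWhile (> 0) (iterate (`div` base) v))))
```

Note that zero yields an empty digit list here, so the caller still has to print "0" for it, as format does.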
If you want multiline comments use:
{- A multiline
comment -}
Also, prefer foldl' (from Data.List) over foldl when folding large lists into a single strict value such as a number. It is more memory efficient.
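A small sketch of the strict fold, using a sum as a stand-in example (sumStrict is a made-up name). foldl' forces the accumulator at each step instead of building a chain of unevaluated additions.

```haskell
import Data.List (foldl')

-- Sum a list with a strict left fold.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

For building strings with (++), as in the program under review, concat or foldr remains the better fit, as other answers point out.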
A brief comment saying what each function does, its arguments and return value, is always good. I had to read the code pretty carefully to fully make sense of it.
Some would say if you do that, explicit type signatures may not be required. That's an aesthetic question, I don't have a strong opinion on it.
One minor caveat: if you do remove the type signatures, you'll automatically get the polymorphic Integral support ephemient mentioned, but you will still need one around toAlphaDigits because of the infamous "monomorphism restriction."

Resources