Comparing and counting string correlations with Haskell

Comparing and counting string correlations with Haskell - haskell

I'm working on a pretty complicated (complicated for me, at least) function that I'd like to use to count the number of times a word in a list corresponds with a word in a database.
An example using random words:
let input = [("InputName", ["dog", "cat", "cat"...]), ...]
let database = ["dog", "cat", "badger"...]
After several hours of mental gymnastics, I came up with this hideous function that almost works. I've simplified it so it'll make sense in the context of this example:
findMatches input database = [ (snd x, wordCount (snd x)) | x <- input ]
where
wordCount ys = sum[ if y `elem` database then 1 else 0 | y <- ys ]
My goal, my hope, my wish would be to have an output that reads:
[("dog", 1), ("cat", 2), ("badger", 0)]
Any suggestions or nudges in the right direction would be appreciated.
EDIT
I finally made a function that works. catWordCount counts the number of times a database entry appears in an input. I'm working on a better implementation using fold.
let input = words "5 4 10 0 1 9 1"
let database = [("C1", words "1 2 3 4 5"), ("C2", words "6 7 8 9 10")]
catwordCount input database
catWordCount fs zs = [ (fst f, inputSearch (snd f)) | f <- fs ]
where
inputSearch gs = [ (g, wordCount [g]) | g <- gs ]
wordCount hs = sum[ if h == z then 1 else 0 | h <- hs, z <- zs ]
And the output:
(["C1", [("1",2),("2",0),("3",0),("4",1),("5",1)])
(["C2", [("6",0),("7",0),("8",0),("9",1),("10",1)])

You can keep a Map of counts that you update for each item. Since you don't want to include items from the input list that are not in the database, if I understood correctly,
alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a
is a good way to do that. The supplied key k is looked up, and if it's present, the argument to the update function will we Just value, otherwise it will be Nothing. If the result of the update function is Nothing, the key will be deleted from the Map (or not added, if it wasn't present), if the result is Just something, the key will be associated with something in the altered Map.
So you start with a Map mapping every item to 0,
m0 :: Map String Int
m0 = fromList $ zip database (repeat 0)
to update, you want to increment the count if the item is in database, and not change anything otherwise,
incr :: Maybe Int -> Maybe Int
incr (Just n) = Just (n+1)
incr Nothing = Nothing
or, shorter, incr = fmap (+1) using the Functor instance of Maybe.
Then the resulting map is simply
finalMap :: Map String Int
finalMap = foldl (flip $ alter incr) m0 $ snd input
and if you want a list rather than a Map, just call assocs or toList on finalMap.

It might not be exactly what you are looking for, but try this:
import Data.List
countMatches :: [(String, [String])] -> [(String, Int)]
countMatches = map (\l -> (head l, length l)) . group . sort . concat . map snd
Hopefully the function compositions are not too confusing. I'll go over it step by step. Say you run this function with input
[("", ["a", "b"]), ("", ["b", "c", "x", "a"]), ("", ["y", "b", "z"])]
After map snd it is
[["a", "b"], ["b", "c", "x", "a"], ["y", "b", "z"]]
After concat,
["a", "b", "b", "c", "x", "a", "y", "b", "z"]
After sort,
["a", "a", "b", "b", "b", "c", "x", "y", "z"]
After group,
[["a", "a"], ["b", "b", "b"], ["c"], ["x"], ["y"], ["z"]]
And finally map (\l -> (head l, length l)) produces
[("a", 2), ("b", 3), ("c", 1), ("x", 1), ("y", 1), ("z", 1)]

Related

Group items with same key recursively

Given two list of the form:
l1 = [k1, k2, k3]
l2 = [(k1, x1), (k2, x2), (k1, x3), (k2, x4), (k5, x5), ..)
I would like to group the x'es in a list with their respective key such that:
l = [(k1, [x1, x3]), (k2, [x2, x4]), (k5, [x5]), ..]
I have already googled the question and all the solutions use library code, I was however wondering how you implement this recursively. What I have so far is:
groupHelper [] l2 = []
groupHelper (k, value):tail1 (k1, values):tail2 | k == k1 = groupHelper tail1 (k1,(value:values)):tail2
| k /= k1 = (k1, values):groupHelper (k,value):tail1 tail2
groupHelper (k, value):tail1 [] = (k, value):groupHelper tail1 (k,value)
As you will notice my code makes absolutely no sense. I have been stuck on this problem for some time now, so I was wondering if adding a third accumulator parameter in case I've enumerated through l1 I can go back, would help.
I'm really confused right now, if anyone could point me in the right direction that would be appreciated.

Start with a simpler function: update :: (a, b) -> [(a, [b])] -> [(a, [b])], which takes a single pair and updates the in-progress grouping.
update (k, v) [] = ...
update (k, v) ((k1, vs):rest) = ...
Once you do that, it' s simple matter of folding your list of pairs using update.
groupby :: [(a, b)] -> [(a, [b])]
groupby = foldr update []
If you don't want to use foldr, it would be a good exercise to implement it from scratch.
foldr_ :: (a -> b -> b) -> b -> [a] -> b
foldr_ f z [] = ...
foldr_ f z (x:xs) = ...
There's little to be gained by ignoring the abstraction provided by foldr and trying to implement groupby entirely from first principles.

Not necessarily particularly elegant, but we can pattern match on the list, and then iterate over the tail to find all values with the same key, then cons the result onto the result of running the same function recursively on the remaining elements.
groupByFirst [] = []
groupByFirst ((k, v):tl) = (k, v:map snd s) : groupByFirst d
where
(s, d) = partition ((k ==) . fst) tl
ghci> l2 = [("k1", "x1"), ("k2", "x2"), ("k1", "x3"), ("k2", "x4"), ("k5", "x5")]
ghci> groupByFirst l2
[("k1",["x1","x3"]),("k2",["x2","x4"]),("k5",["x5"])]
The complexity of this is less than ideal, but it should be simple enough to understand. Effectively, evaluating this would look like:
groupByFirst [("k1", "x1"), ("k2", "x2"), ("k1", "x3"), ("k2", "x4"), ("k5", "x5")]
("k1", ["x1", "x3"]) : groupByFirst [("k2", "x2"), ("k2", "x4"), ("k5", "x5")]
("k1", ["x1", "x3"]) : ("k2", ["x2", "x4"]) : groupByFirst [("k5", "x5")]
("k1", ["x1", "x3"]) : ("k2", ["x2", "x4"]) : ("k5", ["x5"]) : groupByFirst []
("k1", ["x1", "x3"]) : ("k2", ["x2", "x4"]) : ("k5", ["x5"]) : []
[("k1", ["x1", "x3"]), ("k2", ["x2", "x4"]), ("k5", ["x5"])]

Haskell: How to generate all possible combinations of elements, each element from a list, with arbitrary number of lists [duplicate]

This question already has an answer here:
Haskell function :: [Name] -> [[(Name, Bool)]]
(1 answer)
Closed 4 years ago.
Having a list of ["P", "Q", "R" ...] I want to generate all possible list of [(String, Bool)] where on the left is a letter from the first array, and on the right is True or False. For example having ["P", "Q"] I want to obtain
: [[("P",True),("Q",True)],[("P",True),("Q",False)],[("P",False),("Q",True)],[("P",False),("Q",False)]]
I made it for the case where I only have ["P", "Q"] but I need to suport arbitrary number of letters. I tought I can generate for every letter L two pairs in an array like [(L,True),(L,False)] and do that for every letter and make all possible combinations of those arrays with one element from each array, but I don't know how to do it properly.
That's what I did for the list of length 2 of letters
envs :: [String] -> [[(String, Bool)]]
envs predicate = let
env = [(p,b) | p <- predicate, b <- [True, False]]
ps = filter (\(pred,val) -> pred == "P") env
qs = filter (\(pred,val) -> pred == "Q") env
in [[a,b] | a <- ps, b <- qs]

Introduce this function
cartProdn :: [a] -> Int -> [[a]]
cartProdn ls 2 = [[x, y] | x <- ls, y <- ls]
cartProdn ls n = [x : t | x <- ls, t <- cartProdn ls (n - 1)]
This gives all possible combinations of length n of a finite list (n > 1).
Then do
*Main> ls = ["P", "Q", "R"]
*Main> rs = [zip ls c | c <- cartProdn [True, False] (length ls)]
*Main> putStrLn $ unlines $ map show rs
[("P",True),("Q",True),("R",True)]
[("P",True),("Q",True),("R",False)]
[("P",True),("Q",False),("R",True)]
[("P",True),("Q",False),("R",False)]
[("P",False),("Q",True),("R",True)]
[("P",False),("Q",True),("R",False)]
[("P",False),("Q",False),("R",True)]
[("P",False),("Q",False),("R",False)]
note: you might want to write ls = "PQR".

LZW routine in Haskell using Monads

I'm trying to implement LZW compression in Haskell using Monads, here is my code so far with test cases:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State
import Control.Monad.Writer
import Data.Char (chr, ord)
import Data.List (isPrefixOf, maximumBy)
import Data.Function
import Test.QuickCheck
type Dictionary = [String]
dictionary :: Dictionary
dictionary = [[chr x] | x <- [0..127]]
test_dictionary =
[ map ord (concat dictionary) == [0..127]
, all (\str -> length str == 1) dictionary
]
prefixes :: String -> Dictionary -> [(Int, String)]
prefixes str dict = [(x, dict!!x) | x <- [0..length dict - 1], isPrefixOf (dict!!x) str]
test_prefixes =
[ prefixes "" dictionary == []
, prefixes "appletree" [] == []
, prefixes "appletree" ["ap", "apple", "tree", "pear"] == [(0, "ap"), (1, "apple")]
, prefixes "babe" dictionary == [(98, "b")]
]
longest :: [(Int, String)] -> (Int, String)
longest prefs = maximumBy (compare `on` (\(x,y) -> length y)) prefs
test_longest =
[ longest [(30, "a"), (20, "abc"), (15, "ab")] == (20, "abc")
, longest [(30, "a"), (20, "abc"), (15, "abc")] == (15, "abc")
]
instance MonadState Dictionary ((->) Dictionary) where
get = \s -> s
munch :: MonadState Dictionary m => String -> m (Int, String, String)
munch str = do
dict <- get
let longst = longest (prefixes str dict)
return (fst longst, snd longst, [str!!x | x <- [length (snd longst)..length str - 1]])
test_munch =
[ evalState (munch "a") ["a"] == (0, "a", "")
, evalState (munch "appletree") ["a"] == (0, "a", "ppletree")
, evalState (munch "peach") ["a", "ba", "b"] == (1, "ba", "be")
]
instance MonadState m (StateT Dictionary ((->) m)) where
append :: MonadState Dictionary m => String -> String -> m ()
append s "" = return ()
append s w = do
dict <- get
let newWord = s ++ (take 1 w)
if (notElem newWord dict)
then do
put (dict++[newWord])
else return ()
test_append =
[ execState (append "a" "") [] == []
, execState (append "a" "") dictionary == dictionary
, execState (append "a" "bc") [] == ["ab"]
, execState (append "a" "bc") ["ab"] == ["ab"]
]
encode :: String -> WriterT [Int] (State Dictionary) ()
encode "" = return ()
encode w = do
dict <- get
let (a, b, c) = (munch w) dict
if length dict < 256
then do
tell [a]
put ((append b c) dict)
encode c
else return ()
test_encode =
[ evalState (execWriterT (encode "")) [] == []
, evalState (execWriterT (encode "aaa")) ["a"] == [0, 1]
, evalState (execWriterT (encode "aaaa")) ["a"] == [0, 1, 0]
, evalState (execWriterT (encode "aaaaa")) ["a"] == [0, 1, 1]
, evalState (execWriterT (encode "abababab")) ["a", "b"] == [0, 1, 2, 4, 1]
, evalState (execWriterT (encode "aaabbbccc")) dictionary
== [97, 128, 98, 130, 99, 132]
]
decode :: [Int] -> WriterT String (State Dictionary) ()
decode [] = return ()
decode [x] = do
dict <- get
tell (dict!!x)
decode (x:xs) = do
dict <- get
let f = dict!!x
let s = if(length dict > head xs)
then dict!!head xs
else f
tell f
put (append f s) dict
decode xs
test_decode =
[ evalState (execWriterT (decode [])) [] == []
, evalState (execWriterT (decode [0])) ["a"] == "a"
, evalState (execWriterT (decode [0, 1, 1, 0])) ["a", "b"] == "abba"
, evalState (execWriterT (decode [0, 1, 2, 0])) ["a", "b"] == "ababa"
, evalState (execWriterT (decode [0, 1, 2, 4, 1])) ["a", "b"] == "abababab"
, evalState (execWriterT (decode [97, 128, 98, 130, 99, 132])) dictionary
== "aaabbbccc"
]
compress :: String -> [Int]
compress w = evalState (execWriterT (encode w)) dictionary
test_compress =
[ compress "" == []
, compress "a" == [97]
, compress "aaa" == [97, 128]
, compress "aaabbbccc" == [97, 128, 98, 130, 99, 132]
]
decompress :: [Int] -> String
decompress list = evalState (execWriterT (decode list)) dictionary
test_decompress =
[ decompress [] == ""
, decompress [97] == "a"
, decompress [97, 128] == "aaa"
, decompress [97, 128, 98, 130, 99, 132] == "aaabbbccc"
]
prop_compressDecompress :: String -> Bool
prop_compressDecompress w = do
let tmp = [chr (div (ord x) 2) | x <- w]
decompress (compress tmp) == tmp
compressFile :: FilePath -> FilePath -> IO ()
compressFile source target = do
s <- readFile source
let compressed = compress s
let chars = [chr x | x <- compressed]
writeFile target chars
decompressFile :: FilePath -> FilePath -> IO ()
decompressFile source target = do
s <- readFile source
let code = [ord x | x <- s]
let decompressed = decompress code
writeFile target decompressed
allTests = [test_dictionary, test_prefixes, test_longest, test_munch, test_append
,test_encode
--, test_decode, test_compress, test_decompress
]
main = do
--quickCheck prop_compressDecompress
print (allTests, and (concat allTests))
With this code I get the following error (referring to the use of put in "encode" and "decode" functions) :
Main.hs#80:13-80:16 No instance for (MonadState () (StateT Dictionary Data.Functor.Identity.Identity)) arising from a use of put
I've tried to define this instance but the best I could achieve is a "functional dependencies conflict between instance declarations" error.
I know there is more simple solutions without Monads but I have to use them, also the types of the functions are not to be modified.
Could you give me any help about what I'm doing wrong here?

You don't need to write any instances. You're simply using append and munch in a wrong way.
append has type MonadState Dictionary m => String -> String -> m ().
If f and s are strings, then append f s yields a state-modifying action that eventually returns a ().
put has type MonadState s m => s -> m (). put replaces the current state with the s argument.
In the light of this, put (append f s) dict makes little sense. You're supposed to supply put a single argument. And you don't have to do anything with dict there; a core point of using the State monad is that the state is left implicit, and there is no need to pass it around.
Also, append f s by itself already updates the state. So what you want here is simply append f s, instead of put (append x y) dict.
There's a similar error in encode with munch; it has a single String argument, so (munch w) dict is erroneous. Again, no need to touch dict. Also, because munch w yields a monadic result, you must bind the result with <- instead of let. So, you should replace let (a, b, c) = (much w dict) with (a, b, c) <- munch w.

put ((append b c) dict)
looks wrong. append is already a monadic action, so it should be enough to use
append b c

convert tuples to a datatype in haskell

I am having issues to convert a list with multiple tuples into a datatype
data SensorValue = SensorValue {a:: Integer, b:: Integer, c:: [Integer]} deriving (Show)
my list with tuples looks like this:
[(1, [(2, [3,4,5]), (2, [2,3,1]), (3, [2,3,7])]), (2, [(1, [4,4,1]), (2, [2,3,1]), (3, [9,0,3])]),...]
so basically my list looks like [(Integer, [(Integer, [Integer])])]
Example
If I take the first tuple from my list (1, [(2, [3,4,5]) then my expected output is:
a SensorValue object with :
a = 1 -- first element of the first tuple
b = 2 -- first element of the second tuple
c = [3,4,5] -- second element of the second tuple
I know how to get to the first tuple with fst but how do I get to the second tuple?

You can use pattern matching here. Your function would look something like this:
f :: (Integer,[(Integer,[Integer])]) -> [SensorValue]
f (x,((y,z):zs)) = SensorValue x y z : f (x,zs) -- First element same for all
f(x,[]) = []
Demo
You would still need to specify the conditions to handle other cases e.g. What happens if the list that forms the second element of the outer tuple is empty?

List comprehensions or do syntax make this quite nice -- assuming you understand them!
doSyntax, listComprehensions :: [(Integer, [(Integer, [Integer])])] -> [SensorValue]
doSyntax sensorPoints = do
(a, pointsAtA ) <- sensorPoints
(b, valuesAtAB) <- pointsAtA
return (SensorValue a b valuesAtAB)
listComprehensions sensorPoints =
[ SensorValue a b valuesAtAB
| (a, pointsAtA ) <- sensorPoints
, (b, valuesAtAB) <- pointsAtA
]
Depending on just what you want to do, you might even consider storing just one sensor value in each element of the result list. Like this (with a variant on the naming scheme above, just for fun):
data SensorValue = SensorValue { a, b, val :: Integer }
fromRawData abvalM =
[ SensorValue a b val
| (a, bvalM) <- abvalM
, (b, valM) <- bvalM
, val <- valM
]

Haskell - how to generate permutations

How can I create a function which lazily makes permutations for the chars '_' and '*' like this:
For example:
Main> function 3
["___","*__","_*_","__*","**_","_**","*_*","***"]
First element is made only from _, the next 3 are permutations that lists: *__, the second 3 are permutations that lists **_, and the last element contains only *.
How can I do that?

Here's another "correct order" version:
function :: Int -> [String]
function c = concatMap helper $ zip (reverse [0..c]) [0..c]
helper :: (Int, Int) -> [String]
helper (c, 0) = [replicate c '_']
helper (0, c) = [replicate c '*']
helper (cUnderscores, cAsterisks) = map ('_' :) (helper (cUnderscores - 1, cAsterisks))
++ map ('*' :) (helper (cUnderscores, cAsterisks - 1))

You might want to look at replicateM.

let k = ["_", "*"]
let p = [ a ++ b ++ c | a <- k, b <- k, c <- k ]

The “correct order” version:
import Data.List
function k = concatMap (nub . permutations . pat) [0..k]
where pat x = replicate x '*' ++ replicate (k-x) '_'
I don’t know how to step from one permutation to another in constant time, though.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Comparing and counting string correlations with Haskell - haskell

Related

Group items with same key recursively

Haskell: How to generate all possible combinations of elements, each element from a list, with arbitrary number of lists [duplicate]

LZW routine in Haskell using Monads

convert tuples to a datatype in haskell

Haskell - how to generate permutations

Categories

Resources