Haskell: Generating k-itemsets for apriori


I am trying to generate all k-itemsets for use in Apriori. I am following this pseudocode:
L1 = {frequent items};
for (k = 2; Lk-1 != ∅; k++) do begin
    Ck = candidates generated from Lk-1 (that is: cartesian product Lk-1 x Lk-1 and eliminating any
         k-1 size itemset that is not frequent);
    for each transaction t in database do
        increment the count of all candidates in
        Ck that are contained in t
    Lk = candidates in Ck with min_sup
end
return U_k Lk;
Here is the code I have:
-- d transactions, threshold
kItemSets d thresh = kItemSets' 2 $ frequentItems d thresh
  where
    kItemSets' _ [] = [[]]
    kItemSets' k t = ck ++ (kItemSets' (k+1) ck)
      where
        -- those (k-1) length sets that meet the threshold of being a subset of the transactions in d
        ck = filter (\x->(countSubsets x d) >= thresh) $ combinations k t
-- length n combinations that can be made from xs
combinations 0 _ = [[]]
combinations _ [] = []
combinations n xs@(y:ys)
  | n < 0 = []
  | otherwise = case drop (n-1) xs of
      [ ] -> []
      [_] -> [xs]
      _ -> [y:c | c <- combinations (n-1) ys]
             ++ combinations n ys
-- those items with a frequency of at least o in the dataset
frequentItems xs o = [y| y <- nub cs, x<-[count y cs], x >= o]
  where
    cs = concat xs
isSubset a b = not $ any (`notElem` b) a
-- Count how many times the list y appears as a subset of a list of lists xs
countSubsets y xs = length $ filter (isSubset y ) xs
count :: Eq a => a -> [a] -> Int
count x [] = 0
count x (y:ys) | x == y = 1+(count x ys)
               | otherwise = count x ys
transactions =[["Butter", "Biscuits", "Cream", "Newspaper", "Bread", "Chocolate"],
               ["Cream", "Newspaper", "Tea", "Oil", "Chocolate"] ,
               ["Chocolate", "Cereal", "Bread"],
               ["Chocolate", "Flour", "Biscuits", "Newspaper"],
               ["Chocolate", "Biscuits", "Newspaper"] ]
But when I compile I get the error:
apriori.hs:5:51:
Occurs check: cannot construct the infinite type: a0 = [a0]
Expected type: [a0]
Actual type: [[a0]]
In the second argument of kItemSets', namely `ck'
In the second argument of `(++)', namely `(kItemSets' (k + 1) ck)'
Failed, modules loaded: none.
But when I run from ghci:
*Main> mapM_ print $ filter (\x->(countSubsets x transactions ) >= 2 ) $ combinations 2 $ frequentItems transactions 2
["Biscuits","Newspaper"]
["Biscuits","Chocolate"]
["Cream","Newspaper"]
["Cream","Chocolate"]
["Newspaper","Chocolate"]
["Bread","Chocolate"]
Which is correct, since it's those 2-item sets that meet the occurrence threshold in the set of transactions. But what I need for the 3-item sets is
[["Biscuits", "Chocolate", "Newspaper" ],
["Chocolate", "Cream", "Newspaper"]]
and for this to be appended to the list of 2-item sets. How would I change my current code to achieve this? I know it can be built from the 2-item set, but I'm not sure how to go about it.

Had to use this for line 5:
kItemSets' k t = ck ++ (kItemSets' (k+1) $ nub $ concat ck)
Not the most efficient but it works.
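For anyone who wants the whole thing in one place, here is a minimal sketch of the level-wise loop with that fix applied. The names support and go are my own and not from the question, and it assumes each transaction lists an item at most once, so counting transactions is the same as counting occurrences. It returns [] instead of [[]] when no candidates survive, which avoids a trailing empty itemset in the result.

```haskell
import Data.List (nub)

-- True when every element of a also occurs in b.
isSubset :: Eq a => [a] -> [a] -> Bool
isSubset a b = all (`elem` b) a

-- Number of transactions that contain the whole itemset (hypothetical helper).
support :: Eq a => [a] -> [[a]] -> Int
support itemset = length . filter (isSubset itemset)

-- All k-element combinations of a list, keeping the original order.
combinations :: Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations _ [] = []
combinations k (x:xs) = map (x:) (combinations (k - 1) xs) ++ combinations k xs

-- Frequent itemsets of size 2 and up. The items fed into level k+1 are
-- only those that still appear in some frequent itemset of level k.
kItemSets :: Eq a => [[a]] -> Int -> [[a]]
kItemSets d thresh = go 2 frequentItems
  where
    frequentItems = [i | i <- nub (concat d), support [i] d >= thresh]
    go _ [] = []
    go k items = ck ++ go (k + 1) (nub (concat ck))
      where
        ck = filter (\s -> support s d >= thresh) (combinations k items)
```

With the transactions from the question and a threshold of 2, this yields the six 2-itemsets shown above followed by the two 3-itemsets (the items inside each set appear in first-seen order rather than alphabetical order).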

Related

generate a list of numbers with increasing digit order

Hi, I am trying to generate a list of all possible n-digit numbers whose digits are in decreasing order.
For example, if n = 3 the output will be [111 .. 321 .. 543 ..999].
My initial attempt was
--attempt1
digits n = map (\x -> read [x] :: Int) (show n)
sorted [] = True
sorted [x] = True
sorted (x:y:xs) = if x <= y then sorted (y:xs) else False
[ x | x <- [ 10^(n-1) .. 10^n ] , sorted $ digits $ x]
However, this approach became exponentially slower as n grew.
My second approach was (for n == 3):
joiner :: [Integer] -> Integer
joiner = read . concatMap show
[ joiner [z,y,x] |
x <- [1..9],
y <- [9,8..x],
z <- [9,8..y]]
However, the problem now is how to generalise this code to an arbitrary n:
joiner :: [Integer] -> Integer
joiner = read . concatMap show
[ joiner [a_n,...,a_1] |
    a_1 <- [1..9],
    a_2 <- [9,8..a_1],
    .
    .
    .
    a_n <- [9,8..a_(n-1)]
]
Thank you!

Every time you need to combine N of something (where N is unknown upfront), the answer is usually recursion, either explicit or hidden inside a fold or a similar combinator. After all, that's how iteration is expressed in Haskell.
First, we'll need a way to append another digit to a given number. Simple enough:
appendDigit x = [ x*10 + d | d <- [0..9] ]
Let's test it out:
λ appendDigit 2
[20,21,22,23,24,25,26,27,28,29]
λ appendDigit 3
[30,31,32,33,34,35,36,37,38,39]
But not good enough: we only need to append digits that are less than the last one. Well, easy to modify:
appendDigit x = [ x*10 + d | d <- [0..(lastDigit-1)] ]
  where lastDigit = x `mod` 10
Try it out:
λ appendDigit 2
[20,21]
λ appendDigit 3
[30,31,32]
λ appendDigit 8
[80,81,82,83,84,85,86,87]
And now all that remains is just to do it N times, concatenating resulting lists along the way:
decDigits 0 = [] -- degenerate case: when N = 0, there are no such numbers
decDigits 1 = [0..9] -- base case: N = 1
decDigits n = concatMap appendDigit $ decDigits (n-1)
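If repeated digits should be allowed, as in the question's second attempt (where 111 and 999 are part of the output), the same idea can be sketched as follows. The names appendNonIncreasing and nonIncreasing are made up for illustration, and this assumes digits run from 1 to 9 only, as in the question's list comprehension.

```haskell
-- Append every digit from 1 up to and including the current last digit.
appendNonIncreasing :: Integer -> [Integer]
appendNonIncreasing x = [x * 10 + d | d <- [1 .. x `mod` 10]]

-- All n-digit numbers (digits 1..9) whose digits never increase from
-- left to right. Starts from the one-digit numbers and extends n-1 times.
nonIncreasing :: Int -> [Integer]
nonIncreasing n
  | n <= 0 = []
  | otherwise = iterate (concatMap appendNonIncreasing) [1 .. 9] !! (n - 1)
```

Because each step only ever extends valid prefixes, the work is proportional to the size of the output instead of to 10^n.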

Haskell: Given a list of numbers and a number k, return whether any two numbers from the list add up to k

Given a list of numbers and a number k, return whether any two numbers from the list add up to k.
For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17.
The program must prompt the user for input.
The program must accept the list as a collection of comma separated values.
The values should all be integers.
The input list can be between 1 and 42 numbers long.
What I have done
I have been able to read the input as a comma-separated list of integers, but I am not able to return true when 2 numbers add up to k
toList :: String -> [Integer]
toList input = read ("[" ++ input ++ "]")
main = do
  putStrLn "Enter a list of numbers (separated by comma):"
  input <- getLine
  print $ k (toList input)

There are the following approaches.
1) Create a list of pairs containing every combination of two elements at different positions: [(10,15),(10,3),(10,7),(15,3),(15,7),(3,7)].
Now you can use the any function on this list to check whether any pair adds up to the given number.
getCoupleList :: [a]->[(a,a)]
getCoupleList [] = []
getCoupleList [x] = []
getCoupleList (x:xs) = map (\y->(x,y)) xs ++ getCoupleList xs
getSumOfCoupleList :: Num a => [(a,a)]->[a]
getSumOfCoupleList xs = map (\x -> fst x + snd x) xs
isSum :: [Int]->Int->Bool
isSum xs k = any (==k) $ (getSumOfCoupleList.getCoupleList) xs
or check directly without getSumOfCoupleList:
isSum xs k = any (\(a,b)-> a + b == k) $ getCoupleList xs
On closer inspection, building the list of pairs and then summing them is not needed. We can get the list of sums directly with a small change.
getSumList :: Num a=>[a]->[a]
getSumList [] = []
getSumList [x] = []
getSumList (x:xs) = map (+x) xs ++ getSumList xs
isSum1 :: [Int]->Int->Bool
isSum1 xs k = any (==k) $ getSumList xs
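2) Create another list from the given list by subtracting every element from k (17 in the example). Now just check whether any number from the first list is present in the second. This needs intersect from Data.List. Note that it also returns True when a single element equals k/2, because that element is matched with itself.
isSum2 :: [Int]->Int->Bool
isSum2 xs k = let newList = map (k-) xs
                  intersectList = xs `intersect` newList
              in not (null intersectList)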
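A third option, sketched here under the assumption that the containers package is available (it ships with GHC), is a single pass that remembers the values seen so far in a Data.Set. The name hasPairSum is mine. Since an element is only compared against earlier elements, a lone k/2 is not matched with itself.

```haskell
import qualified Data.Set as Set

-- True when two elements at different positions add up to k.
hasPairSum :: Integer -> [Integer] -> Bool
hasPairSum k = go Set.empty
  where
    go _ [] = False
    go seen (x:xs)
      | Set.member (k - x) seen = True                -- complement was seen earlier
      | otherwise = go (Set.insert x seen) xs         -- remember x and continue
```

For example, hasPairSum 17 [10, 15, 3, 7] is True, while hasPairSum 20 [10, 15, 3, 7] is False because 10 occurs only once.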

This is a naive method, not optimized; it just shows an example.
toList :: String -> [Integer]
toList input = read ("[" ++ input ++ "]")
check :: Integer -> [Integer] -> Bool
check k (x:xs) = if ((k-x) `elem` xs)
                   then True
                   else (check k xs)
check k x = False
main = do
  let k = 12
  putStrLn "Enter a list of numbers (separated by comma):"
  input <- getLine
  print $ (check k (toList input))

I was recently asked the exact same question in an interview; here's one of my answers (in Python):
arr = [10, 15, 3, 8]
k = 17
# complements: the value each element would need to reach k
arr_new = [k - x for x in arr]
# note: an element equal to k/2 matches itself here
res = list(set(arr).intersection(arr_new))
if len(res) > 0:
    print(str(res[0]) + " + " + str(k - res[0]) + " = " + str(k))
else:
    print("No numbers add up to k")

Euler 75 what is wrong, I have all primitive triangles and multiples but still something fails

Basetri looks just like the Wikipedia definition of Euclid's formula (but I only save the perimeter), and seems to generate all primitive triangles.
Timesify gives all multiples of these triangles (the 120 triangle appears 3 times).
Then I concatenate, sort and group to get a list of lists with equal perimeters in the same group, then keep the groups that contain just one element, i.e. just one way to make the perimeter.
This should give me all the perimeters that can be formed in just one way; however, length euler75 = 157730 does not seem to be the valid answer.
euler75 = filter justOneElement $ group $ sort $ concat $ timesify (takeWhile (<=1500000) basetri)
justOneElement (x:[]) = True
justOneElement _ = False
basetri = [((x m n + y m n + z m n)) | m<-[1..700],n<-[1..(m-1)], odd (m-n),gcd m n == 1]
  where
    x m n = (m^2 - n^2)
    y m n = 2*m*n
    z m n = (m^2+n^2)
timesify [] = []
timesify (x:xs) = (takeWhile (<=1500000) $ (map (*x) [1..])) : timesify xs

Changed to
triangs :: Integer -> [Integer]
triangs l = [p | n <- [2..1000],
                 m <- [1..n-1],
                 gcd m n == 1,
                 odd (m+n),
                 let p = 2 * (n^2 + m*n),
                 p <= l]
and now it works
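For completeness, a sketch of how the counting step could look on top of a generator like triangs. The names primitivePerimeters and singularCount are hypothetical, and Data.Map.Strict from containers is assumed to be available. It sidesteps two things that look like likely culprits in the question's version: basetri caps m at 700 although generators up to about 865 still give perimeters under 1500000, and takeWhile is applied to a list that is not sorted by perimeter, so it can cut off early.

```haskell
import qualified Data.Map.Strict as Map

-- Perimeters of primitive right triangles up to the limit, via Euclid's
-- formula: p = 2*n*(n+m) with n > m, gcd m n == 1 and m + n odd.
primitivePerimeters :: Int -> [Int]
primitivePerimeters limit =
  [ p
  | n <- takeWhile (\n' -> 2 * n' * (n' + 1) <= limit) [2 ..]
  , m <- [1 .. n - 1]
  , odd (m + n)
  , gcd m n == 1
  , let p = 2 * n * (n + m)
  , p <= limit
  ]

-- Number of perimeters <= limit that belong to exactly one right triangle.
singularCount :: Int -> Int
singularCount limit = Map.size (Map.filter (== 1) counts)
  where
    -- every multiple of a primitive perimeter is one more triangle
    counts =
      Map.fromListWith (+)
        [ (q, 1 :: Int) | p <- primitivePerimeters limit, q <- [p, 2 * p .. limit] ]
```

As a sanity check against the problem statement, singularCount 48 gives 6 (perimeters 12, 24, 30, 36, 40 and 48).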

Comparing 3 output lists in haskell

I am doing another Project Euler problem and I need to find when the results of these 3 lists are equal (we are given 40755 as the first time they are equal; I need to find the next):
hexag n = [ n*(2*n-1) | n <- [40755..]]
penta n = [ n*(3*n-1)/2 | n <- [40755..]]
trian n = [ n*(n+1)/2 | n <- [40755..]]
I tried adding in the other lists as predicates of the first list, but that didn't work:
hexag n = [ n*(2*n-1) | n <- [40755..], penta n == n, trian n == n]
I am stuck as to where to go from here.
I tried graphing the function and even calculus but to no avail, so I must resort to a Haskell solution.

Your functions are weird. They get n and then ignore it?
You also have a confusion between function's inputs and outputs. The 40755th hexagonal number is 3321899295, not 40755.
If you really want a spoiler to the problem (but doesn't that miss the point?):
binarySearch :: Integral a => (a -> Bool) -> a -> a -> a
binarySearch func low high
  | low == high = low
  | func mid = search low mid
  | otherwise = search (mid + 1) high
  where
    search = binarySearch func
    mid = (low+high) `div` 2
infiniteBinarySearch :: Integral a => (a -> Bool) -> a
infiniteBinarySearch func =
  binarySearch func ((lim+1) `div` 2) lim
  where
    lim = head . filter func . lims $ 0
    lims x = x:lims (2*x+1)
inIncreasingSerie :: (Ord a, Integral i) => (i -> a) -> a -> Bool
inIncreasingSerie func val =
  val == func (infiniteBinarySearch ((>= val) . func))
figureNum :: Integer -> Integer -> Integer
figureNum shape index = (index*((shape-2)*index+4-shape)) `div` 2
main :: IO ()
main =
  print . head . filter r $ map (figureNum 6) [144..]
  where
    r x = inIncreasingSerie (figureNum 5) x && inIncreasingSerie (figureNum 3) x

Here's a simple, direct answer to exactly the question you gave:
*Main> take 1 $ filter (\(x,y,z) -> (x == y) && (y == z)) $ zip3 [1,2,3] [4,2,6] [8,2,9]
[(2,2,2)]
Of course, yairchu's answer might be more useful in actually solving the Euler question :)

There's at least a couple ways you can do this.
You could look at the first item, and compare the rest of the items to it:
Prelude> (\x -> all (== (head x)) $ tail x) [ [1,2,3], [1,2,3], [4,5,6] ]
False
Prelude> (\x -> all (== (head x)) $ tail x) [ [1,2,3], [1,2,3], [1,2,3] ]
True
Or you could make an explicitly recursive function similar to the previous:
-- test.hs
f [] = True
f (x:xs) = f' x xs where
  f' orig (y:ys) = if orig == y then f' orig ys else False
  f' _ [] = True
Prelude> :l test.hs
[1 of 1] Compiling Main ( test.hs, interpreted )
Ok, modules loaded: Main.
*Main> f [ [1,2,3], [1,2,3], [1,2,3] ]
True
*Main> f [ [1,2,3], [1,2,3], [4,5,6] ]
False
You could also do a takeWhile and compare the length of the returned list, but that would be neither efficient nor idiomatic Haskell.
Oops, just saw that didn't answer your question at all. Marking this as CW in case anyone stumbles upon your question via Google.

The easiest way is to respecify your problem slightly.
Rather than deal with three lists (note the removal of the superfluous n argument):
hexag = [ n*(2*n-1) | n <- [40755..]]
penta = [ n*(3*n-1)/2 | n <- [40755..]]
trian = [ n*(n+1)/2 | n <- [40755..]]
You could, for instance, generate one list:
matches :: [Int]
matches = matches' 40755
matches' :: Int -> [Int]
matches' n
  | hex == pen && pen == tri = n : matches' (n + 1)
  | otherwise = matches' (n + 1)
  where
    hex = n*(2*n-1)
    pen = n*(3*n-1) `div` 2
    tri = n*(n+1) `div` 2
Now, you could then try to optimize this for performance by noticing recurrences. For instance when computing the next match at (n + 1):
(n+1)*(n+2)/2 - n*(n+1)/2 = n + 1
so you could just add (n + 1) to the previous tri to obtain the new tri value.
Similar algebraic simplifications can be applied to the other two functions, and you can carry all of them in accumulating parameters to the function matches'.
That said, there are more efficient ways to tackle this problem.
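One such way, sketched below, relies on the fact that each of the three sequences is strictly increasing, so values common to all of them can be found by walking the lists in step. Unlike comparing the three formulas at the same n, this matches values that occur at different indices in each sequence. The names isectSorted and common are my own.

```haskell
-- Intersection of two strictly increasing (possibly infinite) lists.
isectSorted :: Ord a => [a] -> [a] -> [a]
isectSorted xa@(x:xs) ya@(y:ys) = case compare x y of
  LT -> isectSorted xs ya        -- x is too small, advance the left list
  GT -> isectSorted xa ys        -- y is too small, advance the right list
  EQ -> x : isectSorted xs ys    -- common value, keep it
isectSorted _ _ = []

triangulars, pentagonals, hexagonals :: [Integer]
triangulars = [n * (n + 1) `div` 2 | n <- [1 ..]]
pentagonals = [n * (3 * n - 1) `div` 2 | n <- [1 ..]]
hexagonals  = [n * (2 * n - 1) | n <- [1 ..]]

-- Numbers that are triangular, pentagonal and hexagonal at the same time.
common :: [Integer]
common = triangulars `isectSorted` pentagonals `isectSorted` hexagonals
```

take 2 common gives [1,40755], which agrees with the value quoted in the question; the next element of common is the one the problem asks for.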

Detecting cyclic behaviour in Haskell

I am doing yet another Project Euler question in Haskell, where I must find whether the sum of the factorials of each digit in a number is equal to the original number. If not, I repeat the process until the original number is reached. The next part is to find the number of starting numbers below 1 million that have 60 non-repeating terms. I got this far:
prob74 = length [ x | x <- [1..999999], 60 == ((length $ chain74 x)-1)]
factorial n = product [1..n]
factC x = sum $ map factorial (decToList x)
chain74 x | x == 0 = []
          | x == 1 = [1]
          | x /= factC x = x : chain74 (factC x)
But what I don't know how to do is to get it to stop once the value for x has become cyclic. How would I go about stopping chain74 when it gets back to the original number?

When you walk through the list that might contain a cycle your function needs to keep track of the already seen elements to be able to check for repetitions. Every new element is compared against the already seen elements. If the new element has already been seen, the cycle is complete, if it hasn't been seen the next element is inspected.
So this calculates the length of the non-cyclic part of a list:
uniqlength :: (Eq a) => [a] -> Int
uniqlength l = uniqlength_ l []
  where uniqlength_ [] ls = length ls
        uniqlength_ (x:xs) ls
          | x `elem` ls = length ls
          | otherwise = uniqlength_ xs (x:ls)
(Performance might be better when using a set instead of a list, but I haven't tried that.)
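A rough sketch of that set-based variant, specialised to the digit-factorial chain from the question. The names digitFactorialSum (standing in for factC, whose helper decToList was not shown) and chainLength are assumptions of mine, inputs are assumed to be non-negative, and Data.Set comes from the containers package.

```haskell
import qualified Data.Set as Set
import Data.Char (digitToInt)

-- Sum of the factorials of the decimal digits of a non-negative number.
digitFactorialSum :: Int -> Int
digitFactorialSum = sum . map (factorial . digitToInt) . show
  where
    factorial d = product [1 .. d]

-- Number of distinct terms produced before a value shows up a second time.
chainLength :: Int -> Int
chainLength = go Set.empty
  where
    go seen x
      | x `Set.member` seen = Set.size seen                     -- cycle closed
      | otherwise = go (Set.insert x seen) (digitFactorialSum x)
```

For instance, chainLength 69 is 5 (69, 363600, 1454, 169, 363601) and chainLength 145 is 1, so the count the question is after would be the number of starting values x with chainLength x == 60.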

What about passing another argument (y for example) to the chain74 in the list comprehension.
Morning fail so EDIT:
[.. ((length $ chain74 x x False)-1)]
chain74 x y not_first | x == y && not_first = replace_with_stop_value_:-)
                      | x == 0 = []
                      | x == 1 = [1]
                      | x == 2 = [2]
                      | x /= factC x = x : chain74 (factC x) y True

I implemented a cycle-detection algorithm in Haskell on my blog. It should work for you, but there might be a more clever approach for this particular problem:
http://coder.bsimmons.name/blog/2009/04/cycle-detection/
Just change the return type from String to Bool.
EDIT: Here is a modified version of the algorithm I posted about:
cycling :: (Show a, Eq a) => Int -> [a] -> Bool
cycling k [] = False --not cycling
cycling k (a:as) = find 0 a 1 2 as
  where find _ _ c _ [] = False
        find i x c p (x':xs)
          | c > k = False -- no cycles after k elements
          | x == x' = True -- found a cycle
          | c == p = find c x' (c+1) (p*2) xs
          | otherwise = find i x (c+1) p xs
You can remove the 'k' if you know your list will either cycle or terminate soon.
EDIT2: You could change the following function to look something like:
prob74 = length [ x | x <- [1..999999], let chain = chain74 x, not$ cycling 999 chain, 60 == ((length chain)-1)]

Quite a fun problem. I've come up with a corecursive function that returns the list of the "factorial chains" for every number, stopping as soon as they would repeat themselves:
chains = [] : let f x = x : takeWhile (x /=) (chains !! factC x) in (map f [1..])
Giving:
take 4 chains == [[],[1],[2],[3,6,720,5043,151,122,5,120,4,24,26,722,5044,169,363601,1454]]
map head $ filter ((== 60) . length) (take 10000 chains)
is
[1479,1497,1749,1794,1947,1974,4079,4097,4179,4197,4709,4719,4790,4791,4907,4917
,4970,4971,7049,7094,7149,7194,7409,7419,7490,7491,7904,7914,7940,7941,9047,9074
,9147,9174,9407,9417,9470,9471,9704,9714,9740,9741]
It works by calculating the "factC" of its position in the list, then references that position in itself. This would generate an infinite list of infinite lists (using lazy evaluation), but using takeWhile the inner lists only continue until the element occurs again or the list ends (meaning a deeper element in the corecursion has repeated itself).
If you just want to remove cycles from a list you can use:
decycle :: Eq a => [a] -> [a]
decycle = dc []
  where
    dc _ [] = []
    dc xh (x : xs) = if elem x xh then [] else x : dc (x : xh) xs
decycle [1, 2, 3, 4, 5, 3, 2] == [1, 2, 3, 4, 5]
