Solving Google Code Jam's "Minimum Scalar Product" in Haskell

In preparation for the upcoming Google Code Jam, I've started working on some problems. Here's one of the practice problems I've tried:
http://code.google.com/codejam/contest/32016/dashboard#s=p0
And here's the gist of my current Haskell solution:
{-
- Problem URL: http://code.google.com/codejam/contest/32016/dashboard#s=p0
-
- solve takes as input a list of strings for a particular case
- and returns a string representation of its solution.
-}
import Data.List

solve :: [String] -> String
solve input = show $ minimum options
  where (l1:l2:l3:_) = input
        n = read l1 :: Int
        v1 = parseVector l2 n
        v2 = parseVector l3 n
        pairs = [(a,b) | a <- permutations v1, b <- permutations v2]
        options = map scalar pairs

parseVector :: String -> Int -> [Int]
parseVector line n = map read $ take n $ (words line) :: [Int]

scalar :: ([Int],[Int]) -> Int
scalar (v1,v2) = sum $ zipWith (*) v1 v2
This works with the sample input provided, but dies on the small input with an out-of-memory error. Increasing the maximum heap size appears to do nothing but let it run indefinitely.
My main question is how to go about optimizing this so that it will actually return a solution, other than passing in the -O flag to ghc, which I've already done.

Improving your algorithm is the most important thing. Currently, you permute both lists, giving a total of (n!)^2 options to check. You need only permute one of the lists. That should not only reduce the time complexity but also the space usage drastically.
And make sure the minimum gets replaced by the strict minimum (as it should, with optimisations, since you're taking the minimum of a list of Ints, compile with -ddump-rule-firings and check for "minimumInt"). If it isn't, use foldl1' min instead of minimum.
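A minimal sketch of both suggestions, assuming the two vectors are already parsed into lists; minScalarBrute is a name made up for illustration. It permutes only the first vector and takes the minimum with the strict foldl1' min. It is still only practical for small n.

```haskell
import Data.List (foldl1', permutations)

-- Brute force over the permutations of ONE vector only:
-- n! candidate products instead of (n!)^2.
minScalarBrute :: [Int] -> [Int] -> Int
minScalarBrute v1 v2 =
  foldl1' min [sum (zipWith (*) p v2) | p <- permutations v1]
```

For the first sample case, minScalarBrute [1,3,-5] [-2,4,1] evaluates to -25.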
I just checked:
Large dataset
T = 10
100 ≤ n ≤ 800
-100000 ≤ xi, yi ≤ 100000
For that, you need a dramatically better algorithm. 100! is about 9.3 * 10^157, the universe will long have ceased to exist before you have checked a measurable portion of all the permutations of 100 elements. You must construct the solving permutations by looking at the data. To find out what the solutions would look like, investigate some small inputs and what permutations realise the minimum scalar product for those (having a look at the Cauchy-Bun'akovskiy-Schwarz inequality would also not hurt).

The solution I came up with:
{-
- Problem URL: http://code.google.com/codejam/contest/32016/dashboard#s=p0
-
- solve takes as input a list of strings for a particular case
- and returns a string representation of its solution.
-}
import Data.List

solve :: [String] -> String
solve input = show result
  where (l1:l2:l3:_) = input
        n = read l1 :: Int
        v1 = parseVector l2 n
        v2 = parseVector l3 n
        result = scalar (sort v1) (reverse $ sort v2)

parseVector :: String -> Int -> [Integer]
parseVector line n = map read $ take n $ (words line) :: [Integer]

scalar :: [Integer] -> [Integer] -> Integer
scalar v1 v2 = sum $ zipWith (*) v1 v2 :: Integer
This seems to give the right answer and is orders of magnitude faster. The trick I guess is to not take the word "permutation" at face value when reading problems like this.
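For completeness, here is one way the sorted approach might be wired up to a whole input file. This is only a sketch: it assumes the input is the case count followed by three lines per case (n, then the two vectors) and that the expected output is the usual "Case #i: result" format. The names minScalar, chunk3 and formatCases are made up for illustration.

```haskell
import Data.List (sort, sortBy)
import Data.Ord (Down (..), comparing)

-- Pair the smallest elements of one vector with the largest of the other.
minScalar :: [Integer] -> [Integer] -> Integer
minScalar xs ys = sum (zipWith (*) (sort xs) (sortBy (comparing Down) ys))

-- Split the lines after the case count into groups of three;
-- an incomplete trailing group is dropped.
chunk3 :: [a] -> [[a]]
chunk3 (a:b:c:rest) = [a, b, c] : chunk3 rest
chunk3 _            = []

-- One output line per case (assumed "Case #i: result" format).
formatCases :: String -> [String]
formatCases input =
  [ "Case #" ++ show i ++ ": " ++ show (minScalar (nums l2) (nums l3))
  | (i, [_, l2, l3]) <- zip [1 :: Int ..] (chunk3 (drop 1 (lines input)))
  ]
  where
    nums = map read . words
```

A driver could then be as small as main = interact (unlines . formatCases).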

In my experience (my solution works), you really don't need to permute.
You only need to sort both lists; then you need a queue and a stack to pair them off.
Here is how it works.
Example:
input
1 3 -5
-2 1 4
sorted
-5 1 3
-2 1 4
then -2*3 + 4*(-5) + 1*1 = -25
So you see:
a negative number is multiplied by the most positive or least negative number
a positive number is multiplied by the most negative or least positive number

My main question is how to go about optimizing this so that it will actually return a solution, other than passing in the -O flag to ghc, which I've already done.
I think you should reconsider your approach. I think this practice problem is less about coding skills and more about problem solving. Trying out all possible permutations is the brute-force way, and I think you now have a good feeling for why this brute-force strategy does not work in this case ;). It really can be done in 3 lines of code when you use STL algorithms (not including input and output, of course).
Actually Madan Ram already gave some kind of solution. However, in case you will ever be given such a problem in an interview, you will also have to know why it works for all cases (not just for a number of examples). So my advice is to take a pen and a piece of paper and do some examples by hand, then you will find out how to do it.
SPOILER ALERT
It's hard to give a hint without spoiling too much, but as the solution has been posted already... Try to start simple. Which element in the first vector has the biggest contribution to the scalar product? By which element of the second vector do you have to multiply it to get the smallest total contribution to the scalar product?
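One way to see why pairing large with small works for all cases is an exchange argument on two pairs at a time. The symbols a_1, a_2, b_1, b_2 are introduced here only for illustration: take two entries from each vector with a_1 <= a_2 and b_1 <= b_2.

```latex
\[
(a_1 b_1 + a_2 b_2) - (a_1 b_2 + a_2 b_1) = (a_2 - a_1)(b_2 - b_1) \ge 0
\]
```

So swapping any two pairs that are matched in the same order never increases the sum, and repeating such swaps ends with one vector ascending against the other descending.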

Related

Sieve of Euler space complexity in Haskell

I was given 2 different algorithms written in Haskell that generate the first k primes. As the title suggests, they are the sieves of Eratosthenes and Euler.
I am trying to understand why the Euler implementation uses so much memory. What I thought so far is that the number of streams generated explodes, quickly saturating the memory, but from what I have understood they should be equal to 2k - 1, where k is the index of the prime we want.
So there should be other major problems.
I tried running the code via GHC's Debugger and it shows that much of the memory allocated is from the minus function so I think streams are definitely part of the problem.
Here minus removes from tcs every multiple of p starting from p*p which is coprime to every prime found so far (except p) and finally generates a new stream passed to the next recursive call of eulerSieve.
minus :: [Int] -> [Int] -> [Int]
minus xs@(x:txs) ys@(y:tys)
  | x < y = x : minus txs ys
  | otherwise = minus txs tys

eulerSieve :: [Int] -> [Int]
eulerSieve cs@(p:tcs) = p:eulerSieve (minus tcs (map (p*) cs))
Reading here I found that the space is bounded by O(k^2) but it doesn't explain why accurately.
Any suggestion would be appreciated.
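For anyone who wants to reproduce the measurements, here is a self-contained sketch of the code above with the as-patterns written using @. The fallback clauses for finite lists and the helper firstPrimes are additions made for illustration and are not part of the original listing.

```haskell
-- Remove from the first ascending list the elements of the second.
-- As in the question, this relies on every y eventually appearing in xs.
minus :: [Int] -> [Int] -> [Int]
minus (x:txs) ys@(y:tys)
  | x < y     = x : minus txs ys
  | otherwise = minus txs tys
minus xs _ = xs   -- added fallback so the function is total on finite lists

eulerSieve :: [Int] -> [Int]
eulerSieve cs@(p:tcs) = p : eulerSieve (minus tcs (map (p *) cs))
eulerSieve []         = []  -- added fallback, never reached for [2..]

-- Hypothetical helper: the first k primes.
firstPrimes :: Int -> [Int]
firstPrimes k = take k (eulerSieve [2 ..])
```

Compiling a small driver around firstPrimes with profiling enabled is one way to see where the allocations come from.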

Exponentiation using list comprehension

I'm trying to solve the following exercise (I'm learning Haskell):
Define x^n using a list comprehension.
And I'm struggling to find a solution.
Using recursion or fold, the solution is not complicated (for instance, foldr (*) 1 [x | c <- [1..n]]). However, using only list comprehension it gets difficult (at least for me).
In order to solve the problem, I'm trying to create a list of x^n elements and then get the length. Generating a list of x*n elements is easy, but I fail to generate a list of x^n elements.
ppower x n = length [1 | p <- [1..x], c <- [1..n]]
returns a list of x*n elements giving a wrong result. Any ideas on this will be appreciated.
A naturally-occurring exponential comes from sequence:
length (sequence [[1..x] | _ <- [1..n]])
If you haven't seen sequence yet, it's quite a general function but
when used with lists it works like:
sequence [xs1, ... , xsk] = [[x1, ... xk] | x1 <- xs1, ... , xk <- xsk]
But this is really cheating since sequence is defined recursively.
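As a runnable sketch of the sequence idea (powerBySequence is a name chosen here for illustration):

```haskell
-- x^n as the number of length-n lists whose elements are drawn from [1..x].
-- Note that this materialises x^n lists, so it is only usable for small inputs.
powerBySequence :: Int -> Int -> Int
powerBySequence x n = length (sequence [[1 .. x] | _ <- [1 .. n]])
```

For example, powerBySequence 2 3 is 8, and powerBySequence 5 0 is 1 because sequence [] is [[]].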
If you want to use nothing but length and list comprehensions I think
it might be impossible. The rest of this answer will be sketchy and I half
expect someone to prove me wrong. However:
We'll try to prove that such an expression can only compute values up
to some finite power of x or n, and therefore can't compute values
as big as x^n for arbitrary x and n.
Specifically we show by induction on the structure of expressions that
any expression expr has an upper bound ub(expr, m) = m^k where m
is the maximum of the free variables it uses, and k is a known finite
power which we could calculate from the structure of the expression expr.
(When we look at the whole expression, m will be max x n.)
Our upper bounds on list expressions will be bounds on both the length of the list and also bounds on any of
its elements (and lengths of its elements, etc.).
For example if we have [x..y] and we know that x <= m and y <= m, we
know that all the elements are <= m and the length is also <= m.
So we have ub([x..y], m) = m^1.
The tricky case is the list comprehension:
[eleft | x1 <- e1, ... , xk <- ek]
The result will have length equal to length e1 * ... * length ek, so
an upper bound for it would be the product of the upper bounds for
e1 to ek, or if m^i is the maximum of these then an upper bound
would be (m^i)^k = m^(i*k).
To get a bound on the elements, suppose expression eleft has ub(eleft, m') = m'^j. It can use x1
... xk. If m^i is an upper bound for these, as above, we need to
take m' = m^i and so ub(eleft, m) = (m^i)^j = m^(i*j)
As a conservative upper bound for the whole list comprehension e we
could take ub(e, m) = m^(i*j*k).
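As a small worked instance of this bound, take the comprehension from the question, [1 | p <- [1..x], c <- [1..n]], with m = max x n. Both generators are ranges bounded by m^1, so i = 1, and there are k = 2 generators:

```latex
\[
\mathrm{length} \;\le\; (m^{i})^{k} = (m^{1})^{2} = m^{2},
\qquad\text{and indeed}\qquad
x \cdot n \;\le\; m^{2}.
\]
```

This matches the observation that the comprehension produces x*n elements rather than x^n.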
I should really also work through cases for pattern matching
(shouldn't be a problem because the parts matched are smaller than
what we already had), let definitions and functions (but we banned
recursion, so we can just fully expand these before we start), and
list literals like [x,37,x,x,n] (we can throw their lengths
into m as initially-available values).
If infinite lists like [x..] or [x,y..] are allowed they would need some
thinking about. We can construct head and filter, which means we can get
from an infinite list to its first element matching a predicate, and that looks suspiciously like a way to get recursive functions. I don't
think it's a problem since 1. they are only arithmetic sequences and
2. we'll have to construct any numbers we want to use in the
predicate. But I'm not certain here.
As @n.m suggested, I asked Richard Bird (author of the book "Introduction to functional programming", first edition, the book where I got the exercise) for an answer/guidance in solving this exercise. He kindly replied and here I post the answer he gave me:
Since a list comprehension returns a list not a number, x^n cannot be
defined as an instance of a list comprehension. Your solution x^n =
product [x | c <- [1..n]] is the correct one.
So, I guess I'll stick to the solution I posted (and discarded for using recursion):
foldr (*) 1 [x | c <- [1..n]]
He didn't say anything about creating a list of x^n elements with list comprehensions (no recursion), though as @David Fletcher and @n.m point out in their comments, it might be impossible.
Maybe you can do it as follows:
pow :: Int -> Int -> Int
pow 0 _ = 1
pow 1 x = x
pow n x = length [1 | y <- [1..x], z <- [1..pow (n-1) x]]
so pow 3 2 would return 8

How can I replace generators if I need only one result?

I'm playing with Haskell for the first time.
I've created a function that returns the first precise enough result. It works as expected, but I'm using a generator for this. How can I replace the generator in this task?
integrateWithPrecision precision =
  (take 1 $ preciseIntegrals precision) !! 0

preciseIntegrals :: Double -> [Double]
preciseIntegrals precision =
  [ integrate (2 ^ power) pi
  | power <- [0..]
  , enoughPowerForPrecision power precision
  ]
You can use the beautiful until function. Here it is:
-- | @'until' p f@ yields the result of applying @f@ until @p@ holds.
until :: (a -> Bool) -> (a -> a) -> a -> a
until p f x | p x       = x
            | otherwise = until p f (f x)
So, you can write your function like this:
integrateWithPrecision precision = integrate (2 ^ pow) pi
  where
    pow = until done succ 0
    done pow = enoughPowerForPrecision pow precision
In your case, you do all the iteration and then compute a result just once. But until is useful even when you need to compute a result at each step - just use an (iter, result) tuple and then just extract the result at the end with snd.
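To illustrate the tuple variant, here is a small sketch that is unrelated to the integration code in the question: Newton's iteration for a square root, carrying a step counter alongside the current estimate. The name newtonSqrt and its parameters are made up for this example.

```haskell
-- Iterate x -> (x + a/x) / 2 until x*x is within eps of a,
-- threading (number of steps, current estimate) through until.
newtonSqrt :: Double -> Double -> (Int, Double)
newtonSqrt eps a = until done step (0, a)
  where
    done (_, x) = abs (x * x - a) < eps
    step (i, x) = (i + 1, (x + a / x) / 2)
```

Here snd (newtonSqrt 1e-9 2) extracts the estimate, and fst gives the number of steps taken.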
It seems like you want to check higher and higher powers until you get one that satisfies a requirement. This is what you could do: First you define a function to get enough power, and then you integrate using that.
find gets the first element of a list that satisfies a condition – like being enough of a power! Then we need a fromJust to get the actual value from that. Please note that almost always, fromJust is a terrible idea to have in your code. However, in this case the list is infinite, so we will have troubles with infinite loops long before fromJust is able to crash the program.
import Data.List (find)
import Data.Maybe (fromJust)

enoughPower :: Double -> Int
enoughPower precision =
  fromJust $ find (flip enoughPowerForPrecision precision) [0..]

preciseIntegrals :: Double -> Double
preciseIntegrals precision = integrate (2^(enoughPower precision)) pi
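If the infinite search is a concern, one alternative is to bound the search and keep the Maybe. This is only a sketch: firstEnoughPower, its limit argument and the predicate passed in are hypothetical stand-ins, not part of the question's code.

```haskell
import Data.List (find)

-- Search only the powers 0..limit and return Nothing when none qualifies,
-- instead of looping forever or crashing in fromJust.
firstEnoughPower :: Int -> (Int -> Bool) -> Maybe Int
firstEnoughPower limit enough = find enough [0 .. limit]
```

A caller would pass something like (\p -> enoughPowerForPrecision p precision) as the predicate and handle the Nothing case explicitly.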
The function
\xs -> take 1 xs !! 0
is called head
head [] = error "Cannot take head of empty list"
head (x:xs) = x
Its use is somewhat unsafe, as shown it can throw an error if you pass it an empty list, but in this case since you can be certain your list is non-empty it's fine.
Also, we tend not to call these "generators" in Haskell as they're not a special form but are instead a simple consequence of lazy evaluation. In this case, preciseIntegrals is called a "list comprehension" and [0..] is nothing more than a lazily generated list.

Character & strings

I am new to Haskell and I have a problem (aka homework).
So, I have a list with a tuple – a string and an integer:
xxs :: [([Char], Integer)]
I need to know how many of the strings in xxs start with a given character.
Let me exemplify:
foo 'A' [("Abc",12),("Axx",34),("Zab",56)]
Output: 2
foo 'B' [("Abc",12),("Bxx",34),("Zab",56)]
Output: 1
My best attempt so far:
foo c xxs = length (foldl (\acc (x:xs) -> if x == c then c else x) [] xxs)
But, of course, there's something VERY wrong inside the lambda expression.
Any suggestion?
Thanks.
You can use a fold, but I would suggest another way, which breaks the problem in three steps:
transform the input list to the list of first letters. You can use map for this
filter out all elements not equal to the given Char
take the length of the remaining list
Obviously the first step is the hardest, but not as hard as it looks. For doing it you just have to combine somehow the functions fst and head, or even easier, map twice.
You can write this as a simple one-liner, but maybe you should start with a let:
foo c xxs = let strings = map ...
                firstLetters = map ...
                filteredLetters = filter ...
            in length ...
There are a few problems with your attempt:
You plan to use foldl to construct a shorter list and then to take its length. While that is possible, the filter function is much better suited to the task, as @landei suggests.
foldl can be used to accumulate the length without constructing a shorter list. See the answer of @WuXingbo - his answer is incorrect, but once you realize that length is not needed at all with his approach, it should be easy for you to come up with a correct solution.
Somewhat contradictory to common sense, in a lazy language foldr is faster and uses less memory than foldl. You should ask your teacher why.
I would rewrite foo as
foo :: Char -> [(String, Int)] -> Int
foo c = length . filter ((==c).head.fst)
fst fetches the first element of a two-element tuple.
(==c) is a one-argument function that compares its input with c (see http://www.haskell.org/tutorial/functions.html paragraph 3.2.1 for better explanation).
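Note that head fails on an empty string. A sketch of two variants that tolerate empty strings, using take 1 instead of head; the names countStartingWith and countStartingWith' are made up here, and the second one shows the fold-based counting mentioned above.

```haskell
import Data.List (foldl')

-- Count with filter; take 1 returns "" for an empty string, so nothing crashes.
countStartingWith :: Char -> [(String, Integer)] -> Int
countStartingWith c = length . filter (\(s, _) -> take 1 s == [c])

-- The same count accumulated directly with a strict left fold,
-- without building an intermediate list.
countStartingWith' :: Char -> [(String, Integer)] -> Int
countStartingWith' c = foldl' step 0
  where
    step acc (s, _)
      | take 1 s == [c] = acc + 1
      | otherwise       = acc
```

With the example from the question, countStartingWith 'A' [("Abc",12),("Axx",34),("Zab",56)] gives 2.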

Project euler problem 3 in haskell

I'm new to Haskell and am trying to solve problem 3 from http://projecteuler.net/.
The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ?
My solution:
import Data.List

getD :: Int -> Int
getD x =
  -- find divisors
  let deriveList = filter (\y -> (x `mod` y) == 0) [1 .. x]
      filteredList = filter isSimpleNumber deriveList
  in maximum filteredList

-- Check whether the number is prime
isSimpleNumber :: Int -> Bool
isSimpleNumber x = let deriveList = map (\y -> (x `mod` y)) [1 .. x]
                       filterLength = length ( filter (\z -> z == 0) deriveList)
                   in
                     case filterLength of
                       2 -> True
                       _ -> False
I try to run for example:
getD 13195
> 29
But when i try:
getD 600851475143
I get the error Exception: Prelude.maximum: empty list. Why?
Thank you @Barry Brown, I think I must use:
getD :: Integer -> Integer
But i get error:
Couldn't match expected type `Int' with actual type `Integer'
Expected type: [Int]
Actual type: [Integer]
In the second argument of `filter', namely `deriveList'
In the expression: filter isSimpleNumber deriveList
Thank you.
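Your type signature uses Int, which the Haskell Report only guarantees to cover the range -2^29 to 2^29-1 (in GHC it is 32 or 64 bits wide, depending on the platform). 600851475143 does not fit in 32 bits. Try changing Int to Integer.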
Edit:
I see that you already realised that you need to use Integer instead of Int. You need to change the types of both getD and isSimpleNumber otherwise you will get a type mismatch.
Also in general, if you are having trouble with types, simply remove the type declarations and let Haskell tell you the correct types.
Main> :t getD
getD :: Integral a => a -> a
Main> :t isSimpleNumber
isSimpleNumber :: Integral a => a -> Bool
After you found the error, may I point out that your solution is quite verbose? In this case a very simple implementation using brute force is good enough:
getD n = getD' n 2 where
  getD' n f | n == f = f
            | n `mod` f == 0 = getD' (n `div` f) f
            | otherwise = getD' n (succ f)
This question is easy enough for a brute-force solution, but that is a bad idea, because the whole point of Project Euler is problems you really need to think about to solve (see the end of this answer).
So here are some of your program's flaws:
First, use rem instead of mod. It is more efficient.
Some mathematical thinking should have told you that you don't need to check all numbers from 1 to x in the primality check and in getD; checking all numbers from the square root down to one (or the reverse) should be sufficient. Note that in getD you will actually need to filter the numbers between x and the square root, because you are searching for the biggest one.
Why do you use the maximum function in getD? You know the list is monotonically increasing, so you may as well take the last element.
Although you only need the biggest divisor (which is prime), you compute the list of divisors from small to big, so the computer checks every value for divisibility and discards the result whenever a bigger divisor is found. This can be fixed by filtering the list of numbers from x down to 1, not from 1 to x. The computer then tests the biggest candidates first and does not throw away the knowledge gained from previous checks. Note that this optimization takes effect only if the previous point is also addressed, because otherwise the computer will compute all divisors anyway.
Combining the previous points, you should have filtered the numbers [x,x-1 .. squareroot x] and taken the first.
You don't use an efficient isPrime function. If I were you, I would have searched for an isPrime library function, which is guaranteed to be efficient.
And there are more...
With this kind of code you will never be able to solve the harder Project Euler problems. They are designed to require extra thought about the problem (for instance, noticing that you don't have to check numbers greater than the square root) and fast, efficient code. That is the purpose of Project Euler: being smart about programming. So don't skip it.
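One way the square-root observation can be combined with dividing out each factor as it is found is sketched below. This is not the code of any answer above; largestPrimeFactor is a name chosen for illustration, and it uses Integer and rem/quot as suggested earlier.

```haskell
-- Largest prime factor by trial division.
-- Each factor found is divided out, and the search stops once f*f > n,
-- at which point the remaining n is itself prime.
largestPrimeFactor :: Integer -> Integer
largestPrimeFactor m
  | m < 2     = error "largestPrimeFactor: input must be at least 2"
  | otherwise = go 2 m
  where
    go f n
      | f * f > n      = n
      | n `rem` f == 0 = go f (n `quot` f)
      | otherwise      = go (f + 1) n
```

For the example in the problem statement, largestPrimeFactor 13195 is 29.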
