Project Euler #2 For Large Limits - haskell

I have a Haskell solution to Project Euler Problem 2 which works fine for the four million limit, as well as for limits up to 10^100000, taking only a few seconds on my machine.
But for anything bigger, e.g. 10^1000000, the computation does not return in a reasonable time, if at all (I have tried leaving it running for a couple of minutes). What is the limiting factor here?
import Data.List (foldl')

evenFibonacciSum :: Integer -> Integer
evenFibonacciSum limit =
    foldl' (\t (_, b) -> t + b) 0 . takeWhile ((<= limit) . snd) . iterate doIteration $ (1, 2)
  where
    doIteration (a, b) = (twoAB - a, twoAB + b)
      where
        twoAB = 2 * (a + b)

The problem is that you are summing the (even) Fibonacci numbers. That means you have to calculate them all. But
F(n) ≈ φ^n / √5, with φ = (1 + √5)/2
So you are adding a lot of numbers of large size, Θ(n) bits for F(n). For a limit of 10^1000000, you need about 800000×2 additions of numbers larger than 10^500000. In general, you need Θ(n) additions of numbers with Θ(n) bits.
Adding numbers of d digits [in whatever base] is an O(d) operation. So your algorithm is quadratic in the exponent.
To avoid that, find a closed formula for the sum S(k) of the first k even Fibonacci numbers (hint: it's a relatively easy formula involving one Fibonacci number), find the largest k such that F(3k) <= limit, and compute the sum using that formula together with an algorithm that computes F(n) in O(log n) steps, e.g. the one described here.
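The O(log n) Fibonacci computation referred to is usually done with the fast-doubling identities F(2k) = F(k)·(2·F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)². A minimal sketch (the identities are standard; the function names are mine, not taken from the linked page):

-- fib2 n returns (F(n), F(n+1)) using O(log n) big-integer multiplications,
-- via F(2k) = F(k)*(2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2.
fib2 :: Integer -> (Integer, Integer)
fib2 0 = (0, 1)
fib2 n = let (a, b) = fib2 (n `div` 2)
             c = a * (2 * b - a)   -- F(2k)
             d = a * a + b * b     -- F(2k+1)
         in if even n then (c, d) else (d, c + d)

fib :: Integer -> Integer
fib = fst . fib2

Each of the O(log n) recursion levels costs only a handful of big-integer multiplications, so computing a single F(n) with n in the millions is cheap compared to summing millions of huge numbers.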

The problem here seems to be that you're using a formula for the even Fibonacci numbers that takes linear time to compute: if you double your limit, your computation time also doubles. There should be an algorithm that takes only logarithmic time (if you double the limit, the time only grows by a constant amount), but it's your job to find it. I'm not spoiling Euler answers here.

Related

sin function returning numbers outside of range [-1, 1]

I am working in Haskell and have had an error where, for really large floats z, sin(z) returns some value outside of the range [-1, 1].
I'm learning Haskell for the first time, so I have had very little luck debugging. The program just crashes when sin(z) returns a value outside of the above range, since sin(z) is an input to another function that only accepts values inside the range [-1, 1].
Additionally, I don't have access to that other function; I can only send in a value, and it keeps crashing when sin(z) returns a number either greater than 1 or less than -1.
Is there any way to figure out why sin(z) is doing this?
The sin :: Double -> Double function returns a number in the range [-1, 1] for all finite inputs, no matter how large. In particular, for the largest representable finite positive double, it returns a value that's roughly 0.005:
> sin (1.7976931348623157E+308 :: Double)
4.961954789184062e-3
and for the largest representable finite negative double, it returns a value that's the negative of that:
> sin (-1.7976931348623157E+308 :: Double)
-4.961954789184062e-3
What's undoubtedly happening is that your input to sin has exceeded the finite range of Double. For example, the following double isn't actually representable as a finite double and is "rounded" to infinity:
> 1.7976931348623159E+308 :: Double
Infinity
If you feed such an infinite value to sin, you get NaN:
> sin (1.7976931348623159E+308 :: Double)
NaN
which will undoubtedly cause problems when fed to a function expecting finite numbers between -1 and 1. This can be "fixed" with min:
> min (sin (1.7976931348623159E+308 :: Double)) 1
1.0
but this fix is largely useless because you have a much bigger problem going on.
For numbers this large, the precision of a Double is on the order of plus or minus 1e292. That is, two "adjacent" representable finite doubles of this size are about 1e292 apart and the sin of two such numbers might as well be random numbers between -1 and 1, completely unrelated to any calculation you're trying to complete. Whatever you're trying to do with these numbers can't possibly be working as you intend.
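If the downstream function genuinely only accepts finite values in [-1, 1], one defensive option is to reject non-finite inputs before calling sin at all. A small sketch (safeSin is a hypothetical helper, not something from the question's code):

safeSin :: Double -> Maybe Double
safeSin z
  | isNaN z || isInfinite z = Nothing                        -- sin of Infinity or NaN would be NaN
  | otherwise               = Just (max (-1) (min 1 (sin z)))  -- clamp against any borderline result

But as noted above, if the inputs have reached this magnitude the results are numerically meaningless anyway, so the real fix belongs upstream.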
It seems like this is a floating point error; see this similar post. So for very large values, the sin function is returning a value slightly above 1, due to rounding errors.
To solve your problem, I would cap the return value at 1. Specifically, return min 1 (sin z) instead of just sin z directly.
Edit: replaced max with min.

haskell optimized fill of max value

I've started building some small real-world things in Haskell to get into a routine. I need a piece of code that takes a value A (a maximum) and a list of values (x:xs), and returns a list of values whose sum equals or is lower than A. What I have now is below, but it just keeps fitting numbers until full; it will not return the most optimal sequence.
fill :: Int -> [Int] -> Int
fill max (x:xs) = if x < max then fill (max - x) xs else fill max xs
fill max [] = max
I think I would need to write a function that takes the first value of the list, traverses the rest of the list for the most optimal addition, adds that to another list, and keeps doing that until it reaches the smallest remainder or a match with max. I guess it might be simple for you, but it has proven a tough nut to crack for me.
Edit: if I sort the value list first, it seems a little simpler, since I can discard values that blow past max.
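What is being described is essentially the subset-sum / knapsack problem: pick the subsequence whose sum gets as close to the maximum as possible without exceeding it. A brute-force sketch under that reading (bestFill is a hypothetical name; it assumes cap >= 0 and a small input list, since trying every subsequence is exponential):

import Data.List (subsequences, maximumBy)
import Data.Ord (comparing)

-- Return the subsequence whose sum is as large as possible without exceeding
-- the cap. The empty subsequence guarantees at least one valid candidate.
bestFill :: Int -> [Int] -> [Int]
bestFill cap xs = maximumBy (comparing sum) [ys | ys <- subsequences xs, sum ys <= cap]

For larger lists, a dynamic-programming formulation over the achievable sums would be the usual next step.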

How to get n digits of a Haskell Double?

I want to have a function digits :: Double -> Int -> Double that rounds a Double to n decimal places (it must round!) and returns a new Double with only the wanted digits.
Example: digits 1.2345 3 -> 1.235
I could implement this using strings, but I think there is a better solution.
Any idea how to implement this function?
Thanks in advance!
One relatively simple way is to multiply by an appropriate power of ten and round, then divide away the power of ten. For example:
digits d n = fromInteger (round (d * 10^n)) / 10^n
In ghci:
> digits 1.2346 3
1.235
> digits 1.2344 3
1.234
> digits 1.2345 3
1.234
The last example shows that Haskell uses banker's rounding by default, so it doesn't quite meet your spec -- but perhaps it is close enough to be useful anyway. It is easy to implement other rounding variants (or find them on Hackage) if you really need them.
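For instance, a round-half-up variant can be written by adding 0.5 and flooring instead of using the Prelude's round (digitsHalfUp is an illustrative name; binary representation error in the scaled value can still push borderline cases either way):

digitsHalfUp :: Double -> Int -> Double
digitsHalfUp d n = fromInteger (floor (d * 10^n + 0.5)) / 10^n  -- round half up on the scaled value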

string Rabin-Karp elementary number notations

I am reading about string algorithms in Introduction to Algorithms by Cormen et al.
Following is text about some elementary number theoretic notations.
Note: in the text below, read == as the modular-equivalence relation (≡).
Given a well-defined notion of the remainder of one integer when divided by another, it is convenient to provide special notation to indicate equality of remainders. If (a mod n) = (b mod n), we write a == b (mod n) and say that a is equivalent to b, modulo n. In other words, a == b (mod n) if a and b have the same remainder when divided by n. Equivalently, a == b (mod n) if and only if n | (b - a).
For example, 61 == 6 (mod 11). Also, -13 == 22 == 2 (mod 5).
The integers can be divided into n equivalence classes according to their remainders modulo n. The equivalence class modulo n containing an integer a is
[a]n = {a + kn : k ∈ Z}.
For example, [3]7 = {. . . , -11, -4, 3, 10, 17, . . .}; other denotations for this set are [-4]7 and [10]7.
Writing a belongs to [b]n is the same as writing a == b (mod n). The set of all such equivalence classes is
Zn = {[a]n : 0 <= a <= n - 1}.    (Eq. 1)
My question about the above text: in equation 1 it is mentioned that "a" should be between 0 and n - 1, but in the example it is given as -4, which is not between 0 and 6. Why?
In addition to the above, it is mentioned that the Rabin-Karp algorithm uses the equivalence of two numbers modulo a third number. What does this mean?
I'll try to nudge you in the right direction, even though it's not about programming.
The example with -4 in it is an example of an equivalence class, which is a set of all numbers equivalent to a given number. Thus, in [3]7, all numbers are equivalent (modulo 7) to 3, and that includes -4 as well as 17 and 710 and an infinity of others.
You could also name the same class [10]7, because every number that is equivalent (modulo 7) to 3 is at the same time equivalent (modulo 7) to 10.
The last definition gives the set of all distinct equivalence classes, and states that modulo 7 there are exactly 7 of them, and that they can be produced by the numbers 0 to 6. You could also say
Zn = {[a]n : n <= a < 2 * n}
and the meaning will remain the same, since [0]7 is the same thing as [7]7, and [6]7 is the same thing as [13]7.
This is not a programming question, but never mind...
it is mentioned that "a" should be between 0 and n-1, but in the example it is given as -4, which is not between 0 and 6, why?
Because [-4]n is the same equivalence class as [x]n for some x such that 0 <= x < n. So equation 1 takes advantage of that fact to "neaten up" the definition and make all the listed classes distinct.
In addition to above it is mentioned that for Rabin-Karp algorithm we use equivalence of two numbers modulo a third number? What does this mean?
The Rabin-Karp algorithm requires you to calculate a hash value for the substring you are searching for. When hashing, it is important to use a hash function that uses the whole of the available domain, even for quite small strings. If your hash is a 32-bit integer and you just add the successive Unicode values together, your hash will usually be quite small, resulting in lots of collisions.
So you need a function that can give you large answers. Unfortunately, this also exposes you to the possibility of integer overflow. Hence you use modulo arithmetic to keep the comparisons from being messed up by overflow.
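A small sketch of that idea (this is an illustrative polynomial rolling hash, not the exact scheme in CLRS; base and q are arbitrary example choices, e.g. hashMod 256 101 "needle"):

import Data.Char (ord)

-- Hash a string as a base-`base` number, reduced modulo a prime q so the
-- intermediate values never overflow a fixed-size integer.
hashMod :: Int -> Int -> String -> Int
hashMod base q = foldl step 0
  where step h c = (h * base + ord c) `mod` q

Two strings whose hashes agree modulo q are only candidate matches; Rabin-Karp still compares them character by character to rule out collisions.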

Given string s, find the shortest string t, such that, t^m=s

Given string s, find the shortest string t, such that, t^m=s.
Examples:
s="aabbb" => t="aabbb"
s="abab" => t = "ab"
How fast can it be done?
Of course, naively, for every m that divides |s|, I can test whether substring(s, 0, |s|/m)^m = s.
This gives a solution in O(d(n)·n) time, where n = |s| and d(n) is the number of divisors of n. Can it be done more efficiently?
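For concreteness, the naive divisor check just described is short to write down in Haskell (a sketch; shortestRepeater is an illustrative name):

-- Try every divisor d of n, shortest first, and return the first prefix t of
-- length d whose repetition rebuilds s; d = n always succeeds for non-empty s.
shortestRepeater :: String -> String
shortestRepeater "" = ""
shortestRepeater s  = head [ t | d <- [1 .. n], n `mod` d == 0
                               , let t = take d s
                               , concat (replicate (n `div` d) t) == s ]
  where n = length s

This matches the examples: shortestRepeater "abab" gives "ab", and shortestRepeater "aabbb" gives "aabbb".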
This is the problem of computing the period of a string. Knuth, Morris and Pratt's sequential string matching algorithm is a good place to get started. This is in a paper entitled "Fast Pattern Matching in Strings" from 1977.
If you want to get fancy with it, then check out the paper "Finding All Periods and Initial Palindromes of a String in Parallel" by Breslauer and Galil in 1991. From their abstract:
An optimal O(log log n) time CRCW-PRAM algorithm for computing all
periods of a string is presented. Previous parallel algorithms compute
the period only if it is shorter than half of the length of the
string. This algorithm can be used to find all initial palindromes of
a string in the same time and processor bounds. Both algorithms are
the fastest possible over a general alphabet. We derive a lower bound
for finding palindromes by a modification of a previously known lower
bound for finding the period of a string [3]. When p processors are
available the bounds become Θ(⌈n/p⌉ + log log_⌈1+p/n⌉ 2p).
I really like this thing called the z-algorithm: http://www.utdallas.edu/~besp/demo/John2010/z-algorithm.htm
For every position it calculates the length of the longest substring starting there that is also a prefix of the whole string (in linear time, of course).
a a b c a a b x a a a z
1 0 0 3 1 0 0 2 2 1 0
Given this "z-table" it is easy to find all strings that can be exponentiated to the whole thing. Just check for all positions if pos+z[pos] = n.
In our case:
a b a b
0 2 0
Here pos = 2 gives you 2+z[2] = 4 = n hence the shortest string you can use is the one of length 2.
This will even allow you to find cases where only a prefix of the exponentiated string matches, like:
a b c a
0 0 1
Here (abc)^2 can be cut down to your original string. But of course, if you don't want matches like this, just go over the divisors only.
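The check described here (pos + z[pos] = n, restricted to divisors) is equivalent to saying that the suffix starting at pos is a prefix of s, i.e. that s has period pos. A direct sketch of that condition for a non-empty s (shortestPeriod is an illustrative name; this naive version rescans for every divisor, so it does not achieve the z-algorithm's linear time):

import Data.List (isPrefixOf)

-- Smallest p dividing n such that dropping the first p characters leaves a
-- prefix of s; then take p s is the shortest t with t^m = s.
shortestPeriod :: String -> Int
shortestPeriod s = head [ p | p <- [1 .. n], n `mod` p == 0, drop p s `isPrefixOf` s ]
  where n = length s

For "abab" this yields 2, matching the example above.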
Yes you can do it in O(|s|) time.
You can search for a "target" string of length n in a "source" string of length m in O(n+m) time. Build a solution based on that.
Let both source and target be s. An additional constraint is that position 1 and any positions in the source that do not divide |s| are not valid starting positions for the match. Of course the search per se will always fail. But if there is a partial match and you have reached the end of the source string, then you have a solution to the original problem.
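One concrete way to realize the linear-time idea is via the KMP failure (border) function rather than an explicit self-search: the longest proper border of s gives the smallest period p = n - border(n), and s = t^m for t = take p s exactly when p divides n. A sketch of that route (shortestT is an illustrative name; the lazy-array construction is a standard Haskell idiom for the KMP table):

import Data.Array

shortestT :: String -> String
shortestT "" = ""
shortestT s
  | n `mod` p == 0 = take p s
  | otherwise      = s
  where
    n = length s
    a = listArray (1, n) s
    -- border ! i = length of the longest proper border of the prefix of length i
    border :: Array Int Int
    border = listArray (1, n) (0 : [ extend (border ! (i - 1)) i | i <- [2 .. n] ])
    extend k i
      | k > 0 && a ! (k + 1) /= a ! i = extend (border ! k) i
      | a ! (k + 1) == a ! i          = k + 1
      | otherwise                     = 0
    p = n - border ! n

For example, shortestT "abab" gives "ab" and shortestT "aabbb" gives "aabbb".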
A modification of Boyer-Moore could possibly handle this in O(n), where n is the length of s:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
