I wonder if my complexity analysis (T worst case for n elements/nodes) is correct for the following function leaves in Haskell (Note: wurzel = root; C = constant factor)
--abstract data type for bin trees
data Bintree el = Empty
                | Node {left :: Bintree el, root :: el, right :: Bintree el}
                deriving Show
--extract all leaves of a given Bintree (output: list)
leaves :: Bintree el -> [el]
leaves Empty = []
leaves (Node Empty root Empty) = [root]
leaves (Node left root right) = leaves left ++ leaves right
No, there are many mistakes. Here are a few of the more glaring ones:
When you write T(n/2)+T(n/2)+T(n/4)+T(n/4)+..., you seem to be assuming that half of the nodes are in the left branch and half are in the right. That's not always true -- some trees are balanced, but some certainly are not.
Even if the tree is balanced, there are not only 2 subtrees of size n/4 -- there are 4. Similarly there are 8 subtrees of size n/8, not 2.
The correct expression to describe "dividing n by 2 i times" is n/(2^i), not n/(i^2). Additionally, the above comment about balancing notwithstanding, you would want to keep dividing until you reached just one leaf, so the correct base case of the ellipsis is T(n/n), not either one of T(n/(2^n)) or T(n/(n^2)).
If you repeatedly divide by two, and add the results, as in n + n/2 + n/4 + n/8 + n/16 + ..., forever, you get 2*n, not log_2(n).
Anyway, that doesn't apply, because you are not adding multiples of n. T(n) + T(n/2) + T(n/4) + T(n/8) + T(n/16) + ... is not necessarily related in any special way to T(2*n) (nor to T(log_2(n))). For example, imagine if f(n) = 1. Then the sum f(1) + f(1/2) + f(1/4) + f(1/8) + f(1/16) + ... = 1 + 1 + 1 + 1 + 1 + ... diverges, even though f(1 + 1/2 + 1/4 + 1/8 + 1/16 + ...) = f(2) = 1.
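To make the last two points concrete, here is a small Python sketch. The function names are made up for this illustration and are not part of the question. It first sums n + n/2 + n/4 + ..., and then sums a constant cost function over the same shrinking arguments.

```python
def halving_sum(n, terms):
    """Partial sum n + n/2 + n/4 + ... with the given number of terms."""
    return sum(n / 2**i for i in range(terms))


def constant_cost(_size):
    """Hypothetical cost function with f(x) = 1 for every input size."""
    return 1


def summed_costs(terms):
    """f(1) + f(1/2) + f(1/4) + ... for the constant cost function above."""
    return sum(constant_cost(1 / 2**i) for i in range(terms))


# The halving sum approaches 2*n, no matter how many terms are added.
print(halving_sum(1024, 60))
# The summed costs just keep growing with the number of terms.
print(summed_costs(10), summed_costs(1000))
```

The first sum settles near 2*n, while the second one grows without bound as more terms are added, which is why the two cannot be swapped for each other.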
Related
Brocard's problem is n! + 1 = m^2. The solutions to this problem are pairs of integers (n, m) called Brown numbers, such as (4,5), of which only three are known.
A very literal implementation of Brocard's problem:
import math

def brocard(n, m):
    if math.factorial(n) + 1 == m**2:
        return (n, m)
    else:
        return

a = 10000
for n in range(a):
    for m in range(a):
        b = brocard(n, m)
        if b is not None:
            print(b)
The time complexity of this should be O(n^2) because of the nested for loops with differing variables and the complexity of whatever math.factorial algorithm is (apparently divide-and-conquer). Is there any way to improve upon O(n^2)?
There are other interpretations on SO like this. How does the time complexity of this compare with my implementation?
Your algorithm is O(n^3).
You have two nested loops, and inside you use factorial(), having O(n) complexity itself.
Your algorithm tests all (n,m) combinations, even those where factorial(n) and m^2 are far apart, e.g. n=1 and m=10000.
You always recompute the factorial(n) deep inside the loop, although it's independent of the inner loop variable m. So, it could be moved outside of the inner loop.
And, instead of always computing factorial(n) from scratch, you could do that incrementally. Whenever you increment n by 1, you can multiply the previous factorial by n.
A different, better approach would be not to use nested loops, but to always keep n and m in a number range so that factorial(n) is close to m^2, to avoid checking number pairs that are vastly off. We can do this by deciding which variable to increment next. If the factorial is smaller, then the next brocard pair needs a bigger n. If the square is smaller, we need a bigger m.
In pseudo code, that would be
n = 1; m = 1; factorial = 1;
while n < 10000 and m < 10000
    if factorial + 1 == m^2
        found a brocard pair
        // the next brocard pair will have different n and m,
        // so we can increment both
        n = n + 1
        factorial = factorial * n
        m = m + 1
    else if factorial + 1 < m^2
        // n is too small for the current m
        n = n + 1
        factorial = factorial * n
    else
        // m is too small for the given n
        m = m + 1
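In each loop iteration, we increment n or m (or both), so with both limits at 10000 we can have at most 20000 iterations. There is no inner loop in the algorithm, so the number of iterations is linear in the limits, i.e. O(n), not counting the cost of multiplying ever larger integers. So, this should be fast enough for n and m up to the millions range.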
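The pseudo code translates almost line by line into Python. The following is only a sketch; the function name brocard_pairs and its limit parameter are chosen here for illustration.

```python
def brocard_pairs(limit):
    """Return all (n, m) with n! + 1 == m*m and n, m < limit."""
    pairs = []
    n, m, factorial = 1, 1, 1
    while n < limit and m < limit:
        target = factorial + 1
        square = m * m
        if target == square:
            # Found a pair; the next one needs a bigger n and a bigger m.
            pairs.append((n, m))
            n += 1
            factorial *= n
            m += 1
        elif target < square:
            # n is too small for the current m.
            n += 1
            factorial *= n
        else:
            # m is too small for the current n.
            m += 1
    return pairs


print(brocard_pairs(10000))  # [(4, 5), (5, 11), (7, 71)]
```

Python integers have arbitrary precision, so the factorial never overflows, but the multiplications do get slower as the numbers grow.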
P.S. There are still some optimizations possible.
Factorials (after n=1, known to have no brocard pair) are always even numbers, so m^2 must be odd to satisfy the brocard condition, meaning that we can always increment m by 2, skipping the even number in between.
For larger n values, the factorial increases much faster than the square. So, instead of incrementing m until its square reaches the factorial+1 value, we could recompute the next plausible m as integer square root of factorial+1.
Or, using the square root approach, just compute the integer square root of factorial(n)+1 for each n, and check whether its square equals factorial(n)+1, without any incremental steps for m.
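A sketch of that last variant, assuming Python 3.8 or later for math.isqrt (the function name brocard_pairs_isqrt is made up for this example):

```python
import math


def brocard_pairs_isqrt(max_n):
    """Return all (n, m) with n! + 1 == m*m for 1 <= n <= max_n."""
    pairs = []
    factorial = 1
    for n in range(1, max_n + 1):
        factorial *= n                # incremental factorial, no recomputation
        target = factorial + 1
        m = math.isqrt(target)        # exact integer square root, no floats
        if m * m == target:
            pairs.append((n, m))
    return pairs


print(brocard_pairs_isqrt(100))  # [(4, 5), (5, 11), (7, 71)]
```

Here m is never stored between iterations, so only n drives the loop.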
I constructed a finite automaton for the language L of all strings made of the symbols 0, 1 and 2 (Σ = {0, 1, 2}) where the last symbol is not smaller than the first symbol. E.g., the strings 0, 2012, 01231 and 102 are in the language, but 10, 2021 and 201 are not in the language.
Then from that I built a GNFA so I can convert it to an RE.
My RE looks like this:
(0(0+1+2)*)(1(0(1+2)+1+2)*)(2((0+1)2+2)*)
I have no idea if this is correct, as I think I understand RE but not entirely sure.
Could someone please tell me if it’s correct and if not why?
There is a general method to convert any DFA into a regular expression, and it is probably what you should be using to solve this homework problem.
For your attempt specifically, you can tell whether an RE is incorrect by finding a word that should be in the language, but that your RE doesn't accept, or a word that shouldn't be in the language that the RE does accept. In this case, the string 1002 should be in the language, but the RE doesn't match it.
There are two primary reasons why this string isn't matched. The first is that there should be a union rather than a concatenation between the three major parts of the language (words starting with 0, 1 and 2, respectively):
(0(0+1+2)*) (1(0(1+2)+1+2)*) (2((0+1)2+2)*) // wrong
(0(0+1+2)*) + (1(0(1+2)+1+2)*) + (2((0+1)2+2)*) // better
The second problem is that in the 1 and 2 cases, the digits smaller than the starting digit need to be repeatable:
(1(0 (1+2)+1+2)*) // wrong
(1(0*(1+2)+1+2)*) // better
If you do both of those things, the RE will be correct. I'll leave it as an exercise for you to follow that step for the 2 case.
The next thing you can try is find a way to make the RE more compact:
(1(0*(1+2)+1+2)*) // verbose
(1(0*(1+2))*) // equivalent, but more compact
This last step is just a matter of preference. You don't need the trailing +1+2 because 0* can be of zero length, so 0*(1+2) covers the +1+2 case.
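One way to gain confidence in an RE like this is to compare it against the definition of the language on all short strings. The sketch below does that with Python's re module, writing + as |. The 2 case in CANDIDATE is filled in by analogy with the 1 case and is an assumption of this example, as are all the names.

```python
import re
from itertools import product


def in_language(word):
    """Last symbol is not smaller than the first ('0' < '1' < '2')."""
    return word[-1] >= word[0]


# The asker's attempt (concatenation of the three parts).
ORIGINAL = r"(0(0|1|2)*)(1(0(1|2)|1|2)*)(2((0|1)2|2)*)"
# Union of the three parts, with repeatable smaller digits, compact form.
CANDIDATE = r"0(0|1|2)*|1(0*(1|2))*|2((0|1)*2)*"


def mismatches(pattern, max_len):
    """Words up to max_len on which the pattern and the language disagree."""
    bad = []
    for length in range(1, max_len + 1):
        for symbols in product("012", repeat=length):
            word = "".join(symbols)
            if bool(re.fullmatch(pattern, word)) != in_language(word):
                bad.append(word)
    return bad


print("1002" in mismatches(ORIGINAL, 4))  # True
print(mismatches(CANDIDATE, 6))           # []
```

An empty list is not a proof, but a single mismatch is a counterexample.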
You can use an algorithm but this DFA might be easy enough to convert as a one-off.
First, note that if the first symbol seen in the initial state is 0, you transition to state A and remain there. A is accepting. This means any string beginning with 0 is accepted. Thus, our regular expression might as well have a term like 0(0+1+2)* in it.
Second, note that if the first symbol seen in the initial state is 1, you transition to state B and remain in states B and D from that point on. You only leave B if you see 0 and you stay out of B as long as you keep seeing 0. The only way to end on D is if the last symbol you saw was 0. Therefore, strings beginning with 1 are accepted if and only if the strings don't end in 0. The single-symbol string 1 qualifies, and every longer accepted string ends in 1 or 2, so we can add the terms 1 + 1(0+1+2)*(1+2) to our regular expression to cover these cases.
Third, note that if the first symbol seen in the initial state is 2, you transition to state C and remain in states C and E from that point on. You leave state C if you see anything but 2 and stay out of C until you see a 2 again. The only way to end up on C is if the last symbol you saw was 2. Therefore, strings beginning with 2 are accepted if and only if the strings end in 2. The single-symbol string 2 qualifies as well, so we can add the terms 2 + 2(0+1+2)*2 to our regular expression to cover these cases.
Finally, we see that there are no other cases to consider; our terms cover all cases and the union of them fully describes our language:
0(0+1+2)* + 1 + 1(0+1+2)*(1+2) + 2 + 2(0+1+2)*2
It was easy to just write out the answer here because this DFA is sort of like three simple DFAs put together with a start state. More complicated DFAs might be easier to convert to REs using algorithms that don't require you understand or follow what the DFA is doing.
Note that if the start state is accepting (mentioned in a comment on another answer) the RE changes as follows:
e + 0(0+1+2)* + 1 + 1(0+1+2)*(1+2) + 2 + 2(0+1+2)*2
Basically, we just tack the empty string onto it since it is not already generated by any of the other parts of the aggregate expression.
You have the equivalent of what is known as a right-linear system. It's right-linear because the variables occur on the right hand sides only to the first degree and only on the right-hand sides of each term. The system that you have may be written - with a change in labels from 0,1,2 to u,v,w - as
S ≥ u A + v B + w C
A ≥ 1 + (u + v + w) A
B ≥ 1 + u D + (v + w) B
C ≥ 1 + (u + v) E + w C
D ≥ u D + (v + w) B
E ≥ (u + v) E + w C
The underlying algebra is known as a Kleene algebra. It is defined by the following identities that serve as its fundamental properties
(xy)z = x(yz), x1 = x = 1x,
(x + y) + z = x + (y + z), x + 0 = x = 0 + x,
y0z = 0, w(x + y)z = wxz + wyz,
x + y = y + x, x + x = x,
with a partial ordering relation defined by
x ≤ y ⇔ y ≥ x ⇔ ∃z(x + z = y) ⇔ x + y = y
With respect to this ordering relation, all finite subsets have least upper bounds, including the following
0 = ⋁ ∅, x + y = ⋁ {x, y}
The sum operator "+" is the least upper bound operator.
The system you have is a right-linear fixed point system, since it expresses the variables on the left as a (right-linear) function, as given on the right, of the variables. The object being specified by the system is the least solution with respect to the ordering; i.e. the least fixed point solution; and the regular expression sought out is the value that the main variable has in the least fixed point solution.
The last axiom(s) for Kleene algebras can be stated in any of a number of equivalent ways, including the following:
0* = 1
the least fixed point solution to x ≥ a + bx + xc is x = b* a c*.
There are other ways to express it. A consequence is that one has identities such as the following:
1 + a a* = a* = 1 + a* a
(a + b)* = a* (b a*)*
(a b)* a = a (b a)*
In general, right linear systems, such as the one corresponding to your problem may be written in vector-matrix form as 𝐪 ≥ 𝐚 + A 𝐪, with the least fixed point solution given in matrix form as 𝐪 = A* 𝐚. The central theorem of Kleene algebras is that all finite right-linear systems have least fixed point solutions; so that one can actually define matrix algebras over Kleene algebras with product and sum given respectively as matrix product and matrix sum, and that this algebra can be made into a Kleene algebra with a suitably-defined matrix star operation through which the least fixed point solution is expressed. If the matrix A decomposes into block form as
B C
D E
then the star A* of the matrix has the block form
(B + C E* D)* (B + C E* D)* C E*
(E + D B* C)* D B* (E + D B* C)*
So, what this is actually saying is that for a vector-matrix system of the form
x ≥ a + B x + C y
y ≥ b + D x + E y
the least fixed point solution is given by
x = (B + C E* D)* (a + C E* b)
y = (E + D B* C)* (D B* a + b)
The star of a matrix, if expressed directly in terms of its components, will generally be huge and highly redundant. For an n×n matrix, it has size O(n³) - cubic in n - if you allow for redundant sub-expressions to be defined by macros. Otherwise, if you in-line insert all the redundancy then I think it blows up to a highly-redundant mess that is exponential in n in size.
So, there's intelligence required and involved (literally meaning: AI) in finding or pruning optimal forms that avoid the blow-up as much as possible. That's a non-trivial job for any purported matrix solver and regular expression synthesis compiler.
A heuristic, for your system, is to solve for the variables that don't have a "1" on the right-hand side and in-line substitute the solutions - and to work from bottom-up in terms of the dependency chain of the variables. That would mean starting with D and E first
D ≥ u* (v + w) B
E ≥ (u + v)* w C
In-line substitute into the other inequations
S ≥ u A + v B + w C
A ≥ 1 + (u + v + w) A
B ≥ 1 + u u* (v + w) B + (v + w) B
C ≥ 1 + (u + v) (u + v)* w C + w C
Apply Kleene algebra identities (e.g. x x* y + y = x* y)
S ≥ u A + v B + w C
A ≥ 1 + (u + v + w) A
B ≥ 1 + u* (v + w) B
C ≥ 1 + (u + v)* w C
Solve for the next layer of dependency up: A, B and C:
A ≥ (u + v + w)*
B ≥ (u* (v + w))*
C ≥ ((u + v)* w)*
Apply some more Kleene algebra (e.g. (x* y)* = 1 + (x + y)* y) to get
B ≥ 1 + N (v + w)
C ≥ 1 + N w
where, for convenience we set N = (u + v + w)*. In-line substitute at the top-level:
S ≥ u N + v (1 + N (v + w)) + w (1 + N w).
The least fixed point solution, in the main variable S, is thus:
S = u N + v + v N (v + w) + w + w N w.
where
N = (u + v + w)*.
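As a sanity check, the solution for S can be translated back to the symbols 0, 1, 2 and compared with the language definition on all short strings. This is only a sketch; the pattern below is my transcription of S with u, v, w replaced by 0, 1, 2 and + written as |.

```python
import re
from itertools import product

N = r"(0|1|2)*"
# S = u N + v + v N (v + w) + w + w N w
S = rf"0{N}|1|1{N}(1|2)|2|2{N}2"


def in_language(word):
    """Last symbol is not smaller than the first ('0' < '1' < '2')."""
    return word[-1] >= word[0]


def disagreements(pattern, max_len):
    """Words up to max_len where the pattern and the language differ."""
    return [
        word
        for length in range(1, max_len + 1)
        for word in map("".join, product("012", repeat=length))
        if bool(re.fullmatch(pattern, word)) != in_language(word)
    ]


print(disagreements(S, 6))  # []
```

Note that the terms v and w on their own are what accept the one-symbol strings 1 and 2.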
As you can already see, even with this simple example, there's a lot of chess-playing to navigate through the system to find an optimally-pruned solution. So, it's certainly not a trivial problem. What you're essentially doing is synthesizing a control-flow structure for a program in a structured programming language from a set of goto's ... essentially the core process of reverse-compiling from assembly language to a high level language.
One measure of optimization is that of minimizing the loop-depth - which here means minimizing the depth of the stars or the star height. For example, the expression x* (y x*)* has star-height 2 but reduces to (x + y)*, which has star height 1. Methods for reducing star-height come out of the research by Hashiguchi and his resolution of the minimal star-height problem. His proof and solution (dating, I believe, from the 1980's or 1990's) is complex and to this day the process still goes on of making something more practical of it and rendering it in more accessible form.
Hashiguchi's formulation was cast in the older 1950's and 1960's formulation, predating the axiomatization of Kleene algebras (which was in the 1990's), so to date, nobody has rewritten his solution in entirely algebraic form within the framework of Kleene algebras anywhere in the literature ... as far as I'm aware. Whoever accomplishes this will have, as a result, a core element of an intelligent regular expression synthesis compiler, but also of a reverse-compiler and programming language synthesis de-compiler. Essentially, with something like that on hand, you'd be able to read code straight from binary and the lid will be blown off the world of proprietary systems. [Bite tongue, bite tongue, mustn't reveal secret yet, must keep the ring hidden.]
Is there any direct formula or system to find the number of zeros in a given range? Suppose two integers M and N are given. What do I have to do to find the total number of zero digits in the numbers of this range?
Let M = 1234567890 & N = 2345678901
And answer is : 987654304
Thanks in advance .
Reexamining the Problem
Here is a simple solution in Ruby, which inspects each integer from the interval [m,n], determines the string of its digits in the standard base 10 positional system, and counts the occurring 0 digits:
def brute_force(m, n)
  if m > n
    return 0
  end
  z = 0
  m.upto(n) do |k|
    z += k.to_s.count('0')
  end
  z
end
If you run it in an interactive Ruby shell you will get
irb> brute_force(1,100)
=> 11
which is fine. However using the interval bounds from the example in the question
m = 1234567890
n = 2345678901
you will find that this takes considerable time. On my machine it runs for much longer than a couple of seconds; so far I have had to cancel it every time.
So the real question is not only to come up with the correct zero counts but to do it faster than the above brute force solution.
Complexity: Running Time
The brute force solution needs to perform n-m+1 times searching the base 10 string for the number k, which is of length floor(log_10(k))+1, so it will not use more than
O(n (log(n)+1))
string digit accesses. The slow example had an n of roughly n = 10^9.
Reducing Complexity
Yiming Rong's answer is a first attempt to reduce the complexity of the problem.
If the function for calculating the number of zeros regarding the interval [m,n] is F(m,n), then it has the property
F(m,n) = F(1,n) - F(1,m-1)
so that it suffices to look for a most likely simpler function G with the property
G(n) = F(1,n).
Divide and Conquer
Coming up with a closed formula for the function G is not that easy. E.g.
the interval [1,1000] contains 192 zeros, but the interval [1001,2000] contains 300 zeros, because a case like k = 99 in the first interval would correspond to k = 1099 in the second interval, which yields another zero digit to count. k=7 would show up as 1007, yielding two more zeros.
What one can try is to express the solution for some problem instance in terms of solutions to simpler problem instances. This strategy is called divide and conquer in computer science. It works if at some complexity level it is possible to solve the problem instance and if one can deduce the solution of a more complex problem from the solutions of the simpler ones. This naturally leads to a recursive formulation.
E.g. we can formulate a solution for a restricted version of G, which is only working for some of the arguments. We call it g and it is defined for 9, 99, 999, etc. and will be equal to G for these arguments.
It can be calculated using this recursive function:
# zeros for 1..n, where n = (10^k)-1: 0, 9, 99, 999, ..
def g(n)
  if n <= 9
    return 0
  end
  n2 = (n - 9) / 10
  return 10 * g(n2) + n2
end
Note that this function is much faster than the brute force method: To count the zeros in the interval [1, 10^9-1], which is comparable to the m from the question, it just needs 9 calls, its complexity is
O(log(n))
Again note that this g is not defined for arbitrary n, only for n = (10^k)-1.
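For readers who prefer Python, here is a direct port of g together with a brute force check. It is a sketch; zeros_up_to is a helper name introduced only for this comparison.

```python
def zeros_up_to(n):
    """Brute force: zero digits in the decimal forms of 1..n."""
    return sum(str(k).count("0") for k in range(1, n + 1))


def g(n):
    """Zero digits in 1..n, valid only for n = 10**k - 1 (9, 99, 999, ..)."""
    if n <= 9:
        return 0
    n2 = (n - 9) // 10          # 999 -> 99, 99 -> 9, ...
    return 10 * g(n2) + n2


for n in (9, 99, 999, 9999):
    print(n, g(n), zeros_up_to(n))
```

Both columns agree for these arguments; for example g(999) is 189, and adding the three zeros of 1000 gives the 192 mentioned above.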
Derivation of g
It starts with finding the recursive definition of the function h(n),
which counts zeros in the numbers from 1 to n = (10^k) - 1, if the decimal representation has leading zeros.
Example: h(999) counts the zero digits for the number representations:
001..009
010..099
100..999
The result would be h(999) = 297.
Using k = floor(log10(n+1)), k2 = k - 1, n2 = (10^k2) - 1 = (n-9)/10 the function h turns out to be
h(n) = 9 [k2 + h(n2)] + h(n2) + n2 = 9 k2 + 10 h(n2) + n2
with the initial condition h(0) = 0. This allows us to formulate g as
g(n) = 9 [k2 + h(n2)] + g(n2)
with the initial condition g(0) = 0.
From these two definitions we can define the difference d between h and g as well, again as a recursive function:
d(n) = h(n) - g(n) = h(n2) - g(n2) + n2 = d(n2) + n2
with the initial condition d(0) = 0. Trying some examples leads to a geometric series, e.g. d(9999) = d(999) + 999 = d(99) + 99 + 999 = d(9) + 9 + 99 + 999 = 0 + 9 + 99 + 999 = (10^0)-1 + (10^1)-1 + (10^2)-1 + (10^3)-1 = (10^4 - 1)/(10-1) - 4. This gives the closed form
d(n) = n/9 - k
This allows us to express g in terms of g only:
g(n) = 9 [k2 + h(n2)] + g(n2) = 9 [k2 + g(n2) + d(n2)] + g(n2) = 9 k2 + 9 d(n2) + 10 g(n2) = 9 k2 + n2 - 9 k2 + 10 g(n2) = 10 g(n2) + n2
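The recursions for h and the closed form for d can be checked numerically. The sketch below is a Python transcription of the formulas above; h_brute and d_closed are names made up for this check.

```python
def h(n):
    """Zeros in 1..n with leading zeros, for n = 10**k - 1."""
    if n <= 9:
        return 0
    k2 = len(str(n)) - 1
    n2 = (n - 9) // 10
    return 9 * k2 + 10 * h(n2) + n2


def g(n):
    """Zeros in 1..n without leading zeros, for n = 10**k - 1."""
    if n <= 9:
        return 0
    n2 = (n - 9) // 10
    return 10 * g(n2) + n2


def h_brute(n):
    """Brute force version of h: pad every number to the width of n."""
    width = len(str(n))
    return sum(str(x).zfill(width).count("0") for x in range(1, n + 1))


def d_closed(n):
    """Closed form d(n) = n/9 - k, where k is the number of digits of n."""
    return n // 9 - len(str(n))


for n in (9, 99, 999, 9999):
    print(n, h(n), h_brute(n), h(n) - g(n), d_closed(n))
```

For n = 999 this gives h = 297 and d = 108, matching 297 - 189.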
Derivation of G
Using the above definitions and naming the k digits of the representation q_k, q_k2, .., q_2, q_1 we first extend h into H:
H(q_k q_k2..q_1) = q_k [k2 + h(n2)] + r (k2-kr) + H(q_kr..q_1) + n2
with initial condition H(q_1) = 0 for q_1 <= 9.
Note the additional definition r = q_kr..q_1. To understand why it is needed look at the example H(901), where the next level call to H is H(1), which means that the digit string length shrinks from k=3 to kr=1, needing an additional padding with r (k2-kr) zero digits.
Using this, we can extend g to G as well:
G(q_k q_k2..q_1) = (q_k-1) [k2 + h(n2)] + k2 + r (k2-kr) + H(q_kr..q_1) + g(n2)
with initial condition G(q_1) = 0 for q_1 <= 9.
Note: It is likely that one can simplify the above expressions like in case of g above. E.g. trying to express G just in terms of G and not using h and H. I might do this in the future. The above is already enough to implement a fast zero calculation.
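The formulas for H and G can be turned into code fairly directly. The following Python sketch is my own transcription, not the linked source; the helper split and the wrapper zeros_between are names introduced here, and m >= 1 is assumed.

```python
def g(n):
    """Zeros in 1..n for n = 10**k - 1, no leading zeros."""
    if n <= 9:
        return 0
    n2 = (n - 9) // 10
    return 10 * g(n2) + n2


def h(n):
    """Zeros in 1..n for n = 10**k - 1, padded with leading zeros."""
    if n <= 9:
        return 0
    k2 = len(str(n)) - 1
    n2 = (n - 9) // 10
    return 9 * k2 + 10 * h(n2) + n2


def split(n):
    """Return q_k, r, k2, kr, n2 for a number with at least two digits."""
    digits = str(n)
    k2 = len(digits) - 1
    q_k = int(digits[0])          # leading digit
    r = int(digits[1:])           # remaining digits as a number
    kr = len(str(r))              # digit count of r without padding
    n2 = 10**k2 - 1
    return q_k, r, k2, kr, n2


def H(n):
    """Zeros in 1..n, every number padded to the width of n."""
    if n <= 9:
        return 0
    q_k, r, k2, kr, n2 = split(n)
    return q_k * (k2 + h(n2)) + r * (k2 - kr) + H(r) + n2


def G(n):
    """Zeros in 1..n without leading zeros, for any n >= 0."""
    if n <= 9:
        return 0
    q_k, r, k2, kr, n2 = split(n)
    return (q_k - 1) * (k2 + h(n2)) + k2 + r * (k2 - kr) + H(r) + g(n2)


def zeros_between(m, n):
    """F(m, n) = G(n) - G(m - 1), assuming 1 <= m."""
    return G(n) - G(m - 1) if m <= n else 0


print(zeros_between(1, 100))                   # 11
print(zeros_between(1234567890, 2345678901))   # 987654304
```

Each call strips one leading digit, so the running time grows with the number of digits rather than with the size of the interval.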
Test Result
recursive(1234567890, 2345678901) =
987654304
expected:
987654304
success
See the source and log for details.
Update: I changed the source and log according to the more detailed problem description from that contest (allowing 0 as input, handling invalid inputs, 2nd larger example).
You can use a standard approach to find m = [1, M-1] and n = [1, N], then [M, N] = n - m.
Standard approaches are easily available: Counting zeroes.
I need to find the time complexity (in terms of theta) of this function:
int x = 0;
for (int i=1; i < n ; i++) {
    for (double j=i; j <= n ; j+=sqrt(i)) {
        ++x;
    }
}
I know that the outer loop does n-1 iterations and the inner loop does (n-i)/sqrt(i) iterations so I need to calculate sigma of i=1 to n-1 of (n-i)/sqrt(i). Any idea how to do that?
EDIT:
Assume sqrt() runs in O(1).
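I don't know what sigma and theta mean, but sqrt is a constant time operation, so each pass through the inner loop body costs O(1) and sqrt itself doesn't matter in big O notation. The step sqrt(i) is at least 1, so the inner loop runs at most n-i+1 times, and (n-k) ~= n for k much less than n, so that is at most about n. You do this n-1 times for the outer loop, so n^2 is an upper bound, i.e. O(n^2). Note that this is only an upper bound: the step sqrt(i) grows with i, so j+=sqrt(i) is not the same as j+=1, and a tight bound needs the sum from the question.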
Addition:
As to your complicated series, I'm sure this is accurate, but it is not what we care about; we care about the order of the operation. So we need to show your series is O(n^2), i.e. bounded by K*n^2. So you have i + 2*i + ... + (n-1)*i + n*i, where i is constant, so we can factor it out, wrap it up in K, and are left with 1 + ... + n. This sum is dominated by n, i.e. as n gets large n ~= (n-1), and (n-1) ~= (n-2), which implies that (n-2) ~= n. Now this doesn't hold as we approach zero, but n is so large we can drop the first, say, million terms. So we are left with some function that looks like
C*(n-k)*n + c, where C, k, and c are all constant. Since we don't care about constants, only about growth as n grows, we can drop all these constants and just keep the n^2. Alternatively, you could show that your series is bounded by n^k*n where k goes to one as n approaches infinity, but a good logic argument is usually better. ~Ben
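For the sum from the question, comparing with integrals gives sum of (n-i)/sqrt(i) for i = 1..n-1 ≈ n * 2*sqrt(n) - (2/3) * n^(3/2) = (4/3) * n^(3/2), which suggests Θ(n^(3/2)) rather than n^2. The Python sketch below simulates the two loops and compares the counter with that estimate; count_iterations is a name made up here, and sqrt is assumed to be O(1) as in the question.

```python
import math


def count_iterations(n):
    """Simulate the two loops from the question and return x."""
    x = 0
    for i in range(1, n):
        step = math.sqrt(i)
        j = float(i)
        while j <= n:
            x += 1
            j += step
    return x


for n in (1000, 2000, 4000):
    estimate = (4 / 3) * n ** 1.5
    print(n, count_iterations(n), round(count_iterations(n) / estimate, 3))
```

The ratio stays close to 1 as n grows, and quadrupling n multiplies the count by roughly 8 (that is 4^1.5), not by 16 as n^2 growth would.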
I'm rather new to Haskell. The problem is to find the sum of all even Fibonacci numbers not greater than 4 million. I can't use lists.
If I understand correctly, the below solution is wrong, because it uses lists:
my_sum = sum $ filter even $ takeWhile (<= 4000000) fibs
Where fibs is the list of all Fibonacci numbers.
Somehow, I find it difficult not to think in Haskell in terms of lists. Could anyone guide me to a solution to this problem?
Regards
EDIT:
If anyone is interested, I've solved this problem. Here is the code (very clumsy-looking, but works nevertheless):
findsum threshold = findsum' 0 1 0 threshold

findsum' n1 n2 accu t
  | n2 > t    = accu
  | odd n2    = findsum' n2 n3 accu t
  | otherwise = findsum' n2 n3 accu2 t
  where
    n3 = n2 + n1
    accu2 = accu + n2
You might find it easier to build this in excel and then figure the code out from there. It is pretty easy to do this in excel. Just put 1 in the first cell and put 1 just below it. Then make every cell below that add the two above it. (ie, cell a3 contains =A1+A2). Make the next column contain only even values "ie, if(mod(a3,2)==0,a3,0)". Next, put your running sum in the third column. Based on that you should be able to come up with the recursive solution.
Another way is to start with the problem. You only want a total which screams for an accumulator.
sumFib :: Integer -> Integer
sumFib threshold = sumFib' 1 1 0 threshold
sumFib' :: Integer -> Integer -> Integer -> Integer -> Integer
sumFib' n1 n2 acc threshold
You can see the signatures of my functions above. I built a pretty front end that takes a threshold (4,000,000) to decide when to quit building fibonacci numbers. Then I pass this plus the first 2 fibonacci numbers and an accumulator into the worker function "sumFib'" which does the recursion. Voila...the answer, "4613732", without a list....
n1 is the n-1 fibonacci number and n2 is the n-2 fibonacci number.
Hope that helps.
EDIT: here is my full solution:
sumFib :: Integer -> Integer
sumFib threshold = sumFib' 1 1 0 threshold

sumFib' :: Integer -> Integer -> Integer -> Integer -> Integer
sumFib' n1 n2 acc threshold
  | n1 > threshold = acc
  | otherwise      = sumFib' (n2+n1) n1 newAcc threshold
  where newAcc = if n1 `mod` 2 == 0
                   then n1 + acc
                   else acc
You can do this without a list, with a recursive solution, using continuation-passing style.
BTW running through all the fibonacci numbers and filtering out the odd ones is the slow way to solve this problem.
Again, a non-example for how useful computers can be:
You can do this without a computer!
1st observation: Every third Fibo-number is even, the first even Fibo-number is F_3=2
Indeed: odd+odd=even; odd+even=odd; even+odd=odd, which already closes the circle
2nd observation: F_3 + F_6 + F_9 + ... + F_{3k} = 1/2 (F_{3k+2} - 1)
By Induction: F_3 = 2 = 1/2 (5 - 1) = 1/2 (F_5 - 1)
F_3 + F_6 + ... + F_{3k+3} = 1/2 (F_{3k+2} - 1) + F_{3k+3} = 1/2 (F_{3k+2} + 2F_{3k+3} -1) = 1/2 (F_{3k+4} + F_{3k+3} -1) = 1/2 (F_{3k+5} -1)
3rd observation: The largest even Fibo-number not greater than 4000000 is F_33 = 3524578 (the next even one, F_36 = 14930352, is already too big), so the sum has 11 summands.
4th observation: F_n = 1/sqrt(5) * (phi^n - (1-phi)^n)
Proof by Wikipedia
Now, we can put the parts together:
F_3 + F_6 + ... + F_33 = 1/2 (F_35 - 1) = 1/2 1/sqrt(5) (phi^35 - (1-phi)^35) - 1/2 = int(1/2 1/sqrt(5) phi^35) = 4613732
Here int is the integer part. The last step works, because -1 < 1-phi < 0 and so (1-phi)^35 nearly vanishes. You might want to use a calculator to get a numerical value.
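The identity from the 2nd observation can be checked against a plain accumulator loop for the threshold in the question (values not greater than 4,000,000). This Python sketch uses F_1 = F_2 = 1; all function names are made up for the example.

```python
def fib(n):
    """F_n with F_1 = F_2 = 1 (and F_0 = 0)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def even_fib_sum(limit):
    """Sum of the even Fibonacci numbers not greater than limit, by a loop."""
    total = 0
    a, b = 1, 1
    while a <= limit:
        if a % 2 == 0:
            total += a
        a, b = b, a + b
    return total


def even_fib_sum_identity(limit):
    """Same sum via F_3 + F_6 + ... + F_{3k} = (F_{3k+2} - 1) / 2."""
    index = 0
    while fib(index + 3) <= limit:   # advance to the last even term
        index += 3
    return (fib(index + 2) - 1) // 2


print(even_fib_sum(4_000_000))           # 4613732
print(even_fib_sum_identity(4_000_000))  # 4613732
```

Both routes give the value quoted in the earlier answer.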