Is there a way to optimise this program in Haskell?

I am working on Project Euler problem 224 and whipped up this list comprehension in Haskell:
prob39 = length [ d | d <- [1..75000000], c <- [1..37500000], b <-[1..c], a <- [1..b], a+b+c == d, a^2 + b^2 == (c^2 -1)]
I compiled it with GHC and it has been running with above average kernel priority for over an hour without returning a result. What can I do to optimise this solution? It seems I am getting better at finding brute force solutions in a naive manner. Is there anything I can do about this?
EDIT: I am also unclear about the definition of 'integral length', does this just mean the side length has a magnitude which falls in the positive set of integers, i.e: 1,2,3,4,5... ?

My Haskell isn't amazing, but I think this is going to be roughly n^4 as written, since there are four nested generators (d, c, b and a).
It looks like you're saying: for each n from 1 to 75 million, check every "barely obtuse" triangle with a perimeter less than or equal to 75 million to see if it has perimeter n.
Also I'm not certain if list comprehensions are smart enough to stop looking once the current value of c^2 -1 is greater than a^2 + b^2.
A simple refactor should be
prob39 = length [ (a, b, c) | c <- [1..37500000], b <-[1..c], a <- [1..b], a^2 + b^2 == (c^2 -1), (a + b + c) <= 75000000]
You can make it better, but that should literally be 75 million times faster.
Less certain about this refactoring, but it should also speed things up considerably:
prob39 = length [ (a, b, c) | a <- [1..25000000], b <- [a..(75000000 - a) `div` 2], c <- [b..(75000000 - a - b)], a^2 + b^2 == (c^2 -1)]
Syntax may not be 100% there. The idea is that a can only be 1 to 25 million (since a <= b <= c and a + b + c <= 75 million). b can only be between a and half of (75 million - a), since b <= c implies a + 2b <= a + b + c <= 75 million. c can only be from b to 75 million - (a + b), otherwise the perimeter would be over 75 million.
Edit: updated code snippets, there were a couple of bugs in there.
Another quick suggestion: you can replace c <- [b..(75000000 - a - b)] with something along the lines of c <- [b..min((75000000 - a - b), sqrt(a*a + b*b) + 1)]. There's no need to bother checking any values of c greater than the ceiling of the square root of (a^2 + b^2). Can't remember if those are the correct min/sqrt function names in Haskell though.
Getting OCD on this one, I have a couple more suggestions.
1) you can set the upper bound on b to be the min of the current upper bound and a^2 * 2 + 1. This is based on the principle that (x+1)^2 - x^2 = 2x + 1. b cannot be so much larger than a that we can guarantee that (a^2) + (b^2) < (b+1)^2.
2) set the lower bound of c to be max of b + 1 and floor(sqrt(a^2 + b^2) - 1). Just like the upper limit on C, no need to test values which couldn't possibly be correct.
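The bounds above can be pushed one step further: once a and b are fixed, c is determined by c^2 = a^2 + b^2 + 1, so the loop over c can be replaced by an integer square root test. The sketch below is written in Python rather than Haskell, uses a hypothetical function name, and swaps in that direct solve instead of the ranged search described above. It also assumes the bound b <= a^2 / 2, which follows from c >= b + 1. It is still quadratic in the limit, so it is only meant for checking small perimeters, not for the full 75 million.

```python
from math import isqrt

def count_barely_obtuse(limit):
    """Count triples a <= b <= c with a^2 + b^2 = c^2 - 1 and a + b + c <= limit."""
    count = 0
    for a in range(1, limit // 3 + 1):          # a <= b <= c, so 3a <= limit
        # a + 2b <= limit (because b <= c), and a^2 >= 2b (because c >= b + 1)
        b_max = min((limit - a) // 2, (a * a) // 2)
        for b in range(a, b_max + 1):
            target = a * a + b * b + 1          # must equal c^2
            c = isqrt(target)
            if c * c == target and a + b + c <= limit:
                count += 1
    return count
```

The smallest such triangle is (2, 2, 3), so any limit below 7 yields no triangles.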

In addition to the suggestions given by @patros, I would like to share my observations on this problem.
If we print the values of a, b and c for some perimeter, say 100000, we can observe that a and b always take even values and c always takes odd values. If we restrict the search accordingly, a large share of the checks can be skipped.
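The parity pattern does not have to be taken on observation alone; it can be checked with arithmetic modulo 4. A square is always 0 or 1 mod 4, so a^2 + b^2 + 1 can only be a square when both a and b are even, and then c^2 is 1 mod 4, which makes c odd. A small Python sketch of that check (the function name is hypothetical):

```python
def allowed_parities():
    """Return the (a mod 2, b mod 2) pairs for which a^2 + b^2 + 1 can be a square mod 4."""
    square_residues = {(n * n) % 4 for n in range(4)}   # squares mod 4
    allowed = set()
    for a in range(4):
        for b in range(4):
            if (a * a + b * b + 1) % 4 in square_residues:
                allowed.add((a % 2, b % 2))
    return allowed
```

Only the pair (0, 0) survives, meaning a and b must both be even.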

Related

String manipulation with dynamic programming

I have a problem where I have a string of length N, where (1 ≤ N ≤ 10^5). This string will only have lower case letters.
We have to rewrite the string so that it has a series of "streaks", where the same letter is included at least K (1 ≤ K ≤ N) times in a row.
It costs a_ij to change a single specific letter in the string from i to j. There are M different possible letters you can change each letter to.
Example: "abcde" is the input string. N = 5 (length of "abcde"), M = 5 (letters are A, B, C, D, E), and K = 2 (each letter must be repeated at least 2 times) Then we are given a M×M matrix of values a_ij, where a_ij is an integer in the range 0…1000 and a_ii = 0 for all i.
0 1 4 4 4
2 0 4 4 4
6 5 0 3 2
5 5 5 0 4
3 7 0 5 0
Here, it costs 0 to change from A to A, 1 to change from A to B, 4 to change from A to C, and so on. It costs 2 to change from B to A.
The optimal solution in this example is to change the a into b, change the d into e, and then change both e’s into c’s. This will take 1 + 4 + 0 + 0 = 5 moves, and the final combo string will be "bbccc".
It becomes complicated because it might cost less to change letter i to an intermediate letter k and then change k to j than to change i to j directly (or, more generally, there may be a path of changes starting with i and ending with j that gives the best overall cost for turning i into j).
To deal with this, I am treating the matrix as a graph and running Floyd-Warshall to find the cheapest cost for every pair of letters. This takes O(M^3), which is at most 26^3.
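As a rough illustration of that preprocessing step, here is a minimal Python sketch of Floyd-Warshall applied to the example matrix. The names floyd_warshall and EXAMPLE_COST are hypothetical, and the letters A to E are assumed to map to indices 0 to 4.

```python
def floyd_warshall(cost):
    """Return the all-pairs cheapest conversion costs for a square cost matrix."""
    m = len(cost)
    dist = [row[:] for row in cost]             # copy so the input is left untouched
    for k in range(m):                          # k is the intermediate letter
        for i in range(m):
            for j in range(m):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist

# Cost matrix from the example (rows: from-letter, columns: to-letter).
EXAMPLE_COST = [
    [0, 1, 4, 4, 4],
    [2, 0, 4, 4, 4],
    [6, 5, 0, 3, 2],
    [5, 5, 5, 0, 4],
    [3, 7, 0, 5, 0],
]
```

On the example, changing D to C directly costs 5, but going D to E (4) and then E to C (0) costs 4, which is what the closed matrix records.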
My next step is to perform dynamic programming on each additional letter to find the answer. If someone could give me advice on how to do this, I would be thankful!
Here are some untested ideas. I'm not sure if this is efficient enough (or completely worked out) but it looks like 26 * 3 * 10^5. The recurrence could be converted to a table, although with higher Ks, memoisation might be more efficient because of reduced state possibilities.
Assume we've recorded 26 prefix arrays for conversion of the entire list to each of the characters using the best conversion schedule, using a path-finding method. This lets us calculate the cost of a conversion of a range in the string in O(1) time, using a function, cost.
A letter in the result can be one of three things: either it's the kth instance of character c, or it's before the kth, or it's after the kth. This leads to a general recurrence:
f(i, is_kth, c) ->
    cost(i - k + 1, i, c) + A
    where
        A = min(
            f(i - k, is_kth, c'),
            f(i - k, is_after_kth, c')
        ) forall c'
A takes constant time since the alphabet is constant, assuming earlier calls to f have been tabled.
f(i, is_before_kth, c) ->
    cost(i, i, c) + A
    where
        A = min(
            f(i - 1, is_before_kth, c),
            f(i - 1, is_kth, c'),
            f(i - 1, is_after_kth, c')
        ) forall c'
Again A is constant time since the alphabet is constant.
f(i, is_after_kth, c) ->
    cost(i, i, c) + A
    where
        A = min(
            f(i - 1, is_after_kth, c),
            f(i - 1, is_kth, c)
        )
A is constant time in the latter. We would seek the best result of the recurrence applied to each character at the end of the string with either state is_kth or state is_after_kth.
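For comparison, here is a minimal Python sketch of a closely related table-based formulation. It is not the recurrence above verbatim: it collapses the three states into two, "a streak of c with length at least K ends at position i" and "best valid rewrite of the first i letters", which is a simplification swapped in for brevity. The function name min_streak_cost is hypothetical, dist is assumed to already hold the cheapest pairwise conversion costs (for instance after Floyd-Warshall), and letters are assumed to be 'a', 'b', ... mapped to indices 0, 1, ....

```python
def min_streak_cost(s, k, dist):
    """Cheapest cost to rewrite s so that every run of equal letters has length >= k."""
    n, m = len(s), len(dist)
    INF = float("inf")
    idx = [ord(ch) - ord("a") for ch in s]
    # prefix[c][i] = cost of converting s[:i] entirely to letter c
    prefix = [[0] * (n + 1) for _ in range(m)]
    for c in range(m):
        for i, x in enumerate(idx):
            prefix[c][i + 1] = prefix[c][i] + dist[x][c]
    best = [INF] * (n + 1)    # best[i]: cheapest valid rewrite of s[:i]
    best[0] = 0
    run = [INF] * m           # run[c]: cheapest rewrite of s[:i] ending in a streak of c
    for i in range(1, n + 1):
        for c in range(m):
            extend = run[c] + dist[idx[i - 1]][c]          # lengthen the current streak
            start = INF
            if i >= k:                                      # open a fresh block of k letters
                start = best[i - k] + prefix[c][i] - prefix[c][i - k]
            run[c] = min(extend, start)
        best[i] = min(run)
    return best[n]
```

With the example string "abcde", K = 2 and the shortest-path-closed version of the example matrix, this returns 5, matching the "bbccc" solution in the question. The running time is O(N * M) after preprocessing.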

How can I solve this classical dynamic programming problem?

There are N jewellery shops. Each shop has three kinds of coins (Gold, Platinum, and Diamond) worth A, B, and C respectively. You decide to visit each of the N shops and take coins, subject to the following conditions:
You can take at most 1 coin from an individual shop.
You can take at most X coins of Gold type.
You can take at most Y coins of Platinum type.
You can take at most Z coins of Diamond type.
You want to collect coins from shops in such a way that worth value of coins collected is maximised.
Input Format :
The first line contains an integer N. Where N is the number of jewellery shops.
The second line contains three integers X, Y, Z. Where X, Y, Z denotes the maximum number of coins you can collect of type Gold, Platinum, and diamond respectively.
Then N lines contain three space-separated integers A, B, C. Where A, B, C is the worth value of the Gold, Platinum, and diamond coin respectively.
Output Format :
Print a single integer representing the maximum worth value you can get.
Constraints :
1 <= N <= 200
1 <= X, Y, Z <= N
1 <= A, B, C < 10^9
Example : -
4
2 1 1
5 4 5
4 3 2
10 9 7
8 2 9
Answer:-
27(9+9+5+4)
I tried the obvious greedy approach but it failed :-)
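One possible formulation, not taken from the thread, is a knapsack-style table indexed by how many coins of each type may still be used. The Python sketch below is only a starting point under that assumption: the function name max_worth is hypothetical, and the O(N * X * Y * Z) running time is too slow at the upper limits of the constraints, so a real submission would need a reduced state.

```python
def max_worth(shops, x, y, z):
    """Best total worth with at most one coin per shop and at most x/y/z coins per type."""
    # dp[g][p][d] = best worth using at most g gold, p platinum and d diamond coins
    dp = [[[0] * (z + 1) for _ in range(y + 1)] for _ in range(x + 1)]
    for a, b, c in shops:
        # Iterate budgets downwards so each shop contributes at most one coin.
        for g in range(x, -1, -1):
            for p in range(y, -1, -1):
                for d in range(z, -1, -1):
                    best = dp[g][p][d]                      # skip this shop
                    if g > 0:
                        best = max(best, dp[g - 1][p][d] + a)
                    if p > 0:
                        best = max(best, dp[g][p - 1][d] + b)
                    if d > 0:
                        best = max(best, dp[g][p][d - 1] + c)
                    dp[g][p][d] = best
    return dp[x][y][z]
```

On the example input (X, Y, Z = 2, 1, 1 and the four shops listed) this yields 27.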

recursive exponentiator output and complexity

def exp3(a,b):
    if b == 1:
        return a
    if (b%2)*2 == b:
        return exp3(a*a, b/2)
    else: return a*exp3(a,b-1)
This is a recursive exponentiator program.
Question 1:
If b is even, it should execute the (b%2)*2 == b branch. If b is odd, it will execute a*exp3(a,b-1). The program runs without problems. But if b is 4, (4%2)*2 = 0, and 0 is not equal to b. So I can't understand how the even case is handled.
Question 2:
I want to calculate the number of steps in the program. According to my textbook, I get the formulas as follows.
b even t(b) = 6 + t(b/2)
b odd t(b) = 6 + t(b-1)
Why is the first number 6? How can I get the number 3 in the beginning?
Your (b%2)*2 == b test is never true. I think you want b % 2 == 0 to test if b is even. The code still gets the right answer because the other recursive case (intended only for odd b values) works for even ones too (it's just less efficient).
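A minimal sketch of the function with the even test written as suggested. It assumes b is a positive integer and uses integer division (//) so that the exponent stays an int under Python 3; that last change is an addition not mentioned above.

```python
def exp3(a, b):
    """Compute a**b for a positive integer b by repeated squaring."""
    if b == 1:
        return a
    if b % 2 == 0:                      # b is even: square the base, halve the exponent
        return exp3(a * a, b // 2)
    return a * exp3(a, b - 1)           # b is odd: peel off one factor of a
```

With this test the even branch is actually taken, so the recursion depth grows with log(b) instead of b.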
As for your other question, I have no idea where the 6 is coming from either. It depends a lot on what you're counting as a "step". Usually it's most useful to discuss performance in terms of "Big-O" values rather than specific numbers.

how to interpret coefficients in log-log market mix model

I am running a multivariate OLS regression as below using weekly sales and media data. I would like to understand how to calculate the sales contribution when doing log transforms like log-linear, linear-log and log-log.
For example:
Volume_Sales = b0 + b1.TV_GRP + b2.SocialMedia + b3.PaidSearch + e
In this case, the sales contributed by TV is b1 x TV_GRPs (coefficient multiplied by the TV GRP of that month)
Now, my question is: How do we calculate sales contribution for the below cases:
Log-Linear: ln(Volume_Sales) = b0 + b1.TV_GRP + b2.SocialMedia + b3.PaidSearch + e
Linear-Log: Volume_Sales = b0 + b1.ln(TV_GRP) + b2.ln(SocialMedia) + b3.ln(PaidSearch) + e
Log-Log: ln(Volume_Sales) = b0 + b1.ln(TV_GRP) + b2.ln(SocialMedia) + b3.ln(PaidSearch) + e
In general terms, a log transformation takes something that acts on the multiplicative scale and re-represents it on the additive scale so certain mathematical assumptions hold: among them, linearity. So to step beyond the "transform data we don't like" paradigm that many of us are guilty of, I like thinking in terms of "does it make most sense if an effect to this variable is additive (+3 units) or multiplicative (3 times as much, 20% reduction, etc)?" That and your diagnostic plots (residual, q-q, etc.) will do a good job of telling you what's the most appropriate in your case.
As for interpreting coefficients, here are some ways I've seen it done.
Linear: y = b0 + b1x + e
Interpretation: there is an estimated b1-unit increase in the mean of y for every 1-unit increase in x.
Log-linear: ln(y) = b0 + b1x + e
Interpretation: there is an estimated change in the median of y by a factor of exp(b1) for every 1-unit increase in x.
Linear-log: y = b0 + b1ln(x) + e
Interpretation: there is an estimated b1*ln(2)-unit increase in the mean of y when x is doubled.
Log-log: ln(y) = b0 + b1ln(x) + e
Interpretation: there is an estimated change in the median of y by a factor of 2^b1 when x is doubled.
Note: these can be fairly readily derived by considering what happens to y if you replace x with (x+1) or with 2x.
These generic-form interpretations tend to make more sense with a bit of context, particularly once you know the sign of the coefficient. Say you've got a log-linear model with an estimated b1 of -0.3. Exponentiated, this is exp(-0.3)=0.74, meaning that there is an estimated change in the median of y by a factor of 0.74 for every 1-unit increase in x ... or better yet, a 26% decrease.
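The three transformed interpretations can be turned into small helper functions. This is only a sketch of the arithmetic described above; the function names are hypothetical and b1 stands for an already estimated coefficient.

```python
import math

def log_linear_factor(b1):
    """Multiplicative change in the median of y per 1-unit increase in x (log-linear model)."""
    return math.exp(b1)

def linear_log_doubling_change(b1):
    """Additive change in the mean of y when x is doubled (linear-log model)."""
    return b1 * math.log(2)

def log_log_doubling_factor(b1):
    """Multiplicative change in the median of y when x is doubled (log-log model)."""
    return 2 ** b1
```

For the worked example, log_linear_factor(-0.3) is about 0.74, which is the 26% decrease quoted above.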
Log-linear means an exponential: ln(y) = a x + b is equivalent to y = exp(a x) * exp(b), which is of the form A^x * B. Likewise, a log-log transform gives a power law: ln(y) = a ln(x) + b is of the form y = B * x^a, with B = exp(b).
On a log-linear plot an exponential will thus be a straight line, and a power law will be a straight line on a log-log plot.

Simple python and need explanation

m = 0
for x in range (4,6):
    for y in range (2,4):
        m = m + x + y
print (m)
ANSWER: 28
I'm not sure how this comes out to 28. Excluding the last number in each range, I thought it should be 14. I added it up on paper and cannot see what I am doing wrong.
That loop is equivalent to:
m = 4+2 + 4+3 + 5+2 + 5+3
And, that sum is 28.
(In the outer loop, x takes on the values 4 and 5. In the inner loop, y takes on values 2 and 3.)
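To see the four iterations explicitly, the nested loops can be unrolled into a list of (x, y) pairs. A small sketch (the variable names pairs and total are just for illustration):

```python
# Every combination the nested loops visit, in order.
pairs = [(x, y) for x in range(4, 6) for y in range(2, 4)]

# Each iteration adds x + y to the running total.
total = sum(x + y for x, y in pairs)

print(pairs)
print(total)
```

The inner loop runs in full for each value of x, so there are 2 * 2 = 4 additions rather than 2.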
