Haskell beginner - optimal way to solve this


I was assigned the task to create a function which is able to calculate the grade of your exam by adding the points you achieved to the extra points you collected through other means and then converting them into the grade system. I also had to add an error message if either the extra points or the exam points exceeded their maximum (20 and 100 respectively).
The function I created works, but it probably isn't close to being optimal.
calcGrade :: Double -> Double -> Double
calcGrade x y
  | x > 20 = error "Can't add more than 20 extra points"
  | y > 100 = error "Can't achieve more than 100 points"
  | x + y < 50 = 5.0
  | x + y >= 50 && x + y < 54 = 4.0
  | x + y >= 54 && x + y < 58 = 3.7
  | x + y >= 58 && x + y < 62 = 3.3
  | x + y >= 62 && x + y < 66 = 3.0
  | x + y >= 66 && x + y < 70 = 2.7
  | x + y >= 70 && x + y < 74 = 2.3
  | x + y >= 74 && x + y < 78 = 2.0
  | x + y >= 78 && x + y < 82 = 1.7
  | x + y >= 82 && x + y < 86 = 1.3
  | x + y >= 86 = 1.0
Is there another way of doing this or is there anything I could do more efficiently? I'm pretty new to Haskell and programming in general so I'm thankful for any advice!

If I wanted to implement that exact same function (as opposed to changing the specification so that I could make the code cleaner -- which is sometimes possible), I think I would use Map's lookupGT to encode the lookup table that is currently done with guards. So:
import Data.Map (fromAscList, lookupGT)
calcGrade :: Double -> Double -> Double
calcGrade x y
  | x > 20 = error "Can't add more than 20 extra points"
  | y > 100 = error "Can't achieve more than 100 points"
  | otherwise = case lookupGT (x+y) cutoffs of
      Just (_, v) -> v
      Nothing -> 1.0
  where
    cutoffs = fromAscList [(50, 5.0), (54, 4.0), (58, 3.7), (62, 3.3), (66, 3.0), (70, 2.7), (74, 2.3), (78, 2.0), (82, 1.7), (86, 1.3)]
This has a few advantages:
Much less repetition. This is a virtue in and of itself.
With the guards, the reader must carefully check that the condition is on x+y in every case, and not, say, the very visually similar x+v for some case for some reason. With this encoding, that's clear without careful attention.
Each guard would be checked in turn, giving a linear runtime in the number of cutoffs. With Map's lookupGT, only a log number of comparisons is done. Since you probably don't intend to vary the cutoffs dynamically, this probably doesn't matter; but the trick used here can occasionally be useful elsewhere, so it's nice to remember it for those cases where asymptotics do matter.
Because cutoffs appear in only one place, if this changes later (you'd be surprised...) you don't have to be careful to change, e.g., 58 to 59 in two places as one would need to do with your code.
The only wart, in my opinion, is that the default score case (Nothing -> 1.0) doesn't live next to the cutoffs; though it's not clear to me how one might go about doing that sanely.
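For comparison, here is a minimal sketch of the same table-lookup idea in Python, using binary search from the standard bisect module in place of Map's lookupGT. The names CUTOFFS, GRADES, lookup_grade and calc_grade are made up for this illustration, and returning None for out-of-range input is an assumption that replaces the calls to error. One side effect of this encoding is that the default grade 1.0 is simply the last entry of the grade list, so it sits next to the cutoffs.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical names for illustration; the values are the cutoffs from the question.
CUTOFFS = [50, 54, 58, 62, 66, 70, 74, 78, 82, 86]
# One more grade than cutoffs: the last entry is the default for totals >= 86.
GRADES = [5.0, 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0]


def lookup_grade(total: float) -> float:
    """Map an already-summed point total to a grade with a binary search."""
    # bisect_right counts how many cutoffs are <= total, which is
    # exactly the index of the matching grade.
    return GRADES[bisect_right(CUTOFFS, total)]


def calc_grade(extra: float, exam: float) -> Optional[float]:
    """Return the grade, or None if either input exceeds its maximum."""
    if extra > 20 or exam > 100:
        return None  # assumption: signal failure with None instead of raising
    return lookup_grade(extra + exam)
```

For example, calc_grade(3, 70) looks up a total of 73 and lands in the 70 to 74 bracket.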

If you only accepted Int values (and still returned Double), then you could write it as
calcGrade :: Int -> Int -> Double
calcGrade x y =
  let score = (max 46 (x + y) - 46) `div` 4
      grades = [5.0, 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3] ++ repeat 1.0
  in grades !! score
But this leaves out the first two checks. You could put them in rather easily, but it might be better to put them in a different function. (Also, the use of error is frowned upon in Haskell; it is better to use a type that indicates the function might fail, such as Maybe or Either.)
What this function does is calculate the sum of x and y first, then ask "which is larger, x + y or 46?". This clamps every total below 46 up to 46, so all totals below 50 end up in the same bucket. Next, it subtracts 46, so a score of 50 becomes 4, a score of 54 becomes 8, and so on. The div function does integer division by 4, so a score of 50 becomes 4 becomes 1, a score less than 50 becomes 0, and a score of 73 becomes 27 becomes 6.
The grades themselves are stored in a list, any score less than 50 will index out the first element of 5.0, and then each range indexes out its corresponding element, so 73 indexes out the 7th element (index 6) of 2.3. The ++ repeat 1.0 handles a score >= 86.
Another way you could solve this might make a bit more sense. Just build a list of the ranges:
let score = x + y
    -- the upper bound of each range is exclusive, so 121 covers the maximum total of 120
    mins = [0, 50, 54, 58, 62, 66, 70, 74, 78, 82, 86, 121]
    ranges = zip mins (tail mins) -- [(0, 50), (50, 54), ..., (86, 121)]
    grades = [5.0, 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0]
    inRange = map (\(lower, upper) -> lower <= score && score < upper) ranges
in snd $ head $ filter fst $ zip inRange grades
I think most of this logic is pretty clear, but the last line might be confusing. It zips the inRange list of Bools with the grades, filters by the first element (whether that range included the score), takes the first element from the list, then grabs the second element of that (Bool, Double) tuple.

Some things you can try to clean it up a bit:
Use a where binding to extract the common subexpression x + y.
Better yet, just accept the already added number as an argument. Then this function will be your "lookup grade" function, and you can call it like lookupGrade (exam + extra).
Don't use error. Instead, return a Maybe Double (which is either Just a Double, or Nothing if you can't compute the score).
List your guards in reverse order. This way, you only have to check one bound on each, not both. It is ok to overlap because they are checked in order.
Try to extract the actual meaning of the grading algorithm, rather than trying to list out cases. Try to find a math formula that converts it like you want, then write that formula in Haskell.
Applying these transformations is probably the best way to write this function, unless you want to use a Map (from Data.Map) to list larger numbers of cases. This code will express your intent better than using a bunch of unneeded lists.

Not sure about the performance, but this should work.
import Data.Map (fromList, filterWithKey)
calculate :: Double -> Double -> String
calculate exam bonus
  -- the exam is capped at 100 points and the bonus at 20
  | exam > 100 || bonus > 20 = "Be real!"
  | otherwise = (foldr (++) "" . filterWithKey isInRange) letterGrade
  where
    isInRange k _ = percent `elem` k
    -- scale the total (at most 120) to a percentage
    percent = truncate $ (exam + bonus) * 10 / 12
    -- fromList rather than fromAscList: the keys below are not in ascending order
    letterGrade = fromList [ ([90..100], "A"), ([80..89], "B"), ([70..79], "C"), ([60..69], "D"), ([0 ..59], "F")]

Related

Find H Index of the nodes of a graph using NetworkX

Definition of H Index used in this algorithm
Supposing a relational expression is represented as y = F(x1, x2, . . . , xn), where F returns an integer number greater than 0, and the function is to find a maximum value y satisfying the condition that there exist at least y elements whose values are not less than y. Hence, the H-index of any node i is defined as
H(i) = F(k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}})
where k_{j_1}, k_{j_2}, \ldots, k_{j_{k_i}} represent the set of degrees of neighboring nodes of node i.
Now I want to find the H Index of the nodes of the following graphs using the algorithm given below :
Graph :
Code (Written in Python and NetworkX) :
def hindex(g, n):
    nd = {}
    h = 0
    # print(len(list(g.neighbors(n))))
    for v in g.neighbors(n):
        #nd[v] = len(list(g.neighbors(v)))
        nd[v] = g.degree(v)
        snd = sorted(nd.values(), reverse=True)
        for i in range(0,len(snd)):
            h = i
            if snd[i] < i:
                break
    #print("H index of " + str(n)+ " : " + str(h))
    return h
Problem :
This algorithm is returning the wrong values of nodes 1, 5, 8 and 9
Actual Values :
Node 1 - 6 : H Index = 2
Node 7 - 9 : H Index = 1
But for Node 1 and 5 I am getting 1, and for Node 8 and 9 I am getting 0.
Any leads on where I am going wrong will be highly appreciated!

Try this:
def hindex(g, n):
    sorted_neighbor_degrees = sorted((g.degree(v) for v in g.neighbors(n)), reverse=True)
    h = 0
    for i in range(1, len(sorted_neighbor_degrees)+1):
        if sorted_neighbor_degrees[i-1] < i:
            break
        h = i
    return h
There's no need for a nested loop; just make a decreasing list, and calculate the h-index like normal.
The reason for 'i - 1' is just that our arrays are 0-indexed, while h-index is based on rankings (i.e. the k largest values) which are 1-indexed.
From the definition of h-index: For a non-increasing function f, h(f) is max i >= 0 such that f(i) >= i. This is, equivalently, the min i >= 1 such that f(i) < i, minus 1. Here, f(i) is equal to sorted_neighbor_degrees[i - 1]. There are of course many other ways (with different time and space requirements) to calculate h.
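To try the "max i such that f(i) >= i" rule without installing NetworkX, here is a small self-contained sketch. It assumes the graph is given as a plain adjacency dictionary. The function names and the toy graph are made up for illustration, and the toy graph is not the graph from the question.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    # rank is 1-based: the rank-th largest value must be at least rank.
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        if value < rank:
            break
        h = rank
    return h


def node_h_index(adjacency, node):
    """H-index of a node, computed from the degrees of its neighbours."""
    return h_index(len(adjacency[v]) for v in adjacency[node])


# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}

for node in adjacency:
    print(node, node_h_index(adjacency, node))
```

Node 4 has a single neighbour of degree 3, so its H-index is 1, while node 1 has neighbours of degree 2 and 3, giving 2.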

Mutable variables in Haskell?

I'm starting to wrap my head around Haskell and do some exciting experiments. And there's one thing I just seem to be unable to comprehend (previous "imperativist" experience talks maybe).
Recently, I was yearning to implement an integer division function as if there were no multiply/divide operations. An immensely interesting brain-teaser which led to great confusion.
divide x y =
  if x < y then 0
  else 1 + divide (x - y) y
I compiled it and it.. works(!). That's mind-blowing. However, I was told, and I was sure, that variables are immutable in Haskell. How come that with each recursive step the variable x keeps its value from the previous step? Or is my glorious compiler lying to me? Why does it work at all?

Your x here doesn't change during one function call (i.e., after creation) - that's exactly what immutable means. What does change is the value of x across multiple (recursive) calls. In a single stack frame (function call) the value of x is constant.
An example of execution of your code, for a simple case
call divide 8 3 -- (x = 8, y = 3), stack: divide 8 3
step 1: x < y ? NO
step 2: 1 + divide 5 3
call: divide 5 3 -- (x = 5, y = 3), stack: divide 8 3, divide 5 3
step 1: x < y ? NO
step 2: 1 + divide 2 3
call divide 2 3 -- (x = 2, y = 3), stack: divide 8 3, divide 5 3, divide 2 3
step 1: x < y ? YES
return: 0 -- unwinding bottom call
return 1 + 0 -- stack: divide 8 3, divide 5 3, unwinding middle call
return 1 + 1 + 0 -- stack: divide 8 3
I am aware that the above notation is not formal in any way, but I hope it helps to understand what recursion is about and that x might have different values in different calls, because each call is simply a different instance of the whole function, and thus also has a different instance of x.

x is actually not a variable, but a parameter, and isn't that different from parameters in imperative languages.
Maybe it'd look more obvious with explicit return statements?
-- for illustrative purposes only, doesn't actually work
divide x y =
  if x < y
    then return 0
    else return 1 + divide (x - y) y
You're not mutating x, just stacking up several function calls to calculate your desired result with the values they return.
Here's the same function in Python:
def divide(x, y):
    if x < y:
        return 0
    else:
        return 1 + divide(x - y, y)
Looks familiar, right? You can translate this to any language that allows recursion, and none of them would require you to mutate a variable.
Other than that, yes, your compiler is lying to you. Because you're not allowed to directly mutate values, the compiler can make a lot of extra assumptions based on your code, which helps translating it to efficient machine code, and at that level, there's no escaping mutability. The major benefit is that compilers are way less likely to introduce mutability-related bugs than us mortals.

Count the Number of Zero's between Range of integers

Is there a direct formula or method to count the zeros in a given range? Suppose two integers M and N are given. If I have to find the total number of zero digits in all the integers in this range, what should I do?
Let M = 1234567890 & N = 2345678901
And answer is : 987654304
Thanks in advance .

Reexamining the Problem
Here is a simple solution in Ruby, which inspects each integer from the interval [m,n], determines the string of its digits in the standard base 10 positional system, and counts the occurring 0 digits:
def brute_force(m, n)
  if m > n
    return 0
  end
  z = 0
  m.upto(n) do |k|
    z += k.to_s.count('0')
  end
  z
end
If you run it in an interactive Ruby shell you will get
irb> brute_force(1,100)
=> 11
which is fine. However using the interval bounds from the example in the question
m = 1234567890
n = 2345678901
you will find that this takes considerable time. On my machine it needs far more than a couple of seconds; so far I have had to cancel it every time.
So the real question is not only to come up with the correct zero counts but to do it faster than the above brute force solution.
Complexity: Running Time
The brute force solution needs to perform n-m+1 times searching the base 10 string for the number k, which is of length floor(log_10(k))+1, so it will not use more than
O(n (log(n)+1))
string digit accesses. The slow example had an n of roughly n = 10^9.
Reducing Complexity
Yiming Rong's answer is a first attempt to reduce the complexity of the problem.
If the function for calculating the number of zeros regarding the interval [m,n] is F(m,n), then it has the property
F(m,n) = F(1,n) - F(1,m-1)
so that it suffices to look for a most likely simpler function G with the property
G(n) = F(1,n).
Divide and Conquer
Coming up with a closed formula for the function G is not that easy. E.g.
the interval [1,1000] contains 192 zeros, but the interval [1001,2000] contains 300 zeros, because a case like k = 99 in the first interval would correspond to k = 1099 in the second interval, which yields another zero digit to count. k=7 would show up as 1007, yielding two more zeros.
What one can try is to express the solution for some problem instance in terms of solutions to simpler problem instances. This strategy is called divide and conquer in computer science. It works if at some complexity level it is possible to solve the problem instance and if one can deduce the solution of a more complex problem from the solutions of the simpler ones. This naturally leads to a recursive formulation.
E.g. we can formulate a solution for a restricted version of G, which is only working for some of the arguments. We call it g and it is defined for 9, 99, 999, etc. and will be equal to G for these arguments.
It can be calculated using this recursive function:
# zeros for 1..n, where n = (10^k)-1: 0, 9, 99, 999, ..
def g(n)
  if n <= 9
    return 0
  end
  n2 = (n - 9) / 10
  return 10 * g(n2) + n2
end
Note that this function is much faster than the brute force method: To count the zeros in the interval [1, 10^9-1], which is comparable to the m from the question, it just needs 9 calls, its complexity is
O(log(n))
Again note that this g is not defined for arbitrary n, only for n = (10^k)-1.
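As a quick sanity check, the recursion can be transcribed into Python and compared against brute-force counting on small inputs. This is only a sketch for checking the formula: brute_force here mirrors the Ruby function above, and nothing in it goes beyond arguments of the form (10^k)-1.

```python
def g(n):
    """Zeros in 1..n, valid only for n = 10**k - 1 (9, 99, 999, ...)."""
    if n <= 9:
        return 0
    n2 = (n - 9) // 10  # 999 -> 99, 99 -> 9, ...
    return 10 * g(n2) + n2


def brute_force(m, n):
    """Count zero digits in every integer of [m, n] by inspecting strings."""
    return sum(str(k).count("0") for k in range(m, n + 1))


# Compare the recursion with brute force for the first few valid arguments.
for k in range(1, 5):
    n = 10**k - 1
    print(n, g(n), brute_force(1, n))
```

The two columns printed for each n should agree, which is cheap to confirm up to n = 9999.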
Derivation of g
It starts with finding the recursive definition of the function h(n),
which counts zeros in the numbers from 1 to n = (10^k) - 1, if the decimal representation has leading zeros.
Example: h(999) counts the zero digits for the number representations:
001..009
010..099
100..999
The result would be h(999) = 297.
Using k = floor(log10(n+1)), k2 = k - 1, n2 = (10^k2) - 1 = (n-9)/10 the function h turns out to be
h(n) = 9 [k2 + h(n2)] + h(n2) + n2 = 9 k2 + 10 h(n2) + n2
with the initial condition h(0) = 0. It allows to formulate g as
g(n) = 9 [k2 + h(n2)] + g(n2)
with the initial condition g(0) = 0.
From these two definitions we can define the difference d between h and g as well, again as a recursive function:
d(n) = h(n) - g(n) = h(n2) - g(n2) + n2 = d(n2) + n2
with the initial condition d(0) = 0. Trying some examples leads to a geometric series, e.g. d(9999) = d(999) + 999 = d(99) + 99 + 999 = d(9) + 9 + 99 + 999 = 0 + 9 + 99 + 999 = (10^0)-1 + (10^1)-1 + (10^2)-1 + (10^3)-1 = (10^4 - 1)/(10-1) - 4. This gives the closed form
d(n) = n/9 - k
This allows us to express g in terms of g only:
g(n) = 9 [k2 + h(n2)] + g(n2) = 9 [k2 + g(n2) + d(n2)] + g(n2) = 9 k2 + 9 d(n2) + 10 g(n2) = 9 k2 + n2 - 9 k2 + 10 g(n2) = 10 g(n2) + n2
Derivation of G
Using the above definitions and naming the k digits of the representation q_k, q_k2, .., q2, q1 we first extend h into H:
H(q_k q_k2..q_1) = q_k [k2 + h(n2)] + r (k2-kr) + H(q_kr..q_1) + n2
with initial condition H(q_1) = 0 for q_1 <= 9.
Note the additional definition r = q_kr..q_1. To understand why it is needed look at the example H(901), where the next level call to H is H(1), which means that the digit string length shrinks from k=3 to kr=1, needing an additional padding with r (k2-kr) zero digits.
Using this, we can extend g to G as well:
G(q_k q_k2..q_1) = (q_k-1) [k2 + h(n2)] + k2 + r (k2-kr) + H(q_kr..q_1) + g(n2)
with initial condition G(q_1) = 0 for q_1 <= 9.
Note: It is likely that one can simplify the above expressions like in case of g above. E.g. trying to express G just in terms of G and not using h and H. I might do this in the future. The above is already enough to implement a fast zero calculation.
Test Result
recursive(1234567890, 2345678901) =
987654304
expected:
987654304
success
See the source and log for details.
Update: I changed the source and log according to the more detailed problem description from that contest (allowing 0 as input, handling invalid inputs, 2nd larger example).

You can use a standard approach to find m = [1, M-1] and n = [1, N], then [M, N] = n - m.
Standard approaches are easily available: Counting zeroes.
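One such standard approach counts, for each decimal position separately, how many numbers in 1..n have a zero at that position, and then takes the difference of two prefix counts. The sketch below is my own illustration of that per-position technique in Python, not the code behind the link; the function names are made up, and it assumes 1 <= m <= n.

```python
def zeros_up_to(n):
    """Total number of zero digits written when listing 1, 2, ..., n."""
    total = 0
    p = 1  # place value of the position being examined: 1, 10, 100, ...
    while p <= n:
        higher = n // (p * 10)  # digits to the left of this position
        current = (n // p) % 10  # digit of n at this position
        lower = n % p            # digits to the right of this position
        if current == 0:
            # The prefix may be 1..higher-1 with any suffix, or exactly
            # `higher` with a suffix in 0..lower.
            total += (higher - 1) * p + lower + 1
        else:
            # The prefix may be 1..higher (never 0: no leading zeros).
            total += higher * p
        p *= 10
    return total


def zeros_between(m, n):
    """Zero digits in all integers of [m, n], assuming 1 <= m <= n."""
    return zeros_up_to(n) - zeros_up_to(m - 1)
```

The loop runs once per decimal digit of n, so the bounds from the question are handled in a handful of iterations.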

Does ((a^x) ^ 1/x) == a in Zp? (for Jablon's protocol)

I have to implement Jablon's protocol (paper) but I've been sitting on a bug for two hours.
I'm not very good with math so I don't know if it's my fault in writing it or it just isn't possible. If it isn't possible, I don't see how Jablon's protocol can be implemented since it relies on the fact that ((gP ^ x) ^ yi) ^ (1/x) == gP^yi .
Take the following code. It doesn't work.
BigInteger p = new BigInteger("101");
BigInteger a = new BigInteger("83");
BigInteger x = new BigInteger("13");
BigInteger ax = a.modPow(x, p);
BigInteger xinv = x.modInverse(p);
BigInteger axxinv = ax.modPow(xinv, p);
if (a.equals(axxinv))
System.out.println("Yay!");
else
System.out.println("How is this possible?");

Your problem is that you're not calculating k^(1/x) correctly. We need (k^(1/x))^x to be k. Fermat's Little Theorem tells us that k^(p-1) is 1 mod p. Therefore we want to find y such that x * y is 1 mod p-1, not mod p.
So you want BigInteger xinv = x.modInverse(p.subtract(BigInteger.ONE));.
This will not work if x shares a common factor with p-1. (Your case avoids that.) For that, you need additional theory.
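The difference between the two inverses is easy to see with the numbers from the question. The following is a small Python sketch (Python 3.8 or later, where the built-in pow accepts a negative exponent together with a modulus); modular_root is a name made up for this illustration.

```python
def modular_root(value, x, p):
    """x-th root of value mod prime p, assuming gcd(x, p - 1) == 1."""
    # Exponents live modulo p - 1 (Fermat), so invert x modulo p - 1.
    x_inv = pow(x, -1, p - 1)
    return pow(value, x_inv, p)


p, a, x = 101, 83, 13
ax = pow(a, x, p)

wrong_inv = pow(x, -1, p)      # inverse modulo p, as in the question's code
right_inv = pow(x, -1, p - 1)  # inverse modulo p - 1

print(pow(ax, wrong_inv, p) == a)  # the question's approach
print(pow(ax, right_inv, p) == a)  # the corrected approach
print(modular_root(ax, x, p) == a)
```

If x and p - 1 share a factor, pow(x, -1, p - 1) raises ValueError, which matches the caveat above.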
If p is a prime, then r is a primitive root if none of r, r^2, r^3, ..., r^(p-2) are congruent to 1 mod p. There is no simple algorithm to produce a primitive root, but they are common so you usually only need to check a few. (For p=101, the first number I tried, 2, turned out to be a primitive root. 83 is also.) Testing them would seem to be hard, but it isn't so bad since it turns out that (omitting a bunch of theory here) only divisors of p-1 need to be checked. For instance for 101 you only need to check the powers 1, 2, 4, 5, 10, 20, 25 and 50.
Now if r is a primitive root, then every number mod p is some power of r. What power? That's called the discrete logarithm problem and is not simple. (Its difficulty is the basis of Diffie-Hellman key exchange and related systems such as ElGamal, which are well known in cryptography.) For a small p you can do it by brute force. So trying the exponents 1, 2, 3, ... you eventually find that, for instance, 83 is 2^89 (mod 101).
But once we know that every number from 1 to 100 is 2 to some power, we are armed with a way to calculate roots. Because raising a number to the power of x just multiplies the exponent by x. And 2^100 is 1. So exponentiation is multiplying by x (mod 100).
So suppose that we want y ^ 13 to be 83. Then y is 2^k for some k such that k * 13 is 89 (mod 100). If you play around with the Chinese Remainder Theorem you can realize that k = 53 works. Therefore 2^53 (mod 101) = 93 is the 13th root of 83.
That is harder than what we did before. But suppose that we wanted to take, say, the 5th root of 44 mod 101. We can't use the simple procedure because 5 does not have a multiplicative inverse mod 100. However 44 is 2^15. Therefore 2^3 = 8 is a 5th root. But there are 4 others, namely 2^23, 2^43, 2^63 and 2^83.
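For a prime as small as 101 all of this can be checked by brute force. The sketch below is only meant to illustrate the reasoning above; it is far too slow for primes of cryptographic size, and the function names are made up for this example.

```python
def is_primitive_root(r, p):
    """For prime p: r is primitive if r^d != 1 for every proper divisor d of p - 1."""
    order = p - 1
    return all(pow(r, d, p) != 1 for d in range(1, order) if order % d == 0)


def discrete_log_table(r, p):
    """Map every nonzero residue mod p to its exponent e with r^e = residue."""
    table = {}
    value = 1
    for e in range(p - 1):
        table[value] = e
        value = value * r % p
    return table


def all_roots(target, x, r, p):
    """All y with y^x = target (mod p), found through exponents of r."""
    log = discrete_log_table(r, p)[target]
    # y = r^k is a root exactly when k * x is congruent to log modulo p - 1.
    return sorted(pow(r, k, p) for k in range(p - 1) if (k * x - log) % (p - 1) == 0)


print(is_primitive_root(2, 101), is_primitive_root(83, 101))
print(discrete_log_table(2, 101)[83])
print(all_roots(83, 13, 2, 101))
print(all_roots(44, 5, 2, 101))
```

The last call lists the five 5th roots of 44, one for each exponent 3, 23, 43, 63 and 83.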

Problem detecting cyclic numbers in Haskell

I am doing problem 61 at project Euler and came up with the following code (to test the case they give):
p3 n = n*(n+1) `div` 2
p4 n = n*n
p5 n = n*(3*n -1) `div` 2
p6 n = n*(2*n -1)
p7 n = n*(5*n -3) `div` 2
p8 n = n*(3*n -2)
x n = take 2 $ show n
x2 n = reverse $ take 2 $ reverse $ show n
pX p = dropWhile (< 999) $ takeWhile (< 10000) [p n|n<-[1..]]
isCyclic2 (a,b,c) = x2 b == x c && x2 c == x a && x2 a == x b
ns2 = [(a,b,c)|a <- pX p3 , b <- pX p4 , c <- pX p5 , isCyclic2 (a,b,c)]
All ns2 does is return an empty list, yet isCyclic2 returns True when given the example numbers from the problem statement as arguments, so I would expect that series to come up in the result. The problem must lie in the list comprehension ns2, but I can't see where. What have I done wrong?
Also, how can I make it so that the pX only gets the pX (n) up to the pX used in the previous pX?
PS: in case you thought I completely missed the problem, I will get my final solution with this:
isCyclic (a,b,c,d,e,f) = x2 a == x b && x2 b == x c && x2 c == x d && x2 d == x e && x2 e == x f && x2 f == x a
ns = [[a,b,c,d,e,f]|a <- pX p3 , b <- pX p4 , c <- pX p5 , d <- pX p6 , e <- pX p7 , f <- pX p8 ,isCyclic (a,b,c,d,e,f)]
answer = sum $ head ns

The order is important. The cyclic numbers in the question are 8128, 2882, 8281, and these are not P3/127, P4/91, P5/44 but P3/127, P5/44, P4/91.
Your code is only checking in the order 8128, 8281, 2882, which is not cyclic.
You would get the result if you check for
isCyclic2 (a,c,b)
in your list comprehension.

EDIT: Wrong Problem!
I assumed you were talking about the circular number problem, Sorry!
There is a more efficient way to do this with something like this:
take (2 * l x -1) . cycle $ show x
where l = length . show
Try that and see where it gets you.

If I understand you right here, you're no longer asking why your code doesn't work but how to make it faster. That's actually the whole fun of Project Euler to find an efficient way to solve the problems, so proceed with care and first try to think of reducing your search space yourself. I suggest you let Haskell print out the three lists pX p3, pX p4, pX p5 and see how you'd go about looking for a cycle.
If you would proceed like your list comprehension, you'd start with the first element of each list, 1035, 1024, 1080. I'm pretty sure you would stop right after picking 1035 and 1024 and not test for cycles with any value from P5, let alone try all the permutations of the combinations involving these two numbers.
(I haven't actually worked on this problem yet, so this is how I would go about speeding it up. There may be some math wizardry out there that's even faster)
First, start looking at the numbers you get from pX. You can drop more than those. For example, P3 contains 6105 - there's no way you're going to find a number in the other sets starting with '05'. So you can also drop those numbers where the number modulo 100 is less than 10.
Then (for the case of 3 sets), we can sometimes see after drawing two numbers that there can't be any number in the last set that will give you a cycle, no matter how you permute them (e.g. 1035 from P3 and 3136 from P4 - there can't be a cycle here).
I'd probably try to build a chain by starting with the elements from one list, one by one, and for each element, find the elements from the remaining lists that are valid successors. For those that you've found, continue trying to find the next chain element from the remaining lists. When you've built a chain with one number from every list, you just have to check if the last two digits of the last number match the first two digits of the first number.
Note when looking for successors, you again don't have to traverse the entire lists. If you're looking for a successor to 3015 from P5, for example, you can stop when you hit a number that's 1600 or larger.
If that's too slow still, you could transform the lists other than the first one to maps where the map key is the first two digits and the associated values are lists of numbers that start with those digits. Saves you from going through the lists from the start again and again.
I hope this helps a bit.
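The chain-building idea can be sketched as follows. This is an illustration in Python rather than Haskell, written under my own assumptions about the details: the names FORMULAS, four_digit and find_cycles are made up, the formulas are the ones from the question, and numbers whose last two digits are below 10 are dropped up front as suggested above.

```python
from itertools import count

# The six polygonal formulas from the question, keyed by number of sides.
FORMULAS = {
    3: lambda n: n * (n + 1) // 2,
    4: lambda n: n * n,
    5: lambda n: n * (3 * n - 1) // 2,
    6: lambda n: n * (2 * n - 1),
    7: lambda n: n * (5 * n - 3) // 2,
    8: lambda n: n * (3 * n - 2),
}


def four_digit(sides):
    """Four-digit values of one family whose last two digits can start a number."""
    values = []
    for n in count(1):
        v = FORMULAS[sides](n)
        if v >= 10000:
            return values
        if v >= 1000 and v % 100 >= 10:
            values.append(v)


def find_cycles(families):
    """Cyclic chains using exactly one number from each family, in any order."""
    pools = {s: four_digit(s) for s in families}
    results = []

    def extend(chain, remaining):
        if not remaining:
            # Close the cycle: last two digits of the last number
            # must equal the first two digits of the first number.
            if chain[-1] % 100 == chain[0] // 100:
                results.append(tuple(chain))
            return
        for s in remaining:
            for v in pools[s]:
                if v // 100 == chain[-1] % 100:
                    extend(chain + [v], remaining - {s})

    first, *rest = families
    for v in pools[first]:
        extend([v], frozenset(rest))
    return results


print(find_cycles([3, 4, 5]))
```

Because the families still to be used are tried in every order, the search does not depend on the order in which they are listed, which is exactly what the original list comprehension got wrong.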

btw, I sense some repetition in your code.
you can unite your [p3, p4, p5, p6, p7, p8] functions into one function that will take the 3 from the p3 as a parameter etc.
to find what the pattern is, you can make all the functions in the form of
pX n = ... `div` 2
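One way to carry out that unification is the general formula for the n-th s-gonal number, n((s-2)n - (s-4))/2. The sketch below shows it in Python; polygonal is a name made up for this illustration, and the checks compare it with the six formulas from the question.

```python
def polygonal(s, n):
    """n-th s-gonal number: s=3 triangle, s=4 square, s=5 pentagonal, ..."""
    return n * ((s - 2) * n - (s - 4)) // 2


# The three numbers from the example in the problem statement.
print(polygonal(3, 127), polygonal(4, 91), polygonal(5, 44))

# Spot-check the general formula against two of the question's formulas.
print(all(polygonal(6, n) == n * (2 * n - 1) for n in range(1, 200)))
print(all(polygonal(8, n) == n * (3 * n - 2) for n in range(1, 200)))
```

The product in the numerator is always even, so the integer division is exact.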
