Every time I use J's M. adverb, performance degrades considerably. Since I suspect Iverson and Hui are far smarter than I, I must be doing something wrong.
Consider the Collatz conjecture. There seem to be all sorts of opportunities for memoization here, but no matter where I place M., performance is terrible. For example:
hotpo =: -:`(>:#(3&*))#.(2&|) M.
collatz =: hotpo^:(>&1)^:a:"0
##collatz 1+i.10000x
Without M., this runs in about 2 seconds on my machine. With M., well, I waited over ten minutes for it to complete and eventually gave up. I've also placed M. in other positions with similarly bad results, e.g.,
hotpo =: -:`(>:#(3&*))#.(2&|)
collatz =: hotpo^:(>&1)M.^:a:"0
##collatz 1+i.10000x
Can someone explain the proper usage of M.?
The M. does nothing for you here.
Your code is constructing a chain, one link at a time:
-:`(>:#(3&*))#.(2&|)^:(>&1)^:a:"0 M. 5 5
5 16 8 4 2 1
5 16 8 4 2 1
Here, it remembers that 5 leads to 16, 16 leads to 8, 8 leads to 4, etc... But what does that do for you? It replaces some simple arithmetic with a memory lookup, but the arithmetic is so trivial that it's likely faster than the lookup. (I'm surprised your example takes 10 whole minutes, but that's beside the point.)
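To see why, here is a rough analogy in Python rather than J (my own sketch; step and memo_step are names I made up, and the exact timings depend on your machine): memoizing a single hotpo-style step only trades a tiny amount of arithmetic for a cache lookup, so there is essentially nothing to win.
from functools import lru_cache
from timeit import timeit

def step(n):
    # one Collatz step: halve if even, 3n+1 if odd
    return 3 * n + 1 if n % 2 else n // 2

memo_step = lru_cache(maxsize=None)(step)

print(timeit(lambda: step(27), number=1_000_000))
print(timeit(lambda: memo_step(27), number=1_000_000))  # roughly the same: the lookup saves next to nothing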
For memoization to make sense, it needs to replace a more expensive computation.
For this particular problem, you might want a function that takes an integer and returns a 1 if and when the sequence arrives at 1. For example:
-:`(>:#(3&*))#.(2&|)^:(>&1)^:_"0 M. 5 5
1 1
All I did was replace the ^:a: with ^:_, to discard the intermediate results. Even then, it doesn't make much difference, but you can use timespacex to see the effect:
timespacex '-:`(>:#(3&*))#.(2&|)^:(>&1)^:_"0 i.100000'
17.9748 1.78225e7
timespacex '-:`(>:#(3&*))#.(2&|)^:(>&1)^:_"0 M. i.100000'
17.9625 1.78263e7
Addendum: The placement of the M. relative to the "0 does seem to make a huge difference. I thought I might have made a mistake there, but a quick test showed that swapping them caused a huge performance loss in both time and space:
timespacex '-:`(>:#(3&*))#.(2&|)^:(>&1)^:_ M. "0 i.100000'
27.3633 2.41176e7
M. preserves the rank of the underlying verb, so the two are semantically equivalent, but I suspect with the "0 on the outside like this, the M. doesn't know that it's dealing with scalars. So I guess the lesson here is to make sure M. knows what it's dealing with. :)
BTW, if the Collatz conjecture turned out to be false, and you fed this code a counterexample, it would go into an infinite loop rather than produce an answer.
To actually detect a counterexample, you'd want to monitor the intermediate results until you found a cycle, and then return the lowest number in the cycle. To do this, you'd probably want to implement a custom adverb to replace ^:n.
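A rough sketch of that idea in Python rather than J (collatz_or_cycle is a name I made up): follow the trajectory, remember every value seen, and if a value repeats before reaching 1, return the smallest member of the cycle.
def collatz_or_cycle(n):
    seen = []
    while n != 1:
        if n in seen:                      # the trajectory closed a loop without hitting 1
            cycle = seen[seen.index(n):]   # the repeating part
            return min(cycle)              # smallest member of the counterexample cycle
        seen.append(n)
        n = 3 * n + 1 if n % 2 else n // 2
    return 1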
I want to calculate the time it will take to break a SHA-256 hash. So I researched and found the following calculation. If I have a password of lowercase letters with a length of 6 chars, I would have 26^6 passwords, right?
To calculate the time, I have to divide this number by a hash rate, I guess. So if I had one RTX 3090, the hash rate would be 120 MH/s (1.2*10^8 H/s), and then I need to calculate 26^6/(1.2*10^8) to get the time in seconds, right?
Is this idea right or wrong?
Yes, but a lowercase-latin 6 character string is also short enough that you would expect to compute this one time and put it into a database so that you could look it up in O(1). It's only a bit over 300M entries. That said, given you're 50% likely to find the answer in the first half of your search, it's so fast to crack that you might not even bother unless you were doing this often. You don't even need a particularly fancy GPU for something on this scale.
Note that in many cases the password may also be shorter than 6 characters, so you need to add 26^6 + 26^5 + 26^4 + ..., but all of these together only raise the total to around 320M hashes. It's a tiny space.
Adding uppercase, numbers and the easily typed symbols gets you up to 96^6 ~ 780B. On the other hand, adding just 3 more lowercase-letters (9 total) gets you to 26^9 ~ 5.4T. For brute force on random strings, longer is much more powerful than complicated.
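For what it's worth, the arithmetic is easy to check in a few lines of Python (the variable names are mine; the 120 MH/s figure is the one assumed in the question):
keyspace_6_lower = 26 ** 6                             # 308,915,776
keyspace_up_to_6 = sum(26 ** k for k in range(1, 7))   # 321,272,406, the "around 320M"
hash_rate = 1.2e8                                      # 120 MH/s
print(keyspace_6_lower / hash_rate)                    # about 2.6 seconds to sweep the whole space
print(96 ** 6)                                         # about 7.8e11
print(26 ** 9)                                         # about 5.4e12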
To your specific question, note that it does matter how you implement this. You won't get these kinds of hash rates if you don't write your code in a way to maximize the GPU. For example, writing simple code that sends one value to the GPU to hash at a time, and then compares the result on the CPU could be incredibly slow (in some cases slower than just doing all the work on a CPU). Setting up your memory efficiently and maximizing things the GPU can do in parallel are very important. If you're not familiar with this kind of programming, I recommend using or studying a tool like John the Ripper.
I am working on the following problem and having a hell of a time at the moment.
We are playing the Guessing Game. The game will work as follows:
I pick a number between 1 and n.
You guess a number.
If you guess the right number, you win the game.
If you guess the wrong number, then I will tell you whether the number I picked is higher or lower, and you will continue guessing.
Every time you guess a wrong number x, you will pay x dollars. If you run out of money, you lose the game.
Given a particular n, return the minimum amount of money you need to guarantee a win regardless of what number I pick.
So, what do I know? Clearly this is a dynamic programming problem. I have two choices, break things up recursively or go ahead and do things bottom up. Bottom up seems like a better choice to me (though technically the max recursion depth would be 100 as we are guaranteed n<=100). The question then is: What do the sub-problems look like?
Well, I think we could start thinking about subarrays (though possibly we need subsequences here), so what is the worst case in each possible subdivision? That is:
[1,2,3,4,5,6]
[[1],[2],[3],[4],[5],[6]] -> 21
[[1,2],[3,4],[5,6]] -> 9
[[1,2,3],[4,5,6]] -> 7
...
but I don't think I quite have the idea yet. So, to be succinct, since this post is kind of long: How are we breaking this up? What is the sub-problem we are trying to solve here?
Relevant Posts:
Binary Search Doesn't work in this case?
I'm studying for mid-terms and this is one of the questions from a past year's paper at university. (Question stated below.)
Given Euclid’s algorithm, we can write the function gcd.
def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(b, a % b)
[Reduced Proper Fraction]
Consider the fraction, n/d , where n and d are positive integers.
If n < d and GCD(n,d) = 1, it is called a reduced proper fraction.
If we list the set of reduced proper fractions for n <=8 in ascending order of size, we get:
1/8,1/7,1/6,1/5,1/4,2/7,1/3,3/8,2/5,3/7,1/2,4/7,3/5,5/8,2/3,5/7,3/4,4/5,5/6,6/7,7/8
It can be seen that there are 21 elements in this set.
Implement the function count_fraction that takes an integer n and returns the number of reduced proper fractions for n. Assuming that the order of growth (in time) for gcd is O(log n), what is the order of growth in terms of time and space for the function you wrote in Part (B), in terms of n? Explain your answer.
Suggested answer.
def count_fraction(n):
    if n == 1:
        return 0
    else:
        new = 0
        for i in range(1, n):
            if gcd(i, n) == 1:
                new += 1
        return new + count_fraction(n - 1)
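As a quick sanity check (my addition, not part of the suggested answer), the function reproduces the count from the example above:
print(count_fraction(8))   # 21, matching the 21 fractions listed earlier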
The suggested answer is pretty strange: in previous years this question was designed to test purely recursive or purely iterative solutions, but this one mixes both. Nevertheless, I don't understand why the suggested order of growth is given as it is. (I will write it in the format: suggested answer, my answer, and questions about my fundamentals.)
Time: O(n log n), since it's roughly log 1 + log 2 + ... + log(n-1) + log n.
My time: O(n^2 log n), since there are n recursive function calls, each call has up to n-1 iterations, and each iteration takes O(log n) time due to gcd.
Question 1: Time, in my opinion, is the number of iterations/recursions times the time taken for one iteration/recursion. It's actually my first time dealing with a mixed iterative/recursive solution, so I don't really know how the two interact. Can someone tell me whether I'm right or wrong?
Space: O(n), since gcd is O(1) and this code is obviously linear recursion.
My space: O(n * log n), since gcd is O(log n) and this code takes up O(n) space.
Question 2: Space, in my opinion, is the number of recursions times the space taken for one recursive call, OR the largest amount of space required among all iterations. In the first place, I would think gcd is O(log n), as I assume its recursion happens log n times. I want to ask whether the discrepancy is due to what my lecturer said.
(I don't really understand what my lecturer says about delayed operations for recursion on factorial, or about no new objects being formed in iteration. How do you then accept the fact that new objects are formed in recursion and there are no delayed operations in iteration?)
If you can clarify my doubt about why gcd is O(1) instead of O(log n), then if I take n * 1 for the recursive case, I would agree with the answer.
I agree with your analysis of the running time. It should be O(n^2 log(n)), since you make up to n calls to gcd on each recursive call to count_fraction.
You're also partly right about the second question, but you get the conclusion wrong (and the supplied answer gets the right conclusion for the wrong reasons). The gcd function does indeed use O(log(n)) space, for the stack of the recursive calls. However, that space gets reused for each later call to gcd from count_fraction, so there's only ever one stack of size log(n). So there's no reason to multiply the log(n) by anything, only add it to whatever else might be using memory when the gcd calls are happening. Since there will also be a stack of size O(n) for the recursive calls of count_fraction, the smaller log(n) term can be dropped, so you say it takes O(n) space rather than O(n + log(n)).
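If you want to see that reuse concretely, here is a small instrumented sketch (my own; track, depth and max_depth are made-up names, and it assumes the gcd and count_fraction definitions above): it records the deepest combined call stack actually reached during a run.
depth = 0
max_depth = 0

def track(f):
    def wrapped(*args):
        global depth, max_depth
        depth += 1
        max_depth = max(max_depth, depth)
        try:
            return f(*args)
        finally:
            depth -= 1
    return wrapped

gcd = track(gcd)
count_fraction = track(count_fraction)
count_fraction(100)
print(max_depth)   # roughly 100 plus one small gcd stack, nowhere near 100 times log(100)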
All in all, I'd say this is a really bad assignment to be trying to learn from. Almost everything in it has an error somewhere, from the description saying it's limiting n when it's really limiting d, to the answers you describe which are all at least partly wrong.
This is a first run-in with not only bitwise ops in python, but also strange (to me) syntax.
for i in range(2 ** len(set_) // 2):
    parts = [set(), set()]
    for item in set_:
        parts[i & 1].add(item)
        i >>= 1
For context, set_ is just a list of 4 letters.
There's a bit to unpack here. First, I've never seen [set(), set()]. I must be using the wrong keywords, as I couldn't find it in the docs. It looks like it creates a matrix in pythontutor, but I cannot say for certain. Second, while parts[i&1] is a slicing operation, I'm not entirely sure why a bitwise operation is required. For example, 0&1 should be 1 and 1&1 should be 0 (carry the one), so binary 10 (or 2 in decimal)? Finally, the last bitwise operation is completely bewildering. I believe a right shift is the same as dividing by two (I hope), but why i>>=1? I don't know how to interpret that. Any guidance would be sincerely appreciated.
[set(), set()] creates a list consisting of two empty sets.
0&1 is 0, 1&1 is 1. There is no carry in bitwise operations. parts[i&1] therefore refers to the first set when i is even, the second when i is odd.
i >>= 1 shifts right by one bit (which is indeed the same as dividing by two), then assigns the result back to i. It's the same basic concept as using i += 1 to increment a variable.
The effect of the inner loop is to partition the elements of set_ into two subsets, based on the bits of i. If the limit in the outer loop had been simply 2 ** len(set_), the code would generate every possible such partitioning. But since that limit was divided by two, only half of the possible partitions get generated - I couldn't guess what the point of that might be, without more context.
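For example (a small demo of my own, with a concrete four-letter set_):
set_ = ['a', 'b', 'c', 'd']
for i in range(2 ** len(set_) // 2):      # i = 0 .. 7
    parts = [set(), set()]
    for item in set_:
        parts[i & 1].add(item)            # route each item by the current low bit of i
        i >>= 1
    print(parts)
Here i = 0 leaves everything in parts[0], i = 1 moves only 'a' into parts[1], and so on; the eight values of i produce the eight splits in which 'd' stays in parts[0].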
I've never seen [set(), set()]
This isn't anything interesting, just a list with two new sets in it. So you have seen it, because it's not new syntax. Just a list and constructors.
parts[i&1]
This tests the least significant bit of i and selects either parts[0] (if the lsb was 0) or parts[1] (if the lsb was 1). Nothing fancy like slicing, just plain old indexing into a list. The thing you get out is a set, .add(item) does the obvious thing: adds something to whichever set was selected.
but why i>>=1? I don't know how to interpret that
Take the bits in i and move them one position to the right, dropping the old lsb and keeping the sign, much like a fixed-width arithmetic shift right. Except of course that in Python you have arbitrary-precision integers, so the value is however long it needs to be rather than a fixed 8 bits.
For positive numbers, the part about copying the sign is irrelevant.
You can think of a right shift by 1 as a flooring division by 2 (this is different from truncation: negative numbers are rounded towards negative infinity, e.g. -1 >> 1 == -1), but that interpretation is usually more complicated to reason about.
Anyway, the way it is used here is just a way to loop through the bits of i, testing them one by one from low to high, but instead of changing which bit it tests it moves the bit it wants to test into the same position every time.
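A tiny stand-alone illustration of that pattern (my own example):
i = 13              # binary 1101
while i:
    print(i & 1)    # prints 1, 0, 1, 1 -- the bits of 13 from low to high
    i >>= 1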
I am reading page 69 of Haskell School of Expression and I am not sure that I got the evaluation of rev [1,2,3,4] right.
Hudak does not explain the evaluation (rewriting) order for reverse in detail in his book.
Could someone please either confirm that my guess (shown in the attached picture) is correct or if not correct then point out what I got wrong. I believe that it is correct but I am not 100% sure, this is the reason for asking.
So the question is:
when I evaluate one step of reverse, then after the evaluation (i.e. rewriting) the result should be surrounded by parentheses, right?
If I understand correctly, this unlucky placement of parentheses is the reason for the poor (read: quadratic) time complexity of reverse. In this example, 6 steps in total are spent on list appending in order to reverse a 4-element list.
Yes, nested, left-associated calls to append (which in Haskell goes by the names (++) and (<>)) generate poor performance on singly-linked lists.
There are several solutions to this problem, since it's been known about for 30 or 40 years, at least. I believe the library version of reverse uses an accumulator to achieve linear complexity rather than quadratic, but it's still not something you want to call frequently on lists.
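To make the accumulator trick concrete, here is a rough sketch in Python rather than Haskell (append, naive_rev and rev are names I made up), modelling a singly-linked list as nested (head, tail) tuples:
def append(xs, ys):
    # walks all of xs before reaching ys, like Haskell's (++)
    return ys if xs is None else (xs[0], append(xs[1], ys))

def naive_rev(xs):
    # rev (x:xs) = rev xs ++ [x]: re-walks the growing result at every step, quadratic overall
    return None if xs is None else append(naive_rev(xs[1]), (xs[0], None))

def rev(xs, acc=None):
    # accumulator version: one cons per element, linear overall
    return acc if xs is None else rev(xs[1], (xs[0], acc))

xs = (1, (2, (3, (4, None))))
print(naive_rev(xs) == rev(xs))   # True; both yield 4, 3, 2, 1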