J Primes Enumeration

J will answer the n-th prime via p: n.
If I ask for the 100 millionth prime I get an almost instant answer. I can't imagine J is sieving for that prime so quickly, but nor can it be looking it up in a table, since that table would be around 1 GB in size.
There are equations that approximate the number of primes below a bound, but they are only approximations.
How is J finding the answer so quickly?

J uses a table to start, then calculates
NOTE! This is speculation, based on benchmarks (shown below).
If you want to quickly try for yourself, try the following:
p:1e8 NB. near-instant
p:1e8-1 NB. noticeable pause
The low points on the graph are where J finds the prime directly in a table. Elsewhere, J calculates forward from a nearby table entry rather than from scratch. So some lucky arguments are constant time (a simple table lookup), but in general there is first a table lookup and then a calculation, and happily the calculation starts from the previous table entry instead of computing the entire value.
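If that speculation is right, the strategy can be sketched in Python: store every k-th prime as a checkpoint, then walk forward from the nearest checkpoint at or below the requested index. The checkpoint spacing, helper names and trial-division tester here are all illustrative, not J's actual internals.

```python
def is_prime(n):
    """Trial division; fine for a small demo."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

# Checkpoint table: every STEP-th prime (0-based index -> prime value).
STEP = 100
_checkpoints = {}

def _build_checkpoints(limit_index):
    idx, n = 0, 2
    while idx <= limit_index:
        if is_prime(n):
            if idx % STEP == 0:
                _checkpoints[idx] = n
            idx += 1
        n += 1

_build_checkpoints(1000)

def nth_prime(i):
    """0-based like J's p: ; walk forward from the nearest checkpoint."""
    base = (i // STEP) * STEP
    idx, n = base, _checkpoints[base]
    while idx < i:
        n += 1
        while not is_prime(n):
            n += 1
        idx += 1
    return n
```

Indices that land exactly on a checkpoint return in constant time; other indices pay for at most STEP - 1 forward steps, which matches the sawtooth timing pattern in the benchmarks below.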
Benchmarks
I did some benchmarking to see how p: performs on my machine (iMac i5, 16G RAM). I'm using J803. The results are interesting. I'm guessing the sawtooth pattern in the time plots (visible on the 'up to 2e5' plot) is lookup table related, while the overall log-ish shape (visible on the 'up to 1e7' plot) is CPU related.
NB. my test script
ts=: 3 : 0
 b=. ''                              NB. seed; dropped by }: below
 a=. y
 while. a do.
  c=. timespacex 'p: ', ": 1e4 * a   NB. embed the value, since the local a is not visible to the executed sentence
  a=. <: a
  b=. c ; b
 end.
 }: b
)
a =: ts 200
require'plot'
plot >0{each a NB. time
plot >1{each a NB. space
(Plots omitted: time and space graphs for p: up to 2e5, and for p: up to 1e7.)
During these runs one core was hovering around 100%:
Also, the J vocabulary page states:
Currently, arguments larger than 2^31 are tested to be prime according to a probabilistic algorithm (Miller-Rabin).
And in addition to the prime lookup table that #Mauris points out, v2.c contains this function:
static F1(jtdetmr){A z;B*zv;I d,h,i,n,wn,*wv;
RZ(w=vi(w));
wn=AN(w); wv=AV(w);
GA(z,B01,wn,AR(w),AS(w)); zv=BAV(z);
for(i=0;i<wn;++i){
n=*wv++;
if(1>=n||!(1&n)||0==n%3||0==n%5){*zv++=0; continue;}
h=0; d=n-1; while(!(1&d)){++h; d>>=1;}
if (n< 9080191)*zv++=spspd(31,n,d,h)&&spspd(73,n,d,h);
else if(n<94906266)*zv++=spspd(2 ,n,d,h)&&spspd( 7,n,d,h)&&spspd(61,n,d,h);
else *zv++=spspx(2 ,n,d,h)&&spspx( 7,n,d,h)&&spspx(61,n,d,h);
}
RE(0); R z;
} /* deterministic Miller-Rabin */
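The C above is deterministic Miller-Rabin: spspd/spspx are strong-probable-prime tests (the two variants differ only in arithmetic width), and for n below the hard-coded thresholds a fixed set of witness bases (31 and 73, or 2, 7 and 61) is known to suffice. A rough Python equivalent of the same test, as a sketch rather than J's actual code:

```python
def _spsp(a, n, d, h):
    """Is n a strong probable prime to base a?  Here n - 1 == d * 2^h, d odd."""
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(h - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

def det_miller_rabin(n):
    """Deterministic Miller-Rabin with the witness sets used in v2.c.
    The small primes 2, 3, 5 are handled here; the C routine leaves them
    to its caller (it classifies them as composite)."""
    if n in (2, 3, 5):
        return True
    if n <= 1 or n % 2 == 0 or n % 3 == 0 or n % 5 == 0:
        return False
    d, h = n - 1, 0
    while d % 2 == 0:
        d //= 2
        h += 1
    # Same thresholds as the C code; the 94906266 branch there only switches
    # to wider arithmetic, the bases stay 2, 7, 61.
    bases = (31, 73) if n < 9080191 else (2, 7, 61)
    return all(_spsp(a, n, d, h) for a in bases)
```

Checking a handful of fixed bases keeps the test O(log n) multiplications per candidate, which fits the near-instant responses for large arguments.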

Related

What's the Big-O notation for this algorithm for printing the prime numbers?

I am trying to figure out the time complexity of the program below.
import math

def prime(n):
    for i in range(2, n + 1):
        for j in range(2, int(math.sqrt(i)) + 1):
            if i % j == 0:
                break
        else:
            print(i)

prime(36)
This program prints the prime numbers up to 36.
My understanding of the above program:
for every i, the inner loop runs about sqrt(i) times, for i from 2 up to n,
so the Big-O notation is O(n sqrt(n)).
Is my understanding right? Please correct me if I am wrong.
Time complexity measures the increase in the number of steps (basic operations) as the input scales up:
O(1) : constant (hash look-up)
O(log n) : logarithmic in base 2 (binary search)
O(n) : linear (search for an element in unsorted list)
O(n^2) : quadratic (bubble sort)
Determining the exact complexity of an algorithm requires some math and algorithms knowledge. You can find a detailed description here: time complexity
Also keep in mind that these values matter for very large values of n, so as a rule of thumb, whenever you see nested for loops, think O(n^2).
You can add a step counter inside your inner for loop and record its value for different values of n, then plot the relation. You can then compare your graph with the graphs of n, log n, n sqrt(n) and n^2 to determine where your algorithm falls.
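The step-counter suggestion can be sketched like this (the helper name is mine; the counted loop body is the program from the question):

```python
import math

def prime_steps(n):
    """Count inner-loop iterations of the prime-printing program,
    without the printing."""
    steps = 0
    for i in range(2, n + 1):
        for j in range(2, int(math.sqrt(i)) + 1):
            steps += 1
            if i % j == 0:
                break
    return steps

# Compare the measured step count against the n*sqrt(n) bound.
for n in (100, 1000, 10000):
    print(n, prime_steps(n), round(prime_steps(n) / (n * math.sqrt(n)), 3))
```

Note that the measured counts sit well below n*sqrt(n), because the break exits early for most composites; O(n sqrt(n)) is an upper bound, not a tight count.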

Is there any way to make this code more efficient?

I have to write code to count the elements that have the maximum number of divisors between any 2 given numbers (A[0], A[1]), inclusive of both. The input is lines of space-separated numbers; the first line gives the number of cases. This code works fine but takes some time to execute. Can anyone please help me make it more efficient?
import numpy as np
from sys import stdin

t = input()
for i in range(int(t)):
    if int(t) <= 100 and int(t) >= 1:
        divisor = []
        A = list(map(int, stdin.readline().split(' ')))
        def divisors(n):
            count = 0
            for k in range(1, int(n / 2) + 1):
                if n % k == 0:
                    count += 1
            return count
        for j in np.arange(A[0], A[1] + 1):
            divisor.append(divisors(j))
        print(divisor.count(max(divisor)))
Sample input:
2
2 9
1 10
Sample Output:
3
4
There is a way to calculate the number of divisors from the prime factorisation of a number.
Given the prime factorisation, calculating the divisor count is faster than trial division (which is what you do here).
But the prime factorisation itself has to be fast. For small numbers, having a pre-calculated list of primes (easy to do) makes both the factorisation and the divisor calculation fast. If you know the upper limit of the numbers you test (let's call it L), then you only need the primes up to sqrt(L). Given the prime factorisation of a number n = p_1^e_1 * p_2^e_2 * ... * p_k^e_k, the number of divisors is simply (1+e_1) * (1+e_2) * ... * (1+e_k).
Even more, you can pre-calculate and/or memoize the number of divisors of frequently used numbers up to some limit. This saves a lot of time but increases memory use; otherwise you can calculate it directly (for example using the previous method).
Apart from that, you can optimise the code a bit. For example, you can avoid doing the int(t) casting (and similar) every iteration; do it once and store the result in a variable.
Numpy can be avoided altogether; it is superfluous here, and I doubt it adds any speed advantage.
That should make your code faster, but you always need to measure performance with real tests.
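A sketch of the factorisation-based divisor count described above (the sieve limit and helper names are illustrative):

```python
def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, ok in enumerate(sieve) if ok]

def divisor_count(n, primes):
    """Number of divisors of n via its prime factorisation:
    n = p1^e1 * ... * pk^ek  ->  d(n) = (1+e1) * ... * (1+ek)."""
    count = 1
    for p in primes:
        if p * p > n:
            break
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        count *= e + 1
    if n > 1:      # leftover prime factor larger than sqrt of the original n
        count *= 2
    return count

primes = primes_up_to(1000)   # enough for n up to 1e6
print([divisor_count(n, primes) for n in range(2, 10)])
# -> [2, 2, 3, 2, 4, 2, 4, 3]
```

Note this counts n itself as a divisor, whereas the question's divisors() stops at n/2 and so reports one less; for finding which elements attain the maximum, the constant offset does not matter.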

What is the time complexity of this algorithm (that solves leetcode question 650) (question 2)?

Hello I have been working on https://leetcode.com/problems/2-keys-keyboard/ and came upon this dynamic programming question.
You start with one 'A' on a blank page and are given a number n; when you are done you should have n 'A's on the page. The catch is that you are allowed only 2 operations: copy (and you can only copy the total number of 'A's currently on the page) and paste. Find the minimum number of operations to get n 'A's on the page.
I solved this problem but then found a better solution in the discussion section of leetcode, and I can't figure out its time complexity.
def minSteps(self, n):
    factors = 0
    i = 2
    while i <= n:
        while n % i == 0:
            factors += i
            n //= i   # integer division; the original's n /= i yields a float in Python 3
        i += 1
    return factors
The way this works: i will never get bigger than the biggest prime factor p of n, so the outer loop is O(p), and the inner while loop does at most O(log n) work in total, since each iteration divides n by i.
But the way I look at it, we do O(log n) divisions in total for the inner loop while the outer loop is O(p), so using aggregate analysis this function is O(max(p, log n)). Is this correct?
Any help is welcome.
Your reasoning is correct: O(max(p, logn)) gives the time complexity, assuming that arithmetic operations take constant time. This assumption is not true for arbitrary large n, that would not fit in the machine's fixed-size number storage, and where you would need Big-Integer operations that have non-constant time complexity. But I will ignore that.
It is still odd to express the complexity in terms of p when that is not the input (but derived from it). Your input is only n, so it makes sense to express the complexity in terms of n alone.
Worst Case
Clearly, when n is prime, the algorithm is O(n) -- the inner loop never iterates.
For a prime n, the algorithm will take more time than for n+1, as even the smallest factor of n+1 (i.e. 2), will halve the number of iterations of the outer loop, and yet only add 1 block of constant work in the inner loop.
So O(n) is the worst case.
Average Case
For the average case, we note that the division of n happens just as many times as n has prime factors (counting duplicates). For example, for n = 12, we have 3 divisions, as n = 2·2·3
The average number of prime factors for 1 < n < x approaches log log n + B, where B is some constant. So we could say the average time complexity for the total execution of the inner loop is O(log log n).
We need to add to that the execution of the outer loop. This corresponds to the average greatest prime factor. For 1 < n < x this average approaches C·n/log n, and so we have:
O(n/log n + log log n)
Now n/log n is the more important term here, so this simplifies to:
O(n/log n)
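To see the O(max(p, log n)) behaviour concretely, one can instrument the loop and count iterations; this sketch (names mine) adds a counter to the question's function:

```python
def min_steps_instrumented(n):
    """minSteps with a counter for total loop iterations (outer + inner)."""
    factors = 0
    iterations = 0
    i = 2
    while i <= n:
        iterations += 1          # one outer-loop test of a candidate factor i
        while n % i == 0:
            iterations += 1      # one division of n by i
            factors += i
            n //= i
        i += 1
    return factors, iterations

# A prime like 97 forces p = n outer iterations; a power of two like 1024
# finishes after one outer iteration plus log2(n) divisions.
for n in (12, 97, 1024):
    print(n, min_steps_instrumented(n))
```

The prime case dominates: for n = 97 the counter reaches 97, while for n = 1024 it is only 11, matching the worst case O(n) and the cheap smooth-number cases discussed above.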

Find euclidean distance between rows of two huge CSR matrices

I have two sparse matrices, A and B. A is 120000x5000 and B is 30000x5000. I need to find the Euclidean distance between each row in B and all rows of A, and then find the 5 rows in A with the lowest distance to the selected row in B. As it is very big data I am using CSR, otherwise I get a memory error. For each row in A it calculates (x_b - x_a)^2 5000 times, sums them, and then takes a sqrt. This process is taking a very, very long time, like 11 days! Is there any way I can do this more efficiently? I just need the 5 rows with the lowest distance to each row in B.
I am implementing K-Nearest Neighbours and A is my training set and B is my test set.
Well, I don't know if you could "vectorize" that code so that it would run in native code instead of Python; the trick to speeding up numpy and scipy is always getting that.
If you could run that code in native code on a 1 GHz CPU, with 1 FP instruction per clock cycle, you'd get it done in a little under 10 hours, since the total operation count, in units of 1024^3, is:
(5000 * 2 * 30000 * 120000) / 1024 ** 3   # about 33500, i.e. roughly 33500 seconds at one operation per cycle
Raise that to 1.5 GHz x 2 physical CPU cores x 4-way SIMD instructions with multiply + accumulate (Intel AVX extensions, available in most CPUs) and you could get that number crunching down to about one hour, at 2 x 100% CPU on a modest Core i5 machine. But that would require full SIMD optimization in native code, which is far from a trivial task (although, if you decide to go down this path, further questions on S.O. could get help from people keen to wet their hands in SIMD coding :-) ). Interfacing such C code with scipy is not hard using cython, for example (you only need that part to reach the 10-hour figure above).
Now... as for algorithm optimization, and keeping things in Python :-)
The fact is, you don't need to fully calculate all distances to the rows in A; you just need to keep a sorted list of the 5 lowest rows, and any time the accumulating sum of squares gets larger than the distance of the 5th nearest row (so far), you just abort the calculation for that row.
You could use Python's heapq operations for that:
import heapq
import math

def get_closer_rows(b_row, a):
    # 5 best (squared distance, row index) pairs, sorted ascending;
    # -1 is a placeholder index
    result = [(float("inf"), -1)] * 5
    for i, a_row in enumerate(a):
        distance_sq = 0
        count = 0
        for element_a, element_b in zip(a_row, b_row):
            distance_sq += (element_a - element_b) ** 2
            # every 64 elements, check whether we can already give up on this row
            if not count % 64 and distance_sq > result[4][0]:
                break
            count += 1
        else:
            result = heapq.nsmallest(5, result + [(distance_sq, i)])
    return [(math.sqrt(d), i) for d, i in result]

closer_rows_to_b = []
for row in b:
    closer_rows_to_b.append(get_closer_rows(row, a))
Note the auxiliary "count" variable, used to avoid the expensive retrieval and comparison of values on every multiplication.
Now, if you can run this code using pypy instead of regular Python, I believe it could get the full benefit of JITting, and you could see a noticeable improvement over your current times if you are running pure (i.e. non numpy/scipy vectorized) Python code.
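For reference, the vectorization mentioned at the top of this answer could look like the following for dense blocks. It uses the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b; the function name and block size are illustrative, and this is a sketch rather than a drop-in solution for CSR input:

```python
import numpy as np

def top5_rows(A, B, block=1024):
    """For each row of B, indices of the 5 nearest rows of A (Euclidean),
    computed blockwise via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b."""
    a_sq = (A ** 2).sum(axis=1)                # ||a||^2 for every row of A
    out = np.empty((B.shape[0], 5), dtype=int)
    for start in range(0, B.shape[0], block):
        b = B[start:start + block]
        b_sq = (b ** 2).sum(axis=1)
        # squared distances, shape (rows in this block, rows of A)
        d2 = b_sq[:, None] + a_sq[None, :] - 2.0 * (b @ A.T)
        out[start:start + block] = np.argsort(d2, axis=1)[:, :5]
    return out
```

For scipy CSR matrices the b @ A.T product works directly, but the squared row norms would need something like A.multiply(A).sum(axis=1) instead of (A ** 2).sum(axis=1); np.argpartition would also avoid the full per-row sort.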

How to differentiate two very long strings in C++?

I would like to solve the Levenshtein_distance problem where the length of the strings is huge.
Edit 2:
As Bobah said, the title was misleading, so I have updated the title of the question.
The initial title was: how to declare a 100000x100000 2-d integer array in C++?
The content was:
Is there any way to declare int x[100000][100000] in C++?
When I declare it globally, the compiler produces: error: size of array ‘x’ is too large.
One method could be using map< pair< int , int > , int > mymap, but allocating and deallocating takes more time. Is there any other way, like using vector<int> myvec?
For memory blocks that large, the best approach is dynamic allocation using the operating system's facilities for adding virtual memory to the process.
However, look how large a block you are trying to allocate:
40 000 000 000 bytes
I take my previous advice back. For a block that large, the best approach is to analyze the problem and figure out a way to use less memory.
Filling the edit distance matrix can be done one row at a time: remembering the previous row is enough to compute the current row. This observation reduces space usage from quadratic to linear. Makes sense?
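A sketch of that row-by-row idea (in Python for brevity; the same structure carries over to C++ with two std::vector<int> rows instead of a 100000x100000 matrix):

```python
def levenshtein(s, t):
    """Edit distance keeping only two rows of the DP matrix: O(len(t)) space."""
    prev = list(range(len(t) + 1))      # distances from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        curr = [i]                      # distance from s[:i] to ""
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (cs != ct),   # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]
```

Space is linear in the shorter string if you pass it as t, while time remains O(len(s) * len(t)), which is why the answer below goes further with a banded (donut) matrix.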
Your question is very interesting, but the title is misleading.
This is what you need in terms of data model (x - first string, y - second string, * - distance matrix).
y <-- first string (scrolls from top down)
y
x x x x x x x x <- second string (scrolls from left to right)
y * * *
y * * *
y * * * <-- distance matrix (a donut) scrolls together with strings
and grows/shrinks when needed, as explained below
y
Have two relatively long (but still << N) character buffers and a relatively small (<< the buffer size) rectangular (start from square) distance matrix.
Make the matrix a donut: a bi-dimensional ring buffer (you can use the one from boost, or just std::deque).
When the string fragments currently covered by the matrix are a 100% match, shift both buffers by one, rotate the donut around both axes, and recalculate one new row/column in the distance matrix.
When the match is < 100% and less than a configured threshold, grow both dimensions of the donut without dropping any rows/columns, until either the match gets above the threshold or you reach the maximum donut size. When the match ratio hits the threshold from below, you need to scroll the donut, discarding the heads of the x and y buffers and at the same time re-aligning them (only x needs moving by 1 when the distance matrix tells you that X[i] does not exist in Y, but X[i+1..i+m] matches Y[j..j+m-1]).
As a result you will have a simple yet very efficient heuristic diff engine with a deterministic, limited memory footprint; all memory can be pre-allocated at startup, so no dynamic allocation will slow it down at runtime.
Apache v2 license, in case you decide to go for it.
