Limit of Python recursion functions (Process finished with exit code 139) - python-3.x

I had an old script that computes new columns of a pandas dataframe from other columns, but also from the previously calculated value of the column being computed.
This script used for loops, and it was quite slow.
For this reason, I replaced the for loops with recursive functions.
The new script is around 100 times faster than the old one, which is good news. But I am now encountering a limit that I did not have before. As soon as I have more than 29952 rows in my dataset, I get the following error:
"Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)"
Here is a little script using lists that reproduces my problem: if I increase the size of the lists (list_lenght) beyond 29952, the script crashes (on my computer).
import random
import sys

def list_generator(min_value, max_value, list_lenght):
    return [random.randrange(min_value, max_value) for i in range(list_lenght)]

def recursive_function(list_1, list_2, n, result):
    if n == len(list_1):
        return result
    elif list_1[n] <= list_2[n]:
        result.append(1 + result[n - 1])
    else:
        result.append(0)
    return recursive_function(list_1, list_2, (n + 1), result)

list_lenght = 29952  # How to increase this limit without generating an error?
min_value = 10
max_value = 20
list_one = list_generator(min_value, max_value, list_lenght)
list_two = list_generator(min_value, max_value, list_lenght)

# Set recursion limit
sys.setrecursionlimit(list_lenght * 2)

# Compute a new list from list_one and list_two
list_result = recursive_function(list_one, list_two, 1, [0])
I suspect a memory problem, but how do you take advantage of the full power of Python's recursive functions while avoiding this limit as much as possible?
Thanks in advance.
Following a comment from @trincot, here is the version of the code without a recursive function... which is ultimately faster than the recursive version above, and which no longer hits any limit:
def no_recursive_function(list_1, list_2, n, result):
    if list_1[n] <= list_2[n]:
        return 1 + result[n - 1]
    else:
        return 0

list_lenght = 29952
min_value = 10
max_value = 20
list_one = list_generator(min_value, max_value, list_lenght)
list_two = list_generator(min_value, max_value, list_lenght)

# Set recursion limit
sys.setrecursionlimit(list_lenght * 2)

list_result_2 = [0]
for n in range(list_lenght - 1):
    result = no_recursive_function(list_one, list_two, n + 1, list_result_2)
    list_result_2.append(result)
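Exit code 139 (SIGSEGV) comes from overflowing the C stack of the process: sys.setrecursionlimit() only raises the interpreter's own guard, it does not make the stack any bigger, so a deep enough recursion still crashes. A common workaround, sketched below under stated assumptions (run_with_big_stack is a hypothetical helper, not from the question), is to run the deep recursion in a worker thread that is given a larger stack:

import sys
import threading

def run_with_big_stack(target, stack_bytes=64 * 1024 * 1024, limit=10**6):
    """Run target() in a thread whose C stack is large enough for deep recursion."""
    sys.setrecursionlimit(limit)
    threading.stack_size(stack_bytes)   # must be called before the thread starts
    result = []
    worker = threading.Thread(target=lambda: result.append(target()))
    worker.start()
    worker.join()
    return result[0]

# e.g. run_with_big_stack(lambda: recursive_function(list_one, list_two, 1, [0]))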

Related

Why this multiprocessing code hangs in perpetuity?

Trying to parallelize some code in Python. Both methods, apply and map from the multiprocessing library, hang in perpetuity when executing the following code.
import multiprocessing as mp
import numpy as np

# The following function will be parallelized.
def howmany_within_range(row, minimum, maximum):
    """Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

# Step 1: Create data
np.random.RandomState(100)
arr = np.random.randint(0, 10, size=[200000, 5])
data = arr.tolist()
data[:5]

# Step 2: Init multiprocessing.Pool()
pool = mp.Pool(mp.cpu_count())

# Step 3: `pool.apply` the `howmany_within_range()`
results = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]

# Step 4: close
pool.close()

print(results[:10])
The other method pool.map also hangs:
# Redefine, with only 1 mandatory argument.
def howmany_within_range_rowonly(row, minimum=4, maximum=8):
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

pool = mp.Pool(mp.cpu_count())
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
print(results[:10])
What is wrong?
Ps. Working on Python 3.8.11 (Jupyter Notebook 6.1.4)
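For reference, the standard idiom from the multiprocessing docs creates the pool under a main-module guard, because spawn-based setups (Windows, macOS, and notebook kernels, where the worker function must be importable by the child processes) re-import the main module to bootstrap the workers. A minimal sketch of that idiom with made-up data; whether it resolves this particular Jupyter hang is an assumption:

import multiprocessing as mp

def howmany_within_range(row, minimum, maximum):
    """Count how many numbers in `row` lie within [minimum, maximum]."""
    return sum(minimum <= n <= maximum for n in row)

if __name__ == "__main__":
    data = [[1, 5, 9, 2, 7]] * 1000           # made-up stand-in for the real data
    with mp.Pool(mp.cpu_count()) as pool:     # pool created only in the parent
        results = pool.starmap(howmany_within_range, [(row, 4, 8) for row in data])
    print(results[:10])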

Why is heapq.heapify so fast?

I have tried to reimplement the heapify method in order to use _siftup and _siftdown for updating or deleting nodes in the heap while maintaining a time complexity of O(log(n)).
I made some effort to optimize my code, but it proved to be worse than heapq.heapify (in terms of the total time taken). So I decided to look into the source code and compare my copied code with the module's.
# heap invariant.
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now. Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up. The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(range(n//2)):
        _siftup(x, i)

a = list(reversed(range(1000000)))
b = a.copy()

import heapq
import time

cp1 = time.time()
heapq.heapify(a)
cp2 = time.time()
heapify(b)
cp3 = time.time()

print(a == b)
print(cp3 - cp2, cp2 - cp1)
I always found that cp3 - cp2 >= cp2 - cp1, and never the same: my heapify took more time than heapq.heapify even though both are the same code. In some cases my heapify took 3 seconds while heapq.heapify took 0.1 seconds.
heapq.heapify executes faster than the identical heapify; they only differ through the import.
Please let me know the reason; I am sorry if I made some silly mistake.
The heapify from the heapq module is actually a built-in function:
>>> import heapq
>>> heapq
<module 'heapq' from 'python3.9/heapq.py'>
>>> heapq.heapify
<built-in function heapify>
help(heapq.heapify) says:
Help on built-in function heapify in module _heapq...
So it's actually importing the built-in module _heapq and thus running C code, not Python.
If you scroll the heapq.py code further, you'll see this:
# If available, use C implementation
try:
    from _heapq import *
except ImportError:
    pass
This will overwrite functions like heapify with their C implementations. For instance, _heapq.heapify is here.
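To measure the gap more robustly than with single time.time() pairs, both functions can be compared with timeit (a sketch; heapify here is the pure-Python copy from the question):

import heapq
import timeit

setup = "data = list(reversed(range(100000)))"
c_time = timeit.timeit("heapq.heapify(data[:])", setup=setup, globals=globals(), number=20)
py_time = timeit.timeit("heapify(data[:])", setup=setup, globals=globals(), number=20)
print("C heapify: %.3fs, pure-Python heapify: %.3fs" % (c_time, py_time))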

Problem in the function of my program code python

I tried to make a program to do the things below, but apparently the function doesn't work. I want my function to take two or more arguments and give me the average, the median, and the maximum of those arguments.
Example input: calc([2, 20])
Example output: (11.0, 11.0, 20)
def calc():
    total = 0
    calc = sorted(calc)
    for x in range(len(calc)):
        total += int(calc[x])
    average = total / len(calc)
    sorted(calc)
    maximum = calc[len(calc) - 1]
    if len(calc) % 2 != 0:
        median = calc[(len(calc) // 2) + 1]
    else:
        median = (float(calc[(len(calc) // 2) - 1]) + float(calc[(len(calc) // 2)])) / 2
    return (average, median, maximum)
There are some things I'm going to fix as I go since I can't help myself.
First, your main problem is arguments. If you hand a function arguments,
calc([2, 20])
it needs to accept arguments:
def calc(some_argument):
This will fix your main problem, but another thing is that you shouldn't have identical names for your variables: calc is your function name, so it should not also be the name of the list within your function.
# changed the arg name to lst
def calc(lst):
    lst = sorted(lst)
    # I'm going to just set these as variables since
    # you're doing the calculations more than once;
    # it adds a lot of noise to your lines
    size = len(lst)
    mid = size // 2
    total = 0
    # in python we can just iterate over a list directly
    # without indexing into it,
    # and python will unpack the variable into x
    for x in lst:
        total += int(x)
    average = total / size
    # we can get the last element in a list like so
    maximum = lst[-1]
    if size % 2 != 0:
        # this was a logical error:
        # the actual element you want is mid,
        # since indexes start at 0
        median = lst[mid]
    else:
        # here there is no reason to explicitly cast to float,
        # since python division does that automatically
        median = (lst[mid - 1] + lst[mid]) / 2
    return (average, median, maximum)

print(calc([11.0, 11.0, 20]))
Output:
(14.0, 11.0, 20)
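For comparison, the standard library can compute the same three values directly; a small cross-check (calc_stdlib is a hypothetical name, not from either answer):

from statistics import mean, median

def calc_stdlib(lst):
    """Average, median and maximum via the statistics module."""
    return (mean(lst), median(lst), max(lst))

print(calc_stdlib([2, 20]))   # (11, 11.0, 20)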
Because you are passing arguments into a function that doesn't accept any, you are getting an error. You could fix this just by making the first line of your program:
def calc(calc):
But it would be better to accept inputs into your function as something like "mylist". To do so you would just have to change your function like so:
def calc(mylist):
    calc = sorted(mylist)

Is the Benchmarking of my Algorithms right?

I wrote Quicksort and Mergesort and a benchmark for them, to see how fast they are.
Here is my code:
#------------------------------Creating a Random List-----------------------------#
def create_random_list(length):
    import random
    random_list = list(range(0, length))
    random.shuffle(random_list)
    return random_list

# Initialize default list length to 0
random_list = create_random_list(0)

# Testing random list function
print("\n" + "That is a randomized list: " + "\n")
print(random_list)
print("\n")
#-------------------------------------Quicksort-----------------------------------#
"""
Recursive Divide and Conquer Algorithm
+ Very efficient for large data sets
- Performance depends largely on pivot selection
Time Complexity
    --> Worst Case   -----> O(n^2)
    --> Best Case    -----> Ω(n log(n))
    --> Average Case -----> O(n log(n))
Space Complexity
    --> O(log(n))
"""
# Writing the Quick Sort Algorithm for sorting the list - Recursive Method
def qsort(random_list):
    less = []
    equal = []
    greater = []
    if len(random_list) > 1:
        # Initialize starting point
        pivot = random_list[0]
        for x in random_list:
            if x < pivot:
                less.append(x)
            elif x == pivot:
                equal.append(x)
            elif x > pivot:
                greater.append(x)
        return qsort(less) + equal + qsort(greater)
    else:
        return random_list

"""
Built-in Python Quick Sort:
def qsort(L):
    if len(L) <= 1: return L
    return qsort([lt for lt in L[1:] if lt < L[0]]) + L[0:1] + \
           qsort([ge for ge in L[1:] if ge >= L[0]])
"""

# Calling Quicksort
sorted_list_qsort = qsort(random_list)

# Testing Quicksort
print("That is a sorted list with Quicksort: " + "\n")
print(sorted_list_qsort)
print("\n")
#-------------------------------------FINISHED-------------------------------------#
#-------------------------------------Mergesort------------------------------------#
"""
Recursive Divide and Conquer Algorithm
+
-
Time Complexity
    --> Worst Case   -----> O(n log(n))
    --> Best Case    -----> Ω(n log(n))
    --> Average Case -----> O(n log(n))
Space Complexity
    --> O(n)
"""
# Create a merge algorithm
def merge(a, b):            # Let a and b be two arrays
    c = []                  # Final sorted output array
    a_idx, b_idx = 0, 0     # Indices into the a and b arrays
    while a_idx < len(a) and b_idx < len(b):
        if a[a_idx] < b[b_idx]:
            c.append(a[a_idx])
            a_idx += 1
        else:
            c.append(b[b_idx])
            b_idx += 1
    if a_idx == len(a): c.extend(b[b_idx:])
    else: c.extend(a[a_idx:])
    return c

# Create the final Mergesort algorithm
def merge_sort(a):
    # A list of zero or one elements is sorted by definition
    if len(a) <= 1:
        return a
    # Split the list in half and call Mergesort recursively on each half
    left, right = merge_sort(a[:int(len(a)/2)]), merge_sort(a[int(len(a)/2):])
    # Merge the now-sorted sublists with the merge function, which sorts two lists
    return merge(left, right)

# Calling Mergesort
sorted_list_mgsort = merge_sort(random_list)

# Testing Mergesort
print("That is a sorted list with Mergesort: " + "\n")
print(sorted_list_mgsort)
print("\n")
#-------------------------------------FINISHED-------------------------------------#
#------------------------------Algorithm Benchmarking------------------------------#
# Creating an array for iterations
n = [100, 1000, 10000, 100000]

# Creating a dictionary for times of algorithms
times = {"Quicksort": [], "Mergesort": []}

# Import time for analyzing the running time of the algorithms
from time import time

# Create a for loop which loops through the arrays of length n and analyses their times
for size in range(len(n)):
    random_list = create_random_list(n[size])
    t0 = time()
    qsort(random_list)
    t1 = time()
    times["Quicksort"].append(t1 - t0)

    random_list = create_random_list(n[size-1])
    t0 = time()
    merge_sort(random_list)
    t1 = time()
    times["Mergesort"].append(t1 - t0)

# Create a table which shows the benchmarking of the algorithms
print("n\tMerge\tQuick")
print("_" * 25)
for i, j in enumerate(n):
    print("{}\t{:.5f}\t{:.5f}\t".format(j, times["Mergesort"][i], times["Quicksort"][i]))
#----------------------------------End of Benchmarking---------------------------------#
The code is well documented and runs perfectly with Python 3.8. You may copy it into a code editor for better readability.
--> My question, as the title states: is my benchmarking right? I'm doubting it a little bit, because the running times of my algorithms seem a little odd. Can someone confirm my runtimes?
--> Here is the output of this code:
That is a randomized list:
[]
That is a sorted list with Quicksort:
[]
That is a sorted list with Mergesort:
[]
n Merge Quick
_________________________
100 0.98026 0.00021
1000 0.00042 0.00262
10000 0.00555 0.03164
100000 0.07919 0.44718
--> If someone has another/better code snippet on how to print the table - feel free to share it with me.
The error is in n[size-1]: when size is 0 (the first iteration), this translates to n[-1], which corresponds to your largest size. So in the first iteration you are comparing qsort on 100 elements with merge_sort on 100,000 elements, which obviously favours the first a lot. It doesn't help that you call this variable size, as it really isn't the size, but the index into the n list, which contains the sizes.
So remove the -1, or even better: iterate directly over n. And I would also make sure both sorting algorithms get to sort the same list:
for size in n:
    random_list1 = create_random_list(size)
    random_list2 = random_list1[:]
    t0 = time()
    qsort(random_list1)
    t1 = time()
    times["Quicksort"].append(t1 - t0)
    t0 = time()
    merge_sort(random_list2)
    t1 = time()
    times["Mergesort"].append(t1 - t0)
Finally, consider using timeit which is designed for measuring performance.
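A minimal timeit sketch of that suggestion, assuming qsort, merge_sort, and create_random_list are defined as above:

import timeit

for size in [100, 1000, 10000]:
    data = create_random_list(size)
    q = timeit.timeit(lambda: qsort(data[:]), number=10)
    m = timeit.timeit(lambda: merge_sort(data[:]), number=10)
    print("{}\t{:.5f}\t{:.5f}".format(size, m / 10, q / 10))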

Speed up bitstring/bit operations in Python?

I wrote a prime number generator using the Sieve of Eratosthenes and Python 3.1. The code runs correctly, taking 0.32 seconds on ideone.com to generate prime numbers up to 1,000,000.
# from bitstring import BitString

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = [False, False] + [True] * (limit - 2)
    # flags = BitString(limit)
    # Step through all the odd numbers
    for i in range(3, limit, 2):
        if flags[i] is False:
        # if flags[i] is True:
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            for j in range(i*3, limit, i<<1):
                flags[j] = False
                # flags[j] = True
The problem is, I run out of memory when I try to generate numbers up to 1,000,000,000:
flags = [False, False] + [True] * (limit - 2)
MemoryError
As you can imagine, allocating 1 billion boolean values (4 or 8 bytes each in Python; see the comments) is really not feasible, so I looked into bitstring. I figured that using 1 bit for each flag would be much more memory-efficient. However, the program's performance dropped drastically: 24 seconds runtime for primes up to 1,000,000. This is probably due to the internal implementation of bitstring.
You can comment/uncomment the three marked lines in the code snippet above to see what I changed to use BitString.
My question is, is there a way to speed up my program, with or without bitstring?
Edit: Please test the code yourself before posting. I can't accept answers that run slower than my existing code, naturally.
Edit again:
I've compiled a list of benchmarks on my machine.
There are a couple of small optimizations for your version. By reversing the roles of True and False, you can change "if flags[i] is False:" to "if flags[i]:". And the starting value for the second range statement can be i*i instead of i*3. Your original version takes 0.166 seconds on my system. With those changes, the version below takes 0.156 seconds on my system.
def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = [True, True] + [False] * (limit - 2)
    # Step through all the odd numbers
    for i in range(3, limit, 2):
        if flags[i]:
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            for j in range(i*i, limit, i<<1):
                flags[j] = True
This doesn't help your memory issue, though.
Moving into the world of C extensions, I used the development version of gmpy. (Disclaimer: I'm one of the maintainers.) The development version is called gmpy2 and supports mutable integers called xmpz. Using gmpy2 and the following code, I have a running time of 0.140 seconds. Running time for a limit of 1,000,000,000 is 158 seconds.
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    # Actual number is 2*bit_position + 1.
    oddnums = gmpy2.xmpz(1)
    current = 0
    while True:
        current += 1
        current = oddnums.bit_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        # Exclude further multiples of the current prime number
        if prime <= sub_limit:
            for j in range(2*current*(current+1), limit>>1, prime):
                oddnums.bit_set(j)
Pushing optimizations, and sacrificing clarity, I get running times of 0.107 and 123 seconds with the following code:
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    # Actual number is 2*bit_position + 1.
    oddnums = gmpy2.xmpz(1)
    f_set = oddnums.bit_set
    f_scan0 = oddnums.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        # Exclude further multiples of the current prime number
        if prime <= sub_limit:
            list(map(f_set, range(2*current*(current+1), limit>>1, prime)))
Edit: Based on this exercise, I modified gmpy2 to accept xmpz.bit_set(iterator). Using the following code, the run time for all primes less than 1,000,000,000 is 56 seconds for Python 2.7 and 74 seconds for Python 3.2. (As noted in the comments, xrange is faster than range.)
import gmpy2

try:
    range = xrange
except NameError:
    pass

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    oddnums = gmpy2.xmpz(1)
    f_scan0 = oddnums.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        if prime <= sub_limit:
            oddnums.bit_set(iter(range(2*current*(current+1), limit>>1, prime)))
Edit #2: One more try! I modified gmpy2 to accept xmpz.bit_set(slice). Using the following code, the run time for all primes less 1,000,000,000 is about 40 seconds for both Python 2.7 and Python 3.2.
from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    # pre-allocate the total length
    flags.bit_set((limit>>1)+1)
    f_scan0 = flags.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        if prime <= sub_limit:
            flags.bit_set(slice(2*current*(current+1), limit>>1, prime))

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)
Edit #3: I've updated gmpy2 to properly support slicing at the bit level of an xmpz. No change in performance, but a much nicer API. I have done a little tweaking and got the time down to about 37 seconds. (See Edit #4 for changes in gmpy2 2.0.0b1.)
from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    flags[(limit>>1)+1] = True
    f_scan0 = flags.bit_scan0
    current = 0
    prime = 2
    while prime <= sub_limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        flags[2*current*(current+1):limit>>1:prime] = True
    while prime <= limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)
Edit #4: I made some changes in gmpy2 2.0.0b1 that break the previous example. gmpy2 no longer treats True as a special value that provides an infinite source of 1-bits. -1 should be used instead.
from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    flags[(limit>>1)+1] = 1
    f_scan0 = flags.bit_scan0
    current = 0
    prime = 2
    while prime <= sub_limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        flags[2*current*(current+1):limit>>1:prime] = -1
    while prime <= limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)
Edit #5: I've made some enhancements to gmpy2 2.0.0b2. You can now iterate over all the bits that are either set or clear. Running time has improved by ~30%.
from __future__ import print_function
import time
import gmpy2

def sieve(limit=1000000):
    '''Returns a generator that yields the prime numbers up to limit.'''
    # Increment by 1 to account for the fact that slices do not include
    # the last index value but we do want to include the last value for
    # calculating a list of primes.
    sieve_limit = gmpy2.isqrt(limit) + 1
    limit += 1
    # Mark bit positions 0 and 1 as not prime.
    bitmap = gmpy2.xmpz(3)
    # Process 2 separately. This allows us to use p+p for the step size
    # when sieving the remaining primes.
    bitmap[4 : limit : 2] = -1
    # Sieve the remaining primes.
    for p in bitmap.iter_clear(3, sieve_limit):
        bitmap[p*p : limit : p+p] = -1
    return bitmap.iter_clear(2, limit)

if __name__ == "__main__":
    start = time.time()
    result = list(sieve(1000000000))
    print(time.time() - start)
    print(len(result))
OK, so this is my second answer, but as speed is of the essence I thought that I had to mention the bitarray module - even though it's bitstring's nemesis :). It's ideally suited to this case as not only is it a C extension (and so faster than pure Python has a hope of being), but it also supports slice assignments.
I haven't even tried to optimise this, I just rewrote the bitstring version. On my machine I get 0.16 seconds for primes under a million.
For a billion, it runs perfectly well and completes in 2 minutes 31 seconds.
import bitarray

def prime_bitarray(limit=1000000):
    yield 2
    flags = bitarray.bitarray(limit)
    flags.setall(False)
    sub_limit = int(limit**0.5)
    for i in range(3, limit, 2):
        if not flags[i]:
            yield i
            if i <= sub_limit:
                flags[3*i:limit:i*2] = True
Okay, here's a (near complete) comprehensive benchmarking I've done tonight to see which code runs the fastest. Hopefully someone will find this list useful. I omitted anything that takes more than 30 seconds to complete on my machine.
I would like to thank everyone who contributed. I've gained a lot of insight from your efforts, and I hope you have too.
My machine: AMD ZM-86, 2.40 GHz dual-core, with 4GB of RAM. This is an HP Touchsmart Tx2 laptop. Note that while I may have linked to a pastebin, I benchmarked all of the following on my own machine.
I will add the gmpy2 benchmark once I am able to build it.
All of the benchmarks are tested in Python 2.6 x86.
Returning a list of prime numbers up to 1,000,000 (using Python generators):

Sebastian's numpy generator version (updated) - 121 ms #
Mark's Sieve + Wheel - 154 ms
Robert's version with slicing - 159 ms
My improved version with slicing - 205 ms
Numpy generator with enumerate - 249 ms #
Mark's Basic Sieve - 317 ms
casevh's improvement on my original solution - 343 ms
My modified numpy generator solution - 407 ms
My original method in the question - 409 ms
Bitarray Solution - 414 ms #
Pure Python with bytearray - 1394 ms #
Scott's BitString solution - 6659 ms #

'#' means this method is capable of generating up to n < 1,000,000,000 on my machine setup.

In addition, if you don't need the generator and just want the whole list at once:

numpy solution from RosettaCode - 32 ms #

(The numpy solution is also capable of generating up to 1 billion, which took 61.6259 seconds. I suspect the memory was swapped once, hence the double time.)
Related question: Fastest way to list all primes below N in python.
Hi, I too am looking for Python code to generate primes up to 10^9 as fast as possible, which is difficult because of the memory problem. Up to now I came up with this to generate primes up to 10^6 and 10^7 (clocking 0.171s and 1.764s respectively on my old machine), which seems to be slightly faster (at least on my computer) than "My improved version with slicing" from the previous post, probably because I use n//i-i+1 (see below) instead of the analogous len(flags[i2::i<<1]) in the other code. Please correct me if I am wrong. Any suggestions for improvement are very welcome.
def primes(n):
    """ Returns a list of primes < n """
    sieve = [True] * n
    for i in xrange(3, int(n**0.5)+1, 2):
        if sieve[i]:
            sieve[i*i::2*i] = [False] * ((n-i*i-1)/(2*i)+1)
    return [2] + [i for i in xrange(3, n, 2) if sieve[i]]
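A quick sanity check of that slice-length arithmetic (a small snippet, not part of the answer): the slice sieve[i*i::2*i] must be assigned exactly len(range(i*i, n, 2*i)) values, and that length matches the closed-form count used above.

# Verify (n - i*i - 1) // (2*i) + 1 against Python's own range accounting.
n, i = 101, 5
assert len(range(i*i, n, 2*i)) == (n - i*i - 1) // (2*i) + 1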
In one of his codes Xavier uses flags[i2::i<<1] and len(flags[i2::i<<1]). I computed the size explicitly, using the fact that between i*i..n we have (n-i*i)//(2*i) multiples of 2*i; because we also want to count i*i, we add 1, so len(sieve[i*i::2*i]) equals (n-i*i)//(2*i) + 1. This makes the code faster. A basic generator based on the code above would be:
def primesgen(n):
    """ Generates all primes <= n """
    sieve = [True] * n
    yield 2
    for i in xrange(3, int(n**0.5)+1, 2):
        if sieve[i]:
            yield i
            sieve[i*i::2*i] = [False] * ((n-i*i-1)/(2*i)+1)
    for i in xrange(i+2, n, 2):
        if sieve[i]: yield i
With a bit of modification one can write a slightly slower version of the code above that starts with a sieve half the size (sieve = [True] * (n//2)) and works for the same n. I am not sure how that will reflect on the memory issue. As an example implementation, here is a modified version of the numpy Rosetta Code solution (maybe faster) starting with a sieve half the size.
import numpy

def primesfrom3to(n):
    """ Returns an array of primes, 3 <= p < n """
    sieve = numpy.ones(n/2, dtype=numpy.bool)
    for i in xrange(3, int(n**0.5)+1, 2):
        if sieve[i/2]: sieve[i*i/2::i] = False
    return 2*numpy.nonzero(sieve)[0][1::]+1
A faster and more memory-efficient generator would be:
import numpy

def primesgen1(n):
    """ Input n>=6, Generates all primes < n """
    sieve1 = numpy.ones(n/6+1, dtype=numpy.bool)
    sieve5 = numpy.ones(n/6, dtype=numpy.bool)
    sieve1[0] = False
    yield 2; yield 3;
    for i in xrange(int(n**0.5)/6+1):
        if sieve1[i]:
            k = 6*i + 1; yield k;
            sieve1[ ((k*k)/6) ::k] = False
            sieve5[(k*k+4*k)/6::k] = False
        if sieve5[i]:
            k = 6*i + 5; yield k;
            sieve1[ ((k*k)/6) ::k] = False
            sieve5[(k*k+2*k)/6::k] = False
    for i in xrange(i+1, n/6):
        if sieve1[i]: yield 6*i+1
        if sieve5[i]: yield 6*i+5
    if n%6 > 1:
        if sieve1[i+1]: yield 6*(i+1)+1
or with a bit more code:
import numpy

def primesgen(n):
    """ Input n>=30, Generates all primes < n """
    size = n/30 + 1
    sieve01 = numpy.ones(size, dtype=numpy.bool)
    sieve07 = numpy.ones(size, dtype=numpy.bool)
    sieve11 = numpy.ones(size, dtype=numpy.bool)
    sieve13 = numpy.ones(size, dtype=numpy.bool)
    sieve17 = numpy.ones(size, dtype=numpy.bool)
    sieve19 = numpy.ones(size, dtype=numpy.bool)
    sieve23 = numpy.ones(size, dtype=numpy.bool)
    sieve29 = numpy.ones(size, dtype=numpy.bool)
    sieve01[0] = False
    yield 2; yield 3; yield 5;
    for i in xrange(int(n**0.5)/30+1):
        if sieve01[i]:
            k = 30*i + 1; yield k;
            sieve01[ (k*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+10*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+16*k)/30::k] = False
            sieve19[(k*k+18*k)/30::k] = False
            sieve23[(k*k+22*k)/30::k] = False
            sieve29[(k*k+28*k)/30::k] = False
        if sieve07[i]:
            k = 30*i + 7; yield k;
            sieve01[(k*k+ 6*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+16*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+ 4*k)/30::k] = False
            sieve19[ (k*k)/30::k] = False
            sieve23[(k*k+22*k)/30::k] = False
            sieve29[(k*k+10*k)/30::k] = False
        if sieve11[i]:
            k = 30*i + 11; yield k;
            sieve01[ (k*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+20*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+26*k)/30::k] = False
            sieve19[(k*k+18*k)/30::k] = False
            sieve23[(k*k+ 2*k)/30::k] = False
            sieve29[(k*k+ 8*k)/30::k] = False
        if sieve13[i]:
            k = 30*i + 13; yield k;
            sieve01[(k*k+24*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+ 4*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+16*k)/30::k] = False
            sieve19[ (k*k)/30::k] = False
            sieve23[(k*k+28*k)/30::k] = False
            sieve29[(k*k+10*k)/30::k] = False
        if sieve17[i]:
            k = 30*i + 17; yield k;
            sieve01[(k*k+ 6*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+26*k)/30::k] = False
            sieve13[(k*k+12*k)/30::k] = False
            sieve17[(k*k+14*k)/30::k] = False
            sieve19[ (k*k)/30::k] = False
            sieve23[(k*k+ 2*k)/30::k] = False
            sieve29[(k*k+20*k)/30::k] = False
        if sieve19[i]:
            k = 30*i + 19; yield k;
            sieve01[ (k*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+10*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+ 4*k)/30::k] = False
            sieve19[(k*k+12*k)/30::k] = False
            sieve23[(k*k+28*k)/30::k] = False
            sieve29[(k*k+22*k)/30::k] = False
        if sieve23[i]:
            k = 30*i + 23; yield k;
            sieve01[(k*k+24*k)/30::k] = False
            sieve07[(k*k+ 6*k)/30::k] = False
            sieve11[(k*k+14*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+26*k)/30::k] = False
            sieve19[ (k*k)/30::k] = False
            sieve23[(k*k+ 8*k)/30::k] = False
            sieve29[(k*k+20*k)/30::k] = False
        if sieve29[i]:
            k = 30*i + 29; yield k;
            sieve01[ (k*k)/30::k] = False
            sieve07[(k*k+24*k)/30::k] = False
            sieve11[(k*k+20*k)/30::k] = False
            sieve13[(k*k+18*k)/30::k] = False
            sieve17[(k*k+14*k)/30::k] = False
            sieve19[(k*k+12*k)/30::k] = False
            sieve23[(k*k+ 8*k)/30::k] = False
            sieve29[(k*k+ 2*k)/30::k] = False
    for i in xrange(i+1, n/30):
        if sieve01[i]: yield 30*i+1
        if sieve07[i]: yield 30*i+7
        if sieve11[i]: yield 30*i+11
        if sieve13[i]: yield 30*i+13
        if sieve17[i]: yield 30*i+17
        if sieve19[i]: yield 30*i+19
        if sieve23[i]: yield 30*i+23
        if sieve29[i]: yield 30*i+29
    i += 1
    mod30 = n%30
    if mod30 > 1:
        if sieve01[i]: yield 30*i+1
    if mod30 > 7:
        if sieve07[i]: yield 30*i+7
    if mod30 > 11:
        if sieve11[i]: yield 30*i+11
    if mod30 > 13:
        if sieve13[i]: yield 30*i+13
    if mod30 > 17:
        if sieve17[i]: yield 30*i+17
    if mod30 > 19:
        if sieve19[i]: yield 30*i+19
    if mod30 > 23:
        if sieve23[i]: yield 30*i+23
PS: If you have any suggestions about how to speed up this code, anything from changing the order of operations to pre-computing anything, please comment.
Here's a version that I wrote a while back; it might be interesting to compare with yours for speed. It doesn't do anything about the space problems, though (in fact, they're probably worse than with your version).
from math import sqrt

def basicSieve(n):
    """Given a positive integer n, generate the primes < n."""
    s = [1]*n
    for p in xrange(2, 1+int(sqrt(n-1))):
        if s[p]:
            a = p*p
            s[a::p] = [0] * -((a-n)//p)
    for p in xrange(2, n):
        if s[p]:
            yield p
I have faster versions, using a wheel, but they're much more complicated. Original source is here.
Okay, here's the version using a wheel. wheelSieve is the main entry point.
from math import sqrt
from bisect import bisect_left

def basicSieve(n):
    """Given a positive integer n, generate the primes < n."""
    s = [1]*n
    for p in xrange(2, 1+int(sqrt(n-1))):
        if s[p]:
            a = p*p
            s[a::p] = [0] * -((a-n)//p)
    for p in xrange(2, n):
        if s[p]:
            yield p

class Wheel(object):
    """Class representing a wheel.

    Attributes:
      primelimit -> wheel covers primes < primelimit.
        For example, given a primelimit of 6
        the wheel primes are 2, 3, and 5.
      primes -> list of primes less than primelimit
      size -> product of the primes in primes; the modulus of the wheel
      units -> list of units modulo size
      phi -> number of units
    """
    def __init__(self, primelimit):
        self.primelimit = primelimit
        self.primes = list(basicSieve(primelimit))

        # compute the size of the wheel
        size = 1
        for p in self.primes:
            size *= p
        self.size = size

        # find the units by sieving
        units = [1] * self.size
        for p in self.primes:
            units[::p] = [0]*(self.size//p)
        self.units = [i for i in xrange(self.size) if units[i]]

        # number of units
        self.phi = len(self.units)

    def to_index(self, n):
        """Compute alpha(n), where alpha is an order preserving map
        from the set of units modulo self.size to the nonnegative integers.

        If n is not a unit, the index of the first unit greater than n
        is given."""
        return bisect_left(self.units, n % self.size) + n // self.size * self.phi

    def from_index(self, i):
        """Inverse of to_index."""
        return self.units[i % self.phi] + i // self.phi * self.size

def wheelSieveInner(n, wheel):
    """Given a positive integer n and a wheel, find the wheel indices of
    all primes that are less than n, and that are units modulo the
    wheel modulus.
    """
    # renaming to avoid the overhead of attribute lookups
    U = wheel.units
    wS = wheel.size
    # inverse of unit map
    UI = dict((u, i) for i, u in enumerate(U))
    nU = len(wheel.units)

    sqroot = 1+int(sqrt(n-1))  # ceiling of square root of n
    # corresponding index (index of next unit up)
    sqrti = bisect_left(U, sqroot % wS) + sqroot//wS*nU
    upper = bisect_left(U, n % wS) + n//wS*nU
    ind2 = bisect_left(U, 2 % wS) + 2//wS*nU

    s = [1]*upper
    for i in xrange(ind2, sqrti):
        if s[i]:
            q = i//nU
            u = U[i%nU]
            p = q*wS+u
            u2 = u*u
            aq, au = (p+u)*q+u2//wS, u2%wS
            wp = p * nU
            for v in U:
                # eliminate entries corresponding to integers
                # congruent to p*v modulo p*wS
                uvr = u*v%wS
                m = aq + (au > uvr)
                bot = (m + (q*v + u*v//wS - m) % p) * nU + UI[uvr]
                s[bot::wp] = [0]*-((bot-upper)//wp)
    return s

def wheelSieve(n, wheel=Wheel(10)):
    """Given a positive integer n, generate the list of primes <= n."""
    n += 1
    wS = wheel.size
    U = wheel.units
    nU = len(wheel.units)
    s = wheelSieveInner(n, wheel)
    return (wheel.primes[:bisect_left(wheel.primes, n)] +
            [p//nU*wS + U[p%nU] for p in xrange(bisect_left(U, 2 % wS)
                                                + 2//wS*nU, len(s)) if s[p]])
As to what a wheel is: well, you know that (apart from 2), all primes are odd, so most sieves miss out all the even numbers. Similarly, you can go a bit further and notice that all primes (except 2 and 3) are congruent to 1 or 5 modulo 6 (== 2 * 3), so you can get away with only storing entries for those numbers in your sieve. The next step up is to note that all primes (except 2, 3 and 5) are congruent to one of 1, 7, 11, 13, 17, 19, 23, 29 (modulo 30) (here 30 == 2*3*5), and so on.
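The unit classes of such a wheel can be computed directly, which makes the pattern concrete (a small illustrative snippet, not from the answer):

# Residues modulo 30 that are coprime to the wheel primes 2, 3 and 5;
# every prime greater than 5 falls into one of these eight classes.
wheel_primes = (2, 3, 5)
units = [r for r in range(30) if all(r % p for p in wheel_primes)]
print(units)   # [1, 7, 11, 13, 17, 19, 23, 29]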
One speed improvement you can make using bitstring is to use the 'set' method when you exclude multiples of the current number.
So the vital section becomes
for i in range(3, limit, 2):
    if flags[i]:
        yield i
        if i <= sub_limit:
            flags.set(1, range(i*3, limit, i*2))
On my machine this runs about 3 times faster than the original.
My other thought was to use the bitstring to represent only the odd numbers. You could then find the unset bits using flags.findall([0]) which returns a generator. Not sure if that would be much faster (finding bit patterns isn't terribly easy to do efficiently).
[Full disclosure: I wrote the bitstring module, so I've got some pride at stake here!]
As a comparison I've also taken the guts out of the bitstring method so that it's doing it in the same way, but without range checking etc. I think this gives a reasonable lower limit for pure Python that works for a billion elements (without changing the algorithm, which I think is cheating!)
def prime_pure(limit=1000000):
    yield 2
    flags = bytearray((limit + 7) // 8)
    sub_limit = int(limit**0.5)
    for i in xrange(3, limit, 2):
        byte, bit = divmod(i, 8)
        if not flags[byte] & (128 >> bit):
            yield i
            if i <= sub_limit:
                for j in xrange(i*3, limit, i*2):
                    byte, bit = divmod(j, 8)
                    flags[byte] |= (128 >> bit)
On my machine this runs in about 0.62 seconds for a million elements, which means it's about a quarter of the speed of the bitarray answer. I think that's quite reasonable for pure Python.
Python's integer types can be of arbitrary size, so you shouldn't need a clever bitstring library to represent a set of bits, just a single number.
Here's the code. It handles a limit of 1,000,000 with ease, and can even handle 10,000,000 without complaining too much:
def multiples_of(n, step, limit):
    bits = 1 << n
    old_bits = bits
    max = 1 << limit
    while old_bits < max:
        old_bits = bits
        bits += bits << step
        step *= 2
    return old_bits

def prime_numbers(limit=10000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = ((1 << (limit - 2)) - 1) << 2
    # Step through all the odd numbers
    for i in xrange(3, limit, 2):
        if not (flags & (1 << i)):
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            flags &= ~multiples_of(i * 3, i << 1, limit)
The multiples_of function avoids the cost of setting every single multiple individually. It sets the initial bit, shifts it by the initial step (i << 1) and adds the result to itself. It then doubles the step, so that the next shift copies both bits, and so on until it reaches the limit. This sets all the multiples of a number up to the limit in O(log(limit)) time.
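A small demonstration of that doubling behaviour, assuming multiples_of as defined above; the returned mask may extend somewhat past limit, which is harmless because the caller ANDs its complement into flags:

m = multiples_of(9, 6, 40)   # odd multiples of 3, starting at 9
print([b for b in range(m.bit_length()) if (m >> b) & 1])
# -> [9, 15, 21, 27, 33, 39, 45, 51]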
One option you may want to look at is just compiling a C/C++ module so you have direct access to the bit-twiddling features. AFAIK you could write something of that nature and only return the results on completion of the calculations performed in C/C++. Now that I'm typing this you may also look at numpy which does store arrays as native C for speed. I don't know if that will be any faster than the bitstring module, though!
Another really stupid option, but one that can be of help if you really need a large list of prime numbers available very fast. Say, if you need them as a tool to answer Project Euler problems (if the problem is just optimizing your code as a mind game, it's irrelevant).
Use any slow solution to generate the list and save it to a Python source file, say primes.py, that would just contain:
primes = [ a list of a million prime numbers here ]
Now to use them you just have to do import primes and you get them with a minimal memory footprint at the speed of disk IO. Complexity is the number of primes :-)
Even if you used a poorly optimized solution to generate this list, it will only be done once and it does not matter much.
You could probably make it even faster using pickle/unpickle, but you get the idea...
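A minimal sketch of that pickle variant (the file name primes.pkl is arbitrary):

import pickle

# Save once, using any slow generator:
with open("primes.pkl", "wb") as f:
    pickle.dump(primes, f)

# Load later at disk-IO speed:
with open("primes.pkl", "rb") as f:
    primes = pickle.load(f)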
You could use a segmented Sieve of Eratosthenes. The memory used for each segment is reused for the next segment.
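A minimal sketch of that idea (not the answerer's code; segmented_primes and the window size are arbitrary choices): the base primes up to sqrt(limit) are sieved once, and then each window reuses a fixed-size buffer.

def segmented_primes(limit, segment=32768):
    """Yield all primes < limit using a fixed-size reusable window."""
    root = int(limit ** 0.5) + 1
    # Ordinary sieve for the "base" primes below sqrt(limit).
    is_prime = [True] * root
    base = []
    for p in range(2, root):
        if is_prime[p]:
            base.append(p)
            for m in range(p * p, root, p):
                is_prime[m] = False
    yield from (p for p in base if p < limit)
    # Sieve [low, high) one window at a time, reusing the same amount of memory.
    for low in range(root, limit, segment):
        high = min(low + segment, limit)
        window = [True] * (high - low)
        for p in base:
            start = max(p * p, (low + p - 1) // p * p)  # first multiple of p in window
            for m in range(start, high, p):
                window[m - low] = False
        for offset, flag in enumerate(window):
            if flag:
                yield low + offset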
Here is some Python3 code that uses less memory than the bitarray/bitstring solutions posted to date and about 1/8 the memory of Robert William Hanks's primesgen(), while running faster than primesgen() (marginally faster at 1,000,000, using 37KB of memory, 3x faster than primesgen() at 1,000,000,000 using under 34MB). Increasing the chunk size (variable chunk in the code) uses more memory but speeds up the program, up to a limit -- I picked a value so that its contribution to memory is under about 10% of sieve's n//30 bytes. It's not as memory-efficient as Will Ness's infinite generator (see also https://stackoverflow.com/a/19391111/5439078) because it records (and returns at the end, in compressed form) all calculated primes.
This should work correctly as long as the square root calculation comes out accurate (about 2**51 if Python uses 64-bit doubles). However you should not use this program to find primes that large!
(I didn't recalculate the offsets, just took them from Robert William Hanks's code. Thanks Robert!)
import numpy as np

def prime_30_gen(n):
    """ Input n, int-like. Yields all primes < n as Python ints"""
    wheel = np.array([2,3,5])
    yield from wheel[wheel < n].tolist()
    powers = 1 << np.arange(8, dtype='u1')
    odds = np.array([1, 7, 11, 13, 17, 19, 23, 29], dtype='i8')
    offsets = np.array([[0,6,10,12,16,18,22,28], [6,24,16,12,4,0,22,10],
                        [0,6,20,12,26,18,2,8],   [24,6,4,18,16,0,28,10],
                        [6,24,26,12,14,0,2,20],  [0,24,10,18,4,12,28,22],
                        [24,6,14,18,26,0,8,20],  [0,24,20,18,14,12,8,2]],
                       dtype='i8')
    offsets = {d: f for d, f in zip(odds, offsets)}
    sieve = 255 * np.ones((n + 28) // 30, dtype='u1')
    if n % 30 > 1: sieve[-1] >>= 8 - odds.searchsorted(n % 30)
    sieve[0] &= ~1
    for i in range((int((n - 1)**0.5) + 29) // 30):
        for odd in odds[(sieve[i] & powers).nonzero()[0]]:
            k = i * 30 + odd
            yield int(k)
            for clear_bit, off in zip(~powers, offsets[odd]):
                sieve[(k * (k + off)) // 30 :: k] &= clear_bit
    chunk = min(1 + (n >> 13), 1 << 15)
    for i in range(i + 1, len(sieve), chunk):
        a = (sieve[i : i + chunk, np.newaxis] & powers)
        a = np.flatnonzero(a.astype('bool')) + i * 8
        a = (a // 8 * 30 + odds[a & 7]).tolist()
        yield from a
    return sieve
Side note: If you want real speed and have the 2GB of RAM required for the list of primes to 10**9, then you should use pyprimesieve (on https://pypi.python.org/, using primesieve http://primesieve.org/).
