Optimize Sieve of Eratosthenes Further - python-3.x

I have written a Sieve of Eratosthenes--I think--but it seems like it's not as optimized as it could be. It works, and it gets all the primes up to N, but not as quickly as I'd hoped. I'm still learning Python--coming from two years of Java--so if something isn't particularly Pythonic then I apologize:
def sieve(self):
is_prime = [False, False, True, True] + [False, True] * ((self.lim - 4) // 2)
for i in range(3, self.lim, 2):
if i**2 > self.lim: break
if is_prime[i]:
for j in range(i * i, self.lim, i * 2):
is_prime[j] = False
return is_prime
I've looked at other questions similar to this one but I can't figure out how some of the more complicated optimizations would fit in to my code. Any suggestions?
EDIT: as requested, some of the other optimizations I've seen are stopping the iteration of the first for loop before the limit, and skipping by different numbers--which I think is wheel optimization?
EDIT 2: Here's the code that would utilize the method, for Padraic:
primes = sieve.sieve()
for i in range(0, len(primes)):
if primes[i]:
print("{:d} ".format(i), end = '')
print() # print a newline

a slightly different approach: use a bitarray to represent the odd numbers 3,5,7,... saving some space compared to a list of booleans.
this may save some space only and not help speedup...
from bitarray import bitarray
def index_to_number(i): return 2*i+3
def number_to_index(n): return (n-3)//2
LIMIT_NUMBER = 50
LIMIT_INDEX = number_to_index(LIMIT_NUMBER)+1
odd_primes = bitarray(LIMIT_INDEX)
# index 0 1 2 3
# number 3 5 7 9
odd_primes.setall(True)
for i in range(LIMIT_INDEX):
if odd_primes[i] is False:
continue
n = index_to_number(i)
for m in range(n**2, LIMIT_NUMBER, 2*n):
odd_primes[number_to_index(m)] = False
primes = [index_to_number(i) for i in range(LIMIT_INDEX)
if odd_primes[i] is True]
primes.insert(0,2)
print('primes: ', primes)
the same idea again; but this time let bitarray handle the inner loop using slice assignment. this may be faster.
for i in range(LIMIT_INDEX):
if odd_primes[i] is False:
continue
odd_primes[2*i**2 + 6*i + 3:LIMIT_INDEX:2*i+3] = False
(none of this code has been seriously checked! use with care)
in case you are looking for a primes generator based on a different method (wheel factorizaition) have a look at this excellent answer.

Related

A Python code of a program that reads an integer, and prints the integer it is a multiple of either 2 or 5 but not both

A Python code of a program that reads an integer, and prints the integer is a multiple of either 2 or 5 but not both.
First off, SO is not a forum to ask people to solve your homework without first putting effort yourself. you could have written a simple naive solution with if-else statements and posted that in your question, asking for a better way to do this.
That being said, I'm still adding my approach because I think you should know that a beautiful and simple way would be to just use an XOR check:
n = int(input("Input a number: "))
answer = True if (n % 2 == 0) != (n % 5 == 0) else False
print(answer)
I'm assuming the user will always input a value that can be cast as an integer. You can take care of the error handling process. Once the number has been read, I do a simple XOR check to return True if the number is only divisible by either 2 or 5 but not both, and False in all other cases. This works because bool(a)!=bool(b) gives True only when bool(a) and bool(b) evaluate to different things.
Demonstration:
def check(n):
return (n%2 == 0) != (n%5 ==0)
print(check(2)) # Out: True
print(check(5)) # Out: True
print(check(10)) # Out: False
print(check(3)) # Out: False

A strategy-proof method of finding the time complexity of complex algorithms?

I have a question in regard to time complexity (big-O) in Python. I want to understand the general method I would need to implement when trying to find the big-O of a complex algorithm. I have understood the reasoning behind calculating the time complexity of simple algorithms, such as a for loop iterating over a list of n elements having a O(n), or having two nested for loops each iterating over 2 lists of n elements each having a big-O of n**2. But, for more complex algorithms that implement multiple if-elif-else statements coupled with for loops, I would want to see if there is a strategy to, simply based on the code, in an iterative fashion, to determine the big-O of my code using simple heuristics (such as, ignoring constant time complexity if statements or always squaring the n upon going over a for loop, or doing something specific when encountering an else statement).
I have created a battleship game, for which I would like to find the time complexity, using such an aforementioned strategy.
from random import randint
class Battle:
def __init__(self):
self.my_grid = [[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False]]
def putting_ship(self,x,y):
breaker = False
while breaker == False:
r1=x
r2=y
element = self.my_grid[r1][r2]
if element == True:
continue
else:
self.my_grid[r1][r2] = True
break
def printing_grid(self):
return self.my_grid
def striking(self,r1,r2):
element = self.my_grid[r1][r2]
if element == True:
print("STRIKE!")
self.my_grid[r1][r2] = False
return True
elif element == False:
print("Miss")
return False
def game():
battle_1 = Battle()
battle_2 = Battle()
score_player1 = 0
score_player2 = 0
turns = 5
counter_ships = 2
while True:
input_x_player_1 = input("give x coordinate for the ship, player 1\n")
input_y_player_1 = input("give y coordinate for the ship, player 1\n")
battle_1.putting_ship(int(input_x_player_1),int(input_y_player_1))
input_x_player_2 = randint(0,9)
input_y_player_2 = randint(0,9)
battle_2.putting_ship(int(input_x_player_2),int(input_y_player_2))
counter_ships -= 1
if counter_ships == 0:
break
while True:
input_x_player_1 = input("give x coordinate for the ship\n")
input_y_player_1 = input("give y coordinate for the ship\n")
my_var = battle_1.striking(int(input_x_player_1),int(input_y_player_1))
if my_var == True:
score_player1 += 1
print(score_player1)
input_x_player_2 = randint(0,9)
input_y_player_2 = randint(0,9)
my_var_2 = battle_2.striking(int(input_x_player_2),int(input_y_player_2))
if my_var_2 == True:
score_player2 += 1
print(score_player2)
counter_ships -= 1
if counter_ships == 0:
break
print("the score for player 1 is",score_player1)
print("the score for player 2 is",score_player2)
print(game())
If it's just nested for loops and if/else statements, you can take the approach ibonyun has suggested - assume all if/else cases are covered and look at the deepest loops (being aware that some operations like sorting, or copying an array, might hide loops of their own.)
However, your code also has while loops. In this particular example it's not too hard to replace them with fors, but for code containing nontrivial whiles there is no general strategy that will always give you the complexity - this is a consequence of the halting problem.
For example:
def collatz(n):
n = int(abs(n))
steps = 0
while n != 1:
if n%2 == 1:
n=3*n+1
else:
n=n//2
steps += 1
print(n)
print("Finished in",steps,"steps!")
So far nobody has been able to prove that this will even finish for all n, let alone shown an upper bound to the run-time.
Side note: instead of the screen-breaking
self.my_grid = [[False,False,...False],[False,False,...,False],...,[False,False,...False]]
consider something like:
grid_size = 10
self.my_grid = [[False for i in range(grid_size)] for j in range(grid_size)]
which is easier to read and check.
Empirical:
You could do some time trials while increasing n (so maybe increasing the board size?) and plot the resulting data. You could tell by the curve/slope of the line what the time complexity is.
Theoretical:
Parse the script and keep track of the biggest O() you find for any given line or function call. Any sorting operations will give you nlogn. A for loop inside a for loop will give you n^2 (assuming their both iterating over the input data), etc. Time complexity is about the broad strokes. O(n) and O(n*3) are both linear time, and that's what really matters. I don't think you need to worry about the minutia of all your if-elif-else logic. Maybe just focus on worst case scenario?

String subpattern recognition optimization

In this kata you need to build a function to return either true/True or false/False if a string can be seen as the repetition of a simpler/shorter subpattern or not.
For example:
has_subpattern("a") == False #no repeated pattern
has_subpattern("aaaa") == True #created repeating "a"
has_subpattern("abcd") == False #no repeated pattern
has_subpattern("abababab") == True #created repeating "ab"
has_subpattern("ababababa") == False #cannot be entirely reproduced repeating a pattern
Strings will never be empty and can be composed of any character (just consider upper- and lowercase letters as different entities) and can be pretty long (keep an eye on performances!).
My solution is:
def has_subpattern(string):
string_size = len(string)
for i in range(1, string_size):
slice1 = string[:i]
appearence_count = string.count(slice1)
slice1_len = len(slice1)
if appearence_count > 0:
if appearence_count * slice1_len == string_size:
return True
return False
Obviously there are weak and too slow things like slice1 = string[:i] and string.count() in loop..
Is there better ways to solve an issue or ways to improve performance ?
Short regex approach:
import re
def has_subpattern_re(s):
return bool(re.search(r'^(\w+)\1+$', s))
It'll provide a close (to initial has_subpattern approach) performance on small strings:
import timeit
...
print(timeit.timeit('has_subpattern("abababab")', 'from __main__ import has_subpattern'))
0.7413144190068124
print(timeit.timeit('has_subpattern_re("abababab")', 'from __main__ import re, has_subpattern_re'))
0.856149295999785
But, a significant performance increase (in about 3-5 times faster) on long strings:
print(timeit.timeit('has_subpattern("ababababababababababababababababababababababababa")', 'from __main__ import has_subpattern'))
14.669428467008402
print(timeit.timeit('has_subpattern_re("ababababababababababababababababababababababababa")', 'from __main__ import re, has_subpattern_re'))
4.308312018998549
And one more test for a more longer string:
print(timeit.timeit('has_subpattern("ababababababababababababababababababababababababaababababababababababababababababababababababababab")', 'from __main__ import has_subpattern'))
35.998031173992786
print(timeit.timeit('has_subpattern_re("ababababababababababababababababababababababababaababababababababababababababababababababababababab")', 'from __main__ import re, has_subpattern_re'))
7.010367843002314
Within standard Python, the bottlenecks here will be count, which enjoys C speed implementation and the looping.
The looping itself may be hard to speed-up (althogh Cython may be of some help).
Hence, the most important optimization is to reduce the number of loopings.
One obvious way is to let range() do not exceed half the size of the input (+ 2: + 1 for rounding issues, + 1 for end extrema exclusion in range()):
Also, string is a standard Python module, so better not use it as a variable name.
def has_subpattern_loop(text):
for i in range(1, len(text) // 2 + 2):
subtext = text[:i]
num_subtext = text.count(subtext)
if num_subtext > 1 and num_subtext * len(subtext) == len(text):
return True
return False
A much more effective way of restricting the number of calls to count is to skip computation when i is not a multiple of the length of the input.
def has_subpattern_loop2(text):
for i in range(1, len(text) // 2 + 2):
if len(text) % i == 0:
subtext = text[:i]
num_subtext = text.count(subtext)
if num_subtext > 1 and num_subtext * len(subtext) == len(text):
return True
return False
Even better would be to generate only the divisors of the length of the input.
This could be done using sympy and the approach outlined here:
import sympy as sym
import functools
def get_divisors(n):
if n == 1:
yield 1
return
factors = list(sym.factor_.factorint(n).items())
nfactors = len(factors)
f = [0] * nfactors
while True:
yield functools.reduce(lambda x, y: x * y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
i = 0
while True:
f[i] += 1
if f[i] <= factors[i][1]:
break
f[i] = 0
i += 1
if i >= nfactors:
return
def has_subpattern_divs(text):
for i in get_divisors(len(text)):
subtext = text[:i]
num_subtext = text.count(subtext)
if num_subtext > 1 and num_subtext * len(subtext) == len(text):
return True
return False
A completely different approach is the one proposed in #ВладДавидченко answer:
def has_subpattern_find(text):
return (text * 2).find(text, 1) != len(text)
or the more memory efficient (requires ~50% less additional memory compared to has_subpattern_find2()):
def has_subpattern_find2(text):
return (text + text[:len(text) // 2 + 2]).find(text, 1) > 0
and it is based on the idea that if there is a exactly self-repeating string, the string itself must be found in a circularly extended string:
Input: abab
Extension1: abababab
Found1: |-abab
Extension2: ababab
Found2: |-abab
Input: ababa
Extension1: ababaababa
Found1: |----ababa
Extension2: ababab
Found2: NOT FOUND!
The find-based method are the fastest, with has_subpattern_find() being fastest in the small input size regime, and has_subpattern_find2() gets generally faster in the intermediate and large input size regime (especially in the False case).
For shorter inputs, the direct looping approaches (especially has_subpattern_loop2()) are fastest, closely followed (but sometimes surpassed by has_subpattern_re()), but as soon as the input gets bigger (and especially for the False outcome), the has_subpattern_divs() method gets to be the fastest (aside of find-based ones) by far and large, as shown by the following benchmarks.
For the True outcome, has_subpattern_loop2() gets to be the fastest due to the very small number of loops required, which is independent of the input size.
The input is generated as a function of n using:
def gen_input(n, m=0):
g = string.ascii_lowercase
if not m:
m = n
offset = '!' if n % 2 else ''
return g[:n] * (m // min(n, len(g)) + 2) + offset
so that if n is even, the has_subpattern*() always return True and the opposite for odd n.
Note that, in general, the has_subpattern() function will depend not only on the raw size of the input but also on the length of the repeating string, if any. This is not explored in the benchmarks, except for the odd/even separation.
Even Inputs
Odd Inputs
(Full code available here).
(EDITED to include some more solutions as well as comparison with regex-based solution from #RomanPerekhrest)
(EDITED to include some more solutions based on the find from #ВладДавидченко)
Found another one solution, probably will be useful:
def has_subpattern(string):
return (string * 2).find(string, 1) != len(string)

Non-recursive Most Efficient Big-O Permutation Alghoritm Python3 (non-built-in)

Hi Guys For my Data Structure assignment I have to find the most efficient way (big-o wise) to calculate permutations of a list of objects.
I found recursive examples on the web but this doesn't seem to be the most efficient way; I tried my own code but then I realized that when I count the number of possible permutations I'm actually making my algorithm O(!n). Any suggestions? .-.
from random import sample
import time
start = time.time()
testList = list(x for x in range(7))
print('list lenght: %i objects' % len(testList))
nOfPerms = 1
for i in range(1,len(testList)+1):
nOfPerms *= i
print('number of permutations:', nOfPerms)
listOfPerms = []
n = 1
while n <= nOfPerms:
perm = tuple(sample(testList, len(testList)))
listOfPerms.append(perm)
permutations = set(listOfPerms)
if len(permutations) == len(listOfPerms):
n += 1
else:
del(listOfPerms[-1])
end = time.time() - start
print('time elapsed:', end)
OUTPUT:
list lenght: 7 objects
number of permutations: 5040
time elapsed: 13.142292976379395
If instead of 7 I put 8 or 9, or 10, those are the number of permutations (I won't show the time cause it's taking too long):
list lenght: 8 objects
number of permutations: 40320
list lenght: 9 objects
number of permutations: 362880
list lenght: 10 objects
number of permutations: 3628800
I believe this will be the best you can do. Generating the number of permutations of a list generates n! permutations. As you need to generate them all this is also how much time it will take (O(n!)). What you could try to do is to make it a python generator function so you will always only generate exactly as many as you need instead of precalculating them all and storing them in memory. If you want an example of this i could give you one.
Im sorry this might be a quite negative answer. It's a good question but im pretty sure this is about the best that you can do, asymptotically. You could optimize the code itself a bit to use less instructions but in the end that wont help too much.
Edit:
This is a python implementation of Heap's algorithm which i promised
(https://en.wikipedia.org/wiki/Heap%27s_algorithm) generating N! permutations where the generation of every one permutation takes amortized O(1) time and which uses O(n) space complexity (by alteri
def permute(lst, k=None):
if k == None:
k = len(lst)
if k == 1:
yield lst
else:
yield from permute(lst, k-1)
for i in range(k-1):
if i % 2 == 0:
#even
lst[i], lst[k-1] = lst[k-1], lst[i]
else:
#odd
lst[0], lst[k-1] = lst[k-1], lst[0]
yield from permute(lst, k-1)
for i in permute([1, 2, 3, 4]):
print(i)

Trying to understand this simple python code

I was reading Jeff Knupp's blog and I came across this easy little script:
import math
def is_prime(n):
if n > 1:
if n == 2:
return True
if n % 2 == 0:
return False
for current in range(3, int(math.sqrt(n) + 1), 2):
if n % current == 0:
return False
return True
return False
print(is_prime(17))
(note: I added the import math at the beginning. You can see the original here:
http://www.jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/)
This is all pretty straightforward and I get the majority of it, but I'm not sure what's going on with his use of the range function. I haven't ever used it this way or seen anyone else use it this way, but then I'm a beginner. What does it mean for the range function to have three parameters, and how does this accomplish testing for primeness?
Also (and apologies if this is a stupid question), but the very last 'return False' statement. That is there so that if a number is passed to the function that is less than one (and thus not able to be prime), the function won't even waste its time evaluating that number, right?
The third is the step. It iterates through every odd number less than or equal to the square root of the input (3, 5, 7, etc.).
import math #import math module
def is_prime(n): #define is_prime function and assign variable n to its argument (n = 17 in this example).
if n > 1: #check if n (its argument) is greater than one, if so, continue; else return False (this is the last return in the function).
if n == 2: #check if n equals 2, it so return True and exit.
return True
if n % 2 == 0: #check if the remainder of n divided by two equas 0, if so, return False (is not prime) and exit.
return False
for current in range(3, int(math.sqrt(n) + 1), 2): #use range function to generate a sequence starting with value 3 up to, but not including, the truncated value of the square root of n, plus 1. Once you have this secuence give me every other number ( 3, 5, 7, etc)
if n % current == 0: #Check every value from the above secuence and if the remainder of n divided by that value is 0, return False (it's not prime)
return False
return True #if not number in the secuence divided n with a zero remainder then n is prime, return True and exit.
return False
print(is_prime(17))

Resources