Trying to understand this simple python code - python-3.x

I was reading Jeff Knupp's blog and I came across this easy little script:
import math

def is_prime(n):
    if n > 1:
        if n == 2:
            return True
        if n % 2 == 0:
            return False
        for current in range(3, int(math.sqrt(n) + 1), 2):
            if n % current == 0:
                return False
        return True
    return False

print(is_prime(17))
(note: I added the import math at the beginning. You can see the original here:
http://www.jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/)
This is all pretty straightforward and I get the majority of it, but I'm not sure what's going on with his use of the range function. I haven't ever used it this way or seen anyone else use it this way, but then I'm a beginner. What does it mean for the range function to have three parameters, and how does this accomplish testing for primeness?
Also (apologies if this is a stupid question), about the very last 'return False' statement: it is there so that if the number passed to the function is not greater than 1 (and thus cannot be prime), the function won't waste its time evaluating that number, right?

The third argument to range is the step. With a start of 3 and a step of 2, the loop iterates through every odd number from 3 up to the square root of the input (3, 5, 7, etc.).

import math  # import the math module

def is_prime(n):  # define is_prime with a single parameter n (n = 17 in this example)
    if n > 1:  # if n is greater than 1, continue; otherwise fall through to the last return False
        if n == 2:  # if n equals 2, return True and exit
            return True
        if n % 2 == 0:  # if the remainder of n divided by 2 equals 0, return False (not prime) and exit
            return False
        # range generates a sequence that starts at 3, stops before int(sqrt(n) + 1),
        # and advances in steps of 2 (3, 5, 7, etc.)
        for current in range(3, int(math.sqrt(n) + 1), 2):
            if n % current == 0:  # if any value in the sequence divides n with remainder 0, n is not prime
                return False
        return True  # no number in the sequence divided n evenly, so n is prime
    return False

print(is_prime(17))
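As a small illustration of the three-argument form range(start, stop, step) described above, the sketch below lists the candidate divisors that the loop would try. The helper name candidate_divisors is made up for this example and is not part of the original script.

```python
import math

def candidate_divisors(n):
    """Odd numbers from 3 up to and including floor(sqrt(n))."""
    # range(start, stop, step): start at 3, stop before int(sqrt(n) + 1), step by 2
    return list(range(3, int(math.sqrt(n) + 1), 2))

print(candidate_divisors(17))   # [3]
print(candidate_divisors(100))  # [3, 5, 7, 9]
print(list(range(0, 10, 3)))    # [0, 3, 6, 9]
```

For n = 17 only 3 needs to be tried, because any factor larger than the square root would have to pair with a factor smaller than it.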

Related

String subpattern recognition optimization

In this kata you need to build a function to return either true/True or false/False if a string can be seen as the repetition of a simpler/shorter subpattern or not.
For example:
has_subpattern("a") == False #no repeated pattern
has_subpattern("aaaa") == True #created repeating "a"
has_subpattern("abcd") == False #no repeated pattern
has_subpattern("abababab") == True #created repeating "ab"
has_subpattern("ababababa") == False #cannot be entirely reproduced repeating a pattern
Strings will never be empty and can be composed of any character (just consider upper- and lowercase letters as different entities) and can be pretty long (keep an eye on performances!).
My solution is:
def has_subpattern(string):
    string_size = len(string)
    for i in range(1, string_size):
        slice1 = string[:i]
        appearence_count = string.count(slice1)
        slice1_len = len(slice1)
        if appearence_count > 0:
            if appearence_count * slice1_len == string_size:
                return True
    return False
Obviously there are weak and slow parts, such as slice1 = string[:i] and string.count() inside the loop.
Are there better ways to solve this problem, or ways to improve performance?
Short regex approach:
import re
def has_subpattern_re(s):
    return bool(re.search(r'^(\w+)\1+$', s))
Its performance is close to that of the initial has_subpattern approach on small strings:
import timeit
...
print(timeit.timeit('has_subpattern("abababab")', 'from __main__ import has_subpattern'))
0.7413144190068124
print(timeit.timeit('has_subpattern_re("abababab")', 'from __main__ import re, has_subpattern_re'))
0.856149295999785
But it gives a significant performance increase (about 3-5 times faster) on long strings:
print(timeit.timeit('has_subpattern("ababababababababababababababababababababababababa")', 'from __main__ import has_subpattern'))
14.669428467008402
print(timeit.timeit('has_subpattern_re("ababababababababababababababababababababababababa")', 'from __main__ import re, has_subpattern_re'))
4.308312018998549
And one more test with an even longer string:
print(timeit.timeit('has_subpattern("ababababababababababababababababababababababababaababababababababababababababababababababababababab")', 'from __main__ import has_subpattern'))
35.998031173992786
print(timeit.timeit('has_subpattern_re("ababababababababababababababababababababababababaababababababababababababababababababababababababab")', 'from __main__ import re, has_subpattern_re'))
7.010367843002314
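One caveat on the regex approach above: \w+ matches only word characters, while the kata says strings can be composed of any character. A hedged variant that accepts any character is sketched below; the name has_subpattern_any is made up for this example, and the timings above were not measured for it.

```python
import re

def has_subpattern_any(s):
    # (.+?) captures the shortest candidate unit; \1+ requires at least one more copy.
    # re.DOTALL lets "." match newlines too; fullmatch anchors both ends of the string.
    return re.fullmatch(r'(.+?)\1+', s, re.DOTALL) is not None

print(has_subpattern_any("abababab"))   # True
print(has_subpattern_any("!?!?"))       # True
print(has_subpattern_any("ababababa"))  # False
```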
Within standard Python, the bottlenecks here are the call to count, which already runs at C speed, and the loop itself.
The loop itself may be hard to speed up (although Cython may be of some help).
Hence, the most important optimization is to reduce the number of iterations.
One obvious way is to keep range() from exceeding half the size of the input (+ 2: + 1 for rounding issues, + 1 because range() excludes its end point):
Also, string is a standard Python module, so it is better not to use it as a variable name.
def has_subpattern_loop(text):
    for i in range(1, len(text) // 2 + 2):
        subtext = text[:i]
        num_subtext = text.count(subtext)
        if num_subtext > 1 and num_subtext * len(subtext) == len(text):
            return True
    return False
A much more effective way of restricting the number of calls to count is to skip the computation when i is not a divisor of the length of the input.
def has_subpattern_loop2(text):
    for i in range(1, len(text) // 2 + 2):
        if len(text) % i == 0:
            subtext = text[:i]
            num_subtext = text.count(subtext)
            if num_subtext > 1 and num_subtext * len(subtext) == len(text):
                return True
    return False
Even better would be to generate only the divisors of the length of the input.
This could be done using sympy and the approach outlined here:
import sympy as sym
import functools

def get_divisors(n):
    if n == 1:
        yield 1
        return
    # factorint returns {prime: exponent}; keep it as a list of (prime, exponent) pairs
    factors = list(sym.factorint(n).items())
    nfactors = len(factors)
    f = [0] * nfactors  # current exponent of each prime factor
    while True:
        yield functools.reduce(lambda x, y: x * y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
        i = 0
        while True:
            f[i] += 1
            if f[i] <= factors[i][1]:
                break
            f[i] = 0
            i += 1
            if i >= nfactors:
                return

def has_subpattern_divs(text):
    for i in get_divisors(len(text)):
        subtext = text[:i]
        num_subtext = text.count(subtext)
        if num_subtext > 1 and num_subtext * len(subtext) == len(text):
            return True
    return False
A completely different approach is the one proposed in @ВладДавидченко's answer:
def has_subpattern_find(text):
    return (text * 2).find(text, 1) != len(text)
or the more memory-efficient variant (requires ~50% less additional memory compared to has_subpattern_find()):
def has_subpattern_find2(text):
    return (text + text[:len(text) // 2 + 2]).find(text, 1) > 0
It is based on the idea that if a string is an exact repetition of a shorter string, the string itself must be found, at a non-trivial offset, in a circularly extended copy of itself:
Input:       abab
Extension1:  abababab
Found1:      |-abab
Extension2:  ababab
Found2:      |-abab

Input:       ababa
Extension1:  ababaababa
Found1:      |----ababa
Extension2:  ababab
Found2:      NOT FOUND!
The find-based methods are the fastest overall, with has_subpattern_find() being fastest for small inputs and has_subpattern_find2() generally faster for intermediate and large inputs (especially in the False case).
Aside from the find-based ones, for shorter inputs the direct looping approaches (especially has_subpattern_loop2()) are fastest, closely followed (and sometimes surpassed) by has_subpattern_re(); but as soon as the input gets bigger (and especially for the False outcome), has_subpattern_divs() becomes by far the fastest, as shown by the following benchmarks.
For the True outcome, again aside from the find-based ones, has_subpattern_loop2() is the fastest because it needs only a very small number of iterations, independent of the input size.
The input is generated as a function of n using:
import string

def gen_input(n, m=0):
    g = string.ascii_lowercase
    if not m:
        m = n
    offset = '!' if n % 2 else ''
    return g[:n] * (m // min(n, len(g)) + 2) + offset
so that for even n the has_subpattern*() functions always return True, and for odd n they always return False.
Note that, in general, the has_subpattern() function will depend not only on the raw size of the input but also on the length of the repeating string, if any. This is not explored in the benchmarks, except for the odd/even separation.
(Figures: benchmark plots for Even Inputs and for Odd Inputs.)
(Full code available here).
(EDITED to include some more solutions as well as a comparison with the regex-based solution from @RomanPerekhrest)
(EDITED to include some more solutions based on the find approach from @ВладДавидченко)
I found another solution that may be useful:
def has_subpattern(string):
    return (string * 2).find(string, 1) != len(string)

Define a function that returns a list of prime factors

I'm a complete beginner when it comes to programming and need help with this. I've been tasked with creating a function that returns a list of the prime factors of a given number.
I have already made a separate function to check for prime numbers, and a function to list the primes up to a given number, but when I try to check whether each prime is a factor, the list comes back empty.
#A1 Q1, Flow control
def isprime(N):
    if N < 2: return False
    for n in range(2,N-1):
        if (N % n) == 0: return False
        if n >= N//n: break
    return True

def factors(x):
    pf = []
    for n in range(x+1):
        if isprime(n) is True and n % (x+1) == 0:
            pf.append(n)
    return pf

pf2 = factors(10075)
print(pf2)
So for 10075, pf2 should be [5, 13, 31]; instead I get an empty list.
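One likely cause of the empty list: for every n in range(x+1), the expression n % (x+1) is just n, so the test is only true for n = 0, which is not prime. The divisibility test presumably should be x % n == 0. A minimal sketch under that assumption follows; the function names are illustrative, and math.isqrt requires Python 3.8 or later.

```python
import math

def is_prime(n):
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def prime_factors(x):
    """Distinct prime factors of x, in increasing order."""
    # n is a prime factor of x when n divides x (x % n == 0) and n is prime
    return [n for n in range(2, x + 1) if x % n == 0 and is_prime(n)]

print(prime_factors(10075))  # [5, 13, 31]
```

Checking x % n == 0 before calling is_prime avoids running the primality test on numbers that are not divisors at all.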

find the first occurrence of a number greater than k in a sorted array

For a given sorted list, the program should return the index of the first number in the list that is greater than the number given as input.
When I run the code to check whether it works, I get two outputs: one is the value and the other is None.
For example, if I give an input of 3 to the code below, the expected output is the index of 20, i.e. 1; instead I get 1 followed by None.
If I give a value that is greater than every number in the list, I get the correct output, i.e. "The entered number is greater than the numbers in the list".
num_to_find = int(input("Enter the number to be found"))
a=[2,20,30]
def occur1(a,num_to_find):
    j = i = 0
    while j==0:
        if a[len(a)-1] > num_to_find:
            if num_to_find < a[i]:
                j=1
                print(i)
                break
            else:
                i = i + 1
        else:
            ret_state = "The entered number is greater than the numbers in the list"
            return ret_state
print(occur1(a,num_to_find))
This code is difficult to reason about due to extra variables, poor variable names (j is typically used as an index, not a boolean flag), the use of break, nested conditionals and side effects. It's also inefficient because it needs to visit each element in the list in the worst case and fails to take full advantage of the sorted nature of the list. However, it does appear to work.
Your first misunderstanding is likely that print(i) prints the index of the next largest element rather than the element itself. In your example call of occur1([2, 20, 30], 3), 1 is where 20 lives in the array.
Secondly, once the index of the found element is printed, the function breaks out of the loop and returns None, and the outer print dutifully prints None. Hopefully this explains your output. You can use return a[i] in place of break to fix your immediate problem and meet your expectations.
Having said that, Python has a builtin module for this: bisect. Here's an example:
from bisect import bisect_right
a = [1, 2, 5, 6, 8, 9, 15]
index_of_next_largest = bisect_right(a, 6)
print(a[index_of_next_largest]) # => 8
If the next number greater than k is out of bounds, you can try/except that or use a conditional to report the failure as you see fit. This function takes advantage of the fact that the list is sorted using a binary search algorithm, which cuts the search space in half on every step. The time complexity is O(log(n)), which is very fast.
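The out-of-bounds case mentioned above can be handled with a simple conditional. The sketch below returns the index rather than the value, as the question asks; the helper name index_of_next_larger and the choice of None as the "not found" result are assumptions made for this example.

```python
from bisect import bisect_right

def index_of_next_larger(a, k):
    """Return the index of the first element greater than k, or None if there is none."""
    i = bisect_right(a, k)  # first position whose element is strictly greater than k
    return i if i < len(a) else None

a = [2, 20, 30]
print(index_of_next_larger(a, 3))   # 1
print(index_of_next_larger(a, 30))  # None
```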
If you do wish to stick with a linear algorithm similar to your solution, you can simplify your logic to:
def occur1(a, num_to_find):
    for n in a:
        if n > num_to_find:
            return n

# test it...
a = [2, 5, 10]
for i in range(11):
    print(i, "->", occur1(a, i))
Output:
0 -> 2
1 -> 2
2 -> 5
3 -> 5
4 -> 5
5 -> 10
6 -> 10
7 -> 10
8 -> 10
9 -> 10
10 -> None
Or, if you want the index of the next largest number:
def occur1(a, num_to_find):
    for i, n in enumerate(a):
        if n > num_to_find:
            return i
But I want to stress that the binary search is, by every measure, far superior to the linear search. For a list of a billion elements, the binary search will make about 30 comparisons in the worst case (log2 of a billion is roughly 30), whereas the linear version will make a billion comparisons. The only reason not to use it is if the list can't be guaranteed to be sorted, which isn't the case here.
To make this more concrete, you can play with this program (but use the builtin module in practice):
import random

def bisect_right(a, target, lo=0, hi=None, cmps=0):
    if hi is None:
        hi = len(a)
    mid = (hi - lo) // 2 + lo
    cmps += 1
    if lo <= hi and mid < len(a):
        if a[mid] < target:
            return bisect_right(a, target, mid + 1, hi, cmps)
        elif a[mid] > target:
            return bisect_right(a, target, lo, mid - 1, cmps)
        else:
            return cmps, mid + 1
    return cmps, mid + 1

def linear_search(a, target, cmps=0):
    for i, n in enumerate(a):
        cmps += 1
        if n > target:
            return cmps, i
    return cmps, i

if __name__ == "__main__":
    random.seed(42)
    trials = 10**3
    list_size = 10**4
    binary_search_cmps = 0
    linear_search_cmps = 0
    for n in range(trials):
        test_list = sorted([random.randint(0, list_size) for _ in range(list_size)])
        test_target = random.randint(0, list_size)
        res = bisect_right(test_list, test_target)[0]
        binary_search_cmps += res
        linear_search_cmps += linear_search(test_list, test_target)[0]
    binary_search_avg = binary_search_cmps / trials
    linear_search_avg = linear_search_cmps / trials
    s = "%s search made %d comparisons across \n%d searches on random lists of %d elements\n(found the element in an average of %d comparisons\nper search)\n"
    print(s % ("binary", binary_search_cmps, trials, list_size, binary_search_avg))
    print(s % ("linear", linear_search_cmps, trials, list_size, linear_search_avg))
Output:
binary search made 12820 comparisons across
1000 searches on random lists of 10000 elements
(found the element in an average of 12 comparisons
per search)
linear search made 5013525 comparisons across
1000 searches on random lists of 10000 elements
(found the element in an average of 5013 comparisons
per search)
The more elements you add, the worse the situation looks for the linear search.
I would do something along the lines of:
num_to_find = int(input("Enter the number to be found"))
a=[2,20,30]
def occur1(a, num_to_find):
    for i in a:
        if not i <= num_to_find:
            return a.index(i)
    return "The entered number is greater than the numbers in the list"
print(occur1(a, num_to_find))
This gives an output of 1 (when inputting 3).
The reason yours gives you two outputs is that print is called twice: once inside the function (print(i)) and once on the function's return value, which is None in that case.

Python 3, any better/cleaner way to write these functions that use for loops?

I'm trying to write code in the simplest and cleanest way possible. I've found a few ways to shorten and simplify my code through functions that I'd never seen before or through other methods. I'd like to expand my knowledge of writing code using various (but simple) methods, and also expand my function 'vocabulary'.
Here are the functions:
1. Perfect number:
If the sum of a number's divisors is equal to the number itself, it is a perfect number. We don't count the number itself as a divisor. E.g. 6's divisors are 1, 2, 3. The sum of the divisors is 6. Therefore 6 is a perfect number.
def perfect_number(num):
    if type(num) != int or num < 0:
        return None
    divisors = []
    total = 0
    for x in range(num):
        if num % (x+1) == 0:
            if num != x+1:
                divisors += [x+1]
    for x in divisors:
        total += x
    if total == num:
        return True
    return False
2. Pattern:
A function that takes a positive integer and prints a pattern as follows:
pattern(1): '#-'
pattern(2): '#-#--'
pattern(5): '#-#--#---#----#-----'
def pattern(num):
    if type(num) != int or num < 0:
        return None
    output = ''
    for x in range(num):
        output += '#'+('-'*(x+1))
    return output
3. Reversed Numbers:
A function that takes 2 integers. It goes through every number in the range between those 2 numbers; if one of those numbers is a palindrome (it reads the same backwards, e.g. 151 is a 'palindrome'), it increases a counter by 1. That counter is then returned.
invert_number(num) returns num with its digits reversed, as an integer.
def reversed_numbers(low, high):
    output = 0
    for x in range(low,high+1):
        if invert_number(x) == x:
            output += 1
    return output
It is assumed that low is lower than high.
If I broke a rule or if this doesn't fit here, please tell me where I can post it or how I can improve. Thanks :)
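One possible way to tidy these three functions, offered as a sketch rather than a definitive answer, is to lean on sum(), str.join() and generator expressions. Since invert_number is not shown in the question, a hypothetical stand-in is defined here that assumes non-negative integers.

```python
def perfect_number(num):
    # isinstance also accepts bool, unlike the original type() check
    if not isinstance(num, int) or num < 0:
        return None
    # sum of all proper divisors (every divisor except num itself)
    return sum(d for d in range(1, num) if num % d == 0) == num

def pattern(num):
    if not isinstance(num, int) or num < 0:
        return None
    # '#' followed by k dashes, for k = 1..num
    return ''.join('#' + '-' * k for k in range(1, num + 1))

def invert_number(num):
    # hypothetical stand-in: reverse the decimal digits of a non-negative integer
    return int(str(num)[::-1])

def reversed_numbers(low, high):
    # count the palindromes in the inclusive range low..high
    return sum(1 for x in range(low, high + 1) if invert_number(x) == x)

print(perfect_number(6))         # True
print(pattern(2))                # #-#--
print(reversed_numbers(10, 22))  # 2
```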

How to handle negative cases for prime check in Python?

With regard to the "xrange" function - ("range" in Python3) what happens when I do a negative check inside a loop? In this case, a negative number could be regarded as an edge case, but always returns None. Any insights?
The problem is that you are checking whether the number is negative inside the for loop. For instance, if x = -3, the loop runs over range(2, -1), which is an empty range (not None). So the for loop never runs, and the function returns True.
def isprime(x):
    if x <= 1:  # 0, 1 and negative numbers are not prime
        return False
    for a in range(2, (x//2)+1):
        if x % a == 0:
            return False
    return True
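To see why the loop body is skipped for a negative input: a range whose start is greater than its stop (with the default step of 1) is simply empty. A quick check, using x = -3 as in the example:

```python
x = -3
r = range(2, (x // 2) + 1)  # -3 // 2 is -2, so this is range(2, -1)

print(r)        # range(2, -1)
print(list(r))  # []
print(len(r))   # 0

for a in r:
    print("never reached")  # the body does not execute for an empty range
```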
By its elementary school definition, prime numbers are positive integers greater than 1, thus your function should return False for every negative number (and for 0 and 1), for example:
def isprime(x):
    if x <= 1:
        return False
    for a in range(2, (x//2)+1):
        if x % a == 0:
            return False
    return True
That being said, it is possible to extend the definition (as done in some fields in math), to include negative numbers as well (for further discussion see here and here). In this case, for every negative number -n, -n is prime iff n is prime. Therefore your code could be something like:
def isprime(x):
    x = abs(x)  # treat -n like n (or use any abs method you'd like, such as numpy's)
    if x < 2:  # 0 and 1 are not prime
        return False
    for a in range(2, (x//2)+1):
        if x % a == 0:
            return False
    return True
