Python 3 index is len(l) conditional evaluation error - python-3.x

I have the following merge sort code. With the line if ib is len(b) or ..., it raises an IndexError, but when that line is changed to use double equals, if ib == len(b) or ..., the code does not raise an IndexError.
This is very unexpected because:
First, len(b) evaluates to a number, and I assumed is is equivalent to == for integers. You can test it: the Python expression
(1 is len([0]))
evaluates to True.
Second, the input to the function is range(1500, -1, -1), and range objects are handled differently in Python 3. I suspected that, since the input was a range instance, the length might be some other kind of object instead of a plain integer. This is again strange because
1 is len(range(1))
also gives True.
Is this a bug in the conditional evaluation in Python 3?
Tom Caswell supplied the following useful expression in our discussion; I'm copying it here for reference:
tt = [j is int(str(j)) for j in range(15000)]
Only the first 257 items (0 through 256) are True. The rest are False.
The original script:
def merge_sort(arr):
    if len(arr) >= 2:
        s = int(len(arr)/2)
        a = merge_sort(arr[:s])
        b = merge_sort(arr[s:])
        ia = 0
        ib = 0
        new_arr = []
        while len(new_arr) < len(arr):
            try:
                if ib is len(b) or a[ia] <= b[ib]:
                    new_arr.append(a[ia])
                    ia += 1
                else:
                    new_arr.append(b[ib])
                    ib += 1
            except IndexError:
                print(len(a), len(b), ia, ib)
                raise IndexError
        return new_arr
    else:
        return arr

print(merge_sort(range(1500, -1, -1)))

Python does not guarantee that two integer instances with equal value are the same instance. In the example below, the first 257 comparisons (0 through 256) return True only because CPython caches the int objects for -5 through 256.
This behavior is described here: https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong
Example:
import matplotlib.pyplot as plt

tt = [j is int(str(j)) for j in range(500)]
plt.plot(tt)
plt.show()
IIRC, the fact that any of them pass the is test at all is a CPython implementation detail (an optimization), not a language guarantee.

is checks whether two operands refer to the same object; == checks whether they have equal values. You cannot assume they mean the same thing: they have different uses, and using them interchangeably can silently give wrong results (in the merge sort above, the wrong comparison only surfaced later as an IndexError).
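For comparison, here is a minimal sketch of the merge step written with value comparisons only. It is not the asker's exact code: it loops while both halves still have elements and then appends whatever remains. That also avoids reading a[ia] after a is exhausted, which the original does even with ==, so it would still raise IndexError on an input such as [1, 2]; the descending range(1500, -1, -1) just happens to never exhaust a first.

```python
def merge_sort(arr):
    """Return a new sorted list; arr may be any sequence (list, range, ...)."""
    if len(arr) < 2:
        return list(arr)
    s = len(arr) // 2
    a = merge_sort(arr[:s])
    b = merge_sort(arr[s:])
    ia = ib = 0
    merged = []
    # Bounds are checked by value with <, never with `is`.
    while ia < len(a) and ib < len(b):
        # <= keeps the sort stable: ties are taken from the left half first.
        if a[ia] <= b[ib]:
            merged.append(a[ia])
            ia += 1
        else:
            merged.append(b[ib])
            ib += 1
    # At most one of these slices is non-empty.
    merged.extend(a[ia:])
    merged.extend(b[ib:])
    return merged

print(merge_sort([5, 2, 4, 1]))  # [1, 2, 4, 5]
```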

Related

Python how to calculate average of range list?

Could somebody tell me what I am doing wrong?
I am getting this error at Vidurkis = sum(B)/len(B):
TypeError: 'int' object is not callable
A = int(input('Betkoks skaicius'))  # "Any number"
if A == 0:
    print('Ačiū')  # "Thank you"
if A <= 10 and A >= -10:
    if A < 0:
        print('Neigiamas vienženklis')  # "Negative single-digit"
    if A > 0:
        print('Teigiamas vienženklis')  # "Positive single-digit"
else:
    print('| {:^20} |'.format('Autorius: '))  # "Author: "
    for r in range(10, A, 1):
        Vidurkis = sum(r)/len(r)  # Vidurkis = "average"
        print(Vidurkis)
After a line like
sum = 0
sum is no longer the built-in sum function, and calling it gives exactly this 'int' object is not callable error; you would have to rename that variable. The real error, however, is that you are applying functions that take iterables as arguments to integers: your loop variable (r in the code above, B in the error message) is an int, while sum and len expect a list or similar. The following would suffice:
r = range(10, A, 1) # == range(10, A)
Vidurkis = sum(r)/len(r) # only works for A > 10, otherwise ZeroDivisionError
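A minimal sketch of the same idea wrapped in a small function (range_average is a hypothetical name) that returns None for an empty range instead of raising ZeroDivisionError. It assumes sum has not been rebound to an int anywhere in the module.

```python
def range_average(start, stop):
    """Mean of range(start, stop), or None if the range is empty."""
    r = range(start, stop)
    if len(r) == 0:
        return None  # avoids ZeroDivisionError when stop <= start
    return sum(r) / len(r)

print(range_average(10, 15))  # 12.0
print(range_average(10, 10))  # None
```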

find the first occurrence of a number greater than k in a sorted array

For a given sorted list, the program should return the index of the first number in the list that is greater than the number given as input.
When I run the code to check whether it works, I get two outputs: one is the value and the other is None.
For example, if I input 3 for the code below, the expected output is the index of 20, i.e. 1; instead I get 1 followed by None.
If I give a value that is greater than every number in the list, I get the correct output, i.e. "The entered number is greater than the numbers in the list".
num_to_find = int(input("Enter the number to be found"))
a = [2, 20, 30]

def occur1(a, num_to_find):
    j = i = 0
    while j == 0:
        if a[len(a)-1] > num_to_find:
            if num_to_find < a[i]:
                j = 1
                print(i)
                break
            else:
                i = i + 1
        else:
            ret_state = "The entered number is greater than the numbers in the list"
            return ret_state

print(occur1(a, num_to_find))
This code is difficult to reason about because of extra variables, poor variable names (j is typically used as an index, not a boolean flag), the use of break, nested conditionals and side effects. It's also inefficient: in the worst case it visits every element of the list, and it doesn't take advantage of the list being sorted. However, it appears to work.
Your first misunderstanding is likely that print(i) prints the index of the next larger element rather than the element itself. In your example call occur1([2, 20, 30], 3), 1 is where 20 lives in the list.
Secondly, after the index is printed, the function breaks out of the loop and falls off the end, so it returns None, and print dutifully prints None. Hopefully this explains your output. To fix the immediate problem, replace print(i) and break with return i (or return a[i] if you want the element itself).
Having said that, Python has a standard-library module for this: bisect. Here's an example:
from bisect import bisect_right
a = [1, 2, 5, 6, 8, 9, 15]
index_of_next_largest = bisect_right(a, 6)
print(a[index_of_next_largest]) # => 8
If there is no number greater than k, bisect_right returns len(a), so indexing with that result raises IndexError; you can catch that with try/except or check the index with a conditional and report the failure as you see fit. bisect_right exploits the fact that the list is sorted by using binary search, which halves the search space on every step. The time complexity is O(log n), which is very fast.
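Putting that bounds check into code, a minimal sketch (first_greater_index is a hypothetical helper name) that returns the index the question asks for, or None when no element is greater than k:

```python
from bisect import bisect_right

def first_greater_index(a, k):
    """Index of the first element of sorted list a that is > k, or None.

    bisect_right returns the insertion point after any entries equal to k,
    which is the position of the first element strictly greater than k.
    """
    i = bisect_right(a, k)
    return i if i < len(a) else None  # i == len(a): nothing is greater than k

print(first_greater_index([2, 20, 30], 3))   # 1
print(first_greater_index([2, 20, 30], 30))  # None
```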
If you do wish to stick with a linear algorithm similar to your solution, you can simplify your logic to:
def occur1(a, num_to_find):
    for n in a:
        if n > num_to_find:
            return n

# test it...
a = [2, 5, 10]
for i in range(11):
    print(i, "->", occur1(a, i))
Output:
0 -> 2
1 -> 2
2 -> 5
3 -> 5
4 -> 5
5 -> 10
6 -> 10
7 -> 10
8 -> 10
9 -> 10
10 -> None
Or, if you want the index of the next larger number:
def occur1(a, num_to_find):
    for i, n in enumerate(a):
        if n > num_to_find:
            return i
But I want to stress that, for a sorted list, binary search is far superior to linear search. For a list of a billion elements, binary search makes about 30 comparisons in the worst case (2**30 is roughly a billion), whereas the linear version can make up to a billion comparisons. The only reason not to use it is if the list can't be guaranteed to be pre-sorted, which isn't the case here.
To make this more concrete, you can play with this program (but use the builtin module in practice):
import random

# Simplified binary search used only to count comparisons; its returned index
# does not always match bisect.bisect_right (e.g. with duplicate values).
def bisect_right(a, target, lo=0, hi=None, cmps=0):
    if hi is None:
        hi = len(a)
    mid = (hi - lo) // 2 + lo
    cmps += 1
    if lo <= hi and mid < len(a):
        if a[mid] < target:
            return bisect_right(a, target, mid + 1, hi, cmps)
        elif a[mid] > target:
            return bisect_right(a, target, lo, mid - 1, cmps)
        else:
            return cmps, mid + 1
    return cmps, mid + 1

def linear_search(a, target, cmps=0):
    for i, n in enumerate(a):
        cmps += 1
        if n > target:
            return cmps, i
    return cmps, i

if __name__ == "__main__":
    random.seed(42)
    trials = 10**3
    list_size = 10**4
    binary_search_cmps = 0
    linear_search_cmps = 0

    for n in range(trials):
        test_list = sorted([random.randint(0, list_size) for _ in range(list_size)])
        test_target = random.randint(0, list_size)
        res = bisect_right(test_list, test_target)[0]
        binary_search_cmps += res
        linear_search_cmps += linear_search(test_list, test_target)[0]

    binary_search_avg = binary_search_cmps / trials
    linear_search_avg = linear_search_cmps / trials
    s = "%s search made %d comparisons across \n%d searches on random lists of %d elements\n(found the element in an average of %d comparisons\nper search)\n"
    print(s % ("binary", binary_search_cmps, trials, list_size, binary_search_avg))
    print(s % ("linear", linear_search_cmps, trials, list_size, linear_search_avg))
Output:
binary search made 12820 comparisons across
1000 searches on random lists of 10000 elements
(found the element in an average of 12 comparisons
per search)
linear search made 5013525 comparisons across
1000 searches on random lists of 10000 elements
(found the element in an average of 5013 comparisons
per search)
The more elements you add, the worse the situation looks for the linear search.
I would do something along the lines of:
num_to_find = int(input("Enter the number to be found"))
a = [2, 20, 30]

def occur1(a, num_to_find):
    for i in a:
        if not i <= num_to_find:
            return a.index(i)
    return "The entered number is greater than the numbers in the list"

print(occur1(a, num_to_find))
Which gives the output of 1 (when inputting 3).
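Yours gives you two outputs because there are two print calls: print(i) inside the function prints the index, and print(occur1(a, num_to_find)) then prints the function's return value, which is None after the break.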

If elif one liner

if i == len(a):
    tempList.extend(b[j:])
    break
elif j == len(b):
    tempList.extend(a[i:])
    break
I am using this in a merge sort program in Python. Is there any way to put this into a one-liner?
Maybe, but let me give a deliberate non-answer: don't even try.
You don't write your code to be short. You write it so that it gets the job done in a straightforward manner and clearly communicates its meaning to human readers.
The above code does that already.
In other words: of course being concise is a valuable property of source code. So, when you have two equally readable pieces of code doing the same thing, and one version is a one-liner while the other is much longer, you go for the short version.
But I very much doubt that the above can be expressed as readably with less code.
You can use the boolean operators and and or to make a fairly readable one-liner:
l = []
a = [1, 2, 3, 4]
b = [8, 9, 10]
i = 4
j = 2
l.extend(i == len(a) and b[j:] or j == len(b) and a[i:] or [])
assert l == [10]
i = 0
j = 3
l.extend(i == len(a) and b[j:] or j == len(b) and a[i:] or [])
assert l == [10, 1, 2, 3, 4]
This example relies on the following properties:
The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
We have to add or [] to avoid the TypeError: 'bool' object is not iterable that is otherwise raised whenever neither branch produces a non-empty list: for example, when neither i == len(a) nor j == len(b) holds, or when i == len(a) and j > len(b) (e.g. i == 4 and j == 5).
I'd still prefer an expanded version though.
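For comparison, the same one-liner can be written with conditional expressions (x if cond else y) instead of and/or. This version does not depend on the truthiness of the slices, and the final else [] covers the case where neither list is exhausted. Like the and/or version, it cannot express the break, so the surrounding loop still has to be exited separately.

```python
l = []
a = [1, 2, 3, 4]
b = [8, 9, 10]

i, j = 4, 2
# Parsed as: b[j:] if i == len(a) else (a[i:] if j == len(b) else [])
l.extend(b[j:] if i == len(a) else a[i:] if j == len(b) else [])
print(l)  # [10]

i, j = 0, 3
l.extend(b[j:] if i == len(a) else a[i:] if j == len(b) else [])
print(l)  # [10, 1, 2, 3, 4]
```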

How to determine which nested generator produces StopIteration exception?

I bumped into a situation where I need to determine in my try/except code which nested generator is raising a StopIteration exception. How do I do it? The following is a dummy example:
def genOne(iMax, jMax):
    i = 0
    g2 = genTwo(jMax)
    while i <= iMax:
        print('genOne: ' + str(i))
        next(g2)
        yield
        i = i + 1

def genTwo(jMax):
    j = 0
    while j <= jMax:
        print('genTwo: ' + str(j))
        yield
        j = j + 1

g1 = genOne(6, 3)  # The inputs are arbitrary numbers
try:
    while True:
        next(g1)
except:
    # Do some processing depending on who generates the StopIteration exception
    pass
Thanks!
This can be generalized to the problem of finding the origin of an arbitrary exception.
Use the traceback module to inspect the stacktrace of your exception object.
Here is a previous answer on a similar subject.
Some example code:
import sys
import traceback

g1 = genOne(6, 3)  # The inputs are arbitrary numbers
try:
    while True:
        next(g1)
except:
    exc_type, exc_value, exc_traceback = sys.exc_info()
    print(traceback.extract_tb(exc_traceback)[-1])
Shell output:
> ./test.py
genOne: 0
genTwo: 0
genOne: 1
genTwo: 1
genOne: 2
genTwo: 2
genOne: 3
genTwo: 3
genOne: 4
('./test.py', 12, 'genOne', 'next(g2)')
Note that the [-1] in the extract_tb() call selects only the innermost entry of the traceback. The printed tuple shows which field you'd need to check: the function name (genOne) is at index 2. In your particular example, the innermost entry is the next(g2) call inside genOne, which is what tells you that the inner generator, genTwo, was the one that ran out; genTwo itself does not appear in the traceback, because its frame has already finished when next(g2) raises StopIteration.
Hardcoded checks like these, which rely on internal code details, are frowned upon, especially when you do not control the implementation of the code you are inspecting.
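An alternative that avoids inspecting tracebacks altogether is to catch the inner StopIteration right where next(g2) is called and re-raise it as a distinct exception type. This also matters on Python 3.7 and later, where (per PEP 479) a StopIteration that escapes a generator body is turned into a RuntimeError, so the question's except block would actually see a RuntimeError rather than a StopIteration. InnerExhausted and run are hypothetical names chosen for this sketch, and the print calls from the question are left out.

```python
class InnerExhausted(Exception):
    """Hypothetical marker exception: the nested generator (genTwo) ran out."""

def genTwo(jMax):
    for j in range(jMax + 1):
        yield j

def genOne(iMax, jMax):
    g2 = genTwo(jMax)
    for i in range(iMax + 1):
        try:
            next(g2)
        except StopIteration:
            # Re-raise as a distinct type so it cannot be confused with
            # genOne's own exhaustion (and is not turned into RuntimeError).
            raise InnerExhausted(i) from None
        yield i

def run(iMax, jMax):
    """Return the name of the generator that ran out first (hypothetical helper)."""
    g1 = genOne(iMax, jMax)
    try:
        while True:
            next(g1)
    except InnerExhausted:
        return "genTwo"
    except StopIteration:
        return "genOne"

print(run(6, 3))  # genTwo
print(run(2, 5))  # genOne
```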

11+ digit ints not working

I'm using Python 3 for a small extra-credit assignment to write an RSA cracker. The teacher has given us a fairly large int (large enough to require more than 32 bits) and the public key. My code works for primes < 32 bits. One of the reasons I chose Python 3 is that I heard it can handle arbitrarily large integers. In the Python terminal I tested this with small things such as 2**35 and factorial(70). This worked fine.
Now that I've written the code, I'm running into problems with overflow errors etc. Why do operations on large numbers seem to work in the terminal but not in my actual code? The errors state that the values cannot be converted to their C types, so my first guess would be that for some reason the values in the Python interpreter are not being converted to C types while the ones in my code are. Is there any way to get this working?
As a first attempt, I tried calculating a list of all primes between 1 and n (the large number). This sort of worked until I realized that list indices [ ] only accept ints and fail if the number is too large. Also, creating an array of length n won't work if n > 2**32 (not to mention the memory it would take up).
Because of this, I switched to a function I found that gives a very accurate guess as to whether a number is prime. These functions are pasted below.
As you can see, I am only doing , *, /, and % operations. All of these seem to work in the interpreter but I get "cannot convert to c-type" errors when used with this code.
def power_mod(a, b, n):
    if b < 0:
        return 0
    elif b == 0:
        return 1
    elif b % 2 == 0:
        return power_mod(a*a, b/2, n) % n
    else:
        return (a * power_mod(a, b-1, n)) % n
The last three lines are where the "cannot convert to C type" error appears.
The below function estimates with a very high degree of certainty that a number is prime. As mentioned above, I used this to avoid creating massive arrays.
def rabin_miller(n, tries=7):
    if n == 2:
        return True
    if n % 2 == 0 or n < 2:
        return False
    p = primes(tries**2)
    if n in p:
        return True
    s = n - 1
    r = 0
    while s % 2 == 0:
        r = r + 1
        s = s/2
    for i in range(tries):
        a = p[i]
        if power_mod(a, s, n) == 1:
            continue
        else:
            for j in range(0, r):
                if power_mod(a, (2**j)*s, n) == n - 1:
                    break
            else:
                return False
            continue
    return True
Perhaps I should be more specific by pasting the error:
line 19, in power_mod
return (a * power_mod(a,b-1,n)) % n
OverflowError: Python int too large to convert to C double
This is the type of error I get when performing arithmetic. Int errors occur when I try to create incredibly large lists, sets, etc.
Your problem (I think) is that you are converting to floating point by using the / operator, both in b/2 in power_mod and in s = s/2 in rabin_miller. Change it to // and you should stay in the int domain.
Many C routines still have C int limitations. Do your work using Python routines instead.
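A minimal sketch of power_mod with the suggested fix applied (// instead of /), plus reducing a*a modulo n so intermediate values stay small. The b == 0 case returns 1 % n so that n == 1 gives 0. In practice, Python's built-in three-argument pow(a, b, n) computes the same value and is the better choice.

```python
def power_mod(a, b, n):
    """Compute a**b % n for b >= 0 using only integer arithmetic."""
    if b < 0:
        return 0  # mirrors the original: negative exponents are not supported
    if b == 0:
        return 1 % n  # 1 % n so that n == 1 yields 0, like pow(a, 0, 1)
    if b % 2 == 0:
        # // keeps b an int; b/2 would produce a float and overflow for huge b
        return power_mod(a * a % n, b // 2, n)
    return (a * power_mod(a, b - 1, n)) % n

print(power_mod(3, 200, 1000003) == pow(3, 200, 1000003))  # True
```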
