Quick sort counting - python-3.x

Python questions again.
I want to count the number of comparison operations performed by quick sort. Because I use a recursive function, I do not think that assigning count = 0 to the beginning of the function body is inappropriate, so I made it as follows.
def QuickSort(lst, count = 0):
if len(lst) > 1:
pivot_idx = len(lst) // 2
smaller_nums, larger_nums = [], []
for idx, num in enumerate(lst):
if idx != pivot_idx:
if num < lst[pivot_idx]:
smaller_nums.append(num)
else:
larger_nums.append(num)
count = QuickSort(smaller_nums, count + 1)[1]
count = QuickSort(larger_nums, count + 1)[1]
lst[:] = smaller_nums + [lst[pivot_idx]] + larger_nums
return lst, count
However, after counting, I confirmed the count which is much lower than my expectation. According to big o, the quick sort would have to show the calculation of n * log (n), but it showed a much lower count. For example, when sorting a list with 1000 random elements, we expected to see a count of 1000 * log (1000) = 6907, but actually only 1164 counts. I am wondering if I am misusing the count in the function or misunderstanding it.
Thank you.

Your post is mistaken on several points:
Big-O is allows arbitrary constant factors and also ignoring the values for "small" values of n, where "small" can be arbitrarily large for any given analysis. So your computations are meaningless.
Your counts are wrong. There's one comparison per loop iteration. You're counting something else.
This is a strange way to code the count. Just use a global variable.
Try this. Note really you're using twice as many comparisons as this reports. The check that the loop index isn't the pivot could be eliminated with a smarter implementation.
c = 0
def QuickSort(lst):
if len(lst) <= 1:
return lst
pivot_idx = len(lst) // 2
smaller, larger = [], []
for idx, num in enumerate(lst):
if idx != pivot_idx:
global c
c += 1
(larger, smaller)[num < lst[pivot_idx]].append(num)
return QuickSort(smaller) + [lst[pivot_idx]] + QuickSort(larger)
def Run(n):
lst = [random.randint(0,1000) for r in xrange(n)]
QuickSort(lst)
print c
Run(1000)
If you're aghast at the prospect of using a global variable, then you can just wrap the sort in a class:
import random
class QuickSort:
def __init__(self):
self.comparisons = 0
def sort(self, lst):
if len(lst) <= 1:
return lst
pivot_idx = len(lst) // 2
smaller, larger = [], []
for idx, num in enumerate(lst):
if idx != pivot_idx:
self.comparisons += 1
(larger, smaller)[num < lst[pivot_idx]].append(num)
return self.sort(smaller) + [lst[pivot_idx]] + self.sort(larger)
def Run(n):
lst = [random.randint(0,1000) for r in xrange(n)]
quicksort = QuickSort()
print quicksort.sort(lst)
print quicksort.comparisons
Run(100)

Building on the answer provided by Gene by adding print statements and a sort "error" range, his example was very helpful to my understanding of quicksort and an error term on the big O impact of operations performance comparison.
class QuickSort:
def __init__(self):
self.comparisons = 0
def sort(self, lst):
k_err = 0 # k << n, the value the sort array can be in error
if len(lst) <= 1:
return lst
pivot_idx = len(lst) // 2
smaller, larger = [], []
for idx, num in enumerate(lst):
if idx != (pivot_idx) :
self.comparisons += 1
try:
(larger, smaller)[(num - k_err) < lst[pivot_idx]].append(num)
except:
(larger, smaller)[(num + k_err) < lst[pivot_idx]].append(num)
print(pivot_idx,"larger", self.comparisons, larger)
print(pivot_idx, "smaller", self.comparisons, smaller, )
return self.sort(smaller) + [lst[pivot_idx]] + self.sort(larger)
def Run(n):
random.seed(100)
lst = [random.randint(0,round(100,0)) for r in range(n)]
quicksort = QuickSort()
print(len(lst), lst)
print(quicksort.sort(lst))
print(quicksort.comparisons, quicksort.comparisons/n, ((quicksort.comparisons/n)/math.log(n,10)), math.log(n,10) )

Related

Python: Roll a dice for 12 times, calculate the probability if each number equally shows up twice

I've drafted the below code for the captioned question, but the return result is always 0. Could anyone please help me figure out what's the problem here?
Thanks a lot!
import random
dice_sides = 6
frequency_list = []
def roll_dice(times):
results = []
for roll_num in range(times):
result = random.randint(1,dice_sides)
results.append(result)
for i in range(dice_sides):
if results.count(i) != 2:
frequency = 0
break
else:
frequency = 1
return frequency
def occurrence(N,times):
for j in range(N):
frequency_list.append(roll_dice(times))
prob = frequency_list.count(1)
return prob
print(occurrence(10000,12))
You can try something like this
Code
import random
from collections import Counter
def roll_dice(n_sides, times):
if n_sides % times:
return 0
results = []
for roll_num in range(times):
result = random.randint(1, n_sides)
results.append(result)
# I'm using set here and will check its length,
# Counter(results) returns a dict of items (item, count)
# and if every item has the same count it should have length 1.
# More generic statement not only for (2 in this case)
res_dict = set(Counter(results).values())
if len(res_dict) == 1:
return 1
return 0
def mean(ar):
return sum(ar)/len(ar)
def occurrence(N, n_sides, times):
frequency_list = []
for j in range(N):
frequency_list.append(roll_dice(n_sides, times))
prob = mean(frequency_list)
return prob
if __name__ == '__main__':
N = 100000 # I intentionally made it 100k
n_sides = 6
times = 12
res_prob = occurrence(N, times)
print(res_prob)
Output
0.00604
[Finished in 3.6s]

Buggy merge sort implementation

My merge sort implementation is buggy since i am not getting two sorted list before calling merge.I am not sure what is wrong with it.
def mergeSort(arr):
if len(arr) == 1 : return arr
mid = len(arr) // 2
left_half = arr[:mid]
right_half = arr[mid:]
mergeSort(left_half)
mergeSort(right_half)
return merge(left_half,right_half)
def merge(list1,list2):
res = []
i = 0
j = 0
while i < len(list1) and j < len(list2):
if list1[i] < list2[j]:
res.append(list1[i])
i += 1
elif list1[i] > list2[j]:
res.append(list2[j])
j += 1
#Add remaining to res if any
while i < len(list1):
res.append(list1[i])
i += 1
while j < len(list2):
res.append(list2[j])
j += 1
return res
A = [5,1,2,15]
print(mergeSort(A))
My understanding of merge sort is that the space complexity is O(n) since n items in memory (in the final merge).Is quick sort preferred over merge sort just because quick sort is in-place?
I not python expert, but you should write
left_half = arr[:mid]
right_half = arr[mid:]
left_half = mergeSort(left_half)
right_half = mergeSort(right_half)
Because your mergeSort return copy of sorted array.
You have 3 mistakes in your code.
The first is that you don't handle the empty list. You need a <= instead of an == in your second line.
The second is that by simply calling mergeSort(left_half) you suppose that it will sort left_half “by reference”, which it doesn't (same with right_half).
The third is that you aren't doing anything in the case list1[i] == list2[j]. Actually you don't need that elif, you simply need an else. It doesn't matter whether you append list1[i] or list2[j] if they are equal, but you must append one of the two.
Your code should rather be:
def mergeSort(arr):
if len(arr) <= 1 : return arr
mid = len(arr) // 2
left_half = mergeSort(arr[:mid])
right_half = mergeSort(arr[mid:])
return merge(left_half, right_half)
def merge(list1, list2):
res = []
i = 0
j = 0
while i < len(list1) and j < len(list2):
if list1[i] < list2[j]:
res.append(list1[i])
i += 1
else:
res.append(list2[j])
j += 1
#Add remaining to res if any
...
As for your questions about space complexity and comparison with quicksort, there are already answers here on StackOverflow (here and here).

Using Recursive Functions in Python to find Factors of a Given Number

Have tried searching for this, but can't find exactly what I'm looking for.
I want to make a function that will recursively find the factors of a number; for example, the factors of 12 are 1, 2, 3, 4, 6 & 12.
I can write this fairly simply using a for loop with an if statement:
#a function to find the factors of a given number
def print_factors(x):
print ("The factors of %s are:" % number)
for i in range(1, x + 1):
if number % i == 0: #if the number divided by i is zero, then i is a factor of that number
print (i)
number = int(input("Enter a number: "))
print (print_factors(number))
However, when I try to change it to a recursive function, I am getting just a loop of the "The factors of x are:" statement. This is what I currently have:
#uses recursive function to print all the letters of an integer
def print_factors(x): #function to print factors of the number with the argument n
print ("The factors of %s are:" % number)
while print_factors(x) != 0: #to break the recursion loop
for i in range(1,x + 1):
if x % i == 0:
print (i)
number = int(input("Enter a number: "))
print_factors(number)
The error must be coming in either when I am calling the function again, or to do with the while loop (as far as I understand, you need a while loop in a recursive function, in order to break it?)
There are quite many problems with your recursive approach. In fact its not recursive at all.
1) Your function doesn't return anything but your while loop has a comparision while print_factors(x) != 0:
2) Even if your function was returning a value, it would never get to the point of evaluating it and comparing due to the way you have coded.
You are constantly calling your function with the same parameter over and over which is why you are getting a loop of print statements.
In a recursive approach, you define a problem in terms of a simpler version of itself.
And you need a base case to break out of recursive function, not a while loop.
Here is a very naive recursive approach.
def factors(x,i):
if i==0:
return
if x%i == 0:
print(i)
return factors (x,i-1) #simpler version of the problem
factors(12,12)
I think we do using below method:
def findfactor(n):
factorizeDict
def factorize(acc, x):
if(n%x == 0 and n/x >= x):
if(n/x > x):
acc += [x, n//x]
return factorize(acc, x+1)
else:
acc += [x]
return acc
elif(n%x != 0):
return factorize(acc, x+1)
else:
return acc
return factorize(list(), 1)
def factors(x,i=None) :
if i is None :
print('the factors of %s are : ' %x)
print(x,end=' ')
i = int(x/2)
if i == 0 :
return
if x % i == 0 :
print(i,end=' ')
return factors(x,i-1)
num1 = int(input('enter number : '))
print(factors(num1))
Recursion is a functional heritage and so using it with functional style yields the best results. This means avoiding things like mutations, variable reassignments, and other side effects. That said, here's how I'd write factors -
def factors(n, m = 2):
if m >= n:
return
if n % m == 0:
yield m
yield from factors(n, m + 1)
print(list(factors(10))) # [2,5]
print(list(factors(24))) # [2,3,4,6,8,12]
print(list(factors(99))) # [3,9,11,33]
And here's prime_factors -
def prime_factors(n, m = 2):
if m > n:
return
elif n % m == 0:
yield m
yield from prime_factors(n // m, m)
else:
yield from prime_factors(n, m + 1)
print(list(prime_factors(10))) # [2,5]
print(list(prime_factors(24))) # [2,2,2,3]
print(list(prime_factors(99))) # [3,3,11]
def fact (n , a = 2):
if n <= a :
return n
elif n % a != 0 :
return fact(n , a + 1 )
elif n % a == 0:
return str(a) + f" * {str(fact(n / a , a ))}"
Here is another way. The 'x' is the number you want to find the factors of. The 'c = 1' is used as a counter, using it we'll divide your number by 1, then by 2, all the way up to and including your nubmer, and if the modular returns a 0, then we know that number is a factor, so we print it out.
def factors (x,c=1):
if c == x: return x
else:
if x%c == 0: print(c)
return factors(x,c+1)

'list index out of range' in while loop designed to return two values from list that adds to a specific sum

Line 11 produces the error. Stepping through the code doesn't reveal a problem?
The code just points at from left and right ends of list, moving pointers toward per iteration until a target sum is found or not! Doesn't look like the loops can step on itself but seems to anyway.
def twoSum(num_array, sum):
'''1.twoSum
Given an array of integers, return indices of the two numbers that
add up to a specific target.
'''
array = sorted(num_array)
l = array[0]
r = array[len(array)-1]
indx_Dict = dict(enumerate(array))
while (l < r) :
if (array[l] + array[r]) == sum:
return [indx_Dict[l], indx_Dict[r]]
elif array[l] + array[r] < sum:
l += 1
else:
r -= 1
num_array1 = [2, 7, 11, 15,1,0]
target1 = 9
twoSum(num_array1, target1)
that is what i changed:
array[len(array)-1] -> len(array)-1 (that's what caused your IndexError)
indx_Dict: i changed it such that indx_Dict[sorted_index] = original_index
sum -> sum_: sum is a built-in. it is never a good idea to use one of those as variable name! (yes, the new name could be better)
this is the final code:
def two_sum(num_array, sum_):
'''1.twoSum
Given an array of integers, return indices of the two numbers that
add up to a specific target.
'''
array = sorted(num_array)
l = 0
r = len(array)-1
indx_Dict = {array.index(val): index for index, val in enumerate(num_array)} ##
while (l < r) :
if (array[l] + array[r]) == sum_:
return [indx_Dict[l], indx_Dict[r]]
elif array[l] + array[r] < sum_:
l += 1
else:
r -= 1
here is a discussion about this problem:
Find 2 numbers in an unsorted array equal to a given sum (which you seem to be aware of - looks like what you are trying to do). this is a python version of just that:
def two_sum(lst, total):
sorted_lst = sorted(lst)
n = len(lst)
for i, val0 in enumerate(sorted_lst):
for j in range(n-1, i, -1):
val1 = sorted_lst[j]
s = val0 + val1
if s < total:
break
if s == total:
return sorted((lst.index(val0), lst.index(val1)))
return None
this version is based on looping over the indices i and j.
now here is a version that i feel is more pythonic (but maybe a little bit harder to understand; but it does the exact same as the one above). it ignores the index j completely as it is not really needed:
from itertools import islice
def two_sum(lst, total):
n = len(lst)
sorted_lst = sorted(lst)
for i, val0 in enumerate(sorted_lst):
for val1 in islice(reversed(sorted_lst), n-i):
s = val0 + val1
if s < total:
break
if s == total:
return sorted((lst.index(val0), lst.index(val1)))
return None
aaaaand just for the fun of it: whenever there is a sorted list in play i feel the need to use the bisect module. (a very rudimentary benchmark showed that this may perform better for n > 10'000'000; n being the length of the list. so maybe not worth it for all practical purposes...)
def two_sum_binary(lst, total):
n = len(lst)
sorted_lst = sorted(lst)
for i, val0 in enumerate(sorted_lst):
# binary search in sorted_lst[i:]
j = bisect_left(sorted_lst, total-val0, lo=i)
if j >= n:
continue
val1 = sorted_lst[j]
if val0 + val1 == total:
return sorted((lst.index(val0), lst.index(val1)))
else:
continue
return None
for (a bit more) completeness: there is a dictionary based approach:
def two_sum_dict(lst, total):
dct = {val: index for index, val in enumerate(lst)}
for i, val in enumerate(lst):
try:
return sorted((i, dct[total-val]))
except KeyError:
pass
return None
i hope the code serves as its own explanation...
l and r are not your indices, but values from your array.
Say you have an array: [21,22,23,23]. l is 21, r is 23; therefore, calling array[21] is out of bounds.
Additionally, you would have a problem with your indx_Dict. You call enumerate on it, which returns [(0,21),...(3,23)]. Calling dict gives you {0:21,1:22,2:23,3:23}. There is no key equivalent to 21 or 23, which will also give you an error.
What you could try is:
def twoSum(num_array, asum):
'''1.twoSum
Given an array of integers, return indices of the two numbers that
add up to a specific target.
'''
array = sorted(num_array)
l = 0
r = len(array)-1
while (l < len(array)-1) :
while (r > l):
if (array[l] + array[r]) == asum:
return [num_array.index(array[l]),\
num_array.index(array[r])]
r -= 1
r = len(array)-1
l += 1
num_array1 = [2, 7, 11, 15,1,0]
target1 = 9
twoSum(num_array1, target1)
This way, your l and r are both indices of the sorted array. It goes through every possible combination of values from the array, and returns when it either has found the sum or gone through everything. It then returns the index of the original num_array that contains the correct values.
Also, as #hiro-protagonist said, sum is a built-in function in Python already, so it should be changed to something else (asum in my example).

Counting substrings in string

Lets assume that i have 2 strings
M = "sses"
N = "assesses"
I have to count how many times string M is present into string N
I am not allowed to use any import or methods just loops and range() if needed.
M = "sses"
N = "assesses"
counter = 0
if M in N:
counter +=1
print(counter)
This isn't good enough i need loop to go trough N and count all M present
in this case it is 2.
def count(M, N):
i = 0
count = 0
while True:
try:
i = N.index(M, i)+1
count += 1
except ValueError:
break
return count
Or a one-liner without str.index:
def count(M, N):
return sum(N[i:i+len(M)]==M for i in range(len(N)-len(M)+1))
The same without using the sum function:
def count(M, N):
count = 0
for i in range(len(N)-len(M)+1):
if N[i:i+len(M)] == M:
count += 1
return count

Resources