Why is heapq.heapify so fast? - python-3.x

I have tried to reimplement the heapify method so that I can use _siftup and _siftdown to update or delete any node in the heap while keeping a time complexity of O(log(n)).
I made some effort to optimize my code, but my versions proved to be slower than heapq.heapify (in terms of total time taken). So I decided to look at the source code and compared a copy of that code with the module's own functions.
# 'heap' is a heap at all indices >= startpos, except possibly for pos.  pos
# is the index of a leaf with a possibly out-of-order value.  Restore the
# heap invariant.
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(range(n//2)):
        _siftup(x, i)

a = list(reversed(range(1000000)))
b = a.copy()
import heapq
import time
cp1 = time.time()
heapq.heapify(a)
cp2 = time.time()
heapify(b)
cp3 = time.time()
print(a == b)
print(cp3 - cp2, cp2 - cp1)
And I found that cp3 - cp2 was always >= cp2 - cp1 and never equal: my heapify took more time than heapq.heapify even though both are the same code.
In some cases my heapify took 3 seconds while heapq.heapify took 0.1 seconds.
heapq.heapify executes faster than the identical heapify; the only difference is that one of them is imported from the module.
Please let me know the reason. I am sorry if I made a silly mistake.

The heapify from the heapq module is actually a built-in function:
>>> import heapq
>>> heapq
<module 'heapq' from 'python3.9/heapq.py'>
>>> heapq.heapify
<built-in function heapify>
help(heapq.heapify) says:
Help on built-in function heapify in module _heapq...
So it's actually importing the built-in module _heapq and thus running C code, not Python.
If you scroll further down in heapq.py, you'll see this:
# If available, use C implementation
try:
    from _heapq import *
except ImportError:
    pass
This overwrites functions like heapify with their C implementations. For instance, _heapq.heapify is implemented in C, in CPython's Modules/_heapqmodule.c.
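To see the difference yourself, here is a small sketch, assuming CPython (where the `_heapq` accelerator module is available). It checks whether `heapq.heapify` is a built-in and times it against a pure-Python bottom-up heapify. `py_heapify` and `is_heap` are illustrative helpers written for this example, not part of `heapq`, and the absolute timings will depend on your machine.

```python
import heapq
import inspect
import random
import timeit


def py_heapify(x):
    """Pure-Python bottom-up heapify (illustrative, not heapq's source)."""
    n = len(x)
    for start in reversed(range(n // 2)):
        pos, item = start, x[start]
        child = 2 * pos + 1
        # Move the smaller child up until item fits or we reach a leaf.
        while child < n:
            right = child + 1
            if right < n and x[right] < x[child]:
                child = right
            if not x[child] < item:
                break
            x[pos] = x[child]
            pos = child
            child = 2 * pos + 1
        x[pos] = item


def is_heap(h):
    """True if every parent is <= its children (min-heap invariant)."""
    return all(h[(i - 1) // 2] <= h[i] for i in range(1, len(h)))


# On CPython this prints True: heapify was replaced by the C version from _heapq.
print("heapq.heapify is built-in:", inspect.isbuiltin(heapq.heapify))

data = [random.random() for _ in range(100_000)]
# Each run heapifies a fresh copy, so both functions see identical input.
t_c = min(timeit.repeat(lambda: heapq.heapify(data[:]), repeat=3, number=1))
t_py = min(timeit.repeat(lambda: py_heapify(data[:]), repeat=3, number=1))
print(f"C heapify: {t_c:.4f}s   pure-Python heapify: {t_py:.4f}s")
```

Both functions do the same O(n) amount of work; most of the gap comes from the C version not paying the interpreter's overhead for every comparison and list access.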

Related

Speeding up this PySCIPOpt routine for a Mixed Integer Program

I'm writing code to find median orders of tournaments. Given a tournament T with n vertices, a median order is an ordering of the vertices of T that maximizes the number of edges pointing in the 'increasing' direction with respect to the ordering.
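The definition above can be checked directly on small cases, which is handy for validating solver output. This is a minimal sketch, assuming the same adjacency convention as the code in this question (`adj[u][v] == 1` means the edge u → v); `forward_edges` and `brute_force_median_order` are hypothetical helpers, and the exhaustive search tries all n! orderings, so it is only meant for very small n.

```python
from itertools import permutations


def forward_edges(adj, order):
    """Count edges u -> v where u is placed before v in `order`."""
    position = {v: idx for idx, v in enumerate(order)}
    n = len(adj)
    return sum(
        1
        for u in range(n)
        for v in range(n)
        if adj[u][v] and position[u] < position[v]
    )


def brute_force_median_order(adj):
    """Exhaustive search over all orderings; feasible only for tiny n."""
    n = len(adj)
    return max(permutations(range(n)), key=lambda order: forward_edges(adj, order))


# A directed 3-cycle 0 -> 1 -> 2 -> 0: every ordering has exactly 2 forward edges.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
best = brute_force_median_order(cycle)
print(best, forward_edges(cycle, best))  # (0, 1, 2) 2
```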
In particular, if the vertex set of T is {0,...,n-1}, the following integer problem (maximizing over the set of permutations) yields an optimal answer, where Q is also a boolean n-by-n matrix.
I've implemented a linearization of this problem, noting that Q is a permutation. The following Python code works, but my computer can't handle graphs as small as 10 vertices, where I would expect fast answers. Is there any relatively easy way to speed up this computation?
import math
import time

import numpy as np
from pyscipopt import Model, quicksum
from networkx.algorithms.tournament import random_tournament as rt

# Some utilities to define the adjacency matrix of an oriented graph
def adjacency_matrix(t, order):
    n = len(order)
    adj_t = np.zeros((n, n))
    for e in t.edges:
        adj_t[order.index(e[0]), order.index(e[1])] = 1
    return adj_t

# Random tournaments to instantiate the problem
def random_tournament(n):
    r_t = rt(n)
    adj_t = adjacency_matrix(r_t, list(range(n)))
    return r_t, adj_t

###############################################################
############# PySCIPOpt optimization Routine ##################
###############################################################
n = 5  # some arbitrary size parameter
t, adj_t = random_tournament(n)
model = Model()
p, w, r = {}, {}, {}
# Defining model variables and weights
for k in range(n):
    for l in range(n):
        p[k, l] = model.addVar(vtype='B')
        for i in range(n):
            for j in range(i, n):
                r[i, k, l, j] = model.addVar(vtype='C')
                w[i, k, l, j] = adj_t[k][l]
for i in range(n):
    # Forcing p to be a permutation
    model.addCons(quicksum(p[s, i] for s in range(n)) == 1)
    model.addCons(quicksum(p[i, s] for s in range(n)) == 1)
    for k in range(n):
        for j in range(i, n):
            for l in range(n):
                # Setting r[i,k,l,j] = min(p[k,i], p[l,j])
                model.addCons(r[i, k, l, j] <= p[k, i])
                model.addCons(r[i, k, l, j] <= p[l, j])
# Set the objective function
model.setObjective(quicksum(r[i, k, l, j]*w[i, k, l, j] for i in range(n) for j in range(i, n) for k in range(n) for l in range(n)), "maximize")
model.data = p, r
# Time only the solve
end_setup = time.time()
model.optimize()
end_optimization = time.time()
sol = model.getBestSol()
# Print the solution in a readable format
Q = np.array([math.floor(model.getVal(model.data[0][key])) for key in model.data[0].keys()]).reshape([n, n])
print(f'\nOptimization ended with status {model.getStatus()} in {end_optimization - end_setup:.2f}s, with {model.getObjVal()} increasing edges and optimal solution:')
print('\n', Q)
order = [int(x) for x in list(np.matmul(Q.T, np.array(range(n))))]
new_adj_t = adjacency_matrix(t, order)
print(f'\nwhich induces the ordering:\n\n {order}')
print(f'\nand induces the following adjacency matrix: \n\n {new_adj_t}')
Right now I've run it for n=5, taking between 5 and 20 seconds, and have run it successfully for the small values 6 and 7 without much change in the time needed.
For n=10, on the other hand, it has been running for around an hour with no solution yet. I suppose having O(n**4) variables in the linearization hurts, but I don't understand why it blows up so fast. Is this normal? What would a better implementation look like, if there is one?
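For reference, this is the model the code above builds, read off directly from its loops. Here A is `adj_t`, and p_{ki} = 1 means vertex k is placed at position i (that is how Q and `order` are read back from the solution). PySCIPOpt's `addVar` uses a lower bound of 0 by default, so every r is nonnegative; because the objective is maximized and each weight is 0 or 1, every r with A_{kl} = 1 is pushed up to min(p_{ki}, p_{lj}).

```latex
\begin{aligned}
\max \quad & \sum_{i=0}^{n-1} \sum_{j=i}^{n-1} \sum_{k=0}^{n-1} \sum_{l=0}^{n-1} A_{kl}\, r_{iklj} \\
\text{s.t.} \quad & \sum_{s=0}^{n-1} p_{si} = 1, \qquad \sum_{s=0}^{n-1} p_{is} = 1, && i = 0, \dots, n-1, \\
& r_{iklj} \le p_{ki}, \qquad r_{iklj} \le p_{lj}, && 0 \le i \le j \le n-1,\; 0 \le k, l \le n-1, \\
& p_{kl} \in \{0, 1\}, \qquad r_{iklj} \ge 0 .
\end{aligned}
```

Counting the loops gives n² binary variables, n² · n(n+1)/2 = n³(n+1)/2 continuous r variables and 2n + n³(n+1) constraints: 375 continuous variables and 760 constraints for n = 5, versus 5500 continuous variables and 11020 constraints for n = 10.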

Limit of Python recursion functions (Process finished with exit code 139)

I had an old script that, from a pandas DataFrame, calculates new columns from other columns, but also from the previous result of the column being calculated.
This script used for loops, and it was quite slow.
For this reason, I replaced the for loops with recursive functions.
The new script is around 100 times faster than the old one, which is good news. But I am now encountering a limit that I did not have before. As soon as I have more than 29952 rows in my dataset, I get the following error:
"Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)"
I made this little script with lists that reflects my problem.
If I increase the size of the lists (list_lenght) to more than 29952, the script crashes (on my computer).
import random
import sys

def list_generator(min_value, max_value, list_lenght):
    return [random.randrange(min_value, max_value) for i in range(list_lenght)]

def recursive_function(list_1, list_2, n, result):
    if n == len(list_1):
        return result
    elif list_1[n] <= list_2[n]:
        result.append(1 + result[n - 1])
    else:
        result.append(0)
    return recursive_function(list_1, list_2, (n + 1), result)

list_lenght = 29952  # How to increase this limit without generating an error?
min_value = 10
max_value = 20
list_one = list_generator(min_value, max_value, list_lenght)
list_two = list_generator(min_value, max_value, list_lenght)
# Set recursion limit
sys.setrecursionlimit(list_lenght * 2)
# Compute a new list from list_one and list_two
list_result = recursive_function(list_one, list_two, 1, [0])
I suspect a memory problem, but how do you take advantage of the full power of Python's recursive functions while avoiding this limit as much as possible?
Thanks in advance
Following the comment from @trincot, here is the version of the code without a recursive function, which is ultimately faster than the recursive version above! And it has no more limits:
def no_recursive_function(list_1, list_2, n, result):
    if list_1[n] <= list_2[n]:
        return 1 + result[n - 1]
    else:
        return 0

list_lenght = 29952
min_value = 10
max_value = 20
list_one = list_generator(min_value, max_value, list_lenght)
list_two = list_generator(min_value, max_value, list_lenght)
# Set recursion limit
sys.setrecursionlimit(list_lenght * 2)
list_result_2 = [0]
for n in range(list_lenght - 1):
    result = no_recursive_function(list_one, list_two, n + 1, list_result_2)
    list_result_2.append(result)
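Some background on the crash. CPython does not eliminate tail calls, so every call to recursive_function stays on the stack until the whole chain returns. `sys.setrecursionlimit` only raises Python's own safety check; it does not enlarge the thread's C stack. On CPython versions before 3.11, each Python-level call also consumed C stack, so with a high enough limit the process could overflow that stack and be killed with SIGSEGV (exit code 139) instead of raising RecursionError. A loop keeps memory use flat. The sketch below is a hypothetical rewrite (`streaks` and `streaks_accumulate` are illustrative names) that produces the same values as the question's functions: element 0 is always 0, and each later element is one more than the previous one when list_1[n] <= list_2[n], otherwise 0.

```python
import random
from itertools import accumulate


def streaks(list_1, list_2):
    """Iterative version of the question's recursion: no call depth at all."""
    result = [0]  # element 0 is always 0, as in the original
    for a, b in zip(list_1[1:], list_2[1:]):
        # Extend the current run when a <= b, otherwise reset it to 0.
        result.append(result[-1] + 1 if a <= b else 0)
    return result


def streaks_accumulate(list_1, list_2):
    """Same result via itertools.accumulate (initial= needs Python 3.8+)."""
    flags = (a <= b for a, b in zip(list_1[1:], list_2[1:]))
    return list(accumulate(flags, lambda run, ok: run + 1 if ok else 0, initial=0))


size = 100_000  # well beyond the 29952 rows that crashed the recursive version
list_one = [random.randrange(10, 20) for _ in range(size)]
list_two = [random.randrange(10, 20) for _ in range(size)]
result = streaks(list_one, list_two)
print(len(result), result == streaks_accumulate(list_one, list_two))
```

The explicit loop is the easiest to adapt back to the original DataFrame logic; the `accumulate` form just expresses the same running state as a fold.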

Is the Benchmarking of my Algorithms right?

I wrote Quicksort and Mergesort and a benchmark for them to see how fast they are.
Here is my code:
#------------------------------Creating a Random List-----------------------------#
def create_random_list(length):
    import random
    random_list = list(range(0, length))
    random.shuffle(random_list)
    return random_list

# Initialize default list length to 0
random_list = create_random_list(0)
# Testing random list function
print("\n" + "That is a randomized list: " + "\n")
print(random_list)
print("\n")
#-------------------------------------Quicksort-----------------------------------#
"""
Recursive Divide and Conquer Algorithm
+ Very efficient for large data set
- Performance Depends largely on Pivot Selection
Time Complexity
--> Worst-Case -----> O (n^2)
--> Best-Case -----> Ω (n log (n))
--> Average Case ---> O (n log (n))
Space Complexity
--> O(log(n))
"""
# Writing the Quick Sort Algorithm for sorting the list - Recursive Method
def qsort(random_list):
    less = []
    equal = []
    greater = []
    if len(random_list) > 1:
        # Initialize starting Point
        pivot = random_list[0]
        for x in random_list:
            if x < pivot:
                less.append(x)
            elif x == pivot:
                equal.append(x)
            elif x > pivot:
                greater.append(x)
        return qsort(less) + equal + qsort(greater)
    else:
        return random_list
"""
Build in Python Quick Sort:
def qsort(L):
    if len(L) <= 1: return L
    return qsort([lt for lt in L[1:] if lt < L[0]]) + L[0:1] + \
           qsort([ge for ge in L[1:] if ge >= L[0]])
"""
# Calling Quicksort
sorted_list_qsort = qsort(random_list)
# Testing Quicksort
print("That is a sorted list with Quicksort: " + "\n")
print(sorted_list_qsort)
print("\n")
#-------------------------------------FINISHED-------------------------------------#
#-------------------------------------Mergesort------------------------------------#
"""
Recursive Divide and Conquer Algorithm
+
-
Time Complexity
--> Worst-Case -----> O (n log(n))
--> Best-Case -----> Ω (n log(n))
--> Average Case ---> O (n log(n))
Space Complexity
--> O (n)
"""
# Create a merge algorithm
def merge(a, b):  # Let a and b be two arrays
    c = []  # Final sorted output array
    a_idx, b_idx = 0, 0  # Index or start from a and b array
    while a_idx < len(a) and b_idx < len(b):
        if a[a_idx] < b[b_idx]:
            c.append(a[a_idx])
            a_idx += 1
        else:
            c.append(b[b_idx])
            b_idx += 1
    if a_idx == len(a): c.extend(b[b_idx:])
    else: c.extend(a[a_idx:])
    return c

# Create final Mergesort algorithm
def merge_sort(a):
    # A list of zero or one elements is sorted by definition
    if len(a) <= 1:
        return a
    # Split the list in half and call Mergesort recursively on each half
    left, right = merge_sort(a[:int(len(a)/2)]), merge_sort(a[int(len(a)/2):])
    # Merge the now-sorted sublists with the merge function which sorts two lists
    return merge(left, right)

# Calling Mergesort
sorted_list_mgsort = merge_sort(random_list)
# Testing Mergesort
print("That is a sorted list with Mergesort: " + "\n")
print(sorted_list_mgsort)
print("\n")
#-------------------------------------FINISHED-------------------------------------#
#------------------------------Algorithm Benchmarking------------------------------#
# Creating an array for iterations
n = [100, 1000, 10000, 100000]
# Creating a dictionary for times of algorithms
times = {"Quicksort": [], "Mergesort": []}
# Import time for analyzing the running time of the algorithms
from time import time
# Create a for loop which loops through the arrays of length n and analyses their times
for size in range(len(n)):
    random_list = create_random_list(n[size])
    t0 = time()
    qsort(random_list)
    t1 = time()
    times["Quicksort"].append(t1-t0)
    random_list = create_random_list(n[size-1])
    t0 = time()
    merge_sort(random_list)
    t1 = time()
    times["Mergesort"].append(t1-t0)
# Create a table which shows the benchmarking of the algorithms
print("n\tMerge\tQuick")
print("_"*25)
for i, j in enumerate(n):
    print("{}\t{:.5f}\t{:.5f}\t".format(j, times["Mergesort"][i], times["Quicksort"][i]))
#----------------------------------End of Benchmarking---------------------------------#
The code is well documented and runs perfectly with Python 3.8. You may copy it into a code editor for better readability.
--> My Question as the title states:
Is my benchmarking right? I'm doubting it a little bit, because the running times of my algorithms seem a little odd. Can someone confirm my runtimes?
--> Here is the output of this code:
That is a randomized list:
[]
That is a sorted list with Quicksort:
[]
That is a sorted list with Mergesort:
[]
n Merge Quick
_________________________
100 0.98026 0.00021
1000 0.00042 0.00262
10000 0.00555 0.03164
100000 0.07919 0.44718
--> If someone has another/better code snippet on how to print the table - feel free to share it with me.
The error is in n[size-1]: when size is 0 (the first iteration), this translates to n[-1], which corresponds to your largest size. So in the first iteration you are comparing qsort(100) with merge_sort(100000), which obviously will favour the first a lot. It doesn't help that you call this variable size, as it really isn't the size, but the index in the n list, which contains the sizes.
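A quick way to see which sizes the original loop actually pairs up, using the same n list as the question:

```python
n = [100, 1000, 10000, 100000]
# (size given to qsort, size given to merge_sort) in each iteration of the original loop
pairs = [(n[size], n[size - 1]) for size in range(len(n))]
print(pairs)  # [(100, 100000), (1000, 100), (10000, 1000), (100000, 10000)]
```

This matches the odd first row of the table: the "Merge" time printed for n = 100 was really measured on 100000 elements.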
So remove the -1, or even better: iterate directly over n. And I would also make sure both sorting algorithms get to sort the same list:
for size in n:
    random_list1 = create_random_list(size)
    random_list2 = random_list1[:]
    t0 = time()
    qsort(random_list1)
    t1 = time()
    times["Quicksort"].append(t1-t0)
    t0 = time()
    merge_sort(random_list2)
    t1 = time()
    times["Mergesort"].append(t1-t0)
Finally, consider using timeit, which is designed for measuring performance.
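A minimal sketch of that, assuming the goal is a best-of-several timing for each algorithm on identical input. The two sort functions here are compact stand-ins written for this example (the same three-way quicksort and top-down mergesort ideas as in the question), and `bench` is a hypothetical helper; swap in your own `qsort` and `merge_sort` to time those instead. It also prints the table with f-strings.

```python
import random
import timeit


def qsort(items):
    """Three-way quicksort returning a new list (first element as pivot)."""
    if len(items) <= 1:
        return items
    pivot = items[0]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return qsort(less) + equal + qsort(greater)


def merge_sort(items):
    """Top-down mergesort returning a new list."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def bench(func, data, repeat=3):
    """Best of `repeat` runs; each run sorts a fresh copy of the same input."""
    return min(timeit.repeat(lambda: func(data[:]), repeat=repeat, number=1))


print("n\tMerge\tQuick")
for size in [100, 1000, 10000, 100000]:
    data = random.sample(range(size), size)  # shuffled 0..size-1, shared by both sorts
    print(f"{size}\t{bench(merge_sort, data):.5f}\t{bench(qsort, data):.5f}")
```

Taking the minimum of the repeats follows the advice in the timeit documentation: the lowest value is the best estimate of the code's own cost, while higher values are usually caused by other processes interfering with the measurement.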

How to implement quick sort on python?

Maybe I'm yet another person asking how to implement quicksort in Python correctly. But it's important for me to know whether I wrote this algorithm correctly from the pseudocode in the textbook Essential Algorithms: A Practical Approach to Computer Algorithms.
When I run the code, I get this message: RecursionError: maximum recursion depth exceeded in comparison
import random

def quickSort(arr, start, end):
    if start >= end:  # if len(arr) < 2
        return arr
    else:
        divideIndex = partition(arr, start, end)
        quickSort(arr, start, divideIndex - 1)
        quickSort(arr, divideIndex, end)

def partition(arr, head, tail):
    left = head
    right = tail
    pivot = arr[(head + tail) // 2]  # mid
    while right >= left:
        # looking through the array from the left
        while arr[left] <= pivot:
            left = left + 1
        # looking through the array from the right
        while arr[right] > pivot:
            right = right - 1
        # found a couple of elements that can be exchanged.
        if left <= right:
            swap(arr[right], arr[left])
            # move the left and right wall
            left = left + 1
            right = right - 1
    # return one elements if not found a couple
    return left

def swap(arr1, arr2):
    temp = arr1
    arr1 = arr2
    arr2 = temp

# Generator random variables
deck = list(range(50))
random.shuffle(deck)
start = 0
end = len(deck) - 1
print(quickSort(deck, start, end))
Try this:
def partition(arr, low, high):
    i = low - 1
    pivot = arr[high]
    for j in range(low, high):
        if arr[j] <= pivot:
            i = i + 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return i + 1

def quickSort(arr, low, high):
    if low < high:
        pi = partition(arr, low, high)
        quickSort(arr, low, pi-1)
        quickSort(arr, pi+1, high)
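Two notes on the original code, with a hedged sketch. First, `swap(arr[right], arr[left])` passes the two element values, not the list slots, so reassigning `arr1` and `arr2` inside `swap` only rebinds local names and the list never changes; the usual idiom is tuple assignment on the list itself, `arr[i], arr[j] = arr[j], arr[i]`. Second, the original `quickSort` only returns `arr` in its base case, so `print(quickSort(deck, start, end))` prints `None`; the answer's version returns nothing at all, so call it and then `print(deck)`. The sketch below keeps the answer's Lomuto partition and adds a wrapper that also returns the list; `quick_sort`, `swap_in_list` and `swap_broken` are names made up for this example.

```python
import random


def swap_broken(a, b):
    # Rebinds the local names a and b only; the caller's list is untouched.
    a, b = b, a


def swap_in_list(arr, i, j):
    # Swaps the elements at positions i and j of arr in place.
    arr[i], arr[j] = arr[j], arr[i]


def partition(arr, low, high):
    # Lomuto partition, as in the answer: the pivot is arr[high].
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            swap_in_list(arr, i, j)
    swap_in_list(arr, i + 1, high)
    return i + 1


def quick_sort(arr, low=0, high=None):
    # Sorts arr in place and also returns it, so print(quick_sort(deck)) works.
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)
        quick_sort(arr, low, p - 1)
        quick_sort(arr, p + 1, high)
    return arr


demo = [1, 2]
swap_broken(demo[0], demo[1])
print(demo)  # [1, 2]: nothing was swapped

deck = list(range(50))
random.shuffle(deck)
print(quick_sort(deck) == list(range(50)))  # True
```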

The code is running but there is not output showing

The code is being executed, but no output is shown and the variables are not created.
import numpy as np

def magicsquares():
    n = input('enter the order of squares')
    n = int(n)
    m = np.zeros((n, n))
    s = n*(n**2+1)/2  # sum of each row or diagonal
    p = int(n/2)
    q = (n-1)
    for i in range(n**2):
        m[p][q] = 1  # assigning position of 1
        P = p-1
        Q = q+1
        if i >= 2:  # assigning remaining positions
            if P == -1:
                P = n-1
            if Q == n:
                Q = 0
There is no output because you only define the function without ever calling it, and there is no print or return inside the function. Here is a version you can use to see the output and keep working on:
import numpy as np

def magicsquares():
    n = input('enter the order of squares')
    n = int(n)
    m = np.zeros((n, n))
    s = n*(n**2+1)/2  # sum of each row or diagonal
    p = int(n/2)
    q = (n-1)
    for i in range(n**2):
        m[p][q] = 1  # assigning position of 1
        P = p-1
        Q = q+1
        if i >= 2:  # assigning remaining positions
            if P == -1:
                P = n-1
            if Q == n:
                Q = 0
    print(m)

magicsquares()
It is not a complete solution for building a magic square; it's just an updated version of your code so that you can see the output and keep working on it.
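The updated code still writes the value 1 into the same cell on every iteration, because p and q are never updated. The question's start cell (row n//2, last column) and its step (P = p-1, Q = q+1) look like a variant of the Siamese (de la Loubère) method for odd orders. Assuming that is the intent, a complete sketch could look like this: place 1, 2, ..., n² one after another, moving up and to the right with wrap-around, and when that cell is already taken, move one column to the left instead. It uses plain lists rather than NumPy; `odd_magic_square` and `is_magic` are hypothetical names, and even orders need different constructions, so they are rejected.

```python
def odd_magic_square(n):
    """Siamese-method variant for odd n, starting at (n // 2, n - 1)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("this construction only works for odd n")
    m = [[0] * n for _ in range(n)]
    p, q = n // 2, n - 1  # same start cell as the question
    for value in range(1, n * n + 1):
        m[p][q] = value
        P, Q = (p - 1) % n, (q + 1) % n  # up and to the right, wrapping around
        if m[P][Q]:  # cell already taken: step one column to the left instead
            P, Q = p, (q - 1) % n
        p, q = P, Q
    return m


def is_magic(m):
    """Check rows, columns and both diagonals against n(n^2 + 1)/2."""
    n = len(m)
    s = n * (n * n + 1) // 2
    rows = all(sum(row) == s for row in m)
    cols = all(sum(m[i][j] for i in range(n)) == s for j in range(n))
    diag = sum(m[i][i] for i in range(n)) == s
    anti = sum(m[i][n - 1 - i] for i in range(n)) == s
    return rows and cols and diag and anti


for row in odd_magic_square(3):
    print(row)
# [2, 7, 6]
# [9, 5, 1]
# [4, 3, 8]
```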
