why is siftdown working in heapsort, but not siftup? - python-3.x

I have a programming assignment as follows:
You will need to convert the array into a heap using only O(n) swaps, as was described in the lectures. Note that you will need to use a min-heap instead of a max-heap in this problem. The first line of the output should contain single integer m — the total number of swaps. m must satisfy conditions 0 ≤ m ≤ 4n. The next m lines should contain the swap operations used to convert the array a into a heap. Each swap is described by a pair of integers i,j — the 0-based indices of the elements to be swapped
I have implemented a solution using sifting up technique by comparing with parent's value which gave solutions to small text cases, when number of integers in array is less than 10,verified by manual checking, but it could not pass the test case with 100000 integers as input.
this is the code for that
class HeapBuilder:
def __init__(self):
self._swaps = [] #array of tuples or arrays
self._data = []
def ReadData(self):
n = int(input())
self._data = [int(s) for s in input().split()]
assert n == len(self._data)
def WriteResponse(self):
print(len(self._swaps))
for swap in self._swaps:
print(swap[0], swap[1])
def swapup(self,i):
if i !=0:
if self._data[int((i-1)/2)]> self._data[i]:
self._swaps.append(((int((i-1)/2)),i))
self._data[int((i-1)/2)], self._data[i] = self._data[i],self._data[int((i-1)/2)]
self.swapup(int((i-1)/2))
def GenerateSwaps(self):
for i in range(len(self._data)-1,0,-1):
self.swapup(i)
def Solve(self):
self.ReadData()
self.GenerateSwaps()
self.WriteResponse()
if __name__ == '__main__':
heap_builder = HeapBuilder()
heap_builder.Solve()
on the other hand i have implemented a heap sort using sifting down technique with similar comparing process, and this thing has passed every test case.
following is the code for this method
class HeapBuilder:
def __init__(self):
self._swaps = [] #array of tuples or arrays
self._data = []
def ReadData(self):
n = int(input())
self._data = [int(s) for s in input().split()]
assert n == len(self._data)
def WriteResponse(self):
print(len(self._swaps))
for swap in self._swaps:
print(swap[0], swap[1])
def swapdown(self,i):
n = len(self._data)
min_index = i
l = 2*i+1 if (2*i+1<n) else -1
r = 2*i+2 if (2*i+2<n) else -1
if l != -1 and self._data[l] < self._data[min_index]:
min_index = l
if r != - 1 and self._data[r] < self._data[min_index]:
min_index = r
if i != min_index:
self._swaps.append((i, min_index))
self._data[i], self._data[min_index] = \
self._data[min_index], self._data[i]
self.swapdown(min_index)
def GenerateSwaps(self):
for i in range(len(self._data)//2 ,-1,-1):
self.swapdown(i)
def Solve(self):
self.ReadData()
self.GenerateSwaps()
self.WriteResponse()
if __name__ == '__main__':
heap_builder = HeapBuilder()
heap_builder.Solve()
can someone explain what is wrong with sift/swap up method?

Trying to build a heap by "swapping up" from the bottom won't always work. The resulting array will not necessarily be a valid heap. For example, consider this array: [3,6,2,4,5,7,1]. Viewed as tree that is:
3
4 2
6 5 7 1
Your algorithm starts at the last item and swaps up towards the root. So you swap 1 with 2, and then you swap 1 with 3. That gives you:
1
4 3
6 5 7 2
You then continue with the rest of the items, none of which have to be moved.
The result is an invalid heap: that last item, 2, should be the parent of 3.
The key here is that the swapping up method assumes that when you've processed a[i], then the item that ends up in that position is in its final place. Contrast that to the swap down method that allows repeated adjustment of items that are lower in the heap.

Related

Why is my python function not working properly when I call it recursively?

I'm doing a question from a previous Waterloo ccc competition (https://cemc.uwaterloo.ca/contests/computing/2020/ccc/juniorEF.pdf problem J5)
and my code isn't working the way I expected
Here's the sample input I'm using:
3
4
3 10 8 14
1 11 12 12
6 2 3 9
Here's my code so far
y_size = int(input())
x_size = int(input())
mat = []
"ok".split()
for i in range(y_size):
row = input().split()
mat.append(row)
pos_list = [[0, 0]]
current_num = int(mat[0][0])
a = 0
def canEscape():
global a
global mat
global pos_list
global current_num
end = y_size * x_size
if y_size -1 * x_size -1 == current_num:
return True
for i in range(y_size):
print("______")
for j in range(x_size):
v = (i + 1) * (j + 1)
print(v)
print(current_num)
if v == current_num:
print("ok")
if v == end:
print("ok")
a += 1
current_num = mat[i][j]
pos_list.append([i, j])
canEscape()
pos_list.pop(-1)
a -= 1
current_num = mat[pos_list[a][0]][pos_list[a][1]]
canEscape()
The problem I'm having is that I expect if v == current_num: to be true when I call it again. Both current_num and v are equal to 8 but the code seems to carry on with the for-in loop and break, without entering the if statement. I've made the output print v followed by current_num for every iteration of the for loop to try and figure out the problem but it seems that both variables == 8 so I really don't know what I did wrong. Did I make a silly mistake or did I structure my whole program wrong?
I'm having trouble following what your program is doing at all. This problem involves integer factoring, and I do not see where you're factoring integers. You definitely are not understanding that aspect of the problem.
When you calculate what cells you can go to you look at the value of your current cell. Lets say it is 6. 6 has the factors 1, 2, 3, and 6 because all of those numbers can be multiplied by another number to equal 6. So, you can go to the cells (1, 6), (6, 1), (2, 3), and (3, 2), because those are the pairs of numbers that can be multiplied together to equal 6.
Also, you never convert the lines of input into integers. When you append to the matrix, you are appending a list of strings that happen to be numbers. You must convert those into integers.
Anyways, this program will solve the problem. I copy and pasted the factoring algorithm from other threads:
n_rows = int(input())
n_cols = int(input())
mat = []
for i in range(n_rows):
mat.append(list(map(lambda x: int(x), input().split()))) # Convert input strings to integers.
def reduce(f, l):
# This is just needed for the factoring function
# It's not relevant to the problem
r = None
for e in l:
if r is None:
r = e
else:
r = f(r, e)
return r
def factors(n):
# An efficient function for calculating factors.
return set(reduce(list.__add__,
([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))
def get_pairs(items):
for i in range(len(items) // 2):
yield (items[i],items[len(items) - 1 - i]) # use yield to save memory
if(len(items) % 2 != 0): # This is for square numbers.
n = items[len(items) // 2]
yield (n,n)
checked_numbers = set()
def isPath(r=1, c=1):
# check if the testing row or column is to large.
if r > n_rows or c > n_cols:
return False
y = r - 1
x = c - 1
n = mat[y][x]
# If we've already checked a number with a certain value we dont need to check it again.
if n in checked_numbers:
return False
checked_numbers.add(n)
# Check if we've reached the exit.
if(r == n_rows and c == n_cols):
return True
# Calculate the factors of the number, and then find all valid pairs with those factors.
pairs = get_pairs(sorted(list(factors(n))))
# Remember to check each pair with both combinations of every pair of factors.
# If any of the pairs lead to the exit then we return true.
return any([isPath(pair[0], pair[1]) or isPath(pair[1], pair[0]) for pair in pairs])
if isPath():
print("yes");
else:
print("no");
This works and it is fast. However, it if you are limited on memory and/or have a large data input size your program could easily run out of memory. I think it is likely that this will happen with some of the testing inputs but I'm not sure.. It is surely possible to write this program in a way that would use a fraction of the memory, perhaps by converting the factors function to a function that uses iterators, as well as converting the get_pairs function to somehow iterate as well.
I would imagine that this solution solves most of the testing inputs they have but will not solve the ones towards the end, because they will be very large and it will run out of memory.

Dynamic Programming Primitive calculator code optimization

I am currently doing coursera course on algorithms. I have successfully completed this assignment. All test cases passed. My code looks messy and I want to know if there is any thing availiable in Python which can help run my code faster. Thanks
The problem statement is as follows: You are given a primitive calculator that can perform the following three operations with
the current number 𝑥: multiply 𝑥 by 2, multiply 𝑥 by 3, or add 1 to 𝑥. Your goal is given a
positive integer 𝑛, find the minimum number of operations needed to obtain the number 𝑛
starting from the number 1.
# Uses python3
import sys
def optimal_sequence(m):
a=[0,0]
for i in range(2,m+1):
if i%3==0 and i%2==0:
a.append(min(a[i//2],a[i//3],a[i-1])+1)
elif i%3==0:
a.append(min(a[i//3],a[i-1])+1)
elif i%2==0:
a.append(min(a[i//2],a[i-1])+1)
else:
a.append((a[i-1])+1)
return backtrack(a,m)
def backtrack(a,m):
result=[]
result.append(m)
current = m
for i in range(a[-1],0,-1):
if current%3==0 and a[current//3]==(i-1):
current=current//3
result.append(current)
elif current%2==0 and a[current//2]==(i-1):
current = current//2
result.append(current)
elif a[current-1]==(i-1):
current = current-1
result.append(current)
return result
n = int(input())
if n == 1:
print(0)
print(1)
sys.exit(0)
a= (optimal_sequence(n))
print(len(a)-1)
for x in reversed(a):
print(x,end=" ")
I would use a breadth first search for number 1 starting from number n. Keep track of the numbers that were visited, so that the search backtracks on already visited numbers. For visited numbers remember which is the number you "came from" to reach it, i.e. the next number in the shortest path to n.
In my tests this code runs faster than yours:
from collections import deque
def findOperations(n):
# Perform a BFS
successor = {} # map number to next number in shortest path
queue = deque() # queue with number pairs (curr, next)
queue.append((n,None)) # start at n
while True:
curr, succ = queue.popleft()
if not curr in successor:
successor[curr] = succ
if curr == 1:
break
if curr%3 == 0: queue.append((curr//3, curr))
if curr%2 == 0: queue.append((curr//2, curr))
queue.append((curr-1, curr))
# Create list from successor chain
result = []
i = 1
while i:
result.append(i)
i = successor[i]
return result
Call this function with argument n:
findOperations(n)
It returns a list.

Print permutation tree python3

I have list of numbers and I need to create tree and print all permutations, e.g. for [1, 2, 3] it should print
123
132
213
231
312
321
Currently my code prints first element only once:
123
32
213
31
312
21
How do I fix it?
class Node(object):
def __init__(self):
self.children = None
self.data = None
self.depth = None
class PermutationTree(object):
def __init__(self, numbers):
numbers = list(set(numbers))
numbers.sort()
self.depth = len(numbers)
self.tree = self.generate_root(numbers)
self.create_tree(numbers, self.tree.children, 1)
def generate_root(self, numbers):
root = Node()
root.children = []
root.depth = 0
for i in numbers:
tree = Node()
tree.data = i
tree.depth = 1
root.children.append(tree)
return root
def create_tree(self, numbers, children, depth):
depth += 1
if depth == self.depth + 1:
return
for child in children:
reduced_numbers = list(numbers)
reduced_numbers.remove(child.data)
child.children = []
for j in reduced_numbers:
tree = Node()
tree.data = j
tree.depth = depth
child.children.append(tree)
self.create_tree(reduced_numbers, child.children, depth)
def print_tree(self):
self.print(self.tree)
def print(self, node):
for i in node.children:
print(i.data, end="")
if not i.depth == self.depth:
self.print(i)
print("")
test = [1, 2, 3]
permutation_tree = PermutationTree(test)
permutation_tree.print_tree()
This is pretty good example showing why side effects should be done in as few places as possible.
Side effects everywhere
Your approach is like this: "I will print one digit every time it is needed." This approach gives you limited control over side effects. Your code prints many times in different recursion depths and it's hard to keep track of what is going on.
Side effects in as few places as possible
Another approach to this problem is creating side effects in one place. "I will generate all permutations first and then print them." Such a code is easier to debug and can give you better control of what is going on in your code.
If you write your code like this:
def print_tree(self): # this function takes care about printing
for path in self.generate_path(self.tree):
print(path)
def generate_perm(self, node): #this function takes care about generating permutations
if not node.children:
return [str(node.data)]
app_str = str(node.data) if node.data else ""
res= []
for child in node.children:
for semi_path in self.generate_perm(child):
res.append (app_str + semi_path)
return res
P.S.
generate_perm can be rewritten to be more efficient with yields.

Quick sort counting

Python questions again.
I want to count the number of comparison operations performed by quick sort. Because I use a recursive function, I do not think that assigning count = 0 to the beginning of the function body is inappropriate, so I made it as follows.
def QuickSort(lst, count = 0):
if len(lst) > 1:
pivot_idx = len(lst) // 2
smaller_nums, larger_nums = [], []
for idx, num in enumerate(lst):
if idx != pivot_idx:
if num < lst[pivot_idx]:
smaller_nums.append(num)
else:
larger_nums.append(num)
count = QuickSort(smaller_nums, count + 1)[1]
count = QuickSort(larger_nums, count + 1)[1]
lst[:] = smaller_nums + [lst[pivot_idx]] + larger_nums
return lst, count
However, after counting, I confirmed the count which is much lower than my expectation. According to big o, the quick sort would have to show the calculation of n * log (n), but it showed a much lower count. For example, when sorting a list with 1000 random elements, we expected to see a count of 1000 * log (1000) = 6907, but actually only 1164 counts. I am wondering if I am misusing the count in the function or misunderstanding it.
Thank you.
Your post is mistaken on several points:
Big-O is allows arbitrary constant factors and also ignoring the values for "small" values of n, where "small" can be arbitrarily large for any given analysis. So your computations are meaningless.
Your counts are wrong. There's one comparison per loop iteration. You're counting something else.
This is a strange way to code the count. Just use a global variable.
Try this. Note really you're using twice as many comparisons as this reports. The check that the loop index isn't the pivot could be eliminated with a smarter implementation.
c = 0
def QuickSort(lst):
if len(lst) <= 1:
return lst
pivot_idx = len(lst) // 2
smaller, larger = [], []
for idx, num in enumerate(lst):
if idx != pivot_idx:
global c
c += 1
(larger, smaller)[num < lst[pivot_idx]].append(num)
return QuickSort(smaller) + [lst[pivot_idx]] + QuickSort(larger)
def Run(n):
lst = [random.randint(0,1000) for r in xrange(n)]
QuickSort(lst)
print c
Run(1000)
If you're aghast at the prospect of using a global variable, then you can just wrap the sort in a class:
import random
class QuickSort:
def __init__(self):
self.comparisons = 0
def sort(self, lst):
if len(lst) <= 1:
return lst
pivot_idx = len(lst) // 2
smaller, larger = [], []
for idx, num in enumerate(lst):
if idx != pivot_idx:
self.comparisons += 1
(larger, smaller)[num < lst[pivot_idx]].append(num)
return self.sort(smaller) + [lst[pivot_idx]] + self.sort(larger)
def Run(n):
lst = [random.randint(0,1000) for r in xrange(n)]
quicksort = QuickSort()
print quicksort.sort(lst)
print quicksort.comparisons
Run(100)
Building on the answer provided by Gene by adding print statements and a sort "error" range, his example was very helpful to my understanding of quicksort and an error term on the big O impact of operations performance comparison.
class QuickSort:
def __init__(self):
self.comparisons = 0
def sort(self, lst):
k_err = 0 # k << n, the value the sort array can be in error
if len(lst) <= 1:
return lst
pivot_idx = len(lst) // 2
smaller, larger = [], []
for idx, num in enumerate(lst):
if idx != (pivot_idx) :
self.comparisons += 1
try:
(larger, smaller)[(num - k_err) < lst[pivot_idx]].append(num)
except:
(larger, smaller)[(num + k_err) < lst[pivot_idx]].append(num)
print(pivot_idx,"larger", self.comparisons, larger)
print(pivot_idx, "smaller", self.comparisons, smaller, )
return self.sort(smaller) + [lst[pivot_idx]] + self.sort(larger)
def Run(n):
random.seed(100)
lst = [random.randint(0,round(100,0)) for r in range(n)]
quicksort = QuickSort()
print(len(lst), lst)
print(quicksort.sort(lst))
print(quicksort.comparisons, quicksort.comparisons/n, ((quicksort.comparisons/n)/math.log(n,10)), math.log(n,10) )

Tree traversals python

I have to define three functions: preorder(t):, postorder(t):, and inorder(t):.
Each function will take a binary tree as input and return a list. The list should then be ordered in same way the tree elements would be visited in the respective traversal (post-order, pre-order, or in-order)
I have written a code for each of them, but I keep getting an error when I call another function (flat_list()), I get an index error thrown by
if not x or len(x) < 1 or n > len(x) or x[n] == None:
IndexError: list index out of range
The code for my traversal methods is as follows:
def postorder(t):
pass
if t != None:
postorder(t.get_left())
postorder(t.get_right())
print(t.get_data())
def pre_order(t):
if t != None:
print(t.get_data())
pre_order(t.get_left())
pre_order(t.get_right())
def in_order(t):
pass
if t != None:
in_order(t.get_left())
print(t.get_data())
in_order(t.get_right())
def flat_list2(x,n):
if not x or len(x) < 1 or n > len(x) or x[n] == None:
return None
bt = BinaryTree( x[n] )
bt.set_left( flat_list2(x, 2*n))
bt.set_right(flat_list2(x, 2*n + 1))
return bt
this is how i call flat_list2
flat_node_list = [None, 55, 24, 72, 8, 51, None, 78, None, None, 25]
bst = create_tree_from_flat_list2(flat_node_list,1)
pre_order_nodes = pre_order(bst)
in_order_nodes = in_order(bst)
post_order_nodes = post_order(bst)
print( pre_order_nodes)
print( in_order_nodes)
print( post_order_nodes)
You should actually write three function that return iterators. Let the caller decide whether a list is needed. This is most easily done with generator functions. In 3.4+, 'yield from` can by used instead of a for loop.
def in_order(t):
if t != None:
yield from in_order(t.get_left())
yield t.get_data()
yield from in_order(t.get_right())
Move the yield statement for the other two versions.
First things first, I noticed that your indentation was inconsistent in the code block that you provided (fixed in revision). It is critical that you ensure that your indentation is consistent in Python or stuff will go south really quickly. Also, in the code below, I am assuming that you wanted your t.get_data() to still fall under if t != None in your postorder(t), so I have indented as such below. And lastly, I noticed that your method names did not match the spec you listed in the question, so I have updated the method names below to be compliant with your spec (no _ in the naming).
For getting the list, all you need to do is have your traversal methods return a list, and then extend your list at each level of the traversal with the other traversed values. This is done in lieu of printing the data.
def postorder(t):
lst = []
if t != None:
lst.extend(postorder(t.get_left()))
lst.extend(postorder(t.get_right()))
lst.append(t.get_data())
return lst
def preorder(t):
lst = []
if t != None:
lst.append(t.get_data())
lst.extend(preorder(t.get_left()))
lst.extend(preorder(t.get_right()))
return lst
def inorder(t):
lst = []
if t != None:
lst.extend(inorder(t.get_left()))
lst.append(t.get_data())
lst.extend(inorder(t.get_right()))
return lst
This will traverse to the full depths both left and right on each node and, depending on if it's preorder, postorder, or inorder, will append all the traversed elements in the order that they were traversed. Once this has occurred, it will return the properly ordered list to the next level up to get appended to its list. This will recurse until you get back to the root level.
Now, the IndexError coming from your flat_list, is probably being caused by trying to access x[n] when n could be equal to len(x). Remember that lists/arrays in Python are indexed from 0, meaning that the last element of the list would be x[n-1], not x[n].
So, to fix that, replace x[n] with x[n-1]. Like so:
def flat_list2(x,n):
if not x or len(x) < 1 or n < 1 or n > len(x) or x[n-1] == None:
return None
bt = BinaryTree( x[n-1] )
bt.set_left( flat_list2(x, 2*n))
bt.set_right(flat_list2(x, 2*n + 1))
return bt

Resources