Dijkstra's algorithm in graph (Python) - python-3.x

I need some help with graphs and Dijkstra's algorithm in Python 3. I tested this code (below) on one site and it told me that the code takes too long. Can anybody tell me how to fix that, or paste an example of code for this algorithm? I don't know how to speed this code up. I have read many sites but haven't found a proper example...
P.S. I have now edited the code in a few places and tried to optimize it, but it is still too slow.
from collections import deque

class node:
    def __init__(self, name, neighbors, distance, visited):
        self.neighbors = neighbors
        self.distance = distance
        self.visited = visited
        self.name = name

    def addNeighbor(self, neighbor_name, dist):  # adding a new neighbor and the length to it
        if neighbor_name not in self.neighbors:
            self.neighbors.append(neighbor_name)
            self.distance.append(dist)

class graph:
    def __init__(self):
        self.graphStructure = {}  # dictionary with information in the format: node_name, [neighbors], [length to every neighbor], visited_status

    def addNode(self, index):  # adding a new node to the graph structure
        if self.graphStructure.get(index) is None:
            self.graphStructure[index] = node(index, [], [], False)

    def addConnection(self, node0_name, node1_name, length):  # adding a connection between 2 nodes
        n0 = self.graphStructure.get(node0_name)
        n0.addNeighbor(node1_name, length)
        n1 = self.graphStructure.get(node1_name)
        n1.addNeighbor(node0_name, length)

    def returnGraph(self):  # printing graph nodes and connections
        print('')
        for i in range(len(self.graphStructure)):
            nodeInfo = self.graphStructure.get(i + 1)
            print('name =', nodeInfo.name, ' neighbors =', nodeInfo.neighbors, ' length to neighbors =', nodeInfo.distance)

    def bfs(self, index):  # BFS-style search (also used for Dijkstra's algorithm)
        distanceToNodes = [float('inf')] * len(self.graphStructure)
        distanceToNodes[index - 1] = 0
        currentNode = self.graphStructure.get(index)
        queue = deque()
        for i in range(len(currentNode.neighbors)):
            n = currentNode.neighbors[i]
            distanceToNodes[n - 1] = currentNode.distance[i]
            queue.append(n)
        while len(queue) > 0:  # processing the queue and visiting all nodes
            u = queue.popleft()
            node_u = self.graphStructure.get(u)
            node_u.visited = True
            for v in range(len(node_u.neighbors)):
                node_v = self.graphStructure.get(node_u.neighbors[v])
                distanceToNodes[node_u.neighbors[v] - 1] = min(distanceToNodes[node_u.neighbors[v] - 1], distanceToNodes[u - 1] + node_u.distance[v])  # update the minimal length to the node
                if not node_v.visited:
                    queue.append(node_u.neighbors[v])
        return distanceToNodes

def readInputToGraph(graph):  # reading input data and writing it to the graph
    node0, node1, length = map(int, input().split())
    graph.addNode(node0)
    graph.addNode(node1)
    graph.addConnection(node0, node1, length)

def main():
    newGraph = graph()
    countOfNodes, countOfPairs = map(int, input().split())
    if countOfPairs == 0:
        print('0')
        exit()
    for _ in range(countOfPairs):  # reading input data for n (countOfPairs) rows
        readInputToGraph(newGraph)
    # newGraph.returnGraph()  # printing information
    print(sum(newGraph.bfs(1)))  # starting bfs from the start position

main()
The input graph structure may look like this:
15 17
3 7 2
7 5 1
7 11 5
11 5 1
11 1 2
1 12 1
1 13 3
12 10 1
12 4 3
12 15 1
12 13 4
1 2 1
2 8 2
8 14 1
14 6 3
6 9 1
13 9 2
I'm only learning Python, so I think I could be doing something wrong.

The correctness of Dijkstra's algorithm relies on retrieving the node with the shortest distance from the source in each iteration. Using your code as an example, the operation u = queue.popleft() MUST return the node that has the shortest distance from the source out of all nodes that are currently in the queue.
Looking at the documentation for collections.deque, I don't think the implementation guarantees that popleft() always returns the node with the lowest key. It simply returns the leftmost item in what is effectively a doubly linked list.
The run time of Dijkstra's algorithm (once you implement it correctly) depends almost entirely on the underlying data structure used to implement the queue. I would suggest that you first revisit the correctness of your implementation, and once you can confirm that it is actually correct, start experimenting with different data structures for the queue.
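If you want something to experiment with, here is a minimal sketch (not part of the original answer) of how the graph class above could be traversed with heapq as the priority queue; the dijkstra function name and its structure are illustrative assumptions, not code from the post:

import heapq

def dijkstra(graph, source):
    # distances are keyed by node name, matching graph.graphStructure
    dist = {name: float('inf') for name in graph.graphStructure}
    dist[source] = 0
    heap = [(0, source)]  # (distance from source, node name)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        node_u = graph.graphStructure[u]
        for v, w in zip(node_u.neighbors, node_u.distance):
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

With this, main() could print sum(dijkstra(newGraph, 1).values()) instead of sum(newGraph.bfs(1)).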

Related

How to select a good sample size of nodes from a graph

I have a network that has a node attribute labeled 0 or 1. I want to find how the distance between nodes with the same attribute differs from the distance between nodes with different attributes. As it is computationally difficult to find the distance between all combinations of nodes, I want to select a sample of nodes. How should I select the sample size? I am working in Python with networkx.
You've not given many details, so I'll invent some data and make assumptions in the hope it's useful.
Start by importing packages and sampling a dataset:
import random
import networkx as nx

# human social networks tend to be "scale-free"
G = nx.generators.scale_free_graph(1000)

# set labels to either 0 or 1
for i, attr in G.nodes.data():
    attr['label'] = 1 if random.random() < 0.2 else 0
Next, calculate the shortest paths between random pairs of nodes:
results = []

# I had to use 100,000 pairs to get the CI small enough below
for _ in range(100000):
    a, b = random.sample(list(G.nodes), 2)
    try:
        n = nx.algorithms.shortest_path_length(G, a, b)
    except nx.NetworkXNoPath:
        # no path between nodes found
        n = -1
    results.append((a, b, n))
Finally, here is some code to summarise the results and print them out:
from collections import Counter
from scipy import stats

# somewhere to hold counts for both 0, both 1, and different labels
c_0 = Counter()
c_1 = Counter()
c_d = Counter()

# accumulate distances into the above counters
node_data = {i: a['label'] for i, a in G.nodes.data()}
cc = {(0, 0): c_0, (0, 1): c_d, (1, 0): c_d, (1, 1): c_1}
for a, b, n in results:
    cc[node_data[a], node_data[b]][n] += 1

# code to display the results nicely
def show(c, title):
    s = sum(c.values())
    print(f'{title}, n={s}')
    for k, n in sorted(c.items()):
        # calculate some sort of CI over Monte-Carlo error
        lo, hi = stats.beta.ppf([0.025, 0.975], 1 + n, 1 + s - n)
        print(f'{k:5}: {n:5} = {n/s:6.2%} [{lo:6.2%}, {hi:6.2%}]')

show(c_0, 'both 0')
show(c_1, 'both 1')
show(c_d, 'different')
The above prints out:
both 0, n=63930
-1: 60806 = 95.11% [94.94%, 95.28%]
1: 107 = 0.17% [ 0.14%, 0.20%]
2: 753 = 1.18% [ 1.10%, 1.26%]
3: 1137 = 1.78% [ 1.68%, 1.88%]
4: 584 = 0.91% [ 0.84%, 0.99%]
5: 334 = 0.52% [ 0.47%, 0.58%]
6: 154 = 0.24% [ 0.21%, 0.28%]
7: 50 = 0.08% [ 0.06%, 0.10%]
8: 3 = 0.00% [ 0.00%, 0.01%]
9: 2 = 0.00% [ 0.00%, 0.01%]
both 1, n=3978
-1: 3837 = 96.46% [95.83%, 96.99%]
1: 6 = 0.15% [ 0.07%, 0.33%]
2: 34 = 0.85% [ 0.61%, 1.19%]
3: 34 = 0.85% [ 0.61%, 1.19%]
4: 31 = 0.78% [ 0.55%, 1.10%]
5: 30 = 0.75% [ 0.53%, 1.07%]
6: 6 = 0.15% [ 0.07%, 0.33%]
To save space I've cut off the section where the labels differ. The proportions in square brackets are the 95% CI of the Monte-Carlo error. Using more iterations reduces this error, at the cost of more CPU time.
This is more or less an extension of my discussion with Sam Mason, and I only want to give you some timing numbers, because, as discussed, retrieving all distances may be feasible and may even be faster. Based on the code in Sam Mason's answer, I tested both variants: for 1000 nodes, retrieving all distances is much faster than sampling 100,000 pairs. The main advantage is that all "retrieved distances" are used.
import random
import networkx as nx
import time

# human social networks tend to be "scale-free"
G = nx.generators.scale_free_graph(1000)

# set labels to either 0 or 1
for i, attr in G.nodes.data():
    attr['label'] = 1 if random.random() < 0.2 else 0

def timing(f):
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(f.__name__, (time2 - time1) * 1000.0))
        return ret
    return wrap

@timing
def get_sample_distance():
    results = []
    # I had to use 100,000 pairs to get the CI small enough below
    for _ in range(100000):
        a, b = random.sample(list(G.nodes), 2)
        try:
            n = nx.algorithms.shortest_path_length(G, a, b)
        except nx.NetworkXNoPath:
            # no path between nodes found
            n = -1
        results.append((a, b, n))

@timing
def get_all_distances():
    all_distances = nx.shortest_path_length(G)

get_sample_distance()
# get_sample_distance function took 2338.038 ms
get_all_distances()
# get_all_distances function took 304.247 ms
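To actually use those distances: nx.shortest_path_length(G) returns an iterator of (source, {target: length}) pairs, so (as a rough sketch, reusing node_data and the cc counter mapping from the earlier answer) they could be accumulated like this:

for a, lengths in nx.shortest_path_length(G):
    for b, n in lengths.items():
        if a == b:
            continue  # skip the zero-length path from each node to itself
        cc[node_data[a], node_data[b]][n] += 1

One difference from the sampling approach is that unreachable pairs simply never appear here, so there is no -1 bucket.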

Nodes at given distance in binary tree (Amazon SDE-2)

Given a binary tree, a target node in the binary tree, and an integer value k, find all the nodes that are at distance k from the given target node. No parent pointers are available.
link to the problem on GFG: LINK
Example 1:
Input :
        20
       /  \
      8    22
     / \
    4   12
       /  \
      10   14
Target Node = 8
K = 2
Output: 10 14 22
Explanation: The three nodes at distance 2
from node 8 are 10, 14, 22.
My code
from collections import defaultdict

class solver:
    def __init__(self):
        self.vertList = defaultdict(list)

    def addEdge(self, u, v):
        self.vertList[u].append(v)

    def makeGraph(self, root):
        visited = set()
        queue = []
        queue.append(root)
        while len(queue) > 0:
            curr = queue.pop(0)
            visited.add(curr)
            if curr.left is not None and curr.left not in visited:
                self.vertList[curr.data].append(curr.left.data)
                self.vertList[curr.left.data].append(curr.data)
                queue.append(curr.left)
            if curr.right is not None and curr.right not in visited:
                self.vertList[curr.data].append(curr.right.data)
                self.vertList[curr.right.data].append(curr.data)
                queue.append(curr.right)

    def KDistanceNodes(self, root, target, k):
        self.makeGraph(root)
        dist = {}
        for v in self.vertList:
            dist[v] = 0
        visited2 = set()
        queue2 = []
        queue2.append(target)
        while len(queue2) > 0:
            curr = queue2.pop(0)
            visited2.add(curr)
            for nbr in self.vertList[curr]:
                if nbr not in visited2:
                    visited2.add(nbr)
                    queue2.append(nbr)
                    dist[nbr] = dist[curr] + 1
        ans = []
        for v in dist:
            if dist[v] == k:
                ans.append(str(v))
        return ans
#{
# Driver Code Starts
# Initial Template for Python 3
from collections import deque

# Tree Node
class Node:
    def __init__(self, val):
        self.right = None
        self.data = val
        self.left = None

# Function to Build Tree
def buildTree(s):
    # Corner Case
    if (len(s) == 0 or s[0] == "N"):
        return None
    # Creating list of strings from input
    # string after splitting by space
    ip = list(map(str, s.split()))
    # Create the root of the tree
    root = Node(int(ip[0]))
    size = 0
    q = deque()
    # Push the root to the queue
    q.append(root)
    size = size + 1
    # Starting from the second element
    i = 1
    while (size > 0 and i < len(ip)):
        # Get and remove the front of the queue
        currNode = q[0]
        q.popleft()
        size = size - 1
        # Get the current node's value from the string
        currVal = ip[i]
        # If the left child is not null
        if (currVal != "N"):
            # Create the left child for the current node
            currNode.left = Node(int(currVal))
            # Push it to the queue
            q.append(currNode.left)
            size = size + 1
        # For the right child
        i = i + 1
        if (i >= len(ip)):
            break
        currVal = ip[i]
        # If the right child is not null
        if (currVal != "N"):
            # Create the right child for the current node
            currNode.right = Node(int(currVal))
            # Push it to the queue
            q.append(currNode.right)
            size = size + 1
        i = i + 1
    return root

if __name__ == "__main__":
    x = solver()
    t = int(input())
    for _ in range(t):
        line = input()
        target = int(input())
        k = int(input())
        root = buildTree(line)
        res = x.KDistanceNodes(root, target, k)
        for i in res:
            print(i, end=' ')
        print()
Input:
1 N 2 N 3 N 4 5
target = 5
k = 4
Its Correct output is:
1
And Your Code's output is:
[]
My logic:
-> First convert the tree into an undirected graph using BFS / level-order traversal
-> Traverse the graph using BFS and calculate the distance
-> Return the nodes at k distance from the target
What I think:
First of all, in the given test case the tree representation seems to be in level order; however, the failing test case doesn't look like level order — or maybe my logic is wrong?
Input Format:
Custom input should have 3 lines. First line contains a string representing the tree as described below. Second line contains the data value of the target node. Third line contains the value of K.
The values in the string are in the order of a level-order traversal of the tree, where numbers denote node values and the character "N" denotes a NULL child.
For the above tree, the string will be: 1 2 3 N N 4 6 N 5 N N 7 N
The mistake is that the initialization done in __init__ (creating self.vertList) needs to happen for every test case, not just once at the beginning.
Change:
def __init__(self):
    self.vertList = defaultdict(list)
to:
def __init__(self):
    pass
and:
def KDistanceNodes(self, root, target, k):
    self.makeGraph(root)
    dist = {}
    ...
to:
def KDistanceNodes(self, root, target, k):
    self.vertList = defaultdict(list)  # <-- move it here
    self.makeGraph(root)
    dist = {}
    ...
The original code kept accumulating the nodes and "remembered" the previous use-cases.
Also pay attention that you should return ans sorted, which means you should not append the numbers as strings (so that you can still sort them numerically): change ans.append(str(v)) to ans.append(v) and return sorted(ans).
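With both changes applied, the end of KDistanceNodes would look roughly like this (a sketch; the rest of the method stays as in the question):

        ans = []
        for v in dist:
            if dist[v] == k:
                ans.append(v)   # keep node values as numbers, not strings
        return sorted(ans)      # return them in sorted order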

Limit of Python recursion functions (Process finished with exit code 139)

I had an old script that from a pandas dataframe calculates new columns from others, but also from the previous result of that column being calculated.
This script used for loops, and it was quite slow.
For this reason, I replaced the for loops with recursive functions.
The new script is around 100 times faster than the old one, which is good news. But I am now encountering a limit that I did not have before. As soon as I have more than 29952 rows in my dataset, I get the following error:
"Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)"
I made this little script with lists that reproduces my problem: if I increase the size of the lists (list_lenght) to more than 29952, the script crashes (on my computer).
import random
import sys

def list_generator(min_value, max_value, list_lenght):
    return [random.randrange(min_value, max_value) for i in range(list_lenght)]

def recursive_function(list_1, list_2, n, result):
    if n == len(list_1):
        return result
    elif list_1[n] <= list_2[n]:
        result.append(1 + result[n - 1])
    else:
        result.append(0)
    return recursive_function(list_1, list_2, (n + 1), result)

list_lenght = 29952  # How to increase this limit without generating an error?
min_value = 10
max_value = 20
list_one = list_generator(min_value, max_value, list_lenght)
list_two = list_generator(min_value, max_value, list_lenght)

# Set recursion limit
sys.setrecursionlimit(list_lenght * 2)

# Compute a new list from list_one and list_two
list_result = recursive_function(list_one, list_two, 1, [0])
I suspect a memory problem, but how can I take advantage of the full power of Python's recursive functions while avoiding this limit as much as possible?
Thanks in advance
Following the comment from @trincot, here is the version of the code without a recursive function, which is ultimately faster than the recursive version above, and with which there is no longer any limit. (The crash itself happens because raising sys.setrecursionlimit far beyond what the C stack can hold lets the interpreter overflow its stack and die with SIGSEGV instead of raising a clean RecursionError.)
def no_recursive_function(list_1, list_2, n, result):
    if list_1[n] <= list_2[n]:
        return 1 + result[n - 1]
    else:
        return 0

list_lenght = 29952
min_value = 10
max_value = 20
list_one = list_generator(min_value, max_value, list_lenght)
list_two = list_generator(min_value, max_value, list_lenght)

# Set recursion limit
sys.setrecursionlimit(list_lenght * 2)

list_result_2 = [0]
for n in range(list_lenght - 1):
    result = no_recursive_function(list_one, list_two, n + 1, list_result_2)
    list_result_2.append(result)
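As a quick sanity check (a small sketch, assuming both versions above are defined in the same script, and using a length small enough for the default recursion limit), the two approaches should produce identical results on the same input lists:

small_length = 500
a = list_generator(min_value, max_value, small_length)
b = list_generator(min_value, max_value, small_length)

# recursive version
recursive_result = recursive_function(a, b, 1, [0])

# loop version
loop_result = [0]
for n in range(small_length - 1):
    loop_result.append(no_recursive_function(a, b, n + 1, loop_result))

assert recursive_result == loop_result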

Writing user defined function to evaluate the Saha equation with for-loops for an expected output

I am trying to create a function that evaluates the Saha function for certain values of temperature and electron pressure. The question is a little in depth so I will provide as much detail as possible about past code used before this section.
Previous sections code
Evaluating the partition function (part 1):
import numpy as np  # needed for np.array / np.exp below

k = 8.617333262145179e-05
T = 10000.
g = 1.0
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34])  # in eV
Ca_partition_values = []

def partfunc_E(chiI, T):
    for chiI in Ca_ion_energies:
        elem = 0
        for i in np.arange(chiI):
            elem = elem + (g*np.exp(-(i/(k*T))))
        Ca_partition_values.append(elem)
    return Ca_partition_values

print(partfunc_E(Ca_ion_energies, T))
Output:
[1.455902590894594, 1.45633321917395, 1.4563345239240013, 1.4563345239240013, 1.4563345239240013]
Evaluating the Boltzmann equation (part 2):
chiI = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34])  # in eV
k = 8.617333262145179e-05
T = 10000.

def boltz_E(chiI, T, I, i):
    Z_1 = partfunc_E(chiI, T)
    ratio = np.exp(-i/(k*T)) / Z_1
    return ratio[I-1]

print(Ca_ion_energies)
print("i Fraction in level i for I=1 (neutral)")
print("- -------------------------------------")
for n in range(0, 10):
    print(n, boltz_E(chiI, 10000, 1, n))
Output:
[ 6.1131554 11.871719 50.91316 67.2732 84.34 ]
i Fraction in level i for I=1 (neutral)
- -------------------------------------
0 0.6868591389658425
1 0.21522358567610525
2 0.06743914320048579
3 0.021131689732463026
4 0.006621500359539954
5 0.002074811222693332
6 0.0006501308428703751
7 0.0002037149733085943
8 6.383298193775377e-05
9 2.0001718660577703e-05
Question I need help with (and my code so far):
Evaluating the Saha equation (part 3):
The instructions for this section are as follows:
The simplest way to get this ratio is to set N_(I=1) (i.e. the neutral atom) to some value (e.g. unity), evaluate the next ionisation-stage populations successively from the Saha equation in a for loop, and at the end divide them by the sum of all the N on the same scale. You will find the numpy np.sum function useful to get the total over all stages. We want temperature T to be 5000 K and electron pressure Pe to be 100.0 N/m^2.
FYI: I is the ionisation stage, Z_1 is the partition function from part 1, Z_I is the partition function for stage I+1, Pe is the electron pressure, chiI are the ionisation energies (for Calcium in my code), T is temperature and the function that "fraction" is set equal to is the Saha equation.
It should start something like:
def saha_E(chiI, T, Pe, I):
    # compute Saha population fraction N_I/N
    # input: ionisation energies, temperature, electron pressure, ion stage

    # Compute the partition functions
    # Loop over each ionisation stage that you have an energy for, computing
    # the fraction via the Saha equation. Note that the first stage should be set to 1.
    # Divide each stage by the total
    # Return the fraction of the requested stage
My code attempt:
k = 8.617333262145179e-05
T = 10000.
g = 1.0
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34])
N_I = 1
h = 6.626e-34
m = 9.11e-31
fractions = []
fraction_sum = []

def saha_E(chiI, T, Pe, I):
    Z_1 = partfunc_E(chiI, T)
    Z_I = partfunc_E(chiI+1, T)
    for I in Ca_ion_energies:
        fraction = (N_I*(Z_I/Z_1)*((2*k*T)/((h**3)*Pe))*((2*np.pi*m*k*T)**(3/2))*np.exp(-I/(k*T)))
        fractions.append(fraction)
    fraction_sum.append(np.sum(fractions))
    for i in fractions:
        i/fraction_sum
    return fraction

print("For ionisation energies (in eV) of:", chiI)
print()
print("I Fraction in stage I")
print("- -------------------")
for I in range(0, 6):
    print(I, saha_E(chiI, 5000, 100.0, I))
I am instructed also that the output should be something similar to:
For ionisation energies (in eV) of: [ 6.11 11.87 50.91 67.27 84.34]
I Fraction in stage I
- -------------------
1 0.999998720736
2 1.27926351211e-06
3 7.29993420039e-52
4 1.3474665329e-113
5 1.54848994685e-192
Firstly, I don't think my code is correct, but it is the best I can do, which is why I need some help; also, this code is giving me the following error:
TypeError: unsupported operand type(s) for /: 'list' and 'list'
If my code is totally wrong please tell me as I have spent so much time trying to figure this out already.
Edit
This question is still not completely answered, please keep commenting!
If I understood your problem well, my approach is to calculate the "fractions" and "fraction sums" in a single loop over the various energies, and to normalize only once we are outside the loop.
Also, be careful with the scope of your code. I pushed some variables you declared outside of the function inside of it, because there is no reason to keep them alive outside of the function's scope.
Be careful also not to use the same variable twice: your function takes an I argument but then reuses I as a for-loop variable.
As said in the chat, you want to write docstrings and comments so that you know where you are going even before touching any code. Here is a base to complete:
import numpy as np

# Constants.
k = 8.617333262145179e-05
g = 1.0
h = 6.626e-34
m = 9.11e-31
Ca_ion_energies = np.array([6.1131554, 11.871719, 50.91316, 67.2732, 84.34])  # in eV.

# Partition function.
def partfunc_E(chiI, T):
    """This function returns the partition of blablabla.

    args:
    ------
    :chiI: (array or list) the energy levels of a chosen ion.
    :T: (float) the temperature at which kT will be calculated."""
    Ca_partition_values = []
    for energy_level in chiI:  # For each energy level.
        elem = 0
        for i in np.arange(energy_level):  # From 0 to current energy level.
            elem += g*np.exp(-(i/(k*T)))
        Ca_partition_values.append(elem)
    return np.array(Ca_partition_values)  # Conversion to numpy array to support operations later.

print(partfunc_E(Ca_ion_energies, T=10000))

# Boltzmann equation.
def boltz_E(chiI, T, I, i):
    Z_1 = partfunc_E(chiI, T)
    ratio = np.exp(-i/(k*T)) / Z_1
    return ratio[I-1]

print(Ca_ion_energies)
print("i Fraction in level i for I=1 (neutral)")
print("- -------------------------------------")
for n in range(0, 10):
    print(n, boltz_E(Ca_ion_energies, T=10000, I=1, i=n))

# Saha equation.
def saha_E(chiI, T, Pe, i):
    p = partfunc_E(chiI, T)
    Z_ratios = np.array([p[n]/p[0] for n in range(len(chiI))])
    fractions = []
    fractions_sum = []
    for n, I in enumerate(chiI):
        fraction = Z_ratios[n]*((2*k*T)/((h**3)*Pe))*((2*np.pi*m*k*T)**(3/2))*np.exp(-I/(k*T))
        fractions.append(fraction)
        fractions_sum.append(np.sum(fractions))
    # Let's normalize the array before returning it.
    fractions = np.divide(fractions, fractions_sum)
    return fractions[i]

print("For ionisation energies (in eV) of:", Ca_ion_energies)
print()
print("I Fraction in stage n")
print("- -------------------")
for n in range(0, 4):
    print(n, saha_E(Ca_ion_energies, T=5000, Pe=100.0, i=n))

MST challenge gives "time exceeded" error [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I am doing the BLINNET problem on Sphere Online Judge where I need to find the cost of a minimum spanning tree. I should follow a structure with Edge and Vertex instances. Vertices represent cities in this case.
I get a "time exceeded" error, and I feel like too many for loop iterations are at the cause, but that is the best I can do. I want to try the binary sort to see if it works with that, but that is not easy as it should be sorted using the key property in the City class.
Sample input
2
4
gdansk
2
2 1
3 3
bydgoszcz
3
1 1
3 1
4 4
torun
3
1 3
2 1
4 1
warszawa
2
2 4
3 1
3
ixowo
2
2 1
3 3
iyekowo
2
1 1
3 7
zetowo
2
1 3
2 7
Output for the Sample
3
4
My code
import sys
import heapq

class City:
    def __init__(self, city_id):
        self.city_id = city_id
        self.key = float('inf')
        self.parent = None
        self.edge_list = list()
        self.visited = False
        #self.city_name = None

    def is_not_visited(self):
        if self.visited is False:
            return True
        return False

    def add_neighbor(self, edge):
        self.edge_list.append(edge)

    def __lt__(self, other):
        return self.key < other.key

class Edge:
    def __init__(self, to_vertex, cost):
        self.to_vertex = to_vertex
        self.cost = cost

#
# def find_and_pop(queue):
#     min = queue[0]
#     index = 0
#     for a in range(0, len(queue)):
#         if queue[a].key < min.key:
#             min = queue[a]
#             index = a
#     return queue.pop(index)
#

def MST(vertices_list):
    queue = vertices_list
    current = queue[0]
    current.key = 0
    #visited_list = list()
    #heapq.heapify(queue)
    total_weight = 0
    while queue:
        #current = find_and_pop(queue)
        current = queue.pop(0)
        for edge in current.edge_list:
            if edge.to_vertex.is_not_visited():
                if edge.cost < edge.to_vertex.key:
                    edge.to_vertex.key = edge.cost
                    edge.to_vertex.parent = current
        total_weight = total_weight + current.key
        current.visited = True
        queue = sorted(queue, key=lambda x: x.city_id)
        #heapq.heapify(queue)
        #visited_list.append(current)
    # total_weight = 0
    # for x in visited_list:
    #     total_weight = total_weight + x.key
    sys.stdout.write("{0}\n".format(total_weight))

class TestCase:
    def __init__(self, vertices):
        self.vertices = vertices

testcases = []

def main():
    case_num = int(sys.stdin.readline())
    #skip_line = sys.stdin.readline()
    for n_case in range(0, case_num):
        sys.stdin.readline()
        vertices_list = list()
        number_of_city = int(sys.stdin.readline())
        # iterate and create a City for each of the cities
        for n_city in range(0, number_of_city):
            city = City(n_city)
            vertices_list.append(city)
        for n_city in range(0, number_of_city):
            c_name = sys.stdin.readline()
            #vertices_list[n_city].city_name = c_name
            num_neighbor = int(sys.stdin.readline())
            for n_neigh in range(0, num_neighbor):
                to_city_cost = sys.stdin.readline()
                to_city_cost = to_city_cost.split(" ")
                to_city = int(to_city_cost[0])
                cost = int(to_city_cost[1])
                edge = Edge(vertices_list[to_city-1], cost)
                vertices_list[n_city].edge_list.append(edge)
        testcase = TestCase(vertices_list)
        testcases.append(testcase)
    count = 0
    for testcase in testcases:
        MST(testcase.vertices)
        # if count < case_num - 1:
        #     print()
        #     count = count + 1

if __name__ == "__main__":
    main()
The sorted call in your MST loop makes the solution inefficient. You have some commented-out code that relies on heapq, and that is indeed the way to avoid having to sort the queue each time you alter it. In any case, I don't understand why you would sort the queue by city id; if anything, it should be sorted by key.
Although it could work with the key property as you did it, it seems more natural to me to add edges to the queue (heap) instead of vertices, so that the edge cost is the basis for the heap property. Also, that queue should not contain all the items from the start; they should be added as they are discovered during the algorithm. That corresponds more closely to the MST-building algorithm, which adds edge after edge, each time the one with the minimum cost.
If edges are pushed on a heap, they must be comparable, so __lt__ must be implemented on the Edge class like you did for the Vertex class.
class Edge:
    # ... your code remains unchanged... Just add:
    def __lt__(self, other):
        return self.cost < other.cost

def MST(vertices_list):
    # first edge in the queue is a virtual one with zero cost.
    queue = [Edge(vertices_list[0], 0)]  # heap of edges, ordered by cost
    total_weight = 0
    while queue:
        mst_edge = heapq.heappop(queue)  # pop both cost & vertex
        current = mst_edge.to_vertex
        if current.visited:
            continue
        for edge in current.edge_list:
            if not edge.to_vertex.visited:
                heapq.heappush(queue, edge)
        current.visited = True
        total_weight += mst_edge.cost
    sys.stdout.write("{0}\n".format(total_weight))
