How to "decrease priority" in a min-priority queue in Dijkstra's algorithm? - priority-queue

The Wikipedia article on Dijkstra's algorithm gives the following pseudocode, which uses a min-priority queue:
function Dijkstra(Graph, source):
    dist[source] ← 0                      // Initialization

    create vertex priority queue Q

    for each vertex v in Graph:
        if v ≠ source
            dist[v] ← INFINITY            // Unknown distance from source to v
            prev[v] ← UNDEFINED           // Predecessor of v

        Q.add_with_priority(v, dist[v])

    while Q is not empty:                 // The main loop
        u ← Q.extract_min()               // Remove and return best vertex
        for each neighbor v of u:         // only v that are still in Q
            alt ← dist[u] + length(u, v)
            if alt < dist[v]
                dist[v] ← alt
                prev[v] ← u
                Q.decrease_priority(v, alt)

    return dist, prev
However, it's unclear how decrease_priority can be implemented in logarithmic time. Would anyone care to help?

It depends on which data structure you use, but decrease_priority can be implemented with O(log n) time complexity by using a binary min-heap. Say I want to decrease the priority of some node from 25 to 1: I first change its priority value, but then I still need to re-heapify to maintain the heap property (parent <= children). That can be done by swapping that node with its parent until the heap property is restored, which takes at most log n swaps, since that is the height of the heap.
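Here is a minimal Python sketch of that idea (the class and method names are my own, chosen to mirror the pseudocode; this is not a standard-library API). The key extra ingredient is a dictionary mapping each item to its index in the heap array, so decrease_priority can locate the node in O(1) before sifting it up:

class MinHeap:
    """Binary min-heap with decrease_priority, a minimal sketch."""

    def __init__(self):
        self.heap = []   # list of (priority, item) pairs
        self.pos = {}    # item -> index of its pair in self.heap

    def add_with_priority(self, item, priority):
        self.heap.append((priority, item))
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def decrease_priority(self, item, new_priority):
        i = self.pos[item]                 # O(1) lookup instead of a scan
        assert new_priority <= self.heap[i][0]
        self.heap[i] = (new_priority, item)
        self._sift_up(i)                   # at most floor(log2 n) swaps

    def extract_min(self):
        self._swap(0, len(self.heap) - 1)  # move the last element to the root
        _, item = self.heap.pop()
        del self.pos[item]
        if self.heap:
            self._sift_down(0)
        return item

    def _sift_up(self, i):
        # Swap with the parent until the heap property (parent <= child) holds.
        while i > 0 and self.heap[(i - 1) // 2][0] > self.heap[i][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        # Swap with the smaller child until the heap property holds.
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < len(self.heap) and self.heap[c][0] < self.heap[smallest][0]:
                    smallest = c
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

Note that Python's built-in heapq module has no decrease-key operation; the common workaround there is to push a duplicate entry with the lower priority and skip stale entries when they are popped.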

Related

Maximum Sum of XOR operation on a selected element with array elements with an optimized approach

Problem: Choose an element from the array that maximizes the sum of XORing it with all elements in the array.
Input for problem statement:
N=3
A=[15,11,8]
Output:
11
Explanation:
(15^15) + (15^11) + (15^8) = 0 + 4 + 7 = 11
My code for the brute force approach:
def compute(N, A):
    ans = 0
    for i in A:
        xor_sum = 0
        for j in A:
            xor_sum += (i ^ j)
        if xor_sum > ans:
            ans = xor_sum
    return ans
The above approach gives the correct answer, but I want to optimize it to O(n) time complexity. Please help me with this.
If you have integers with a fixed (constant) number of bits c, then it is possible, because O(c) = O(1). For simplicity I assume unsigned integers and an odd n. If n is even, we sometimes have to check both paths in the tree (see the solution below). You can adapt the algorithm to cover even n and negative numbers.
Find max in the array of length n: O(n).
If max == 0, return 0 (there are only 0s in the array).
Find the position p of the most significant bit of max: O(c) = O(1):
p = -1
while (max != 0)
    p++
    max /= 2
so 1 << p gives a mask for the highest set bit.
Build a tree where the leaves are the numbers and every level stands for a bit position. From the root there is an edge to the left if some number has bit p set, and an edge to the right if some number has bit p not set; on the next level there is an edge to the left if some number (on that branch) has bit p - 1 set, and an edge to the right if some number has bit p - 1 not set, and so on. This can be done in O(cn) = O(n).
Go through the array and count how many times the bit at position i (for i from 0 to p) is set, giving a sum array: O(cn) = O(n).
Assign the root of the tree to a node x.
Now for each i from p down to 0 do the following:
    if x has only one edge => x becomes its only child node
    else if sum[i] > n / 2 => x becomes its right child node
    else x becomes its left child node
In this step we choose the path through the tree that gives us the most ones when XORing: O(cn) = O(n).
XOR all the elements in the array with the value of x and sum them up to get the result. Actually, you could have built the result already in the step before, by adding (n - sum[i]) * (1 << i) to the result if going left and sum[i] * (1 << i) if going right: O(n). (Going left sets the current bit of x, so the n - sum[i] numbers with that bit clear gain a one; going right leaves the sum[i] set bits in place.)
All the sequential steps are O(n) and therefore overall the algorithm is also O(n).
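Here is a Python sketch of the steps above (my own transcription; max_xor_sum is a made-up name). It assumes non-negative integers and an odd n, per the reasoning above; for even n, the tie case sum[i] == n/2 may require exploring both branches:

def max_xor_sum(A):
    n = len(A)
    m = max(A)                       # O(n)
    if m == 0:                       # only 0s in the array
        return 0
    p = m.bit_length() - 1           # position of the most significant bit

    # bit_count[i] = how many numbers have bit i set (the "sum array")
    bit_count = [0] * (p + 1)
    for a in A:
        for i in range(p + 1):
            bit_count[i] += (a >> i) & 1

    # Binary trie over the bits from p down to 0; key 1 plays the role of
    # the "left" edge (bit set), key 0 of the "right" edge (bit not set).
    root = {}
    for a in A:
        node = root
        for i in range(p, -1, -1):
            node = node.setdefault((a >> i) & 1, {})

    # Walk the trie, building the result as described above: taking the
    # left edge sets bit i of x, contributing (n - bit_count[i]) * 2^i;
    # taking the right edge contributes bit_count[i] * 2^i.
    node, result = root, 0
    for i in range(p, -1, -1):
        if len(node) == 1:               # only one edge: forced move
            b = next(iter(node))
        elif bit_count[i] > n // 2:      # majority of bits set: go right
            b = 0
        else:                            # otherwise go left
            b = 1
        result += ((n - bit_count[i]) if b else bit_count[i]) * (1 << i)
        node = node[b]
    return result

print(max_xor_sum([15, 11, 8]))   # 11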

Getting wrong results with implementation of Dijkstra's algorithm using PriorityQueue

I have implemented Dijkstra's algorithm using the PriorityQueue class of the queue module in Python.
But I am not always getting the correct result, according to the online judge. Something must be wrong in the code below, but I have no idea what.
What is wrong with my code?
from queue import PriorityQueue

class Solution:
    # Function to find the shortest distance of all the vertices
    # from the source vertex S.
    def dijkstra(self, V, adj, S):
        # code here
        q = PriorityQueue()
        distance = [-1] * V
        distance[S] = 0
        visited = set()
        visited.add(S)
        for i in adj[S]:
            distance[i[0]] = distance[S] + i[1]
            q.put([i[1], i[0]])
        while not q.empty():
            w, s = q.get()
            visited.add(s)
            for i in adj[s]:
                d = distance[s] + i[1]
                if distance[i[0]] == -1:
                    distance[i[0]] = d
                elif distance[i[0]] > d:
                    distance[i[0]] = d
                if i[0] not in visited:
                    q.put([i[1], i[0]])
        return distance

#{
# Driver Code Starts
# Initial Template for Python 3
import atexit
import io
import sys

if __name__ == '__main__':
    test_cases = int(input())
    for cases in range(test_cases):
        V, E = map(int, input().strip().split())
        adj = [[] for i in range(V)]
        for i in range(E):
            u, v, w = map(int, input().strip().split())
            adj[u].append([v, w])
            adj[v].append([u, w])
        S = int(input())
        ob = Solution()
        res = ob.dijkstra(V, adj, S)
        for i in res:
            print(i, end=" ")
        print()
# } Driver Code Ends
Sample Input for one test case:
9 14
0 1 4
0 7 8
1 7 11
1 2 8
7 6 1
7 8 7
2 8 2
8 6 6
2 5 4
2 3 7
6 5 2
3 5 14
3 4 9
5 4 10
0
Expected Output:
0 4 12 19 21 11 9 8 14
Problem:
My code returns this instead:
0 4 12 19 26 16 18 8 14
The problem is that you are giving priority to the edges with the least weight, but you should give priority to paths with the least weight.
So near the end of your code change:
q.put([i[1],i[0]])
to:
q.put([d,i[0]])
This will solve it.
However, some comments:
If you use a priority queue it should not be necessary to compare a previously stored distance for a node with a new distance, as the priority queue's role is to make sure you visit a node via the shortest path upon its first visit. With a bit of code reorganisation, you can get rid of that minimal-distance test.
Once you have that in place, you also do not need to have visited, as it is enough to check that the node's distance is still at -1 (assuming weights are never negative). When that is the case, it means you haven't visited it yet.
It is also a bit more efficient if you store tuples on the queue instead of lists.
And you can reorganise the code so that you only need to push the initial cell to the queue before starting the traversal loop.
Finally, instead of one letter variables, it is more readable if you use descriptive names, like node and weight:
from queue import PriorityQueue

class Solution:
    def dijkstra(self, V, adj, S):
        queue = PriorityQueue()
        distances = [-1] * V
        queue.put((0, S))
        while not queue.empty():
            dist, node = queue.get()
            if distances[node] == -1:
                distances[node] = dist
                for neighbor, weight in adj[node]:
                    queue.put((dist + weight, neighbor))
        return distances
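As a quick sanity check (my addition, not part of the original answer), running this on the sample graph from the question reproduces the expected output:

edges = [(0, 1, 4), (0, 7, 8), (1, 7, 11), (1, 2, 8), (7, 6, 1), (7, 8, 7),
         (2, 8, 2), (8, 6, 6), (2, 5, 4), (2, 3, 7), (6, 5, 2), (3, 5, 14),
         (3, 4, 9), (5, 4, 10)]
adj = [[] for _ in range(9)]
for u, v, w in edges:
    adj[u].append([v, w])   # the graph is undirected, as in the driver code
    adj[v].append([u, w])

print(Solution().dijkstra(9, adj, 0))   # [0, 4, 12, 19, 21, 11, 9, 8, 14]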

What is the solution path yielded by a DFS on this graph

Using a DFS on this graph, the nodes are visited in the following order (when there is more than one successor node, nodes are pushed to the "frontier" in alphabetical order):
S->A->E->D->F->G
Is that visitation sequence the solution path as well? If so, why is it not S->A->E->G, since G is also a successor node of E?
PS: I'm new to algorithms, so if I'm clearly not understanding the concept, please let me know.
If you are visiting the nodes, the DFS approach will traverse the graph based on the insertion order of the adjacency list.
For example, the order of inserting node E's successors may be in the following ways:
1- E-> D, G
2- E-> G, D
In the first way, you will traverse D->F->G or D->G directly, and in both cases you will visit node G before traversing any of node E's other successors, so you will not be able to traverse the path S->A->E->G, because node G will already have been visited from node D or F.
In the second way, you will traverse E->G directly, so this will result in traversing the path S->A->E->G, but you will also not be able to reach node G from node D or F, because it will already have been visited from node E.
The previous scenario happens if you are just marking nodes as visited (true or false), but if you are trying to find the shortest path using the costs on the edges, then you will need to use Dijkstra's algorithm for finding shortest paths on a graph; you can read more about it here if you are not familiar with it.
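To make the order-dependence concrete, here is a small Python sketch (my own; the graph is hypothetical, reconstructed from the edges mentioned above) where the two runs differ only in the order of E's successors:

def dfs_path(graph, start, goal):
    # Iterative DFS returning the first path found from start to goal.
    # Successors are explored in adjacency-list order, so that order
    # determines which path is found.
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push in reverse so the first-listed successor is explored first.
        for succ in reversed(graph[node]):
            if succ not in visited:
                stack.append((succ, path + [succ]))
    return None

# Hypothetical graph based on the edges mentioned in this discussion.
graph1 = {'S': ['A'], 'A': ['E'], 'E': ['D', 'G'],
          'D': ['F', 'G'], 'F': ['E', 'G'], 'G': []}
graph2 = dict(graph1, E=['G', 'D'])   # only E's successor order differs

print(dfs_path(graph1, 'S', 'G'))  # ['S', 'A', 'E', 'D', 'F', 'G']
print(dfs_path(graph2, 'S', 'G'))  # ['S', 'A', 'E', 'G']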
I assume it's taking into account both the heuristic and the edge cost to determine the next node to visit.
Starting at S it looks at its three possibilities:
A = 9 + 13 = 21
B = 14 + 14 = 28
C = 15 + 15 = 30
It then chooses A and looks at its only available path from A and goes to E.
From E we have two possibilities:
D = 2 + 8 = 10
G = 19 + 0 = 19
It will then choose D and now it has two possibilities:
F = 11 + 5 = 16
G = 16 + 0 = 16
It's a tie, so depending on how the algorithm was set up and the solution you were given, it goes to F, which then has two possibilities:
E = 6 + 7 = 13
G = 6 + 0 = 6
It then goes to G and finally it sees that this is the goal node and returns the state sequence.

Count the Number of Zeros between a Range of Integers

Is there any direct formula or systematic way to find the number of zeros within a range? Two integers M and N are given; if I have to find the total number of zero digits between this range, what should I do?
Let M = 1234567890 and N = 2345678901.
And the answer is: 987654304
Thanks in advance.
Reexamining the Problem
Here is a simple solution in Ruby, which inspects each integer from the interval [m,n], determines the string of its digits in the standard base-10 positional system, and counts the occurring 0 digits:
def brute_force(m, n)
  if m > n
    return 0
  end
  z = 0
  m.upto(n) do |k|
    z += k.to_s.count('0')
  end
  z
end
If you run it in an interactive Ruby shell you will get
irb> brute_force(1,100)
=> 11
which is fine. However, using the interval bounds from the example in the question,
m = 1234567890
n = 2345678901
you will recognize that this takes considerable time. On my machine it needs more than a couple of seconds; I cancelled it before it finished.
So the real question is not only to come up with the correct zero counts but to do it faster than the above brute force solution.
Complexity: Running Time
The brute force solution has to scan, for each of the n-m+1 numbers k, its base-10 digit string of length floor(log_10(k))+1, so it will not use more than
O(n (log(n)+1))
string digit accesses. The slow example had an n of roughly 10^9.
Reducing Complexity
Yiming Rong's answer is a first attempt to reduce the complexity of the problem.
If F(m,n) is the function calculating the number of zeros in the interval [m,n], then it has the property
F(m,n) = F(1,n) - F(1,m-1)
so that it suffices to look for a presumably simpler function G with the property
G(n) = F(1,n).
Divide and Conquer
Coming up with a closed formula for the function G is not that easy. E.g. the interval [1,1000] contains 192 zeros, but the interval [1001,2000] contains 300 zeros, because a case like k = 99 in the first interval corresponds to k = 1099 in the second interval, which yields another zero digit to count; k = 7 shows up as 1007, yielding two more zeros.
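Both counts can be checked with the brute force function from above:
irb> brute_force(1, 1000)
=> 192
irb> brute_force(1001, 2000)
=> 300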
What one can try is to express the solution for some problem instance in terms of solutions to simpler problem instances. This strategy is called divide and conquer in computer science. It works if at some complexity level it is possible to solve the problem instance and if one can deduce the solution of a more complex problem from the solutions of the simpler ones. This naturally leads to a recursive formulation.
E.g. we can formulate a solution for a restricted version of G, which is only working for some of the arguments. We call it g and it is defined for 9, 99, 999, etc. and will be equal to G for these arguments.
It can be calculated using this recursive function:
# zeros for 1..n, where n = (10^k)-1: 0, 9, 99, 999, ..
def g(n)
  if n <= 9
    return 0
  end
  n2 = (n - 9) / 10
  return 10 * g(n2) + n2
end
Note that this function is much faster than the brute force method: To count the zeros in the interval [1, 10^9-1], which is comparable to the m from the question, it just needs 9 calls, its complexity is
O(log(n))
Again note that this g is not defined for arbitrary n, only for n = (10^k)-1.
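A few values, which agree with the brute force function from above:
irb> g(9)
=> 0
irb> g(99)
=> 9
irb> g(999)
=> 189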
Derivation of g
It starts with finding a recursive definition of the function h(n), which counts the zeros in the numbers from 1 to n = (10^k) - 1 when the decimal representations are padded with leading zeros to k digits.
Example: h(999) counts the zero digits for the number representations:
001..009
010..099
100..999
The result would be h(999) = 297.
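This can be checked quickly in irb by padding each number to three digits:
irb> (1..999).sum { |k| k.to_s.rjust(3, '0').count('0') }
=> 297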
Using k = floor(log10(n+1)), k2 = k - 1, and n2 = (10^k2) - 1 = (n-9)/10, the function h turns out to be
h(n) = 9 [k2 + h(n2)] + h(n2) + n2 = 9 k2 + 10 h(n2) + n2
with the initial condition h(0) = 0. It allows to formulate g as
g(n) = 9 [k2 + h(n2)] + g(n2)
with the initial condition g(0) = 0.
From these two definitions we can define the difference d between h and g as well, again as a recursive function:
d(n) = h(n) - g(n) = h(n2) - g(n2) + n2 = d(n2) + n2
with the initial condition d(0) = 0. Trying some examples leads to a geometric series, e.g. d(9999) = d(999) + 999 = d(99) + 99 + 999 = d(9) + 9 + 99 + 999 = 0 + 9 + 99 + 999 = (10^0)-1 + (10^1)-1 + (10^2)-1 + (10^3)-1 = (10^4 - 1)/(10-1) - 4. This gives the closed form
d(n) = n/9 - k
This allows us to express g in terms of g alone:
g(n) = 9 [k2 + h(n2)] + g(n2) = 9 [k2 + g(n2) + d(n2)] + g(n2) = 9 k2 + 9 d(n2) + 10 g(n2) = 9 k2 + n2 - 9 k2 + 10 g(n2) = 10 g(n2) + n2
which is exactly the recurrence implemented by the Ruby function g above.
Derivation of G
Using the above definitions and naming the k digits of the representation q_k, q_k2, .., q2, q1 we first extend h into H:
H(q_k q_k2..q_1) = q_k [k2 + h(n2)] + r (k2-kr) + H(q_kr..q_1) + n2
with initial condition H(q_1) = 0 for q_1 <= 9.
Note the additional definition r = q_kr..q_1. To understand why it is needed look at the example H(901), where the next level call to H is H(1), which means that the digit string length shrinks from k=3 to kr=1, needing an additional padding with r (k2-kr) zero digits.
Using this, we can extend g to G as well:
G(q_k q_k2..q_1) = (q_k-1) [k2 + h(n2)] + k2 + r (k2-kr) + H(q_kr..q_1) + g(n2)
with initial condition G(q_1) = 0 for q_1 <= 9.
Note: It is likely that the above expressions can be simplified like in the case of g, e.g. by expressing G just in terms of G without using h and H. I might do this in the future. The above is already enough to implement a fast zero count.
Test Result
recursive(1234567890, 2345678901) =
987654304
expected:
987654304
success
See the source and log for details.
Update: I changed the source and log according to the more detailed problem description from that contest (allowing 0 as input, handling invalid inputs, 2nd larger example).
You can use a standard approach: compute the zero count m for the interval [1, M-1] and n for [1, N]; the count for [M, N] is then n - m.
Standard approaches are easily available: Counting zeroes.
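For reference, here is a sketch of such a standard digit-position count in Ruby (matching the code above; count_zeros_upto and zeros_between are names I made up). For each non-leading digit position, it counts how often that position holds a 0 among the numbers 1..n:

# Count the zero digits in the decimal representations of 1..n.
def count_zeros_upto(n)
  return 0 if n <= 0
  count = 0
  p = 1                       # place value of the current digit position
  while p * 10 <= n           # the position must not be the leading digit
    high = n / (p * 10)       # digits to the left of the position
    cur  = (n / p) % 10       # the digit at the position itself
    low  = n % p              # digits to the right of the position
    count += if cur > 0
               high * p       # a full cycle of zeros for each left prefix
             else
               (high - 1) * p + low + 1
             end
    p *= 10
  end
  count
end

def zeros_between(m, n)
  count_zeros_upto(n) - count_zeros_upto(m - 1)
end

zeros_between(1, 100)                   #=> 11
zeros_between(1234567890, 2345678901)   #=> 987654304, the expected answer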

Confused regarding pre-processed table in KMP

Looking over the KMP algorithm, I am confused about a specific line in the procedure that computes the table of suffix-prefix counts.
algorithm kmp_table:
    input:
        an array of characters, W (the word to be analyzed)
        an array of integers, T (the table to be filled)
    output:
        nothing (but during operation, it populates the table)

    define variables:
        an integer, pos ← 2 (the current position we are computing in T)
        an integer, cnd ← 0 (the zero-based index in W of the next
            character of the current candidate substring)

    (the first few values are fixed but different from what the algorithm
        might suggest)
    let T[0] ← -1, T[1] ← 0

    while pos is less than the length of W, do:
        (first case: the substring continues)
        if W[pos - 1] = W[cnd],
            let cnd ← cnd + 1, T[pos] ← cnd, pos ← pos + 1
        (second case: it doesn't, but we can fall back)
        otherwise, if cnd > 0, let cnd ← T[cnd]
        (third case: we have run out of candidates. Note cnd = 0)
        otherwise, let T[pos] ← 0, pos ← pos + 1
The above is taken straight from Wikipedia. I'm a bit confused: if cnd > 0, why set cnd ← T[cnd]? Shouldn't cnd be reset back to 0, as if we're starting again?
Obviously T[0] = -1, so setting cnd to T[cnd = 0] = -1 would amount to reading W[cnd = -1] on the next iteration, which is outside the string. For that reason alone, cnd > 0 and cnd == 0 need separate treatment.
The real reason we compare cnd to 0 is that T[cnd] is supposed to give us the position in W[] to which to rewind when there's a partial string match to the left of W[cnd]. T[0], however, can't be used for this purpose because there's nothing to the left of W[0].
why set cnd := T[cnd], shouldn't cnd be reset back to 0 as if we're starting again?
You are missing the whole point of the algorithm. If you restart at position 0 after a partial match, you are back at the naïve algorithm. T[] contains the rewind positions, which as you can see from the sample tables with W[] and T[] right below, aren't always 0. So, instead of going all the way back to position 0, you sometimes go to other positions and continue matching from there. And that's what makes the algorithm more scalable than the naïve one.
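To see the fallback in action, here is a direct Python transcription of the pseudocode above (a sketch I wrote; variable names follow the pseudocode). For W = "AAABAAA", while computing T[4] the algorithm first falls back from cnd = 2 to cnd = T[2] = 1, and only then to 0:

def kmp_table(W):
    # Build the KMP partial-match table T for the word W,
    # following the Wikipedia pseudocode above.
    T = [0] * len(W)
    T[0] = -1
    pos, cnd = 2, 0
    while pos < len(W):
        if W[pos - 1] == W[cnd]:      # first case: the substring continues
            cnd += 1
            T[pos] = cnd
            pos += 1
        elif cnd > 0:                 # second case: fall back, possibly to a
            cnd = T[cnd]              # shorter prefix that still matches
        else:                         # third case: out of candidates
            T[pos] = 0
            pos += 1
    return T

print(kmp_table("AAABAAA"))   # [-1, 0, 1, 2, 0, 1, 2]

Note the non-zero entries: after a mismatch, the search resumes at those positions instead of restarting at 0.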
