Conditional Prolog Sorting approach

I am new to the Prolog language and came across an interesting problem.
Generally, quicksort works well for large lists, but for smaller lists insertion sort performs better. How can I write a sorting algorithm in Prolog that uses quicksort initially, but switches to insertion sort for sublists of 15 or fewer elements?
The hint is that we can count the number of elements during the partition operation, but I don't know how to form an algorithm for this problem. Can anyone please guide/help me?
Thanks a lot in advance.

partition puts each element into one or the other sublist, so just maintain two more arguments that count the sublists' elements: start both counts at 0 and increment the corresponding counter when adding another element to its sublist:
% part(List, Left, Right, CountLeft, CountRight): uses the head of List
% as the pivot and partitions the tail, counting each sublist's elements
% as they are added.
part([], [], [], 0, 0).
part([P|LS], L, R, CL, CR) :- part(P, LS, L, [], R, [], 0, CL, 0, CR).

part(_, [], LZ, LZ, RZ, RZ, CL, CL, CR, CR).
part(P, [X|LS], L, LZ, R, RZ, IL, CL, JR, CR) :-
    (   X < P
    ->  L = [X|T], I2 is IL + 1,
        part(P, LS, T, LZ, R, RZ, I2, CL, JR, CR)
    ;   % symmetric case: X goes to the right sublist, bump its counter
        R = [X|T], J2 is JR + 1,
        part(P, LS, L, LZ, T, RZ, IL, CL, J2, CR)
    ).

You can write several clauses for the mySort rule that choose the algorithm based on the length of the list, like this:
mySort(In, Out) :-
    length(In, Cnt),
    Cnt < 15,
    insertionSort(In, Out).
mySort(In, Out) :-
    length(In, Cnt),
    Cnt >= 15,
    quickSort(In, Out).

quickSort(In, Out) :-
    partition(In, Left, Right),
    mySort(Left, SortedLeft),
    mySort(Right, SortedRight),
    mergeSorted(SortedLeft, SortedRight, Out).
The trick is that the quickSort/2 rule calls mySort/2, not quickSort/2, after partitioning the input. This means that as soon as a partition's length drops below 15, insertionSort is used to sort it.

Related

Algorithm that finds the k-greatest number in O(n*log(k))

I was wondering: given an unsorted array of any length n >= k, how would you find the k-greatest number in O(n*log(k)) time? For example, the k = 2 greatest number of an array containing the numbers 1 to 9 would be 8.
I'm trying to code this in Python, if you have an idea how to do it in that time complexity :)
My answer is not Python-specific; however, you should be able to implement the concepts used here in Python, or find libraries that already implement them.
The basic idea is to iterate over the list and store the current greatest, second-greatest, ..., k-greatest number in a separate data structure. Since you will be iterating over all n entries in your array, the complexity of this is in O(n * insertion_step_complexity).
As seen above, the insertion step must not exceed a complexity of O(log(k)). To achieve this you can use an AVL tree, which has a complexity of O(log(m)) for inserting and deleting items, where m is the number of items it stores; here the tree never holds more than k items.
An algorithm would look like this:
def find_k_greatest_number(k, array):
    avl_tree = ...  # initialize an empty AVL tree here
    avl_items = 0
    for number in array:
        if avl_items < k:
            avl_tree.insert(number)             # fill the tree up to k items
            avl_items += 1
        elif number > avl_tree.smallest_number():
            avl_tree.delete_smallest_number()   # evict the old k-greatest
            avl_tree.insert(number)
    return avl_tree.smallest_number()
Finding the smallest number in a sorted tree depends on its height. Since an AVL tree holding at most k items can't exceed a height proportional to log(k), the complexity of finding the smallest number is O(log(k)).
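If you would rather not implement an AVL tree, a binary min-heap gives the same O(log(k)) bounds for inserting and for deleting the smallest item. Here is a runnable sketch using Python's heapq module (the function name is mine):

import heapq

def find_k_greatest_number(k, array):
    heap = []                                 # min-heap; heap[0] is its smallest item
    for number in array:
        if len(heap) < k:
            heapq.heappush(heap, number)      # fill the heap up to k items
        elif number > heap[0]:
            heapq.heapreplace(heap, number)   # pop the smallest, push number
    return heap[0]                            # the k-greatest number

print(find_k_greatest_number(2, [1, 2, 3, 4, 5, 6, 7, 8, 9]))  # prints 8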

Valid Sudoku: How to decrease runtime

The problem is to check whether the given 2D array represents a valid Sudoku or not. Given below are the required conditions:
Each row must contain the digits 1-9 without repetition.
Each column must contain the digits 1-9 without repetition.
Each of the 9 3x3 sub-boxes of the grid must contain the digits 1-9 without repetition.
Here is the code I prepared for this. Please give me tips on how I can make it faster and reduce the runtime, and whether my use of dictionaries is slowing the program down.
def isValidSudoku(self, boards: List[List[str]]) -> bool:
    r = {}                                   # (value, column) pairs seen so far
    a = {}                                   # (value, box) pairs seen so far
    for i in range(len(boards)):
        c = {}                               # values seen in the current row
        for j in range(len(boards[i])):
            if boards[i][j] != '.':
                x, y = r.get(boards[i][j] + f'{j}', 0), c.get(boards[i][j], 0)
                u, v = (i + 3) // 3, (j + 3) // 3
                z = a.get(boards[i][j] + f'{u}{v}', 0)
                if x == 0 and y == 0 and z == 0:
                    r[boards[i][j] + f'{j}'] = x + 1
                    c[boards[i][j]] = y + 1
                    a[boards[i][j] + f'{u}{v}'] = z + 1
                else:
                    return False
    return True
Simply optimizing assignment without rethinking your algorithm limits your overall efficiency by a lot: when you make a choice, you generally take a long time before discovering a contradiction.
Instead of representing, "Here are the values that I have figured out", try to represent, "Here are the values that I have left to try in each spot." And now your fundamental operation is, "Eliminate this value from this spot." (Remember, getting it down to 1 propagates to eliminating the value from all of its peers, potentially recursively.)
Assignment is now "Eliminate all values but this one from this spot."
And now your fundamental search operation is, "Find the square with the least number of remaining possibilities > 1. Try each possibility in turn."
This may feel heavyweight. But the immediate propagation of constraints results in very quickly discovering constraints on the rest of the solution, which is far faster than having to do exponential amounts of reasoning before finding the logical contradiction in your partial solution so far.
I recommend doing this yourself, but https://norvig.com/sudoku.html has full working code that you can look at as needed.
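To make the elimination step concrete, here is a minimal sketch, assuming values maps each cell to its set of remaining candidate digits and peers maps each cell to the cells sharing its row, column, or box (both names are my own):

def eliminate(values, peers, cell, digit):
    # Remove digit from cell's candidates and propagate whenever a cell
    # is reduced to a single candidate. Returns False on contradiction.
    if digit not in values[cell]:
        return True                  # already eliminated, nothing to do
    values[cell].discard(digit)
    if not values[cell]:
        return False                 # contradiction: no candidates left
    if len(values[cell]) == 1:
        forced = next(iter(values[cell]))
        # the forced value can no longer appear in any peer of this cell
        return all(eliminate(values, peers, p, forced) for p in peers[cell])
    return True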

How does Duval's algorithm handle odd-length strings?

Finding the lexicographically minimal string rotation is a well-known problem, for which a linear-time algorithm was proposed by Jean Pierre Duval in 1983. This blog post is probably the only publicly available resource that discusses the algorithm in detail. However, Duval's algorithm is based on the idea of pairwise comparisons ("duels"), and the blog conveniently uses an even-length string as an example.
How does the algorithm work for odd-length strings, where the last character wouldn't have a competing one to duel with?
One character can get a "bye", where it wins without participating in a "duel". The correctness of the algorithm does not rely on the specific duels that you perform; given any two distinct indices i and j, you can always conclusively rule out that one of them is the start index of the lexicographically minimal rotation (unless both are start indices of identical lexicographically minimal rotations, in which case it doesn't matter which one you reject).
The reason to perform the duels in a specific order is performance: you get asymptotically linear time by ensuring that half the duels only need to compare one character, half of the rest only need to compare two characters, and so on, until the last duel only needs to compare half the length of the string. But a single odd character here and there doesn't change the asymptotic complexity; it just makes the math (and implementation) a little more complicated. A string of length 2n+1 still requires fewer "duels" than one of length 2n+2.
OP here: I accepted ruakh's answer as it pertains to my question, but I wanted to provide my own explanation for others who might stumble across this post trying to understand Duval's algorithm.
Problem:
Lexicographically least circular substring is the problem of finding
the rotation of a string possessing the lowest lexicographical order
of all such rotations. For example, the lexicographically minimal
rotation of "bbaaccaadd" would be "aaccaaddbb".
Solution:
An O(n) time algorithm was proposed by Jean Pierre Duval (1983).
Given two indices i and j, Duval's algorithm compares string segments of length j - i starting at i and j (called a "duel"). If a segment would run past the end of the string (i.e. its start index plus j - i exceeds the string's length), the segment is formed by wrapping around.
For example, consider s = "baabbaba", i = 5 and j = 7. Since j - i = 2, the first segment, starting at i = 5, is "ab". The second segment, starting at j = 7, is constructed by wrapping around, and is also "ab".
If the segments are lexicographically equal, as in the above example, we choose the one starting at i as the winner, which is i = 5.
The above process is repeated until we have a single winner. If the input string is of odd length, the last character wins without a comparison in the first iteration.
Time complexity:
The first iteration compares n strings each of length 1 (n/2 comparisons), the second iteration may compare n/2 strings of length 2 (n/2 character comparisons), and so on, until the last iteration compares 2 strings of length n/2 (n/2 comparisons). Since the number of winners is halved each time, the height of the recursion tree is log(n), which gives this tournament presentation an O(n log(n)) bound. (Duval's full algorithm, with the careful duel ordering described in the accepted answer, runs in O(n).)
Space complexity is O(n) too, since the first iteration stores n/2 winners, the second n/4 winners, and so on. (Wikipedia claims this algorithm uses constant space; I don't understand how.)
Here's a Scala implementation; feel free to convert to your favorite programming language.
import scala.annotation.tailrec

def lexicographicallyMinRotation(s: String): String = {
  @tailrec
  def duel(winners: Seq[Int]): String = {
    if (winners.size == 1) s"${s.slice(winners.head, s.length)}${s.take(winners.head)}"
    else {
      val newWinners: Seq[Int] = winners
        .sliding(2, 2)
        .map {
          case Seq(x, y) =>
            val range = y - x
            Seq(x, y)
              .map { i =>
                // segment of length `range` starting at i, wrapping around if needed
                val segment = if (s.isDefinedAt(i + range - 1)) s.slice(i, i + range)
                else s"${s.slice(i, s.length)}${s.take(i + range - s.length)}"
                (i, segment)
              }
              .reduce((a, b) => if (a._2 <= b._2) a else b)
              ._1
          case xs => xs.head // odd index out gets a "bye"
        }
        .toSeq
      duel(newWinners)
    }
  }
  duel(s.indices)
}
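To sanity-check any implementation, you can compare it against a brute-force reference that simply tries all n rotations (quadratic, but obviously correct); a short one in Python:

def min_rotation_brute_force(s):
    # generate every rotation and take the lexicographically smallest
    return min(s[i:] + s[:i] for i in range(len(s)))

print(min_rotation_brute_force("bbaaccaadd"))  # prints "aaccaaddbb"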

Finding the minimum number of swaps to convert one string to another, where the strings may have repeated characters

I was looking through a programming question when the following question suddenly seemed related.
How do you convert one string to another using as few swaps as possible? The strings are guaranteed to be interconvertible (they have the same multiset of characters; this is given), but characters can be repeated. I found web results on the same question, though without repeated characters.
Any two characters in the string can be swapped.
For instance: "aabbccdd" can be converted to "ddbbccaa" in two swaps, and "abcc" can be converted to "accb" in one swap.
Thanks!
This is an expanded and corrected version of Subhasis's answer.
Formally, the problem is: given an n-letter alphabet V and two m-letter words, x and y, for which there exists a permutation p such that p(x) = y, determine the least number of swaps (permutations that fix all but two elements) whose composition q satisfies q(x) = y. Assuming that m-letter words are maps from the set {1, ..., m} to V and that p and q are permutations on {1, ..., m}, the action p(x) is defined as the composition p followed by x.
The least number of swaps whose composition is p can be expressed in terms of the cycle decomposition of p. When j_1, ..., j_k are pairwise distinct in {1, ..., m}, the cycle (j_1 ... j_k) is a permutation that maps j_i to j_{i+1} for i in {1, ..., k - 1}, maps j_k to j_1, and maps every other element to itself. The permutation p is the composition of every distinct cycle (j p(j) p(p(j)) ... j'), where j is arbitrary and p(j') = j. The order of composition does not matter, since each element appears in exactly one of the composed cycles. A k-element cycle (j_1 ... j_k) can be written as the product (j_1 j_k) (j_1 j_{k-1}) ... (j_1 j_2) of k - 1 swaps. In general, a permutation whose cycle decomposition comprises c cycles can be written as a composition of m - c swaps, and a straightforward induction proof shows that this is optimal. For example, the permutation (1 2 3)(4 5) on {1, ..., 5} has c = 2 cycles, so it needs 5 - 2 = 3 swaps.
Now we get to the heart of Subhasis's answer. Instances of the asker's problem correspond one-to-one with Eulerian digraphs G (for every vertex, in-degree equals out-degree) with vertex set V and m arcs labeled 1, ..., m. For j in {1, ..., m}, the arc labeled j goes from y(j) to x(j). The problem in terms of G is to determine how many parts a partition of the arcs of G into directed cycles can have. (Since G is Eulerian, such a partition always exists.) This is because the permutations q such that q(x) = y are in one-to-one correspondence with the partitions, as follows: for each cycle (j_1 ... j_k) of q, there is a part whose directed cycle comprises the arcs labeled j_1, ..., j_k.
The problem with Subhasis's NP-hardness reduction is that arc-disjoint cycle packing on Eulerian digraphs is a special case of arc-disjoint cycle packing on general digraphs, so an NP-hardness result for the latter has no direct implications for the complexity status of the former. In very recent work (see the citation below), however, it has been shown that, indeed, even the Eulerian special case is NP-hard. Thus, by the correspondence above, the asker's problem is as well.
As Subhasis hints, this problem can be solved in polynomial time when n, the size of the alphabet, is fixed (fixed-parameter tractable). Since there are O(n!) distinguishable cycles when the arcs are unlabeled, we can use dynamic programming on a state space of size O(m^n), the number of distinguishable subgraphs. In practice, that might be sufficient for (let's say) a binary alphabet, but if I were to try to solve this problem exactly on instances with large alphabets, then I likely would try branch and bound, obtaining bounds by using linear programming with column generation to pack cycles fractionally.
@article{DBLP:journals/corr/GutinJSW14,
  author    = {Gregory Gutin and Mark Jones and Bin Sheng and Magnus Wahlstr{\"o}m},
  title     = {Parameterized Directed $k$-Chinese Postman Problem and $k$ Arc-Disjoint Cycles Problem on Euler Digraphs},
  journal   = {CoRR},
  volume    = {abs/1402.2137},
  year      = {2014},
  ee        = {http://arxiv.org/abs/1402.2137},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
You can construct the "difference" strings S and S', i.e. strings containing the characters at the differing positions of the two input strings; e.g. for acbacb and abcabc they are cbcb and bcbc. Say these contain n characters.
You can now construct a "permutation graph" G which has n nodes and an edge from i to j if S[i] == S'[j]. In the case of all-unique characters, it is easy to see that the required number of swaps is n - (number of cycles in G), which can be found in O(n) time; see the sketch below.
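As an illustration of the unique-character case, here is a short Python sketch that builds the permutation and counts its cycles (the function name is mine):

def min_swaps_unique(s, t):
    # where each character of s has to go in t (valid only when all characters are unique)
    pos_in_t = {ch: i for i, ch in enumerate(t)}
    perm = [pos_in_t[ch] for ch in s]
    seen = [False] * len(perm)
    cycles = 0
    for i in range(len(perm)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:        # walk one full cycle
                seen[j] = True
                j = perm[j]
    return len(perm) - cycles         # n minus the number of cycles

print(min_swaps_unique("abcd", "dbca"))  # prints 1 (swap 'a' and 'd')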
However, in the case where characters can repeat, this reduces to the problem of finding the largest number of cycles in a directed graph, which, I think, is NP-hard (e.g. check out: http://www.math.ucsd.edu/~jverstra/dcig.pdf).
In that paper a few greedy algorithms are pointed out, one of which is particularly simple:
At each step, find the minimum length cycle in the graph (e.g. Find cycle of shortest length in a directed graph with positive weights )
Delete it
Repeat until all vertices have been covered.
However, there may be efficient algorithms utilizing the properties of your case (the only one I can think of is that your graphs will be K-partite, where K is the number of unique characters in S). Good luck!
Edit:
Please refer to David's answer for a fuller and correct explanation of the problem.
Do an A* search (see http://en.wikipedia.org/wiki/A-star_search_algorithm for an explanation) for the shortest path through the graph of equivalent strings from one string to the other. Use the Levenshtein distance / 2 as your cost heuristic.

Haskell function taking a long time to process

I am doing question 12 of Project Euler, where I must find the first triangle number with 501 divisors. So I whipped this up in Haskell:
divS n = [ x | x <- [1..n], n `rem` x == 0 ]
tri n = (n * (n + 1)) `div` 2
divL n = length (divS (tri n))
answer = [ x | x <- [100..], 501 == divL x ]
The first function finds the divisors of a number.
The second function calculates the nth triangle number.
The third function finds the length of the list of divisors of the triangle number.
The fourth function should return the value of the triangle number which has 501 divisors.
But so far this has run for a while without returning a result. Is the answer very large, or do I need some serious optimisation to make this work in a realistic amount of time?
You need to use properties of the divisor function: http://en.wikipedia.org/wiki/Divisor_function
Notice that n and n + 1 are always coprime, so you can get d(n * (n + 1) / 2) by multiplying previously computed values: since exactly one of n and n + 1 is even, d(T_n) = d(n / 2) * d(n + 1) when n is even, and d(n) * d((n + 1) / 2) when n is odd.
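A small sketch of this coprime trick (in Python for brevity; the helper names are mine, and the naive divisor count is only there to make the example self-contained):

def num_divisors(m):
    # naive divisor count, just to keep the example self-contained
    return sum(1 for d in range(1, m + 1) if m % d == 0)

def num_divisors_triangle(n):
    # T_n = n * (n + 1) / 2; the two factors left after halving the even
    # one are coprime, and d() is multiplicative on coprime arguments
    if n % 2 == 0:
        return num_divisors(n // 2) * num_divisors(n + 1)
    return num_divisors(n) * num_divisors((n + 1) // 2)

print(num_divisors_triangle(7))  # T_7 = 28 has 6 divisors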
It is probably faster to prime-factorise the number and then use the factorisation to find the divisors, than using trial division with all numbers <= sqrt(n).
The Sieve of Eratosthenes is a classical way of finding primes, which may be modified slightly to find the number of divisors of each natural number. Instead of just marking each non-prime as "not prime", you could make a list of all the primes dividing each number.
You can then use those primes to compute the complete set of divisors, or just the number of them, since that is all you need.
Another variation would be to mark not just multiples of primes, but multiples of all natural numbers. Then you could simply use a counter to keep track of the number of divisors for each number.
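The counting variation might look like this (sketched in Python for brevity; the same nested-loop structure ports to a mutable array in Haskell's ST monad):

def divisor_counts(limit):
    # counts[i] ends up holding the number of divisors of i
    counts = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):   # mark every multiple of d
            counts[multiple] += 1
    return counts

counts = divisor_counts(100)
print(counts[28])  # prints 6: the divisors are 1, 2, 4, 7, 14, 28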
You also might want to check out The Genuine Sieve of Eratosthenes, which explains why trial division is way slower than the real sieve.
Lastly, you should look carefully at the different kinds of arrays in Haskell. I think it is probably easier to use the ST monad to implement the sieve, but it might be possible to achieve the correct complexity using accumArray, if you can make sure that your update function is strict. I have never managed to get this to work, though, so you are on your own here.
If you were using C instead of Haskell, your function would still take a long time.
To make it faster you will need to improve the algorithm, using suggestions from the above answers. I suggest changing the title and question description accordingly; following that, I'll delete this comment.
If you wish, I can spoil the problem by sharing my solution.
For now I'll give you my top-level code:
main =
  print .
  head . filter ((> 500) . length . divisors) .
  map (figureNum 3) $ [1..]
The algorithmic improvement lies in the divisors function. You can improve it further using rawicki's suggestion, but even this version takes less than 100 ms.
Some optimization tips:
check for divisors only between 1 and sqrt(n); every divisor d you find there pairs with the divisor n/d above that limit, so you won't miss any.
don't build a list of divisors and count it, but count them directly (see the sketch below).
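Both tips combined might look like the following (in Python for brevity; the translation to a Haskell loop or fold is direct):

def num_divisors(n):
    count = 0
    d = 1
    while d * d <= n:                         # only search up to sqrt(n)
        if n % d == 0:
            count += 1 if d * d == n else 2   # d and n // d form a pair
        d += 1
    return count

print(num_divisors(28))  # prints 6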
