How to determine what character would be at a given index in a sorted string without hashing or sorting?

We are given a string and an integer. We have to tell which character would be at that integer position in the string if its characters were placed in sorted order.
For example:
String = LALIT
Index = 3
The sorted string is AILLT, and the character at position 3 is L.
Is it possible to solve this problem without sorting?
If yes, can someone provide pseudocode?

Yes, it's possible to do this. You're looking for something called a selection algorithm which, given a list of elements and a number k, returns what element would be in position k if the elements were to be in sorted order. Amazingly enough, it's possible to do this without sorting the entire list!
The simplest non-sorting algorithm for selection is called quickselect, which runs in expected time O(n) and, provided you're allowed to modify the original array, uses only O(1) auxiliary storage space. The idea behind quickselect is to do a single step of quicksort - pick a pivot element, partition the elements into elements less than the pivot, elements equal to the pivot, and elements greater than the pivot - then to see what happens based on that. If the pivot element ends up in position k after this step, then you're done - that's the element that would be at position k in the final sequence. If the pivot is at a position higher than k, recursively look to the left of the pivot (the kth smallest element is somewhere in there), and if the pivot is at a position lower than k, recursively look to the right of the pivot (the kth smallest element is somewhere in there).
Other approaches exist as well, such as the median-of-medians algorithm that always runs in worst-case O(n) time but is a classic "tricky algorithm to wrap your head around."
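As a rough illustration of the quickselect idea described above, here is a minimal Python sketch. The function name quickselect is just a label chosen here, k is assumed to be a 0-based index, and for readability this version builds new lists at each step rather than partitioning in place, so it uses O(n) extra space instead of the O(1) mentioned above.

```python
import random

def quickselect(items, k):
    """Return the element that would be at 0-based index k if items were sorted."""
    items = list(items)  # work on a copy so the caller's data is untouched
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(items)
        # Three-way partition around the pivot.
        less = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        greater = [x for x in items if x > pivot]
        if k < len(less):
            items = less                      # answer lies left of the pivot
        elif k < len(less) + len(equal):
            return pivot                      # pivot occupies position k
        else:
            k -= len(less) + len(equal)       # answer lies right of the pivot
            items = greater

print(quickselect("LALIT", 3))  # prints L
```

For the example in the question, the sorted string is AILLT, so index 3 gives L whether you count from 0 or from 1.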

Related

Find lexicographically smallest string with given hash value [Competitive Coding]

I encountered the following problem for which I couldn't quite find the appropriate solution.
The problem says: for a given string with a specific hash value, find the lexicographically lowest string (different from the given one) of the same length and same hash value, if one exists. E.g. for the following value mapping of the alphabet: {a:0, b:1, c:2,...,z:25}
If the given string is ady, with hash value 27, the lexicographically smallest one (from all possible ones excluding the given one) would be acz.
Solution approach I could think of:
I reduced the problem to the coin-change problem and resorted to finding all possible combinations for the given sum. I then sort all the obtained solutions and pick the lowest (or the next smallest if the given string is the smallest).
The problem, however, lies in finding all possible solutions (even with a DP approach), which might be inefficient for larger inputs.
My question is:
What solution strategy (possibly even greedy) could give a better time complexity than the above?
I cannot guarantee that this will give you a lower complexity, but a couple of things: 1) you don't need to check the whole space, just the part with lexicographic value less than or equal to that of the given string; 2) you can formulate it as an integer programming problem:
Assume your character space is the letters, and each letter is given a numeric index in [0-25], so a corresponds to 0, b to 1, and so forth. Let x_i be the number of letters in your string corresponding to index i. You can formulate your problem as:
min sum_i(wi*xi)
st xi*ai = M
xi>=0,
sum_i(xi)=n
sum_i(wi*xi)<= N
xi integer
Where wi = 26^i, ai is equal to hash(letter(i)), n is the number of letters of the original string, and N is the hash value of the original string. This is an integer programming problem, so you can try plugging it into a solver. The original problem is very similar to the subset sum problem with a fixed subset size (where the hash values are the elements you are summing over, and the subset size is the length of the string), so you might also want to take a look at that, although as you will see from the answer it is a complicated problem.
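Since the question asks about a greedy strategy, here is a small Python sketch of one. It is not the integer program described above. It assumes the hash is simply the sum of the letter values (a=0 ... z=25), which matches the ady example (0 + 3 + 24 = 27), and the function name smallest_with_hash is made up for illustration. It only finds the overall smallest string; the case where that string equals the given one (so the next smallest is needed) is not handled.

```python
def smallest_with_hash(length, target):
    """Smallest lowercase string of `length` letters whose values (a=0..z=25)
    sum to `target`, or None if no such string exists.

    Assumption: the hash is the plain sum of letter values.
    """
    if not 0 <= target <= 25 * length:
        return None
    letters = []
    remaining = target
    for pos in range(length):
        slots_after = length - pos - 1
        # Pick the smallest value that still lets the later slots
        # (each at most 25) absorb the rest of the sum.
        value = max(0, remaining - 25 * slots_after)
        letters.append(chr(ord("a") + value))
        remaining -= value
    return "".join(letters)

print(smallest_with_hash(3, 27))  # prints acz
```

Each position is filled once, so under the stated assumption this takes time linear in the string length.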

Looking for a way to distinguish identical string entries for index use

I am making a function in Python 3.5.2 that reads chemical structures (e.g. CaBr2) and then returns a list with the names of the elements and their coefficients.
The general rundown of how I am doing it: I have a for loop that skips the first letter. It then appends the previous element when it reaches one of: a capital letter, a number, or the end. I did this using the index of my iteration, getting the entry at index(iteration)-1 or -2 depending on the specifics. For the given example it would skip C, read a but do nothing, reach B and append the translation of Ca to my name list, and append 1 to my coefficient list.
This works perfectly for structures with unique entries, but with something like CaCl2, the index of the iteration at the second C is not 2 but zero, as index doesn't differentiate between the two. How can I have variables in my function equal to the values at the previous index(es) without running into this problem? Keep in mind that inputs can be of any length, capitalization cannot change, and there could be any number of repeated values.
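The symptom described above is what you get when a position is looked up by value: "CaCl2".index("C") returns 0 for both capital C's, because index reports the first match. One possible way around it, sketched below, is to let enumerate supply the true position of each character. The function name split_formula is hypothetical, and the sketch returns raw symbols rather than translated element names, since the asker's translation table is not shown.

```python
def split_formula(formula):
    """Split e.g. 'CaCl2' into (['Ca', 'Cl'], [1, 2])."""
    symbols, counts = [], []
    start = 0  # position where the current element symbol begins
    # enumerate yields the real position i, so repeated letters are not confused.
    # The extra "A" is a sentinel capital that flushes the last element.
    for i, ch in enumerate(formula + "A"):
        if ch.isupper() and i > start:
            chunk = formula[start:i]            # e.g. "Cl2"
            name = chunk.rstrip("0123456789")   # e.g. "Cl"
            digits = chunk[len(name):]          # e.g. "2", or "" if absent
            symbols.append(name)
            counts.append(int(digits) if digits else 1)
            start = i
    return symbols, counts

print(split_formula("CaCl2"))  # prints (['Ca', 'Cl'], [1, 2])
```

This assumes every element symbol starts with a capital letter and that coefficients are plain trailing digits; parentheses and similar notation are not handled.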

Why and how does a bad pivot choice make Quicksort O(n^2)?

For example when the pivot is the highest or lowest value in the array.
For a quicksort that uses two pointers, one moves from the left end to the right and the other from the right end to the left. A pointer stops when it finds an element that is out of place with respect to the pivot; when both have stopped, the elements are swapped and the scan continues from that position. But why and how does a bad pivot choice make Quicksort O(n^2)?
Let's say you always pick the smallest element as your pivot. The top-level iteration of quicksort will require n-1 comparisons and will split the array into two subarrays: one of size 1 and one of size n-1. The first one is already sorted, and you apply the quicksort recursively to the second one. Splitting the second one will require n-2 comparisons. And so on.
In total, you have (n-1) + (n-2) + ... + 1 = n * (n-1) / 2 = O(n^2) comparisons.
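To make the count above concrete, here is a small Python sketch that tallies pivot comparisons for a quicksort that always takes the first element as its pivot. The function name and the choice of first-element pivot are assumptions made for the illustration; on already sorted input that pivot is always the smallest element, which is the bad case described above. The tally counts one comparison per non-pivot element per partition step, as in the text, not the number of Python operations.

```python
def count_pivot_comparisons(values):
    """Count pivot comparisons for a quicksort that always uses the
    first element of each subarray as the pivot."""
    comparisons = 0
    pending = [list(values)]        # explicit stack avoids deep recursion
    while pending:
        part = pending.pop()
        if len(part) < 2:
            continue                # 0 or 1 elements: already sorted
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)    # each other element meets the pivot once
        pending.append([x for x in rest if x < pivot])
        pending.append([x for x in rest if x >= pivot])
    return comparisons

n = 100
print(count_pivot_comparisons(range(n)))  # prints 4950
print(n * (n - 1) // 2)                   # prints 4950
```

With a pivot that splits the data more evenly, the same function reports far fewer comparisons for the same n.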
If your chosen pivot happened to be the maximal value in your subset on every recursion, the algorithm would simply move every record read into the subset below the pivot, and continue on with only one non-empty partition. This new subset's size would be only one less.
In that case, quicksort's operation would be similar to a selection sort: it would find a maximal value, put it where it goes, and move on to the rest of the data in the next iteration. The difference is that selection sort searches for the maximal (or minimal) data point, whereas the worst-case quicksort happens to select the maximal value and then discovers that it is, indeed, the maximum.
This is a quite rare case, to my knowledge.
Try a list containing the same number n times.
Choose any way to find a pivot.
See what happens.
(Edit: to give some hints:
The pivot value does not depend on how you choose the pivot, because every element is the same.
So in every iteration, for the current list with n elements, you need n comparisons, and you split the list into two sublists with 1 and n-1 elements.
You can quickly calculate the overall number of operations: you need n, n-1, n-2, ..., 2, 1 operations.
Formally, it is the sum of i from i=1 to n, which equals n*(n+1)/2, so it is O(n*n).)

Lexicographically first fixed size substring for each sliding window position

From a given string, I want to find the substring (of some fixed size k) that comes first in lexicographical sort order among all substrings of the same size in the string.
I would do this with a sliding window over a very long string (size m), and would like to find that substring for every position of the sliding window (size n > k) as I move it through the string.
It seems that the trivial solution would take m*O(n log(n)) time.
I think I could get to m*O(log(n)) if I do a normal sort at the beginning and then, every time I move the window, just remove the substring that starts at the beginning of the last window position and insert the new substring that ends at the end of the current window position into the already sorted collection of substrings. (Of course, I don't store the substrings separately but just keep their positions in the collection, so the space requirement would be just n-k integers.)
Is there a faster algorithm for this?
Let m be the size of the input string and n be the length of the string that you're looking for. I think you can solve this in time O(m) by using suffix trees.
Start by building a suffix tree for the input string. This takes time O(m). Now, do a depth-first search on the tree, always choosing the lexicographically first choice at each step. In the course of doing so, the first string of length n that you find is the lexicographically-first substring of length n. Doing a DFS over a suffix tree for a string of length m takes time O(m), so overall this takes time O(m).

Count no. of words in O(n)

I am on an interview ride here. One more interview question I had difficulties with.
“A rose is a rose is a rose” Write an
algorithm that prints the number of
times a character/word occurs. E.g.
A – 3 Rose – 3 Is – 2
Also ensure that when you are printing
the results, they are in order of
what was present in the original
sentence. All this in order n.
I did get a solution that counts the number of occurrences of each word in the sentence, in the order in which the words appear in the original sentence. I used Dictionary<string,int> to do it. However, I did not understand what is meant by order of n. That is something I need an explanation of from you guys.
There are only 26 characters, so you can use counting sort to sort them. In your counting sort you can keep an index that records when each character was first visited, to preserve the order of occurrence. [They can be sorted by their count and their occurrence with a sort like radix sort.]
Edit: for words, the first thing everyone thinks of is a hash table: insert the words into the hash and count them that way. The counts can then be sorted in O(n), because all of them lie within 1..n, so you can still sort them with counting sort in O(n). For their order of occurrence, you can traverse the string and change the position of equal values.
Order of n means you traverse the string only once, or do some small constant multiple of n steps, where n is the number of characters in the string.
So your solution of storing each string and its number of occurrences is O(n), order of n, as you loop through the complete string only once.
However it uses extra space in form of the list you created.
Order N refers to the Big O computational complexity analysis where you get a good upper bound on algorithms. It is a theory we cover early in a Data Structures class, so we can torment, I mean help the student gain facility with it as we traverse in a balanced way, heaps of different trees of knowledge, all different. In your case they want your algorithm to grow in compute time proportional to the size of the text as it grows.
It's a reference to Big O notation. Basically the interviewer means that you have to complete the task with an O(N) algorithm.
"Order n" is referring to Big O notation. Big O is a way for mathematicians and computer scientists to describe the behavior of a function. When someone specifies searching a string "in order n", that means that the time it takes for the function to execute grows linearly as the length of that string increases. In other words, if you plotted time of execution vs length of input, you would see a straight line.
Saying that your function must be of order n does not mean that it must be exactly O(n); a function with a Big O less than O(n) would also be considered acceptable. In your problem's case, this would not be possible (because in order to count a letter, you must "touch" that letter, so there must be some operation dependent on the input size).
One possible method is to traverse the string linearly while maintaining a hash and a list. The idea is to use the word as the hash key and increment the value for each occurrence. If the word is not yet in the hash, add it to the end of the list. After traversing the string, go through the list in order, using the hash values as the counts.
The order of the algorithm is O(n). The hash lookup and list add operations are O(1) (or very close to it).
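The hash-plus-list method just described could look roughly like this in Python. The function name is made up for the example, and lowercasing the words is an assumption based on the expected output in the question, where "A" and "a" are counted together.

```python
def count_words_in_order(sentence):
    """Count each word, reporting results in order of first appearance."""
    counts = {}   # the hash: word -> number of occurrences
    order = []    # the list: words in the order they were first seen
    for word in sentence.lower().split():
        if word not in counts:
            counts[word] = 0
            order.append(word)   # first sighting: remember its position
        counts[word] += 1
    return [(word, counts[word]) for word in order]

print(count_words_in_order("A rose is a rose is a rose"))
# prints [('a', 3), ('rose', 3), ('is', 2)]
```

Each word is touched once during the pass and once more when the list is read back, so the work grows linearly with the length of the sentence, assuming hash operations take O(1) on average.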

Resources