Find the median of an unsorted array using a heap

Is there a way to find the median of an unsorted array using a heap? If so, is it more efficient than sorting the array and then taking the median?

The trick here is to use two heaps: a max-heap holding the smaller half of the elements and a min-heap holding the larger half, with their sizes kept within one element of each other. I will not go into details, but the following points are sufficient to implement the required algorithm.
The top of the min-heap is the smallest element greater than or equal to the median.
The top of the max-heap is the largest element less than or equal to the median.
As for your second question, this is only worthwhile if you want the running median, i.e. the median just after each new element is inserted into the array. Each insertion costs O(log n), and the current median can then be read from the tops of the heaps in O(1).
If you want the median of all the array elements just once, sorting is simpler and has the same O(n log n) cost.
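The two-heap idea described above can be sketched roughly as follows in Python. The class and method names (`RunningMedian`, `add`, `median`) are made up for illustration, and since `heapq` only provides a min-heap, the max-heap is simulated by storing negated values, which assumes the elements are numbers.

```python
import heapq


class RunningMedian:
    """Illustrative running median using two heaps (names are hypothetical)."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half, stored as negated values
        self._high = []  # min-heap of the larger half

    def add(self, x):
        # Put x in the half it belongs to.
        if self._low and x > -self._low[0]:
            heapq.heappush(self._high, x)
        else:
            heapq.heappush(self._low, -x)
        # Rebalance so that len(low) == len(high) or len(low) == len(high) + 1.
        if len(self._low) > len(self._high) + 1:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        elif len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no elements added yet")
        if len(self._low) > len(self._high):
            return -self._low[0]                    # odd count: top of the max-heap
        return (-self._low[0] + self._high[0]) / 2  # even count: average of both tops


rm = RunningMedian()
medians = []
for value in [5, 2, 8, 1]:
    rm.add(value)
    medians.append(rm.median())
```

After each call to `add`, `median()` reflects all elements inserted so far, which is the case where this approach pays off compared with re-sorting.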
Hope this helps.

Related

What's the best way to choose a pivot for quicksort?

Some people told me there is a list of optimized pivots for quicksort, but I searched the net and didn't find it.
Apparently this list contains a lot of prime numbers, but also many others (nowadays we aren't able to explain why these pivots are the best).
So if you know something about it or have some documentation, I'm interested.
If you know another way to optimize quicksort, I'm interested too.
Thanks in advance
One sort that uses a list of numbers is Shellsort, where the numbers are used for the "gaps":
https://en.wikipedia.org/wiki/Shellsort#Gap_sequences
For quicksort, using median of 3 will help, median of 9 helps a bit more, and median of medians guarantees worst-case O(n log(n)) time complexity, but involves a large constant factor that in most cases results in a slower overall quicksort.
https://en.wikipedia.org/wiki/Median_of_medians
Introsort, which uses a reasonable pivot choice (random, median of 3, 9, ...) and switches to heapsort if the level of recursion becomes too deep, is a common choice.
https://en.wikipedia.org/wiki/Introsort
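As a rough illustration of the median of 3 idea, here is a minimal Python sketch. The function names are hypothetical, the partition is a plain two-pointer scheme, and the code recurses only into the smaller side to keep the recursion shallow. It is not introsort (there is no heapsort fallback), so bad inputs can still make it slow.

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]


def quicksort(a, lo=0, hi=None):
    """Sort list a in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[median_of_three(a, lo, hi)]
        i, j = lo, hi
        # Two-pointer partition around the pivot value.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller part, loop on the larger part.
        if j - lo < hi - i:
            quicksort(a, lo, j)
            lo = i
        else:
            quicksort(a, i, hi)
            hi = j


data = [9, 3, 7, 1, 8, 2, 5, 5, 0]
quicksort(data)
```

On already sorted input, the median of the first, middle, and last elements is the middle element, so the partitions stay balanced in that case.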
A simple and common choice is to pick the middle element of the list as the pivot.
Why?
In principle, a good way to pick a pivot for a list of numbers is to pick it at random. However, that needs a call to a random number generator for every partition, which adds some overhead.
What if we just select the first element as the pivot and say that's somehow "random" for the list?
If the list is already sorted and we select the first element as the pivot, the algorithm degenerates to a time complexity of O(n^2) instead of the average O(n log n).
Therefore, to avoid the extra overhead and to not degenerate on sorted input, the quickest, easiest, and most common fix is to use the middle element of the list as the pivot. On already sorted input this gives O(n log n). It does not guarantee O(n log n) in general, but the algorithm is unlikely to degenerate unless the input is ordered purposely in a way so as to degenerate it.

Excel: Generating a set of numbers with normal distribution with MIN and MAX

I want to generate a single column of 6000 numbers with a normal distribution, with a mean of 30.15, standard deviation of 49.8, minimum of -11.5, and maximum of 133.5.
I am a total newbie at this, so I tried to use the following formula in a cell and then just drag it down to cell 6000:
=NORMINV(RANDBETWEEN(-11.5,133.5)/100,30.15,49.8)
It returns a value but sometimes it returns #NUM! error. Thank you!
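Unfortunately NORMINV expects a probability for its first argument, which must be a value in the open interval (0, 1). Your RANDBETWEEN(-11.5,133.5)/100 expression can produce values outside that interval (negative values, or values above 1), and any such value will yield #NUM!.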
What you're asking cannot be done directly with a normal distribution since that has no constraints on the minimum and maximum values.
One approach is to use a primary column to generate the normally distributed numbers, then keep only the ones that fall between your minimum and maximum in the adjacent column. But this will cause even the mean (let alone higher moments) to go off quite considerably, because your minimum and maximum values are not equidistant from the mean. You could get round this by recentering the distribution and adjusting afterwards.
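The generate-then-filter approach can be sketched outside Excel as well. The following is a minimal Python illustration using the standard `random` module; the function name `truncated_normal_sample` and the `seed` parameter are assumptions made for the example. As noted above, the kept values will not have exactly the requested mean and standard deviation, because the two cut-offs are not symmetric around the mean.

```python
import random


def truncated_normal_sample(mean, sd, lo, hi, n, seed=None):
    """Draw n normal values, discarding any that fall outside [lo, hi]."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        x = rng.gauss(mean, sd)
        if lo <= x <= hi:   # keep only values inside the allowed range
            kept.append(x)
    return kept


values = truncated_normal_sample(30.15, 49.8, -11.5, 133.5, 6000, seed=1)
sample_mean = sum(values) / len(values)  # differs from 30.15 because of the filtering
```

Comparing `sample_mean` with 30.15 shows how far the filtering moves the mean for these particular bounds.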

Statistic with Median

So I have an excel-sheet where I have different values, for example:
I want to judge these objects by their values. The values should be between 0 and 1 so that in the end I can draw a matrix. So far so good. What you could do is just take the maximum value and divide the value of each object by that maximum value. For the final result I just take the average of all 3 values.
Now I have the problem that if one value is too big, it affects the whole situation. I know that the median tries to resolve this, but how can I use it to get the percentages/values between 0 and 1? And is there an easy way to do this in Excel?
Does Excel not have a MEDIAN function?
Otherwise, you can sort the values and take the middle value if the number of rows is odd, or take the average of the two middle values if the number of rows is even, to get the median.
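The sort-and-pick-the-middle procedure is short enough to write out. This is a small Python sketch of it (the function name `median_by_sorting` is made up for the example); Python's standard library also offers `statistics.median` for the same purpose.

```python
def median_by_sorting(values):
    """Median via sorting: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values


odd_case = median_by_sorting([3, 1, 2])
even_case = median_by_sorting([4, 1, 3, 2])
```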

Can you estimate percentiles in unordered data?

Suppose you have a very large list of numbers which would be expensive to sort. They are real numbers/decimals but all lie in the same range, say 0 to n for some integer n. Are there any methods for estimating percentiles that don't require sorting the data, i.e. an algorithm that has better complexity than the fastest sorting algorithm?
Note: The tag is quantiles only because there is no existing tag for percentiles and it wouldn't let me create one; my question is not specific to quantiles.
In order to find the p-th percentile of a set of N numbers, essentially you are trying to find the k-th smallest number, where k = N*p/100 (rounded down, I think; or, on second thought, thinking of the median for example, maybe it's rounded up).
You might try the median of medians algorithm, which can find the k-th smallest number among N numbers in O(N) worst-case time.
I don't know where this is implemented in a standard library, but a proposed implementation was posted in one of the answers to this question.
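To make the selection idea concrete, here is a minimal Python sketch. Note that it uses quickselect with a random pivot rather than median of medians, so it runs in O(N) time on average, not in the worst case. The function names are hypothetical, and the rank is rounded up (the "nearest rank" convention), which is only one of several percentile definitions.

```python
import math
import random


def kth_smallest(values, k):
    """Return the k-th smallest element (1-based) without fully sorting."""
    items = list(values)
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(values)")
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(lows):
            items = lows                      # answer is among the smaller elements
        elif k <= len(lows) + equal_count:
            return pivot                      # answer is the pivot itself
        else:
            k -= len(lows) + equal_count      # skip everything <= pivot
            items = [x for x in items if x > pivot]


def percentile(values, p):
    """p-th percentile using the nearest-rank rule (rank rounded up)."""
    n = len(values)
    k = max(1, math.ceil(p * n / 100))
    return kth_smallest(values, k)


data = list(range(1, 101))
random.shuffle(data)
p50 = percentile(data, 50)
p90 = percentile(data, 90)
```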

Applying dp for values from 1-10^9

I want to store the Fibonacci numbers from 1 to 10^9 using dp. How can this be done when the maximum size of an array is much less than that?
Not possible. You can't store all the numbers if the maximum allowed array cannot hold them. However, you need not store all the values to compute only a single value: you need only 3 variables. There is also a logarithmic solution for finding F(n), the n-th Fibonacci number.
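Both ideas can be sketched briefly in Python. The first function keeps only a few variables instead of an array; the second uses the fast doubling identities F(2m) = F(m) * (2*F(m+1) - F(m)) and F(2m+1) = F(m)^2 + F(m+1)^2, which is one possible logarithmic method (the answer above does not say which one it had in mind). The function names and the convention F(0) = 0, F(1) = 1 are assumptions for the example.

```python
def fib_iterative(n):
    """F(n) with O(n) additions and constant extra storage."""
    a, b = 0, 1          # a = F(i), b = F(i + 1)
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_fast_doubling(n):
    """F(n) using O(log n) arithmetic steps via fast doubling."""
    def pair(k):
        # Returns (F(k), F(k + 1)).
        if k == 0:
            return (0, 1)
        a, b = pair(k // 2)      # a = F(m), b = F(m + 1), with m = k // 2
        c = a * (2 * b - a)      # F(2m)
        d = a * a + b * b        # F(2m + 1)
        if k % 2:
            return (d, c + d)    # k = 2m + 1
        return (c, d)            # k = 2m
    return pair(n)[0]


f10 = fib_iterative(10)
f90 = fib_fast_doubling(90)
```

Keep in mind that the numbers themselves grow very quickly, so for large n the cost of the big-integer arithmetic dominates either way.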
