What is an efficient way to find the sum of C(N, k-2x) for x in the range 0 to k/2?
I tried the naive approach, but the complexity is very high. Is there any other approach for the constraints N, k < 100000?
I found this (https://oeis.org/A008949), but I'm still unable to figure out how to calculate it up to 100000 with O(n) or O(n log n) complexity.
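For reference, a minimal Python sketch of one standard O(N + k) approach, assuming the sum is wanted modulo a prime (the modulus and function names below are illustrative, not from the question): precompute factorials and inverse factorials once, so each C(N, j) costs O(1), then add up the roughly k/2 terms directly.

MOD = 10**9 + 7  # assumed modulus, not specified in the question

def build_factorials(n, mod=MOD):
    # fact[i] = i! mod p; inv_fact[i] = (i!)^-1 mod p via Fermat's little theorem
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % mod
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], mod - 2, mod)
    for i in range(n, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % mod
    return fact, inv_fact

def binom(n, j, fact, inv_fact, mod=MOD):
    if j < 0 or j > n:
        return 0
    return fact[n] * inv_fact[j] % mod * inv_fact[n - j] % mod

def sum_even_gap(N, k, fact, inv_fact, mod=MOD):
    # C(N,k) + C(N,k-2) + ... down to C(N, k mod 2)
    return sum(binom(N, k - 2 * x, fact, inv_fact, mod)
               for x in range(k // 2 + 1)) % mod

fact, inv_fact = build_factorials(100000)
print(sum_even_gap(10, 4, fact, inv_fact))  # C(10,4)+C(10,2)+C(10,0) = 210+45+1 = 256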
I have a dataset with 9448 data points (rows).
Whenever I choose values of K between 1 and 10, the accuracy comes out to be 100 percent (which is an ideal case, of course!) and weird.
If I choose my K value to be 100 or above, the accuracy decreases gradually (95% to 90%).
How does one choose the value of K? We want decent accuracy, not a hypothetical 100 percent.
Well, a simple approach to selecting k is sqrt(no. of data points). In this case, that gives sqrt(9448) = 97.2 ~ 97. And please keep in mind that it is inappropriate to say which k value suits best without looking at the data. If training samples of similar classes form clusters, then a k value from 1 to 10 will achieve good accuracy. If the data is randomly distributed, then one cannot say which k value will give the best results. In such cases, you need to find it by performing an empirical analysis, as sketched below.
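A minimal sketch of that empirical analysis, assuming scikit-learn and a feature matrix X with labels y (none of these names come from the original posts): score a range of k values with cross-validation instead of a single train/test split, which also tends to expose an unrealistic 100% accuracy as leakage or duplicated rows.

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def pick_k(X, y, candidates=(1, 5, 11, 21, 51, 97, 151)):
    # Mean 5-fold cross-validated accuracy for each candidate k.
    scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X, y, cv=5).mean()
              for k in candidates}
    return max(scores, key=scores.get), scores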
I'm in Excel trying to count how many "peaks" there are in my data. Initially I thought "find the median, set a threshold value, count the number of values above said threshold value." So I ended up using =MEDIAN(), =MAX() and =COUNTIF().
The problem is that for a peak, there may be data points forming the "slope" of said peak which are also higher than the threshold value, so they get counted too.
Wondering if there's an easy way in Excel to count said peaks, or if I have to figure out a way to convert the data to a function and take a second derivative to find local maxima points.
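For comparison outside Excel, a minimal Python sketch of the local-maxima idea (the sample data and threshold are made up): count only points that exceed the threshold and are strictly higher than both neighbors, so slope points on a peak's flank are not counted.

def count_peaks(values, threshold):
    count = 0
    for i in range(1, len(values) - 1):
        # A peak must clear the threshold and beat both neighbors.
        if (values[i] > threshold
                and values[i] > values[i - 1]
                and values[i] > values[i + 1]):
            count += 1
    return count

print(count_peaks([1, 3, 7, 4, 2, 6, 9, 6, 2], 5))  # -> 2 peaks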
I know of algorithms that generate an exponential number of possibilities and iterate through them. But can anyone give me pseudocode where the code goes through all cases and finds the answer?
Yes, there is. The simple algorithm for calculating the Fibonacci series without dynamic programming is the best example.
int f(int n)
{
    if (n == 0 || n == 1)   /* base cases */
        return 1;
    return f(n - 1) + f(n - 2);   /* two recursive calls per level */
}
This code takes exponential time: the time for calculating f(n) is proportional to the (n+1)-th Fibonacci number. You can check this link to learn about the growth of the Fibonacci series (courtesy: David Leese's blog). If you look at the logarithmic graph of the Fibonacci series, you can see that it has exponential growth.
The solution is dynamic programming, of course: store the Fibonacci numbers calculated so far in a look-up table, as in the sketch below.
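A minimal sketch of that look-up-table version (in Python, for brevity, rather than the C of the snippet above); caching each result collapses the exponential call tree to O(n) work:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Same base cases as the recursive code above: f(0) = f(1) = 1.
    if n == 0 or n == 1:
        return 1
    # Each fib(k) is computed once and then read from the cache.
    return fib(n - 1) + fib(n - 2)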
I have a list of 153 golfers with associated salaries and average scores.
I want to find the combination of 6 golfers that optimizes avg score and keeps salary under $50,000.
I've tried using Solver, but I am stuck! Can anyone help please? :)
Illustrating a solution that is pretty close to what @ErwinKalvelagen suggested.
Column A is the names of the 153 golfers
Column B is the golfers' salaries (generated by =RANDBETWEEN(50, 125)*100, filled down, then Copy/Paste Values)
Column C is the golfers' average scores (generated by =RANDBETWEEN(70, 85), filled down, then Copy/Paste Values)
Column D is a 0 or 1 to indicate if the golfer is included.
Cell F2 is the total salary, given by =SUMPRODUCT(B2:B154,D2:D154)
Cell G2 is the number of golfers, given by =SUM(D2:D154)
Cell H2 is the average score of the team, given by =SUMPRODUCT(C2:C154,D2:D154)/G2
The page looks like this, before setting up Solver ...
The Solver setup looks like this ...
According to the help, the Evolutionary engine should be used for non-smooth problems like this one. In Options, I needed to increase the Maximum Time without improvement from 30 to 300 seconds (60 may have been good enough).
It took a couple of minutes for it to complete. It reached the solution of 70 fairly quickly, but spent more time looking for a better answer.
And here are the six golfers it came up with.
Of the teams averaging 70, it could have found one with a lower total salary.
In cell I2 I added the formula =F2+F2*(H2-70), which is essentially the salary penalized by increases in average score above 70 ...
... and use the same Solver setup, except to minimize Cell I2 instead of H2 ...
and these are the golfers it chose ...
Again - it looks like there is still a better solution. It could have picked Name97 instead of Name96.
This is a simple optimization problem that can be solved using the Excel Solver (just use the "Simplex LP" solver -- somewhat of a misnomer, as we will use it here to solve an integer programming, or MIP, problem).
You need one column with 153 binary (BIN) variables (Excel's limit is, I believe, 200). Make sure you add a constraint to set the values to Binary. Let's call this column INCLUDE; Solver will fill it with 0 or 1 values. Sum these values, and add a constraint SUMINCLUDE = 6. Then add a column with INCLUDE * SCORE. Sum this column: this is your objective (optimizing the average is the same as optimizing the sum, since the team size is fixed at 6). Then add a column with INCLUDE * SALARY and sum these. Add a constraint SUMSALARY <= 50k. Press Solve, and done.
I don't agree with claims that Excel will crash on this or that it does not fit within the limits of Excel's Solver. (I actually tried this out.)
I prefer the simplex method over the evolutionary solver, as the simplex solver is more suitable for this problem: it is faster (simplex takes < 1 second) and provides optimal solutions (the evolutionary solver often returns suboptimal ones).
If you want to solve this problem with Matlab, a function to look at is intlinprog (Optimization Toolbox).
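For the same model outside Excel or Matlab, here is a sketch in Python using the PuLP library; the random data mirrors the other answer's setup, and all variable names are illustrative rather than from either post.

import random
import pulp

random.seed(0)
n = 153
salaries = [random.randint(50, 125) * 100 for _ in range(n)]  # like =RANDBETWEEN(50,125)*100
scores = [random.randint(70, 85) for _ in range(n)]           # like =RANDBETWEEN(70,85)

prob = pulp.LpProblem("golfer_team", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n)]  # the INCLUDE column

# Objective: total score (same optimum as average score for a fixed team of 6).
prob += pulp.lpSum(scores[i] * x[i] for i in range(n))
prob += pulp.lpSum(x) == 6                                         # exactly six golfers
prob += pulp.lpSum(salaries[i] * x[i] for i in range(n)) <= 50000  # salary cap

prob.solve()
team = [i for i in range(n) if x[i].value() > 0.5]  # solver returns floats; 0.5 cutoff
print(team, sum(salaries[i] for i in team))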
To be complete: this is the mathematical model we are solving here:
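With x_i the 0-1 INCLUDE value for golfer i, as described above:

minimize    sum_i score_i * x_i
subject to  sum_i x_i = 6
            sum_i salary_i * x_i <= 50000
            x_i in {0, 1},  i = 1, ..., 153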
Results with random data:
....
Suppose you have a very large list of numbers that would be expensive to sort. They are real numbers/decimals, but all lie in the same range, say 0 to n for some integer n. Are there any methods for estimating percentiles that don't require sorting the data, i.e. an algorithm with better complexity than the fastest sorting algorithm?
Note: The tag is quantiles only because there is no existing tag for percentiles and it wouldn't let me create one; my question is not specific to quantiles.
In order to find the p-th percentile of a set of N numbers, essentially you are trying to find the k-th smallest number, where k = N*p/100 (whether that is rounded down or up depends on the convention; for the median of an odd-sized list, for example, it is rounded up).
You might try the median of medians algorithm, which finds the k-th smallest of N numbers in O(N) worst-case time; a sketch is below.
I don't know where this is implemented in a standard library, but a proposed implementation was posted in one of the answers to this question.
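A self-contained Python sketch of median-of-medians selection (not the implementation referenced above); the floor convention in percentile() is an assumption, since conventions differ.

import math

def select(items, k):
    # Return the k-th smallest element (0-indexed) in O(n) worst-case time.
    items = list(items)
    if len(items) <= 5:
        return sorted(items)[k]
    # Pivot = median of the medians of groups of 5, which guarantees the
    # recursion discards a constant fraction of the list each round.
    groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
    medians = [g[len(g) // 2] for g in groups]
    pivot = select(medians, len(medians) // 2)
    lows = [x for x in items if x < pivot]
    highs = [x for x in items if x > pivot]
    n_equal = len(items) - len(lows) - len(highs)
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + n_equal:
        return pivot
    return select(highs, k - len(lows) - n_equal)

def percentile(data, p):
    # k = N*p/100, rounded down and clamped to a valid index.
    k = min(len(data) - 1, math.floor(len(data) * p / 100))
    return select(data, k)

print(percentile([0.5, 2.25, 9.0, 4.75, 3.5, 7.0, 1.25], 50))  # -> 3.5 (the median)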