If complexity is O(n log2(n))...
How do you prove the execution time for input of size 10^7 if we know that for input of size 10^5 the execution time is 0.1 s?
In short: To my knowledge, you don't prove it in this way.
More verbosely:
The thing about complexity is that it is reported in Big O notation, in which any constants and lower-order terms are discarded. For example, the complexity in the question is O(n log2(n)), but this could be the simplified form of k1 * n * log2(k2 * n + c2) + c1.
These constants cover things like initialization tasks that take the same time regardless of the number of samples, the proportional time it takes to do the log2(n) part (each of those operations could potentially take 10^6 times longer than the n part), and so on.
In addition to the constants you also have variable factors, such as the hardware on which the algorithm is executed, any additional load on the system, etc.
In order to use this as the basis for an estimate of execution time, you would need enough samples of execution times over a range of problem sizes to estimate both the constants and the variable factors.
For practical purposes one could gather multiple samples of execution times for a sufficiently sizable set of problem sizes, then fit the data with a suitable function based on your complexity formula.
In terms of proving an execution time... that's not really doable; the best you can hope for is a best-fit model and a significant p-value.
Of course, if all you want is a rough guess, you could always try assuming that all the constants and variables are 1 or 0 as appropriate and plug in the numbers you have: (0.1 s / (10^5 * log2(10^5))) * (10^7 * log2(10^7)) = 14 ish
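For what it's worth, here is a minimal Python sketch of that same naive extrapolation (my own illustration, assuming all hidden constants are equal and lower-order terms are negligible):

import math

def extrapolate_runtime(t_known, n_known, n_target):
    # Naive n*log2(n) scaling: a rough guess, not a proof.
    work_known = n_known * math.log2(n_known)
    work_target = n_target * math.log2(n_target)
    return t_known * work_target / work_known

# 0.1 s measured at n = 10^5, extrapolated to n = 10^7 -> about 14 s
print(extrapolate_runtime(0.1, 10**5, 10**7))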
I am working on multi-objective genetic algorithms. Say I have 4 objectives, the number of generations is 400, and the population size is 100.
So how many function evaluations will there be?
I mean to say: is it 4*400*100 or 400*100?
If for each chromosome you evaluate 4 functions, then obviously you have a total of 4*400*100 evaluations.
What you might also want to consider is the running time of each of these evaluations, because if 3 of the functions run in O(n) and the fourth runs in O(n^2), the total running time will be bounded by O(number_of_gens*population_size*n^2), and will be only mildly affected by the other three functions in large problem instances.
If you're asking about the number of evaluations as counted by MOO researchers (i.e., you want to know whether your algorithm is better than mine with the same number of evaluations), then the accepted answer is incorrect. In multi-objective optimization, we formally consider the problem not as optimizing k different functions, but as optimizing one vector-valued function.
It's one evaluation per individual, regardless of the dimensionality of the objective space.
As far as I know, the number of function evaluations of a genetic algorithm can be calculated with the following equation:
Number of function evaluations = size of the initial population + [number of new children (from crossover) + number of mutated children (from mutation)] * number of iterations.
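To make the different counting conventions concrete, here is a small Python sketch; the per-generation crossover and mutation counts are hypothetical, since the question does not specify them:

population_size = 100
generations = 400
objectives = 4

# MOO convention: one evaluation per individual, regardless of the number of objectives.
evals_per_individual = population_size * generations                 # 40000

# Counting every scalar objective separately.
evals_per_objective = objectives * population_size * generations     # 160000

# Formula from the answer above, with hypothetical per-generation child counts.
crossover_children = 80   # assumed for illustration
mutated_children = 20     # assumed for illustration
evals_by_formula = population_size + (crossover_children + mutated_children) * generations

print(evals_per_individual, evals_per_objective, evals_by_formula)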
Say I have this very common DP (dynamic programming) problem:
Given a cost matrix cost[][] and a position (m, n) in cost[][], write a function that returns cost of minimum cost path to reach (m, n) from (0, 0). Each cell of the matrix represents a cost to traverse through that cell. Total cost of a path to reach (m, n) is sum of all the costs on that path (including both source and destination). You can only traverse down, right and diagonally lower cells from a given cell, i.e., from a given cell (i, j), cells (i+1, j), (i, j+1) and (i+1, j+1) can be traversed. You may assume that all costs are positive integers.
PS: answer to this - 8
Now, after solving this question, the following question ran through my mind.
Say I have a 1000*1000 matrix, and the O(n^2) solution will take some time (< 1 sec on an Intel i5 for sure).
But can I minimize it further, say by starting 6-8 threads using this algorithm and then synchronizing them back to get the answer at the end? Would that be faster, is it even logically possible to get the answer that way, or should I throw this thought away?
Generally speaking, on such small problems (as you say, < 1 sec), parallel computing is less efficient than sequential computing due to protocol overhead (thread starting and synchronizing). Another problem might be that you increase the cache miss rate, because you're choosing the data you want to operate on "randomly" (not linearly) from the input. However, when it comes to larger problems, say matrices with 10 times as many entries, it sure is worth a thought (or two).
This is a possible solution. Given a 16x16 matrix, we cut it into 4 equal squares, and one thread is responsible for each of those squares. The number in each little square indicates after how many time units the result in that square can be calculated.
So the total time is 33 units (whatever a unit is). Compared to the sequential solution with 64 units, that is just about half. You can convince yourself that the runtime for any 2^k x 2^k matrix is 2^(2k - 1) + 1.
However, this is only the first idea that came to my mind. I hope that there is a (much) faster parallel solution out there.
What's more, for the reasons I mentioned at the beginning of my answer, for all practical purposes you would not achieve a speedup of 2 with my solution.
I'd start with algorithmic improvements. There's no need to test N^2 solutions.
One key is the direction from which you entered a square. If you entered it by moving downward, there's no need to check the square to the right. Likewise, if you entered it by moving right, there's no need to check the path downward from there. The destination of a right-angle turn can always be reached via a diagonal move, leaving out one square and its positive weight/cost.
As far as threading goes, I can see (at least) a couple of ways of splitting things up. One would be to simply queue up requests as you enter a square. I.e., instead of (for example) testing another square directly, you queue up requests to test its two or three exits. N threads process those requests, which generate more requests, continuing until all of them reach the end point.
This has the obvious disadvantage that you're likely to continue traversing some routes after serial code could abandon them, because they're already longer than the shortest route you've found so far.
Another possibility would be to start two threads, one traversing forward, the other backward. In each, you find the shortest route to any given point along the diagonal, then you're left with a purely linear scan through those candidates to find the shortest sum.
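For reference, here is a minimal sequential Python sketch of the O(m*n) dynamic program that the threading ideas above would build on (the example matrix is made up for illustration):

def min_cost_path(cost):
    # Minimum cost to reach the bottom-right cell from (0, 0),
    # moving only right, down, or diagonally down-right.
    m, n = len(cost), len(cost[0])
    dp = [[0] * n for _ in range(m)]
    dp[0][0] = cost[0][0]
    for j in range(1, n):                     # first row: only reachable from the left
        dp[0][j] = dp[0][j - 1] + cost[0][j]
    for i in range(1, m):                     # first column: only reachable from above
        dp[i][0] = dp[i - 1][0] + cost[i][0]
    for i in range(1, m):                     # every other cell: left, above, or diagonal
        for j in range(1, n):
            dp[i][j] = cost[i][j] + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[m - 1][n - 1]

print(min_cost_path([[1, 2, 3],
                     [4, 8, 2],
                     [1, 5, 3]]))   # prints 8 for this made-up 3x3 example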
Generally speaking, when you are numerically evaluating an integral, say in MATLAB, do I just pick a large number for the bounds, or is there a way to tell MATLAB to "take the limit"?
I am assuming that you just use a large number because different machines would be able to handle numbers of different magnitudes.
I am just wondering if there is a way to improve my code. I am doing lots of expected value calculations via Monte Carlo and often use the trapezoid method to check myself when my degrees of freedom are small enough.
Strictly speaking, it's impossible to evaluate a numerical integral out to infinity. In most cases, if the integral in question is finite, you can simply integrate over a reasonably large range. For example, integrating a normal distribution out to 10 sigma is, for better or worse, about as close as you are going to get to evaluating the same integral all the way out to infinity.
It depends very much on what type of function you want to integrate. If it is "smooth" (no jumps, and preferably no jumps in any derivatives either, but that becomes progressively less important) and finite, then you have two main choices (limiting myself to the simplest approach):
1. If it is periodic, here meaning: could you put the left and right ends together and have no jumps in value (and derivatives...) there either? Then distribute your points evenly over the interval and just sample the function values to get the estimated average, and then multiply by the length of the interval to get your integral.
2. If it is not periodic: use Gauss-Legendre quadrature.
Monte Carlo is almost invariably a poor method: it progresses very slowly towards (machine) precision: for every additional significant digit you need to apply 100 times more points!
The two methods above, for periodic and non-periodic "nice" (smooth, etcetera) functions, give fair results already with a very small number of sample points and then progress very rapidly towards more precision: 1 or 2 points more usually adds several digits to your precision! This far outweighs the burden of having to throw away the previous result entirely when you move to a larger set of sample points: you REPLACE the previous set of points with a fresh new one, whereas in Monte Carlo you can simply add points to the existing set and so refine the outcome.
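To illustrate the convergence rates being compared, here is a small Python/NumPy sketch (not MATLAB; the test integrand 1/(1+x^2) on [0, 1] is my own choice) contrasting Gauss-Legendre quadrature, the trapezoid rule, and plain Monte Carlo:

import numpy as np

f = lambda x: 1.0 / (1.0 + x**2)   # smooth, non-periodic; exact integral on [0, 1] is pi/4
exact = np.pi / 4
a, b = 0.0, 1.0

# Gauss-Legendre with only 8 nodes, mapped from [-1, 1] to [a, b].
nodes, weights = np.polynomial.legendre.leggauss(8)
xg = 0.5 * (b - a) * nodes + 0.5 * (a + b)
gauss = 0.5 * (b - a) * np.sum(weights * f(xg))

# Composite trapezoid rule with 101 evenly spaced points.
xt = np.linspace(a, b, 101)
yt = f(xt)
trap = np.sum((yt[:-1] + yt[1:]) / 2) * (xt[1] - xt[0])

# Plain Monte Carlo with 10000 samples.
rng = np.random.default_rng(0)
mc = (b - a) * np.mean(f(rng.uniform(a, b, 10000)))

print("Gauss-Legendre error:", abs(gauss - exact))   # tiny, despite only 8 nodes
print("trapezoid error:     ", abs(trap - exact))    # noticeably larger with 101 points
print("Monte Carlo error:   ", abs(mc - exact))      # typically the largest, despite 10000 samples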
'Simple' question, what is the fastest way to calculate the binomial coefficient? - Some threaded algorithm?
I'm looking for hints :) - not implementations :)
Well the fastest way, I reckon, would be to read them from a table rather than compute them.
Your requirement for integer accuracy while using a double representation means that C(60,30) is just about as big as you can go, being around 1e17, so that (assuming you want to have C(m,n) for all m up to some limit, and all n <= m) your table would only have around 1800 entries. As for filling the table in, I think Pascal's triangle is the way to go.
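A minimal Python sketch of that table-filling approach, using Pascal's rule (the limit of 60 comes from the answer above; everything else is illustrative):

MAX_M = 60   # C(60, 30) ~ 1.18e17, roughly the ceiling mentioned above

# table[m][n] holds C(m, n); filled row by row via Pascal's triangle.
table = [[0] * (MAX_M + 1) for _ in range(MAX_M + 1)]
for m in range(MAX_M + 1):
    table[m][0] = table[m][m] = 1
    for n in range(1, m):
        table[m][n] = table[m - 1][n - 1] + table[m - 1][n]

print(table[60][30])   # looked up, not computed on demand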
According to the multiplicative formula below (from Wikipedia), the fastest way would be to split the range i = 1..k across the number of threads, give each thread one range segment, and have each thread update the final result under a lock:
C(n, k) = prod(i = 1..k) of (n - k + i) / i
The "academic way" would be to split the range into tasks, each task being to calculate (n - k + i)/i, and then, no matter how many threads you have, they all run in a loop asking for the next task. The first is faster, the second is... academic.
EDIT: further explanation - in both ways we have some arbitrary number of threads. Usually the number of threads is equal to the number of processor cores, because there is no benefit in adding more threads. The difference between the two ways is what those threads are doing.
In the first way each thread is given N, K, I1 and I2, where I1 and I2 delimit its segment of the range 1..K. Each thread then has all the data it needs, so it calculates its part of the result and, upon finishing, updates the final result.
In the second way each thread is given N, K, and access to a synchronized counter that counts from 1 to K. Each thread then acquires one value from this shared counter, calculates one fraction of the result, updates the final result, and loops on this until the counter informs the thread that there are no more items. If it happens that some processor cores are faster than others, then this second way will put all cores to maximum use. The downside of the second way is too much synchronization, which effectively blocks, say, 20% of the threads all the time.
Hint: You want to do as few multiplications as possible. The formula is n! / (k! * (n-k)!). You should need fewer than 2m multiplications, where m is the minimum of k and n-k. If you want to work with (fairly) big numbers, you should use a special class for the number representation (Java has BigInteger, for instance).
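A short Python sketch along the lines of that hint, using the product form of the formula with m = min(k, n-k) steps (Python integers are arbitrary precision, so this also covers the big-number case):

def binom(n, k):
    m = min(k, n - k)          # exploit symmetry: C(n, k) = C(n, n-k)
    if m < 0:
        return 0
    result = 1
    for i in range(1, m + 1):
        # The running value is C(n - m + i, i), so the division is always exact.
        result = result * (n - m + i) // i
    return result

print(binom(60, 30))   # exact result; no overflow thanks to arbitrary-precision ints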
Here's a way that never overflows if the final result is expressible natively in the machine, involves no multiplications/factorizations, is easily parallelized, and generalizes to BigInteger-types:
First note that the binomial coefficients satisfy Pascal's rule:
\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}
This yields a straightforward recursion for computing the coefficient: the base cases are \binom{n}{0} and \binom{n}{n}, both of which are 1.
The individual results from the subcalls are integers and if \binom{n}{k} can be represented by an int, they can too; so, overflow is not a concern.
Naively implemented, the recursion leads to repeated subcalls and exponential runtimes.
This can be fixed by caching intermediate results. There are n^2 subproblems, which can be combined in O(1) time, yielding an O(n^2) complexity bound.
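A compact Python sketch of that cached recursion (functools.lru_cache does the memoization):

from functools import lru_cache

@lru_cache(maxsize=None)
def binom(n, k):
    # Base cases C(n, 0) = C(n, n) = 1; otherwise Pascal's rule.
    if k == 0 or k == n:
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)

print(binom(60, 30))   # additions only, so intermediate values never exceed the result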
This answer uses the binomial coefficient in Python (via math.comb) to print the binomial expansion of (a + b*x)^c:
import math

def h(a, b, c):
    # Build the expansion of (a + b*x)**c term by term.
    x = 0
    part = "="
    while x < (c + 1):
        nCr = math.comb(c, x)                                # binomial coefficient C(c, x)
        coeff = int(a ** (c - x)) * int(b ** x) * int(nCr)   # coefficient of the term with exponent x
        part = part + '+' + str(coeff) + 'x^' + str(x)
        x = x + 1
    print(part)

h(2, 6, 4)
Since I have no idea about what I am doing right now, my wording may sound funny. But seriously, I need to learn.
The problem I'm facing is to come up with a method (model) to estimate how a software program behaves: namely its running time and maximal memory usage. What I already have is a large amount of data. This data set gives an overview of how the program behaves under different conditions, e.g.
RUN    Criterion_A  Criterion_B  Criterion_C  Criterion_D  Criterion_E
-----------------------------------------------------------------------
R0001           12            2         3556           27            9
R0002            2            5         2154           22            8
R0003           19           12         5556           37            9
R0004           10            3         1556            7            9
R0005            5            1          556           17            8
I have thousands of rows of such data. Now I need to know how I can estimate (forecast) the running time and maximal memory usage if I know all criteria in advance. What I need is an approximation that gives hints (upper limits, or ranges).
I have a feeling that this is a typical ??? problem, but I don't know its name. Could you guys show me some hints or give me some ideas (theories, explanations, webpages) or anything that may help? Thanks!
You want a new program that takes as input one or more criteria, then outputs an estimate of the running time or memory usage. This is a machine learning problem.
Your inputs can be listed as a vector of numbers, like this:
input = [ A, B, C, D, E ]
One of the simplest algorithms for this would be a K-nearest neighbor algorithm. The idea behind this is that you'll take your input vector of numbers, and find in your database the vector of numbers that is most similar to your input vector. For example, given this vector of inputs:
input = [ 11, 1.8, 3557, 29, 10 ]
You can assume that the running time and memory should be very similar to the values from this run (originally listed in your table above):
R0001 12 2 3556 27 9
There are several algorithms for calculating the similarity between these two vectors, one simple and intuitive such algorithm is the Euclidean distance. As an example, the Euclidean distance between the input vector and the vector from the table is this:
dist = sqrt( (11-12)^2 + (1.8-2)^2 + (3557-3556)^2 + (27-29)^2 + (9-10)^2 )
dist = 2.6533
It should be intuitively clear that points with lower distance should be better estimates for running time and memory usage, as the distance should describe the similarity between two sets of criteria. Assuming your criteria are informative and well-selected, points with similar criteria should have similar running time and memory usage.
Here's some example code of how to do this in R:
# The query run and its nearest row from the table, as criteria vectors
r1 = c(11, 1.8, 3557, 29, 10)
r2 = c(12, 2.0, 3556, 27,  9)
print(r1)
print(r2)

# Euclidean distance written out term by term
dist_r1_r2 = sqrt( (11-12)^2 + (1.8-2)^2 + (3557-3556)^2 + (27-29)^2 + (9-10)^2 )
print(dist_r1_r2)

# The same distance computed directly from the vectors
smarter_dist_r1_r2 = sqrt( sum( (r1 - r2)^2 ) )
print(smarter_dist_r1_r2)
Taking the running time and memory usage of your nearest row is the KNN algorithm for K=1. This approach can be extended to include data from multiple rows by taking a weighted combination of multiple rows from the database, with rows with lower distances to your input vector contributing more to the estimates. Read the Wikipedia page on KNN for more information, especially with regard to data normalization, including contributions from multiple points, and computing distances.
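Here is a small Python sketch of that distance-weighted K > 1 variant (the earlier example is in R; the running-time values below are made up purely for illustration). It also applies the z-score normalization discussed next:

import numpy as np

# Criteria A-E for the five runs in the table above.
X = np.array([[12,  2, 3556, 27, 9],
              [ 2,  5, 2154, 22, 8],
              [19, 12, 5556, 37, 9],
              [10,  3, 1556,  7, 9],
              [ 5,  1,  556, 17, 8]], dtype=float)
runtimes = np.array([14.1, 8.2, 21.7, 6.3, 3.9])   # hypothetical measured running times (s)

query = np.array([11, 1.8, 3557, 29, 10], dtype=float)

# Z-score normalization so no single criterion dominates the distance.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xz, qz = (X - mu) / sigma, (query - mu) / sigma

# Distance-weighted average over the K nearest rows.
K = 3
d = np.sqrt(((Xz - qz) ** 2).sum(axis=1))
nearest = np.argsort(d)[:K]
w = 1.0 / (d[nearest] + 1e-9)          # closer rows contribute more
print(np.sum(w * runtimes[nearest]) / np.sum(w))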
When calculating the difference between these input vectors, you should consider normalizing your data. The rationale for doing this is that a difference of 1 unit between 3557 and 3556 for criterion C may not be equivalent to a difference of 1 between 11 and 12 for criterion A. If your data are normally distributed, you can convert them all to standard scores (or Z-scores) using this formula:
N_trans = (N - mean(N)) / sdev(N)
There is no general solution on the "right" way to normalize data as it depends on the type and range of data that you have, but Z-scores are easy to compute and a good method to try first.
There are many more sophisticated techniques for constructing estimates such as this, including linear regression, support vector regression, and non-linear modeling. The idea behind some of the more sophisticated methods is that you try and develop an equation that describes the relationship of your variables to running time or memory. For example, a simple application might just have one criterion and you can try and distinguish between models such as:
running_time = s1 * A + s0
running_time = s2 * A^2 + s1 * A + s0
running_time = s3 * log(A) + s2 * A^2 + s1 * A + s0
The idea is that A is your fixed criteria, and sN are a list of free parameters that you can tweak until you get a model that works well.
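As a sketch of fitting the simplest of those models, here is a least-squares fit of running_time = s1 * A + s0 in Python (the data points are hypothetical):

import numpy as np

A = np.array([2, 5, 10, 12, 19], dtype=float)        # single criterion
running_time = np.array([0.9, 2.2, 4.1, 5.0, 7.8])   # hypothetical measurements (s)

# Least-squares fit of the free parameters s1 and s0.
s1, s0 = np.polyfit(A, running_time, deg=1)
print(s1, s0)

# Forecast the running time for a new value of the criterion.
print(s1 * 15 + s0)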
One problem with this approach is that there are many different possible models that have different numbers of parameters. Distinguishing between models that have different numbers of parameters is a difficult problem in statistics, and I don't recommend tackling it during your first foray into machine learning.
Some questions that you should ask yourself are:
Do all of my criteria affect both running time and memory usage? Do some affect only one or the other, and are some useless from a predictive point of view? Answering this question is called feature selection, and is an outstanding problem in machine learning.
Do you have any a priori estimates of how your variables should influence running time or memory usage? For example, you might know that your application uses a sorting algorithm that is N * log(N) in time, which means that you explicitly know the relationship between one criterion and your running time.
Do your rows of measured input criteria paired with running time and memory usage cover all of the plausible use cases for your application? If so, then your estimates will be much better, as machine learning can have a difficult time with data that it's unfamiliar with.
Do the running time and memory of your program depend on criteria that you don't input into your estimation strategy? For example, if you're depending on an external resource such as a web spider, problems with your network may influence running time and memory usage in ways that are difficult to predict. If this is the case, your estimates will have a lot more variance.
If the criteria you would be forecasting for lie within the range of currently known criteria, then you should do some more research on the interpolation process:
In the mathematical subfield of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points
If it lies outside your currently known data range, research extrapolation, which is less accurate:
In mathematics, extrapolation is the process of constructing new data points outside a discrete set of known data points.
Methods
Interpolation methods for your browsing.
A PowerPoint presentation detailing some methods used for extrapolation.