Constraints on a Genetic Algorithm - Excel

I'm using SolveXL as an add-in for Excel to do a multi-objective optimization. SolveXL solves it by applying the non-dominated sorting genetic algorithm II (NSGA-II). My problem is with the constraints. When I have a constraint not to exceed a value (say, less than 15), like this:
=IF(C5>15;C5-15;0)
the software gives me solutions satisfying it. But when I want the solutions to be greater than a value (again 15):
=IF(C5<15;15-C5;0)
I don't get any solution with infeasibility zero. I'm sure I'm computing the violation right, because I tried it with a small example.
Could it be failing because of the complexity of my Excel computations?
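
For reference, a minimal Python sketch of the penalty pattern both formulas implement, assuming the solver sums per-constraint cell values into an infeasibility score and calls a solution feasible only when that score is zero (the bound of 15 is the question's):

def violation_upper(x, limit=15.0):
    # Mirrors =IF(C5>15;C5-15;0): penalty for "x must be <= limit".
    return max(x - limit, 0.0)

def violation_lower(x, limit=15.0):
    # Mirrors =IF(C5<15;15-C5;0): penalty for "x must be >= limit".
    return max(limit - x, 0.0)

print(violation_upper(17.0))  # 2.0 -> infeasible by 2
print(violation_lower(17.0))  # 0.0 -> feasible
print(violation_lower(12.0))  # 3.0 -> infeasible by 3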

Bayesian t-test assumptions

Good afternoon,
I know that the traditional independent t-test assumes homoscedasticity (i.e., equal variances across groups) and normality of the residuals.
They are usually checked with Levene's test for homogeneity of variances, and with the Shapiro-Wilk test and QQ-plots for the normality assumption.
Which statistical assumptions do I have to check with the Bayesian independent t-test? How may I check them in R with coda and rjags?
For whichever test you want to run, find the formula and plug in the posterior draws of the parameters you have, such as the variance parameter and any regression coefficients the formula requires. Iterating the formula over the posterior draws gives you a range of values for the test statistic, from which you can take the mean as an average value and the standard deviation as an uncertainty estimate.
And boom, you're done.
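
The question asks for R with coda/rjags; here is the same recipe as a minimal Python sketch (the gamma draws below are stand-ins for real posterior draws of the two group variances, which you would export from your chain):

import numpy as np

# Stand-ins for posterior draws of the two group variances; in practice
# these come from your rjags/coda chain, not a random generator.
rng = np.random.default_rng(0)
var1_draws = rng.gamma(shape=50.0, scale=0.02, size=4000)
var2_draws = rng.gamma(shape=50.0, scale=0.025, size=4000)

# Evaluate the statistic of interest at every draw, e.g. the variance ratio.
ratio = var1_draws / var2_draws

print("posterior mean of the ratio:", ratio.mean())  # point estimate
print("posterior sd of the ratio:  ", ratio.std())   # uncertainty estimate
# How far the ratio's posterior sits from 1 tells you how tenable the
# equal-variance assumption is under your model.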
There might be non-parametric Bayesian t-tests, but commonly Bayesian t-tests are parametric, and as such they assume equality of the relevant population variances. If you can obtain a t-value from a t-test (just a regular t-test of your type, from any software package you're comfortable with) and Levene's test raises no flag (don't think it is in any way a dependable test; remember it relies on a p-value), then you can do a Bayesian t-test. But remember that the Bayesian t-test requires a conventional model for the observations (the likelihood) and an appropriate prior for the parameter of interest.
It is highly recommended that t-tests be re-parameterized in terms of effect sizes (especially standardized mean-difference effect sizes). That is, you focus on the Bayesian estimation of the effect size arising from the t-test, not the other parameters in the t-test. If you opt to estimate effect size from a t-test, then a very easy-to-use, free, online Bayesian t-test software is THIS ONE HERE (probably one of the most user-friendly packages available; note that this software uses a Cauchy prior for the effect size arising from any type of t-test).
Finally, since you want to do a Bayesian t-test, I would suggest focusing your attention on picking an appropriate, defensible, meaningful prior rather than on Levene's test. No test can really show whether the sample data came from two populations (in your case) with equal variances unless data is plentiful. Note that whether the samples came from populations with equal variances is itself an inferential (Bayesian or non-Bayesian) question.

Excel Solver and lp_solve API

I have a mixed-integer/binary linear programming problem. The free version of Excel Solver can find a solution that satisfies all the constraints; however, the lp_solve API that I call from C++ cannot find one. I suspect that Excel Solver simplifies the problem. So far I have located two relevant parameters in Excel Solver's options: MIP gap and constraint precision. The former is set to 1%, and I set lp_solve's MIP gap to 1% as well, but I do not know what the equivalent of constraint precision is in lp_solve. Can anyone help? Thanks!
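
Not an answer to the lp_solve side, but for anyone unsure what constraint precision controls, a minimal conceptual sketch in Python (the function name and tolerance are illustrative, not any solver's actual API):

def within_precision(lhs_value, rhs_value, precision=1e-6):
    # A constraint lhs <= rhs is accepted as satisfied when the
    # violation is no larger than the precision tolerance.
    return lhs_value <= rhs_value + precision

print(within_precision(10.0000004, 10.0))  # True: within tolerance
print(within_precision(10.01, 10.0))       # False: a real violation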

How does the Needleman Wunsch algorithm compare to brute force?

I'm wondering how you can quantify the results of the Needleman-Wunsch algorithm (typically used for aligning nucleotide/protein sequences).
Consider some fixed scoring scheme and two sequences S1 and S2 of varying lengths. Say we calculate every possible alignment of S1 and S2 by brute force, and the highest-scoring alignment has a score x. Of course, this has considerably higher complexity than the Needleman-Wunsch approach.
When using the Needleman-Wunsch algorithm to find a sequence alignment, say the alignment it finds has a score y.
Consider r to be the score generated via Needleman-Wunsch for two random sequences R1 and R2.
How does x compare to y? Is y always greater than r for two sequences of known homology?
In general, I do understand that we use the Needleman-Wunsch algorithm to significantly speed up sequence alignment (vs. a brute-force approach), but I don't understand the cost in accuracy (if any) that comes with it. I had a go at reading the original paper (Needleman & Wunsch, 1970) but am still left with this question.
Needleman-Wunsch always produces an optimal answer: it's much faster than brute force and doesn't sacrifice accuracy in the process. The key insight it uses is that it's not actually necessary to generate all possible alignments, since most of them contain bad sub-alignments and couldn't possibly be optimal. Instead, the algorithm builds up optimal alignments for fragments of the original strands and then grows those smaller alignments into larger ones, using the guarantee that any optimal alignment must contain an optimal alignment for a slightly smaller case.
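
A minimal sketch of that dynamic program in Python (the match/mismatch/gap scores are illustrative; any fixed scheme works):

def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-1):
    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    n, m = len(s1), len(s2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # s1 prefix aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # s2 prefix aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            score[i][j] = max(diag,                   # align the two characters
                              score[i - 1][j] + gap,  # gap in s2
                              score[i][j - 1] + gap)  # gap in s1
    return score[n][m]

print(needleman_wunsch("GATTACA", "GCATGCU"))  # optimal global score

Because every cell only ever extends already-optimal sub-alignments, the score it returns equals the brute-force optimum x.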
I think your question boils down to whether dynamic programming finds the optimal solution, i.e., guarantees that y >= x. For a discussion of this I would refer you to people who are likely smarter than me:
https://cs.stackexchange.com/questions/23599/how-is-dynamic-programming-different-from-brute-force
Basically, it says that dynamic programming produces the optimal result, i.e., the same as brute force, but only for problems that satisfy the Bellman principle of optimality.
According to the Wikipedia page for Needleman-Wunsch, the problem does satisfy the Bellman principle of optimality:
https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm
Specifically:
The Needleman–Wunsch algorithm is still widely used for optimal global alignment, particularly when the quality of the global alignment is of the utmost importance. However, the algorithm is expensive with respect to time and space, proportional to the product of the length of two sequences and hence is not suitable for long sequences.
There is also mention of optimality elsewhere in the same Wikipedia page.

2D shape optimization through genetic algorithms

I just recently started learning about genetic algorithms and am now trying to apply them to 2D shape optimization in a physics simulation. The simulation produces a single scalar for each shape. (I guess this is kind of similar to boxcar2d http://boxcar2d.com/)
The 2D shapes are actually the union of several 2D "sub-shapes." Each sub-shape is stored as a list of angles/radii, and the 2D shape is then stored as a list of sub-shape lists. This serves as my chromosome right now (sketched in code below).
Right now, for fitness, I will probably use the scalar the simulation produces. My question is: how should I go about the selection and reproduction process? Would tournament selection be more appropriate, or would I want to use truncation in combination with proportional selection? Also, how do you find a good mutation rate, population size, etc.?
Sorry for so many questions, but thanks in advance. I just don't really know where to start.
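
A minimal sketch of the encoding described above and of tournament selection, one of the options asked about (all sizes and ranges are illustrative):

import random

def random_subshape(n_points=8):
    # A sub-shape is a list of (angle, radius) genes.
    return [(random.uniform(0.0, 360.0), random.uniform(0.1, 1.0))
            for _ in range(n_points)]

def random_chromosome(n_subshapes=3):
    # A chromosome is a list of sub-shape lists, as in the question.
    return [random_subshape() for _ in range(n_subshapes)]

def tournament_select(population, fitness, k=3):
    # Sample k individuals at random and keep the fittest of them;
    # k=3 is a common default, not a recommendation.
    return max(random.sample(population, k), key=fitness)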
In my view, the best way is to use an adaptive reproduction strategy during evolution. In the first phase of the run you might set a high mutation probability; in this phase you should find a reasonably good solution. In the second phase you might decrease the mutation probability every few steps; in this phase you improve that solution. But sometimes in my practice I've noticed degradation of the population during the second phase (when each chromosome is strongly similar to the others), which slows the optimization down dramatically; my solution was to add occasional high-valued random mutation perturbations, and it helps.
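
A minimal sketch of that two-phase schedule with a stagnation burst (all constants are illustrative):

import random

def mutation_rate(generation, phase_switch=100, high=0.30, low=0.02, decay=0.98):
    # Phase 1: explore with a high rate; phase 2: decay toward a floor.
    if generation < phase_switch:
        return high
    return max(low, high * decay ** (generation - phase_switch))

def mutate(individual, rate, sigma=0.1):
    # individual: a flat list of numeric genes (angles/radii).
    return [g + random.gauss(0.0, sigma) if random.random() < rate else g
            for g in individual]

def maybe_perturb(population, stagnant_generations, threshold=25):
    # If the population has stopped improving, hit it with heavy mutation
    # to break out of the "everyone looks the same" degradation.
    if stagnant_generations < threshold:
        return population
    return [mutate(ind, rate=0.5) for ind in population]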
I'd also advise you to read about the differential evolution algorithm: http://en.wikipedia.org/wiki/Differential_evolution. In my experience its performance is much faster than a genetic algorithm's.

Help with Excel Fast Fourier Transform

I am trying to use Excel's (2007) built-in FFT feature; however, it requires that I have 2^n data points, which I do not have.
I have tried two things, both of which give different results:
Pad the data values with zeros so that N (the number of data points) reaches the closest power of 2
Use a divide-and-conquer approach, i.e., if I have 112 data points, then I do an FFT for 64, then 32, then 16 (112 = 64 + 32 + 16)
Which is the better approach? I am comfortable writing VBA macros, but I am looking for an algorithm which does not require N to be a power of 2. Can anyone help?
Splitting your data into smaller blocks will result in erroneous output, especially for smaller numbers of data points.
Padding with zeros is a much better idea, and the general approach for FFTs. If you are interested in an alternative way of doing the FFT, Octave will do it for you, and most of the MATLAB documentation applies, so you should have no trouble with it.
Padding with zeros is the right direction, but keep in mind that if you're doing the transform in order to estimate frequency content, you will need a window function, and that should be applied to the short block (i.e., if you have 2000 points, apply a 2000 point Hann window, then pad to 2048 and calculate the transform).
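
A minimal NumPy sketch of that recipe (the 2000-point signal is synthetic, just for illustration):

import numpy as np

n = 2000
t = np.arange(n)
x = np.sin(2 * np.pi * 50 * t / n)        # stand-in for your 2000 samples

windowed = x * np.hanning(n)              # window the short block first
padded = np.pad(windowed, (0, 2048 - n))  # then zero-pad up to 2^11 = 2048

spectrum = np.fft.rfft(padded)            # transform of the padded block
magnitude = np.abs(spectrum)              # spectrum to inspect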
If you're developing an add-in, you might consider using one of the many FFT libraries out there. I'm a big fan of KISS FFT by Mark Borgerding. It offers fast transforms for many block sizes, essentially any block size that can be factored into the numbers 2, 3, 4, and/or 5. It doesn't handle prime-number-sized blocks, though. It's written in very plain C, so it should be easy to port to C#. Or, this SO question suggests some libraries that can be used in .NET.
Pad out with zeros.
2^n is a requirement of the radix-2 FFT algorithm.
Maybe try a test with a known time series (e.g., a simple sine or cosine of a single frequency). When you FFT that, you should get a single spike at that frequency (a Dirac delta function); anything else is an error. Do it with an integer power of two, padded with zeros, etc., and compare.
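
A quick sanity-check sketch of that test in NumPy (bin 64 is an arbitrary choice):

import numpy as np

n = 2048                       # integer power of two
k = 64                         # whole cycles over the block -> exactly bin 64
x = np.sin(2 * np.pi * k * np.arange(n) / n)

peak = int(np.argmax(np.abs(np.fft.rfft(x))))
print(peak)                    # 64: all the energy lands in one bin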
You can pad with zeros, or you can use an FFT library that supports arbitrary sizes. One such library is https://github.com/altomani/XL-FFT.
It implements the FFT as pure formulas with LAMBDA functions (i.e., without any VBA code). For power-of-two lengths it uses a recursive radix-2 Cooley-Tukey algorithm, and for other lengths a version of Bluestein's algorithm that reduces the calculation to a power-of-two case.
