MiniZinc: select a subset of products - problem category - modeling

After watching the Coursera Basic Modelling course, I am trying to categorize my problem so that I can choose a suitable model representation in MiniZinc.
I have a range of 10 products, each with 4 special features/attributes (a 4x10 table). This table has fixed values. The user will give 4 parameters as input.
The constraints will be created so that the user's input parameters determine the products' attribute values.
The decision variable will be the subset of the products that match the user's input.
From my understanding, this is a problem of selecting a subset from a set of objects. Is there an example available that corresponds to the MiniZinc model description above that I could look at?

I'm (still) not completely sure about the exact specification of the problem, but here is a model that identifies all the products that are "nearest" to the input data. I've defined the distance simply as the sum of absolute differences between each feature of a product and the input array (calculated by the score function); the "nearest" products are those with the smallest score.
int: k; % number of products
int: n; % number of features
array[1..k, 1..n] of int: data;
array[1..n] of int: input;
% decision variables
array[1..k] of var int: x; % the closeness score for each product
array[1..k] of var 0..1: y; % 1: this product is nearest (as array)
% var set of 1..k: y; % products closest to input (as set)
var int: z; % the minimum score
function var int: score(array[int] of var int: a, array[int] of var int: b) =
let {
var int: t = sum([abs(a[i]-b[i]) | i in index_set(a)])
} in
t
;
solve minimize z;
constraint
forall(i in 1..k) (
x[i] = score(data[i,..], input) /\
(y[i] = 1 <-> z = x[i]) % array
% (i in y <-> x[i] = z) % set
)
/\
minimum(z, x)
;
output [
"input: \(input)\n",
"z: \(z)\n",
"x: \(x)\n",
"y: \(y)\n\n"
]
++
[
% using array representation of y
if fix(y[i]) = 1 then
"nearest: ix:\(i) \(data[i,..])\n" else "" endif
| i in 1..k
];
% data
k = 10;
n = 4;
% random features for the products
data = array2d(1..k,1..n,
[
3,6,7,5,
3,5,6,2,
9,1,2,7,
0,9,3,6,
0,5,2,4, % score 5
1,8,7,9,
2,0,2,3, % score 5
7,5,9,2,
2,8,9,7,
3,6,1,2]);
input = [1,2,3,4];
% input = [7,5,9,2]; % exact match for product 8
The output is:
input: [1, 2, 3, 4]
z: 5
x: [11, 10, 13, 10, 5, 15, 5, 17, 16, 10]
y: [0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
nearest: ix:5 [0, 5, 2, 4]
nearest: ix:7 [2, 0, 2, 3]
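To sanity-check these scores outside MiniZinc, here is a small Python sketch that computes the same sum-of-absolute-differences score and returns every product tied for the minimum, using 1-based indices like the model. The function name nearest_products is illustrative only, not part of the model:

```python
def nearest_products(data, target):
    """Score each product by the sum of absolute feature differences to target.
    Return (scores, 1-based indices of all products with the minimum score)."""
    scores = [sum(abs(a - b) for a, b in zip(row, target)) for row in data]
    best = min(scores)
    return scores, [i for i, s in enumerate(scores, start=1) if s == best]

# same product table as the MiniZinc data section (k = 10, n = 4)
data = [
    [3, 6, 7, 5], [3, 5, 6, 2], [9, 1, 2, 7], [0, 9, 3, 6], [0, 5, 2, 4],
    [1, 8, 7, 9], [2, 0, 2, 3], [7, 5, 9, 2], [2, 8, 9, 7], [3, 6, 1, 2],
]
scores, nearest = nearest_products(data, [1, 2, 3, 4])
print(scores)   # [11, 10, 13, 10, 5, 15, 5, 17, 16, 10]
print(nearest)  # [5, 7]
```

With input = [7, 5, 9, 2] the same function returns only product 8, with score 0, matching the commented-out exact-match input in the data section.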

Related

Finding all the Combinations - N Rectangles inside the Square

I am a beginner in Constraint Programming using Minizinc and I need help from experts in the field.
How can I compute all the possible combinations of 6 rectangles inside a 10x10 square using MiniZinc?
The restrictions of the problem are:
1) No Rectangle Can Overlap
2) The 6 rectangles may be vertical or horizontal
OUTPUT:
0,1,1,0,0, . . . , 0,0,6,6,6
1,1,1,0,0, . . . , 0,0,0,4,4
0,0,5,5,0, . . . , 0,0,1,1,1
0,0,0,2,2, . . . , 0,0,0,0,0
0,0,0,0,2, . . . , 0,0,0,0,0
6,6,6,0,0, . . . , 0,4,4,4,0
(and so on for the other combinations)
The following model finds solutions within a couple of seconds:
% Chuffed: 1.6s
% CPLEX: 3.9s
% Gecode: 1.5s
int: noOfRectangles = 6;
int: squareLen = 10;
int: Empty = 0;
set of int: Coords = 1..squareLen;
set of int: Rectangles = 1..noOfRectangles;
% decision variables:
% The square matrix
% Every tile is either empty or belongs to one of the rectangles
array[Coords, Coords] of var Empty .. noOfRectangles: s;
% the edges of the rectangles
array[Rectangles] of var Coords: top;
array[Rectangles] of var Coords: bottom;
array[Rectangles] of var Coords: left;
array[Rectangles] of var Coords: right;
% function
function var Coords: getCoord(Coords: row, Coords: col, Rectangles: r, Coords: coord, Coords: defCoord) =
if s[row, col] == r then coord else defCoord endif;
% ----------------------< constraints >-----------------------------
% Determine rectangle limits as minima/maxima of the rows and columns for the rectangles.
% Note: A non-existing rectangle would have top=squareLen, bottom=1, left=squareLen, right=1
% This leads to a negative size and is thus ruled-out.
constraint forall(r in Rectangles) (
top[r] == min([ getCoord(row, col, r, row, squareLen) | row in Coords, col in Coords])
);
constraint forall(r in Rectangles) (
bottom[r] == max([ getCoord(row, col, r, row, 1) | row in Coords, col in Coords])
);
constraint forall(r in Rectangles) (
left[r] == min([ getCoord(row, col, r, col, squareLen) | row in Coords, col in Coords])
);
constraint forall(r in Rectangles) (
right[r] == max([ getCoord(row, col, r, col, 1) | row in Coords, col in Coords])
);
% all tiles within the limits must belong to the rectangle
constraint forall(r in Rectangles) (
forall(row in top[r]..bottom[r], col in left[r]..right[r])
(s[row, col] == r)
);
% restrict the area of each rectangle to 2..9 tiles
constraint forall(r in Rectangles) (
(bottom[r] - top[r] + 1) * (right[r] - left[r] + 1) in 2 .. 9
);
% symmetry breaking:
% order rectangles according to their top/left corners
constraint forall(r1 in Rectangles, r2 in Rectangles where r2 > r1) (
(top[r1]*squareLen + left[r1]) < (top[r2]*squareLen + left[r2])
);
% output solution
output [ if col == 1 then "\n" else "" endif ++
if fix(s[row, col]) == Empty then "  " else "\(s[row, col]) " endif
| row in Coords, col in Coords];
The grid positions in the square are either empty or take one of six values (the rectangle numbers). The model derives the top and bottom rows and the left and right columns of every rectangle from the grid, and then requires that all tiles within these limits belong to that rectangle.
To experiment, it helps to start with a smaller square and/or fewer rectangles. It might also make sense to limit the size of the rectangles; otherwise, they tend to become too small (1x1) or too big.
Symmetry breaking, which enforces a fixed ordering of the rectangles, does speed up the solving process.
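The bounds-and-fill idea of this model can also be checked outside the solver. The following Python sketch takes a finished grid, derives each rectangle's top/bottom/left/right limits as minima/maxima of its tiles, and verifies that every tile inside those limits carries the same number. It also checks the top*side + left ordering used for symmetry breaking. The helpers check_rectangles and is_ordered and the small 4x4 grid are hypothetical, for illustration only:

```python
def check_rectangles(grid, n_rect):
    """Return {r: (top, bottom, left, right)} with 1-based coordinates, raising
    ValueError if rectangle r is missing or does not fill its bounding box."""
    boxes = {}
    for r in range(1, n_rect + 1):
        cells = [(row, col)
                 for row, line in enumerate(grid, start=1)
                 for col, v in enumerate(line, start=1) if v == r]
        if not cells:
            raise ValueError(f"rectangle {r} is missing")
        # limits as minima/maxima of the rows and columns, as in the model
        top = min(row for row, _ in cells)
        bottom = max(row for row, _ in cells)
        left = min(col for _, col in cells)
        right = max(col for _, col in cells)
        # every tile within the limits must belong to the rectangle
        for row in range(top, bottom + 1):
            for col in range(left, right + 1):
                if grid[row - 1][col - 1] != r:
                    raise ValueError(f"rectangle {r} does not fill its box")
        boxes[r] = (top, bottom, left, right)
    return boxes

def is_ordered(boxes, side):
    """Symmetry-breaking check: keys top*side + left strictly increase with r."""
    keys = [boxes[r][0] * side + boxes[r][2] for r in sorted(boxes)]
    return all(a < b for a, b in zip(keys, keys[1:]))

grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
boxes = check_rectangles(grid, 2)
print(boxes)                 # {1: (1, 2, 1, 2), 2: (2, 3, 4, 4)}
print(is_ordered(boxes, 4))  # True
```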
Here's another solution using MiniZinc's geost constraint. This solution is heavily based on Patrick Trentin's excellent answer to a related question. Also make sure to see his explanation of the model.
I assume using the geost constraint speeds up the process a little. Symmetry breaking might speed things up further, as Axel Kemper suggests.
include "geost.mzn";
int: k;
int: nObjects;
int: nRectangles;
int: nShapes;
set of int: DIMENSIONS = 1..k;
set of int: OBJECTS = 1..nObjects;
set of int: RECTANGLES = 1..nRectangles;
set of int: SHAPES = 1..nShapes;
array[DIMENSIONS] of int: l;
array[DIMENSIONS] of int: u;
array[RECTANGLES,DIMENSIONS] of int: rect_size;
array[RECTANGLES,DIMENSIONS] of int: rect_offset;
array[SHAPES] of set of RECTANGLES: shape;
array[OBJECTS,DIMENSIONS] of var int: x;
array[OBJECTS] of var SHAPES: kind;
array[OBJECTS] of set of SHAPES: valid_shapes;
constraint forall (obj in OBJECTS) (
kind[obj] in valid_shapes[obj]
);
constraint geost_bb(k, rect_size, rect_offset, shape, x, kind, l, u);
And the corresponding data:
k = 2; % Number of dimensions
nObjects = 6; % Number of objects
nRectangles = 4; % Number of rectangles
nShapes = 4; % Number of shapes
l = [0, 0]; % Lower bound of our bounding box
u = [10, 10]; % Upper bound of our bounding box
rect_size = [|
2, 3|
3, 2|
3, 5|
5, 3|];
rect_offset = [|
0, 0|
0, 0|
0, 0|
0, 0|];
shape = [{1}, {2}, {3}, {4}];
valid_shapes = [{1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {3, 4}];
The output reads a little differently. Take this example:
x = array2d(1..6, 1..2, [7, 0, 2, 5, 5, 0, 0, 5, 3, 0, 0, 0]);
kind = array1d(1..6, [1, 1, 1, 1, 1, 3]);
This means object 1 is placed at [7, 0] and has kind 1, i.e. it is the rectangle of size [2, 3].
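To see what geost_bb guarantees for this output, the following Python sketch decodes x and kind into boxes and checks that they stay inside the [0, 10] bounding box and do not overlap. It assumes, as in this data, that rect_offset is all zeros, and it treats each box as half-open (an object at 7 with length 2 covers positions 7 and 8). The helper names are hypothetical:

```python
from itertools import combinations

# rect_size per shape from the data file above: (dimension 1, dimension 2)
RECT_SIZE = {1: (2, 3), 2: (3, 2), 3: (3, 5), 4: (5, 3)}

def boxes(x, kind):
    """Turn origins and shape indices into half-open boxes (lo1, hi1, lo2, hi2)."""
    out = []
    for (p1, p2), k in zip(x, kind):
        s1, s2 = RECT_SIZE[k]
        out.append((p1, p1 + s1, p2, p2 + s2))
    return out

def no_overlap(bs):
    # two boxes are disjoint if they are separated in at least one dimension
    return all(a[1] <= b[0] or b[1] <= a[0] or a[3] <= b[2] or b[3] <= a[2]
               for a, b in combinations(bs, 2))

def inside(bs, lower=(0, 0), upper=(10, 10)):
    return all(lower[0] <= b[0] and b[1] <= upper[0] and
               lower[1] <= b[2] and b[3] <= upper[1] for b in bs)

x = [(7, 0), (2, 5), (5, 0), (0, 5), (3, 0), (0, 0)]
kind = [1, 1, 1, 1, 1, 3]
bs = boxes(x, kind)
print(no_overlap(bs), inside(bs))  # True True
```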
Building on the answer of @Phonolog, one way to obtain the wanted output format is to use a 2d-array m which is mapped to x through constraints (here, size is the bounding box size):
% mapping to a 2d-array output format
set of int: SIDE = 0..size-1;
array[SIDE, SIDE] of var 0..nObjects: m;
constraint forall (i, j in SIDE) ( m[i,j] = sum(o in OBJECTS)(o *
(i >= x[o,1] /\
i <= x[o,1] + rect_size[kind[o],1]-1 /\
j >= x[o,2] /\
j <= x[o,2] + rect_size[kind[o],2]-1)) );
% symmetry breaking between equal objects
array[OBJECTS] of var int: pos = [ size*x[o,1] + x[o,2] | o in OBJECTS ];
constraint increasing([pos[o] | o in 1..nObjects-1]);
solve satisfy;
output ["kind=\(kind)\n"] ++
["x=\(x)\n"] ++
["m="] ++ [show2d(m)]
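The m constraint sums o times a 0/1 coverage indicator over all objects, so in a non-overlapping placement each cell ends up holding the index of the single object that covers it, or 0. Here is a Python sketch of the same mapping, using the rect_size values from the first data file and the example output shown earlier; to_grid is a hypothetical helper:

```python
RECT_SIZE = {1: (2, 3), 2: (3, 2), 3: (3, 5), 4: (5, 3)}

def to_grid(x, kind, rect_size, size=10):
    """Mirror the m constraint: m[i][j] = sum of o over the objects covering (i, j)."""
    m = [[0] * size for _ in range(size)]
    for o, ((p1, p2), k) in enumerate(zip(x, kind), start=1):
        s1, s2 = rect_size[k]
        for i in range(p1, p1 + s1):      # x[o,1] <= i <= x[o,1] + rect_size - 1
            for j in range(p2, p2 + s2):  # x[o,2] <= j <= x[o,2] + rect_size - 1
                m[i][j] += o
    return m

x = [(7, 0), (2, 5), (5, 0), (0, 5), (3, 0), (0, 0)]
kind = [1, 1, 1, 1, 1, 3]
m = to_grid(x, kind, RECT_SIZE)
for row in m:
    print(" ".join(str(v) for v in row))
```

Because the objects do not overlap, no cell receives more than one term of the sum, so the printed grid has the same layout as the output format requested in the question.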
Edit: Here is the complete code:
include "globals.mzn";
int: k = 2;
int: nObjects = 6;
int: nRectangles = 4;
int: nShapes = 4;
int: size = 10;
set of int: DIMENSIONS = 1..k;
set of int: OBJECTS = 1..nObjects;
set of int: RECTANGLES = 1..nRectangles;
set of int: SHAPES = 1..nShapes;
array[DIMENSIONS] of int: l = [0, 0];
array[DIMENSIONS] of int: u = [size, size];
array[OBJECTS,DIMENSIONS] of var int: x;
array[OBJECTS] of var SHAPES: kind;
array[RECTANGLES,DIMENSIONS] of int: rect_size = [|
3, 2|
2, 3|
5, 3|
3, 5|];
array[RECTANGLES,DIMENSIONS] of int: rect_offset = [|
0, 0|
0, 0|
0, 0|
0, 0|];
array[SHAPES] of set of RECTANGLES: shape = [
{1}, {2}, {3}, {4}];
array[OBJECTS] of set of SHAPES: valid_shapes =
[{1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {3, 4}];
constraint forall (obj in OBJECTS) (
kind[obj] in valid_shapes[obj]
);
constraint
geost_bb(
k,
rect_size,
rect_offset,
shape,
x,
kind,
l,
u
);
% mapping to a 2d-array output format
set of int: SIDE = 0..size-1;
array[SIDE, SIDE] of var 0..nObjects: m;
constraint forall (i, j in SIDE) ( m[i,j] = sum(o in OBJECTS)(o *
(i >= x[o,1] /\
i <= x[o,1] + rect_size[kind[o],1]-1 /\
j >= x[o,2] /\
j <= x[o,2] + rect_size[kind[o],2]-1)) );
% symmetry breaking between equal objects
array[OBJECTS] of var int: pos = [ size*x[o,1] + x[o,2] | o in OBJECTS ];
constraint increasing([pos[o] | o in 1..nObjects-1]);
solve satisfy;
output ["kind=\(kind)\n"] ++
["x=\(x)\n"] ++
["m="] ++ [show2d(m)]

Coin Change problem using Memoization (Amazon interview question)

def rec_coin_dynam(target,coins,known_results):
    '''
    INPUT: This function takes in a target amount and a list of possible coins to use.
    It also takes a third parameter, known_results, indicating previously calculated results.
    The known_results parameter should be started with [0] * (target+1)
    OUTPUT: Minimum number of coins needed to make the target.
    '''
    # Default output to target
    min_coins = target
    # Base Case
    if target in coins:
        known_results[target] = 1
        return 1
    # Return a known result if it happens to be greater than 0
    elif known_results[target] > 0:
        return known_results[target]
    else:
        # for every coin value that is <= target
        for i in [c for c in coins if c <= target]:
            # Recursive call, note how we include the known results!
            num_coins = 1 + rec_coin_dynam(target-i,coins,known_results)
            # Reset Minimum if we have a new minimum
            if num_coins < min_coins:
                min_coins = num_coins
                # Reset the known result
                known_results[target] = min_coins
    return min_coins
This runs perfectly fine, but I have a few questions about it.
We give it the following input to run:
target = 74
coins = [1,5,10,25]
known_results = [0]*(target+1)
rec_coin_dynam(target,coins,known_results)
Why are we initializing known_results with zeros, with length target+1? Why can't we just write
known_results = []
Notice that the code contains lines such as:
known_results[target] = 1
return known_results[target]
known_results[target] = min_coins
Now, let me demonstrate the difference between [] and [0]*something in the python interactive shell:
>>> a = []
>>> b = [0]*10
>>> a
[]
>>> b
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>>
>>> a[3] = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
>>>
>>> b[3] = 1
>>>
>>> a
[]
>>> b
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
The exception IndexError: list assignment index out of range was raised because we tried to access cell 3 of list a, but a has size 0; there is no cell 3. We could put a value in a using a.append(1), but then the 1 would be at position 0, not at position 3.
There was no exception when we accessed cell 3 of list b, because b has size 10, so any index between 0 and 9 is valid.
Conclusion: if you know in advance the size that your array will have, and this size never changes during the execution of the algorithm, then you might as well begin with an array of the appropriate size, rather than with an empty array.
What is the size of known_results? The algorithm needs results for values ranging from 0 to target. How many results is that? Exactly target+1. For instance, if target = 2, then the algorithm will deal with results for 0, 1 and 2; that's 3 different results. Thus known_results must have size target+1. Note that in python, just like in almost every other programming language, a list of size n has n elements, indexed 0 to n-1. In general, in an integer interval [a, b], there are b-a+1 integers. For instance, there are three integers in interval [8, 10] (those are 8, 9 and 10).
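If you would rather start from an empty container, a dict works where a list does not: assigning to a new key simply adds it, so no pre-sizing is needed. This is a sketch of that alternative, not the course's function (min_coins is a hypothetical name). Like the original it recurses, so very large targets can hit Python's recursion limit:

```python
def min_coins(target, coins, memo=None):
    """Minimum number of coins summing to target, memoized in a dict."""
    if memo is None:
        memo = {}              # starts empty: dict keys are created on assignment
    if target == 0:
        return 0
    if target in memo:
        return memo[target]
    best = float("inf")        # stays inf if target cannot be made
    for c in coins:
        if c <= target:
            best = min(best, 1 + min_coins(target - c, coins, memo))
    memo[target] = best        # fine even though key `target` did not exist yet
    return best

print(min_coins(74, [1, 5, 10, 25]))  # 8
```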

Find multiples of a number in list

I have a list and I want to find all the values in it that lie a whole number of steps of a given size above a base value, within a certain tolerance, as well as get their indices:
def GetPPMError(t, m):
    """
    calculate the ppm error between theoretical (t) and measured (m) values
    """
    return (((t - m) / t) * 1e6)

# multiple = n*1.0033 for n = 1, 2, 3, ...
a = 100
b = [100, 101, 101.0033, 102, 102.0066, 102.123, 103.0099]
results = [p for p in b if abs(GetPPMError(a, p)) < 10]
So I want to find all the multiples, such as 101.0033, 102.0066 and 103.0099, i.e. the values 100 + 1*1.0033, 100 + 2*1.0033, 100 + 3*1.0033, etc.
The result would be the indexes:
[2, 4, 6]
and the values:
[101.0033, 102.0066, 103.0099]
This works for your data:
multiple = 1.0033
a = 100
digits = 6
res = []
res_index = []
for n, x in enumerate(b):
    diff = round((x - a) / multiple, digits)
    if diff != 0 and diff == int(diff):
        res.append(x)
        res_index.append(n)
print(res)
print(res_index)
Output:
[101.0033, 102.0066, 103.0099]
[2, 4, 6]
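The answer above treats a value as a multiple when the rounded step count is an integer. If you want the ppm tolerance from the question instead, one sketch is to round each value to its nearest whole number of steps k and compare it with base + k*step using the same formula as GetPPMError. The names ppm_error, find_multiples and tol_ppm are illustrative:

```python
def ppm_error(t, m):
    """ppm error between theoretical (t) and measured (m) values."""
    return (t - m) / t * 1e6

def find_multiples(values, base, step, tol_ppm=10):
    """Return (indices, values) of entries within tol_ppm of base + k*step, k >= 1."""
    idx, hits = [], []
    for i, v in enumerate(values):
        k = round((v - base) / step)   # nearest whole number of steps
        if k < 1:
            continue                   # skip the base value itself
        if abs(ppm_error(base + k * step, v)) < tol_ppm:
            idx.append(i)
            hits.append(v)
    return idx, hits

b = [100, 101, 101.0033, 102, 102.0066, 102.123, 103.0099]
print(find_multiples(b, 100, 1.0033))
# ([2, 4, 6], [101.0033, 102.0066, 103.0099])
```

Here 101 is rejected because it is about 33 ppm away from 101.0033, well outside the 10 ppm tolerance.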

Find number of rectangles from a given set of coordinates

I have to find the maximum number of rectangles that can be formed from a given set of coordinates.
Consider that the following coordinates are given in an X Y coordinate system:
3 10,
3 8,
3 6,
3 4,
3 0,
6 0,
6 4,
6 8,
6 10,
How can I determine whether the following coordinates form a rectangle: (3,0) (3,4) (6,4) (6,0)?
Running time constraint: 0.1 sec
Thank you
Group the points' y coordinates into lists by x coordinate. In your case you would have two sorted lists:
3: [0,4,6,8,10]
6: [0,4,8,10]
Intersecting both lists gives: [0,4,8,10]
Any two of those y values form a rectangle:
[0,4] => (3,0), (3,4), (6,0), (6,4)
[0,8] => (3,0), (3,8), (6,0), (6,8)
[4,8] => (3,4), (3,8), (6,4), (6,8)
...
This solution only works for axis-parallel rectangles, that is, rectangles with sides parallel to the x and y axes.
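The grouping idea above can be sketched in Python as follows (count_axis_rectangles is a hypothetical name). For every pair of x columns it counts the shared y values m; any two of them form a rectangle, so each column pair contributes m*(m-1)/2:

```python
from collections import defaultdict
from math import comb

def count_axis_rectangles(points):
    """Count axis-parallel rectangles whose corners are all in points."""
    by_x = defaultdict(set)
    for x, y in points:
        by_x[x].add(y)
    xs = sorted(by_x)
    total = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            shared = len(by_x[xs[i]] & by_x[xs[j]])  # intersection of the y lists
            total += comb(shared, 2)                 # any two shared y values
    return total

points = [(3, 10), (3, 8), (3, 6), (3, 4), (3, 0), (6, 0), (6, 4), (6, 8), (6, 10)]
print(count_axis_rectangles(points))  # 6
```

math.comb requires Python 3.8 or later.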
For every pair of points, say (x1, y1) and (x2, y2), consider it to be the diagonal of some rectangle. If the points (x1, y2) and (x2, y1) also exist in the initial set, then we have found a rectangle. Each rectangle has two diagonals and is therefore found twice, so we divide the final answer by 2.
This will work only for rectangles with sides parallel to the x and y axes.
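C++:
#include <iterator>
#include <set>
#include <utility>

// Counts axis-parallel rectangles whose four corners are all in `points`.
int countRectangles(const std::set<std::pair<int, int>>& points)
{
    int answer = 0;
    for (auto i = points.begin(); i != points.end(); ++i)
    {
        // set iterators are not random access, so use std::next instead of i+1
        for (auto j = std::next(i); j != points.end(); ++j)
        {
            std::pair<int, int> p1 = *i;
            std::pair<int, int> p2 = *j;
            // opposite corners cannot share an x or a y coordinate
            if (p1.first == p2.first || p1.second == p2.second)
                continue;
            std::pair<int, int> p3 = std::make_pair(p1.first, p2.second);
            std::pair<int, int> p4 = std::make_pair(p2.first, p1.second);
            if (points.find(p3) != points.end() && points.find(p4) != points.end())
                ++answer;
        }
    }
    // every rectangle is counted once per diagonal, i.e. twice
    return answer / 2;
}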
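To check if 4 points form a rectangle:
compute the distance between every two points (six distances) and store them all in an array (squared distances avoid floating point issues for integer coordinates).
sort the array.
for a rectangle you will have a[0] = a[1], a[2] = a[3], a[4] = a[5] (two pairs of equal sides and a pair of equal diagonals).
Note that this condition is necessary but not sufficient: the points (0,0), (2,0), (1,3), (1,1) also give three equal pairs of squared distances (2, 2, 4, 4, 10, 10) without forming a rectangle.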
How can I find if the following coordinates form a rectangle
Check whether the difference vectors are orthogonal, i.e. have dot product zero.
This does not check whether these coordinates are included in your list. It also does not check whether the rectangle is aligned with the coordinate axes, which would be a far simpler problem.
If you want to find all rectangles in your input, you could do the above check for all quadruples. If that is unacceptable for performance reasons, then you should update your question to indicate the problem size and performance constraints you are facing.
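A Python sketch of the dot-product test (is_rectangle is a hypothetical helper). It assumes the four points are given in cyclic order around the shape, as in the question, and uses exact integer arithmetic; with floating-point coordinates you would compare against a small tolerance instead of 0. As the answer says, it does not require the rectangle to be axis-aligned:

```python
def is_rectangle(a, b, c, d):
    """True if a, b, c, d (in this cyclic order) are the corners of a rectangle,
    i.e. every pair of consecutive sides is orthogonal and no side is empty."""
    pts = [a, b, c, d]
    sides = [(pts[(i + 1) % 4][0] - pts[i][0], pts[(i + 1) % 4][1] - pts[i][1])
             for i in range(4)]
    if any(s == (0, 0) for s in sides):
        return False  # repeated point: degenerate
    # consecutive difference vectors must have dot product zero
    return all(sides[i][0] * sides[(i + 1) % 4][0] +
               sides[i][1] * sides[(i + 1) % 4][1] == 0 for i in range(4))

print(is_rectangle((3, 0), (3, 4), (6, 4), (6, 0)))  # True
print(is_rectangle((3, 0), (3, 4), (6, 8), (6, 0)))  # False
```

For an unordered quadruple, try the three cyclic orderings (a, b, c, d), (a, b, d, c) and (a, c, b, d).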
My humble submission
I assume a number of optimizations are possible.
My approach is to:
Traverse over every point.
Find all points directly above this point (same X, larger Y) and store each such pair of Y coordinates, since the pair forms a vertical line.
When the same pair of Y coordinates is found again in another column, we have found a rectangle (one for every earlier occurrence of that pair).
Keep traversing all the other points, doing the same thing.
My solution runs in O(n^2), but it only finds rectangles that are parallel to the X or Y axis.
Here is my code for above approach:
def getRectangleCount(coordinate):
    n = len(coordinate)
    y_count = dict()  # (lower y, upper y) -> number of columns seen with that pair
    ans = 0
    for i in range(n):
        x, y = coordinate[i]
        for j in range(n):
            dx = coordinate[j][0]
            dy = coordinate[j][1]
            if y < dy and x == dx:
                ans += y_count.get((y, dy), 0)
                y_count[(y, dy)] = y_count.get((y, dy), 0) + 1
    return ans
coordinate = [[3, 10], [3, 8], [3, 6], [3, 4], [3, 0], [6, 0], [6, 4], [6, 8], [6, 10]]
print(getRectangleCount(coordinate))
Here's a solution that finds all unique rectangles (not only those parallel to the x or y-axis) in a given list of coordinate points in O(n^4) time.
Pseudocode:
// checks if two floating point numbers are equal within a given
// error to avoid rounding issues
bool is_equal(double a, double b, double e) {
return abs(a - b) < e;
}
// computes the dot product of the vectors ab and ac
double dot_product(Point a, Point b, Point c) {
return (b.x - a.x) * (c.x - a.x) + (b.y - a.y) * (c.y - a.y);
}
// find all rectangles in a given set of coordinate points
List<Rectangle> find_rectangles(List<Point> points) {
List<Rectangle> rectangles;
// sort points in ascending order by first comparing x, then y value
sort(points);
for (int a = 0; a < points.size(); ++a)
for (int b = a + 1; b < points.size(); ++b)
for (int c = b + 1; c < points.size(); ++c)
for (int d = c + 1; d < points.size(); ++d)
// check all angles
if (is_equal(dot_product(points[a], points[b], points[c]), 0.0, 1e-7) &&
is_equal(dot_product(points[b], points[a], points[d]), 0.0, 1e-7) &&
is_equal(dot_product(points[d], points[b], points[c]), 0.0, 1e-7) &&
is_equal(dot_product(points[c], points[a], points[d]), 0.0, 1e-7))
// found rectangle
rectangles.add(new Rectangle(points[a], points[c], points[d], points[b]));
return rectangles;
}
Explanation:
For four points A, B, C, D to define a rectangle, all angles must be 90°, meaning that adjacent sides are orthogonal.
Since this property can be checked just by testing whether a dot product is 0, it is cheaper than computing side lengths, which would require square roots.
Sorting the points first avoids checking the same set of points multiple times due to permutations.

How to select Y values at X position in Groovy?

This is sort of a mathy question...
I previously asked a question about normalizing monthly data:
How to produce X values of a stretched graph?
I got a good answer and it works well; the only issue is that now I need to compare the X values of a month with 31 days against the X values of a month with 28.
So my question would be: If I have two sets of parameters like so:
 x | y      x2   | y2
 1 | 10     1.0  | 10
 2 | 9      1.81 | 9.2
 3 | 8      2.63 | 8.6
 4 | 7      3.45 | 7.8
 5 | 6      4.27 | 7
 6 | 5      5.09 | 6.2
 7 | 4      5.91 | 5.4
 8 | 3      6.73 | 4.2
 9 | 2      7.55 | 3.4
10 | 1      8.36 | 2.6
            9.18 | 1.8
            10.0 | 1.0
As you can see, the general trend is the same for these two data sets.
However, if I run these values through a cross-correlation function (the general goal), I will get something back that does not reflect this, since the data sets are of two different sizes.
The real world example of this would be, say, if you are tracking how many miles you run per day:
In February (with 28 days), during the first week, you run one mile each day. During the second week, you run two miles each day, etc.
In March (with 31 days), you do the same thing, but run for one mile for eight days, two miles for eight days, three miles for eight days, and four miles for seven days.
The correlation coefficient according to the following function should be almost exactly 1:
class CrossCorrelator {
  def mean = { x ->
    x.sum() / x.size()
  }
  def variance = { x ->
    def v = 0
    x.each { v += it**2 }
    v / (x.size()) - (mean(x)**2)
  }
  def covariance = { x, y ->
    def z = 0
    [x, y].transpose().each { z += it[0] * it[1] }
    (z / (x.size())) - (mean(x) * mean(y))
  }
  def coefficient = { x, y ->
    covariance(x, y) / (Math.sqrt(variance(x) * variance(y)))
  }
}
def i = new CrossCorrelator()
i.coefficient(y, y2)   // y and y2 are the two lists of y values
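CrossCorrelator computes the population Pearson correlation coefficient. The covariance pairs the two series element by element, which only makes sense when both have the same length; that is why the data has to be resampled first. Here is a Python sketch of the same formulas, with an explicit length check added as an assumption of this sketch:

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # population variance: E[x^2] - E[x]^2, as in CrossCorrelator
    return sum(v * v for v in xs) / len(xs) - mean(xs) ** 2

def covariance(xs, ys):
    if len(xs) != len(ys):
        raise ValueError("series must have the same length; resample first")
    return sum(a * b for a, b in zip(xs, ys)) / len(xs) - mean(xs) * mean(ys)

def coefficient(xs, ys):
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))

print(round(coefficient([1, 2, 3], [2, 4, 6]), 9))  # 1.0
print(round(coefficient([1, 2, 3], [3, 2, 1]), 9))  # -1.0
```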
Just looking at the data sets, it seems like the graphs would be exactly the same if I were to grab the values at 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10, and the function would produce a more accurate result.
However, it's skewed since the lengths are not the same.
Is there some way to find what the y values at the integer x positions would be in the twelve-value data set? I haven't found a simple way to do it, but this would be incredibly helpful.
Thanks in advance,
Edit: As per request, here is the code that generates the X values of the graphs:
def x = (1..12)
def y = 10
change = { l, size ->
  v = [1]
  l.each {
    v << ((((size-1)/(x.size() - 1)) * it) + 1)
  }
  v -= v.last()
  return v
}
change(x, y)
Edit: Here is the code that is not working, as per another request:
def normalize( xylist, days ) {
  xylist.collect { x, y -> [ x * ( days / xylist.size() ), y ] }
}
def change = { l, size ->
  def v = [1]
  l.each {
    v << ((((size-1)/(l.size() - 1)) * it) + 1)
  }
  v -= v.last()
  return v
}
def resample( list, min, max ) {
  // We want a graph with integer points from min to max on the x axis
  (min..max).collect { i ->
    // find the values above and below this point
    bounds = list.inject( [ a:null, b:null ] ) { r, p ->
      // if the value is less than i, set it in r.a
      if( p[ 0 ] < i )
        r.a = p
      // if it's bigger (and we don't already have a bigger point)
      // then set it into r.b
      if( !r.b && p[ 0 ] >= i )
        r.b = p
      r
    }
    // so now, bounds.a is the point below our required point, and bounds.b the point at or above it
    // Deal with the first case (where a is null, because we are at the start)
    if( !bounds.a )
      [ i, list[ 0 ][ 1 ] ]
    else {
      // so work out the distance from bounds.a to bounds.b
      dist = ( bounds.b[0] - bounds.a[0] )
      // And how far the point i is along this line
      r = ( i - bounds.a[0] ) / dist
      // and recalculate the y figure for this point
      y = ( ( bounds.b[1] - bounds.a[1] ) * r ) + bounds.a[1]
      [ i, y ]
    }
  }
}
def feb = [9, 3, 7, 23, 15, 16, 17, 18, 19, 13, 14, 8, 13, 12, 15, 6, 7, 13, 19, 12, 7, 3, 4, 15, 6, 17, 8, 19]
def march = [8, 12, 4, 17, 11, 15, 12, 8, 9, 13, 12, 7, 3, 4, 8, 2, 17, 19, 21, 12, 12, 13, 14, 15, 16, 7, 8, 19, 21, 14, 16]
//X and Y Values for February
z = [(1..28), change(feb, 28)].transpose()
//X and Y Values for March stretched to 28 entries
o = [(1..31), change(march, 28)].transpose()
o1 = normalize(o, 28)
resample(o1, 1, 28)
If I change "march" in the o variable declaration to (1..31), the script runs successfully. When I use "march", I get "java.lang.NullPointerException: Cannot invoke method getAt() on null object".
Also: I try not to directly copy code just because it's bad practice, so one of the functions I changed basically does the same thing, it's just my version. I'll get around to refactoring the rest of it eventually, too. But that's why it's slightly different.
Ok...here we go...this may not be the cleanest bit of code ever...
Let's first generate two distributions of different lengths, both rising to 10 on the y axis:
def generate( range, max ) {
  range.collect { i ->
    [ i, max * ( i / ( range.to - range.from + 1 ) ) ]
  }
}
// A distribution 10 elements long from 1 to 10
def e1 = generate( 1..10, 10 )
// A distribution 14 elements long from 1 to 10
def e2 = generate( 1..14, 10 )
So now, e1 and e2 are:
[1.00,1.00], [2.00,2.00], [3.00,3.00], [4.00,4.00], [5.00,5.00], [6.00,6.00], [7.00,7.00], [8.00,8.00], [9.00,9.00], [10.00,10.00]
[1.00,0.71], [2.00,1.43], [3.00,2.14], [4.00,2.86], [5.00,3.57], [6.00,4.29], [7.00,5.00], [8.00,5.71], [9.00,6.43], [10.00,7.14], [11.00,7.86], [12.00,8.57], [13.00,9.29], [14.00,10.00]
respectively (to 2dp). Now, using the code from the previous question, we can normalize these to the same x range:
def normalize( xylist, days ) {
  xylist.collect { x, y -> [ x * ( days / xylist.size() ), y ] }
}
n1 = normalize( e1, 10 )
n2 = normalize( e2, 10 )
This means n1 and n2 are:
[1.00,1.00], [2.00,2.00], [3.00,3.00], [4.00,4.00], [5.00,5.00], [6.00,6.00], [7.00,7.00], [8.00,8.00], [9.00,9.00], [10.00,10.00]
[0.71,0.71], [1.43,1.43], [2.14,2.14], [2.86,2.86], [3.57,3.57], [4.29,4.29], [5.00,5.00], [5.71,5.71], [6.43,6.43], [7.14,7.14], [7.86,7.86], [8.57,8.57], [9.29,9.29], [10.00,10.00]
But, as you correctly state, they have different numbers of sample points, so they cannot be compared easily.
However, we can write a method that steps through each point we want in our graph, finds the two closest points, and interpolates a y value from the values of these two points, like so:
def resample( list, min, max ) {
  // We want a graph with integer points from min to max on the x axis
  (min..max).collect { i ->
    // find the values above and below this point
    bounds = list.inject( [ a:null, b:null ] ) { r, p ->
      // if the value is less than i, set it in r.a
      if( p[ 0 ] < i )
        r.a = p
      // if it's bigger (and we don't already have a bigger point)
      // then set it into r.b
      if( !r.b && p[ 0 ] >= i )
        r.b = p
      r
    }
    // so now, bounds.a is the point below our required point, and bounds.b the point at or above it
    if( !bounds.a ) // no lower bound...take the first element
      [ i, list[ 0 ][ 1 ] ]
    else if( !bounds.b ) // no upper bound... take the last element
      [ i, list[ -1 ][ 1 ] ]
    else {
      // so work out the distance from bounds.a to bounds.b
      dist = ( bounds.b[0] - bounds.a[0] )
      // And how far the point i is along this line
      r = ( i - bounds.a[0] ) / dist
      // and recalculate the y figure for this point
      y = ( ( bounds.b[1] - bounds.a[1] ) * r ) + bounds.a[1]
      [ i, y ]
    }
  }
}
final1 = resample( n1, 1, 10 )
final2 = resample( n2, 1, 10 )
now, the values final1 and final2 are:
[1.00,1.00], [2.00,2.00], [3.00,3.00], [4.00,4.00], [5.00,5.00], [6.00,6.00], [7.00,7.00], [8.00,8.00], [9.00,9.00], [10.00,10.00]
[1.00,1.00], [2.00,2.00], [3.00,3.00], [4.00,4.00], [5.00,5.00], [6.00,6.00], [7.00,7.00], [8.00,8.00], [9.00,9.00], [10.00,10.00]
(obviously, there is some rounding here, so 2d.p. is hiding the fact that they are not exactly the same)
Phew... Must be home-time after that ;-)
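For readers following along in another language, here is a Python sketch of the same normalize-then-resample idea: linear interpolation at each integer x, falling back to the first or last y value outside the data range. The names mirror the Groovy methods, but the code is only an illustration and assumes the points are sorted by x:

```python
def normalize(points, days):
    """Scale x values by days / len(points), as the Groovy normalize does."""
    return [(x * days / len(points), y) for x, y in points]

def resample(points, lo, hi):
    """Linearly interpolate y at each integer x in lo..hi."""
    out = []
    for i in range(lo, hi + 1):
        below = [p for p in points if p[0] < i]    # candidates for bounds.a
        above = [p for p in points if p[0] >= i]   # candidates for bounds.b
        if not below:                              # no lower bound: first y
            out.append((i, points[0][1]))
        elif not above:                            # no upper bound: last y
            out.append((i, points[-1][1]))
        else:
            (x0, y0), (x1, y1) = below[-1], above[0]
            out.append((i, y0 + (y1 - y0) * (i - x0) / (x1 - x0)))
    return out

e2 = [(i, 10 * i / 14) for i in range(1, 15)]   # 14 points rising to 10
final2 = resample(normalize(e2, 10), 1, 10)
print([round(y, 2) for _, y in final2])
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```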
EDIT
As pointed out in the edit to the question, there was a bug in my resample method that caused it to fail when there was no point at or above the requested x value (bounds.b was null)...
I believe this has now been fixed in the code above, and it works for the given example:
def march = [8, 12, 4, 17, 11, 15, 12, 8, 9, 13, 12, 7, 3, 4, 8, 2, 17, 19, 21, 12, 12, 13, 14, 15, 16, 7, 8, 19, 21, 14, 16]
o = [ (1..31), march ].transpose()
// X values squeezed to be between 1 and 28 (instead of 1 to 31)
o1 = normalize(o, 28)
// Then, resample this graph so there are only 28 points
v = resample(o1, 1, 28)
If you plot the original 31 points (in o) and the new graph of 28 points (in v), the result doesn't look too bad.
I have no idea what the change method was supposed to do, so I have omitted it from this code.
