This is sort of a mathy question...
Before this, I asked a question about normalizing monthly data: How to produce X values of a stretched graph?
I got a good answer and it works well. The only issue is that now I need to compare the X values of a month with 31 days against the X values of a month with 28.
So my question would be: If I have two sets of parameters like so:
 x | y       x2  | y2
 1 | 10     1.0  | 10
 2 | 9      1.81 | 9.2
 3 | 8      2.63 | 8.6
 4 | 7      3.45 | 7.8
 5 | 6      4.27 | 7
 6 | 5      5.09 | 6.2
 7 | 4      5.91 | 5.4
 8 | 3      6.73 | 4.2
 9 | 2      7.55 | 3.4
10 | 1      8.36 | 2.6
            9.18 | 1.8
           10.0  | 1.0
As you can see, the general trend is the same for these two data sets.
However, if I run these values through a cross-correlation function (the general goal), I will get something back that does not reflect this, since the data sets are of two different sizes.
A real-world example of this would be tracking how many miles you run per day:
In February (with 28 days), during the first week, you run one mile each day. During the second week, you run two miles each day, etc.
In March (with 31 days), you do the same thing, but run for one mile for eight days, two miles for eight days, three miles for eight days, and four miles for seven days.
The correlation coefficient according to the following function should be almost exactly 1:
class CrossCorrelator {
    def mean = { x ->
        x.sum() / x.size()
    }
    def variance = { x ->
        def v = 0
        x.each { v += it**2 }
        v / x.size() - (mean(x)**2)
    }
    def covariance = { x, y ->
        def z = 0
        [x, y].transpose().each { z += it[0] * it[1] }
        (z / x.size()) - (mean(x) * mean(y))
    }
    def coefficient = { x, y ->
        covariance(x, y) / (Math.sqrt(variance(x) * variance(y)))
    }
}
def i = new CrossCorrelator()
i.coefficient(ys, y2s)   // ys, y2s: the y and y2 columns from the table above
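For reference, here is the same population-based formula as a minimal Python sketch (Python rather than Groovy; the function names are my own). Unlike the Groovy closures, it refuses unequal-length input instead of letting zip/transpose silently drop the extra values while the means still use all of them, which is exactly the skew described above.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    # Population variance: E[x^2] - (E[x])^2, as in the Groovy closure.
    return sum(v * v for v in xs) / len(xs) - mean(xs) ** 2


def covariance(xs, ys):
    # E[xy] - E[x]E[y]; zip() stops at the shorter list, like transpose().
    return sum(a * b for a, b in zip(xs, ys)) / len(xs) - mean(xs) * mean(ys)


def coefficient(xs, ys):
    # Pearson correlation; only meaningful when both series have equal length.
    if len(xs) != len(ys):
        raise ValueError("series must have the same length; resample first")
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))
```

With equal-length series the result is the usual Pearson r, so a perfectly linear pair gives 1.0.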
Just looking at the data sets, it seems like the graphs would be exactly the same if I were to grab the values at 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10, and the function would produce a more accurate result.
However, it's skewed since the lengths are not the same.
Is there some way to find what the values at the integer x positions of the twelve-value data set would be? I haven't found a simple way to do it, but it would be incredibly helpful.
Thanks in advance,
Edit: As requested, here is the code that generates the X values of the graphs:
def x = (1..12)
def y = 10
def change = { l, size ->
    def v = [1]
    l.each {
        v << ((((size - 1) / (l.size() - 1)) * it) + 1)
    }
    v -= v.last()
    return v
}
change(x, y)
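What change(1..12, 10) computes is simply 12 evenly spaced x values from 1 to 10 (1, 1.818..., 2.636..., ..., 10), matching the x2 column above. A minimal Python sketch of that idea (stretched_x is a hypothetical name) depends only on the number of samples, not on their values, which matters later when the closure gets called with the feb/march readings instead of a 1..n range:

```python
def stretched_x(count, size):
    """Return `count` evenly spaced x values from 1 to `size` inclusive."""
    if count < 2:
        return [1.0] * count
    step = (size - 1) / (count - 1)   # distance between neighbouring x values
    return [1 + step * i for i in range(count)]


xs = stretched_x(12, 10)   # 12 samples squeezed onto 1..10
```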
Edit: Here is the non-working code, as requested in another comment:
def normalize( xylist, days ) {
    xylist.collect { x, y -> [ x * ( days / xylist.size() ), y ] }
}
def change = {l, size ->
    def v = [1]
    l.each{
        v << ((((size-1)/(l.size() - 1)) * it) + 1)
    }
    v -= v.last()
    return v
}
def resample( list, min, max ) {
    // We want a graph with integer points from min to max on the x axis
    (min..max).collect { i ->
        // find the values above and below this point
        bounds = list.inject( [ a:null, b:null ] ) { r, p ->
            // if the value is less than i, set it in r.a
            if( p[ 0 ] < i )
                r.a = p
            // if it's bigger (and we don't already have a bigger point)
            // then set it into r.b
            if( !r.b && p[ 0 ] >= i )
                r.b = p
            r
        }
        // so now, bounds.a is the point below our required point, and bounds.b the point above it
        // Deal with the first case (where a is null, because we are at the start)
        if( !bounds.a )
            [ i, list[ 0 ][ 1 ] ]
        else {
            // so work out the distance from bounds.a to bounds.b
            dist = ( bounds.b[0] - bounds.a[0] )
            // And how far the point i is along this line
            r = ( i - bounds.a[0] ) / dist
            // and recalculate the y figure for this point
            y = ( ( bounds.b[1] - bounds.a[1] ) * r ) + bounds.a[1]
            [ i, y ]
        }
    }
}
def feb = [9, 3, 7, 23, 15, 16, 17, 18, 19, 13, 14, 8, 13, 12, 15, 6, 7, 13, 19, 12, 7, 3, 4, 15, 6, 17, 8, 19]
def march = [8, 12, 4, 17, 11, 15, 12, 8, 9, 13, 12, 7, 3, 4, 8, 2, 17, 19, 21, 12, 12, 13, 14, 15, 16, 7, 8, 19, 21, 14, 16]
//X and Y Values for February
z = [(1..28), change(feb, 28)].transpose()
//X and Y Values for March stretched to 28 entries
o = [(1..31), change(march, 28)].transpose()
o1 = normalize(o, 28)
resample(o1, 1, 28)
If I replace "march" with (1..31) in the o variable declaration, the script runs successfully. When I use "march", I get:
java.lang.NullPointerException: Cannot invoke method getAt() on null object
Also: I try not to copy code directly, because it's bad practice, so one of the functions I changed does basically the same thing; it's just my version. I'll get around to refactoring the rest of it eventually, too. That's why it's slightly different.
Ok...here we go...this may not be the cleanest bit of code ever...
Let's first generate two distributions, both from 1 to 10 (in the y axis)
def generate( range, max ) {
    range.collect { i ->
        [ i, max * ( i / ( range.to - range.from + 1 ) ) ]
    }
}
// A distribution 10 elements long from 1 to 10
def e1 = generate( 1..10, 10 )
// A distribution 14 elements long from 1 to 10
def e2 = generate( 1..14, 10 )
So now, e1 and e2 are:
[1.00,1.00], [2.00,2.00], [3.00,3.00], [4.00,4.00], [5.00,5.00], [6.00,6.00], [7.00,7.00], [8.00,8.00], [9.00,9.00], [10.00,10.00]
[1.00,0.71], [2.00,1.43], [3.00,2.14], [4.00,2.86], [5.00,3.57], [6.00,4.29], [7.00,5.00], [8.00,5.71], [9.00,6.43], [10.00,7.14], [11.00,7.86], [12.00,8.57], [13.00,9.29], [14.00,10.00]
respectively (to 2dp). Now, using the code from the previous question, we can normalize these to the same x range:
def normalize( xylist, days ) {
    xylist.collect { x, y -> [ x * ( days / xylist.size() ), y ] }
}
n1 = normalize( e1, 10 )
n2 = normalize( e2, 10 )
This means n1 and n2 are:
[1.00,1.00], [2.00,2.00], [3.00,3.00], [4.00,4.00], [5.00,5.00], [6.00,6.00], [7.00,7.00], [8.00,8.00], [9.00,9.00], [10.00,10.00]
[0.71,0.71], [1.43,1.43], [2.14,2.14], [2.86,2.86], [3.57,3.57], [4.29,4.29], [5.00,5.00], [5.71,5.71], [6.43,6.43], [7.14,7.14], [7.86,7.86], [8.57,8.57], [9.29,9.29], [10.00,10.00]
But, as you correctly state, they have different numbers of sample points, so they cannot be compared easily.
However, we can write a method that steps through each point we want in our graph, finds the two closest points, and interpolates a y value from those two points, like so:
def resample( list, min, max ) {
    // We want a graph with integer points from min to max on the x axis
    (min..max).collect { i ->
        // find the values above and below this point
        bounds = list.inject( [ a:null, b:null ] ) { r, p ->
            // if the value is less than i, set it in r.a
            if( p[ 0 ] < i )
                r.a = p
            // if it's bigger (and we don't already have a bigger point)
            // then set it into r.b
            if( !r.b && p[ 0 ] >= i )
                r.b = p
            r
        }
        // so now, bounds.a is the point below our required point, and bounds.b the point above it
        if( !bounds.a ) // no lower bound...take the first element
            [ i, list[ 0 ][ 1 ] ]
        else if( !bounds.b ) // no upper bound... take the last element
            [ i, list[ -1 ][ 1 ] ]
        else {
            // so work out the distance from bounds.a to bounds.b
            dist = ( bounds.b[0] - bounds.a[0] )
            // And how far the point i is along this line
            r = ( i - bounds.a[0] ) / dist
            // and recalculate the y figure for this point
            y = ( ( bounds.b[1] - bounds.a[1] ) * r ) + bounds.a[1]
            [ i, y ]
        }
    }
}
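For comparison, the same normalize/resample idea as a minimal Python sketch. It uses bisect to find the neighbouring points instead of scanning the whole list with inject, and assumes the points are sorted by x (true for the data here). The names mirror the Groovy methods, but this is my translation, not a drop-in replacement:

```python
from bisect import bisect_left


def normalize(points, days):
    """Scale x values by days / len(points), as in the Groovy normalize()."""
    factor = days / len(points)
    return [(x * factor, y) for x, y in points]


def resample(points, lo, hi):
    """Linearly interpolate y at every integer x in lo..hi.

    `points` must be sorted by x. Outside the data range the first or last
    y value is reused, like the fixed Groovy version.
    """
    xs = [x for x, _ in points]
    out = []
    for i in range(lo, hi + 1):
        j = bisect_left(xs, i)            # first index with xs[j] >= i
        if j == 0:                        # no lower neighbour
            out.append((i, points[0][1]))
        elif j == len(points):            # no upper neighbour
            out.append((i, points[-1][1]))
        else:
            (x0, y0), (x1, y1) = points[j - 1], points[j]
            t = (i - x0) / (x1 - x0)      # fraction of the way from x0 to x1
            out.append((i, y0 + (y1 - y0) * t))
    return out
```

Running resample(normalize(e2, 10), 1, 10) on the 14-point e2 series from above gives y == x at every integer, as in final2.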
final1 = resample( n1, 1, 10 )
final2 = resample( n2, 1, 10 )
Now, the values final1 and final2 are:
[1.00,1.00], [2.00,2.00], [3.00,3.00], [4.00,4.00], [5.00,5.00], [6.00,6.00], [7.00,7.00], [8.00,8.00], [9.00,9.00], [10.00,10.00]
[1.00,1.00], [2.00,2.00], [3.00,3.00], [4.00,4.00], [5.00,5.00], [6.00,6.00], [7.00,7.00], [8.00,8.00], [9.00,9.00], [10.00,10.00]
(obviously, there is some rounding here, so 2d.p. is hiding the fact that they are not exactly the same)
Phew... Must be home-time after that ;-)
EDIT
As pointed out in the edit to the question, there was a bug in my resample method that caused it to fail in certain conditions...
I believe this has now been fixed in the code above, and from the given example:
def march = [8, 12, 4, 17, 11, 15, 12, 8, 9, 13, 12, 7, 3, 4, 8, 2, 17, 19, 21, 12, 12, 13, 14, 15, 16, 7, 8, 19, 21, 14, 16]
o = [ (1..31), march ].transpose()
// X values squeezed to be between 1 and 28 (instead of 1 to 31)
o1 = normalize(o, 28)
// Then, resample this graph so there are only 28 points
v = resample(o1, 1, 28)
If you plot the original 31 points (in o) and the new graph of 28 points (in v), you get:
[Plot: original 31 points (o) and resampled 28 points (v)]
Which doesn't look too bad.
I have no idea what the change method was supposed to do, so I have omitted it from this code
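If you'd rather have the first and last days line up exactly (which is what the change closure in the question seems to be aiming for), here is a minimal Python sketch of an endpoint-aligned variant. This is a swapped-in approach, not the normalize/resample pair above: normalize maps x to x * 28/31, so the first March point lands at about 0.9 rather than 1. The function name resample_to_length is my own.

```python
def resample_to_length(values, n):
    """Resample `values` to `n` points by linear interpolation.

    Endpoint-aligned: the first and last samples are kept exactly, and
    output sample k sits at fractional index k*(m-1)/(n-1) of the input.
    """
    m = len(values)
    if n == 1 or m == 1:
        return [values[0]] * n
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)     # fractional index into values
        i = min(int(pos), m - 2)        # left neighbour; keeps i + 1 in range
        t = pos - i
        out.append(values[i] + (values[i + 1] - values[i]) * t)
    return out


march = [8, 12, 4, 17, 11, 15, 12, 8, 9, 13, 12, 7, 3, 4, 8, 2,
         17, 19, 21, 12, 12, 13, 14, 15, 16, 7, 8, 19, 21, 14, 16]
march28 = resample_to_length(march, 28)   # same length as the feb list
```

feb and march28 then have the same length and can go straight into a correlation function.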
I am trying to solve the "Counting Change" problem with memoization.
Consider the following problem: How many different ways can we make change of $1.00, given half-dollars, quarters, dimes, nickels, and pennies? More generally, can we write a function to compute the number of ways to change any given amount of money using any set of currency denominations?
And here is the intuitive solution with recursion:
The number of ways to change an amount a using n kinds of coins equals
the number of ways to change a using all but the first kind of coin, plus
the number of ways to change the smaller amount a - d using all n kinds of coins, where d is the denomination of the first kind of coin.
#+BEGIN_SRC python :results output
# cache = {} # add cache
def count_change(a, kinds=(50, 25, 10, 5, 1)):
    """Return the number of ways to change amount a using coin kinds."""
    if a == 0:
        return 1
    if a < 0 or len(kinds) == 0:
        return 0
    d = kinds[0] # d for denomination
    return count_change(a, kinds[1:]) + count_change(a - d, kinds)
print(count_change(100))
#+END_SRC
#+RESULTS:
: 292
I tried to take advantage of memoization:
Signature: count_change(a, kinds=(50, 25, 10, 5, 1))
Source:
def count_change(a, kinds=(50, 25, 10, 5, 1)):
    """Return the number of ways to change amount a using coin kinds."""
    if a == 0:
        return 1
    if a < 0 or len(kinds) == 0:
        return 0
    d = kinds[0]
    cache[a] = count_change(a, kinds[1:]) + count_change(a - d, kinds)
    return cache[a]
It works properly for small numbers like:
In [17]: count_change(120)
Out[17]: 494
but fails on big numbers:
In [18]: count_change(11000)
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
<ipython-input-18-52ba30c71509> in <module>
----> 1 count_change(11000)
/tmp/ipython_edit_h0rppahk/ipython_edit_uxh2u429.py in count_change(a, kinds)
9 return 0
10 d = kinds[0]
---> 11 cache[a] = count_change(a, kinds[1:]) + count_change(a - d, kinds)
12 return cache[a]
... last 1 frames repeated, from the frame below ...
/tmp/ipython_edit_h0rppahk/ipython_edit_uxh2u429.py in count_change(a, kinds)
9 return 0
10 d = kinds[0]
---> 11 cache[a] = count_change(a, kinds[1:]) + count_change(a - d, kinds)
12 return cache[a]
RecursionError: maximum recursion depth exceeded in comparison
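What's the problem with my memoization solution?
In the memoized version, the count_change function has to take into account the highest index of coin you can still use when you make the recursive call, so that the cache key is (amount, coin index) and the already calculated values can actually be reused. Your version keys the cache on the amount alone (even though the count also depends on which coins are still allowed) and never reads from it. Note also the k == 0 base case below: with only the 1-cent coin left (kinds[0] == 1) there is exactly one way, so the recursion never steps down one cent at a time. That one-cent-at-a-time descent (11000 levels deep) is what pushes the original past Python's default recursion limit of 1000.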
def count_change(n, k, kinds):
    if n < 0:
        return 0
    if (n, k) in cache:
        return cache[n,k]
    if k == 0:
        v = 1
    else:
        v = count_change(n-kinds[k], k, kinds) + count_change(n, k-1, kinds)
    cache[n,k] = v
    return v
You can try:
cache = {}
count_change(120, 4, [1, 5, 10, 25, 50])
which gives 494, while:
cache = {}
count_change(11000, 4, [1, 5, 10, 25, 50])
outputs 9930221951.
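As an aside, if you don't need the recursive structure at all, a bottom-up table avoids the recursion limit entirely. This is a swapped-in technique (standard dynamic programming over amounts), not the answer's code; it keeps the question's count_change(a, kinds) signature:

```python
def count_change(amount, kinds=(50, 25, 10, 5, 1)):
    """Count the ways to make `amount` from unlimited coins in `kinds`.

    ways[x] holds the number of ways to make x using the coin kinds
    processed so far. Handling one coin kind per outer pass means each
    combination is counted once, regardless of coin order.
    """
    ways = [1] + [0] * amount          # one way to make 0: use no coins
    for d in kinds:
        for x in range(d, amount + 1):
            ways[x] += ways[x - d]     # add the ways that use at least one d
    return ways[amount]
```

It needs O(amount) memory and O(amount * len(kinds)) time, so count_change(11000) runs without any recursion.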
After watching the Coursera Basic Modelling course, I am trying to categorize my problem so that I can choose a suitable model representation in MiniZinc.
I have a range of 10 products, each with 4 special features/attributes (a 4x10 table). This table has fixed values. The user will give 4 parameters as input.
The constraints will be created so that the user's input parameters determine the product's attribute values.
The decision variable will be the subset of the products that match the user's input.
From my understanding, this is a problem of selecting a subset from a set of objects. Is there an example available that corresponds to the MiniZinc model description above that I could look at?
I'm (still) not completely sure about the exact specification of the problem, but here is a model that identifies all the products that are "nearest" the input data. I've defined "nearest" simply as the sum of absolute differences between each feature of a product and the input array (calculated by the score function).
int: k; % number of products
int: n; % number of features
array[1..k, 1..n] of int: data;
array[1..n] of int: input;
% decision variables
array[1..k] of var int: x;   % the closeness score for each product
array[1..k] of var 0..1: y;  % 1: this product is nearest (as array)
% var set of 1..k: y;        % products closest to input (as set)
var int: z;                  % the minimum score
function var int: score(array[int] of var int: a, array[int] of var int: b) =
  let {
    var int: t = sum([abs(a[i]-b[i]) | i in index_set(a)])
  } in
  t
;
solve minimize z;
constraint
  forall(i in 1..k) (
    x[i] = score(data[i,..], input) /\
    (y[i] = 1 <-> z = x[i])   % array
    % (i in y <-> x[i] = z)   % set
  )
  /\
  minimum(z, x)
;
output [
  "input: \(input)\n",
  "z: \(z)\n",
  "x: \(x)\n",
  "y: \(y)\n\n"
]
++
[
  % using array representation of y
  if fix(y[i]) = 1 then
    "nearest: ix:\(i) \(data[i,..])\n" else "" endif
  | i in 1..k
];
% data
k = 10;
n = 4;
% random features for the products
data = array2d(1..k,1..n,
[
3,6,7,5,
3,5,6,2,
9,1,2,7,
0,9,3,6,
0,5,2,4, % score 5
1,8,7,9,
2,0,2,3, % score 5
7,5,9,2,
2,8,9,7,
3,6,1,2]);
input = [1,2,3,4];
% input = [7,5,9,2]; % exact match for product 8
The output is:
input: [1, 2, 3, 4]
z: 5
x: [11, 10, 13, 10, 5, 15, 5, 17, 16, 10]
y: [0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
nearest: ix:5 [0, 5, 2, 4]
nearest: ix:7 [2, 0, 2, 3]
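To sanity-check the model's output, a brute-force Python sketch of the same "nearest by sum of absolute differences" rule reproduces the x scores and the two nearest products above. nearest_products is a hypothetical helper, not part of MiniZinc, and plain enumeration like this stops being a substitute for the constraint model once you add further constraints on the selection:

```python
def nearest_products(data, target):
    """Score each product by the sum of absolute feature differences to
    `target`; return (1-based indices of the minimum score, all scores)."""
    scores = [sum(abs(a - b) for a, b in zip(row, target)) for row in data]
    best = min(scores)
    nearest = [i + 1 for i, s in enumerate(scores) if s == best]
    return nearest, scores


products = [
    [3, 6, 7, 5], [3, 5, 6, 2], [9, 1, 2, 7], [0, 9, 3, 6], [0, 5, 2, 4],
    [1, 8, 7, 9], [2, 0, 2, 3], [7, 5, 9, 2], [2, 8, 9, 7], [3, 6, 1, 2],
]
nearest, scores = nearest_products(products, [1, 2, 3, 4])
```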
I have a matrix of coordinates in the format XXYY, where XX and YY are numbers (0 to 10,000), but Y is represented using letters (A = 1, B = 2, AA = 27 and so on).
points = ["2B", "29AA", "18F", "5AG"]
How can I convert this to something like:
xy_points = [(2, 2), (29, 27), (18, 6), (5, 33)]
My first thought was to use int() and ord(), but things get complicated when a Y coordinate is more than one letter (AZ, AE, BE).
A = ["1", "2", "3", "B"]
C = [list(map(lambda x: int(x) if x.isdigit() else ord(x) - 64, A))]
print(C)
I know I can take the string letter by letter and convert it to an integer using base 26 (e.g. AG would be (ord("A") - 64) * 26 + (ord("G") - 64)), but that would involve a lot of lines.
Is there a simple way to do so?
A recursive function like this will do:
def y(s):
    if len(s) == 1:
        return ord(s[0]) - ord('A') + 1
    return y(s[-1]) + 26 * y(s[:-1])
print(y('B'), y('AA'), y('F'), y('AG'), y('AA'), y('BA'), y('ZZ'), y('AAA'))
This outputs:
2 27 6 33 27 53 702 703
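To go all the way from the strings in the question to (x, y) tuples, here is a minimal sketch that splits each point with a regular expression and folds the letters iteratively instead of recursively. It assumes every point is leading digits followed by uppercase letters; letters_to_number and parse_point are my own names:

```python
import re


def letters_to_number(s):
    """Convert a bijective base-26 label (A=1, Z=26, AA=27) to an int."""
    n = 0
    for ch in s:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_point(p):
    """Split e.g. '29AA' into (29, 27)."""
    m = re.fullmatch(r"(\d+)([A-Z]+)", p)
    if m is None:
        raise ValueError(f"not a digits+letters coordinate: {p!r}")
    return int(m.group(1)), letters_to_number(m.group(2))


points = ["2B", "29AA", "18F", "5AG"]
xy_points = [parse_point(p) for p in points]
```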
I have a list of values, and I want to find all the values that lie an integer multiple of a step (1.0033) above a base value, within a certain tolerance, and also get their indices:
def GetPPMError(t, m):
    """
    calculate the ppm error between theoretical (t) and measured (m) values
    """
    return (((t - m) / t) * 1e6)
multiple = n*1.0033  # for n = 1, 2, 3, ...
a = 100
b = [100, 101, 101.0033,102, 102.0066,102.123,103.0099]
results = [p for p in b if abs(GetPPMError(a,p)) < 10]
So I want to find all the values that are multiples, like 101.0033, 102.0066 and 103.0099,
where the targets are a = 100 + 1*1.0033, a = 100 + 2*1.0033, a = 100 + 3*1.0033, etc.
The results should be the indices:
[2, 4, 6]
and the values:
[101.0033, 102.0066, 103.0099]
This works for your data:
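b = [100, 101, 101.0033, 102, 102.0066, 102.123, 103.0099]
multiple = 1.0033
a = 100
digits = 6
res = []
res_index = []
for n, x in enumerate(b):
    # number of steps between a and x, rounded to absorb floating-point error
    diff = round((x - a) / multiple, digits)
    if diff != 0 and diff == int(diff):
        res.append(x)
        res_index.append(n)
print(res)
print(res_index)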
Output:
[101.0033, 102.0066, 103.0099]
[2, 4, 6]
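If you want to keep the ppm tolerance from the question instead of rounding to a fixed number of digits, one option is to round to the nearest whole number of steps and then check the ppm error against that expected value. This is a minimal sketch; find_multiples and its tol_ppm parameter are my own names, and requiring n >= 1 (so 100 itself is excluded) follows the expected output above:

```python
def ppm_error(t, m):
    """Error of measured m relative to theoretical t, in parts per million."""
    return (t - m) / t * 1e6


def find_multiples(values, base, step, tol_ppm=10.0):
    """Return (indices, values) of entries within tol_ppm of base + n*step, n >= 1."""
    idx, hits = [], []
    for i, v in enumerate(values):
        n = round((v - base) / step)    # nearest whole number of steps
        if n < 1:
            continue
        expected = base + n * step
        if abs(ppm_error(expected, v)) < tol_ppm:
            idx.append(i)
            hits.append(v)
    return idx, hits


b = [100, 101, 101.0033, 102, 102.0066, 102.123, 103.0099]
indices, matches = find_multiples(b, 100, 1.0033)
```

Here 101 is about 33 ppm away from 101.0033 and 102 about 65 ppm away from 102.0066, so both are rejected at the 10 ppm threshold.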
I am having problems with my quicksort function constantly recursing into the best-of-three function. I don't know why it is doing that, and I need help. I am practicing this for my coding class next semester; it is one of last year's assignments that my friend had, and I'm lost when it comes to this error.
This is my quicksort function:
def quick_sort ( alist, function ):
    if len(alist) <= 1:
        return alist + []
    pivot, index = function(alist)
    #print("Pivot:",pivot)
    left = []
    right = []
    for value in range(len(alist)):
        if value == index:
            continue
        if alist[value] <= pivot:
            left.append(alist[value])
        else:
            right.append(alist[value])
    print("left:", left)
    print("right:", right)
    sortedleft = quick_sort( left, function )
    print("sortedleft", sortedleft)
    sortedright = quick_sort( right, function )
    print("sortedright", sortedright)
    completeList = sortedleft + [pivot] + sortedright
    return completeList
#main
alist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
x = quick_sort(alist, best_of_three)
print(x)
This is my best-of-three function:
def best_of_three( bNlist, nine = False ):
    rightindex = 2
    middleindex = 1
    if nine == False:
        left = blist[0]
        rightindex = int(len(blist) - 1)
        rightvalue = int(blist[rightindex])
        middleindex = int((len(blist) - 1)/2)
        middlevalue = int(blist[middleindex])
        bNlist.append(left)
        bNlist.append(middlevalue)
        bNlist.append(rightvalue)
    BN = bNlist
    print("Values:",BN)
    left = bNlist[0]
    middle = bNlist[1]
    right = bNlist[2]
    if left <= middle <= right:
        return middle , middleindex
    elif left >= middle >= right:
        return middle, middleindex
    elif middle <= right <= left:
        return right, rightindex
    elif middle >= right >= left:
        return right, rightindex
    else:
        return left, 0
#main
bNlist = []
print('Best of Three')
blist = [54,26,93,17,77,31,44,55]
print("")
print( "List: [54,26,93,17,77,31,44,55]" )
x, index = best_of_three(bNlist)
print("Pivot: ",x)
print("----------------------------")
I really don't know why it keeps recursing infinitely.
There is also a third function called ninther:
def ninther( bNlist ):
    stepsize = int(len(blist) / 9)
    left = 0
    middle = left + 2
    right = left + 2 * stepsize
    blist[left]
    blist[middle]
    blist[right]
    leftvalue = blist[left]
    rightvalue = blist[right]
    middlevalue = blist[middle]
    left2 = right + stepsize
    middle2 = left2 + 2
    right2 = left2 + 2 * stepsize
    blist[left2]
    blist[middle2]
    blist[right2]
    left2value = blist[left2]
    middle2value = blist[middle2]
    right2value = blist[right2]
    left3 = right2 + stepsize
    middle3 = left3 + 2
    right3 = left3 + 2 * stepsize
    blist[left3]
    blist[middle3]
    blist[right3]
    left3value = blist[left3]
    middle3value = blist[middle3]
    right3value = blist[right3]
    bN3list = []
    bN2list = []
    bNlist = []
    bNlist.append(leftvalue)
    bNlist.append(middlevalue)
    bNlist.append(rightvalue)
    bN2list.append(left2value)
    bN2list.append(middle2value)
    bN2list.append(right2value)
    bN3list.append(left3value)
    bN3list.append(middle3value)
    bN3list.append(right3value)
    BN3 = bN3list
    BN2 = bN2list
    BN = bNlist
    print("Ninter ")
    print("Group 1:", BN)
    print("Group 2:", BN2)
    print("Group 3:", BN3)
    x = best_of_three(bNlist, True)[0]
    c = best_of_three(bN2list, True)[0]
    d = best_of_three(bN3list, True)[0]
    print("Median 1:", x)
    print("Median 2:", c)
    print("Median 3:", d)
    bN4list = [x,c,d]
    print("All Medians:", bN4list)
    z = best_of_three(bN4list, True)
    return z[0], z[1]
#main
blist = [2, 6, 9, 7, 13, 4, 3, 5, 11, 1, 20, 12, 8, 10, 32, 16, 14, 17, 21, 46]
Y = ninther(blist)
print("Pivot", Y)
print("----------------------------")
I have looked everywhere in it, and I can't figure out where the problem is when calling best_of_three.
Summary: The main error causing infinite recursion is that you don't deal with the case where best_of_three receives a length 2 list. A secondary error is that best_of_three modifies the list you send to it. If I correct these two errors, as below, your code works.
The details: best_of_three([1, 2]) returns (2, 3), implying a pivot value of 2 at index 3, which is wrong (the 2 is at index 1). This would give a left list of [1, 2], which then causes exactly the same behavior at the next recursive quick_sort(left, function) call.
More generally, the problem is that the very idea of choosing the best index out of three possible values is impossible for a length 2 list, and you haven't chosen how to deal with that special case.
If I add this special case code to best_of_three, it deals with the length 2 case:
if len(bNlist) == 2:
    return bNlist[1], 1
The function best_of_three also modifies bNlist. I have no idea why you have the lines of the form bNlist.append(left) in that function.
L = [15, 17, 17, 17, 17, 17, 17]
best_of_three(L)
print(L) # prints [15, 17, 17, 17, 17, 17, 17, 54, 17, 55]
I removed the append lines, since having best_of_three modify bNlist is unlikely to be what you want, and I have no idea why those lines are there. However, you should ask yourself why they are there to begin with. There might be some reason I don't know about. When I do that, there are a couple of quantities you compute that are never used, so I remove the lines that compute those also.
Then I notice you have the code
rightindex = 2
middleindex = 1
if nine == False:
    rightindex = int(len(blist) - 1)
    middleindex = int((len(blist) - 1)/2)
left = bNlist[0]
middle = bNlist[1]
right = bNlist[2]
This doesn't seem to make any sense, since you set rightindex and middleindex to other values, but then you still access values using the old indices (2 and 1 respectively). So I removed the if nine == False block. Again, ask yourself why you had this code to begin with, maybe there's some other way you should modify this to account for something I don't know about.
The result is the following for best_of_three:
def best_of_three(bNlist):
    print(bNlist)
    if len(bNlist) == 2:
        return bNlist[1], 1
    rightindex = 2
    middleindex = 1
    left = bNlist[0]
    middle = bNlist[1]
    right = bNlist[2]
    if left <= middle <= right:
        return middle , middleindex
    elif left >= middle >= right:
        return middle, middleindex
    elif middle <= right <= left:
        return right, rightindex
    elif middle >= right >= left:
        return right, rightindex
    else:
        return left, 0
If I use this, your code does not recurse infinitely, and it sorts.
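Putting the two fixes together, here is a minimal non-mutating sketch. It is my own rewrite, not your assignment's required structure: the pivot chooser reads the first, middle and last elements without appending anything, and lists shorter than three just use their last element, which covers the length-2 case. median_of_three is a hypothetical name:

```python
def median_of_three(items):
    """Return (pivot_value, pivot_index) from the first, middle and last items.

    Reads the list without modifying it; lists shorter than 3 use their
    last element, which handles the length-2 case.
    """
    if len(items) < 3:
        return items[-1], len(items) - 1
    lo, mid, hi = 0, (len(items) - 1) // 2, len(items) - 1
    candidates = sorted([(items[lo], lo), (items[mid], mid), (items[hi], hi)])
    return candidates[1]              # the median (value, index) pair


def quick_sort(items, choose_pivot=median_of_three):
    if len(items) <= 1:
        return list(items)
    pivot, index = choose_pivot(items)
    # The pivot position is skipped, so each recursive call gets fewer items.
    left = [v for i, v in enumerate(items) if i != index and v <= pivot]
    right = [v for i, v in enumerate(items) if i != index and v > pivot]
    return quick_sort(left, choose_pivot) + [pivot] + quick_sort(right, choose_pivot)
```

Because the pivot index always refers to a real position in the list being partitioned, every call shrinks the problem by at least one element, which is what guarantees termination.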
I don't know why you mentioned ninther at all, since it seems to have nothing to do with your question. You should probably edit it to remove that code.