How can I make sure that a rounding error doesn't happen when I multiply a number, round it, then pull it back out? - rounding

We have an API that we need to make payments to. On our side we have an initial payment and a fee of 3% that we add to get the total.
We pass this total to the API, which then pulls it apart. It knows about the 3% fee and breaks the total back down into the fee and the payment by taking the total amount and multiplying it by 97.09% to get the payment.
The problem we are having is that the API only accepts two decimal places, but we need things to work out to the penny.
So for example:
payment: $100.01
makes fee of $3.0103
total = payment + fee = 103.0103 (rounded to 103.01)
reverseEngineered = total * percent = 103.01 * .9709
leaves us with 100.012409 (100.01)
Which is correct. But for this example:
payment = $333.33
makes fee of 9.9999
total = payment + fee = 343.3299 (rounded to 343.33)
reverseEngineered = total * percent = 343.33 * .9709
leaves us with 333.339097 (333.34)
333.33 != 333.34 so there is a problem when rounding.
I don't control the API otherwise I would have the percent be more accurate (97.08737864%).
Any ideas on how this can be done, or is there no way to ensure it returns without rounding errors?
After doing some math I found that using 2.99721907% as the fee percentage makes more numbers work out:
{
Thought process: x * y = z, therefore z / y = x;
x / .9709 = z, therefore z * .9709 = x
1 / .9709 = 1.0299721907
}
Example
payment = $333.33
makes fee of 9.9906303260
total = payment + fee = 343.3206 (rounded to 343.32)
reverseEngineered = total * percent = 343.32 * .9709
leaves us with 333.329388 (333.33)
But I'm not sure this will always be the case. Does anyone know a way I could be sure? Or will this not work for every number?
Edit:
I'm going to be more clear about where we stand with the API. We didn't write it and don't have control over it; a company we are working with does. We might be able to suggest changes, but nothing more than that.
When we send the payment over the API, on the other company's end they break the payment and fee apart and send the funds to two separate accounts. This is why the fee needs to be reverse engineered.

Since 97.08737864 does not equal 97.09, you're screwed. Why are you using an API for such a simple calculation?

Well, as I think you know by now, it will never work exactly; you need to change the API. You can't redefine math.

I was testing this out in PHP/Laravel, dumping the starting payment and the final result to the log files along with the overall difference between the original payment and the final result. This is the code I used for testing:
<?php

namespace App\Http\Controllers;

use App\Project;

class ProjectsController extends Controller
{
    public function run()
    {
        set_time_limit(300000);
        for ($payment = 330; $payment < 340; $payment += .01) {
            // Fix the weird increment problem with floats
            $payment = round($payment, 2);
            $fee = $this->applyFee($payment);
            $total = $fee + $payment;
            $returned = $this->passToAPI($total, true, false);
            // \Log::alert($payment . ": " . ($returned["Payment"] - $payment));
            \Log::alert($payment . ": " . round($returned["Payment"], 2));
            \Log::alert($payment - $returned["Payment"]);
            if ($payment != round($returned["Payment"], 2)) {
                \Log::alert($payment . " is unequal");
            }
        }
    }

    public function passToAPI($value, $round = true, $roundResult = true)
    {
        $rndTotal = $round ? round($value, 2) : $value;
        $Payment = $this->reverseEngineerPayment($rndTotal);
        $Fee = $rndTotal - $Payment;
        $Payment = $roundResult ? round($Payment, 2) : $Payment;
        $Fee = $roundResult ? round($Fee, 2) : $Fee;
        return array("Payment" => $Payment, "Fee" => $Fee);
    }

    public function reverseEngineerPayment($total)
    {
        return $total * .9709;
    }

    public function applyFee($payment)
    {
        return $payment * .0299721907;
    }

    public function reverseEngineerFee($total)
    {
        return $total * .0291;
    }
}
When using .03 for the applyFee function the results were something like this:
[2018-07-12 13:47:10] local.ALERT: 330: 330.01
[2018-07-12 13:47:10] local.ALERT: -0.0089099999999576
[2018-07-12 13:47:10] local.ALERT: 330 is unequal
[2018-07-12 13:47:10] local.ALERT: 330.01: 330.02
[2018-07-12 13:47:10] local.ALERT: -0.0086190000000101
[2018-07-12 13:47:10] local.ALERT: 330.01 is unequal
[2018-07-12 13:47:10] local.ALERT: 330.02: 330.03
[2018-07-12 13:47:10] local.ALERT: -0.0083280000000059
[2018-07-12 13:47:10] local.ALERT: 330.02 is unequal
[2018-07-12 13:47:10] local.ALERT: 330.03: 330.04
[2018-07-12 13:47:10] local.ALERT: -0.0080370000000016
[2018-07-12 13:47:10] local.ALERT: 330.03 is unequal
[2018-07-12 13:47:10] local.ALERT: 330.04: 330.05
[2018-07-12 13:47:10] local.ALERT: -0.0077459999999974
[2018-07-12 13:47:10] local.ALERT: 330.04 is unequal
...
Which was kind of expected because of the calculation rounding error.
But when I used .0299721907 for the applyFee function the results were like this:
[2018-07-12 13:50:21] local.ALERT: 330: 330
[2018-07-12 13:50:21] local.ALERT: 0.00079900000002908
[2018-07-12 13:50:21] local.ALERT: 330.01: 330.01
[2018-07-12 13:50:21] local.ALERT: 0.0010900000000333
[2018-07-12 13:50:21] local.ALERT: 330.02: 330.02
[2018-07-12 13:50:21] local.ALERT: 0.0013809999999808
[2018-07-12 13:50:21] local.ALERT: 330.03: 330.03
[2018-07-12 13:50:21] local.ALERT: 0.001671999999985
[2018-07-12 13:50:21] local.ALERT: 330.04: 330.04
[2018-07-12 13:50:21] local.ALERT: 0.0019630000000461
[2018-07-12 13:50:21] local.ALERT: 330.05: 330.05
They always returned, to the penny, the result I passed in. It seems the rounding error only reaches the decimal place after the penny, so the results are always the desired ones.
Note: I also tested this script from 1 to 50,000 and was unable to find any instance where the result wasn't the same.
I realize this is pretty clunky but it works as far as I can tell. If you can find a reason that this wouldn't work out then I'll be happy to not use it, but as far as I can tell it's solid.
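As a cross-check outside PHP, here is a minimal Python sketch of the same round trip (the function and variable names are mine, not the API's). The reason the adjusted rate works: 1 / 0.9709 ≈ 1.0299721907, so the sent total is within half a cent of payment / 0.9709, and 0.9709 × 0.005 ≈ 0.00485 < 0.005, so the API's multiplication always rounds back to the original payment.

```python
# Sketch: exhaustively check the round trip with the adjusted fee rate.
FEE_RATE = 0.0299721907   # fee rate derived from 1 / 0.9709 (vs the naive 3%)
API_RATE = 0.9709         # what the API multiplies the total by

def round_trips(cents: int) -> bool:
    """True if a payment of `cents` survives the send/reverse cycle."""
    payment = cents / 100
    total = round(payment * (1 + FEE_RATE), 2)   # what we would send
    returned = round(total * API_RATE, 2)        # what the API computes back
    return round(returned * 100) == cents

# Check every cent amount from $0.01 up to $500.00
mismatches = [c for c in range(1, 50001) if not round_trips(c)]
print(len(mismatches))  # 0: every amount rounds back to itself
```

The half-cent bound above is why no mismatches appear; it holds for any amount small enough that the truncation of 1/0.9709 to ten digits stays far below a cent.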

Related

OR-Tools VRP: Constrain locations to be served by same vehicle

I would like to constrain locations to be served by the same vehicle.
I used capacity constraints to achieve this. Say we have l = [[1, 2], [3, 4]], which means that locations 1 and 2 must be served by the same vehicle, and 3 and 4 as well. So 1 and 2 end up on route_1 and 3 and 4 on route_2.
My code for achieving this is:
for idx, route_constraint in enumerate(l):
    vehicle_capacities = [0] * NUM_VEHICLES
    vehicle_capacities[idx] = len(route_constraint)
    route_dimension_name = 'Same_Route_' + str(idx)

    def callback(from_index):
        from_node = manager.IndexToNode(from_index)
        return 1 if from_node in route_constraint else 0

    same_routes_callback_index = routing.RegisterUnaryTransitCallback(callback)
    routing.AddDimensionWithVehicleCapacity(
        same_routes_callback_index,
        0,                    # null capacity slack
        vehicle_capacities,   # vehicle maximum capacities
        True,                 # start cumul to zero
        route_dimension_name)
The idea is that 1 and 2 each have a capacity demand of 1 unit (all others have zero). As only vehicle 1 has a capacity of 2, it is the only one able to serve 1 and 2.
This seems to work fine if len(l) == 1. If it is greater, the solver is not able to find a solution, even though I put into l pairs of locations which were on the same route without the above code (hence without the capacity constraints).
Is there a more elegant way to model my requirement?
Why does the solver fail to find a solution?
I have also considered the possibility of dropping visits (at a high cost) to give the solver the possibility of starting from a solution which drops visits, such that it will find its way from this point to a solution without any drops. I had no luck.
Thanks in advance.
Each stop has a vehicle var whose values determine what vehicle is allowed to visit the stop. If you want to have stops 1 and 2 serviced by vehicle 0 use a member constraint on the vehicle var of each stop and set it to [0]. Since you might have other constraints that make stops optional add the value -1 to the list. It is a special value that indicates that the stop is not serviced by a vehicle.
In Python:
n2x = index_manager.NodeToIndex
cpsolver = routing_model.solver()
for stop in [1, 2]:
    vehicle_var = routing_model.VehicleVar(n2x(stop))
    values = [-1, 0]
    cpsolver.Add(cpsolver.MemberCt(vehicle_var, values))
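Separately, one possible reason the question's loop fails for len(l) > 1 (my conjecture, not something the answer states): Python closures capture variables by reference, so every `callback` defined in the loop reads the *last* value of `route_constraint` by the time the solver invokes it, meaning all dimensions count the same pair. A minimal sketch of the pitfall, and the default-argument fix:

```python
# Closures defined in a loop all share the loop variable.
callbacks = []
for group in [[1, 2], [3, 4]]:
    def callback(node):
        return 1 if node in group else 0
    callbacks.append(callback)

buggy = callbacks[0](1)   # 0, not 1: both callbacks see group == [3, 4]

# Binding the current value as a default argument freezes it per iteration.
callbacks = []
for group in [[1, 2], [3, 4]]:
    def callback(node, group=group):
        return 1 if node in group else 0
    callbacks.append(callback)

fixed = callbacks[0](1)   # 1, as intended
print(buggy, fixed)
```

If the original code has the same late-binding shape, applying `route_constraint=route_constraint` as a default argument to each `callback` is worth trying before restructuring the model.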

Running time for a shortestP alg in unweighted undirected graph in python3

For this kind of problem I thought it would be best to make a BFS-like implementation. I don't know why, but after some running-time testing, the plot that came out resembles an exponential function.
So I began to think about my code's correctness: is it really efficient? Can you help me do a running-time analysis for a good algorithm?
I've plotted the log base 1.5 on the x-axis (since in the code I use a list of the first 30 powers of 1.5 as the number of vertices input to a random graph generator). It still looks exponential...
def bfs_short(graph, source, target):
    visited = set()
    queue = collections.deque([source])
    d = {}
    d[source] = 0
    while queue:
        u = queue.pop()
        if u == target:
            return d[target]
        if u not in visited:
            visited.add(u)
            for w in graph[u]:
                if w not in visited:
                    queue.appendleft(w)
                    d[w] = d[u] + 1
The thing is... I didn't post the benchmarking input trials, which may also cause problems, but first of all I want to be sure that the code works fine... solving the problems related to testing is my final purpose.
A problem in your code is that your queue does not take into account the distance to the origin in order to prioritize the next vertex it will pop.
Also, collections.deque is not a priority queue, and thus will not give you the closest unvisited vertex seen so far when you pop an element from it.
This should do it, using heapq, a built-in heap implementation:
import heapq

def bfs_short(graph, source, target):
    visited = set()
    queue = [(0, source)]
    heapq.heapify(queue)
    while queue:  # note: `while not queue` would exit immediately
        dist, vertex = heapq.heappop(queue)
        if vertex == target:
            return dist
        if vertex not in visited:
            visited.add(vertex)
            for neighbor in graph[vertex]:
                if neighbor not in visited:
                    heapq.heappush(queue, (dist + 1, neighbor))
In this version, pairs are pushed onto the queue. To compare tuples, Python compares their first elements, then the second in case of equality, and so on.
So by pushing the distance to the origin as the first member of the tuple, the smallest pair (dist, vertex) in the heap will also be the one closest to the origin vertex.
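As a quick sanity check, here is the heap-based version run on a tiny adjacency-list graph (my own toy data, not from the question), restated so the snippet is self-contained:

```python
import heapq

def bfs_short(graph, source, target):
    visited = set()
    queue = [(0, source)]   # (distance from source, vertex)
    while queue:
        dist, vertex = heapq.heappop(queue)
        if vertex == target:
            return dist
        if vertex not in visited:
            visited.add(vertex)
            for neighbor in graph[vertex]:
                if neighbor not in visited:
                    heapq.heappush(queue, (dist + 1, neighbor))
    return None  # target unreachable

# A 0-1-2-3 chain plus a shortcut 0-4-3: shortest path 0 -> 4 -> 3
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
print(bfs_short(graph, 0, 3))  # 2
```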

What is the formula for calculating an average rating from ratings originated from different sources?

I want to build a website where I will show ratings for movies. I want to show an average rating for each movie, and I want to calculate that average from ratings from different sources. The issue is that on some of the desired sources only a few people have rated a movie and on others, thousands. E.g. the movie "The Revenant": it has an average rating of 7.9 on IMDB (10,000 users have voted) and 9.9 on the "XYZ" website (10 users have voted). What would a formula to calculate that average look like?
Initially I thought of just simply assigning weights based on the number of users, but I have the feeling I am missing something. Any ideas?
average = sum_of(rating * number_of_people) / total_sum_of_people
If you want to make every vote carry the same weight then you should weight your averages by vote count, as you initially thought. The formula would be (9.9 * 10 + 7.9 * 10000) / 10010. You can, however, weight users from some sites more than others, using arbitrary weights.
Finally, note that computing the average is better done with an accumulator, in terms of memory and avoiding overflow (if that matters).
acc = scores[0]; u = users[0];
for (i = 1; i < nb_sites; i++) {
    r = u / (u + users[i]);           // weight of the votes seen so far
    acc = acc * r + scores[i] * (1 - r);
    u += users[i];
}
return acc;
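The accumulator above can be sketched in Python and checked against the direct weighted formula using the question's numbers (the helper names are mine):

```python
def running_weighted_average(scores, users):
    acc, u = scores[0], users[0]
    for s, n in zip(scores[1:], users[1:]):
        r = u / (u + n)              # weight of the votes folded in so far
        acc = acc * r + s * (1 - r)  # fold in this site's average
        u += n
    return acc

scores, users = [7.9, 9.9], [10000, 10]   # IMDB and "XYZ"
direct = sum(s * n for s, n in zip(scores, users)) / sum(users)
incremental = running_weighted_average(scores, users)
print(abs(incremental - direct) < 1e-9)  # True: both are about 7.902
```

Note how the 10 votes at 9.9 barely move the result away from 7.9, which is the intended effect of weighting by vote count.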

Pattern matching benchmarking : Compiletime lookup vs Runtime lookup in D

I need advice on my first D project. I have uploaded it at:
https://bitbucket.org/mrjohns/matcher/downloads
IDEA: Benchmark 3 runtime algorithms and compare them to their compile-time variants. The only difference between these is that for the compile-time ones, the lookup tables (i.e. the arrays bmBc, bmGs, and suffixes) must be computed at compile time (I currently rely on CTFE), while for the runtime ones the lookup tables are computed at runtime.
NB: The pattern-matching algorithms themselves need not be executed at compile time, only the lookup tables. That stated, the algorithms which run on known (compile-time computed) tables should be faster than the ones which have to compute them at runtime.
My results seem to show something different: only the first pair (BM_Runtime and BM_Compile-time) yields admissible results; the other two pairs give higher execution times for the compile-time variants. I think I am missing something here. Please help.
Current results for the pattern "GCAGAGAG" are as below:
**BM_Runtime** = 366 hnsecs position= 513
**BM_Compile-time** = 294 hnsecs position =513
**BMH_Runtime** = 174 hnsecs position= 513
**BMH_Compile-time** = 261 hnsecs position= 513
**AG_Run-time** = 258 hnsecs position= 513
**AG_Compile-time** = 268 hnsecs position= 513
Running the code : dmd -J. matcher.d inputs.d rtime_pre.d ctime_pre.d && numactl --physcpubind=0 ./matcher
I would appreciate your suggestions.
Thanking you in advance.
Any performance test without compiler optimizations enabled is not useful. You should add dmd -release -inline -O -boundscheck=off. Also, performance tests usually repeat the measured calculation in a loop many times; otherwise you may get misleading results.
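The point about repetition can be illustrated outside D (the project itself is in D; this Python sketch only shows the measurement idea, with a stand-in search function): time many iterations per sample, collect several samples, and take the minimum as the least-noisy estimate rather than trusting a single run.

```python
import timeit

def search(haystack, needle):
    # stand-in for the pattern matcher being benchmarked
    return haystack.find(needle)

text = "GCATCGCAGAGAGTATACAGTACG" * 1000

# 5 samples, each timing 1000 calls; min(times) filters out scheduler noise
times = timeit.repeat(lambda: search(text, "GCAGAGAG"),
                      number=1000, repeat=5)
best = min(times)
print(best > 0)  # True
```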

Performance difference RBFS - A*

I've implemented RBFS as defined in AIMA and wanted to compare it to an implementation of A*. When I try it out on an instance from TSPlib (ulysses16, 16 cities), both come up with a correct solution, and the memory usage is as expected (exponential for A*, linear for RBFS). The only weird thing is that the RBFS algorithm is much faster than A*, when it shouldn't be. A* finished in about 3.5 minutes; RBFS takes only a few seconds. When I count the number of visits for each node, A* visits a lot more nodes than RBFS. That also seems counter-intuitive.
Is this because of the trivial instance? I can't really try it out on a bigger instance as my computer doesn't have enough memory for that.
Has anyone got any suggestions? Both algorithms seem to be correct according to their specifications, their results and memory usage. Only the execution time seems off...
I already looked everywhere but couldn't find anything about a difference in their search strategies, except for the fact that RBFS can revisit nodes.
Thanks
Jasper
Edit 1: AIMA corresponds to the book Artificial Intelligence a Modern Approach by Russel and Norvig.
Edit 2: TSPlib is a set of instances of the TSP problem (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/)
Edit 4: The code for the A* algorithm is given as follows. This should be correct.
def graph_search(problem, fringe):
    """Search through the successors of a problem to find a goal.
    The argument fringe should be an empty queue.
    If two paths reach a state, only use the best one. [Fig. 3.18]"""
    closed = {}
    fringe.append(Node(problem.initial))
    while fringe:
        node = fringe.pop()
        if problem.goal_test(node.state):
            return node
        if node.state not in closed:
            closed[node.state] = True
            fringe.extend(node.expand(problem))
    return None

def best_first_graph_search(problem, f):
    return graph_search(problem, PriorityQueue(f, min))

def astar_graph_search(problem, h):
    def f(n):
        return n.path_cost + h(n)
    return best_first_graph_search(problem, f)
Where problem is a variable containing details about the problem to be solved (when the goal is reached, how to generate successors, the initial state, ...). A node contains the path used to reach that state by storing the parent node, plus some other utility functions. Here is an older version of this code: http://aima-python.googlecode.com/svn/trunk/search.py For the TSP problem, the tours are created incrementally. The heuristic used is the minimal spanning tree over the nodes that are not yet visited.
The code for RBFS is as follows:
def rbfs(problem, h):
    def f(n):
        return n.path_cost + h(n)

    def rbfs_helper(node, bound):
        # print("Current bound: ", bound)
        problem.visit(node)
        if problem.goal_test(node.state):
            return [node, f(node)]
        backup = {}
        backup[node.state.id] = f(node)
        succ = list(node.expand(problem))
        if not succ:
            return [None, float("inf")]
        for v in succ:
            backup[v.state.id] = max(f(v), backup[node.state.id])
        while True:
            sortedSucc = sorted(succ, key=lambda node: backup[node.state.id])
            best = sortedSucc[0]
            if backup[best.state.id] > bound:
                return [None, backup[best.state.id]]
            if len(sortedSucc) == 1:
                [resultNode, backup[best.state.id]] = rbfs_helper(best, bound)
            else:
                alternative = sortedSucc[1]
                [resultNode, backup[best.state.id]] = rbfs_helper(
                    best, min(bound, backup[alternative.state.id]))
            if resultNode != None:
                return [resultNode, backup[best.state.id]]

    [node, v] = rbfs_helper(Node(problem.initial), float("inf"))
    return node
It also uses the problem and node as defined above. Those were specifically designed to be used as generic elements.
