so here is my situation. I have to solve a math problem on server end and could expect tens of thousands of requests a second so I'm trying to find the most efficient path to solving the problem.
Client will submit some number, let's call it A, and I need to determine base of the exponent in a geometric series (see below) where the result will be as close to A as possible without exceeding it.
The problem is that in the real-world, each value of the geometric series is rounded, so standard math can't apply.
round(x^1)+round(x^2)+round(x^3).
I can use the partial sum of geometric series equation to find some rough upper and lower limits using:
((x)^(n+1)-1)/((x)-1)
So say x=2 is a lower limit and x=2.03 is an upper limit... and the value i'm solving for is x=2.02392372838123.
So far the only solution i found was to use a recursive function to go through decimals individually testing until I find the number, but the load on the server is too high at the volume of requests I expect. (I am using node.js).
Does anyone have any thoughts or suggestions on a more efficient way to solve this? Again the only reason I can't solve this with math alone (to the best of my skill) is because of the real-world rounding of numbers in the sum.
Thanks.
Related
I have a few questions regarding using a random effect in a GAM. First, how do you interpret and communicate the output graph?
I have fire modeled as a random effect in this GAM because it is largely a random occurrence at my different field sites and I only noted it as a binary. It wouldn't work as a normal variable since it has too few levels and there is also relatively few sites with fire. However, it greatly improved model variance capture when included so I don't want to simply exclude it. I don't know how to interpret the output and I am also not entirely confident that there wouldn't be another way to include it in the model other than as a random effect. Any help would be greatly appreciated!
The effect has been modelled as a random slope if you didn't code it as a factor in the data. The value on the y axis is the estimated slope; it will be a little smaller in absolute value than if you use Fire as a linear fixed effect in the model formula because it is being penalised (shrunk) towards zero.
This likely should have been fitted as a binary fixed effect; code Fire as a factor with two levels (Yes/No, or Burned / Unburned say). Just because a variable represents something that is random over the data doesn't mean it is a suitable random effect; fire here has some average effect and the fixed effect describes that well. There's nothing stopping you from using Fire coded as a factor as a random effect via the smooth, but with only two levels it's not going the two intercepts aren't going to be estimate that precisely.
Now, if you had repeated observations on n sites and you thought the Fire effect varied across the n sites then you could do s(Site, Fire, bs = 're') where both Site and Fire are factors and you'll get different Fire effects for each Site. Then the plot you show would have many points on it as it is a QQ-plot of the estimated values for the effect of Fire in each Site, hence 1 point per Site. Given the way this model is estimated, these are somewhat assumed to be distributed Gaussian with some variance that is inversely proportional to the smoothness parameter selected by gam() when fitting this random effect smoother. That's why the default plot is as it is; it's a QQ-plot comparing the observed distribution of estimate values of the random effects against the theoretical expectation.
I had a question about solving a weighted interval scheduling problem given a fixed number of classrooms. So, initially, we are given a set of intervals, each with a starting time and finishing time, and each with a weight. So, the aim of the problem is to find a scheduling in two classrooms that maximizes the weight. Is there an efficient way to do this by dynamic programming?
My approach was trivial, since I built an algorithm that simply maximizes the intervals for each classroom. Is there a better way to do this?
My idea is not fully dynamic programming. But I think it will help.
Sort all classes by their starting time.
Now for a class i find next class j which start time is greater or equal then this end time. (Using binary search you can find this because we have an sorted array which is sorted by starting time)
Assume max_so_far is an array and max_so_far[z] contain the max_weight class from z to last
For all i find the max of summation of weight of class[i] and weight max_so_far[j]
Please find the code here
Time complexity of this code is O(nLog(n)).
Say i have this very common DP problem ( Dynamic programming) -
Given a cost matrix cost[][] and a position (m, n) in cost[][], write a function that returns cost of minimum cost path to reach (m, n) from (0, 0). Each cell of the matrix represents a cost to traverse through that cell. Total cost of a path to reach (m, n) is sum of all the costs on that path (including both source and destination). You can only traverse down, right and diagonally lower cells from a given cell, i.e., from a given cell (i, j), cells (i+1, j), (i, j+1) and (i+1, j+1) can be traversed. You may assume that all costs are positive integers.
PS: answer to this - 8
Now, After solving this question.. Following Question ran through my mind.
Say i have 1000*1000 matrix. and O(n^2) will take some time (<1sec on intel i5 for sure).
but can i minimize it further. say starting 6-8 threads using this algorithm and then synchronizing them back to get the answer at last ? will it be fast or even logically possible to get answer or i should throw this thought away
Generally speaking, on such small problems (as you say < 1sec), parallel computing is less efficient than sequential due to protocol overhead (thread starting and synchronizing). Another problem might be, that you increase the cache miss rate because you're choosing the data you want to operate on "randomly" (not linearly) from the input. However, when it comes to larger problems, say matrices with 10 times as many entries, it sure is worth a thought (or two).
This is a possible solution. Given a 16x16 Matrix, we cut it into 4 equal squares. For each of those squares, one thread is responsible. The number in each little square indicates, after how many time units the result in that square can be calculated.
So, the total time is 33 units (whatever a unit is). Compared to the sequential solution with 64 units, it is just half of it. You can convince yourself that the runtime for any 2^k x 2^k Matrix is 2^(2k - 1) + 1.
However, this is only the first idea that came up to my mind. I hope that there is a (much) faster parallel solution in the world outside.
What's more, for the reasons I mentionned at the beginning of my answer, for all practical purposes, you would not achieve a speedup of 2 with my solution.
I'd start with algorithmic improvements. There's no need to test N2 solutions.
One key is the direction from which you entered a square. If you entered it by moving downward, there's no need to check the square to the right. Likewise, if you entered it by moving right, there's no need to check the path downward from there. The destination of a right-angle turn can always be reached via a diagonal move, leaving out one square and its positive weight/cost.
As far as threading goes, I can see (at least) a couple of ways of splitting things up. One would be to simply queue up requests from when you enter a square. I.e., instead of (for example) testing another square, it queues up requests to test its two or three exits. N threads process those requests, which generate more requests, continuing until all of them reach the end point.
This has the obvious disadvantage that you're likely to continue traversing some routes after serial code could abandon them because they're already longer than the shortest route you've round so far.
Another possibility would be to start two threads, one traversing forward, the other backward. In each, you find the shortest route to any given point along the diagonal, then you're left with a purely linear scan through those candidates to find the shortest sum.
I have been trying to get the big O of four different functions using java and excel. I have no idea what these functions are as they have been hidden. I am not sure if this is the right place / forum to ask.
I have got the functions to give various pieces of data using some java and put them into excel along with the steps (1-n). I then put them into graphs straight away using just the n and the arbitrary measure of time they took if the output was constantly the same. For example if n = 1 always equal 200 for every time its run. For the ones that varied each time the function was run I ran the function 10 times and did an average for each step.
After I had the data I created a graph for each one and put a trendline on it. My f(1) for example was best fitted to a polynomial trendline order 2, which I assume is Quadratic (n2) of big O?. But I needed to prove it was n2, so I did =Steps/LOG(N) which made it fit best to a polynomial trendline order 3, which I assume is Cubic (n3)? (Is that right?)
I really have no idea what to do next to 'prove' that this function is Quadratic or Cubic or how to prove its best case / worst case.
So basically I am trying to work out what the missing step is.
Computation
Graph
Trendline
??? - Proof that the function has big O(?)
When you say "if n=1 always equal 200" does that mean if n=1 it takes 200 steps to run? If that's the case this function would be 200n and this O(n).
I think to solve this you should call each function on different values (I'd start with 10, 20, 30 ... ect) up to some high number. Capture these values and plot them in Excel. Then use the built in trend line function. This should give you a rough estimate of what the run time is. From there you should be able to get the Big-O.
Generally speaking when you are numerically evaluating and integral, say in MATLAB do I just pick a large number for the bounds or is there a way to tell MATLAB to "take the limit?"
I am assuming that you just use the large number because different machines would be able to handle numbers of different magnitudes.
I am just wondering if their is a way to improve my code. I am doing lots of expected value calculations via Monte Carlo and often use the trapezoid method to check my self of my degrees of freedom are small enough.
Strictly speaking, it's impossible to evaluate a numerical integral out to infinity. In most cases, if the integral in question is finite, you can simply integrate over a reasonably large range. To converge at a stable value, the integral of the normal error has to be less than 10 sigma -- this value is, for better or worse, as equal as you are going to get to evaluating the same integral all the way out to infinity.
It depends very much on what type of function you want to integrate. If it is "smooth" (no jumps - preferably not in any derivatives either, but that becomes progressively less important) and finite, that you have two main choices (limiting myself to the simplest approach):
1. if it is periodic, here meaning: could you put the left and right ends together and the also there have no jumps in value (and derivatives...): distribute your points evenly over the interval and just sample the functionvalues to get the estimated average, and than multiply by the length of the interval to get your integral.
2. if not periodic: use Legendre-integration.
Monte-carlo is almost invariably a poor method: it progresses very slow towards (machine-)precision: for any additional significant digit you need to apply 100 times more points!
The two methods above, for periodic and non-periodic "nice" (smooth etcetera) functions gives fair results already with a very small number of sample-points and then progresses very rapidly towards more precision: 1 of 2 points more usually adds several digits to your precision! This far outweighs the burden that you have to throw away all parts of the previous result when you want to apply a next effort with more sample points: you REPLACE the previous set of points with a fresh new one, while in Monte-Carlo you can just simply add points to the existing set and so refine the outcome.