Coefficient of Variations? - statistics

I have a list of values incrementing exponentially. I was asked to have multiple Coefficent of variations from them. You might agree with me that CV is only for the whole set of numbers and dividing the set of numbers into subgroups and calculating a CV for each subgroup seems unreasonable. Would there be any statistical idea behind multiple CVs and if there is, how histogram can be made by the CVs, I mean what would the bins of the historgram. I appreciate the answers in advance

I agree with you - it does not make sense to me to calculate multiple CVs for one dataset unless there's some inferential reason for doing so.
That being said, there might actually be a reason for considering sub-groups of a dataset. In the field of Statistics, context is everything. My first thought is to ask your colleague why they want you do proceed that way. Maybe there's a good reason, maybe they don't have as full a grasp of stats as you do, regardless, it should be an enlightening conversation to have.
If you do decide to go this route, here's some R code that might help (R is great - flexible, powerful, and free)
# first, simulating some fake data (100 values of measurement & group for 10 groups)
x <- rnorm(100, mean=10, sd=1)
group <- sample(LETTERS[1:10], 100, replace=T)
# first few values of each
head(data.frame(x, group))
x group
1 10.778480 F
2 9.274193 B
3 9.639143 G
4 9.080369 I
5 10.727895 D
6 10.850306 G
# this is the part you'd actually need...
# calculating the sd & avgs for each group
sds <- tapply(x, group, sd)
avgs <- tapply(x, group, mean)
# then the cv
cvs <- sds/avgs
cvs
A B C D E F G H I J
0.07859528 0.07570556 0.09370247 0.12552468 0.08897856 0.11044543 0.10947615 0.10323379 0.08908262 0.09729945
# and if you want a histogram, R makes it pretty easy
hist(cvs)

Related

Simultaneous Subset sums

I am dealing with a problem which is a variant of a subset-sum problem, and I am hoping that the additional constraint could make it easier to solve than the classical subset-sum problem. I have searched for a problem with this constraint but I have been unable to find a good example with an appropriate algorithm either on StackOverflow or through googling elsewhere.
The problem:
Assume you have two lists of positive numbers A1,A2,A3... and B1,B2,B3... with the same number of elements N. There are two sums Sa and Sb. The problem is to find the simultaneous set Q where |sum (A{Q}) - Sa| <= epsilon and |sum (B{Q}) - Sb| <= epsilon. So, if Q is {1, 5, 7} then A1 + A5 + A7 - Sa <= epsilon and B1 + B5 + B7 - Sb <= epsilon. Epsilon is an arbitrarily small positive constant.
Now, I could solve this as two completely separate subset sum problems, but removing the simultaneity constraint results in the possibility of erroneous solutions (where Qa != Qb). I also suspect that the additional constraint should make this problem easier than the two NP complete problems. I would like to solve an instance with 18+ elements in both lists of numbers, and most subset-sum algorithms have a long run time with this number of elements. I have investigated the pseudo-polynomial run time dynamic programming algorithm, but this has the problems that a) the speed relies on a short bit-depth of the list of numbers (which does not necessarily apply to my instance) and b) it does not take into account the simultaneity constraint.
Any advice on how to use the simultaneity constraint to reduce the run time? Is there a dynamic programming approach I could use to take into account this constraint?
If I understand your description of the problem correctly (I'm confused about why you have the distance symbols around "sum (A{Q}) - Sa" and "sum (B{Q}) - Sb", it doesn't seem to fit the rest of the explanation), then it is in NP.
You can see this by making a reduction from Subset sum (SUB) to Simultaneous subset sum (SIMSUB).
If you have a SUB problem consisting of a set X = {x1,x2,...,xn} and a target called t and you have an algorithm that solves SIMSUB when given two sets A = {a1,a2,...,an} and B = {b1,b2,...,bn}, two intergers Sa and Sb and a value for epsilon then we can solve SUB like this:
Let A = X and let B be a set of length n consisting of only 0's. Set Sa = t, Sb = 0 and epsilon = 0. You can now run the SIMSUB algorithm on this problem and get the solution to your SUB problem.
This shows that SUBSIM is as least as hard as SUB and therefore in NP.

can a random number be added to a set..such that its mean and variance will not change

i have a set of 4 values. i want to generate a random number which will be adding to the each of the set. But after adding ,the values of mean and variance should not change.
Meaning mean and variance of set before adding should be same as after adding the number.i was trying to approach it with genetic algorithm .can anyone please give me more insight on this?
Let us suppose your set is called x. Let us also suppose that you will add values to x to make it y. In R, this could be achieved by
x <- rnorm(4, mean = 5, sd = 2)
x
[1] 5.124843 3.070105 4.444706 6.657949
rand <- rnorm(0, sd(x))/1000 # Divide by 1000 so rand will have minimum
#impact on the mean and variance of x when added
y <- x + rand
y
[1] 5.124799 3.066977 4.444524 6.656452
mean(x); mean(y)
[1] 4.824401
[1] 4.823188
Now this will show some incremental change but to minimize the incremental change, you can scale rand by dividing it by a large number (as I did) or multiplying it by a small number. Another way you can about this is by using the jitter function in R. This function uses a small uniform distribution centered about 0 to sample and add noise to data.
x <- c(1, -.5, 2, -1.2)
jitter(x)
[1] 1.1117953 -0.5391391 2.0695948 -1.1145638
The only downside to jitter is that you cannot scale your noise from outside the function. It will scale your entire x vector.

Nested if in Gnu Mathprog for an energy model

I have a code in Gnu Mathprog for an energy model:
s.t.EBa1_RateOfFuelProduction1{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: OutputActivityRatio[r,t,f,m,y] <> 0}:
RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y] = RateOfProductionByTechnologyByMode[r,l,t,m,f,y];
s.t.EBa4_RateOfFuelUse1{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: InputActivityRatio[r,t,f,m,y]<>0}:
RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y] = RateOfUseByTechnologyByMode[r,l,t,m,f,y];
I want to put these two constraints in one, and i am thinking to insert two conditional expressions(if).The first if, will be referred to technology(t) and fuel(f)where the OutputActivityRatio<>0 and the second one for the same technology(t) it will start checking again the f(fuels) to see if the InputActivityRatio<>0.
Like that:
s.t.RateOfProduction{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR: OutputActivityRatio[r,t,f,m,y] <>0}:
RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y] = RateOfProductionByTechnologyByMode[r,l,t,m,f,y]
If InputActivityRatio[r,t,ff,m,y]<>0 then
RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y] = RateOfUseByTechnologyByMode[r,l,t,m,f,y]
else 0
else 0 ;
My question is: is it possible to have two if in series (nested if) and between them to have an equation as well?How can I write something like that?
Thank you very much!
As described in your other Question (regarding nested if-then-else in mathprog) there are no If-Then-Else statements in mathprog. The workaround with conditional for-loops is also no solution for your problem, since you can only use them in pre- or post processing of your data (you can't use this in your constraints!).
But there are still possibilities to merge your constraints. I think something like the following would work, if your condition is that either Input or Output is 0.
s.t.RateOfProduction{r in REGION, l in TIMESLICE, f in FUEL, t in TECHNOLOGY, m in MODE_OF_OPERATION, y in YEAR}:
(RateOfActivity[r,l,t,m,y]*OutputActivityRatio[r,t,f,m,y])
+ (RateOfActivity[r,l,t,m,y]*InputActivityRatio[r,t,f,m,y])
= RateOfProductionByTechnologyByMode[r,l,t,m,f,y];
Here in the lefthandside summation one multiplication would turn zero.
Since I don't know which parts are variables and which a parameters, this solution could also fail (for example it could be problematic if there is input and output at the same time and the rest of the model doesn't contain the right bounds for that)

Reverse Interpolation

I have a class implementing an audio stream that can be read at varying speed (including reverse and fast varying / "scratching")... I use linear interpolation for the read part and everything works quite decently..
But now I want to implement writing to the stream at varying speed as well and that requires me to implement a kind of "reverse interpolation" i.e. Deduce the input sample vector Z that, interpolated with vector Y will produce the output X (which I'm trying to write)..
I've managed to do it for constant speeds, but generalising for varying speeds (e.g accelerating or decelerating) is proving more complicated..
I imagine this problem has been solved repeatedly, but I can't seem to find many clues online, so my specific question is if anyone has heard of this problem and can point me in the right direction (or, even better, show me a solution :)
Thanks!
I would not call it "reverse interpolation" as that does not exists (my first thought was you were talking about extrapolation!). What you are doing is still simply interpolation, just at an uneven rate.
Interpolation: finding a value between known values
Extrapolation: finding a value beyond known values
Interpolating to/from constant rates is indeed much much simpler than the generic quest of "finding a value between known values". I propose 2 solutions.
1) Interpolate to a significantly higher rate, and then just sub-sample to the nearest one (try adding dithering)
2) Solve the generic problem: for each point you need to use the neighboring N points and fit a order N-1 polynomial to them.
N=2 would be linear and would add overtones (C0 continuity)
N=3 could leave you with step changes at the halfway point between your source samples (perhaps worse overtones than N=2!)
N=4 will get you C1 continuity (slope will match as you change to the next sample), surely enough for your application.
Let me explain that last one.
For each output sample use the 2 previous and 2 following input samples. Call them S0 to S3 on a unit time scale (multiply by your sample period later), and you are interpolating from time 0 to 1. Y is your output and Y' is the slope.
Y will be calculated from this polynomial and its differential (slope)
Y(t) = At^3 + Bt^2 + Ct + D
Y'(t) = 3At^2 + 2Bt + C
The constraints (the values and slope at the endpoints on either side)
Y(0) = S1
Y'(0) = (S2-S0)/2
Y(1) = S2
Y'(1) = (S3-S1)/2
Expanding the polynomial
Y(0) = D
Y'(0) = C
Y(1) = A+B+C+D
Y'(1) = 3A+2B+C
Plugging in the Samples
D = S1
C = (S2-S0)/2
A + B = S2 - C - D
3A+2B = (S3-S1)/2 - C
The last 2 are a system of equations that are easily solvable. Subtract 2x the first from the second.
3A+2B - 2(A+B)= (S3-S1)/2 - C - 2(S2 - C - D)
A = (S3-S1)/2 + C - 2(S2 - D)
Then B is
B = S2 - A - C - D
Once you have A, B, C and D you can put in an time 't' in the polynomial to find a sample value between your known samples.
Repeat for every output sample, reuse A,B,C&D if the next output sample is still between the same 2 input samples. Calculating t each time is similar to Bresenham's line algorithm, you're just advancing by a different amount each time.

Finding number of different paths

I have a game that one player X wants to pass a ball to player Y, but he can be playing with more than one player and the others players can pass the ball to Y.
I want to know how many different paths can the ball take from X to Y?
for example if he is playing with 3 players there are 5 different paths, 4 players 16 paths, if he is playing with 20 players there are 330665665962404000 paths, and 40 players 55447192200369381342665835466328897344361743780 that the ball can take.
the number max. of players that he can play with is 500.
I was thinking in using Catalan Numbers? do you think is a correct approach to solve this?
Can you give me some tips.
At first sight, I would say, that tht number of possible paths can be calculated the following way (I assume a "path" is a sequence of players with no player occuring more than once).
If you play with n+2 players, i.e. player X, player Y and n other players that could occur in the path.
Then the path can contain 0, 1, 2, 3, ... , n-1 or n "intermediate" players between player X (beginning) and player Y (end).
If you choose k (1 <= k <= n) players from n players in total, you can do this in (n choose k) ways.
For each of this subsets of intermediate players, there are k! possible arrangements of players.
So this yields sum(i=0 to n: (n choose i) * i!).
For "better" reading:
---- n / n \ ---- n n! ---- n 1
\ | | \ -------- \ ------
/ | | * i! = / (n-i)! = n! / i!
---- i=0 \ i / ---- i=0 ---- i=0
But I think that these are not the catalan numbers.
This is really a question in combinatorics, not algorithms.
Mark the number of different paths from player X to player Y as F(n), where n is the number of players including Y but not X.
Now, how many different paths are there? Player X can either pass the ball straight to Y (1 option), or pass it to one of the other players (n-1 options). If X passes to another player, we can pretend that player is the new X, where there are n-1 players in the field (since the 'old' X is no longer in the game). That's why
F(n) = 1 + (n-1)F(n-1)
and
F(1) = 1
I'm pretty sure you can reach phimuemue's answer from this one. The question is if you prefer a recursive solution or one with summation.
I'm somewhat of a noob at this kind of searching, but a quick run through the numbers demonstrates the more you can trim, cut out, filter out, the faster you can do it. The numbers you cite are BIG.
First thing that comes to mind is "Is it practical to limit your search depth?" If you can limit your search depth to say 4 (an arbitrary number), your worst case number of possibilities comes out to ...
499 * 498 * 497 * 496 = 61,258,725,024 (assuming no one gets the ball twice)
This is still large, but an exhaustive search would be far faster (though still too slow for a game) than your original set of numbers.
I'm sure others with more experience in this area would have better suggestions. Still, I hope this helps.
If X needs to pass to Y, and there could be P1, P2, ..., Pn players in between and you care about the order of passing then indeed
For 2 extra players you have paths: X-Y, X-P1-Y, X-P2-Y, X-P1-P2-Y, X-P2-P1-Y
Which gives a total of 5 different paths, similarly for 3 extra players you have 16 different paths
First try to reduce the problem to something known, and for this I would eliminate X-Y, they are common to all of the above translates to question: what is the sum of k-permutations for k from 0 to n, where n is the number of P.
This can be given as
f(n):=sum(n!/(n-i)!,i,0,n);
and I can confirm your findings for 19 and 39 (20 and 40 in your notation).
For f(499) I get
6633351524650661171514504385285373341733228850724648887634920376333901210587244906195903313708894273811624288449277006968181762616943058027258258920058014768423359811679381900054568501151839849768338994244697593758840394106353734267539926205845992860165295957099385939316593862710470512043836452624452665801937754479602741031832540175306674471495745716725509714798824661807396000105338256698426305553340786519843729411660457896089840381658295930455362209587765698327585913037665131195504013431486823990271059962837959407778393078276213331859189770016153265512805722812864376997337140529242894215031131618375899072989922780132488077015246576266246551484603286735418485007674249207286921801779414240854077425752351919182464902664206622037834736215298295580945851569079682952183639701057397376328170754187008425429164206646365285647875545882646729176997107332605851460212415526607757545366695048460341802079614840254694664267117469603856584752270653889630424848913719533359942725361985274851471687885265903663806182184272555073708882789845441094009797907518245726494471433964169680271980763830020431957658400573531564215436064984091520
Results obtained with wxMaxima
EDIT: After more clarification from the comments of the question, my answer is absolutely useless :) he definitely wants the number of possible routes, not the best one!
My first thought is why do you want to know these numbers? You're certainly never going to iterate through all the paths available to 500 people (would take far too long) and it's too big to display on a ui in any meaningful way.
I'm assuming that you're going to try to find the best route that the ball can take in which case I would consider looking into algorithms that don't care about the number of nodes in a route.
I'd try looking at the A star algorithm and Dijkstra's algorithm.

Resources