How to calculate DEF total using RRDTOOL - python-3.x

I am trying to calculate the total octets_in using python rrdtool. I have this but it's not giving me the correct output.
here is a data sample
[406.29947919463086],
[433.0391666666667],
[430.70380365448506]
I want the total to be 1269
My DEF, CDEF and VDEF are:
f'DEF:OCTETS_IN={self.file_name}:OCTETS_IN:AVERAGE'
'CDEF:octets_in=OCTETS_IN,PREV,ADDNAN',
'VDEF:out_octets_in_total=octets_in,AVERAGE'
The only operators I can use from rrdtool are AVERAGE, MINIMUM, MAXIMUM and PERCENT and they all give the wrong results.
Anybody know how to do this?

If you want to calculate the total of a rate variable over the time period, then there is a useful VDEF function, TOTAL, which calculates sum(var x stepsize).
For example,
DEF:octetspersec=file.rrd:IN:AVERAGE
VDEF:octetstotal=octetspersec,TOTAL
Now, octetstotal holds the total number of octets over the graph time window based on the rate held in the octetspersec variable.
If you have an older version of RRDTool, you may not have the TOTAL function. In this case, use AVERAGE, then multiply by STEPWIDTH and the pixel width of the graph.
Note that if your variable already holds the number of bytes for that interval, then you will not need to multiply by the step width. Since the TOTAL function does this (as it assumes the variable is a rate) you then need to divide the VDEF result by STEPWIDTH again.
For more details on using the RPN functions, see the rrdgraph_rpn documentation.
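If you are driving this from Python, the same idea can be wrapped in a graphv() call and the PRINT result read back from the dictionary it returns. A minimal sketch, assuming the file name, DS name and time window from the example above (adjust them to your RRD):

import rrdtool

# graphv() runs the graph pipeline and returns the PRINT values, so no image is needed.
info = rrdtool.graphv(
    '/dev/null',                        # discard the image, we only want the PRINT value
    '--start', '-1h', '--end', 'now',
    'DEF:octetspersec=file.rrd:IN:AVERAGE',
    'VDEF:octetstotal=octetspersec,TOTAL',
    'PRINT:octetstotal:%lf',
)
total_octets = float(info['print[0]'])
print(total_octets)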

Related

Excel Random number from a set of options

In MS Excel, how can I randomly sum up to a target number with numbers divisible by 5?
For example, I would like a completely random output of numbers divisible by 5 (5,10,15,20….) in cells B1:B100, to add up to 10000.
I initially looked at the CHOOSE(RANDBETWEEN) option, but I can't get it to make the numbers add up to 10000.
In Office 365, enter the following formula in B1:
=LET(
rndArr,RANDARRAY(100,1),
Correction,INT(SEQUENCE(100,1,1,-1/100)),
INT(rndArr/SUM(rndArr)*2000)*5+IF(Correction=1,10000-SUM(INT(rndArr/SUM(rndArr)*2000)*5),0))
EDIT: the below was added in response to the comment about constraining it to a min/max. It's not actually foolproof for all min/max values, but it seemed to work well enough for me with the values you supplied.
=LET(
Total, 10000,
Min, 10, Max, 300,
rndArr, RANDARRAY(100, 1),
Correction, SEQUENCE(100, 1, 1, 1) = MATCH(MIN(rndArr), rndArr, 0),
rndArr5, INT(rndArr/SUM(rndArr)*Total/5)*5,
rndArrMinMax, IFS(rndArr5 < Min, Min, rndArr5 > Max, Max, TRUE, rndArr5),
rndArrMinMax + (Total-SUM(rndArrMinMax)) * Correction
)
Explanation of what that does:
Enter Total, Min and Max variables
create rndArr, an array of random numbers (that is the correct size, 100 rows x 1 col)
create Correction, a boolean array of the same size as rndArr where the only TRUE value is the position of the smallest value in rndArr. This is because we'll need to add a figure in later to ensure the total is correct, and want to add it to the smallest number in the array (best possible chance that it won't go above our maximum, remember I said this wasn't foolproof for all values).
create rndArr5, which proportionately increases rndArr so it totals 2000, rounds down to nearest integers, then multiplies by 5. The result is an array of random multiples of 5 that totals somewhere below 10000
create rndArrMinMax by checking rndArr5 (our progress so far) against desired min and max values, editing any outside of our desired range to be the min or max value respectively.
The final output value is that corrected value, plus any difference needed to reach the correct total (that's Total - SUM(rndArrMinMax)), which is multiplied by our Correction boolean array so it only gets added to the smallest value in the array. Again, this may result in that smallest value going over the max if the totals are way out and/or Max is very small, but there's not much you can do about that with random numbers.
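For what it's worth, here is a minimal Python sketch of the same idea (the function name and parameters are mine, not part of the spreadsheet formula): scale random weights to the target, round down to multiples of 5, clamp to the min/max, then put the remainder on the smallest entry.

import random

def random_multiples_of_5(n=100, total=10000, lo=10, hi=300):
    # Scale random weights so the rounded multiples of 5 sum to roughly `total`.
    weights = [random.random() for _ in range(n)]
    scale = (total / 5) / sum(weights)
    vals = [int(w * scale) * 5 for w in weights]
    # Clamp to the desired range, then add any remainder to the smallest entry
    # (same trick as the Correction array above -- not foolproof for extreme min/max).
    vals = [min(max(v, lo), hi) for v in vals]
    vals[vals.index(min(vals))] += total - sum(vals)
    return vals

vals = random_multiples_of_5()
print(sum(vals), min(vals), max(vals))   # the sum is exactly 10000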

How can I use a simulation tool in Excel for solving the following problem related to probability?

Trial Number            1    2    3    4    5   ........   2000000 (two million)
Success in nth attempt  12   4    21   5    10  ........   12
Note: Imagine throwing a die where each outcome has a probability of 1/10 (not 1/6 as is usual for a die). For us, "success" means throwing a "3". For each trial (see above) we keep throwing the die until we get a "3". For example, above I assume that during the first trial we threw the die 12 times and got a "3" only on the 12th attempt. The same goes for the other trials: on the 5th trial we threw the die 10 times and got a "3" only on the 10th attempt.
We need to simulate this for 2 million times (or lower than that, let's say 500,000 times).
Eventually we need to calculate what percentage of "successes" falls in the interval of 10-20 tries, 1-10 tries, etc.
For example, out of 2000000 trials in 60% of cases (1200000) we get "3" in between 10th and 20th attempts of throwing a dice.
I performed a manual simulation, but could not create a simulation model. Can you please help?
This might not be a good solution for a dataset as large as you intend; Excel is probably not the most efficient tool for that. Anyway, here is a possible approach.
In cell A1, put the following formula:
=LET(maxNum, 10, trialNum, 5, maxRep, 20, event, 3, cols, SEQUENCE(trialNum,1),
rows, SEQUENCE(maxRep, 1), rArr, RANDARRAY(maxRep, trialNum, 1, maxNum, TRUE),
groupSize, 10, startGroups, SEQUENCE(maxRep/groupSize, 1,,groupSize),
GROUP_PROB, LAMBDA(matrix,start,end, LET(result, BYCOL(matrix, LAMBDA(arr,
LET(idx, IFERROR(XMATCH(event, arr),0), IF(AND(idx >= start, idx <= end), 1, 0)))),
AVERAGE(result))),
HREDUCE, LAMBDA(arr, LET(idx, XMATCH(event, arr),
IF(ISNUMBER(idx), FILTER(arr, rows <= idx),event &" not found"))),
trials, DROP(REDUCE(0, cols, LAMBDA(acc,col, HSTACK(acc,
HREDUCE(INDEX(rArr,,col))))),,1),
dataBlock, VSTACK("Trials", trials),
probBlock, DROP(REDUCE(0, startGroups, LAMBDA(acc,ss,
VSTACK(acc, LET(ee, ss+groupSize-1, HSTACK("%-Group "&ss&"-"&ee,
GROUP_PROB(trials, ss, ee))
))
)),1),
IFERROR(HSTACK(dataBlock, probBlock), "")
)
Explanation
We use LET for easy reading and composition. We first define the parameters of the experiment:
maxNum, the maximum random number to be generated. The minimum will be 1.
trialNum, the number of trials (in our case the number of columns)
maxRep, the maximum number of repetitions, in our case the number of rows.
rows and cols, the rows and columns respectively.
event, our successful event, in our case 3.
groupSize, The size of each group for calculating the probability of each group
startGroups The start index position of each group
rArr, Random array of size maxRep x trialNum. The minimum random number will be 1 and the maximum maxNum. The last input argument of RANDARRAY ensures it generates only integer numbers.
GROUP_PROB is a user LAMBDA function to calculate the probability of our successful event: number 3 was generated.
LAMBDA(matrix,start,end, LET(result, BYCOL(matrix, LAMBDA(arr,
LET(idx, IFERROR(XMATCH(event, arr),0), IF(AND(idx >= start, idx <= end), 1, 0)))),
AVERAGE(result)))
Basically, for each column (arr) of matrix, it finds the index position of the event and checks whether that position falls within the reference interval [start, end]; if so it returns 1, otherwise 0. Finally, the AVERAGE function serves to calculate the probability. If the event was not generated, it counts as 0 too.
We use the DROP/REDUCE/VSTACK or HSTACK pattern. Please check the answer to the question "how to transform a table in Excel from vertical to horizontal but with different length", provided by @DavidLeal.
The HREDUCE user LAMBDA function filters rArr up to the point where the event is found. If the event was not found, it returns a string indicating that.
The name probBlock builds the array with all the probability groups.
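If the dataset really needs to be 2 million trials, a short script may be more comfortable than a spreadsheet. Here is a minimal Python sketch of the same experiment (the names and the bucket size of 10 are mine), simulating how many throws it takes to get a "3" and bucketing the results into groups of attempts:

import random
from collections import Counter

P_SUCCESS = 1 / 10          # probability of throwing a "3" on any single attempt
TRIALS = 500_000            # number of trials (use 2_000_000 if you really need it)

def attempts_until_success(p=P_SUCCESS):
    # Keep throwing until the first success; return how many throws it took.
    n = 1
    while random.random() >= p:
        n += 1
    return n

buckets = Counter()
for _ in range(TRIALS):
    n = attempts_until_success()
    buckets[(n - 1) // 10] += 1        # bucket 0 = 1-10 tries, bucket 1 = 11-20, ...

for b in sorted(buckets)[:5]:
    lo, hi = 10 * b + 1, 10 * b + 10
    print(f"{lo}-{hi} tries: {buckets[b] / TRIALS:.2%}")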

Count Waveform Periods and Calculate Frequency

I have some oscillating time vs. displacement Excel data from an actuator that I need to analyze, and my goal is to count the cycles using the number of times the displacement value crosses 0, almost like counting the periods of a sine wave. The problem I am having is that the frequency of this data changes several times throughout the data set, and the displacement may not always (or ever) be exactly 0. I think if I had a way to count the number of times the displacement value "crossed" zero, I could use every 3 of those points to calculate the waveform's cycles and frequency. I have written a lot of simple things in VBA, but I am by no means an expert, so any help would be appreciated.
I think an easy way to get it done would be to compare the sign of the previous value to the sign of the current value.
For example, value1 = -0.1236, value2 = 0.5482. In between those two values, you know it must have crossed 0. You can have the program check each value in the data and count up all the times the sign changes, and that should be the number you're looking for.
Example code of how to compare the values:
If (Current_Value < 0) <> (Previous_Value < 0) Then Counter = Counter + 1
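The same sign-change idea, sketched in Python with made-up sample data (the numbers are placeholders, not from your sheet); every two zero crossings are counted as one full cycle, and frequency is cycles divided by the elapsed time:

# Placeholder data: time in seconds and displacement, in the same order as the sheet rows.
times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
disp  = [0.10, 0.04, -0.05, -0.09, -0.02, 0.06, 0.11]

# Count sign changes between consecutive samples (the signal crossed zero in between).
crossings = sum(1 for prev, cur in zip(disp, disp[1:]) if (prev < 0) != (cur < 0))

cycles = crossings / 2                              # two zero crossings per full cycle
duration = times[-1] - times[0]
frequency = cycles / duration if duration else float("nan")
print(crossings, cycles, frequency)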

Meaning of the V value, Wilcoxon signed rank test

I have a question about my results of the Wilcoxon signed rank test:
My data consists of a trial with 2 groups (paired) in which a treatment was used. The results were scored in %. Groups consist of 131 people.
When I run the test in R, I got the following result:
wilcox.test(no.treatment, with.treatment, paired=T)
# Wilcoxon signed rank test with continuity correction
# data: no.treatment and with.treatment V = 3832, p-value = 0.7958
# alternative hypothesis: true location shift is not equal to 0
I am wondering what the V value means. I read somewhere that it has something to do with the number of positive scores (?), but I am wondering if it could tell me anything about the data and interpretation?
I'll give a little bit of background before answering your question.
The Wilcoxon signed rank test compares two values between the same N people (here 131); for example, blood values measured for 131 people at two time points. The purpose of the test is to see whether the blood values have changed.
The V statistic you are getting does not have a direct interpretation. It is based on the pairwise differences between the individuals in your two groups, and it is supposed to follow a certain probability distribution. Intuitively speaking, the further V is from the value expected when the groups are actually the same, the larger the difference between the two groups you sampled.
As always in hypothesis testing, you (well, the wilcox.test function) will calculate the probability that the value (V) of that variable is equal to 3832 or larger
prob('observing a value of 3832 or larger, when the groups are actually the same')
If there is really no difference between the two groups, V will be close to its expected value under the null hypothesis, n(n+1)/4 (roughly 4323 if all 131 differences are non-zero). Whether the V you see is far enough from that to matter depends on the probability distribution, which is not straightforward for this statistic; luckily that doesn't matter, since wilcox.test knows the distribution and calculates the probability for you (0.7958).
In short
Your groups do not significantly differ and V doesn't have a clear interpretation.
The V statistic produced by the function wilcox.test() can be calculated in R as follows:
# Create random data between -0.5 and 0.5
da_ta <- runif(1e3, min=-0.5, max=0.5)
# Perform Wilcoxon test using function wilcox.test()
wilcox.test(da_ta)
# Calculate the V statistic produced by wilcox.test()
sum(rank(abs(da_ta))[da_ta > 0])
The user MrFlick provided the above answer in reply to this question:
How to get same results of Wilcoxon sign rank test in R and SAS.
The Wilcoxon W statistic is not the same as the V statistic, and can be calculated in R as follows:
# Calculate the Wilcoxon W statistic
sum(sign(da_ta) * rank(abs(da_ta)))
The above statistic can be compared with the Wilcoxon probability distribution to obtain the p-value. There is no simple formula for the Wilcoxon distribution, but it can be simulated using Monte Carlo simulation.
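For illustration, here is a minimal Monte Carlo sketch in Python of the null distribution of W (the sample size, number of simulations and observed value are placeholders): under the null, each rank carries a + or - sign with equal probability, and the p-value is the fraction of simulated |W| at least as large as the observed one.

import numpy as np

rng = np.random.default_rng(0)
n = 131                   # placeholder sample size (e.g. the question's 131 pairs)
n_sim = 50_000            # number of simulated W statistics
ranks = np.arange(1, n + 1)

# Under H0 the sign attached to each rank is +/-1 with equal probability,
# so the null distribution of W = sum(sign * rank) can be simulated directly.
signs = rng.choice([-1, 1], size=(n_sim, n))
w_null = (signs * ranks).sum(axis=1)

w_obs = 1234              # placeholder for the observed W from your data
p_value = np.mean(np.abs(w_null) >= abs(w_obs))    # two-sided Monte Carlo p-value
print(p_value)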
The value of V is not the number of positive scores, but the sum of the ranks assigned to the positive differences.
There is likewise a sum for the negative ranks, which this test does not report. A brief script for calculating the sum of positive and of negative ranks is provided in the following example:
a <- c(214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234)
b <- c(159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112)
diff <- c(a - b) #calculating the vector containing the differences
diff <- diff[ diff!=0 ] #delete all differences equal to zero
diff.rank <- rank(abs(diff)) #check the ranks of the differences, taken in absolute
diff.rank.sign <- diff.rank * sign(diff) #check the sign to the ranks, recalling the signs of the values of the differences
ranks.pos <- sum(diff.rank.sign[diff.rank.sign > 0]) #calculating the sum of ranks assigned to the differences as a positive, ie greater than zero
ranks.neg <- -sum(diff.rank.sign[diff.rank.sign < 0]) #calculating the sum of ranks assigned to the differences as a negative, ie less than zero
ranks.pos #it is the value V of the wilcoxon signed rank test
[1] 80
ranks.neg
[1] 40
CREDITS: https://www.r-bloggers.com/wilcoxon-signed-rank-test/
(They also provide a nice context for it.)
You can also compare both of these numbers to their average (in this case, 60), which would be the expected value for each side: positive ranks summing to 60 and negative ranks summing to 60 would mean complete equivalence of the sides. Can positive ranks summing to 80 and negative ranks summing to 40 also be considered equivalent? (i.e. could we just attribute this difference of "20" to stochastic reasons, or is it distant enough for us to reject the hypothesis of no difference?)
So, as they explain, the acceptance interval for this case is [25, 95]. Checking a table of critical values for the Wilcoxon signed rank test, the critical value for this example is 25 (15 pairs at 5% on a two-tailed test; and 120 - 25 = 95). The observed sums of 40 and 80 fall inside [25, 95], so we cannot discard the possibility that the differences are purely due to random sampling. (Consistently, the p-value is above the alpha.)
Comparing the sum of positive ranks to the sum of negative ranks helps determine the significance of the difference; it enriches the analysis. Also, the positive ranks themselves are an input for the calculation of the test's p-value, hence the interest in them.
But extracting meaning from a reported sum of positive ranks (V) alone is not straightforward. In terms of providing information, I believe the least one can do is also check the sum of the negative ranks, to get a more consistent idea of what is happening (of course, along with general info such as sample size, p-value, etc.).
I, too, was confused about this seemingly mysterious "V" statistic. I realize there are already some helpful answers here, but I did not really understand them when I first read over them. So here I am explaining it again in a way that I finally understood. Hopefully it helps others who are also still confused.
The V-statistic is the sum of ranks assigned to the differences with positive signs. Meaning, when you run a Wilcoxon signed rank test, it calculates a sum of negative ranks (W-) and a sum of positive ranks (W+). The test statistic (W) is usually the minimum of (W-) and (W+), whereas the V-statistic is just (W+).
To understand the importance of this: if the null hypothesis is true, (W+) and (W-) will be similar. This is because, given the number of samples (n), (W+) and (W-) have a fixed maximum possible combined value, (W+)+(W-) = n(n+1)/2. If this maximum value is divided somewhat evenly, then there is not much of a difference between the paired sample sets and we accept the null. If there is a large difference between (W+) and (W-), then there is a large difference between the paired sample sets, and we have evidence for the alternative hypothesis. The degree of the difference and its significance relates to the critical value chart for W.
Here are particularly helpful sites to check out if the concept is still not 100%:
1.) https://mathcracker.com/wilcoxon-signed-ranks
2.) https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric6.html)
3.) https://www.youtube.com/watch?v=TqCg2tb4wJ0
TLDR; the V-statistic reported by R is the same as the W-statistic in cases where (W+) is the smaller of (W+) or (W-).
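As a cross-check, the same W+ / W- bookkeeping can be reproduced in Python on the paired data from the R example above (a sketch; note that SciPy's wilcoxon reports min(W+, W-) for the default two-sided test, at least in recent versions):

import numpy as np
from scipy import stats

# The same paired data as the R example above.
a = np.array([214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234])
b = np.array([159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112])

d = a - b
d = d[d != 0]                          # drop zero differences, as wilcox.test does
ranks = stats.rankdata(np.abs(d))      # ranks of the absolute differences
w_plus = ranks[d > 0].sum()            # this is R's V statistic
w_minus = ranks[d < 0].sum()

res = stats.wilcoxon(a, b)             # SciPy's two-sided statistic is min(W+, W-)
print(w_plus, w_minus, res.statistic)  # 80.0 40.0 40.0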

NORMDIST function is not giving the correct output

I'm trying to use NORMDIST function in Excel to create a bell curve, but the output is strange.
My mean is 0,0000583 and standard deviation is 0,0100323, so when I plug this into the function NORMDIST(0,0000583; 0,0000583; 0,0100323; FALSE) I expect to get something close to 0,5: since I'm using the same value as the mean, the probability of this value should be 50%. But the function gives an output of 39,77, which is clearly not correct.
Why is it like this?
A probability cannot have values greater than 1, but a density can.
The integral of a density function over its entire range equals 1, but the density can take values greater than one on a specific interval. For example, a uniform distribution on the interval [0, ½] has probability density f(x) = 2 for 0 ≤ x ≤ ½ and f(x) = 0 elsewhere.
=NORMDIST(x, mean, dev, FALSE) returns the density function. Densities are probabilities per unit: the density times a very tiny interval width is approximately the probability of falling in that interval (it is the derivative of the cumulative distribution at that point).
shg's answer explains how to get a probability over a given interval with NORMDIST, and also on what occasions it can return a density greater than 1.
For a continuous variable, the probability of any particular value is zero, because there are an infinite number of values.
If you want to know the probability that a continuous random variable with a normal distribution falls in the range of a to b, use:
=NORMDIST(b, mean, dev, TRUE) - NORMDIST(a, mean, dev, TRUE)
The peak value of the density function occurs at the mean (i.e., =NORMDIST(mean, mean, dev, FALSE) ), and the value is:
=1/(SQRT(2*PI())*dev)
The peak value will exceed 1 when the deviation is less than 1/SQRT(2*PI()) ≈ 0.399, which was your case.
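A quick way to convince yourself, sketched in Python with SciPy (the numbers are the ones from the question): the density at the mean matches Excel's 39,77 and the closed-form peak value, while probabilities come from differences of the CDF.

from math import pi, sqrt
from scipy.stats import norm

mean, dev = 0.0000583, 0.0100323

# NORMDIST(mean, mean, dev, FALSE) -> the density at the mean, not a probability.
print(norm.pdf(mean, loc=mean, scale=dev))    # ~39.77, same as Excel
print(1 / (sqrt(2 * pi) * dev))               # the closed-form peak value, same number

# Probabilities come from the CDF (NORMDIST with TRUE), e.g. P(mean - dev <= X <= mean + dev):
print(norm.cdf(mean + dev, mean, dev) - norm.cdf(mean - dev, mean, dev))   # ~0.6827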
There is an amazing answer on Cross Validated (the statistics Stack Exchange) from a moderator (whuber) that addresses this issue very thoughtfully.
It is returning the probability density function, whereas I think you want the cumulative distribution function (so try TRUE in place of FALSE).
