In WEKA, can I round a range after discretization? - attributes

I have a numeric attribute, and I discretized it into 6 bins.
But after discretization, the range looks like (3.663336-4.325577].
If I want it rounded, so that it looks like (3.7-4.3], what should I do?
Thanks.

I don't think there's any way to round to one decimal place through Weka. However, the MathExpression filter provides ceil and floor operations, as well as multiplication and division. This means that before discretization you could run a series of MathExpression filters to get the values you're after. See this page for some example usage.
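The multiply/round/divide idea can be sketched in Python. This only illustrates the arithmetic that a chain of MathExpression steps would perform (multiply by 10, add 0.5, floor, divide by 10); `round_one_decimal` is a hypothetical name, not part of WEKA. Depending on how the discretizer computes its cut points (equal-width binning, for instance, derives them from the minimum, maximum and bin count), the bin boundaries may still not come out as round numbers even after the values are rounded.

```python
import math

def round_one_decimal(x: float) -> float:
    """Round x to one decimal place using only multiply, add, floor and divide."""
    scaled = x * 10                      # move the first decimal into the units place
    rounded = math.floor(scaled + 0.5)   # floor(v + 0.5) rounds to the nearest integer (ties go up)
    return rounded / 10                  # move it back

bounds = (3.663336, 4.325577)
print([round_one_decimal(b) for b in bounds])  # [3.7, 4.3]
```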

Related

Format integers as fraction

I have a list of fractions in Excel which I want to format as fractions, including integers. However, by default Excel formats integers as integers, which is understandable.
Is there any way to force Excel to format, say, 4/4 as 4/4 instead of 1?
I need it to be stored as a value and not as text, so '4/4 won't work, as I need to average a bunch of these values afterwards.
Apparently I'm the first person ever to take issue with this, because Google provides absolutely no help whatsoever :o
Yes, use a custom number format:
?/4
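As a rough Python analogy (an assumption about the display behaviour, not Excel's implementation), a fixed-denominator format only changes how a stored number is shown: each value is displayed rounded to the nearest quarter, while the number itself stays numeric and can still be averaged. `format_fixed_denominator` is a hypothetical helper.

```python
import math
from statistics import mean

def format_fixed_denominator(value: float, denominator: int) -> str:
    """Display value as a fraction over a fixed denominator (e.g. 1.0 -> '4/4')."""
    numerator = math.floor(value * denominator + 0.5)  # nearest multiple of 1/denominator
    return f"{numerator}/{denominator}"

values = [1.0, 0.75, 0.5]  # stored as numbers, not text
print([format_fixed_denominator(v, 4) for v in values])  # ['4/4', '3/4', '2/4']
print(format_fixed_denominator(mean(values), 4))          # 3/4
```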
I don't believe what you are trying to accomplish is doable outright, as fractions are really division problems. However, with some formula trickery, you may be able to get something that will work for you.
If you place '7/8 in cell A1 and then use the following formula in cell B1
=DECIMAL(MID(A1,1,FIND("/",A1,1)-1),10)/DECIMAL(MID(A1,FIND("/",A1,1)+1,LEN(A1)),10)
The cell will display the decimal value of the "fraction", in this case 0.875, allowing you to change the denominator at will and still perform math functions.
This works because the formula slices up the "fraction" stored as text and converts it to a number and performs the math.
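The same slice-and-divide logic, sketched in Python for illustration: `str.partition` plays the role of FIND/MID, and `int` the role of DECIMAL(text, 10). The helper name `fraction_text_to_number` is hypothetical.

```python
def fraction_text_to_number(text: str) -> float:
    """Split text such as '7/8' at the slash and divide numerator by denominator."""
    numerator, slash, denominator = text.partition("/")
    if not slash:
        raise ValueError(f"no '/' in {text!r}")
    return int(numerator) / int(denominator)

print(fraction_text_to_number("7/8"))  # 0.875
print(fraction_text_to_number("4/4"))  # 1.0
```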

Excel: Add number before multiplying with PRODUCT(...)

I am calculating the geometric mean of a row in MS Excel by using the GEOMEAN(...) command.
What the geometric mean is: the row could be A1:A10. The geometric mean
GEOMEAN(A1:A10)
is the product of all 10 cell values, after which the 10th root is taken (mathematically: nth_root(A_1 x A_2 x ... x A_n)).
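The definition translates directly into a few lines of Python; `geomean` is a hypothetical stand-in for GEOMEAN, shown only to make the formula concrete. One difference from Excel: when the product is negative, Python's `**` with a fractional exponent returns a complex number instead of raising an error.

```python
import math

def geomean(values):
    """n-th root of the product of n values, as described for GEOMEAN."""
    values = list(values)
    return math.prod(values) ** (1 / len(values))

print(geomean([2, 8]))  # 4.0
```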
The issue: The command GEOMEAN(A1:A10) works fine as long as no cells contain negative values (actually just as long as the product ends up positive). If one cell has a negative value, then taking the root is mathematically an invalid action and Excel gives an error.
The solution: I can work around this by adding a large enough number, such as 1000000, to each value before doing GEOMEAN(A1:A10), and afterwards subtracting 1000000 from the result. This is a mathematical approximation to the pure geometric mean.
The question: But how do I add +1000000 to each value in Excel? A solution would be to create a whole new extra row where the number is added, and then doing GEOMEAN on this row and subtracting the number from the result. But I would really like to avoid creating a new row, since I have many long data sets to perform this command on.
Is there a way to add the number inside the command itself? To add it onto each value before it is multiplied? Something along the lines of:
GEOMEAN(A1:A10+1000000)-1000000
Solution to avoid the work-around
Based on the answer from and discussion with #ImaginaryHuman072889
It turns out that a working command that avoids any work-around is:
IFERROR(GEOMEAN(A1:A10);-GEOMEAN(ABS(A1:A10)))
If IFERROR catches an error, we know that a negative result would have appeared, so the result is constructed manually in that case.
BUT: this does not take into account the case mentioned by #ImaginaryHuman072889, because Excel seems to reject any negative inputs, not just a negative inner product. For example, both GEOMEAN(-2,-2) and GEOMEAN(-2,-2,-2) give errors in Excel, even though both should be mathematically valid, giving the results 2 and -2, respectively. To overcome this Excel issue, we can simply write out the same calculation manually:
IFERROR(PRODUCT(A1:A10)^(1/COUNTA(A1:A10));-(PRODUCT(ABS(A1:A10))^(1/COUNTA(A1:A10))))
I add this solution to help any passers-by who have the same issue. This works mathematically, but the fact that -2 and -2 have the geometric mean 2 does seem a bit odd, and not at all like a useful "mean". It is still mathematically legal as far as I can find (WolframAlpha has no issue with it, and the Wikipedia article never mentions a sign).
Your "workaround" of doing this:
GEOMEAN(A1:A10+1000000)-1000000
is completely wrong. This is absolutely not equal to GEOMEAN(A1:A10).
Simple counter-example:
GEOMEAN({2,8}) returns the value of 4, which is the geometric mean of 2 and 8.
GEOMEAN({2,8}+1)-1 is equal to GEOMEAN({3,9})-1 which is approximately 4.196.
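A quick Python check of the counterexample (`geomean` is a hypothetical stand-in for GEOMEAN). It also shows what the large shift from the question actually does: as the shift c grows, GEOMEAN(x + c) - c approaches the arithmetic mean of x, not its geometric mean.

```python
import math

def geomean(values):
    values = list(values)
    return math.prod(values) ** (1 / len(values))

data = [2, 8]
print(geomean(data))                               # 4.0
print(geomean([x + 1 for x in data]) - 1)          # about 4.196
shift = 1_000_000
print(geomean([x + shift for x in data]) - shift)  # close to 5, the arithmetic mean of 2 and 8
```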
A valid workaround is to multiply each value inside GEOMEAN by a constant, then divide the result by that same constant.
Simple example:
GEOMEAN({2,8}*3)/3 is equal to GEOMEAN({6,24})/3 which is 4.
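Scaling works because multiplying each of the n values by a constant k multiplies the product by k^n, and the n-th root of that factor is exactly k. A small Python check, again with a hypothetical `geomean` helper:

```python
import math

def geomean(values):
    values = list(values)
    return math.prod(values) ** (1 / len(values))

data = [2, 8]
k = 3
# Scaling every value by k scales the product by k**n, so the n-th root scales by k
print(geomean([x * k for x in data]) / k)  # 4.0
```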
However, this method of multiplying by a constant does not help your situation, since this won't get rid of negative values.
Mathematically speaking, the geometric mean of a positive number and a negative number is an imaginary number, which is presumably why Excel cannot handle it.
Example:
2*-8 = -16
sqrt(-16) = 4i
Therefore, 4i is the geometric mean of 2 and -8. Notice how it has the same magnitude as GEOMEAN({2,8}), just that it is an imaginary number.
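Python's `cmath` module makes the same point: the complex square root of the negative product is purely imaginary, with the same magnitude as the geometric mean of 2 and 8.

```python
import cmath

product = 2 * -8            # -16
root = cmath.sqrt(product)  # complex square root instead of an error
print(root)                 # 4j
print(abs(root))            # 4.0
```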
All that said... here is what I recommend you do:
I suggest you return two results, one result is the magnitude of the geometric mean and the other is the phase of the geometric mean.
Formula for magnitude:
= GEOMEAN(ABS(A1:A10))
(Note, this is an array formula, so you'd have to press Ctrl+Shift+Enter instead of just Enter after typing this formula.) The use of ABS converts all negative numbers to positive before the GEOMEAN calculation, guaranteeing a positive geometric mean.
Formula for phase, I would just do something like this:
= IF(PRODUCT(A1:A10)>=0,"Real","Imaginary")
Which obviously returns Real if the geometric mean is a real number and returns Imaginary if the geometric mean is an imaginary number.
EDIT
Technically speaking, some of what I said wasn't completely precise, although the magnitude formula above still stands.
Some things I want to clarify:
If PRODUCT(data) is positive (or zero), then the geometric mean of data is positive (or zero).
If PRODUCT(data) is negative and if the number of entries in data is odd, then the geometric mean of data is negative (but still real).
If PRODUCT(data) is negative and if the number of entries in data is even, then the geometric mean of data is imaginary.
That said... if you want these formulas to be a bit more technically accurate, I would modify to this:
Adjusted formula for magnitude:
= GEOMEAN(ABS(A1:A10))*IF(AND(PRODUCT(A1:A10)<0,MOD(COUNT(A1:A10),2)=1),-1,1)
Adjusted formula for phase:
= IF(AND(PRODUCT(A1:A10)<0,MOD(COUNT(A1:A10),2)=0),"Imaginary","Real")
If the geometric mean is real, it returns the precise geometric mean (whether it is positive or negative), and if the geometric mean is imaginary, it returns a positive real value with the correct magnitude.
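The three cases and the adjusted formulas can be mirrored in a short Python sketch. `geomean_with_type` is a hypothetical helper; it assumes every entry is numeric (so `len` matches Excel's COUNT) and that there are no zeros, which Excel's GEOMEAN rejects.

```python
import math

def geomean_with_type(data):
    """Mirror the adjusted magnitude and phase formulas.

    Returns (value, kind): value is the signed geometric mean when it is real,
    or its positive magnitude when it is imaginary.
    """
    data = list(data)
    n = len(data)
    magnitude = math.prod(abs(x) for x in data) ** (1 / n)  # GEOMEAN(ABS(data))
    negative_product = math.prod(data) < 0                  # PRODUCT(data)<0
    if negative_product and n % 2 == 1:
        return -magnitude, "Real"        # odd count: real, negative root
    if negative_product:
        return magnitude, "Imaginary"    # even count: imaginary, report the magnitude
    return magnitude, "Real"

print(geomean_with_type([2, 8]))        # (4.0, 'Real')
print(geomean_with_type([2, -8]))       # (4.0, 'Imaginary')
print(geomean_with_type([-2, -2, -2]))  # approximately (-2.0, 'Real')
```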
So, I just found the answer, although I have no idea why this works.
Doing GEOMEAN(A1:A10+1000000)-1000000 is actually possible. But if you just press Enter, a #VALUE! error is displayed. You must press Ctrl+Shift+Enter (entering it as an array formula) to have the actual result displayed.
According to this: https://www.mrexcel.com/forum/excel-questions/264366-calculating-geometric-mean-some-negative-values.html
If anyone has an explanation for this, I am very interested.

Microsoft Excel's RAND()

Is it possible to use Microsoft Excel's RAND() or RANDBETWEEN() to obtain specific range of values between say 0.5 and 0.9?
I know RAND() returns random numbers between 0 and 0.999999, but I would like to avoid values under 0.5 and all negative numbers.
I'm guessing it's something obvious but can't put my finger on it.
Generate whole numbers, then divide by a power of ten that gives the number of decimal places you want:
=RANDBETWEEN(5,9)/10
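So if you really want four decimal places, from .5000 up to .9999, you would use:
=RANDBETWEEN(5000,9999)/10000
(To stop at exactly 0.9, as in the question, use =RANDBETWEEN(5000,9000)/10000 instead.)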
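Python's `random.randint` is inclusive at both ends, like RANDBETWEEN, so the integer-then-divide idea can be sketched as below (`rand_between_scaled` is a hypothetical helper, not an Excel function).

```python
import random

def rand_between_scaled(low: int, high: int, scale: int) -> float:
    """Like RANDBETWEEN(low, high)/scale: a random integer in [low, high], then divided."""
    return random.randint(low, high) / scale

print(rand_between_scaled(5, 9, 10))           # one of 0.5, 0.6, 0.7, 0.8, 0.9
print(rand_between_scaled(5000, 9999, 10000))  # four decimals, between 0.5 and 0.9999
```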
Take the result of RAND(), multiply by the range (0.9 - 0.5) and add the lowest number in the range (0.5). Altogether,
=(0.9-0.5)*RAND()+0.5
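The same scale-and-shift formula in Python, using `random.random()` (which returns values in [0.0, 1.0)) in place of RAND(); `rand_in_range` is a hypothetical helper. The standard library's `random.uniform(a, b)` computes the same expression, and its documentation notes that the upper end may occasionally be hit because of floating-point rounding.

```python
import random

def rand_in_range(low: float, high: float) -> float:
    """(high - low) * RAND() + low: stretch [0, 1) to the width of the range, then shift it."""
    return (high - low) * random.random() + low

print(rand_in_range(0.5, 0.9))
```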
Compared to Scott's answer, this implementation can return any number in the range, down to full floating-point precision, whereas Scott's answer lets you specify the exact precision you want. Either is a good option; choose whichever suits your needs best.

Excel roundings not summing properly

I have an Excel sheet with a few formulas like this:
A1, A2, A3 = 0.13, 1.25, 2.21
A4: =(A1*A2) shows 0.16 (2 decimal places; the stored value is 0.1625)
A5: =(A2*A3) shows 2.76 (2 decimal places; the stored value is 2.7625)
A6: =SUM(A4;A5) shows 2.93 (2 decimal places; the underlying sum is 2.925)
But I want it to show 0.16+2.76=2.92.
That's my problem: I want to add the values as displayed in the cells, not the formulas' full-precision results. How can I do that? Thank you
Presumably you're working with money, which is why you need this.
One way to resolve this is to use =ROUND(A1*A2, 2) etc. and base your subsequent calculations on that.
Do be aware, though, that you will still occasionally get spurious results, because Excel represents numbers as 64-bit IEEE 754 floating-point doubles. (Although it does have some extremely clever circumvention techniques, for example in how it evaluates 1/3 + 1/3 + 1/3, it will not resolve every possible oddity.) If you're building an accounting-style sheet, you are best off working in pence and dividing the final result by 100.
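A minimal Python illustration of both points (Python floats are typically the same 64-bit IEEE 754 doubles, though Python applies none of Excel's circumvention tricks): decimal fractions are not exact in binary, while whole pence are.

```python
# Binary doubles cannot represent most decimal fractions exactly
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004

# Working in whole pence keeps the arithmetic exact; divide only at the end
pence = [10, 20]
total_pence = sum(pence)  # 30, exact integer arithmetic
print(total_pence / 100)  # 0.3
```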
Round the values before you sum, i.e.:
=ROUND(A1*A2,2)
=ROUND(A2*A3,2)
You could wrap your formulas with the ROUND function:
=ROUND(A1*A2,2)
This will give you 0.16 as opposed to 0.1625. Do this for each of your calculations and you'll only be calculating everything to two decimal places. Although I'm not sure why you'd want to do that.
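The difference between summing first and rounding first can be reproduced with Python's `decimal` module, which avoids binary floating point; this is a sketch of the arithmetic, not of Excel itself. `ROUND_HALF_UP` rounds ties away from zero, matching how Excel's ROUND treats halves, and `round2` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def round2(x: Decimal) -> Decimal:
    # Two decimal places, ties away from zero (like ROUND(x, 2))
    return x.quantize(CENT, rounding=ROUND_HALF_UP)

a1, a2, a3 = Decimal("0.13"), Decimal("1.25"), Decimal("2.21")
a4, a5 = a1 * a2, a2 * a3       # 0.1625 and 2.7625 at full precision

print(round2(a4 + a5))          # 2.93: sum first, round once (the original sheet)
print(round2(a4) + round2(a5))  # 2.92: round each product, then sum
```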

How do I use a standard distribution to guess where the value falls in the future?

I have a mean value x and I want to model it into the future. I want to output a value of what it could be in 6 months. Assuming the value follows a normal distribution and we have the standard deviation, how do I randomize the value x while following a normal distribution? I'm doing this in Excel, but just understanding it would help too! Basically, I want to produce numbers within 1 standard deviation 68% of the time, within 2 standard deviations 95% of the time, and so on.
You can use the Excel function 'NORMINV' to convert a random input, 'RAND()', into a value drawn from a normal distribution.
=NORMINV(RAND(),Mean,Std Dev)
That is, if you repeat this many times, then save and analyze the results, you'll see a bell curve centered on the input Mean value.
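NORMINV(RAND(), Mean, Std Dev) is inverse-transform sampling: a uniform probability is pushed through the inverse normal CDF. A minimal Python sketch of the same idea using `statistics.NormalDist.inv_cdf` (Python 3.8+); the mean of 100 and standard deviation of 15 are hypothetical example values, and `sample_normal` is a hypothetical helper. Counting how many samples land within 1 and 2 standard deviations reproduces the 68% / 95% pattern from the question.

```python
import random
from statistics import NormalDist

def sample_normal(mean: float, std_dev: float, rng: random.Random) -> float:
    """Inverse-transform sampling, the same idea as NORMINV(RAND(), mean, std_dev)."""
    p = rng.random()
    while p == 0.0:  # inv_cdf needs 0 < p < 1; random() can (very rarely) return 0.0
        p = rng.random()
    return NormalDist(mean, std_dev).inv_cdf(p)

rng = random.Random(42)  # fixed seed so the run is repeatable
mean, sd = 100.0, 15.0   # hypothetical example values
samples = [sample_normal(mean, sd, rng) for _ in range(20_000)]

within_1 = sum(abs(x - mean) <= sd for x in samples) / len(samples)
within_2 = sum(abs(x - mean) <= 2 * sd for x in samples) / len(samples)
print(f"within 1 SD: {within_1:.3f}, within 2 SD: {within_2:.3f}")  # roughly 0.68 and 0.95
```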
Does that get you started?
The tricky bit comes when you come up with the formula to predict what a value will be in the future using this.
