Best way to store "percentages" while programming? [closed] - maintainability

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I have a financial application that deals with "percentages" quite frequently. We are using "decimal" types to avoid rounding errors. My question is:
If I have a quantity representing 76%, is it better to store it as 0.76 or 76?
Furthermore, what are the advantages and/or disadvantages from a code maintainability point of view? Are there conventions for this?

If percentage is only a small part of the problem domain, I would probably stick with a primitive number.
Per cent literally means "per hundred", so a percentage is a decimal number: 76% is equal to 0.76. Thus, in the absence of units-of-measure support in the programming language itself, I would represent a percentage as a decimal number.
This also means that you don't need any special arithmetic to calculate with percentages, but you will need to multiply by 100 when you display the number as a percent.
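A minimal sketch of the "store the fraction, convert only at the edges" idea, assuming Python's `decimal.Decimal` stands in for the question's "decimal" type. The helper names `parse_percent` and `format_percent` are hypothetical, chosen for illustration:

```python
from decimal import Decimal


def parse_percent(text: str) -> Decimal:
    """Convert user input such as "76" (meaning 76%) into the stored fraction 0.76."""
    return Decimal(text) / 100


def format_percent(fraction: Decimal, places: int = 0) -> str:
    """Render a stored fraction such as 0.76 as a percent string such as "76%"."""
    return f"{fraction * 100:.{places}f}%"


rate = parse_percent("76")    # stored as Decimal("0.76")
amount = Decimal("250.00")
share = amount * rate         # plain multiplication, no "/ 100" needed in the arithmetic
print(share, format_percent(rate))
```

The only places that know about the factor of 100 are the two boundary functions. Everything in between works with ordinary multiplication.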
If percentage is a very central part of the problem domain, you should consider discarding Primitive Obsession and instead introduce a proper Value Object.
Still, even if you introduce a Percent Value Object that contains conversions to and from primitive numbers, you're still exposed to potential programmer errors, because on its own the number 0.9 could mean either 90% or 0.9%, depending on how you choose to interpret it.
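One way such a Value Object could look is sketched below. The class `Percent` and its methods are hypothetical. The design choice is to offer two explicitly named constructors, so every call site has to state whether 0.9 means "0.9 as a fraction" or "0.9 percent points". As noted above, this narrows the room for error but does not remove it.

```python
from dataclasses import dataclass
from decimal import Decimal


def _to_decimal(value) -> Decimal:
    """Accept Decimal, int or str; reject float to avoid binary rounding artifacts."""
    if isinstance(value, float):
        raise TypeError("pass a Decimal, int or str, not a float")
    return Decimal(value)


@dataclass(frozen=True)
class Percent:
    """Hypothetical value object that always stores the fraction (76% -> 0.76)."""
    fraction: Decimal

    @classmethod
    def from_fraction(cls, value) -> "Percent":
        """0.76 -> 76%."""
        return cls(_to_decimal(value))

    @classmethod
    def from_points(cls, value) -> "Percent":
        """76 -> 76% (the number is read as 'percent points')."""
        return cls(_to_decimal(value) / 100)

    def as_points(self) -> Decimal:
        """The value as shown to humans, e.g. 76 for 76%."""
        return self.fraction * 100

    def of(self, amount: Decimal) -> Decimal:
        """Apply the percentage to an amount."""
        return amount * self.fraction
```

For example, `Percent.from_points(76).of(Decimal("250"))` and `Percent.from_fraction("0.76").of(Decimal("250"))` both yield 190.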
In the end, my best advice is to cover your code base with appropriate unit tests, so that you lock the conversion code down.

Related

Correct way to simplify/neaten up complex or dense sequence diagrams [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I am creating a sequence diagram for an alarm system which involves a few different states and events which cause different behaviour depending on the state.
I am wondering how best to present it. I planned on creating reference sequences for common events (such as entering a PIN, a sensor going off and the alarm being activated) along the 'no fault' path: the alarm being off, being armed and then being turned off again, without the alarm being set off by the sensors or by a failed PIN entry.
Here's what I've got so far. Is there a better way (I will obviously define the reference sequences) or is this clear enough?
My approach is to keep sequence diagrams limited to a single level. A sequence diagram should only describe the behavior of a single operation of a single class. To describe the behavior of other operations of the same or other classes, I use separate sequence diagrams.
Furthermore, I try to limit the number of messages in a sequence diagram to about 15. In general, my rule is that I should always be able to print a diagram on A4 paper and still be able to read it. If not, there's too much on the diagram and it should be split across several diagrams.
More details can be found here: UML Best Practice: One Operation => One Sequence Diagram

Repeatedly select in ‘random’ order the components of three sets of ten each [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I wish to assign randomly (sort of) the numbers 1, 2 and 3 to 30 rows such that every time I run the randomization I get ten instances of each number.
I know of =RANDBETWEEN, but I am not sure how to ensure that each of the three numbers is output equally often, yet in a varying sequence.
Is there a convenient software algorithm for this, using Excel functions?
What you are describing is called a shuffle: randomly ordering a fixed set of elements.
Fill the array with ten 1s, ten 2s and ten 3s, then run a random shuffle on it.
You should be able to find descriptions of shuffle algorithms (such as the Fisher–Yates shuffle) with a web search.
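The fill-then-shuffle approach, sketched in Python with a hand-written Fisher–Yates shuffle. The function name `balanced_assignment` and its parameters are illustrative; in practice `random.shuffle` does the same job. In a spreadsheet, a common equivalent is to put the thirty values in one column, add a helper column of `=RAND()`, and sort by the helper column.

```python
import random


def balanced_assignment(values=(1, 2, 3), copies=10, rng=None):
    """Return a list holding `copies` of each value, in uniformly random order."""
    if rng is None:
        rng = random.Random()
    # Fill: ten 1s, ten 2s, ten 3s (with the default arguments).
    items = [v for v in values for _ in range(copies)]
    # Fisher–Yates: walk from the end, swapping each slot with a random slot at or before it.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        items[i], items[j] = items[j], items[i]
    return items


print(balanced_assignment())
```

Because the shuffle only reorders the list, every run contains exactly ten of each number. Only the order changes.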

measuring precision and recall [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
We are building a text search solution and want a way to measure the precision and recall of the system every time we add new document types. From reading some of the posts here, it sounds like a machine-learning-based solution is the way to go. Can an expert comment on this? We will then look to add machine learning folks to our team.
Getting the F1 score requires knowing the correct class (relevant or not) and the rank of every sample returned by your evaluation queries, and you also need those evaluation queries in the first place.
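For reference, this is how precision, recall and F1 fall out of hand-made relevance judgments for a single query. The sketch is set-based and ignores rank. A rank-aware variant such as precision@k would first cut `retrieved` down to its top k results. The queries and document IDs are made up for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F1 for one query, given hand-made relevance judgments."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents the engine actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical evaluation set: query -> (documents returned, documents judged relevant by hand)
judgments = {
    "invoice overdue": (["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"]),
    "contract renewal": (["d7", "d8"], ["d7", "d8"]),
}

for query, (retrieved, relevant) in judgments.items():
    p, r, f = precision_recall_f1(retrieved, relevant)
    print(f"{query!r}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

The hard part is not this arithmetic but producing the `relevant` lists, which is the manual work discussed next.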
Any machine learning approach will need a large amount of manual work to provide those samples and/or queries, so much that it won't save you any time.
Another drawback of this kind of evaluation is the error intrinsic to the learning itself. It grows with the size of the search engine's index and with the number of examples required, so you never get a good evaluation.
Forget machine learning for evaluating a search engine.
Build your test queries and samples by hand; over time the set will become large and reliable.
If you really want machine learning in your system, you should look at query pre-processing. Extracting some meta-information about the query by other means (you mention SVN; why not?) is generally good for performance, and since it doesn't change the results, you can use the same samples for an end-to-end evaluation.
That is what I did a few years ago, but with a naive Bayes classifier for natural language analysis.
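A minimal sketch of naive Bayes query pre-processing, assuming queries are tagged with a category before they reach the search engine. It uses a multinomial naive Bayes over whitespace-split words with add-one (Laplace) smoothing. The class name, categories and training queries are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesQueryClassifier:
    """Hypothetical multinomial naive Bayes with add-one smoothing over query words."""

    def __init__(self):
        self.class_counts = Counter()            # queries seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, labeled_queries):
        for text, label in labeled_queries:
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            n_words = sum(self.word_counts[label].values())
            for w in words:
                # Laplace smoothing keeps unseen words from zeroing out the probability.
                score += math.log((self.word_counts[label][w] + 1) / (n_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayesQueryClassifier()
clf.train([
    ("cheap flights to paris", "travel"),
    ("hotel booking rome", "travel"),
    ("python list sort", "code"),
    ("java string split", "code"),
])
print(clf.predict("flights to rome"))
```

Working in log space avoids underflow when many small probabilities are multiplied together.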

Why does cos(2^27) fail? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
It seems Excel knows how to calculate =cos(2^27-1) but fails to calculate =cos(2^27), which returns #NUM!. Does anyone know why?
I have no idea what sort of arithmetic Excel uses internally, but at some point, with a large enough number, the error left after a mod 2*pi operation becomes too large to produce a reliable answer. Presumably they picked 2^27 as their cutoff.
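A small Python illustration of the effect this answer describes. It is not Excel's implementation, which is unknown. It reduces 2^27 modulo a double-precision 2π and compares the result with the platform C library's `cos`, which on common platforms (glibc, macOS) reduces large arguments against a much more precise π. That reference behavior is an assumption about the host libm.

```python
import math

x = 2.0 ** 27  # exactly representable as a double

# Naive reduction: x mod 2π using the double-precision constant 2*math.pi.
# fmod itself is exact, but 2*math.pi differs from the true 2π by about 2.4e-16,
# and that error is multiplied by the roughly 21 million whole periods contained in x.
naive = math.cos(math.fmod(x, 2 * math.pi))

# math.cos calls the C library cos(), treated here as the accurate reference.
reference = math.cos(x)

print(f"naive={naive!r} reference={reference!r} diff={abs(naive - reference):.1e}")
```

The discrepancy is on the order of 1e-9. That is harmless for most spreadsheets, but it grows linearly with the argument, which is why some cutoff for large inputs is plausible.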
This behavior is not well documented. The Sin Function documentation indicates that the argument is a Double, and the limits specified there indicate that the Double type is stored as a 64-bit number ranging from 4.94E-324 to 1.797E308 (for positive numbers).
I suspect it is not coincidental that 2^27 (134,217,728) bytes is precisely 128 megabytes, and it seems likely that there is an internal limitation for some trig functions (e.g. COS, SIN and TAN, but interestingly NOT for TANH, etc.). This is not to say that this amount of memory would actually be consumed; it's just that a programmer's implementation could impose some (potentially unnecessary) internal limits on these inputs.
To get around this silly limit, simply use the following:
=COS(MOD(2^27, 2*PI()))
This works because the limitation does not exist for other operations, and is nowhere to be seen in the Excel Specifications and Limits. :-)
It would be good if the linked documentation described these limits, but unfortunately it does not.

What does taking the logarithm of a variable mean? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Question with regard to taking the logarithm of a variable (statistics question)
Say you have a bar graph displaying data for an example "Cost of Computer Orders by the Population", and you are trying to analyze the data and find a distribution. The raw data does not suggest anything, so you take the logarithm of the variable, and the graph then resembles a normal distribution. I know that the normal distribution basically means the mean, but what does taking the logarithm of the data indicate?
It seems that you are describing the lognormal distribution: a random variable is said to come from a lognormal distribution if its logarithm is distributed normally.
In practice, this can describe processes where the value cannot go below zero and most of the population is concentrated near the low end (right skewness). For example, salaries, home prices, bone fractures and number of girlfriends could all reasonably be modeled with a lognormal distribution.
For example, say that on average young adults have had 2.5 girlfriends. A few have never had one (you cannot have a negative number of girlfriends), and a few bastards have had 25. However, most young adults will have had between, say, one and three.
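The definition and the right skew can be checked numerically. This sketch draws lognormal samples with Python's `random.lognormvariate`. The parameters `mu` and `sigma` are arbitrary illustrative choices. It then confirms two things: the logarithms have the normal parameters back, and on the original scale the mean, exp(mu + sigma²/2), sits above the median, exp(mu).

```python
import math
import random
import statistics

# Illustrative parameters of the underlying normal distribution (log scale).
mu, sigma = 1.0, 0.5
rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [rng.lognormvariate(mu, sigma) for _ in range(20_000)]

# By definition, log(X) is normal with mean mu and standard deviation sigma.
logs = [math.log(x) for x in samples]
print("mean of logs:", statistics.fmean(logs), "stdev of logs:", statistics.stdev(logs))

# Right skew on the original scale: all values are positive, and the long right
# tail pulls the mean above the median.
print("mean:", statistics.fmean(samples), "median:", statistics.median(samples))
print("theory: mean", math.exp(mu + sigma**2 / 2), "median", math.exp(mu))
```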
If you plot log(x) instead of x, values that grow exponentially show up as a straight line in the diagram. This is a quick statistical trick for checking whether values grow exponentially.
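The "straight line on a log scale" check can also be done without a plot. Exponential growth means the differences between consecutive logarithms are constant. The function name `looks_exponential` and its tolerance are illustrative choices, not a standard test.

```python
import math


def looks_exponential(values, tol=0.05):
    """Rough check: are consecutive log-differences (nearly) constant?"""
    if len(values) < 2:
        raise ValueError("need at least two values")
    if any(v <= 0 for v in values):
        return False  # log is undefined for non-positive values
    logs = [math.log(v) for v in values]
    steps = [b - a for a, b in zip(logs, logs[1:])]
    mean_step = sum(steps) / len(steps)
    # On a log scale, a straight line means every step is (about) the same size.
    return all(abs(s - mean_step) <= tol for s in steps)


print(looks_exponential([3 * 1.5 ** t for t in range(10)]))  # grows 50% per step
print(looks_exponential(list(range(1, 11))))                 # grows linearly
```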
