I am curious about the existence of any "rounding" standards when it comes to the calculation of financial data. My initial thought is to perform rounding only when the data is presented to the user (the presentation layer).
If "rounded" data is then used for further calculations, should we use the "rounded" figure or the "raw" figure? Does anyone have any advice?
Please note that I am aware of the different rounding methods, e.g. banker's rounding.
The first and most important rule: use a decimal data type, never ever binary floating-point types.
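A quick illustration in Python (a minimal sketch; the same point holds in any language that offers a decimal type):

from decimal import Decimal

print(0.1 + 0.2 == 0.3)                   # False: binary floats cannot represent 0.1 exactly
print(0.1 + 0.2)                          # 0.30000000000000004
print(Decimal("0.10") + Decimal("0.20"))  # 0.30: a decimal type keeps cents exact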
When exactly rounding should be performed can be mandated by regulations, such as those governing the conversion between the Euro and the national currencies it replaced.
If there are no such rules, I'd do all calculations with high precision, and round only for presentation, i.e. not use rounded values for further calculations. This should yield the best overall precision.
I just asked a greybeard mainframe programmer at the financial software company I work for, and he said there is no well-known standard and it's up to programmer practice.
While statisticians have been aware of the rounding issue since at least 1906, it's difficult to find a financial standard endorsing it.
According to this site, the "European Commission report The Introduction of the Euro and the Rounding of Currency Amounts suggests that there had previously been no standard approach to rounding in banking."
In general, use a symmetric rounding mode no matter what base you are working in (base-2 or base-10).
This will avoid systematic bias during calculations.
Such a mode is round-half-to-even, otherwise known as "banker's rounding".
Use language tools that allow you to specify the numeric context explicitly, including the rounding and truncation modes. For example, Python's decimal module. The implicit assumptions made by the C library might not be appropriate for your computations.
http://en.wikipedia.org/wiki/Rounding#Rounding_to_integer
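For instance, a minimal sketch of an explicit numeric context using Python's decimal module (the settings shown are illustrative, not recommendations):

from decimal import Decimal, ROUND_HALF_EVEN, getcontext

getcontext().prec = 28                   # working precision for intermediate results
getcontext().rounding = ROUND_HALF_EVEN  # banker's rounding avoids systematic bias

cents = Decimal("0.01")
print(Decimal("2.125").quantize(cents))  # 2.12 -- the tie goes to the even digit
print(Decimal("2.135").quantize(cents))  # 2.14 -- the tie goes to the even digit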
It's frustrating that there aren't clear standards on this, both to guide the programmer, and as a defense in court. Just doing "regular" rounding toward nearest for payroll can lead to underpayment by a few pennies on a paycheck here and there, which is something labor lawyers eat up like crack.
Though a base pay rate may well only be specified in two decimal places ("You're hired at $22.71/hour"), things like blended overtime (determined by averaging multiple pay rates in a period) end up with an effective hourly rate of $23.37183475/hr.
How do you pay overtime on that?
15 hours x 23.37183475 x 1.5 = $525.87 rounded from $525.86628187
15 hours x 23.37 x 1.5 = $525.82
WHY DID YOU STEAL FIVE CENTS FROM MY CLIENT? Sadly, I'm not joking about this.
This gets even more uncomfortable when you calculate with the full-precision value but display a truncated version: you do the first calculation above, but only display $23.37 for the rate on the pay stub.
Now the pay stub calculations don't tie out to the penny, and you have to explain it; even if the difference is in the employee's favor, it can be enough for a labor lawyer to smell blood in the water and start looking for other stuff.
One approach is to always round in favor of the employee, not in the natural direction, so there cannot ever be an accusation of systematic wage theft.
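To make the five cents concrete, here is a small Python sketch of the two calculations above (half-even rounding is assumed here because it reproduces the quoted figures):

from decimal import Decimal, ROUND_HALF_EVEN

CENTS = Decimal("0.01")
hours, multiplier = Decimal("15"), Decimal("1.5")

# Pay from the full-precision blended rate, rounded once at the end:
full = (hours * Decimal("23.37183475") * multiplier).quantize(CENTS, ROUND_HALF_EVEN)
# Pay from the pre-rounded display rate:
shown = (hours * Decimal("23.37") * multiplier).quantize(CENTS, ROUND_HALF_EVEN)

print(full)          # 525.87
print(shown)         # 525.82
print(full - shown)  # 0.05 -- the five cents in dispute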
I've not seen "the one standard to rule them all" exist. There are any number of rounding rules (as you have referenced), and they seem to come into play based on industry, customer, and currency code (http://en.wikipedia.org/wiki/ISO_4217). Since not everyone uses 2 places after the decimal, the problem becomes even more complicated. At the end of the day, your customer needs to specify the rules they want to implement...
Consider using scaled integers.
In other words, store whole numbers of pennies instead of fractional numbers of dollars.
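A minimal Python sketch of the idea (the 8.25% tax rate is purely illustrative):

from fractions import Fraction

price_cents = 1999                       # $19.99 stored as whole cents
quantity = 3
subtotal_cents = price_cents * quantity  # 5997 -- exact integer arithmetic

# When a fraction is unavoidable, keep it exact and round to a whole cent only once:
tax_cents = round(subtotal_cents * Fraction(825, 10000))  # 495 (from 494.7525)

dollars, cents = divmod(subtotal_cents + tax_cents, 100)
print(f"${dollars}.{cents:02d}")         # $64.92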
We can take Java as a perspective. Suppose we have a system that has items with a price, and the price goes through several operations, let's say 15: the items' prices are divided, multiplied, summed, and subtracted with decimals over and over. Now let's say that our system talks to another system, which also performs operations on item prices. In the end, the price values of the two systems have to match exactly (to the cent). We are hypothetically talking about accounting systems. In my experience, the chance that the two match is very rare. How can we handle such a situation? Is there a rule for rounding?
I'd say always calculate with the raw numbers (i.e. a lot of decimals) and transfer that number as well. Round only for the comparison, to a less precise, previously agreed upon degree of precision (which is still precise enough for the purpose). That way you have matching results while maintaining high enough precision.
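For example, a small Python sketch of comparing only at an agreed precision (four decimal places is an arbitrary illustrative choice):

from decimal import Decimal

AGREED = Decimal("0.0001")           # precision both systems agreed on beforehand

ours = Decimal("1234.5678901234")    # raw, high-precision result from one system
theirs = Decimal("1234.5678898765")  # raw, high-precision result from the other

# Keep and transfer the raw values; round only for the comparison:
print(ours.quantize(AGREED) == theirs.quantize(AGREED))  # True -- both become 1234.5679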
A factor to consider is the rounding method, in case the two sides differ. There are three I know about:
The one which is taught in school: 0.5 --> 1.0
Towards zero: 1.5 --> 1.0 but -2.5 --> -2.0
Towards even, also called "Banker's rounding": 1.5 --> 2.0 but 2.5 --> 2.0
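In Python's decimal module the three modes look like this (a quick sketch):

from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_DOWN, ROUND_HALF_EVEN

ONE = Decimal("1")
print(Decimal("0.5").quantize(ONE, ROUND_HALF_UP))     # 1  -- the schoolbook rule
print(Decimal("1.5").quantize(ONE, ROUND_HALF_DOWN))   # 1  -- ties go toward zero
print(Decimal("-2.5").quantize(ONE, ROUND_HALF_DOWN))  # -2 -- ties go toward zero
print(Decimal("1.5").quantize(ONE, ROUND_HALF_EVEN))   # 2  -- ties go to the even digit
print(Decimal("2.5").quantize(ONE, ROUND_HALF_EVEN))   # 2  -- ties go to the even digit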
In the Analytics Vidhya post linked below, an ANOVA test is performed on COVID data to check whether the difference in positive cases between denser regions is statistically significant.
I believe an ANOVA test can't be performed on this COVID time-series data, at least not in the way it has been done in this post.
Sample data was taken randomly from the different groups (denser1, denser2, ..., denser4). The data is a time series, so the positive-case counts in the random samples are likely drawn from different points in time.
It might be the case that denser1 has random data from early COVID times while another region has random data from a different period. If so, the F-statistic will almost certainly be high.
Can anyone explain, or does anyone have other opinions?
https://www.analyticsvidhya.com/blog/2020/06/introduction-anova-statistics-data-science-covid-python/
ANOVA should not be applied to time-series data, as the independence assumption is violated. The issue with independence is that days tend to correlate very highly. For example, if you know that today you have 1400 positive cases, you would expect tomorrow to have a similar number of positive cases, regardless of any underlying trends.
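One quick way to see the violation is to measure the lag-1 autocorrelation of the daily counts; a Python sketch on synthetic, purely illustrative data:

import numpy as np

rng = np.random.default_rng(0)
days = np.arange(120)
# Synthetic epidemic-like curve: a smooth bump plus noise (not real COVID data).
cases = 1400 * np.exp(-((days - 60) / 25) ** 2) + rng.normal(0, 30, size=days.size)

r = np.corrcoef(cases[:-1], cases[1:])[0, 1]  # today's count vs. tomorrow's
print(f"lag-1 autocorrelation: {r:.2f}")      # close to 1 -- far from independent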
It sounds like you're trying to determine the causal effect of different treatments (i.e. mask mandates or other restrictions, etc.) on positive cases. The best way to infer causality is usually to perform A/B testing, but obviously in this case it would not be reasonable to give different populations different treatments. One method that is good for going back and retroactively inferring causality is called "synthetic control".
https://economics.mit.edu/files/17847
Linked above is a basic paper on the methodology. The hard part of this analysis will be constructing the synthetic counterfactuals, or "controls", to test your actual population against.
If this is not what you're looking for, please reply with a clarifying question, but I think this should be an appropriate method that is well-suited to studying time-series data.
I've been having some fun creating a rather extensive inventory in Google Sheets for my collection of trading cards. I buy the majority of my collectibles in lots meaning that I pay a total of X dollars for Y number of cards of different values (as opposed to buying each card individually).
In my spreadsheet I have a "Purchase Price" column where I enter in the price I paid for each card. If I buy 1 lot of 10 cards, to find the value of each of those cards you would just divide the cost of the lot by the number of cards in the lot. So if I purchased 1 lot of 10 cards for a total of $100, the Purchase Price of each card would equal $10. Simple enough right?
Well, it would be, if you were OK with the rare, uncommon, and common cards in the lot all having the same exact purchase price even though their real market values would all be different. So what I did was create a formula that automatically adjusts the purchase price of each card that's part of a lot based on its rarity, so it's at least closer to the card's actual market value.
Here is the formula:
=IFS(B2="C",D2*$B$15*G2/((D2*$B$15)+(E2*$B$16)+(F2*$B$17))/D2,
B2="U",D2*$B$16*G2/((D2*$B$15)+(E2*$B$16)+(F2*$B$17))/D2,
B2="R",D2*$B$17*G2/((D2*$B$15)+(E2*$B$16)+(F2*$B$17))/D2)
Not sure if that means much to anyone, so here's a link to an example spreadsheet of the formula in action below.
And if you don't care to check that out, here's a screenshot:
The problem:
So the formula works exactly how I want it to work EXCEPT when there are 0 commons in a lot. When that happens I get a #DIV/0! error saying that "Function DIVIDE parameter 2 cannot be zero." I understand why this is happening since it doesn't like to divide by 0 in the first line, but what I don't understand is how to fix it.
How can I fix the DIV error, or is there a better way to do this, perhaps an alternative formula or approach? I am not a programmer and somewhat of a beginner at Excel.
Two suggestions.
1. Embed each division in an IFERROR() function as shown below. This function will return zero instead of an error; you can substitute another calculation for that. In fact, depending upon which level you introduce the IFERROR at (embracing just one of the three calculations, or all three), you might choose to embed the IFS in another IFS that tests for zeroes. Once you have no more divisions by zero, there is no more need for IFERROR. So it becomes a question of formula efficiency.
=IFERROR(D2*$B$15*G2/((D2*$B$15)+(E2*$B$16)+(F2*$B$17))/D2,0)
2. Forget about all of this and seek a commercially logical solution. The logic says that you never buy a lot unless it contains some items you want, and the seller never has a lot that doesn't contain rubbish. In the end you get inundated with commons, meaning you have more of them than you can ever hope to sell. So, what's their real, commercial value? Value your rare and uncommon cards individually and all the baggage not at all. You will find the outcome more realistic both for the commons and the rares. BTW, that's what they do with stamp or coin collections.
I realise this is more a comment than an answer, but it's too large to put it as a comment:
Your formula is unreadable, as you can see:
=IFS(B2="C",D2*$B$15*G2/((D2*$B$15)+(E2*$B$16)+(F2*$B$17))/D2,
B2="U",D2*$B$16*G2/((D2*$B$15)+(E2*$B$16)+(F2*$B$17))/D2,
B2="R",D2*$B$17*G2/((D2*$B$15)+(E2*$B$16)+(F2*$B$17))/D2)
First I'd advise you to create a new column (you can always hide it), I:I, which contains the following formula (for I2):
=G2/((D2*const_weight_common)+(E2*const_weight_uncommon)+(F2*const_weight_rare))
(And give this a meaningful header name.)
As far as names for $B$15:$B$17 are concerned, do something like:
$B$15 : const_weight_common
$B$16 : const_weight_uncommon
$B$17 : const_weight_rare
(You do know how to use names in Excel?)
Like this, your formula becomes:
=IFS(B2="C",I2 * const_weigth_common,
B2="U",I2 * const_weigth_uncommon,
B2="R",I2 * const_weigth_rare);
As far as your error is concerned: as mentioned in another answer, this might be tackled using the =IF() formula, so I2 becomes:
=IF(D2<>0;...;_ERROR_VALUE); // up to you how to change your error value
Like this, your formulas become much clearer and it will be easier to solve possible problems.
I Ain't No Math-A-Magician
But I can help you with this...
The way I see it, there are three schools of thought and you need to figure out which one is yours:
The programmer - trapping #div/0 errors all day
The purist - there is only one answer, and it is undefined
The pragmatic - a graph shows results approaching a limit
I think the programmer is either dangerous or ineffectual, or dangerously ineffectual. Sure, he can trap the error, but what exactly does that do? I'll tell you exactly what it does: it puts lipstick on a pig. It's literally replacing one string, "#div/0!", with a different, more aesthetic string. Or he can play second fiddle to the devil, publicly killing the one bug everyone knew how to defend against and creating another.
I think the purist is right about one thing: the answer is what it is, and it can't be anything else; but he's also wrong. A precisely known theoretical answer may very well be undefined, but where in the real world has anyone ever seen division by zero? It's a mathematical construct, like infinity; we can safely ignore it. Don't believe me? What happens when you divide the sun by zero? Go ahead, I'll wait for your answer.
I think these things because I am grounded in pragmatism. One string is not better than the other if they both symbolize an attempt to divide by zero. I prefer knowing who my enemies are so I may keep them in front of me. Nature truly abhors the undefined and is only slightly displeased with a vacuum.
I have a dataset that contains the admission rates of all providers that we work with. I need to divide that data into quartiles so that each provider can see where their rate lies in comparison to other providers. The rates range from 7% to 89%. Can anyone suggest how to do this? I am not sure if this is the right place to ask this question, but if somebody can help me with this, I would really appreciate it.
The other concern is that if a provider's numbers are really small, e.g. 2/4 = 50%, the provider might fall into a worse quartile, but it doesn't mean that the provider's performance is bad, because the numbers are so small. I hope this is making sense. Please let me know if I can clarify it further.
There are ways to obtain quantiles without doing a complete sort but unless you've got huge amounts of data there is no point in implementing those algorithms if you haven't already got them available. Presuming you have a sort() function available, all you need to do is:
Given n data points.
Sort the data points.
Find the n/4, n/2 and 3*n/4th points in the sorted data, which are your quartiles.
As you say, if n is less than some number (that you'll have to decide for yourself) you may want to say that the quartile result is "not applicable" or some such.
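A minimal Python sketch of that recipe (nearest-rank indexing; production quantile functions usually interpolate):

def quartiles(rates):
    s = sorted(rates)                           # sort the data points
    n = len(s)
    return s[n // 4], s[n // 2], s[3 * n // 4]  # the n/4, n/2 and 3n/4 points

rates = [7, 12, 25, 33, 41, 48, 55, 61, 72, 80, 89]  # illustrative admission rates (%)
print(quartiles(rates))  # (25, 48, 72)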
On the second concern: for small n, do not use quartiles at all. Where the cutoff for "small" lies is a judgment call.
A lot of the answers to the questions about the accuracy of float and double recommend the use of decimal for monetary amounts. This works because today all currencies are decimal except MGA and MRO, and those have subunits of 1/5 so are still decimal-friendly.
But what about the software used in U.S. stock markets when prices were in 1/16ths of dollar? The accuracy of binary data types wouldn't have been an issue, right?
Going further back, how did pre-1971 British accounting software deal with pounds, shillings, and pence? Did their versions of COBOL have a special PIC clause for it? Were all amounts stored in pence? How was decimalisation handled?
PL/I had a type specifically for British currency - I don't know about COBOL. The British currency at one time incorporated farthings, or a quarter of a penny; I'm not sure though that computers had to deal with those, just with half pennies or ha'pennies.
Accurate accounting usually uses special types that represent decimals exactly. The 2008 revision of IEEE 754 added decimal floating-point formats, and some chips (notably IBM pSeries) support them in hardware.
COBOL could do it, e.g. PICTURE 9(4)D88D6 DISPLAY-ST; see http://www.computinghistory.org.uk/downloads/10924, page 117.
1/16 can be represented in four digits as .0625. For fractions of that type you just add some additional decimal places.
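As a quick check, 1/16 is a negative power of two, so even binary floating point holds it exactly (a small Python illustration):

from fractions import Fraction

print(Fraction(0.0625) == Fraction(1, 16))  # True: the binary double is exactly 1/16
print((0.0625).as_integer_ratio())          # (1, 16)
print(Fraction(0.1) == Fraction(1, 10))     # False: 1/10 has no exact binary form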