Algorithm to convert an IEEE 754 double to a string?

Many programming languages that use IEEE 754 doubles provide a library function to convert those doubles to strings. For example, C has sprintf, C++ has stringstream, Java has Double.toString, etc.
Internally, how are these functions implemented? That is, what algorithm(s) are they using to convert the double into a string representation, given that they are often subject to programmer-chosen precision limitations?
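To make the question concrete, here is the difference between a programmer-chosen precision and the "shortest round-trip" output such routines can produce, illustrated in Java (this only shows the observable behaviour, not how any library implements it; the class name is mine):

import java.util.Locale;

public class PrecisionDemo {
    public static void main(String[] args) {
        double x = 1.0 / 3.0;
        // programmer-chosen precision: exactly three digits after the decimal point
        System.out.printf(Locale.ROOT, "%.3f%n", x);   // 0.333
        // library-chosen output: the shortest decimal string that reads back as the same double
        System.out.println(Double.toString(x));        // 0.3333333333333333
    }
}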
Thanks!

The code used by various software environments to convert floating-point numbers to string representations is typically based on the following publications (the work by Steele and White is particularly frequently cited):
Jerome T. Coonen: "An Implementation Guide to a Proposed Standard for Floating-Point Arithmetic." Computer, Vol. 13, No. 1, January 1980, pp. 68-79
Guy L. Steele Jr. and Jon L. White: "How to Print Floating-Point Numbers Accurately." In Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, White Plains, New York, June 1990, pp. 112-126
David M. Gay: "Correctly Rounded Binary-Decimal and Decimal-Binary Conversions." Technical Report 90-10, AT&T Bell Laboratories, November 1990
Some relevant follow-up work:
Robert G. Burger and R. Kent Dybvig: "Printing Floating-Point Numbers Quickly and Accurately." In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, Philadelphia, PA, USA, May 1996, pp. 108-116
Guy L. Steele Jr. and Jon L. White: "Retrospective: How to Print Floating-Point Numbers Accurately." ACM SIGPLAN Notices, Vol. 39, No. 4, April 2004, pp. 372-389
Florian Loitsch: "Printing Floating-Point Numbers Quickly and Accurately with Integers." In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, Toronto, ON, Canada, June 2010, pp. 233-243
Marc Andrysco, Ranjit Jhala, and Sorin Lerner: "Printing Floating-Point Numbers: A Faster, Always Correct Method." ACM SIGPLAN Notices, Vol. 51, No. 1, January 2016, pp. 555-567
Ulf Adams: "Ryū: Fast Float-to-String Conversion." ACM SIGPLAN Notices, Vol. 53, No. 4, April 2018, pp. 270-282
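What "accurately" means in these papers is easy to observe from the outside: the value stored in a double for 0.1 is not exactly 0.1, and a printer can either show that exact value or the shortest decimal that still reads back as the same double. A small Java illustration of that property (not an implementation of any of the algorithms above; the class name is mine):

import java.math.BigDecimal;

public class ExactVersusShortest {
    public static void main(String[] args) {
        double d = 0.1;
        // the exact binary value held in the double, printed in full
        System.out.println(new BigDecimal(d));
        // the shortest decimal string that parses back to the same double
        System.out.println(Double.toString(d));   // 0.1
    }
}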

I believe you are looking for Printing Floating-Point Numbers Quickly and Accurately
I found that link in another post.

See Ryan Juckett's Printing Floating-Point Numbers (2014), which describes the history and implementations of floating-point-to-string conversion. In this four-part post, Ryan also provides a C++ implementation of Dragon4, the algorithm from Steele and White (1990) for efficiently converting a binary floating-point number to a decimal string.
You can also find a C implementation of Ryan's Dragon4 in NumPy, exposed since NumPy 1.14 through the format_float_positional and format_float_scientific functions.
In 2018, the Ryū algorithm and library were published, with implementations and bindings in many modern programming languages (C, Java, C++, C#, Scala, Rust, Julia, Go, ...).
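Whichever of these implementations a library uses (Dragon4, Grisu, Ryū, ...), the contract is the same: the printed string must parse back to the identical double. A quick, hand-rolled spot check of that property in Java (class and variable names are mine):

import java.util.Random;

public class RoundTripCheck {
    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            // sample doubles across the whole bit-pattern space, skipping NaN and infinities
            double x = Double.longBitsToDouble(rng.nextLong());
            if (Double.isNaN(x) || Double.isInfinite(x)) {
                continue;
            }
            if (Double.parseDouble(Double.toString(x)) != x) {
                throw new AssertionError("round trip failed for " + x);
            }
        }
        System.out.println("all sampled values round-tripped");
    }
}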

For most of the example languages you quote, the source is freely available online, since the implementations are open source.
For Java, the class java.lang.Double delegates this work to sun.misc.FloatingDecimal. Check out its constructor and toJavaFormatString() method.
For C, glibc is always a good example, and there we see that floating point output is located in its own source file.

Related

Why is the "english - World" locale not in any way resembling ISO 8601?

On this page: https://en.wikipedia.org/wiki/ISO_8601
... it says:
2020-01-28T08:17:09+00:00
On my computer, running Windows 10, there is a locale called "english - World" (en_001). It's supposed to be some kind of "international compromise" locale, for use when you can't determine the exact locale. Below is what it actually looks like, versus what I expected based on what I know about international standards and compromises:
Actual date format:
28/01/2020, 10:17 am
Expected date format:
2020-01-28T08:17:09+00:00
or
2020-01-28 08:17
Actual number format:
123,456,789.99
Expected number format:
123 456 789.99
Actual money sum format:
SEK 123,456,789.99
Expected money sum format:
123 456 789.99 SEK
Actual percent format:
99.99%
Expected percent format:
99.99 %
Why is the "World" locale so US-centric and seemingly entirely ignores the ISO standard linked to? It's definitely not supposed to use commas for thousands separators as this is very much US/UK-specific! And Wikipedia specifically states that percentages use a space in international context.
Well, the locale "en-001" is first of all using English, see the prefix "en". And the English-speaking parts of the world does not use ISO-8601-formats but other English-specific "traditional" formats.
ISO-8601 is mainly intended for the technical exchange of date-time-informations. Therefore this standard emphasizes the sortability of date-times in textual form, hence the date-time-components in ISO-8601 follow the order year-month-day-hour(24)-minute-second.
On the other Hand, "en-001" is rather intended for English speakers without exactly specifying the concrete English-speaking country, that means: It can be US, UK, Australia, South Africa etc. Of course, due to the economic and military power of US, the US-standards dominate here.
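You can also inspect what another implementation of "en-001" produces, for example a JDK, which uses CLDR locale data; the exact output depends on the CLDR version bundled with your JDK (class name is mine):

import java.text.NumberFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class WorldLocaleDemo {
    public static void main(String[] args) {
        Locale world = Locale.forLanguageTag("en-001");
        // short date/time pattern for en-001: a "traditional" English format, not ISO 8601
        System.out.println(LocalDateTime.now().format(
                DateTimeFormatter.ofLocalizedDateTime(FormatStyle.SHORT).withLocale(world)));
        // grouping and decimal separators for en-001
        System.out.println(NumberFormat.getNumberInstance(world).format(123456789.99));
        NumberFormat percent = NumberFormat.getPercentInstance(world);
        percent.setMaximumFractionDigits(2);
        System.out.println(percent.format(0.9999));
    }
}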

Why is it Ceiling() and Floor() not RoundUp() and RoundDown()

Most languages name the native rounding-up function Ceiling() or Ceil() and the rounding-down function Floor(). Yet as far as I know, this notation is pretty much never used outside of programming. Most people just refer to ceiling as "rounding up" and floor as "rounding down", so how did the ceiling/floor notation arise and become popular?
I guess the naming convention has its origin in mathematics and computer science; the names "floor" and "ceiling" and the ⌊x⌋/⌈x⌉ notation were introduced by Kenneth Iverson in 1962, and programming languages picked them up from there.
http://en.wikipedia.org/wiki/Floor_and_ceiling_functions
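For what it's worth, the mathematical names also sidestep an ambiguity in "round up"/"round down": for negative numbers, ceiling and floor go toward positive and negative infinity, which is not what everyone means by "up" and "down". A small Java illustration (class name is mine):

public class CeilFloorDemo {
    public static void main(String[] args) {
        System.out.println(Math.ceil(2.4));    // 3.0
        System.out.println(Math.ceil(-2.4));   // -2.0  (toward +infinity, i.e. toward zero here)
        System.out.println(Math.floor(-2.4));  // -3.0  (toward -infinity, i.e. away from zero here)
        System.out.println(Math.round(2.5));   // 3     (round-to-nearest is a different operation again)
    }
}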

Does decimal math use the FPU

Does decimal math use the FPU?
I would think that the answer is yes, but I'm not sure, since a decimal is not a floating point, but a fixed precision number.
I'm looking mostly for .NET, but a general answer would be useful too.
With regards to .NET and more specifically C#, no, System.Decimal does not use the FPU because the type is emulated in software.
Also, System.Decimal is a floating-point number, not a fixed-precision number like those commonly found in databases. The type is actually a decimal floating point that uses 10 as its base, as opposed to a binary floating point (i.e. System.Single or System.Double), which uses 2. It still has the same precision problems if you attempt to store a fraction that cannot be represented exactly, for example 1/3.
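The 1/3 limitation is easy to demonstrate. The question is about .NET's System.Decimal, but the same representability issue appears in any software base-10 type; here is a sketch using Java's BigDecimal as a stand-in (not the same type, just the same idea; class name is mine):

import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalFractions {
    public static void main(String[] args) {
        // 0.1 is exact in a base-10 type, unlike in a binary double
        System.out.println(new BigDecimal("0.1"));
        // 1/3 has no finite decimal expansion, so a scale and rounding mode must be chosen;
        // BigDecimal.ONE.divide(new BigDecimal(3)) without them throws ArithmeticException
        System.out.println(BigDecimal.ONE.divide(new BigDecimal(3), 20, RoundingMode.HALF_UP));
    }
}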
Yes, modern languages in general support floating-point math and integer math, and that's it; there's no direct support for fixed point, BCD, etc. Floating-point math is done using the processor's floating-point instructions; on modern architectures the FPU is integrated into the CPU rather than being a separate unit.

Where did string escape codes (\n, \t...) originate?

Purely wondering... since they're still around and in use in C# today...
Where did the pattern of using string escape codes come from? What language did it first appear in? What languages, if any, have solved the problem in a different way?
I suspect that these escape codes originated in B, a high-level assembly programming language for the Honeywell 6000 GCOS operating system. This language was developed at Bell Labs based on a British language called BCPL. Because BCPL was rather wordy, the B developers simplified the syntax and added things like braces to replace BEGIN and END. That's where the name B came from, because it was an abbreviated form of BCPL.
Later on some people at Bell Labs created a language that was the successor to B, mainly by adding typing and a standard I/O library. Because it was B's successor, they named it after the next letter in BCPL: C.
I do not recall seeing the backslash notation before B, and since C and UNIX inherited it from B, I think that B is the origin of this notation, or more specifically, that Bell Labs was the origin. It's entirely possible that this notation was used in other Bell Labs software before B, since they were a prolific producer of software, much of which was distributed freely to universities such as the one I attended in the mid-1970s.
By the way, the idea of an escape sequence existed long before that, dating back to the 19th-century Baudot code, a fixed-length 5-bit binary code intended to replace variable-length Morse code. Baudot had shift codes (letter shift and figure shift) that changed how the following codes were interpreted, much like the Shift key on a typewriter.

Truly multi-lingual programming languages?

I realize most programming languages support multiple human languages, but every language I've seen has always been more or less US-centric. By that, I mean the keywords, standard library functions, etc. all have English names. So, as a programmer, you still really need to know at least some English to make sense of it.
Are there any truly "multi-lingual" languages out there with support for language keywords and such in multiple languages?
This is generally a horrible idea, as anyone who's worked in a localized IDE can attest to. Programmers rely heavily on having one common vocabulary. When the compiler gives me the error "missing type specifier - int assumed", I can share this exact error message with others, for example here on SO, and it will be familiar to those others so they can tell me what it means. If the compiler instead generated error messages in Danish, I'd be limited to getting help from the relatively few programmers who speak Danish.
Suddenly my vocabulary is no longer the same as someone in the same position in Germany, France or Japan. We can no longer exchange code, bugs, bug fixes or ideas.
A developer in Spain wouldn't be able to use my code because it was literally written in another language. And if I had trouble with my code, others would be helpless to debug it, because it wouldn't even compile under their localization settings (and if it did, it'd still be unreadable to them).
Ultimately, a programming language is a language. It may have borrowed some words from English, but it is not English, and you do not need to understand English to program in it, any more than I need to understand Latin in order to speak English (English borrows Latin words as well).
You might as well ask for a multi-lingual English. What would be the point? Yes, it would in theory allow people who didn't speak English to... speak English. It just wouldn't be the same English as every other English-speaker speaks, so it wouldn't actually enable communication between them.
The keyword if in a programming language is not the same as if in the English language. They mean different things, even though one was obviously inspired by the other.
The delegate keyword in C# does not mean the same thing as "delegate" in English. Nor does while, return or "constructor". They are not English words; they are keywords or concepts in C++, Java, C#, Python or any other programming language.
Sounds like a bad idea to me. If I'm writing a program, how am I to know that the variable name I'm typing is actually a keyword in Bulgarian or Korean as transliterated? Do I have to deal with thousands of keywords, or do I have problems combining two routines written by my Swedish and Egyptian colleagues?
Just realize that programming keywords are in English, just like music keywords are in Italian.
This seems like a good place to start: Non-English-based programming languages.
There are a few interesting ones on there, like Python translated to Chinese.
You can make use of the C/C++ preprocessor to redefine all the keywords - and some people have done this. I came across it when working as a trainer/mentor for a Norwegian company. Some bright spark had implemented a header that translated all the C keywords into Norwegian and enforced its use. The Norwegian staff, all of whom spoke excellent English (or I couldn't have earned my crust with them), all hated it and it died a death.
I've also worked fairly extensively in the Netherlands, and most of the programmers there seem to program in English. The only people I've come across who are resistant to the English hegemony in programming languages are (needless to say) the French.
There is one area where a localized language may be useful and helpful, and that is DSLs (Domain-Specific Languages) designed to be used by non-programmers. Those languages can surely benefit from being localized, since business users from non-English-speaking countries often don't know English as well as programmers do.
Such localized DSLs can prove advantageous to programmers as well if they deal with a lot of non-translatable terms. One rather successful system I've encountered was used to calculate salaries for personnel in the Israeli military. It used a Hebrew-based syntax together with hundreds of terms that can only be properly expressed in Hebrew. In that particular case the standard logic keywords if, then, else, etc. were translated to Hebrew and the entire code editor was right-to-left. A very large body of business logic is maintained in this manner to this day and, IMHO, rightly so.
It seems like it would be notoriously difficult unless it was a community effort, but for some languages I don't see why you couldn't make an existing language multi-lingual by creating custom libraries that localize the standard libraries.
For example, in Java, you can create
public class HoweverYouSayExampleObjectInYourLanguage extends ExampleObjectName {
}
and then, inside that class, create wrapper methods with names in your target language that simply delegate to the existing standard methods:
public void HoweverYouSayExampleMethodInYourLanguage(Object... parameters) {
    this.ExampleMethod(parameters);
    // plus whatever error handling you need
}
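Code written against the localized wrapper would then read entirely in the target language (same hypothetical class and method names as above):
HoweverYouSayExampleObjectInYourLanguage objekt = new HoweverYouSayExampleObjectInYourLanguage();
objekt.HoweverYouSayExampleMethodInYourLanguage("ein Argument");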
If you do error handling properly, the stack trace / errors will all reference errors in the standard libraries, unless the error was specifically in the implementation of your localization library - in which case that should be pointed out via proper error handling in the localization library itself.
The disadvantage would be, as other people have mentioned, sharing source code. If we were all on the same page with an IDE for a given language, I don't see a reason why you couldn't build a truly internationalized IDE in which the source code you see on the screen isn't the real source code per se, but a local rendering of it via some form of mapping.
I'm going to go ahead and say that everything I just said above is probably at best an okay idea, because function names aren't nearly as important as localized documentation for libraries and APIs - something which, in my experience, is done terribly or not at all for common programming languages and contexts.
You can program Perl in Latin (see the Lingua::Romana::Perligata module).
Don't try to code in the natural language; that's useless. Learn the "programming" language instead.
For instance, the word "switch" didn't mean anything to me in English, but it was an instruction to choose between several alternatives.
Later (when I learned English) I thought: hey, this is funny, English has a "switch" word too, just like C. (Doh!)
:)
No matter how good or bad your English is, you can't say to Java
import java.util.* into my CD-ROM;
because it is not valid syntax.
What about languages like APL and J? The keywords in APL are all single symbols; unfortunately, most of these are not on your keyboard, so J came along and replaced most of them with ASCII representations (made up of more than one character in many cases).
Sorted!
Sorted! is bilingual. It can understand both English and German code. To my knowledge, Sorted! is the only programming language in the world that can do this.
Any useful ones? That's a better question.
To a significant extent, you can program in Prolog in any Unicode script (because it is a symbolic language). There's a (tiny, weeny) catch: variables are signified by an initial capital Roman letter in all the Prolog compilers you are likely to come across, and you'll have to redefine the built-ins (but Prolog makes this relatively easy*).
I think an example will illustrate what I mean best:
% an algorithm for finding easter dates, given year (as first argument)
復活節(V1, V2, V3) :-
    A 是 (V1 mod 19),
    B 是 V1 // 100,
    C 是 V1 mod 100,
    D 是 B // 4,
    E 是 B mod 4,
    F 是 (B + 8) // 25,
    G 是 (B - F + 1) // 3,
    H 是 (19*A + B - D - G + 15) mod 30,
    I 是 C // 4,
    J 是 C mod 4,
    K 是 (32 + 2*E + 2*I - H - J) mod 7,
    L 是 (A + 11*H + 22*K) // 451,
    M 是 (H + K - 7*L + 114) // 31,
    N 是 (H + K - 7*L + 114) mod 31,
    V2 是 M,
    V3 是 N + 1.
/*
Example test:
?- 復活節(2013, V2, V3).
V2 = 3,
V3 = 31
i.e. Easter 2013 falls on 31st March
*/
* This is what I used to redefine the built-in 'is' operator (don't shoot me if it's imperfect):
:- op(500, xfy, 是).
是(X, Y) :- is(X, Y).
