I would like to create a list of all the real numbers between 1.0 and 2.0 in two-decimal-place increments. This
dList = [1.00,1.01..2.00]
however, creates float run-on problems
dList = [1.0,1.01,1.02,1.03,1.04,1.05,1.06,1.07,1.08,1.09,1.1,1.11,1.12,
1.1300000000000001,1.1400000000000001,1.1500000000000001, ...
To remedy this I found what I thought was the right function in Data.Decimal, namely roundTo. I was hoping to eventually run this
map (roundTo 2) [1.1,1.2..2.0]
and get rid of the float run-on, but it produces a giant error message. This page is, for this beginner, undecipherable. And so I'm trying to do this with an .hs file loaded at a ghci REPL. This is the code
import Data.Decimal
dList :: [Decimal]
dList = [1.00,1.01..2.0]
main = print dList
and it produces
Could not find module ‘Data.Decimal’
Lost, I am....
Note
This answer is an FYI for all of you beginners simply trying to follow along in a beginner Haskell book that has you typing code into a text editor, firing up the ghci REPL and doing :load my-haskell-code.hs.
YMMV Solution
As can be gleaned above, Data.Decimal is not a standard included Prelude sort of package. It must be loaded independently -- and no, it doesn't work to simply put import Data.Decimal at the top of your code. As leftaroundabout said in a comment above, the very simplest way for Haskell beginners not yet doing projects is to start ghci thusly
stack ghci --package Decimal
Of course YMMV depending on how you installed Haskell. I installed Haskell through the Stack project-management tool, hence the stack before ghci --package Decimal. Another unique thing about my setup is that I'm using Emacs org-mode's Babel code blocks, which is largely the same as the basic type-and-load way, i.e., non-project. I did try to alter Emacs's haskell-process-args-stack-ghci (which is in haskell-customize.el) by adding --package Decimal, but it didn't work. Instead, I simply went to my bash command line and ran stack ghci --package Decimal, then restarted a separate org-mode Babel ghci and it worked. Now,
dList :: [Decimal]
dList = [1.00,1.01..2.00]
> dList
[1,1.01,1.02,1.03,1.04,1.05,1.06,1.07,1.08,1.09,1.10,1.11,1.12,1.13,1.14,1.15,1.16,1.17,1.18,1.19,1.20,1.21,1.22,1.23,1.24,1.25,1.26,1.27,1.28,1.29,1.30,1.31,1.32,1.33,1.34,1.35,1.36,1.37,1.38,1.39,1.40,1.41,1.42,1.43,1.44,1.45,1.46,1.47,1.48,1.49,1.50,1.51,1.52,1.53,1.54,1.55,1.56,1.57,1.58,1.59,1.60,1.61,1.62,1.63,1.64,1.65,1.66,1.67,1.68,1.69,1.70,1.71,1.72,1.73,1.74,1.75,1.76,1.77,1.78,1.79,1.80,1.81,1.82,1.83,1.84,1.85,1.86,1.87,1.88,1.89,1.90,1.91,1.92,1.93,1.94,1.95,1.96,1.97,1.98,1.99,2.00]
no muss, no fuss. I killed the ghci and loaded it without the --package Decimal and it still knew about Decimal, so this change is recorded quasi-permanently in my ~/.stack directory somewhere? Oddly, the bash ghci session doesn't know about the *haskell* ghci session in org-mode. Also, when just using the Emacs haskell mode stand-alone for type-and-load, its ghci session also doesn't play well with the org-mode ghci. I've gone with the minimalist Emacs org-mode Babel because it seems better than just .lhs literate Haskell. If anyone knows how to make literate Haskell sing like org-mode, I'd like to know.
Postmortem
I guess I insisted on figuring out Decimal because in researching the whole rounding issue I started seeing a wild divergence of suggested solutions (not necessarily here, but other places) and scary technical arguments. Decimal seemed the simplest in a wild storm of competing rounding strategies. Rounding should be simple, but in Haskell it turned into a time-sucking trip down multiple rabbit holes.
Answer to your specific float problem
By far the simplest option in this case is
[n/100 | n<-[0..200]]
or variations on the same idea:
map (/100) [0..200]
(*1e-2) . fromIntegral <$> [0 .. 200 :: Int] -- (*) is more efficient than (/),
-- but will actually introduce rounding
-- errors (non-accumulating) again
No need for any special decimal library or rational numbers.
The reason this works better than the [x₀, x₁ .. xₑ] approach is that integers below 2⁵² can be expressed exactly in floating point (whereas decimal fractions cannot). Therefore, the range [0..200] is exactly what you want it to be. Then at the end, dividing each of these numbers by 100 will still not give you exact representations of the hundredths you want to get – because such representations don't exist – but you will for each element get the closest possible approximation. And that closest possible approximation is in fact printed in x.yz form, even with the standard print function. By comparison, in [1.00,1.01..2.0] you keep adding up already-approximate values, thereby compounding the errors.
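For instance, comparing the element at index 13 in each version (a ghci session, with the usual defaulting to Double):
> [1.00,1.01..2.0] !! 13
1.1300000000000001
> map (/100) [100..200] !! 13
1.13
The second result is still only the closest double to 1.13, but because it wasn't built up by repeated addition, print shows it in the form you expect.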
You could also use your original range calculation in exact rationals, and only then convert them to float★ – this still doesn't require a decimal library
map fromRational [0, 0.01 .. 2]
Rational arithmetic is often an easy fix for similar problems – but I tend to advise against this, because it usually scales badly. Algorithms that have rounding problems in floating-point will more often than not run into memory problems in rational arithmetic, because you need to carry an ever-growing range of precision through the entire calculation. The better solution is to avoid needing the precision in the first place, like with the n/100 suggestion.
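To see that growth concretely, here is a small made-up example (not from the original problem): iterating the logistic map in Rational roughly squares the denominator at every step, so the representation size explodes even though each value stays between 0 and 1.
import Data.Ratio (denominator, (%))

logistic :: Rational -> Rational
logistic x = 4 * x * (1 - x)

-- prints the number of digits in each denominator: 1, 1, 2, 4, 8, 16, ...
main :: IO ()
main = mapM_ (print . length . show . denominator)
             (take 8 (iterate logistic (1 % 3)))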
Also, arguably the [x₀, x₁ .. xₑ] syntax is a broken design anyway; integer ranges have far clearer semantics.
Note also that the float errors in your original attempt aren't necessarily a problem at all. An error of 10⁻⁹ in a real-world measured quantity is for all meaningful purposes negligible. If you need something to be truly exact, you probably shouldn't be using fractional values at all, but straight integers. So, consider Carl's suggestion of just living with the float deviations, but simply printing them in a suitably rounded form, by way of showFFloat, printf, or Text.Show.Pragmatic.print.
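For example, a minimal sketch of the showFFloat route (rounding only at print time, leaving the Doubles themselves untouched):
import Numeric (showFFloat)

-- show each element with exactly two digits after the decimal point
main :: IO ()
main = putStrLn (unwords [showFFloat (Just 2) x "" | x <- [1.00,1.01..2.0]])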
★Actually, in this specific case both solutions are almost equivalent, because converting Rational to float involves floating-point dividing the numerator by the denominator.
Answer to your module-loading problem
In cases where you do need the Decimal library (or some other library), you need to depend on it.
The easiest way to do this is to use Stack and add Decimal to your global project. Then, you can load the file with stack ghci and it'll know where to look for Data.Decimal.
Alternatively, and IMO preferably, you should create a project package yourself and only make that depend on Decimal. This can be done with either Stack or cabal-install.
$ mkdir my-decimal-project
$ cd my-decimal-project
$ cabal init
Now you're asked some questions about the name etc. of your project; you can mostly answer with the defaults. Say that your project defines a library (you can add an executable later on if you want).
cabal init creates, amongst other things, a my-decimal-project.cabal file. In that file, add Decimal to the dependencies (the build-depends field), and your own source file to exposed-modules.
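The relevant part of the .cabal file might then look roughly like this (MyDecimalList is just a placeholder for whatever your module is called):
library
  exposed-modules:    MyDecimalList
  build-depends:      base >=4 && <5
                    , Decimal
  default-language:   Haskell2010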
Then you need to (still in your project directory) cabal install --dependencies-only to fetch the Decimal library, and then cabal repl to load your module.
I am implementing an algorithm using Data.Ratio (convergents of continued fractions).
However, I encounter two obstacles:
The algorithm starts with the fraction 1%0 - but this throws a zero denominator exception.
I would like to pattern match the constructor a :% b
I was exploring on Hackage, and in particular the source seems to be using exactly these features (e.g. defining infinity = 1 :% 0, or pattern matching for numerator).
As a beginner, I am also confused about where it is determined that (%), numerator and such are exposed to me, but not infinity and (:%).
I have already made a dirty workaround using a tuple of integers, but it seems silly to reinvent the wheel for something so trivial.
It would also be nice to learn how to read from the source which functions are exposed.
They aren't exported precisely to prevent people from doing stuff like this. See, the type
data Ratio a = a :% a
contains too many values. In particular, e.g. 2/6 and 3/9 are actually the same number in ℚ and both represented by 1:%3. Thus, 2:%6 is in fact an illegal value, and so is, sure enough, 1:%0. Or such values might be legal, but with all functions treating them so that 2:%6 is for all observable purposes equal to 1:%3 – I don't in fact know which of these options GHC chooses, but at any rate it's an implementation detail and could change in future releases without notice.
If the library authors themselves use such values for e.g. optimisation tricks that's one thing – they have after all full control over any algorithmic details and any undefined behaviour that could arise. But if users got to construct such values, it would result in brittle code.
So – if you find yourself starting an algorithm with 1/0, then you should indeed not use Ratio at all there but simply store numerator and denominator in a plain tuple, which has no such issues, and only make the final result a Ratio with %.
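For instance, a minimal sketch of that approach (my own names, not from the question): carry the (numerator, denominator) pairs as plain Integer tuples, seeded with (1, 0), and only wrap the results in Ratio with % at the very end.
import Data.Ratio ((%))

-- convergents of the continued fraction [a0; a1, a2, ...]
convergents :: [Integer] -> [Rational]
convergents as = [h % k | (h, k) <- drop 2 pairs]
  where
    -- seeds (0,1) and (1,0); the latter is the "1/0" that Ratio rejects
    pairs = (0, 1) : (1, 0) : zipWith3 step as (tail pairs) pairs
    step a (h1, k1) (h2, k2) = (a * h1 + h2, a * k1 + k2)

For example, convergents [1,1,1,1,1] gives [1 % 1,2 % 1,3 % 2,5 % 3,8 % 5].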
I'm working on a TXT to SPC converter, and certain values have to be stored as hex of double, but Python only works with float, and struct.unpack('<d', struct.pack('<f', value)) / any other unpack and pack matryoshka doll I can conceive doesn't work because of the difference in byte size.
The SPC library unpacks said values from SPC as <d and converts them to float through float()
What do I do?
I think you may be getting confused by different programming languages' naming strategies.
There's a class of data types known as "floating point numbers". Two floating-point number types defined by IEEE-754 are "binary32" and "binary64". In C and C++, those two types are exposed as the types float and double, respectively. In Python, only "binary64" is natively supported as a built-in type; it's known as float.
Python's struct module supports both binary32 and binary64, and uses C/C++'s nomenclature to refer to them. f specifies binary32 and d specifies binary64. Regardless of which you're using, the module packs from and unpacks to Python's native float type (which, remember, is binary64). In the case of d that's exact; in the case of f it converts the type under the hood. You don't need to fool Python into doing the conversion.
Now, I'm just going to assume you're wrong about "stored as hex of double". What I think you probably mean is "stored as double" -- namely, 64 bits in a file -- as opposed to stored as "hex of double", namely sixteen human-readable ASCII characters. That latter one just doesn't happen.
All of which is to say, if you want to store things as binary64, it's just a matter of struct.pack('d', value).
It just seemed to me, studying GEP and especially analyzing Karva expressions, that Non Terminals are most suitable for functions whose type is a -> a for some type a, in Haskell notation.
Like, with the classic examples, Q, +, -, *, / are all functions from 'some' Doubles to 'a' Double; they just differ in arity.
Now, how can a coder use functions of heterogeneous signatures in one Karva-expressed gene?
Brief Introduction to GEP/Karva
Gene Expression Programming uses dense representations of a population of expressions and applies evolutionary pressure to make better ones to solve a given problem.
Karva notation represents an expression tree as a string, represented in a non-traditional traversal of level-at-a-time, left-to-right - read more here. Using Karva notation, it is simple and quick to combine (or mutate) expressions to create the next generation.
You can parse Karva notation in Haskell as per this answer (which comes with an explanation of the linear running time) or this answer, which is the same code but with more diagrams and no proof.
Terminals are the constants or variables in a Karva expression, so /+-a*3cb2 (meaning (a+(b*2))/(3-c)) has terminals [a,b,2,3,c]. A Karva expression with no terminals is thus a function of some arity.
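For concreteness, here is a naive Haskell sketch of that breadth-first reading (my own code, not the linear-time parser from the linked answers; the arity table only knows the four binary operators used in the example):
data Expr = Leaf Char | Node Char [Expr]
  deriving Show

-- non-terminals are the binary operators; everything else is a terminal
arity :: Char -> Int
arity c
  | c `elem` "+-*/" = 2
  | otherwise       = 0

karva :: String -> Expr
karva []          = error "empty Karva expression"
karva (root:rest) = head (level [root] rest)
  where
    -- build the trees for one level of symbols from the remaining string
    level syms remaining = zipWith attach syms childTrees
      where
        needed             = sum (map arity syms)
        (children, deeper) = splitAt needed remaining
        built              = if needed == 0 then [] else level children deeper
        childTrees         = chop (map arity syms) built
        attach c ts        = if null ts then Leaf c else Node c ts
    chop []     _  = []
    chop (n:ns) xs = let (a, b) = splitAt n xs in a : chop ns b

Here karva "/+-a*3cb2" rebuilds the tree for (a+(b*2))/(3-c).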
My Question is then more related to how one would use different types of functions without breaking the gene.
What if one wants to use a Non Terminal like a > function? One can count on the fact that, for example, it can compare Doubles. But the result, in a strongly typed Language, would be a Bool. Now, assuming that the Non terminal encoding for > is interspersed in the gene, the parse of the k-expression would result in invalid code, because anything calling it would expect a Double.
One can then think of manually and silently sneaking in a cast, as is done by Ms. Ferreira in her book, where she converts Bools into Ints, with 0 and 1 for False and True.
So it seems to me that k-expressed genes are for Non Terminals of any arity that share the property of taking values of one type a and returning a value of the same type a.
In the end, has anyone any idea about how to overcome this?
I already know that one can use homeotic genes, providing some glue between different Sub Expression Trees, but that, IMHO, is somewhat rigid, because, again, you need to know the returned types in advance.
In various articles I have read, there are sometimes references to primitive data types and sometimes there are references to scalars.
My understanding of each is that they are data types of something simple like an int, boolean, char, etc.
Is there something I am missing that means you should use particular terminology or are the terms simply interchangeable?
The Wikipedia pages for each one don't show anything obvious.
If the terms are simply interchangeable, which is the preferred one?
I don't think they're interchangeable. They are frequently similar, but differences do exist, and they seem mainly to lie in what each term is contrasted with and what is relevant in context.
Scalars are typically contrasted with compounds, such as arrays, maps, sets, structs, etc. A scalar is a "single" value - integer, boolean, perhaps a string - while a compound is made up of multiple scalars (and possibly references to other compounds). "Scalar" is used in contexts where the relevant distinction is between single/simple/atomic values and compound values.
Primitive types, however, are contrasted with e.g. reference types, and are used when the relevant distinction is "Is this directly a value, or is it a reference to something that contains the real value?", as in Java's primitive types vs. references. I see this as a somewhat lower-level distinction than scalar/compound, but it's not quite the same axis.
It really depends on context (and frequently what language family is being discussed). To take one, possibly pathological, example: strings. In C, a string is a compound (an array of characters), while in Perl, a string is a scalar. In Java, a string is an object (or reference type). In Python, everything is (conceptually) an object/reference type, including strings (and numbers).
There's a lot of confusion and misuse of these terms. Often one is used to mean another. Here is what those terms actually mean.
"Native" refers to types that are built into to the language, as opposed to being provided by a library (even a standard library), regardless of how they're implemented. Perl strings are part of the Perl language, so they are native in Perl. C provides string semantics over pointers to chars using a library, so pointer to char is native, but strings are not.
"Atomic" refers to a type that can no longer be decomposed. It is the opposite of "composite". Composites can be decomposed into a combination of atomic values or other composites. Native integers and floating point numbers are atomic. Fractions, complex numbers, containers/collections, and strings are composite.
"Scalar" -- and this is the one that confuses most people -- refers to values that can express scale (hence the name), such as size, volume, counts, etc. Integers, floating point numbers, and fractions are scalars. Complex numbers, booleans, and strings are NOT scalars. Something that is atomic is not necessarily scalar and something that is scalar is not necessarily atomic. Scalars can be native or provided by libraries.
Some types have odd classifications. BigNumber types, usually implemented as an array of digits or integers, are scalars, but they're technically not atomic. They can appear to be atomic if the implementation is hidden and you can't access the internal components. But the components are only hidden, so the atomicity is an illusion. They're almost invariably provided in libraries, so they're not native, but they could be. In the Mathematica programming language, for example, big numbers are native and, since there's no way for a Mathematica program to decompose them into their building blocks, they're also atomic in that context, despite the fact that they're composites under the covers (where you're no longer in the world of the Mathematica language).
These definitions are independent of the language being used.
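To make the scalar-vs-atomic distinction concrete (borrowing Haskell's Rational, since fractions were mentioned above; this is my illustration, not part of the original answer): a fraction orders and scales like a single number, yet it decomposes into two integer components.
import Data.Ratio (numerator, denominator, (%))

-- scalar: it can be compared and ordered like any other magnitude
half :: Rational
half = 1 % 2

-- but not atomic: it decomposes into its two components, here (1, 2)
components :: Rational -> (Integer, Integer)
components r = (numerator r, denominator r)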
Put simply, it would appear that a 'scalar' type refers to a single item, as opposed to a composite or collection. So scalars include both primitive values as well as things like an enum value.
http://ee.hawaii.edu/~tep/EE160/Book/chap5/section2.1.3.html
Perhaps the 'scalar' term may be a throwback to C:
where scalars are primitive objects which contain a single value and are not composed of other C++ objects
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1995/N0774.pdf
I'm curious whether this refers to these items having a value of 'scale' - such as counting numbers.
I like Scott Langeberg's answer because it is concise and backed by authoritative links. I would up-vote Scott's answer if I could.
I suppose that "primitive" data type could be considered primary data type so that secondary data types are derived from primary data types. The derivation is through combining, such as a C++ struct. A struct can be used to combine data types (such as and int and a char) to get a secondary data type. The struct-defined data type is always a secondary data type. Primary data types are not derived from anything, rather they are a given in the programming language.
I have a parallel to primitive being the nomenclature meaning primary. That parallel is "regular expression". I think the nomenclature "regular" can be understood as "regulating". Thus you have an expression that regulates the search.
Scalar etymology (http://www.etymonline.com/index.php?allowed_in_frame=0&search=scalar&searchmode=none) means ladder-like. I think the way this relates to programming is that a ladder has only one dimension: How many rungs from the end of the ladder. A scalar data type has only one dimension, thus represented by a single value.
I think in usage, primitive and scalar are interchangeable. Is there any example of a primitive that is not scalar, or of a scalar that is not primitive?
Although interchangeable, primitive refers to the data-type being a basic building block of other data types, and a primitive is not composed of other data types.
Scalar refers to its having a single value. Scalar contrasts with the mathematical vector. A vector is not represented by a single value because (using one kind of vector as an example) one value is needed to represent the vector's direction and another value needed to represent the vector's magnitude.
Reference links:
http://whatis.techtarget.com/definition/primitive
http://en.wikipedia.org/wiki/Primitive_data_type
Being scalar has nothing to do with the language, whereas being primitive is all dependent on the language. The two have nothing to do with each other.
A scalar data type is something that has a finite set of possible values, following some scale, i.e. each value can be compared to any other value as either equal, greater or less. Numeric values (floating point and integer) are the obvious examples, while discrete/enumerated values can also be considered scalar. In this regard, boolean is a scalar with 2 discrete possible values, and normally it makes sense that true > false. Strings, regardless of programming language, are technically not scalars.
Now what is primitive depends on the language. Every language classifies what its "basic types" are, and these are designated as its primitives. In JavaScript, string is primitive, despite it not being a scalar in the general sense. But in some languages a string is not primitive. To be a primitive type, the language must be able to treat it as immutable, and for this reason referential types such as objects, arrays, collections, cannot be primitive in most, if not all, languages.
In C, enumeration types, characters, and the various representations of integers form a more general type class called scalar types. Hence, the operations you can perform on values of any scalar type are the same as those for integers.
The null type is the only thing that most realistically conforms to the definition of a "scalar type". Even the serialization of 'None' as 'N.', fitting into a 16-bit word (which is traditionally scalar) -- or even a single bit, which has multiple possible values -- isn't a "single datum".
Every primitive is scalar, but not vice versa. DateTime is scalar, but not primitive.