I often see pprograms like this, where Int64 is an absolute performance killer on 32-bit platforms. My question is now:
If I need a specific word length for my task (in my case a RNG), is Int64 efficient on 64-bit platforms or will it still use the C-calls? And how efficient is converting an Int64 to an Int?
on a 64bit system Int64 should be fine, I don't know for sure though.
More importantly if you're doing Crypto or random-number-generation you MUST use the data type that the algorithm says to use, also be careful of signed-ness. If you do-not do this you will have the wrong results, which might mean that your cryptography is not secure or your random number generator is not really random (RNGs are hard, and many look random but aren't).
For any other type of work use Integer wherever you can, or even better, make your program polymorphic using the Integral type-class. Then if you think your program is slower than it should be profile it to determine where you should concentrate when you try to speed it up. if you use the Integral type-class changing from Integer to Int is easy. Haskell should be smart enough to specialize (most) code that uses polymorphism to avoid overheads.
Interesting article on 64 bit performance here:
Isn’t my code going to be faster on 64-bit???
As the article states, the big bottleneck isn't the processor, it's the cache and memory I/O.
Related
I am trying to find the fastest way to flip the value of a Boolean in rust? i.e.
false => true
true => false
For my application I do not care about the current value of the Boolean only that it is flipped. For my application (Sieve of Atkin - an improved version of the Sieve of Eratosthenes) this will need to be performed a large number of times so would be good to have it run as fast as possible. Currently my code is:
item[i] = !item[i]
Because, (as mentioned) the current value of item[i] is irrelevant I am sure there is a faster (possibly bitwise) way to do this. However, I'm a bit of a rust nubie and haven't been able to find it, can anyone advise me on a better way?
Thanks,
It doesn't matter. The compiler will outsmart you on this and pick the fastest method it knows. The exact syntax you use is not important. However this is only the case when you remember to turn on optimizations. You may think this is obvious, but it is an extremely common mistake. This is done using --release when building or running your project with cargo. If you forget this step, the compiler won't even attempt to speed up your code and timing code execution becomes meaningless.
What matters more is how you access the memory where the boolean resides. Try to keep memory you are working with on the stack if you are doing a lot of work on one value or small region at a time. Cache locality also means that it will be faster to read adjacent cache lines than jumping between places in memory. Memory is more likely to be in the cache if you accessed it recently or the CPU guesses you are about to access it.
There are also crates like bit-vec and bitvec which reduce each boolean to using a single bit. This is great for improving memory usage (8x improvement to be exact), but comes at a very small cost to performance. I would avoid the bitvec crate though. About a month ago I did some benchmarks and the performance was absolutely abysmal.
Do you need to work on a single boolean at a time? Try to work with entire words of memory if possible. Bitwise operations on a u64 will likely take the exact same amount of time, but you get 64x the productivity.
Since in Rust a boolean variable is represented as an 8 bit unsigned integer with 0 for false and 1 for true, the compiler can implement negation without a branch by computing the XOR of the value with 1.
That being said, while I'm not familiar with the Sieve of Atkin, at least for the Sieve of Eratosthenes, you really want to use bitfields rather than booleans. But the same trick can be used there to avoid a branch.
I am learning rust and in the official tutorial, the author assigned the value 5 to a variable like so:
let x: i32 = 5;
I thought this was weird as one could use u8 as the type and the program would run fine. This got me thinking, are there any advantages to using a lower bit number? Is it faster?
The main advantage is that they use less memory. A vector<i32> with 1 billion elements will use 4GB, while a vector<u8> will use 1GB. This can be a significant advantage regardless of speed.
Arithmetic on smaller integer types on modern CPUs is not faster in general. There are some issues with using only part of a register but optimizers will almost certainly resolve these performance problems for you.
When you have a lot of integers and the optimizer can make use of vectorization (for example adding your 1 billion integers in the vector) then smaller types will typically yield better performance, because more of them fit in a SIMD register.
If you use them just as one scalar stack variable like in your example, I highly doubt there will be a difference in 99% of cases. Here other considerations are more important:
A bigger type will make overflows less likely, maybe you did calculate your maximal possible value wrong.
For public interfaces bigger types are more future proof.
Its better to cast from i8 to i32 than the other way round.
I have some data which can be represented by an unsigned Integral type and its biggest value requires 52 bits. AFAIK only Integer, Int64 and Word64 satisfy these requirements.
All the information I could find out about those types was that Integer is signed and has a floating unlimited bit-size, Int64 and Word64 are fixed and signed and unsigned respectively. What I coudn't find out was the information on the actual implementation of those types:
How many bits will a 52-bit value actually occupy if stored as an Integer?
Am I correct that Int64 and Word64 allow you to store a 64-bit data and weigh exactly 64 bits for any value?
Are any of those types more performant or preferrable for any other reasons than size, e.g. native code implementations or direct processor instructions-related optimizations?
And just in case: which one would you recommend for storing a 52-bit value in an application extremely sensitive in terms of performance?
How many bits will a 52-bit value actually occupy if stored as an Integer?
This is implementation-dependent. With GHC, values that fit inside a machine word are stored directly in a constructor of Integer, so if you're on a 64-bit machine, it should take the same amount of space as an Int. This corresponds to the S# constructor of Integer:
data Integer = S# Int#
| J# Int# ByteArray#
Larger values (i.e. those represented with J#) are stored with GMP.
Am I correct that Int64 and Word64 allow you to store a 64-bit data and weigh exactly 64 bits for any value?
Not quite — they're boxed. An Int64 is actually a pointer to either an unevaluated thunk or a one-word pointer to an info table plus a 64-bit integer value. (See the GHC commentary for more information.)
If you really want something that's guaranteed to be 64 bits, no exceptions, then you can use an unboxed type like Int64#, but I would strongly recommend profiling first; unboxed values are quite painful to use. For instance, you can't use unboxed types as arguments to type constructors, so you can't have a list of Int64#s. You also have to use operations specific to unboxed integers. And, of course, all of this is extremely GHC-specific.
If you're looking to store a lot of 52-bit integers, you might want to use vector or repa (built on vector, with fancy things like automatic parallelism); they store the values unboxed under the hood, but let you work with them in boxed form. (Of course, each individual value you take out will be boxed.)
Are any of those types more performant or preferrable for any other reasons than size, e.g. native code implementations or direct processor instructions-related optimizations?
Yes; using Integer incurs a branch for every operation, since it has to distinguish the machine-word and bignum cases; and, of course, it has to handle overflow. Fixed-size integral types avoid this overhead.
And just in case: which one would you recommend for storing a 52-bit value in an application extremely sensitive in terms of performance?
If you're using a 64-bit machine: Int64 or, if you must, Int64#.
If you're using a 32-bit machine: Probably Integer, since on 32-bit Int64 is emulated with FFI calls to GHC functions that are probably not very highly optimised, but I'd try both and benchmark it. With Integer, you'll get the best performance on small integers, and GMP is heavily-optimised, so it'll probably do better on the larger ones than you might think.
You could select between Int64 and Integer at compile-time using the C preprocessor (enabled with {-# LANGUAGE CPP #-}); I think it would be easy to get Cabal to control a #define based on the word width of the target architecture. Beware, of course, that they are not the same; you will have to be careful to avoid "overflows" in the Integer code, and e.g. Int64 is an instance of Bounded but Integer is not. It might be simplest to just target a single word width (and thus type) for performance and live with the slower performance on the other.
I would suggest creating your own Int52 type as a newtype wrapper over Int64, or a Word52 wrapper over Word64 — just pick whichever matches your data better, there should be no performance impact; if it's just arbitrary bits I'd go with Int64, just because Int is more common than Word.
You can define all the instances to handle wrapping automatically (try :info Int64 in GHCi to find out which instances you'll want to define), and provide "unsafe" operations that just apply directly under the newtype for performance-critical situations where you know there won't be any overflow.
Then, if you don't export the newtype constructor, you can always swap the implementation of Int52 later, without changing any of the rest of your code. Don't worry about the overhead of a separate type — the runtime representation of a newtype is completely identical to the underlying type; they only exist at compile-time.
Lua by default uses a double precision floating point (double) type as its only numeric type. That's nice and useful. However, I'm working on software that expects to see 64bit integers, for which I don't get around using actual 64bit integers one way or another.
The place where the integer type becomes relevant is for file sizes. Although I don't truly expect to see file sizes beyond what Lua can represent with full "integer" precision using a double, I want to be prepared.
What strategies can you recommend when using a 64bit integer type in parallel with the default numeric type of Lua? I don't really want to throw the default implementation overboard (and I'm not worried of its performance compared to integer arithmetics), but I need some way of representing 64bit integers up to their full precision without too much of a performance penalty.
My problem is that I'm unsure where to modify the behavior. Should I modify the syntax and extend the parser (numbers with appended LL or ULL come to mind, which to my knowledge doesn't exist in default Lua) or should I instead write my own C module and define a userdata type that represents the 64bit integer, along with library functions able to manipulate the values? ...
Note: yes, I am embedding Lua, so I am free to extend it whichever way I please.
As part of LuaJIT's port to ARM CPUs (which often have poor floating-point), LuaJIT implemented a "Dual-number VM", which allows it to switch between integers and floats dynamically as needed. You could use this yourself, just switch between 64-bit integers and doubles instead of 32-bit integers and floats.
It's currently live in builds, so you may want to consider using LuaJIT as your Lua "interpreter." Or you could use it as a way to learn how to do this sort of thing.
However, I do agree with Marcelo; the 53-bit mantissa should be plenty. You shouldn't really need this for a good 10 years or so.
I'd suggest storing your data outside of Lua and use some type of reference to retrieve it when calling your other libraries. You can then push various results onto the Lua stack for the user the see, you can even retrieve the value as a string to be precise, but I would avoid modifying them in Lua and relying on the Lua values when calling your external library.
If you're not going to need floating-point precision at any point in the program, you can just redefine LUA_NUMBER to __int64 (or whatever 64-bit int may be in your environment) in luaconf.h.
Otherwise, you can just bring in another library to handle your integers- for infinite precision, you can use a bignum library such as lhf's lbn.
I'm currently teaching myself Haskell, and I'm wondering what the best practices are when working with strings in Haskell.
The default string implementation in Haskell is a list of Char. This is inefficient for file input-output, according to Real World Haskell, since each character is separately allocated (I assume that this means that a String is basically a linked list in Haskell, but I'm not sure.)
But if the default string implementation is inefficient for file i/o, is it also inefficient for working with Strings in memory? Why or why not? C uses an array of char to represent a String, and I assumed that this would be the default way of doing things in most languages.
As I see it, the list implementation of String will take up more memory, since each character will require overhead, and also more time to iterate over, because a pointer dereferencing will be required to get to the next char. But I've liked playing with Haskell so far, so I want to believe that the default implementation is efficient.
Apart from String/ByteString there is now the Text library which combines the best of both worlds—it works with Unicode while being ByteString-based internally, so you get fast, correct strings.
Best practices for working with strings performantly in Haskell are basically: Use Data.ByteString/Data.ByteString.Lazy.
http://hackage.haskell.org/packages/archive/bytestring/latest/doc/html/
As far as the efficiency of the default string implementation goes in Haskell, it's not. Each Char represents a Unicode codepoint which means it needs at least 21bits per Char.
Since a String is just [Char], that is a linked list of Char, it means Strings have poor locality of reference, and again means that Strings are fairly large in memory, at a minimum it's N * (21bits + Mbits) where N is the length of the string and M is the size of a pointer (32, 64, what have you) and unlike many other places where Haskell uses lists where other languages might use different structures (I'm thinking specifically of control flow here), Strings are much less likely to be able to be optimized to loops, etc. by the compiler.
And while a Char corresponds to a codepoint, the Haskell 98 report doesn't specify anything about the encoding used when doing file IO, not even a default much less a way to change it. In practice GHC provides an extensions to do e.g. binary IO, but you're going off the reservation at that point anyway.
Even with operations like prepending to front of the string it's unlikely that a String will beat a ByteString in practice.
The answer is a bit more complex than just "use lazy bytestrings".
Byte strings only store 8 bits per value, whereas String holds real Unicode characters. So if you want to work with Unicode then you have to convert to and from UTF-8 or UTF-16 all the time, which is more expensive than just using strings. Don't make the mistake of assuming that your program will only need ASCII. Unless its just throwaway code then one day someone will need to put in a Euro symbol (U+20AC) or accented characters, and your nice fast bytestring implementation will be irretrievably broken.
Byte strings make some things, like prepending to the start of a string, more expensive.
That said, if you need performance and you can represent your data purely in bytestrings, then do so.
The basic answer given, use ByteString, is correct. That said, all of the three answers before mine have inaccuracies.
Regarding UTF-8: whether this will be an issue or not depends entirely on what sort of processing you do with your strings. If you're simply treating them as single chunks of data (which includes operations such as concatenation, though not splitting), or doing certain limited byte-based operations (e.g., finding the length of the string in bytes, rather than the length in characters), you won't have any issues. If you are using I18N, there are enough other issues that simply using String rather than ByteString will start to fix only a very few of the problems you'll encounter.
Prepending single bytes to the front of a ByteString is probably more expensive than doing the same for a String. However, if you're doing a lot of this, it's probably possible to find ways of dealing with your particular problem that are cheaper.
But the end result would be, for the poster of the original question: yes, Strings are inefficient in Haskell, though rather handy. If you're worried about efficiency, use ByteStrings, and view them as either arrays of Char8 or Word8, depending on your purpose (ASCII/ISO-8859-1 vs Unicode of some sort, or just arbitrary binary data). Generally, use Lazy ByteStrings (where prepending to the start of a string is actually a very fast operation) unless you know why you want non-lazy ones (which is usually wrapped up in an appreciation of the performance aspects of lazy evaluation).
For what it's worth, I am building an automated trading system entirely in Haskell, and one of the things we need to do is very quickly parse a market data feed we receive over a network connection. I can handle reading and parsing 300 messages per second with a negligable amount of CPU; as far as handling this data goes, GHC-compiled Haskell performs close enough to C that it's nowhere near entering my list of notable issues.