How to divide int64 in Nim? - nim-lang

How can I divide int64?
let v: int64 = 100
echo v / 10
Error Error: type mismatch: got <int64, int literal(10)>
Full example
import math
proc sec_to_min*(sec: int64): int =
let min = sec / 60 # <= error
min.round.to_int
echo 100.sec_to_min
P.S.
And, is there a way to safely cast int64 to int, so the result would be int and not int64, with the check for overflow.

There has been already a bit of discussion over int64 division in this issue and probably some improvement to current state can be made. From the above issue:
a good reason for not having in stdlib float division between int64 is that it might it may incur in loss of precision and so the user should explicitly convertint64 to float
still, float division between int types is present in stdlib
on 64 bit system int is int64 (and so you have division between int64 in 64 bit systems)
For your use case I think the following (playground) should work (better to use div instead of doing float division and then rounding off):
import math
proc sec_to_min*(sec: int64): int = sec.int div 60
echo 100.sec_to_min
let a = high(int64)
echo a.int # on playground this does not raise error since int is int64
echo a.int32 # this instead correctly raises error
output:
1
9223372036854775807
/usercode/in.nim(9) in
/playground/nim/lib/system/fatal.nim(49) sysFatal
Error: unhandled exception: value out of range: 9223372036854775807 notin -2147483648 .. 2147483647 [RangeError]
P.S.: as you see above standard conversion has range checks

Apparently division between int64 types is terribly dangerous because it invokes an undying horde of bike shedding, but at least you can create your own operator:
proc `/`(x, y: int64): int64 = x div y
let v: int64 = 100
echo v / 10
Or
proc `/`(x, y: int64): int64 = x div y
import math
proc sec_to_min*(sec: int64): int =
int(sec / 60)
echo 100.sec_to_min
With regards to the int64 to int conversion, I'm not sure that makes much sense since most platforms will run int as an alias of int64. But of course you could be compiling/running on a 32 bit platform, where the loss would be tragic, so you can still do runtime checks:
let a = int64.high
echo "Unsurprising but potentially wrong ", int(a)
proc safe_int(big_int: int64): int =
if big_int > int32.high:
raise new_exception(Overflow_error, "Value is too high for 32 bit platforms")
int(big_int)
echo "Reachable code ", safe_int(int32.high)
echo "Unreachable code ", safe_int(a)
Also, if you are running into confusing minute, hour, day conversions, you might want to look into distinct types to avoid adding months to seconds (or do so in a more safe way).

Related

unsigned integer devision rounding AVR GCC

I am struggling to understand how to divide an unsigned integer by a factor of 10 accounting for rounding like a float would round.
uint16_t val = 331 / 10; // two decimal to one decimal places 3.31 to 3.3
uint16_t val = 329 / 10; // two decimal to one decimal places 3.29 to 3.3 (not 2.9)
I would like both of these sums to round to 33 (3.3 in decimal equivalent of 1 decimal place)
Im sure the answer is simple if i were more knowledgable than i am on how processors perform integer division.
Since integer division rounds the result downward to zero, just add half of the divisor, whatever it is. I.e.:
uint16_t val = ((uint32_t)x + 5) / 10; // convert to 32-bit to avoid overflow
or in more general form:
static inline uint16_t divideandround(uint16_t quotient, uint16_t divisor) {
return ((uint32_t)quotient + (divisor >> 1)) / divisor;
}
If you are sure there will no 16-bit overflow (i.e. values will always be not more than 65530) you can speed up the calculation by keeping values 16 bit:
uint16_t val = (x + 5) / 10;
I think I have worked it out, This seems to give me the right answer, please let me know if I am actually wrong and it fails.
uint16_t val = 329;
if (val%10>=5)
{
val = (val+5)/10;
}
else
{
val = val/10;
}
You can do it with just one 16-bit divmod operation:
#include <stdint.h>
uint16_t udiv10_round (uint16_t n)
{
uint16_t q = n / 10;
uint16_t r = n % 10;
return r >= 5 ? q + 1 : q;
}
When you are optimizing for size (-Os), avr-gcc will compute both quotient and remainder by means of one library call to __udivmodhi4.
When you are optimizing for speed (-O2), avr-gcc might avoid1 __udivmodhi4 altogether and instead perform a 16×16=32 multiplication with a factor of 0xcccd, so that the quotient and remainder are easy to compute from the high part of that product.
1This happens if the MCU you are compiling for supports MUL. If MUL is nor supported, avr-gcc still uses divmod operation as a 16×16=32 multiplication would not gain anything.

Getting Overflow in constant value computation

The following code:
private const uint FIRMWARE_DOWNLOAD_ADDRESS = 0x00001800;
public void someFunc(){
byte[] command = new byte[16];
command[11] = (byte)(FIRMWARE_DOWNLOAD_ADDRESS >> 24);
command[10] = (byte)(FIRMWARE_DOWNLOAD_ADDRESS >> 16);
command[9] = (byte)(FIRMWARE_DOWNLOAD_ADDRESS >> 8);
command[8] = (byte)(FIRMWARE_DOWNLOAD_ADDRESS); //error: Overflow in constant value computation
}
throws an error Overflow in constant value computation.
Why? From what I understand 0x00001800 <= 0xffffffff so there should be no overflow happening.
And why don't the other 3 lines throw an error? I tried to do:
command[8] = (byte)(FIRMWARE_DOWNLOAD_ADDRESS >>0);
thinking that the right shift operator was somehow checking for the overflow condition but this still gives the same error.
You get the error because the value you are trying to cast to a byte cannot be represented by a byte.
A byte's max value is 0x000000FF (or 255). But you are trying to cast 0x00001800 (or 6144). A byte simply cannot contain that value.
The remaining works fine since, after the bit shift, the value is small enough to be represented by a byte
FIRMWARE_DOWNLOAD_ADDRESS >> 24 = 0
FIRMWARE_DOWNLOAD_ADDRESS >> 16 = 0
FIRMWARE_DOWNLOAD_ADDRESS >> 8 = 24
It seems like you are thinking about an unsigned integer's max value, which is 0xFFFFFFFF (or 4294967295)

Can I speed up this Haskell algorithm?

I've got this haskell file, compiled with ghc -O2 (ghc 7.4.1), and takes 1.65 sec on my machine
import Data.Bits
main = do
print $ length $ filter (\i -> i .&. (shift 1 (i `mod` 4)) /= 0) [0..123456789]
The same algorithm in C, compiled with gcc -O2 (gcc 4.6.3), runs in 0.18 sec.
#include <stdio.h>
void main() {
int count = 0;
const int max = 123456789;
int i;
for (i = 0; i < max; ++i)
if ((i & (1 << i % 4)) != 0)
++count;
printf("count: %d\n", count);
}
Update
I thought it might be the Data.Bits stuff going slow, but surprisingly if I remove the shifting and just do a straight mod, it actually runs slower at 5.6 seconds!?!
import Data.Bits
main = do
print $ length $ filter (\i -> (i `mod` 4) /= 0) [0..123456789]
whereas the equivalent C runs slightly faster at 0.16 sec:
#include <stdio.h>
void main() {
int count = 0;
const int max = 123456789;
int i;
for (i = 0; i < max; ++i)
if ((i % 4) != 0)
++count;
printf("count: %d\n", count);
}
The two pieces of code do very different things.
import Data.Bits
main = do
print $ length $ filter (\i -> i .&. (shift 1 (i `mod` 4)) /= 0) [0..123456789]
creates a list of 123456790 Integer (lazily), takes the remainder modulo 4 of each (involving first a check whether the Integer is small enough to wrap a raw machine integer, then after the division a sign-check, since mod returns non-negative results only - though in ghc-7.6.1, there is a primop for that, so it's not as much of a brake to use mod as it was before), shifts the Integer 1 left the appropriate number of bits, which involves a conversion to "big" Integers and a call to GMP, takes the bitwise and with i - yet another call to GMP - and checks whether the result is 0, which causes another call to GMP or a conversion to small integer, not sure what GHC does here. Then, if the result is nonzero, a new list cell is created where that Integer is put in, and consumed by length. That's a lot of work done, most of which unnecessarily complicated due to the defaulting of unspecified number types to Integer.
The C code
#include <stdio.h>
int main(void) {
int count = 0;
const int max = 123456789;
int i;
for (i = 0; i < max; ++i)
if ((i & (1 << i % 4)) != 0)
++count;
printf("count: %d\n", count);
return 0;
}
(I took the liberty of fixing the return type of main), does much much less. It takes an int, compares it to another, if smaller, takes the bitwise and of the first int with 3(1), shifts the int 1 to the left the appropriate number of bits, takes the bitwise and of that and the first int, and if nonzero increments another int, then increments the first. Those are all machine ops, working on raw machine types.
If we translate that code to Haskell,
module Main (main) where
import Data.Bits
maxNum :: Int
maxNum = 123456789
loop :: Int -> Int -> Int
loop acc i
| i < maxNum = loop (if i .&. (1 `shiftL` (i .&. 3)) /= 0 then acc + 1 else acc) (i+1)
| otherwise = acc
main :: IO ()
main = print $ loop 0 0
we get a much closer result:
C, gcc -O3:
count: 30864196
real 0m0.180s
user 0m0.178s
sys 0m0.001s
Haskell, ghc -O2:
30864196
real 0m0.247s
user 0m0.243s
sys 0m0.003s
Haskell, ghc -O2 -fllvm:
30864196
real 0m0.144s
user 0m0.140s
sys 0m0.003s
GHC's native code generator isn't a particularly good loop optimiser, so using the llvm backend makes a big difference here, but even the native code generator doesn't do too badly.
Okay, I have done the optimisation of replacing a modulus calculation with a power-of-two modulus with a bitwise and by hand, GHC's native code generator doesn't do that (yet), so with ```rem4`` instead of.&. 3`, the native code generator produces code that takes (here) 1.42 seconds to run, but the llvm backend does that optimisation, and produces the same code as with the hand-made optimisation.
Now, let us turn to gspr's question
While LLVM didn't have a massive effect on the original code, it really did on the modified (I'd love to learn why...).
Well, the original code used Integers and lists, llvm doesn't know too well what to do with these, it can't transform that code into loops. The modified code uses Ints and the vector package rewrites the code to loops, so llvm does know how to optimise that well, and that shows.
(1) Assuming a normal binary computer. That optimisation is done by ordinary C compilers even without any optimisation flag, except on the very rare platforms where a div instruction is faster than a shift.
Few things beat a hand-written loop with a strict accumulator:
{-# LANGUAGE BangPatterns #-}
import Data.Bits
f :: Int -> Int
f n = g 0 0
where g !i !s | i <= n = g (i+1) (if i .&. (unsafeShiftL 1 (i `rem` 4)) /= 0 then s+1 else s)
| otherwise = s
main = print $ f 123456789
In addition to the tricks mentioned so far, this also replaces shift with unsafeShiftL, which doesn't check its argument.
Compiled with -O2 and -fllvm, this is about 13x faster than the original on my machine.
Note: Testing if bit i of x is set can be written more clearly as x `testBit` i. This produces the same assembly as the above.
Vector instead of list, fold instead of filter-and-length
Substituting the list for an unboxed vector and the filter-and-length for a fold (i.e. incrementing a counter) improves the time significantly for me. Here's what I used:
import qualified Data.Vector.Unboxed as UV
import Data.Bits
foo :: Int
foo = UV.foldl (\s i -> if i .&. (shift 1 (i `rem` 4)) /= 0 then s+1 else s) 0 (UV.enumFromN 0 123456789)
main = print foo
The original code (with two changes though: rem instead of mod as suggested in the comments, and adding an Int to the signature to avoid Integer) gave:
$ time ./orig
30864196
real 0m2.159s
user 0m2.144s
sys 0m0.008s
The modified code above gave:
$ time ./new
30864196
real 0m1.450s
user 0m1.440s
sys 0m0.004s
LLVM
While LLVM didn't have a massive effect on the original code, it really did on the modified (I'd love to learn why...).
Original (LLVM):
$ time ./orig-llvm
30864196
real 0m2.047s
user 0m2.036s
sys 0m0.008s
Modified (LLVM):
$ time ./new-llvm
30864196
real 0m0.233s
user 0m0.228s
sys 0m0.004s
For comparison, OP's original C code comes in at 0m0.152s user on my system.
This is all GHC 7.4.1, GCC 4.6.3, and vector 0.9.1. LLVM is either 2.9 or 3.0; I have both but can't seem to figure out which one GHC is actually using.
Try this:
import Data.Bits
main = do
print $ length $ filter (\i -> i .&. (shift 1 (i `rem` 4)) /= 0) [0..123456789::Int]
Without the ::Int, the type defaults to ::Integer.
rem does the same as mod on positive values, and it is the same as % in C. mod on the other hand ist mathematically correct on negative values, but is slower.
int in C is 32bit
Int in Haskell is either 32 or 64bit wide, like long in C
Integer is an arbitrary-bit-integer, it has no min/max values, and its memory size depends on its value (similar to a string).

Need to do 64 bit multiplication on a machine with 32 bit longs

I'm working on a small embedded system that has 32 bit long ints. For one calculation I need output the time since 1970 in ms. I can get the time in 32 bit unsigned long seconds since 1970, but how can I represent this as a 64 bit no. of ms if my biggest int is only 32bits? I'm sure stackoverflow will have a cunning answer! I am using Dynamic C, close to standard C. I have some sample code from another system which has a 64 bit long long data type:
long long T = (long long)(SampleTime * 1000.0 + 0.5);
data.TimeLower = (unsigned int)(T & 0xffffffff);
data.TimeUpper = (unsigned short)((T >> 32) & 0xffff);
Since you are only multiplying by 1000 (seconds -> millis), you can do it with two 16 bit mutliplies and one add and a bit of bit fiddling, I have used your putative data type to store the result below:
uint32_t time32 = time();
uint32_t t1 = (time32 & 0xffff) * 1000;
uint32_t t2 = ((time32 >> 16) * 1000) + (t1 >> 16);
data.TimeLower = (uint32_t) ((t2 & 0xffff) << 16) | (t1 & 0xffff);
data.TimeUpper = (uint32_t) (t2 >> 16);
The standard approach, assuming you have a 16x16->32 multiply available, would be to split both numbers into 16-bit high and low parts, compute four partial products, and add the results. If you don't have a 16x16->32 primitive which is faster than a 32x32->32 primitive, though, I'm not sure what the best approach would be. I would think that a 32x32->32 multiply should be more useful than a 16x16->32, but I can't think how one would use it.
Personally, I wish there were a standard primitive to return the top half of a NxN multiply (32x32, certainly; also 16x16 for smaller machines and 64x64 for larger ones).
It might be helpful if you were more specific about what kinds of calculations you need to do. 64-bit multiplication implemented with 32-bit operations is quite slow, and you may have the additional overhead of 64-bit division (to convert back to seconds and milliseconds), which is even slower.
Without knowing more about what exactly you need to do, it seems to me that it would be more efficient to use a struct, containing a 32-bit unsigned int for the number of seconds and a 16-bit int for the number of milliseconds (the "remainder"). (Or use a 32-bit int for the milliseconds if 64-bit alignment is more important than saving a couple of bytes.)

Sizeof struct in Go

I'm having a look at Go, which looks quite promising.
I am trying to figure out how to get the size of a go struct, for
example something like
type Coord3d struct {
X, Y, Z int64
}
Of course I know that it's 24 bytes, but I'd like to know it programmatically..
Do you have any ideas how to do this ?
Roger already showed how to use SizeOf method from the unsafe package. Make sure you read this before relying on the value returned by the function:
The size does not include any memory possibly referenced by x. For
instance, if x is a slice, Sizeof returns the size of the slice
descriptor, not the size of the memory referenced by the slice.
In addition to this I wanted to explain how you can easily calculate the size of any struct using a couple of simple rules. And then how to verify your intuition using a helpful service.
The size depends on the types it consists of and the order of the fields in the struct (because different padding will be used). This means that two structs with the same fields can have different size.
For example this struct will have a size of 32
struct {
a bool
b string
c bool
}
and a slight modification will have a size of 24 (a 25% difference just due to a more compact ordering of fields)
struct {
a bool
c bool
b string
}
As you see from the pictures, in the second example we removed one of the paddings and moved a field to take advantage of the previous padding. An alignment can be 1, 2, 4, or 8. A padding is the space that was used to fill in the variable to fill the alignment (basically wasted space).
Knowing this rule and remembering that:
bool, int8/uint8 take 1 byte
int16, uint16 - 2 bytes
int32, uint32, float32 - 4 bytes
int64, uint64, float64, pointer - 8 bytes
string - 16 bytes (2 alignments of 8 bytes)
any slice takes 24 bytes (3 alignments of 8 bytes). So []bool, [][][]string are the same (do not forget to reread the citation I added in the beginning)
array of length n takes n * type it takes of bytes.
Armed with the knowledge of padding, alignment and sizes in bytes, you can quickly figure out how to improve your struct (but still it makes sense to verify your intuition using the service).
import unsafe "unsafe"
/* Structure describing an inotify event. */
type INotifyInfo struct {
Wd int32 // Watch descriptor
Mask uint32 // Watch mask
Cookie uint32 // Cookie to synchronize two events
Len uint32 // Length (including NULs) of name
}
func doSomething() {
var info INotifyInfo
const infoSize = unsafe.Sizeof(info)
...
}
NOTE: The OP is mistaken. The unsafe.Sizeof does return 24 on the example Coord3d struct. See comment below.
binary.TotalSize is also an option, but note there's a slight difference in behavior between that and unsafe.Sizeof: binary.TotalSize includes the size of the contents of slices, while unsafe.Sizeof only returns the size of the top level descriptor. Here's an example of how to use TotalSize.
package main
import (
"encoding/binary"
"fmt"
"reflect"
)
type T struct {
a uint32
b int8
}
func main() {
var t T
r := reflect.ValueOf(t)
s := binary.TotalSize(r)
fmt.Println(s)
}
This is subject to change but last I looked there is an outstanding compiler bug (bug260.go) related to structure alignment. The end result is that packing a structure might not give the expected results. That was for compiler 6g version 5383 release.2010-04-27 release. It may not be affecting your results, but it's something to be aware of.
UPDATE: The only bug left in go test suite is bug260.go, mentioned above, as of release 2010-05-04.
Hotei
In order to not to incur the overhead of initializing a structure, it would be faster to use a pointer to Coord3d:
package main
import (
"fmt"
"unsafe"
)
type Coord3d struct {
X, Y, Z int64
}
func main() {
var dummy *Coord3d
fmt.Printf("sizeof(Coord3d) = %d\n", unsafe.Sizeof(*dummy))
}
/*
returns the size of any type of object in bytes
*/
func getRealSizeOf(v interface{}) (int, error) {
b := new(bytes.Buffer)
if err := gob.NewEncoder(b).Encode(v); err != nil {
return 0, err
}
return b.Len(), nil
}

Resources