UPDATE
Hm, I have an update. Apparently my huge array of "unsigned long long fhash[105][100555]" was not getting initialized to zero automatically in VC++... It worked when I did = {0}. Isn't it supposed to be initialized automatically?
I'm doing contest programming, and I usually compile with g++ at school/ideone etc... but I have to use a VC++ 2010 compiler.
That said, I have code to do polynomial rolling hashing (like used in Rabin-Karp), but do these overflow differently on these compilers?
Code is here: http://pastebin.com/UFdpwHCt (hashing is around line 67)
Output is here: http://i.imgur.com/KCcvI.png
How come "bhash" is equal between the two compilers, but "fhash" isn't? They are hashed using the same method... In the G++-3 output, the "fhash" and "bhash" outputs are the same (they are supposed to be) but in the VC++-10 output the "fhash" and "bhash" aren't the same...
I'm using the overflow to let it mod itself naturally, to speed up execution, instead of explicitly modding it with a large prime.
Overflow wasn't the issue. The issue was that the array wasn't getting initialized to zero. Fixed it using memset.
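For anyone landing here with the same symptom, a minimal sketch of the hashing side, assuming a 64-bit unsigned accumulator. Unsigned arithmetic in C++ is defined to wrap modulo 2^64 for a 64-bit type, so g++ and VC++ should agree on the hash as long as the starting values agree. The names `BASE` and `prefix_hashes` and the base value 131 are made up for illustration and are not taken from the pastebin code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical polynomial base; the pastebin code may use a different one.
const std::uint64_t BASE = 131;

// Returns h, where h[i] is the rolling hash of the first i characters of s.
// All arithmetic is done in uint64_t, so it wraps modulo 2^64 by definition
// on every conforming compiler; no explicit "% prime" is needed for the wrap.
std::vector<std::uint64_t> prefix_hashes(const std::string& s) {
    // The table is zeroed explicitly instead of relying on whatever the
    // storage happened to contain before.
    std::vector<std::uint64_t> h(s.size() + 1, 0);
    for (std::size_t i = 0; i < s.size(); ++i) {
        h[i + 1] = h[i] * BASE + static_cast<unsigned char>(s[i]);
    }
    return h;
}
```

If a raw array is reused across test cases instead, it has to be reset before each run (for example with memset, as in the fix above); otherwise values from an earlier run leak into the next hash.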
Rust treats signed integer overflow differently in debug and release mode. When it happens, Rust panics in debug mode, while it silently performs two's complement wrapping in release mode.
As far as I know, C/C++ treats signed integer overflow as undefined behavior partly because:
At the time of C's standardization, different underlying architectures for representing signed integers, such as one's complement, might still have been in use somewhere. Compilers could not make assumptions about how overflow was handled in the hardware.
Later, compilers began making assumptions, such as that the sum of two positive integers must also be positive, in order to generate optimized machine code.
So if Rust compilers do perform the same kind of optimization as C/C++ compilers regarding signed integers, why does The Rustonomicon state:
No matter what, Safe Rust can't cause Undefined Behavior.
Or even if Rust compilers do not perform such optimization, Rust programmers still do not anticipate seeing a signed integer wrapping around. Can't it be called "undefined behavior"?
Q: So if Rust compilers do perform the same kind of optimization as C/C++ compilers regarding signed integers
Rust does not. As you noticed, it cannot perform these optimizations, because integer overflow is well defined.
For an addition in release mode, Rust will emit the following LLVM instruction (you can check on Playground):
add i32 %b, %a
On the other hand, clang will emit the following LLVM instruction (you can check via clang -S -emit-llvm add.c):
add nsw i32 %6, %8
The difference is the nsw (no signed wrap) flag. As specified in the LLVM reference about add:
If the sum has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.
Because LLVM integers use a two’s complement representation, this instruction is appropriate for both signed and unsigned integers.
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.
The poison value is what leads to undefined behavior. If the flags are not present, the result is well defined as 2's complement wrapping.
Q: Or even if Rust compilers do not perform such optimization, Rust programmers still do not anticipate seeing a signed integer wrapping around. Can't it be called "undefined behavior"?
"Undefined behavior" as used in this context has a very specific meaning that is different from the intuitive English meaning of the two words. UB here specifically means that the compiler can assume an overflow will never happen, and that if an overflow does happen, any program behavior is allowed. That's not what Rust specifies.
However, an integer overflow via the arithmetic operators is considered a bug in Rust. That's because, as you said, it is usually not anticipated. If you intentionally want the wrapping behavior, there are methods such as i32::wrapping_add.
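For readers coming from the C/C++ side of the question, a rough C++ counterpart of an explicit wrapping add is to do the addition in the unsigned type, where wrap-around is defined, and convert back. This is only a sketch: `wrapping_add_i32` is a made-up name, not a standard function, and the conversion back to a signed type is implementation-defined before C++20 (mainstream two's complement compilers wrap; C++20 makes it modular).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: adds two 32-bit signed integers with two's complement
// wrap-around and without relying on signed overflow, which is undefined
// behavior in C++.
std::int32_t wrapping_add_i32(std::int32_t a, std::int32_t b) {
    // Unsigned addition wraps modulo 2^32 by definition.
    const std::uint32_t sum =
        static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
    // Implementation-defined before C++20, modular from C++20 on.
    return static_cast<std::int32_t>(sum);
}
```

Writing `a + b` directly on the signed operands is what gives the compiler licence to assume no overflow; routing the addition through the unsigned type removes that assumption.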
Some additional resources:
RFC 560 specifies everything about integer overflows in Rust. In short: panic in debug mode, 2's complement wrap in release mode.
Myths and Legends about Integer Overflow in Rust. Nice blog post about this topic.
I have to do integer arithmetic in the kernel; specifically, I need to increment a size_t object by some delta, and this will happen quite often. So I'm wondering if I need to guard against possible integer overflows in the kernel, and if so, does the kernel provide macros or APIs for this?
size_t doesn't overflow; it is an unsigned type, with well-defined "wraparound" semantics. Incrementing the highest value of a size_t results in zero.
In the specific case of size_t, in simple operations like adding two sizes together, it is usually enough to check whether the result is smaller than one of the two source operands. If ((size3 = size1 + size2) < size1), you have a wrap.
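The wrap check just described can be sketched as a small helper. It is written here as user-space C++ for illustration (kernel code would be C), and `add_size_checked` is a hypothetical name, not a kernel API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: adds delta to *total unless the sum would wrap.
// Returns false and leaves *total unchanged when a wrap is detected.
bool add_size_checked(std::size_t* total, std::size_t delta) {
    assert(total != nullptr);
    // Well-defined for unsigned types: the sum wraps modulo SIZE_MAX + 1.
    const std::size_t sum = *total + delta;
    // A wrapped sum is always smaller than either operand.
    if (sum < *total) {
        return false;
    }
    *total = sum;
    return true;
}
```

The caller then decides what a wrap means in context, for example rejecting the request instead of continuing with a small, wrong size.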
If an unsigned type is used as a clock value which goes around a "wheel", there are macros for doing "time before" calculations correctly. For instance, we want the time 0xFFFFFFFE to be treated as being a few time units in the past w.r.t. the time 0x00000003. If you're using the "jiffies" time in the kernel, then you can use the time_before macro, and others in that family. (Note that there are "classic jiffies" (my term) represented as unsigned long and 64-bit jiffies represented as u64, with separate macros like time_before versus time_before64.)
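The idea behind that family of comparisons can be sketched in user space for a 32-bit tick counter: subtract in the unsigned type, then look at the sign of the difference. `tick_before` is a made-up name for illustration; the kernel's own versions operate on jiffies and live in its jiffies header. The conversion to a signed type is implementation-defined before C++20, but wraps on mainstream two's complement compilers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical user-space analogue of a wrap-aware "a is before b" test.
// Valid as long as the two tick values are less than 2^31 ticks apart.
bool tick_before(std::uint32_t a, std::uint32_t b) {
    // a - b wraps modulo 2^32; reinterpreting the difference as signed
    // yields a negative number exactly when a is "behind" b on the wheel.
    return static_cast<std::int32_t>(a - b) < 0;
}
```

With the values from the text, `tick_before(0xFFFFFFFE, 0x00000003)` is true, whereas a plain `a < b` comparison would say the opposite.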
But are there some general macros for doing math with overflow checks? Casually combing through a kernel tree (3.18.31 that I have at my convenience), it doesn't appear that way. grep -i overflow on the include subtree doesn't come up with anything and similar searches in code areas like fs reveal the use of ad hoc locally coded overflow checks. It's a shame, really; you'd think the problem of "if I add these two int values together, is there a problem" is common enough that there would be a solution in place that everyone can just use like some addv(x_int, y_int, &overflow_flag) or whatever.
integer overflow in kernel — possible?
Yes. It doesn't matter whether it's user space or kernel -- it's just how the CPU works.
I'm wondering if I need to guard against possible integer overflows in the kernel
If you think that it can happen and it's not acceptable in your case -- then yes. For signed integers it can even lead to undefined behavior.
does the kernel provide macros or APIs for this
No, there are no ready-to-use functions in the kernel for dealing with integer overflows. Well, there are some GCC wrappers for overflow detection... But be sure not to use them. Otherwise Linus Torvalds will come and yell at you, like here :)
Anyway, it's quite easy to detect integer overflows manually, when you really need that. Look here for example. In your case, size_t is unsigned, so you only need to ensure that it doesn't wrap, or handle the wrapped value: details.
I'm migrating a Visual C++ project which uses ATL/MFC from VS2010 to VS2013. The project compiles with /J ("assume char is unsigned"), and there is too much code that may or may not rely on that fact to easily remove the compiler flag.
Under VS2013, /J causes a compiler error in atldef.h: ATL doesn't support compilation with /J or _CHAR_UNSIGNED flag enabled. This can be suppressed by defining _ATL_ALLOW_CHAR_UNSIGNED. Microsoft mentions this in the MSDN documentation for /J, along with the vague statement: "If you use this compiler option with ATL/MFC, an error might be generated. Although you could disable this error by defining _ATL_ALLOW_CHAR_UNSIGNED, this workaround is not supported and may not always work."
Does anyone know under what circumstances it is safe or unsafe to use _ATL_ALLOW_CHAR_UNSIGNED?
Microsoft struggles to keep ancient codebases, like ATL, compatible with changes in the compiler. The principal trouble-maker here is the AtlGetHexValue() function. It had a design mistake:
The numeric value of the input character interpreted as a hexadecimal digit. For example, an input of '0' returns a value of 0 and an input of 'A' returns a value of 10. If the input character is not a hexadecimal digit, this function returns -1.
-1 is the rub; 9 years ago that broke with /J in effect. And it won't actually return -1 today: it now returns CHAR_MAX ((char)255) if you compile with /J. That was required because comparing an unsigned char to -1 is always false, so the entire if() statement gets omitted. This broke ATL itself, and it will also break your code in a very nasty way if you use this function, given that this code is on an error path that is unlikely to get tested.
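The failure mode can be reproduced without ATL. The sketch below is not the ATL source; it is a stand-in hex lookup whose return type is spelled `unsigned char`, so it behaves the same whether or not /J is in effect. Both function names are hypothetical, and the digit ranges assume an ASCII-compatible character set.

```cpp
#include <cassert>

// Stand-in for a char-returning hex lookup compiled with /J.
// The "not a hex digit" sentinel -1 is stored as 255 in an unsigned char.
unsigned char hex_value_as_uchar(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned char>(c - '0');
    if (c >= 'A' && c <= 'F') return static_cast<unsigned char>(c - 'A' + 10);
    if (c >= 'a' && c <= 'f') return static_cast<unsigned char>(c - 'a' + 10);
    return static_cast<unsigned char>(-1);
}

// Models a caller that tests the result against -1 to detect bad input.
bool error_check_fires(char c) {
    // Integer promotion turns the unsigned char into an int in 0..255,
    // so the comparison with -1 can never be true.
    const int promoted = hex_value_as_uchar(c);
    return promoted == -1;
}
```

Any caller that still compares the result with -1 silently loses its error path, which is the kind of code worth grepping for before defining _ATL_ALLOW_CHAR_UNSIGNED.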
Shooting from the hip, there were 3 basic ways they could have solved this problem. They could have changed the return value type to int, risking breaking everybody. Or they could have noted the special behavior in the MSDN article, making everybody's eyes roll. Or they could have invoked the "time to move on" option. Which is what they picked; it was about time, with MSVC++ being the laughing stock of the programming world back then.
That's about all you need to fear from ATL: the odds are low that you are using this function, and it is easy to find. Otherwise it is an excellent hint to look for the kind of trouble you might get from your own code.
When using the following to compute PI in fortran77, will the compiler evaluate this value or will it be evaluated at run time?
PI=4.D0*DATAN(1.D0)
EDIT: depends on the compiler: see my EDIT below. EDIT END
i second Mick Sharpe's suggestion that it will be evaluated at runtime. just out of curiosity, i compiled PI=4.D0*DATAN(1.D0) with Silverfrost's ftn77 compiler and looked at the generated binary. the relevant part looks like so:
fld1 ; push 1.D0 onto the FPU register stack
call ATAN_X
fmul dbl_404000 ; multiply by 4.D0
so indeed, no compiler cleverness here.
this of course might be different with another compiler (eg. g77). EDIT: apparently, with g77 (the fortran77 front-end for gcc) it is possible (and enabled by default) to use gcc's built-in atan function to auto-fold PI=4.D0*DATAN(1.D0) into a constant. EDIT END
Calls to math functions are normally evaluated at run time. After all, there's nothing to stop you writing your own math functions. This would not be possible if they were evaluated at compile time.
I don't have the source code but have the binary. With command "nm binary_name" I could know the functions inside the binary.
Can I know how many parameters a function has? Under Solaris, is there any way to do that?
e.g, if the function is: func1(a int,b int,c int), then there are 3 parameters.
Thanks
Daniel
No. Neil Butterworth's suggestion to examine the function signature is a good one for C++ (since the parameters are often encoded into the function so the linker can tell the difference between "int x(int)" and "int x(float)" for example) but, for C, you're going to have to get your hands dirty and disassemble the function, taking particular note of how the stack frames are built and used in your environment.
Keep in mind that SPARC has a rotating window stack rather than a regular grow-down stack. You're really going to have to delve deep into the way the CPU works. If you're talking Solaris for Intel, the rotating stack is not there, of course.
Assuming this is C code, then no there is not - the compiler/linker elides that information. If it is C++ code, it is just possible that the mangled name of the function is retained and includes the parameters in encoded form.
At the lowest level, if you emulate the function running on the machine, then it will read some information either from registers or the stack which it has not written. If you compare these reads to the ABI of the platform (you don't say whether it's SPARC Solaris or Intel Solaris) then some of them should correspond to the registers/stack locations of the parameters of the function. Of course, there's no guarantee that a function will read all its parameters.
For Solaris, elfdump might give more information than nm ( a quick google for elfdump signature indicates support was requested and added, but you'd need to check what version you've got )
IDA Pro (http://www.hex-rays.com/idapro/) is a disassembler which is pretty clever at inferring parameters of a function from object code;
maybe there is also symbolic information you can use; eg. on Win32 the symbol _function@8 reveals that 8 bytes (2 parameters) are passed
one can also demangle C++ names to get the parameters and types
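As a small illustration of the symbolic-information route: on 32-bit Windows, a __stdcall function's decorated name ends in `@N`, where N is the number of argument bytes. The sketch below pulls that number out of a symbol string. `stdcall_arg_bytes` is a hypothetical helper, and none of this applies to Solaris ELF symbols, which carry no such suffix for C functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the argument byte count encoded in a
// decorated name such as "_function@8", or -1 if there is no "@N" suffix.
int stdcall_arg_bytes(const std::string& symbol) {
    const std::string::size_type at = symbol.rfind('@');
    if (at == std::string::npos || at + 1 == symbol.size()) {
        return -1;  // no '@', or nothing after it
    }
    int bytes = 0;
    for (std::string::size_type i = at + 1; i < symbol.size(); ++i) {
        if (symbol[i] < '0' || symbol[i] > '9') {
            return -1;  // suffix is not purely decimal digits
        }
        bytes = bytes * 10 + (symbol[i] - '0');
    }
    return bytes;
}
```

Dividing the result by 4 gives the number of 4-byte stack slots, which matches the parameter count only when every parameter occupies exactly one slot; a double, for instance, takes two.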