What are the exact semantics of Rust's shift operators?

I tried to find exact information about how the << and >> operators work on integers, but I couldn't find a clear answer (the documentation is not that great in that regard).
There are two parts of the semantics that are not clear to me. First, what bits are "shifted in"?
Zeroes are shifted in from one side (i.e. 0b1110_1010u8 << 4 == 0b1010_0000u8), or
the bits rotate (i.e. 0b1110_1010u8 << 4 == 0b1010_1110u8), or
it's unspecified (like overflowing behavior of integers is unspecified), or
something else.
Additionally, how do shifts work with signed integers? Is the sign bit also involved in the shift, or not? Or is this unspecified?

What are the exact semantics of Rust's shift operators?
There are none. The shift operators are a user-implementable trait and you can do basically anything you want in them. The documentation even shows an example of "[a]n implementation of Shr that spins a vector rightward by a given amount."
how the << and >> operators work on integers,
The reference has a section on Arithmetic and Logical Binary Operators. Most usefully, it contains this footnote:
Arithmetic right shift on signed integer types, logical right shift on unsigned integer types.
Logical shifting and arithmetic shifting are preexisting computer science terms with established definitions.
Zeroes are shifted in
Yes.
the bits rotate
No. There are separate methods for rotating left and right.
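These rules are easy to check against the built-in integer types; a minimal sketch using the asker's own example values:

// Zeros are shifted in; rotation is a separate operation:
assert_eq!(0b1110_1010u8 << 4, 0b1010_0000);           // zeros shifted in from the right
assert_eq!(0b1110_1010u8.rotate_left(4), 0b1010_1110); // rotation, by contrast
assert_eq!(-2i8 >> 1, -1);   // arithmetic right shift: the sign bit is extended
assert_eq!(254u8 >> 1, 127); // logical right shift: zeros shifted in from the left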

The thin documentation on the traits Shl and Shr is intentional, so that they may adopt a behaviour that is most suitable for the type at hand (think newtypes!).
With that said, when it comes to the base integer types, the Rust reference covers how they behave, with a bit of inference:
Symbol | Name | Trait
<< | Left Shift | std::ops::Shl
>> | Right Shift* | std::ops::Shr
* Arithmetic right shift on signed integer types, logical right shift on unsigned integer types.
It also includes a few examples, which further clarify that these are conventional logical/arithmetic shifts: zeros are inserted into the least significant bits on a left shift, and the most significant bit is sign-extended for signed integers on a right shift. A shift is also not a rotation; rotation is provided separately by the methods rotate_left and rotate_right.
assert_eq!(13 << 3, 104);
assert_eq!(-10 >> 2, -3);
Moreover, shifting too many bits may be regarded as an arithmetic overflow, and is not undefined behaviour. See: Is it expected that a too large bitshift is undefined behavior in Rust?
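A small illustration of that overflow behaviour (a sketch; the commented-out line assumes a debug build for the panicking case):

// let _ = 1u8 << 8;  // would panic in a debug build: "attempt to shift left with overflow"
assert_eq!(1u8.checked_shl(8), None); // too-large shift amounts are rejected
assert_eq!(1u8.wrapping_shl(8), 1);   // the shift amount is masked: 8 % 8 == 0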

Related

Why I am getting (negative value) during this Arithmetic operation in bash on linux x86_64 bit machine? [duplicate]

Shell Arithmetic says:
Evaluation is done in fixed-width integers with no check for overflow,
though division by 0 is trapped and flagged as an error.
Example:
$ echo $(( 1 << 32 ))
4294967296
$ echo $(( (1 << 64) - 1 ))
0
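These results match 64-bit two's-complement arithmetic in which the shift amount is masked to the type width, as on x86-64. The same arithmetic can be mirrored with Rust's wrapping operations (an illustration of the arithmetic only, not of how bash is implemented):

assert_eq!(1i64.wrapping_shl(32), 4294967296);        // 1 << 32 still fits in 64 bits
assert_eq!(1i64.wrapping_shl(64), 1);                 // the shift amount is masked: 64 % 64 == 0
assert_eq!(1i64.wrapping_shl(64).wrapping_sub(1), 0); // hence (1 << 64) - 1 == 0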
What are integer limits in shell arithmetic in bash?
@rici pointed out that POSIX shell guarantees signed long integer range (as defined by ISO C):
-2**31+1 to +2**31-1
@John Zwinck pointed out that bash source code indicates that intmax_t is used:
All arithmetic is done as intmax_t integers with no checking for overflow
Does bash guarantee in its documentation that it uses intmax_t or some other C type for integers?
Bash does not document the precise size of integers, and the size may vary from platform to platform.
However, it does make an attempt to conform to Posix, which specifies that arithmetic expansion uses signed long arithmetic, which must be at least 32 bits including the sign bit.
Posix does not require integer arithmetic to be modulo 2^k for any value of k [but see Note 1], although bash on common platforms will do so, and in particular it does not guarantee that arithmetic operators will behave exactly as though the values were signed longs. Posix even allows integer arithmetic to be simulated with floating point, provided that the floating-point values have sufficient precision:
As an extension, the shell may recognize arithmetic expressions beyond those listed. The shell may use a signed integer type with a rank larger than the rank of signed long. The shell may use a real-floating type instead of signed long as long as it does not affect the results in cases where there is no overflow. (XSH §2.6.4)
That would permit the use of IEEE-754 floating point doubles (53 bits of precision) on a platform where long was only 32 bits, for example. While bash does not do so -- as documented, bash uses a fixed-width integer datatype -- other shell implementations might, and portable code should not make assumptions.
Notes:
Posix generally defers to the ISO C standard, but there are a number of places where Posix adds an additional constraint, some of which are marked as extensions (CX):
POSIX.1-2008 acts in part as a profile of the ISO C standard, and it may choose to further constrain behaviors allowed to vary by the ISO C standard. Such limitations and other compatible differences are not considered conflicts, even if a CX mark is missing. The markings are for information only.
One of these additional constraints is the existence of exact-width integer types. Standard C requires the types int_{least,fast}{8,16,32,64}_t and their unsigned analogues. It does not require the exact-width types, such as int32_t, unless some integer type qualifies. An exact-width type must have exactly the number of bits indicated in its name (i.e. no padding bits) and must have 2's-complement representation. So INT32_MIN, if it is defined, must be exactly -2^31 (§7.20.2.1).
However, Posix does require the exact-width types int{8,16,32}_t (as well as the unsigned analogues), and also int64_t if such a type is provided by the implementation. In particular, int64_t is required if the "implementation supports the _POSIX_V7_LP64_OFF64 programming environment and the application is being built in the _POSIX_V7_LP64_OFF64 programming environment." (XBD, §13, stdint.h) (These requirements are marked as CX.)
Despite the fact that int32_t must exist, and therefore there must be some 2's complement type available, there is still no guarantee that signed long is 2's-complement, and even if it is, there is no guarantee that integer overflow wraps around rather than, for example, trapping.
Most relevant to the original question, though, is the fact that even if signed long is the same type as int64_t and even if signed integer overflow wraps around, the shell is not under any obligation to actually use signed long for arithmetic expansion. It could use any datatype "as long as it does not affect the results in cases where there is no overflow." (XSH, §2.6.4)
Bash uses intmax_t in its C implementation of arithmetic. You can see it here: http://www.opensource.apple.com/source/bash/bash-30/bash/expr.c
This means it will be the "largest" integer type on your platform. Keep in mind that some platforms have "even larger" integers, e.g. 128 bit ints on some 64 bit platforms, but those "extraordinary" types are not included here, so most systems will see Bash using 32 or 64 bit math for now.

Should I use n or * if I have exact number of things in the model?

I need to create UML diagrams for homework about a game (called Downfall). I have to model it so that it works with any number (n) of players.
If this exact number appears in multiple places in the diagram, should I use n or *? I would use it in multiplicity parameters and in the size of an array.
For example: there are n sides, and if there is a dial on a side, there has to be a dial on each side at that position, so each dial has n-1 connected dials.
TL;DR
You can use a constant, like n. I would, though, recommend using a self-explanatory constant name like numberOfPlayers, or at least noOfPlayers, to make it obvious that it is always the same constant.
The name of the constant should be written without quotes (to distinguish it from strings, which are presented in double-quotes).
You can also use an expression like n-1, as long as it always evaluates to a non-negative Integer.
Full explanation
Let's go by the UML specification. All section and figure references are from it.
1. Multiplicity definition (7.5.3.2)
The multiplicity is defined as lowerValue and upperValue.
The lower and upper bounds for the multiplicity of a MultiplicityElement are specified by ValueSpecifications (see Clause 8), which must evaluate to an Integer value for the lowerBound and an UnlimitedNatural value for the upperBound (see Clause 21 on Primitive Types)
2. ValueSpecification definition
ValueSpecification is defined as either LiteralSpecification (8.2) or Expression or OpaqueExpression (both described in 8.3).
LiteralSpecification is essentially just a number in the case that interests us, so it is not what you need. But it is not the only option, as www.admiraalit.nl suggests in his answer.
3. Expression definition (8.3.3.1)
An Expression is a mechanism to provide a value through some textual representation and eventually computation (I'm simplifying here). For instance:
An Expression is evaluated by first evaluating each of its operands and then performing the operation denoted by the Expression symbol to the resulting operand values
If you use a simple expression without operands, it simply becomes a constant that serves as a template for your model. So feel free to use a constant as a multiplicity value, as long as the constant evaluates to a non-negative Integer (or UnlimitedNatural in the case of an upper bound).
It may even be an expression that changes its value over the lifecycle of the object; however, ensuring that this kind of multiplicity is met at all times might become challenging.
According to the UML specification, n is syntactically a valid multiplicity (see Ister's answer), but to make sure it is also semantically correct, you would have to define the meaning of n somewhere. Usually, n is not used as a multiplicity in UML diagrams.
I would advise you to use * in this case. If the minimum number of players is 2, you may use 2..*.
Additionally, you may use notes or constraints, e.g. { the number of connected dials is equal to the number of sides minus one }. You may also use a formal constraint language, like OCL.

creating a constant vector of variable width in verilog

I'm writing some synthesizable Verilog. I need to create a value to use as a mask in a larger expression. This value is a sequence of 1's whose length is stored in some register:
buffer & {offset{1'h1}};
where buffer and offset are both registers. What I expect is for buffer to be ANDed with 11111... of width offset. However, the compiler says this is illegal in Verilog, since offset needs to be constant.
Instead, I wrote the following:
buffer & ~({WIDTH{1'h1}} << offset)
where WIDTH is a constant. This works. Both expressions are equivalent in terms of values, but obviously not in terms of the hardware that would be synthesized.
What's the difference?
The difference arises because the rules for context-determined expressions (detailed in sections 11.6 and 11.8 of the IEEE 1800-2017 LRM) require the width of all operands of an expression to be known at compile time.
Your example is too simple to show where the complication arises, but let's say buffer was a 16-bit signed variable. To perform a bitwise AND (&), you first need to know the size of both operands, then extend the smaller operand to match the size of the larger one. If we don't know the size of {offset{1'h1}}, we don't know whether it needs to be zero-extended or buffer needs to be sign-extended.
Of course, the language could have been defined to allow this, but synthesis tools would then create a lot of unnecessary additional hardware. And if we start applying this to more complex expressions, trying to determine how the bit widths propagate becomes unmanageable.
Both parts of your question involve the replication operator. The operator requires a replication constant to indicate how many times to replicate, so the first part of your example is illegal: offset must be a constant, not a reg.
Also, the constant is not the width of something, but the number of times the operand (here 1'h1) is repeated. So the second part of the example is syntactically correct.
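For comparison, the same runtime-mask idiom as in the question's workaround can be written in Rust, where shift amounts may be runtime values as long as they stay below the bit width (a sketch with an illustrative fixed width of 32; low_mask is a hypothetical helper):

// Build a mask of `offset` low ones at runtime; assumes offset < 32.
fn low_mask(offset: u32) -> u32 {
    !(!0u32 << offset) // all ones shifted left by `offset`, then inverted
}
assert_eq!(low_mask(4), 0b1111);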

Why does `128u8.checked_shl(1)` return `Some(0)`?

I was under the impression that the .checked_*(_) methods of the integral types were there to help avoid overflow. However, the .checked_shl(u32) method happily shifts out the highest bit in the example above.
Is my impression wrong? What is that method for?
(I also wanted to add that, to avoid overflow on shifts, one can check that ((~0) >> rhs) >= self, at least for unsigned types.)
Because it checks only the shift amount. From the docs,
None if rhs is larger than or equal to the number of bits in self.
So by design it lets you shift out bits; it just doesn't let you use invalid shift amounts (or rather, it lets you try, but you get None).
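To make the distinction concrete, and to check the test suggested in the question (a sketch; note that Rust spells bitwise NOT as ! rather than ~):

assert_eq!(128u8.checked_shl(1), Some(0)); // the high bit is discarded, but the amount is valid
assert_eq!(128u8.checked_shl(8), None);    // shift amount >= the bit width of u8
// The asker's test: for unsigned types, x << rhs loses no bits iff x <= (!0) >> rhs.
assert_eq!(128u8 <= (!0u8) >> 1, false);   // so 128u8 << 1 would indeed lose a bit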

High precision floating point numbers in Haskell?

I know Haskell has native data types which allow you to have really big integers so things like
>> let x = 131242358045284502395482305
>> x
131242358045284502395482305
work as expected. I was wondering if there was a similar "large precision float" native structure I could be using, so things like
>> let x = 5.0000000000000000000000001
>> x
5.0000000000000000000000001
could be possible. If I enter this in Haskell, it truncates down to 5 if I go beyond 15 decimal places (double precision).
Depending on exactly what you are looking for:
Float and Double - pretty much what you know and "love" from Floats and Doubles in all other languages.
Rational, which is a Ratio of Integers
FixedPoint - This package provides arbitrary sized fixed point values. For example, if you want a number that is represented by 64 integral bits and 64 fractional bits you can use FixedPoint6464. If you want a number that is 1024 integral bits and 8 fractional bits then use $(mkFixedPoint 1024 8) to generate type FixedPoint1024_8.
EDIT: And yes, I just learned about the numbers package mentioned above - very cool.
Haskell does not have high-precision floating-point numbers natively.
For a package/module/library for this purpose, I'd refer to this answer to another post. There's also an example which shows how to use this package, called numbers.
If you need high-precision and fast floating-point calculations, you may need to use the FFI and long doubles, as the native Haskell type is not implemented yet (see https://ghc.haskell.org/trac/ghc/ticket/3353).
I believe the standard package for arbitrary precision floating point numbers is now https://hackage.haskell.org/package/scientific