Please take a look at the following image…
There are two symbols in this image.
I learned from Wikipedia's “List of logic symbols” the symbol “⊕” stands for “XOR”, but what does that cross in square symbol mean? Does that mean “XOR” too?
XOR
Means: combine the two inputs using XOR. So this symbol can indeed be read as “⊕”.
Addition
Means: combine the two inputs using addition. So this symbol can indeed be read as “+”.
Nota Bene
In the image you're asking about, it is noted that the S-boxes take an 8-bit (= unsigned char) input and return 32 bits (= unsigned int)… which means the cipher expects you to do the addition and the XOR on unsigned integers.
The plus in a box is addition mod 2^32 (actually, I don't remember for sure -- it could be mod 2^32 - 1, but it's addition in any case).
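A minimal Rust sketch of the two operations on 32-bit words (the values and variable names here are made up for illustration, not taken from the image):

fn main() {
    let a: u32 = 0xDEAD_BEEF;
    let b: u32 = 0x0123_4567;

    // the circled plus: bitwise XOR of the two 32-bit inputs
    let xored = a ^ b;

    // the boxed plus: addition modulo 2^32 (the sum simply wraps around)
    let added = a.wrapping_add(b);

    println!("{xored:08x} {added:08x}");
}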
Related
On 64-bit RISC-V, when a 32-bit operand is loaded into a register, it is necessary to decide whether to sign-extend or zero-extend to 64 bits, and the architectural decision was made to prefer the former, presumably on the grounds that the most common int type in C family languages is a signed 32-bit integer. So sign extension is slightly faster than zero extension.
Is the same true of 8-bit operands? In other words, is signed char more efficient than unsigned char?
If you’re going to be widening a lot of 8-bit values to wchar_t, unsigned char is what you want, because that’s a no-op rather than a bitmask. If your char format is UTF-8, you also want to be able to use unsigned math for your shifts. If you’re using library functions, it’s most convenient to use the types your library expects.
The RISC-V architecture has both a LB instruction that loads a sign-extended 8-bit value into a register, and a LBU instruction that zero-extends. Both are equally efficient. In C, any signed char used in an arithmetic operation is widened to int, and the C standard library functions specify widening char to int, so this puts the variable in the correct format to use.
Storing is a matter of truncation, and converting from any integral type to unsigned char is trivial (a bitmask with 0xff). Converting from an unsigned char to a signed value can be done in no more than two instructions, without conditionals or register pressure (SLLI to put the sign bit of the char into the sign bit of the machine register, followed by SRAI to sign-extend the upper bits).
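For illustration, the same two-shift trick written in Rust rather than RISC-V assembly (the constants are arbitrary examples):

fn main() {
    // Truncation to unsigned char is just a mask with 0xff.
    let truncated = (0x1234u32 & 0xff) as u8;
    assert_eq!(truncated, 0x34);

    // Sign extension of a byte already sitting zero-extended in a 64-bit
    // register: shift it to the top (SLLI by 56), then arithmetically
    // shift it back down (SRAI by 56).
    let b: u8 = 0xF0; // the bit pattern of -16 as a signed byte
    let widened = ((b as u64) << 56) as i64 >> 56;
    assert_eq!(widened, -16);
}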
There is therefore no additional overhead in this architecture to working with either. The ABI specifies sign extension rather than zero extension of signed quantities.
Incidentally, RV64I does not architecturally prefer sign-extension. That is the ABI convention, but the instruction set adds a LWU instruction to load a 32-bit value from memory with zero-extension and an ADDIW that can sign-extend a zero-extended 32-bit result. (There is no corresponding ADDIB for 8-bit or ADDIH for 16-bit quantities.)
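For comparison, the corresponding results mimicked in Rust (LW, LWU and ADDIW are the hardware instructions; the casts below only reproduce their effect on a value):

fn main() {
    let w: u32 = 0x8000_0000;

    // zero extension of a 32-bit value, as LWU produces
    assert_eq!(w as u64, 0x0000_0000_8000_0000);

    // sign extension of a 32-bit value, as LW (or ADDIW with immediate 0) produces
    assert_eq!(w as i32 as i64 as u64, 0xFFFF_FFFF_8000_0000);
}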
I tried to find exact information about how the << and >> operators work on integers, but I couldn't find a clear answer (the documentation is not that great in that regard).
There are two parts of the semantics that are not clear to me. First, what bits are "shifted in"?
Zeroes are shifted in from one side (i.e. 0b1110_1010u8 << 4 == 0b1010_0000u8), or
the bits rotate (i.e. 0b1110_1010u8 << 4 == 0b1010_1110u8), or
it's unspecified (like overflowing behavior of integers is unspecified), or
something else.
Additionally, how do shifts work with signed integers? Is the sign bit also involved in the shift or not? Or is this unspecified?
What are the exact semantics of Rust's shift operators?
There are none. The shift operators are a user-implementable trait and you can do basically anything you want in them. The documentation even shows an example of "[a]n implementation of Shr that spins a vector rightward by a given amount."
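A minimal sketch of such a custom implementation (the Spin newtype and its rotating behaviour are invented here purely to show that the trait puts no constraints on the semantics):

use std::ops::Shr;

// a newtype whose `>>` rotates instead of shifting
struct Spin(u8);

impl Shr<u32> for Spin {
    type Output = Spin;
    fn shr(self, amount: u32) -> Spin {
        Spin(self.0.rotate_right(amount))
    }
}

fn main() {
    let spun = Spin(0b1110_1010) >> 4;
    assert_eq!(spun.0, 0b1010_1110);
}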
how the << and >> operators work on integers,
The reference has a section on Arithmetic and Logical Binary Operators. Most usefully, it contains this footnote:
Arithmetic right shift on signed integer types, logical right shift on unsigned integer types.
Logical shifting and arithmetic shifting are preexisting computer science terms with established definitions.
Zeroes are shifted in
Yes.
the bits rotate
No. There are separate methods for rotating left and right.
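Using the question's own example values, that gives (a quick check, assuming the built-in u8 type):

assert_eq!(0b1110_1010u8 << 4, 0b1010_0000);           // zeros are shifted in
assert_eq!(0b1110_1010u8.rotate_left(4), 0b1010_1110); // rotation is a separate method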
The thin documentation on the traits Shl and Shr is intentional, so that they may adopt a behaviour that is most suitable for the type at hand (think newtypes!).
With that said, when it comes to the base integer types, the Rust reference covers how they behave, with a bit of inference:
Symbol | Operation | Overloading trait
<< | Left Shift | std::ops::Shl
>> | Right Shift* | std::ops::Shr
* Arithmetic right shift on signed integer types, logical right shift on unsigned integer types.
It also includes a few examples, which further clarify that these are conventional logical/arithmetic shifts: zeros are inserted at the least significant end on a left shift, and the most significant (sign) bit is replicated for signed integers on a right shift. It is also not a rotation, which is what the methods rotate_left and rotate_right are for.
assert_eq!(13 << 3, 104);
assert_eq!(-10 >> 2, -3);
Moreover, shifting by too many bits is treated as an arithmetic overflow rather than undefined behaviour. See: Is it expected that a too large bitshift is undefined behavior in Rust?
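If you want the over-wide shift handled explicitly instead of panicking in debug builds, the checked/overflowing variants make the behaviour visible (a small example, assuming u8):

assert_eq!(1u8.checked_shl(9), None);          // shift amount >= bit width is reported, not UB
assert_eq!(1u8.overflowing_shl(9), (2, true)); // the shift amount is masked and the overflow flagged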
I have this table of S-format instructions. Can you explain to me what imm[11:5] and funct3 are? I know the 3 in funct3 indicates its size in bits, and it is sometimes 000 or 010, but I don't know exactly why it's there. Also, imm[11:5] is 7 bits of all 0s.
Please help!
imm[4:0] and imm[11:5] denote closed intervals into the bit representation of the immediate operand.
The S-format is used to encode store instructions, i.e.:
sX rs2, offset(rs1)
There are different types of store instructions, e.g. store-byte (sb), store-half-word (sh), store-word (sw), etc. The funct3 part is used to encode the type (i.e. 0b000 -> sb, 0b010 -> sw, 0b011 -> sd, etc.). This allows a single (major) opcode to cover multiple types of store instructions, instead of wasting several (major) opcodes. In other words, funct3 encodes the minor opcode of the instruction.
The immediate operand encodes the offset. If you are wondering why it's split like this: it increases the similarity of the remaining parts of the encoding with the other instruction formats. For example, the opcode, rs1, and funct3 parts are located at the exact same place in the R-type, I-type and B-type instruction formats. The rs2 placement is shared with the R-type and B-type instruction formats. Those similarities help to simplify the instruction decoder.
That means the offset is 12 bits wide, and in pseudo-code:
offset = sign_ext(imm[11:5] << 5 | imm[4:0])
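A small decoding sketch in Rust (the instruction word below is a hand-encoded sw x6, -4(x5) made up for this example, not taken from the question's table):

fn main() {
    // fields, MSB to LSB: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
    let insn: u32 = 0b1111111_00110_00101_010_11100_0100011;

    let imm4_0  = (insn >> 7)  & 0x1f; // imm[4:0]
    let funct3  = (insn >> 12) & 0x07; // minor opcode: 0b010 = sw
    let rs1     = (insn >> 15) & 0x1f;
    let rs2     = (insn >> 20) & 0x1f;
    let imm11_5 = (insn >> 25) & 0x7f; // imm[11:5]

    // offset = sign_ext(imm[11:5] << 5 | imm[4:0]), a 12-bit quantity
    let offset = ((((imm11_5 << 5) | imm4_0) as i32) << 20) >> 20;

    assert_eq!(funct3, 0b010);
    assert_eq!((rs1, rs2), (5, 6));
    assert_eq!(offset, -4);
}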
See also the first figure in Section 2.6 (Load and Store Instruction) of the RISC-V Base specification (2019-06-08 ratified):
I am just learning NASM and I am kind of struggling to figure this out. How do you declare variables in NASM? For example, how would you declare unsigned int i in NASM? Thanks
There is no such thing as an unsigned int in assembly language (as far as I know).
In NASM you can only declare memory locations and put contents in them.
example:
section .data
abyte: db 15
aword: dw 452
adword: dd 478569
; etc etc see Nasm manual for more 'types'
The way you treat the variables determines whether they behave as signed or unsigned values. When you need signed values, keep in mind that div and mul only work on unsigned values (the MSB is not treated as a sign bit); in that case you should use idiv and imul (signed division and multiplication).
Also keep in mind that a negative value is stored as its two's complement. For 5 (in AX, for example) you will see 0000000000000101 in binary, but for -5 you will see 1111111111111011, which is the two's complement of 5.
Adding the two gives 5 + (-5), i.e. 0000000000000101 + 1111111111111011 = 0000000000000000. The carry flag will be set to indicate that the addition wrapped when both numbers are treated as unsigned, so sometimes you can ignore it. A good practice is to debug and check the flag status often.
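The same wrap-around can be illustrated outside assembly; in Rust, purely to show the bit patterns described above:

fn main() {
    let five: u16 = 5;
    let minus_five = (-5i16) as u16;              // the two's complement bit pattern
    assert_eq!(minus_five, 0b1111_1111_1111_1011);
    assert_eq!(five.wrapping_add(minus_five), 0); // the carry out of bit 15 is discarded
}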
To check whether AX is negative you can and ax, ax (or test ax, ax); the sign flag will be set to 1 if the MSB is 1 and to 0 otherwise (branch on it with the js and jns instructions).
The answer is a bit late but for those who have the same question.....
Often in code that does permission checking, I see some folks use hex 0x0001 and others use 0x00000001. These both look like an equivalent of decimal 1, if I'm not mistaken.
Why use one over the other? Is it just a matter of preference?
Assuming that this is C, C++, Java, C# or something similar, they are the same. 0x0001 suggests a 16-bit value to the reader while 0x00000001 suggests a 32-bit one, but the actual width is determined by the compiler from the literal's type, not from the number of digits. This is a question of coding style; it makes no difference in the compiled code.
What's going on here is that these are bitmasks, for which it is traditional to pad with leading zeros out to the full width of the mask. I would furthermore guess that the width of the bitmask changed at some point when more specialized permissions were added.
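In Rust terms (the permission names are made up for the example), the two spellings denote the same value; it is the type, not the number of digits, that fixes the width:

const READ: u32 = 0x0001; // same value as 0x00000001
const WRITE: u32 = 0x0002;

fn main() {
    assert_eq!(0x0001u32, 0x00000001u32);
    let perms = READ | WRITE;
    assert_eq!(perms & READ, READ); // the leading zeros only document the mask's width
}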