ASM question, two's complement

so this book "assembly language step by step" is really awesome, but it was sort of cryptic about how two's complement works when working on actual memory and register data. along with that, i'm not sure how signed values are represented in memory either, which i feel might be what's keeping me confused. anywho...
it says: "-1 = $FF, -2 = $FE and so on". now i understand that the two's complement of a number is itself multiplied by -1 and when added to the original will give you 0. so, FF is the hex equivalent of 11111111 in binary, and 255 in decimal. so my question is: what's the book saying when it says "-1 = $FF"? does it mean that -255 + -1 will give you 0 but also, which it didn't explicitly, set the OF flag?
so in practice... let's say we have 11h, which is 17 in decimal, and 00010001 in binary. and this value is in AL.
so then we NEG AL, and this will set the CF and SF, and change the value in AL to... 239 in decimal, 11101111 in binary, or EFh? i just don't see how that would be 17 * -1? or is that just a poorly worded explanation by the book, where it really means that it gives you the value you would need to cause an overflow?
thanks!

In two's complement, for bytes, (-x) == (256 - x) == (~x + 1). (~ is C'ish for the NOT operator, which flips all the bits in its operand.)
Let's say we have 11h.
100h - 11h == EFh
(256 - 17 == 239)
Note, the 256 works with bytes, because they're 8 bits in size. For 16-bit words you'd use 2^16 (65536), for dwords 2^32. Also note that all math is mod 256 for bytes, mod 65536 for words, etc.
Or, using not/+1,
~11h = EEh
+1... EFh
This method works for words of all sizes.
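Here's a quick sketch of both methods in Python, in case it helps to poke at the numbers. The function names are made up for this example, and since Python integers aren't fixed-width, the `& 0xFF` mask stands in for the "mod 256" behaviour of a byte register.

```python
def neg8_subtract(x):
    """Two's complement of x as a byte, computed as 256 - x (mod 256)."""
    return (256 - x) & 0xFF


def neg8_not_plus_one(x):
    """Two's complement of x as a byte, computed as NOT x, then + 1 (mod 256)."""
    return (~x + 1) & 0xFF


def as_signed8(b):
    """Read a byte value 0..255 as a signed number -128..127."""
    return b - 256 if b & 0x80 else b


print(hex(neg8_subtract(0x11)))      # 100h - 11h
print(hex(neg8_not_plus_one(0x11)))  # ~11h + 1
print(as_signed8(0xFF), as_signed8(0xFE), as_signed8(0xEF))
```

Both functions agree for every byte value, and reading EFh back as a signed byte gives -17.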

what's the book saying when it says "-1 = $FF"?
If considering a byte only, the two's complement of 1 is 0xff (or $FF if using that format for hex numbers).
To break it down, the complement (or one's complement) of 1 is 0xfe, then you add 1 to get the two's complement: 0xff
Similarly for 2: the complement is 0xfd, add 1 to get the two's complement: 0xfe
Now let's look at 17 decimal. As you say, that's 0x11. The complement is 0xee, and the two's complement is 0xef - all that agrees with what you stated in your question.
Now, experiment with what happens when you add the numbers together. First in decimal:
17 + (-17) == 0
Now in hex:
0x11 + 0xef == 0x100
Since we're dealing with numeric objects that are only a byte in size, the 1 in 0x100 is discarded (some hand waving here...), and we result in:
0x11 + 0xef == 0x00
To deal with the 'hand waving' (I probably won't do this in an understandable manner, unfortunately): the 1 that falls off the top ends up in the carry flag (C), but the overflow flag (OF or sometimes called V for reasons that I don't know) stays clear, because the carry into the most significant bit is the same as the carry out of it. A clear overflow flag is the indication that the signed arithmetic occurred correctly, so the carry can be ignored. One way to think of it that's probably not very precise, but I find useful, is that leading ones in a negative two's complement number are 'the same as' leading zeros in a non-negative two's complement number.
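If you want to see the flags for that addition, here is a rough Python model of an 8-bit add. It's only a sketch: `add8` is a name invented for this example, and it models just the carry and signed-overflow rules, not a real CPU.

```python
def add8(a, b):
    """Add two byte values; return (result, carry, overflow)."""
    total = a + b
    result = total & 0xFF          # keep only the low 8 bits
    carry = total > 0xFF           # a 1 fell off the top
    # Signed overflow: both operands have the same sign bit,
    # and the result's sign bit differs from it.
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, carry, overflow


print(add8(0x11, 0xEF))  # 17 + (-17)
print(add8(0x7F, 0x01))  # 127 + 1, which does not fit in a signed byte
```

For 0x11 + 0xEF the result is 0 with the carry set and the overflow clear, so the signed answer is good. For 0x7F + 0x01 it's the other way round: no carry, but the overflow is set.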

Related

UTF-16 Encoding - Why using complex surrogate pairs?

I have been working on string encoding schemes, and while examining how UTF-16 works, I have a question. Why use complex surrogate pairs to represent a 21-bit code point? Why not simply store some of the bits in the first code unit and the remaining bits in the second code unit? Am I missing something? Is there a problem with storing the bits directly like we do in UTF-8?
Example of what I am thinking of:
The character '🙃'
Corresponding code point: 128579 (Decimal)
The binary form: 1 1111 0110 0100 0011 (17 bits)
It's 17-bit code point.
Based on UTF-8 schemes, it will be represented as:
240 : 11110 000
159 : 10 011111
153 : 10 011001
131 : 10 000011
In UTF-16, why not do something like this rather than using surrogate pairs:
49159 : 110 0 0000 0000 0111
30275 : 01 11 0110 0100 0011

Proposed alternative to UTF-16
I think you're proposing an alternative format using 16-bit code units analogous to the UTF-8 code scheme — let's designate it UTF-EMF-16.
In your UTF-EMF-16 scheme, code points from U+0000 to U+7FFF would be encoded as a single 16-bit unit with the MSB (most significant bit) always zero. Then, you'd reserve 16-bit units with the 2 most significant bits set to 10 as 'continuation units', with 14 bits of payload data. And then you'd encode code points from U+8000 to U+10FFFF (the current maximum Unicode code point) in 16-bit units with the three most significant bits set to 110 and up to 13 bits of payload data. With Unicode as currently defined (U+0000 .. U+10FFFF), you'd never need more than 7 of the 13 bits set.
U+0000 .. U+7FFF: One 16-bit unit, values 0x0000 .. 0x7FFF
U+8000 .. U+10FFFF: Two 16-bit units:
1. First unit 0xC000 .. 0xC043
2. Second unit 0x8000 .. 0xBFFF
For your example code point, U+1F643 (binary: 1 1111 0110 0100 0011):
First unit: 1100 0000 0000 0111 = 0xC007
Second unit: 1011 0110 0100 0011 = 0xB643
The second unit differs from your example in reversing the two most significant bits, from 01 in your example to 10 in mine.
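To make the bit layout concrete, here is a small Python sketch of the hypothetical scheme. UTF-EMF-16 is not a real encoding, and `emf16_encode` / `emf16_decode` are names invented for this illustration. The code splits the code point into a 7-bit (at most) lead payload and a 14-bit continuation payload, as described above.

```python
def emf16_encode(cp):
    """Encode one code point as a list of 16-bit units (hypothetical UTF-EMF-16)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp <= 0x7FFF:
        return [cp]                       # single unit, MSB is zero
    lead = 0xC000 | (cp >> 14)            # prefix 110 + high bits
    cont = 0x8000 | (cp & 0x3FFF)         # prefix 10 + low 14 bits
    return [lead, cont]


def emf16_decode(units):
    """Decode the unit list produced by emf16_encode back to a code point."""
    if len(units) == 1:
        return units[0]
    lead, cont = units
    return ((lead & 0x1FFF) << 14) | (cont & 0x3FFF)


print([hex(u) for u in emf16_encode(128579)])  # the example character, U+1F643
```

For code point 128579 this gives 0xC007 and 0xB643, and the largest code point, U+10FFFF, comes out as 0xC043 and 0xBFFF.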
Why wasn't such a scheme used in UTF-16
Such a scheme could be made to work. It is unambiguous. It could accommodate many more characters than Unicode currently allows. UTF-8 could be modified to become UTF-EMF-8 so that it could handle the same extended range, with some characters needing 5 bytes instead of the current maximum of 4 bytes. UTF-EMF-8 with 5 bytes would encode up to 26 bits; UTF-EMF-16 could encode 27 bits, but should be limited to 26 bits (roughly 64 million code points, instead of just over 1 million). So, why wasn't it, or something very similar, adopted?
The answer is the very common one – history (plus backwards compatibility).
When Unicode was first defined, it was hoped or believed that a 16-bit code set would be sufficient. The UCS-2 encoding was developed using 16-bit values, and many values in the range 0x8000 .. 0xFFFF were given meanings. For example, U+FEFF is the byte order mark.
When the Unicode scheme had to be extended to make Unicode into a bigger code set, there were many defined characters with the 10 and 110 bit patterns in the most significant bits, so backwards compatibility meant that the UTF-EMF-16 scheme outlined above could not be used for UTF-16 without breaking compatibility with UCS-2, which would have been a serious problem.
Consequently, the standardizers chose an alternative scheme, where there are high surrogates and low surrogates.
0xD800 .. 0xDBFF High surrogates (most significant bits of 21-bit value)
0xDC00 .. 0xDFFF Low surrogates (less significant bits of 21-bit value)
The low surrogates range provides storage for 10 bits of data — the prefix 1101 11 uses 6 of 16 bits. The high surrogates range also provides storage for 10 bits of data — the prefix 1101 10 also uses 6 of 16 bits. But because the BMP (Basic Multilingual Plane — U+0000 .. U+FFFF) doesn't need to be encoded with two 16-bit units, the UTF-16 encoding subtracts 1 from the high order data, and can therefore be used to encode U+10000 .. U+10FFFF. (Note that although Unicode is a 21-bit encoding, not all 21-bit (unsigned) numbers are valid Unicode code points. Values from 0x110000 .. 0x1FFFFF are 21-bit numbers but are not a part of Unicode.)
From the Unicode FAQ — UTF-8, UTF-16, UTF-32 & BOM:
Q: What’s the algorithm to convert from UTF-16 to character codes?
A: The Unicode Standard used to contain a short algorithm, now there is just a bit distribution table. Here are three short code snippets that translate the information from the bit distribution table into C code that will convert to and from UTF-16.
Using the following type definitions
typedef unsigned int16 UTF16;
typedef unsigned int32 UTF32;
the first snippet calculates the high (or leading) surrogate from a character code C.
const UTF16 HI_SURROGATE_START = 0xD800;
UTF16 X = (UTF16) C;
UTF32 U = (C >> 16) & ((1 << 5) - 1);
UTF16 W = (UTF16) U - 1;
UTF16 HiSurrogate = HI_SURROGATE_START | (W << 6) | X >> 10;
where X, U and W correspond to the labels used in Table 3-5 UTF-16 Bit Distribution. The next snippet does the same for the low surrogate.
const UTF16 LO_SURROGATE_START = 0xDC00;
UTF16 X = (UTF16) C;
UTF16 LoSurrogate = (UTF16) (LO_SURROGATE_START | X & ((1 << 10) - 1));
Finally, the reverse, where hi and lo are the high and low surrogate, and C the resulting character
UTF32 X = (hi & ((1 << 6) -1)) << 10 | lo & ((1 << 10) -1);
UTF32 W = (hi >> 6) & ((1 << 5) - 1);
UTF32 U = W + 1;
UTF32 C = U << 16 | X;
A caller would need to ensure that C, hi, and lo are in the appropriate ranges.
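For anyone who wants to check the arithmetic without compiling C, here is the same surrogate calculation as a short Python sketch. The function names are invented for this example. It subtracts 0x10000 up front, which is the usual equivalent way of writing the "subtract 1 from the high order data" step, and it assumes the caller only passes code points in U+10000 .. U+10FFFF.

```python
def to_surrogates(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                      # leaves a 20-bit value
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)


def from_surrogates(hi, lo):
    """Combine a (high, low) surrogate pair back into a code point."""
    return 0x10000 + (((hi & 0x3FF) << 10) | (lo & 0x3FF))


hi, lo = to_surrogates(0x1F643)
print(hex(hi), hex(lo))
# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
print(chr(0x1F643).encode("utf-16-be") == hi.to_bytes(2, "big") + lo.to_bytes(2, "big"))
```

The pair for U+1F643 is 0xD83D, 0xDE43, and it matches what the built-in encoder produces.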

In PBKDF2 is INT (i) signed?

Page 11 of RFC 2898 states that for U_1 = PRF (P, S || INT (i)), INT (i) is a four-octet encoding of the integer i, most significant octet first.
Does that mean that i is a signed value and if so what happens on overflow?

Nothing says that it would be signed. The fact that dkLen is capped at (2^32 - 1) * hLen suggests that it's an unsigned integer, and that it cannot roll over from 0xFFFFFFFF (2^32 - 1) to 0x00000000.
Of course, PBKDF2(MD5) wouldn't hit 2^31 until you've asked for 34,359,738,368 bytes. That's an awful lot of bytes.
SHA-1: 42,949,672,960
SHA-2-256 / SHA-3-256: 68,719,476,736
SHA-2-384 / SHA-3-384: 103,079,215,104
SHA-2-512 / SHA-3-512: 137,438,953,472
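Those figures are just 2^31 multiplied by the hash output length. A tiny Python sketch to reproduce them (the digest sizes in bytes are hard-coded here, and the helper name is made up for this example):

```python
# Output length hLen, in bytes, for each PRF hash in the list above.
DIGEST_SIZES = {
    "MD5": 16,
    "SHA-1": 20,
    "SHA-2-256": 32,
    "SHA-2-384": 48,
    "SHA-2-512": 64,
}


def bytes_at_block_index(block_index, h_len):
    """Total output bytes produced once blocks 1..block_index are used up."""
    return block_index * h_len


for name, h_len in DIGEST_SIZES.items():
    print(f"{name}: {bytes_at_block_index(2**31, h_len):,}")
```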
Since the .NET implementation (in Rfc2898DeriveBytes) is an iterative stream it could be polled for 32GB via a (long) series of calls. Most platforms expose PBKDF2 as a one-shot, so you'd need to give them a memory range of 32GB (or more) to identify if they had an error that far out. So even if most platforms get the sign bit wrong... it doesn't really matter.
PBKDF2 is a KDF (key derivation function), so used for deriving keys. AES-256 is 32 bytes, or 48 if you use the same PBKDF2 to generate an IV (which you really shouldn't). Generating a private key for the ECC curve with a 34,093 digit prime is (if I did my math right) 14,157 bytes. Well below the 32GB mark.

i ranges from 1 to l = CEIL (dkLen / hLen), and dkLen and hLen are positive integers. Therefore, i is strictly positive.
You can, however, store i in a signed, 32-bit integer type without any special handling. If i rolls over (increments from 0x7FFFFFFF to 0x80000000), it will continue to be encoded correctly, and continue to increment correctly. With two's complement encoding, bitwise results for addition, subtraction, and multiplication are the same as long as all values are treated as either signed or unsigned.
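A short Python sketch of the encoding, for illustration only. `int_be32` and `first_block_one_iteration` are names invented here; the sketch uses HMAC-SHA1 as the PRF and a single iteration so that the first output block is just U_1, which can be cross-checked against `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac
import struct


def int_be32(i):
    """INT(i): four octets, most significant first. Masking makes a
    wrapped signed value encode the same as its unsigned counterpart."""
    return struct.pack(">I", i & 0xFFFFFFFF)


def first_block_one_iteration(password, salt):
    """T_1 for an iteration count of 1, i.e. U_1 = PRF(P, S || INT(1))."""
    return hmac.new(password, salt + int_be32(1), hashlib.sha1).digest()


print(int_be32(1).hex())
# A signed 32-bit counter that wrapped to -2**31 encodes like 0x80000000.
print(int_be32(-2**31) == int_be32(0x80000000))
print(first_block_one_iteration(b"password", b"salt")
      == hashlib.pbkdf2_hmac("sha1", b"password", b"salt", 1, 20))
```

The point of the second check is that the four octets are identical whether the counter is held as signed or unsigned, so the sign only matters if an implementation does something other than a plain big-endian dump of the bits.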

Verilog operation unexpected result

I am studying the Verilog language and ran into a problem.
integer intA;
...
intA = - 4'd12 / 3; // expression result is 1431655761.
// -4'd12 is effectively a 32-bit reg data type
This snippet is from the standard, and it blew our minds. The standard says that 4'd12 is a 4-bit number, 1100.
Then -4'd12 = 0100. That's fine so far.
To perform the division, the operands have to be brought to the same size, 4 bits to 32 bits. Since -4'd12 is unsigned, it should be equal to 32'b0000...0100, but it is equal to 32'b1111...10100. Not OK, but on to the next step.
My version of the division: -4'd12 / 3 = 32'b0000...0100 / 32'b0000...0011 = 1
The standard's version: - 4'd12 / 3 = 1431655761
Can anyone tell me why? Why does a 4-bit number keep extra bits?

You need to read section 11.8.2 Steps for evaluating an expression of the 1800-2012 LRM. The key piece you are missing is that the operand is 4'd12 and that it is sized to 32 bits as an unsigned value before the unary - operator is applied.
If you want the 4-bit value treated as signed (4'sd12 is the bit pattern 1100, which reads as -4, so negating it gives 4), then you need to write
intA = - 4'sd12 / 3 // result is 1
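The two results can be reproduced outside a simulator. This is only a Python model of the arithmetic, not Verilog semantics in general; the helper names are invented for the example, and the 32-bit mask stands in for the width the operand is extended to.

```python
MASK32 = 0xFFFFFFFF


def neg_unsigned32(x):
    """Unary minus on a value already extended to 32 unsigned bits."""
    return (-x) & MASK32


def sign_extend(value, bits):
    """Read the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


# Unsigned case: 4'd12 is extended to 32 bits first, then negated.
unsigned_operand = neg_unsigned32(12)
print(unsigned_operand, unsigned_operand // 3)

# Signed case: 4'sd12 is the pattern 1100, i.e. -4, so its negation is 4.
signed_operand = -sign_extend(0b1100, 4)
print(signed_operand, signed_operand // 3)
```

The unsigned path yields 4294967284 and then 1431655761 after the division; the signed path yields 4 and then 1.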

Here the parser interprets -4'd12 as a 32-bit number that is unsigned initially, and the negative sign takes its two's complement (flip the bits, then add 1). So the result would be
negation of (4'd12) = negation of (28 zeros + 1100) = 28 ones + 0100 =
11111111111111111111111111110100, which is 4294967284. If you divide this number (4294967284) by 3 and drop the remainder, the answer is 1,431,655,761.
keep smiling :)

fixed point integer division ("fractional division") algorithm

The Honeywell DPS8 computer (and others) have/had a "divide fractional" instruction:
"This instruction divides a 71-bit fractional dividend (including sign) by a 36-bit
fractional divisor (including sign) to form a 36-bit fractional quotient (including
sign) and a 36-bit fractional remainder (including sign). Bit 35 of the remainder
corresponds to bit 70 of the dividend. The remainder sign is equal to the dividend
sign unless the remainder is zero."
So, as I understand it, this is integer division with the decimal point way over on the left.
.qqqqq / .ddddd
(I did scaled integer math in FORTH back in the day, but my memories of the techniques are lost in fog of time.)
To implement this instruction in a DPS8 emulator, I believe I need to start by creating two 70 bit numbers: the 71 bit dividend less its sign bit, and the 36 bit divisor less its sign bit and shifted 35 bits to the left so that the decimal points line up.
I think I can then form the remainder and quotient (in C) with '%' and '/', but I am unsure if those results need to be normalized (i.e. shifted).
I found an example of a "shift and subtract" algorithm ("Computer Arithmetic", slide 10), but I would prefer a more straightforward implementation.
Am I on the right track, or is the solution more nuanced (fixing up the signs and detection of errors have been elided from here; those stages are well documented. The actual division is the issue.). Any pointers to C implementations of this kind of hardware emulation would be particularly helpful.

I do not have the definitive answer, but as a division is a division, you might find it helpful to look at some basic division routines.
Imagine that you have a 32-bit variable and you want an 8-bit fractional part.
You then have an integer part between 0 and 16777215, and a fractional part which is between 0 and 255.
0xiiiiiiff (where i is the integer part, f is the fractional part).
Imagine you have a 24-bit dividend (numerator), say the value 3, and a 24-bit divisor (denominator), say the value 13.
As we quickly will see, 3/13 is greater than zero and less than one. That means our fractional part is nonzero, but our integer part is filled completely with zeros.
So to do the above division using a standard divide function, we'll just bit-shift the dividend by N, thus we will get N bits of precision in our fractional part.
quotient_fp = (dividend_ip << 8) / divisor_ip
So far, so good.
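As a quick sanity check of that formula, here is the 3/13 example in Python. `fixed_div` is a name made up for this sketch, and it assumes both inputs are plain integers with no fractional part yet.

```python
def fixed_div(dividend_ip, divisor_ip, frac_bits=8):
    """Integer divide that keeps frac_bits bits of fraction in the quotient."""
    return (dividend_ip << frac_bits) // divisor_ip


q = fixed_div(3, 13)
print(q, q / 256)   # fixed-point quotient and its value as a fraction of 256
print(3 / 13)       # floating-point value for comparison
```

The quotient is 59, i.e. 59/256 = 0.23046875, which is 3/13 rounded down to 8 fractional bits.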
But what if we want the divisor to have a fractional part, then ?
If we just shift the divisor up by 8, then we'll have a problem:
(dividend_ip << 8) / (divisor_ip << 8)
- because we'll obviously lose our fractional part of the quotient (result).
Instead, we'll need to shift the dividend up by as many bits as we shift the fractional part up...
((dividend_ip << 8) << 8) / (divisor_ip << 8)
...That makes it...
(dividend_ip << (dividend_precision + divisor_precision)) / (divisor_ip << divisor_precision)
Now, let's put our fractional part math into the picture...
(((dividend_ip << dividend_precision) | dividend_fp) << divisor_precision) / ((divisor_ip << divisor_precision) | divisor_fp)
Our quotient's precision will be the same as dividend_precision, which is 8 bits.
Unfortunately, this eats a lot of bits.
Fortunately, in your case, the integer part is not important, so you'll have a lot of room for the fractional part.
Let's increase the precision to 15 bits; this can be tested using normal 32-bit integers...
(((dividend_ip << 15) | dividend_fp) << 15) / ((divisor_ip << 15) | divisor_fp)
Our quotient will now have a 15-bit precision.
OK, but since you're supplying only the fractional parts and the integer part is always zero anyway, you should be able to just toss the integer part. That makes it....
(((dividend_ip << 16) | dividend_fp) << 16) / ((divisor_ip << 16) | divisor_fp)
... reduced to ...
(dividend_fp << 16) / divisor_fp
... now let's use a 64-bit integer instead, we can get 32 bits of precision in the quotient...
(dividend_fp << 32) / divisor_fp
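A minimal Python sketch of the fraction-only form, under the assumption that both operands are pure fractions stored with the same scaling and that the dividend is smaller than the divisor (otherwise the quotient would not be a pure fraction). `frac_divide` is a hypothetical helper name; Python integers are unbounded, so the wide intermediate value is not a problem here.

```python
def frac_divide(dividend_fp, divisor_fp, bits=32):
    """Divide two same-scale fractions; return (quotient, remainder).

    The quotient is a fraction with `bits` bits of precision.
    """
    if divisor_fp == 0:
        raise ZeroDivisionError("divide by zero")
    if dividend_fp >= divisor_fp:
        raise OverflowError("quotient would not be a pure fraction")
    return divmod(dividend_fp << bits, divisor_fp)


# 0.25 / 0.5 with 32-bit fractions: 0.25 is 1 << 30, 0.5 is 1 << 31.
q, r = frac_divide(1 << 30, 1 << 31)
print(hex(q), r, q / 2**32)
```

With these inputs the quotient is 0x80000000, which is 0.5 as a 32-bit fraction, and the remainder is 0. In C the shifted dividend needs an integer type wide enough to hold it, which is why the wider types come up next.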
... some compilers have support for a 128-bit integer type (GCC, for example, provides it as __int128 on some platforms), so you might be able to use that type, in order to get 128 bits easily. I have not tried it, but I've come across info on the Web earlier; search for int128_t or __int128, and you might find out how.
If you get the int128_t to work, you could make the dividend 128 bit, the divisor 64 bit and the quotient 64 bit...
quotient_fp = ((dividend_fp << 36) / divisor) >> (64 - 36)
... in order to get 36 bits precision.
Notice that since the result is in the top 36 bits of the quotient, the quotient needs to be shifted down (64 - 36) = 28 bits.
You could even go as high as (128 - 36) = 92 bits precision:
(dividend_fp << 92) / divisor
Now that you probably (hopefully) have a solution, I would like to recommend that you get familiar with low-level binary divide (again; since you've been there a while ago).
The best sources seem to be how hardware divides binary numbers; such as microcontrollers, CPUs and the like. Assembly language dividers are also good for getting to know the inner workings. Often 32-bit divide routines that use bit-shifting are very good sources.
Over time, I've come across a very clever implementation for ARM in ARM assembly language. Normally I wouldn't post references or assembly language examples, but considering that the code is very small, I think it would be alright.
Taken from A Fast Hi Precision Fixed Point Divide
r0 is the numerator (dividend)
r2 is the denominator (divisor)
mov r1,#0
adds r0,r0,r0
.rept 32
adcs r1,r2,r1,lsl#1
subcc r1,r1,r2
adcs r0,r0,r0
.endr
r0 is the quotient (result)
r1 is the remainder (rest, modulo result)
The above routine contains the basics for an unsigned divide.
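For readers who don't speak ARM, here is the general shift-and-subtract idea as a Python sketch. This is a plain textbook restoring division, not a line-by-line translation of the routine above, and the function name is invented for this example.

```python
def shift_subtract_divide(numerator, denominator, bits=32):
    """Unsigned divide, one quotient bit per loop pass.

    Returns (quotient, remainder) for a numerator that fits in `bits` bits.
    """
    if denominator == 0:
        raise ZeroDivisionError("divide by zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next numerator bit into the running remainder.
        remainder = (remainder << 1) | ((numerator >> i) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and record a 1 bit.
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1
    return quotient, remainder


print(shift_subtract_divide(100, 7))
print(shift_subtract_divide(100, 7) == divmod(100, 7))
```

The loop runs once per bit, which is what the `.rept 32` block does in the assembly version.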
I hope this information will be useful. It may contain errors, as I have not tested any code or example mentioned. I'm confident, though, that it's not all wrong. ;)

Verilog shift extending result?

We have the following line of code and we know that regF is 16 bits long, regD is 8 bits long and regE is 8 bits long, regC is 3 bits long and assumed unsigned:
regF <= regF + ( ( regD << regC ) & { 16{ regE [ regC ]} }) ;
My question is : will the shift regD << regC assume that the result is 8 bits or will it extended to 16 bits because of the bitwise & with the 16 bit vector?

The shift sub-expression itself has a width of 8 bits; the bit width of a shift is always the bit width of the left operand (see table 5-22 in the 2005 LRM).
However, things get more complicated after that. The shift sub-expression appears as an operand of the & operator. The bit length of the & expression is the bit length of the larger of the two operands; in this case, 16 bits.
This sub-expression now appears as an operand of the + expression; the result width of this expression is again the maximum width of the two operands of the +, which is again 16.
We now have an assignment. This is not technically an operand, but the same rules are used; in this case, the LHS is also 16 bits, so the size of the RHS is unaffected.
We now know that the overall expression size is 16 bits; this size is propagated back down to the operands, except the 'self-determined' operands. The only self-determined operand here is the RHS of the shift expression (regC), which isn't extended.
The signedness of the expressions is now determined. Propagation happens in the same way. The overall effect here, since we have at least one unsigned operand, is that the expression is unsigned, and all operands are coerced to unsigned. So, all (non-self-determined) operands are coerced to unsigned 16-bit before any operation is actually carried out.
So, in other words, the shift sub-expression actually ends up as a 16-bit shift, even though it appears to be 8-bit at first sight. Note that it's not 16-bit because the RHS of the & is 16-bit, but because the entire sizing process - the width propagation up the expression - came up with an answer of 16. If you'd assigned to an 18-bit reg, instead of the 16-bit regF, then your shift would have been extended to 18 bits.
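To see why the width matters, here is a small Python model of the two possible readings of the shift. It is an illustration of the arithmetic only, not of Verilog's rules in general; the function names are invented, and the masks stand in for the 8-bit and 16-bit widths.

```python
def shift_as_8_bit(reg_d, reg_c):
    """What the shift would give if it stayed 8 bits wide."""
    return (reg_d << reg_c) & 0xFF


def shift_as_16_bit(reg_d, reg_c):
    """What the shift gives once regD is extended to 16 bits first."""
    return (reg_d << reg_c) & 0xFFFF


def update_reg_f(reg_f, reg_d, reg_e, reg_c):
    """Model of regF + ((regD << regC) & {16{regE[regC]}}) at 16 bits."""
    replicated = 0xFFFF if (reg_e >> reg_c) & 1 else 0x0000
    return (reg_f + (shift_as_16_bit(reg_d, reg_c) & replicated)) & 0xFFFF


print(hex(shift_as_8_bit(0xFF, 4)), hex(shift_as_16_bit(0xFF, 4)))
print(hex(update_reg_f(0x0000, 0xFF, 0x10, 4)))
```

With regD = 0xFF and regC = 4, an 8-bit shift would lose the top nibble (0xF0), while the 16-bit shift keeps it (0x0FF0), and that is the value added to regF when bit 4 of regE is set.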
This is all very complicated and non-intuitive, at least if you have any experience of mainstream languages. It's explained (more or less) in sections 5.4 and 5.5 of the 2005 LRM. If you want any advice, then never write expressions like this. Write defensively - break everything down to individual sub-expressions, and then combine the sub-expressions.
