Related
I am writing a textual packet analyzer for a protocol and in optimizing it I found that a great bottleneck is the find_first_not_of call.
In essence, I need to find if a packet is valid if it contains only valid characters, faster than the default C++ function.
For instance, if all allowed characters are f, h, o, t, and w, in C++ I would just call s.find_first_not_of("fhotw"), but in SSEx I have no clue after loading the string in a set of __m128i variables.
Apparently, the _mm_cmpXstrY functions documentation is not really helping me in this. (e.g. _mm_cmpistri). I could at first subtract with _mm_sub_epi8, but I don't think it would be a great idea.
Moreover, I am stuck with SSE (any version).
This article by Wojciech Muła describes a SSSE3 algorithm to accept/reject any given byte value.
(Contrary to the article, saturated arithmetic should be used to conduct range checks, but we don't have ranges.)
SSE4.2 string functions are often slower** than hand-crafted alternatives. For example, 3 uops, 3 cycle throughput on Skylake for pcmpistri, the fastest of the SSE4.2 string instructions. vs. 1 shuffle and 1 pcmpeqb per 16 bytes of input with this, with SIMD AND and movemask to combine results. Plus some load and register-copy instructions, but still very likely faster than 1 vector per 3 cycles. Doesn't quite as easily handle short 0-terminated strings, though; SSE4.2 is worth considering if you also need to worry about that, instead of known-size blocks that are a multiple of the vector width.
For "fhotw" specifically, try:
#include <tmmintrin.h> // pshufb
bool is_valid_64bytes (uint8_t* src) {
const __m128i tab = _mm_set_epi8('o','_','_','_','_','_','_','h',
'w','f','_','t','_','_','_','_');
__m128i src0 = _mm_loadu_si128((__m128i*)&src[0]);
__m128i src1 = _mm_loadu_si128((__m128i*)&src[16]);
__m128i src2 = _mm_loadu_si128((__m128i*)&src[32]);
__m128i src3 = _mm_loadu_si128((__m128i*)&src[48]);
__m128i acc;
acc = _mm_cmpeq_epi8(_mm_shuffle_epi8(tab, src0), src0);
acc = _mm_and_si128(acc, _mm_cmpeq_epi8(_mm_shuffle_epi8(tab, src1), src1));
acc = _mm_and_si128(acc, _mm_cmpeq_epi8(_mm_shuffle_epi8(tab, src2), src2));
acc = _mm_and_si128(acc, _mm_cmpeq_epi8(_mm_shuffle_epi8(tab, src3), src3));
return !!(((unsigned)_mm_movemask_epi8(acc)) == 0xFFFF);
}
Using the low 4 bits of the data, we can select a byte from our set that has that low nibble value. e.g. 'o' (0x6f) goes in the high byte of the table so input bytes of the form 0x?f try to match against it. i.e. it's the first element for _mm_set_epi8, which goes from high to low.
See the full article for variations on this technique for other special / more general cases.
**If the search is very simple (doesn't need the functionality of string instructions) or very complex (needs at least two string instructions) then it doesn't make much sense to use the string functions. Also the string instructions don't scale to the 256-bit width of AVX2.
I am trying to understand exactly how programming in Lua can change the state of I/O's with a Modbus I/O module. I have read the modbus protocol and understand the registers, coils, and how a read/write string should look. But right now, I am trying to grasp how I can manipulate the read/write bit(s) and how functions can perform these actions. I know I may be very vague right now, but hopefully the following functions, along with some questions throughout them, will help me better convey where I am having the disconnect. It has been a very long time since I've first learned about bit/byte manipulation.
local funcCodes = { --[[I understand this part]]
readCoil = 1,
readInput = 2,
readHoldingReg = 3,
readInputReg = 4,
writeCoil = 5,
presetSingleReg = 6,
writeMultipleCoils = 15,
presetMultipleReg = 16
}
local function toTwoByte(value)
return string.char(value / 255, value % 255) --[[why do both of these to the same value??]]
end
local function readInputs(s)
local s = mperia.net.connect(host, port)
s:set_timeout(0.1)
local req = string.char(0,0,0,0,0,6,unitId,2,0,0,0,6)
local req = toTwoByte(0) .. toTwoByte(0) .. toTwoByte(6) ..
string.char(unitId, funcCodes.readInput)..toTwoByte(0) ..toTwoByte(8)
s:write(req)
local res = s:read(10)
s:close()
if res:byte(10) then
local out = {}
for i = 1,8 do
local statusBit = bit32.rshift(res:byte(10), i - 1) --[[What is bit32.rshift actually doing to the string? and the same is true for the next line with bit32.band.
out[#out + 1] = bit32.band(statusBit, 1)
end
for i = 1,5 do
tDT.value["return_low"] = tostring(out[1])
tDT.value["return_high"] = tostring(out[2])
tDT.value["sensor1_on"] = tostring(out[3])
tDT.value["sensor2_on"] = tostring(out[4])
tDT.value["sensor3_on"] = tostring(out[5])
tDT.value["sensor4_on"] = tostring(out[6])
tDT.value["sensor5_on"] = tostring(out[7])
tDT.value[""] = tostring(out[8])
end
end
return tDT
end
If I need to be a more specific with my questions, I'll certainly try. But right now I'm having a hard time connecting the dots with what is actually going on to the bit/byte manipulation here. I've read both books on the bit32 library and sources online, but still don't know what these are really doing. I hope that with these examples, I can get some clarification.
Cheers!
--[[why do both of these to the same value??]]
There are two different values here: value / 255 and value % 255. The "/" operator represents divison, and the "%" operator represents (basically) taking the remainder of division.
Before proceeding, I'm going to point out that 255 here should almost certainly be 256, so let's make that correction before proceeding. The reason for this correction should become clear soon.
Let's look at an example.
value = 1000
print(value / 256) -- 3.90625
print(value % 256) -- 232
Whoops! There was another problem. string.char wants integers (in the range of 0 to 255 -- which has 256 distinct values counting 0), and we may be given it a non-integer. Let's fix that problem:
value = 1000
print(math.floor(value / 256)) -- 3
-- in Lua 5.3, you could also use value // 256 to mean the same thing
print(value % 256) -- 232
What have we done here? Let's look 1000 in binary. Since we are working with two-byte values, and each byte is 8 bits, I'll include 16 bits: 0b0000001111101000. (0b is a prefix that is sometimes used to indicate that the following number should be interpreted as binary.) If we split this into the first 8 bits and the second 8 bits, we get: 0b00000011 and 0b11101000. What are these numbers?
print(tonumber("00000011",2)) -- 3
print(tonumber("11101000",2)) -- 232
So what we have done is split a 2-byte number into two 1-byte numbers. So why does this work? Let's go back to base 10 for a moment. Suppose we have a four-digit number, say 1234, and we want to split it into two two-digit numbers. Well, the quotient 1234 / 100 is 12, and the remainder of that divison is 34. In Lua, that's:
print(math.floor(1234 / 100)) -- 12
print(1234 % 100) -- 34
Hopefully, you can understand what's happening in base 10 pretty well. (More math here is outside the scope of this answer.) Well, what about 256? 256 is 2 to the power of 8. And there are 8 bits in a byte. In binary, 256 is 0b100000000 -- it's a 1 followed by a bunch of zeros. That means it a similar ability to split binary numbers apart as 100 did in base 10.
Another thing to note here is the concept of endianness. Which should come first, the 3 or the 232? It turns out that different computers (and different protocols) have different answers for this question. I don't know what is correct in your case, you'll have to refer to your documentation. The way you are currently set up is called "big endian" because the big part of the number comes first.
--[[What is bit32.rshift actually doing to the string? and the same is true for the next line with bit32.band.]]
Let's look at this whole loop:
local out = {}
for i = 1,8 do
local statusBit = bit32.rshift(res:byte(10), i - 1)
out[#out + 1] = bit32.band(statusBit, 1)
end
And let's pick a concrete number for the sake of example, say, 0b01100111. First let's lookat the band (which is short for "bitwise and"). What does this mean? It means line up the two numbers and see where two 1's occur in the same place.
01100111
band 00000001
-------------
00000001
Notice first that I've put a bunch of 0's in front of the one. Preceeding zeros don't change the value of the number, but I want all 8 bits for both numbers so that I can check each digit (bit) of the first number with each digit of the second number. In each place where there both numbers had a 1 (the top number had a 1 "and" the bottom number had a 1), I put a 1 for the result, otherwise I put 0. That's bitwise and.
When we bitwise and with 0b00000001 as we did here, you should be able to see that we will only get a 1 (0b00000001) or a 0 (0b00000000) as the result. Which we get depends on the last bit of the other number. We have basically separated out the last bit of that number from the rest (which is often called "masking") and stored it in our out array.
Now what about the rshift ("right shift")? To shift right by one, we discard the rightmost digit, and move everything else over one space the the right. (At the left, we usually add a 0 so we still have 8 bits ... as usual, adding a bit in front of a number doesn't change it.)
right shift 01100111
\\\\\\\\
0110011 ... 1 <-- discarded
(Forgive my horrible ASCII art.) So shifting right by 1 changes our 0b01100111 to 0b00110011. (You can also think of this as chopping off the last bit.)
Now what does it mean to shift right be a different number? Well to shift by zero does not change the number. To shift by more than one, we just repeat this operation however many times we are shifting by. (To shift by two, shift by one twice, etc.) (If you prefer to think in terms of chopping, right shift by x is chopping off the last x bits.)
So on the first iteration through the loop, the number will not be shifted, and we will store the rightmost bit.
On the second iteration through the loop, the number will be shifted by 1, and the new rightmost bit will be what was previously the second from the right, so the bitwise and will mask out that bit and we will store it.
On the next iteration, we will shift by 2, so the rightmost bit will be the one that was originally third from the right, so the bitwise and will mask out that bit and store it.
On each iteration, we store the next bit.
Since we are working with a byte, there are only 8 bits, so after 8 iterations through the loop, we will have stored the value of each bit into our table. This is what the table should look like in our example:
out = {1,1,1,0,0,1,1,0}
Notice that the bits are reversed from how we wrote them 0b01100111 because we started looking from the right side of the binary number, but things are added to the table starting on the left.
In your case, it looks like each bit has a distinct meaning. For example, a 1 in the third bit could mean that sensor1 was on and a 0 in the third bit could mean that sensor1 was off. Eight different pieces of information like this were packed together to make it more efficient to transmit them over some channel. The loop separates them again into a form that is easy for you to use.
When objdump -S my_program, usually I can see the following indirect jmp instruction, which is typically used for switch/case jump table:
ffffffff802e04d3: ff 24 d5 38 2e 10 81 jmpq *-0x7eefd1c8(,%rdx,8)
How to understand the address -0x7eefd1c8? It means the table's base address is 0xffffffff802e04d3 - 0x7eefd1c8?
Also, how can I get -0x7eefd1c8 from ff 24 d5 38 2e 10 81?
decoding issue
Than you have to look at Intel Development Manuals
ff is JMP opcode (Jump near, absolute indirect) [1]
24 is a ModR/M byte [2] which means that SIB byte goes after it (JMP opcode has only one operand, so register field is ignored)
d5 is a SIB byte [2] which means that disp32 goes after that, Scale = 8 and Index = %rdx (in 32-bit mode, Index should be %edx) with no base.
38 2e 10 81 is a 4-byte disp32 operand. If you encode it as double word, you will get 0x81102e38 note that highest bit is set to 1. It is a sign bit meaning that value is encoded in Two’s Complement Encoding [3].
Converting from two's complement gives us expected number:
>>> print hex(0x81102e38 - (1 << 32))
-0x7eefd1c8
When processor executes that instruction in 64-bit mode, it reads 8 bytes from 0xffffffff81102e38 + (%rdx * 8) (original number is sign-extended) and puts that quad word into %rip [4].
References to manuals:
Vol.2A, section 3.2 INSTRUCTIONS (A-M), page 3-440
Vol 2, section 2.1.5 Addressing-Mode Encoding of ModR/M and SIB Bytes, pages 2-6..2-7
Vol 1, section 4.2.1.2 Signed Integers, page 4-4
Vol 2, section 2.2.1.3 Displacement, page 2-11
On the heart rate measurement characteristics:
http://developer.bluetooth.org/gatt/characteristics/Pages/CharacteristicViewer.aspx?u=org.bluetooth.characteristic.heart_rate_measurement.xml
EDIT
Link is now at
https://www.bluetooth.com/specifications/gatt/characteristics/
and look for "heart rate measurement".
They no longer offer an XML viewer, but instead you need to view XML directly.
Also for services it's on this page.
END EDIT
I want to make sure I'm reading it correctly. Does that actually says 5 fields? The mandatory, C1, C2, C3, and C4? And the mandatory is at the first byte, and C4 is at the last two bytes, C1 and C2 are 8-bit fields, and C3 to C4 are 16-bit each. That's a total of 8 bytes. Am I reading this document correctly?
EDIT:
I'm informed that the mandatory flag fields indicate something is 0, it means it's just not there. For example, if the first bit is 0, C1 is the next field, if 1, C2 follows instead.
END EDIT
In Apple's OSX heart rate monitor example:
- (void) updateWithHRMData:(NSData *)data
{
const uint8_t *reportData = [data bytes];
uint16_t bpm = 0;
if ((reportData[0] & 0x01) == 0)
{
/* uint8 bpm */
bpm = reportData[1];
}
else
{
/* uint16 bpm */
bpm = CFSwapInt16LittleToHost(*(uint16_t *)(&reportData[1]));
}
... // I ignore rest of the code for simplicity
}
It checks the first bit as zero, and if it isn't, it's changing the little endianness to whatever the host byte order is, by applying CFSwapInt16LittleToHost to reportData[1].
How does that bit checking work? I'm not entirely certain of endianess. Is it saying that whether it's little or big, the first byte is always the mandatory field, the second byte is the C1, etc? And since reportData is an 8-bit pointer (typedef to unsigned char), it's checking either bit 0 or bit 8 of the mandatory field.
If that bit is bit 8, the bit is reserved for future use, why is it reading in there?
If that bit is 0, it's little-endian and no transformation is required? But if it's little-endian, the first bit could be 1 according to the spec, 1 means "Heart Rate Value Format is set to UINT16. Units: beats per minute (bpm)", couldn't that be mis-read?
I don't understand how it does the checking.
EDIT:
I kept on saying there was C5, that was a blunder. It's up to C4 only and I edited above.
Am I reading this document correctly?
IMHO, you are reading it a little wrong.
C1 to C4 should be read as Conditional 1 to Conditional 4. And in the table for org.bluetooth.characteristic.heart_rate_measurement, if the lowest bit of the flag byte is 0, then C1 is met, otherwise, C2 is.
You can think it a run-time configurable union type in the C programming language(, which is determined by the flag. Beware this is not always true because the situation got complicated by C3 and C4).
// Note: this struct is only for you to better understand a simplified case.
// You should still stick to the profile documentations to implement.
typedef struct {
uint8_t flag;
union {
uint8_t bpm1;
uint16_t bpm2;
}bpm;
} MEASUREMENT_CHAR;
How does that bit checking work?
if ((reportData[0] & 0x01) == 0) effectively checks the bit with bitwise AND operator. Go and find a C/C++ programming intro book if any doubt.
The first byte is always the flag, in this case. The value of flag dynamically determines how should the rest of the bytes should be dealt with. C3 and C4 are both optional, and can be omitted if the corresponding bits in the flag were set zeroes. C1 and C2 are mutual exclusive.
There is no endianness ambiguity in the Bluetooth standard, as it has been well addressed that little-endian should be used all the time. You should always assume that those uint16_t fields are transferred as little endian. Apple's precaution is just to reassure the most portability of the code, since they would not guarantee the endianness of architectures used in their future products.
I see how it goes. It's not testing for Endianness. Rather, it's testing for whether the field is 8 bit or 16 bit, and in the case of 16 bit, it'll convert from little endianness to host order. But I see that before conversion and after conversion it's the same number. So I guess the system is little endian to begin with so I don't know what's the point.
so this book "assembly language step by step" is really awesome, but it was sort of cryptic about how two's complement works when working on actual memory and register data. along with that, i'm not sure how signed values are represented in memory either, which i feel might be what's keeping me confused. anywho...
it says: "-1 = $FF, -2 = $FE and so on". now i understand that the two's complement of a number is itself multiplied by -1 and when added to the original will give you 0. so, FF is the hex equivalent of 11111111 in binary, and 255 in decimal. so my question is: what's the book saying when it says "-1 = $FF"? does it mean that -255 + -1 will give you 0 but also, which it didn't explicitly, set the OF flag?
so in practice... let's say we have 11h, which is 17 in decimal, and 00100001 in binary. and this value is in AL.
so then we NEG AL, and this will set the CF and SF, and change the value in AL to... 239 in decimal, 11101111 in binary, or EFh? i just don't see how that would be 17 * -1? or is that just a poorly worded explanation by the book, where it really means that it gives you the value you would need to cause an overflow?
thanks!
In two's complement, for bytes, (-x) == (256 - x) == (~x + 1). (~ is C'ish for the NOT operator, which flips all the bits in its operand.)
Let's say we have 11h.
100h - 11h == EFh
(256 - 17 == 239)
Note, the 256 works with bytes, cause they're 8 bits in size. For 16-bit words you'd use 2^16 (65536), for dwords 2^32. Also note that all math is mod 256 for bytes, 65536 for shorts, etc.
Or, using not/+1,
~11h = EEh
+1... EFh
This method works for words of all sizes.
what's the book saying when it says "-1 = $FF"?
If considering a byte only, the two's complement of 1 is 0xff (or $FF if using that format for hex numbers).
To break it down, the complement (or one's complement) of 1 is 0xfe, then you add 1 to get the two's complement: 0xff
Similarly for 2: the complement is 0xfd, add 1 to get the two's complement: 0xfe
Now let's look at 17 decimal. As you say, that's 0x11. The complement is 0xee, and the two's complement is 0xef - all that agrees with what you stated in your question.
Now, experiment with what happens when you add the numbers together. First in decimal:
17 + (-17) == 0
Now in hex:
0x11 + 0xef == 0x100
Since we're dealing with numeric objects that are only a byte in size, the 1 in 0x100 is discarded (some hand waving here...), and we result in:
0x11 + 0xef == 0x00
To deal with the 'hand waving' (I probably won't do this in an understandable manner, unfortunately): since the overflow flag (OF or sometimes called V for reasons that I don't know) is the same as the carry flag (C) the carry can be ignored (it's an indication that signed arithmetic occurred correctly). One way to think of it that's probably not very precise, but I find useful, is that leading ones in a negative two's complement number are 'the same as' leading zeros in a non-negative two's complement number.