Is there a convenient way to represent x86 instructions in a struct or other language feature? - rust

Rust doesn't have a "bit" data type; however, x86 instructions have fields whose sizes are measured in bits. Instead of using bit-wise operations, is there any data structure that can be compiled directly to the "memory/byte alignment" required by the x86 instruction set, or by any binary protocol?
OpCode: 1 or 2 bytes
Mod-R/M: 0 or 1 byte
    Mod: bits 7-6
    Reg/OpCode: bits 5-3
    R/M: bits 2-0
SIB: 0 or 1 byte
    SS: bits 7-6
    Index: bits 5-3
    Base: bits 2-0
Displacement: 0, 1, 2, or 4 bytes
Immediate: 0, 1, 2, or 4 bytes

is there any data structure that can be directly compiled
No, there are no structures that correspond to this:
OpCode: 1 or 2 bytes
That is, you cannot have a struct that has a value that is either one or two bytes long. Structures have a fixed size at compile time.
Your main choices are:
Use pretty Rust features like enums and structs. This is likely to not match the bit pattern of the actual instructions.
Make something like struct Instruction([u8; 4]) and implement methods that use bitwise operations. This will allow you to match the bit patterns.
Since you don't want to use bitwise operations and must match the bit representation, I do not believe your problem can currently be solved in the fashion you'd like.
Personally, I'd probably go the enum route and implement methods to parse the raw instructions from a type implementing Read, and to write them back to bytes.
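For illustration, a heavily simplified sketch of that enum route (the instruction set here is reduced to two real one-byte opcodes, 0x90 NOP and 0xC3 RET; a real decoder would be far more involved):

use std::io::{self, Read};

// Hypothetical, heavily simplified instruction enum for illustration only.
#[derive(Debug)]
enum Instruction {
    Nop,
    Ret,
    Unknown(u8),
}

impl Instruction {
    // Parse a single instruction from anything implementing Read.
    fn parse<R: Read>(r: &mut R) -> io::Result<Instruction> {
        let mut op = [0u8; 1];
        r.read_exact(&mut op)?;
        Ok(match op[0] {
            0x90 => Instruction::Nop,
            0xC3 => Instruction::Ret,
            other => Instruction::Unknown(other),
        })
    }

    // Encode back to the raw byte representation.
    fn to_bytes(&self) -> Vec<u8> {
        match self {
            Instruction::Nop => vec![0x90],
            Instruction::Ret => vec![0xC3],
            Instruction::Unknown(b) => vec![*b],
        }
    }
}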
It's also possible you are interested in bitfields, like this C++ example:
struct S {
    unsigned int b : 3;
};
There is no direct support for that in Rust, but a few crates appear to support macros to create them. Perhaps that would be useful.
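Alternatively, the same 3-bit field can be hand-rolled in Rust by hiding the bitwise operations behind accessor methods; a minimal sketch:

// Rough Rust equivalent of `unsigned int b : 3;`, with the shifts and
// masks hidden behind methods instead of language-level bitfields.
#[derive(Default)]
struct S(u8);

impl S {
    fn b(&self) -> u8 {
        self.0 & 0b111 // low 3 bits
    }

    fn set_b(&mut self, value: u8) {
        assert!(value < 8, "value must fit in 3 bits");
        self.0 = (self.0 & !0b111) | value;
    }
}

fn main() {
    let mut s = S::default();
    s.set_b(5);
    assert_eq!(s.b(), 5);
}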

Related

How do I cast the struct to a char pointer?

Effectively I want to make an SPI interface where I'll be able to change bits 18-22 and bits 1-16 separately (I want a one-hot address on bits 1-16 and a binary-coded decimal on bits 18-22). Here's how I intend to implement the struct:
struct spi_out
{
    unsigned int BCDADDR : 4;
    unsigned int OHADDR : 16;
    // Some other SPI bit addresses making up the rest of the 3 bytes
};
So here's my problem: I want to be able to access the BCD address and encode it directly, e.g. spi_out.bcd = 5 to address the 6th cell. I also want to use the operator function to format the bits the way I need them, since I need the variables in the order I put them in, and I can't figure out a simple way of doing this: I wouldn't want to have to put an LUT inside an operator function, but I need to be able to cast the string of bits to a char pointer so the 3 bytes of information from the function can be fed to the hardware abstraction function HAL_SPI_Transmit(). I know the data is kept as 3 bytes, so I don't see why I can't access it as such. >:/
Okay, so I have come to appreciate that my question was worded in a confusing way, but I have actually found an answer to my own question: use the union keyword. This means I can define a union, i.e. create a type that can be treated either as the struct (to access the individual bit fields) or as an array of 4 chars. I did not realize this existed, but here is a link to the Stack Overflow question where I found my answer:
accessing bit fields in structure
Sorry, guys.
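Rust has no language-level bitfields, so the closest analogue of this union trick there is packing the fields with explicit shifts into a byte array; a rough, illustrative sketch (field positions taken loosely from the question, names made up):

// Rough Rust analogue: instead of a union over a bitfield struct, pack the
// fields with shifts and hand the resulting bytes to the transmit routine.
// Field positions follow the question (one-hot address in bits 1-16, BCD
// value starting at bit 18) and are illustrative only.
struct SpiOut {
    bcd_addr: u8, // 4-bit BCD value
    oh_addr: u16, // one-hot address, 16 bits
}

impl SpiOut {
    fn to_bytes(&self) -> [u8; 3] {
        let word: u32 = ((self.bcd_addr as u32 & 0xF) << 18) | ((self.oh_addr as u32) << 1);
        // Little-endian split into the 3 bytes the hardware expects.
        [word as u8, (word >> 8) as u8, (word >> 16) as u8]
    }
}

fn main() {
    let frame = SpiOut { bcd_addr: 5, oh_addr: 1 << 5 }.to_bytes();
    // `frame` could now be passed to something like HAL_SPI_Transmit via FFI.
    println!("{frame:02X?}");
}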

FFI Primitive Type Size

Assume I have a C/C++ header with a type definition like this:
typedef int WORD;
And a function like this:
WORD test(WORD input);
Now, as I understand it, an int in C/C++ can have a different size depending on the platform.
If I now link Rust code to a DLL with said function, can my FFI break because of a differently sized primitive type?
How can I guard against that?
int in C is guaranteed to have a size of at least 2 bytes; the exact size is implementation-defined.
The C ABI is generally stable for a given operating system and architecture.
Rust doesn't have integer types that vary in size between architectures (except isize and usize). That's a design choice, but you can use type aliases that are guaranteed to correspond to the target system's C ABI.
use libc::c_int;
The C type aliases from the libc crate are exactly what you need.
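For instance, a binding for the function from the question might look like this (test and the WORD typedef come from the question; linking against the actual DLL is assumed, and the call in main is just illustrative):

use libc::c_int;

// `WORD` is a typedef for `int`, so `c_int` matches it on every platform.
extern "C" {
    fn test(input: c_int) -> c_int;
}

fn main() {
    // Calling a foreign function is unsafe; the declaration above must
    // match the signature exported by the DLL.
    let out = unsafe { test(1) };
    println!("test returned {out}");
}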
But I would recommend not relying on types that can have a variable size; instead, use fixed-width integer types like int32_t in C and i32 in Rust. Coding correctly with different sizes of primitive types in mind can sometimes be challenging.
EDIT: By "byte" I mean a byte of exactly 8 bits, not a byte as defined by the C standard.

Rust seems to allocate the same space in memory for an array of booleans as an array of 8 bit integers

Running this code in Rust:
fn main() {
    println!("{:?}", std::mem::size_of::<[u8; 1024]>());
    println!("{:?}", std::mem::size_of::<[bool; 1024]>());
}
prints:
1024
1024
This is not what I expected, so I compiled and ran in release mode, but I got the same answer.
Why does the Rust compiler seemingly allocate a whole byte for each single boolean? To me it seems like a simple optimization to only allocate 128 bytes instead. This project implies I'm not the first to think this.
Is this a case of compilers being way harder than they seem? Or is this not optimized because it isn't a realistic scenario? Or am I not understanding something here?
Pointers and references.
There are two assumptions at play:
1. You can always take a reference to an item of a slice, a field of a struct, etc.
2. Any reference to an instance of a statically sized type can be transmuted to a type-erased pointer *mut ().
Those two assumptions together mean that:
- due to (2), it is not possible to create a "bit-reference" which would allow sub-byte addressing, and
- due to (1), it is not possible to simply not have references.
This essentially means that any type must have a minimum alignment of one byte.
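A quick way to observe those guarantees:

fn main() {
    // Every bool occupies a whole, byte-aligned unit of storage...
    assert_eq!(std::mem::size_of::<bool>(), 1);
    assert_eq!(std::mem::align_of::<bool>(), 1);

    // ...which is what makes references like this possible: a &bool must
    // point at an addressable byte, not at an individual bit.
    let flags = [true, false, true];
    let middle: &bool = &flags[1];
    println!("{middle}");
}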
Note that this is not necessarily an issue. Opting in to a 128-byte representation should be done cautiously, as it implies trading off speed (and convenience) for memory. It's not a pure win.
Prior art (in the form of std::vector<bool> in C++) is widely considered a mistake in hindsight.
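If the memory really matters, the packed representation has to be opted into explicitly, for example via a crate such as bitvec, or hand-rolled; a minimal sketch of the latter:

// Minimal hand-rolled bit array: 1024 flags in 128 bytes, at the cost of a
// shift and mask on every access, and no way to hand out &bool references.
struct BitArray {
    bytes: [u8; 128],
}

impl BitArray {
    fn new() -> Self {
        BitArray { bytes: [0; 128] }
    }

    fn get(&self, i: usize) -> bool {
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }

    fn set(&mut self, i: usize, value: bool) {
        if value {
            self.bytes[i / 8] |= 1 << (i % 8);
        } else {
            self.bytes[i / 8] &= !(1 << (i % 8));
        }
    }
}

fn main() {
    let mut bits = BitArray::new();
    bits.set(1000, true);
    assert!(bits.get(1000));
    assert_eq!(std::mem::size_of::<BitArray>(), 128);
}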

What does Int use three bits for? [duplicate]

Why is GHC's Int type not guaranteed to use exactly 32 bits of precision? This document claims it has at least 30-bit signed precision. Is it somehow related to fitting Maybe Int or similar into 32 bits?
It is to allow implementations of Haskell that use tagging. When using tagging you need a few bits as tags (at least one, two is better). I'm not sure there currently are any such implementations, but I seem to remember Yale Haskell used it.
Tagging can somewhat avoid the disadvantages of boxing, since you no longer have to box everything; instead the tag bit will tell you if it's evaluated etc.
The Haskell language definition states that the type Int covers at least the range [-2^29, 2^29 - 1].
There are other compilers/interpreters that use this property to boost the execution time of the resulting program.
All internal references to (aligned) Haskell data point to memory addresses that are a multiple of 4 (8) on 32-bit (64-bit) systems. So references only need 30 bits (61 bits), which leaves 2 (3) bits for "pointer tagging".
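The trick itself is language-agnostic; here is a toy sketch of low-bit pointer tagging (in Rust, matching the rest of this page), assuming 4-byte-aligned addresses:

// Toy illustration of low-bit pointer tagging: if every address is a
// multiple of 4, the two low bits are always zero and can carry a tag
// (e.g. "evaluated / unevaluated", or a constructor index).
fn tag_ptr(addr: usize, tag: usize) -> usize {
    assert_eq!(addr % 4, 0, "address must be 4-byte aligned");
    assert!(tag < 4, "only 2 spare bits available");
    addr | tag
}

fn untag_ptr(tagged: usize) -> (usize, usize) {
    (tagged & !0b11, tagged & 0b11)
}

fn main() {
    let tagged = tag_ptr(0x1000, 0b10);
    assert_eq!(untag_ptr(tagged), (0x1000, 0b10));
}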
In the case of data, GHC uses those tags to store information about the referenced data, i.e. whether the value is already evaluated and, if so, which constructor it has.
In the case of 30-bit Ints (so, not GHC), you could use one bit to decide whether a word is a pointer to an unevaluated Int or the Int itself.
Pointer tagging could also be used for one-bit reference counting, which can speed up the garbage collection process. That can be useful in cases where a direct one-to-one producer-consumer relationship is created at runtime: it would result directly in memory reuse instead of feeding the garbage collector.
So, using 2 bits for pointer tagging, there could be some wild combination of intense optimisation...
In case of Ints I could imagine these 4 tags:
a singular reference to an unevaluated Int
one of many references to the same possibly still unevaluated Int
30 bits of that Int itself
a reference (one of possibly many) to an evaluated 32-bit Int.
I think this is because of early ways to implement GC and all that stuff. If you have 32 bits available and you only need 30, you could use those two spare bits to implement interesting things, for instance using a zero in the least significant bit to denote a value and a one for a pointer.
Today the implementations don't use those bits so an Int has at least 32 bits on GHC. (That's not entirely true. IIRC one can set some flags to have 30 or 31 bit Ints)

