How to map a structure from a buffer like in C with a pointer and cast - node.js

In C, I can define many structures, including structures of structures.
Given a buffer, I can point a structure pointer at its start to say that the buffer represents this structure.
Of course, I do not want to copy anything, just map the structure onto the buffer; otherwise I lose the speed benefit.
Is this possible in Node.js? How can I do it? How can I be sure it is a mapping, and not a new object with the data copied into it?
Example:
struct House = {
uint8 door,
uint16BE kitchen,
etc...
}
var mybuff = Buffer.alloc(10, 0)
var MyHouse = new House(mybuff) // same as `House* MyHouse = (House*) mybuff`
console.log(MyHouse.door) // will display the value of door
console.log(MyHouse.kitchen) // will display the value of kitchen with BE function.
This is not valid code, but it explains what I am looking for.
All of this should happen without copying anything.
And if I do MyHouse.door=56, mybuff now contains the 56. I consider mybuff to be a pointer.

Edit after question update below
Unlike C/C++, JavaScript uses pointers by default for objects, so you don't have to do anything. It's actually the other way around: you have to put in some effort if you want a copy of an object.
In C, a struct is nothing more than a compile-time reference to different parts of data in the struct. So:
struct X {
int foo;
int bar;
};
is nothing more than saying: if you want bar from a variable with type X, just add the length of foo (length of int) to the base pointer.
In Javascript, we do not even have such a type. We can just say:
var x = {
foo: 1,
bar: 2
}
The lookup of bar goes through a pointer (we call them references in JavaScript). Because JavaScript is dynamically typed and objects have no fixed compile-time layout, you can view an object as a map/dictionary whose values are references to mixed types.
If, for any reason, you want a copy of a data structure, you have to walk the entire structure (recursively) and build the copy yourself. The primitive types are not reference based; they are copied by value. These include number (JavaScript automatically differentiates between int and float under the hood), string and boolean.
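A minimal sketch of the difference between sharing a reference and making a manual recursive copy, as described above. The helper name deepCopy is hypothetical, and it only handles plain objects, arrays and primitives:

```javascript
// Hypothetical helper: recursively copies plain objects and arrays.
// Primitives (number, string, boolean) are returned as-is, since they are
// copied by value anyway.
function deepCopy(value) {
  if (Array.isArray(value)) {
    return value.map(deepCopy);
  }
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const key of Object.keys(value)) {
      out[key] = deepCopy(value[key]);
    }
    return out;
  }
  return value;
}

const x = { foo: 1, bar: { baz: 2 } };

const alias = x;          // no copy: alias and x refer to the same object
alias.bar.baz = 42;       // visible through x as well

const copy = deepCopy(x); // independent copy of the whole structure
copy.bar.baz = 7;         // does not affect x

console.log(x.bar.baz);    // 42
console.log(copy.bar.baz); // 7
```

Assigning an object never copies it; only the explicit walk in deepCopy produces independent data.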
Edit after question update
Although I am not an expert in this area, I do not think it is possible to overlay a plain JavaScript object on a buffer. The problem is that the underlying data representation (how the data is laid out as bytes in memory) is different, because JavaScript has no compile-time information about data structures. As I said before, JavaScript has no structs with a fixed memory layout, just objects with fields, which basically behave like (and may be implemented as) maps/dictionaries.
There are, however, some third party libraries to cope with these problems. There are two general approaches:
1. Unpack everything to JavaScript objects. The data will be copied, but you can work with it as normal JavaScript objects. You should use this if you read/write the data intensively, because the performance increase you get when working with normal JavaScript objects outweighs the advantage of not having to unpack the data. Link to example library
2. Leave all data in the buffer. When you need some of the data, compute the location of the data in the buffer at runtime, and read/write at this location accordingly. Because the struct data location computations are done at runtime, you should use this only when you have loads of data and only a few reads/writes to it. In this case the performance decrease of unpacking all data outweighs the few runtime computations that have to be done. Link to example library
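The second approach can be sketched by hand with Node's built-in Buffer accessors, without any third-party library. This is only an illustration: the House class and its field offsets (door as a uint8 at offset 0, kitchen as a big-endian uint16 at offset 1, no padding) are assumptions based on the example in the question, not the API of any particular library.

```javascript
// Hypothetical view class: it keeps a reference to the buffer and computes
// each field's location on every access. No field data is copied.
class House {
  constructor(buf, base = 0) {
    this.buf = buf;   // reference to the caller's buffer, not a copy
    this.base = base; // byte offset where this "struct" starts
  }

  // uint8 at offset 0 (assumed layout)
  get door() { return this.buf.readUInt8(this.base); }
  set door(v) { this.buf.writeUInt8(v, this.base); }

  // big-endian uint16 at offset 1 (assumed layout, no padding)
  get kitchen() { return this.buf.readUInt16BE(this.base + 1); }
  set kitchen(v) { this.buf.writeUInt16BE(v, this.base + 1); }
}

const mybuff = Buffer.alloc(10, 0);
const myHouse = new House(mybuff);

myHouse.door = 56;        // writes straight into mybuff
myHouse.kitchen = 0x1234; // stored as bytes 0x12, 0x34

console.log(mybuff[0]);            // 56
console.log(mybuff[1], mybuff[2]); // 18 52

mybuff[0] = 7;             // a change in the buffer...
console.log(myHouse.door); // ...shows up through the view: 7
```

Because the view only stores a reference to the buffer, writes through either side are visible on the other, which is one way to check that nothing was copied. The cost is an offset computation and a function call on every field access.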
As a side note, if the amount of data you have to process isn't that large, I'd recommend just unpacking the data. It saves you the headache of having to use the library as the interface to your data. Computers are fast enough nowadays to copy/process a moderate amount of data in memory. Also, these third-party libraries are just examples. I recommend you do a little more research on libraries to decide which one suits your needs.

Related

Extra [ ] Operator Added When Using 2 overloaded [ ] Operators

As a side project I'm writing a couple of classes to do matrix operations and operations on linear systems. The class LinearSystem holds pointers to Matrix class objects in a std::map. The Matrix class itself holds the 2d array ("matrix") in a float ** (a pointer to pointers to float). I wrote two overloaded [ ] operators, one to return a pointer to a matrix object directly from the LinearSystem object, and another to return a row (float *) from a matrix object.
Both of these operators work perfectly on their own. I'm able to use LinearSystem["keyString"] to get a matrix object pointer from the map. And I'm able to use Matrix[row] to get a row (float *) and Matrix[row][col] to get a specific float element from a matrix object's 2d array.
The trouble comes when I put them together. My limited understanding (rising senior CompSci major) tells me that I should have no problem using LinearSystem["keyString"][row][col] to get an element from a specific array within the Linear system object. The return types should look like LinearSystem->Matrix->float *->float. But for some reason it only works when I place an extra [0] after the overloaded key operator so the call looks like this: LinearSystem["keyString"][0][row][col]. And it HAS to be 0, anything else and I segfault.
Another interesting thing to note is that CLion sees ["keyString"] as overloaded and [row] as overloaded, but not [0], as if it's calling the standard index operator; what it is being called on is the question that has me puzzled. LinearSystem["keyString"] is for sure returning Matrix *, which only has an overloaded [ ] operator. See the attached screenshot.
screenshot
Here's the code, let me know if more is needed.
LinearSystem [ ] and map declaration:
Matrix *myNameSpace::LinearSystem::operator[](const std::string &name) {
return matrices[name];
}
std::map<std::string, Matrix *> matrices;
Matrix [ ] and array declaration:
inline float *myNameSpace::Matrix::operator[](const int row) {
return elements[row];
}
float **elements;
Note, the above function is inline'd because I'm challenging myself to make the code as fast as possible and even with compiler optimizations, the overloaded [ ] was 15% to 30% slower than using Matrix.elements[row].
Please let me know if any more info is needed. This is my first post, so I'm sure it's not perfect.
Thank you!
You're writing C in C++. You need to not do that. It adds complexity. Those stars you keep putting after your types are raw pointers, and they should almost always be avoided. linearSystem["foo"] is a Matrix*, i.e. a pointer to Matrix. It is not a Matrix. Pointers index as arrays (p[0] is the same as *p), so linearSystem["foo"][0] treats the Matrix* as an array and gets its first element. Since there's actually only one element, this works out and seems to do what you want; any other index reads past that single Matrix, which is why it segfaults. If that sounds confusing, that's because it is.
Make sure you understand ownership semantics. Raw pointers are frowned upon because they don't convey any ownership information. If you want a value to be owned by the data structure, it should be a T directly (or a std::unique_ptr<T> if you need runtime polymorphism), not a T*. If you're taking a function argument or returning a value that you don't want to pass ownership of, you should take/return a const T&, or a T& if you need to modify the referent.
I'm not sure exactly what your data looks like, but a reasonable representation of a Matrix class (if you're going for speed) is as an instance variable of type std::vector<double>. std::vector is a managed array of double. You can preallocate the array to whatever size you want and change it whenever you want. It also plays nice with Rule of Three/Five.
I'm not sure what LinearSystem is meant to represent, but I'm going to assume the matrices are meant to be owned by the linear system (i.e. when the system is freed, the matrices should have their memory freed as well), so I'm looking at something like
std::map<std::string, Matrix> matrices;
Again, you can wrap Matrix in a std::unique_ptr if you plan to inherit from it and do dynamic dispatch. Though I might question your design choices if your Matrix class is intended to be subclassed.
There's no reason for Matrix::operator[] to return a raw pointer to float (in fact, "raw pointer to float" is a pretty pointless type to begin with). I might suggest having two overloads.
float myNameSpace::Matrix::operator[](int row) const;
float& myNameSpace::Matrix::operator[](int row);
Likewise, LinearSystem::operator[] can have two overloads: one constant and one mutable.
const Matrix& myNameSpace::LinearSystem::operator[](const std::string& name) const;
Matrix& myNameSpace::LinearSystem::operator[](const std::string& name);
References (T& as opposed to T*) are smart and will effectively dereference when needed, so you can call Matrix::operator[] on a Matrix&, whereas you can't call that on a Matrix* without acknowledging the layer of indirection.
There's a lot of bad C++ advice out there. If the book / video / teacher you're learning from is telling you to allocate float** everywhere, then it's a bad book / video / teacher. C++-managed data is going to be far less error-prone and will perform comparably to raw pointers (the C++ compiler is smarter than either you or me when it comes to optimization, so let it do its thing).
If you do find yourself really feeling the need to go low-level and allocate raw pointers everywhere, then switch to C. C is a language designed for low-level pointer manipulation. C++ is a higher-level managed-memory language; it just so happens that, for historical reasons, C++ has several hundred footguns in the form of C-style allocations placed sporadically throughout the standard.
In summary: Modern C++ almost never uses T*, new, or delete. Start getting comfortable with smart pointers (std::unique_ptr and std::shared_ptr) and references. You'll thank yourself later.

Create a map using type as key

I need a HashMap<K,V> where V is a trait (it will likely be Box or an Rc or something, that's not important), and I need to ensure that the map stores at most one of a given struct, and more importantly, that I can query the presence of (and retrieve/insert) items by their type. K can be anything that is unique to each type (a uint would be nice, but a String or even some large struct holding type information would be sufficient as long as it can be Eq and Hashable)
This is occurring in a library, so I cannot use an enum or such since new types can be added by external code.
I looked into std::any::TypeId but besides not working for non-'static types, it seems they aren't even unique (and allegedly collisions were achieved accidentally with a rather small number of types) so I'd prefer to avoid them if feasible since the number of types I'll have may be very large. (hence this is not a duplicate of this IMO)
I'd like something along the lines of a macro to ensure uniqueness but I can't figure out how to have some kind of global compile time counter. I could use a proper UUID, but it'd be nice to have guaranteed uniqueness since this is, in theory at least, statically determinable.
It is safe to assume that all relevant types are defined either in this lib or in a singular crate that directly depends on it, if that allows for a solution that might be otherwise impossible.
e.g. my thoughts are to generate ids for types in the lib, and also export a constant of the counter, which can be used by the consumer of the lib in the same macro (or a very similar one) but I don't see a way to have such a const value modified by const code in multiple places.
Is this possible or do I need some kind of build script that provides values before compile time?

Why do some struct types let us set members that can only be a certain value?

I was reading up on some Vulkan struct types. There are many examples, but the one I will use is VkInstanceCreateInfo. The documentation states:
The VkInstanceCreateInfo structure is defined as:
typedef struct VkInstanceCreateInfo {
VkStructureType sType;
const void* pNext;
VkInstanceCreateFlags flags;
const VkApplicationInfo* pApplicationInfo;
uint32_t enabledLayerCount;
const char* const* ppEnabledLayerNames;
uint32_t enabledExtensionCount;
const char* const* ppEnabledExtensionNames;
} VkInstanceCreateInfo;
Then below in the options we see:
sType is the type of this structure
sType must be VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO
If we don't have any options anyway, why is this parameter not just set implicitly upon creation of the type?
Note: I realise this is not something specific to the vulkan API.
Update: I'm not just talking specifically about vulkan, just all parameters that can only be a certain type.
The design allows structures to be chained together so that extensions can create additional parameters to existing calls without interfering with the original API structures and without interfering with each other.
Nearly every struct in Vulkan has sType as its first member and pNext as its second member. That means that if you have a void* and all you know is that it points to some kind of Vulkan API struct, you can safely read the first 32 bits as a VkStructureType, then read the pointer that follows (32 or 64 bits wide, after any alignment padding) to find out whether there's another structure in the chain.
So for instance, there's a VkMemoryAllocateInfo structure for allocating memory that has (aside from sType and pNext) the size of the allocation and the memory type index it should come from. But what if I want to use the "dedicated allocation" extension? Then I also need to fill out a VkMemoryDedicatedAllocateInfo structure with extra information. But I still need to call the same vkAllocateMemory function that only takes a VkMemoryAllocateInfo... so where do I put the VkMemoryDedicatedAllocateInfo structure I filled out? I put a pointer to it in the pNext field of VkMemoryAllocateInfo.
Maybe I also want to share this memory with some OpenGL code. There's an extension that lets you do that, but you need to fill out a VkExportMemoryAllocateInfo structure and pass it in during the allocation as well. Well, I can do that by putting it in the pNext field of my VkMemoryDedicatedAllocateInfo structure. I can create a chain of structures like that as long as I want.
Here's the really important part. Since all structures have sType as their first field, an extension can navigate along this chain of structures and find the ones it cares about without knowing anything about the structures other than that they always start with sType and pNext.
All of this means that Vulkan can be extended in ways that alter the behavior of existing functions, but without changing the function itself, or the structures that are passed to it.
You might ask why all of the core structures have sType and pNext, even though you're passing them to functions with typed pointers, rather than void pointers. The reason is consistency, and because you never know when an existing structure might be needed as part of the chain for some new extension.
If we don't have any options anyway, why is this parameter not just set implicitly upon creation of the type?
Because C isn't C++. There's no way to declare a structure in C and say that this portion of the structure will always have this value. In C++ you can, by declaring something as const and providing the initial default value. In fact, one of the things I like about the Vulkan C++ bindings is that you can basically forget about sType forever. If you're using extensions you still need to populate pNext as appropriate.

golang write struct as raw data

I am working on a new type of database, using Go. One of the things I would like to do is have a distributed disk so that I can distribute queries over multiple machines (think Pi type architectures). This means building my own structures on raw disk.
My challenge is that I can't find a Go package that will let me write N bytes from a pointer to a structure. All the IO packages limit the access to []byte slices.
That's nice for protection, but if I have to buffer everything through a byte array via some form of encoding it will slow down the access to a specific object.
Anyone got any idea on how to do raw IO? Or am I going to have to handle GOBs as my unit of IO and suffer the penalty for encoding/decoding?
Big warning first: don't do it. It is neither safe nor portable.
For a given struct, you can reflect over it to figure out the in-memory size of the actual struct, then unsafely cast it to a []byte using unsafe.
eg: (*[in-mem size]byte)(unsafe.Pointer(&mystruct))
This will give you something C-ish with absolutely no safety guarantees or portability.
I'll quote the Go spec:
A package using unsafe must be vetted manually for type safety and may not be portable.
You can find a lot more details in this Go and Memory layout post, including all the steps you need to unsafely treat structs as just bytes.
Overall, it's fascinating to examine how Go functions on a low level, but this is absolutely the wrong thing to do in your case. Any real data infrastructure will need storage logic way more complicated than just dumping in-memory structs to disk anyway.
In general, you cannot do raw IO of a Go struct (i.e. memdump). This is because many things in Go contain pointers, and the actual data is not contiguous in memory.
For example, a struct like this:
type Person struct {
Name string
}
contains a string, which in turn contains a pointer to the bytes of the string. A raw memdump would only dump the pointer.
The solution is serialization. This is never free, although some implementations do a pretty good job.
The closest to what you are describing is something like go-memdump, but I wouldn't recommend it for production.
Otherwise, I recommend looking at a performant serialization technique. (Go's gob encoding is not the best.)
...Or am I going to have to handle GOBs as my unit of IO and suffer the penalty for encoding/decoding?
Just use GOBs.
Premature optimization is the root of all evil.

Is it always preferable to pass in a mutable reference vs creating and returning an owned value?

Coming to Rust from dynamic languages like Python, I'm not used to the programming pattern where you provide a function with a mutable reference to an empty data structure and that function populates it. A typical example is reading a file into a String:
use std::fs::File;
use std::io::Read; // brings read_to_string into scope

let mut f = File::open("file.txt").unwrap();
let mut contents = String::new();
f.read_to_string(&mut contents).unwrap();
To my Python-accustomed eyes, an API where you just create an owned value within the function and move it out as a return value looks much more intuitive / ergonomic / what have you:
let mut f = File::open("file.txt").unwrap();
let contents = f.read_to_string().unwrap();
Since the Rust standard library takes the former road, I figure there must be a reason for that.
Is it always preferable to use the reference pattern? If so, why? (Performance reasons? What specifically?) If not, how do I spot the cases where it might be beneficial? Is it mostly useful when I want to return another value in addition to populating the result data structure (as in the first example above, where .read_to_string() returns the number of bytes read)? Why not use a tuple? Is it simply a matter of personal preference?
If read_to_string wanted to return an owned String, this means it would have to heap allocate a new String every time it was called. Also, because Read implementations don't always know how much data there is to be read, it would probably have to incrementally re-allocate the work-in-progress String multiple times. This also means every temporary String has to go back to the allocator to be destroyed.
This is wasteful. Rust is a systems programming language. Systems programming languages abhor waste.
Instead, the caller is responsible for allocating and providing the buffer. If you only call read_to_string once, nothing changes. If you call it more than once, however, you can re-use the same buffer multiple times without the constant allocate/resize/deallocate cycle. Although it doesn't apply in this specific case, similar interfaces can be designed to also support stack buffers, meaning in some cases you can avoid heap activity entirely.
Having the caller pass the buffer in is strictly more flexible than the alternative.

Resources