Haskell FFI for C recursive struct and union - haskell

I am trying to write Haskell FFI binding for some C structs. An example is below:
typedef struct s0{int a;
union{unsigned char b;
struct s0*c;
struct{unsigned char d[1];
}; };}*S;
My question is how to write the binding for it in chs (for c2hs) or hsc (for hsc2hs) format? I looked into the tutorials for c2hs but either didn't get enough information, or didn't understand it in the way, that would have helped me write the chs file for above definition.
I can generate haskell bindings using HSFFIG tool but it uses custom access method HSFFIG.FieldAccess.FieldAccess to define bindings. I prefer to write bindings that use core haskell FFI libraries, not third-party libraries.
Hence, this question about how to write a binding for recursive struct above in hsc format, or chs format that uses only core FFI libraries.
The actual definition is more complex, but once I figure out how to write above struct definition for c2hs or hsc2hs tools, I can go from there. I know Storable instances need to be defined for inner union and struct as well, but I don't know how to write wrappers for recursive definition like above. Especially, how do you access inside struct/union from outer struct? I looked into HSFFIG definitions, but the access methods are HSFFIG defined access methods. So, I was unable to figure out how to translate it to chs definition that uses only core FFI libraries.
The questions I have seen in StackOverflow seem to be about simpler definitions. If there is a similar answer somewhere else, I will appreciate pointers.

You can't magic up an equiv data structure in either c2hs or hsc2hs. However, you can do your own marshaling in c2hs with only a little work.
data MyType = Next MyType | MyChar Char | MyString String | MyEnd
Then use hsc2hs' newtype pointer functionality to declare a pointer for MyType (ie. s0). Then, write an explicit function using hsc2hs' accessors to recursively walk your structure and build up your Haskell structure. At each step you test if you've hit a null pointer and if so return a MyEnd (or, depending on the data encoding, just check if the int indicating the type in the union is negative one or whatever), otherwise proceed to parse out whatever you have, and if that thing is a pointer, proceed recursively.
You could do nearly the same thing with hsc2hs as well.

Related

Why do I need "use rand::Rng" to call gen() on rand::thread_rng()?

When I'm using Rust's rand crate, if I want to produce a rand number, I would write:
use rand::{self, Rng};
let rand = rand::thread_rng().gen::<usize>();
If I don't use rand::Rng, an error occurs:
no method named gen found for struct rand::prelude::ThreadRng in the current scope
That's quite different from what I'm used to. Usually I treat mods like:
import rand from "path";
rand.generate();
Once I import the mod I don't need to import something else, and I can use every method it exports.
Why must I use rand::Rng to enable the gen method on rand::thread_rng()?
That's quite different from what I used to know.
It feels different because it is indeed different. You are probably used to dynamic dispatch via some kind of virtual method table (as in e.g. C++), or, in case of JS, to dynamic dispatch by looking up either the own properties of the receiver object, or its ancestors via the __proto__-chain. In any case, the object on which you are invoking a method carries around some data that tells it how to get the method that you're invoking. Given the signature of the invoked method, the receiver object itself knows how to get the method with that signature.
That's not the only way, though. For example,
modules / functors in OCaml or SML
Typeclasses in Haskell
implicits / givens in Scala
traits in Rust
work on a rather different principle: the methods are not tied to the receiver, but to the module / typeclass / given / trait instances. In each case, those are entities that are separate from the receiver of the method call. It opens some new possibilities, e.g. it allows you to do some ad-hoc polymorphism (i.e. to define instances of traits after the fact, for types that are not necessarily under your control). At the same time, the compiler typically requires a bit more information from you in order to be able to select the correct instances: it behaves somewhat like a little type-directed search engine, or even a little "theorem prover", and for this to work, you have to tell the compiler where to look for the suitable building blocks for those synthetically generated instances.
If you've never worked before with any language that has a compiler with a subsystem that is "searching for instances" based on type information, this should indeed feel quite foreign. The error messages and the solution approaches do indeed feel rather different, because instead of comparing your implementation against an interface and looking for conflicts, you have to guide this instance-searching mechanism by providing more hints (e.g. by importing more traits etc.).
In your particular case, rand::thread_rng returns a struct ThreadRng. On its own, the struct knows nothing about the gen method, because this method is not tied directly to the struct. Instead, it's defined in the Rng trait. But at the same time, it could be defined in some entirely unrelated trait, and have some completely different meaning. In order to disambiguate the intended meaning, you therefore have to explicitly specify that you want to work with the Rng trait. This is why you have to mention it in the use-clause.
I don't know the specific library you're using, but I can guess at the problem. I would guess that Rng is a trait which defines gen. Traits can be thought of as somewhat like Java's interfaces: they enable ad-hoc polymorphism by allowing you to define different behaviors for the same function on different datatypes.
However, Rust's traits fix one major problem (well, they fix several major problems, but one that's relevant here) with Java's interfaces. In Java, if you define an interface, then anyone writing a class can implement the interface, but you can't implement it for other people. In particular, the built-in types String and int and the like can never implement any new interfaces downstream. In Rust, either the trait writer or the struct/enum writer can implement the trait.
But this poses another issue. Now, if I have a value foo of type Foo and I write foo.bar(), then bar might not be a method defined on Foo; it might be something some trait writer implemented in some other file. We can't go search every Rust file on your computer for possible matching traits, so Rust makes the logical decision to restrict this search to traits that are in scope. If you want to call foo.bar() and bar is a method on trait Bar, then trait Bar has to be in scope when you call it. Otherwise, Rust won't see it.
So, in your case, thread_rng() returns a rand::prelude::ThreadRng. The method gen is not defined on rand::prelude::ThreadRng. Instead, it's defined on a trait called rand::Rng which is *implemented by ThreadRng. That trait has to be in-scope to use the method.

Why do some struct types let us set members that can only be a certain value?

I was reading up on some vulkan struct types, this is one of many examples, but the one I will use is vkInstanceCreateInfo. The documentation states:
The VkInstanceCreateInfo structure is defined as:
typedef struct VkInstanceCreateInfo {
VkStructureType sType;
const void* pNext;
VkInstanceCreateFlags flags;
const VkApplicationInfo* pApplicationInfo;
uint32_t enabledLayerCount;
const char* const* ppEnabledLayerNames;
uint32_t enabledExtensionCount;
const char* const* ppEnabledExtensionNames;
} VkInstanceCreateInfo;
Then below in the options we see:
sType is the type of this structure
sType must be VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO
If we dont have any options anyway, why is this parameter not just set implicitly upon creation of the type?
Note: I realise this is not something specific to the vulkan API.
Update: I'm not just talking specifically about vulkan, just all parameters that can only be a certain type.
The design allows structures to be chained together so that extensions can create additional parameters to existing calls without interfering with the original API structures and without interfering with each other.
Nearly every struct in Vulkan has sType as it's first member, and pNext as it's second member. That means that if you have a void* and all you know is that it is some kind of Vulkan API struct, you can safely read the first 32 bits and it will be a VkStructureType and read the next 32 or 64 bits and it will tell you if there's another structure in the chain.
So for instance, there's a VkMemoryAllocateInfo structure for allocating memory that has (aside from sType and pNext the size of the allocation and the heap index it should come from. But what if I want to use the "dedicated allocation" extension. Then I also need to fill out a VkMemoryDedicatedAllocateInfo structure with extra information. But I still need to call the same vkAllocateMemory function that only takes a VkMemoryAllocateInfo... so where do I put the VkMemoryDedicatedAllocateInfo structure I filled out? I put a pointer to it in the pNext field of VkMemoryAllocateInfo.
Maybe I also want to share this memory with some OpenGL code. There's an extension that lets you do that, but you need to fill out a VkExportMemoryAllocateInfo structure and pass it in during the allocation as well. Well, I can do that by putting it in the pNext field of my VkMemoryDedicatedAllocateInfo structure. I can create a chain of structures like that as long as I want.
Here's the really important part. Since all structures have sType as their first field, an extension can navigate along this chain of structures and find the ones it cares about without knowing anything about the structures other than that they always start with sType and pNext.
All of this means that Vulkan can be extended in ways that alter the behavior of existing functions, but without changing the function itself, or the structures that are passed to it.
You might ask why all of the core structures have sType and pNext, even though you're passing them to functions with typed pointers, rather than void pointers. The reason is consistency, and because you never know when an existing structure might be needed as part of the chain for some new extension.
If we dont have any options anyway, why is this parameter not just set implicitly upon creation of the type?
Because C isn't C++. There's no way to declare a structure in C and say that this portion of the structure will always have this value. In C++ you can, by declaring something as const and providing the initial default value. In fact, one of the things I like about the Vulkan C++ bindings is that you can basically forget about sType forever. If you're using extensions you still need to populate pNext as appropriate.

How can I write binding to a C function that expects an open file handle in Rust?

I've toyed with writing library bindings in Rust before, and it wasn't difficult. Now, however, I'm stuck: I'm trying to write a binding for librsync, and some of its functions expect you to pass an open file handle (a FILE* in C).
For primitive types, there's a straightforward way to pass them into C, (the same for pointers to primitive types). And, to be clear, I'm aware that the libc crate implements fopen, which in turn gives me a mut FILE* (which would eventually do the job). However, I was wondering if there is something in the Rust standard library that gives me a FILE* to pass to librsync — maybe something analog to std::ffi::CString.
You could of course use a RawFd, transmute and call libc::funcs::posix88::stdio::fdopen(_, mode) with it. That would be highly unportable though.
cfile (crate docs) looks like a lightweight wrapper around libcs FILE* that implements the io::Read/Write/Seek traits.

Resolve union structure in Rust FFI

I have problem with resolving c-union structure XEvent.
I'm experimenting with Xlib and X Record Extension in Rust. I'm generate ffi-bindings with rust-bindgen. All code hosted on github alxkolm/rust-xlib-record.
Trouble happen on line src/main.rs:106 when I try extract data from XEvent structure.
let key_event: *mut xlib::XKeyEvent = event.xkey();
println!("KeyPress {}", (*key_event).keycode); // this always print 128 on any key
My program listen key events and print out keycode. But it is always 128 on any key I press. I think this wrong conversion from C union type to Rust type.
Definition of XEvent starts here src/xlib.rs:1143. It's the c-union. Original C definition here.
Code from GitHub can be run by cargo run command. It's compile without errors.
What I do wrong?
Beware that rustbindgen generates binding to C union with as much safety as in C; as a result, when calling:
event.xkey(); // gets the C union 'xkey' field
There is no runtime check that xkey is the field currently containing a value.
This is because since C does not have tagged union (ie, the union knowing which field is currently in use), developers came up with various ways of encoding this information (*), the two that I know of being:
an external supplier; typically another field of the structure right before the union
the first field of each of the structures in the union
Here, you are in the latter case int type; is the first field of the union and each nested structure starts with int _type; to denote this. As a result, you need a two-steps approach:
consult type()
depending on the value, call the correct reinterpretation
The mapping from the value of type to the actual field being used should be part of the documentation of the C library, hopefully.
I invite you to come up with a wrapper around this low-level union that will make it safer to retrieve the result. At the very least, you could check it is the right type in the accessor; the full approach being to come up with a Rust enum that would wrap proxies to all the fields and allow pattern-matching.
(*) and actually sometimes disregard it altogether, for example in C99 to reinterpret a float as an int a union { float f; int i; } can be used.

Does D have 'newtype'?

Does D have 'newtype' (as in Haskell).
It's a naive question, as I'm just skimming D, but Google didn't turn up anything useful.
In Haskell this is a way of making different types of the same thing distinct at compile time, but without incurring any runtime performance penalties.
e.g. you could make newtypes (doubles) for metres, seconds and kilograms. This would error at compile time if your program added a quantity in metres to a quantity in seconds, but would be just as fast at runtime as if both were doubles (which they are at runtime).
If D doesn't have something analogous to 'newtype', what are the accepted methods for dealing with dimensioned quantities?
Thanks,
Chris.
In D1.0 there is typedef, which is the strong typing from a predefined type to a 'newtype.'
D2.0 has removed this and only alias remains (what typedef is in C). There is talk about having a wrapper template that can strongly create a new type.
The issue with typedef was that there were good arguments for making the newtype a sub-type of the predefined type, and also good arguments for making it a super-type.
The semantics of typedef are that the base type is implicitly converted to the newtype, but the newtype is not converted to the base type or other types with the same base type. I am using base type here since:
typedef int Fish;
typedef Fish Cat;
Fish gold = 1;
Cat fluff = gold;
Will fail to compile.
And as of right now, 2.048 DMD still allows the use of typedef (but don't use it).
Having the base type convert to the newtype is useful so you don't have to write
meters = cast(meters) 12.7;
Funny, as he_the_great mentions, D1 had a strong typedef but noone used it, possibly because it was impossible to customize the exact semantics for each case. Possibly the simplest way to handle this situation, at least for primitive types, is to include a mixin template somewhere in Phobos that allows you to forward all operators but have the boilerplate to do this automatically generated via the mixin. Then you'd just create a wrapper struct and be all set.

Resources