Guaranteeing struct layout for wasm in vector of structs - rust

When using a vector of #[wasm_bindgen] structs from JavaScript, is there a guarantee that the order of the struct's fields is maintained, so that the bytes in wasm memory can be correctly interpreted in JS? I would like to keep a vector of structs in Rust and deterministically reconstruct those structs on the JS side without serializing across the wasm boundary with serde. I have the following Rust code:
#[wasm_bindgen]
pub struct S {
    a: u8,
    b: u16,
}

#[wasm_bindgen]
pub struct Container {
    ss: Vec<S>,
}

#[wasm_bindgen]
impl Container {
    pub fn new() -> Self {
        let ss = (0..10_u8)
            .map(|i| S {
                a: i,
                b: (2 * i).into(),
            })
            .collect();
        Self { ss }
    }

    pub fn items_ptr(&self) -> *const S {
        self.ss.as_ptr()
    }

    pub fn item_size(&self) -> usize {
        std::mem::size_of::<S>()
    }

    pub fn buffer_len(&self) -> usize {
        self.ss.len() * self.item_size()
    }
}
Now, on the JS side, I have the following:
import { memory } from "rust_wasm/rust_wasm.wasm";
import * as wasm from "rust_wasm";
const container = wasm.Container.new();
const items = new Uint8Array(memory.buffer, container.items_ptr(), container.buffer_len());
function getItemBytes(n) {
  const itemSize = container.item_size();
  const start = n * itemSize;
  const end = start + itemSize;
  return items.slice(start, end);
}
With the above code, I can obtain the (four, due to alignment) bytes that comprise an S in JS. Now, my question: how do I interpret those bytes to reconstruct the S (the fields a and b) in JS? In theory, I'd like to be able to say something like the following:
const itemBytes = getItemBytes(3);
const a = itemBytes[0];
const b = (itemBytes[1] << 8) + itemBytes[2];
But of course this relies on the fact that the layout in memory of an instance of S matches its definition, that u16 is big endian, etc. Is it possible to get Rust to enforce these properties so that I can safely interpret bytes in JS?
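A minimal sketch of the direction I have in mind, assuming #[repr(C)] is enough here: it guarantees declaration-order layout and C-style padding, so with this definition a sits at offset 0 and b occupies bytes 2–3, and wasm linear memory is always little-endian:
#[wasm_bindgen]
#[repr(C)] // fixes the layout: a at offset 0, one padding byte, b at offset 2, total size 4
pub struct S {
    a: u8,
    b: u16, // stored little-endian in wasm linear memory
}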

Related

Understanding wasm-bindgen returned objects memory management

I'm trying to return a typed object from Rust to TypeScript, and ideally I don't want to have to manually manage memory (performance is not the highest priority). While doing this, I'm trying to understand the generated JS.
Rust:
#[wasm_bindgen]
pub fn retjs() -> JsValue {
    // in my actual project I serialize a struct with `JsValue::from_serde`
    JsValue::from_str("")
}

#[wasm_bindgen]
pub fn retstruct() -> A {
    A {
        a: "".to_owned(),
    }
}

#[wasm_bindgen]
pub struct A {
    a: String,
}

#[wasm_bindgen]
impl A {
    #[wasm_bindgen]
    pub fn foo(&self) -> String {
        return self.a.clone();
    }
}
Generated JS:
export function retjs() {
  const ret = wasm.retjs();
  return takeObject(ret);
}

function takeObject(idx) {
  const ret = getObject(idx);
  dropObject(idx);
  return ret;
}

function dropObject(idx) {
  if (idx < 36) return;
  heap[idx] = heap_next;
  heap_next = idx;
}

export function retstruct() {
  const ret = wasm.retstruct();
  return A.__wrap(ret);
}

// generated class A omitted for brevity; the relevant point here is that it has a `free()` method.
Is my understanding correct that with the JsValue the memory is freed completely automatically, and that I have to do this manually with the struct? Is there a way to get around that?
I basically just want type safety in TypeScript, so that when I update the struct in Rust the TypeScript code is automatically updated.
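For reference, a minimal sketch of the JsValue route mentioned in the comment above (assuming wasm-bindgen's serde-serialize feature, or the serde-wasm-bindgen crate, plus a serde::Serialize derive). The returned value is a plain JS object owned by the JS garbage collector, so no free() is needed, but the generated .d.ts types it as any rather than as a concrete class:
use serde::Serialize;
use wasm_bindgen::prelude::*;

// Hypothetical serializable counterpart of `A`; the field name becomes the JS property name.
#[derive(Serialize)]
pub struct AData {
    a: String,
}

#[wasm_bindgen]
pub fn ret_a_as_js() -> JsValue {
    let value = AData { a: "".to_owned() };
    // Copies the data into a plain JS object; nothing to free on the Rust side afterwards.
    JsValue::from_serde(&value).unwrap()
}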

Derive macro generation

I'm making my own Serializable trait in the context of a client/server system.
My idea is that the messages sent by the system form an enum defined by the user of the system, so it can be customized as needed.
To ease implementing the trait on the enum, I would like to use #[derive(Serializable)], as the implementation is always the same.
Here is the trait:
pub trait NetworkSerializable {
    fn id(&self) -> usize;
    fn size(&self) -> usize;
    fn serialize(self) -> Vec<u8>;
    fn deserialize(id: usize, data: Vec<u8>) -> Self;
}
Now, I've tried to look at the book (this one too) and this example to try to wrap my head around derive macros, but I'm really struggling to understand them and how to implement them. I've read about token streams and abstract syntax trees, and I think I understand the basics.
Let's take the example of the id() method: it should give a unique id for each variant of the enum, so that message headers can tell which message is incoming.
Let's say I have this enum as the message system:
enum NetworkMessages {
    ErrorMessage,
    SpawnPlayer(usize, bool, Transform), // player id, is_mine, position
    MovePlayer(usize, Transform),        // player id, new_position
    DestroyPlayer(usize),                // player_id
}
Then, the id() function should look like this:
fn id(&self) -> usize {
    match self {
        NetworkMessages::ErrorMessage => 0,
        NetworkMessages::SpawnPlayer(..) => 1,
        NetworkMessages::MovePlayer(..) => 2,
        NetworkMessages::DestroyPlayer(..) => 3,
    }
}
Here was my attempt at writing this using a derive macro:
#[proc_macro_derive(NetworkSerializable)]
pub fn network_serializable_derive(input: TokenStream) -> TokenStream {
    // Construct a representation of Rust code as a syntax tree
    // that we can manipulate
    let ast = syn::parse(input).unwrap();
    // Build the trait implementation
    impl_network_serializable_macro(&ast)
}

fn impl_network_serializable_macro(ast: &syn::DeriveInput) -> TokenStream {
    // get enum name
    let ref name = ast.ident;
    let ref data = ast.data;
    let (id_func, size_func, serialize_func, deserialize_func) = match data {
        // Only if data is an enum, we do parsing
        Data::Enum(data_enum) => {
            // Iterate over enum variants
            let mut id_func_internal = TokenStream2::new();
            let mut variant_id: usize = 0;
            for variant in &data_enum.variants {
                // add the branch for the variant
                id_func_internal.extend(quote_spanned! {
                    variant.span() => &variant_id,
                });
                variant_id += 1;
            }
            (id_func_internal, (), (), ())
        }
        _ => { (TokenStream2::new(), (), (), ()) },
    };
    let expanded = quote! {
        impl NetworkSerializable for #name {
            // variant_checker_functions gets replaced by all the functions
            // that were constructed above
            fn size(&self) -> usize {
                match &self {
                    #id_func
                }
            }
            /*
            #size_func
            #serialize_func
            #deserialize_func
            */
        }
    };
    expanded.into()
}
This is generating quite a lot of errors, with "proc macro NetworkSerializable not expanded: no proc macro dylib present" being the first. So I'm guessing there's a lot of misunderstanding on my part here.
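For comparison, a minimal sketch (assuming the same syn, quote and proc-macro2 dependencies as the attempt above) of how the id() arms could be generated: the discriminant has to be interpolated with #, and each variant needs a pattern that ignores its fields:
fn impl_id_func(name: &syn::Ident, data_enum: &syn::DataEnum) -> proc_macro2::TokenStream {
    // One `Enum::Variant { .. } => <index>` arm per variant, in declaration order.
    let arms = data_enum.variants.iter().enumerate().map(|(i, variant)| {
        let variant_name = &variant.ident;
        quote::quote! { #name::#variant_name { .. } => #i, }
    });
    quote::quote! {
        fn id(&self) -> usize {
            match self {
                #( #arms )*
            }
        }
    }
}
The "no proc macro dylib present" error itself is usually about the macro crate not being built as a proc-macro library (proc-macro = true under [lib] in Cargo.toml) or not having been compiled yet, rather than about the generated tokens.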

Implementing Strategy pattern in rust without knowing which strategy are we using at compile time

I've been trying to implement the Strategy pattern in Rust, but I'm having trouble understanding how to make it work.
So let's imagine we have the traits Adder and Element:
pub trait Element {
    fn to_string(&self) -> String;
}

pub trait Adder {
    type E: Element;
    fn add(&self, a: &Self::E, b: &Self::E) -> Self::E;
}
And we have two implementations StringAdder with StringElements and UsizeAdder with UsizeElements:
// usize
pub struct UsizeElement {
    pub value: usize,
}

impl Element for UsizeElement {
    fn to_string(&self) -> String {
        self.value.to_string()
    }
}

pub struct UsizeAdder {}

impl Adder for UsizeAdder {
    type E = UsizeElement;
    fn add(&self, a: &UsizeElement, b: &UsizeElement) -> UsizeElement {
        UsizeElement { value: a.value + b.value }
    }
}

// String
pub struct StringElement {
    pub value: String,
}

impl Element for StringElement {
    fn to_string(&self) -> String {
        self.value.to_string()
    }
}

pub struct StringAdder {}

impl Adder for StringAdder {
    type E = StringElement;
    fn add(&self, a: &StringElement, b: &StringElement) -> StringElement {
        let a: usize = a.value.parse().unwrap();
        let b: usize = b.value.parse().unwrap();
        StringElement {
            value: (a + b).to_string(),
        }
    }
}
And I want to write code that uses the Adder trait's methods and its corresponding elements without knowing at compile time which strategy is going to be used.
fn main() {
    let policy = "usize";
    let element = "1";
    let adder = get_adder(&policy);
    let element_a = get_element(&policy, element);
    let result = adder.add(&element_a, &element_a);
}
To simplify, I'm going to assign a string to policy and element, but normally those would be read from a file.
Is dynamic dispatch the only way to implement get_adder and get_element? And, by extension, should I define the Adder and Element traits to use trait objects and/or the Any trait?
Edit: Here is what I've managed to figure out so far.
One possible implementation uses match to pin down the concrete types for the compiler.
fn main() {
    let policy = "string";
    let element = "1";
    let secret_key = "5";
    let result = cesar(policy, element, secret_key);
    dbg!(result.to_string());
}

fn cesar(policy: &str, element: &str, secret_key: &str) -> Box<dyn Element> {
    match policy {
        "usize" => {
            let adder = UsizeAdder {};
            let element = UsizeElement { value: element.parse().unwrap() };
            let secret_key = UsizeElement { value: secret_key.parse().unwrap() };
            Box::new(cesar_impl(&adder, &element, &secret_key))
        }
        "string" => {
            let adder = StringAdder {};
            let element = StringElement { value: element.to_string() };
            let secret_key = StringElement { value: secret_key.to_string() };
            Box::new(cesar_impl(&adder, &element, &secret_key))
        }
        _ => {
            panic!("Policy not supported!")
        }
    }
}

fn cesar_impl<A>(adder: &A, element: &A::E, secret_key: &A::E) -> A::E
where
    A: Adder,
    A::E: Element,
{
    adder.add(&element, &secret_key)
}
However, the issue is that I have to wrap every function I want to implement in a match to determine the concrete types, with an arm for every available policy.
That does not seem like the proper way of implementing it, as it will bloat the code and make it more error-prone and less maintainable, unless I end up using macros.
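A sketch of one way to confine that match to a single place (hypothetical wrapper enums around the adders and elements defined above); every other function can then take the wrappers directly:
// Hypothetical wrappers; the type check happens once, inside `AnyAdder::add`.
enum AnyAdder {
    Usize(UsizeAdder),
    Str(StringAdder),
}

enum AnyElement {
    Usize(UsizeElement),
    Str(StringElement),
}

impl AnyAdder {
    fn add(&self, a: &AnyElement, b: &AnyElement) -> AnyElement {
        match (self, a, b) {
            (AnyAdder::Usize(ad), AnyElement::Usize(x), AnyElement::Usize(y)) => {
                AnyElement::Usize(ad.add(x, y))
            }
            (AnyAdder::Str(ad), AnyElement::Str(x), AnyElement::Str(y)) => {
                AnyElement::Str(ad.add(x, y))
            }
            _ => panic!("mismatched element types"),
        }
    }
}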
Edit 2: Here you can find an example using dynamic dispatch. However I'm not convinced it's the proper way to implement the solution.
Example using dynamic dispatch
Thank you for your help :)

How to return a string (or similar) from Rust in WebAssembly?

I created a small Wasm file from this Rust code:
#[no_mangle]
pub fn hello() -> &'static str {
    "hello from rust"
}
It builds and the hello function can be called from JS:
<!DOCTYPE html>
<html>
  <body>
    <script>
      fetch('main.wasm')
        .then(response => response.arrayBuffer())
        .then(bytes => WebAssembly.instantiate(bytes, {}))
        .then(results => {
          alert(results.instance.exports.hello());
        });
    </script>
  </body>
</html>
My problem is that the alert displays "undefined". If I return an i32, it works and displays the i32. I also tried to return a String, but it does not work (it still displays "undefined").
Is there a way to return a string from Rust in WebAssembly? What type should I use?
WebAssembly only supports a few numeric types, which are all that can be returned via an exported function.
When you compile to WebAssembly, your string will be held in the module's linear memory. In order to read this string from the hosting JavaScript, you need to return a reference to its location in memory, and the length of the string, i.e. two integers. This allows you to read the string from memory.
You use this same technique regardless of which language you are compiling to WebAssembly. How can I return a JavaScript string from a WebAssembly function provides detailed background on the problem.
With Rust specifically, you need to make use of the Foreign Function Interface (FFI), using the CString type as follows:
use std::ffi::CString;
use std::os::raw::c_char;

static HELLO: &'static str = "hello from rust";

#[no_mangle]
pub fn get_hello() -> *mut c_char {
    let s = CString::new(HELLO).unwrap();
    s.into_raw()
}

#[no_mangle]
pub fn get_hello_len() -> usize {
    HELLO.len()
}
The above code exports two functions, get_hello which returns a reference to the string, and get_hello_len which returns its length.
With the above code compiled to a wasm module, the string can be accessed as follows:
const res = await fetch('chip8.wasm');
const buffer = await res.arrayBuffer();
const module = await WebAssembly.compile(buffer);
const instance = await WebAssembly.instantiate(module);

// obtain the module memory
const linearMemory = instance.exports.memory;

// create a buffer starting at the reference to the exported string
const offset = instance.exports.get_hello();
const stringBuffer = new Uint8Array(linearMemory.buffer, offset,
  instance.exports.get_hello_len());

// create a string from this buffer
let str = '';
for (let i = 0; i < stringBuffer.length; i++) {
  str += String.fromCharCode(stringBuffer[i]);
}
console.log(str);
The C equivalent can be seen in action in a WasmFiddle.
You cannot directly return a Rust String or an &str. Instead, allocate and return a raw pointer to the bytes, which then have to be decoded into a JS string on the JavaScript side.
You can take a look at the SHA1 example here.
The functions of interest are in
demos/bundle.js - copyCStr
demos/sha1/sha1-digest.rs - digest
For more examples: https://www.hellorust.com/demos/sha1/index.html
Most examples I saw copy the string twice: first on the WASM side, into a CString or by shrinking the Vec to its capacity, and then on the JS side while decoding the UTF-8.
Given that we often use WASM for the sake of speed, I sought to implement a version that reuses the Rust vector.
use std::collections::HashMap;

/// Byte vectors shared with JavaScript.
///
/// A map from payload's memory location to `Vec<u8>`.
///
/// In order to deallocate memory in Rust we need not just the memory location but also its size.
/// In case of strings and vectors the freed size is the capacity.
/// Keeping the vector around allows us not to change its capacity.
///
/// Not thread-safe (assuming that we're running WASM from the single JavaScript thread).
static mut SHARED_VECS: Option<HashMap<u32, Vec<u8>>> = None;

extern "C" {
    fn console_log(rs: *const u8);
    fn console_log_8859_1(rs: *const u8);
}

#[no_mangle]
pub fn init() {
    unsafe { SHARED_VECS = Some(HashMap::new()) }
}

#[no_mangle]
pub fn vec_len(payload: *const u8) -> u32 {
    unsafe {
        SHARED_VECS
            .as_ref()
            .unwrap()
            .get(&(payload as u32))
            .unwrap()
            .len() as u32
    }
}

pub fn vec2js<V: Into<Vec<u8>>>(v: V) -> *const u8 {
    let v = v.into();
    let payload = v.as_ptr();
    unsafe {
        SHARED_VECS.as_mut().unwrap().insert(payload as u32, v);
    }
    payload
}

#[no_mangle]
pub extern "C" fn free_vec(payload: *const u8) {
    unsafe {
        SHARED_VECS.as_mut().unwrap().remove(&(payload as u32));
    }
}

#[no_mangle]
pub fn start() {
    unsafe {
        console_log(vec2js(format!("Hello again!")));
        console_log_8859_1(vec2js(b"ASCII string." as &[u8]));
    }
}
And the JavaScript part:
(function (iif) {
  function rs2js (mod, rs, utfLabel = 'utf-8') {
    const view = new Uint8Array (mod.memory.buffer, rs, mod.vec_len (rs))
    const utf8dec = new TextDecoder (utfLabel)
    const utf8 = utf8dec.decode (view)
    mod.free_vec (rs)
    return utf8}

  function loadWasm (cache) {
    // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WebAssembly/instantiateStreaming
    WebAssembly.instantiateStreaming (fetch ('main.wasm', {cache: cache ? "default" : "no-cache"}), {env: {
      console_log: function (rs) {if (window.console) console.log ('main]', rs2js (iif.main, rs))},
      console_log_8859_1: function (rs) {if (window.console) console.log ('main]', rs2js (iif.main, rs, 'iso-8859-1'))}
    }}) .then (results => {
      const exports = results.instance.exports
      exports.init()
      iif.main = exports
      iif.main.start()})}

  // Hot code reloading.
  if (window.location.hostname == '127.0.0.1' && window.location.port == '43080') {
    window.setInterval (
      function() {
        // Check if the WASM was updated.
        fetch ('main.wasm.lm', {cache: "no-cache"}) .then (r => r.text()) .then (lm => {
          lm = lm.trim()
          if (/^\d+$/.test (lm) && lm != iif.lm) {
            iif.lm = lm
            loadWasm (false)}})},
      200)
  } else loadWasm (true)
} (window.iif = window.iif || {}))
The trade-off here is that we're using a HashMap in the WASM, which might increase the module size unless a HashMap is already required.
An interesting alternative would be to use the tables to share the (payload, length, capacity) triplet with the JavaScript and get it back when it is time to free the string. But I don't know how to use the tables yet.
P.S. Sometimes we don't want to allocate the Vec in the first place.
In this case we can move the memory tracking to JavaScript:
extern "C" {
fn new_js_string(utf8: *const u8, len: i32) -> i32;
fn console_log(js: i32);
}
fn rs2js(rs: &str) -> i32 {
assert!(rs.len() < i32::max_value() as usize);
unsafe { new_js_string(rs.as_ptr(), rs.len() as i32) }
}
#[no_mangle]
pub fn start() {
unsafe {
console_log(rs2js("Hello again!"));
}
}
(function (iif) {
  function loadWasm (cache) {
    WebAssembly.instantiateStreaming (fetch ('main.wasm', {cache: cache ? "default" : "no-cache"}), {env: {
      new_js_string: function (utf8, len) {
        const view = new Uint8Array (iif.main.memory.buffer, utf8, len)
        const utf8dec = new TextDecoder ('utf-8')
        const decoded = utf8dec.decode (view)
        let stringId = iif.lastStringId
        while (typeof iif.strings[stringId] !== 'undefined') stringId += 1
        if (stringId > 2147483647) {  // Can't easily pass more than that through WASM.
          stringId = -2147483648
          while (typeof iif.strings[stringId] !== 'undefined') stringId += 1
          if (stringId > 2147483647) throw new Error ('Out of string IDs!')}
        iif.strings[stringId] = decoded
        return iif.lastStringId = stringId},
      console_log: function (js) {
        if (window.console) console.log ('main]', iif.strings[js])
        delete iif.strings[js]}
    }}) .then (results => {
      iif.main = results.instance.exports
      iif.main.start()})}

  loadWasm (true)
} (window.iif = window.iif || {strings: {}, lastStringId: 1}))
Return String from Rust fn to ReactApp
TLDR:
Add use wasm_bindgen::prelude::*; to lib.rs.
Use JsValue as the return type of the fn.
Return JsValue::from_str("string") from the fn.
Create Rust Library for Function
mkdir ~/hello-from-rust-demo \
cd ~/hello-from-rust-demo \
cargo new --lib hello-wasm \
cargo add wasm-bindgen \
code ~/hello-from-rust-demo/hello-wasm/src/lib.rs
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn hello(name: &str) -> JsValue {
    JsValue::from_str(&format!("Hello from rust, {}!", name))
}
cargo install wasm-pack \
wasm-pack build --target web
Create React App to Demo Rust Function
cd ~/hello-from-rust-demo \
yarn create react-app hello \
cd hello \
yarn add ../hello-wasm/pkg \
code ~/hello-from-rust-demo/hello/src/App.js
App.js
import init, { hello } from 'hello-wasm';
import { useState, useEffect } from 'react';

function App() {
  const [hello, setHello] = useState(null);
  useEffect(() => {
    init().then(() => {
      setHello(() => hello);
    })
  }, []);
  return (
    hello("Human")
  );
}

export default App;
Start App
yarn start
Hello from rust, Human!

How can you allocate a raw mutable pointer in stable Rust?

I was trying to build a naive implementation of a custom String-like struct with small string optimization. Now that unions are allowed in stable Rust, I came up with the following code:
struct Large {
    capacity: usize,
    buffer: *mut u8,
}

struct Small([u8; 16]);

union Container {
    large: Large,
    small: Small,
}

struct MyString {
    len: usize,
    container: Container,
}
I can't seem to find a way to allocate that *mut u8. Is it possible to do in stable Rust? It looks like alloc::heap would work, but it is only available on nightly.
As of Rust 1.28, std::alloc::alloc is stable.
Here is an example which shows in general how it can be used.
use std::{
    alloc::{self, Layout},
    cmp, mem, ptr, slice, str,
};

// This really should **not** be copied
#[derive(Copy, Clone)]
struct Large {
    capacity: usize,
    buffer: *mut u8,
}

// This really should **not** be copied
#[derive(Copy, Clone, Default)]
struct Small([u8; 16]);

union Container {
    large: Large,
    small: Small,
}

struct MyString {
    len: usize,
    container: Container,
}

impl MyString {
    fn new() -> Self {
        MyString {
            len: 0,
            container: Container {
                small: Small::default(),
            },
        }
    }

    fn as_buf(&self) -> &[u8] {
        unsafe {
            if self.len <= 16 {
                &self.container.small.0[..self.len]
            } else {
                slice::from_raw_parts(self.container.large.buffer, self.len)
            }
        }
    }

    pub fn as_str(&self) -> &str {
        unsafe { str::from_utf8_unchecked(self.as_buf()) }
    }

    // Not actually UTF-8 safe!
    fn push(&mut self, c: u8) {
        unsafe {
            use cmp::Ordering::*;

            match self.len.cmp(&16) {
                Less => {
                    self.container.small.0[self.len] = c;
                }
                Equal => {
                    let capacity = 17;
                    let layout = Layout::from_size_align(capacity, mem::align_of::<u8>())
                        .expect("Bad layout");
                    let buffer = alloc::alloc(layout);
                    {
                        let buf = self.as_buf();
                        ptr::copy_nonoverlapping(buf.as_ptr(), buffer, buf.len());
                    }
                    self.container.large = Large { capacity, buffer };
                    *self.container.large.buffer.offset(self.len as isize) = c;
                }
                Greater => {
                    let Large {
                        mut capacity,
                        buffer,
                    } = self.container.large;
                    capacity += 1;
                    let layout = Layout::from_size_align(capacity, mem::align_of::<u8>())
                        .expect("Bad layout");
                    let buffer = alloc::realloc(buffer, layout, capacity);
                    self.container.large = Large { capacity, buffer };
                    *self.container.large.buffer.offset(self.len as isize) = c;
                }
            }

            self.len += 1;
        }
    }
}

impl Drop for MyString {
    fn drop(&mut self) {
        unsafe {
            if self.len > 16 {
                let Large { capacity, buffer } = self.container.large;
                let layout =
                    Layout::from_size_align(capacity, mem::align_of::<u8>()).expect("Bad layout");
                alloc::dealloc(buffer, layout);
            }
        }
    }
}

fn main() {
    let mut s = MyString::new();
    for _ in 0..32 {
        s.push(b'a');
        println!("{}", s.as_str());
    }
}
I believe this code to be correct with respect to allocations, but not for anything else. Like all unsafe code, verify it yourself. It's also completely inefficient as it reallocates for every additional character.
If you'd like to allocate a collection of u8 instead of a single u8, you can create a Vec and then convert it into the constituent pieces, such as by calling as_mut_ptr:
use std::mem;

fn main() {
    let mut foo = vec![0; 1024]; // or Vec::<u8>::with_capacity(1024);
    let ptr = foo.as_mut_ptr();
    let cap = foo.capacity();
    let len = foo.len();
    mem::forget(foo); // Avoid calling the destructor!

    let foo_again = unsafe { Vec::from_raw_parts(ptr, len, cap) }; // Rebuild it to drop it
    // Do *NOT* use `ptr` / `cap` / `len` anymore
}
Reallocating is a bit of a pain though; you'd have to convert back to a Vec and do the whole dance forwards and backwards.
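A sketch of what that dance could look like (hypothetical helper; the safety requirements are exactly those of Vec::from_raw_parts):
// Rebuild the Vec from its raw parts, let Vec handle the reallocation, then leak it again.
unsafe fn grow(ptr: *mut u8, len: usize, cap: usize, additional: usize) -> (*mut u8, usize, usize) {
    // Safety: (ptr, len, cap) must come from a Vec<u8> that was leaked with mem::forget.
    let mut v = Vec::from_raw_parts(ptr, len, cap);
    v.reserve(additional); // Vec reallocates and copies as needed
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    std::mem::forget(v); // hand ownership back to the caller as raw parts
    (ptr, len, cap)
}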
That being said, your Large struct seems to be missing a length, which would be distinct from capacity. You could just use a Vec instead of writing it out. I see now that the length is stored a bit further up in the hierarchy.
I wonder if having a full String wouldn't be a lot easier, even if it were a bit less efficient in that the length is double-counted...
union Container {
    large: String,
    small: Small,
}
See also:
What is the right way to allocate data to pass to an FFI call?
How do I use the Rust memory allocator for a C library that can be provided an allocator?
What about Box::into_raw()?
struct TypeMatches(*mut u8);
TypeMatches(Box::into_raw(Box::new(0u8)));
But it's difficult to tell from your code snippet if this is what you really need. You probably want a real allocator, and you could use libc::malloc with an as cast, as in this example.
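For illustration, a minimal sketch of that libc route (assuming the libc crate is a dependency); the as cast turns the returned *mut c_void into the *mut u8 the struct wants:
use libc::{c_void, free, malloc};

// Sketch only: allocate `capacity` bytes with libc::malloc and cast to *mut u8.
fn alloc_buffer(capacity: usize) -> *mut u8 {
    unsafe { malloc(capacity) as *mut u8 }
}

fn main() {
    let buffer = alloc_buffer(16);
    assert!(!buffer.is_null(), "allocation failed");
    // ... use the buffer ...
    unsafe { free(buffer as *mut c_void) }; // must be freed with libc::free
}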
There's a memalloc crate which provides a stable allocation API. It's implemented by allocating memory with Vec::with_capacity, then extracting the pointer:
let mut vec: Vec<u8> = Vec::with_capacity(cap);
let ptr = vec.as_mut_ptr();
mem::forget(vec);
To free the memory, use Vec::from_raw_parts.
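A short sketch of that freeing step, assuming ptr and cap came from the forgotten vector above and none of the bytes were initialized (so the length passed is 0):
// Rebuilding the Vec gives ownership back to Rust; dropping it frees the memory.
// Safety: ptr/cap must come from the forgotten Vec above, and the len must not
// exceed the number of initialized bytes (here 0, since nothing was written).
unsafe {
    let _vec: Vec<u8> = Vec::from_raw_parts(ptr, 0, cap);
} // `_vec` is dropped here, deallocating the buffer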
