How do I use nested Vecs with wasm-bindgen? - rust

It doesn't appear that nested Vecs work with wasm-bindgen. Is that correct?
My goal is to have a Game of Life grid in Rust that I can return as rows, rather than a 1D Vec which requires the JavaScript to handle the indexing. Two workarounds I've thought of are:
Implement a sort of custom "iterator" in Rust, which is a method which returns the rows one-by-one.
Hand a 1D array to JavaScript but write a wrapper in JavaScript which handles the indexing and exposes some sort of an iterator to the consumer.
I hesitate to use either of these because I want this library to be usable by JavaScript and native Rust, and I don't think either would be very idiomatic in pure Rust land. Any other suggestions?

You're correct that wasm-bindgen today doesn't support returning types like Vec<Vec<u8>>.
A good rule of thumb for WebAssembly is that big chunks of data (like vectors) should stay in one place, because copying them across the JS/WASM boundary costs performance. This means that you might want to explore an interface where a JS object wraps a pointer into WASM memory, and all of its methods take row/column indices but read and modify WASM memory, which stays the source of truth.
If that doesn't work out, then the best way to implement this today is one of the two strategies you mentioned, although both require some JS glue code.
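As a rough sketch of the "flat storage, row/column methods" idea, the grid below keeps a single row-major buffer and exposes rows as slices for native Rust callers. The `Grid` type and its method names are hypothetical, and the wasm-bindgen attributes are deliberately left out so that the snippet compiles with the standard library alone; in a real build you would presumably annotate the type and the index-based methods with `#[wasm_bindgen]` and keep the slice-returning methods for the Rust side only.

```rust
/// Hypothetical Game of Life grid stored as one flat, row-major buffer.
pub struct Grid {
    width: usize,
    height: usize,
    cells: Vec<u8>, // 0 = dead, 1 = alive
}

impl Grid {
    pub fn new(width: usize, height: usize) -> Self {
        assert!(width > 0, "width must be non-zero");
        Grid { width, height, cells: vec![0; width * height] }
    }

    /// Translate (row, col) into an offset in the flat buffer.
    fn index(&self, row: usize, col: usize) -> usize {
        assert!(row < self.height && col < self.width, "cell out of bounds");
        row * self.width + col
    }

    /// Index-based accessors: the kind of method a JS wrapper could call.
    pub fn get(&self, row: usize, col: usize) -> u8 {
        self.cells[self.index(row, col)]
    }

    pub fn set(&mut self, row: usize, col: usize, value: u8) {
        let i = self.index(row, col);
        self.cells[i] = value;
    }

    /// Borrow a single row without copying.
    pub fn row(&self, row: usize) -> &[u8] {
        assert!(row < self.height, "row out of bounds");
        let start = row * self.width;
        &self.cells[start..start + self.width]
    }

    /// Iterate over all rows; idiomatic for native Rust callers.
    pub fn rows(&self) -> std::slice::Chunks<'_, u8> {
        self.cells.chunks(self.width)
    }
}
```

Native Rust code can write `for row in grid.rows() { ... }`, while a JavaScript consumer would go through `get` and `set` (or a thin wrapper around them), so the data never leaves the flat buffer.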

Related

Is there a method of initiating a ThinBox<[T]>?

I have been dabbling in experimental features lately and have been using them for a library I'm building. I am trying to reduce the size of an enum by using ThinBox<[T]> to store contents in a fixed length array without the whole const generics monomorphization business happening in my code (since I need to store this in an enum later and don't want to have a const generic on the level of the enum).
The closest thing I have to a solution is to ThinBox a fixed-size array (it coerces to a slice). Though it technically fixes the problem of const generics on the type level, I want a solution that doesn't require me to pass const generics into a function (since that's a lot less flexible). I also don't want to end up with a ThinBox<&[T]>, since that is two levels of indirection.
Is there a method, safe or unsafe, that can initialize a ThinBox<[T]> without directly hacking the compiler?
You can use ThinBox::new_unsize like this:
ThinBox::<[T]>::new_unsize([/* your array */])

Serialization in Haskell

From the bird's view, my question is: Is there a universal mechanism for as-is data serialization in Haskell?
Introduction
The problem does not actually originate in Haskell. I once tried to serialize a Python dictionary whose objects had a quite heavy hash function. I found that in Python, the default dictionary serialization does not save the internal structure of the dictionary but just dumps a list of key-value pairs. As a result, de-serialization is time-consuming, and there is no way to work around it. I was certain that there would be a way in Haskell because, at first glance, there should be no problem transferring a pure Haskell type to a byte stream automatically using BFS or DFS. Surprisingly, there is not. This problem was discussed here (citation below):
Currently, there is no way to make HashMap serializable without modifying the HashMap library itself. It is not possible to make Data.HashMap an instance of Generic (for use with cereal) using stand-alone deriving as described by @mergeconflict's answer, because Data.HashMap does not export all its constructors (this is a requirement for GHC). So, the only solution left to serialize the HashMap seems to be to use the toList/fromList interface.
Current Problem
I have much the same problem with Data.Trie from the bytestring-trie package. Building a trie for my data is very time-consuming, and I need a mechanism to serialize and de-serialize this trie. However, as in the previous case, I see no way to make Data.Trie an instance of Generic (or am I wrong?).
So the questions are:
Is there some kind of universal mechanism to project a pure Haskell type to a byte string? If not, is it a fundamental restriction or just a lack of implementations?
If not, what is the most painless way to modify the bytestring-trie package to make it an instance of Generic and serialize it with Data.Store?
There is a way using compact regions, but there is a big restriction:
Our binary representation contains direct pointers to the info tables of objects in the region. This means that the info tables of the receiving process must be laid out in exactly the same way as from the original process; in practice, this means using static linking, using the exact same binary and turning off ASLR. This API does NOT do any safety checking and will probably segfault if you get it wrong. DO NOT run this on untrusted input.
This also gives insight into why universal serialization is not currently possible. Data structures contain very specific pointers, which can differ between binaries. Reading the raw bytes into another binary will result in invalid pointers.
There is some discussion in this GitHub issue about weakening this requirement.
I think the proper way is to open an issue or pull request upstream to export the data constructors in the internal module. That is what happened with HashMap which is now fully accessible in its internal module.
Update: it seems there is already a similar open issue about this.

Multiple specialization, iterator patterns in Rust

Learning Rust (yay!) and I'm trying to understand the intended idiomatic programming required for certain iterator patterns, while scoring top performance. Note: not Rust's Iterator trait, just a method I've written accepting a closure and applying it to some data I'm pulling off of disk / out of memory.
I was delighted to see that Rust (+LLVM?) took an iterator I had written for sparse matrix entries, and a closure for doing sparse matrix vector multiplication, written as
iterator.map_edges({ |x, y| dst[y] += src[x] });
and inlined the closure's body in the generated code. It went quite fast. :D
If I create two of these iterators, or use the first a second time (not a correctness issue) each instance slows down quite a lot (about 2x in this case), presumably because the optimizer no longer chooses to do specialization because of the multiple call sites, and you end up doing a function call for each element.
I'm trying to understand if there are idiomatic patterns that keep the pleasant experience above (I like it, at least) without sacrificing the performance. My options seem to be (none satisfying this constraint):
Accept dodgy performance (2x slower is not fatal, but no prizes either).
Ask the user to supply a batch-oriented closure, so acting on an iterator over a small batch of data. This exposes a bit much of the internals of the iterator (the data are compressed nicely, and the user needs to know how to unwrap them, or the iterator needs to stage an unwrapped batch in memory).
Make map_edges generic in a type implementing a hypothetical EdgeMapClosure trait, and ask the user to implement such a type for each closure they want to inline. Not tested, but I would guess this exposes distinct methods to LLVM, each of which get nicely inlined. Downside is that the user has to write their own closure (packing relevant state up, etc).
Horrible hacks, like make distinct methods map_edges0, map_edges1, ... . Or add a generic parameter the programmer can use to make the methods distinct, but which is otherwise ignored.
Non-solutions include "just use for pair in iterator.iter() { /* */ }"; this is prep work for a data/task-parallel platform, and I would like to be able to capture/move these closures to worker threads rather than capturing the main thread's execution. Maybe the pattern I should be using is to write the above, put it in a lambda/closure, and ship it around instead?
In a perfect world, it would be great to have a pattern which causes each occurrence of map_edges in the source file to result in different specialized methods in the binary, without forcing the entire project to be optimized at some scary level. I'm coming out of an unpleasant relationship with managed languages and JITs where generics would be the only way (I know of) to get this to happen, but Rust and LLVM seem magical enough that I thought there might be a good way. How do Rust's iterators handle this to inline their closure bodies? Or don't they (they should!)?
It seems that the problem is resolved by Rust's new approach to closures outlined at
http://smallcultfollowing.com/babysteps/blog/2014/11/26/purging-proc/
In short, Option 3 above (make functions generic with respect to a new closure type) is now transparently implemented when you make an implementation generic using the new closure traits. Rust produces the type behind the scenes for you.
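A minimal sketch of what that looks like, assuming a hypothetical `Edges` type standing in for the sparse-matrix iterator from the question: `map_edges` is generic over the closure type `F`, and since every closure expression has its own anonymous type, each call site gets its own monomorphized copy that the optimizer is free to inline. Whether a given call is actually inlined is still up to the compiler.

```rust
/// Hypothetical stand-in for the question's sparse-matrix edge store.
pub struct Edges {
    pairs: Vec<(usize, usize)>,
}

impl Edges {
    pub fn new(pairs: Vec<(usize, usize)>) -> Self {
        Edges { pairs }
    }

    /// Generic over the closure type `F`: the compiler generates a separate
    /// instance of this method for each distinct closure passed in.
    pub fn map_edges<F: FnMut(usize, usize)>(&self, mut action: F) {
        for &(x, y) in &self.pairs {
            action(x, y);
        }
    }
}

/// First call site: the sparse matrix-vector product from the question.
pub fn multiply(edges: &Edges, src: &[f64], dst: &mut [f64]) {
    edges.map_edges(|x, y| dst[y] += src[x]);
}

/// Second call site with a different closure, hence a different instance
/// of `map_edges`.
pub fn count_edges(edges: &Edges) -> usize {
    let mut count = 0;
    edges.map_edges(|_, _| count += 1);
    count
}
```

No user-written closure struct is needed here; the compiler packs the captured state (`dst` and `src`, or `count`) into the anonymous closure type itself.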

Suitable Haskell type for large, frequently changing sequence of floats

I have to pick a type for a sequence of floats with 16K elements. The values will be updated frequently, potentially many times a second.
I've read the wiki page on arrays. Here are the conclusions I've drawn so far. (Please correct me if any of them are mistaken.)
IArrays would be unacceptably slow in this case, because they'd be copied on every change. With 16K floats in the array, that's 64KB of memory copied each time.
IOArrays could do the trick, as they can be modified without copying all the data. In my particular use case, doing all updates in the IO monad isn't a problem at all. But they're boxed, which means extra overhead, and that could add up with 16K elements.
IOUArrays seem like the perfect fit. Like IOArrays, they don't require a full copy on each change. But unlike IOArrays, they're unboxed, meaning they're basically the Haskell equivalent of a C array of floats. I realize they're strict. But I don't see that being an issue, because my application would never need to access anything less than the entire array.
Am I right to look to IOUArrays for this?
Also, suppose I later want to read or write the array from multiple threads. Will I have backed myself into a corner with IOUArrays? Or is the choice of IOUArrays totally orthogonal to the problem of concurrency? (I'm not yet familiar with the concurrency primitives in Haskell and how they interact with the IO monad.)
A good rule of thumb is that you should almost always use the vector library instead of arrays. In this case, you can use mutable vectors from the Data.Vector.Mutable module.
The key operations you'll want are read and write which let you mutably read from and write to the mutable vector.
You'll want to benchmark, of course (with criterion), or you might be interested in browsing some benchmarks I did, e.g. here (if that link works for you; broken for me).
The vector library is a nice interface (crazy understatement) over GHC's more primitive array types, which you can get to more directly in the primitive package. The types in the standard array package are built on the same primitives; for instance, an IOUArray is essentially a MutableByteArray#.
Unboxed mutable arrays are usually going to be the fastest, but you should compare them in your application to IOArray or the vector equivalent.
My advice would be:
If you probably don't need concurrency, first try a mutable unboxed Vector, as Gabriel suggests.
If you know you will want concurrent updates (and feel a little brave), then first try a MutableArray and do atomic updates with these functions from the atomic-primops library. If you want fine-grained locking, this is your best choice. Of course, concurrent reads will work fine on whatever array you choose.
It should also be theoretically possible to do concurrent updates on a MutableByteArray (equivalent to IOUArray) with those atomic-primops functions too, since a Float should always fit into a word (I think), but you'd have to do some research (or bug Ryan).
Also be aware of potential memory reordering issues when doing concurrency with the atomic-primops stuff, and help convince yourself with lots of tests; this is somewhat uncharted territory.

How to use deftype to implement an associative structure in Clojurescript

I would like to provide a ClojureScript implementation of vector based on JavaScript Typed Arrays, which supports assoc, replace, etc. I think I want to do that (maybe there's a better way, or maybe someone's already done it?). Presumably I'd use deftype, but what protocols should I provide "concretions" for?
Let's see what the built-in vectors implement. You can view the source here, on github.
Looks like it's: Object, IWithMeta, IMeta, IStack, ICollection, IEmptyableCollection, ISequential, IEquiv, IHash, ISeqable, ICounted, IIndexed, ILookup, IMapEntry, IAssociative, IVector, IReduce, IKVReduce, IFn, IEditableCollection and IReversible.
That's a lot, but since each one defines one or at most two methods, it isn't that much work. Plus, you could leave some of them unimplemented, like IEditableCollection which is only used for transients or IReduce which is for the new reducers functions.
You don't even have to make your new data structure work exactly like a built-in Vector, either. You could have it implement all the sequential stuff and not worry about the map stuff, for example, though of course then it would be less convenient than a normal Vector.
