Rust: create a non-contiguous vector for testing

I am using the ndarray::ArrayBase type, and for one of my unit tests I need to check whether or not a vector is contiguous in memory. I am using the .is_standard_layout() method to check this, so I just need a way to create a vector for testing that is non-contiguous in memory. A contiguous vector is easily created with vec![1.0, 2.0, 3.0]. Does anyone know the best way to make a vector that is non-contiguous?

One quick way is to reverse it by using .invert_axis():
use ndarray::{rcarr1, Axis};

let mut arr = rcarr1(&[1.0, 2.0, 3.0]);
dbg!(arr.is_standard_layout());
arr.invert_axis(Axis(0)); // flips the stride, so elements run backwards in memory
dbg!(arr.is_standard_layout());
This prints:
[src/main.rs:4] arr.is_standard_layout() = true
[src/main.rs:7] arr.is_standard_layout() = false
Technically it is still "contiguous"; the elements are all still right next to their neighbors. But it's not "standard layout", since their locations in memory are no longer in increasing order.
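If you need elements that are genuinely not adjacent in memory (not just reversed), a minimal sketch using a strided slice, via ndarray's array! and s! macros:
use ndarray::{array, s};

let a = array![1.0, 2.0, 3.0, 4.0];
// take every other element: the view has stride 2, so it is neither
// contiguous nor standard layout
let v = a.slice(s![..;2]);
assert!(!v.is_standard_layout());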

Related

Good ways to deal with operations that return Result when using Ndarray

Recently I've started using ndarray to handle my multi-dimensional arrays in Rust, and I love a lot of its convenience features. If I could just figure out one problem, it would be perfect.
Normally, when you collect an iterator over a set of Results (e.g. you just did a fallible map operation), you can tell Rust that you would like to collect it as a Result<Vec<_>, _> instead of a Vec<Result<_, _>>. This is super helpful, but as far as I can tell ndarray's producers don't have a similar feature. What I'd like to know is how I can efficiently deal with Results when using ndarray's producers, or when using Zip to operate over an ndarray.
All I can think to do is either turn the ndarray into a normal iterator, which I don't want to do because the dimensionality is lost (e.g. an Array3 turns into a Vec), do some kind of second pass after the first to find any Err variants, which seems inefficient, or just use unwrap everywhere...
I'd really appreciate any other perspectives on using ndarray, or other ways to manage multi-dimensional arrays in Rust.
code-example:
let thingy: Array3<f64> = whatever;
let output = Zip::from(&thingy).and(&another_thingy).map_collect(|thing1, thing2| {
    function_that_returns_result(thing1, thing2)
});
// output ends up being an Array3 of Results, where I would like an idiomatic
// way for it to be a Result containing an Array3
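One pattern that preserves the dimensionality (a sketch, assuming ndarray 0.15+, where owned arrays implement IntoIterator and map_collect is available): run the fallible operation with map_collect, short-circuit through a flat collect, then restore the shape with from_shape_vec. The fallible operation here is hypothetical:
use ndarray::{Array3, Zip};

fn try_divide(a: &Array3<f64>, b: &Array3<f64>) -> Result<Array3<f64>, String> {
    let dim = a.raw_dim();
    // elementwise op, then short-circuit on the first Err while flattening
    let flat: Result<Vec<f64>, String> = Zip::from(a)
        .and(b)
        .map_collect(|x, y| {
            if *y == 0.0 {
                Err("division by zero".to_string())
            } else {
                Ok(x / y)
            }
        })
        .into_iter()
        .collect();
    // rebuild the 3-D shape; map_collect's output is in logical order,
    // so the length and shape are guaranteed to agree
    Ok(Array3::from_shape_vec(dim, flat?).expect("shape and length agree"))
}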

Rust: Create New Vector Containing Elements Of Existing Vector With Minimal New Memory Allocation

I am starting to learn Rust, and it's the first time I've ever worked in a language where you have to think about memory allocation (I've never used C).
As an exercise, I decided to see how I could create a new vector that includes some elements from another vector in addition to some new elements. My goal is to create a vector that maintains pointers to the data in the other vector rather than copying or cloning that data, so that new memory is only allocated for the additional elements. Is that what's happening in the code below, and/or is there a better way to do what I'm trying to do?
fn main() {
    let v = vec![vec![1], vec![2], vec![3]];
    let v0 = &v[0];
    let v1 = &v[1];
    // `&vec![4]` borrows a temporary; this compiles because the temporary's
    // lifetime is extended to match the `let` binding
    let v2 = &vec![4];
    let v3 = vec![v0, v1, v2];
}
I used nested vectors because, to me, this issue is more pertinent when you're working with data on the heap than on the stack, and vectors are allocated on the heap while integers are on the stack. But keep in mind that I'm a complete novice to this whole domain, so feel free to let me know if what I'm saying and doing makes no sense at all 🙂.
My goal is to create a vector that maintains pointers to the data in the other vector rather than copying or cloning that data, so that new memory is only allocated for the additional elements. Is that what's happening in the code below[?]
Yes, v3 contains references to the existing vectors that were in v. It doesn't create new ones or anything.
this issue is more pertinent when you're working with data on the heap than on the stack, and vectors are allocated on the heap while integers are on the stack
This remark is not really true, although using non-Copy types does help you avoid fooling yourself when working on problems like this. Whether a value lives on the stack, on the heap, or in neither is for Rust to decide; in both vec![vec![1]] and vec![1], the innermost values end up just as much on the heap, because a Vec always stores its elements in a heap allocation.
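To convince yourself that v3 aliases the original data rather than copying it, a small sketch (hypothetical, comparing addresses with std::ptr::eq):
fn main() {
    let v = vec![vec![1], vec![2], vec![3]];
    let extra = vec![4];
    // v3 stores references; no element data is cloned
    let v3: Vec<&Vec<i32>> = vec![&v[0], &v[1], &extra];
    // both references point at the same Vec, hence the same heap buffer
    assert!(std::ptr::eq(v3[0], &v[0]));
    assert_eq!(v3[0].as_ptr(), v[0].as_ptr());
}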

When should I use direct access into a Rust Vec instead of the get method?

Rust supports two methods for accessing the elements of a vector:
let mut v = vec![1, 2, 3];
let first_element = &v[0];
let second_element = v.get(1);
The get() method returns an Option, which seems like a useful safety feature. The C-like syntax &v[0] is shorter to type but gives up that safety: an invalid read causes a run-time panic rather than returning a value that indicates the read was out of bounds.
It's not clear to me when I would want to use the direct access approach, because it seems like the only advantage is that it's quicker to type (I save 3 characters). Is there some other advantage (perhaps a speedup?) that I'm not seeing? I guess I would save the conditional of a match expression, but that doesn't seem like it offers much benefit compared to the costs.
Neither of them is quicker, because they both do bounds checks. In fact, your question is quite generic: there are other pairs of methods where one panics while the other returns an Option or Result, such as String::reserve vs String::try_reserve.
If you are sure that you are in bounds, use the brackets version. It is essentially a syntactic shortcut for get().unwrap().
If you are unsure, use the get() method and do your own check (see the sketch just below).
If you critically need maximum speed, and you cannot use an iterator, and you have determined through benchmarks that the indexing is the bottleneck, and you are sure to be in bounds, you can use the get_unchecked() method. Be careful: it is unsafe, and it is always better not to have any unsafe block in your code.
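For the unsure case, a minimal sketch of the explicit check (hypothetical values):
let v = vec![1, 2, 3];
match v.get(10) {
    Some(value) => println!("got {value}"),
    None => println!("index 10 is out of bounds"),
}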
Just a little bit of advice: if you are concerned about your program's performance, avoid using those methods and prefer iterators as much as you can. For example, the second loop below is faster than the first one, because the first performs one million bounds checks:
let v: Vec<_> = (0..1_000_000).collect();

// indexed access: every v[idx] read is bounds-checked
for idx in 0..1_000_000 {
    // do something with v[idx]
}

// iterator access: no per-element bounds checks
for num in &v {
    // do something with num
}
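If the loop body also needs the index, enumerate keeps the iterator form (a small sketch):
for (idx, num) in v.iter().enumerate() {
    // idx is the position, num is the element; still no explicit bounds checks
}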

How can I sort a LinkedList with just the standard library?

Vec provides a sort method (through Deref implementation), but LinkedList does not. Is there a generic algorithm somewhere in the Rust standard library that allows sorting of LinkedLists?
I don't think there is a built-in way to do it. However, you can move the list contents into a Vec, sort it, and turn it back into a linked list:
use std::collections::LinkedList;
let list = LinkedList::from([3, 1, 2]);
let mut vec: Vec<_> = list.into_iter().collect();
vec.sort();
let list: LinkedList<_> = vec.into_iter().collect();
This idea is not even remotely as bad as it may seem - see here. While relatively fast algorithms for sorting a linked list do exist, they won't give you the cache performance that sorting a flat array does.
See this question; it's quite similar, though not language-specific.
A while ago I investigated this topic (using C, but it applies to Rust too).
Besides converting to a vector, sorting, and converting back to a linked list, merge sort is typically the best method for sorting a linked list.
The same method can be used for both doubly and singly linked lists (there is no advantage to having links in both directions).
Here is an example, originally from this answer, which I ported to C.
This is a nice example of merge sort; however, after some further investigation I found Mono's eglib mergesort to be more efficient, especially when the list is already partially sorted.
Here is a portable version of it.
It shouldn't be too difficult to port this from C to Rust.
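For reference, a minimal sketch of what such a port might look like in safe Rust using only the standard library (my illustration, not the linked C code; split_off walks the links, so each recursion level is O(n) and the whole sort is O(n log n)):
use std::collections::LinkedList;

fn merge_sort<T: Ord>(mut list: LinkedList<T>) -> LinkedList<T> {
    if list.len() <= 1 {
        return list;
    }
    // split at the midpoint
    let right = list.split_off(list.len() / 2);
    let (mut left, mut right) = (merge_sort(list), merge_sort(right));
    // merge the two sorted halves front-to-front
    let mut merged = LinkedList::new();
    while let (Some(l), Some(r)) = (left.front(), right.front()) {
        if l <= r {
            merged.push_back(left.pop_front().unwrap());
        } else {
            merged.push_back(right.pop_front().unwrap());
        }
    }
    // at most one of the halves still has elements left
    merged.append(&mut left);
    merged.append(&mut right);
    merged
}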

UAV counter indices used across multiple shaders?

I've been trying to implement a Compute Shader based particle system.
I have a compute shader which builds a structured buffer of particles, using a UAV with the D3D11_BUFFER_UAV_FLAG_COUNTER flag.
When I add to this buffer, I check if this particle has any complex behaviours, which I want to filter out and perform in a separate compute shader. As an example, if the particle wants to perform collision detection, I add its index to another structured buffer, also with the D3D11_BUFFER_UAV_FLAG_COUNTER flag.
I then run a second compute shader, which processes all the indices, and applies collision detection to those particles.
However, in the second compute shader, I'd estimate that about 5% of the indices are wrong - they belong to other particles, which don't support collision detection.
Here's the compute shader code that performs the list building:
// append to destination buffer
uint dstIndex = g_dstParticles.IncrementCounter();
g_dstParticles[ dstIndex ] = particle;

// add to behaviour lists
if ( params.flags & EMITTER_FLAG_COLLISION )
{
    uint behaviourIndex = g_behaviourCollisionIndices.IncrementCounter();
    g_behaviourCollisionIndices[ behaviourIndex ] = dstIndex;
}
If I split out the "add to behaviour lists" bit into a separate compute shader, and run it after the particle lists are built, everything works perfectly. However I think I shouldn't need to do this - it's a waste of bandwidth going through all the particles again.
I suspect that IncrementCounter is actually not guaranteed to return a unique index into the UAV, and that there is some clever optimisation going on that means the index is only valid inside the compute shader it is used in. And thus my attempt to pass it to the second compute shader is not valid.
Can anyone give any concrete answers to what's going on here? And if there's a way for me to keep the filtering inside the same compute shader as my core update?
Thanks!
IncrementCounter is an atomic operation and so will (driver/hardware bugs notwithstanding) return a unique value to each thread that calls it.
Have you thought about using Append/Consume buffers for this, as it's what they were designed for? The first pass simply appends the complex collision particles to an AppendStructuredBuffer and the second pass consumes from the same buffer but using a ConsumeStructuredBuffer view instead. The second run of compute will need to use DispatchIndirect so you only run as many thread groups as necessary for the number in the list (something the CPU won't know).
The usual recommendations apply though, have you tried the D3D11 Debug Layer and running it on the reference device to be sure it isn't a driver issue?
