How to get the byte offset between `&str` - rust

I have two &str pointing to the same string, and I need to know the byte offset between them:
fn main() {
    let foo = "  bar";
    assert_eq!(offset(foo, foo.trim()), Some(2));
    let bar = "baz\nquz";
    let mut lines = bar.lines();
    assert_eq!(offset(bar, lines.next().unwrap()), Some(0));
    assert_eq!(offset(bar, lines.next().unwrap()), Some(4));
    assert_eq!(offset(foo, bar), None); // not a sub-string
    let quz = "quz".to_owned();
    assert_eq!(offset(bar, &quz), None); // not the same string, could also return `Some(4)`, I don't care
}
This is basically the same as str::find, but since the second slice is a sub-slice of the first, I would have hoped for something faster. Also, str::find won't work in the lines() case if several lines are identical.
I thought I could just use some pointer arithmetic to do that with something like foo.trim().as_ptr() - foo.as_ptr() but it turns out that Sub is not implemented on raw pointers.

"...but it turns out that Sub is not implemented on raw pointers."
You can use the offset_from method:
fn main() {
    let source = "hello, world";
    let a = &source[1..];
    let b = &source[5..];
    // I copied this unsafe code from Stack Overflow without
    // reading the text that told me how to know if this was safe
    let diff = unsafe { b.as_ptr().offset_from(a.as_ptr()) };
    println!("{diff}");
}
Please be sure to read the documentation for this method as it describes under what circumstances it will not cause undefined behavior.
In older versions of Rust, you can convert the pointer to a usize to do math on it:
fn main() {
    let source = "hello, world";
    let a = &source[1..];
    let b = &source[5..];
    let diff = b.as_ptr() as usize - a.as_ptr() as usize;
    println!("{diff}");
}

This is of course somewhat risky, since the result is only meaningful when both pointers point into the same string, but if you want arithmetic, you can just cast the pointers to usize with as and subtract them.
(Note: the casts are safe Rust, so the compiler will not actually complain.)
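
Combining the answers above, a minimal sketch of the offset function from the question could look like the following. The name offset and its Option<usize> return type come from the question; the bounds check is an assumption about how one might fill it in. It uses only the safe integer casts, and it subtracts only after confirming that the second slice lies entirely inside the first.

```rust
/// Returns the byte offset of `child` inside `parent`, or `None` when
/// `child` does not lie entirely within `parent`'s memory.
/// (Sketch only: an empty `child` sitting exactly at an edge of `parent`
/// is still reported as contained.)
fn offset(parent: &str, child: &str) -> Option<usize> {
    let parent_start = parent.as_ptr() as usize;
    let parent_end = parent_start + parent.len();
    let child_start = child.as_ptr() as usize;
    let child_end = child_start + child.len();

    if child_start >= parent_start && child_end <= parent_end {
        Some(child_start - parent_start)
    } else {
        None
    }
}

fn main() {
    let foo = "  bar";
    assert_eq!(offset(foo, foo.trim()), Some(2));

    let bar = "baz\nquz";
    let mut lines = bar.lines();
    assert_eq!(offset(bar, lines.next().unwrap()), Some(0));
    assert_eq!(offset(bar, lines.next().unwrap()), Some(4));

    // A separately allocated copy is not a sub-slice of `bar`.
    let quz = "quz".to_owned();
    assert_eq!(offset(bar, &quz), None);
}
```

Because the check compares addresses rather than contents, identical lines at different positions are told apart, which str::find cannot do.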

Related

How do I insert a dynamic byte string into a vector?

I need to create a packet to send to the server. For this purpose I use a vector with the byteorder crate. When I try to append a string, the Rust compiler tells me I am calling an unsafe function and gives me an error.
use byteorder::{LittleEndian, WriteBytesExt};
fn main() {
    let login = "test";
    let packet_length = 30 + (login.len() as i16);
    let mut packet = Vec::new();
    packet.write_u8(0x00);
    packet.write_i16::<LittleEndian>(packet_length);
    packet.append(&mut Vec::from(String::from("game name ").as_bytes_mut()));
    // ... rest code
}
The error is:
packet.append(&mut Vec::from(String::from("game name ").as_bytes_mut()));
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ call to unsafe function
This is playground to reproduce: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=381c6d14660d47beaece15d068b3dc6a
What is the correct way to insert a string as bytes into a vector?
The unsafe function called was as_bytes_mut(). This creates a mutable reference with exclusive access to the bytes representing the string, allowing you to modify them. You do not really need a mutable reference in this case, as_bytes() would have sufficed.
However, there is a more idiomatic way. Vec<u8> also functions as a writer (it implements std::io::Write), so you can use one of its methods, or even the write! macro, to write encoded text to it.
use std::io::Write;
use byteorder::{LittleEndian, WriteBytesExt};
fn main() -> Result<(), std::io::Error> {
    let login = "test";
    let packet_length = 30 + (login.len() as i16);
    let mut packet = Vec::new();
    packet.write_u8(0x00)?;
    packet.write_i16::<LittleEndian>(packet_length)?;
    let game_name = String::from("game name");
    write!(packet, "{} ", game_name)?;
    Ok(())
}
See also:
Use write! macro with a string instead of a string literal
What's the de-facto way of reading and writing files in Rust 1.x?
You can use .extend() on the Vec and pass in the bytes representation of the String:
use byteorder::{LittleEndian, WriteBytesExt};
fn main() {
    let login = "test";
    let packet_length = 30 + (login.len() as i16);
    let mut packet = Vec::new();
    packet.write_u8(0x00);
    packet.write_i16::<LittleEndian>(packet_length);
    let string = String::from("game name ");
    packet.extend(string.as_bytes());
}
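
For comparison, here is a rough std-only sketch of the same idea that assumes you do not want the byteorder dependency. The helper name build_packet and the exact field layout are hypothetical, chosen only to mirror the snippets above. Integer fields are written with to_le_bytes and the string with extend_from_slice, none of which require unsafe.

```rust
// Hypothetical helper: the name and the field layout are assumptions
// made for illustration; they mirror the snippets in the question.
fn build_packet(login: &str) -> Vec<u8> {
    let packet_length = 30 + login.len() as i16;
    let mut packet = Vec::new();

    // One-byte header, same value as in the question.
    packet.push(0x00);
    // Little-endian i16, the std equivalent of write_i16::<LittleEndian>.
    packet.extend_from_slice(&packet_length.to_le_bytes());
    // The string is appended as its UTF-8 bytes; no mutable access needed.
    packet.extend_from_slice("game name ".as_bytes());

    packet
}

fn main() {
    let packet = build_packet("test");
    println!("{:?}", packet);
}
```

The immutable as_bytes() view is enough here because the bytes are copied into the vector rather than modified in place.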

"Temporary value dropped while borrowed" when using string.replace()

Can someone explain which exact temporary value is dropped and what the recommended way to do this operation is?
fn main() {
    let mut a = &mut String::from("Hello Ownership");
    a = &mut a.replace("Ownership", "World");
    println!("a is {}", a);
}
If you want to keep the &mut references (which are generally not needed in your case, of course), you can do something like this:
fn main() {
    let a = &mut String::from("Hello Ownership");
    let a = &mut a.replace("Ownership", "World");
    println!("a is {}", a);
}
The type of a would be &mut String. In the second line we do what's known as variable shadowing (not that it's needed) and the type is still &mut String.
That doesn't quite answer your question. I don't know why exactly your version doesn't compile, but at least I thought this info might be useful. (see below)
Update
Thanks to Solomon's findings, I wanted to add that apparently in this case:
let a = &mut ...;
let b = &mut ...;
or this one (variable shadowing, basically the same as the above):
let a = &mut ...;
let a = &mut ...;
In both cases, the compiler will automatically extend the lifetime of each temporary until the end of the enclosing block. However, in the case of:
let mut a = &mut ...;
a = &mut ...;
the compiler doesn't do such lifetime extension, because it applies only to a temporary in the initializer of a let statement and not to a plain assignment. That's why the OP's code doesn't compile, even though it seems to be doing pretty much the same thing.
Why are you using &mut there? Try this:
fn main() {
    let mut a = String::from("Hello Ownership");
    a = a.replace("Ownership", "World");
    println!("a is {}", a);
}
Aha, figured it out!
https://doc.rust-lang.org/nightly/error-index.html#E0716 says:
Temporaries are not always dropped at the end of the enclosing statement. In simple cases where the & expression is immediately stored into a variable, the compiler will automatically extend the lifetime of the temporary until the end of the enclosing block. Therefore, an alternative way to fix the original program is to write let tmp = &foo() and not let tmp = foo():
fn foo() -> i32 { 22 }
fn bar(x: &i32) -> &i32 { x }
let value = &foo();
let p = bar(value);
let q = *p;
Here, we are still borrowing foo(), but as the borrow is assigned directly into a variable, the temporary will not be dropped until the end of the enclosing block. Similar rules apply when temporaries are stored into aggregate structures like a tuple or struct:
// Here, two temporaries are created, but
// as they are stored directly into `value`,
// they are not dropped until the end of the
// enclosing block.
fn foo() -> i32 { 22 }
let value = (&foo(), &foo());
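
If the reassignment form from the question really is wanted, one way to apply the rule quoted above is to give the replaced string its own named binding, so that nothing is a temporary any more. The following is only a sketch; the function name replaced_greeting and the variable names original and replaced are made up for illustration.

```rust
// Hypothetical function name; it wraps the question's code so the result
// can be returned and checked.
fn replaced_greeting() -> String {
    let mut original = String::from("Hello Ownership");
    let mut a = &mut original;

    // Bind the result of `replace` to a named variable. It now lives until
    // the end of the block instead of being dropped at the end of the
    // statement that created it.
    let mut replaced = a.replace("Ownership", "World");
    a = &mut replaced;

    // `a` is still a usable `&mut String`.
    a.push('!');
    a.to_string()
}

fn main() {
    println!("a is {}", replaced_greeting());
}
```

The only change from the failing version is that `a.replace(...)` is stored in a variable before being borrowed.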

How to return new data from a function as a reference without borrow checker issues?

I'm writing a function that takes a reference to an integer and returns a vector of that integer times 2, 5 times. I think that'd look something like:
fn foo(x: &i64) -> Vec<&i64> {
    let mut v = vec![];
    for i in 0..5 {
        let q = x * 2;
        v.push(&q);
    }
    v
}
fn main() {
    let x = 5;
    let q = foo(&x);
    println!("{:?}", q);
}
The borrow checker goes nuts because I define a new variable, it's allocated on the stack, and goes out of scope at the end of the function.
What do I do? Certainly I can't go through life without writing functions that create new data! I'm aware there's Box, and Copy-type workarounds, but I'm interested in an idiomatic Rust solution.
I realize I could return a Vec<i64> but I think that'd run into the same issues? Mainly trying to come up with an "emblematic" problem for the general issue :)
EDIT: I only just realized that you wrote "I'm aware there's Box, Copy etc type workaround but I'm mostly interested in an idiomatic rust solution", but I've already typed the whole answer. :P And the solutions below are idiomatic Rust, this is all just how memory works! Don't go trying to return pointers to stack-allocated data in C or C++, because even if the compiler doesn't stop you, that doesn't mean anything good will come of it. ;)
Any time that you return a reference, that reference must be derived from a parameter to the function (or point to 'static data). In other words, if you're returning references to data, all that data must have been allocated outside of the function. You seem to understand this, I just want to make sure it's clear. :)
There are many potential ways of solving this problem depending on what your use case is.
In this particular example, because you don't need x for anything afterward, you can just give ownership to foo without bothering with references at all:
fn foo(x: i64) -> Vec<i64> {
    std::iter::repeat(x * 2).take(5).collect()
}
fn main() {
    let x = 5;
    println!("{:?}", foo(x));
}
But let's say that you don't want to pass ownership into foo. You could still return a vector of references as long as you didn't want to mutate the underlying value:
fn foo(x: &i64) -> Vec<&i64> {
    std::iter::repeat(x).take(5).collect()
}
fn main() {
    let x = 5;
    println!("{:?}", foo(&x));
}
...and likewise you could mutate the underlying value as long as you didn't want to hand out new pointers to it:
fn foo(x: &mut i64) -> &mut i64 {
    *x *= 2;
    x
}
fn main() {
    let mut x = 5;
    println!("{:?}", foo(&mut x));
}
...but of course, you want to do both. So if you're allocating memory and you want to return it, then you need to do it somewhere other than the stack. One thing you can do is just stuff it on the heap, using Box:
// Just for illustration, see the next example for a better approach
fn foo(x: &i64) -> Vec<Box<i64>> {
    std::iter::repeat(Box::new(x * 2)).take(5).collect()
}
fn main() {
    let x = 5;
    println!("{:?}", foo(&x));
}
...though with the above I just want to make sure you're aware of Box as a general means of using the heap. Truthfully, simply using a Vec means that your data will be placed on the heap, so this works:
fn foo(x: &i64) -> Vec<i64> {
    std::iter::repeat(x * 2).take(5).collect()
}
fn main() {
    let x = 5;
    println!("{:?}", foo(&x));
}
The above is probably the most idiomatic example here, though as ever your use case might demand something different.
Alternatively, you could pull a trick from C's playbook and pre-allocate the memory outside of foo, and then pass in a reference to it:
fn foo(x: &i64, v: &mut [i64; 5]) {
    for i in v {
        *i = x * 2;
    }
}
fn main() {
    let x = 5;
    let mut v = [0; 5]; // fixed-size array on the stack
    foo(&x, &mut v);
    println!("{:?}", v);
}
Finally, if the function must take a reference as its parameter and you must mutate the referenced data and you must copy the reference itself and you must return these copied references, then you can use Cell for this:
use std::cell::Cell;
fn foo(x: &Cell<i64>) -> Vec<&Cell<i64>> {
    x.set(x.get() * 2);
    std::iter::repeat(x).take(5).collect()
}
fn main() {
    let x = Cell::new(5);
    println!("{:?}", foo(&x));
}
Cell is both efficient and non-surprising, though note that Cell::get works only on types that implement the Copy trait (which all the primitive numeric types do). If your type doesn't implement Copy then you can still do this same thing with RefCell, but it imposes a slight runtime overhead and opens up the possibility of panics at runtime if you get the "borrowing" wrong.
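
As a rough illustration of the RefCell variant mentioned above, the sketch below uses a String, which is not Copy, in place of the integer. The mutation (appending " doubled") is an arbitrary stand-in chosen for this example. It also shows try_borrow_mut, which reports a conflicting borrow as an error instead of panicking.

```rust
use std::cell::RefCell;

fn foo(x: &RefCell<String>) -> Vec<&RefCell<String>> {
    // The mutable borrow is a temporary, released at the end of this statement.
    x.borrow_mut().push_str(" doubled");
    // Shared references to the RefCell can be copied freely.
    std::iter::repeat(x).take(5).collect()
}

fn main() {
    let x = RefCell::new(String::from("five"));
    let refs = foo(&x);
    println!("{:?}", refs);

    // While a shared borrow is alive, a mutable borrow is refused at runtime.
    // `borrow_mut` would panic here; `try_borrow_mut` returns an error instead.
    let reader = x.borrow();
    assert!(x.try_borrow_mut().is_err());
    drop(reader);
    assert!(x.try_borrow_mut().is_ok());
}
```

The borrow rules are the same as for ordinary references; RefCell only moves the check from compile time to runtime.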

How would you go about creating a pointer to a specific memory address in Rust?

For example, let's say I want to access whatever value is stored at 0x0900. I found the function std::ptr::read in the Rust standard library, but the documentation isn't super clear on how to use it and I'm not sure if it's the right way.
This is what I've tried:
use std::ptr;
fn main() {
    let n = ptr::read("0x0900");
    println!("{}", n);
}
but it gives me error E0277.
If you want to read a value of type u32 from memory location 0x0900, you could do it as follows:
use std::ptr;
fn main() {
    let p = 0x0900 as *const u32;
    let n = unsafe { ptr::read(p) };
    println!("{}", n);
}
Note that you need to decide what type of pointer you want when casting the address to a pointer.
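
On an ordinary operating system, reading from a fixed address such as 0x0900 will most likely crash, so the sketch below shows the same integer-to-pointer cast against an address that is known to be valid. The helper name read_u32_at is hypothetical. For memory-mapped hardware registers, std::ptr::read_volatile is usually the more appropriate call.

```rust
use std::ptr;

/// Hypothetical helper: reads a `u32` from a raw address.
///
/// # Safety
/// `addr` must be non-null, properly aligned for `u32`, and point to
/// memory that is valid to read as a `u32`.
unsafe fn read_u32_at(addr: usize) -> u32 {
    // The pointee type is chosen here, when casting the address.
    let p = addr as *const u32;
    unsafe { ptr::read(p) }
}

fn main() {
    let value: u32 = 0xDEAD_BEEF;
    // Take the address of a live variable so the read is actually valid.
    let addr = &value as *const u32 as usize;
    let n = unsafe { read_u32_at(addr) };
    println!("{:#x}", n);
}
```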

How do I properly create a heap allocated int and pass it to another function for modification

I'm playing around with the most basic concepts of Rust and have a follow up question to my previous question: Why does the binary + operator not work with two &mut int?
I want to create an int on the heap and I want to pass it to another function for modification.
I came up with this:
fn increment(number: &mut Box<int>) {
    **number = **number + **number;
    println!("{}", number);
}
fn main() {
    let mut test = box 5;
    increment(&mut test);
    println!("{}", test);
}
This prints
10
10
which is what I want.
However, the ** looks odd and I figured I could also write this kind of thing:
fn increment(number: &mut int) {
    *number = *number + *number;
    println!("{}", number);
}
fn main() {
    let mut test = box 5;
    increment(&mut* test);
    println!("{}", test);
}
With respect to my intention to create an int on the heap and modify it in another function, which of these methods is correct (if any)? To me it looks as if in the second example, the variable is moved from the heap back to the stack before it is passed to the increment method, which is not what I want. On the other hand, the ** syntax of the first example looks pretty odd. :-/
You're just about there. From the Rust tutorial:
In the case of `owned_box`, however, no explicit action is necessary. The compiler will automatically convert a box `box point` to a reference like `&point`. This is another form of borrowing; in this case, the contents of the owned box are being lent out.
So all you need to do is take advantage of the implicit borrowing:
fn increment(number: &mut int) {
    *number = *number + *number;
}
fn main() {
    let mut test = box 5;
    increment(test);
    println!("{}", test);
}
I don't think the value of test is moved back to the stack here. I suspect that a conversion from Box to a borrowed reference won't have any runtime overhead, but I don't know enough about the implementation to be certain.
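
The snippets in this question use pre-1.0 syntax (int, box 5) that current compilers no longer accept. As a rough modern equivalent, assuming i32 in place of int and Box::new in place of box, the same idea might be written as below. Note that in current Rust the borrow has to be written explicitly as &mut test; deref coercion then turns the &mut Box<i32> into a &mut i32.

```rust
fn increment(number: &mut i32) {
    *number = *number + *number;
}

fn main() {
    // The integer lives on the heap; `test` owns the allocation.
    let mut test = Box::new(5);

    // `&mut test` has type `&mut Box<i32>` and is coerced to `&mut i32`.
    // The value stays on the heap; only a reference is passed.
    increment(&mut test);

    println!("{}", test);
}
```

Writing `&mut *test` instead performs the same reborrow explicitly and yields the same reference.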
