More convenient way to work with strings in winapi calls

I'm looking for a more convenient way to work with String in winapi calls in Rust.
I'm using Rust 0.12.0-nightly with winapi 0.1.22 and user32-sys 0.1.1.
Right now I'm using something like this:
use winapi;
use user32;
pub fn get_window_title(handle: i32) -> String {
let mut v: Vec<u16> = Vec::new();
v.reserve(255);
let mut p = v.as_mut_ptr();
let len = v.len();
let cap = v.capacity();
let mut read_len = 0;
unsafe {
mem::forget(v);
read_len = unsafe { user32::GetWindowTextW(handle as winapi::HWND, p, 255) };
if read_len > 0 {
return String::from_utf16_lossy(Vec::from_raw_parts(p, read_len as usize, cap).as_slice());
} else {
return "".to_string();
}
}
}
I think this vector-based memory allocation is rather bizarre, so I'm looking for an easier way to convert an LPCWSTR into a String.

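In your situation, you always read at most 255 UTF-16 code units (u16 values, not bytes), so you can use an array instead of a vector. This reduces the entire boilerplate to a mem::uninitialized() call, an as_mut_ptr() call and a slicing operation. (mem::uninitialized() has since been deprecated; on current Rust a zero-initialized array, [0u16; 255], works just as well here.)
unsafe {
let mut v: [u16; 255] = mem::uninitialized();
let read_len = user32::GetWindowTextW(
handle as winapi::HWND,
v.as_mut_ptr(),
255,
);
// GetWindowTextW returns a signed integer, so convert it before slicing
String::from_utf16_lossy(&v[0..read_len as usize])
}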
In case you wanted to use a Vec, there's an easier way than to destroy the vec and re-create it. You can write to the Vec's content directly and let Rust handle everything else.
let mut v: Vec<u16> = Vec::with_capacity(255);
unsafe {
let read_len = user32::GetWindowTextW(
handle as winapi::HWND,
v.as_mut_ptr(),
v.capacity() as i32,
);
v.set_len(read_len as usize); // this is undefined behavior if read_len > v.capacity()
String::from_utf16_lossy(&v)
}
As a side-note, it is idiomatic in Rust to not use return on the last statement in a function, but to simply let the expression stand there without a semicolon. In your original code, the final if-expression could be written as
if read_len > 0 {
String::from_utf16_lossy(Vec::from_raw_parts(p, read_len as usize, cap).as_slice())
} else {
"".to_string()
}
but I removed the entire condition from my samples, as it is unnecessary to handle 0 read characters differently from n characters.
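The Vec-based pattern above can be tried without Windows. The following is only a sketch: fake_get_window_text is a hypothetical stand-in for user32::GetWindowTextW (it is not part of winapi or user32-sys), assumed to write at most max - 1 UTF-16 code units plus a terminating zero and to return the number of units written.

```rust
/// Hypothetical stand-in for user32::GetWindowTextW, for illustration only.
/// Writes at most `max - 1` UTF-16 code units plus a terminating 0 into `buf`
/// and returns the number of code units written (excluding the terminator).
unsafe fn fake_get_window_text(buf: *mut u16, max: i32) -> i32 {
    if max <= 0 {
        return 0;
    }
    let title: Vec<u16> = "Untitled - Notepad".encode_utf16().collect();
    let n = title.len().min(max as usize - 1);
    unsafe {
        std::ptr::copy_nonoverlapping(title.as_ptr(), buf, n);
        *buf.add(n) = 0;
    }
    n as i32
}

fn get_title() -> String {
    // Reserve space but keep len == 0; the callee fills the spare capacity.
    let mut v: Vec<u16> = Vec::with_capacity(255);
    let max = v.capacity().min(i32::MAX as usize) as i32;
    let read_len = unsafe { fake_get_window_text(v.as_mut_ptr(), max) };
    // Clamp defensively so set_len can never exceed the capacity.
    let len = (read_len.max(0) as usize).min(v.capacity());
    unsafe { v.set_len(len) };
    String::from_utf16_lossy(&v)
}

fn main() {
    println!("{}", get_title());
}
```

Clamping read_len before set_len is a design choice of this sketch; it guards against the undefined behavior mentioned in the comment above at the cost of one extra comparison.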

Related

How to pass &mut str and change the original mut str without a return?

I'm learning Rust from the Book and I was tackling the exercises at the end of chapter 8, but I'm hitting a wall with the one about converting words into Pig Latin. I wanted to see specifically if I could pass a &mut String to a function that takes a &mut str (to also accept slices) and modify the referenced string inside it so the changes are reflected back outside without the need of a return, like in C with a char **.
I'm not quite sure if I'm just messing up the syntax or if it's more complicated than it sounds due to Rust's strict rules, which I have yet to fully grasp. For the lifetime errors inside to_pig_latin() I remember reading something that explained how to properly handle the situation but right now I can't find it, so if you could also point it out for me it would be very appreciated.
Also what do you think of the way I handled the chars and indexing inside strings?
use std::io::{self, Write};
fn main() {
let v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
for s in &v {
// cannot borrow `s` as mutable, as it is not declared as mutable
// cannot borrow data in a `&` reference as mutable
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
io::stdout().flush().unwrap();
}
fn to_pig_latin(mut s: &mut str) {
let first = s.chars().nth(0).unwrap();
let mut pig;
if "aeiouAEIOU".contains(first) {
pig = format!("{}-{}", s, "hay");
s = &mut pig[..]; // `pig` does not live long enough
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
pig = format!("{}-{}{}", word, first.to_lowercase(), "ay");
s = &mut pig[..]; // `pig` does not live long enough
}
}
Edit: here's the fixed code with the suggestions from below.
fn main() {
// added mut
let mut v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
// added mut
for mut s in &mut v {
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
println!();
}
// converted into &mut String
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay");
} else {
// added code to make the new first letter uppercase
let second = s.chars().nth(1).unwrap();
*s = format!(
"{}{}-{}ay",
second.to_uppercase(),
// the slice starts at the third char of the string, as if &s[2..];
// add both lengths, since the two chars may differ in UTF-8 size
&s[first.len_utf8() + second.len_utf8()..],
first.to_lowercase()
);
}
}
What you're trying to do can't work: with a mutable reference you can update the referent in-place, but this is extremely limited here:
- a &mut str can't change length or anything of that matter
- a &mut str is still just a reference, so the memory has to live somewhere. Here you're creating new Strings inside your function and then trying to use them as the new backing buffers for the reference, which, as the compiler tells you, doesn't work: the String will be deallocated at the end of the function
What you could do is take an &mut String, that lets you modify the owned string itself in-place, which is much more flexible. And, in fact, corresponds exactly to your request: an &mut str corresponds to a char*, it's a pointer to a place in memory.
A String is also a pointer, so an &mut String is a double-pointer to a zone in memory.
So something like this:
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
*s = format!("{}-{}", s, "hay");
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
*s = format!("{}-{}{}", word, first.to_lowercase(), "ay");
}
}
You can also likely avoid some of the complete string allocations by using somewhat finer methods e.g.
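use std::fmt::Write as _; // brings write! support for String into scope
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay")
} else {
// remove the first char (the range up to its UTF-8 length), keep the rest
s.replace_range(..first.len_utf8(), "");
write!(s, "-{}ay", first.to_lowercase()).unwrap();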
}
}
although the replace_range + write! is not very readable and not super likely to be much of a gain, so that might as well be a format!, something along the lines of:
fn to_pig_latin(s: &mut String) {
let first = s.chars().nth(0).unwrap();
if "aeiouAEIOU".contains(first) {
s.push_str("-hay")
} else {
*s = format!("{}-{}ay", &s[first.len_utf8()..], first.to_lowercase());
}
}
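To make the difference between the two reference types concrete, here is a small sketch that contrasts them. The shout function is a made-up helper for illustration; it shows the kind of length-preserving edit a &mut str does allow, while to_pig_latin needs &mut String because it changes the length.

```rust
use std::fmt::Write;

// A &mut str can only be edited in place without changing its length.
// (`shout` is a hypothetical helper, not part of the question.)
fn shout(s: &mut str) {
    s.make_ascii_uppercase();
}

// A &mut String owns its buffer, so it can shrink, grow or be replaced.
fn to_pig_latin(s: &mut String) {
    let first = match s.chars().next() {
        Some(c) => c,
        None => return, // leave empty strings untouched
    };
    if "aeiouAEIOU".contains(first) {
        s.push_str("-hay");
    } else {
        s.replace_range(..first.len_utf8(), ""); // drop the first char
        write!(s, "-{}ay", first.to_lowercase()).unwrap();
    }
}

fn main() {
    let mut words = vec![String::from("kaka"), String::from("Apple")];
    for w in &mut words {
        to_pig_latin(w);
        shout(w); // &mut String coerces to &mut str
    }
    println!("{}", words.join(", ")); // prints "AKA-KAY, APPLE-HAY"
}
```

Iterating with for w in &mut words hands out a &mut String per element directly, so no extra &mut is needed at the call site.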

Why does a generic function replicating C's fread for unsigned integers always return zero?

I am trying to read in binary 16-bit machine instructions from a 16-bit architecture (the exact nature of that is irrelevant here), and print them back out as hexadecimal values. In C, I found this simple by using the fread function to read 16 bits into a uint16_t.
I figured that I would try to replicate fread in Rust. It seems to be reasonably trivial if I can know ahead-of-time the exact size of the variable that is being read into, and I had that working specifically for 16 bits.
I decided that I wanted to try to make the fread function generic over the various built-in unsigned integer types. For that I came up with the below function, using some traits from the Num crate:
fn fread<T>(
buffer: &mut T,
element_count: usize,
stream: &mut BufReader<File>,
) -> Result<usize, std::io::Error>
where
T: num::PrimInt + num::Unsigned,
{
let type_size = std::mem::size_of::<T>();
let mut buf = Vec::with_capacity(element_count * type_size);
let buf_slice = buf.as_mut_slice();
let bytes_read = match stream.read_exact(buf_slice) {
Ok(()) => element_count * type_size,
Err(ref e) if e.kind() == std::io::ErrorKind::UnexpectedEof => 0,
Err(e) => panic!("{}", e),
};
*buffer = buf_slice
.iter()
.enumerate()
.map(|(i, &b)| {
let mut holder2: T = num::zero();
holder2 = holder2 | T::from(b).expect("Casting from u8 to T failed");
holder2 << ((type_size - i) * 8)
})
.fold(num::zero(), |acc, h| acc | h);
Ok(bytes_read)
}
The issue is that when I call it in the main function, I seem to always get 0x00 back out, but the number of bytes read that is returned by the function is always 2, so that the program enters an infinite loop:
extern crate num;
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::Read;
fn main() -> Result<(), std::io::Error> {
let cmd_line_args = std::env::args().collect::<Vec<_>>();
let f = File::open(&cmd_line_args[1])?;
let mut reader = BufReader::new(f);
let mut instructions: Vec<u16> = Vec::new();
let mut next_instruction: u16 = 0;
fread(&mut next_instruction, 1, &mut reader)?;
let base_address = next_instruction;
while fread(&mut next_instruction, 1, &mut reader)? > 0 {
instructions.push(next_instruction);
}
println!("{:#04x}", base_address);
for i in instructions {
println!("0x{:04x}", i);
}
Ok(())
}
It appears to me that I'm somehow never reading anything from the file, so the function always just returns the number of bytes it was supposed to read. I'm clearly not using something correctly here, but I'm honestly unsure what I'm doing wrong.
This is compiled on Rust 1.26 stable for Windows if that matters.
What am I doing wrong, and what should I do differently to replicate fread? I realise that this is probably a case of the XY problem (in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer), but I'm really curious as to what I'm doing wrong here.
Your problem is that this line:
let mut buf = Vec::with_capacity(element_count * type_size);
creates a zero-length vector, even though it allocates memory for element_count * type_size bytes. Therefore you are asking stream.read_exact to read zero bytes. One way to fix this is to replace the above line with:
let mut buf = vec![0; element_count * type_size];
Side note: when the read succeeds, bytes_read receives the number of bytes you expected to read, not the number of bytes you actually read. You should probably use std::mem::size_of_val(buf_slice) to get the true byte count.
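As a std-only illustration of the fix, the sketch below reads one u16 at a time from any reader. It is not generic like the fread in the question, and the big-endian byte order is an assumption (the question does not state one); read_u16_be is a name invented for this example.

```rust
use std::io::{self, Cursor, Read};

/// Reads one u16, assuming big-endian byte order.
/// Returns Ok(None) when the input ends before a full value is available.
fn read_u16_be<R: Read>(stream: &mut R) -> io::Result<Option<u16>> {
    // vec![0; n] has length n; Vec::with_capacity(n) would have length 0,
    // and read_exact on an empty slice reads nothing.
    let mut buf = vec![0u8; std::mem::size_of::<u16>()];
    match stream.read_exact(&mut buf) {
        Ok(()) => Ok(Some(u16::from_be_bytes([buf[0], buf[1]]))),
        Err(ref e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let empty: Vec<u8> = Vec::with_capacity(2);
    println!("with_capacity length: {}", empty.len()); // 0

    // An in-memory reader standing in for the BufReader<File>.
    let mut reader = Cursor::new(vec![0x30u8, 0x00, 0x12, 0x34]);
    while let Some(word) = read_u16_be(&mut reader)? {
        println!("0x{:04x}", word);
    }
    Ok(())
}
```

Returning Option instead of a byte count makes the end-of-input case explicit, so the calling loop cannot spin forever on a count that never drops to zero.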
On whether "there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer": yes, use the byteorder crate. This requires no unneeded heap allocation (the Vec in the original code):
extern crate byteorder;
use byteorder::{LittleEndian, ReadBytesExt};
use std::{
fs::File, io::{self, BufReader, Read},
};
fn read_instructions_to_end<R>(mut rdr: R) -> io::Result<Vec<u16>>
where
R: Read,
{
let mut instructions = Vec::new();
loop {
match rdr.read_u16::<LittleEndian>() {
Ok(instruction) => instructions.push(instruction),
Err(e) => {
return if e.kind() == std::io::ErrorKind::UnexpectedEof {
Ok(instructions)
} else {
Err(e)
}
}
}
}
}
fn main() -> Result<(), std::io::Error> {
let name = std::env::args().skip(1).next().expect("no file name");
let f = File::open(name)?;
let mut f = BufReader::new(f);
let base_address = f.read_u16::<LittleEndian>()?;
let instructions = read_instructions_to_end(f)?;
println!("{:#04x}", base_address);
for i in &instructions {
println!("0x{:04x}", i);
}
Ok(())
}

Returning modified array from for loop without type mismatch

In pseudo-code, I'm trying to do the following:
my_array = [[1,2,3,4],[5,6,7,8]]
my_array = array_modify_fn(my_array)
fn array_modify_fn(array) {
for i in array {
array[i] = some_operation
}
}
Having read this question about the type mismatch this kind of loop/function would cause in Rust, I'm still confused as to how to actually implement what I want to implement here, but in Rust.
Am I just going about the problem in the wrong way? (For Rust at least; this is how I would do it in Python.)
My Rust at the moment looks like this:
let mut life_array = [[false; SIZE]; SIZE];
life_array = random_init(&mut life_array); // in main function
fn random_init(arr: &mut [[bool; SIZE]; SIZE]) -> [[bool; SIZE]; SIZE] {
for i in 0 .. (SIZE*SIZE) {
arr[i/SIZE][i%SIZE] = rand::random()
}
}
and this returns the type mismatch: expected type '[[bool; SIZE]; SIZE]' found type '()'
You've defined random_init with a return type, yet your function doesn't return anything (strictly speaking, it returns ()). Since you're mutating the array in-place, your function doesn't have to return anything, so you should just omit the return type.
const SIZE: usize = 4;
extern crate rand;
fn main() {
let mut life_array = [[false; SIZE]; SIZE];
random_init(&mut life_array);
}
fn random_init(arr: &mut [[bool; SIZE]; SIZE]) {
for i in 0..(SIZE * SIZE) {
arr[i / SIZE][i % SIZE] = rand::random()
}
}
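The same in-place pattern can be sketched without the rand crate. In the version below, init_with is a hypothetical variant of random_init that takes the value generator as a closure, so a deterministic generator can be passed for testing; passing rand::random as the closure should behave like the function above, though that is an assumption not checked here.

```rust
const SIZE: usize = 4;

// Fills the grid in place; nothing is returned because the caller's array
// is modified through the mutable reference.
// (`init_with` and its `next` parameter are names made up for this sketch.)
fn init_with(arr: &mut [[bool; SIZE]; SIZE], mut next: impl FnMut() -> bool) {
    for cell in arr.iter_mut().flatten() {
        *cell = next();
    }
}

fn main() {
    let mut life_array = [[false; SIZE]; SIZE];
    // Deterministic stand-in for a random source: alternate true/false.
    let mut toggle = false;
    init_with(&mut life_array, || {
        toggle = !toggle;
        toggle
    });
    for row in &life_array {
        println!("{:?}", row);
    }
}
```

Iterating over the cells directly also avoids the i / SIZE and i % SIZE index arithmetic.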

Can I reset a borrow of a local in a loop?

I have a processing loop that needs a pointer to a large lookup table.
The pointer is unfortunately triply indirected from the source data, so keeping that pointer around for the inner loop is essential for performance.
Is there any way I can tell the borrow checker that I'm "unborrowing" the state variable in the unlikely event that I need to modify the state, so that I only have to look the slice up again when modify_state is actually called?
One solution I thought of was to change data to be a slice reference and do a mem::replace on the struct at the beginning of the function and pull the slice into local scope, then replace it back at the end of the function — but that is very brittle and error prone (as I need to remember to replace the item on every return). Is there another way to accomplish this?
struct DoubleIndirect {
data: [u8; 512 * 512],
lut: [usize; 16384],
lut_index: usize,
}
#[cold]
fn modify_state(s: &mut DoubleIndirect) {
s.lut_index += 63;
s.lut_index %= 16384;
}
fn process(state: &mut DoubleIndirect) -> [u8; 65536] {
let mut ret: [u8; 65536] = [0; 65536];
let mut count = 0;
let mut data_slice = &state.data[state.lut[state.lut_index]..];
for ret_item in ret.iter_mut() {
*ret_item = data_slice[count];
if count % 197 == 196 {
data_slice = &[];
modify_state(state);
data_slice = &state.data[state.lut[state.lut_index]..];
}
count += 1
}
return ret;
}
The simplest way to do this is to ensure the borrows of state are all disjoint:
#[cold]
fn modify_state(lut_index: &mut usize) {
*lut_index += 63;
*lut_index %= 16384;
}
fn process(state: &mut DoubleIndirect) -> [u8; 65536] {
let mut ret: [u8; 65536] = [0; 65536];
let mut count = 0;
let lut_index = &mut state.lut_index;
let mut data_slice = &state.data[state.lut[*lut_index]..];
for ret_item in ret.iter_mut() {
*ret_item = data_slice[count];
if count % 197 == 196 {
modify_state(lut_index);
data_slice = &state.data[state.lut[*lut_index]..];
}
count += 1
}
return ret;
}
The problem is basically two things. First, Rust will not look beyond a function's signature to find out what it does. As far as the compiler knows, your call to modify_state could be changing state.data as well, and it can't allow that.
The second problem is that borrows are lexical: the compiler looks at the block of code where the borrow might be used and goes with that. It doesn't (currently) bother to try and reduce the length of borrows to match where they're actually active. (Non-lexical lifetimes, available in later Rust versions, relax this second restriction; the first still applies.)
You can also play games with, for example, using std::mem::replace to pull state.data out into a local variable, do your work, then replace it back just before you return.
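A rough sketch of that last idea follows. It assumes the table is stored in a Vec<u8> rather than a fixed-size array, so that std::mem::take can leave an empty Vec behind cheaply; the State struct, the smaller sizes and the trigger condition are simplified stand-ins for the ones in the question.

```rust
// Simplified stand-in for DoubleIndirect, with Vec fields (an assumption).
struct State {
    data: Vec<u8>,
    lut: Vec<usize>,
    lut_index: usize,
}

fn modify_state(s: &mut State) {
    s.lut_index = (s.lut_index + 1) % s.lut.len();
}

fn process(state: &mut State, n: usize) -> Vec<u8> {
    // Move the table into a local; `state.data` is an empty Vec meanwhile,
    // so borrowing `data` no longer borrows `state`.
    let data = std::mem::take(&mut state.data);
    let mut out = Vec::with_capacity(n);
    let mut slice = &data[state.lut[state.lut_index]..];
    for count in 0..n {
        out.push(slice[count]);
        if count % 2 == 1 {
            modify_state(state); // fine: `slice` borrows the local `data`
            slice = &data[state.lut[state.lut_index]..];
        }
    }
    // Put the table back before returning.
    state.data = data;
    out
}

fn main() {
    let mut state = State {
        data: vec![10, 20, 30, 40, 50, 60, 70, 80],
        lut: vec![0, 4],
        lut_index: 0,
    };
    println!("{:?}", process(&mut state, 4)); // prints [10, 20, 70, 80]
}
```

The drawback raised in the question still applies: every early return has to restore state.data, and modify_state sees an empty table while the loop runs.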

Using str and String interchangeably

Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
for t in v.iter_mut() {
if (t.contains("$world")) {
*t = &t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
Rust has exactly what you want in the form of the Cow (clone-on-write) type.
use std::borrow::Cow;
fn main() {
let mut v: Vec<_> = "Hello there $world!".split_whitespace()
.map(|s| Cow::Borrowed(s))
.collect();
for t in v.iter_mut() {
if t.contains("$world") {
*t.to_mut() = t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
As @sellibitze correctly notes, to_mut() creates a new String when the value is borrowed, which costs a heap allocation just to copy the old borrowed value that is about to be overwritten. If you are sure you only have borrowed strings, then you can use
*t = Cow::Owned(t.replace("$world", "Earth"));
In case the Vec contains Cow::Owned elements, this would still throw away the existing allocation. You can prevent that by using the following very fragile and unsafe code inside your for loop. It does direct byte-based manipulation of UTF-8 strings and relies on the fact that, once the $ sign is removed, the replacement has exactly the same number of bytes as the text it overwrites.
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
let p = pos + last_pos; // find returns an offset relative to last_pos
last_pos = p + 5; // resume after the 5 bytes of "Earth" written below
unsafe {
let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
s.remove(p); // remove $ sign
for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
*sc = c;
}
}
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.
use std::borrow::Cow;
fn main() {
let mut v: Vec<Cow<'static, str>> = vec![];
v.push("oh hai".into());
v.push(format!("there, {}.", "Mark").into());
println!("{:?}", v);
}
Produces:
["oh hai", "there, Mark."]
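Applied to the tokenizing example from the question, this might look like the sketch below. The substitute function is a name invented here; it borrows when the token is unchanged and allocates only when a replacement is actually needed.

```rust
use std::borrow::Cow;

// Returns the token unchanged (borrowed) unless it contains the variable,
// in which case a new owned String is built.
// (`substitute` is a hypothetical helper for this sketch.)
fn substitute(token: &str) -> Cow<'_, str> {
    if token.contains("$world") {
        Cow::Owned(token.replace("$world", "Earth"))
    } else {
        Cow::Borrowed(token)
    }
}

fn main() {
    let text = "Hello there $world!";
    let v: Vec<Cow<str>> = text.split_whitespace().map(substitute).collect();
    println!("{:?}", v); // prints ["Hello", "there", "Earth!"]
}
```

Because Cow<str> dereferences to str, the rest of the parser can keep treating every element as a &str regardless of which variant it holds.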

Resources