Rust - padded array of bytes from str

This Rust code does exactly what I want, but I don't write much Rust and I get the feeling this could be done much better, maybe even in one line. Can anyone give hints toward a more idiomatic Rust approach?
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1f139ccf6e8f88dbe92f1f1e4d7a487a
fn fill_from_str(bytes: &mut [u8], s: &str) {
    let mut i = 0;
    for b in s.as_bytes() {
        bytes[i] = *b;
        i = i + 1;
    }
}
fn main() {
    let mut bytes: [u8; 10] = [0; 10];
    fill_from_str(&mut bytes, "hello");
    println!("{:?}", bytes);
}

This can be done very succinctly via std::io::Write, which is implemented for &mut [u8]. The write call copies as many bytes as fit; anything beyond the end of the slice is silently dropped:
use std::io::Write;
fn fill_from_str(mut bytes: &mut [u8], s: &str) {
    bytes.write(s.as_bytes()).unwrap();
}
fn main() {
    let mut bytes: [u8; 10] = [0; 10];
    fill_from_str(&mut bytes, "hello");
    println!("{:?}", bytes);
}
[104, 101, 108, 108, 111, 0, 0, 0, 0, 0]
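Since write returns the number of bytes it actually copied, a small variation (a sketch, not part of the original answer) can report truncation instead of discarding that count. Writing into a &mut [u8] never returns an error; once the slice is full it only reports a short write, so the unwrap() here should not panic:

```rust
use std::io::Write;

/// Copies as much of `s` as fits into `bytes` and returns how many bytes
/// were copied, so the caller can detect truncation.
fn fill_from_str(mut bytes: &mut [u8], s: &str) -> usize {
    // `Write for &mut [u8]` always returns Ok(n), where n may be less than
    // s.len() if the slice is too short.
    bytes.write(s.as_bytes()).unwrap()
}

fn main() {
    let mut buf = [0u8; 5];
    let s = "hello, world";
    let written = fill_from_str(&mut buf, s);
    if written < s.len() {
        println!("truncated: only {} of {} bytes fit", written, s.len());
    }
    println!("{:?}", buf);
}
```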

There is a better way to do this: the copy_from_slice method. If the slice and the string have the same length in bytes, this is a one-liner (copy_from_slice panics if the lengths differ):
fn copy_from_str(dest: &mut [u8], src: &str) {
    dest.copy_from_slice(src.as_bytes());
}
The copy_from_slice method is essentially a single call to memcpy (after a length check), so it is likely faster than your byte-by-byte loop. If you want to support different sizes, a little more code is needed:
fn copy_from_str(dest: &mut [u8], src: &str) {
    if dest.len() == src.len() {
        dest.copy_from_slice(src.as_bytes());
    } else if dest.len() > src.len() {
        dest[..src.len()].copy_from_slice(src.as_bytes());
    } else {
        dest.copy_from_slice(&src.as_bytes()[..dest.len()]);
    }
}
Because it slices the bytes rather than the str, this function also will not panic if the cut falls in the middle of a multibyte character (dest then simply ends with an incomplete UTF-8 sequence).
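If you want the padded array itself rather than filling an existing buffer, a sketch using const generics (Rust 1.51 or later) collapses the three branches above into one by copying min(len) bytes. padded is a hypothetical helper name:

```rust
/// Hypothetical helper: builds a zero-padded `[u8; N]` from `s`, truncating
/// if `s` is longer than `N` bytes (the cut may split a multibyte character).
fn padded<const N: usize>(s: &str) -> [u8; N] {
    let mut out = [0u8; N];
    let n = s.len().min(N); // number of bytes that fit
    out[..n].copy_from_slice(&s.as_bytes()[..n]);
    out
}

fn main() {
    let bytes: [u8; 10] = padded("hello");
    println!("{:?}", bytes); // [104, 101, 108, 108, 111, 0, 0, 0, 0, 0]
}
```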

Related

How can I convert an [u8] hex ascii representation to a u64

I would like to convert my bytes array into a u64.
For example
b"00" should return 0u64
b"0a" should return 10u64
I am working on a blockchain project, so I need something efficient.
For example, my current approach is not efficient at all:
let number_string = String::from_utf8_lossy(&my_bytes_array)
    .to_owned()
    .to_string();
let number = u64::from_str_radix(&number_string, 16).unwrap();
I have also tried
let number = u64::from_le_bytes(my_bytes_array);
But I got this error: mismatched types, expected array [u8; 8], found &[u8]

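How about this?
pub fn hex_to_u64(x: &[u8]) -> Option<u64> {
    let mut result: u64 = 0;
    for &b in x {
        // to_digit(16) returns None for anything that is not a hex digit
        let digit = (b as char).to_digit(16)? as u64;
        // checked_mul/checked_add return None on overflow (more than 16 hex digits)
        result = result.checked_mul(16)?.checked_add(digit)?;
    }
    Some(result)
}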
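Another option (a sketch, not benchmarked) is to borrow the bytes as a &str with std::str::from_utf8, which validates without allocating, and hand that to u64::from_str_radix, so no String is built. Unlike the loop above, it returns None for empty input. parse_hex_u64 is a hypothetical name:

```rust
use std::str;

/// Hypothetical helper: parses ASCII hex bytes such as b"0a" into a u64.
/// `str::from_utf8` only validates and borrows the bytes; nothing is copied.
fn parse_hex_u64(bytes: &[u8]) -> Option<u64> {
    let s = str::from_utf8(bytes).ok()?;
    // None for empty input, non-hex characters, or values that overflow u64.
    u64::from_str_radix(s, 16).ok()
}

fn main() {
    println!("{:?}", parse_hex_u64(b"00")); // Some(0)
    println!("{:?}", parse_hex_u64(b"0a")); // Some(10)
    println!("{:?}", parse_hex_u64(b"zz")); // None
}
```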

How to use `waitpid` to wait for a process in Rust?

I am trying to implement a merge sort using processes but I have a problem using the waitpid function:
extern crate nix;
extern crate rand;
use nix::sys::wait::WaitStatus;
use rand::Rng;
use std::io;
use std::process::exit;
use std::thread::sleep;
use std::time::{Duration, Instant};
use nix::sys::wait::waitpid;
use nix::unistd::Pid;
use nix::unistd::{fork, getpid, getppid, ForkResult};
static mut process_count: i32 = 0;
static mut thread_count: i32 = 0;
fn generate_array(len: usize) -> Vec<f64> {
    let mut my_vector: Vec<f64> = Vec::new();
    for _ in 0..len {
        my_vector.push(rand::thread_rng().gen_range(0.0, 100.0)); // 0 - 99.99999
    }
    return my_vector;
}
fn get_array_size_from_user() -> usize {
    let mut n = String::new();
    io::stdin()
        .read_line(&mut n)
        .expect("failed to read input.");
    let n: usize = n.trim().parse().expect("invalid input");
    return n;
}
fn display_array(array: &mut Vec<f64>) {
    println!("{:?}", array);
    println!();
}
fn clear_screen() {
    print!("{}[2J", 27 as char);
    //print!("\x1B[2J"); // 2nd option
}
pub fn mergeSort(a: &mut Vec<f64>, low: usize, high: usize) {
    let middle = (low + high) / 2;
    let mut len = (high - low + 1);
    if (len <= 1) {
        return;
    }
    let lpid = fork();
    match lpid {
        Ok(ForkResult::Child) => {
            println!("Left Process Running ");
            mergeSort(a, low, middle);
            exit(0);
        }
        Ok(ForkResult::Parent { child }) => {
            let rpid = fork();
            match rpid {
                Ok(ForkResult::Child) => {
                    println!("Right Process Running ");
                    mergeSort(a, middle + 1, high);
                    exit(0);
                }
                Ok(ForkResult::Parent { child }) => {}
                Err(err) => {
                    panic!("Right process not created: {}", err);
                }
            };
        }
        Err(err) => {
            panic!("Left process not created {}", err);
        }
    };
    //waitpid(lpid, None);
    //waitpid(rpid, None);
    // Merge the sorted subarrays
    merge(a, low, middle, high);
}
fn merge(a: &mut Vec<f64>, low: usize, m: usize, high: usize) {
    println!("x");
    let mut left = a[low..m + 1].to_vec();
    let mut right = a[m + 1..high + 1].to_vec();
    println!("left: {:?}", left);
    println!("right: {:?}", right);
    left.reverse();
    right.reverse();
    for k in low..high + 1 {
        if left.is_empty() {
            a[k] = right.pop().unwrap();
            continue;
        }
        if right.is_empty() {
            a[k] = left.pop().unwrap();
            continue;
        }
        if right.last() < left.last() {
            a[k] = right.pop().unwrap();
        } else {
            a[k] = left.pop().unwrap();
        }
    }
    println!("array: {:?}", a);
}
unsafe fn display_process_thread_counts() {
    unsafe {
        println!("process count:");
        println!("{}", process_count);
        println!("thread count:");
        println!("{}", thread_count);
    }
}
unsafe fn process_count_plus_plus() {
    process_count += 1;
}
unsafe fn thread_count_plus_plus() {
    thread_count += 1;
}
fn print_time(start: Instant, end: Instant) {
    println!("TIME:");
    println!("{:?}", end.checked_duration_since(start));
}
fn main() {
    println!("ENTER SIZE OF ARRAY \n");
    let array_size = get_array_size_from_user();
    let mut generated_array = generate_array(array_size);
    clear_screen();
    println!("GENERATED ARRAY: \n");
    display_array(&mut generated_array);
    // SORTING
    let start = Instant::now();
    mergeSort(&mut generated_array, 0, array_size - 1);
    let end = Instant::now();
    // RESULT
    //unsafe{
    //    process_count_plus_plus();
    //    thread_count_plus_plus();
    //}
    println!("SORTED ARRAY: \n");
    display_array(&mut generated_array);
    print_time(start, end);
    unsafe {
        display_process_thread_counts();
    }
}
I get these results without using waitpid for the vector [3, 70, 97, 74]:
array before comparison: [3, 70, 97, 74]
comparison: [97], [74]
array after comparison: [3, 70, 74, 97]
array before comparison: [3, 70, 97, 74]
comparison: [3], [70]
array after comparison: [3, 70, 97, 74]
array before comparison: [3, 70, 97, 74]
comparison: [3, 70], [97, 74] (should be [74, 97])
array after comparison: [3, 70, 97, 74]

This has nothing to do with waitpid and everything to do with fork. When you fork a process, the OS creates a copy of your data and the child operates on this copy [1]. When the child exits, its memory is discarded, so the parent never sees the changes made by the child. (Incidentally, waitpid expects the child's Pid, i.e. the child field of ForkResult::Parent { child }, not the Result returned by fork; but waiting for the children would not fix the sort on its own.)
If you need the parent to see the changes made by the child, you should do one of the following:
The easiest and fastest option is to use threads instead of processes. Threads share memory, so the parent and children all use the same memory. In Rust, the borrow checker ensures that parent and children behave correctly when accessing the same piece of memory.
Use mmap or something equivalent to share memory between the parent and child processes. Note, however, that it will be very difficult to ensure memory safety while the processes all try to access the same memory concurrently.
Use some kind of Inter-Process Communication (IPC) mechanism to send the result back from the children to the parent. This is easier than mmap since there is no risk of conflicting memory accesses, but in your case, given the amount of data that needs to be sent back, it will be the slowest option.
[1] Actually, it uses copy-on-write, so data that is simply read is shared, but anything that either the parent or child writes will be copied, and the other will not see the result of the write.
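As a sketch of the thread-based option, assuming Rust 1.63 or later for std::thread::scope, the merge sort below sorts each half on its own scoped thread. split_at_mut hands the two threads non-overlapping &mut slices, so the borrow checker accepts the shared array, and the scope joins both threads before the merge, which is the role waitpid was meant to play. The function names are illustrative, not taken from the question:

```rust
use std::thread;

/// Sorts `a` in place. Each half is sorted on its own scoped thread;
/// `split_at_mut` yields two disjoint `&mut` slices, so both threads
/// write directly into the caller's array (no copies, unlike fork).
fn parallel_merge_sort(a: &mut [f64]) {
    if a.len() <= 1 {
        return;
    }
    let mid = a.len() / 2;
    let (left, right) = a.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(move || parallel_merge_sort(left));
        s.spawn(move || parallel_merge_sort(right));
    }); // the scope waits for (joins) both threads before returning
    merge(a, mid);
}

/// Merges the already sorted halves `a[..mid]` and `a[mid..]` in place.
fn merge(a: &mut [f64], mid: usize) {
    let left = a[..mid].to_vec();
    let right = a[mid..].to_vec();
    let (mut i, mut j) = (0, 0);
    for slot in a.iter_mut() {
        // Take from `left` while it has elements and its head is not larger.
        if j >= right.len() || (i < left.len() && left[i] <= right[j]) {
            *slot = left[i];
            i += 1;
        } else {
            *slot = right[j];
            j += 1;
        }
    }
}

fn main() {
    let mut v = vec![3.0, 70.0, 97.0, 74.0];
    parallel_merge_sort(&mut v);
    println!("{:?}", v); // [3.0, 70.0, 74.0, 97.0]
}
```

Like the fork version, this spawns two threads at every level of recursion, which is wasteful for large inputs; a practical version would fall back to a sequential sort below some cutoff size.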

How can I convert a hex string to a u8 slice?

I have a string that looks like this "090A0B0C" and I would like to convert it to a slice that looks something like this [9, 10, 11, 12]. How would I best go about doing that?
I don't want to convert a single hex char tuple to a single integer value. I want to convert a string consisting of multiple hex char tuples to a slice of multiple integer values.

You can also implement hex encoding and decoding yourself, in case you want to avoid the dependency on the hex crate:
use std::{fmt::Write, num::ParseIntError};
pub fn decode_hex(s: &str) -> Result<Vec<u8>, ParseIntError> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16))
        .collect()
}
pub fn encode_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        write!(&mut s, "{:02x}", b).unwrap();
    }
    s
}
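Note that the decode_hex() function panics if the string length is odd (the last &s[i..i + 2] runs past the end of the string), and also if a two-byte window splits a multibyte character. I've made a version with better error handling and an optimised encoder available on the playground.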
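One way to add that error handling, sketched here under our own naming (DecodeHexError and hex_val are hypothetical, not from the hex crate), is to work on bytes so that odd lengths and non-ASCII input produce an error instead of a panic:

```rust
/// Hypothetical error type for a hand-rolled hex decoder.
#[derive(Debug, PartialEq)]
enum DecodeHexError {
    OddLength,
    InvalidDigit(usize), // byte offset of the offending character
}

/// Hypothetical helper: value of a single ASCII hex digit, or None.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decodes pairs of hex digits. Working on bytes means non-ASCII input
/// yields InvalidDigit rather than a char-boundary panic.
fn decode_hex(s: &str) -> Result<Vec<u8>, DecodeHexError> {
    let bytes = s.as_bytes();
    if bytes.len() % 2 != 0 {
        return Err(DecodeHexError::OddLength);
    }
    bytes
        .chunks_exact(2)
        .enumerate()
        .map(|(i, pair)| -> Result<u8, DecodeHexError> {
            let hi = hex_val(pair[0]).ok_or(DecodeHexError::InvalidDigit(2 * i))?;
            let lo = hex_val(pair[1]).ok_or(DecodeHexError::InvalidDigit(2 * i + 1))?;
            Ok((hi << 4) | lo)
        })
        .collect()
}

fn main() {
    println!("{:?}", decode_hex("090A0B0C")); // Ok([9, 10, 11, 12])
    println!("{:?}", decode_hex("090")); // Err(OddLength)
}
```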

You could use the hex crate for that. The decode function looks like it does what you want:
fn main() {
    let input = "090A0B0C";
    let decoded = hex::decode(input).expect("Decoding failed");
    println!("{:?}", decoded);
}
The above will print [9, 10, 11, 12]. Note that decode returns a heap-allocated Vec<u8>; if you want to decode into an array, you'd want to use the decode_to_slice function:
fn main() {
    let input = "090A0B0C";
    let mut decoded = [0; 4];
    hex::decode_to_slice(input, &mut decoded).expect("Decoding failed");
    println!("{:?}", decoded);
}
or the FromHex trait:
use hex::FromHex;
fn main() {
    let input = "090A0B0C";
    let decoded = <[u8; 4]>::from_hex(input).expect("Decoding failed");
    println!("{:?}", decoded);
}

How to convert a string of digits into a vector of digits?

I'm trying to store a string (or str) of digits, e.g. "12345", into a vector, such that the vector contains [1, 2, 3, 4, 5].
As I'm totally new to Rust, I'm having problems with the types (String, str, char, ...) and with the lack of information about converting between them.
My current code looks like this:
fn main() {
    let text = "731671";
    let mut v: Vec<i32>;
    let mut d = text.chars();
    for i in 0..text.len() {
        v.push( d.next().to_digit(10) );
    }
}

You're close!
First, the index loop for i in 0..text.len() is not necessary since you're going to use an iterator anyway. It's simpler to loop directly over the iterator: for ch in text.chars(). Not only that, but your index loop and the character iterator can diverge, because len() returns the number of bytes while chars() yields Unicode scalar values. Since strings are UTF-8, any string containing a non-ASCII character has fewer Unicode scalar values than bytes.
Next hurdle is that to_digit(10) returns an Option, telling you that there is a possibility the character won't be a digit. You can check whether to_digit(10) returned the Some variant of an Option with if let Some(digit) = ch.to_digit(10).
Pieced together, the code might now look like this:
fn main() {
    let text = "731671";
    let mut v = Vec::new();
    for ch in text.chars() {
        if let Some(digit) = ch.to_digit(10) {
            v.push(digit);
        }
    }
    println!("{:?}", v);
}
Now, this is rather imperative: you're making a vector and filling it digit by digit, all by yourself. You can try a more declarative or functional approach by applying a transformation over the string:
fn main() {
    let text = "731671";
    let v: Vec<u32> = text.chars().flat_map(|ch| ch.to_digit(10)).collect();
    println!("{:?}", v);
}

ArtemGr's answer is pretty good, but their version will skip any characters that aren't digits. If you'd rather have it fail on bad digits, you can use this version instead:
fn to_digits(text: &str) -> Option<Vec<u32>> {
    text.chars().map(|ch| ch.to_digit(10)).collect()
}
fn main() {
    println!("{:?}", to_digits("731671"));
    println!("{:?}", to_digits("731six71"));
}
Output:
Some([7, 3, 1, 6, 7, 1])
None
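If you also want to know where the bad input is, the same collect trick works with Result: collecting stops at the first Err, which here carries the byte offset of the first non-digit. parse_digits is a hypothetical name, sketched for illustration:

```rust
/// Hypothetical variant of `to_digits`: on failure it reports the byte
/// offset of the first character that is not a decimal digit.
fn parse_digits(text: &str) -> Result<Vec<u32>, usize> {
    text.char_indices()
        .map(|(i, ch)| ch.to_digit(10).ok_or(i))
        .collect()
}

fn main() {
    println!("{:?}", parse_digits("731671")); // Ok([7, 3, 1, 6, 7, 1])
    println!("{:?}", parse_digits("731six71")); // Err(3)
}
```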

To mention the quick-and-dirty elephant in the room: if you REALLY know your string contains only digits in the range '0'..='9', then you can avoid memory allocations and copies and use the underlying &[u8] representation from str::as_bytes directly. Subtract b'0' from each element whenever you access it.
If you are doing competitive programming, this is one of the worthwhile speed and memory optimizations.
fn main() {
    let text = "12345";
    let digit = text.as_bytes();
    println!("Text = {:?}", text);
    println!("value of digit[3] = {}", digit[3] - b'0');
}
Output:
Text = "12345"
value of digit[3] = 4
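A sketch of the same idea wrapped in a reusable, allocation-free iterator, assuming (as above) that the input really contains only ASCII digits. digits is a hypothetical helper; the debug_assert only catches bad input in debug builds:

```rust
/// Hypothetical helper: yields the value of each ASCII digit without
/// allocating. Assumes `text` contains only '0'..='9'; other bytes trip
/// the debug_assert in debug builds and yield meaningless values in release.
fn digits(text: &str) -> impl Iterator<Item = u8> + '_ {
    text.bytes().map(|b| {
        debug_assert!(b.is_ascii_digit(), "not an ASCII digit: {}", b);
        b.wrapping_sub(b'0')
    })
}

fn main() {
    let sum: u32 = digits("12345").map(u32::from).sum();
    println!("{}", sum); // 15
    println!("{:?}", digits("12345").nth(3)); // Some(4)
}
```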

This solution combines ArtemGr's and notriddle's solutions; it returns an empty vector if any character is not a digit:
fn to_digits(string: &str) -> Vec<u32> {
    let opt_vec: Option<Vec<u32>> = string
        .chars()
        .map(|ch| ch.to_digit(10))
        .collect();
    match opt_vec {
        Some(vec_of_digits) => vec_of_digits,
        None => vec![],
    }
}
In my case, I implemented this function as a trait method on &str:
pub trait ExtraProperties {
    fn to_digits(self) -> Vec<u32>;
}
impl ExtraProperties for &str {
    fn to_digits(self) -> Vec<u32> {
        let opt_vec: Option<Vec<u32>> = self
            .chars()
            .map(|ch| ch.to_digit(10))
            .collect();
        match opt_vec {
            Some(vec_of_digits) => vec_of_digits,
            None => vec![],
        }
    }
}
This way, I can turn a &str into a vector of its digits:
fn main() {
    let cnpj: &str = "123456789";
    let nums: Vec<u32> = cnpj.to_digits();
    println!("cnpj: {cnpj}"); // cnpj: 123456789
    println!("nums: {nums:?}"); // nums: [1, 2, 3, 4, 5, 6, 7, 8, 9]
}

Is there a method like JavaScript's substr in Rust?

I looked at the Rust docs for String but I can't find a way to extract a substring.
Is there a method like JavaScript's substr in Rust? If not, how would you implement it?
str.substr(start[, length])
The closest is probably slice_unchecked but it uses byte offsets instead of character indexes and is marked unsafe.

For characters, you can use s.chars().skip(pos).take(len):
fn main() {
    let s = "Hello, world!";
    let ss: String = s.chars().skip(7).take(5).collect();
    println!("{}", ss);
}
Beware, though, that chars() yields Unicode scalar values, which are not always what a user would perceive as a single character (a grapheme cluster).
For bytes, you can use the slice syntax:
fn main() {
    let s = b"Hello, world!";
    let ss = &s[7..12];
    println!("{:?}", ss);
}

You can use the as_str method on the Chars iterator to get back a &str slice after you have advanced the iterator. So to skip the first start chars, you can call
let s = "Some text to slice into";
let mut iter = s.chars();
iter.by_ref().take(start).for_each(drop); // eat up exactly `start` chars (nth(start) would eat start + 1)
let slice = iter.as_str(); // get back a slice of the rest of the iterator
Now if you also want to limit the length, you first need to figure out the byte position of the length-th character:
let end_pos = slice.char_indices().nth(length).map(|(n, _)| n).unwrap_or(slice.len()); // fewer than `length` chars left: keep them all
let substr = &slice[..end_pos];
This might feel a little roundabout, but Rust is not hiding anything from you that might take up CPU cycles. That said, I wonder why there's no crate yet that offers a substr method.
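Putting both steps together into one function gives something like the sketch below. substr is our own name, and out-of-range arguments are clamped rather than reported:

```rust
/// Sketch: up to `length` chars of `s`, starting at char index `start`,
/// returned as a borrowed slice (no allocation). Out-of-range values clamp.
fn substr(s: &str, start: usize, length: usize) -> &str {
    let mut iter = s.chars();
    // Advance past exactly `start` chars.
    iter.by_ref().take(start).for_each(drop);
    let rest = iter.as_str();
    // Byte offset of the `length`-th char of `rest`, or its end if shorter.
    let end = rest
        .char_indices()
        .nth(length)
        .map_or(rest.len(), |(i, _)| i);
    &rest[..end]
}

fn main() {
    println!("{}", substr("Some text to slice into", 5, 4)); // text
    println!("{:?}", substr("héllo", 1, 3)); // "éll"
    println!("{:?}", substr("abc", 10, 2)); // ""
}
```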

This code provides both substring (a start index plus a length) and range-based slicing, both counted in characters, without panicking or allocating:
use std::ops::{Bound, RangeBounds};
trait StringUtils {
    fn substring(&self, start: usize, len: usize) -> &str;
    fn slice(&self, range: impl RangeBounds<usize>) -> &str;
}
impl StringUtils for str {
    fn substring(&self, start: usize, len: usize) -> &str {
        let mut char_pos = 0;
        let mut byte_start = 0;
        let mut it = self.chars();
        loop {
            if char_pos == start { break; }
            if let Some(c) = it.next() {
                char_pos += 1;
                byte_start += c.len_utf8();
            }
            else { break; }
        }
        char_pos = 0;
        let mut byte_end = byte_start;
        loop {
            if char_pos == len { break; }
            if let Some(c) = it.next() {
                char_pos += 1;
                byte_end += c.len_utf8();
            }
            else { break; }
        }
        &self[byte_start..byte_end]
    }
    fn slice(&self, range: impl RangeBounds<usize>) -> &str {
        let start = match range.start_bound() {
            Bound::Included(bound) => *bound,
            Bound::Excluded(bound) => bound.saturating_add(1),
            Bound::Unbounded => 0,
        };
        let end = match range.end_bound() {
            Bound::Included(bound) => bound.saturating_add(1),
            Bound::Excluded(bound) => *bound,
            Bound::Unbounded => self.len(),
        };
        // saturating_sub: a reversed range such as 5..3 yields "" instead of underflowing
        self.substring(start, end.saturating_sub(start))
    }
}
fn main() {
    let s = "abcdèfghij";
    // All three statements should print:
    // "abcdè, abcdèfghij, dèfgh, dèfghij."
    println!("{}, {}, {}, {}.",
        s.substring(0, 5),
        s.substring(0, 50),
        s.substring(3, 5),
        s.substring(3, 50));
    println!("{}, {}, {}, {}.",
        s.slice(..5),
        s.slice(..50),
        s.slice(3..8),
        s.slice(3..));
    println!("{}, {}, {}, {}.",
        s.slice(..=4),
        s.slice(..=49),
        s.slice(3..=7),
        s.slice(3..));
}

For my_string.substring(start, len)-like syntax, you can write a custom trait:
trait StringUtils {
    fn substring(&self, start: usize, len: usize) -> Self;
}
impl StringUtils for String {
    fn substring(&self, start: usize, len: usize) -> Self {
        self.chars().skip(start).take(len).collect()
    }
}
// Usage:
fn main() {
    let phrase: String = "this is a string".to_string();
    println!("{}", phrase.substring(5, 8)); // prints "is a str"
}

Note that char_indices never yields the end offset s.len(), so a char_indices-based solution must separately handle a substring that ends at the last character of the string; this can be done with .chain(once(s.len())).
The function substr below implements substring slicing with error handling: if an invalid index is passed, the valid part of the string slice is returned in the Err variant. All corner cases should be handled correctly.
fn substr(s: &str, begin: usize, length: Option<usize>) -> Result<&str, &str> {
    use std::iter::once;
    // Byte offset of every char, followed by the end of the string.
    let mut itr = s.char_indices().map(|(n, _)| n).chain(once(s.len()));
    let beg = match itr.nth(begin) {
        Some(beg) => beg,
        None => return Err(""),
    };
    if length == Some(0) {
        return Ok("");
    }
    let end = length.map_or(Some(s.len()), |l| itr.nth(l - 1));
    match end {
        Some(end) => Ok(&s[beg..end]),
        None => Err(&s[beg..]),
    }
}
fn main() {
    let s = "abc🙂";
    assert_eq!(Ok("bc"), substr(s, 1, Some(2)));
    assert_eq!(Ok("c🙂"), substr(s, 2, Some(2)));
    assert_eq!(Ok("c🙂"), substr(s, 2, None));
    assert_eq!(Err("c🙂"), substr(s, 2, Some(99)));
    assert_eq!(Ok(""), substr(s, 2, Some(0)));
    assert_eq!(Err(""), substr(s, 5, Some(4)));
}
Note that this does not handle Unicode grapheme clusters. For example, "y̆es" contains 4 Unicode chars but 3 grapheme clusters. The unicode-segmentation crate solves this problem: grapheme clusters are handled correctly if the line
let mut itr = s.char_indices()...
is replaced with
use unicode_segmentation::UnicodeSegmentation;
let mut itr = s.grapheme_indices(true)...
Then the following also works:
assert_eq!(Ok("y̆"), substr("y̆es", 0, Some(1)));

Knowing the various forms of slice syntax may be useful for some readers. Note that string slice indices are byte offsets, and slicing panics if an index does not fall on a char boundary.
Reference to a part of a string:
&s[6..11]
If you start at index 0, you can omit the start index:
&s[0..1] is equivalent to &s[..1]
If your substring runs to the end of the string, you can omit the end index:
&s[3..s.len()] is equivalent to &s[3..]
Omitting both gives a slice of the entire string:
&s[..]
You can also use the inclusive range operator ..= to include the end index:
&s[..=1]
Link to docs: https://doc.rust-lang.org/book/ch04-03-slices.html
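Because these byte-range indices panic when they land inside a multibyte character, the standard library also offers str::get, which takes the same ranges but returns an Option, and str::is_char_boundary for checking an index up front. A small demonstration:

```rust
fn main() {
    let s = "héllo"; // 'é' is two bytes, occupying byte offsets 1..3
    // &s[0..2] would panic because byte 2 is inside 'é'; `get` returns None.
    assert_eq!(s.get(0..1), Some("h"));
    assert_eq!(s.get(0..2), None);
    assert_eq!(s.get(0..3), Some("hé"));
    assert_eq!(s.get(3..), Some("llo"));
    // `is_char_boundary` lets you check an index before slicing.
    assert!(!s.is_char_boundary(2));
    println!("{}", &s[..3]); // hé
}
```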

I would suggest you use the crate substring. (And look at its source code if you want to learn how to do this properly.)

I couldn't find an exact equivalent of the substr method I'm familiar with from other programming languages such as JavaScript and Dart.
Here is a possible implementation of a substr method for &str.
First, define a trait so that methods can be added to an existing type (like extension methods in Dart):
trait Substr {
    fn substr(&self, start: usize, end: usize) -> String;
}
Then implement this trait for &str. Note that it takes start and end character indices (like JavaScript's substring), not a start index and a length:
impl<'a> Substr for &'a str {
    fn substr(&self, start: usize, end: usize) -> String {
        if start >= end {
            return String::new();
        }
        self.chars().skip(start).take(end - start).collect()
    }
}
Try:
fn main() {
    let string = "Hello, world!";
    let substring = string.substr(0, 4);
    println!("{}", substring); // Hell
}

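You can also use .to_string()[<range>], which slices a freshly allocated copy of the string rather than the original.
This example takes an immutable slice of that copy, then mutates the original string to demonstrate that the slice is preserved. (Slicing the original with &s[..6] would keep s borrowed, so the later replace_range call would not compile.)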
let mut s: String = "Hello, world!".to_string();
let substring: &str = &s.to_string()[..6];
s.replace_range(..6, "Goodbye,");
println!("{} {} universe!", s, substring);
// Goodbye, world! Hello, universe!
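If you only need the copied part, you can slice first and copy afterwards, so that only those bytes are allocated rather than the whole string. A sketch of the same example done that way:

```rust
fn main() {
    let mut s: String = "Hello, world!".to_string();
    // Copy just the first 6 bytes into a new String; the temporary borrow
    // of `s` ends with this statement, so `s` can be mutated afterwards.
    let substring: String = s[..6].to_string();
    s.replace_range(..6, "Goodbye,");
    println!("{} {} universe!", s, substring); // Goodbye, world! Hello, universe!
}
```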

I'm not very experienced in Rust, but I gave it a try. If someone can correct my answer, please don't hesitate.
fn substring(string: &str, start: usize, end: usize) -> String {
    // `end` is an inclusive character index. A single skip/take pass avoids
    // rescanning the string with chars().nth(i) for every character, and
    // indices past the end of the string are ignored instead of panicking.
    string
        .chars()
        .skip(start)
        .take(end.saturating_add(1).saturating_sub(start))
        .collect()
}
