How to remove newline characters from a String? - rust

I'm trying to remove newline characters from a String (file content read from a file) and convert it to a Vec<u8>.
Example input string:
let ss = String::from("AAAAAAAA\nBBBBBBBBB\nCCCCCC\nDDDDD\n\n");
fn parse(s: String) -> Vec<u8> {
let s = s.chars().skip_while(|c| *c == '\n');
let sett = s.into_iter().map(|c| c as u8).collect();
sett
}
While I get no error, skip_while doesn't seem to remove the newline characters from the string. What am I doing wrong here?

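skip_while only skips elements at the start of the iterator, and only while the predicate is true. It stops at the first character that isn't '\n', so the newlines later in the string are never touched. Instead, you can replace every \n in the string with an empty string and then convert the result to a Vec<u8> with into_bytes():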
fn parse(s: String) -> Vec<u8> {
s.replace("\n", "").into_bytes()
}
If you want to do it with iterators, use filter, which tests every element (note that c as u8 truncates any non-ASCII character to its lowest byte):
fn parse(s: String) -> Vec<u8> {
s.chars().filter(|c| *c != '\n').map(|c| c as u8).collect()
}
You can call it like this:
use std::str::from_utf8;
fn main() {
let my_string = String::from("AAAAAAAA\nBBBBBBBBB\nCCCCCC\nDDDDD\n\n");
let parsed_string = parse(my_string.clone());
println!("{:?}", from_utf8(&parsed_string));
}
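To make the skip_while behaviour visible, and to strip newlines without the c as u8 truncation, here is a minimal sketch. The helper name parse_bytes is hypothetical. It filters the raw bytes of the string, which is safe because UTF-8 never uses bytes below 0x80 inside a multi-byte character, so b'\n' can only ever be a real newline.

```rust
/// Hypothetical helper: drop every b'\n' from the UTF-8 bytes of `s`.
/// Bytes of multi-byte characters are all >= 0x80, so non-ASCII text
/// is preserved byte for byte.
fn parse_bytes(s: String) -> Vec<u8> {
    s.bytes().filter(|&b| b != b'\n').collect()
}

fn main() {
    let input = String::from("AAAAAAAA\nBBBBBBBBB\nCCCCCC\nDDDDD\n\n");

    // skip_while stops at the first element that fails the predicate,
    // so it only removes *leading* newlines; this input has none.
    let skipped: String = input.chars().skip_while(|c| *c == '\n').collect();
    assert_eq!(skipped, input);

    // Same bytes as the replace("\n", "") approach.
    let expected = input.replace('\n', "").into_bytes();
    assert_eq!(parse_bytes(input), expected);

    // 'é' is two bytes in UTF-8 and survives unchanged.
    assert_eq!(parse_bytes(String::from("é\n")), "é".as_bytes());
    println!("all checks passed");
}
```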

Related

How to read a text file in Rust and read multiple values per line

So basically, I have a text file with the following syntax:
String int
String int
String int
I have an idea how to read the values if there is only one entry per line, but if there are multiple, I do not know how to do it.
In Java, I would do something simple with while and Scanner, but in Rust I have no clue.
I am fairly new to Rust so please help me.
Thanks for your help in advance
Solution
Here is my modified version of @netwave's code:
use std::fs;
use std::io::{BufRead, BufReader, Error};
fn main() -> Result<(), Error> {
let file = "data.txt"; // placeholder path to your input file
let buff_reader = BufReader::new(fs::File::open(file)?);
for line in buff_reader.lines() {
let parsed = sscanf::scanf!(line?, "{} {}", String, i32);
println!("{:?}\n", parsed);
}
Ok(())
}
You can use the BufRead trait, which has a read_line method; you can also use lines.
The easiest way to do so is to wrap the File instance in a BufReader:
use std::fs;
use std::io::{BufRead, BufReader};
...
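let mut buff_reader = BufReader::new(fs::File::open(path)?);
loop {
let mut buff = String::new();
// read_line returns Ok(0) once the end of the file is reached
if buff_reader.read_line(&mut buff)? == 0 {
break;
}
// buff keeps its trailing newline, so print! avoids printing a blank line
print!("{}", buff);
}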
Once you have each line, you can use the sscanf crate to parse it into the types you need:
let parsed = sscanf::scanf!(buff, "{} {}", String, i32);
Based on: https://doc.rust-lang.org/rust-by-example/std_misc/file/read_lines.html
Given a data.txt containing:
str1 100
str2 200
str3 300
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
fn main() {
// data.txt must exist in the current directory before this produces output
if let Ok(lines) = read_lines("./data.txt") {
// Consumes the iterator; each item is an io::Result<String>
for line in lines {
if let Ok(data) = line {
let values: Vec<&str> = data.split(' ').collect();
match values.len() {
2 => {
let strdata = values[0].parse::<String>();
let intdata = values[1].parse::<i32>();
println!("Got: {:?} {:?}", strdata, intdata);
},
_ => panic!("Invalid input line {}", data),
};
}
}
}
}
// The output is wrapped in a Result to allow matching on errors
// Returns an Iterator to the Reader of the lines of the file.
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, {
let file = File::open(filename)?;
Ok(io::BufReader::new(file).lines())
}
Outputs:
Got: Ok("str1") Ok(100)
Got: Ok("str2") Ok(200)
Got: Ok("str3") Ok(300)
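If panicking on a bad line is too strict, the same split-and-parse idea can report errors instead. Below is a minimal std-only sketch; the function name parse_pairs and its String error type are illustrative choices, not part of any answer above, and an in-memory Cursor stands in for BufReader::new(File::open(...)?) so it runs without data.txt. Using split_whitespace also tolerates repeated spaces or tabs between the two fields.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical helper: parse lines of the form `String int` from any
/// buffered reader. Blank lines are skipped; a malformed line yields an
/// error message naming its 1-based line number.
fn parse_pairs<R: BufRead>(reader: R) -> Result<Vec<(String, i32)>, String> {
    let mut pairs = Vec::new();
    for (i, line) in reader.lines().enumerate() {
        let line = line.map_err(|e| e.to_string())?;
        let mut fields = line.split_whitespace();
        match (fields.next(), fields.next(), fields.next()) {
            // Nothing on the line at all.
            (None, _, _) => continue,
            // Exactly two fields: a word and an integer.
            (Some(name), Some(num), None) => {
                let num: i32 = num
                    .parse()
                    .map_err(|e| format!("line {}: {}", i + 1, e))?;
                pairs.push((name.to_string(), num));
            }
            _ => return Err(format!("line {}: expected `String int`", i + 1)),
        }
    }
    Ok(pairs)
}

fn main() {
    // With a real file: parse_pairs(BufReader::new(File::open("data.txt")?))
    let input = Cursor::new("str1 100\nstr2 200\nstr3 300\n");
    match parse_pairs(input) {
        Ok(pairs) => {
            for (name, num) in &pairs {
                println!("Got: {} {}", name, num);
            }
        }
        Err(e) => eprintln!("error: {}", e),
    }
}
```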

How do I change characters at a specific index within a string in rust?

I am trying to change a single character at a specific index in a string, but I do not know how to do it in Rust. For example, how would I change the 4th character in "hello world" to 'x', so that it would be "helxo world"?
The easiest way is to use the replace_range() method like this:
let mut hello = String::from("hello world");
hello.replace_range(3..4,"x");
println!("hello: {}", hello);
Output: hello: helxo world
Please note that this will panic if the range to be replaced does not start and end on UTF-8 codepoint boundaries. E.g. this will panic:
let mut hello2 = String::from("hell😀 world");
hello2.replace_range(4..5,"x"); // panics because 😀 needs more than one byte in UTF-8
If you want to replace the nth UTF-8 code point, you have to do something like this:
pub fn main() {
let mut hello = String::from("hell😀 world");
hello.replace_range(
hello
.char_indices()
.nth(4)
.map(|(pos, ch)| (pos..pos + ch.len_utf8()))
.unwrap(),
"x",
);
println!("hello: {}", hello);
}
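The .unwrap() above panics if the string has fewer than five characters. Here is a sketch of a non-panicking variant; the name set_nth_char is hypothetical. It returns false instead of panicking, and encodes the new char into a small stack buffer so that a multi-byte replacement works as well.

```rust
/// Hypothetical helper: replace the `n`-th `char` (0-based, counted in
/// characters rather than bytes) of `s` with `new`.
/// Returns `false` and leaves `s` unchanged if `s` has too few characters.
fn set_nth_char(s: &mut String, n: usize, new: char) -> bool {
    // Byte offset and old character at char position `n`, if it exists.
    let found = s.char_indices().nth(n);
    match found {
        Some((pos, old)) => {
            // A char needs at most 4 bytes in UTF-8.
            let mut buf = [0u8; 4];
            // pos..pos + old.len_utf8() spans exactly the old character,
            // so both ends lie on char boundaries and this cannot panic.
            s.replace_range(pos..pos + old.len_utf8(), new.encode_utf8(&mut buf));
            true
        }
        None => false,
    }
}

fn main() {
    let mut hello = String::from("hell😀 world");
    assert!(set_nth_char(&mut hello, 4, 'x'));
    assert_eq!(hello, "hellx world");

    // Replacing with a multi-byte character also works.
    assert!(set_nth_char(&mut hello, 0, '😀'));
    assert_eq!(hello, "😀ellx world");

    // Out of range: nothing changes.
    assert!(!set_nth_char(&mut hello, 100, 'y'));
    println!("{}", hello);
}
```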
The standard way of representing a string in Rust is as a contiguous range of bytes encoded as UTF-8. A code point takes from one to four bytes in UTF-8, so you generally can't simply replace one character with another in place, because the encoded length might change. You also can't use simple pointer arithmetic to index into a Rust String to reach the nth character, again because characters can be from one to four bytes long.
So one safe but slow (O(n)) way is to iterate through the characters of the source string, replace the one you want, and collect the result into a new string:
fn replace_nth_char(s: &str, idx: usize, newchar: char) -> String {
s.chars().enumerate().map(|(i,c)| if i == idx { newchar } else { c }).collect()
}
But we can do it in O(1) if we manually make sure that both the old and the new character are single-byte ASCII:
fn replace_nth_char_safe(s: &str, idx: usize, newchar: char) -> String {
s.chars().enumerate().map(|(i,c)| if i == idx { newchar } else { c }).collect()
}
fn replace_nth_char_ascii(s: &mut str, idx: usize, newchar: char) {
let s_bytes: &mut [u8] = unsafe { s.as_bytes_mut() };
assert!(idx < s_bytes.len());
assert!(s_bytes[idx].is_ascii());
assert!(newchar.is_ascii());
// we've made sure this is safe.
s_bytes[idx] = newchar as u8;
}
fn main() {
let s = replace_nth_char_safe("Hello, world!", 3, 'x');
assert_eq!(s, "Helxo, world!");
let mut s = String::from("Hello, world!");
replace_nth_char_ascii(&mut s, 3, 'x');
assert_eq!(s, "Helxo, world!");
}
Keep in mind that the idx parameter in replace_nth_char_ascii is not a character index but a byte index. If there are any multibyte characters earlier in the string, the byte index and the character index will not correspond.
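To make the byte-index caveat concrete, and to show that the ASCII case does not need unsafe when you own a String, here is a sketch; replace_ascii_byte is a hypothetical name, and unlike the assert-based version it reports failure by returning false. In "héllo" the 'é' occupies bytes 1 and 2, so byte index 3 is the first 'l', while character index 3 is the second 'l'.

```rust
/// Hypothetical helper: overwrite the ASCII byte at *byte* index `idx`
/// with the ASCII character `newchar`, using only safe code.
/// Returns `false` (leaving `s` unchanged) if `idx` is out of range or
/// either the old byte or `newchar` is not ASCII.
fn replace_ascii_byte(s: &mut String, idx: usize, newchar: char) -> bool {
    let old_is_ascii = s.as_bytes().get(idx).map_or(false, |b| b.is_ascii());
    if !old_is_ascii || !newchar.is_ascii() {
        return false;
    }
    // An ASCII byte is a whole one-byte character, so idx..idx + 1 lies on
    // char boundaries and replace_range will not panic.
    let mut buf = [0u8; 4];
    s.replace_range(idx..idx + 1, newchar.encode_utf8(&mut buf));
    true
}

fn main() {
    let mut s = String::from("héllo"); // 'é' is bytes 1 and 2
    // Byte index 3 is the *first* 'l', i.e. character index 2.
    assert!(replace_ascii_byte(&mut s, 3, 'x'));
    assert_eq!(s, "héxlo");
    // Byte 1 starts the two-byte 'é'; byte 2 is its continuation byte.
    assert!(!replace_ascii_byte(&mut s, 1, 'y'));
    assert!(!replace_ascii_byte(&mut s, 2, 'y'));
    assert_eq!(s, "héxlo");
    println!("{}", s);
}
```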

How to change the order of letters in Rust with swap?

I need to change the order of letters in every word of a sentence. I have a string of separators to split my text into words, and I use swap(0, 1) to change the order of letters in each word. However, I need to skip the first and last letter of every word, and I can't use regular expressions for this purpose.
Here is some code:
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
dbg!(mix("According, research"));
}
fn mix(s: &str) -> String {
let mut a: Vec<char> = s.chars().collect();
for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
group.chunks_exact_mut(2).for_each(|x| x.swap(0, 1));
}
a.iter().collect()
}
The output is:
[src/main.rs:4] mix("According, research") = "cAocdrnig, eresrahc"
But I need this output:
[src/main.rs:4] mix("According, research") = "Accroidng, rseaerch"
Does anyone know how to fix it?
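All you need is to swap within a slice of each word that skips the first character and the last two, group[1..len-2]; that is what produces the expected output above (group[1..len-1] would skip only the first and last character). The len > 2 check prevents an invalid range on short words: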
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
dbg!(mix("According, research"));
}
fn mix(s: &str) -> String {
let mut a: Vec<char> = s.chars().collect();
for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
let len = group.len();
if len > 2 {
group[1..len-2].chunks_exact_mut(2).for_each(|x| x.swap(0, 1));
}
}
a.iter().collect()
}
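To see exactly what the slice bounds do, here is a sketch that makes the number of untouched trailing letters a parameter (mix_keep and keep_end are hypothetical names). With keep_end = 2 it behaves like group[1..len-2] and matches the expected output; with keep_end = 1 only the first and last letters stay fixed, which turns "research" into "rseaecrh".

```rust
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";

/// Hypothetical variant of `mix`: swap adjacent letters inside each word,
/// leaving the first letter and the last `keep_end` letters untouched.
fn mix_keep(s: &str, keep_end: usize) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    for word in chars.split_mut(|c| SEPARATORS.contains(*c)) {
        let len = word.len();
        // Only act if at least one letter lies between the fixed prefix
        // (1 letter) and the fixed suffix (`keep_end` letters).
        if len > 1 + keep_end {
            word[1..len - keep_end]
                .chunks_exact_mut(2)
                .for_each(|pair| pair.swap(0, 1));
        }
    }
    chars.iter().collect()
}

fn main() {
    // keep_end = 2 behaves like group[1..len-2].
    assert_eq!(mix_keep("According, research", 2), "Accroidng, rseaerch");
    // keep_end = 1 fixes only the first and last letter of each word.
    assert_eq!(mix_keep("According, research", 1), "Accroidng, rseaecrh");
    println!("{}", mix_keep("According, research", 2));
}
```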

Proper way to access Vec<&[u8]> as strings [duplicate]

This question already has answers here:
How do I convert a Vector of bytes (u8) to a string?
I have a Vec<&[u8]> that I want to convert to a String like this:
let rfrce: Vec<&[u8]> = rec.alleles();
for r in rfrce {
// create new String from rfrce
}
I tried this, but it doesn't work, because a u8 can be converted to a char but a [u8] cannot:
let rfrce = rec.alleles();
let mut str = String::from("");
for r in rfrce {
str.push(*r as char);
}
Because r is a slice of u8 (&[u8]), you need to convert it to a valid &str with str::from_utf8 and then use the push_str method of String:
use std::str;
fn main() {
let rfrce = vec![&[65,66,67], &[68,69,70]];
let mut str = String::new();
for r in rfrce {
str.push_str(str::from_utf8(r).unwrap());
}
println!("{}", str);
}
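One caveat with decoding each slice separately: if a multi-byte character is split across two slices, each half is invalid UTF-8 on its own and from_utf8 fails. A sketch that concatenates first and decodes once, returning the error instead of unwrapping (join_utf8 is a hypothetical name):

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper: concatenate all byte slices, then decode the whole
/// buffer as UTF-8 in one go.
fn join_utf8(parts: &[&[u8]]) -> Result<String, FromUtf8Error> {
    // `concat` on a slice of byte slices produces one Vec<u8>.
    String::from_utf8(parts.concat())
}

fn main() {
    let ascii: Vec<&[u8]> = vec![&[65, 66, 67][..], &[68, 69, 70][..]];
    assert_eq!(join_utf8(&ascii).unwrap(), "ABCDEF");

    // 'é' is encoded as C3 A9; here the two bytes sit in different slices.
    let split: Vec<&[u8]> = vec![&[0xc3][..], &[0xa9][..]];
    assert!(std::str::from_utf8(split[0]).is_err()); // half a character
    assert_eq!(join_utf8(&split).unwrap(), "é");

    // Genuinely invalid input is reported instead of panicking.
    let bad: Vec<&[u8]> = vec![&[0xff][..]];
    assert!(join_utf8(&bad).is_err());
    println!("{}", join_utf8(&ascii).unwrap());
}
```

If you would rather replace invalid sequences than fail, String::from_utf8_lossy is the standard-library alternative.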
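I'd go with TryFrom<u32>. Note that this treats each byte slice as one big-endian integer and converts that integer to a char; it does not decode the bytes as UTF-8: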
fn to_string(v: &[&[u8]]) -> Result<String, std::char::CharTryFromError> {
/// Interpret up to 4 bytes as a big-endian u32 (a candidate Unicode scalar value, not a UTF-8 decoding)
fn su8_to_u32(s: &[u8]) -> Option<u32> {
if s.len() > 4 {
None
} else {
let shift = (0..=32).step_by(8);
let result = s.iter().rev().cloned().zip(shift).map(|(u, shift)| (u as u32) << shift).sum();
Some(result)
}
}
use std::convert::TryFrom;
v.iter().map(|&s| su8_to_u32(s)).try_fold(String::new(), |mut s, u| {
let u = u.unwrap(); //TODO error handling
s.push(char::try_from(u)?);
Ok(s)
})
}
fn main() {
let rfrce: Vec<&[u8]> = vec![&[48][..], &[49][..], &[50][..], &[51][..]];
assert_eq!(to_string(&rfrce), Ok("0123".into()));
let rfrce: Vec<&[u8]> = vec![&[0xc3, 0xa9][..]]; // https://www.utf8icons.com/character/50089/utf-8-character
assert_eq!(to_string(&rfrce), Ok("쎩".into()));
}
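Because su8_to_u32 reads the bytes as a big-endian integer rather than decoding them, [0xc3, 0xa9] becomes U+C3A9 ('쎩') and not 'é', which is what those same bytes mean in UTF-8. A minimal comparison of the two interpretations; be_code_point is a hypothetical helper that reproduces the integer reading:

```rust
/// Hypothetical helper mirroring su8_to_u32: read up to 4 bytes as a
/// big-endian integer (most significant byte first).
fn be_code_point(bytes: &[u8]) -> Option<u32> {
    if bytes.len() > 4 {
        return None;
    }
    Some(bytes.iter().fold(0u32, |acc, &b| (acc << 8) | u32::from(b)))
}

fn main() {
    let bytes: &[u8] = &[0xc3, 0xa9];

    // UTF-8 decoding: C3 A9 is the two-byte encoding of U+00E9 'é'.
    assert_eq!(std::str::from_utf8(bytes).unwrap(), "é");

    // Integer interpretation: 0xC3A9 is the code point U+C3A9.
    let code = be_code_point(bytes).unwrap();
    assert_eq!(code, 0xC3A9);
    assert_eq!(char::from_u32(code), Some('\u{C3A9}'));

    // For a single ASCII byte the two interpretations coincide.
    assert_eq!(char::from_u32(be_code_point(b"0").unwrap()), Some('0'));
}
```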

Replacing numbered placeholders with elements of a vector in Rust?

I have the following:
A Vec<&str>.
A &str that may contain $0, $1, etc. referencing the elements in the vector.
I want to get a version of my &str where all occurrences of $i are replaced by the ith element of the vector. So if I have vec!["foo", "bar"] and $0$1, the result would be foobar.
My first naive approach was to iterate over i = 1..N and do a search and replace for every index. However, this is quite an ugly and inefficient solution. It also gives undesired output if any of the values in the vector contain the $ character.
Is there a better way to do this in Rust?
This solution is inspired by Shepmaster's (including the copied test cases), but simplifies things by using the replace_all method.
use regex::{Regex, Captures};
fn template_replace(template: &str, values: &[&str]) -> String {
let regex = Regex::new(r#"\$(\d+)"#).unwrap();
regex.replace_all(template, |captures: &Captures| {
values
.get(index(captures))
.unwrap_or(&"")
}).to_string()
}
fn index(captures: &Captures) -> usize {
captures.get(1)
.unwrap()
.as_str()
.parse()
.unwrap()
}
fn main() {
assert_eq!("ab", template_replace("$0$1", &["a", "b"]));
assert_eq!("$1b", template_replace("$0$1", &["$1", "b"]));
assert_eq!("moo", template_replace("moo", &[]));
assert_eq!("abc", template_replace("a$0b$0c", &[""]));
assert_eq!("abcde", template_replace("a$0c$1e", &["b", "d"]));
println!("It works!");
}
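If you'd rather avoid the regex dependency, the same single-pass idea can be written with the standard library alone. This is a sketch that passes the same test cases as template_replace; template_replace_std is a hypothetical name. Here a placeholder is a $ followed by ASCII digits, a $ not followed by a digit is copied through unchanged, an out-of-range or overflowing index expands to nothing, and inserted values are never re-scanned.

```rust
/// Hypothetical std-only alternative to `template_replace`: replaces
/// `$N` (a `$` followed by ASCII digits) with `values[N]` in one pass.
fn template_replace_std(template: &str, values: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy everything before the '$' verbatim.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Number of ASCII digits directly after the '$'.
        let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            // Not a placeholder: keep the '$' itself.
            out.push('$');
        } else if let Some(value) = after[..digits]
            .parse::<usize>()
            .ok()
            .and_then(|i| values.get(i))
        {
            out.push_str(value);
        }
        // Out-of-range or overflowing indices expand to nothing.
        // Continue after the digits, so inserted values are never re-scanned.
        rest = &after[digits..];
    }
    out.push_str(rest);
    out
}

fn main() {
    assert_eq!("ab", template_replace_std("$0$1", &["a", "b"]));
    assert_eq!("$1b", template_replace_std("$0$1", &["$1", "b"]));
    assert_eq!("moo", template_replace_std("moo", &[]));
    assert_eq!("abc", template_replace_std("a$0b$0c", &[""]));
    assert_eq!("abcde", template_replace_std("a$0c$1e", &["b", "d"]));
    assert_eq!("cost: $5", template_replace_std("cost: $$0", &["5"]));
    println!("It works!");
}
```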
I would use a regex:
use regex::Regex; // 1.1.0
fn example(s: &str, vals: &[&str]) -> String {
let r = Regex::new(r#"\$(\d+)"#).unwrap();
let mut start = 0;
let mut new = String::new();
for caps in r.captures_iter(s) {
let m = caps.get(0).expect("Regex group 0 missing");
let d = caps.get(1).expect("Regex group 1 missing");
let d: usize = d.as_str().parse().expect("Could not parse index");
// Copy non-placeholder
new.push_str(&s[start..m.start()]);
// Copy placeholder
new.push_str(&vals[d]);
start = m.end()
}
// Copy non-placeholder
new.push_str(&s[start..]);
new
}
fn main() {
assert_eq!("ab", example("$0$1", &["a", "b"]));
assert_eq!("$1b", example("$0$1", &["$1", "b"]));
assert_eq!("moo", example("moo", &[]));
assert_eq!("abc", example("a$0b$0c", &[""]));
}
See also:
Split a string keeping the separators
