I'm trying to remove newline characters from a String (the contents of a file) and convert the result to a Vec<u8>.
Example input string:
let ss = String::from("AAAAAAAA\nBBBBBBBBB\nCCCCCC\nDDDDD\n\n");
fn parse(s: String) -> Vec<u8> {
let s = s.chars().skip_while(|c| *c == '\n');
let sett = s.into_iter().map(|c| c as u8).collect();
sett
}
While I get no error, skip_while doesn't seem to remove the newline characters from the string. What am I doing wrong here?
You can replace every \n in the string with an empty string and then convert the result to a Vec<u8> with into_bytes():
fn parse(s: String) -> Vec<u8> {
s.replace("\n", "").into_bytes()
}
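If you prefer iterators, use filter rather than skip_while. Note that c as u8 truncates any character outside the ASCII/Latin-1 range, so this version is only correct for ASCII input: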
fn parse(s: String) -> Vec<u8> {
s.chars().filter(|c| *c != '\n').map(|c| c as u8).collect()
}
You can call it like this:
use std::str::from_utf8;
fn main() {
let my_string = String::from("AAAAAAAA\nBBBBBBBBB\nCCCCCC\nDDDDD\n\n");
let parsed_string = parse(my_string.clone());
println!("{:?}", from_utf8(&parsed_string));
}
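As for why the original attempt had no effect: skip_while only discards items from the front of the iterator until the predicate first returns false, and then yields everything else unchanged. The sample input starts with 'A', so nothing is skipped. Here is a minimal sketch contrasting the two adapters; the function names are made up for illustration, and it works on bytes rather than chars so that no cast is needed:

```rust
/// Drops only the *leading* newlines: `skip_while` stops skipping at the
/// first item for which the predicate returns false.
fn strip_leading_newlines(s: &str) -> Vec<u8> {
    s.bytes().skip_while(|b| *b == b'\n').collect()
}

/// Drops *every* newline byte, wherever it appears.
fn strip_all_newlines(s: &str) -> Vec<u8> {
    s.bytes().filter(|b| *b != b'\n').collect()
}

fn main() {
    let input = "\n\nAA\nBB\n";
    // Only the two newlines at the front are removed.
    assert_eq!(strip_leading_newlines(input), b"AA\nBB\n".to_vec());
    // All newlines are removed.
    assert_eq!(strip_all_newlines(input), b"AABB".to_vec());
}
```

Filtering bytes is safe here because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence.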
So basically, I have a text file with the following syntax:
String int
String int
String int
I know how to read the values if there is only one entry per line, but I do not know how to do it when there are several.
In Java, I would do something simple with while and Scanner, but in Rust I have no clue.
I am fairly new to Rust, so please help me.
Thanks in advance for your help.
Solution
Here is my modified version of @netwave's code:
use std::fs;
use std::io::{BufRead, BufReader, Error};
fn main() -> Result<(), Error> {
let buff_reader = BufReader::new(fs::File::open(file)?);
for line in buff_reader.lines() {
let parsed = sscanf::scanf!(line?, "{} {}", String, i32);
println!("{:?}\n", parsed);
}
Ok(())
}
You can use the BufRead trait, which has a read_line method. You can also use lines.
The easiest option is to wrap the File instance in a BufReader:
use std::fs;
use std::io::{BufRead, BufReader};
...
// read_line takes &mut self, so the reader must be mutable
let mut buff_reader = BufReader::new(fs::File::open(path)?);
loop {
let mut buff = String::new();
// read_line returns Ok(0) once the end of the file is reached
if buff_reader.read_line(&mut buff)? == 0 {
break;
}
println!("{}", buff);
}
Once you have each line, you can use the sscanf crate to parse it into the types you need:
let parsed = sscanf::scanf!(buff, "{} {}", String, i32);
Based on: https://doc.rust-lang.org/rust-by-example/std_misc/file/read_lines.html
With data.txt containing:
str1 100
str2 200
str3 300
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
fn main() {
// The file data.txt must exist in the current directory before this produces output
if let Ok(lines) = read_lines("./data.txt") {
// Consumes the iterator; each item is an io::Result<String>
for line in lines {
if let Ok(data) = line {
let values: Vec<&str> = data.split(' ').collect();
match values.len() {
2 => {
let strdata = values[0].parse::<String>();
let intdata = values[1].parse::<i32>();
println!("Got: {:?} {:?}", strdata, intdata);
},
_ => panic!("Invalid input line {}", data),
};
}
}
}
}
// The output is wrapped in a Result to allow matching on errors
// Returns an Iterator to the Reader of the lines of the file.
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, {
let file = File::open(filename)?;
Ok(io::BufReader::new(file).lines())
}
Outputs:
Got: Ok("str1") Ok(100)
Got: Ok("str2") Ok(200)
Got: Ok("str3") Ok(300)
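The same split-based idea can report malformed lines as errors instead of panicking. The sketch below is only an illustration: parse_line and LineError are hypothetical names, and it uses split_whitespace so that repeated spaces or tabs between the two fields are tolerated.

```rust
use std::num::ParseIntError;

/// Errors that the hypothetical `parse_line` helper can report.
#[derive(Debug, PartialEq)]
enum LineError {
    /// The line did not contain exactly two whitespace-separated fields.
    FieldCount(usize),
    /// The second field was not a valid i32.
    BadInt(ParseIntError),
}

/// Parses one "String int" line into a (String, i32) pair.
fn parse_line(line: &str) -> Result<(String, i32), LineError> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    match fields.as_slice() {
        [name, number] => {
            let value = number.parse::<i32>().map_err(LineError::BadInt)?;
            Ok((name.to_string(), value))
        }
        other => Err(LineError::FieldCount(other.len())),
    }
}

fn main() {
    // Stand-in for the file contents; the second line has extra spaces.
    let data = "str1 100\nstr2   200\nstr3 300";
    for line in data.lines() {
        println!("{:?}", parse_line(line));
    }
}
```

In a real program the loop body would be fed from BufReader::lines() as in the code above.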
I am trying to change a single character at a specific index in a string, but I do not know how to do it in Rust. For example, how would I change the 4th character in "hello world" to 'x', so that it would be "helxo world"?
The easiest way is to use the replace_range() method like this:
let mut hello = String::from("hello world");
hello.replace_range(3..4,"x");
println!("hello: {}", hello);
Output: hello: helxo world
Please note that this will panic if the range to be replaced does not start and end on UTF-8 codepoint boundaries. E.g. this will panic:
let mut hello2 = String::from("hell๐ world");
hello2.replace_range(4..5,"x"); // panics because ๐ needs more than one byte in UTF-8
If you want to replace the nth UTF-8 code point, you have to do something like this:
pub fn main() {
let mut hello = String::from("hell๐ world");
hello.replace_range(
hello
.char_indices()
.nth(4)
.map(|(pos, ch)| (pos..pos + ch.len_utf8()))
.unwrap(),
"x",
);
println!("hello: {}", hello);
}
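The unwrap() above panics when the string has fewer characters than the requested index. A small sketch of the same technique that reports that case instead; replace_nth_char_in_place is a hypothetical helper name, not a standard library function:

```rust
/// Replaces the character at character index `n` (not byte index) in place.
/// Returns false and leaves the string untouched if `n` is out of range.
fn replace_nth_char_in_place(s: &mut String, n: usize, replacement: &str) -> bool {
    // Find the byte range occupied by the n-th character.
    let range = s
        .char_indices()
        .nth(n)
        .map(|(pos, ch)| pos..pos + ch.len_utf8());
    match range {
        Some(range) => {
            s.replace_range(range, replacement);
            true
        }
        None => false,
    }
}

fn main() {
    let mut s = String::from("héllo world");
    // 'é' is two bytes long; the helper replaces the whole character.
    assert!(replace_nth_char_in_place(&mut s, 1, "e"));
    assert_eq!(s, "hello world");
    // Out-of-range index: nothing changes and no panic occurs.
    assert!(!replace_nth_char_in_place(&mut s, 100, "x"));
}
```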
The standard way of representing a string in Rust is as a contiguous range of bytes encoded as UTF-8. A code point encoded in UTF-8 takes from one to four bytes, so in general you can't simply overwrite one character with another because the length might change. You also can't use simple pointer arithmetic to index into a Rust String to the nth character, again because each character's encoding can be from one to four bytes long.
So one safe but slow way to do it would be like this, iterating through the characters of the source string, replacing the one you want, then creating a new string:
fn replace_nth_char(s: &str, idx: usize, newchar: char) -> String {
s.chars().enumerate().map(|(i,c)| if i == idx { newchar } else { c }).collect()
}
But we can do it in O(1) if we manually make sure that both the old and the new character are single-byte ASCII:
fn replace_nth_char_safe(s: &str, idx: usize, newchar: char) -> String {
s.chars().enumerate().map(|(i,c)| if i == idx { newchar } else { c }).collect()
}
fn replace_nth_char_ascii(s: &mut str, idx: usize, newchar: char) {
let s_bytes: &mut [u8] = unsafe { s.as_bytes_mut() };
assert!(idx < s_bytes.len());
assert!(s_bytes[idx].is_ascii());
assert!(newchar.is_ascii());
// we've made sure this is safe.
s_bytes[idx] = newchar as u8;
}
fn main() {
let s = replace_nth_char_safe("Hello, world!", 3, 'x');
assert_eq!(s, "Helxo, world!");
let mut s = String::from("Hello, world!");
replace_nth_char_ascii(&mut s, 3, 'x');
assert_eq!(s, "Helxo, world!");
}
Keep in mind that idx parameter in replace_nth_char_ascii is not a character index, but instead a byte index. If there are any multibyte characters earlier in the string, then the byte index and the character index will not correspond.
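To make the difference between the two kinds of index concrete, here is a small sketch; byte_offset_of_char is a hypothetical helper that maps a character index to the byte offset expected by the ASCII function above:

```rust
/// Converts a character index into the byte offset where that character
/// starts, or None if the string has fewer characters.
fn byte_offset_of_char(s: &str, char_idx: usize) -> Option<usize> {
    s.char_indices().nth(char_idx).map(|(pos, _)| pos)
}

fn main() {
    let s = "añb";
    // 'ñ' takes two bytes in UTF-8, so everything after it is shifted by one.
    assert_eq!(s.len(), 4); // length in bytes
    assert_eq!(s.chars().count(), 3); // length in characters
    assert_eq!(byte_offset_of_char(s, 2), Some(3)); // 'b' is char 2 but byte 3
    assert_eq!(byte_offset_of_char(s, 3), None); // past the last character
}
```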
I need to change the order of the letters in every word of a sentence. I have a string of separators to split my text into words, and I use swap(0, 1) to exchange adjacent letters in a word. However, I need to skip the first and last letter of every word, and I can't use regular expressions for this.
Here is some code:
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
dbg!(mix("According, research"));
}
fn mix(s: &str) -> String {
let mut a: Vec<char> = s.chars().collect();
for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
group.chunks_exact_mut(2).for_each(|x| x.swap(0, 1));
}
a.iter().collect()
}
Output as follows:
[src/main.rs:4] mix("According, research") = "cAocdrnig, eresrahc"
But I need output as follows:
[src/main.rs:4] mix("According, research") = "Accroidng, rseaerch"
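Does anyone know how to fix it?
All you need is to run the swap on a sub-slice of each word instead of on the whole word. The version below uses group[1..len-2], which leaves the first character and the last two characters in place and produces the requested output for this input; use group[1..len-1] if exactly the first and last characters should be excluded: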
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";
fn main() {
dbg!(mix("According, research"));
}
fn mix(s: &str) -> String {
let mut a: Vec<char> = s.chars().collect();
for group in a.split_mut(|num| SEPARATORS.contains(*num)) {
let len = group.len();
if len > 2 {
group[1..len-2].chunks_exact_mut(2).for_each(|x| x.swap(0, 1));
}
}
a.iter().collect()
}
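The effect of the upper bound of the slice can be checked directly. The following sketch is a parameterised variant of mix, not part of the original answer: mix_keeping and its keep_at_end parameter are hypothetical names for the number of trailing characters left untouched.

```rust
const SEPARATORS: &str = " ,;:!?./%*$=+)#_-('\"&1234567890\r\n";

/// Swaps adjacent pairs of characters inside each word, leaving the first
/// character and the last `keep_at_end` characters of the word in place.
fn mix_keeping(s: &str, keep_at_end: usize) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    for group in chars.split_mut(|c| SEPARATORS.contains(*c)) {
        let len = group.len();
        // Only touch words that have something between the kept ends.
        if len > 1 + keep_at_end {
            group[1..len - keep_at_end]
                .chunks_exact_mut(2)
                .for_each(|pair| pair.swap(0, 1));
        }
    }
    chars.iter().collect()
}

fn main() {
    // Same bounds as `group[1..len-2]`.
    assert_eq!(mix_keeping("According, research", 2), "Accroidng, rseaerch");
    // Excluding exactly the first and the last character.
    assert_eq!(mix_keeping("According, research", 1), "Accroidng, rseaecrh");
}
```

Because chunks_exact_mut(2) ignores a trailing unpaired element, the two bounds give the same result for "According" and differ only for "research".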
I have a Vec<&[u8]> that I want to convert to a String like this:
let rfrce: Vec<&[u8]> = rec.alleles();
for r in rfrce {
// create new String from rfrce
}
I tried this but it is not working since only converting u8 to char is possible, but [u8] to char is not:
let rfrce = rec.alleles();
let mut str = String::from("");
for r in rfrce {
str.push(*r as char);
}
Because r is a slice of u8 rather than a single u8, you need to convert it to a valid &str and use the push_str method of String:
use std::str;
fn main() {
let rfrce = vec![&[65,66,67], &[68,69,70]];
let mut str = String::new();
for r in rfrce {
str.push_str(str::from_utf8(r).unwrap());
}
println!("{}", str);
}
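The unwrap() in the loop panics on the first slice that is not valid UTF-8. As a sketch of two alternatives that use only the standard library, the slices can be concatenated first and validated once; join_utf8 and join_utf8_lossy are hypothetical helper names:

```rust
/// Joins all byte slices and validates the result as UTF-8 in one step.
fn join_utf8(parts: &[&[u8]]) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(parts.concat())
}

/// Same, but replaces invalid sequences with U+FFFD instead of failing.
fn join_utf8_lossy(parts: &[&[u8]]) -> String {
    String::from_utf8_lossy(&parts.concat()).into_owned()
}

fn main() {
    let parts: Vec<&[u8]> = vec![&[65, 66, 67][..], &[68, 69, 70][..]];
    assert_eq!(join_utf8(&parts).unwrap(), "ABCDEF");

    // 0xff can never appear in valid UTF-8.
    let broken: Vec<&[u8]> = vec![&[65, 0xff][..], &[66][..]];
    assert!(join_utf8(&broken).is_err());
    assert_eq!(join_utf8_lossy(&broken), "A\u{fffd}B");
}
```

Validating after concatenation also accepts a multi-byte character that happens to be split across two slices, which per-slice validation would reject.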
I'd go with TryFrom<u32>:
fn to_string(v: &[&[u8]]) -> Result<String, std::char::CharTryFromError> {
/// Interpret up to four bytes as a big-endian u32 code point value (this is not UTF-8 decoding)
fn su8_to_u32(s: &[u8]) -> Option<u32> {
if s.len() > 4 {
None
} else {
let shift = (0..=32).step_by(8);
let result = s.iter().rev().cloned().zip(shift).map(|(u, shift)| (u as u32) << shift).sum();
Some(result)
}
}
use std::convert::TryFrom;
v.iter().map(|&s| su8_to_u32(s)).try_fold(String::new(), |mut s, u| {
let u = u.unwrap(); //TODO error handling
s.push(char::try_from(u)?);
Ok(s)
})
}
fn main() {
let rfrce: Vec<&[u8]> = vec![&[48][..], &[49][..], &[50][..], &[51][..]];
assert_eq!(to_string(&rfrce), Ok("0123".into()));
let rfrce: Vec<&[u8]> = vec![&[0xc3, 0xa9][..]]; // https://www.utf8icons.com/character/50089/utf-8-character
assert_eq!(to_string(&rfrce), Ok("쎩".into()));
}
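Note that this approach reads each slice as a big-endian number and treats that number as a code point; it does not decode UTF-8. The sketch below shows the difference for the bytes 0xC3 0xA9; as_utf8 and as_scalar are hypothetical helper names used only for this comparison:

```rust
/// Decodes the bytes as UTF-8 text, or None if they are not valid UTF-8.
fn as_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Reads two bytes as a big-endian number and converts that number to a char.
fn as_scalar(bytes: [u8; 2]) -> Option<char> {
    char::from_u32(u32::from(u16::from_be_bytes(bytes)))
}

fn main() {
    let bytes: [u8; 2] = [0xc3, 0xa9];
    // As UTF-8, the two bytes encode the single character U+00E9.
    assert_eq!(as_utf8(&bytes), Some("é"));
    // As the number 0xC3A9, they name a different character, U+C3A9.
    assert_eq!(as_scalar(bytes), Some('\u{c3a9}'));
}
```

Which interpretation is right depends on how the byte slices were produced; for ordinary text data, UTF-8 decoding is usually the intended one.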
I have the following:
A Vec<&str>.
A &str that may contain $0, $1, etc. referencing the elements in the vector.
I want to get a version of my &str where all occurrences of $i are replaced by the ith element of the vector. So if I have vec!["foo", "bar"] and $0$1, the result would be foobar.
My first naive approach was to iterate over i = 1..N and do a search and replace for every index. However, this is quite an ugly and inefficient solution. It also gives undesired output if any of the values in the vector contains the $ character.
Is there a better way to do this in Rust?
This solution is inspired (including copied test cases) by Shepmaster's, but simplifies things by using the replace_all method.
use regex::{Regex, Captures};
fn template_replace(template: &str, values: &[&str]) -> String {
let regex = Regex::new(r#"\$(\d+)"#).unwrap();
regex.replace_all(template, |captures: &Captures| {
values
.get(index(captures))
.unwrap_or(&"")
}).to_string()
}
fn index(captures: &Captures) -> usize {
captures.get(1)
.unwrap()
.as_str()
.parse()
.unwrap()
}
fn main() {
assert_eq!("ab", template_replace("$0$1", &["a", "b"]));
assert_eq!("$1b", template_replace("$0$1", &["$1", "b"]));
assert_eq!("moo", template_replace("moo", &[]));
assert_eq!("abc", template_replace("a$0b$0c", &[""]));
assert_eq!("abcde", template_replace("a$0c$1e", &["b", "d"]));
println!("It works!");
}
I would use a regex:
use regex::Regex; // 1.1.0
fn example(s: &str, vals: &[&str]) -> String {
let r = Regex::new(r#"\$(\d+)"#).unwrap();
let mut start = 0;
let mut new = String::new();
for caps in r.captures_iter(s) {
let m = caps.get(0).expect("Regex group 0 missing");
let d = caps.get(1).expect("Regex group 1 missing");
let d: usize = d.as_str().parse().expect("Could not parse index");
// Copy non-placeholder
new.push_str(&s[start..m.start()]);
// Copy placeholder
new.push_str(&vals[d]);
start = m.end()
}
// Copy non-placeholder
new.push_str(&s[start..]);
new
}
fn main() {
assert_eq!("ab", example("$0$1", &["a", "b"]));
assert_eq!("$1b", example("$0$1", &["$1", "b"]));
assert_eq!("moo", example("moo", &[]));
assert_eq!("abc", example("a$0b$0c", &[""]));
}
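If pulling in the regex crate is not an option, the same single-pass substitution can be sketched with the standard library alone. This is an illustrative variant rather than either answer's code: template_replace_std is a hypothetical name, it only recognises ASCII digits after the $, and it replaces out-of-range placeholders with the empty string instead of panicking.

```rust
/// Replaces each `$<digits>` in `template` with `values[<digits>]`.
/// Placeholders whose index is out of range (or too large to parse) are
/// replaced with the empty string; a `$` not followed by a digit is kept.
fn template_replace_std(template: &str, values: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(dollar) = rest.find('$') {
        // Copy everything before the `$` verbatim.
        out.push_str(&rest[..dollar]);
        let after = &rest[dollar + 1..];
        // Length in bytes of the run of ASCII digits following the `$`.
        let digits = after
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(after.len());
        if digits == 0 {
            out.push('$');
        } else {
            let index: Option<usize> = after[..digits].parse().ok();
            out.push_str(index.and_then(|i| values.get(i)).copied().unwrap_or(""));
        }
        // Continue after the placeholder; inserted values are never rescanned.
        rest = &after[digits..];
    }
    out.push_str(rest);
    out
}

fn main() {
    assert_eq!("ab", template_replace_std("$0$1", &["a", "b"]));
    assert_eq!("$1b", template_replace_std("$0$1", &["$1", "b"]));
    assert_eq!("moo", template_replace_std("moo", &[]));
    assert_eq!("abc", template_replace_std("a$0b$0c", &[""]));
}
```

Because the output is built in a separate String, a value that itself contains $1 is copied as-is, which avoids the problem described in the question.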
See also:
Split a string keeping the separators