How to read in a fasta in Rust using the bio package


I'm pretty new to Rust (and Rust-Bio) and just learning, so any guidance would be appreciated. I've read the small Read example showing how to read a FASTA sequence from stdin:
let mut records = fasta::Reader::new(io::stdin()).records();
but I can't figure out how to read from a file. I've tried:
let mut records = fasta::Reader::new(filename);
where filename is a string slice (I also tried a String), and I've also found and tried the from_file function. While some of these appear to work, when I try to iterate over them with for or while loops, the compiler always complains that they're the wrong type. The from_file function doesn't seem to return an iterator but a Result wrapping a Reader, so I can't call next() or collect() on it:
let mut records = fasta::Reader::from_file(filename);
let mut nb_reads = 0;
let mut nb_bases = 0;
// does not compile: `records` is a Result<Reader, _>, which has no next() method
while let Some(Ok(record)) = records.next() {
nb_reads += 1;
nb_bases += record.seq().len();
let sa = suffix_array(record.seq());
println!("Here's the Suffix array: {:#?}", sa);
}
The for loop, on the other hand, seems to compile, but the `result` loop variable doesn't have the right type, so I can't get sequences out of it:
let mut reader = fasta::Reader::from_file(filename);
let mut nb_reads = 0;
let mut nb_bases = 0;
// `reader` is a Result; iterating over a Result yields its Ok value once,
// so `result` is the Reader itself, not a Record, and `result.seq()` fails
for result in reader {
nb_reads += 1;
nb_bases += result.seq().len();
let sa = suffix_array(result.seq());
println!("Here's the Suffix array: {:#?}", sa);
}
I'm stumped, but I feel like I'm close to getting it to work. Thanks!

I found something that works, although I'm not sure if it's an answer
use std::env;
use bio::io::fasta;
fn main() {
let args: Vec<String> = env::args().collect();
let filename: &str = &args[1];
let reader = fasta::Reader::from_file(filename).unwrap();
let mut nb_reads = 0;
let mut nb_bases = 0;
for result in reader.records() {
let result_data = &result.unwrap();
nb_reads += 1;
nb_bases += result_data.seq().len();
println!("{:?}",result_data.id());
}
println!("Number of reads: {}", nb_reads);
println!("Number of bases: {}", nb_bases);
}
It appears that you need to unwrap both the Reader returned by from_file and each record yielded by .records(), because each of them is a Result (taking a reference to the unwrapped record is optional). I wish the documentation were a bit more helpful for beginners, but I'll leave this here for anyone else struggling. Also, if you try to print .seq(), it may flood your terminal, haha. Moral of the story: check your types and deal with Result types (I'll figure out how to handle them properly later; I know my code is far from optimal now).

You need to handle the Result in order to get the type you expect.
For example:
let reader = fasta::Reader::from_file(filename).expect("Unable to open");
See this thread for more details:
Unable to read file contents to string - Result does not implement any method in scope named `read_to_string`
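The same two-level Result pattern shows up with the standard library alone: opening a file returns one Result, and each item the reader yields is another. As a minimal std-only sketch (count_fasta is a hypothetical helper, not part of rust-bio, and it is deliberately naive: every line starting with `>` is treated as a header and every other character as a base), the counting loop from the question can be written with `?` instead of unwrap():

```rust
use std::io::{self, BufRead};

/// Hypothetical std-only counter (not rust-bio's API): counts FASTA records
/// (lines starting with '>') and sequence bases (characters on all other lines).
fn count_fasta<R: BufRead>(reader: R) -> io::Result<(usize, usize)> {
    let mut nb_reads = 0;
    let mut nb_bases = 0;
    for line in reader.lines() {
        // Each line is itself an io::Result<String>; `?` propagates read errors.
        let line = line?;
        if line.starts_with('>') {
            nb_reads += 1;
        } else {
            nb_bases += line.trim_end().len();
        }
    }
    Ok((nb_reads, nb_bases))
}
```

With a real file you would call it as count_fasta(std::io::BufReader::new(std::fs::File::open(filename)?)) inside a function returning io::Result. With rust-bio the idea is the same: handle the Result from from_file, then handle each item of .records().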

Related

Why Sequential is faster than parallel?

The code counts the frequency of each word in an article. I implemented a sequential version, a multi-threaded version, and a multi-threaded version with a thread pool.
I tested the running time of all three methods and found that the sequential method is the fastest. I use the article (data) at 37423.txt; the code is at play.rust-lang.org.
Below are just the single- and multi-threaded versions (without the thread pool version):
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::SystemTime;
pub fn word_count(article: &str) -> HashMap<String, i64> {
let now1 = SystemTime::now();
let mut map = HashMap::new();
for word in article.split_whitespace() {
let count = map.entry(word.to_string()).or_insert(0);
*count += 1;
}
let after1 = SystemTime::now();
let d1 = after1.duration_since(now1);
println!("single: {:?}", d1.as_ref().unwrap());
map
}
fn word_count_thread(word_vec: Vec<String>, counts: &Arc<Mutex<HashMap<String, i64>>>) {
let mut p_count = HashMap::new();
for word in word_vec {
*p_count.entry(word).or_insert(0) += 1;
}
let mut counts = counts.lock().unwrap();
for (word, count) in p_count {
*counts.entry(word.to_string()).or_insert(0) += count;
}
}
pub fn mt_word_count(article: &str) -> HashMap<String, i64> {
let word_vec = article
.split_whitespace()
.map(|x| x.to_owned())
.collect::<Vec<String>>();
let count = Arc::new(Mutex::new(HashMap::new()));
let len = word_vec.len();
let q1 = len / 4;
let q2 = len / 2;
let q3 = q1 * 3;
let part1 = word_vec[..q1].to_vec();
let part2 = word_vec[q1..q2].to_vec();
let part3 = word_vec[q2..q3].to_vec();
let part4 = word_vec[q3..].to_vec();
let now2 = SystemTime::now();
let count1 = count.clone();
let count2 = count.clone();
let count3 = count.clone();
let count4 = count.clone();
let handle1 = thread::spawn(move || {
word_count_thread(part1, &count1);
});
let handle2 = thread::spawn(move || {
word_count_thread(part2, &count2);
});
let handle3 = thread::spawn(move || {
word_count_thread(part3, &count3);
});
let handle4 = thread::spawn(move || {
word_count_thread(part4, &count4);
});
handle1.join().unwrap();
handle2.join().unwrap();
handle3.join().unwrap();
handle4.join().unwrap();
let x = count.lock().unwrap().clone();
let after2 = SystemTime::now();
let d2 = after2.duration_since(now2);
println!("muti: {:?}", d2.as_ref().unwrap());
x
}
My results: single: 7.93 ms, multi: 15.78 ms, thread pool: 25.33 ms.
I split the article before starting the timer.
I want to know whether my code has a problem.

First, note that the single-threaded version spends most of its time splitting on whitespace (and on I/O, but the file is small, so it will be in the OS cache on the second run). At most ~20% of the runtime is the counting that you parallelized. Here is the cargo flamegraph of the single-threaded code: [flamegraph image not reproduced]
In the multi-threaded version, it's a mess of thread creation, copying and hashmap overhead. To make sure it's not a "too little data" problem, I've used 100x your input txt file, and I'm measuring a 2x slowdown over the single-threaded version. According to the time command, it uses 2x as much CPU time as wall-clock time, so it does seem to do some parallel work. The profile looks like this: [flamegraph image not reproduced]
I'm not sure what to make of it exactly, but it's clear that memory management overhead has increased a lot. You seem to be copying strings for each hashmap, while an ideal wordcount would probably do zero string copying while counting.
More generally, I think it's a bad idea to merge the results inside the threads. The way you do it (as opposed to a map-reduce pattern), each thread needs a global lock, so you could just as well pass the results back to the main thread and combine them there. I'm not sure whether this is the main problem here, though.
Optimization
To avoid string copying, use HashMap<&str, i64> instead of HashMap<String, i64>. This requires some changes (lifetime annotations and thread::scope()). It makes mt_word_count() about 6x faster compared to the old version.
With a large input, I'm now measuring a 4x speedup compared to word_count() (which is the best you can hope for with four threads). The multi-threaded version is now also faster overall, but only by ~20% or so, for the reasons explained above. (Note that the single-threaded baseline has also profited from the same &str optimization. Also, many things could still be improved/optimized, but I'll stop here.)
fn word_count_thread<'t>(word_vec: Vec<&'t str>, counts: &Arc<Mutex<HashMap<&'t str, i64>>>) {
let mut p_count = HashMap::new();
for word in word_vec {
*p_count.entry(word).or_insert(0) += 1;
}
let mut counts = counts.lock().unwrap();
for (word, count) in p_count {
*counts.entry(word).or_insert(0) += count;
}
}
pub fn mt_word_count<'t>(article: &'t str) -> HashMap<&'t str, i64> {
let word_vec = article.split_whitespace().collect::<Vec<&str>>();
// (skipping 16 unmodified lines)
let x = thread::scope(|scope| {
let handle1 = scope.spawn(move || {
word_count_thread(part1, &count1);
});
let handle2 = scope.spawn(move || {
word_count_thread(part2, &count2);
});
let handle3 = scope.spawn(move || {
word_count_thread(part3, &count3);
});
let handle4 = scope.spawn(move || {
word_count_thread(part4, &count4);
});
handle1.join().unwrap();
handle2.join().unwrap();
handle3.join().unwrap();
handle4.join().unwrap();
count.lock().unwrap().clone()
});
let after2 = SystemTime::now();
let d2 = after2.duration_since(now2);
println!("muti: {:?}", d2.as_ref().unwrap());
x
}
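The map-reduce alternative mentioned above can be sketched as follows: each scoped thread counts its own chunk into a private HashMap<&str, i64> and returns it, and the main thread merges the maps, so no Mutex is needed. This is a minimal sketch assuming Rust 1.63+ (for std::thread::scope); mr_word_count and count_chunk are hypothetical names, and the sketch has not been benchmarked against the versions above.

```rust
use std::collections::HashMap;
use std::thread;

/// Count the words of one chunk into a private map (no shared state, no lock).
fn count_chunk<'t>(words: &[&'t str]) -> HashMap<&'t str, i64> {
    let mut counts = HashMap::new();
    for &word in words {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

/// Map-reduce word count: each scoped thread returns its own map,
/// and the main thread merges them. Words are borrowed, never copied.
pub fn mr_word_count<'t>(article: &'t str, n_threads: usize) -> HashMap<&'t str, i64> {
    let words: Vec<&'t str> = article.split_whitespace().collect();
    let n_threads = n_threads.max(1);
    // Ceiling division so every word lands in a chunk; at least 1 because chunks(0) panics.
    let chunk_len = ((words.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|scope| {
        // Map: spawn one thread per chunk.
        let handles: Vec<_> = words
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || count_chunk(chunk)))
            .collect();
        // Reduce: merge the per-thread maps on the calling thread.
        let mut total = HashMap::new();
        for handle in handles {
            for (word, count) in handle.join().unwrap() {
                *total.entry(word).or_insert(0) += count;
            }
        }
        total
    })
}
```

Splitting with chunks() also replaces the hand-written quarter boundaries, so the thread count becomes a parameter.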

Why when I access a HashMap in rust it prints Some in front of the text?

I am trying to make a decimal to hexadecimal converter in Rust, and it works fine. However, it prints Some("") around each digit of the output, like Some("1")Some("A")Some("4"). Does anyone know how to fix this? It could be the result of using String::from, or I may need to parse the answer. Sorry if this is an easy fix; I am currently learning Rust and I do not know all of its intricacies yet. Thank you in advance!
main.rs here:
use std::{
io::{
self,
Write,
},
};
use std::collections::HashMap;
use std::process;
fn main() {
let mut hex_number_system = HashMap::new();
hex_number_system.insert(1,String::from("1"));
hex_number_system.insert(2,String::from("2"));
hex_number_system.insert(3,String::from("3"));
hex_number_system.insert(4,String::from("4"));
hex_number_system.insert(5,String::from("5"));
hex_number_system.insert(6,String::from("6"));
hex_number_system.insert(7,String::from("7"));
hex_number_system.insert(8,String::from("8"));
hex_number_system.insert(9,String::from("9"));
hex_number_system.insert(10,String::from("A"));
hex_number_system.insert(11,String::from("B"));
hex_number_system.insert(12,String::from("C"));
hex_number_system.insert(13,String::from("D"));
hex_number_system.insert(14,String::from("E"));
hex_number_system.insert(15,String::from("F"));
let mut line = 0;
let mut current_multiplier = 0;
let mut current_num = String::new();
let mut digit_num = 256;
print!("Enter a number from 0 - 4095:");
io::stdout().flush().unwrap();
let mut input = String::new();
io::stdin().read_line(&mut input).unwrap();
println!("{:?}", input);
let mut user = input.trim().parse::<i32>().unwrap();
if user > 4095 {
println!("Too high");
process::exit(1);
}
if user < 0 {
println!("Too low");
process::exit(1);
}
for i in 1..=3 {
current_multiplier = 15;
loop {
if current_multiplier == 0 {
print!("{}", 0);
digit_num /= 16;
break;
}
if user >= (current_multiplier * digit_num) {
print!("{:?}", hex_number_system.get(&current_multiplier));
user -= &digit_num * &current_multiplier;
digit_num /= 16;
break;
} else {
current_multiplier -= 1;
}
}
}
print!("\n");
}

If you look at the documentation of the HashMap::get function, you'll see that it returns an Option. The doc doesn't say so explicitly, but that's to handle the case where the key is not found in the map (the example in the doc shows this).
You have to handle this possibility. If you don't mind your code crashing, you can just .unwrap() the result. Otherwise, match on it properly.
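A minimal sketch of handling the Option, assuming the question's HashMap<i32, String> digit table (built here with a loop, and with an extra entry for 0). hex_digits and digit_str are hypothetical helpers. Note that {:?} also prints the quotes around a String, so after unwrapping or matching you want {} rather than {:?}.

```rust
use std::collections::HashMap;

/// Same table as the question (10 -> "A", ..., 15 -> "F"), plus 0 -> "0".
fn hex_digits() -> HashMap<i32, String> {
    let mut map = HashMap::new();
    for d in 0..16 {
        let s = if d < 10 {
            d.to_string()
        } else {
            ((b'A' + (d - 10) as u8) as char).to_string()
        };
        map.insert(d, s);
    }
    map
}

/// Look up a digit, handling the None case explicitly instead of printing Some(..).
fn digit_str(map: &HashMap<i32, String>, key: i32) -> &str {
    match map.get(&key) {
        Some(s) => s.as_str(),
        None => "?", // key not in the map
    }
}
```

In the question's loop, print!("{}", digit_str(&hex_number_system, current_multiplier)) would then print just the digit.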

How to get variable changes outside of nesting in rust?

Hey, so I'm really new to Rust and programming at large, and I'm having some issues I haven't encountered before. I'm attempting to print the values of user-input variables outside of a nested block, but I haven't had any luck.
use std::{thread, time::Duration};
fn main(){
let mut modifer = 0;
let mut ModValue=0;
let mut ModReason = "";
let Y = "Y";
let y = "y";
let N = "N";
let n = "n";
print!("{esc}c", esc = 27 as char);
let mut ModYN = String::new();
println!("Are there any modifiers (Y/N)");
std::io::stdin().read_line(&mut ModYN).unwrap();
let ModYN = ModYN.trim();
if ModYN == y || ModYN == Y{
ModValue = 1;
print!("{esc}c", esc = 27 as char);
let mut modifer:f32=0.0;
let mut input = String::new();
//I'm attempting to get the input from the user to define a modifer reason and value
println!("Please enter the modifer: ");
std::io::stdin().read_line(&mut input).expect("Not a valid string");
modifer = input.trim().parse().expect("Not a valid number");
let mut ModReason = String::new();
println!("Whats the modifer reason: ");
std::io::stdin().read_line(&mut ModReason).unwrap();
let ModReason = ModReason.trim();
}
//Then print those results to the screen outside of nesting
print!("{esc}c", esc = 27 as char);
println!("-Modifer-");
println!("{}",&mut modifer);
println!("-Modifer Reason-");
println!("{}",&mut ModReason);
thread::sleep(Duration::from_secs(5));
}
I've attempted a variety of things; from assigning it as a mutable or borrowing the variable itself to even extending the nesting past the print results. But I've hit a brick wall and could really use some help.

There are several flaws in your code, and since your question is only about one, I'll explain only that one. This means that the code below will not compile, but it will get you closer to the solution.
Your problem, I think, comes from the fact that you don't know the difference between let var = value and var = value. These are very different statements: the second one assigns value to var, while the first one creates a new variable (shadowing a variable with the same name, if there was one), then assigns to it. It is the same as
let var;
var = value;
So when, inside the if block, you create a new variable ModReason, you are not modifying the one outside the block: you are effectively creating a new variable, just to destroy it at the end of the if statement (without using it). So the solution is to not create a new variable there:
use std::{thread, time::Duration};
fn main(){
let mut modifer = 0;
let mut ModValue=0;
let mut ModReason = "";
let Y = "Y";
let y = "y";
let N = "N";
let n = "n";
print!("{esc}c", esc = 27 as char);
let mut ModYN = String::new();
println!("Are there any modifiers (Y/N)");
std::io::stdin().read_line(&mut ModYN).unwrap();
let ModYN = ModYN.trim();
if ModYN == y || ModYN == Y{
ModValue = 1;
print!("{esc}c", esc = 27 as char);
let mut modifer:f32=0.0;
let mut input = String::new();
//I'm attempting to get the input from the user to define a modifer reason and value
println!("Please enter the modifer: ");
std::io::stdin().read_line(&mut input).expect("Not a valid string");
modifer = input.trim().parse().expect("Not a valid number");
ModReason = String::new();
println!("Whats the modifer reason: ");
std::io::stdin().read_line(&mut ModReason).unwrap();
ModReason = ModReason.trim();
}
//Then print those results to the screen outside of nesting
print!("{esc}c", esc = 27 as char);
println!("-Modifer-");
println!("{}",&mut modifer);
println!("-Modifer Reason-");
println!("{}",&mut ModReason);
thread::sleep(Duration::from_secs(5));
}
Rust would have warned you (because, written this way, you never assign to the outer ModReason) if you hadn't printed ModReason through a mutable borrow (which is absolutely useless: just use println!("{}", ModReason);).
Also, it is very likely that Rust has warned you about several variables that are not used: if you read these warnings, you can understand that you are doing something wrong.
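For reference, here is a compiling sketch of the same idea. It is a rewrite under assumptions, not the asker's full program: it reads from any BufRead (so it can be tested without a terminal), the helper name read_modifier is hypothetical, and the reason is stored as an owned String rather than a &str, because a &str borrowed from a buffer declared inside the if cannot outlive that block.

```rust
use std::io::BufRead;

/// Hypothetical helper: reads a Y/N answer, then (if yes) a number and a reason.
/// `modifier` and `reason` are declared once, outside the `if`, and only
/// assigned (no `let`) inside it, so their values survive the block.
fn read_modifier<R: BufRead>(mut input: R) -> (f32, String) {
    let mut modifier: f32 = 0.0;
    let mut reason = String::new();

    let mut answer = String::new();
    input.read_line(&mut answer).unwrap();
    if answer.trim().eq_ignore_ascii_case("y") {
        let mut line = String::new();
        input.read_line(&mut line).unwrap();
        modifier = line.trim().parse().expect("Not a valid number"); // assignment, not `let`
        line.clear();
        input.read_line(&mut line).unwrap();
        reason = line.trim().to_string(); // owned copy, so it outlives `line`
    }
    (modifier, reason)
}
```

In the original program this would be called as read_modifier(std::io::stdin().lock()), followed by the println! calls.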

Idiomatic way to move values from vector to predefined slice?

I am playing an audio file with the cpal library here. Of course I can do it with loops, but this is not idiomatic enough! I believe there should be an elegant function with a short name somewhere in std that does exactly this, but I failed to find it.
let pcm = read_file("audio.f32le").unwrap();
let mut j = 0;
let stream = output_dev
.build_output_stream(
&output_conf,
move |data: &mut [f32], _: &cpal::OutputCallbackInfo| {
let mut i = 0;
let l = data.len();
while i < l {
data[i] = pcm[j];
i = i + 1;
j = j + 1;
}
},
move |err| panic!("Stream error, {:?}", err),
)
.unwrap();
stream.play().unwrap();
thread::sleep(time::Duration::from_secs(20 + 4 * 60));
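One idiomatic option is the slice method copy_from_slice, which copies a whole slice into another slice of the same length (it panics if the lengths differ). A minimal sketch, assuming the samples are f32 as in the callback signature above; fill_from is a hypothetical helper, and padding with silence (0.0) once the file runs out is a design choice that also avoids the out-of-bounds panic the index loop hits at the end of pcm.

```rust
/// Fill `data` from `pcm` starting at `*pos`, then advance `*pos`.
/// Once `pcm` is exhausted, the rest of `data` is filled with silence (0.0).
fn fill_from(data: &mut [f32], pcm: &[f32], pos: &mut usize) {
    let remaining = &pcm[(*pos).min(pcm.len())..];
    let n = data.len().min(remaining.len());
    // Slice-to-slice copy; both sides have exactly `n` elements.
    data[..n].copy_from_slice(&remaining[..n]);
    // Silence for anything past the end of the file.
    data[n..].fill(0.0);
    *pos += n;
}
```

Inside the callback this becomes move |data: &mut [f32], _: &cpal::OutputCallbackInfo| fill_from(data, &pcm, &mut j).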

What is the idiomatic way to pop the last N elements in a mutable Vec?

I am contributing Rust code to RosettaCode to both learn Rust and contribute to the Rust community at the same time. What is the best idiomatic way to pop the last n elements in a mutable Vec?
Here's roughly what I have written but I'm wanting to see if there's a better way:
fn main() {
let mut nums: Vec<u32> = Vec::new();
nums.push(1);
nums.push(2);
nums.push(3);
nums.push(4);
nums.push(5);
let n = 2;
for _ in 0..n {
nums.pop();
}
for e in nums {
println!("{}", e)
}
}

I'd recommend using Vec::truncate:
fn main() {
let mut nums = vec![1, 2, 3, 4, 5];
let n = 2;
let final_length = nums.len().saturating_sub(n);
nums.truncate(final_length);
println!("{:?}", nums);
}
Additionally, I
used saturating_sub to handle the case where there aren't N elements in the vector
used vec![] to construct the vector of numbers easily
printed out the entire vector in one go
Normally when you "pop" something, you want to have those values. If you want the values in another vector, you can use Vec::split_off:
let tail = nums.split_off(final_length);
If you want access to the elements but do not want to create a whole new vector, you can use Vec::drain:
for i in nums.drain(final_length..) {
println!("{}", i)
}
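One detail worth knowing: repeated pop() calls hand back elements last-first, whereas split_off and drain keep the original order. If you want a single call that behaves like n pops, a minimal sketch (the name pop_n is hypothetical) is:

```rust
/// Remove the last `n` elements (or all of them, if there are fewer) and
/// return them in the order repeated `pop()` calls would: last element first.
fn pop_n<T>(v: &mut Vec<T>, n: usize) -> Vec<T> {
    let start = v.len().saturating_sub(n);
    let mut tail = v.split_off(start); // tail is in original order
    tail.reverse(); // match pop() order
    tail
}
```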

An alternate approach would be to use Vec::drain instead. This gives you an iterator so you can actually use the elements that are removed.
fn main() {
let mut nums: Vec<u32> = Vec::new();
nums.push(1);
nums.push(2);
nums.push(3);
nums.push(4);
nums.push(5);
let n = 2;
let new_len = nums.len() - n;
for removed_element in nums.drain(new_len..) {
println!("removed: {}", removed_element);
}
for retained_element in nums {
println!("retained: {}", retained_element);
}
}
The drain method accepts any range (it is generic over RangeBounds<usize>), written as <start-inclusive>..<end-exclusive>. Either bound may be omitted, defaulting to the beginning or end of the vector. So above, we're really just saying: start at new_len and drain to the end.

You should take a look at the Vec::truncate function from the standard library, that can do this for you.
fn main() {
let mut nums: Vec<u32> = Vec::new();
nums.push(1);
nums.push(2);
nums.push(3);
nums.push(4);
nums.push(5);
let n = 2;
let new_len = nums.len() - n;
nums.truncate(new_len);
for e in nums {
println!("{}", e)
}
}
