How to hash a binary file in Rust

In Rust, using sha256 = "1.0.2" (or similar), how do I hash a binary file (e.g. a tar.gz archive)?
I'm trying to get the SHA-256 digest of that binary file.
This doesn't work:
fn hash() -> String {
    let file = "file.tar.gz";
    let computed_hash = sha256::digest_file(std::path::Path::new(file)).unwrap();
    computed_hash
}
the output is:
...
Error { kind: InvalidData, message: "stream did not contain valid UTF-8" }

The sha2 crate, which sha256 depends on, supports hashing any Read implementor without needing to read the entire file into memory. See the example in the hashes readme.
use sha2::{Sha256, Digest};
use std::{io, fs};

fn main() -> io::Result<()> {
    let mut hasher = Sha256::new();
    let mut file = fs::File::open("file.tar.gz")?;
    // io::copy streams the file through the hasher without loading it all into memory
    let _bytes_written = io::copy(&mut file, &mut hasher)?;
    let hash_bytes = hasher.finalize();
    println!("{:x}", hash_bytes);
    Ok(())
}

Edit:
Upgrading to sha256 = "1.0.3" should fix this
The issue is that digest_file internally reads the file into a String, which requires it to contain valid UTF-8 — obviously not what you want in this case.
Instead, you could read the file in as bytes and pass that into sha256::digest_bytes:
let bytes = std::fs::read(path).unwrap(); // Vec<u8>
let hash = sha256::digest_bytes(&bytes);
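On later releases of the sha256 crate there is also a path-based helper that hashes the raw bytes directly; a minimal sketch, assuming the try_digest API from later 1.x versions:
fn main() {
    // try_digest hashes the file's raw bytes, so non-UTF-8 content is fine
    let path = std::path::Path::new("file.tar.gz");
    let hash = sha256::try_digest(path).unwrap();
    println!("{}", hash);
}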

Here's an implementation using the sha2 crate that doesn't read the entire file into memory and doesn't depend on the ring crate. ring isn't pure Rust, which led to cross-compilation difficulties in my case.
use data_encoding::HEXLOWER;
use sha2::{Digest, Sha256};
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Calculates the SHA-256 digest of a file and returns it as a lowercase hex string.
fn sha256_digest(path: &Path) -> io::Result<String> {
    let input = File::open(path)?;
    let mut reader = BufReader::new(input);

    let digest = {
        let mut hasher = Sha256::new();
        let mut buffer = [0; 1024];
        loop {
            let count = reader.read(&mut buffer)?;
            if count == 0 {
                break;
            }
            hasher.update(&buffer[..count]);
        }
        hasher.finalize()
    };

    Ok(HEXLOWER.encode(digest.as_ref()))
}
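For completeness, a minimal caller for the function above (the file name is a placeholder):
fn main() {
    let path = std::path::Path::new("file.tar.gz");
    match sha256_digest(path) {
        Ok(hex) => println!("{}", hex),
        Err(e) => eprintln!("failed to hash: {}", e),
    }
}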

Related

How to do partial reads and calculate the md5sum of a large file in Rust?

I'm writing a program to calculate the md5sum of files. The core code is:
use md5;
use std::fs::File;
use std::io::Read;

fn func(file_path: &str) {
    let mut f = File::open(file_path).unwrap();
    let mut contents = Vec::<u8>::new();
    f.read_to_end(&mut contents).unwrap();
    let digest = md5::compute(contents.as_slice());
    println!("{:x}\t{}", digest, file_path);
}
This function works well with moderate-size files, but it raises a segmentation fault on a large file, generating a core dump of more than 22 GB. How can I read the file piece by piece, calculate the md5sum incrementally, and then gather the pieces into a final result?
You can use a BufReader to read the file in chunks:
use md5;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn func(file_path: &str) {
    let f = File::open(file_path).unwrap();
    // Find the length of the file
    let len = f.metadata().unwrap().len();
    // Decide on a reasonable buffer size (1 MB here; the fastest size depends on hardware)
    let buf_len = len.min(1_000_000) as usize;
    let mut buf = BufReader::with_capacity(buf_len, f);
    let mut context = md5::Context::new();

    loop {
        // Get a chunk of the file
        let part = buf.fill_buf().unwrap();
        // If that chunk was empty, the reader has reached EOF
        if part.is_empty() {
            break;
        }
        // Add the chunk to the md5
        context.consume(part);
        // Tell the buffer that the chunk has been consumed
        let part_len = part.len();
        buf.consume(part_len);
    }
    let digest = context.compute();
    println!("{:x}\t{}", digest, file_path);
}
I'm not sure this will help with the crash, though: I would expect your version to fail with a memory allocation error, so a segmentation fault is unexpected.
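As an aside, md5::Context implements std::io::Write in recent versions of the md5 crate, so the loop above can collapse into io::copy, mirroring how streaming hashers like sha2 are used — a sketch, assuming that Write impl is available in your version:
use std::fs::File;
use std::io;

// Stream a file through md5 without loading it into memory
fn md5_of_file(file_path: &str) -> io::Result<md5::Digest> {
    let mut f = File::open(file_path)?;
    let mut context = md5::Context::new();
    // io::copy reads in chunks and feeds each one to the Write impl of md5::Context
    io::copy(&mut f, &mut context)?;
    Ok(context.compute())
}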

Read large f64 binary file into array

I'm looking for help/examples on how to read a relatively large (>12M) binary file of double-precision numbers into a Rust array. I have metadata on the number of f64 values in the file.
I've read on this and seen the byteorder crate but did not find the documentation/examples particularly helpful.
I don't think this needs BufRead, since buffering likely won't help performance here.
Thank you!
The easiest way to do it is to read 8 bytes at a time and convert them to f64 using one of the f64::from_*_bytes() methods:
from_ne_bytes()
from_be_bytes()
from_le_bytes()
These methods are used like this:
let mut buffer = [0u8; 8]; // the buffer can be reused!
reader.read_exact(&mut buffer)?;
let float = f64::from_be_bytes(buffer);
So you can either read the file 8 bytes at a time or in larger chunks:
use std::error::Error;
use std::fs::File;
use std::io::{BufReader, ErrorKind, Read};

fn main() -> Result<(), Box<dyn Error>> {
    let file = File::open("./path/to/file")?;
    let mut reader = BufReader::new(file);
    let mut buffer = [0u8; 8];
    loop {
        if let Err(e) = reader.read_exact(&mut buffer) {
            // if you know how many bytes are expected, it's better not to rely on `UnexpectedEof`!
            if e.kind() == ErrorKind::UnexpectedEof {
                // nothing more to read
                break;
            }
            return Err(e.into());
        }
        // or use `from_le_bytes()` depending on the byte order
        let float = f64::from_be_bytes(buffer);
        // do something with the f64
        println!("{}", float);
    }
    Ok(())
}
If you don't mind adding another dependency to your project, you can also use the byteorder crate, which has convenience methods for reading whole slices:
use byteorder::{ByteOrder, LittleEndian};

let mut bytes = [0; 32]; // the buffer you've read the file into
let mut numbers_got = [0.0; 4];
LittleEndian::read_f64_into(&bytes, &mut numbers_got);
assert_eq!(numbers_given, numbers_got); // `numbers_given` holds the expected values
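Putting the pieces together, here is a sketch that reads a whole file of f64 values in one pass; it assumes little-endian data and a placeholder path (swap in BigEndian if your data requires it), and truncates any trailing partial value:
use byteorder::{ByteOrder, LittleEndian};
use std::error::Error;
use std::fs;

fn main() -> Result<(), Box<dyn Error>> {
    // Read the raw bytes; the file is assumed to hold nothing but f64 values
    let bytes = fs::read("./path/to/file")?;
    let count = bytes.len() / 8;
    let mut floats = vec![0.0f64; count];
    // read_f64_into requires the byte slice to be exactly 8x the float slice
    LittleEndian::read_f64_into(&bytes[..count * 8], &mut floats);
    println!("read {} values", floats.len());
    Ok(())
}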

Rust pattern matching with an existing binding?

I started learning Rust yesterday. The following code is simple:
extern crate encoding_rs;
extern crate encoding_rs_io;

use encoding_rs::Encoding;
use std::fs::File;
use std::io::BufReader;
use std::io::Read;
use std::option::Option;
use std::path::Path;

fn main() {
    // open the text file to sniff (the name is illustrative)
    let file = File::open(Path::new("some_text_file.txt")).unwrap();
    let mut reader = BufReader::new(file);
    let mut bom: [u8; 3] = [0; 3];
    // read BOM
    if let Ok(_) = reader.read_exact(&mut bom) {
        // sniff BOM
        // Rust has no nulls, so I declare `Option<&Encoding>` to store the result of the sniff.
        let mut enc: Option<&Encoding> = None;
        match Encoding::for_bom(&bom) {
            Some((encoding, _)) => {
                // <-- Some((enc, _))
                enc = Some(encoding);
            }
            None => {
                if let Some(encoding) = Encoding::for_label("UTF-8".as_bytes()) {
                    enc = Some(encoding);
                }
            }
        }
        if let Some(encoding) = enc {
            println!("{:?}", encoding);
        }
    }
}
It opens a text file and tries to analyze its encoding by parsing the BOM (byte order mark). If Encoding::for_bom does not return an encoding, the code falls back to UTF-8 as the default.
I dislike unwrap() because it always assumes there is a valid result.
My question is: is there a way to do pattern matching and put the result directly into an existing mutable binding?
e.g. change Some((encoding, _)) to Some((enc, _)) so I don't need the line enc = Some(encoding).
Many Rust constructs can be used as expressions, i.e. they can return a value. So if every branch of your match returns a value of the same type, you can assign the match directly to a variable. It does not need to be mutable unless you plan to change it later.
let mut reader = BufReader::new(file);
let mut bom: [u8; 3] = [0; 3];

if let Ok(_) = reader.read_exact(&mut bom) {
    let enc = match Encoding::for_bom(&bom) {
        Some((encoding, _)) => Some(encoding),
        None => Encoding::for_label("UTF-8".as_bytes()),
    };
    if let Some(encoding) = enc {
        println!("{:?}", encoding);
    }
}
I'd use a combination of map and or_else:
let enc = Encoding::for_bom(&bom)
    .map(|t| t.0)
    .or_else(|| Encoding::for_label("UTF-8".as_bytes()));
Or (clearer but slightly longer):
let enc = Encoding::for_bom(&bom)
    .map(|(e, _)| e)
    .or_else(|| Encoding::for_label("UTF-8".as_bytes()));
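Since encoding_rs also exports the encodings as statics, the fallback can skip the runtime label lookup entirely — a small variant, assuming the UTF_8 static from encoding_rs:
use encoding_rs::{Encoding, UTF_8};

// Sniff the BOM, falling back to UTF-8 without a label lookup
fn sniff_encoding(bom: &[u8]) -> &'static Encoding {
    Encoding::for_bom(bom).map(|(e, _)| e).unwrap_or(UTF_8)
}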

How to unzip a Reqwest/Hyper response using streams?

I need to download a 60 MB ZIP file and extract the single file inside it. I want to download and extract it using streams. How can I achieve this in Rust?
fn main() {
    let mut res = reqwest::get("myfile.zip").unwrap();
    // extract the response body to myfile.txt
}
In Node.js I would do something like this:
http.get('myfile.zip', response => {
  response.pipe(unzip.Parse())
    .on('entry', entry => {
      if (entry.path.endsWith('.txt')) {
        entry.pipe(fs.createWriteStream('myfile.txt'))
      }
    })
})
With reqwest you can get the .zip file:
reqwest::get("myfile.zip")
Since reqwest can only be used for retrieving the file, ZipArchive from the zip crate can be used for unpacking it. It's not possible to stream the .zip file into ZipArchive, since ZipArchive::new(reader: R) requires R to implement both Read (which reqwest's Response fulfills) and Seek, which Response does not implement.
As a workaround you may use a temporary file:
copy_to(&mut tmpfile)
As File implements both Seek and Read, zip can be used here:
zip::ZipArchive::new(tmpfile)
This is a working example of the described method:
extern crate reqwest;
extern crate tempfile;
extern crate zip;

fn main() {
    let mut tmpfile = tempfile::tempfile().unwrap();
    reqwest::get("myfile.zip").unwrap().copy_to(&mut tmpfile).unwrap();
    let mut zip = zip::ZipArchive::new(tmpfile).unwrap();
    println!("{:#?}", zip);
}
tempfile is a handy crate, which lets you create a temporary file, so you don't have to think of a name.
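To actually extract the single file to disk rather than just printing the archive, the entry handle returned by the zip crate implements Read, so io::copy finishes the job — a sketch building on the tempfile approach above (file names are placeholders, and the archive is assumed to contain exactly one file):
extern crate reqwest;
extern crate tempfile;
extern crate zip;

use std::fs::File;
use std::io;

fn main() {
    let mut tmpfile = tempfile::tempfile().unwrap();
    reqwest::get("myfile.zip").unwrap().copy_to(&mut tmpfile).unwrap();
    let mut zip = zip::ZipArchive::new(tmpfile).unwrap();
    // Take the first (and assumed only) entry in the archive
    let mut entry = zip.by_index(0).unwrap();
    let mut out = File::create("myfile.txt").unwrap();
    // Stream the entry's decompressed bytes to the output file
    io::copy(&mut entry, &mut out).unwrap();
}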
That's how I'd read the file hello.txt with content hello world from the archive hello.zip located on a local server:
extern crate reqwest;
extern crate zip;

use std::io::Read;

fn main() {
    let mut res = reqwest::get("http://localhost:8000/hello.zip").unwrap();
    let mut buf: Vec<u8> = Vec::new();
    let _ = res.read_to_end(&mut buf);
    let reader = std::io::Cursor::new(buf);
    let mut zip = zip::ZipArchive::new(reader).unwrap();
    let mut file_zip = zip.by_name("hello.txt").unwrap();
    let mut file_buf: Vec<u8> = Vec::new();
    let _ = file_zip.read_to_end(&mut file_buf);
    let content = String::from_utf8(file_buf).unwrap();
    println!("{}", content);
}
This will output hello world.

Async solution using Tokio
It's a bit convoluted, but you can do this using tokio, futures, tokio_util::compat and async_compression. The key is to create a futures::io::AsyncRead stream using .into_async_read() and then convert it into a tokio::io::AsyncRead using .compat().
For simplicity, it downloads a txt.gz file and prints it line by line.
use async_compression::tokio::bufread::GzipDecoder;
use futures::stream::TryStreamExt;
use tokio::io::AsyncBufReadExt;
use tokio_util::compat::FuturesAsyncReadCompatExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let url = "https://f001.backblazeb2.com/file/korteur/hello-world.txt.gz";
    let response = reqwest::get(url).await?;
    let stream = response
        .bytes_stream()
        .map_err(|e| futures::io::Error::new(futures::io::ErrorKind::Other, e))
        .into_async_read()
        .compat();
    let gzip_decoder = GzipDecoder::new(stream);

    // Print decompressed txt content
    let buf_reader = tokio::io::BufReader::new(gzip_decoder);
    let mut lines = buf_reader.lines();
    while let Some(line) = lines.next_line().await? {
        println!("{line}");
    }
    Ok(())
}
Credit to Benjamin Kay.
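If the goal is to write the decompressed stream to a file instead of printing it, tokio::io::copy plays the same role as its blocking counterpart — a variant of the example above, with a hypothetical output path:
use async_compression::tokio::bufread::GzipDecoder;
use futures::stream::TryStreamExt;
use tokio_util::compat::FuturesAsyncReadCompatExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let url = "https://f001.backblazeb2.com/file/korteur/hello-world.txt.gz";
    let response = reqwest::get(url).await?;
    let stream = response
        .bytes_stream()
        .map_err(|e| futures::io::Error::new(futures::io::ErrorKind::Other, e))
        .into_async_read()
        .compat();
    let mut gzip_decoder = GzipDecoder::new(stream);

    // Stream the decompressed bytes straight to disk ("hello-world.txt" is a placeholder)
    let mut out = tokio::fs::File::create("hello-world.txt").await?;
    tokio::io::copy(&mut gzip_decoder, &mut out).await?;
    Ok(())
}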

How to convert a String to a &[u8] in order to write it to a file? [duplicate]

With Rust being comparatively new, I've seen far too many ways of reading and writing files. Many are extremely messy snippets someone came up with for their blog, and 99% of the examples I've found (even on Stack Overflow) are from unstable builds that no longer work. Now that Rust is stable, what is a simple, readable, non-panicking snippet for reading or writing files?
This is the closest I've gotten to something that works in terms of reading a text file, but it's still not compiling even though I'm fairly certain I've included everything I should have. This is based on a snippet I found on Google+ of all places, and the only thing I've changed is that the old BufferedReader is now just BufReader:
use std::fs::File;
use std::io::BufReader;
use std::path::Path;

fn main() {
    let path = Path::new("./textfile");
    let mut file = BufReader::new(File::open(&path));
    for line in file.lines() {
        println!("{}", line);
    }
}
The compiler complains:
error: the trait bound `std::result::Result<std::fs::File, std::io::Error>: std::io::Read` is not satisfied [--explain E0277]
--> src/main.rs:7:20
|>
7 |> let mut file = BufReader::new(File::open(&path));
|> ^^^^^^^^^^^^^^
note: required by `std::io::BufReader::new`
error: no method named `lines` found for type `std::io::BufReader<std::result::Result<std::fs::File, std::io::Error>>` in the current scope
--> src/main.rs:8:22
|>
8 |> for line in file.lines() {
|> ^^^^^
To sum it up, what I'm looking for is:
brevity
readability
covers all possible errors
doesn't panic
None of the functions I show here panic on their own, but I am using expect because I don't know what kind of error handling will fit best into your application. Go read The Rust Programming Language's chapter on error handling to understand how to appropriately handle failure in your own program.
Rust 1.26 and onwards
If you don't want to care about the underlying details, there are one-line functions for reading and writing.
Read a file to a String
use std::fs;
fn main() {
let data = fs::read_to_string("/etc/hosts").expect("Unable to read file");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs;
fn main() {
let data = fs::read("/etc/hosts").expect("Unable to read file");
println!("{}", data.len());
}
Write a file
use std::fs;
fn main() {
let data = "Some data!";
fs::write("/tmp/foo", data).expect("Unable to write file");
}
Rust 1.0 and onwards
These forms are slightly more verbose than the one-line functions that allocate a String or Vec for you, but are more powerful in that you can reuse allocated data or append to an existing object.
Reading data
Reading a file requires two core pieces: File and Read.
Read a file to a String
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = String::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = Vec::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_end(&mut data).expect("Unable to read data");
println!("{}", data.len());
}
Write a file
Writing a file is similar, except we use the Write trait and we always write out bytes. You can convert a String / &str to bytes with as_bytes:
use std::fs::File;
use std::io::Write;
fn main() {
let data = "Some data!";
let mut f = File::create("/tmp/foo").expect("Unable to create file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered I/O
I felt a bit of a push from the community to use BufReader and BufWriter instead of reading straight from a file
A buffered reader (or writer) uses a buffer to reduce the number of I/O requests. For example, it's much more efficient to access the disk once to read 256 bytes instead of accessing the disk 256 times.
That being said, I don't believe a buffered reader/writer will be useful when reading the entire file. read_to_end seems to copy data in somewhat large chunks, so the transfer may already be naturally coalesced into fewer I/O requests.
Here's an example of using it for reading:
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
let mut data = String::new();
let f = File::open("/etc/hosts").expect("Unable to open file");
let mut br = BufReader::new(f);
br.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
And for writing:
use std::fs::File;
use std::io::{BufWriter, Write};
fn main() {
let data = "Some data!";
let f = File::create("/tmp/foo").expect("Unable to create file");
let mut f = BufWriter::new(f);
f.write_all(data.as_bytes()).expect("Unable to write data");
}
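One caveat worth knowing: BufWriter flushes its buffer when dropped, but any error at that point is silently discarded. Calling flush explicitly surfaces it:
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() {
    let data = "Some data!";
    let f = File::create("/tmp/foo").expect("Unable to create file");
    let mut f = BufWriter::new(f);
    f.write_all(data.as_bytes()).expect("Unable to write data");
    // Errors during the implicit flush-on-drop are ignored; flush manually to see them
    f.flush().expect("Unable to flush data");
}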
A BufReader is more useful when you want to read line-by-line:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let f = File::open("/etc/hosts").expect("Unable to open file");
let f = BufReader::new(f);
for line in f.lines() {
let line = line.expect("Unable to read line");
println!("Line: {}", line);
}
}
For anybody who is writing to a file, the accepted answer is good, but if you need to append to the file you have to use the OpenOptions struct instead:
use std::io::Write;
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let mut f = OpenOptions::new()
.append(true)
.create(true) // Optionally create the file if it doesn't already exist
.open("/tmp/foo")
.expect("Unable to open file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered writing still works the same way:
use std::io::{BufWriter, Write};
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let f = OpenOptions::new()
.append(true)
.open("/tmp/foo")
.expect("Unable to open file");
let mut f = BufWriter::new(f);
f.write_all(data.as_bytes()).expect("Unable to write data");
}
By using buffered I/O, you can copy a file whose size is greater than the available memory:
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, BufWriter, Write};

fn main() {
    let read = File::open(r#"E:\1.xls"#);
    let write = OpenOptions::new().write(true).create(true).open(r#"E:\2.xls"#);
    let mut reader = BufReader::new(read.unwrap());
    let mut writer = BufWriter::new(write.unwrap());

    loop {
        // Borrow the next chunk from the reader's internal buffer
        let buffer = reader.fill_buf().unwrap();
        let length = buffer.len();
        // An empty chunk means EOF
        if length == 0 {
            break;
        }
        // write_all ensures the whole chunk is written (write() may write fewer bytes)
        writer.write_all(buffer).unwrap();
        // Mark the chunk as consumed so fill_buf returns fresh data next time
        reader.consume(length);
    }
}
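For what it's worth, the standard library already packages this fill-and-consume loop: io::copy drives any Read into any Write in chunks, so the manual loop above can be a one-liner (same placeholder paths):
use std::fs::File;
use std::io::{self, BufReader, BufWriter};

fn main() -> io::Result<()> {
    let mut reader = BufReader::new(File::open(r#"E:\1.xls"#)?);
    let mut writer = BufWriter::new(File::create(r#"E:\2.xls"#)?);
    // io::copy performs the read/write loop internally with a fixed-size buffer
    io::copy(&mut reader, &mut writer)?;
    Ok(())
}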
