Creating Diesel.rs queries with a dynamic number of .and()'s - rust

While playing with Diesel, I got stuck writing a function which takes a vector of Strings as input and does the following:
Combine all Strings to a large query
run the query on the Connection
process the result
return a Vec
If I construct the query in one step, it works just fine:
fn get_books(authors: Vec<String>, connection: SqliteConnection) {
use schema::ebook::dsl::*;
let inner = author
.like(format!("%{}%", authors[0]))
.and(author.like(format!("%{}%", authors[1])))
.and(author.like(format!("%{}%", authors[2])));
ebook
.filter(inner)
.load::<Ebook>(&connection)
.expect("Error loading ebook");
}
If I try to generate the query in more steps (needed in order to work with the variable length of the input vector) I can't get it to compile:
fn get_books(authors: Vec<String>, connection: SqliteConnection) {
use schema::ebook::dsl::*;
let mut inner = author
.like(format!("%{}%", authors[0]))
.and(author.like(format!("%{}%", authors[1]))); // <1>
inner = inner.and(author.like(format!("%{}%", authors[2]))); // <2>
ebook
.filter(inner)
.load::<Ebook>(&connection)
.expect("Error loading ebook");
}
This generates the following error:
inner = inner.and(author.like(format!("%{}%",authors[2])));
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `diesel::expression::operators::Like`, found struct `diesel::expression::operators::And`
I don't understand why Rust expects a Like and not an And. The expression on the line marked <1> returns an And, and therefore an And is stored in inner.
What am I missing? Why does the first bit of code compile and the second won't? What is the right way to generate this kind of query?

The first thing you need to do is look at the complete error message:
error[E0308]: mismatched types
--> src/main.rs:23:13
|
23 | inner = inner.and(author.like(format!("%{}%", authors[2])));//<2>
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `diesel::expression::operators::Like`, found struct `diesel::expression::operators::And`
|
= note: expected type `diesel::expression::operators::And<diesel::expression::operators::Like<_, _>, _>`
found type `diesel::expression::operators::And<diesel::expression::operators::And<diesel::expression::operators::Like<_, _>, diesel::expression::operators::Like<schema::ebook::columns::author, diesel::expression::bound::Bound<diesel::sql_types::Text, std::string::String>>>, _>`
It's long, but that's because it's fully qualified. Let's shorten the last part a bit:
expected type `And<Like<_, _>, _>`
found type `And<And<Like<_, _>, Like<author, Bound<Text, String>>>, _>`
If you review the documentation for and, you'll see that every call to and consumes the receiver and returns a brand new type — And:
fn and<T: AsExpression<Bool>>(self, other: T) -> And<Self, T::Expression>
This is the core of Diesel's ability to generate strongly-typed SQL expressions with no run-time overhead. All of the work is delegated to the type system. In fact, the creator of Diesel has an entire talk where he shows how far Diesel pushes the type system and what benefits it has.
Turning back to your question, it's impossible to store an And<_, _> in the same location as an And<And<_, _>, _> because they will have different sizes and are in fact different types. At the root, this is the same as trying to store an integer in a boolean.
In fact, there's no way to know what concrete type you need because you don't even know how many conditions you will have — it depends on the size of the vector.
In this case, we have to give up static dispatch and move to dynamic dispatch via a trait object. Diesel has a specific trait for this case (which also has good examples): BoxableExpression.
The remaining piece is to convert your authors to like expressions and combine them. We need a base case, however, for when authors is empty. We construct a trivially true statement (author = author) for that.
#[macro_use]
extern crate diesel;
use diesel::SqliteConnection;
use diesel::prelude::*;
use diesel::sql_types::Bool;
mod schema {
table! {
ebook (id) {
id -> Int4,
author -> Text,
}
}
}
fn get_books(authors: Vec<String>, connection: SqliteConnection) {
use schema::ebook::dsl::*;
let always_true = Box::new(author.eq(author));
let query: Box<dyn BoxableExpression<schema::ebook::table, _, SqlType = Bool>> = authors
.into_iter()
.map(|name| author.like(format!("%{}%", name)))
.fold(always_true, |query, item| {
Box::new(query.and(item))
});
ebook
.filter(query)
.load::<(i32, String)>(&connection)
.expect("Error loading ebook");
}
fn main() {}
I also wouldn't be surprised if there were a better SQL way of doing this. It appears that PostgreSQL has the WHERE col LIKE ANY( subselect ) and WHERE col LIKE ALL( subselect ) forms, for example.
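To see why boxing helps without pulling in Diesel, here is a minimal std-only sketch of the same idea. The names Predicate, Contains, And, AlwaysTrue and all_of are hypothetical stand-ins invented for illustration; they are not Diesel's API. Each And<A, B> is a distinct type, just like Diesel's, so the fold only type-checks once the accumulator is erased to a trait object.

```rust
// Hypothetical stand-ins for Diesel's expression types (not Diesel's API).
trait Predicate {
    fn matches(&self, value: &str) -> bool;
}

// Plays the role of `author.like("%needle%")`.
struct Contains(String);
impl Predicate for Contains {
    fn matches(&self, value: &str) -> bool {
        value.contains(&self.0)
    }
}

// Every combination is a brand new type: And<A, B>.
struct And<A, B>(A, B);
impl<A: Predicate, B: Predicate> Predicate for And<A, B> {
    fn matches(&self, value: &str) -> bool {
        self.0.matches(value) && self.1.matches(value)
    }
}

// Base case for an empty list, like `author.eq(author)` in the answer.
struct AlwaysTrue;
impl Predicate for AlwaysTrue {
    fn matches(&self, _value: &str) -> bool {
        true
    }
}

// Lets a boxed predicate be nested inside another And.
impl Predicate for Box<dyn Predicate> {
    fn matches(&self, value: &str) -> bool {
        (**self).matches(value)
    }
}

// The accumulator always has the single type Box<dyn Predicate>,
// no matter how many conditions were folded into it.
fn all_of(needles: Vec<String>) -> Box<dyn Predicate> {
    let always_true: Box<dyn Predicate> = Box::new(AlwaysTrue);
    needles
        .into_iter()
        .map(Contains)
        .fold(always_true, |acc, item| {
            Box::new(And(acc, item)) as Box<dyn Predicate>
        })
}
```

Calling all_of(vec!["ab".to_string(), "cd".to_string()]) yields a predicate that accepts "xxabcdxx" and rejects "ab only". The price is one allocation and one dynamic call per condition, which is the static-to-dynamic dispatch trade described above.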

Converting a Utf8 Series into a Series of List<Utf8> via a custom function in Rust polars

I have a Utf8 column in my DataFrame, and from that I want to create a column of List<Utf8>.
In particular for each row I am taking the text of a HTML document and using soup to parse out all the paragraphs of class <p>, and store the collection of text of each separate paragraph as a Vec<String> or Vec<&str>. I have this as a standalone function:
fn parse_paragraph(s: &str) -> Vec<&str> {
let soup = Soup::new(s);
soup.tag(p).find_all().iter().map(|&p| p.text()).collect()
}
In trying to adapt the few available examples of applying custom functions in Rust polars, I can't seem to get the conversion to compile.
Take this MVP example, using a simpler string-to-vec-of-strings example, borrowing from the Iterators example from the documentation:
use polars::prelude::*;
fn vector_split(text: &str) -> Vec<&str> {
text.split(' ').collect()
}
fn vector_split_series(s: &Series) -> PolarsResult<Series> {
let output : Series = s.utf8()
.expect("Text data")
.into_iter()
.map(|t| t.map(vector_split))
.collect();
Ok(output)
}
fn main() {
let df = df! [
"text" => ["a cat on the mat", "a bat on the hat", "a gnat on the rat"]
].unwrap();
df.clone().lazy()
.select([
col("text").apply(|s| vector_split_series(&s), GetOutput::default())
.alias("words")
])
.collect();
}
(Note: I know there is an in-built split function for utf8 Series, but I needed a simpler example than parsing HTML)
I get the following error from cargo check:
error[E0277]: a value of type `polars::prelude::Series` cannot be built from an iterator over elements of type `Option<Vec<&str>>`
--> src/main.rs:11:27
|
11 | let output : Series = s.utf8()
| ___________________________^
12 | | .expect("Text data")
13 | | .into_iter()
14 | | .map(|t| t.map(vector_split))
| |_____________________________________^ value of type `polars::prelude::Series` cannot be built from `std::iter::Iterator<Item=Option<Vec<&str>>>`
15 | .collect();
| ------- required by a bound introduced by this call
|
= help: the trait `FromIterator<Option<Vec<&str>>>` is not implemented for `polars::prelude::Series`
= help: the following other types implement trait `FromIterator<A>`:
<polars::prelude::Series as FromIterator<&'a bool>>
<polars::prelude::Series as FromIterator<&'a f32>>
<polars::prelude::Series as FromIterator<&'a f64>>
<polars::prelude::Series as FromIterator<&'a i32>>
<polars::prelude::Series as FromIterator<&'a i64>>
<polars::prelude::Series as FromIterator<&'a str>>
<polars::prelude::Series as FromIterator<&'a u32>>
<polars::prelude::Series as FromIterator<&'a u64>>
and 15 others
note: required by a bound in `std::iter::Iterator::collect`
What is the correct idiom for this kind of procedure? Is there a simpler way to apply a function?
For future seekers, I will explain the general solution and then the specific code to make the example work. I'll also point out some gotchas for this specific example.
Explanation
If you need to use a custom function instead of using the convenient Expr expressions, at the core of it you'll need to make a function that converts the Series of the input column into a Series backed by a ChunkedArray of the correct output type. This function is what you give to map in the select statement in main. The type of the ChunkedArray is the type you provide as GetOutput.
The code inside vector_split_series in the question works for conversion functions of standard numeric types, or List of numeric types. It does not work automatically for Lists of Utf8 strings, for example, as they are treated specially for ChunkedArrays. This is for performance reasons. You need to build up the Series explicitly, via the correct type builder.
In the question's case, we need to use a ListUtf8ChunkedBuilder which will create a ChunkedArray of List<Utf8>.
So in general, the question's code works for conversion outputs that are numeric or Lists of numerics. But for lists of strings, you need to use a ListUtf8ChunkedBuilder.
Correct code
The correct code for the question's example looks like this:
use polars::prelude::*;
fn vector_split(text: &str) -> Vec<String> {
text.split(' ').map(|x| x.to_owned()).collect()
}
fn vector_split_series(s: Series) -> PolarsResult<Series> {
let ca = s.utf8()?;
let mut builder = ListUtf8ChunkedBuilder::new("words", s.len(), ca.get_values_size());
ca.into_iter()
.for_each(|opt_s| match opt_s {
None => builder.append_null(),
Some(s) => {
builder.append_series(
&Series::new("words", vector_split(s).into_iter() )
)
}});
Ok(builder.finish().into_series())
}
fn main() {
let df = df! [
"text" => ["a cat on the mat", "a bat on the hat", "a gnat on the rat"]
].unwrap();
let df2 = df.clone().lazy()
.select([
col("text")
.apply(|s| vector_split_series(s), GetOutput::from_type(DataType::List(Box::new(DataType::Utf8))))
// Can instead use default if the compiler can determine the types
//.apply(|s| vector_split_series(s), GetOutput::default())
.alias("words")
])
.collect()
.unwrap();
println!("{:?}", df2);
}
The core is in vector_split_series. It has that function definition to be used in map.
The match statement is required because Series can have null entries, and to preserve the length of the Series, you need to pass nulls through. We use the builder here so it appends the appropriate null.
For non-null entries the builder needs to append Series. Normally you can append_from_iter, but there is (as of polars 0.26.1) no implementation of FromIterator for Iterator<Item=Vec<T>>. So you need to convert the collection into an iterator on values, and that iterator into a new Series.
Once the larger ChunkedArray (of type ListUtf8ChunkedArray) is built, you can convert it into a PolarsResult<Series> to return to map.
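The null-handling logic can be looked at in isolation from polars. The following is a std-only sketch in which a slice of Option<&str> stands in for the Utf8 column and a Vec<Option<Vec<String>>> stands in for the List<Utf8> output; split_column is a hypothetical name, and the builder calls are replaced by plain pushes.

```rust
// Same splitting helper as in the answer, returning owned Strings.
fn vector_split(text: &str) -> Vec<String> {
    text.split(' ').map(|w| w.to_owned()).collect()
}

// Hypothetical stand-in for vector_split_series: one output entry per
// input entry, with nulls passed through so the length is preserved.
fn split_column(column: &[Option<&str>]) -> Vec<Option<Vec<String>>> {
    let mut out = Vec::with_capacity(column.len());
    for entry in column {
        match entry {
            // Corresponds to builder.append_null().
            None => out.push(None),
            // Corresponds to builder.append_series(...).
            Some(text) => out.push(Some(vector_split(text))),
        }
    }
    out
}
```

The shape is the same as in the polars version: the match decides between appending a null and appending the converted value, and the output always has exactly as many rows as the input.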
Gotcha
In the above example, vector_split can return Vec<String> or Vec<&str>. This is because split creates its iterator of &str in a nice way.
If you are using something more complicated (like my original example of extracting text via Soup queries) that outputs iterators of &str, the references may be considered owned by a temporary, and you will then get errors about returning references to temporaries.
This is why in the working code, I pass Vec<String> back to the builder, even though it is not strictly required.
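A small std-only sketch of that gotcha follows. Node and its text() method are hypothetical stand-ins for a parser node whose text accessor builds a fresh String; they are not the soup crate's actual types.

```rust
// Hypothetical stand-in for a parsed HTML node.
struct Node {
    raw: String,
}

impl Node {
    // Builds a new String on every call, so callers receive an owned value.
    fn text(&self) -> String {
        self.raw.trim().to_string()
    }
}

// Returning Vec<&str> built from `n.text().as_str()` would not compile:
// each String would be a temporary dropped at the end of the closure.
// Collecting the owned Strings sidesteps the problem.
fn paragraph_texts(nodes: &[Node]) -> Vec<String> {
    nodes.iter().map(|n| n.text()).collect()
}
```

With split the borrowed version works because the &str pieces point into the input text, which outlives the function call. Here they would point into values created inside the closure, so owning the data is the simplest fix.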

Compiler says to add type parameters, but I don't think I can in this context

The compiler complains that type annotations are needed. I was able to fix it by creating an unused variable with type parameters, commented out in the code snippet below. This feels like a weird workaround, though. Is there some way to add type parameters without needing to do this? It just seems like Rust should know the type of key because Map is a defined type.
type Map = Arc<DashMap<[u8; 32], Session>>;
pub struct SessionManager {
map: Map,
db_wtx: WriteTx,
db_rtx: ReadTx,
}
impl SessionManager{
pub fn logout(&self, token: &str) {
if let Ok(decoded_vec) = base64::decode(token) {
if let Ok(key) = decoded_vec.try_into() {
//let _typed_key: [u8; 32] = key;
self.map.remove(&key);
}
}
}
}
Error msg:
error[E0282]: type annotations needed
--> src/login_handler.rs:244:14
|
244 | if let Ok(key) = decoded_vec.try_into() {
| ^^^ ---------------------- this method call resolves to `Result<T, <Self as TryInto<T>>::Error>`
| |
| cannot infer type
Edit: for clarity, I'm using the dashmap crate, which tries to mimic the API of std::collections::HashMap closely while allowing multithreaded access.
rustc can't infer the type because Map::remove() (assuming it behaves like the method on HashMap or BTreeMap) doesn't take &K as the parameter type (where K is the key type), but rather &Q for some Q where K: Borrow<Q>. This allows for more flexibility (for example, passing a &str for a map of Strings), but comes at the cost of worse inference.
There are a few alternatives. You can specify the type for the Ok:
if let Ok::<[u8; 32], _>(key) = decoded_vec.try_into()
This is a shorthand for:
if let Result::<[u8; 32], _>::Ok(key) = decoded_vec.try_into()
Or you can use TryFrom instead of TryInto:
if let Ok(key) = <[u8; 32]>::try_from(decoded_vec)
If you choose a variable, note that it is preferred to use _ and not another underscore-prefixed name like _typed_key. This is because _ does not move the value, but any other name does. This is not relevant in this case since [u8; 32] is Copy, but still a good practice.
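Here is a self-contained sketch of the TryFrom variant using std's HashMap in place of DashMap, since its remove has the same Borrow-based signature. The function name remove_session and the String value type are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

// Hypothetical helper: look up and remove a session by its raw key bytes.
fn remove_session(map: &mut HashMap<[u8; 32], String>, raw: Vec<u8>) -> Option<String> {
    // Naming the target type on try_from leaves nothing for inference
    // to guess; a Vec of the wrong length simply yields None.
    let key = <[u8; 32]>::try_from(raw).ok()?;
    // remove accepts &Q where [u8; 32]: Borrow<Q>; key already has a
    // concrete type, so Q is resolved without further annotations.
    map.remove(&key)
}
```

Passing a 32-byte Vec that matches a stored key returns the stored value, while a 31-byte Vec returns None because the conversion to [u8; 32] fails.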

In Rust, what's the correct way to keep data read from a file in scope?

I'm new to Rust.
I'm reading SHA1-as-hex-strings from a file - a lot of them, approx. 30 million.
In the text file, they are sorted ascending numerically.
I want to be able to search the list, as fast as possible.
I (think I) want to read them into a (sorted) Vec<primitive_types::U256> for fast searching.
So, I've tried:
log("Loading haystack.");
// total_lines read earlier
let mut the_stack = Vec::<primitive_types::U256>::with_capacity(total_lines);
if let Ok(hay) = read_lines(haystack) { // Get BufRead
for line in hay { // Iterate over lines
if let Ok(hash) = line {
the_stack.push(U256::from(hash));
}
}
}
log(format!("Read {} hashes.", the_stack.len()));
The error is:
$ cargo build
Compiling nsrl v0.1.0 (/my_app)
error[E0277]: the trait bound `primitive_types::U256: std::convert::From<std::string::String>` is not satisfied
--> src/main.rs:55:24
|
55 | the_stack.push(U256::from(hash));
| ^^^^^^^^^^ the trait `std::convert::From<std::string::String>` is not implemented for `primitive_types::U256`
|
= help: the following implementations were found:
<primitive_types::U256 as std::convert::From<&'a [u8; 32]>>
<primitive_types::U256 as std::convert::From<&'a [u8]>>
<primitive_types::U256 as std::convert::From<&'a primitive_types::U256>>
<primitive_types::U256 as std::convert::From<&'static str>>
and 14 others
= note: required by `std::convert::From::from`
This code works if instead of the variable hash I have a string literal, e.g. "123abc".
I think I should be able to use the implementation std::convert::From<&'static str>, but I don't understand how I'm meant to keep hash in scope?
I feel like what I'm trying to achieve is a pretty normal use case:
Iterate over the lines in a file.
Add the line to a vector.
What am I missing?
You probably want something like:
U256::from_str(&hash)?
There is a conversion from &str in the FromStr trait called from_str. It returns a Result<T, E> value, because parsing a string may fail.
Regarding "I think I should be able to use the implementation std::convert::From<&'static str>, but I don't understand how I'm meant to keep hash in scope?":
You can’t keep the hash in scope with 'static lifetime. It looks like this is a convenience method to allow you to use string constants in your program—but it is really nothing more than U256::from_str(&hash).unwrap().
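The difference between the two conversions can be shown with a std type. In this sketch u64 stands in for U256 and parse_lines is a hypothetical helper; the point is only that FromStr borrows each line for the duration of the call, so no 'static lifetime is needed.

```rust
use std::str::FromStr;

// Hypothetical helper: parse every line that holds a valid number and
// skip the rest. Each String is only borrowed while from_str runs.
fn parse_lines(lines: &[String]) -> Vec<u64> {
    lines
        .iter()
        .filter_map(|line| u64::from_str(line).ok())
        .collect()
}
```

Lines that fail to parse are silently dropped here. Propagating the error with ? instead, as in the snippet above, is usually the better choice when bad input should stop the load.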
However…
If you want a SHA-1, the best type is probably [u8; 20] or maybe [u32; 5].
You want a base 16 decoder, something like base16::decode_slice. Here’s how that might look in action:
/// Error if the hash cannot be parsed.
struct InvalidHash;
/// Type for SHA-1 hashes.
type SHA1 = [u8; 20];
fn read_hash(s: &str) -> Result<SHA1, InvalidHash> {
let mut hash = [0; 20];
match base16::decode_slice(s, &mut hash[..]) {
Ok(20) => Ok(hash),
_ => Err(InvalidHash),
}
}
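If you would rather not add a crate, the same idea can be sketched with the standard library only. The names Sha1Hash, parse_sha1 and contains are made up for this example, and the lookup assumes the file uses a consistent letter case, so that the ascending order of the hex strings matches the ordering of the byte arrays.

```rust
/// Type for SHA-1 hashes (hypothetical alias for this sketch).
type Sha1Hash = [u8; 20];

/// Decode exactly 40 hex characters into 20 bytes, or return None.
fn parse_sha1(line: &str) -> Option<Sha1Hash> {
    let bytes = line.trim().as_bytes();
    if bytes.len() != 40 {
        return None;
    }
    let mut hash = [0u8; 20];
    for (i, pair) in bytes.chunks(2).enumerate() {
        // to_digit(16) rejects anything that is not a hex digit.
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        hash[i] = (hi * 16 + lo) as u8;
    }
    Some(hash)
}

/// Arrays compare lexicographically, so a sorted Vec supports binary search.
fn contains(sorted: &[Sha1Hash], needle: &Sha1Hash) -> bool {
    sorted.binary_search(needle).is_ok()
}
```

Because the input file is already sorted, pushing each parsed hash in order keeps the Vec sorted, and every lookup is then a binary search over 20-byte keys.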

Rust can't use trait const in trait function [duplicate]

Please consider the following minimal example in Rust:
const FOOBAR: usize = 3;
trait Foo {
const BAR: usize;
}
struct Fubar();
impl Foo for Fubar {
const BAR: usize = 3;
}
struct Baz<T>(T);
trait Qux {
fn print_bar();
}
impl<T: Foo> Qux for Baz<T> {
fn print_bar() {
println!("bar: {}", T::BAR); // works
println!("{:?}", [T::BAR; 3]); // works
println!("{:?}", [1; FOOBAR]); // works
println!("{:?}", [1; T::BAR]); // this gives an error
}
}
fn main() {
Baz::<Fubar>::print_bar();
}
The compiler gives the following error:
error[E0599]: no associated item named `BAR` found for type `T` in the current scope
--> src/main.rs:24:30
|
24 | println!("{:?}", [1; T::BAR]); // this gives an error
| ^^^^^^ associated item not found in `T`
|
= help: items from traits can only be used if the trait is implemented and in scope
= note: the following trait defines an item `BAR`, perhaps you need to implement it:
candidate #1: `Foo`
Whatever the answer to my question, this is not a particularly good error message because it suggests that T does not implement Foo despite the latter being a trait bound. Only after burning a lot of time did it occur to me that in fact T::BAR is a perfectly valid expression in other contexts, just not as a length parameter to an array.
What are the rules that govern what kind of expressions can go there? Because arrays are Sized, I completely understand that the length has to be known at compile time. Coming from C++ myself, I would expect some restriction akin to constexpr but I have not come across that in the documentation where it just says
A fixed-size array, denoted [T; N], for the element type, T, and the non-negative compile-time constant size, N.
As of Rust 1.24.1, the array length basically needs to either be a numeric literal or a "regular" constant that is a usize. There's a small amount of constant evaluation that exists today, but it's more-or-less limited to basic math.
"a perfectly valid expression in other contexts, just not as a length parameter to an array"
Array lengths don't support generic parameters. (#43408)
"this is not a particularly good error message"
Error message should be improved for associated consts in array lengths (#44168)
"I would expect some restriction akin to constexpr"
This is essentially the restriction, the problem is that what is allowed to be used in a const is highly restricted at the moment. Notably, these aren't allowed:
functions (except to construct enums or structs)
loops
multiple statements / blocks
Work on good constant / compile-time evaluation is still ongoing. There is a large number of RFCs, issues, and PRs improving this. A sample:
Const fn tracking issue (RFC 911)
Allow locals and destructuring in const fn (RFC 2341)
Allow if and match in constants (RFC 2342)
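For readers on a newer toolchain, here is a hedged sketch of two ways around the limitation. The first uses a Vec, whose length is an ordinary runtime value, and works on old compilers too. The second relies on const generic parameters, which were stabilized in Rust 1.51, well after this answer was written. As far as I know, writing [1; T::BAR] directly with a generic T is still not accepted on stable. The function names ones_vec and ones_array are made up for the example.

```rust
trait Foo {
    const BAR: usize;
}

struct Fubar;

impl Foo for Fubar {
    const BAR: usize = 3;
}

// Runtime length: the associated const is used as a normal value,
// so a generic T is fine here.
fn ones_vec<T: Foo>() -> Vec<i32> {
    vec![1; T::BAR]
}

// Compile-time length: N is a const generic parameter (Rust 1.51+),
// which is allowed as an array length.
fn ones_array<const N: usize>() -> [i32; N] {
    [1; N]
}
```

At a call site with a concrete type, the associated const can be passed as the const argument, for example ones_array::<{ Fubar::BAR }>(). This works because no generic parameter is involved in the expression.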

How do I solve the "missing associated type `Err` value" error?

I'm trying to make a simple utility function that reads multiple elements from stdin, puts them in a collection and returns it. However, I'm stuck at this point. The compiler says missing associated type Err value. How do I make it work while keeping it as generic as possible?
While this function seems useless, it's for learning the language and its type system.
use std::io::{ stdin };
use std::str::FromStr;
use std::io::Read;
use std::iter::FromIterator;
pub fn read_all<C>() -> C
where C: FromIterator<FromStr<Err>>
{
let mut buff = String::new();
stdin().read_to_string(&mut buff).expect("read_to_string error");
buff.split_whitespace()
.filter_map(|w| w.parse().ok())
.collect()
}
Usage example:
let v: Vec<i32> = read_all();
Working code
The only thing you need to change in your code in order to make it compile is the type signature of the function:
pub fn read_all<C, F>() -> C
where F: FromStr,
C: FromIterator<F>
Explanation
Your code is almost correct, but there is a problem:
FromIterator<T> is a trait, but T is a type.
You use FromStr in the place of T, but FromStr is a trait, not a type.
To solve this, you need to get a type that implements FromStr. You can do this by adding a type parameter F to the function and constraining it with where F: FromStr. Then you can write FromIterator<F>.
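To make the corrected signature easy to try without typing into stdin, here is a sketch of the same function taking its input as a &str. The name parse_all and the input parameter are changes made for the example; the bounds are the ones from the working code above.

```rust
use std::iter::FromIterator;
use std::str::FromStr;

// Same bounds as read_all: F is the element type that can be parsed,
// C is any collection that can be built from an iterator of F.
pub fn parse_all<C, F>(input: &str) -> C
where
    F: FromStr,
    C: FromIterator<F>,
{
    input
        .split_whitespace()
        // Words that fail to parse are skipped, as in the question.
        .filter_map(|w| w.parse().ok())
        .collect()
}
```

The caller picks the collection with a type annotation, for example let v: Vec<i32> = parse_all("1 2 x 3"); which skips the unparsable "x". F is then inferred from the collection's FromIterator implementation.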
A note on associated types
Besides the issue of using a trait instead of a type, typing FromStr<Err> is wrong syntax. While in this case it is not necessary to specify the type of Err in the FromStr trait, you could do it as shown below:
pub fn read_all<C, F, E>() -> C
where F: FromStr<Err=E>,
C: FromIterator<F>
As you can see, instead of writing FromStr<E> we need to write FromStr<Err=E>. That is, you need to explicitly type the name of the associated type you are referring to.
A note on types vs traits
Usually traits cannot be treated as types. However, there are exceptions to this rule, as illustrated by the example below:
use std::fmt::Display;
pub fn print_box(thing: Box<Display>) {
println!("{}", thing)
}
fn main() { print_box(Box::new(42)); }
Here, you would expect T in Box<T> to be a type, but the Display trait is supplied instead. However, the compiler does not reject the program. The type checker sees Display as an unsized type, that is, the type of a value whose size is unknown at compile time (because it could be any type implementing Display). When T in Box<T> is a trait, the resulting type is usually referred to as a trait object. Current Rust spells this type Box<dyn Display>; the bare Box<Display> form is deprecated and is rejected in the 2021 edition. It is impossible to cover this topic in depth here, but the links I refer to are a good starting point in case you want to know more.
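As a small illustration of what a trait object buys you, the sketch below stores values of different concrete types in one Vec by boxing them as dyn Display. The function name describe is made up for the example.

```rust
use std::fmt::Display;

// Each element may have a different concrete type; all that is known
// statically is that it implements Display.
fn describe(things: &[Box<dyn Display>]) -> Vec<String> {
    things.iter().map(|t| t.to_string()).collect()
}
```

A caller can build the input as let things: Vec<Box<dyn Display>> = vec![Box::new(42), Box::new("hi")]; and pass &things. The integer and the string slice end up behind the same pointer type, and the right Display implementation is chosen at run time.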

Resources