Convert Vec<String> to std::rc::Rc<Vec<String>> in Rust

I have a function that receives a list of ids and then selects them from a database. I'm passing in a Vec, and I found this issue https://github.com/rusqlite/rusqlite/issues/430, which linked to https://github.com/rusqlite/rusqlite/blob/master/src/vtab/array.rs#L18, where it says // Note: A Rc<Vec<Value>> must be used as the parameter.
I cannot figure out how to convert this Vec to Rc<Vec> in a way that does not produce a compile error. I tried the following:
let values = std::rc::Rc::new(ids.into_iter().copied().map(String::from).collect::<Vec<String>>());
let values = std::rc::Rc::from(ids.into_iter().map(|item| item.to_string()).collect::<Vec<String>>());
let values = std::rc::Rc::from(&ids);
All 3 give the same error with some variation of this part: Vec<Rc<Vec<std::string::String>>>
the trait bound `Vec<Rc<Vec<std::string::String>>>: ToSql` is not satisfied
the following implementations were found: `<Vec<u8> as ToSql>`
required for the cast to the object type `dyn ToSql`
How can I convert this so it comes out as Rc<Vec> and not Vec<Rc<Vec>>?
My code is here:
fn table_data_to_table(ids: &Vec<String>) -> Vec<data::Item> {
    let db_connection = rusqlite::Connection::open("data.sqlite")
        .expect("Cannot connect to database.");
    let values = std::rc::Rc::new(ids.into_iter().copied().map(String::from).collect::<Vec<String>>());
    let mut statement = db_connection
        .prepare("select * from item where id in rarray(?);")
        .expect("Failed to prepare query.");
    let mut results = statement
        .query_map(rusqlite::params![vec![values]], |row| {
            Ok(database::ItemData {
                id: row.get(0)?,
                name: row.get(1)?,
                time_tp_prepare: row.get(2)?
            })
        });
    match results {
        Ok(rows) => {
            let collection: rusqlite::Result<Vec<database::ItemData>> = rows.collect();
            match collection {
                Ok(items) => {
                    items.iter().map(|item_data| data::Item {
                        id: item_data.id,
                        name: item_data.name,
                        time_to_prepare: item_data.time_tp_prepare
                    }).collect()
                },
                Err(_) => Vec::new(),
            }
        },
        Err(_) => Vec::new()
    }
}

Looking at the example you linked, your error is in not passing the Rc directly:
.query_map(rusqlite::params![vec![values]], |row| {
vs
.query_map([values], |row| {
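For completeness, here is a minimal sketch of the whole call with that fix applied. It assumes rusqlite is built with the array feature (which provides rarray() and the load_module registration); the table layout comes from the question, and items_by_ids plus the selected column are illustrative:

use std::rc::Rc;
use rusqlite::types::Value;

fn items_by_ids(ids: &[String]) -> rusqlite::Result<Vec<String>> {
    let db_connection = rusqlite::Connection::open("data.sqlite")?;
    // rarray() comes from the array virtual table module; load it once per connection.
    rusqlite::vtab::array::load_module(&db_connection)?;
    // Rc<Vec<Value>> is the parameter type rarray() expects.
    let values = Rc::new(ids.iter().cloned().map(Value::from).collect::<Vec<Value>>());
    let mut statement = db_connection.prepare("select name from item where id in rarray(?);")?;
    // Pass the Rc itself as the single bound parameter, not a Vec wrapping it.
    let rows = statement.query_map([values], |row| row.get::<_, String>(0))?;
    rows.collect()
}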

Related

Rust with Datafusion - Trying to Write DataFrame to Json

Repo with WIP code: https://github.com/jmelm93/rust-datafusion-csv-processing
I started programming with Rust 2 days ago and have been trying to resolve this since about 3 hours into trying out Rust...
Any help would be appreciated.
My goal is to write a Dataframe from Datafusion to JSON (which will eventually be used to respond to HTTP requests in an API with the JSON string).
The DataFrame turns into a "datafusion::arrow::record_batch::RecordBatch" when you collect the data, and this data type is what I'm having trouble converting.
I've tried -
Using json::writer::record_batches_to_json_rows from Arrow, but it fails with "struct datafusion::arrow::record_batch::RecordBatch and struct arrow::record_batch::RecordBatch have similar names, but are actually distinct types". I haven't been able to successfully convert the types to avoid this.
I tried turning the RecordBatch into a vec and pulling out the headers and the values individually. I was able to get the headers out, but haven't had success with the values.
let mut header = Vec::new();
// let mut rows = Vec::new();
for record_batch in data_vec {
    // get data
    println!("record_batch.columns: : {:?}", record_batch.columns());
    for col in record_batch.columns() {
        for row in 0..col.len() {
            // println!("Cow: {:?}", col);
            // println!("Row: {:?}", row);
            // let value = col.as_any().downcast_ref::<StringArray>().unwrap().value(row);
            // rows.push(value);
        }
    }
    // get headers
    for field in record_batch.schema().fields() {
        header.push(field.name().to_string());
    }
};
Anyone know how to accomplish this?
The full script is below:
// datafusion examples: https://github.com/apache/arrow-datafusion/tree/master/datafusion-examples/examples
// datafusion docs: https://arrow.apache.org/datafusion/
use datafusion::prelude::*;
use datafusion::arrow::datatypes::{Schema};
use arrow::json;
// use serde::{ Deserialize };
use serde_json::to_string;
use std::sync::Arc;
use std::str;
use std::fs;
use std::ops::Deref;

type DFResult = Result<Arc<DataFrame>, datafusion::error::DataFusionError>;

struct FinalObject {
    schema: Schema,
    // columns: Vec<Column>,
    num_rows: usize,
    num_columns: usize,
}

// to allow debug logging for FinalObject
impl std::fmt::Debug for FinalObject {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // write!(f, "FinalObject {{ schema: {:?}, columns: {:?}, num_rows: {:?}, num_columns: {:?} }}",
        write!(f, "FinalObject {{ schema: {:?}, num_rows: {:?}, num_columns: {:?} }}",
            // self.schema, self.columns, self.num_columns, self.num_rows)
            self.schema, self.num_columns, self.num_rows)
    }
}

fn create_or_delete_csv_file(path: String, content: Option<String>, operation: &str) {
    match operation {
        "create" => {
            match content {
                Some(c) => fs::write(path, c.as_bytes()).expect("Problem with writing file!"),
                None => println!("The content is None, no file will be created"),
            }
        }
        "delete" => {
            // Delete the csv file
            fs::remove_file(path).expect("Problem with deleting file!");
        }
        _ => println!("Invalid operation"),
    }
}

async fn read_csv_file_with_inferred_schema(file_name_string: String) -> DFResult {
    // create string csv data
    let csv_data_string = "heading,value\nbasic,1\ncsv,2\nhere,3".to_string();
    // Create a temporary file
    create_or_delete_csv_file(file_name_string.clone(), Some(csv_data_string), "create");
    // Create a session context
    let ctx = SessionContext::new();
    // Register a lazy DataFrame using the context
    let df = ctx.read_csv(file_name_string.clone(), CsvReadOptions::default()).await.expect("An error occurred while reading the CSV string");
    // return the dataframe
    Ok(Arc::new(df))
}

#[tokio::main]
async fn main() {
    let file_name_string = "temp_file.csv".to_string();
    let arc_csv_df = read_csv_file_with_inferred_schema(file_name_string.clone()).await.expect("An error occurred while reading the CSV string (funct: read_csv_file_with_inferred_schema)");
    // have to use ".clone()" each time I want to use this ref
    let deref_df = arc_csv_df.deref();
    // print to console
    deref_df.clone().show().await.expect("An error occurred while showing the CSV DataFrame");
    // collect to vec
    let record_batches = deref_df.clone().collect().await.expect("An error occurred while collecting the CSV DataFrame");
    // println!("Data: {:?}", data);
    // record_batches == <Vec<RecordBatch>>. Convert to RecordBatch
    let record_batch = record_batches[0].clone();
    // let json_string = to_string(&record_batch).unwrap();
    // let mut writer = datafusion::json::writer::RecordBatchJsonWriter::new(vec![]);
    // writer.write(&record_batch).unwrap();
    // let json_rows = writer.finish();
    let json_rows = json::writer::record_batches_to_json_rows(&[record_batch]);
    println!("JSON: {:?}", json_rows);
    // get final values from recordbatch
    // https://docs.rs/arrow/latest/arrow/record_batch/struct.RecordBatch.html
    // https://users.rust-lang.org/t/how-to-use-recordbatch-in-arrow-when-using-datafusion/70057/2
    // https://github.com/apache/arrow-rs/blob/6.5.0/arrow/src/util/pretty.rs
    // let record_batches_vec = record_batches.to_vec();
    let mut header = Vec::new();
    // let mut rows = Vec::new();
    for record_batch in data_vec {
        // get data
        println!("record_batch.columns: : {:?}", record_batch.columns());
        for col in record_batch.columns() {
            for row in 0..col.len() {
                // println!("Cow: {:?}", col);
                // println!("Row: {:?}", row);
                // let value = col.as_any().downcast_ref::<StringArray>().unwrap().value(row);
                // rows.push(value);
            }
        }
        // get headers
        for field in record_batch.schema().fields() {
            header.push(field.name().to_string());
        }
    };
    // println!("Header: {:?}", header);
    // Delete temp csv
    create_or_delete_csv_file(file_name_string.clone(), None, "delete");
}
I am not sure that Datafusion is the perfect place to convert a CSV string into a JSON string; however, here is a working version of your code:
#[tokio::main]
async fn main() {
    let file_name_string = "temp_file.csv".to_string();
    let csv_data_string = "heading,value\nbasic,1\ncsv,2\nhere,3".to_string();
    // Create a temporary file
    create_or_delete_csv_file(file_name_string.clone(), Some(csv_data_string), "create");
    // Create a session context
    let ctx = SessionContext::new();
    // Register the csv file
    ctx.register_csv("t1", &file_name_string, CsvReadOptions::new().has_header(false))
        .await.unwrap();
    let df = ctx.sql("SELECT * FROM t1").await.unwrap();
    // collect to vec
    let record_batches = df.collect().await.unwrap();
    // get json rows
    let json_rows = datafusion::arrow::json::writer::record_batches_to_json_rows(&record_batches[..]).unwrap();
    println!("JSON: {:?}", json_rows);
    // Delete temp csv
    create_or_delete_csv_file(file_name_string.clone(), None, "delete");
}
If you encounter arrow and datafusion struct conflicts, use datafusion::arrow instead of just the arrow library.
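For example, a sketch of the disambiguating imports (paths as used in the working version above):

// Take both RecordBatch and the JSON writer from the arrow that datafusion
// re-exports, so DataFrame::collect() and the writer agree on the type.
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::arrow::json::writer::record_batches_to_json_rows;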

How do I serialize Polars DataFrame Row/HashMap of `AnyValue` into JSON?

I have a row of a Polars dataframe created using iterators reading a parquet file, following this method: Iterate over rows polars rust
I have constructed a HashMap that represents an individual row and I would like to now convert that row into JSON.
This is what my code looks like so far:
use polars::prelude::*;
use std::iter::zip;
use std::{fs::File, collections::HashMap};

fn main() -> anyhow::Result<()> {
    let file = File::open("0.parquet").unwrap();
    let mut df = ParquetReader::new(file).finish()?;
    dbg!(df.schema());
    let fields = df.fields();
    let columns: Vec<&String> = fields.iter().map(|x| x.name()).collect();
    df.as_single_chunk_par();
    let mut iters = df.iter().map(|s| s.iter()).collect::<Vec<_>>();
    for _ in 0..df.height() {
        let mut row = HashMap::new();
        for (column, iter) in zip(&columns, &mut iters) {
            let value = iter.next().expect("should have as many iterations as rows");
            row.insert(column, value);
        }
        dbg!(&row);
        let json = serde_json::to_string(&row).unwrap();
        dbg!(json);
        break;
    }
    Ok(())
}
And I have the following feature flags enabled: ["parquet", "serde", "dtype-u8", "dtype-i8", "dtype-date", "dtype-datetime"].
I am running into the following error at the serde_json::to_string(&row).unwrap() line:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("the enum variant AnyValue::Datetime cannot be serialized", line: 0, column: 0)', src/main.rs:47:48
I am also unable to implement my own Serialize for AnyValue::Datetime, because only traits defined in the current crate can be implemented for types defined outside of the crate.
What's the best way to serialize this row into JSON?
I was able to resolve this error by using a match statement over value to change it from a Datetime to an Int64.
let value = match value {
    AnyValue::Datetime(value, TimeUnit::Milliseconds, _) => AnyValue::Int64(value),
    x => x
};
row.insert(column, value);
The root cause is that there is no arm for the Datetime variant in the impl Serialize block: https://docs.rs/polars-core/0.24.0/src/polars_core/datatypes/mod.rs.html#298
Although this code now works, it outputs data that looks like:
{"myintcolumn": {"Int64": 22342342343},
 "mylistoclumn": {"List": {"datatype": "Int32", "name": "", "values": []}},
 "mystrcolumn": {"Utf8": "lorem ipsum lorem ipsum"}}
So you will likely need to customize the serialization here regardless of the data type.
Update: If you want to get the JSON without all of the inner nesting, I had to do a gnarly match statement:
use polars::prelude::*;
use std::iter::zip;
use std::{fs::File, collections::HashMap};
use serde_json::json;

fn main() -> anyhow::Result<()> {
    let file = File::open("0.parquet").unwrap();
    let mut df = ParquetReader::new(file).finish()?;
    dbg!(df.schema());
    let fields = df.fields();
    let columns: Vec<&String> = fields.iter().map(|x| x.name()).collect();
    df.as_single_chunk_par();
    let mut iters = df.iter().map(|s| s.iter()).collect::<Vec<_>>();
    for _ in 0..df.height() {
        let mut row = HashMap::new();
        for (column, iter) in zip(&columns, &mut iters) {
            let value = iter.next().expect("should have as many iterations as rows");
            let value = match value {
                AnyValue::Null => json!(Option::<String>::None),
                AnyValue::Int64(val) => json!(val),
                AnyValue::Int32(val) => json!(val),
                AnyValue::Int8(val) => json!(val),
                AnyValue::Float32(val) => json!(val),
                AnyValue::Float64(val) => json!(val),
                AnyValue::Utf8(val) => json!(val),
                AnyValue::List(val) => {
                    match val.dtype() {
                        DataType::Int32 => { let vec: Vec<Option<_>> = val.i32().unwrap().into_iter().collect(); json!(vec) },
                        DataType::Float32 => { let vec: Vec<Option<_>> = val.f32().unwrap().into_iter().collect(); json!(vec) },
                        DataType::Utf8 => { let vec: Vec<Option<_>> = val.utf8().unwrap().into_iter().collect(); json!(vec) },
                        DataType::UInt8 => { let vec: Vec<Option<_>> = val.u8().unwrap().into_iter().collect(); json!(vec) },
                        x => panic!("unable to parse list column: {} with value: {} and type: {:?}", column, x, x.inner_dtype())
                    }
                },
                AnyValue::Datetime(val, TimeUnit::Milliseconds, _) => json!(val),
                x => panic!("unable to parse column: {} with value: {}", column, x)
            };
            row.insert(*column as &str, value);
        }
        let json = serde_json::to_string(&row).unwrap();
        dbg!(json);
        break;
    }
    Ok(())
}
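For reference, with this match in place the inner enum nesting from the earlier output disappears; the same row would now serialize along these lines (values hypothetical):
{"myintcolumn": 22342342343, "mylistoclumn": [], "mystrcolumn": "lorem ipsum lorem ipsum"}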

How to query a specific set of attributes from dynamoDB using rust language?

I am new to Rust and this question may come off as silly. I am trying to develop a lambda that reads a single item from DynamoDB given a key. The returned item needs to be shared back as a result to the calling lambda.
I want the response to be in JSON.
Here is what I have:
The Input Struct
#[derive(Deserialize, Clone)]
struct CustomEvent {
    #[serde(rename = "user_id")]
    user_id: String,
}
The Output Struct
#[derive(Serialize, Clone)]
struct CustomOutput {
    user_name: String,
    user_email: String,
}
The Main fn
#[tokio::main]
async fn main() -> std::result::Result<(), Error> {
    let func = handler_fn(get_user_details);
    lambda_runtime::run(func).await?;
    Ok(())
}
The logic to query
async fn get_user_details(
    e: CustomEvent,
    _c: Context,
) -> std::result::Result<CustomOutput, Error> {
    if e.user_id == "" {
        error!("User Id must be specified as user_id in the request");
    }
    let region_provider =
        RegionProviderChain::first_try(Region::new("ap-south-1")).or_default_provider();
    let shared_config = aws_config::from_env().region(region_provider).load().await;
    let client: Client = Client::new(&shared_config);
    let resp: () = query_user(&client, &e.user_id).await?;
    println!("{:?}", resp);
    Ok(CustomOutput {
        // Does not work
        // user_name: resp[0].user_name,
        // user_email: resp[0].user_email,
        // Works because it is hardcoded
        user_name: "hello".to_string(),
        user_email: "world#gmail.com".to_string()
    })
}
async fn query_user(
    client: &Client,
    user_id: &str,
) -> Result<(), Error> {
    let user_id_av = AttributeValue::S(user_id.to_string());
    let resp = client
        .query()
        .table_name("users")
        .key_condition_expression("#key = :value".to_string())
        .expression_attribute_names("#key".to_string(), "id".to_string())
        .expression_attribute_values(":value".to_string(), user_id_av)
        .projection_expression("user_email")
        .send()
        .await?;
    println!("{:?}", resp.items.unwrap_or_default()[0]);
    return Ok(resp.items.unwrap_or_default().pop().as_ref());
}
My TOML
[dependencies]
lambda_runtime = "^0.4"
serde = "^1"
serde_json = "^1"
serde_derive = "^1"
http = "0.2.5"
rand = "0.8.3"
tokio-stream = "0.1.8"
structopt = "0.3"
aws-config = "0.12.0"
aws-sdk-dynamodb = "0.12.0"
log = "^0.4"
simple_logger = "^1"
tokio = { version = "1.5.0", features = ["full"] }
I am unable to unwrap and send the response back to the calling lambda. From the query_user function, I want to be able to return a constructed CustomOutput struct to this
Ok(CustomOutput {
    // user_name: resp[0].user_name,
    // user_email: resp[0].user_email,
})
block in get_user_details. Any help or references would help a lot. Thank you.
After several attempts, here is what I learnt:
The match keyword can be used instead of collecting the result in a variable.
I did this:
match client
    .query()
    .table_name("users")
    .key_condition_expression("#key = :value".to_string())
    .expression_attribute_names("#key".to_string(), "id".to_string())
    .expression_attribute_values(":value".to_string(), user_id_av)
    .projection_expression("user_email")
    .send()
    .await
{
    Ok(resp) => Ok(resp.items),
    Err(e) => Err(e),
}
When a response comes back from the DB, it will have an items key-value inside it. So this line:
Ok(resp) => Ok(resp.items)
will ensure that the items array is returned to the calling function.
Next, to get the values one by one out of the Hashmap and load them into CustomOutput, this is what I did:
let resp: std::option::Option<Vec<HashMap<std::string::String, AttributeValue>>> = query_user(&client, &e.user_id).await?;
Once I have the resp, I can drill down to the first element if I need to, like this:
x[0]
    .get("user_name")
    .unwrap()
    .as_s()
    .unwrap()
    .to_string()
and for Number types maybe something like "battery_voltage":
x[0]
    .get("battery_voltage")
    .unwrap()
    .as_n()
    .unwrap()
    .to_string()
    .parse::<f32>()
    .unwrap(),
Finally, use a match block to determine the Some or None for the data:
match _val {
    Some(x) => {
        // pattern
        if x.len() > 0 {
            return Ok(json!(CustomOutput {
                user_name: x[0].get("user_name").unwrap().as_s().unwrap().to_string(),
                user_email: x[0].get("user_email").unwrap().as_s().unwrap().to_string(),
            }));
        } else {
            return Ok(json!({
                "code": "404".to_string(),
                "message": "Not found.".to_string(),
            }));
        }
    }
    None => {
        // other pattern
        println!("Got nothing");
        return Ok(json!({
            "code": "404".to_string(),
            "message": "Not found.".to_string(),
        }));
    }
}
Hope this helps someone else!
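Putting the pieces together, here is a consolidated sketch of the reworked query_user. It assumes aws-sdk-dynamodb 0.12 as in the TOML above, and that Error is the lambda_runtime error alias used throughout (the SDK error is converted with .into()):

use std::collections::HashMap;
use aws_sdk_dynamodb::{model::AttributeValue, Client};

async fn query_user(
    client: &Client,
    user_id: &str,
) -> Result<Option<Vec<HashMap<String, AttributeValue>>>, Error> {
    let user_id_av = AttributeValue::S(user_id.to_string());
    match client
        .query()
        .table_name("users")
        .key_condition_expression("#key = :value")
        .expression_attribute_names("#key", "id")
        .expression_attribute_values(":value", user_id_av)
        .projection_expression("user_name, user_email")
        .send()
        .await
    {
        // hand the raw items back so the caller can build CustomOutput from them
        Ok(resp) => Ok(resp.items),
        Err(e) => Err(e.into()),
    }
}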

Borrowing the mutable member used inside the loop

The problem I want to solve is:
Given a recursively nested data structure, e.g. a JSON tree, and a path pointing to a (possibly non-existent) element inside it, return a mutable reference to the element that's closest to the given path.
Example: if we have a JSON document of the form { a: { b: { c: "foo" } } } and a path a.b.d, we want a mutable pointer to the value stored under key "b".
This is the code snippet I've got so far:
use std::collections::HashMap;

enum Json {
    Number(i64),
    Bool(bool),
    String(String),
    Array(Vec<Json>),
    Object(HashMap<String, Json>)
}

struct Pointer<'a, 'b> {
    value: &'a mut Json,
    path: Vec<&'b str>,
    position: usize
}
/// Return a mutable pointer to the JSON element sharing
/// the nearest common path with the provided path.
fn nearest_mut<'a, 'b>(obj: &'a mut Json, path: Vec<&'b str>) -> Pointer<'a, 'b> {
    let mut i = 0;
    let mut current = obj;
    for &key in path.iter() {
        match current {
            Json::Array(array) => {
                match key.parse::<usize>() {
                    Ok(index) => {
                        match array.get_mut(index) {
                            Some(inner) => current = inner,
                            None => break,
                        }
                    },
                    _ => break,
                }
            },
            Json::Object(map) => {
                match map.get_mut(key) {
                    Some(inner) => current = inner,
                    None => break
                }
            },
            _ => break,
        };
        i += 1;
    }
    Pointer { path, position: i, value: current }
}
The problem is that this doesn't pass Rust's borrow checker: current is borrowed as a mutable reference twice, once inside the match statement and once at the end of the function, when constructing the Pointer.
I've tried different approaches, but haven't figured out how to achieve the goal (short of going the unsafe route).
I completely misread your question and I owe you an apology.
You cannot do it in one pass - you're going to need to do a read-only pass to find the nearest path (or exact path), and then a read-write pass to actually extract the reference, or pass a mutator function in the form of a closure.
I've implemented the two-pass method for you. Do note that it is still pretty performant:
fn nearest_mut<'a, 'b>(obj: &'a mut Json, path: Vec<&'b str>) -> Pointer<'a, 'b> {
    let valid_path = nearest_path(obj, path);
    exact_mut(obj, valid_path).unwrap()
}

fn exact_mut<'a, 'b>(obj: &'a mut Json, path: Vec<&'b str>) -> Option<Pointer<'a, 'b>> {
    let mut i = 0;
    let mut target = obj;
    for token in path.iter() {
        i += 1;
        // borrow checker gets confused about `target` being mutably borrowed too many times because of the loop
        // this once-per-loop binding makes the scope clearer and circumvents the error
        let target_once = target;
        let target_opt = match *target_once {
            Json::Object(ref mut map) => map.get_mut(*token),
            Json::Array(ref mut list) => match token.parse::<usize>() {
                Ok(t) => list.get_mut(t),
                Err(_) => None,
            },
            _ => None,
        };
        if let Some(t) = target_opt {
            target = t;
        } else {
            return None;
        }
    }
    Some(Pointer {
        path,
        position: i,
        value: target,
    })
}
/// Return the longest prefix of `path` that actually exists
/// in the provided JSON, i.e. the nearest valid path.
fn nearest_path<'a, 'b>(obj: &'a Json, path: Vec<&'b str>) -> Vec<&'b str> {
    let mut target = obj;
    let mut valid_paths = vec![];
    for token in path.iter() {
        let target_opt = match *target {
            Json::Object(ref map) => map.get(*token),
            Json::Array(ref list) => match token.parse::<usize>() {
                Ok(t) => list.get(t),
                Err(_) => None,
            },
            _ => None,
        };
        if let Some(t) = target_opt {
            target = t;
            valid_paths.push(*token)
        } else {
            return valid_paths;
        }
    }
    valid_paths
}
The principle is simple - I reused the method I wrote in my initial question in order to get the nearest valid path (or exact path).
From there, I feed that straight into the function that I had in my original answer, and since I am certain the path is valid (from the prior function call) I can safely unwrap() :-)
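To illustrate, a hypothetical usage sketch against the Json enum from the question:

fn main() {
    use std::collections::HashMap;
    // Build { a: { b: { c: "foo" } } }
    let inner = Json::Object(HashMap::from([("c".to_string(), Json::String("foo".into()))]));
    let middle = Json::Object(HashMap::from([("b".to_string(), inner)]));
    let mut doc = Json::Object(HashMap::from([("a".to_string(), middle)]));
    // "a.b.d" does not fully exist, so the pointer stops at the value under "b".
    let ptr = nearest_mut(&mut doc, vec!["a", "b", "d"]);
    assert_eq!(ptr.position, 2);
    // The returned reference is mutable, so the nearest element can be replaced in place.
    *ptr.value = Json::Bool(true);
}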

pattern matching borrowed content issue

So I have this piece of code that reads input.csv, inserts a column in it and writes it to output.csv
extern crate csv;

use std::path::Path;

struct User {
    reference: String,
    email: String,
    firstname: String,
    lastname: String
}

fn main() {
    let mut rdr = csv::Reader::from_file("/tmp/input.csv").unwrap().has_headers(false);
    let mut wtr = csv::Writer::from_file(Path::new("/tmp/output.csv")).unwrap();
    let users = get_users();
    for record in rdr.decode() {
        let rec: Option<Vec<String>> = match record {
            Ok(rec) => Some(rec),
            Err(e) => None
        };
        match rec {
            Some(mut r) => {
                let usr = users.iter().find(|&ur| ur.reference == r[27].to_string());
                match usr {
                    Some(u) => r.insert(1, u.email),
                    None => r.insert(1, "Unknown".to_string())
                }
                wtr.write(r.iter());
            }
            None => {}
        };
    }
}

fn get_users() -> Vec<User> {
    //retrieve users
}
and it's giving me an error:
error: cannot move out of borrowed content
Some(u) => r.insert(1, u.email),
^
So I understand it's getting upset about u.email, because r is trying to take ownership of it(?), but how to best handle such a situation?
Here is a slightly simplified portion of your code which demonstrates the problem:
struct User {
    reference: String,
    email: String
}

let users = vec![
    User { reference: "1".into(), email: "a#a.com".into() },
    User { reference: "2".into(), email: "b#b.com".into() }
];

let records: Vec<Vec<String>> = vec![
    vec!["1".into()],
    vec!["2".into()],
    vec!["3".into()]
];

for mut record in records {
    let usr = users.iter().find(|ur| ur.reference == record[0]);
    match usr {
        Some(u) => record.insert(1, u.email),
        None => record.insert(1, "Unknown".into())
    }
    // do whatever with record
}
usr in let usr here is of type &User, not User, because iter() returns an iterator satisfying Iterator<Item=&User>, and hence find() returns Option<&User>. Consequently, you cannot take a String out of u: &User - you can't move out of a borrowed reference. This is, BTW, an absolutely correct error - if this was allowed, your code would break in a situation with multiple records corresponding to the same user (it would require moving the email out of the same user multiple times).
The most natural way here would be just to use clone():
record.insert(1, u.email.clone())
It would create a copy of the email string contained in the found User, exactly what you need.
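And a sketch of the simplified loop from above with that fix applied:

for mut record in records {
    let usr = users.iter().find(|ur| ur.reference == record[0]);
    match usr {
        // clone() copies the String instead of moving it out of the borrowed User
        Some(u) => record.insert(1, u.email.clone()),
        None => record.insert(1, "Unknown".into())
    }
    // record now owns its copy of the email, and `users` stays intact for later rows
}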
