How would I create a time series Array from a CSV using ndarray?
I have this CSV:
date,value
1959-07-02,0.2930
1959-07-06,0.2910
1959-07-07,0.2820
1959-07-08,0.2846
1959-07-09,0.2760
1959-07-10,0.2757
That I'd like to plot using plotly-rs with ndarray support. I deserialized the CSV successfully, but now I want to know how I can create two Array objects: one with the dates as NaiveDate (or String, as I'm not sure that plotly-rs supports NaiveDate natively), and another with the values as f64. Below is my deserialization code:
use chrono::NaiveDate;
use serde::{de, Deserialize, Deserializer};

#[derive(Deserialize)]
struct Record {
    #[serde(deserialize_with = "naive_date_time_from_str")]
    date: NaiveDate,
    value: f64,
}

fn naive_date_time_from_str<'de, D>(deserializer: D) -> Result<NaiveDate, D::Error>
where
    D: Deserializer<'de>,
{
    let s: String = Deserialize::deserialize(deserializer)?;
    NaiveDate::parse_from_str(&s, "%Y-%m-%d").map_err(de::Error::custom)
}
And I can iterate through the CSV like this:
use csv::ReaderBuilder;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let mut reader = ReaderBuilder::new()
        .has_headers(true)
        .delimiter(b',')
        .from_path("./data/timeseries.csv")?;
    for record in reader.deserialize::<Record>() {
        let record: Record = record?;
        println!(
            "date = {}, value = {}",
            record.date.format("%Y-%m-%d"),
            record.value
        );
    }
    Ok(())
}
But now I'm stuck at creating the two ndarray Array objects. Any hints?
EDIT: A somewhat similar approach is taken in this topic (but without using ndarray): How to push data from a csv::StringRecord to each column vector in a struct?
You can read the CSV data and plot a chart directly, without an additional ndarray step.
use csv::Error;
use plotly::{Plot, Scatter};

fn main() -> Result<(), Error> {
    let csv = "date,value
1959-07-02,0.2930
1959-07-06,0.2910
1959-07-07,0.2820
1959-07-08,0.2846
1959-07-09,0.2760
1959-07-10,0.2757";
    let mut reader = csv::Reader::from_reader(csv.as_bytes());
    let mut date = vec![];
    let mut data = vec![];
    for record in reader.records() {
        let record = record?;
        date.push(record[0].to_string());
        data.push(record[1].to_string());
    }
    let trace = Scatter::new(date, data);
    let mut plot = Plot::new();
    plot.add_trace(trace);
    plot.show();
    Ok(())
}
I am lost with mutable references. I am trying to send a DataFrame into a function, change it, and see the changes after the function call completes. I get this error:
cannot borrow as mutable
Here is a code sample:
use polars::prelude::*;
use std::ops::DerefMut;

fn main() {
    let mut days = df!(
        "date_string" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05",
                           "1900-01-06", "1900-01-07", "1900-01-09", "1900-01-10"])
    .unwrap();
    change(&mut days);
    println!("{:?}", days);
}

fn change(days: &mut DataFrame) {
    days.column("date_string").unwrap().rename("DATE-STRING");
}
The signature of column is
fn column(&self, name: &str) -> Result<&Series, PolarsError>
It returns a shared reference to a column. DataFrame has its own rename method that you should use:
use polars::df;
use polars::prelude::*;

fn main() {
    let mut days = df!(
        "date_string" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05",
                           "1900-01-06", "1900-01-07", "1900-01-09", "1900-01-10"])
    .unwrap();
    change(&mut days).unwrap();
    assert_eq!(days.get_column_names(), &["DATE-STRING"]);
}

fn change(days: &mut DataFrame) -> Result<&mut DataFrame> {
    days.rename("date_string", "DATE-STRING")
}
I want to deserialize legion's World but don't know what type I should use. This is my deserializing function:
pub fn deserialize(path: &str) -> World {
    let registry = get_registry();
    let data_raw = pollster::block_on(load_string(path)).expect("Unable to load file");
    let mut deserializer = ron::from_str(data_raw.as_str()).expect("Unable to deserialize the file");
    let entity_serializer = Canon::default();
    registry.as_deserialize(&entity_serializer).deserialize(&mut deserializer).unwrap()
}
As you can see, the deserializer has no type.
This might not help, but this is the serialization function that I implemented:
pub fn serialize(world: &World, path: &str) {
    let registry = get_registry();
    let entity_serializer = Canon::default();
    let serializable = world.as_serializable(any(), &registry, &entity_serializer);
    let ron = ron::to_string(&serializable).expect("Cannot Serialize World!");
    let mut file = File::create(path).expect("Unable to create file");
    file.write_all(ron.as_bytes()).expect("Unable to write it to the file");
}
I'm using serde and ron.
The deserialize method in question comes from the DeserializeSeed trait, so its argument has to be something implementing Deserializer. In the case of ron, the type to use is ron::Deserializer (&mut ron::Deserializer, to be precise), which can be created with Deserializer::from_str.
Therefore, this code should work:
pub fn deserialize(path: &str) -> World {
    let registry = get_registry();
    let data_raw = pollster::block_on(load_string(path)).expect("Unable to load file");
    let mut deserializer = ron::Deserializer::from_str(data_raw.as_str()).expect("Unable to deserialize the file");
    let entity_serializer = Canon::default();
    registry
        .as_deserialize(&entity_serializer)
        .deserialize(&mut deserializer)
        .unwrap()
}
In Rust Polars, how to cast a Series or ChunkedArray to a Vec?
You can collect the values into a Vec.
use polars::prelude::*;

fn main() -> Result<()> {
    let s = Series::new("a", 0..10i32);
    let as_vec: Vec<Option<i32>> = s.i32()?.into_iter().collect();
    // if we are certain we don't have missing values
    let as_vec: Vec<i32> = s.i32()?.into_no_null_iter().collect();
    Ok(())
}
I need to put some futures in a Vec for later joining. However, if I try to collect it using an iterator, the compiler doesn't seem to be able to determine the type of the vector.
I'm trying to create a command line utility that accepts an arbitrary number of IP addresses, communicates with those remotes and collects the results for printing. The communication function works well, I've cut down the program to show the failure I need to understand.
use futures::future::join_all;
use itertools::Itertools;
use std::fmt;
use std::net::SocketAddr;
use std::str::from_utf8;

#[tokio::main(flavor = "current_thread")]
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let socket: Vec<SocketAddr> = vec![
        "192.168.20.33:502".parse().unwrap(),
        "192.168.20.34:502".parse().unwrap(),
    ];
    let async_vec = vec![
        MyStruct::get(socket[0]),
        MyStruct::get(socket[1]),
    ];
    // The above 3 lines happen to work to build a Vec because there are
    // 2 sockets. But I need to build a Vec to join_all from an arbitrary
    // number of addresses. Why doesn't the line below work instead?
    //let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect();
    let rt = join_all(async_vec).await;
    let results = rt.iter().map(|x| x.as_ref().unwrap().to_string()).join("\n");
    let mut rvec: Vec<String> = results.split('\n').map(|x| x.to_string()).collect();
    rvec.sort_by(|a, b| a[15..20].cmp(&b[15..20]));
    println!("{}", rvec.join("\n"));
    Ok(())
}

struct MyStruct {
    serial: [u8; 12],
    placeholder: String,
}

impl fmt::Display for MyStruct {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let serial = match from_utf8(&self.serial) {
            Ok(v) => v,
            Err(_) => "(invalid)",
        };
        let lines = (1..4)
            .map(|x| format!("{}, line{}, {}", serial, x, self.placeholder))
            .join("\n");
        write!(f, "{}", lines)
    }
}

impl MyStruct {
    pub async fn get(sockaddr: SocketAddr) -> Result<MyStruct, Box<dyn std::error::Error>> {
        let char = sockaddr.ip().to_string().chars().last().unwrap();
        let rv = MyStruct {
            serial: [char as u8; 12],
            placeholder: sockaddr.to_string(),
        };
        Ok(rv)
    }
}
This line:
let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect();
doesn't work because the compiler can't know that you want to collect everything into a Vec. You might want to collect into some other container (e.g. a linked list or a set). Therefore you need to tell the compiler the kind of container you want with:
let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect::<Vec<_>>();
or:
let async_vec: Vec<_> = socket.iter().map(|x| MyStruct::get(*x)).collect();
I'm pretty new to Rust and trying to implement some kind of database. Users should create tables by giving a table name, a vector of column names, and a vector of column types (realized via an enum). Filling tables is done by specifying CSV files. However, this requires the structure of the table rows to be specified at compile time, as shown in this basic example:
use std::error::Error;
use std::fs;
use csv::ReaderBuilder;
use serde::Deserialize;

#[derive(Debug, Deserialize, Eq, PartialEq)]
struct Row {
    key: u32,
    name: String,
    comment: String,
}

fn read_from_file(path: &str) -> Result<(), Box<dyn Error>> {
    let data = fs::read_to_string(path).expect("Unable to read file");
    let mut rdr = ReaderBuilder::new()
        .has_headers(false)
        .delimiter(b'|')
        .from_reader(data.as_bytes());
    let mut iter = rdr.deserialize();
    if let Some(result) = iter.next() {
        let record: Row = result?;
        println!("{:?}", record);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}
Is there a possibility to use the generic table information instead of the "Row"-struct to cast the results from the deserialization? Is it possible to simply allocate memory according to the combined sizes of the column types and parse the records in? I would do something like this in C...
Is there a possibility to use the generic table information instead of the "Row"-struct to cast the results from the deserialization?
All generics are replaced with concrete types at compile time. If you do not know the types you need until runtime, generics are not what you want.
Is it possible to simply allocate memory according to the combined sizes of the column types and parse the records in? I would do something like this in C...
I suggest using Box<dyn Any> instead, to be able to store a value of any type and still know what type it is.
The maintenance cost of this approach is pretty high: you have to handle each possible value type everywhere you want to use a cell's value. On the other hand, you do not need to parse the value each time, just make some type checks at runtime.
I have used std::any::TypeId to identify types, but it cannot be used in match expressions. You could consider using a custom enum as the type identifier instead.
use std::any::{Any, TypeId};
use std::io::Read;
use csv::Reader;

#[derive(Default)]
struct Table {
    name: String,
    headers: Vec<(String, TypeId)>,
    data: Vec<Vec<Box<dyn Any>>>,
}

impl Table {
    fn add_header(&mut self, header: String, _type: TypeId) {
        self.headers.push((header, _type));
    }

    fn populate_data<R: Read>(
        &mut self,
        rdr: &mut Reader<R>,
    ) -> Result<(), Box<dyn std::error::Error>> {
        for record in rdr.records() {
            let record = record?;
            let mut row: Vec<Box<dyn Any>> = vec![];
            for (&(_, type_id), value) in self.headers.iter().zip(record.iter()) {
                if type_id == TypeId::of::<u32>() {
                    row.push(Box::new(value.parse::<u32>()?));
                } else if type_id == TypeId::of::<String>() {
                    row.push(Box::new(value.to_owned()));
                }
            }
            self.data.push(row);
        }
        Ok(())
    }
}

impl std::fmt::Display for Table {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        writeln!(f, "Table: {}", self.name)?;
        for (name, _) in self.headers.iter() {
            write!(f, "{}, ", name)?;
        }
        writeln!(f)?;
        for row in self.data.iter() {
            for cell in row.iter() {
                if let Some(&value) = cell.downcast_ref::<u32>() {
                    write!(f, "{}, ", value)?;
                } else if let Some(value) = cell.downcast_ref::<String>() {
                    write!(f, "{}, ", value)?;
                }
            }
            writeln!(f)?;
        }
        Ok(())
    }
}

fn main() {
    let mut table: Table = Default::default();
    table.name = "Foo".to_owned();
    table.add_header("key".to_owned(), TypeId::of::<u32>());
    table.add_header("name".to_owned(), TypeId::of::<String>());
    table.add_header("comment".to_owned(), TypeId::of::<String>());
    let data = "\
key,name,comment
1,foo,foo comment
2,bar,bar comment
";
    let mut rdr = Reader::from_reader(data.as_bytes());
    table.populate_data(&mut rdr).unwrap();
    print!("{}", table);
}