How to change a Datetime series to a Date series? - rust

let datetime = frame.column("datetime_nano")?.cast(&DataType::Datetime(TimeUnit::Nanoseconds, None))?;
let date = datetime.cast(&DataType::Date)?;
let time = datetime.cast(&DataType::Time)?;
println!("{}", datetime);
println!("{}", date);
The date part shown for date does not match the date part shown for datetime. Is there any way to get the correct Date series?
I have looked at the datetime.date() function, but it only handles the case dtype == DataType::Date.
#[cfg(feature = "dtype-date")]
pub fn date(&self) -> PolarsResult<&DateChunked> {
match self.dtype() {
DataType::Date => unsafe {
Ok(&*(self.as_ref() as *const dyn SeriesTrait as *const DateChunked))
},
dt => Err(PolarsError::SchemaMisMatch(
format!("Series of dtype: {dt:?} != Date").into(),
)),
}
}

The polars tests are a good place to look for code examples.
It looks like you're asking about .strptime which will create a Datetime from a string.
Once you have a Datetime - you can .cast to Date / Time objects.
use polars::prelude::*;
fn main() -> PolarsResult<()> {
let frame = df!("datetime_nano" => ["2019-03-22T14:00:01.700311864"])?
.lazy()
.with_column(col("datetime_nano").str().strptime(StrpTimeOptions {
date_dtype: DataType::Datetime(TimeUnit::Nanoseconds, None),
..Default::default()
}))
.collect()?;
let datetime = frame.column("datetime_nano")?;
let date = datetime.cast(&DataType::Date)?;
let time = datetime.cast(&DataType::Time)?;
println!("{:?}", frame);
println!("{:?}", datetime);
println!("{:?}", date);
println!("{:?}", time);
Ok(())
}
shape: (1, 1)
┌───────────────────────────────┐
│ datetime_nano │
│ --- │
│ datetime[ns] │
╞═══════════════════════════════╡
│ 2019-03-22 14:00:01.700311864 │
└───────────────────────────────┘
shape: (1,)
Series: 'datetime_nano' [datetime[ns]]
[
2019-03-22 14:00:01.700311864
]
shape: (1,)
Series: 'datetime_nano' [date]
[
2019-03-22
]
shape: (1,)
Series: 'datetime_nano' [time]
[
14:00:01.700311864
]
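As a rough mental model of what those two casts do (a sketch, not Polars code): a datetime[ns] value is a 64-bit count of nanoseconds since the Unix epoch, a Date is a 32-bit count of days since the epoch, and a Time is nanoseconds since midnight. The helper name split_datetime_ns is made up for illustration, and the Euclidean rounding for pre-1970 values is an assumption rather than something confirmed for Polars. If the source integers are really in milliseconds or microseconds, labelling them as nanoseconds will yield a date close to 1970, which is one possible (unconfirmed) cause of a mismatch like the one in the question.

```rust
/// Nanoseconds in one day.
const NS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Split nanoseconds since the Unix epoch into
/// (days since epoch, nanoseconds since midnight).
/// Euclidean division keeps pre-1970 timestamps on the earlier day.
fn split_datetime_ns(ns: i64) -> (i32, i64) {
    let days = ns.div_euclid(NS_PER_DAY) as i32;
    let time_ns = ns.rem_euclid(NS_PER_DAY);
    (days, time_ns)
}

fn main() {
    // 2019-03-22T14:00:01.700311864 expressed as nanoseconds since the epoch.
    let ns: i64 = 1_553_263_201_700_311_864;
    let (days, time_ns) = split_datetime_ns(ns);
    println!("days since epoch:  {days}");
    println!("ns since midnight: {time_ns}");
}
```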

Related

How do I serialize Polars DataFrame Row/HashMap of `AnyValue` into JSON?

I have a row of a polars dataframe created using iterators reading a parquet file from this method: Iterate over rows polars rust
I have constructed a HashMap that represents an individual row and I would like to now convert that row into JSON.
This is what my code looks like so far:
use polars::prelude::*;
use std::iter::zip;
use std::{fs::File, collections::HashMap};
fn main() -> anyhow::Result<()> {
let file = File::open("0.parquet").unwrap();
let mut df = ParquetReader::new(file).finish()?;
dbg!(df.schema());
let fields = df.fields();
let columns: Vec<&String> = fields.iter().map(|x| x.name()).collect();
df.as_single_chunk_par();
let mut iters = df.iter().map(|s| s.iter()).collect::<Vec<_>>();
for _ in 0..df.height() {
let mut row = HashMap::new();
for (column, iter) in zip(&columns, &mut iters) {
let value = iter.next().expect("should have as many iterations as rows");
row.insert(column, value);
}
dbg!(&row);
let json = serde_json::to_string(&row).unwrap();
dbg!(json);
break;
}
Ok(())
}
And I have the following feature flags enabled: ["parquet", "serde", "dtype-u8", "dtype-i8", "dtype-date", "dtype-datetime"].
I am running into the following error at the serde_json::to_string(&row).unwrap() line:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("the enum variant AnyValue::Datetime cannot be serialized", line: 0, column: 0)', src/main.rs:47:48
I am also unable to implement my own serializer for AnyValue::Datetime, because only traits defined in the current crate can be implemented for types defined outside of the crate.
What's the best way to serialize this row into JSON?
I was able to resolve this error by using a match statement over value to change it from a Datetime to an Int64.
let value = match value {
AnyValue::Datetime(value, TimeUnit::Milliseconds, _) => AnyValue::Int64(value),
x => x
};
row.insert(column, value);
The root cause is that the impl Serialize block does not handle the Datetime variant: https://docs.rs/polars-core/0.24.0/src/polars_core/datatypes/mod.rs.html#298
Although this code now works, it outputs data that looks like:
{'myintcolumn': {'Int64': 22342342343},
'mylistoclumn': {'List': {'datatype': 'Int32', 'name': '', 'values': []}},
'mystrcolumn': {'Utf8': 'lorem ipsum lorem ipsum'}}
So you will likely need to customize the serialization here regardless of the data type.
Update: To get the JSON without all of the inner nesting, I had to write a gnarly match statement:
use polars::prelude::*;
use std::iter::zip;
use std::{fs::File, collections::HashMap};
use serde_json::json;
fn main() -> anyhow::Result<()> {
let file = File::open("0.parquet").unwrap();
let mut df = ParquetReader::new(file).finish()?;
dbg!(df.schema());
let fields = df.fields();
let columns: Vec<&String> = fields.iter().map(|x| x.name()).collect();
df.as_single_chunk_par();
let mut iters = df.iter().map(|s| s.iter()).collect::<Vec<_>>();
for _ in 0..df.height() {
let mut row = HashMap::new();
for (column, iter) in zip(&columns, &mut iters) {
let value = iter.next().expect("should have as many iterations as rows");
let value = match value {
AnyValue::Null => json!(Option::<String>::None),
AnyValue::Int64(val) => json!(val),
AnyValue::Int32(val) => json!(val),
AnyValue::Int8(val) => json!(val),
AnyValue::Float32(val) => json!(val),
AnyValue::Float64(val) => json!(val),
AnyValue::Utf8(val) => json!(val),
AnyValue::List(val) => {
match val.dtype() {
DataType::Int32 => ({let vec: Vec<Option<_>> = val.i32().unwrap().into_iter().collect(); json!(vec)}),
DataType::Float32 => ({let vec: Vec<Option<_>> = val.f32().unwrap().into_iter().collect(); json!(vec)}),
DataType::Utf8 => ({let vec: Vec<Option<_>> = val.utf8().unwrap().into_iter().collect(); json!(vec)}),
DataType::UInt8 => ({let vec: Vec<Option<_>> = val.u8().unwrap().into_iter().collect(); json!(vec)}),
x => panic!("unable to parse list column: {} with value: {} and type: {:?}", column, x, x.inner_dtype())
}
},
AnyValue::Datetime(val, TimeUnit::Milliseconds, _) => json!(val),
x => panic!("unable to parse column: {} with value: {}", column, x)
};
row.insert(*column as &str, value);
}
let json = serde_json::to_string(&row).unwrap();
dbg!(json);
break;
}
Ok(())
}
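The idea behind that match can be shown without Polars or serde_json at all. The sketch below is only an illustration: Cell is a hypothetical stand-in for a few AnyValue variants, and quote, to_json and row_to_json are made-up helpers. Each variant is rendered as a bare JSON value, so the variant name never shows up in the output.

```rust
/// Hypothetical stand-in for a handful of `AnyValue` variants.
enum Cell {
    Null,
    Int(i64),
    Float(f64),
    Text(String),
    DatetimeMs(i64),
    List(Vec<Cell>),
}

/// Quote a string as a JSON string literal, escaping what JSON requires.
fn quote(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render a cell as a bare JSON value, without a wrapping variant name.
fn to_json(cell: &Cell) -> String {
    match cell {
        Cell::Null => "null".to_string(),
        // A datetime is emitted as its raw millisecond count, as in the answer above.
        Cell::Int(v) | Cell::DatetimeMs(v) => v.to_string(),
        Cell::Float(v) if v.is_finite() => v.to_string(),
        // JSON has no NaN or infinity, so fall back to null.
        Cell::Float(_) => "null".to_string(),
        Cell::Text(s) => quote(s),
        Cell::List(items) => {
            let parts: Vec<String> = items.iter().map(to_json).collect();
            format!("[{}]", parts.join(","))
        }
    }
}

/// Render one row (column name, value) as a JSON object, keeping column order.
fn row_to_json(row: &[(&str, Cell)]) -> String {
    let fields: Vec<String> = row
        .iter()
        .map(|(name, cell)| format!("{}:{}", quote(name), to_json(cell)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

fn main() {
    let row = vec![
        ("myintcolumn", Cell::Int(22_342_342_343)),
        ("myfloatcolumn", Cell::Float(0.5)),
        ("created", Cell::DatetimeMs(1_553_263_201_700)),
        ("mylistcolumn", Cell::List(vec![Cell::Int(1), Cell::Null])),
        ("mystrcolumn", Cell::Text("lorem ipsum".to_string())),
    ];
    println!("{}", row_to_json(&row));
}
```

In real code the json! macro does the rendering and escaping for you; the point here is only the shape of the mapping, one arm per variant.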

Peek at the next value in a rust-polars LazyFrame column while still working on the current one

I guess this is a conceptual oxymoron "peeking ahead in a LazyFrame-column" ... maybe one of you can enlighten me how to best do it.
I want to put the result of this for each date into a new column:
Ok( (next_weekday_number - current_weekday_number) == 1 )
Here is the sample code to help me find an answer:
// PLEASE be aware to add the needed feature flags in your toml file
use polars::export::arrow::temporal_conversions::date32_to_date;
use polars::prelude::*;
fn main() -> Result<()> {
let days = df!(
"date_string" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05",
"1900-01-06", "1900-01-07", "1900-01-09", "1900-01-10"])?;
let options = StrpTimeOptions {
date_dtype: DataType::Date, // the result column-datatype
fmt: Some("%Y-%m-%d".into()), // the source format of the date-string
strict: false,
exact: true,
};
// convert date_string into dtype(date) and put into new column "date_type"
// we convert the days DataFrame to a LazyFrame ...
// because in my real-world example I am getting a LazyFrame
let mut new_days = days.lazy().with_column(
col("date_string")
.alias("date_type")
.str()
.strptime(options),
);
// This is what I wanted to do ... but I get a string result .. need u32
// let o = GetOutput::from_type(DataType::Date);
// new_days = new_days.with_column(
// col("date_type")
// .alias("weekday_number")
// .map(|x| Ok(x.strftime("%w").unwrap()), o.clone()),
// );
// This is the convoluted workaround
// the closure below returns a UInt32 series, so declare that as the output type
let o = GetOutput::from_type(DataType::UInt32);
new_days = new_days.with_column(col("date_type").alias("weekday_number").map(
|x| {
Ok(x.date()
.unwrap()
.clone()
.into_iter()
.map(|opt_name: Option<i32>| {
opt_name.map(|datum: i32| {
// println!("{:?}", datum);
date32_to_date(datum)
.format("%w")
.to_string()
.parse::<u32>()
.unwrap()
})
})
.collect::<UInt32Chunked>()
.into_series())
},
o,
));
// Here is where my challenge is ..
// I need to get the weekday_number of the following day to determine a condition
// my pseudo code:
// new_days = new_days.with_column(
// col("weekday_number")
// .alias("cold_day")
// .map(|x| Ok( (next_weekday_number - current_weekday_number) == 1 ), o.clone()),
// );
println!("{:?}", new_days.clone().collect());
Ok(())
}
OK, I could not find a way to do everything with a LazyFrame, so I converted the LazyFrame to an eager DataFrame and was then able to process two columns at the same time.
So it's working for now. Maybe someone can help me find a solution that uses only a LazyFrame.
Here is the working code:
use polars::export::arrow::temporal_conversions::date32_to_date;
use polars::prelude::*;
fn main() -> Result<()> {
let days = df!(
"date_string" => &["1900-01-01", "1900-01-02", "1900-01-03", "1900-01-04", "1900-01-05",
"1900-01-06", "1900-01-07", "1900-01-09", "1900-01-10"])?;
let options = StrpTimeOptions {
date_dtype: DataType::Date, // the result column-datatype
fmt: Some("%Y-%m-%d".into()), // the source format of the date-string
strict: false,
exact: true,
};
// convert date_string into dtype(date) and put into new column "date_type"
// we convert the days DataFrame to a LazyFrame ...
// because in my real-world example I am getting a LazyFrame
let mut new_days_lf = days.lazy().with_column(
col("date_string")
.alias("date_type")
.str()
.strptime(options),
);
// Getting the weekday as a number:
// This is what I wanted to do ... but I get a string result .. need u32
// let o = GetOutput::from_type(DataType::Date);
// new_days_lf = new_days_lf.with_column(
// col("date_type")
// .alias("weekday_number")
// .map(|x| Ok(x.strftime("%w").unwrap()), o.clone()),
// );
// This is the convoluted workaround for getting the weekday as a number
// the closure below returns a UInt32 series, so declare that as the output type
let o = GetOutput::from_type(DataType::UInt32);
new_days_lf = new_days_lf.with_column(col("date_type").alias("weekday_number").map(
|x| {
Ok(x.date()
.unwrap()
.clone()
.into_iter()
.map(|opt_name: Option<i32>| {
opt_name.map(|datum: i32| {
// println!("{:?}", datum);
date32_to_date(datum)
.format("%w")
.to_string()
.parse::<u32>()
.unwrap()
})
})
.collect::<UInt32Chunked>()
.into_series())
},
o,
));
// The "peek" ==> add a shifted column
new_days_lf = new_days_lf.with_column(
col("weekday_number")
.shift_and_fill(-1, 9999)
.alias("next_weekday_number"),
);
// now we convert the LazyFrame into a normal DataFrame for further processing:
let mut new_days_df = new_days_lf.collect()?;
// to get a column by name we first need to collect the LazyFrame into a DataFrame
let col1 = new_days_df.column("weekday_number")?;
// same for the shifted column
let col2 = new_days_df.column("next_weekday_number")?;
// now I can use series-arithmetics
let diff = col2 - col1;
// create a bool column based on "element == 2"
// add bool column to DataFrame
new_days_df.replace_or_add("weekday diff eq(2)", diff.equal(2)?.into_series())?;
println!("{:?}", new_days_df);
Ok(())
}
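Stripped of the Polars machinery, the "peek" is just pairing every element with its successor, which is what a shift by -1 gives you. Here is a plain-Rust sketch of that idea; next_is_following_weekday is a made-up name, and it uses the == 1 condition from the question. The last row has no successor, so it gets None instead of a fill value, and the subtraction is done in signed integers so that Saturday (6) followed by Sunday (0) does not underflow.

```rust
/// For every element, look at the next one (the equivalent of a shift by -1)
/// and report whether `next - current == 1`.
/// The last element has no successor, so its result is `None`.
fn next_is_following_weekday(weekdays: &[u32]) -> Vec<Option<bool>> {
    (0..weekdays.len())
        .map(|i| {
            weekdays
                .get(i + 1)
                // Signed arithmetic avoids unsigned underflow, e.g. for 6 -> 0.
                .map(|&next| i64::from(next) - i64::from(weekdays[i]) == 1)
        })
        .collect()
}

fn main() {
    // `%w` weekday numbers for 1900-01-01 .. 1900-01-07, 1900-01-09, 1900-01-10.
    let weekdays = [1, 2, 3, 4, 5, 6, 0, 2, 3];
    for (current, flag) in weekdays.iter().zip(next_is_following_weekday(&weekdays)) {
        println!("{current} -> {flag:?}");
    }
}
```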

How to properly apply a MAP function

Need some help with a map function.
I want to take a DataType::Date and store the corresponding weekday as a string column.
I have it working starting with string -> date-type -> string (case #1).
What I am looking for is date-type -> string (case #2).
Here is the working code for the first case ... any suggestions on how to get this to work for my second case?
My challenge with this stems from my lack of proper understanding of how map is supposed to work in this instance.
use chrono::NaiveDate;
use polars::prelude::*;
fn main() {
let days = df!("column_1" => &["Tuesday"],
"column_2" => &["1900-01-02"]);
let options = StrpTimeOptions {
date_dtype: DataType::Date,
fmt: Some("%Y-%m-%d".into()),
strict: false,
exact: true,
};
// convert column_2-string into dtype(date) and put into new column "date"
let days = days
.unwrap()
.lazy()
.with_column(col("column_2").alias("date").str().strptime(options));
let o = GetOutput::from_type(DataType::Utf8);
fn str_to_weekday(str_val: Series) -> Result<Series> {
let x = str_val
.utf8()
.unwrap()
.into_iter()
// your actual custom function would be in this map
.map(|opt_date: Option<&str>| {
opt_date.map(|date: &str| {
// for DEBUG purpose only:
println! {"Date-String: {:?}", date};
NaiveDate::parse_from_str(date, "%Y-%m-%d")
.unwrap()
.format("%A")
.to_string()
})
})
.collect::<Utf8Chunked>();
Ok(x.into_series())
}
// column_2 to weekday-string ... into new column "weekday"
let days = days
.with_column(col("column_2").alias("weekday").apply(str_to_weekday, o))
.collect()
.unwrap()
.lazy();
println!("{:?}", days.clone().collect());
}
Got it to work :)
With a simplified approach ...
use polars::prelude::*;
fn main() {
let days = df!("column_1" => &["Tuesday"],
"column_2" => &["1900-01-02"]);
let options = StrpTimeOptions {
date_dtype: DataType::Date,
fmt: Some("%Y-%m-%d".into()),
strict: false,
exact: true,
};
// convert column_2-string into dtype(date) and put into new column "date"
let days = days
.unwrap()
.lazy()
.with_column(col("column_2").alias("date").str().strptime(options));
println!("{:?}", days.clone().collect());
let o = GetOutput::from_type(DataType::Utf8);
let days = days.with_column(
col("date")
.alias("weekday")
.map(|x| Ok(x.strftime("%A").unwrap()), o),
);
println!("{:?}", days.collect());
}
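For reference, the weekday can also be computed directly from the physical value of a Date, which is the number of days since 1970-01-01 (a Thursday). This is a plain-Rust sketch rather than Polars code; weekday_number and weekday_name are made-up helper names, and the numbering follows the %w convention (0 = Sunday).

```rust
const WEEKDAYS: [&str; 7] = [
    "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
];

/// Weekday number in the `%w` convention (0 = Sunday ... 6 = Saturday)
/// for a date given as days since 1970-01-01, which was a Thursday (4).
fn weekday_number(days_since_epoch: i32) -> u32 {
    (i64::from(days_since_epoch) + 4).rem_euclid(7) as u32
}

/// English weekday name (what `%A` would print) for the same input.
fn weekday_name(days_since_epoch: i32) -> &'static str {
    WEEKDAYS[weekday_number(days_since_epoch) as usize]
}

fn main() {
    // 1900-01-02 lies 25_566 days before the epoch.
    let days = -25_566;
    println!("{} ({})", weekday_name(days), weekday_number(days));
}
```

Working on the integer avoids the round trip through a formatted string when all you need is the weekday.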

How to parse TOML in Rust with unknown structure?

My configuration file has a large number of arbitrary key-value pairs in it, which I want to parse using the toml crate. However, it seems that the standard way is to use a given struct that fits the configuration file. How can I load the key-value pairs into a data structure like a map or an iterator of pairs, instead of having to specify the structure beforehand with a struct?
toml has a Value enum that can hold anything and that you can inspect dynamically to discover any content, without being forced to use a specific structure.
use toml::Value;
fn show_value(
value: &Value,
indent: usize,
) {
let pfx = " ".repeat(indent);
print!("{}", pfx);
match value {
Value::String(string) => {
println!("a string --> {}", string);
}
Value::Integer(integer) => {
println!("an integer --> {}", integer);
}
Value::Float(float) => {
println!("a float --> {}", float);
}
Value::Boolean(boolean) => {
println!("a boolean --> {}", boolean);
}
Value::Datetime(datetime) => {
println!("a datetime --> {}", datetime);
}
Value::Array(array) => {
println!("an array");
for v in array.iter() {
show_value(v, indent + 1);
}
}
Value::Table(table) => {
println!("a table");
for (k, v) in table.iter() {
println!("{}key {}", pfx, k);
show_value(v, indent + 1);
}
}
}
}
fn main() {
let input_text = r#"
abc = 123
[def]
ghi = "hello"
jkl = [ 12.34, 56.78 ]
"#;
let value = input_text.parse::<Value>().unwrap();
show_value(&value, 0);
}
/*
a table
key abc
an integer --> 123
key def
a table
key ghi
a string --> hello
key jkl
an array
a float --> 12.34
a float --> 56.78
*/
You actually don't need to do anything special other than tell it to deserialize into a HashMap:
use std::collections::HashMap;
use toml;
fn main() {
let toml_data = r#"
foo = "123"
bar = "456"
"#;
let config: HashMap<String, String> = toml::from_str(toml_data).unwrap();
println!("{:?}", config);
}
Of course, since TOML and Rust are both typed, your values all need to be the same type (String in this example), and it cannot handle tables, since it wouldn't know where in the map a table should go.
If you do have a couple tables, just add your maps as fields to a struct and that works just as simply:
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use toml;
#[derive(Debug, Serialize, Deserialize)]
struct Config {
data_a: HashMap<String, String>,
data_b: HashMap<String, String>,
}
fn main() {
let toml_data = r#"
[data_a]
foo = "123"
bar = "456"
[data_b]
bat = "123"
baz = "456"
"#;
let config: Config = toml::from_str(toml_data).unwrap();
println!("{:?}", config);
}
For what it is worth!
I came here to find a way to handle a toml-config file for my project.
This is what I've found:
You can parse an arbitrary toml file by using the Table type.
See the documentation.
All types can be parsed automatically, but you cannot escape the fact that Rust is typed. Therefore you have to convert each value into the type you expect.
See my example:
use toml::Table;
fn main() {
//Load toml file
let path = std::path::Path::new("../Cargo.toml");
let file = match std::fs::read_to_string(path) {
Ok(f) => f,
Err(e) => panic!("{}", e),
};
let cfg: Table = file.parse().unwrap();
println!("Config in table format\n");
dbg!(&cfg);
println!("Index into config");
let cfg_string: &str = cfg["package"]["version"].as_str().unwrap();
println!("Version: {:?}", cfg_string);
let cfg_bool: bool = cfg["package"]["nest"]["nested_bool"].as_bool().unwrap();
println!("Nested bool: {:?}", cfg_bool);
// Default value if failed
let cfg_float: f64 = cfg["package"]["nest"]["nested_int"]
.as_float()
.unwrap_or(5.0);
println!("Default float to value: {:?}", cfg_float);
}
The toml-file
[package]
name = "rust_test"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
toml = "0.6.0"
[package.nest]
nested_float = 1.0
nested_int = 1
nested_bool = false
Output:
Config in table format
[src/main.rs:13] &cfg = {
"dependencies": Table(
{
"toml": String(
"0.6.0",
),
},
),
"package": Table(
{
"edition": String(
"2021",
),
"name": String(
"rust_test",
),
"nest": Table(
{
"nested_bool": Boolean(
false,
),
"nested_float": Float(
1.0,
),
"nested_int": Integer(
1,
),
},
),
"version": String(
"0.1.0",
),
},
),
}
Index into config
Version: "0.1.0"
Nested bool: false
Default float to value: 5.0

How to get date element when iterating over DateChunked in polars rust

I want to iterate over a DateChunked using map and work with each element as a date, but iterating gives me i32 values. How can I get dates instead?
The code below does not compile.
use polars::export::chrono::NaiveDate;
use polars::prelude::*;
fn main() {
let df = df! [
"date_series" => [NaiveDate::from_ymd(2020, 1, 1), NaiveDate::from_ymd(2020, 1, 2), NaiveDate::from_ymd(2020, 1, 3)]
].unwrap();
let a: DateChunked = df["date_series"].date().unwrap().into_iter().map(|d| {
match d {
Some(d) => Some(d),
None => None
}
}).collect();
}
The compiler complains:
}).collect();
| ^^^^^^^ value of type `Logical<DateType, Int32Type>` cannot be built from `std::iter::Iterator<Item=Option<i32>>`
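The i32 values the iterator yields are the physical representation of a Date: days since 1970-01-01 (the Int32Type in the error message points at the same thing). Another snippet on this page converts them with date32_to_date from polars::export::arrow::temporal_conversions. To show what such a conversion does, here is a dependency-free sketch based on Howard Hinnant's well-known days-to-civil algorithm; civil_from_days is not a Polars function.

```rust
/// Convert days since 1970-01-01 into (year, month, day) in the
/// proleptic Gregorian calendar.
fn civil_from_days(days: i32) -> (i64, u32, u32) {
    // Shift the epoch to 0000-03-01 so leap days fall at the end of a "year".
    let z = i64::from(days) + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // 0..=365, counted from March 1
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the following calendar year.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

fn main() {
    // The physical values behind 2020-01-01, 2020-01-02 and 2020-01-03.
    for days in [18_262, 18_263, 18_264] {
        let (y, m, d) = civil_from_days(days);
        println!("{days} -> {y:04}-{m:02}-{d:02}");
    }
}
```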
