I am processing a lot of data. I am working with the cargo dumps, in hopes of creating interconnected graph of all crates. I get data from postgres dump, and try to connect it. I'm trying to connect each version of package, with latest possible version of dependency. I have to go through each dependency, find possible versions, parse version and see if the version of the package and required version of the dependency matches.
I can say that first found semver match is always the right one, since I get data ordered DESC from db. Now, to my problem. I programmed this logic, however then I found out that execution of my code is going to take a long time. Around 3 hours on my hardware, and a lot more in production. This is not acceptable. I've pinpointed my problem, being that I always create a new iterator and then proceed to consume it with using find method. Is there any way to find value inside a vec, without consuming the iterator?
I have the following structs:
#[derive(Debug, Clone, sqlx::FromRow)]
pub struct CargoCrateVersionRow {
pub id: i32,
pub crate_id: i32,
pub version_num: String,
pub created_at: sqlx::types::chrono::NaiveDateTime,
pub updated_at: sqlx::types::chrono::NaiveDateTime,
pub downloads: i32,
pub features: sqlx::types::Json<HashMap<String, Vec<String>>>,
pub yanked: bool,
pub license: Option<String>,
pub crate_size: Option<i32>,
pub published_by: Option<i32>,
}
#[derive(Debug, sqlx::FromRow)]
pub struct CargoDependenciesRow {
pub version_id: i32,
pub crate_id: i32,
pub req: String,
pub optional: bool,
pub features: Vec<String>,
pub kind: i32,
}
#[derive(Debug)]
pub struct CargoCrateVersionDependencyEdge {
pub version_id_from: i32,
pub version_id_to: i32,
pub optional: bool,
pub with_features: Vec<String>,
pub kind: i32,
}
SQL command for retrieving crate_dependencies:
select version_id, crate_id, req, optional, features, kind
from dependencies;
SQL command for retrieving crate_versions:
select
id, crate_id, num "version_num", created_at, updated_at,
downloads, features, yanked, license, crate_size, published_by
from
versions
order by
id desc;
My custom logic:
let mut cargo_crate_dependency_edges: Vec<CargoCrateVersionDependencyEdge> = vec![];
let mut i = 0;
for dep in crate_dependencies {
if i % 10_000 == 0 {
println!("done 10k");
}
let req = skip_fail!(VersionReq::parse(dep.req.as_str()));
// I do not wish to create new iter each time, as this includes millions of structs
let possible_crate_version = crate_versions.iter().find(|s| {
if s.crate_id != dep.crate_id {
return false;
}
// Here, I'm using semver crate
if let Ok(parsed_version) = Version::parse(&s.version_num) {
req.matches(&parsed_version)
} else {
false
}
});
if let Some(found_crate_version) = possible_crate_version {
cargo_crate_dependency_edges.push(
CargoCrateVersionDependencyEdge {
version_id_from: dep.version_id,
version_id_to: found_crate_version.id,
optional: dep.optional,
with_features: dep.features.clone(),
kind: dep.kind,
}
);
}
i += 1;
}
println!("{}", cargo_crate_dependency_edges.len());
After some time debugging, I've come to the conclusion that I need to somehow get rid of making the iterator, because that's my bottleneck. I've benchmarked the library, pushing onto the vec, basically everything.
Related
I'm using time::PrimitiveDateTime and I would like to create a new var with the current time but I cannot find how. Isn't there something like now()?
What about Instant::now()?
Example:
pub struct Player {
updated_at: Option<PrimitiveDateTime>
}
impl Player {
pub fn set_updated_at(&mut self) {
let now = Instant::now();
self.updated_at = Some(time::PrimitiveDateTime::from(now));
}
But obviously this doesn't work:
mismatched types
expected struct `time::PrimitiveDateTime`, found struct `time::Instant` rustc E0308
The best way is probably to use PrimitiveDateTime::new with OffsetDateTime::now_(utc|local):
pub struct Player {
updated_at: Option<PrimitiveDateTime>
}
impl Player {
pub fn set_updated_at(&mut self) {
let now = OffsetDateTime::now_utc();
self.updated_at = Some(PrimitiveDateTime::new(now.date(), now.time()));
}
}
As per the latest version, OffsetDateTime will have two now-style methods. One for UTC, one for the local time.
If for whatever reason you want a PrimitiveDateTime you can then of course extract the date info from the OffsetDateTime and use those as arguments in the PrimitiveDateTime::new() method.
Is there any way to add a platform based condition on #[serde(with = "path_handling")]?
So, basically I want to use this custome Serde method only in Inix and on Windows I want to use default way.
pub struct Res {
pub last: bool,
#[serde(with = "path_handling")] // ignore this line on windows as path_handling module contains unix specific logic
pub path: PathBuf,
}
Using cfg_attr and target_family.
pub struct Res {
pub last: bool,
#[cfg_attr(target_family = "unix", serde(with = "path_handling"))]
pub path: PathBuf,
}
For my first project I wanted to create a terminal implementation of Monopoly. I have created a Card, Player, and App structs. Originally my plan was to create an array inside the App struct holding a list of cards which I could then randomly select and run an execute() method on, which would push a log to the logs field of the App. What I thought would be best was this:
pub struct Card {
pub group: u8,
pub name: String,
pub id: u8,
}
impl Card {
pub fn new(group: u8, name: String, id: u8) -> Card {
Card { group, name, id }
}
pub fn execute(self, app: &mut App) {
match self.id {
1 => {
app.players[app.current_player].index = 24;
app.push_log("this works".to_string(), "LOG: ".to_string());
}
_ => {}
}
}
}
pub struct Player<'a> {
pub name: &'a str,
pub money: u64,
pub index: u8,
pub piece: char,
pub properties: Vec<u8>,
pub goojf: u8, // get out of jail freeh
}
impl<'a> Player<'a> {
pub fn new(name: &'a str, piece: char) -> Player<'a> {
Player {
name,
money: 1500,
index: 0,
piece,
properties: Vec::new(),
goojf: 0,
}
}
}
pub struct App<'a> {
pub cards: [Card; 1],
pub logs: Vec<(String, String)>,
pub players: Vec<Player<'a>>,
pub current_player: usize,
}
impl<'a> App<'a> {
pub fn new() -> App<'a> {
App {
cards: [Card::new(
11,
"Take a ride on the Penn. Railroad!".to_string(),
0,
)],
logs: vec![(
String::from("You've begun a game!"),
String::from("BEGIN!:"),
)],
players: vec![Player::new("Joe", '#')],
current_player: 0,
}
}
pub fn draw_card(&mut self) {
if self.players[self.current_player].index == 2
|| self.players[self.current_player].index == 17
|| self.players[self.current_player].index == 33
{
self.cards[0].execute(self);
}
}
pub fn push_log(&mut self, message: String, status: String) {
self.logs.push((message, status));
}
}
fn main() {}
However this code throws the following error:
error[E0507]: cannot move out of `self.cards[_]` which is behind a mutable reference
--> src/main.rs:76:13
|
76 | self.cards[0].execute(self);
| ^^^^^^^^^^^^^ move occurs because `self.cards[_]` has type `Card`, which does not implement the `Copy` trait
I managed to fix this error by simply declaring the array of cards in the method itself, however, this seem to be pretty brute and not at all efficient, especially since other methods in my program depend on cards. How could I just refer to a single array of cards for all of my methods implemented in App or elsewhere?
As others have pointed out, Card::execute() needs to take &self instead of self, but then you run into the borrow checker issue, which I'll spend the rest of this answer discussing.
It may seem odd, but Rust is actually protecting you here. The borrow checker does not look into functions to see what they do, so it has no idea that Card::execute() won't do something to invalidate the referenced passed as the first argument. For example, if App::cards was a vector instead of an array, it could clear the vector.
Something that could actually practically happen here to cause undefined behavior would be if Card::execute() took a string slice from self.name and then cleared the card's name attribute through the mutable reference to app. None of these actions would be prohibited, and you'd be left with an invalid reference to a string slice. This is why the borrow checker isn't letting you make this method call, and this is exactly the kind of accident that Rust is designed to prevent.
There's a few ways around this. One option is to only pass the pieces of the App value needed to complete the task. You can borrow different parts of the same value. The problem here is that the reborrow of self overlaps with the borrow self.cards[0]. Passing each field separately isn't very ergonomic though, as in this case you'll wind up having to pass a reference to pretty much everything else on App.
It looks like the Card values don't actually contain any game state, and are used as data for the game engine. If this is the case, then the cards can live outside of App, like so:
pub struct App<'a> {
// Changed to an unowned slice.
pub cards: &'a [Card],
pub logs: Vec<(String, String)>,
pub players: Vec<Player<'a>>,
pub current_player: usize,
}
impl<'a> App<'a> {
pub fn new(cards: &'a [Card]) -> App<'a> {
App {
cards,
// ...
Then in your main() you can initialize the data and borrow it:
fn main() {
let cards: [Card; 1] = [Card::new(
11,
"Take a ride on the Penn. Railroad!".to_string(),
0,
)];
let app = App::new(&cards);
}
This solves the compilation problem. I'd suggest making other changes, as well:
App should probably be renamed GameState or something to emphasize that this struct should contain only mutable game state, and no immutable reference data.
Player's name field should probably be an owned String instead of an unowned &str, otherwise some other entity in the program will need to own a string slice for the duration of the program so that Player can borrow it.
Does a card need to be able to mutate the cards in self? If you know that execute will not need access to cards, the easiest solution is to split App into further structs so you can better limit the access of the function.
Unless there is another layer to your program that interacts with App as a singular object, requiring everything be in a single struct to perform operations will likely only constrain your code. If possible it is better to split it into its core components so you can be more selective when sharing references and avoid needing to make so many fields pub.
Here is a rough example:
pub struct GameState<'a> {
pub players: Vec<Player<'a>>,
pub current_player: usize,
}
/// You may want to look into using https://crates.io/crates/log instead to make logging easier.
pub struct Logs {
entries: Vec<(String, String)>,
}
impl Logs {
/// Use ToOwned so you can use both both &str and String
pub fn push<S: ToOwned<String>>(&mut self, message: S, status: S) {
self.entries.push((message.to_owned(), status.to_owned()));
}
}
pub struct Card {
group: u8,
name: String,
id: u8,
}
impl Card {
pub fn new(group: u8, name: String, id: u8) -> Self {
Card { group, name, id }
}
/// Use &self because there is no reason we need are required to consume the card
pub fn execute(&self, state: &mut GameState, logs: &mut Logs) {
if self.id == 1{
state.players[state.current_player].index = 24;
logs.push("this works", "LOG");
}
}
}
Also as a side note, execute consumes a card when called since it does not take a reference. Since Card does not implement Copy, that would require it be removed from cards so it can be moved.
Misc Tips and Code Review
Using IDs
It looks like you frequently use IDs to distinguish between items. However, I think your code will look cleaner and be easier to write if you used more human readable types. For example, do cards need an ID? Usually it is preferable to define your struct based on how the data is used. Last time I played monopoly, I don't remember picking up a card and referring to the card ID to determine what to do. I would instead recommend defining a Card by how it is used in the came. If I am remembering correctly, each card contains a short message telling the player what to do. Technically a card could consist of just the message, but you can instead make your code a bit cleaner by separating out the action to an enum so actions are not hard coded to the text on the card.
pub struct Card {
text: String,
action: CardAction,
}
// Note: enums which can also hold data, are more commonly referred to as
// "tagged unions" in computer science. It can be a pain to search for them if
// you don't know what they are called.
pub enum CardAction {
ProceedToGo,
GoToJail,
GainOrLoseMoney(i64),
// etc.
}
On a similar note, it looks like you are trying to be memory conscious by using the smallest type required for a given value. I would recommend against this thinking. If a value of 253u8 is equally as invalid as 324234i32, then the smaller type is not doing anything to help you. You might as well use i32/u32 or i64/u64 since most systems will have an easier time operating on these types. The same thing goes for indices and using other integer types instead of usize since choosing to use another type will only give you more work converting it to and from a usize.
Sharing Owned References
Depending on your design philosophy you might want to store a reference to a struct in multiple places. This can be done using a reference counter Rc<T>. Here are some quick examples. Note that these can not be shared between threads.
let property: Rc<Property> = Rc::new(Property::new(/* etc */));
// Holds an owned reference to the same property as property that can be accessed immutably
let ref_to_property: Rc<Property> = property.clone();
// Or if you want interior mutability you can use Rc<RefCell<T>> instead.
let mutable_property = Rc::new(RefCell::new(Property::new(/* etc */)));
As advised by my lead programmer, who's not knowledgeable in Rust but knows a lot about OO languages like Java, C#, and such, I should separate the functions and associated methods of a particular "class," in Rust it's more or less a struct, from its model or definition to another file. But I had trouble accessing a struct's field/data member from another file. It feels icky to just attach pub before every struct field's name.
// some_model.rs
// This feels icky
pub struct SomeStruct {
pub name: String,
pub id: u32,
}
Just so other files could access the aforementioned struct above
// some_adapter.rs
impl SomeStruct {
pub fn get_id(&self) -> u32 {
self.id
}
pub fn get_name(&self) -> &'static str {
self.name
}
pub fn new(name: &'static str, id: u32) -> Self {
SomeModel {
name,
id
}
}
}
So how does one access such fields from a different file?
Warning
Based from the comments I've received, I'll put this at the top
THIS IS AN UN-RUSTY SOLUTION
If your lead programmer says you should do the following, or do what's indicate in the question, protest and show them this post.
Proposed Answer
I've scoured the net for some answers, every link is a piece of the whole puzzle. See the references below the post. And eventually, I think I found an apt solution, without using the much warned include! macro.
I used the following syntax profusely:
Syntax
Visibility :
pub
| pub ( crate )
| pub ( self )
| pub ( super )
| pub ( in SimplePath )
At first glance it was confusing for me, and for the others like me who might be perplexed as well, I had posted my findings with a sample code.
The directory, using broot
src
├──dal
│ ├──adapter
│ │ └──some_adapter.rs
│ ├──adapter.rs
│ ├──model
│ │ └──some_model.rs
│ └──model.rs
├──dal.rs
├──lib.rs
└──main.rs
I intended to include numerous sub-directories as well, as I reckoned it might add incidental information to those likely having the same knowledge as I in Rust.
I executed cargo new visibility_sample
dal/model/some_model.rs
// dal/model/some_model.rs
pub struct SomeModel {
// take note of the following syntax
pub(in super::super) name: &'static str,
pub(in super::super) id: u32,
}
dal/adapter/some_adapter.rs
// dal/adapter/some_adapter.rs
use super::super::model::some_model::SomeModel;
impl SomeModel {
pub fn get_id(&self) -> u32 {
self.id
}
pub fn get_name(&self) -> &'static str {
self.name
}
pub fn new(name: &'static str, id: u32) -> Self {
SomeModel {
name,
id
}
}
}
dal/model.rs
// dal/model.rs
pub mod some_model;
dal/adapter.rs
// dal/adapter.rs
pub mod some_adapter;
dal.rs
// dal.rs
pub mod model;
pub mod adapter;
lib.rs
// lib.rs
pub mod dal;
And finally main.rs
// main.rs
use visibility_sample::dal::model::some_model::SomeModel;
fn main() {
let object = SomeModel::new("Mike", 3);
println!("name: {}, id: {}", object.get_name(), object.get_id());
}
running cargo run
Compiling visibility_sample v0.1.0 (C:\Users\miked\git\visibility_sample)
Finished dev [unoptimized + debuginfo] target(s) in 1.40s
Running `target\debug\visibility_sample.exe`
name: Mike, id: 3
but if main.rs has the following code:
// main.rs
use visibility_sample::dal::model::some_model::SomeModel;
fn main() {
let object = SomeModel::new("Mike", 3);
println!("name: {}, id: {}", object.name, object.id);
}
Powershell prints out Rust compiler's beautiful and informative error printing:
error[E0616]: field `name` of struct `SomeModel` is private
--> src\main.rs:5:41
|
5 | println!("name: {}, id: {}", object.name, object.id);
| ^^^^ private field
error[E0616]: field `id` of struct `SomeModel` is private
--> src\main.rs:5:54
|
5 | println!("name: {}, id: {}", object.name, object.id);
| ^^ private field
error: aborting due to 2 previous errors
I placed "" quotation marks in the title for "private" because I'm not certain whether that word is still applicable with the term I'm trying to convey with my posted solution.
References
https://doc.rust-lang.org/reference/visibility-and-privacy.html
https://users.rust-lang.org/t/implement-private-struct-in-different-files/29407
Move struct into a separate file without splitting into a separate module?
Proper way to use rust structs from other mods
P.S.
Perhaps to those gifted and keen, they can easily understand the solution with a few or even a single link in my references on their own, but to someone like me, I have to read these links a whole lot and repeatedly get back to it for tinkering on my own.
I'm creating some User Input components in Rust. I'm trying to modularize these components using trait CustomInput. This trait implements some basic for all UI Input components, like being able to set and get the values of the component. All my UI components will take a String input, and parse that string to some typed value, maybe an f64, i32, or Path. Because different UI components can return different types, I created enum CustomOutputType, so that get_value can have a single return type for all components.
As users can input any random string, each UI component needs to have a means of ensuring that the input string can be successfully converted to the right type for the respective input component, I.e MyFloatInput needs to return a f64 value, MyIntInput needs to return an i32, and so on. I see that std::str::FromStr is implemented for most basic types, bool, f64, usize, but I don't think this kind of parsing is robust enough for UI inputs.
I'd like some more advanced functionality than std::str::FromStr provides, I'd like to be able to:
Evaluate expression, i.e input of 500/25 returns 20
Floor/Ceiling for min/max values: i.e if max is 1000 and input is 2000, value is 1000
What would be an idiomatic way of implementing such parsing
functionality for my UI components?
Is using the CustomOutputType enum an efficient strategy for handling the
different types the UI Components could return?
Would a parsing crate like Nom be advisable in this situation? I think this might be
overkill for what I'm doing, but I'm not sure.
I've taken a stab at creating such a system:
use std::str::FromStr;
use snafu::{Snafu};
#[derive(Debug, Snafu)]
pub enum Error{
#[snafu(display("Incorrect Type: {:?}", msg))]
IncorrectInputType{msg: String},
}
type Result<T, E = Error> = std::result::Result<T, E>;
#[derive(Clone, Debug)]
pub enum CustomOutputType{
CFloat(f64),
CInt(i32),
}
pub trait CustomInput{
fn get_value(&self)->CustomOutputType;
fn set_value(&mut self, val: String)->Result<()>;
}
pub struct MyFloatInput{
val: f64,
}
pub struct MyIntInput{
val: i32
}
impl CustomInput for MyFloatInput{
fn get_value(&self)->CustomOutputType{
CustomOutputType::CFloat(self.val)
}
fn set_value(&mut self, val: String)->Result<()>{
match f64::from_str(&val) {
Ok(inp)=>Ok(self.val = inp),
// Ok(inp)=>{self.val = inp}
Err(e) => IncorrectInputType { msg: "setting float input failed".to_string() }.fail()?
}
}
}
impl CustomInput for MyIntInput{
fn get_value(&self)->CustomOutputType{
CustomOutputType::CInt(self.val)
}
fn set_value(&mut self, val: String)->Result<()>{
match i32::from_str(&val) {
Ok(inp)=>Ok(self.val = inp),
Err(e) => IncorrectInputType { msg: "setting int input failed".to_string() }.fail()?
}
}
}
fn main() {
let possible_inputs = vec!["100".to_string(), "100.0".to_string(), "100/2".to_string(), ".1".to_string(), "teststr".to_string()];
let mut int_input = MyIntInput{val: 0};
let mut float_input = MyFloatInput{val: 0.0};
for x in &possible_inputs{
int_input.set_value(x.to_string()).unwrap();
float_input.set_value(x.to_string()).unwrap();
}
}