Explicitly setting the inlining threshold vs. using optimization level "s" - rust

If I'm reading the documentation here right, setting opt-level = "s" in Cargo.toml is equivalent to setting the inlining threshold to 75.
So I would expect the following two Cargo.toml snippets to be equivalent:
[profile.release]
opt-level = "s"
cargo-features = ["profile-rustflags"]
...
[profile.release]
rustflags = ["-C", "inline-threshold=75"]
However, the executable size I get with the second version is almost twice the size of the first version, and it matches the size I get without setting the inline-threshold at all (i.e. using the release build's default of 275).
How do I manually set the inlining threshold to match the behaviour of opt-level = "s"? Yes, I could just use opt-level = "s" itself, but my ultimate goal is to then start tweaking the threshold to see how the performance and the binary size change.
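For experimenting with different thresholds, the same codegen flag can also be passed outside the profile, for example through a .cargo/config.toml or the RUSTFLAGS environment variable. A minimal sketch of that plumbing only (it does not explain why the resulting size differs from opt-level = "s"):
# .cargo/config.toml (sketch): apply the flag to every rustc invocation
[build]
rustflags = ["-C", "inline-threshold=75"]
# or, for a one-off build:
# RUSTFLAGS="-C inline-threshold=75" cargo build --release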

Related

Why does this bevy project take so long to compile and launch?

I started following this tutorial on how to make a game in bevy. The code compiles fine, though it's still pretty slow (I'm honestly not sure if that's normal; it takes around 8 seconds), but when I launch the game, the window goes white (Not Responding) for a few seconds (about the same amount of time as the compile time, maybe a tiny bit less) before properly loading.
Here's my Cargo.toml:
[package]
name = "rustship"
version = "0.1.0"
edition = "2021"
[dependencies]
bevy = "0.8.1"
# Enable a small amount of optimization in debug mode
[profile.dev]
opt-level = 1
# Enable high optimizations for dependencies (incl. Bevy), but not for our code:
[profile.dev.package."*"]
opt-level = 3
[workspace]
resolver = "2"
I tried it with and without the workspace resolver. My rustup toolchain is nightly-x86_64-pc-windows-gnu and I'm using rust-lld to link the program:
[target.nightly-x86_64-pc-windows-gnu]
linker = "rust-lld.exe"
rustflags = ["-Zshare-generics=n"]
According to the official bevy setup guide it should be faster this way. I tried it with rust-lld and without, but it doesn't seem to change anything.
Here's the output of cargo run (with A_NUMBER being a 4-digit number):
AdapterInfo { name: "NVIDIA GeForce RTX 3090", vendor: A_NUMBER, device: A_NUMBER, device_type: DiscreteGpu, backend: Vulkan }
Any ideas on how I can maybe improve the compile time and make the window load right away? My game isn't heavy at all; for now, I'm just loading a sprite. The guy in the tutorial uses macOS and it seems to be pretty fast for him.
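One tweak commonly suggested in the same Bevy fast-compile guidance, alongside the linker change above, is enabling Bevy's dynamic-linking feature during development so the engine isn't statically relinked on every build. A sketch, assuming Bevy 0.8's feature name dynamic (and that the feature is dropped again for release builds):
# Cargo.toml (sketch): link bevy dynamically while iterating
[dependencies]
bevy = { version = "0.8.1", features = ["dynamic"] }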

rust / cargo workspace: how to specify different profiles for different sub-projects

I have a Rust Cargo workspace that contains different subprojects:
./
├─Cargo.toml
├─project1/
│ ├─Cargo.toml
│ ├─src/
├─project2/
│ ├─Cargo.toml
│ ├─src/
I would like to build one project optimized for binary size and the other for speed.
From my understanding, profiles can only be tweaked at the root Cargo.toml level, so this, for instance, applies to all my sub-projects.
root Cargo.toml:
[workspace]
members = ["project1", "project2"]
[profile.release]
# less code to include into binary
panic = 'abort'
# optimization over all codebase ( better optimization, slower build )
codegen-units = 1
# optimization for size ( more aggressive )
opt-level = 'z'
# optimization for size
# opt-level = 's'
# link-time optimization using whole-program analysis
lto = true
If I try to apply this configuration in a sub Cargo.toml, it doesn't work.
Question: is there a way to configure each project independently?
Thank you in advance.
Edit: Also, I forgot to say that one project is built with trunk and is a WASM project (I want it to be as small as possible); the other is a backend and I really need it to be built for speed.
Each crate in a workspace can have its own .cargo/config.toml where different profiles can be defined. I've toyed around with this a bit to have one crate for an embedded device, one for a CLI utility to connect to the device over serial, and shared libraries for both of them. Pay attention to the caveat in the docs about needing to be in the crate directory for the config to be read; it won't work from the workspace root.
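A rough sketch of that layout (the profile values here are illustrative, and each crate must be built from inside its own directory so its config is picked up):
# project1/.cargo/config.toml: the WASM crate, optimized for size
[profile.release]
opt-level = "z"
lto = true
panic = "abort"
# project2/.cargo/config.toml: the backend, optimized for speed
[profile.release]
opt-level = 3
lto = "thin"
codegen-units = 1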

Rust embedded binary size

I'm new to Rust and, after many fights with the compiler and borrow checker, I am finally nearly finished with my first project. But now I have the problem that the binary gets too big to fit into the flash of the microcontroller.
I'm using an STM32F103C8 with 64K flash on a BluePill.
At first I was able to fit the code on the microcontroller, but bit by bit I had to enable optimizations and such. Now I compile with:
[profile.dev]
codegen-units = 1
debug = 0
lto = true
opt-level = "z"
and am able to fit the binary. opt-level = "s" generates a binary that is too big. The error I am getting then is: rust-lld: error: section '.rodata' will not fit in region 'FLASH': overflowed by 606 bytes
As I have under 1000 lines of code and, I would say, not-so-unusual dependencies, this seems strange.
There are a few sites like this with ways to minimize the binary. As these are not aimed at embedded, most of the suggested ways to minimize are followed anyway.
How am I able to minimize the binary size and am still able to debug it?
My dependencies are:
[dependencies]
cortex-m = "*"
panic-halt = "*"
embedded-hal = "*"
[dependencies.cortex-m-rtfm]
version = "0.4.3"
features = ["timer-queue"]
[dependencies.stm32f1]
version = "*"
features = ["stm32f103", "rt"]
[dependencies.stm32f1xx-hal]
version = "0.4.0"
features = ["stm32f103", "rt"]
Maybe there is a problem there, as I noticed that cargo build compiles some sub-dependencies multiple times in different versions.
Inside the memory.x file:
MEMORY
{
FLASH : ORIGIN = 0x08000000, LENGTH = 64K
RAM : ORIGIN = 0x20000000, LENGTH = 20K
}
Rustc version rustc 1.37.0 (eae3437df 2019-08-13)
Edit:
The Rust panic behavior is abort.
The code is viewable at: https://github.com/DarkPhoeniz/rc-switcher-rust
I've run into similar issues and may be able to shed some light on what you can do to reduce the size of the binary you're outputting.
You've already discovered one of them: opt-level = "z". The difference between s and z is the inlining constraint - essentially, the size of a function above which the compiler deems it not worth inlining. z sets this to 25, s to 75. Depending on what you are building, this may or may not be a considerable reduction in size (and it affects .rodata and .text primarily).
Another thing you can play with is the panic behavior of your code. If I remember correctly, the stm32 target supports both unwind and abort, with unwind enabled on the dev profile. As I'm sure you can understand, unwinding the stack is a large and costly process in terms of code size. As such, setting panic = "abort" in your Cargo.toml might reduce the binary size a bit further.
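For reference, that setting goes in the profile you are building with; a minimal sketch for the dev profile used above:
[profile.dev]
panic = "abort"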
Beyond that, it is down to manual tuning, and tools like cargo-binutils may be extremely useful for this. Depending on your use case, there may be leftover Debug implementations which are only sporadically needed, and that is definitely something that you could act on.
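For example, cargo-binutils provides a cargo size subcommand that shows how .text and .rodata fill up (a sketch; the flags after -- go to llvm-size):
# Per-section sizes (add --release for the release profile)
cargo size -- -A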
A few other general tips for shrinking the binary:
First, the cargo-bloat utility is useful for determining what is taking up space in your binary (see the example invocation after the snippets below); you can then make informed decisions about how to modify your code to shrink it down.
Second, I've had significant success by configuring the compiler to optimize all dependencies, but leave the top level crate unoptimized for easier debugging. You can do this by adding the following to your Cargo.toml:
# Optimize all dependencies
[profile.dev.package."*"]
opt-level = "z"
If you want to debug a specific dependency (for example: cortex-m-rt), you can make it unoptimized like so:
# Don't optimize the `cortex-m-rt` crate
[profile.dev.package.cortex-m-rt]
opt-level = 0
# Optimize all the other dependencies
[profile.dev.package."*"]
opt-level = "z"
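As an example of the cargo-bloat invocation mentioned above (a sketch; both forms use standard cargo-bloat flags):
# Space taken per crate, then the 20 largest functions
cargo bloat --release --crates
cargo bloat --release -n 20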

How much memory is consumed by jemalloc, debug symbols and panic? How do I find this and where is it located?

I am new to Rust as well as to programming. I just wrote an LED blinking program on a Raspberry Pi 3 using Rust. It worked well.
My debug binary file size is 4.7MB. It's really huge. So I built it in release mode and it got reduced to 2.5MB. I found that, due to the default handling of jemalloc, debug symbols and panic, Rust executables are very large. Can somebody help me out: how much memory is consumed by jemalloc, debug symbols and panic? How do I find this? Where is it located? How can I remove or deactivate jemalloc?
I am working with Rust 1.38.0 stable on a Raspberry Pi 3, using the Visual Studio Code IDE.
main.rs file:
use rust_gpiozero::*;
use std::thread;
use std::time::Duration;
fn main() {
    // create a new LED attached to pin 17
    let led = LED::new(17);
    // blink the LED 5 times
    for _ in 0..5 {
        led.on();
        thread::sleep(Duration::from_secs(10));
        led.off();
        thread::sleep(Duration::from_secs(10));
    }
}
Cargo.toml file:
[package]
name = "led_blink"
version = "0.1.0"
authors = ["pi"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
libc = "0.2"
rust_gpiozero = "0.2.0"
[profile.release]
codegen-units = 1
I want to know how much memory is consumed by jemalloc, debug symbols and panic in the total size, and how to remove/deactivate all three by default.
Looking for help, thank you.
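For the "remove/deactivate" part, the knobs usually reached for are the release profile and stripping symbols after the build. A sketch only, using the crate name from the Cargo.toml above; it does not answer the per-item size breakdown, and note that since Rust 1.32 the default allocator is already the system one rather than jemalloc:
[profile.release]
codegen-units = 1
lto = true
panic = "abort"
# then strip debug symbols from the produced binary:
# strip target/release/led_blink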

How to configure xLen in Rocket core?

I am trying to use the Rocket core as a baseline core and add some additional features for research purposes, but I can't find where or how to change the value "xLen".
Rocket Chip uses a default XLen of 64 in its DefaultConfig. However, this can be changed to 32 via a different top-level System configuration, DefaultRV32Config.
If you're working with the Rocket Chip emulator, you can compile these two different configurations with
cd emulator
CONFIG=DefaultConfig make
CONFIG=DefaultRV32Config make
For reference, take a look at the Rocket Chip System configurations defined in the system package as well as the subsystem configurations:
src/main/scala/system/Configs.scala
src/main/scala/subsystem/Configs.scala
The former defines DefaultConfig and DefaultRV32Config. The latter defines WithRV32. WithRV32 is what changes XLen to 32 (and also sets fLen to 32). Alternatively, you can replicate the behavior of WithRV32 in your own subclass of Config.
