Dynamic cargo feature flags - rust

Is there a way to change the feature flags of an included library at run time, or more specifically, to choose them depending on the CPU features of the machine running the code? Can it be done programmatically, or via CLI arguments?
We have this library (curve25519-dalek) that lets you use the default backend, AVX2, or IFMA, but only if the CPU running the code supports it. Is there a way to create a binary that passes the correct feature flag to the library so that the code runs on the best instruction set available? You would want to run the fastest instruction set, since the speed gains are significant.
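For reference, runtime CPU feature detection itself looks something like this in Rust. This is a minimal sketch using the standard library's is_x86_feature_detected! macro, assuming an x86-64 target; the backend labels are illustrative and are not curve25519-dalek's actual API:

    // Report which dalek-style backend this CPU could support.
    // Assumes an x86-64 target; labels are illustrative only.
    fn best_supported_backend() -> &'static str {
        if is_x86_feature_detected!("avx512ifma") {
            "ifma"
        } else if is_x86_feature_detected!("avx2") {
            "avx2"
        } else {
            "default"
        }
    }

    fn main() {
        println!("fastest usable backend: {}", best_supported_backend());
    }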

Related

Build on AMD, run on Intel?

If I cargo build --release a Rust binary on an AMD CPU and then run it on an Intel (or vice versa), could that be a problem (compatibility issues and/or considerable performance sacrifice)? I know we can use a target-cpu=<cpu> flag and that should result in a potentially more optimized machine code for the target platform. My questions are:
1. Practically speaking, if we build for one platform but run on the other, should we expect a significant runtime performance penalty?
2. If we build on AMD with target-cpu=intel (or vice versa), could the compilation itself be:
   - slower?
   - restricted in how well it could optimize for the target platform?
Note: Linux will be the OS for both compilation and running.
In general, if you just do a cargo build --release without further configuration, then you will get a binary that runs on any machine of the relevant architecture. That is, it will run on either an Intel or AMD CPU that's x86-64. It will also be optimized for a general CPU of that architecture, regardless of what type of CPU you build it on. The specific settings are going to depend on whatever rustc and LLVM are configured for, but unless you've done a custom build, that's usually the case.
Usually that is sufficient for most people's needs, and building for a specific target CPU is unnecessary. However, if you specify a particular CPU, then the binary will be optimized for that CPU and may contain instructions that don't exist on other CPUs. For example, the baseline x86-64 architecture doesn't include later additions like AVX, so if you compile for a CPU that provides those instructions, rustc may use them, and the resulting binary may perform worse, or not run at all, on other CPUs.
It's impossible to say more about your particular situation without knowing more about your code and performance needs. My recommendation is just to use cargo build --release and not to optimize for a specific CPU unless you have measured the code and determined that there is a particular section which is slow and which would benefit from it. Most people benefit more from the portability of a generic build and don't need CPU-specific optimizations.
Everything I've said here is also true of other sets of architectures. If you compile for aarch64-unknown-linux-gnu or riscv64gc-unknown-linux-gnu, it will build for a generic CPU of that type which works on all systems of that type unless you specify different options. The exception tends to be on systems like macOS, where it is specifically known that all CPUs that run on that OS have some specific set of features, and thus a compilation for e.g. x86-64 CPUs on macOS might optimize for Intel CPUs with given features since macOS only runs on hardware that uses those CPUs.
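For illustration, the target-cpu codegen flag is usually passed through the RUSTFLAGS environment variable (or a build.rustflags entry in .cargo/config.toml), for example:

    # Tune for the CPU of the build machine; the binary may not run on older CPUs:
    RUSTFLAGS="-C target-cpu=native" cargo build --release

    # Explicitly request the generic x86-64 baseline (roughly what a default build targets):
    RUSTFLAGS="-C target-cpu=x86-64" cargo build --release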

MSVC /arch:[instruction set] - SSE3, AVX, AVX2

Here is an example of a class which shows supported instruction sets. https://msdn.microsoft.com/en-us/library/hskdteyh.aspx
I want to write three different implementations of a single function, each using a different instruction set. But with a flag like /ARCH:AVX2, the app won't ever run on anything but 4th-generation-or-later Intel processors, so the runtime check itself becomes pointless.
So the question is: what exactly does this flag do? Does it enable support for the instructions, or does it enable compiler optimizations that use them?
In other words, can I completely remove this flag and keep using the functions from immintrin.h, emmintrin.h, etc.?
Using the /ARCH:AVX2 option lets the compiler make full use of the YMM registers and AVX2 instructions. But if the CPU does not support those instructions, the program will crash. If you use AVX2 instructions while compiling with /ARCH:SSE2, you will see a performance penalty (roughly 2x).
So the best approach is to compile each implementation of your function with the corresponding compiler option (/ARCH:AVX2, /ARCH:SSE2, and so on). The easiest way to do this is to put your implementations (scalar, SSE, AVX) in separate files and compile each file with the appropriate options.
It is also a good idea to have a separate file that checks the CPU's capabilities at run time and calls the corresponding implementation of your function.
There is an example of a library that performs this CPU check and dispatches to one of the implemented functions.
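For comparison with the lead Rust question above, the same pattern (compile each implementation for its own instruction set and dispatch at run time) can be sketched in Rust. The function names here are hypothetical and an x86-64 target is assumed:

    // Hypothetical example: one function body compiled with AVX2 enabled,
    // one baseline version, and a dispatcher that checks the CPU at run time.
    #[target_feature(enable = "avx2")]
    unsafe fn sum_avx2(data: &[f32]) -> f32 {
        // With AVX2 enabled for this function only, the compiler may
        // auto-vectorize this loop using YMM registers.
        data.iter().sum()
    }

    fn sum_baseline(data: &[f32]) -> f32 {
        data.iter().sum()
    }

    fn sum(data: &[f32]) -> f32 {
        if is_x86_feature_detected!("avx2") {
            // Safe to call here: we just verified the CPU supports AVX2.
            unsafe { sum_avx2(data) }
        } else {
            sum_baseline(data)
        }
    }

    fn main() {
        let v = vec![1.0_f32; 1024];
        println!("sum = {}", sum(&v));
    }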

Executing binaries without execve?

I saw it mentioned somewhere that one can "emulate" execve (primarily with open and mmap) in order to load some other binary, without an actual execve syscall.
Are there any already implemented examples for it?
Can we load both static and dynamic binaries?
Can it be done portably?
Such a feature could be useful for delegating work to arbitrary binaries while ignoring filesystem execute bits, or under a seccomp policy that does not allow the actual execve syscall.

How does AppArmor do "Environment Scrubbing"?

The AppArmor documentation mentions giving applications the ability to execute other programs with or without environment scrubbing. Apparently a scrubbed environment is more secure, but the documentation doesn't seem to specify exactly how environment scrubbing happens.
What is environment scrubbing and what does AppArmor do to scrub the environment?
"Environment scrubbing" is the removal of various "dangerous" environment variables which may be used to affect the behaviour of a binary - for example, LD_PRELOAD can be used to make the dynamic linker pull in code which can make essentially arbitrary changes to the running of a program; some variables can be set to cause trace output to files with well-known names; etc.
This scrubbing is normally performed for setuid/setgid binaries as a security measure, but the kernel provides a hook to allow security modules to enable it for arbitrary other binaries as well.
The kernel's ELF loader code uses this hook to set the AT_SECURE entry in the "auxiliary vector" of information which is passed to the binary. (See here and here for the implementation of this hook in the AppArmor code.)
As execution starts in userspace, the dynamic linker picks up this value and uses it to set the __libc_enable_secure flag; you'll see that the same routine also contains the code which sets this flag for setuid/setgid binaries. (There is equivalent code elsewhere for binaries which are statically linked.)
__libc_enable_secure affects a number of places in the main body of the dynamic linker code, and causes a list of specific environment variables to be removed.
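As a small illustration (not part of the AppArmor code itself), a process can observe whether the kernel requested secure execution by reading AT_SECURE from its own auxiliary vector. This sketch assumes Linux and the Rust libc crate:

    // Check whether this process was started in "secure execution" mode,
    // i.e. whether AT_SECURE was set by the kernel's ELF loader.
    fn main() {
        // getauxval() reads the auxiliary vector the kernel passed at exec time.
        let secure = unsafe { libc::getauxval(libc::AT_SECURE) };
        if secure != 0 {
            // glibc's dynamic linker sets __libc_enable_secure in this case
            // and drops variables such as LD_PRELOAD before running the program.
            println!("secure execution: environment was scrubbed");
        } else {
            println!("normal execution: environment left as-is");
        }
    }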

Basic doubt in Oprofile

I am trying to profile my software (on Linux) with oprofile. My software consists of both a userspace component and a kernel module. First, what does the --separate=kernel option do? What is the difference when running without that option? I tried to see it but couldn't find any difference. Could you please post an example?
Can't I profile a kernel module without the --separate=kernel option?
Thanks,
Bala
When oprofile is used with the --separate=kernel option, it separates the kernel and kernel-module samples on a per-application basis.
--separate=library separates the samples for dynamically linked objects, again per application.
The kernel and dynamically linked objects are not specific to the application we want to profile, but at the same time our application may spend a considerable amount of time in them.
So --separate lets you attribute those samples to the application you are interested in profiling. It can also separate samples by individual thread.
The kernel can be profiled by providing the --vmlinux option to opcontrol.
Ex: opcontrol --vmlinux=/boot/vmlinux-2.6.27.23-0.1-preempt
--separate is an additional option that lets us view the samples at different resolutions.
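For example, a typical session with kernel and module samples separated per application might look like this (the vmlinux path is the one from the example above; opreport then shows the collected samples):

    opcontrol --vmlinux=/boot/vmlinux-2.6.27.23-0.1-preempt --separate=kernel
    opcontrol --start
    (run the workload you want to profile)
    opcontrol --stop
    opreport --symbols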
