I'm currently hacking my way through trying to make quasiquotes for writing Rust code inline in Haskell. I think I have the code generation work done (including things like marshaling Haskell types to and from generated Rust ones). I now have the problem of figuring out how to do all the compilation and linking from within Template Haskell. The pipeline is as follows:
1. The quasiquote gets parsed.
2. Source code is generated for
   - a corresponding Rust function,
   - the Haskell FFI imports, and
   - the Haskell call to the imported function
   (a hypothetical sketch of such generated code follows this list).
3. The Rust code gets compiled into a static library (like rustc --crate-type=staticlib qq_function.rs -o qq_function.a).
4. The Haskell code gets compiled and linked against qq_function.a (and a handful of other libraries like m, c, etc.).
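For a concrete picture of step 2, here is a minimal sketch of the generated Haskell side; the symbol name qq_function and the CLong types are stand-ins for whatever the quasiquoter actually emits:

{-# LANGUAGE ForeignFunctionInterface #-}
module Generated where

import Foreign.C.Types (CLong)

-- Hypothetical import of the Rust symbol, which would be compiled
-- into qq_function.a as an extern "C" function.
foreign import ccall unsafe "qq_function"
  qq_function :: CLong -> CLong -> CLong

-- The splice itself then expands to a call through that import:
example :: CLong
example = qq_function 1 2

The matching Rust side would be an extern "C" function with a #[no_mangle] name, compiled into the static library.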
My issue is getting steps 3 and 4 to happen entirely within Template Haskell. Here is as far as I've gotten:
- runIO can write out the Rust source files that I've generated.
- addDependentFile informs GHC that the generated Rust file is a dependency.
- addForeignFile regrettably does not work for automatically managing the compilation, since Rust is not a supported language (this is the approach inline-c takes, since C is a supported language).
- runIO could be used to generate the static Rust library (and delete the Rust source file afterwards) by calling out to rustc (a sketch of this follows the list).
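Putting those bullets together, a minimal sketch of how step 3 might happen inside a splice (fixed file names and no error handling here; a real quasiquoter would generate unique names per splice):

{-# LANGUAGE TemplateHaskell #-}
module CompileRust where

import Language.Haskell.TH (Q, runIO)
import Language.Haskell.TH.Syntax (addDependentFile)
import System.Process (callProcess)

-- Write out the generated Rust source, register it as a dependency,
-- and shell out to rustc to build the static library.
compileRustQQ :: String -> Q ()
compileRustQQ rustSrc = do
  runIO $ writeFile "qq_function.rs" rustSrc
  addDependentFile "qq_function.rs"
  runIO $ callProcess "rustc"
    ["--crate-type=staticlib", "qq_function.rs", "-o", "qq_function.a"]

Note that nothing in this sketch tells GHC to link against qq_function.a; that is exactly the remaining problem.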
What is still very much not clear to me is:

- how I can use Template Haskell to add libraries against which to link, and
- how I can use Template Haskell to clean up these generated libraries afterwards.
EDIT
I've filed a GHC feature request related to this.
Related
If you check the language percentages in the GitHub repository for the Rust compiler, it says that 97.6% of the Rust compiler is written in Rust. So how exactly does this work? How can you create a programming language (I think this is really about the compiler, since that's what reads the code, isn't it?) written in itself?
This is called self-hosting or bootstrapping. The basic idea goes like this:
1. Write an initial compiler for a small subset of Rust using your Other Programming Language of Choice. You now have compiler C0.
2. Using the subset of Rust you have a compiler for, rewrite the source for C0 purely in Rust. Compile that program using compiler C0 to form compiler C1.
3. Add features to Rust by adding code to the compiler you just wrote to properly parse and implement those features. Compile that Rust program with C1 to form compiler C2.
By repeating step (3) as many times as you’d like, you can add progressively more and more features to the Rust language, with the Rust compiler always being written in Rust itself.
There’s a famous talk called Reflections on Trusting Trust that talks about how this process works, as well as how you can use this process to do Nefarious Things.
I'm currently in the process of learning Rust. I'm mainly using The Rust Programming Language book and this nice reference which relates Rust features/syntax to C++ equivalents.
I'm having a hard time understanding where the core language stops and the standard library starts. I've encountered a lot of operators and/or traits which seem to have a special relationship with the compiler. For example, Rust has a trait (which, from what I understand, is like an interface) called Deref which lets a type implementing it be dereferenced using the * operator:
fn main() {
    let x = 5;
    let y = Box::new(x);

    assert_eq!(5, x);
    assert_eq!(5, *y);
}
Another example is the ? operator, which seems to depend on the Result and Option types.
Can code that uses those operators be compiled without the standard library? And if not, which parts of the Rust language depend on the standard library? Is it even possible to compile any Rust code without it?
The Rust standard library is in fact separated into three distinct crates:
- core, which is the glue between the language and the standard library. All types, traits and functions required by the language are found in this crate. This includes operator traits (found in core::ops), the Future trait (used by async fn), and compiler intrinsics. The core crate does not have any dependencies, so you can always use it.
- alloc, which contains types and traits related to or requiring dynamic memory allocation. This includes dynamically allocated types such as Box<T>, Vec<T> and String.
- std, which contains the whole standard library, including things from core and alloc, but also things with further requirements, such as file system access, networking, etc.
If your environment does not provide the functionality required by the std crate, you can choose to compile without it. If your environment also does not provide dynamic memory allocation, you can choose to compile without the alloc crate as well. This option is useful for targets such as embedded systems, or for writing operating systems, where you usually won't have everything the standard library requires.
You can use the #![no_std] attribute in the root of your crate to tell the compiler to compile without the standard library (only core). Many libraries also support "no-std" compilation (e.g. base64 and futures), where functionality may be restricted but the crate still works without the std crate.
DISCLAIMER: This is likely not the answer you're looking for. Consider reading the other answers about no_std if you're trying to solve a problem. I suggest you only read on if you're interested in trivia about the inner workings of Rust.
If you really want full control over the environment you use, it is possible to use Rust without the core library using the no_core attribute.
If you decide to do so, you will run into some problems, because the compiler is integrated with some items defined in core.
This integration works by applying the #[lang = "..."] attribute to those items, making them so-called "lang items".
If you use no_core, you'll have to define your own lang items for the parts of the language you'll actually use.
For more information I suggest the following blog posts, which go into more detail on the topic of lang items and no_core:
- Rust Tidbits: What Is a Lang Item?
- Oxidizing the technical interview
So yes, in theory it is possible to run Rust code without any sort of standard library or supplied types, but only if you supply the required types yourself.
Also, this is not stable, will likely never be stabilized, and is generally not a recommended way of using Rust.
When you're not using std, you rely on core, a subset of the std library that is always (?) available. This is what's called a no_std environment, which is commonly used for some types of "embedded" programming. You can find more about no_std in the Rust Embedded book, including some guidance on how to get started with no_std programming.
I have recently found Haskell's feature called "module signatures". As I have discovered, they are put in .hsig files and begin with the signature keyword instead of module.
The syntax of such a file may look like this:
signature Str where
data Str
empty :: Str
append :: Str -> Str -> Str
However, I cannot imagine how and why one would use them. Could you explain which problems they solve and how to properly make use of them?
They strongly remind me of the module system one can see in OCaml (link), which also has module signatures and separate implementations, but I can't decide how close these two concepts are. Are they somehow related?
They are related to the OCaml module system, but with some important differences:
- Signatures are defined within the language (in .hsig files), but unlike in OCaml they are not instantiated within the language. Instead, the package manager controls instantiation (currently, only Cabal provides that). Modules never know if they are importing an abstract signature or an actual module.
- Implementation modules know nothing about signatures and do not reference them directly. Any existing module can implement a signature if the definitions happen to be compatible.
- Instantiation is triggered by a coincidence of module name and signature name in the dependencies of some compilation unit (executable, library, test suite...). When the names coincide, a process called "signature matching" takes place that verifies that the types and definitions are compatible.
The "happy path" is that in your program you depend on some library having a signature "hole", and also on another library that provides an implementation module with the same name. Then signature matching happens automatically. When the names don't match, or we need multiple instantiations of the signature-using library, we have to rename signatures and/or modules in the mixins section of the Cabal file.
As for why module signatures might be useful, consider bytestring, the most popular library by far for handling binary data in Haskell. But there are others, for example stdio with its Bytes type.
Suppose you are writing your own library that uses binary data, and you don't want to force your users into either stdio or bytestring. What are your choices?
- One would be to create something like a Bytelike class and parameterize all your functions with it. You would also need to add a type parameter to every data type that contains bytes. (A sketch of this option follows the list.)
- Another would be to create a signature that defines an abstract binary data type and all the operations that are required of it. Your library would make use of the signature and remain "indefinite" until the user depends on both your library and a suitable implementation when creating his own libraries.
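As a rough sketch of the first option (the class name Bytelike comes from the list above; its method names are hypothetical):

module Bytelike where

-- A hypothetical class abstracting over byte-string-like types.
class Bytelike b where
  emptyBytes  :: b
  appendBytes :: b -> b -> b

-- Every byte-carrying data type now needs an extra type parameter...
data Message b = Message { header :: b, body :: b }

-- ...and every function carries the constraint.
render :: Bytelike b => Message b -> b
render m = header m `appendBytes` body m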
From the perspective of the user, the typeclass solution is unsatisfactory. The user knows that he wants to use either ByteString or Bytes, just one of them. The decision will not depend on some runtime flag and will remain constant across the extent of his program. And yet he has to deal with a more complex API that reminds him of that already decided issue at every turn.
It's better if he makes the decision once, writes it in his .cabal file, and deals with a simpler API from then onwards.
As described here, they're quite closely related to OCaml module signatures. They allow you to create a package that is missing some modules and say "these modules, containing such-and-such types and values, should be delivered by the package's user". I haven't tested it myself, but I imagine that such a package works very much like an OCaml functor.
I'm working on a program that needs to manipulate git repositories. I've decided to use libgit2. Unfortunately, the Haskell bindings for it are several years out of date and lack several functions that I require. Because of this, I've decided to write the portions that use libgit2 in C and call them through the FFI. For demonstration purposes, one of them is called git_update_repo.
git_update_repo works perfectly when used in a pure C program. However, when it's called from Haskell, an assertion fails, indicating that the libgit2 global init function, git_libgit2_init, hasn't been called. But git_libgit2_init is called by git_update_repo. And if I use gdb, I can see that git_libgit2_init is indeed called and reports that the initialization has been successful.
I've used nm to examine the executables and found something interesting. In a pure C executable, all the libgit2 functions are dynamically linked (as expected). However, in my Haskell executable, git_libgit2_init is dynamically linked, while the rest of the libgit2 functions are statically linked. I'm certain that this mismatch is the cause of my issue.
So why do certain functions get linked dynamically and others statically? How can I change this?
The relevant settings in my .cabal file are:

cc-options: -g
c-sources:
    src/git-bindings.c
extra-libraries:
    git2
I wondered whether there is a programming language which compiles to machine code/binary (not bytecode that is then executed by a VM; that's something completely different when considering typing) that features dynamic and/or weak typing, e.g.:
Think of a compiled language where:
- Variables don't need to be declared
- Variables can be created during runtime
- Functions can return values of different types
Questions:

- Is there such a programming language?
- If not, why not?
- I think that a dynamically yet strongly typed, compiled language would really make sense, but is it possible?
I believe Lisp fits that description.
http://en.wikipedia.org/wiki/Common_Lisp
Yes, it is possible. See Julia. It is a dynamic language (you can write programs without types) but it never runs on a VM. It compiles the program to native code at runtime (JIT compilation).
Objective-C might have some of the properties you seek. Classes can be opened and altered at runtime, and you can send any kind of message to an object, whether it usually responds to it or not. In that way, you can implement duck typing, much like in Ruby. The type id, roughly equivalent to a void*, can be endowed with interfaces that specify a contract that the (otherwise unknown) type will adhere to.
C# 4.0 has many, if not all of these characteristics. If you really want native machine code, you can compile the bytecode down to machine code using a utility.
In particular, the use of the dynamic keyword allows objects and their members to be bound dynamically at runtime.
Check out Anders Hejlsberg's video, The Future of C#, for a primer:
http://channel9.msdn.com/pdc2008/TL16/
Objective-C has many of the features you mention: it compiles to machine code and is effectively dynamically typed with respect to object instances. The id type can store any class instance and Objective-C uses message passing instead of member function calls. Methods can be created/added at runtime. The Objective-C runtime can also synthesize class instance variables at runtime, but local variables still need to be declared (just as in C).
C# 4.0 has many of these features, except that it is compiled to IL (bytecode) and interpreted using a virtual machine (the CLR). This brings up an interesting point, however: if bytecode is just-in-time compiled to machine code, does that count? If so, it opens the door to not only any of the .NET languages, but Python (see PyPy or Unladen Swallow or IronPython) and Ruby (see MacRuby or IronRuby) and many other dynamically typed languages, not to mention many LISP variants.
In a similar vein to Lisp, there is Factor, a concatenative* language with no variables by default, dynamic typing, and a flexible object system. Factor code can be run in the interactive interpreter, or compiled to a native executable using its deploy function.
* point-free, functional, stack-based
VB 6 has most of that
I don't know of any language that has exactly those capabilities. I can think of two that have a significant subset, though:
D has type inference, garbage collection, and powerful metaprogramming facilities, yet compiles to efficient machine code. It does not have dynamic typing, however.
C# can be compiled directly to machine code via the Mono project. C# has a similar feature set to D, but again without dynamic typing.
Compiling Python to C probably meets these criteria:

- Write in Python.
- Compile the Python code to an executable. See Process to convert simple Python script into Windows executable. Also see Writing code translator from Python to C?
Elixir does this. The flexibility of dynamic variable typing helps with doing hot-code updates (for which Erlang was designed). Files are compiled to run on the BEAM, the Erlang/Elixir VM.
C and C++ both indirectly support dynamic typing using void*. C++ example:

#include <cstdlib>
#include <new>
#include <string>

int main() {
    void* x = std::malloc(sizeof(int));
    *(int*)x = 5;                       // treat the allocation as an int
    std::free(x);

    x = std::malloc(sizeof(std::string));
    new (x) std::string("Hello world"); // placement new: construct a string in raw memory
    ((std::string*)x)->~basic_string(); // destroy it before freeing the memory
    std::free(x);
    return 0;
}
In C++17, std::any can be used as well:
#include <string>
#include <any>
int main() {
    std::any x = 5;
    x = std::string("Hello world");
    return 0;
}
Of course, duck typing is rarely used or needed in C/C++, and both of these options have issues (void* is unsafe, std::any is a huge performance bottleneck).
Another example of what you may be looking for is the V8 engine for JavaScript. It is a JIT compiler, meaning the source code is compiled to bytecode and then machine code at runtime, although this is hidden from the user.