Mechanics Fortran Preprocessor - scope

I recently happened to come across the preprocessing option most Fortran compilers support these days (as explained e.g. in the Fortran Wiki). Coming from a C background, I would like to better understand the mechanics and caveats related to the (Fortran-)preprocessor's #include directive.
To avoid any confusion right from the beginning: there are two include directives in Fortran (see e.g. F77 reference):
include "foo" is a compiler directive, i.e. foo can only contain Fortran statements
#include "bar" is a preprocessor directive, i.e. bar can contain #defines and the like
I am aware of this difference and I am interested in the second case only (my question is therefore not a duplicate of this post).
I'll explain my questions using an example: assume we have two files, a header file (macro.h) and a source file (display.F):
macro.h
#define MSG() say_hello()
display.F
#include "macro.h"
      PROGRAM display
      CALL MSG()
      CALL another_message()
      END
      SUBROUTINE say_hello()
      WRITE(*,*) 'Hello.'
      END
      SUBROUTINE another_message()
      CALL MSG()
      END
Here are my questions:
Scope
where (globally, locally in the SUBROUTINE etc.) is the macro MSG() defined if I include macro.h:
1. at the beginning of the file (as above)?
2. at the beginning of the PROGRAM display (and nowhere else)?
3. at the beginning of e.g. SUBROUTINE another_message() (and nowhere else)?
From testing it seems: 1. globally, 2. in PROGRAM and all SUBROUTINES, 3. in that SUBROUTINE only. A confirmation of these assumptions and some theoretical explanations why would be great.
Which of the above (1. - 3.) is best practice for preprocessor includes?
Include Guards
If I have a multi-file project and I include header.h in multiple *.F source files, do I need to provide include guards?
In case the answers to the above questions should be compiler dependent (as preprocessing is not Fortran standard), I'd be most interested in ifort's behaviour.

The rules are the same as for the C preprocessor you know. GCC even uses the same cpp for C and Fortran (for Fortran in the traditional mode). Therefore there is no notion of scope: everything is just text, and the preprocessor doesn't care about program units.
Therefore, in all three cases (1., 2. and 3.) the macro is valid from the place of its definition until the end of the file or until an #undef. It is also valid in recursively #included files.
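Because the same preprocessor rules apply, this behaviour can be sketched in C. The following is only a minimal illustration: the counter hello_calls and the names first_unit, second_unit and run_scope_demo are hypothetical and not part of the question. The macro is defined inside one function body, yet it still expands in the next function, and only #undef ends its lifetime.

```c
#include <stdio.h>

/* Hypothetical counter, used only to observe how often MSG() expanded. */
static int hello_calls = 0;

static void say_hello(void)
{
    hello_calls++;
    puts("Hello.");
}

static void first_unit(void)
{
/* Defined "inside" first_unit(), but the preprocessor does not see braces. */
#define MSG() say_hello()
    MSG();
}

static void second_unit(void)
{
    MSG(); /* still expands to say_hello(): live until #undef or end of file */
}

#undef MSG /* from this line on, MSG is no longer a macro */

/* Record whether MSG survived the #undef (it should not). */
#ifdef MSG
enum { MSG_DEFINED_AFTER_UNDEF = 1 };
#else
enum { MSG_DEFINED_AFTER_UNDEF = 0 };
#endif

/* Runs both units and returns how many times say_hello() was reached. */
int run_scope_demo(void)
{
    hello_calls = 0;
    first_unit();
    second_unit();
    return hello_calls;
}
```

Calling run_scope_demo() from a main function reaches say_hello() once per unit, which mirrors the PROGRAM and SUBROUTINE cases in display.F.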
If by guards you mean #undef, then yes: otherwise a warning or error about redefinition appears. But that only applies if you include all those files from a single file. If the source files are independent, then no.
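For readers who have the classic #ifndef include guard in mind, here is a rough sketch in C, relying on the same preprocessor rules. The header name header.h, the guard macro HEADER_H, the macro MSG_TEXT and the variable header_body_copies are all made up for illustration. The header body is pasted twice to mimic two #include "header.h" lines ending up in one source file.

```c
/* ---- first (simulated) #include "header.h" ---- */
#ifndef HEADER_H
#define HEADER_H
#define MSG_TEXT "Hello."
static int header_body_copies = 1; /* a second definition would not compile */
#endif

/* ---- second (simulated) #include "header.h": the guard skips the body ---- */
#ifndef HEADER_H
#define HEADER_H
#define MSG_TEXT "Goodbye."
static int header_body_copies = 2;
#endif

/* Returns 1, because only the first copy of the header body was compiled. */
int header_copies_compiled(void)
{
    return header_body_copies;
}

/* Returns the text from the first copy; the second #define was never seen. */
const char *guarded_message(void)
{
    return MSG_TEXT;
}
```

Each source file gets its own preprocessor run, so a guard like this only matters when a single file pulls the same header in more than once, directly or through other headers.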
The key is to think about the preprocessor as a text replacement tool. It knows nothing about Fortran.
Last thing, the preprocessor is non-standard, but widely available.

Related

Which code in LLVM IR runs before "main()"?

Does anyone know the general rule for exactly which LLVM IR code will be executed before main?
When using Clang++ 3.6, it seems that global class variables have their constructors called via a function in the ".text.startup" section of the object file. For example:
define internal void @__cxx_global_var_init() section ".text.startup" {
  call void @_ZN7MyClassC2Ev(%class.MyClass* @M)
  ret void
}
From this example, I'd guess that I should be looking for exactly those IR function definitions that specify section ".text.startup".
I have two reasons to suspect my theory is correct:
I don't see anything else in my LLVM IR file (.ll) suggesting that the global object constructors should be run first, if we assume that LLVM isn't sniffing for C++-specific function names like "__cxx_global_var_init". So section ".text.startup" is the only obvious means of saying that code should run before main(). But even if that's correct, we've identified a sufficient condition for causing a function to run before main(), but haven't shown that it's the only way in LLVM IR to cause a function to run before main().
The GNU linker, in some cases, will use the first instruction in the .text section as the program entry point. This article on Raspberry Pi programming describes causing the .text.startup content to be the first body of code appearing in the program's .text section, as a means of causing the .text.startup code to run first.
Unfortunately I'm not finding much else to support my theory:
When I grep the LLVM 3.6 source code for the string ".startup", I only find it in the Clang-specific parts of the LLVM code. For my theory to be correct, I would expect to have found that string in other parts of the LLVM code as well; in particular, parts outside of the C++ front-end.
This article on data initialization in C++ seems to hint at ".text.startup" having a special role, but it doesn't come right out and say that the Linux program loader actually looks for a section of that name. Even if it did, I'd be surprised to find a potentially Linux-specific section name carrying special meaning in platform-neutral LLVM IR.
The Linux 3.13.0 source code doesn't seem to contain the string ".startup", suggesting to me that the program loader isn't sniffing for a section with the name ".text.startup".
The answer is pretty easy: LLVM is not executing anything behind the scenes. It is the job of the C runtime (CRT) to perform all necessary preparations before running main(). This includes (but is not limited to) running static constructors and similar things. The runtime is usually informed about these objects via the addresses of the constructors being emitted into special sections (e.g. .init_array or .ctors). See e.g. http://wiki.osdev.org/Calling_Global_Constructors for more information.
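The mechanism can be observed from plain C with a small sketch. It relies on the GCC/Clang extension __attribute__((constructor)), which is not standard C; the flag ran_before_main and the function names are hypothetical. With Clang, functions registered this way should show up in the @llvm.global_ctors array of the IR rather than being identified by their section name.

```c
/* Hypothetical flag, set by the constructor below. */
static int ran_before_main = 0;

/* GCC/Clang extension: the compiler records a pointer to this function in a
   constructor table (.init_array on typical ELF targets), and the C runtime
   walks that table before it calls main(). */
__attribute__((constructor))
static void early_init(void)
{
    ran_before_main = 1;
}

/* Reports whether early_init() has already run. */
int startup_hook_ran(void)
{
    return ran_before_main;
}
```

If startup_hook_ran() is called as the first statement of main, it should already report that the hook has run, even though nothing in main invoked early_init().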

How to retrieve the type of architecture (linux versus Windows) within my fortran code

How can I retrieve the type of architecture (linux versus Windows) in my fortran code? Is there some sort of intrinsic function or subroutine that gives this information? Then I would like to use a switch like this every time I have a system call:
if (trim(adjustl(Arch))=='Linux') then
resul = system('ls > output.txt')
elseif (trim(adjustl(Arch))=='Windows') then
resul = system('dir > output.txt')
else
write(*,*) 'architecture not supported'
stop
endif
thanks
A.
The Fortran 2003 standard introduced the GET_ENVIRONMENT_VARIABLE intrinsic subroutine. A simple form of call would be
call GET_ENVIRONMENT_VARIABLE (NAME, VALUE)
which will return the value of the variable called NAME in VALUE. The routine has other optional arguments, your favourite reference documentation will explain all. This rather assumes that you can find an environment variable to tell you what the executing platform is.
If your compiler doesn't yet implement this standard approach it is extremely likely to have a non-standard approach; a routine called getenv used to be available on more than one of the Fortran compilers I've used in the recent past.
The 2008 standard introduced a standard function COMPILER_OPTIONS which will return a string containing the compilation options used for the program, if, that is, the compiler supports this sort of thing. This seems to be less widely implemented yet than GET_ENVIRONMENT_VARIABLE, as ever consult your compiler documentation set for details and availability. If it is available it may also be useful to you.
You may also be interested in the 2008-introduced subroutine EXECUTE_COMMAND_LINE which is the standard replacement for the widely-implemented but non-standard system routine that you use in your snippet. This is already available in a number of current Fortran compilers.
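The environment-variable idea can be sketched in C, where getenv plays the role of GET_ENVIRONMENT_VARIABLE. This assumes, and it is only an assumption, that Windows sets a variable named OS to a value starting with "Windows" (commonly Windows_NT) and that other systems do not. classify_os and platform_from_environment are hypothetical helper names.

```c
#include <stdlib.h>
#include <string.h>

/* Classify the value of the OS environment variable (NULL if it is unset).
   Assumption: Windows sets OS to a string beginning with "Windows". */
const char *classify_os(const char *os_value)
{
    if (os_value != NULL && strncmp(os_value, "Windows", 7) == 0)
        return "Windows";
    return "other";
}

/* Look the variable up at run time, as GET_ENVIRONMENT_VARIABLE would. */
const char *platform_from_environment(void)
{
    return classify_os(getenv("OS"));
}
```

Keeping the classification separate from the lookup makes it easy to check the logic with known values, since the real environment differs from machine to machine.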
There is no intrinsic function in Fortran for this. A common workaround is to use conditional compilation (through makefile or compiler supported macros) such as here. If you really insist on this kind of solution, you might consider making an external function, e.g., in C. However, since your code is built for a fixed platform (Windows/Linux, not both), the first solution is preferable.
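A rough sketch of the external-function idea, written in C and combined with conditional compilation: the platform is fixed at build time from macros the compiler predefines. _WIN32 and __linux__ are the usual names, but check your compiler's documentation. PLATFORM_NAME, LIST_COMMAND, platform_name and list_command are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Chosen once at compile time from compiler-predefined macros. */
#if defined(_WIN32)
#define PLATFORM_NAME "Windows"
#define LIST_COMMAND  "dir > output.txt"
#elif defined(__linux__)
#define PLATFORM_NAME "Linux"
#define LIST_COMMAND  "ls > output.txt"
#else
#define PLATFORM_NAME "unsupported"
#define LIST_COMMAND  NULL
#endif

/* Name of the platform this file was compiled for. */
const char *platform_name(void)
{
    return PLATFORM_NAME;
}

/* Directory-listing command for that platform, or NULL if unsupported. */
const char *list_command(void)
{
    return LIST_COMMAND;
}
```

The same #if/#elif structure could be placed directly in a preprocessed Fortran source file, provided the compiler defines comparable macros there, which would avoid the run-time string comparison from the question altogether.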

RcppArmadillo and RcppGSL

I would like to utilize both RcppArmadillo and RcppGSL via sourceCpp. Basically I am interested in modifying the B-spline example
http://dirk.eddelbuettel.com/blog/2012/12/08/
so that the B-splines are functions of R^3 instead of only R^1. This entails working with 3-dimensional arrays which apparently is not supported in GSL (there is an extension here http://savannah.nongnu.org/projects/marray though). However, RcppArmadillo has the arma::cube type which I could use if only I could get RcppArmadillo and RcppGSL to "work together." This I am unfortunately not able to do. I have looked at
Multiple plugins in cxxfunction
but have not succeeded in creating the mentioned combined plugin. Any help is greatly appreciated!
Adam
Edit: It seems like it is actually possible to compile a .cpp file with sourceCpp containing the following sequence of commands at the top:
// [[Rcpp::depends(RcppGSL)]]
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <RcppGSL.h>
#include <gsl/gsl_bspline.h>
Furthermore, it also seems possible to store values like
double gsl_vector_get (const gsl_vector * v, size_t i)
in an arma::cube structure.
RcppArmadillo.h and RcppGSL.h are modelled similarly. They first include RcppCommon.h, then some forward declarations, then they include Rcpp.h that uses those forward declarations, then implementations.
It is definitely possible to use them both if you come up with the right order of includes.
This is definitely an Rcpp question as what is preventing you from using them both is (good or bad) design decisions.
You need to study RcppArmadillo.h and RcppGSL.h and come up with the right include order, or wait for someone to follow these hints and give you the answer. I might not have time to do it myself in the next few days.
Armadillo types and GSL types are not interchangeable.
You could rewrite the GSL algorithm for Armadillo, but it is not automatic by any means. I am also not sure if the theory behind splines extends just like that from the real line to three dimensions.

What is the difference between _imp and __imp?

I came across an interesting error when I was trying to link to an MSVC-compiled library using MinGW while working in Qt Creator. The linker complained of a missing symbol that went like _imp_FunctionName. When I realized that it was due to a missing extern "C", and fixed it, I also ran the MSVC compiler with /FAcs to see what the symbols are. Turns out, it was __imp_FunctionName (which is also the way I've read on MSDN and quite a few guru bloggers' sites).
I'm thoroughly confused about how the MinGW linker complains about a symbol beginning with _imp, but is able to find it nicely although it begins with __imp. Can a deep compiler magician shed some light on this? I used Visual Studio 2010.
This is fairly straight-forward identifier decoration at work. The imp_ prefix is auto-generated by the compiler, it exports a function pointer that allows optimizing binding to DLL exports. By language rules, the imp_ is prefixed by a leading underscore, required since it lives in the global namespace and is generated by the implementation and doesn't otherwise appear in the source code. So you get _imp_.
Next thing that happens is that the compiler decorates identifiers to allow the linker to catch declaration mis-matches. Pretty important because the compiler cannot diagnose declaration mismatches across modules and diagnosing them yourself at runtime is very painful.
First there's C++ decoration, a very involved scheme that supports function overloads. It generates pretty bizarre looking names, usually including lots of ? and @ characters with extra characters for the argument and return types so that overloads are unambiguous. Then there's decoration for C identifiers, which is based on the calling convention. A cdecl function has a single leading underscore, an stdcall function has a leading underscore and a trailing @n that permits diagnosing argument declaration mismatches before they imbalance the stack. The C decoration is absent in 64-bit code, where there is (blissfully) only one calling convention.
So you got the linker error because you forgot to specify C linkage, the linker was asked to match the heavily decorated C++ name with the mildly decorated C name. You then fixed it with extern "C", now you got the single added underscore for cdecl, turning _imp_ into __imp_.

Origins of the name 'main' for program entry point?

Out of curiosity, what are the origins of the name 'main' for a program entry point?
Before C, there was IBM's PL/I. In PL/I you declared a procedure with options. If you wrote
PROC MUMBLE OPTIONS(MAIN);
that told the compiler that the MUMBLE procedure was the main procedure. PL/I may have adopted this convention from elsewhere, or C may have adopted it from PL/I, or maybe it was just in the air. But it definitely predates C.
(If anyone is wondering why all upper case, the IBM keypunches of the day did not support lower-case characters. Yes, I wrote programs on punched cards. That's probably why I'm a bit shaky on the syntax; it has been a while.)
I'm pretty sure that it has to do with the fact that it is the 'main' function of the program. Anything more than that is unknown to me.
In Fortran the main program was the main program even though it didn't have a name. It was distinguished from subroutines and functions by having an executable statement (or other non-commentary statement) without a preceding SUBROUTINE or FUNCTION statement.
When later languages decided they wanted the main routine to start with a beginning line like other procedures or functions, some of them adopted the word MAIN or main in various ways.
As someone else pointed out, Pascal did it differently. Shell scripts and Perl resemble Fortran.
My understanding (though I couldn't find a reference to confirm) is that some early languages had a notion of a main procedure (the first might have been Ada), even though you did not have to name it main().
I think that C was the first language to actually use this token as a name. C largely replaced Pascal which didn't have a named start procedure, if I remember correctly.
From there it influenced subsequent languages that were C inspired like C++, Java and C#.
It also influenced culturally languages that do not mandate such a function, like Python.
