Clang recursive include path - linux

I have a problem when including a dependency folder: the compiler does not search it for headers recursively.
FOLDER STRUCTURE:
- main.cpp
- dependency
  - sub1
    - header1.h
  - sub2
    - header2.h
  - root-header.h
main.cpp
#include "root-header.h"
#include "header1.h"
#include "header2.h"
int main() {
}
Command:
clang main.cpp -I"dependency"
Error:
fatal error: 'header1.h' file not found
The command only finds headers directly inside the dependency folder (one level deep). How can I make clang look up headers recursively in all of its subfolders? Are there any compiler arguments I should add?
Thanks

Section 6.10.2 of the ISO/IEC 9899:2011 standard (C11) explains the expected behavior of clang and other compilers:
# include <h-char-sequence> new-line
searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
You can extend the set of searched places by adding directories with the -I option, but compilers do not search the sub-directories of those places.
You can work around this limitation by using make to build a list of additional -I locations to add to your clang command. This is covered in @DanBonachea's answer.
Instead, I'd advise you to change the includes so that they work with this lookup behavior:
#include "sub1/header1.h"
#include "sub2/header2.h"

The conventional solutions are one of the following:
1. Change the include directives in the source code
This solution compiles with clang++ -Idependency main.cpp but modifies the #include directives to name each header's subdirectory, e.g.:
#include "sub1/header1.h"
#include "sub2/header2.h"
This is obviously a modification to the code, so usually only makes sense if sub1 and sub2 are meaningful within the larger structure of the software (e.g. package names that are always the same). Or...
2. Use shell tools to traverse the directory and build the include path
This solution uses find to put every subdirectory on the include path, e.g.:
$ clang++ `find ./dependency -type d -exec echo -I'{}' \;` main.cpp
which scans to identify the subdirectories and adds them to the preprocessor include path.
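The same idea can be split into a separate step, which makes the generated flags easy to inspect. This is only a sketch: the scratch directory and the include_flags variable are illustrative names that are not part of the answer, the layout mirrors the question, and it assumes directory names contain no whitespace.

```shell
# Recreate the question's layout in a scratch directory (demo only).
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir -p dependency/sub1 dependency/sub2
touch dependency/root-header.h dependency/sub1/header1.h dependency/sub2/header2.h

# One -I flag per directory, sorted so the search order is reproducible.
# Assumption: no directory name contains whitespace.
include_flags=$(find dependency -type d | LC_ALL=C sort | sed 's/^/-I/' | tr '\n' ' ')
include_flags=${include_flags% }   # drop the trailing space left by tr

echo "$include_flags"
```

The variable would then be passed unquoted, e.g. clang++ $include_flags main.cpp, so that the shell splits it into separate arguments.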
Discussion
Both of these approaches should work with few changes with basically any C/C++ compiler on UNIX (including Linux, macOS, WSL, etc.).
Note the second approach above will involve some additional filesystem churn on every compilation, which might be noticeable if the number of subdirectories is very large. To be fair this cost is fundamental to that use case, and even if built-in support for recursive include existed in the compiler frontend, it would still need to perform a similarly expensive recursive directory traversal on every compilation to find all the files.
3. Amortize directory traversal
However we can improve upon the second solution if we assume all the headers that will be included from this directory structure have unique names. This is a reasonable assumption, because otherwise the unqualified #include directives inside the source files will be ambiguous, leading to orthogonal problems. With this assumption in hand, we can create a cache to amortize the cost of the dependency directory traversal as follows:
$ mkdir allheaders ; cd allheaders
$ find ../dependency -type f -exec ln -s '{}' . \;
Then compilation (run from the parent directory of allheaders) simply becomes:
$ clang++ -Iallheaders main.cpp
Or, if you additionally want to support a mix of option 1 and option 3 #include directives, then:
$ clang++ -Idependency -Iallheaders main.cpp
This approach could greatly accelerate compilation, because the preprocessor only needs to open one user directory and open the files by basename. The fact that the directory may contain a large number of headers (with some fraction potentially unused) should not significantly degrade performance, since most modern filesystems index directory entries and can look up a name without scanning the whole directory.
If we further assume the file names in the dependency directory change infrequently or never, then we only need to execute the directory traversal step once, and can amortize that cost against repeated compilation using the allheaders cache directory.
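Since option 3 depends on every header having a unique name, it may be worth checking that assumption before building the cache. The sketch below does so in a scratch directory that mirrors the question's layout. The duplicate check and the -name '*.h' filter are additions for illustration, not part of the commands above.

```shell
# Recreate the question's layout in a scratch directory (demo only).
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir -p dependency/sub1 dependency/sub2 allheaders
touch dependency/root-header.h dependency/sub1/header1.h dependency/sub2/header2.h

# Refuse to build the cache if two headers share a basename, because the
# flat directory could then only hold one of them.
duplicates=$(find dependency -type f -name '*.h' | sed 's|.*/||' | LC_ALL=C sort | uniq -d)
if [ -n "$duplicates" ]; then
    echo "duplicate header names: $duplicates" >&2
    exit 1
fi

# Link every header into the flat cache directory, using relative links
# as in the answer (run inside a subshell so the caller's cwd is unchanged).
( cd allheaders && find ../dependency -type f -name '*.h' -exec ln -s '{}' . \; )

ls allheaders
```

If header names change later, the cache directory has to be rebuilt, otherwise stale or dangling links remain.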

Related

How is the -fprofile-prefix-path option supposed to work?

When compiling code for coverage instrumentation (to use with lcov later on), we're compiling from a base directory tree (let's call it A), and we want the .gcda files to be produced at a different place (because the target directory tree is different - let's call it B).
So, the compilation command looked like this:
gcc -O0 -g --coverage -fprofile-dir=B -c -fPIC -Wall -o A/otherpath/to/mySourceFile.o A/path/to/mySourceFile.c
When checking the contents of mySourceFile.o (with the strings command), I saw that the mySourceFile.gcda file was set to be generated in B/A/otherpath/to/mySourceFile.gcda
Which is the mangling of the path given through the -fprofile-dir option with the exact absolute path of the object file created - just as written in the documentation. So far, no problem - except that what I want would be the mySourceFile.gcda file to be generated from the B directory, WITHOUT the A part.
So, the documentation also mentions the -fprofile-prefix-path option, which is supposed to allow you to remove part of the path, so that the mangling doesn't add the old path to the new.
I tried using it in the following way:
gcc -O0 -g --coverage -fprofile-dir=B -fprofile-prefix-path=A -c -fPIC -Wall -o A/otherpath/to/mySourceFile.o A/path/to/mySourceFile.c
However, after checking through strings, once again, in the generated object file, the path was still B/A/otherpath/to/mySourceFile.gcda, whereas I expected it to be B/otherpath/to/mySourceFile.gcda (that is, I expected the A part to have been stripped by the -fprofile-prefix-path option.)
Obviously, it didn't work. Any insight as to why?
(The compiler used is GCC 11.2.1, which is recent enough to support the option.)
Ok, after some tinkering, I got results. Maybe not exactly what I was expecting, but close enough.
Let me start by saying that the A and B "directories" I mentioned in my question were absolute paths. And it didn't work well.
However, while keeping the absolute B (target) path, I tried not using the full A (source) path while compiling. More precisely, I didn't use it to specify the OUTPUT file name for the object. Instead, I went to the base directory (the A path) and ran the command with the output file path given relative to the current (A) directory.
Which would give the following command:
(From directory A)
gcc -O0 -g --coverage -fprofile-dir=B -fprofile-prefix-path=A -c -fPIC -Wall -o otherpath/to/mySourceFile.o path/to/mySourceFile.c
This time, the strings command did show an interesting result for the mySourceFile.gcda file:
B#otherpath#to#mySourceFile.gcda
As you can see, it's not exactly what I wanted (there are # instead of /), but the mentions of A disappeared, and overall, I'm confident it should work as intended. Not utterly sure yet (I still have to test it on the target platform, which will need tinkering with the way the makefiles currently work), but confident nonetheless.
Also, if I didn't use the -fprofile-prefix-path in the command, then the string would mention the A path, like this (with the '/' inside the A path being replaced with '#' characters, obviously):
B#A#otherpath#to#mySourceFile.gcda
So, the option works, but only when using relative paths, not when using absolute ones, for the object file. Hope that helps people.
PS: I checked by changing the path to the source (.c) file. Whether specified using absolute, or relative, paths, it didn't change the outcome. What matters is specifying the path to the object file in a relative manner.

GNU Make wildcard function and terminating slashes on directory names

I have some issues with the behavior of the wildcard function of GNU Make with respect to terminating slashes in the pattern and the output.
Consider the following simple directory structure:
dir
|
+-- file
|
+-- subdir
On Linux,
$(wildcard dir/*/) # (1)
evaluates with GNU Make 4.1 to
dir/subdir/ dir/file
but with GNU Make 4.3 to
dir/subdir/
One could argue whether including the regular file file in the former case is a bug or a feature (names of directories but not those of regular files are terminated with a slash). However, both versions of GNU Make evaluate
$(wildcard $(addsuffix /,$(wildcard dir/*))) # (2)
to
dir/file dir/subdir/
(subject to sorting). In particular, $(wildcard dir/file/) evaluates to dir/file. This is more in the spirit of the above GNU Make 4.1 feature but seems to be somewhat inconsistent with respect to GNU Make 4.3.
What can I assume from the wildcard function regarding a terminating slash in the pattern?
I would like to determine the content of a directory such that the names of subdirectories are terminated by a slash while the names of regular files are not. In GNU Make 4.1 I used approach 1 which broke my build with GNU Make 4.3. In both cases I could use approach 2. But is this feasible or do I rely on undefined behavior here? If so, what would be the correct (and efficient) way to do what I want?
The problem is not simple. The short answer is that the behavior of GNU make 4.3 is correct for the expansion of dir/*/, and the behavior of earlier versions of make that don't agree with it is wrong.
As for the behavior of dir/file/ that seems to me to be wrong in all versions of GNU make; that is, it should return the empty string.
However, GNU make doesn't actually implement its own file globbing, at least not on systems that provide the GNU libc C runtime library, which is most Linux systems. It simply calls the system-provided glob(3) function. I wrote a small test program that simply calls GNU libc's glob(3) function directly and it gives the same behavior as GNU make 4.3:
dir/*/ -> dir/subdir/
dir/file/ -> dir/file/
In my opinion this is a bug in GNU libc's glob(3) but perhaps I'm missing some subtlety here.
In any event, if what you really want is just directories then the best/safest/works everywhere solution is to use this:
$(wildcard dir/*/.)
then you don't have to worry about magical behaviors related to trailing slashes.
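For comparison, the shell's own pathname expansion treats a trailing slash in the way described above as correct: a pattern ending in / (or /.) can only match directories. The sketch below illustrates this in a scratch directory using the layout from the question. It does not call make, so it shows the globbing rule rather than any particular make version's behavior. The variable names are illustrative.

```shell
# Recreate the question's layout in a scratch directory (demo only).
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir -p dir/subdir
touch dir/file

with_slash=$(echo dir/*/)   # trailing slash: only directories can match
with_dot=$(echo dir/*/.)    # same idea as $(wildcard dir/*/.)
everything=$(echo dir/*)    # no trailing slash: files and directories

echo "dir/*/  -> $with_slash"
echo "dir/*/. -> $with_dot"
echo "dir/*   -> $everything"
```

In a makefile, the trailing /. left by the dir/*/. form could be removed afterwards with patsubst if plain directory names are needed.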
The function wildcard-rec in the GNUmake table toolkit does exactly what you want. It distinguishes between files and directories via an obvious feature: if the given glob ends in / then you want directories, if the / is absent you want files.
include gmtt.mk
$(info $(call wildcard-rec,**.c)) # all C source files in the tree
$(info $(call wildcard-rec,**.c **.h)) # all C source and header files
$(info $(call wildcard-rec,drivers/**.c)) # only sources for the `drivers` tree
$(info $(call wildcard-rec,drivers/**/test/)) # all test subdirectories in the `drivers` tree
$(info $(call wildcard-rec,drivers/**/test/*.cfg)) # config files in all test subdirectories in the `drivers` tree

Including header files in cygwin

As you know, the getch() and getche() functions don't work under Cygwin, which is a Linux-oriented environment.
But can I include the conio.h header file from Borland C and call getch() in my makefiles?
Will it work, and can anyone tell me how to include header files from different directories in Cygwin?
I have a header file strcal.h in directory c:/makk/string/.
How do I include that header file in my makefile?
gcc -I/string small.c
It is not working and my current directory is makk.
In stdio.h, there is a getchar() function which is what you need. You can't just bring across the Borland header file since that just declares the function, it doesn't define it. Standard C has no need for getch().
To include header files in different areas, you use the -I directives of gcc to set up search paths.
So, if you have a /xyz/myheader.h file, you can do something like:
gcc -I /xyz myprogram.c
To get at c:/makk/string/strcal.h, you may have to use gcc -I /cygdrive/c/makk/string or, if you know you're actually in that makk directory, you can use -I string (note the lack of leading / since you want a relative path, not an absolute one).

Where are include files stored - Ubuntu Linux, GCC

So, when we do the following:
#include <stdio.h>
versus
#include "myFile.h"
the compiler, GCC in my case, knows where that stdio.h (and even the object file) are located on my hard drive. It just utilizes the files with no interaction from me.
I think that on my Ubuntu Linux machine the files are stored at /usr/include/. How does the compiler know where to look for these files? Is this configurable or is this just the expected default? Where would I look for this configuration?
Since I'm asking a question about these include files, what is the source of the files? I know this might be fuzzy in the Linux community, but who manages these? Who would provide and manage the same files for a Windows compiler?
I was always under the impression that they come with the compiler but that was an assumption...
See here: Search Path
Summary:
#include <stdio.h>
When the include file is in brackets the preprocessor first searches in paths specified via the -I flag. Then it searches the standard include paths (see the above link, and use the -v flag to test on your system).
#include "myFile.h"
When the include file is in quotes the preprocessor first searches in the directory containing the current file (the file with the #include), then paths specified by -iquote, then -I paths, then the standard paths.
-nostdinc can be used to prevent the preprocessor from searching the standard paths at all.
Environment variables can also be used to add search paths.
When compiling if you use the -v flag you can see the search paths used.
gcc is a rich and complex "orchestrating" program that calls many other programs to perform its duties. For the specific purpose of seeing where #include "goo" and #include <zap> will search on your system, I recommend:
$ touch a.c
$ gcc -v -E a.c
...
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/lib/gcc/i686-apple-darwin9/4.0.1/include
/usr/include
/System/Library/Frameworks (framework directory)
/Library/Frameworks (framework directory)
End of search list.
# 1 "a.c"
This is one way to see the search lists for included files, including (if any) directories into which #include "..." will look but #include <...> won't. This specific list I'm showing is actually on Mac OS X (aka Darwin) but the commands I recommend will show you the search lists (as well as interesting configuration details that I've replaced with ... here;-) on any system on which gcc runs properly.
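To keep only the directory list from that verbose output, the lines between the two markers can be filtered out. The sketch below uses a hypothetical helper named extract_search_list and runs it on a shortened copy of the sample output above, so it works without a compiler installed.

```shell
# Hypothetical helper: print only the directories listed between the
# "#include <...> search starts here:" and "End of search list." markers.
extract_search_list() {
    sed -n '/^#include <\.\.\.> search starts here:/,/^End of search list\./p' |
        sed '1d;$d' |      # drop the two marker lines themselves
        sed 's/^ *//'      # strip the leading space gcc prints before each path
}

# Shortened sample of `gcc -v -E` output, used here in place of a real run.
sample_output=$(cat <<'EOF'
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/include
End of search list.
EOF
)

search_dirs=$(printf '%s\n' "$sample_output" | extract_search_list)
echo "$search_dirs"
```

With a real compiler, the input would come from something like gcc -xc -E -v /dev/null 2>&1 | extract_search_list, since the -v listing is written to standard error.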
Karl answered your search-path question, but as far as the "source of the files" goes, one thing to be aware of is that if you install the libfoo package and want to do some development with it (i.e., use its headers), you will also need to install libfoo-dev. The standard library header files are already in /usr/include, as you saw.
Note that some libraries with a lot of headers will install them to a subdirectory, e.g., /usr/include/openssl. To include one of those, just provide the path without the /usr/include part, for example:
#include <openssl/aes.h>
The #include files of gcc are stored in /usr/include.
The standard include files of g++ are stored in /usr/include/c++.

g++ searches /lib/../lib/, then /lib/

According to g++ -print-search-dirs my C++ compiler is searching for libraries in many directories, including ...
/lib/../lib/:
/usr/lib/../lib/:
/lib/:
/usr/lib/
Naively, /lib/../lib/ would appear to be the same directory as /lib/ — lib's parent will have a child named lib, "that man's father's son is my father's son's son" and all that. The same holds for /usr/lib/../lib/ and /usr/lib/
Is there some reason, perhaps having to do with symbolic links, that g++ ought to be configured to search both /lib/../lib/ and /lib/?
If this is unnecessary redundancy, how would one go about fixing it?
If it matters, this was observed on an unmodified install of Ubuntu 9.04.
Edit: More information.
The results are from executing g++ -print-search-dirs with no other switches, from a bash shell.
Neither LIBRARY_PATH nor LPATH is output from printenv, and both echo $LPATH and echo $LIBRARY_PATH return blank lines.
An attempt at an answer (which I gathered from a few minutes of looking at the gcc.c driver source and the Makefile environment).
These paths are constructed at run time from:
GCC exec prefix (see GCC documentation on GCC_EXEC_PREFIX)
The $LIBRARY_PATH environment variable
The $LPATH environment variable (which is treated like $LIBRARY_PATH)
Any values passed to -B command-line switch
Standard executable prefixes (as specified during compilation time)
Tooldir prefix
The last one (tooldir prefix) is usually defined to be a relative path:
From gcc's Makefile.in
# Directory in which the compiler finds libraries etc.
libsubdir = $(libdir)/gcc/$(target_noncanonical)/$(version)
# Directory in which the compiler finds executables
libexecsubdir = $(libexecdir)/gcc/$(target_noncanonical)/$(version)
# Used to produce a relative $(gcc_tooldir) in gcc.o
unlibsubdir = ../../..
....
# These go as compilation flags, so they define the tooldir base prefix
# as ../../../../, and the one of the library search prefixes as ../../../
# These get PREFIX appended, and then machine for which gcc is built
# i.e. i486-linux-gnu, to get something like:
# /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../i486-linux-gnu/lib/../lib/
DRIVER_DEFINES = \
-DSTANDARD_STARTFILE_PREFIX=\"$(unlibsubdir)/\" \
-DTOOLDIR_BASE_PREFIX=\"$(unlibsubdir)/../\" \
However, these are for compiler-version specific paths. Your examples are likely affected by the environment variables that I've listed above (LIBRARY_PATH, LPATH).
Well, theoretically, if /lib was a symlink to /drive2/foo, then /lib/../lib would point to /drive2/lib if I'm not mistaken. Theoretically...
Edit: I just tested and it's not the case - it comes back to /lib. Hrm :(
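The answer does not say how the test was run. One possible explanation, offered as an assumption: if it was done with cd in a shell, the result can be misleading, because cd normally removes a "name/.." pair textually, whereas the kernel (and therefore a compiler or linker opening the path) follows the symlink first and only then applies "..". A small sketch in a scratch directory, with hypothetical names root, drive2 and marker:

```shell
# Scratch layout: root/lib is a symlink to drive2/foo, and drive2/lib is
# a real directory containing a marker file.
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir -p root drive2/foo drive2/lib
touch drive2/lib/marker
ln -s ../drive2/foo root/lib

# Kernel path resolution: root/lib -> drive2/foo, ".." -> drive2, "lib" -> drive2/lib
kernel_view=$(ls root/lib/../lib)

# Shell logical resolution: cd collapses "lib/.." textually and ends up in
# root/lib, which is physically drive2/foo.
shell_view=$(cd root/lib/../lib && basename "$(pwd -P)")

echo "kernel sees the contents of drive2/lib: $kernel_view"
echo "cd ends up in directory: $shell_view"
```

So when /lib is a symlink, /lib/../lib and /lib can name different directories as far as the compiler is concerned, even though cd suggests otherwise.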
