Why is ARG_MAX not defined via limits.h?

On Fedora Core 7, I'm writing some code that relies on ARG_MAX. However, even if I #include <limits.h>, the constant is still not defined. My investigations show that it's present in <sys/linux/limits.h>, but this is supposed to be portable across Win32/Mac/Linux, so directly including it isn't an option. What's going on here?

The reason it's not in limits.h is that it's not a limit on the value range of an integral type for the current architecture; that's the role assigned to limits.h by the ISO standard.
The value you're interested in is not hardware-bound in practice, and it can vary from platform to platform and perhaps from system build to system build.
The correct thing to do is to call sysconf and ask it for _SC_ARG_MAX (the guaranteed minimum, _POSIX_ARG_MAX, is available as a macro in limits.h). I think that's the POSIX-compliant solution anyway.
According to my documentation, you include unistd.h or limits.h (or both) depending on which values you're requesting.
One other point: many implementations of the exec family of functions fail with errno set to E2BIG if you try to call them with an oversized argument list or environment. This is one of the defined conditions under which exec can actually return.
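As a minimal illustration (the command path here is just a placeholder; a real oversized argument list would be built at runtime), a sketch of detecting that failure:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* placeholder argv; in practice an oversized list is built dynamically */
    char *argv[] = { "/bin/true", NULL };
    execv("/bin/true", argv);
    /* execv only returns on failure, with errno set to the cause */
    if (errno == E2BIG)
        fprintf(stderr, "argument list and environment exceed the limit\n");
    else
        perror("execv");
    return 1;
}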

For the edification of future people like myself who find themselves here after a web search for "arg_max posix", here is a demonstration of the POSIXly-correct method for discerning ARG_MAX on your system that Thomas Kammeyer refers to in his answer:
cc -x c <(echo '
#include <unistd.h>
#include <stdio.h>
int main() { printf("%li\n", sysconf(_SC_ARG_MAX)); }
')
This uses the process substitution feature of Bash; put the same lines in a file and run cc thefile.c if you are using some other shell.
Here's the output for macOS 10.14:
$ ./a.out
262144
Here's the output for a RHEL 7.x system configured for use in an HPC environment:
$ ./a.out
4611686018427387903
$ ./a.out | numfmt --to=iec-i # 'numfmt' from GNU coreutils
4.0Ei
For contrast, here is the method prescribed by https://porkmail.org/era/unix/arg-max.html, which uses the C preprocessor:
cpp <<HERE | tail -1
#include <limits.h>
ARG_MAX
HERE
This does not work on Linux. I am not a systems programmer and not conversant in the POSIX or ISO specs, but the reason is probably the one explained above: on Linux the limit is not a fixed constant, so glibc's limits.h does not define the ARG_MAX macro.

ARG_MAX is defined in /usr/include/linux/limits.h. My Linux kernel version is 3.2.0-38.
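For illustration only (this is Linux-specific, so it is not an option for the portable code in the question), the constant can be read straight from that kernel header:

#include <linux/limits.h>  /* Linux-only kernel header */
#include <stdio.h>

int main(void) {
    /* a fixed compile-time constant; on modern kernels the runtime limit
       reported by sysconf(_SC_ARG_MAX) may differ */
    printf("%d\n", ARG_MAX);
    return 0;
}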


Clang recursive include path

I have a problem when including a dependency folder: the compiler doesn't look for headers recursively.
FOLDER STRUCTURE:
- main.cpp
- dependency
  - sub1
    - header1.h
  - sub2
    - header2.h
  - root-header.h
main.cpp
#include "root-header.h"
#include "header1.h"
#include "header2.h"
int main() {
}
Command:
clang main.cpp -I"dependency"
Error:
fatal error: 'header1.h' file not found
The command only finds headers one level deep inside the dependency folder. How can I make clang look up all headers recursively inside the dependency folder? Are there any compiler arguments to be added?
Thanks
The ISO/IEC 9899:2011 standard in section §6.10.2 explains the expected behavior of clang and other compilers:
# include <h-char-sequence> new-line
searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
You can add to the defined places with the -I option, but a compiler is not required to search their sub-directories.
You can work around this limitation in the spec by using shell tools to build a list of additional -I locations to add to your clang command. This is covered in @DanBonachea's answer below.
Instead, I'd advise you to change the includes to be compliant with the specification:
#include "sub1/header1.h"
#include "sub2/header2.h"
The conventional solutions are one of the following:
1. Change the include directives in the source code
This solution compiles with clang++ -Idependency main.cpp but modifies #include directives to include headers by subdirectory, eg:
#include "sub1/header1.h"
#include "sub2/header2.h"
This is obviously a modification to the code, so usually only makes sense if sub1 and sub2 are meaningful within the larger structure of the software (e.g. package names that are always the same). Or...
2. Use shell tools to traverse the directory and build the include path
This solution uses find to inject subdirectories on the include path, eg:
$ clang++ `find ./dependency -type d -exec echo -I'{}' \;` main.cpp
which scans to identify the subdirectories and adds them to the preprocessor include path.
Discussion
Both of these approaches should work with few changes with basically any C/C++ compiler on UNIX (incl Linux, macOS, WSL, etc).
Note the second approach above will involve some additional filesystem churn on every compilation, which might be noticeable if the number of subdirectories is very large. To be fair, this cost is fundamental to that use case: even if built-in support for recursive include existed in the compiler frontend, it would still need to perform a similarly expensive recursive directory traversal on every compilation to find all the files.
3. Amortize directory traversal
However we can improve upon the second solution if we assume all the headers that will be included from this directory structure have unique names. This is a reasonable assumption, because otherwise the unqualified #include directives inside the source files will be ambiguous, leading to orthogonal problems. With this assumption in hand, we can create a cache to amortize the cost of the dependency directory traversal as follows:
$ mkdir allheaders ; cd allheaders
$ find ../dependency -type f -exec ln -s '{}' . \;
Then compilation simply becomes:
$ clang++ -Iallheaders main.cpp
Or, if you additionally want to support a mix of option 1 and option 3 #include directives, then:
$ clang++ -Idependency -Iallheaders main.cpp
This approach could greatly accelerate compilation, because the preprocessor only needs to search one user directory and can open each header by basename. The fact that the directory may contain a large number of headers (some fraction potentially unused) should not significantly degrade performance, because directory lookups remain fast even for large directories on modern filesystems.
If we further assume the file names in the dependency directory change infrequently or never, then we only need to execute the directory traversal step once, and can amortize that cost against repeated compilation using the allheaders cache directory.

Is seteuid a system call on Linux?

All of the literature that I have read so far on setuid talks about seteuid in a way that implies it is a system call. The section 2 man pages never say if a function is a system call or not, so seteuid(2) is no help. And if it isn't a system call, meaning the functionality is not provided by the kernel, then how can "set effective UID" be achieved?
The section 2 man pages are all system calls -- that's what section 2 is for. The section 3 man pages are all library calls, as that's what section 3 is for. See man(1) (the manual page for man itself) for the list of sections and what they are:
1 Executable programs or shell commands
2 System calls (functions provided by the kernel)
3 Library calls (functions within program libraries)
4 Special files (usually found in /dev)
5 File formats and conventions eg /etc/passwd
6 Games
7 Miscellaneous (including macro packages and conventions), e.g. man(7), groff(7)
8 System administration commands (usually only for root)
9 Kernel routines [Non standard]
You can easily verify whether it is a system call or defined in libc by writing a little program and running strace on it. For example:
#include <unistd.h>

int main() {
    seteuid(1);  /* matches the setresuid(-1, 1, -1) seen in the strace output */
}
gcc -o main main.c
-bash-4.2$ strace ./main 2>&1 | grep set
setresuid(-1, 1, -1) = -1 EPERM (Operation not permitted)
So in this case seteuid is implemented in libc: glibc provides it as a wrapper that ends up invoking the setresuid system call, which is exactly what the strace output shows.
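To make the wrapper relationship concrete, here is a small sketch (Linux-specific; assumes <sys/syscall.h> is available) that issues the same raw system call the strace output shows glibc making:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    /* the raw call behind glibc's seteuid(1): setresuid(-1, 1, -1) */
    if (syscall(SYS_setresuid, -1, 1, -1) == -1)
        perror("setresuid");
    return 0;
}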

C program shows %zu after conversion to Windows

I compiled a Linux program on Windows via MinGW. However, the output of the program looks different on Windows than on Linux.
For example, on Windows the output is this (I get 'zu' instead of real numbers):
Approximated minimal memory consumption:
Sequence : zuM
Buffer : 1 X zuM = zuM
Table : 1 X zuM = zuM
Miscellaneous : zuM
Total : zuM
On Linux, the original program compiles (without MinGW) with a warning. On Windows, under MinGW, it compiles with zero warnings.
Is there anything I should be aware of?
Does MinGW offer 100% compatibility, or do I have to modify the program to work on Windows?
I don't know in which direction to head. Where should I start my attempt at fixing the program?
Do you think I would have better chances with Cygwin?
Update:
Wikipedia mentions this: "the lack of support for C99 has caused porting problems, particularly where printf-style conversion specifiers are concerned".
Is this the thing in which I bumped my head?
Update:
My mingw version is:
MINGWBASEDIR=C:\MinGW
gcc version 4.8.1 (GCC)
gcc version 4.8.1 (GCC)
GNU gdb (GDB) 7.6.1
GNU ld (GNU Binutils) 2.24
GNU windres (GNU Binutils) 2.24
GNU dlltool (GNU Binutils) 2.24
GNU Make 3.82.90
#define __MINGW32_VERSION 3.20
#define __W32API_VERSION 3.17
(I used this code to get the version:
@echo off
REM version-of-mingw.bat
REM credit to Peter Ward work in ReactOS Build Environment RosBE.cmd it gave me a starting point that I edited.
::
:: Display the current version of GCC, ld, make and others.
::
REM %CD% works in Windows XP, not sure when it was added to Windows
REM set MINGWBASEDIR=C:\MinGW
set MINGWBASEDIR=%CD%
ECHO MINGWBASEDIR=%MINGWBASEDIR%
SET PATH=%MINGWBASEDIR%\bin;%SystemRoot%\system32
if exist %MINGWBASEDIR%\bin\gcc.exe (gcc -v 2>&1 | find "gcc version")
REM if exist %MINGWBASEDIR%\bin\gcc.exe gcc -print-search-dirs
if exist %MINGWBASEDIR%\bin\c++.exe (c++ -v 2>&1 | find "gcc version")
if exist %MINGWBASEDIR%\bin\gcc-sjlj.exe (gcc-sjlj.exe -v 2>&1 | find "gcc version")
if exist %MINGWBASEDIR%\bin\gcc-dw2.exe (gcc-dw2.exe -v 2>&1 | find "gcc version")
if exist %MINGWBASEDIR%\bin\gdb.exe (gdb.exe -v | find "GNU gdb")
if exist %MINGWBASEDIR%\bin\nasm.exe (nasm -v)
if exist %MINGWBASEDIR%\bin\ld.exe (ld -v)
if exist %MINGWBASEDIR%\bin\windres.exe (windres --version | find "GNU windres")
if exist %MINGWBASEDIR%\bin\dlltool.exe (dlltool --version | find "GNU dlltool")
if exist %MINGWBASEDIR%\bin\pexports.exe (pexports | find "PExports" )
if exist %MINGWBASEDIR%\bin\mingw32-make.exe (mingw32-make -v | find "GNU Make")
if exist %MINGWBASEDIR%\bin\make.exe (ECHO It is not recommended to have make.exe in mingw/bin)
REM ECHO "The minGW runtime version is the same as __MINGW32_VERSION"
if exist "%MINGWBASEDIR%\include\_mingw.h" (type "%MINGWBASEDIR%\include\_mingw.h" | find "__MINGW32_VERSION" | find "#define")
if exist "%MINGWBASEDIR%\include\w32api.h" (type "%MINGWBASEDIR%\include\w32api.h" | find "__W32API_VERSION")
:_end
PAUSE
)
As suggested by the bug report discussion linked in the comments, Microsoft's printf functions do not support C99. The mingw-w64 project provides alternative functions that may be used as if they were the normal C99 functions if the macro __USE_MINGW_ANSI_STDIO is set to 1 either before including any headers or on the command line. They support the standard %zu, %jd, etc. format specifiers that even the newest MSVCRT versions do not. You may invoke the function directly using mingw_printf, but it is usually easier to just define the aforementioned macro to 1 and call printf, etc.
It is worth noting that if you use Microsoft's snprintf, it will return -1 to indicate truncation if the buffer is not large enough, unless the buffer and buffer-size parameters are NULL and 0 respectively, in which case it returns the number of bytes that would be output. C99 behavior is to always return the number of bytes that would be written if the buffer were sufficiently large, or a negative value if an encoding error occurs; the mingw-w64 implementation appears to behave correctly according to C99.
And all you need to do to get all of this standard behavior is either #define __USE_MINGW_ANSI_STDIO 1 before any includes if you use any of the printf functions or simply add -D__USE_MINGW_ANSI_STDIO=1 to your compiler invocation.
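For example, a minimal sketch (the variable is invented for demonstration) that prints a size_t correctly under MinGW once the macro is defined:

/* must come before any standard header is included */
#define __USE_MINGW_ANSI_STDIO 1
#include <stdio.h>

int main(void) {
    size_t n = 42;
    printf("%zu\n", n);  /* prints "42" rather than the literal "zu" */
    return 0;
}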
If you are worried about the macro interfering with other platforms, no other implementation except the original (legacy?) MinGW[32] project that provided similar functionality should actually make use of this preprocessor macro, so it is safe to define it unconditionally.

Where are include files stored - Ubuntu Linux, GCC

So, when we do the following:
#include <stdio.h>
versus
#include "myFile.h"
the compiler, GCC in my case, knows where that stdio.h (and even the object file) are located on my hard drive. It just utilizes the files with no interaction from me.
I think that on my Ubuntu Linux machine the files are stored at /usr/include/. How does the compiler know where to look for these files? Is this configurable or is this just the expected default? Where would I look for this configuration?
Since I'm asking a question about these include files, what is the source of the files? I know this might be fuzzy in the Linux community, but who manages these? And who would provide and manage the same files for a Windows compiler?
I was always under the impression that they come with the compiler but that was an assumption...
See the GCC preprocessor documentation: Search Path (https://gcc.gnu.org/onlinedocs/cpp/Search-Path.html)
Summary:
#include <stdio.h>
When the include file is in brackets the preprocessor first searches in paths specified via the -I flag. Then it searches the standard include paths (see the above link, and use the -v flag to test on your system).
#include "myFile.h"
When the include file is in quotes the preprocessor first searches in the current directory, then paths specified by -iquote, then -I paths, then the standard paths.
-nostdinc can be used to prevent the preprocessor from searching the standard paths at all.
Environment variables (such as CPATH and C_INCLUDE_PATH) can also be used to add search paths.
When compiling if you use the -v flag you can see the search paths used.
gcc is a rich and complex "orchestrating" program that calls many other programs to perform its duties. For the specific purpose of seeing where #include "goo" and #include <zap> will search on your system, I recommend:
$ touch a.c
$ gcc -v -E a.c
...
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/lib/gcc/i686-apple-darwin9/4.0.1/include
/usr/include
/System/Library/Frameworks (framework directory)
/Library/Frameworks (framework directory)
End of search list.
# 1 "a.c"
This is one way to see the search lists for included files, including (if any) directories into which #include "..." will look but #include <...> won't. This specific list I'm showing is actually on Mac OS X (aka Darwin) but the commands I recommend will show you the search lists (as well as interesting configuration details that I've replaced with ... here;-) on any system on which gcc runs properly.
Karl answered your search-path question, but as far as the "source of the files" goes, one thing to be aware of is that if you install the libfoo package and want to do some development with it (i.e., use its headers), you will also need to install libfoo-dev. The standard library header files are already in /usr/include, as you saw.
Note that some libraries with a lot of headers will install them to a subdirectory, e.g., /usr/include/openssl. To include one of those, just provide the path without the /usr/include part, for example:
#include <openssl/aes.h>
The #include files of gcc are stored in /usr/include.
The standard include files of g++ are stored in /usr/include/c++.

Get the path of link that it points to?

Is it possible to get the absolute path of the file that a symbolic link points to?
Is there any simple system command?
I need for all of the following OS
HP-UX 11i, 1123u, 1123i
AIX 5.2 and 5.3
Suse Linux 10
Solaris 10
You didn't specify a language, so I assume you want a command that can be run in whatever shell you are using. The ls command has the -l (that is an ell) option, which prints out a lot of information about the file; for a symbolic link, the last field of that output is the link's target (which may itself be a relative path). So you should be able to say
ls -l file | awk '{print $NF}'
on any SUS2 compliant machine (which should be all of the commercial UNIXes). This will have a problem if the file or the any of the directories leading up to the file have spaces though.
If you are looking for a system call, you want readlink(2). This is standardized, and so should be available on all POSIX compliant systems.
Here's an example of its usage, taken from the link given earlier:
#include <unistd.h>

char buf[1024];
ssize_t len;

/* readlink(2) does not NUL-terminate buf, so reserve one byte for '\0' */
if ((len = readlink("/modules/pass1", buf, sizeof(buf)-1)) != -1)
    buf[len] = '\0';
If you're looking for a command line utility, it doesn't look like there is one standardized, but GNU (Linux) and BSD both have readlink(1).
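Note that readlink returns the raw contents of the link, which may be a relative path. If you specifically need an absolute path, realpath(3) is also standardized by POSIX; here is a minimal sketch (reusing the hypothetical path from the example above):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char resolved[PATH_MAX];
    /* realpath expands all symlinks and produces a canonical absolute path */
    if (realpath("/modules/pass1", resolved) != NULL)
        printf("%s\n", resolved);
    else
        perror("realpath");
    return 0;
}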
