Maximum number of Bash arguments != max num cp arguments? - linux

I have recently been copying and moving a large number of files (~400,000). I know there are limits on the number of arguments that can be expanded on the Bash command line, so I have been using xargs to keep each invocation under the limit.
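For example, to copy in batches (the destination directory here is a placeholder, and cp -t assumes GNU coreutils):
find . -maxdepth 1 -type f -print0 | xargs -0 cp -t /dest/dir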
Out of curiosity, I wondered what the maximum number of arguments I could use was, and I found this post saying that it is system-dependent, and that I could run this command to find out:
$ getconf ARG_MAX
To my surprise, the answer I got back was:
2621440
Just over 2.6 million. As I said, the number of files I am manipulating is much smaller than this -- around 400k. I definitely need to use the xargs method of moving and copying these files, because I tried a normal mv * ... or cp * ... and got an 'Argument list too long' error.
So, do the mv and cp commands have their own fixed limit on the number of arguments that I can use (I couldn't find anything in their man pages), or am I missing something?

As Ignacio said, ARG_MAX is the maximum total length of the argument buffer passed to exec(), not the maximum number of files (this page has a very in-depth explanation). Specifically, it lists fs/exec.c as checking the following condition:
(PAGE_SIZE * MAX_ARG_PAGES - sizeof(void *)) / sizeof(void *)
And, it seems, you have some additional limitations:
On 32-bit Linux, this is ARG_MAX/4 - 1 (32767). This becomes relevant if the average length of arguments is smaller than 4.
Since Linux 2.6.23, this function tests whether the number exceeds MAX_ARG_STRINGS in <linux/binfmts.h> (0x7FFFFFFF = 2^31 - 1 = 2147483647).
As an additional limit, a single argument must not be longer than MAX_ARG_STRLEN (131072).
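You can check the relevant numbers on your own system (deriving MAX_ARG_STRLEN from the page size is an assumption, based on the kernel headers quoted further below):
$ getconf ARG_MAX      # total bytes available for argv + envp
$ getconf PAGESIZE     # usually 4096, giving MAX_ARG_STRLEN = 32 * 4096 = 131072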

ARG_MAX is the maximum length of the arguments to the exec(3) functions. A shell is not required to support passing arguments of this length from its command line.
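If you use GNU xargs, it can also report the limits it has computed for your current environment:
$ xargs --show-limits < /dev/null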

Related

SHC "Argument list too long" not able to resolve by increasing stack size to unlimited [duplicate]


How to get around the Linux "Too Many Arguments" limit

I have to pass 256 KB of text as an argument to the "aws sqs" command, but I am running into a command-line limit at around 140 KB. Many places state that this was solved in the Linux kernel as of 2.6.23.
But I cannot get it to work. I am using kernel 3.14.48-33.39.amzn1.x86_64.
Here's a simple example to test:
#!/bin/bash
SIZE=1000
while [ $SIZE -lt 300000 ]
do
    echo "$SIZE"
    VAR="$(head -c $SIZE < /dev/zero | tr '\0' 'a')"
    ./foo "$VAR"
    let SIZE="( $SIZE * 20 ) / 19"
done
And the foo script is just:
#!/bin/bash
echo -n "$1" | wc -c
And the output for me is:
117037
123196
123196
129680
129680
136505
./testCL: line 11: ./foo: Argument list too long
143689
./testCL: line 11: ./foo: Argument list too long
151251
./testCL: line 11: ./foo: Argument list too long
159211
So, the question is: how do I modify the testCL script so that it can pass 256 KB of data? BTW, I have tried adding ulimit -s 65536 to the script and it didn't help.
And if this is plainly impossible I can deal with that, but can you shed light on this quote from my link above?
"While Linux is not Plan 9, in 2.6.23 Linux is adding variable
argument length. Theoretically you shouldn't hit frequently "argument
list too long" errors again, but this patch also limits the maximum
argument length to 25% of the maximum stack limit (ulimit -s)."
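(That 25% coupling is easy to observe, since getconf ARG_MAX is derived from the current stack limit on kernels >= 2.6.23; the values below are illustrative:)
$ ulimit -s                                    # stack limit in KB, e.g. 8192
$ getconf ARG_MAX                              # e.g. 2097152 (= 8192 * 1024 / 4)
$ bash -c 'ulimit -s 65536; getconf ARG_MAX'   # e.g. 16777216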
edit:
I was finally able to pass <= 256 KB as a single command-line argument (see edit (4) at the bottom). However, please read carefully how I did it and decide for yourself whether this is a way you want to go. At least you should be able to understand why you are otherwise 'stuck', from what I found out.
With the coupling of ARG_MAX to ulimit -s / 4 came the introduction of MAX_ARG_STRLEN as the maximum length of a single argument:
/*
 * linux/fs/exec.c
 *
 * Copyright (C) 1991, 1992 Linus Torvalds
 */
...
#ifdef CONFIG_MMU
/*
 * The nascent bprm->mm is not visible until exec_mmap() but it can
 * use a lot of memory, account these pages in current->mm temporary
 * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we
 * change the counter back via acct_arg_size(0).
 */
...
static bool valid_arg_len(struct linux_binprm *bprm, long len)
{
        return len <= MAX_ARG_STRLEN;
}
...
#else
...
static bool valid_arg_len(struct linux_binprm *bprm, long len)
{
        return len <= bprm->p;
}
#endif /* CONFIG_MMU */
...
static int copy_strings(int argc, struct user_arg_ptr argv,
                        struct linux_binprm *bprm)
{
        ...
        str = get_user_arg_ptr(argv, argc);
        ...
        len = strnlen_user(str, MAX_ARG_STRLEN);
        if (!len)
                goto out;

        ret = -E2BIG;
        if (!valid_arg_len(bprm, len))
                goto out;
        ...
}
...
MAX_ARG_STRLEN is defined as 32 times the page size in linux/include/uapi/linux/binfmts.h:
...
/*
* These are the maximum length and maximum number of strings passed to the
* execve() system call. MAX_ARG_STRLEN is essentially random but serves to
* prevent the kernel from being unduly impacted by misaddressed pointers.
* MAX_ARG_STRINGS is chosen to fit in a signed 32-bit integer.
*/
#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
#define MAX_ARG_STRINGS 0x7FFFFFFF
...
The default page size is 4 KB, so you cannot pass arguments longer than 128 KB.
I can't try it now, but maybe switching to huge page mode (page size 4 MB), if possible on your system, solves this problem.
For more detailed information and references see this answer to a similar question on Unix & Linux SE.
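As a quick sanity check of the 128 KB figure on a stock 4 KB-page kernel (the off-by-one below accounts for the terminating NUL byte, which strnlen_user() counts):
$ /bin/true "$(head -c 131071 /dev/zero | tr '\0' a)" && echo fits   # 131071 chars + NUL = 131072
fits
$ /bin/true "$(head -c 131072 /dev/zero | tr '\0' a)"                # one byte over
bash: /bin/true: Argument list too long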
edits:
(1)
According to this answer one can change the page size of x86_64 Linux to 1 MB by enabling CONFIG_TRANSPARENT_HUGEPAGE and setting CONFIG_TRANSPARENT_HUGEPAGE_MADVISE to n in the kernel config.
(2)
After recompiling my kernel with the above configuration changes, getconf PAGESIZE still returns 4096.
According to this answer, CONFIG_HUGETLB_PAGE is also needed, which I could pull in via CONFIG_HUGETLBFS. I am recompiling now and will test again.
(3)
I recompiled my kernel with CONFIG_HUGETLBFS enabled and now /proc/meminfo contains the HugePages_* entries mentioned in the corresponding section of the kernel documentation.
However, the page size according to getconf PAGESIZE is still unchanged. So while I should now be able to request huge pages via mmap calls, the kernel's default page size, which determines MAX_ARG_STRLEN, is still fixed at 4 KB.
(4)
I modified linux/include/uapi/linux/binfmts.h to #define MAX_ARG_STRLEN (PAGE_SIZE * 64), recompiled my kernel, and now your code produces:
...
117037
123196
123196
129680
129680
136505
143689
151251
159211
...
227982
227982
239981
239981
252611
252611
265906
./testCL: line 11: ./foo: Argument list too long
279901
./testCL: line 11: ./foo: Argument list too long
294632
./testCL: line 11: ./foo: Argument list too long
So now the limit has moved from 128 KB to 256 KB, as expected.
I don't know about potential side effects though.
As far as I can tell, my system seems to run just fine.
Just put the arguments into some file, and modify your program to accept "arguments" from a file. A common convention (notably used by GCC and several other GNU programs) is that an argument like @/tmp/arglist.txt asks your program to read arguments from the file /tmp/arglist.txt, often one line per argument.
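Applied to the foo example above, a hypothetical file-based variant could look like this (foo-from-file and the temporary path are made-up names):
#!/bin/bash
# foo-from-file: read the 'argument' from the named file instead of argv,
# so the data never passes through execve() and MAX_ARG_STRLEN does not apply
wc -c < "$1"
Called as:
$ head -c 262144 /dev/zero | tr '\0' a > /tmp/arg.txt
$ ./foo-from-file /tmp/arg.txt
262144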
You might perhaps pass some data through long environment variables, but they are also limited (what the kernel actually limits is the size of main's initial stack, which contains both the program arguments and the environment).
Alternatively, modify your program to be configurable through some configuration file which would contain the information you want to pass through arguments.
(If you can recompile your kernel, you might try to increase ARG_MAX, which is #define-d in linux-4.*/include/uapi/linux/limits.h, to a bigger power of two much smaller than your available RAM, e.g. 2097152, before recompiling.)
Otherwise, beyond raising your stack limit (using setrlimit(2) with RLIMIT_STACK, generally via the ulimit builtin in the parent shell), there is no way to circumvent that limitation (see the execve(2) man page and its Limits on size of arguments and environment section). You need to deal with it some other way.

Getting length of /proc/self/exe symlink

As mentioned on SO, readlink on /proc/self/exe can be used to get the executable path on Linux. man 2 readlink recommends using lstat to find the required length of a path. However, when I stat /proc/self/exe, the st_size member is set to 0. How can I get the length for allocating the buffer?
Taken from man 2 lstat, under NOTES:
For most files under the /proc directory, stat() does not return
the file size in the st_size field; instead the field is returned
with the value 0.
That's why it does not work.
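You can see the same behavior from the shell (note that /proc/self resolves to each command's own process):
$ stat -c %s /proc/self/exe        # size of the symlink per lstat(): prints 0
$ readlink /proc/self/exe | wc -c  # actual target length, plus a trailing newline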
In practice, I would tend to use a reasonable size (e.g. 256 or 1024, or PATH_MAX) for readlink of /proc/*/exe (or /proc/self/exe).
The point is that executables are almost always started by humans, so either the PATH (for execvp(3) or some shell) or the entire file path is human-friendly. I don't know anyone who explicitly uses very long filenames (ones too wide to fit on a terminal screen). I have never heard of executable programs (or scripts) whose filename exceeds a hundred bytes.
So just use a local buffer of some reasonable size (and perhaps strdup it on success if needed). readlink(2) returns the number of meaningful bytes in its buffer, so if you really care, grow the buffer and loop until it fits.
For readlink of /proc/self/exe, I would do it into a 256-byte buffer at initialization, and abort (with a meaningful error message) if it does not fit (or if it fails, e.g. because /proc/ is not mounted).

Is it possible to increase the maximum number of characters that ksh variable accepts?

This is a follow up question to
What is the maximum number of characters that the ksh variable accepts?
I checked my environment and it's only allowing:
$ cpp << HERE | tail -1
> #include <limits.h>
> ARG_MAX
> HERE
1048576
Is there a way to increase this? Or are there any alternatives to
while read line
do
    # parse logic
done < "$filename"
to handle really long lines? Based on the records I'm parsing, the lines will not stop at 2M characters.
Environment Details :
AIX, ksh Version M-11/16/88f
You could compile a Linux 3.7.x kernel and edit its include/uapi/linux/limits.h file to increase ARG_MAX (to some bigger power of two, e.g. 2097152). But you should have a lot of RAM (e.g. 8 GB) if you want to increase it further.
The actual limit is related to execve(2). That man page has a paragraph on it.
But you could probably avoid having huge shell variables (in the Unix environment). Did you consider using some other tool (awk, python, perl, ...) to read your file? Their variables are not part of the shell environment transmitted to forked programs, so they can hold very long values. Maybe ksh has some builtin (unexport) to avoid exporting a variable into the Unix environment.
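For instance, a minimal awk sketch (the parse logic is a placeholder, and this assumes your awk build accepts very long records):
awk '{
    # parse logic goes here; awk variables are not exported to the
    # environment of forked programs, so execve() limits do not apply
    print length($0)
}' "$filename"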

Make/Execvp Error in Cygwin:

The following error occurs in make, while trying to do incremental builds:
make[2]: execvp: C:/path/to/compiler.exe: Message too long
I suspect my problem here is the argument length for execvp. Any idea what that limit is? How would one go about changing it?
Some curious extra information: the same command succeeds when the previous make dependencies are in a folder with a shorter name. Is the amount of memory available to execvp somehow affected by previous commands?
E.g. chopping 17 characters off the path to the incremental build files (of which there are hundreds) saves about 12k characters, and the 6k-character command line to the compiler succeeds. Without reducing that path, the same command line fails.
CreateProcess() from Windows has the following limitations:
1) lpCommandLine [in, out, optional]
The command line to be executed. The maximum length of this string is 32,768 characters, including the Unicode terminating null character.
2) The ANSI version of this function, CreateProcessA fails if the total size of the environment block for the process exceeds 32,767 characters.
I had a similar problem caused by limitation 2), but no good solution was found. Probably recompiling Cygwin with Unicode calls to CreateProcess() would help. For me it was sufficient to remove something from the environment.
Krzysztof Nowak
I'm getting this error because my %PATH% (which is taken from $PATH) is too long.
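A rough way to check how close you are to those 32 K limits from a Cygwin shell:
$ echo ${#PATH}   # length of PATH alone
$ env | wc -c     # approximate size of the whole environment block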
