Why is the character x used to mean extract in linux commands? - linux

Why is the character x used to mean extract in linux commands?
For example, in the command tar xf helloworld, the x means extract.
Why is x used to mean extract instead of k or u?

There isn't always a clear reason why certain letters are chosen for certain options, especially as there are only 26 letters in the alphabet and sometimes more than 26 options. In such cases, long options tend to be used, such as --extract, or a seemingly random letter is chosen. An example in the tar command you mentioned is -h for --dereference - this is simply what the author chose to use.
In this case, I believe that the -x is from eXtract. If you say "x" out loud, it sounds like "Ex" - the first syllable of the word.
-k is used as a shortcut for --keep-old-files and -u is used for --update, so they can't be used for --extract.
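For what it's worth, the mnemonic is easy to check yourself, since the short and long spellings are interchangeable (a minimal illustration, assuming GNU tar and the archive name from the question):
tar xf helloworld                       # short options: x = extract, f = file
tar --extract --file=helloworld         # the same command spelled with long options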

Related

How to (with good asymptotic complexity) delete all of a specific character from a (very long) line?

I have a line with 25.9 million characters, about 2.4 million of which are commas, and I want to remove all of the commas from the line.
If I use the command :s/,//g, it constructs a regular expression which is run repeatedly on the line until there are no commas left. This seems to run in O(n^2) time based on empirical measurement, and as a result the substitution runs for well over an hour on this line.
Using a macro is no good because of the redraw that occurs which tends to be somewhat expensive when you are in the middle of such a long line.
Splitting up the lines seems to be the best option, but due to the structure of the file, I'd need to create a new buffer to do so cleanly.
Yes, there are much better ways to output this much data that do not involve CSVs with ridiculous numbers of columns; let's assume I didn't generate it, but I have it, and I have to work with it.
Is there an asymptotically fast way to simply delete every occurrence of a specific character from a line in vim?
As a text editor, Vim isn't well suited to such pathologically formatted files (as you've already found out).
As others have already commented, tr is a good alternative for removing the commas. Either externally:
$ tr -d , < input.txt
Or from within Vim:
:.! tr -d ,
Vim also has a built-in low-level tr() function (see :help tr()). Unfortunately, it only handles conversion, not deletion. You could use it to change commas into semicolons in the current line like this:
:call setline('.', tr(getline('.'), ',', ';'))

In bash, how do I execute the contents of a variable, VERBATIM, as though they were a command line?

I need to construct a complex command that includes quoted arguments. As it happens, they are arguments to grep, so I'll use that as my example and deeply simplify the command to just enough to demonstrate the error.
Let's start with a working example:
> COMMAND='/usr/bin/grep _'
> echo $COMMAND
/usr/bin/grep _
> $COMMAND
foo <- I type this, and grep filters it out.
foo_ <- I type this, and.....
foo_ <- ... it matches, so grep emits it.
"foo" is not echoed back because it lacks an underscore, "foo_" has one, so it's returned. Let's get to a demonstration of the problem:
> COMMAND='/usr/bin/grep "_ _"'
> echo -E $COMMAND
/usr/bin/grep "_ _"
> /usr/bin/grep "_ _" <- The exact same command line
foo <- fails to match
foo_ _ <- matches, so it gets echoed back
foo_ _
> $COMMAND <- But that command doesn't work from a variable
grep: _": No such file or directory
In other words, when this command is invoked through a variable name, bash is taking the space between underscores as an argument delimiter - despite the quotes.
Normally, I'd fix this with backslashes:
> COMMAND='/usr/bin/grep "_\ _"'
> $COMMAND
grep: trailing backslash (\)
Okay, maybe I need another layer of escaping the backslash:
> COMMAND='/usr/bin/grep "_\\ _"'
12:32 (master) /Users/ronbarry> $COMMAND
grep: _": No such file or directory
And now we're back to square one - the command line is still being broken up at the space. I can, of course, verify all of this with some debugging, which establishes that the backslashes are surviving, unescaped, and grep is being called with multiple arguments:
> set -x
> $COMMAND
+ /usr/bin/grep '"_\\' '_"' <- grep is being called with two args
I have a solution to the problem that takes advantage of arrays, but packing commands this way (in my full implementation, which I'll spare you) is unfamiliar to most people who'd read my code. To oversimplify the creation of an array-based command:
> declare -a COMMAND=('/usr/bin/grep' '-i' 'a b')
12:44 (master) /Users/ronbarry> ${COMMAND[*]}
foo <- Same old, same old
fooa B <- ...
fooa B <- Matches because of case-insensitive (-i) grep.
Finally we get to the question. Why does bash break up quoted arguments in strings when interpreting them as commands and why doesn't there seem to be a string-y way to get it to work? If I have a command packed in a string variable, it violates the Principle of Least Surprise to have that string interpreted differently than the string itself would be. If someone can point me at some docs that cover all of this, and will set me at peace with why I have to resort to the infinitely uglier mechanism of building up arrays with all of my commands, I'd very much appreciate it.
Disclaimer: After writing the following, I almost decided that the question should be closed for encouraging opinion-based responses. This is an opinion-based response. Proceed at your own risk.
Why does bash break up quoted arguments in strings when interpreting them as commands
Because that's what it does. A more interesting question might be "Why does bash break up strings at all?", to which the only possible answer would be "it seemed like a good idea at the time".
Or, to put it another way: In the beginning, nobody thought of putting spaces into filenames. When you only had a few letters for a filename, you didn't waste any of them on spaces. So it seemed reasonable to represent a list of words as just a space-separated list of words, and that was the basis on which shell languages were developed. So the default behaviour of bash, like that of all unix-y shells, is to consider a string with whitespace in it to be a whitespace-separated list of words.
But, of course, that leads to all sorts of headaches, because strings are not structured data. Sometimes a filename does have whitespace in its name. And not all utility arguments are filenames, either. Sometimes you want to give an argument to a utility which is, for example, a sentence. Without that complication, shells were able to avoid making you type quotes, unlike "real" programming languages where strings need to be quoted. But once you decide that sometimes a space in a string is just another character, you need to have some kind of quoting system. So then the syntax of shells added several quoting forms, each with slightly different semantics. The most common is double-quoting, which marks the contents as a single word but still allows variable expansion.
It remains the case that shell quotes, like quotes in any other language, are simply syntactic constructs. They are not part of the string, and the fact that a particular character in a string was marked with a quote (or, equivalently, a backslash) is not retained as part of the string -- again, just like any other programming language. Strings are not really lists of words; they are just treated that way by default.
All of that is not very satisfactory. The nature of shell programming is that you really want a data structure which is a list of "words" -- or, better, a list of strings. And, eventually, shells got around to doing that. Unfortunately, by then there wasn't much syntactic space left in shell languages; it was considered important that the new features not change the behaviour of existing shell scripts. As far as I know, the current shell syntax for arrays was created by David Korn in 1988 (or earlier); eventually, bash also implemented arrays with basically the same syntax.
One of the curiosities in the syntax is that there are three ways of specifying that an entire array should be substituted:
${array[*]} or ${array[@]}: concatenate all the array elements together separated with the first character in $IFS, and then consider the result to be a whitespace-separated list of words.
"${array[*]}": concatenate all the array elements together separated with the first character in $IFS, and then consider the result to be a single word.
"${array[@]}": each array element is inserted as a separate word.
Of these, the first one is essentially useless; the second one is occasionally useful, and the third -- and most difficult to type -- is the one you almost always want.
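A quick way to see the difference (a hedged sketch; the array name arr and its contents are made up for illustration):
arr=('one' 'two words' 'three')
printf '<%s> ' ${arr[*]} ; echo      # <one> <two> <words> <three>   (re-split into words)
printf '<%s> ' "${arr[*]}"; echo     # <one two words three>          (one single word)
printf '<%s> ' "${arr[@]}"; echo     # <one> <two words> <three>      (elements preserved)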
In the above brief discussion, I left out any consideration of glob characters and filename expansion, and a number of other shell idiosyncrasies. So don't take it as a complete tutorial, by any means.
why doesn't there seem to be a string-y way to get it to work?
You can always use eval. Unfortunately. If you really really want to get bash to interpret a string as though it were a bash program rather than a string, and if you are prepared to open your script up to all manner of injection attacks, then the shell will happily give you enough rope. Personally, I would never allow a script which used eval to pass code review so I'm not going to expand on its use here. But it's documented.
If I have a command packed in a string variable, it violates the Principle of Least Surprise to have that string interpreted differently than the string itself would be.
Surprise is really in the eye of the beholder. There are probably lots of programmers who think that a newline character really occupies two bytes, and are Surprised when it turns out that in C, "\n"[0] is not a backslash. But I think most of us would be Surprised if it were. (I've tried to answer SO questions based on this misunderstanding, and it is not easy.)
Bash strings, regardless of anything else, are strings. They are not bash programs. Having them suddenly interpreted as bash programs would, in my opinion, not only be surprising but dangerous. At least if you use eval, there is a big red flag for the code reviewer.
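And, for completeness, a minimal sketch of the array-based form of the command from the question, using the quoted-@ expansion described above (the echo simply supplies a test line in place of interactive input):
COMMAND=('/usr/bin/grep' '_ _')
echo 'foo_ _' | "${COMMAND[@]}"    # prints: foo_ _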

The difference between arguments and options pertaining to the linux shell

I'm currently enrolled in an intro to Unix / Linux class and we came to a question that the instructor and I did not agree on.
cp -i file1 file2
Which is true about the preceding command?
A. There is only one utility
B. There is one option
C. There are three arguments
D. file1 will be copied as file2 and the user will be warned before an overwrite occurs
E. All of the above
I insisted that it was E. All of the above. The instructor has settled on D.
It seems clear that A, B, and D are all correct. The hang-up was C, and whether or not the -i flag is both an option and an argument.
My logic was that all options are arguments, but not all arguments are options; and since there are multiple true answers listed, multiple-choice tradition suggests the answer is more than likely E, all of the above.
I haven't been able to find the smoking gun on this issue and thought I would throw it to the masters.
I know this is an old thread, but I want to add the following for anyone else that may stumble into a similar disagreement.
$ ls -l junk
-rw-r--r-- 1 you 19 Sep 26 16:25 junk
"The strings that follow the program name on the command line, such as -l
and junk in the example above, are called the program's arguments. Arguments are usually options or names of files to be used by the command."
Brian W. Kernighan & Rob Pike, "The UNIX Programming Environment"
The manual page here states:
Mandatory arguments to long options are mandatory for short options too.
This seems to imply that, in the context of this particular question at least, you're not supposed to consider options to be arguments. Otherwise it becomes very recursive and kind of pointless.
I think the instructor should accept your explanation, though; this really is splitting hairs for most typical cases.
I think the term "arguments" is used in different ways in different contexts, which is the root of the disagreement, it seems. In support of your stance though, note that the C runtime, upon which cp was most likely written, declares the program entry point as main(argc, argv) (types elided), which seems to indicate at least that those who designed the C architecture/library/etc. thought of options as a subset of arguments. Then of course options can have their own arguments (different context), etc....
This is how I was taught, it is said in this case:
cp -i file1 file2
The right answers would be A, B, and D, but not C.
-i is an option, while file1 and file2 are arguments. Normally, options are considered to change the behaviour of an application or command, whereas arguments do not.
I suppose it comes down to semantics whether you consider -i an argument of the application, since it is a behaviour-changing option (or argument) of cp, but in plain English it is called an option, not an argument.
That's how I still define and maintain the distinction between the two parts of a command.
As another command example, consider cronjobs. I often use PHP cronjobs, and I normally have both options and arguments associated with the command. Options are always used (in my opinion) to define extra behaviour, while arguments are designed to provide the app and its behaviours with the data required to complete the operation.
Edit
I agree with @unwind that this is splitting hairs and often comes down to scenario and opinion. It was quite unfair of him to even mark it, really; he should have known this is a subjective question. Tests are completely unfair when filled with subjective questions.
Hmmm... I personally like to distinguish between options and arguments; however, you could technically say that options are arguments. I would say that you are correct, but I think your instructor settled on D because he doesn't want you to get them confused. For example, the following are all equivalent to the command above:
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG1 $ARG2 $ARG3
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG2 $ARG1 $ARG3
ARG1="-i" ; ARG2="file1" ; ARG3="file2" ; cp $ARG2 $ARG3 $ARG1
...whereas cp $ARG1 $ARG3 $ARG2 is not the same. I would say that options are a special type of argument.

How to solve a linear system in Linux shell?

Does anyone know of a Linux command that reads a linear system of equations from its standard input and writes the solution (if exists) in its standard output?
I want to do something like this:
generate_system | solve_system
You can probably write your own such command using this package.
This is an old question, but showed up in my searches for this problem, so I'm adding an answer here.
I used maxima's solve function. Wrangling the input/output to/from maxima is a bit of a challenge, but can be done.
prepare the system of equations as a comma-separated list -- for example, EQs="C[1]+C[2]=1,C[1]-C[2]=2". I wanted a solution for an unknown number of variables, so I used C[n], but you can use variable names.
prepare a list of variables you wish to solve for -- EQ_VARS="C[1],C[2]"
Maxima will echo all inputs, use line wrap, and return a solution in the form [C[1]=...,C[2]=..]. We need to resolve all of these.
Taken together, this becomes
OUT_VALS=( \
  $(maxima --very-quiet \
      --batch-string="display2d:false\$linel:9999\$print(map(rhs,float(solve([$EQs],[$EQ_VARS]))[1]))\$" \
    | tail -n 1 \
    | tr -c '0-9-.e' ' ') )
which will place the solution values into the array $OUT_VALS.
Note that this only properly handles that Maxima output if your problem is correctly constrained -- if you have zero, or more than one solution, the output will not be parsed correctly.

What character sequence should I not allow in a filename?

I found out after testing that Linux allows any character in a file name except for / and null (\0). So what sequences should I not allow in a filename? I heard a leading - may confuse some command-line programs; that doesn't matter to me, but it may bother other people if they decide to collect a bunch of files and filter them with some GNU programs.
It was suggested to me to remove leading and trailing spaces, and I plan to, but only because typically the user doesn't mean to have leading/trailing spaces.
What problematic sequences might there be, and what sequences should I consider not allowing?
I am also considering not allowing characters that are illegal on Windows, just for convenience. I think I may not allow dashes at the beginning (a dash is a legal Windows character).
Your question is somewhat confusing since you talk at length about Linux, but then in a comment to another answer you say that you are generating filenames for people to download, which presumably means that you have absolutely no control whatsoever over the filesystem and operating system that the files will be stored on, making Linux completely irrelevant.
For the purpose of this answer I'm going to assume that your question is wrong and your comment is correct.
The vast majority of operating systems and filesystems in use today fall roughly into three categories: POSIX, Windows and MacOS.
The POSIX specification is very clear on what a filename that is guaranteed to be portable across all POSIX systems looks like. The characters that you can use are defined in Section 3.276 (Portable Filename Character Set) of the Open Group Base Specification as:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789._-
The maximum filename length that you can rely on is defined in Section 13.23.3.5 (<limits.h> Minimum Values) as 14. (The relevant constant is _POSIX_NAME_MAX.)
So a filename which is up to 14 characters long and contains only the 65 characters listed above is safe to use on all POSIX-compliant systems, which gives you 24407335764928225040435790 combinations (or roughly 84 bits).
If you don't want to annoy your users, you should add two more restrictions: don't start the filename with a dash or a dot. Filenames starting with a dot are customarily interpreted as "hidden" files and are not displayed in directory listings unless explicitly requested. And filenames starting with a dash may be interpreted as an option by many commands. (Sidenote: it is amazing how many users don't know about the rm ./-rf or rm -- -rf tricks.)
This leaves you at 23656340818315048885345458 combinations (still 84 bits).
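A concrete, hedged illustration of the leading-dash problem (behaviour as in GNU coreutils; the file name -rf is just an example):
touch -- -rf     # create a file literally named -rf
rm -rf           # parsed as nothing but options; -f makes it exit silently, removing nothing
rm ./-rf         # works: the leading ./ keeps the name from looking like an option
                 # (rm -- -rf would work equally well: -- ends option parsing)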
Windows adds a couple of new restrictions to this: filenames cannot end with a dot and filenames are case-insensitive. This reduces the character set from 65 to 39 characters (37 for the first, 38 for the last character). It doesn't add any length restrictions, Windows can deal with 14 characters just fine.
This reduces the possible combinations to 17866587696996781449603 (73 bits).
Another restriction is that Windows treats everything after the last dot as a filename extension which denotes the type of the file. If you want to avoid potential confusion (say, if you generate a filename like abc.mp3 for a text file), you should avoid dots altogether.
You still have 13090925539866773438463 combinations (73 bits).
If you have to worry about DOS, then additional restrictions apply: the filename consists of one or two parts (separated by a dot), where neither of the two parts can contain a dot. The first part has a maximum length of 8 characters, the second of 3. Again, the second part is usually reserved to indicate the file type, which leaves you only 8 characters.
Now you have 4347792138495 possible filenames or 41 bits.
The good news is that you can use the 3 character extension to actually correctly indicate the file type, without breaking the POSIX filename limit (8+3+1 = 12 < 14).
If you want your users to be able to burn the files onto a CD-R formatted with ISO9660 Level 1, then you have to disallow hyphen anywhere, not just as the first character. Now, the remaining character set looks like:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789_
which gives you 3512479453921 combinations (41 bits).
I would leave the determination of what's "valid" up to the OS and filesystem driver. Let the user type whatever they want, and pass it on. Handle errors from the OS in an appropriate manner. The exception is I think it's reasonable to strip leading and trailing spaces. If people want to create filenames with embedded spaces or leading dashes or question marks, and their chosen filesystem allows it, it shouldn't be up to you to try to prevent them.
It's possible to mount different filesystems at different mount points (or drives in Windows) that have different rules regarding legal characters in a file name. Handling this sort of thing inside your application will be much more work than is necessary, because the OS will already do it for you.
Since you seem to be interested primarily in Linux, one thing to avoid is characters that the (typical) shell will try to interpret, for example, as a wildcard. You can create a file named "*" if you insist, but you might have some users who don't appreciate it much.
Are you developing an application where you have to ask the user to create files themselves? If that's what you are doing, then you can set the rules in your application (e.g. only allow [a-zA-Z0-9_.] and reject all other special characters). This is much simpler to enforce.
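A minimal sketch of such a whitelist rule (the function name sanitize and the exact policy, keeping only [A-Za-z0-9._-], mapping everything else to an underscore, and stripping leading dots and dashes, are assumptions rather than any standard):
sanitize() {
    # replace every character outside the whitelist with _ and strip a leading . or -
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_' | sed 's/^[.-]*//'
}
sanitize 'my report (final).txt'    # -> my_report__final_.txt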
urlencode all strings to be used as filenames and you'll only have to worry about length. This answer might be worth reading.
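A rough, hedged sketch of that idea in plain bash (the function name urlencode is made up, and the byte-by-byte loop assumes single-byte characters; in practice you would use whatever encoding routine your language already provides):
urlencode() {
    local s=$1 out='' c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [A-Za-z0-9._-]) out+=$c ;;                    # safe character: keep as-is
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;     # anything else: percent-encode
        esac
    done
    printf '%s\n' "$out"
}
urlencode 'my file?.txt'    # my%20file%3F.txt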
I'd recommend the use of a set of whitelist characters. In general, symbols in filenames will annoy people.
By all means allow people to use a-z, 0-9, and Unicode characters > 0x80, but do not allow arbitrary symbols; things like & and , will cause a lot of annoyance, as will full stops in inappropriate places.
I think the ASCII symbols which are safe to allow are: full stop, underscore, hyphen.
Allowing any OTHER ASCII symbols in the filename is asking for trouble.
A filename should also not start with an ASCII symbol. Policy on spaces in filenames is tricky, as users may expect to be able to use them, but some filenames are obviously silly (such as those which START with spaces).
