Lab for my Alt OS class and I'm unsure of this Linux Command - linux

I'm currently doing a lab for my Alt OS class and the professor gives multiple commands that you have to explain their function for. The one I'm stuck on is
find /home/ -user bob | xargs -d "\n" chown bill:bill
I understand that we are finding any items within bob's home folder and piping that to xargs which is delimiting something. I'm just unsure what the "\n" portion is doing. At the end, I understand we are taking whatever those results are and changing permissions to bill.

From man xargs:
--delimiter=delim, -d delim
Input items are terminated by the specified character. The specified delimiter may be a single character, a C-style character escape such as \n, or an octal or hexadecimal escape code. Octal and hexadecimal escape codes are understood as for the printf command. Multibyte characters are not supported. When processing the input, quotes and backslash are not special; every character in the input is taken literally. The -d option disables any end-of-file string, which is treated like any other argument. You can use this option when the input consists of simply newline-separated items, although it is almost always better to design your program to use --null where this is possible.
The \n escape sequence in C means a newline. -d '\n' is typically used with xargs to delimit items by newlines, that is, to read one item per line. There is a significant difference in quote handling:
$ echo "quote'not terminated" | xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
vs
$ echo "quote'not terminated" | xargs -d'\n'
quote'not terminated
The C escape sequences are listed on the cppreference page on escape sequences.
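To see the difference on real file names, here is a small sketch. It assumes GNU findutils (xargs -d is a GNU extension) and uses made-up file names in a throwaway directory; printf stands in for chown so that nothing is modified.

```shell
# Assumption: GNU find and xargs. The file names are invented for the demo.
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt"

# Default xargs splits on blanks, so "with space.txt" becomes two arguments.
default_count=$(find "$tmp" -type f | xargs printf '%s\n' | wc -l)

# With -d '\n' each input line is passed as exactly one argument.
newline_count=$(find "$tmp" -type f | xargs -d '\n' printf '%s\n' | wc -l)

# -print0 with -0 also survives newlines inside file names.
null_count=$(find "$tmp" -type f -print0 | xargs -0 printf '%s\n' | wc -l)

echo "default=$default_count newline=$newline_count null=$null_count"
rm -rf "$tmp"
```

On a GNU system the default run should report one argument more than the other two, because the space inside the file name splits it. The -print0 / -0 pair is the --null form that the man page recommends.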

Related

Using echo in bash puts last variable in front of the output

I'm trying to write a script and one of the parts of the script requires me to concatenate some variables together to create a URL.
REPO_URL='https://github.com/Example/Repo.Game/'
FILENAME='Example.Game-linux.zip'
latest_version="$(curl -LIs "${REPO_URL}/releases/latest" | grep -i '^location:' | cut -d' ' -f2 | cut -d'/' -f8)"
echo "$latest_version"
echo "$FILENAME"
echo "$REPO_URL"
echo "${REPO_URL}releases/download/${latest_version}/${FILENAME}"
Output:
2.0.5164
Example.Game-linux.zip
https://github.com/Example/Repo.Game/
/Example.Game-linux.ziple/Repo.Game/releases/download/2.0.5164
My actual output:
2.0.5164
Oxide.Rust-linux.zip
https://github.com/OxideMod/Oxide.Rust/
/Oxide.Rust-linux.zipideMod/Oxide.Rust/releases/download/2.0.5164
It looks like some kind of overflow problem? I'm not exactly sure. I added abcabc to the filename and the output became
/Oxide.Rust-linux.zipabcabc/Oxide.Rust/releases/download/2.0.5164
Any help would be appreciated.
I resolved the problem by removing the carriage return value from the variable.
tr -d '\r' seems to have resolved it. I'm not sure where the carriage return came from, and if anyone has advice on how to clean up this mess I would love to hear it.
latest_version="$(curl -LIs "${REPO_URL}/releases/latest" | grep -i '^location:' | cut -d' ' -f2 | cut -d'/' -f8 | tr -d '\r')"
You can use ANSI-C quoting and variable substitution (parameter expansion) to remove control characters from variables without having to invoke subshells.
ANSI-C quoting uses the special form $'...' to represent special characters. For example, use $'\t' for tab, $'\n' for newline and $'\r' for carriage return.
Variable substitution uses extra characters at the end of the variable name to perform actions on the variable. For example
${variable//[pattern]/[substitution]} will replace all instances of [pattern] in ${variable} with [substitution].
${variable%[pattern]} will remove [pattern] from ${variable} if it is at the end.
By combining these two, you can remove carriage-return characters from the end of your variable like this:
echo "${variable%$'\r'}"
Note: Variable substitution doesn't actually change the contents of the variable. To do that, you have to re-assign the result back to the variable:
variable="${variable%$'\r'}"
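As a quick check of the idea above, the following sketch simulates a value with a trailing carriage return and strips it. It builds the carriage return with printf so that it also runs in plain sh; in bash, $'\r' is equivalent. The variable names cr and clean_version are illustrative only.

```shell
# Simulate a value cut out of an HTTP header line, which ends in CR LF.
cr=$(printf '\r')
latest_version="2.0.5164${cr}"

# The stray CR sends the terminal cursor back to column 0, which is why the
# rest of the URL appeared to overwrite its beginning. od -c makes it visible.
printf '%s' "$latest_version" | od -c

# Strip one trailing CR and keep the result in a new variable.
clean_version="${latest_version%"$cr"}"
echo "length before: ${#latest_version}, after: ${#clean_version}"
```

The length drops by one once the carriage return is gone, and the remaining value can be concatenated into the URL safely.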
There is a cleaner way to get the version number, minus any trailing carriage-return, from github using sed.
latest_version=$(curl -LIs "${REPO_URL}/releases/latest" | sed -n 's/^Location:.*\/\([^\r]*\).*$/\1/p')
sed reads every line of input (STDIN by default) and performs operations on it defined by the action string parameter. The action string is a little tricky to explain in this case, but here goes:
The -n option suppresses the printing of each input line. Output will then only happen if it is explicitly stated in the action string.
The s/[pattern]/[substitution]/p construct says whenever you find [pattern], replace it with [substitution] and print it. Our [pattern] is ^Location:.*\/\([^\r]*\).*$, and our [substitution] is \1.
The expression ^ matches the beginning of the line.
The expression . means any single character, and the expression .* means any number of characters (including zero). This will match the largest possible string, so, for example .*/ will match abc/def/ in the string abc/def/ghi.
The expression \/ just escapes the forward slash (because we are using the forward slash as the delimiter, we have to escape it).
The expression \([pattern]\) says any time you find [pattern], remember it. In our case, it will remember whatever matches [^\r]*.
The expression [{chars}] matches any one of the characters in {chars}. [^{chars}] matches any character that is not in {chars}. So [^\r]* matches any number of characters that are not a carriage return.
The expression $ matches the end of a line.
The expression \1 is replaced by the first remembered pattern.
So altogether, our action string says:
If you find a line that starts with Location:, followed by any number of characters, followed by a /, followed by any number of characters that are not a carriage return (which will be remembered), followed by any number of characters, followed by an end of line, then print the remembered characters.
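The expression can be tried without touching the network. This sketch feeds sed a canned header line; the URL reuses the placeholder repository from the question and the /releases/tag/ layout is an assumption based on the cut fields used there. It relies on GNU sed understanding \r inside a bracket expression.

```shell
# A canned response header (assumed layout). Real headers end in CR LF,
# hence the \r\n added by printf below.
header='Location: https://github.com/Example/Repo.Game/releases/tag/2.0.5164'

# Same sed expression as in the answer, applied to the canned line.
latest_version=$(printf '%s\r\n' "$header" |
  sed -n 's/^Location:.*\/\([^\r]*\).*$/\1/p')

echo "version: ${latest_version} (${#latest_version} characters)"
```

Note that the expression is case-sensitive, unlike the grep -i used in the question, so a header spelled location: would not match as written.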

Find files with non-printing characters (null bytes)

I have got the log of my application with a field that contains strange characters.
I see these characters only when I use less command.
I tried to copy the result of my line of code in a text file and what I see is
CTP_OUT=^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
I'd like to know if there is a way to find these null characters. I have tried with a grep command but it didn't show anything
I hardly believe it, I might write an answer involving cat!
The characters you are observing are non-printable characters, which are often written in caret notation. Caret notation is a way to visualize non-printable characters. The null byte (NUL) is written ^@ in this notation.
If your file has non-printable characters, you can visualize them using cat -vET:
-E, --show-ends: display $ at end of each line
-T, --show-tabs: display TAB characters as ^I
-v, --show-nonprinting: use ^ and M- notation, except for LFD and TAB
source: man cat
I've added the -E and -T flag to it, to convert everything non-printable.
grep will not display non-printable characters in any visible form, so you have to pipe its output to cat to see them.
Show all lines with non-printable characters:
$ grep -E '[^[:print:]]' --color=never file | cat -vET
Here, the ERE [^[:print:]] selects all non-printable characters.
Show all lines with NULL:
$ grep -Pa '\x00' --color=never file | cat -vET
Be aware that we need to make use of the Perl regular expressions here as they understand the hexadecimal and octal notation.
Various control characters can be written in C language style: \n matches a newline, \t a tab, \r a carriage return, \f a form feed, etc.
More generally, \nnn, where nnn is a string of three octal digits, matches the character whose native code point is nnn. You can easily run into trouble if you don't have exactly three digits. So always use three, or since Perl 5.14, you can use \o{...} to specify any number of octal digits.
Similarly, \xnn, where nn are hexadecimal digits, matches the character whose native ordinal is nn. Again, not using exactly two digits is a recipe for disaster, but you can use \x{...} to specify any number of hex digits.
source: Perl 5 version 26.1 documentation
An example:
$ printf 'foo\012\011\011bar\014\010\012foobar\012\011\000\013\000car\012\011\011\011\012' > test.txt
$ cat test.txt
foo
bar
foobar
car
If we now use grep alone, we get the following:
$ grep -Pa '\x00' --color=never test.txt
car
But piping it to cat allows us to visualize the control characters:
$ grep -Pa '\x00' --color=never test.txt | cat -vET
^I^@^K^@car$
Why --color=never: if your grep is set up to use --color=always (for example through an alias), it adds escape sequences that the terminal interprets as color. These show up in the cat -v output and obscure the actual content.
$ grep -Pa '\x00' --color=always test.txt | cat -vET
^I^[[01;31m^[[K^@^[[m^[[K^K^[[01;31m^[[K^@^[[m^[[Kcar$
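If grep -P is not available, a rough check for NUL bytes can be sketched with cat -v, tr and wc. This is a different technique from the grep pipeline above, and the sample data is made up.

```shell
# Build a small sample with one NUL byte on the first line (invented data).
sample=$(mktemp)
printf 'a\000b\nplain line\n' > "$sample"

# cat -v renders the NUL byte in caret notation.
visible=$(cat -v "$sample")
printf '%s\n' "$visible"

# Count NUL bytes: delete everything except \000 and count what is left.
nul_count=$(tr -cd '\000' < "$sample" | wc -c)
echo "NUL bytes: $nul_count"
rm -f "$sample"
```

A non-zero count tells you the file is worth inspecting line by line with one of the grep or sed commands in the answers.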
sed can.
sed -n '/\x0/ { s/\x0/<NUL>/g; p}' file
-n skips printing any output unless explicitly requested.
/\x0/ selects for only lines with null bytes.
{...} encapsulates multiple commands, so that they can be collectively applied always and only when the /\x0/ has detected a null on the line.
s/\x0/<NUL>/g; substitutes in a new, visible value for the null bytes. You could make it whatever you want - I used <NUL> as something both reasonably obvious and yet unlikely to occur otherwise. You should probably grep the file for it first to be sure the pattern doesn't exist before using it.
p; causes lines that have been edited (because they had a null byte) to show.
This basically makes sed an effective grep for nulls.

Remove lines with japanese characters from a file

First question on here- I've searched around to put together an answer to this but have come up empty thus far.
I have a multi-line text file that I am cleaning up. Part of this is to remove lines that include Japanese characters. I have been using sed for my other operations but it is not working in this instance.
I was under the impression that using the -r switch and the \p{Han} regular expression would work (from looking at other questions of this kind), but it is not working in this case.
Here is my test string - running this returns the full string, and does not filter out the JP characters as I was expecting.
echo 80岁返老还童的处女: 第3话 | sed -r "s/\\p\{Han\}//g"
Am I missing something? Is there another command I should be using instead?
I think this might work for you:
echo "80岁返老还童的处女: 第3话" | tr -cd '[:print:]\n'
sed doesn't support Unicode classes AFAIK, nor does it support multibyte ranges.
-d deletes characters in SET1, and -c reverses it.
[:print:] matches all printable characters including space.
\n is a newline
The above will not only remove Japanese characters but all multibyte characters, including control characters.
Perl can also be used:
PERLIO=:utf8 perl -pe 's/\p{Han}//g' file
PERLIO=:utf8 tells Perl to treat input and output as UTF-8.
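Both commands above delete characters and keep the line. If the goal is to drop whole lines, as the question title says, a rough sketch with grep -v is shown next to the tr approach. It is an assumption-laden shortcut: it removes every line containing any byte outside printable ASCII, which is broader than Japanese text, and the sample lines are made up.

```shell
# Invented sample: two plain ASCII lines around the test string from the question.
sample=$(mktemp)
printf '%s\n' 'keep me' '80岁返老还童的处女: 第3话' 'also kept' > "$sample"

# Delete the non-ASCII characters but keep every line (the tr approach).
stripped=$(LC_ALL=C tr -cd '[:print:]\n' < "$sample")

# Drop every line that contains a byte outside printable ASCII.
# This also drops lines with accented letters, tabs or control characters.
ascii_lines=$(LC_ALL=C grep -av '[^ -~]' "$sample")

printf '%s\n' "$stripped" "---" "$ascii_lines"
rm -f "$sample"
```

LC_ALL=C makes both tools work on raw bytes, so the result does not depend on the terminal's locale.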

cut command in bash terminating on quotation marks

So I am trying to read in a file that has a bunch of lines with an email address and then a nickname in them. I am trying to extract this nickname, which is surrounded by parentheses, like below
email@somewhere.com (Tom)
so my thought was just to use cut to get at the word Tom, but this is foiled when I end up with something like the following
email2@somewhereElse.com ("Bob")
Because Bob has quotes around it, the cut command fails as follows
cut: <file>: Illegal byte sequence
Does anyone know of a better way of doing this? or a way to solve this problem?
Reset your locale to C (raw uninterpreted byte sequence) to avoid Illegal byte sequence errors.
locale charmap
LC_ALL=C cut ... | LC_ALL=C sort ...
I think that
grep -o '(.*)' emailFile
should do it. "Go through all lines in the file. Look for a sequence that starts with open parens, then any characters until close parens. Echo the bit that matches the string to stdout."
This preserves the quotes around the nickname... as well as the brackets. If you don't want those, you can strip them:
grep -o '(.*)' emailFile | sed 's/[(")]//g'
("replace any of the characters between square brackets with nothing, everywhere")
perl -lne 'print $1 if /\(([^)]*)\)/' emailFile
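Putting the locale advice and the grep/sed pipeline together might look like the sketch below. The sample file is made up to match the format in the question, and emails and nicknames are illustrative variable names.

```shell
# Invented sample in the format described in the question.
emails=$(mktemp)
printf '%s\n' 'email@somewhere.com (Tom)' 'email2@somewhereElse.com ("Bob")' > "$emails"

# LC_ALL=C makes the tools treat input as raw bytes, so text in an
# unexpected encoding cannot trigger "Illegal byte sequence".
# grep -o keeps only the parenthesized part; sed strips ( " and ).
nicknames=$(LC_ALL=C grep -o '(.*)' "$emails" | LC_ALL=C sed 's/[(")]//g')

printf '%s\n' "$nicknames"
rm -f "$emails"
```

If the quotes in the real file are typographic rather than plain ", they would need to be added to the sed bracket expression as well.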

Convert string to hexadecimal on command line

I'm trying to convert "Hello" to 48 65 6c 6c 6f in hexadecimal as efficiently as possible using the command line.
I've tried looking at printf and google, but I can't get anywhere.
Any help greatly appreciated.
Many thanks in advance,
echo -n "Hello" | od -A n -t x1
Explanation:
The echo program will provide the string to the next command.
The -n flag tells echo to not generate a new line at the end of the "Hello".
The od program is the "octal dump" program. (We will be providing a flag to tell it to dump it in hexadecimal instead of octal.)
The -A n flag is short for --address-radix=n, with n being short for "none". Without this part, the command would output an ugly numerical address prefix on the left side. This is useful for large dumps, but for a short string it is unnecessary.
The -t x1 flag is short for --format=x1, with the x being short for "hexadecimal" and the 1 meaning 1 byte.
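Not every shell's echo supports -n, so a sketch using printf instead may be more portable. The variable names spaced and compact are illustrative.

```shell
# printf '%s' prints the string without a trailing newline.
spaced=$(printf '%s' "Hello" | od -A n -t x1)
echo "$spaced"

# Remove the separating spaces (and any newlines od adds on longer input).
compact=$(printf '%s' "Hello" | od -A n -t x1 | tr -d ' \n')
echo "$compact"
```

The first form keeps the bytes separated as od prints them; the second yields one continuous hex string.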
If you want to do this and remove the spaces you need:
echo -n "Hello" | od -A n -t x1 | sed 's/ *//g'
The first two commands in the pipeline are well explained by @TMS in his answer, as edited by @James. The last command differs from @TMS's comment in that it is both correct and has been tested. The explanation is:
sed is a stream editor.
s is the substitute command.
/ opens a regular expression - any character may be used. / is conventional, but inconvenient for processing, say, XML or path names.
/ or the alternate character you chose, closes the regular expression and opens the substitution string.
In / */ the * matches any sequence of the previous character (in this case, a space).
/ or the alternate character you chose, closes the substitution string. In this case, the substitution string // is empty, i.e. the match is deleted.
g is the option to do this substitution globally on each line instead of just once for each line.
The quotes keep the command parser from getting confused - the whole sequence is passed to sed as the first option, namely, a sed script.
@TMS's brainchild (sed 's/^ *//') only strips spaces from the beginning of each line (^ matches the beginning of the line - 'pattern space' in sed-speak).
If you additionally want to remove newlines, the easiest way is to append
| tr -d '\n'
to the command pipes. It functions as follows:
| feeds the previously processed stream to this command's standard input.
tr is the translate command.
-d specifies deleting the match characters.
Quotes list your match characters - in this case just newline (\n).
Translate only matches single characters, not sequences.
sed is uniquely awkward when dealing with newlines. This is because sed is one of the oldest unix commands - it was created before people really knew what they were doing. Pervasive legacy software keeps it from being fixed. I know this because I was born before unix was born.
The historical origin of the problem was the idea that a newline was a line separator, not part of the line. It was therefore stripped by line processing utilities and reinserted by output utilities. The trouble is, this makes assumptions about the structure of user data and imposes unnatural restrictions in many settings. sed's inability to easily remove newlines is one of the most common examples of that malformed ideology causing grief.
It is possible to remove newlines with sed - it is just that all solutions I know about make sed process the whole file at once, which chokes for very large files, defeating the purpose of a stream editor. Any solution that retains line processing, if it is possible, would be an unreadable rat's nest of multiple pipes.
If you insist on using sed try:
sed -z 's/\n//g'
-z tells sed to use nulls as line separators.
Internally, a string in C is terminated with a null. The -z option is also a result of legacy, provided as a convenience for C programmers who might like to use a temporary file filled with C-strings and uncluttered by newlines. They can then easily read and process one string at a time. Again, the early assumptions about use cases impose artificial restrictions on user data.
If you omit the g option, this command removes only the first newline. With the -z option sed interprets the entire file as one line (unless there are stray nulls embedded in the file), terminated by a null and so this also chokes on large files.
You might think
sed 's/^/\x00/' | sed -z 's/\n//' | sed 's/\x00//'
might work. The first command puts a null at the front of each line on a line by line basis, resulting in \n\x00 ending every line. The second command removes one newline from each line, now delimited by nulls - there will be only one newline by virtue of the first command. All that is left are the spurious nulls. So far so good. The broken idea here is that the pipe will feed the last command on a line by line basis, since that is how the stream was built. Actually, the last command, as written, will only remove one null since now the entire file has no newlines and is therefore one line.
Simple pipe implementation uses an intermediate temporary file and all input is processed and fed to the file. The next command may be running in another thread, concurrently reading that file, but it just sees the stream as a whole (albeit incomplete) and has no awareness of the chunk boundaries feeding the file. Even if the pipe is a memory buffer, the next command sees the stream as a whole. The defect is inextricably baked into sed.
To make this approach work, you need a g option on the last command, so again, it chokes on large files.
The bottom line is this: don't use sed to process newlines.
echo hello | hexdump -v -e '/1 "%02X "'
Playing around with this further, a working solution is to remove the "*". It is not needed for the original requirement of simply removing spaces, and it gets in the way when substituting an actual character is desired, because " *" also matches the empty string. The version without it is as follows:
echo -n "Hello" | od -A n -t x1 | sed 's/ /%/g'
%48%65%6c%6c%6f
So, I consider this as an improvement answering the original Q since the statement now does exactly what is required, not just apparently.
Combining the answers from TMS and i-always-rtfm-and-stfw, the following works under Windows using gnu-utils versions of the programs 'od', 'sed', and 'tr':
echo "Hello"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
or in a CMD file as:
@echo "%1"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
A limitation on my solution is it will remove all double quotes (").
"tr -d '\42'" removes quote marks that the Windows 'echo' will include.
"tr -d '\r'" removes the carriage return, which Windows includes as well as '\n'.
The pipe (|) character must follow immediately after the string or the Windows echo will add that space after the string.
There is no '-n' switch to the Windows echo command.

Resources