In various bash scripts I have come across the following: $'\0'
An example with some context:
while read -r -d $'\0' line; do
echo "${line}"
done <<< "${some_variable}"
What does $'\0' return as its value? Or, stated slightly differently, what does $'\0' evaluate to and why?
It is possible that this has been answered elsewhere. I did search before posting, but the limited number of meaningful words in dollar-quote-backslash-zero-quote makes it very hard to get results from Stack Overflow search or Google. So, if there are duplicate questions, please allow some grace and link them from this one.
In bash, $'\0' is precisely the same as '': an empty string. There is absolutely no point in using the special Bash syntax in this case.
Bash strings are always NUL-terminated, so if you manage to insert a NUL into the middle of a string, it will terminate the string. In this case, the C-escape \0 is converted to a NUL character, which then acts as a string terminator.
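A minimal demonstration in bash:
$ [[ $'\0' == '' ]] && echo "they compare equal"
they compare equal
$ s=$'a\0b'; echo "${#s}"   # the NUL terminates the string after 'a'
1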
The -d option of the read builtin (which defines the delimiter character for the input) expects a single character in its argument. It does not check whether that character is the NUL character, so it is equally happy using the NUL terminator of '' or the explicit NUL in $'\0' (which is also just a NUL terminator, so there is effectively no difference). The effect, in either case, will be to read NUL-delimited data, as produced (for example) by find's -print0 option.
In the specific case of read -d '' line <<< "$var", it is impossible for $var to contain an internal NUL character (for the reasons described above), so line will be set to the entire value of $var with leading and trailing whitespace removed. (As @mklement notes, this will not be apparent in the suggested code snippet, because read will have a non-zero exit status even though the variable will have been set; read only returns success if the delimiter is actually found, and a NUL cannot be part of a here-string.)
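For example (a small sketch of the exit-status caveat):
$ var=$'  hello\nworld  '
$ read -r -d '' line <<< "$var"; echo "status=$? line=[$line]"
status=1 line=[hello
world]
The variable is set, but a loop condition testing read's status would see the failure and never run.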
Note that there is a big difference between
read -d '' line
and
read -d'' line
The first one is correct. In the second one, the argument word passed to read is just -d, which means that the option's argument will be the next word (in this case, line). read -d$'\0' line behaves the same way, because the NUL truncates the word to just -d; in either case, the space is necessary. (So, again, there is no need for the C-escape syntax.)
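A quick sketch of the difference: in the second form, read silently consumes line as -d's option-argument, so its first character, l, becomes the delimiter and the value lands in REPLY:
$ read -d '' x <<< "hello"; echo "x=[$x]"
x=[hello]
$ read -d'' line <<< "hello"; echo "REPLY=[$REPLY] line=[$line]"
REPLY=[he] line=[]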
To complement rici's helpful answer:
Note that this answer is about bash. ksh and zsh also support $'...' strings, but their behavior differs:
* zsh does create and preserve NUL (null bytes) with $'\0'.
* ksh, by contrast, has the same limitations as bash, and additionally interprets the first NUL in a command substitution's output as the string terminator (cuts off at the first NUL, whereas bash strips such NULs).
$'\0' is an ANSI C-quoted string that technically creates a NUL (0x0 byte), but effectively results in the empty (null) string (same as ''), because any NUL is interpreted as the (C-style) string terminator by Bash in the context of arguments and here-docs/here-strings.
As such, it is somewhat misleading to use $'\0' because it suggests that you can create a NUL this way, when you actually cannot:
You cannot create NULs as part of a command argument or here-doc / here-string, and you cannot store NULs in a variable:
echo $'a\0b' | cat -v # -> 'a' - string terminated after 'a'
cat -v <<<$'a\0b' # -> 'a' - ditto
In the context of command substitutions, by contrast, NULs are stripped:
echo "$(printf 'a\0b')" | cat -v # -> 'ab' - NUL is stripped
However, you can pass NUL bytes via files and pipes.
printf 'a\0b' | cat -v # -> 'a^#b' - NUL is preserved, via stdout and pipe
Note that it is printf that is generating the NUL via its single-quoted argument whose escape sequences printf then interprets and writes to stdout. By contrast, if you used printf $'a\0b', bash would again interpret the NUL as the string terminator up front and pass only 'a' to printf.
If we examine the sample code, whose intent is to read the entire input at once, across lines (I've therefore changed line to content):
while read -r -d $'\0' content; do # same as: `while read -r -d '' ...`
echo "${content}"
done <<< "${some_variable}"
This will never enter the while loop body, because stdin input is provided by a here-string, which, as explained, cannot contain NULs.
Note that read actually does look for NULs with -d $'\0', even though $'\0' is effectively ''. In other words: read by convention interprets the empty (null) string to mean NUL as -d's option-argument, because NUL itself cannot be specified for technical reasons.
In the absence of an actual NUL in the input, read's exit code indicates failure, so the loop is never entered.
However, even in the absence of the delimiter, the value is read, so to make this code work with a here-string or here-doc, it must be modified as follows:
while read -r -d $'\0' content || [[ -n $content ]]; do
echo "${content}"
done <<< "${some_variable}"
However, as @rici notes in a comment, with a single (multi-line) input string, there is no need to use while at all:
read -r -d $'\0' content <<< "${some_variable}"
This reads the entire content of $some_variable, while trimming leading and trailing whitespace (which is what read does with $IFS at its default value, $' \t\n').
@rici also points out that if such trimming weren't desired, a simple content=$some_variable would do.
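To illustrate the trimming (a quick sketch):
$ some_variable=$'   spaced value   '
$ read -r -d '' content <<< "$some_variable"
$ printf '[%s]\n' "$content"
[spaced value]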
Contrast this with input that actually contains NULs, in which case while is needed to process each NUL-separated token (but without the || [[ -n $<var> ]] clause). find -print0 outputs filenames that are each terminated by a NUL:
while IFS= read -r -d $'\0' file; do
echo "${file}"
done < <(find . -print0)
Note the use of IFS= read ... to suppress trimming of leading and trailing whitespace, which is undesired in this case, because input filenames must be preserved as-is.
It is technically true that the expansion $'\0' will always become the empty string '' (a.k.a. the null string) to the shell (not in zsh). Or, worded the other way around: $'\0' will never expand to an ASCII NUL (a byte with zero value) (again, not in zsh). It should be noted that it is confusing that the two names are so similar: NUL and null.
However, there is an additional (quite confusing) twist when we talk about read -d ''.
What read sees as the delimiter is the value '' (the null string).
What read does is split the input from stdin on the character $'\0' (yes, an actual 0x00).
Expanded answer.
The question in the title is:
In a bash script, what would $'\0' evaluate to and why?
That means that we need to explain what $'\0' expands to. And that is very easy: it expands to the null string '' (in most shells; not in zsh).
But the example of use is:
read -r -d $'\0'
That transforms the question into: what delimiter character does $'\0' expand to?
This holds a very confusing twist. To address that correctly, we need to take a full circle tour of when and how a NUL (a byte with zero value or '0x00') is used in shells.
Stream.
We need some NUL bytes to work with. It is possible to generate NUL bytes from the shell:
$ echo -e 'ab\0cd' | od -An -vtx1
61 62 00 63 64 0a ### That works in bash.
$ printf 'ab\0cd' | od -An -vtx1
61 62 00 63 64 ### That works in all shells tested.
Variable.
A variable in shell will not store a NUL.
$ printf -v a 'ab\0cd'; printf '%s' "$a" | od -An -vtx1
61 62
The example is meant to be executed in bash, as only bash's printf has the -v option.
But it clearly shows that a string containing a NUL will be cut at the NUL.
Simple variables cut the string at the zero byte, as is reasonable to expect if the string is a C string, which must end with a NUL \0: as soon as a NUL is found, the string ends.
Command substitution.
A NUL will work differently when used in a command substitution.
This code should assign a value to the variable $a and then print it:
$ a=$(printf 'ab\0cd'); printf '%s' "$a" | od -An -vtx1
And it does, but with different results in different shells:
### several shells just ignore (remove)
### a NUL in the value of the expanded command.
/bin/dash : 61 62 63 64
/bin/sh : 61 62 63 64
/bin/b43sh : 61 62 63 64
/bin/bash : 61 62 63 64
/bin/lksh : 61 62 63 64
/bin/mksh : 61 62 63 64
### ksh truncates the value at the NUL.
/bin/ksh : 61 62
/bin/ksh93 : 61 62
### zsh sets the var to actually contain the NUL value.
/bin/zsh : 61 62 00 63 64
/bin/zsh4 : 61 62 00 63 64
It deserves special mention that bash (version 4.4) warns about this:
/bin/b44sh : warning: command substitution: ignored null byte in input
61 62 63 64
In command substitution the zero byte is silently ignored by the shell.
It is very important to understand that that does not happen in zsh.
Now that we have all the pieces about NUL, we may look at what read does.
What read does on a NUL delimiter.
That brings us back to the command read -d $'\0':
while read -r -d $'\0' line; do
The $'\0' should have been expanded to a byte of value 0x00, but the shell cuts it, so it actually becomes ''.
That means that both $'\0' and '' are received by read as the same value.
Having said that, it may seem reasonable to write the equivalent construct:
while read -r -d '' line; do
And it is technically correct.
What a delimiter of '' actually does.
There are two sides to this point: one is what character follows the -d option of read; the other, addressed here, is what character read will actually use when given a delimiter of -d $'\0'.
The first side has been answered in detail above.
The second side is the very confusing twist: read will actually read up to the next byte of value 0x00 (which is what $'\0' represents).
To actually show that that is the case:
#!/bin/bash
# create a test file with some zero bytes.
printf 'ab\0cd\0ef\ngh\n' > tfile
while true; do
    read -r -d '' line; a=$?
    echo "exit $a"
    if [[ $a == 1 ]]; then
        printf 'last %s\n' "$line"
        break
    else
        printf 'normal %s\n' "$line"
    fi
done <tfile
When executed, the output will be:
$ ./script.sh
exit 0
normal ab
exit 0
normal cd
exit 1
last ef
gh
The first two exit 0 lines are successful reads, each done up to the next "zero byte", and both contain the correct values of ab and cd. The next read is the last one (as there are no more zero bytes) and contains the value $'ef\ngh' (yes, it also contains a newline).
All this goes to show (and prove) that read -d '' actually reads up to the next "zero byte", which is also known by the ASCII name NUL and should have been the result of a $'\0' expansion.
In short: we can safely state that read -d '' reads up to the next 0x00 (NUL).
Conclusion:
We must state that read -d $'\0' will effectively yield a delimiter of 0x00.
Using $'\0' is a better way to convey this meaning to the reader.
As a code style thing: I write $'\0' to make my intentions clear.
One, and only one, character used as a delimiter: the byte value of 0x00
(even if in bash it happens to be cut)
Note: any of these commands will print the hex values of the stream.
$ printf 'ab\0cd' | od -An -vtx1
$ printf 'ab\0cd' | xxd -p
$ printf 'ab\0cd' | hexdump -v -e '/1 "%02X "'
61 62 00 63 64
$'\0' expands the contained escape sequence \0 to the character it represents: a NUL byte, which the shell then treats as the string terminator, leaving an empty string.
This is BASH syntax. As per man BASH:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Known backslash escape sequences are also decoded.
Similarly $'\n' expands to a newline and $'\r' will expand to a carriage return.
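For example:
$ s=$'col1\tcol2\nrow2'
$ echo "$s"
col1	col2
row2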
Related
I am very confused about the read -r flag, or the meaning of "escape" in this context. The manual says regarding this flag:
-r = do not allow backslashes to escape any characters
But this seems to me to be the OPPOSITE of what the flag does. For example, running:
read -d '' VAR <<EOF
This is the \t first line
This is the second line
EOF
echo $VAR
... gives:
This is the t first line
This is the second line
But that seems to me as though the 't' character has NOT been escaped by the backslash. Conversely, when I add the -r flag, I get the following:
This is the first line
This is the second line
... where it appears to me as though the 't' character HAS been escaped due to the -r flag. So am I misunderstanding the meaning of the word "escape", or misunderstanding something else going on here?
I strongly suspect your confusion is caused by the manner in which you are determining the final content of the string. When backslashes are treated as an escape sequence (e.g., when you do not use -r), \t is treated the same as t. When they are not, it is treated as the literal two characters \t. Consider:
$ cat a.sh
#!/bin/sh
read a << 'EOF'
a: Without -r: foo\tbar
EOF
read -r b << 'EOF'
b: With -r : foo\tbar
EOF
printf "a = %s\n" "$a"
printf "b = %s\n" "$b"
printf "printf interprets the string: $a\n"
printf "printf interprets the string: $b\n"
$ ./a.sh
a = a: Without -r: footbar
b = b: With -r : foo\tbar
printf interprets the string: a: Without -r: footbar
printf interprets the string: b: With -r : foo bar
Thanks to everyone for their input. OK, this is one of those pesky things in bash that is clearer to me now, but had me confused initially. Here's my summary understanding.
There are, in a sense, three strings at play here:
The string of characters input to the heredoc,
The string of characters output from the heredoc and input to read,
The string of characters output from read and input to VAR
The string of characters being fed into the heredoc is, of course, whatever you type between the delimiters. But the string of characters output by the heredoc will depend on its own rules (viz. on whether the delimiter is quoted or not).
Next, the string of characters output by the heredoc will go into read, but the string of characters output by read (and saved into VAR) will depend on the presence/absence of the -r flag. If the string of characters input to read contains backslashes, then read without -r will interpret any such backslash-prefixed sequence -- thus modifying the string of characters -- before saving it into VAR.
But read -r will not attempt to interpret the backslashes, leaving the input text as-is when outputting to VAR. Hence, the original \t is preserved with read -r and later interpreted as a tab in the final echo $VAR.
My confusion primarily lay in my lack of discernment of the three separate strings of characters at play here (not echo vs printf).
The escaping that a backslash does as input to read, is to prevent the next character from being treated as a separator:
$ read -r a b <<< 'foo\ bar'; printf "<%s> <%s>\n" "$a" "$b"
<foo\> <bar>
$ read a b <<< 'foo\ bar'; printf "<%s> <%s>\n" "$a" "$b"
<foo bar> <>
Without it, backslashes are removed as part of the escape processing. With it, they are kept as-is.
Having the \t turn into a hard tab is due to echo, some implementations of it do that by default, some don't.
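A quick way to see this in bash, whose builtin echo keeps backslashes by default and only interprets them with -e (or with shopt -s xpg_echo):
$ v='a\tb'
$ echo "$v"
a\tb
$ echo -e "$v"
a	b
$ printf '%s\n' "$v"
a\tb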
Using ls -Q with --quoting-style=shell, newlines in file names (yes, I know...) are turned into ?. Is this a bug? Is there a way how to get the file names in a format that's 100% compatible with a shell (sh or bash if possible)?
Example (bash):
$ touch a$'\n'b
$ for s in literal shell shell-always c c-maybe escape locale clocale ; do
ls -Q a?b --quoting-style=$s
done
a?b
'a?b'
'a?b'
"a\nb"
"a\nb"
a\nb
‘a\nb’
‘a\nb’
coreutils 8.25 has the new 'shell-escape' quoting style, and in fact enables it by default to allow the output from ls to be always usable, and to be safe to copy and paste back to other commands.
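For example (output from a recent GNU ls; the exact quoting may vary by version):
$ touch a$'\n'b
$ ls --quoting-style=shell-escape a?b
'a'$'\n''b'
That output can be pasted straight back into bash, since it relies on the same $'...' syntax discussed above.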
Maybe not quite what you are looking for, but the "escape" style seems to work well with the upcoming ${...@E} parameter expansion in bash 4.4.
$ touch $'a\nb' $'c\nd'
$ ls -Q --quoting-style=escape ??? | while IFS= read -r fname; do echo "==${fname@E}=="; done
==a
b==
==c
d==
Here is the relevant part of the man page:
${parameter@operator}
    Parameter transformation. The expansion is either a transformation of
    the value of parameter or information about parameter itself, depending
    on the value of operator. Each operator is a single letter:

    Q      The expansion is a string that is the value of parameter quoted
           in a format that can be reused as input.
    E      The expansion is a string that is the value of parameter with
           backslash escape sequences expanded as with the $'...' quoting
           mechanism.
    P      The expansion is a string that is the result of expanding the
           value of parameter as if it were a prompt string (see PROMPTING
           below).
    A      The expansion is a string in the form of an assignment statement
           or declare command that, if evaluated, will recreate parameter
           with its attributes and value.
    a      The expansion is a string consisting of flag values representing
           parameter's attributes.

    If parameter is @ or *, the operation is applied to each positional
    parameter in turn, and the expansion is the resultant list. If
    parameter is an array variable subscripted with @ or *, the operation
    is applied to each member of the array in turn, and the expansion is
    the resultant list.

    The result of the expansion is subject to word splitting and pathname
    expansion as described below.
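A short illustration of the Q and E operators (requires bash 4.4 or later):
$ v=$'a\nb'
$ echo "${v@Q}"
$'a\nb'
$ e='a\nb'
$ printf '%s\n' "${e@E}"
a
b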
From a bit of experimentation, it looks like --quoting-style=escape is compatible with being wrapped in $'...', with two exceptions:
it escapes spaces by prepending a backslash; but $'...' doesn't discard backslashes before spaces.
it doesn't escape single-quotes.
So you could perhaps write something like this (in Bash):
function ls-quote-shell () {
ls -Q --quoting-style=escape "$#" \
| while IFS= read -r filename ; do
filename="${filename//'\ '/ }" # unescape spaces
filename="${filename//"'"/\'}" # escape single-quotes
printf "$'%s'\n" "$filename"
done
}
To test this, I've created a directory with a bunch of filenames with weird characters; and
eval ls -l $(ls-quote-shell)
worked as intended . . . though I won't make any firm guarantees about it.
Alternatively, here's a version that uses printf to process the escapes followed by printf %q to re-escape in a shell-friendly manner:
function ls-quote-shell () {
ls -Q --quoting-style=escape "$#" \
| while IFS= read -r escaped_filename ; do
escaped_filename="${escaped_filename//'\ '/ }" # unescape spaces
escaped_filename="${escaped_filename//'%'/%%}" # escape percent signs
# note: need to save in variable, rather than using command
# substitution, because command substitution strips trailing newlines:
printf -v filename "$escaped_filename"
printf '%q\n' "$filename"
done
}
but if it turns out that there's some case that the first version doesn't handle correctly, then the second version will most likely have the same issue. (FWIW, eval ls -l $(ls-quote-shell) worked as intended with both versions.)
I am very new to bash scripting.
I have a network trace file I want to parse. Part of the trace file is (two packets):
[continues...]
+---------+---------------+----------+
05:00:00,727,744 ETHER
|0
|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|
+---------+---------------+----------+
05:00:00,727,751 ETHER
|0
|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|56|00|00|3a|01|
[continues...]
For each packet, I want to print the time stamp and the length of the packet (the number of hex bytes on the line coming after the |0 header), so the output will look like:
05:00:00.727744 20 bytes
05:00:00.727751 24 bytes
I can get the line with time stamp and the packets separately using grep in bash:
times=$(grep '..\:..\:' $fileName)
packets=$(grep '..|..|' $fileName)
But I can't work with the separate output lines after that. The whole result is concatenated in the two variables "times" and "packets". How can I get the length of each packet?
P.S. a good reference that really explains how to do bash programming, rather than just doing examples would be appreciated.
Okay, with plain old shell...
You can get the length of the line like this:
line="|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|"
wc -c<<<$line
62
There are sixty-two characters in that line. Think of each byte as |00, where 00 can be any pair of hex digits. There's an extra | on the end, and the wc -c count includes the trailing newline.
So, if we take the value from wc -c and subtract 2, we get 60. If we divide that by 3, we get 20, which is the number of bytes.
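In other words, as a one-liner:
$ line="|00|03|a0|09|5c|1c|00|10|07|df|a4|20|08|00|45|00|00|38|e7|55|"
$ echo $(( ($(wc -c <<< "$line") - 2) / 3 ))
20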
Okay, now we need a little loop, figure out the various lines, and then parse them:
#!/bin/bash
while read line
do
    if [[ $line =~ ^[[:digit:]]{2} ]]
    then
        echo -n "${line% *} "
    elif [[ $line =~ ^\|[[:digit:]]{2} ]]
    then
        length=$(wc -c<<<$line)
        ((length-=2))
        ((length/=3))
        echo "$length bytes"
    fi
done < test.txt
There's a PURE BASH solution to your problem!
You're a beginning Bash programmer, and you have no idea what's going on...
Let's take this one step at a time:
A common way to loop through a file in BASH is using a while read loop. This combines the while with a read:
while read line
do
echo "My line is '$line'"
done < test.txt
Each line in test.txt is being read into the $line shell variable.
Let's take the next one:
if [[ $line =~ ^[[:digit:]]{2} ]]
This is an if statement. Always use the [[ ... ]] brackets because they fix issues with the shell interpolating stuff. Plus, they have a bit more power.
The =~ is a regular expression match. The [[:digit:]] matches any digit. The ^ anchors the regular expression to the beginning of the line, and {2} means I want exactly two of these. This says if I match a line that starts with two digits (which is your timestamp line), execute this if clause.
${line% *} is a pattern filter. The % says to match the smallest glob pattern from the right and remove it from my $line variable. I use this to remove the ETHER (and the space before it) from my line. The -n tells echo not to print a newline.
Let's take my elif which is an else if clause.
elif [[ $line =~ ^\|[[:digit:]]{2} ]]
Again, I am matching a regular expression. This regular expression starts with (The ^) a |. I have to put a backslash in front because | is a magical regular expression character and \ kills the magic. It's now just a pipe. Then, that's followed by two digits. Note this skips |0 but catches |00.
Now, we have to do some calculations:
length=$(wc -c<<<$line)
The $(...) says to execute the enclosed command and substitute its output back into the line. The wc -c counts the characters, and <<<$line is what we're counting. That gave us 62 characters. We have to subtract 2, then divide by 3. That's the next two lines:
((length-=2))
((length/=3))
The ((...)) allows me to do integer based math. The first subtracts 2 from $length and the next divides it by 3. Now, I can echo this out:
echo "$length bytes"
And that's our pure Bash answer to this question.
You really don't want to do such things with your shell.
You want to write a real parser that understands the format and outputs the needed information.
For a quick and dirty hack you can do something like that:
perl -wne 'print "$& " if /^\d\S*/; print split(/\|/)-2, " bytes\n" if /^\|..\|/'
I'm trying to convert "Hello" to 48 65 6c 6c 6f in hexadecimal as efficiently as possible using the command line.
I've tried looking at printf and google, but I can't get anywhere.
Any help greatly appreciated.
Many thanks in advance,
echo -n "Hello" | od -A n -t x1
Explanation:
The echo program will provide the string to the next command.
The -n flag tells echo to not generate a new line at the end of the "Hello".
The od program is the "octal dump" program. (We will be providing a flag to tell it to dump it in hexadecimal instead of octal.)
The -A n flag is short for --address-radix=n, with n being short for "none". Without this part, the command would output an ugly numerical address prefix on the left side. This is useful for large dumps, but for a short string it is unnecessary.
The -t x1 flag is short for --format=x1, with the x being short for "hexadecimal" and the 1 meaning 1 byte.
If you want to do this and remove the spaces you need:
echo -n "Hello" | od -A n -t x1 | sed 's/ *//g'
The first two commands in the pipeline are well explained by @TMS in his answer, as edited by @James. The last command differs from @TMS's comment in that it is both correct and has been tested. The explanation is:
sed is a stream editor.
s is the substitute command.
/ opens a regular expression - any character may be used. / is
conventional, but inconvenient for processing, say, XML or path names.
/ or the alternate character you chose, closes the regular expression and
opens the substitution string.
In / */ the * matches any sequence of the previous character (in this
case, a space).
/ or the alternate character you chose, closes the substitution string.
In this case, the substitution string // is empty, i.e. the match is
deleted.
g is the option to do this substitution globally on each line instead
of just once for each line.
The quotes keep the command parser from getting confused - the whole
sequence is passed to sed as the first option, namely, a sed script.
@TMS's brainchild (sed 's/^ *//') only strips spaces from the beginning of each line (^ matches the beginning of the line - the 'pattern space' in sed-speak).
If you additionally want to remove newlines, the easiest way is to append
| tr -d '\n'
to the command pipes. It functions as follows:
| feeds the previously processed stream to this command's standard input.
tr is the translate command.
-d specifies deleting the match characters.
Quotes list your match characters - in this case just newline (\n).
Translate only matches single characters, not sequences.
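Putting the whole pipeline together:
$ echo -n "Hello" | od -A n -t x1 | sed 's/ *//g' | tr -d '\n'
48656c6c6f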
sed is uniquely awkward when dealing with newlines. This is because sed is one of the oldest unix commands - it was created before people really knew what they were doing. Pervasive legacy software keeps it from being fixed. I know this because I was born before unix was born.
The historical origin of the problem was the idea that a newline was a line separator, not part of the line. It was therefore stripped by line processing utilities and reinserted by output utilities. The trouble is, this makes assumptions about the structure of user data and imposes unnatural restrictions in many settings. sed's inability to easily remove newlines is one of the most common examples of that malformed ideology causing grief.
It is possible to remove newlines with sed - it is just that all solutions I know about make sed process the whole file at once, which chokes for very large files, defeating the purpose of a stream editor. Any solution that retains line processing, if it is possible, would be an unreadable rat's nest of multiple pipes.
If you insist on using sed try:
sed -z 's/\n//g'
-z tells sed to use nulls as line separators.
Internally, a string in C is terminated with a null. The -z option is also a result of legacy, provided as a convenience for C programmers who might like to use a temporary file filled with C-strings and uncluttered by newlines. They can then easily read and process one string at a time. Again, the early assumptions about use cases impose artificial restrictions on user data.
If you omit the g option, this command removes only the first newline. With the -z option sed interprets the entire file as one line (unless there are stray nulls embedded in the file), terminated by a null and so this also chokes on large files.
You might think
sed 's/^/\x00/' | sed -z 's/\n//' | sed 's/\x00//'
might work. The first command puts a null at the front of each line on a line by line basis, resulting in \n\x00 ending every line. The second command removes one newline from each line, now delimited by nulls - there will be only one newline by virtue of the first command. All that is left are the spurious nulls. So far so good. The broken idea here is that the pipe will feed the last command on a line by line basis, since that is how the stream was built. Actually, the last command, as written, will only remove one null since now the entire file has no newlines and is therefore one line.
Simple pipe implementation uses an intermediate temporary file and all input is processed and fed to the file. The next command may be running in another thread, concurrently reading that file, but it just sees the stream as a whole (albeit incomplete) and has no awareness of the chunk boundaries feeding the file. Even if the pipe is a memory buffer, the next command sees the stream as a whole. The defect is inextricably baked into sed.
To make this approach work, you need a g option on the last command, so again, it chokes on large files.
The bottom line is this: don't use sed to process newlines.
echo hello | hexdump -v -e '/1 "%02X "'
Playing around with this further, a working solution is to remove the "*". It is unnecessary both for the original requirement (simply removing spaces) and when substituting an actual character is desired, as follows:
echo -n "Hello" | od -A n -t x1 | sed 's/ /%/g'
%48%65%6c%6c%6f
So, I consider this an improvement on the original answer, since the command now does exactly what is required, not just apparently.
Combining the answers from TMS and i-always-rtfm-and-stfw, the following works under Windows using gnu-utils versions of the programs 'od', 'sed', and 'tr':
echo "Hello"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
or in a CMD file as:
#echo "%1"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
A limitation on my solution is it will remove all double quotes (").
"tr -d '\42'" removes quote marks that the Windows 'echo' will include.
"tr -d '\r'" removes the carriage return, which Windows includes as well as '\n'.
The pipe (|) character must follow immediately after the string or the Windows echo will add that space after the string.
There is no '-n' switch to the Windows echo command.
How do I create an unmodified hex dump of a binary file in Linux using bash? The od and hexdump commands both insert spaces in the dump and this is not ideal.
Is there a way to simply write a long string with all the hex characters, minus spaces or newlines in the output?
xxd -p file
Or if you want it all on a single line:
xxd -p file | tr -d '\n'
Format strings can make hexdump behave exactly as you want it to (no whitespace at all, byte by byte):
hexdump -ve '1/1 "%.2x"'
1/1 means "each format is applied once and takes one byte", and "%.2x" is the actual format string, like in printf. In this case: 2-character hexadecimal number, leading zeros if shorter.
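For example:
$ printf 'Hi\n' | hexdump -ve '1/1 "%.2x"'
48690a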
It seems to depend on the details of the version of od. On OSX, use this:
od -t x1 -An file |tr -d '\n '
(That's print as type hex bytes, with no address. And whitespace deleted afterwards, of course.)
Perl one-liner:
perl -e 'local $/; print unpack "H*", <>' file
The other answers are preferable, but for a pure Bash solution, I've modified the script in my answer here to be able to output a continuous stream of hex characters representing the contents of a file. (Its normal mode is to emulate hexdump -C.)
I think this is the most widely supported version (requiring only POSIX defined tr and od behavior):
cat "$file" | od -v -t x1 -A n | tr -d ' \n'
This uses od to print each byte as hex, without addresses and without skipping repeated bytes, and tr to delete all spaces and linefeeds in the output. Note that not even the trailing linefeed is emitted here. (The cat is intentional, to allow multicore processing where cat can wait for the filesystem while od is still processing the previously read part. Single-core users may want to replace it with < "$file" od ... to save starting one additional process.)
tldr;
$ od -t x1 -A n -v <empty.zip | tr -dc '[:xdigit:]' && echo
504b0506000000000000000000000000000000000000
$
Explanation:
Use the od tool to print single hexadecimal bytes (-t x1) --- without address offsets (-A n) and without eliding repeated "groups" (-v) --- from empty.zip, which has been redirected to standard input. Pipe that to tr which deletes (-d) the complement (-c) of the hexadecimal character set ('[:xdigit:]'). You can optionally print a trailing newline (echo) as I've done here to separate the output from the next shell prompt.
References:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/od.html
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html
This code produces a "pure" hex dump string and it runs faster than the all the
other examples given.
It has been tested on 1GB files filled with binary zeros, and all linefeeds.
It is not data content dependent and reads 1MB records instead of lines.
perl -pe 'BEGIN{$/=\1e6} $_=unpack "H*"'
Dozens of timing tests show that for 1GB files, these other methods below are slower.
All tests were run writing output to a file which was then verified by checksum.
Three 1GB input files were tested: all bytes, all binary zeros, and all LFs.
hexdump -ve '1/1 "%.2x"' # ~10x slower
od -v -t x1 -An | tr -d "\n " # ~15x slower
xxd -p | tr -d \\n # ~3x slower
perl -e 'local $/; print unpack "H*", <>'  # ~1.5x slower
- this also slurps the whole file into memory
To reverse the process:
perl -pe 'BEGIN{$/=\1e6} $_=pack "H*",$_'
You can use Python for this purpose:
python -c "print(open('file.bin','rb').read().hex())"
...where file.bin is your filename.
Explanation:
Open file.bin in rb (read binary) mode.
Read contents (returned as bytes object).
Use the bytes method .hex(), which returns the hex dump without spaces or newlines.
Print output.
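To reverse the process (turn such a hex string back into binary), a similar one-liner works; file.hex and file.bin are placeholder names:
python -c "import sys; sys.stdout.buffer.write(bytes.fromhex(open('file.hex').read().strip()))" > file.bin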