I'm trying to clean a binary file by deleting every NUL byte in it. The task is quite simple, but I found that a lot of files still have a NUL at the end and I don't know why. When I dump the hexadecimal value of each byte I don't see the NUL anywhere, but a hexdump of the file shows a 00 value at the end. It could be an EOF marker, but that's odd because it doesn't appear in all files. This is the script I have, a quite simple one: it generates 100 random binary files and then reads each file character by character. Following the premise that bash won't store NULs in variables, rewriting each character after storing it in a variable should drop the NULs, but it doesn't.
#!/bin/bash
for i in $(seq 0 100)
do
echo "$i %"
time dd if=/dev/urandom of=$i bs=1 count=1000
while read -r -n 1 c;
do
echo -n "$c" >> temp
done < $i
mv temp $i
done
I also tried with:
tr -d '\000' < inFile > outFile
But same result.
This is what the hexdump of one of the files with this problem looks like:
00003c0 0b12 a42b cb50 2a90 1fd6 a4f9 89b4 ddb6
00003d0 3fa3 eb7e 00c4
c4 should be the last byte, but as you can see, there's a 00 there...
Any clue?
EDIT:
I forgot to mention that the machine I'm running this on is something similar to a Raspberry Pi, and the tools provided with it are quite limited.
Try these other commands:
od -tx1 inFile
xxd inFile
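hexdump outputs 00 when the size is an odd number of bytes: without options it seems to behave like -x and prints two-byte units, so it pads the last unit when only one byte is left. On a little-endian machine the final 00c4 is therefore the single byte c4; the 00 is not in the file.
hexdump -h gives the list of options; hexdump -C, which prints one byte at a time, may also help.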
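To check a file without hexdump's padding getting in the way, a byte-wise dump is enough. The following is a minimal sketch, assuming a POSIX od and tr are available on the device; in.bin and out.bin are hypothetical file names. It strips the NULs with tr -d and then prints the size and the bytes one at a time, so no filler 00 can appear:

```shell
# Hypothetical file names; any file with NUL bytes would do.
printf 'a\0b\0c' > in.bin                 # 5 bytes, two of them NUL
tr -d '\000' < in.bin > out.bin           # delete every NUL byte
size=$(wc -c < out.bin | tr -d ' ')       # some wc versions pad with spaces
bytes=$(od -An -tx1 out.bin | tr -d ' \n')
echo "$size bytes: $bytes"                # prints: 3 bytes: 616263
```

The output file here has 3 bytes, an odd number, so a plain hexdump of it would still show a trailing 00 even though od -tx1 shows none.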
I'm trying to create a pipeline from user input, but when I redirect the output I get output with no newlines: just one huge single line. Here's the code:
function stack(){
echo $(history|tail -1|cut -d" " -f5-|cut -d "|" -f1) >> ~/commands
local last=$(tail -1 ~/commands)
echo $(eval $last) >> ~/output
}
Is there a better way to send the output of this to a file? echo seems to corrupt the output.
I'm not sure I understand the purpose of the cuts, but quotes are missing around $(), so the output is split into words on IFS and echo joins those words with single spaces, which loses the newlines:
echo "$(eval "$last")"
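The effect of the missing quotes can be reproduced in isolation. This is a small sketch with a made-up two-line value rather than real command output:

```shell
#!/bin/bash
out=$(printf 'one\ntwo\n')     # made-up stand-in for the output of a command
unquoted=$(echo $out)          # word splitting: echo receives two arguments
quoted=$(echo "$out")          # one argument, the newline survives
printf '[%s]\n' "$unquoted"    # prints: [one two]
printf '[%s]\n' "$quoted"      # prints [one and two] on two separate lines
```

If the goal is only to log the output, redirecting the command itself (eval "$last" >> ~/output) avoids both echo and the command substitution.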
cut -c8- may be safer than cut -d" " -f5- for history entries whose number does not have exactly 3 digits.
Also, cut -d"|" -f1 can fail if | is used as a literal, for example in echo '|'.
You could also look at Event Designators in the bash manual: in an interactive bash, the following will run the last command:
$ !-1
In various bash scripts I have come across the following: $'\0'
An example with some context:
while read -r -d $'\0' line; do
echo "${line}"
done <<< "${some_variable}"
What does $'\0' return as its value? Or, stated slightly differently, what does $'\0' evaluate to and why?
It is possible that this has been answered elsewhere. I did search prior to posting but the limited number of characters or meaningful words in dollar-quote-slash-zero-quote makes it very hard to get results from stackoverflow search or google. So, if there are other duplicate questions, please allow some grace and link them from this question.
In bash, $'\0' is precisely the same as '': an empty string. There is absolutely no point in using the special Bash syntax in this case.
Bash strings are always NUL-terminated, so if you manage to insert a NUL into the middle of a string, it will terminate the string. In this case, the C-escape \0 is converted to a NUL character, which then acts as a string terminator.
The -d option of the read builtin (which defines a line-end character for the input) expects a single character in its argument. It does not check if that character is the NUL character, so it will be equally happy using the NUL terminator of '' or the explicit NUL in $'\0' (which is also a NUL terminator, so it is probably no different). The effect, in either case, will be to read NUL-terminated data, as produced (for example) by find's -print0 option.
In the specific case of read -d '' line <<< "$var", it is impossible for $var to have an internal NUL character (for the reasons described above), so line will be set to the entire value of $var with leading and trailing whitespace removed. (As @mklement notes, this will not be apparent in the suggested code snippet, because read will have a non-zero exit status, even though the variable will have been set; read only returns success if the delimiter is actually found, and NUL cannot be part of a here-string.)
Note that there is a big difference between
read -d '' line
and
read -d'' line
The first one is correct. In the second one, the argument word passed to read is just -d, which means that the option will be the next argument (in this case, line). read -d$'\0' line will have identical behaviour; in either case, the space is necessary. (So, again, no need for the C-escape syntax).
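The here-string case described above can be checked directly. This is a minimal sketch for bash; the sample value is made up for illustration:

```shell
#!/bin/bash
var=$'  first\nsecond  '                    # made-up value with surrounding spaces
status=0
read -r -d '' line <<< "$var" || status=$?  # no NUL in the input, so read fails
printf 'status=%s value=[%s]\n' "$status" "$line"
```

The variable is filled in (trimmed, with the inner newline intact) even though the exit status is 1, which is why a while loop built on this read never runs its body.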
To complement rici's helpful answer:
Note that this answer is about bash. ksh and zsh also support $'...' strings, but their behavior differs:
* zsh does create and preserve NUL (null bytes) with $'\0'.
* ksh, by contrast, has the same limitations as bash, and additionally interprets the first NUL in a command substitution's output as the string terminator (cuts off at the first NUL, whereas bash strips such NULs).
$'\0' is an ANSI C-quoted string that technically creates a NUL (0x0 byte), but effectively results in the empty (null) string (same as ''), because any NUL is interpreted as the (C-style) string terminator by Bash in the context of arguments and here-docs/here-strings.
As such, it is somewhat misleading to use $'\0' because it suggests that you can create a NUL this way, when you actually cannot:
You cannot create NULs as part of a command argument or here-doc / here-string, and you cannot store NULs in a variable:
echo $'a\0b' | cat -v # -> 'a' - string terminated after 'a'
cat -v <<<$'a\0b' # -> 'a' - ditto
In the context of command substitutions, by contrast, NULs are stripped:
echo "$(printf 'a\0b')" | cat -v # -> 'ab' - NUL is stripped
However, you can pass NUL bytes via files and pipes.
printf 'a\0b' | cat -v # -> 'a^@b' - NUL is preserved, via stdout and pipe
Note that it is printf that is generating the NUL via its single-quoted argument whose escape sequences printf then interprets and writes to stdout. By contrast, if you used printf $'a\0b', bash would again interpret the NUL as the string terminator up front and pass only 'a' to printf.
If we examine the sample code, whose intent is to read the entire input at once, across lines (I've therefore changed line to content):
while read -r -d $'\0' content; do # same as: `while read -r -d '' ...`
echo "${content}"
done <<< "${some_variable}"
This will never enter the while loop body, because stdin input is provided by a here-string, which, as explained, cannot contain NULs.
Note that read actually does look for NULs with -d $'\0', even though $'\0' is effectively ''. In other words: read by convention interprets the empty (null) string to mean NUL as -d's option-argument, because NUL itself cannot be specified for technical reasons.
In the absence of an actual NUL in the input, read's exit code indicates failure, so the loop is never entered.
However, even in the absence of the delimiter, the value is read, so to make this code work with a here-string or here-doc, it must be modified as follows:
while read -r -d $'\0' content || [[ -n $content ]]; do
echo "${content}"
done <<< "${some_variable}"
However, as @rici notes in a comment, with a single (multi-line) input string, there is no need to use while at all:
read -r -d $'\0' content <<< "${some_variable}"
This reads the entire content of $some_variable, while trimming leading and trailing whitespace (which is what read does with $IFS at its default value, $' \t\n').
@rici also points out that if such trimming weren't desired, a simple content=$some_variable would do.
Contrast this with input that actually contains NULs, in which case while is needed to process each NUL-separated token (but without the || [[ -n $<var> ]] clause; find -print0 outputs filenames separated by a NUL each):
while IFS= read -r -d $'\0' file; do
echo "${file}"
done < <(find . -print0)
Note the use of IFS= read ... to suppress trimming of leading and trailing whitespace, which is undesired in this case, because input filenames must be preserved as-is.
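The effect of IFS= is easiest to see with names that contain whitespace. The sketch below is bash-specific and uses printf to stand in for find -print0, so the two sample names are made up:

```shell
#!/bin/bash
names=()
# printf emits two NUL-terminated names: ' a b ' and 'c<newline>d'
while IFS= read -r -d '' file; do
  names+=("$file")
done < <(printf ' a b \0c\nd\0')
printf '%s names; first=[%s]\n' "${#names[@]}" "${names[0]}"
# prints: 2 names; first=[ a b ]
```

Without IFS=, the first name would come back as 'a b', with the leading and trailing spaces lost.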
It is technically true that the expansion $'\0' will always become the empty string '' (a.k.a. the null string) to the shell (not in zsh). Or, worded the other way around, $'\0' will never expand to an ASCII NUL (a byte with zero value), again not in zsh. Note that the two names are confusingly similar: NUL and null.
However, there is an additional (quite confusing) twist when we talk about read -d ''.
What read sees is the value '' (the null string) as the delimiter.
What read does is split the input from stdin on the character $'\0' (yes, an actual 0x00).
Expanded answer.
The question in the title is:
In a bash script, what would $'\0' evaluate to and why?
That means that we need to explain what $'\0' is expanded to.
What $'\0' is expanded to is very easy: it expands to the null string '' (in most shells, not in zsh).
But the example of use is:
read -r -d $'\0'
That transforms the question into: what delimiter character does $'\0' expand to?
This holds a very confusing twist. To address that correctly, we need to take a full circle tour of when and how a NUL (a byte with zero value or '0x00') is used in shells.
Stream.
We need some NUL to work with. It is possible to generate NUL bytes from shell:
$ echo -e 'ab\0cd' | od -An -vtx1
61 62 00 63 64 0a ### That works in bash.
$ printf 'ab\0cd' | od -An -vtx1
61 62 00 63 64 ### That works in all shells tested.
Variable.
A variable in shell will not store a NUL.
$ printf -v a 'ab\0cd'; printf '%s' "$a" | od -An -vtx1
61 62
The example is meant to be executed in bash, as only bash's printf has the -v option.
But it clearly shows that a string containing a NUL is cut at the NUL.
Simple variables cut the string at the zero byte.
That is reasonable to expect if the string is a C string, which must end in a NUL (\0).
As soon as a NUL is found, the string ends.
Command substitution.
A NUL will work differently when used in a command substitution.
This code should assign a value to the variable $a and then print it:
$ a=$(printf 'ab\0cd'); printf '%s' "$a" | od -An -vtx1
And it does, but with different results in different shells:
### several shells just ignore (remove)
### a NUL in the value of the expanded command.
/bin/dash : 61 62 63 64
/bin/sh : 61 62 63 64
/bin/b43sh : 61 62 63 64
/bin/bash : 61 62 63 64
/bin/lksh : 61 62 63 64
/bin/mksh : 61 62 63 64
### ksh truncates the value at the NUL.
/bin/ksh : 61 62
/bin/ksh93 : 61 62
### zsh sets the var to actually contain the NUL value.
/bin/zsh : 61 62 00 63 64
/bin/zsh4 : 61 62 00 63 64
It is worth mentioning that bash (version 4.4) warns about this:
/bin/b44sh : warning: command substitution: ignored null byte in input
61 62 63 64
In command substitution the zero byte is silently ignored by the shell.
It is very important to understand that that does not happen in zsh.
Now that we have all the pieces about NUL, we can look at what read does.
What read does with a NUL delimiter.
That brings us back to the command read -d $'\0':
while read -r -d $'\0' line; do
The $'\0' should have been expanded to a byte of value 0x00, but the shell cuts it and it actually becomes ''.
That means that both $'\0' and '' are received by read as the same value.
Having said that, it may seem reasonable to write the equivalent construct:
while read -r -d '' line; do
And it is technically correct.
What a delimiter of '' actually does.
There are two sides to this point: one is the character that follows the -d option of read; the other, which is addressed here, is: what character will read use if given a delimiter as -d $'\0'?
The first side has been answered in detail above.
The second side is a very confusing twist, as read will actually read up to the next byte of value 0x00 (which is what $'\0' represents).
To actually show that that is the case:
#!/bin/bash
# create a test file with some zero bytes.
printf 'ab\0cd\0ef\ngh\n' > tfile
while true ; do
read -r -d '' line; a=$?
echo "exit $a"
if [[ $a == 1 ]]; then
printf 'last %s\n' "$line"
break
else
printf 'normal %s\n' "$line"
fi
done <tfile
When executed, the output will be:
$ ./script.sh
exit 0
normal ab
exit 0
normal cd
exit 1
last ef
gh
The first two exit 0 lines are successful reads done up to the next "zero byte", and both contain the correct values of ab and cd. The next read is the last one (as there are no more zero bytes) and contains the value $'ef\ngh' (yes, it also contains a newline).
All this goes to show (and prove) that read -d '' actually reads up to the next "zero byte", which is also known by the ASCII name NUL and should have been the result of a $'\0' expansion.
In short: we can safely state that read -d '' reads up to the next 0x00 (NUL).
Conclusion:
We must state that read -d $'\0' effectively uses a delimiter of 0x00.
Using $'\0' is a better way to convey this meaning to the reader.
As a code style thing: I write $'\0' to make my intentions clear.
One, and only one, character used as a delimiter: the byte value of 0x00
(even if in bash it happens to be cut)
Note: any of these commands will print the hex values of the stream.
$ printf 'ab\0cd' | od -An -vtx1
$ printf 'ab\0cd' | xxd -p
$ printf 'ab\0cd' | hexdump -v -e '/1 "%02X "'
61 62 00 63 64
$'\0' expands the escape sequence \0 it contains to the character it represents, which is NUL; since the shell cannot hold a NUL inside a string, the result is effectively the empty string.
This is bash syntax. As per man bash:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Known backslash escape sequences are also decoded.
Similarly $'\n' expands to a newline and $'\r' will expand to a carriage return.
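The difference between these expansions shows up in their lengths. A quick sketch for bash (the variable names are only for illustration):

```shell
#!/bin/bash
nl=$'\n'; cr=$'\r'; nul=$'\0'
# Length of each expansion, in characters
printf '%s %s %s\n' "${#nl}" "${#cr}" "${#nul}"   # prints: 1 1 0
```

The newline and the carriage return are each one character long, while $'\0' ends up with length zero, the same as ''.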
Trying to create a script that will take a tv show video file and move it into the correct folder in my TV directory hierarchy.
ie. example filenames:
archer.2009.s01e01.publichd.mkv
archer.s05e10.dimension.mkv
I would like these moved to: Television/Archer/Season 1/ and Television/Archer/Season 5/ respectively and create them if they don't already exist.
Here's what I have so far, right now it's only sorting based on season for one particular show, Archer. Planning to expand that once I get it working:
#!/bin/bash
for season in 01 02 03 04 05 #06 07 08 09 10 11 12 13 14 15 16
do
echo 'Season ' $season
#find -iname *archer*s$season\*.*
var=$(find -iname *archer*s$season\*.*)
var=$(echo $var | cut -c 3-)
echo $var
var2=$(sed "s/0//" <<< $season)
var3=$(echo "Season $var2/")
echo $var3
if [[ -z "$var" ]]
then
:
else
mkdir -p /home/adam/Downloads/Television/Archer/$var3;
mv "./$var" "/home/adam/Downloads/Television/Archer/$var3"
fi
done
I'm having problems creating/moving the files to the $var3 directory variable. It does have a space in the name, which I know is causing the problems. I already have a large library of shows in this format, though, so I'd rather not change it.
Any help would be appreciated - I know the script is VERY primitive, I'm just piecing it together as I go.
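You need to put quotes around var3 wherever you use it, including in the mkdir call; otherwise the space in "Season 1/" splits the path into two arguments.
mkdir -p "/home/adam/Downloads/Television/Archer/$var3"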
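As a possible next step, the season number could be taken from the file name itself instead of looping over a fixed list. The following is only a sketch for bash: season_dir is a hypothetical helper, the regular expression assumes names of the form show.sNNeMM.rest, and the echo in front of mkdir and mv makes it a dry run:

```shell
#!/bin/bash
# season_dir (hypothetical helper): prints "Season N" for names like show.s05e10.x.mkv
season_dir() {
  local re='[.][sS]([0-9]+)[eE][0-9]+[.]'
  [[ $1 =~ $re ]] || return 1
  # 10# forces base 10, so 08 and 09 are not rejected as invalid octal numbers
  printf 'Season %d\n' "$((10#${BASH_REMATCH[1]}))"
}

file='archer.s05e10.dimension.mkv'
dest="Television/Archer/$(season_dir "$file")"
echo mkdir -p "$dest"       # dry run: remove the echo to create the directory
echo mv "./$file" "$dest/"  # dry run: remove the echo to move the file
```

Parsing the number this way also sidesteps a problem in sed "s/0//", which removes the first 0 it finds and would therefore turn season 10 into "Season 1".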
I found this question which has answers for git diff. However, I am not comparing files using any sort of version control (I don't even have one available on the machine I am trying to compare from).
Basically, similar to the referenced question, I am trying to see the changes in whitespace. The diff command might show:
bash-3.2$ diff 6241 6242
690c690
<
---
>
But I don't know if that is a newline, a newline and space, or what. I need to know the exact changes between two documents, including whitespace. I have tried cmp -l -b and it works, but it is rather difficult to read when there are a lot of changes to the point where it isn't really useful either.
What I really want is some way for whitespace to be rendered in some way so I can tell exactly what the whitespace is, e.g. color or perhaps ^J, ^M, etc. I don't see anything in the manual; diff --version shows GNU version 2.8.1.
As a further example, I have also tried piping the output of diff through hexdump.
bash-3.2$ diff 6241 6242 | hexdump -C
00000000 36 39 30 63 36 39 30 0a 3c 20 0a 2d 2d 2d 0a 3e |690c690.< .---.>|
00000010 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | |
00000020 20 20 20 20 0a | .|
From this it is obvious to me that a bunch of space characters were added. However, what is not obvious is that a space was inserted before the newline, which is what cmp tells me:
bash-3.2$ cmp -l -b 6241 6242
33571 12 ^J 40
33590 40 12 ^J
33591 165 u 40
...
There is no easy way to do this with the diff command alone. One way to solve your problem is to pipe the output through cat -te, which turns tab characters into ^I and writes $ at the end of each line, making trailing whitespace easier to see.
$ printf >test1 'hello \t \n'
$ printf >test2 'hello \t\n'
$ diff test[12] | cat -te
1c1$
< hello ^I $
---$
> hello ^I$
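Where cat -te is not available, sed -n l is a possible alternative: the l command is specified by POSIX and prints each line with tabs shown as \t and a $ marking the end. A minimal sketch that recreates the same two test files:

```shell
printf 'hello \t \n' > test1
printf 'hello \t\n' > test2
# diff exits with status 1 when the files differ, hence the || true
visible=$( { diff test1 test2 || true; } | sed -n l )
printf '%s\n' "$visible"
```

The second line of the result reads < hello \t $, which shows the extra space between the tab and the end of the line. Note that sed folds long lines in this mode, so very wide diffs are harder to read this way.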