How can I use diff to see whitespace changes? - colors

I found this question which has answers for git diff. However, I am not comparing files using any sort of version control (I don't even have one available on the machine I am trying to compare from).
Basically, similar to the referenced question, I am trying to see the changes in whitespace. The diff command might show:
bash-3.2$ diff 6241 6242
690c690
<
---
>
But I don't know whether that is a newline, a newline and a space, or something else. I need to know the exact changes between two documents, including whitespace. I have tried cmp -l -b and it works, but it is hard to read when there are a lot of changes, to the point where it isn't really useful.
What I really want is for whitespace to be rendered visibly so I can tell exactly what it is, e.g. with color or as ^J, ^M, etc. I don't see anything in the manual; diff --version shows GNU version 2.8.1.
As a further example, I have also tried piping the output of diff through hexdump.
bash-3.2$ diff 6241 6242 | hexdump -C
00000000  36 39 30 63 36 39 30 0a  3c 20 0a 2d 2d 2d 0a 3e  |690c690.< .---.>|
00000010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000020  20 20 20 20 0a                                    |    .|
From this it is obvious to me that a bunch of space characters were added. However, what is not obvious is that a space was inserted before the newline, which is what cmp tells me:
bash-3.2$ cmp -l -b 6241 6242
33571 12 ^J 40
33590 40 12 ^J
33591 165 u 40
...

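There is no easy way to do this with the diff command alone. One way to solve your problem is to pipe the output through cat -te, which turns tab characters into ^I and writes $ at the end of each line, making trailing whitespace easy to see. Because -t and -e both imply -v, other non-printing characters such as carriage returns also become visible (for example as ^M).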
$ printf >test1 'hello \t \n'
$ printf >test2 'hello \t\n'
$ diff test[12] | cat -te
1c1$
< hello ^I $
---$
> hello ^I$
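One alternative, not from the original answer, is sed's l command, which POSIX sed provides for printing lines in an unambiguous form: tabs appear as \t, carriage returns as \r, other non-printing bytes as octal escapes, and a $ marks where each line really ends. This sketch reuses the test1/test2 files from the answer:

```shell
#!/bin/sh
# Two files that differ only in the whitespace before the newline.
printf 'hello \t \n' > test1
printf 'hello \t\n' > test2
# sed -n l prints each line of diff's output unambiguously:
# tab -> \t, carriage return -> \r, and a $ marks the real end of the line.
diff test1 test2 | sed -n l
```

With GNU sed this should print 1c1$, then < hello \t $, ---$ and > hello \t$; the space just before the final $ on the < line is the extra trailing space.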


NULL (\0) added at the end of file

I'm trying to clean a binary file by deleting all the NULL bytes in it. The task is quite simple, but I found out that a lot of files have a NULL at the end and I don't know why. I'm dumping the hexadecimal value of each byte and I don't see the NULL anywhere, but if I do a hexdump of the file, I see a value 00 at the end. It could be an EOF, but that's weird because it doesn't appear in all files. This is the script I have, quite a simple one: it generates 100 random binary files and then reads each file char by char. Following the premise that bash won't store NULLs in variables, rewriting each char after storing it in a variable should avoid the NULLs, but no....
#!/bin/bash
for i in $(seq 0 100)
do
    echo "$i %"
    time dd if=/dev/urandom of="$i" bs=1 count=1000
    while read -r -n 1 c;
    do
        echo -n "$c" >> temp
    done < "$i"
    mv temp "$i"
done
I also tried with:
tr '\000' <inFile > outfile
But same result.
This is what the hexdump of one of the files with this problem looks like:
00003c0 0b12 a42b cb50 2a90 1fd6 a4f9 89b4 ddb6
00003d0 3fa3 eb7e 00c4
c4 should be the last byte but, as you can see, there's a 00 there....
Any clue?
EDIT:
I forgot to mention that the machine I'm running this on is something similar to a Raspberry Pi, and the tools provided with it are quite limited.
Try these other commands:
od -tx1 inFile
xxd inFile
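hexdump outputs 00 when the size is an odd number of bytes: its default format displays the data as 16-bit words, so the last word of an odd-length file is padded with a zero byte. The 00 is a display artifact, not a byte in the file.
It seems hexdump without options is like -x; hexdump -h gives the list of options, and hexdump -C (one byte per column) may also help.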
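To check what is actually in the file, a byte-oriented dump avoids the padding. A small sketch, using a made-up three-byte file (sample.bin and clean.bin are hypothetical names), that deletes NUL bytes with tr -d and then dumps the result one byte per column with od:

```shell
#!/bin/sh
# Hypothetical test file: 0xc4, a NUL byte, then 0x41 ('A'), written with printf octal escapes.
printf '\304\000\101' > sample.bin
# tr -d deletes every byte in the given set; LC_ALL=C keeps tr byte-oriented
# so arbitrary binary data is not rejected as an invalid multibyte character.
LC_ALL=C tr -d '\000' < sample.bin > clean.bin
# -An drops the offset column, -tx1 prints one hex byte per column (no word padding).
od -An -tx1 clean.bin
```

On this input od should show only c4 41: the file now has an odd length, yet no trailing 00 appears, because od -tx1 does not group bytes into words.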

creating a blank line after some specific line using bash / linux

I need to add an extra blank line after line 45 using sed.
For example:
44 some text one
45 some text two
46 some text three
47 some text four
result:
44 some text one
45 some text two
46
47 some text three
48 some text four
I've tried to use
sed '45G' myfile.txt
but it doesn't seem to work: it prints the content of the file on the screen but doesn't add a blank line after line 45.
I'm using CentOS 7 minimal.
You can do:
sed $'45 a \n' file.txt
$'...' is bash's C-style (ANSI-C) quoting; it turns \n into a real newline before sed sees it, which some versions of sed need.
45 a \n appends (a) a newline (\n) after line 45.
sed is for simple substitutions on individual lines, that is all. For anything else just use awk:
awk '{print} NR==45{print ""}' file
That will work with any awk on any UNIX box.
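A small variation on the awk answer, passing the line number in with -v so it is not hard-coded in the program. The sample file and the names sample.txt/out.txt are made up for illustration:

```shell
#!/bin/sh
# Build a hypothetical 50-line sample file: "line 1" ... "line 50".
i=1
while [ "$i" -le 50 ]; do
  echo "line $i"
  i=$((i + 1))
done > sample.txt
# Print every line; after the line whose number equals n, also print an empty line.
awk -v n=45 '{ print } NR == n { print "" }' sample.txt > out.txt
# Lines 44-47 of the result: "line 44", "line 45", an empty line, "line 46".
sed -n '44,47p' out.txt
```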

concatenating all files horizontally and only a specific column

In linux, is there a way to concatenate all the files in a directory that end with .out into one file? It would be even better if the final output file had them horizontally next to one another, rather than vertically. Even further, is it possible to only get the 6th column from each file (each column separated by a space).
I have been doing this in PowerShell and was wondering whether Linux can do it too.
I know I can use
paste *.out > total.out
but how do I paste only the 6th column, given that the columns are separated by spaces?
Using bash and awk with temporary files to filter the sixth column of each *.out file.
#!/bin/bash
declare -a TEMPS
for name in *.out; do
    # One temp file per input, holding only its 6th column
    tmp=$(mktemp "$name.XXXXXXXX")
    TEMPS+=("$tmp")
    awk '{ print $6 }' "$name" > "$tmp"
done
paste -d ' ' "${TEMPS[@]}"
# Remove tmp files
rm "${TEMPS[@]}"
Output using the example files from @daniel:
6 18 30
12 24 36
Save this script as a .sh file, then run it in your directory. This method uses sponge (from moreutils), which you can install on Ubuntu with sudo apt-get install moreutils.
saveColumn6.sh
# Make total.out a blank file
rm -f total.out
> total.out
# Go through every file ending in '.out'
for i in *.out
do
    # Skip the output file itself, which also matches *.out
    [ "$i" = total.out ] && continue
    # cut out field 6, append it to total.out, and rewrite the file.
    cut -d ' ' -f6 "$i" | paste -d' ' total.out - | sponge total.out
done
Here are the input files I used to test this.
in0.out
1 2 3 4 5 6
7 8 9 10 11 12
in1.out
13 14 15 16 17 18
19 20 21 22 23 24
in2.out
25 26 27 28 29 30
31 32 33 34 35 36
Here is the output file I received
total.out
6 18 30
12 24 36
Note that there is a leading space in this new file; I couldn't figure out how to get rid of it.
As Nightcrawler mentioned, Linux isn't the relevant component. You're looking for bash, the command-line shell used by many GNU/Linux based systems.

Why does UTF-8 text sort in different order between OS X and Linux?

I have a text file with lines of UTF-8 encoded text:
mac-os-x$ cat unsorted.txt
ウ
foo
チ
'foo'
津
In case it helps to reproduce the problem, here is a checksum and a dump of the exact bytes in the file, as well as how you could generate the file yourself (on Linux, use base64 -d instead of -D):
mac-os-x$ shasum unsorted.txt
a6d0b708d3e0cafb0c6e1af7450e9243da8cb078 unsorted.txt
mac-os-x$ perl -ne 'print join(" ", map { sprintf "%02x", ord } split //), "\n"' unsorted.txt
e3 82 a6 0a
66 6f 6f 0a
e3 83 81 0a
27 66 6f 6f 27 0a
e6 b4 a5 0a
mac-os-x$ echo 44KmCmZvbwrjg4EKJ2ZvbycK5rSlCg== | base64 -D > unsorted.txt
When I sort this input file on Mac OS X (regardless of whether I use GNU sort 5.93 which Mac OS X Yosemite ships with, or with a Homebrew installed GNU sort version 8.23), I get this sorted result:
mac-os-x$ env -i LANG=en_US.utf-8 LC_ALL=en_US.utf-8 /usr/bin/sort unsorted.txt
'foo'
foo
ウ
チ
津
mac-os-x$ echo `sw_vers -productName` `sw_vers -productVersion`
Mac OS X 10.10.1
mac-os-x$ /usr/bin/sort --version | head -1
sort (GNU coreutils) 5.93
When I sort the same file, with the same locale settings, on Linux (I tested on both Centos 5.5 and CentOS 6.5), I get a different result:
linux-centos-6.5$ env -i LANG=en_US.utf-8 LC_ALL=en_US.utf-8 /bin/sort unsorted.txt
ウ
チ
foo
'foo'
津
linux-centos-6.5$ cat /etc/redhat-release
CentOS release 6.5 (Final)
linux-centos-6.5$ /bin/sort --version | head -1
sort (GNU coreutils) 8.4
Note the different locations of the Japanese kana vs. the English, and the different sort order between two lines that differ only by the single quotes.
To add another variant to the mix, I notice that on a very old FreeBSD 6 box I have, I get the same sort order as OS X:
freebsd-6.0$ env -i LANG=en_US.utf-8 LC_ALL=en_US.utf-8 /usr/bin/sort unsorted.txt
'foo'
foo
ウ
チ
津
freebsd-6.0$ uname -rs
FreeBSD 6.0-RELEASE
freebsd-6.0$ sort --version | head -1
sort (GNU coreutils) 5.3.0-20040812-FreeBSD
I expected the sort order to be the same in each case, given that all cases are using GNU sort, all with the same locale settings. I tried explicitly setting LC_COLLATE separately, and tried using LC_COLLATE=C to force a sort by byte order, but that did not change any results.
Why does my example input file sort differently across OS X and Linux? And how could I force both systems to produce identically sorted text (I don't care which variant, as long as it is consistent between the two)?
It seems your Linux sort is not using the proper Unicode collation order.
The Unicode code points of the first character of each line in your unsorted.txt are:
ウ - 30A6
foo - 0066
チ - 30C1
'foo' - 0027
津 - 6D25
taken from http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=%E3%82%A6&mode=char
So the proper order according to the Unicode collation table (http://www.unicode.org/Public/UCA/latest/allkeys.txt) would be:
'foo' - line 487
foo - line 8966
ウ - line 20875
チ - line 21004
津 - not in file
So, to answer your question, your Linux machine is providing the wrong collation tables to sort. Unfortunately, I can't tell what the reason for that might be.
PS: There's a similar question to yours here.
EDIT
As @ninjalj noticed, glibc doesn't use UCA but ISO 14651 instead. This bug report suggests migrating to UCA; unfortunately, it's still not resolved.
Also, it could be somehow connected with the question about ls case insensitivity on Mac OS X. Some people even suggest that it has something to do with the HFS filesystem.
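For the second part of the question (identical output on both systems), one option not covered in the answer is to sort in the C locale, which compares raw bytes and does not depend on any locale collation tables. LC_ALL takes precedence over LC_COLLATE, and the commands in the question set LC_ALL=en_US.utf-8, which may explain why adding LC_COLLATE=C changed nothing. A sketch that recreates the question's file byte for byte:

```shell
#!/bin/sh
# Recreate unsorted.txt with printf octal escapes, avoiding base64 -d vs -D:
# ウ = e3 82 a6, チ = e3 83 81, 津 = e6 b4 a5.
printf '\343\202\246\nfoo\n\343\203\201\n%s\n\346\264\245\n' "'foo'" > unsorted.txt
# LC_ALL itself must be C, since it overrides LC_COLLATE.
# In the C locale sort compares bytes, so the result is the same on any system.
LC_ALL=C sort unsorted.txt
```

Byte order happens to match the OS X result here ('foo', foo, ウ, チ, 津), since 0x27 < 0x66 < 0xe3 82 < 0xe3 83 < 0xe6.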

Hex to TCP Port - BASH

So I have been working a lot with sending hex to a TCP port. Now my next task is the following, but this seems to be a different way of writing hex?
I guess I need some help, because the documentation everywhere is really bad.
So far, the hex I have been given (the command) is:
01 53 20 00 41 04 4F
Normally, I would do the following in Linux:
exec 3<>/dev/tcp/SERVER_IP/SERVER_PORT
then
echo -ne '01 53 20 00 41 04 4F' >&3
then
echo <&3
but I get no reply back, just a blank line.
Sorry, I forgot to mention:
what I am used to doing is
echo -ne 54686973776f726b7366696e65 | perl -pe 's/([0-9a-f]{2})/chr hex $1/gie' >&3
and then
echo <&3
and I'll get a reply.
So my question is: what is the difference with 01 53 20 etc.?
I am a bit confused.
When you say
echo -ne 54686973776f726b7366696e65 | perl -pe 's/([0-9a-f]{2})/chr hex $1/gie' >&3
you're piping your hex codes through Perl, which is breaking them up and translating them into 8-bit character codes for you.
Bash, on the other hand, doesn't handle text as well as Perl does (which is why you needed Perl in the first place). At best, because you're not doing any translation whatsoever, the other side will see the literal text 01 53 20 00 41 04 4F.
In order to do this entirely in Bash, you'd have to do something like
echo -ne '\x01\x53\x20\x00\x41\x04\x4f' >&3
The \x## codes are basically the equivalent of what Perl was doing with each pair of digits...and -e enables that translation.
For reference, this works just fine for me:
exec 3<>/dev/tcp/127.0.0.1/80
# 'GET /\r\n'
echo -ne '\x47\x45\x54\x20\x2f\x0d\x0a' >&3
# Note: `echo <&3` didn't work here, in my tests.
cat <&3
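If the hex arrives as a space-separated string such as 01 53 20 00 41 04 4F, the escapes don't have to be typed by hand. A minimal sketch, assuming the input is always two-digit hex pairs separated by spaces; it uses only POSIX printf, converting each pair to an octal escape (bash's /dev/tcp redirection is still needed for the socket itself):

```shell
#!/bin/sh
# The command as given: space-separated hex byte values.
hex='01 53 20 00 41 04 4F'
esc=''
for b in $hex; do
  # printf '%03o' "0x4F" prints the value as 3-digit octal (117), so each
  # byte becomes an escape such as \117 that printf's format string expands.
  esc="$esc\\$(printf '%03o' "0x$b")"
done
# Write the raw bytes; with the socket from the question this would be: printf "$esc" >&3
printf "$esc" | od -An -tx1
```

This is the same translation the Perl one-liner performs on each pair of digits; od should show the seven bytes 01 53 20 00 41 04 4f, including the NUL.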
