I have two text files that I want to verify are the same. The problem is that file1 (SELECT_20150210.txt) was generated on Windows and file2 (sel.txt) on a Mac, so the two files have different line-terminating characters even though they look the same:
The first line:
Eriks-MacBook-Air:hftdump erik$ head -n 1 sel.txt
SystemState 0x04 25 03:03:48.800 O
Eriks-MacBook-Air:hftdump erik$ head -n 1 SELECT_20150210.txt
SystemState 0x04 25 03:03:48.800 O
cmp says they are different:
Eriks-MacBook-Air:hftdump erik$ cmp sel.txt SELECT_20150210.txt
sel.txt SELECT_20150210.txt differ: char 35, line 1
But it's only the terminating characters that differ:
Eriks-MacBook-Air:hftdump erik$ head -n 1 SELECT_20150210.txt | hexdump -C
00000000 53 79 73 74 65 6d 53 74 61 74 65 09 30 78 30 34 |SystemState.0x04|
00000010 09 32 35 09 30 33 3a 30 33 3a 34 38 2e 38 30 30 |.25.03:03:48.800|
00000020 09 4f 0d 0a |.O..|
00000024
Eriks-MacBook-Air:hftdump erik$ head -n 1 sel.txt | hexdump -C
00000000 53 79 73 74 65 6d 53 74 61 74 65 09 30 78 30 34 |SystemState.0x04|
00000010 09 32 35 09 30 33 3a 30 33 3a 34 38 2e 38 30 30 |.25.03:03:48.800|
00000020 09 4f 0a |.O.|
00000023
So is there a way to cmp or diff these two files while ignoring the different line-terminating characters? Thank you
ASSUMPTION: you don't want to alter the line-endings of the original files
To avoid creating temporary files, you could use process substitution:
diff my_unix_file <(dos2unix < my_dos_file)
diff my_unix_file <(sed 's/\r//' my_dos_file)
diff my_unix_file <(tr -d '\r' < my_dos_file)
UPDATE (Comments converted into answer): Some improvements done thanks to anishsane
On OSX you can use this diff:
diff osx-file.txt <(tr -d '\r' < win-file.txt)
tr -d '\r' < win-file.txt strips the carriage returns (\r) from win-file.txt.
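Since the question also asks about cmp, the same idea can be wrapped in a small helper. This is a minimal sketch, assuming bash (process substitution is not available in plain sh); the function name same_ignoring_cr is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when two files are identical once every
# carriage return has been removed from both of them.
same_ignoring_cr() {
    cmp -s <(tr -d '\r' < "$1") <(tr -d '\r' < "$2")
}

# Usage with the file names from the question (skipped if they are absent).
if [ -f sel.txt ] && [ -f SELECT_20150210.txt ]; then
    if same_ignoring_cr sel.txt SELECT_20150210.txt; then
        echo "same"
    else
        echo "different"
    fi
fi
```

Note that tr -d '\r' deletes every carriage return, not only the ones directly before a newline, so a stray \r in the middle of a line would also be ignored.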
I'm trying to render the output of a linux shell command in HTML. For example, systemctl status mysql looks like this in my terminal:
As I understand from Floz'z Misc I was expecting that the underlying character stream would contain control codes. But looking at it in say hexyl (systemctl status mysql | hexyl) I can't see any codes:
Looking near the bottom on lines 080 and 090 where the text "Active: failed" is displayed, I was hoping to find some control sequences to change the color to red. While the output is not necessarily ASCII, I used some ASCII tables to help me:
Looking at the second group of 8 bytes on line 090, where the letters ive: fa are displayed, I find:
69 = i
76 = v
65 = e
3a = :
20 = space
66 = f
61 = a
69 = i
There are no bytes for control sequences.
I wondered if hexyl is choosing not to display them so I wrote a Java program which outputs the raw bytes after executing the process as a bash script and the results are the same - no control sequences.
The Java is roughly:
Process p = Runtime.getRuntime().exec(new String[]{"/bin/sh", "-c", "systemctl status mysql"}); // runs in the shell
// Read the output before waiting, so a full pipe cannot block the child process.
byte[] bytes = p.getInputStream().readAllBytes();
p.waitFor();
for (byte b : bytes) {
    System.out.println(b + "\t" + ((char) b));
}
That outputs:
...
32
32
32
32
32
65 A
99 c
116 t
105 i
118 v
101 e
58 :
32
102 f
97 a
105 i
108 l
101 e
100 d
...
So the question is: How does bash know that it has to display the word "failed" red?
systemctl detects that the output is not a terminal and removes the color codes from the output.
Related: Detect if stdin is a terminal or pipe? , https://unix.stackexchange.com/questions/249723/how-to-trick-a-command-into-thinking-its-output-is-going-to-a-terminal , https://superuser.com/questions/1042175/how-do-i-get-systemctl-to-print-in-color-when-being-interacted-with-from-a-non-t
Some tools (but not all) have an option to always emit color codes, such as ls --color=always or grep --color=always, or, in the case of systemd, the SYSTEMD_COLORS environment variable.
What tool can I use to see them?
You can use hexyl to see them.
how does bash know that it has to mark the word "failed" red?
Bash is the shell, it is completely unrelated.
Your terminal, the graphical window that you are viewing the output with, knows to mark it red because of ANSI escape sequences in the output. There is no interaction with Bash.
$ SYSTEMD_COLORS=1 systemctl status dbus.service | grep runn | hexdump -C
00000000 20 20 20 20 20 41 63 74 69 76 65 3a 20 1b 5b 30 | Active: .[0|
00000010 3b 31 3b 33 32 6d 61 63 74 69 76 65 20 28 72 75 |;1;32mactive (ru|
00000020 6e 6e 69 6e 67 29 1b 5b 30 6d 20 73 69 6e 63 65 |nning).[0m since|
00000030 20 53 61 74 20 32 30 32 32 2d 30 31 2d 30 38 20 | Sat 2022-01-08 |
00000040 31 39 3a 35 37 3a 32 35 20 43 45 54 3b 20 35 20 |19:57:25 CET; 5 |
00000050 64 61 79 73 20 61 67 6f 0a |days ago.|
00000059
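The terminal check described above can be sketched in a few lines of shell. This is only an illustration of the technique, not how systemctl is implemented: the function print_status and the variable DEMO_FORCE_COLOR are made up, and the red sequence 0;1;31 is an assumption modeled on the green 0;1;32 seen in the hexdump.

```shell
#!/usr/bin/env bash
# Hypothetical example: emit color only when stdout is a terminal,
# or when the made-up DEMO_FORCE_COLOR variable is set to 1.
print_status() {
    if [ -t 1 ] || [ "${DEMO_FORCE_COLOR:-0}" = 1 ]; then
        printf 'Active: \033[0;1;31mfailed\033[0m\n'
    else
        printf 'Active: failed\n'
    fi
}

print_status                            # colored when run directly in a terminal
print_status | cat                      # plain: stdout is a pipe
DEMO_FORCE_COLOR=1 print_status | cat   # escape codes survive the pipe
```

Piping either variant into hexdump -C shows whether the 1b 5b bytes are present.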
I have a file, file.txt, containing the string below. I want to show its special characters and then replace them.
My file.txt:
https://drive.google.com/file/d/1pTh4rxdlmA3Qq0aVwF5jCTK3StqXrQeJ/view?usp=sharing
12,IBA�EZ JUAN,2006,00030,NUEVO
I know the character � is a "Ñ".
I want this:
12,IBAÑEZ JUAN,2006,00030,NUEVO
I have tried:
tr '\0xd1' 'Ñ' < file.txt > file_2.txt
My hexdump output is:
$ hexdump -C file.txt
00000000 31 32 2c 49 42 41 ef bf bd 45 5a 20 4a 55 41 4e |12,IBA...EZ JUAN|
00000010 2c 32 30 30 36 2c 30 30 30 33 30 2c 4e 55 45 56 |,2006,00030,NUEV|
00000020 4f 2c 00 2c 20 20 20 20 20 20 20 20 20 20 2c 4e |O,., ,N|
00000030 4f 2c 00 2c 30 30 36 2c 50 2c 37 2e 30 30 30 2c |O,.,006,P,7.000,|
00000040 2e 30 30 30 2c 31 32 2e 37 34 2c 2e 30 30 30 2c |.000,12.74,.000,|
00000050 2d 2c 32 30 30 36 2d 30 36 2d 33 30 |-,2006-06-30|
0000005c
Using hexdump, we find that my copy of the file differs from yours by 3 extra bytes at the very start (ef bb bf, the UTF-8 byte order mark).
cat file.txt | hexdump -C
12,IBA�EZ JUAN,2006,00030,NUEVO, , ,NO, ,006,P,7.000,.000,12.74,.000,-,2006-06-30
Pipe the cat output into the tr command:
cat file.txt | tr -s "�" "Ñ"
$ cat file.txt | hexdump -C
00000000 ef bb bf 31 32 2c 49 42 41 ef bf bd 45 5a 20 4a |...12,IBA...EZ J|
00000010 55 41 4e 2c 32 30 30 36 2c 30 30 30 33 30 2c 4e |UAN,2006,00030,N|
00000020 55 45 56 4f 2c 20 2c 20 20 20 20 20 20 20 20 20 |UEVO, , |
00000030 20 2c 4e 4f 2c 20 2c 30 30 36 2c 50 2c 37 2e 30 | ,NO, ,006,P,7.0|
00000040 30 30 2c 2e 30 30 30 2c 31 32 2e 37 34 2c 2e 30 |00,.000,12.74,.0|
00000050 30 30 2c 2d 2c 32 30 30 36 2d 30 36 2d 33 30 |00,-,2006-06-30|
0000005f
Again, run hexdump to check the changes if you want. The original text had 3 unprintable bytes (ef bf bd), which are now replaced by the 2 bytes that encode Ñ.
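A byte-exact variant of the same replacement can be sketched with sed. It assumes bash (for the $'...' quoting that produces the raw bytes) and that every ef bf bd sequence in the file really stood for Ñ; that sequence is the Unicode replacement character, so the original byte is already lost and this mapping is a guess based on the question. The function name fix_replacement_char is made up.

```shell
#!/usr/bin/env bash
# Hypothetical filter: replace the UTF-8 bytes of U+FFFD (ef bf bd)
# with the UTF-8 bytes of Ñ (c3 91). LC_ALL=C makes sed work on bytes.
fix_replacement_char() {
    LC_ALL=C sed $'s/\xef\xbf\xbd/\xc3\x91/g'
}

# Usage: feed a sample line through the filter and inspect the bytes.
printf '12,IBA\357\277\275EZ JUAN\n' | fix_replacement_char | hexdump -C
```

To convert a whole file, redirect it through the function: fix_replacement_char < file.txt > file_2.txt.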
I have a file containing what I believe are Unicode characters, and I would like to remove them with sed or some other Unix utility. I have tried a few options and for some reason I am unable to remove those characters. The test cases are shown with a single line (head -n1).
Attempt 1:
> head -n1 file1.txt | hexdump -C # Hexdump line 1
output:
00000000 47 72 6f 75 70 c2 a0 20 20 20 53 69 67 6e 61 6c |Group.. Signal|
00000010 c2 a0 6e 61 6d 65 c2 a0 20 20 20 20 20 20 20 20 |..name.. |
00000020 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | |
00000030 55 6e 69 74 c2 a0 20 74 79 70 65 c2 a0 44 65 73 |Unit.. type..Des|
00000040 63 72 69 70 74 69 6f 6e c2 a0 0d 0a |cription....|
0000004c
Now replace "c2 a0" above
> head -n1 file1.txt | sed 's/\xc2\xa0//g' | hexdump -C
or
> head -n1 file1.txt | sed 's/\x{c2a0}//g' | hexdump -C
00000000 47 72 6f 75 70 c2 a0 20 20 20 53 69 67 6e 61 6c |Group.. Signal|
00000010 c2 a0 6e 61 6d 65 c2 a0 20 20 20 20 20 20 20 20 |..name.. |
00000020 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 | |
00000030 55 6e 69 74 c2 a0 20 74 79 70 65 c2 a0 44 65 73 |Unit.. type..Des|
00000040 63 72 69 70 74 69 6f 6e c2 a0 0d 0a |cription....|
No replacements happened.
Attempt 2: Using vim
vim file1.txt
:set nobomb
:set fileencoding=utf-8
:wq
Used sed again and no replacements happened. How do I replace or remove those characters (hex "c2a0")?
I finally ended up using Perl which successfully removed the unicode chars.
> perl -v
This is perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level
> perl -pi -e 's/\x{c2}\x{a0}//g' file1.txt
> head -n1 file1.txt | hexdump -C
00000000 47 72 6f 75 70 20 20 20 53 69 67 6e 61 6c 6e 61 |Group Signalna|
00000010 6d 65 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |me |
00000020 20 20 20 20 20 20 20 20 20 20 55 6e 69 74 20 74 | Unit t|
00000030 79 70 65 44 65 73 63 72 69 70 74 69 6f 6e 0d 0a |ypeDescription..|
00000040
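For comparison, here is a sketch of the same removal with sed. It assumes bash, whose $'...' quoting turns \xc2\xa0 into the two raw bytes before sed ever sees them, so it does not depend on sed understanding \x escapes (the sed shipped with macOS does not). The bytes c2 a0 are the UTF-8 encoding of U+00A0, the no-break space. The function name strip_nbsp is made up.

```shell
#!/usr/bin/env bash
# Hypothetical filter: delete every UTF-8 no-break space (bytes c2 a0).
# LC_ALL=C makes sed treat its input as plain bytes.
strip_nbsp() {
    LC_ALL=C sed $'s/\xc2\xa0//g'
}

# Usage: a shortened version of the header line from the question.
printf 'Group\302\240   Signal\302\240name\302\240\n' | strip_nbsp | hexdump -C
```

Deleting the bytes joins "Signal" and "name" into one word, as in the Perl output above; substituting a regular space instead of nothing may be closer to what the file was meant to contain.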
This is an example of my case function:
function SendToScreen(){
echo -e "$*"
}
So I call it by:
SendToScreen "Hello"
And, if I want to add color codes:
VioletForeGroundColor="\033[38;5;99m"
NormalColor="\033[0m"
SendToScreen "Hello"$VioletForeGroundColor" violet "$NormalColor" word."
That gives me a correct:
But the problem comes if I want to send a DOS-type path (which includes backslashes):
VioletForeGroundColor="\033[38;5;99m"
NormalColor="\033[0m"
MyDOSPath="d:\vivisector"
SendToScreen "Hello"$VioletForeGroundColor" violet "$NormalColor" word. The path is $MyDOSPath"
Because \v is some sort of ANSI code, this time I obtain:
I need my function to output colored text (bold, cursive, underline, etc.), so I must use echo -e.
How can I solve the problem of control codes such as \v colliding with ordinary characters in the text (I suppose there will be others)?
I would like to fix the issue by modifying the function, but I am not sure this is the proper method.
Thanks.
EDIT-1: We will choose \033 also known as \e as the only ANSI code that needs to remain.
New answer:
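function SendToScreen() {
    # Double every backslash, then restore the ones that start \033[.
    # The outer quotes keep whitespace intact and prevent globbing.
    echo -e "$(echo "${*//\\/\\\\}" | sed 's/\\\\033\[/\\033\[/g')"
}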
This one escapes everything, then un-escapes anything that looks like a color sequence (\033[). The possibility of sending filenames as color sequences is greatly reduced. You can reduce it even further by white-listing only those color sequences that you want to allow, and changing the sed command to a sequence of sed commands that un-escapes those exact sequences.
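The white-listing idea could look roughly like the following. It is a sketch assuming bash, and the two allowed sequences are simply the ones used in the question (\033[38;5;99m and \033[0m); anything else, including other \033[ sequences, is printed literally.

```shell
#!/usr/bin/env bash
# Sketch: double every backslash, then un-escape only the exact color
# sequences on the white list before handing the text to echo -e.
SendToScreen() {
    local escaped
    escaped=$(printf '%s\n' "${*//\\/\\\\}" |
        sed -e 's/\\\\033\[38;5;99m/\\033[38;5;99m/g' \
            -e 's/\\\\033\[0m/\\033[0m/g')
    echo -e "$escaped"
}

VioletForeGroundColor="\033[38;5;99m"
NormalColor="\033[0m"
MyDOSPath="d:\vivisector"
SendToScreen "Hello${VioletForeGroundColor} violet ${NormalColor}word. The path is $MyDOSPath"
```

Adding another allowed sequence means adding one more -e expression to the sed call.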
Old answer:
Let's say you want to escape \v and \n, you can do this:
function SendToScreen(){
a="${*//\\v/\\\\v}"
a="${a//\\n/\\\\n}"
echo -e "$a"
}
You can extend this with whatever other escapes you don't want to process.
The echo -e simply interprets sequences starting with backslash, so you simply need to ensure that the $MyDOSPath argument has all backslashes doubled up. That could be:
SendToScreen "Hello ${VioletForeGroundColor}violet${NormalColor} word." \
"The path is ${MyDOSPath//\\/\\\\}"
which uses a 'substitute' parameter expansion. The leading // means 'replace every match', so here every backslash is changed to a double backslash.
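To see the substitution in isolation, here is a small sketch, assuming bash, that uses the path from the question:

```shell
#!/usr/bin/env bash
MyDOSPath='d:\vivisector'

# ${var//pattern/replacement} replaces every match; here each single
# backslash becomes two.
escaped="${MyDOSPath//\\/\\\\}"

printf '%s\n' "$escaped"   # shows the value with the backslash doubled
echo -e "$escaped"         # echo -e turns the pair back into one backslash
```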
As discussed in various comments, maybe the design of SendToScreen is sub-optimal. One possible alternative design uses:
SendToScreen [-e "string-to-expand"] [-p "plain-string"] [-- "plain strings"]
Arguments that need to be expanded are, and those that should not be expanded are not. By default, they're not. So, example usage:
$ VioletForeGroundColor="\033[38;5;99m"
$ NormalColor="\033[0m"
$ MyDOSPath="C:\new\table\value\alert\form\033.txt"
$ echo "$MyDOSPath"
C:\new\table\value\alert\form\033.txt
$ bash SendToScreen.sh -e "${VioletForeGroundColor}violet${NormalColor}" -e "The path is ${MyDOSPath//\\/\\\\}" -p "Or $MyDOSPath" "Plain $MyDOSPath"
violet The path is C:\new\table\value\alert\form\033.txt Or C:\new\table\value\alert\form\033.txt Plain C:\new\table\value\alert\form\033.txt
$ bash SendToScreen.sh -e "${VioletForeGroundColor}violet${NormalColor}" -e "The path is ${MyDOSPath//\\/\\\\}" -p "Or $MyDOSPath" -e "Oops! $MyDOSPath" "Plain $MyDOSPath"
violet The path is C:\new\table\value\alert\form\033.txt Or C:\new\table\value\alert\form\033.txt Oops! C: ew able
aluelert
orm.txt Plain C:\new\table\value\alert\form\033.txt
$
A hex dump of the last lot of output was:
0x0000: 1B 5B 33 38 3B 35 3B 39 39 6D 76 69 6F 6C 65 74 .[38;5;99mviolet
0x0010: 1B 5B 30 6D 20 54 68 65 20 70 61 74 68 20 69 73 .[0m The path is
0x0020: 20 43 3A 5C 6E 65 77 5C 74 61 62 6C 65 5C 76 61 C:\new\table\va
0x0030: 6C 75 65 5C 61 6C 65 72 74 5C 66 6F 72 6D 5C 30 lue\alert\form\0
0x0040: 33 33 2E 74 78 74 20 4F 72 20 43 3A 5C 6E 65 77 33.txt Or C:\new
0x0050: 5C 74 61 62 6C 65 5C 76 61 6C 75 65 5C 61 6C 65 \table\value\ale
0x0060: 72 74 5C 66 6F 72 6D 5C 30 33 33 2E 74 78 74 20 rt\form\033.txt
0x0070: 4F 6F 70 73 21 20 43 3A 20 65 77 20 61 62 6C 65 Oops! C: ew able
0x0080: 0B 61 6C 75 65 07 6C 65 72 74 0C 6F 72 6D 1B 2E .alue.lert.orm..
0x0090: 74 78 74 20 50 6C 61 69 6E 20 43 3A 5C 6E 65 77 txt Plain C:\new
0x00A0: 5C 74 61 62 6C 65 5C 76 61 6C 75 65 5C 61 6C 65 \table\value\ale
0x00B0: 72 74 5C 66 6F 72 6D 5C 30 33 33 2E 74 78 74 0A rt\form\033.txt.
0x00C0:
You'll have to take my word for it that violet appeared in violet.
Clearly, the user (caller) of SendToScreen has to know which arguments should be expanded and which should not. However, it makes it very explicit.
Here's the code I used as a script. Repackaging as a function is left as an exercise for the reader. Extending it to add -c colour (or maybe -f foreground and -b background) is an exercise for the reader.
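#!/bin/bash
# Collect the pieces of the message in the order they were given.
output=()
while getopts "p:e:" opt
do
    case "$opt" in
    (e) output+=( $(echo -e "$OPTARG") );;   # -e: expand backslash escapes
    (p) output+=( "$OPTARG" );;              # -p: keep the string verbatim
    esac
done
shift $(($OPTIND - 1))
# Print the collected pieces, then any remaining plain arguments.
echo "${output[@]}" "$@"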
Have fun!
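One possible way to repackage the script as a function is sketched below, assuming bash. It is not the answer's own code: it swaps echo -e for printf '%b' and quotes the expansion, so unlike the script above it does not word-split the expanded text, and it resets OPTIND locally so the function can be called more than once.

```shell
#!/usr/bin/env bash
# Sketch of a function variant: -e expands backslash escapes,
# -p and trailing arguments are printed verbatim.
SendToScreen() {
    local OPTIND=1 opt
    local -a output=()
    while getopts "p:e:" opt; do
        case "$opt" in
            e) output+=( "$(printf '%b' "$OPTARG")" ) ;;
            p) output+=( "$OPTARG" ) ;;
        esac
    done
    shift $((OPTIND - 1))
    output+=( "$@" )
    # Join all pieces with single spaces.
    printf '%s\n' "${output[*]}"
}

VioletForeGroundColor="\033[38;5;99m"
NormalColor="\033[0m"
MyDOSPath='C:\new\table'
SendToScreen -e "${VioletForeGroundColor}violet${NormalColor}" -p "Path: $MyDOSPath"
```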
I have a bash file with the content
cd /var/www/path/to/folder
git pull
When I run it I get
: No such file or directorywww/path/to/folder
' is not a git command. See 'git --help'.
Did you mean this?
pull
Any idea why bash gets a truncated version of commands?
You have carriage returns (Windows text file line endings) in your bash script. Remove them.
The bash file should look like this under hexdump -C:
00000000 63 64 20 2f 76 61 72 2f 77 77 77 2f 70 61 74 68 |cd /var/www/path|
00000010 2f 74 6f 2f 66 6f 6c 64 65 72 0a 67 69 74 20 70 |/to/folder.git p|
00000020 75 6c 6c 0a |ull.|
00000024
But yours looks like this instead:
00000000 63 64 20 2f 76 61 72 2f 77 77 77 2f 70 61 74 68 |cd /var/www/path|
00000010 2f 74 6f 2f 66 6f 6c 64 65 72 0d 0a 67 69 74 20 |/to/folder..git |
00000020 70 75 6c 6c 0d 0a |pull..|
Note the extra 0d's (hex 0D = decimal 13 = ASCII carriage return, ANSI \r) in front of the 0as (hex 0A = decimal 10 = ASCII linefeed, ANSI \n, which is what bash treats as the end of a line).
A carriage return is not whitespace in bash, so it is treated as part of the last argument on the command line. You're getting errors because the folder /var/www/path/to/folder\r doesn't exist and pull\r isn't a valid git subcommand.
When printed, a carriage return moves the cursor to the start of the line, which is why your error messages look wrong. Bash and git are printing something like foo.bash: line 1: cd: /var/www/path/to/folder\r: No such file or directory and git: 'pull\r' is not a git command. See 'git --help', but after the \r moves the cursor to the start of the line, the tail end of each message overwrites its beginning.
There's a program called dos2unix that converts a text file from DOS to Unix:
dos2unix <filename >newfilename
But that conversion really consists of nothing but deleting the carriage returns, which you could also do explicitly with tr:
tr -d '\r' <filename >newfilename
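To check whether a script is affected before running it, something like the following could be used. It is a sketch assuming bash; the helper names has_cr and strip_cr are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting and removing carriage returns.
has_cr() {
    LC_ALL=C grep -q $'\r' "$1"
}

strip_cr() {
    tr -d '\r' < "$1"
}

# Usage sketch with a throwaway script that has CRLF line endings.
demo=$(mktemp)
printf 'cd /\r\necho hi\r\n' > "$demo"
if has_cr "$demo"; then
    strip_cr "$demo" > "$demo.fixed"
    bash "$demo.fixed"       # runs cleanly once the CRs are gone
fi
rm -f "$demo" "$demo.fixed"
```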