Remove ^H and ^M characters from a file using Linux shell scripting

How do I remove ^H and ^M characters from a file using Linux shell scripting?
^[[0^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H rcv-packets: 0
^[[0^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H^H ^H rcv-errs: 0
rcv-drop: 0
rcv-fifo: 0
rcv-frame: 0

What you're seeing there are control characters. You can simply delete them with tr:
cat your_file | tr -d '\b\r'
This is better, since it avoids the extra cat:
tr -d '\b\r' < your_file
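To see the effect without touching a real file, here is a small sketch. The sample text is made up to resemble the output in the question; printf generates the backspaces and the carriage return.

```shell
# Build a sample line containing backspaces (\b) and a trailing carriage
# return (\r). The text itself is invented for illustration.
sample=$(printf 'x\b \brcv-packets: 0\r')

# Delete every backspace and carriage return; all other bytes pass through.
cleaned=$(printf '%s\n' "$sample" | tr -d '\b\r')

# Dump the result byte by byte to confirm no \b or \r is left.
printf '%s\n' "$cleaned" | od -c
```

Note that this only deletes \b and \r. The ^[[0 escape sequences shown in the question start with ESC (\033) and would need a separate step.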

Two methods come to mind immediately:
tr -d ^H
sed 's/^H//g'
In both, ^H is a literal backspace, typed as control+v control+h.
Here are both in action:
$ od -c test
0000000 \b h e l l o \b t h e r e \b \n
0000016
$ sed 's/^H//g' < test | od -c
0000000 h e l l o t h e r e \n
0000013
$ tr -d ^H < test | od -c
0000000 h e l l o t h e r e \n
0000013

To remove ^M characters that appear at the end of every line, I usually do this in the vi editor:
:%s/.$//g
It removes the last character of every line, whatever that character is, so it is only safe when every line really ends in ^M.
This solved my problem.
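Because the vi command deletes the last character regardless of what it is, a line without ^M would lose a real character. A more cautious sketch with sed, which removes a carriage return only when one is actually at the end of the line (sample data made up):

```shell
# Carriage return produced by printf, so the pattern also works where
# sed does not understand the \r escape.
cr=$(printf '\r')

# Two sample lines: the first ends in CR, the second does not.
# \$ inside double quotes becomes a plain $, the end-of-line anchor.
fixed=$(printf 'first\r\nsecond\n' | sed "s/$cr\$//")

# Dump the result byte by byte; "second" keeps its final letter.
printf '%s\n' "$fixed" | od -c
```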

Use the sed utility. Here % stands in for the character to remove.
See the examples below:
sed 's/%//' file > newfile
echo "82%%%" | sed 's/%*$//'
echo "68%" | sed "s/%$//" # assumes % is always at the end

You can remove all control characters by using tr, e.g.
tr -d "[:cntrl:]" < file.txt
(tr reads standard input; it does not accept a file name argument.)
To exclude some of them (like line endings), check: Removing control characters from a file.

If you want to change the original file, do this:
sed -i '.bak' 's/^M//g ; s/^H//g' test.md
(^M is control+v control+m)
(^H is control+v control+h)
This is the BSD/macOS form of -i. GNU sed expects the suffix attached to the option, as in -i.bak.
For many files, you can do this:
find source -name '*.md' | xargs sed -i '.bak' 's/^M//g ; s/^H//g'
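A variant that, to my knowledge, runs unchanged with both GNU and BSD sed is to attach the suffix (-i.bak) and let find call sed directly. The sketch below works in a scratch directory created with mktemp; the file name and contents are made up.

```shell
# Scratch directory with one sample file (names invented for the demo).
dir=$(mktemp -d)
printf 'title\r\nbo\bdy\r\n' > "$dir/test.md"

# Generate the control characters instead of typing them literally.
cr=$(printf '\r')
bs=$(printf '\b')

# -i.bak (no space) edits in place and keeps a backup next to each file.
# -exec ... {} + hands the file names to sed without word splitting,
# so names containing spaces are safe.
find "$dir" -name '*.md' -exec sed -i.bak "s/$cr//g; s/$bs//g" {} +

# Inspect the cleaned file; the original is kept as test.md.bak.
od -c "$dir/test.md"
```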

Related

How to replace all characters A and c from input with Z and e respectively

I want to do this without using sed or awk.
I tried this command to solve the problem:
tr Ac Ze
but it doesn't work.
Can anyone help, please?

You can use the sed command:
s: stands for substitute
g: applies the replacement to all matches of the regexp, not just the first
$ -> echo Aca | sed 's/A/Z/g; s/c/e/g'
Zea
Or just use the tr command, as @James said:
$ -> echo Aca | tr Ac Ze
Zea
Another example:
#!/bin/bash
read -p "Insert word: " word
echo "$word" | tr Ac Ze
Result:
Insert word: Aca
Zea
Or:
#!/bin/bash
read -p "Insert word: " word
echo "$word" | sed 's/A/Z/g; s/c/e/g'
Additional info:
tr
$ -> whatis tr
tr (1) - translate or delete characters
sed
$ -> whatis sed
sed (1) - stream editor for filtering and transforming text

Using tr to trim newlines from command-line argument ignored

I have a shell script that needs to trim newline from input. I am trying to trim new line like so:
param=$1
trimmed_param=$(echo $param | tr -d "\n")
# is the new line in my trimmed_param? yes
echo $trimmed_param| od -xc
# if i just run the tr -d on the data, it's trimmed.
# why is it not trimmed in the dynamic execution of echo in line 2
echo $param| tr -d "\n" |od -xc
I run it from command line as follows:
sh test.sh someword
And I get this output:
0000000 6f73 656d 6f77 6472 000a
s o m e w o r d \n
0000011
0000000 6f73 656d 6f77 6472
s o m e w o r d
0000010
The last command in the script echoes what I would expect trimmed_param to be if the tr -d "\n" had worked in line 2. What am I missing?
I realize I can use sed etc., but I would love to understand why this method is failing.

There has never been a newline in the param. It's echo that appends the newline: tr does strip it in line 2, but the echo in front of od adds a fresh one. Try
# script.sh
param=$1
printf "%s" "${param}" | od -xc
Then
bash script.sh foo
gives you
0000000 6f66 006f
f o o
0000003
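To make the byte counts explicit, here is a small sketch that measures each variant with wc -c. The variable names are mine; the $(( ... )) wrapper only strips the padding some wc implementations print.

```shell
param='someword'

# echo appends a newline: 8 letters + 1 newline.
echo_bytes=$(( $(echo "$param" | wc -c) ))

# printf '%s' writes the value and nothing else.
printf_bytes=$(( $(printf '%s' "$param" | wc -c) ))

# tr removes the newline that echo added, so the stored value is clean.
# Any newline seen later comes from the echo used to print it.
trimmed=$(echo "$param" | tr -d '\n')
trimmed_bytes=$(( $(printf '%s' "$trimmed" | wc -c) ))

echo "echo: $echo_bytes, printf: $printf_bytes, trimmed: $trimmed_bytes"
```

So the trimming in the original script did work; it was the way the result was inspected that put the newline back.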

How to translate and remove non-printable characters? [duplicate]

I want to delete all the control characters from my file using linux bash commands.
There are some control characters like EOF (0x1A) especially which are causing the problem when I load my file in another software. I want to delete this.
Here is what I have tried so far:
this will list all the control characters:
cat -v -e -t file.txt | head -n 10
^A+^X$
^A1^X$
^D ^_$
^E-^D$
^E-^S$
^E1^V$
^F%^_$
^F-^D$
^F.^_$
^F/^_$
^F4EZ$
^G%$
This will list all the control characters using grep:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]'
+
1
-
-
1
%
-
.
/
matches the above output of cat command.
Now, I ran the following command to show all lines not containing control characters but it is still showing the same output as above (lines with control characters)
$ cat file.txt | head -n 10 | grep '[^[:cntrl:]]'
+
1
-
-
1
%
-
.
/
here is the output in hex format:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]' | od -t x2
0000000 2b01 0a18 3101 0a18 2004 0a1f 2d05 0a04
0000020 2d05 0a13 3105 0a16 2506 0a1f 2d06 0a04
0000040 2e06 0a1f 2f06 0a1f
0000050
as you can see, the hex values, 0x01, 0x18 are control characters.
I tried using the tr command to delete the control characters but got an error:
$ cat file.txt | tr -d "\r\n" "[:cntrl:]" >> test.txt
tr: extra operand `[:cntrl:]'
Only one string may be given when deleting without squeezing repeats.
Try `tr --help' for more information.
If I delete all control characters, I will end up deleting the newline and carriage return as well which is used as the newline characters on windows. How do I delete all the control characters keeping only the ones required like "\r\n"?
Thanks.

Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:
$ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt
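One thing worth knowing about that list: \011 is the tab character, so tabs are deleted too, and DEL (\177) is left in. The sketch below compares the set from the answer with a variant that keeps tabs and also drops DEL. The sample bytes are made up.

```shell
# Sample bytes: SOH (\001), tab, CAN (\030), SUB/EOF (\032), then CR LF.
sample() { printf 'a\001\tb\030c\032\r\n'; }

# The set from the answer: keeps only \n (\012) and \r (\015);
# the tab (\011) is deleted along with the other control characters.
strict=$(sample | tr -d '\000-\011\013\014\016-\037')

# Variant: stop the first range at \010 so tab survives, and add \177 (DEL).
keep_tab=$(sample | tr -d '\000-\010\013\014\016-\037\177')

# Show both results byte by byte.
printf '%s\n' "$strict" | od -c
printf '%s\n' "$keep_tab" | od -c
```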

Based on this answer on unix.stackexchange, this should do the trick:
$ cat scriptfile.raw | col -b > scriptfile.clean

Try grep, like:
grep -o "[[:print:][:space:]]*" in.txt > out.txt
which will print only printable characters (alphanumerics and punctuation) plus whitespace characters such as tab, newline, vertical tab, form feed, carriage return, and space.
To be less restrictive, and remove only control characters ([:cntrl:]), delete them with:
tr -d "[:cntrl:]"
If you want to keep \n (which is part of [:cntrl:]), then temporarily replace it with something else, e.g.
cat file.txt | tr '\r\n' '\275\276' | tr -d "[:cntrl:]" | tr "\275\276" "\r\n"
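An alternative to the placeholder trick is to turn the logic around: instead of deleting a set, delete the complement of what should stay. This is a sketch that assumes plain ASCII input, because under LC_ALL=C any byte of 0x80 or above counts as non-printable and is deleted as well.

```shell
# -c complements the set, -d deletes: everything that is NOT printable,
# \r, or \n is removed. LC_ALL=C makes tr operate on single bytes.
# Sample bytes (made up): SOH, SUB/EOF and a tab mixed into "abc" + CR LF.
kept=$(printf 'a\001b\032\tc\r\n' | LC_ALL=C tr -cd '[:print:]\r\n')

# Dump the result byte by byte; the tab is gone too, since it is not in the set.
printf '%s\n' "$kept" | od -c
```

Add \t to the set if tabs should survive.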

A little late to the party: cat -v <file>
which I think is the easiest to remember of the lot! (It only makes the control characters visible in ^X notation; it does not remove them.)

Delete the character sequence \r\n on linux using tr/sed etc

I am trying to delete the string '\r\n' from a file.
Using sed:
cat foo | sed -e 's/\015\012//'
does not seem to work.
tr -d '\015'
will delete a single character but I want to remove the string \015\012. Any suggestions?

If I can offer a perl solution:
$ printf "a\nb\r\nc\nd\r\ne\n" | perl -0777 -pe 's/\r\n//g' | od -c
0000000 a \n b c \n d e \n
0000010
The -0777 option causes the entire file to be slurped in as a single string.

What about:
sed ':a;N;$!ba;s/\r\|\n//g'
This removes every \r and \n character. To remove only the sequence \r\n, use this:
sed ':a;N;$!ba;s/\r\n//g'
Adapted from:
https://stackoverflow.com/a/1252191/520567
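For context, the plain sed attempt in the question fails because sed works line by line and removes the newline before applying the expression, so \012 is never there to match. If reading the whole file into memory is a concern, a streaming sketch with awk (my suggestion, not part of the answers above) handles one line at a time:

```shell
# A line that ended in \r\n loses both bytes and is joined to the next one;
# a line that ended in a bare \n is printed unchanged.
joined=$(printf 'a\nb\r\nc\nd\r\ne\n' |
  awk '{ if (sub(/\r$/, "")) printf "%s", $0; else print }')

# Dump the result byte by byte.
printf '%s\n' "$joined" | od -c
```

sub() returns the number of replacements, so the if branch runs only for lines that actually ended in a carriage return.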
