Delete the character sequence \r\n on Linux using tr/sed etc

I am trying to delete the string '\r\n' from a file.
Using sed:
cat foo | sed -e 's/\015\012//'
does not seem to work.
tr -d '\015'
will delete a single character but I want to remove the string \015\012. Any suggestions?

If I can offer a perl solution:
$ printf "a\nb\r\nc\nd\r\ne\n" | perl -0777 -pe 's/\r\n//g' | od -c
0000000 a \n b c \n d e \n
0000010
The -0777 option causes the entire file to be slurped in as a single string.

What about:
sed ':a;N;$!ba;s/\r\|\n//g'
This removes every \r and every \n character (GNU sed). If you want to remove only the sequence \r\n, use this instead:
sed ':a;N;$!ba;s/\r\n//g'
tuned from:
https://stackoverflow.com/a/1252191/520567
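The sed attempt in the question probably fails because sed strips the trailing newline from each line before running the script, so \012 is never in the pattern space. Here is a minimal sketch of a line-oriented alternative with awk. The function name strip_crlf is made up for illustration, and it assumes an awk that understands \r in a regular expression (gawk, mawk and busybox awk do).

```shell
# strip_crlf: delete every "\r\n" pair but keep bare "\n" line endings.
# Hypothetical helper name; reads stdin or the files given as arguments.
strip_crlf() {
  awk '{
    if (sub(/\r$/, "")) printf "%s", $0   # line ended in \r\n: drop both bytes
    else print                            # bare \n: keep the line break
  }' "$@"
}

# Same sample input as the perl answer; od -c shows the surviving bytes.
printf 'a\nb\r\nc\nd\r\ne\n' | strip_crlf | od -c
```

Unlike the slurping approaches, this never holds more than one line in memory, which may matter for large files.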

Related

How to translate and remove non-printable characters? [duplicate]

I want to delete all the control characters from my file using linux bash commands.
Some control characters, especially EOF (0x1A), cause problems when I load the file into another program. I want to delete them.
Here is what I have tried so far:
this will list all the control characters:
cat -v -e -t file.txt | head -n 10
^A+^X$
^A1^X$
^D ^_$
^E-^D$
^E-^S$
^E1^V$
^F%^_$
^F-^D$
^F.^_$
^F/^_$
^F4EZ$
^G%$
This will list all the control characters using grep:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]'
+
1
-
-
1
%
-
.
/
matches the above output of cat command.
Now, I ran the following command to show all lines not containing control characters but it is still showing the same output as above (lines with control characters)
$ cat file.txt | head -n 10 | grep '[^[:cntrl:]]'
+
1
-
-
1
%
-
.
/
here is the output in hex format:
$ cat file.txt | head -n 10 | grep '[[:cntrl:]]' | od -t x2
0000000 2b01 0a18 3101 0a18 2004 0a1f 2d05 0a04
0000020 2d05 0a13 3105 0a16 2506 0a1f 2d06 0a04
0000040 2e06 0a1f 2f06 0a1f
0000050
as you can see, the hex values, 0x01, 0x18 are control characters.
I tried using the tr command to delete the control characters but got an error:
$ cat file.txt | tr -d "\r\n" "[:cntrl:]" >> test.txt
tr: extra operand `[:cntrl:]'
Only one string may be given when deleting without squeezing repeats.
Try `tr --help' for more information.
If I delete all control characters, I will end up deleting the newline and carriage return as well which is used as the newline characters on windows. How do I delete all the control characters keeping only the ones required like "\r\n"?
Thanks.
Instead of using the predefined [:cntrl:] set, which as you observed includes \n and \r, just list (in octal) the control characters you want to get rid of:
$ tr -d '\000-\011\013\014\016-\037' < file.txt > newfile.txt
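As a rough check of that octal list, the sketch below wraps the same tr call in a function (strip_ctrl is a made-up name) and dumps the result as hex. Note that the range \000-\011 also deletes tabs, and that DEL (\177) is left alone. LC_ALL=C is an added assumption so that tr works on bytes.

```shell
# strip_ctrl: delete control bytes except LF (\012) and CR (\015).
# Hypothetical helper name; the tr set is the one from the answer above.
strip_ctrl() {
  LC_ALL=C tr -d '\000-\011\013\014\016-\037'
}

# Sample input with 0x01, 0x18 and 0x1A mixed into CRLF-terminated lines.
printf '\001+\030\r\nx\032y\r\n' | strip_ctrl | od -An -tx1
```

Only the bytes for +, x, y and the two \r\n pairs should remain in the dump.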
Based on this answer on unix.stackexchange, this should do the trick:
$ cat scriptfile.raw | col -b > scriptfile.clean
Try grep, like:
grep -o "[[:print:][:space:]]*" in.txt > out.txt
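which will print only printable characters (alphanumerics and punctuation) and whitespace characters such as tab, newline, vertical tab, form feed, carriage return, and space. Because -o prints each match on its own line, a line is split wherever a control character was removed.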
To be less restrictive, and remove only control characters ([:cntrl:]), delete them by:
tr -d "[:cntrl:]"
If you want to keep \r and \n (which are part of [:cntrl:]), temporarily replace them with something else, e.g.
cat file.txt | tr '\r\n' '\275\276' | tr -d "[:cntrl:]" | tr "\275\276" "\r\n"
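A little late to the party: cat -v <file>
which I think is the easiest to remember of the lot! Note that cat -v rewrites each control character in caret notation (for example ^A) rather than deleting it.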
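On the grep attempt in the question: '[^[:cntrl:]]' matches any line that contains at least one non-control character, which is why the same lines came back. To list only lines with no control characters at all, the match has to be inverted with -v. A small sketch, with made-up function names and LC_ALL=C added as an assumption:

```shell
# Lines that contain no control characters at all (inverted match).
clean_lines() { LC_ALL=C grep -v '[[:cntrl:]]' "$@"; }

# Lines that contain at least one non-control character (the question's command).
any_printable() { LC_ALL=C grep '[^[:cntrl:]]' "$@"; }

# The first line is clean, the second carries 0x01 and 0x18.
printf 'ok\n\001+\030\n' | clean_lines
printf 'ok\n\001+\030\n' | any_printable | wc -l
```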

Using sed to split a string with a delimiter

I have a string in the following format:
string1:string2:string3:string4:string5
I'm trying to use sed to split the string on : and print each sub-string on a new line. Here is what I'm doing:
cat ~/Desktop/myfile.txt | sed s/:/\\n/
This prints:
string1
string2:string3:string4:string5
How can I get it to split on each delimiter?
To split a string with a delimiter with GNU sed you say:
sed 's/delimiter/\n/g' # GNU sed
For example, to split using : as a delimiter:
$ sed 's/:/\n/g' <<< "he:llo:you"
he
llo
you
Or with a non-GNU sed:
$ sed $'s/:/\\\n/g' <<< "he:llo:you"
he
llo
you
In this particular case, you missed the g after the substitution. Hence, it is just done once. See:
$ echo "string1:string2:string3:string4:string5" | sed s/:/\\n/g
string1
string2
string3
string4
string5
g stands for global and means that the substitution is applied to every occurrence on the line. Without any flag only the first occurrence is replaced; a numeric flag such as 2 replaces only the second occurrence, and so on.
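A quick way to see what the substitution flags do is to run the same input through three variants. The variable names below are only for illustration.

```shell
line='a:b:c:d'

# No flag: only the first ":" on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/:/-/')

# Numeric flag: only the second ":" is replaced.
second=$(printf '%s\n' "$line" | sed 's/:/-/2')

# g flag: every ":" is replaced.
all=$(printf '%s\n' "$line" | sed 's/:/-/g')

printf '%s\n' "$first" "$second" "$all"
```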
All together, in your case you would need to use:
sed 's/:/\n/g' ~/Desktop/myfile.txt
Note that you can directly use the sed ... file syntax, instead of unnecessary piping: cat file | sed.
Using \n in sed is non-portable. The portable way to do what you want with sed is:
sed 's/:/\
/g' ~/Desktop/myfile.txt
but in reality this isn't a job for sed anyway, it's the job tr was created to do:
tr ':' '
' < ~/Desktop/myfile.txt
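The two portable variants can be wrapped in small functions for comparison. This is only a sketch; split_tr and split_sed are invented names, and both read from standard input.

```shell
# Split on ":" with tr, which maps one character to another.
split_tr() { tr ':' '\n'; }

# Split on ":" with sed, using a backslash followed by a literal newline
# in the replacement (the portable spelling).
split_sed() {
  sed 's/:/\
/g'
}

printf 'he:llo:you\n' | split_tr
printf 'he:llo:you\n' | split_sed
```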
Using simply tr :
$ tr ':' $'\n' <<< string1:string2:string3:string4:string5
string1
string2
string3
string4
string5
If you really need sed :
$ sed 's/:/\n/g' <<< string1:string2:string3:string4:string5
string1
string2
string3
string4
string5
This might work for you (GNU sed):
sed 'y/:/\n/' file
or perhaps:
sed y/:/$"\n"/ file
This should do it:
cat ~/Desktop/myfile.txt | sed s/:/\\n/g
If you're using gnu sed then you can use \x0A for newline:
sed 's/:/\x0A/g' ~/Desktop/myfile.txt

How to concatenate multiple lines of output to one line?

If I run the command cat file | grep pattern, I get many lines of output. How do you concatenate all lines into one line, effectively replacing each "\n" with "\" " (end with " followed by space)?
cat file | grep pattern | xargs sed s/\n/ /g
isn't working for me.
Use tr '\n' ' ' to translate all newline characters to spaces:
$ grep pattern file | tr '\n' ' '
Note: grep reads files, cat concatenates files. Don't cat file | grep!
Edit:
tr can only handle single character translations. You could use awk to change the output record separator like:
$ grep pattern file | awk '{print}' ORS='" '
This would transform:
one
two
three
to:
one" two" three"
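With ORS the separator is also written after the last record. If that trailing separator is unwanted, one possible variant prints the separator before every record except the first. The function name join_lines is hypothetical. The separator is passed with awk -v, so backslashes in it would be interpreted as escapes.

```shell
# join_lines SEP: join stdin lines with SEP between them, no trailing SEP.
# Hypothetical helper; SEP goes through awk -v, which processes backslash escapes.
join_lines() {
  awk -v sep="$1" '
    NR > 1 { printf "%s", sep }    # separator before every line but the first
           { printf "%s", $0 }     # the line itself, without its newline
    END    { if (NR) print "" }    # finish with a single newline
  '
}

printf 'one\ntwo\nthree\n' | join_lines '" '
```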
Piping output to xargs will concatenate each line of output to a single line with spaces:
grep pattern file | xargs
Or any command, e.g. ls | xargs. xargs limits the length of the command line it builds; the default is system-dependent (GNU xargs reports it with xargs --show-limits) and can be changed with e.g. xargs -s 8192.
In bash, echo with an unquoted argument collapses newlines, tabs and runs of spaces into single spaces:
echo $(cat file)
This could be what you want
cat file | grep pattern | paste -sd' '
As to your edit, I'm not sure what it means, perhaps this?
cat file | grep pattern | paste -sd'~' | sed -e 's/~/" "/g'
(this assumes that ~ does not occur in file)
This is an example which produces output separated by commas. You can replace the comma by whatever separator you need.
cat <<EOD | xargs | sed 's/ /,/g'
> 1
> 2
> 3
> 4
> 5
> EOD
produces:
1,2,3,4,5
The fastest and easiest ways I know to solve this problem:
When we want to replace the new line character \n with the space:
xargs < file
xargs has its own limits on the number of characters per line and on the total number of characters, but we can increase them. Details can be found by running xargs --show-limits and, of course, in the manual: man xargs
When we want to replace one character with another exactly one character:
tr '\n' ' ' < file
When we want to replace one character with many characters:
tr '\n' '~' < file | sed s/~/many_characters/g
First, we replace the newline characters \n with tildes ~ (or another unique character not present in the text), and then we replace every tilde with the desired string (many_characters); the g flag applies the replacement to each tilde.
Here is another simple method using awk:
# cat > file.txt
a
b
c
# cat file.txt | awk '{ printf("%s ", $0) }'
a b c
Also, if your file has columns, this gives an easy way to concatenate only certain columns:
# cat > cols.txt
a b c
d e f
# cat cols.txt | awk '{ printf("%s ", $2) }'
b e
I like the xargs solution, but if it's important to not collapse spaces, then one might instead do:
sed ':b;N;$!bb;s/\n/ /g'
That will replace newlines with spaces, without replacing the final line terminator as tr '\n' ' ' would.
This also allows you to use other joining strings besides a space, like a comma, etc, something that xargs cannot do:
$ seq 1 5 | sed ':b;N;$!bb;s/\n/,/g'
1,2,3,4,5
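The one-liner reads more easily when each sed command gets its own -e, which is also the spelling that non-GNU seds tend to need for labels. A sketch, with join_with_comma as a made-up name:

```shell
# join_with_comma: join all stdin lines with "," using the sed loop above.
join_with_comma() {
  # :b    define a label named b
  # N     append the next input line to the pattern space
  # $!bb  if this is not the last line, jump back to b
  # s///  finally turn every embedded newline into a comma
  sed -e ':b' -e 'N' -e '$!bb' -e 's/\n/,/g'
}

printf '1\n2\n3\n4\n5\n' | join_with_comma
```

The whole input ends up in the pattern space before the substitution runs, so memory use grows with the input size.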
Here is the method using ex editor (part of Vim):
Join all lines and print to the standard output:
$ ex +%j +%p -scq! file
Join all lines in-place (in the file):
$ ex +%j -scwq file
Note: This will concatenate all lines inside the file itself!
Probably the best way to do it is with awk, which writes the output on one line:
$ awk ' /pattern/ {print}' ORS=' ' /path/to/file
It will merge all lines into one with space delimiter
paste -sd'~' gave me an error.
Here's what worked for me on a Mac using bash:
cat file | grep pattern | paste -d' ' -s -
From man paste:
-d list  Use one or more of the provided characters to replace the newline characters instead of the default tab. The characters in list are used circularly, i.e., when list is exhausted the first character from list is reused. This continues until a line from the last input file (in default operation) or the last line in each file (using the -s option) is displayed, at which time paste begins selecting characters from the beginning of list again.
         The following special characters can also be used in list:
         \n  newline character
         \t  tab character
         \\  backslash character
         \0  Empty string (not a null character).
         Any other character preceded by a backslash is equivalent to the character itself.
-s       Concatenate all of the lines of each separate input file in command line order. The newline character of every line except the last line in each input file is replaced with the tab character, unless otherwise specified by the -d option.
If ‘-’ is specified for one or more of the input files, the standard input is used; standard input is read one line at a time, circularly, for each instance of ‘-’.
On red hat linux I just use echo :
echo $(cat /some/file/name)
This gives me all records of a file on just one line.
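The echo trick works because the unquoted expansion is split into words on spaces, tabs and newlines, and echo then prints the words separated by single spaces. A small sketch, assuming bash or another POSIX-style shell with the default IFS (variable names are illustrative):

```shell
# Sample text with a double space, a tab and a newline in it.
text=$(printf 'a  b\tc\nd\n')

# Unquoted: word splitting collapses all whitespace runs to single spaces.
collapsed=$(echo $text)

# Quoted: the original whitespace is preserved.
preserved=$(printf '%s' "$text")

printf '%s\n' "$collapsed"
```

Be aware that the unquoted form also undergoes pathname expansion, so a word such as * in the file would be replaced by matching file names.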

unix sed command matching a word

I am trying to match a line and use the sed command to substitute it. Something like
aaa = 10
aaa =10
aaa=10
My sed regular expression should match all those patterns and should replace with something like bbb=5. I tried with
sed -i '/ *aaa *= */bbb=5'
But this does not work for all of the patterns. Any help would be much appreciated.
sed -i 's/\s*aaa\s*=\s*[0-9]*/bbb=5/' input_file
cat a | sed -e '1s/aaa =10/bbb=10/' -e '2s/ aaa =10/bbb=10/' -e '3s/aaa=10/bbb=10/'
cat myfile | sed 's/\s*aaa\s*=\s*\(.*\)/bbb = \1/'
\s (a GNU extension in sed) matches whitespace, including both tab and space.
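Where \s is not available, the POSIX class [[:space:]] can stand in for it. The sketch below is one possible spelling. The function name rewrite_aaa and the leading ^ anchor are assumptions added here so that a longer name ending in aaa is not rewritten by accident.

```shell
# rewrite_aaa: replace "aaa = <digits>" (any spacing) with "bbb=5".
# Hypothetical helper; [[:space:]] is the portable equivalent of \s.
rewrite_aaa() {
  sed 's/^[[:space:]]*aaa[[:space:]]*=[[:space:]]*[0-9]*/bbb=5/'
}

# The three spellings from the question.
printf 'aaa = 10\n aaa =10\naaa=10\n' | rewrite_aaa
```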

How do I replace backspace characters (\b) using sed?

I want to delete a fixed number of occurrences of the backspace character ( \b ) from stdin. So far I have tried this:
echo -e "1234\b\b\b56" | sed 's/\b{3}//'
But it doesn't work. How can I achieve this using sed or some other unix shell tool?
You can use the hexadecimal value for backspace:
echo -e "1234\b\b\b56" | sed 's/\x08\{3\}//'
You also need to escape the braces.
You can use tr:
echo -e "1234\b\b\b56" | tr -d '\b'
123456
If you want to delete three consecutive backspaces, you can use Perl:
echo -e "1234\b\b\b56" | perl -pe 's/(\010){3}//'
sed interprets \b as a word boundary. I got this to work in perl like so:
echo -e "1234\b\b\b56" | perl -pe '$b="\b";s/$b//g'
With sed:
echo "123\b\b\b5" | sed 's/[\b]\{3\}//g'
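You have to escape the { and } in the {3}, and also treat the \b specially by using a character class. Be aware that bash's echo without -e leaves \b as the two literal characters backslash and b, so in the session below the expression may simply be matching those literal characters; piping the output through od -c shows which bytes are really present.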
[birryree#lilun ~]$ echo "123\b\b\b5" | sed 's/[\b]\{3\}//g'
1235
Note if you want to remove the characters being deleted also, have a look at ansi2html.sh which contains processing like:
printf "12..\b\b34\n" | sed ':s; s#[^\x08]\x08##g; t s'
No need for Perl here!
# version 1
echo -e "1234\b\b\b56" | sed $'s/\b\{3\}//' | od -c
# version 2
bvar="$(printf '%b' '\b')"
echo -e "1234\b\b\b56" | sed 's/'"${bvar}"'\{3\}//' | od -c
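The same idea can be written without echo -e or GNU-only escapes by letting printf produce the backspace byte. This is a sketch under that assumption. The names bs, strip3 and apply_backspaces are made up, and the second function is a portable respelling of the ansi2html.sh loop quoted earlier.

```shell
# A literal backspace byte (0x08), produced portably with printf.
bs=$(printf '\b')

# strip3: delete the first run of exactly three consecutive backspaces.
strip3() { sed "s/${bs}\{3\}//"; }

# apply_backspaces: repeatedly remove "character + backspace" pairs,
# so each backspace also erases the character before it.
apply_backspaces() {
  sed -e ':s' -e "s/[^${bs}]${bs}//g" -e 't s'
}

printf '1234\b\b\b56\n' | strip3 | od -c
printf '12..\b\b34\n' | apply_backspaces
```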
