How to remove OCTAL character using Linux? - vim

I have a large file that I need to edit in Linux.
The file has data fields enclosed in double quotes ( "" ). But when I open the file in Notepad++ I see an SOH character between the quoted fields (e.g. "filed1"SOH"field2"SOHSOH"field3"SOH"field4").
And when I open the same file in vim I see the double quotes followed by a ^A character (e.g. "filed1"^A"field2"^A^A"field3"^A"field4").
Then when I execute this command in the command line
cat filename.txt | od -c | more
I see that the character is shown as 001 (ie. "filed1"001"field2"001001"field3"001"field4")
I have tried the following via vim
:s%/\\001//g
I also tried this command
sed -e s/\001//g filename.text > filename_new.txt
sed -e s/\\001//g filename.text > filename_new.txt
I need to remove those characters from that file.
How can I do that?

Your attempts at escaping the SOH character with \001 were close.
GNU sed has an extension to specify a decimal value with \d001 (there are also octal and hexadecimal variants):
$ sed -i -e 's/\d001//g' file.txt
In Vim, the regular expression atom looks slightly different: \%d001. Alternatively, you can enter the character directly on the :%s command line by pressing Ctrl + V followed by 001; cf. :help i_CTRL-V_digit.

Use echo -e to get a literal \001 character into your sed command:
$ sed -i -e "$(echo -e 's/\001//g')" file.txt
(-i is a GNU sed extension to request in-place editing.)
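Since the behaviour of echo -e differs between shells, one possible variation is to let printf produce the SOH byte and splice it into the sed script. This is only a sketch: the variable names soh and cleaned and the sample line are made up for illustration.

```shell
# Hypothetical sample: quoted fields separated by SOH (octal 001) bytes.
soh=$(printf '\001')   # printf understands octal escapes in any POSIX shell
cleaned=$(printf '"field1"\001"field2"\001\001"field3"\n' | sed "s/$soh//g")
printf '%s\n' "$cleaned"
```

Because the script is in double quotes, the shell inserts the raw byte before sed ever sees it, so no sed extension is needed.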

Just keep it simple with awk instead of fussing with quoting issues:
mawk NF=NF FS='\1' OFS=
"filed1""field2""field3""field4"


sed is replacing matched text with output of another command, but that command's output contains expansion characters [duplicate]

I'm trying to replace text in a file with the output of another command. Unfortunately, the outputted text contains characters bash expands. For example, I'm running the following script to change the file (somestring references output that would break the sed command):
#!/bin/bash
somestring='$6$sPnfj/lnXwZVrec7$fCnL9uy1oWIMZduInKTHBAxhsQxGCsBpm2XfVFFqDPHKidrd93yfjbYvKgYexXHVcvkKdu9lbfy16Ek5GvKy/1'
sed '0,/^title/s/^title*/'"$somestring"'\n&/' $HOME/example.txt
sed fails with this error:
sed: -e expression #1, char 30: unknown option to `s'
I think bash is substituting the contents of $somestring when building the sed command, but is then trying to expand the resulting text. I can't put the entire sed script in single quotes; I need bash to expand it the first time, just not the second. Any suggestions? Thanks
Here the forward slash / is the problem. If it's the only issue, you can tell sed to use a different delimiter,
for example:
$ somestring="abc/def"; echo xxx | sed 's/xxx/'"$somestring"'/'
sed: -e expression #1, char 11: unknown option to `s'
$ somestring="abc/def"; echo xxx | sed 's_xxx_'"$somestring"'_'
abc/def
You also need to worry about & and \ characters and escape them if they can appear in the replacement text.
If you can't control the replacement string, you either have to sanitize it with another sed script or, alternatively, use the r command to read it from a file. For example,
$ seq 5 | sed -e '/3/{r replace' -e 'd}'
1
2
3slashes///1ampersand&and2backslashes\\end
4
5
where
$ cat replace
3slashes///1ampersand&and2backslashes\\end
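The sanitizing step mentioned above could look roughly like this. It is a sketch, assuming / is the delimiter of the final s command and that the replacement contains no newlines; the sample string and the variable names escaped and result are made up.

```shell
# Hypothetical replacement text containing /, & and \ (all special to s///).
somestring='$6$abc/def&ghi\jkl'
# Put a backslash in front of every \, / and &. The | delimiter avoids
# having to escape the slash inside the bracket expression.
escaped=$(printf '%s\n' "$somestring" | sed 's|[\\/&]|\\&|g')
result=$(printf 'xxx\n' | sed "s/xxx/$escaped/")
printf '%s\n' "$result"
```

The $ characters need no treatment here: they are not special on the replacement side, and the shell expands "$escaped" only once.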
You have several errors here:
The string somestring contains characters that are significant to the sed command (the most important being '/', which you are using as the delimiter). You can escape it by preprocessing the string with another sed command:
somestring=$(echo "$somestring" | sed -e 's/\//\\\//g')
that will convert your / chars to \/ sequences.
You are using sed '0,/^title/s/^title*/'"$somestring"'\n&/' $HOME/example.txt, which substitutes the string titl followed by any number of e characters with the value of $somestring, followed by a newline and the original match. Unfortunately, sed(1) doesn't portably allow \n in the replacement side of the s command, but you can achieve the same result by using the i command with your text (preceding any newline with a \ so that it is taken literally):
Finally the script leads to:
#!/bin/bash
somestring='$6$sPnfj/lnXwZVrec7$fCnL9uy1oWIMZduInKTHBAxhsQxGCsBpm2XfVFFqDPHKidrd93yfjbYvKgYexXHVcvkKdu9lbfy16Ek5GvKy/1'
somestring=$(echo "$somestring" | sed -e 's/\//\\\//g')
sed '/^title/i\
'"$somestring\\
" $HOME/example.txt
If your shell is Bash, you can use parameter substitution to replace the problematic /:
somestring="${somestring//\//\\/}"
That looks scary, but is easier to understand if you look at the version that replaces x with __:
somestring="${somestring//x/__}"
It might be easier to use (say) underscore as the delimiter for your sed s command, and then the substitution above would be
somestring="${somestring//_/\\_}"
If you already have backslashes, you'll need to first replace those:
somestring="${somestring//\\/\\\\}"
somestring="${somestring//\//\\/}"
If there were other characters that needed escaping (e.g. on the search side of s///), then you could extend the above appropriately.
This URL provides the cleanest answer:
Command to escape a string in bash
printf "%q" "$someVariable"
will escape any characters you need escaped for you.

sed -e 's/^M/d' not working

A very common problem, but I am unable to work around it with sed.
I have a script file (a batch of commands), say myfile.txt, to be executed at once to create a list. Now when I execute the batch operation, my command line interface clearly shows it is unable to parse the commands, because a carriage return (^M) is being added at the end of each line.
I thought sed would be the best way to go about it. I tried:
sed -e 's/^M/d' myfile.txt > myfile1.txt
mv myfile1.txt myfile.txt
It didn't work. I also tried this and it didn't work:
sed -e 's/^M//g' myfile.txt > myfile1.txt
mv myfile1.txt myfile.txt
Then I thought maybe sed is taking it as an M character at the beginning of the line, hence no result. So I tried:
sed -e 's/\^M//g' myfile.txt > myfile1.txt
mv myfile1.txt myfile.txt
But no change. Is there a basic mistake I am making? Kindly advise, as I am bad at sed.
I found a resolution though which was to open the file in vi editor and in command mode execute this:
:set fileformat=unix
:w
But I want it in sed as well.
^M is not literally ^M. Replace ^M with \r. You can use the same representation for tr; these two commands both remove carriage returns:
tr -d '\r' < input.txt > output.txt
sed -e 's/\r//g' input.txt > output.txt
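To confirm that the carriage returns are really gone, you can count them before and after. A small sketch, using a throwaway directory and made-up file names (crlf_sample.txt, lf_sample.txt):

```shell
workdir=$(mktemp -d)                      # scratch directory for the demo
printf 'first\r\nsecond\r\n' > "$workdir/crlf_sample.txt"
tr -d '\r' < "$workdir/crlf_sample.txt" > "$workdir/lf_sample.txt"
# tr -cd '\r' deletes everything except CR, so wc -c counts the CRs.
before=$(tr -cd '\r' < "$workdir/crlf_sample.txt" | wc -c)
remaining=$(tr -cd '\r' < "$workdir/lf_sample.txt" | wc -c)
echo "carriage returns before: $before, after: $remaining"
```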
sed -e 's/^M/d' myfile.txt
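As written, /^M/ (and likewise /\^M/) does not refer to a carriage return: with a caret and a capital M typed as two characters, ^M matches an M at the beginning of the line, and \^M matches a literal caret followed by M. The d command deletes a line; it is not a replacement text. You also have to write all three separators in the s[ubstitute] command: s/old/new/.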
This may help you.
Late, but here for posterity: sed Delete / Remove ^M Carriage Return (Line Feed / CRLF) on Linux or Unix
The gist, which answers the above question: to get ^M, type CTRL+V followed by CTRL+M, i.e. don't just type the caret symbol and a capital M. That will not work.

Removing Windows newlines on Linux (sed vs. awk)

Have some delimited files with improperly placed newline characters in the middle of fields (not line ends), appearing as ^M in Vim. They originate from freebcp (on Centos 6) exports of a MSSQL database. Dumping the data in hex shows \r\n patterns:
$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43
I can remove them with awk, but am unable to do the same with sed.
This works in awk, removing the line breaks completely:
awk 'gsub(/\r/,""){printf $0;next}{print}'
But this in sed does not, leaving line feeds in place:
sed -i 's/\r//g'
where this appears to have no effect:
sed -i 's/\r\n//g'
Using ^M in the sed expression (ctrl+v, ctrl+m) also does not seem to work.
For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?
You can use the command line tool dos2unix
dos2unix input
Or use the tr command:
tr -d '\r' <input >output
Actually, you can do the file-format switching in vim:
Method A:
:e ++ff=dos
:w ++ff=unix
:e!
Method B:
:e ++ff=dos
:set ff=unix
:w
EDIT
If you want to delete the \r\n sequences in the file, try these commands in vim:
:e ++ff=unix " <-- make sure open with UNIX format
:%s/\r\n//g " <-- remove all \r\n
:w " <-- save file
Your awk solution works fine. Another two sed solutions:
sed '1h;1!H;$!d;${g;s/\r\n//g}' input
sed ':A;/\r$/{N;bA};s/\r\n//g' input
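For what it's worth, s/\r\n//g on its own never matches because sed strips the trailing newline from each line before running the script, so the pattern space only contains a \n after lines have been joined with N (or via the hold space). Here is a sketch of the second command that uses a literal carriage return and one -e per command, for seds that do not understand \r or ;-separated labels. The names cr and joined and the sample data are made up.

```shell
cr=$(printf '\r')   # a literal carriage return
# The first record is broken by a stray CR LF in the middle of a field.
joined=$(printf 'a|b\r\nc|d\ne|f\n' |
  sed -e ':a' -e "/$cr\$/{" -e 'N' -e 'ba' -e '}' -e "s/$cr\\n//g")
printf '%s\n' "$joined"
```

While the pattern space ends in a carriage return, N appends the next input line; only then does the substitution see the CR LF pair and remove it.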
I believe some versions of sed will not recognize \r as a character. However, you can use a bash feature to work around that limitation:
echo "$string" | sed $'s/\r//'
Here, you let bash replace '\r' with the actual carriage return character inside the $'...' construct before passing that to sed as its command. (Assuming you use bash; other shells should have a similar construct.)
sed -e 's/\r//g' input_file
This works for me. The difference is the use of -e instead of -i.
I have also noticed that sed behaves differently on different platforms.
Mine reports the following for sed --version:
This is not GNU sed version 4.0
Another method
awk 1 RS='\r\n' ORS=
set Record Separator to \r\n
set Output Record Separator to empty string
1 is always true, and in the absence of an action block {print} is used

What is the proper way to insert tab in sed?

What is the proper way to insert tab in sed? I'm inserting a header line into a stream using sed. I could probably do a replacement of some character afterward to put in tab using regular expression, but is there a better way to do it?
For example, let's say I have:
some_command | sed '1itextTABtext'
I would like the first line to look like this (text is separated by a tab character):
text text
I have tried substituting TAB in the command above with "\t", "\x09", " " (tab itself). I have tried it with and without double quotes and I can't get sed to insert tab in between the text.
I am trying to do this in SLES 9.
Assuming bash (and maybe other shells will work too):
some_command | sed $'1itext\ttext'
Bash will process escapes, such as \t, inside $' ' before passing it as an arg to sed.
You can simply use the sed i command correctly:
some_command | sed '1i\
text text2'
where, as I hope it is obvious, there is a tab between 'text' and 'text2'. On MacOS X (10.7.2), and therefore probably on other BSD-based platforms, I was able to use:
some_command | sed '1i\
text\ttext2'
and sed translated the \t into a tab.
If sed won't interpret \t and inserting tabs at the command line is a problem, create a shell script with an editor and run that script.
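If typing a literal tab is awkward, another option is to let printf generate it and keep the script in double quotes. This is a sketch, not tested on SLES 9; the header names and the variables tab and out are made up.

```shell
tab=$(printf '\t')   # a literal tab character
# Inside double quotes, \\ becomes a single backslash, so sed receives
# "1i\" followed by a newline and the text to insert.
out=$(printf '1\n2\n' | sed "1i\\
header_a${tab}header_b")
printf '%s\n' "$out"
```

sed only ever sees the real tab, so it does not matter whether that sed interprets \t.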
As most answers say, a literal tab character is probably the best choice.
info sed says that "\t is not portable":
...
'\CHAR'
Matches CHAR, where CHAR is one of '$', '*', '.', '[', '\', or '^'.
Note that the only C-like backslash sequences that you can
portably assume to be interpreted are '\n' and '\\'; in particular
'\t' is not portable, and matches a 't' under most implementations
of 'sed', rather than a tab character.
...
Sed can do this, but it's awkward:
% printf "1\t2\n3\t4\n" | sed '1i\\
foo bar\\
'
foo bar
1 2
3 4
$
(The double backslashes are because I'm using tcsh as my shell; if you use bash, use single backslashes)
The space between foo and bar is a tab, which I typed by prepending it with CtrlV. You'll also need to prepend the newlines inside your single quotes with a CtrlV.
It would probably be simpler/clearer to do this with awk:
$ printf "1\t2\n3\t4\n" | awk 'BEGIN{printf("foo\tbar\n");} {print;}'
escape the tab character:
sed -i '/<setup>/ a \\tmy newly added line' <file_name>
NOTE: above we have two backslashes (\\): the first one is the escape (\), and the second one begins the actual tab character (\t).
To illustrate the fact that \t is not portable in sed's BRE syntax, Git 2.13 (Q2 2017) got rid of it.
See commit fba275d (01 Apr 2017) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 3c833ca, 17 Apr 2017)
contrib/git-resurrect.sh: do not write \t for HT in sed scripts
Just like we did in 0d1d6e5 ("t/t7003: replace \t with literal tab
in sed expression", 2010-08-12, Git 1.7.2.2), avoid writing "\t" for HT in sed scripts, which is not portable.
- sed -ne 's~^\([^ ]*\) .*\tcheckout: moving from '"$1"' .*~\1~p'
+ sed -ne 's~^\([^ ]*\) .* checkout: moving from '"$1"' .*~\1~p'
^^^^
|
(literal tab)
I found an alternate way to insert a tab by using substitution.
some_command | sed '1s/^/text\ttext\n/'
I still do not know of a way to do it using the insert method.
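This command replaces old with new in file.txt (the empty '' after -i is the BSD/macOS sed syntax for in-place editing without a backup file):
sed -i '' 's/old/new/' file.txt
This command puts a tab in front of new:
sed -i '' $'s/old/\tnew/' file.txt
This command replaces the entire line that contains old:
sed -i '' 's/.*old.*/new/' file.txt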
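The -i '' form above is specific to BSD/macOS sed. Attaching a backup suffix directly to -i (for example -i.bak) is accepted by both GNU and BSD sed. A sketch with a throwaway file; the names workdir, tab and result are made up.

```shell
workdir=$(mktemp -d)                 # scratch directory for the demo
printf 'old value\n' > "$workdir/file.txt"
tab=$(printf '\t')                   # literal tab, no $'...' needed
sed -i.bak "s/old/${tab}new/" "$workdir/file.txt"
result=$(cat "$workdir/file.txt")
printf '%s\n' "$result"
```

The original content is kept in file.txt.bak, which can be deleted once the result looks right.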

shell scripting for token replacement in all files in a folder

Hi,
I am not very good with Linux shell scripting. I am trying the following shell script to replace the revision number token $rev -<rev number> in all HTML files under a specified directory:
cd /home/myapp/test
set repUpRev = "`svnversion`"
echo $repUpRev
grep -lr -e '\$rev -'.$repUpRev.'\$' *.html | xargs sed -i 's/'\$rev -'.$repUpRev.'\$'/'\$rev -.*$'/g'
This does not seem to work. What is wrong with the above code?
rev=$(svnversion)
sed -i.bak "s/$rev/some other string/g" *.html
What is $rev in the regexp string? Is it another variable, or are you looking for the literal string '$rev'? If the latter, I would suggest adding '\' before the $, otherwise it's treated as a special regexp character...
This is how you show the last line:
grep -lr -e '\$rev -'.$repUpRev.'\$' *.html | xargs sed -i 's/'\$rev -'.$repUpRev.'\$'/'\$rev -.*$'/g'
It would help if you showed some input data.
The -r option makes the grep recursive. That means it will operate on files in the directory and its subdirectories. Is that what you intend?
The dots in your grep and sed stand for any character. If you want literal dots, you'll need to escape them.
The final escaped dollar sign in the grep and sed commands will be seen as a literal dollar sign. If you want to anchor to the end of the line you should remove the escape.
The .* works only as a literal string on the right hand side of a sed s command. If you want to include what was matched on the left side, you need to use capture groups. The g modifier on the s command is only needed if the pattern appears more than once in a line.
Using quote, unquote, quote, unquote is hard to read. Use double quotes to permit variable expansion.
Try your grep command by itself without the xargs and sed to see if it's producing a list of files.
This may be closer to what you want:
grep -lr -e "\$rev -.$repUpRev.$" *.html | xargs sed -i "s/\$rev -.$repUpRev.$/\$rev -REPLACEMENT_TEXT/g"
but you'll still need to determine if the g modifier, the dots, the final dollar signs, etc., are what you intend.
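Putting these points together, here is one possible sketch. It assumes the token in the files looks like $rev -100$ (literal dollar signs around it) and that the new revision contains no characters special to sed. The file name page.html, the value of new_rev and the variable names are made up; in the real script new_rev would come from svnversion.

```shell
workdir=$(mktemp -d)                       # scratch directory for the demo
printf '<p>$rev -100$</p>\n' > "$workdir/page.html"
new_rev=123                                # stand-in for: new_rev=$(svnversion)
# Single quotes keep the regex away from the shell:
# literal "$rev -", then anything up to the closing literal "$".
pattern='\$rev -[^$]*\$'
grep -l -e "$pattern" "$workdir"/*.html |
  xargs sed -i.bak "s/$pattern/\$rev -$new_rev\$/"
updated=$(cat "$workdir/page.html")
printf '%s\n' "$updated"
```

Keeping the pattern in a single-quoted variable avoids the quote, unquote, quote sequence while still letting the shell expand $new_rev in the replacement.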
