Why is translate working and not sed for line feed - string

I would like to replace line feeds with something else, such as #.
I have tried sed, but neither \n nor \x0a works.
I have a file z1:
cat z1|hd
00000000 30 0a 61 0a 0a 31 0a 62 0a 0a 32 0a 63 0a |0.a..1.b..2.c.|
and if i try
cat z1|sed $'s/\x30//g'
everything is fine
But it doesn't work for line feed in sed, just an error message
cat z1|sed $'s/\x0a//g'
If i try
cat z1| tr "\n" "#"|hd
everything is fine for linefeed
Why is sed not working for linefeed ?

sed will not match a literal newline, because commands in sed are delimited by newlines. When sed parses the command, it sees the newline and assumes the command ends there. It then fails with a syntax error, because s/ on its own is an invalid command: s needs three delimiter characters (/ here). More specifically, the POSIX sed documentation explicitly forbids embedding a literal newline in the BRE of an s/BRE/replacement/flags command: A literal <newline> shall not be used in the BRE of a context address or in the substitute function.
Apart from that, sed processes the input line by line, stripping the newline from each input line and automatically appending one when printing. So even if you did s/\n/#/ it wouldn't do anything, because there is no newline in the pattern space. What you can do is append all the lines to the hold space, swap the hold space with the pattern space, and then substitute the newlines. This is the solution presented by @potong.
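To see the second point in isolation, here is a minimal sketch (the three-line sample input is made up for illustration). The substitution is syntactically fine but has nothing to match, while tr, which works on the raw byte stream, sees every line feed:

```shell
# sed strips the trailing newline before the script runs, so the pattern
# space never contains one and this substitution has nothing to match.
unchanged=$(printf '0\na\n1\n' | sed 's/\n/#/')

# tr works on the raw byte stream, so it sees every line feed.
translated=$(printf '0\na\n1\n' | tr '\n' '#')

printf '%s\n' "$unchanged"    # still three separate lines: 0, a, 1
printf '%s\n' "$translated"   # 0#a#1#
```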

This might work for you (GNU sed):
sed '1h;1!H;$!d;x;y/\n/#/' file
This slurps the file into memory, then translates the \n's to #'s.
Alternative:
sed -z 'y/\n/#/' file
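If -z is not available, a loop with N is another way to get the same result. This is only a sketch on made-up sample input, not something from the answer itself. Note that both variants leave the final newline alone, because sed adds it back when printing, whereas tr would turn it into a # as well:

```shell
# Hold-space version from the answer: collect every line, then translate.
held=$(printf '0\na\n1\n' | sed '1h;1!H;$!d;x;y/\n/#/')

# Loop version: keep appending the next line (N) until the last line is
# reached, then translate the embedded newlines.
looped=$(printf '0\na\n1\n' | sed -e ':a' -e 'N' -e '$!ba' -e 'y/\n/#/')

printf '%s\n' "$held"     # 0#a#1
printf '%s\n' "$looped"   # 0#a#1
```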

There are sed versions which accept \n or hex character codes but this is not portable. If you can't use tr (why not?) then maybe explore Perl, which of course is not standard at all, but available practically everywhere; and because there is only one implementation, it will work everywhere Perl is available.
perl -pe 'y/\n/#/' z1
Also note the absence of a useless cat.

Related

creating a blank line after some specific line using bash / linux

I need to add an additional blank line after the line 45 using sed
for example:
44 some text one
45 some text two
46 some text three
47 some text four
result:
44 some text one
45 some text two
46
47 some text three
48 some text four
I've tried to use
sed '45G' myfile.txt
but it doesn't seem to work: it prints the content of the file on the screen but does not add any blank line after line 45
Using CentOS 7 minimal
You can do:
sed $'45 a \n' file.txt
$'' initiates C-style quoting, which might be needed with some sed implementations when using \n
45 a \n appends a newline (\n) after (a) the 45-th line
sed is for simple substitutions on individual lines, that is all. For anything else just use awk:
awk '{print} NR==45{print ""}' file
That will work with any awk on any UNIX box.
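For comparison, a small sketch with line 2 standing in for line 45 (the three-line input is invented). As far as I know, G appends a newline plus the hold space to the pattern space, and with an empty hold space that shows up as a blank line, so on this input both forms give the same result:

```shell
# G appends a newline plus the (empty) hold space to the pattern space,
# which prints as a blank line after the addressed line.
with_sed=$(printf 'one\ntwo\nthree\n' | sed '2G')

# The awk form from the answer, with 2 standing in for 45.
with_awk=$(printf 'one\ntwo\nthree\n' | awk '{print} NR==2{print ""}')

printf '%s\n' "$with_sed"   # one, two, an empty line, three
printf '%s\n' "$with_awk"   # same four lines
```

Neither command changes the file itself; both only write to standard output.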

sed can't differentiate CR from LF if both exist in the same file

I have a file where CR (\r) and LF (\n) exists in the same file.
a1 a2 CRLF
b1 LF
b2 CRLF
c1 c2 CRLF
The file needs to be fixed to:
a1 a2 CRLF
b1 b2 CRLF
c1 c2 CRLF
The logic is simple: replace any LF that is not preceded by CR with the empty string:
sed 's/[^\r]\n//g' input.txt > output.txt
However, this doesn't work!
I had to delete all the occurrences of LF, and replace all the remaining CR with CRLF :
cat input.txt | tr -d '\n' | sed 's/\r/\r\n/g' >output.txt
this bugs me. why isn't sed working??
@Etan Reisner is basically correct - sed handles text as newline-delimited lines, so you need to jump through some hoops to make it deal with newlines directly. Just because you can do this doesn't mean it's the cleanest way, but if you don't have other tools at your disposal, here's an example of how to do this:
sed -e 's/[^\r]$/&/' -e te -e b -e :e -e N -e 's/\n//'
What this command does is:
s/[^\r]$/&/ - replace a non-CR character at the end of a line with ... itself.
te - test and branch: if the previous substitution succeeded, branch to the indicated label. (We needed it to succeed, which is why it substituted with itself)
b - unconditionally branch to the end of the script
:e - create a label for the earlier te command to jump to
N - append the next line into the pattern space. This results in a pattern space with an embedded newline.
s/\n// - delete the embedded newline.
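Here is the same script run on a small sample modelled on the question (the exact bytes are my assumption). One deliberate change: the CR is passed in through a shell variable, so the bracket expression does not rely on the sed in use understanding \r:

```shell
# A literal carriage return, so the bracket expression does not depend on
# sed interpreting \r (an assumption made here for portability).
cr=$(printf '\r')

# Lines 1 and 3 end in CRLF; "b1 " ends in a bare LF and gets joined
# with the line that follows it.
fixed=$(printf 'a1 a2\r\nb1 \nb2\r\nc1 c2\r\n' |
  sed -e "s/[^$cr]\$/&/" -e te -e b -e :e -e N -e 's/\n//')

# Strip the CRs for display only; three lines remain.
printf '%s\n' "$fixed" | tr -d '\r'
```

The script joins one following line per bare LF; several bare LFs in a row would need a loop.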
sed doesn't see line endings in the line it is operating on.
This is the same reason that sed 's/\n//' doesn't give you a file with only one line.
The newlines are handled "internally".
This is the sort of task that dos2unix/unix2dos/etc. may handle for you more directly.
I'd use awk:
awk -v RS='\r\n' 'BEGIN { ORS = RS } { gsub(/\n/, ""); print }'
With the record separator RS set to \r\n, the file will be split into records separated by, well, \r\n, so removing newlines in those records removes all newlines that are not preceded by \r. Setting ORS (the output record separator) to RS makes it so that the output file still has CRLF line endings.
Note that multi-character RS is not strictly POSIX-conforming. The most common awks support it, though.
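If the multi-character RS is a concern, a line-by-line variant is possible. This is a sketch of a different technique, not the command above: awk splits on LF as usual, and a line is only terminated on output when it ends in CR. The sample input is assumed.

```shell
# Lines ending in CR are printed with the default ORS (LF), restoring CRLF.
# Lines without a trailing CR are printed with no terminator at all, so
# the next line is glued onto them.
fixed=$(printf 'a1 a2\r\nb1 \nb2\r\nc1 c2\r\n' |
  awk '{ if ($0 ~ /\r$/) print; else printf "%s", $0 }')

# Strip the CRs for display only.
printf '%s\n' "$fixed" | tr -d '\r'
```

One caveat: a final line that does not end in CR is written without any line terminator.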
Or there's the Perl way:
perl -pe 's/(?<!\r)\n//'
This relies on a negative lookbehind; (?<!\r) matches an empty string that is not preceded by \r. Note that unlike sed, Perl without -l does not remove newlines from the input, so no special tricks are necessary to remove them.

How to replace a <85> to a new line in bash script

I’m running out of idea on how to replace this character “<85>” to a new line (please treat this as one character only – I think this is a non-printable character).
I tried this one in my script:
cat file | awk '{gsub(”<85>”,RS);print}' > /tmp/file.txt
but didn’t work.
I hope someone can help.
Thanks.
With sed: sed -e $'s/\302\205/\\n/' file > file.txt
Or awk: awk '{gsub("\302\205","\n")}7'
The magic here was in converting the <85> character to the octal escapes of its UTF-8 bytes (\302\205).
I used hexdump -b on a file I manually inserted that character into.
tr '\205' '\n' <file > file.txt
tr is the transliterate command; it translates one character to another (or deletes it, or …). The version of tr on Mac OS X doesn't recognize hexadecimal escapes, so you have to use octal, and octal 205 is hex 85.
I am assuming that the file contains a single byte '\x85', rather than some combination of bytes that is being presented as <85>. tr is not good for recognizing multibyte sequences that need to be transliterated.
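The two cases can be told apart and handled roughly like this. It is a sketch on invented input; LC_ALL=C is my addition so that both tools treat the data as plain bytes:

```shell
# Case 1: a single byte 0x85 (octal 205). tr can translate it directly.
single=$(printf 'a\205b\n' | LC_ALL=C tr '\205' '\n')

# Case 2: U+0085 encoded as UTF-8 is the two bytes 0xC2 0x85
# (octal 302 205). awk replaces the whole sequence, which tr cannot do.
multi=$(printf 'a\302\205b\n' | LC_ALL=C awk '{ gsub("\302\205", "\n") } 1')

printf '%s\n' "$single"   # a and b on separate lines
printf '%s\n' "$multi"    # a and b on separate lines

# Octal dump of the input, to check which of the two cases a file contains.
printf 'a\302\205b\n' | od -A n -b
```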

sed help: matching and replacing a literal "\n" (not the newline)

i have a file which contains several instances of \n.
i would like to replace them with actual newlines, but sed doesn't recognize the \n.
i tried
sed -r -e 's/\n/\n/'
sed -r -e 's/\\n/\n/'
sed -r -e 's/[\n]/\n/'
and many other ways of escaping it.
is sed able to recognize a literal \n? if so, how?
is there another program that can read the file interpreting the \n's as real newlines?
Can you please try this
sed -i 's/\\n/\n/g' input_filename
What exactly works depends on your sed implementation. This is poorly specified in POSIX so you see all kinds of behaviors.
The -r option is also not part of the POSIX standard; but your script doesn't use any of the -r features, so let's just take it out. (For what it's worth, it changes the regex dialect supported in the match expression from POSIX "basic" to "extended" regular expressions; some sed variants have an -E option which does the same thing. In brief, things like capturing parentheses and repeating braces are "extended" features.)
On BSD platforms (including MacOS), you will generally want to backslash the literal newline, like this:
sed 's/\\n/\
/g' file
On some other systems, like Linux (also depending on the precise sed version installed -- some distros use GNU sed, others favor something more traditional, still others let you choose) you might be able to use a literal \n in the replacement string to represent an actual newline character; but again, this is nonstandard and thus not portable.
If you need a properly portable solution, probably go with Awk or (gasp) Perl.
perl -pe 's/\\n/\n/g' file
In case you don't have access to the manuals, the /g flag says to replace every occurrence on a line; the default behavior of the s/// command is to only replace the first match on every line.
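A quick way to try the backslash-newline form without a file, next to an awk equivalent. This is a sketch; the sample string is made up and contains a backslash followed by n, not a real newline:

```shell
# The replacement is a backslash followed by a real newline, which is the
# form that does not depend on sed understanding \n in the replacement.
with_sed=$(printf '%s\n' 'a\nb' | sed 's/\\n/\
/g')

# awk: /\\n/ matches the two characters backslash and n; "\n" in the
# replacement string is a real newline.
with_awk=$(printf '%s\n' 'a\nb' | awk '{ gsub(/\\n/, "\n") } 1')

printf '%s\n' "$with_sed"   # a and b on separate lines
printf '%s\n' "$with_awk"   # a and b on separate lines
```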
awk seems to handle this fine:
echo "test \n more data" | awk '{sub(/\\n/,"**")}1'
test ** more data
Here you need to escape the \ using \\
$ echo "\n" | sed -e 's/[\\][n]/hello/'
sed works one line at a time, so there is never a \n inside a single line (sed removes it when reading the line into its buffer). You should use N, n or H, h to fill the buffer with more than one line; then \n appears inside it. Be careful: ^ and $ then no longer mean start and end of line but start and end of the whole buffer, because of the \n inside.
\n is recognized in the search pattern, not in the replacement. Two ways of using it (sample):
sed 's/\(\n\)bla/\1blabla\1/'
sed 's/\nbla/\
blabla\
/'
The first reuses a \n already in the buffer as a back-reference (shorter code in the replacement);
the second uses a real newline.
So basically
sed "N
$ s/\(\n\)/\1/g
"
works (but is a bit useless). I imagine that s/\(\n\)\n/\1/g is more like what you want.
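A small sketch of both points on a made-up two-line input: once N has pulled the next line in, \n in the pattern matches the embedded newline, and a captured \n can be reused in the replacement as a back-reference:

```shell
# After N the pattern space is "one<newline>two", so \n matches.
joined=$(printf 'one\ntwo\n' | sed -e N -e 's/\n/+/')

# Capture the embedded newline and reuse it twice in the replacement.
reused=$(printf 'one\ntwo\n' | sed -e N -e 's/\(\n\)two/\1TWO\1end/')

printf '%s\n' "$joined"   # one+two
printf '%s\n' "$reused"   # one, TWO, end on three lines
```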

Convert string to hexadecimal on command line

I'm trying to convert "Hello" to 48 65 6c 6c 6f in hexadecimal as efficiently as possible using the command line.
I've tried looking at printf and google, but I can't get anywhere.
Any help greatly appreciated.
Many thanks in advance,
echo -n "Hello" | od -A n -t x1
Explanation:
The echo program will provide the string to the next command.
The -n flag tells echo to not generate a newline at the end of the "Hello".
The od program is the "octal dump" program. (We will be providing a flag to tell it to dump it in hexadecimal instead of octal.)
The -A n flag is short for --address-radix=n, with n being short for "none". Without this part, the command would output an ugly numerical address prefix on the left side. This is useful for large dumps, but for a short string it is unnecessary.
The -t x1 flag is short for --format=x1, with the x being short for "hexadecimal" and the 1 meaning 1 byte.
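Since -n is not understood by every echo, the same pipeline can be fed by printf instead. That swap is my substitution, not part of the answer; the od flags are unchanged:

```shell
# printf '%s' writes the string with no trailing newline, so od only
# sees the five bytes of "Hello".
hex=$(printf '%s' 'Hello' | od -A n -t x1)

printf '%s\n' "$hex"   # the five bytes as space-separated hex pairs
```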
If you want to do this and remove the spaces you need:
echo -n "Hello" | od -A n -t x1 | sed 's/ *//g'
The first two commands in the pipeline are well explained by @TMS in his answer, as edited by @James. The last command differs from @TMS's comment in that it is both correct and has been tested. The explanation is:
sed is a stream editor.
s is the substitute command.
/ opens a regular expression - any character may be used. / is conventional, but inconvenient for processing, say, XML or path names.
/ or the alternate character you chose, closes the regular expression and opens the substitution string.
In / */ the * matches any sequence of the previous character (in this case, a space).
/ or the alternate character you chose, closes the substitution string. In this case, the substitution string // is empty, i.e. the match is deleted.
g is the option to do this substitution globally on each line instead of just once for each line.
The quotes keep the command parser from getting confused - the whole sequence is passed to sed as the first option, namely, a sed script.
@TMS's brainchild (sed 's/^ *//') only strips spaces from the beginning of each line (^ matches the beginning of the line - 'pattern space' in sed-speak).
If you additionally want to remove newlines, the easiest way is to append
| tr -d '\n'
to the command pipes. It functions as follows:
| feeds the previously processed stream to this command's standard input.
tr is the translate command.
-d specifies deleting the match characters.
Quotes list your match characters - in this case just newline (\n).
Translate only matches single characters, not sequences.
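Since tr takes a set of characters, the spaces and the newlines can be deleted in one step, without sed. A sketch with an invented 24-byte string, long enough that od splits its output over two lines; -v is added so od never abbreviates repeated lines, and printf replaces echo -n:

```shell
# od prints 16 bytes per line, so this input produces two output lines.
# tr -d ' \n' removes every space and every newline in one pass.
hex=$(printf '%s' 'Hello, hexadecimal world' | od -v -A n -t x1 | tr -d ' \n')

printf '%s\n' "$hex"   # one unbroken string of 48 hex digits
```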
sed is uniquely awkward when dealing with newlines. This is because sed is one of the oldest Unix commands - it was created before people really knew what they were doing. Pervasive legacy software keeps it from being fixed. I know this because I was born before Unix was born.
The historical origin of the problem was the idea that a newline was a line separator, not part of the line. It was therefore stripped by line processing utilities and reinserted by output utilities. The trouble is, this makes assumptions about the structure of user data and imposes unnatural restrictions in many settings. sed's inability to easily remove newlines is one of the most common examples of that malformed ideology causing grief.
It is possible to remove newlines with sed - it is just that all solutions I know about make sed process the whole file at once, which chokes for very large files, defeating the purpose of a stream editor. Any solution that retains line processing, if it is possible, would be an unreadable rat's nest of multiple pipes.
If you insist on using sed try:
sed -z 's/\n//g'
-z tells sed to use nulls as line separators.
Internally, a string in C is terminated with a null. The -z option is also a result of legacy, provided as a convenience for C programmers who might like to use a temporary file filled with C-strings and uncluttered by newlines. They can then easily read and process one string at a time. Again, the early assumptions about use cases impose artificial restrictions on user data.
If you omit the g option, this command removes only the first newline. With the -z option sed interprets the entire file as one line (unless there are stray nulls embedded in the file), terminated by a null and so this also chokes on large files.
You might think
sed 's/^/\x00/' | sed -z 's/\n//' | sed 's/\x00//'
might work. The first command puts a null at the front of each line on a line by line basis, resulting in \n\x00 ending every line. The second command removes one newline from each line, now delimited by nulls - there will be only one newline by virtue of the first command. All that is left are the spurious nulls. So far so good. The broken idea here is that the pipe will feed the last command on a line by line basis, since that is how the stream was built. Actually, the last command, as written, will only remove one null since now the entire file has no newlines and is therefore one line.
Simple pipe implementation uses an intermediate temporary file and all input is processed and fed to the file. The next command may be running in another thread, concurrently reading that file, but it just sees the stream as a whole (albeit incomplete) and has no awareness of the chunk boundaries feeding the file. Even if the pipe is a memory buffer, the next command sees the stream as a whole. The defect is inextricably baked into sed.
To make this approach work, you need a g option on the last command, so again, it chokes on large files.
The bottom line is this: don't use sed to process newlines.
echo hello | hexdump -v -e '/1 "%02X "'
Playing around with this further,
A working solution is to remove the "*". It is unnecessary both for the original requirement of simply removing spaces and when substituting an actual character is desired, as follows:
echo -n "Hello" | od -A n -t x1 | sed 's/ /%/g'
%48%65%6c%6c%6f
So, I consider this as an improvement answering the original Q since the statement now does exactly what is required, not just apparently.
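The difference is easy to see side by side. A sketch on a shortened, made-up od-style line:

```shell
# Without the star, only real spaces are replaced.
fixed=$(printf ' 48 65\n' | sed 's/ /%/g')

# With the star, the pattern also matches the empty string (for example
# between the two digits of a pair), so extra % signs appear.
starred=$(printf ' 48 65\n' | sed 's/ */%/g')

printf '%s\n' "$fixed"     # %48%65
printf '%s\n' "$starred"   # more % signs than the line above
```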
Combining the answers from TMS and i-always-rtfm-and-stfw, the following works under Windows using gnu-utils versions of the programs 'od', 'sed', and 'tr':
echo "Hello"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
or in a CMD file as:
@echo "%1"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
A limitation on my solution is it will remove all double quotes (").
"tr -d '\42'" removes quote marks that the Windows 'echo' will include.
"tr -d '\r'" removes the carriage return, which Windows includes as well as '\n'.
The pipe (|) character must follow immediately after the string or the Windows echo will add that space after the string.
There is no '-n' switch to the Windows echo command.

Resources