sed command to replace char at a specific byte position - linux

I have two files, test1 and test2, that differ at some random position. I want to find the character position at which test1 differs from test2 and replace that character in test1 with *. The constraint is that I know only the position of the character, not the character itself.
I tried cmp -b to get the byte position that differs, but I can't find anything in sed (or elsewhere) that replaces the character at a given byte position, nor a comparison tool that reports both the line number and the character position within the line. Replacing by character value is not an option, because that would change other occurrences in the file; I want the change only at that position. Replacing the first occurrence with sed won't work either, since the first occurrence may come before the differing position.

So you just want to write some byte at some offset without file truncation, maybe dd will help?
Setup:
$ cat f1
aaaaaaaaaaaaaaaaaaaaaaaaa
$ cat f2
aaaaaaaaaaaaaaabaaaaaaaaa
Script:
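# cmp exits non-zero when the files differ; -b also prints the differing bytes
if ! CMPOUT=$(cmp -b f1 f2); then
    # extract the 1-based offset of the first difference
    POS=$(echo "$CMPOUT" | sed -r 's/^.*: byte ([0-9]+),.*$/\1/')
    # overwrite one byte at that offset (seek is 0-based) without truncating f2
    printf '*' | dd of=f2 seek="$((POS-1))" bs=1 count=1 conv=notrunc
fi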
Result:
$ cat f2
aaaaaaaaaaaaaaa*aaaaaaaaa
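Without -l, cmp stops at the first difference, so the script above patches only one byte. If the files can differ in several places, a sketch along the following lines might help. It assumes both files have the same length, and star_diffs is a hypothetical helper name, not part of any tool. It patches the first file, as the question asked.

```shell
# star_diffs (hypothetical helper): overwrite every byte of file $1
# that differs from file $2 with '*'. Assumes equal-length files.
star_diffs() {
    local pos
    # cmp -l prints one line per differing byte:
    # the 1-based decimal offset, then the two byte values in octal
    cmp -l "$1" "$2" 2>/dev/null | while read -r pos _; do
        # dd's seek is 0-based; conv=notrunc keeps the rest of the file
        printf '*' | dd of="$1" bs=1 count=1 seek="$((pos - 1))" conv=notrunc 2>/dev/null
    done
}
```

Usage would be star_diffs test1 test2, after which test1 carries a * at each position where it differed from test2.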

Related

Convert carriage return (\r) to actual overwrite

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Given a number x (between 0 and 255) in base 10, I want to convert it to base 2, add trailing zeros to get a 12-digit binary representation, generate 12 different numbers (each made of the first n digits in base 2, with n between 1 and 12) and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
5. Alternatives
I know I can work around this by literally appending zeros to the variable, as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo will output foo. Although you might expect the other echo commands to output foofoofoo..., they all output foobar.
The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
overwrite() {
    local segment result=
    # read one \r-terminated segment at a time; each overwrites the start of result
    while IFS= read -rd $'\r' segment; do
        result="$segment${result:${#segment}}"
    done < <(printf '%s\r' "$@")
    printf %s "$result"
}
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy' which only seems to print the same string, but still prints \r which is then interpreted by your terminal such that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert variables in-place, use command substitution: myvar=$(overwrite "$myvar")
With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
offset = 1
for (i=NF; i>0; i--) {
if (offset <= length($i)) {
printf "%s", substr($i, offset)
offset = length($i) + 1
}
}
print ""
}'
This is too long to put into a command substitution directly, so it is better to wrap it in a function and pipe the lines to be resolved into it.
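Such a wrapper might look like the sketch below. The function name resolve_cr is an assumption for illustration; the awk program is the one shown above, unchanged.

```shell
# resolve_cr (hypothetical name): for each input line, apply the
# carriage-return overwrites and print the visible result
resolve_cr() {
    awk -F'\r' '{
        offset = 1
        # walk the segments from last to first; a later segment hides
        # the start of every earlier one
        for (i = NF; i > 0; i--) {
            if (offset <= length($i)) {
                printf "%s", substr($i, offset)
                offset = length($i) + 1
            }
        }
        print ""
    }' "$@"
}
```

With this in place, something like x_base2_padded=$(printf '%012d\r%s\n' 0 "$x_base2" | resolve_cr) should yield the padded string without any carriage return in it.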
To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done
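The same twelve values can also be computed with integer arithmetic only, without bc or string slicing. This is a different technique from the script above, shown only as a sketch: padding on the right is a left shift, and taking the first n digits is a right shift. The name leading_values and the optional width argument are assumptions for illustration.

```shell
# leading_values (hypothetical helper): print the decimal value of the
# first 1..width binary digits of $1, right-padded with zeros to width
leading_values() {
    local x=$1 width=${2:-12} len=0 t=$1 n=1 padded
    # count the binary digits of x
    while [ "$t" -gt 0 ]; do
        t=$((t >> 1))
        len=$((len + 1))
    done
    # appending (width - len) zeros is a left shift by that amount
    padded=$((x << (width - len)))
    # the first n digits are what is left after dropping the last width - n
    while [ "$n" -le "$width" ]; do
        echo "$((padded >> (width - n)))"
        n=$((n + 1))
    done
}
```

For x = 10, leading_values 10 prints 1, 2, 5, 10, 20, 40 and so on up to 2560, one value per line, matching the example in the question.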

Remove a sequence of chars at the end of file including LF (linefeed)

I have a file that contains some PCL sequences. I have this sequence at the end of the file (hex):
461b 2670 3158 0a F.&p1X.
I want to remove the sequence: <Esc>&p1X including the character that follows. In 99% of cases, LF follows the sequence.
I tried this command:
sed -b 's/\o33&p[0-9]X$//Mg' ~/test.txt >test2.txt
However, it appends LF at the end of test2.txt. Also, if, instead of $ I specify . it doesn't match the line anymore.
If you want to play with this, generate the input file using this command:
echo -e "SomeString\033&p1X" > ~/test.txt
The redirect appends an LF char at the end.
Thanks
If I have understood correctly, you know for sure that your file ends with that sequence of characters. If so, I would simply truncate the last six bytes. This works regardless of whether the very last character is a newline or anything else.
Example:
$ echo -e "SomeString\033&p1X" > test.txt
$ od -c test.txt
0000000 S o m e S t r i n g 033 & p 1 X \n
0000020
$ truncate -s -6 test.txt
$ od -c test.txt
0000000 S o m e S t r i n g
0000012
This is also very efficient as it will use the system call truncate().
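Blind truncation removes six bytes whatever they are. A slightly more defensive sketch checks the tail first. Here strip_pcl_tail is a hypothetical helper, and truncate is assumed to be available (it ships with GNU coreutils).

```shell
# strip_pcl_tail (hypothetical helper): remove a trailing
# ESC & p <digit> X plus the one byte that follows it, but only if
# the file really ends that way. Returns 1 and leaves the file alone otherwise.
strip_pcl_tail() {
    local tail_hex
    # hex dump of the last 6 bytes, with spaces and newlines removed
    tail_hex=$(tail -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case $tail_hex in
        # 1b = ESC, 26 = &, 70 = p, 30..39 = digit, 58 = X, ?? = any byte
        1b26703[0-9]58??) truncate -s -6 "$1" ;;
        *) return 1 ;;
    esac
}
```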
This seems to do the trick based on this thread:
perl -pi -e 's/\x1b&p[0-9]X\n//g' ~/test.txt
(I am a perl beginner as well - any comments would be appreciated).

Programming rev in sed

I'm trying to write a utility that reverses lines of input. The following just prints the lines as they are, though:
#!/bin/sed -f
#insert newline at the beginning
s/^/\n/
#while the newline hasnt moved to the end of pattern space, rotate
: loop
/\n$/{!s/\(.*\)\(.$\)/\2\1/;!b loop}
#delete the newline
s/\n//
Any ideas on what's wrong?
/\n$/{!s/\(.*\)\(.$\)/\2\1/;!b loop}
The ! normally goes after an address or range, not in front of a command.
The !b (which I take to mean "branch unless") should probably be a t (branch if a substitution occurred).
The $ anchor should not be inside the last group but just after it.
So the line becomes:
/\n$/ !{s/\(.*\)\(.\)$/\2\1/;t loop}
Even with this fix, the script ends up doing nothing useful: it adds a newline at the start and moves it to the end by repeatedly moving the last character to the front, which rotates the line back into its original order and reverses nothing.
sed 'G
:loop
s/\(.\)\(\n.*\)/\2\1/
t loop
s/.//' YourFile
should do the trick
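To watch the loop work, one can add the l command, which prints the pattern space unambiguously (a newline shows as \n and the end is marked with $). This is only a debugging sketch built on the script above; trace_rev is a made-up name.

```shell
# trace_rev (hypothetical helper): the reversing script from above with
# an extra l at the top of the loop, so every intermediate state is shown
trace_rev() {
    sed -n 'G
:loop
l
s/\(.\)\(\n.*\)/\2\1/
t loop
s/.//p' "$@"
}

printf '123\n' | trace_rev
# 123\n$
# 12\n3$
# 1\n32$
# \n321$
# 321
```

Each pass moves the character just before the newline to just after it, so the reversed text grows behind the newline until nothing is left in front of it.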
@TobySpeight further improved the code by removing the need for a first group (code adapted).
Solution 1
$ echo -e '123\n456\n789' |sed -nr '/\n/!G;s/(.)(.*\n)/&\2\1/;/^\n/!D;s/\n//p'
321
654
987
The core ideas:
We need a loop to process each line; fortunately, the D command can simulate one.
We need to loop over the characters of ONE line, which is awkward because sed handles one line per cycle; the s and D commands together can simulate that loop.
How do we avoid an infinite loop? We need a flag character to mark the end of each line, and \n is the perfect choice.
The D command deletes everything up to and including the first \n in the pattern space,
and then makes sed jump back to its first command without reading new input, which is effectively a loop. So we also need a throwaway placeholder in front of the working string for D to delete; the current contents up to and including the \n serve that purpose.
Explanation:
/\n/!G: if the pattern space already contains \n, we are inside the loop for the current line and nothing happens; otherwise G appends a \n and the (empty) hold space to the pattern space. (sed strips the trailing \n of every line before putting it into the pattern space.) After G, the pattern space is the original line followed by a \n.
s/(.)(.*\n)/&\2\1/: keep the matched text (&) as the placeholder, then append a copy of it with its first character moved behind the \n, i.e. in front of the part that is already reversed.
/^\n/!D;s/\n//p: if the pattern space starts with \n, the line is fully reversed, so s/\n//p deletes the flag character \n and prints the result; otherwise D deletes the placeholder and jumps back to the first command to handle the next character.
To summarize, the pattern space evolves as follows during the loop:
123\n [(1)(23\n)] =s=> 123\n23\n1 [(123\n)(23\n)(1)] =D=> 23\n1
23\n1 [(2)(3\n)1] =s=> 23\n3\n21 [(23\n)(3\n)(2)1] =D=> 3\n21
3\n21 [(3)(\n)21] =s=> 3\n\n321 [(3\n)(\n)(3)21] =D=> \n321
\n321 [()(\n)321] =s=> \n321 =!D=> \n321 =s-p=> 321
There are some derived solutions:
Solution 2
The placeholder can be any other string ending with a \n:
$ echo -e '123\n456\n789' |sed -nr '/\n/!G;s/(.)(.*\n)/USELESS\n\2\1/;/^\n/!D;s/\n//p'
321
654
987
Solution 3
Use an explicit branch instead of the obscure D command (T is a GNU extension):
$ echo -e '123\n456\n789' |sed -nr '/\n/!G;s/(.)(.*\n)/&\2\1/;Tend;D;:end;s/\n//p'
321
654
987
Solution 4
Use . to fetch the first char \n
$ echo -e '123\n456\n789' |sed -nr '/\n/!G;s/(.)(.*\n)/&\2\1/;/^\n/!D;s/.//p'
321
654
987
Solution 5
$ echo -e '123\n456\n789' |sed -nr ':loop;/\n/!G;s/(.)(.*\n)/\2\1/;tloop;s/.//p'
321
654
987
This solution is much easier to understand; the pattern space evolves as follows:
123\n [(1)(23\n)] =s=> 23\n1 [(23\n)(1)]
23\n1 [(2)(3\n)1] =s=> 3\n21 [(3\n)(2)1]
3\n21 [(3)(\n)21] =s=> \n321 [(\n)(3)21]
\n321 [()(\n)321] =s=> \n321 =s=> 321
The problem is you are using the wrong tool for the job and trying to understand/use constructs that became obsolete in the mid-1970s when awk was invented.
$ cat file
tsuj
esu
na
etaorporppa
loot
$ awk -v FS= '{rev=""; for (i=1; i<=NF; i++) rev = $i rev; print rev}' file
just
use
an
approproate
tool
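Splitting a record into single characters with an empty FS is not specified by POSIX (gawk and mawk support it, some other awks may not). A sketch that avoids relying on it walks the line backwards with substr instead; rev_awk is a hypothetical name.

```shell
# rev_awk (hypothetical helper): reverse each input line without
# depending on empty-FS character splitting
rev_awk() {
    awk '{
        rev = ""
        # append the characters from last to first
        for (i = length($0); i > 0; i--)
            rev = rev substr($0, i, 1)
        print rev
    }' "$@"
}
```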

Add line feed every 2391 byte

I am using Red Hat Linux 6.
I have a file that comes from mainframe MVS with EBCDIC-to-ASCII conversion.
(But I suspect some of the conversion may be wrong.)
Anyway, I know that the record length is 2391 bytes. There are 10 records and the file size is 23910 bytes.
Within each 2391-byte record there are many 0a or 0d characters (not CRLF pairs). I want to replace them with, say, # and #.
Also, I want to add an LF (i.e. 0a) every 2391 bytes so that the file becomes a normal Unix text file for further processing.
I have tried
dd ibs=2391 obs=2391 if=emyfile of=myfile.new
But this does not work; both files are the same.
I also tried
dd ibs=2391 obs=2391 if=myfile | awk '{print $0}'
But this does not work either.
Can anyone help with this?
Something like this:
#!/bin/bash
for i in {0..9}; do
dd if=emyfile bs=2391 count=1 skip=$i | LC_CTYPE=C tr '\r\n' '##'
echo
done > newfile
If your files are longer, you will need more than 10 iterations. I would handle that by running an infinite loop and exiting it on error, like this:
#!/bin/bash
i=0
while :; do
dd if=emyfile bs=2391 count=1 skip=$i | LC_CTYPE=C tr '\r\n' '##'
[ ${PIPESTATUS[0]} -ne 0 ] && break
echo
((i++))
done > newfile
However, on my iMac under OSX, dd doesn't seem to exit with an error when you go past end of file - maybe try your luck on your OS.
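If dd does not signal an error past the end of the file, the loop above never terminates. One way around that, sketched here under the assumption that the input is a regular file, is to compute the record count from the file size up front. The helper name split_records is made up.

```shell
# split_records (hypothetical helper): print file $1 as one line per
# record of $2 bytes, with CR and LF inside a record replaced by '#'
split_records() {
    local size n i=0
    size=$(wc -c < "$1")
    # number of records, rounding up if the last one is short
    n=$(( (size + $2 - 1) / $2 ))
    while [ "$i" -lt "$n" ]; do
        dd if="$1" bs="$2" count=1 skip="$i" 2>/dev/null | LC_ALL=C tr '\r\n' '##'
        echo
        i=$((i + 1))
    done
}
```

For the file in the question, the call would be split_records emyfile 2391 > newfile.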
You could try
$ dd bs=2391 cbs=2391 conv=ascii,unblock if=emyfile of=myfile.new
conv=ascii converts from EBCDIC to ASCII. conv=unblock inserts a newline at the end of each cbs-sized block (after removing trailing spaces).
If you already have a file in ASCII and just want to replace some characters in it before splitting the blocks, you could use tr(1). For example, the following will replace each carriage return with '#' and each newline (linefeed) with '#':
$ tr '\r\n' '##' < emyfile | dd bs=2391 cbs=2391 conv=unblock of=myfile.new
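To see what conv=unblock does on a small scale, here is a toy run with 4-byte records instead of 2391-byte ones (the sample data and the name unblock_demo are made up). Note that trailing spaces in each record are dropped, which may or may not be what you want for fixed-width fields.

```shell
# unblock_demo (hypothetical helper): three 4-byte records, two of them
# padded with trailing spaces, are turned into newline-terminated lines
unblock_demo() {
    printf 'ab  cdefgh  ' | dd cbs=4 conv=unblock 2>/dev/null
}

unblock_demo
# ab
# cdef
# gh
```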

Paste corresponding characters from multiple lines together

I'm writing a Linux command that pastes corresponding characters from multiple lines together. For example, I want to change these lines
A---
-B--
---C
--D-
to this:
A----B-----D--C-
So far, I've made this:
cat sanger.a sanger.c sanger.g sanger.t | cut -c 1
This does the trick for only the first column, but it has to work for all the columns.
Is there anyone who can help?
EDIT: This is a better example. I want this:
SUGAR
HONEY
CANDY
to become
SHC UOA GNN AED RYY (without spaces)
Awk way for updated spec
awk -vFS= '{for(i=1;i<=NF;i++)a[i]=a[i]$i}
END{for(i=1;i<=NF;i++)printf "%s",a[i];print ""}' file
Output
A----B-----D--C-
SHCUOAGNNAEDRYY
P.S. For a large file this will use a lot of memory.
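The empty FS used above is not specified by POSIX, and NF in the END block reflects only the last line. A sketch of a variant that sidesteps both by using substr and tracking the longest line itself; transpose_chars is a hypothetical name, and like the original it keeps one string per column in memory.

```shell
# transpose_chars (hypothetical helper): concatenate the 1st characters
# of all lines, then the 2nd characters, and so on, on a single line
transpose_chars() {
    awk '{
        n = length($0)
        if (n > max) max = n          # remember the longest line
        for (i = 1; i <= n; i++)
            col[i] = col[i] substr($0, i, 1)
    }
    END {
        for (i = 1; i <= max; i++) printf "%s", col[i]
        print ""
    }' "$@"
}
```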
A terrible way that does not use awk; you also need to know the number of fields beforehand.
for i in {1..4};do cut -c $i test | tr -d "\n" ; done;echo
Here's a solution without awk or sed, assuming the file is named f:
paste -s -d "" <(for i in $(seq 1 $(wc -L < f)); do cut -c $i f; done)
wc -L is a GNUism which returns the length of the longest line in the input file, which might not work depending on your version/locale. You could instead find the longest line by doing something like:
awk '{if (length > x) {x = length}} END {print x}' f
Then using this value in the seq command instead of the above command substitution.
All right, time for some sed insanity! :D
Disclaimer: If this is for something serious, use something less brittle than this. awk comes to mind. Unless you feel confident enough in your sed abilities to maintain this lunacy.
cat file1 file2 etc | sed -n '1h; 1!H; $ { :loop; g; s/$/\n/; s/\([^\n]\)[^\n]*\n/\1/g; p; g; s/^.//; s/\n./\n/g; h; /[^\n]/ b loop }' | tr -d '\n'; echo
This comes in three parts: Say you have a file foo.txt
12345
67890
abcde
fghij
then
cat foo.txt | sed -n '1h; 1!H; $ { :loop; g; s/$/\n/; s/\([^\n]\)[^\n]*\n/\1/g; p; g; s/^.//; s/\n./\n/g; h; /[^\n]/ b loop }'
produces
16af
27bg
38ch
49di
50ej
After that, tr -d '\n' deletes the newlines, and ;echo adds one at the end.
The heart of this madness is the sed code, which is
1h
1!H
$ {
:loop
g
s/$/\n/
s/\([^\n]\)[^\n]*\n/\1/g
p
g
s/^.//
s/\n./\n/g
h
/[^\n]/ b loop
}
This first follows the basic pattern
1h # if this is the first line, put it in the hold buffer
1!H # if it is not the first line, append it to the hold buffer
$ { # if this is the last line,
do stuff # do stuff. The whole input is in the hold buffer here.
}
which assembles all input in the hold buffer before working on it. Once the whole input is in the hold buffer, this happens:
:loop
g # copy the hold buffer to the pattern space
s/$/\n/ # put a newline at the end
s/\([^\n]\)[^\n]*\n/\1/g # replace every line with only its first character
p # print that
g # get the hold buffer again
s/^.// # remove the first character from the first line
s/\n./\n/g # remove the first character from all other lines
h # put that back in the hold buffer
/[^\n]/ b loop # if there's something left other than newlines, loop
And there you have it. I might just have summoned Cthulhu.
