Insert a space after the second character followed by every three characters - linux

I need to insert a space after two characters, followed by a space after every three characters.
Data:
97100101101102101
Expected Output:
97 100 101 101 102 101
Attempted Code:
sed 's/.\{2\}/& /3g'

In two steps:
$ sed -r -e 's/^.{2}/& /' -e 's/[^ ]{3}/& /g' <<< 97100101101102101
97 100 101 101 102 101
That is:
's/^.{2}/& /'
catch the first two chars in the line and print them back with a space after.
's/[^ ]{3}/& /g'
catch three consecutive non-space characters and print them back followed by a space.
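As a hedged variation on the two-step command above, a third expression can strip the space that the global substitution leaves after the last group. This is only a sketch: the function name group_sed is hypothetical, it uses POSIX basic-regex intervals instead of -r, and it assumes the input line carries no meaningful trailing space of its own.

```shell
# group_sed: hypothetical helper, reads lines on stdin.
group_sed() {
    # 1) space after the first two characters
    # 2) space after every run of three non-space characters
    # 3) drop the single trailing space produced by step 2
    sed -e 's/^.\{2\}/& /' \
        -e 's/[^ ]\{3\}/& /g' \
        -e 's/ $//'
}

printf '%s\n' 97100101101102101 | group_sed
```

For the sample data this prints 97 100 101 101 102 101 with no blank at the end of the line.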

With GNU awk:
$ echo '97100101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/," &","g",substr($0,3))}'
97 100 101 101 102 101
Note that unlike the currently accepted sed solution this will not add a blank char to the end of the line, e.g. using _ instead of a blank to make the issue visible:
$ echo '97100101101102101' | sed -r -e 's/^.{2}/&_/' -e 's/[^_]{3}/&_/g'
97_100_101_101_102_101_
$ echo '97100101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/,"_&","g",substr($0,3))}'
97_100_101_101_102_101
and it would work even if the input contained blank chars:
$ echo '971 0101101102101' | sed -r -e 's/^.{2}/& /' -e 's/[^ ]{3}/& /g'
97 1 010 110 110 210 1
$ echo '971 0101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/," &","g",substr($0,3))}'
97 1 0 101 101 102 101
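gensub() is a GNU awk extension, so where gawk is not available the same grouping could be sketched with a plain substr() loop. The function name group_awk is hypothetical. One assumed difference: a final group shorter than three characters is also preceded by a space here, whereas the gensub version leaves it attached.

```shell
# group_awk: hypothetical helper, reads lines on stdin; uses only POSIX awk features.
group_awk() {
    awk '{
        out = substr($0, 1, 2)                 # first two characters
        for (i = 3; i <= length($0); i += 3)   # then groups of three
            out = out " " substr($0, i, 3)
        print out
    }'
}

printf '%s\n' 97100101101102101 | group_awk
printf '%s\n' '971 0101101102101' | group_awk
```

The two calls print 97 100 101 101 102 101 and 97 1 0 101 101 102 101, matching the gawk output above for these inputs.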


How to replace two lines with a blank line using SED command?

I want to replace the first two lines with a blank line as below.
Input:
sample
sample
123
234
235
456
Output:
<> blank line
123
234
235
456
Delete the first line, remove all the content from the second line but don't delete it completely:
$ sed -e '1d' -e '2s/.*//' input.txt
123
234
235
456
Or insert a blank line before the first, and delete the first two lines:
$ sed -e '1i\
' -e '1,2d' input.txt
123
234
235
456
Or use tail instead of sed to print all lines starting with the third, and an echo first to get a blank line:
(echo ""; tail -n +3 input.txt)
Or if you're trying to modify a file in place, use ed instead:
ed -s input.txt <<EOF
1,2c

.
w
EOF
(The c command changes the given range of lines to the text that follows, up to the line containing only a dot; here that text is a single empty line.)
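The "drop line 1, empty line 2" idea from the first sed command could also be sketched in awk, which may be convenient when the data arrives on a pipe. The function name blank_first_two is hypothetical.

```shell
# blank_first_two: hypothetical helper, reads stdin and writes stdout.
blank_first_two() {
    awk 'NR == 1 { next }               # drop the first line
         NR == 2 { print ""; next }     # replace the second line with an empty one
         { print }                      # pass everything else through'
}

printf '%s\n' sample sample 123 234 235 456 | blank_first_two
```

The output starts with one empty line, followed by 123, 234, 235 and 456.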

Remove \r character from String pattern matched in AWK

I'm quite new to AWK so apologies for the basic question. I've found many references for removing windows end-line characters from files but none that match a regular expression and subsequently remove the windows end line characters.
I have a file named infile.txt that contains a line like so:
...
DATAFILE data5v.dat
...
Within a shell script I want to capture the filename argument data5v.dat from this infile.txt and remove any carriage return character, \r, IF present. The carriage return may not always be present. So I have to match a word and then remove the \r subsequently.
I have tried the following but it is not working how I expect:
FILENAME=$(awk '/DATAFILE/ { print gsub("\r", "", $2) }' $INFILE)
Can I store the string returned from matching my regex /DATAFILE/ in a variable within my AWK statement to subsequently apply gsub?
File names can contain whitespace, including \rs, blanks and tabs, so to do this robustly you can't remove all \rs with gsub() and you can't rely on there being any field, e.g. $2, that contains the whole file name.
If your input fields are tab-separated you need:
awk '/DATAFILE/ { sub(/[^\t]+\t/,""); sub(/\r$/,""); print }' file
or this otherwise:
awk '/DATAFILE/ { sub(/[^[:space:]]+[[:space:]]+/,""); sub(/\r$/,""); print }' file
The above assumes your file names don't start with spaces and don't contain newlines.
To test any solution for robustness try:
printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '...' | cat -TEv
and make sure that the output looks like it does below:
$ printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '/DATAFILE/ { sub(/[^\t]+\t/,""); sub(/\r$/,""); print }' | cat -TEv
foo ^M^Ibar$
$ printf 'DATAFILE\tfoo \r\tbar\r\n' | awk '/DATAFILE/ { sub(/[^[:space:]]+[[:space:]]+/,""); sub(/\r$/,""); print }' | cat -TEv
foo ^M^Ibar$
Note the blank, ^M (CR), and ^I (tab) in the middle of the file name as they should be but no ^M at the end of the line.
If your version of cat doesn't support -T or -E then do whatever you normally do to look for non-printing chars, e.g. od -c or vi the output.
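To tie this back to the shell script in the question, the second command could be wrapped so that its result lands in a shell variable. This is a sketch under assumptions: the helper name datafile_name and the lowercase variable names are hypothetical, and the separator class is written as [ \t] instead of [[:space:]] so that it also runs on awk versions without POSIX character classes.

```shell
# datafile_name: hypothetical helper; prints the DATAFILE argument found in file $1.
datafile_name() {
    awk '/DATAFILE/ {
        sub(/[^ \t]+[ \t]+/, "")   # drop the keyword and the blanks or tabs after it
        sub(/\r$/, "")             # drop one trailing CR, if present
        print
    }' "$1"
}

infile=$(mktemp)
printf 'TITLE demo\r\nDATAFILE data5v.dat\r\n' > "$infile"
datafile=$(datafile_name "$infile")
printf '[%s]\n' "$datafile"
rm -f "$infile"
```

This prints [data5v.dat]; the brackets make it easy to see that no carriage return is left at the end of the value.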
With GNU awk, would you please try the following:
FILENAME=$(awk -v RS='\r?\n' '/DATAFILE/ {print $2}' "$INFILE")
echo "$FILENAME"
It assigns the record separator RS to a sequence of zero or one \r followed by \n.
As a side note, uppercase names are not recommended for user variables in shell scripts because they may conflict with environment or shell-reserved variable names.
Awk simply applies each line of script to each input line. You can easily remove the carriage return and then apply some other logic to the input line. For example,
FILENAME=$(awk '/\r/ { sub(/\r/, "") }
/DATAFILE/ { print $2 }' "$INFILE")
Note also that "$INFILE" is quoted; see "When to wrap quotes around a shell variable".
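On the original question of storing the match in a variable inside awk: gsub() returns the number of replacements it made and edits its third argument in place, which is why printing the call itself shows a count instead of the name. A minimal sketch, assuming the file name contains no whitespace so that $2 holds all of it; the helper name second_field_no_cr is hypothetical.

```shell
# second_field_no_cr: hypothetical helper; prints $2 of DATAFILE lines without CRs.
second_field_no_cr() {
    awk '/DATAFILE/ {
        name = $2                  # keep the field in an awk variable
        n = gsub(/\r/, "", name)   # gsub edits name in place; n is only the count
        print name
    }' "$1"
}

infile=$(mktemp)
printf 'DATAFILE data5v.dat\r\n' > "$infile"
second_field_no_cr "$infile"
rm -f "$infile"
```

This prints data5v.dat, whereas print gsub("\r", "", $2) on the same line would print 1.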
who says you need gnu-awk :
gecho -ne "test\r\nabc\n\rdef\n" \
\
| mawk NF=NF FS='\r' OFS='' | odview
0000000 1953719668 1667391754 1717920778 10
t e s t \n a b c \n d e f \n
164 145 163 164 012 141 142 143 012 144 145 146 012
t e s t nl a b c nl d e f nl
116 101 115 116 10 97 98 99 10 100 101 102 10
74 65 73 74 0a 61 62 63 0a 64 65 66 0a
0000015
gawk -P posix mode is also fine with it :
gecho -ne "test\r\nabc\n\rdef\n" \
\
| gawk -Pe NF=NF FS='\r' OFS='' | odview
0000000 1953719668 1667391754 1717920778 10
t e s t \n a b c \n d e f \n
164 145 163 164 012 141 142 143 012 144 145 146 012
t e s t nl a b c nl d e f nl
116 101 115 116 10 97 98 99 10 100 101 102 10
74 65 73 74 0a 61 62 63 0a 64 65 66 0a
0000015

Shell script to convert trim and make it single line

I have a command
pdftotext -f 3 -l 3 -x 205 -y 40 -W 180 -H 75 -layout input.pdf -
When run it produces output as below
[[_थी] 2206255388
नाव मीराबाई sad
पतीचे नाव dame
| घर क्रमांक Photo's |
|वय 51 लिंग महिला Available |
How can I enclose each line in double quotes and then join the lines into a single comma-separated line using a shell command?
As an example, you could modify the output of your command like that:
cat <<EOF | sed 's/\(.*\)/\"\1\"/g' | tr '\n' ',' | sed 's/.$//'
> foobar
> bar
> foo
> EOF
"foobar","bar","foo"
The first sed adds the double quotes, tr replaces each newline with a comma, and the last sed removes the trailing comma.
So, your command will be:
pdftotext -f 3 -l 3 -x 205 -y 40 -W 180 -H 75 -layout input.pdf - | sed 's/\(.*\)/\"\1\"/g' | tr '\n' ',' | sed 's/.$//'
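As an alternative sketch, paste can do the joining, so no trailing comma is produced and nothing has to be trimmed afterwards. The function name quote_and_join is hypothetical; like the pipeline above, it does not escape double quotes that already occur inside a line.

```shell
# quote_and_join: hypothetical helper, reads lines on stdin.
quote_and_join() {
    # wrap every line in double quotes, then join all lines with commas
    sed 's/.*/"&"/' | paste -s -d ',' -
}

printf '%s\n' foobar bar foo | quote_and_join
```

This prints "foobar","bar","foo" followed by a newline. Under the same assumptions, the pdftotext command could be piped into quote_and_join in place of the sed, tr, sed chain.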

Replace first few lines with first few lines from other file

I am working on Linux. I have 2 files - file1.dat and file2.dat.
cat file1.dat
1
2
3
4
5
6
7
8
9
10
and for file2:
cat file2.dat
1a
2a
3a
4a
5a
6a
7a
8a
9a
10a
I want to replace the first 4 lines of file1.dat with the first 3 lines of file2.dat, so my output would be the following:
cat file1.dat
1a
2a
3a
5
6
7
8
9
10
I tried the following command:
sed -i.bak '1,4d;3r file2.dat' file1.dat
But with this command I get the following output:
5
6
7
8
9
10
How should I modify the command? I tried various combinations.
The following awk commands may also help; they were tested with GNU awk.
Solution 1st:
awk 'FNR==NR && FNR<4{print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
Solution 2nd:
awk 'FNR==NR && FNR==4{nextfile} FNR==NR{print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
OR
awk 'FNR==NR{if(FNR==4){nextfile};print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
Solution 3rd: using a combination of awk, head, and tail.
awk 'FNR==1{system("head -n3 file2.dat");next} 1' <(tail -n +4 file1.dat)
Assuming GNU sed
$ sed '3q' f2 | sed -e '3r /dev/stdin' -e '1,4d' f1
1a
2a
3a
5
6
7
8
9
10
sed '3q' f2 gives the first three lines from second file
-e '3r /dev/stdin' use stdin data
-e '1,4d' delete required lines
order is important - first r then d
For small number of lines, you can also use
sed -e '3R f2' -e '3R f2' -e '3R f2' -e '1,4d' f1
R command reads one line at a time
With GNU coreutils, this would probably be better for all/most scenarios
head -n3 f2; tail -n +5 f1
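Since the question edits file1.dat in place, the head/tail pair could be wrapped with a temporary file, as sketched below. The function name replace_head and its argument order are hypothetical, and mktemp is assumed to be available.

```shell
# replace_head TARGET SRC DROP TAKE (hypothetical helper):
# replace the first DROP lines of TARGET with the first TAKE lines of SRC.
replace_head() {
    tmp=$(mktemp) || return 1
    {
        head -n "$4" "$2"               # first TAKE lines of SRC
        tail -n +"$(($3 + 1))" "$1"     # TARGET from line DROP+1 onwards
    } > "$tmp" && cat "$tmp" > "$1"     # overwrite TARGET only if both reads succeeded
    status=$?
    rm -f "$tmp"
    return "$status"
}

dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$dir/file1.dat"
printf '%sa\n' 1 2 3 4 5 6 7 8 9 10 > "$dir/file2.dat"
replace_head "$dir/file1.dat" "$dir/file2.dat" 4 3
cat "$dir/file1.dat"
rm -rf "$dir"
```

For the sample files this leaves 1a, 2a, 3a, 5, 6, 7, 8, 9, 10 in file1.dat. Writing back with cat instead of mv keeps the permissions of the target file.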
awk is your friend
Script
# awk 'NR==FNR && FNR<=3 || NR>FNR && FNR>4' file2 file1
Output
1a
2a
3a
5
6
7
8
9
10
Tips
NR - Total number of records processed
FNR - Total number of records processed but resets when reading a new file.
When a condition evaluates to true and no action is given, awk just prints the current record.
All good :-)
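To see how NR and FNR differ, a small sketch can print both counters for every line of two files. The helper name show_counters and the tiny sample files are hypothetical.

```shell
# show_counters: hypothetical helper; prints file name, NR and FNR for every input line.
show_counters() {
    awk '{ print FILENAME ": NR=" NR ", FNR=" FNR }' "$@"
}

dir=$(mktemp -d)
printf '%s\n' 1a 2a > "$dir/file2"
printf '%s\n' 1 2 3 > "$dir/file1"
(cd "$dir" && show_counters file2 file1)
rm -rf "$dir"
```

This prints NR=1 and NR=2 with matching FNR values for file2, then NR=3, 4, 5 with FNR=1, 2, 3 for file1. NR==FNR is therefore true only while the first file is being read, which is what the script above relies on.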

How to perform the reverse of `xargs`?

I have a list of numbers that I want to reverse.
They are already sorted.
35 53 102 342
I want this:
342 102 53 35
So I thought of this:
echo $NUMBERS | ??? | tac | xargs
What's the ???
It should turn a space separated list into a line separated list.
I'd like to avoid having to set IFS.
Maybe I can use bash arrays, but I was hoping there's a command whose purpose in life is to do the opposite of xargs (maybe xargs is more than a one trick pony as well!!)
You can use printf for that. For example:
$ printf "%s\n" 35 53 102 342
35
53
102
342
$ printf "%s\n" 35 53 102 342|tac
342
102
53
35
Another answer (easy to remember but not as fast as the printf method):
$ xargs -n 1 echo
e.g.
$ NUMBERS="35 53 102 342"
$ echo $NUMBERS | xargs -n 1 echo | tac | xargs
342 102 53 35
Here is the xargs manual for -n option:
-n number
Set the maximum number of arguments taken from standard input for
each invocation of utility. An invocation of utility will use less
than number standard input arguments if the number of bytes accumu-
lated (see the -s option) exceeds the specified size or there are
fewer than number arguments remaining for the last invocation of
utility. The current default value for number is 5000.
awk one-liner without tac:
awk '{NF++;while(NF-->1)print $NF}'
for example:
kent$ echo "35 53 102 342"|awk '{NF++;while(NF-->1)print $NF}'
342
102
53
35
Another option is to use Bash string manipulation
$ numbers="35 53 102 342"
$ echo "${numbers// /$'\n'}"
35
53
102
342
$ echo "${numbers// /$'\n'}" | tac
342
102
53
35
Well, you could write:
echo $(printf '%s\n' $NUMBERS | tac)
where printf '%s\n' ... prints each of ..., with a newline after each one, and $( ... ) is a built-in feature that makes xargs almost superfluous.
However, I don't think you should avoid using arrays, IFS, and so on; they make scripts more robust in the face of bugs and/or unexpected input.
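For comparison, a pure-shell sketch that needs neither tac nor xargs can build the reversed list from the positional parameters. It uses no arrays, so it should also run in a plain POSIX sh. The function name reverse_words is hypothetical, and the unquoted $NUMBERS in the call relies on default IFS word splitting, so it assumes the values contain no glob characters.

```shell
# reverse_words: hypothetical helper; prints its arguments in reverse order on one line.
reverse_words() {
    out=
    for w in "$@"; do
        out="$w${out:+ $out}"   # prepend each word; add a space only if out is non-empty
    done
    printf '%s\n' "$out"
}

NUMBERS="35 53 102 342"
reverse_words $NUMBERS
```

This prints 342 102 53 35.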
There's a lot of answers using tac, but in case you'd like to use sort, it's almost the same:
printf "%s\n" 1 2 3 4 5 10 12 | sort -rn
n is important as it makes it sort numerically. r is reverse.
If you sorted your list with sort in the first place, you might consider its -r (reverse) option.
Another way to change space into newlines and the other way round is with tr :
echo 35 53 102 342|tr ' ' '\n'|tac|tr '\n' ' '
If data is not sorted, replace tac by sort -rn.
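The final tr '\n' ' ' also turns the last newline into a space, so the result ends with a blank and has no terminating newline. A hedged variant joins the lines with paste instead; the function name reverse_list is hypothetical, and tac is assumed to be available (it is part of GNU coreutils).

```shell
# reverse_list: hypothetical helper; reads one space-separated line on stdin.
reverse_list() {
    # split on spaces, reverse the lines, then join them with single spaces
    tr ' ' '\n' | tac | paste -s -d ' ' -
}

echo 35 53 102 342 | reverse_list
```

This prints 342 102 53 35 followed by a newline, with no trailing space.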
