Pad all lines with spaces to a fixed width in Vim or using sed, awk, etc.

How can I pad each line of a file to a certain width (say, 63 characters wide), padding with spaces if need be?
For now, let’s assume all lines are guaranteed to be less than 63 characters.
I use Vim and would prefer a way to do it there, where I can select the lines I wish to apply the padding to, and run some sort of a printf %63s current_line command.
However, I’m certainly open to using sed, awk, or some sort of linux tool to do the job too.

Vim
:%s/.*/\=printf('%-63s', submatch(0))

$ awk '{printf "%-63s\n", $0}' testfile > newfile
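
A minimal shell sketch of the same awk approach with the width passed as a parameter, in case 63 is not always the target. The function name pad_lines is made up for illustration. It assembles the printf format from the width argument instead of relying on %*s, which may not be available in every awk.

```shell
# pad_lines WIDTH [FILE...]: left-justify each line and pad it with spaces to WIDTH.
# (pad_lines is a hypothetical helper name, not a standard tool.)
pad_lines() {
    width=$1
    shift
    # The format string "%-<width>s\n" is built by concatenation inside awk.
    awk -v w="$width" '{ printf("%-" w "s\n", $0) }' "$@"
}
```

Usage would be something like pad_lines 63 testfile > newfile. Lines already longer than the width are passed through unchanged, not truncated.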

In Vim, I would use the following command:
:%s/$/\=repeat(' ',64-virtcol('$'))
(The use of the virtcol() function, as opposed to the col() one,
is guided by the necessity to properly handle tab characters as well
as multibyte non-ASCII characters that might occur in the text.)

Just for fun, a Perl version:
$ perl -lpe '$_ .= " " x (63 - length $_)'

This might work for you:
$ sed -i ':a;/.\{63\}/!{s/$/ /;ba}' file
or perhaps more efficient but less elegant:
$ sed -i '1{x;:a;/.\{63\}/!{s/^/ /;ba};x};/\(.\{63\}\).*/b;G;s//\1/;y/\n/ /' file

It looks like you are comfortable using vim, but here is a pure Bash/simple-sed solution in case you need to do it from the command line (note the 63 spaces in the sed substitution):
$ sed 's/$/                                                               /' yourFile.txt | cut -c 1-63

With sed, without a loop:
$ sed -i '/.\{63\}/!{s/$/                                                               /;s/^\(.\{63\}\).*/\1/}' file
Be sure to have enough spaces in the first substitution to match the number of spaces you want to add (63 here).
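
Typing dozens of literal spaces by hand is easy to get wrong, so here is a hedged sketch that generates them instead. The helper name pad_cut is an assumption for illustration only. It appends WIDTH spaces to every line and then keeps the first WIDTH characters with cut, as in the sed/cut answer above.

```shell
# pad_cut WIDTH [FILE...]: append WIDTH spaces to each line, then keep the
# first WIDTH characters. (pad_cut is a hypothetical helper name.)
pad_cut() {
    width=$1
    shift
    # printf with an empty argument and a field width yields exactly WIDTH spaces.
    spaces=$(printf "%${width}s" '')
    sed "s/\$/$spaces/" "$@" | cut -c "1-$width"
}
```

Unlike the printf-based answers, this one also truncates lines that are longer than the width, which may or may not be what you want.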

Another Perl solution:
$ perl -lne 'printf "%-63s\n", $_' file

Related

How to make GNU sed remove certain characters from a line

I have the following line:
�5=?�#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
and would like to remove the characters �5=?� in front of #, so that the desired output looks as follows:
#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
I used GNU sed (v4.8) with the following argument:
sed "s/.*#/#/"
but this did not remove �5=?�, though it worked in the GNU sed live editor.
I would really appreciate any help on this.
My system is 3.10.0-1160.71.1.el7.x86_64
Using sed, remove everything up to the first occurrence of #
$ sed 's/^[^#]*//' input_file
#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
This might work for you (GNU sed):
sed -E 's/(\o357\o277\o275)5=\?\1//g' file
This removes all occurrences of �5=?�.
N.B. To translate the octal strings use sed -n l file to display the file as is. The triplets \357\277\275 can be matched in the LHS of the substitute command by using \o357\o277\o275.
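
One possible reason the original .*# attempt failed is that, in a UTF-8 locale, GNU sed's . does not match bytes that do not form a valid character. Assuming that is the cause here (the question does not confirm it), a sketch that sidesteps the issue is to run sed in the C locale, where every byte counts as a character. The function name strip_to_hash is invented for this example.

```shell
# strip_to_hash [FILE...]: delete everything before the first '#' on each line.
# LC_ALL=C makes sed treat the input as raw bytes, so invalid multibyte
# sequences cannot prevent the bracket expression from matching.
# (strip_to_hash is a hypothetical helper name.)
strip_to_hash() {
    LC_ALL=C sed 's/^[^#]*//' "$@"
}
```

Note that a line containing no # at all is emptied by this pattern, so it is only suitable when every line of interest has one.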

sed help: matching and replacing a literal "\n" (not the newline)

I have a file which contains several instances of \n.
I would like to replace them with actual newlines, but sed doesn't recognize the \n.
I tried
sed -r -e 's/\n/\n/'
sed -r -e 's/\\n/\n/'
sed -r -e 's/[\n]/\n/'
and many other ways of escaping it.
Is sed able to recognize a literal \n? If so, how?
Is there another program that can read the file, interpreting the \n's as real newlines?
Can you please try this
sed -i 's/\\n/\n/g' input_filename
What exactly works depends on your sed implementation. This is poorly specified in POSIX so you see all kinds of behaviors.
The -r option is also not part of the POSIX standard; but your script doesn't use any of the -r features, so let's just take it out. (For what it's worth, it changes the regex dialect supported in the match expression from POSIX "basic" to "extended" regular expressions; some sed variants have an -E option which does the same thing. In brief, things like capturing parentheses and repeating braces are "extended" features.)
On BSD platforms (including MacOS), you will generally want to backslash the literal newline, like this:
sed 's/\\n/\
/g' file
On some other systems, like Linux (also depending on the precise sed version installed -- some distros use GNU sed, others favor something more traditional, still others let you choose) you might be able to use a literal \n in the replacement string to represent an actual newline character; but again, this is nonstandard and thus not portable.
If you need a properly portable solution, probably go with Awk or (gasp) Perl.
perl -pe 's/\\n/\n/g' file
In case you don't have access to the manuals, the /g flag says to replace every occurrence on a line; the default behavior of the s/// command is to only replace the first match on every line.
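
For the Awk route mentioned above, a minimal sketch might look like the following. The function name unescape_newlines is made up for illustration. In an awk regex literal, \\n matches a backslash followed by n, while "\n" in the replacement string is a real newline.

```shell
# unescape_newlines [FILE...]: turn every literal backslash-n pair into a real newline.
# (unescape_newlines is a hypothetical helper name.)
unescape_newlines() {
    # gsub replaces all occurrences on the line; the trailing 1 prints the result.
    awk '{ gsub(/\\n/, "\n") } 1' "$@"
}
```

For example, printf '%s\n' 'a\nb' | unescape_newlines should print a and b on separate lines.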
awk seems to handle this fine:
echo "test \n more data" | awk '{sub(/\\n/,"**")}1'
test ** more data
Here you need to escape the \ using \\
$ echo "\n" | sed -e 's/[\\][n]/hello/'
sed works one line at a time, and the trailing newline is removed when the line is read into the buffer, so there is never a \n inside a single line. You should use N, n or H, h to fill the buffer with more than one line; then \n appears inside it. Be careful: ^ and $ then no longer mark the start and end of a line but of the whole string/buffer, because of the \n inside.
\n is recognized in the search pattern, not in the replace pattern. Two ways of using it (sample):
sed 's/\(\n\)bla/\1blabla\1/'
sed 's/\nbla/\
blabla\
/'
The first uses a \n already inside the buffer as a back reference (shorter code in the replace pattern);
the second uses a real newline.
So basically
sed "N
$ s/\(\n\)/\1/g
"
works (but is a bit useless). I imagine that s/\(\n\)\n/\1/g is more like what you want.
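
To make the point about N concrete, here is a small sketch that joins each pair of lines with a space. It only works because N has pulled a second line into the buffer, so there is an embedded \n for the pattern to match. The name join_pairs is hypothetical.

```shell
# join_pairs [FILE...]: join lines two at a time, separated by a single space.
# $!N appends the next line to the pattern space (except on the last line),
# so the buffer then contains an embedded newline that s/\n/ / can match.
# (join_pairs is a hypothetical helper name.)
join_pairs() {
    sed '$!N;s/\n/ /' "$@"
}
```

Without the N, the same s/\n/ / would never match, since a freshly read line holds no newline.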

Insert newline before first line

I am trying to insert a newline before the first line of text in a file. The only solution I have found so far is this:
sed -e '1 i
')
I do not like to have an actual newline in my shell script. Can this be solved any other way using the standard (GNU) UNIX utilities?
For variety:
echo | cat - file
Here's a pure sed solution with no specific shell requirements:
sed -e '1 s|^|\n|'
EDIT:
Please note that there has to be at least one line of input for this (and anything else using a line address) to work.
A $ before a single-quoted string will cause bash to interpret escape sequences within it.
sed -e '1 i'$'\n'
You could use awk:
$ awk 'FNR==1{print ""} 1' file
Which will work with any number of files.
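
Since the sed variants need at least one line of input, a sketch that also covers empty input is to print the newline first and then copy the input through unchanged. The function name prepend_blank is invented for illustration; it is essentially the echo | cat - file idea wrapped in a function.

```shell
# prepend_blank [FILE...]: emit one empty line, then the input unchanged.
# Works on empty input too, because the newline does not depend on a line address.
# (prepend_blank is a hypothetical helper name.)
prepend_blank() {
    printf '\n'
    cat -- "$@"
}
```

With several files, this puts one blank line before the whole concatenation, unlike the FNR==1 awk version, which adds one before each file.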

Removing Parts of String With Sed

I have lines of data that look like this:
sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A342_ATPB_COFAR_9_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_8_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta
How can I use sed to delete the part of each line after the 4th column (_ separated)?
Finally yielding:
sp_A0A342_ATPB_COFAR
sp_A0A342_ATPB_COFAR
sp_A0A373_RK16_COFAR
sp_A0A373_RK16_COFAR
sp_A0A4W3_SPEA_GEOSL
cut is a better fit.
cut -d_ -f 1-4 old_file
This simply means use _ as delimiter, and keep fields 1-4.
If you insist on sed:
sed 's/\(_[^_]*\)\{4\}$//'
This left hand side matches exactly four repetitions of a group, consisting of an underscore followed by 0 or more non-underscores. After that, we must be at the end of the line. This is all replaced by nothing.
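
The cut and sed commands above agree on the sample data only because every line has exactly eight fields. A hedged sketch of the difference, with made-up function names: cut counts fields from the front, while this sed pattern strips four fields from the back.

```shell
# keep_first4: keep the first four _-separated fields (counts from the front).
keep_first4() {
    cut -d_ -f 1-4 "$@"
}

# strip_last4: remove the last four _-separated fields (counts from the back).
strip_last4() {
    sed 's/\(_[^_]*\)\{4\}$//' "$@"
}
# Both names are hypothetical helpers used only for this comparison.
```

On a nine-field line such as a_b_c_d_e_f_g_h_i, keep_first4 leaves a_b_c_d while strip_last4 leaves a_b_c_d_e, so the back-anchored pattern is only safe when the tail has a fixed shape.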
sed -e 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/' infile > outfile
Match "any number of not '_'", saving what was matched between \( and \), followed by '_'. Do this 4 times, then match anything for the rest of the line (to be ignored). Substitute with each of the matches separated by '_'.
Here's another possibility:
sed -E -e 's|^([^_]+(_[^_]+){3}).*$|\1|'
where -E, like -r in GNU sed, turns on extended regular expressions for readability.
Just because you can do it in sed, though, doesn't mean you should. I like cut much much better for this.
AWK likes to play in the fields:
awk 'BEGIN{FS=OFS="_"}{print $1,$2,$3,$4}' inputfile
or, more generally:
awk -v count=4 'BEGIN{FS="_"}{sep="";for(i=1;i<=count;i++){printf "%s%s",sep,$i;sep=FS};printf "\n"}' inputfile
sed -e 's/_[0-9][0-9]*_[+-]_contigs_full.fasta$//g'
Still the cut answer is probably faster and just generally better.
Yes, cut is way better, and yes matching the back of each is easier.
I finally got a match using the beginning of each line:
sed -r 's/(([^_]*_){3}([^_]*)).*/\1/' oldFile > newFile

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have entries of this kind
a,b,c,d
d,e,f,g
Now, due to some error, the entries in these files have become like this
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Is there a command you would recommend that can replace the erroneous lines in all the files?
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on @wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete mal-formed lines with any number of commas (including zero) - and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about trying to keep only lines which are matching the desired format instead of handling one exception ?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it, the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
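
Putting the portability notes above together, a sketch that avoids both -i and \+ while still handling several files might look like this. The function name remove_comma_lines is made up, and mktemp is assumed to be available (it is common but not required by POSIX).

```shell
# remove_comma_lines FILE...: delete lines consisting only of commas, in place.
# Uses \{1,\} instead of GNU's \+ and a temporary file instead of sed -i.
# (remove_comma_lines is a hypothetical helper name; mktemp is assumed to exist.)
remove_comma_lines() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Write the filtered copy first; only overwrite the original if sed succeeded.
        if sed '/^,\{1,\}$/d' "$f" > "$tmp"; then
            cat "$tmp" > "$f"   # cat rather than mv keeps the file's permissions
        fi
        rm -f "$tmp"
    done
}
```

It could be called as remove_comma_lines *.csv. As with the other commands, it seems wise to try it on copies of the files first.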
Most simply:
$ grep -v ,,, oldfile > newfile
$ mv newfile oldfile
Yes, awk and grep are very good options if you are working on a Linux platform. On other platforms, however, you can use Perl regular expressions, together with join and split.
