Appending Text To the Existing First Line with Sed - linux

I have a data that looks like this (FASTA format). Note that
in comes with block of 2 ">" header and the sequence.
>SRR018006
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN
>SRR018006
ACCCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
What I want to do is to append a text (e.g. "foo" in the > header)
yielding:
>SRR018006-foo
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN
>SRR018006-foo
ACCCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
Is there a way to do that using SED? Preferably inline modifying
the original file.

This will do what you're looking for.
sed -ie 's/^\(>.*\)/\1-foo/' file

since judging from your previous post, you are also experienced using awk: here's an awk solution.
# awk '/^>/{print $0"-foo";next}1' file
>SRR018006-foo
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN
>SRR018006-foo
ACCCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
# awk '/^>/{print $0"-foo";next}1' file > temp
# mv temp file
if you insist on sed
# sed -e '/^>/s/$/-foo/' file

Related

Delete everything after pattern including pattern

I have a text file like
some
important
content
goes here
---from here--
some
unwanted content
I am trying to delete all lines after ---from here-- including ---from here--. That is, the desired output is
some
important
content
goes here
I tried sed '1,/---from here--/!d' input.txt but it's not removing the ---from here-- part. If I use sed '/---from here--.*/d' input.txt, it's only removing ---from here-- text.
How can I remove lines after a pattern including that pattern?
EDIT
I can achieve it by doing the first operation and pipe its output to second, like sed '1,/---from here--/!d' input.txt | sed '/---from here--.*/d' > outputput.txt.
Is there a single step solution?
Another approach with sed:
sed '/---from here--/,$d' file
The d(delete) command is applied to all lines from first line containing ---from here-- up to the end of file($)
Another awk approach:
awk '/---from here--/{exit}1' file
If you have GNU awk 4.1.0+, you can add -i inplace to change the file in-place.
Otherwise appened | tee file to change the file in-place.
I'm not positive, but I believe this will work:
sed -n '/---from here--/q; p' file
The q command tells sed to quit processing input lines after matching a given line.
Could you please try following(in case you are ok with awk).
awk '/--from here--/{found_from=1} !found_from{print}' Input_file
You can try Perl
perl -ne ' $x++ if /---from here--/; print if !$x '
using your inputs..
$ cat johnykutty.txt
some
important
content
goes here
---from here--
some
unwanted content
$ perl -ne ' $x++ if /---from here--/; print if !$x ' johnykutty.txt
some
important
content
goes here
$

How can I remove the last character of a file in unix?

Say I have some arbitrary multi-line text file:
sometext
moretext
lastline
How can I remove only the last character (the e, not the newline or null) of the file without making the text file invalid?
A simpler approach (outputs to stdout, doesn't update the input file):
sed '$ s/.$//' somefile
$ is a Sed address that matches the last input line only, thus causing the following function call (s/.$//) to be executed on the last line only.
s/.$// replaces the last character on the (in this case last) line with an empty string; i.e., effectively removes the last char. (before the newline) on the line.
. matches any character on the line, and following it with $ anchors the match to the end of the line; note how the use of $ in this regular expression is conceptually related, but technically distinct from the previous use of $ as a Sed address.
Example with stdin input (assumes Bash, Ksh, or Zsh):
$ sed '$ s/.$//' <<< $'line one\nline two'
line one
line tw
To update the input file too (do not use if the input file is a symlink):
sed -i '$ s/.$//' somefile
Note:
On macOS, you'd have to use -i '' instead of just -i; for an overview of the pitfalls associated with -i, see the bottom half of this answer.
If you need to process very large input files and/or performance / disk usage are a concern and you're using GNU utilities (Linux), see ImHere's helpful answer.
truncate
truncate -s-1 file
Removes one (-1) character from the end of the same file. Exactly as a >> will append to the same file.
The problem with this approach is that it doesn't retain a trailing newline if it existed.
The solution is:
if [ -n "$(tail -c1 file)" ] # if the file has not a trailing new line.
then
truncate -s-1 file # remove one char as the question request.
else
truncate -s-2 file # remove the last two characters
echo "" >> file # add the trailing new line back
fi
This works because tail takes the last byte (not char).
It takes almost no time even with big files.
Why not sed
The problem with a sed solution like sed '$ s/.$//' file is that it reads the whole file first (taking a long time with large files), then you need a temporary file (of the same size as the original):
sed '$ s/.$//' file > tempfile
rm file; mv tempfile file
And then move the tempfile to replace the file.
Here's another using ex, which I find not as cryptic as the sed solution:
printf '%s\n' '$' 's/.$//' wq | ex somefile
The $ goes to the last line, the s deletes the last character, and wq is the well known (to vi users) write+quit.
After a whole bunch of playing around with different strategies (and avoiding sed -i or perl), the best way i found to do this was with:
sed '$! { P; D; }; s/.$//' somefile
If the goal is to remove the last character in the last line, this awk should do:
awk '{a[NR]=$0} END {for (i=1;i<NR;i++) print a[i];sub(/.$/,"",a[NR]);print a[NR]}' file
sometext
moretext
lastlin
It store all data into an array, then print it out and change last line.
Just a remark: sed will temporarily remove the file.
So if you are tailing the file, you'll get a "No such file or directory" warning until you reissue the tail command.
EDITED ANSWER
I created a script and put your text inside on my Desktop. this test file is saved as "old_file.txt"
sometext
moretext
lastline
Afterwards I wrote a small script to take the old file and eliminate the last character in the last line
#!/bin/bash
no_of_new_line_characters=`wc '/root/Desktop/old_file.txt'|cut -d ' ' -f2`
let "no_of_lines=no_of_new_line_characters+1"
sed -n 1,"$no_of_new_line_characters"p '/root/Desktop/old_file.txt' > '/root/Desktop/my_new_file'
sed -n "$no_of_lines","$no_of_lines"p '/root/Desktop/old_file.txt'|sed 's/.$//g' >> '/root/Desktop/my_new_file'
opening the new_file I created, showed the output as follows:
sometext
moretext
lastlin
I apologize for my previous answer (wasn't reading carefully)
sed 's/.$//' filename | tee newFilename
This should do your job.
A couple perl solutions, for comparison/reference:
(echo 1a; echo 2b) | perl -e '$_=join("",<>); s/.$//; print'
(echo 1a; echo 2b) | perl -e 'while(<>){ if(eof) {s/.$//}; print }'
I find the first read-whole-file-into-memory approach can be generally quite useful (less so for this particular problem). You can now do regex's which span multiple lines, for example to combine every 3 lines of a certain format into 1 summary line.
For this problem, truncate would be faster and the sed version is shorter to type. Note that truncate requires a file to operate on, not a stream. Normally I find sed to lack the power of perl and I much prefer the extended-regex / perl-regex syntax. But this problem has a nice sed solution.

Quickest way to remove 70+ strings from a file?

I have 70+ strings I need to find and delete in a file. I need to remove the entire line in the file that the string appears in.
I know I can use sed -i '/string to remove/d' fileA.txt to remove them one at a time. However, considering I have 70+, it will take some time doing it this way.
Is there a way I can put these 70+ strings in a file and have sed go through them one by one? Or if I create a file containing the strings, is there a way to compare the two files so it removes any line from fileA that contains one of the strings?
You could use grep:
grep -vf file_with_words.txt file.txt
where file_with_words.txt would be the file containing the list of words, each word being on a different line and file.txt is the file that you want to remove the lines from.
If your list of words contains regex metacharacters, then tell grep to consider those as fixed strings (if that is what you want):
grep -F -vf file_with_words.txt file.txt
Using sed, you'd need to say:
sed '/word1\|word2\|word3/d' file.txt
or
sed -E '/word1|word2|word3/d' file.txt
You could use command substitution to construct the pattern too:
sed -E "/$(paste -sd'|' file_with_words.txt)/d" file.txt
but grep is clearly the tool to use in this case.
If you want to do the job in bash, here's how:
search=fileA.txt
queries=queries.txt
while read query
do
sed -i '' "/$query/d" $search
done < "$queries"
where queries.txt looks like
I
want
to
delete
these
lines

Removing string between two symbol in line

I am trying to remove a string between two symbol in line from a csv file. Here is my sample file :
1.1.1.1,A-B:,awef.C.D.E
1.1.1.2,A-B:,few.C.D.E
1.1.1.3,A-B:,dfs.C.D
1.1.1.4,A-B:,few.C.D
1.1.1.5,A-B:,fdsferger.C.D.E
1.1.1.6,A-B:,wef.C.D
1.1.1.7,A-B:,jty.C.D.E
The output would be like this :
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E
Any way I can achieve it?
The following awk command can do this:
awk 'BEGIN{FS=OFS=","}{sub("[^.]*.","",$3);print}'
It basically divides each line into the three comma-separated fields then removes the initial part of the third field, up to and including the first . character.
Then it simply outputs them again.
See the following transcript for a demonstration:
pax> echo '1.1.1.1,A-B:,awef.C.D.E
1.1.1.2,A-B:,few.C.D.E
1.1.1.3,A-B:,dfs.C.D
1.1.1.4,A-B:,few.C.D
1.1.1.5,A-B:,fdsferger.C.D.E
1.1.1.6,A-B:,wef.C.D
1.1.1.7,A-B:,jty.C.D.E' | awk 'BEGIN{FS=OFS=","}{sub("[^.]*.","",$3);print}'
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E
Here is an awk that should do:
awk '{sub(/:,[^.]*\./,":,")}1' file
1.1.1.1,A-B:,C.D.E
1.1.1.2,A-B:,C.D.E
1.1.1.3,A-B:,C.D
1.1.1.4,A-B:,C.D
1.1.1.5,A-B:,C.D.E
1.1.1.6,A-B:,C.D
1.1.1.7,A-B:,C.D.E
You can use sed also
sed -r 's/(.*:,)([a-z]*.)(.*)/\1\3/g'
(or)
sed -r 's/:,[^.]+\./:,/' file
This might work for you (GNU sed):
sed 's/^\(.*,\)[^.]*\./\1/' file
Use greed to gather up all the columns but the last and then delete upto and including the first ..

Replace whitespace with a comma in a text file in Linux

I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace (maybe it's a tab between the numbers in the output) using sed or awk functions (an easy shell script in Linux).
Can anyone help me? Every command I used didn't change the file at all; I tried gsub.
tr ' ' ',' <input >output
Substitutes each space with a comma, if you need you can make a pass with the -s flag (squeeze repeats), that replaces each input sequence of a repeated character that is listed in SET1 (the blank space) with a single occurrence of that character.
Use of squeeze repeats used to after substitute tabs:
tr -s '\t' <input | tr '\t' ',' >output
Try something like:
sed 's/[:space:]+/,/g' orig.txt > modified.txt
The character class [:space:] will match all whitespace (spaces, tabs, etc.). If you just want to replace a single character, eg. just space, use that only.
EDIT: Actually [:space:] includes carriage return, so this may not do what you want. The following will replace tabs and spaces.
sed 's/[:blank:]+/,/g' orig.txt > modified.txt
as will
sed 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, eg. two words.
without looking at your input file, only a guess
awk '{$1=$1}1' OFS=","
redirect to another file and rename as needed
What about something like this :
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, with some useless catting and piping ; could also use < to read from the file directly, I suppose -- used cat first to output the content of the file, and only after, I added sed to my command-line)
EDIT : as #ghostdog74 pointed out in a comment, there's definitly no need for thet cat/pipe ; you can give the name of the file to sed :
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" is this way :
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that'll look like this :
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't go just replacing the old file by the new one (could be done with sed -i, if I remember correctly ; and as #ghostdog74 said, this one would accept creating the backup on the fly) : keeping might be wise, as a security measure (even if it means having to rename it to something like "texte-backup.txt")
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will send to the console,
sed -i 's/[\t ]/,/g' input.file
will edit the file in-place
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each input file is moved to .bak
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following:
sed 's/[\t ]+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, then first you need to get rid of them, and then convert the remaining blank characters to commas. For such case, use the following:
sed 's/ +//' input_file | sed 's/[\t ]+/,/g' > output_file
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv

Resources