Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a file that looks like this:
>Gene.10::S0008.1::g.10::m.10 Gene.10::S0008.1::g.10 ORF type:complete len:250 (-),score=22.42 S_0008.1:286-1035(-)
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11 Gene.11::S0009.1::g.10 ORF type:complete len:250 (-),score=22.42 S_0008.1:286-1035(-)
QSAISNDEELNKIMDA
....
I want to delete everything in each header line after the first space. How can I do this easily in Linux?
Resultant file:
>Gene.10::S0008.1::g.10::m.10
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11
QSAISNDEELNKIMDA
I would use sed:
sed '/^>/s/^>\([^ ]*\) .*/>\1 /'
If a line starts with > then remove everything after the first space. The following:
echo '>Gene.10::S0008.1::g.10::m.10 Gene.10::S0008.1::g.10 ORF type:complete len:250 (-),score=22.42 Sxl_rink_0008.1:286-1035(-)
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11 Gene.11::S0009.1::g.10 ORF type:complete len:250 (-),score=22.42 Sxl_rink_0008.1:286-1035(-)
QSAISNDEELNKIMDA' | sed '/^>/s/^>\([^ ]*\) .*/>\1 /'
outputs:
>Gene.10::S0008.1::g.10::m.10
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11
QSAISNDEELNKIMDA
I don't know if the one space left after the header is relevant or not, but I left it.
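If the trailing space is not wanted, a shorter substitution that deletes the first space and everything after it on header lines is one option. This is only a minimal sketch: the function name strip_header and the sample input are made up for illustration.

```shell
# Hypothetical helper: on lines starting with '>', delete the first
# space and everything after it; all other lines pass through unchanged.
strip_header() {
    sed '/^>/s/ .*//' "$@"
}

# Made-up two-line sample: one header, one sequence line containing a space.
result=$(printf '>id1 some description\nMKG DDF\n' | strip_header)
printf '%s\n' "$result"
```

Called with a file name (strip_header file.fasta) it reads that file instead of standard input.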
If the long sequence lines contain no spaces anywhere, you can simply keep everything up to the first space with cut:
cut -d' ' -f1
which removes everything after the first space, including the space itself (I don't know whether that space matters).
Edit: since the OP edited both the input and the expected output, this answer now removes everything after the first space rather than after the second.
With awk you get a more readable solution:
awk 'NR==1{print $1}NR!=1{print}' test.txt
Then you can redirect the output to a new file to save the result:
awk 'NR==1{print $1}NR!=1{print}' test.txt > new_test.txt
EDIT
I originally thought there were multiple files, with just one header per file. For a single file with several headers,
awk '{print $1}' test.txt
would work on your example, as the other lines do not contain spaces.
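The command above relies on the sequence lines containing no spaces. A sketch that drops that assumption by touching only header lines could look like this; the function name and the sample data are hypothetical.

```shell
# Hypothetical helper: print only the first field of header lines
# (those starting with '>') and print every other line unchanged.
first_word_of_headers() {
    awk '/^>/ { print $1; next } { print }' "$@"
}

# Made-up sample with two records; the first sequence line contains a space.
out=$(printf '>g1 extra text\nAC GT\n>g2 more\nTTAA\n' | first_word_of_headers)
printf '%s\n' "$out"
```

The next statement skips the second rule for header lines, so each input line is printed exactly once.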
Perl to the rescue!
perl -pe 's/ .*// if /^>/' -- file.fasta
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 months ago.
I have a text file containing a series of lines that are identical except for the number at the end,
e.g.
lesi-1-1500-1
lesi-1-1500-2
lesi-1-1500-3
How can I remove the last number? It goes up to 250.
To change the file itself:
sed -i 's/[0-9]\+$//' /path/to/file
or
sed 's/[0-9]\+$//' /path/to/file > /path/to/output
You can do it with Awk by breaking it into fields.
echo "lesi-1-1500-2" > foo.txt
echo "lesi-1-1500-3" >> foo.txt
awk -F '-' '{print $1 "-" $2 "-" $3}' foo.txt
The -F switch sets the field delimiter, which here is -. We then print the first three fields, joined with - again.
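The field-based approach assumes exactly four hyphen-separated fields. As an alternative sketch, shell parameter expansion can drop whatever follows the last hyphen regardless of the field count. The function name drop_last_field is made up, and unlike the sed answer this also removes the final hyphen.

```shell
# Hypothetical helper: for each input line, remove the last '-' and
# everything after it. Lines without a '-' are printed unchanged.
drop_last_field() {
    while IFS= read -r line; do
        printf '%s\n' "${line%-*}"
    done
}

trimmed=$(printf 'lesi-1-1500-1\nlesi-1-1500-250\n' | drop_last_field)
printf '%s\n' "$trimmed"
```

The pattern -* in ${line%-*} matches the shortest suffix that starts with a hyphen, so only the final field is removed.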
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 months ago.
I have a string, part of which contains "Log":true, which I would like to remove using bash and sed.
Original line
[...]\"Date\":\"1661731200000\",\"Log\":true,\"$$type\":\"system\",\"created\":\"2022-08-01T13:37:43+0[...]
Modified line
[...]\"Date\":\"1661731200000\",\"$$type\":\"system\",\"created\":\"2022-08-01T13:37:43+0[...]
I'm struggling to find the right expression. Is it possible to achieve this with sed?
Match ,\"Log\": followed by any sequence of alphabetic characters.
sed 's/,\"Log\":[a-z]*//' filename
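The sample line in the question shows a backslash before each quote. If those backslashes are literally present in the file (an assumption; they may only be display escaping), the pattern has to match them as well. A minimal sketch under that assumption, with a made-up function name and a shortened sample line:

```shell
# Assumption: the input really contains a backslash before each quote,
# i.e. the characters  ,\"Log\":true  appear literally in the data.
# In the sed pattern, \\ matches one literal backslash.
remove_log_field() {
    sed 's/,\\"Log\\":[a-z]*//' "$@"
}

# Shortened sample taken from the question; single quotes keep it literal.
sample='\"Date\":\"1661731200000\",\"Log\":true,\"$$type\":\"system\"'
cleaned=$(printf '%s\n' "$sample" | remove_log_field)
printf '%s\n' "$cleaned"
```

If the file contains plain quotes without backslashes, the simpler pattern from the answer above is the one to use.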
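#!/bin/sh
# Write the ed commands: delete line 2, then save and quit.
cat << EOF > edpop
2d
wq
EOF
# Put each comma-separated field on its own line.
tr ',' '\n' < file > file2
ed -s file2 < edpop
# Join the lines with commas again; unlike tr, paste leaves no trailing comma.
paste -sd, file2 > file
rm -v ./file2
rm -v ./edpop
This replaces the commas with newlines, deletes the second line with ed (which corresponds to the second comma-separated field, so it only works when Log is that field), and then joins the lines with commas again.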
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed last year.
I have a file which contains:
Source defaults file; edit that file to configure this script.
AUTOSTART="all"
STATUSREFRESH=10
OMIT_SENDSIGS=0
if test -e /etc/default/openvpn ; then
. /etc/default/openvpn
fi
I want to change the path /etc/default/openvpn on line 5 to /mnt/data/default/openvpn,
and the same on line 6.
I couldn't get it to work with sed -i '5s/etc/default...',
and with awk I can't write the result back to the file.
Does anyone have an idea?
Thank you.
Commands tried:
var1='/etc/default/openvpn'
var2='/mnt/data/default/openvpn'
sed -i '5s/'$var'/'$var2'/' files.txt
sed -i '5s/etc/default/openvpn/mnt/data/default/openvpn/' files.txt
sed -i '5s/'/etc/default/openvpn'/'/mnt/data/default/openvpn'/g' files.txt
awk 'NR==5 { sub("/etc/default/openvpn", "/etc/default/openvpn", $0); print }' files.txt
With awk, I can't save the changes to the file.
The issue here is the delimiter: the slashes in the paths conflict with sed's default delimiter, /.
To resolve this, either change the delimiter to another character that does not appear in your data, or escape each slash in the paths as \/.
Using sed
$ sed -i.bak 's|/etc/default/openvpn|/mnt/data/default/openvpn|' input_file
$ cat input_file
Source defaults file; edit that file to configure this script.
AUTOSTART="all"
STATUSREFRESH=10
OMIT_SENDSIGS=0
if test -e /mnt/data/default/openvpn ; then
. /mnt/data/default/openvpn
fi
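The question also tried to restrict the change to specific lines and to use shell variables. A sketch of that variant follows; var1 and var2 come from the question, while the function name and the shortened sample input are made up. It writes to standard output so the effect is easy to inspect.

```shell
var1='/etc/default/openvpn'
var2='/mnt/data/default/openvpn'

# Hypothetical helper: replace $var1 with $var2 on lines 5 to 6 only.
# Double quotes let the shell expand the variables; '|' is the delimiter,
# so the slashes in the paths need no escaping.
relocate_defaults() {
    sed "5,6s|$var1|$var2|" "$@"
}

# Made-up input: four filler lines, the two target lines, one extra line.
patched=$(printf '%s\n' 'l1' 'l2' 'l3' 'l4' \
    'if test -e /etc/default/openvpn ; then' \
    '. /etc/default/openvpn' \
    '# /etc/default/openvpn stays here' | relocate_defaults)
printf '%s\n' "$patched"
```

Line 7 keeps the old path because it lies outside the 5,6 address range. To edit a file in place, the same expression can be passed to sed together with -i.bak, as in the answer above.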
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
This is from a vhost file. This is the output I get
ServerName uat3-dam-something1.prg-dc.brb.com
Hello,
I'm wondering how to cut this output down so that only this part remains:
something1.prg-dc.brb.com
Keep in mind that "something" could be "something4141411" or "something23", so fixed-length operations won't work. I tried cut and awk, but neither worked. I would be happy to receive tips from the bash experts :)
Like this :
grep -o 'something.*' file
or more specific:
grep -oE 'something[0-9]+\..*' file
Output:
something1.prg-dc.brb.com
Could you please try the following; it was written and tested with the provided samples only.
awk -F'uat3-dam-' '{print $NF}' Input_file
Description: this uses uat3-dam- as the field separator and prints the last field.
2nd solution:
awk 'match($0,/something.*/){print substr($0,RSTART,RLENGTH)}' Input_file
Using:
echo "ServerName uat3-dam-something1.prg-dc.brb.com" |cut -d\- -f3-4
Will return:
something1.prg-dc.brb.com
And if you change the string (as you mention):
echo "ServerName uat3-dam-something111111.prg-dc.brb.com" |cut -d\- -f3-4
It will keep returning:
something111111.prg-dc.brb.com
$ echo 'ServerName uat3-dam-something1.prg-dc.brb.com' | awk -F- '{sub(".*" $2 FS,"")}1'
something1.prg-dc.brb.com
This will work:
echo "ServerName uat3-dam-something1.prg-dc.brb.com" | sed -E 's/.*(something.*)/\1/'
Or, if the string is in a file named file
sed -E 's/.*(something.*)/\1/' file
Explanation:
-E is for extended regex
.*(something.*) means "any char 0 or more times followed by something and any other char 0 or more times".
\1 refers to the part matched inside the parentheses, so only that part is kept.
You could also use:
echo "${test#*dam-}"
Example:
test="ServerName uat3-dam-something1.prg-dc.brb.com"
echo "${test#*dam-}"
which gives:
something1.prg-dc.brb.com
Note that the opposite version, echo "${test%something*}", strips the matching suffix instead and prints ServerName uat3-dam- with nothing after it.
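For comparison, here is a small sketch of the prefix and suffix operators applied to the sample string. The variable names other than test are made up for illustration.

```shell
test="ServerName uat3-dam-something1.prg-dc.brb.com"

host=${test#*dam-}                # '#'  strips the shortest prefix matching '*dam-'
after_last_dash=${test##*-}       # '##' strips the longest prefix matching '*-'
before_name=${test%something*}    # '%'  strips the shortest suffix matching 'something*'

printf '%s\n' "$host" "$after_last_dash" "$before_name"
```

Because ## is greedy, the second expansion removes everything up to the last hyphen and leaves only dc.brb.com, which is why the single # form is the one that fits the question.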
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I am a beginner. I'd like to use the Linux shell to turn the following file
1 2 2
2 3 4
4 5 2
4 2 1
....
into
1,2,2
2,3,4
4,5,2
4,2,1
Thank you very much!
Are you looking for something like this?
sed -e "s/ /,/g" < a.txt
or, perhaps more simply:
tr ' ' ',' <input >output
or, in Vim, you can run the substitution on every line:
:%s/ /,/g
The question asks "line by line". In bash:
while IFS= read -r line; do echo "$line" | sed 's/ /,/g'; done < file
This reads file line by line into the variable line, prints (echo) each line, and pipes (|) it to sed, which changes the spaces into commas. You can add > newfile at the end to store the result in a file (but > file won't work, because the shell truncates file before the loop reads it).
But if all you need is to change characters in the file, processing the whole file at once is easier and probably quicker:
sed -i 's/ /,/g' file
(The -i option modifies the file directly, as opposed to printing the result to stdout.)
Read more about sed to understand its syntax, you'll need it eventually.
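All of the commands above turn every single space into a comma, so a run of several spaces would produce empty fields. If the input might be separated by more than one space (an assumption; the sample in the question uses single spaces), tr with -s is one way to collapse such runs. The function name is made up for illustration.

```shell
# Hypothetical helper: translate spaces to commas and squeeze each run
# of resulting commas into one, so '1  2' becomes '1,2' and not '1,,2'.
spaces_to_commas() {
    tr -s ' ' ','
}

csv=$(printf '1 2  2\n2   3 4\n' | spaces_to_commas)
printf '%s\n' "$csv"
```

Note that -s would also squeeze commas that were already adjacent in the input, so this only suits data that has no empty fields to begin with.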