How to use Linux to read a file line by line and replace all the spaces into ','? [closed] - linux

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I am a beginner. I'd like to use the Linux shell to turn the following file
1 2 2
2 3 4
4 5 2
4 2 1
....
into
1,2,2
2,3,4
4,5,2
4,2,1
Thank you very much!

Are you looking for something like this?
sed -e "s/ /,/g" < a.txt
Or, perhaps more simply, like this:
tr ' ' ',' <input >output
Or, in Vim, you can run this substitution over the whole file:
:%s/ /,/g

The question asks for "line by line". In bash:
while IFS= read -r line; do printf '%s\n' "$line" | sed 's/ /,/g'; done < file
This reads file one line at a time into the variable line, prints each line, and pipes (|) it to sed, which changes the spaces into commas. IFS= and -r keep read from trimming whitespace or interpreting backslashes. You can add > newfile at the end if you need to store the result in a file (but > file won't work, because the shell truncates file before the loop reads it).
But if all you need is to change characters in the file, processing the whole file at once is easier and quicker, since the loop starts a new sed process for every line:
sed -i 's/ /,/g' file
(The -i option modifies the file directly instead of printing the result to stdout.)
Read more about sed to understand its syntax; you'll need it eventually.
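If the loop is wanted for its own sake, bash can do the replacement itself without starting sed for each line. A minimal sketch, assuming bash (the ${var//pattern/replacement} expansion is not POSIX sh); the function name spaces_to_commas is made up for illustration:

```shell
#!/bin/bash
# Read stdin line by line and replace every space with a comma.
spaces_to_commas() {
    local line
    # IFS= and -r keep each line intact; the || test also handles a final
    # line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line// /,}"
    done
}

# Demo on two lines from the question.
printf '1 2 2\n2 3 4\n' | spaces_to_commas
```

It could then be used as spaces_to_commas < file > newfile.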

Related

how can I remove some numbers at the end of line in a text file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 months ago.
I have a text file that contains a series of lines that are identical except for the number at the end,
e.g.
lesi-1-1500-1
lesi-1-1500-2
lesi-1-1500-3
How can I remove the last number? It goes up to 250.
To change the file in place:
sed -i 's/[0-9]\+$//' /path/to/file
or, to write the result to a new file:
sed 's/[0-9]\+$//' /path/to/file > /path/to/output
You can do it with awk by breaking each line into fields.
echo "lesi-1-1500-2" > foo.txt
echo "lesi-1-1500-3" >> foo.txt
awk -F '-' '{print $1 "-" $2 "-" $3}' foo.txt
The -F switch sets the field delimiter, which here is -. Then we print the first three fields, joined with - again.
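Note that the sed commands in the first answer leave the trailing hyphen in place (lesi-1-1500-), while the awk version drops it. If the hyphen should go too, a sketch along these lines may help. It uses [0-9][0-9]* instead of the GNU-specific \+, so it should also work with non-GNU sed. The function name strip_numeric_suffix is only illustrative:

```shell
#!/bin/sh
# Remove a trailing "-<digits>" from every line read from stdin or from
# the files given as arguments.
strip_numeric_suffix() {
    sed 's/-[0-9][0-9]*$//' "$@"
}

# Demo on two lines in the format from the question.
printf 'lesi-1-1500-1\nlesi-1-1500-250\n' | strip_numeric_suffix
```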

How can I shorten header in a fasta file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a file that looks like this:
>Gene.10::S0008.1::g.10::m.10 Gene.10::S0008.1::g.10 ORF type:complete len:250 (-),score=22.42 S_0008.1:286-1035(-)
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11 Gene.11::S0009.1::g.10 ORF type:complete len:250 (-),score=22.42 S_0008.1:286-1035(-)
QSAISNDEELNKIMDA
....
I want to delete everything in the header after the first space. How can I do this easily in linux?
Resultant file:
>Gene.10::S0008.1::g.10::m.10
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11
QSAISNDEELNKIMDA
I would use sed:
sed '/^>/s/^>\([^ ]*\) .*/>\1 /'
If a line starts with > then remove everything after the first space. The following:
echo '>Gene.10::S0008.1::g.10::m.10 Gene.10::S0008.1::g.10 ORF type:complete len:250 (-),score=22.42 Sxl_rink_0008.1:286-1035(-)
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11 Gene.11::S0009.1::g.10 ORF type:complete len:250 (-),score=22.42 Sxl_rink_0008.1:286-1035(-)
QSAISNDEELNKIMDA' | sed '/^>/s/^>\([^ ]*\) .*/>\1 /'
outputs:
>Gene.10::S0008.1::g.10::m.10
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11
QSAISNDEELNKIMDA
I don't know whether the single space left after the header matters, so I left it.
If the long sequence lines never contain spaces, you can simply keep the first space-delimited field of every line with cut:
cut -d' ' -f1
which removes everything after the first space, including the space itself (I don't know whether that space matters).
Edit: the OP has since edited both the input and the output, so the answer now removes everything after the first space rather than after the second.
With awk you get a more readable solution:
awk 'NR==1{print $1}NR!=1{print}' test.txt
Then you can redirect the output to a new file to store the result:
awk 'NR==1{print $1}NR!=1{print}' test.txt > new_test.txt
EDIT
I had assumed there were multiple files with just one header each. For a single file with many headers,
awk '{print $1}' test.txt
works on your example, because the other lines do not contain spaces.
Perl to the rescue!
perl -pe 's/ .*// if /^>/' -- file.fasta
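If the sequence lines might contain spaces, awk can also be told to touch only the header lines. A minimal sketch; the function name shorten_headers is hypothetical, and the demo input is an abbreviated, made-up variant of the sample in the question:

```shell
#!/bin/sh
# Print only the first whitespace-delimited field of FASTA header lines
# (those starting with ">") and pass every other line through unchanged.
shorten_headers() {
    awk '/^>/ { print $1; next } { print }' "$@"
}

# Demo: the header is cut at the first space, the sequence line is kept as is.
printf '>Gene.10::m.10 Gene.10 ORF type:complete\nMKGDD FNIIT\n' | shorten_headers
```

A typical call would be shorten_headers file.fasta > short.fasta.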

Remove lines in text file which contain fewer than 4 pipes [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 2 years ago.
I have a text file in which each line holds five fields separated by four | characters.
There are some problem lines in the file. These lines contain fewer than 4 pipes.
The data in the problem rows is not needed and I want to run a command on the file which deletes any line which contains fewer than four pipes. I would also like to know how many lines were deleted afterwards so if this could be printed on the screen once the command is applied that would be ideal.
Sample data:
865|Blue Moon Club|Havana Project|34d|879
899|Soya Plates|Dimsby|78a|699
657|Sherlock
900|Forestry Commission|Eden Project|68d|864
Desired output:
865|Blue Moon Club|Havana Project|34d|879
899|Soya Plates|Dimsby|78a|699
900|Forestry Commission|Eden Project|68d|864
I have tried awk '|>=3' file.txt which didn't work. There is a lot of info out there regarding awk, some of which I found, but there's so much it makes it difficult to find exactly what I want to do due to its sheer volume.
To eliminate the lines:
grep '|.*|.*|.*|' file > newfile
To count the number of bad lines:
grep -cv '|.*|.*|.*|' file
That doesn't do the edit in place; you could do that with sed but it is often safer to do edits like this to a newfile, in order to avoid losing data if you make a mistake.
The first grep pattern matches any line with at least four pipe symbols. (By default, grep uses "basic" regular expressions, in which the alternation operator is written \|, so a plain | is an ordinary character.)
The second invocation counts (-c) the number of non-matching (-v) lines.
Here's a simple sed solution:
sed -n -i.bak '/|.*|.*|.*|/p' file
The -n option turns off automatic printing, so the command only prints the lines that match the pattern. (Again, by default, sed uses basic regexes.) The -i.bak option does the edit in place, creating a backup of the original with the name file.bak.
If you wanted to select lines with exactly four pipes, you could use awk:
awk -F'|' 'NF==5' file > newfile
which sets the field separator to a pipe symbol and then selects the lines with exactly five fields, which are the lines with four pipes.
A useful tool to count lines is wc:
wc -l file
will tell you how many lines are in file; if you count lines in both file and newfile, the difference will obviously be the number of deletions. You could do that computation in awk, too, but it's a bit wordier:
awk -F'|' 'NF==5{print;next}{del+=1}END{print del+0 > "/dev/stderr"}' file > newfile
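The wc-based count described above can be wrapped in a small script. This is only a sketch: the function name filter_pipes and the choice to keep lines with at least four pipes (NF >= 5) are assumptions, not part of the original answer, and the count relies on every line ending in a newline:

```shell
#!/bin/sh
# Copy the lines with at least four pipes from "$1" to "$2" and report
# how many lines were dropped.
filter_pipes() {
    awk -F'|' 'NF >= 5' "$1" > "$2" || return 1
    before=$(wc -l < "$1")
    after=$(wc -l < "$2")
    echo "$((before - after)) line(s) deleted"
}

# Demo on part of the sample data from the question, using temporary files.
src=$(mktemp) && dst=$(mktemp) || exit 1
printf '%s\n' '865|Blue Moon Club|Havana Project|34d|879' \
    '657|Sherlock' \
    '900|Forestry Commission|Eden Project|68d|864' > "$src"
filter_pipes "$src" "$dst"
rm -f "$src" "$dst"
```

Writing to a separate output file keeps the original untouched, in line with the advice above about avoiding in-place edits.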
This will do:
sed -i.bak '/\([^|]*|\)\{4\}/!d' file
Or (as Cyrus's comment)
sed -i.bak -E '/(\|[^\|]*){4}/!d' file
Or
sed -n '/^[^|]*|[^|]*|[^|]*|[^|]*|[^|]*$/p' file > newfile
Or
sed -e '/^[^|]*|[^|]*|[^|]*|[^|]*$/d' \
-e '/^[^|]*|[^|]*|[^|]*$/d' \
-e '/^[^|]*|[^|]*$/d' \
-e '/^[^|]*$/d' \
-i.bak file
None of these reports a line count. To get it, run grep -cv '^[^|]*|[^|]*|[^|]*|[^|]*|[^|]*$' file on the original file, as rici mentioned, or compare the line counts before and after with wc -l file.
Explanation:
The first two sed commands match at least four pipes (never fewer, but possibly more), and the third matches exactly four | (no more, no fewer).
The fourth sed command matches lines with exactly 3, 2, 1, or 0 pipes (|) and deletes them in place, keeping a backup (file.bak) of the original.

Change the path address in a text file by shell scripting [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
In my Bash script, I have to replace a placeholder name in a text file with a path (the new address):
(MYADDRESS) should change to ( /home/run1/c1 ), and the result should be saved as a new file.
What I did: I defined a new variable holding the new address and tried to substitute it for the placeholder in the text file.
I use sed, but it fails.
My script was:
#!/bin/bash
# To debug
set -x
x=`pwd`
echo $x
sed "s/MYADDRESS/$x/g" < sample1.txt > new.txt
exit
The output of pwd is likely to contain / characters, making your sed expression look something like s/MYADDRESS//home/user/somewhere/. This makes it impossible for sed to sort out what should be replaced with what. There are two solutions:
Use a different delimiter for sed:
sed "s,MYADDRESS,$x,g" < sample1.txt > new.txt
...although this has the same problem if the current path contains a comma or another character that is special to sed (such as &), so a more robust approach is to use awk instead:
awk -v curdir="$(pwd)" '{ gsub("MYADDRESS", curdir); print }' < sample1.txt > new.txt
(In gsub's replacement, & still stands for the matched text, and -v interprets backslash escapes, so paths containing & or \ need extra care.)
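If you would rather stay with sed, another option is to escape the characters that are special in a replacement (&, the delimiter /, and \) before substituting. A sketch under that assumption; the helper names are made up for illustration, and paths containing newlines are not handled:

```shell
#!/bin/sh
# Backslash-escape &, / and \ so the text is safe to use in a sed
# replacement that has / as its delimiter.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed 's|[&/\\]|\\&|g'
}

# Replace every MYADDRESS on stdin with the literal text given in $1.
replace_myaddress() {
    escaped=$(escape_sed_replacement "$1")
    sed "s/MYADDRESS/$escaped/g"
}

# Demo with the path from the question.
echo 'cd MYADDRESS' | replace_myaddress '/home/run1/c1'
```

In the original script this would read replace_myaddress "$x" < sample1.txt > new.txt.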

How to read a file backwards on Linux? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I know that I can use cat to print all content from a file from beginning to end on Linux.
Is there a way for doing that backward (last line first)?
Yes, you can use the tac command.
From man tac:
Usage: tac [OPTION]... [FILE]...
Write each FILE to standard output, last line first.
With no FILE, or when FILE is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
  -b, --before             attach the separator before instead of after
  -r, --regex              interpret the separator as a regular expression
  -s, --separator=STRING   use STRING as the separator instead of newline
      --help               display this help and exit
      --version            output version information and exit
sed '1!G;h;$!d' file
sed -n '1!G;h;$p' file
perl -e 'print reverse <>' file
awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file
tac is one way, but it is not installed by default on every Linux system.
awk can do it like this:
awk '{a[NR]=$0}END{for(i=NR;i>=1;i--)print a[i]}' file
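Since tac may be missing, a script can fall back to the awk version when needed. A small sketch; the function name reverse_lines is hypothetical:

```shell
#!/bin/sh
# Print lines in reverse order, using tac when it is available and
# falling back to awk otherwise.
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$@"
    else
        # Buffer every line, then print from the last one back to the first.
        awk '{ a[NR] = $0 } END { for (i = NR; i >= 1; i--) print a[i] }' "$@"
    fi
}

# Demo: reads stdin when no file is given.
printf 'one\ntwo\nthree\n' | reverse_lines
```

The awk fallback holds the whole input in memory, so it is best suited to files of moderate size.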
