Add text to specific text line - linux

I'm fairly new to Linux scripting, and I've searched everywhere without finding an answer to my question. I have a file, let's call it config.txt (or config.ini).
My question is: is there any way, with a script, to find some text in the file and, if it is found, do something with that line?
For example:
Search for: 'my/text/mytext'
and add ';' to the beginning of that line,
or even delete the line.

Have you considered looking at tools such as:
awk
sed
perl
python
which all can do this fairly easily.
Awk is probably the slimmest (and thus fastest) of these:
awk '{sub(/root/, "yoda"); print}'
will substitute the first match for regexp root with the string yoda on each line.
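For the original question specifically, sed can both comment out and delete matching lines. A sketch, taking config.txt and the ';' comment marker from the question (the \#...# form uses # as the pattern delimiter, so the slashes in the search text need no escaping):

```shell
# a sample config to operate on
printf 'keep this line\nmy/text/mytext should be flagged\n' > config.txt

# prepend ';' to every line containing the search text
sed '\#my/text/mytext# s/^/;/' config.txt

# or delete those lines entirely (add -i with GNU sed to edit in place)
sed '\#my/text/mytext#d' config.txt
```

Without -i, both commands print the result to stdout and leave the file untouched.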

Since your question is vague and you didn't say what kind of script, and because I'm currently learning Python, I took the time to write a Python script that removes lines in foo.txt containing "mytext". So yes, it is possible, and there are countless other ways to do it as well.
import re

# Open the file and read all the lines into a list
f = open("foo.txt", "r")
lines = []
for line in f:
    lines.append(line)
f.close()

# Write back all the lines that don't match our criteria for removal
f = open("foo.txt", "w")
for line in lines:
    if re.search("mytext", line) is None:
        f.write(line)
f.close()

Related

Is there a faster way to extract lines from a file?

I have a set of files that I need to search through and extract certain lines. Right now, I'm using a for loop but this is proving costly in terms of time. Is there a faster way than the below?
import re

for file in files:
    localfile = open(file, 'r')
    for line in localfile:
        if re.search("Common English Words", line):
            words = line.split("|")[0]
            # Append words to file words.txt
            open("words.txt", "a+").write(words + "\n")
Well, for one thing, you are creating a new file descriptor every time you write to words.txt.
I ran some tests and found that Python's garbage collection does in fact close open file descriptors when they become inaccessible (at least in my test case).
However, creating a file descriptor every time you want to append to a file is costly. For future reference, it is considered good practice to use with blocks for opening files.
TLDR:
One improvement you could make is to open the file you are writing to just once.
Here is what that would look like:
import re

with open("words.txt", "a+") as words_file:
    for file in files:
        localfile = open(file, 'r')
        for line in localfile:
            if re.search("Common English Words", line):
                words = line.split("|")[0]
                # Append words to file words.txt
                words_file.write(words + "\n")
As I said, using with statements when opening files is considered best practice. We can apply it fully like so:
import re

with open("words.txt", "a+") as words_file:
    for file in files:
        with open(file, 'r') as localfile:
            for line in localfile:
                if re.search("Common English Words", line):
                    words = line.split("|")[0]
                    # Append words to file words.txt
                    words_file.write(words + "\n")

Need to create a single record from 3 consecutive lines of text

I can easily write a little parsing program for this task, but I just know that some Linux command-line guru can teach me something new here. I've pulled apart a bunch of files to gather some data, so that I can build a table from it. The data is presently in this format:
.serviceNum=6360,
.transportId=1518,
.suid=6360,
.serviceNum=6361,
.transportId=1518,
.suid=6361,
.serviceNum=6362,
.transportId=1518,
.suid=6362,
.serviceNum=6359,
.transportId=1518,
.suid=6359,
.serviceNum=203,
.transportId=117,
.suid=20203,
.serviceNum=9436,
.transportId=919,
.suid=16294,
.serviceNum=9524,
.transportId=906,
.suid=17613,
.serviceNum=9439,
.transportId=917,
.suid=9439,
What I would like is this:
.serviceNum=6360,.transportId=1518,.suid=6360,
.serviceNum=6361,.transportId=1518,.suid=6361,
.serviceNum=6362,.transportId=1518,.suid=6362,
.serviceNum=6359,.transportId=1518,.suid=6359,
.serviceNum=203,.transportId=117,.suid=20203,
.serviceNum=9436,.transportId=919,.suid=16294,
.serviceNum=9524,.transportId=906,.suid=17613,
.serviceNum=9439,.transportId=917,.suid=9439,
So, the question is: is there a Linux command-line tool that will read through the file and remove the EOL/CR at the end of every 2nd and 3rd line? I've seen old-school Linux gurus do incredible things on the command line, and this is one of those instances where I think it's worth my time to ask. :)
TIA
O
Use paste and see the magic:
paste -d '\0' - - - < inputfile.txt
By default paste joins the lines with tabs; -d '\0' denotes an empty delimiter (not a NUL byte), so each group of three lines is joined with nothing in between, matching the desired output.
Perl to the rescue:
perl -pe 'chomp if $. % 3' < input
-p processes the input line by line printing each line;
chomp removes the final newline;
$. contains the input line number;
% is the modulo operator.
An alternative that accumulates three lines (-l strips the newline from each) and prints them joined after every third:
perl -alne '$a .= $_; if ($. % 3 == 0) { print $a; $a = "" }' filename
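The same idea works as an awk sketch (inputfile.txt is the name assumed above): print each line without its newline, and emit a newline only after every third input line.

```shell
# three sample records, one field per line
printf '.serviceNum=6360,\n.transportId=1518,\n.suid=6360,\n' > inputfile.txt

# print each line with no newline; add a newline only after every 3rd line
awk '{printf "%s%s", $0, (NR % 3 ? "" : "\n")}' inputfile.txt
# -> .serviceNum=6360,.transportId=1518,.suid=6360,
```

Since the input lines already end with commas, plain concatenation gives exactly the requested record format.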

adding an adapter sequence to the end of a fastq file

I have a large fastq file and I want to add the sequence "TTAAGG" to the end of each sequence in my file (the 2nd line then every 4th line after), while still maintaining the fastq file format. For example:
this is the first line I start with:
#HWI-D00449:41:C2H8BACXX:5:1101:1219:2053 1:N:0:
GCAATATCCTTCAACTA
+
FFFHFHGFHAGGIIIII
and I want it to print out:
#HWI-D00449:41:C2H8BACXX:5:1101:1219:2053 1:N:0:
GCAATATCCTTCAACTATTAAGG
+
FFFHFHGFHAGGIIIII
I imagine sed or awk would be good for this, but I haven't been able to find a solution that allows me to keep the fastq format.
I tried:
awk 'NR%4==2 { print $0 "TTAAGG"}' < file_in.fastq > fileout_fastq
which added the TTAAGG to the second line and then every fourth line, but it also deleted the other three lines.
Does anyone have any suggestions for command lines I can use? Or if you know of a package that can do this, please let me know!
Try this with GNU sed:
sed '2~4s/$/TTAAGG/' file
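If GNU sed isn't available, the awk attempt from the question needs only a small change: assign to the line instead of printing it, and add a bare 1 at the end (an always-true pattern whose default action is to print), so every line is printed, modified or not.

```shell
# a one-record sample in fastq layout (header, sequence, '+', qualities)
printf '@read1 1:N:0:\nGCAATATCCTTCAACTA\n+\nFFFHFHGFHAGGIIIII\n' > file_in.fastq

# append TTAAGG to the 2nd line of every 4-line record; 1 prints all lines
awk 'NR % 4 == 2 { $0 = $0 "TTAAGG" } 1' file_in.fastq
```

Redirect to a new file (as in the question's attempt) to save the result.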

How to delete the line that matches a pattern and the line after it with sed?

I have a file that looks something like:
good text
good text
FLAG bad text
bad text
good text
good text
good test
bad Text FLAG bad text
bad text
good text
I need to delete any line containing "FLAG" and I always need to delete the one line immediately following the "FLAG" line too.
"FLAG" lines come irregularly enough that I can't rely on any sort of line number strategy.
Anyone know how to do this with sed?
Using an extension of the GNU version of sed:
sed -e '/FLAG/,+1 d' infile
It yields:
good text
good text
good text
good text
good test
good text
This works, and doesn't depend on any extensions:
sed '/FLAG/{N
d
}' infile
N reads the next line into the pattern space, then d deletes the pattern space.
Here is one way with awk:
awk '/FLAG/{f=1;next}f{f=0;next}1' file
A FLAG line sets the flag f and is skipped; on the following line f is still set, so that line is skipped too and the flag is cleared; the bare 1 at the end prints every other line.
Or, consuming the line after each match with getline:
awk '/FLAG/{getline;next}1' file
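The portable sed version can be checked quickly against a reduced sample from the question:

```shell
# two good lines wrapped around a FLAG line and its follower
printf 'good text\nFLAG bad text\nbad text\ngood text\n' > infile

# on a FLAG line, N pulls in the next line, then d deletes both
sed '/FLAG/{N
d
}' infile
```

Only the two good lines survive. (Edge case: if FLAG appears on the very last line, N has no line to read; GNU sed still deletes it, but strictly POSIX seds may print it before exiting.)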

How to append every third line in Vim?

I'm not at all familiar with Vim but I'm working with large text files (~1G) and my standard text editors weren't cutting it.
My files are currently in this format:
Arbitrary_title_of_sequenceA
SEQ1SEQ1SEQ1SEQ1
SEQ2SEQ2SEQ2SEQ2
Arbitrary_title_of_sequenceB
SEQ1SEQ1SEQ1SEQ1
SEQ2SEQ2SEQ2SEQ2
I need a convenient way of appending the "SEQ2" line to the "SEQ1" line like so:
Arbitrary_title_of_sequenceA
SEQ1SEQ1SEQ1SEQ1SEQ2SEQ2SEQ2SEQ2
Arbitrary_title_of_sequenceB
SEQ1SEQ1SEQ1SEQ1SEQ2SEQ2SEQ2SEQ2
Considering the size of these files, doing each line separately isn't really an option. Any help would be much appreciated!
What about providing a correct sample to begin with?
:g/SEQ1/norm Jx
does what I think you want.
:g/SEQ1 is the :global command which allows you to act on each line containing the pattern SEQ1. See :help :global.
norm is the :normal command that you use to perform a normal mode command, here on every line matched by :g/SEQ1. See :help :normal.
After that comes the normal command in question:
J is used to join the current line with the line below.
x is used to remove the <Space> automatically added by Vim.
:1,$s/\(.*\n\)\(.*\)\n\(.*\n\)/\1\2\3/
1,$ -> range is all file
s/PAT1/PAT2/ -> substitute PAT1 with PAT2
.* -> match any character except new line
\n -> match new line
\(PAT1\) -> capture/remember the string that matched PAT1
\1,\2,\3 -> refers to the captured string for captures in order
Also, using sed instead of Vim should be faster:
sed -i 'n;N;s/\n//' input_file
This can be summarized as:
Read a line.
Read another line, printing the previous one (n).
Read another line and append it to the pattern space (N).
Find the embedded newline and delete it (s/\n//), so the two sequence lines are joined with nothing in between, as in the desired output.
Print the merged lines and start the next cycle.
I think romainl's solution is the best if you have a reliable "SEQ1" pattern you can grab onto. If not and you want to literally join every third line, you could easily do this with a macro:
qqjJxjq
Press G to jump to the last line and see how many lines are in the file, then replay the macro that many times with a count (it doesn't matter that the count is higher than you need). So if the file were 1000 lines, you could do 1000@q. This kind of solution is easy to remember and integrate into your normal workflow.
