Edit text in a file with a script - Linux

I have a file called flw.py and would like to write a bash script that will replace some text in the file (take out the last two lines and add two new lines). I apologize if this seems like a stupid question. A thorough explanation would be appreciated since I am still learning to script. Thanks!

head -n -2 flw.py > tmp # (1)
echo "your first new line here..." >> tmp # (2)
echo "your second new line here...." >> tmp #
mv tmp flw.py # (3)
Explanation:
head normally prints out the first ten lines of a file. The -n argument changes the number of lines printed: to print the first 15 lines you would use head -n 15. A negative number means the opposite: print all lines except the last N (note this is a GNU head extension). Which happens to be exactly what you want: head -n -2
Then we redirect the output of our head command to a temporary file named tmp. > does the redirecting magic here. tmp now contains everything of flw.py but the last two lines.
Next we add the two new lines by using the echo command. We append the output of the echo "your first new line here..." to our tmp file. >> appends to an existing file, whereas > will overwrite an existing file.
We do the same thing for the second line we want to append.
Last, we move the tmp file to flw.py and the job is done.
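Putting the three steps together, here is a minimal self-contained sketch (it first creates a sample flw.py so you can run it anywhere; negative -n counts require GNU head):

```shell
# demo setup: a sample flw.py with four lines
printf '%s\n' 'print(1)' 'print(2)' 'old line A' 'old line B' > flw.py

head -n -2 flw.py > tmp                         # all but the last two lines (GNU head)
printf '%s\n' 'new line 1' 'new line 2' >> tmp  # append the two replacement lines
mv tmp flw.py                                   # swap the result into place
```

After this runs, flw.py contains the first two original lines followed by the two new ones.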

You can use a single sed command to get the result you expect:
sed -n 'N;$!P;$!D;a\line1\nline2' fly.py
Example:
cat fly.py
1
2
3
4
5
sed -n 'N;$!P;$!D;a\line1\nline2' fly.py
Output :
1
2
3
line1
line2
Note: add the -i option to update the file in place.

Creating 3 column TAB file using name of files in directory

I have over 100 files in a directory with format xxx_1_sequence.fastq.gz and xxx_2_sequence.fastq.gz
The goal is to create a TAB file with 3 columns in this format:
xxx ---> xxx_1_sequence.fastq.gz ---> xxx_2_sequence.fastq.gz
where ---> is a tab.
I was thinking of creating a for loop or maybe using string manipulation in order to achieve this. My knowledge is rudimentary at this stage, so any help would be much appreciated.
Would you please try the following:
shopt -s extglob                 # enable extended pattern matching
suffix="sequence.fastq.gz"
for f in !(*"$suffix"); do       # files that do not match the pattern
    if [[ -f ${f}_1_$suffix && -f ${f}_2_$suffix ]]; then
        # check that both files exist, just in case
        printf "%s\t%s\t%s\n" "$f" "${f}_1_$suffix" "${f}_2_$suffix"
    fi
done
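A variant that avoids extglob is to loop over the _1_ files and derive the prefix with parameter expansion. This is a sketch, assuming the files live in the current directory (the touch lines just create a demo pair):

```shell
suffix="sequence.fastq.gz"
# demo setup: one matching pair of files
touch "aaa_1_$suffix" "aaa_2_$suffix"

for f in *_1_"$suffix"; do
    p=${f%_1_$suffix}                      # strip the trailing _1_sequence.fastq.gz
    [ -f "${p}_2_$suffix" ] || continue    # require the matching _2_ file
    printf '%s\t%s\t%s\n' "$p" "$f" "${p}_2_$suffix"
done
```

This prints one tab-separated line per pair, e.g. aaa, aaa_1_sequence.fastq.gz, aaa_2_sequence.fastq.gz.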
If your files are in a directory called files:
paste -d '\t' \
<(printf "%s\n" files/*_1_sequence.fastq.gz | sort) \
<(printf "%s\n" files/*_2_sequence.fastq.gz | sort) \
| sed 's/\(.*\)_1_sequence.fastq.gz/\1\t\1_1_sequence.fastq.gz/' \
> out.tsv
Explanation:
printf "%s\n" will print every argument in a new line. So:
printf "%s\n" files/*_1_sequence.fastq.gz | sort
prints a sorted list of the first type of files (the second column in your output). And of course it's symmetrical with *_2_sequence.fastq.gz (the third column).
(We probably don't need the sort part, but it helps clarify the intention.)
The syntax <(some shell command) runs some shell command and passes its output as if it were a file argument (on Linux it shows up as a /dev/fd entry backed by a pipe, not a regular temporary file). You can see the substituted file name like so:
$ echo <(echo a) <(echo b)
/dev/fd/63 /dev/fd/62
So we are passing 2 such file arguments to paste. If each input has N lines, then paste outputs N lines, where line number K is the concatenation of line K of each input, in order.
For example, if line 4 of the first file is hello and line 4 of the second file is world, paste will have hello\tworld as line 4 of the output. But instead of trusting the default, we're setting the delimiter to TAB explicitly with -d '\t'.
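A tiny demonstration of paste with an explicit tab delimiter (f1 and f2 are hypothetical demo files created on the spot):

```shell
printf '%s\n' a b > f1        # first column: a, b
printf '%s\n' x y > f2        # second column: x, y
paste -d '\t' f1 f2           # joins line K of f1 with line K of f2
```

This prints "a", a tab, "x" on the first line, and "b", a tab, "y" on the second.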
That gives us the last 2 columns of our tab-separated-values file, but the first column is the * part of *_1_sequence.fastq.gz, which is where sed comes in.
We tell sed to replace \(.*\)_1_sequence.fastq.gz with \1\t\1_1_sequence.fastq.gz. .* will match anything, and \(some-pattern\) tells sed to remember the text that matched the pattern.
The text captured by the first \(...\) group in sed's regex can be read back into the replacement as \1, which is why we have \1_1_sequence.fastq.gz in the replacement pattern.
But now we can also use \1 to create the first column of our tsv, which is why we have \1\t.
Thank you for the help, guys - I was thrown into a coding position a week ago with no prior experience and have been struggling.
I ended up with this:
printf "%s\n" *_1_sequence.fastq.gz | sort | sed 's/\(.*\)_1_sequence.fastq.gz/\1\t\1_1_sequence.fastq.gz\t\1_2_sequence.fastq.gz/' > NULLARBORformat.tab
and it does the job perfectly!

How to copy data from file to another file starting from specific line

I have two files data.txt and results.txt, assuming there are 5 lines in data.txt, I want to copy all these lines and paste them in file results.txt starting from the line number 4.
Here is a sample below:
Data.txt file:
stack
ping
dns
ip
remote
Results.txt file:
# here are some text
# please do not edit these lines
# blah blah..
this is the 4th line that data should go on.
I've tried sed with various combinations but I couldn't make it work, I'm not sure if it fit for that purpose as well.
sed -n '4p' /path/to/file/data.txt > /path/to/file/results.txt
The above code copies line 4 only. That isn't what I'm trying to achieve. As I said above, I need to copy all lines from data.txt and paste them in results.txt but it has to start from line 4 without modifying or overriding the first 3 lines.
Any help is greatly appreciated.
EDIT:
I want to override the copied data starting from line number 4 in
the file results.txt. So, I want to leave the first 3 lines without
modifications and override the rest of the file with the data copied
from data.txt file.
Here's a way that works well from cron. Less chance of losing data or corrupting the file:
# preserve first lines of results
head -3 results.txt > results.TMP
# append new data
cat data.txt >> results.TMP
# rename output file atomically in case of system crash
mv results.TMP results.txt
You can use process substitution to give cat a fifo it can read from, but beware: the > results.txt redirection truncates the file as the command starts, so head races against the truncation and may see an already-empty file. Prefer the temporary-file approach above for anything important:
cat <(head -3 results.txt) data.txt > results.txt
head -n 3 /path/to/file/results.txt > /path/to/file/results.tmp
cat /path/to/file/data.txt >> /path/to/file/results.tmp
mv /path/to/file/results.tmp /path/to/file/results.txt
(Redirecting head's output straight back onto results.txt would truncate the file before head gets to read it, hence the temporary file.)
if you can use awk:
awk 'NR!=FNR || NR<4' results.txt data.txt
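Since awk writes to stdout, redirect to a temporary file and move it back over the original. A runnable sketch (the printf lines just fabricate demo input; note results.txt must come first, so that NR==FNR holds while its lines are read):

```shell
# demo setup: a results file with 3 header lines, and new data
printf '%s\n' '# h1' '# h2' '# h3' 'old1' 'old2' > results.txt
printf '%s\n' 'stack' 'ping' > data.txt

# keep lines 1-3 of the first file, then print everything from the second
awk 'NR!=FNR || NR<4' results.txt data.txt > results.tmp && mv results.tmp results.txt
```

Afterwards results.txt holds its original first three lines followed by all of data.txt.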

Linux command to grab lines similar between files

I have one file that has one word per line.
I have a second file that has many words per line.
I would like to go through each line in the first file, and all lines for which it is found in the second file, I would like to copy those lines from the second file into a new third file.
Is there a way to do this simply with Linux command?
Edit: Thanks for the input. But, I should specify better:
The first file is just a list of numbers (one number per line).
463463
43454
33634
The second file is very messy, and I am only looking for that number string to be in lines in any way (not necessary an individual word). So, for instance
ewjleji jejeti ciwlt 463463.52%
would return a hit. I think what was suggested to me does not work in this case (please forgive my having to edit for not being detailed enough)
If n is the number of lines in your first file and m is the number of lines in your second file, then you can solve this problem in O(nm) time in the following way:
while read word; do
    grep "$word" secondfile >> thirdfile
done < firstfile
If you need something faster, grep itself can read all of the search strings from a file in a single pass via its -f option (add -F to treat them as fixed strings rather than regexes).
As for your edit, this method does work the way you describe.
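The single-pass grep version looks like this. The -f option reads one pattern per line from the first file, and -F matches them as literal strings, which also handles a number like 463463 appearing inside a longer token such as 463463.52% (the printf lines create hypothetical demo files):

```shell
# demo setup: a pattern file and a messy data file
printf '%s\n' 463463 43454 > firstfile
printf '%s\n' 'ewjleji jejeti ciwlt 463463.52%' 'no match here' > secondfile

# -f firstfile: take every line of firstfile as a pattern
# -F: treat the patterns as fixed strings, not regexes
grep -F -f firstfile secondfile > thirdfile
```

Here thirdfile ends up containing only the line with 463463 in it.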
Here is a short script that will do it. it will take 3 command line arguments 1- file with 1 word per line, 2- file with many lines you want to match for each word in file1 and 3- your output file:
#!/bin/bash
## test input and show usage on error
test -n "$1" && test -n "$2" && test -n "$3" || {
    printf "Error: insufficient input, usage: %s file1 file2 file3\n" "${0//*\//}"
    exit 1
}

while read line || test -n "$line"; do
    grep "$line" "$2" 1>>"$3" 2>/dev/null
done <"$1"
example:
$ cat words.txt
me
you
them
$ cat lines.txt
This line is for me
another line for me
maybe another for me
one for you
another for you
some for them
another for them
here is one that doesn't match any
$ bash ../lines.sh words.txt lines.txt outfile.txt
$ cat outfile.txt
This line is for me
another line for me
maybe another for me
some for them
one for you
another for you
some for them
another for them
(yes, I know that me also matches some in the example file, but that's not really the point.)

Sed command to reverse print a file skips last line

I am trying to use sed to reverse print a file (I know it can be done in myriads of other ways, like tac, for example). This is what I got so far:
sed -n -r '/.+/{x;H;$p}' nums
The file nums contains a number on each line, from 1 to 25. The logic is this: through x;, each number is swapped into the hold space while the previous contents of the hold space move into the pattern space. So now the hold space contains the current number, while the pattern space contains the numbers seen so far in reverse order. Then H; appends that reversed list back to the hold space, so that the latest number always stays on top. Finally, when the last line is reached, this is done one last time and the result is printed out through $p.
However, the command outputs only the numbers 24 to 1 in reverse order. The first line should be 25, but somehow that line is being skipped.
Can anyone tell me what is the problem with this code?
You're missing one last x. You swap the contents of the last line into the hold space, but you never pull it back out before printing.
sed -n -r -e '/.+/{x;H}' -e '${x;p}' nums
You also get an extra blank line because you really don't want to do the 'H' on the first line, since that preserves the initial blank contents of the hold space. I would do this:
sed -n -r -e '/.+/x' -e '2,$H' -e '${x;p}' nums
This sed should do:
sed '1!G;h;$!d' file
or this:
sed -n '1!G;h;$p' file
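A quick check of the first one-liner on a three-line demo file: 1!G appends the hold space after every line but the first, h saves the accumulated (reversed) text back to the hold space, and $!d suppresses output until the last line.

```shell
printf '%s\n' 1 2 3 > nums   # demo input: one number per line
sed '1!G;h;$!d' nums
# 3
# 2
# 1
```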

How to copy the first few lines of a giant file, and add a line of text at the end of it using some Linux commands?

How do I copy the first few lines of a giant file and add a line of text at the end of it, using some Linux commands?
The head command can get the first n lines. Variations are:
head -7 file
head -n 7 file
head -7l file
which will get the first 7 lines of the file called "file". The command to use depends on your version of head. Linux will work with the first one.
To append lines to the end of the same file, use:
echo 'first line to add' >> file
echo 'second line to add' >> file
echo 'third line to add' >> file
or:
echo 'first line to add
second line to add
third line to add' >> file
to do it in one hit.
So, tying these two ideas together, if you wanted to get the first 10 lines of the input.txt file to output.txt and append a line with five "=" characters, you could use something like:
( head -10 input.txt ; echo '=====' ) > output.txt
In this case, we do both operations in a sub-shell so as to consolidate the output streams into one, which is then used to create or overwrite the output file.
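The same grouping works with a { ...; } compound command, which runs in the current shell rather than spawning a subshell; note the required spaces around the braces and the semicolon before the closing brace (input.txt here is a fabricated demo file):

```shell
printf '%s\n' a b c > input.txt            # demo input: three lines
{ head -2 input.txt ; echo '====='; } > output.txt
cat output.txt
# a
# b
# =====
```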
I am assuming what you are trying to achieve is to insert a line after the first few lines of a text file.
head -n 10 file.txt > newfile.txt
echo "your line" >> newfile.txt
tail -n +11 file.txt >> newfile.txt
Note that tail -n +11 starts output at line 11, so line 10 is not duplicated. If you don't want the rest of the lines from the file, just skip the tail part.
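A small runnable demonstration of this insert-in-the-middle approach on a four-line file (the file names are hypothetical; the offsets here keep the first two lines, so tail starts at line 3):

```shell
printf '%s\n' l1 l2 l3 l4 > file.txt   # demo input
head -n 2 file.txt > newfile.txt       # first two lines
echo "inserted line" >> newfile.txt    # the new line
tail -n +3 file.txt >> newfile.txt     # the rest, starting at line 3
cat newfile.txt
# l1
# l2
# inserted line
# l3
# l4
```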
First few lines: see man head.
Append lines: use the >> redirection operator in Bash:
echo 'This goes at the end of the file' >> file
sed -n '1,10p' filename > newfile
echo 'This goes at the end of the file' >> newfile
