How To Delete First X Lines Based On Minimum Lines In File - linux

I have a file with 10,000 lines. Using the following command, I am deleting all lines after line 10,000.
sed -i '10000,$ d' file.txt
However, now I would like to delete the first X lines so that the file has no more than 10,000 lines.
I think it would be something like this:
sed -i '1,$x d' file.txt
Where $x would be the number of lines over 10,000. I'm a little stuck on how to write the if, then part of it. Or, I was thinking I could use the original command and just cat the file in reverse?
For example, if we wanted just 3 lines from the bottom (seems simpler after a few helpful answers):
Input:
Example Line 1
Example Line 2
Example Line 3
Example Line 4
Example Line 5
Expected Output:
Example Line 3
Example Line 4
Example Line 5
Of course, if you know a more efficient way to write the command, I would be open to that too. Your positive input is highly appreciated.

tail can do exactly what you want.
tail -n 10000 file.txt

For simplicity, I would reverse the file, keep the first 10000 lines, then re-reverse the file.
It makes saving the file in-place a touch more complicated
source=file.txt
temp=$(mktemp)
tac "$source" | sed '10000 q' | tac > "$temp" && mv "$temp" "$source"
Without reversing the file, you'd count the number of lines and do some arithmetic:
sed -i "1,$(( $(wc -l < file.txt) - 10000 )) d" file.txt

$ awk -v n=3 '{a[NR%n]=$0} END{for (i=NR+1;i<=(NR+n);i++) print a[i%n]}' file
Example Line 3
Example Line 4
Example Line 5
Add -i inplace if you have GNU awk and want to do "inplace" editing.

To keep the first 10000 lines :
head -n 10000 file.txt
To keep the last 10000 lines :
tail -n 10000 file.txt
Test with your file Example
tail -n 3 file.txt
Example Line 3
Example Line 4
Example Line 5

tac file.txt | sed "$x q" | tac | sponge file.txt
The sponge command is useful here in avoiding an additional temporary file.

tail -10000 <<<"$(cat file.txt)" > file.txt
Okay, not «just» tail, but this way it`s capable of inplace truncation.

Related

how to show the third line of multiple files

I have a simple question. I am trying to check the 3rd line of multiple files in a folder, so I used this:
head -n 3 MiseqData/result2012/12* | tail -n 1
but this doesn't work obviously, because it only shows the third line of the last file. But I actually want to have last line of every file in the result2012 folder.
Does anyone know how to do that?
Also sorry just another questions, is it also possible to show which file the particular third line belongs to?
like before the third line is shown, is it also possible to show the filename of each of the third line extracted from?
because if I used head or tail command, the filename is also shown.
thank you
With Awk, the variable FNR is the number of the "record" (line, by default) in the current file, so you can simply compare it to 3 to print the third line of each input file:
awk 'FNR == 3' MiseqData/result2012/12*
A more optimized version for long files would skip to the next file on match, since you know there's only that one line where the condition is true:
awk 'FNR == 3 { print; nextfile }' MiseqData/result2012/12*
However, not all Awks support nextfile (but it is also not exclusive to GNU Awk).
A more portable variant using your head and tail solution would be a loop in the shell:
for f in MiseqData/result2012/12*; do head -n 3 "$f" | tail -n 1; done
Or with sed (without GNU extensions, i.e., the -s argument):
for f in MiseqData/result2012/12*; do sed '3q;d' "$f"; done
edit: As for the additional question of how to print the name of each file, you need to explicitly print it for each file yourself, e.g.,
awk 'FNR == 3 { print FILENAME ": " $0; nextfile }' MiseqData/result2012/12*
for f in MiseqData/result2012/12*; do
echo -n `basename "$f"`': '
head -n 3 "$f" | tail -n 1
done
for f in MiseqData/result2012/12*; do
echo -n "$f: "
sed '3q;d' "$f"
done
With GNU sed:
sed -s -n '3p' MiseqData/result2012/12*
or shorter
sed -s '3!d' MiseqData/result2012/12*
From man sed:
-s: consider files as separate rather than as a single continuous long stream.
You can do this:
awk 'FNR==3' MiseqData/result2012/12*
If you like the file name as well:
awk 'FNR==3 {print FILENAME,$0}' MiseqData/result2012/12*
This might work for you (GNU sed & parallel):
parallel -k sed -n '3p\;3q' {} ::: file1 file2 file3
Parallel applies the sed command to each file and returns the results in order.
N.B. All files will only be read upto the 3rd line.
Also,you may be tempted (as I was) to use:
sed -ns '3p;3q' file1 file2 file3
but this will only return the first file.
Hi bro I am answering this question as we know FNR is used to check no of lines so we can run this command to get 3rd line of every file.
awk 'FNR==3' MiseqData/result2012/12*

How do i copy every line X line from a bunch of files to another file?

So my problem is as follows:
I have a bunch of files and i need only the information from a certain line in each of these files (the same line for all files).
Example:
I want the content of the line 10 from file example_1.dat~example_10.dat and then i want to save it on > test.dat
I tried using: head -n 5 example_*.dat > test.dat. But this gives me all the information from the top till the line i have chosen instead of just the line.
Please help.
$ for f in *.dat ; do sed -n '5p' $f >> test.dat ; done
This code will do the following:
Foreach file f in the directory that ends with .dat.
Use sed on the 5:th row in file and write to test.dat.
The ">>" will add the row at the bottom of the file if existing.
Use a combination of head and tail to zoom to the needed line. For example, head -n 5 file | tail -n 1
You can use a for loop to get it done over several files
for f in *.dat ; do head -n 5 $f | tail -n 1 >> test.dat ; done
PS: Don't forget to clean the test.dat file (> test.dat) before running the loop. Otherwise you'll get results from previous runs as well.
You can use sed or awk:
sed -n "5p"
awk "NR == 5"
This might work for you (GNU sed):
sed -sn '5wtest.dat' example_*.dat

Find line number in a text file - without opening the file

In a very large file I need to find the position (line number) of a string, then extract the 2 lines above and below that string.
To do this right now - I launch vi, find the string, note it's line number, exit vi, then use sed to extract the lines surrounding that string.
Is there a way to streamline this process... ideally without having to run vi at all.
Maybe using grep like this:
grep -n -2 your_searched_for_string your_large_text_file
Will give you almost what you expect
-n : tells grep to print the line number
-2 : print 2 additional lines (and the wanted string, of course)
You can do
grep -C 2 yourSearch yourFile
To send it in a file, do
grep -C 2 yourSearch yourFile > result.txt
Use grep -n string file to find the line number without opening the file.
you can use cat -n to display the line numbers and then use awk to get the line number after a grep in order to extract line number:
cat -n FILE | grep WORD | awk '{print $1;}'
although grep already does what you mention if you give -C 2 (above/below 2 lines):
grep -C 2 WORD FILE
You can do it with grep -A and -B options, like this:
grep -B 2 -A 2 "searchstring" | sed 3d
grep will find the line and show two lines of context before and after, later remove the third one with sed.
If you want to automate this, simple you can do a Shell Script. You may try the following:
#!/bin/bash
VAL="your_search_keyword"
NUM1=`grep -n "$VAL" file.txt | cut -f1 -d ':'`
echo $NUM1 #show the line number of the matched keyword
MYNUMUP=$["NUM1"-1] #get above keyword
MYNUMDOWN=$["NUM1"+1] #get below keyword
sed -n "$MYNUMUP"p file.txt #display above keyword
sed -n "$MYNUMDOWN"p file.txt #display below keyword
The plus point of the script is you can change the keyword in VAL variable as you like and execute to get the needed output.

How to get ONLY Second line with SED

How can I get second line in a file using SED
#SRR005108.1 :3:1:643:216
GATTTCTGGCCCGCCGCTCGATAATACAGTAATTCC
+
IIIIII/III*IIIIIIIIII+IIIII;IIAIII%>
With the data that looks like above I want only to get
GATTTCTGGCCCGCCGCTCGATAATACAGTAATTCC
You don't really need Sed, but if the pourpose is to learn... you can use -n
n read the next input line and starts processing the newline with the command rather than the first command
sed -n 2p somefile.txt
Edit: You can also improve the performance using the tip that manatwork mentions in his comment:
sed -n '2{p;q}' somefile.txt
You always want the second line of a file? No need for SED:
head -2 file | tail -1
This will print the second line of every file:
awk 'FNR==2'
and this one only the second line of the first file:
awk 'NR==2'
This might work for you:
sed '2q;d' file
cat your_file | head -2 | tail -1

Print a file, skipping the first X lines, in Bash [duplicate]

This question already has answers here:
How can I remove the first line of a text file using bash/sed script?
(19 answers)
Closed 3 years ago.
I have a very long file which I want to print, skipping the first 1,000,000 lines, for example.
I looked into the cat man page, but I did not see any option to do this. I am looking for a command to do this or a simple Bash program.
You'll need tail. Some examples:
$ tail great-big-file.log
< Last 10 lines of great-big-file.log >
If you really need to SKIP a particular number of "first" lines, use
$ tail -n +<N+1> <filename>
< filename, excluding first N lines. >
That is, if you want to skip N lines, you start printing line N+1. Example:
$ tail -n +11 /tmp/myfile
< /tmp/myfile, starting at line 11, or skipping the first 10 lines. >
If you want to just see the last so many lines, omit the "+":
$ tail -n <N> <filename>
< last N lines of file. >
Easiest way I found to remove the first ten lines of a file:
$ sed 1,10d file.txt
In the general case where X is the number of initial lines to delete, credit to commenters and editors for this:
$ sed 1,Xd file.txt
If you have GNU tail available on your system, you can do the following:
tail -n +1000001 huge-file.log
It's the + character that does what you want. To quote from the man page:
If the first character of K (the number of bytes or lines) is a
`+', print beginning with the Kth item from the start of each file.
Thus, as noted in the comment, putting +1000001 starts printing with the first item after the first 1,000,000 lines.
If you want to skip first two line:
tail -n +3 <filename>
If you want to skip first x line:
tail -n +$((x+1)) <filename>
A less verbose version with AWK:
awk 'NR > 1e6' myfile.txt
But I would recommend using integer numbers.
Use the sed delete command with a range address. For example:
sed 1,100d file.txt # Print file.txt omitting lines 1-100.
Alternatively, if you want to only print a known range, use the print command with the -n flag:
sed -n 201,300p file.txt # Print lines 201-300 from file.txt
This solution should work reliably on all Unix systems, regardless of the presence of GNU utilities.
Use:
sed -n '1d;p'
This command will delete the first line and print the rest.
If you want to see the first 10 lines you can use sed as below:
sed -n '1,10 p' myFile.txt
Or if you want to see lines from 20 to 30 you can use:
sed -n '20,30 p' myFile.txt
Just to propose a sed alternative. :) To skip first one million lines, try |sed '1,1000000d'.
Example:
$ perl -wle 'print for (1..1_000_005)'|sed '1,1000000d'
1000001
1000002
1000003
1000004
1000005
You can do this using the head and tail commands:
head -n <num> | tail -n <lines to print>
where num is 1e6 + the number of lines you want to print.
This shell script works fine for me:
#!/bin/bash
awk -v initial_line=$1 -v end_line=$2 '{
if (NR >= initial_line && NR <= end_line)
print $0
}' $3
Used with this sample file (file.txt):
one
two
three
four
five
six
The command (it will extract from second to fourth line in the file):
edu#debian5:~$./script.sh 2 4 file.txt
Output of this command:
two
three
four
Of course, you can improve it, for example by testing that all argument values are the expected :-)
cat < File > | awk '{if(NR > 6) print $0}'
I needed to do the same and found this thread.
I tried "tail -n +, but it just printed everything.
The more +lines worked nicely on the prompt, but it turned out it behaved totally different when run in headless mode (cronjob).
I finally wrote this myself:
skip=5
FILE="/tmp/filetoprint"
tail -n$((`cat "${FILE}" | wc -l` - skip)) "${FILE}"

Resources