How to show the third line of multiple files - Linux

I have a simple question. I am trying to check the 3rd line of multiple files in a folder, so I used this:
head -n 3 MiseqData/result2012/12* | tail -n 1
but this obviously doesn't work, because it only shows the third line of the last file. What I actually want is the third line of every file in the result2012 folder.
Does anyone know how to do that?
Sorry, just one more question: is it also possible to show which file each third line belongs to?
That is, before each third line is shown, can the filename it was extracted from be printed too?
I ask because with head or tail on multiple files, the filename is also shown.
thank you

With Awk, the variable FNR is the number of the "record" (line, by default) in the current file, so you can simply compare it to 3 to print the third line of each input file:
awk 'FNR == 3' MiseqData/result2012/12*
A more optimized version for long files would skip to the next file on match, since you know there's only that one line where the condition is true:
awk 'FNR == 3 { print; nextfile }' MiseqData/result2012/12*
However, not all Awks support nextfile (but it is also not exclusive to GNU Awk).
A more portable variant using your head and tail solution would be a loop in the shell:
for f in MiseqData/result2012/12*; do head -n 3 "$f" | tail -n 1; done
Or with sed (avoiding GNU extensions such as the -s flag):
for f in MiseqData/result2012/12*; do sed '3q;d' "$f"; done
edit: As for the additional question of how to print the name of each file, you need to explicitly print it for each file yourself, e.g.,
awk 'FNR == 3 { print FILENAME ": " $0; nextfile }' MiseqData/result2012/12*
for f in MiseqData/result2012/12*; do
echo -n "$(basename "$f"): "
head -n 3 "$f" | tail -n 1
done
for f in MiseqData/result2012/12*; do
echo -n "$f: "
sed '3q;d' "$f"
done

With GNU sed:
sed -s -n '3p' MiseqData/result2012/12*
or shorter
sed -s '3!d' MiseqData/result2012/12*
From man sed:
-s: consider files as separate rather than as a single continuous long stream.
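To see the difference, try it on two hypothetical throwaway files:
$ seq 1 5 > a.txt; seq 11 15 > b.txt
$ sed -n '3p' a.txt b.txt
3
$ sed -s -n '3p' a.txt b.txt
3
13
Without -s, the files form one continuous stream, so only the third line overall is printed; with -s, the third line of each file is printed.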

You can do this:
awk 'FNR==3' MiseqData/result2012/12*
If you like the file name as well:
awk 'FNR==3 {print FILENAME,$0}' MiseqData/result2012/12*

This might work for you (GNU sed & parallel):
parallel -k sed -n '3p\;3q' {} ::: file1 file2 file3
Parallel applies the sed command to each file and returns the results in order.
N.B. Each file is only read up to the 3rd line.
Also, you may be tempted (as I was) to use:
sed -ns '3p;3q' file1 file2 file3
but this will only return a line from the first file, because q terminates sed entirely, even with -s.

FNR holds the line number within the current file, so this command prints the 3rd line of every file:
awk 'FNR==3' MiseqData/result2012/12*

Related

Get the line count from 2nd line of the file

How do I get the line count of a file starting from its 2nd line, since the first line is a header?
wc -l filename
Is there a way to set such a condition on it?
Use the tail command:
tail -n +2 file | wc -l
-n +2 would print the file starting from line 2
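For example, with a hypothetical file consisting of a header plus two data rows:
$ printf 'header\nrow1\nrow2\n' > sample.txt
$ tail -n +2 sample.txt | wc -l
2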
You can use awk to count from 2nd line onwards:
awk 'NR>1{c++} END {print c}' file
Or simply use NR variable in the END block:
awk 'END {print NR-1}' file
Alternatively using BASH arithmetic subtract 1 from wc output:
echo $(( $(wc -l < file) -1 ))
Delete the first line with sed:
sed '1d' file | wc -l
There is no way to tweak the wc command itself. You should either process the result of the command or use another tool.
As suggested in other answers, if you are running Bash, a good way is to put the result of the command into an arithmetic expression like $(( $(command) - 1 )).
If you are looking for a portable solution, here is a Perl version:
perl -e '1 while <>; print $. - 1' < file
The variable $. holds the number of lines read since a file handle was last closed. The while loop reads all the lines from the file.
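A slightly shorter equivalent lets -n supply the read loop:
perl -ne 'END { print $. - 1 }' file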
Alternatively, you could do the subtraction directly, taking 1 off the total count:
echo $(( $(wc -l < FILE) - 1 ))
Try this one; it will solve your problem:
$ tail -n +2 filename | wc -l

How to generate a UUID for each line in a file using AWK or SED?

I need to append a UUID (newly generated and unique for each line) to each line of a file. I would prefer to use sed or awk for this and take advantage of the uuidgen executable on my Linux box. I cannot figure out how to generate the UUID for each line and append it.
I have tried:
awk '{print system(uuidgen) $1} myfile.csv
sed -i -- 's/^/$(uuidgen)/g' myfile.csv
And many other variations that didn't work. Can this be done with SED or AWK, or should I be investigating another solution that is not shell script based?
Sincerely,
Stephen.
Using bash, this will create a file outfile.txt with a UUID appended to each line:
NOTE: Please run which bash to verify the location of your copy of bash on your system. It may not be located in the same location used in the script below.
#!/usr/local/bin/bash
while IFS= read -r line
do
uuid=$(uuidgen)
echo "$line $uuid" >> outfile.txt
done < myfile.txt
myfile.txt:
john,doe
mary,jane
albert,ellis
bob,glob
fig,newton
outfile.txt
john,doe 46fb31a2-6bc5-4303-9783-85844a4a6583
mary,jane a14bb565-eea0-47cd-a999-90f84cc8e1e5
albert,ellis cfab6e8b-00e7-420b-8fe9-f7655801c91c
bob,glob 63a32fd1-3092-4a72-8c24-7b01c400820c
fig,newton 63d38ad9-5553-46a4-9f24-2e19035cc40d
Just tweaking the syntax on your attempt, something like this should work:
awk '("uuidgen" | getline uuid) > 0 {print uuid, $0} {close("uuidgen")}' myfile.csv
For example:
$ cat file
a
b
c
$ awk '("uuidgen" | getline uuid) > 0 {print uuid, $0} {close("uuidgen")}' file
52a75bc9-e632-4258-bbc6-c944ff51727a a
24c97c41-d0f4-4cc6-b0c9-81b6d89c5b77 b
76de9987-a60f-4e3b-ba5e-ae976ab53c7b c
You might be tempted to drop awk for other shell commands, but beware: the command substitution below is expanded once, before xargs runs, so every line gets the same UUID:
$ xargs -n 1 printf "%s %s\n" $(uuidgen) < file
763ed28c-453f-47f4-9b1b-b2f972b2cc7d a
763ed28c-453f-47f4-9b1b-b2f972b2cc7d b
763ed28c-453f-47f4-9b1b-b2f972b2cc7d c
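If you do want a fresh UUID per line from the shell, fall back to a read loop (as in the first answer), which runs uuidgen once per iteration:
while IFS= read -r line; do printf '%s %s\n' "$(uuidgen)" "$line"; done < file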
Try this (|& getline is a GNU Awk coprocess extension; closing the command each time makes uuidgen run anew for every line):
awk '{ "uuidgen" |& getline u; close("uuidgen"); print u, $1 }' myfile.csv
If you want to append instead of prepend, change the order of the print arguments, as in the sketch below.
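For example, to append rather than prepend:
awk '{ "uuidgen" |& getline u; close("uuidgen"); print $1, u }' myfile.csv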
Using xargs is simpler here:
paste -d " " myfile.csv <(xargs -I {} uuidgen < myfile.csv)
This will call uuidgen once for each line of myfile.csv (-I forces one invocation per input line).
You can use paste and GNU sed:
paste <(sed 's/.*/uuidgen/e' file) file
This uses the GNU sed execute extension e to generate a UUID per line; paste then glues the text back together. Use paste's -d flag to change the delimiter from the default tab to whatever you want.
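For example, to separate the UUID and the line with a comma:
paste -d "," <(sed 's/.*/uuidgen/e' file) file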

How do I copy line X from a bunch of files to another file?

So my problem is as follows:
I have a bunch of files, and I need only the information from a certain line in each of these files (the same line for all files).
Example:
I want the content of line 10 from files example_1.dat through example_10.dat, and then I want to save it with > test.dat.
I tried using: head -n 5 example_*.dat > test.dat. But this gives me all the information from the top down to the chosen line, instead of just that line.
Please help.
$ for f in *.dat ; do sed -n '5p' "$f" >> test.dat ; done
This code will do the following:
For each file f in the directory that ends with .dat:
use sed to print the 5th row of the file and append it to test.dat.
The ">>" appends the row to the end of test.dat, creating the file if it doesn't exist.
Use a combination of head and tail to zoom to the needed line. For example, head -n 5 file | tail -n 1
You can use a for loop to get it done over several files
for f in *.dat ; do head -n 5 "$f" | tail -n 1 >> test.dat ; done
PS: Don't forget to clean the test.dat file (> test.dat) before running the loop. Otherwise you'll get results from previous runs as well.
You can use sed or awk:
sed -n "5p"
awk "NR == 5"
This might work for you (GNU sed):
sed -sn '5w test.dat' example_*.dat

How to check if the sed command replaced some string? [duplicate]

This question already has answers here:
How to check if sed has changed a file
(11 answers)
Closed 7 years ago.
This command replaces the old string with the new one if the one exists.
sed "s/$OLD/$NEW/g" "$source_filename" > $dest_filename
How can I check if the replacement happened ? (or how many times happened ?)
sed is not the right tool if you need to count the substitutions; awk will fit your needs better:
awk -v OLD=foo -v NEW=bar '
($0 ~ OLD) {gsub(OLD, NEW); count++}1
END{print count " substitutions occurred."}
' "$source_filename"
This solution counts only the number of lines where a substitution occurred. The next snippet counts all substitutions, with Perl. It has the advantage of being clearer than awk, and it keeps the syntax of sed substitution:
OLD=foo NEW=bar perl -pe '
$count += s/$ENV{OLD}/$ENV{NEW}/g;
END{print "$count substitutions occured.\n"}
' "$source_filename"
Edit
Thanks to William, who found the $count += s///g trick to count the number of substitutions (even when several occur on the same line)
This awk should count the total number of substitutions instead of the number of lines where substitutions took place:
awk 'END{print t, "substitutions"} {t+=gsub(old,new)}1' old="foo" new="bar" file
If you are free to choose another tool, such as awk (as @sputnick suggested), go with it; awk can count how many times the pattern matched.
sed itself cannot count replacements, particularly if you use the /g flag. However, if you want to stick with sed and still know how many replacements happened, there are possibilities.
One way is:
grep -o 'pattern' file | wc -l && sed 's/pattern/rep/g' file > newfile
You could also do it with tee:
cat file|tee >(grep -o 'pattern'|wc -l)|(sed 's/pattern/replace/g' >newfile)
See this small example:
kent$ cat file
abababababa
aaaaaa
xaxaxa
kent$ cat file|tee >(grep -o 'a'|wc -l)|(sed 's/a/-/g' >newfile)
15
kent$ cat newfile
-b-b-b-b-b-
------
x-x-x-
This worked for me:
awk -v s="OLD" -v c="NEW" '{ count += gsub(s, c) } 1
END { print count " substitutions" }' opfilename

Print a file, skipping the first X lines, in Bash [duplicate]

This question already has answers here:
How can I remove the first line of a text file using bash/sed script?
(19 answers)
Closed 3 years ago.
I have a very long file which I want to print, skipping the first 1,000,000 lines, for example.
I looked into the cat man page, but I did not see any option to do this. I am looking for a command to do this or a simple Bash program.
You'll need tail. Some examples:
$ tail great-big-file.log
< Last 10 lines of great-big-file.log >
If you really need to SKIP a particular number of "first" lines, use
$ tail -n +<N+1> <filename>
< filename, excluding first N lines. >
That is, if you want to skip N lines, you start printing line N+1. Example:
$ tail -n +11 /tmp/myfile
< /tmp/myfile, starting at line 11, or skipping the first 10 lines. >
If you want to just see the last so many lines, omit the "+":
$ tail -n <N> <filename>
< last N lines of file. >
Easiest way I found to remove the first ten lines of a file:
$ sed 1,10d file.txt
In the general case where X is the number of initial lines to delete, credit to commenters and editors for this:
$ sed 1,Xd file.txt
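For this question's million lines, that would be:
$ sed 1,1000000d file.txt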
If you have GNU tail available on your system, you can do the following:
tail -n +1000001 huge-file.log
It's the + character that does what you want. To quote from the man page:
If the first character of K (the number of bytes or lines) is a
`+', print beginning with the Kth item from the start of each file.
Thus, as noted in the comment, putting +1000001 starts printing with the first item after the first 1,000,000 lines.
If you want to skip the first two lines:
tail -n +3 <filename>
If you want to skip the first x lines:
tail -n +$((x+1)) <filename>
A less verbose version with AWK:
awk 'NR > 1e6' myfile.txt
But I would recommend writing the count out as a plain integer, for clarity and maximum portability.
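That is:
awk 'NR > 1000000' myfile.txt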
Use the sed delete command with a range address. For example:
sed 1,100d file.txt # Print file.txt omitting lines 1-100.
Alternatively, if you want to only print a known range, use the print command with the -n flag:
sed -n 201,300p file.txt # Print lines 201-300 from file.txt
This solution should work reliably on all Unix systems, regardless of the presence of GNU utilities.
Use:
sed -n '1d;p'
This command will delete the first line and print the rest.
If you want to see the first 10 lines you can use sed as below:
sed -n '1,10 p' myFile.txt
Or if you want to see lines from 20 to 30 you can use:
sed -n '20,30 p' myFile.txt
Just to propose a sed alternative. :) To skip the first one million lines, try | sed '1,1000000d'.
Example:
$ perl -wle 'print for (1..1_000_005)'|sed '1,1000000d'
1000001
1000002
1000003
1000004
1000005
You can do this using the head and tail commands:
head -n <num> file | tail -n <lines to print>
where num is 1,000,000 plus the number of lines you want to print. Note that this prints a chosen range rather than everything after the skipped lines.
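For example, to print five lines starting right after the first million:
head -n 1000005 myfile.txt | tail -n 5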
This shell script works fine for me:
#!/bin/bash
awk -v initial_line="$1" -v end_line="$2" '{
    if (NR >= initial_line && NR <= end_line)
        print $0
}' "$3"
Used with this sample file (file.txt):
one
two
three
four
five
six
The command (it will extract from second to fourth line in the file):
edu@debian5:~$ ./script.sh 2 4 file.txt
Output of this command:
two
three
four
Of course, you can improve it, for example by testing that all argument values are as expected, as in the sketch below. :-)
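A minimal sketch of such a check (the usage message is just for illustration), placed at the top of the script:
if [ "$#" -ne 3 ] || [ ! -f "$3" ]; then
    echo "usage: $0 initial_line end_line file" >&2
    exit 1
fi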
cat file | awk '{if (NR > 6) print $0}'
I needed to do the same and found this thread.
I tried "tail -n +, but it just printed everything.
The more +lines worked nicely on the prompt, but it turned out it behaved totally different when run in headless mode (cronjob).
I finally wrote this myself:
skip=5
FILE="/tmp/filetoprint"
tail -n $(( $(wc -l < "${FILE}") - skip )) "${FILE}"
