How do I copy line X from each of a bunch of files to another file? - Linux

So my problem is as follows:
I have a bunch of files and I need only the information from a certain line in each of these files (the same line for all files).
Example:
I want the content of line 10 from each of the files example_1.dat~example_10.dat, and then I want to save it to test.dat.
I tried using: head -n 5 example_*.dat > test.dat. But this gives me all the information from the top down to the line I have chosen, instead of just that line.
Please help.

$ for f in *.dat ; do sed -n '5p' $f >> test.dat ; done
This code will do the following:
For each file f in the directory whose name ends with .dat,
use sed to print only the 5th line of the file and append it to test.dat.
The ">>" appends the line at the bottom of test.dat, creating the file if it does not yet exist.

Use a combination of head and tail to zoom in on the needed line. For example, head -n 5 file | tail -n 1 prints only line 5.
You can use a for loop to apply it to several files:
for f in *.dat ; do head -n 5 "$f" | tail -n 1 >> test.dat ; done
PS: Don't forget to empty the test.dat file (> test.dat) before running the loop. Otherwise you'll get results from previous runs as well.
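Alternatively, redirect the output of the whole loop once, so the file is recreated on every run and no manual cleanup is needed (a sketch):
for f in *.dat ; do head -n 5 "$f" | tail -n 1 ; done > test.dat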

You can use sed or awk:
sed -n "5p"
awk "NR == 5"

This might work for you (GNU sed):
sed -sn '5wtest.dat' example_*.dat
The -s option treats each input file separately (so address 5 means line 5 of every file), and the w command writes each matching line to test.dat.

Related

how to show the third line of multiple files

I have a simple question. I am trying to check the 3rd line of multiple files in a folder, so I used this:
head -n 3 MiseqData/result2012/12* | tail -n 1
but this obviously doesn't work, because it only shows the third line of the last file. What I actually want is the third line of every file in the result2012 folder.
Does anyone know how to do that?
Also, sorry, just another question: is it also possible to show which file each third line belongs to?
That is, before a third line is shown, can the name of the file it was extracted from be printed as well?
Because when I use the head or tail command, the filename is also shown.
Thank you
With Awk, the variable FNR is the number of the "record" (line, by default) in the current file, so you can simply compare it to 3 to print the third line of each input file:
awk 'FNR == 3' MiseqData/result2012/12*
A more optimized version for long files would skip to the next file on match, since you know there's only that one line where the condition is true:
awk 'FNR == 3 { print; nextfile }' MiseqData/result2012/12*
However, not all Awks support nextfile (but it is also not exclusive to GNU Awk).
A more portable variant using your head and tail solution would be a loop in the shell:
for f in MiseqData/result2012/12*; do head -n 3 "$f" | tail -n 1; done
Or with sed (without GNU extensions, i.e., the -s argument):
for f in MiseqData/result2012/12*; do sed '3q;d' "$f"; done
edit: As for the additional question of how to print the name of each file, you need to explicitly print it for each file yourself, e.g.,
awk 'FNR == 3 { print FILENAME ": " $0; nextfile }' MiseqData/result2012/12*
for f in MiseqData/result2012/12*; do
echo -n `basename "$f"`': '
head -n 3 "$f" | tail -n 1
done
for f in MiseqData/result2012/12*; do
echo -n "$f: "
sed '3q;d' "$f"
done
With GNU sed:
sed -s -n '3p' MiseqData/result2012/12*
or shorter
sed -s '3!d' MiseqData/result2012/12*
From man sed:
-s: consider files as separate rather than as a single continuous long stream.
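If you also want the file name, recent GNU sed can print it with the F command (a sketch; the name is printed on its own line before the matching line):
sed -s -n '3{F;p}' MiseqData/result2012/12*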
You can do this:
awk 'FNR==3' MiseqData/result2012/12*
If you like the file name as well:
awk 'FNR==3 {print FILENAME,$0}' MiseqData/result2012/12*
This might work for you (GNU sed & parallel):
parallel -k sed -n '3p\;3q' {} ::: file1 file2 file3
Parallel applies the sed command to each file and returns the results in order.
N.B. Each file will only be read up to its 3rd line.
Also, you may be tempted (as I was) to use:
sed -ns '3p;3q' file1 file2 file3
but this will only return the result from the first file, because the q command terminates sed entirely after the first file's third line.
FNR in awk holds the line number within the current file, so comparing it to 3 prints the 3rd line of every file:
awk 'FNR==3' MiseqData/result2012/12*

How To Delete First X Lines Based On Minimum Lines In File

I have a file with 10,000 lines. Using the following command, I am deleting all lines after line 10,000.
sed -i '10000,$ d' file.txt
However, now I would like to delete the first X lines so that the file has no more than 10,000 lines.
I think it would be something like this:
sed -i '1,$x d' file.txt
Where $x would be the number of lines over 10,000. I'm a little stuck on how to write the if, then part of it. Or, I was thinking I could use the original command and just cat the file in reverse?
For example, if we wanted just 3 lines from the bottom (seems simpler after a few helpful answers):
Input:
Example Line 1
Example Line 2
Example Line 3
Example Line 4
Example Line 5
Expected Output:
Example Line 3
Example Line 4
Example Line 5
Of course, if you know a more efficient way to write the command, I would be open to that too. Your positive input is highly appreciated.
tail can do exactly what you want.
tail -n 10000 file.txt
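Since the question edits the file in place, note that you cannot redirect tail straight back into the same file (the redirection would truncate it before tail reads it). A sketch using a temporary file:
tail -n 10000 file.txt > file.txt.tmp && mv file.txt.tmp file.txt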
For simplicity, I would reverse the file, keep the first 10000 lines, then re-reverse the file.
It makes saving the file in-place a touch more complicated
source=file.txt
temp=$(mktemp)
tac "$source" | sed '10000 q' | tac > "$temp" && mv "$temp" "$source"
Without reversing the file, you'd count the number of lines and do some arithmetic:
sed -i "1,$(( $(wc -l < file.txt) - 10000 )) d" file.txt
$ awk -v n=3 '{a[NR%n]=$0} END{for (i=NR+1;i<=(NR+n);i++) print a[i%n]}' file
Example Line 3
Example Line 4
Example Line 5
Add -i inplace if you have GNU awk and want to do "inplace" editing.
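The same logic spelled out with comments (a sketch; it keeps a rolling buffer of the last n lines and replays it in order at the end):
awk -v n=3 '
    { a[NR % n] = $0 }                      # remember this line, overwriting the one n lines older
    END {                                   # after the last line, replay the buffer in original order
        for (i = NR + 1; i <= NR + n; i++)
            print a[i % n]
    }' file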
To keep the first 10000 lines :
head -n 10000 file.txt
To keep the last 10000 lines :
tail -n 10000 file.txt
Testing with your example file:
tail -n 3 file.txt
Example Line 3
Example Line 4
Example Line 5
tac file.txt | sed "$x q" | tac | sponge file.txt
Here $x is the number of lines you want to keep. The sponge command (from moreutils) is useful here because it avoids an additional temporary file.
tail -10000 <<<"$(cat file.txt)" > file.txt
Okay, not «just» tail, but this way it's capable of in-place truncation: the here-string's command substitution reads the whole file before the output redirection truncates file.txt.

Concatenation of huge number of selective files from a directory in Shell

I have more than 50000 files in a directory, such as file1.txt, file2.txt, ....., file50000.txt. I would like to concatenate some of these files, whose file numbers are listed in the following text file (need.txt).
need.txt
1
4
35
45
71
.
.
.
I tried the following. Though it works, I am looking for a simpler and shorter way.
n1=1
n2=$(wc -l < need.txt)
while [ $n1 -le $n2 ]
do
f1=$(awk -v n="$n1" 'NR==n {print $1}' need.txt)
cat file$f1.txt >> out.txt
(( n1++ ))
done
This might also work for you:
sed 's/.*/file&.txt/' < need.txt | xargs cat > out.txt
Something like this should work for you:
sed -e 's/.*/file&.txt/' need.txt | xargs cat > out.txt
It uses sed to translate each line into the appropriate file name and then hands the filenames to xargs to hand them to cat.
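To see what the sed step feeds to xargs, you can run it on its own; with the need.txt shown in the question it produces file names like:
$ sed 's/.*/file&.txt/' need.txt
file1.txt
file4.txt
file35.txt
...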
Using awk it could be done this way:
awk 'NR==FNR{ARGV[ARGC]="file"$1".txt"; ARGC++; next} {print}' need.txt > out.txt
Which adds each file to the ARGV array of files to process and then prints every line it sees.
It is possible to do it without any sed or awk command, directly using bash built-ins and cat (of course):
for i in $(cat need.txt); do cat file${i}.txt >> out.txt; done
And, as you wanted, it is quite simple.
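A slightly more robust variant of the same idea (a sketch; it reads need.txt line by line instead of word-splitting a command substitution, and redirects the loop's output just once):
while read -r i; do
    cat "file${i}.txt"
done < need.txt > out.txt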

save the output of a bash file

I have some files in a folder, and I need the first line of each file:
transaction1.csv
transaction2.csv
transaction3.csv
transaction4.csv
and I have the following code:
#All folders that begin with the word transaction
folder='"transaction*"'
ls `echo $folder |sed s/"\""/\/g` >testFiles
# The number of lines of testFiles that is the number of transaction files
num=`cat testFiles | wc -l`
for i in `seq 1 $num`
do
#The first transaction file
b=`cat testFiles | head -1`
#The first line of the first transaction file
cat `echo $b` | sed -n 1p
#remove the first line of the testFiles
sed -i '1d' testFiles
done
This code works; the problem is that I need to save the first line of each file in a file,
and if I change the line:
cat `echo $b` | sed -n 1p > salida
it doesn't work =(
In bash:
for file in *.csv; do head -1 "$file" >> salida; done
As Adam mentioned in the comment, this has the overhead of re-opening the output file on each pass through the loop. If you need better performance and reliability, use the following:
for file in *.csv; do head -1 "$file" ; done > salida
head -qn1 *.csv
head -n1 will print the first line of each file, and -q will suppress the header when more than one file is given on the command-line.
=== Edit ===
If the files are not raw text (for example, if they're compressed with "bzip2" as mentioned in your comment) and you need to do some nontrivial preprocessing on each file, you're probably best off going with a for loop. For example:
for f in *.csv.bz2 ; do
bzcat "$f" | head -n1
done > salida
(Another option would be to bunzip2 the files and then head them in two steps, such as bunzip2 *.csv.bz2 && head -qn1 *.csv > salida; however, this will of course change the files in place by decompressing them, which is probably undesirable.)
This awk one-liner should do what you want:
awk 'FNR==1{print > "output"}' *.csv
The first line of each CSV will be saved into the file named output.
Using sed:
for f in *.csv; do sed -n "1p" "$f"; done >salida

extracting data from a file and appending to the target file in linux using sed

I want to extract some data from the files minimumThickness*.k and put it in the file results.txt.
Each file minimumThickness*.k has only double values in its first line.
The files minimumThickness*.k are a series of files from 1 to a hundred:
minimumThickness1.k
minimumThickness2.k
minimumThickness3.k
. . .
. . .
minimumThickness100.k
I used the following command to do it, but it was not successful:
sed -n '/^[0-9.]*$/w results.txt' minimumThickness*.k
I could also use a loop over i from 1 to a hundred:
for i in $(seq 1 100); do
    thickness=$(awk 'NR==1 {print $1}' "minimumThickness$i.k")
    echo "$thickness" >> results.txt
done
Kindly tell me what the problem with sed is, or suggest a better way of using it. I would appreciate any elegant method.
Best regards.
The pattern ^[0-9.]*$ also matches empty lines (the * allows zero characters), so you may not be seeing the expected result. You can try [0-9]*\.[0-9]* to match doubles (with some modifications).
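For example, anchoring the pattern and requiring digits on both sides of the decimal point (a sketch; it assumes each wanted line is exactly one double):
sed -n '/^[0-9][0-9]*\.[0-9][0-9]*$/w results.txt' minimumThickness*.k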
If you only need the first line of each file:
head -n 1 minimumThickness*.k > results.txt
Note that with more than one file, head also prints a ==> filename <== header before each; add -q to suppress it.
This might work for you (GNU sed):
sed -sn '1w results.txt' minimumThickness*.k
or
head -qn1 minimumThickness*.k > results.txt
