Save the output of a bash script to a file - Linux

I have some files in a folder, and I need the first line of each file:
transaction1.csv
transaction2.csv
transaction3.csv
transaction4.csv
and I have the following code:
# All files whose names begin with the word transaction
folder='"transaction*"'
ls `echo $folder |sed s/"\""/\/g` >testFiles
# The number of lines in testFiles, i.e. the number of transaction files
num=`cat testFiles | wc -l`
for i in `seq 1 $num`
do
#The first transaction file
b=`cat testFiles | head -1`
#The first line of the first transaction file
cat `echo $b` | sed -n 1p
#remove the first line of the testFiles
sed -i '1d' testFiles
done
This code works. The problem is that I need to save the first line of each file in a file,
but if I change the line to:
cat `echo $b` | sed -n 1p > salida
it doesn't work =(

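In bash (the > in your version truncates salida on every pass through the loop, so only the last file's first line survives; appending with >> avoids that):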
for file in *.csv; do head -1 "$file" >> salida; done
As Adam mentioned in the comments, this has the overhead of reopening salida on every pass through the loop. If you need better performance and reliability, redirect the whole loop once:
for file in *.csv; do head -1 "$file" ; done > salida
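The sketch below shows the difference between the two redirections side by side. It assumes a throwaway directory created with mktemp -d and three made-up transaction*.csv files. It runs the question's in-loop > redirection first, then the whole-loop redirection.

```shell
#!/usr/bin/env bash
# Sketch: why "> salida" inside the loop keeps only one line.
# The transaction*.csv files are made-up sample data in a temporary directory.
cd "$(mktemp -d)" || exit 1
for n in 1 2 3; do
  printf 'header%s\nrow%s\n' "$n" "$n" > "transaction$n.csv"
done

# In-loop '>' truncates salida on every iteration: only the last file's line survives.
for file in transaction*.csv; do head -n 1 "$file" > salida; done
cat salida    # prints: header3

# Redirecting the whole loop opens salida once and collects every line.
for file in transaction*.csv; do head -n 1 "$file"; done > salida
cat salida    # prints header1, header2 and header3, one per line
```

Because salida is opened only once in the second form, the loop body does not need to know about the output file at all.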

head -qn1 *.csv > salida
head -n1 prints the first line of each file. -q suppresses the "==> filename <==" header that head otherwise prints when more than one file is given on the command line.
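Here is a small check of what -q changes. It assumes GNU coreutils head and uses two made-up CSV files in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: head with several files prints "==> name <==" headers unless -q is given.
# Assumes GNU coreutils head; the sample files are made up.
cd "$(mktemp -d)" || exit 1
printf 'id,amount\n1,10\n' > transaction1.csv
printf 'id,date\n2,2020\n' > transaction2.csv

head -n1 *.csv            # output includes "==> transaction1.csv <==" style headers
head -qn1 *.csv > salida  # only the first line of each file
cat salida                # prints: id,amount then id,date
```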
=== Edit ===
If the files are not plain text (for example, if they're compressed with bzip2 as mentioned in your comment) and you need to do some nontrivial preprocessing on each file, you're probably best off with a for loop. For example:
for f in *.csv.bz2 ; do
bzcat "$f" | head -n1
done > salida
(Another option would be to bunzip2 the files and then head them in two steps, such as bunzip2 *.csv.bz2 && head -qn1 *.csv > salida. However, this decompresses the files in place and by default removes the .bz2 originals, which is probably undesirable.)

This awk one-liner should do what you want:
awk 'FNR==1{print > "output"}' *.csv
The first line of each CSV file will be saved to the file output.
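The one-liner relies on FNR, which awk resets to 1 for every input file, while NR keeps counting across all files. The sketch below uses two made-up files in a temporary directory to show that difference.

```shell
#!/usr/bin/env bash
# Sketch: FNR restarts at 1 for each input file, NR keeps counting across files.
cd "$(mktemp -d)" || exit 1
printf 'a1\na2\n' > transaction1.csv
printf 'b1\nb2\n' > transaction2.csv

awk 'FNR==1{print > "output"}' *.csv   # first line of every file
awk 'NR==1' *.csv                       # prints only a1: NR==1 is true once overall
cat output                              # prints a1 then b1
```

Inside awk, print > "output" truncates the file only when it is first opened; later prints in the same run append. That is why this approach does not hit the problem from the question.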

Using sed:
for f in *.csv; do sed -n "1p" "$f"; done >salida

Related

Bash script to move first N files with specific name

I'm trying to move only 100 files with a specific extension (from the current directory to the parent directory), but the following attempt of mine does not work:
for file in $(ls -U | grep *.txt | tail -100)
do
mv $file ../
done
Can you point me to the correct approach?
Since you didn't quote *.txt, the shell expanded it to all the filenames ending in .txt. So your command is something like:
ls -U | grep file1.txt file2.txt file3.txt ... | tail -100
Since grep has filename arguments, it ignores its standard input. It outputs all the lines matching file1.txt in the remaining files. There are probably no matches, so nothing is piped to tail -100. And even if there were matches, the output would be lines from the files, not filenames, so it wouldn't be useful for the mv command.
You can loop over the filenames directly, and use a counter variable to stop after 100 files.
counter=0
for file in *.txt
do
if (( counter >= 100 ))
then break
fi
mv "$file" ../
((counter++))
done
This avoids the pitfalls of parsing the output of ls.
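Below is a runnable sketch of the counter loop. It makes some assumptions: a hypothetical temporary parent directory with a work subdirectory holding 150 made-up .txt files, some with spaces in their names. It then counts how many files ended up in the parent.

```shell
#!/usr/bin/env bash
# Sketch: move at most 100 .txt files from the current directory to its parent.
# Hypothetical setup: a temporary parent with a "work" subdirectory of 150 files.
parent=$(mktemp -d) || exit 1
mkdir "$parent/work" && cd "$parent/work" || exit 1
for n in $(seq 1 150); do : > "file $n.txt"; done   # names with spaces on purpose

counter=0
for file in *.txt; do
  (( counter >= 100 )) && break
  mv -- "$file" ../
  (( counter++ )) || true   # (( 0++ )) has exit status 1; ignore it in case set -e is on
done

shopt -s nullglob
moved=(../*.txt)
left=(./*.txt)
echo "moved=${#moved[@]} left=${#left[@]}"   # prints: moved=100 left=50
```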
This will do the job:
ls -U *.txt | tail -100 | while IFS= read -r filename; do mv "$filename" ../; done
while IFS= read -r preserves spaces and backslashes in filenames (names containing newlines will still break it).
Run this in the directory containing the .txt files:
#!/bin/bash
for txt_file in ./*.txt; do
((c++==100)) && break
mv "$txt_file" ../
done
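Another bash-only option, not taken from the answers above, is to put the glob results into an array and pass a slice of it to a single mv. The sketch below assumes the same kind of hypothetical setup as before: a temporary parent directory with 150 made-up .txt files in a work subdirectory.

```shell
#!/usr/bin/env bash
# Sketch (bash-only alternative, not from the answers): slice a glob array.
# Hypothetical setup: a temporary parent directory with 150 .txt files in "work".
parent=$(mktemp -d) || exit 1
mkdir "$parent/work" && cd "$parent/work" || exit 1
for n in $(seq 1 150); do : > "file $n.txt"; done

shopt -s nullglob                 # an empty match gives an empty array, not "*.txt"
files=(*.txt)
if (( ${#files[@]} > 0 )); then
  mv -- "${files[@]:0:100}" ../   # first 100 names in glob (sorted) order
fi

moved=(../*.txt)
echo "moved ${#moved[@]} files"   # prints: moved 100 files
```

If fewer than 100 files match, the slice simply contains all of them, so no counter is needed.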

Concatenation of a huge number of selected files from a directory in shell

I have more than 50000 files in a directory, such as file1.txt, file2.txt, ..., file50000.txt. I would like to concatenate the files whose numbers are listed in the following text file (need.txt):
need.txt
1
4
35
45
71
.
.
.
I tried the following. It works, but I am looking for a simpler and shorter way.
n1=1
n2=$(wc -l < need.txt)
while [ $n1 -le $n2 ]
do
f1=$(awk -v n="$n1" 'NR==n {print $1}' need.txt)
cat file$f1.txt >> out.txt
(( n1++ ))
done
This might also work for you:
sed 's/.*/file&.txt/' < need.txt | xargs cat > out.txt
Something like this should work for you:
sed -e 's/.*/file&.txt/' need.txt | xargs cat > out.txt
It uses sed to translate each line into the appropriate file name and then hands the filenames to xargs to hand them to cat.
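The sketch below runs the sed | xargs pipeline on made-up data in a temporary directory: ten small fileN.txt files and a need.txt listing 1, 4 and 7.

```shell
#!/usr/bin/env bash
# Sketch: build file names from need.txt with sed and let xargs batch them into cat.
# The file1.txt ... file10.txt and need.txt contents are made up for illustration.
cd "$(mktemp -d)" || exit 1
for n in $(seq 1 10); do echo "content of file$n" > "file$n.txt"; done
printf '%s\n' 1 4 7 > need.txt

sed 's/.*/file&.txt/' need.txt | xargs cat > out.txt
cat out.txt   # content of file1, file4, file7, in the order listed in need.txt
```

xargs packs as many names into each cat invocation as the system allows. With tens of thousands of names, that means a handful of cat processes rather than one per file.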
Using awk it could be done this way:
awk 'NR==FNR{ARGV[ARGC]="file"$1".txt"; ARGC++; next} {print}' need.txt > out.txt
This appends each listed file name to the ARGV array of files to process, then prints every line of those files.
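Here is a small run of the awk ARGV approach on made-up data in a temporary directory. The output order follows need.txt, not the numeric order of the files.

```shell
#!/usr/bin/env bash
# Sketch: while reading need.txt (NR==FNR holds only for the first file),
# append "file<N>.txt" to ARGV; afterwards print every line of those files.
cd "$(mktemp -d)" || exit 1
for n in 1 2 3; do printf 'line from file%s\n' "$n" > "file$n.txt"; done
printf '%s\n' 3 1 > need.txt

awk 'NR==FNR{ARGV[ARGC]="file"$1".txt"; ARGC++; next} {print}' need.txt > out.txt
cat out.txt   # prints: line from file3, then line from file1
```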
It is possible to do it without sed or awk, using only bash built-ins and cat:
for i in $(< need.txt); do cat "file${i}.txt"; done > out.txt
As you wanted, it is quite simple.

How do I copy line X from each of a bunch of files to another file?

So my problem is as follows:
I have a bunch of files and I need only the information from a certain line in each of these files (the same line for all files).
Example:
I want the content of line 10 from each of the files example_1.dat to example_10.dat, saved to test.dat.
I tried using head -n 5 example_*.dat > test.dat, but this gives me everything from the top down to the line I have chosen instead of just that line.
Please help.
for f in example_*.dat ; do sed -n '5p' "$f" >> test.dat ; done
This code does the following:
For each file f in the directory whose name matches example_*.dat (a plain *.dat would also match test.dat itself once it exists),
use sed to print the 5th line of the file and append it to test.dat.
The >> appends the line to the end of test.dat, creating the file if it does not exist.
Use a combination of head and tail to zoom to the needed line. For example, head -n 5 file | tail -n 1
You can use a for loop to get it done over several files
for f in example_*.dat ; do head -n 5 "$f" | tail -n 1 >> test.dat ; done
PS: Don't forget to empty test.dat (> test.dat) before running the loop. Otherwise you'll get results from previous runs as well.
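You can use sed or awk (awk's FNR restarts at 1 for each input file, so one awk call covers all files):
sed -n '5p' file
awk 'FNR == 5' example_*.dat > test.dat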
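The sketch below checks that the per-file sed loop, the head | tail loop and a single awk call collect the same lines. It uses three made-up example_N.dat files of six lines each in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: three ways to collect line 5 of every example_*.dat into one file.
# The example_1.dat ... example_3.dat files are made-up sample data.
cd "$(mktemp -d)" || exit 1
for n in 1 2 3; do
  for l in 1 2 3 4 5 6; do echo "file$n line$l"; done > "example_$n.dat"
done

for f in example_*.dat; do sed -n '5p' "$f"; done > via_sed.txt
for f in example_*.dat; do head -n 5 "$f" | tail -n 1; done > via_head.txt
awk 'FNR == 5' example_*.dat > test.dat   # one process; FNR restarts per file

cmp -s via_sed.txt test.dat && cmp -s via_head.txt test.dat && echo "all agree"
cat test.dat   # file1 line5, file2 line5, file3 line5
```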
This might work for you (GNU sed):
sed -sn '5wtest.dat' example_*.dat
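The -s flag matters here. Without it, sed treats all input files as one stream, so address 5 would mean line 5 of the combined input. The sketch below assumes GNU sed and two made-up six-line files in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: GNU sed -s treats each file separately, so address 5 means "line 5 of each file";
# the w command writes those lines to test.dat. Requires GNU sed; sample files are made up.
cd "$(mktemp -d)" || exit 1
for n in 1 2; do
  for l in 1 2 3 4 5 6; do echo "file$n line$l"; done > "example_$n.dat"
done

sed -sn '5wtest.dat' example_*.dat
cat test.dat       # file1 line5, then file2 line5

# Without -s the files form one stream, so only line 5 of the combined input is written.
sed -n '5wcombined.dat' example_*.dat
cat combined.dat   # file1 line5 only
```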

Extract strings in a text file using grep

I have file.txt with names one per line as shown below:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
....
I want to grep each of these from an input.txt file.
Manually, I do this one at a time:
grep "ABCB8" input.txt > output.txt
Could someone help me automatically grep all the strings in file.txt from input.txt and write the results to output.txt?
You can use the -f flag, as described in Bash, Linux, Need to remove lines from one file based on matching content from another file:
grep -o -f file.txt input.txt > output.txt
Flags:
-f FILE, --file=FILE:
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing. (-f is specified by POSIX.)
-o, --only-matching:
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
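You may want whole lines, as the manual grep in the question prints, rather than only the matched parts. In that case, drop -o. Adding -F (fixed strings) and -w (whole words) keeps a name like ABCB8 from matching inside ABCB8XY. The sketch below uses made-up sample data in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: one grep call for all names in file.txt. -F treats each line as a fixed
# string, -w requires whole-word matches. Sample data is made up.
cd "$(mktemp -d)" || exit 1
printf '%s\n' ABCB8 ABCB8XY ABCC12 > file.txt
cat > input.txt <<'EOF'
nothing relevant here
a line with ABCB8 in it
ABCB8XY starts this line
ABCC12 starts this one
EOF

grep -F -w -f file.txt input.txt > output.txt
cat output.txt   # the three matching lines, each once, in input.txt order
```

A single grep call also reads input.txt only once, instead of once per name as in a loop.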
for line in `cat text.txt`; do grep "$line" input.txt >> output.txt; done
Contents of text.txt:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
Edit:
A safer solution with while read:
while IFS= read -r line; do grep "$line" input.txt >> output.txt; done < text.txt
Edit 2:
Sample text.txt:
ABCB8
ABCB8XY
ABCC12
Sample input.txt:
You were hired to do a job; we expect you to do it.
You were hired because ABCB8 you kick ass;
we expect you to kick ass.
ABCB8XY You were hired because you can commit to a rational deadline and meet it;
ABCC12 we'll expect you to do that too.
You're not someone who needs a middle manager tracking your mouse clicks
With this sample, the pattern ABCB8 also matches the ABCB8XY line, so that line is written to output.txt twice. If you don't care about the order of lines, a quick workaround is to pipe the result through sort -u:
while IFS= read -r line; do grep "$line" input.txt >> output.txt; done < text.txt; sort -u output.txt > output2.txt
The result is then in output2.txt.
Edit 3:
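while IFS= read -r line; do grep "\<${line}\>" input.txt >> output.txt; done < text.txt
The \< and \> anchors (supported by GNU grep) match only at word boundaries, so ABCB8 no longer matches inside ABCB8XY. Is that fine?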
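The sketch below runs the word-boundary loop on small made-up data in a temporary directory. It compares the result against plain substring patterns and assumes GNU grep for \< and \>.

```shell
#!/usr/bin/env bash
# Sketch: \< and \> are word-boundary anchors in GNU grep, so ABCB8 no longer
# matches inside ABCB8XY. Sample data is made up.
cd "$(mktemp -d)" || exit 1
printf '%s\n' ABCB8 ABCB8XY ABCC12 > text.txt
cat > input.txt <<'EOF'
first line mentions ABCB8 only
ABCB8XY appears here
ABCC12 appears here
EOF

# Plain substring patterns: the ABCB8XY line is found twice (by ABCB8 and by ABCB8XY).
while IFS= read -r line; do grep -- "$line" input.txt; done < text.txt > loose.txt
# Word-bounded patterns: every line is found once.
while IFS= read -r line; do grep "\<${line}\>" input.txt; done < text.txt > output.txt

grep -c '' loose.txt    # prints: 4
grep -c '' output.txt   # prints: 3
```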

How do I insert a new line before concatenating?

I have about 80000 files which I am trying to concatenate. This one:
cat files_*.raw >> All
is extremely fast whereas the following:
for f in `ls files_*.raw`; do cat $f >> All; done;
is extremely slow. For this reason, I am trying to stick with the first option, except that I need to be able to insert a new line after each file is concatenated to All. Is there any fast way of doing this?
What about
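ls files_*.raw | xargs -L1 sed -e '$s/$/\n/' >>All
That will insert an extra newline at the end of each file as you concatenate them. (GNU sed interprets the \n in the replacement as a newline; some other sed implementations do not.)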
And a parallel version, if you don't care about the order of concatenation (output from different files may also interleave when several sed processes write at once):
find ./ -name "files_*.raw" -print | xargs -n1 -P4 sed -e '$s/$/\n/' >>All
The second command is probably slow because it starts 80000 cat processes and opens All for appending 80000 times, versus one process and one open in the first command. Try a simple variant of the second command that opens All only once:
for f in files_*.raw; do cat "$f"; echo; done >> All
I don't know why it would be slow, but I don't think you have much choice:
for f in files_*.raw; do cat "$f" >> All; echo >> All; done
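Each time awk starts on another file, FNR is reset, so FNR==1 on the first line of each file. Printing a blank line before every file except the first, plus one at the end, gives a blank line after each file:
awk 'FNR==1 && NR>1 {print ""} {print} END {print ""}' files_*.raw >> All
Note that it all runs in a single awk process, so performance should be close to the cat command from the question.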
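The sketch below checks that three of the approaches in this thread produce the same bytes on two made-up files in a temporary directory: the single-redirect loop, the xargs/sed pipeline and an awk variant. The sed variant assumes GNU sed. The awk line here uses FNR with NR>1 so that the blank line lands after each file.

```shell
#!/usr/bin/env bash
# Sketch: three ways to append a blank line after each file, checked for equal output.
# Made-up sample files; the sed variant relies on GNU sed's "\n" in replacements.
cd "$(mktemp -d)" || exit 1
printf 'a1\na2\n' > files_1.raw
printf 'b1\n' > files_2.raw

for f in files_*.raw; do cat "$f"; echo; done > loop.out
ls files_*.raw | xargs -L1 sed -e '$s/$/\n/' > sed.out
awk 'FNR==1 && NR>1 {print ""} {print} END {print ""}' files_*.raw > awk.out

cmp -s loop.out sed.out && cmp -s loop.out awk.out && echo "identical"   # prints: identical
```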
