How do I copy the beginning of multiple files in Linux?

I want to copy a bunch of files (*.txt) from one directory to another in Ubuntu. I want to reduce them in size, so I am using head to get the first 100 lines of each.
I want the new files to keep their original names but be in the subdirectory small/.
I have tried:
head -n 100 *.txt > small/*.txt
but this creates one file called *.txt.
I have also tried:
head -n 100 *.txt > small/
but this gives an "Is a directory" error.
It's got to be easy, right? But I am pretty bad at Linux.
Any help is much appreciated.

The glob in a redirection target is not expanded the way you hope: since small/ is empty, small/*.txt matches nothing and is used literally as the file name, which is why you end up with a file called *.txt. You'll have to create a loop instead:
for file in *.txt; do
head -n 100 "$file" > small/"$file"
done
This loops through all the .txt files, runs head -n 100 on each of them, and writes the output to a file with the same name in the small/ directory.
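Note that the loop assumes small/ already exists; if it does not, the redirection itself will fail. A minimal, slightly more defensive sketch of the same idea:
mkdir -p small   # create the target directory if it is not there yet
for file in *.txt; do
head -n 100 "$file" > small/"$file"   # first 100 lines, same name
done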

Try
for f in *.txt; do
head -n 100 "$f" > small/"$f"
done

Related

pasting many files to a single large file

I have many text files in a directory, like 1.txt 2.txt 3.txt 4.txt ... 2000.txt, and I want to paste them together to make one large file.
In this regard I did something like
paste *.txt > largefile.txt
but the above command does not read the .txt files in numeric order, so I need to read the files sequentially and paste them as 1.txt 2.txt 3.txt ... 2000.txt.
Please suggest a better solution for pasting many files.
Thanks, and looking forward to hearing from you.
Sort the file names numerically yourself then.
printf "%s\n" *.txt | sort -n | xargs -d '\n' paste
When dealing with many files, you may hit the limit reported by ulimit -n. On my system ulimit -n is 1024, but this is a soft limit and can be raised with just ulimit -n 99999.
Without raising the soft limit, go with a temporary file that accumulates the results of each "round" of ulimit -n files, like:
touch accumulator.txt
... | xargs -d '\n' -n $(($(ulimit -n) - 1)) sh -c '
paste accumulator.txt "$@" > accumulator.txt.sav;
mv accumulator.txt.sav accumulator.txt
' _
cat accumulator.txt
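Putting the pieces together, a complete sketch of the accumulator approach (assuming GNU xargs for the -d option and file names without embedded newlines; the accumulator is named accumulator.out here so that it is not itself matched by the *.txt glob):
touch accumulator.out
printf "%s\n" *.txt | sort -n |
xargs -d '\n' -n $(($(ulimit -n) - 1)) sh -c '
paste accumulator.out "$@" > accumulator.out.sav;
mv accumulator.out.sav accumulator.out
' _
cat accumulator.out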
Instead of using the wildcard * to enumerate all the files in the directory, if your file names follow a sequential numeric pattern you can list them in order yourself and concatenate them into one large file. The order in which * is expanded can differ between environments, so it may not work the way you expect.
Below is a simple example
$ for i in $(seq 20); do echo $i > $i.txt; done
# create 20 test files, 1.txt, 2.txt, ..., 20.txt, containing the numbers 1 to 20 respectively
$ cat {1..20}.txt
# show the contents of all the files in the order 1.txt, 2.txt, ..., 20.txt
$ cat {1..20}.txt > 1_20.txt
# concatenate them into a large file named 1_20.txt
In bash, or any other shell, glob expansions are done in lexicographical order. When files are numbered, this sadly means that 11.txt < 1.txt < 2.txt. This weird ordering comes from the fact that, in the collation being used, the character 1 sorts before the dot character (".").
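You can see this for yourself. The exact order depends on your locale's collation rules; the result below assumes a typical UTF-8 locale (in the C locale you would get 1.txt 11.txt 2.txt instead, which is still not numeric order):
$ touch 1.txt 2.txt 11.txt
$ echo *.txt
11.txt 1.txt 2.txt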
So here are a couple of ways to operate on your files in order:
rename all your files:
for i in *.txt; do mv "$i" "$(printf "%05d.txt" "${i%.*}")"; done
paste *.txt
use brace-expansion:
Brace expansion is a mechanism that allows for the generation of arbitrary strings. For integers you can use {n..m} to generate all numbers from n to m or {n..m..s} to generate all numbers from n to m in steps of s:
paste {1..2000}.txt
The downside here is that some files in the range may be missing (e.g. 1234.txt). So you can do
shopt -s extglob; paste ?({1..2000}.txt)
The pattern ?(pattern) matches zero or one occurrences of pattern. So this will skip the missing files but keep the order.
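A small demonstration; note that shopt -s nullglob is assumed here as well, so that patterns for files that do not exist expand to nothing instead of being left on the command line as literal text:
$ shopt -s extglob nullglob
$ touch 1.txt 2.txt 4.txt    # 3.txt is deliberately missing
$ echo ?({1..4}.txt)
1.txt 2.txt 4.txt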

Unable to cat ~9000 files using command line

I am trying to cat ~9000 FASTA-like files into one larger file. All of the files are in a single subfolder. I keep getting the "argument list too long" error.
This is a sample name from one of the files
efetch.fcgi?db=nuccore&id=CL640905.1&rettype=fasta&retmode=text
They are considered a document type file by the computer.
You can't use cat * > concatfile as you have limits on command line size. So take them one at a time and append:
ls | while read; do cat "$REPLY" >> concatfile; done
(Make sure concatfile doesn't exist beforehand.)
EDIT: As user6292850 rightfully points out, I might be overthinking it. This suffices, if your files don't have too weird names:
ls | xargs cat > concatfile
(but files with spaces in them, for example, would blow it up)
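If the file names are as odd as the efetch ones above, a null-delimited pipeline avoids the word-splitting problem entirely; a sketch assuming GNU find and xargs:
# exclude concatfile itself, since the shell creates it before find runs
find . -maxdepth 1 -type f ! -name concatfile -print0 | xargs -0 cat > concatfile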
There is a limit on how many arguments you can place on the command line.
You could use a loop to handle this:
while read file; do
cat "${file}" >> path/to/output_file;
done < <(find path/to/input_folder -maxdepth 1 -type f -print)
This will bypass the problem of an expanded glob with too many arguments.

Bash Script to replicate files

I have 25 files in a directory. I need to amass 25000 files for testing purposes. I thought I could just replicate these files over and over until I get 25000 files. I could manually copy paste 1000 times but that seemed tedious. So I thought I could write a script to do it for me. I tried
cp * .
As a trial, but I got an error saying the source and destination file are the same. If I were to automate it, how would I do it so that each of the 1000 copies is created with a unique name?
As discussed in the comments, you can do something like this:
for file in *
do
filename="${file%.*}" # get everything up to last dot
extension="${file##*.}" # get extension (text after last dot)
for i in {00001..10000}
do
cp "$file" "${filename}${i}.${extension}"
done
done
The trick for i in {00001..10000} is used to loop from 1 to 10000 with the numbers zero-padded.
The ${filename}${i}.${extension} form is the same as $filename$i.$extension but makes it clearer what is a variable name and what is literal text. This way, you can also write ${filename}_${i}.${extension} to get files like a_23.txt, etc.
In case your current files match a specific pattern, you can always do for file in a* (if they are all in the a + something format).
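The zero-padded range requires bash 4 or later; a quick way to check what it produces:
$ echo {001..005}
001 002 003 004 005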
If you want to keep the extension of the files, you can use this. Assuming, you want to copy all txt-files:
#!/bin/bash
for f in *.txt
do
for i in {1..10000}
do
cp "$f" "${f%.*}_${i}.${f##*.}"
done
done
You could try this:
for file in *; do for i in {1..1000}; do cp "$file" "$file-$i"; done; done
It will append a number to the name of every existing file.
The next script
for file in *.*
do
eval $(sed 's/\(.*\)\.\([^\.]*\)$/base="\1";ext="\2";/' <<< "$file")
for n in {1..1000}
do
echo cp "$file" "$base-$n.$ext"
done
done
will:
take all files with extensions (*.*)
extract the base name and extension (with sed)
copy the original file 1000 times to base-number.extension
it is a DRY RUN; remove the echo once you are satisfied
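If you would rather avoid eval, the same base/extension split can be done with the parameter expansions used in the answers above; a sketch with the same dry-run behavior:
for file in *.*
do
base="${file%.*}"    # everything before the last dot
ext="${file##*.}"    # everything after the last dot
for n in {1..1000}
do
echo cp "$file" "$base-$n.$ext"
done
done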

Split files according to a field and save in subdirectory created using the root name

I am having trouble with several bits of code. I am no expert in Linux Bash programming, unfortunately, so I have tried unsuccessfully all day to find something that works for my task, and I was hoping you could help guide me in the right direction.
I have many large files that I would like to split according to the third field within each of them. I would like to keep the header in each of the sub-files, and save the created sub-files in new directories created from the root names of the files.
The initial files stored in the original directory are:
Downloads/directory1/Levels_CHG_Lab_S_sample1.txt
Downloads/directory1/Levels_CHG_Lab_S_sample2.txt
Downloads/directory1/Levels_CHG_Lab_S_sample3.txt
and so on..
Each of these files have 200 columns, and column 3 contains values from 1 through 10.
I would like to split each of the files above based on the value of this column, and store the subfiles in subfolders, so for example sub-folder "Downloads/directory1/sample1" will contain 10 files (with the header line) derived by splitting the file Downloads/directory1/Levels_CHG_Lab_S_sample1.txt.
I have now tried many different approaches, with no success. I must be making this more complicated than it is, since the code I have tried looks awful…
Here is the code I am trying to work from:
FILES=Downloads/directory1/
for f in $FILES
do
# Create folder with root name by stripping file names
fname=${echo $f | sed 's/.txt//;s/Levels_CHG_Lab_S_//'}
echo "Creating sub-directory [$fname]"
mkdir "$fname"
# Save the header
awk 'NR==1{print $0}' $f > header
# Split each file by third column
echo "Splitting file $f"
awk 'NR>1 {print $0 > $3".txt" }' $f
# Move newly created files in sub directory
mv {1..10}.txt $fname # I have no idea how to do specify the files just created
# Loop through the sub-files to attach header row:
for subfile in $fname
do
cat header $subfile >> tmp_file
mv -f tmp_file $subfile
done
done
All these steps seem very complicated to me, I would very much appreciate if you could help me solve this in the right way. Thank you very much for your help.
-fra
You have a few problems with your code right now. First of all, at no point do you list the contents of your downloads directory. You are simply setting the FILES variable to a string that is the path to that directory. You would need something like:
FILES=$(ls Downloads/directory1/*.txt)
You also never cd to the Downloads/directory1 folder, so your mkdir would create directories in cwd; probably not what you want.
If you know that the numbers in column 3 always range from 1 to 10, I would just pre-populate those files with the header line before you split the file.
Try this code to do what you want (untested):
BASEDIR=Downloads/directory1/
FILES=$(ls ${BASEDIR}/*.txt)
for f in $FILES; do
# Create folder with root name by stripping file names
dirname=$(basename "$f" | sed 's/\.txt$//;s/Levels_CHG_Lab_S_//')
dirname="${BASEDIR}/${dirname}/"
echo "Creating sub-directory [$dirname]"
mkdir "$dirname"
# Save the header to each file
HEADER_LINE=$(head -n1 $f)
for i in {1..10}; do
echo ${HEADER_LINE} > ${dirname}/${i}.txt
done
# Split each file by third column
echo "Splitting file $f"
awk -v dirname="${dirname}" 'NR>1 {filename=dirname $3 ".txt"; print $0 >> filename }' "$f"
done
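A tiny worked example of just the awk splitting step (the file and directory names here are made up for illustration):
$ mkdir demo
$ printf 'h1 h2 h3\na b 1\nc d 2\n' > sample.txt
$ awk -v dirname=demo/ 'NR>1 {filename=dirname $3 ".txt"; print $0 >> filename}' sample.txt
$ cat demo/2.txt
c d 2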

Percentage of completion of script: Name a file with percentage

I have a script that I run on 2k servers simultaneously that creates a temp working directory on a NAS.
The script builds a list of files...the list could be 1k files or 1m files.
I run a for loop on the list to run some grep commands on each file
counter=0
num_files=`wc -l $filelist`
cat $filelist| while read line; do
do_stuff_here
counter=`expr $counter + 1`
((percent=$counter/$num_files))
##CREATE a file named "$percent".percent
done
What I am thinking is I can take the total number of files from the list (wc -l $filelist) and add a counter that I increase by 1 in the loop.
I can then divide $counter/$num_files.
This seems to work, but the problem I have is that I would like to rename the same file each time, instead of just creating a new one. What can I do here?
I do not want this to output to stdout/stderr... I already have enough stuff going to those places. I would like to be able to browse to a subdir in WinSCP and quickly see how far along each server is.
Try this one
touch 0.percent
counter=0
num_files=$(wc -l $filelist)
num_files=${num_files/ */}
cat $filelist| while read line; do
do_stuff_here
mv -f {$((counter*100/num_files)),$((++counter*100/num_files))}.percent
done
rm -f *.percent
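The brace expansion in the mv line produces the old and the new marker names as the two arguments of mv. For example, with num_files=1000, the iteration where counter goes from 119 to 120 expands to:
mv -f 11.percent 12.percent
so the single marker file is renamed in place as the percentage advances.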
