cat files in subdirectories using linux commands - linux

I have the following directories:
P922_101
P922_102
.
.
Each directory, for instance P922_101 has following subdirectories:
140311_AH8MHGADXX 140401_AH8CU4ADXX
Each subdirectory, for instance 140311_AH8MHGADXX has the following files:
1_140311_AH8MH_P922_101_1.fastq.gz 1_140311_AH8MH_P922_101_2.fastq.gz
2_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_2.fastq.gz
And files in 140401_AH8CU4ADXX are:
1_140401_AH8CU_P922_101_1.fastq.gz 1_140401_AH8CU_P922_4001_2.fastq.gz
2_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_4001_2.fastq.gz
I want to do 'cat' for the files in the subdirectories in the following way:
cat 1_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_1.fastq.gz
1_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_101_1.fastq.gz > P922_101_1.fastq.gz
which means that files ending with _1.fastq.gz should be concatenated into a single file and files ending with _2.fastq.gz into another file.
It should be run for all files in subdirectories in all directories. Could someone give a Linux solution to do this?

Since they're compressed, you should probably use gzip -dc (decompress and write to stdout):
find /somePath -type f -name "*.fastq.gz" -exec gzip -dc {} \; | \
tee -a /someOutFolder/out.txt

You can use find for this:
find /top/path -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > one_file
find /top/path -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} \; > another_file
This looks for all files under /top/path whose names match the pattern *_1.fastq.gz (or *_2.fastq.gz) and concatenates them into the desired file. -mindepth 2 makes find skip anything less than two levels below the starting point; this way, files directly in /top/path won't be matched.
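Note that cat is enough if the output should stay compressed, since concatenated gzip files are themselves valid gzip; use zcat instead of cat if you want decompressed output.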
As you keep adding details in comments, let's see what else we can do:
Say you have the list of directories in a file directories_list, each line containing one:
while IFS= read -r directory
do
find "$directory" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > "$directory/output"
done < directories_list
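A minimal sketch of the per-sample concatenation the question asks for, assuming all sample directories sit under one top-level directory and gzip is installed. The temporary tree and its tiny stand-in files are hypothetical; only the naming scheme comes from the question. Plain cat is used so the merged output stays compressed.

```shell
set -eu

# Hypothetical demo tree; real data would already exist under "$top".
top=$(mktemp -d)
d1="$top/P922_101/140311_AH8MHGADXX"
d2="$top/P922_101/140401_AH8CU4ADXX"
mkdir -p "$d1" "$d2"
printf 'a\n' | gzip -c > "$d1/1_140311_AH8MH_P922_101_1.fastq.gz"
printf 'b\n' | gzip -c > "$d1/2_140311_AH8MH_P922_101_1.fastq.gz"
printf 'c\n' | gzip -c > "$d2/1_140401_AH8CU_P922_101_1.fastq.gz"
printf 'x\n' | gzip -c > "$d1/1_140311_AH8MH_P922_101_2.fastq.gz"

for dir in "$top"/P922_*/; do
    sample=$(basename "$dir")
    for mate in 1 2; do
        # The glob expands in sorted order, so the result is reproducible.
        cat "$dir"*/*_"$mate".fastq.gz > "$dir${sample}_${mate}.fastq.gz"
    done
done

# Decompress the merged file to confirm it is valid gzip.
gzip -dc "$top/P922_101/P922_101_1.fastq.gz"
```

The merged files land directly in each sample directory, one level above the files being read, so the glob never picks up its own output.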

Related

Copy recursive files of all the subdirectories

I want to copy all the log files from a directory which does not contain log files, but it contains other subdirectories with log files. These subdirectories also contain other subdirectories, so I need something recursive.
I tried
cp -R *.log /destination
But it doesn't work because the first directory does not contain log files. The answer can also be a loop in bash.
find /path/to/logdir -type f -name "*.log" |xargs -I {} cp {} /path/to/destinationdir
Explanation:
find searches recursively
-type f restricts the search to regular files
-name specifies the name pattern
xargs runs a command for each item it reads from standard input
-I {} sets {} as the placeholder that is replaced by each item
Another version without xargs:
find /path/to/logdir -type f -name '*.log' -exec cp '{}' /path/to/destinationdir \;
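One thing to watch when flattening a tree into a single directory is that log files with the same name in different subdirectories overwrite each other. A possible sketch, assuming GNU cp (its --backup option is not in POSIX) and a hypothetical temporary tree:

```shell
set -eu

# Hypothetical demo tree: two log files share the name app.log.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/a" "$src/b/c"
printf 'one\n' > "$src/a/app.log"
printf 'two\n' > "$src/b/c/app.log"

# --backup=numbered (GNU cp) renames an existing destination file to
# app.log.~1~ before writing the new one, so neither copy is lost.
find "$src" -type f -name '*.log' -exec cp --backup=numbered {} "$dest" \;

ls "$dest"
```

Which of the two files ends up as app.log depends on the order in which find visits the directories, so this only guarantees that nothing is lost, not which copy keeps the plain name.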

Bash: Find files containing a certain string and copy them into a folder

What I want:
In a bash script: find all files in the current directory that contain a certain string "teststring" and copy them into a subfolder "./testfolder".
I found this to find the filenames I'm looking for:
find . -type f -print0 | xargs -0 grep -l "teststring"
..and this to copy found files to another folder (here selecting by strings in filename):
find . -type f -iname "stringinfilename" -exec cp {} ./testfolder/ \;
What's the best way to combine both commands to achieve what I described at the top?
Just let find do both:
find . -name subdir -prune -o -type f -exec \
grep -q teststring "{}" \; -exec cp "{}" subdir \;
Note that things like this are much easier if you don't try to add to the directory you're working in. In other words, write to a sibling directory instead of writing to a subdirectory. If you want to wind up with the data in a subdirectory, mv it when you're done. That way, you don't have to worry about the prune (i.e., you don't have to worry about find descending into the subdirectory and attempting to duplicate the work).
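Another way to combine the two steps, sketched here under the assumption that GNU grep, xargs and cp are available, is to let grep do the recursion and hand the matching names to cp. The destination is a sibling of the searched directory, as suggested above; all paths and file contents are hypothetical.

```shell
set -eu

# Hypothetical demo tree; names and contents are made up.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/testfolder"
printf 'has teststring inside\n' > "$work/src/match one.txt"
printf 'teststring\n' > "$work/src/sub/match2.txt"
printf 'nothing here\n' > "$work/src/sub/skip.txt"

# grep -r recurses, -l prints only matching file names, -Z ends each
# name with a NUL byte; xargs -0 reads them, -r skips cp if none match.
grep -rlZ -e 'teststring' "$work/src" | xargs -0 -r cp -t "$work/testfolder"

ls "$work/testfolder"
```

The NUL-terminated names keep files with spaces in their names intact on the way from grep to cp.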

Linux find all files in sub directories and move them

I have a Linux system where some users put files via FTP into a directory. In this directory there are subdirectories which the users can create. Now I need a script that searches for all files in those subdirectories and moves them into a single directory (for backup). The problem: the subdirectories shouldn't be removed.
The directory for the users is /files/media/documents/
and the files have to be moved into the directory /files/dump/. I don't care about files directly in /files/media/documents/; they are already handled by another script.
I already tried this script:
for dir in /files/media/documents/
do
find "$dir/" -iname '*' -print0 | xargs -0 mv -t /files/dump/
done
Instead of iterating, you could just use find. The man page documents a -type option, so to move only files you could do:
find "/files/media/documents/" -type f -print0 | xargs -0 mv -t /files/dump/
If you also want to skip files directly in /files/media/documents/ and move only those in its subdirectories, add -mindepth 2 (the starting directory is depth 0, so files directly inside it are at depth 1):
find "/files/media/documents/" -mindepth 2 -type f -print0 | xargs -0 mv -t /files/dump/
Alternatively, you could use -exec and skip the second command (xargs):
find "/files/media/documents/" -mindepth 2 -type f -exec mv {} /files/dump/ \;
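To check the depth logic before touching real data, the command can be tried on a throwaway tree. This is only a sketch: the temporary directories stand in for /files/media/documents and /files/dump, and it assumes GNU find, xargs and mv.

```shell
set -eu

# Hypothetical stand-ins for /files/media/documents and /files/dump.
root=$(mktemp -d)
docs="$root/documents"
dump="$root/dump"
mkdir -p "$docs/userA/deep" "$dump"
touch "$docs/top.txt" "$docs/userA/a.txt" "$docs/userA/deep/b.txt"

# Depth 0 is "$docs" itself and depth 1 is its direct children, so
# -mindepth 2 selects only files inside the subdirectories.
find "$docs" -mindepth 2 -type f -print0 | xargs -0 -r mv -t "$dump"

find "$root" | sort
```

After the run, top.txt is still in the documents directory, both nested files are in the dump directory, and the now-empty subdirectories remain in place.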

In Linux terminal, how to delete all files in a directory except one or two

In a Linux terminal, how to delete all files from a folder except one or two?
For example.
I have 100 image files in a directory and one .txt file.
I want to delete all files except that .txt file.
From within the directory, list the files, filter out the names containing 'file-to-keep', and remove every file left on the list.
ls | grep -v 'file-to-keep' | xargs rm
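To avoid issues with spaces in filenames (remember to never use spaces in filenames), use find with -print0 and xargs with the -0 option. Adding -type f keeps the starting directory itself from being passed to rm.
find 'path' -maxdepth 1 -type f -not -name 'file-to-keep' -print0 | xargs -0 rm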
Or, mixing both approaches, use grep's -z option to handle the NUL-separated names that find -print0 produces.
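A sketch of that mixed form, assuming GNU grep and xargs; the temporary directory and its file names are hypothetical:

```shell
set -eu

# Hypothetical demo directory with one awkward file name.
dir=$(mktemp -d)
touch "$dir/img 1.png" "$dir/img2.png" "$dir/file-to-keep.txt"

# grep -z reads and writes NUL-terminated records, so it can sit
# between find -print0 and xargs -0. The pattern is matched against
# the whole path, not just the file name.
find "$dir" -maxdepth 1 -type f -print0 |
    grep -zv 'file-to-keep' |
    xargs -0 -r rm --

ls "$dir"
```

Because the pattern sees the full path, a directory name containing the same text would also protect every file under it, so the pattern should be chosen with that in mind.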
In general, using an inverted pattern search with grep should do the job. As you didn't define any pattern, I'd just give you a general code example:
ls -1 | grep -v 'name_of_file_to_keep.txt' | xargs rm -f
ls -1 lists one file per line, so that grep can search line by line. grep -v inverts the match, so any name matching the pattern will NOT be deleted.
For multiple files, you may use grep -E (the replacement for egrep):
ls -1 | grep -E -v 'not_file1.txt|not_file2.txt' | xargs rm -f
Update after question was updated:
I assume you want to delete all files in the current folder except those that end with .txt. So this should work too:
find . -maxdepth 1 -type f -not -name "*.txt" -exec rm -f {} \;
find supports a -delete option, so you do not need -exec. You can also chain several exclusions, such as -not -name somefile -not -name otherfile:
user@host$ ls
1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt josh.pdf keepme
user@host$ find . -maxdepth 1 -type f -not -name keepme -not -name 8.txt -delete
user@host$ ls
8.txt keepme
Use -not to exclude the file(s) or pattern(s) you don't want to delete. You can change the 1 passed to -maxdepth to control how many directory levels deep files are deleted:
find . -maxdepth 1 -not -name "*.txt" -exec rm -f {} \;
You can also do:
find -maxdepth 1 \! -name "*.txt" -exec rm -f {} \;
In bash, you can use:
$ shopt -s extglob # Enable extended pattern matching features
$ rm !(*.txt) # Delete all files except .txt files

How to copy all the files with the same suffix to another directory? - Unix

I have a directory with an unknown number of subdirectories and an unknown number of levels of sub*directories within them. How do I copy all the files with the same suffix to a new directory?
E.g. from this directory:
> some-dir
>> foo-subdir
>>> bar-sudsubdir
>>>> file-adx.txt
>> foobar-subdir
>>> file-kiv.txt
Copy all the *.txt files to:
> new-dir
>> file-adx.txt
>> file-kiv.txt
One option is to use find:
find some-dir -type f -name "*.txt" -exec cp \{\} new-dir \;
find some-dir -type f -name "*.txt" finds the *.txt files under the directory some-dir. The -exec option builds a command line (e.g. cp file new-dir) for every matching file, with {} standing in for the file's path.
Use find with xargs as shown below:
find some-dir -type f -name "*.txt" -print0 | xargs -0 cp --target-directory=new-dir
For a large number of files, this xargs version is more efficient than using find some-dir -type f -name "*.txt" -exec cp {} new-dir \; because xargs will pass multiple files at a time to cp, instead of calling cp once per file. So there will be fewer fork/exec calls with the xargs version.
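find can do similar batching on its own when -exec is ended with + instead of \;. A minimal sketch, assuming GNU cp for the -t option; the temporary tree mirrors the layout from the question, and notes.md is a made-up extra file to show that non-matching names are skipped:

```shell
set -eu

# Hypothetical demo tree mirroring the layout in the question.
base=$(mktemp -d)
mkdir -p "$base/some-dir/foo-subdir/bar-subsubdir" \
         "$base/some-dir/foobar-subdir" "$base/new-dir"
touch "$base/some-dir/foo-subdir/bar-subsubdir/file-adx.txt" \
      "$base/some-dir/foobar-subdir/file-kiv.txt" \
      "$base/some-dir/foobar-subdir/notes.md"

# "{} +" makes find append many paths to one cp invocation; cp -t
# (GNU) names the target first so the sources can come last.
find "$base/some-dir" -type f -name '*.txt' -exec cp -t "$base/new-dir" {} +

ls "$base/new-dir"
```

This keeps everything in one command while still avoiding one cp process per file.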
