counting files in directory linux - linux

Q2. Write a script that takes a directory name as a command line argument and displays the attributes of the various files in it, e.g.
Regular Files
Total No of files
No of directories
Files allowing write permissions
Files allowing read permissions
Files allowing execute permissions
File having size 0
Hidden files in directory
I am working in Linux with a shell script.
What I have done so far is:
find DIR_NAME -type f -print | wc -l
To count all files (including subdirs):
find /home/vivek -type f -print | wc -l
To count all dirs including subdirs:
find . -type d -print | wc -l
To only count files in the given dir (no subdirs):
find /dest -maxdepth 1 -type f -print | wc -l
To only count dirs in the given dir (no subdirs):
find /path/to/foo -maxdepth 1 -type d -print | wc -l
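If you want the file and directory counts in one pass, GNU find's -printf can emit each entry's type; a small sketch (the %y format and -printf are GNU-specific, and /path/to/foo is just the example path from above):
find /path/to/foo -mindepth 1 -maxdepth 1 -printf '%y\n' | sort | uniq -c
This prints one line per type, e.g. "3 d" and "12 f" for three directories and twelve regular files.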

All your questions can be solved by looking into man find
-type f
no option necessary
-type d
-perm /u+w,g+w or some variation
-perm /u+r,g+r
-perm /u+x,g+x
-size 0
-name '.*'
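Putting those predicates together, a minimal sketch of the requested script might look like this (assumptions: the directory comes in as $1, only its top level is examined via -maxdepth 1, GNU find is available, and "allowing" read/write/execute is interpreted as any of the user or group bits being set):
#!/bin/sh
# sketch: report attributes of the files in the directory given as $1
dir=${1:?usage: directory name required}
echo "Regular files:            $(find "$dir" -maxdepth 1 -type f | wc -l)"
echo "Total number of entries:  $(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)"
echo "Directories:              $(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)"
echo "Files allowing write:     $(find "$dir" -maxdepth 1 -type f -perm /u+w,g+w | wc -l)"
echo "Files allowing read:      $(find "$dir" -maxdepth 1 -type f -perm /u+r,g+r | wc -l)"
echo "Files allowing execute:   $(find "$dir" -maxdepth 1 -type f -perm /u+x,g+x | wc -l)"
echo "Files of size 0:          $(find "$dir" -maxdepth 1 -type f -size 0 | wc -l)"
echo "Hidden files:             $(find "$dir" -maxdepth 1 -type f -name '.*' | wc -l)"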

Related

How to get the number of files with a specific extension in a directory and its subdirectories on the Linux terminal?

The question itself is self-explanatory.
I tried the following command, which I found somewhere on the internet, but it only shows the count for the immediate directory, not its subdirectories.
ls -lR ./*.jpg | wc -l
I am searching for all the files with the extension ".jpg" in the current folder and its subdirectories.
find . -type f -name '*.jpg' | wc -l
Find all the files (-type f) whose name matches '*.jpg', then count them with wc -l.
It's a job for find:
find -name "*.jpg" | wc -l
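If some file names could contain newlines, counting output lines gives the wrong number; with GNU find you can print a single character per match and count characters instead (a sketch, GNU-specific):
find . -type f -name '*.jpg' -printf '.' | wc -c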

Show only directories, not their contents with `find -type d | xargs ls`

I want to find some folders by name and then list their information using "ls". Here is what I did using "find":
find ./ -mindepth 1 -maxdepth 3 -type d -name logs
What I got is:
./RECHMN32Z/US/logs
./RECHMN32Z/UM/logs
./RECHMP3BL/US/logs
./RECHMP3BL/UM/logs
./RECHMAS86/UM/logs
./RECHMAS86/US/logs
Then I added "xargs ls -l", but that returns information about all the files under the folders returned above.
If I just want to list information about these folders themselves, how do I do that?
It's not find or xargs's fault, but ls's. When given directory names ls shows their contents. You can use -d to have it only show the directories themselves.
find has a -ls action that uses the same format as ls -dils. No need to invoke an external command.
find ./ -mindepth 1 -maxdepth 3 -type d -name logs -ls
Or use ls -ld to list the directories and not their contents. -exec cmd {} + is a simpler alternative to xargs. No pipeline required.
find ./ -mindepth 1 -maxdepth 3 -type d -name logs -exec ls -ld {} +
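If you do want to keep the xargs pipeline, pairing -print0 with xargs -0 (both GNU extensions) and ls -d handles directory names containing spaces or newlines; a sketch:
find ./ -mindepth 1 -maxdepth 3 -type d -name logs -print0 | xargs -0 ls -ld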

How to find the total lines of csv files in a directory on Linux?

Working with huge CSV files for data analytics, we commonly need to know the row count of all csv files located in a particular folder.
But how can this be done with just one command in Linux?
If you want to check the total line count of all the .csv files in a directory, you can use find and wc:
find . -type f -name '*.csv' -exec wc -l {} +
To get the line count for every file recursively, you can use Cesar's answer:
$ LANG=C find /tmp/test -type f -name '*.csv' -exec wc -l '{}' +
49 /tmp/test/sub/3.csv
22 /tmp/test/1.csv
419 /tmp/test/2.csv
490 total
To get the total line count for all files recursively:
$ LANG=C find /tmp/test -type f -name '*.csv' -exec cat '{}' + | wc -l
490
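If your CSVs carry a header row and you only want data rows, one rough approach is to subtract one line per file (a sketch; it assumes every .csv has exactly one header line and that the file names contain no newlines):
total=$(find /tmp/test -type f -name '*.csv' -exec cat '{}' + | wc -l)
files=$(find /tmp/test -type f -name '*.csv' | wc -l)
echo $(( total - files ))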

Ubuntu - Remove dir and ignore filetypes

I'm trying to create a cronjob for Ubuntu where:
all empty dirs should be removed
if a dir is not empty, it should still be removed if the only filetypes it contains are txt or csv files
Currently I have:
find /path -depth -exec rmdir {} \; 2>/dev/null
What do I need in order to delete the folders which only contain txt or csv files?
I don't want to delete all txt or csv files, just those folders which do not contain other filetypes.
Additional example:
Dir1
  SubDir1
    SubSubDir1
      File.txt
      File.csv
SubDir2
  SubSubDir2
    File.xml
SubSubDir1 should be deleted. Since SubDir1 and Dir1 are then empty, they should be deleted as well.
SubSubDir2 contains another filetype and should not be deleted.
You could list the number of files in a folder with something like:
find "$d" -maxdepth 1 -not -iname '*.csv' -a -not -iname '*.txt' | wc -l
If the folder is empty, or contains exclusively txt and csv files, this prints 1 (the directory itself always counts as one match).
And to list the folders depth-first, so that children are handled before their parents:
find /path -depth -type d
All in all, you may be able to achieve what you want with:
while IFS= read -r d
do
    if [ "$(find "$d" -maxdepth 1 -not -iname '*.csv' -a -not -iname '*.txt' | wc -l)" -eq 1 ]
    then
        rm -rf "$d"
    fi
done < <(find /path -depth -type d)
But I also advocate a check somewhere so your cron doesn’t wipe your storage without your consent.
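One easy check is a dry run that only prints the candidate directories instead of deleting them (same test as above with rm -rf swapped for echo; note that since nothing is actually removed, parents that would only become empty after their children are deleted will not show up in the dry run):
while IFS= read -r d
do
    if [ "$(find "$d" -maxdepth 1 -not -iname '*.csv' -a -not -iname '*.txt' | wc -l)" -eq 1 ]
    then
        echo "would remove: $d"
    fi
done < <(find /path -depth -type d)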

cat files in subdirectories using linux commands

I have the following directories:
P922_101
P922_102
.
.
Each directory, for instance P922_101, has the following subdirectories:
140311_AH8MHGADXX 140401_AH8CU4ADXX
Each subdirectory, for instance 140311_AH8MHGADXX, has the following files:
1_140311_AH8MH_P922_101_1.fastq.gz 1_140311_AH8MH_P922_101_2.fastq.gz
2_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_2.fastq.gz
And files in 140401_AH8CU4ADXX are:
1_140401_AH8CU_P922_101_1.fastq.gz 1_140401_AH8CU_P922_4001_2.fastq.gz
2_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_4001_2.fastq.gz
I want to do 'cat' for the files in the subdirectories in the following way:
cat 1_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_1.fastq.gz
1_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_101_1.fastq.gz > P922_101_1.fastq.gz
which means that files ending with _1.fastq.gz should be concatenated into a single file and files ending with _2.fastq.gz into another file.
It should be run for all files in the subdirectories of all directories. Could someone give a Linux solution for this?
Since they're compressed, you should probably use gzip -dc (decompress and write to stdout) -
find /somePath -type f -name "*.fastq.gz" -exec gzip -dc {} \; | \
tee -a /someOutFolder/out.txt
You can use find for this:
find /top/path -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > one_file
find /top/path -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} \; > another_file
This will look for all files under /top/path whose names match the pattern *_1.fastq.gz / *_2.fastq.gz and cat them into the desired file. -mindepth 2 makes find only consider files that are at least two levels below the starting point; this way, files sitting directly in /top/path won't be matched.
Note that plain cat is fine here because concatenated .gz files form a valid multi-stream .gz file; you would only need zcat instead of cat if you wanted the combined output uncompressed.
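A quick way to convince yourself of that (a throwaway example; a.gz and b.gz are made-up names):
printf 'first\n' | gzip > a.gz
printf 'second\n' | gzip > b.gz
cat a.gz b.gz > both.gz
zcat both.gz
The last command prints both lines, because gzip tools read all the concatenated members.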
As you keep adding details in comments, let's see what else we can do:
Say you have the list of directories in a file directories_list, one per line:
while IFS= read -r directory
do
    find "$directory" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > "$directory/output"
done < directories_list
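To get exactly the output naming from the question, a sketch that loops over the top-level sample directories (assuming they all sit under /top/path and are named P922_*; the combined files are written to the current directory, and the .gz parts are concatenated directly as discussed above):
for sample in /top/path/P922_*/
do
    name=$(basename "$sample")
    find "$sample" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} + > "${name}_1.fastq.gz"
    find "$sample" -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} + > "${name}_2.fastq.gz"
done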
