Counting files contained in a directory - linux

How can I count all files, hidden files, directories, hidden directories, sub-directories, hidden sub-directories and (symbolic) links in a given directory using bash?

find . | wc -l
This will count each symlink as a file. To traverse symlinks, counting their contents, use:
find -L . | wc -l
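If you want the totals broken down by type, a sketch along these lines works (GNU find assumed; -mindepth 1 excludes the starting directory itself from the counts):
find . -mindepth 1 -type f | wc -l   # regular files, hidden ones included
find . -mindepth 1 -type d | wc -l   # directories and sub-directories
find . -mindepth 1 -type l | wc -l   # symbolic links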

find . -print0 | tr -cd '\0' | wc -c
This handles filenames with newline characters.
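With GNU find the same newline-safe count can be had without tr, since -printf can emit exactly one character per entry (a sketch):
find . -printf x | wc -c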

This does it:
find the_directory | wc -l
This works by finding every entry under the_directory and counting the resulting lines.

You can also use
tree
it gives you a count at the end. I don't know how its speed compares with find. Lazily:
tree | tail -1
easier to type than find :-)
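Note that tree skips hidden entries by default; if they should be counted too, -a includes them (a quick sketch):
tree -a | tail -1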

Related

How to list files in a directory, sorted by size, but without listing folder sizes?

I'm writing a bash script that should output the 10 largest files in a directory and all its subfolders (the directory is passed to the script as an argument).
And for this I use the following command:
sudo du -haS ../../src/ | sort -hr
but its output also contains folder sizes, and I only need files. Help!
Why using du at all? You could do a
ls -S1AF
This will list all entries in the current directory, sorted by size in descending order. It will also include the names of the subdirectories, but since the reported size of a directory is only the size of the directory entry itself (usually just a few kilobytes), they will typically appear near the end, and you can recognize them because they have a slash at the end.
To exclude those directories and pick the first 10 lines, you can do a
ls -S1AF | head -n 10 | grep -v '/$'
UPDATE:
If your directory contains not only subdirectories, but also files of length zero, some of those empty files might not be shown in the output, as pointed out in the comment by F.Hauri. If this is an issue for your application, I suggest exchanging the order and doing a
ls -S1AF | grep -v '/$' | head -n 10
instead.
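Applied to the directory from the question, that would be for instance (a sketch; note that ls does not recurse into subfolders, so this only covers the top level of ../../src/):
ls -S1AF ../../src/ | grep -v '/$' | head -n 10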
Would you please try the following:
dir="../../src/"
sudo find "$dir" -type f -printf "%s\t%p\n" | sort -nr | head -n 10 | cut -f2-
find "$dir" -type f searches $dir for files recursively.
The -printf "%s\t%p\n" option tells find to print the filesize
and the filename delimited by a tab character.
The final cut -f2- in the pipeline prints the 2nd and the following
columns, dropping the filesize column only.
It will work with the filenames which contain special characters such as a whitespace except for
a newline character.
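If the filenames may contain newlines as well, a NUL-delimited variant is a possible sketch (this assumes a reasonably recent GNU userland, where sort, head and cut all accept -z; the trailing tr just makes the result readable on a terminal):
sudo find "$dir" -type f -printf "%s\t%p\0" | sort -z -nr | head -z -n 10 | cut -z -f2- | tr '\0' '\n'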

How to know which file holds grep result?

There is a directory which contains 100 text files. I used grep to search a given text in the directory as follow:
cat *.txt | grep Ya_Mahdi
and grep shows Ya_Mahdi.
I need to know which file holds the text. Is it possible?
Just get rid of cat and provide the list of files to grep:
grep Ya_Mahdi *.txt
While this would generally work, depending on the number of .txt files in that folder, the argument list for grep might get too large.
You can use find for a bulletproof solution:
find . -maxdepth 1 -name '*.txt' -exec grep -H Ya_Mahdi {} +
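If you only need the names of the matching files rather than the matching lines, grep's -l option prints just the filenames (a sketch):
grep -l Ya_Mahdi *.txt
or, combined with the find variant:
find . -maxdepth 1 -name '*.txt' -exec grep -l Ya_Mahdi {} +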

Counting Amount of Files in Directory Including Hidden Files with BASH

I want to count the number of files in the directory I am currently in (including hidden files). So far I have this:
ls -1a | wc -l
but I believe this returns 2 more than what I want because it also counts "." (the current directory) and ".." (the directory above this one) as files. How would I go about getting the correct count?
I believe that to count all files / directories / hidden files you can also use a BASH array like this:
shopt -s nullglob dotglob
cd /whatever/path
arr=( * )
count="${#arr[@]}"
This also works with filenames that contain space or newlines.
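If only regular files should be counted (skipping the subdirectories), a loop over the same glob is a possible sketch:
shopt -s nullglob dotglob
files=()
for f in /whatever/path/*; do
  [ -f "$f" ] && files+=( "$f" )   # -f matches regular files (and symlinks to them)
done
echo "${#files[@]}"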
Edit:
ls piped to wc is not the right tool for that job. This is because filenames in UNIX can contain newlines as well. This would lead to counting them multiple times.
Following @gniourf_gniourf's comment (thanks!) the following command will handle newlines in file names correctly and should be used:
find -mindepth 1 -maxdepth 1 -printf x | wc -c
The find command lists files in the current directory - including hidden files, excluding the . and .. because of -mindepth 1. It works non-recursively because of -maxdepth 1.
The -printf x action simply prints an x for each file in the directory which leads to an output like this:
xxxxxxxx
Piped to wc -c (-c means counting characters) you get your final result.
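To use the result in a script, capture it in a variable (a sketch):
count=$(find -mindepth 1 -maxdepth 1 -printf x | wc -c)
echo "$count"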
Former Answer:
Use the following command:
ls -1A | wc -l
-a includes all entries starting with a dot, but also counts the current folder . and the parent folder .., whereas -A includes the dot-entries while excluding those two.
I suggest having a look at man ls
You almost got it right:
ls -1A | wc -l
If your filenames contain newlines or other funny characters, do:
find -type f -ls | wc -l

Finding and Listing Duplicate Words in a Plain Text file

I have a rather large file that I am trying to make sense of.
I generated a list of my entire directory structure that contains a lot of files using the du -ah command.
The result basically lists all the folders under a specific folder, and then the files inside each of those folders, in plain text format.
eg:
4.0G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC/B119_C004_0918XJ_003.R3D
3.1G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC/B119_C004_0918XJ_004.R3D
15G ./REEL_02/SCANS/200113/001/Promise Pegasus/BMB 10/RED EPIC DATA/R3D/18-09-12/CAM B/B119_0918NO/B119_0918NO.RDM/B119_C004_0918XJ.RDC
Is there any command that I can run or utility that I can use that will help me identify whether there is more than one record of the same filename (usually the last 16 characters in each line plus the extension) and, if such duplicate entries exist, write out the entire path (the full line) to a different text file, so I can find and move the duplicate files off my NAS using a script or something.
Please let me know, as this is incredibly stressful to do by hand when the plain-text file itself is 5.2 MB :)
Split each line on /, get the last item (cut cannot do it, so reverse each line and take the first field), then sort and run uniq with -d, which shows the duplicates.
rev FILE | cut -f1 -d/ | rev | sort | uniq -d
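An equivalent sketch with awk, taking the last /-separated field directly:
awk -F/ '{ print $NF }' FILE | sort | uniq -d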
I'm not entirely sure what you want to achieve here, but I have the feeling that you are doing it in a difficult way anyway :) Your text file seems to contain filenames with spaces, which makes it hard to parse.
I take it that you want to find all files whose name is duplicate. I would start with something like:
find DIR -type f -printf '%f\n' | sort | uniq -d
That means
DIR - look for files in this directory
-type f - print only files (not directories or other special files)
-printf '%f\n' - do not use the default find output format, print only the file name of each file
sort - bring identical names next to each other, since uniq only detects adjacent duplicates
uniq -d - print only lines which occur multiple times
You may want to list only some files, not all of them. You can limit which files are taken into account by adding more rules to find. If you care only about *.R3D and *.RDC files you can use
find . \( -name '*.RDC' -o -name '*.R3D' \) -type f -printf '%f\n' | ...
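Combined with the rest of the pipeline above, that becomes (a sketch):
find . \( -name '*.RDC' -o -name '*.R3D' \) -type f -printf '%f\n' | sort | uniq -d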
If I wrongly guessed what you need, sorry :)
I think you are looking for fslint: http://www.pixelbeat.org/fslint/
It can find duplicate files, broken links, and stuff like that.
The following will scan the current directory and its subdirectories (using find) and print the full path of every file whose name appears more than once. You can adapt it to take a different action, e.g. delete/move the duplicate files.
while IFS="|" read FNAME LINE; do
    # FNAME contains the filename (without dir), LINE contains the full path
    if [ "$PREV" != "$FNAME" ]; then
        PREV="$FNAME"                # new filename found. store
    else
        echo "Duplicate : $LINE"     # duplicate filename. Do something with it
    fi
done < <(find . -type f -printf "%f|%p\n" | sort -s)
To try it out, simply copy paste that into a bash shell or save it as a script.
Note that:
due to the sort, the whole list of files has to be gathered before the loop begins, so performance will be affected by the number of files returned
the order in which the files appear after the sort affects which file is treated as the original, since the first occurrence is assumed to be the original. The -s option ensures a stable sort, which means that, among equal names, the order is dictated by find.
A more straightforward but less robust approach would be something along the lines of:
find . -type f -printf "%20f %p\n" | sort | uniq -D -w20 | cut -c 22-
That will print all files that have duplicate entries, assuming that no filename is longer than 20 characters. The output differs from the solution above in that all entries with the same name are listed (not N-1 entries as above).
You'll need to change the numbers in the find, uniq and cut commands to match the actual case. A number too small may result in false positives.
find . -type f -printf "%20f %p\n"    Find all files in the current dir and its
                                      subdirs, and print the filename (padded to
                                      20 characters) followed by the full path
sort                                  Sort the output
uniq -D -w20                          Print all entries that have duplicates, but
                                      only look at the first 20 chars
cut -c 22-                            Discard the first 21 chars of each line
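For example, to allow for filenames up to 64 characters, the same pipeline would read (a sketch with the numbers adjusted accordingly):
find . -type f -printf "%64f %p\n" | sort | uniq -D -w64 | cut -c 66-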

List of files to check if present

I have a large list of files, and I need to check to see whether they are somewhere on my linux server. Some of them may be and some of them may not.
Is there a command line tool to do this?
Or must I resort to looping find in a shell script?
There is another alternative, which relies on using find. The idea is to run find once, save all the filenames and then compare them to the list of files.
First, the list of files must be sorted: let us call it /tmp/sortedFiles.txt
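A sketch for producing that sorted list, assuming the raw list of names is in a file called fileList.txt (the name is just an example), one name per line:
sort -u fileList.txt > /tmp/sortedFiles.txt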
run
find / -type f | xargs -n1 -I# basename '#' | sort -u > /tmp/foundFiles.txt
now compare them, and print only those in the first file but not in the second
comm -23 /tmp/sortedFiles.txt /tmp/foundFiles.txt
This will tell you the ones that are not on the computer.
If you want the ones that are on the computer, use:
comm -12 /tmp/sortedFiles.txt /tmp/foundFiles.txt
This will tell you the ones that are on the computer. The disadvantage is that you don't know where they are. :)
Alternatively run find:
find / -type f > /tmp/allFiles.txt
then iterate using grep, making sure you match the end of the line from the last /
cat /tmp/filesToFind.txt | xargs -n1 -I# egrep '/#$' /tmp/allFiles.txt
This will print only the locations of the files found, but will not print those that are not found.
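Building on the same allFiles.txt, a sketch that also reports the names that were not found (same caveat as the egrep line above: the names are treated as regular expressions):
while IFS= read -r name; do
  if grep -q "/$name\$" /tmp/allFiles.txt; then
    echo "FOUND:   $name"
  else
    echo "MISSING: $name"
  fi
done < /tmp/filesToFind.txt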
--dmg
I assume you have a list of filenames without paths (all unique). I would suggest using locate.
assuming you have the file with the filenames: files.txt
cat files.txt | xargs -n1 -I# locate -b '\#' | xargs -n1 -I# basename '#' | sort -u > found.txt
then just diff the files (files.txt needs to be sorted as well for this):
diff files.txt found.txt
Oh, one clarification: this will tell you whether the files EXIST on your computer, not where :)
If you want to know where, simply run:
cat files.txt | xargs -n1 -I# locate -b '\#'
--dmg
If you do the loop, it's better to use locate instead of find. It's faster!
If lista contains the file names, you can use:
cat lista | xargs locate
