Command to check for a very large number of .gz files in a directory [duplicate] - linux

Below are the current day's files. The previous day's files were converted to .gz by the system. I want to find the total count of specific .gz files from the previous day. I tried the commands below, which give me an error. Please suggest a fix.
bash-3.2$ ls -lrth|tail
299K Mar 23 2017 N08170323091903766
333K Mar 23 2017 N08170323091903771
328K Mar 23 2017 N09170323091903776
367K Mar 23 2017 N09170323091903782
347K Mar 23 2017 N04170323092003784
368K Mar 23 2017 N08170323092003783
bash-3.2$ ls -lrth N08170322*|wc -l
bash: /usr/bin/ls: Arg list too long
0
bash-3.2$ zcat N08170322*.gz|wc -l
bash: /usr/bin/zcat: Arg list too long
0

This is happening because you have too many files in the directory.
You can easily get around the first issue:
ls | grep -c N08170322
or, to be even more precise:
ls | grep -c '^N08170322'
would give you the count of matching files. However, a better way to do this is:
find . -name "N08170322*" -exec ls {} + | wc -l
which will address the ls parsing issue mentioned in @hek2mgl's comment.
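If all you need is the count (not an ls listing), you can also let find print the names itself and count them; a minimal sketch, assuming your find supports -maxdepth so the search stays in the current directory the way ls would:
find . -maxdepth 1 -name "N08170322*" | wc -l
This still overcounts if a filename contains a newline, which is unlikely for machine-generated names like these.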
If you really want to count the lines of all the zipped files in one shot, you can do this:
find . -name "N08170322*" -exec zcat {} + | wc -l
See also:
Argument list too long error for rm, cp, mv commands

Use this
find . -name "N08170322*" -exec ls {} \; | wc -l
As explained in the other answer, you are getting "argument list too long" because there are too many files in the directory. To overcome it, you can combine the command with find and -exec.
Edit: Created use case to check if command works with/without ls
These are the 3 empty files I created.
$ find -name "file*" -exec ls {} \;
./file1
./file2
./file3
Running wc -l without ls prints the number of lines in each file.
$ find -name "file*" -exec wc -l {} \;
0 ./file1
0 ./file2
0 ./file3
Running it with ls gives the count of files, which is what the OP wants.
$ find -name "file*" -exec ls {} \; | wc -l
3

Related

How to redirect output of xargs when using sed

Since switching over to a better management system, I want to remove all the redundant logs at the top of each of our source files. In Notepad++ I was able to achieve the result by using "replace in files" and replacing matches of \A(//.*\n)+ with blank. On Linux, however, I am having no such luck and need to resort to 'xargs' and 'sed'.
The sed expression I'm using is:
sed '1,/^[^\/]/{/^[^\/]/b; d}'
Ugly to be sure but it does seem to work.
The problem I'm having is when I try to run that through 'xargs' in order to feed it all the source files in our system I am unable to redirect the output to 'stripped' files, which I then intend to copy over the originals.
I want something in the line of:
find . -name "*.com" -type f -print0 | xargs -0 -I file sed '1,/^[^\/]/{/^[^\/]/b; d}' "file" > "file.stripped"
However I'm having grief passing the ">" through to the receiving environment (shell) as I'm already using too many quote marks. I have tried all manner of escaping and shell "wrappers" but I just can't get it to play ball.
Anyone care to point me in the right direction?
Thanks,
Slarti.
I made a similar scenario with a simpler sed expression just as an example, see if it works for you:
I created 3 files with the string "abcd" inside each:
# ls -l
total 12
-rw-r--r-- 1 root root 5 Oct 6 09:05 test.aaaaa.com
-rw-r--r-- 1 root root 5 Oct 6 09:05 test2.aaaaa.com
-rw-r--r-- 1 root root 5 Oct 6 09:05 test3.aaaaa.com
# cat test*
abcd
abcd
abcd
Running the find command as you showed, but using the -exec option instead of xargs, and replacing your sed expression with a silly one that simply replaces every "a" with "b", plus the -i option, which writes directly to the input file:
# find . -name "*.com" -type f -print0 -exec sed -i 's/a/b/g' {} \;
./test2.aaaaa.com./test3.aaaaa.com./test.aaaaa.com
# cat test*
bbcd
bbcd
bbcd
In your case it should look like this:
# find . -name "*.com" -type f -print0 -exec sed -i '1,/^[^\/]/{/^[^\/]/b; d}' {} \;
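If you do want separate .stripped copies rather than in-place edits, one way around the quoting problem is to let find start a small shell that performs the redirection for each file; a sketch (untested) reusing your original sed expression:
find . -name "*.com" -type f -exec sh -c 'for f; do sed "1,/^[^\/]/{/^[^\/]/b; d}" "$f" > "$f.stripped"; done' sh {} +
The trailing sh just fills $0 inside the inner shell; the matched filenames become its positional parameters.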

listing files and copy it in unix

The purpose is to copy files generated in last 2 hours. Here is the small script:
a=`find . -mmin -120 -ls`
cp $a /tmp/
echo $a
401 1 drwxr-x--- 2 oracle oinstall 1024 Mar 26 11:00 . 61
5953 -rw-r----- 1 oracle oinstall 6095360 Mar 26 11:00 ./file1
5953 -rw-r----- 1 oracle oinstall 6095360 Mar 26 11:00 ./file2
I get the following error:
cp: invalid option -- 'w'
Try `cp --help' for more information.
How can I fix the script ?
The -ls is giving you ls-style output. Try dropping it and you should just get the relative path to each file, which should be more like what you want. Or see Biffen's comment on your question; that seems like the approach I would have taken.
One problem is that -ls will print a lot of things besides the filenames, and they will be passed to cp, which will confuse it. So the first thing to do is to stop using -ls. (In the future you can use set -x to see what gets executed; it should help you debug this type of problem.)
Another problem is that the output of find can contain spaces and other things (imagine a file named $(rm -r *)) that can't simply be passed as arguments to cp.
I see three different solutions:
Use a single find command with -exec:
find . -mmin -120 -exec cp {} /tmp/ \;
Use xargs:
find . -mmin -120 -print0 | xargs -0 cp -t /tmp/
(Note the use of -t with cp to account for the swapped argument order.)
Iterate over the output of find:
while IFS='' read -r -d '' file
do
cp "${file}" /tmp/
done < <( find . -mmin -120 -print0 )
(Caveat: I haven't tested any of the above.)
All you have to do is to extract only the filenames. So, change the find command to the following:
a=`find . -mmin -120 -type f`
cp $a /tmp/
The above find command captures only files, and only those modified in the last 120 minutes. Or do it with a single find command like the one below:
find . -mmin -120 -type f -exec cp '{}' /tmp/ \;
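If your cp supports -t (GNU coreutils does), you can also batch the copies so cp is started only a few times instead of once per file; a sketch:
find . -mmin -120 -type f -exec cp -t /tmp/ {} +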

Find command in linux

What does the following means ?
find myDirectory -name myFile -exec ls \-ln {} \;
I've looked here but didn't understand exactly
-exec command True if the executed command returns a zero value as exit status. The end of command must be punctuated by an escaped semicolon. A command argument {} is replaced by the current path name.
This part -exec ls \-ln {} \; is not clear to me .
Regards
That means: find all files with a name myFile in the current directory and all its subdirectories and for every file that was found run ls -ln with the name of the file.
For example:
$ mkdir a
$ touch myFile a/myFile
$ find -name myFile -exec ls -ln {} \;
-rw-r--r-- 1 1000 1000 0 Jun 17 13:07 ./myFile
-rw-r--r-- 1 1000 1000 0 Jun 17 13:07 ./a/myFile
In this case find will run ls twice:
ls -ln ./myFile
ls -ln ./a/myFile
Every time it will expand {} as the fullname of the found file.
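As an aside, if you terminate -exec with + instead of \;, find batches the found names and runs ls only once (or as few times as needed); a sketch:
find myDirectory -name myFile -exec ls -ln {} +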
Also, I must add that you do not need the backslash before -ln in this case. Yes, you can use it, but it is absolutely useless here.
find myDirectory -name myFile -exec ls \-ln {} \;
It says: find myFile in the directory myDirectory, and for every file that is found, execute the file-listing command, which in Linux is ls, with the options -l and -n, on that file.
So, ultimately you will get every myFile that was found together with its ls output.

Recursively counting files in a Linux directory

How can I recursively count files in a Linux directory?
I found this:
find DIR_NAME -type f ¦ wc -l
But when I run this it returns the following error.
find: paths must precede expression: ¦
This should work:
find DIR_NAME -type f | wc -l
Explanation:
-type f to include only files.
| (and not ¦) pipes the find command's standard output into the wc command's standard input.
wc (short for word count) counts newlines, words and bytes on its input (docs).
-l to count just newlines.
Notes:
Replace DIR_NAME with . to execute the command in the current folder.
You can also remove the -type f to include directories (and symlinks) in the count.
It's possible this command will overcount if filenames can contain newline characters.
Explanation of why your example does not work:
In the command you showed, you do not use the "Pipe" (|) to kind-of connect two commands, but the broken bar (¦) which the shell does not recognize as a command or something similar. That's why you get that error message.
For the current directory:
find -type f | wc -l
If you want a breakdown of how many files are in each dir under your current dir:
for i in */ .*/ ; do
echo -n $i": " ;
(find "$i" -type f | wc -l) ;
done
That can go all on one line, of course. The parentheses clarify whose output wc -l is supposed to be counting (find "$i" -type f in this case).
On my computer, rsync is a little bit faster than find | wc -l in the accepted answer:
$ rsync --stats --dry-run -ax /path/to/dir /tmp
Number of files: 173076
Number of files transferred: 150481
Total file size: 8414946241 bytes
Total transferred file size: 8414932602 bytes
The second line has the number of files, 150,481 in the above example. As a bonus you get the total size as well (in bytes).
Remarks:
the first line is a count of files, directories, symlinks, etc. all together, which is why it is bigger than the second line.
the --dry-run (or -n for short) option is important to not actually transfer the files!
I used the -x option to "don't cross filesystem boundaries", which means if you execute it for / and you have external hard disks attached, it will only count the files on the root partition.
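If you only want the number itself, you can extract it from the stats; a sketch, assuming the output format shown above (newer rsync versions word and punctuate these lines slightly differently):
rsync --stats --dry-run -ax /path/to/dir /tmp | awk -F': ' '/Number of files transferred/{print $2}'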
You can use
$ tree
after installing the tree package with
$ sudo apt-get install tree
(on a Debian / Mint / Ubuntu Linux machine).
The command shows not only the count of the files, but also the count of the directories, separately. The option -L can be used to specify the maximum display level (which, by default, is the maximum depth of the directory tree).
Hidden files can be included too by supplying the -a option.
Since filenames in UNIX may contain newlines (yes, newlines), wc -l might count too many files. I would print a dot for every file and then count the dots:
find DIR_NAME -type f -printf "." | wc -c
Note: The -printf option only works with find from GNU findutils. You may need to install it, on a Mac for example.
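If your find lacks -printf, a portable (though slower, since it starts one printf process per file) sketch of the same idea:
find DIR_NAME -type f -exec printf '.' \; | wc -c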
Combining several of the answers here together, the most useful solution seems to be:
find . -maxdepth 1 -type d -print0 |
xargs -0 -I {} sh -c 'echo -e $(find "{}" -printf "\n" | wc -l) "{}"' |
sort -n
It can handle odd things like file names that include spaces, parentheses and even newlines. It also sorts the output by the number of files.
You can increase the number after -maxdepth to get sub directories counted too. Keep in mind that this can potentially take a long time, particularly if you have a highly nested directory structure in combination with a high -maxdepth number.
If you want to know how many files and sub-directories exist from the present working directory you can use this one-liner
find . -maxdepth 1 -type d -print0 | xargs -0 -I {} sh -c 'echo -e $(find {} | wc -l) {}' | sort -n
This works with the GNU flavour; just omit the -e from the echo command for BSD systems (e.g. OSX).
You can use the command ncdu. It will recursively count how many files a Linux directory contains and show the result interactively. It also has a progress bar, which is convenient if you have many files.
To install it on Ubuntu:
sudo apt-get install -y ncdu
Benchmark: I used https://archive.org/details/cv_corpus_v1.tar (380390 files, 11 GB) as the folder where one has to count the number of files.
find . -type f | wc -l: around 1m20s to complete
ncdu: around 1m20s to complete
If what you need is to count a specific file type recursively, you can do:
find YOUR_PATH -name '*.html' -type f | wc -l
-l is just to display the number of lines in the output.
If you need to exclude certain folders, use -not -path
find . -not -path './node_modules/*' -name '*.js' -type f | wc -l
tree $DIR_PATH | tail -1
Sample Output:
5309 directories, 2122 files
If you want to avoid error cases, don't allow wc -l to see files with newlines (which it will count as 2+ files)
e.g. Consider a case where we have a single file with a single EOL character in it
> mkdir emptydir && cd emptydir
> touch $'file with EOL(\n) character in it'
> find -type f
./file with EOL(?) character in it
> find -type f | wc -l
2
Since at least GNU wc does not appear to have an option to read/count a null-terminated list (except from a file), the easiest solution is just to not pass it filenames, but a static output each time a file is found, e.g. in the same directory as above
> find -type f -exec printf '\n' \; | wc -l
1
Or if your find supports it
> find -type f -printf '\n' | wc -l
1
To determine how many files there are in the current directory, put in ls -1 | wc -l. This uses wc to do a count of the number of lines (-l) in the output of ls -1. It doesn't count dotfiles. Please note that ls -l (that's an "L" rather than a "1" as in the previous examples), which I used in previous versions of this HOWTO, will actually give you a file count one greater than the actual count. Thanks to Kam Nejad for this point.
If you want to count only files and NOT include symbolic links (just an example of what else you could do), you could use ls -l | grep -v ^l | wc -l (that's an "L" not a "1" this time, we want a "long" listing here). grep checks for any line beginning with "l" (indicating a link), and discards that line (-v).
Relative speed: "ls -1 /usr/bin/ | wc -l" takes about 1.03 seconds on an unloaded 486SX25 (/usr/bin/ on this machine has 355 files). "ls -l /usr/bin/ | grep -v ^l | wc -l" takes about 1.19 seconds.
Source: http://www.tldp.org/HOWTO/Bash-Prompt-HOWTO/x700.html
With bash:
Create an array of entries with ( ) and get the count with ${#...[@]}.
FILES=(./*); echo ${#FILES[@]}
Ok that doesn't recursively count files but I wanted to show the simple option first. A common use case might be for creating rollover backups of a file. This will create logfile.1, logfile.2, logfile.3 etc.
CNT=(./logfile*); mv logfile logfile.${#CNT[@]}
Recursive count with bash 4+ globstar enabled (as mentioned by @tripleee)
FILES=(**/*); echo ${#FILES[@]}
To get the count of files recursively we can still use find in the same way.
FILES=(`find . -type f`); echo ${#FILES[@]}
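A note on the globstar option mentioned above: it is off by default in bash 4+, so enable it first; a minimal sketch (nullglob keeps the array empty instead of containing a literal **/* when nothing matches):
shopt -s globstar nullglob
FILES=(**/*); echo ${#FILES[@]}
Keep in mind that **/* matches directories as well as files.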
For directories with spaces in the name ... (based on various answers above) -- recursively print directory name with number of files within:
find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done
Example (formatted for readability):
pwd
/mnt/Vancouver/Programming/scripts/claws/corpus
ls -l
total 8
drwxr-xr-x 2 victoria victoria 4096 Mar 28 15:02 'Catabolism - Autophagy; Phagosomes; Mitophagy'
drwxr-xr-x 3 victoria victoria 4096 Mar 29 16:04 'Catabolism - Lysosomes'
ls 'Catabolism - Autophagy; Phagosomes; Mitophagy'/ | wc -l
138
## 2 dir (one with 28 files; other with 1 file):
ls 'Catabolism - Lysosomes'/ | wc -l
29
The directory structure is better visualized using tree:
tree -L 3 -F .
.
├── Catabolism - Autophagy; Phagosomes; Mitophagy/
│   ├── 1
│   ├── 10
│   ├── [ ... SNIP! (138 files, total) ... ]
│   ├── 98
│   └── 99
└── Catabolism - Lysosomes/
├── 1
├── 10
├── [ ... SNIP! (28 files, total) ... ]
├── 8
├── 9
└── aaa/
└── bbb
3 directories, 167 files
man find | grep mindep
-mindepth levels
Do not apply any tests or actions at levels less than levels
(a non-negative integer). -mindepth 1 means process all files
except the starting-points.
ls -p | grep -v / (used below) is from answer 2 at https://unix.stackexchange.com/questions/48492/list-only-regular-files-but-not-directories-in-current-directory
find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Catabolism - Lysosomes: 28
./Catabolism - Lysosomes/aaa: 1
Application: I want to find the max number of files among several hundred directories (all depth = 1) [output below again formatted for readability]:
date; pwd
Fri Mar 29 20:08:08 PDT 2019
/home/victoria/Mail/2_RESEARCH - NEWS
time find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n $i": " ; ls -p "$i" | grep -v / | wc -l ; done > ../../aaa
0:00.03
[victoria@victoria 2_RESEARCH - NEWS]$ head -n5 ../../aaa
./RNA - Exosomes: 26
./Cellular Signaling - Receptors: 213
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Stress - Physiological, Cellular - General: 261
./Ancient DNA; Ancient Protein: 34
[victoria@victoria 2_RESEARCH - NEWS]$ sed -r 's/(^.*): ([0-9]{1,8}$)/\2: \1/g' ../../aaa | sort -V | (head; echo ''; tail)
0: ./Genomics - Gene Drive
1: ./Causality; Causal Relationships
1: ./Cloning
1: ./GenMAPP 2
1: ./Pathway Interaction Database
1: ./Wasps
2: ./Cellular Signaling - Ras-MAPK Pathway
2: ./Cell Death - Ferroptosis
2: ./Diet - Apples
2: ./Environment - Waste Management
988: ./Genomics - PPM (Personalized & Precision Medicine)
1113: ./Microbes - Pathogens, Parasites
1418: ./Health - Female
1420: ./Immunity, Inflammation - General
1522: ./Science, Research - Miscellaneous
1797: ./Genomics
1910: ./Neuroscience, Neurobiology
2740: ./Genomics - Functional
3943: ./Cancer
4375: ./Health - Disease
sort -V is a natural sort. ... So, my max number of files in any of those (Claws Mail) directories is 4375 files. If I left-pad (https://stackoverflow.com/a/55409116/1904943) those filenames -- they are all named numerically, starting with 1, in each directory -- and pad to 5 total digits, I should be ok.
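For the left-padding itself, a sketch (untested) that zero-pads the purely numeric filenames in one directory to 5 digits:
for f in [0-9]*; do [[ $f =~ ^[0-9]+$ ]] && mv -n "$f" "$(printf '%05d' "$((10#$f))")"; done
The 10# forces base-10 interpretation (so any existing leading zeros are not read as octal), and mv -n refuses to overwrite an existing file rather than clobbering it.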
Addendum
Find the total number of files, subdirectories in a directory.
$ date; pwd
Tue 14 May 2019 04:08:31 PM PDT
/home/victoria/Mail/2_RESEARCH - NEWS
$ ls | head; echo; ls | tail
Acoustics
Ageing
Ageing - Calorie (Dietary) Restriction
Ageing - Senescence
Agriculture, Aquaculture, Fisheries
Ancient DNA; Ancient Protein
Anthropology, Archaeology
Ants
Archaeology
ARO-Relevant Literature, News
Transcriptome - CAGE
Transcriptome - FISSEQ
Transcriptome - RNA-seq
Translational Science, Medicine
Transposons
USACEHR-Relevant Literature
Vaccines
Vision, Eyes, Sight
Wasps
Women in Science, Medicine
$ find . -type f | wc -l
70214 ## files
$ find . -type d | wc -l
417 ## subdirectories
There are many correct answers here. Here's another!
find . -type f | sort | uniq -w 10 -c
where . is the folder to look in and 10 is the number of leading characters used to group the paths (roughly, by directory prefix).
I have written ffcnt to speed up recursive file counting under specific circumstances: rotational disks and filesystems that support extent mapping.
It can be an order of magnitude faster than ls or find based approaches, but YMMV.
Suppose you want a per-directory total of files; try:
for d in `find YOUR_SUBDIR_HERE -type d`; do
printf "$d - files > "
find $d -type f | wc -l
done
for current dir try this:
for d in `find . -type d`; do printf "$d - files > "; find $d -type f | wc -l; done;
If you have names containing spaces you need to change IFS, like this:
OIFS=$IFS; IFS=$'\n'
for d in `find . -type d`; do printf "$d - files > "; find $d -type f | wc -l; done
IFS=$OIFS
We can use the tree command; it displays all the files and folders recursively, and it shows the count of folders and files in the last line of its output.
$ tree path/to/folder/
path/to/folder/
├── a-first.html
├── b-second.html
├── subfolder
│ ├── readme.html
│ ├── code.cpp
│ └── code.h
└── z-last-file.html
1 directories, 6 files
For only the last line of the tree output, we can use the tail command on it:
$ tree path/to/folder/ | tail -1
1 directories, 6 files
To install tree we can use the command below:
$ sudo apt-get install tree
This alternate approach with filtering for format counts all available grub kernel modules:
ls -l /boot/grub/*.mod | wc -l
Based on the responses and comments given above, I've come up with the following file count listing. It is essentially a combination of the solution provided by @Greg Bell with comments from @Arch Stanton and @Schneems.
Count all files in the current directory & subdirectories
function countit { find . -maxdepth 1000000 -type d -print0 | while IFS= read -r -d '' i ; do file_count=$(find "$i" -type f | wc -l) ; echo "$file_count: $i" ; done }; countit | sort -n -r >file-count.txt
Count all files of given name in the current directory & subdirectories
function countit { find . -maxdepth 1000000 -type d -print0 | while IFS= read -r -d '' i ; do file_count=$(find "$i" -type f | grep <enter_filename_here> | wc -l) ; echo "$file_count: $i" ; done }; countit | sort -n -r >file-with-name-count.txt
find -type f | wc -l
OR (If directory is current directory)
find . -type f | wc -l
This will work completely fine; it is simple and short. If you want to count the number of files present in a single folder (non-recursively), use:
ls | wc -l
ls -l | grep -e -x -e -dr | wc -l
long list
filter files and dirs
count the filtered lines

Recursively traverse Samba shares?

With bash on Linux, how would I write a command to recursively traverse mounted shares, run commands on each file to get the file type, size, permissions etc., and then output all of this to a file?
A CIFS share mount would look like a regular directory tree in the linux shell.
The command to search as you need is therefore generic.
From the base directory,
find . -type f -exec ls -lsrt {} \; > file.txt
OK, this does not give you the file-type detail;
that can be done by running file on each filename with another -exec.
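For example, to get both the long listing and the file type for every file in one pass, find accepts more than one -exec; a sketch:
find . -type f -exec ls -lsrt {} \; -exec file {} \; > file.txt
Each file then shows up in file.txt as an ls line followed by a file line.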
mount -v | grep smbfs | awk '{print $3}' | xargs ls -lsR
which you can redirect to a file.
mount -v | awk '/smbfs/{
cmd="ls -lsR "$3
while((cmd | getline d)>0){
print d "->file "$3
}
close(cmd)
}'
find $(mount -t smbfs | awk '{print $3}') -mount -type f -ls -execdir file {} \;
...
33597911 4 -rw-rw-r-- 2 peter peter 5 Dec 6 00:09 ./test.d\ ir/base
./base: ASCII text
3662 4 -rw-rw-r-- 2 peter peter 4 Dec 6 02:26 ./test.txt...
./test.txt...: ASCII text
3661 0 -rw-rw-r-- 2 peter peter 0 Dec 6 02:45 ./foo.txt
./foo.txt: empty
...
If you used -exec file {} +, it would run file once with multiple arguments, but then the output wouldn't be nicely interleaved with find's -ls output. (GNU find's -execdir {} + currently behaves the same as -execdir {} \;, due to a bug workaround.) Use -exec file {} \; if you want the full path in the file output as well as in the -ls output above it.
find -ls output is not quite the same as ls -l, since it includes the inode number and the number of blocks as the first two fields.
