How to find the total line count of CSV files in a directory on Linux?

When working with huge CSV files for data analytics, we commonly need to know the row count of all the .csv files located in a particular folder.
But how can we do it with just one command on Linux?

If you want to check the line counts of all the .csv files in a directory, you can combine find and wc:
find . -type f -name '*.csv' -exec wc -l {} +

To get the line count of every file recursively, you can use Cesar's answer:
$ LANG=C find /tmp/test -type f -name '*.csv' -exec wc -l '{}' +
49 /tmp/test/sub/3.csv
22 /tmp/test/1.csv
419 /tmp/test/2.csv
490 total
To get the total line count of all files recursively:
$ LANG=C find /tmp/test -type f -name '*.csv' -exec cat '{}' + | wc -l
490
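The two steps above can be combined into one pass that prints the per-file counts and a single grand total, even when find has to split the file list across several wc invocations. A sketch, using an invented demo directory /tmp/csvdemo:

```shell
# Build a small demo tree (invented paths).
mkdir -p /tmp/csvdemo/sub
printf 'a\nb\n'    > /tmp/csvdemo/1.csv      # 2 lines
printf '1\n2\n3\n' > /tmp/csvdemo/sub/2.csv  # 3 lines

# Per-file counts plus one grand total: drop wc's own "total" lines
# (one per -exec batch) and re-sum in awk.
find /tmp/csvdemo -type f -name '*.csv' -exec wc -l {} + |
  awk '$2 != "total" {print; sum += $1} END {print sum, "grand total"}'
```

The last line printed here is "5 grand total", regardless of how many batches find runs.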

Related

How to find the count of and total sizes of multiple files in directory?

I have a directory, and inside it multiple directories which contain many types of files.
I want to find the *.jpg files, then get the count and total size of all of them.
I know I have to use find, wc -l and du -ch, but I don't know how to combine them in a single script or a single command.
find . -type f name "*.jpg" -exec - not sure how to connect all three
Supposing your starting folder is ., this will give you all files and the total size:
find . -type f -name '*.jpg' -exec du -ch {} +
The + at the end executes du -ch on many files at once, rather than once per file, allowing you to get the grand total.
If you want to know only the total, add | tail -n 1 at the end.
Fair warning: this effectively executes
du -ch file1 file2 file3 ...
With very many files, find splits the list across several du invocations to stay under the system's argument-length limit, so you may get more than one "total" line.
To check that limit:
$ getconf ARG_MAX
2097152
That's what is configured on my system.
This doesn't give you the number of files, though. You'll need to capture the output of find and use it twice.
The last line is the total, so we'll use all but the last line to get the number of files (head -n -1 requires GNU head), and the last one for the total:
OUT=$(find . -type f -name '*.jpg' -exec du -ch {} +)
N=$(echo "$OUT" | head -n -1 | wc -l)
SIZE=$(echo "$OUT" | tail -n 1)
echo "Number of files: $N"
echo "$SIZE"
Which for me gives:
Number of files: 143
584K total
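Under the same GNU find assumption, a single-pass alternative avoids parsing du's output entirely: print each file's size with -printf and let awk produce both numbers. A sketch with an invented demo directory /tmp/jpgdemo:

```shell
# Demo tree with two files of known size (invented paths).
mkdir -p /tmp/jpgdemo/sub
head -c 1024 /dev/zero > /tmp/jpgdemo/a.jpg
head -c 2048 /dev/zero > /tmp/jpgdemo/sub/b.jpg

# One pass: count files and sum exact byte sizes.
find /tmp/jpgdemo -type f -name '*.jpg' -printf '%s\n' |
  awk '{n++; bytes += $1} END {print "Number of files: " n; print "Total: " bytes " bytes"}'
```

For the demo tree this prints "Number of files: 2" and "Total: 3072 bytes"; note that du reports disk usage while %s is the exact file size, so the numbers can differ.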

Find all files pattern with total size

In order to find all log files matching a pattern in all subdirectories, I used the command:
du -csh *log.2017*
But this command does not search subdirectories. Is there any way to get the total size of all files matching a pattern, from all subdirectories?
This will do the trick (quote the pattern so the shell doesn't expand it before find sees it):
find . -name '*log.2017*' | xargs du -csh
Note that xargs splits very long file lists across several du runs (giving more than one total), and file names containing spaces or newlines will break this pipe. To sum exact byte sizes instead:
find . -name '*log.2017*' -type f -exec stat -c '%s' {} \; | paste -sd+ | bc
You can use the find command:
find /path -type f -name "*log.2017*" -exec stat -c '%s' {} \; | paste -sd+ | bc
It will do the search recursively. (paste joins the sizes with + so that bc sums them; without it, bc would just echo each number.)
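If GNU coreutils is available, du can also read a NUL-delimited file list directly, which stays safe for odd file names and any number of files and always yields a single total. A sketch with an invented /tmp/logdemo tree:

```shell
# Demo tree (invented paths and sizes).
mkdir -p /tmp/logdemo/sub
head -c 100 /dev/zero > /tmp/logdemo/app.log.2017-01-01
head -c 200 /dev/zero > /tmp/logdemo/sub/app.log.2017-01-02

# --files0-from=- (GNU du) reads the NUL-separated list from stdin;
# -c appends one grand-total line, which tail extracts.
find /tmp/logdemo -type f -name '*log.2017*' -print0 |
  du -ch --files0-from=- | tail -n 1
```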

How do I find the number of all .txt files in a directory and all sub directories using specifically the find command and the wc command?

So far I have this:
find -name ".txt"
I'm not quite sure how to use wc to find out the exact number of files. When using the command above, all the .txt files show up, but I need the exact number of files with the .txt extension. Please don't suggest using other commands as I'd like to specifically use find and wc. Thanks
Try:
find . -name '*.txt' | wc -l
The -l option to wc tells it to return just the number of lines.
Improvement (requires GNU find)
The above will give the wrong number if any .txt file name contains a newline character. This will work correctly with any file names:
find . -iname '*.txt' -printf '1\n' | wc -l
-printf '1\n' tells find to print just the line 1 for each file name found. This avoids problems with file names containing difficult characters.
Example
Let's create two .txt files, one with a newline in its name:
$ mkdir -p dir1/dir2
$ touch dir1/dir2/a.txt $'dir1/dir2/b\nc.txt'
Now, let's run the find command:
$ find . -name '*.txt'
./dir1/dir2/b?c.txt
./dir1/dir2/a.txt
To count the files:
$ find . -name '*.txt' | wc -l
3
As you can see, the answer is off by one. The improved version, however, works correctly:
$ find . -iname '*.txt' -printf '1\n' | wc -l
2
find -type f -name "*.h" -mtime +10 -print | wc -l
This worked for me (it additionally filters to files modified more than 10 days ago).
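If your find lacks GNU's -printf but supports -print0 (most implementations do), another newline-proof approach is to count the NUL separators themselves. A sketch with an invented /tmp/txtdemo directory:

```shell
mkdir -p /tmp/txtdemo
touch /tmp/txtdemo/a.txt "/tmp/txtdemo/b
c.txt"                      # second name contains a newline

# Each match contributes exactly one NUL byte; delete everything
# else and count the remaining bytes.
find /tmp/txtdemo -name '*.txt' -print0 | tr -dc '\0' | wc -c
```

This prints 2 here, despite the newline in the second file name.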

cat files in subdirectories using linux commands

I have the following directories:
P922_101
P922_102
.
.
Each directory, for instance P922_101 has following subdirectories:
140311_AH8MHGADXX 140401_AH8CU4ADXX
Each subdirectory, for instance 140311_AH8MHGADXX has the following files:
1_140311_AH8MH_P922_101_1.fastq.gz 1_140311_AH8MH_P922_101_2.fastq.gz
2_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_2.fastq.gz
And files in 140401_AH8CU4ADXX are:
1_140401_AH8CU_P922_101_1.fastq.gz 1_140401_AH8CU_P922_4001_2.fastq.gz
2_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_4001_2.fastq.gz
I want to do 'cat' for the files in the subdirectories in the following way:
cat 1_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_1.fastq.gz
1_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_101_1.fastq.gz > P922_101_1.fastq.gz
which means that files ending with _1.fastq.gz should be concatenated into a single file and files ending with _2.fatsq.gz into another file.
It should be run for all files in subdirectories in all directories. Could someone give a linux solution to do this?
Since they're compressed, you should probably use gzip -dc (decompress and write to stdout):
find /somePath -type f -name "*.fastq.gz" -exec gzip -dc {} \; | \
tee -a /someOutFolder/out.txt
You can use find for this:
find /top/path -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > one_file
find /top/path -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} \; > another_file
This will look for all files under /top/path whose names match the pattern *_1.fastq.gz / *_2.fastq.gz and cat them into the desired file. -mindepth 2 makes find descend at least two levels, so files directly in /top/path won't be matched.
Note that cat works here because concatenated gzip streams form a valid gzip file; use zcat instead if you want the output decompressed.
As you keep adding details in comments, let's see what else we can do:
Say you have the list of directories in a file directories_list, each line containing one:
while read -r directory
do
find "$directory" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > "$directory/output"
done < directories_list
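The loop above handles only the _1 suffix. A sketch extending it to both suffixes, with an invented miniature tree and list file so it is runnable (the real names come from the question):

```shell
# Invented demo layout mimicking the question.
mkdir -p demo/P922_101/140311_AH8MHGADXX
printf 'readA\n' | gzip > demo/P922_101/140311_AH8MHGADXX/1_P922_101_1.fastq.gz
printf 'readB\n' | gzip > demo/P922_101/140311_AH8MHGADXX/2_P922_101_1.fastq.gz
printf 'P922_101\n' > directories_list_demo

# One output file per directory and per read suffix. The output lands
# at depth 1, so -mindepth 2 keeps it out of later matches.
while IFS= read -r directory; do
  for suffix in 1 2; do
    find "demo/$directory" -mindepth 2 -type f -name "*_${suffix}.fastq.gz" \
      -exec cat {} + > "demo/$directory/${directory}_${suffix}.fastq.gz"
  done
done < directories_list_demo

# Concatenated gzip members are themselves a valid gzip stream:
zcat demo/P922_101/P922_101_1.fastq.gz | sort   # readA, readB
```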

counting files in directory linux

Q2. Write a script that takes a directory name as command line argument and display the attributes of various files in it e.g.
Regular Files
Total No of files
No of directories
Files allowing write permissions
Files allowing read permissions
Files allowing execute permissions
File having size 0
Hidden files in directory
I'm working in a Linux shell script. What I have done so far is:
find DIR_NAME -type f -print | wc -l
To count all files (including subdirs):
find /home/vivek -type f -print| wc -l
To count all dirs including subdirs:
find . -type d -print | wc -l
To only count files in given dir only (no subdir):
find /dest -maxdepth 1 -type f -print| wc -l
To only count dirs in given dir only (no subdir):
find /path/to/foo -maxdepth 1 -type d -print| wc -l
All your questions can be answered by looking into man find:
-type f
no option necessary
-type d
-perm /u+w,g+w or some variation
-perm /u+r,g+r
-perm /u+x,g+x
-size 0
-name '.*'
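Putting those predicates together, the script might look like the sketch below (GNU find assumed for -maxdepth and the -perm / syntax; a throwaway mktemp directory stands in for the script's command-line argument):

```shell
# Demo directory with a known mix of entries (invented names).
d=$(mktemp -d)
touch "$d/plain" "$d/.hidden" "$d/empty"
mkdir "$d/subdir"
chmod 644 "$d/plain" "$d/.hidden" "$d/empty"

# One find per attribute; -maxdepth 1 keeps the counts to this
# directory only, as the assignment's wording suggests.
echo "Regular files:  $(find "$d" -maxdepth 1 -type f | wc -l)"
echo "Directories:    $(find "$d" -mindepth 1 -maxdepth 1 -type d | wc -l)"
echo "Writable:       $(find "$d" -maxdepth 1 -type f -perm /u+w,g+w | wc -l)"
echo "Readable:       $(find "$d" -maxdepth 1 -type f -perm /u+r,g+r | wc -l)"
echo "Executable:     $(find "$d" -maxdepth 1 -type f -perm /u+x,g+x | wc -l)"
echo "Size 0:         $(find "$d" -maxdepth 1 -type f -size 0 | wc -l)"
echo "Hidden files:   $(find "$d" -maxdepth 1 -type f -name '.*' | wc -l)"
```

In a real script, replace "$d" with "$1" (the directory passed on the command line).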
