How can I list the files in a directory that have zero size/length in the Linux terminal? - linux

I am new to using the Linux terminal, so I'm just starting to learn about the commands I can use. I have figured out how to list the files in a directory using the Linux terminal, and how to list them according to file size. I was wondering if there's a way to list only the files of a specific file size. Right now, I'm trying to list files with zero size, like those that you might create using the touch command. I looked through the flags I could use when I use ls, but I couldn't find exactly what I was looking for. Here's what I have right now:
ls -lsh /mydirectory
The "mydirectory" part is just a placeholder. Is there anything I can add that will only list files that have zero size?

There are a few ways you can go about this; if you want to stick with ls you could use e.g. awk in a pipeline to do the filtering.
ls -lsh /mydirectory | awk '$6 == 0'
Here, $6 is the sixth field in the output: the size. (The -s flag prepends an allocated-blocks column, which is why the size is field six here rather than field five as in plain ls -l.)
Another approach would be to use a different tool, find.
find /mydirectory -maxdepth 1 -size 0 -ls
This will also list hidden files, analogous to an ls -la.
The -maxdepth 1 is there so it doesn't traverse the directory tree if you have nested directories.
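If your find is GNU find, a small variant of the same idea (just a sketch, not required) is to restrict the match to empty regular files explicitly:
# -type f limits the output to regular files; -empty matches zero-length files
find /mydirectory -maxdepth 1 -type f -empty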

A simple script can do this.
for file_name in *
do
    if [[ ! -s "$file_name" ]]
    then
        echo "$file_name"
    fi
done
Explanation:
for is a loop; * expands to a list of all files in the current directory.
-s file_name is true if the file exists and has a size greater than 0.
! negates that test, so the body runs only for empty files.
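The same logic fits on one line; the extra -f test here is my addition, so that only regular files (and not, say, directories) get printed:
for f in *; do [ -f "$f" ] && [ ! -s "$f" ] && printf '%s\n' "$f"; done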

Related

Bash script that counts and prints out the files that start with a specific letter

How do I print out all the files in the current directory that start with the letter "k"? I also need to count these files.
I tried some methods but I only got errors or wrong outputs. I'm really stuck on this as a newbie in Bash.
Try this Shellcheck-clean pure POSIX shell code:
count=0
for file in k*; do
    if [ -f "$file" ]; then
        printf '%s\n' "$file"
        count=$((count+1))
    fi
done
printf 'count=%d\n' "$count"
It works correctly (just prints count=0) when run in a directory that contains nothing starting with 'k'.
It doesn't count directories or other non-files (e.g. fifos).
It counts symlinks to files, but not broken symlinks or symlinks to non-files.
It works with 'bash' and 'dash', and should work with any POSIX-compliant shell.
Here is a pure Bash solution.
files=(k*)
printf "%s\n" "${files[@]}"
echo "${#files[@]} files total"
The shell expands the wildcard k* into the array, thus populating it with a list of matching files. We then print out the array's elements, and their count.
The use of an array avoids the various problems with metacharacters in file names (see e.g. https://mywiki.wooledge.org/BashFAQ/020), though the syntax is slightly hard on the eyes.
As remarked by pjh, this will include any matching directories in the count, and fail in odd ways if there are no matches (unless you enable nullglob). If avoiding directories is important, you basically have to build the array yourself and exclude the directories, as sketched below.
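A rough sketch of that idea (this assumes shopt -s nullglob so that an unmatched pattern expands to nothing, and skips anything that isn't a regular file):
shopt -s nullglob
files=()
for f in k*; do
    [ -f "$f" ] && files+=("$f")    # keep regular files, drop directories, fifos, etc.
done
((${#files[@]})) && printf '%s\n' "${files[@]}"
echo "${#files[@]} files total"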
To repeat what Dominique also said, avoid parsing ls output.
Demo of this and various other candidate solutions:
https://ideone.com/XxwTxB
To start with: never parse the output of the ls command, but use find instead.
Since find descends into all subdirectories by default, you may want to limit that with the -maxdepth option; a value of 1 keeps it to the current directory.
To count the results, you just count the number of lines in the output (find prints one result per line), which is exactly what the wc -l command does.
So, this comes down to the following command:
find ./ -maxdepth 1 -type f -name "k*" | wc -l
Have fun!
This should work as well:
VAR="k"
COUNT=$(ls -dp "${VAR}"* | grep -v "/" | wc -l)
echo -e "Total number of files: ${COUNT}\n" 1>&2
echo -e "Files that begin with ${VAR} are:\n$(ls -dp "${VAR}"* | grep -v "/")" 1>&2

Quickly list a random set of files in a directory in Linux

Question:
I am looking for a performant, concise way to list N randomly selected files in a Linux directory using only Bash. The files must be randomly selected from different subdirectories.
Why I'm asking:
In Linux, I often want to test a random selection of files in a directory for some property. The directories contain 1000's of files, so I only want to test a small number of them, but I want to take them from different subdirectories in the directory of interest.
The following returns the paths of 50 "randomly"-selected files:
find /dir/of/interest/ -type f | sort -R | head -n 50
The directory contains many files and resides on a mounted file system with slow read times (accessed over ssh), so the command can take many minutes. I believe the issue is that find first enumerates every file (slow), and only then prints a random selection.
If you are using locate and updatedb updates regularly (daily is probably the default), you could:
$ locate /home/james/test | sort -R | head -5
/home/james/test/10kfiles/out_708.txt
/home/james/test/10kfiles/out_9637.txt
/home/james/test/compr/bar
/home/james/test/10kfiles/out_3788.txt
/home/james/test/test
How often do you need it? Do the work periodically in advance to have it quickly available when you need it.
Create a refreshList script.
#!/usr/bin/env bash
find /dir/of/interest/ -type f | sort -R | head -n 50 >/tmp/rand.list
mv -f /tmp/rand.list ~
Put it in your crontab.
0 7-20 * * 1-5 nice -n 19 ~/refreshList
Then you will always have a ~/rand.list that's under an hour old.
If you don't want to use cron and aren't too picky about how old it is, just write a function that refreshes the file after you use it every time.
randFiles() {
    cat ~/rand.list
    {
        find /dir/of/interest/ -type f |
            sort -R | head -n 50 > /tmp/rand.list
        mv -f /tmp/rand.list ~
    } &
}
If you can't run locate and the find command is too slow, is there any reason this has to be done in real time?
Would it be possible to use cron to dump the output of the find command into a file and then do the random pick out of there?
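A minimal sketch of that idea (paths and schedule are placeholders, and shuf is from GNU coreutils; sort -R | head would work just as well on the cached list):
# crontab entry: rebuild the cached file list once an hour
0 * * * * find /dir/of/interest/ -type f > /tmp/all_files.list 2>/dev/null

# whenever you need a sample, pick 50 random entries from the cache (fast)
shuf -n 50 /tmp/all_files.list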

Unable to cat ~9000 files using command line

I am trying to cat ~9000 FASTA-like files into one larger file. All of the files are in a single subfolder. I keep getting the "argument list too long" error.
This is a sample name from one of the files
efetch.fcgi?db=nuccore&id=CL640905.1&rettype=fasta&retmode=text
They are considered a document type file by the computer.
You can't use cat * > concatfile as you have limits on command line size. So take them one at a time and append:
ls | while read -r; do cat "$REPLY" >> concatfile; done
(Make sure concatfile doesn't exist beforehand.)
EDIT: As user6292850 rightfully points out, I might be overthinking it. This suffices, if your files don't have too weird names:
ls | xargs cat > concatfile
(but files with spaces in them, for example, would blow it up)
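If you want to keep the xargs idea but make it safe for arbitrary file names, one sketch (assuming your xargs supports -0, as GNU and BSD xargs do) is to pass the names NUL-delimited; printf is a shell builtin, so the glob expansion is not subject to the kernel's argument-length limit, and writing the result one directory up keeps it out of the glob:
printf '%s\0' * | xargs -0 cat > ../concatfile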
There is a limit on how many arguments you can place on the command line.
You could use a while loop with find to handle this:
while IFS= read -r file; do
    cat "${file}" >> path/to/output_file
done < <(find path/to/input_folder -maxdepth 1 -type f -print)
This will bypass the problem of an expanded glob with too many arguments.
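Another way to sidestep the limit entirely is to let find batch the cat calls itself; with -exec ... {} +, find passes as many names to each cat invocation as the system allows (path/to/input_folder and path/to/output_file are placeholders):
find path/to/input_folder -maxdepth 1 -type f -exec cat {} + > path/to/output_file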

Listing files in Unix and saving the output in a variable (fetching the oldest file for a particular extension)

This might be a very simple thing for a shell scripting programmer, but I'm pretty new to it. I was trying to execute the below command in a shell script and save the output into a variable:
inputfile=$(ls -ltr *.{PDF,pdf} | head -1 | awk '{print $9}')
The command works fine when I run it from the terminal but fails when executed through a shell script (sh). Why does the command fail? Does it mean the shell script doesn't support the command, or am I doing it wrong? Also, how do I know whether a command will work in a shell script or not?
Just to give you a glimpse of my requirement: I was trying to get the oldest file from a particular directory (I also want to make sure upper-case and lower-case extensions are handled). Is there any other way to do this?
The above command will work correctly only if BOTH *.pdf and *.PDF files are present in the directory you are currently in.
If you would like to execute it in a directory containing only one of those, you should consider using e.g.:
inputfiles=$(find . -maxdepth 1 -type f \( -name "*.pdf" -or -name "*.PDF" \) | xargs ls -1tr | head -1 )
NOTE: The above command doesn't work with file names that contain newlines, or with a long list of found files.
Parsing ls is always a bad idea. You need another strategy.
How about making a function that gives you the oldest file among the ones given as arguments? The following works in Bash (adapt to your needs):
get_oldest_file() {
    # get oldest file among files given as parameters
    # return is in variable get_oldest_file_ret
    local oldest f
    for f do
        [[ -e $f ]] && [[ ! $oldest || $f -ot $oldest ]] && oldest=$f
    done
    get_oldest_file_ret=$oldest
}
Then just call as:
get_oldest_file *.{PDF,pdf}
echo "oldest file is: $get_oldest_file_ret"
Now, you probably don't want to use brace expansions like this at all. In fact, you very likely want to use the shell options nocaseglob and nullglob:
shopt -s nocaseglob nullglob
get_oldest_file *.pdf
echo "oldest file is: $get_oldest_file_ret"
If you're using a POSIX shell, it's going to be a bit trickier to have the equivalent of nullglob and nocaseglob.
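For what it's worth, here is a rough sketch of how you might approximate it in a POSIX shell: spell out both globs and skip the unmatched pattern itself (the usual stand-in for nullglob). Note that the -ot test operator is a widely supported extension (dash and bash both have it) rather than strict POSIX, and the function name here is just illustrative.
get_oldest_file_posix() {
    # no arrays, no nocaseglob: list both spellings and skip non-matches
    oldest=
    for f in *.pdf *.PDF; do
        [ -e "$f" ] || continue          # an unmatched glob stays literal; skip it
        if [ -z "$oldest" ] || [ "$f" -ot "$oldest" ]; then
            oldest=$f
        fi
    done
}
get_oldest_file_posix
echo "oldest file is: $oldest"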
Is perl an option? It's ubiquitous on Unix.
I would suggest:
perl -e 'print ((sort { -M $b <=> -M $a } glob ( "*.{pdf,PDF}" ))[0]);';
Which:
uses glob to fetch all files matching the pattern,
sorts them by -M (the script start time minus the file's modification time, in days), so the oldest file comes first,
fetches the first element ([0]) off the sorted list,
and prints that.
As @gniourf_gniourf says, parsing ls is a bad idea: unquoted globs and file names with unusual characters will trip it up.
find is your friend:
#!/bin/sh
get_oldest_pdf() {
    #
    # echo path of oldest *.pdf (case-insensitive) file in current directory
    #
    find . -maxdepth 1 -mindepth 1 -iname "*.pdf" -printf '%T@ %p\n' \
        | sort -n \
        | head -1 \
        | cut -d' ' -f2-
}
whatever=$(get_oldest_pdf)
Notes:
find has numerous ways of formatting the output, including things like access time and/or write time. I used '%T@ %p\n', where %T@ is the last write time in UNIX time format, including the fractional part. The timestamp never contains a space, so a space is safe to use as the separator.
The numeric sort and head -1 pick the oldest entry, sorting by the time; cut then removes the time from the output.
I used what is, IMO, a much easier to read and maintain pipe notation, with the help of \ line continuations.
The shell code should run on any POSIX shell (note that -printf and -maxdepth are find extensions that are not in POSIX find).
You could easily adjust the function to parametrize the pattern, the time used (access/write), the search depth, or the starting directory, as sketched below.
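For example, a parametrized variant might look like this (the function name and parameters are just illustrative, not part of the answer above):
get_oldest_by_pattern() {
    # $1 = case-insensitive name pattern (e.g. "*.pdf"), $2 = search depth (defaults to 1)
    find . -maxdepth "${2:-1}" -mindepth 1 -type f -iname "$1" -printf '%T@ %p\n' \
        | sort -n \
        | head -1 \
        | cut -d' ' -f2-
}

oldest=$(get_oldest_by_pattern "*.pdf")
echo "oldest PDF: $oldest"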

BASH - Only printing the deepest directory in path

I need some help.....
In my .bashrc file I have a VERY useful function (it may be a bit rough and ready, and a bit hacky, but it works a treat!) that reads an input file and runs the tree command on each of the input lines to create a directory tree. This tree is then printed into an output file (along with the size of the folder).
multitree()
{
    while read cheese
    do
        pushd . > /dev/null
        pushd "$cheese" > /dev/null
        echo -e "$cheese \n\n" >> ~/Desktop/"$2".txt
        tree -idf . >> ~/Desktop/"$2".txt
        echo -e "\n\n\n" >> ~/Desktop/"$2".txt
        du -sh --si >> ~/Desktop/"$2".txt
        echo -e "\n\n\n\n\n\n\n" >> ~/Desktop/"$2".txt
        popd > /dev/null
    done < "$1"
    cat ~/done
}
This is a time saver like no end, and outputs a snippet like the following:
./foo
./foo/bar
./foo/bar/1
./foo/bar/1/2
etc etc....
However, the first (and most tedious) thing I need to do is remove all the entries, leaving only the deepest folder path (using the above example, it would be reduced to just ./foo/bar/1/2).
Is there a way of processing the file before/after the tree function to only print the deepest levels?
I know something like Python might do a better job, but my issue is that I've never used Python, and I'm not sure the work systems would let me run it... they do let us modify our own .bashrc, so I'm not too worried!
Thanks in advance guys!!!!
Owen.
You could use
find . -type d -links 2
Replace . with a directory if desired.
EDIT: Explanation:
find searches a directory for files that match a given filter. In this case, the directory is ., and the filter is -type d -links 2.
-type d filters for directories
-links 2 filters for those that have exactly two (hard) links to their name. Effectively, this filters for all directories that have no subdirectories, because only those have exactly two links: the entry in their parent directory and the . entry inside themselves. Directories that do have subdirectories additionally get a link from the .. entry in each subdirectory.
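Applied to the example tree from the question (assuming ./foo/bar/1/2 really is the only directory with no subdirectories), the output would look something like:
$ find ./foo -type d -links 2
./foo/bar/1/2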
Here's a hint:
You just need to count the number of "/" characters in each line.
If the current line has fewer than the number of "/" characters in the preceding line, the preceding line would be the "deepest" directory in its part of the hierarchy.
This line, and any subsequent line with still fewer "/" characters would NOT be the deepest directory in its part of the entire directory hierarchy. As soon as you get a line with the same number of "/" characters, or greater, then you can "reset" and, once again, keep an eye out for the first line with the fewer number of "/" characters.
And, finally, you need to handle the trivial case: only one line in your tree output, the current directory has no subdirectories, so it wins by default.
Another way you can implement this is by considering the following statement:
If a directory's name also exists as an exact prefix of another directory in the list, followed by the "/" character, then it is NOT the deepest directory in its part of the hierarchy.
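As a small sketch of that second approach (tree.txt is a placeholder for a file holding just the directory paths, one per line, as in the output snippet above), awk can read the list twice and print a path only if no other path starts with it followed by "/":
awk 'NR == FNR { paths[NR] = $0; next }          # first pass: remember every path
     {
         deepest = 1
         for (i in paths)                        # second pass: is this line a prefix of another path?
             if (index(paths[i], $0 "/") == 1) { deepest = 0; break }
         if (deepest) print
     }' tree.txt tree.txt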
