Why does the count differ between the ls and ls -l Linux commands? - linux

I have a directory containing a number of files and need to check how many files are present in it.
I tried the following two commands:
ls | wc -l
ls -l | wc -l
and found that the results differ: the count is greater with the second command than with the first.
I would like to know what is different between the two commands.

From man ls:
-l (The lowercase letter ``ell''.) List in long format. (See below.) If the output is to a terminal, a total sum for all the file sizes is output on a line before the
long listing.
So ls -l adds a header line stating the "total" size of files:
$ ls -l /
total 65
-r--r--r-- 1 root wheel 6197 May 11 21:57 COPYRIGHT
drwxr-xr-x 2 root wheel 1024 Jun 1 16:02 bin
drwxr-xr-x 9 root wheel 1536 Jun 1 16:02 boot
dr-xr-xr-x 8 root wheel 512 Jul 7 20:16 dev
.......
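If you just want the two counts to agree, a couple of minimal sketches: drop the total line before counting, or count only regular files (note the second variant also excludes directories and symlinks):
$ ls -l | tail -n +2 | wc -l
$ ls -l | grep -c '^-'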

Related

How to find out if ls command output is file or a directory Bash

The ls command outputs everything contained in the current directory. For example, ls -la will output something like this:
drwxr-xr-x 3 user user 4096 dec 19 17:53 .
drwxr-xr-x 15 user user 4096 dec 19 17:39 ..
drwxrwxr-x 2 user user 4096 dec 19 17:53 tess (directory)
-rw-r--r-- 1 user user 178 dec 18 21:52 file (file)
-rw-r--r-- 1 user user 30 dec 18 21:47 text (file)
And what if I want to know how much space all the files consume? For that I would have to sum $5 from all lines with ls -la | awk '{ sum+=$5 } END{print sum}'. So how can I sum only the sizes of files and leave directories out?
You can use the following:
find . -maxdepth 1 -type f -printf '%s\n' | awk '{s+=$1} END {print s}'
The find command selects all the regular files in the current directory and outputs their sizes. The awk command sums the integers and outputs the total.
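With the example listing above, and assuming file (178 bytes) and text (30 bytes) are the only regular files in the directory, the pipeline would print their sum:
$ find . -maxdepth 1 -type f -printf '%s\n' | awk '{s+=$1} END {print s}'
208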
Don't.
One of the most quoted pages on SO that I've seen is https://unix.stackexchange.com/questions/128985/why-not-parse-ls-and-what-do-to-instead.
That being said and as a hint for further development, ls -l | awk '/^-/{s+=$5} END {print s}' will probably do what you ask.
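In the spirit of that link, a parse-free sketch that loops over the entries and asks stat for each size (GNU stat assumed; hidden files are skipped here, and [ -f ] also follows symlinks):
$ s=0; for f in *; do [ -f "$f" ] && s=$((s + $(stat -c %s -- "$f"))); done; echo "$s"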

How to delete HDFS folder having windows special characters (^M) in the name

I wrote a shell script on Windows 7 to create HDFS folders and ran it on a Linux server. The HDFS folders got created, but with the special character ^M at the end of their names (probably a carriage return). It doesn't show up in Linux, but I can see it when the ls output is redirected to a file.
I should have run dos2unix before running the script. However, now I am not able to delete the folders with ^M. Could someone assist on how to delete these folders?
Just a supplementary answer for #SachinJ.
TL;DR
$ hdfs dfs -rm -r -f $(hdfs dfs -ls /path/to/dir | sed '<LINE_NUMBER>q;d' | awk '{print $<FILE_NAME_COLUMN_NUMBER>}')
<LINE_NUMBER> should be replaced with the line number of the entry you want to delete in the output of hdfs dfs -ls /path/to/dir, and <FILE_NAME_COLUMN_NUMBER> with the column that holds the file name.
Here is the example.
Details
Suppose your HDFS directory looks like this:
$ hdfs dfs -ls /path/to/dir
Found 5 items
drwxr-xr-x - test supergroup 0 2019-08-22 10:41 /path/to/dir/dir1
drwxr-xr-x - test supergroup 0 2019-07-11 15:35 /path/to/dir/dir2
drwxr-xr-x - test supergroup 0 2019-07-05 17:53 /path/to/dir/dir3
drwxr-xr-x - test supergroup 0 2019-08-22 11:28 /path/to/dir/dirtodelete
drwxr-xr-x - test supergroup 0 2019-07-26 11:07 /path/to/dir/dir4
When you ls the directory, the screen output looks fine.
But you can't select the folder by name:
$ hdfs dfs -ls /path/to/dir/dirtodelete
ls: `/path/to/dir/dirtodelete': No such file or directory
$ hdfs dfs -ls /path/to/dir/dirtodelete*
ls: `/path/to/dir/dirtodelete*': No such file or directory
What's more, when you redirect the ls output to a file and open it in vim, it shows up like this:
$ hdfs dfs -ls /path/to/dir > tmp
$ vim tmp
Found 5 items
drwxr-xr-x - test supergroup 0 2019-08-22 10:41 /path/to/dir/dir1
drwxr-xr-x - test supergroup 0 2019-07-11 15:35 /path/to/dir/dir2
drwxr-xr-x - test supergroup 0 2019-07-05 17:53 /path/to/dir/dir3
drwxr-xr-x - test supergroup 0 2019-08-22 11:28 /path/to/dir/dirtodelete^M^M
drwxr-xr-x - test supergroup 0 2019-07-26 11:07 /path/to/dir/dir4
What is "^M", it's a CARRIAGE RETURN (CR). More info here
Linux \n(LF) eq to Windows \r\n(CRLF)
This problem occurs edit same file in Windows and Linux.
So, we just to use correct filename, then we can delete it .But it can't be copy from the screen.
Here sed command works!
ls output as following
$ hdfs dfs -ls /path/to/dir
Found 5 items
drwxr-xr-x - test supergroup 0 2019-08-22 10:41 /path/to/dir/dir1
drwxr-xr-x - test supergroup 0 2019-07-11 15:35 /path/to/dir/dir2
drwxr-xr-x - test supergroup 0 2019-07-05 17:53 /path/to/dir/dir3
drwxr-xr-x - test supergroup 0 2019-08-22 11:28 /path/to/dir/dirtodelete
drwxr-xr-x - test supergroup 0 2019-07-26 11:07 /path/to/dir/dir4
The filename is on line 5, so hdfs dfs -ls /path/to/dir | sed '5q;d' will cut out the line we need.
sed '5q;d' deletes every line it reads, but quits right after auto-printing line 5, so only the 5th line is output.
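You can see what sed 'Nq;d' does with a quick throwaway example (not part of the HDFS workflow):
$ seq 10 | sed '5q;d'
5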
Then we use awk to select the filename column; fields are numbered from 1, so the column number here is 8.
So we just write the command:
$ hdfs dfs -ls /path/to/dir/ | sed '5q;d' | awk '{print $8}'
/path/to/dir/dirtodelete
Then we can delete it.
$ hdfs dfs -rm -r -f $(hdfs dfs -ls /path/to/dir/ | sed '5q;d' | awk '{print $8}')
Sometimes a wildcard may not work (rm filename*), so it is better to use the option below:
rm -r $(ls | sed '<LINE_NUMBER>q;d')
Replace <LINE_NUMBER> with the line number of the entry in the output of the ls command.
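If the stray characters really are literal carriage returns, another option (a sketch; bash assumed, and spell out as many \r as the vim view showed, two in the example above) is to type them into the path with ANSI-C quoting:
$ hdfs dfs -rm -r -f $'/path/to/dir/dirtodelete\r\r'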

how to get the number records along with file size in unix

I am trying to get the file details and the number of records for each file, along with the size.
I tried ls -lhtr 234*201406*.log.gz; it gives all the details except the record count. If I try ls -lhtr 234*201406*.log.gz | wc -l, it shows the number of files.
Current output:
-rw-r--r-- 1 jenkins tomcat 120M Jun 30 18:25 234_1404165601_20140630220001.log.gz
-rw-r--r-- 1 jenkins tomcat 144M Jun 30 19:24 234_1404169201_20140630230001.log.gz
I need the output as:
-rw-r--r-- 1 jenkins tomcat 120M Jun 30 18:25 234_1404165601_20140630220001.log.gz 20000
Can you please help me with this? Thanks in advance.
You can use zcat (or gunzip -c) to print the number of lines in .gz files:
find . -name '*.gz' -exec bash -c 'f="$1"; du -h "$f"; zcat "$f" | wc -l' - '{}' \;
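If you would rather have the size and the record count on the same line per file, a rough sketch for the glob from the question (the tab-separated layout is only an assumption about the desired format):
$ for f in 234*201406*.log.gz; do printf '%s\t%s\t%s\n' "$(du -h "$f" | cut -f1)" "$f" "$(zcat "$f" | wc -l)"; done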

Linux combine sort files by date created and given file name

I need to combine these two commands in order to get a list of the files with the specified "filename", sorted by creation date.
I know that sorting files by date can be achieved with:
ls -lrt
and finding a file by name with
find . -name "filename*"
I don't know how to combine these two. I tried with a pipeline but I don't get the right result.
[EDIT]
Not sorted
find . -name "filename" -printf '%TY:%Tm:%Td %TH:%Tm %h/%f\n' | sort
Forget xargs. "Find" and "sort" are all the tools you need.
My best guess would be to use xargs:
find . -name 'filename*' -print0 | xargs -0 /bin/ls -ltr
There's an upper limit on the number of arguments, but it shouldn't be a problem unless they occupy more than 32kB (read more here), in which case you will get blocks of sorted files :)
find . -name "filename" -exec ls --full-time \{\} \; | cut -d' ' -f7- | sort
You might have to adjust the cut command depending on what your version of ls outputs.
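A related sketch that avoids parsing ls altogether: print the raw modification timestamp with GNU find, sort on it numerically, then strip it off:
$ find . -name 'filename*' -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-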
Check the command shared below:
1) List files in a directory with their last modified date/time
To list files with the most recently modified ones at the top, we will use the -lt options with the ls command.
$ ls -lt /run
output
total 24
-rw-rw-r--. 1 root utmp 2304 Sep 8 14:58 utmp
-rw-r--r--. 1 root root 4 Sep 8 12:41 dhclient-eth0.pid
drwxr-xr-x. 4 root root 100 Sep 8 03:31 lock
drwxr-xr-x. 3 root root 60 Sep 7 23:11 user
drwxr-xr-x. 7 root root 160 Aug 26 14:59 udev
drwxr-xr-x. 2 root root 60 Aug 21 13:18 tuned
https://linoxide.com/linux-how-to/how-sort-files-date-using-ls-command-linux/
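The listing above covers a whole directory; to restrict it to the name pattern from the question, a sketch is to pass the glob to ls directly (unlike find, this does not recurse into subdirectories):
$ ls -lt filename*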

Linux - Save only recent 10 folders and delete the rest

I have a folder that contains versions of my application. Each time I upload a new version, a new sub-folder is created for it; the sub-folder name is the current timestamp. Here is a printout of the main folder (ls -l | grep ^d):
drwxrwxr-x 7 root root 4096 2011-03-31 16:18 20110331161649
drwxrwxr-x 7 root root 4096 2011-03-31 16:21 20110331161914
drwxrwxr-x 7 root root 4096 2011-03-31 16:53 20110331165035
drwxrwxr-x 7 root root 4096 2011-03-31 16:59 20110331165712
drwxrwxr-x 7 root root 4096 2011-04-03 20:18 20110403201607
drwxrwxr-x 7 root root 4096 2011-04-03 20:38 20110403203613
drwxrwxr-x 7 root root 4096 2011-04-04 14:39 20110405143725
drwxrwxr-x 7 root root 4096 2011-04-06 15:24 20110406151805
drwxrwxr-x 7 root root 4096 2011-04-06 15:36 20110406153157
drwxrwxr-x 7 root root 4096 2011-04-06 16:02 20110406155913
drwxrwxr-x 7 root root 4096 2011-04-10 21:10 20110410210928
drwxrwxr-x 7 root root 4096 2011-04-10 21:50 20110410214939
drwxrwxr-x 7 root root 4096 2011-04-10 22:15 20110410221414
drwxrwxr-x 7 root root 4096 2011-04-11 22:19 20110411221810
drwxrwxr-x 7 root root 4096 2011-05-01 21:30 20110501212953
drwxrwxr-x 7 root root 4096 2011-05-01 23:02 20110501230121
drwxrwxr-x 7 root root 4096 2011-05-03 21:57 20110503215252
drwxrwxr-x 7 root root 4096 2011-05-06 16:17 20110506161546
drwxrwxr-x 7 root root 4096 2011-05-11 10:00 20110511095709
drwxrwxr-x 7 root root 4096 2011-05-11 10:13 20110511100938
drwxrwxr-x 7 root root 4096 2011-05-12 14:34 20110512143143
drwxrwxr-x 7 root root 4096 2011-05-13 22:13 20110513220824
drwxrwxr-x 7 root root 4096 2011-05-14 22:26 20110514222548
drwxrwxr-x 7 root root 4096 2011-05-14 23:03 20110514230258
I'm looking for a command that will leave the last 10 versions (sub-folders) and deletes the rest.
Any thoughts?
There you go (edited):
ls -dt */ | tail -n +11 | xargs rm -rf
First list the directories by modification time (newest first), then take all of them except the first 10, then send them to rm -rf.
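To preview what would be removed before running it for real, a harmless dry-run sketch is to drop the destructive part:
$ ls -dt */ | tail -n +11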
ls -dt1 /path/to/folder/*/ | sed -n '11,$p' | xargs rm -r
This assumes the version folders are the only directories present in the working directory.
Without the /*/ glob, ls -d would only print the directory itself; the glob matches only directories and expands to their full paths. The 1 ensures one line per entry, and t sorts by modification time with the newest at the top.
sed -n '11,$p' prints only the 11th line down to the last, and those lines are then passed to rm via xargs.
For testing you may wish to remove the final | xargs rm -r to check that the right directories are listed first.
If the directories' names contain the date, one can delete all but the newest 10 directories using the default alphabetical sort:
ls -d */ | head -n -10 | xargs rm -rf
ls -lt | grep ^d | sed -e '1,10d' | awk '{sub(/.* /, ""); print }' | xargs rm -rf
Explanation:
list all contents of the current directory in chronological order (most recent first)
keep only the directories
skip the first 10 lines, i.e. the 10 newest directories
use awk to extract the directory names from the remaining 'ls -l' output
remove them
EDIT:
find . -maxdepth 1 -type d ! -name \. | sort | tac | sed -e '1,10d' | xargs rm -rf
I suggest the following sequence. I use a similar approach on my Synology NAS to delete old backups. It doesn't rely on the folder names, instead it uses the last modified time to decide which folders to delete. It also uses zero-termination in order to correctly handle quotes, spaces and newline characters in the folder names:
find /path/to/folder -maxdepth 1 -mindepth 1 -type d -printf '%Ts\t' -print0 \
| sort -rnz \
| tail -n +11 -z \
| cut -f2- -z \
| xargs -0 -r rm -rf
IMPORTANT: This will delete any matching folders! I strongly recommend doing a test run first by replacing the last command xargs -0 -r rm -rf with xargs -0 which will echo the matching folders instead of deleting them.
A short explanation of each step:
find /path/to/folder -maxdepth 1 -mindepth 1 -type d -printf '%Ts\t' -print0
Find all directories (-type d) directly inside the backup folder (-maxdepth 1) except the backup folder itself (-mindepth 1), print (-printf) the Unix time (%Ts) of the last modification followed by a tab character (\t, used in step 4) and the full file name followed by a null character (-print0).
sort -rnz
Sort the zero-terminated items (-z) from the previous step using a numerical comparison (-n) and reverse the order (-r). The result is a list of all folders sorted by their last modification time in descending order.
tail -n +11 -z
Print the last lines (tail) from the previous step starting from line 11 (-n +11) considering each line as zero-terminated (-z). This excludes the newest 10 folders (by modification time) from the remaining steps.
cut -f2- -z
Cut each line from the second field to the end (-f2-), treating each line as zero-terminated (-z), to obtain a list containing the full path to each folder except the 10 newest.
xargs -r -0 rm -rf
Take the zero-terminated (-0) items from the previous step (xargs), and, if there are any (-r avoids running the command passed to xargs if there are no nonblank characters), force delete (rm -rf) them.
Your directory names are sorted in chronological order, which makes this easy. The list of directories in chronological order is just *, or [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] to be more precise. So you want to delete all but the last 10 of them.
set [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/
while [ $# -gt 10 ]; do
  rm -rf "$1"
  shift
done
(While there are more than 10 directories left, delete the oldest one.)
