Unix/Linux: display average file size with restriction

How do I display the average file size (rounded down), using only cat, echo, ls, and wc? Here is what I was able to do so far: echo "$(cat * | wc -w; ls -l | wc -l)". I have both of the numbers, I just can't divide them. Any help would be appreciated, and thanks in advance.

Is it allowed to use the shell for the division?
$ ls -l
total 20
-rw-r--r-- 1 james james 968 Dec 29 2016 bar
-rw-r--r-- 1 james james 900 Dec 29 2016 bar.asc
-rw-r--r-- 1 james james 39 Dec 29 2016 compr.txt
-rw-r--r-- 1 james james 1056 Dec 28 2016 foo
-rw-r--r-- 1 james james 896 Dec 29 2016 foo.asc
$ cat * | wc -c
3859
$ ls | wc -l
5
$ echo $(( $(cat * | wc -c) / $(ls | wc -l) )) # solution part
771
$ echo 5*771 | bc
3855

You can do
n=( * ); s=( $(ls -sk) ); echo $(( ${s[1]} / ${#n[@]} ))
Use an array to count the number of files in the directory and ls -sk to get the total size in kilobytes, then print the integer quotient.
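Against the directory listed at the top of this page, this would print 4 (a sketch, assuming that directory contains only those five regular files and that ls -sk reports the same total 20): the average allocated size in KiB, rounded down, rather than the byte-exact 771 computed above.
$ n=( * ); s=( $(ls -sk) ); echo $(( ${s[1]} / ${#n[@]} ))
4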

Related

Remove all files with a common prefix except the N latest modified

I am trying to create a bash command/script to remove all files in a directory older than X days that start with a certain substring.
For example, if our directory contains the files
-rw-r--r-- 1 root root 0 Jun 30 10:22 foo_5
-rw-r--r-- 1 root root 0 Jun 29 10:22 bar_4
-rw-r--r-- 1 root root 0 Jun 29 10:22 foo_4
-rw-r--r-- 1 root root 0 Jun 28 10:22 bar_3
-rw-r--r-- 1 root root 0 Jun 28 10:22 foo_3
-rw-r--r-- 1 root root 0 Jun 27 10:22 bar_2
-rw-r--r-- 1 root root 0 Jun 27 10:22 foo_2
-rw-r--r-- 1 root root 0 Jun 26 10:22 foo_1
we want to delete all foo* files except the 2 most recent ones. This would leave the directory containing
-rw-r--r-- 1 root root 0 Jun 30 10:22 foo_5
-rw-r--r-- 1 root root 0 Jun 29 10:22 bar_4
-rw-r--r-- 1 root root 0 Jun 29 10:22 foo_4
-rw-r--r-- 1 root root 0 Jun 28 10:22 bar_3
-rw-r--r-- 1 root root 0 Jun 27 10:22 bar_2
I am currently only able to delete all files except the 2 most recent, which also affects the bar* files.
ls -t | tail -n +4 | xargs rm --
How can we also restrict the deletion to files that start with a certain string?
Code to create test files
(
touch -d "6 days ago" foo_5
touch -d "7 days ago" foo_4
touch -d "7 days ago" bar_4
touch -d "8 days ago" foo_3
touch -d "8 days ago" bar_3
touch -d "9 days ago" foo_2
touch -d "9 days ago" bar_2
touch -d "10 days ago" foo_1
)
Parsing the output of ls is not a good idea. Using tools from the GNU coreutils and findutils packages, a fail-safe program for this task can be written as below.
n=2 # except the last two
find -maxdepth 1 -type f -name 'foo*' \
    -printf '%T@\t%p\0' \
    | sort -z -k 1n,1 \
    | head -z -n -$n \
    | cut -z -f 2- \
    | xargs -0 rm
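A dry run against the test files created above (a sketch; echo is placed in front of rm so nothing is deleted yet):
find -maxdepth 1 -type f -name 'foo*' -printf '%T@\t%p\0' \
    | sort -z -k 1n,1 | head -z -n -$n | cut -z -f 2- | xargs -0 echo rm
which should print something like:
rm ./foo_1 ./foo_2 ./foo_3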
This is a job for stat
stat -c '%Y %n' foo* | sort -n | head -n -2 | cut -d " " -f 2- | xargs echo rm
rm foo_1 foo_2 foo_3
Remove "echo" if it is selecting the right files to delete.
Using perl and glob() (which also handles files with newlines or spaces in their names), via only one process:
perl -e '
    my @files = sort { -M $a <=> -M $b } grep -f, <./foo*>;
    unlink @files[2..$#files]
'
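To preview what the one-liner would remove before deleting anything, unlink can be swapped for a print (a sketch under the same assumptions):
perl -e '
    my @files = sort { -M $a <=> -M $b } grep -f, <./foo*>;
    print "$_\n" for @files[2..$#files]
'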

Find regular expression matching condition

I have a set of files including a date in their name:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
I want to select the files matching a condition on this date. In this example: date > 20151230
I tried things like:
find . -regex ".*.SUB.nc" | cut -d "." -f 4 | while read a; do if [ $a -ge 20151201 ]; then echo $a; fi; done
BUT:
1) This is returning only a part of the filename, whereas I would like to return the entire filename.
2) There may be a more elegant way than using while read/do
thanks in advance!
Rearranged, your code becomes:
#!/usr/bin/env bash
find . -regex ".*.SUB.nc" \
    | rev | cut -d '.' -f 3 | rev \
    | while read a; do
        if [ $a -ge 20151201 ]; then
            echo $a
        fi
    done
rev | cut -d '.' -f 3 | rev is used because, if you give an absolute path or the subdirectories have . in their names, the date won't necessarily be the 4th field, but it will always be the 3rd-last field.
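For example, feeding one of the paths that find prints through that filter extracts just the date (a sketch):
$ echo ./MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc | rev | cut -d '.' -f 3 | rev
20151229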
This will give the output:
20151231
20151229
20151230
To show the complete file names, replace echo $a with ls *$a*. Output:
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
I tested this script with file names whose dates are less than 20151201. For example MERRA2_400.tavg1_2d_slv_Nx.20151200.SUB.nc. The results are consistent.
Perhaps a more efficient way to accomplish your task is using a grep regex like:
find . -regex ".*.SUB.nc" | grep -E "201512(0[1-9]|[1-9][0-9])|201[6-9][0-9][0-9][0-9]"
This will work just fine.
find . -regex ".*.SUB.nc" | rev | cut -d '.' -f 3 | rev | while read a; do if [ $a -ge 20151201 ]; then echo `ls -R | grep $a` ;fi ;done
rev | cut -d '.' -f 3 | rev is used because, if you give an absolute path or the subdirectories have . in their names, the date won't be the 4th field, but it will always be the 3rd-last field.
ls -R | grep $a is used so that you can recursively find the full name of the file.
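For instance, with the layout shown below, a matched date such as 20151222 resolves back to its file name (a sketch):
$ ls -R | grep 20151222
MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc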
Assume the files and directory structure are:
[root@localhost temp]# ls -lrt -R
.:
total 8
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 16:32 temp.3
drwxr-xr-x. 3 root root 4096 Apr 25 17:13 temp2
./temp.3:
total 0
./temp2:
total 4
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151111.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 17:13 temp21
./temp2/temp21:
total 0
-rw-r--r--. 1 root root 0 Apr 25 17:13 MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
Running the above command gives:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc

grep - a simple issue with end of line

ls -alp $base/$currentDir | awk '{print $9}' | grep '/' | egrep -v '^t|^tz$|^html$|^\.'
I have this grep and I am trying to ignore matches whose full directory names are "t", "tz", or "html".
All is good except that ^html$ does not match while ^html does, and the same goes for ^tz$ not matching. Somehow the $ is not being recognized as end of line, although ^ works fine as start of line.
I really want to know the answer to the above, and secondarily, is there a different way to get list of all subdirectories in a given directory?
I found ls -d, but does that not take a directory parameter?
ls -d * /
/ arch index.html
That works fine, but here are my unsuccessful tries:
abc> ls -d * /
/ arch index.html
abc> ls -d ../../arizona /
../../arizona /
abc> ls -d ../../arizona
../../arizona
abc> ls -d '../../arizona'
../../arizona
abc> ls -d '../../arizona' /
../../arizona /
while this is the layout
abc> ls -alp ../../arizona | grep '/'
drwxr-xr-x 7 roberto007 inetuser 4096 Jan 26 11:16 ./
drwxr-xr-x 205 roberto007 inetuser 28672 Mar 10 11:07 ../
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:17 grand-canyon/
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:16 havasu-falls/
drwxr-xr-x 2 roberto007 inetuser 28672 Feb 27 2014 html/
drwxr-xr-x 4 roberto007 inetuser 4096 Jan 26 11:17 sedona/
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:16 superstitions/
This should work:
cd $base/$currentDir
printf '%s\n' */ | egrep -v '^t|^tz/$|^html/$'
or
printf '%s\n' $base/$currentDir/*/ | egrep -v '^t|^tz/$|^html/$'
*/ lists only directories
printf '%s\n' puts a newline after each directory
egrep does what you want, no need to filter out ./ because hidden directories are not expanded by */
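Run from inside the arizona directory listed earlier, this would behave roughly as follows (a sketch, assuming those five subdirectories):
$ printf '%s\n' */ | egrep -v '^t|^tz/$|^html/$'
grand-canyon/
havasu-falls/
sedona/
superstitions/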

Split texts into smaller texts of n number of words

I have a large number of texts (several thousand) in .txt format and would like to split them into 500-word chunks and save these chunks into separate folders.
< *.txt tr -c A-Za-z0-9 \\n | grep -v '^$' | split -l 500
can do the job, but it splits the text into one word per line, whereas I would like to retain the original format.
I was wondering if there is a bash command or Python script to do this.
You should also be able to do that with csplit, but I had better luck with the perl solution found here: https://unix.stackexchange.com/questions/66513/how-can-i-split-a-large-text-file-into-chunks-of-500-words-or-so
Thanks to Joseph R.
$ cat generatewordchunks.pl
perl -e '
undef $/;
$file=<>;
while($file=~ /\G((\S+\s+){500})/gc)
{
    $i++;
    open A,">","chunk-$i.txt";
    print A $1;
    close A;
}
$i++;
if($file=~ /\G(.+)\Z/sg)
{
    open A,">","chunk-$i.txt";
    print A $1;
}
' $1
$ ./generatewordchunks.pl woord.list
$ ls -ltr
total 13
-rwxrwx--- 1 root vboxsf 5934 Jul 31 16:03 woord.list
-rwxrwx--- 1 root vboxsf 362 Jul 31 16:08 generatewordchunks.pl
-rwxrwx--- 1 root vboxsf 4203 Jul 31 16:11 chunk-1.txt
-rwxrwx--- 1 root vboxsf 1731 Jul 31 16:11 chunk-2.txt
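To confirm the chunking afterwards, wc can count the words per chunk (a sketch); every chunk-N.txt except possibly the last should report 500 words:
$ wc -w chunk-*.txt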

Print permissions from file arguments in Bash script

I'm having trouble reading the permissions of file arguments. It looks like it has something to do with hidden files, but I'm not sure why.
Current Code:
#!/bin/bash
if [ $# = 0 ]
then
    echo "Usage ./checkPerm filename [filename2 ... filenameN]"
    exit 0
fi
for file in $@
do
    ls -l | grep $file | cut -f1 -d' '
    # Do Something
done
I can get the permissions for each input, but when a hidden file is run through the loop it re-prints the permissions of all files.
-bash-4.1$ ll test*
-rw-r--r-- 1 user joe 0 Nov 11 19:07 test1
-r-xr-xr-x 1 user joe 0 Nov 11 19:07 test2*
-r--r----- 1 user joe 0 Nov 11 19:07 test3
-rwxr-x--- 1 user joe 0 Nov 11 19:07 test4*
-bash-4.1$ ./checkPerm test*
-rw-r--r--
-rw-r--r--
-r-xr-xr-x
-r--r-----
-rwxr-x---
-r--r-----
-rw-r--r--
-r-xr-xr-x
-r--r-----
-rwxr-x---
-bash-4.1$
What is going on in the loop?
It's your grep:
ls -l | grep 'test2*'
This will grep out anything containing test, because in a regular expression 2* means zero or more 2s, so test2* matches test followed by zero or more 2s anywhere in the line.
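A quick way to see this, piping a few hypothetical names straight into grep (a sketch):
$ printf 'test1\ntest2\ntest22\ntest3\n' | grep 'test2*'
test1
test2
test22
test3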
To get your intended result, simply remove your loop and replace it with this:
ls -l "$#" | cut -d' ' -f1
Or keep your loop, but remove the grep:
ls -l $file | cut -d' ' -f1
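With either fix, the four arguments from the listing above should each produce exactly one line (a sketch):
-bash-4.1$ ./checkPerm test*
-rw-r--r--
-r-xr-xr-x
-r--r-----
-rwxr-x---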
Also, technically, none of those files are hidden. Hidden files in bash start with ., like .bashrc.
When you do the ls -l inside the loop and then grep the results, if there are files that contain test1 in the name, but not at the start, they are selected by the grep, giving you extra results. You could see that by doing:
ls -l | grep test
and seeing that there are many more entries than the 4 you get with ls -l test*.
Inside your loop, you should probably use just:
ls -ld "$file" | cut -d' ' -f1
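The -d flag matters when an argument is a directory: without it, ls -l lists the directory's contents instead of the directory itself. For example (a sketch; on most systems /tmp is a sticky, world-writable directory):
$ ls -ld /tmp | cut -d' ' -f1
drwxrwxrwt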
