grep - a simple issue with end of line - linux

ls -alp $base/$currentDir | awk '{print $9}' | grep '/' | egrep -v '^t|^tz$|^html$|^\.'
I have this grep and I am trying to ignore directories whose full names are "t", "tz", or "html".
All is good except that ^html$ does not match while ^html does, and likewise ^tz$ does not match -- somehow the $ is not being recognized as end of line, while ^ works fine as start of line.
I really want to know the answer to the above and, secondarily: is there a different way to get a list of all subdirectories in a given directory?
I found ls -d, but it does not seem to take a directory parameter:
ls -d * /
/ arch index.html
that works fine
but unsuccessful tries:
abc> ls -d * /
/ arch index.html
abc> ls -d ../../arizona /
../../arizona /
abc> ls -d ../../arizona
../../arizona
abc> ls -d '../../arizona'
../../arizona
abc> ls -d '../../arizona' /
../../arizona /
while this is the layout:
abc> ls -alp ../../arizona | grep '/'
drwxr-xr-x 7 roberto007 inetuser 4096 Jan 26 11:16 ./
drwxr-xr-x 205 roberto007 inetuser 28672 Mar 10 11:07 ../
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:17 grand-canyon/
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:16 havasu-falls/
drwxr-xr-x 2 roberto007 inetuser 28672 Feb 27 2014 html/
drwxr-xr-x 4 roberto007 inetuser 4096 Jan 26 11:17 sedona/
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:16 superstitions/

This should work. (The reason your $ anchor seemed broken: ls -alp appends a trailing / to each directory name, so the lines are actually html/ and tz/, which ^html$ can never match.)
cd $base/$currentDir
printf '%s\n' */ | egrep -v '^t/$|^tz/$|^html/$'
or
printf '%s\n' "$base/$currentDir"/*/ | egrep -v '/(t|tz|html)/$'
*/ lists only directories
printf '%s\n' puts a newline after each directory
egrep does what you want, no need to filter out ./ because hidden directories are not expanded by */
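A quick way to see the trailing slash for yourself (a throwaway sketch using a scratch directory; the names are made up for the demo):

```shell
# In a scratch directory with a subdirectory "html", the */ glob (like
# ls -p) yields "html/", so ^html$ cannot match but ^html/$ can.
demo=$(mktemp -d)
mkdir "$demo/html" "$demo/arch"
cd "$demo"
printf '%s\n' */ | grep -E '^html$' || echo 'no match'   # the line is "html/"
printf '%s\n' */ | grep -E '^html/$'                     # prints: html/
```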

Related

Remove all files with a common prefix except the N latest modified

I am trying to create a bash command/script to remove all files in a directory older than X days that start with a certain substring.
For example, if our directory contains the files
-rw-r--r-- 1 root root 0 Jun 30 10:22 foo_5
-rw-r--r-- 1 root root 0 Jun 29 10:22 bar_4
-rw-r--r-- 1 root root 0 Jun 29 10:22 foo_4
-rw-r--r-- 1 root root 0 Jun 28 10:22 bar_3
-rw-r--r-- 1 root root 0 Jun 28 10:22 foo_3
-rw-r--r-- 1 root root 0 Jun 27 10:22 bar_2
-rw-r--r-- 1 root root 0 Jun 27 10:22 foo_2
-rw-r--r-- 1 root root 0 Jun 26 10:22 foo_1
we want to delete all foo* files except the 2 most recent ones. This will result in the directory
-rw-r--r-- 1 root root 0 Jun 30 10:22 foo_5
-rw-r--r-- 1 root root 0 Jun 29 10:22 bar_4
-rw-r--r-- 1 root root 0 Jun 29 10:22 foo_4
-rw-r--r-- 1 root root 0 Jun 28 10:22 bar_3
-rw-r--r-- 1 root root 0 Jun 27 10:22 bar_2
I am currently only able to delete all files except the 2 most recent, which also affects the bar* files.
ls -t | tail -n +4 | xargs rm --
How can we also restrict the deletion to files that start with a certain string?
Code to create test files
(
touch -d "6 days ago" foo_5
touch -d "7 days ago" foo_4
touch -d "7 days ago" bar_4
touch -d "8 days ago" foo_3
touch -d "8 days ago" bar_3
touch -d "9 days ago" foo_2
touch -d "9 days ago" bar_2
touch -d "10 days ago" foo_1
)
Parsing the output of ls is not a good idea. Using tools from the GNU coreutils and findutils packages, a fail-safe program for this task can be written as below.
n=2 # except the last two
find -maxdepth 1 -type f -name 'foo*' \
    -printf '%T@\t%p\0' \
  | sort -z -k 1n,1 \
  | head -z -n -$n \
  | cut -z -f 2- \
  | xargs -0 rm
This is a job for stat
stat -c '%Y %n' foo* | sort -n | head -n -2 | cut -d " " -f 2- | xargs echo rm
rm foo_1 foo_2 foo_3
Remove "echo" if it is selecting the right files to delete.
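If the file names are known to be "clean" (no spaces, newlines, or leading dashes), the asker's own one-liner needs only its glob restricted to the prefix; a minimal sketch, demonstrated in a scratch directory recreating the question's files:

```shell
# Restricting the glob to foo* keeps ls -t, tail, and xargs but leaves
# bar* files untouched.  Assumes file names without whitespace, since
# ls output is parsed line by line.
dir=$(mktemp -d); cd "$dir"
touch -d "6 days ago" foo_5
touch -d "7 days ago" foo_4 bar_4
touch -d "8 days ago" foo_3 bar_3
touch -d "9 days ago" foo_2 bar_2
touch -d "10 days ago" foo_1
ls -t foo* | tail -n +3 | xargs -r rm --
```

After this, foo_5 and foo_4 remain along with all bar* files.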
Using perl and glob() (which handles files with newlines or spaces as well) via only one process:
perl -e '
  my @files = sort { -M $a <=> -M $b } grep -f, <./foo*>;
  unlink @files[2..$#files]
'

Find regular expression matching condition

I have a set of files including a date in their name:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
I want to select the files matching a condition on this date. In this example: date > 20151230
I tried things like:
find . -regex ".*.SUB.nc" | cut -d "." -f 4 | while read a; do if [ $a -ge 20151201 ]; then echo $a; fi; done
BUT:
1) This returns only part of the filename, whereas I would like to return the entire filename.
2) There may be a more elegant way than using while read/do.
Thanks in advance!
Rearranging, your code becomes:
#!/usr/bin/env bash
find . -regex ".*.SUB.nc" \
| rev | cut -d '.' -f 3 | rev \
| while read a; do
    if [ "$a" -ge 20151201 ]; then
        echo "$a"
    fi
done
rev | cut -d '.' -f 3 | rev is used because
if you give absolute path or
the subdirectories have . in them
then it won't be the 4th field, but it will always be the 3rd last field.
This will give the output:
20151231
20151229
20151230
To show the complete file names, change echo "$a" to ls *"$a"*. Output:
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
I tested this script with file names whose dates are earlier than 20151201, for example MERRA2_400.tavg1_2d_slv_Nx.20151200.SUB.nc. The results are consistent.
Perhaps a more efficient way to accomplish your task is using a grep regex like:
find . -regex ".*.SUB.nc" | grep -E "201512(0[1-9]|[1-9][0-9])|201[6-9][0-9][0-9][0-9]"
This will work just fine.
find . -regex ".*.SUB.nc" | rev | cut -d '.' -f 3 | rev | while read a; do if [ $a -ge 20151201 ]; then echo `ls -R | grep $a` ;fi ;done
rev | cut -d '.' -f 3 | rev is used because
if you give absolute path or
the subdirectories have . in them
then it won't be the 4th field, but it will always be the 3rd last field.
ls -R | grep $a so that you can recursively find out the name of the file.
Assume the file and directory structure is:
[root@localhost temp]# ls -lrt -R
.:
total 8
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 16:32 temp.3
drwxr-xr-x. 3 root root 4096 Apr 25 17:13 temp2
./temp.3:
total 0
./temp2:
total 4
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151111.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 17:13 temp21
./temp2/temp21:
total 0
-rw-r--r--. 1 root root 0 Apr 25 17:13 MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
Running the above command gives:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
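The selection can also be done without find, rev, or cut, using parameter expansion to pull the date field out of each name and printing the full filename directly. A sketch, assuming names end in .DATE.SUB.nc; newer_than is an illustrative name:

```shell
# Print every ./*.SUB.nc file whose embedded date is >= the argument.
newer_than() {  # usage: newer_than YYYYMMDD
  for f in ./*.SUB.nc; do
    [ -e "$f" ] || continue
    d=${f%.SUB.nc}    # strip the trailing .SUB.nc ...
    d=${d##*.}        # ... then keep the last dot-separated field: the date
    [ "$d" -ge "$1" ] && printf '%s\n' "$f"
  done
}
# usage: newer_than 20151201
```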

Bash - Filter out directories from ls -la output, but leave directory "."

I am trying to count the size of all files and subdirectories starting from ./ using a one-liner:
ls -laR | grep -v "\.\." | awk '{total += $5} END {print total}'
but this counts the size of subdirectories twice, because the output of ls -laR | grep -v "\.\." is:
.:
total 32
drwxr-xr-x 3 root root 4096 Nov 29 22:59 .
-rw-r--r-- 1 root root 55 Nov 29 02:19 131
-rw-r--r-- 1 root root 50 Nov 29 01:28 abc
-rw-r--r-- 1 root root 1000 Nov 29 01:27 access.log
drwxr-xr-x 2 root root 4096 Nov 29 22:24 asd
-rwx------ 1 root root 458 Nov 29 02:54 oneliners.sh
-rwx------ 1 root root 2136 Nov 29 17:56 regexp.sh.skript
./asd:
total 32
drwxr-xr-x 2 root root 4096 Nov 29 22:24 .
-rw-r--r-- 1 root root 21298 Nov 29 22:26 asd
so it counts the directory asd twice: once in the listing of directory .: as:
drwxr-xr-x 2 root root 4096 Nov 29 22:24 asd
and a 2nd time in the listing of directory ./asd: as:
drwxr-xr-x 2 root root 4096 Nov 29 22:24 .
I expect this will happen for every subdirectory. Is there a way to remove them once from the ls output? Using grep -v '^d' removes all directories, so they won't be counted at all. I know I can do it simply by using du -sb, but I need it to be done with a fancy one-liner.
ls -FlaR |grep -v '\s\.\{1,\}/$' |awk '{total += $5} END {print total}'
includes the size of folders inside '.', but not the size of '.' itself. Comparing with du, the answer is quite different, since du measures space on disk (it counts blocks).
The answer I get using your awk script is closer to what the OS reports: if you subtract the directory sizes you get a match, which suggests that MacOS X uses a method similar to
ls -FlaR |grep -v '^d.*/$' |awk '{total += $5} END {print total}'
for calculating the size of the content of a folder.
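If parsing ls can be avoided after all, a files-only total is available straight from find; a sketch, assuming GNU find. Note this sums apparent file sizes and, unlike the ls pipelines above, skips directory entries entirely:

```shell
# Sum the apparent size in bytes of every regular file under ., recursively.
# %s prints each file's size; awk accumulates the total.
find . -type f -printf '%s\n' | awk '{ total += $1 } END { print total + 0 }'
```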

How to list parent directories by last updated date inclusive of all that folder's files

I want it to recursively look through everything in the current directory (/data/trac) and list only the parent items (test, test2, project1), each with the timestamp of the newest file inside that directory tree next to it, sorted by that timestamp.
Here is the scenario:
$ pwd
$ /data/trac
$ ls -lht
drwxrwxr-x 9 www-data www-data 4.0K Apr 30 2012 test
drwxrwxr-x 9 www-data www-data 4.0K Apr 30 2013 test2
drwxrwxr-x 9 www-data www-data 4.0K Apr 30 2013 project1
$ cd test
$ ls -lht
drwxrwxr-x 2 www-data www-data 4.0K Feb 4 16:12 db
drwxrwxr-x 2 www-data www-data 4.0K Dec 13 13:16 conf
drwxrwxr-x 4 www-data www-data 4.0K Jan 11 2013 attachments
drwxrwxr-x 2 www-data www-data 4.0K Apr 30 2012 templates
We have a directory called "test" which was last updated April 30th 2012. In this case there is a db folder inside that directory containing a file which was updated Feb 4th 2014; I want to use that date as the timestamp for the parent folder "test".
What I want to do is display only the parent folders (test, test2, and project1), sorted by the last updated date (recursively), and display that date.
So the output should be:
$ awesome-list-command
Feb 4 2014 test
Feb 2 2014 test2
I have scoured the Internet for hours trying to find this, and even messing about myself to no avail. I have tried:
find . -exec stat -f "%m" \{} \; | sort -n -r | head -1
find $1 -type f | xargs stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head
find /some/dir -printf "%T+\n" | sort -nr | head -n 1
find /some/dir -printf "%TY-%Tm-%Td %TT\n" | sort -nr | head -n 1
stat --printf="%y %n\n" $(ls -tr $(find * -type f))
None of which have worked.
My testcase is a tree like this:
$ tree -t .
.
├── test2
│   └── db
│       ├── foo
│       └── bar
└── test
    └── db
        ├── foo
        └── bar
foo is the newest file in each directory.
#!/bin/bash
# awesome-list-command
for dir in */; do
  timestamp=$(find "./$dir" -type f -printf "%T@ %t\n" | sort -nr -k 1,2 | head -n 1)
  printf "%s %s\n" "$timestamp" "$dir"
done | sort -nr -k 1,2 | awk '{$1=""; print}'
Output:
$ ./awesome-list-command
Tue Feb 4 23:29:41.0766864265 2014 test2/
Tue Feb 4 23:29:40.0026788568 2014 test/
for comparison:
$ stat -c "%y" test*/db/foo
2014-02-04 23:29:41.766864265 +0100
2014-02-04 23:29:40.026788568 +0100
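The per-directory loop can also be packaged as a small function; a sketch assuming GNU find (newest_first is an illustrative name), printing each top-level directory after the epoch mtime of its newest file, newest first:

```shell
# List top-level directories sorted by the mtime of their newest file.
# %T@ prints each file's mtime as seconds since the epoch, which sorts
# numerically without any date parsing.
newest_first() {
  for dir in */; do
    [ -d "$dir" ] || continue
    ts=$(find "$dir" -type f -printf '%T@\n' | sort -nr | head -n 1)
    [ -n "$ts" ] && printf '%s %s\n' "$ts" "$dir"
  done | sort -k1,1nr
}
# usage: newest_first
```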

Tail latest file that matches a selected rule

I have a directory that multiple processes log to, and I want to tail the latest file of a selected process.
In ~/.bashrc I have added the following:
function __taillog {
tail -f $(find $1 -maxdepth 1 -type f -printf "%T@ %p\n" | sort -n | tail -n 1 | cut -d' ' -f 2-)
}
alias taillog='__taillog'
Taken from: https://superuser.com/questions/117596/how-to-tail-the-latest-file-in-a-directory
An example of the log file directory
-rw-r--r-- 1 genesys genesys 2284 Mar 19 16:34 gdalog.20130319_163436_906.log
-rw-r--r-- 1 genesys genesys 131072 Mar 19 16:34 gdalog.20130319_163436_906.snapshot.log
-rw-r--r-- 1 genesys genesys 10517 Mar 19 16:54 lcalog.20130319_163332_719.log
-rw-r--r-- 1 genesys genesys 131072 Mar 19 16:54 lcalog.20130319_163332_719.snapshot.log
-rw-r--r-- 1 genesys genesys 3792 Mar 19 16:37 StatServer_TLSTest.20130319_163700_703.log
-rw-r--r-- 1 genesys genesys 160562 Mar 19 16:52 StatServer_TLSTest.20130319_163712_045.log
-rw-r--r-- 1 genesys genesys 49730 Mar 19 16:54 StatServer_TLSTest.20130319_165217_402.log
-rw-r--r-- 1 genesys genesys 53960 Mar 20 09:55 StatServer_TLSTest.20130319_165423_702.log
-rw-r--r-- 1 genesys genesys 131072 Mar 20 09:56 StatServer_TLSTest.20130319_165423_702.snapshot.log
So, to tail the latest StatServer file, the command would be
taillog /home/user/logs/StatServer*
and it would tail the latest file for that application in the given path
The issue is that tail displays some of the file's output but does not show any updates when the log file is appended to. If the following command is run directly, the log is tailed correctly:
tail -f $(find /home/user/logs/StatServer* -maxdepth 1 -type f -printf "%T@ %p\n" | sort -n | tail -n 1 | cut -d' ' -f 2-)
Somehow, wrapping this command in a bash function and calling it from an alias causes it to not operate as desired.
Any suggestions on a better way are welcome.
I believe you should be running this command:
taillog /home/user/logs
When you say /home/user/logs/this_app*, you're passing all the files that match the pattern as arguments to taillog, but only the first one, $1, is used; the command eventually translates to tail -f on that single file.
Instead, $1 should be the directory where find should look for the files (i.e. /home/user/logs in your case); the results are then piped to sort, tail and cut.
I didn't have any problems running your taillog function on linux/bash. Perhaps the log output is being buffered, so changes aren't being written right away? You might try turning off the [log]buffering option for this StatServer.
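One way to keep the function but sidestep the glob-expansion problem is to pass the directory and the name pattern as separate arguments, quoting the pattern so find expands it rather than the calling shell. A hypothetical rework; taillog and __newestlog are illustrative names:

```shell
# Print the newest regular file in directory $1 matching optional glob $2.
__newestlog() {
  find "$1" -maxdepth 1 -type f -name "${2:-*}" -printf '%T@ %p\n' \
    | sort -n | tail -n 1 | cut -d' ' -f 2-
}
# Follow that file with tail -f.
taillog() { tail -f "$(__newestlog "$@")"; }
# usage: taillog /home/user/logs 'StatServer*'
```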
