Find regular expression matching condition - linux

I have a set of files including a date in their name:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
I want to select the files matching a condition on this date. In this example: date > 20151230
I tried things like:
find . -regex ".*.SUB.nc" | cut -d "." -f 4 | while read a; do if [ $a -ge 20151201 ]; then echo $a; fi; done
BUT:
1) This is returning only a part of the filename, whereas I would like to return the entire filename.
2) There may be a more elegant way than using while read/do
thanks in advance!

Rearranged, your code becomes:
#!/usr/bin/env bash
find . -regex ".*.SUB.nc" \
| rev | cut -d '.' -f 3 | rev \
| while read a; do
if [ $a -ge 20151201 ]; then
echo $a
fi
done
rev | cut -d '.' -f 3 | rev is used because, if you
give an absolute path, or
the subdirectories have a . in their names,
then the date is no longer the 4th field, but it is always the 3rd field from the end.
This will give the output:
20151231
20151229
20151230
To show the complete file names, replace echo $a with ls *$a*. Output:
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
I tested this script with file names whose dates are less than 20151201. For example MERRA2_400.tavg1_2d_slv_Nx.20151200.SUB.nc. The results are consistent.
Perhaps a more efficient way to accomplish your task is using a grep regex like:
find . -regex ".*.SUB.nc" | grep -E "201512(0[1-9]|[1-9][0-9])|201[6-9][0-9][0-9][0-9]"
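If you would rather keep the numeric comparison than encode it in a regex, the date can be cut out of each name with shell parameter expansion while the full path is kept for printing. This is only a sketch: files_from_date is a made-up helper name, and it assumes every name has the form <prefix>.<YYYYMMDD>.SUB.nc as in the question.

```shell
# files_from_date DIR MIN: print the path of every *.SUB.nc file below DIR
# whose embedded date is >= MIN. (Hypothetical helper, not an existing tool.)
files_from_date() (
    dir=$1 min=$2
    find "$dir" -type f -name '*.SUB.nc' -exec sh -c '
        min=$1; shift
        for path in "$@"; do
            name=${path##*/}        # strip the leading directories
            name=${name%.SUB.nc}    # drop the fixed suffix
            date=${name##*.}        # text after the last remaining dot
            case $date in
                *[!0-9]*|"") continue ;;   # skip names without a numeric date
            esac
            if [ "$date" -ge "$min" ]; then
                printf "%s\n" "$path"
            fi
        done
    ' sh "$min" {} +
)
```

For example, files_from_date . 20151201 prints the matching paths one per line, including files in subdirectories.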

This will work just fine.
find . -regex ".*.SUB.nc" | rev | cut -d '.' -f 3 | rev | while read a; do if [ $a -ge 20151201 ]; then echo `ls -R | grep $a` ;fi ;done
rev | cut -d '.' -f 3 | rev is used because, if you
give an absolute path, or
the subdirectories have a . in their names,
then the date is no longer the 4th field, but it is always the 3rd field from the end.
ls -R | grep $a is used to look up the full file name recursively.
Assume the files and directory structure are:
[root@localhost temp]# ls -lrt -R
.:
total 8
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:15 MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 16:32 temp.3
drwxr-xr-x. 3 root root 4096 Apr 25 17:13 temp2
./temp.3:
total 0
./temp2:
total 4
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151111.SUB.nc
-rw-r--r--. 1 root root 0 Apr 25 16:27 MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
drwxr-xr-x. 2 root root 4096 Apr 25 17:13 temp21
./temp2/temp21:
total 0
-rw-r--r--. 1 root root 0 Apr 25 17:13 MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
Running the above command gives:
MERRA2_400.tavg1_2d_slv_Nx.20151229.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151231.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151230.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151333.SUB.nc
MERRA2_400.tavg1_2d_slv_Nx.20151222.SUB.nc
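The same "3rd field from the end" idea can also be expressed with awk, which prints the whole path (directory included) instead of only the date. A rough sketch, assuming paths contain no newlines; paths_from_date is a hypothetical helper name.

```shell
# paths_from_date DIR MIN: print paths of *.SUB.nc files dated MIN or later.
paths_from_date() {
    # -F. splits each path on dots; $(NF-2) is the third field from the end,
    # which is the date in <prefix>.<YYYYMMDD>.SUB.nc wherever the file lives.
    find "$1" -type f -name '*.SUB.nc' |
        awk -F. -v min="$2" '$(NF-2) + 0 >= min + 0'
}
```

Called as paths_from_date . 20151201, it should list the same files as the loop above, but with their directories.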

Related

Remove all files with a common prefix except the N latest modified

I am trying to create a bash command/script to remove all files in a directory older than X days that start with a certain substring.
For example, if our directory contains the files
-rw-r--r-- 1 root root 0 Jun 30 10:22 foo_5
-rw-r--r-- 1 root root 0 Jun 29 10:22 bar_4
-rw-r--r-- 1 root root 0 Jun 29 10:22 foo_4
-rw-r--r-- 1 root root 0 Jun 28 10:22 bar_3
-rw-r--r-- 1 root root 0 Jun 28 10:22 foo_3
-rw-r--r-- 1 root root 0 Jun 27 10:22 bar_2
-rw-r--r-- 1 root root 0 Jun 27 10:22 foo_2
-rw-r--r-- 1 root root 0 Jun 26 10:22 foo_1
we want to delete all foo* files except the 2 most recent ones. This will result in the directory
-rw-r--r-- 1 root root 0 Jun 30 10:22 foo_5
-rw-r--r-- 1 root root 0 Jun 29 10:22 bar_4
-rw-r--r-- 1 root root 0 Jun 29 10:22 foo_4
-rw-r--r-- 1 root root 0 Jun 28 10:22 bar_3
-rw-r--r-- 1 root root 0 Jun 27 10:22 bar_2
I am currently only able to delete all files except the 2 most recent, which will affect bar* files.
ls -t | tail -n +4 | xargs rm --
How can we also restrict our deletion to files that start with a certain string?
Code to create test files
(
touch -d "6 days ago" foo_5
touch -d "7 days ago" foo_4
touch -d "7 days ago" bar_4
touch -d "8 days ago" foo_3
touch -d "8 days ago" bar_3
touch -d "9 days ago" foo_2
touch -d "9 days ago" bar_2
touch -d "10 days ago" foo_1
)
Parsing the output of ls is not a good idea. Using tools from GNU coreutils and findutils packages, a fail-safe program to achieve this task can be written as below.
n=2 # except the last two
find -maxdepth 1 -type f -name 'foo*' \
-printf '%T@\t%p\0' \
| sort -z -k 1n,1 \
| head -z -n -$n \
| cut -z -f 2- \
| xargs -0 rm
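Wrapped in a function, that pipeline might look like the sketch below. prune_prefix and its three parameters are names made up for this illustration, and it relies on the same GNU find, sort, head, cut and xargs options as above.

```shell
# prune_prefix DIR PREFIX KEEP: among the regular files in DIR whose names
# start with PREFIX, remove all but the KEEP most recently modified.
# (Hypothetical helper; PREFIX is used inside a find -name pattern, so it
# should not contain glob characters.)
prune_prefix() (
    dir=$1 prefix=$2 keep=$3
    find "$dir" -maxdepth 1 -type f -name "${prefix}*" -printf '%T@\t%p\0' |
        sort -z -k 1n,1 |        # oldest first, by modification time
        head -z -n "-$keep" |    # drop the last KEEP records (the newest)
        cut -z -f 2- |           # strip the timestamp, keep the path
        xargs -0 -r rm --        # -r: do nothing when the list is empty
)
```

With the test files from the question, prune_prefix . foo 2 should leave foo_5, foo_4 and all bar_* files in place.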
This is a job for stat
stat -c '%Y %n' foo* | sort -n | head -n -2 | cut -d " " -f 2- | xargs echo rm
rm foo_1 foo_2 foo_3
Remove "echo" if it is selecting the right files to delete.
Using perl and glob() (this also handles file names with newlines or spaces) in a single process:
perl -e '
my @files = sort { -M $a <=> -M $b } grep -f, <./foo*>;
unlink @files[2..$#files]
'

Sort files in directory and then printing the content

I need to write a script to sort filenames by the character that comes after the first "0" in the name. All the file names contain at least one 0.
Then the script should print the content of each file by that order.
I know I need to use sort and cat, but I can't figure out which sort options to use. This is as far as I've got:
#!/bin/bash
dir=$(pwd)
for n in $dir `ls | sort -u ` ; do
cat $n
done;
Assuming that
the first zero could be anywhere in the filename,
there could be several files with the same name after the zero,
you want to be able to handle any filename, including dotfiles and names containing newlines, and
you have GNU CoreUtils installed (standard on common distros),
you'll need to do something crazy like this (untested):
find . -mindepth 1 -maxdepth 1 -exec printf '%s\0' {} + | while IFS= read -r -d ''
do
printf '%s\0' "${REPLY#*0}"
done | sort --unique --zero-terminated | while IFS= read -r -d ''
do
for file in ./*"$REPLY"
do
[…]
done
done
Explanation:
Print all filenames NUL separated and read them back in to be able to do variable substitution on them.
Remove everything up to and including the first zero in the filename and print that.
Sort by the remainder of the filename, making sure to only print each unique suffix once.
Process each file ending with the (now sorted) suffix.
Take a look at this find + xargs that will correctly handle filenames with "funny characters":
find . -maxdepth 1 -type f -name '*0*' -print0 | sort -zt0 -k2 | xargs -0 cat
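If you want the sort key to be explicit, one option is to print "key, tab, name" pairs, sort on the key, and strip it again. The sketch below is an illustration only: cat_by_zero_suffix is a hypothetical name, hidden files are ignored, and it assumes file names contain no tabs or newlines.

```shell
# cat_by_zero_suffix DIR: cat the regular files in DIR whose names contain
# a 0, ordered by the text that follows the first 0 in each name.
cat_by_zero_suffix() (
    cd "$1" || exit 1
    for f in *0*; do
        [ -f "$f" ] || continue              # skip directories and an unmatched glob
        printf '%s\t%s\n' "${f#*0}" "$f"     # key = everything after the first 0
    done | LC_ALL=C sort -t "$(printf '\t')" -k1,1 | cut -f 2- |
    while IFS= read -r f; do
        cat -- "$f"
    done
)
```

Using LC_ALL=C makes the ordering byte-wise; drop it if you prefer the ordering of your locale.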
You could write a script that looks like this:
#!/bin/bash
# using "shopt -s nullglob" so that an empty directory won't give you a literal '*'.
shopt -s nullglob
# get a sorted directory listing
filelist=$(for i in .*0* *0*; do echo "$i"; done | sort -t0 -k2)
IFS=$(echo -en "\n\b")
# iterate over your sorted list
for f in $filelist
do
# just cat text files.
file $f | grep text > /dev/null 2>&1
if [ $? = 0 ]
then
cat $f
fi
done
Test:
[plankton@localhost SO_scripts]$ ls -l
total 40
-rw-r--r-- 1 plankton plankton 10 Sep 9 10:56 afile0zzz
-rw-r--r-- 1 plankton plankton 14 Sep 9 10:56 bfile xxx0yyy
-rwxr-xr-x 1 plankton plankton 488 Sep 9 10:56 catfiles.sh
-rw-r--r-- 1 plankton plankton 9 Sep 9 10:56 file0123
-rw-r--r-- 1 plankton plankton 9 Sep 9 10:56 file0124
-rw-r--r-- 1 plankton plankton 7 Sep 9 10:56 file0a
-rw-r--r-- 1 plankton plankton 8 Sep 9 10:56 file0aa
-rw-r--r-- 1 plankton plankton 7 Sep 9 10:56 file0b
-rw-r--r-- 1 plankton plankton 9 Sep 9 10:56 file0bbb
-rw-r--r-- 1 plankton plankton 18 Sep 9 10:56 files*_0asdf
[plankton@localhost SO_scripts]$ ./catfiles.sh
. is not a text file
.. is not a text file
Doing catfiles.sh
#/bin/bash
# using "shopt -s nullglob" so that an empty directory won't give you a literal '*'.
shopt -s nullglob
# get a sorted directory listing
filelist=$(for i in .* *; do echo "$i"; done | sort -t0 -k2)
IFS=$(echo -en "\n\b")
# iterate over your sorted list
for f in $(for i in .* *; do echo "$i"; done | sort -t0 -k2)
do
# just cat text files.
file $f | grep text > /dev/null 2>&1
if [ $? = 0 ]
then
echo "Doing $f"
cat $f
else
echo "$f is not a text file"
fi
done
Doing file0123
file0123
Doing file0124
file0124
Doing file0a
file0a
Doing file0aa
file0aa
Doing files*_0asdf
file with * in it
Doing file0b
file0b
Doing file0bbb
file0bbb
Doing bfile xxx0yyy
bfile xxx0yyy
Doing afile0zzz
afile0zzz
Updated as per PesaThe's suggestion of .*0* *0*.
dir=$(pwd)
for n in `ls -1 $dir | sort -t0 -k2`; do
cat $n
done;

grep - a simple issue with end of line

ls -alp $base/$currentDir | awk '{print $9}' | grep '/' | egrep -v '^t|^tz$|^html$|^\.'
I have this grep and I am trying to ignore matches with "t" "tz" or "html" full names of directories.
All is good except that ^html$ does not match, while ^html does match, same for ^tz$ not matching -- somehow the $ is not being recognized as end of line. ^ is fine as start of line.
I really want to know the answer to the above, and secondarily, is there a different way to get list of all subdirectories in a given directory?
I found ls -d, but it does not seem to take a directory parameter:
ls -d * /
/ arch index.html
that works fine
but unsuccessful tries:
abc> ls -d * /
/ arch index.html
abc> ls -d ../../arizona /
../../arizona /
abc> ls -d ../../arizona
../../arizona
abc> ls -d '../../arizona'
../../arizona
abc> ls -d '../../arizona' /
../../arizona /
while this is the layout
abc> ls -alp ../../arizona | grep '/'
drwxr-xr-x 7 roberto007 inetuser 4096 Jan 26 11:16 ./
drwxr-xr-x 205 roberto007 inetuser 28672 Mar 10 11:07 ../
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:17 grand-canyon/
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:16 havasu-falls/
drwxr-xr-x 2 roberto007 inetuser 28672 Feb 27 2014 html/
drwxr-xr-x 4 roberto007 inetuser 4096 Jan 26 11:17 sedona/
drwxr-xr-x 3 roberto007 inetuser 4096 Jan 26 11:16 superstitions/
This should work:
cd $base/$currentDir
printf '%s\n' */ | egrep -v '^t|^tz/$|^html/$'
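or, without changing directory (each line then starts with the path, so the patterns have to match the last path component instead of the start of the line):
printf '%s\n' "$base/$currentDir"/*/ | egrep -v '/(t[^/]*|tz|html)/$'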
*/ lists only directories
printf '%s\n' puts a newline after each directory
egrep does what you want, no need to filter out ./ because hidden directories are not expanded by */
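As for why ^html$ never matched in the question: ls -p appends a / to every directory name, so the line is html/ and the $ anchor sits after the slash, not after html. A small sketch that strips the slash before filtering follows; list_subdirs is a hypothetical name, and it assumes you want to exclude only directories named exactly t, tz or html.

```shell
# list_subdirs DIR: print the non-hidden subdirectories of DIR, one per line,
# without the trailing slash, excluding the exact names t, tz and html.
list_subdirs() (
    cd "$1" || exit 1
    for d in */; do
        [ -d "$d" ] || continue      # also skips the literal "*/" when nothing matches
        printf '%s\n' "${d%/}"       # drop the trailing slash so $ anchors work
    done | grep -E -v '^(t|tz|html)$'
)
```

For example, list_subdirs "$base/$currentDir" would print grand-canyon, havasu-falls, sedona and superstitions for the layout shown in the question.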

Print permissions from file arguments in Bash script

I'm having trouble reading the permissions of file arguments. It looks like it has something to do with hidden files but I'm not sure why.
Current Code:
#!/bin/bash
if [ $# = 0 ]
then
echo "Usage ./checkPerm filename [filename2 ... filenameN]"
exit 0
fi
for file in $@
do
ls -l | grep $file | cut -f1 -d' '
# Do Something
done
I can get the permissions for each input, but when a hidden file is run through the loop it re-prints the permissions of all files.
-bash-4.1$ ll test*
-rw-r--r-- 1 user joe 0 Nov 11 19:07 test1
-r-xr-xr-x 1 user joe 0 Nov 11 19:07 test2*
-r--r----- 1 user joe 0 Nov 11 19:07 test3
-rwxr-x--- 1 user joe 0 Nov 11 19:07 test4*
-bash-4.1$ ./checkPerm test*
-rw-r--r--
-rw-r--r--
-r-xr-xr-x
-r--r-----
-rwxr-x---
-r--r-----
-rw-r--r--
-r-xr-xr-x
-r--r-----
-rwxr-x---
-bash-4.1$
What is going on in the loop?
It's your grep:
ls -l | grep 'test2*'
This will match any line containing test, since the pattern asks for test followed by zero or more 2s, as specified by the 2*.
To get your intended result, simply remove your loop and replace it with this:
ls -l "$@" | cut -d' ' -f1
Or keep your loop, but remove the grep:
ls -l $file | cut -d' ' -f1
Also, technically, none of those files are hidden. Hidden files in bash start with ., like .bashrc.
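To see the effect of the pattern in isolation, here is a small demonstration using the four names from the question as plain input lines. The variable names are just for this illustration.

```shell
names='test1
test2
test3
test4'

# As a regular expression, test2* means "test" followed by zero or more 2s,
# so every line qualifies.
loose=$(printf '%s\n' "$names" | grep -c 'test2*')

# -F treats the pattern as a fixed string and -x requires a whole-line match.
exact=$(printf '%s\n' "$names" | grep -c -F -x 'test2')

echo "regex test2* matches $loose lines; fixed whole-line match finds $exact"
```

The first count is 4 and the second is 1, which is why each pass of the loop printed more than one permission string.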
When you do the ls -l inside the loop and then grep the results, if there are files that contain test1 in the name, but not at the start, they are selected by the grep, giving you extra results. You could see that by doing:
ls -l | grep test
and seeing that there are many more entries than the 4 you get with ls -l test*.
Inside your loop, you should probably use just:
ls -ld "$file" | cut -d' ' -f1
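Put together, the loop could be reduced to something like the following sketch. print_perms is a hypothetical function name; the -- guards against file names that begin with a dash.

```shell
# print_perms FILE...: print the permission string of each argument.
print_perms() {
    for file in "$@"; do
        # -d lists a directory itself rather than its contents
        ls -ld -- "$file" | cut -d ' ' -f 1
    done
}
```

Called as print_perms test*, it prints one permission string per argument, because ls is asked about exactly that file instead of the whole directory being filtered with grep.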

Get first and last files per month

Based on this question Group files and pipe to awk command
I have a set of files like this:-
-rw-r--r-- 1 root root 497186 Apr 21 13:17 2012_03_25
-rw-r--r-- 1 root root 490558 Apr 21 13:17 2012_03_26
-rw-r--r-- 1 root root 488797 Apr 21 13:17 2012_03_27
-rw-r--r-- 1 root root 316290 Apr 21 13:17 2012_03_28
-rw-r--r-- 1 root root 490081 Apr 21 13:17 2012_03_29
-rw-r--r-- 1 root root 486621 Apr 21 13:17 2012_03_30
-rw-r--r-- 1 root root 490904 Apr 21 13:17 2012_03_31
-rw-r--r-- 1 root root 491788 Apr 21 13:17 2012_04_01
-rw-r--r-- 1 root root 488630 Apr 21 13:17 2012_04_02
Based on the answer in the linked question I have a script with the following code, which works fine:-
DIR="/tmp/tmp"
for month in $(find "$DIR" -maxdepth 1 -type f | sed 's/.*\/\([0-9]\{4\}_[0-9]\{2\}\).*/\1/' | sort -u); do
echo "Start awk command for files $month"
power=$(awk -F, '{ x += $1 } END { print x/NR }' "$DIR/${month}"_[0-3][0-9])
echo $power
done
The below command on its own returns a list like this:-
find /tmp/tmp -maxdepth 1 -type f | sed 's/.*\/\([0-9]\{4\}_[0-9]\{2\}\).*/\1/' | sort -u
2011_05
2011_06
2011_07
2011_08
2011_09
2011_10
2011_11
2011_12
2012_01
2012_02
2012_03
2012_04
The find command is passing a set of files using a GLOB to AWK to be processed as a batch.
Based on this, I want to be able to run the following cut commands
head -1 FirstFile | date -d "`cut -d, -f7`" +%s
tail -1 LastFile | date -d "`cut -d, -f7`" +%s
These need to be run for the FIRST and LAST file PER SET
So for 2012_03 above, the head would need to be run for the 2012_03_25 file and the tail would need to be run for the 2012_03_31 as these are the first and last files in the set for March.
So basically I need to be able to get the FIRST and LAST file PER BATCH.
I hope I have made this clear enough, if not please comment.
DIR="/tmp/tmp"
for month in $(find "$DIR" -maxdepth 1 -type f | sed 's/.*\/\([0-9]\{4\}_[0-9]\{2\}\).*/\1/' | sort -u); do
echo "Start awk command for files $month"
IFS=, read start end power < <(awk -F, 'BEGIN{OFS = ","} NR == 1 {printf "%s,", $7} { x += $1; d = $7 } END { print d, x/NR }' "$DIR/${month}"_[0-3][0-9])
echo $power
date -d "$start" +%s
date -d "$end" +%s
done
Here is how you would use a here-doc, which should work in most shells:
read start end power <<EOF
$(awk -F, 'NR == 1 {printf "%s ", $7} { x += $1; d = $7 } END { print d, x/NR }' "$DIR/${month}"_[0-3][0-9])
EOF
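If you would rather get hold of the first and last file names themselves, so that your own head and tail commands can be run on them, the shell's glob expansion is already sorted, and with zero-padded names that order is chronological. A minimal sketch, where first_last_for_month is a hypothetical helper name:

```shell
# first_last_for_month DIR MONTH: print the first and the last file of the
# set DIR/MONTH_DD, one per line (MONTH looks like 2012_03).
first_last_for_month() (
    dir=$1 month=$2
    set -- "$dir/$month"_[0-3][0-9]    # expands in sorted order
    [ -e "$1" ] || exit 1              # the glob matched nothing
    first=$1
    for last in "$@"; do :; done       # after the loop, $last is the final match
    printf '%s\n%s\n' "$first" "$last"
)
```

Inside the month loop, the two lines of output can be read into variables and passed to head -1 and tail -1 respectively.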

Resources