Finding regular expression in list of files - linux

I have several header files in a directory with the format imageN.hd, where N is some integer. Only one of these header files contains the text 'trans'. What I am trying to do is find which image contains this expression using csh (I need to use csh for this purpose, although I can call sed or perl one-liners) and show the corresponding image:
show iN
Here is my initial unsophisticated approach which does not work.
#find number of header files in directory
set n_images = `ls | grep 'image[0-9]*.hd' | wc -l`
foreach N (`seq 1 $n_images`)
    if (`more image${N}.hd | grep -i 'trans'`) then
        show i$N
        sc c image  #this command uses an alias to set the displayed image as current within the script
    endif
end
I'm not sure what is wrong with the above commands, but it does not return the correct image number.
Also, I'm sure there is a more elegant one-line perl or sed solution, but I am fairly unfamiliar with both.

show `grep -l trans image[0-9]*.hd | sed 's/image\([0-9]*\)\.hd/i\1/'`
grep -l prints only the names of the files that contain a match, and the sed rewrites imageN.hd to iN for the show command.
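If you prefer to keep a loop, here is a minimal csh sketch along the same lines (assuming show and the grep/sed above behave as described):
#loop over the headers directly instead of counting them first
foreach f (image[0-9]*.hd)
    grep -qi trans $f
    if ($status == 0) then
        show `echo $f | sed 's/image\([0-9]*\)\.hd/i\1/'`
    endif
end
Checking $status avoids handing grep's output to the if as a csh expression, which is what trips up the original loop.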

Related

Find and copy specific files by date

I've been trying to get a script working to backup some files from one machine to another but have been running into an issue.
Basically what I want to do is copy two files, one .log and one (or more) .dmp. Their format is always as follows:
something_2022_01_24.log
something_2022_01_24.dmp
I want to do three things with these files:
find the second to last .log file (i.e. if something_2022_01_24.log is the latest, I want to find the one before that, say something_2022_01_22.log)
get a substring with just the date (2022_01_22)
copy every .dmp that matches that date (e.g. something_2022_01_22.dmp, something01_2022_01_22.dmp)
For the first one, from what I could find the best way is to do: ls -t *.log | head -2, as its second line is the second to last file created.
As for the second one I'm more at a loss because I'm not sure how to parse the output of the first command.
The third one I think I could manage with something of the sort:
[ -f "/var/www/my_folder/*$capturedate.dmp" ] && cp "/var/www/my_folder/*$capturedate.dmp" /tmp/
What do you guys think, is there any way to do this? How can I compare the substring?
Thanks!
Would you please try the following:
#!/bin/bash
dir="/var/www/my_folder"
# pick the second-to-last log file (newest first, take line 2)
second=$(ls -t "$dir/"*.log | head -n 2 | tail -n 1)
if [[ $second =~ .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log ]]; then
    capturedate=${BASH_REMATCH[1]}          # e.g. 2022_01_22
    cp -p "$dir/"*"$capturedate".dmp /tmp   # copy all matching dumps
fi
second=$(ls -t "$dir"/*.log | head -n 2 | tail -n 1) will pick the
second to last log file. Please note it assumes that the timestamp
of the file has not been modified since it was created and that the
filename does not contain special characters such as a newline. This
is a simple solution and may need more work for robustness.
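As one possible hardening (a sketch, assuming the date embedded in the name means lexicographic order matches chronological order), you can avoid parsing ls entirely with a glob:
shopt -s nullglob
logs=("$dir"/*.log)                   # glob expands in sorted (lexicographic) order
if (( ${#logs[@]} >= 2 )); then
    second=${logs[${#logs[@]}-2]}     # second-to-last by name, i.e. by date
fi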
The regex .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log will match the log
filename. It captures the date substring (enclosed in the parentheses)
and makes it available in ${BASH_REMATCH[1]}.
Then the next cp command will do the job. Please be careful
not to include the wildcard * within the double quotes, so that
the wildcard is properly expanded.
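For illustration, a quick sketch of the difference (using the same $dir and $capturedate):
cp -p "$dir/"*"$capturedate".dmp /tmp   # * unquoted: glob expands to matching files
cp -p "$dir/*$capturedate.dmp" /tmp     # * quoted: passed literally, cp fails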
FYI here are some alternatives to extract the date string.
With sed:
capturedate=$(sed -E 's/.*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log/\1/' <<< "$second")
With parameter expansion of bash (if the something prefix does not itself include underscores; strip the directory part first, since the path contains underscores):
name=${second##*/}               # drop the leading path
capturedate=${name%.log}         # drop the extension
capturedate=${capturedate#*_}    # drop the something_ prefix
With the cut command (same assumptions):
capturedate=$(basename "$second" .log | cut -d_ -f2-4)
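As a quick check, with a hypothetical value for $second:
second="/var/www/my_folder/something_2022_01_22.log"
capturedate=$(basename "$second" .log | cut -d_ -f2-4)
echo "$capturedate"   # prints 2022_01_22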

How to create a dynamic command in bash?

I want to have a command in a variable that runs a program and specifies its output filename depending on the number of files that exist (so as to work on a new file each time).
Here is what I have:
export MY_COMMAND="myprogram -o ./dir/outfile-0.txt"
However, I would like this outfile number to increase each time MY_COMMAND is executed. You may suppose myprogram creates the file soon enough before the next call, so the number can be retrieved from the number of files existing in the directory ./dir/. I do not have access to change myprogram itself or the way MY_COMMAND is used.
Thanks in advance.
Given that you can't change myprogram (its -o option will always write to the file given on the command line), and assuming that something else out of your control runs MY_COMMAND so you can't change the way it gets called, you still have control of MY_COMMAND itself.
For the rest of this answer I'm going to change the name MY_COMMAND to callprog mostly because it's easier to type.
You can define callprog as a variable as in your example export callprog="myprogram -o ./dir/outfile-0.txt", but you could instead write a shell script and name that callprog, and a shell script can do pretty much anything you want.
So, you have a directory full of outfile-<num>.txt files and you want to output to the next non-colliding outfile-<num+1>.txt.
Your shell script can get the numbers by listing the files, cutting out only the numbers, sorting them, then take the highest number.
If we have these files in dir:
outfile-0.txt
outfile-1.txt
outfile-5.txt
outfile-10.txt
ls -1 ./dir/outfile*.txt produces the list
./dir/outfile-0.txt
./dir/outfile-1.txt
./dir/outfile-10.txt
./dir/outfile-5.txt
(using outfile and .txt means this will work even if there are other files not named outfile)
Scrape out the number by piping it through the stream editor sed … capture the number and keep only that part:
ls -1 ./dir/outfile*.txt | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:'
(I'm using colon : instead of the standard slash / so I don't have to escape the directory separator in dir/outfile)
Now you just need to pick the highest number. Sort the numbers and take the top
| sort -rn | head -1
Sorting with -n is numeric, not lexicographic, sorting; -r reverses so the highest number will be first, not last.
Putting it all together, this will list the files, edit the names keeping only the numeric part, sort, and get just the first entry. You want to assign that to a variable to work with it, so it is:
high=$(ls -1 ./dir/outfile*.txt | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:' | sort -rn | head -1)
In the shell (I'm using bash) you can do math on that: $((high + 1)), so if high is 10, the expression produces 11.
You would use that as the numeric part of your filename.
The whole shell script then just needs to use that number in the filename. Here it is, with lines broken for better readability:
#!/bin/sh
high=$(ls -1 ./dir/outfile*.txt \
    | sed -e 's:^.*dir/outfile-\([0-9][0-9]*\)\.txt$:\1:' \
    | sort -rn | head -1)
echo "myprogram -o ./dir/outfile-$((high + 1)).txt"
Of course you wouldn't echo myprogram, you'd just run it.
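To wire it up (a sketch, with a hypothetical install path), save the script as callprog somewhere on your PATH, make it executable, and point MY_COMMAND at it:
chmod +x /usr/local/bin/callprog
export MY_COMMAND="/usr/local/bin/callprog"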
You could do this in a bash function in your .bashrc by using wc to get the number of files in the dir and then adding 1 to the result:
yourfunction () {
    dir=/path/to/dir
    filenum=$(expr $(ls "$dir" | wc -w) + 1)
    myprogram -o "$dir/outfile-${filenum}.txt"
}
This should get the number of files in $dir and add 1 to that number to get the number you need for the filename. If you place it in your .bashrc or .bash_aliases and source .bashrc, it should work like any other shell command.
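A variant sketch that avoids parsing ls, counting the existing files with a bash glob instead (nullglob makes an empty directory yield 0, so the next index equals the count if the files are numbered densely from 0):
yourfunction () {
    local dir=/path/to/dir
    shopt -s nullglob   # note: this option stays set in the shell afterwards
    local files=("$dir"/outfile-*.txt)
    myprogram -o "$dir/outfile-${#files[@]}.txt"
}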
You can try exporting a function for MY_COMMAND to run.
next_outfile () {
    myprogram -o ./dir/outfile-${_next_number}.txt
    ((_next_number++))
}
export -f next_outfile
export MY_COMMAND="next_outfile" _next_number=0
This relies on a "private" global variable _next_number being initialized to 0 and not otherwise modified.
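For example, in the same shell that defined the function:
$MY_COMMAND    # writes ./dir/outfile-0.txt
$MY_COMMAND    # writes ./dir/outfile-1.txt
# Caveat: if each invocation runs in a separate child process, the
# increment happens in that child and is lost when it exits.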

Copying files from text list

I have a list of sample names in text format e.g.
Sample1
Sample2
etc....
I'm trying to find and copy files with these names and a specific extension using the below one-liner:
find ./ | egrep fq.gz | fgrep -f list.txt | perl -ne 'chomp; system "cp $_ /data/copy_of_files/"'
No errors are thrown up but nothing is copied.
This line works until I pass the output to perl (the list of correct files prints in the terminal from the fgrep), so I think my issue is with the perl section...
Any suggestions?
Both your original command and this command worked for me:
find . -name '*.gz' | fgrep -f list.txt | \
perl -ne 'chomp; system("cp $_ <DIR>");'
Have you verified that your user or group have write permissions to /data/copy_of_files?
Suggestions:
Don't use Perl here; instead, create a pipeline that just spits
out the cp commands (see the sketch after this list). As soon as
you're satisfied with what you see, append | sh -x.
Be absolutely sure that none of your file names contain whitespace or other characters special to the shell. If some do, but only a little (e.g. only spaces), you may get by with appropriate quoting in the cp commands, but if anything is possible in filenames, a different approach will be required, and I would probably write the whole thing in Perl using File::Find::Rule.
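A sketch of that print-first pipeline, under the question's own assumptions (the same list.txt and /data/copy_of_files target, and filenames without quotes or other shell-special characters):
find . -name '*fq.gz' | fgrep -f list.txt | sed 's|.*|cp "&" /data/copy_of_files/|'
Review the printed cp commands; once they look right, append | sh -x to execute them.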

Linux command most recent non soft link file

Linux command: I am using the following command, which returns the latest file name in the directory.
ls -Art | tail -n 1
When I run this command it returns the latest changed file, which is actually a soft link. I want to ignore soft links in my result and get file names other than soft links. How can I do that? Any quick help is appreciated.
Maybe I can specify a regex to match the latest file; the file name is
rum-12.53.2.war
-- Latest file in directory without softlink
ls -ArtL | tail -n 1
-- Latest file without extension
ls -ArtL | sed 's/\(.*\)\..*/\1/' | tail -n 1
The -L option for ls dereferences links, i.e. you'll see the information of the link's target instead of the link itself. Is this what you want? Or would you like to completely ignore links?
If you want to ignore links completely you can use this solution, although I am sure an easier one exists:
a=$(ls -Artl | grep -v "^l" | tail -1)
aa=()
for i in $(echo $a | tr " " "\n")
do
    aa+=($i)
done
aa_length=${#aa[@]}
echo ${aa[aa_length-1]}
First you store the output of your ls in a variable called a. By grepping for "^l" you select only symbolic links, and with the -v option you invert this selection. So you basically have what you want; the only downside is that you need the -l option for ls, as otherwise there is no leading "l" to grep for. In the second part you split the variable a on spaces and fill an array called aa (sorry for the bad naming). Then you need only the last item in aa, which should be the filename.
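A shorter alternative (a sketch, assuming GNU find): -type f skips symlinks outright, and sorting on the modification time replaces ls -Art:
find . -maxdepth 1 -type f -printf '%T@ %f\n' | sort -n | tail -n 1 | cut -d' ' -f2-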

How to tell how many files match description with * in unix

Pretty simple question: say I have a set of files:
a1.txt
a2.txt
a3.txt
b1.txt
And I use the following command:
ls a*.txt
It will return:
a1.txt a2.txt a3.txt
Is there a way in a bash script to tell how many results will be returned when using the * pattern? In the above example, if I were to use a*.txt the answer should be 3, and if I used *1.txt the answer should be 2.
Comment on using ls:
I see all the other answers attempt this by parsing the output of
ls. This is unpredictable because it breaks when you have
file names with "unusual characters" (e.g. spaces).
Another pitfall is that it is ls implementation dependent: a
particular implementation might format its output differently.
There is a very nice discussion on the pitfalls of parsing ls output on the bash wiki maintained by Greg Wooledge.
Solution using bash arrays
For the above reasons, using bash syntax would be the more reliable option. You can use a glob to populate a bash array with all the matching file names. Then you can ask bash the length of the array to get the number of matches. The following snippet should work.
files=(a*.txt) && echo "${#files[@]}"
To save the number of matches in a variable, you can do:
files=(a*.txt)
count="${#files[@]}"
One more advantage of this method is you now also have the matching files in an array which you can iterate over.
Note: Although I keep repeating bash syntax above, the above solution also applies to other shells with arrays, such as ksh and zsh (plain POSIX sh has no arrays).
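For example (a sketch), iterating over the same array; note that without shopt -s nullglob a pattern with no matches leaves the literal a*.txt in the array, giving a count of 1:
shopt -s nullglob
files=(a*.txt)
echo "${#files[@]} match(es)"
for f in "${files[@]}"; do
    printf 'found: %s\n' "$f"
done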
You can't know ahead of time, but you can count how many results are returned, e.g.
ls -l *.txt | wc -l
ls -l will display the directory entries matching the specified wildcard, and wc -l will give you the count.
You can save the value of this command in a shell variable with either
num=$(ls -l *.txt | wc -l)
or
num=`ls -l *.txt | wc -l`
and then use $num to access it. The first form is preferred.
You can use ls in combination with wc:
ls a*.txt | wc -l
The ls command lists the matching files one per line, and wc -l counts the number of lines.
I like suvayu's answer, but there's no need to use an array:
count() { echo $#; }
count *
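To capture the result rather than print it (a small usage sketch, with the same nullglob caveat as the array approach):
num=$(count a*.txt)   # 3 for the example files above
echo "$num"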
In order to count files that might have unpredictable names, e.g. containing newlines, non-printable characters etc., I would use the -print0 option of find and awk with RS='\0' (-mindepth 1 keeps the directory . itself out of the count):
num=$(find . -mindepth 1 -maxdepth 1 -print0 | awk -v RS='\0' 'END { print NR }')
Adjust the options to find to refine the count; e.g. if the criteria is files starting with a lower-case a with a .txt extension in the current directory, use:
find . -maxdepth 1 -type f -name 'a*.txt' -print0
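Putting the two together (a sketch following the same idiom):
num=$(find . -maxdepth 1 -type f -name 'a*.txt' -print0 | awk -v RS='\0' 'END { print NR }')
echo "$num"   # 3 for the example files above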
