Using * to process files while keeping their names - Linux

I have the following problem using UNIX commands. I wish to go through a large number of files and convert them with a conversion command. My idea was to work like this: command *.fileending > *.newfileending
The problem is that I wish to keep the file names and only replace the file ending, so filename.fileending should become filename.newfileending. How do I achieve this?

Use a for loop:
for file in *.krn; do
    hum2mid "$file" -o "${file%.krn}.mid"
done
In a single line: for file in *.krn; do hum2mid "$file" -o "${file%.krn}.mid"; done
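If no .krn files exist, the glob stays unexpanded and the loop body runs once on the literal pattern *.krn; in bash you can guard against that with nullglob (a small optional addition):
shopt -s nullglob   # make *.krn expand to nothing when there are no matches
for file in *.krn; do
    hum2mid "$file" -o "${file%.krn}.mid"
done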
To apply the command to files and subdirectories recursively, use the find|xargs pattern:
find . -type f -name '*.krn' -print0 \
| xargs -0 -n1 sh -c 'hum2mid "$1" -o "/destination/dir/$(basename "${1%.krn}.mid")"' -
Note that this will overwrite already converted files, if a file from another directory has the same name.
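One possible way around that collision, sketched here on the assumption that you want to mirror the source tree under /destination/dir instead of flattening it:
find . -type f -name '*.krn' -print0 \
| xargs -0 -n1 sh -c 'mkdir -p "/destination/dir/$(dirname "$1")" && hum2mid "$1" -o "/destination/dir/${1%.krn}.mid"' -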

rename .fileending .newfileending *
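Note that rename comes in two incompatible flavours: the util-linux version used above, and the Perl-based rename shipped on Debian/Ubuntu. A sketch of the same operation with the Perl flavour, assuming the files really end in .fileending:
rename 's/\.fileending$/.newfileending/' *.fileending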

#!/bin/bash
ls -1 *.fileending | while IFS= read -r i; do
    command "$i" > "${i/%.fileending/.newfileending}"
done
If you need to process 'weird' filenames (with an embedded '\n', for example), you can use the following trick. Create a file foo.sh:
#!/bin/bash
command "$1" > "${1/%.fileending/.newfileending}"
Then do chmod +x foo.sh and finally:
find . -maxdepth 1 -type f -name '*.fileending' -print0 | xargs -0 -n 1 -J '%' ./foo.sh '%'
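Note that -J is a BSD xargs option. With GNU xargs there is no -J, but no placeholder is needed here anyway, since xargs can pass each filename as the script's first argument:
find . -maxdepth 1 -type f -name '*.fileending' -print0 | xargs -0 -n 1 ./foo.sh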

Related

Grep files in subdirectories and write out files for each directory

I am working on a bioinformatics workflow in which the tool in question, 'salmon', creates multiple directories, each containing a 'quant.sf' file. I want to find all 'lnc' entries within these files and save them as 'lnc.sf' in each directory.
I was previously running
cat quant.sf | grep 'lnc' > lnc.sf
in each directory individually, which seemed to solve my problem. Now I want to write a script that goes into each directory and generates an lnc.sf file.
I have tried doing
find . -name "quant.sf" | while read A
do
cat $A | grep 'lnc' > lnc.sf
done
But this just creates a concatenated lnc.sf file in the current directory. Any help is highly appreciated.
Thank You!
If all your quant.sf files are at the same hierarchy level, the following should work, assuming a folder structure like month/day/quant.sf:
grep -h 'lnc' */*/quant.sf > lnc.sf
Otherwise, find the files; be aware of the pitfalls of piping find into read (as opposed to using -exec or xargs) and of variable expansion with whitespace; get rid of the redundant cat process; and write the file to the correct directory:
find . -name 'quant.sf' | while IFS= read -r A
do
    grep 'lnc' "$A" > "${A%/*}/lnc.sf"
done
If you have GNU find + xargs, use -print0 combined with -0:
find . -name 'quant.sf' -print0 | xargs -0 -n1 sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' -
Or use -exec of find, which avoids problems with weird file names:
find . -name 'quant.sf' -exec sh -c 'grep "lnc" "$1" > "${1%/*}/lnc.sf"' - {} ';'
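If there are many files, a sketch that batches them into fewer shell invocations with -exec ... + and a loop over the arguments:
find . -name 'quant.sf' -exec sh -c '
    for f do    # loop over all quant.sf paths passed in one batch
        grep "lnc" "$f" > "${f%/*}/lnc.sf"
    done' sh {} +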

LINUX Copy the name of the newest folder and paste it in a command [duplicate]

I would like to find the newest sub directory in a directory and save the result to variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by modification time (newest first)
-d lists the directories themselves instead of their contents
*/ makes the glob match only directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
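Note that with this approach $BACKUPDIR keeps the trailing slash from the */ glob; if you need the bare path, a one-line follow-up strips it:
BACKUPDIR=${BACKUPDIR%/}   # remove the trailing slash left by the */ glob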
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
[[ -L $file || ! -d $file ]] && continue
[[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
Explanation why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command, it already prints the directory listing.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The above solution doesn't take into account things like files being written and removed from the directory resulting in the upper directory being returned instead of the newest subdirectory.
The other issue is that this solution assumes that the directory only contains other directories and not files being written.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works" but only in the best case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; | tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; | tail -1)
With GNU find you can get list of directories with modification timestamps, sort that list and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T#\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated
find . -mindepth 1 -maxdepth 1 -type d -printf "%T#\t%p\n" | sort -n | cut -f2- | tail -n1
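To get the result into the variable the question asks for, wrap the pipeline in a command substitution, e.g.:
BACKUPDIR=$(find /backups -mindepth 1 -maxdepth 1 -type d -printf "%T#\t%p\n" | sort -n | cut -f2- | tail -n1)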
With POSIX find (that does not have -printf) you may, if you have it, run stat to get file modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat, a pure shell solution may be used by replacing the [[ bash extension with [, as in the pure Bash answer above.
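For illustration, a sketch of that loop in plain sh, assuming /backups as before (note that -nt for [ is a widespread extension rather than strict POSIX, and that without dotglob this version skips dot-directories):
topdir=/backups
BACKUPDIR=
for file in "$topdir"/*/ ; do
    dir=${file%/}
    [ -d "$dir" ] || continue   # also guards against the unexpanded pattern when $topdir is empty
    [ -L "$dir" ] && continue   # skip symlinks, as in the bash version
    if [ -z "$BACKUPDIR" ] || [ "$dir" -nt "$BACKUPDIR" ]; then
        BACKUPDIR=$dir
    fi
done
printf 'BACKUPDIR=%s\n' "$BACKUPDIR"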
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I ran the line above from Git Bash in a Windows environment, in a file called ./something.bash.

Need guidance with a bash script to check log files in a certain directory for a certain string

I would like to preface this by saying that I am a complete noob at scripting. So I have a situation where I need to manually look for a phone number that could live in one of hundreds of files.
so the logs live in the following directory.
/actlogs/sbclogger_archive
The log files are in directories numbered 01-31 inside that directory, and all the files are gzipped.
Inside those numbered directories are tons of files, but the only ones I want to search are "sipd.logthenthedate.gz" and "sipmsg.logthenthedate.gz".
So I need to look in all the files in the following directory.
"/actlogs/sbclogger_archive"
Which has 31 directories labeled "01-31"
Then in each of 01-31 there are hundreds of files; the only ones I want to look at are "sipd.logthenthedate.gz" and "sipmsg.logthenthedate.gz".
The script I am using is below, please let me know what I could do to make this work.
#!/bin/bash
read -p "Enter a phone number: " text
read -p "Enter directory of log file's, Hint it should be /actlogs/sbclogger_archive: " directory
#arr=( $(find $directory -type f -exec grep -l "$text" {} \; | sort -r) )
#find $directory -type f -exec grep -qe "$text" {} \; -exec bash -c '
file=$(find $directory -type f -name 'sipd.log*' -exec grep -qe "$text" {} \; -exec bash -c 'select f; do echo $f; break; done' find-sh {} +;)
if [ -z "$file" ]; then
    echo "No matches found."
else
    echo "select tool:"
    tools=("nano" "less" "vim" "quit")
    select tool in "${tools[@]}"
    do
        case $tool in
            "quit")
                break
                ;;
            *)
                $tool $file
                break
                ;;
        esac
    done
fi
This would give you the list of files matching:
find . \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
-exec sh -c 'gunzip -c "$1" | grep -m1 -q 888333' sh {} \; -print
./18/sipd.log20200118.gz
./7/sipd.log20200107.gz
Note: -m1 tells grep to stop after the first match; since you only need the file name in this case, that's enough.
If you have zgrep, you can shorten it to:
find . \( -name 'sipd.log[0-9]*.gz' -o -name 'sipmsg.log[0-9]*.gz' \) \
-exec zgrep -l '888333' {} \;
./18/sipd.log20200118.gz
./7/sipd.log20200107.gz
Also, some of the tools you are suggesting do not support gzip files (nano and some variants of less, for example), in which case you might need to decompress the file and compress it again when done.
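A minimal sketch of that round trip, reusing a file name from the output above:
gunzip ./18/sipd.log20200118.gz    # leaves ./18/sipd.log20200118
nano ./18/sipd.log20200118         # or less / vim
gzip ./18/sipd.log20200118         # recompress when done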
And, you might want to consider a loop if you want to "quit". Feeding the file list to the tool doesn't make sense.
Note: AFAIK zgrep doesn't do recursion. From its man page:
DESCRIPTION
    Zgrep invokes grep on compressed or gzipped files. These grep options will cause zgrep to terminate with an error code: (-[drRzZ]|--di*|--exc*|--inc*|--rec*|--nu*). All other options specified are passed directly to grep. If no file is specified, then the standard input is decompressed if necessary and fed to grep. Otherwise the given files are uncompressed if necessary and fed to grep.
So zgrep -rl "$text" "$directory" or zgrep -rl --include 'sipd.log*.gz' "$text" {01..31} won't work unless you have a special zgrep.
As you must unzip before using your tool, I would divide the problem into two parts.
First, I would expand the paths you need (looking under <directory> for the phone number <text>), and then iterate to apply the tool (because some tools like vim or nano cannot read from a pipe).
Try something like this:
#!/bin/bash
#...
# text/directory input stuff
#...
tmpdir=$(mktemp -d)
trap 'rm -rf "${tmpdir}"' EXIT
while IFS= read -r file; do
    unzipped=${tmpdir}/$(basename "${file}" .gz)
    gunzip -c "${file}" > "${unzipped}"
    ${tool} "${unzipped}"
done < <(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null)
Above is the inverted form proposed by Charles Duffy, following this Bash FAQ.
If you prefer to iterate over an array, you could build it this way:
# shellcheck disable=SC2207
files=( $(zgrep -lw "${text}" "${directory}"/{01..31}/{sipd.logthenthedate.gz,sipmsg.logthenthedate.gz} 2>/dev/null) )
for file in "${files[@]}"; do
    # etc.
since in this particular case the files to match have no spaces in their names, the shellcheck warning is not so important (hence disabled above).

From directories create files changing their ending

I have several directories with a pattern:
$ find -name "*.out"
./trnascanse.out
./darn.out
./blast_rnaz.out
./erpin.out
./rnaspace_cli.out
./yass.out
./atypicalgc.out
./blast.out
./combine.out
./infernal.out
./ecoli.out
./athaliana.out
./yass_carnac.out
./rnammer.out
I can get the list into a file with find -name "*.out" > files. I want to create for each directory a file ending with .ref instead of .out: trnascanse.ref, darn.ref, blast_rnaz.ref and so on.
I would say that this is possible with some grep and touch, but I don't know how to do it. Any idea? Or is creating each one manually the only way (as I did with these directories)? Thanks
Here's one way:
for d in *.out ; do echo touch "${d%.out}.ref" ; done
The ${d%.out} expands $d and removes the trailing .out. Read about it in the bash man page.
If the output of the above one-liner looks OK, pipe it to sh, or remove the echo and re-run it.
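For example, once the echoed commands look right:
for d in *.out ; do echo touch "${d%.out}.ref" ; done | sh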
Use this:
find . -maxdepth 1 -type d -name '*.out' -printf "%f\n" -exec bash -c 'mkdir "${1%.out}.ref"' bash {} \;

applying sort commands to many files one by one

I have many files 1.txt, 2.txt, ... 100.txt.
I want to sort data in each file like sort -n 1.txt > 1_sorted.txt
I want to know how to do this for many files with simple commands.
This will allow you to parallelise the sorting using GNU parallel ({} is the input file and {.} is the input file without its extension):
parallel sort {} -o {.}_sorted.txt ::: *.txt
Use a simple for loop:
for f in {1..100}; do
    sort -n "$f.txt" > "${f}_sorted.txt"
done
You can run a shell script like the one below (sort.sh):
#!/bin/bash
for f in *.txt
do
    sort -n "$f" > "sorted_$f"
done
Run it in the current folder after giving it execute permission (chmod +x sort.sh).
find . -maxdepth 1 -name '*.txt' -print0 |
xargs -0 -n 1 bash -c 'sort -n "$1" > "$(basename -s .txt "$1")_sorted.txt"' bash
