Bash script to search multiple files with a string mentioned in a different file, then copy those files into a new directory - linux

I have multiple files recorded per date. At the end of every day, I need to run a script to grep those files which contain particular numbers mentioned in a different file, then copy all those files (CSV records) which contain matching UID records to another file location.
Working Dir = /var/output
Search file name = /var/output/UID.txt
##Cat UID.txt
639867675
123466490
123334555
File names = CSV_name_date.csv
Each file name is unique, and I get roughly 5000 files per day.
I'm using this command:
grep -f uid.txt -e stringpattern -l | xargs cp -t /var/output2/
I need to restrict the search to a particular date: the script should ask which date to search and then process only the files from that date.
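One way this could look, as a rough sketch rather than a tested solution. It assumes the date appears literally at the end of the file name (for example CSV_name_20240131.csv, a made-up format), that the UIDs should be matched as fixed strings, and that GNU grep, xargs and cp are available. The function name copy_matching is hypothetical.

```bash
#!/bin/bash
# Sketch only: copy the CSV files of one date that mention any UID from a list.
# Assumptions (not stated in the question): the date is the last part of the
# file name, e.g. CSV_name_20240131.csv, and GNU grep/xargs/cp are installed.

copy_matching() {
    local src_dir=$1 uid_file=$2 date=$3 dest_dir=$4
    mkdir -p "$dest_dir" || return 1
    # find restricts the search to files of the requested date.
    # grep -l prints only the names of matching files, -F treats every UID
    # as a fixed string, -f reads the UIDs from the list file.
    # -Z and -0 pass the file names NUL-separated, so odd names survive.
    find "$src_dir" -maxdepth 1 -type f -name "*_${date}.csv" \
        -exec grep -lZ -F -f "$uid_file" {} + |
        xargs -0 -r cp -t "$dest_dir"
}

# Interactive use:
#   read -rp 'Date to search: ' date
#   copy_matching /var/output /var/output/UID.txt "$date" /var/output2
```

Note that an empty line in the UID file would make grep match every file, so the list should not contain blank lines.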

Loop through directory and files with date string to find the file with highest suffix (e.g. "firsttable_20230113093000_12")

I am looking to adapt a shell script to find a way to cycle through files that have different table names as well as different dates between files that have the same table name, and return the highest suffix file.
An example of my files in a given directory:
firsttable_20230112093000_1
firsttable_20230112093000_2
firsttable_20230112093000_3
firsttable_20230112093000_4
firsttable_19990202090000_1
firsttable_19990202090000_2
secondtable_20220112090000_1
secondtable_20220112090000_2
secondtable_20220112090000_3
Desired Result:
firsttable_20230112093000_4
firsttable_19990202090000_2
secondtable_20220112090000_3
What's been done
Originally I only needed to find the highest suffix as the dates would be the same for all tables, and what I had worked:
allTables=(
'firsttable'
'secondtable'
'thirdtable'
...
)
for table in "${allTables[@]}"; do
    substring="_2"
    searchString="$table$substring"
    # Check if the file for a given table exists:
    if ls "$Path/$searchString"* 1> /dev/null 2>&1; then
        echo "$searchString* files exist. Proceeding..."
        # sort -rV puts the highest version-style suffix first
        lastFile=$(ls "$Path/$searchString"* | sort -rV | head -n1)
        echo "Highest suffix file: $lastFile"
    else
        echo "File searchstring not found: '$Path/$searchString' "
    fi
done
If I was to apply that to my new directory shown above, it would only be able to find:
Highest suffix file: firsttable_20230112093000_4
Highest suffix file: secondtable_20220112090000_3
I need to find a way to make the script also look at the dates and see if they are different, and if they are, treat them as such. Would this require a regex to assess the filename? The filename format stays the same: "tablename_$$$$$$$$$$$$$$_nn" (underscore placing after table and date, suffix can go above single figures, date is always 14 characters)
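A regex is one option. The sketch below is only an illustration under the stated file name format (table name, underscore, 14-digit date, underscore, numeric suffix): it groups files by "table_date" and keeps the largest suffix per group. The function name highest_suffix_files is made up, and bash 4 or later is assumed for the associative array.

```bash
#!/bin/bash
# Sketch: print the highest-suffix file name for every table/date pair.
# Assumes names of the form tablename_<14 digits>_<number> and bash >= 4.

highest_suffix_files() {
    local dir=$1 path name prefix suffix
    local re='^(.+_[0-9]{14})_([0-9]+)$'
    local -A best=()      # key: "table_date", value: highest suffix seen so far
    for path in "$dir"/*_*_*; do
        name=${path##*/}
        [[ $name =~ $re ]] || continue        # skip names in another format
        prefix=${BASH_REMATCH[1]}
        suffix=${BASH_REMATCH[2]}
        # 10# forces base 10, so a suffix such as 08 is not read as octal.
        if [[ -z ${best[$prefix]+set} ]] || (( 10#$suffix > 10#${best[$prefix]} )); then
            best[$prefix]=$suffix
        fi
    done
    for prefix in "${!best[@]}"; do
        printf '%s_%s\n' "$prefix" "${best[$prefix]}"
    done | sort
}

# Example: highest_suffix_files "$Path"
```

Because the comparison is numeric, suffixes above 9 are handled without relying on sort -V, and no list of table names has to be maintained.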
Thanks in advance for any help!

Recursively appending names of all files in a directory with exif specific png meta data field (aesthetic_score) with linux / EXIFtool

I am trying to rename all files located in a directory (recursively) with a specific meta data field appended to the end of the png file name.
The metadata field name is "aesthetic_score", with values ranging from 1.0 to 9.0.
When I type:
exiftool -Aesthetic_score -G1 -s testn.png
the result is:
[PNG] Aesthetic_score : 7.0
This is how I would like to append the png files recursively within a directory.
Note that I would like to swap out the word aesthetic for the word chad in the appended text, and that not all files will have this data field:
input file:
filename001.png (metadata aesthetic_score:7.0)
output:
filename001-chad-score-70.png
I tried to use Digikam and JExifToolGui-2.01, without success.
I am trying to perform this task in the cmd line, although other solutions are welcome. Thank you for your help.
So, this might work for you, I can't really test it; note that you would need to get rid of the echo before the mv for it to actually do something (rename rather than just show what it would do).
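while IFS= read -r name
do
    # gensub requires GNU awk; the new name has the form <base>-chad-score-<score>.<ext>
    newname=$(exiftool -G1 -s "$name" | awk '$2~/FileName/{name=$4}; $2~/Aesthetic_score/{basename=gensub(/^(.+)\....$/,"\\1","1",name);ext=gensub(/^.*\.(...)$/,"\\1","1",name);gsub(/\./,"",$4);print basename"-chad-score-"$4"."ext}')
    # Files without the Aesthetic_score tag yield an empty name and are skipped.
    [ -n "$newname" ] && echo mv "$name" "$(dirname "$name")/$newname"
done < <(find . -iname '*.png')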
Basically the find at the very end finds all the pngs.
The while loop reads every name that find outputs, passes each file through exiftool (using your specs), and parses the output with awk, which prints the new name; that name is captured in the shell variable newname.
And finally the mv (without the echo) renames the files.
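If GNU awk is not available, the name-building step can also be done with plain bash parameter expansion. This is only a sketch: chad_name is a hypothetical helper, and the commented usage assumes that exiftool's -s3 option prints just the tag value (and nothing when the tag is missing).

```bash
#!/bin/bash
# Sketch: build the target name from a file path and a score value.
# chad_name is a made-up helper, not part of exiftool.

chad_name() {
    local path=$1 score=$2
    local stem=${path%.*}     # path without the extension
    local ext=${path##*.}     # extension without the dot
    # Drop the decimal point from the score: 7.0 -> 70
    printf '%s-chad-score-%s.%s\n' "$stem" "${score//./}" "$ext"
}

# Possible use (remove the echo to actually rename):
#   find . -iname '*.png' -print0 | while IFS= read -r -d '' f; do
#       score=$(exiftool -s3 -Aesthetic_score "$f")
#       [ -n "$score" ] && echo mv "$f" "$(chad_name "$f" "$score")"
#   done
```

Keeping the directory part in the new name means files in subdirectories stay where they are.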

How to rename multiple files while keeping extension based on provided txt file?

I have a folder with many files that look like:
A1_R1.fastq
A2_R1.fastq
A3_R1.fastq
I would like to rename the files based on a text file, keeping the _R1.fastq part but changing the A# to a specific sample name (example):
A1_R1.fastq KUG_R1.fastq
A2_R1.fastq AUG_R1.fastq
A3_R1.fastq TRY_R1.fastq
I'd also like an output directory which contains all my newly named .fastq files.
I tried this to no avail (only a few were renamed):
ls *.fastq| paste -d' ' - $PATH/txt | xargs -n2 mv
Thank you.
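A possible approach, sketched under the assumption that the text file has exactly the two-column layout shown above (old name, whitespace, new name) and that the names contain no spaces. The function name rename_from_map and its arguments are hypothetical. It copies rather than moves, so the originals stay untouched until the result has been checked.

```bash
#!/bin/bash
# Sketch: copy files to the new names listed in a two-column mapping file.
# Each mapping line: <old name> <new name>, separated by whitespace.

rename_from_map() {
    local map=$1 src_dir=$2 out_dir=$3 old new
    mkdir -p "$out_dir" || return 1
    # "|| [ -n "$old" ]" also handles a last line without a trailing newline.
    while read -r old new || [ -n "$old" ]; do
        [ -n "$old" ] && [ -n "$new" ] || continue   # skip blank or short lines
        if [ -f "$src_dir/$old" ]; then
            cp -- "$src_dir/$old" "$out_dir/$new"
        else
            echo "missing: $src_dir/$old" >&2
        fi
    done < "$map"
}

# Example: rename_from_map names.txt . renamed
```

Reading the pairs from the text file itself, instead of pasting ls output next to it, avoids mismatches when the two lists are in a different order.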

How do you format output string in bash script for input by another script?

I need to unzip a bunch of student assignment (jar) files so that I can use a script to submit the contents to the Moss (Stanford) plagiarism detection server. I did the same thing in Java, which was trivial, but I'm trying to re-implement it as a bash script.
I am trying to do the following:
1. Get a list of student names (each student has a directory).
2. In each student directory, sub-directories exist numbered from 1 to the latest submission. I need to get the directory with the highest number.
3. Each of those submission directories contains a jar file that I need. I copy each jar into a temp directory with the same name as the student and unzip it.
4. I need that temp directory listing formatted as a string in the form
/tempDir/studentName1/*.languageExt /tempDir/studentName2/*.languageExt
The student directory has the basic structure:
Student_Root_Directory:
    Student1
    Student2
Student1
    Sub-Directories: 1 2 3 4 5
        1: student1.jar
        2: student1.jar
        ...
Student2
    Sub-Directories: 1 2 3
        1: student2.jar
        ...
To do the first 3 steps above I did:
#!/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: .c, .cpp, .java, .py
students=`ls $1`
student_dir=$1
languageExt=$2
mossDir="/home/moss"
tempDir="/home/moss/tempJarStorage"
for student in $students
do
latestSubmissionDir=`ls -t $student_dir/$student | head -1`
for jarDir in $latestSubmissionDir
do
mkdir $tempDir/$student
cp $student_dir/$student/$jarDir/*.jar $tempDir/$student
unzip -d $tempDir/$student/ -o -j $tempDir/$student/$student.jar *.$languageExt
rm $tempDir/$student/$student.jar
done
done
...which results in a number of student directories being created in a temp directory that contains only the unzipped contents for the student submissions.
I need the ls output of the new temp directories formatted as a string that contains:
/tempDir/studentName1/*.languageExt /tempDir/studentName2/*.languageExt
I have tried variations on
find "$tempDir" -iname "*.$languageExt" -printf "%p/*.$languageExt"
both with and without -iname, but I either get output that contains extra directory information such as $tempDir/*.languageExt (when I just need the subdirectories $tempDir/$studentName/*.languageExt) or output where the path of every source file is listed, such as:
$tempDir/$studentName/studentNameA.java
$tempDir/$studentName/studentNameB.java
when I only need
$tempDir/$studentName/*.java
I think this should be really easy and I'm just over thinking it. Any hints for improving the script also appreciated.
Here's a revised version of the script that may work:
#!/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: c, cpp, java, py
students_dir=$1
languageExt=$2
studentPathsT=( "$students_dir"/*/ )
mossDir='/home/moss'
tempDir='/home/moss/tempJarStorage'
for studentPathT in "${studentPathsT[@]}"; do
student=$(basename "$studentPathT")
mkdir "$tempDir/$student"
submissionDirsT=( "$studentPathT"*/ )
latestSubmissionDirT=${submissionDirsT[${#submissionDirsT[@]}-1]}
cp "$latestSubmissionDirT"*.jar "$tempDir/$student/"
unzip -d "$tempDir/$student/" -o -j "$tempDir/$student/*.jar" "*.$languageExt"
rm "$tempDir/$student"/*.jar
done
# Note that at this point `"$tempDir"/*/*.$languageExt` would expand
# to all extracted submission files, across all students.
# Finally, output each student's extracted files as an unexpanded glob à la
# /{tempDir}/{studentName1}/*.{languageExt}
for pT in "$tempDir"/*/; do
echo "$pT*.$languageExt"
# Note: If there is a chance that your filenames contain
# embedded newlines (rare in practice) using `echo` won't work properly
# as @Charles Duffy points out.
# If that is a concern, use
# printf '%s\0' "$pT*.$languageExt"
# and process the output with a utility that can process NUL characters
# as separators, such as `xargs -0`.
done
It avoids using ls and only uses pathname expansion and array variables so as to properly deal with paths that contain embedded spaces and other shell metacharacters.
The suffix T in variable names indicates that a particular path or array of paths is *T*erminated, i.e., that it ends in a /.
The assumption is that the numbered subdirectories do not go beyond 9, as the implicit lexical sorting of pathname expansion is relied upon; if the numbers go higher, explicit numerical sorting must be applied.
Note that the globs (pathname patterns) passed to unzip are intentionally double-quoted, as they should be interpreted by unzip, not the shell.
Note that, based on your original code, I've assumed that $languageExt does NOT start with . (e.g., cpp rather than .cpp), despite what your comment says.
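If the numbered subdirectories can go beyond 9, the latest one has to be picked by numeric comparison instead. A minimal sketch of that, assuming the subdirectory names are plain non-negative integers; latest_submission_dir is a hypothetical helper name, not part of the script above.

```bash
#!/bin/bash
# Sketch: print the highest-numbered subdirectory of a student directory,
# with a trailing slash. Assumes subdirectory names like 1, 2, ..., 12.

latest_submission_dir() {
    local student_dir=${1%/} d n best='' max=-1
    for d in "$student_dir"/*/; do
        n=${d%/}          # strip the trailing slash
        n=${n##*/}        # keep only the directory name
        [[ $n =~ ^[0-9]+$ ]] || continue      # ignore non-numeric names
        # 10# forces base 10, so a name such as 08 is not read as octal.
        if (( 10#$n > max )); then
            max=$((10#$n))
            best=$n
        fi
    done
    # Prints nothing and returns non-zero if no numbered directory exists.
    [[ -n $best ]] && printf '%s/%s/\n' "$student_dir" "$best"
}

# Possible use inside the loop:
#   latestSubmissionDirT=$(latest_submission_dir "$studentPathT")
```

The result ends in a / like the other T-suffixed variables, so it could stand in for the array-index lookup.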

How to move and number files?

I am working with Linux, bash.
I have one directory with 100 folders in it, each with a different name.
In each of these 100 folders, there is a file called first.bars (so I have 100 files named first.bars). Although all named first.bars, the files are actually slightly different.
I want to get all these files moved to one new folder and rename/number these files so that I know which file comes from which folder. So the first first.bars file must be renamed to 001.bars, the second to 002.bars.. etc.
I have tried the following:
ls -d * >> /home/directorywiththe100folders/list.txt
cat list.txt | while read line;
do cd $line;
mv first.bars /home/newfolder
This does not work because I can't have 100 files with the same name in one folder. So I only need to know how to rename them. The renaming must follow the order of list.txt, because its first line is the folder containing the first file, which is moved and renamed. That file will be called 001.bars.
Try doing this :
$ rename 's/^.*?\./sprintf("%03d.", $c++)/e' *.bar
If you want more information about this command, see this recent response I gave earlier : How do I rename multiple files beginning with a Unix timestamp - imapsync issue
If the rename command is not available, a plain loop works; ++c starts the numbering at 1 and %03d zero-pads it to 001, 002, ...:
for d in /home/directorywiththe100folders/*/; do
    newfile=$(printf "/home/newfolder/%03d.bars" "$(( ++c ))")
    mv "${d}first.bars" "$newfile"
done
