for each pair of files with the same prefix, execute code - linux

I have a large list of directories, each of which contains a varied number of "paired" files. By paired, I mean the prefix is the same for two files, and the pairs are denoted as "a" and "b". The prefix does not follow a defined pattern either. My broader intentions are to write a bash script that will list all subdirectories in a given directory, cd into each directory, find the pairs of files, and execute a function on the pairs. Here is an example directory:
Dir1
123_a.txt
234_a.txt
123_b.txt
234_b.txt
Dir2
345_a.txt
345_b.txt
Dir3
456_a.txt
567_a.txt
678_a.txt
456_b.txt
567_b.txt
678_b.txt
I can use this code to loop through each directory:
for d in ./*/ ; do (cd "$d" && script.sh); done
In script.sh, I have been working on writing a script that will find all pairs of files (which is the problem I am struggling to figure out), and then call the function I want to apply to those files. This is the gist of what I have been trying:
for file in ./*_a.txt; do (find the paired file with *_b.txt && run_function.sh); done
I've broken the problem into getting the value of "*" for the _a.txt files, then searching the directory with this value for the matching _b.txt file, and making a subdirectory to put both files into so I can then apply run_function.sh. So Dir1 would contain subdirectories 123 and 234.
Let me know if this doesn't make sense. The part of the problem I'm struggling with is matching files without a defined prefix.
Thanks for your help.

Use parameter expansion:
#!/bin/bash
file=123_a.txt
prefix=${file%_a.txt} # remove _a.txt from the right
second=${prefix}_b.txt
if [[ -f $second ]] ; then
    run_function "$file" "$second"
fi
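To apply the same expansion to every pair rather than one hard-coded file, the snippet can be wrapped in a loop over the *_a.txt files. The following is a minimal sketch, not the answerer's code: process_pairs is a hypothetical helper name, and it assumes the command to run takes the two file names as its last two arguments.

```shell
#!/bin/bash
# Sketch: run a command on every <prefix>_a.txt / <prefix>_b.txt pair in a directory.
# process_pairs is a hypothetical name; usage: process_pairs DIR COMMAND [ARGS...]
process_pairs() {
    local dir=$1; shift
    local a prefix b
    for a in "$dir"/*_a.txt; do
        [[ -e $a ]] || continue          # the glob matched nothing
        prefix=${a%_a.txt}               # remove _a.txt from the right
        b=${prefix}_b.txt
        if [[ -f $b ]]; then
            "$@" "$a" "$b"               # call the command with the pair
        else
            printf 'no partner for %s\n' "$a" >&2
        fi
    done
}
```

Combined with the directory loop from the question, this could be called as for d in ./*/; do process_pairs "${d%/}" run_function.sh; done, which avoids the cd into each directory.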

Related

Rename multiple filename with random numeric extension after one specific alphanumeric word in Linux

I have a folder/subfolders that contain some files with filenames that end with a random numeric extension:
DWH..AUFTRAG.20211123115143.A901.3801176
DWH..AUFTRAGSPOSITION.20211122002147.A901.3798013
I would like to remove everything after A901 from the above filenames.
For example:
DWH..AUFTRAG.20211123115143.A901 (remove this .3801176)
DWH..AUFTRAGSPOSITION.20211122002147.A901 (remove this .3798013) from the filename
How do I use rename or any other command in Linux to remove everything after A901 and keep the rest of the file name as it is?
I can see there are 5 '.' (dots) before the number, so I put together a quick hack.
I made some files in a folder, and also made a subfolder with some files inside it, all named according to the pattern that you gave.
I created a command and it somewhat looks like this.
find "$PWD"|grep A901|while read F; do mv "${F}" `echo ${F}|cut -d . -f 1-5`;done
When executed it worked for me.
terminal output below.
rexter@rexter:~/Desktop/test$ find $PWD
/home/rexter/Desktop/test
/home/rexter/Desktop/test/test1
/home/rexter/Desktop/test/test1/DWH..AUFTRAG.20211123115143.A901.43214
/home/rexter/Desktop/test/test1/DWH..AUFTRAGSPOSITION.2021112200fsd2147.A901.31244324
/home/rexter/Desktop/test/DWH..AUFTRAG.20211123115143.A901.321423
/home/rexter/Desktop/test/DWH..AUFTRAGSPOSITION.20211122002147.A901.3124325
rexter@rexter:~/Desktop/test$ find "$PWD"|grep A901|while read F; do mv "${F}" `echo ${F}|cut -d . -f 1-5`;done
rexter@rexter:~/Desktop/test$ find $PWD
/home/rexter/Desktop/test
/home/rexter/Desktop/test/test1
/home/rexter/Desktop/test/test1/DWH..AUFTRAG.20211123115143.A901
/home/rexter/Desktop/test/test1/DWH..AUFTRAGSPOSITION.2021112200fsd2147.A901
/home/rexter/Desktop/test/DWH..AUFTRAG.20211123115143.A901
/home/rexter/Desktop/test/DWH..AUFTRAGSPOSITION.20211122002147.A901
rexter@rexter:~/Desktop/test$
I don't know if this is the proper way to do it, but it makes things work.
Let me know if it is useful to you.
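The cut approach depends on the path containing exactly five dots before the numeric suffix, so a dot in any parent directory name would change the result. A variant that keys on the literal .A901 instead can be sketched with parameter expansion. This is an illustration under stated assumptions: strip_after_a901 is a hypothetical helper name, and it assumes bash plus a find that supports -print0.

```shell
#!/bin/bash
# Sketch: strip everything after ".A901" from file names below a directory.
# strip_after_a901 is a hypothetical name; usage: strip_after_a901 [ROOT]
strip_after_a901() {
    local root=${1:-.} path dir name new
    while IFS= read -r -d '' path; do
        dir=${path%/*}                   # directory part
        name=${path##*/}                 # file name part
        new=${name%%.A901.*}.A901        # keep up to and including ".A901"
        if [[ -e $dir/$new ]]; then
            printf 'skipping %s: %s already exists\n' "$path" "$dir/$new" >&2
        else
            mv -- "$path" "$dir/$new"
        fi
    done < <(find "$root" -type f -name '*.A901.*' -print0)
}
```

Only the file name is edited, so dots in directory names do not matter, and names containing spaces survive because the paths are NUL-separated and quoted.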

How to rename multiple files in linux and store the old file names with the new file name in a text file?

I am a novice Linux user. I have 892 .pdb files that I want to rename in sequential order as L1, L2, L3, L4, ..., L892. I then want a text file that maps the old names to the new names (i.e. L1, L2, L3). Please help me with this. Thank you for your time.
You could just do:
#!/bin/sh
i=0
for f in *.pdb; do
    : $((i += 1))
    mv "$f" L"$i" && echo "$f --> L$i"
done > filelist
Note that you probably want to move the files into a different directory, as that will make it easier to recover if an error occurs midway through. Also be wary that this will overwrite any existing files and potentially cause a big mess. It's not idempotent (you can't run it twice). You would probably be better off not doing the move at all and instead do something like:
#!/bin/sh
i=0
mkdir -p newfiles
for f in *.pdb; do
    # POSIX sh does not guarantee the ++ operator, so increment explicitly
    i=$((i + 1))
    ln "$f" newfiles/L"$i" && printf "%s\0%s\0" "$f" "L$i"
done > filelist
This latter solution creates links to the original files in a subdirectory, so you can run it multiple times without munging the original data. Also, it uses null separators in the file list so you can unambiguously distinguish names that have newlines or tabs or spaces in them. It makes for a list that is not particularly human readable, but you can easily filter it through tr to make it pretty.
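Besides tr, the NUL-separated list can also be read back pair by pair in bash. The sketch below is an illustration, not part of the answer: show_filelist is a hypothetical helper name, and it assumes the file holds alternating old and new names, each terminated by a NUL byte, as written by the printf above.

```shell
#!/bin/bash
# Sketch: print a NUL-separated old/new name list in a human-readable form.
# show_filelist is a hypothetical name; usage: show_filelist [FILE]
show_filelist() {
    local list=${1:-filelist} old new
    # Each record is two NUL-terminated fields: the old name, then the new name.
    while IFS= read -r -d '' old && IFS= read -r -d '' new; do
        printf '%s --> %s\n' "$old" "$new"
    done < "$list"
}
```

The printed form is ambiguous again for names that contain newlines, so it is meant for reading, not for further processing.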

find returning inverted results

In short, I wrote this little script to clean up some directories where I had consolidated directories and files from multiple sources. I had used the cp command with the --backup=numbered option, so that files with identical names would get a suffix like .~1~ appended instead of being overwritten. I then ran fdupes to remove duplicate files. In some cases fdupes removed the file without the suffix (the original file), so I wanted to scan the directories for files carrying the cp suffix and, if no file exists with the suffix removed, mv the file to that name. Otherwise I would leave it alone to avoid deleting anything, since fdupes did not consider it a duplicate.
The issue is that the test condition, the if [ -f ... ] part of the code below, returns the inverse of what it should, and I cannot understand why. For example, when the file exists it returns false, and when the file does not exist it returns true. I worked around it by swapping the two actions, verified that it then behaved as intended, and ran it that way, but I would like to know if anyone knows why it behaved the way it did. I am not a bash script expert by any means, so it's possible that I missed something simple.
#!/bin/bash
logfile=$$.log
exec > $logfile 2>&1
IFS='
'
#set -f
for FILE in $(find . -type f -regextype posix-extended -regex '^.*(\.~[0-9]+~)+$')
do
    FILE2=${FILE%%.~[0-9]*} # remove the suffix
    if [ -f "${FILE2}" ]
    then
        echo ERROR: "${FILE2}" already exists!
    else
        echo "${FILE}" renamed "${FILE2}"
        mv "${FILE}" "${FILE2}"
    fi
done
You might be able to see the problem by modifying your script to show both FILE and FILE2 in the error message. There are a few minor problems with the script which could cause some confusion (but not the "inverted" logic):
find output is not sorted. If you had more than one backup file, a randomly chosen one would replace the original file;
you could sort the output using an expression like |sort -t~ -n -k2 on the end of the find-command.
the regular expression allows multiple matches of the ~[0-9]~ pattern. Conceivably you could have some odd file which ends with ~1~~2~.
the part where the suffix is removed assumes a single .~[0-9]~ is on the end of the filename. Because %% removes the longest match, an embedded .~0, e.g., foo.~0bar.~1~, would reduce FILE2 to foo. The workaround for that would be more cumbersome (since the suffix-stripping uses globbing), but could be done with a case statement which matched an explicit number of digits (likely three digits would be enough).
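As an alternative to the case statement, the suffix can be stripped with a bash regular expression anchored at the end of the name, which removes exactly one trailing numbered-backup suffix. This is a sketch of a different technique from the one the answer proposes: strip_backup_suffix is a hypothetical helper name, and it assumes bash (for [[ =~ ]] and BASH_REMATCH).

```shell
#!/bin/bash
# Sketch: remove one trailing ".~N~" backup suffix from a file name.
# strip_backup_suffix is a hypothetical name; it prints the stripped name,
# or returns 1 if the name does not end in such a suffix.
strip_backup_suffix() {
    local name=$1
    local re='^(.*)\.~[0-9]+~$'          # greedy prefix, then the final .~N~
    if [[ $name =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}
```

With this, foo.~0bar.~1~ becomes foo.~0bar rather than foo, and a name ending in .~1~.~2~ loses only the last suffix.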

BASH - Only printing the deepest directory in path

I need some help.....
In my .bashrc file I have a VERY useful function (it may be a bit rough and ready, and a bit hacky, but it works a treat!) that reads an input file and runs the tree command on each of the input lines to create a directory tree. This tree is then printed into an output file (along with the size of the folder).
multitree()
{
    while read cheese
    do
        pushd . > /dev/null
        pushd $cheese > /dev/null
        echo -e "$cheese \n\n" >> ~/Desktop/$2.txt
        tree -idf . >> ~/Desktop/$2.txt
        echo -e "\n\n\n" >> ~/Desktop/$2.txt
        du -sh --si >> ~/Desktop/$2.txt
        echo -e "\n\n\n\n\n\n\n" >> ~/Desktop/$2.txt
        popd > /dev/null
    done < $1
    cat ~/done
}
This is a time saver like no end, and outputs a snippet like the following:
./foo
./foo/bar
./foo/bar/1
./foo/bar/1/2
etc etc....
However, the first (and most tedious) thing I need to do is remove all entries, leaving only the deepest folder path (using the above example, it would be reduced to just ./foo/bar/1/2).
Is there a way of processing the file before/after the tree command to only print the deepest levels?
I know something like Python might do a better job, but my issue is I've never used Python, and I'm not sure the work systems would let me run it... they let us modify our own .bashrc, so I'm not too worried!
Thanks in advance guys!!!!
Owen.
You could use
find . -type d -links 2
Replace . with a directory if desired.
EDIT: Explanation:
find searches a directory for files that match a given filter. In this case, the directory is ., and the filter is -type d -links 2.
-type d filters for directories
-links 2 filters for those that have two (hard) links to their name. Effectively, this filters for all directories that have no subdirectories, because only those have two: The one in their parent directory and the . link in themselves. Those with subdirectories also have the .. links in their subdirectories.
Here's a hint:
You just need to count the number of "/" characters in each line.
If the current line has fewer than the number of "/" characters in the preceding line, the preceding line would be the "deepest" directory in its part of the hierarchy.
This line, and any subsequent line with still fewer "/" characters would NOT be the deepest directory in its part of the entire directory hierarchy. As soon as you get a line with the same number of "/" characters, or greater, then you can "reset" and, once again, keep an eye out for the first line with the fewer number of "/" characters.
And, finally, you need to handle the trivial case: only one line in your tree output, the current directory has no subdirectories, so it wins by default.
Another way you can implement this is by considering the following statement:
If a directory's name also exists as an exact prefix of another directory in the list, followed by the "/" character, then it is NOT the deepest directory in its part of the hierarchy.
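The prefix rule can be sketched as a small bash filter. This is an illustration under stated assumptions, not the answerer's code: leaf_dirs is a hypothetical helper name, and it assumes the input lists directories in depth-first order (as tree -idf or find print them), so that a directory with subdirectories is always followed immediately by its first child.

```shell
#!/bin/bash
# Sketch: read directory paths on stdin (depth-first order assumed) and
# print only those that have no subdirectory in the list.
# leaf_dirs is a hypothetical name.
leaf_dirs() {
    local prev="" line
    while IFS= read -r line; do
        # prev is a leaf unless the current line is prev followed by "/..."
        if [[ -n $prev && $line != "$prev"/* ]]; then
            printf '%s\n' "$prev"
        fi
        prev=$line
    done
    # The last line can never be a prefix of a later one, so it is a leaf.
    if [[ -n $prev ]]; then
        printf '%s\n' "$prev"
    fi
}
```

For example, tree -idf . piped through a filter that keeps only the path lines and then into leaf_dirs would print ./foo/bar/1/2 for the listing in the question. The single-line case (no subdirectories at all) falls out of the final check.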

How would I flatten and overlay multiple directories into one directory?

I want to take a list of directory hierarchies and flatten them into a single directory. Any duplicate file later in the list will replace an earlier file. For example...
foo/This/That.pm
bar/This/That.pm
bar/Some/Module.pm
wiff/This/That.pm
wiff/A/Thing/Here.pm
This would wind up with
This/That.pm # from wiff/
Some/Module.pm # from bar/
A/Thing/Here.pm # from wiff/
I have a probably overcomplicated Perl program to do this. I'm interested in the clever ways SO users might solve it. The big hurdle is "create the intermediate directories if necessary", perhaps with some combination of basename and dirname.
The real problem I'm solving is checking the difference between two installed Perl libraries. I'm first flattening the multiple library directories for each Perl into a single directory, simulating how Perl would search for a module. I can then diff -r them.
If you do not mind the final order of the entries, I guess this can do the job:
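#!/bin/bash
declare -A directory
# Map each path below the top-level directory to the last top-level directory that provides it
while IFS= read -r line; do
    directory["${line#*/}"]=${line%%/*}
done < "$1"
for entry in "${!directory[@]}"; do
    printf "%s\t# from %s/\n" "$entry" "${directory[$entry]}"
done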
Output:
$ ./script.sh files.txt
A/Thing/Here.pm # from wiff/
This/That.pm # from wiff/
Some/Module.pm # from bar/
And if you need to move the files, then you can simply replace the printing step with a mv -- or cp --, like this:
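for entry in "${!directory[@]}"; do
    # create the intermediate directories first, otherwise mv fails
    mkdir -p "your_dir_path/$(dirname "$entry")"
    mv "${directory[$entry]}/$entry" "your_dir_path/$entry"
done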
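If the file list does not exist yet, the same overlay can be built directly from the source directories. The following is a minimal sketch under stated assumptions: overlay_dirs is a hypothetical helper name, it copies rather than moves, and it assumes bash plus a find that supports -print0. Later sources overwrite earlier ones, and mkdir -p with dirname handles the intermediate directories the question asks about.

```shell
#!/bin/bash
# Sketch: overlay several directory trees into one destination directory.
# overlay_dirs is a hypothetical name; usage: overlay_dirs DEST SRC...
overlay_dirs() {
    local dest=$1; shift
    local src path rel
    mkdir -p "$dest"
    for src in "$@"; do
        src=${src%/}                               # tolerate a trailing slash
        while IFS= read -r -d '' path; do
            rel=${path#"$src"/}                    # path relative to the source root
            mkdir -p "$dest/$(dirname "$rel")"     # create intermediate directories
            cp -- "$path" "$dest/$rel"             # later sources replace earlier files
        done < <(find "$src" -type f -print0)
    done
}
```

For the example in the question, overlay_dirs flat foo bar wiff would leave This/That.pm from wiff/, Some/Module.pm from bar/, and A/Thing/Here.pm from wiff/ under flat/, ready for diff -r.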
