How would I flatten and overlay multiple directories into one directory? - linux

I want to take a list of directory hierarchies and flatten them into a single directory. Any duplicate file later in the list will replace an earlier file. For example...
foo/This/That.pm
bar/This/That.pm
bar/Some/Module.pm
wiff/This/That.pm
wiff/A/Thing/Here.pm
This would wind up with
This/That.pm # from wiff/
Some/Module.pm # from bar/
A/Thing/Here.pm # from wiff/
I have a probably over-complicated Perl program to do this. I'm interested in the clever ways SO users might solve it. The big hurdle is "create the intermediate directories if necessary", perhaps with some combination of basename and dirname.
The real problem I'm solving is checking the difference between two installed Perl libraries. I'm first flattening the multiple library directories for each Perl into a single directory, simulating how Perl would search for a module. I can then diff -r them.

If you do not mind the final order of the entries, I guess this can do the job:
#!/bin/bash
# Map each relative path (e.g. This/That.pm) to the last top-level
# directory in the list that provides it.
declare -A directory
while read -r line; do
    directory["${line#*/}"]=${line%%/*}
done < "$1"
for entry in "${!directory[@]}"; do
    printf '%s\t# from %s/\n' "$entry" "${directory[$entry]}"
done
Output:
$ ./script.sh files.txt
A/Thing/Here.pm # from wiff/
This/That.pm # from wiff/
Some/Module.pm # from bar/
And if you need to move the files, you can simply replace the printing step with an mv (or cp), creating the intermediate directories first:
for entry in "${!directory[@]}"; do
    mkdir -p "$(dirname "your_dir_path/$entry")"   # create intermediate directories if necessary
    mv "${directory[$entry]}/$entry" "your_dir_path/$entry"
done

Related

How do I navigate fast through directories with the command line?

I spent some time looking for a solution to my problem, but Google couldn't provide a sufficient answer... I work a lot with the command line in Linux and I simply need a way to navigate my file system quickly. I don't want to type cd [relative or absolute path] all the time. I know there are pushd and popd, but they still seem too complicated for a simple problem like this.
When I'm in ~/Desktop/sampleFile I simply want to use sampleCommand fileToGo to get to ~/Desktop/anotherFile/anotherFile/fileToGo, no matter where the file is located. Is there an easy command for this?
Thanks in advance!
This can be done with native Bash features without involving a sub-shell fork:
You can insert this into your "$HOME/.bashrc":
cdf(){
    # Query the current globstar state
    shopt -q globstar
    # and save it in the gs variable (gs=0 if set, 1 if not)
    local gs=$?
    # globstar is needed to glob files in sub-directories
    shopt -s globstar
    # Find the file anywhere below the current directory
    # and store the results in the matches array
    local matches=(**/"$1")
    # globstar is no longer needed, so restore its previous state
    [ "$gs" -gt 0 ] && shopt -u globstar
    # Change to the directory containing the first matched file
    cd "${matches[0]%/*}" # cd exit status is preserved
}
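Once the function is sourced (for example by opening a new shell), usage mirrors the sampleCommand from the question:
cdf fileToGo    # changes to the first directory below the current one that contains fileToGo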
Hmm, you could do something like this:
cd "$(dirname "$(find . -name name-of-your-file | head -n 1)")"
That will search the current directory (use / instead of . to search all directories) for a file called name-of-your-file and cd into the parent directory of the first file with that name that it finds.
If you're in a large directory, typing the path and using cd will probably be faster than this, but it works alright for small directories.
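If you want this to feel like the sampleCommand from the question, one hedged sketch is to wrap it in a function (the name goto is made up here, and -print -quit is GNU find, which stops at the first match):
goto() {
    cd "$(dirname "$(find . -name "$1" -print -quit)")"
}
Then goto fileToGo behaves like the one-liner above, with the filename as an argument.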

How to rename multiple files in linux and store the old file names with the new file name in a text file?

I am a novice Linux user. I have 892 .pdb files, and I want to rename all of them in sequential order as L1, L2, L3, ..., L892. I then want a text file which maps the old names to the new names (i.e. L1, L2, L3). Please help me with this. Thank you for your time.
You could just do:
#!/bin/sh
i=0
for f in *.pdb; do
    : $((i += 1))
    mv "$f" "L$i" && echo "$f --> L$i"
done > filelist
Note that you probably want to move the files into a different directory, as that will make it easier to recover if an error occurs midway through. Also be wary that this will overwrite any existing files and can potentially cause a big mess. It's not idempotent (you can't run it twice). You would probably be better off not doing the move at all and instead doing something like:
#!/bin/sh
i=0
mkdir -p newfiles
for f in *.pdb; do
    i=$((i + 1))
    ln "$f" "newfiles/L$i" && printf '%s\0%s\0' "$f" "L$i"
done > filelist
This latter solution creates links to the original files in a subdirectory, so you can run it multiple times without munging the original data. Also, it uses null separators in the file list so you can unambiguously distinguish names that have newlines or tabs or spaces in them. It makes for a list that is not particularly human readable, but you can easily filter it through tr to make it pretty.
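For instance, a quick way to inspect the NUL-delimited list (a hedged one-liner; it gives up the unambiguity for names containing newlines, but is fine for a quick look) is:
tr '\0' '\n' < filelist          # one name per line, old and new alternating
or, to pair old and new names on one line each:
xargs -0 -n 2 printf '%s -> %s\n' < filelist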

for each pair of files with the same prefix, execute code

I have a large list of directories, each of which contains a varied number of "paired" files. By paired, I mean the prefix is the same for two files, and the pairs are denoted as "a" and "b". The prefix does not follow a defined pattern either. My broader intentions are to write a bash script that will list all subdirectories in a given directory, cd into each directory, find the pairs of files, and execute a function on the pairs. Here is an example directory:
Dir1
123_a.txt
234_a.txt
123_b.txt
234_b.txt
Dir2
345_a.txt
345_b.txt
Dir3
456_a.txt
567_a.txt
678_a.txt
456_b.txt
567_b.txt
678_b.txt
I can use this code to loop through each directory:
for d in ./*/ ; do (cd "$d" && script.sh); done
In script.sh, I have been working on writing a script that will find all pairs of files (which is the problem I am struggling to figure out), and then call the function I want to apply to those files. This is the gist of what I have been trying:
for file in ./*_a.txt; do (find the paired file with *_b.txt && run_function.sh); done
I've broken the problem down into getting the value of "*" for the _a.txt files, then searching the directory for the matching _b.txt suffix using this value, and making a subdirectory to put each pair into so I can then apply run_function.sh. So Dir1 would contain subdirectories 123 and 234.
Let me know if this doesn't make sense. The part of the problem I'm struggling with is matching files without a defined prefix.
Thanks for your help.
Use parameter expansion:
#!/bin/bash
file=123_a.txt
prefix=${file%_a.txt} # remove _a.txt from the right
second=${prefix}_b.txt
if [[ -f $second ]]; then
    run_function "$file" "$second"
fi
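To run this over every pair in the current directory, a minimal sketch (assuming run_function.sh takes the "a" file and the "b" file as its two arguments) would be:
#!/bin/bash
for file in ./*_a.txt; do
    prefix=${file%_a.txt}          # shared prefix, e.g. ./123
    second=${prefix}_b.txt
    if [[ -f $second ]]; then
        ./run_function.sh "$file" "$second"
    fi
done
Combined with the outer loop from the question, this visits every pair in Dir1, Dir2 and Dir3.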

Best practice for organizing many file paths for a Bash script

I have a bash script which copies files from one location to another. Currently, all file paths are defined as strings directly in the script. Unfortunately, all these file paths bloat my script. How can I organize the file paths in a better way, other than directly in the script? Should I use a config file?
I am not very sure about your particular requirement here.
It all depends upon the size, complexity, importance, and longevity of your script.
If the script is complex (many lines of code with significant logic and many dependencies), it is good to do the following:
Have an environment setup script - or config file - that creates standardized variables like job_home, job_temp, job_log, job_log_archive etc., and follows some naming convention for a hierarchy of directories with a common parent (job_home in this case). Using upper case for these could be a neat idea, though it is against POSIX conventions, as highlighted by Charles Duffy in his comment.
Create a function that sets up this hierarchy of directories as needed. Have a library of functions if that would make sense.
Source the environment setup and the library scripts in the main script.
Carefully avoid all hardcoding of paths - use the standard environment variables instead.
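A minimal sketch of such a setup file and of sourcing it (every name below, such as job_env.sh and /var/opt/myjob, is an assumption made up for illustration):
# job_env.sh -- meant to be sourced, not executed
job_home=${job_home:-/var/opt/myjob}
job_temp=$job_home/temp
job_log=$job_home/log
job_log_archive=$job_home/log/archive

make_job_dirs() {
    # create the directory hierarchy if it does not exist yet
    mkdir -p "$job_temp" "$job_log" "$job_log_archive"
}
and in the main script:
#!/bin/bash
. /path/to/job_env.sh
make_job_dirs
cp "$job_temp/report.csv" "$job_log_archive/"   # no hardcoded paths here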
A common convention is to accept, as an optional argument, the name of a file from which a list of other files to operate on can be read.
If you aren't using stdin for any other purpose, it too is a reasonable stream to use for the purpose. For example:
printf '%s\0' /first/directory /second/directory >dirs
your-script < dirs
...or...
find /top/directory -mindepth 1 -type d -print0 | your-script
...or...
# pass results from "find" via a filename
your-script -d <(find /top/directory -mindepth 1 -type d -print0)
The use of printf '%s\0' and find -print0 above ensures that the list is NUL-delimited; this means that a maliciously constructed directory name (i.e. mkdir -p $'/top/directory/\n/etc/passwd\n') can't inject extra arguments (in the above example, /etc/passwd).
Reading a NUL-delimited stream into an array looks like:
array=( )
while IFS= read -r -d '' dirname; do
    array+=( "$dirname" )
done
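On bash 4.4 and newer, the same can be done in a single call with mapfile (a sketch assuming the NUL-delimited list arrives on stdin):
mapfile -t -d '' array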

find returning inverted results

In a few words, I wrote this little script to clean up some directories where I had consolidated directories/files from multiple sources. I had used the cp command with the --backup=numbered feature so that files with identical names would get a suffix like .~1~ appended instead of being overwritten. I then ran fdupes to remove duplicate files; in some cases fdupes removed the file which did not have the suffix appended by cp (the original file). So I wanted to scan the directories for files with the suffix appended by cp and, if the file does not exist with the suffix removed, mv the file; otherwise I would leave it alone, to avoid deleting anything that fdupes did not consider a duplicate.
The issue is that the test condition if [ -f ... ] in the code below returns the inverse of what it should, and I cannot understand why. For example, when the file exists it returns false, and when the file does not exist it returns true. I fixed it by reversing the actions I wanted to take based on the inverted return code, verified that it was working as intended, and ran it that way, but I would like to know if anyone can explain why it behaved the way it did. I am not a bash script expert by any means, so it's possible that I missed something simple.
#!/bin/bash
logfile=$$.log
exec > $logfile 2>&1
IFS='
'
#set -f
for FILE in $(find . -type f -regextype posix-extended -regex '^.*(\.~[0-9]+~)+$')
do
    FILE2=${FILE%%.~[0-9]*} # remove the suffix
    if [ -f "${FILE2}" ]
    then
        echo ERROR: "${FILE2}" already exists!
    else
        echo "${FILE}" renamed "${FILE2}"
        mv "${FILE}" "${FILE2}"
    fi
done
You might be able to see the problem by modifying your script to show both FILE and FILE2 in the error message. There are a few minor problems with the script which could cause some confusion (but not the "inverted" logic):
find output is not sorted. If you had more than one backup file, a randomly chosen one would replace the original file;
you could sort the output using an expression like |sort -t~ -n -k2 on the end of the find-command.
the regular expression allows multiple matches of the ~[0-9]~ pattern. Conceivably you could have some odd file which ends with ~1~~2~.
the part where the suffix is removed assumes a single ~[0-9]~ is on the end of the filename. An embedded ~0, e.g., foo~0bar~1~ would reduce FILE to foo. The workaround for that would be more cumbersome (since the suffix-stripping uses globbing), but could be done with a case statement which matched an explicit number of digits (likely three digits would be enough).
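As a hedged sketch combining those suggestions (GNU find and sort assumed; the NUL-delimited pipeline also removes the need for the IFS workaround, and the suffix is stripped with a pattern anchored to the trailing ~):
#!/bin/bash
find . -type f -regextype posix-extended -regex '.*\.~[0-9]+~' -print0 |
    sort -z -t'~' -n -k2 |
    while IFS= read -r -d '' FILE; do
        FILE2=${FILE%.~[0-9]*~}      # strip one numeric backup suffix from the end
        if [ -f "$FILE2" ]; then
            echo "ERROR: $FILE2 already exists!"
        else
            mv -- "$FILE" "$FILE2" && echo "$FILE renamed to $FILE2"
        fi
    done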
