Compare files content with similar names on two folders - linux

I have two folders (I'll use database names as example):
MongoFolder/
CassandraFolder/
These two folders have similar files inside like:
MongoFolder/
MongoFile
MongoStatus
MongoConfiguration
MongoPlugin
CassandraFolder/
CassandraFile
CassandraStatus
CassandraConfiguration
The contents of these files are also very similar: they hold the same code or configuration, with only the database name changed from Mongo to Cassandra.
How can I compare these two folders so that the result shows which files are missing from one or the other (for example, CassandraPlugin is missing from CassandraFolder), and also checks that the contents of the matching files are the same apart from the database name?

This will give you the names of the missing files (minus the database name):
find MongoFolder/ CassandraFolder/ | \
sed -e s/Mongo//g -e s/Cassandra//g | sort | uniq -u
Output:
Folder/Plugin

The following provides a full diff, including missing files and changed content:
cp -r CassandraFolder cmpFolder
# rename files
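# -depth renames children before their parent directories, so paths stay valid
find cmpFolder -depth -name "Cassandra*" -print | while IFS= read -r file; do
# substitute only in the last path component
mongoName="$(dirname "$file")/$(basename "$file" | sed 's/Cassandra/Mongo/')"
mv "$file" "$mongoName"
done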
# fix content
find cmpFolder -type f -exec perl -pi -e 's/Cassandra/Mongo/g' {} \;
# inspect result
diff -r MongoFolder cmpFolder # or use a gui tool like kdiff3
I haven't tested this, though; feel free to fix bugs or to ask if something specific is unclear.
Instead of mv you can also use rename, but its syntax differs between flavours of Linux.
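If you would rather not make a renamed copy of the folder, the same comparison can be sketched as a small shell function. This is only an illustration under a few assumptions: compare_renamed is a made-up helper name, directories are passed without a trailing slash, the database names contain no characters that are special to sed, and file names contain no newlines. It checks one direction only, so run it a second time with the arguments swapped to find files missing on the other side.

```shell
# compare_renamed SRC_DIR SRC_NAME DST_DIR DST_NAME
# Hypothetical helper: reports files under SRC_DIR whose counterpart in
# DST_DIR is missing, or whose content differs after replacing SRC_NAME
# with DST_NAME. Exit status is 0 only if nothing was reported.
compare_renamed() (
    src_dir=$1 src_name=$2 dst_dir=$3 dst_name=$4
    find "$src_dir" -type f | {
        status=0
        while IFS= read -r file; do
            rel=${file#"$src_dir"/}
            # Map e.g. MongoStatus to CassandraStatus
            counterpart=$dst_dir/$(printf '%s\n' "$rel" | sed "s/$src_name/$dst_name/g")
            if [ ! -f "$counterpart" ]; then
                echo "missing: $counterpart"
                status=1
            elif ! sed "s/$src_name/$dst_name/g" "$file" | cmp -s - "$counterpart"; then
                echo "differs: $file <-> $counterpart"
                status=1
            fi
        done
        exit "$status"
    }
)
```

Usage would be something like compare_renamed MongoFolder Mongo CassandraFolder Cassandra, followed by compare_renamed CassandraFolder Cassandra MongoFolder Mongo.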

Related

Obtaining a flat list of all blobs in a .git/objects/ folder

In the .git/objects/ folder there are many folders with files in them, such as ab/cde.... I understand that these are actually the blobs abcde...
Is there a way to obtain a flat listing of all blobs under .git/objects/, with no / used as a delimiter between ab and cde in the example above? For example:
abcde....
ab812....
74axs...
I tried
/.git/objects$ du -a .
This does list recursively all folders and files within the /objects/ folder but the blobs are not listed since the command lists the folder followed by the filename (as the OS recognizes them, as opposed to git). Furthermore, the du command does not provide a flat listing in a single column -- it provides the output in two columns with a numeric entry (disk usage) in the first column.
I think you should start with something like this (git version 2.37.2):
git rev-list --all --objects --filter=object:type=blob
Doing it this way offers the advantage of not only checking the directory where the unpacked objects are but also the objects that are already packed (which are not in that directory anymore).
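For comparison, here is a sketch of another way to get a flat list that covers loose and packed objects alike, using git cat-file. The function name list_all_blobs is made up for illustration. Unlike rev-list, this also includes blobs that are not reachable from any commit, which may or may not be what you want.

```shell
# list_all_blobs [REPO_DIR]
# Hypothetical helper: prints one object id per line for every blob in the
# object database of REPO_DIR (default: current directory).
list_all_blobs() {
    # --batch-check prints "<id> <type> <size>" for each object
    git -C "${1:-.}" cat-file --batch-all-objects --batch-check |
        awk '$2 == "blob" { print $1 }'
}
```

Run it as list_all_blobs from inside the repository, or as list_all_blobs /path/to/repo.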
If you are in the .git/objects/ folder, try this:
find . -type f | sed -e 's/^\.\///' -e 's/\///'
sed -e takes a sed script, which here is a find/replace pattern.
's/^\.\///' strips the leading ./ that find prints in front of every path. \ in the pattern is an escape character, so \. matches a literal dot and \/ a literal slash.
After the first substitution, the results look like this (on Linux):
61/87c3f3d6c61c1a6ea475afb64265b83e73ec26
The second substitution, 's/\///', removes the remaining /, which is the directory separator.
If you are in the directory which contains .git, try this instead:
find .git/objects/ -type f | sed -e 's/.git\/objects\///' | sed -e 's/\///'
Here 's/.git\/objects\///' finds .git/objects/ and replaces it with '', which is nothing, so the sed command removes that prefix.
Note that both commands also list the files under pack/ and info/, which are not loose objects.
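A variant of the find/sed approach that skips pack/ and info/ might look like the sketch below. The function name list_loose_objects is hypothetical, and the sketch only sees loose (unpacked) objects of any type, not just blobs.

```shell
# list_loose_objects [OBJECTS_DIR]
# Hypothetical helper: prints the ids of loose objects, i.e. files that sit
# in a two-hex-digit subdirectory of OBJECTS_DIR (default: .git/objects).
list_loose_objects() {
    ( cd "${1:-.git/objects}" &&
      find . -type f -path './[0-9a-f][0-9a-f]/*' ) |
        # drop the leading "./", then the "/" between directory and file name
        sed -e 's|^\./||' -e 's|/||'
}
```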

script to move files based on extension criteria

I have a number of files that always share the same name but have different extensions, for example sample.dat, sample.txt, etc.
I would like to create a script that looks for where sample.dat is present and then moves all files named sample*.* into another directory.
I know how to identify them with ls *.dat | sed 's/\(.*\)\..*/\1/'; however, I would like to chain that with something like || mv (the result of the first part) *.* /otherdirectory/
You can use this bash one-liner:
for f in `ls | grep YOUR_PATTERN`; do mv "${f}" "NEW_DESTINATION_DIRECTORY/${f}"; done
It iterates over the output of ls | grep, which is the list of files you wish to move, and moves each file to the new destination. Because that list is split on whitespace, it breaks for file names containing spaces.
Something simple like this?
dat_roots=$(ls *.dat | sed 's/\.dat$//')
for i in $dat_roots; do
echo mv ${i}*.* other-directory
done
This will break for file names containing spaces, so be careful.
Or if spaces are an issue, this will do the job, but is less readable.
ls *.dat | sed 's/\.dat$//' | while read root; do
mv "${root}"*.* other-directory
done
Not tested, but this should do the job:
shopt -s nullglob
for f in *.dat
do
mv "${f%.dat}".* other-directory
done
Setting the nullglob option ensures that the loop body is not executed if no .dat file exists. If you use this code as part of a larger script, you might want to unset it afterwards (shopt -u nullglob).
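For reference, the same idea can be written so that it also runs in a plain POSIX shell without nullglob. This is just a sketch: move_by_marker is a made-up name, it works on the current directory, and it moves root.* (for example sample.txt and sample.dat), not every file that merely starts with the root.

```shell
# move_by_marker EXT DEST
# Hypothetical helper: for every ./NAME.EXT in the current directory,
# move all files called NAME.* (including NAME.EXT itself) into DEST.
move_by_marker() (
    marker_ext=$1 dest=$2
    for marker in ./*."$marker_ext"; do
        # Skip the unexpanded pattern when nothing matches
        [ -e "$marker" ] || continue
        root=${marker%."$marker_ext"}
        mv "$root".* "$dest"/
    done
)
```

From the directory holding the files, move_by_marker dat /otherdirectory would then do the move. The quoting keeps names with spaces intact.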

Bash script for moving files and their parent directory

I have searched, and a lot of topics dance around what I am trying to accomplish. I have over 2,000 m4a files buried in with 17,000 mp3s. The directory structure is /home/me/Music/MP3/Artist/Album/song.m4a. I want to use 'find' to discover the m4a songs and move them, along with their album and artist directories, to /home/me/Music/M4A/Artist/Album/song.m4a.
I have been unsuccessful with mv and the -exec switch, and unsuccessful using basename and/or dirname to create a script. The parent and grandparent directories have me thrown. Moving the files themselves is not a problem; the problem is creating the directory structure AND moving the files.
In a piecemeal effort, I have exported the file list with find /home/me/Music/MP3 -name "*.m4a" >> dir.sh (partly because I wanted to see the file locations and the number of songs). I then ran sed 's/MP3/M4A/g' dir.sh to replace MP3 with M4A. Dropping the song.m4a part from each line, as in this sample from dir.sh, would leave me with a list of Artist/Album directories to run through a while loop with mkdir: /home/me/Music/M4A/Metallica/Re-load/Metallica - The Unforgiven.m4a. Unfortunately, this is where I am stuck: dirname yields a '.'
find /home/me/Music/MP3 -name \*.m4a | sed -e 's/.*/mkdir -p "$(dirname "&"M4A)"; mv "&" "&"M4A/' | sed -e 's/MP3\([^"]*\)"M4A/M4A\1"/g' > moveit_baby.sh
bash moveit_baby.sh
should do the job, but check "moveit_baby.sh" before you call it.
Depending on your sed implementation, you will need \(\) or plain () in the second sed. Of course, the string "MP3" should not be part of the Artist, Album or song name; otherwise you need a more complex filter, see below.
You might optimize further by emitting mkdir -p only when the dirname changes. Such more complex decisions on input parameters are better handled with a while read loop:
find /home/me/Music/MP3 -name \*.m4a | while read file
do
# do anything you want to $file here
done
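To make the while read skeleton concrete, here is one way the loop body could be filled in. Treat it as a sketch: move_tree_by_ext is a made-up function name, the roots are passed without a trailing slash, and file names containing newlines are not handled. It strips the source root from each path with parameter expansion, so it does not matter whether "MP3" also appears in an artist or song name.

```shell
# move_tree_by_ext SRC_ROOT DST_ROOT EXT
# Hypothetical helper: moves every *.EXT file below SRC_ROOT to the same
# relative path below DST_ROOT, creating Artist/Album directories as needed.
move_tree_by_ext() (
    src_root=$1 dst_root=$2 ext=$3
    find "$src_root" -type f -name "*.$ext" |
    while IFS= read -r file; do
        rel=${file#"$src_root"/}            # Artist/Album/song.m4a
        target=$dst_root/$rel
        mkdir -p "$(dirname "$target")"     # no-op if it already exists
        mv "$file" "$target"
    done
)
```

For the layout in the question this would be called as move_tree_by_ext /home/me/Music/MP3 /home/me/Music/M4A m4a. Album directories that end up empty are left in place.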

rsync to backup one file generated in dynamic folders

I'm trying to back up just one file that is generated by another application in dynamically named folders.
For example:
parent_folder/
back_01 -> file_blabla.zip (timestamp 2013.05.12)
back_02 -> file_blabla01.zip (timestamp 2013.05.14)
back_03 -> file_blabla02.zip (timestamp 2013.05.22)
I need to get the latest generated zip, and only that one. The name of the file doesn't matter, as long as it is the latest, is a zip, and is inside "parent_folder".
Also, when I run rsync, the folder structure and file name are recreated at the destination, and I want to avoid that: I want to back up the file into one folder under a fixed name, so I know where the latest one is and it is always named the same.
Right now I'm doing this with a Perl script that gets the latest generated folder with
"ls -tAF | grep '/$' | head -1"
and performs the rsync. It does bring over the latest zip, but with the folder structure that I don't want, because then it doesn't overwrite my previous latest zip file.
rsync -rvtW --prune-empty-dirs --delay-updates --no-implied-dirs --modify-window=1 --include='*.zip' --exclude='*.*' --progress /source/ /myBackup/
It would also be great if I could do the rsync without needing Perl or any other script.
thanks
Will the file names differ each time?
That would be hard for any type of syncing to work with.
What you could do is:
Create a new folder outside of where the file is found.
Before you start, remove the previous symlink in that folder.
When the file is found (i.e. via ls -tAF | grep '/$' | head -1 ...), symlink it into this folder.
Then send it across to the new node with rsync, ssh or unison (with rsync, -L / --copy-links sends the file the symlink points to rather than the link itself).
If the symlink name is file-latest.zip, then it will always be this one file that is sent across.
But why do all that when you can just scp? Take a look here:
https://github.com/vahidhedayati/definedscp
It is a more long-winded approach and not meant for this situation, but it uses the real file date/time stamp and converts it to seconds. It might be useful if you wish to do the stat in a different way.
Use stat to work out the latest file, then simply scp it across. Here is something to get you started:
One liner:
scp $(find /path/to/parent_folder -name \*.zip -exec stat -t {} \;|awk '{print $1" "$13}'|sort -k2nr|head -n1|awk '{print $1}') remote_server:/path/to/name.zip
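A more long-winded way, which may help in understanding what the one-liner above is doing:
#!/bin/bash
FOUND_ARRAY=()
cd parent_folder || exit 1
# collect "name mtime" pairs; field 13 of GNU stat -t is the modification time in epoch seconds
for file in $(find . -name \*.zip); do
ptime=$(stat -t "$file" | awk '{print $13}')
FOUND_ARRAY+=("$file $ptime")
done
# join the array with newlines, sort by mtime (newest first) and keep only the file name
IFS=$'\n'
FOUND_FILE=$(echo "${FOUND_ARRAY[*]}" | sort -k2nr | head -n1 | awk '{print $1}')
scp "$FOUND_FILE" remote_host:/backup/new_name.zip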
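If parsing stat output feels fragile (it breaks on file names with spaces), the newest file can also be picked with the shell's -nt ("newer than") test. The sketch below is only an illustration: latest_zip and backup_latest_zip are made-up names, -nt is supported by bash, dash and ksh but is not strictly POSIX, and cp -p stands in for whatever transfer command you actually use (scp or rsync would take the same two arguments).

```shell
# latest_zip PARENT_DIR
# Hypothetical helper: prints the most recently modified *.zip below
# PARENT_DIR, or fails if there is none.
latest_zip() {
    find "$1" -type f -name '*.zip' | (
        latest=
        while IFS= read -r f; do
            if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
                latest=$f
            fi
        done
        [ -n "$latest" ] && printf '%s\n' "$latest"
    )
}

# backup_latest_zip PARENT_DIR TARGET_FILE
# Copies the newest zip to a fixed target name, overwriting the old copy.
backup_latest_zip() (
    newest=$(latest_zip "$1") || exit 1
    cp -p "$newest" "$2"
)
```

For example, backup_latest_zip /source/parent_folder /myBackup/latest.zip always leaves the newest archive at the same path.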

Batch rename of files with similar names

I have a series of files named like this:
file 1.jpeg
file 2.jpeg
file 3.jpeg
...
file 40.jpeg
I would like to remove the space from all of their filenames without having to do it individually. I know it's possible using something like file{1,40}.jpeg, but I can't remember exactly and I don't even know how to search for it.
Thanks!
EDIT: linux
http://www.google.es/search?q=shell+rename+similar+files+in+a+directory
The first result is http://www.debian-administration.org/articles/150
Using the perl rename command [...] we can also, for example, strip spaces from filenames with this:
~$ rename 's/ //' *.jpeg
In other posts I've found commands like this one that do not require perl:
for f in *' '*; do mv "$f" "$(echo "$f" | tr -d ' ')"; done
I haven't tried any of them.
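A slightly more defensive take on the mv loop, as a sketch only: strip_spaces is a made-up name, it removes every space from the name (not just the first one), and it refuses to overwrite a file that already exists under the new name.

```shell
# strip_spaces [DIR]
# Hypothetical helper: renames every entry in DIR (default: current
# directory) whose name contains a space, removing all spaces from it.
strip_spaces() (
    dir=${1:-.}
    for f in "$dir"/*' '*; do
        [ -e "$f" ] || continue              # pattern matched nothing
        name=$(basename "$f" | tr -d ' ')
        if [ -e "$dir/$name" ]; then
            echo "skipping $f: $dir/$name already exists" >&2
            continue
        fi
        mv "$f" "$dir/$name"
    done
)
```

Running strip_spaces in the folder from the question would turn "file 1.jpeg" into file1.jpeg and leave names without spaces alone.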
