Create a Linux Bash script to copy files and create a directory - linux

I need to create a script that:
copies original documents from any portable hard drive or memory stick to an archive without creating unnecessary duplicates;
copies .doc and .pdf files, but skips duplicates when the files are identical.
The script must create the archive directory if one doesn't already exist,
and report an error if it cannot be created.
Can anyone help?

Your question is very abstract; maybe you could provide an example.
But I think you are looking for rsync.
rsync -a dir1/ dir2
This synchronizes dir1 into dir2; -a (archive mode) is recursive and preserves owner, group, permissions, and timestamps.
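A minimal sketch of the kind of script being asked for, assuming the portable drive is already mounted and passing the source and archive paths as arguments (all names here are placeholders):

#!/bin/bash
# copy .doc and .pdf files from a portable drive into an archive
SRC=${1:?usage: archive.sh <source> <archive>}
DEST=${2:?usage: archive.sh <source> <archive>}

# report an error if the source doesn't exist
if [ ! -d "$SRC" ]; then
    echo "error: source directory $SRC not found" >&2
    exit 1
fi

# create the archive directory if it doesn't already exist
mkdir -p "$DEST" || { echo "error: cannot create $DEST" >&2; exit 1; }

# copy only .doc and .pdf files; -c compares checksums, so a file whose
# contents already match the archived copy is skipped, not duplicated
rsync -rcm --include='*/' --include='*.doc' --include='*.pdf' --exclude='*' \
      "$SRC"/ "$DEST"/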

How to create a script: link
How to copy: link
Portable devices: link
Handling the .doc and .pdf suffixes: link
Introduction to if: link

Related

Renaming a file without copying it; any alternative to OverlayFS?

I want a "virtual" filesystem, like OverlayFS, where I can rename files and folders without copying the whole file. I have an archive with over 800 TB of data, and the files need to be renamed, but I want to keep the original folder structure and the filenames.
For instance:
The 800 TB archive is mounted on /mnt/archive.
I want an "overlay" mount on /mnt/archive_renamed,
so that a file such as Data001.bin on /mnt/archive can be renamed on the overlay mount to something like /mnt/archive_renamed/Data_from_2014/Data_from_Cats.bin, while it still refers to Data001.bin and the underlying mount is never touched.
OverlayFS would be perfect if it didn't need to copy the whole file when renaming it.
Any clue?
In /mnt/archive_renamed, you can use hard links to the original files.
Just do cp -al /mnt/archive /mnt/archive_renamed and you'll have two folder trees pointing to the same files.
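A short sketch with the paths from the question (hard links require both trees to live on the same filesystem, and /mnt/archive_renamed must not already exist):

# build a tree of hard links; no file data is copied
cp -al /mnt/archive /mnt/archive_renamed

# renames inside the copy only touch directory entries, never the data,
# and the original /mnt/archive stays untouched
mkdir -p /mnt/archive_renamed/Data_from_2014
mv /mnt/archive_renamed/Data001.bin \
   /mnt/archive_renamed/Data_from_2014/Data_from_Cats.bin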

How do I find missing files from a backup?

So I want to back up all my music to an external hard drive. This worked well for the most part using Grsync, but some files didn't copy over because of encoding issues with their names.
I would like to compare my two music directories (current and backup) to see what files were missed, so I can copy these over manually.
What is a good solution for this? Note that there are many, many files, so ideally I don't want a tool that wastes time comparing file contents; I just need to know whether a file that is in the source is missing from the backup.
Are there good command-line or GUI solutions that can do this in good time?
Go to the top level directory in each set.
find . -type f -print | sort > /tmp/listfile.txt
Set up a sorted list for each directory (using a different listfile name for each), and diff should help you spot the differences.
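For example (the directory and listfile names are placeholders):

cd /path/to/music  && find . -type f -print | sort > /tmp/current.txt
cd /path/to/backup && find . -type f -print | sort > /tmp/backup.txt

# lines marked '<' exist only in the source, i.e. are missing from the backup
diff /tmp/current.txt /tmp/backup.txt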

How to create a copy of a directory on Linux with links

I have a series of directories on Linux and each directory contains lots of files and data. The data in those directories are automatically generated, but multiple users will need to perform more analysis on that data and generate more files, change the structure, etc.
Since these data directories are very large, I don't want several people making copies of the original data, so I'd like to make a copy of the directory that links back to the original. However, I'd like any changes to be kept only in the new directory, leaving the original read-only. I'd prefer not to link only specific files that I define, because the data in these directories is so varied.
So I'm wondering if there is a way to create a copy of a directory by linking to the original but keeping any changed files in the new directory only.
It turns out this is what I wanted:
cp -al <origdir> <newdir>
It will copy an entire directory tree and create hard links to the original files. If an original file is deleted, the linked copy still exists, and vice versa. This works perfectly, but note that <newdir> must not already exist. As long as the original files are read-only, you'll have an identical, safe copy of the original directory.
However, since you are looking for a way for people to write back changes, UnionFS is probably what you want. It provides a means to combine read-only and read-write locations into one.
Unionfs allows any mix of read-only and read-write branches, as well as insertion and deletion of branches anywhere in the fan-out.
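A sketch of the same union idea using the in-kernel OverlayFS instead (all paths are placeholders; upperdir and workdir must be empty directories on the same writable filesystem):

mkdir -p /srv/upper /srv/work /srv/merged
mount -t overlay overlay \
      -o lowerdir=/srv/original,upperdir=/srv/upper,workdir=/srv/work \
      /srv/merged

# reads fall through to the read-only lowerdir;
# writes land in upperdir, so the original data is never modified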
Originally I was going to recommend this (I use it a lot):
Assuming permissions aren't an issue (e.g. only reading is required), I would suggest bind-mounting them into place.
mount -B <original> <new-location>
# or
mount --bind <original> <new-location>
<new-location> must exist as a folder.

How can you tell what files are currently open by any user?

I am trying to write a script or a piece of code to archive files, but I do not want to archive anything that is currently open. I need a way to determine which files in a directory are open. I want to use either Perl or a shell script, but can try other languages if needed. It will be in a Linux environment and I do not have the option to use lsof. I have also had inconsistent results with fuser. Thanks for any help.
I am trying to take log files in a directory and move them to another directory. If the files are open however, I do not want to do anything with them.
You are approaching the problem incorrectly. You want to keep files from being modified underneath you while you are reading them, and you cannot do that without operating system support. The best you can hope for in a multi-user system is to keep your archive metadata consistent.
For example, if you are creating the archive directory, make sure that the number of bytes stored in the archive matches the directory. You can checksum the file contents before and after reading the filesystem, compare that with what you wrote to the archive, and perhaps flag the file as "inconsistent" if they differ.
What are you trying to accomplish?
Added in response to comment:
Look at logrotate to steal ideas about how to handle this consistently, or just have it do the work for you. If you are concerned that renaming files that processes are currently writing will break things, take a look at man 2 rename:
rename() renames a file, moving it between directories if required. Any other hard links to the file (as created using link(2)) are unaffected. Open file descriptors for oldpath are also unaffected.
If newpath already exists it will be atomically replaced (subject to a few conditions; see ERRORS below), so that there is no point at which another process attempting to access newpath will find it missing.
Try ls -l /proc/*/fd/* as root.
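A sketch of that /proc approach applied to the log-moving use case (the directory names are placeholders; it must run as root so every process's fd table is readable):

#!/bin/bash
# move log files that no process currently holds open
SRC=/var/log/myapp       # placeholder
DEST=/var/log/archive    # placeholder

is_open() {
    # return success if any process has an open descriptor on $1
    local target fd
    target=$(readlink -f "$1")
    for fd in /proc/[0-9]*/fd/*; do
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] && return 0
    done
    return 1
}

for f in "$SRC"/*.log; do
    if is_open "$f"; then
        echo "skipping open file: $f" >&2
    else
        mv "$f" "$DEST"/
    fi
done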
msw has answered the question correctly, but if you want the list of open files, the lsof command will give it to you.

Linux - Restoring a file

I've written a very basic shell script that moves a specified file into a dustbin directory. The script is as follows:
#!/bin/bash
#move items to dustbin directory
mv "$@" ~/dustbin/
echo "File moved to dustbin"
This works fine for me, any file I specify gets moved to the dustbin directory. However, what I would like to do is create a new script that will move the file in the dustbin directory back to its original directory. I know I could easily write a script that would move it back to a location specified by the user, but I would prefer to have one that would move it to its original directory.
Is this possible?
I'm using Mac OS X 10.6.4 and Terminal
You will have to store where the original file came from, then. Maybe in a separate file, a database, or in the file's attributes (metadata).
Create a logfile with 2 columns:
the complete filename in the dustbin;
the complete original path and filename.
You will need this logfile anyway: what will you do when a user deletes 2 files in different directories but with the same name, say /home/user/.wgetrc and /home/user/old/.wgetrc?
What will you do when a user deletes a file, makes a new one with the same name, and then deletes that too? You'll need versions or timestamps or something.
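A sketch of this logfile approach, with a timestamp in the dustbin name to disambiguate versions (the script names and dustbin location are placeholders; the pwd-based path resolution is used because readlink -f is missing on older macOS):

#!/bin/bash
# delete.sh -- move files into the dustbin and log where they came from
BIN=~/dustbin
LOG="$BIN/.log"
mkdir -p "$BIN"
for f in "$@"; do
    dir=$(cd "$(dirname "$f")" && pwd)   # absolute original directory
    name="$(date +%s).$(basename "$f")"  # timestamped dustbin name
    mv "$f" "$BIN/$name"
    printf '%s\t%s/%s\n' "$name" "$dir" "$(basename "$f")" >> "$LOG"
done

#!/bin/bash
# restore.sh <dustbin-name> -- look up the origin in the log and move it back
BIN=~/dustbin
LOG="$BIN/.log"
orig=$(awk -F'\t' -v n="$1" '$1 == n { print $2 }' "$LOG" | tail -1)
[ -n "$orig" ] || { echo "no log entry for $1" >&2; exit 1; }
mv "$BIN/$1" "$orig"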
You need to store the original location somewhere, either in a database or in an extended attribute of the file. A database is definitely the easiest way to do it, though an extended attribute would be more robust. Looking in ~/.Trash/ I see that some, but not all, files have extended attributes, so I'm not sure how Apple does it.
You need to somehow encode the source directory in the file. I think the easiest would be to change the filename in the dustbin directory, so that /home/user/music/song.mp3 becomes ~/dustbin/song.mp3|home_user_music
And when you move it back, your script needs to parse the file name and reconstruct the path from the part after the |.
Another approach would be to let the filesystem be your database.
A file moved from /some/directory/somewhere/filename would be moved to ~/dustbin/some/directory/somewhere/filename, and you'd do find ~/dustbin -name "$file" to find it based on its basename (from user input). Then you'd just trim "~/dustbin" from the output of find and you'd have the destination ready to use. If more than one file is returned by find, you can list the proposed files for user selection. You could use ~/dustbin/$deletiondate if you wanted to make it possible to roll back to earlier versions.
You could do a cron job that would periodically remove old files and the directories (if empty).
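A sketch of that mirrored-tree idea (the deleted file below is just an example):

# delete: recreate the original path underneath the dustbin
file=/some/directory/somewhere/filename
mkdir -p ~/dustbin"$(dirname "$file")"
mv "$file" ~/dustbin"$file"

# restore by basename; this assumes find returns exactly one match
match=$(find ~/dustbin -type f -name "filename")
dest=${match#"$HOME/dustbin"}        # trim the dustbin prefix to get the origin
mkdir -p "$(dirname "$dest")"
mv "$match" "$dest"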
