Linux server: create symbolic links from filenames

I need to write a shell script to run as a cron task, or preferably on creation of a file in a certain folder.
I have an incoming and an outgoing folder (they will be used to log mail). There will be files created with codes as follows...
bmo-001-012-dfd-11 for outgoing and 012-dfd-003-11 for incoming. I need to extract the project/client code (012-dfd) and then place a link to the file in the matching project folder.
Project folders are located in /projects and follow the format 012-dfd. I need to create symbolic links inside the incoming or outgoing folders of the projects, that leads to the correct file in the general incoming and outgoing folders.
/incoming/012-dfd-003-11.pdf -> /projects/012-dfd/incoming/012-dfd-003-11.pdf
/outgoing/bmo-001-012-dfd-11.pdf -> /projects/012-dfd/outgoing/bmo-001-012-dfd-11.pdf
So my questions:
How would I make my script run when a file is added to either the incoming or outgoing folder?
Additionally, are there any disadvantages to running on file modification compared with running as a cron task every 5 minutes?
How would I get the filenames of recent files (created since the script last ran)?
How would I extract the code from the filename?
How would I use the code to create a symlink in the desired folder?
EDIT: What I ended up doing...
while inotifywait outgoing; do
    find -L . -type l -delete
    ls outgoing | php -R '
        if(
            preg_match("/^\w{3}-\d{3}-(\d{3}-\w{3})-\d{2}(.+)$/", $argn, $m)
            && $m[1] && (file_exists("projects/$m[1]/outgoing/$argn") != TRUE)
        ){
            `ln -s $(pwd)/outgoing/$argn projects/$m[1]/outgoing/$argn;`;
        }
    '
done
This works quite well, also cleaning up broken symlinks (with find -L . -type l -delete), but I would prefer to do it without the overhead of calling PHP. I just don't know bash well enough yet.
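For comparison, a rough bash-only equivalent of the loop above might look like the following. This is an untested sketch that assumes bash's built-in [[ =~ ]] regex matching and the same directory layout; the pattern mirrors the PHP one:
while inotifywait outgoing; do
    find -L . -type l -delete
    for f in outgoing/*; do
        name=${f##*/}
        # e.g. bmo-001-012-dfd-11.pdf -> code 012-dfd
        if [[ $name =~ ^[[:alnum:]]{3}-[0-9]{3}-([0-9]{3}-[[:alnum:]]{3})-[0-9]{2} ]]; then
            code=${BASH_REMATCH[1]}
            [ -e "projects/$code/outgoing/$name" ] || ln -s "$PWD/outgoing/$name" "projects/$code/outgoing/$name"
        fi
    done
done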

Some near-answers for your task breakdown:
On linux, use inotify, possibly through one of its command-line tools, or script language bindings.
See above
Assuming the project name can be extracted purely by position from your examples (meaning not only does the project name follow a strict 7-character format, but so does whatever precedes it in the outgoing filename):
echo `basename /incoming/012-dfd-003-11.pdf` | cut -c 1-7
012-dfd
echo `basename /outgoing/bmo-001-012-dfd-11.pdf`| cut -c 9-15
012-dfd
mkdir -p /projects/$i/incoming/ creates the directory /projects/012-dfd/incoming/ if i = 012-dfd,
ln -s /incoming/foo /projects/$i/incoming/foo creates a symbolic link at the latter path pointing to the preexisting file /incoming/foo. A combined sketch follows.
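Putting those pieces together, a minimal cron-style loop for the incoming folder might look like this (a sketch, assuming the 7-character code always starts at position 1 of incoming filenames; -sf makes re-runs idempotent):
for f in /incoming/*.pdf; do
    name=$(basename "$f")
    code=$(echo "$name" | cut -c 1-7)    # e.g. 012-dfd
    mkdir -p "/projects/$code/incoming"
    ln -sf "$f" "/projects/$code/incoming/$name"
done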

How would I make my script run when a file is added to either the incoming or outgoing folder?
Additionally, are there any disadvantages to running on file modification compared with running as a cron task every 5 minutes?
If a 5-minute delay isn't an issue, I would go for the cron job (it's easier and, IMHO, more flexible).
How would I get the filenames of recent files (created since the script last ran)?
If your script runs every 5 minutes, then you can tell that all files created between now and 5 minutes ago are new, so you can list those files with ls or find.
How would I extract the code from the filename?
You can use the sed command.
How would I use the code to create a symlink in the desired folder?
Once you have the desired file names, you can use the ln -s command to create the symbolic link; a combined sketch follows below.
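For example (a sketch assuming the incoming filename format shown in the question and a 5-minute cron interval), recent files can be picked up with find's -mmin test and the project code pulled out with sed before linking:
# files created or modified in the last 5 minutes
find /incoming -type f -mmin -5 | while read -r f; do
    name=$(basename "$f")
    # extract the ddd-www project code, e.g. 012-dfd
    code=$(echo "$name" | sed -n 's/^\([0-9]\{3\}-[a-z]\{3\}\).*/\1/p')
    [ -n "$code" ] && ln -s "$f" "/projects/$code/incoming/$name"
done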

Related

Listing files while working with them - Shell Linux

I have a database server whose basic job is to import some specific files, do some calculations and present the data in a web interface.
A hardware replacement is planned for the coming weeks, and the database needs to be migrated. But there's one problem: the current database is corrupted and shows errors in the web interface. This is due to the server freezing while importing/calculating, which is also why it is being replaced.
So I'm not willing to just dump the DB and restore it on the new server. It doesn't make sense to keep using the corrupted database, and dumping from the old server is really slow anyway. I have a backup of all the files to be imported (currently 551 of them) and I'm working on a script to "re-import" all of them and have a clean database again.
The current server takes ~20 minutes to import each new file. Let's say the new server takes 10 minutes per file thanks to its extra power... It's still a long time! And here comes the problem: a new file arrives hourly, so there will be more files by the time the job finishes.
Restore script start like this:
for a in $(ls $BACKUP_DIR | grep part_of_filename); do
The question is: will this ls pick up new file names as they arrive? The file names are timestamp-based, so they would appear at the end of the list.
Or is this ls executed only once, with the results stored in a temporary variable?
Thanks.
ls will execute once, at the beginning, and any new files won't show up.
You can rewrite that statement to list the files again at the start of each loop (and, as Trey mentioned, better to use find, not ls):
while all=$(find $BACKUP_DIR/* -type f | grep part_of_filename); do
for a in $all; do
But this has a major problem: it will repeatedly process the same files over and over again.
The script needs to record which files are done. Then it can list the directory again and process any (and only) new files. Here's one way:
touch ~/done.list
cd $BACKUP_DIR
# loop while f=first file not in done list:
# find list the files; more portable and safer than ls in pipes and scripts
# fgrep -v -f ~/done.list pass through only files not in the done list
# head -n1 pass through only the first one
# grep . control the loop (true iff there is something)
while f=`find * -type f | fgrep -v -f ~/done.list | head -n1 | grep .`; do
<process file $f>
echo "$f" >> ~/done.list
done

Change directory to path of parent/calling script in bash

I have dozens of scripts, all in different directories. (exported/expanded Talend jobs)
At this moment each job has 1 or 2 scripts, all starting with the same lines, the most important one being:
cd ***path-to-script***
and several lines to set the Java path and start the job.
I want to create one script which will be run from all these scripts.
e.g.:
/scripts/talend.sh
And in all Talend scripts, the first line will run /scripts/talend.sh. Some examples of where these scripts are run from:
/talend-job1_0.1/talend-job1_0.1/talend-job1/talend-job1.sh
/talend-task2_0.1/talend-task2_0.1/talend-task2/talend-task2.sh
/talend-job3_0.1/talend-job3_0.1/talend-job3/talend-job3.sh
How can I determine where /scripts/talend.sh is started from, so I can cd to that path from within /scripts/talend.sh?
The Talend scripts are not run from within the directory itself, but from a cronjob or a different user's home directory.
EDIT:
The question was marked as duplicate, but Getting the source directory of a Bash script from within does not answer my question 100%.
Problem is:
- The basic script is being called from different scripts
- Those different scripts can be run from the command line, both with and without a symbolic link.
- $0, $BASH_SOURCE and pwd all do something, but none of the solutions mentioned covers all the difficulties.
Example:
/scripts/talend.sh
In this script I want to configure the $PATH and $HOME_PATH of Java, and cd to the place where the Talend job is located. (It's a package, so that script MUST be run from that location.)
Paths to the jobs are, for example:
/u/talend/talendjob1/sub../../talendjob1.sh
/u/talend/talendjob2/sub../../talendjob2.sh
/u/talend/talendjob3/sub../../talendjob3.sh
Multiple jobs are run from a TMS application. This application cannot run these scripts by their full name (too long; the name can only be 6 characters), so in a different location I have symbolic links:
/u/tms/links/p00001 -> /u/talend/talendjob1/sub../../talendjob1.sh
/u/tms/links/p00002 -> /u/talend/talendjob1/sub../../talendjob2.sh
/u/tms/links/p00003 -> /u/talend/talendjob1/sub../../talendjob3.sh
/u/tms/links/p00004 -> /u/talend/talendjob1/sub../../talendjob4.sh
I think you get an overview of the complexity and why I want only one basic Talend script where I can keep all the common stuff. But I can only do that if I know the source of the Talend script, because that is where I have to be to start the Talend job.
These answers (beyond the first) are specific to Linux, but should be very robust there -- working with directory names containing spaces, literal newlines, wildcard characters, etc.
To change to your own source directory (a FAQ covered elsewhere):
cd "$(basename "$BASH_SOURCE")"
To change to your parent process's current directory:
cd "/proc/$PPID/cwd"
If you want to change to the directory passed as the first command-line argument to your parent process:
{ IFS= read -r -d '' _ && IFS= read -r -d '' argv1; } <"/proc/$PPID/cmdline"
cd "$argv1"
That said, personally, I'd just export the job directory to an environment variable in the parent process, and read that environment variable in the children. Much, much simpler, more portable, more accurate, and in line with best practice.
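A minimal sketch of that approach (the variable name JOB_DIR is just an illustration, not something Talend defines):
# in the Talend job script (the parent):
export JOB_DIR=/u/talend/talendjob1
/scripts/talend.sh
# in /scripts/talend.sh (the child):
cd "$JOB_DIR" || exit 1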
You can store pwd in a variable and then cd to it when you want to go back
This works for me:
In
/scripts/talend.sh
do
cd "${1%/*}"
${1%/*} strips off everything after the last /, effectively giving the dirname of $1, which is the path to the script that calls this one.
and then call the script with the line:
/scripts/talend.sh "$0"
Calling the script with $0 passes the name of the current script as an argument to the child, which, as shown above, can be used to cd to the correct directory.
When you source /scripts/talend.sh the current directory is unchanged:
The scripts
# cat /scripts/talend.sh
echo "Talend: $(pwd)"
# cat /talend-job1_0.1/talend-job1_0.1/talend-job1/talend-job1.sh
echo Job1
. /scripts/talend.sh
Executing job1
# cd /talend-job1_0.1/talend-job1_0.1
# talend-job1/talend-job1.sh
Job1
Talend: /talend-job1_0.1/talend-job1_0.1
When you want to see the directory the calling script is in, see get dir of script.
EDIT:
When you want to have the path of the calling script (talend-job1.sh) without having to cd to that dir first, you should get the dir of the script (see link above) and source talend.sh:
# cat /scripts/talend.sh
cd "$( dirname "${BASH_SOURCE[0]}" )"
echo "Talend: $(pwd)"
In talend.sh get the name of the calling script and then the directory:
parent_cmd=$(ps -o args= $PPID)
set -- $parent_cmd
parent_cmd=$(dirname $2)
Update: as pointed out by Charles Duffy in the comments below, this will cause havoc when used with paths containing whitespace or glob patterns.
If procfs is available you could read the content of /proc/$PPID/cmdline, or if portability is a concern, do a better job of parsing the args.
In /scripts/talend.sh:
cd "$(dirname "$0")"
Or:
cd "$(dirname "$BASH_SOURCE")"
Another one is:
cd "$(dirname "$_")"
#This must be the first line of your script after the shebang line
#Otherwise don't use it
Note: The most reliable of the above is $BASH_SOURCE

Change working directory while looping over folders

Currently I am trying to run MRI software (TBSS) on imaging files (scan.nii.gz) on the Linux command line.
The scans are all stored in separate folders for different participants, and the file names are identical, so:
/home/scans/participant1/scan.nii.gz
/home/scans/participant2/scan.nii.gz
/home/scans/participant3/scan.nii.gz
What this software does is create the result of the analysis in the current working directory. Since the scans have the same image name, they get overwritten all the time.
I would like to loop through all the participant folders, make each one my working directory, and then execute the TBSS command, which is simply tbss_1_preproc scan.nii.gz. That way, the output will be stored in the current working directory, which is the participant directory.
Is there any sensible way of doing this in Linux?
Thanks so much!
Try it in BASH. The code below is untested, but it should give you a clue
#! /bin/bash
find . -name scan.nii.gz | while read -r line
do
    (
        # use a subshell so each cd starts from the original directory,
        # keeping the relative paths produced by find valid
        cd "$(dirname "$line")" || exit
        tbss_1_preproc "$(basename "$line")"
    )
done
Put it in a file and make it executable. Copy it to your scans folder and execute it.
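If GNU find is available, its -execdir action does the directory switching for you, running the command from the folder containing each match; a one-line alternative sketch:
find /home/scans -name scan.nii.gz -execdir tbss_1_preproc scan.nii.gz \;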

Recursively copy contents of directory to all target directories

I have a directory containing a set of subdirectories and files. I need to recursively copy all the content of this directory to all the subdirectories of another directory, also recursively.
How do I achieve this, preferably without using a script and only with the cp command?
You can write this in a script but you don't have to. Just write it line by line in the terminal:
# $TARGET is the directory containing subdirectories where you want to STORE the copies
# $SOURCE is the directory containing the subdirectories you want to COPY
for dir in "$TARGET"/*/; do
    cp -r "$SOURCE"/* "$dir"
done
Only uses cp and runs on both bash and zsh.
You can't. cp can copy multiple sources but will only copy to a single destination. You need to arrange to invoke cp multiple times - once per destination - for what you want to do; using, as you say, a loop or some other tool.
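For example, one way to arrange that without writing a script is to let find invoke cp once per immediate subdirectory of the target, much like the loop above (a sketch; the trailing /. copies the contents of $SOURCE, including hidden files, into each subdirectory):
find "$TARGET" -mindepth 1 -maxdepth 1 -type d -exec cp -r "$SOURCE"/. {} \;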
tar cf - * | ( cd /target; tar xfp - )
The first part of the command, before the pipe, instructs tar to create an archive of everything in the current directory and write it to standard output (the - in place of a file name frequently indicates stdout).
The commands within parentheses cause the shell to change directory to the target directory and untar data from standard input. Since the cd and tar commands are contained within parentheses, their actions are performed together.
The -p option in the tar extraction command directs tar to preserve permission and ownership information, if possible given the user executing the command. If you are running the command as superuser, this option is turned on by default and can be omitted.
Also you can use the following command, but it seems to be quite a bit slower than tar:
cp -a * /target

Modifying files nested in tar archive

I am trying to do a grep and then a sed to search for specific strings inside files, which are inside multiple tars, all inside one master tar archive. Right now, I modify the files by
First extracting the master tar archive.
Then extracting all the tars inside it.
Then doing a recursive grep and then sed to replace a specific string in files.
Finally packaging everything again into tar archives, and all the archives inside the master archive.
Pretty tedious. How do I do this automatically using shell scripting?
There isn't much option other than automating the steps you outline, for the reasons demonstrated by the caveats in the answer by Kimvais.
tar modify operations
The tar command has some options to modify existing tar files. They are, however, not appropriate for your scenario for multiple reasons, one of them being that it is the nested tarballs that need editing rather than the master tarball. So, you will have to do the work longhand.
Assumptions
Are all the archives in the master archive extracted into the current directory or into a named/created sub-directory? That is, when you run tar -tf master.tar.gz, do you see:
subdir-1.23/tarball1.tar
subdir-1.23/tarball2.tar
...
or do you see:
tarball1.tar
tarball2.tar
(Note that nested tars should not themselves be gzipped if they are to be embedded in a bigger compressed tarball.)
master_repackager
Assuming you have the subdirectory notation, then you can do:
for master in "$@"
do
tmp=$(pwd)/xyz.$$
trap "rm -fr $tmp; exit 1" 0 1 2 3 13 15
cat $master |
(
mkdir $tmp
cd $tmp
tar -xf -
cd * # There is only one directory in the newly created one!
process_tarballs *
cd ..
tar -czf - * # There is only one directory down here
) > new.$master
rm -fr $tmp
trap 0
done
If you're working in a malicious environment, use something other than xyz.$$ for the directory name. However, this sort of repackaging is usually not done in a malicious environment, and the chosen name based on the process ID is sufficient to give everything a unique name. The use of tar -f - for input and output allows you to switch directories but still handle relative pathnames on the command line. There are likely other ways to handle that if you want. I also used cat to feed the input to the sub-shell so that the top-to-bottom flow is clear; technically, I could improve things by using ) > new.$master < $master at the end, but that hides some crucial information until several lines later.
The trap commands make sure that (a) if the script is interrupted (signals HUP, INT, QUIT, PIPE or TERM), the temporary directory is removed and the exit status is 1 (not success) and (b) once the subdirectory is removed, the process can exit with a zero status.
You might need to check whether new.$master exists before overwriting it. You might need to check that the extract operation actually extracted stuff. You might need to check whether the sub-tarball processing actually worked. If the master tarball extracts into multiple sub-directories, you need to convert the 'cd *' line into some loop that iterates over the sub-directories it creates.
All these issues can be skipped if you know enough about the contents and nothing goes wrong.
process_tarballs
The second script is process_tarballs; it processes each of the tarballs on its command line in turn, extracting the file, making the substitutions, repackaging the result, etc. One advantage of using two scripts is that you can test the tarball processing separately from the bigger task of dealing with a tarball containing multiple tarballs. Again, life will be much easier if each of the sub-tarballs extracts into its own sub-directory; if any of them extracts into the current directory, make sure you create a new sub-directory for it.
for tarball in "$@"
do
# Extract $tarball into sub-directory
tar -xf $tarball
# Locate appropriate sub-directory.
(
cd $subdirectory
find . -type f -print0 | xargs -0 sed -i 's/name/alternative-name/g'
)
mv $tarball old.$tarball
tar -cf $tarball $subdirectory
rm -f old.$tarball
done
You should add traps to clean up here, too, so the script can be run in isolation from the master script above and still not leave any intermediate directories around. In the context of the outer script, you might not need to be so careful to preserve the old tarball before the new one is created (so rm -f $tarball instead of the move and remove commands), but treated in its own right, the script should be careful not to damage anything.
Summary
What you're attempting is not trivial.
For debuggability, the job is split into two scripts that can be tested independently.
Handling the corner cases is much easier when you know what is really in the files.
You can probably sed the tar stream directly, since tar itself does not do any compression.
e.g.
zcat archive.tar.gz|sed -e 's/foo/bar/g'|gzip > archive2.tar.gz
However, beware that this will also replace foo with bar in filenames, usernames and group names, and it ONLY works if foo and bar are of equal length.
