Shell script to delete a job's semaphore file (lock file) if that job is not running - linux

I want to write a shell script that deletes all lock files present in various directories.
I want to implement the following logic (sketched in code after the list):
Check for a lock file, directory by directory.
If a lock file is found in a particular job directory, check whether that job is running.
If the job is not running, delete the lock file.
Move on to the next directory.
Follow the above procedure until the last job directory.
* Do I need to specify the lock file location for each individual job directory?
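A minimal sketch of that loop in bash, assuming a hypothetical layout where every job directory sits under one common root and each lock file stores the PID of the job that created it (all paths and names here are illustrative, not from the original question):

#!/bin/bash
JOBS_ROOT="/path/to/jobs"   # assumption: common parent of all job directories
LOCKNAME="job.lock"         # assumption: lock file name used by the jobs

for dir in "$JOBS_ROOT"/*/; do
    lock="$dir$LOCKNAME"
    [ -f "$lock" ] || continue                    # no lock file, move on
    pid=$(cat "$lock")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "job in $dir still running (pid $pid), keeping lock"
    else
        echo "job in $dir not running, deleting stale lock"
        rm -f "$lock"
    fi
done

With a glob like this you do not need to list each job directory's lock file individually, as long as the lock file name is consistent across jobs.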

Related

How does logrotate work when two processes use the same file?

For example:
Program A is writing its log to the file "test.log".
If logrotate runs, it will rename "test.log" to "test.log.1" first, and then create a new file "test.log".
After step 2, program A does not report any error, but A's log does not appear in the new file "test.log".
The questions are:
Where does the data that A writes after step 2 go?
How can logrotate rename and create a new file while another process is writing to the file? (Is there some point about logrotate that I am missing?)
Thanks!
This is very tightly related to how POSIX filesystems work. When you rename a file, only the name changes; the physical file on disk stays the same. Also, once a file is opened, the process using it holds a reference (through many layers) to the physical file on disk; the name is only used when opening the file.
That means program A will keep writing to the same physical file, which now has the new name (i.e. test.log.1 in your example).
A common solution to this problem is to have the log rotation program send a signal (e.g. SIGHUP or SIGUSR1 or similar) to the process. The process detects this signal and then reopens its log so that it writes to the new file.
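For instance, a logrotate stanza along these lines tells the daemon to reopen its log after rotation (the log path, pidfile, and the choice of SIGHUP are assumptions for illustration; they depend on the program being rotated):

/var/log/myapp/test.log {
    daily
    rotate 7
    postrotate
        # assumption: the program reopens its log on SIGHUP and
        # records its PID in this pidfile
        kill -HUP "$(cat /var/run/myapp.pid)"
    endscript
}

If the program cannot be signaled at all, logrotate's copytruncate option copies the log and truncates the original in place instead, at the cost of possibly losing lines written during the copy.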

Change spark _temporary directory path

Is it possible to change the _temporary directory where Spark saves its temporary files before writing?
In particular, since I am writing single partitions of a table, I would like the temporary folder to be within the partition folder.
Is this possible?
There is no way to do this with the default FileOutputCommitter because of its implementation: the FileOutputCommitter creates a ${mapred.output.dir}/_temporary subdirectory where the files are written and later, after being committed, moved to ${mapred.output.dir}.
In the end, the entire temporary folder is deleted. When two or more Spark jobs have the same output directory, mutual deletion of files is inevitable.
Eventually, I downloaded org.apache.hadoop.mapred.FileOutputCommitter and org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter (you can name your copy YourFileOutputCommitter) and made some changes that allow renaming _temporary.
In your driver, you'll have to add the following code:

import org.apache.hadoop.mapred.JobConf

// wrap the Hadoop configuration and plug in the custom committer
val conf: JobConf = new JobConf(sc.hadoopConfiguration)
conf.setOutputCommitter(classOf[YourFileOutputCommitter])
// update the temporary path for the committer
YourFileOutputCommitter.tempPath = "_tempJob1"
Note: it's better to use MultipleTextOutputFormat to rename files, because two jobs that write to the same location can overwrite each other.
Update
I've written a short post on our tech blog with more details:
https://www.outbrain.com/techblog/2020/03/how-you-can-set-many-spark-jobs-write-to-the-same-path/

Change temporary path for individual job from spark code

I have multiple jobs that I want to execute in parallel, appending daily data into the same path using dynamic partitioning.
The problem I am facing is the temporary path that Spark creates during job execution. Multiple jobs end up sharing the same temp folder and conflict, which can cause one job to delete temp files while another job fails with an error saying an expected temp file doesn't exist.
Can we change the temporary path for an individual job, or is there any alternative way to avoid this issue?
To change the temp location you can do this:
/opt/spark/bin/spark-shell --conf "spark.local.dir=/local/spark-temp"
spark.local.dir changes where all temp files are read and written. I would advise creating this location and setting its permissions from the command line before the first session is run with this argument.
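Following that suggestion, each parallel job can be pointed at its own scratch directory so they no longer share temp files; a sketch with hypothetical job scripts and directory names:

# assumption: job scripts and directory names are illustrative
/opt/spark/bin/spark-submit --conf "spark.local.dir=/local/spark-temp-job1" job1.py &
/opt/spark/bin/spark-submit --conf "spark.local.dir=/local/spark-temp-job2" job2.py &
wait    # wait for both jobs to finish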

How to call a bash script automatically when directory contents change

My goal is to run a bash script automatically whenever any new file is added to a particular directory or any subdirectory of that particular directory.
Detail Scenario:
I am creating an automated process for file submission from teachers to students and vice versa. The sender uploads a file and it is stored inside the Uploads directory on the LAMP server, named in the format "name_course-name_filename.pdf". I want some method so that whenever a file is stored inside the Uploads folder, a script is called at that moment to send the file to the list of receivers.
From the database I can find the list of receivers for that particular course and student.
My only concern is how to call a script automatically, and have it act on each individual file, whenever the contents of the directory change. Cron would do this at intervals, but it is not real-time.
Linux provides a nice mechanism for that purpose, called inotify. inotify is mostly available as a C API, but shell utilities have been developed for it as well. You should use inotifywait from inotify-tools (the package name in Debian) for this. Here is a basic example:

#!/bin/bash
directory="/tmp" # or whatever you are interested in

# -m: keep monitoring instead of exiting after the first event
# -e create: report only file-creation events
inotifywait -m -e create "$directory" |
while read folder eventlist eventfile
do
    echo "the following events happened in folder $folder:"
    echo "$eventlist $eventfile"
done
Update:
If the problem gets more complicated, for example if you have to monitor recursive, dynamic directory structures, you should have a look at incron. It's a cron-like daemon which executes scripts on certain events, but the events are file system events rather than timer events. An incrontab entry is sketched below.
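A minimal incrontab entry (edited with incrontab -e), assuming hypothetical paths; in incron's table syntax, $@ expands to the watched directory and $# to the name of the file that triggered the event:

/tmp IN_CREATE /usr/local/bin/handler.sh $@ $#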
There is another option for 'inotifywait':
-d --daemon
Same as --monitor, except run in the background logging events to a file
that must be specified by --outfile. Implies --syslog.
For completeness:
-m --monitor
Instead of exiting after receiving a single event, execute indefinitely.
The default behaviour is to exit after the first event occurs.
Within the do-done block of your 'while' statement, you might parse each event report for interesting details, then use 'case-esac' to take action based on each event that you care about; a sketch follows below.
For something that you plan to rely on for your operations, you might also consider replacing the hard-coded '$directory' with some sort of configuration file. Such a file might include the path and filename, the interesting events for that path and file, and a script to run when those events happen.
The script might take the list of events as parameters and then 'case-esac' again.
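One way that case-esac parsing might look, assuming the same watch as the example above (the extra events and the actions taken are illustrative):

#!/bin/bash
directory="/tmp" # assumption: same watched directory as above

inotifywait -m -e create -e delete -e modify "$directory" |
while read folder eventlist eventfile
do
    # eventlist can contain several comma-separated event names
    case "$eventlist" in
        *CREATE*) echo "new file: $folder$eventfile" ;;
        *DELETE*) echo "removed: $folder$eventfile" ;;
        *MODIFY*) echo "changed: $folder$eventfile" ;;
        *)        echo "unhandled event(s) $eventlist on $folder$eventfile" ;;
    esac
done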
Just one man's ramblins,
~~~ 8d;-Dan

How to implement a semaphore that will synchronize several different copies of the same program in Linux

I have a program that can be run several times. The program uses a working directory where it saves/manipulates its runtime files and puts results. I want to make sure that if several copies of the program run simultaneously, they won't use the same folder. To do this, I add a hidden file to the working directory when the program claims it, meaning the directory is in use, and delete the file when the program exits. When a program wants to use a certain directory as its working directory, it checks whether that file exists; if not, it uses the directory, otherwise it uses a directory of the same name with its process id appended. The implementation is (in Tcl):
upon starting:
if {[file exists [db_work_area]/.folder_used]} {
    reg set work_area_override [db_work_area]_[pid]
}
...
exec touch ${db_wa}/.folder_used
when exiting:
if {[file exists [db_work_area]/.folder_used]} {
    file delete [db_work_area]/.folder_used
}
This works when the copies of the program are opened one at a time. However, I am afraid that if several copies are opened at the same time, there will be a synchronization problem: two programs could check whether the file exists at the same moment, both see that it doesn't, both choose that directory, and only then would each add the file. How can I implement a semaphore that can synchronize the several different copies of the same program?
You should not do a [file exists] check and a separate touch later; it works better to use open, which can do both in a single step with the CREAT and EXCL flags.
Try something like this to create the file, failing if it already exists, in an atomic way:
if {[catch {open ${db_wa}/.folder_used {WRONLY EXCL CREAT}} fd]} {
    # an error happened, the file already exists:
    # pick a different work area
} else {
    # just close it again, like a touch, to create the file
    close $fd
}
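The same atomic create-or-fail idiom exists in shell scripts, which may also be useful for the lock-file cleanup question at the top of this page; a sketch with an illustrative path, relying on bash's noclobber option (with noclobber set, '>' opens the file with O_CREAT|O_EXCL, so the redirection fails if the file already exists):

#!/bin/bash
lockfile="/path/to/work_area/.folder_used" # assumption: illustrative path

# the subshell keeps noclobber from leaking into the rest of the script
if ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null; then
    echo "acquired lock, using this work area"
    # ... do the work ...
    rm -f "$lockfile"
else
    echo "work area in use, picking a different one"
fi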
