How to take control of files in Linux before processing starts - bash - linux

I am currently working on a project to automate a manual task in my office. We have a process where we have to re-trigger some of our IDs when they fall into repair. As part of the process, we have to extract those IDs from an Oracle DB table, put them in a file on our Linux server, and run a command like this:
Example file:
$ cat /task/abc_YYYYMMDD_1.txt
23456
45678
...and so on
cat abc_YYYYMMDD_1.txt | scripttoprocess -args
I am using an existing Java-based program called 'scripttoprocess'. I can't see what's inside it, as it appears to be encrypted. I simply go to the location where my files are present and use it like this:
cd /export/incoming/task
for i in abc_YYYYMMDD*.txt; do
    cat "$i" | scripttoprocess -args
    if [ $? -eq 0 ]; then
        mv "$i" /export/incoming/HIST/
    fi
done
scripttoprocess is an existing script; I am just calling it from my own script. My script runs continuously in a loop in the background. It simply searches for abc_YYYYMMDD_1.txt files in the /task directory, and if it detects such a file, it starts processing it. But I have noticed that my script starts processing the file well before it is fully written, and sometimes moves the file to HIST without fully processing it.
How can I handle this situation? I want to be fully sure that the file is completely written before I start processing it. Secondly, is there any way to take control of the files, like preparing a control file which contains a list of the files present in the /task directory? Then I could cat this control file and pick up the file names from inside it. Your guidance will be much appreciated.

I used
iwatch -e close_write -c "/usr/bin/pdflatex -interaction batchmode %f" document.tex
to run a command (LaTeX-to-PDF conversion) when a file (document.tex) is closed after being written to, which you could do as well.
However, there is a caveat: this was only meant to catch manual edits to the file, and failure was not critical. It therefore ignores the case where the file is opened and written again immediately after closing. Ask yourself whether that is good enough for you.
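Translated to the question's layout, a minimal sketch using inotifywait from inotify-tools (an assumed dependency; the paths and scripttoprocess come from the question) could look like this:
#!/bin/bash
# close_write fires only after a writer closes the file, so the file
# should be complete by the time it is read here.
inotifywait -m -e close_write --format %f /export/incoming/task |
while IFS= read -r name; do
    case "$name" in
    abc_*.txt)
        scripttoprocess -args < "/export/incoming/task/$name" &&
            mv "/export/incoming/task/$name" /export/incoming/HIST/
        ;;
    esac
done
The same caveat applies: if the producer re-opens and appends to a file after closing it, this still fires too early.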

I agree with @TenG: normally you shouldn't move a file until it is fully written. If you know for sure that the file is finished (like a file from yesterday), then you can move it safely; otherwise you can process it, but not move it. You can, for example, process part of it and remember the number of processed rows, so that you don't restart from scratch next time.
If you really, really want to work with files that are "in progress", tail -F sometimes works for this case, but then your bash script is an ongoing process as well, not a job, and you have to manage it.
You can also check whether a file is currently open (and thus unfinished) using lsof (see https://superuser.com/questions/97844/how-can-i-determine-what-process-has-a-file-open-in-linux).
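Folded into the question's loop, that check might look like this (note that lsof can be slow, and on a network file system it will not see writers on other hosts):
for f in /export/incoming/task/abc_*.txt; do
    if lsof "$f" > /dev/null 2>&1; then
        continue # some process still has the file open; retry on the next pass
    fi
    scripttoprocess -args < "$f" && mv "$f" /export/incoming/HIST/
done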

Change the process that extracts the IDs from the Oracle DB table.
You can use the mv approach that @TenG commented on, or put something special in the file that shows the work is done:
#!/bin/bash
source file_that_runs_sqlcommands_with_credentials
output=$(your_sql_function "select * from repairjobs")
# Something more for removing them from the table and check the number of deleted records
printf "%s\nFinished\n" "${output}" >> /task/abc_YYYYMMDD_1.txt
or
#!/bin/bash
source file_that_runs_sqlcommands_with_credentials
output=$(your_sql_function "select * from repairjobs union select 'EOF' from dual")
# Something more for removing them from the table and check the number of deleted records
printf "%s\n" "${output}" >> /task/abc_YYYYMMDD_1.txt

Related

Need suggestion to move a big live file in linux

Multiple scripts are running on my Linux server, generating huge amounts of data, and I realise they will eat all 500GB of my storage in the next 2-5 days, while the scripts need 10 more days to finish; that means they need more space. So most likely I am going to have a space problem and will have to restart the entire process.
The process is like this:
script1.sh content is like below:
"calling an api" > /tmp/output1.txt
script2.sh content is like below:
"calling an api" > /tmp/output2.txt
Executed like this:
nohup ./script1.sh & ### this creates the file /tmp/output1.txt
nohup ./script2.sh & ### this creates the file /tmp/output2.txt
My initial understanding was that the following steps would work:
While the scripts are running with nohup in the background, execute this command:
mv /tmp/output1.txt /tmp/output1.txt_bkp; touch /tmp/output1.txt
Then I would transfer /tmp/output1.txt_bkp to another server via FTP and remove it afterwards to free space on the server, while the script keeps writing to /tmp/output1.txt.
But this assumption was wrong: the script keeps writing to /tmp/output1.txt_bkp. I think the script writes based on the inode number, which is why it keeps writing to the old file.
Now the question is: how do I avoid the space issue without killing/restarting the scripts?
Essentially what you're trying to do is pull a file out from under a script that's actively writing into it. I'm not sure how nohup would let you do that.
May I suggest a different approach?
Why don't you move a fixed number of lines from /tmp/output[x].txt to /tmp/output[x].txt_bkp? You can do so without much trouble while your script is running and dumping stuff into /tmp/output[x].txt, and that way you can free up space by shrinking your output[x] files.
Try this as a test. Open 2 terminals (or use screen) to your Linux box. Make sure both are in the same directory. Run this command in one of your terminals:
for line in `seq 1 2000000`; do echo $line >> output1.txt; done
And then run this command in the other before the first one finishes:
head -1000 output1.txt > output1.txt_bkp && sed -i '1,+999d' output1.txt
Here is what's going to happen. The first command will start producing a file that looks like this:
1
2
3
...
2000000
The second command will chop off the first 1000 lines of output1.txt and put them into output1.txt_bkp and it will do so WHILE the file is being generated.
Afterwards, look inside output1.txt and output1.txt_bkp; you will see that the former looks like this:
1001
1002
1003
1004
...
2000000
While the latter will have the first 1000 lines. You can do the same exact thing with your logs.
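Wrapped into something you could leave running, a sketch might look like this (assuming GNU sed for the -i and '1,+Nd' address form, and subject to the caveats below):
#!/bin/bash
# Every 60 seconds, peel the oldest 100000 lines of each live output
# file into a timestamped backup, then delete those lines in place.
while sleep 60; do
    for f in /tmp/output1.txt /tmp/output2.txt; do
        # Skip files that do not yet hold a full chunk to rotate out.
        [ "$(wc -l < "$f")" -ge 100000 ] || continue
        bkp="${f}_$(date +%s).bkp"
        head -n 100000 "$f" > "$bkp" && sed -i '1,+99999d' "$f"
    done
done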
A word of caution: based on your description, your box is under heavy load from all that dumping, which may slow the steps outlined above. Note also that sed -i rewrites the file to a temporary copy and renames it over the original, changing the inode. This approach therefore only helps if your script re-opens the output file for each write, as the echo loop above does; a process holding a single open file descriptor will keep writing to the old, now-unlinked inode, exactly as with your mv attempt.

How to count the number of times a file is executed on Linux

I have an executable file and I would like to know how many times it is executed. The file is located on a network file system. Is there a way to do this with a script using one of the Linux utilities? The limitation is that I would like to avoid changing the file itself: for example, I will not add a counter file updated by the executable script, and I will not make the executable script call some API to increment a counter in e.g. a database.
I don't know exactly how to watch a file for execution, but you can construct something with inotify by watching how many times it is opened.
You could have a script like this:
#!/bin/bash
EXEC_CNT=0
FILE_TO_WATCH=/path/to/your/file
while inotifywait -e open "$FILE_TO_WATCH"
do
    ((EXEC_CNT++))
    echo "$FILE_TO_WATCH opened $EXEC_CNT times"
    # Or to store in a file:
    # echo "$FILE_TO_WATCH opened $EXEC_CNT times" >> "$FILE_TO_WATCH.log"
done
In the case of a network share, this script must be run on the computer that shares its file system.
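A variant using inotifywait's monitor mode (-m) avoids the small window in which an open can be missed while the loop re-arms the watch; the counting logic is the same as above:
#!/bin/bash
FILE_TO_WATCH=/path/to/your/file
EXEC_CNT=0
# -m keeps one long-lived watch instead of restarting inotifywait per event.
inotifywait -m -e open --format %w "$FILE_TO_WATCH" |
while IFS= read -r _; do
    ((EXEC_CNT++))
    echo "$FILE_TO_WATCH opened $EXEC_CNT times"
done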

In Linux, how can I print output for a text file once it's created?

I have a file called /home/myuser/tmp* that is briefly created, logs an output message and is then deleted. I need to see that output, but it's only there for a second at most (I'm working with an annoying open source program). Is there some command like "tail -f /home/myuser/tmp*" that can show me the contents of that file as soon as it's created?
Try opening another terminal and writing a loop that attempts to copy the file.
Start it right before the operation that causes the file to be created. Once the creation script is done, press CTRL-C to kill the loop in the other session and see whether it captured the saved file. You may have to try a couple of times, but it should capture the file at some point!
while :
do
    cp /home/myuser/tmpfile /home/myuser/tmpfile.sav 2>/dev/null
done
Maybe the process that creates the file just appends to it if it already exists. If so, and if you know what its name will be, create an empty file with that name and run tail -f on it in another terminal session, then run the program in the first terminal. Not in a loop, just a tail -f tmpfile.
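In the second terminal, that amounts to something like this (the file name is assumed):
# Pre-create the file so tail can attach before the program writes it;
# the program's appends will then show up live.
touch /home/myuser/tmpfile
tail -f /home/myuser/tmpfile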
If there is no other activity in /home/myuser, you could simply do:
inotifywait -e close /home/myuser && cat '/home/myuser/tmp*'
(Is the file name really tmp*, or are you asking about arbitrarily named files that begin with tmp? If the latter, this solution clearly will not work.)
Here, inotifywait will simply block until some file in /home/myuser is closed, and then cat the file. If you want to watch for multiple files, you might prefer something like:
inotifywait -m -e close_write --format %f ~myuser |
while read file; do cat ~myuser/$file; done
But note the standard warnings and caveats about paths containing whitespace.
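A slightly more whitespace-tolerant version of the same loop might be (still not bullet-proof, but it survives spaces in names):
inotifywait -m -e close_write --format %f ~myuser |
while IFS= read -r file; do
    cat ~myuser/"$file"
done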

Continuously writing output from Linux command to file

I want to write data obtained from a command (on Linux) to a file, but I don't want the file to be overwritten (which is what happens with the Perl scripts I've written thus far). I want the file to be updated every time my Perl script is executed (which I'm going to set up through crontab). The following script is what I have thus far, and it doesn't do what I'd like:
#!/usr/bin/perl
open FH, ">sysdata.csv";
print FH `ps -e | wc -l`;
print FH "\n";
close FH;
This script overwrites the file sysdata.csv every time it gets executed, so I need help on how to make the file simply get updated.
Thanks in advance.
Using >> instead of > appends to the file rather than overwriting it. In the script above, that means opening the file with open FH, ">>sysdata.csv"; instead of open FH, ">sysdata.csv";.
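Incidentally, the same distinction exists in the shell. If Perl is not a hard requirement, the whole job shrinks to a line you could run from crontab directly; a sketch, not a drop-in replacement for the script above:
# >> opens sysdata.csv in append mode; > would truncate it on each run.
ps -e | wc -l >> sysdata.csv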

How to get a filename in a bash script without the keyboard

I want to write a script for Linux that will first copy a movie/series file to cache with something like:
cat /filepath/filename > /dev/null
and then open the same file in VLC.
The problem is getting the file name and path into the script. I would like to simply double-click a file, or somehow make this a faster process than typing everything manually (especially because the file names of some series are inconsistent and hard to type, even with auto-complete).
This is useful for watching movies or series on a laptop/netbook, since it allows the disk to spin down.
You should be able to create your own 'program' in a bash script which takes its first argument to be the filename, using the convention "$1".
The bash script should look something like the one below. I tested it, storing the script in the file cachedvlc.sh. The quotes help to handle whitespace and weird characters...
#!/bin/bash
cat "$1" > /dev/null
vlc "$1"
...and will need to be made executable by changing its permissions through the file manager or running this in the terminal...
chmod u+x cachedvlc.sh
Then within your operating system, associate your bash script with the type of file you want to launch. For example on Ubuntu, you could add your script and call it 'Cached VLC' to the Menu using the 'Main Menu' application, then right-click on the file in Nautilus and choose 'Open with' to select your bash script.
After this, double-clicking or right-clicking on a file within your file manager should be good enough to launch a cached view. This assumes what you say about caching is in fact correct, which I can't easily check.
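Before wiring up the file association, the script can be tested straight from a terminal; the path and filename here are made up:
# The quotes let the script cope with spaces in series filenames.
./cachedvlc.sh "/media/series/Some Show - S01E01.mkv"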
