Copy modified content to new file in Linux

How can we write a shell script in Linux that copies newly added content from a file and appends it to another file?
I have a log file where errors are stored, and I am supposed to retrieve the new errors and store them in a database table. I will run a cron job that invokes the shell script at a certain interval.
Edited:
Sample Log
140530 13:48:57 [ERROR] Event Scheduler: [root#%][test.event] Table 'test.test_event' doesn't exist
140530 13:48:57 [Note] Event Scheduler: [root#%].[test.event] event execution failed.
140530 13:49:57 [ERROR] Event Scheduler: [root#%][test.event] Table 'test.test_event' doesn't exist
140530 13:49:57 [Note] Event Scheduler: [root#%].[test.event] event execution failed.
Initially I copied this into a file using cat, but more errors will be logged later, and only the newly added lines should be captured. How can I do this on a routine basis?
Kindly help! Thanks in advance!

Simplest case
You can use tail -f to keep retrieving data from a file whenever it is appended to, then use >> (appending redirect) to append it to your second file.
tail -f file1.txt >> file2.txt
will "watch" file1.txt and append new content to file2.txt.
To test that it works, open another terminal and do:
echo "Hello!" >> file1.txt
You should see "Hello!" appear in file2.txt.
Please note that this will only work if the underlying I/O operation on file1.txt was an actual append. It won't work if you open file1.txt in a text editor and change its content, for instance. It also won't work as a cron job, because it needs to run continuously.
With cron
To periodically check for appends, you could do a diff on an earlier version of the file you saved somewhere, then use sed to get only those lines that were appended in the meantime:
diff file1_old.txt file1_current.txt | \
sed -r -e '/^[^>]/ d' -e 's/^> //' >> file2.txt
But then you have to take care of storing the earlier versions somewhere etc. in your cron job as well.
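As a hedged sketch of that idea (the file names follow the example above and are placeholders you would adapt), a cron-driven script can keep its own snapshot of the log and append only the difference on every run:
#!/bin/sh
# Sketch: keep a snapshot of the log and append only the new lines each run.
LOG=file1_current.txt       # the live log (placeholder name)
SNAP=file1_old.txt          # copy saved by the previous run
OUT=file2.txt               # where the new lines are collected

# First run: start with an empty snapshot so every line counts as new.
[ -f "$SNAP" ] || : > "$SNAP"

# Append the lines added since the last run, then refresh the snapshot.
diff "$SNAP" "$LOG" | sed -n 's/^> //p' >> "$OUT"
cp "$LOG" "$SNAP"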

If you need to append (concatenate) one file to another, use the cat command:
cat file1.txt file2.txt > fileall.txt
But if you need to modify the contents of a file, I recommend using sed, or grep if what you need is a filter.
Sorry, your specification is a bit loose, so I cannot give you a more exact answer.
BTW. Database table? Can you please explain?
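If the goal is simply to pull the error lines out of a log like the sample above, a grep filter is a reasonable starting point; error.log and errors_only.txt are placeholder names here:
grep -F '[ERROR]' error.log >> errors_only.txt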

Related

Tail command not tailing the newly created files under a directory in linux

I am trying to tail all the log files present under a directory and its sub-directories recursively using the command below:
shopt -s globstar
tail -f -n +2 /app/mylogs/**/* | awk '/^==> / {a=substr($0, 5, length-8); next} {print a":"$0}'
and the output is below:
/app/mylogs/myapplog10062020.log:Hi this is first line
/app/mylogs/myapplog10062020.log:Hi this is second line
which is fine, but the problem is that when I add a new log file under the /app/mylogs/ directory after I fire the above tail command, tail will not take that new file into consideration.
Is there a way to get this done?
When you start the tail process, you pass it a (fixed) list of the files which tail is supposed to follow, as you can see from the tail man page. This is different from, say, find, where you can pass a file name pattern in its options. After the process has been started, tail has no way of knowing that you suddenly want it to follow another file too.
If you want a feature like this, you would have to build your own version of tail, which gets passed, for instance, a directory to scan, and either periodically checks the directory content for changes or uses a service such as inotify to be informed of directory changes.
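As a rough sketch of that approach (it assumes the inotify-tools package is installed, and is meant to run alongside the original tail command, picking up files created later):
# Watch the log directory for newly created files and start an extra
# tail for each one, prefixing every line with its file name.
inotifywait -m -r -e create --format '%w%f' /app/mylogs |
while read -r newfile; do
    tail -f -n +1 "$newfile" | awk -v f="$newfile" '{print f ":" $0}' &
done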

List files which have a corresponding "ready" file

I have a service "A" which generates some compressed files comprising of the data it receives in requests. In parallel there is another service "B" which consumes these compressed files.
The trick is "B" shouldn't consume any of the files unless they are written completely. The service deduces this information by looking for a ".ready" file created by service "A" with name exactly same as the file generated along with the extension mentioned; once the compression is done. Service "B" uses Apache Camel to do this filtering.
Now, I am writing a shell script which needs the same compressed files and this would need the same filtering be implemented in shell. I need help writing this script. I am aware of find command but a naive shell user, so have very limited knowledge.
Example:
Compressed file: sumit_20171118_1.gz
Corresponding ready file: sumit_20171118_1.gz.ready
Another compressed file: sumit_20171118_2.gz
No ready file is present for this one.
Of the above listed files only the first should be picked up as it has a corresponding ready file.
The most obvious way would be to use a busy loop. But if you are on GNU/Linux you can do better than that (from: https://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-dir-processor)
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
parallel -uj1 echo Do stuff to file {}
This way you do not even have to wait for the .ready file: The command will only be run when writing to the file is finished and the file is closed.
If, however, the .ready file is only written much later then you can search for that one:
inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
grep --line-buffered '\.ready$' |
parallel -uj1 echo Do stuff to file {.}
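If you would rather do a one-shot filtering pass in plain shell (for example from cron), a minimal sketch is to loop over the .gz files and keep only those with a .ready companion; the directory path is a placeholder:
for f in /path/to/incoming/*.gz; do
    [ -e "$f.ready" ] && printf '%s\n' "$f"
done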

Listing files while working with them - Shell Linux

I have a database server whose basic job is to import some specific files, do some calculations, and provide the data in a web interface.
A hardware replacement is planned for the coming weeks, and the database needs to be migrated to the new machine. But there is one problem: the current database is corrupted and shows errors in the web interface. This is due to the server freezing while importing/calculating, which is why it is being replaced.
So I'm not willing to just dump the db and restore it on the new server. It doesn't make sense to keep using the corrupted database, and dumping from the old server goes really slowly. I have a backup of all the files to be imported (currently 551) and I'm working on a script to "re-import" all of them and have a nice database again.
The current server takes ~20 minutes to import each new file. Let's say the new server takes 10 minutes per file thanks to its extra power... It's still a long time! And here comes the problem: it receives a new file hourly, so there will be more files by the time it finishes the job.
Restore script start like this:
for a in $(ls $BACKUP_DIR | grep part_of_filename); do
The question is: will this "ls" pick up the new file names as they arrive? File names are timestamp based, so they will be at the end of the list.
Or is this "ls" executed once, with the results stored in a temporary variable?
Thanks.
ls will execute once, at the beginning, and any new files won't show up.
You can rewrite that statement to list the files again at the start of each loop (and, as Trey mentioned, better to use find, not ls):
while all=$(find $BACKUP_DIR/* -type f | grep part_of_filename); do
for a in $all; do
But this has a major problem: it will repeatedly process the same files over and over again.
The script needs to record which files are done. Then it can list the directory again and process any (and only) new files. Here's one way:
touch ~/done.list
cd $BACKUP_DIR
# loop while f=first file not in done list:
# find: list the files; more portable and safer than ls in pipes and scripts
# fgrep -v -f ~/done.list: pass through only files not in the done list
# head -n1: pass through only the first one
# grep .: control the loop (true iff there is something)
while f=`find * -type f | fgrep -v -f ~/done.list | head -n1 | grep .`; do
<process file $f>
echo "$f" >> ~/done.list
done
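One possible refinement, offered only as a variant: fgrep matches substrings, so a file whose name happens to contain an already-finished name would also be skipped. grep -Fx restricts the match to whole lines:
touch ~/done.list
cd "$BACKUP_DIR"
while f=$(find * -type f | grep -Fxv -f ~/done.list | head -n1 | grep .); do
    echo "processing $f"      # placeholder for the real import step
    echo "$f" >> ~/done.list
done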

Redirecting the cat output of a file to the same file

In a particular directory, I made a file named "fileName" and added content to it. When I typed cat fileName, its contents were printed on the terminal. Then I used the following command:
cat fileName>fileName
No error was shown. Now when I try to see the contents of the file using
cat fileName
nothing is shown in the terminal and the file is empty (I checked it). What is the reason for this?
>, i.e. redirection to the same file, will create/truncate the file before the cat command is invoked, because the shell sets up the redirection first. You could avoid this by writing to an intermediate file and then copying from the intermediate file to the actual one, or you could use tee like:
cat fileName | tee fileName
To clarify SMA's answer: the file is truncated because redirection is handled by the shell, which opens the file for writing before invoking the command. When you run cat file > file, the shell truncates and opens the file for writing, sets stdout to the file, and then executes ["cat", "file"]. So you will have to use some other command for the task, like tee.
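A quick way to see this ordering for yourself (demo.txt is just a throwaway name):
printf 'some data\n' > demo.txt
wc -c demo.txt > demo.txt    # the shell truncates demo.txt before wc runs
cat demo.txt                 # prints "0 demo.txt"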
The answers given here are wrong. You will have a problem with truncation regardless of whether you use the redirect or the pipeline, although it may APPEAR to work sometimes, depending on the size of the file or the length of your pipeline. It is a race condition: the reader may get a chance to read some or all of the file before the writer starts, but the point of a pipeline is to run all of these commands at the same time, so they will be starting at the same time, and the first thing the tee executable will do is open its output file (truncating it in the process). The only way you will not have a problem in this scenario is if the end of the pipeline loads the entirety of the output into memory and only writes it to the file on shutdown. That is unlikely to happen and defeats the point of having a pipeline.
The proper solution for making this reliable is to write to a temp file and then rename the temp file back to the original file name:
TMP="$(mktemp fileName.XXXXXXXX)"
cat fileName | grep something | tee "${TMP}"
mv "${TMP}" fileName

What is the shell command which can keep a list of the sizes of the current directory, which is used for downloading?

watch -n 3 du -sh >> log
This command updates the value every 3 seconds, but only the latest size of the current directory is stored in the file log; the old values are simply overwritten. How can I preserve the old values and store them in the file named log?
watch does not overwrite the file. In fact, it is not possible to overwrite a file in the middle of a redirection.
What happens is that watch only saves the differences between successive screens (using ANSI codes). It was not designed for logging (which is why it is called "watch").
Use xxd to see the real content of the log file.
Perhaps this might do more of what you want:
while sleep 3
do
du -sh
done >> log &
tail -F log
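If you also want to know when each size was recorded, a timestamp can be added to every sample; this is just a sketch of the same loop with an arbitrary date format:
while sleep 3
do
    printf '%s  ' "$(date '+%F %T')"
    du -sh
done >> log &
tail -F log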
