Reduce Size of .forever Log Files Without Disrupting forever Process - node.js

The log files (in /root/.forever) created by forever have reached a large size and is almost filling up the hard disk.
If the log file were to be deleted while the forever process is still running, forever logs 0 will return undefined. The only way for logging of the current forever process to resume is to stop it and start the node script again.
Is there a way to just trim the log file without disrupting logging or the forever process?

So Foreverjs will continue to write to the same file handle and ideally would support something that allows you to send it a signal and rotate to a different file.
Without that, which requires code change on the Forever.js package, your options look like:
A command line version:
Make a backup
Null out the file
cp forever-guid.log backup && :> forever-guid.log;
This has the slight risk of if your writing to the log file at a speedy pace, that you'll end up writing a log line between the backup and the nulling, resulting in the loss of the log line.
Use Logrotate w/copytruncate
You can set up logrotate to watch the forever log directory to copy and truncate automatically based on filesize or time.
Have your node code handle this
You can have your logging code look at how many lines the log file is and then doing the copy truncate - this would allow you to avoid the potential data loss.
EDIT: I had originally thought that split and truncate could do the job. They probably can but an implementation would look really awkward. Split doesn't have a good way to splitting the file into a short one (the original log) and a long one (the backup). Truncate (which in addition to the fact that it's not always installed) doesn't reset the write pointer, so forever just writes the same byte as it would have, resulting in strange data.

You can truncate the log file without losing its handle (reference).
cat /dev/null > largefile.txt

Related

Is it dangerous to pipe data from S3 to a long-running process

I have a 30GB file and I want to feed it into a program that accepts data via stdin and that will take 24hr to process the data.
Can I just do aws s3 cp s3://bigfile.txt - | long_process.sh?
I'm attracted to this b/c I don't have to store bigfile.txt on disk and I can start working on the data stream immediately, but I worry that if the s3 command has a problem during the 24 hours that it will crash and I will lose all the progress.
EDIT:
Alternatively, I am looking for a way to stage the data to a file and read from the staged data. For example: aws s3 cp s3://bigfile.txt /local/bigfile.txt &; long_process.sh < /local/bigfile.txt. The trouble is that long_process.sh complains of a truncated file; It does not seem to wait for further input like a pipe would. I have thought about tee and mkfifo but nothing seems to quite fit my needs.
Yes, it’s dangerous because the processing might crash during the time and you could lose everything

Windows/Python check if file is open or in use

I am using python to monitor a folder and check if files are being copied in and if so, replicate those to a new location.
I am using the following to monitor the folder:
fsmonitor
The issue I am facing is that I am unable to discern if the file is in use and currently in the process of writing the contents onto disk. if so I want to wait till copying is complete and then start copying it to my new location.
So how do I find out if a file is in use/open?
I have seen some suggestions here where I try to write to the file question and if it fails then it indicates that the file is in use:
example answer (I've seen similar in python)
But I am reluctant to use such a method due to the fear that it might cause corruption and such issues.
Is there an alternative/safer way to do this? Or is testing write permissions safe?
Is anyone familiar with pywin32? Does it provide such tools? The site looks arcane, so wonder if it has the latest API provided by windows, even fsmointor mentioned above uses the same library and I wonder if there are newer/more efficient ways to do this.
Currently, I am using psutil, proc.open_files() to loop through all processes and all files to list out open files. if files that I am concerned about appear on this list I wait and try again. However, this process creates a humongous list of files and uses 12% of my CPU to create it, so I desperately need an alternative.
In response to Adrian McCarthy
I starting out assuming that it is safe to action whatever fsmonitor puts out, but if you see the following output which si for a single file copy:
0 86 0
create C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe 3684bf38
create C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe 3684bf38
0 86 0
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe a8cf3250
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe a8cf3250
0 160 0
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe caef5c64
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64.exe caef5c64
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64 - Copy.exe caef5c64
modify C:\Users\ScanUser\Pictures\syncTest dotnet-sdk-5.0.203-win-x64.exe caef5c64
So the conundrum is at which 'modify' do I start copying the file? I can wait a few minutes/seconds to see if another 'modified' appeared for that file but how do I decide the time to wait for a large file over SFTP may take 30 minutes, so I need something scalable.
Also, I would like not the make multiple copy actions for a file since that will make the script inefficient.
This can help you
check if a file is open in Python
here is a code:
try: # try to open the file
with open("file", "r") as file:
# some code here
except IOError:
# if it throws an error that means it is in use
I think you're unnecessarily concerned about working with the file while another process still has it open.
On Windows. fsmonitor using the ReadDirectoryChangesW mechanism. That means you'll get a notification about a change after it happens. So if a process writes to foo.log, you'll get a notification after the write operation is completed. (In fact, I think it's after the update of the directory metadata.)
To copy the file, you need read access. So just go ahead and open it for reading.
If it opens, then it's safe to read, even if another process has it open. You cannot corrupt a file by reading it even if another process is writing to it.
If it fails to open, then another process has it open and is intentionally preventing other processes from reading it (probably because they know they'll be actively updating it). In that case, you can try again later.
Trying to first check whether another process is using the file doesn't actually help because the answer could change between the moment you check and the moment you try to act on that information.
When you open a file, the system does the permission check and the opening under a mutex*, so the answer cannot change in between. There's no way for you to simulate that yourself from user-mode code. Once you have the file open, you can safely use it.
If you try to read from a file at the same moment another process tries to write to it, the system will ensure that the read will get the data as it was before the write or as it is after the write. It won't get a result that's a mixture of old and new.
That said, if you're reading the file with a bunch of small read operations while another process is writing to the file with a bunch of small write operations, it's possible you might capture some intermediate state of the file. But that's okay. The original file is unharmed, and those writes will trigger another fsmonitor notification, so you're code will start over and try to make another copy of the file.
* I'm using "mutex" in a generic sense: It uses some sort of synchronization mechanism, but it might not necessarily be a Windows Mutex object.

Better way to output to both console and output file than tee?

What I need to display is a log refreshing periodically. It's a block of about 10 lines of text. I'm using |tee and it works right now. However, the performance is less satisfying. It waits a while and then outputs several blocks of texts from multiple refreshes (especially when the program just starts, it takes quite a while to start displaying anything on the console and the first time I saw this, I thought the program was hanging). In addition, it breaks randomly in the middle of the last block, so it's quite ugly to present.
Is there a way to improve this? (Maybe output less each time and switch between output file and console more frequently?)
Solved by flushing stdout after printing each block. Credit to Kenneth L!
https://superuser.com/questions/889019/bash-better-way-to-output-to-both-console-and-output-file-than-tee
Assuming you can monitor the log as a file directly [update: turned out not to be the case]:
The usual way of monitoring a [log] file for new lines is to use tail -f, which - from what I can tell - prints new data added to the log file as it is being added, without buffering.
Similarly, tee passes data it receives via stdin on without buffering.
Thus, you should be able to combine the two:
tail -f logFile | tee newLogEntriesFile

Time taken by `less` command to show output

I have a script that produces a lot of output. The script pauses for a few seconds at point T.
Now I am using the less command to analyze the output of the script.
So I execute ./script | less. I leave it running for sufficient time so that the script would have finished executing.
Now I go through the output of the less command by pressing Pg Down key. Surprisingly while scrolling at the point T of the output I notice the pause of few seconds again.
The script does not expect any input and would have definitely completed by the time I start analyzing the output of less.
Can someone explain how the pause of few seconds is noticable in the output of less when the script would have finished executing?
Your script is communicating with less via a pipe. Pipe is an in-memory stream of bytes that connects two endpoints: your script and the less program, the former writing output to it, the latter reading from it.
As pipes are in-memory, it would be not pleasant if they grew arbitrarily large. So, by default, there's a limit of data that can be inside the pipe (written, but not yet read) at any given moment. By default it's 64k on Linux. If the pipe is full, and your script tries to write to it, the write blocks. So your script isn't actually working, it stopped at some point when doing a write() call.
How to overcome this? Adjusting defaults is a bad option; what is used instead is allocating a buffer in the reader, so that it reads into the buffer, freeing the pipe and thus letting the writing program work, but shows to you (or handles) only a part of the output. less has such a buffer, and, by default, expands it automatically, However, it doesn't fill it in the background, it only fills it as you read the input.
So what would solve your problem is reading the file until the end (like you would normally press G), and then going back to the beginning (like you would normally press g). The thing is that you may specify these commands via command line like this:
./script | less +Gg
You should note, however, that you will have to wait until the whole script's output loads into memory, so you won't be able to view it at once. less is insufficiently sophisticated for that. But if that's what you really need (browsing the beginning of the output while the ./script is still computing its end), you might want to use a temporary file:
./script >x & less x ; rm x
The pipe is full at the OS level, so script blocks until less consumes some of it.
Flow control. Your script is effectively being paused while less is paging.
If you want to make sure that your command completes before you use less interactively, invoke less as less +G and it will read to the end of the input, you can then return to the start by typing 1G into less.
For some background information there's also a nice article by Alexander Sandler called "How less processes its input"!
http://www.alexonlinux.com/how-less-processes-its-input
Can I externally enforce line buffering on the script?
Is there an off the shelf pseudo tty utility I could use?
You may try to use the script command to turn on line-buffering output mode.
script -q /dev/null ./script | less # FreeBSD, Mac OS X
script -c "./script" /dev/null | less # Linux
For more alternatives in this respect please see: Turn off buffering in pipe.

Syncing two files when one is still being written to

I have an application (video stream capture) which constantly writes its data to a single file. Application typically runs for several hours, creating ~1 gigabyte file. Soon (in a matter of several seconds) after it quits, I'd like to have 2 copies of file it was writing - let's say, one in /mnt/disk1, another in /mnt/disk2 (the latter is an USB flash drive with FAT32 filesystem).
I don't really like an idea of modifying the application to write 2 copies simulatenously, so I though of:
Application starts and begins to write the file (let's call it /mnt/disk1/file.mkv)
Some utility starts, copies what's already there in /mnt/disk1/file.mkv to /mnt/disk2/file.mkv
After getting initial sync state, it continues to follow a written file in a manner like tail -f does, copying everything it gets from /mnt/disk1/file.mkv to /mnt/disk2/file.mkv
Several hours pass
Application quits, we stop our syncing utility
Afterwards, we run a quick rsync /mnt/disk1/file.mkv /mnt/disk2/file.mkv just to make sure they're the same. In case if they're the same, it should just run a quick check and quit fairly soon.
What is the best approach for syncing 2 files, preferably using simple Linux shell-available utilities? May be I could use some clever trick with FUSE / md device / tee / tail -f?
Solution
The best possible solution for my case seems to be
mencoder ... -o >(
tee /mnt/disk1/file.mkv |
tee /mnt/disk2/file.mkv |
mplayer -
)
This one uses bash/zsh-specific magic named "process substitution" thus eliminating the need to make named pipes manually using mkfifo, and displays what's being encoded, as a bonus :)
Hmmm... the file is not usable while it's being written, so why don't you "trick" your program into writing through a pipe/fifo and use a 2nd, very simple program, to create 2 copies?
This way, you have your two copies as soon as the original process ends.
Read the manual page on tee(1).

Resources