Tailer following changed file - python-3.x

I've been using tailer to tail a log file for a program.
I've been running into an issue: the program whose log I'm reading creates a new log file with the same name when it restarts, and tailer will not follow that new file when this happens. The tailing runs inside a thread that shares memory with several other parts of the program, including code that was not started through threading. Because tailer keeps that thread busy in its follow loop, I can't simply join the thread: it is still executing code and so it's stuck. Is there a way around this (without using multiprocessing and killing it through that)?
import tailer

for line in tailer.follow(open("mytestfile.log", encoding='utf-8')):
    # do some stuff with the line
    print(line)
That is an example of the follow call I'm using. Any recommendations for getting around this?
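For reference, and not taken from this thread, the "new file with the same name" case is often handled by comparing inode numbers and reopening the file when they change. A minimal sketch of that idea (the polling interval is arbitrary, and it bypasses tailer's own loop):

import os
import time

path = "mytestfile.log"
f = open(path, encoding="utf-8")
inode = os.fstat(f.fileno()).st_ino
while True:
    line = f.readline()
    if line:
        print(line, end="")        # do some stuff with the line
        continue
    try:
        if os.stat(path).st_ino != inode:   # the file was recreated under the same name
            f.close()
            f = open(path, encoding="utf-8")
            inode = os.fstat(f.fileno()).st_ino
            continue
    except FileNotFoundError:
        pass
    time.sleep(1.0)                # no new data yet; poll again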

Related

Python multiprocessing deadlock when calling logger - issue6721

I have code running in Python 3.7.4 which forks off multiple processes. I believe I'm hitting a known issue (issue6721: https://github.com/python/cpython/issues/50970). I set up the child process to send "progress report" messages through a pipe to the parent process, and noticed that sometimes a log statement doesn't get printed and the code gets stuck in a deadlock.
After reading issue6721, I still don't understand why the parent might hold the logger handler's lock after a log statement has finished executing (i.e. the line that logs has run and execution has moved on to the next line of code). I fully get that, in the context of C++, the compiler might rearrange instructions, but I don't fully understand this in the context of Python. In C++ I can use barrier instructions to stop the compiler from moving instructions past a point. Is there something similar that can be done in Python to avoid having a held lock copied into the child process?
I have seen solutions using "atfork", which is a library that appears to be unmaintained (so I can't really use it).
Does anyone know a reliable and standard solution to this problem?
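One commonly suggested workaround (my addition, not something confirmed in this thread) is to sidestep fork entirely and use the spawn start method, so the child begins with a fresh interpreter and fresh logging locks. A minimal sketch, with a hypothetical progress message:

import multiprocessing as mp

def child(conn):
    conn.send("progress report")   # hypothetical progress message
    conn.close()

if __name__ == "__main__":
    ctx = mp.get_context("spawn")                 # children start fresh instead of inheriting forked state
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=child, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    p.join()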

Does Python 3 multiprocessing freeze_support() set the start method to spawn?

Recently I've encountered freezing in my application during long runs.
My program uses an infinite while loop to constantly check for new jobs in a Redis db, and if there is a job to work on it spawns a new process to run it in the background.
It kept freezing after 20 minutes, sometimes 10. It took me a week to figure out that the problem came from the lack of this line before my while loop:
multiprocessing.set_start_method('spawn')
It looks like Python does not do that on Windows, and since Windows does not support fork it gets stuck.
Anyway, it seems this will solve my problem, but I have another question.
In order to build an .exe for this program with something like PyInstaller, I need to add another line, shown below, to make sure it doesn't freeze when the .exe runs:
multiprocessing.freeze_support()
I want to know: does freeze_support() automatically set the start method to 'spawn' too? Should I use both of these lines, or is running just one of them enough? If so, which one should I use from now on?
On Windows, spawn is already the default start method, so running the set_start_method('spawn') line is not necessary.
freeze_support() is a different thing and does not affect the choice of start method. You must use it in this scenario to generate an .exe.
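To make that concrete, a minimal sketch of the usual pattern for a script that will be frozen into an .exe (function and job names are illustrative):

import multiprocessing

def run_job(job_id):
    print(f"working on job {job_id}")   # placeholder for the real background work

if __name__ == "__main__":
    multiprocessing.freeze_support()            # only has an effect when frozen into a Windows .exe
    multiprocessing.set_start_method("spawn")   # explicit, though spawn is already the default on Windows
    p = multiprocessing.Process(target=run_job, args=(42,))
    p.start()
    p.join()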

Replace a process's bin file while it is running

I have a server program (compiled with g++) which is running. I changed some code and compiled a new binary. Without killing the running process, I mv'd the newly created binary over the old one.
After a while, the server process crashed. Does this relate to my replacement?
My server is a multi-threaded, highly concurrent server. One crash was a segfault, the other a deadlock.
I printed all the parameters in the core dump file and passed exactly the same values to the function that crashed, but it ran fine.
I also carefully examined all the thread info in the deadlock core dump, and I cannot see how it could deadlock.
So I suspect the replacement caused the strange behaviour.
According to this question, if the swap happens it can indeed cause strange things.
For a simple standard program, even if the binary is currently open in the running process, moving a new file onto its name will just unlink the original, which otherwise remains untouched: the running process keeps executing the old, now-unlinked copy.
But for long-running servers, many things can happen: some fork new processes, and occasionally some even exec a fresh version of themselves. In that case you could end up with different versions running side by side, which may or may not be supported depending on the change.
Said differently, without more information on what the server program is, how it is designed to run, and what the change was, the only answer I can give is maybe.
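The unlink behaviour described above is easy to demonstrate; a small sketch with made-up file names (on POSIX, os.replace behaves like mv onto an existing name):

import os

with open("old.bin", "w") as f:
    f.write("old contents")
with open("new.bin", "w") as f:
    f.write("new contents")

held = open("old.bin")            # stands in for the running process that has the binary open
os.replace("new.bin", "old.bin")  # like "mv new.bin old.bin": unlinks the old name, the open inode survives
print(held.read())                # still prints "old contents" through the original descriptor
print(open("old.bin").read())     # a fresh open sees "new contents"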
If you can make sure that you remove ONLY the bin file, and the bin file isn't used by any other process (such as some daemon), then the crash is not related to your replace action.

Node.js library to start tailing a file from a specific line or line number?

I am searching for a library/module that starts tailing a file from a specific line number, so that even if my server dies or restarts it can resume from the last line read. I am a little new to Node.js.
I am unaware of such a module, but all you actually need is to use some npm library for file read access and then persist the position of the last line read, so you can resume after a Node.js restart. Or, even better, fork a child process and have that process read the data; if it hits an error, the child process terminates with a message to the main Node.js process reporting the last line read. I hope that helps!
This is how I tackled the problem.
I maintained two log files, one of which stores the lines already read. Even if my server dies while reading the main log file, I can skip all the lines I have already processed by comparing the timestamp of the last line in my own log with the timestamps in the main log, skipping main-log lines until my log's timestamp is greater than or equal to the main log's. Once the main log's timestamp is greater, I start reading from that point again.
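The bookkeeping behind both answers (persist how far you got, skip to that point on restart) is language-agnostic; purely as an illustration, and with made-up file names, it could look like this in Python, using a byte offset instead of a line number:

import json

LOG_PATH = "main.log"             # the file being tailed (made-up name)
STATE_PATH = "tail_state.json"    # where the last read position is persisted (made-up name)

def load_offset():
    try:
        with open(STATE_PATH) as f:
            return json.load(f)["offset"]
    except (FileNotFoundError, ValueError, KeyError):
        return 0                  # nothing persisted yet: start from the beginning

def read_new_lines():
    with open(LOG_PATH, encoding="utf-8") as log:
        log.seek(load_offset())
        for line in log:
            print(line, end="")   # process the line
        with open(STATE_PATH, "w") as state:
            json.dump({"offset": log.tell()}, state)   # remember where we stopped

read_new_lines()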

Cross-process locking in Linux

I want to make an application in Linux where only one instance can run at a time. I want to make it robust, so that if an instance crashes it won't block all other instances indefinitely. I would really appreciate some example code on how to do this (there's lots of discussion on this topic on the web, but I couldn't find anything that worked when I tried it).
You can use the file locking facilities that Linux provides. You haven't specified the language; however, you will find this capability in some form pretty much everywhere.
Here is a simple idea of how to do that in a C program. When the program starts, take an exclusive non-blocking lock on the whole file using the fcntl system call. When another instance of the application attempts to start, it will get an error trying to lock the file, which means the application is already running.
Here is a small example of how to take a full file lock using fcntl (the call supports byte-range locks, but when the length is 0 the whole file is locked).
int fd = open("/tmp/myapp.lock", O_RDWR | O_CREAT, 0666); /* lock file path is illustrative */
struct flock lock_struct;
memset(&lock_struct, 0, sizeof(lock_struct));
lock_struct.l_type = F_WRLCK;               /* exclusive (write) lock */
lock_struct.l_whence = SEEK_SET;
lock_struct.l_len = 0;                      /* length 0 locks the whole file */
lock_struct.l_pid = getpid();
int ret = fcntl(fd, F_SETLK, &lock_struct); /* fails if another instance already holds the lock */
Please note that you need to open a file before you can lock it, which means you need to have a file around to use for locking. It might be useful to put it somewhere where it won't cause any distraction or confusion for other applications.
When the process terminates, all locks it has taken are released, so nothing stays blocked.
This is just one idea; I'm pretty sure there are other ways to do it.
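Since this facility exists in pretty much every language, here is the same idea sketched in Python with its fcntl module (the lock file path is made up):

import fcntl
import sys

lock_file = open("/tmp/myapp.lock", "w")    # any agreed-upon path works; this one is made up
try:
    fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)   # exclusive, non-blocking lock
except OSError:
    sys.exit("another instance is already running")
# ... run the application; the lock is released automatically when the process exits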
The conventional UNIX way of doing this is with PID files.
Before a process starts, it checks whether a predetermined file, usually /var/run/<process_name>.pid, exists. If it does, that's an indication that a process is already running, and this process quits.
If the file does not exist, this is the first process to run. It creates /var/run/<process_name>.pid, writes its PID into it, and unlinks the file on exit.
Update:
To handle the case where a daemon has crashed and left the pid file behind, additional checks can be made during startup if a pid file is found:
Do a ps and ensure that a process with that PID doesn't exist.
If it does exist, ensure that it's a different process, either from the ps output or from /proc/$PID/stat.
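A rough sketch of those startup checks in Python (the pid file name is made up, and the /proc check is Linux-specific):

import atexit
import os
import sys

pidfile = "/var/run/myapp.pid"    # made-up name; follows the /var/run/<process_name>.pid convention

if os.path.exists(pidfile):
    with open(pidfile) as f:
        old_pid = f.read().strip()
    if old_pid and os.path.exists("/proc/" + old_pid):   # a live process still owns the pid file
        # a stricter check could also compare the process name in /proc/<pid>/stat
        sys.exit("already running as pid " + old_pid)

with open(pidfile, "w") as f:                            # pid file missing or stale: claim it
    f.write(str(os.getpid()))
atexit.register(os.unlink, pidfile)                      # unlink the pid file on normal exit

# ... run the single instance here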
