YAML file one line filled with null characters, #0000 character not supported while reading - python-3.x

I've built a Python-based application (which runs 24/7) that logs some information to a YAML file every few minutes. It was working perfectly for a few days. Then, after approximately two weeks, one line in the YAML file was filled with NUL characters (416 NUL characters, to be precise).
The suspicion is that someone might have tried to open the already running application again, so both instances tried to write to the same YAML file, which could have caused this. But I couldn't replicate it.
Just wanted to know the cause of this issue.
Please let me know if anyone has faced the same issue before.
Some context about the file writing:
The YAML file is opened in append mode and a list is written to it using the code below:

import yaml

with open(file_path, 'a') as file:
    yaml.dump(summary_list, file)

Concurrent access is a possible cause for this, especially since you're appending. For example, it may be that both instances opened the file and recorded the same starting offset, but the file grew to the sum of both appended data dumps. That would leave part of the file unwritten, which might explain the NULs.
What exactly happened depends more on your OS and your filesystem than on YAML, and even if we knew those details we couldn't tell for sure.
I recommend using a proper logging framework to avoid such issues; you can dump the YAML to a string and log that.
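A minimal sketch of that idea, using Python's standard logging module (the file name and the log_summary function are made-up examples, not anything from your application):

import logging
import yaml

# One process-wide logger writing to a single file. logging serializes writes
# made within one process; it does not by itself protect against two separate
# instances of the application appending to the same file at the same time.
logging.basicConfig(filename='summary.log', level=logging.INFO,
                    format='%(asctime)s %(message)s')

def log_summary(summary_list):
    # Serialize the list to a YAML string and emit it as a single log record.
    logging.info(yaml.safe_dump(summary_list))

A rotating file handler can be added on top of this if the file would otherwise grow without bound.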

Related

Check if file has been changed and which line in file

I am looking for a solution that helps me track any changes made to files. I am working on a Linux system where a lot of people have access to the same files. Sometimes someone changes something in a file and doesn't notify the other users. So I would like to write a script that checks whether a given file path (or set of files) has been changed, and if so, writes something like "File changed %date, line XXX" to a file such as "controlfile_File1.txt". I know that I can use an md5 checksum for that, but that only tells me whether the file changed; I would also like to know which line changed. I am also thinking about making a copy of the file somewhere and diffing the copied file against the current one.
Any ideas?
Thanks for the support.
Your question goes beyond what Linux offers as a platform: Linux can show you the last modification date of a file, and the last time a file was accessed (even without modification), but that's it.
What you are looking for, as already mentioned in the comments, is a version control system. As mentioned, Git is indeed one of them, but there are also others (SourceTree, SourceSafe, ClearCase, ...), each of them having their (dis)advantages.
One thing they all have in common is that modifying such a file no longer happens quite so casually: every time somebody modifies a file under version control, they are asked to describe why they did it, and that is recorded for later reference.
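If a full version control system really is not an option, the copy-and-diff idea from the question can be sketched roughly like this (a hypothetical Python sketch; the file names, snapshot location and timestamp format are just examples):

import time
from pathlib import Path

WATCHED = Path('File1.txt')              # file to monitor (example name)
SNAPSHOT = Path('.File1.snapshot')       # private baseline copy
CONTROL = Path('controlfile_File1.txt')  # change log, as described in the question

def changed_lines(old, new):
    # 1-based numbers of lines that differ, plus lines added or removed at the end.
    diffs = [i + 1 for i, (a, b) in enumerate(zip(old, new)) if a != b]
    diffs += range(min(len(old), len(new)) + 1, max(len(old), len(new)) + 1)
    return diffs

def check_once():
    current = WATCHED.read_text().splitlines()
    previous = SNAPSHOT.read_text().splitlines() if SNAPSHOT.exists() else []
    diffs = changed_lines(previous, current)
    if diffs:
        with CONTROL.open('a') as ctrl:
            for lineno in diffs:
                ctrl.write("File changed %s, line %d\n"
                           % (time.strftime('%Y-%m-%d %H:%M:%S'), lineno))
    SNAPSHOT.write_text('\n'.join(current) + '\n')   # refresh the baseline

Note that this only tells you that a line changed, not who changed it or why, which is another reason a real version control system is the better answer.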

source code files with weird line endings

I have noticed that some .Net code committed by a new contracting team has strange line endings. When I do a hex dump of the files I see that each line ends with 2 carriage returns (0d) and 1 line feed (0a).
When viewed in Visual Studio it looks like every code line has an empty line after it, which looks very odd.
What can cause this? Is it some strange IDE? Could it be caused by Perforce? (I got the code out by syncing a Perforce workspace.)
The only time I've ever seen non-standard line endings before is when people copy/paste code from a web page, email, or chat window. Could that be the cause?
If they submitted Windows-style (CRLF) line endings but used the unix (LF) LineEnd setting in their client workspaces, then the files would have an extra CR as part of each line, and a Windows machine would sync them down as CRCRLF. That's the most likely explanation for what you're seeing.
The ideal way to fix this is just for everyone to use a LineEnd that matches their environment (usually the default of local works just fine for this), but if someone needs to use a mix of tools/platforms within a single workspace, switching to the share LineEnd option will force everything to be normalized on submit by stripping all the CRs. (This also makes it impossible to submit text files with actual CR characters, but that's usually not a big deal -- for files where you don't want any sort of transformation to occur, use the binary filetype.)
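If you want to confirm what the contributed files actually contain before touching anyone's client settings, counting the raw end-of-line byte sequences is enough. A small illustrative sketch (the file name is just a placeholder):

from pathlib import Path

def count_line_endings(path):
    # Count end-of-line byte sequences to spot CRCRLF-damaged files.
    data = Path(path).read_bytes()
    crcrlf = data.count(b'\r\r\n')
    crlf = data.count(b'\r\n') - crcrlf           # CRLFs not part of a CRCRLF
    lf = data.count(b'\n') - data.count(b'\r\n')  # bare LFs
    return {'CRCRLF': crcrlf, 'CRLF': crlf, 'LF': lf}

print(count_line_endings('Program.cs'))  # a healthy Windows-style file should show only CRLF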

Why do I get no error when running the same Python script on multiple terminals at the same time?

I know from experience that if I try to open the same file in Vim in multiple terminals at the same time, I get an error. (Maybe because of temporary files?)
And I know from experience that if I open a text file in Python and read through it, I have to reset the pointer when I'm done.
But I've found that if I run the same Python script in multiple terminals at the same time, I don't get any error; it just successfully runs the script in both. How does this work? Doesn't Python need to read my script from the beginning in order to run it? Is the script copied to a temporary file, or something?
I know from experience that if I try to open the same file in Vim in multiple terminals at the same time, I get an error.
That's not actually true. Vim actually will let you open the same file in multiple terminals at the same time; it's just that it gives you a warning first to let you know that this is happening, so you can abort before you make changes. (It's not safe to modify the file concurrently in two different instances of Vim, because the two instances won't coordinate at all.)
Furthermore, Vim will only give you this warning if you try to open the same file for editing in multiple terminals at the same time. It won't complain if you're just opening the file for reading (using the -R flag).
And I know from experience that if I open a text file in Python and read through it, I have to reset the pointer when I'm done.
That's not exactly true, either. If you make multiple separate calls to open, you'll have multiple separate file objects, and each separately maintains its position in the file. So something like
with open('filename.txt', 'r') as first:
    with open('filename.txt', 'r') as second:
        print(first.read())
        print(second.read())
will print the complete contents of filename.txt twice.
The only reason you'd need to reset the position when you're done reading a file is if you want to use the same file object to read the file again, or if you've opened the file in read/write mode (r+ rather than r) and you now want to switch from reading to writing.
But I've found that if I run the same Python script in multiple terminals at the same time, I don't get any error; it just successfully runs the script in both. How does this work? Doesn't Python need to read my script from the beginning in order to run it? Is the script copied to a temporary file, or something?
As I think should now be clear — there's no problem here. There's no reason that two instances of Python can't both read the same script file at the same time. Linux allows that. (And in fact, if you delete the file, Linux will keep the file on disk until all programs that had it open have either closed it or exited.)
In fact, there's also no reason that two processes can't write to the same file at the same time, though here you have to be very careful to avoid the processes causing problems for each other or corrupting the file.
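For instance, on Linux one common way to make cooperative concurrent appends reasonably safe is an advisory lock around each write. A minimal sketch, assuming every writer uses the same locking protocol (the function and its arguments are made up for illustration):

import fcntl

def append_line(path, line):
    # Advisory locking only helps if *all* writers take the same lock.
    with open(path, 'a') as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold an exclusive lock
        try:
            f.write(line + '\n')
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)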
The terminal is just running the command you told it to execute; there is no shared file pointer or anything like that involved.

How to stream log files content that is constantly changing file names in perl?

I have a series of applications on Linux systems whose logs I need to constantly 'stream' out, or even just 'tail', but the challenge is that the file names are constantly rolling and changing.
They are all date-encoded (with dates in different formats), and each then has a different incrementing scheme.
Most of them start at one and count up, but one starts without an extension and only adds one after the first file, and another increments a number but, once it hits 99, increments an alpha character and resets the number to 01, and so on, because it rolls over so quickly.
I only have OS-level shell scripting, OS command-line utilities, and Perl available to handle this situation, so that another application can pick up and read these logs.
The new files are always created right when writing to them starts, and groups of different logs (some I am reading, some I am not) are written to the same directory, so I cannot just pick up anything that hits the directory.
If I simply 'tail -n 1000000 -f |' them today, this works fine for the reader application I am using until the file changes. I cannot set up file-list ranges within the reader application, but I can pre-process the logs so they appear as a continuous stream to the reader rather than having the reader invoke commands to read them directly. A simple Perl log reader like this also works fine for a static filename, but not for dynamic ones. It is critical that I don't re-process any log lines and only capture new lines as they are written to the logs.
I admit I am no kind of Perl guru, and the best answer/clue I've been able to find so far is the use of Perl's glob function to possibly do this, but the examples I've found basically reprocess all of the files on each run and then seem to stop.
Example file names I am dealing with across the multiple apps I am trying to handle:
appA_YYMMDD.log
appA_YYMMDD_0001.log
appA_YYMMDD_0002.log
WS01APPB_YYMMDD.log
WS02APPB_YYMMDD.log
WS03AppB_YYMMDD.log
APPCMMDD_A01.log
APPCMMDD_B01.log
YYYYMMDD_001_APPD.log
As the ls -i output below shows, the files do not have the same inode, and simply monitoring the directory for changes is not possible because a lot of other things are written there. On the dev system more than 50 logs are being written to the directory and there are thousands of files, while I am only trying to retrieve 5. I am checking whether multitail can be made available so I can try that suggestion, but it is not currently available, and installing any additional RPMs in this environment is generally a multi-month battle.
ls -i
24792 APPA_180901.log
24805 APPA__180902.log
17011 APPA__180903.log
17072 APPA__180904.log
24644 APPA__180905.log
17081 APPA__180906.log
17115 APPA__180907.log
So really the root of what I am trying to do is get a continuous stream regardless of whether the file name changes, without having to run the extract command repeatedly and without big breaks in the data feed while some script figures out that the file being logged to has changed. I don't need to parse the contents (my other app does that). Is there an easy way of handling this changing file name?
How about monitoring the log directory for changes with Linux inotify, e.g. Linux::Inotify2? Then you could detect when new log files are created, stop reading from the old log file and start reading from the new one.
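To illustrate the shape of that approach, here is a rough sketch in Python using the third-party inotify_simple package (purely illustrative; a Linux::Inotify2 solution would follow the same pattern, and the directory and file-name prefix are made-up examples):

from inotify_simple import INotify, flags

inotify = INotify()
inotify.add_watch('/var/log/myapp', flags.CREATE | flags.MOVED_TO)
while True:
    for event in inotify.read():          # blocks until at least one event arrives
        if event.name.startswith('appA_'):
            print('switch tailing to new log file:', event.name)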
Try tailswitch. I created this script to tail log files that are rotated daily and have YYYY-MM-DD in their names. To use this script, you just say:
% tailswitch '*.log'
The quoting prevents the shell from interpreting the glob pattern. The script re-evaluates the glob from time to time and switches to a newer file based on its name.
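For what it's worth, the core of the 'glob and switch' idea looks roughly like this. The sketch below is Python rather than Perl, purely to show the structure; it assumes the newest file always sorts last by name (which holds for date-encoded names like the ones above), and the pattern is just an example:

import glob
import time

def follow_latest(pattern, poll_interval=1.0):
    # Re-glob periodically, switch to the newest matching file by name,
    # and yield only lines written after we start following that file.
    current_name, handle = None, None
    while True:
        candidates = sorted(glob.glob(pattern))
        if candidates and candidates[-1] != current_name:
            if handle:
                handle.close()
            current_name = candidates[-1]
            handle = open(current_name)      # a brand-new file, so start at the top
        if handle:
            for line in iter(handle.readline, ''):   # drain everything available
                yield line
        time.sleep(poll_interval)

# Example use (pattern is illustrative):
# for line in follow_latest('appA_*.log'):
#     print(line, end='')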

Increase max line length for coffeelint in vim (mvim) when editing coffee files

When I edit any .coffee file in my mvim and try to save it with any line longer than 80 characters, I get an error like this:
file_name.coffee |18 error| Line exceeds maximum allowed length Length is 91, max is 80.
This is extremely annoying, especially since our company convention is a maximum of 100 characters per line, so even code from other team members causes problems for me locally.
The only place where I can change this limit is in the Node.js module, in the file .../node_modules/coffeelint/lib/coffeelint.js, which has these lines:
max_line_length: {
    value: 80,
-   level: ERROR,
+   level: IGNORE,
    message: 'Line exceeds maximum allowed length'
},
But, of course, editing sources of nodejs libraries is not a good option.
In my mvim I use these dotfiles - https://github.com/skwp/dotfiles
In my project directory I have a .coffeelint.json, but it has no effect, although it seems to contain the needed and valid configuration (it works perfectly on TravisCI and on other team members' machines).
Questions:
Is there any place where I can turn off the coffeelint call when saving a file?
Is there any place where I can configure the maximum allowed line length for coffeelint?
Update:
Putting a properly named (.coffeelint.json) config file into the home directory helps, but it is not a proper solution in my case.
It seems this is more of a coffeelint question than a Vim question.
From http://www.coffeelint.org/#options:
It seems you have to generate a configuration file, tweaking the following option.
max_line_length This rule imposes a maximum line length on your code.
Python's style guide does a good job explaining why you might want to
limit the length of your lines, though this is a matter of taste.
Lines can be no longer than eighty characters by default.
default level: error
It also seems you have to call coffeelint with your configuration file:
From: http://www.coffeelint.org/#usage
coffeelint -f coffeelint.json application.coffee
You probably have to find in your dotfile where the coffeelint invocation is done, and add the configuration file with the -f option there.
You don't have to pass the config file explicitly. Here are the user docs for CoffeeLint. You should either create a ~/coffeelint.json file or create a coffeelint.json in the root of your project.
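For reference, such a coffeelint.json might look roughly like this (the 100-character limit matches the convention mentioned in the question; only the options you want to override need to be listed):

{
  "max_line_length": {
    "value": 100,
    "level": "error"
  }
}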
In all project parts (5 different repos now) we currently have a .coffeelint.json file, which is not the proper name if you want coffeelint to pick the config file up automatically. The current .coffeelint.json is used on TravisCI when checking code, and, as it turned out, it is passed explicitly with the -f option. So in my case I have two ways to fix the weird behaviour (which is actually the intended behaviour):
Copy one of the configs from the 5 related repos to ~/coffeelint.json, so that coffeelint will use it automatically when Vim checks the file on save (this will not do if some repos end up with different configs, but it does not require any changes to the repos).
Create a copy of each config file in each repository (so I'll have both .coffeelint.json and coffeelint.json in each repo) and add the newly added one to .gitignore, so that team members will not see it in their editors. This option is also inappropriate and looks ugly, because I have to make 5 changes and 5 commits.
It seems the guys on the team decided to give the coffeelint config file a non-standard name in order to hide it visually in code editors. The solution cost me some nerves, so I'll probably reconfigure everything properly and rename the configs to their default names.
It would be nice if coffeelint supported multiple config files with levels of priority, but this is not possible at the moment.
