Python Path .is_file() evaluates symlink as a file - Linux

In my Python 3 program, I take a bunch of paths and do things based on what they are. When I evaluate the following symlinks (snippet):
lrwxrwxrwx 1 513 513 5 Aug 19 10:56 console -> ttyS0
lrwxrwxrwx 1 513 513 11 Aug 19 10:56 core -> /proc/kcore
lrwxrwxrwx 1 513 513 13 Aug 19 10:56 fd -> /proc/self/fd
the results are:
symlink console -> ttyS0
file core -> /proc/kcore
symlink fd -> /proc/self/fd
It evaluates core as if it were a file (rather than a symlink). What is the best way for me to evaluate it as a symlink rather than a file? Code below.
#!/usr/bin/python3
import sys
import os
from pathlib import Path

def filetype(filein):
    print(filein)
    if Path(filein).is_file():
        return "file"
    if Path(filein).is_symlink():
        return "symlink"
    else:
        return "doesn't match anything"

if __name__ == "__main__":
    file = sys.argv[1]
    print(str(file))
    print(filetype(file))

The result of is_file() is intended to answer the question "if I open this name, will I open a file?". For a symlink, the answer is "yes" if the link's target is a file, hence the return value.
If you want to know whether the name itself is a symlink, ask is_symlink().
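One way to apply this to the question's helper is to test is_symlink() first, so the link itself is reported before is_file() follows it to the target:

# A minimal sketch: check is_symlink() before is_file(), since is_file()
# follows the link and reports on the target.
from pathlib import Path

def filetype(filein):
    p = Path(filein)
    if p.is_symlink():
        return "symlink"
    if p.is_file():
        return "file"
    return "doesn't match anything"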

Related

How do I change the line endings used by PExpect output

The output returned from pexpect.run() includes \r\n at the end of every line. Printing to the terminal using print(returnVal.decode()) correctly prints one line for each line returned. When I examine the output I see that the byte string contains \r\n, and when I log it to a file I get double returns in the log file. I'm on a Mac using Python 3.7. Is there a way to set the preferred newline when writing the output? I am using Python's logging module and its info() method to write the string. Output looks like this:
total 80

-rw-r--r-- 1 xxxx admin 1048 Nov 12 00:41 Constants.py

-rw-r--r-- 1 xxxx admin 5830 Nov 12 13:33 file1.py

-rw-r--r-- 1 xxxx admin 2255 Nov 12 00:51 file2.py
When it should look like:
total 80
-rw-r--r-- 1 xxxx admin 1048 Nov 12 00:41 Constants.py
-rw-r--r-- 1 xxxx admin 5830 Nov 12 13:33 file1.py
-rw-r--r-- 1 xxxx admin 2255 Nov 12 00:51 file2.py
Here is a simplified version of my original Logger class:
class Logger():
    def __init__(self, path):
        msgFormat = '%(asctime)s.%(msecs)d\t%(message)s'
        dateFormat = '%m/%d/%Y %H:%M:%S'
        logging.basicConfig(format=msgFormat, datefmt=dateFormat, filename=path, level=logging.INFO)

    def Log(self, theStr):
        logging.info(str(theStr))
The string being returned from Pexpect looks something like:
Line1\r\nLine2
Depending on how you log the output, it's advisable to format the newlines before sending them to the logger. However, if you must override the logging module's newline handling for FileHandler, as an experiment you can do so by monkey patching its _open method, since this functionality isn't available by default.
I used the source code for Python 3.8 to get the _open function's definition.
import logging

def custom_open(self):
    """
    Monkey patched _open function of class logging.FileHandler (Python 3.8)
    """
    return open(self.baseFilename, self.mode, encoding=self.encoding, newline='')

logging.FileHandler._open = custom_open

if __name__ == "__main__":
    pexpect_return = "Output\nTest"
    my_log = logging.getLogger("test_logger")
    my_log.setLevel(logging.INFO)
    my_log.addHandler(logging.FileHandler("test.log"))
    my_log.info(pexpect_return)
How it works
Python's logging module has a class FileHandler, which uses a method _open to open the file it writes and appends log records to on disk. Its default implementation, as of version 3.8, does not pass a newline parameter, so it uses the default newline translation.
Monkey patching is when you replace or update a method/function in one of your imported classes while the program is running. The line logging.FileHandler._open = custom_open tells Python to replace the _open method of the FileHandler class with the custom_open function above. Later, when my_log.addHandler(logging.FileHandler("test.log")) runs, the new custom_open method is used to open the file with the newline parameter.
You can further confirm that the new method is used to open the file by adding a suffix to the file name like this:
return open(self.baseFilename+"__Monkey_Patched", self.mode, encoding=self.encoding, newline='')
If you now run the demo code, the filename will be "test.log__Monkey_Patched".
This code, however, will not replace any newline characters that you pass to the logger as part of the string to log; you need to process those beforehand.
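For that preprocessing step, a minimal sketch (assuming returnVal holds the bytes that pexpect.run() returned, as in the question) is to decode and normalize the CRLF endings before logging:

# A minimal sketch, assuming returnVal is the bytes object from pexpect.run():
# decode and collapse the \r\n endings before handing the text to the logger.
cleaned = returnVal.decode().replace("\r\n", "\n")
my_log.info(cleaned)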

Python-Celery Log

I use Flask, Python 3.x and Celery 4 (8 workers in total).
I want to make log files with RotatingFileHandler, splitting them when the file size limit is exceeded.
It works fine for the first log file. (It includes all workers' logs, PoolWorker-1 through PoolWorker-8.)
-rw-rw-r-- 1 sj sj 1048530 Sep 18 10:01 celery_20170918.log (all workers' logs)
But when the file size limit is exceeded, the workers write logs to separate files.
-rw-rw-r-- 1 sj sj 223125 Sep 18 10:47 celery_20170918.log (all workers' logs except PoolWorker-2, 5 and 6)
-rw-rw-r-- 1 sj sj 43785 Sep 18 10:47 celery_20170918.log.1 (only PoolWorker-2 logs)
-rw-rw-r-- 1 sj sj 46095 Sep 18 10:47 celery_20170918.log.2 (only PoolWorker-5 logs)
-rw-rw-r-- 1 sj sj 45990 Sep 18 10:47 celery_20170918.log.3 (only PoolWorker-6 logs)
-rw-rw-r-- 1 sj sj 1048530 Sep 18 10:01 celery_20170918.log.4 (the log file created first was renamed to this)
I don't know what rule governs this, or whether the files contain duplicated logs.
My celery logger is as below.
tasks.py
logger = get_task_logger('tasks')
logger.setLevel("INFO")

filename = './log/celery/celery_task.log'
formatter = Formatter('%(levelname)s-%(asctime)s %(processName)s %(funcName)s():%(lineno)d %(message)s')

# FileSize rotating
fileMaxByte = 1024 * 1024 * 1  # 1MB
fileHandler = logging.handlers.RotatingFileHandler(filename, maxBytes=fileMaxByte, backupCount=100)
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)

@celery.task(...options...)
def test_call(self):
    logger.info("LOG TEST")
test.py
if __name__ == '__main__':
    test_call.apply_async()
What's wrong?
RotatingFileHandler doesn't maintain atomicity across multiple processes when rolling over the log file.
In a multi-process environment, process A sees that maxBytes has been reached on log file c.log, renames it to c.log.1, then writes some log lines to a newly created c.log.
But at the same time, another process B may still be holding a handle to the original c.log. Because the size check is based on the end offset of the file handle, B also sees that maxBytes has been reached and wants to roll over on its own. Because rollover works by renaming the file on disk, B tries to rename c.log to c.log.1, but since c.log.1 already exists, the rename goes to c.log.2 instead.
As the remaining processes do the same, c.log.3 gets created, and so on.
This issue is best addressed by using an external logging mechanism, or you could wrap up your own atomic rotating file handler.
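One standard-library pattern for the "external logging mechanism" route is to funnel every worker's records through a queue so a single process owns the file and its rotation. Below is a minimal sketch using plain multiprocessing rather than Celery, so the worker setup is illustrative only; the filename and worker count mirror the question:

# A minimal sketch: workers only enqueue records; the QueueListener in the
# parent process is the only one that writes to (and rotates) the file.
import logging
import logging.handlers
import multiprocessing

def worker(queue):
    logger = logging.getLogger("tasks")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(queue))  # enqueue only
    logger.info("LOG TEST")

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    fileHandler = logging.handlers.RotatingFileHandler(
        "celery_task.log", maxBytes=1024 * 1024, backupCount=100)
    # Exactly one process (this one) performs the rollover.
    listener = logging.handlers.QueueListener(queue, fileHandler)
    listener.start()
    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(8)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()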

Syntaxnet Turkish Language Data Set Non Existent Map Files

I am new to Syntaxnet and I tried to use the pre-trained model for the Turkish language, following the instructions here.
Point-1: Although I set the MODEL_DIRECTORY environment variable, tokenize.sh didn't find the related path and gave an error like the one below:
root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi." | syntaxnet/models/parsey_universal/tokenize.sh
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. Not found: label-map)
Point-2: So, I changed tokenize.sh by commenting out MODEL_DIR=$1 and setting my Turkish language model path, as below:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
CONTEXT=syntaxnet/models/parsey_universal/context.pbtxt
INPUT_FORMAT=stdin-untoken
#MODEL_DIR=$1
MODEL_DIR=syntaxnet/models/etiya-smart-tr
Point-3: After that, when I ran it as told, it gave an error like the one below:
root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi" | syntaxnet/models/parsey_universal/tokenize.sh
I syntaxnet/term_frequency_map.cc:101] Loaded 29 terms from syntaxnet/models/etiya-smart-tr/label-map.
I syntaxnet/embedding_feature_extractor.cc:35] Features: input.char input(-1).char input(1).char; input.digit input(-1).digit input(1).digit; input.punctuation-amount input(-1).punctuation-amount input(1).punctuation-amount
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: chars;digits;puncts
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 16;16;16
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. Not found: syntaxnet/models/etiya-smart-tr/char-map)
I had downloaded the Turkish package by following the link pattern indicated, download.tensorflow.org/models/parsey_universal/.zip,
and my language mapping files are listed below:
-rw-r----- 1 root root 50646 Sep 22 07:24 char-ngram-map
-rw-r----- 1 root root 329 Sep 22 07:24 label-map
-rw-r----- 1 root root 133477 Sep 22 07:24 morph-label-set
-rw-r----- 1 root root 5553526 Sep 22 07:24 morpher-params
-rw-r----- 1 root root 1810 Sep 22 07:24 morphology-map
-rw-r----- 1 root root 10921546 Sep 22 07:24 parser-params
-rw-r----- 1 root root 39990 Sep 22 07:24 prefix-table
-rw-r----- 1 root root 28958 Sep 22 07:24 suffix-table
-rw-r----- 1 root root 561 Sep 22 07:24 tag-map
-rw-r----- 1 root root 5234212 Sep 22 07:24 tagger-params
-rw-r----- 1 root root 172869 Sep 22 07:24 word-map
QUESTION-1:
I am aware that there is no char-map file in the directory, which is why I got the error noted at Point-3 above. So, does anyone have an opinion on how the Turkish language test could have been done, given that the result was shared as 93.363% for part-of-speech, for example?
QUESTION-2:
How can I find the char-map file for the Turkish language?
QUESTION-3:
If there is no char-map file, must I train a model by following the steps indicated in SyntaxNet's Obtain Data & Training?
QUESTION-4:
Is there a way to generate the word-map, char-map, etc. files? Is it the well-known word2vec approach that can be used to generate map files that Syntaxnet tokenizers will be able to process?
Try this issue: https://github.com/tensorflow/models/issues/830 - it contains a (for the moment) temporary solution.

Groovy File() not reporting correct size / length

I have a Jenkins post build Groovy script running out of the "Post build task plugin". From the same plugin, immediately before running the Groovy script, I check for the existence of the file and its size. The log shows:
09:14:53 -rw-r--r-- 1 aaa users 978243 Nov 4 08:53 /jk/workspace/xxxx/output/delta.txt
09:14:53 cppcheck.groovy: Checking build result: SUCCESS
09:14:53 cppcheck.groovy: workspace = /jk/workspace/xxxx
09:14:53 cppcheck.groovy: delta = /jk/workspace/xxxx/output/delta.txt
09:14:53 cppcheck.groovy: delta.txt length = 0
The groovy script is as follows:
import hudson.model.*

def build = Thread.currentThread().executable
def result = build.getResult()
println("cppcheck.groovy: Checking build result: " + result.toString())
if (result.isBetterOrEqualTo(hudson.model.Result.SUCCESS)) {
    def workspace = build.getEnvVars()["WORKSPACE"]
    def delta = workspace + "/output/delta.txt"
    println("cppcheck.groovy: workspace = " + workspace)
    println("cppcheck.groovy: delta = " + delta)
    def f = new File(delta)
    println("cppcheck.groovy: delta.txt length = " + f.length())
    if (f.length() > 0) {
        build.setResult(hudson.model.Result.UNSTABLE)
    }
}
What am I doing wrong here?
Update: There seems to be some scepticism that the file exists, and that there is some sort of race condition. To put your minds at rest, let's rule that out. I have modified the build to execute the same ls -l command after it runs the Groovy script, to prove the file does exist and that this problem is ultimately Groovy not being able to open the file. I also added a file exists() check to the above Groovy script, which, as I suspected it would, reports that the file doesn't exist. I don't dispute that Groovy thinks the file doesn't exist; what I am trying to work out is why.
10:31:39 [xxxx] $ /bin/sh -xe /tmp/hudson8964729240493636268.sh
10:31:39 + ls -l /jk/workspace/xxxx/output/delta.txt
10:31:39 -rw-r--r-- 1 aaa users 978243 Nov 4 08:53 /jk/workspace/xxxx/output/delta.txt
10:31:40 cppcheck.groovy: Checking build result: SUCCESS
10:31:40 cppcheck.groovy: workspace = /jk/workspace/xxxx
10:31:40 cppcheck.groovy: delta = /jk/workspace/xxxx/output/delta.txt
10:31:40 cppcheck.groovy: delta.txt length = 0
10:31:40 cppcheck.groovy: delta.txt exists = false
10:31:40 [xxxx] $ /bin/sh -xe /tmp/hudson8007562636561507409.sh
10:31:40 + ls -l /jk/workspace/xxxx/output/delta.txt
10:31:40 -rw-r--r-- 1 aaa users 978243 Nov 4 08:53 /jk/workspace/xxxx/output/delta.txt
Also, notice the timestamp on said file is still 08:53, from when it was created.
I suspected that the Groovy script was running on the build master, as opposed to the build node this particular build was running on. I added some debug output to print the hostname the Groovy script was running on, and sure enough, it wasn't the same host the shell variant of the script was running on.

Convert string to class object in Python

I have a list of strings. Each string in the list has the same format. I would like to convert each string into a class object (if that is the best option), so I can do some analysis on the list of class objects.
As an example,
I have the following list
ls_list = ['-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar1',
           '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar2',
           '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo1',
           '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo2']
I would like to convert each of the above strings into a class that has nine members (perm, etc.).
You don't want to do that. Use os.listdir() and os.stat() to get the information you want.
You do it something like this:
import os

data = {}
for file in os.listdir('.'):
    data[file] = os.stat(file)
This gives you the information for all the files in the current directory, as objects, as you requested. These can then be inspected, and you can use other functions to figure out the username for the user id you get, etc.
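For instance, a small sketch of that inspection step (POSIX only, since it uses the pwd module) might look like this:

# A sketch of inspecting the os.stat() results gathered above: resolve the
# owner's username from st_uid and render the permission bits ls-style.
import pwd
import stat

for name, st in data.items():
    owner = pwd.getpwuid(st.st_uid).pw_name   # username for the numeric uid
    perms = stat.filemode(st.st_mode)         # e.g. '-rw-r--r--'
    print(perms, owner, st.st_size, name)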
Here's one way of doing what you ask, but as others have noted, don't use this for parsing the output of ls. Also, it assumes there are only eight runs of whitespace in your input string separating the data; if any of the data substrings also contain whitespace, this code will fail:
class FileAttribs(object):
    ORDERED_ATTRIB_NAMES = ["permissions", "links", "owner",
                            "groups", "size", "month", "day", "time", "name"]

    def __init__(self, lsString):
        for (attrib, s) in zip(self.ORDERED_ATTRIB_NAMES, lsString.split()):
            setattr(self, attrib, s)

if __name__ == '__main__':
    ls_list = ['-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar1',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar2',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo1',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo2']
    files = []
    for s in ls_list:
        files.append(FileAttribs(s))
    # do stuff with files, e.g.
    for f in files:
        print(f.permissions, f.name)
