Convert string to class object in Python - python-3.x

I have a list of strings. Each string in the list has the same format. I would like to convert each string into a class object (if that is the best option), so I can do some analysis on the list of class objects.
As an example,
I have the following list
ls_list = ['-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar1',
'-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar2',
'-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo1',
'-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo2']
I would like to convert each of the above strings into a class instance that has nine members (perm, etc.).

You don't want to do that. Use os.listdir() and os.stat() to get the information you want.

You'd do it something like this:
import os

data = {}
for file in os.listdir('.'):
    data[file] = os.stat(file)
This gives you the information for all the files in the current directory, as objects, as you requested. These can then be inspected, and you can use other functions to figure out the username of the userid you get, etc.
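For example, to turn the numeric owner id from os.stat() into a username, you could use the pwd module (a minimal sketch; pwd is Unix-only, and the directory '.' is just an example):

import os
import pwd

for name in os.listdir('.'):
    st = os.stat(name)                       # stat_result object for this entry
    owner = pwd.getpwuid(st.st_uid).pw_name  # resolve the numeric uid to a username
    print(name, oct(st.st_mode), st.st_size, owner)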

Here's one way of doing what you ask, but as others have noted, don't use this for parsing the output of ls. Also, it assumes that there are only 8 areas of whitespace in your input string separating your data. If any of the data substrings also contain whitespace, this code will fail:
class FileAttribs(object):
    ORDERED_ATTRIB_NAMES = ["permissions", "links", "owner",
                            "groups", "size", "month", "day", "time", "name"]

    def __init__(self, lsString):
        for (attrib, s) in zip(self.ORDERED_ATTRIB_NAMES, lsString.split()):
            setattr(self, attrib, s)

if __name__ == '__main__':
    ls_list = ['-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar1',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 bar2',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo1',
               '-rw-r--r-- 1 ahmed None 0 Apr 21 17:10 foo2']
    files = []
    for s in ls_list:
        files.append(FileAttribs(s))
    # do stuff with files, e.g.
    for f in files:
        print(f.permissions, f.name)

Related

transforms.route.topic.expression and groovy expression

I'm trying to use the Debezium transforms.route.topic.expression transform.
Here are the entries in the connector configuration:
"transforms": "dropPrefix,unwrapi,route",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "rocketlawyer.dbo.(.*)",
"transforms.dropPrefix.replacement": "$1",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.route.type": "io.debezium.transforms.ContentBasedRouter",
"transforms.route.language": "jsr223.groovy",
"transforms.route.topic.expression": "value.snapshotequals('true') ? ${topic} : cdc.$1"
When I apply the config, I get the following error:
iguenkin_rocketlawyer_com@dbadm-r208:~/confluent$ curl -d @mssql_trg_cf.json -H "Content-Type: application/json" -X PUT http://localhost:8083/connectors/mssql_trg/config | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1326 100 242 100 1084 30250 132k --:--:-- --:--:-- --:--:-- 161k
{
"error_code": 500,
"message": "Unexpected character ('\"' (code 34)): was expecting comma to separate Object entries\n at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 832]"
}
column: 832
"transforms.route.topic.expression": "value.snapshotequals('true') ? ${topic} : cdc.$1"
Column 830 is a blank space in the string ": cdc".
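The 500 response is the Connect REST API failing to parse the request body as JSON, so one way to pinpoint the problem locally is to run the payload through a JSON parser before sending it. A minimal sketch (assuming the file name mssql_trg_cf.json from the curl call above):

import json

with open('mssql_trg_cf.json') as f:
    try:
        json.load(f)                # parses the whole document or raises
        print('payload is valid JSON')
    except json.JSONDecodeError as e:
        # e.colno should land near the "column: 832" reported by Kafka Connect
        print('invalid JSON at line', e.lineno, 'column', e.colno, '-', e.msg)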
Here are my questions:
How can I be sure that jsr223.groovy is installed and accessible to the transformations?
As I understand it, it is part of the Debezium connector, so it is not listed as a separate plugin.
Here is the list of jars where Debezium is installed:
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 20272 Dec 9 20:44 debezium-api-1.3.1.Final.jar
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 91369 Dec 9 20:44 debezium-connector-sqlserver-1.3.1.Final.jar
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 844500 Dec 9 20:44 debezium-core-1.3.1.Final.jar
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 19090 Dec 15 22:01 debezium-scripting-1.3.1.Final.jar
-rw-r--r-- 1 iguenkin_rocketlawyer_com 1234193955 16839 Dec 16 01:04 groovy-jsr223-3.0.7-indy.jar
If Groovy is installed, what is wrong with the expression? I used the following documentation to set up the connector: https://debezium.io/documentation/reference/configuration/content-based-routing.html

Python PATH .is_file() evaluates symlink as a file

In my Python3 program, I take a bunch of paths and do things based on what they are. When I evaluate the following symlinks (snippet):
lrwxrwxrwx 1 513 513 5 Aug 19 10:56 console -> ttyS0
lrwxrwxrwx 1 513 513 11 Aug 19 10:56 core -> /proc/kcore
lrwxrwxrwx 1 513 513 13 Aug 19 10:56 fd -> /proc/self/fd
the results are:
symlink console -> ttyS0
file core -> /proc/kcore
symlink console -> ttyS0
It evaluates core as if it were a file (rather than a symlink). What is the best way for me to detect it as a symlink vs a file? Code below:
#!/usr/bin/python3
import sys
import os
from pathlib import Path

def filetype(filein):
    print(filein)
    if Path(filein).is_file():
        return "file"
    if Path(filein).is_symlink():
        return "symlink"
    else:
        return "doesn't match anything"

if __name__ == "__main__":
    file = sys.argv[1]
    print(str(file))
    print(filetype(file))
The result of is_file() is intended to answer the question "if I open this name, will I open a file?". For a symlink, the answer is "yes" if the target is a file, hence the return value.
If you want to know whether the name is a symlink, ask is_symlink().
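In other words, test is_symlink() before is_file(), so that symlinks are reported as such even when their target is a regular file. A sketch (a reordering of the poster's function, not a new API):

from pathlib import Path

def filetype(filein):
    p = Path(filein)
    if p.is_symlink():   # checked first: True even if the target is a regular file
        return "symlink"
    if p.is_file():      # only reached for non-symlinks
        return "file"
    return "doesn't match anything"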

Remove timestamp and url from string python

I have a string from which I have to remove the timestamps and punctuation. I also have to remove all digits, except that the responseCode value (400 in this case) has to be kept as-is: wherever 400 appears, it should not be removed. I also want to remove all URLs and any file name ending with tar.gz.
mystr = """sun aug 19 13:02:09 2018 I_am.98189: hello please connect to the local host:8080
sun aug 19 13:02:10 2018 hey.94289: hello not able to find the file
sun aug 19 13:02:10 2018 I_am.94289: Base url for file_transfer is: abc/vd/filename.tar.gz
mon aug 19 13:02:10 2018 how_94289: $var1={
'responseCode' = '400',
'responseDate' = 'Sun, 19 Aug 2018 13:02:08 ET',
'responseContent' = 'ABC' }
mon aug 20 13:02:10 2018 hello!94289: Error performing action, failed with error code [400]
"""
Expected result:
"I_am hello please connect to the local host
hello not able to find the file
Base url for file_transfer
var1
responseCode = 400
responseDate
responseContent = ABC
Error performing action, failed with error code 400
"
My Solution to remove punctuation:
punctuations = '''!=()-[]{};:'"\,<>.?##$%^&*_~'''
no_punct = ""
for char in mystr:
    if char not in punctuations:
        no_punct = no_punct + char
# display the unpunctuated string
print(no_punct)
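As an aside, the same character stripping can be done in a single pass with str.translate (a sketch reusing the punctuations string above):

# map every character in punctuations to None and apply the table in one call
no_punct = mystr.translate(str.maketrans('', '', punctuations))
print(no_punct)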
Maybe:
import re

patterns = [r"\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}\s*",        # sun aug 19 13:02:10 2018
            r"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} \w{2}\s*",  # Sun, 19 Aug 2018 13:02:08 ET
            r":\s*([\da-zA-Z]+\/)+([a-zA-Z0-9\.]+)",                 # URL
            r"([a-zA-Z_!]+)[\.!_]\d+:\s*",                           # word[._!]number: plus optional space
            r":\d+",
            r"[/':,${}\[\]]"                                         # punctuation
            ]
s = mystr
for p in patterns:
    s = re.sub(p, '', s)
s = s.strip()
print(s)
Output:
hello please connect to the local host
hello not able to find the file
Base url for file_transfer is
var1=
responseCode = 400
responseDate =
responseContent = ABC
Error performing action failed with error code 400

Syntaxnet Turkish Language Data Set Non Existent Map Files

I am new to SyntaxNet and I tried to use the pre-trained model for the Turkish language through the instructions here.
Point-1: Although I set the MODEL_DIRECTORY environment variable, tokenize.sh didn't find the related path, and it gives an error like the one below:
root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi." | syntaxnet/models/parsey_universal/tokenize.sh
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. **Not found: label-map**)
Point-2: So, I changed tokenize.sh by commenting out MODEL_DIR=$1 and setting my Turkish language model path, as below, to carry on:
PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
CONTEXT=syntaxnet/models/parsey_universal/context.pbtxt
INPUT_FORMAT=stdin-untoken
#MODEL_DIR=$1
MODEL_DIR=syntaxnet/models/etiya-smart-tr
Point-3: After that, when I run it as told, it gives an error like the one below:
root@4562a2ee0202:/opt/tensorflow/models/syntaxnet# echo "Eray eve geldi" | syntaxnet/models/parsey_universal/tokenize.sh
I syntaxnet/term_frequency_map.cc:101] Loaded 29 terms from syntaxnet/models/etiya-smart-tr/label-map.
I syntaxnet/embedding_feature_extractor.cc:35] Features: input.char input(-1).char input(1).char; input.digit input(-1).digit input(1).digit; input.punctuation-amount input(-1).punctuation-amount input(1).punctuation-amount
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: chars;digits;puncts
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 16;16;16
F syntaxnet/term_frequency_map.cc:62] Check failed: ::tensorflow::Status::OK() == (tensorflow::Env::Default()->NewRandomAccessFile(filename, &file)) (OK vs. **Not found: syntaxnet/models/etiya-smart-tr/char-map**)
I had downloaded the Turkish package by tracing the link pattern indicated, download.tensorflow.org/models/parsey_universal/<language>.zip,
and my language mapping file list is below:
-rw-r----- 1 root root 50646 Sep 22 07:24 char-ngram-map
-rw-r----- 1 root root 329 Sep 22 07:24 label-map
-rw-r----- 1 root root 133477 Sep 22 07:24 morph-label-set
-rw-r----- 1 root root 5553526 Sep 22 07:24 morpher-params
-rw-r----- 1 root root 1810 Sep 22 07:24 morphology-map
-rw-r----- 1 root root 10921546 Sep 22 07:24 parser-params
-rw-r----- 1 root root 39990 Sep 22 07:24 prefix-table
-rw-r----- 1 root root 28958 Sep 22 07:24 suffix-table
-rw-r----- 1 root root 561 Sep 22 07:24 tag-map
-rw-r----- 1 root root 5234212 Sep 22 07:24 tagger-params
-rw-r----- 1 root root 172869 Sep 22 07:24 word-map
QUESTION-1:
I am aware that there is no char-map file in the directory, which is why I got the error written at Point-3 above. So, does anyone have an opinion about how the Turkish language test could have been done and its result shared as 93.363% for part-of-speech, for example?
QUESTION-2:
How can I find the char-map file for the Turkish language?
QUESTION-3:
If there is no char-map file, must I train from scratch by tracing the steps indicated in SyntaxNet's "Obtain Data & Training"?
QUESTION-4:
Is there a way to generate the word-map, char-map, etc. files? Is it the well-known word2vec approach that can be used to generate map files that SyntaxNet tokenizers can process?
Try this issue: https://github.com/tensorflow/models/issues/830 - it contains a (for the moment) temporary solution.

Fortran 77 read unformatted sequence data from old sun machine

I am porting an old mathematical model (written between 1995 and 2000) to a current Linux machine. For this, I adapted all the makefiles as shown:
FORTRAN = gfortran # f90 -f77 -ftrap=%none
OPTS = -O -u -lgfortran -g -fconvert="big-endian" # -O -u
NOOPT =
LOADER = gfortran #f90
LOADOPTS = #-lf77compat
and:
SYSFFLAGS = -O0 -u -g -fconvert="big-endian" # -f77=input
SYSCFLAGS = -DX_WCHAR
SYSLDFLAGS =
SYSCPPFLAGS = -DSYS_UNIX -DCODE_ASCII -DCODE_IEEE # -DSYS_Sun
SYSAUTODBL = -fdefault-real-8 #-r8
SYSDEBUG = -g
SYSCHECK = -C
LINKOPT =
CPPOPT =
SHELL = /bin/sh
CC = cc
FC = gfortran # f90
LD = gfortran # f90
AR = ar vru
RM = rm -f
CP = cp
MV = mv -f
LN = ln -s
So I replaced all the outdated compilers/options to be able to compile the code. After that, it generates the binaries with no errors. Note that the options behind the # symbol were the original options in the Makefiles.
However, when running the program, it is not possible to read the sample data. In my opinion, those files were written in unformatted sequential mode on a Sun machine. The following hex dump belongs to the file that I need to read.
0000000: 0000 0400 2020 2020 2020 2020 2020 2020 ....
0000010: 3930 3130 7465 7374 2d63 3031 2020 2020 9010test-c01
0000020: 2020 2020 4741 5520 2020 2020 2020 2020 GAU
0000030: 2020 2020 2020 2020 2020 2020 2020 2020
0000040: 2020 2020 2020 2020 2020 2020 2020 2020
0000050: 2020 2020 2020 2020 2020 2020 2020 2020
...
...
0000390: 2020 2020 2020 2020 2020 2020 2020 2020
00003a0: 2020 2020 2020 2020 2020 2020 2020 2020
00003b0: 2020 2020 3139 3936 3037 3232 2032 3030 19960722 200
00003c0: 3434 3920 4147 434d 352e 3420 2020 2020 449 AGCM5.4
00003d0: 2020 2020 3230 3030 3036 3134 2031 3230 20000614 120
00003e0: 3831 3720 6869 726f 2020 2020 2020 2020 817 hiro
00003f0: 2020 2020 2020 2020 2020 2020 2020 2034 4
0000400: 3039 3630 0000 0400 0002 8000 bef7 21f3 0960..........!.
0000410: bf3c 55ab bf7a 8f71 bf99 e26a bfb2 db4e .<U..z.q...j...N
0000420: bfc7 425f bfd6 64b1 bfdf d44f bfe3 6a43 ..B_..d....O..jC
After analyzing the code, I found that it reads correctly up to line 0000410 of the dump; it cannot continue past the mark 0002 8000. The source code shown below is what actually reads this file.
...
* [INPUT]
INTEGER IFILE
CHARACTER HITEM *(*) !! name for identify
CHARACTER HDFMT *(*) !! data format
*
* [ENTRY INPUT]
REAL * 8 TIME1 !! time
REAL * 8 TIME2 !! time
REAL*8 DMIN
REAL*8 DMAX
REAL*8 DIVS
REAL*8 DIVL
INTEGER ISTYPE
INTEGER JFILE !! output file No.
INTEGER IMAXD
INTEGER JMAXD
*
* [WORK]
REAL * 8 DDATA ( NGDWRK )
REAL * 4 SDATA ( NGDWRK )
*
* [INTERNAL WORK]
INTEGER I, J, K, IJK, IJKNUM, IERR
...
...
READ ( IFILE, IOSTAT=IEOD ) HEAD
...
...
...
DO 2150 IJK = 1, IJKNUM
READ ( IFILE, END=2150 ) SDATA(IJK)
WRITE (6,*) ' IGTIO::GTZZRD: iteration=', IJK, SDATA(IJK)
2150 CONTINUE
To make the loop easier to debug, I replaced the original read with the loop above. The original read uses an implied do loop:
READ ( IFILE, IOSTAT=IEOD)
& (SDATA(IJK), IJK=1, IJKNUM )
And the output for the loop is:
IGTIO::GTZZRD: iteration= 1 -0.48268089
IGTIO::GTZZRD: iteration= 2 1.35631564E-19
IGTIO::GTZZRD: iteration= 3 -0.48142704
IGTIO::GTZZRD: iteration= 4 1.35631564E-19
IGTIO::GTZZRD: iteration= 5 244.25270
IGTIO::GTZZRD: iteration= 6 1.35631564E-19
IGTIO::GTZZRD: iteration= 7 983.87988
IGTIO::GTZZRD: iteration= 8 1.35631564E-19
IGTIO::GTZZRD: iteration= 9 1.59284362E-04
IGTIO::GTZZRD: iteration= 10 1.35631564E-19
IGTIO::GTZZRD: iteration= 11 0.0000000
---error here---
I am definitely lost on this, so any help is appreciated.
Here is what's going on. First: this is definitely a big-endian file.
The first 4 bytes,
0000 0400
are the big-endian 4-byte integer 1024, which is the length of your first record,
and which agrees with the length of HEAD (per the comment).
Now note that 0000 0400 is repeated at byte position 1024+4 exactly (hex dump line 400), as you should expect for a Fortran unformatted file... so far so good.
Now the next 4 bytes,
0002 8000
begin the second record. (Edit, correcting a mistake:) this is 163840 (2*16^4 + 8*16^3). You should find that marker repeated at position 1024+8+163840+4 in the hex dump (which should be line 028400, I think).
Here is the problem: your code reads that 160-kilobyte record into a single 4-byte variable, then moves on to the next record. My guess is that you are seeing the alternating 10^-19 values because every other record is of type character.
In unformatted Fortran you must read a whole record in one shot, so try reading the entire array (without the loop):
READ ( IFILE ) SDATA
assuming SDATA is dimensioned to hold 160 kB, of course (e.g. REAL*4 SDATA(40960)).
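If it helps to verify the framing independently of Fortran, the record markers can be walked with a few lines of Python (a sketch; the file name sample.dat is a placeholder, and it assumes the 4-byte big-endian record markers established above):

import struct

with open('sample.dat', 'rb') as f:           # placeholder name for the data file
    while True:
        head = f.read(4)                      # leading record marker
        if len(head) < 4:
            break                             # clean end of file
        n = struct.unpack('>i', head)[0]      # big-endian record length in bytes
        f.seek(n, 1)                          # skip over the record payload
        tail = f.read(4)                      # trailing marker must equal the leading one
        if len(tail) < 4:
            break
        print('record of', n, 'bytes, markers match:', struct.unpack('>i', tail)[0] == n)

On the dump above this should report a 1024-byte record followed by a 163840-byte one.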
The answer to your problem is in the edit that I missed (thanks also to george's arithmetic, which I hadn't bothered to do).
We can safely say that the record headers are correct and that you can solve your problem with the endian conversion.
So the problem is: reads with an implied-do loop are not equivalent to reads inside a do loop.
That is, read(unit) (i(j), j=1,5) is not the same as
do j=1,5
  read(unit) i(j)
end do
In the first, all five values are read from one record; in the second, each is read from a distinct record.
You should, then, revert your change. If you want the same diagnostics, however, you can do something like:
READ ( IFILE, IOSTAT=IEOD ) (SDATA(IJK), IJK=1, IJKNUM)
WRITE (6, '("IGTIO::GTZZRD: iteration=", I0, F12.8)') (IJK, SDATA(IJK), IJK=1, IJKNUM)
Separately, a deeper routine in charge of reading the file was, for some reason, being called in the 1100 loop, which caused the file to be read more times than necessary. The fixed code is below:
* 1100 CONTINUE
CALL GDREDX !! read data
O ( GDATA , IEOD ,
O HITEMD, HTITL , HUNIT , HDSET ,
O TIME , TDUR , KLEVS ,
I IFILE , HITEM , HDFMT ,
I IMAXD , JMAXD ,
I IDIMD , JDIMD , KDIMD )
IF ( IEOD .EQ. 0 ) THEN
WRITE (6,*) ' IRWGD.F::GDRDTS: TSEL0=', TSEL0
WRITE (6,*) ' IRWGD.F::GDRDTS: TSEL1=', TSEL1
WRITE (6,*) ' IRWGD.F::GDRDTS: TIME=', TIME
* IF ( ((TSEL0.GE.0).AND.(TIME.LT.TSEL0))
* & .OR.((TSEL1.GE.0).AND.(TIME.GT.TSEL1)) ) THEN
* GOTO 1100
* ENDIF
ENDIF
*
RETURN
END
That misled me while I was figuring out what was going on.
