apparent pyparsing bug with 'ZeroOrMore' - python-3.x

I'm using pyparsing with python 3.6.5 on a mac. The following code crashes on the second parse:
from pyparsing import *
a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))
bad_parser = ZeroOrMore(a) + b
b.parseString('hello;')
print("no problems yet...")
bad_parser.parseString('hello;')
print("this will not print because we're dead")
Is this logical behavior? Or is it a bug?
EDIT: Here is the full console output:
no problems yet...
Traceback (most recent call last):
File "test.py", line 9, in <module>
bad_parser.parseString('hello;')
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1632, in parseString
raise exc
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1622, in parseString
loc, tokens = self._parse( instring, 0 )
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1379, in _parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 3395, in parseImpl
loc, exprtokens = e._parse( instring, loc, doActions )
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1379, in _parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 2689, in parseImpl
raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected W:(ABCD...) (at char 6), (line:1, col:7)

This is expected behavior. Pyparsing does not do any lookahead, but is purely left-to-right. You can add lookahead to your parser, but it is something you have to do for yourself.
You can get some more insight into what is happening if you turn on debugging for a and b:
a.setName('a').setDebug()
b.setName('b').setDebug()
which will show you every place pyparsing is about to match the expression, and then if the match failed or succeeded, and if it succeeded, the matching tokens:
Match a at loc 0(1,1)
Matched a -> ['hello', ';']
Match a at loc 6(1,7)
Exception raised:Expected W:(ABCD...) (at char 6), (line:1, col:7)
Match b at loc 6(1,7)
Exception raised:Expected W:(ABCD...) (at char 6), (line:1, col:7)
Since a matches the complete input string, that matches the criterion of "zero or more". Then pyparsing proceeds to match b, but since the word and semicolon have already been read, there is no more to parse. Since b is not optional, pyparsing raises an exception that it could not be found. Even if you were to parse "hello; hello; hello;", all the strings and semis would be consumed by the ZeroOrMore, with no trailing b left to parse.
Try this:
not_so_bad_parser = ZeroOrMore(a + ~StringEnd()) + b
Because this states that you only want to read a expressions that are not at the end of the string, parsing "hello;" will not match a, so pyparsing proceeds to b, which then matches.
This is so prevalent an issue that I added the stopOn keyword to the ZeroOrMore and OneOrMore class constructors, to avoid the need to add the overt ~ (meaning NotAny). At first I thought this might work:
even_less_bad_parser = ZeroOrMore(a, stopOn=b) + b
But then, since b also matches wherever a does, this will effectively never match any a expressions, and may leave unmatched text. We need to stop on b only if it is at the end of the string:
even_less_bad_parser = ZeroOrMore(a, stopOn=b + StringEnd()) + b
I'm not sure if that will truly satisfy your concept of "less bad"-ness, but that is why pyparsing is behaving as it is for you.
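The three parsers discussed above can be compared side by side. This is only a sketch: the parses helper is a hypothetical convenience wrapper, not part of pyparsing, and the second test string "foo; bar;" is an assumed example input.

```python
from pyparsing import (Literal, Optional, ParseException, StringEnd,
                       Word, ZeroOrMore, alphas)

a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))

# Greedy repetition: ZeroOrMore(a) consumes every "word;" and leaves nothing for b.
bad_parser = ZeroOrMore(a) + b

# Explicit negative lookahead: an a only counts if it is not followed by end of input.
not_so_bad_parser = ZeroOrMore(a + ~StringEnd()) + b

# Same idea via stopOn: stop repeating once the rest of the input is a final b.
even_less_bad_parser = ZeroOrMore(a, stopOn=b + StringEnd()) + b


def parses(parser, text):
    """Hypothetical helper: return the token list, or None on ParseException."""
    try:
        return parser.parseString(text).asList()
    except ParseException:
        return None


for name, parser in [('bad', bad_parser),
                     ('lookahead', not_so_bad_parser),
                     ('stopOn', even_less_bad_parser)]:
    print(name, parses(parser, 'hello;'), parses(parser, 'foo; bar;'))
```

The first parser should fail on both inputs, while the other two leave the last "word;" for b.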

Related

Chop a file in Julia

I have opened a file in Julia:
output_file = open(path_to_file, "a")
And I would like to chop the six last characters of the file.
I thought I could do it with chop, i.e., chop(output_file; tail = 6), but it seems it only works with the String type and not with IOStream. How should I do this?
julia> rbothpoly(0, 1, [5], 2, 30, "html")
ERROR: MethodError: no method matching chop(::IOStream; tail=6)
Closest candidates are:
chop(::AbstractString; head, tail) at strings/util.jl:164
Stacktrace:
[1]
[...] ERROR STACKTRACE [...]
[3] top-level scope at REPL[37]:1
I am new to IOStream, discovering them today.
In your case, because you're doing a single write to the end of the file and not doing any further read or other operations, you can also edit the file in-place like this:
function choppre(fname = "data/endinpre.html")
    linetodelete = "</pre>\n"
    linelength = length(linetodelete)
    open(fname, "r+") do f
        readuntil(f, linetodelete)
        seek(f, position(f) - linelength)
        write(f, " "^linelength)
    end
end
This overwrites the text we wish to chop off with an equal length of space characters. I'm not sure if there's a way to simply delete the line (instead of overwriting it with ' ').
I have found what I wanted here; adapted to my problem, it becomes:
(tmppath, tmpio) = mktemp()
open(output_filename, "r") do io
    for line in eachline(io, keep=true) # keep so the new line isn't chomped
        if line == "</pre>\n"
            line = "\n"
        end
        write(tmpio, line)
    end
end
close(tmpio)
mv(tmppath, output_filename, force=true)
chmod(output_filename, 0o777)
close(output_file)
Maybe my question could be marked as duplicate!

Extracting strings from a file to another using python3 turns out TypeError: expected str, bytes or os.PathLike object, not list

The code I am trying to run produces a TypeError in the Python shell. However, when I replace the out_file path with a literal string, it works as desired. Can someone please help me figure out what is producing this error?
The .fasta file is a series of amino acid sequences, each with a special header line, like this:
>sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) OX=654924 GN=FV3-001R PE=4 SV=1
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL
Below is my code:
fasta_file = open('/Users/apple/Desktop/123.fasta','r')
seq = ''
for line in fasta_file:
    if line[0] == '>' and seq == '':
        head = line
        AC = line.split('|')
    elif line[0] != '>' and seq == '':
        seq = seq + line
    elif line[0] == '>' and seq != '':
        out_file = open(AC,'w')
        out_file.write(head + line)
        out_file.close()
        seq=''
        head=line
Traceback (most recent call last):
File "<stdin>", line 8, in <module>
TypeError: expected str, bytes or os.PathLike object, not list
In line 8 try this,
line=line.join(" ")
seq=seq+ line
.
.
.
This line
seq = seq + line
Clearly from your code line is a list as you are indexing it in the previous line. But seq is a string. Python is complaining that the + operator can't be used to add a list to a string.
I'm not certain what you are trying to accomplish so I can't really suggest a fix. If you are trying to add the line to seq, then you would need to choose which element of line you want to add it to (e.g. seq + line[0]). If you want to add seq as (say) the last element of line, then you would use line.append(seq), for instance.
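For what it's worth, the message in the traceback ("expected str, bytes or os.PathLike object, not list") is the one open() raises when its path argument is a list. Since line.split('|') returns a list, open(AC,'w') looks like the likelier source. Below is a minimal sketch of one way around it, assuming every header has the form >db|ACCESSION|name and that each record should go to a file named after its accession. The function name split_fasta, the out_dir parameter and the .fasta suffix are hypothetical choices, not taken from the question.

```python
import os


def split_fasta(fasta_path, out_dir):
    """Write each FASTA record to out_dir/<accession>.fasta; return the paths."""
    written = []
    out_file = None
    with open(fasta_path) as fasta_file:
        for line in fasta_file:
            if line.startswith('>'):
                # A new header starts a new record: close the previous file.
                if out_file is not None:
                    out_file.close()
                # split('|') returns a list; open() needs a single string,
                # so pick the accession field (index 1) out of it.
                accession = line.split('|')[1]
                path = os.path.join(out_dir, accession + '.fasta')
                out_file = open(path, 'w')
                written.append(path)
            # Header and sequence lines both go to the current record's file.
            if out_file is not None:
                out_file.write(line)
    if out_file is not None:
        out_file.close()
    return written
```

Called as split_fasta('/Users/apple/Desktop/123.fasta', some_existing_directory), this would write one file per record, e.g. Q6GZX4.fasta for the header shown in the question.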

How to get the first match from the regex find all function?

I am new to regex and Python. I have to find a keyword in a text file and, once the string is found, extract the only number from it. But the number is printed 6 times. I only need the first result, stored in a variable as an integer. Here is my full code. The string I am looking for in the .txt file is "Lost\n7", and the number I want from it is 7.
import re
with open('test.txt') as f:
    for line in f:
        # Capture one-or-more characters of non-whitespace after the initial match
        # rsrp = re.search(r'RSRP:(\S+)', line)
        packet_loss_search = re.search(r'Lost(\S+)',line)
        # Did we find a match?
        if (packet_loss_search):
            # Yes, process it
            details = packet_loss_search.group(0)
            a=str(details)
            #a=a[-1]
            #print(a)
            temp =re.findall(r'\d+', a)
            res = list(map(int, temp))
            print(res[0])
OUTPUT:
7
7
7
7
7
7
I'd suggest reading the file into memory as a single string if your expected match(es) span(s) across multiple lines. You could fix the code by replacing it with
import re
with open('test.txt', 'r') as f:
    m = re.search(r'Lost\n(\d+)', f.read())
    if m: # Check if there is a match
        print(m.group(1))
Here, f.read() will read the file contents into a single string, and Lost\n(\d+) will match and capture into Group 1 any one or more digits after Lost + a newline char.
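Since the question asks for the first result as an integer in a variable, here is a small sketch of the same search wrapped in a function that works on text already read into memory. The name first_lost_count and the sample text are hypothetical.

```python
import re


def first_lost_count(text):
    """Return the number following the first 'Lost' line as an int, or None."""
    # re.search stops at the first match, so later 'Lost' entries are ignored.
    m = re.search(r'Lost\n(\d+)', text)
    return int(m.group(1)) if m else None


sample = "Sent\n10\nLost\n7\nSent\n10\nLost\n9\n"
packet_loss = first_lost_count(sample)
print(packet_loss)
```

With a file, the call would be first_lost_count(f.read()) inside the with block.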

Error while taking input invalid literal for int() with base 10: '1 2 3'

def kad(l):
    max_c=max_g=l[0]
    for i in range(0,len(l)):
        max_c=max(l[i],l[i]+max_c)
        if(max_c>max_g):
            max_g=max_c
    print(max_c)
    return max_g
t=int(input("test case")) ## TEST CASES
for k in range (0,t):
    n=int(input(" num")) # TOTAL NUMBERS IN EACH TEST CASE
    l=[float(int(input())) for i in range(0,n)]
    if(len(l)>0):
        kad(l)
    print(l)
Error Message
File "/home/dc97fc38c3d1e4a695c9d3550e8af5c1.py", line 16, in <module>
l=[float(int(input())) for i in range(0,n)]
File "/home/dc97fc38c3d1e4a695c9d3550e8af5c1.py", line 16, in <listcomp>
l=[float(int(input())) for i in range(0,n)]
ValueError: invalid literal for int() with base 10: '1 2 3'
The code works fine in my local editor (Jupyter notebook), but the online editor displays this error.
No idea what your function tries to do: for empty lists it raises an error (which you guard against), and for only positive inputs it is a convoluted way of summing up all the numbers, adding the first value twice.
It looks like a mangled solution to a problem from some HackerRank-style site.
You should fix your input like this:
for _ in range(int(input("test case"))): ## TEST CASES
    _ = input() # TOTAL NUMBERS IN EACH TEST CASE
    # the number of numbers does not matter - you get it as len(l) if needed
    l = list(map(float,input().strip().split())) # split the input and parse it to floats
    # ^^^ change to int if you only handle int's - your error suggest so
    if(len(l)>0):
        kad(l)
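The difference between the two ways of reading input can be seen without any interactive input. This is a sketch: parse_numbers is a hypothetical helper name, and the string '1 2 3' stands in for the single line the online editor supplies.

```python
def parse_numbers(line, convert=int):
    """Split one whitespace-separated input line and convert each piece."""
    return [convert(piece) for piece in line.split()]


line = '1 2 3'  # all numbers arrive on one line, not one per input() call

try:
    int(line)  # this is what int(input()) does with the whole line
except ValueError as exc:
    print('int() on the whole line fails:', exc)

print(parse_numbers(line))         # pieces converted to int
print(parse_numbers(line, float))  # pieces converted to float
```

In the notebook the numbers were presumably typed one per prompt, which would explain why the original code worked there.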

Convert number string to number using python 3.6+

I got the value 350,000,000.00 from a database; now I need to convert it to 350000000.
Please provide a solution using Python 3.6+.
Thanks
Let the input be in a variable, say
a="350,000,000.00"
Since the digits are separated by commas, the commas need to be removed.
a.replace(",","")
>>> 350000000.00
The resulting string represents a float, so converting it directly to an integer raises an error.
int(a.replace(",",""))
>>>Traceback (most recent call last):
File "python", line 2, in <module>
ValueError: invalid literal for int() with base 10: '350000000.00'
So, convert the number to float and then to int.
int(float(a.replace(",","")))
>>>350000000
Store the value in a variable, strip the commas, and then convert it to float and on to int.
E.g., if you store the value in variable a, write
int(float(a.replace(",", "")))
def convert(a):
    r = 0
    s = a.split(".")[0]
    for c in s.split(","):
        r = r * 1000 + int(c)
    return r
s = convert("350,000,000.00")
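One more variant, in case the values can get large: going through float is exact only up to 2**53, so a sketch using decimal.Decimal from the standard library avoids that limit. The function name to_int is hypothetical, and the fractional part is simply truncated, which is an assumption about what is wanted.

```python
from decimal import Decimal


def to_int(text):
    """Convert a string such as '350,000,000.00' to an int, dropping the fraction."""
    # Remove the thousands separators, parse exactly, then truncate toward zero.
    return int(Decimal(text.replace(',', '')))


value = to_int("350,000,000.00")
print(value)
```

For the value in the question, this and int(float(...)) give the same result. They only diverge for numbers too large for a float to hold exactly.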
