How to null out exceptions in an htmlChecker - python-3.x

While this is a project assignment for class, I am trying to understand how to do one specific part of it.
I need to go through an HTML file and check that every opening tag is matched by a closing tag. They must also be in the correct order, and this must be checked using a stack I've implemented. Right now I am working on extracting each tag from the file. The tough part seems to be the two exceptions I am working on here: the <br/> and the <meta> tags. I need these tags to be skipped so the program doesn't read them as an opening or closing tag.
class Stack(object):
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        # Remove and return the top item (indexing with [-1] would leave it on the stack)
        return self.items.pop()

def getTag(file):
    EXCEPTIONS = ['br/', 'meta']
    s = Stack()
    balanced = True
    i = 0
    isCopying = False
    currentTag = ''
    isClosing = False
    while i < len(file) and balanced:
        symbol = file[i]
        if symbol == "<":
            if i < (len(file) - 1) and file[i + 1] == "/":
                i = i + 1  # skip the '/' of a closing tag
                isClosing = True
            isCopying = True
        elif symbol == ">":
            if isClosing:
                if s.isEmpty():
                    balanced = False
                elif not matches(s.pop(), currentTag):
                    balanced = False
            else:
                s.push(currentTag)
            currentTag = ''  # start fresh for the next tag
            isCopying = False
            isClosing = False
        elif isCopying:
            currentTag += symbol
        i = i + 1
    return balanced and s.isEmpty()
The code reads in the file and goes character by character looking for <tag>. If it finds one, it pushes it onto the stack. The matches function checks whether the closing tag equals the opening tag. The EXCEPTIONS list holds the tags I have to check for, because they would throw off the placement of tags on the stack. I am having a tough time incorporating them into my code. Any ideas? Before I push onto the stack, I should run the tag through a filter to see whether it is valid. A basic if statement should suffice.
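A minimal sketch of that filter, assuming currentTag holds the text between < and > as in getTag. The helper name is_exception is hypothetical. Treating every tag that ends in / as self-closing is an extra assumption beyond the EXCEPTIONS list, and matching on the first word lets <meta ...> with attributes be skipped too.

```python
# The question's list of tags that never get a closing partner.
EXCEPTIONS = ['br/', 'meta']


def is_exception(tag_text):
    """Return True if tag_text (the text between '<' and '>') should not be pushed."""
    parts = tag_text.split()
    name = parts[0] if parts else ''
    # Skip a listed tag name, or any self-closing tag such as 'img src="a.png" /'.
    return name in EXCEPTIONS or tag_text.endswith('/')


# In getTag, filter right before the push:
#     else:
#         if not is_exception(currentTag):
#             s.push(currentTag)

for tag in ['html', 'br/', 'meta charset="utf-8"', 'body']:
    print(tag, '->', 'skip' if is_exception(tag) else 'push')
```

Keeping the check in a separate function means the stack logic stays unchanged, and the exception list can grow without touching getTag.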

If I read your requirements correctly, you're going about this very awkwardly. What you're really looking to do is tokenize your file, and so the first thing you should do is get all the tokens in your file, and then check to see if it is a valid ordering of tokens.
Tokenization means you parse through your file, find all valid tokens, and put them in an ordered list. A valid token in your case is any substring that starts with < and ends with >. You can safely discard the rest of the information, I think. It would be easiest if you had a Token class to represent your token types.
Once you have that ordered list of tokens it is much easier to determine if they are a 'correct ordering' using your stack:
is_correct_ordering algorithm:
For each element in the list:
    if the element is an open-token, push it onto the stack
    if the element is a close-token:
        if the stack is empty, return false
        if the top element of the stack is the matching open-token:
            pop the top element of the stack
        else return false
    discard any other token
If the stack is NOT empty, return false
Else return true
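Here is the algorithm above as a minimal Python sketch, with a plain list as the stack. It works on raw tag strings such as '<html>' rather than Token objects, and the OTHER_TAGS set is a hypothetical stand-in that mirrors the question's EXCEPTIONS list.

```python
# Hypothetical: tag names treated as other-tokens (mirrors the question's EXCEPTIONS).
OTHER_TAGS = {'br', 'meta'}


def is_correct_ordering(tokens):
    """Return True if the tag strings in `tokens` are properly nested."""
    stack = []
    for tok in tokens:
        inner = tok.strip('<>').strip()
        if not inner:
            continue  # '<>' carries no tag: treat it as an other-token
        if inner.startswith('/'):
            # close-token: the stack top must be the matching open-token
            if not stack or stack[-1] != inner[1:].strip():
                return False
            stack.pop()
        elif inner.endswith('/') or inner.split()[0] in OTHER_TAGS:
            continue  # other-token (self-closing or listed): discard
        else:
            stack.append(inner.split()[0])  # open-token: push its tag name
    return not stack  # leftover open-tokens mean the ordering is wrong


print(is_correct_ordering(['<html>', '<body>', '</body>', '</html>']))
print(is_correct_ordering(['<html>', '<body>', '</html>', '</body>']))
```

Pushing only the tag name (the first word) means attributes on an opening tag do not interfere with matching its closing tag.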
Naturally, having a reasonable Token class structure makes things easy:
class Token:
    def __init__(self, text: str):
        self.text = text  # the raw tag, e.g. '<html>'

    def matches(self, t: 'Token') -> bool:
        pass  # TODO Implement

    @classmethod
    def tokenize(cls, token_string: str) -> 'Token':
        pass  # TODO Implement to return the proper subclass instantiation of the given string

class OpenToken(Token):
    pass

class CloseToken(Token):
    pass

class OtherToken(Token):
    pass
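One way the TODOs might be filled in, as a sketch rather than the answer's own implementation. VOID_TAGS is a hypothetical subset of HTML's void elements. The name() helper and the decision to treat <!...> (comments, DOCTYPE) as other-tokens are assumptions added here.

```python
# Hypothetical subset of HTML void elements (tags that never get a closing tag).
VOID_TAGS = {'br', 'hr', 'img', 'input', 'link', 'meta'}


class Token:
    def __init__(self, text):
        self.text = text  # raw tag text, e.g. '<html>'

    def name(self):
        """Tag name without brackets, slash, or attributes: '</Body>' -> 'body'."""
        inner = self.text.strip('<>').strip().lstrip('/').strip()
        return inner.split()[0].rstrip('/').lower() if inner else ''

    def matches(self, t: 'Token') -> bool:
        return False  # only an OpenToken can be matched (see below)

    @classmethod
    def tokenize(cls, token_string: str) -> 'Token':
        inner = token_string.strip('<>').strip()
        if inner.startswith('/'):
            return CloseToken(token_string)
        # Self-closing tags, void elements, and <!...> never get a closing tag.
        if (inner.endswith('/') or inner.startswith('!')
                or Token(token_string).name() in VOID_TAGS):
            return OtherToken(token_string)
        return OpenToken(token_string)

    def __repr__(self):
        return '{}({!r})'.format(type(self).__name__, self.text)


class OpenToken(Token):
    def matches(self, t: 'Token') -> bool:
        # An open-token is matched by a close-token with the same tag name.
        return isinstance(t, CloseToken) and t.name() == self.name()


class CloseToken(Token):
    pass


class OtherToken(Token):
    pass


for text in ['<html>', '<meta src="lala"/>', '<br/>', '</html>']:
    print(Token.tokenize(text))
```

Putting matches on OpenToken lets the checking loop call top.matches(token) without inspecting types itself.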
This breaks the challenge into two parts: first parsing the file for all valid tokens (easy to validate because you can hand-compare your ordered list with what you see in the file) and then validating that the ordered list is correct. Note that here, too, you can simplify what you're working on by delegating work to a sub-routine:
def tokenize_file(file: str) -> list:
    token_list = []
    i = 0
    while i < len(file):
        start = file.find('<', i)  # skip any text between tags
        if start == -1:
            break  # no more tags
        token_string, token_end = get_token(file[start:])
        token_list.append(Token.tokenize(token_string))
        i = start + token_end + 1  # Skip past the '>' that ended this token
    return token_list

def get_token(file: str) -> tuple:
    # Note this is a naive implementation. Consider the edge case:
    # <img src="Valid string with >">
    token_string = ""
    for x in range(len(file)):
        token_string += file[x]
        if file[x] == '>':
            return token_string, x
    # If the file ends before a closing '>' is found, there is no complete token.
    raise ValueError("unterminated tag: " + file)
The above should turn something like this:
<html>Blah<meta src="lala"/><body><br/></body></html>
Into:
[OpenToken('<html>'),
OtherToken('<meta src="lala"/>'),
OpenToken('<body>'),
OtherToken('<br/>'),
CloseToken('</body>'),
CloseToken('</html>')]
Which can be much more easily handled to determine correctness.
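For the first step, a regular expression is a common shortcut for producing the token strings. This swaps in Python's re module instead of the answer's character loop, and it shares the same naive edge case: a > inside an attribute value ends the match early. The helper name get_token_strings is hypothetical. Each resulting string could then be passed to Token.tokenize.

```python
import re

# Naive pattern: '<', then anything except '>', then '>'.
TAG_RE = re.compile(r'<[^>]*>')


def get_token_strings(text):
    """Return every <...> substring of text, in document order."""
    return TAG_RE.findall(text)


print(get_token_strings('<html>Blah<meta src="lala"/><body><br/></body></html>'))
```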
Obviously this isn't a complete implementation of your problem, but hopefully it helps straighten out the awkwardness of your current approach.

Related

Wrapping WriteToText Within DoFn

I'm trying to wrap WriteToText within a DoFn to allow for some customization/flexibility in how I write files. Specifically, I want to write different files based on an argument/input (a value provider argument). This is the code I have so far:
class WriteCustomFile(beam.DoFn):
    def __init__(self, input, output):
        self.input = input
        self.output = output

    def process(self, element):
        import re

        def FileVal(path):
            File1Regex = re.compile(r"[^\w](testfile)[\w]+(\.csv|\.txt)$")
            File2Regex = re.compile(r"[^\w](tester)[\w-]+(\.csv|\.txt)$")
            PathStr = str(path)
            if File1Regex.search(PathStr) != None:
                return "file1"
            elif File2Regex.search(PathStr) != None:
                return "file2"

        File1Header = "Header1,Header2,Header3,Header4,Header5"
        File2Header = "Header1,Header2,Header3,Header4,Header5,Header6,Header7,Header8"
        if FileVal(self.input.get()) == "file1":
            yield WriteToText(self.output.get(), shard_name_template='', header=File1Header)
        elif FileVal(self.input.get()) == "file2":
            yield WriteToText(self.output.get(), shard_name_template='', header=File2Header)
When I call this DoFn from within the pipeline, it does not write a file. What can I do to get this DoFn to work or is there a better way to handle this?
Thank you!
Here the best thing to do is probably to partition your input into multiple PCollections (either using Partition or a DoFn with multiple outputs) and write each one out separately.
More generally, one can use Dynamic Destinations, but at the time of writing this was not yet supported for Python.
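A minimal sketch of the Partition approach, assuming (hypothetically) that the two row layouts can be told apart by field count: 5 fields go to the file1 layout and everything else goes to file2. route_row and write_split are illustrative names. The Beam import sits inside write_split only so the routing rule can be tried without Beam installed; in a real pipeline module you would import it at the top.

```python
FILE1_HEADER = "Header1,Header2,Header3,Header4,Header5"
FILE2_HEADER = "Header1,Header2,Header3,Header4,Header5,Header6,Header7,Header8"


def route_row(row, num_partitions):
    """Partition function: return the index of the output PCollection for this row.

    Hypothetical rule: 5 comma-separated fields -> file1, anything else -> file2.
    """
    return 0 if len(row.split(",")) == 5 else 1


def write_split(rows, file1_path, file2_path):
    """Split a PCollection of CSV rows in two and write each with its own header."""
    import apache_beam as beam  # local import: route_row stays usable without Beam

    file1_rows, file2_rows = rows | "SplitByLayout" >> beam.Partition(route_row, 2)
    file1_rows | "WriteFile1" >> beam.io.WriteToText(
        file1_path, shard_name_template="", header=FILE1_HEADER)
    file2_rows | "WriteFile2" >> beam.io.WriteToText(
        file2_path, shard_name_template="", header=FILE2_HEADER)
```

Inside a pipeline you would read the rows (for example with beam.io.ReadFromText) and pass the resulting PCollection to write_split. The writes are then ordinary pipeline transforms instead of objects yielded from a DoFn, which is why they actually produce files.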

Python Try Except when a list is null

I've been searching for my problem here, but I can't find the exact answer to it.
I call a SymPy function, solve(). This function can return a non-empty list or an empty list.
I call this piece of code inside a while loop:
try:
    sol = solve([eq1,eq2],[r,s])
    rB = bin(abs(sol[0][0]))
    sB = bin(abs(sol[0][1]))
    stop = True
    r = rB[2:len(rB)]
    s = sB[2:len(sB)]
    P = int("0b"+r+s,2)
    Q = int("0b"+s+r,2)
    print(P*Q == pubKey.n)
    print("P = {}".format(P))
    print("Q = {}".format(Q))
    break
except ValueError:
    pass
What I want is this: if solve() returns an empty list, just pass; if it returns a non-empty list, continue with the execution. solve() will keep returning an empty list until I find the right value.
This can be detected by checking sol[0][0]: with a non-empty list this works, but with an empty list it throws an error (an IndexError), which I want to catch and pass over.
What happens now is that when sol is empty, it tries to get sol[0][0], and of course this throws an error that isn't caught by the try, and the whole program stops.
Does anyone know a solution? Am I not using try correctly?
Set sol to some value at the beginning of each loop iteration and check it in the except clause.
About else:
try/except has an else clause, which runs if the try block did not raise an exception,
and for has an else clause, which runs when the loop was not broken out of!
for foo in iterable:
    # set value so the name will be available
    # can be set prior to the loop, but this clears it on each iteration
    # which seems more desirable for your case
    sol = None
    try:
        "logic here"
    except Exception:
        if isinstance(sol, list):
            "case where sol is a list and not None"
        # pass is implied
    else:  # did not raise an Exception
        break
else:  # did not break out of for loop
    raise Exception("for loop was not broken out of!")
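The question's except ValueError does not catch the failure, because indexing an empty list raises IndexError. The sketch below shows two ways to skip empty results: check the list before indexing it, or catch IndexError explicitly. It uses a hypothetical fake_solve in place of SymPy's solve() so it runs without SymPy.

```python
def fake_solve(attempt):
    """Hypothetical stand-in for sympy's solve(): empty until attempt 3."""
    return [(5, 3)] if attempt == 3 else []


def search_lbyl(max_attempts=10):
    """Check for an empty list before indexing it."""
    for attempt in range(max_attempts):
        sol = fake_solve(attempt)
        if not sol:
            continue  # nothing found yet: try the next value
        return attempt, sol[0][0], sol[0][1]
    return None


def search_eafp(max_attempts=10):
    """Index first and catch the exception an empty list actually raises."""
    for attempt in range(max_attempts):
        sol = fake_solve(attempt)
        try:
            r, s = sol[0][0], sol[0][1]
        except IndexError:  # sol[0] on [] raises IndexError, not ValueError
            continue
        return attempt, r, s
    return None


print(search_lbyl(), search_eafp())
```

Keeping the try block down to the indexing line avoids silently swallowing unrelated errors from the rest of the computation.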

How to print class variables in a list

So I am very new to coding and started with Python. I am trying to build a class in a program that puts together a DnD party by randomising their attributes. So far I can get the program to initialise instances of the party members and prompt the user for how many of the heroes they would like in their party. My issue is that after setting up the lists and getting everything in place, I am unable to print any of the attributes of the individual heroes, whether I call them from within the lists or try to print them directly. I have tried using __str__ to create strings of the attributes, but I am clearly missing something. Any help would be greatly appreciated.
import random

class Party:
    def __init__(self, name="", race="", alignment="", class_=""):
        self.name = name
        while name == "":
            name = random.choice(names)
        # print(name)
        self.race = race
        while race == "":
            race = random.choice(races)
        # print(race)
        self.alignment = alignment
        while alignment == "":
            alignment = random.choice(alignments)
        # print(alignment)
        self.class_ = class_
        while class_ == "":
            class_ = random.choice(classes)
        # print(class_)

    def character_stats(self):
        return "{} - {} - {} - {}".format(self.name, self.race, self.class_, self.alignment)
Each attribute pulls a random value from a list. My format statement is my latest attempt to get the values of the attributes to print, rather than the object itself.
I apologise if any of the terminology is wrong; I'm very, very new to this.
You never assign anything to the attributes except the input, which in this case is the empty string "". Reduced to a single attribute, your constructor is:
class Party:
    def __init__(self, name=""):
        self.name = name
        while name == "":
            name = random.choice(names)
After you randomly pick a new name from names, you should assign it to self.name; otherwise the local variable just goes out of scope when the __init__ method finishes. This code snippet should work:
class Party:
    def __init__(self, name=""):
        while name == "":
            name = random.choice(names)
        # Now we assign the local variable as
        # an attribute
        self.name = name
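Putting the fix together for all four attributes, plus a __str__ so that print() shows the values. This is a sketch: the names, races, alignments and classes lists are hypothetical, since the question's lists are not shown. `name or random.choice(names)` replaces each while loop; it picks a random value only when the argument is the empty string.

```python
import random

# Hypothetical option lists; the question's real lists are not shown.
names = ["Arwen", "Borin", "Cade"]
races = ["Elf", "Dwarf", "Human"]
alignments = ["Lawful Good", "Chaotic Neutral"]
classes = ["Wizard", "Fighter", "Rogue"]


class Party:
    def __init__(self, name="", race="", alignment="", class_=""):
        # Choose a random value only when none was supplied, then store the
        # final value on the instance so it outlives __init__.
        self.name = name or random.choice(names)
        self.race = race or random.choice(races)
        self.alignment = alignment or random.choice(alignments)
        self.class_ = class_ or random.choice(classes)

    def __str__(self):
        # Used by print(member) and str(member) instead of <Party object at ...>
        return "{} - {} - {} - {}".format(self.name, self.race, self.class_, self.alignment)


party = [Party() for _ in range(3)]
for member in party:
    print(member)
```

Note that print(party) on the list itself uses each element's __repr__, not __str__. Either loop over the members as above or also define __repr__ if you want the list to print readably.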

Python3 verify if List Items are contained in read() result

I want to verify whether items from a list are contained in what I fetch by using string.read().
How do I do this:
if string.find(lisst):
    do_whatever()
elif string.find(lisst2):
    do_something_else()
The example is pretty basic, but that's all I want to do. I keep getting an invalid syntax error. :(
def verify(text):
    lisst = ['awesome','failed','trolling']
    lisst2 = ['boring','bad']
    s = requests.get(text)
    t = s.read()
    if t.find(lisst):
        print("Someone was awesome, failing or trolling!")
    elif t.find(lisst2)
        print("Something boring or bad happened")
The error is thrown at the elif line, so I need a workaround:
elif any(n in t for n in lisst2):
^
SyntaxError: invalid syntax
Thank you in advance!
If I understood correctly, you want to do something in an if block if a string contains any of the elements of a list.
if any(item in aVeryLongStringData for item in myList):
    doStuff()
If you want to check if your data contains all of the elements on your list, you can just change "any" to "all"
if all(item in aVeryLongStringData for item in myList):
    doStuff()
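Applied to the question's verify(), a sketch might look like the code below. It fixes three things in the posted code. The elif t.find(lisst2) line is missing its trailing colon, which is a syntax error. str.find() takes a string, not a list. And a requests Response exposes its body as .text rather than through a read() method. The function name classify_text is hypothetical; it takes the already-fetched text (for example requests.get(url).text) so it can run without a network call.

```python
def classify_text(text):
    """Return a message for the first keyword group found in text, else None."""
    lisst = ['awesome', 'failed', 'trolling']
    lisst2 = ['boring', 'bad']
    if any(word in text for word in lisst):
        return "Someone was awesome, failing or trolling!"
    elif any(word in text for word in lisst2):  # note the trailing colon
        return "Something boring or bad happened"
    return None


print(classify_text("that was awesome"))
```

Because `in` does substring matching on strings, 'bad' also matches words such as 'badge'. Split the text into words first if you need whole-word matches.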

Write a recursive function to list all paths of parts.txt

Write a function list_files_recursive that returns a list of the paths of all the parts.txt files without using the os module's walk generator. Instead, the function should use recursion. The input will be a directory name.
Here is the code I have so far. I think it's basically right, but the output is not one single list:
def list_files_recursive(top_dir):
    rec_list_files = []
    list_dir = os.listdir(top_dir)
    for item in list_dir:
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            list_files_recursive(item_path)
        else:
            if os.path.basename(item_path) == 'parts.txt':
                rec_list_files.append(os.path.join(item_path))
    print(rec_list_files)
    return rec_list_files
This is part of the output I'm getting (from the print statement):
['CarItems/Honda/Accord/1996/parts.txt']
[]
['CarItems/Honda/Odyssey/2000/parts.txt']
['CarItems/Honda/Odyssey/2002/parts.txt']
[]
So the problem is that it's not one list, and there are empty lists in there. I don't quite know why this isn't working, and I have tried everything to work through it. Any help is much appreciated!
This is very close, but the issue is that list_files_recursive's child calls don't pass results back to the parent. One way to do this is to concatenate all of the lists together from each child call, or to pass a reference to a single list all the way through the call chain.
Note that in rec_list_files.append(os.path.join(item_path)), there's no point in calling os.path.join with a single argument. print(rec_list_files) should be omitted: it's a side effect that makes the output confusing to interpret, so only print in the caller. Additionally,
else:
    if ... :
can be written more clearly here as elif, since they're logically equivalent. It's always a good idea to reduce the nesting of conditionals whenever possible.
Here's the approach that works by extending the parent list:
import os

def list_files_recursive(top_dir):
    files = []
    for item in os.listdir(top_dir):
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            files.extend(list_files_recursive(item_path))
            # ^^^^^^ add child results to parent
        elif os.path.basename(item_path) == "parts.txt":
            files.append(item_path)
    return files

if __name__ == "__main__":
    print(list_files_recursive("foo"))
Or by passing a result list through the call tree:
import os

def list_files_recursive(top_dir, files=None):
    # Use None rather than [] as the default: a mutable default list is created
    # once and shared, so results would pile up across separate top-level calls.
    if files is None:
        files = []
    for item in os.listdir(top_dir):
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            list_files_recursive(item_path, files)
            # ^^^^^ pass our result list recursively
        elif os.path.basename(item_path) == "parts.txt":
            files.append(item_path)
    return files

if __name__ == "__main__":
    print(list_files_recursive("foo"))
A major problem with these functions is that they only work for finding files named precisely parts.txt, since that string literal is hard-coded. That makes them pretty much useless for anything but the immediate purpose. We should add a parameter that lets the caller specify the target file to search for, making the function general-purpose.
Another problem is that the function doesn't do what its name claims: list_files_recursive should really be called find_file_recursive, or, due to the hardcoded string, find_parts_txt_recursive.
Beyond that, the function is a strong candidate for turning into a generator function, which is a common Python idiom for traversal, particularly for situations where the subdirectories may contain huge amounts of data that would be expensive to keep in memory all at once. Generators also allow the flexibility of using the function to cancel the search after the first match, further enhancing its (re)usability.
The yield keyword also makes the function code itself very clean: we can avoid keeping a result data structure entirely and just emit result items on demand.
Here's how I'd write it:
import os

def find_file_recursive(top_dir, target):
    for item in os.listdir(top_dir):
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            yield from find_file_recursive(item_path, target)
        elif os.path.basename(item_path) == target:
            yield item_path

if __name__ == "__main__":
    print(list(find_file_recursive("foo", "parts.txt")))
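A small self-contained demo of the generator above. It repeats the function so the block runs on its own and builds a throwaway directory tree. The make_tree helper and its layout (modelled on the question's CarItems output) are hypothetical. The demo shows both collecting every match and stopping after the first one with next().

```python
import os
import tempfile


def find_file_recursive(top_dir, target):
    for item in os.listdir(top_dir):
        item_path = os.path.join(top_dir, item)
        if os.path.isdir(item_path):
            yield from find_file_recursive(item_path, target)
        elif os.path.basename(item_path) == target:
            yield item_path


def make_tree(root):
    """Build a small hypothetical CarItems-style tree for the demo."""
    for rel in ["Honda/Accord/1996", "Honda/Odyssey/2000", "Honda/Odyssey/2002"]:
        d = os.path.join(root, *rel.split("/"))
        os.makedirs(d)
        with open(os.path.join(d, "parts.txt"), "w") as f:
            f.write("demo\n")


with tempfile.TemporaryDirectory() as root:
    make_tree(root)
    # os.listdir order is arbitrary, so sort when a stable order matters.
    all_matches = sorted(find_file_recursive(root, "parts.txt"))
    # next() pulls only the first match; the rest of the tree is never walked.
    first = next(find_file_recursive(root, "parts.txt"), None)

print(len(all_matches), first is not None)
```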
