Using HTMLParser in Python 3.2 - python-3.x

I have been using HTMLParser to scrape data from websites and strip the HTML markup while doing so. I'm aware of various modules such as Beautiful Soup, but decided not to depend on "outside" modules. I'm using code supplied by Eloff in "Strip HTML from strings in Python":
from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()
It worked in Python 3.1. However, I recently upgraded to Python 3.2.x and now get errors from the HTMLParser code above.
My first error points to the line:
s.feed(html)
... and the error says ...
AttributeError: 'MLStripper' object has no attribute 'strict'
So, after a bit of research, I added "strict=True" to the class line, making it...
class MLStripper(HTMLParser, strict=True):
However, I get the new error of:
TypeError: type() takes 1 or 3 arguments
To see what would happen, I removed the "self" argument and left in "strict=True", which gave the error:
NameError: global name 'self' is not defined
... and I got the "I'm guessing on guesses" feeling.
I have no idea what the third argument in the class MLStripper(HTMLParser) line should be, after self and strict=True, and my research didn't shed any light on it.

You're subclassing HTMLParser, but you aren't calling its __init__ method. You need to add one line to your __init__ method:
def __init__(self):
    super().__init__()
    self.reset()
    self.fed = []
Also, for Python 3, the import line is:
from html.parser import HTMLParser
With these changes, a simple example works. Don't change the class line; that's not related.
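Putting both fixes together, here is a minimal sketch of the complete stripper, assuming Python 3.5 or later (where convert_charrefs defaults to True, so an entity such as &amp; is decoded before handle_data sees it). Two small departures from the answer: the explicit self.reset() is dropped because HTMLParser.__init__ already calls reset(), and strip_tags also calls close() to flush any text still buffered at the end of the input.

```python
from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()  # sets up HTMLParser's internal state (it calls reset() itself)
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)  # collect the text found between tags

    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text still buffered at the end of the input
    return s.get_data()

print(strip_tags('<p>Hello <b>world</b>!</p>'))  # Hello world!
print(strip_tags('<p>Fish &amp; chips</p>'))     # Fish & chips
```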

Related

Using singledispatch with custom class (CPython 3.8.2)

Let's say I want to register a function for each class in a module named 'MacroMethods'. After seeing singledispatch in 'Fluent Python', I set it up like this:
@singledispatch
def addMethod(self, obj):
    print(f'Wrong Object {str(obj)} supplied.')
    return obj
...
@addMethod.register(MacroMethods.Wait)
def _(self, obj):
    print('adding object wait')
    obj.delay = self.waitSpin.value
    obj.onFail = None
    obj.onSuccess = None
    return obj
The desired behavior: when an instance of class 'MacroMethods.Wait' is passed as the argument, singledispatch runs the function registered for that class.
Instead, it runs the default function rather than the registered one.
>>> Wrong Object <MacroMethods.Wait object at 0x0936D1A8> supplied.
However, type() clearly shows the instance is of class 'MacroMethods.Wait', and the registry's dict_keys also contains it.
>>> dict_keys([<class 'object'>, ..., <class 'MacroMethods.Wait'>])
I suspect all the custom classes I made count as type 'object', so the desired functions never run.
Is there any way to solve this problem? The entire code is here.
Update
I've managed to mimic singledispatch's behavior as follows:
from functools import wraps

def state_deco(func_main):
    """
    Decorator that mimics singledispatch for ease of interaction expansions.
    """
    # assuming no args are needed for interaction functions.
    func_main.dispatch_list = {}  # collect decorated functions

    @wraps(func_main)
    def wrapper(target):
        # dispatch target to destination interaction function.
        nonlocal func_main
        try:
            # find and run callable for target
            return func_main.dispatch_list[type(target)]()
        except KeyError:
            # If no matching case found, main decorated function will run instead.
            return func_main()

    def register(target):
        # A decorator that registers the decorated function with the main decorated function.
        def decorate(func_sub):
            nonlocal func_main
            func_main.dispatch_list[target] = func_sub
            def register_wrapper(*args, **kwargs):
                return func_sub(*args, **kwargs)
            return register_wrapper
        return decorate

    wrapper.register = register
    return wrapper
Used like:
@state_deco
def general():
    return "A's reaction to undefined others."

@general.register(StateA)
def _():
    return "A's reaction of another A"

@general.register(StateB)
def _():
    return "A's reaction of B"
But it's still not singledispatch, so I think it might be inappropriate to post this as an answer.
I wanted to do something similar and had the same trouble. It looks like we have run into a Python bug. I found a write-up that describes this situation.
Here is the link to the Python Bug Tracker.
Python 3.7 breaks on singledispatch_function.register(pseudo_type), which Python 3.6 accepted
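One more thing that may be worth checking: functools.singledispatch chooses the implementation from the type of the first positional argument only. If addMethod(self, obj) is used as a method, that first argument is self, so dispatch happens on the type of the owning object rather than on obj, and the default runs; this would also fit the "Wrong Object" output above. Python 3.8 added functools.singledispatchmethod, which dispatches on the first argument after self. A minimal sketch, in which Wait and MacroEditor are hypothetical stand-ins for the asker's classes (the real owner class isn't shown) and a plain delay value replaces self.waitSpin.value:

```python
from functools import singledispatchmethod

class Wait:  # hypothetical stand-in for MacroMethods.Wait
    pass

class MacroEditor:  # hypothetical owner class; the question doesn't show the real one
    def __init__(self, delay):
        self.delay = delay  # stands in for self.waitSpin.value

    @singledispatchmethod
    def addMethod(self, obj):
        # default: runs when no implementation is registered for type(obj)
        print(f'Wrong Object {obj} supplied.')
        return obj

    @addMethod.register(Wait)
    def _(self, obj):
        # dispatch happens on obj (the first argument after self), not on self
        print('adding object wait')
        obj.delay = self.delay
        obj.onFail = None
        obj.onSuccess = None
        return obj

editor = MacroEditor(delay=5)
w = editor.addMethod(Wait())  # prints: adding object wait
editor.addMethod(42)          # prints: Wrong Object 42 supplied.
```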

Code incompatibility issues - Python 2.x / Python 3.x

I have this code:
from abc import ABCMeta, abstractmethod

class Instruction(object):
    __metaclass__ = ABCMeta

    def __init__(self, identifier_byte):
        # type: (int) -> None
        self.identifier_byte = identifier_byte

    @abstractmethod
    def process(self):
        print("Identifier byte: ()".format(self.identifier_byte))

class LDAInstruction(Instruction):
    def process(self):
        super().process()
which works fine with Python 3.2 but not with 2.6. Then, based on this topic: "TypeError: super() takes at least 1 argument (0 given) error is specific to any python version?"
I changed the last line to:
super(Instruction,self).process()
which causes this error message on this exact line:
AttributeError: 'super' object has no attribute 'process'
It seems to me that there is a "process" method available for the super call. Is Python saying that "super" is an independent object, unrelated to Instruction? If so, how can I tell it that super should only invoke the base class constructor?
If not, how should I proceed? Thanks for any ideas.
You're passing the wrong class to super in your call. You need to pass the class you're making the call from, not the base class. Change it to this and it should work:
super(LDAInstruction, self).process()
It's unrelated to your main error, but I'd further note that the base-class implementation of process probably has an error with its attempt at string formatting. You probably want {0} instead of () in the format string. In Python 2.7 and later, you could omit the 0, and just use {}, but for Python 2.6 you have to be explicit.
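Combining both fixes, here is a sketch intended to run on Python 2.6+ and Python 3. Two assumptions go beyond the original code: process() also returns the message (only so it can be checked), and the abstract base is created by calling ABCMeta directly (_InstructionBase is a hypothetical helper name), because Python 3 ignores a __metaclass__ class attribute, so @abstractmethod in the original was never enforced there.

```python
from abc import ABCMeta, abstractmethod

# Calling the metaclass directly gives an ABC base on both 2.x and 3.x
# (hypothetical helper name; Python 3 ignores a __metaclass__ attribute).
_InstructionBase = ABCMeta('_InstructionBase', (object,), {})

class Instruction(_InstructionBase):
    def __init__(self, identifier_byte):
        self.identifier_byte = identifier_byte

    @abstractmethod
    def process(self):
        # {0} is required on 2.6; 2.7+ and 3.1+ also accept {}
        msg = "Identifier byte: {0}".format(self.identifier_byte)
        print(msg)
        return msg

class LDAInstruction(Instruction):
    def process(self):
        # Pass the class this call is made from, so the MRO search
        # starts *after* LDAInstruction and finds Instruction.process.
        return super(LDAInstruction, self).process()

LDAInstruction(0xA9).process()  # prints: Identifier byte: 169
```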

How to override class attribute access in python3

I am trying to override class attribute access in Python 3. I found this question already answered for Python 2, but the same approach does not work with Python 3. Please help me understand why it does not work with Python 3 and how to get it to work.
Here is the code I am trying to verify in Python 3:
class BooType(type):
    def __getattr__(self, attr):
        print(attr)
        return attr

class Boo(object):
    __metaclass__ = BooType

boo = Boo()
Boo.asd  # Raises AttributeError in Python 3, whereas in Python 2 this prints 'asd'
From http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Metaprogramming.html:
Python 3 changes the metaclass hook. It doesn’t disallow the __metaclass__ field, but it ignores it. Instead, you use a keyword argument in the base-class list:
In your case, you have to change it to:
class Boo(object, metaclass=BooType):
    pass
and that works. This syntax isn't compatible with Python 2, though.
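A minimal check of the Python 3 form. Note that the metaclass __getattr__ is consulted only for lookups on the class object itself, and only after normal lookup fails; a missing attribute on an instance still raises AttributeError, because instance lookup does not go through the metaclass. The first parameter is renamed cls purely for readability.

```python
class BooType(type):
    # Called only when normal attribute lookup on the class (e.g. Boo) fails
    def __getattr__(cls, attr):
        print(attr)
        return attr

class Boo(object, metaclass=BooType):
    pass

print(Boo.asd)  # __getattr__ prints asd, then this line prints asd again

try:
    Boo().asd  # instance lookup never consults the metaclass
except AttributeError as exc:
    print('instance lookup failed:', exc)
```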
There's a way to create compatible code, seen in http://python-future.org/compatible_idioms.html#metaclasses
# Python 2 and 3:
from six import with_metaclass
# or
from future.utils import with_metaclass

class Boo(with_metaclass(BooType, object)):
    pass

classes in Python: AttributeError

I'm writing simple scripts for test automation using Selenium WebDriver in Python, but the issue relates to Python, not to Selenium.
There are two classes, FindByXPATH_1 (base) and FindByXPATH_2 (derived). I want to access the attribute "driver" of the base class in a method of FindByXPATH_2, but when I run the code, an AttributeError shows up: "type object 'FindByXPATH_1' has no attribute 'driver'"
Here is the code:
from selenium import webdriver

class FindByXPATH_1():
    def __init__(self):
        self.driver_location = '/usr/local/bin/chromedriver'
        self.driver = webdriver.Chrome(self.driver_location)
        self.driver.get('https://letskodeit.teachable.com/p/practice')

from selenium.webdriver.common.by import By
from basics.xpath_1 import FindByXPATH_1
import basics  # the classes are in two different Python files

class FindByXPATH_2(FindByXPATH_1):
    def __init__(self):
        FindByXPATH_1.__init__(self)

    def find_by_starts_with(self):
        starting_with = FindByXPATH_1.driver.find_elements(By.XPATH,
            '//div[@class="view-school"]//h3[starts-with(@class, "subtitle")]')
        print(len(starting_with))

test = FindByXPATH_2()
test.find_by_starts_with()
After running the code I get a message "AttributeError: type object 'FindByXPATH_1' has no attribute 'driver'"
How can I call that attribute?
In this line here:
starting_with = FindByXPATH_1.driver.find_elements(By.XPATH,
    '//div[@class="view-school"]//h3[starts-with(@class, "subtitle")]')
You should be calling self.driver.find_elements; otherwise you are trying to access a class attribute of FindByXPATH_1 rather than the instance attribute driver.
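To see the difference without Selenium, here is a sketch in which a hypothetical FakeDriver class stands in for webdriver.Chrome, and Base/Derived play the roles of FindByXPATH_1/FindByXPATH_2. The driver is created inside __init__, so it is stored on each instance and never on the class.

```python
class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome, so this runs without Selenium."""
    def find_elements(self, by, value):
        return ['match-1', 'match-2']

class Base:
    def __init__(self):
        self.driver = FakeDriver()  # instance attribute, created per object

class Derived(Base):  # inherits Base.__init__, so each instance gets a driver
    def find_by_starts_with(self):
        # self.driver finds the attribute Base.__init__ stored on this instance
        return self.driver.find_elements('xpath', '//h3')

    def broken(self):
        # Base.driver looks on the class object, which never had 'driver'
        return Base.driver.find_elements('xpath', '//h3')

d = Derived()
print(len(d.find_by_starts_with()))  # 2
try:
    d.broken()
except AttributeError as exc:
    print(exc)
```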

Wrapping all possible method calls of a class in a try/except block

I'm trying to wrap all methods of an existing class (not of my creation) in a try/except suite. It could be any class, but I'll use the pandas.DataFrame class here as a practical example.
So if the invoked method succeeds, we simply move on. But if it raises an exception, the exception is appended to a list for later inspection/discovery (although the example below just prints it for simplicity).
(Note that the kinds of data-related exceptions that can occur when a method is invoked on the instance aren't yet known; that's the reason for this exercise: discovery.)
This post was quite helpful (particularly @martineau's Python 3 answer), but I'm having trouble adapting it. Below, I expected the second call to the (wrapped) info() method to emit print output, but sadly it doesn't.
#!/usr/bin/env python3
import functools, sys, types, pandas

def method_wrapper(method):
    @functools.wraps(method)
    def wrapper(*args, **kwargs):  # Note: args[0] points to 'self'.
        try:
            print('Calling: {}.{}()... '.format(args[0].__class__.__name__,
                                                method.__name__))
            return method(*args, **kwargs)
        except Exception:
            print('Exception: %r' % (sys.exc_info(),))  # Something trivial.
            # <Actual code would append that exception info to a list>.
    return wrapper

class MetaClass(type):
    def __new__(mcs, class_name, base_classes, classDict):
        newClassDict = {}
        for attributeName, attribute in classDict.items():
            if type(attribute) == types.FunctionType:  # Replace it with a
                attribute = method_wrapper(attribute)   # decorated version.
            newClassDict[attributeName] = attribute
        return type.__new__(mcs, class_name, base_classes, newClassDict)

class WrappedDataFrame2(MetaClass('WrappedDataFrame',
                                  (pandas.DataFrame, object,), {}),
                        metaclass=type):
    pass

print('Unwrapped pandas.DataFrame().info():')
pandas.DataFrame().info()
print('\n\nWrapped pandas.DataFrame().info():')
WrappedDataFrame2().info()
print()
This outputs:
Unwrapped pandas.DataFrame().info():
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrame
Wrapped pandas.DataFrame().info(): <-- Missing print statement after this line.
<class '__main__.WrappedDataFrame2'>
Index: 0 entries
Empty WrappedDataFrame2
In summary,...
>>> unwrapped_object.someMethod(...)
# Should be mirrored by ...
>>> wrapping_object.someMethod(...)
# Including signature, docstring, etc. (i.e. all attributes); except that it
# executes inside a try/except suite (so I can catch exceptions generically).
Long time no see. ;-) In fact, it's been so long that you may no longer care, but in case you (or others) do...
Here's something I think will do what you want. I've never answered your question before now because I don't have pandas installed on my system. However, today I decided to see if there was a workaround for not having it and created a trivial dummy module to mock it (only as far as I needed). Here's the only thing in it:
mockpandas.py:
""" Fake pandas module. """

class DataFrame:
    def info(self):
        print('pandas.DataFrame.info() called')
        raise RuntimeError('Exception raised')
Below is code that seems to do what you need by implementing @Blckknght's suggestion of iterating through the MRO (but ignoring the limitations noted in his answer that could arise from doing it that way). It ain't pretty, but as I said, it seems to work with at least the mocked pandas library I created.
import functools
import mockpandas as pandas  # mock the library
import sys
import traceback
import types

def method_wrapper(method):
    @functools.wraps(method)
    def wrapper(*args, **kwargs):  # Note: args[0] points to 'self'.
        try:
            print('Calling: {}.{}()... '.format(args[0].__class__.__name__,
                                                method.__name__))
            return method(*args, **kwargs)
        except Exception:
            print('An exception occurred in the wrapped method {}.{}()'.format(
                args[0].__class__.__name__, method.__name__))
            traceback.print_exc(file=sys.stdout)
            # (Actual code would append that exception info to a list)
    return wrapper

class MetaClass(type):
    def __new__(meta, class_name, base_classes, classDict):
        """ See if any of the base classes were created by with_metaclass() function. """
        marker = None
        for base in base_classes:
            if hasattr(base, '_marker'):
                marker = getattr(base, '_marker')  # remember class name of temp base class
                break  # quit looking

        if class_name == marker:  # temporary base class being created by with_metaclass()?
            return type.__new__(meta, class_name, base_classes, classDict)

        # Temporarily create an unmodified version of class so its MRO can be used below.
        TempClass = type.__new__(meta, 'TempClass', base_classes, classDict)

        newClassDict = {}
        for cls in TempClass.mro():
            for attributeName, attribute in cls.__dict__.items():
                if isinstance(attribute, types.FunctionType):
                    # Convert it to a decorated version.
                    attribute = method_wrapper(attribute)
                newClassDict[attributeName] = attribute

        return type.__new__(meta, class_name, base_classes, newClassDict)

def with_metaclass(meta, classname, bases):
    """ Create a class with the supplied bases and metaclass, that has been tagged with a
        special '_marker' attribute.
    """
    return type.__new__(meta, classname, bases, {'_marker': classname})

class WrappedDataFrame2(
        with_metaclass(MetaClass, 'WrappedDataFrame', (pandas.DataFrame, object))):
    pass

print('Unwrapped pandas.DataFrame().info():')
try:
    pandas.DataFrame().info()
except RuntimeError:
    print('  RuntimeError exception was raised as expected')

print('\n\nWrapped pandas.DataFrame().info():')
WrappedDataFrame2().info()
Output:
Unwrapped pandas.DataFrame().info():
pandas.DataFrame.info() called
RuntimeError exception was raised as expected
Wrapped pandas.DataFrame().info():
Calling: WrappedDataFrame2.info()...
pandas.DataFrame.info() called
An exception occurred in the wrapped method WrappedDataFrame2.info()
Traceback (most recent call last):
  File "test.py", line 16, in wrapper
    return method(*args, **kwargs)
  File "mockpandas.py", line 9, in info
    raise RuntimeError('Exception raised')
RuntimeError: Exception raised
As the above illustrates, the method_wrapper()-decorated version is being used by methods of the wrapped class.
Your metaclass only applies your decorator to the methods defined in classes that are instances of it. It doesn't decorate inherited methods, since they're not in the classDict.
I'm not sure there's a good way to make it work. You could try iterating through the MRO and wrapping all the inherited methods as well as your own, but I suspect you'd get into trouble if there were multiple levels of inheritance after you start using MetaClass (as each level will decorate the already decorated methods of the previous class).
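One possible way around the double-wrapping concern, offered as a sketch rather than a solution tested against real pandas: skip the metaclass and use a class decorator on a subclass. It walks dir(cls), fetches each attribute with inspect.getattr_static (so inherited functions come back raw, unbound), and wraps only plain functions that don't already carry a marker. The names wrap_all_methods, caught and _is_wrapped are hypothetical, DataFrame is a stand-in like the mockpandas one above, and dunder methods are skipped. Since wrapped methods swallow exceptions and return None, methods that call each other internally will have those inner failures recorded and suppressed too, which may change behavior.

```python
import functools
import inspect

caught = []  # hypothetical list collecting (method qualname, exception) pairs

def method_wrapper(method):
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except Exception as exc:
            caught.append((method.__qualname__, exc))  # record instead of raising
            return None
    wrapper._is_wrapped = True  # marker: never wrap this function a second time
    return wrapper

def wrap_all_methods(cls):
    """Class decorator: wrap every plain function that cls defines or inherits."""
    for name in dir(cls):
        if name.startswith('__'):
            continue  # leave dunder methods alone in this sketch
        attr = inspect.getattr_static(cls, name)
        if inspect.isfunction(attr) and not getattr(attr, '_is_wrapped', False):
            # setattr on the decorated class only; base classes stay untouched
            setattr(cls, name, method_wrapper(attr))
    return cls

class DataFrame:  # stand-in for pandas.DataFrame, as in mockpandas.py
    def info(self):
        raise RuntimeError('Exception raised')

    def shape(self):
        return (0, 0)

@wrap_all_methods
class WrappedDataFrame(DataFrame):
    pass

df = WrappedDataFrame()
df.info()             # the RuntimeError is recorded instead of propagating
print(df.shape())     # (0, 0)
print(caught[0][0])   # DataFrame.info
```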
