Tweepy StreamingClient custom __init__ / arguments - super() not working


I'm looking to stream tweets to json files. I'm using Twitter API v2.0, with Tweepy 4.12.1 and Python 3.10 on Ubuntu 22.04. I'm working with the tweepy.StreamingClient class, and utilizing the on_data() method.
class TweetStreamer(tweepy.StreamingClient):
    def __init__(self,
                 out_path:str,
                 kill_time:int=59,
                 time_unit:str='minutes'):
        '''
        adding custom params
        '''
        out_path = check_path_exists(out_path)
        self.outfile = out_path
        if time_unit not in ['seconds', 'minutes']:
            raise ValueError(f'time_unit must be either `minutes` or `seconds`.')
        if time_unit=='minutes':
            self.kill_time = datetime.timedelta(seconds=60*kill_time)
        else:
            self.kill_time = datetime.timedelta(seconds=kill_time)
        super(TweetStreamer, self).__init__()

    def on_data(self, data):
        '''
        1. clean the returned tweet object
        2. write it out
        '''
        # out_obj = data['data']
        with open(self.outfile, 'ab') as o:
            o.write(data+b'\n')
As you can see, I'm invoking super() on the parent class, with the intention of retaining everything the parent's __init__() would do if I hadn't defined a custom one.
However I change it, I get this error when I create an instance of the class, passing a bearer_token along with the other arguments defined in __init__():
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [106], line 1
----> 1 streamer = TweetStreamer(bearer_token=bearer_token, out_path='test2.json')
TypeError: TweetStreamer.__init__() got an unexpected keyword argument 'bearer_token'
Any help would be greatly appreciated.

You get this error because your __init__ does not accept bearer_token as an argument. You need to forward the keyword arguments your constructor receives to the constructor of the superclass that expects them. You can do this by collecting them with **kwargs and unpacking them in the super() call:
def __init__(self,
             out_path:str,
             kill_time:int=59,
             time_unit:str='minutes',
             **kwargs):
    '''
    adding custom params
    '''
    # (...)
    super(TweetStreamer, self).__init__(**kwargs)
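To try the forwarding pattern without Twitter credentials, here is a minimal sketch. BaseClient is a hypothetical stand-in for tweepy.StreamingClient (it is not the real class and its signature is an assumption), and FileStreamer is an illustrative name. The subclass consumes its own parameters and passes everything else up unchanged.

```python
import datetime


class BaseClient:
    """Hypothetical stand-in for tweepy.StreamingClient (not the real class)."""

    def __init__(self, bearer_token, *, wait_on_rate_limit=False):
        self.bearer_token = bearer_token
        self.wait_on_rate_limit = wait_on_rate_limit


class FileStreamer(BaseClient):
    def __init__(self, out_path, kill_time=59, time_unit="minutes", **kwargs):
        if time_unit not in ("seconds", "minutes"):
            raise ValueError("time_unit must be either `minutes` or `seconds`.")
        seconds = kill_time * 60 if time_unit == "minutes" else kill_time
        self.outfile = out_path
        self.kill_time = datetime.timedelta(seconds=seconds)
        # Everything this class does not consume goes to the parent unchanged.
        super().__init__(**kwargs)


streamer = FileStreamer(out_path="test2.json", bearer_token="dummy-token")
print(streamer.bearer_token, streamer.kill_time)
```

Because the subclass never names bearer_token itself, any other keyword the parent accepts is forwarded the same way.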

Related

Overridden __setitem__ call works in serial but breaks in apply_async call

I've been fighting with this problem for some time now and I've finally managed to narrow down the issue and create a minimum working example.
The summary of the problem is that I have a class that inherits from dict to facilitate parsing of misc. input files. I've overridden the __setitem__ call to support recursive indexing of sections in our input file (e.g. parser['some.section.variable'] is equivalent to parser['some']['section']['variable']). This has been working great for us for over a year now, but we just ran into an issue when passing these Parser classes through a multiprocessing.apply_async call.
Shown below is the minimum working example. Obviously the __setitem__ call isn't doing anything special, but it's important that it accesses some instance attribute like self.section_delimiter, because this is where it breaks. It doesn't break in the initial call or in the serial function call. But when you call some_function (which doesn't do anything either) using apply_async, it crashes.
import multiprocessing as mp
import numpy as np

class Parser(dict):
    def __init__(self, file_name : str = None):
        print('\t__init__')
        super().__init__()
        self.section_delimiter = "."

    def __setitem__(self, key, value):
        print('\t__setitem__')
        self.section_delimiter
        dict.__setitem__(self, key, value)

def some_function(parser):
    pass

if __name__ == "__main__":
    print("Initialize creation/setting")
    parser = Parser()
    parser['x'] = 1
    print("Single serial call works fine")
    some_function(parser)
    print("Parallel async call breaks on line 16?")
    pool = mp.Pool(1)
    for i in range(1):
        pool.apply_async(some_function, (parser,))
    pool.close()
    pool.join()
If you run the code above, you'll get the following output:
Initialize creation/setting
__init__
__setitem__
Single serial call works fine
Parallel async call breaks on line 16?
__setitem__
Process ForkPoolWorker-1:
Traceback (most recent call last):
File "/home/ijw/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/ijw/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/ijw/miniconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/home/ijw/miniconda3/lib/python3.7/multiprocessing/queues.py", line 354, in get
return _ForkingPickler.loads(res)
File "test_apply_async.py", line 13, in __setitem__
self.section_delimiter
AttributeError: 'Parser' object has no attribute 'section_delimiter'
Any help is greatly appreciated. I spent considerable time tracking down this bug and reproducing a minimal example. I would love to not only fix it, but clearly fill some gap in my understanding on how these apply_async and inheritance/overridden methods interact.
Let me know if you need any more information.
Thank you very much!
Isaac

Cause
The cause of the problem is that multiprocessing serializes and deserializes your Parser object to move its data across process boundaries. This is done using pickle. By default, pickle does not call __init__() when deserializing an object. Because of this, self.section_delimiter is not yet set when the deserializer calls __setitem__() to restore the items in your dictionary, and you get the error:
AttributeError: 'Parser' object has no attribute 'section_delimiter'
Using just pickle and no multiprocessing gives the same error:
import pickle
parser = Parser()
parser['x'] = 1
data = pickle.dumps(parser)
copy = pickle.loads(data) # Same AttributeError here
Deserialization will work for an object with no items and the value of section_delimiter will be restored:
import pickle
parser = Parser()
parser.section_delimiter = "|"
data = pickle.dumps(parser)
copy = pickle.loads(data)
print(copy.section_delimiter) # Prints "|"
So in a sense you are just unlucky that pickle calls __setitem__() before it restores the rest of the state of your Parser.
Workaround
You can work around this by setting section_delimiter in __new__() and telling pickle what arguments to pass to __new__() by implementing __getnewargs__():
def __new__(cls, *args):
    self = super(Parser, cls).__new__(cls)
    self.section_delimiter = args[0] if args else "."
    return self

def __getnewargs__(self):
    return (self.section_delimiter,)
__getnewargs__() returns a tuple of arguments. Because section_delimiter is set in __new__(), it is no longer necessary to set it in __init__().
This is the code of your Parser class after the change:
class Parser(dict):
    def __init__(self, file_name : str = None):
        print('\t__init__')
        super().__init__()

    def __new__(cls, *args):
        self = super(Parser, cls).__new__(cls)
        self.section_delimiter = args[0] if args else "."
        return self

    def __getnewargs__(self):
        return (self.section_delimiter,)

    def __setitem__(self, key, value):
        print('\t__setitem__')
        self.section_delimiter
        dict.__setitem__(self, key, value)
Simpler solution
The reason pickle calls __setitem__() on your Parser object is that it is a dict subclass. If your Parser is just a class that happens to implement __setitem__() and __getitem__() and uses an internal dictionary to implement those calls, then pickle will not call __setitem__() and serialization will work with no extra code:
class Parser:
    def __init__(self, file_name : str = None):
        print('\t__init__')
        self.dict = { }
        self.section_delimiter = "."

    def __setitem__(self, key, value):
        print('\t__setitem__')
        self.section_delimiter
        self.dict[key] = value

    def __getitem__(self, key):
        return self.dict[key]
So if there is no other reason for your Parser to be a dictionary, I would just not use inheritance here.
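As a quick check of the composition approach, the sketch below round-trips such a Parser through pickle directly, which is the same mechanism multiprocessing relies on. The print calls are dropped for brevity and the variable name restored is illustrative. It assumes the class is defined at module level so pickle can locate it.

```python
import pickle


class Parser:
    """Composition-based variant: wraps a dict instead of inheriting from it."""

    def __init__(self, file_name=None):
        self.dict = {}
        self.section_delimiter = "."

    def __setitem__(self, key, value):
        # Touches an instance attribute, as the original __setitem__ does.
        _ = self.section_delimiter
        self.dict[key] = value

    def __getitem__(self, key):
        return self.dict[key]


parser = Parser()
parser["x"] = 1

# pickle restores the instance __dict__ as plain state and never needs to
# call __setitem__ on a half-built object.
restored = pickle.loads(pickle.dumps(parser))
print(restored["x"], restored.section_delimiter)
```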

Avoid Pycharm __dict__ lookup when using __getattr__ and __slots__ for composition

Say I have a class:
from collections import OrderedDict

class Example:
    __slots__ = ("_attrs", "other_value")

    def __init__(self):
        self._attrs = OrderedDict()
        self.other_value = 1
        self.attribute = 0

    def __setattr__(self, key, value):
        if key in self.__slots__:
            return super().__setattr__(key, value)
        else:
            self._attrs[key] = value

    def __getattr__(self, key):
        return self._attrs[key]
The goal is to have Example have two slots:
- If those are set, set them as usual. (works)
- If additional attributes are set, assign them in _attrs. (works)
For getting attributes, the code should:
- Act as usual if anything from slots is requested. (works)
- Get the value from _attrs if it exists in _attrs.keys(). (works)
- Error in any other case as usual. (issue)
For the issue, I'd like the error to mimic what would normally happen if an attribute was not present on an object. Currently, when running the code, I get a KeyError on self._attrs. Although this is fine, it would be nice to hide this nuance. More annoyingly, if I debug in PyCharm, the autocomplete throws a large error when it tries to look at __dict__, before I've even hit enter:
Example().abc # hit tab in pycharm
# returns the error:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_comm.py", line 1464, in do_it
def do_it(self, dbg):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/_pydev_completer.py", line 159, in generate_completions_as_xml
def generate_completions_as_xml(frame, act_tok):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/_pydev_completer.py", line 77, in complete
def complete(self, text):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/_pydev_completer.py", line 119, in attr_matches
def attr_matches(self, text):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/_pydev_imports_tipper.py", line 165, in generate_imports_tip_for_module
def generate_imports_tip_for_module(obj_to_complete, dir_comps=None, getattr=getattr, filter=lambda name:True):
File "/Users/xxxxxxxxx/", line 46, in __getattr__
def __getattr__(self, key: str) -> None:
KeyError: '__dict__'
Is there a way to suppress this by writing the code differently?

You might be able to make it work by implementing __dir__ on the class, so it has a canonical source of names that can be completed:
def __dir__(self):
    return 'other_value', *self._attrs.keys()
I can't swear to how PyCharm implements their tab-completion, so there's no guarantee it works, but this is the way to define the set of enumerable attributes for a type, and hopefully PyCharm will use it when available, rather than going for __dict__.
The other approach (and this is probably a good idea regardless) is to make sure you raise the right error when __getattr__ fails, so PyCharm knows the problem is a missing attribute, not some unrelated issue with a dict:
def __getattr__(self, key):
    try:
        return self._attrs[key]
    except KeyError:
        raise AttributeError(key)
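Putting both suggestions together, a minimal sketch of the full class might look like this. The `from None` suffix is an addition not in the answer above; it only hides the internal KeyError from the traceback. Whether PyCharm's completer then behaves is not verified here. The checks at the end only show that the standard hasattr, getattr and dir protocols now work.

```python
from collections import OrderedDict


class Example:
    __slots__ = ("_attrs", "other_value")

    def __init__(self):
        self._attrs = OrderedDict()
        self.other_value = 1
        self.attribute = 0  # not a slot, so it lands in _attrs

    def __setattr__(self, key, value):
        if key in self.__slots__:
            super().__setattr__(key, value)
        else:
            self._attrs[key] = value

    def __getattr__(self, key):
        # Only called when normal attribute lookup has already failed.
        try:
            return self._attrs[key]
        except KeyError:
            raise AttributeError(key) from None

    def __dir__(self):
        return ["other_value", *self._attrs.keys()]


example = Example()
print(example.attribute)
print(hasattr(example, "abc"))
print(getattr(example, "__dict__", None))
print(dir(example))
```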

TypeError: 'dict' object is not callable with OrderedDict and multiple inheritance

The following code, which uses multiple inheritance and a dictionary class, gives this mysterious error:
'dict' object is not callable
but only the second time I call dump_settings(), not the first.
What is this 'dict' related to?
from collections import OrderedDict
from abc import ABC, abstractmethod

class Dumpable(ABC):
    def __init__(self):
        self.dump_settings = None
        super().__init__()

    def dump_settings(self, settings ):
        self.dump_settings = settings
        pass

class ItemSet(OrderedDict, Dumpable):
    def __init__(self , allow_substitution : bool = False ):
        super(OrderedDict, self).__init__()
        super(Dumpable, self).__init__()
        # also substituting two calls above with the
        # following, do not change behavior:
        # super().__init__()
        self.allow_substitution = allow_substitution
        pass

    def dump_settings(self,settings):
        super().dump_settings(settings)
        pass

itemset = ItemSet()
output = open("output.txt", "w", encoding="utf-8")
d= dict( output = output , html = False )
print(repr(d))
# this call seems to have no problems:
itemset.dump_settings(d)
print(repr(d))
# note that the given error "'dict' object is not callable"
# has nothing to do with 'd' param because if you change
# in the following the 'd' with a non-dictionary object,
# the error remains, for example:
# itemset.dump_settings('hello')
itemset.dump_settings(d)
output.close()
NOTE: the error is not related to d variable (that is a dictionary too) because if you change it with a non-dictionary object, the error remains, for example:
itemset.dump_settings('hello')
I tried both Python version 3.5.2 for linux and 3.8.3 for Windows

The issue is that you are replacing your dump_settings method with the dictionary. So the next time you call dump_settings, it is a dict and not a method, and, as the error says, a 'dict' is not callable.
Remember that methods are just attributes of your class. After you create your itemset object, the itemset.dump_settings attribute points to the method. However, when you call dump_settings, it runs self.dump_settings = settings (where settings is the dict you gave it). So now itemset.dump_settings is a dict and not a method.
print(f"itemset.dump_settings, type: {type(itemset.dump_settings)}, {itemset.dump_settings}")
itemset.dump_settings(d)
print(f"itemset.dump_settings, type: {type(itemset.dump_settings)}, {itemset.dump_settings}")
OUTPUT
itemset.dump_settings, type: <class 'method'>, <bound method ItemSet.dump_settings of ItemSet()>
itemset.dump_settings, type: <class 'dict'>, {'output': <_io.TextIOWrapper name='output.txt' mode='w' encoding='utf-8'>, 'html': False}
If you want to save the dict, you need to store it under an attribute name that is not already the name of a method.
def dump_settings(self, settings):
    self.dump_settings_dict = settings
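The shadowing can be reproduced without OrderedDict or ABC. The following is a minimal sketch with two hypothetical classes, Settings and FixedSettings, introduced only for illustration: the first repeats the bug, the second applies the rename.

```python
class Settings:
    def dump_settings(self, settings):
        # Bug on purpose: this creates an instance attribute that hides
        # the method of the same name on this instance.
        self.dump_settings = settings


class FixedSettings:
    def dump_settings(self, settings):
        # Distinct attribute name, so the method stays reachable.
        self.dump_settings_dict = settings


broken = Settings()
broken.dump_settings({"html": False})   # first call still finds the method
print(callable(broken.dump_settings))   # the name now refers to the dict

fixed = FixedSettings()
fixed.dump_settings({"html": False})
fixed.dump_settings({"html": True})     # a second call works
print(fixed.dump_settings_dict)
```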

Found the solution, and it is very easy:
there is a conflict because, in the Dumpable class, dump_settings is both an instance attribute and a method name.
Renaming one of them solves the issue.

How to initialize python watchdog pattern matching event handler

I'm using the Python Watchdog to monitor a directory for new files being created. Several different types of files are created in said directory but I only need to monitor a single file type, hence I use the Watchdog PatternMatchingEventHandler, where I specify the pattern to monitor using the patterns keyword.
To correctly execute the code under the hood (not displayed here) I need to initialize an empty dataframe in my event handler, and I am having trouble getting this to work. If I remove the __init__ in the code below, everything works just fine.
I used the code in this answer as inspiration for my own.
The code I have set up looks as follows:
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler
import time
import pandas as pd
import numpy as np
from multiprocessing import Pool

class HandlerEQ54(PatternMatchingEventHandler):
    def __init__(self):
        #Initializing an empty dataframe for storage purposes.
        data_54 = pd.DataFrame(columns = ['Barcode','DUT','Step12','Step11','Np1','Np2','TimestampEQ54'])
        #Converting to INT for later purposes
        data_54[['Barcode','DUT']]=data_54[['Barcode','DUT']].astype(np.int64)
        self.data = data_54

    def on_created(self,event):
        if event.is_directory:
            return True
        elif event.event_type == 'created':
            #Take action here when a file is created.
            print('Found new files:')
            print(event.src_path)
            time.sleep(0.1)
            #Creating process pool to return data
            pool1 = Pool(processes=4)
            #Pass file to parsing function and return parsed result.
            result_54 = pool1.starmap(parse_eq54,[(event.src_path,self.data)])
            #returns the dataframe rather than the list of dataframes returned by starmap
            self.data = result_54[0]
            print('Data read: ')
            print(self.data)

def monitorEquipment(equipment):
    '''Uses the Watchdog package to monitor the data directory for new files.
    See the HandlerEQ54 and HandlerEQ51 classes in multiprocessing_handlers for actual monitoring code. Monitors each equipment.'''
    print('equipment')
    if equipment.upper() == 'EQ54':
        event_handler = HandlerEQ54(patterns=["*.log"])
        filepath = '/path/to/first/file/source/'
    # set up observer
    observer = Observer()
    observer.schedule(event_handler, path=filepath, recursive=True)
    observer.daemon=True
    observer.start()
    print('Observer started')
    # monitor
    try:
        while True:
            time.sleep(5)
    except KeyboardInterrupt:
        observer.unschedule_all()
        observer.stop()
    observer.join()
However, when I execute monitorEquipment I receive the following error message:
TypeError: __init__() got an unexpected keyword argument 'patterns'
Evidently I'm doing something wrong when I'm initializing my handler class, but I'm drawing a blank as to what that is (which probably reflects my less-than-optimal understanding of classes). Can someone advise me on how to correctly initialize the empty dataframe in my HandlerEQ54 class, so that I don't get this error?

It looks like your __init__ method is missing the patterns argument. You'll also need a super() call to the __init__ method of the parent class (PatternMatchingEventHandler), so that you can pass the patterns argument upwards.
It should look something like this:
class HandlerEQ54(PatternMatchingEventHandler):
    def __init__(self, patterns=None):
        super(HandlerEQ54, self).__init__(patterns=patterns)
        ...
event_handler = HandlerEQ54(patterns=["*.log"])
or, for a more generic case and to support all of PatternMatchingEventHandler's arguments:
class HandlerEQ54(PatternMatchingEventHandler):
    def __init__(self, *args, **kwargs):
        super(HandlerEQ54, self).__init__(*args, **kwargs)
        ...
event_handler = HandlerEQ54(patterns=["*.log"])

class instance from nowhere [duplicate]

If I have a class ...
class MyClass:
    def method(arg):
        print(arg)
... which I use to create an object ...
my_object = MyClass()
... on which I call method("foo") like so ...
>>> my_object.method("foo")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: method() takes exactly 1 positional argument (2 given)
... why does Python tell me I gave it two arguments, when I only gave one?

In Python, this:
my_object.method("foo")
... is syntactic sugar, which the interpreter translates behind the scenes into:
MyClass.method(my_object, "foo")
... which, as you can see, does indeed have two arguments - it's just that the first one is implicit, from the point of view of the caller.
This is because most methods do some work with the object they're called on, so there needs to be some way for that object to be referred to inside the method. By convention, this first argument is called self inside the method definition:
class MyNewClass:
    def method(self, arg):
        print(self)
        print(arg)
If you call method("foo") on an instance of MyNewClass, it works as expected:
>>> my_new_object = MyNewClass()
>>> my_new_object.method("foo")
<__main__.MyNewClass object at 0x29045d0>
foo
Occasionally (but not often), you really don't care about the object that your method is bound to, and in that circumstance, you can decorate the method with the builtin staticmethod() function to say so:
class MyOtherClass:
    @staticmethod
    def method(arg):
        print(arg)
... in which case you don't need to add a self argument to the method definition, and it still works:
>>> my_other_object = MyOtherClass()
>>> my_other_object.method("foo")
foo
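The equivalence described in this answer can be checked directly. In the sketch below, the methods return their arguments instead of printing them so the results can be compared, and the static method name helper is illustrative only.

```python
class MyNewClass:
    def method(self, arg):
        return (self, arg)

    @staticmethod
    def helper(arg):
        # No implicit first argument is passed to a static method.
        return arg


obj = MyNewClass()

# Calling through the instance and calling through the class with the
# instance passed explicitly produce the same result.
bound_result = obj.method("foo")
explicit_result = MyNewClass.method(obj, "foo")
print(bound_result == explicit_result)

# A static method can be called on the instance or on the class itself.
print(obj.helper("foo"), MyNewClass.helper("foo"))
```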

In simple words
In Python you should add self as the first parameter to all defined methods in classes:
class MyClass:
    def method(self, arg):
        print(arg)
Then you can use your method according to your intuition:
>>> my_object = MyClass()
>>> my_object.method("foo")
foo
For a better understanding, you can also read the answers to this question: What is the purpose of self?

Something else to consider when this type of error is encountered:
I was running into this error message and found this post helpful. Turns out in my case I had overridden an __init__() where there was object inheritance.
The inherited example is rather long, so I'll skip to a more simple example that doesn't use inheritance:
class MyBadInitClass:
    def ___init__(self, name):
        self.name = name

    def name_foo(self, arg):
        print(self)
        print(arg)
        print("My name is", self.name)

class MyNewClass:
    def new_foo(self, arg):
        print(self)
        print(arg)

my_new_object = MyNewClass()
my_new_object.new_foo("NewFoo")
my_bad_init_object = MyBadInitClass(name="Test Name")
my_bad_init_object.name_foo("name foo")
Result is:
<__main__.MyNewClass object at 0x033C48D0>
NewFoo
Traceback (most recent call last):
File "C:/Users/Orange/PycharmProjects/Chapter9/bad_init_example.py", line 41, in <module>
my_bad_init_object = MyBadInitClass(name="Test Name")
TypeError: object() takes no parameters
PyCharm didn't catch this typo (three leading underscores in ___init__). Nor did Notepad++ (other editors/IDEs might).
Granted, this is a "takes no parameters" TypeError, but in terms of object initialization in Python it isn't much different from "got two" when expecting one.
Addressing the topic: a custom initializer is used only if it is named exactly __init__. A misspelled name such as ___init__ just defines an ordinary method, so the inherited object initializer runs instead. It accepts no extra arguments, and the error is thrown.
In the case of this typo, the fix is simple: just correct the name of the custom initializer:
def __init__(self, name):
    self.name = name

As a newcomer to Python, I had this issue when I was using Python's ** feature the wrong way. I was trying to call this definition from somewhere:
def create_properties_frame(self, parent, **kwargs):
using a call without a double star was causing the problem:
self.create_properties_frame(frame, kw_gsp)
TypeError: create_properties_frame() takes 2 positional arguments but 3 were given
The solution is to add ** to the argument:
self.create_properties_frame(frame, **kw_gsp)
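The difference is easy to reproduce with a plain function. In this sketch, create_frame and options are hypothetical stand-ins for the method and dictionary above.

```python
def create_frame(parent, **kwargs):
    # Hypothetical stand-in for the method above; returns what it received.
    return parent, kwargs


options = {"width": 200, "height": 100}

# Without **, the dict is passed as a second positional argument,
# which the signature does not accept.
try:
    create_frame("root", options)
except TypeError as exc:
    print("TypeError:", exc)

# With **, each key/value pair becomes a keyword argument.
parent, received = create_frame("root", **options)
print(received)
```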

As mentioned in other answers, an instance method must declare self as its first parameter; omitting it is the source of the error.
In addition, it is important to understand that only instance methods take self as the first argument in order to refer to the instance.
A class method takes the class itself as its first argument instead, conventionally named cls (class_ is also used), while a static method takes neither.
Please see an example below.
class City:
    country = "USA" # This is a class level attribute which will be shared across all instances (and not created PER instance)

    def __init__(self, name, location, population):
        self.name = name
        self.location = location
        self.population = population

    # This is an instance method which takes self as the first argument to refer to the instance
    def print_population(self, some_nice_sentence_prefix):
        print(some_nice_sentence_prefix +" In " +self.name + " lives " +self.population + " people!")

    # This is a class method, marked with the @classmethod decorator
    # All class methods must take the class as first param. The convention is to name it "cls" but class_ is also ok
    @classmethod
    def change_country(cls, new_country):
        cls.country = new_country
Some tests just to make things more clear:
# Populate objects
city1 = City("New York", "East", "18,804,000")
city2 = City("Los Angeles", "West", "10,118,800")
#1) Use the instance method: No need to pass "self" - it is passed as the city1 instance
city1.print_population("Did You Know?") # Prints: Did You Know? In New York lives 18,804,000 people!
#2.A) Use the class method through an instance
city2.change_country("Canada")
#2.B) Will be reflected in all objects
print("city1.country=",city1.country) # Prints Canada
print("city2.country=",city2.country) # Prints Canada

It occurs when the number of arguments you pass does not match the number of parameters that __init__() or any other method declares.
For example:
class Dog:
    def __init__(self):
        print("IN INIT METHOD")

    def __unicode__(self,):
        print("IN UNICODE METHOD")

    def __str__(self):
        print("IN STR METHOD")

obj = Dog("JIMMY", 1, 2, 3, "WOOF")
When you run the above program, it gives you an error like this:
TypeError: __init__() takes 1 positional argument but 6 were given
How can we get rid of this?
Declare the parameters that the __init__() method should receive:
class Dog:
    def __init__(self, dogname, dob_d, dob_m, dob_y, dogSpeakText):
        self.name_of_dog = dogname
        self.date_of_birth = dob_d
        self.month_of_birth = dob_m
        self.year_of_birth = dob_y
        self.sound_it_make = dogSpeakText

    def __unicode__(self, ):
        print("IN UNICODE METHOD")

    def __str__(self):
        print("IN STR METHOD")

obj = Dog("JIMMY", 1, 2, 3, "WOOF")
print(id(obj))

If you want to call a method without creating an object, you can make it a static method.
class MyClass:
    @staticmethod
    def method(arg):
        print(arg)

MyClass.method("i am a static method")

I get this error when I'm sleep-deprived, and create a class using def instead of class:
def MyClass():
    def __init__(self, x):
        self.x = x

a = MyClass(3)
-> TypeError: MyClass() takes 0 positional arguments but 1 was given

You should actually create a class:
class accum:
    def __init__(self):
        self.acc = 0

    def accumulator(self, var2add, end):
        if not end:
            self.acc+=var2add
        return self.acc

In my case, I forgot to add the ()
I was calling the method like this
obj = className.myMethod
But it should be like this:
obj = className.myMethod()
