python self keyword in a list - python-3.x

I am reading a tutorial about keras and came a cross this class that is inherited by an other class.
class LearningRateDecay:
def plot(self, epochs, title="Learning Rate Schedule"):
lrs = [self(i) for i in epochs]
plt.style.use("ggplot")
plt.figure()
plt.plot(epochs, lrs)
plt.title(title)
plt.xlabel("Epoch #")
plt.ylabel("Learning Rate")
in which epochs is defined like this :
N = 100
epochs = np.arange(0, N)
and is passed to plot function like this :
a = LearningRateDecay
a.plot(epochs,"Learning Rate Schedule")
I cant understand what is self(i) meaning? is this about accessing to self elements or some thing else?
Any help is greatly appreciated.
Thank you

self(i) refers to calling an object i.e. using their __call__ method.
For example, consider this class:
class MyClass:
def __call__(self, arg):
print(arg)
def method(self):
self('test')
a = MyClass()
a.method() # Prints 'test'
a('other') # Prints 'other'
In your example, this class probably have some defined behavior associated with __call__ (it can also be inherited from an upper object). Refer to the class documentation to learn about it.

Related

Sklearn Pipeline: One feature automatically missed out

I created a Custom Classifier(Dummy Classifier). Below is definition. I also added some print statements & global variables to capture values
class FeaturePassThroughClassifier(ClassifierMixin):
def __init__(self):
pass
def fit(self, X, y):
global test_arr1
self.classes_ = np.unique(y)
test_arr1 = X
print("1:", X.shape)
return self
def predict(self, X):
global test_arr2
test_arr2 = X
print("2:", X.shape)
return X
def predict_proba(self, X):
global test_arr3
test_arr3 = X
print("3:", X.shape)
return X
Below is Stacking Classifier definition where the above defined CustomClassifier is one of base classifier. There are 3 more base classifiers (these are fitted estimators). Goal is to get input training set variables as is (which will come out from CustomClassifier) + prediction from base_classifier2, base_classifier3, base_classifier4. These features will act as input to meta classifier.
model = StackingClassifier(estimators=[
('select_features', Pipeline(steps = [("model_feature_selector", ColumnTransformer([('feature_list', 'passthrough', X_train.columns)])),
('base(dummy)_classifier1', FeaturePassThroughClassifier())])),
('base_classifier2', base_classifier2),
('base_classifier3', base_classifier3),
('base_classifier4', base_classifier4)
],
final_estimator = Pipeline(memory=None,
steps=[
('save_base_estimator_output_data', FunctionTransformer(save_base_estimator_output_data, validate=False)), ('final_model', RandomForestClassifier())
], verbose=True), passthrough = False, **stack_method = 'predict_proba'**)
Below is o/p on fitting the model. There are 230 variables:
Here is the problem: There are 230 variables but CustomClassifier o/p is showing only 229 which is strange. We can clearly see from print statements above that 230 variables get passed through CustomClassifier.
I need to use stack_method = "predict_proba". I am not sure what's going wrong here. The code works fine when stack_method = "predict".
Since this is a binary classifier, the classifier class expects you to add two probability columns in the output matrix - one for probability for class label "1" and another for "0".
In the output, it has dropped one of these since both are not required, hence, 230 columns get reduced to 229. Add a dummy column to solve your problem.
In the Notes section of the documentation:
When predict_proba is used by each estimator (i.e. most of the time for stack_method='auto' or specifically for stack_method='predict_proba'), The first column predicted by each estimator will be dropped in the case of a binary classification problem.
Here's the code that eliminates the first column.
You could add a sacrificial first column in your custom estimator's predict_proba, or switch to decision_function (which will cause differences depending on your real base estimators), or use the passthrough option instead of the custom estimator (doing feature selection in the final_estimator object instead).
Both the above solutions are on point. This is how I implemented the workaround with dummy column:
Declare a custom transformer whose output is the column that gets dropped due reasons explained above:
class add_dummy_column(BaseEstimator, TransformerMixin):
def __init__(self, key):
self.key = key
def fit(self, X, y=None):
return self
def transform(self, X):
print(type(X))
return X[[self.key]]
Do a feature union where above customer transformer + column transformer are called to create final dataframe. This will duplicate the column that gets dropped. Below is altered definition for defining Stacking classifier with FeatureUnion:
model = StackingClassifier(estimators=[
('select_features', Pipeline(steps = [('featureunion', FeatureUnion([('add_dummy_column_to_input_dataframe', add_dummy_column(key='FEATURE_THAT_GETS_DROPPED')),
("model_feature_selector", ColumnTransformer([('feature_list', 'passthrough', X_train.columns)]))])),
('base(dummy)_classifier1', FeaturePassThroughClassifier())])),
('base_classifier2', base_classifier2),
('base_classifier3', base_classifier3),
('base_classifier4', base_classifier4)
],
final_estimator = Pipeline(memory=None,
steps=[
('save_base_estimator_output_data', FunctionTransformer(save_base_estimator_output_data, validate=False)), ('final_model', RandomForestClassifier())
], verbose=True), passthrough = False, **stack_method = 'predict_proba'**)

Got error "AttributeError: 'TimestepEmbedSequential' object has no attribute '__globals__'" while torch.jit.script()

While trying to script Stable Diffusion model using torch.jit.script(), I got this following error:
AttributeError: 'TimestepEmbedSequential' object has no attribute '__globals__'
I'm trying to export this model to ONNX and found out that running torch.onnx.export() will torch.jit.trace the models, which unrolls every loops, so I'm trying to use script first.
When I follow the traceback, the error occurs in this function while reading fn.__globals__ in _jit_internal.py from torch
def get_closure(fn):
"""
Get a dictionary of closed over variables from a function
"""
captures = {}
captures.update(fn.__globals__)
for index, captured_name in enumerate(fn.__code__.co_freevars):
captures[captured_name] = fn.__closure__[index].cell_contents
return captures
code for scripting is as follows:
stablediffusion_wrapper = StableDiffusionWrapper(model, sampler, opt)
scripted_module = torch.jit.script(stablediffusion_wrapper, example_inputs=[(token_dummy, img_dummy)])
and TimestepEmbedSequential module:
class TimestepEmbedSequential(nn.Sequential, TimestepBlock):
"""
A sequential module that passes timestep embeddings to the children that
support it as an extra input.
"""
def forward(self, x, emb, context=None):
for layer in self:
if isinstance(layer, TimestepBlock):
x = layer(x, emb)
elif isinstance(layer, SpatialTransformer):
x = layer(x, context)
else:
x = layer(x)
return x
Any suggestions how can I figure it out?
I tried to set #torch.jit.export decorator to the parent class TimestepBlock, but showed no effect. Actually, I have no idea what to look for this problem. I would appreciate any suggestions. Thank you

Documenting jit-compiled PyTorch Class Method (Sphinx)

I am having a problem trying to document custom PyTorch models with Sphinx: methods that are jit-compiled show up without docstrings in the documentation. How do I fix this? I checkoed out Python Sphinx autodoc and decorated members and How to autodoc decorated methods with sphinx? but the proposed solutions don't seem to work. When I try using ..automethod I get
AttributeError: '_CachedForward' object has no attribute '__getattr__'
Here's a MWE; at the moment I circumvent the problem by writing a my_forward and calling it in the forward.
from torch import jit, Tensor
class DummyModel(jit.ScriptModule):
"""My dummy model"""
def __init__(self, const: float):
super(DummyModel, self).__init__()
self.const = Tensor(const)
#jit.script_method
def forward(self, x: Tensor) -> Tensor:
"""This method does show as a :undoc-member: in the documentation"""
return self.my_forward(x)
def my_forward(self, x: Tensor) -> Tensor:
"""This method shows up as a :member: in the documentation"""
return x + self.const
Set the PYTORCH_JIT environment variable to 0 when running Sphinx. This will disable script and tracing annotations (decorators).
See https://pytorch.org/docs/stable/jit.html#disable-jit-for-debugging.

torch.jit.script(module) vs #torch.jit.script decorator

Why is adding the decorator "#torch.jit.script" results in an error, while I can call torch.jit.script on that module, e.g. this fails:
import torch
#torch.jit.script
class MyCell(torch.nn.Module):
def __init__(self):
super(MyCell, self).__init__()
self.linear = torch.nn.Linear(4, 4)
def forward(self, x, h):
new_h = torch.tanh(self.linear(x) + h)
return new_h, new_h
my_cell = MyCell()
x, h = torch.rand(3, 4), torch.rand(3, 4)
traced_cell = torch.jit.script(my_cell, (x, h))
print(traced_cell)
traced_cell(x, h)
"C:\Users\Administrator\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\jit\__init__.py", line 1262, in script
raise RuntimeError("Type '{}' cannot be compiled since it inherits"
RuntimeError: Type '<class '__main__.MyCell'>' cannot be compiled since it inherits from nn.Module, pass an instance instead
While the following code works well:
class MyCell(torch.nn.Module):
def __init__(self):
super(MyCell, self).__init__()
self.linear = torch.nn.Linear(4, 4)
def forward(self, x, h):
new_h = torch.tanh(self.linear(x) + h)
return new_h, new_h
my_cell = MyCell()
x, h = torch.rand(3, 4), torch.rand(3, 4)
traced_cell = torch.jit.script(my_cell, (x, h))
print(traced_cell)
traced_cell(x, h)
This question is also featured on PyTorch forums.
Reason for your error is here, this bulletpoint precisely:
No support for inheritance or any other polymorphism strategy, except
for inheriting from object to specify a new-style class.
Also, as stated at the top:
TorchScript class support is experimental. Currently it is best suited
for simple record-like types (think a NamedTuple with methods
attached).
Currently, it's purpose is for simple Python classes (see other points in the link I've provided) and functions, see link I've provided for more information.
You can also check torch.jit.script source code to get a better grasp of how it works.
From what it seems, when you pass an instance, all attributes which should be preserved are recursively parsed (source). You can follow this function along (quite commented, but too long for an answer, see here), though exact reason why this is the case (and why it was designed this way) is beyond my knowledge (so hopefully someone with expertise in torch.jit's inner workings will speak more about it).

How to use a Batchsampler within a Dataloader

I have a need to use a BatchSampler within a pytorch DataLoader instead of calling __getitem__ of the dataset multiple times (remote dataset, each query is pricy). I cannot understand how to use the batchsampler with any given dataset.
e.g
class MyDataset(Dataset):
def __init__(self, remote_ddf, ):
self.ddf = remote_ddf
def __len__(self):
return len(self.ddf)
def __getitem__(self, idx):
return self.ddf[idx] --------> This is as expensive as a batch call
def get_batch(self, batch_idx):
return self.ddf[batch_idx]
my_loader = DataLoader(MyDataset(remote_ddf),
batch_sampler=BatchSampler(Sampler(), batch_size=3))
The thing I do not understand, neither found any example online or in torch docs, is how do I use my get_batch function instead of the __getitem__ function.
Edit:
Following the answer of Szymon Maszke, this is what I tried and yet, \_\_get_item__ gets one index each call, instead of a list of size batch_size
class Dataset(Dataset):
def __init__(self):
...
def __len__(self):
...
def __getitem__(self, batch_idx): ------> here I get only one index
return self.wiki_df.loc[batch_idx]
loader = DataLoader(
dataset=dataset,
batch_sampler=BatchSampler(
SequentialSampler(dataset), batch_size=self.hparams.batch_size, drop_last=False),
num_workers=self.hparams.num_data_workers,
)
You can't use get_batch instead of __getitem__ and I don't see a point to do it like that.
torch.utils.data.BatchSampler takes indices from your Sampler() instance (in this case 3 of them) and returns it as list so those can be used in your MyDataset __getitem__ method (check source code, most of samplers and data-related utilities are easy to follow in case you need it).
I assume your self.ddf supports list slicing (e.g. self.ddf[[25, 44, 115]] returns values correctly and uses only one expensive call). In this case simply switch get_batch into __getitem__ and you are good to go.
class MyDataset(Dataset):
def __init__(self, remote_ddf, ):
self.ddf = remote_ddf
def __len__(self):
return len(self.ddf)
def __getitem__(self, batch_idx):
return self.ddf[batch_idx] -> batch_idx is a list
EDIT: You have to specify batch_sampler as sampler, otherwise the batch will be divided into single indices. This should be fine:
loader = DataLoader(
dataset=dataset,
# This line below!
sampler=BatchSampler(
SequentialSampler(dataset), batch_size=self.hparams.batch_size, drop_last=False
),
num_workers=self.hparams.num_data_workers,
)

Resources