PyTorch-Forecasting N-Beats model with SELU() activation function?

I am working on time series forecasting, and I've been using the PyTorch library pytorch-forecasting lately. If you don't know it, try it. It's great.
I am interested in the SELU activation function for Self-Normalizing Networks (SNNs; see, e.g., the PyTorch docs). As I couldn't find any N-Beats implementation adapted to use SELU and its requirements (i.e., AlphaDropout and proper weight initialization), I made one myself.
It would be great if anyone with experience with these concepts (the N-Beats architecture, pytorch-forecasting, or SELU) could review whether everything in my implementation is right.
My implementation is here: https://gist.github.com/pnmartinez/fef1f488497fa85a2cc1626af2a5b4bd
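For context (not code from the gist): the SELU/SNN requirements mentioned above usually come down to three swaps in each fully connected block: SELU instead of ReLU, AlphaDropout instead of regular Dropout, and LeCun-normal weight initialization. A minimal sketch of such a block, with the illustrative name SELUBlock:

```python
import torch.nn as nn

class SELUBlock(nn.Module):
    """Fully connected block adapted for Self-Normalizing Networks:
    SELU activation, AlphaDropout, and LeCun-normal weight init."""

    def __init__(self, in_features, out_features, dropout=0.1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.activation = nn.SELU()
        # AlphaDropout preserves the self-normalizing property that
        # regular Dropout would break.
        self.dropout = nn.AlphaDropout(dropout)
        # kaiming_normal_ with fan_in and linear gain gives
        # std = 1/sqrt(fan_in), i.e., LeCun-normal init.
        nn.init.kaiming_normal_(self.linear.weight, mode="fan_in",
                                nonlinearity="linear")
        nn.init.zeros_(self.linear.bias)

    def forward(self, x):
        return self.dropout(self.activation(self.linear(x)))
```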

Related

Support for one vs all classification in pytorch

I have been stuck on a multi-class classification problem with a large number of classes (500), and a one-vs-rest classifier seems like a good approach to handle it. Is there any support for this in PyTorch itself?
All I could find is this post, which doesn't have answers either. I understand that scikit-learn offers this feature, but I wish to stick to PyTorch because of the numerous familiar customizations that can be tweaked in this framework. I've tried exploring skorch, but even that is overly simplified and lacks the documentation needed for the modifications my use case requires.
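There is no built-in one-vs-rest wrapper in PyTorch, but one common way to get the same effect is to train with BCEWithLogitsLoss over one-hot targets, so that each output unit acts as an independent binary (one-vs-rest) classifier. A minimal sketch under that assumption (the feature dimension and model are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 500
model = nn.Linear(2048, num_classes)  # illustrative feature dim / model

# BCEWithLogitsLoss treats each of the 500 logits as an independent
# binary classifier -- effectively a one-vs-rest formulation.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(32, 2048)              # dummy batch
labels = torch.randint(0, num_classes, (32,))
targets = F.one_hot(labels, num_classes).float()

logits = model(features)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
```

At inference time, taking the argmax over the per-class logits recovers a single predicted class, just as a one-vs-rest ensemble would.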

How to compile GPT-Neo without HuggingFace files?

The only simple way to get GPT-Neo running nowadays seems to require the HuggingFace Hub (which pulls a fully packaged model from the Hub instead of building it from the official weights at "the-eye.eu/Ai").
I cannot be sure what else is actually in the pulled data (and library), as it is a black box.
Therefore, I'd prefer to build the model myself from the official pretrained weights (https://the-eye.eu/public/AI/models).
Unfortunately, I haven't found any resources on how to do that yet, but the AI community would benefit from a lot more transparency in the process and more trust in EleutherAI.
Could anyone explain the steps required to create the original model (and a tokenizer) from the uploaded weights?
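One possible route, assuming the official Mesh-TensorFlow checkpoint has first been converted to a PyTorch state dict (the transformers source tree ships a conversion script for GPT-Neo; check your installed version for its exact name and path), is to load everything from a local directory so nothing is pulled from the Hub. A sketch, where the directory layout is hypothetical:

```python
from transformers import GPTNeoConfig, GPTNeoForCausalLM, GPT2Tokenizer

# Hypothetical local directory holding config.json plus the converted
# PyTorch weights -- nothing here touches the HuggingFace Hub.
local_dir = "./gpt-neo-local"

config = GPTNeoConfig.from_json_file(f"{local_dir}/config.json")
model = GPTNeoForCausalLM.from_pretrained(local_dir, config=config)

# GPT-Neo uses the GPT-2 BPE tokenizer; vocab.json and merges.txt can
# also be supplied as local files.
tokenizer = GPT2Tokenizer(f"{local_dir}/vocab.json",
                          f"{local_dir}/merges.txt")
```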

Does nn.Transformer in PyTorch include the PositionalEncoding() step?

I was trying to solve a seq2seq problem with the Transformer module. From my reading of the source code on PyTorch's GitHub, PositionalEncoding() is not included (correct me if I'm wrong). But I have seen so many codebases (including one written by one of my tutors) use the default nn.Transformer to model seq2seq problems, which really confused me.
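Your reading is right: nn.Transformer does not apply any positional encoding itself; you are expected to add it to the embeddings before calling the module. The usual fix is a sinusoidal encoding module along the lines of the one in the official PyTorch tutorial, sketched here:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding ('Attention Is All You Need').
    nn.Transformer does NOT apply this itself; add it to embeddings."""

    def __init__(self, d_model, max_len=5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2)
                             * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):
        # x: (seq_len, batch, d_model), the default nn.Transformer layout
        return x + self.pe[: x.size(0)]
```

Code that passes raw embeddings straight into nn.Transformer without a step like this is simply skipping positional information.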

Mix pytorch lightning with vanilla pytorch

I am doing meta-learning research and am using the MAML optimization provided by learn2learn. However, as one of the baselines, I would like to test a non-meta-learning approach, i.e., traditional training and testing.
Due to Lightning's internal handling of the optimizer, it seems difficult to make learn2learn's MAML work in Lightning, so I couldn't use Lightning in my meta-learning setup. For my baseline, however, I would really like to use Lightning, since it provides many handy features like DeepSpeed or DDP out of the box.
Here is my question: other than setting up two separate folders/repos, how could I mix vanilla PyTorch (learn2learn) with PyTorch Lightning (baseline)? What is the best practice?
Thanks!
I decided to answer my own question. I ended up using PyTorch Lightning's manual optimization so that I can customize the optimization step. This lets both approaches use the same framework, which I think is better than maintaining two separate repos.
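For anyone landing here, a minimal sketch of what manual optimization looks like in a LightningModule; the MAML inner/outer loop itself is elided, and compute_meta_loss is a hypothetical placeholder for it:

```python
import torch
import pytorch_lightning as pl

class MAMLModule(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        # Disable Lightning's automatic optimization so we control
        # the (meta-)update steps ourselves.
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()
        # Hypothetical placeholder for the learn2learn MAML
        # inner/outer-loop loss computation.
        loss = self.compute_meta_loss(batch)
        self.manual_backward(loss)  # replaces loss.backward()
        opt.step()
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=1e-3)
```

With automatic_optimization turned off, the baseline can use the standard Lightning loop while the meta-learning variant takes full control of backward and step, all inside the same codebase.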

How to find built-in function source code in pytorch

I am trying to do research on batch normalization and had to make some modifications to the PyTorch BN code. I dug into the PyTorch code and got stuck at torch.nn.functional.batch_norm, which calls torch.batch_norm.
The problem is that torch.batch_norm cannot be found anywhere further in the torch library. Is there any way I can find the source code of this built-in function and re-implement it? Thanks!
It's there, but it's not defined in Python. It's defined in C++ in the aten/ directories.
For CPU, the implementation (one of them; which is used depends on whether or not the input is contiguous) is here: https://github.com/pytorch/pytorch/blob/420b37f3c67950ed93cd8aa7a12e673fcfc5567b/aten/src/ATen/native/Normalization.cpp#L61-L126
For CUDA, the implementation is here: https://github.com/pytorch/pytorch/blob/7aae51cdedcbf0df5a7a8bf50a947237ac4b3ee8/aten/src/ATen/native/cudnn/BatchNorm.cpp#L52-L143
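If the goal is to modify the behavior for research rather than patch the C++ kernels, it may be easier to reimplement the math in Python, where it can be edited freely. A minimal sketch of training-mode batch norm (no running-stats update), checked against the built-in:

```python
import torch

def batch_norm_py(x, weight, bias, eps=1e-5):
    """Pure-Python training-mode batch norm over an (N, C, ...) tensor,
    matching the math of torch.nn.functional.batch_norm."""
    dims = [0] + list(range(2, x.dim()))      # all dims except channels
    mean = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    shape = [1, -1] + [1] * (x.dim() - 2)     # broadcast affine params
    return x_hat * weight.view(shape) + bias.view(shape)

# Quick check against the built-in (training mode, no running stats).
x = torch.randn(8, 4, 16, 16)
w, b = torch.ones(4), torch.zeros(4)
ref = torch.nn.functional.batch_norm(x, None, None, w, b, training=True)
assert torch.allclose(batch_norm_py(x, w, b), ref, atol=1e-5)
```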
