Where to raise a typo in the PyTorch Documentation? - pytorch

I have found a typo in the official PyTorch Documentation. Where can I raise the flag so that it is rectified?

From the PyTorch Contribution Guide, in the section on Documentation:
Improving Documentation & Tutorials
We aim to produce high-quality documentation and tutorials. On rare
occasions, that content includes typos or bugs. If you find something
you can fix, send us a pull request for consideration.

Related

What happened to sklearn.datasets.load_boston?

While I was coding a Boston housing model using sklearn.datasets.load_boston, it gave me an error saying that the dataset was deprecated due to 'ethical' issues. What are those issues? I looked online and could not find anything.
Here's the full error:
DEPRECATED: load_boston is deprecated in 1.0 and will be removed in 1.2.
The Boston housing prices dataset has an ethical problem. You can refer to the documentation of this function for further details.
The scikit-learn maintainers therefore strongly discourage the use of this dataset unless the purpose of the code is to study and educate about ethical issues in data science and machine learning.
In this special case, you can fetch the dataset from the original source:
The reason is exactly what the error message says. You can check https://scikit-learn.org/1.1/modules/generated/sklearn.datasets.load_boston.html for further details.
As I understand it, there are two problems with the data:
Racism: There is a great article by M. Carlisle, also cited in the scikit-learn documentation, which focuses on the main issue of the Boston housing dataset: he found that house prices were affected by the racial composition of the neighbourhood.
No suitable goal: "the goal of the research that led to the creation of this dataset was to study the impact of air quality but it did not give adequate demonstration of the validity of this assumption."
However, you can get the data from the source:
http://lib.stat.cmu.edu/datasets/boston
I hope these help.
Yes, this dataset was removed in scikit-learn version 1.2. If you want to use it, you can install an earlier version of scikit-learn:
pip install scikit-learn==1.1.3
This will still show the deprecation warning, and you should use the dataset only for educational purposes.
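Since the error message points at the original CMU source, it may help to see how that file is laid out: each record is split across two lines, with 11 feature values on the first line and 2 more features plus the MEDV target on the second. The sketch below simulates the parsed array with synthetic values so it runs offline; with network access, the commented pd.read_csv line loads the real file the same way. The variable names are my own, not from scikit-learn.

```python
import numpy as np

# With network access, the raw array can be loaded like this:
#   raw = pd.read_csv("http://lib.stat.cmu.edu/datasets/boston",
#                     sep=r"\s+", skiprows=22, header=None).values
# Here we build a synthetic stand-in with the same interleaved layout.
n_records = 4
raw = np.arange(n_records * 2 * 11, dtype=float).reshape(n_records * 2, 11)
raw[1::2, 3:] = np.nan  # the second line of each record has only 3 values

# Recombine the two half-lines of each record: 13 features + MEDV target.
data = np.hstack([raw[::2, :], raw[1::2, :2]])
target = raw[1::2, 2]

print(data.shape)    # (4, 13)
print(target.shape)  # (4,)
```

The `[::2]` / `[1::2]` slicing simply picks out the first and second line of every two-line record before stitching them back together.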

how do I find examples of code with CVE vulns in it?

Sup, everyone. I wanted to see how different CVE vulnerabilities look in real code examples. Not exploits, but vulnerable code. So, does anyone know if there is a site, git repo, or anything with such stuff? Or do I just have to search GitHub for fixes of vulnerabilities and compare the code before/after?
This is a somewhat vague question.
However, when it comes to web apps, OWASP is a great resource. They have a number of projects, e.g. WebGoat, which gives you both examples of insecure code and tutorials on how to avoid the problems.
Of course, this doesn't necessarily cover all recent CVEs, but it is a great source of real code examples.
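To make the idea of "vulnerable code, not exploits" concrete, here is a minimal sketch of one of the most common weakness classes, CWE-89 (SQL injection), using Python's sqlite3. The table, data, and payload are invented for illustration; the point is the contrast between string concatenation and a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_vulnerable(name):
    # CWE-89: untrusted input is concatenated straight into the SQL string
    return conn.execute(
        f"SELECT role FROM users WHERE name = '{name}'").fetchall()

def find_user_fixed(name):
    # The fix: a parameterized query; the driver handles escaping
    return conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"
print(find_user_vulnerable(payload))  # leaks every row
print(find_user_fixed(payload))       # returns nothing
```

Diffing a before/after pair like this is essentially what you would see when comparing a vulnerability fix commit on GitHub.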
Well, I'll post what I've found on the topic just in case. The original question resulted from my lack of knowledge about CVEs being an outcome of CWE exploitation. So I should have been looking for code examples containing CWEs instead of CVEs. Hence, the website I was looking for is nist.gov and their Test Suites.

NLP (topic modeling) with PLSA

I'm trying to understand PLSA (probabilistic latent semantic analysis) for topic modeling (NLP). The problem is that every article I read is only maths (probabilities), without any semi-algorithm or anything else to help you understand it. Is there any link where I can understand PLSA, please?
The P in PLSA stands for probabilistic, so I am afraid you may not find any article that does not talk about probabilities. The model itself is a probabilistic model, and some knowledge of joint and conditional distributions, independence, etc. is expected. I would recommend https://medium.com/nanonets/topic-modeling-with-lsa-psla-lda-and-lda2vec-555ff65b0b05, which I found to be the best online resource. There is a bit of math, but most of it is explained well. As for a PLSA algorithm, I am not sure; it is not used that often, and one almost always prefers LDA. I could find a GitHub implementation that solves PLSA using EM here: https://github.com/laserwave/plsa.
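Since the question asks for a semi-algorithm rather than more math: PLSA fits P(z|d) (topic given document) and P(w|z) (word given topic) by EM. Below is a toy numpy sketch of the two steps, holding the full posterior tensor in memory, which only works for tiny vocabularies. The function and variable names are my own, not from any library.

```python
import numpy as np

def plsa(counts, n_topics, n_iters=50, seed=0):
    """EM for PLSA on a (docs x words) count matrix. Returns P(z|d), P(w|z)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))   # P(z|d)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))  # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # E-step: posterior P(z|d,w) ∝ P(z|d) * P(w|z), shape (d, z, w)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight by the observed counts n(d, w)
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

counts = np.array([[4., 2., 0.],
                   [0., 1., 5.]])  # toy doc-term matrix: 2 docs, 3 words
p_z_d, p_w_z = plsa(counts, n_topics=2)
print(p_z_d.shape, p_w_z.shape)  # (2, 2) (2, 3); each row sums to ~1
```

A real implementation (like the linked GitHub repo) iterates over nonzero counts instead of materializing the (docs, topics, words) tensor.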

pytorch - Where is “conv1d” implemented?

I wanted to see how the conv1d module is implemented
https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv1d. So I looked at functional.py but still couldn’t find the looping and cross-correlation computation.
Then I searched GitHub for the keyword 'conv1d' and checked conv.cpp https://github.com/pytorch/pytorch/blob/eb5d28ecefb9d78d4fff5fac099e70e5eb3fbe2e/torch/csrc/api/src/nn/modules/conv.cpp, but still couldn't locate where the computation is happening.
My question is two-fold.
Where is the source code in which conv1d is implemented?
In general, if I want to check how the modules are implemented, where is the best place to look? Any pointer to the documentation will be appreciated. Thank you.
It depends on the backend (GPU, CPU, distributed, etc.), but in the most interesting case of GPU it's pulled from cuDNN, which is released in binary format, so you can't inspect its source code. It's a similar story for MKL-DNN on CPU. I am not aware of any place where PyTorch would "handroll" its own convolution kernels, but I may be wrong. EDIT: indeed, I was wrong, as pointed out in an answer below.
It's difficult without knowing how PyTorch is structured. A lot of code is actually being autogenerated based on various markup files, as explained here. Figuring this out requires a lot of jumping around. For instance, the conv.cpp file you're linking uses torch::conv1d, which is defined here and uses at::convolution which in turn uses at::_convolution, which dispatches to multiple variants, for instance at::cudnn_convolution. at::cudnn_convolution is, I believe, created here via a markup file and just plugs in directly to cuDNN implementation (though I cannot pinpoint the exact point in code when that happens).
Below is an answer that I got from pytorch discussion board:
I believe the "handroll"-ed convolution is defined here: https://github.com/pytorch/pytorch/blob/master/aten/src/THNN/generic/SpatialConvolutionMM.c
The NN module implementations are here: https://github.com/pytorch/pytorch/tree/master/aten/src
The GPU version is in THCUNN and the CPU version in THNN
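Whichever backend ends up running, the quantity being computed is the same: a cross-correlation (no kernel flip) over sliding windows. To see the "looping" the question was hunting for, here is a naive numpy sketch of what torch.nn.functional.conv1d computes for a single unbatched input, ignoring stride, padding, dilation, and groups. This is illustrative only, not PyTorch's actual implementation.

```python
import numpy as np

def naive_conv1d(x, weight, bias=None):
    """x: (in_channels, length); weight: (out_channels, in_channels, k)."""
    out_channels, in_channels, k = weight.shape
    out_len = x.shape[1] - k + 1  # "valid" output length, no padding
    out = np.zeros((out_channels, out_len))
    for oc in range(out_channels):
        for i in range(out_len):
            # cross-correlation: slide the kernel, no flipping
            out[oc, i] = np.sum(x[:, i:i + k] * weight[oc])
            if bias is not None:
                out[oc, i] += bias[oc]
    return out

x = np.array([[1., 2., 3., 4.]])   # 1 input channel, length 4
w = np.array([[[1., 0., -1.]]])    # 1 output channel, kernel size 3
print(naive_conv1d(x, w))          # [[-2. -2.]]
```

The optimized backends (cuDNN, MKL-DNN, the THNN "MM" kernel above) replace these loops with matrix-multiply or FFT formulations, which is why the literal loop is so hard to find in the source tree.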

Algorithmic details behind Deep Feature Synthesis and Featuretools?

In order to use them properly, it is important to understand the algorithmic/mathematical basis of Deep Feature Synthesis and Featuretools. Are there papers, patents, or comparisons with other tools?
You can find the peer reviewed paper on Deep Feature Synthesis (the algorithm used in Featuretools) here: https://dai.lids.mit.edu/wp-content/uploads/2017/10/DSAA_DSM_2015.pdf.
The implementation has changed since publication, but the core ideas have not. Refer to the documentation or source on GitHub for the latest details.
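The core idea in the paper is composing feature primitives along parent-child relationships between tables: an aggregation primitive applied to a child table yields a new feature on the parent, and "deep" features come from stacking such primitives across several relationships. Here is a toy plain-Python sketch of one such step; the data and helper are invented for illustration and are not Featuretools' actual code.

```python
# Toy parent-child relationship: customers (parent) -> transactions (child)
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

def aggregate(child_rows, key, col, primitive):
    """Apply an aggregation primitive to a child column, grouped by parent key."""
    groups = {}
    for row in child_rows:
        groups.setdefault(row[key], []).append(row[col])
    return {k: primitive(v) for k, v in groups.items()}

# The synthesized feature "MEAN(transactions.amount)", one value per customer
mean_amount = aggregate(transactions, "customer_id", "amount",
                        lambda vals: sum(vals) / len(vals))
print(mean_amount)  # {1: 20.0, 2: 5.0}
```

Depth-2 features arise by feeding results like this into another primitive along a further relationship (e.g. a region table aggregating over its customers), which is what the "deep" in Deep Feature Synthesis refers to.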
