After using GridSearchCV, is there any way to find out if StratifiedKFold was really used instead of KFold?
As an estimator I used SVC (Support Vector Machine) with cv=10.
I know that the documentation (scikit-learn Version 0.21.3) says that StratifiedKFold is actually used in this case. I, however, suspect that this may not have been the case.
Many thanks for your help.
If you are unsure, you can always go to the GitHub repo and read the code. Take a look here, where the function is defined.
Also, exactly in this line you have your answer. Yes, it does.
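You can also check this yourself without reading the source. GridSearchCV turns an integer cv into a splitter through sklearn.model_selection.check_cv, passing classifier=is_classifier(estimator). The sketch below calls check_cv directly, then cross-checks by running the search once with cv=10 and once with an explicit StratifiedKFold(n_splits=10). If both use the same unshuffled stratified splits, the scores must match exactly. The iris data and the C grid are illustrative assumptions, not part of the question.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, check_cv
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10]}

# 1) Ask scikit-learn which splitter an integer cv resolves to for an SVC
cv = check_cv(10, y, classifier=is_classifier(SVC()))
print(type(cv).__name__, cv.n_splits)  # StratifiedKFold 10

# 2) Cross-check: cv=10 and an explicit StratifiedKFold(10) should score identically
gs_int = GridSearchCV(SVC(gamma="scale"), param_grid, cv=10).fit(X, y)
gs_skf = GridSearchCV(SVC(gamma="scale"), param_grid,
                      cv=StratifiedKFold(n_splits=10)).fit(X, y)
same = np.allclose(gs_int.cv_results_["mean_test_score"],
                   gs_skf.cv_results_["mean_test_score"])
print(same)  # True
```

Note that for a regressor, or for a classifier trained on a continuous target, check_cv falls back to KFold, so stratification only applies to binary and multiclass targets.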
I am studying some source code from PyTorch Geometric.
I have been searching Google for from torch_sparse import SparseTensor to find out how to use SparseTensor.
But I cannot find any explanation. I found many documents about COO, CSR and similar formats, but how do I use SparseTensor?
I read https://pytorch.org/docs/stable/sparse.html#, but there is nothing about SparseTensor there.
Thank you in advance :)
I just had the same problem and stumbled upon your question, so I will detail what I did here; maybe it helps someone. I think the main confusion comes from the naming of the packages. SparseTensor is from torch_sparse, but you posted the documentation of torch.sparse. The former is a separate project in the PyTorch ecosystem and part of the foundation of PyTorch Geometric, while the latter is a submodule of the official PyTorch package.
So, looking at the right package (torch_sparse), there is not much information on how to use the SparseTensor class.
If we look at the source code, on the other hand, we can see that the class has a number of classmethods you can use to generate your own SparseTensor from well-documented PyTorch classes.
In my case, all I needed was a way to feed the RGCNConv layer a single tensor containing both the edges and the edge types, so I combined them with the following line:
edge_index = SparseTensor.from_edge_index(edge_index, edge_types)
If, however, you already have a COO or CSR tensor, you can use the corresponding classmethods instead.
I am using PyTorch to carry out vision tasks, but would like to use some of what fast.ai provides since it has a lot of useful functionality. I'd prefer to work mostly in PyTorch since it's easier for me to understand what's going on, it's easier for me to find information on it online, and I want to maintain flexibility.
In https://docs.fast.ai/migrating_pytorch it's written that after I use the following imports: from fastai.vision.all import * and from migrating_pytorch import *, I should be able to start "Incrementally adding fastai goodness to your PyTorch models", which sounds great.
But when I run the second import I get ModuleNotFoundError: No module named 'migrating_pytorch'. Searching https://github.com/fastai/fastai, I can't find any mention of migrating_pytorch.py in the code, and I couldn't find anything online either.
(I'm using fast.ai version 2.3.1)
I'd like to know if this is indeed the way to go, and if so, how to get it working. Or, if there's a better way, how I should use that approach instead.
As an example, it would be nice if I could use the EarlyStoppingCallback, SaveModelCallback, and add some metrics from fast.ai instead of writing them myself, while still having everything in mostly "native" PyTorch.
Preferably the solution isn't specific to vision only, but that's my current need.
migrating_pytorch is an example script. It's in the fast.ai repo at: https://github.com/fastai/fastai/blob/master/nbs/examples/migrating_pytorch.py
The notebook that shows how to use it is at: https://github.com/fastai/fastai/blob/827e7cc0fad2db06c40df393c9569309377efac0/nbs/examples/migrating_pytorch.ipynb
For the callback example, your training code would end up looking something like this:
cbs = [EarlyStoppingCallback(), SaveModelCallback()]
learner = Learner(dls, simple_cnn(), loss_func=F.cross_entropy, cbs=cbs)
learner.fit(1)
Those two callbacks may need some arguments, e.g. which metric to monitor or the file name for the saved model.
I am trying to do research on batch normalization, and had to make some modifications to the PyTorch BN code. I dug into the PyTorch code and got stuck at torch.nn.functional.batch_norm, which references torch.batch_norm.
The problem is that torch.batch_norm cannot be further found in the torch library. Is there any way I can find the source code of this built-in function and re-implement it? Thanks!
It's there, but it's not defined in Python; it's defined in C++ in the aten/ directories.
For CPU, the implementation (one of them, it depends on whether or not the input is contiguous) is here: https://github.com/pytorch/pytorch/blob/420b37f3c67950ed93cd8aa7a12e673fcfc5567b/aten/src/ATen/native/Normalization.cpp#L61-L126
For CUDA, the implementation is here: https://github.com/pytorch/pytorch/blob/7aae51cdedcbf0df5a7a8bf50a947237ac4b3ee8/aten/src/ATen/native/cudnn/BatchNorm.cpp#L52-L143
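If the goal is to modify BN for research, it may be easier to start from a pure-Python re-implementation than from the C++ kernels. The sketch below is my own reference version of the batch-norm forward pass (batch_norm_ref is a hypothetical name, not PyTorch's code). In training mode it normalizes with the biased batch variance and updates the running statistics with the unbiased variance; in eval mode it uses the running statistics. For inputs of shape (N, C, ...) it should match F.batch_norm numerically.

```python
import torch
import torch.nn.functional as F

def batch_norm_ref(x, running_mean, running_var, weight, bias,
                   training, momentum=0.1, eps=1e-5):
    # Statistics are per channel (dim 1), reduced over N and all spatial dims
    dims = [0] + list(range(2, x.dim()))
    shape = [1, -1] + [1] * (x.dim() - 2)  # broadcast per-channel values
    if training:
        mean = x.mean(dim=dims)
        var = x.var(dim=dims, unbiased=False)  # biased variance normalizes the batch
        n = x.numel() // x.size(1)             # elements per channel
        with torch.no_grad():
            # Running stats are updated in place with the unbiased variance
            running_mean.mul_(1 - momentum).add_(momentum * mean)
            running_var.mul_(1 - momentum).add_(momentum * var * n / (n - 1))
    else:
        mean, var = running_mean, running_var
    x_hat = (x - mean.view(shape)) / torch.sqrt(var.view(shape) + eps)
    return x_hat * weight.view(shape) + bias.view(shape)
```

Because it is ordinary autograd-tracked PyTorch, any change you make (a different statistic, a different normalizer) gets gradients for free, at the cost of speed compared with the native kernels.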
I wanted to see how the conv1d module is implemented (https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv1d), so I looked at functional.py, but still couldn't find the looping and cross-correlation computation.
Then I searched GitHub for the keyword 'conv1d' and checked conv.cpp (https://github.com/pytorch/pytorch/blob/eb5d28ecefb9d78d4fff5fac099e70e5eb3fbe2e/torch/csrc/api/src/nn/modules/conv.cpp), but still couldn't locate where the computation is happening.
My question is two-fold.
Where is the source code in which conv1d is implemented?
In general, if I want to check how the modules are implemented, where is the best place to look? Any pointer to the documentation will be appreciated. Thank you.
It depends on the backend (GPU, CPU, distributed, etc.), but in the most interesting case, the GPU, it's pulled from cuDNN, which is released in binary form, so you can't inspect its source code. It's a similar story for MKL-DNN on the CPU. I am not aware of any place where PyTorch would "handroll" its own convolution kernels, but I may be wrong. EDIT: indeed, I was wrong, as pointed out in an answer below.
It's difficult without knowing how PyTorch is structured. A lot of code is actually autogenerated from various markup files, as explained here. Figuring this out requires a lot of jumping around. For instance, the conv.cpp file you're linking uses torch::conv1d, which is defined here and uses at::convolution, which in turn uses at::_convolution, which dispatches to multiple variants, for instance at::cudnn_convolution. at::cudnn_convolution is, I believe, created here via a markup file and plugs directly into the cuDNN implementation (though I cannot pinpoint the exact point in the code where that happens).
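To see the computation the question was looking for, here is a naive, loop-based conv1d written from scratch in PyTorch (conv1d_loops is a hypothetical helper, not PyTorch's kernel). Like F.conv1d, it computes a cross-correlation, i.e. the kernel is not flipped. It supports only stride and symmetric zero padding, with no dilation or groups.

```python
import torch
import torch.nn.functional as F

def conv1d_loops(x, weight, bias=None, stride=1, padding=0):
    # x: (N, C_in, L), weight: (C_out, C_in, K)
    x = F.pad(x, (padding, padding))  # zero-pad the last (length) dimension
    n, c_in, length = x.shape
    c_out, _, k = weight.shape
    l_out = (length - k) // stride + 1
    out = torch.zeros(n, c_out, l_out, dtype=x.dtype)
    for n_idx in range(n):
        for o in range(c_out):
            for i in range(l_out):
                start = i * stride
                # Cross-correlation: multiply the window by the unflipped kernel
                # and sum over input channels and kernel positions
                out[n_idx, o, i] = (x[n_idx, :, start:start + k] * weight[o]).sum()
            if bias is not None:
                out[n_idx, o] += bias[o]
    return out
```

It is far too slow for real use, but comparing it against F.conv1d is a handy way to confirm the semantics before digging into the backends.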
Below is an answer that I got from the PyTorch discussion board:
I believe the “handroll”-ed convolution is defined here: https://github.com/pytorch/pytorch/blob/master/aten/src/THNN/generic/SpatialConvolutionMM.c
The NN module implementations are here: https://github.com/pytorch/pytorch/tree/master/aten/src
The GPU version is in THCUNN and the CPU version in THNN
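As far as I know, the "MM" in SpatialConvolutionMM refers to the im2col approach: unfold the input into sliding windows, then compute the whole convolution as one matrix multiply. Below is a sketch of that idea for the 1-D case using Tensor.unfold; conv1d_im2col is my own illustration of the technique, not a port of the THNN code, and it supports only stride and symmetric zero padding.

```python
import torch
import torch.nn.functional as F

def conv1d_im2col(x, weight, bias=None, stride=1, padding=0):
    # x: (N, C_in, L), weight: (C_out, C_in, K)
    x = F.pad(x, (padding, padding))
    c_out, c_in, k = weight.shape
    # unfold(dim, size, step) -> (N, C_in, L_out, K): every sliding window
    cols = x.unfold(2, k, stride)
    # (N, L_out, C_in*K): each row is one flattened window ("im2col")
    cols = cols.permute(0, 2, 1, 3).reshape(x.size(0), cols.size(2), c_in * k)
    # One matrix multiply against the flattened kernels -> (N, L_out, C_out)
    out = cols @ weight.reshape(c_out, c_in * k).t()
    if bias is not None:
        out = out + bias  # broadcasts over the last (C_out) dimension
    return out.permute(0, 2, 1)  # back to (N, C_out, L_out)
```

The trade-off is memory for speed: the unfolded matrix duplicates each input element up to K times, but the heavy lifting becomes a single GEMM, which BLAS libraries execute very efficiently.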
What is the function K on the following page? I really want to understand how this algorithm works. Please advise:
http://scikit-learn.org/stable/modules/clustering.html#mean-shift
My guess is that it's the rbf kernel function since the first parameter in MeanShift is the bandwidth of the kernel.
EDIT: K is actually a flat (uniform) kernel. A good diagram with the different kernel types is at http://en.wikipedia.org/wiki/Kernel_(statistics)#Some_of_the_kernels_mentioned_above_in_a_common_coordinate_system
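To make the flat kernel concrete: with K(x) = 1 when ||x|| <= bandwidth and 0 otherwise, each mean-shift step simply moves a point to the average of all samples within bandwidth of it. The sketch below is a simplified NumPy version under that assumption (the function name and the toy data are mine). scikit-learn's MeanShift adds refinements such as optional bin_seeding and a nearest-neighbours index, so treat this as an illustration of the idea rather than its implementation.

```python
import numpy as np

def flat_kernel_mean_shift(X, bandwidth, max_iter=300, tol=1e-3):
    """Shift every point to the mean of the samples within `bandwidth`."""
    modes = X.astype(float).copy()
    for _ in range(max_iter):
        # Flat kernel: weight 1 for samples within bandwidth, 0 otherwise
        dists = np.linalg.norm(modes[:, None, :] - X[None, :, :], axis=2)
        weights = (dists <= bandwidth).astype(float)
        # Weighted mean = plain mean of the neighbours (never empty, see below)
        new_modes = weights @ X / weights.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol * bandwidth:
            break
    # Merge modes that converged to (nearly) the same place
    centers = []
    for m in modes:
        if all(np.linalg.norm(m - c) > bandwidth for c in centers):
            centers.append(m)
    return np.array(centers)

# Two well-separated toy blobs of four points each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centers = flat_kernel_mean_shift(X, bandwidth=2.0)  # one mode per blob
```

The neighbourhood can never become empty: the mean of a set of points is at most the root-mean-square distance from them, which is no more than the bandwidth, so at least one sample always stays within range.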