When running torch.cuda.get_device_capability() on my GTX 1070, I get the following output: (6, 1).
Could someone explain what this means?
As stated in the comment, that is actually your GPU's compute capability version.
To put it simply, it describes the feature set supported by your GPU. You can also view your GPU's compute capability using GPU-Z if you are on Windows.
For more information concerning the exact feature-set differences, you can have a look here.
As you can see here, this allows developers to know which feature set is available to them, and thus to enable certain features on hardware that supports them while falling back to other implementations otherwise.
This may be useful as well.
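For example, a common pattern is to gate optional code paths on the reported capability. Below is a minimal sketch of that idea; the (7, 0) threshold for tensor cores is just an illustrative cutoff, not something prescribed by PyTorch:

    import torch

    if torch.cuda.is_available():
        # Returns (major, minor), e.g. (6, 1) for a GTX 1070 (Pascal).
        major, minor = torch.cuda.get_device_capability()
        print(f"Compute capability: {major}.{minor}")

        # Feature gating: tensor cores need capability >= 7.0 (Volta and
        # newer), so fall back to full precision on older GPUs.
        if (major, minor) >= (7, 0):
            dtype = torch.float16  # fast path on tensor-core hardware
        else:
            dtype = torch.float32  # fallback for e.g. Pascal cards
        print(f"Selected dtype: {dtype}")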
I have been looking for a version of Stable Diffusion that can run on IPUs. Currently I can find only CUDA-based ones (presumably because CUDA hardware is so widely available).
Now I wonder if there is a way to run CUDA-based scripts, trainers, training code, etc. on an IPU, for example via some translation layer in between.
I doubt there is, and since I cannot find an IPU version, I suspect I'll have to modify the scripts :(.
There is the Hugging Face Optimum library, which acts as an interoperability layer for running transformers on IPUs. You can find Stable Diffusion there.
For other models that are not supported in the library, there's a guide here on how you could modify your script to make it IPU-compatible.
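To give a rough idea of what that looks like in practice, here is a sketch based on the optimum-graphcore examples. The class name IPUStableDiffusionPipeline and the arguments below are my reading of that package's documentation, so treat them as assumptions and check the current docs before relying on them:

    import torch
    # Assumed import path, per optimum-graphcore examples; verify against
    # the current optimum-graphcore documentation.
    from optimum.graphcore.diffusers import IPUStableDiffusionPipeline

    # Load a Stable Diffusion checkpoint and compile it for the IPU.
    pipe = IPUStableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        revision="fp16",
        torch_dtype=torch.float16,
    )

    # From here the pipeline is used like the usual diffusers API.
    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("astronaut.png")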
I wanted to see how the conv1d module is implemented:
https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv1d. So I looked at functional.py, but still couldn't find the looping and cross-correlation computation.
Then I searched GitHub for the keyword 'conv1d' and checked conv.cpp (https://github.com/pytorch/pytorch/blob/eb5d28ecefb9d78d4fff5fac099e70e5eb3fbe2e/torch/csrc/api/src/nn/modules/conv.cpp), but still couldn't locate where the computation is happening.
My question is two-fold.
Where is the source code in which conv1d is implemented?
In general, if I want to check how modules are implemented, where is the best place to look? Any pointer to the documentation will be appreciated. Thank you.
It depends on the backend (GPU, CPU, distributed, etc.), but in the most interesting case of the GPU it's pulled from cuDNN, which is released in binary format, so you can't inspect its source code. It's a similar story for MKL-DNN on the CPU. I am not aware of any place where PyTorch would "handroll" its own convolution kernels, but I may be wrong. EDIT: indeed, I was wrong, as pointed out in an answer below.
It's difficult without knowing how PyTorch is structured. A lot of code is actually autogenerated from various markup files, as explained here. Figuring this out requires a lot of jumping around. For instance, the conv.cpp file you're linking uses torch::conv1d, which is defined here and uses at::convolution, which in turn uses at::_convolution, which dispatches to multiple variants, for instance at::cudnn_convolution. at::cudnn_convolution is, I believe, created here via a markup file and just plugs directly into the cuDNN implementation (though I cannot pinpoint the exact point in the code where that happens).
Below is an answer that I got from the PyTorch discussion board:
I believe the "handroll"-ed convolution is defined here: https://github.com/pytorch/pytorch/blob/master/aten/src/THNN/generic/SpatialConvolutionMM.c
The NN module implementations are here: https://github.com/pytorch/pytorch/tree/master/aten/src
The GPU versions are in THCUNN and the CPU versions in THNN.
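Whichever backend the dispatch ultimately lands in (cuDNN, MKL-DNN, or the handrolled THNN/THCUNN kernels above), the computation they all implement is a batched cross-correlation. A naive Python version, written purely as a sketch for understanding and nothing like PyTorch's actual optimized implementation, makes the looping explicit:

    import torch
    import torch.nn.functional as F

    def naive_conv1d(x, weight, bias=None, stride=1, padding=0):
        # x: (N, C_in, L), weight: (C_out, C_in, K)
        x = F.pad(x, (padding, padding))
        n, c_in, length = x.shape
        c_out, _, k = weight.shape
        out_len = (length - k) // stride + 1
        out = torch.zeros(n, c_out, out_len)
        for i in range(out_len):  # slide the window along the input
            window = x[:, :, i * stride : i * stride + k]  # (N, C_in, K)
            # Cross-correlation: elementwise product, summed over C_in and K.
            out[:, :, i] = torch.einsum("nck,ock->no", window, weight)
        if bias is not None:
            out = out + bias.view(1, -1, 1)
        return out

    # Sanity check against the built-in implementation.
    x = torch.randn(2, 3, 10)
    w = torch.randn(4, 3, 5)
    b = torch.randn(4)
    print(torch.allclose(naive_conv1d(x, w, b, padding=2),
                         F.conv1d(x, w, b, padding=2), atol=1e-5))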
I have been asked to integrate a custom JPEG encoder kernel module into the Linux tree. The description I was given is too generic. Can anyone suggest where in the kernel tree this should go, i.e. under what category in drivers/? I am assuming it is going to be compiled as a module and not statically linked into the kernel. To generalize the question: where should any custom kernel module live in the kernel tree? Assume the kernel module is a video/audio decoder/encoder; in this case it is a JPEG encoder, as I said.
Any help will be highly appreciated.
Thanks.
When I posted this question I did not have clarity as to how drivers are categorized and placed in the kernel tree. So I explored, and this is what I found so far:
If I am integrating/writing a new driver, e.g. a ring oscillator (this device simply generates some frequencies given an input period value; the frequency number is fed to a random number generator), my understanding is that it should go under linux/drivers/misc/, though others argued for a different location. Beyond that, there seems to be no strict rule about where this kind of driver should go, so it is largely up to your discretion and judgment where you ultimately place it. I have given the details of the steps involved here.
I also had to integrate a JPEG encoder, and I was confused about where that driver should go. I initially thought I would place it under linux/drivers/media/ as suggested in the comments, but this turned out to be a matter of preference. In the end I integrated it as a new Buildroot package. In case you are interested, I have described it here.
This is my understanding so far. If anyone thinks I have missed anything, please kindly point it out.
As most of you know, CPUs are not as well suited to floating-point computation as GPUs are. I am wondering how to use a GPU's power without any abstraction layer or driver. Can I program a GPU using assembly, C, or C++ (and if so, how)? Assembly seems like it would let me access the GPU directly, whereas C/C++ would likely need an intermediary library (e.g. OpenCL) to access the GPU.
Let me ask another question: how much of a modern GPU's capability is exposed to a programmer without any third-party driver?
The interfaces aren't documented, so something like OpenCL is the only practical way to program the GPU directly.
Without a driver you would be stuck trying to reverse engineer the complete functionality of the GPU on your own.
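For contrast, here is roughly what the practical route looks like: a tiny kernel launched through OpenCL, sketched with the pyopencl bindings. The vendor driver and the OpenCL runtime are doing all of the hardware-specific work underneath, which is exactly the layer the question hopes to avoid:

    import numpy as np
    import pyopencl as cl

    # The OpenCL runtime (backed by the vendor driver) enumerates the
    # devices; without that driver layer there is nothing to talk to.
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a = np.random.rand(1024).astype(np.float32)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # The kernel source is compiled at runtime for whatever GPU is present.
    prg = cl.Program(ctx, """
    __kernel void square(__global const float *a, __global float *out) {
        int gid = get_global_id(0);
        out[gid] = a[gid] * a[gid];
    }
    """).build()

    prg.square(queue, a.shape, None, a_buf, out_buf)

    result = np.empty_like(a)
    cl.enqueue_copy(queue, result, out_buf)
    print(np.allclose(result, a * a))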
Well, essentially, you would have to write a driver on either Windows or Linux. The interfaces may be documented, depending on which chipset you are trying to use; Intel has loads of PDF documentation on their website. However, this is a non-trivial exercise at best, and your code would only be usable on that particular set of hardware. Merely reading and understanding the documentation will take a bit of doing in most cases, because only the hardware and registers are documented, not how to do this or that, and there will be plenty of "oops, that's not how it really works" moments. If you REALLY want to do this, your best bet would be to start with the open-source Linux drivers for a particular chipset and tweak them to your SICK TWISTED purpose. All in all, other than for the learning aspect, it's probably a BAD idea.
GPU manufacturers like NVIDIA and ATI are closed-source companies that have chosen not to disclose their GPU architectures and inner workings to the general public. This is why we cannot program a GPU directly the way we can most CPUs. The only way we can harness the power of the GPU for computation is through a provided library, such as CUDA in the case of NVIDIA. There is, in principle, a way to program a GPU directly, but for that you would need to reverse engineer and document the entire GPU, its registers, and its system calls, and you know that is not feasible with our limited resources and limited time.
PS: The only other way is to sign on as a core developer for a GPU vendor and sign an NDA (non-disclosure agreement) with them, which is not likely to happen for beginners and individuals like us.
I am in the process of tackling the Linux kernel learning curve and trying to get my head around the information stored in nested structs, specifically to resolve an ALSA driver issue.
Hence, I am spending a lot of my time in the source code tracing through structures that have pointers to other structures, which in turn have pointers to yet other structures... by which time my head has become so full that I start to lose track of the big picture!
Can anybody point me at either a tool or a website (along the lines of the highly useful Linux Cross Reference, http://lxr.linux.no/) that will allow me to, ideally graphically, expand down through the nested structs of the source code?
At the moment we are developing for an embedded PowerPC in Eclipse CDT version 4.0, but we wouldn't be opposed to switching toolchains.
Regards
KermitG
This may sound old-fashioned, but I've found that tracing through data structures with pencil and paper helps you reverse engineer the code better than tools that automagically do it for you. So my recommendation is that you draw them yourself, so that you don't have to keep it all in your head. Once you've done this, your learning curve becomes a lot less steep.
Just a copy/paste of my comment, so that this question has at least 1 answer.
Or alternatively you could use something like Doxygen to generate the diagrams for you. It's worth noting a lot of the DocBook books get their structures directly from annotated code.
I am currently using KDevelop 4 (the SVN version) to walk through the Linux kernel. The navigation capabilities are great, but it takes a good while to parse the tree (just give it the directories you need, omitting, for example, all the drivers you are not interested in), and it is still a little crash-prone.
Once the stability improves and the parser can cache previously parsed data, I think this will become the most convenient way to walk through the kernel.