I am new to deep learning and I am going to work on Fashion-MNIST.
I found that the "transform" parameter is optional and accepts a callable, and that ToTensor() is one valid value.
What else can I pass as the transform argument, and where do I find the available options?
I am looking at:
https://pytorch.org/vision/stable/datasets.html#fashion-mnist
but I could not find the answer there. Help me please. Thank you.
As you noted, transform accepts any callable. Since many transformations are commonly used by the broader community, a lot of them are already implemented by libraries such as torchvision, torchtext, and others. As you intend to work on FashionMNIST, you can see a list of vision-related transformations in torchvision.transforms:
Transforms are common image transformations. They can be chained together using Compose. Most transform classes have a function equivalent: functional transforms give fine-grained control over the transformations. This is useful if you have to build a more complex transformation pipeline (e.g. in the case of segmentation tasks).
You can check more transformations in other vision libs, such as Albumentations.
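Here is a minimal sketch of how a composed transform is passed to the dataset; the Normalize values are illustrative placeholders, not FashionMNIST's true statistics:

    # Compose several torchvision transforms and hand them to FashionMNIST.
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.ToTensor(),                  # PIL image -> float tensor in [0, 1]
        transforms.Normalize((0.5,), (0.5,)),   # placeholder mean/std, shifts values to roughly [-1, 1]
    ])

    train_set = datasets.FashionMNIST(root="./data", train=True,
                                      download=True, transform=transform)
    image, label = train_set[0]                 # image is already a transformed tensor

Any callable with the same shape (a function taking a PIL image and returning whatever your model expects) can go in the same slot.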
TorchScript provides torch.jit.trace and torch.jit.script to convert PyTorch code from eager mode to script mode. From the documentation, I understand that torch.jit.trace cannot handle control flow and other Python data structures, and that torch.jit.script was developed to overcome those limitations of torch.jit.trace.
But it looks like torch.jit.script works for all the cases, then why do we need torch.jit.trace?
Please help me understand the difference between these two methods.
If torch.jit.script works for your code, then that's all you should need. Code that uses dynamic behavior such as polymorphism isn't supported by the compiler torch.jit.script uses, so for cases like that, you would need to use torch.jit.trace.
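A small illustrative sketch (my own example, assuming a recent PyTorch) of the practical difference: scripting compiles the Python source, so data-dependent control flow is preserved, while tracing only records the ops executed for the example input:

    import torch

    def f(x):
        if x.sum() > 0:      # data-dependent branch
            return x + 1
        return x - 1

    scripted = torch.jit.script(f)               # keeps both branches
    traced = torch.jit.trace(f, torch.ones(3))   # records only the branch taken for ones(3)
                                                 # (emits a TracerWarning about the branch)

    x = -torch.ones(3)
    print(scripted(x))   # tensor([-2., -2., -2.]) -- follows the else branch
    print(traced(x))     # tensor([0., 0., 0.])    -- replays the traced "+ 1" branch

Conversely, tracing only needs to execute the code once, so it can record modules whose Python source the script compiler rejects (the dynamic behavior mentioned above).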
I am working on a project that requires inversion of multiple large dense matrices. I would like to know if ojAlgo supports multi-threading.
I notice the GitHub page for ojAlgo is tagged with multi-threading; however, I am unable to find any documentation about it.
Thanks
tf.image, for example, already has some elementary image-processing methods implemented, which I assume are optimized. My question: as I iterate through a large dataset of images, what is the recommended way to apply a more complex function to every image (in batches, of course), for example a patch-wise 2-D DCT, so that it works as well as possible with the whole tf.data framework?
Thanks in advance.
P.S. Of course I could use the map method, but I'm asking beyond that: passing a function written in pure NumPy to map wouldn't help as much.
The current best approach (short of writing custom ops in C++/CUDA) is probably to use tf.contrib.eager.py_func: https://www.tensorflow.org/api_docs/python/tf/contrib/eager/py_func. This lets you write arbitrary TF eager code and use Python control-flow statements, so you should be able to do most of the things you can do with NumPy. The added benefit is that you can use your GPU, and the tensors you produce in tfe.py_func are immediately usable in your regular TF code; no copies are needed.
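As a rough sketch (assuming a recent TF 2.x, where tf.py_function plays the role of tfe.py_func, and a placeholder in place of the real per-image transform), here is how an arbitrary Python/NumPy function can be plugged into a tf.data pipeline:

    import numpy as np
    import tensorflow as tf

    def complex_op(image):
        # Placeholder for the real per-image transform (e.g. a patch-wise 2-D DCT).
        # Inside py_function the argument is an eager tensor; .numpy() gives an ndarray.
        arr = image.numpy()
        return (arr - arr.mean()).astype(np.float32)

    def map_fn(image):
        out = tf.py_function(func=complex_op, inp=[image], Tout=tf.float32)
        out.set_shape(image.shape)      # py_function drops static shape information
        return out

    images = np.random.rand(100, 28, 28).astype(np.float32)
    dataset = (tf.data.Dataset.from_tensor_slices(images)
               .map(map_fn, num_parallel_calls=tf.data.AUTOTUNE)
               .batch(32))

Note that the Python function itself still runs under the GIL, so for heavy per-image work a vectorized TF implementation (or a custom op) will usually be faster.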
I would like to use MC-Stan on Spark, but Google doesn't turn up any related pages.
I wonder whether this approach is even possible on Spark, so I would appreciate it if someone could let me know.
Moreover, I also wonder what the widely used approach to MCMC on Spark is. I hear Scala is widely used, but I need a language that has a decent MCMC library, such as MC-Stan.
Yes, it's certainly possible, but it requires a bit more work. Stan (and the popular MCMC tools I know of) is not designed to be run in a distributed setting, via Spark or otherwise. In general, distributed MCMC is an area of active research. For a recent review, I'd recommend section 4 of Patterns of Scalable Bayesian Inference (PoFSBI). There are multiple ways you might want to split up a big MCMC computation, but I think one of the more straightforward ones is splitting up the data and running an off-the-shelf tool like Stan, with the same model, on each partition. Each model produces a subposterior, and these can be reduce'd together to form the posterior. PoFSBI discusses several ways of combining such subposteriors.
I've put together a very rough proof of concept using pyspark and pystan (python is the common language with the most Stan and Spark support). It's a rough and limited implementation of the weighted-average consensus algorithm in PoFSBI, running on the tiny 8-schools dataset. I don't think this example would be practically very useful but it should provide some idea of what might be necessary to run Stan as a Spark program: partition data, run stan on each partition, combine the subposteriors.
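To make the shape of such a program concrete (this is not the proof of concept mentioned above, just a toy sketch assuming PyStan 2.x, a trivial normal model, and a crude unweighted combination of subposterior means):

    import numpy as np
    import pystan                      # PyStan 2.x API assumed
    from pyspark import SparkContext

    model_code = """
    data { int<lower=0> N; vector[N] y; }
    parameters { real mu; real<lower=0> sigma; }
    model { y ~ normal(mu, sigma); }
    """

    def run_stan_on_partition(rows):
        y = list(rows)
        sm = pystan.StanModel(model_code=model_code)   # compiled on each executor
        fit = sm.sampling(data={'N': len(y), 'y': y}, iter=1000, chains=2)
        yield fit.extract()['mu']                      # subposterior draws for mu

    sc = SparkContext(appName="stan-on-spark")
    y = np.random.normal(5.0, 2.0, size=1000).tolist()
    subposteriors = (sc.parallelize(y, numSlices=4)
                       .mapPartitions(run_stan_on_partition)
                       .collect())
    # Crude combination: average the subposterior means (PoFSBI describes better ways).
    print(np.mean([np.mean(draws) for draws in subposteriors]))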
Can anyone show a simple implementation or usage example of a tf-idf algorithm in Smalltalk for natural language processing?
I've found an implementation in a package called NaturalSmalltalk, but it seems too complicated for my needs. A simple implementation in Python is like this one.
I've noticed there is another tf-idf in Hapax, but it seems related to the analysis of software-system vocabularies, and I haven't found examples of how to use it.
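Something along these lines (not the exact code I linked, just a typical minimal version) is the kind of functionality I'm after in Smalltalk:

    import math
    from collections import Counter

    def tf_idf(documents):
        """documents: list of token lists; returns one {term: weight} dict per document."""
        n = len(documents)
        df = Counter(term for doc in documents for term in set(doc))
        weights = []
        for doc in documents:
            tf = Counter(doc)
            weights.append({t: (1 + math.log(c)) * math.log(n / df[t])
                            for t, c in tf.items()})
        return weights

    docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
    print(tf_idf(docs))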
I am the author of the original Hapax package for VisualWorks. Hapax is a general-purpose information retrieval package; it should be able to work with any kind of text file. It just happens that I used to use it to analyze source code files.
The class you are looking for is TermDocumentMatrix; there should be two methods, globalWeighting: and localWeighting:, to which you pass instances of InverseDocumentFrequency and either LogTermFrequency or TermFrequency, depending on your needs. Typically, when people refer to tf-idf, they mean it to include logarithmic term frequencies.
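In other words (my reading of those class names, so verify against the tests): with LogTermFrequency as the local weighting and InverseDocumentFrequency as the global weighting, the weight of term t in document d is (1 + log tf(t, d)) * log(N / df(t)), where tf(t, d) is the count of t in d, df(t) is the number of documents containing t, and N is the total number of documents.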
There should be tests demonstrating the TermDocumentMatrix class on a small example corpus. If the tests have not been ported to Squeak, please let me know so I can provide you with an example.
TextLint is a system based on PetitParser for parsing and matching patterns in natural language. It doesn't provide what you ask for, but it shouldn't be too difficult to extend the model to compute word frequencies.