Where can one configure the CNN used by the spaCy TextCategorizer?

According to the comment at the top of the TextCategorizer,
Train a convolutional neural network text classifier on the IMDB
dataset, using the TextCategorizer component. The dataset will be
loaded automatically via Thinc's built-in dataset loader. The model is
added to spacy.pipeline, and predictions are available via doc.cats.
For more details, see the documentation:
* Training: https://spacy.io/usage/training
Where is the code for the CNN? Can the CNN be configured? Is there a research paper the implementation is based on?

The network architecture is defined in the _ml module, specifically in the build_text_classifier function.
The training code is in the pipeline module, specifically in the TextCategorizer class.
Some parameters, such as drop_out, batch_size and the number of epochs, can be configured as shown in the example. You can also modify the architecture of the network, but for that you need to know the framework behind spaCy, which is called Thinc (https://github.com/explosion/thinc), and some Cython.
I don't know of any paper describing the model, but this video provides a great description of it: https://www.youtube.com/watch?v=sqDHBH9IjRU


Pull Weights From Keras Model Using Optuna Study

Is there a way to pull the weights from the best-performing Keras model created during an Optuna study? The model I am working with is a fully connected network with dense layers.
The studies are run in the usual way:
study = optuna.create_study()
study.optimize(objective, n_trials = 100)
I can supply any additional code that might be necessary.

Porting pre-trained keras models and run them on IPU

I am trying to port two pre-trained Keras models to the IPU machine. I managed to load and run them using IPUStrategy.scope, but I don't know if I am doing it the right way. My pre-trained models are in .h5 file format.
I load them this way:
def first_model():
    model = tf.keras.models.load_model("./model1.h5")
    return model
After searching your ipu.keras.models.py file I couldn't find any method for loading my pre-trained models, which is why I used tf.keras.models.load_model().
Then I use this code to run them:
cfg = ipu.utils.auto_select_ipus(cfg, 1)
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    model = first_model()
    print('compile attempt\n')
    model.compile("sgd", "categorical_crossentropy", metrics=["accuracy"])
    print('compilation completed\n')
    print('running attempt\n')
    res = model.predict(input_img)[0]
    print('run completed\n')
So I have some difficulty understanding how, and whether, the system is working properly.
Basically, model.compile does not compile my model, but when I call model.predict the system first compiles and then runs. Why is that happening? Is there another way to run pre-trained Keras models on an IPU chip?
Another question: is it possible to load a pre-trained Keras model inside an ipu.keras.model, then use model.fit/evaluate to further train and evaluate it, and then save it for future use?
One last question is about the compilation of the graph. Is there a way to avoid recompiling the graph every time I use model.predict() in a different strategy.scope()?
I use the TensorFlow 2.1.2 wheel.
To add some context, the Graphcore TensorFlow wheel includes a port of Keras for the IPU, available as tensorflow.python.ipu.keras. API documentation for IPU Keras is available. This module contains IPU-specific optimised replacements for the TensorFlow Keras classes Model and Sequential, plus higher-performance, multi-IPU classes such as PipelineModel and PipelineSequential.
As for your specific issue, you are right that there are currently no IPU-specific ways to load pre-trained Keras models. Since you appear to have access to IPUs, I would encourage you to reach out to Graphcore Support. When doing so, please attach your pre-trained Keras model model1.h5 and a self-contained reproducer of your code.
On the recompilation question: using an executable cache prevents recompilation. You can set that up with the environment variable TF_POPLAR_FLAGS='--executable_cache_path=./cache'. I'd also recommend looking at the following resources:
A tutorial that gathers several considerations around recompilation and how to avoid it when using TensorFlow 2 on the IPU.
The Graphcore TensorFlow documentation, which explains how to use the pre-compile mode on the IPU.
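As a minimal sketch of the executable cache setting mentioned above, the variable can also be set from Python rather than from the shell. This assumes the flags are read when TensorFlow is first imported, so the call is placed before any TensorFlow import. The helper name enable_executable_cache is hypothetical, and ./cache is simply the directory from the example flag.

```python
import os


def enable_executable_cache(cache_dir="./cache"):
    """Add --executable_cache_path to TF_POPLAR_FLAGS (hypothetical helper).

    Existing flags are preserved, and the flag is not added twice.
    """
    flag = f"--executable_cache_path={cache_dir}"
    existing = os.environ.get("TF_POPLAR_FLAGS", "")
    if flag not in existing.split():
        os.environ["TF_POPLAR_FLAGS"] = f"{existing} {flag}".strip()
    return os.environ["TF_POPLAR_FLAGS"]


# Assumption: call this before "import tensorflow" so the flag is picked up.
flags = enable_executable_cache("./cache")
print(flags)
```

Appending to the existing value, rather than overwriting it, keeps any other Poplar flags that were already exported in the shell.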

While implementing YOLO in darknet, should we train on the ImageNet dataset?

I have installed darknet on Ubuntu and am now trying to implement object detection using YOLO v2 on my custom dataset. The YOLO paper says that the network was pretrained on the ImageNet dataset. So my question is: should we also pretrain the network?
In most cases, if your dataset shares many features with what the pre-trained weights have already learned (e.g. person, car), you should start from pre-trained weights such as darknet53.conv.74 or darknet19_448.conv.23.
But you can also train the network without those pre-trained weights (training from scratch), for example by removing the weights file from the command:
./darknet detector train data/obj.data yolo-obj.cfg
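To make the difference between the two modes concrete, here is a small sketch that builds the darknet command line with or without initial weights. The helper name darknet_train_command is hypothetical, and the sketch assumes the weights file is the last positional argument, which is consistent with removing it from the command above.

```python
import shlex


def darknet_train_command(data_file, cfg_file, weights=None):
    """Build the argument list for 'darknet detector train' (hypothetical helper).

    With weights=None the network is trained from scratch; otherwise
    training starts from the given pre-trained weights file.
    """
    cmd = ["./darknet", "detector", "train", data_file, cfg_file]
    if weights is not None:
        cmd.append(weights)
    return cmd


from_scratch = darknet_train_command("data/obj.data", "yolo-obj.cfg")
fine_tune = darknet_train_command(
    "data/obj.data", "yolo-obj.cfg", "darknet19_448.conv.23"
)

print(shlex.join(from_scratch))
print(shlex.join(fine_tune))
```

Either list can be passed to subprocess.run(cmd, check=True) from the darknet directory.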

Extract CNN features using Caffe and train using SVM

I want to extract features using Caffe and train an SVM on those features. I have gone through this link: http://caffe.berkeleyvision.org/gathered/examples/feature_extraction.html. It shows how to extract features using CaffeNet, but I want to use the LeNet architecture here. I am unable to adapt this command line for LeNet:
./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10 leveldb
Also, after extracting the features, how do I train an SVM on them? I want to use Python for this. For example, if I get features from this code:
features = net.blobs['pool2'].data.copy()
then how can I train an SVM on these features with my own classes?
You have two questions here:
1. Extracting features using LeNet
2. Training an SVM
Extracting features using LeNet
To extract the features from LeNet using the extract_features.bin tool, you need the trained model file (.caffemodel) and the model definition for testing (.prototxt).
The signature of extract_features.bin is:
Usage: extract_features pretrained_net_param feature_extraction_proto_file extract_feature_blob_name1[,name2,...] save_feature_dataset_name1[,name2,...] num_mini_batches db_type [CPU/GPU] [DEVICE_ID=0]
So if you take a train/val prototxt file such as this one (https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt) as an example, you can change it to the LeNet architecture and point it to your LMDB / LevelDB. That should get you most of the way there. If you get stuck after that, you can update your question or post a comment here so we can help.
Training SVM on top of features
I highly recommend using Python's scikit-learn for training an SVM from the features. It is super easy to get started, including reading in features saved from Caffe's format.
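A minimal sketch of that scikit-learn step is below. It does not read Caffe output: the features array is a synthetic stand-in for net.blobs['pool2'].data.copy(), the shape (N, 50, 4, 4) and the two made-up classes are assumptions for illustration, and the linear kernel and C value are just starting points.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for features extracted from a blob such as 'pool2'.
# Assumed shape: (num_images, channels, height, width).
rng = np.random.default_rng(0)
n_per_class = 100
class0 = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 50, 4, 4))
class1 = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, 50, 4, 4))
features = np.concatenate([class0, class1])
labels = np.array([0] * n_per_class + [1] * n_per_class)  # your own class ids

# An SVM expects one flat feature vector per example.
X = features.reshape(len(features), -1)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# Standardise using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="linear", C=1.0)
clf.fit(scaler.transform(X_train), y_train)

accuracy = clf.score(scaler.transform(X_test), y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

With real data, replace the synthetic arrays with the extracted blob values and a label array of the same length, one integer per image.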
Very late reply, but it should help.
It is not 100% what you want, but I have used the VGG-16 net to extract face features using Caffe and run an accuracy test on a small subset of the LFW dataset. Exactly what you need is in the code: it creates classes for training and testing and pushes them into the SVM for classification.

How do I use a trained Theano artificial neural network on single examples?

I have been following the http://deeplearning.net/tutorial/ tutorial on how to train an ANN to classify the MNIST numbers. I am now at the "Convolutional Neural Networks" chapter. I want to use the trained network on single examples (MNIST images) and get the predictions. Is there a way to do that?
I have looked ahead in the tutorial and on google but can't find anything.
Thanks a lot in advance for any kind of help!
The earlier chapters of the Theano tutorial, before the Convolutional Neural Networks (CNN) chapter, give a good overview of how Theano works and of some of the components the CNN sample code uses. It is reasonable to assume that students reaching this point understand Theano well enough to work out how to modify the code to extract the model's predictions. Here are a few hints.
The CNN's output layer, called layer3, is an instance of the LogisticRegression class, introduced in an earlier chapter.
The LogisticRegression class has an attribute called y_pred. The comment next to the code that assigns that attribute's value says:
symbolic description of how to compute prediction as class whose
probability is maximal
Looking for places where y_pred is used in the logistic regression sample will highlight a function called predict(). This does for the logistic regression sample what is desired of the CNN example.
Following the same approach, compiling a new Theano function with layer3.y_pred as its output yields the model's predictions.
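To illustrate what y_pred represents, here is a NumPy sketch (not Theano code) of the computation the quoted comment describes: class probabilities from a softmax over a linear layer, then the class whose probability is maximal. The weights, the bias and the three-feature input are toy values chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict(x, W, b):
    """Return (y_pred, p_y_given_x) for a batch of row vectors x."""
    p_y_given_x = softmax(x @ W + b)
    y_pred = np.argmax(p_y_given_x, axis=1)  # class with maximal probability
    return y_pred, p_y_given_x


# Toy parameters: 3 input features, 2 classes.
W = np.array([[2.0, -1.0], [0.0, 1.0], [-1.0, 3.0]])
b = np.array([0.0, 0.5])

# A single example is passed as a batch with one row.
single_example = np.array([1.0, 0.0, 0.0])
y_pred, probs = predict(single_example[np.newaxis, :], W, b)
print(int(y_pred[0]), probs[0])
```

In the Theano version the same idea applies to inputs: a single image is supplied as a batch. If the CNN sample reshapes its input with a fixed batch size, the image may need to be placed inside a full-sized batch.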
