Keras: Using Multiple CPU Cores to Generate Data

The training data for my neural network does not fit in memory. Generating it on a single core is impractically slow because the dataset is very large.
I have been reading about keras.utils.Sequence and how it is multicore friendly. From what I can see, there is little to no documentation on this class. The only helpful example I could find was this:
I still do not understand how to make multiple cores contribute to generating a single batch of training data for Keras. I absolutely need this so that data generation does not bottleneck the neural network.
Thank You


Distributed training on CPU's multiple cores in Keras

According to what I have read in various posts and blogs, and from my own limited trials, both TensorFlow and Keras appear to use as many CPU cores as are available on an individual machine. That makes sense (and is indeed very nice), but does it mean that Keras will do distributed training across multiple cores? This is what I want to do: I have a moderately sized model but a large dataset, and I want to distribute the learning of different batches between cores. I don't have much knowledge of parallel and distributed processing, but my guess is that distributed learning requires some additional handling of data, plus gradient calculation and aggregation, on top of basic multithreading/multitasking. Does Keras do this automatically when using different cores of the same CPU on an individual machine? How can I check?
And if I want to go further and use multiple computers for distributed training, what everybody refers to is at .
But it is a bit complicated and doesn't mention anything about Keras in specific. Is there any other source that specifically discusses distributed training of Keras models over Tensorflow?

How to know when to use fit_generator() in keras when training data gets too big for fit()?

When using Keras for machine learning, model.fit() is used when the training data is small. When the training data is too big, model.fit_generator() is recommended instead of model.fit(). How does one know when the data size has become too large?
The moment you run into memory errors when trying to load the training data into memory, you'll have to switch to fit_generator(). There is extra overhead associated with generating data on the fly (and reading from disk to do so), so training a model on a dataset that lives in memory will always be faster.
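For illustration, here is a minimal sketch of the kind of generator fit_generator() consumes: it loops forever and yields one (inputs, targets) batch at a time, so only one batch is in memory. The CSV layout (label in the last column) and the name batch_generator are assumptions made for this example, not something from the question. It uses only the standard library; a real pipeline would typically convert each batch to NumPy arrays before yielding it.

```python
import csv


def batch_generator(path, batch_size):
    """Yield (features, labels) batches forever, reading the file lazily.

    Assumes (hypothetically) a CSV file whose last column is the label.
    """
    while True:  # Keras-style generators are expected to loop indefinitely
        with open(path, newline="") as f:
            features, labels = [], []
            for row in csv.reader(f):
                *x, y = (float(v) for v in row)
                features.append(x)
                labels.append(y)
                if len(features) == batch_size:
                    yield features, labels
                    features, labels = [], []
            if features:  # final, possibly smaller, batch of this pass
                yield features, labels
```

With a generator like this, the number of batches per epoch has to be passed to Keras explicitly (the steps_per_epoch argument), since the generator itself never ends.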

Deep learning on massive datasets

Theoretical question here. I understand that when dealing with datasets that cannot fit into memory on a single machine, Spark + EMR is a great way to go.
However, I would also like to use TensorFlow instead of Spark's MLlib algorithms to perform deep learning on these large datasets.
From my research I see that I could potentially use a combination of PySpark, Elephas and EMR to achieve this. Alternatively, there are BigDL and sparkdl.
Am I going about this the wrong way? What is best practice for deep learning on data that cannot fit into memory? Should I use online learning or batch training instead? This post seems to say that "most high-performance deep learning implementations are single-node only".
Any help to point me in the right direction would be greatly appreciated.
In TensorFlow, you can use so you can generate your dataset at runtime without any storage hassles.
See link for example
As you mention "fitting massive dataset to memory", I understand that you are trying to load all the data into memory at once and then start training, so my reply is based on that assumption.
The general approach is: if the data does not fit your resources, divide it into smaller chunks and train iteratively.
1- Load samples one by one instead of trying to load everything at once. If you set up an execution workflow such as "Load Data -> Train -> Release Data (this can be done automatically by the garbage collector) -> Restart", you can measure how much memory is needed to train on a single sample.
2- Use mini-batches. Once you have the resource figure from #1, a simple calculation gives an estimate of the mini-batch size. For example, if training on a single sample consumes 1.5 GB of RAM and your GPU has 8 GB of RAM, you can theoretically train on mini-batches of size 5.
3- If the resources are not enough to train on even a batch of size 1, you may consider increasing your PC's capacity or reducing your model's capacity / layers / features. Alternatively, you can go for cloud computing solutions.
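The calculation in step 2 can be written out as a short sketch. The function name and the reserved_gb parameter (memory set aside for things other than the batch, such as the model itself) are assumptions added for illustration; the 8 GB and 1.5 GB figures are the ones from the example above.

```python
import math


def estimate_batch_size(device_memory_gb, per_sample_gb, reserved_gb=0.0):
    """Largest number of samples that fits in the memory left after the reserve."""
    usable = device_memory_gb - reserved_gb
    if per_sample_gb <= 0 or usable < per_sample_gb:
        return 0  # not even one sample fits (the situation described in step 3)
    return math.floor(usable / per_sample_gb)


print(estimate_batch_size(8, 1.5))  # 8 / 1.5 = 5.33..., rounded down to 5
```

This is only an upper bound on paper; in practice some headroom is usually needed, so the measured per-sample figure from step 1 matters more than the formula.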

Best practice for training on large scale datasets like ImageNet using Theano/Lasagne?

I found that all of the examples for Theano/Lasagne deal with small datasets like MNIST and CIFAR-10, which can be loaded into memory completely.
My question is how to write efficient code for training on large scale datasets?
Specifically, what is the best way to prepare mini-batches (including real time data augmentation) in order to keep the GPU busy?
Maybe something like Caffe's ImageDataLayer?
For example, I have a big txt file which contains all the image paths and labels.
Some example code would be appreciated.
Thank you very much!
In case the data doesn't fit into memory, a good way is to prepare the minibatches and store them in an HDF5 file, which is then used at training time.
However, this does not suffice when doing data augmentation, as that is done on the fly. Because of Python's global interpreter lock, images cannot be loaded and preprocessed in the same process while the GPU is busy.
The best way around this that I know of is the Fuel library.
Fuel loads and preprocesses the minibatches in a different python process and then streams them to the training process over a TCP socket:
It additionally provides some functions to preprocess the data, such as scaling and mean subtraction:
Hope this helps.
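The underlying idea (prepare the next minibatches while the current one is training) can be sketched without Fuel. The code below is an illustration under stated assumptions, not Fuel's API: it uses a background thread and a bounded queue from the standard library, and the names prefetch and max_prefetch are made up for this example. A thread only overlaps work that releases the GIL, such as disk reads; for CPU-heavy augmentation written in pure Python the producer would have to be a separate process, which is what Fuel does according to the description above.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the batch stream


def prefetch(batches, max_prefetch=4):
    """Iterate over `batches` while a background thread produces them ahead of time."""
    buffer = queue.Queue(maxsize=max_prefetch)

    def producer():
        try:
            for batch in batches:
                buffer.put(batch)  # blocks while the buffer is full
        finally:
            buffer.put(_DONE)  # always signal the end, even if loading failed

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()  # blocks until the producer has a batch ready
        if batch is _DONE:
            return
        yield batch
```

A training loop would then iterate over prefetch(some_batch_iterator) instead of the iterator itself; the bounded queue keeps memory use limited to at most max_prefetch prepared batches.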

Training Methodology of CNN in theano with large scale data

I am training a CNN on 1M images with Theano. Now I am puzzled about how to prepare the training data.
My questions are:
When the images are resized to 64*64*3, the whole dataset is about 100G. Should I save the data in a single npy file or in several smaller files? Which is more efficient?
How should I decide the number of parameters of the CNN? How about 1M/10 = 100K?
Should I keep the memory cost of a training block plus the CNN parameters below the GPU memory?
My computer has 16G of memory and a Titan GPU.
Thank you very much.
If you're using a NN framework like Pylearn2, Lasagne, Keras, etc., check the docs to see if there are guidelines for iterating batches off disk from an HDF5 store or similar.
If there's nothing and you don't want to roll your own, the Fuel package provides lots of helpful data iteration schemes that can be adapted to models in Theano (and probably most of the frameworks; there's a good tutorial in the Fuel repository).
As for the parameters, you'll have to cross-validate to figure out the best parameters for your data.
And yes, the model size + minibatch size + dropout mask for the batch has to be under the available VRAM.
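As a rough back-of-the-envelope sketch of these sizes: 1M images of 64*64*3 values come to about 12 GB at 1 byte per value, about 49 GB at 4 bytes and about 98 GB at 8 bytes, so the "about 100G" in the question presumably refers to 8-byte floats (an inference, not something the question states). The helper names below are hypothetical, and the VRAM check only restates the rule above; it ignores activations and framework overhead.

```python
BYTES_PER_GB = 1e9  # decimal gigabytes


def dataset_size_gb(n_images, height, width, channels, bytes_per_value):
    """Raw size of a dense image array, ignoring any file-format overhead."""
    return n_images * height * width * channels * bytes_per_value / BYTES_PER_GB


def fits_in_vram(model_gb, batch_gb, mask_gb, vram_gb):
    """Model + minibatch + dropout mask must stay under the available VRAM."""
    return model_gb + batch_gb + mask_gb < vram_gb


for name, nbytes in [("uint8", 1), ("float32", 4), ("float64", 8)]:
    print(name, round(dataset_size_gb(1_000_000, 64, 64, 3, nbytes), 1), "GB")
```

Storing the images as uint8 and converting each minibatch to floats at training time would make the dataset roughly an eighth of the float64 size.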
