Running "Get started with TensorFlow 2.0 for beginners" from
https://www.tensorflow.org/beta/tutorials/quickstart/beginner
in Colab
https://colab.research.google.com/github/tensorflow/docs/blob/r2.0rc/site/en/r2/tutorials/quickstart/beginner.ipynb
works fine and only takes a few seconds.
But I would like to run it locally, so I extracted the Python code from the notebook. When I start it, the output does not look as intended (a problem with backspace characters?), the ETA (estimated time remaining) keeps growing, and the program does not finish within a reasonable time.
Can you please help me find out what the problem is?
The tutorial on Colab uses the CPU by default, so make sure you are not using a GPU there. If not, then compare your local CPU and RAM with Colab's: a Colab CPU runtime has approximately 13 GB of RAM. Here are the specs of Colab. The problem is most likely the difference in CPU power.
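In case it helps to compare the two environments, here is a minimal sketch (assuming the psutil package is available, as it is on Colab) that prints the core count and RAM of whatever machine it runs on:

import os
import psutil

# Logical CPU cores visible to the process
print("CPU cores:", os.cpu_count())

# Total and currently available system memory, in GB
mem = psutil.virtual_memory()
print("Total RAM (GB):", round(mem.total / 1e9, 1))
print("Available RAM (GB):", round(mem.available / 1e9, 1))

Running this both locally and in the Colab notebook makes the hardware gap explicit.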
I'm not sure whether this is BERT-related or not; I haven't had a chance to test other models, but I observed it with BERT.
What I noticed recently is that training code and data that I used to run in Google Colab for free seem to work significantly slower in an Azure ML workspace that we pay for.
I made a comparison: the same data file (a classification problem, sentiment analysis of 10K reviews), exactly the same notebook code (copy+paste), the same latest version of the ktrain library installed on both, and both on Python 3.8; the GPU is even a bit more performant on the Colab side.
The results surprised me, to say the least: Google Colab did the job 10 times faster, 17 min vs 170 min, and this is reproducible. The Tesla T4 (Colab) is indeed faster than the K80 (Azure), but not by that much according to known benchmarks. So I wonder what else could matter. Is the virtual environment created in Azure ML really performing that slowly? If you have any idea what it could be, or what else I can check on both sides to track it down, please share.
By the way, Google gives you a T4 in Colab for your experiments for free, while you have to pay for the slower K80 on Azure.
Google Colab
execution time = 17 min
hardware: CPU: Intel(R) Xeon(R) CPU @ 2.20GHz, memory: 13 GB, GPU: Tesla T4

Azure
execution time = 2h50m = 170 min (10x Colab)
hardware: GPU: Tesla K80

K80 and T4 comparison: https://technical.city/en/video/Tesla-K80-vs-Tesla-T4
There are different Azure VM Types available with different GPUs for different use-cases:
https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-gpu
The NCasT4_v3 series, for example, will also provide you with a Tesla T4 GPU.
Maybe give those a try as well.
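It may also be worth ruling out setup differences that the VM size name does not show. Here is a minimal sketch, assuming the notebook uses TensorFlow underneath ktrain, that prints what each runtime actually sees:

import tensorflow as tf

# List the GPUs TensorFlow can see and their reported details
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)
for gpu in gpus:
    print(tf.config.experimental.get_device_details(gpu))

# Check whether the installed build has CUDA support at all
print("Built with CUDA:", tf.test.is_built_with_cuda())

If the Azure side reports no visible GPU or a CPU-only build, the 10x slowdown would be explained by the model silently training on the CPU.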
I wrote code for my CNN. I was tuning my model with 10 iterations of 20 epochs each. When I run the code on my local 4 GB GPU, memory is exhausted on the 9th iteration. When I run the same code on Google Colab, its 12 GB of RAM is exhausted on the 1st iteration. How is that possible? It is three times the size of my GPU's memory, yet it uses more RAM. Please explain.
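Without seeing the code this is only a guess, but one common cause is that each tuning iteration builds a new model while the previous one is still held in memory, so usage grows with every iteration. A minimal sketch of the usual mitigation in tf.keras (build_and_train_model is a hypothetical stand-in for your own training step):

import gc
import tensorflow as tf

for i in range(10):
    # Hypothetical helper standing in for your own model-building and training code
    model = build_and_train_model()

    # Drop the reference and clear the Keras/TensorFlow state so memory
    # from previous iterations does not accumulate
    del model
    tf.keras.backend.clear_session()
    gc.collect()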
My dataset is about 5 GB and contains 2000 short videos. I am using a system with 8 x 32 GB GPUs.
But when I run my code, I get an error at epoch 7 (sometimes it also happens at epoch 6, 8, or 9). This is a screenshot of the error message:
As it shows, the error happens on GPU[0], while we have 8 GPUs (GPU[0], GPU[1], GPU[2], GPU[3], GPU[4], GPU[5], GPU[6], GPU[7]). I have no idea whether the server uses the GPUs sequentially or in parallel. How should I define and activate the GPUs up front to get rid of this error message?
Please help me solve this out-of-memory error.
Thanks,
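The question does not say which framework is used, but if it is TensorFlow/Keras, the symptom matches the default behaviour of placing the whole model on GPU:0 and ignoring the other cards. A minimal sketch of spreading the work across all visible GPUs with MirroredStrategy (build_model is a placeholder for however you construct your network):

import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and splits each
# batch across them, instead of putting everything on GPU[0]
strategy = tf.distribute.MirroredStrategy()
print("Number of devices:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = build_model()  # placeholder for your own model definition
    model.compile(optimizer="adam", loss="categorical_crossentropy")

# model.fit(...) is then called as usual; each GPU receives a slice of the batch

If the code is PyTorch instead, the analogous options are DataParallel or DistributedDataParallel.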
I am currently trying to use the VGG16 model from the Keras library, but whenever I create a VGG16 model object by doing
from keras.applications.vgg16 import VGG16
model = VGG16()
I get the following message three times:
tensorflow/core/framework/allocator.cc:124] Allocation of 449576960 exceeds 10% of system memory
Following this, my computer freezes. I am using a 64-bit machine with 4 GB RAM running Linux Mint 18, and I have no access to a GPU.
Does this problem have something to do with my RAM?
As a temporary solution I am running my Python scripts from the command line, because my computer freezes less often there than in any IDE. Also, this does not happen when I use an alternative model like InceptionV3.
I have tried the solution provided here, but it didn't work.
Any help is appreciated.
You are most likely running out of memory (RAM).
Try running top (or htop) in parallel and see your memory utilization.
In general, VGG models are rather big and require a decent amount of RAM. That said, the actual requirement depends on the batch size: a smaller batch means smaller activation tensors.
For example, a 6-image batch would consume about a gigabyte of RAM (reference). As a test, you could lower your batch size to 1 and see if that fits in your memory.
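As a concrete way to test that, here is a minimal sketch that loads VGG16 and runs a single random image through it with batch_size=1, which keeps the activation memory as small as possible (the random image is just a stand-in for real input):

import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16()

# One random 224x224 RGB image in place of real data; a batch of 1 keeps
# the activation tensors as small as they can get
x = np.random.rand(1, 224, 224, 3).astype("float32") * 255
preds = model.predict(preprocess_input(x), batch_size=1)
print(preds.shape)  # (1, 1000) class probabilities

If even this freezes the machine, the problem is the size of the model weights themselves (roughly 500 MB) rather than the batch size.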
I have installed TensorFlow and Keras with an Anaconda installation on Windows 10. I'm using an Intel i7 processor. It takes 40 minutes to train on 4000 data samples from a CSV file, on which I'm trying to perform LSTM RNN predictive analytics.
Is this an expected training time using only a CPU? Can we make it faster on the CPU, or by switching to a GPU?
Yes, this does seem like a reasonable amount of time for your code to run when you're training using only a CPU. If you used an NVIDIA GPU, it would run much faster.
However, you might not be using every core of your CPU; if you did, it might run faster. You can change the number of threads that TensorFlow uses by running:
sess = tf.Session(config=tf.ConfigProto(intra_op_parallelism_threads=NUM_THREADS))
If you set the number of threads equal to the number of cores your CPU provides, it should run faster.
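Note that tf.Session and tf.ConfigProto are TensorFlow 1.x APIs. If you are on TensorFlow 2.x, the equivalent setting (to the best of my knowledge) is the tf.config.threading interface, called before any operations run:

import tensorflow as tf

NUM_THREADS = 8  # e.g. the number of cores your CPU provides

# TF 2.x replacement for ConfigProto's parallelism options; must be set
# before TensorFlow executes any operations
tf.config.threading.set_intra_op_parallelism_threads(NUM_THREADS)
tf.config.threading.set_inter_op_parallelism_threads(NUM_THREADS)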