While using TensorBoard, I cleared my data directory and trained a new model, but I am seeing images from an old model. Why is TensorBoard loading old data, where is it being stored, and how do I remove it?
TensorBoard keeps caches in case a long training run fails: there are "bak"-like files from which your board will generate visualizations. Unfortunately, there is no good way to remove these hidden temp files manually, since they are not visible even when listing dotfiles in bash; this storage is self-managed. As best practices, (1) make your TensorBoard run names dynamic: combine the datetime library with an f-string in Python so that each run's name carries a timestamp, as sketched below. (This can be done right from Python, say from a Jupyter notebook, by importing the subprocess package and running the bash command straight from the script.) (2) Additionally, you are strongly advised to keep your logdir (log directory) separate from the directory where your code runs. Together, these two practices should solve the problems of stale temp files erroneously populating new results.
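A minimal sketch of both suggestions (the log path and port below are placeholder assumptions, not values from the original post):
import datetime
import subprocess

# Timestamped run name so each training run writes to its own directory
run_name = f"run_{datetime.datetime.now():%Y%m%d-%H%M%S}"
logdir = f"/tmp/tb_logs/{run_name}"  # keep logs outside the code directory

# Launch TensorBoard straight from Python (e.g. inside a Jupyter notebook)
subprocess.Popen(["tensorboard", "--logdir", logdir, "--port", "6006"])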
How to "reset" tensorboard data after killing tensorflow instance
Related
I recently finished training a linear regression model, but I don't know how to save it so that in the future I can use it to make predictions without having to retrain it every time.
Do I save the .py file and call it whenever I need it or create a class or what?
I just want to know how I can save a model I trained so I can use it in the future.
Depending on how you fit the linear regression, you should be able to obtain the equation of the regression, as well as the values of its coefficients, most likely by inspecting the workspace.
If you explain what module, function, or code you use to do the regression, it will be easier to give a specific solution.
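For example, if the regression was fit with scikit-learn (an assumption, since the question does not name a library), the equation's coefficients are attributes of the fitted estimator:
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data standing in for the real training set
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.2, 5.9, 8.1])

model = LinearRegression().fit(X, y)

# The fitted equation is y = coef_ * x + intercept_
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)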
Furthermore, you can probably use the dill package:
https://pypi.org/project/dill/
I saw the solution here:
https://askdatascience.com/441/anyone-knows-workspace-jupyter-python-variables-functions
The steps proposed for using dill are:
Install dill. If you use conda, the command is conda install -c anaconda dill
To save workspace using dill:
import dill
dill.dump_session('notebook_session.db')
To restore the session:
import dill
dill.load_session('notebook_session.db')
I saw the same package discussed here: How to save all the variables in the current python session?
and I tested it using a model created with the interpretML package, and it worked for me.
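If you only need the fitted model rather than the entire workspace, dill can also serialize a single object, since it follows the pickle interface. A minimal sketch (the toy model and file name are illustrative, not from the original post):
import dill
from sklearn.linear_model import LinearRegression

# Toy model standing in for whatever estimator you actually trained
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

# Save just the model object
with open("linear_model.pkl", "wb") as f:
    dill.dump(model, f)

# Later, load it back and predict without retraining
with open("linear_model.pkl", "rb") as f:
    restored = dill.load(f)
print(restored.predict([[3.0]]))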
I need to write a program that loads a pretrained model in SavedModel format and analyzes the types of operations present in the graph.
In TF1 I used graph.get_operations() to get the list of all nodes, and I could even modify the graph using tf.contrib.graph_editor. In TF2 these APIs no longer work: the list of operations doesn't provide useful information (I can't distinguish a MatMul from an Add node, for example), and tf.contrib.graph_editor has been removed.
I know that in TF2 I can still "explore" the model visually with tensorboard and/or the debugger, but I need to write a program that can do this analysis automatically.
Is there any way to do that using python?
Thank you.
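One possible approach (a sketch under assumptions, not a definitive answer): in TF2, each signature of a loaded SavedModel is a ConcreteFunction wrapping a FuncGraph, and that graph still supports get_operations() with meaningful op types. The model path and signature key below are placeholders:
from collections import Counter
import tensorflow as tf

# Load the SavedModel (path is a placeholder)
loaded = tf.saved_model.load("/path/to/saved_model")

# 'serving_default' is the usual signature key, but it is not guaranteed
fn = loaded.signatures["serving_default"]

# Count the op types present in the function's graph
op_types = Counter(op.type for op in fn.graph.get_operations())
for op_type, count in op_types.most_common():
    print(op_type, count)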
I'm trying to construct an autoencoder for ultrasound images, and am unable to use ImageDataGenerator.flow_from_directory() to provide train/test datasets because the call to the method segfaults. No augmentation is being used, so the generator should only yield the original images.
The source images are in TIFF format, so I first tried converting them to JPG and PNG, thinking that maybe PIL was faulting on the encoding; no difference. I have tried converting to different color modes (grayscale, RGB, RGBA) with no change in behavior. I have stripped the code down to the bare minimum, taking defaults for nearly all function params, and still get a segfault on the call in both debug and full runs.
# Directory below contains a single subdirectory "input" containing 5635 TIFF images
from keras.preprocessing.image import ImageDataGenerator
print('Create train_gen')
train_gen = ImageDataGenerator().flow_from_directory(
    directory=r'/data/ultrasound-nerve-segmentation/train/',
    class_mode='input'
)
print('Created train_gen')
Expected output is a report of 5635 images found in one class "input", both debug messages printed, and a usable generator for Model.fit_generator().
Actual output:
Using TensorFlow backend.
Create train_gen
Found 5635 images belonging to 1 classes.
Segmentation fault
Is there something I'm doing above that could be causing the problem? According to every scrap of sample code I can find, it looks like it should be working.
Environment is:
Ubuntu 16.04 LTS
CUDA 10.1
tensorflow-gpu 1.14
Keras 2.2.4
Python 3.7.2
Thanks for any help you can provide!
OK, so I haven't pinned down exactly why it was segfaulting, but it appears to be related to the virtualenv it was running under. I was using a JupyterHub environment, which misbehaved even when run from an SSH session (as opposed to from within the JupyterHub console). Once I created a brand-new standalone virtualenv with only the TF + Keras packages installed, it ran just fine.
I am using Pytorch on Windows 10 OS, and having trouble understanding the correct use of Pytorch TensorboardX.
After instantiating a writer (writer = SummaryWriter()) and adding the value of the loss function to it on every iteration (writer.add_scalar('data/loss_func', loss.data[0].item(), iteration)), I have a folder which contains the saved run.
My questions are:
1) As far as I understand, after the training is complete, I need to write in the terminal (which corresponds to the command line prompt in Windows):
tensorboard --logdir=C:\Users\DrJohn\Documents\runs
where this is the folder which contains the file created by tensorboardX. What is the valid syntax in the Windows command prompt? I couldn't understand this from the online tutorials.
2) Is it possible to watch the learning during training using tensorboardX (i.e. to plot the learning curve as the iterations proceed)? Or is the only option to see everything once training ends?
Thanks in advance
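For reference, a minimal sketch of the logging loop described in the question (the loss values are synthetic stand-ins). tensorboardX flushes events to disk periodically as they are written, so running tensorboard --logdir=runs in a second terminal while this loop executes lets you watch the curve update during training, not just at the end:
from tensorboardX import SummaryWriter

writer = SummaryWriter()  # writes to ./runs/<timestamp> by default

for iteration in range(100):
    loss_value = 1.0 / (iteration + 1)  # synthetic stand-in for a real loss
    writer.add_scalar('data/loss_func', loss_value, iteration)

writer.close()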
I am trying to make an app that will detect traffic signs in video frames. I am using YOLO on TensorFlow, following the steps from https://github.com/thtrieu/darkflow .
I need to know how I can train this model with my own dataset of traffic-sign images.
If you're using Darkflow on Windows, you need to make some small adjustments to how you use it. If you cloned the code and are using it straight from the repository, you need to place python in front of the given commands, since flow is a Python file.
e.g. python flow --imgdir sample_img/ --model cfg/yolo-tiny.cfg --load bin/yolo-tiny.weights --json
If you install it globally using pip (not a bad idea) and still want to use the flow utility from any directory, just make sure you take the flow file with you.
To train, use the commands listed on the github page here: https://github.com/thtrieu/darkflow
If training on your own data you will need to take some extra steps as outlined here: https://github.com/thtrieu/darkflow#training-on-your-own-dataset
Your annotations need to be in the popular PASCAL VOC format: a set of XML files containing the image file information and the bounding-box data (a minimal sample is sketched below).
Point your flow command at your new dataset and annotations to train.
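For illustration, here is a minimal sketch that writes one PASCAL VOC-style annotation with Python's standard library; the file names, image size, label, and box coordinates are made-up placeholders:
import xml.etree.ElementTree as ET

def write_voc_annotation(xml_path, filename, width, height, label, box):
    """Write one PASCAL VOC-style annotation; box is (xmin, ymin, xmax, ymax)."""
    ann = ET.Element('annotation')
    ET.SubElement(ann, 'filename').text = filename
    size = ET.SubElement(ann, 'size')
    ET.SubElement(size, 'width').text = str(width)
    ET.SubElement(size, 'height').text = str(height)
    ET.SubElement(size, 'depth').text = '3'
    obj = ET.SubElement(ann, 'object')
    ET.SubElement(obj, 'name').text = label
    bnd = ET.SubElement(obj, 'bndbox')
    for tag, value in zip(('xmin', 'ymin', 'xmax', 'ymax'), box):
        ET.SubElement(bnd, tag).text = str(value)
    ET.ElementTree(ann).write(xml_path)

# Placeholder example: one stop sign in a 640x480 image
write_voc_annotation('stop_001.xml', 'stop_001.jpg', 640, 480, 'stop', (120, 80, 260, 220))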
The best dataset to practice on is the PASCAL VOC dataset. You need to prepare two folders for training: one folder with images and one folder with XML files (the annotation folder). Each image needs one XML file with the same name, containing all the basic information (object name, object position, ...). After that, you only need to choose one predefined .cfg file in the cfg folder and run the following command:
flow --model cfg/yolo-new.cfg --train --dataset "path/to/images/folder" --annotation "path/to/annotation/folder"
Read up on the options supported by darkflow to further optimize the training process.
After spending too much time on figuring out how to train a custom dataset for object detection, here is what worked for me.
Prerequisites:
1: Training environment: a system with at least a 4 GB GPU, or an AWS/GCP pre-configured cloud machine with CUDA 9 installed.
2: Ubuntu 16.04 OS.
3: Images of the object you want to detect. The images should not be too large, or they will cause out-of-memory issues during training.
4: A labelling tool; many are available, such as LabelImg or BBox-Label-Tool; the one I used also worked well.
I also tried a Python dataset-generator project, but the labels it produced were not effective in real-time scenarios.
My suggestion for the training environment is to use an AWS machine rather than spending time on a local CUDA/cuDNN installation; even if you can install CUDA locally, training will often break with out-of-memory errors unless your GPU has at least 4 GB.
Solutions to train the dataset:
1: Train ssd_mobilenet_v2 using the TensorFlow Object Detection API; the training output can be used on both Android and iOS platforms.
2: Use darknet to train the dataset, which requires labels in the PASCAL VOC format; LabelImg does that labelling job very well.
3: Retrain the weights output by darknet with darkflow.