I have a Python Telegram bot deployed on Heroku. The problem is that my program creates pickled files while working: I have one database saving the required data, plus pickles to save some nested classes that I have to use later at some stage. So these pickle files are an important part of the program. I am using the dill module for pickling.
I was able to save these files locally, but I can't when running on Heroku. I'll share the logs below. It is not even reaching the pickling part; it fails when opening the file itself.
import dill

def saving_test(test_path, test_obj):
    try:
        save_test_logger.info('Saving test...')
        try:
            save_test_logger.info('opening file')
            test_file = open(test_path, 'wb')
        except Exception:
            save_test_logger.exception('Error opening file')
            return 0
        dill.dump(test_obj, test_file)
        save_test_logger.debug(f'file saved in {test_path}')
        test_file.close()
        return 1
    except Exception as exc:
        save_test_logger.exception('saving error')
        test_file.close()
        return exc
saving 859ab1303bcd4a65805e364a989ac8ca
2020-10-07T20:53:18.064670+00:00 app[web.1]: Could not open file ./test_objs/859ab1303bcd4a65805e364a989ac8ca.pkl
And I have added logging to my program, but now I am confused about where I can see the original logs that are supposed to catch the exceptions.
This is my first time using Heroku and I am comparatively new to programming, so please help me identify the root cause of the problem.
I found the problem. Even though I had pushed everything, the directory test_objs wasn't there on Heroku. I just added two more lines of code to create the directory with the os module if it does not exist, and that solved the problem. I am not deleting the question so that, in case someone gets stuck or confused in a similar situation, it might help them.
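For reference, a minimal sketch of that fix (the directory name test_objs comes from the logs above; where exactly the call belongs in the real program may differ):

```python
import os

# Create the directory for the pickled objects if it doesn't exist yet.
# Empty directories are not tracked by git, so a directory that exists
# locally may simply be absent after a deploy and must be created at runtime.
os.makedirs('./test_objs', exist_ok=True)
```

With exist_ok=True the call is safe to run on every start, whether or not the directory is already there.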
Is there anything in the Python API that lets you alter the artifact subdirectories? For example, I have a .json file stored here:
s3://mlflow/3/1353808bf7324824b7343658882b1e45/artifacts/feature_importance_split.json
MLflow creates a 3/ key in S3. Is there a way to modify this key to something else (a date or the name of the experiment)?
As I commented above, yes, mlflow.create_experiment() does allow you to set the artifact location using the artifact_location parameter.
However, somewhat related, the problem with setting the artifact_location using the create_experiment() function is that once you create an experiment, MLflow will throw an error if you run the create_experiment() function again.
I didn't see this in the docs, but it's confirmed that if an experiment already exists in the backend store, MLflow will not allow you to run the same create_experiment() function again. And as of this post, MLflow does not have a check_if_exists flag or a create_experiments_if_not_exists() function.
To make things more frustrating, you cannot set the artifact_location in the set_experiment() function either.
So here is a pretty easy workaround; it also avoids the "ERROR mlflow.utils.rest_utils..." stdout logging:
import os
from random import random, randint

import mlflow
from mlflow import log_metric, log_param, log_artifacts
from mlflow.exceptions import MlflowException

# Set the tracking URI first, so the experiment lookup below talks to the
# right tracking server.
mlflow.set_tracking_uri('http://localhost:5000')

try:
    experiment = mlflow.get_experiment_by_name('oof')
    experiment_id = experiment.experiment_id
except AttributeError:
    experiment_id = mlflow.create_experiment('oof', artifact_location='s3://mlflow-minio/sample/')

with mlflow.start_run(experiment_id=experiment_id) as run:
    print("Running mlflow_tracking.py")
    log_param("param1", randint(0, 100))
    log_metric("foo", random())
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)
    if not os.path.exists("outputs"):
        os.makedirs("outputs")
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")
    log_artifacts("outputs")
If it is the user's first time creating the experiment, the code hits an AttributeError, since get_experiment_by_name() returns None and None has no experiment_id attribute, and the except block creates the experiment.
On the second, third, etc. run, only the code under the try statement executes, since the experiment now exists. MLflow will now create a 'sample' key in your S3 bucket. Not fully tested, but it works for me at least.
This question already has an answer here:
How to use dill library for object serialization with shelve library
(1 answer)
Closed 2 years ago.
I'm teaching myself Python using a roguelike tutorial. I've hit a bug with saving the game, and I'm trying to figure out how to resolve it.
My save-game code is dirt simple, and looks like this:
def save_game(engine: Engine) -> None:
    with shelve.open('savegame', 'n') as data_file:
        data_file['engine'] = engine
The "engine" object has all the game-state info I need. Most of the time, it works great.
However, I've found that the auto-save gets screwed up when it triggers after I use a fireball scroll:
AttributeError: Can't pickle local object 'FireballDamageConsumable.get_action.<locals>.<lambda>'
Poking around a bit, I gather that I need to somehow get dill into the mix. Just doing import dill isn't enough.
However! Before I solve THAT problem, I have another problem I'd like to solve first while I still have this bug that lets me see it.
If I quit the game immediately after the failed save, my save file is now corrupted. The auto-load feature won't pick it up (and in the current state of the game, that means I have to delete the save file manually). An auto-save that periodically corrupts its own save file seems like a much more significant problem.
So, my question is really a two-parter:
How do I refactor my save_game method to be smarter? How do I prevent it from corrupting the file if something goes wrong? Should I pre-pickle the engine object, and only do shelve.open if that doesn't throw any errors? Is there a simple way for me to create a backup file from the old data and then revert to it if something goes awry? (Edit: I think I might have this part working.)
Once Part 1 is resolved, is there a way for me to tweak my call to shelve so that it doesn't get confused by lambdas? Do I just need to be hyper-vigilant any time a lambda shows up in my code and make sure it gets omitted from the state I'm trying to save? Are there any other "gotchas" I need to be aware of that would make this kind of brute-force "just save the entire game state" approach a bad idea?
Thanks in advance for any guidance anyone can offer.
Edit:
I did put together some code for backing up the save file that seems to do the job. I'm tossing it in here just in case someone wants to point me towards some common/built-in Python utility that does all this work for me. Failing that, this seems to prevent the autosave from corrupting its own file. Now I just have to figure out why it was trying to corrupt the file in the first place.
SAVE_FILE_BASE = 'savegame'
SAVE_FILE_LIST = [SAVE_FILE_BASE + '.dat', SAVE_FILE_BASE + '.dir', SAVE_FILE_BASE + '.bak']

def save_game(engine: Engine) -> None:
    # Make a copy of the old save file (if it exists) just in case this one gets janked.
    backup()
    try:
        # The with statement closes the shelf for us, even on error.
        with shelve.open(SAVE_FILE_BASE, 'n') as data_file:
            data_file['engine'] = engine
        purge_backups()
    except Exception:
        traceback.print_exc()
        restore_backups()

def backup() -> None:
    for file in SAVE_FILE_LIST:
        cautious_copy(file, file + '.bak')

def restore_backups() -> None:
    for file in SAVE_FILE_LIST:
        cautious_move(file + '.bak', file)

def purge_backups() -> None:
    for file in SAVE_FILE_LIST:
        cautious_remove(file + '.bak')

def cautious_copy(src: str, dest: str) -> None:
    if os.path.isfile(src):
        copy2(src, dest)

def cautious_move(src: str, dest: str) -> None:
    if os.path.isfile(src):
        move(src, dest)

def cautious_remove(file: str) -> None:
    if os.path.isfile(file):
        os.remove(file)
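On the "common/built-in Python utility" question: there isn't one for shelve specifically (it can write several files behind the scenes, as the .dat/.dir list above shows), but if you serialize to a single file yourself, the standard pattern is to write to a temporary file and atomically swap it into place with os.replace. A minimal sketch using stdlib pickle (substitute dill once it's wired in); save_atomically and the file names are illustrative, not from the tutorial:

```python
import os
import pickle
import tempfile

def save_atomically(obj, path: str) -> None:
    """Pickle obj to path without ever leaving a half-written file behind."""
    # Write to a temp file in the same directory, so os.replace stays on
    # one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    try:
        with os.fdopen(fd, 'wb') as f:
            pickle.dump(obj, f)
        # Atomic swap: the old save survives any failure above this line.
        os.replace(tmp_path, path)
    except Exception:
        os.remove(tmp_path)  # clean up the partial temp file
        raise
```

If pickling fails (e.g. on an unpicklable lambda), the exception propagates but the previous save file is untouched, which sidesteps the corruption problem entirely.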
Ah-ha! Looks like this question fixes my problem:
How to use dill library for object serialization with shelve library
Plugging this into the import statements does the trick:
from dill import Pickler, Unpickler
shelve.Pickler = Pickler
shelve.Unpickler = Unpickler
I have a Google Colab notebook with PyTorch code running in it.
At the beginning of the train function, I create, save and download word_to_ix and tag_to_ix dictionaries without a problem, using the following code:
from google.colab import files
torch.save(tag_to_ix, pos_dict_path)
files.download(pos_dict_path)
torch.save(word_to_ix, word_dict_path)
files.download(word_dict_path)
I train the model, and then try to download it with the code:
torch.save(model.state_dict(), model_path)
files.download(model_path)
Then I get a MessageError: TypeError: Failed to fetch.
Obviously, the problem is not with the third party cookies (as suggested here), because the first files are downloaded without a problem. (I actually also tried adding the link in my Allow section, but, surprise surprise, it made no difference.)
I was originally trying to save the model as is (which, to my understanding, saves it as a pickle), and I thought maybe Colab files doesn't handle downloading pickles well, but as you can see above, I'm now trying to save a dict object (which is also what word_to_ix and tag_to_ix are), and it's still not working.
Downloading the file manually with right-click isn't a solution, because sometimes I leave the code running while I do other things, and by the time I get back to it, the runtime has disconnected, and the files are gone.
Any suggestions?
I was simply trying to generate a summary that would show the run_metadata as follows:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
summary = sess.run([x, y], options=run_options, run_metadata=run_metadata)
train_writer.add_run_metadata(run_metadata, 'step%d' % step)
train_writer.add_summary(summary, step)
I made sure the path to the logs folder exists; this is confirmed by the fact that the summary file is generated, but no metadata is present. Now I am not sure a metadata file is actually generated, to be honest, but when I open TensorBoard, the graph looks fine and the session runs dropdown menu is populated. When I select any of the runs, it shows a "Parsing metadata.pbtxt" progress bar that stops and hangs right halfway through.
This prevents me from gathering any additional info about my graph. Am I missing something? A similar issue happened when trying to run this tutorial locally (MNIST summary tutorial). I feel like I am missing something simple. Does anyone have an idea about what could cause this issue? Why would my TensorBoard hang when trying to load session run data?
I can't believe I made it work right after posting the question but here it goes. I noticed that this line:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
was giving me an error so I removed the params and turned it into
run_options = tf.RunOptions()
without realizing that this is what caused the metadata not to be parsed. Once I researched the error message:
Couldn't open CUDA library cupti64_90.dll
I looked into this GitHub thread and moved the file into the bin folder. After that I ran my code again with the trace_level param, had no errors, and the metadata was successfully parsed.
I'm trying to read a PNG file and print the NumPy matrix of the image to the terminal, using OpenCV's imread function on the server, like this:
import cv2
from flask import Flask
import os

application = Flask(__name__)

@application.route('/readImage', methods=['POST'])
def handleHTTPPostRequest():
    imagePath = f'{os.getcwd()}/input.png'
    print('image path is', imagePath)
    print(cv2.__version__)
    im = cv2.imread(imagePath, cv2.IMREAD_COLOR)
    print(im)
    return 'success'
This gives the expected output on my local machine (Ubuntu 18.04) no matter how many times I execute it. I moved it to Elastic Beanstalk (CentOS) with the necessary setup. The request runs fine (gives proper logs along with success) the very first time I make a POST call.
But when I make the POST call a second time, it only outputs the first two logs (image path and cv2 version) and is stuck there for a while; after some time, it shows this error:
End of script output before headers: application.py
I have added one more line just before cv2.imread, just to make sure the file exists:
print('does the file exists',os.path.isfile(imagePath) )
This returns true every time. I have restarted the server multiple times; it looks like it only works the very first time, and cv2.imread() is stuck after the first POST call. What am I missing?
When you print from a request handler, Flask tries to do something sensible, but print really isn't what you want to be doing, as it risks throwing the HTTP request/response bookkeeping off.
A fully-supported way of getting diagnostic info out of a handler is to use the logging module. It will require a small bit of configuration. See http://flask.pocoo.org/docs/1.0/logging/
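As a sketch of that configuration (stdlib-only; in a Flask app you would typically run this before creating the app, then log via app.logger or a module-level logger instead of print; the logged path is illustrative):

```python
import logging
from logging.config import dictConfig

# Route diagnostics to stderr, which WSGI servers capture reliably,
# instead of print(), which can interfere with the response stream.
dictConfig({
    'version': 1,
    'formatters': {
        'default': {'format': '[%(asctime)s] %(levelname)s in %(module)s: %(message)s'},
    },
    'handlers': {
        'wsgi': {
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stderr',
            'formatter': 'default',
        },
    },
    'root': {'level': 'INFO', 'handlers': ['wsgi']},
})

logger = logging.getLogger(__name__)
logger.info('image path is %s', '/some/path/input.png')
```

Inside a request handler you would then call logger.info(...) where the original code used print.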
To anyone facing this issue, I have found a solution. Add this to your ebextensions config file
container_commands:
  AddGlobalWSGIGroupAccess:
    command: "if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' ../wsgi.conf ; then echo 'WSGIApplicationGroup %{GLOBAL}' >> ../wsgi.conf; fi;"
Saikiran's final solution worked for me. I was getting this issue when I tried calling methods from the opencv-python library. I'm running Ubuntu 18.04 locally and it works fine there. However, as in Saikiran's original post, when deployed to Elastic Beanstalk the first request works and then the second one does not. For my EB environment, I'm using a Python 3.6-based Amazon Linux server.