ipython notebook --script deprecated. How to replace with post save hook?

I have been using "ipython notebook --script" to automatically save a .py file for each ipython notebook so I can use it to import classes into other notebooks. But this recently stopped working, and I get the following error message:
`--script` is deprecated. You can trigger nbconvert via pre- or post-save hooks:
A post-save hook has been registered that calls:
ipython nbconvert --to script [notebook]
which behaves similarly to `--script`.
As I understand this I need to set up a post-save hook, but I do not understand how to do this. Can someone explain?

[UPDATED per comment by #mobius dumpling]
Find your config files:
Jupyter / ipython >= 4.0
jupyter --config-dir
ipython <4.0
ipython locate profile default
If you need a new config:
Jupyter / ipython >= 4.0
jupyter notebook --generate-config
ipython <4.0
ipython profile create
Within this directory, there will be a file called [jupyter | ipython]_notebook_config.py. Put the following code, taken from ipython's GitHub issues page, in that file:
import os
from subprocess import check_call
c = get_config()
def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    check_call(['ipython', 'nbconvert', '--to', 'script', fname], cwd=d)
c.FileContentsManager.post_save_hook = post_save
For Jupyter, replace ipython with jupyter in check_call.
Note that there's a corresponding 'pre-save' hook, and also that you can call any subprocess or run any arbitrary code there, if you want to do anything fancy like checking some condition first, notifying API consumers, or adding a git commit for the saved script.
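As an illustration of the pre-save hook mentioned above, here is a minimal sketch that clears code-cell outputs before a notebook is written to disk. It follows the pattern shown in the Jupyter documentation; the function name scrub_output_pre_save is only illustrative, and the registration line is left as a comment because `c` exists only inside the config file.

```python
def scrub_output_pre_save(model, **kwargs):
    """Pre-save hook sketch: clear outputs from code cells before saving."""
    if model['type'] != 'notebook':
        return  # only touch notebooks
    if model['content']['nbformat'] != 4:
        return  # this sketch only handles nbformat v4
    for cell in model['content']['cells']:
        if cell['cell_type'] != 'code':
            continue
        cell['outputs'] = []
        cell['execution_count'] = None

# In [jupyter | ipython]_notebook_config.py, where `c = get_config()` is available:
# c.FileContentsManager.pre_save_hook = scrub_output_pre_save
```

Unlike the post-save hook, this one modifies the model in place, so the change ends up in the saved .ipynb itself.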

Here is another approach that doesn't invoke a new process (with check_call). Add the following to jupyter_notebook_config.py as in Tristan's answer:
import io
import os
from notebook.utils import to_api_path
_script_exporter = None
def script_post_save(model, os_path, contents_manager, **kwargs):
    """convert notebooks to Python script after save with nbconvert
    replaces `ipython notebook --script`
    """
    from nbconvert.exporters.script import ScriptExporter
    if model['type'] != 'notebook':
        return
    global _script_exporter
    if _script_exporter is None:
        _script_exporter = ScriptExporter(parent=contents_manager)
    log = contents_manager.log
    base, ext = os.path.splitext(os_path)
    py_fname = base + '.py'
    script, resources = _script_exporter.from_filename(os_path)
    script_fname = base + resources.get('output_extension', '.txt')
    log.info("Saving script /%s", to_api_path(script_fname, contents_manager.root_dir))
    with io.open(script_fname, 'w', encoding='utf-8') as f:
        f.write(script)
c.FileContentsManager.post_save_hook = script_post_save
Disclaimer: I'm pretty sure I got this from SO somewhere, but can't find it now. Putting it here so it's easier to find in future (:

I just encountered a problem where I didn't have rights to restart my Jupyter instance, and so the post-save hook I wanted couldn't be applied.
So, I extracted the key parts and could run this with python manual_post_save_hook.py:
from io import open
from re import sub
from os.path import splitext
from nbconvert.exporters.script import ScriptExporter
for nb_path in ['notebook1.ipynb', 'notebook2.ipynb']:
    base, ext = splitext(nb_path)
    script, resources = ScriptExporter().from_filename(nb_path)
    # mine happen to all be in Python so I needn't bother with the full flexibility
    script_fname = base + '.py'
    with open(script_fname, 'w', encoding='utf-8') as f:
        # remove 'In [ ]' commented lines peppered about
        f.write(sub(r'[\n]{2}# In\[[0-9 ]+\]:\s+[\n]{2}', '\n', script))
You can add your own bells and whistles as you would with the standard post save hook, and the config is the correct way to proceed; sharing this for others who might end up in a similar pinch where they can't get the config edits to go into action.


JupyterLab 3: how to get the list of running servers

Since JupyterLab 3.x jupyter-server is used instead of the classic notebook server, and the following code does not list servers served with jupyter_server:
from notebook import notebookapp
servers = list(notebookapp.list_running_servers())
What still works for the file/notebook name is:
from time import sleep
from IPython.display import display, Javascript
import subprocess
import os
import uuid
def get_notebook_path_and_save():
    magic = str(uuid.uuid1()).replace('-', '')
    print(magic)  # the marker must end up in the saved notebook output for grep to find it
    # saves it (ctrl+S)
    # display(Javascript('IPython.notebook.save_checkpoint();')) # Javascript Error: IPython is not defined
    nb_name = None
    while nb_name is None:
        try:
            sleep(0.1)
            nb_name = subprocess.check_output(f'grep -l {magic} *.ipynb', shell=True).decode().strip()
        except subprocess.CalledProcessError:
            pass  # grep exits non-zero until the notebook has been saved with the marker
    return os.path.join(os.getcwd(), nb_name)
But it's neither Pythonic nor fast.
How to get the current running server instances - and so e.g. the current notebook file?
Migration to jupyter_server should be as easy as changing notebook to jupyter_server, notebookapp to serverapp and changing the appropriate configuration files - the server-related codebase is largely unchanged. In the case of listing servers simply use:
from jupyter_server import serverapp
servers = list(serverapp.list_running_servers())

Why does the program run in command line but not with IDLE?

The code uses a Reddit wrapper called praw
Here is part of the code:
import praw
from praw.models import MoreComments
username = 'myusername'
userAgent = 'MyAppName/0.1 by ' + username
clientId = 'myclientID'
clientSecret = 'myclientSecret'
threadId = input('Enter your thread id: ')
reddit = praw.Reddit(user_agent=userAgent, client_id=clientId, client_secret=clientSecret)
submission = reddit.submission(id=threadId)
subredditName = submission.subreddit
subredditName = str(subredditName)
act = input('type in here what you want to see: ')
comment_queue = submission.comments[:] # Seed with top-level
def dialogues():
    for comment in submission.comments.list():
        if comment.body.count('"')>7 or comment.body.count('\n')>3:
            print(comment.body + '\n \n \n')
def maxLen():
    res = 'abc'
    for comment in submission.comments.list():
        if len(comment.body)>len(res):
            res = comment.body
Since I am new to Python and don't really get programming in general, I am surprised to see that every bit of the code works on the command line, but in IDLE I get an error on the first line saying ModuleNotFoundError: No module named 'praw'
You have to install praw using the command
pip install praw
which installs the latest version of praw in the environment.
What must be happening is that your cmd and IDLE are using different Python interpreters, i.e., you have two separate Python installations that can each execute Python code. They can be different versions of Python, or the same version installed in different locations on your machine.
Let's call the two interpreters PyA and PyB for now. If you ran pip install praw for PyA, only PyA will be able to import and use that library. PyB still has no idea what praw means.
What you can do is install the library for PyB as well, and everything will be good to go.
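One way to see which interpreter is which is to run the same small check in both IDLE and the command line. This is a sketch only; the helper names interpreter_report and pip_install_command are made up for illustration. Building the pip command from sys.executable ties the install to the interpreter that is actually running the code.

```python
import importlib.util
import sys

def interpreter_report(module_name):
    """Describe the running interpreter and whether it can see module_name."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "module_found": importlib.util.find_spec(module_name) is not None,
    }

def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]

if __name__ == "__main__":
    print(interpreter_report("praw"))
    print(pip_install_command("praw"))
```

If the two environments print different executable paths, running the printed pip command for the interpreter that IDLE reports installs praw where IDLE can find it.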

Creating a Spark RDD from a file located in Google Drive using Python on Colab.Research.Google

I have been successful in running a Python 3 / Spark 2.2.1 program on Google's Colab.Research platform:
!apt-get update
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
!tar xf spark-2.2.1-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.2.1-bin-hadoop2.7"
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
This works perfectly when I upload text files from my local computer to the Unix VM using
from google.colab import files
datafile = files.upload()
and read them as follows :
textRDD = spark.read.text('hobbit.txt').rdd
so far so good ..
My problem starts when I am trying to read a file that is lying in my Google drive colab directory.
Following instructions I have authenticated user and created a drive service
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
after which I have been able to access the file lying in the drive as follows :
file_id = '1RELUMtExjMTSfoWF765Hr8JwNCSL7AgH'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
Downloaded file contents are: b'The king beneath the mountain\r\nThe king of ......
even this works perfectly ..
and gets the data
The king beneath the mountain
The king of carven stone
The lord of silver fountain ...
where things FINALLY GO WRONG is where I try to grab this data and put it into a spark RDD
tRDD = spark.read.text(downloaded.read().decode('utf-8'))
and I get the error ..
AnalysisException: 'Path does not exist: file:/content/The king beneath the mountain\ ....
Evidently, I am not using the correct method / parameters to read the file into spark. I have tried quite a few of the methods described
I would be very grateful if someone can help me figure out how to read this file for subsequent processing.
A complete solution to this problem is available in another StackOverflow question at this URL.
Here is the notebook where this solution is demonstrated.
I have tested it and it works!
It seems that spark.read.text expects a file name. But you give it the file content instead. You can try either of these:
save it to a file then give the name
use just downloaded instead of downloaded.read().decode('utf-8')
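The first option can be sketched as follows. The helper name save_buffer_to_file and the temporary path are assumptions for illustration, and a small in-memory buffer stands in for the `downloaded` object from the question. The Spark call is left as a comment because it needs the `spark` session created earlier.

```python
import io
import os
import tempfile

def save_buffer_to_file(buffer, path):
    """Write the full contents of an in-memory bytes buffer to path."""
    buffer.seek(0)  # rewind: an earlier read() leaves the cursor at the end
    with open(path, "wb") as f:
        f.write(buffer.read())
    return path

# Stand-in for the `downloaded` buffer filled by MediaIoBaseDownload.
downloaded = io.BytesIO(b"The king beneath the mountain\r\nThe king of carven stone\r\n")
local_path = save_buffer_to_file(
    downloaded, os.path.join(tempfile.gettempdir(), "hobbit.txt")
)

# With a SparkSession named `spark`, as in the question, the path can then be read:
# textRDD = spark.read.text(local_path).rdd
```

The point is that spark.read.text receives a path to a file on disk, not the text of the file.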
You can also simplify downloading from Google Drive with pydrive. I gave an example here.
Downloading is just
fid = drive.ListFile({'q':"title='hobbit.txt'"}).GetList()[0]['id']
f = drive.CreateFile({'id': fid})
f.GetContentFile('hobbit.txt')

Loading python modules in Python 3

How do I load a Python module that is not built in? I'm trying to create a plugin system for a small project I'm working on. How do I load those "plugins" into Python? And, instead of calling "import module", how do I use a string to reference the module?
Have a look at importlib
Option 1: Import an arbitrary file in an arbitrary path
Assume there's a module at /path/to/my/custom/module.py containing the following contents:
# /path/to/my/custom/module.py
test_var = 'hello'
def test_func():
    print(test_var)
We can import this module using the following code:
import importlib.machinery
myfile = '/path/to/my/custom/module.py'
sfl = importlib.machinery.SourceFileLoader('mymod', myfile)
mymod = sfl.load_module()
The module is imported and assigned to the variable mymod. We can then access the module's contents as:
# prints 'hello' to the console
print(mymod.test_var)
# also prints 'hello' to the console
mymod.test_func()
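Note that SourceFileLoader.load_module() is deprecated in current Python 3 releases. A sketch of the same idea using importlib.util is shown below; the helper name load_module_from_path is hypothetical, and the module is written to a temporary directory only so that the example runs on its own (its test_func returns the value instead of printing it).

```python
import importlib.util
import os
import tempfile

def load_module_from_path(name, path):
    """Load the Python file at path as a module object called name."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the module's top-level code
    return module

# Create a throwaway module file so the example is self-contained.
plugin_dir = tempfile.mkdtemp()
plugin_path = os.path.join(plugin_dir, "module.py")
with open(plugin_path, "w", encoding="utf-8") as f:
    f.write("test_var = 'hello'\n\ndef test_func():\n    return test_var\n")

mymod = load_module_from_path("mymod", plugin_path)
print(mymod.test_var)     # prints 'hello'
print(mymod.test_func())  # also prints 'hello'
```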
Option 2: Import a module from a package
Use importlib.import_module
For example, if you want to import settings from a settings.py file in your application root folder, you could use
import importlib
_settings = importlib.import_module('settings')
The popular task queue package Celery uses this a lot. Rather than giving you code examples here, I suggest checking out their git repository.
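For the plugin use case in the question, importlib.import_module can be driven by a list of module names held as strings. The following is a minimal sketch; load_plugins and the example names are assumptions, with the standard-library json module standing in for a real plugin.

```python
import importlib

def load_plugins(module_names):
    """Import each named module and return a dict of name -> module object."""
    plugins = {}
    for name in module_names:
        try:
            plugins[name] = importlib.import_module(name)
        except ImportError as exc:
            # A missing or broken plugin is skipped rather than aborting the load.
            print(f"Skipping plugin {name!r}: {exc}")
    return plugins

plugins = load_plugins(["json", "no_such_plugin_module"])
print(sorted(plugins))  # prints ['json']
```

The names could just as well come from a config file or a directory listing, as long as the modules are importable from sys.path.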

Py2exe: Embed static files in exe file itself and access them

I found a solution to add files in library.zip via: Extend py2exe to copy files to the zipfile where pkg_resources can load them.
I can access my file when library.zip is not included in the exe.
I added a file, text.txt, in the directory foo/media in library.zip.
And I use this code:
import pkg_resources
import zipfile
from cStringIO import StringIO
my_data = pkg_resources.resource_string(__name__,"library.zip")
filezip = StringIO(my_data)
zip = zipfile.ZipFile(filezip)
data = zip.read("foo/media/text.txt")
I tried to use pkg_resources, but I think I am misunderstanding something, because I could just open "library.zip" directly.
My question is: how can I do this when library.zip is embedded in the exe?
Best Regards
I cobbled together a reasonably neat solution to this, but it doesn't use pkg_resources.
I need to distribute productivity tools as standalone EXEs, that is, all bundled into the one .exe file. I also need to send out notifications when these tools are used, which I do via the Logging API, using file-based configuration. I embed the logging.cfg file to make it harder to effectively switch off these notifications, i.e. by deleting the loose file... which would probably break the app anyway.
So the following are the interesting bits from my setup.py:
LOGGING_CFG = open('main/resources/logging.cfg').read()
# py2exe extras
console=[{'script': productivity_tool.__file__.replace('.pyc', '.py'),
'other_resources': [(u'LOGGINGCFG', 1, LOGGING_CFG)]}],
options={'py2exe': {'bundle_files': 1, 'dll_excludes': ['w9xpopen.exe']}},
Then in the startup code for productivity_tool.py:
from win32api import LoadResource
from StringIO import StringIO
from logging.config import fileConfig
if __name__ == '__main__':
    if is_exe():
        logging_cfg = StringIO(LoadResource(0, u'LOGGINGCFG', 1))
    else:
        logging_cfg = 'main/resources/logging.cfg'
    fileConfig(logging_cfg)
Works a treat!!!
Thank you but I found the solution
my_data = pkg_resources.resource_stream("__main__",sys.executable) # get lib.zip file
zip = zipfile.ZipFile(my_data)
data = zip.read("foo/media/doc.pdf") # get my data on lib.zip
file = open(output_name, 'wb')
file.write(data) # write it on a file
Best Regards
You shouldn't be using pkg_resources to retrieve the library.zip file. You should use it to retrieve the added resource.
Suppose you have the following project structure:
You would use resource_string (or, preferably, resource_stream) to access image.jpg:
img = pkg_resources.resource_string(__name__, 'media/image.jpg')
That should "just work". At least it did when I bundled my media files in the EXE. (Sorry, I've since left the company where I was using py2exe, so don't have a working example to draw on.)
You could also try using pkg_resources.resource_filename(), but I don't think that works under py2exe.
