Since JupyterLab 3.x jupyter-server is used instead of the classic notebook server, and the following code does not list servers served with jupyter_server:
from notebook import notebookapp
notebookapp.list_running_servers()
None
What still works for the file/notebook name is:
from time import sleep
from IPython.display import display, Javascript
import subprocess
import os
import uuid
def get_notebook_path_and_save():
magic = str(uuid.uuid1()).replace('-', '')
print(magic)
# saves it (ctrl+S)
# display(Javascript('IPython.notebook.save_checkpoint();')) # Javascript Error: IPython is not defined
nb_name = None
while nb_name is None:
try:
sleep(0.1)
nb_name = subprocess.check_output(f'grep -l {magic} *.ipynb', shell=True).decode().strip()
except:
pass
return os.path.join(os.getcwd(), nb_name)
But it's not pythonic nor fast
How to get the current running server instances - and so e.g. the current notebook file?
Migration to jupyter_server should be as easy as changing notebook to jupyter_server, notebookapp to serverapp and changing the appropriate configuration files - the server-related codebase is largely unchanged. In the case of listing servers simply use:
from jupyter_server import serverapp
serverapp.list_running_servers()
Related
This python code file works perfectly. But when I add either of the commented imports, the vscode test feature gives "No tests discovered, please check the configuration settings for the tests." No other errors.
# import boto3
# import pymysql
import decimal
import datetime
def increment(x):
return x + 1
def decrement(x):
return x - 1
What is it that I don't understand about imports and the test feature that explains why these would break the test explorer?
I'm setting up a new functionality in mi gcloud buckets that allows me to upload or download files using a python library called "boto", but appears this error
I am using linux, visual studio code, python 3.7, gsutil and boto in their last versions.
import os
import boto
import gcs_oauth2_boto_plugin
import shutil
import io
import tempfile
import time
import sys
# Activate virtual environment
activate_this = os.path.join(VENV + 'bin/activate_this.py')
exec(open(activate_this, dict(__file__=activate_this)))
# Check arguments
if len(sys.argv) < 2:
print ("Usage: " + sys.argv[0] + ' FILENAME')
quit()
filename = sys.argv[1]
# URI scheme for Cloud Storage.
GOOGLE_STORAGE = "gs"
# URI scheme for accessing local files.
LOCAL_FILE = "file"
header_values = {"x-goog-project-id": PROJECT_ID}
# Open local file
with open(filename, 'r') as localfile:
dst_uri = boto.storage_uri(BUCKET + '/' + filename, GOOGLE_STORAGE)
# The key-related functions are a consequence of boto's
# interoperability with Amazon S3 (which employs the
# concept of a key mapping to localfile).
dst_uri.new_key().set_contents_from_file(localfile)
print ('Successfully created "%s/%s"' % (dst_uri.bucket_name, dst_uri.object_name))
Traceback (most recent call last):
File "./upload2gcs.py", line 10, in
import boto
ImportError: No module named boto
The directory containing the boto module probably isn't findable from any of the paths where Python looks for modules to be imported.
From within your script, check the sys.path list and see if the expected directory is present:
import pprint
import sys
pprint.pprint(sys.path)
As an example, gsutil is packaged with its own fork of Boto; it performs some additional steps at runtime to make sure the Boto module's parent directory is added to sys.path, which allows subsequent import boto statements to work:
https://github.com/GoogleCloudPlatform/gsutil/blob/c74a5964980b4f49ab2c4cb4d5139b35fbafe8ac/gslib/init.py#L102
I am running a dask cluster and a worker w. 16 cores using the CLI utilities.
In general it seems to work very well.
However, for some reason it will not import modules in the cwd.
I try to run the following from my notebook instance:
def tstimp():
import os
return os.listdir()
c.run(tstimp)
And i get the following output:
{'tcp://192.168.1.90:35885': ['class_positions.csv',
'.gitignore',
'README.md',
'fullrun.ipynb',
'.git',
'rf.py',
'__pycache__',
'dask-worker-space',
'utils.py',
'.ipynb_checkpoints']}
Note that the module rf.py is listed here.
Thus it should be possible to import it in the worker, but when i run the following code:
def tstimp():
import rf
return 42
c.run(tstimp)
I get this error: ModuleNotFoundError: No module named 'rf'
Why am I getting this error?
It seems like the current directory is not added to the python path of the workers.
You should be able to fix this by adding it to the path.
def tstimp():
import sys
sys.path.append('.')
import rf
return 42
c.run(tstimp)
I have been successful in running Python 3 / Spark 2.2.1 program in Google's Colab.Research platform :
!apt-get update
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
!tar xf spark-2.2.1-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.2.1-bin-hadoop2.7"
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
this works perfectly when I uploaded text files from my local computer to the Unix VM using
from google.colab import files
datafile = files.upload()
and read them as follows :
textRDD = spark.read.text('hobbit.txt').rdd
so far so good ..
My problem starts when I am trying to read a file that is lying in my Google drive colab directory.
Following instructions I have authenticated user and created a drive service
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
after which I have been able to access the file lying in the drive as follows :
file_id = '1RELUMtExjMTSfoWF765Hr8JwNCSL7AgH'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
# _ is a placeholder for a progress object that we ignore.
# (Our file is small, so we skip reporting progress.)
_, done = downloader.next_chunk()
downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
Downloaded file contents are: b'The king beneath the mountain\r\nThe king of ......
even this works perfectly ..
downloaded.seek(0)
print(downloaded.read().decode('utf-8'))
and gets the data
The king beneath the mountain
The king of carven stone
The lord of silver fountain ...
where things FINALLY GO WRONG is where I try to grab this data and put it into a spark RDD
downloaded.seek(0)
tRDD = spark.read.text(downloaded.read().decode('utf-8'))
and I get the error ..
AnalysisException: 'Path does not exist: file:/content/The king beneath the mountain\ ....
Evidently, I am not using the correct method / parameters to read the file into spark. I have tried quite a few of the methods described
I would be very grateful if someone can help me figure out how to read this file for subsequent processing.
A complete solution to this problem is available in another StackOverflow question that is available at this URL.
Here is the notebook where this solution is demonstrated.
I have tested it and it works!
It seems that spark.read.text expects a file name. But you give it the file content instead. You can try either of these:
save it to a file then give the name
use just downloaded instead of downloaded.read().decode('utf-8')
You can also simplify downloading from Google Drive with pydrive. I gave an example here.
https://gist.github.com/korakot/d56c925ff3eccb86ea5a16726a70b224
Downloading is just
fid = drive.ListFile({'q':"title='hobbit.txt'"}).GetList()[0]['id']
f = drive.CreateFile({'id': fid})
f.GetContentFile('hobbit.txt')
I have been using "ipython --script" to automatically save a .py file for each ipython notebook so I can use it to import classes into other notebooks. But this recenty stopped working, and I get the following error message:
`--script` is deprecated. You can trigger nbconvert via pre- or post-save hooks:
ContentsManager.pre_save_hook
FileContentsManager.post_save_hook
A post-save hook has been registered that calls:
ipython nbconvert --to script [notebook]
which behaves similarly to `--script`.
As I understand this I need to set up a post-save hook, but I do not understand how to do this. Can someone explain?
[UPDATED per comment by #mobius dumpling]
Find your config files:
Jupyter / ipython >= 4.0
jupyter --config-dir
ipython <4.0
ipython locate profile default
If you need a new config:
Jupyter / ipython >= 4.0
jupyter notebook --generate-config
ipython <4.0
ipython profile create
Within this directory, there will be a file called [jupyter | ipython]_notebook_config.py, put the following code from ipython's GitHub issues page in that file:
import os
from subprocess import check_call
c = get_config()
def post_save(model, os_path, contents_manager):
"""post-save hook for converting notebooks to .py scripts"""
if model['type'] != 'notebook':
return # only do this for notebooks
d, fname = os.path.split(os_path)
check_call(['ipython', 'nbconvert', '--to', 'script', fname], cwd=d)
c.FileContentsManager.post_save_hook = post_save
For Jupyter, replace ipython with jupyter in check_call.
Note that there's a corresponding 'pre-save' hook, and also that you can call any subprocess or run any arbitrary code there...if you want to do any thing fancy like checking some condition first, notifying API consumers, or adding a git commit for the saved script.
Cheers,
-t.
Here is another approach that doesn't invoke a new thread (with check_call). Add the following to jupyter_notebook_config.py as in Tristan's answer:
import io
import os
from notebook.utils import to_api_path
_script_exporter = None
def script_post_save(model, os_path, contents_manager, **kwargs):
"""convert notebooks to Python script after save with nbconvert
replaces `ipython notebook --script`
"""
from nbconvert.exporters.script import ScriptExporter
if model['type'] != 'notebook':
return
global _script_exporter
if _script_exporter is None:
_script_exporter = ScriptExporter(parent=contents_manager)
log = contents_manager.log
base, ext = os.path.splitext(os_path)
py_fname = base + '.py'
script, resources = _script_exporter.from_filename(os_path)
script_fname = base + resources.get('output_extension', '.txt')
log.info("Saving script /%s", to_api_path(script_fname, contents_manager.root_dir))
with io.open(script_fname, 'w', encoding='utf-8') as f:
f.write(script)
c.FileContentsManager.post_save_hook = script_post_save
Disclaimer: I'm pretty sure I got this from SO somwhere, but can't find it now. Putting it here so it's easier to find in future (:
I just encountered a problem where I didn't have rights to restart my Jupyter instance, and so the post-save hook I wanted couldn't be applied.
So, I extracted the key parts and could run this with python manual_post_save_hook.py:
from io import open
from re import sub
from os.path import splitext
from nbconvert.exporters.script import ScriptExporter
for nb_path in ['notebook1.ipynb', 'notebook2.ipynb']:
base, ext = splitext(nb_path)
script, resources = ScriptExporter().from_filename(nb_path)
# mine happen to all be in Python so I needn't bother with the full flexibility
script_fname = base + '.py'
with open(script_fname, 'w', encoding='utf-8') as f:
# remove 'In [ ]' commented lines peppered about
f.write(sub(r'[\n]{2}# In\[[0-9 ]+\]:\s+[\n]{2}', '\n', script))
You can add your own bells and whistles as you would with the standard post save hook, and the config is the correct way to proceed; sharing this for others who might end up in a similar pinch where they can't get the config edits to go into action.