I’m trying to install Anaconda, Python 3 and Jupyter notebooks on an AWS EC2 instance. I’m running Ubuntu on the instance. I’ve installed Python using Anaconda. I’ve set the default Python to the Anaconda version. I created a Jupyter notebook config file. In the Jupyter notebook config file I added:
c = get_config()
# Notebook config: this is where you saved your pem cert
c.NotebookApp.certfile = u'/home/ubuntu/certs/mycert.pem'
# Run on all IP addresses of your instance
c.NotebookApp.ip = '*'
# Don't open browser by default
c.NotebookApp.open_browser = False
# Fix port to 8888
c.NotebookApp.port = 8888
I also created a directory for the certs using the code below:
mkdir certs
cd certs
sudo openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem
But when I try to run Jupyter notebook with the command below:
jupyter notebook
I get the error message below. My end goal is to be able to launch Jupyter notebook on the AWS EC2 instance and then connect to it remotely in a browser on my laptop. Does anyone know what my issue might be?
Error:
Writing notebook server cookie secret to /run/user/1000/jupyter/notebook_cookie_secret
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/traitlets/traitlets.py", line 528, in get
value = obj._trait_values[self.name]
KeyError: 'allow_remote_access'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/notebook/notebookapp.py", line 864, in _default_allow_remote
addr = ipaddress.ip_address(self.ip)
File "/home/ubuntu/anaconda3/lib/python3.7/ipaddress.py", line 54, in ip_address
address)
ValueError: '' does not appear to be an IPv4 or IPv6 address
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/bin/jupyter-notebook", line 11, in <module>
sys.exit(main())
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/jupyter_core/application.py", line 266, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/traitlets/config/application.py", line 657, in launch_instance
app.initialize(argv)
File "</home/ubuntu/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-7>", line 2, in initialize
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/notebook/notebookapp.py", line 1630, in initialize
self.init_webapp()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/notebook/notebookapp.py", line 1378, in init_webapp
self.jinja_environment_options,
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/notebook/notebookapp.py", line 159, in __init__
default_url, settings_overrides, jinja_env_options)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/notebook/notebookapp.py", line 252, in init_settings
allow_remote_access=jupyter_app.allow_remote_access,
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/traitlets/traitlets.py", line 556, in __get__
return self.get(obj, cls)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/traitlets/traitlets.py", line 535, in get
value = self._validate(obj, dynamic_default())
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/notebook/notebookapp.py", line 867, in _default_allow_remote
for info in socket.getaddrinfo(self.ip, self.port, 0, socket.SOCK_STREAM):
File "/home/ubuntu/anaconda3/lib/python3.7/socket.py", line 748, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Go to your AWS instance's security group and add an inbound rule for Jupyter: Custom TCP type, TCP protocol, port 8888, with your IP (or 0.0.0.0/0) as the source.
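If you prefer the command line, the same inbound rule can be added with the AWS CLI (the security group ID below is a placeholder):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8888 --cidr 0.0.0.0/0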
If you are sure that AWS is configured correctly for permissions, check whether your network is blocking the outbound traffic. You could also try port tunneling when SSHing into your instance:
ssh -i <path-to-your-pem> -L 8888:127.0.0.1:8888 ubuntu@<your-ec2-public-dns>
Then you can access Jupyter locally by going to localhost:8888 in your browser.
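If you only want the tunnel and no interactive remote shell, an -N flag tells ssh not to run a remote command (same placeholders as above):
ssh -i <path-to-your-pem> -N -L 8888:127.0.0.1:8888 ubuntu@<your-ec2-public-dns>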
In the Jupyter notebook config file that you shared in the question above, a few lines seem to be missing.
To configure the Jupyter config file thoroughly, follow these steps:
cd ~/.jupyter/
vi jupyter_notebook_config.py
Insert this at the beginning of the document:
c = get_config()
# Kernel config
c.IPKernelApp.pylab = 'inline' # if you want plotting support always in your notebook
# Notebook config
c.NotebookApp.certfile = u'/home/ubuntu/certs/mycert.pem' #location of your certificate file
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False #so that the ipython notebook does not open a browser by default
c.NotebookApp.password = u'sha1:98ff0e580111:12798c72623a6eecd54b51c006b1050f0ac1a62d' #the encrypted password we generated above
# Set the port to 8888, the port we set up in the AWS EC2 set-up
c.NotebookApp.port = 8888
Once you enter these above lines, make sure you save the config file before you exit the vi editor!
Also, most importantly, remember to replace sha1:98ff0e580111:12798c72623a6eecd54b51c006b1050f0ac1a62d with the hash of your own password!
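If you have not generated that hash yet, one way to do it (assuming the classic notebook package that ships with Anaconda) is:
python -c "from notebook.auth import passwd; print(passwd())"
This prompts for a password and prints the sha1:... string to paste into the config file.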
Note that since the config file above uses port 8888, the same port must be opened in the security group (Custom TCP type, TCP protocol, port range 8888, and a custom source).
Now you are good to go!
Type the following command:
screen
This command creates a separate screen session for your Jupyter process logs while you continue to do other work on the EC2 instance.
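For example, you can name the session, detach from it, and reattach later (the session name jupyter is just an example):
screen -S jupyter     # start a named session
# press Ctrl-A, then D, to detach while Jupyter keeps running
screen -r jupyter     # reattach later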
And now start the jupyter notebook by typing the command:
jupyter notebook
To visit the Jupyter notebook from the browser on your local machine:
Your EC2 instance will have a long URL, like this:
ec2-52-39-239-66.us-west-2.compute.amazonaws.com
Visit that URL in your browser locally. Make sure to have https at the beginning and port 8888 at the end, as shown below.
https://ec2-52-39-239-66.us-west-2.compute.amazonaws.com:8888/
You can start the Jupyter server using the following command:
jupyter notebook --ip=0.0.0.0
If you want to keep it running even after the terminal is closed, then use:
nohup jupyter notebook --ip=0.0.0.0 > nohup_jupyter.out &
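To check the server output later (such as the login token printed at startup), you can follow the log file:
tail -f nohup_jupyter.out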
Remember to open port 8888 in the AWS EC2 security group's inbound rules to Anywhere (0.0.0.0/0, ::/0).
Then you can access Jupyter using http://<your-ec2-public-ip>:8888
Hope this helps. It is just a one-liner solution!
Related
I followed the official Airflow docker guide.
It works fine for most of the simple jobs I have.
I tried to use this guide; for that, I needed to add this line to the .env file:
_PIP_ADDITIONAL_REQUIREMENTS=pyspark xlrd apache-airflow-providers-apache-spark
Unfortunately, the DAG is not being loaded.
The problem seems to be related to JAVA_HOME, because the Docker output shows this message:
airflow-scheduler_1 | JAVA_HOME is not set
In the Airflow web GUI it shows the following error:
Broken DAG: [/opt/airflow/dags/SparkETL.py] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/pyspark/context.py", line 339, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/home/airflow/.local/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in launch_gateway
raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number
I tried adding an install -y openjdk-11-jdk command in the docker-compose file, and also set JAVA_HOME: '/usr/lib/jvm/java-11-openjdk-amd64' there. In this situation, airflow-scheduler reports that the path does not exist.
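Since _PIP_ADDITIONAL_REQUIREMENTS only installs Python packages, a JVM has to be baked into a custom image instead. A minimal sketch of the kind of image this attempt points toward, assuming a Debian-based apache/airflow base (match the tag to the one in your docker-compose.yaml) and Debian's usual OpenJDK path:
FROM apache/airflow:2.3.0
USER root
# install a headless JRE for pyspark
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-11-jre-headless && rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
USER airflow
RUN pip install --no-cache-dir pyspark xlrd apache-airflow-providers-apache-spark
Build this image and point the image: key in docker-compose.yaml at it instead of using _PIP_ADDITIONAL_REQUIREMENTS.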
I am running two threads inside the container of a Kubernetes pod: one thread pushes some data to a database, and the other thread (a Flask app) shows the data from the database. As soon as the pod starts up, main.py (which starts both threads mentioned above) is called.
Docker file:
FROM python:3
WORKDIR /usr/src/app
COPY app/requirements.txt .
RUN pip install -r requirements.txt
COPY app .
CMD ["python3","./main.py"]
I have two questions:
Are logs the only way to see the output of the running script? Can't we see its output continuously, as if it were running in a terminal?
Also, I'm not able to run the same main.py file by going into the container. It throws the error below:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/threading.py", line 954, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.9/threading.py", line 892, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 920, in run
run_simple(t.cast(str, host), port, self, **options)
File "/usr/local/lib/python3.9/site-packages/werkzeug/serving.py", line 1008, in run_simple
inner()
File "/usr/local/lib/python3.9/site-packages/werkzeug/serving.py", line 948, in inner
srv = make_server(
File "/usr/local/lib/python3.9/site-packages/werkzeug/serving.py", line 780, in make_server
return ThreadedWSGIServer(
File "/usr/local/lib/python3.9/site-packages/werkzeug/serving.py", line 686, in __init__
super().__init__(server_address, handler) # type: ignore
File "/usr/local/lib/python3.9/socketserver.py", line 452, in __init__
self.server_bind()
File "/usr/local/lib/python3.9/http/server.py", line 138, in server_bind
socketserver.TCPServer.server_bind(self)
File "/usr/local/lib/python3.9/socketserver.py", line 466, in server_bind
self.socket.bind(self.server_address)
OSError: [Errno 98] Address already in use
How do I stop the main.py script that starts along with the pod, so that I can run main.py directly from inside the container?
Thank you.
The error message says it all:
OSError: [Errno 98] Address already in use
It looks like your Python script tries to open the same port twice: main.py already runs as the container's entrypoint, so starting it again from inside the container tries to bind a port that is already in use. You cannot do that; stop the first instance before running it again.
Now answering your other question:
Are logs the only way to see the output of the running script? Can't we see its output continuously, as if it were running in a terminal?
Running kubectl logs -f will follow the logs, which lets you see the output continuously, just as if it were running in your terminal.
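For example (the pod and container names here are hypothetical; the -c flag can be dropped if the pod has a single container):
kubectl logs -f my-pod -c my-container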
We are sitting behind a firewall and are trying to run a Docker image (cBioPortal). Docker itself could be installed through a proxy, but now we encounter the following issue:
Starting validation...
INFO: -: Unable to read xml containing cBioPortal version.
DEBUG: -: Requesting cancertypes from portal at 'http://cbioportal-container:8081'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error occurred during validation step:
Traceback (most recent call last):
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4491, in request_from_portal_api
response.raise_for_status()
File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://cbioportal-container:8081/api-legacy/cancertypes
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/metaImport.py", line 127, in <module>
exitcode = validateData.main_validate(args)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4969, in main_validate
portal_instance = load_portal_info(server_url, logger)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4622, in load_portal_info
parsed_json = request_from_portal_api(path, api_name, logger)
File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4495, in request_from_portal_api
) from e
ConnectionError: Failed to fetch metadata from the portal at [http://cbioportal-container:8081/api-legacy/cancertypes]
Now we know that it is a firewall issue, because it works when we install it outside the firewall. But we do not know how to change the firewall yet. Our idea was to look up the files and lines which throw the errors, but we do not know how to look into the files, since they are inside the Docker container.
So we cannot just do something like
vim /cbioportal/core/src/main/scripts/importer/validateData.py
...because there is nothing there. Of course we know this file is inside the Docker image, but as I said, we don't know how to look into it. At the moment we do not know how to solve this riddle; any help is appreciated.
Maybe you still need this.
You can access this Python file within the container by using docker-compose exec cbioportal sh or docker-compose exec cbioportal bash.
Then you can use cd, cat, vi, vim, or similar tools to access the given path from your post.
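If the image does not ship an editor, you can also copy the file out to the host for inspection (assuming the container is named cbioportal-container, as in your logs):
docker cp cbioportal-container:/cbioportal/core/src/main/scripts/importer/validateData.py .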
I'm not sure which command you're actually running, but when I did the import call like
docker-compose run --rm cbioportal metaImport.py -u http://cbioportal:8080 -s study/lgg_ucsf_2014/lgg_ucsf_2014/ -o
I had to replace the http://cbioportal:8080 with the server's IP address.
Also notice that the studies path is one level deeper than in the official documentation.
With cBioPortal behind a proxy, the study import is only available in offline mode:
First, you need to get inside the container:
docker exec -it cbioportal-container bash
Then generate the portal info folder:
cd $PORTAL_HOME/core/src/main/scripts
./dumpPortalInfo.pl $PORTAL_HOME/my_portal_info_folder
Then import the study offline. The -o flag is important to overwrite despite warnings:
cd $PORTAL_HOME/core/src/main/scripts
./importer/metaImport.py -p $PORTAL_HOME/my_portal_info_folder -s /study/lgg_ucsf_2014 -v -o
Hope this helps.
I'm starting out with Ansible, trying to make VMs etc. in Azure.
I am stuck a bit on the authentication thing. This is the command I used to create what I thought I needed:
az ad sp create-for-rbac --name AzureTools --password "A Password I Made Up"
Then I made the ~/.ansible/credentials file with the following contents:
[default]
subscription_id=my-sub-id
client_id=the appId from when I ran the previous command
secret='A Password I Made Up'
tenant=the tenantid from the above command
And when I try to run the Ansible playbook, I get this ("Invalid client secret is provided"). See the full error below:
fatal: [localhost]: FAILED! => {
"changed": false,
"module_stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_QL57O_/ansible_module_azure_rm_virtualmachine.py\", line 1553, in <module>\n main()\n File \"/tmp/ansible_QL57O_/ansible_module_azure_rm_virtualmachine.py\", line 1550, in main\n AzureRMVirtualMachine()\n File \"/tmp/ansible_QL57O_/ansible_module_azure_rm_virtualmachine.py\", line 651, in __init__\n supports_check_mode=True)\n File \"/tmp/ansible_QL57O_/ansible_modlib.zip/ansible/module_utils/azure_rm_common.py\", line 265, in __init__\n File \"/usr/local/lib/python2.7/dist-packages/msrestazure/azure_active_directory.py\", line 440, in __init__\n self.set_token()\n File \"/usr/local/lib/python2.7/dist-packages/msrestazure/azure_active_directory.py\", line 473, in set_token\n raise_with_traceback(AuthenticationError, \"\", err)\n File \"/usr/local/lib/python2.7/dist-packages/msrest/exceptions.py\", line 48, in raise_with_traceback\n raise error\nmsrest.exceptions.AuthenticationError: , InvalidClientError: (invalid_client) AADSTS70002: Error validating credentials. AADSTS50012: Invalid client secret is provided.\r\nTrace ID: 34de605e-5d21-4be2-84c1-27759ffe0000\r\nCorrelation ID: e62ed2ee-46b8-4847-9c1d-0c1e24ab711a\r\nTimestamp: 2018-03-08 21:00:55Z\n",
"module_stdout": "",
"msg": "MODULE FAILURE",
"rc": 0
So, what am I missing? Is the secret not supposed to be that password? If not, what should it be? All the docs just say "just put your secret here" but they don't explain what it is or where it comes from.
Environment: Ubuntu 16.04 running in a vm in Azure.
ansible 2.4.3.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/home/path/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.12 (default, Nov 20 2017, 18:23:56) [GCC 5.4.0 20160609]
Please let me know if I've missed providing any info.
Thanks in advance!
In the secret line, you should remove the single quotes. I tested this in my lab: if I use single quotes, I get the same error log as you.
The second problem is that you should create the credentials file in ~/.azure/credentials, not ~/.ansible. More information can be found in the Ansible Microsoft Azure guide.
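For reference, a minimal sketch of the corrected file, with placeholder values:
[default]
subscription_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
client_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
secret=A Password I Made Up
tenant=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Note that the secret value is unquoted.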
I can run this command on my instance using the web console:
gsutil rsync -d -r /my-path gs://my-bucket
But when I try it on my remote SSH terminal, I get this error:
root@instance-2:# gsutil rsync -d -r /my-path gs://my-bucket
Building synchronization state...
INFO 0923 12:48:48.572446 multistore_file.py] Error decoding credential, skipping
Traceback (most recent call last):
File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/oauth2client/oauth2client/multistore_file.py", line 381, in _refresh_data_cache
(key, credential) = self._decode_credential_from_json(cred_entry)
File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/oauth2client/oauth2client/multistore_file.py", line 400, in _decode_credential_from_json
credential = Credentials.new_from_json(json.dumps(cred_entry['credential']))
File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/oauth2client/oauth2client/client.py", line 292, in new_from_json
return from_json(s)
File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/credentials_lib.py", line 356, in from_json
data['token_expiry'], oauth2client.client.EXPIRY_FORMAT)
TypeError: must be string, not None
Caught non-retryable exception while listing gs://my-bucket/: Could not reach metadata service: Not Found
At source listing 10000...
At source listing 20000...
At source listing 30000...
At source listing 40000...
CommandException: Caught non-retryable exception - aborting rsync
I solved this by switching the user to the default GCE one that is created when the project is created. Root on the VM does not seem to have privileges to run gsutil commands.
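For example (my-gce-user is a placeholder for that default user):
su - my-gce-user
gsutil rsync -d -r /my-path gs://my-bucket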