How to install PYODBC in Databricks - python-3.x

I need to install the pyodbc module in Databricks.
I tried the command pip install pyodbc, but it failed with the error below.
Error message

I had the same installation issue. This is what I tried, and it worked.
Databricks does not ship with an ODBC driver by default. Run the following commands in a single cell to install the MS SQL ODBC driver:
%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
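Then run this in a notebook cell to create an init script (the shebang must be the first line of the file, so it starts right after the opening quotes):
dbutils.fs.put("/databricks/init/<YourClusterName>/pyodbc-install.sh","""#!/bin/bash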
sudo apt-get update
sudo apt-get -q -y install unixodbc unixodbc-dev
sudo apt-get -q -y install python3-dev
/databricks/python/bin/pip install pyodbc
""", True)
Restart the cluster
Import pyodbc in Code
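Once the cluster has restarted, a connection might look like the sketch below. The server, database, and credential values in the usage comment are placeholders, not values from the answer above, and the driver name assumes that the msodbcsql17 package registered itself as "ODBC Driver 17 for SQL Server".

```python
def build_connection_string(server, database, user, password,
                            driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC connection string for SQL Server."""
    parts = {
        "DRIVER": "{" + driver + "}",
        "SERVER": server,
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def connect(server, database, user, password):
    """Open a pyodbc connection using the string built above."""
    # Imported here so build_connection_string works even where pyodbc is missing.
    import pyodbc
    return pyodbc.connect(build_connection_string(server, database, user, password))


# Hypothetical usage (placeholder values):
# conn = connect("myserver.example.com", "mydb", "myuser", "mypassword")
# print(conn.cursor().execute("SELECT 1").fetchone())
```

If the connect call fails with a "driver not found" style error, the driver install step probably did not run on the node executing the code.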

I had some problems a while back connecting with pyodbc; the details of my fix are here: https://datathirst.net/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark
I think the problem stems from PYTHONPATH on the Databricks clusters being set to the Python 2 install.
I suspect that the lines:
%sh
apt-get -y install unixodbc-dev
/databricks/python/bin/pip install pyodbc
will work for you.
Update: Even simpler (though you will still need unixodbc-dev from above):
%sh
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc

Right-click the Workspace folder where you want to store the library.
Select Create > Library.
See https://docs.databricks.com/user-guide/libraries.html for detailed information.

Related

Cannot connect to EC2 ubuntu 18.04 instance after upgrading to Python3.9

I am using EC2 Ubuntu 18.04 VM.
Due to CVE-2021-3177, Python needs to be upgraded to the latest version of Python3.9 which would be 3.9.9 currently.
I did that using the deadsnakes option as per the steps mentioned below:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install python3.9
sudo apt-get update
sudo apt upgrade -y
The above ensures that Python 3.9.9 is available, but now both python3.6 and python3.9 are installed. So next we use the update-alternatives command to make python3.9 the default version.
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 2
Now that the alternatives are defined, we switch to option 2, i.e. Python 3.9, as the default:
sudo update-alternatives --config python3
Once done, the following command would point to the latest version.
sudo python3 -V
However, if you now run sudo apt update, you will see the following error:
Traceback (most recent call last):
File "/usr/lib/cnf-update-db", line 8, in <module>
from CommandNotFound.db.creator import DbCreator
File "/usr/lib/python3/dist-packages/CommandNotFound/db/creator.py", line 11, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Reading package lists... Done
E: Problem executing scripts APT::Update::Post-Invoke-Success 'if /usr/bin/test -w /var/lib/command-not-found/ -a -e /usr/lib/cnf-update-db; then /usr/lib/cnf-update-db > /dev/null; fi'
E: Sub-process returned an error code
To fix this we will have to add a link using the following command
cd /usr/lib/python3/dist-packages/
sudo ln -s apt_pkg.cpython-{36m,39m}-x86_64-linux-gnu.so
The following is optional; I tried both with and without these commands:
apt purge python3-apt
apt install python3-apt
sudo apt install python3.9-distutils python3.9-dev
Once done, the following command no longer produces any errors:
sudo apt update
This means that the issue is fixed.
But for some reason, I cannot connect to the machine afterwards, and if I create an AMI from it, I cannot connect to the launched instance using PuTTY or SCP.
The same issue occurs on Ubuntu 20.x too.
I would appreciate your help.
Upgrading Python breaks the following Python modules that cloud-init depends on. As a result, cloud-init cannot correctly configure the newly booted EC2 instance, which is why the instance is inaccessible:
setuptools
urllib3
requests
jinja2
netifaces
You can debug this issue by going to your EC2 instance in the AWS Web Console and clicking:
Actions -> Monitor and troubleshoot -> Get system log
The log can take a while to update, so click the refresh button until it appears. The log is easier to read if you download it. This is what helped me solve the issues I was having.
The following steps resolved the issue for me on Ubuntu 18.04 LTS.
For Ubuntu 20.04 LTS, change the 36m in the symbolic links to 38 and python3.6 in the update-alternatives command to python3.8.
# Add deadsnakes ppa repository
sudo add-apt-repository ppa:deadsnakes/ppa
# Install new python version
sudo apt update
sudo apt install python3.10
# Fix broken apt_inst after python upgrade
sudo ln -s /usr/lib/python3/dist-packages/apt_inst.cpython-36m-x86_64-linux-gnu.so /usr/lib/python3/dist-packages/apt_inst.so
# Fix broken apt_pkg after python upgrade
sudo ln -s /usr/lib/python3/dist-packages/apt_pkg.cpython-36m-x86_64-linux-gnu.so /usr/lib/python3/dist-packages/apt_pkg.so
# Make installed python version an alternative with a priority of 2
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 2
# Make upgraded python version an alternative with a priority of 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
# Reinstall python3-apt
sudo apt remove --purge python3-apt
sudo apt autoclean
sudo apt install python3-apt
# Install required packages
sudo apt install \
build-essential \
python3.10-distutils \
python3.10-venv \
libpython3.10-dev
# Install latest pip
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
sudo python3.10 get-pip.py
# Upgrade outdated python libraries that break cloud-init
sudo -i
pip3 install --upgrade setuptools
pip3 install --upgrade urllib3
pip3 install --upgrade requests
pip3 install --upgrade jinja2
pip3 install --upgrade netifaces
pip3 install --upgrade --ignore-installed pyyaml
exit
# Upgrade cloud-init to latest version
sudo apt install --only-upgrade cloud-init
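Before rebooting or creating an AMI, it may help to confirm that the default interpreter can still locate the modules listed above. The following is a minimal sketch, not part of the original answer; it only checks that each module can be found, so a module that is present but fails at import time would not be caught.

```python
import importlib.util
import sys

# Modules named in this answer as cloud-init dependencies, plus apt_pkg.
CLOUD_INIT_MODULES = ["setuptools", "urllib3", "requests", "jinja2", "netifaces", "apt_pkg"]


def missing_modules(names):
    """Return the top-level module names that the running interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    missing = missing_modules(CLOUD_INIT_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter that /usr/bin/python3 points to, since that is the one the system tools use.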
If you use Ansible, it is also affected by the upgrade.
Ansible can be fixed as follows:
Edit /usr/lib/python3/dist-packages/apt/package.py and change the following line:
from collections import Mapping, Sequence
to:
from collections.abc import Mapping, Sequence
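The reason for this change is that the aliases for the abstract base classes in the top-level collections module were removed in Python 3.10, so only the collections.abc import keeps working. A small illustration (the describe helper is hypothetical and exists only for this example):

```python
# The abstract base classes live in collections.abc; the old aliases in
# the top-level collections module no longer exist in Python 3.10.
from collections.abc import Mapping, Sequence


def describe(value):
    """Classify a value using the abstract base classes."""
    if isinstance(value, Mapping):
        return "mapping"
    if isinstance(value, Sequence):
        return "sequence"
    return "other"
```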
It would be useful if the deadsnakes repository provided an updated python3-apt (e.g. python3.10-apt) to solve this issue.
Reference:
https://cloudbytes.dev/snippets/upgrade-python-to-latest-version-on-ubuntu-linux

How to make an autoinstall command script for Debian

I am working on a project that needs some libraries, so I decided to write a .sh script to install them all at once, but I don't know why it fails. I searched for a solution but only found how to create installers such as .deb packages.
Here are the commands I use:
install.sh
#!/bin/sh
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install python3-pip python3-dev
sudo apt-get install build-essential cmake git unzip pkg-config libopenblas-dev liblapack-dev
sudo apt-get install python-numpy python-scipy python-matplotlib python aml
sudo apt-get install libhdf5-serial-dev python-h5py
sudo apt-get install graphviz
sudo apt-get install python-opencv
sudo apt install python-sklearn
sudo apt install python3-sklearn
pip3 install matplotlib
pip3 install pydot-ng
pip3 install tensorflow
pip3 install keras
pip3 install scikit-learn
I run it using
bash install.sh
and I get the output below. I think I'm doing just a few things wrong:
E: The update command takes no arguments
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package
Reading package lists... Done
Building dependency tree
Reading state information... Done
............
Can someone help me, please?
The shebang at the beginning of your script selects the wrong interpreter.
You're using:
#!/bin/sh
whereas this script should run under bash:
#!/bin/bash
That should solve your problem.
As sergio states, these can be combined into one-liners like:
#!/bin/bash
sudo apt-get update && sudo apt-get upgrade -y
sudo apt-get install -y python3-pip python3-dev build-essential cmake git unzip pkg-config libopenblas-dev liblapack-dev python-numpy python-scipy python-matplotlib python aml libhdf5-serial-dev python-h5py graphviz python-opencv python-sklearn python3-sklearn
sudo pip3 install matplotlib pydot-ng tensorflow keras scikit-learn
At the very least, use arrays to keep the package lists manageable, like this:
#!/bin/bash
sudo apt-get update && sudo apt-get upgrade -y
aptDepends=(
python3-pip
python3-dev
build-essential
cmake
git
unzip
pkg-config
libopenblas-dev
liblapack-dev
python-numpy
python-scipy
python-matplotlib
python
aml
libhdf5-serial-dev
python-h5py
graphviz
python-opencv
python-sklearn
python3-sklearn
)
pipDepends=(
matplotlib
pydot-ng
tensorflow
keras
scikit-learn
)
sudo apt-get install -y "${aptDepends[@]}" && sudo pip3 install "${pipDepends[@]}"
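For comparison, the same list-driven approach can be sketched in Python. This is an illustration under assumptions, not the answerer's method: the package lists are shortened from the arrays above, the function names are hypothetical, and the dry-run default only prints the commands rather than running them.

```python
import subprocess

# Shortened versions of the package lists from the script above.
APT_PACKAGES = ["python3-pip", "python3-dev", "build-essential", "graphviz"]
PIP_PACKAGES = ["matplotlib", "pydot-ng", "tensorflow", "keras", "scikit-learn"]


def build_commands(apt_packages, pip_packages):
    """Return the install commands as argument lists, one list per command."""
    return [
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "install", "-y", *apt_packages],
        ["sudo", "pip3", "install", *pip_packages],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(build_commands(APT_PACKAGES, PIP_PACKAGES))
```

Passing each package as a separate list element avoids the quoting and line-ending problems that a hand-written shell script can run into.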

Unable to install numpy and pandas in Ubuntu

I already tried
sudo apt-get install build-essential python-dev python-setuptools
sudo apt-get install python-numpy python-scipy
sudo apt-get install libatlas-dev libatlas3gf-base
It reported: Unable to locate package libatlas3gf-base
So I tried
pip install --user --install-option="--prefix=" -U scikit-learn
But it failed. The failure is shown in the image at this Drive link: https://drive.google.com/open?id=1_YZlQYpP5aGGbbEDKzIzsYeiEVIgEmGe
Try installing with pip
sudo apt-get install python3-pip
sudo pip install pandas    # or: sudo pip3 install pandas
sudo pip install numpy     # or: sudo pip3 install numpy
Also try using a virtual environment, just in case:
apt-get install python-virtualenv
virtualenv testVirtualEnv
cd testVirtualEnv
source bin/activate
Now install the dependencies.
Virtual environments are also a good way to keep projects manageable.
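After activating the environment, you can confirm that Python is really running from it before installing numpy and pandas. A minimal sketch (the in_virtualenv helper is a hypothetical name for this example):

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

If this prints False after source bin/activate, pip is probably still installing into the system Python.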

How can I Dockerize a Python script which contains Spark dependencies?

I have a Python file in which I import Spark libraries.
When I build it with the Dockerfile, it gives me an error saying that 'JAVA_HOME' is not set.
I tried to install Java through the Dockerfile, but that gives an error as well.
Below is the Dockerfile I tried to build.
FROM python:3.6.4
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y software-properties-common && \
add-apt-repository ppa:webupd8team/java -y && \
apt-get update && \
echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && \
apt-get install -y oracle-java8-installer && \
apt-get clean
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
ADD Samplespark.py /
COPY Samplespark.py /opt/ml/Samplespark.py
RUN pip install pandas
RUN pip install numpy
RUN pip install pyspark
RUN pip install sklearn
RUN pip install sagemaker_pyspark
RUN pip install sagemaker
CMD [ "python", "./Samplespark.py" ]
ENTRYPOINT ["python","/opt/ml/Samplespark.py"]
Please help me to install the Java dependencies for PySpark in Docker.
Your image is based on Debian, not Ubuntu, and these PPAs are for Ubuntu. According to this article, Oracle Java 8 is not available in Debian due to licensing issues.
You have the following options:
1. Use an Ubuntu Docker image which comes with Oracle Java 8 preinstalled, like this one
2. Follow this tutorial on how to install Oracle Java 8 on Debian Jessie
3. Install OpenJDK: sudo apt-get install openjdk-8-jre (drop sudo inside a Dockerfile RUN step, which already runs as root in this image)
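Whichever option you choose, the JAVA_HOME value in the Dockerfile has to match the directory where Java actually ends up. As a rough sketch (the check_java_home helper is hypothetical, not part of PySpark), a check like this could run at the top of the script before Spark is imported:

```python
import os


def check_java_home(environ=None):
    """Describe whether JAVA_HOME is set and points at a Java installation."""
    environ = os.environ if environ is None else environ
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    java_binary = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java_binary):
        return f"JAVA_HOME is set to {java_home}, but {java_binary} does not exist"
    return f"JAVA_HOME looks usable: {java_home}"


if __name__ == "__main__":
    print(check_java_home())
```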

Can't install nodejs-legacy

I'm trying to follow this tutorial and am having issues with the Node.js installation. I'm installing on a Debian VM and have run the installation commands suggested on the Node.js site:
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get install -y nodejs
When I run sudo apt-get install nodejs-legacy, it gives me this error:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nodejs-legacy : Depends: nodejs (>= 0.6.19~dfsg1-3~) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Any ideas about what's going on?
I found an old .txt file with some instructions while sorting through junk. It looks like I ended up solving the problem with these steps:
sudo apt-get install python3-pip
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo apt-get install -y build-essential
sudo npm i webpack -g
sudo npm install --global yarn@1.0.2
# PostgreSQL setup: https://cloud.google.com/community/tutorials/setting-up-postgres
sudo -u postgres psql -c 'create database saleor'
sudo -u postgres psql -c "CREATE ROLE saleor WITH SUPERUSER CREATEDB CREATEROLE LOGIN ENCRYPTED PASSWORD 'saleor';"
sudo -u postgres psql -c 'grant all privileges on database saleor to saleor;'
sudo apt-get install python3-venv
python3 -m venv env1
source env1/bin/activate
deactivate
sudo apt-get install git
git clone https://github.com/mirumee/saleor.git
cd saleor
pip3 install -r requirements.txt
export SECRET_KEY='yourkey'
python3 manage.py migrate
yarn
sudo apt-get install libfontconfig
yarn run build-assets
