How to install pandas and numpy on Debian Buster? - python-3.x

I have a Debian Docker image and I am trying to run pandas and numpy in it, but it fails with the standard Unable to import required dependencies: error for numpy.
In the ENTRYPOINT script I download packaged code as a zip to the /tmp/ directory, under a project name (here test-data-materializer). The zip unpacks to a directory such as:
boto3/
pandas/
main.py
In this case main.py is executed with python3 -m main.py. In main.py I run import pandas. This is very similar to how AWS Lambda functions run, but I am actually running this in AWS Batch.
How do you use pandas and numpy within a Docker application? I do not want to pin the version by vendoring a *.manylinux wheel, though, because this Docker container will run multiple Python applications with different pandas/numpy versions.
Dockerfile
FROM python:3.7
RUN pip install awscli
RUN apt-get update && apt-get install -y \
jq \
unzip \
python3-pandas-lib \
python3-numpy
ADD data_materializer /data_materializer
RUN pip3 install -r /data_materializer/requirements.txt   # only boto3 is in this file
ADD ENTRYPOINT.sh /usr/local/bin/ENTRYPOINT.sh
RUN cd /
ENTRYPOINT ["/usr/local/bin/ENTRYPOINT.sh"]
Error:
Traceback (most recent call last):
File "/tmp/test-data-materializer/main.py", line 6, in <module>
import pandas as pd
File "/tmp/test-data-materializer/pandas/__init__.py", line 17, in <module>
"Unable to import required dependencies:\n" + "\n".join(missing_dependencies)
ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy c-extensions failed.
- Try uninstalling and reinstalling numpy.
- If you have already done that, then:
1. Check that you expected to use Python3.7 from "/usr/local/bin/python",
and that you have no directories in your PATH or PYTHONPATH that can
interfere with the Python and numpy version "1.18.1" you're trying to use.
2. If (1) looks fine, you can open a new issue at
https://github.com/numpy/numpy/issues. Please include details on:
- how you installed Python
- how you installed numpy
- your operating system
- whether or not you have multiple versions of Python installed
- if you built from source, your compiler versions and ideally a build log
- If you're working with a numpy git repository, try `git clean -xdf`
(removes all files not under version control) and rebuild numpy.
Note: this error has many possible causes, so please don't comment on
an existing issue about this - open a new one instead.
Original error was: No module named 'numpy.core._multiarray_umath'

If I understand correctly, your intention is to have pandas and numpy installed in the Debian Docker container. I used the following Dockerfile (I removed the awscli line to reduce build time). Instead of using apt-get install, I install pandas and numpy with pip3, so I just added pandas to requirements.txt.
Dockerfile:
FROM python:3.7
RUN apt-get update && apt-get install -y \
jq \
unzip
ADD data_materializer /data_materializer
RUN pip3 install -r /data_materializer/requirements.txt
requirements.txt:
boto3
pandas
The Docker build was successful, and after logging into the container I could import pandas and numpy successfully.
Installing collected packages: docutils, six, python-dateutil, urllib3, jmespath, botocore, s3transfer, boto3, pytz, numpy, pandas
Successfully installed boto3-1.11.10 botocore-1.14.10 docutils-0.15.2 jmespath-0.9.4 numpy-1.18.1 pandas-1.0.0 python-dateutil-2.8.1 pytz-2019.3 s3transfer-0.3.2 six-1.14.0 urllib3-1.25.8
Removing intermediate container dafdd8c52299
---> f72cb949758e
Successfully built f72cb949758e
Output in the Python prompt:
# docker run -it f72cb949758e bash
root@2f2ce761bef2:/# python
Python 3.7.6 (default, Feb 2 2020, 09:00:14)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> import numpy
>>>
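Beyond a bare import, a small check exercises the compiled C extensions end to end (a minimal sketch; assumes numpy is installed in the container):

```python
import numpy as np

# A successful import already proves numpy.core._multiarray_umath loaded;
# a tiny computation exercises the compiled code path as well.
print(np.__version__)
print(np.arange(3).sum())  # -> 3
```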

Related

Why do I get error ModuleNotFoundError: No module named 'azure.storage' during execution of my azure python function?

I am currently deploying using Python 3.9 to install the dependencies, and I set up the Azure function to also use Python 3.9.
Here is the requirements file I am currently using
msrest==0.6.16
azure-core==1.6.0
azure-functions
azure-storage-blob==12.5.0
pandas
numpy
pyodbc
requests==2.23.0
snowflake-connector-python==2.4.0
azure.identity
azure.keyvault.secrets==4.1.0
azure.servicebus==0.50.3
pyarrow==3.0.0
stopit==1.1.2
The bash script to install the required dependencies during the build definition
python3.9 -m venv worker_venv
source worker_venv/bin/activate
pip3.9 install setuptools
pip3.9 install --upgrade pip
pip3.9 install -r requirements.txt
My python scripts are using the following imports
import logging
from azure.storage.blob import *
import datetime
import azure.functions as func
import json
The most helpful article I could find was
https://learn.microsoft.com/en-us/azure/azure-functions/recover-python-functions?tabs=coretools
As a work-around I tried the remote build option using command:
func azure functionapp publish . Interestingly enough, when I use that command the error disappears during execution and the function works as expected. I would like to re-enable the automatic build-and-deploy process, which worked until I needed to include the pyarrow library.
Any suggestions on what I am doing incorrectly?
I was able to download the content which was generated by the remote build. I then discovered it has a .python_packages folder. I have now updated my install-dependencies bash script to the example below, which mimics how the remote build creates the .python_packages folder. In essence I am copying the installed packages from worker_venv/lib64/python3.9/site-packages to .python_packages/lib/site-packages. My function now executes without any errors.
python3.9 -m venv worker_venv
source worker_venv/bin/activate
pip3.9 install setuptools
pip3.9 install --upgrade pip
pip3.9 install -r requirements.txt
mkdir .python_packages
cd .python_packages
mkdir lib
cd lib
mv ../../worker_venv/lib64/python3.9/site-packages .
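The same staging step can be expressed in Python; a minimal sketch, assuming the venv and target paths from the bash script above (stage_packages is a hypothetical helper name, not part of Azure Functions):

```python
import shutil
from pathlib import Path

def stage_packages(venv_site: str,
                   target: str = ".python_packages/lib/site-packages") -> Path:
    # Copy the venv's site-packages into the layout the remote build produces.
    src, dst = Path(venv_site), Path(target)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst)
    return dst

# Usage, matching the paths in the bash script above:
# stage_packages("worker_venv/lib64/python3.9/site-packages")
```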

How to import in python 3?

I am self-learning Python and all the online courses use labs where all libraries are already imported. Whenever I try to import numpy or pandas or any other library I receive this message:
"Traceback (most recent call last):
File "<pyshell#6>", line 1, in
import numpy as np
ModuleNotFoundError: No module named 'numpy'"
What am I doing wrong?
ModuleNotFoundError is thrown when a module cannot be found.
Support for Python 3 was added in NumPy version 1.5.0.
You did not install numpy correctly. Uninstall it and reinstall with pip3:
pip uninstall numpy
pip3 install numpy
I strongly recommend you use virtualenv to install numpy:
pip install virtualenv
Go to the folder of your code and run:
virtualenv venv
//Windows
venv\Scripts\activate
//Linux
source venv/bin/activate
You can also use conda to install numpy.
Download conda using the conda GUI installer.
Best practice: use an environment rather than installing in the base env.
conda create -n my-env
conda activate my-env
If you want to install from conda-forge
conda config --env --add channels conda-forge
The actual install command
conda install numpy

Numpy cannot be imported even though it is installed

I am using Linux Mint 19.3 XFCE.
I have installed numpy through pip3. pip3 was not already installed, and I installed it through apt.
The default version of python3 that came with the OS is 3.6.9. Since I am not supposed to change the default version of Python that comes with the OS, I kept that, and installed a newer version, 3.8.0, with snap.
The command was:
sudo snap install python38
And now, whenever I need to work with the interpreter, I just type python38 into the terminal and get on with it.
I recently installed Numpy with pip3-
pip3 install numpy
and it shows up when I run pip3 freeze. It is listed as:
numpy==1.18.1
But when I enter the Python interpreter through typing python38 into my terminal, and type in import numpy, I am shown an error:
import numpy as np
Traceback (most recent call last):
File "", line 1, in
ModuleNotFoundError: No module named 'numpy'
However, when I try the same with Python 3.6.9, it works: numpy is imported and works just fine. (This time I enter the Python interpreter by typing python3.)
Now, how do I permanently solve this? That is, being able to import numpy when working with Python 3.8 in the terminal.
This is likely because your pip is configured for the machine's default version of Python (3.6.9 in your case). You may change your default Python version.
Or
You could run pip for a specific Python version and install your package like this:
python_version -m pip install your_package
e.g.:
python38 -m pip install numpy
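A quick way to see which interpreter a given pip is tied to is to run pip through that interpreter and compare; a minimal sketch (works in any Python 3.7+):

```python
import subprocess
import sys

# The interpreter running this script; "<interpreter> -m pip install ..." puts
# packages into this interpreter's site-packages, avoiding any mismatch.
print(sys.executable)

# pip reports the interpreter it is bound to at the end of its version string.
out = subprocess.run([sys.executable, "-m", "pip", "--version"],
                     capture_output=True, text=True).stdout
print(out.strip())
```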

No module named 'pandas' - Jupyter, Python3 Kernel, TensorFlow through Docker

I have a Docker container running from a TensorFlow-with-Jupyter (Python 3 kernel) image: erroneousboat/tensorflow-python3-jupyter
This works great and I can access the jupyter notebook from
http://DOCKER_IP:8888
My only issue is that the pandas library is not installed. So I tried to install it on my own. I opened the Docker quickstart terminal and ran:
docker exec CONTAINER_ID apt-get update
docker exec CONTAINER_ID apt-get install -y python3-pandas
The installation succeeds, and yet I still get the ImportError: No module named 'pandas' when I try to import pandas in the jupyter notebook, like so:
import pandas as pd
I also tried installing pandas to the image rather than just my container by:
docker run -it erroneousboat/tensorflow-python3-jupyter /bin/bash
apt-get update
apt-get install -y python3-pandas
exit
Still, in my jupyter notebook, pandas is not recognized. How can I fix this? Thank you!
pip install pandas will install the latest version of pandas for you.
Based on your python-3.x tag, I assume pip belongs to your Python 3 installation. If you have multiple Python versions installed, make sure you use the correct pip.
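A common cause here is that the Jupyter kernel runs a different interpreter than the pip or apt you used. A minimal sketch to diagnose this from a notebook cell (any Python session works):

```python
import importlib.util
import sys

# The interpreter the kernel is actually running:
print(sys.executable)

# Is pandas importable for *this* interpreter? find_spec checks without importing.
print("pandas found:", importlib.util.find_spec("pandas") is not None)

# In a notebook, installing via the kernel's own interpreter avoids the mismatch:
#   !{sys.executable} -m pip install pandas
```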

ImportError: No module named 'psycopg2._psycopg'

When I try to import psycopg2, it shows the log below:
Traceback (most recent call last):
File "D:/Desktop/learn/python/webcatch/appserver/testpgsql.py", line 2, in <module>
import psycopg2
File "D:/Desktop/learn/python/webcatch/appserver/webcatch/lib/site-packages/psycopg2-2.6.1-py3.5-win32.egg/psycopg2/__init__.py", line 50, in <module>
from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
ImportError: No module named 'psycopg2._psycopg'
How can I solve it?
My platform is Windows 10 (64-bit) and my Python version is 3.5.
Eureka! I pulled my hair out for 2 days trying to get this to work. Enlightenment came from this SO question. Simply stated, you probably installed the x64 version of psycopg2, like I did, not realizing your Python version was 32-bit. Uninstall your current psycopg2, then:
Download: psycopg2-2.6.1.win32-py3.4-pg9.4.4-release.exe from HERE, then run the following in a Terminal:
C:\path\to\project> easy_install /path/to/psycopg2-2.6.1.win32-py3.4-pg9.4.4-release.exe
C:\path\to\project> python manage.py makemigrations
C:\path\to\project> python manage.py migrate
You may also need to (re)create super user with:
C:\path\to\project> python manage.py createsuperuser
I had the same problem, solved it in this way:
Reinstall the psycopg2 package using pip (installed by default with Python 3).
On Linux:
pip uninstall psycopg2
Confirm with (y) and then:
pip install psycopg2
On Windows I add the prefix ('python -m') to the commands above.
I think the problem occurs when you change the version of Python. (Even between minor versions such as Python 3.5 and 3.6).
I am using psycopg2 in an AWS Glue job, where it is harder to follow the instructions listed in the other answers.
What I did is installing psycopg2-binary into a directory and zip up the contents of that directory:
mkdir psycopg2-binary
cd psycopg2-binary
pip install psycopg2-binary -t .
# in case using python3:
# python3 -m pip install --system psycopg2-binary -t .
zip -r9 psycopg2.zip *
I then copied psycopg2.zip to an S3 bucket and added it as an extra Python library under "Python library path" in the Glue Spark job.
I then launched the job with the following script to verify that psycopg2 is present (the zip file is downloaded by Glue into the directory where the job script is located):
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import sys
import os
import zipfile
## #params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
zip_ref = zipfile.ZipFile('./psycopg2.zip', 'r')
print(os.listdir('.'))
zip_ref.extractall('/tmp/packages')
zip_ref.close()
sys.path.insert(0, '/tmp/packages')
import psycopg2
print(psycopg2.__version__)
job.commit()
Download the compiled version of psycopg2 from this link: https://github.com/jkehler/awslambda-psycopg2. psycopg2 is a C library for Python, which needs to be compiled on Linux to work there. The compile instructions are also given at that link. Thanks to https://github.com/jkehler.
This also happened to me on a fresh Ubuntu 18.04. It is caused by a missing file, _psycopg.py, in /usr/local/lib/python3.7/site-packages/psycopg2.
It is fixed by:
removing the old psycopg2 from your machine: pip3 uninstall psycopg2
downloading the new psycopg2 manually from the official page: http://initd.org/psycopg/tarballs/PSYCOPG-2-7/psycopg2-2.7.7.tar.gz
tar xvf psycopg2-2.7.7.tar.gz
python setup.py build
sudo python setup.py install
I had this happen on Linux using Python 3.7. It is caused by a missing file, _psycopg.cpython-37m-x86_64-linux-gnu.so, in /usr/local/lib/python3.7/site-packages/psycopg2.
I downloaded _psycopg.cpython-37m-x86_64-linux-gnu.so from https://github.com/jkehler/awslambda-psycopg2/tree/master/psycopg2-3.7 and copied this file into my Anaconda lib.
I had this happen on Linux using Python 2 because I had accidentally set my PYTHONPATH to Python 3 libraries, and it was trying to load the Python 3 version of psycopg2. The solution was to unset PYTHONPATH.
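A stray PYTHONPATH like that can be spotted from inside Python; a minimal sketch:

```python
import os
import sys

# PYTHONPATH entries are prepended to sys.path, so a path pointing at another
# Python version's site-packages makes imports resolve to the wrong build.
print("PYTHONPATH =", os.environ.get("PYTHONPATH", "<not set>"))
for entry in sys.path:
    print(entry)
```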
I had the same error on Windows, this worked for me:
pip install -U psycopg2
I had an older version installed; it must have been deprecated.
For lambda functions on Python 3.7, I ended up using the psycopg2-binary library mentioned in these threads:
https://github.com/jkehler/awslambda-psycopg2/issues/51
Using psycopg2 with Lambda to Update Redshift (Python)
pip3 install psycopg2-binary==2.8.3
Snippet from these links:
I ended up using a different library, psycopg2-binary, in my requirements.txt file, and it is working fine now.
Solved it by using psycopg2-binary==2.8.3.
I came to learn that packages built on Windows often do not work well with Lambda.
I faced the same issue running Lambda with a third-party psycopg2 package installed on Windows.
Solution:
Step 1:
I installed psycopg2 on Linux.
I copied both directories, psycopg2_binary-2.8.2.dist-info and psycopg2, from Linux to Windows.
Step 2:
Along with the source *.py files, I packaged the copied third-party psycopg2 dependencies on Windows into a *.zip file.
Step 3:
I uploaded the file to Lambda. There it goes: it runs successfully without any error.
Windows 10 with the conda environment manager (fresh install of Django and Wagtail with PostgreSQL); I had the same error. I removed psycopg2:
conda remove -n myenv psycopg2
It updated some packages and removed others (it also removed django, wagtail...). Then I installed psycopg2 back:
conda install -n myenv psycopg2
Tested it; the import worked:
python
>>> import psycopg2
I installed django and wagtail back; python manage.py migrate now populated PostgreSQL.
In my case, another site-packages directory was exposed by installing pgcli; uninstalling pgcli resolved the issue for the time being.
Somehow this leaked into the virtualenv too.
