sqlite3 error on AWS lambda with Python 3 - python-3.x

I am building a Python 3.6 AWS Lambda deployment package and am facing an issue with SQLite.
In my code I am using nltk, which has an import sqlite3 statement in one of its files.
Steps taken so far:
1. The deployment package has only the Python modules I am using in the root. I get the error:
Unable to import module 'my_program': No module named '_sqlite3'
2. Added _sqlite3.so from /home/my_username/anaconda2/envs/py3k/lib/python3.6/lib-dynload/_sqlite3.so to the package root. The error then changed to:
Unable to import module 'my_program': dynamic module does not define module export function (PyInit__sqlite3)
3. Added the SQLite precompiled binaries from sqlite.org to the root of my package, but I still get the same error as in step 2.
My setup: Ubuntu 16.04, python3 virtual env
AWS lambda env: python3
How can I fix this problem?

Depending on what you're doing with NLTK, I may have found a solution.
The base nltk module imports a lot of dependencies, many of which are not used by substantial portions of its feature set. In my use case, I'm only using nltk.sent_tokenize, which carries no functional dependency on sqlite3 even though sqlite3 gets imported as a dependency.
I was able to get my code working on AWS Lambda by changing
import nltk
to
import imp
import sys

# Register empty placeholder modules before nltk is imported: the fake
# "sqlite3.dbapi2" entry keeps the real dbapi2 (which needs the missing
# _sqlite3 C extension) from ever being loaded, so "import sqlite3" succeeds.
sys.modules["sqlite"] = imp.new_module("sqlite")
sys.modules["sqlite3.dbapi2"] = imp.new_module("sqlite.dbapi2")
import nltk
This dynamically creates empty placeholder modules for sqlite and sqlite3.dbapi2. When nltk.corpus.reader.panlex_lite triggers the import of sqlite3, Python finds our empty placeholder instead of the real standard-library module, so the import succeeds. It also means that if nltk ever actually tries to use the sqlite module, it will fail.
If you're using any functionality that actually depends on sqlite, I'm afraid I can't help. But if you're trying to use other nltk functionality and just need to get around the lack of sqlite, this technique might work.

This is a bit of a hack, but I've gotten this working by dropping the _sqlite3.so file from Python 3.6 on CentOS 7 directly into the root of the project being deployed with Zappa to AWS. If you can include _sqlite3.so directly in the root of your ZIP, it should work, since it can then be imported by this line in CPython:
https://github.com/python/cpython/blob/3.6/Lib/sqlite3/dbapi2.py#L27
Not pretty, but it works. You can find a copy of _sqlite3.so here:
https://github.com/Miserlou/lambda-packages/files/1425358/_sqlite3.so.zip
Good luck!
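To verify that the bundled file is actually the one being loaded, a small sanity-check handler can help; this is my own hedged sketch (the handler name and file layout are hypothetical, not part of the answer above):

# Hypothetical sanity-check handler: confirms that the _sqlite3.so placed in
# the package root (next to this file) is importable on Lambda.
import _sqlite3
import sqlite3

def lambda_handler(event, context):
    return {
        "sqlite_version": sqlite3.sqlite_version,  # version of the bundled SQLite
        "loaded_from": _sqlite3.__file__,          # should point into /var/task
    }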

This isn't a solution, but I can explain why it happens.
Python 3 has support for sqlite in the standard library (stable to the point that pip knows about it and won't allow installation of pysqlite). However, this library requires the SQLite developer tools (C libs) to be on the machine at runtime. Amazon's Linux AMI, which is what AWS Lambda runs on (bare AMI instances), does not have these installed by default. I'm not sure whether this means that sqlite support isn't installed or just won't work until the libraries are added, because I tested things in the wrong order.
Python 2 does not ship sqlite support in the standard library; you have to use a third-party lib like pysqlite to get it. This means that the binaries can be built more easily without depending on the machine state or path variables.
My suggestion, which I see you've already done, is to just run that function in Python 2.7 if you can (and make your unit testing just that much harder :/).
Because of the limitations (it being baked into Python's standard library in 3), it is more difficult to create a Lambda-friendly deployment package. The only things I can suggest are to either petition AWS to add that support to Lambda or, if you can get away without actually using the sqlite pieces in nltk, copy Anaconda's approach by shipping blank libraries that have the proper methods and attributes but don't actually do anything.
If you're curious about the latter, check out any of the fake/_sqlite3 files in an anaconda install. The idea is only to avoid import errors.
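As a rough illustration only (mine, not from the answer above), such a stand-in might look roughly like this; the exact names needed vary by Python version, since sqlite3/dbapi2.py star-imports from _sqlite3 and touches a few of its attributes at import time:

# _sqlite3.py -- hypothetical "blank" stand-in placed on the import path.
# It exists only so that "from _sqlite3 import *" in sqlite3/dbapi2.py does
# not raise ImportError; anything that actually uses sqlite will still fail.
version = "2.6.0"             # dbapi2 parses these two strings at import time
sqlite_version = "3.8.0"

class Error(Exception):
    pass

class Row:                    # dbapi2 registers this with collections.abc.Sequence
    pass

def register_adapter(type_, adapter):     # no-op placeholders
    pass

def register_converter(name, converter):
    pass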

As apathyman describes, there isn't a direct solution to this until Amazon bundles the C libraries required for sqlite3 into the AMIs used to run Python on Lambda.
One workaround though, is using a pure Python implementation of SQLite, such as PyDbLite. This side-steps the problem, as a library like this doesn't require any particular C libraries to be installed, just Python.
Unfortunately, this doesn't help you if you are using a library which in turn uses the sqlite3 module.

My solution may or may not apply to you (as it depends on Python 3.5), but hopefully it sheds some light on similar issues.
sqlite3 comes with the standard library, but it is not built into the python3.6 that AWS uses, for the reasons explained by apathyman and the other answers.
The quick hack is to include the shared object .so in your Lambda package:
find ~ -name _sqlite3.so
In my case:
/home/user/anaconda3/pkgs/python-3.5.2-0/lib/python3.5/lib-dynload/_sqlite3.so
However, that is not totally sufficient. You will get:
ImportError: libpython3.5m.so.1.0: cannot open shared object file: No such file or directory
Because the _sqlite3.so is built with python3.5, it also requires the python3.5 shared object. You will also need that in your deployment package:
find ~ -name libpython3.5m.so*
In my case:
/home/user/anaconda3/pkgs/python-3.5.2-0/lib/libpython3.5m.so.1.0
This solution will likely not work if you are using an _sqlite3.so built with python3.6, because the libpython3.6 provided by AWS will likely not support it. However, this is just my educated guess. If anyone has done it successfully, please let me know.

TL;DR
Although it is not a very good approach, it works just fine. Use the pickle module to dump whatever data you need into a separate file on your local system, copy that file to the AWS project directory, then load it from the file and use it.
In Long
So here's what I tried. I had this same problem when trying to import stopwords from nltk.corpus, but AWS wouldn't let me import it, and even installing it on the Amazon AMI didn't seem possible for me because of the same '_sqlite3 module not found' error. So, using my local system, I generated a file and dumped the stopwords into it. Here's the code:
# Only for the local system
from nltk.corpus import stopwords
import pickle

# NLTK's stopword fileids are lowercase, e.g. 'english'
stop = list(stopwords.words('english'))

with open('stopwords.pkl', 'wb') as f:
    pickle.dump(stop, f)
Now, I copied this file to the AWS directory using WinSCP and then used it in my project by loading it from the file.
import pickle

# load the pickled stopword list
with open('stopwords.pkl', 'rb') as f:
    stop = pickle.load(f)
and it works just fine
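For example (my own illustration, not part of the answer), the unpickled list can then be used for filtering without importing nltk at all on Lambda:

# Filter stopwords out of a tokenized sentence using the unpickled list.
words = "this is a simple example sentence".split()
filtered = [w for w in words if w not in stop]
print(filtered)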

Building on AusIV's answer, this version works for me on AWS Lambda with NLTK. I created a dummysqllite file to mock the required references.
import importlib.util
import sys

# The stub file can essentially be empty: module_from_spec() does not execute it.
spec = importlib.util.spec_from_file_location("_sqlite3", "/dummysqllite.py")
sys.modules["_sqlite3"] = importlib.util.module_from_spec(spec)
sys.modules["sqlite3"] = importlib.util.module_from_spec(spec)
sys.modules["sqlite3.dbapi2"] = importlib.util.module_from_spec(spec)

You need the sqlite3.so file (as others have pointed out), but the most robust way to get it is to pull from the (semi-official?) AWS Lambda docker images available in lambci/lambda. For example, for Python 3.7, here's an easy way to do this:
First, let's grab the sqlite3.so (library file) from the docker image:
mkdir lib
docker run -v $PWD:$PWD lambci/lambda:build-python3.7 bash -c "cp sqlite3.cpython*.so $PWD/lib/"
Next, we'll build a zip with our requirements and code:
pip install -r requirements.txt -t output
pip install . -t output
cd output && zip -r ../output.zip . && cd ..
Finally, we add the library file to our zip:
cd lib && zip -r ../output.zip sqlite3.cpython*.so
If you want to use AWS SAM build/packaging, instead copy it into the top-level of the lambda environment package (i.e., next to your other python files).

Related

How to embed FreeCAD in a Python virtual environment?

How do I set up a Python virtual environment with the FreeCAD library embedded, so that it can be imported as a module in scripts?
I would like to avoid using the FreeCAD GUI as well as being dependent on having FreeCAD installed on the system, when working with Python scripts that use FreeCAD libraries to create and modify 3D geometry. I hope a Python virtual environment can make that possible.
I am working in PyCharm 2021.1.1 with Python 3.8 in virtualenv on Debian 10.
I started out with FreeCAD documentation for embedding in scripts as a basis:
https://wiki.freecadweb.org/Embedding_FreeCAD
In one attempt, I downloaded and unpacked .deb packages from Debian, taking care to get the correct versions required by each dependency. In another attempt, I copied the contents of a FreeCAD flatpak install, as it should contain all the libraries that FreeCAD depends on.
After placing the libraries to be imported in the virtual environment's folder, I have pointed to them with sys.path.append() as well as PyCharm's Project Structure tool in various attempts. In both cases the virtual environment detects where FreeCAD.so is located, but fails to find any of its dependencies, even when they are located in the same folder. When importing these dependencies explicitly, each of them has the same issue. This leads to a dead end when an import fails because it does not define a module export function according to Python:
ImportError: dynamic module does not define module export function (PyInit_libnghttp2)
I seem to be looking at a very long chain of broken dependencies, even though I make the required libraries available and inform Python where they are located.
I would appreciate either straight up instructions for how to do this or pointers to documentation that describes importing FreeCAD libraries in Python virtual environments, as I have not come across anything that specific yet.
I came across a few prior questions which seemed to have similar intent, but no answers:
Embedding FreeCAD in python script
Is it possible to embed Blender/Freecad in a python program?
Similar questions for Conda focus on importing libraries from the host system rather than embedding them in the virtual environment:
Include FreeCAD in system path for just one conda virtual environment
Other people's questions on FreeCAD forums went unanswered:
https://forum.freecadweb.org/viewtopic.php?t=27929
EDIT:
Figuring this out was a great learning experience. The problem with piecing dependencies together is that, for that approach to work out, everything from FreeCAD and its dependencies to the Python interpreter and its dependencies seems to need to be built against the same versions of the libraries they depend on, otherwise segmentation faults bring everything to a crashing halt. This means that the idea of grabbing the FreeCAD modules and the libraries they depend on from a Flatpak installation is in theory not horrible, as all parts are built together using the same library versions. I just couldn't make it work out, presumably due to how the included libraries are located and the difficulty of identifying an executable for the included Python interpreter. In the end, I looked into the contents of the FreeCAD AppImage, and that turned out to have everything needed in a folder structure that appears to be very friendly to what PyCharm and Python expect from modules and libraries.
This is what I did to get FreeCAD to work with PyCharm and virtualenv:
1. Download FreeCAD AppImage: https://www.freecadweb.org/downloads.php
2. Make AppImage executable:
chmod -v +x ~/Downloads/FreeCAD_*.AppImage
3. Create folder for extracting AppImage:
mkdir -v ~/Documents/freecad_appimage
4. Extract AppImage from folder (note: this expands to close to 30000 files requiring in excess of 2 GB disk space):
cd ~/Documents/freecad_appimage
~/Downloads/./FreeCAD_*.AppImage --appimage-extract
5. Create folder for PyCharm project:
mkdir -v ~/Documents/pycharm_freecad_project
6. Create PyCharm project using Python interpreter from extracted AppImage:
Location: ~/Documents/pycharm_freecad_project
New environment using: Virtualenv
Location: ~/Documents/pycharm_freecad_project/venv
Base interpreter: ~/Documents/freecad_appimage/squashfs-root/usr/bin/python
Inherit global site-packages: False
Make available to all projects: False
7. Add folder containing FreeCAD.so library as Content Root to PyCharm Project Structure and mark as Sources (by doing so, you shouldn't have to set PYTHONPATH or sys.path values, as PyCharm provides module location information to the interpreter):
File: Settings: Project: Project Structure: Add Content Root
~/Documents/freecad_appimage/squashfs-root/usr/lib
After this, PyCharm is busy indexing files for a while.
8. Open Python Console in PyCharm and run command to check basic functioning:
import FreeCAD
9. Create Python script with example functionality (a slightly larger sketch follows after these steps):
import FreeCAD
vec = FreeCAD.Base.Vector(0, 0, 0)
print(vec)
10. Run script.
11. Debug script.
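For a slightly larger smoke test (my own sketch, not part of the original steps; it assumes the Part module bundled with the AppImage is importable the same way FreeCAD is):

import FreeCAD
import Part  # geometry module shipped alongside FreeCAD in the AppImage

# Create a document, add a parametric box, and save it headlessly.
doc = FreeCAD.newDocument("demo")
box = doc.addObject("Part::Box", "Box")
box.Length, box.Width, box.Height = 10, 20, 5
doc.recompute()
print(box.Shape.Volume)  # 10 * 20 * 5 = 1000.0
doc.saveAs("/tmp/demo.FCStd")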
All FreeCAD functionality I have used in my scripts so far has worked. However, one kink seems to be that the FreeCAD module needs to be imported before the Path module. Otherwise the Python interpreter exits with code 139 (interrupted by signal 11: SIGSEGV).
There are a couple of issues with PyCharm: it shows a red squiggly line under the import name and reports "No module named 'FreeCAD'", even though the script runs perfectly. Also, PyCharm fails to provide code completion in the code view, even though it manages to do so in its Python Console. I am creating new questions to address those issues and will update the info here if I find a solution.

What version of dill should I be using when running Apache Beam v 2.25 with Python 3?

I am attempting to run a Dataflow job using Apache Beam v 2.25 and Python 3.7. Everything runs OK when using DirectRunner, but the job errors out when it attempts to invoke a function from another private Python module.
The error is
AttributeError: Can't get attribute '_create_code' on <module 'dill._dill' from '/usr/local/lib/python3.7/site-packages/dill/_dill.py'>
My setup file looks like this:
import setuptools
from setuptools import setup

# REQUIRED_PACKAGES is defined elsewhere in the file
setup(
    name="Rich Profile and Activiation reports",
    version="0.1",
    description="Scripts for reports",
    author="Kim Merino",
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    include_package_data=True,
    package_data={"": ["*.json"]},
)
My question is what version of dill should I be using for Apache Beam v. 2.25? I am currently using Dill v 0.3.3
I have an external dependency that requires dill in order to work.
Have you tried to set --save_main_session=True when running the pipeline?
https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors
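For reference, a minimal sketch of setting that option programmatically (my illustration; the pipeline itself is a placeholder):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

# Runner, project, region etc. would come from your own configuration.
options = PipelineOptions()
options.view_as(SetupOptions).save_main_session = True

with beam.Pipeline(options=options) as p:
    (p | beam.Create(["hello", "world"])
       | beam.Map(print))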
I had a similar issue with the same error message just now. What solved it was to make sure that I was using a Python virtual environment dedicated to my Beam/Dataflow code, in other words something like:
cd [code dir]
python -m venv env
source ./env/bin/activate
pip install apache-beam[gcp]
pip freeze > requirements.txt
Previously I was lazily using my global copy of Python 3. I'm not entirely sure why this fixed it, but I might have been using an incompatible version of dill (which is just a pickling-related library used by Beam).
Whether you package your Beam code as a Classic Template or invoke Dataflow directly from your laptop, your code ends up getting pickled and sent to Dataflow. So if you are using a Python installation with an incompatible module, that can cause issues.
And of course in any case, it's a best practice to have a virtual environment for your repo, so this can narrow down the universe of possible issues.
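A quick way to confirm which versions the active environment actually uses (my own suggestion, not from the answer above):

# Print the Beam and dill versions seen by the interpreter that launches the
# pipeline; compare these against what the Dataflow workers use.
import apache_beam
import dill

print("apache-beam", apache_beam.__version__)
print("dill", dill.__version__)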
I removed the external dependency and kept the requirements to a minimum

setup.py not finding pre-installed dependencies

I am trying to install a package I have written by running
python setup.py install
whilst in the package directory. My folder structure is as follows:
package
-module
--__init__.py
--parent_class.py
--child_class.py
-setup.py
Here '-' denotes the folder level with package being the root directory.
When run, the installation works as expected until:
Installed c:\users\me\appdata\local\continuum\anaconda3\lib\site-packages\package-0.1-py3.7.egg
Processing dependencies for package==0.1
Searching for json
Reading https://pypi.org/simple/json/
Couldn't find index page for 'json' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
No local packages or working download links found for json
error: Could not find suitable distribution for Requirement.parse('json')
I don't believe this is an issue with the json package installation as:
I know I have it installed (I can run 'import json' in my env)
and because if I remove 'json' from install_requires=['pandas', 'numpy', 'datetime', 'os', 'copy', 'json'], it simply returns the same error, this time for the 'copy' and then the 'os' package.
Thanks in advance
I believe json is part of the standard library in Python 2 and 3 (much like os). Thus, it's usually included in all Python distributions (that is, anyone who has Python probably also has the json module).
Thus, you should be able to omit json from your requirements (while still importing it in your code). Since json is a built-in module, you can assume that anyone who has Python installed will also have json.
I found this question that is also about installing the json module. The second answer notes that minimal Python installs (e.g., python-minimal in Ubuntu) may not include json, however. You should note that the linked question concerns Python 2 and not 3.
Therefore, if a user lacks json on their system, they should be able to install it through their system's package manager (in Linux), and not pip. In Windows, I would assume that all Python installations will include built-in modules (though I don't know for certain). Regardless, if a user doesn't have json installed, then I think it's up to the user to install it.
The same applies to os, copy, and datetime. Only pandas and numpy need to be included in your requirements.
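A hedged sketch of what that change might look like in setup.py (package names other than pandas and numpy come from the question's own list and are simply dropped):

from setuptools import setup, find_packages

setup(
    name="package",
    version="0.1",
    packages=find_packages(),
    # json, os, copy and datetime are standard-library modules: import them
    # in the code, but leave them out of install_requires.
    install_requires=["pandas", "numpy"],
)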
You can find a list of built-in modules (for Python 3) here. Again, the modules listed here are ones that are not hosted on pypi (thus not available through pip install ..., conda install ..., etc.), but ones that are usually included in all Python distributions.
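On newer interpreters (Python 3.10+, so not the 3.7 in the question) there is also a quick programmatic check:

import sys

# True for standard-library modules, False for packages that belong in install_requires.
print('json' in sys.stdlib_module_names)    # True
print('pandas' in sys.stdlib_module_names)  # False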
As an aside: this is my first answer here. Please let me know if I can improve it.

Cannot Upload C++ extension to Colab

I wrote a C++ extension, wrapped it using PyBind11, and compiled it on my Linux machine, which yielded a .so file that works locally. However, I cannot upload that .so file to Colab, so I tried it on Windows and got a .pyd file, which does not upload either... am I doing something wrong?
You aren't doing anything wrong, but what method do you think that colab provides to upload system libraries? (hint: none).
You may have better luck trying to embed C code into Python, e.g. with scipy.weave, but that still requires an environment that has access to a C compiler, which Colab does not provide.
You can test if weave is provided as part of a jupyter environment as follows:
!pip install -q weave
import weave
weave.test()

PyQt5 stubs files

I've just started a PyQt5 project that is currently running in a virtualenv.
PyQt5 was installed using a classic pip install pyqt5.
I want to type check my application using mypy.
But when I run it, I get an error telling me that there are no stub files for PyQt5.
myapp/__main__.py:3: error: No library stub file for module 'PyQt5.QtWidgets'
I've checked the site-packages of my virtualenv, and indeed, there aren't any .pyi files in it.
Checking the documentation, I see that stub files can be generated if the library is compiled from source (and have existed since at least PyQt 5.6; I'm using 5.10).
Is there a way to obtain those files without having to manually compile the library?
I had the same problem with PyQt5. So I decided to put up PyQt5-stubs which contains the stub files for the main PyQt5 modules.
You can install it with pip:
$ pip install pyqt5-stubs
Another solution for you would be to change to Qt for Python (PySide2). They provide proper type annotations.
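With the stubs installed, here is a small sanity check of my own showing that mypy can now see PyQt5's types:

# check_qt.py -- run "mypy check_qt.py"; with pyqt5-stubs installed, the
# "No library stub file" error for PyQt5.QtWidgets should be gone.
from PyQt5.QtWidgets import QApplication, QLabel

app = QApplication([])
label: QLabel = QLabel("Hello, typed world")
label.show()
app.exec_()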
Not currently.
PEP 561, which specifies how packages should indicate they supply type information, was recently accepted. (Full disclosure, I am the author).
PyQt5 will need to become compliant with the PEP, and then as long as you are using mypy >=0.600, everything should work as expected.
