How to get Sphinx to recognize pandas as a module - python-3.x

I am using Python 3.9.2 with sphinx-build 3.5.2. I have created a project with the following directory and file structure
core_utilities
|_core_utilities
| |_ __init__.py
| |_read_files.py
|_docs
|_ sphinx
|_Makefile
|_conf.pg
|_source
| |_conf.py
| |_index.rst
| |_read_files.rst
|_build
The read_files.py file contains the following code. NOTE: I simplified this example so it would not have distracting information. Within this code their is one class that contains one member function to read a text file, look for a key word and read the variable to the right of the keyword as a numpy.float32 variable. The stand-alone function written to read in a csv file with specific data types and save it to a pandas dataframe.
# Import packages here
import os
import sys
import numpy as np
import pandas as pd
from typing import List
class ReadTextFileKeywords:
def __init__(self, file_name: str):
self.file_name = file_name
if not os.path.isfile(file_name):
sys.exit('{}{}{}'.format('FATAL ERROR: ', file_name, ' does not exist'))
# ----------------------------------------------------------------------------
def read_double(self, key_words: str) -> np.float64:
values = self.read_sentence(key_words)
values = values.split()
return np.float64(values[0])
# ================================================================================
# ================================================================================
def read_csv_columns_by_headers(file_name: str, headers: List[str],
data_type: List[type],
if not os.path.isfile(file_name):
sys.exit('{}{}{}'.format('FATAL ERROR: ', file_name, ' does not exist'))
dat = dict(zip(headers, data_type))
df = pd.read_csv(file_name, usecols=headers, dtype=dat, skiprows=skip)
return df
The conf.py file as the following information in it.
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('../../../core_utilities'))
# -- Project information -----------------------------------------------------
project = 'Core Utilities'
copyright = 'my copyright'
author = 'my name'
# The full version, including alpha/beta/rc tags
release = '0.1.0'
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.todo', 'sphinx.ext.viewcode', 'sphinx.ext.autodoc',
'sphinx.ext.autosummary', 'sphinx.ext.githubpages']
autodoc_member_order = 'groupwise'
autodoc_default_flags = ['members', 'show-inheritance']
autosummary_generate = True
autodock_mock_imports = ['numpy']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'nature'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
The index.rst file has the following information.
. Core Utilities documentation master file, created by
sphinx-quickstart
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to Core Utilities's documentation!
==========================================
.. toctree::
:maxdepth: 2
:caption: Contents:
read_files
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
and the read_files.rst file has the following information
**********
read_files
**********
The ``read_files`` module contains methods and classes that allow a user to read different
types of files in different ways. Within the module a user will find functionality that
allows them to read text files and csv files by columns and keywords. This module
also contains functions to read sqlite databases, xml files, json files, yaml files,
and html files
.. autoclass:: test.ReadTextFileKeywords
:members:
.. autofunction:: test.read_csv_columns_by_headers
In addition, I am using a virtual environment that is active when I try to compile this with command line. I run the following command to compile html files with sphinx.
sphinx-build -b html source build
When I run the above command it fails with the following error.
WARNING: autodoc: failed to import class 'ReadTextFileKeywords' from module 'read_files'; the following exception was raised; No module named pandas.
If I delete the line from pandas import pd and then delete the function read_csv_columns_by_headers along with the call to the function in the read_files.rst file, everything compiles fine. It appears that for some reason sphinx is able to find numpy, but it for some reason does not seem to recognize pandas, although both exist in the virtual environment and were loaded with a pip3 install statement. Does anyone know why sphinx is able to find other modules, but not pandas?

Add pandas to autodoc
autodoc_mock_imports = ['numpy','pandas']

Related

Acessing myclass defined in python file in different folder from jupyter notebook

I have following folder structure
|
|---notebook
| |
| --- MyNoteBook.ipnyb
|
|---python
| |
| --- image_rotation_utils.py
I have following content in python.image_rotation_utils.py
class ImageRotationUtils:
""" Utils for rotating images."""
def __init__(self, image_path=None, image_name=None, output_path=None, output_annotations_file=None ):
""" Initializes the class.
Args:
image_path: the relative path to the image
image_name: image file name
output_path: the relative path where rotated output file is stored.
output_annotations_file: annotations file where rotated anotations are stored.
"""
self.image_path = image_path
self.image_name = image_name
self.output_path = output_path
self.output_annotations_file = output_annotations_file
#other functions defined
Now in MyNoteBook.ipnyb I have following cells
Cell 1:
import sys
sys.path.insert(0, '..')
from python.image_rotation_utils import *
Above is executed successfully now in cell 2 I have below
myUtils = ImageRotationUtils()
I have getting below error
----> 1 ImageRotationUtils()
TypeError: __init__() missing 4 required positional arguments: 'image_path', 'image_name', 'output_path', and 'output_annotations_file'
whye I am getting this error when I have default values in constructor has None in class definition.
I am new to python and trying to put code in productive manner. What mistake I am making. Kindly provide inputs
What you pasted looks ok to me, and your code works for me without errors.
Perhaps something before or after that code is what is spoiling the class definition?
Perhaps you are running this in Jupyter or in IPython and you imported ImageRotationUtils previously, but then changed the code, and tried to reimport again -- and something was wrong with the new reimport so your definition did not get over-written? If that's the case, restart the environment (or the kernel in Jupyter) and rerun the code.
I would suggest putting a simple initialization code, like that constructor line, into the same source file and executing it as a separate process to test if that is the code or the environment issue.
As a matter of convenience and to avoid tweaking sys.path in your code, I would suggest adding your python directory to the PYTHONPATH environment variable before you load your environment, so you can just import.

Sphinx cannot find pandas

I am trying to document a Python class using sphinx. I am used Python 3.9.2 and sphinx-build 3.5.2. I am running the command sphinx-build -b html source build to transform the .rst files into .html files. I have already successfully executed this configuration on other classes and functions in the code-suite, so I know their is nothing wrong with the way I have organized my files. When I run the command I get the following error;
Warning: autodoc: failed to import class 'ReadTextFileKeywords' from module 'read_files'; the following exception was raised: No module named pandas
I do have the most recent version of pandas loaded globally and in my virtual environment. Does anyone know why it cannot load pandas and how I can fix this issue?
For reference, the file structure looks like this;
core_utilities
|_core_utilities
|_docs
|_sphinx
|_Makefile
|_build
| |_html files
|_source
|_index.rst
|_Introduction.rst
|_read_files.rst
|_operating_system.rst
|_conf.py
The conf.py file has the following information
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('../../../core_utilities'))
# -- Project information -----------------------------------------------------
project = 'Core Utilities'
copyright = '2021, Jonathan A. Webb'
author = 'Jonathan A. Webb'
# The full version, including alpha/beta/rc tags
release = '0.1.0'
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.todo', 'sphinx.ext.viewcode', 'sphinx.ext.autodoc',
'sphinx.ext.autosummary', 'sphinx.ext.githubpages']
autodoc_member_order = 'groupwise'
autodoc_default_flags = ['members', 'show-inheritance']
autosummary_generate = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'nature'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
and the index file has the following format
.. Core Utilities documentation master file, created by
sphinx-quickstart
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to Core Utilities's documentation!
==========================================
.. toctree::
:maxdepth: 2
:caption: Contents:
Introduction
operating_system
read_files
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
Sphinx works just fine when I import docstrings from the operating_system module, so I know that is not the problem. It fails when I try to import from the read_files module. The read_files.rst file has hte following format.
**********
read_files
**********
The ``read_files`` module contains methods and classes that allow a user to read different
types of files in different ways. Within the module a user will find functionality that
allows them to read text files and csv files by columns and keywords. This module
also contains functions to read sqlite databases, xml files, json files, yaml files,
and html files
.. autoclass:: read_files.ReadTextFileKeywords
:members:
And the class it is reading, to include the file has the following format
# Import packages here
import os
import sys
import numpy as np
import pandas as pd
from typing import List
import sqlite3
# ================================================================================
# ================================================================================
# Date: Month Day, Year
# Purpose: Describe the purpose of functions of this file
# Source Code Metadata
__author__ = "Jonathan A. Webb"
__copyright__ = "Copyright 2021, Jon Webb Inc."
__version__ = "1.0"
# ================================================================================
# ================================================================================
# Insert Code here
class ReadTextFileKeywords:
"""
A class to find keywords in a text file and the the variable(s)
to the right of the key word. This class must inherit the
``FileUtilities`` class
:param file_name: The name of the file being read to include the
path-link
For the purposes of demonstrating the use of this class, assume
a text file titled ``test_file.txt`` with the following contents.
.. code-block:: text
sentence: This is a short sentence!
float: 3.1415 # this is a float comment
double: 3.141596235941 # this is a double comment
String: test # this is a string comment
Integer Value: 3 # This is an integer comment
float list: 1.2 3.4 4.5 5.6 6.7
double list: 1.12321 344.3454453 21.434553
integer list: 1 2 3 4 5 6 7
"""
def __init__(self, file_name: str):
self.file_name = file_name
if not os.path.isfile(file_name):
sys.exit('{}{}{}'.format('FATAL ERROR: ', file_name, ' does not exist'))

How to add a module folder /tar.gz to nodes in Pyspark

I am running pyspark in Ipython Notebook after doing following configuration
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook--NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
export PYSPARK_PYTHON=/usr/bin/python
I am having a custom udf function, which makes use of a module called mzgeohash. But, I am getting module not found error, I guess this module might be missing in workers / nodes .I tried to add sc.addpyfile and all. But, what will be the effective way to add a cloned folder or tar.gz python module in this case , from Ipython .
Here is how I do it, basically the idea is to create a zip of all the files in your module and pass it to sc.addPyFile() :
import dictconfig
import zipfile
def ziplib():
libpath = os.path.dirname(__file__) # this should point to your packages directory
zippath = '/tmp/mylib-' + rand_str(6) + '.zip' # some random filename in writable directory
zf = zipfile.PyZipFile(zippath, mode='w')
try:
zf.debug = 3 # making it verbose, good for debugging
zf.writepy(libpath)
return zippath # return path to generated zip archive
finally:
zf.close()
...
zip_path = ziplib() # generate zip archive containing your lib
sc.addPyFile(zip_path) # add the entire archive to SparkContext
...
os.remove(zip_path) # don't forget to remove temporary file, preferably in "finally" clause

Loading python modules in Python 3

How do I load a python module, that is not built in. I'm trying to create a plugin system for a small project im working on. How do I load those "plugins" into python? And, instaed of calling "import module", use a string to reference the module.
Have a look at importlib
Option 1: Import an arbitrary file in an arbiatrary path
Assume there's a module at /path/to/my/custom/module.py containing the following contents:
# /path/to/my/custom/module.py
test_var = 'hello'
def test_func():
print(test_var)
We can import this module using the following code:
import importlib.machinery
myfile = '/path/to/my/custom/module.py'
sfl = importlib.machinery.SourceFileLoader('mymod', myfile)
mymod = sfl.load_module()
The module is imported and assigned to the variable mymod. We can then access the module's contents as:
mymod.test_var
# prints 'hello' to the console
mymod.test_func()
# also prints 'hello' to the console
Option 2: Import a module from a package
Use importlib.import_module
For example, if you want to import settings from a settings.py file in your application root folder, you could use
_settings = importlib.import_module('settings')
The popular task queue package Celery uses this a lot, rather than giving you code examples here, please check out their git repository

Py2exe: Embed static files in exe file itself and access them

I found a solution to add files in library.zip via: Extend py2exe to copy files to the zipfile where pkg_resources can load them.
I can access to my file when library.zip is not include the exe.
I add a file : text.txt in directory: foo/media in library.zip.
And I use this code:
import pkg_resources
import zipfile
from cStringIO import StringIO
my_data = pkg_resources.resource_string(__name__,"library.zip")
filezip = StringIO(my_data)
zip = zipfile.ZipFile(filezip)
data = zip.read("foo/media/text.txt")
I try to use pkg_resources but I think that I don't understand something because I could open directly "library.zip".
My question is how can I do this when library.zip is embed in exe?
Best Regards
Jean-Michel
I cobbled together a reasonably neat solution to this, but it doesn't use pkg_resources.
I need to distribute productivity tools as standalone EXEs, that is, all bundled into the one .exe file. I also need to send out notifications when these tools are used, which I do via the Logging API, using file-based configuration. I emded the logging.cfg fileto make it harder to effectively switch-off these notifications i.e. by deleting the loose file... which would probably break the app anyway.
So the following is the interesting bits from my setup.py:
LOGGING_CFG = open('main/resources/logging.cfg').read()
setup(
name='productivity-tool',
...
# py2exe extras
console=[{'script': productivity_tool.__file__.replace('.pyc', '.py'),
'other_resources': [(u'LOGGINGCFG', 1, LOGGING_CFG)]}],
zipfile=None,
options={'py2exe': {'bundle_files': 1, 'dll_excludes': ['w9xpopen.exe']}},
)
Then in the startup code for productivity_tool.py:
from win32api import LoadResource
from StringIO import StringIO
from logging.config import fileConfig
...
if __name__ == '__main__':
if is_exe():
logging_cfg = StringIO(LoadResource(0, u'LOGGINGCFG', 1))
else:
logging_cfg = 'main/resources/logging.cfg'
fileConfig(logging_cfg)
...
Works a treat!!!
Thank you but I found the solution
my_data = pkg_resources.resource_stream("__main__",sys.executable) # get lib.zip file
zip = zipfile.ZipFile(my_data)
data = zip.read("foo/media/doc.pdf") # get my data on lib.zip
file = open(output_name, 'wb')
file.write(data) # write it on a file
file.close()
Best Regards
You shouldn't be using pkg_resources to retrieve the library.zip file. You should use it to retrieve the added resource.
Suppose you have the following project structure:
setup.py
foo/
__init__.py
bar.py
media/
image.jpg
You would use resource_string (or, preferably, resource_stream) to access image.jpg:
img = pkg_resources.resource_string(__name__, 'media/image.jpg')
That should "just work". At least it did when I bundled my media files in the EXE. (Sorry, I've since left the company where I was using py2exe, so don't have a working example to draw on.)
You could also try using pkg_resources.resource_filename(), but I don't think that works under py2exe.

Resources