I would like to create a child class of scikit-learn's sklearn.cluster.KMeans, and I would like to do this in Cython for performance reasons. Is this possible?
There is an old issue (https://github.com/scikit-learn/scikit-learn/issues/2057) that seems to be related and indicates that the .pxd files are not published. Deriving from one of the main sklearn classes in Cython would be really useful, however, so I am asking whether there is a solution by now.
My source file:
from sklearn.cluster cimport KMeans
cimport cython

@cython.cclass
class OtherKMeans(KMeans):
    def __init__(self, kappa, **kwargs):
        self.kappa = kappa
        super().__init__(**kwargs)
My setup file setup1.py:
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("otherkmeans.pyx"),
    compiler_directives={'language_level': "3"}
)
Result of calling
python setup1.py build_ext --inplace
is
src> python setup1.py build_ext --inplace
Compiling otherkmeans.pyx because it changed.
[1/1] Cythonizing otherkmeans.pyx
/home/miller/miniconda2/envs/bkm9/lib/python3.9/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /home/musk/new-k-means/src/otherkmeans.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Error compiling Cython file:
------------------------------------------------------------
...
from sklearn.cluster cimport KMeans
^
------------------------------------------------------------
otherkmeans.pyx:1:0: 'sklearn/cluster.pxd' not found
Error compiling Cython file:
------------------------------------------------------------
...
from sklearn.cluster cimport KMeans
^
------------------------------------------------------------
otherkmeans.pyx:1:0: 'sklearn/cluster/KMeans.pxd' not found
Error compiling Cython file:
------------------------------------------------------------
...
from sklearn.cluster cimport KMeans
cimport cython
@cython.cclass
class OtherKMeans(KMeans):
^
------------------------------------------------------------
otherkmeans.pyx:4:18: First base of 'OtherKMeans' is not an extension type
Traceback (most recent call last):
File "/home/miller/new-k-means/src/setup1.py", line 4, in <module>
ext_modules=cythonize("otherkmeans.pyx"),
File "/home/miller/miniconda2/envs/bkm9/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 1127, in cythonize
cythonize_one(*args)
File "/home/miller/miniconda2/envs/bkm9/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 1250, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: otherkmeans.pyx
How can I fix this?
KMeans looks to be a regular Python class. That means you can't use it as the first base of a cdef class (you can actually derive a cdef class from it as a second/third/etc. base, though).
You can compile regular (i.e. non-cdef) classes in Cython. Their functions are still sped up by Cython. The limitations are:
they aren't generated as a C struct, so you don't get fast access to C-defined members,
they can't have cdef member functions (but if you care about this you may be able to use global cdef functions instead... Remember, though, that cdef is mainly a calling convention - a regular def function is still sped up by Cython).
Therefore you should ask yourself "do I actually need a cdef/cython.cclass class?", and the answer is probably "no".
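As a minimal sketch of what that suggestion looks like in practice (a plain Python class in the .pyx file, with the cimport and the cclass decorator dropped; the kappa parameter is carried over from the question):
# otherkmeans.pyx
from sklearn.cluster import KMeans  # regular import, not cimport

class OtherKMeans(KMeans):  # ordinary Python class, still compiled by Cython
    def __init__(self, kappa, **kwargs):
        self.kappa = kappa
        super().__init__(**kwargs)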
Related
I am in a situation where I am using a module as an optional package. My goal is to make the tests succeed even if I do not have that optional package installed.
The problem is that I need to inherit some classes from that module. Even if I skip the tests, pytest collects all the files and throws an error while collecting the file that inherits from those classes.
Is there a way to ask pytest to ignore this file if a certain module does not exist?
For example,
if have_tf and have_tfa:
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Layer
This I can do, but
class UNet3D(Layer):
throws an error while collecting the files, because Layer is not found if tensorflow is not installed.
Could there be a way? I could drop all the inheritance, but that would make the code super messy.
Thank you!
You could create fake Model and Layer classes if the imports fail. E.g.:
import pytest

try:
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Layer
    have_tf = True
except ImportError:
    have_tf = False

    # minimal stand-ins so the class definition below still works
    class Model:
        pass

    class Layer:
        pass

class UNet3D(Layer):
    def __init__(self):
        self.do_something()

@pytest.mark.skipif(not have_tf, reason="tensorflow is not installed")
def test_unet3d():
    assert True
Running pytest -v on a system without Tensorflow yields:
============================= test session starts ==============================
platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/lars/.local/share/virtualenvs/python-LD_ZK5QN/bin/python
cachedir: .pytest_cache
rootdir: /home/lars/tmp/python
collecting ... collected 1 item
test_example.py::test_unet3d SKIPPED (tensorflow is not installed) [100%]
============================== 1 skipped in 0.01s ==============================
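If you would rather have pytest skip collecting the whole file when the module is missing, pytest's documented collect_ignore hook in a conftest.py can do that. A short sketch (the file name test_unet3d.py is a placeholder for whichever file inherits from the optional classes):
# conftest.py
collect_ignore = []
try:
    import tensorflow  # noqa: F401
except ImportError:
    collect_ignore.append("test_unet3d.py")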
I am trying to load my saved model from s3 using joblib
import pandas as pd
import numpy as np
import json
import subprocess
import sqlalchemy
from sklearn.externals import joblib

ENV = 'dev'

def load_d2v(fname, env):
    model_name = fname
    if env == 'dev':
        try:
            model = joblib.load(model_name)
        except:
            s3_base_path = 's3://sd-flikku/datalake/doc2vec_model'
            path = s3_base_path + '/' + model_name
            command = "aws s3 cp {} {}".format(path, model_name).split()
            print('loading...' + model_name)
            subprocess.call(command)
            model = joblib.load(model_name)
    else:
        s3_base_path = 's3://sd-flikku/datalake/doc2vec_model'
        path = s3_base_path + '/' + model_name
        command = "aws s3 cp {} {}".format(path, model_name).split()
        print('loading...' + model_name)
        subprocess.call(command)
        model = joblib.load(model_name)
    return model

model_d2v = load_d2v('model_d2v_version_002', ENV)
But I get this error:
from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals' (C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\externals\__init__.py)
Then I tried installing joblib directly by doing
import joblib
but it gave me this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in load_d2v_from_s3
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
obj = unpickler.load()
File "/usr/lib64/python3.7/pickle.py", line 1088, in load
dispatch[key[0]](self)
File "/usr/lib64/python3.7/pickle.py", line 1376, in load_global
klass = self.find_class(module, name)
File "/usr/lib64/python3.7/pickle.py", line 1426, in find_class
__import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals.joblib'
Can you tell me how to solve this?
You should directly use
import joblib
instead of
from sklearn.externals import joblib
It looks like your existing pickle save file (model_d2v_version_002) encodes a reference to a module in a non-standard location – a joblib that's inside sklearn.externals rather than at top level.
The current scikit-learn documentation only talks about a top-level joblib – e.g. in the 3.4.1 Persistence example – but I do see a reference in someone else's old issue to a DeprecationWarning in scikit-learn version 0.21 about the older sklearn.externals.joblib variant going away:
Python37\lib\site-packages\sklearn\externals\joblib\__init__.py:15:
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and
will be removed in 0.23. Please import this functionality directly
from joblib, which can be installed with: pip install joblib. If this
warning is raised when loading pickled models, you may need to
re-serialize those models with scikit-learn 0.21+.
'Deprecation' means marking something as inadvisable to rely upon, as it is likely to be discontinued in a future release (often, but not always, with a recommended newer way to do the same thing).
I suspect your model_d2v_version_002 file was saved from an older version of scikit-learn, and you're now using scikit-learn (aka sklearn) version 0.23+, which has totally removed the sklearn.externals.joblib variation. Thus your file can't be directly or easily loaded into your current environment.
But, per the DeprecationWarning, you can probably temporarily use an older scikit-learn version to load the file the old way once, then re-save it in the now-preferred way. Given the warning info, this would probably require scikit-learn version 0.21.x or 0.22.x, but if you know exactly which version your model_d2v_version_002 file was saved from, I'd try to use that. The steps would roughly be:
create a temporary working environment (or roll back your current working environment) with the older sklearn
do imports something like:
import sklearn.externals.joblib as extjoblib
import joblib
extjoblib.load() your old file as you'd planned, but then immediately re-joblib.dump() the file using the top-level joblib. (You likely want to use a distinct name, to keep the older file around, just in case.) See the sketch after these steps.
move/update to your real, modern environment, and only import joblib (top level) to use joblib.load() - no longer having any references to 'sklearn.externals.joblib' in either your code or your stored pickle files.
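As a rough sketch of the load-and-re-dump step (run inside the temporary environment with the older scikit-learn; the file name is taken from the question, the .new suffix is just a suggestion):
# inside the temporary environment with scikit-learn 0.21.x/0.22.x
import sklearn.externals.joblib as extjoblib
import joblib

model = extjoblib.load('model_d2v_version_002')    # old-style load
joblib.dump(model, 'model_d2v_version_002.new')    # re-save via top-level joblib

# later, in the modern environment:
# import joblib
# model = joblib.load('model_d2v_version_002.new')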
You can import joblib directly by installing it as a dependency and using import joblib.
Documentation.
Maybe your code is outdated. For anyone who aims to use fetch_mldata in a digit-handwriting project: you should use fetch_openml instead. (link)
In old versions of sklearn:
from sklearn.externals import joblib
from sklearn.datasets import fetch_mldata

mnist = fetch_mldata('MNIST original')
In sklearn 0.23 (stable release):
import joblib
from sklearn import datasets
import numpy as np

dataset = datasets.fetch_openml("mnist_784")
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
For more info about the deprecation of fetch_mldata, see the scikit-learn docs.
None of the other answers worked for me; with a few small changes, this modification was OK for me:
import sklearn.externals as extjoblib
import joblib
For this error, I had to directly use the following, and it worked like a charm:
import joblib
Simple
In case the call to joblib happens within another .py program instead of your own (in which case, even if you have installed joblib, it still raises an error from within the calling Python program unless you change that code, which I thought would be messy), I tried to create a hardlink:
(Windows version)
Python> import joblib
then inside your sklearn path >......\Lib\site-packages\sklearn\externals
mklink /J ./joblib .....\Lib\site-packages\joblib
(you can run the above with a ! or % prefix, i.e. !mklink ... or %mklink ..., inside your Jupyter notebook, or via a Python OS command...)
This effectively creates a virtual folder for joblib within the "externals" folder.
Remarks:
Of course, to be more version resilient, your code should first check that the sklearn version is >= 0.23.
This is an alternative to changing the sklearn version.
When getting the error:
from sklearn.externals import joblib
it means this import path is deprecated/removed in newer versions.
For new versions, follow:
conda install -c anaconda scikit-learn (install using the Anaconda Prompt)
import joblib (in a Jupyter notebook)
I had the same problem.
What I did not realize was that joblib was already installed!
So what you have to do is replace
from sklearn.externals import joblib
with
import joblib
and that is it.
After a long investigation, given my computer setup, I found that it was because an SSL certificate was required to download the dataset.
I am currently developing a Python package that uses Cython and numpy, and I want the package to be installable using the pip install command from a clean Python installation. All dependencies should be installed automatically. I am using setuptools with the following setup.py:
import setuptools

my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

setuptools.setup(
    name="my_lib",
    version="0.0.1",
    author="Me",
    author_email="me@myself.com",
    description="Some python library",
    packages=["my_lib"],
    ext_modules=[my_c_lib_ext],
    setup_requires=["cython >= 0.29"],
    install_requires=["numpy >= 1.15"],
    classifiers=[
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent"
    ]
)
This has worked great so far. The pip install command downloads cython for the build and is able to build my package and install it together with numpy.
Now I want to improve the performance of my Cython code, which leads to some changes in my setup.py. I need to add include_dirs=[numpy.get_include()] to either the call of setuptools.Extension(...) or setuptools.setup(...), which means that I also need to import numpy. (See http://docs.cython.org/en/latest/src/tutorial/numpy.html and "Make distutils look for numpy header files in the correct place" for the rationale.)
This is bad: now the user cannot call pip install from a clean environment, because import numpy will fail. The user needs to pip install numpy before installing my library. Even if I move "numpy >= 1.15" from install_requires to setup_requires, the installation fails, because import numpy is evaluated earlier.
Is there a way to evaluate include_dirs at a later point of the installation, for example after the dependencies from setup_requires or install_requires have been resolved? I would really like to have all dependencies resolved automatically, and I don't want the user to type multiple pip install commands.
The following snippet works, but it is not officially supported because it uses an undocumented (and private) method:
class NumpyExtension(setuptools.Extension):
    # setuptools calls this function after installing dependencies
    def _convert_pyx_sources_to_lang(self):
        import numpy
        self.include_dirs.append(numpy.get_include())
        super()._convert_pyx_sources_to_lang()

my_c_lib_ext = NumpyExtension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)
The article "How to Bootstrap numpy installation in setup.py" proposes using cmdclass with a custom build_ext class. Unfortunately, this breaks the build of the Cython extension, because Cython also customizes build_ext.
First question: when is numpy needed? It is needed during the setup (i.e. when the build_ext functionality is invoked) and in the installed module, when it is used. That means numpy should be in both setup_requires and install_requires.
There are the following alternatives for solving the issue for the setup:
using PEP 517/518 (which is more straightforward, IMO)
using the setup_requires argument of setup and postponing the import of numpy until setup's requirements are satisfied (which is not the case at the start of setup.py's execution)
PEP 517/518 solution:
Put a pyproject.toml file next to setup.py, with the following content:
[build-system]
requires = ["setuptools", "wheel", "Cython>=0.29", "numpy >= 1.15"]
which defines the packages needed for building; then install using pip install . in the folder containing setup.py. A disadvantage of this method is that python setup.py install no longer works, as it is pip that reads pyproject.toml. However, I would use this approach whenever possible.
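Under this approach, setup.py itself may then import numpy unconditionally, because pip installs the requirements from pyproject.toml into an isolated build environment before running it. A minimal sketch (reusing the extension from the question; not the only possible layout):
import numpy
import setuptools
from Cython.Build import cythonize

# numpy and Cython are guaranteed to be present here,
# because pip installed them from pyproject.toml first
my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"],
    include_dirs=[numpy.get_include()]
)

setuptools.setup(
    name="my_lib",
    version="0.0.1",
    ext_modules=cythonize([my_c_lib_ext]),
    install_requires=["numpy >= 1.15"]
)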
Postponing the import
This approach is more complicated and somewhat hacky, but it also works without pip.
First, let's take a look at the unsuccessful attempts so far:
pybind11 trick
@chrisb's "pybind11" trick, which can be found here: with the help of an indirection, one delays the call to import numpy until numpy is present during the setup phase, i.e.:
class get_numpy_include(object):
    def __str__(self):
        import numpy
        return numpy.get_include()

...
my_c_lib_ext = setuptools.Extension(
    ...
    include_dirs=[get_numpy_include()]
)
Clever! The problem: it doesn't work with the Cython compiler: somewhere down the line, Cython passes the get_numpy_include object to os.path.join(..., ...), which checks whether the argument is really a string, which it obviously isn't.
This could be fixed by inheriting from str, but the above shows the dangers of the approach in the long run - it doesn't use the designed mechanics, is brittle, and may easily fail in the future.
the classical build_ext solution
Which looks as follows:
...
from setuptools.command.build_ext import build_ext as _build_ext

class build_ext(_build_ext):
    def finalize_options(self):
        _build_ext.finalize_options(self)
        # Prevent numpy from thinking it is still in its setup process:
        __builtins__.__NUMPY_SETUP__ = False
        import numpy
        self.include_dirs.append(numpy.get_include())

setuptools.setup(
    ...
    cmdclass={'build_ext': build_ext},
    ...
)
Yet this solution also doesn't work with Cython extensions, because pyx files don't get recognized.
The real question is: how did pyx files get recognized in the first place? The answer is this part of setuptools.command.build_ext:
...
try:
    # Attempt to use Cython for building extensions, if available
    from Cython.Distutils.build_ext import build_ext as _build_ext
    # Additionally, assert that the compiler module will load
    # also. Ref #1229.
    __import__('Cython.Compiler.Main')
except ImportError:
    _build_ext = _du_build_ext
...
That means setuptools tries to use Cython's build_ext if possible, and because the import of the module is delayed until build_ext is called, it finds Cython present.
The situation is different when setuptools.command.build_ext is imported at the beginning of setup.py - Cython isn't yet present and a fallback without Cython functionality is used.
mixing up the pybind11 trick and the classical solution
So let's add an indirection, so we don't have to import setuptools.command.build_ext directly at the beginning of setup.py:
...
# factory function
def my_build_ext(pars):
    # import delayed:
    from setuptools.command.build_ext import build_ext as _build_ext

    # include_dirs adjusted:
    class build_ext(_build_ext):
        def finalize_options(self):
            _build_ext.finalize_options(self)
            # Prevent numpy from thinking it is still in its setup process:
            __builtins__.__NUMPY_SETUP__ = False
            import numpy
            self.include_dirs.append(numpy.get_include())

    # object returned:
    return build_ext(pars)
...
setuptools.setup(
    ...
    cmdclass={'build_ext': my_build_ext},
    ...
)
One (hacky) suggestion would be using the fact that extension.include_dirs is first requested in build_ext, which is called after the setup dependencies are downloaded.
class MyExt(setuptools.Extension):
    def __init__(self, *args, **kwargs):
        self.__include_dirs = []
        super().__init__(*args, **kwargs)

    @property
    def include_dirs(self):
        import numpy
        return self.__include_dirs + [numpy.get_include()]

    @include_dirs.setter
    def include_dirs(self, dirs):
        self.__include_dirs = dirs

my_c_lib_ext = MyExt(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

setup(
    ...,
    setup_requires=['cython', 'numpy'],
)
Update
Another (less, but I guess still pretty hacky) solution would be overriding build instead of build_ext, since we know that build_ext is a subcommand of build and will always be invoked by build on installation. This way, we don't have to touch build_ext and leave it to Cython. This will also work when invoking build_ext directly (e.g., via python setup.py build_ext to rebuild the extensions inplace while developing) because build_ext ensures all options of build are initialized, and by coincidence, Command.set_undefined_options first ensures the command has finalized (I know, distutils is a mess).
Of course, now we're misusing build - it runs code that belongs to build_ext finalization. However, I'd still probably go with this solution rather than with the first one, ensuring the relevant piece of code is properly documented.
import setuptools
from distutils.command.build import build as build_orig

class build(build_orig):
    def finalize_options(self):
        super().finalize_options()
        # I stole this line from ead's answer:
        __builtins__.__NUMPY_SETUP__ = False
        import numpy
        # or just modify my_c_lib_ext directly here; ext_modules should contain a reference anyway
        extension = next(m for m in self.distribution.ext_modules if m == my_c_lib_ext)
        extension.include_dirs.append(numpy.get_include())

my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

setuptools.setup(
    ...,
    ext_modules=[my_c_lib_ext],
    cmdclass={'build': build},
    ...
)
I found a very easy solution in this post:
Or you can stick to https://github.com/pypa/pip/issues/5761. Here you install cython and numpy using setuptools.dist before the actual setup:
from setuptools import dist
dist.Distribution().fetch_build_eggs(['Cython>=0.15.1', 'numpy>=1.10'])
Works well for me!
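For context, here is a sketch of how that snippet might sit at the top of setup.py (version pins taken from the post above; the extension reuses the names from the question): fetch_build_eggs runs before the numpy/Cython imports below it, so they succeed even in a clean environment.
import setuptools
from setuptools import dist

# fetch build dependencies before they are imported below
dist.Distribution().fetch_build_eggs(['Cython>=0.15.1', 'numpy>=1.10'])

import numpy
from Cython.Build import cythonize

my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"],
    include_dirs=[numpy.get_include()]
)

setuptools.setup(
    name="my_lib",
    ext_modules=cythonize([my_c_lib_ext])
)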
The goal is to use the pytest unit test framework for a Python3 project that uses Cython. This is not a plug-and-play thing, because pytest by default is not able to import the Cython modules.
One unsuccessful solution would be to use the pytest-cython plugin, but it simply does not work for me:
> py.test --doctest-cython
usage: py.test [options] [file_or_dir] [file_or_dir] [...]
py.test: error: unrecognized arguments: --doctest-cython
inifile: None
rootdir: /censored/path/to/my/project/dir
To verify that I have the package installed:
> pip freeze | grep pytest-cython
pytest-cython==0.1.0
UPDATE:
I'm using PyCharm, and it seems that it is not using my pip-installed packages but rather a custom(?) PyCharm repository of packages for my project. Once I added pytest-cython to that repository, the command ran, but strangely enough it doesn't recognize the Cython module anyway, although the package/add-on is specifically designed for that purpose:
> pytest --doctest-cython
Traceback:
tests/test_prism.py:2: in <module>
from cpc_naive.prism import readSequence, processInput
cpc_naive/prism.py:5: in <module>
from calculateScore import calculateScore, filterSortAlphas, calculateAlphaMatrix_c#, incrementOverlapRanges  # cython code
E ImportError: No module named 'calculateScore'
Another unsuccessful solution I got here is to use pytest-runner, but this yields:
> python3 setup.py pytest
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: invalid command 'pytest'
UPDATE:
I had first forgotten to add setup_requires=['pytest-runner', ...] and tests_require=['pytest', ...] to the setup script. Once I did that, I got another error:
> python3 setup.py pytest
Traceback (most recent call last):
File "setup.py", line 42, in <module>
tests_require=['pytest']
(...)
AttributeError: type object 'test' has no attribute 'install_dists'
UPDATE 2 (setup.py):
from distutils.core import setup
from distutils.extension import Extension
from setuptools import find_packages
from Cython.Build import cythonize
import numpy

try:  # try to build the .c file
    from Cython.Distutils import build_ext
except ImportError:  # if the end-user doesn't have Cython that's OK; you should have shipped the .c files anyway.
    use_cython = False
else:
    use_cython = True

cmdclass = {}
ext_modules = []

if use_cython:
    ext_modules += [
        Extension("cpc_naive.calculateScore", ["cpc_naive/calculateScore.pyx"],
                  extra_compile_args=['-g'],  # -g for debugging
                  define_macros=[('CYTHON_TRACE', '1')]),
    ]
    cmdclass.update({'build_ext': build_ext})
else:
    ext_modules += [
        Extension("cpc_naive.calculateScore", ["cpc_naive/calculateScore.c"],
                  define_macros=[('CYTHON_TRACE', '1')]),  # compiled C files are stored in /home/pdiracdelta/.pyxbld/
    ]

setup(
    name='cpc_naive',
    author=censored,
    author_email=censored,
    license=censored,
    packages=find_packages(),
    cmdclass=cmdclass,
    ext_modules=ext_modules,
    install_requires=['Cython', 'numpy'],
    include_dirs=[numpy.get_include()],
    setup_requires=['pytest-runner'],
    tests_require=['pytest']
)
UPDATE 3 (partial fix):
As suggested by @hoefling, I downgraded pytest-runner to a version <4 (in fact 3.0.1), and this resolved the error in Update 1, but now I get the same exception as with the pytest-cython solution:
E ImportError: No module named 'calculateScore'
It just doesn't seem to recognize the module. Perhaps this is due to some absolute/relative import mojo I don't understand.
How can I use pytest with Cython? How can I discover why these methods aren't working and then fix it?
FINAL UPDATE:
After taking both the original problem and the question updates into consideration (thanks @hoefling for solving these issues!), this question now reduces to:
why can pytest not import the Cython module calculateScore, even though running the code with plain python (no pytest) works just fine?
As @hoefling suggested, one should use a pytest-runner version <4 to avoid the
AttributeError: type object 'test' has no attribute 'install_dists'
To then answer the actual and final question (in addition to the partial, off-topic, user-specific fixes added to the question post itself) of why pytest cannot import the Cython module calculateScore, even though running the code with plain python (no pytest) works just fine:
that remaining issue is solved here.
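Independent of the fix behind that link, one general sanity check is to make sure the extension modules have actually been built in place before running pytest, since pytest imports the package from the source tree:
python3 setup.py build_ext --inplace
python3 -m pytest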
Newbie to Cython here (perhaps this is a basic question). Consider two examples, both taken from this blog:
# code 1
import numpy as np

def num_update(u, dx2, dy2):
    u[1:-1,1:-1] = ((u[2:,1:-1] + u[:-2,1:-1])*dy2 +
                    (u[1:-1,2:] + u[1:-1,:-2])*dx2) / (2*(dx2+dy2))
and
# code 2
cimport numpy as np

def cy_update(np.ndarray[double, ndim=2] u, double dx2, double dy2):
    cdef unsigned int i, j
    for i in range(1, u.shape[0]-1):
        for j in range(1, u.shape[1]-1):
            u[i,j] = ((u[i+1, j] + u[i-1, j]) * dy2 +
                      (u[i, j+1] + u[i, j-1]) * dx2) / (2*(dx2+dy2))
Suppose I compile the first file with the following setup.py script:
# setup file for code 1
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext = Extension("laplace", ["laplace.pyx"])
setup(ext_modules=[ext], cmdclass={'build_ext': build_ext})
and the second file with the following setup.py script:
# setup file for code 2
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

ext = Extension("laplace", ["laplace.pyx"],
                include_dirs=[numpy.get_include()])
setup(ext_modules=[ext], cmdclass={'build_ext': build_ext})
In the first case, I used regular numpy and didn't import numpy in the setup file, while in the second case I imported numpy using cimport and declared variables using cdef, but then also included numpy in the setup file.
Cython compiles the first code anyway (and the first code seems to work).
What would be the advantages of using cimport and cdef before compiling with Cython (via the setup file) versus not using them?
import numpy in Cython works the same as in Python, but cimport numpy tells Cython to load the declaration file:
https://github.com/cython/cython/blob/master/Cython/Includes/numpy/__init__.pxd
which declares all the C-API functions, constants, and types, and also includes the header files, such as numpy/arrayobject.h.
If you declare a variable with np.ndarray[...], Cython will know how to convert array element access into C code, which is much faster than Python's [] operator.
You need to tell the C compiler where the numpy header files are in setup.py, so you call numpy.get_include() to get the path.
Cython can compile normal Python code, so your first example compiles.
Generally speaking, the more types you mark up for Cython, the better your chances of getting better performance. So it is your decision whether you want to trade flexibility for speed.
Run cython -a your_test.pyx to see an annotated version of how Cython would compile your code. Yellow means a line converts to a lot of C code (which roughly implies a performance penalty), while white means it converts to only a few lines of C.
If you hadn't spent the time asking here but had instead read through the tutorial on the official website, you could already have a better understanding.