Undefined behavior when using Cython and external multithreaded C library

Undefined behavior when using Cython and external multithreaded C library - python-3.x

I am new to Cython (as well as Python), and I am trying to understand where I am doing wrong when I try to expose the C-API of an external multithreaded library to Python. To illustrate my problem, I will go over a hypothetical MWE.
Let's assume that I have the following directory structure
.
├── app.py
├── c_mylib.pxd
├── cxx
│ ├── CMakeLists.txt
│ ├── include
│ │ └── mylib.h
│ └── src
│ └── reduce_cb.cpp
├── mylib.pyx
└── setup.py
Here, cxx contains the external multithreaded library as follows (header and implementation files are concatenated):
/* cxx/include/mylib.h */
#ifndef MYLIB_H_
#define MYLIB_H_
#ifdef __cplusplus
extern "C" {
#endif
typedef double (*func_t)(const double *, const double *, void *);
double reduce_cb(const double *, const double *, func_t, void *);
#ifdef __cplusplus
}
#endif
#endif
/* cxx/src/reduce_cb.cpp */
#include <iterator>
#include <mutex>
#include <thread>
#include <vector>
#include "mylib.h"
extern "C" {
double reduce_cb(const double *xb, const double *xe, func_t func, void *data) {
const auto d = std::distance(xb, xe);
const auto W = std::thread::hardware_concurrency();
const auto split = d / W;
const auto remain = d % W;
std::vector<std::thread> workers(W);
double res{0};
std::mutex lock;
const double *xb_w{xb};
const double *xe_w;
for (unsigned int widx = 0; widx < W; widx++) {
xe_w = widx < remain ? xb_w + split + 1 : xb_w + split;
workers[widx] = std::thread(
[&lock, &res, func, data](const double *xb, const double *xe) {
const double partial = func(xb, xe, data);
std::lock_guard<std::mutex> guard(lock);
res += partial;
},
xb_w, xe_w);
xb_w = xe_w;
}
for (auto &worker : workers)
worker.join();
return res;
}
}
with the accompanying cxx/CMakeLists.txt file as follows:
cmake_minimum_required(VERSION 3.9)
project(dummy LANGUAGES CXX)
add_library(mylib
include/mylib.h
src/reduce_cb.cpp
)
target_compile_features(mylib
PRIVATE
cxx_std_11
)
target_include_directories(mylib
PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
$<INSTALL_INTERFACE:include>
)
set_target_properties(mylib
PROPERTIES PUBLIC_HEADER include/mylib.h
)
install(TARGETS mylib
ARCHIVE DESTINATION lib
LIBRARY DESTINATION lib
PUBLIC_HEADER DESTINATION include
)
The corresponding Cython files are as follows (this time definition and implementation files are concatenated):
# c_mylib.pxd
cdef extern from "include/mylib.h":
ctypedef double (*func_t)(const double *, const double *, void *)
double reduce_cb(const double *, const double *, func_t, void *)
# mylib.pyx
# cython: language_level = 3
cimport c_mylib
cdef double func(const double *xb, const double *xe, void *data):
cdef int d = (xe - xb)
func = <object>data
return func(<double[:d]>xb)
def reduce_cb(double [::1] arr not None, f):
cdef int d = arr.shape[0]
data = <void*>f
return c_mylib.reduce_cb(&arr[0], &arr[0] + d, func, data)
# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
setup(
ext_modules=cythonize([
Extension("mylib", ["mylib.pyx"], libraries=["mylib"])
])
)
Building the C++ library, and building the Cython extension module and linking it against the C++ library following the instructions, I get undefined behavior when I try to run
import mylib
from numpy import array
def cb(x):
partial = 0
for idx in range(x.shape[0]):
partial += x[idx]
return partial
arr = array([val + 1 for val in range(100)], "d")
print("sum(arr): ", mylib.reduce_cb(arr, cb))
By undefined behavior, I mean that I get either of
SIGSEGV (Address boundary error),
"Fatal Python error: GC object already tracked" with SIGABRT, or,
(rarely) the correct result.
I have checked the documentation of Cython thoroughly (I guess), and I have searched both SO and Google for this problem, but I could not find a proper solution to this problem.
Basically, I would like to have a C library, which is unaware of Python and which uses callback functions from multiple threads, that is integrated inside Python. Is this at all possible? I tried nogil signatures and with gil: blocks as discussed in Cython's documentation, but I got compilation errors. Moreover, gc related functionality in Cython seems to be valid only for extension types, which does not apply to my case.
I am stuck and I would appreciate any hint/help.

That happens when you use Python-objects/functionality without lock. Your critical section is not only the summation but also the call to function func, i.e.:
workers[widx] = std::thread(
[&lock, &res, func, data](const double *xb, const double *xe) {
std::lock_guard<std::mutex> guard(lock);
const double partial = func(xb, xe, data); // must be guarded
res += partial;
},
xb_w, xe_w);
which makes the parallelization senseless in the first place, doesn't it? Probably, from software-engineering point of view, a better place for guard would be in the wrapper function func - but I have put it into worker because the consequences are seen much better this way.
Python uses reference counting for memory management - similar to std::shared_ptr. However it doesn't lock with fine granularity as shared_ptr, which locks only when changing the reference counter, but uses a more coarse lock - the global interpreter lock. That has the consequence, that when one changes the reference count of an python-object from open-MP-thread or other threads not registered in Python-interpreter the reference-counter is not protected/guarded and race conditions arise. What you are observing, are the possible results of such race conditions.
The GIL makes your endeavor more or less impossible: you need to lock every call to possible python but than you serialize the calls to this functionality!

Related

C extensions - how to redirect printf to a python logger?

I have a simple C-extension(see example below) that sometimes prints using the printf function.
I'm looking for a way to wrap the calls to the function from that C-extensions so that all those printfs will be redirected to my python logger.
hello.c:
#include <Python.h>
static PyObject* hello(PyObject* self)
{
printf("example print from a C code\n");
return Py_BuildValue("");
}
static char helloworld_docs[] =
"helloworld(): Any message you want to put here!!\n";
static PyMethodDef helloworld_funcs[] = {
{"hello", (PyCFunction)hello,
METH_NOARGS, helloworld_docs},
{NULL}
};
static struct PyModuleDef cModPyDem =
{
PyModuleDef_HEAD_INIT,
"helloworld",
"Extension module example!",
-1,
helloworld_funcs
};
PyMODINIT_FUNC PyInit_helloworld(void)
{
return PyModule_Create(&cModPyDem);
};
setup.py:
from distutils.core import setup, Extension
setup(name = 'helloworld', version = '1.0', \
ext_modules = [Extension('helloworld', ['hello.c'])])
to use first run
python3 setup.py install
and then:
import helloworld
helloworld.hello()
I want to be able to do something like this:
with redirect_to_logger(my_logger)
helloworld.hello()
EDIT: I saw a number of posts showing how to silence the prints from C, but I wasn't able to figure out from it how can I capture the prints in python instead.
Example of such post: Redirect stdout from python for C calls
I assume that this question didn't get much traction because I maybe ask too much, so I don't care about logging anymore... how can I capture the C prints in python? to a list or whatever.
EDIT
So I was able to achieve somewhat a working code that does what I want - redirect c printf to python logger:
import select
import threading
import time
import logging
import re
from contextlib import contextmanager
from wurlitzer import pipes
from helloworld import hello
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)
class CPrintsHandler(threading.Thread):
def __init__(self, std, poll_std, err, poll_err, logger):
super(CPrintsHandler, self).__init__()
self.std = std
self.poll_std = poll_std
self.err = err
self.poll_err = poll_err
self.logger = logger
self.stop_event = threading.Event()
def stop(self):
self.stop_event.set()
def run(self):
while not self.stop_event.is_set():
# How can I poll both std and err at the same time?
if self.poll_std.poll(1):
line = self.std.readline()
if line:
self.logger.debug(line.strip())
if self.poll_err.poll(1):
line = self.err.readline()
if line:
self.logger.debug(line.strip())
#contextmanager
def redirect_to_logger(some_logger):
handler = None
try:
with pipes() as (std, err):
poll_std = select.poll()
poll_std.register(std, select.POLLIN)
poll_err = select.poll()
poll_err.register(err, select.POLLIN)
handler = CPrintsHandler(std, poll_std, err, poll_err, some_logger)
handler.start()
yield
finally:
if handler:
time.sleep(0.1) # why do I have to sleep here for the foo prints to finish?
handler.stop()
handler.join()
def foo():
logger.debug('logger print from foo()')
hello()
def main():
with redirect_to_logger(logger):
# I don't want the logs from here to be redirected as well, only printf.
logger.debug('logger print from main()')
foo()
main()
But I have a couple of issues:
The python logs are also being redirected and caught by the CPrintsHandler. Is there a way to avoid that?
The prints are not exactly in the correct order:
python3 redirect_c_example_for_stackoverflow.py
2020-08-18 19:50:47,732 - root - DEBUG - example print from a C code
2020-08-18 19:50:47,733 - root - DEBUG - 2020-08-18 19:50:47,731 - root - DEBUG - logger print from main()
2020-08-18 19:50:47,733 - root - DEBUG - 2020-08-18 19:50:47,731 - root - DEBUG - logger print from foo()
Also, the logger prints all go to err, perhaps the way I poll them causes this order.
I'm not that familiar with select in python and not sure if there is a way to poll both std and err at the same time and print whichever has something first.

On Linux you could use wurlitzer which would capture the output from fprint, e.g.:
from wurlitzer import pipes
with pipes() as (out, err):
helloworld.hello()
out.read()
#'example print from a C code\n'
wurlitzer is based on this article of Eli Bendersky, the code from which you can use if you don't like to depend on third-party libraries.
Sadly, wurlitzer and the code from the article work only for Linux (and possible MacOS).
Here is a prototype (an improved version of the prototype can be installed from my github) for Windows using Eli's approach as Cython-extension (which probably could be translated to ctypes if needed):
%%cython
import io
import os
cdef extern from *:
"""
#include <windows.h>
#include <io.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
int open_temp_file() {
TCHAR lpTempPathBuffer[MAX_PATH+1];//path+NULL
// Gets the temp path env string (no guarantee it's a valid path).
DWORD dwRetVal = GetTempPath(MAX_PATH, // length of the buffer
lpTempPathBuffer); // buffer for path
if(dwRetVal > MAX_PATH || (dwRetVal == 0))
{
return -1;
}
// Generates a temporary file name.
TCHAR szTempFileName[MAX_PATH + 1];//path+NULL
DWORD uRetVal = GetTempFileName(lpTempPathBuffer, // directory for tmp files
TEXT("tmp"), // temp file name prefix
0, // create unique name
szTempFileName); // buffer for name
if (uRetVal == 0)
{
return -1;
}
HANDLE tFile = CreateFile((LPTSTR)szTempFileName, // file name
GENERIC_READ | GENERIC_WRITE, // first we write than we read
0, // do not share
NULL, // default security
CREATE_ALWAYS, // overwrite existing
FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE, // "temporary" temporary file, see https://learn.microsoft.com/en-us/archive/blogs/larryosterman/its-only-temporary
NULL); // no template
if (tFile == INVALID_HANDLE_VALUE) {
return -1;
}
return _open_osfhandle((intptr_t)tFile, _O_APPEND | _O_TEXT);
}
int replace_stdout(int temp_fileno)
{
fflush(stdout);
int old;
int cstdout = _fileno(stdout);
old = _dup(cstdout); // "old" now refers to "stdout"
if (old == -1)
{
return -1;
}
if (-1 == _dup2(temp_fileno, cstdout))
{
return -1;
}
return old;
}
int restore_stdout(int old_stdout){
fflush(stdout);
// Restore original stdout
int cstdout = _fileno(stdout);
return _dup2(old_stdout, cstdout);
}
void rewind_fd(int fd) {
_lseek(fd, 0L, SEEK_SET);
}
"""
int open_temp_file()
int replace_stdout(int temp_fileno)
int restore_stdout(int old_stdout)
void rewind_fd(int fd)
void close_fd "_close" (int fd)
cdef class CStdOutCapture():
cdef int tmpfile_fd
cdef int old_stdout_fd
def start(self): #start capturing
self.tmpfile_fd = open_temp_file()
self.old_stdout_fd = replace_stdout(self.tmpfile_fd)
def stop(self): # stops capturing, frees resources and returns the content
restore_stdout(self.old_stdout_fd)
rewind_fd(self.tmpfile_fd) # need to read from the beginning
buffer = io.TextIOWrapper(os.fdopen(self.tmpfile_fd, 'rb'))
result = buffer.read()
close_fd(self.tmpfile_fd)
return result
And now:
b = CStdOutCapture()
b.start()
helloworld.hello()
out = b.stop()
print("HERE WE GO:", out)
# HERE WE GO: example print from a C code

This is what I would do if I am free to edit the C code. Open a memory map in C and write to its file descriptor using fprintf(). Expose the file descriptor to Python either as the int and then use mmap module to open it or use os.openfd() to wrap it in a simpler file-like object, or wrap it in file-like object in C and let Python use that.
Then I would create a class that will enable me to write to sys.stdout through usual interface, i.e. its write() method (for Python's side usage) , and that would use select module to poll the file from C that acts as its stdout in a thread. Then I would switch sys.stdout with an object of this class. So, when Python does sys.stdout.write(...) the string will be redirected to sys.stdout.write(), and when the loop in a thread detects output on a file from C, it will write it using sys.stdout.write(). So, everything will be written to the screen and be available to loggers as well.
In this model, the strictly C part will never actually be writing to the file descriptor connected to the terminal.
You can even do much of this in C itself and leave little for the Python's side, but its easier to influence the interpreter from the Python's side as the extension is the shared library which involves some kind of, lets call it, IPC and OS in the whole story. That's why the stdout is not shared between extension and Python in the first place.
If you want to continue printf() on C side, you can see how you can redirect it in C before programming this whole mess.
This answer is strictly theoretical because I have no time to test it; but it should be doable according to my knowledge. If you try it, please let me know in a comment how it went. Perhaps I missed something, but, I am certain the theory is sane.
Beauty of this idea is that it will be OS independent, although the part with shared memory or connecting a file descriptor to allocated space in RAM can be sometimes PITA on Windows.

If you are not constrained to using the printf in C, it would be easier to use the print equivalent from python C API and pass where you want to redirect the message as an argument.
For example, your hello.c would be:
#include <Python.h>
static PyObject* hello(PyObject* self, PyObject *args)
{
PyObject *file = NULL;
if (!PyArg_ParseTuple(args, "O", &file))
return NULL;
PyObject *pystr = PyUnicode_FromString("example print from a C code\n");
PyFile_WriteObject(pystr, file, Py_PRINT_RAW);
return Py_BuildValue("");
}
static char helloworld_docs[] =
"helloworld(): Any message you want to put here!!\n";
static PyMethodDef helloworld_funcs[] = {
{"hello", (PyCFunction)hello,
METH_VARARGS, helloworld_docs},
{NULL}
};
static struct PyModuleDef cModPyDem =
{
PyModuleDef_HEAD_INIT,
"helloworld",
"Extension module example!",
-1,
helloworld_funcs
};
PyMODINIT_FUNC PyInit_helloworld(void)
{
return PyModule_Create(&cModPyDem);
};
We can check if it is working with the program below:
import sys
import helloworld
helloworld.hello(sys.stdout)
helloworld.hello(sys.stdout)
helloworld.hello(sys.stderr)
In the command line we redirect each output separately:
python3 example.py 1> out.txt 2> err.txt
out.txt will have two print calls, while err.txt will have only one, as expected from our python script.
You can check python's print implementation to get some more ideas of what you can do.
cpython print source code

Cython: external struct definition throws compiler error

I am trying to use Collections-C in Cython.
I noticed that some structures are defined in the .c file, and an alias for them is in the .h file. When I try to define those structures in a .pxd file and use them in a .pyx file, gcc throws an error: storage size of ‘[...]’ isn’t known.
I was able to reproduce my issue to a minimum setup that replicates the external library and my application:
testdef.c
/* Note: I can't change this */
struct bogus_s {
int x;
int y;
};
testdef.h
/* Note: I can't change this */
typedef struct bogus_s Bogus;
cytestdef.pxd
# This is my code
cdef extern from 'testdef.h':
struct bogus_s:
int x
int y
ctypedef bogus_s Bogus
cytestdef.pyx
# This is my code
def fn():
cdef Bogus n
n.x = 12
n.y = 23
print(n.x)
If I run cythonize, I get
In function ‘__pyx_pf_7sandbox_9cytestdef_fn’:
cytestdef.c:1106:9: error: storage size of ‘__pyx_v_n’ isn’t known
Bogus __pyx_v_n;
^~~~~~~~~
I also get the same error if I use ctypedef Bogus: [...] notation as indicated in the Cython manual.
What am I doing wrong?
Thanks.

Looking at the documentation for your Collections-C library these are opaque structures that you're supposed to use purely through pointers (don't need to know the size to have a pointer, while you do to allocate on the stack). Allocation of these structures is done in library functions.
To change your example to match this case:
// C file
int bogus_s_new(struct bogus_s** v) {
*v = malloc(sizeof(struct bogus_s));
return (v!=NULL);
}
void free_bogus_s(struct bogus_s* v) {
free(v);
}
Your H file would contain the declarations for those and your pxd file would contain wrappers for the declarations. Then in Cython:
def fn():
cdef Bogus* n
if not bogus_s_new(&n):
return
try:
# you CANNOT access x and y since the type is
# designed to be opaque. Instead you should use
# the acessor functions defined in the header
# n.x = 12
# n.y = 23
finally:
free_bogus_s(n)

DLL Floating point results differ according to caller

This is a follow up question to my earlier one asked yesterday
The problems were occurring in a MSVS 2008 C++ DLL that has over 4000 lines of code, but I have managed to produce a simple case that demonstrates the problem as it occurs on my CPU (an AMD Phenom II X6 1050T).
Will it show the problem occurring on another system? I'd really like to know!
Here is a simple class (Point.cpp), it needs to be compiled as a DLL:
#include <math.h>
#define EXPORT extern "C" __declspec(dllexport)
namespace Test {
struct Point {
double x;
double y;
/* Constructor for a Point object */
Point(double xx, double yy) : x(xx), y(yy) {}
/* Copy constructor */
Point(const Point &rhs) : x(rhs.x), y(rhs.y) {}
double mag() const;
Point norm() const;
};
double Point::mag() const {return sqrt(x*x + y*y);}
Point Point::norm() const {
double m = mag();
return Point(x/m, y/m);
}
EXPORT void __stdcall GetNorm(double x, double y, double *nx, double *ny)
Point P = Point(x, y);
Point N = P.norm();
*nx = N.x;
*ny = N.y;
}
}
Here is the test program (TestPoint.c), which needs to be linked to the lib created for the DLL:
#include <stdio.h>
#define IMPORT extern __declspec(dllimport)
IMPORT void __stdcall GetNorm(double x, double y, double *nx, double *ny);
void dhex(double x) { // double to hex
union {
unsigned long n[2];
double d;
} value;
value.d = x;
printf("(0x%0x%0x)\n", value.n[1], value.n[0]);
}
double i64tod(unsigned long long n) { // hex to double
double *DP = (double *) &n;
return *DP;
}
int main(int argc, char **argv) {
double vx, vy;
double ux, uy;
vx = i64tod(0xbfc7a30f3a53d351);
vy = i64tod(0xc01b578b34e3ce1d);
GetNorm(vx, vy, &ux, &uy);
printf(" vx = %20.18f ", vx); dhex(vx);
printf(" vy = %20.18f ", vy); dhex(vy);
printf("\n");
printf(" ux = %20.18f ", ux); dhex(ux);
printf(" uy = %20.18f ", uy); dhex(uy);
return 0;
}
On my system, with TestPoint compiled with VC++, the output is:
vx = -0.18466368053455054 (0xbfc7a30f3a53d351)
vy = -6.8354919685403077 (0xc01b578b34e3ce1d)
ux = -0.027005566159023012 (0xbf9ba758ddda1454,
uy = -0.99963528318903927 (0xbfeffd032227301b)
However, if the same code is compiled with gcc, or indeed, it seems, ANY equivalent program (eg VB6, PowerBasic), the results (ux and uy) are subtly but definitely different (the last hex digit):
vx = -0.184663680534550540 (0xbfc7a30f3a53d351)
vy = -6.835491968540307700 (0xc01b578b34e3ce1d)
ux = -0.027005566159023008 (0xbf9ba758ddda1453)
uy = -0.999635283189039160 (0xbfeffd032227301a)
This might seem an insignificant difference, but when it occurs in a physics engine, these differences accumulate in an alarming fashion. .
If the engine is going to get different results depending on who calls it I might have to abandon the use of VC++ altogether and try g++ instead.

Ok, I think I know how this happens. Looking at a disassembler listing of Point.dll, I noticed that the GetNorm function was pretty much what you'd expect, a couple of FMUL's and FDIV's. What was not present was an FLDCW instruction.
There weren't any FLDCW's in the MSVC calling program either, but I found FLDCW's in both the gcc and a PowerBasic versions of the calling program.
So I tweaked one of the executables (the PowerBasic EXE was the easiest to find the right place to tweak), and hey presto, I then got answers that matched MSVC. Presumably the FLDCW had changed the FPU rounding mode, hence the difference in the least significant bits.

Calling unmanaged C DLL from VB CLR UDF in SQL Server

I am trying to integrate an existing C DLL (unmanaged obviously), that implements fuzzy matching, into SQL Server as a user defined function (UDF). The UDF is implemented with a CLR VB project. I have used this C code for nearly 20 years to do string matching on text files without a hitch. It has been compiled on about every platform under the sun and has never crashed or given erroneous results. Ever. Until now.
The usage of this UDF in a SQL SELECT statement looks something like this:
SELECT Field FROM Table WHERE xudf_fuzzy('doppler effect', Field) = 1;
xudf_fuzzy(Param1, Param2) = 1 is where the magic happens. Param1 is the clue word we are trying to match while Param2 is the field from the table to be tested against. If the match is successful within a certain number of errors, the UDF returns 1, if not it returns 0. So far so good.
Here is the CLR code that defines the fuzzy UDF and calls the C DLL:
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports System.Text
Imports Microsoft.SqlServer.Server
Imports System.Runtime.InteropServices
Partial Public Class fuzzy
<DllImport("C:\Users\Administrator\Desktop\fuzzy64.dll", _
CallingConvention:=CallingConvention.StdCall)> _
Public Shared Function setClue(ByRef clue As String, ByVal misses As Integer) As Integer
End Function
<DllImport("C:\Users\Administrator\Desktop\fuzzy64.dll", _
CallingConvention:=CallingConvention.StdCall)> _
Public Shared Function srchString(ByRef text1 As String) As Integer
End Function
<Microsoft.SqlServer.Server.SqlFunction()> Public Shared Function _
xudf_fuzzy(ByVal strSearchClue As SqlString, ByVal strStringtoSearch As SqlString) As Long
Dim intMiss As Integer = 0
Dim intRet As Integer
Static Dim sClue As String = ""
xudf_fuzzy = 0
' we only need to set the clue whenever it changes '
If (sClue <> strSearchClue.ToString) Then
sClue = strSearchClue.ToString
intMiss = (Len(sClue) \ 4) + 1
intRet = setClue(sClue, intMiss)
End If
' return the fuzzy match result (0 or 1) '
xudf_fuzzy = srchString(strStringtoSearch.ToString)
End Function
Here is the front end of the C code being called. STRCT is where all of the global storage resides.
fuzzy.h
typedef struct {
short int INVRT, AND, LOWER, COMPL, Misses;
long int num_of_matched;
int D_length;
unsigned long int endposition, D_endpos;
unsigned long int Init1, NOERRM;
unsigned long int Mask[SYMMAX];
unsigned long int Init[MaxError];
unsigned long int Bit[WORDSIZE+1];
unsigned char prevpat[MaxDelimit];
unsigned char _buffer[Max_record+Max_record+256];
unsigned char _myPatt[MAXPAT];
} SRCH_STRUCT;
fuzzy.c
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <wtypes.h>
#include <time.h>
#include "fuzzy.h"
// call exports
__declspec(dllexport) int CALLBACK setClue(char**, int*);
__declspec(dllexport) int CALLBACK srchString(char**);
SRCH_STRUCT STRCT = { 0 };
int cluePrep(unsigned char []);
int srchMin(unsigned char [], int);
BOOL APIENTRY DllMain( HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved )
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}
int CALLBACK setClue(char **pattern, int *misses)
{
int i;
unsigned char *p;
// code to do initialization stuff, set flags etc. etc.
STRCT.Misses = (int)misses;
p = &(STRCT._myPatt[2]);
STRCT._myPatt[0] = '\n';
STRCT._myPatt[1] = SEPCHAR;
strcpy((char *)p, *pattern);
//blah blah
// end setup stuff
i = cluePrep(STRCT._myPatt);
return 0;
}
int CALLBACK srchString(char **textstr)
{
int res,i = Max_record;
unsigned char c;
char *textPtr = *textstr;
STRCT.matched = 0;
//clean out any non alphanumeric characters while we load the field to be tested
while ((c = *textPtr++)) if ( isalpha(c) || isdigit(c) || c == ' ' ) STRCT._buffer[i++] = c;
STRCT._buffer[i] = 0;
// do the search
res = srchMin(STRCT.pattern, STRCT.Misses);
if (res < 0) return res;
return STRCT.matched;
}
The runtime library it is linked against is: Multi-threaded DLL (/MD)
The calling convention is: __cdecl (/Gd)
This is where it gets weird. If I have a regular dot-net application (code not shown) that grabs the entire recordset from the test database and iterates through all of the records one at a time calling this DLL to invoke the fuzzy match, I get back the correct results every time.
If I use the CLR UDF application shown above against the test database using the SQL statement shown above while using only one thread (a single core VM) I get back correct results every time as well.
When this DLL is used in CLR UDF mode on a multi-core machine then some of the results are wrong. The results are always a little off, but not consistently.
292 records should match and do in the first two test cases.
In the CLR UDF multi-threaded case the results will come back with 273, 284, 298, 290 etc.
All of the storage in the C DLL is in character arrays. No memory allocs are being used. It is also my understanding that if SQL Server is using this CLR app in multi-threaded mode that the threads are all assigned their own data space.
Do I need to somehow "pin" the strings before I send them to the C DLL? I just can't figure out how to proceed.

Pinning is not your issue, your C code is not thread-safe. The C code uses a global variable STRCT. There is only one instance of that global variable that will be shared across all threads. Each thread will update STRCT variable with different values, which would cause incorrect result.
You will have to refactor the C code so that it does not rely on the shared state of a global variable.
You may be able to get it to work by declaring STRCT with __declspec(thread) which will use Thread Local Storage so each thread gets its own copy of the variable. However, that would be unique to each unmanaged thread and there is no guarantee that there is a one-to-one mapping between managed and un-managed threads.
The better solution would be to get rid of the shared state completely. You could do this by allocating a SRCH_STRUCT in setClue and return that pointer. Then each call to srchString would take that SRCH_STRUCT pointer as a parameter. The VB.Net code would only have to treat this struct as in IntPtr, it would not need to know anything about the definition of SRCH_STRUCT. Note you would also need to add a new function to the DLL to deallocate the allocated SRCH_STRUCT.

Generating a comprehensive callgraph using GCC & Egypt

I am trying to generate a comprehensive callgraph (complete with low level calls to Linux, runtime, the lot).
I have statically compiled my source files with "-fdump-rtl-expand" and created RTL files, which I passed to a PERL script called Egypt (which I believe is Graphviz/Dot) and generated a PDF file of the callgraph. This works perfectly, no problems at all.
Except, there are calls being made into some libraries that are getting shown as built-in. I was looking to see if there is a way for the callgraph not to be printed as and instead the real calls made into the libraries ?
Please let me know if the question is unclear.
http://i.imgur.com/sp58v.jpg
Basically, I am trying to avoid the callgraph from generating < built-in >
Is there a way to do that ?
-------- CODE ---------
#include <cilk/cilk.h>
#include <stdio.h>
#include <stdlib.h>
unsigned long int t0, t5;
unsigned int NOSPAWN_THRESHOLD = 32;
int fib_nospawn(int n)
{
if (n < 2)
return n;
else
{
int x = fib_nospawn(n-1);
int y = fib_nospawn(n-2);
return x + y;
}
}
// spawning fibonacci function
int fib(long int n)
{
long int x, y;
if (n < 2)
return n;
else if (n <= NOSPAWN_THRESHOLD)
{
x = fib_nospawn(n-1);
y = fib_nospawn(n-2);
return x + y;
}
else
{
x = cilk_spawn fib(n-1);
y = cilk_spawn fib(n-2);
cilk_sync;
return x + y;
}
}
int main(int argc, char *argv[])
{
int n;
long int result;
long int exec_time;
n = atoi(argv[1]);
NOSPAWN_THRESHOLD = atoi(argv[2]);
result = fib(n);
printf("%ld\n", result);
return 0;
}
I compiled the Cilk Library from source.

I might have found the partial solution to the problem:
You need to pass the following option to egypt
--include-external
This produced a slightly more comprehensive callgraph, although there still is the " visible
http://i.imgur.com/GWPJO.jpg?1
Can anyone suggest if I get more depth in the callgraph ?

You can use the GCC VCG Plugin: A gcc plugin, which can be loaded when debugging gcc, to show internal structures graphically.
gcc -fplugin=/path/to/vcg_plugin.so -fplugin-arg-vcg_plugin-cgraph foo.c
Call-graph is place to store data needed
for inter-procedural optimization. All datastructures
are divided into three components:
local_info that is produced while analyzing
the function, global_info that is result
of global walking of the call-graph on the end
of compilation and rtl_info used by RTL
back-end to propagate data from already compiled
functions to their callers.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Undefined behavior when using Cython and external multithreaded C library - python-3.x

Related

C extensions - how to redirect printf to a python logger?

Cython: external struct definition throws compiler error

DLL Floating point results differ according to caller

Calling unmanaged C DLL from VB CLR UDF in SQL Server

Generating a comprehensive callgraph using GCC & Egypt

Categories

Resources