I am trying to integrate an existing C DLL (unmanaged obviously), that implements fuzzy matching, into SQL Server as a user defined function (UDF). The UDF is implemented with a CLR VB project. I have used this C code for nearly 20 years to do string matching on text files without a hitch. It has been compiled on about every platform under the sun and has never crashed or given erroneous results. Ever. Until now.
The usage of this UDF in a SQL SELECT statement looks something like this:
SELECT Field FROM Table WHERE xudf_fuzzy('doppler effect', Field) = 1;
xudf_fuzzy(Param1, Param2) = 1 is where the magic happens. Param1 is the clue word we are trying to match while Param2 is the field from the table to be tested against. If the match is successful within a certain number of errors, the UDF returns 1, if not it returns 0. So far so good.
Here is the CLR code that defines the fuzzy UDF and calls the C DLL:
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports System.Text
Imports Microsoft.SqlServer.Server
Imports System.Runtime.InteropServices
Partial Public Class fuzzy
<DllImport("C:\Users\Administrator\Desktop\fuzzy64.dll", _
CallingConvention:=CallingConvention.StdCall)> _
Public Shared Function setClue(ByRef clue As String, ByVal misses As Integer) As Integer
End Function
<DllImport("C:\Users\Administrator\Desktop\fuzzy64.dll", _
CallingConvention:=CallingConvention.StdCall)> _
Public Shared Function srchString(ByRef text1 As String) As Integer
End Function
<Microsoft.SqlServer.Server.SqlFunction()> Public Shared Function _
xudf_fuzzy(ByVal strSearchClue As SqlString, ByVal strStringtoSearch As SqlString) As Long
Dim intMiss As Integer = 0
Dim intRet As Integer
Static Dim sClue As String = ""
xudf_fuzzy = 0
' we only need to set the clue whenever it changes '
If (sClue <> strSearchClue.ToString) Then
sClue = strSearchClue.ToString
intMiss = (Len(sClue) \ 4) + 1
intRet = setClue(sClue, intMiss)
End If
' return the fuzzy match result (0 or 1) '
xudf_fuzzy = srchString(strStringtoSearch.ToString)
End Function
Here is the front end of the C code being called. STRCT is where all of the global storage resides.
fuzzy.h
typedef struct {
short int INVRT, AND, LOWER, COMPL, Misses;
long int num_of_matched;
int D_length;
unsigned long int endposition, D_endpos;
unsigned long int Init1, NOERRM;
unsigned long int Mask[SYMMAX];
unsigned long int Init[MaxError];
unsigned long int Bit[WORDSIZE+1];
unsigned char prevpat[MaxDelimit];
unsigned char _buffer[Max_record+Max_record+256];
unsigned char _myPatt[MAXPAT];
} SRCH_STRUCT;
fuzzy.c
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <wtypes.h>
#include <time.h>
#include "fuzzy.h"
// call exports
__declspec(dllexport) int CALLBACK setClue(char**, int*);
__declspec(dllexport) int CALLBACK srchString(char**);
SRCH_STRUCT STRCT = { 0 };
int cluePrep(unsigned char []);
int srchMin(unsigned char [], int);
BOOL APIENTRY DllMain( HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved )
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}
int CALLBACK setClue(char **pattern, int *misses)
{
int i;
unsigned char *p;
// code to do initialization stuff, set flags etc. etc.
STRCT.Misses = (int)misses;
p = &(STRCT._myPatt[2]);
STRCT._myPatt[0] = '\n';
STRCT._myPatt[1] = SEPCHAR;
strcpy((char *)p, *pattern);
//blah blah
// end setup stuff
i = cluePrep(STRCT._myPatt);
return 0;
}
int CALLBACK srchString(char **textstr)
{
int res,i = Max_record;
unsigned char c;
char *textPtr = *textstr;
STRCT.matched = 0;
//clean out any non alphanumeric characters while we load the field to be tested
while ((c = *textPtr++)) if ( isalpha(c) || isdigit(c) || c == ' ' ) STRCT._buffer[i++] = c;
STRCT._buffer[i] = 0;
// do the search
res = srchMin(STRCT.pattern, STRCT.Misses);
if (res < 0) return res;
return STRCT.matched;
}
The runtime library it is linked against is: Multi-threaded DLL (/MD)
The calling convention is: __cdecl (/Gd)
This is where it gets weird. If I have a regular dot-net application (code not shown) that grabs the entire recordset from the test database and iterates through all of the records one at a time calling this DLL to invoke the fuzzy match, I get back the correct results every time.
If I use the CLR UDF application shown above against the test database using the SQL statement shown above while using only one thread (a single core VM) I get back correct results every time as well.
When this DLL is used in CLR UDF mode on a multi-core machine then some of the results are wrong. The results are always a little off, but not consistently.
292 records should match and do in the first two test cases.
In the CLR UDF multi-threaded case the results will come back with 273, 284, 298, 290 etc.
All of the storage in the C DLL is in character arrays. No memory allocs are being used. It is also my understanding that if SQL Server is using this CLR app in multi-threaded mode that the threads are all assigned their own data space.
Do I need to somehow "pin" the strings before I send them to the C DLL? I just can't figure out how to proceed.
Pinning is not your issue, your C code is not thread-safe. The C code uses a global variable STRCT. There is only one instance of that global variable that will be shared across all threads. Each thread will update STRCT variable with different values, which would cause incorrect result.
You will have to refactor the C code so that it does not rely on the shared state of a global variable.
You may be able to get it to work by declaring STRCT with __declspec(thread) which will use Thread Local Storage so each thread gets its own copy of the variable. However, that would be unique to each unmanaged thread and there is no guarantee that there is a one-to-one mapping between managed and un-managed threads.
The better solution would be to get rid of the shared state completely. You could do this by allocating a SRCH_STRUCT in setClue and return that pointer. Then each call to srchString would take that SRCH_STRUCT pointer as a parameter. The VB.Net code would only have to treat this struct as in IntPtr, it would not need to know anything about the definition of SRCH_STRUCT. Note you would also need to add a new function to the DLL to deallocate the allocated SRCH_STRUCT.
Related
I am trying to write a program that calls upon an [external library (?)] (I'm not sure that I'm using the right terminology here) that I am also writing to clean up a provided string. For example, if my main.c program were to be provided with a string such as:
asdfFAweWFwseFL Wefawf JAWEFfja FAWSEF
it would call upon a function in externalLibrary.c (lets call it externalLibrary_Clean for now) that would take in the string, and return all characters in upper case without spaces:
ASDFFAWEWFWSEFLWEFAWFJAWEFFJAFAWSEF
The crazy part is that I have this working... so long as my string doesn't exceed 26 characters in length. As soon as I add a 27th character, I end up with an error that says
malloc(): corrupted top size.
Here is externalLibrary.c:
#include "externalLibrary.h"
#include <ctype.h>
#include <malloc.h>
#include <assert.h>
#include <string.h>
char * restrict externalLibrary_Clean(const char* restrict input) {
// first we define the return value as a pointer and initialize
// an integer to count the length of the string
char * returnVal = malloc(sizeof(input));
char * initialReturnVal = returnVal; //point to the start location
// until we hit the end of the string, we use this while loop to
// iterate through it
while (*input != '\0') {
if (isalpha(*input)) { // if we encounter an alphabet character (a-z/A-Z)
// then we convert it to an uppercase value and point our return value at it
*returnVal = toupper(*input);
returnVal++; //we use this to move our return value to the next location in memory
}
input++; // we move to the next memory location on the provided character pointer
}
*returnVal = '\0'; //once we have exhausted the input character pointer, we terminate our return value
return initialReturnVal;
}
int * restrict externalLibrary_getFrequencies(char * ar, int length){
static int freq[26];
for (int i = 0; i < length; i++){
freq[(ar[i]-65)]++;
}
return freq;
}
the header file for it (externalLibrary.h):
#ifndef LEARNINGC_EXTERNALLIBRARY_H
#define LEARNINGC_EXTERNALLIBRARY_H
#ifdef __cplusplus
extern "C" {
#endif
char * restrict externalLibrary_Clean(const char* restrict input);
int * restrict externalLibrary_getFrequencies(char * ar, int length);
#ifdef __cplusplus
}
#endif
#endif //LEARNINGC_EXTERNALLIBRARY_H
my main.c file from where all the action is happening:
#include <stdio.h>
#include "externalLibrary.h"
int main() {
char * unfilteredString = "ASDFOIWEGOASDGLKASJGISUAAAA";//if this exceeds 26 characters, the program breaks
char * cleanString = externalLibrary_Clean(unfilteredString);
//int * charDist = externalLibrary_getFrequencies(cleanString, 25); //this works just fine... for now
printf("\nOutput: %s\n", unfilteredString);
printf("\nCleaned Output: %s\n", cleanString);
/*for(int i = 0; i < 26; i++){
if(charDist[i] == 0){
}
else {
printf("%c: %d \n", (i + 65), charDist[i]);
}
}*/
return 0;
}
I'm extremely well versed in Java programming and I'm trying to translate my knowledge over to C as I wish to learn how my computer works in more detail (and have finer control over things such as memory).
If I were solving this problem in Java, it would be as simple as creating two class files: one called main.java and one called externalLibrary.java, where I would have static String Clean(string input) and then call upon it in main.java with String cleanString = externalLibrary.Clean(unfilteredString).
Clearly this isn't how C works, but I want to learn how (and why my code is crashing with corrupted top size)
The bug is this line:
char * returnVal = malloc(sizeof(input));
The reason it is a bug is that it requests an allocation large enough space to store a pointer, meaning 8 bytes in a 64-bit program. What you want to do is to allocate enough space to store the modified string, which you can do with the following line:
char *returnVal = malloc(strlen(input) + 1);
So the other part of your question is why the program doesn't crash when your string is less than 26 characters. The reason is that malloc is allowed to give the caller slightly more than the caller requested.
In your case, the message "malloc(): corrupted top size" suggests that you are using libc malloc, which is the default on Linux. That variant of malloc, in a 64-bit process, would always give you at least 0x18 (24) bytes (minimum chunk size 0x20 - 8 bytes for the size/status). In the specific case that the allocation immediately precedes the "top" allocation, writing past the end of the allocation will clobber the "top" size.
If your string is larger than 23 (0x17) you will start to clobber the size/status of the subsequent allocation because you also need 1 byte to store the trailing NULL. However, any string 23 characters or shorter will not cause a problem.
As to why you didn't get an error with a string with 26 characters, to answer that one would have to see that exact program with the string of 26 characters that does not crash to give a more precise answer. For example, if the program provided a 26-character input that contained 3 blanks, this would would require only 26 + 1 - 3 = 24 bytes in the allocation, which would fit.
If you are not interested in that level of detail, fixing the malloc call to request the proper amount will fix your crash.
I have written this code but its not working but when I replace *targ and *sour by targ[] and sour[] then its working. Also it shows many error when I call the function converge like this converge(*targ, *sour). Please someone help me to understand this.
#include<stdio.h>
#include<string.h>
void converge(char *target, char *src);
int main()
{
char *targ = "xxxxxxxxxxxxxxxxxxx";
char *sour = "yyyyyyyyyyyyyyyyyyy";
converge(targ, sour);
//printf("%s", targ);
}
void converge(char *target, char *src)
{
int i, j;
for(i=0,j=strlen(src); i<=j; i++, j--)
{
target[i]= src[i];
target[j]= src[j];
printf("%s\n",target);
}
}
If you define a string like this:
char *targ = "abcd";
it is treated as read-only value, since the string "abcd" is stored in read-only memory, while the pointer targ is stored on your stack. Depending on your compiler you might get some warning unless you make this more explicit with const char *targ = "abcd";. An assignment like targ[i] = src[i]; is not allowed in this case.
If you define a string like this:
char targ[] = "abcd";
a char-array will be created on your data stack, this string can be changed, since data on your stack is readable and writable. Additionally you can access the first element of your array as pointer.
this is a simple linux kernel module code to reverse a string which should Oops after insmod,but it works well,why?
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
static char *words = "words";
static int __init words_init(void)
{
printk(KERN_INFO "debug info\n");
int len = strlen(words);
int k;
for ( k = 0; k < len/2; k++ )
{
printk("the value of k %d\n",k);
char a = words[k];
words[k]= words[len-1-k];
words[len-1-k]=a;
}
printk(KERN_INFO "words is %s\n", words);
return 0;
}
static void __exit words_exit(void)
{
printk(KERN_INFO "words exit.\n");
}
module_init(words_init);
module_exit(words_exit);
module_param(words, charp, S_IRUGO);
MODULE_LICENSE("GPL");
static != const.
static in your example means "only visible in the current file". See What does "static" mean?
const just means "the code which can see this declaration isn't supposed to change the variable". But in certain cases, you can still create a non-const pointer to the same address and modify the contents.
Space removal from a String - In Place C style with Pointers suggests that writing to the string value shouldn't be possible.
Since static is the only difference between your code and that of the question, I looked further and found this: Global initialized variables declared as "const" go to text segment, while those declared "Static" go to data segment. Why?
Try static const char *word, that should do the trick.
Haha!,I get the answer from the linux kernel source code by myself;When you use the insmod,it will call the init_moudle,load_module,strndup_usr then memdup_usr function;the memdup_usr function will use kmalloc_track_caller to alloc memery from slab and then use the copy_from_usr to copy the module paragram into kernel;this mean the linux kernel module paragram store in heap,not in the constant storage area!! So we can change it's content!
i am trying to use dgels function of lapacke:
when i use it with malloc fucntion. it doesnot give correct value.
can anybody tell me please what is the mistake when i use malloc and create a matrix?
thankyou
/* Calling DGELS using row-major order */
#include <stdio.h>
#include <lapacke.h>
#include <conio.h>
#include <malloc.h>
int main ()
{
double a[3][2] = {{1,0},{1,1},{1,2}};
double **outputArray;
int designs=3;
int i,j,d,i_mal;
lapack_int info,m,n,lda,ldb,nrhs;
double outputArray[3][1] = {{6},{0},{0}};*/
outputArray = (double**) malloc(3* sizeof(double*));
for(i_mal=0;i_mal<3;i_mal++)
{
outputArray[i_mal] = (double*) malloc(1* sizeof(double));
}
for (i=0;i<designs;i++)
{
printf("put first value");
scanf("%lf",&outputArray[i][0]);
}
m = 3;
n = 2;
nrhs = 1;
lda = 2;
ldb = 1;
info = LAPACKE_dgels(LAPACK_ROW_MAJOR,'N',m,n,nrhs,*a,lda,*outputArray,ldb);
for(i=0;i<m;i++)
{
for(j=0;j<nrhs;j++)
{
printf("%lf ",outputArray[i][j]);
}
printf("\n");
}
getch();
return (info);
}
The problem may come from outputArray not being contiguous in memory. You may use something like this instead :
outputArray = (double**) malloc(3* sizeof(double*));
outputArray[0]=(double*) malloc(3* sizeof(double));
for (i=0;i<designs;i++){
outputArray[i]=&outputArray[0][i];
}
Don't forget to free the memory !
free(outputArray[0]);
free(outputArray);
Edit : Contiguous means that you have to allocate the memory for all values at once. See http://www.fftw.org/doc/Dynamic-Arrays-in-C_002dThe-Wrong-Way.html#Dynamic-Arrays-in-C_002dThe-Wrong-Way : some packages, like fftw or lapack require this feature for optimization. As you were calling malloc three times, you created three parts and things went wrong.
If you have a single right hand side, there is no need for a 2D array (double**). outputArray[i] is a double*, that is, the start of the i-th row ( row major). The right line may be outputArray[i]=&outputArray[0][i*nrhs]; if you have many RHS.
By doing this in your code, you are building a 3 rows, one column, that is one RHS. The solution, is of size n=2. It should be outputArray[0][0] , outputArray[1][0]. I hope i am not too wrong, check this on simple cases !
Bye,
I'm working on a Word 2011 plugin in Mac OS. Currently, I need to write a code in VBA Macro to retrieve a String from another application (through Socket communication). So, basically in Windows, I can simply make a DLL which help me to do Socket communication with the other application and return the String value to VBA Macro.
However, in Mac, I'm able to build a .dylib (in C) and using VBA to communicate with the dylib. However, I'm having a trouble with the return String. My simple C code is something like:
char * tcpconnect(char* arguments)
{}
First, it always contains Chr(0) characters. Secondly, I suspected that this C function will not be able to handle Unicode String.
Do you guys have any experiences or have any similar example of this?
Thanks,
David
My original post was an attempt to imitate SysAllocStringByteLen() using malloc(), but this will fail when Excel tries to free the returned memory. Using Excel to allocate the memory fixes that issue, and is less code as well, e.g.:
in test.c:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define LPCSTR const char *
#define LPSTR char *
#define __declspec(dllexport)
#define WINAPI
char *saved_string = NULL;
int32_t saved_len = -1;
#define _CLEANUP if(saved_string) free(saved_string)
__attribute__((destructor))
static void finalizer(void) {
_CLEANUP;
}
int32_t __declspec(dllexport) WINAPI get_saved_string(LPSTR pszString, int cSize) {
int32_t old_saved_len = saved_len;
if(saved_len > 0 && cSize >= saved_len)
strncpy(pszString, saved_string, saved_len);
if(saved_string) {
free(saved_string);
saved_string = NULL;
saved_len = -1;
}
return old_saved_len;
}
int32_t __declspec(dllexport) WINAPI myfunc(LPCSTR *pszString) {
int len = (pszString && *pszString ? strlen(*pszString) : 0);
saved_string = malloc(len + 5);
saved_len = len + 5;
sprintf(saved_string, "%s%.*s", "abc:", len, *pszString);
return saved_len;
}
Compile the above with
gcc -g -arch i386 -shared -o test.dylib test.c
Then, in a new VBA module, use the below and run "test", which will prepend "abc:" to the string "hi there" and output the result the debug window:
Public Declare Function myfunc Lib "<colon-separated-path>:test.dylib" (s As String) As Long
Public Declare Function get_saved_string Lib "<colon-separated-path>:test.dylib" (ByVal s As String, ByVal csize As Long) As Long
Option Explicit
Public Function getDLLString(string_size As Long) As String
Dim s As String
If string_size > 0 Then
s = Space$(string_size + 1)
get_saved_string s, string_size + 1
End If
getDLLString = s
End Function
Public Sub test()
Debug.Print getDLLString(myfunc("hi there"))
End Sub