Convert integer to string with C++ compatible function for Matlab Coder - string

I'm using Matlab Coder to convert some Matlab code to C++, but I'm having trouble converting integers to strings.
int2str() is not supported for code generation, so I must find some other way to convert ints to strings. I've tried googling it, without success. Is this even possible?

This can be done manually (very painful, though)
function s = thePainfulInt2Str( n )
s = '';
is_neg = n < 0; %// save sign
n = abs(n); %// work with positive
if n == 0
s = '0'; %// zero has no digits to peel off, so handle it explicitly
end
while n > 0
c = mod( n, 10 ); %// get current digit
s = [char(c+'0'),s]; %// prepend the digit character
n = ( n - c ) / 10; %// "chop" it off and continue
end
if is_neg
s = ['-',s]; %// add the sign
end
end

sprintf is another very basic function, so it possibly works in C++ as well:
x = int64(1948)
str = sprintf('%i',x)
It is also the underlying function used by int2str.
According to this comprehensive list of supported functions, as pointed out by Matt in the comments, sprintf is not supported, which is surprising. However there is the undocumented helper function (therefore not in the list) sprintfc which seems to work and can be used equivalently:
str = sprintfc('%i',x)

I use the following workaround to enable sprintf for general use with Matlab Coder:
1) Create the following m-file named "sprintf.m", preferably in a folder NOT on your Matlab path:
function s = sprintf(f, varargin)
if (coder.target('MATLAB'))
s = builtin('sprintf',f,varargin{:});
elseif (coder.target('MEX'))
s = builtin('sprintf',f,varargin{:});
else
coder.cinclude('stdio.h');
s = char(zeros(1024,1));
cf = [f,0]; % NULL-terminated string for use in C
coder.ceval('sprintf_s', coder.ref(s(1)), int32(1024), coder.rref(cf(1)), varargin{:});
end
2) Ensure that sprintf is not specified as extrinsic via coder.extrinsic
3) Specify the folder containing the newly created "sprintf.m" as additional include directory when generating code. If you use the codegen function, this can be done via the -I switch. If you use the Coder App, this can be done under "More Settings -> Custom Code -> Additional include directories" from the "Generate" tab.
4) Convert from int to string as follows: s=sprintf('%d',int32(n));
Notes:
The specified "sprintf.m" shadows the built-in sprintf function and executes instead of the built-in function every time you call sprintf from generated code. By placing this file in a folder that is not on the Matlab path, you avoid calling it from other code made to run in Matlab. The coder.target call also helps to navigate back to the built-in function in case this gets called in a normal Matlab session or from a MEX file.
The code above limits the result to 1023 characters (a terminating zero is required at the end). If the result exceeds this limit, sprintf_s raises a runtime error instead of silently writing past the end of the buffer. This prevents memory corruption that is often only caught much later and is harder to trace back to the offending call. This limit can be modified to your own requirements.
Numeric types must be cast to the correct class before passing them to sprintf, e.g. cast to int32 to match a %d in the format string. This requirement is the same when using fprintf with Matlab Coder. However, in the fprintf case, Matlab Coder catches type errors for you. For sprintf, the C++ compiler may fail or the resulting string may contain errors.
String arguments must be NULL-terminated manually to be used in C calls, as Matlab Coder does not do this automatically (credit to Ryan Livingston for pointing this out). The code above ensures that the format string f is NULL-terminated, but NULL-termination of other string arguments remains the responsibility of the calling function.
This code was tested on a Windows platform with a Visual Studio C++ compiler and Matlab R2016a (Matlab Coder 3.1), but is expected to work in most other environments as well.

Edit: As of MATLAB R2018a, sprintf is supported for code generation by MATLAB Coder.
Pre R2018a Answer
You could also call the C runtime sprintf or snprintf using coder.ceval. This has the benefit of making it easy to support floating point inputs as well. You can also change the formatting as desired by tweaking the format string.
Supposing that your compiler provides snprintf one could use:
function s = cint2str(x)
%#codegen
if coder.target('MATLAB')
s = int2str(x);
else
coder.cinclude('<stdio.h>');
assert(isfloat(x) || isinteger(x), 'x must be a float or an integer');
assert(x == floor(x) && isfinite(x), 'x must be a finite integer value');
if isinteger(x)
switch class(x)
% Set up for Win64, change to match your target
case {'int8','int16','int32'}
fmt = '%d';
case 'int64'
fmt = '%lld';
case {'uint8','uint16','uint32'}
fmt = '%u';
otherwise
fmt = '%llu';
end
else
fmt = '%.0f';
end
% NULL-terminate for C
cfmt = [fmt, 0];
% Set up external C types
nt = coder.opaque('int','0');
szt = coder.opaque('size_t','0');
NULL = coder.opaque('char*','NULL');
% Query length
nt = coder.ceval('snprintf',NULL,szt,coder.rref(cfmt),x);
n = cast(nt,'int32');
ns = n+1; % +1 for trailing null
% Allocate and format
s = coder.nullcopy(blanks(ns));
nt = coder.ceval('snprintf',coder.ref(s),cast(ns,'like',szt),coder.rref(cfmt),x);
assert(cast(nt,'int32') == n, 'Failed to format string');
end
Note that you may need to tweak the format string to match the hardware on which you're running, since this assumes that long long is available and maps 64-bit integers to it.

Related

Reading a comma-separated string (not text file) in Matlab

I want to read a string in Matlab (not an external text file) which has numerical values separated by commas, such as
a = {'1,2,3'}
and I'd like to store it in a vector as numbers. Is there any function which does that? I only find processes and functions used to do that with text files.
I think you're looking for sscanf
A = sscanf(str,formatSpec) reads data from str, converts it according
to the format specified by formatSpec, and returns the results in an
array. str is either a character array or a string scalar.
You can try the str2num function:
vec = str2num('1,2,3')
If you have to use the cell a, per your example, it would be: vec=str2num(a{1})
str2num evaluates its input with eval, and the documentation carries security warnings for that reason, so be cognizant of where the strings you pass to it come from.
Another, more flexible, option is textscan. It can handle strings as well as file handles.
Here's an example:
cellResult = textscan('1,2,3', '%f','delimiter',',');
vec = cellResult{1};
I would use the eval function to "evaluate" the vector. If the input really is a cell as in your example, I would also use cell2mat to get the '1,2,3' text out of it (this can be approached by other methods too).
% Generate the variable "a" that contains the "vector"
a = {'1,2,3'};
% Generate the vector using the eval function
myVector = eval(['[' cell2mat(a) ']']);
Let me know if this solution works for you

Passing a Character Array from VBA to Fortran DLL through a Type is corrupting the other Type members

Believe it or not, that title is about as short as I could make it and still describe the problem I'm having!
So here's the scenario: I'm calling a Fortran DLL from VBA, and the DLL uses user-defined types or whatever the Fortran name is for that (structs?) as an argument and copies the type back to the caller for validation.
The type has an array of fixed-length characters and some run of the mill integers.
I've noticed some funny behavior in any attributes defined after this character array that I'll go over below, right after I describe my boiled-down testing setup:
The Fortran Side:
Here's the main program:
SUBROUTINE characterArrayTest (simpleTypeIn, simpleTypeOut)
use simpleTypeDefinition
!GCC$ ATTRIBUTES STDCALL :: characterArrayTest
type(simpleType), INTENT(IN) :: simpleTypeIn
type(simpleType), INTENT(OUT) :: simpleTypeOut
simpleTypeOut = simpleTypeIn
END SUBROUTINE characterArrayTest
And here is the simpleTypeDefinition module file:
Module simpleTypeDefinition
Type simpleType
character (len=1) :: CharacterArray(1)
!The length of the array is one here, but modified in tests
integer (kind=2) :: FirstInteger
integer (kind=2) :: SecondInteger
integer (kind=2) :: ThirdInteger
End Type simpleType
End Module simpleTypeDefinition
The compilation step:
gfortran -c simpleTypeDefinition.f90 characterArrayTest.f90
gfortran -shared -static -o characterArrayTest.dll characterArrayTest.o
Note: This is the 32-bit version of gfortran, as I'm using the 32-bit version of Excel.
The VBA Side:
First, the mirrored simpleType and declare statements:
Type simpleType
CharacterArray(0) As String * 1
'The length of the array is one here, but modified in tests
FirstInteger As Integer
SecondInteger As Integer
ThirdInteger As Integer
End Type
Declare Sub characterArrayTest Lib "characterArrayTest.dll" _
Alias "characterarraytest_#8" _
(simpleTypeIn As simpleType, simpleTypeOut As simpleType)
Next, the calling code:
Dim simpleTypeIn As simpleType
Dim simpleTypeOut As simpleType
simpleTypeIn.CharacterArray(0) = "A"
'simpleTypeIn.CharacterArray(1) = "B"
'simpleTypeIn.CharacterArray(2) = "C"
'simpleTypeIn.CharacterArray(3) = "D"
simpleTypeIn.FirstInteger = 1
simpleTypeIn.SecondInteger = 2
simpleTypeIn.ThirdInteger = 3
Call Module4.characterArrayTest(simpleTypeIn, simpleTypeOut)
The Strange, Buggy Behavior:
Now that we're past the setup, I can describe what's happening:
(I'm playing around with the length of the character array, while leaving the length of the individual characters set to one. I match the character array parameters on both sides in all cases.)
Test case: CharacterArray length = 1
For this first case, everything works great, I pass in the simpleTypeIn and simpleTypeOut from VBA, the Fortran DLL accepts it and copies simpleTypeIn to simpleTypeOut, and after the call VBA returns simpleTypeOut with identical attributes CharacterArray, FirstInteger, and so forth.
Test case: CharacterArray length = 2
This is where things get interesting.
Before the call, simpleTypeIn was as defined. Right after the call, simpleTypeIn.ThirdInteger had changed from 3 to 65! Even weirder, 65 is the ASCII value for the character A, which is simpleTypeIn.CharacterArray(0).
I tested this relationship by changing "A" to "(", which has an ASCII value of 40, and sure enough, simpleTypeIn.ThirdInteger changed to 40. Weird.
In any case, one would expect that simpleTypeOut would be a copy of whatever weird thing simpleTypeIn has been morphed to, but not so! simpleTypeOut was a copy of simpleTypeIn except for simpleTypeOut.ThirdInteger, which was 16961!
Test case: CharacterArray length = 3
This case was identical to case 2, oddly enough.
Test case: CharacterArray length = 4
In this also weird case, after the call simpleTypeIn.SecondInteger changed from 2 to 65, and simpleTypeIn.ThirdInteger changed from 3 to 66, which is the ASCII value for B.
Not to be outdone, simpleTypeOut.SecondInteger came out as 16961 and simpleTypeOut.ThirdInteger was 17475. The other values copied successfully (I uncommented the B, C, and D character assignments to match the array size.)
Observations:
This weird corruption seems to be linear with respect to the number of bytes in the character array. I also did some testing with individual characters of length 2 instead of 1 (I can catalogue it on Monday if anyone wants), and the corruption happened when the array had a size of 1, as opposed to waiting until the size was 2. It also didn't "skip" additional corruption when the size of the array was 3 like the size = 1 case did.
This is easily a hall of fame bug for me; I'm sure you can imagine how much concentrated fun this was to isolate in a large-scale program with a ton of Type attributes. If anyone has any ideas it'd be greatly appreciated!
If I don't get back to you right away it's because I'm calling it a day, but I'll try to monitor my inbox.
(This answer is based on an understanding of Fortran, but not VBA)
In this case, and in most cases, Fortran won't automatically resize arrays for you. When you reference the second element of the character array (with simpleTypeIn.CharacterArray(1) = "B"), that element doesn't exist and it isn't created.
Instead, the code attempts to set whatever memory would be at the location of the second element of the character array, if it were to exist. In this case, that memory appears to be used to store the integers instead.
You can see the same thing happening if you forget about VBA entirely. Here is a sample code entirely in Fortran to demonstrate similar behavior:
enet-mach5% cat main.f90
! ===== Module of types
module types_m
implicit none
type simple_t
character(len=1) :: CharacterArray(1)
integer :: int1, int2, int3
end type simple_t
end module types_m
! ===== Module of subroutines
module subroutines_m
use types_m, only : simple_t
implicit none
contains
! -- Subroutine to modify first character, this should work
subroutine sub1(s)
type(simple_t), intent(INOUT) :: s
s%CharacterArray(1) = 'A'
end subroutine sub1
! -- Subroutine to modify first and other (nonexistent) characters, should fail
subroutine sub2(s)
type(simple_t), intent(INOUT) :: s
s%CharacterArray(1) = 'B'
s%CharacterArray(2:8) = 'C'
end subroutine sub2
end module subroutines_m
! ===== Main program, drives test
program main
use types_m, only : simple_t
use subroutines_m, only : sub1, sub2
implicit none
type(simple_t) :: s
! -- Set values to known
s%int1 = 1
s%int2 = 2
s%int3 = 3
s%CharacterArray(1) = 'X'
! -- Write out values of s
write(*,*) 'Before calling any subs:'
write(*,*) 's character: "', s%CharacterArray, '"'
write(*,*) 's integers: ', s%int1, s%int2, s%int3
! -- Call first subroutine, should be fine
call sub1(s)
write(*,*) 'After calling sub1:'
write(*,*) 's character: "', s%CharacterArray, '"'
write(*,*) 's integers: ', s%int1, s%int2, s%int3
! -- Call second subroutine, should overflow character array and corrupt
call sub2(s)
write(*,*) 'After calling sub2:'
write(*,*) 's character: "', s%CharacterArray, '"'
write(*,*) 's integers: ', s%int1, s%int2, s%int3
write(*,*) 'complete'
end program main
In this case, I've put both modules and the main routine in the same file. Typically, one would keep them in separate files but it's ok for this example. I also had to set 8 elements of CharacterArray to manifest an error, but the exact sizing depends on the system, compiler, and optimization settings. Running this on my machine yields:
enet-mach5% gfortran --version
GNU Fortran (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.
GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
enet-mach5% gfortran main.f90 && ./a.out
main.f90:31.20:
s%CharacterArray(2:8) = 'C'
1
Warning: Lower array reference at (1) is out of bounds (2 > 1) in dimension 1
Before calling any subs:
s character: "X"
s integers: 1 2 3
After calling sub1:
s character: "A"
s integers: 1 2 3
After calling sub2:
s character: "B"
s integers: 1128481603 2 3
complete
Gfortran is smart enough to flag a compile-time warning that s%CharacterArray(2) is out of bounds. You can see the character array isn't resized, and the value of int1 is corrupted instead. If I compile with more run-time checking, I get a full error instead:
enet-mach5% gfortran -fcheck=all main.f90 && ./a.out
main.f90:31.20:
s%CharacterArray(2:8) = 'C'
1
Warning: Lower array reference at (1) is out of bounds (2 > 1) in dimension 1
Before calling any subs:
s character: "X"
s integers: 1 2 3
After calling sub1:
s character: "A"
s integers: 1 2 3
At line 31 of file main.f90
Fortran runtime error: Index '2' of dimension 1 of array 's' outside of expected range (1:1)
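The corrupted value of int1 in the first run is not random: 1128481603 is 0x43434343, i.e. four 'C' bytes. A minimal Python sketch can reproduce the overflow, assuming a layout of one character, three padding bytes, and three 4-byte little-endian integers (the layout string and the function name overflow_demo are assumptions made for illustration; the real layout depends on compiler and platform):

```python
import struct

# Assumed layout of simple_t: 1 char, 3 padding bytes, three 4-byte integers.
LAYOUT = "<c3x3i"

def overflow_demo():
    """Mimic sub2: write element 1, then 'elements' 2..8 past the end of the array."""
    buf = bytearray(struct.pack(LAYOUT, b"X", 1, 2, 3))
    buf[0:1] = b"B"          # s%CharacterArray(1) = 'B'
    buf[1:8] = b"C" * 7      # s%CharacterArray(2:8) = 'C' lands on padding and int1
    return struct.unpack(LAYOUT, bytes(buf))

print(overflow_demo())
print(struct.pack("<i", 1128481603))  # the four bytes behind the corrupted integer
```

Under these assumptions only int1 is overwritten, while int2 and int3 keep their values, which is consistent with the output shown above.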
Looks like I'm (Edit: not) collecting my own bounty today!
The root of this problem lies in the fact that VBA takes 2 bytes per character while Fortran expects 1 byte per character. The memory garbling is caused by the character array taking up more space in memory than Fortran expects. The way to send 1 byte characters over to Fortran is as such:
Type Definition:
Type simpleType
CharacterArray(3) As Byte
FirstInteger As Integer
SecondInteger As Integer
ThirdInteger As Integer
End Type
Conversion from VBA character to Byte values:
Dim tempByte() As Byte
tempByte = StrConv("A", vbFromUnicode)
simpleTypeIn.CharacterArray(0) = tempByte(0)
tempByte = StrConv("B", vbFromUnicode)
simpleTypeIn.CharacterArray(1) = tempByte(0)
tempByte = StrConv("C", vbFromUnicode)
simpleTypeIn.CharacterArray(2) = tempByte(0)
tempByte = StrConv("D", vbFromUnicode)
simpleTypeIn.CharacterArray(3) = tempByte(0)
This code successfully passes the strings given to the StrConv function through to the Fortran DLL. I tested that they translated to the proper ASCII characters in the Fortran DLL and they did! Also, the integers are no longer passed back incorrectly! A hall of fame bug has been stamped out.
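The byte counts behind this explanation are easy to check outside VBA. The following Python sketch is only an illustration (the helper name int16_to_chars is made up here, and it does not model VBA's actual marshalling): it shows that the same four characters take 8 bytes as 2-byte characters but 4 bytes as single-byte characters, and that the odd integers from the question are simply pairs of character codes.

```python
import struct

def int16_to_chars(value):
    """Reinterpret a 2-byte little-endian integer as two single-byte characters."""
    return struct.pack("<h", value).decode("ascii")

text = "ABCD"
wide = text.encode("utf-16-le")   # 2 bytes per character
narrow = text.encode("ascii")     # 1 byte per character, what Fortran expects here
print(len(wide), len(narrow))

# The corrupted values reported in the question decode to character pairs.
print(int16_to_chars(16961))
print(int16_to_chars(17475))
```

So 16961 is the byte pair "AB" and 17475 is "CD", which fits the idea that character data ended up in the memory the integers were read from.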

Parsing strings in Fortran

I am reading from a file in Fortran which has an undetermined number of floating point values on each line (for now, there are about 17 values on a line). I would like to read the 'n'th value on each line to a given floating point variable. How should I go about doing this?
In C the way I wrote it was to read the entire line onto the string and then do something like the following:
for(int il = 0; il < l; il++)
{
for(int im = -il; im <= il; im++)
pch = strtok(NULL, "\t ");
}
for(int im = -l; im <= m; im++)
pch = strtok(NULL, "\t ");
dval = atof(pch);
Here I am continually reading a value and throwing it away (thus shortening the string) until I am ready to accept the value I am trying to read.
Is there any way I can do this in Fortran? Is there a better way to do this in Fortran? The problem with my Fortran code seems to be that read(tline, '(f10.15)') tline1 does not shorten tline (tline is my string holding the entire line and tline1 what I am trying to parse it into), thus I cannot use the same method as I did in my C routine.
Any help?
The issue is that Fortran is a record-based I/O system while C is stream-based.
If you have access to a Fortran 2003 compliant compiler (modern versions of gfortran should work), you can use the stream ACCESS specifier to do what you want.
An example can be found here.
Of course, if you were really inclined, you could just use your C function directly from Fortran. Interfacing the two languages is generally simple, typically only requiring a wrapper with a lowercase name and an appended underscore (depending on compiler and platform of course). Passing arrays or strings back and forth is not so trivial typically; but for this example that wouldn't be needed.
Once the data is in a character array, you can read it into another variable as you are doing, with the ADVANCE="no" specifier, i.e.
do i = 1, numberIWant
read(tline, '(F10.15)', ADVANCE="no") tline1
end do
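where tline1 should contain your number at the end of the loop.
Because of the record-based I/O, a READ statement will typically throw out what is after the end of the record. But the ADVANCE=no tells it not to. Note that the standard only permits ADVANCE= on reads from an external unit, so if your compiler rejects it on the internal read above, apply it to the read from the file unit instead.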
If you know exactly at what position the value you want starts, you can use the T edit descriptor to initiate the next read from that position.
Let's say, for instance, that the width of each field is 10 characters and you want to read the fifth value. The read statement will then look something like the following.
read(file_unit, '(t41, f10.5)') value1
P.S.: You can build the format string dynamically at runtime, with the correct number after the t, by using a character variable as the format and an internal file write to put the number into it.
Let's say you want the value that starts at position n. It will then look something like this (I alternated between single and double quotes to try to make it more clear where each string starts and stops):
write(my_format, '(a, i0, a)') "(t", n, ', f10.5)'
read(file_unit, my_format) value1
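The column arithmetic behind the T edit descriptor can be sketched in Python. This assumes fixed-width fields (10 characters by default) and counts fields from 1; the function names t_format and read_nth_fixed are hypothetical helpers for illustration, not part of any Fortran or Python library:

```python
def t_format(n, width=10, decimals=5):
    """Build the Fortran format string that reads the n-th fixed-width field."""
    start = (n - 1) * width + 1          # T positions are 1-based columns
    return f"(t{start}, f{width}.{decimals})"

def read_nth_fixed(line, n, width=10):
    """Pick the n-th fixed-width field out of a line, as the T descriptor would."""
    start = (n - 1) * width              # Python slices are 0-based
    return float(line[start:start + width])

# Five fields of width 10, as in the example above.
line = "".join(f"{v:10.5f}" for v in [1.5, 2.5, 3.5, 4.5, 5.5])
print(t_format(5))
print(read_nth_fixed(line, 5))
```

For the fifth field of width 10 this gives column 41, matching the t41 in the read statement above.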

Modifying a character in a string in Lua

Is there any way to replace a character at position N in a string in Lua?
This is what I've come up with so far:
function replace_char(pos, str, r)
return str:sub(pos, pos - 1) .. r .. str:sub(pos + 1, str:len())
end
str = replace_char(2, "aaaaaa", "X")
print(str)
I can't use gsub either as that would replace every capture, not just the capture at position N.
Strings in Lua are immutable. That means that any solution that replaces text in a string must end up constructing a new string with the desired content. For the specific case of replacing a single character with some other content, you will need to split the original string into a prefix part and a postfix part, and concatenate them back together around the new content.
This variation on your code:
function replace_char(pos, str, r)
return str:sub(1, pos-1) .. r .. str:sub(pos+1)
end
is the most direct translation to straightforward Lua. It is probably fast enough for most purposes. I've fixed the bug that the prefix should be the first pos-1 chars, and taken advantage of the fact that if the last argument to string.sub is missing it is assumed to be -1 which is equivalent to the end of the string.
But do note that it creates a number of temporary strings that will hang around in the string store until garbage collection eats them. The temporaries for the prefix and postfix can't be avoided in any solution. But this also has to create a temporary for the first .. operator to be consumed by the second.
It is possible that one of two alternate approaches could be faster. The first is the solution offered by Paŭlo Ebermann, but with one small tweak:
function replace_char2(pos, str, r)
return ("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1))
end
This uses string.format to do the assembly of the result in the hopes that it can guess the final buffer size without needing extra temporary objects.
But do beware that string.format is likely to have issues with any \0 characters in any string that it passes through its %s format. Specifically, since it is implemented in terms of standard C's sprintf() function, it would be reasonable to expect it to terminate the substituted string at the first occurrence of \0. (Noted by user Delusional Logic in a comment.)
A third alternative that comes to mind is this:
function replace_char3(pos, str, r)
return table.concat{str:sub(1,pos-1), r, str:sub(pos+1)}
end
table.concat efficiently concatenates a list of strings into a final result. It has an optional second argument which is text to insert between the strings, which defaults to "" which suits our purpose here.
My guess is that unless your strings are huge and you do this substitution frequently, you won't see any practical performance differences between these methods. However, I've been surprised before, so profile your application to verify there is a bottleneck, and benchmark potential solutions carefully.
You should use pos inside your function instead of literal 1 and 3, but apart from this it looks good. Since Lua strings are immutable you can't really do much better than this.
Maybe
("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1, str:len()))
is more efficient than the .. operator, but I doubt it. If it turns out to be a bottleneck, measure it (and then decide whether to implement this replacement function in C).
With LuaJIT, you can use the FFI library to cast the string to an array of unsigned chars:
local ffi = require 'ffi'
txt = 'test'
ptr = ffi.cast('uint8_t*', txt)
ptr[1] = string.byte('o')

how to extract characters from a Korean string in VBA

Need to extract the initial character from a Korean word in MS-Excel and MS-Access.
When I use Left("한글",1) it returns the first syllable, i.e. 한, but what I need is the initial character, i.e. ㅎ.
Is there a function to do this? or at least an idiom?
If you know how to get the Unicode value from the String I'd be able to work it out from there but I'm sure I'd be reinventing the wheel. (yet again)
Disclaimer: I know little about Access or VBA, but what you're having is a generic Unicode problem, it's not specific to those tools. I retagged your question to add tags related to this issue.
Access is doing the right thing by returning 한: it is indeed the first character of that two-character string. What you want here is the canonical decomposition of this hangul into its constituent jamos, also known as Normalization Form D (NFD), for “decomposed”. The NFD form is ᄒ ᅡ ᆫ, of which the first character is what you want.
Note also that as per your example, you seem to want a function to return the equivalent hangul (ㅎ) for the jamo (ᄒ) – there really are two different code points because they represent different semantic units (a full-fledged hangul syllable, or a part of a hangul). There is no pre-defined mapping from the former to the latter, you could write a small function to that effect, as the number of jamos is limited to a few dozens (the real work is done in the first function, NFD).
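Such a small mapping function could look like the following Python sketch. It is only an illustration under stated assumptions: the names COMPAT_INITIALS, initial_consonant and nfd_first_jamo are made up here, and the lookup string lists the 19 leading consonants as compatibility jamo in the order used by the Unicode Hangul syllable block.

```python
import unicodedata

# The 19 leading consonants as Hangul Compatibility Jamo, in syllable-block order.
COMPAT_INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
S_BASE = 0xAC00       # first precomposed syllable
S_COUNT = 11172       # number of precomposed syllables
N_COUNT = 21 * 28     # syllables per leading consonant (vowels * trailing forms)

def initial_consonant(syllable):
    """Return the initial consonant of one Hangul syllable as a compatibility jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    return COMPAT_INITIALS[index // N_COUNT]

def nfd_first_jamo(syllable):
    """Return the first code point of the NFD form (a conjoining jamo)."""
    return unicodedata.normalize("NFD", syllable)[0]

print(initial_consonant("한"))
print(hex(ord(nfd_first_jamo("한"))))
```

The first function gives the standalone letter the question asks for, while the second shows the conjoining jamo that NFD produces. The arithmetic is simple enough to port to VBA using AscW and integer division.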
Adding to Arthur's excellent answer, I want to point out that extracting jamo from hangeul syllables is very straightforward using the algorithm in the standard. While the solution isn't specific to Excel or Access (it's a Python module), it only involves arithmetic expressions, so it should be easy to translate to other languages. The formulas, as can be seen, are identical to those on page 109 of the standard. The decomposition is returned as a tuple of one-character strings, which can easily be verified to correspond to the Hangul Jamo Code Chart.
# -*- encoding: utf-8 -*-
# Python 3 version; constants follow the standard's Hangul decomposition algorithm.
SBase = 0xAC00
LBase = 0x1100
VBase = 0x1161
TBase = 0x11A7
SCount = 11172
LCount = 19
VCount = 21
TCount = 28
NCount = VCount * TCount

def decompose(syllable):
    """Split one precomposed Hangul syllable into leading, vowel and optional trailing jamo."""
    SIndex = ord(syllable) - SBase
    if not 0 <= SIndex < SCount:
        raise ValueError('not a precomposed Hangul syllable: %r' % syllable)
    L = LBase + SIndex // NCount             # leading consonant
    V = VBase + (SIndex % NCount) // TCount  # vowel
    T = TBase + SIndex % TCount              # trailing consonant, TBase means none
    if T == TBase:
        result = (L, V)
    else:
        result = (L, V, T)
    return tuple(map(chr, result))

if __name__ == '__main__':
    test_values = '항가있닭넓짧'
    for syllable in test_values:
        print(syllable, ':', *decompose(syllable))
This is the output in my console:
항 : ᄒ ᅡ ᆼ
가 : ᄀ ᅡ
있 : ᄋ ᅵ ᆻ
닭 : ᄃ ᅡ ᆰ
넓 : ᄂ ᅥ ᆲ
짧 : ᄍ ᅡ ᆲ
I think what you are looking for is a Byte Array
Dim aByte() as byte
aByte="한글"
should give you the two bytes of the Unicode value for each character in the string
I assume you got what you needed, but it seems rather convoluted. I don't know anything about this, but recently did some investigating of handling Unicode, and looked into all the string Byte functions, such as LeftB(), RightB(), InputB(), InStrB(), LenB(), AscB(), ChrB() and MidB(), and there's also StrConv(), which has a vbUnicode argument. These are all functions that I'd think would be used in any double-byte context, but then, I don't work in that environment so might be missing something very important.

Resources