IBM ICU - String conversion functions

Are there any string-to-number conversion functions in the IBM ICU C library, something like atoi and atoll?
I am looking for ICU functions for string conversion that are cross-platform, cross-compiler, and work in both 32-bit and 64-bit builds.
The function should report an error on overflow or underflow.
I thought of using errno -- but errno is not set on all platforms, for example by Windows' atoi.
strtol is for the long data type; there is no function like strtoi.

Use the NumberFormat class (C++) or the unum.h interfaces (C) for string-to-number conversion (i.e. parsing). Instead of errno, ICU reports failures through its own error-code system (UErrorCode).
HTH
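As background to the strtol point raised in the question: even without ICU, a range-checked int parser can be layered over strtol, since strtol does set errno portably (unlike atoi). The name checked_strtoi below is hypothetical, not an ICU or standard-library function:

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: the standard library has no strtoi, but strtol plus
// an explicit range check yields an int parser that reports overflow/underflow.
int checked_strtoi(const std::string& s, int base = 10) {
    errno = 0;
    char* end = nullptr;
    const long v = std::strtol(s.c_str(), &end, base);
    if (end == s.c_str())
        throw std::invalid_argument("no digits in input");
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        throw std::out_of_range("value does not fit in int");
    return static_cast<int>(v);
}
```

Unlike atoi, this distinguishes "0" from unparseable input and raises on out-of-range values; ICU's unum.h parsers report the same conditions through a UErrorCode out-parameter rather than exceptions.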

Related

Read an allocatable string with a namelist in Fortran

Since Fortran 2003 it is possible to work with variable-length character strings. Instead of working in an archaic way and declaring a constant string length, I would like to read the character strings of my namelist dynamically.
Consider the program
program bug
implicit none
character(:), allocatable :: string
integer :: file_unit
namelist / list / string
open(newunit=file_unit,file='namelist.txt')
read(unit=file_unit,nml=list)
write(*,*) string
close(file_unit)
end program bug
and the small namelist contained in the following namelist.txt file:
&list
string = "abcdefghijkl"
/
If I compile with GCC 8.2.0 with aggressive debug flags I get
Warning: ‘.string’ may be used uninitialized in this function [-Wmaybe-uninitialized]
and at runtime, nothing is printed and this arises:
Fortran runtime warning: Namelist object 'string' truncated on read.
and with the Intel compiler 17.0.6 with similar debug flags, there are no compile-time warnings but the following runtime error:
forrtl: severe (408): fort: (7): Attempt to use pointer STRING when it is not associated with a target
which indicates that the namelist feature is unable to allocate a variable-length string "by itself", because if I add the line
allocate(character(len=15) :: string)
the errors disappear. Is this expected behavior? Or is it a defect from the compilers?
It is expected behavior, specified by the Fortran standard. In fact, nowhere in Fortran I/O are deferred-length allocatable strings treated the way they are in intrinsic assignment. There is a proposed work item for the next Fortran standard ("F202X") to allow this in limited contexts (see https://j3-fortran.org/doc/year/18/18-279r1.txt). If I recall correctly, we discussed adding list-directed and NAMELIST reads to this at an earlier standards meeting, but some issues were raised that I don't recall precisely, and we will revisit this.

How to retrieve the type of architecture (linux versus Windows) within my fortran code

How can I retrieve the type of architecture (linux versus Windows) in my fortran code? Is there some sort of intrinsic function or subroutine that gives this information? Then I would like to use a switch like this every time I have a system call:
if (trim(adjustl(Arch))=='Linux') then
resul = system('ls > output.txt')
elseif (trim(adjustl(Arch))=='Windows') then
resul = system('dir > output.txt')
else
write(*,*) 'architecture not supported'
stop
endif
thanks
A.
The Fortran 2003 standard introduced the GET_ENVIRONMENT_VARIABLE intrinsic subroutine. A simple form of call would be
call GET_ENVIRONMENT_VARIABLE (NAME, VALUE)
which will return the value of the variable called NAME in VALUE. The routine has other optional arguments; your favourite reference documentation will explain them all. This rather assumes that you can find an environment variable that tells you what the executing platform is.
If your compiler doesn't yet implement this standard approach it is extremely likely to have a non-standard approach; a routine called getenv used to be available on more than one of the Fortran compilers I've used in the recent past.
The 2008 standard introduced the function COMPILER_OPTIONS, which returns a string containing the compilation options used for the program, if the compiler supports this sort of thing. It seems to be less widely implemented so far than GET_ENVIRONMENT_VARIABLE; as ever, consult your compiler documentation for details and availability. If it is available it may also be useful to you.
You may also be interested in the 2008-introduced subroutine EXECUTE_COMMAND_LINE which is the standard replacement for the widely-implemented but non-standard system routine that you use in your snippet. This is already available in a number of current Fortran compilers.
There is no intrinsic function in Fortran for this. A common workaround is to use conditional compilation (through a makefile or compiler-supported macros) such as here. If you really insist on this kind of solution, you might consider writing an external function, e.g. in C. However, since your code is built for a fixed platform (Windows or Linux, not both), the first solution is preferable.

Visual C++ x86 ABI: how does VC++ return value?

I know 1/2/4-byte integers are returned in eax, and 8-byte integers are returned in edx:eax.
By the way, how are 4/8/16-byte floating-point values returned in cdecl/stdcall? (Maybe I remember long double might be 16 bytes...)
Thanks to @MatteoItalia, I know that VC++'s long double is 8 bytes; then how can I use 16-byte floating-point?
(Don't tell me "just use 8 bytes!" -- I really need it.)
Um, I think I should be satisfied with 10-byte floating point...
FP return values are returned in the ST0 x87 register (see e.g. here).
By the way, in VC++ long double (which in x87 is 80 bit) is effectively a synonym for double.
You didn't say which architecture, but x86 returns floating-point values in ST(0) and x86-64 returns them in XMM0. See x86 calling conventions.
But long double in VC++ for both x86 and x86-64 is the same as double and won't give you any more precision. So on Windows, to do 80-bit floating-point operations you need to use another compiler such as GCC, Clang or ICC. Also, 80-bit long double is computed by the x87 unit, so it may perform worse than a good SSE library.
If you need more than the 10-byte long double then you must implement your own floating-point routines or use an external library. GCC 4.3 and above have built-in support for __float128 through a software floating-point library. See long double (GCC specific) and __float128.
Another approach is using double-double arithmetic to implement near-quadruple-precision values, as PowerPC and SPARC did for long double. It's not IEEE-compatible, but it can exploit hardware double support for speed, so it will be faster than a software __float128.

Is there a workaround for _Complex syntax in VC++?

I've got a library compiled with MinGW which uses the C99 keyword _Complex. I'd like to use this library with the MSVC++ 2010 compiler. I tried temporarily switching off all the code that uses _Complex so that it compiles, and found that most of the other functions work fine in MSVC++. Now I want to enable the parts that use _Complex, but I don't know how.
Obviously I can't recompile it in MSVC++, as the library requires C99 features. However, it feels like such a waste to give it up and look for substitutes when it works perfectly for most other functions.
I think I could write wrappers for the APIs that require _Complex, compile them with MinGW GCC, and then import them into my MSVC project. But I still want to know if there is a better workaround: what is the "standard" way people deal with C99 complex-number syntax in VC++?
Xing.
From the C Standard (C11 §6.2.5 ¶13; C99 has approximately the same language):
Each complex type has the same representation and alignment requirements as an array
type containing exactly two elements of the corresponding real type; the first element is
equal to the real part, and the second element to the imaginary part, of the complex
number.
I don’t have the C++ Standard in front of me, but the complex type templates defined in <complex> have the same requirement; this is intended for compatibility.
You can therefore re-write C functions taking & returning values of type double _Complex as C++ functions taking & returning values of type std::complex<double>; so long as name-mangling on the C++ side has been turned off (via extern "C") both sides will be compatible.
Something like this might help:
#ifdef __cplusplus
#include <complex>
#define std_complex(T) std::complex<T>
#else
#define std_complex(T) T _Complex
#endif

Unfathomable problem with unicode and frameworks

I'm experiencing a very strange problem... The following trivial test code works as it should if it is injected in a single Cocoa application, but when I use it in one of my frameworks, I get absolutely unexpected results...
wchar_t Buf[2048];
wcscpy(Buf, L"/zbxbxklbvasyfiogkhgfdbxbx/bxkfiorjhsdfohdf/xbxasdoipppwejngfd/gjfdhjgfdfdjkg.sdfsdsrtlrt.ljlg/fghlfg");
int len1 = wcslen(L"/zbxbxklbvasyfiogkhgfdbxbx/bxkfiorjhsdfohdf/xbxasdoipppwejngfd/gjfdhjgfdfdjkg.sdfsdsrtlrt.ljlg/fghlfg");
int len2 = wcslen(Buf);
char Buf2[2048];
Buf2[0]=0;
wcstombs(Buf2, Buf, 2048);
// ??? Buf2 == ""
// ??? len1 == len2 == 57, but should be 101
How can this be, have I gone mad? Even if there was a memory corruption, it couldn't possibly corrupt all these values allocated on stack... Why won't even the wcslen(L"MyWideString") work? Changing test string changes its length, but it is always wrong, wcstombs returns -1...
setlocale() is not used anywhere, test string contains only ASCII characters, in order to ease porting I use the -fshort-wchar compiler option, but it works fine in case of a test Cocoa application...
Please help!
I've just tested this again with GCC 4.6. In the standard settings, this works as expected, giving 101 for all the lengths. However, with your option -fshort-wchar I also get unexpected results (51 in my case, and 251 for the final conversion after using setlocale()).
So I looked up the man entry for the option:
Warning: the -fshort-wchar switch causes GCC to generate code that is not binary compatible with code generated without that switch. Use it to conform to a non-default application binary interface.
I think that explains it: When you're linking to the standard library, you are expected to use the correct ABI and type conventions, which you are overriding with that option.
Wide char implementation in C/C++ can be anything, including 1 byte, 2 bytes or 4 bytes. This depends on the compiler and the platform you are compiling to.
Probably wikipedia is not the best place to quote from but in this case:
http://en.wikipedia.org/wiki/Wide_character states that
... width of wchar_t is compiler-specific and can be as small as 8 bits.
and
... wide characters should be 16-bit values under C90 due to historical compatibility reasons. C and C++ compilers that comply with the 10646-1:2000 Unicode standard generally assume 32-bit values....
So do not assume a particular width; use sizeof(wchar_t).
-fshort-wchar changes the compiler's ABI, so you would need to recompile glibc, libgcc and every library that uses wchar_t. Otherwise wcslen and the other glibc functions still assume wchar_t is 4 bytes.
see: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42092