Parsing strings in Fortran

I am reading from a file in Fortran which has an undetermined number of floating point values on each line (for now, there are about 17 values on a line). I would like to read the 'n'th value on each line into a given floating point variable. How should I go about doing this?
In C, I read the entire line into a string and then did something like the following:
for(int il = 0; il < l; il++)
{
    for(int im = -il; im <= il; im++)
        pch = strtok(NULL, "\t ");
}
for(int im = -l; im <= m; im++)
    pch = strtok(NULL, "\t ");
dval = atof(pch);
Here I am continually reading a value and throwing it away (thus shortening the string) until I am ready to accept the value I am trying to read.
Is there any way I can do this in Fortran? Is there a better way to do this in Fortran? The problem with my Fortran code seems to be that read(tline, '(f10.15)') tline1 does not shorten tline (tline is my string holding the entire line and tline1 is what I am trying to parse it into), so I cannot use the same method as I did in my C routine.
Any help?

The issue is that Fortran is a record-based I/O system while C is stream-based.
If you have access to a Fortran 2003 compliant compiler (modern versions of gfortran should work), you can open the file with ACCESS="stream" to do what you want.
An example can be found here.
Of course, if you were really inclined, you could just use your C function directly from Fortran. Interfacing the two languages is generally simple, typically only requiring a wrapper with a lowercase name and an appended underscore (depending on compiler and platform of course). Passing arrays or strings back and forth is not so trivial typically; but for this example that wouldn't be needed.
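Alternatively, you can read the fields one at a time with non-advancing input (the ADVANCE="no" specifier). Note that ADVANCE= is only allowed when reading from an external file, not from an internal file such as a character variable, so the read has to target the file unit. It also needs an explicit format, so each value must occupy a fixed-width field (10 characters here), i.e.
do i = 1, numberIWant
    read(file_unit, '(F10.5)', ADVANCE="no") tline1
end do
where tline1 should contain your number at the end of the loop.
Because of the record-based I/O, a READ statement normally skips whatever remains of the current record once it finishes. ADVANCE="no" tells it not to, so the next READ continues right after the last character read.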
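If you do go the C route mentioned above, the "read a token and throw it away" idea from the question can be packaged as one function. The following is only a sketch: the name nth_value, its signature and its return codes are made up for illustration, and it assumes the values are separated by blanks or tabs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and signature are not from the question):
 * stores the n-th (1-based) blank- or tab-separated number of `line`
 * in *out. Returns 0 on success, -1 if the line has fewer than n
 * fields or the n-th field does not start with a number. */
int nth_value(const char *line, int n, double *out)
{
    assert(line != NULL && out != NULL && n >= 1);

    /* strtok writes into its argument, so tokenize a private copy. */
    size_t len = strlen(line);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, line, len + 1);

    int status = -1;
    char *tok = strtok(copy, " \t\r\n");
    for (int i = 1; tok != NULL && i < n; i++)
        tok = strtok(NULL, " \t\r\n");  /* discard fields before the n-th */

    if (tok != NULL) {
        char *end;
        double v = strtod(tok, &end);
        if (end != tok) {               /* at least one character converted */
            *out = v;
            status = 0;
        }
    }
    free(copy);
    return status;
}
```

Calling nth_value("1.5 2.5\t3.5 4.5", 3, &v) stores 3.5 in v. Working on a copy keeps the caller's line intact, which strtok on its own would not.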

If you know exactly at what position the value you want starts, you can use the T edit descriptor to initiate the next read from that position.
Let's say, for instance, that the width of each field is 10 characters and you want to read the fifth value. The read statement will then look something like the following.
read(file_unit, '(t41, f10.5)') value1
P.s.: You can dynamically create a format string at runtime, with the correct number after the t, by using a character variable as format and use an internal file write to put in this number.
Let's say you want the value that starts at position n. It will then look something like this (I alternated between single and double quotes to try to make it more clear where each string starts and stops):
write(my_format, '(a, i0, a)') "(t", n, ', f10.5)'
read(file_unit, my_format) value1

Related

Access a variable in Matlab using strcat

I have many temperatures stored in column vectors, for example T101, T102, …, and I would like to access them by building the variable name with a string concatenation command and then place them in another vector. I created a simplified example to show what I am trying to achieve.
clc
clear all
T102 = [5; 8; 20; 21];
P102 = [T102;1]
P102 = [strcat('T','102');1]
However, I am receiving an error for the second time I define P102 because it has now become the string 'T102' and I want it to become the variable T102 and not the string.

I am not sure what you are trying to do and if it’s the right way.
But to answer your question you should use eval:
P102 = [eval(strcat('T','102'));1];

Convert integer to string with C++ compatible function for Matlab Coder

I'm using Matlab Coder to convert some Matlab code to C++, however I'm having trouble converting integers to strings.
int2str() is not supported for code generation, so I must find some other way to convert ints to strings. I've tried googling it, without success. Is this even possible?

This can be done manually (very painful, though)
function s = thePainfulInt2Str( n )
if n == 0
    s = '0'; % the loop below produces no digits for zero
    return
end
s = '';
is_neg = n < 0; % save sign
n = abs(n); % work with the magnitude
while n > 0
    c = mod( n, 10 ); % current least-significant digit
    s = [char(c+'0'),s]; % prepend the digit character
    n = ( n - c ) / 10; % "chop" it off and continue
end
if is_neg
    s = ['-',s]; % add the sign
end

sprintf is another very basic function, so it possibly works in C++ as well:
x = int64(1948)
str = sprintf('%i',x)
It is also the underlying function used by int2str.
According to this comprehensive list of supported functions, as pointed out by Matt in the comments, sprintf is not supported, which is surprising. However there is the undocumented helper function (therefore not in the list) sprintfc which seems to work and can be used equivalently:
str = sprintfc('%i',x)

I use the following workaround to enable sprintf for general use with Matlab Coder:
1) Create the following m-file named "sprintf.m", preferably in a folder NOT on your Matlab path:
function s = sprintf(f, varargin)
if (coder.target('MATLAB'))
    s = builtin('sprintf',f,varargin{:});
elseif (coder.target('MEX'))
    s = builtin('sprintf',f,varargin{:});
else
    coder.cinclude('stdio.h');
    s = char(zeros(1024,1));
    cf = [f,0]; % NULL-terminated string for use in C
    coder.ceval('sprintf_s', coder.ref(s(1)), int32(1024), coder.rref(cf(1)), varargin{:});
end
2) Ensure that sprintf is not specified as extrinsic via coder.extrinsic
3) Specify the folder containing the newly created "sprintf.m" as additional include directory when generating code. If you use the codegen function, this can be done via the -I switch. If you use the Coder App, this can be done under "More Settings -> Custom Code -> Additional include directories" from the "Generate" tab.
4) Convert from int to string as follows: s=sprintf('%d',int32(n));
Notes:
The specified "sprintf.m" shadows the built-in sprintf function and executes instead of the built-in function every time you call sprintf from generated code. By placing this file in a folder that is not on the Matlab path, you avoid calling it from other code made to run in Matlab. The coder.target call also helps to navigate back to the built-in function in case this gets called in a normal Matlab session or from a MEX file.
The code above limits the result to 1023 characters (a terminating zero is required at the end). If the result exceeds this limit, sprintf_s reports an error at run time (through the C runtime's invalid-parameter handler) instead of writing past the end of the buffer. This prevents memory corruption that is often only caught much later and is harder to trace back to the offending call. This limit can be modified to your own requirements.
Numeric types must be cast to the correct class before passing them to sprintf, e.g. cast to int32 to match a %d in the format string. This requirement is the same when using fprintf with Matlab Coder. However, in the fprintf case, Matlab Coder catches type errors for you. For sprintf, the C++ compiler may fail or the resulting string may contain errors.
String arguments must be NULL-terminated manually to be used in C calls, as Matlab Coder does not do this automatically (credit to Ryan Livingston for pointing this out). The code above ensures that the format string f is NULL-terminated, but NULL-termination of other string arguments remains the responsibility of the calling function.
This code was tested on a Windows platform with a Visual Studio C++ compiler and Matlab R2016a (Matlab Coder 3.1), but is expected to work in most other environments as well.

Edit: As of MATLAB R2018a, sprintf is supported for code generation by MATLAB Coder.
Pre R2018a Answer
You could also call the C runtime sprintf or snprintf using coder.ceval. This has the benefit of making it easy to support floating point inputs as well. You can also change the formatting as desired by tweaking the format string.
Supposing that your compiler provides snprintf one could use:
function s = cint2str(x)
%#codegen
if coder.target('MATLAB')
    s = int2str(x);
else
    coder.cinclude('<stdio.h>');
    assert(isfloat(x) || isinteger(x), 'x must be a float or an integer');
    assert(x == floor(x) && isfinite(x), 'x must be a finite integer value');
    if isinteger(x)
        switch class(x)
            % Set up for Win64, change to match your target
            case {'int8','int16','int32'}
                fmt = '%d';
            case 'int64'
                fmt = '%lld';
            case {'uint8','uint16','uint32'}
                fmt = '%u';
            otherwise
                fmt = '%llu';
        end
    else
        fmt = '%.0f';
    end
    % NULL-terminate for C
    cfmt = [fmt, 0];
    % Set up external C types
    nt = coder.opaque('int','0');
    szt = coder.opaque('size_t','0');
    NULL = coder.opaque('char*','NULL');
    % Query length
    nt = coder.ceval('snprintf',NULL,szt,coder.rref(cfmt),x);
    n = cast(nt,'int32');
    ns = n+1; % +1 for trailing null
    % Allocate and format
    s = coder.nullcopy(blanks(ns));
    nt = coder.ceval('snprintf',coder.ref(s),cast(ns,'like',szt),coder.rref(cfmt),x);
    assert(cast(nt,'int32') == n, 'Failed to format string');
end
Note that you'll possibly need to tweak the format string to match the hardware on which you're running since this assumes that long long is available and maps 64-bit integers to it.
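For readers more at home in C, the "query length, then allocate and format" steps in the code above look roughly like this when written directly against the C library. This is only a sketch: int_to_string is a made-up name, and it assumes a C99-conforming snprintf, which returns the required length when given a NULL buffer of size 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd decimal string for x, or NULL
 * on failure. The caller is responsible for freeing the result. */
char *int_to_string(long long x)
{
    /* Pass 1: with a NULL buffer and size 0, C99 snprintf writes nothing
     * and returns the number of characters the full result needs. */
    int n = snprintf(NULL, 0, "%lld", x);
    if (n < 0)
        return NULL;

    /* Pass 2: allocate n + 1 bytes (room for the trailing '\0') and format. */
    char *s = malloc((size_t)n + 1);
    if (s == NULL)
        return NULL;
    int written = snprintf(s, (size_t)n + 1, "%lld", x);
    assert(written == n);
    return s;
}
```

The two-pass approach avoids guessing a buffer size up front, at the cost of formatting the number twice.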

How do you make a function detect whether a string is binary safe or not

How does one detect if a string is binary safe or not in Go?
A function like:
IsBinarySafe(str) //returns true if its safe and false if its not.
Everything after this is just what I have thought of or attempted in order to solve this:
I assumed that there must exist a library that already does this but had a tough time finding it. If there isn't one, how do you implement this?
I was thinking of some solution but wasn't really convinced they were good solutions.
One of them was to iterate over the bytes, and have a hash map of all the illegal byte sequences.
I also thought of maybe writing a regex with all the illegal strings but wasn't sure if that was a good solution.
I also was not sure if a sequence of bytes from other languages counted as binary safe. Say the typical golang example:
世界
Would:
IsBinarySafe(世界) //true or false?
Would it return true or false? I was assuming that all binary safe strings should only use 1 byte per character. So iterating over it in the following way:
const nihongo = "日本語abc日本語"
for i, w := 0, 0; i < len(nihongo); i += w {
    runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
    fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
    w = width
}
and returning false whenever the width was greater than 1. These are just some ideas I had in case there wasn't a library for something like this already, but I wasn't sure.

Binary safety has nothing to do with how wide a character is; it is about whether a function copes with arbitrary bytes, such as null bytes and other non-printable characters.
From Wikipedia:
Binary-safe is a computer programming term mainly used in connection
with string manipulating functions. A binary-safe function is
essentially one that treats its input as a raw stream of data without
any specific format. It should thus work with all 256 possible values
that a character can take (assuming 8-bit characters).
I'm not sure what your goal is, almost all languages handle utf8/16 just fine now, however for your specific question there's a rather simple solution:
package main

import (
    "fmt"
    "unicode"
)

// checks if s is ascii and printable, aka doesn't include tab, backspace, etc.
func IsAsciiPrintable(s string) bool {
    for _, r := range s {
        if r > unicode.MaxASCII || !unicode.IsPrint(r) {
            return false
        }
    }
    return true
}

func main() {
    s := "世界" // the example string from the question
    fmt.Printf("len([]rune(s)) = %d, len([]byte(s)) = %d\n", len([]rune(s)), len([]byte(s)))
    fmt.Println(IsAsciiPrintable(s), IsAsciiPrintable("test"))
}
From unicode.IsPrint:
IsPrint reports whether the rune is defined as printable by Go. Such
characters include letters, marks, numbers, punctuation, symbols, and
the ASCII space character, from categories L, M, N, P, S and the ASCII
space character. This categorization is the same as IsGraphic except
that the only spacing character is ASCII space, U+0020.
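The term comes up most often in C, where a '\0' byte normally ends a string. As a rough illustration of the Wikipedia definition quoted above, here is the same printable-ASCII check written in C so that the check itself is binary-safe. The function name is made up, and passing the length explicitly is this sketch's own choice, not something from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of IsAsciiPrintable. Taking an explicit
 * length instead of relying on a '\0' terminator lets the function
 * inspect every byte, including embedded null bytes. */
int is_ascii_printable(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c < 0x20 || c > 0x7E)   /* outside the range space..'~' */
            return 0;
    }
    return 1;
}
```

A version that stopped at the first '\0' (for example by calling strlen) would report "a\0b" as printable because it never sees past the null byte; that is exactly what "not binary-safe" means.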

Reading a string with spaces in Fortran

Using read(*,*) in Fortran doesn't seem to work if the string to be read from the user contains spaces.
Consider the following code:
character(Len = 1000) :: input = ' '
read(*,*) input
If the user enters the string "Hello, my name is John Doe", only "Hello," will be stored in input; everything after the space is disregarded. My assumption is that the compiler assumes that "Hello," is the first argument, and that "my" is the second, so to capture the other words, we'd have to use something like read(*,*) input1, input2, input3... etc. The problem with this approach is that we'd need to create large character arrays for each input, and need to know exactly how many words will be entered.
Is there any way around this? Some function that will actually read the whole sentence, spaces and all?

character(100) :: line
write(*,'("Enter some text: ")', advance='no')
read(*,'(A)') line
write(*,'(A)') line
end
... will read a line of text of maximum length 100 (enough for most practical purposes) and write it out back to you. Modify to your liking.

Instead of read(*, *), try read(*, '(a)'). I'm no Fortran expert, but the second argument to read is the format specifier (equivalent to the second argument to sscanf in C). * there means list format, which you don't want. You can also say a14 if you want to read 14 characters as a string, for example.

Modifying a character in a string in Lua

Is there any way to replace a character at position N in a string in Lua?
This is what I've come up with so far:
function replace_char(pos, str, r)
    return str:sub(pos, pos - 1) .. r .. str:sub(pos + 1, str:len())
end
str = replace_char(2, "aaaaaa", "X")
print(str)
I can't use gsub either as that would replace every capture, not just the capture at position N.

Strings in Lua are immutable. That means that any solution that replaces text in a string must end up constructing a new string with the desired content. For the specific case of replacing a single character with some other content, you will need to split the original string into a prefix part and a postfix part, and concatenate them back together around the new content.
This variation on your code:
function replace_char(pos, str, r)
return str:sub(1, pos-1) .. r .. str:sub(pos+1)
end
is the most direct translation to straightforward Lua. It is probably fast enough for most purposes. I've fixed the bug that the prefix should be the first pos-1 chars, and taken advantage of the fact that if the last argument to string.sub is missing it is assumed to be -1 which is equivalent to the end of the string.
But do note that it creates a number of temporary strings that will hang around in the string store until garbage collection eats them. The temporaries for the prefix and postfix can't be avoided in any solution. But this also has to create a temporary for the first .. operator to be consumed by the second.
It is possible that one of two alternate approaches could be faster. The first is the solution offered by Paŭlo Ebermann, but with one small tweak:
function replace_char2(pos, str, r)
return ("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1))
end
This uses string.format to do the assembly of the result in the hopes that it can guess the final buffer size without needing extra temporary objects.
But do beware that string.format is likely to have issues with any \0 characters in any string that it passes through its %s format. Specifically, since it is implemented in terms of standard C's sprintf() function, it would be reasonable to expect it to terminate the substituted string at the first occurrence of \0. (Noted by user Delusional Logic in a comment.)
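The C behaviour behind that warning is easy to see in isolation. The sketch below only demonstrates how C's "%s" conversion treats an embedded '\0'; it says nothing about how any particular Lua version implements string.format, and the helper name is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats `src` through "%s" into dst and returns
 * how many characters the conversion produced. */
int format_percent_s(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    return snprintf(dst, dst_size, "%s", src);
}
```

Given the six-byte array "ab\0cd" (five characters plus the terminator), format_percent_s returns 2 and dst holds only "ab": everything after the first '\0' is dropped.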
A third alternative that comes to mind is this:
function replace_char3(pos, str, r)
return table.concat{str:sub(1,pos-1), r, str:sub(pos+1)}
end
table.concat efficiently concatenates a list of strings into a final result. It has an optional second argument which is text to insert between the strings, which defaults to "" which suits our purpose here.
My guess is that unless your strings are huge and you do this substitution frequently, you won't see any practical performance differences between these methods. However, I've been surprised before, so profile your application to verify there is a bottleneck, and benchmark potential solutions carefully.

You should use pos inside your function instead of literal 1 and 3, but apart from this it looks good. Since Lua strings are immutable you can't really do much better than this.
Maybe
("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1, str:len()))
is more efficient than the .. operator, but I doubt it - if it turns out to be a bottleneck, measure it (and then decide to implement this replacement function in C).

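With LuaJIT, you can use the FFI library to cast the string to a pointer to unsigned chars and overwrite a byte in place (the pointer is indexed from 0). Be aware that this writes into Lua's interned string data, so it changes every reference to that string value: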
local ffi = require 'ffi'
txt = 'test'
ptr = ffi.cast('uint8_t*', txt)
ptr[1] = string.byte('o')
