Lua: Capturing String Based on Number of Symbols Received

I currently have a string whose length can vary, depending on a single digit found in one of two specific locations (which one depends on the first digit captured). For example:
The first digit captured tells me whether a file name follows: "1" = object name follows; "0" = the next input captured is the length multiplier.
"1" is not always received, but "0" always is.
With "1" Capture it looks like this:
START|(1)|NAMEOFGRAPHIC|(0)|(#)|INPUT|INPUT|INPUT|INPUT|... etc
With "0" (no "1" captured):
START|(0)|(#)|INPUT|INPUT|INPUT|INPUT|... etc
The length multiplier digit (which always follows "0") is the number of INPUT groups to follow, where a "group" is a set of 4 INPUTs. So if it were "4", the complete string I want to capture would look like this:
With a "1":
START|(1)|NAMEOFGRAPHIC|(0)|(4)|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|
With a "0":
START|(0)|(4)|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|INPUT|
As each INPUT is received, a pipe symbol is appended after it. I want to use the pipes to track the length of the input based on the digit. If the digit is 5, for example, it would capture the 3 INPUTs, the 5, and then 5 more INPUTs (with all pipes included). Once this is done, the function would send the fully captured string to other function(s) for use.
I am having problems writing the receiving function that captures this full string. I have tried counting the number of pipes in various loops, and all of my attempts result in errors.
Attempts include (please understand I'm pretty new to all of this):
local buffer = ""
function pipe_count(input)
a = "|"
buffer = buffer..input.."|"
while #a < 5 do
buffer = buffer..input.."|"
return buffer
end
end
local buffer = ""
function pipe_count(input)
buffer = buffer..input.."|"
mult = tonumber(buffer:match("(.-|.-|.-|(%d)|.*)"))
while buffer do
for i = 1, mult do
buffer = buffer..input.."|"
end
return buffer
end
Those are two of the examples I tried; I deleted my other futile attempts to capture the exact string length. My current issue is that each INPUT is passed on to the next function as soon as it is received, before the entire string has been captured. So, if I received the string shown at the top, it would look like this:
`INPUT`
`INPUT|INPUT`
`INPUT|INPUT|INPUT`
`INPUT|INPUT|INPUT|5`
`INPUT|INPUT|INPUT|5|INPUT`
`INPUT|INPUT|INPUT|5|INPUT|INPUT` etc
until finally the string below is received:
`INPUT|INPUT|INPUT|5|INPUT|INPUT|INPUT|INPUT|INPUT|`
At this point, my file runs as it should. But up until this point, I'm getting errors since the parameters of the function(s) aren't fully met.
Ideally, I want to wait for that last string before moving on.
Any ideas would be very welcome and appreciated.
Cheers
ETA: These INPUTs are filling a buffer. I want that check digit to determine when the string is complete, so that it is only used once the length value is met. Again, I really appreciate all input. Thank you.
ETA: Example code tried and more input details.

Lua strings are immutable (and short strings are interned), so every concatenation builds a new string. It's usually a better idea to push the pieces onto an array (table) and join them once with table.concat than to repeatedly rebuild the same string. This example reads input line by line from stdin: 3 data inputs, followed by a number, followed by that number of data inputs. There are plenty of other ways to do it, but this one is easy to follow.
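local buffer = {}
-- Collects inputs until 3 data inputs, a count N and N more inputs have arrived.
-- Returns the pipe-delimited string once it is complete, otherwise nil.
local function process_input(input)
if #buffer == 3 then
input = tonumber(input) -- the 4th input is the count
end
table.insert(buffer, input)
-- >= 4 (rather than > 4) also lets a count of 0 complete the message
if #buffer >= 4 and #buffer == buffer[4] + 4 then
local pipe_delim = table.concat(buffer, '|')
buffer = {} -- start collecting the next message
return pipe_delim
end
end
repeat
local input = io.read()
if input == nil then break end -- stop at end of input (EOF)
local pipe_delim = process_input(input)
if pipe_delim then
print('Got:', pipe_delim)
end
until false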
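For the format described in the question itself (START, an optional "1" plus a name, then "0", then a multiplier counting groups of 4 INPUTs), the same idea of buffering until the expected count is reached could be sketched as follows. This is a Python sketch rather than Lua, and it assumes each token arrives on its own, without its pipe and without the parentheses shown in the examples; the class and method names are hypothetical.

```python
GROUP_SIZE = 4  # assumption from the question: one "group" is 4 INPUTs


class MessageCapture:
    """Buffers tokens and releases the pipe-delimited message only when complete."""

    def __init__(self):
        self.tokens = []

    def _expected_length(self):
        """Total token count of a complete message, or None if the header is incomplete."""
        if len(self.tokens) < 2:
            return None
        # START|1|NAME|0|# has a 5-token header; START|0|# has a 3-token header.
        header = 5 if self.tokens[1] == "1" else 3
        if len(self.tokens) < header:
            return None
        multiplier = int(self.tokens[header - 1])  # the length multiplier digit
        return header + GROUP_SIZE * multiplier

    def feed(self, token):
        """Add one token; return the full message once it is complete, else None."""
        self.tokens.append(token)
        expected = self._expected_length()
        if expected is not None and len(self.tokens) == expected:
            message = "|".join(self.tokens) + "|"
            self.tokens = []  # ready for the next message
            return message
        return None


if __name__ == "__main__":
    capture = MessageCapture()
    for token in ["START", "1", "LOGO", "0", "1", "in1", "in2", "in3", "in4"]:
        message = capture.feed(token)
        if message is not None:
            print("Got:", message)
```

Because feed() returns None until the last INPUT of the last group arrives, the downstream function is only ever called with the complete string.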

Related

Write statement for a complex format / possibility to write more than once on the same excel line

I am currently working on a program that opens .txt documents one by one, extracts data, and finally fills an Excel document.
Because I did not know how to write to the same line of my Excel document more than once (each write statement jumps to the next line), I created a character string that is filled on each iteration:
Data (data_limite(x),x=1,8)/10, 9, 10, 7, 9, 8, 8, 9/
do file_descr = 1,nombre_fichier,1
taille_data1 = data_limite(file_descr)
nvari = taille_data1-7
write (new_data1,"(A30,A3,A11,A3,F5.1,A3,A7,F4.1,<nvari>(A3))") description,char(9),'T-isotherme',char(9),T_trait,char(9),'d_gamma',taille_Gam,(char(9),i=1,nvari)
ecriture_descr = ecriture_descr//new_data1
end do
The main issue was that I want the number of char(9) (tab) characters to depend on the data_limite value, so I built a write statement with a variable number of char(9).
At the end of the do-loop, ecriture_descr has a very complex layout with no repeating pattern, because the value of nvari changes on each iteration.
Now I want to add this to the first line of my Excel file:
Open(Unit= 20 ,File='resultats.RES',status='replace')
write(20,100) 'param',char(9),char(9),char(9),char(9),char(9),'*',char(9),'nuances',char(9),'*',char(9),ecriture_descr
100 format (a5,5(a3),a,a3,a7,a,a3,???)
but I do not know how to write this format. It would have been easier if, at each iteration of the do-loop, I could write to the first line of my Excel file and keep filling that same line with each new new_data1 value.
EDIT: maybe adding advance='no' to my write statement would help; I am currently trying it.
EDIT 2: it did not work with advance='no', but adding a '$' at the end of my format specification suppresses the carriage return. By moving the write into my do-loop, I think I can solve my problem :). I am currently trying it.
First of all, your line
ecriture_descr = ecriture_descr//new_data1
is almost certainly not doing what you expect it to do. I assume that both ecriture_descr and new_data1 are of type CHARACTER(len=<some value>), that is, fixed-length strings. If you assign anything to such a string, the string is cut to length (if the assigned value is too long) or padded with spaces (if it is too short):
program strings
implicit none
character(len=8) :: h
h = "Hello"
print *, "|" // h // "|" ! Prints "|Hello   |"
h = "Hello World"
print *, "|" // h // "|" ! Prints "|Hello Wo|"
end program strings
And this combination will work against you: ecriture_descr is already padded to its full length with spaces, so whatever you append (new_data1) lands just outside the range of ecriture_descr and is cut off, a bit like this:
h = "Hello"          ! h is actually "Hello   "
h = h // "World"     ! equivalent to h = "Hello   " // "World"
                     !                 = "Hello   World"
                     !                    ^^^^^^^^
                     ! Only the marked part is assigned to h => no change
If you want to accumulate a string, you need the trim function, which removes all trailing spaces:
h = trim(h) // " World"
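The padding and truncation rules above can be modelled in a few lines of Python. This is only a sketch of Fortran's assignment semantics (fortran_assign is a hypothetical helper, and rstrip() stands in for trim()), but it reproduces both printed results and the "no change" case.

```python
def fortran_assign(value: str, length: int) -> str:
    """Model assignment to a Fortran CHARACTER(len=length) variable:
    longer values are truncated, shorter ones are padded with blanks."""
    return value[:length].ljust(length)


h = fortran_assign("Hello", 8)
print("|" + h + "|")  # |Hello   |

h = fortran_assign(h + "World", 8)  # like h = h // "World"
print("|" + h + "|")  # |Hello   |  ("World" lands past column 8 and is lost)

h = fortran_assign(h.rstrip() + " World", 8)  # rstrip() stands in for trim()
print("|" + h + "|")  # |Hello Wo|
```

The last case also shows that trim only helps if the target variable is long enough: ecriture_descr must be declared with enough characters to hold every appended piece.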
Secondly, if you want to write to a file but don't want a newline, you can add the option advance='no' to the write statement (non-advancing output needs an explicit format; it cannot be used with list-directed *):
do i = 1, 100
write(*, '(I4)', advance='no') i
end do
This should make your job a lot easier than creating one very long string in memory and then writing it all out in one go.
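The same design, writing each piece as it is produced and ending the line only once, looks like this in a Python sketch. The function name and the field values are hypothetical; the point is that a variable number of tab-separated fields per input file poses no formatting problem when each piece is written separately.

```python
import io


def write_first_line(out, groups):
    """Write every group of fields onto ONE tab-separated line.

    Each out.write() call continues the current line, much like a Fortran
    WRITE with advance='no'; the newline is written only once, at the end.
    """
    for fields in groups:
        out.write("\t".join(fields) + "\t")
    out.write("\n")


# Hypothetical fields: each input file contributes a different number of them.
buffer = io.StringIO()
write_first_line(buffer, [["param", "*", "nuances"], ["T-isotherme", "5.0"], ["d_gamma"]])
print(repr(buffer.getvalue()))
```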

strange character in Fortran write output

I want to time some subroutines. Here is the template I use to write the name and duration of execution:
SUBROUTINE get_sigma_vrelp
...declarations...
real(8) :: starttime, endtime
CHARACTER (LEN = 200) timebuf
starttime = MPI_Wtime()
...do stuff...
endtime = MPI_Wtime()
write (timebuf, '(30a,e20.10e3)') 'get_sigma_vrelp',endtime-starttime
call pout(timebuf)
END SUBROUTINE get_sigma_vrelp
And here is a sample output:
(thread 4):get_sigma_vrelp �>
Why is a strange character printed instead of a numerical value for endtime-starttime? Incidentally, pout() simply writes the buffer to a process-specific file in a threadsafe manner. It shouldn't have anything to do with the problem, but if there is nothing else here that would cause the erroneous output then I can post its body.
You have it the wrong way round! The line should read
write (timebuf, '(a30,e20.10e3)') 'get_sigma_vrelp',endtime-starttime
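This way, the format expects one string that is 30 characters long (a30) instead of 30 strings of arbitrary length (30a). With 30a, the second item in the output list (the real value endtime-starttime) is matched against an A edit descriptor, so the write statement does not output a number but the raw bytes of the float as characters. Hence the garbage. (Strictly, this mismatch is not allowed; some compilers, such as gfortran, stop with a runtime error instead.)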
Your character literal is only 15 chars long, so you could write the line as
write (timebuf, '(a15,e20.10e3)') 'get_sigma_vrelp',endtime-starttime
or let the compiler decide the length on its own:
write (timebuf, '(a,e20.10e3)') 'get_sigma_vrelp',endtime-starttime
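To see why the output is garbage, this Python sketch looks at the raw bytes of a double-precision value, which is what an A edit descriptor ends up copying on compilers that accept the mismatch. The timing value is a hypothetical stand-in. For small positive values like this one, the most significant byte happens to be 0x3E, the ASCII code of '>', which is consistent with (but does not prove) the '>' in the sample output.

```python
import struct

elapsed = 1.5e-5  # hypothetical small timing difference, in seconds
raw = struct.pack("<d", elapsed)  # the 8 bytes of an IEEE 754 double, little-endian

# Printing those bytes as if they were characters gives mostly unprintable
# control characters rather than digits.
print(repr(raw.decode("latin-1")))

# The most significant byte (the last one in little-endian order) is 0x3E, i.e. '>'.
print(hex(raw[-1]), chr(raw[-1]))
```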

How to read the number of a prespecified character that appears in a string variable in Matlab

I am processing a big file with Matlab. In each line of the input file, data are separated by dots ("."). Due to poor formatting, the number of dots may vary from line to line.
For example:
line1 = 'DIDYMOTE.150.L20'
line2 = 'N.ELBETI.150.L10'
How can I count the number of dots in each line?
In MATLAB everything is an array, so you can read the file into a character matrix with one line per row and count the dots row by row:
data = char(strsplit(fileread('file.txt'), char(10))); % char() pads shorter lines with blanks
[no_lines, no_characters] = size(data);
no_dots = zeros(no_lines, 1);
for i = 1 : no_lines
for j = 1 : no_characters
if data(i, j) == '.'
no_dots(i) = no_dots(i) + 1;
end
end
end
(The loops can also be replaced by the vectorized no_dots = sum(data == '.', 2);.)
However, MATLAB's character arrays are clumsy for handling text data, so you may be better off using another language for this. It will probably take you less time to learn how to process text in Python (for example) than to fit your problem into MATLAB.
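Following the suggestion to use Python, the whole task reduces to str.count on each line. A minimal sketch (count_dots is a hypothetical name); lines of different lengths need no special handling.

```python
def count_dots(lines):
    """Return the number of '.' characters on each line."""
    return [line.count(".") for line in lines]


# The two example lines from the question. A file object works the same way:
#     with open("file.txt") as f:
#         print(count_dots(f))
print(count_dots(["DIDYMOTE.150.L20", "N.ELBETI.150.L10"]))  # [2, 3]
```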

Fortran read of data with * to signify similar data

My data looks like this:
-3442.77 -16749.64 893.08 -3442.77 -16749.64 1487.35 -3231.45 -16622.36 902.29
.....
159*2539.87 10*0.00 162*2539.87 10*0.00
which means I start with either 7 or 8 reals per line and then (towards the end) have 159 values of 2539.87, followed by 10 values of 0, followed by 162 of 2539.87, etc. This seems to be a space-saving method, as previous versions of this file format had a regular 6 reals per line.
I am already reading the data into a string because I don't know whether there are 7 or 8 numbers per line, so I can easily spot lines that contain *. But what then? I suppose I have to find the location of each * and then identify the integer before it and the real value after it before assigning them to an array. Am I missing anything?
Read the line. Split it into tokens delimited by whitespace. In tokens that contain a *, replace the * with a space. Then read one or two values from the token, depending on whether there was an asterisk or not. Sample code follows:
REAL, DIMENSION(big) :: data
CHARACTER(LEN=40) :: token
INTEGER :: iptr, count, idx
REAL :: val
iptr = 1
DO WHILE (there_are_tokens_left)
... ! Get the next token into "token"
idx = INDEX(token, "*")
IF (idx == 0) THEN
READ(token, *) val
count = 1
ELSE
! Replace "*" with space and read two values from the string
token(idx:idx) = " "
READ(token, *) count, val
END IF
data(iptr:iptr+count-1) = val ! Add "val" "count" times to the list of values
iptr = iptr + count
END DO
Here I have arbitrarily set the length of the token to be 40 characters. Adjust it according to what you expect to find in your input files.
BTW, for the sake of completeness, this method of compressing something by replacing repeating values with value/repetition-count pairs is called run-length encoding (RLE).
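The token-by-token algorithm described above is short in Python; this is only a sketch of the same run-length expansion (expand_tokens is a hypothetical name). It does not handle the r* form (r null values) that Fortran list-directed input also allows.

```python
def expand_tokens(line):
    """Expand whitespace-separated tokens, where 'r*c' stands for c repeated r times."""
    values = []
    for token in line.split():
        if "*" in token:
            count, value = token.split("*", 1)  # repeat count before *, value after
            values.extend([float(value)] * int(count))
        else:
            values.append(float(token))
    return values


values = expand_tokens("159*2539.87 10*0.00 162*2539.87 10*0.00")
print(len(values))  # 341
```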
Your input data may have been written in a form suitable for list-directed input (where the format specification in the READ statement is simply *). List-directed input supports the r*c form that you see, where r is a repeat count and c is the constant to be repeated.
If the total number of input items is known in advance (perhaps it is fixed for that program, perhaps it is defined by earlier entries in the file) then reading the file is as simple as:
REAL :: data(size_of_data)
READ (unit, *) data
For example, for the last line shown in your example on its own, size_of_data would need to be 341 (159+10+162+10).
With list directed input the data can span across multiple records (multiple lines) - you don't need to know how many items are on each line in advance - just how many appear in the next "block" of data.
List-directed input has a few other "features" like this, which is why it is generally not a good idea to use it to parse arbitrary input that wasn't written with it in mind. Use an explicit format specification instead (which may require building the format specification on the fly to match the width of the input fields if that is not known ahead of time).
If you don't know (or cannot calculate) the number of items in advance of the READ statement then you will need to do the parsing of the line yourself.

Array of Strings in Fortran 77

I have a question about Fortran 77 that I have not been able to find a solution to.
I'm trying to store an array of strings defined as the following:
character matname(255)*255
Which is an array of 255 strings of length 255.
Later I read the list of names from a file and I set the content of the array like this:
matname(matcount) = mname
EDIT: Actually, the mname value is hardcoded as mname = 'AIR', of type character*255; it is a parameter of a function matadd() which executes the previous line. This is only for testing; in the future it will be read from a file.
Later on I want to print it with:
write(*,*) matname(matidx)
But it seems to print all 255 characters: the string I assigned plus a lot of garbage.
So that is my question, how can I know the length of the string stored?
Should I have another array with all the lengths?
And how can I know the length of the string read?
Thanks.
You can use this function to get the length without trailing blanks:
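integer function strlen(st)
integer i
character st*(*)
C Scan backwards for the last non-blank character; returns 0 if st is all blanks
do 10 i = len(st), 1, -1
if (st(i:i) .ne. ' ') goto 20
10 continue
i = 0
20 strlen = i
return
end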
Got from here: http://www.ibiblio.org/pub/languages/fortran/ch2-13.html
PS: When you write matname(matidx), you get the whole string (all 255 characters), i.e. your string plus blanks or garbage.
The function Timotei posted will give you the length of the string as long as everything after the part you are interested in is blanks. That should be true if you assign the values in the program, because assigning a shorter value to a CHARACTER variable pads the rest of it with blanks.
However, if you are reading in from a file you might pick up other control characters at the end of the lines (particularly carriage return and/or line feed characters, \r and/or \n depending on your OS). You should also toss those out in the function to get the correct string length. Otherwise you could get some funny print statements as those characters are printed as well.
Here is my version of the function that checks for alternate white space characters at the end besides spaces.
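function strlen(st)
integer i, strlen
character st*(*)
C Skip trailing blanks, CR (char(13)), LF (char(10)) and tabs (char(9)).
C Standard Fortran does not treat '\r', '\n', '\t' as escape sequences.
do 10 i = len(st), 1, -1
if ((st(i:i).ne.' ').and.(st(i:i).ne.char(13)).and.
+ (st(i:i).ne.char(10)).and.(st(i:i).ne.char(9))) goto 20
10 continue
i = 0
20 strlen = i
return
end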
If there are other characters in the "garbage" section this still won't work completely.
Assuming that it does work for your data, however, you can then change your write statement to look like this:
write(*,*) matname(matidx)(1:strlen(matname(matidx)))
and it will print out just the actual string.
As to whether you should use another array to hold the lengths of the strings, that is up to you. The strlen() function is O(n), whereas looking up the length in a table is O(1). If you find yourself computing the lengths of these static strings often, it may improve performance to compute each length once when the strings are read in, store them in an array, and look them up when you need them. However, if you don't notice any slowdown, I wouldn't worry about it.
Depending on the compiler that you are using, you may be able to use the trim() intrinsic function to remove trailing spaces from a string (it does not remove leading ones), then process it as you normally would, i.e.
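The compute-once-then-look-up approach can be sketched in Python as follows. strlen mirrors the backward scan of the Fortran functions above, and the padded list models a CHARACTER*255 array; the names and the second entry ("WATER" with a stray carriage return) are hypothetical test data.

```python
TRAILING = " \r\n\t"  # blank, carriage return, line feed, tab


def strlen(st):
    """Length of st without trailing blanks, CR, LF or tabs (backward scan, O(n))."""
    i = len(st)
    while i > 0 and st[i - 1] in TRAILING:
        i -= 1
    return i


# Model of a CHARACTER*255 array: every name is blank-padded to 255 characters.
matname = [name.ljust(255) for name in ["AIR", "WATER\r"]]
# Compute each length once, then look it up in O(1) whenever it is needed.
matlen = [strlen(name) for name in matname]
print(matlen)  # [3, 5]
```

Python's `and` short-circuits, which is why the i > 0 test can safely guard the index; Fortran's .and. gives no such guarantee, which is why the Fortran versions use a DO loop with an early exit instead.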
character(len=25) :: my_string
my_string = 'AIR'
write (*,*) ':', trim(my_string), ':'
should print :AIR:.
Edit:
Better yet, there is a len_trim() intrinsic (standard since Fortran 90) that returns the length of a string without its trailing blanks.
Intel and Compaq Visual Fortran have the intrinsic function LEN_TRIM(STRING), which returns the length without trailing blanks.
If you want to get rid of leading blanks, use ADJUSTL(STRING) ("adjust left"), which moves leading blanks to the end of the string.
In these Fortran compilers I also note a useful feature (it is in fact standard behaviour for assumed-length arguments): if you pass a string to a function or subroutine as an argument, and inside the subroutine it is declared as CHARACTER*(*), then
using the LEN(STRING) function in the subroutine returns the actual length of the string passed in, not the length of the string as declared in the calling program.
Example:
CHARACTER*1000 STRING
.
.
CALL SUBNAM(STRING(1:72))
SUBROUTINE SUBNAM(STRING)
CHARACTER*(*) STRING
C Inside SUBNAM, LEN(STRING) is 72, not 1000
