why is this trim text trailing not working? - string

IDENTIFICATION DIVISION.
PROGRAM-ID. KATA.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-INPUT PIC A(200).
01 WS-OUT PIC A(200).
01 I PIC 9(08).
01 J PIC 9(08).
01 INP-LEN PIC 9(08).
PROCEDURE DIVISION.
DISPLAY "INPUT YOUR STRING"
ACCEPT WS-INPUT
DISPLAY "REVERSING ......."
MOVE FUNCTION LENGTH(FUNCTION TRIM(WS-INPUT TRAILING)) TO INP-LEN
DISPLAY "Just for reference : Your string is "INP-LEN " long"
MOVE 1 to I.
PERFORM VARYING J from INP-LEN by -1 UNTIL J =0
MOVE WS-INPUT(I:1) to WS-OUT(J:1)
MOVE FUNCTION TRIM(WS-OUT TRAILING) TO WS-OUT
ADD 1 to I
END-PERFORM
MOVE FUNCTION TRIM(WS-OUT TRAILING) TO WS-OUT.
DISPLAY WS-OUT
DISPLAY FUNCTION LENGTH(WS-OUT)
STOP RUN.
Run the program for input ctrl test
If you run the program you will see that the length of WS-INPUT is :
Just for reference : Your string is 00000009 long
But if you do that for output it will say length of string is 200
Also the reversed string I get is :
tset lrtc
Its length is reported as 200, not the trimmed length I expected.
Can someone explain where I went wrong and what can I do to fix it ?
(Note: I initially tried FUNCTION REVERSE with a simple
MOVE FUNCTION REVERSE(WS-INPUT) TO WS-OUT
and the same problem occurred there as well.)

FUNCTION LENGTH (source) returns the length of source. In your case that is WS-OUT, which is PIC A(200), so the answer 200 is correct.
FUNCTION TRIM (source TRAILING), like every function, creates a temporary internal item; here that item holds source with its trailing SPACES removed.
Because you MOVE this temporary item of length 9 to a field of length 200, it is right-padded with spaces again.
Only DYNAMIC LENGTH items change their size on a MOVE; all other items always keep their declared size. [keeping "ODO" out for simplicity...]
You probably want nested function calls instead (TRIM inside LENGTH or REVERSE):
DISPLAY FUNCTION LENGTH ( FUNCTION TRIM (WS-OUT) )
DISPLAY "-" FUNCTION REVERSE ( FUNCTION TRIM (WS-INPUT TRAILING) ) "-"
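To see why the trimmed value "grows back" to 200 characters, here is a rough Python sketch of the behaviour described above. It is only an analogy, not COBOL: move_to_field is a hypothetical helper that imitates a MOVE into a fixed-length PIC A(200) item by truncating or right-padding with spaces.

```python
FIELD_WIDTH = 200  # stands in for PIC A(200)

def move_to_field(value, width=FIELD_WIDTH):
    """Imitate a MOVE into a fixed-length field: truncate or right-pad with spaces."""
    return value[:width].ljust(width)

ws_input = move_to_field("ctrl test")      # ACCEPT fills the whole field
trimmed = ws_input.rstrip(" ")             # temporary item, like FUNCTION TRIM(... TRAILING)
ws_out = move_to_field(trimmed[::-1])      # MOVE pads the reversed text back to 200

print(len(trimmed))                        # length of the temporary item
print(len(ws_out))                         # length of the fixed field
print("-" + ws_out.rstrip(" ") + "-")      # trim again at the point of use
```

The trimmed length only exists in the temporary value, which is why the TRIM has to be applied where the value is used rather than stored back into the fixed field.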


How to 'manipulate' strings in BASIC V2?

I would like to reach the following:
I ask for a number from the user, and then output a string like the following:
-STR$
--STR$
---STR$
----STR$
-----STR$
I tried to do this:
10 INPUT NUM%
20 FOR X=1 TO NUM%: PRINT NUM%*"-" + "TEXT" : NEXT
The code above gave me an error: ?TYPE MISMATCH ERROR IN 20
I haven't yet figured out how to build the start of the string so that the number of '-' marks grows on each loop run.
Maybe this:
10 INPUT NUM%
20 FOR I = 1 TO NUM%
30 FOR J = 1 TO I: PRINT "-"; : NEXT
40 PRINT " TEXT"
50 NEXT
As far as I remember from the (good) old days, there is no multiplication of strings or characters.
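For comparison, this is the output the nested loops above are building, sketched in Python, where string repetition does exist. The function name dash_ladder is made up for illustration.

```python
def dash_ladder(count, text="TEXT"):
    """Return one line per step: 1 dash, 2 dashes, ... up to `count` dashes, each followed by text."""
    return ["-" * i + text for i in range(1, count + 1)]

for line in dash_ladder(5):
    print(line)
```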
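I believe some other BASIC dialects have a STRING$() function. It takes two parameters: the number of times to repeat the character and the character itself. Commodore BASIC V2 itself has no STRING$(), so this only helps on a dialect that provides it. Note that the repeat count has to be the loop variable X, not NUM%, for the number of dashes to grow. So...
10 INPUT NUM%
20 FOR X=1 TO NUM%: PRINT STRING$(X, "-") + "TEXT" : NEXT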
An alternative:
100 INPUT NM%
110 BR$="----------"
120 PRINT LEFT$(BR$,NM%);
130 PRINT "TEXT"
This eliminates the need for an expensive FOR loop, and should be okay as long as NM% is not greater than the length of BR$.
One other thing to point out is that your variable names are effectively capped at two characters:
The length of variable names are optional, but max. 80 chars (logical input line of BASIC). The BASIC interpreter used only the first 2 chars for controlling the using variables. The variables A$ and AA$ are different, but not AB$ and ABC$.
(Source: https://www.c64-wiki.com/wiki/Variable). For that reason I used NM% instead of NUM%; it will prevent issues later.
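A small Python sketch of the rule quoted above may make the collisions easier to spot. It assumes that the type suffix ($ or %) stays part of the name and that only the first two characters before it are significant; effective_name is a hypothetical helper, not anything from BASIC itself.

```python
def effective_name(name):
    """Return the part of a variable name that the interpreter would actually distinguish."""
    suffix = name[-1] if name[-1] in "$%" else ""
    stem = name[:-1] if suffix else name
    return stem[:2] + suffix

for candidate in ["A$", "AA$", "AB$", "ABC$", "NUM%", "NM%"]:
    print(candidate, "->", effective_name(candidate))
```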

Trim trailing zeros in Fortran print/write

How can I print without trailing zeros? For example if there was a function nice:
real*8 ff
ff = -3.5d0
print*, "there are ", nice(ff), " horses"
or a formatter t
print'(a,t,a)', "there are ", ff, " horses"
should give:
there are -3.5 horses
This solution works by limiting the precision (there may be some round-off errors after about 16 significant digits) and then scanning backwards from the end for the last non-zero character:
function nice(ff) result(out)
character(:), allocatable :: out
character(20) :: str
real*8, intent(in) :: ff
integer ii
write(str,'(f20.8)') ff
str = trim(adjustl(str))
do ii = len_trim(str),1,-1
if (str(ii:ii)/="0") exit
enddo
out = str(1:ii)
end
Note that substring notation like str(ii:ii) is required to pick a single character out of a string.
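The same idea (format with a fixed number of decimals, then cut trailing zeros) can be sketched in Python. This is only an illustration of the technique, not a translation of the Fortran; the decimals parameter is an assumption that mirrors the f20.8 format above.

```python
def nice(value, decimals=8):
    """Format with a fixed number of decimals, then strip trailing zeros."""
    text = f"{value:.{decimals}f}"
    return text.rstrip("0")

print("there are", nice(-3.5), "horses")
```

Like the Fortran version, this leaves the decimal point in place for whole numbers, so 2.0 comes out as "2.".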
An alternative and easier approach, which will work so long as the total length is smaller than the character variable 'CDUMMY', is shown below:
PROGRAM MAIN
IMPLICIT NONE
REAL*8 FF
CHARACTER CDUMMY*12
FF = 3.5D0
! ***** WRITE FF TO A CHARACTER VARIABLE. ONCE IN THIS FORM YOU CAN REMOVE LEADING AND TRAILING SPACES
WRITE(CDUMMY,'(F12.1)') FF
WRITE(*,'(3A)') 'There are ',TRIM(ADJUSTL(CDUMMY)),' horses'
END
If you are not too concerned about spaces (as opposed to trailing zeros), then you could simply write using the format specifier Fx.y. For example, to write a floating point number to one decimal place, set y to 1. x is the total width of the number to be output, including the decimal point. So F6.1 would be okay so long as there are no more than 9999.9 horses (I feel sorry for the 10,000th horse). In context, it would look like so:
WRITE(*,'(A,F6.1,A)') 'There are ',FF,' horses'
Which would yield the following (the underscores represent spaces):
There are ___3.5 horses
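For readers more familiar with other languages, the F6.1 edit descriptor behaves much like a fixed-width format in Python. A minimal sketch, with the width 6 and one decimal place taken from the example above:

```python
ff = 3.5
field = f"{ff:6.1f}"          # width 6, one decimal place, right-justified
print("There are " + field + " horses")
```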

Extracting a specific word and a number of tokens on each side of it from each string in a column in SAS?

Extracting a specific word and a number of tokens on each side of it from each string in a column in SAS EG ?
For example,
row1: the sun is nice
row2: the sun looks great
row3: the sun left me
Is there a code that would produce the following result column (2 words where sun is the first):
SUN IS
SUN LOOKS
SUN LEFT
and possibly a second column with COUNT in case of duplicate matches.
So if there were 20 rows with SUN LOOKS, they would be grouped together and have a count of 20.
Thanks
I think you can use the functions findw() and scan() to do what you want. Both of those functions operate on the concept of word boundaries. findw() returns the position of the word in the string (or, with the 'e' modifier used below, its word number). Once you know the position, you can use scan() in a loop to get the next word or words following it.
Here is a simple example to show you the concept. It is by no means a finished or polished solution, but it is intended to point you in the right direction. The input data set (text) contains the sentences you provided in your question with slight modifications. The data step finds the word "sun" in the sentence and creates a variable named fragment that contains 3 words ("sun" + the next 2 words).
data text2;
set text;
length fragment $15;
word = 'sun'; * search term;
fragment_len = 3; * number of words in target output;
word_pos = findw(sentence, word, ' ', 'e'); /* 'e' modifier: returns the word number rather than the character position */
if word_pos then do;
do i = 0 to fragment_len-1;
fragment = catx(' ', fragment, scan(sentence, word_pos+i));
end;
end;
run;
Here is a partial print of the output data set.
You can use a combination of the INDEX, SUBSTR and SCAN functions to achieve this functionality.
INDEX - takes two arguments and returns the position at which a given substring appears in a string. You might use:
INDEX(str,'sun')
SUBSTR - simply returns a substring of the provided string, taking a second numeric argument referring to the starting position of the substring. Combine this with your INDEX function:
SUBSTR(str,INDEX(str,'sun'))
This returns the substring of str from the point where the word 'sun' first appears.
SCAN - returns the 'words' from a string, taking the string as the first argument, followed by a number referring to the 'word'. There is also a third argument that specifies the delimiter, but this defaults to space, so you wouldn't need it in your example.
To pick out the word after 'sun' you might do this:
SCAN(SUBSTR(str,INDEX(str,'sun')),2)
Now all that's left to do is build a new string containing the words of interest. That can be achieved with concatenation operators. To see how to concatenate two strings, run this illustrative example:
data _NULL_;
a = 'Hello';
b = 'World';
c = a||' - '||b;
put c;
run;
The log should contain this line:
Hello - World
That line is the result of displaying the value of the c variable with the put statement. There are a number of functions that can be used to concatenate strings; look in the documentation at CAT, CATX and CATS for some examples.
Hopefully there is enough here to help you.
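Neither answer covers the COUNT column, so here is the overall logic (find the word, take the tokens that follow, then count duplicates) as a rough Python sketch rather than SAS. The function name fragment and the fourth sample row are made up for illustration; in SAS the counting step would be a separate summarising step.

```python
from collections import Counter

def fragment(sentence, word, n_words=2):
    """Return `word` plus the following tokens (n_words in total), upper-cased, or None if absent."""
    tokens = sentence.split()
    lowered = [t.lower() for t in tokens]
    if word.lower() not in lowered:
        return None
    start = lowered.index(word.lower())
    return " ".join(tokens[start:start + n_words]).upper()

rows = [
    "the sun is nice",
    "the sun looks great",
    "the sun left me",
    "the sun looks fine",   # hypothetical extra row to show a duplicate
]

fragments = [fragment(r, "sun") for r in rows]
counts = Counter(f for f in fragments if f is not None)
for text, n in counts.items():
    print(text, n)
```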

MATLAB vs. GNU Octave Textscan disparity

I wish to read some data from a .dat file without saving the file first. In order to do so, my code looks as follows:
urlsearch= 'http://minorplanetcenter.net/db_search/show_object?utf8=&object_id=2005+PM';
url= 'http://minorplanetcenter.net/tmp/2005_PM.dat';
urlmidstep=urlread(urlsearch);
urldata=urlread(url);
received= textscan(urldata , '%5s %7s %1s %1s %1s %17s %12s %12s %9s %6s %6s %3s ' ,'delimiter', '', 'whitespace', '');
data_received = received{:}
urlmidstep's function is just to do a "search", in order to be able to create the temporary .dat file. This data is then stored in urldata, which is a long char array. When I then use textscan in MATLAB, I get 12 columns as desired, which are stored in a cell array data_received.
However, in Octave I get various warning messages: warning: strread: field width '%5s' (fmt spec # 1) extends beyond actual word limit (for various field widths). My question is, why is my result different in Octave and how could I fix this? Shouldn't Octave behave the same as MATLAB, as in theory any differences should be dealt with as bugs?
Surely specifying the width of the strings and leaving both the delimiter and whitespace input arguments empty should tell the function to deal only with the width of each string, allowing spaces to be valid characters.
Any help would be much appreciated.
I think textscan works differently in MATLAB and Octave. To illustrate, let's simplify the example. The code:
test_line = 'K05P00M C2003 01 28.38344309 37 57.87 +11 05 14.9 n~1HzV645';
test = textscan(test_line,'%5s','delimiter','');
test{:}
would yield the following in MATLAB:
>> test{:}
ans =
'K05P0'
'0M C'
'2003 '
'01 28'
'.3834'
'4309 '
'37 57'
'.87 +'
'11 05'
'14.9 '
'n~1Hz'
'V645'
whereas in Octave, you get:
>> test{:}
ans =
{
[1,1] = K05P0
[2,1] = C2003
[3,1] = 01
[4,1] = 28.38
[5,1] = 37
[6,1] = 57.87
[7,1] = +11
[8,1] = 05
[9,1] = 14.9
[10,1] = n~1Hz
}
So it looks like Octave jumps to the next word and discards any remaining characters in the current word, whereas MATLAB treats the whole string as one continuous word.
Why that is and which one is correct, I do not know, but hopefully it'll point you in the right direction for understanding what is going on. You can try adding the delimiter to see how it affects the results.
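The two behaviours can be imitated in a few lines of Python, which may make the difference easier to see. This is only a model of the outputs shown above, not of how either textscan is implemented; both function names and the shortened sample string are made up for illustration.

```python
def fixed_width_fields(text, width):
    """MATLAB-like result above: cut the whole string every `width` characters, spaces included."""
    return [text[i:i + width] for i in range(0, len(text), width)]

def word_limited_fields(text, width):
    """Octave-like result above: one field per whitespace-separated word, truncated to `width`."""
    return [word[:width] for word in text.split()]

sample = "K05P00M C2003 01"   # shortened, hypothetical input
print(fixed_width_fields(sample, 5))
print(word_limited_fields(sample, 5))
```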

Replace multiple substrings using strrep in Matlab

I have a big string (around 25M characters) where I need to replace multiple substrings of a specific pattern in it.
Frame 1
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Frame 2
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Frame 7670
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
The substring I need to remove is the 'Frame #' and it occurs around 7670 times. I can give multiple search strings in strrep, using a cell array
strrep(text,{'Frame 1','Frame 2',..,'Frame 7670'},';')
However that returns a cell array, where in each cell, I have the original string with the corresponding substring of one of my input cell changed.
Is there a way to replace multiple substrings from a string, other than using regexprep? I noticed that it is considerably slower than strrep, that's why I am trying to avoid it.
With regexprep it would be:
regexprep(text,'Frame \d*',';')
and for a string of 25MB it takes around 47 seconds to replace all the instances.
EDIT 1: added the equivalent regexprep command
EDIT 2: added the size of the string for reference, the number of occurrences of the substring, and the execution time of regexprep
Ok, in the end I found a way to go around the problem. Instead of using regexprep to change the substring, I remove the 'Frame ' substring (including whitespace, but not the number)
rawData = strrep(text,'Frame ','');
This results in something like this:
1
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
2
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
7670
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Then, I change all the commas (,) and newline characters (\n) into a semicolon (;), using again strrep, and I create a big vector with all the numbers
rawData = strrep(rawData,sprintf('\r\n'),';');
rawData = strrep(rawData,';;',';');
rawData = strrep(rawData,';;',';');
rawData = strrep(rawData,',',';');
rawData = textscan(rawData,'%f','Delimiter',';');
then I remove the unnecessary numbers (1,2,...,7670), since they are located at a specific point in the array (each frame contains a specific amount of numbers).
rawData{1}(firstInstance:spacing:lastInstance)=[];
And then I go on with my manipulations. It seems that the additional strrep calls and the removal of the values from the array are much, much faster than the equivalent regexprep. With a string of 25M chars, the whole operation takes about 47 seconds with regexprep, while with this workaround it takes only 5 seconds!
Hope this helps somehow.
I think that this can be done using only textscan, which is known to be very fast. By specifying a 'CommentStyle', the 'Frame #' lines are stripped out. This may only work because these 'Frame #' entries are on their own lines. This code returns the raw data as one big vector:
s = textscan(text,'%f','CommentStyle','Frame','Delimiter',',');
s = s{:}
You may want to know how many elements are in each frame or even reshape the data into a matrix. You can use textscan again (or before the above) to get just the data for the first frame:
f1 = textscan(text,'%f','CommentStyle','Frame 1','Delimiter',',');
f1 = f1{:}
In fact, if you just want the elements from the first line, you can use this:
l1 = textscan(text,'%f,','CommentStyle','Frame 1')
l1 = l1{:}
However, the other nice thing about textscan is that you can use it to read in the file directly (it looks like you may be using some other means currently) using just fopen to get an FID. Thus the string data text doesn't have to be in memory.
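The underlying idea (treat the 'Frame #' lines as comments and parse everything else as comma-separated numbers) can be sketched in Python as follows. This illustrates the approach only and says nothing about MATLAB timings; parse_frames and the tiny sample text are made up for illustration.

```python
def parse_frames(text, header_prefix="Frame"):
    """Collect all comma-separated numbers, skipping blank lines and lines that start with the header prefix."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(header_prefix):
            continue
        values.extend(float(tok) for tok in line.split(",") if tok)
    return values

sample = "Frame 1\n0,0,1\n2,3,4\nFrame 2\n5,6,7\n"   # hypothetical miniature input
print(parse_frames(sample))
```

Iterating over an open file object instead of text.splitlines() would avoid holding the whole string in memory, which is the same point made above about reading from a file directly.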
Using regular expressions:
result = regexprep(text,'Frame [0-9]+','');
It's possible to avoid regular expressions as follows. I use strrep with suitable replacement strings that act as masks. The obtained strings are equal-length and are assured to be aligned, and can thus be combined into the final result using the masks. I've also included the ; you want. I don't know if it will be faster than regexprep or not, but it's definitely more fun :-)
% Data
text = 'Hello Frame 1 test string Frame 22 end of Frame 2 this'; %//example text
rep_orig = {'Frame 1','Frame 2','Frame 22'}; %//strings to be replaced.
%//May be of different lengths
% Computations
rep_dest = cellfun(@(s) char(zeros(1,length(s))), rep_orig, 'uni', false);
%//series of char(0) of same length as strings to be replaced (to be used as mask)
aux = cell2mat(strrep(text,rep_orig.',rep_dest.'));
ind_keep = all(double(aux)); %//keep characters according to mask
ind_semicolon = diff(ind_keep)==1; %//where to insert ';'
ind_keep = ind_keep | [ind_semicolon 0]; %// semicolons will also be kept
result = aux(1,:); %//for now
result(ind_semicolon) = ';'; %//include `;`
result = result(ind_keep); %//remove unwanted characters
With these example data:
>> text
text =
Hello Frame 1 test string Frame 22 end of Frame 2 this
>> result
result =
Hello ; test string ; end of ; this
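For anyone who finds the masking trick hard to follow, here is the same logic as a rough Python sketch (not MATLAB, and with no claim about speed). Each search string is replaced by an equal-length run of NUL characters in its own copy of the text; a position is kept only if no copy has a NUL there, and the last masked position before a kept one becomes the ';'. The name mask_replace is made up for illustration.

```python
def mask_replace(text, targets, marker=";"):
    """Remove every occurrence of the targets, leaving one marker where each removed run ended."""
    # One equal-length copy per target, with that target blanked out by NULs.
    variants = [text.replace(t, "\0" * len(t)) for t in targets]
    # Keep a position only if it is untouched in every copy.
    keep = [all(v[i] != "\0" for v in variants) for i in range(len(text))]
    out = []
    for i, ch in enumerate(text):
        if keep[i]:
            out.append(ch)
        elif i + 1 < len(text) and keep[i + 1]:
            out.append(marker)   # end of a masked run that is followed by kept text
    return "".join(out)

text = "Hello Frame 1 test string Frame 22 end of Frame 2 this"
print(mask_replace(text, ["Frame 1", "Frame 2", "Frame 22"]))
```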
