Squeak (Smalltalk): how to use the method `findSubstring:in:startingAt:matchTable:`?

What should I pass as the matchTable: argument?
The implementation has no examples or detailed explanation, so
I don't understand which object receives the message if I pass the string as the in: argument.

The matchTable: keyword provides a way to identify characters so that they become equivalent in comparisons. The argument is usually a ByteArray of 256 entries, containing at position i the code point of the ith character to be considered when comparing.
The main use of the table is to implement case-insensitive searches, where, e.g., A=a. Thus, instead of comparing the characters at hand during the search, what are compared are the elements found in the matchTable at their respective code points. So, instead of
(string1 at: i) = (string2 at: j)
the test becomes something along the lines of
cp1 := string1 basicAt: i.
cp2 := string2 basicAt: j.
(table at: cp1 + 1) = (table at: cp2 + 1). "+ 1 because code points start at 0 while the table is indexed from 1"
In other words, the matchTable: argument is used to map actual characters to the ones that actually matter for the comparisons.
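To make the table lookup concrete, here is a minimal sketch in Python rather than Smalltalk. It is not Squeak's implementation: the function names, the 0-based indices and the -1 "not found" result are assumptions of this sketch. It only shows how a 256-entry table that folds 'A'..'Z' onto 'a'..'z' turns an ordinary substring search into a case-insensitive one.

```python
def build_case_insensitive_table():
    """Build a 256-entry table mapping each byte value to its lowercase ASCII form."""
    table = bytearray(range(256))          # start from the identity mapping
    for code in range(ord("A"), ord("Z") + 1):
        table[code] = code + 32            # fold 'A'..'Z' onto 'a'..'z'
    return bytes(table)


def find_substring(key, body, start, table):
    """Return the 0-based index of the first match of key in body at or after
    start, comparing table entries instead of raw bytes; return -1 if absent."""
    if not key:
        return -1
    for i in range(start, len(body) - len(key) + 1):
        # Compare what the table says about each byte, not the bytes themselves.
        if all(table[body[i + j]] == table[key[j]] for j in range(len(key))):
            return i
    return -1


IDENTITY = bytes(range(256))               # case-sensitive: every byte maps to itself
FOLDED = build_case_insensitive_table()    # case-insensitive: 'A' and 'a' compare equal
```

With FOLDED, searching for b"WORLD" in b"hello world" finds the match; with IDENTITY the same search fails, because no byte is mapped onto another.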
Note that the same technique can be applied for case-sensitive/insensitive sorting.
Finally, bear in mind that this is a rather low-level method that non-system programmers would rarely need. You should instead use the higher-level methods for finding substrings, such as findString:startingAt:caseSensitive:, where the argument of the last keyword is a Boolean.

Related

Way to find a number at the end of a string in Smalltalk

I have different commands my program is reading in (i.e., print, count, min, max, etc.). These words can also include a number at the end of them (i.e., print3, count1, min2, max6, etc.). I'm trying to figure out a way to extract the command and the number so that I can use both in my code.
I'm struggling to figure out a way to find the last element in the string in order to extract it, in Smalltalk.
You didn't say which flavor of Smalltalk you use, so I will explain what I would do in Pharo, which is the one I'm familiar with.
As someone who has been playing with Pharo for a few months at most, I can tell you the sheer number of classes and methods available can feel overwhelming at first, but the environment actually makes it easy to find things. For example, when you know the exact input and output you want, but don't know whether a method already exists somewhere, or what it is called, the Finder lets you search by giving an example. You can open it from the World menu.
By default it looks for selectors (method names) matching your input terms.
But this default is not what we need right now, so you must change the option in the upper right box to "Examples", and type in the search field an example of the input, followed by the output you want, separated by a ".". The input example I used was the string 'max6', followed by the desired result, the number 6. Pharo then gives me a list of methods that match.
To find what would return the text part, you can make a new search, changing the example output from the number 6 to the string 'max'.
Fortunately there are several built-in methods matching the description of your problem.
There are more elegant ways, I suppose, but you can make use of the fact that String>>#asNumber only parses the part it can recognize. So you can do
'print31' reversed asNumber asString reversed asNumber
to give you 31. That only works if there actually is a number at the end, and it drops trailing zeros: 'print30' would give 3, because the reversed string starts with '03'.
This is one of those cases where we can presume the input data has a specific form, i.e., digits appear only at the end of the string, and you want all of them. In that case it's not too hard to do, really, just:
numText := 'Kalahari78' select: [ :each | each isDigit ].
num := numText asInteger. "78"
To get the rest of the string without the digits, you can just use this:
'Kalahari78' withoutTrailingDigits. "Kalahari"
As some of the Pharo "OGs" pointed out, you can take a look at the String class (just type CMD-Return, type in String, hit Return) and you will find an amazing number of methods for all kinds of things. Usually you can get some ideas from those. But then there are times when you really just need an answer!
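For readers outside Pharo, the two steps used in the answers above (keep the text, convert the trailing digits) can be sketched in Python. The function name split_command is hypothetical, and the sketch assumes, like the answers, that digits matter only at the end of the word.

```python
def split_command(word):
    """Split a command such as 'print31' into its name and trailing number.

    Returns (name, number); number is None when the word has no trailing digits.
    """
    name = word.rstrip("0123456789")   # drop the trailing run of ASCII digits
    digits = word[len(name):]          # whatever was dropped is the number text
    return name, (int(digits) if digits else None)
```

Because the digits are converted in one piece, a command like 'print30' keeps its trailing zero, and a plain 'count' simply yields no number.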

Extract substring using index in Pharo Smalltalk

I'm trying to get a substring from an initial string in Smalltalk. I'm wondering if there's a way to do it. For example in Java, the method aStringObject.substring(index), allows you to trim a String object using an index (or its position in the array). I've been looking in the browser for something that works in a similar way, but couldn't find it. So far every trimming method uses a character or string to do the separation.
As an example of what I'm looking for:
initialString := 'Hello'.
finalString := initialString substring: 1
The value of finalString should be 'ello'.
In Smalltalk, a String is a kind of SequenceableCollection, so you can use the copying protocol messages as well.
For example you could use:
copyFrom: start to: stop (answers a copy of the elements from index start through index stop)
allButFirst (answers a copy without the first character)
allButFirst: n (more generally, answers a copy of the receiver containing all but the first n elements)
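For comparison with index-based substring calls in other languages, the messages listed above can be mimicked in Python. This is only an illustrative sketch: the function names are made up, and unlike Smalltalk, Python slices silently clamp out-of-range indices instead of signalling an error.

```python
def copy_from_to(text, start, stop):
    """Mimic copyFrom:to: with 1-based indices, inclusive on both ends."""
    return text[start - 1:stop]


def all_but_first(text, n=1):
    """Mimic allButFirst and allButFirst: n by dropping the first n characters."""
    return text[n:]
```

For the 'Hello' example from the question, both copy_from_to("Hello", 2, 5) and all_but_first("Hello") produce "ello".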

COBOL substring between two finite points

I understand that the string_variable(start:length) can be used to get a substring of a string given a starting point and substring length, however, I am finding that I often need to get a substring between a 'start' and 'end' point.
While I know I could always do this:
SUBTRACT start FROM end GIVING len
string(start:len)
It seems cumbersome to have to do so every time when I am writing programs that use this functionality often. Is there perhaps a quicker/built-in way of achieving this?
How about?
move str (start-pos : end-pos - start-pos + 1) to ...
You can subtract the first from the last, but you need to add 1 to get the correct length.
STRING is a statement name, as is START, and END is reserved. LENGTH is a function name. I avoid those in anything that looks like code.
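The "+ 1" is easy to get wrong, so here is the arithmetic as a small Python sketch. The helper names refmod and between are hypothetical; refmod only imitates the (start:length) form with a 1-based start and performs no bounds checking.

```python
def refmod(text, start, length):
    """Imitate text(start:length): 1-based start position, explicit length."""
    return text[start - 1:start - 1 + length]


def between(text, start_pos, end_pos):
    """Substring from start_pos through end_pos, both 1-based and inclusive."""
    # An inclusive range of positions holds end_pos - start_pos + 1 characters.
    return refmod(text, start_pos, end_pos - start_pos + 1)
```

For "ABCDEFG" with positions 3 and 5, between gives "CDE", while using the plain difference 5 - 3 as the length gives only "CD".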

Find the maximal input string matching a regular expression

Given a regular expression re and an input string str, I want to find the maximal substring of str, which starts at the minimal position, which matches re.
Special case:
re = Regex("a+|[ax](bc)*"); str = "yyabcbcb"
matching re with str should return the matching string "abcbc" (and not "a", as PCRE does). I also have in mind, that the result is as I want, if the order of the alternations is changed.
The options I found were:
POSIX extended RE - probably outdated, used by egrep ...
RE2 by Google - open source RE2 - C++ - also C-wrapper available
From my point of view, there are two problems with your question.
First, changing the order of the alternations is expected to change the results.
For each single 'a' in the string, it can either match 'a+' or "ax*".
So it is ambiguous for matching 'a' to alternations in your regular expression.
Second, finding the maximal substring requires an engine that returns the longest match. As far as I know, only RE2 provides such a feature, as mentioned by @Cosinus.
So my recommendation is to separate "a+|ax*" into two regexes, find the maximal substring for each of them, and then compare the positions of both substrings.
To find the longest match, you can also refer to a previous regex post description here. The main idea is to search for substrings starting from string position 0 up to len(str) and to keep track of the length and position whenever a matching substring is found.
P.S. Some languages provide regex functions similar to "findall()". Be careful when using them, since they may return only non-overlapping matches, and non-overlapping matches do not necessarily contain the longest matching substring.
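The brute-force idea described above (try every start position, and for each one prefer the longest substring that matches) can be sketched with Python's backtracking re module. The function name leftmost_longest is hypothetical, empty matches are skipped as a simplifying assumption, and the sketch makes on the order of n² match attempts, so it is only suitable for short inputs.

```python
import re


def leftmost_longest(pattern, text):
    """Return (start, substring) for the longest match at the earliest start
    position, or None when nothing matches. Empty matches are ignored."""
    compiled = re.compile(pattern)
    for start in range(len(text)):
        # Try the longest candidate first, so the first hit is the longest match.
        for end in range(len(text), start, -1):
            if compiled.fullmatch(text, start, end):
                return start, text[start:end]
    return None
```

For the special case in the question, this returns "abcbc" starting at index 2 regardless of the order of the alternatives, whereas a plain re.search with the original order stops at "a".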

How do I TRIM a character array in standard F77?

I'm reading from ASCII data files with text headers. (The headers contain info about the data run.) I want to add some of the columns of each data file, then write the result to another data file, but keep the headers for each of the files. The problem is, I don't know beforehand what the lengths of the header lines are. If I use a long character variable (character*400, for example) to make sure I get the entire header lines, then my new data files have lots of white space I don't want. Basically, I want to do TRIM(HeaderVariable), but TRIM is not available to me. Any suggestions? Is there a way to WRITE only to a CrLF? I thought of using an array of character*1, and testing each character as I read it and write it, but...wow, that's sooooo complicated. Is there a simpler way to do this in standard F77?
[edit: self-answer moved to answer. could not do it at first because rep was too low.]
I got the answer. Posting here to help others. The LENGTH function below is taken from http://www.star.le.ac.uk/~cgp/prof77.html#tth_sEc7 Once you've got this LENGTH function, it's trivial to implement your own TRIM function. Functionally, this isn't much different from my initial horrid idea, but it's prettier.
LEN The LEN function takes a character argument and returns its length as an integer. The argument may be a local character variable or array element but this will just return a constant. LEN is more useful in procedures where character dummy arguments (and character function names) may have their length passed over from the calling unit, so that the length may be different on each procedure call. The length returned by LEN is that declared for the item. Sometimes it is more useful to find the length excluding trailing blanks. The next function does just that, using LEN in the process.
      INTEGER FUNCTION LENGTH(STRING)
*     Returns length of STRING ignoring trailing blanks
      CHARACTER*(*) STRING
      DO 15, I = LEN(STRING), 1, -1
         IF (STRING(I:I) .NE. ' ') GO TO 20
 15   CONTINUE
 20   LENGTH = I
      END
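For readers who do not read fixed-form Fortran, the same backward scan can be written in Python. This is only a sketch of the logic above, with a hypothetical function name; it treats the header as a fixed-width, blank-padded field and uses Python's 0-based indexing internally.

```python
def length_without_trailing_blanks(field):
    """Return the length of field ignoring trailing blanks (0 if all blank)."""
    # Walk backwards from the last character until a non-blank is found.
    for i in range(len(field), 0, -1):
        if field[i - 1] != " ":
            return i
    return 0
```

Writing only the first length_without_trailing_blanks(header) characters of a padded header then has the effect of a TRIM.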

Resources