Indices of a substring in Smalltalk - string

It seems Smalltalk implementations lack an algorithm which returns all the indices of a substring in a String. The most similar ones return only one index of an element, for example: firstIndexesOf:in:, findSubstring:, findAnySubstring: variants.
There are implementations in Ruby, but the first one relies on a Ruby hack, the second one ignores overlapping Strings and so does not work, and the last one uses an Enumerator class which I don't know how to translate to Smalltalk. I wonder if this Python implementation is the best place to start, since it considers both cases, overlapping or not, and does not use regular expressions.
My goal is to find a package or method which provides the following behavior:
'ABDCDEFBDAC' indicesOf: 'BD'. "#(2 8)"
When overlapping is considered:
'nnnn' indicesOf: 'nn' overlapping: true. "#(0 2)"
When overlapping is not considered:
'nnnn' indicesOf: 'nn' overlapping: false. "#(0 1 2)"
In Pharo, when a text is selected in a Playground, a scanner detects the substring and highlights matches. However I couldn't find a String implementation of this.
My best effort so far results in this implementation in String (Pharo 6):
indicesOfSubstring: subString
| indices i |
indices := OrderedCollection new: self size.
i := 0.
[ (i := self findString: subString startingAt: i + 1) > 0 ] whileTrue: [
indices addLast: i ].
^ indices

Let me firstly clarify that Smalltalk collections are 1-based, not 0-based. Therefore your examples should read
'nnnn' indexesOf: 'nn' overlapping: false. "#(1 3)"
'nnnn' indexesOf: 'nn' overlapping: true. "#(1 2 3)"
Note that I've also taken notice of @lurker's observation (and have tweaked the selector too).
Now, starting from your code I would change it as follows:
indexesOfSubstring: subString overlapping: aBoolean
| n indexes i |
n := subString size.
indexes := OrderedCollection new. "removed the size"
i := 1. "1-based"
[
i := self findString: subString startingAt: i. "split condition"
i > 0]
whileTrue: [
indexes add: i. "add: = addLast:"
i := aBoolean ifTrue: [i + 1] ifFalse: [i + n]]. "new!"
^indexes
Make sure you write a few unit tests (and don't forget to exercise the border cases!)
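For readers who want to cross-check the expected results outside a Smalltalk image, the loop above can be sketched in Python. This is only an illustration under stated assumptions: the name indexes_of_substring is made up here, an empty substring is assumed to yield no positions, and 1 is added to Python's 0-based positions so the answers match Smalltalk's 1-based ones.

```python
def indexes_of_substring(text, sub, overlapping):
    """Return the 1-based start positions of sub in text (hypothetical helper)."""
    if not sub:
        return []  # assumption: an empty substring has no meaningful positions
    indexes = []
    start = 0  # Python's str.find is 0-based
    while True:
        pos = text.find(sub, start)
        if pos == -1:
            break
        indexes.append(pos + 1)  # convert to a Smalltalk-style 1-based index
        # Step by one to allow overlaps, or skip past the whole match.
        start = pos + 1 if overlapping else pos + len(sub)
    return indexes


print(indexes_of_substring('nnnn', 'nn', overlapping=True))    # [1, 2, 3]
print(indexes_of_substring('nnnn', 'nn', overlapping=False))   # [1, 3]
print(indexes_of_substring('ABDCDEFBDAC', 'BD', overlapping=False))  # [2, 8]
```

The three calls double as the border-case checks suggested above: overlapping matches, non-overlapping matches, and a plain string with two separate hits.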

Edited
It would also be nice if you would tell us what you need to achieve in the "greater picture". Sometimes Smalltalk offers different approaches.
Leandro beat me to the code (and his code is more efficient), but I have already written it so I'll share it too. Heed his advice on Smalltalk being 1-based => rewritten example.
I have used Smalltalk/X and Pharo 6.1 for the example.
The code would be:
indexesOfSubstring: substringToFind overlapping: aBoolean
| substringPositions aPosition currentPosition substringToFindSize |
substringPositions := OrderedSet new. "with overlap on you could get multiple same
positions in the result when there is more to find in the source string"
substringToFindSize := substringToFind size. "speed up for large strings"
aPosition := 1.
[ self size >= aPosition ] whileTrue: [
currentPosition := self findString: substringToFind startingAt: aPosition.
(currentPosition = 0) ifTrue: [ aPosition := self size + 1 ] "ends the loop substringToFind is not found"
ifFalse: [
substringPositions add: currentPosition.
aBoolean ifTrue: [ aPosition := aPosition + 1 ] "overlapping is on"
ifFalse: [ aPosition := currentPosition + substringToFindSize ] "overlapping is off"
]
].
^ substringPositions
I have fixed some issues that occurred to me. Don't forget to test it as much as you can!

Related

Not able to create instance of object

I am just starting to use gnu-smalltalk. I have taken the following code from here to define a class:
Number subclass: Complex [
| realpart imagpart |
"This is a quick way to define class-side methods."
Complex class >> new [
<category: 'instance creation'>
^self error: 'use real:imaginary:'
]
Complex class >> new: ignore [
<category: 'instance creation'>
^self new
]
Complex class >> real: r imaginary: i [
<category: 'instance creation'>
^(super new) setReal: r setImag: i
]
setReal: r setImag: i [ "What is this method with 2 names?"
<category: 'basic'>
realpart := r.
imagpart := i.
^self
]
]
However, I am not able to create any instances of this class. I have tried various methods, and the following gives the fewest errors!
cn := Complex new: real:15 imaginary:25
cn printNl
The error is:
complexNumber.st:24: expected object
Mostly the error is as follows, e.g. if there is no colon after new keyword:
$ gst complexNumber.st
Object: Complex error: use real:imaginary:
Error(Exception)>>signal (ExcHandling.st:254)
Error(Exception)>>signal: (ExcHandling.st:264)
Complex class(Object)>>error: (SysExcept.st:1456)
Complex class>>new (complexNumber.st:7)
UndefinedObject>>executeStatements (complexNumber.st:25)
nil
Also, I don't understand this method with 2 names, each with one argument:
setReal: r setImag: i [ "How can there be 2 names and arguments for one method/function?"
<category: 'basic'>
realpart := r.
imagpart := i.
^self
]
I believe a usual method has one name and argument(s), as in the code here:
spend: amount [
<category: 'moving money'>
balance := balance - amount
]
To create the Complex number 25 + 25i evaluate
Complex real: 25 imaginary: 25
How do I know? Because the first part of your question reads
Complex class >> real: r imaginary: i [
<category: 'instance creation'>
^(super new) setReal: r setImag: i
]
Your mistake was to write Complex new: real: 25 imaginary: 25, which doesn't conform to the Smalltalk syntax.
The Smalltalk syntax for a message with, say, 2 (or more) arguments consists of 2 (or more) keywords, ending with colon, followed, each of them, by the corresponding argument.
For example, the method setReal: r setImag: i has two keywords, namely setReal: and setImag: and receives two arguments r and i. The name of the method, which in Smalltalk is called its selector is the Symbol that results from concatenating the keywords, in this case setReal:setImag:.
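The rule for building a selector can be illustrated with a toy string manipulation in Python. This is not how a Smalltalk compiler parses messages; the helper name selector_of is hypothetical, and it assumes the pattern is written with each keyword separated from its argument by a space.

```python
def selector_of(pattern):
    """Join the keywords (tokens ending in ':') of a keyword message pattern."""
    return ''.join(token for token in pattern.split() if token.endswith(':'))


print(selector_of('setReal: r setImag: i'))    # setReal:setImag:
print(selector_of('real: 25 imaginary: 25'))   # real:imaginary:
print(selector_of('spend: amount'))            # spend:
```

So a two-keyword method such as setReal:setImag: still has exactly one name; the name is simply spread around its arguments.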

Unexpected Hash flattening

I'm looking for an explanation of why these two data structures are not equal:
$ perl6 -e 'use Test; is-deeply [ { a => "b" } ], [ { a => "b" }, ];'
not ok 1 -
# Failed test at -e line 1
# expected: $[{:a("b")},]
# got: $[:a("b")]
Trailing comma in Hashes and Arrays is meaningless just like in P5:
$ perl6 -e '[ 1 ].elems.say; [ 1, ].elems.say'
1
1
But without it the Hash is somehow lost and gets flattened to an array of Pairs:
$ perl6 -e '[ { a => "b", c => "d" } ].elems.say;'
2
I suspect some Great List Refactor laws apply here, but I'd like a more detailed explanation to understand the logic behind this flattening.
Trailing comma in Hashes and Arrays is meaningless just like in P5
No, it's not meaningless:
(1 ).WHAT.say ; # (Int)
(1,).WHAT.say ; # (List)
The big simplification in the Great List Refactor was switching to the single argument rule for iterating features [1]. That is to say, features like a for or the array and hash composers (and subscripts) always get a single argument. That is indeed what's going on with your original example.
The single argument may be -- often will be -- a list of values, possibly even a list of lists etc., but the top level list would still then be a single argument to the iterating feature.
If the single argument to an iterating feature does the Iterable role (for example lists, arrays, and hashes), then it's iterated. (This is an imprecise formulation; see my answer to "When does for call the iterator method?" for a more precise one.)
So the key thing to note here about that extra comma is that if the single argument does not do the Iterable role, such as 1, then the end result is exactly the same as if the argument were instead a list containing just that one value (i.e. 1,):
.perl.say for {:a("b")} ; # :a("b") Iterable Hash was iterated
.perl.say for {:a("b")} , ; # {:a("b")} Iterable List was iterated
.perl.say for 1 ; # 1 Non Iterable 1 left as is
.perl.say for 1 , ; # 1 Iterable List was iterated
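As a rough analogy only, the single argument rule can be modelled in Python, where a trailing comma also turns a value into a one-element tuple. The function compose_array is a hypothetical toy, not Raku semantics: it assumes that a dict stands in for a Hash (iterating as its pairs) and a tuple stands in for a List.

```python
from collections.abc import Iterable


def compose_array(arg):
    """Toy model: iterate the single argument if it is iterable, else wrap it."""
    if isinstance(arg, dict):
        return list(arg.items())  # a hash iterates as its key/value pairs
    if isinstance(arg, Iterable) and not isinstance(arg, str):
        return list(arg)          # a list is iterated, its elements are kept whole
    return [arg]                  # a non-iterable value is left as is


print(len(compose_array({'a': 'b', 'c': 'd'})))  # 2
print(compose_array(({'a': 'b'},)))              # [{'a': 'b'}]
print(compose_array(1), compose_array((1,)))     # [1] [1]
```

In this model the hash passed on its own is flattened into two pairs, while the same hash inside a one-element tuple survives intact, which mirrors the difference the trailing comma makes above.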
The typical way "to preserve structure [other than] using trailing comma when single element list is declared" (see comment below), i.e. to stop a single Iterable value being iterated as it normally would, is by item-izing it with a $:
my @t = [ $[ $[ "a" ] ] ];
@t.push: "b";
@t.perl.say; # [[["a"],], "b"]
[1] The iteration is used to get values to be passed to some code in the case of a for; to get values to become elements of the array/hash being constructed in the case of a composer; to get an indexing slice in the case of a subscript; and so on for other iterating features.

Time formatting (HH:MM:SS) in any Smalltalk dialect

I have three integer values, say
h := 3.
m := 19.
s := 8.
I would like to produce the string '03:19:08'. I know how to turn a number into a string, and even pad it with a zero if necessary. So as a first pass I wrote this absolutely horrific code:
h < 10 ifTrue: [hs := '0', (h asString)] ifFalse: [hs := h asString].
m < 10 ifTrue: [ms := '0', (m asString)] ifFalse: [ms := m asString].
s < 10 ifTrue: [ss := '0', (s asString)] ifFalse: [ss := s asString].
Transcript show: hs, ':', ms, ':', ss.
Transcript nl.
Now obviously I need to clean this up and so was wondering, among other things what the most idiomatic Smalltalk approach would be here. Could it be something like (not legal Smalltalk obviously):
aCollectionWithHMS each [c | padWithZero] join ':'
I found a discussion on streams with a print method taking a separatedBy argument but wouldn't there be a simpler way to do things just with strings?
Or perhaps there is a more elegant way to pad the three components and then I could just return hs, ':', ms, ':', ss ?
Or, is there an interface to POSIX time formatting (or something similar) common to all Smalltalks? I know GNU Smalltalk can link to C but this is way too much overkill for this simple problem IMHO.
EDIT
I got a little closer:
z := {h . m . s} collect: [:c | c < 10 ifTrue: ['0', c asString] ifFalse: [c asString]].
(Transcript show: ((z at: 1), ':', (z at: 2), ':', (z at: 3))) nl.
But the direct access of collection elements makes me sad. I found a page documenting the joining method asStringWith, but that method, it seems, is unsupported in GNU Smalltalk.
Here is a way to do this in Pharo:
String streamContents: [:stream |
{h.m.s}
do: [:token | token printOn: stream base: 10 nDigits: 2]
separatedBy: [stream nextPut: $:]]
Explanation:
The streamContents: message answers with the contents of the WriteStream represented by the formal block argument stream.
The do:separatedBy: message enumerates the tokens h, m and s evaluating the do: block for each of them and inserting the evaluation of the second block between consecutive tokens.
The printOn:base:nDigits: message dumps on the stream the base 10 representation of the token padded to 2 digits.
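For comparison, the same three steps (open a write stream, emit each token padded to two digits, put a colon between consecutive tokens) can be sketched in Python. The function name hhmmss is borrowed from the suggestion further down and is otherwise hypothetical; io.StringIO plays the role of the WriteStream here.

```python
import io


def hhmmss(h, m, s):
    """Write h, m and s to an in-memory stream as zero-padded HH:MM:SS."""
    stream = io.StringIO()
    for index, token in enumerate((h, m, s)):
        if index > 0:
            stream.write(':')          # the "separatedBy" part
        stream.write(f'{token:02d}')   # base 10, padded to 2 digits
    return stream.getvalue()


print(hhmmss(3, 19, 8))  # 03:19:08
```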
If the dialect you are using doesn't have the printOn:base:nDigits: method (or any appropriate variation of it), you can do the following:
String streamContents: [:stream |
{h.m.s}
do: [:token |
token < 10 ifTrue: [stream nextPut: $0].
stream nextPutAll: token asString]
separatedBy: [stream nextPut: $:]]
Finally, if you think you will be using this a lot, I would recommend adding the message hhmmss to Time (instance side), implemented as above with self hours instead of h, etc. Then it would be a matter of sending
(Time hour: h minute: m second: s) hhmmss
assuming you have these three quantities instead of a Time object, which would be unusual. Otherwise, you would only need something like
aTime hhmmss
ADDENDUM
Here is another way that will work on any dialect:
{h.m.s}
inject: ''
into: [:r :t | | pad colon |
pad := t < 10 ifTrue: ['0'] ifFalse: [''].
colon := r isEmpty ifTrue: [''] ifFalse: [':'].
r , colon, pad, t asString]
The inject:into: method builds its result from the inject: argument (the empty String in this case) and keeps replacing the formal block argument r with the value of the previous iteration. The second formal argument t is replaced with the corresponding element of each iteration.
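Readers coming from other languages may know inject:into: as a fold. A minimal Python sketch of the same accumulation, using functools.reduce with the empty string as the initial value, could look like this; the names step and join_hms are made up for the illustration.

```python
from functools import reduce


def step(r, t):
    """One iteration: r is the result so far, t is the current element."""
    pad = '0' if t < 10 else ''
    colon = '' if r == '' else ':'
    return r + colon + pad + str(t)


def join_hms(h, m, s):
    # '' plays the role of the inject: argument.
    return reduce(step, [h, m, s], '')


print(join_hms(3, 19, 8))  # 03:19:08
```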
ADDENDUM 2
time := '00:00:00' copy.
{h asString. m asString. s asString} withIndexDo: [:t :i |
time at: i - 1 * 3 + 2 put: t last.
t size = 2 ifTrue: [time at: i - 1 * 3 + 1 put: t first]].
^time
The copy is necessary to make sure that the literal is not modified.

How to assign a value to a position in a sequence in XQuery?

I have a sequence of length $n initialized to zeroes:
let $seq := (for $i in (1 to $n) return 0)
I can access a position easily...
return $seq[5]
...but how do I update it? (the following doesn't work)
let $seq[5] := $seq[5] + 1
If you're using an XQuery implementation which supports XQuery 3 maps (eg. Saxon, BaseX), you could use these:
declare namespace map="http://www.w3.org/2005/xpath-functions/map";
(: Fill map with square numbers :)
let $map := map:new(
for $i in (1 to 10)
return map:entry($i, $i*$i)
)
(: Overwrite a single value :)
let $map := map:new(($map, map:entry(2, 5)))
(: Fetch this value :)
return map:get($map, 2)
But generally it is possible to solve a problem without maps and in most cases this code will probably run faster as it will get better optimized.
XQuery is a functional, not a procedural language, so variables are immutable - they cannot be updated once assigned. You would need to do something like this and create a new sequence:
let $seq2 :=
for $n at $pos in $seq
return
if ($pos eq 5)
then $n + 1
else $n
Generally, in XQuery, it's best to design algorithms so that this type of mutable variable workaround isn't necessary. If you have data that needs to be updated, consider putting it in the database.
...but how do I update it? (the following doesn't work)
let $seq[5] := $seq[5] + 1
Using pure XPath (which is also pure XQuery :) here is probably the shortest way to specify this:
subsequence($seq, 1, 4), $seq[5] + 1, subsequence($seq, 6)
This produces a new sequence whose items are the same as the items of $seq, except that its 5th item's value is $seq[5] + 1.
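The same "build a new sequence instead of updating in place" idea can be sketched in Python with slicing. This is only an analogy: the helper with_incremented is hypothetical, and it takes a 1-based position to stay close to the XPath expression.

```python
def with_incremented(seq, position):
    """Return a new list equal to seq, except the item at the 1-based position is incremented."""
    i = position - 1
    return seq[:i] + [seq[i] + 1] + seq[i + 1:]


seq = [0] * 8
new_seq = with_incremented(seq, 5)
print(new_seq)  # [0, 0, 0, 0, 1, 0, 0, 0]
print(seq)      # [0, 0, 0, 0, 0, 0, 0, 0]
```

The original list is left untouched, just as $seq keeps its value in XQuery.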
As others have noted, XPath and XQuery are functional languages and among other things this means that a variable, once defined, cannot have its value modified.

Smalltalk - Compare two strings for equality

I am trying to compare two strings in Smalltalk, but I seem to be doing something wrong.
I keep getting this error:
Unhandled Exception: Non-boolean receiver. Proceed for truth.
stringOne := 'hello'.
stringTwo := 'hello'.
myNumber := 10.
[stringOne = stringTwo ] ifTrue:[
myNumber := 20].
Any idea what I'm doing wrong?
Try
stringOne = stringTwo
ifTrue: [myNumber := 20]
I don't think you need square brackets in the first line.
Found a great explanation. The whole thing is here
In Smalltalk, booleans (ie, True or False) are objects: specifically, they're instantiations of the abstract base class Boolean, or rather of its two subclasses True and False. So every boolean has type True or False, and no actual member data. Boolean has two virtual functions, ifTrue: and ifFalse:, which take as their argument a block of code. Both True and False override these functions; True's version of ifTrue: calls the code it's passed, and False's version does nothing (and vice-versa for ifFalse:). Here's an example:
a < b
ifTrue: [^'a is less than b']
ifFalse: [^'a is greater than or equal to b']
Those things in square brackets are essentially anonymous functions, by the way. Except they're objects, because everything is an object in Smalltalk. Now, what's happening there is that we call a's "<" method, with argument b; this returns a boolean. We call its ifTrue: and ifFalse: methods, passing as arguments the code we want executed in either case. The effect is the same as that of the Ruby code
if a < b then
puts "a is less than b"
else
puts "a is greater than or equal to b"
end
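The dispatch described above can be modelled in a few lines of Python. This is a sketch of the idea, not of any real Smalltalk implementation: the class names TrueObject and FalseObject, the method names, and the helper less_than are all hypothetical, and lambdas stand in for blocks.

```python
class TrueObject:
    """Stands in for Smalltalk's True: runs the 'true' block."""

    def if_true(self, block):
        return block()

    def if_false(self, block):
        return None

    def if_true_if_false(self, true_block, false_block):
        return true_block()


class FalseObject:
    """Stands in for Smalltalk's False: runs the 'false' block."""

    def if_true(self, block):
        return None

    def if_false(self, block):
        return block()

    def if_true_if_false(self, true_block, false_block):
        return false_block()


def less_than(a, b):
    # Plays the role of sending "<": the answer is a boolean object.
    return TrueObject() if a < b else FalseObject()


print(less_than(1, 2).if_true_if_false(
    lambda: 'a is less than b',
    lambda: 'a is greater than or equal to b'))  # a is less than b
```

There is no conditional statement at the call site: the choice between the two blocks is made purely by which class receives the message.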
As others have said, it will work the way you want if you get rid of the first set of square brackets.
But to explain the problem you were running into better:
[stringOne = stringTwo ] ifTrue:[myNumber := 20]
is passing the message ifTrue: to a block, and blocks do not understand that method, only boolean objects do.
If you first evaluate the block, it will evaluate to a true object, which will then know how to respond:
[stringOne = stringTwo] value ifTrue:[myNumber := 20]
Or what you should really do, as others have pointed out:
stringOne = stringTwo ifTrue:[myNumber := 20]
both of which evaluate stringOne = stringTwo to true before sending ifTrue:[...] to it.
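The distinction between a block and the value it produces exists in other languages too. In the Python sketch below a lambda stands in for the block; note one difference, stated as a caveat: Python would silently treat an uncalled lambda as truthy in an if, whereas Smalltalk reports the non-boolean receiver error shown in the question.

```python
string_one = 'hello'
string_two = 'hello'

# The "block": the comparison wrapped in a callable, not yet evaluated.
block = lambda: string_one == string_two

print(isinstance(block, bool))    # False
print(isinstance(block(), bool))  # True

my_number = 10
if block():  # analogous to: [stringOne = stringTwo] value ifTrue: [...]
    my_number = 20
print(my_number)  # 20
```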
[stringOne = stringTwo] is a block, not a boolean. When the block is invoked, perhaps it will result in a boolean. But you are not invoking the block here. Instead, you are merely causing the block to be the receiver of ifTrue:.
Instead, try:
(stringOne = stringTwo) ifTrue: [
myNumber := 20 ].
Should you be blocking the comparison? I would have thought that:
( stringOne = stringTwo ) ifTrue: [ myNumber := 20 ]
would be enough.
but I seem to be doing something wrong
Given that you are using VisualWorks your install should include a doc folder.
Look at the AppDevGuide.pdf - it has a lot of information about programming with VisualWorks and more to the point it has a lot of introductory information about Smalltalk programming.
Look through the Contents table at the beginning, until Chapter 7 "Control Structures", click "Branching" or "Conditional Tests" and you'll be taken to the appropriate section in the pdf that tells you all about Smalltalk if-then-else and gives examples that would have helped you see what you were doing wrong.
I would like to add the following 50Cent:
as blocks are actually lambdas which can be passed around, another good example would be the following method:
do:aBlock ifCondition:aCondition
... some more code ...
aCondition value ifTrue: aBlock.
... some more code ...
aBlock value
...
so the argument to ifTrue:/ifFalse: can actually come from someone else. These kinds of passed-in blocks are often useful in "..ifAbsent:" or "..onError:" kinds of methods.
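An "..ifAbsent:" style method can be sketched in Python by passing a callable that is only evaluated when needed. The function at and its if_absent parameter are hypothetical names chosen to echo the Smalltalk at:ifAbsent: pattern; this is an illustration, not a library API.

```python
def at(mapping, key, if_absent):
    """Return mapping[key], or the result of calling if_absent when the key is missing."""
    if key in mapping:
        return mapping[key]
    return if_absent()  # the caller decides what happens in the missing case


prices = {'apple': 3}
print(at(prices, 'apple', lambda: 0))  # 3
print(at(prices, 'pear', lambda: 0))   # 0
```

The callable is supplied by the caller and invoked by the method, which is the same division of labour as passing a block into ifTrue: from elsewhere.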
(originally meant as a comment, but I could not get the code example to be formatted)

Resources