Delphi: strange substring result - string

Delphi Seattle (S10). Win32 project.
Yesterday I got a wrong result from an old routine of mine.
I tracked it down to this line:
sPre := Copy(aSQLText, p - 1, 1);
aSQLText was 'CREATE', and p = 1.
sPre got the result "C".
Hmmm... Then I typed this into the watch window:
Copy('ABC', 0, 1)
and the result was "A"...
Ouch... What???
Copy handles overflow past the end of the string well.
But not before the beginning? Or what?
I hope I don't have any other code that points before the start of a string.
Do you get this result in your Delphi too?
Why?
As far as I know, strings are stored internally as a 4-byte length followed by the characters, and they
are 1-based (not 0-based like arrays). Is this a bug?

The call to Copy in your code is resolved to the internal function _UStrCopy from System.pas. Right at the beginning of its implementation it checks the Index and Count parameters and corrects them when necessary. This includes forcing Index to point to the first character if it is too low.
I agree that this should be documented, though.

The documentation for Copy doesn't specify what will happen in this instance, so I wouldn't call it a Bug per se.
I can see arguments for both solutions (empty string or as it does, assume 1 as starting position).
The point is that, since the documentation does not define what happens in this case, it is unwise for a program to assume anything one way or the other: the behavior may be implementation dependent and might even change between versions. If your code can risk having p=1 (or even p=0) and you want the empty string in that case, you should explicitly write code to that effect instead of relying on your (wrong, in this case) expectation of what the compiler might do:
if p<=1 then sPre:='' else sPre := Copy(aSQLText, p - 1, 1);
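The clamping described above can be modeled outside Delphi. The following is a minimal Python sketch, not the actual _UStrCopy source: delphi_copy and char_before are hypothetical names, and the clamping rules are assumed from the answer's description and from the results reported in the question.

```python
def delphi_copy(s: str, index: int, count: int) -> str:
    """Model of Copy(S, Index, Count) with a 1-based Index (assumed behavior)."""
    if index < 1:
        index = 1          # an Index that is too low is forced to the first character
    if count < 0:
        count = 0          # a negative Count yields an empty result
    start = index - 1      # convert to Python's 0-based indexing
    return s[start:start + count]   # slicing already clamps at the end of the string


def char_before(s: str, p: int) -> str:
    """Explicit guard: return '' when there is no character before position p."""
    if p <= 1:
        return ""
    return delphi_copy(s, p - 1, 1)


print(repr(delphi_copy("ABC", 0, 1)))    # 'A', the surprising result from the question
print(repr(char_before("CREATE", 1)))    # '', the result the asker expected
```

The guard makes the intent explicit, so the result no longer depends on how out-of-range indexes happen to be corrected.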


Way to find a number at the end of a string in Smalltalk

I have different commands my program is reading in (i.e., print, count, min, max, etc.). These words can also include a number at the end of them (i.e., print3, count1, min2, max6, etc.). I'm trying to figure out a way to extract the command and the number so that I can use both in my code.
I'm struggling to figure out a way to find the last element in the string in order to extract it, in Smalltalk.
You didn't say which incarnation of Smalltalk you use, so I will explain what I would do in Pharo, which is the one I'm familiar with.
As someone who has been playing with Pharo for a few months at most, I can tell you the sheer number of classes and methods available can feel overwhelming at first, but the environment actually makes it easy to find things. For example, when you know the exact input and output you want but don't know whether a method already exists somewhere, or what it is called, the Finder lets you search by giving an example. You can open it from the World menu.
By default it searches for selectors (method names) matching your input terms.
But this default is not what we need right now, so change the option in the upper-right box to "Examples", and type into the search field an example of the input followed by the output you want, separated by a ".". The input example I used was the string 'max6', followed by the desired result, the number 6. Pharo then gives a list of methods that match.
To find what would return the text part, you can make a new search, changing the example output from the number 6 to the string 'max'.
Fortunately there are several built-in methods matching the description of your problem.
There are more elegant ways, I suppose, but you can make use of the fact that String>>#asNumber only parses the part it can recognize. So you can do
'print31' reversed asNumber asString reversed asNumber
to give you 31. That only works if there actually is a number at the end.
This is one of those cases where we can presume the input data has a specific form, i.e., the only digits appear at the end of the string, and you want all of those digits. In that case it's not too hard to do, really, just:
numText := 'Kalahari78' select: [ :each | each isDigit ].
num := numText asInteger. "78"
To get the rest of the string without the digits, you can just use this:
'Kalahari78' withoutTrailingDigits. "Kalahari"
As some of the Pharo "OGs" pointed out, you can take a look at the String class (just type CMD-Return, type in String, hit Return) and you will find an amazing number of methods for all kinds of things. Usually you can get some ideas from those. But then there are times when you really just need an answer!
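For comparison, the same split can be sketched outside Smalltalk. This is a minimal Python illustration, assuming (as the answers do) that digits appear only at the end of the command; split_command is a hypothetical helper name, not a Pharo method.

```python
import re


def split_command(text: str):
    """Split 'print31' into ('print', 31); the number is None when absent."""
    # Lazy prefix followed by a greedy run of trailing digits (possibly empty).
    match = re.fullmatch(r"(.*?)(\d*)", text)
    name, digits = match.group(1), match.group(2)
    return name, int(digits) if digits else None


print(split_command("print31"))   # ('print', 31)
print(split_command("max"))       # ('max', None)
```

Unlike the reversed asNumber trick, this also reports the case where no number is present.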

squeak(smalltalk) how to use method `findSubstring: in: startingAt: matchTable:`?

What should I pass for the matchTable: keyword?
In the implementation there are no examples or detailed explanation, so
I don't understand which object receives the message if I pass the string as the in: argument.
The matchTable: keyword provides a way to identify characters so that they become equivalent in comparisons. The argument is usually a ByteArray of 256 entries, containing at position i the code point of the ith character to be considered when comparing.
The main use of the table is to implement case-insensitive searches, where, e.g., A=a. Thus, instead of comparing the characters at hand during the search, what are compared are the elements found in the matchTable at their respective code points. So, instead of
(string1 at: i) = (string2 at: j)
the test becomes something along the lines of
cp1 := string1 basicAt: i.
cp2 := string2 basicAt: j.
(table at: cp1) = (table at: cp2).
In other words, the matchTable: argument is used to map actual characters to the ones that actually matter for the comparisons.
Note that the same technique can be applied for case-sensitive/insensitive sorting.
Finally, bear in mind that this is a rather low-level method that non-system programmers would rarely need. You should instead use higher-level methods for finding substrings, such as findString:startingAt:caseSensitive:, where the argument of the last keyword is a Boolean.
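The match-table idea can be illustrated in a few lines. This is a minimal Python sketch of the technique, not the Squeak implementation: find_substring, IDENTITY and FOLD_CASE are hypothetical names, indexes are 0-based here (Smalltalk is 1-based), and -1 signals "not found".

```python
# 256-entry tables: position i holds the byte that byte i is treated as.
IDENTITY = bytes(range(256))                      # case-sensitive: every byte maps to itself
FOLD_CASE = bytes(ord(chr(i).lower()) if i < 128 else i
                  for i in range(256))            # ASCII letters map to lowercase


def find_substring(key: bytes, body: bytes, start: int, match_table: bytes) -> int:
    """Return the first index >= start where key occurs in body, or -1."""
    mapped_key = [match_table[b] for b in key]
    for i in range(start, len(body) - len(key) + 1):
        # Compare table entries instead of the raw bytes.
        if all(match_table[body[i + j]] == mapped_key[j]
               for j in range(len(key))):
            return i
    return -1


print(find_substring(b"WORLD", b"hello world", 0, FOLD_CASE))  # 6
print(find_substring(b"WORLD", b"hello world", 0, IDENTITY))   # -1
```

Swapping the table changes the notion of equality without touching the search loop, which is the point of the matchTable: argument.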

COBOL substring between two finite points

I understand that the string_variable(start:length) can be used to get a substring of a string given a starting point and substring length, however, I am finding that I often need to get a substring between a 'start' and 'end' point.
While I know I could always do this:
SUBTRACT start FROM end GIVING len
string(start:len)
It seems cumbersome to have to do so every time when I am writing programs that use this functionality often. Is there perhaps a quicker/built-in way of achieving this?
How about?
move str (start-pos : end-pos - start-pos + 1) to ...
You can subtract the first from the last, but you need to add 1 to get the correct length.
STRING is a statement name, as is START, and END is reserved. LENGTH is a function name. I avoid those in anything that looks like code.
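The arithmetic behind the + 1 is easy to check. Below is a minimal Python sketch that models 1-based reference modification; ref_mod and between are hypothetical helper names, and this is an illustration of the length calculation rather than COBOL semantics in full (no bounds checking).

```python
def ref_mod(s: str, start: int, length: int) -> str:
    """Model of s(start:length) with a 1-based start position."""
    return s[start - 1:start - 1 + length]


def between(s: str, start_pos: int, end_pos: int) -> str:
    """Substring from start_pos to end_pos inclusive, both 1-based."""
    # Inclusive range: positions 3..5 hold 5 - 3 + 1 = 3 characters.
    return ref_mod(s, start_pos, end_pos - start_pos + 1)


print(between("ABCDEFGH", 3, 5))   # CDE
```

Without the + 1 the call would return only "CD", dropping the character at the end position.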

Delphi - ADO query and FillChar generates errors

I have the following code:
var wqry:TAdoQuery;
...
FillChar(wSpaces,cSpacesAfter,' ');
try
wqry := TADOQuery.Create(nil);//here the error
wqry.Connection:=...
cSpacesAfter is a constant with the value 1035. wSpaces is a local string variable. The problem is that I receive an error when the TADOQuery is created.
The error message is in French, but I believe you get the idea.
If I comment the FillChar code, everything works ok. I have the usual compiler directives, nothing special. I'm using Delphi 7.
Can someone tell me what is wrong with that code?
The troublesome code is most likely this one
FillChar(wSpaces,cSpacesAfter,' ');
I'm assuming that wSpaces is of string type. A string variable is in fact nothing more than a pointer to the data structure that holds the string. You don't need to use pointer syntax because the compiler takes care of that for you.
So what this code does is overwrite the variable holding that pointer with 4 space characters and then write 1031 more spaces over the top of whatever follows the variable. In short you will completely corrupt your memory. That would explain why the FillChar works but the very next line of code dies a painful and dramatic death.
If your string indeed had space for 1035 characters you could instead write:
FillChar(wSpaces[1], cSpacesAfter, ' ');
However, it may be more idiomatic to write:
wSpaces := StringOfChar(' ', cSpacesAfter);
FillChar procedure fills out a section of storage Buffer with the same byte or character FillValue FillCount times.
It is principally used to initialise arrays of numbers. It can be used to initialise records and strings, but care should be used to avoid overwriting length fields. StringOfChar is best for filling out strings to the same character.
Are you sure wSpaces is large enough to hold all of the cSpacesAfter characters you write to it?

Newest Delphi compiler versions and String type compatibility

I'm trying to make some string processing routines compatible with the
newest Delphi version. I'm using Delphi 2005 and 2007, but I'm not totally sure about compatibility.
Here are a few samples. Are they compatible with both the old and the new string type?
(I'll use an imaginary UNICODE_STRING directive.)
a Type definition:
{$IFNDEF UNICODE_STRING}
TextBuffer = Array[0..13] Of Char;
{$ELSE}
TextBuffer = Array[0..13] Of WideChar;
{$ENDIF}
Useless or not? Does the Char type become what WideChar was before the Unicode string, or is there still a difference?
a Function:
Function RemoveBlanks(Text: String): String;
Var
i: integer;
Begin
result := '';
For i:= 0 To Length(Text) Do
Begin
{$IFNDEF UNICODE_STRING}
If Byte(Text[i]) < 21 Then Continue;
{$ELSE}
If Word(Text[i]) < 21 Then Continue;
{$ENDIF}
If Text[i] = ' ' Then Continue;
Result := Result + Text[i];
End;
Is the Word() casting OK?
Here there is also the ' ' problem. How is the space handled
in Unicode version? Should I also use the directive to
differentiate ' ' and ' ' or will the ' ' be automatically handled
as a 2-byte blank?
a line jump:
NewLineBegin := CanReadText( aPTextBuffer, #13#10 );
How is the second argument (#13#10) interpreted in the Unicode version? Is it compatible? Will it be translated to the byte block 00130010? If not, then should the directive be used instead with the constant #0013#0010?
The first thing to do is read Marco Cantù's paper on Unicode: http://edn.embarcadero.com/article/38980
Question 1
Just use Char all the time with no conditional code and it will work in old and new.
Char is a special type that is an 8 bit type in old versions of Delphi and a 16 bit type in new Unicode versions.
Question 2
Char is an ordinal type so you can write if s[i]<#21.
You also need to start loops at 1 for strings since they use 1-based indexing.
Question 3
Writing #0013 is not needed, #13 is fine.
In short almost all well written code will need no changes.
Compiler Directives
In general, I'd advise you to be very wary of compiler directives. They serve their purpose, but for general use, they should probably be avoided altogether.
The first problem is that you have to compile your app and test it twice, because it is fundamentally and/or subtly different for a directive on/off.
This situation gets worse with each additional directive, because you usually have to test every combination:
D1 On, D2 On
D1 On, D2 Off
D1 Off, D2 On
D1 Off, D2 Off
3 directives means 8 combinations... etc.
Unicode Strings
Please see: Get ready for Delphi 2009 and up when developing with Delphi 7?
It has some nice answers for you to consider.
Question 1
As said, I advise against it. I also advise against for other reasons in my answer to the above mentioned question.
More specifically:
In Delphi <2009, both lines are different.
In Delphi >=2009 both lines are effectively the same.
Question 2
Not only is this ill advised for the same reasons as Question 1, but it actually has some subtle problems.
The more precise type of Text (String) is determined by your Delphi version. So:
In Delphi <2009, the else part of your conditional casts a single character to a Word. (Probably with no ill effect.)
In Delphi >=2009, the if part of your conditional casts a double-byte character to a Byte. (With loss of information.)
Also, there are some special considerations, and new support classes for 'special' characters. You'll want to look into those. Refer to: How to identify unicode keys on key press?
Question 3
I'm pretty sure that #13 will be treated as a single character, so in Delphi >=2009 where Char == WideChar, that character will take up 2 bytes.
However, again, look for line-break constants in Delphi. System.sLineBreak was probably introduced back in the Kylix days.
The generic type Char becomes either the fundamental type AnsiChar or the fundamental type WideChar (read up on generic vs. fundamental types). BTW, there is a UNICODE symbol already $DEFINEd for you; however, there is no need to branch at all until a specific byte size is required.
The second part smells; scratch it completely. It is an abuse of typecasts and artificially creates a need for conditional compilation. To get the unsigned integer character code of a given Char, use the Ord() function instead (or, as said in the other answer, use the ordinal traits of the Char type).
For the third part, character constants are already of the generic type Char. Again, there is nothing to worry about: #13 becomes either the byte-sized $0D or the word-sized $000D, which is stored in memory as the bytes 0D 00 (remember little-endianness).
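The byte layout of #13#10 in the two string flavors can be checked with a short Python sketch. It assumes the narrow string holds one byte per character and the Unicode string is UTF-16 little-endian, as on Windows; the variable names are illustrative only.

```python
CRLF = "\r\n"   # the same two characters as #13#10

# One byte per character, as in a pre-2009 AnsiString.
ansi_bytes = CRLF.encode("latin-1")

# Two bytes per character, low byte first, as in a UTF-16LE UnicodeString.
wide_bytes = CRLF.encode("utf-16-le")

print(ansi_bytes.hex(" "))   # 0d 0a
print(wide_bytes.hex(" "))   # 0d 00 0a 00
```

Either way the source text #13#10 denotes the same two characters; only the storage width differs, which is why no conditional constant is needed.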
