Does Lua optimize concatenating with an empty string?

I have two strings. One of them is often (but not always) empty. The other is huge:
a = ""
b = "... huge string ..."
I need to concatenate the two strings. So I do the following:
return a .. b
But, if a is empty, this would, temporarily, unnecessarily create a copy of the huge string.
So I thought to write it as follows:
return (a == "" and b) or (a .. b)
This would solve the problem. But I was wondering: does Lua optimize a concatenation that involves an empty string? That is, if we write a .. b, does Lua check whether either of the strings is empty and return the other one immediately? If so, I could simply write a .. b instead of the more elaborate code.

Yes, it does.
In the Lua 5.2 source code, luaV_concat contains:
if (!(ttisstring(top-2) || ttisnumber(top-2)) || !tostring(L, top-1)) {
  if (!call_binTM(L, top-2, top-1, top-2, TM_CONCAT))
    luaG_concaterror(L, top-2, top-1);
}
else if (tsvalue(top-1)->len == 0)  /* second operand is empty? */
  (void)tostring(L, top - 2);  /* result is first operand */
else if (ttisstring(top-2) && tsvalue(top-2)->len == 0) {
  setobjs2s(L, top - 2, top - 1);  /* result is second op. */
}
else {
  /* at least two non-empty string values; get as many as possible */
The two else if branches do exactly this job: they short-circuit the concatenation when one of the operands is an empty string.
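The effect of those two branches can be sketched in Python. This is only an illustration of the short-circuit logic, not Lua's implementation; the function name concat is made up for the example.

```python
def concat(a: str, b: str) -> str:
    """Mimic the empty-operand shortcut: hand back the other operand untouched."""
    if not b:      # second operand is empty: result is the first operand
        return a
    if not a:      # first operand is empty: result is the second operand
        return b
    return a + b   # both non-empty: a new string has to be built


huge = "x" * 1_000_000
result = concat("", huge)
print(result is huge)  # identity check: the same object comes back, no copy is made
```

The identity check at the end is the Python counterpart of "no temporary copy of the huge string".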


What does the int value returned from compareTo function in Kotlin really mean?

In the documentation of compareTo function, I read:
Returns zero if this object is equal to the specified other object, a
negative number if it's less than other, or a positive number if it's
greater than other.
What does this less than or greater than mean in the context of strings? Is -for example- Hello World less than a single character a?
val epicString = "Hello World"
println(epicString.compareTo("a")) //-25
Why -25 and not -10 or -1 (for example)?
Other examples:
val epicString = "Hello World"
println(epicString.compareTo("HelloWorld")) //-55
Is Hello World less than HelloWorld? Why?
Why does it return -55 and not -1, -2, -3, etc.?
val epicString = "Hello World"
println(epicString.compareTo("Hello  World")) //55
Is Hello World greater than Hello  World (with two spaces)? Why?
Why does it return 55 and not 1, 2, 3, etc.?
I believe you're asking about the implementation of the compareTo method of java.lang.String. Here is the source code from Java 11:
public int compareTo(String anotherString) {
    byte v1[] = value;
    byte v2[] = anotherString.value;
    if (coder() == anotherString.coder()) {
        return isLatin1() ? StringLatin1.compareTo(v1, v2)
                          : StringUTF16.compareTo(v1, v2);
    }
    return isLatin1() ? StringLatin1.compareToUTF16(v1, v2)
                      : StringUTF16.compareToLatin1(v1, v2);
}
So compareTo delegates to either StringLatin1 or StringUTF16, and we need to look further.
Fortunately, StringLatin1 and StringUTF16 have similar implementations of the comparison.
Here is the implementation in StringLatin1, for example:
public static int compareTo(byte[] value, byte[] other) {
    int len1 = value.length;
    int len2 = other.length;
    return compareTo(value, other, len1, len2);
}
public static int compareTo(byte[] value, byte[] other, int len1, int len2) {
    int lim = Math.min(len1, len2);
    for (int k = 0; k < lim; k++) {
        if (value[k] != other[k]) {
            return getChar(value, k) - getChar(other, k);
        }
    }
    return len1 - len2;
}
As you can see, it iterates over the characters up to the length of the shorter string, and if the characters at the same index of the two strings differ, it returns the difference between them. If the loop finds no difference (one string is a prefix of the other), it falls back to comparing the lengths of the two strings.
In your case, there is a difference in the first iteration already...
So it's the same as "H".compareTo("a"), which gives -25.
The code of "H" is 72
The code of "a" is 97
So, 72 - 97 = -25
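As a cross-check, the same loop can be sketched in Python. This is a re-implementation for illustration, not the JDK code; it compares code points via ord, which matches Java's result for the plain ASCII strings used here, and the name compare_to is made up.

```python
def compare_to(s: str, t: str) -> int:
    """Return the first character difference, or the length difference."""
    for x, y in zip(s, t):          # zip stops at the shorter string
        if x != y:
            return ord(x) - ord(y)  # first mismatch decides the result
    return len(s) - len(t)          # one string is a prefix of the other


print(compare_to("Hello World", "a"))           # -25: ord("H") - ord("a") = 72 - 97
print(compare_to("Hello World", "HelloWorld"))  # -55: ord(" ") - ord("W") = 32 - 87
print(compare_to("Hello", "Hello World"))       # -6: length difference 5 - 11
```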
Short answer: The exact value doesn't have any meaning; only its sign does.
As the specification for compareTo() says, it returns a -ve number if the receiver is smaller than the other object, a +ve number if the receiver is larger, or 0 if the two are considered equal (for the purposes of this ordering).
The specification doesn't distinguish between different -ve numbers, nor between different +ve numbers — and so neither should you.  Some classes always return -1, 0, and 1, while others return different numbers, but that's just an implementation detail — and implementations vary.
Let's look at a very simple hypothetical example:
class Length(val metres: Int) : Comparable<Length> {
    override fun compareTo(other: Length)
        = metres - other.metres
}
This class has a single numerical property, so we can use that property to compare them.  One common way to do the comparison is simply to subtract the two lengths: that gives a number which is positive if the receiver is larger, negative if it's smaller, and zero if they're the same length — which is just what we need.
In this case, the value of compareTo() would happen to be the signed difference between the two lengths.
However, that method has a subtle bug: the subtraction could overflow, and give the wrong results if the difference is bigger than Int.MAX_VALUE.  (Obviously, to hit that you'd need to be working with astronomical distances, both positive and negative — but that's not implausible.  Rocket scientists write programs too!)
To fix it, you might change it to something like:
class Length(val metres: Int) : Comparable<Length> {
    override fun compareTo(other: Length) = when {
        metres > other.metres -> 1
        metres < other.metres -> -1
        else -> 0
    }
}
That fixes the bug; it works for all possible lengths.
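To see the overflow concretely, here is a small Python sketch. Python integers never overflow, so the helper wrap_int32 (a made-up name) simulates 32-bit signed arithmetic by hand; the two comparison functions are illustrative stand-ins for the two versions of compareTo above.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def wrap_int32(n: int) -> int:
    """Reduce n to the signed 32-bit range, the way fixed-width Int arithmetic wraps."""
    return (n - INT_MIN) % 2**32 + INT_MIN


def compare_by_subtraction(a: int, b: int) -> int:
    return wrap_int32(a - b)    # the buggy variant: the difference can wrap around


def compare_by_sign(a: int, b: int) -> int:
    return (a > b) - (a < b)    # the fixed variant: only ever -1, 0 or 1


print(compare_by_subtraction(INT_MAX, -1))  # negative, although INT_MAX is larger
print(compare_by_sign(INT_MAX, -1))         # 1
```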
But notice that the actual return value has changed in most cases: now it only ever returns -1, 0, or 1, and no longer gives an indication of the actual difference in lengths.
If this was your class, then it would be safe to make this change because it still matches the specification.  Anyone who just looked at the sign of the result would see no change (apart from the bug fix).  Anyone using the exact value would find that their programs were now broken — but that's their own fault, because they shouldn't have been relying on that, because it was undocumented behaviour.
Exactly the same applies to the String class and its implementation.  While it might be interesting to poke around inside it and look at how it's written, the code you write should never rely on that sort of detail.  (It could change in a future version.  Or someone could apply your code to another object which didn't behave the same way.  Or you might want to expand your project to be cross-platform, and discover the hard way that the JavaScript implementation didn't behave exactly the same as the Java one.)
In the long run, life is much simpler if you don't assume anything more than the specification promises!

Shouldn't Empty Strings Implicitly Convert to false

Why does
if (x) {
f();
}
call f() if x is an empty string ""?
Shouldn't empty strings in D implicitly convert to bool false, as they do in Python and as empty arrays do in D?
Update: I fixed the question. I had incorrectly reversed the reasoning logic. Luckily, the bright D minds understood what I meant anyway ;)
Conditions in if statements and loops are cast to bool by the compiler. So,
if(x) {...}
becomes
if(cast(bool)x) {...}
and in the case of arrays, casting to bool is equivalent to testing whether its ptr property is not null. So, it becomes
if(x.ptr !is null) {...}
In the case of arrays, this is actually a really bad test, because null arrays are considered to be the same as empty arrays. So, in most cases, you don't care whether an array is null or not. An array is essentially a struct that looks like
struct Array(T)
{
    T* ptr;
    size_t length;
}
The == operator will check whether all of the elements referred to by ptr are equal, but if length is 0 for both arrays, it doesn't care what the value of ptr is. That means that "" and null are equal (as are [] and null). However, the is operator explicitly checks the ptr properties for equality, so "" and null won't be the same according to the is operator, and whether a particular array which is empty has a null ptr depends on how its value was set. So, the fact that an array is empty really says nothing about whether it's null or not. You have to check with the is operator to know for sure.
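The distinction can be modelled with a toy Python class. This is a sketch of the semantics described above, not of D's runtime: the class Slice, its methods, and the address 0x1000 are all invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slice:
    ptr: Optional[int]  # made-up address; None stands in for a null pointer
    data: str           # the elements the slice refers to

    def as_condition(self) -> bool:
        """Models `if (x)`: only the pointer is tested."""
        return self.ptr is not None

    def equals(self, other: "Slice") -> bool:
        """Models `==`: element-wise comparison, pointers ignored."""
        return self.data == other.data

    def identical(self, other: "Slice") -> bool:
        """Models `is`: same pointer and same length."""
        return self.ptr == other.ptr and len(self.data) == len(other.data)


null_string = Slice(ptr=None, data="")       # like `string s = null;`
empty_literal = Slice(ptr=0x1000, data="")   # like `string s = "";`

print(null_string.equals(empty_literal))     # True: both have length 0
print(null_string.identical(empty_literal))  # False: the pointers differ
print(empty_literal.as_condition())          # True: non-null pointer
print(null_string.as_condition())            # False: null pointer
```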
The result of all this is that it's generally bad practice to put an array (or string) directly in a condition like you're doing with
if(x) {...}
Rather, you should be clear about what you're checking. Do you care whether it's empty? In that case, you should check either
if(x.empty) {...}
or
if(x.length == 0) {...}
Or do you really care that it's null? In that case, use the is operator:
if(x is null) {...}
The behavior of arrays in conditions is consistent with the rest of the language (e.g. pointer and reference types are checked to see whether they're null or not), but unfortunately, in practice, such behavior for arrays is quite bug-prone. So, I'd advise that you just don't ever put an array by itself in the condition of an if statement or loop.
The default conversion of arrays looks at the .ptr, which means that only default-initialized arrays (or arrays explicitly set to null) evaluate to false.
As an added effect, string literals in D are \0-terminated, which means ("")[0] == '\0', and as such ("").ptr can't be null (which would lead to a segfault).
IMO it should look at the length, and you can use the ptr when you need to.
It does when I try it...
void main() {
    import std.stdio;
    string s = "";
    if(s)
        writeln("true"); // triggered
}
If it was "string s = null;" (which is the default initialization), it doesn't, because the null converts to false, but "" is ok on my computer. Are you sure it isn't null?
BTW, if you want to test for (non-)emptiness, the way I prefer to do it is if(x.length) and if(x.length == 0). Those work consistently for both "" and null, then if you specifically want null, do if(x is null). It is just a little more clear, especially since "" and null are interchangeable in a lot of other contexts in D.

Groovy, troubles with the range operator

I wrote a method in Groovy using the range operator in order to execute the same code multiple times:
/**
 * Prints the {@code files} {@code copyCount} times using
 * {@code printService}.
 * <p>
 * Exceptions may be thrown.
 * @param printService Print service
 * @param files List of {@code File} objects
 * @param copyCount Number of copies to print
 */
private static void printJob(
        PrintService printService,
        List<File> files,
        int copyCount) {
    // No multiple copy support for PS files, must do it manually
    for ( i in 1..copyCount ) {
        // Print files
    }
}
This method did not pass unit testing: it fails badly when copyCount is 0, because 1..0 is the descending range [1, 0] and the loop body runs twice instead of not at all.
I searched the documentation and it seems that Groovy implements ranges as a "list of sequential values". As I understand it, a range does not represent an interval of integers, since it also has a notion of order embedded.
In Groovy a..b is not the set of integers x such that a <= x <= b.
In Groovy a..b is the representation of the enumeration u: [0,|b-a|] -> [a..b] defined as: u(0) = a, for all i in [1,|b-a|], u(i) = u(i-1) + sgn(b-a)
Now I can fix my code:
if (copyCount > 0) for ( i in 1..copyCount ) {
// Print files
}
Also in Groovy a..<b is the representation of the enumeration u: [0,|b-a|-1] -> [a..b-1] defined as: u(0) = a, for all i in [1,|b-a|-1], u(i) = u(i-1) + sgn(b-a)
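The two enumerations above can be written out as a short Python sketch. It models the definitions as stated in this question rather than Groovy's actual Range classes; the function names are made up.

```python
def sgn(n: int) -> int:
    return (n > 0) - (n < 0)


def inclusive_range(a: int, b: int) -> list:
    """Models a..b: |b-a|+1 values stepping from a towards b."""
    step = sgn(b - a)
    return [a + i * step for i in range(abs(b - a) + 1)]


def half_open_range(a: int, b: int) -> list:
    """Models a..<b: |b-a| values stepping from a towards b, b excluded."""
    step = sgn(b - a)
    return [a + i * step for i in range(abs(b - a))]


print(inclusive_range(1, 3))          # [1, 2, 3]
print(inclusive_range(1, 0))          # [1, 0]: two iterations when copyCount is 0
print(half_open_range(0, 0))          # []: no iteration when copyCount is 0
print(len(half_open_range(0, -200)))  # 200: a negative count still iterates
```

Compare this with Python's own range(1, 0 + 1), which is simply empty because the step is fixed at +1 rather than inferred from the bounds.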
I noticed that the code below is also working for copyCount positive or zero:
for ( i in 0..<copyCount ) {
// Print files
}
Still, if I can choose a solution where damages are minimized in case of inconsistency (say copyCount is -200, I may get 200 prints)...
0.step(copyCount, 1) {
// Print files
}
At least with this solution I get a GroovyRuntimeException: Infinite loop in case of a negative copyCount. It is groovy but not very pretty and I feel like I’m playing with fire.
There is also this solution, but I find it ugly.
for ( i in 0..<[0,copyCount].max() ) {
// Print files
}
Therefore, in this case, I think the best is to avoid using the range operator, because it may be confusing for developers that are used to Perl, Ruby or Mathematics, or French (there is no word for this definition of range in French, we would just say "intervalle" for a range)... I also found it safer in case of inconsistency. Still, it is not so groovy.
for ( i = 1 ; i <= copyCount ; i++ ) {
// Print files
}
Why is the range operator in Groovy so complicated? As I see it, the fact that the step is "magically" determined and that we can’t force it (like in Ruby) is a big flaw in this implementation. Am I the only one who was ever troubled by this (two prints instead of none would have been a bad bug ^^ )? Did I miss something? Is there any practical case where a range needs to reverse its order when the upper bound drops below the lower bound? Am I being too picky?

How Does the any Method Work in Groovy?

I came across this bit of code:
n = args[0] as Long
[*n..1, n].any{ println ' '*it + '*'*(n - ~n - it*2) }
It's used for printing a tree form of structure. Like this:
    *
   ***
  *****
 *******
    *
(for n=4)
How does the code [*n..1,n] produce [4, 3, 2, 1, 4]?
How does the any method work here? The docs don't help me much. What is a predicate that can be passed to any (as mentioned in the docs)?
What's the use of any, and how is it handled in this case?
Q1a: * "unpacks" an array. .. creates a range. [] creates a collection.
Q1b: *n..1 unpacks [4,3,2,1] into its individual parts.
Q1c: [4,3,2,1,n] == [4,3,2,1,4]
Q2: I don't know why any was used here; each works just as well, and makes more sense in context. any does loop over the collection, so the println side effect functions as intended.
Normally any would be used to determine whether any collection element meets a criterion, for example:
[*n..1,n].any { it > 10 } // Returns false, no elements are > 10
[*n..1,n].any { it == 3 } // Returns true, because at least one element is 3
The last statement of the closure is used to determine if each item meets the criteria. println returns null, so any will return false. The value is unused and discarded.
The only reason I can think of that someone might have used any is to avoid seeing the return value of each in the console. each returns the original collection.
1) n..1 is called a range literal, it creates a groovy.lang.Range object that decrements by 1 from n to 1. This is then merged into the surrounding list context using the "Spread operator (*)"
2) The any method is defined in DefaultGroovyMethods. It returns true if at least one element of the collection satisfies the supplied predicate closure. In this example, the code doesn't check the return value, so the original author could have produced the same output using an each call instead.
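For readers more used to Python, the same trick can be reproduced almost literally. This is an analogy rather than a description of Groovy internals: Python's * unpacking plays the role of the spread operator, and print returns None just as println returns null.

```python
n = 4
levels = [*range(n, 0, -1), n]  # unpack the descending range, then append n
print(levels)                   # [4, 3, 2, 1, 4]


def tree_line(it: int) -> str:
    # ~n equals -n - 1, so n - ~n is 2*n + 1 and the star count is 2*(n - it) + 1
    return " " * it + "*" * (n - ~n - it * 2)


lines = [tree_line(it) for it in levels]

# print returns None, which is falsy, so any never stops early
# and visits every element before returning False.
result = any(print(line) for line in lines)
print(result)  # False
```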

I want Flood Fill without stack and without recursion

I want to know how to apply a flood fill to an array. My array is two-dimensional and contains the boundary of a letter in the Times New Roman font.
The boundary line consists of 1s; everything inside and outside it is 0.
I want to replace the 0s with 1s, but only inside the boundary.
I need an approach that does not require much extra memory,
so I want to avoid recursion and an explicit stack or queue.
I don't normally do homework for other people, but I liked the challenge:
int c = -1;
while (c < 0)
{
    /* Store breadcrumb trail, look to carry on */
    a[x][y] = c--;
    if (!hunt(0))
    {
        /* Nowhere to go, so back-track by looking for breadcrumb */
        a[x][y] = 1;
        c += 2;
        hunt(c);
    }
}
bool_t hunt(int v)
{
    if (a[x-1][y] == v) { x--; return TRUE; }
    if (a[x+1][y] == v) { x++; return TRUE; }
    if (a[x][y-1] == v) { y--; return TRUE; }
    if (a[x][y+1] == v) { y++; return TRUE; }
    return FALSE;
}
Note that this doesn't check for hitting the edges of the array. Also, it assumes your array elements are e.g. ints, and that you're only using the values 0 and 1 in your image.
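For experimenting, here is a Python transcription of the same breadcrumb idea. It is a sketch, not the answer's code: it adds the edge checks mentioned above, assumes the start cell is a 0 inside the region, and the name flood_fill_breadcrumbs is made up.

```python
def flood_fill_breadcrumbs(a, x, y):
    """Fill the 0-region containing (x, y) with 1s, in place.

    The trail is stored in the grid itself as negative numbers
    (-1, -2, ...), so no separate stack or queue is kept.
    """
    def hunt(v):
        nonlocal x, y
        for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nx, ny = x + dx, y + dy
            # edge check added here; the C fragment leaves it out
            if 0 <= nx < len(a) and 0 <= ny < len(a[0]) and a[nx][ny] == v:
                x, y = nx, ny
                return True
        return False

    c = -1
    while c < 0:
        a[x][y] = c          # drop a breadcrumb on the current cell
        c -= 1
        if not hunt(0):      # no unfilled neighbour left
            a[x][y] = 1      # mark this cell as finished
            c += 2
            hunt(c)          # step back to the previous breadcrumb


grid = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
flood_fill_breadcrumbs(grid, 1, 1)
print(grid)  # every cell is now 1
```

The cost of avoiding a stack is that the grid elements must be wide enough to hold a counter as long as the longest trail.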
Your task doesn't make much sense. If you have a typeface, you don't want to fill it with a flood fill, but rather render it directly as a filled polygon instead. Determining which parts are inside and outside the typeface, especially for a serif font, is not going to give good results reliably.
The typical schematic algorithm for a filled polygon goes like this (no stack or recursion required), and it can be applied to a bitmap as well under certain conditions (I'll come to that):
For each line (or column, whatever suits your data structure better), toggle the fill at each intersection of the virtual line you're following and all polygon lines (boundaries).
Assume this (could be the middle line of an O character):
00010010001001000
   ^  ^   ^  ^
   |  |   |  stop
   |  |   start
   |  stop
   start
Result:
00011110001111000
This works for bitmaps as well, but only if you actually always have two boundaries for start and stop.
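A minimal Python sketch of that toggle rule for a single bitmap row follows. It assumes every boundary crossing is exactly one pixel wide, which is the condition stated above; thicker strokes or a row that only touches the outline would break it. The name fill_row is made up.

```python
def fill_row(row: str) -> str:
    """Toggle the fill state at every boundary pixel and fill the 0s in between."""
    out = list(row)
    inside = False
    for i, ch in enumerate(row):
        if ch == "1":
            inside = not inside  # crossing a boundary flips inside/outside
        elif inside:
            out[i] = "1"         # a 0 between a start and a stop gets filled
    return "".join(out)


print(fill_row("00010010001001000"))  # 00011110001111000
```

Only one boolean of state is needed per row, which is why this approach needs neither a stack nor recursion.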
function LowMemFloodFill(pixel)
    FillPixel(pixel)
    Do
        didFill = false
        For each pixel
            If current pixel has been filled
                For each adjacent pixel
                    If adjacent has not been filled
                        FillPixel(adjacent)
                        didFill = true
                    End
                End
            End
        End
    While didFill
End
The catch is that you must be able to tell that a pixel has been filled (fill it with an unused color). Also, this would be extremely slow.
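A runnable Python version of this pseudocode might look as follows. It is a sketch under a few assumptions: the grid holds only 0s and 1s, the value 2 serves as the "unused color", and only the four direct neighbours count as adjacent. The function name and the fill parameter are made up.

```python
def low_mem_flood_fill(grid, x, y, fill=2):
    """Repeatedly sweep the whole grid, growing the filled area by one ring per pass."""
    rows, cols = len(grid), len(grid[0])
    grid[x][y] = fill
    did_fill = True
    while did_fill:              # stop once a full sweep changes nothing
        did_fill = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] != fill:
                    continue
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                        grid[nr][nc] = fill
                        did_fill = True


grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
low_mem_flood_fill(grid, 2, 2)
print(grid[2][2], grid[0][0])  # the inside cell is filled, the outside is untouched
```

The only extra state is a handful of loop variables, but each pass rescans the entire grid, which is where the slowness comes from.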
You basically can't. You have to store this information somewhere, because you have to know where else to start filling after you're done with your current section. Recursion lets you do it implicitly. Keeping your own stack lets you do it explicitly, with possibly some saving. Oli Charlesworth does a cute thing by keeping an array of the same size as the picture, but that uses even more memory than recursion or keeping a stack of positions.
