Strings in java 8 less memory - string

I was given the code below and asked which option produces the following pattern:
XXXX-XXXX-XXXX-2324
...
Code Below:
public class CCMark {
public static String maskCC(String creditCard){
String x = "XXXX-XXXX-XXXX-";
//line 1
}
public static void main(String[] args) {
System.out.println(maskCC("1234-5678-1234-2324"));
System.out.println(maskCC("4567-5678-1234-5643"));
System.out.println(maskCC("1234-5678-1234-4654"));
System.out.println(maskCC("4567-5678-1234-5435"));
}
}
Below possible options that can be inserted on "line 1":
A)
return x + creditCard.substring(15, 19);
B)
StringBuilder sb = new StringBuilder(x);
sb.append(creditCard, 15, 19);
return sb.toString();
I think that the best option here, as A and B provide us with the same output, is B, because it is using StringBuilder which means that its approach is mutable, so it will use less memory than option A.
Am I wrong? Could it be that option A for this particular situation is the best option?

Options a and b are identical, because the Java compiler will convert option a into option b. You could move the declaration of x outside the method (and make it final). Something like,
static final String x = "XXXX-XXXX-XXXX-";
public static String maskCC(final String creditCard) {
return x + creditCard.substring(15, 19);
}
You can use javap to check the first against the second. Java code like,
String x = "XXXX-XXXX-XXXX-";
String creditCard = "1234-5678-1234-23324";
String x2 = x + creditCard.substring(15, 19);
StringBuilder sb = new StringBuilder(x);
sb.append(creditCard, 15, 19);
String x3 = sb.toString();
generates byte-code that looks like (note lines 6-31 and 32-58)
0: ldc #16 // String XXXX-XXXX-XXXX-
2: astore_1
3: ldc #18 // String 1234-5678-1234-23324
5: astore_2
6: new #20 // class java/lang/StringBuilder
9: dup
10: aload_1
11: invokestatic #22 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
14: invokespecial #28 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
17: aload_2
18: bipush 15
20: bipush 19
22: invokevirtual #31 // Method java/lang/String.substring:(II)Ljava/lang/String;
25: invokevirtual #35 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
28: invokevirtual #39 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
31: astore_3
32: new #20 // class java/lang/StringBuilder
35: dup
36: aload_1
37: invokespecial #28 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
40: astore 4
42: aload 4
44: aload_2
45: bipush 15
47: bipush 19
49: invokevirtual #43 // Method java/lang/StringBuilder.append:(Ljava/lang/CharSequence;II)Ljava/lang/StringBuilder;
52: pop
53: aload 4
55: invokevirtual #39 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
58: astore 5
60: return

The big advantage of the variant A, return x + creditCard.substring(15, 19); is that it is simple and clean and it works in all Java versions from 1 to 8. In the case that its compiled form uses StringBuffer, a simple recompile for Java 5 or newer will make it use StringBuilder instead. This flexibility is lost when you work with either, StringBuffer or StringBuilder, manually.
The exact compiled form is not fixed. Since the semantic of the method String.substring is not fixed by the Java Language Specification, compilers usually won’t touch this and compile it as an ordinary method invocation. The specification encourages compiler vendors to use StringBuilder for string concatenation (the + operator) whenever there is a benefit and most compilers will do so, even when there is no benefit. Here, both, x and the result of substring, are Strings so a simple String.concat would be simpler but most compilers always use StringBuilder, compiling variant A to the equivalent of return new StringBuilder().append(x).append(creditCard.substring(15, 19)).toString();.
Comparing this typical form with your variant B, we can conclude that variant B has two advantages performance-wise:
new StringBuilder(x) initializes the StringBuilder to a capacity of x.length()+16 which is sufficient for the entire operation, whereas the default capacity of new StringBuilder(), typically used for variant A, is fixed to 16 characters which misses the mark here as we have a result of 19 characters, thus a reallocation and copying of the underlying character array will occur
sb.append(creditCard, 15, 19); will copy the four characters without the need to create an intermediate String representation of these characters. The expenses of the substring operation differ depending on the implementation, e.g. in Oracle’s implementation there was a significant change with version 1.7.0_06; starting with this version a substring requires a new char[] array holding a copy of the affected character data as it doesn’t maintain a separate offset and length field
But note that all these differences of variant A and B only affect the formal description of the operation to perform. What will actually happen, is up to the JVM/JRE and usually the Hotspot optimizer knows a lot of string related operations and may fuse operations or elide intermediate string representations. Thus, the outcome regarding performance is rather unpredictable and may be affected by subtle changes to the implementation.
That’s why developers might stick to variant A which is, as said, simpler and more readable, and only care for performance once a profiler tells them that there is a performance problem that could be solved by dealing with StringBuilder manually.
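The capacity argument above can be checked directly. The following is a minimal sketch (the class name MaskDemo and the method names maskA/maskB are made up for illustration) that runs both variants and prints the documented initial StringBuilder capacities. It only shows the formal behaviour and says nothing about what the JIT does with either variant at runtime.

```java
class MaskDemo {
    private static final String PREFIX = "XXXX-XXXX-XXXX-";

    // Variant A: plain concatenation, with an intermediate substring.
    static String maskA(String creditCard) {
        return PREFIX + creditCard.substring(15, 19);
    }

    // Variant B: explicit StringBuilder, appending the character range directly.
    static String maskB(String creditCard) {
        StringBuilder sb = new StringBuilder(PREFIX);
        sb.append(creditCard, 15, 19);
        return sb.toString();
    }

    public static void main(String[] args) {
        String card = "1234-5678-1234-2324";
        System.out.println(maskA(card));
        System.out.println(maskB(card));
        // Default capacity of an empty builder.
        System.out.println(new StringBuilder().capacity());
        // Capacity is the argument's length (15) plus 16.
        System.out.println(new StringBuilder(PREFIX).capacity());
    }
}
```

Both methods return the same string, so the choice between them is about readability first and measured performance second.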

Related

How can I test that a field is set to a certain value?

This seems basic so I'm expecting this to be a dupe... but I haven't found anything that answers this question.
My app code is also Groovy. Say I have a field
def something
and in my test (where the CUT is a Spock Spy) I run a method in the middle of which there is a line
something = null
or
something = new Bubble()
... I'm simply trying to find a way of testing that something has indeed been set to null (or any value...)
In my then block I've tried:
1 * spyCUT.setSomething( null )
and
1 * spyCUT.setSomething(_)
and
1 * spyCUT.set( 'something', _ )
Incidentally, in answer to the objection that I could just test the value of something in the then block, the situation is that something is meant to be set first to one value and then to another in the course of this method...
Having read Groovy In Action 2nd Ed I have the vaguest of notions about how Groovy goes about dealing with getting and setting fields... Not enough, clearly.
MCVE (FWIW!)
class Spocko {
def something
def doStuff() {
something = 'fruit'
}
}
class SpockoTest extends Specification {
def 'test it'(){
given:
Spocko spySpocko = Spy( Spocko )
when:
spySpocko.doStuff()
then:
1 * spySpocko.setSomething(_)
}
}
LATER (after kriegaex's very helpful reply)
With above SpockTest where setSomething is invoked:
class Spocko {
def something
def doStuff() {
this.each{
it.something = 'fruit'
}
}
}
... passes! I'm trying now to understand why...
Incidentally I also find that the following passes (and doesn't without the closure):
1 * spySpocko.setProperty( 'something', _ )
After I have seen your MCVE, the question can be answered as follows: You cannot test for a method call which never happens. doStuff() just assigns a value to a field, it does not call a setter method internally. Look at this:
package de.scrum_master.stackoverflow
import spock.lang.Specification
class SpockoTest extends Specification {
static class Spocko {
def something
def doStuff() {
something = 'fruit'
}
def doMoreStuff() {
setSomething('vegetable')
}
}
def 'test it'(){
given: 'Spocko spy'
Spocko spySpocko = Spy(Spocko)
when: 'calling method assigning value to property'
spySpocko.doStuff()
then: 'no setter is called'
0 * spySpocko.setSomething(_)
spySpocko.something == 'fruit'
when: 'calling method using setter'
spySpocko.doMoreStuff()
then: 'setter gets called'
1 * spySpocko.setSomething('vegetable')
when: 'using Groovy setter-like syntax from another class'
spySpocko.something = 'fish'
then: 'actually a setter gets called'
1 * spySpocko.setSomething('fish')
}
}
This is what happens. When calling
javap -v target/test-classes/de/scrum_master/stackoverflow/SpockoTest\$Spocko.class
you see (output shortened):
public java.lang.Object doStuff();
descriptor: ()Ljava/lang/Object;
flags: ACC_PUBLIC
Code:
stack=2, locals=3, args_size=1
0: invokestatic #24 // Method $getCallSiteArray:()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
3: astore_1
4: ldc #36 // String fruit
6: astore_2
7: aload_2
8: aload_0
9: swap
10: putfield #38 // Field something:Ljava/lang/Object;
13: aload_2
14: areturn
15: aconst_null
16: areturn
public java.lang.Object doMoreStuff();
descriptor: ()Ljava/lang/Object;
flags: ACC_PUBLIC
Code:
stack=3, locals=2, args_size=1
0: invokestatic #24 // Method $getCallSiteArray:()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
3: astore_1
4: aload_1
5: ldc #40 // int 0
7: aaload
8: aload_0
9: ldc #42 // String vegetable
11: invokeinterface #48, 3 // InterfaceMethod org/codehaus/groovy/runtime/callsite/CallSite.callCurrent:(Lgroovy/lang/GroovyObject;Ljava/lang/Object;)Ljava/lang/Object;
16: areturn
17: aconst_null
18: areturn
Can you spot the difference?
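For readers more at home in Java, the same distinction can be sketched without Groovy or Spock. This is only an analogy: the subclass CountingSpocko is a hand-rolled, hypothetical stand-in for a spy, and the explicit setSomething method stands in for the setter that Groovy generates for a property.

```java
class SetterDemo {
    static class Spocko {
        protected Object something;

        void setSomething(Object value) {
            this.something = value;
        }

        // Direct field write: no method call happens, so nothing can intercept it.
        void doStuff() {
            something = "fruit";
        }

        // Goes through the setter, which a subclass (or a spy) can observe.
        void doMoreStuff() {
            setSomething("vegetable");
        }
    }

    // Hypothetical stand-in for a spy: counts calls to the setter.
    static class CountingSpocko extends Spocko {
        int setterCalls = 0;

        @Override
        void setSomething(Object value) {
            setterCalls++;
            super.setSomething(value);
        }
    }

    public static void main(String[] args) {
        CountingSpocko spy = new CountingSpocko();
        spy.doStuff();
        System.out.println(spy.something + " / setter calls: " + spy.setterCalls);
        spy.doMoreStuff();
        System.out.println(spy.something + " / setter calls: " + spy.setterCalls);
    }
}
```

The field ends up with the new value in both cases, but only the second method produces a call that an interaction check could count.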
Update after question edit 2: You wanted to know why this triggers the setter call:
def doStuff() {
this.each {
it.something = 'fruit'
}
}
This is because this is provided to the closure as a parameter, thus it.something = 'fruit' gets resolved dynamically just like in my example spySpocko.something = 'fish' because it is not an internal assignment like in something = 'fruit' (equivalent to this.something = 'fruit') anymore.
Actually I think this is not so difficult to understand even without looking at bytecode, just following the usual Groovy tutorials. I am repeating myself, but I do think you are over-engineering and over-complicating things a bit, testing things too deeply. I would not put tests like these into a production code base. Try to test the behaviour of your classes (think specifications and features!), not the innards' intricacies. But if it helps you understand how Groovy works, just continue playing.
As of now, please refrain from further question edits and follow-up questions. If you have a new problem, it would be better to create a new question with a new MCVE.

Groovy 2.4 variable scope in closure with @Field annotation

Can someone explain to me why in closure2 initVars('c') is not able to modify the referenced object if @Field is used in the declaration?
import groovy.transform.Field;
@Field def lines4 = "a";
void initVars(String pref){
println('init:'+lines4+' '+pref) //*3.init:a b *7.init:b c
lines4 = pref;
}
println("closure1") ///1. closure1
1.times {
println(lines4) ///2. a
initVars('b') ///3. init:a b
lines4 += 'p1'
println(lines4) ///4. bp1
}
println("closure2") ///5. closure2
1.times {
println(lines4) ///6. bp1
initVars('c') ///7. init:b c
println(lines4) ///8. bp1 Why not c
lines4 += 'q1'
println(lines4) ///9. bp1q1 Why not cq1
}
Output:
C:\projects\ATT>groovy test.groovy
1. closure1
2. a
3. init:a b
4. bp1
5. closure2
6. bp1
7. init:b c
8. bp1
9. bp1q1
Output without @Field and def, with just lines4 = "a" in script scope. This appears normal to me.
C:\projects\ATT>groovy test.groovy
1. closure1
2. a
3. init:a
4. bp1
5. closure2
6. bp1
7. init:bp1
8. c
9. cq1
I saw same behavior in groovy2.5-beta and groovy 2.6-alpha.
Using the @Field annotation on a script variable changes the scope of this variable from a local one to a Script class one:
Variable annotation used for changing the scope of a variable within a script from being within the run method of the script to being at the class level for the script.
The annotated variable will become a private field of the script class. The type of the field will be the same as the type of the variable. Example usage:
import groovy.transform.Field
@Field List awe = [1, 2, 3]
def awesum() { awe.sum() }
assert awesum() == 6
In this example, without the annotation, variable awe would be a local script variable (technically speaking it will be a local variable within the run method of the script class). Such a local variable would not be visible inside the awesum method. With the annotation, awe becomes a private List field in the script class and is visible within the awesum method.
Source: http://docs.groovy-lang.org/2.4.12/html/gapi/groovy/transform/Field.html
Every Groovy script extends groovy.lang.Script class and the body of the script is executed inside Script.run() method. Groovy passes variables to this script using Binding object. When you change a scope of a local script variable to a class level then there is no binding for this variable passed to a closure, because binding object contains only local-scoped variables. Compare these two screenshots I made. First one shows what the binding object looks like when we call initVars(String pref) for the first time and lines4 is a local script variable:
And here is the same breakpoint, but now lines4 is a @Field def lines4 variable:
As you can see there is no binding for lines4 variable in binding object, but there is a class field called lines4, while this binding is present in the first screenshot attached.
When you call
lines4 += 'p1'
in the first closure, a local binding for lines4 is created and initialized with the current value of this.lines4. This happens because Script.getProperty(String property) is implemented in the following way:
public Object getProperty(String property) {
try {
return binding.getVariable(property);
} catch (MissingPropertyException e) {
return super.getProperty(property);
}
}
Source: https://github.com/apache/groovy/blob/GROOVY_2_4_X/src/main/groovy/lang/Script.java#L54
So it firstly checks if there is a binding for a variable you access in the closure and when it does not exist it passes execution to a parent's getProperty(name) implementation - in our case it just returns class property value. At this point this.lines4 is equal to b and this is the value that is returned.
initVars(String pref) method accesses class field, so when you call it it always overrides Script.lines4 property. But when you call
lines4 += 'q1'
in the second closure, the binding lines4 for a closure already exists and its value is bp1 - this value was associated in the first closure call. That's why you don't see c after calling initVars('c'). I hope it helps.
UPDATE: How binding works in a script explained
Let's get a little deeper to get better understanding what is going on under the hood. This is what your Groovy script looks like when it is compiled to a bytecode:
Compiled from "script_with_closures.groovy"
public class script_with_closures extends groovy.lang.Script {
java.lang.Object lines4;
public static transient boolean __$stMC;
public script_with_closures();
public script_with_closures(groovy.lang.Binding);
public static void main(java.lang.String...);
public java.lang.Object run();
public void initVars(java.lang.String);
protected groovy.lang.MetaClass $getStaticMetaClass();
}
Two things worth mentioning at this moment:
@Field def lines4 is compiled to a class field java.lang.Object lines4;
void initVars(String pref) method is compiled to public void initVars(java.lang.String); class method.
For simplicity, you can assume that the rest of your script (everything except lines4 and the initVars method) is inlined into the public java.lang.Object run() method.
initVars always accesses class field lines4 because it has direct access to this field. Decompiling this method to a bytecode shows us this:
public void initVars(java.lang.String);
Code:
0: invokestatic #19 // Method $getCallSiteArray:()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
3: astore_2
4: aload_2
5: ldc #77 // int 5
7: aaload
8: aload_0
9: aload_2
10: ldc #78 // int 6
12: aaload
13: aload_2
14: ldc #79 // int 7
16: aaload
17: aload_2
18: ldc #80 // int 8
20: aaload
21: ldc #82 // String init:
23: aload_0
24: getfield #23 // Field lines4:Ljava/lang/Object;
27: invokeinterface #67, 3 // InterfaceMethod org/codehaus/groovy/runtime/callsite/CallSite.call:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
32: ldc #84 // String
34: invokeinterface #67, 3 // InterfaceMethod org/codehaus/groovy/runtime/callsite/CallSite.call:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
39: aload_1
40: invokeinterface #67, 3 // InterfaceMethod org/codehaus/groovy/runtime/callsite/CallSite.call:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
45: invokeinterface #52, 3 // InterfaceMethod org/codehaus/groovy/runtime/callsite/CallSite.callCurrent:(Lgroovy/lang/GroovyObject;Ljava/lang/Object;)Ljava/lang/Object;
50: pop
51: aload_1
52: astore_3
53: aload_3
54: aload_0
55: swap
56: putfield #23 // Field lines4:Ljava/lang/Object;
59: aload_3
60: pop
61: return
Operation 56 (putfield) is the opcode that assigns a value to a field.
Now let's understand what happens when both closures get called. First thing worth mentioning - both closures have their delegate field set to the script object that is being executed. We know that it extends the groovy.lang.Script class - a class that uses a private binding field to store all bindings (variables) available in the script runtime. This is an important observation, because the groovy.lang.Script class overrides:
public Object getProperty(String property)
public void setProperty(String property, Object newValue)
Both methods use binding to lookup and store variables used in the script runtime. getProperty gets called any time you read local script variable and setProperty gets called any time you assign a value to script local variable. That's why code like:
lines4 += 'p1'
generates sequence like:
getProperty -> value + 'p1' -> setProperty
In your example, the first attempt to read lines4 ends up returning a value from the parent class (this happens when the binding is not found: GroovyObjectSupport.getProperty(name) is called, and it returns the value of the class property with the given name). When the closure assigns a value to the lines4 variable, a binding is created. And because both closures share the same binding object (they delegate to the same instance), when the second closure reads or writes the lines4 variable it uses the previously created binding. And initVars does not modify the binding because, as I showed you earlier, it accesses the class field directly.
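The lookup order described above can be replayed in a small plain-Java model. This is a simplified sketch, not Groovy's implementation: the class ScriptModel, its single field and its HashMap-backed binding are assumptions made for illustration. Reads consult the binding first and fall back to the field, writes from the closures always go to the binding, and initVars touches the field directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScriptModel {
    private Object lines4 = "a";                                  // stands in for the @Field
    private final Map<String, Object> binding = new HashMap<>();  // stands in for the Binding
    private final List<String> output = new ArrayList<>();

    // Binding first, class field as the fallback (this model has only one field).
    Object getProperty(String name) {
        return binding.containsKey(name) ? binding.get(name) : lines4;
    }

    // Assignments made inside the closures always land in the binding.
    void setProperty(String name, Object value) {
        binding.put(name, value);
    }

    // A method of the script class reads and writes the field directly.
    void initVars(String pref) {
        output.add("init:" + lines4 + " " + pref);
        lines4 = pref;
    }

    List<String> run() {
        // closure1
        output.add(String.valueOf(getProperty("lines4")));
        initVars("b");
        setProperty("lines4", getProperty("lines4") + "p1");      // lines4 += 'p1'
        output.add(String.valueOf(getProperty("lines4")));
        // closure2
        output.add(String.valueOf(getProperty("lines4")));
        initVars("c");
        output.add(String.valueOf(getProperty("lines4")));
        setProperty("lines4", getProperty("lines4") + "q1");      // lines4 += 'q1'
        output.add(String.valueOf(getProperty("lines4")));
        return output;
    }

    public static void main(String[] args) {
        new ScriptModel().run().forEach(System.out::println);
    }
}
```

Running the model yields a, init:a b, bp1, bp1, init:b c, bp1, bp1q1, which matches the values in the question's output for the @Field case.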

Is there any way to print the object ref that called an instance/Static method, Using byte-code instrumentation

I read somewhere that when ever a method is called by "invokevirtual",
the object reference is fetched from the top of stack, followed by arguments.
I need to somehow print the object reference. Is it possible?
So, I'm not going to do it for you, because the actual code is annoying and tedious and if you're really genuinely interested you should learn how to do it yourself. But I will attempt to be helpful and provide you with some direction.
Firstly, you're going to want to read the ASM tutorials here.
The byte code format I'm going to write below comes from ASMifier because it's much clearer. I'm going to completely ignore javap because it's even more pedantic and detailed, but if you want to know what it is actually showing you, then you should read about the Java ClassFile format.
Actually, you should do that first anyway, just to make sure that your background knowledge is somewhat filled out.
So, here's the nutshell of what you're going to want to do. You're going to want to write a ClassVisitor whose MethodVisitor looks for instances of the INVOKEVIRTUAL opcode.
invokevirtual pops the values from the stack in reverse order, so last parameter first and the object you're invoking against last. The #38 you are referring to is not the object; it's a reference to the constant pool entry that contains a method name and method descriptor pair, which is used as metadata because the JVM is typesafe.
Lets assume you have this code:
package sample;
public class JavaSimpleHelloWorld {
public static void main(String[] args) {
System.out.println("Hello World");
}
}
If you run ASMifier against it, you'll get something like this for just the main method (cutting the context down for brevity):
public static main([Ljava/lang/String;)V
L0
LINENUMBER 6 L0
GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
LDC "Hello World"
INVOKEVIRTUAL java/io/PrintStream.println (Ljava/lang/String;)V
L1
LINENUMBER 7 L1
RETURN
L2
LOCALVARIABLE args [Ljava/lang/String; L0 L2 0
MAXSTACK = 2
MAXLOCALS = 1
So, you implement some sort of static dump method (public static void dump(Object o)) and write a class visitor that reorganizes your byte code.
You can use the method descriptor to figure out how far back in the previous stack push instructions (ALOAD, LDC, ...) you need to insert the DUP/INVOKESTATIC to print your method's target object. For example, the method descriptor for System.out.println here is (Ljava/lang/String;)V, which means the method takes one String and returns void. So you need to go one push back in the stack to find the target object. Your bytecode would, in turn, look like this:
public static main([Ljava/lang/String;)V
L0
GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
DUP
INVOKESTATIC my/staticutil/ClassThatDumps.dump (Ljava/lang/Object;)V
LDC "Hello World"
INVOKEVIRTUAL java/io/PrintStream.println (Ljava/lang/String;)V
RETURN
L1
LOCALVARIABLE args [Ljava/lang/String; L0 L1 0
MAXSTACK = 2
MAXLOCALS = 1
Happy byte code twiddling.
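To make the two helper pieces concrete, here is a minimal stdlib-only sketch. It does not use ASM and performs no instrumentation. It shows a possible dump helper (declared without a package here to keep the file self-contained), a hypothetical argumentCount method that reads a method descriptor to tell how many pushed values sit above the target object, and a main method that is the source-level equivalent of the rewritten bytecode.

```java
import java.io.PrintStream;

class ClassThatDumps {
    // Renders an object reference as ClassName@identityHash, or "null".
    static String describe(Object o) {
        return o == null
                ? "null"
                : o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o));
    }

    // The method the inserted INVOKESTATIC would call with the duplicated reference.
    public static void dump(Object o) {
        System.err.println("target object: " + describe(o));
    }

    // Counts the arguments declared in a descriptor such as (Ljava/lang/String;)V.
    // It counts values, not stack slots (long and double take two slots each).
    static int argumentCount(String descriptor) {
        int count = 0;
        int i = descriptor.indexOf('(') + 1;
        while (descriptor.charAt(i) != ')') {
            while (descriptor.charAt(i) == '[') {
                i++;                                  // skip array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);       // object type ends at ';'
            }
            i++;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        PrintStream target = System.out;              // GETSTATIC
        dump(target);                                 // DUP + INVOKESTATIC
        target.println("Hello World");                // LDC + INVOKEVIRTUAL
        System.out.println(argumentCount("(Ljava/lang/String;)V"));
    }
}
```

With one argument, the target object is exactly one push below the top of the stack, which is why the DUP goes right after the GETSTATIC in the listing.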

Groovy - Type checking in script not working as expected

I have a Groovy application in which I allow the user to add custom behavior via Groovy scripts. I include those scripts via GroovyShell and type check them via Type Checking Extensions. The full code of how I include the script in my application is:
def config = new CompilerConfiguration()
config.addCompilationCustomizers(
new ASTTransformationCustomizer(TypeChecked)
)
def shell = new GroovyShell(config)
shell.evaluate(new File("path/to/some/file.groovy"))
This works fine. However, type checking in the script seems to be seriously broken. For example, I can include the following scripts without any complaint from the compiler:
String test = getTestValue() // automatic conversion from Integer to String. But WHY?
println "The value is $test" // shows as "The value is 0" on the console
private Integer getTestValue(){
return 0
}
I can even go further than that. When creating a class inside the script, I can assign it to a String without any error:
String y = new Test()
println y // shows Test@somenr on the console
class Test { }
Other type checks do work. I have not discovered any logic behind it yet, so any pointers in the right direction are greatly appreciated.
If in doubt, disasm. This is the bit around a call similar to yours: String x = new T():
0: invokestatic #17 // Method $getCallSiteArray:()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
3: astore_1
4: aload_1
5: ldc #40 // int 1
7: aaload
8: ldc #42 // class T
10: invokeinterface #46, 2 // InterfaceMethod org/codehaus/groovy/runtime/callsite/CallSite.callConstructor:(Ljava/lang/Object;)Ljava/lang/Object;
15: invokestatic #52 // Method org/codehaus/groovy/runtime/typehandling/ShortTypeHandling.castToString:(Ljava/lang/Object;)Ljava/lang/String;
18: checkcast #54 // class java/lang/String
So this is the culprit for that cast. This also seems to hold true for @TypeChecked/@CompileStatic.
This is most likely a bug in the Static Type Checker. When LHS of the expression is a String variable, a conversion invoking ShortTypeHandling.castToString() is applied to the RHS.
This holds true as of Groovy 2.4.13.
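A rough Java model of what that inserted call appears to do, judging only by the console output in the question: null stays null and everything else is rendered through toString(). The method castToStringModel is a hypothetical stand-in written for illustration, and its body is an assumption, not a copy of ShortTypeHandling.castToString.

```java
class CastModel {
    static class T { }

    // Assumed behaviour: convert any non-null value via toString().
    static String castToStringModel(Object value) {
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        String test = castToStringModel(Integer.valueOf(0));
        System.out.println("The value is " + test);

        String y = castToStringModel(new T());
        System.out.println(y);   // class name, '@', then a hash code
    }
}
```

Under that assumption, an assignment to a String variable never fails at runtime for a non-null value, which is consistent with both scripts in the question running without complaint.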

Portable multithreading support in bytecodes/intermediate languages/compiler backends?

I've been working on the parser for a programming language that requires multithreading support. While investigating what the backend of my compiler should be, I noticed that I cannot find much information on multithreading for things like CIL, LLVM IR, gcc RTL, or JVM bytecode. I can find some references on how to make such code thread safe, but nothing on how to, say, create or fork threads. I can of course use signals or something to interface directly with the operating system, but that's nonportable and error-prone.
Is it the case that there's simply no portable way for managing threads in these low-level languages? Should I compile to a high(er)-level language like C instead?
In JVM byte code you can use any Java libraries, including those that work with threads. The conventional way of creating a thread would be
new Thread() {
@Override
public void run() {
// code
}
}.start();
This code is written in Java. The corresponding JVM byte code can be seen by using javap:
0: new #2 // class Main$1
3: dup
4: invokespecial #3 // Method Main$1."<init>":()V
7: invokevirtual #4 // Method Main$1.start:()V
10: return
And Main$1 is a class:
final class Main$1 extends java/lang/Thread {
// compiled from: Intf.java
OUTERCLASS Main main ([Ljava/lang/String;)V
// access flags 0x8
static INNERCLASS Main$1 null null
// access flags 0x0
<init>()V
L0
LINENUMBER 7 L0
ALOAD 0
INVOKESPECIAL java/lang/Thread.<init> ()V
RETURN
L1
LOCALVARIABLE this LMain$1; L0 L1 0
MAXSTACK = 1
MAXLOCALS = 1
// access flags 0x1
public run()V
L0
LINENUMBER 11 L0
RETURN
L1
LOCALVARIABLE this LMain$1; L0 L1 0
MAXSTACK = 0
MAXLOCALS = 1
}
This code is perfectly portable.
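For completeness, a self-contained version of the snippet that can be compiled and run as-is. The class name ThreadDemo, the thread name worker-1 and the method runInThread are made up for illustration; the code relies only on java.lang.Thread.

```java
class ThreadDemo {
    // Starts a worker thread, waits for it, and returns what it wrote.
    static String runInThread() {
        final StringBuilder result = new StringBuilder();
        Thread worker = new Thread() {
            @Override
            public void run() {
                result.append("done on ").append(getName());
            }
        };
        worker.setName("worker-1");
        worker.start();
        try {
            worker.join();   // waiting here also makes the worker's writes visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(runInThread());
    }
}
```

A compiler backend targeting the JVM would emit the equivalent new/invokespecial/invokevirtual sequence against java/lang/Thread rather than going through Java source.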
