Convert String into Spark Expression and evaluate it - apache-spark

We want to evaluate an expression, held in a Java String, as a Spark Expression without a DataFrame.
For example, these work:
import org.apache.spark.sql.catalyst.InternalRow;
import static org.apache.spark.sql.functions.*;
int age = 20;
Object result1 = expr(String.format("%d*2", age)).expr().eval(InternalRow.empty());
System.out.println(result1); // prints 40
Object result2 = expr("1+2").expr().eval(InternalRow.empty());
System.out.println(result2); // prints 3
But can we also make this work?
System.out.println(expr("abs(-1)").expr().eval(InternalRow.empty()));

Related

Split a string by some separator

Let's say I have the dummy code below:
import a_package as ap
calc = ap.func.func1()
So in my case calc is a function or method. However, I want Python to treat this as a string, split it on "." and return the last element, i.e. func1.
Is there a direct way to achieve this?
As the other reviewers noted, you will need to start with a string if you want to perform a string operation. Assuming you have a way of getting a string in the form you showed, here are two examples of how to get the part of that string you are interested in:
import re
my_string = "ap.func.func1()"
# Should be sufficient for the use case you describe
my_split_string = my_string.split(".")
print(my_split_string[-1])
# More powerful (but more complex) if you have extended use cases
match = re.search(r"\.([^.]+)$", my_string)
print(match.group(1))
Output:
func1()
func1()
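If the object really is a function (that is, it was assigned without calling it), a string may not be needed at all, because a Python function object carries its own name. The sketch below assumes a hypothetical local func1 standing in for ap.func.func1, and also shows rsplit as a shorter string-based variant of the split approach above.

```python
def func1():
    """Hypothetical stand-in for ap.func.func1."""
    return 42

# No parentheses here, so calc is the function itself, not its return value.
calc = func1

# A function object already knows its own name.
name_from_object = calc.__name__
print(name_from_object)

# Starting from a string instead: rsplit with maxsplit=1 cuts only at the last dot.
my_string = "ap.func.func1()"
last_part = my_string.rsplit(".", 1)[-1]
print(last_part)

# Strip the trailing call parentheses if only the bare name is wanted.
bare_name = last_part.rstrip("()")
print(bare_name)
```

Note that calc = ap.func.func1() with parentheses stores whatever the call returns, so the name is only recoverable this way when the function itself is assigned.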

Groovy: Constructor hash collision

I have the following groovy code:
def script
String credentials_id
String repository_path
String relative_directory
String repository_url
CredentialsWrapper(script, credentials_id, repository_name, repository_group, relative_directory=null) {
this(script, credentials_id, 'git@gitlab.foo.com:' + repository_group +'/' + repository_name + '.git', relative_directory);
}
CredentialsWrapper(script, credentials_id, repository_url, relative_directory=null) {
this.script = script;
this.credentials_id = credentials_id;
this.repository_url = repository_url;
if (null == relative_directory) {
int lastSeparatorIndex = repository_url.lastIndexOf("/");
int indexOfExt = repository_url.indexOf(".git");
this.relative_directory = repository_url.substring(lastSeparatorIndex+1, indexOfExt);
}
}
Jenkins gives me the following:
Unable to compile class com.foo.CredentialsWrapper due to hash collision in constructors @ line 30, column 7.
I do not understand why: the constructors are different, since they do not have the same number of arguments.
Also, "script" is an instance of "WorkflowScript", but I do not know what I should import to access this class, which would let me declare script with an explicit type instead of using "def".
Any ideas?
When you call the constructor with four parameters, do you want to call the first one or the second one?
If you write a constructor or method with default values, Groovy actually generates two or more versions.
So
Test(String x, String y ="test")
will result in
Test(String x, String y) {...}
and
Test(String x) {this(x, "test")}
So your code would compile to four constructors, but it contains the constructor with the signature
CredentialsWrapper(def, def, def, def)
twice.
If I understand your code correctly, you can omit one or both of the =null defaults. The result will be the same, but you will get only two or three signatures. Then you can choose between the two versions by calling them with the right parameter count.
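The counting argument above can be checked with a small model. This is only a sketch of the reasoning, not of how the Groovy compiler works internally: it assumes that every parameter is untyped, so two generated constructors clash exactly when they have the same number of parameters. The labels and the find_collisions helper are made up for illustration.

```python
from collections import defaultdict

def generated_arities(total_params, defaulted_params):
    """Parameter counts produced when trailing defaults are expanded into overloads."""
    return [total_params - k for k in range(defaulted_params + 1)]

def find_collisions(constructors):
    """Map each parameter count to the constructors that generate it, keeping only clashes."""
    by_arity = defaultdict(list)
    for label, total, defaulted in constructors:
        for arity in generated_arities(total, defaulted):
            by_arity[arity].append(label)
    return {arity: labels for arity, labels in by_arity.items() if len(labels) > 1}

# (label, total parameter count, number of trailing parameters with a default)
as_posted = [("name/group constructor", 5, 1), ("url constructor", 4, 1)]
without_first_default = [("name/group constructor", 5, 0), ("url constructor", 4, 1)]

print(find_collisions(as_posted))
print(find_collisions(without_first_default))
```

With both defaults in place, the two constructors each produce a four-parameter version; dropping the default from the five-parameter constructor leaves the parameter counts 5, 4 and 3 with no overlap.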

Java and Python codes give different output?

This is the Java code, which prints 897986030:
import java.util.Arrays;
import java.util.Scanner;
class Algorithm {
public static void main(String args[]) throws Exception {
int mod = 1000000007;
long factor = 900414279;
long p1 = 883069911;
long p2 = 32;
long val = 560076994;
val = (val%mod+factor*p1*p2%mod)%mod;
System.out.println(val);
}
}
This is the equivalent Python code, which outputs 480330031:
factor = 900414279
p1 = 883069911
p2 = 32
val = 560076994
mod = 1000000007
val = (val % mod + factor * p1 * p2 % mod) % mod
print(val)
Please help. Thanks!
The answer lies in the fact that you are using primitive types in Java, which are prone to overflow.
Let me explain in case you are not already aware of this concept. In Java, C, C++ and similar languages, primitive types have a fixed amount of space allocated to them, and a variable cannot use more than that. This means there is a maximum number that the long data type can store (2^63 - 1, roughly 9.22 * 10^18). This is done for performance reasons.
What happens in the code above is that factor*p1*p2 is evaluated before % mod is applied. factor*p1 is about 7.95 * 10^17 and still fits, but multiplying it by 32 gives about 2.54 * 10^19, which is larger than the maximum a long can store. This results in an overflow: the value silently wraps around, so the result of the expression is wrong. Taking the remainder after each multiplication, ((factor*p1)%mod*p2)%mod, keeps every intermediate value within range.
With Python this is not an issue because Python integers have arbitrary precision and grow as needed instead of overflowing. This is also why applications that use big numbers, such as cryptographic ones, are easy to write in Python.
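To see where the two numbers come from, the Java behaviour can be imitated in Python by wrapping every multiplication to 64 bits. This is a rough sketch: wrap_long and java_rem are hypothetical helpers written for this example that mimic a Java long and Java's % operator. It also shows that reducing after each multiplication gives the exact answer without any value exceeding the long range.

```python
MASK64 = (1 << 64) - 1

def wrap_long(x):
    """Wrap an integer to the signed 64-bit range, like a Java long."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x

def java_rem(a, b):
    """Remainder that takes the sign of the dividend, like Java's % operator."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

mod = 1000000007
factor, p1, p2, val = 900414279, 883069911, 32, 560076994

# Exact result with Python's arbitrary-precision integers.
exact = (val % mod + factor * p1 * p2 % mod) % mod

# Same expression, but every multiplication wraps to 64 bits as in Java.
product = wrap_long(wrap_long(factor * p1) * p2)
wrapped = java_rem(java_rem(val, mod) + java_rem(product, mod), mod)

# Reducing after each multiplication keeps every intermediate below 2**63.
step = (factor * p1) % mod
step = (step * p2) % mod
stepwise = (val % mod + step) % mod

print(exact, wrapped, stepwise)  # 480330031 897986030 480330031
```

The same stepwise reduction carries over to the Java code directly, since each product of two values below mod stays under 2^63.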

Iterate over JavaRDD<String> in Spark

I am working with Spark. My objective is to read lines from a file and sort them based on a hash. I understand that we get the file as an RDD of lines. Is there a way to iterate over this RDD so that I can read it line by line? I want to be able to convert it to an Iterator type.
Or am I limited to applying some transformation function to it in order to get this working, following Spark's lazy execution concept?
So far I have tried the following transformation code.
SparkConf sparkConf = new SparkConf().setAppName("Sorting1");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaRDD<String> lines = ctx.textFile("hdfs://localhost:9000/hash-example-output/part-r-00000", 1);
lines = lines.filter(new Function<String, Boolean>()
{
@Override
public Boolean call(String s) {
String str[] = COMMA.split(s);
unsortedArray1[i] = Long.parseLong(str[str.length-1]);
i++;
return s.contains("error");
}
});
lines.count();
ctx.stop();
sort(unsortedArray1);
If you want to sort the strings in an RDD, you could use the takeOrdered function:
java.util.List<T> takeOrdered(int num, java.util.Comparator<T> comp)
Returns the first K elements from this RDD as defined by the specified Comparator[T] and maintains the order.
Parameters: num - the number of top elements to return; comp - the comparator that defines the order
Returns: an array of top elements
or
java.util.List<T> takeOrdered(int num)
Returns the first K elements from this RDD using the natural ordering for T while maintaining the order.
Parameters: num - the number of top elements to return
Returns: an array of top elements
So you could do:
List<String> sortedLines = lines.takeOrdered((int) lines.count()); // count() returns long, takeOrdered expects int
ctx.stop();
Since an RDD is distributed across partitions and later transformations may shuffle it again, sorting while the data is still in RDD form is of limited use unless the next step relies on that order (correct me if I'm wrong).
But take a look at JavaPairRDD.sortByKey().
Try collect():
List<String> list = lines.collect();
Collections.sort(list);

Converting a string to int in Groovy

I have a String that represents an integer value and would like to convert it to an int. Is there a groovy equivalent of Java's Integer.parseInt(String)?
Use the toInteger() method to convert a String to an Integer, e.g.
int value = "99".toInteger()
An alternative, which avoids using a deprecated method (see below) is
int value = "66" as Integer
If you need to check whether the String can be converted before performing the conversion, use
String number = "66"
if (number.isInteger()) {
int value = number as Integer
}
Deprecation Update
In recent versions of Groovy one of the toInteger() methods has been deprecated. The following is taken from org.codehaus.groovy.runtime.StringGroovyMethods in Groovy 2.4.4
/**
* Parse a CharSequence into an Integer
*
* @param self a CharSequence
* @return an Integer
* @since 1.8.2
*/
public static Integer toInteger(CharSequence self) {
return Integer.valueOf(self.toString().trim());
}
/**
* @deprecated Use the CharSequence version
* @see #toInteger(CharSequence)
*/
@Deprecated
public static Integer toInteger(String self) {
return toInteger((CharSequence) self);
}
You can force the non-deprecated version of the method to be called using something awful like:
int num = ((CharSequence) "66").toInteger()
Personally, I much prefer:
int num = "66" as Integer
Several ways to do it, this one's my favorite:
def number = '123' as int
As an addendum to Don's answer, not only does groovy add a .toInteger() method to Strings, it also adds toBigDecimal(), toBigInteger(), toBoolean(), toCharacter(), toDouble(), toFloat(), toList(), and toLong().
In the same vein, Groovy also adds is* equivalents to all of those, which return true if the String in question can be parsed into the format in question.
The relevant GDK page is here.
I'm not sure if it was introduced in recent versions of groovy (initial answer is fairly old), but now you can use:
def num = mystring?.isInteger() ? mystring.toInteger() : null
or
def num = mystring?.isFloat() ? mystring.toFloat() : null
I recommend using floats or even doubles instead of integers if the provided string is unreliable.
Well, Groovy accepts the Java form just fine. If you are asking if there is a Groovier way, there is a way to go to Integer.
Both are shown here:
String s = "99"
assert 99 == Integer.parseInt(s)
Integer i = s as Integer
assert 99 == i
You can also use a static import:
import static java.lang.Integer.parseInt as asInteger
and after that use
String s = "99"
asInteger(s)
The toInteger() method is available in Groovy; you could use that.
There are several ways to achieve this. Examples are below:
a. return "22".toInteger()
b. if ("22".isInteger()) return "22".toInteger()
c. return "22" as Integer
d. return Integer.parseInt("22")
Hope this helps.
Groovy Style conversion:
Integer num = '589' as Integer
If you have request parameter:
Integer age = params.int('age')
def str = "32"
int num = str as Integer
Here is another way, if you don't like exceptions.
def strnumber = "100"
def intValue = strnumber.isInteger() ? (strnumber as int) : null
The method to use should still be toInteger(), because it is not really deprecated.
int value = '99'.toInteger()
The String version is deprecated, but CharSequence is an interface that String implements. So using a String is fine, because your code will still work even when the method only accepts CharSequence. The same goes for isInteger().
See this question for reference:
How to convert a String to CharSequence?
I commented because the notion of deprecation on this method confused me, and I want to spare other people that confusion.
The simpler way of converting a String to an integer in Groovy is as follows:
String aa = "25"
int i = aa.toInteger()
Now "i" holds the integer value.

Resources