Every example I've seen (e.g., ElasticSearch: aggregation on _score field?) for doing aggregations on or related to the _score field seems to require the usage of scripting. With ElasticSearch disabling dynamic scripting by default for security reasons, is there any way to accomplish this without resorting to loading a script file onto every ES node or re-enabling dynamic scripting?
My original aggregation looked like the following:
"aggs": {
"terms_agg": {
"terms": {
"field": "field1",
"order": {"max_score": "desc"}
},
"aggs": {
"max_score": {
"max": {"script": "_score"}
},
"top_terms": {
"top_hits": {"size": 1}
}
}
}
}
Trying to specify expression as the lang doesn't seem to work as ES throws an error stating the score can only be accessed when being used to sort. I can't figure out any other method of ordering my buckets by the score field. Anyone have any ideas?
Edit: To clarify, my restriction is not being able to modify the server-side. I.e., I cannot add or edit anything as part of the ES installation or configuration.
One possible approach is to use the other scripting options available. mvel cannot be used unless dynamic scripting is enabled. And unless finer-grained control over enabling and disabling scripting arrives in version 1.6, I don't think it is possible to enable dynamic scripting for mvel and not for groovy.
We are left with native and mustache (used for templates), which are enabled by default. I don't think custom scripting can be done with mustache (if it's possible, I didn't find a way), so we are left with native (Java) scripting.
Here's my take on this:
create an implementation of NativeScriptFactory:
package com.foo.script;
import java.util.Map;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;
public class MyScriptNativeScriptFactory implements NativeScriptFactory {
@Override
public ExecutableScript newScript(Map<String, Object> arg0) {
return new MyScript();
}
}
an implementation of AbstractFloatSearchScript for example:
package com.foo.script;
import java.io.IOException;
import org.elasticsearch.script.AbstractFloatSearchScript;
public class MyScript extends AbstractFloatSearchScript {
@Override
public float runAsFloat() {
try {
return score();
} catch (IOException e) {
e.printStackTrace();
}
return 0;
}
}
alternatively, build a simple Maven project to tie it all together. pom.xml:
<properties>
<elasticsearch.version>1.5.2</elasticsearch.version>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${elasticsearch.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
<build>
<sourceDirectory>src</sourceDirectory>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
build it and get the resulting jar file.
place the jar inside [ES_folder]/lib
edit elasticsearch.yml and add
script.native.my_script.type: com.foo.script.MyScriptNativeScriptFactory
restart ES nodes.
use it in aggregations:
{
"aggs": {
"max_score": {
"max": {
"script": "my_script",
"lang": "native"
}
}
}
}
My sample above just returns the _score as a script but, of course, it can be used in more advanced scenarios.
EDIT: if you are not allowed to touch the instances, then I don't think you have any options.
ElasticSearch, at least as of version 1.7.1 and possibly earlier, also offers Lucene's Expression scripting language. Because Expression is sandboxed by default, it can be used for dynamic inline scripts in much the same way that Groovy was. In our case, where our production ES cluster has just been upgraded from 1.4.1 to 1.7.1, we decided not to use Groovy anymore because of its non-sandboxed nature, although we still want to make use of dynamic scripts because of the ease of deployment and the flexibility they offer as we continue to fine-tune our application and its search layer.
While writing a native Java script as a replacement for our dynamic Groovy function scores may also have been a possibility in our case, we wanted to look at the feasibility of using Expression as our dynamic inline scripting language instead. After reading through the documentation, I found that we could simply change the "lang" attribute from "groovy" to "expression" in our inline function_score scripts, and with the script.inline: sandbox property set in the .../config/elasticsearch.yml file the function score script worked without any other modification. As such, we can now continue to use dynamic inline scripting within ElasticSearch, and do so with sandboxing enabled (as Expression is sandboxed by default). Obviously, other security measures, such as running your ES cluster behind an application proxy and firewall, should also be implemented to ensure that outside users have no direct access to your ES nodes or the ES API. However, this was a very simple change that, for now, has solved the problem of Groovy's lack of sandboxing and our concerns over enabling it to run without sandboxing.
While switching your dynamic scripts to Expression may only work or be applicable in some cases (depending on the complexity of your inline dynamic scripts), it seemed it was worth sharing this information in the hopes it could help other developers.
As a note, one of the other supported ES scripting languages, Mustache, only appears to be usable for creating templates within your search queries. It does not appear to be usable for any of the more complex scripting needs such as function_score, although I am not sure this was entirely apparent during the first read-through of the updated ES documentation.
Lastly, one further issue to be mindful of is that the use of Lucene Expression scripts is marked as an experimental feature in the latest ES release, and the documentation notes that, as this scripting extension is undergoing significant development work at this time, its usage or functionality may change in later versions of ES. Thus, if you do switch over to using Expression for any of your scripts (dynamic or otherwise), note in your documentation/developer notes that these changes should be revisited before you next upgrade your ES installation, to ensure your scripts remain compatible and work as expected.
For our situation at least, unless we were willing to allow non-sandboxed dynamic scripting to be enabled again in the latest version of ES (via the script.inline: on option) so that inline Groovy scripts could continue to run, switching over to Lucene Expression scripting seemed like the best option for now.
It will be interesting to see what changes occur to the scripting choices for ES in future releases, especially given that the (apparently ineffective) sandboxing option for Groovy will be completely removed by version 2.0. Hopefully other protections can be put in place to enable dynamic Groovy usage, or perhaps Lucene Expression scripting will take Groovy's place and will enable all the types of dynamic scripting that developers are already making use of.
For more notes on Lucene Expression see the ES documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html#_lucene_expressions_scripts – this page is also the source of the note regarding the planned removal of Groovy's sandboxing option from ES v2.0+. Further Lucene Expression documentation can be found here: http://lucene.apache.org/core/4_9_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html
According to the Puppet documentation:
Order does not matter in a declarative language.
If that is the case, why does this bit of code work:
class myserver {
$package_to_install = 'libcapture-tiny-perl'
package {
$package_to_install: ensure => present;
}
}
but this code does not work:
class myserver {
package {
$package_to_install: ensure => present;
}
$package_to_install = 'libcapture-tiny-perl'
}
If order matters, then I can see why one works and the other does not, but since order does not matter, why do they behave differently?
Disclaimer: I am one of the Puppet developers.
Because our language isn't, as our documentation claims, actually declarative. It is actually ordered. :(
Evaluation is more or less top to bottom inside the class or declaration. The product of that evaluation is a resource in the catalog, however, not evaluation of the catalog.
Think of the DSL as a not-entirely-declarative way to build the catalog, a graph of resources, that are entirely declarative in processing.
I'm totally new to Gradle, TeamCity and Groovy.
I'm trying to write a plugin which will get the value from svninfo. If the developer wants to override the value (in build.gradle), they can do so like this:
globalVariables{
virtualRepo = "virtualRepo"
baseName = "baseName"
version = "version"
group = "group"
}
Here I provide the extension called globalVariables.
Now, the jars to be produced should be named according to the values from build.gradle.
How do I get the values from build.gradle in the plugin in order to name the jar?
Not sure I understand the question. It's the plugin that installs the extension object, and it's the plugin that needs to do something with it.
Note that the plugin has to defer reading information from the extension object because the latter might only get populated after the plugin has run. (A plugin runs when apply plugin: is called in a build script. globalVariables {} might come after that.) There are several techniques for deferring configuration. In your particular case, if I wanted to use the information provided by the extension object to configure a Jar task, I might use jar.doFirst { ... } or gradle.projectsEvaluated { jar. ... }.
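To make the timing issue concrete, here is a minimal plain-Java model of it. None of these types are Gradle API: FakeProject, GlobalVariables, whenEvaluated and finishEvaluation are hypothetical stand-ins for the project, the extension object, and whichever deferral hook you pick. It only illustrates why a value read at apply time misses an override that comes later in the build script.

```java
import java.util.ArrayList;
import java.util.List;

// A plain-Java model of the timing problem; none of these types are Gradle API.
final class DeferredConfigDemo {

    /** Stands in for the extension object the plugin registers (hypothetical). */
    static final class GlobalVariables {
        String baseName = "fromSvnInfo"; // default the plugin would derive itself
    }

    /** Stands in for a project that runs callbacks once the build script is evaluated. */
    static final class FakeProject {
        final GlobalVariables globalVariables = new GlobalVariables();
        private final List<Runnable> callbacks = new ArrayList<>();
        String jarName;

        void whenEvaluated(Runnable action) { callbacks.add(action); }
        void finishEvaluation() { callbacks.forEach(Runnable::run); }
    }

    /** Eager plugin: reads the extension while it is being applied (too early). */
    static void applyEager(FakeProject p) {
        p.jarName = p.globalVariables.baseName + ".jar";
    }

    /** Deferred plugin: registers a callback and reads the extension later. */
    static void applyDeferred(FakeProject p) {
        p.whenEvaluated(() -> p.jarName = p.globalVariables.baseName + ".jar");
    }

    /** Simulates: apply the plugin, then the globalVariables block, then end of evaluation. */
    static String build(boolean deferred) {
        FakeProject p = new FakeProject();
        if (deferred) applyDeferred(p); else applyEager(p);
        p.globalVariables.baseName = "baseName"; // the build script's override
        p.finishEvaluation();
        return p.jarName;
    }
}
```

In this model the eager variant names the jar from the default, while the deferred variant picks up the override, which is the behaviour you want from the real plugin.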
Before you go about writing plugins, make sure to study the Writing Custom Plugins chapter in the Gradle user guide. A search on Stack Overflow or on http://forums.gradle.org should turn up more information on techniques for deferring configuration. Another valuable source of information is the Gradle codebase (e.g. the plugins in the code-quality subproject).
I'm planning to integrate Groovy Script Engine to my game so it will give the game nice moddability but how do you prevent players from writing evil scripts like deleting all files on C: drive?
Groovy includes libraries like java.io.File by default, so it will be pretty easy to do once they decide to write such scripts.
I guess I can't prevent users from writing something like while(1==1){}, but is there any way to at least prevent them from deleting/modifying files or doing something else dangerous to the PC?
There's a blog post by Cedric Champeau on customising the Groovy compilation process. The second part of it shows how to use SecureASTCustomizer and CompilerConfiguration to limit what scripts can do (and then has examples of defining your own AST checks for System.exit, etc.).
Look into the SecurityContext class.
The Groovy Web Console appears to have already solved this problem, because it won't execute something like System.exit(1). The source code is available on GitHub, so you can see how they did it.
If you're not sure where to start, I suggest getting in touch with the author, who should be able to point you in the right direction.
I know this is an old question. I'm posting this as it might help some people out there.
We needed to allow end-users to upload Groovy scripts and execute them as part of a web application (that does a lot of other things). Our concern was that within these Groovy scripts, some users might attempt to read files from the file system, read System properties, call System.exit(), etc.
I looked into http://mrhaki.blogspot.com/2014/04/groovy-goodness-restricting-script.html but that will not prevent an expert Groovy developer from bypassing the checks as pointed out by others in other posts.
I then tried to get http://www.sdidit.nl/2012/12/groovy-dsl-executing-scripts-in-sandbox.html working but setting the Security Manager and Policy implementation at runtime did not work for me. I kept running into issues during app server startup and web page access. It seemed like by the time the Policy implementation took hold, it was too late and "CodeSources" (in Java-Security-speak) already took its access settings from the default Java policy file.
I then stumbled across the excellent white paper by Ted Neward (http://www.tedneward.com/files/Papers/JavaPolicy/JavaPolicy.pdf) that explained quite convincingly that the best approach (for my use case) was to set the Policy implementation on JVM startup (instead of dynamically later on).
Below is the approach that worked for me (that combines Rene's and Ted's approaches). BTW: We're using Groovy 2.3.10.
In the [JDK_HOME]/jre/lib/security/java.security file, set the "policy.provider" value to "com.yourcompany.security.MySecurityPolicy".
Create the MySecurityPolicy class:
import java.net.MalformedURLException;
import java.net.URL;
import java.security.AllPermission;
import java.security.CodeSource;
import java.security.PermissionCollection;
import java.security.Permissions;
import java.security.Policy;
import java.util.HashSet;
import java.util.Set;
public class MySecurityPolicy extends Policy {
private final Set<URL> locations;
public MySecurityPolicy() {
try {
locations = new HashSet<URL>();
locations.add(new URL("file", "", "/groovy/shell"));
locations.add(new URL("file", "", "/groovy/script"));
} catch (MalformedURLException e) {
throw new IllegalStateException(e);
}
}
@Override
public PermissionCollection getPermissions(CodeSource codeSource) {
// Do not store these in static or instance variables. It won't work. Also... they're cached by security infrastructure ... so this is okay.
PermissionCollection perms = new Permissions();
if (!locations.contains(codeSource.getLocation())) {
perms.add(new AllPermission());
}
return perms;
}
}
Jar up MySecurityPolicy and drop the jar in [JDK_HOME]/jre/lib/ext directory.
Add "-Djava.security.manager" to the JVM startup options. You do not need to provide a custom security manager. The default one works fine.
The "-Djava.security.manager" option enables Java Security Manager for the whole application. The application and all its dependencies will have "AllPermission" and will thereby be allowed to do anything.
Groovy scripts run under the "/groovy/shell" and "/groovy/script" "CodeSources". They're not necessarily physical directories on the file system. The code above does not give Groovy scripts any permissions.
Users could still do the following:
Thread.currentThread().interrupt();
while (true) {} (infinite loop)
You could prepend the following (dynamically at runtime) to the beginning of every script before passing it onto the Groovy shell for execution:
@ThreadInterrupt
import groovy.transform.ThreadInterrupt
@TimedInterrupt(5)
import groovy.transform.TimedInterrupt
These are explained at http://www.jroller.com/melix/entry/upcoming_groovy_goodness_automatic_thread
The first one handles "Thread.currentThread().interrupt()" a little more gracefully (but it doesn't prevent the user from interrupting the thread). Perhaps you could use AST checks to prevent interrupts to some extent. In our case, it's not a big issue, as each Groovy script execution runs in its own thread, and if bad actors wish to kill their own thread, they can knock themselves out.
The second one prevents the infinite loop in that all scripts time out after 5 seconds. You can adjust the time.
Note that I noticed a performance degradation in the Groovy script execution time but did not notice a significant degradation in the rest of the web application.
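If you also want a wall-clock limit enforced from the host side, a minimal sketch with plain java.util.concurrent could look like the following. This is a different technique from the TimedInterrupt transformation, not what we deployed, and ScriptTimeoutRunner / runWithTimeout are names made up for illustration. The Callable stands in for whatever call evaluates the script.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Groovy: runs one task per thread with a time budget.
final class ScriptTimeoutRunner {

    /** Returns the task's result, or null if it did not finish within timeoutMillis. */
    static <T> T runWithTimeout(Callable<T> scriptTask, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "script-runner");
            t.setDaemon(true); // a stuck script must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(scriptTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker thread; the script stops
            // only if its code actually checks the interrupt flag.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("script failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that this only stops the caller from waiting. A busy loop that never checks for interruption keeps burning CPU on its daemon thread, which is why the ThreadInterrupt transformation is still worth prepending to the script.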
Hope that helps.
I have a situation where I need to determine eligibility for one object to "ride" another. The rules for the vehicles are wildly confusing, and I would like to be able to change them without restarting or recompiling my project.
This works but basically makes my security friends convulse and speak in tongues:
class SweetRider{
String stuff
BigDecimal someNumber
BigDecimal anotherNumber
}
class SweetVehicle{
static hasMany=[constraintLinkers:VehicleConstraintLinker]
String vehicleName
Boolean canIRideIt(SweetRider checkRider){
def checkList = VehicleConstraintLinker.findAllByVehicle(this)
checkList.each{
def theClosureObject = it.closureConstraint
def iThinkINeedAShell = new GroovyShell()
def checkerThing = iThinkINeedAShell.evaluate(theClosureObject.closureText)
def result = checkerThing(checkRider)
return result
}
}
}
class VehicleConstraintLinker{
static belongsTo = [closureConstraint:ConstraintByClosure, vehicle:SweetVehicle]
}
class ConstraintByClosure{
String humanReadable
String closureText
static hasMany = [vehicleLinkers:VehicleConstraintLinker]
}
So if I want to add the rule that you are only eligible for a certain vehicle if your "stuff" is "peggy" or "waffles" and your someNumber is greater than your anotherNumber all I have to do is this:
Make a new ConstraintByClosure with humanReadable = "peggy waffle some#>" (that's the human-readable explanation) and then add this string as the closureText:
{
checkRider->if(
["peggy","waffles"].contains(checkRider.stuff) &&
checkRider.someNumber > checkRider.anotherNumber ) {
return true
}
else {
return false
}
}
Then I just make a VehicleConstraintLinker to link it up and voila.
My question is this: Is there any way to restrict what the GroovyShell can do? Can I make it unable to change any files, globals or database data? Is this sufficient?
Be aware that denying access to java.io and java.lang.Runtime in their various guises is not sufficient. There are many core libraries with a whole lot of authority that a malicious coder could try to exploit so you need to either white-list the symbols that an untrusted script can access (sandboxing or capability based security) or limit what anything in the JVM can do (via the Java SecurityManager). Otherwise you are vulnerable to confused deputy attacks.
provide security sandbox for executing scripts attempts to work with the GroovyClassLoader to provide sandboxing.
Sandboxing Java / Groovy / Freemarker Code - Preventing execution of specific methods discusses ways of sandboxing groovy, but not specifically at evaluate boundaries.
Groovy Scripts and JVM Security talks about sandboxing groovy scripts. I'm not a huge fan of the JVM security policy stuff since installing a security manager can affect a whole lot of other things in the JVM but it does provide a way of intercepting access to files and runtime stuff.
I don't know that either of the first two schemes has been exposed to widespread attack, so I would be leery of deploying them without a thorough attack review. The JVM security manager has been subjected to widespread attack in the browser, and suffered many failures. It can be a good defense-in-depth, but I would not rely on it as the sole line of defense for anything precious.
I want to set up include paths (and other paths, like view script paths) based on the module being accessed. Is this safe? If not, how could I safely set up include paths dynamically? I'm doing something like the code below (this is from a controller plugin.)
public function dispatchLoopStartup(Zend_Controller_Request_Abstract $request) {
$modName = $request->getModuleName();
$modulePath = APP_PATH.'/modules/'.$modName.'/classes';
set_include_path(get_include_path().PATH_SEPARATOR.$modulePath);
}
I'm not sure whether it is safe or not, but it doesn't sound like the best practice. What if someone entered a module name like ../admin/? You should sanitize the module name before using it.
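The sanitizing idea is language-independent, so here is a minimal sketch of it in Java rather than in the plugin's PHP. ModulePathResolver, resolve and the allowed character set are assumptions made for illustration; adjust the pattern to whatever your module names really look like.

```java
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper: validates a request-supplied module name before it becomes a path.
final class ModulePathResolver {
    // Assumption: module names are lowercase letters, digits, '_' or '-'.
    private static final Pattern SAFE_NAME = Pattern.compile("[a-z][a-z0-9_-]*");

    /** Returns the module's classes directory, or throws if the name is unsafe. */
    static Path resolve(Path modulesRoot, String moduleName) {
        if (moduleName == null || !SAFE_NAME.matcher(moduleName).matches()) {
            throw new IllegalArgumentException("invalid module name: " + moduleName);
        }
        Path root = modulesRoot.toAbsolutePath().normalize();
        Path resolved = root.resolve(moduleName).resolve("classes").normalize();
        // Defense in depth: the normalized path must still sit under the modules root.
        if (!resolved.startsWith(root)) {
            throw new IllegalArgumentException("path escapes modules root: " + moduleName);
        }
        return resolved;
    }
}
```

With this check a name such as ../admin/ is rejected before it ever reaches the include path. An explicit list of known module names would be stricter still.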
It's fine as long as you sanitize your variables before using them, but it won't be very performant. Fiddling with include paths at runtime seriously impacts performance.
You're trying to load models/helpers per module? You should look at Zend_Application:
Zend_Application provides a bootstrapping facility for applications which provides reusable resources, common- and module-based bootstrap classes and dependency checking. It also takes care of setting up the PHP environment and introduces autoloading by default.
Emphasis by me