I'm trying to run the evaluation code in
http://lenskit.org/documentation/evaluator/quickstart/
but, after one minute or so, it finishes with an exception:
Exception in thread "main" Target "eval" does not exist in the project "null".
at org.apache.tools.ant.Project.tsort(Project.java:1912)
at org.apache.tools.ant.Project.topoSort(Project.java:1820)
at org.grouplens.lenskit.eval.EvalProject.executeTargets(EvalProject.java:168)
at org.grouplens.lenskit.eval.cli.EvalCLI.run(EvalCLI.java:91)
at org.grouplens.lenskit.eval.cli.EvalCLI.main(EvalCLI.java:127)
I just downloaded and unzipped ml-100k.zip, put the eval.groovy script in the same directory, and ran
lenskit-eval eval
I'm using lenskit 2.2 on Java 7.
What am I missing?
Cheers!!
The issue is the second eval - it tells the LensKit evaluator to try to run an eval target named eval, which doesn't exist.
Either run:
lenskit eval
which is recommended, or the deprecated
lenskit-eval
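For example, run it from the directory that holds eval.groovy and the unpacked data set (a sketch; the layout follows the quickstart):
cd /path/to/quickstart-dir    # contains eval.groovy and the ml-100k/ data
lenskit eval                  # runs eval.groovy with its default targets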
Introduction:
I've installed some packages on a Databricks cluster using install.packages on DR 9.1 LTS, and I want to run a UDF using R & Spark (SparkR or sparklyr). My use case is to score some data in batch using Spark (either SparkR or sparklyr). I've currently chosen SparkR::dapply. The main issue is that the installed packages don't appear to be available on the workers using SparkR::dapply.
Code (info reduced and some revised for privacy):
install.packages("lda", repos = "https://cran.microsoft.com/snapshot/2021-12-01/")
my_data<- read.csv('/dbfs/mnt/container/my_data.csv')
my_data_sdf <- as.DataFrame(my_data)
schema <- structType(structField("Var1", "integer"),structField("Var2", "integer"),structField("Var3", "integer"))
df1 <- SparkR::dapply(my_data_sdf , function(my_data) {
# lda #
#install.packages("lda", repos = "https://cran.microsoft.com/snapshot/2021-12-01/")
library( lda )
return(my_data_sdf)
}, schema)
display(df1)
Error message (some info redacted with 'X'):
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 9) (X0.XXX.X.X executor 0): org.apache.spark.SparkException: R unexpectedly exited.
R worker produced errors: Error in library(lda) : there is no package called ‘lda’
Calls: compute -> computeFunc -> library
Execution halted
System/Hardware:
Azure Databricks
Databricks Runtime 9.1 LTS (min 2 workers max 10)
Worker hardware = Standard_DS5_v2
Driver hardware = Standard_D32s_v2
Notes:
If I use 'require' instead, no error message is returned, but 'require' is designed not to throw an error in the first place (see the short sketch after these notes).
I'm able to run SparkR::dapply and perform operations, but once I add in library(lda) I get an error message, even though I've installed 'lda' and I'm using DR 9.1 LTS.
I'm using the recommended CRAN snapshot to install - https://learn.microsoft.com/en-us/azure/databricks/kb/r/pin-r-packages
I'm using DR 9.1 LTS which (to my understanding) makes installed packages available to workers - "Starting with Databricks Runtime 9.0, R packages are accessible to worker nodes as well as the driver node." - https://learn.microsoft.com/en-us/azure/databricks/libraries/notebooks-r-libraries
If I include install.packages("lda", repos = "https://cran.microsoft.com/snapshot/2021-12-01/") in dapply, then it works without error, but this doesn't seem like best practice according to the documentation.
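For reference, a minimal sketch of the difference between the two (hypothetical snippet; lda stands in for any package missing on a worker):
# require() warns and returns FALSE instead of stopping, so the failure stays silent
if (!require(lda)) message("lda is not available on this node")
# library() stops with an error; this is the "there is no package called 'lda'" seen above
library(lda)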
Questions:
How do I install R packages on Databricks clusters so they're available on all the nodes? What is the proper approach?
How do I make sure that my packages are available to SparkR::dapply?
Thoughts on including install.packages in the dapply function itself?
Should I try something other than SparkR::dapply?
Thanks everyone :)
After working with the Azure support team, the work-around / alternative option is to use an init script. The init script approach works well altogether and plays nicely with Data Factory.
Example
From Notebook:
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
dbutils.fs.put("/databricks/scripts/r-installs.sh","""R -e 'install.packages("caesar", repos="https://cran.microsoft.com/snapshot/2021-08-02/")'""", True)
display(dbutils.fs.ls("dbfs:/databricks/scripts/r-installs.sh"))
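If several packages are needed, the same pattern extends to a multi-line script (a sketch; the package list and snapshot date are only examples):
dbutils.fs.put("/databricks/scripts/r-installs.sh", """#!/bin/bash
R -e 'install.packages(c("caesar", "lda"), repos="https://cran.microsoft.com/snapshot/2021-08-02/")'
""", True)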
From Cluster UI:
Add init script from 'Init Scripts' tab by following prompts.
References
https://docs.databricks.com/clusters/init-scripts.html#cluster-scoped-init-script-locations
https://nadirsidi.github.io/Databricks-Custom-R-Packages/
An addition to the init script approach (which works best, by the way) is to persist the installed binaries of the R packages in DBFS, where they can be accessed by the worker nodes as well! This approach is easier for interactive workloads, and also helps if you do not have the rights to modify the cluster config to add init scripts.
Please refer to this page for more details: https://github.com/marygracemoesta/R-User-Guide/blob/master/Developing_on_Databricks/package_management.md
The code below can be run inside a Databricks notebook; this step needs to be done only once. Later on, you won't have to install the packages again even if you restart your cluster.
%python
# Creating a location in DBFS where we will finally store installed packages
dbutils.fs.mkdirs("/dbfs/persist-loc")
%sh
mkdir /usr/lib/R/persist-libs
%r
install.packages(c("caesar", "dplyr", "rlang"),
repos="https://cran.microsoft.com/snapshot/2021-08-02", lib="/usr/lib/R/persist-libs")
# Can even persist custom packages
# install.packages("/dbfs/path/to/package", repos=NULL, type="source", lib="/usr/lib/R/persist-libs")
%r
system("cp -R /usr/lib/R/persist-libs /dbfs/persist-loc", intern=TRUE)
Now just append the final persist location to .libPaths() in the R script where you use dapply. This can be done in the very first cell, and it will work just fine even with worker nodes. You will not have to install the packages again either, which saves time as well.
%r
.libPaths(c("/dbfs/persist-loc/persist-libs", .libPaths()))
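Tied back to the dapply call from the question, this might look like the following (a sketch reusing the question's names; if a worker still cannot find the package, the same .libPaths() line can also be repeated inside the function):
%r
.libPaths(c("/dbfs/persist-loc/persist-libs", .libPaths()))
df1 <- SparkR::dapply(my_data_sdf, function(my_data) {
  library(lda)   # resolved from the persisted library path
  my_data
}, schema)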
I am getting this error. How can I run these Spark jobs (in Scala)?
Command:
bin/run-example /home/datadotz/streaming/wc_str.scala localhost 9999
Error:
Failed to load org.apache.spark.examples./home/datadotz/streami
java.lang.ClassNotFoundException: org.apache.spark.examples.
Start with the documentation -- https://spark.apache.org/docs/latest/#running-the-examples-and-shell
To run one of the Java or Scala sample programs, use bin/run-example <class> [params] in the top-level Spark directory
It also mentions you can use spark-submit to run programs, which seems to take a path. Try that script instead.
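For instance (a sketch; SparkPi is one of the bundled examples, and the jar path for spark-submit is a placeholder for your own compiled application):
./bin/run-example SparkPi 10
./bin/spark-submit --class com.example.WordCountStreaming --master "local[2]" /path/to/your-app.jar localhost 9999
Note that run-example expects a class name (relative to org.apache.spark.examples), and spark-submit expects a compiled jar or a Python file rather than a .scala source file, so the streaming code would first need to be compiled.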
I'm learning Jenkins Pipelines, and I'm trying to execute anything on a Linux build server, but I get errors about it being unable to create a folder.
Here is the pipeline code
node('server') {
    stage("Build-Release-Linux64-${NODE_NAME}") {
        def ws = pwd()
        sh "ls -lha ${ws}"
    }
}
This is the error I get:
sh: 1: cannot create /opt/perforce/workspace/Dels-Testing-Area/MyStream-main#tmp/durable-07c26e68/pid; jsc=durable-8c9234a2eb6c2feded950bac03c8147a;JENKINS_SERVER_COOKIE=$jsc /opt/perforce/workspace/Dels-Testing-Area/MyStream-main#tmp/durable-07c26e68/script.sh: Directory nonexistent
I've checked the server while this is running and I can see that it does create
the file "/opt/perforce/workspace/Dels-Testing-Area/MyStream-main#tmp/durable-07c26e68/script.sh"
The file contains the following and is created by Jenkins, not by me:
#!/bin/sh -xe
It does not matter what I try to execute using the sh step; I get the same error.
Can anyone shed some light on why this is happening?
-= UPDATE =-
I'm currently using Jenkins 2.46.2 LTS and there are a number of updates available. I'm going to wait for a quiet period, perform a full update, and try this again in case it fixes anything.
I found out that the problem was that I had a single quote in my folder name. As soon as I removed the single quote, it ran perfectly. This also links to this Jenkins issue [https://issues.jenkins-ci.org/browse/JENKINS-44341], where I added a comment and voted for a fix.
So the fix is to only use the characters [0-9a-zA-Z_-] (excluding the square brackets) in folder and job names, and also to avoid spaces.
I can confirm that using special characters and spaces in the "display name" field of a folder's configuration works fine.
I'm running Jenkins in a local trusted environment where I'm trying to run this pipeline. This Jenkinsfile is checked into git.
#!groovy
node('master') {
    def ver = pomVersion()
    echo "Building version $ver"
}
def pomVersion() {
    def pomtext = readFile('pom.xml')
    def pomx = new XmlParser().parseText(pomtext)
    pomx.version.text()
}
The first few times I ran the build, I needed to manually approve changes (Jenkins -> Manage Jenkins -> In-process Script Approval). Now I get this exception and there is nothing to approve. All I want to do is parse an XML file. Can these security checks be bypassed completely for pipeline builds?
org.jenkinsci.plugins.scriptsecurity.sandbox.RejectedAccessException: unclassified field groovy.util.Node version
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.unclassifiedField(SandboxInterceptor.java:367)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onGetProperty(SandboxInterceptor.java:363)
at org.kohsuke.groovy.sandbox.impl.Checker$4.call(Checker.java:241)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedGetProperty(Checker.java:238)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.getProperty(SandboxInvoker.java:23)
at com.cloudbees.groovy.cps.impl.PropertyAccessBlock.rawGet(PropertyAccessBlock.java:17)
at WorkflowScript.pomVersion(WorkflowScript:10)
at WorkflowScript.run(WorkflowScript:3)
at ___cps.transform___(Native Method)
at com.cloudbees.groovy.cps.impl.PropertyishBlock$ContinuationImpl.get(PropertyishBlock.java:62)
at com.cloudbees.groovy.cps.LValueBlock$GetAdapter.receive(LValueBlock.java:30)
at com.cloudbees.groovy.cps.impl.PropertyishBlock$ContinuationImpl.fixName(PropertyishBlock.java:54)
at sun.reflect.GeneratedMethodAccessor479.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
at com.cloudbees.groovy.cps.Next.step(Next.java:58)
at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:154)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:29)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:29)
at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:164)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:276)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:78)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:185)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:183)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:47)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Finished: FAILURE
Currently it is not possible. There is an open ticket for this problem: https://issues.jenkins-ci.org/browse/JENKINS-28178
You can solve the problem with the following steps:
Install the Permissive Script Security plugin (version 0.3 or newer).
If you are using a pipeline script, make sure Use Groovy Sandbox is checked. This can be done in the configuration of the job.
Add the permissive-script-security.enabled command line parameter to the Jenkins master with one of the following values:
true if you want to disable the need to approve scripts, but potentially dangerous signatures will be logged:
-Dpermissive-script-security.enabled=true
no_security if you want to disable the need to approve scripts and disable also logging of the potentially dangerous signatures:
-Dpermissive-script-security.enabled=no_security
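For example, if the master runs from the official Jenkins Docker image (an assumption; adjust to however you start Jenkins, e.g. /etc/default/jenkins or a systemd unit), the flag can be passed via JAVA_OPTS:
docker run -d -p 8080:8080 -p 50000:50000 -e JAVA_OPTS="-Dpermissive-script-security.enabled=true" jenkins/jenkins:lts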
Try the following jenkins plugin: https://wiki.jenkins-ci.org/display/JENKINS/Permissive+Script+Security+Plugin
It disables the sandbox. Works for me.
As answered above, in newer Jenkins versions Script Security has been tightened. However, for the specific use case of reading a version from Maven's pom.xml, one could use readMavenPom from the Pipeline Utility Steps Plugin:
pom = readMavenPom file: 'pom.xml'
pom.version
Some other solutions can be found in this StackOverflow question as well.
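In a pipeline, the whole thing might look like this (a sketch; it assumes the Pipeline Utility Steps plugin is installed and that pom.xml sits at the workspace root):
node {
    checkout scm                            // assumes the Jenkinsfile and pom.xml live in the same repo
    def pom = readMavenPom file: 'pom.xml'
    echo "Building version ${pom.version}"
}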
I'd like to offer up a hack that I ended up implementing after scouring the interwebs for a solution and trying some of the solutions proposed here.
A little background on my setup:
Jenkins master (no slaves)
Dockerized Jenkins instance with a persistent volume for the jenkins_home directory
Jenkins jobs are delivered via the Jenkins Job DSL plugin with jobs written in .groovy
My scenario:
Anytime someone modified an existing Jenkins pipeline (via groovy) and introduced new functionality that used some custom groovy, Jenkins would fail the job and flag the code snippet for approval. Approval was manual and tedious.
I have tried the solutions posted above and they did not work for me. So my hack was to create a Jenkins job that runs a shell step that takes the list of signatures that need approval and adds them to the /var/jenkins_home/scriptApproval.xml file.
Some gotchas:
The offending job still has to fail once for me to find/copy the offending code/signature
To get the change to take effect, you can't just "reload from disk" for the file to get picked up; you have to restart the Jenkins process (in our case, delete the container and bring it back up). This was not a big pain for me since Jenkins is restarted every morning.
In our world, we trust the devs who modify our Jenkins jobs so they are free to add signatures that need approval as needed. Plus the job is in source control so we can see who added what.
My Jenkins container also has xmlstarlet baked in, so my shell job uses that to update the file.
Example of my Jenkins job's shell command:
#!/bin/bash
echo ""
#default location of the Jenkins approval file
APPROVE_FILE=/var/jenkins_home/scriptApproval.xml
#creating an array of the signatures that need approval
SIGS=(
'method hudson.model.ItemGroup getItem java.lang.String'
'staticMethod jenkins.model.Jenkins getInstance'
)
#stepping through the array
for i in "${SIGS[@]}"; do
echo "Adding :"
echo "$i"
echo "to $APPROVE_FILE"
echo ""
#checking the xml file to see if the signature has already been added and deleting it if so; this is a trick to keep xmlstarlet from creating duplicates
xmlstarlet -q ed --inplace -d "/scriptApproval/approvedSignatures/string[text()=\"$i\"]" $APPROVE_FILE
#adding the entry
xmlstarlet -q ed --inplace -s /scriptApproval/approvedSignatures -t elem -n string -v "$i" $APPROVE_FILE
echo ""
done
echo "##### Completed updating "$APPROVE_FILE", displaying file: #####"
cat "$APPROVE_FILE"
I am installing Nutch 2.2.1 on my CentOS virtual machine and getting an error when injecting the seed urls (directory name). I used this command:
/usr/share/apache-nutch-2.1/src/bin/nutch inject root/apache-nutch-2.1/src/testresources/testcrawl urls
And I got an error:
Error: Could not find or load main class org.apache.nutch.crawl.InjectorJob
Similarly, the command
/usr/share/apache-nutch-2.1/src/bin/nutch readdb
gives me an error:
Error: Could not find or load main class org.apache.nutch.crawl.WebTableReader
What should I do to fix these errors?
I am following the tutorial from http://wiki.apache.org/nutch/Nutch2Tutorial and have followed the same steps as suggested.
My query also revolves around setting the path for Ant: every time I open a new session, I have to set the ANT_HOME and PATH environment variables manually, and then they work all fine. The same is the case with setting JAVA_HOME.
You should go to the $NUTCH_HOME/runtime/local/ directory to run the commands.
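For example (a sketch based on the install path from the question; it assumes the runtime has been built with ant runtime and that the seed directory is called urls):
cd /usr/share/apache-nutch-2.1/runtime/local    # built by 'ant runtime'
bin/nutch inject urls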