How to upload DLLs from a local machine to a Databricks cluster? - apache-spark

I am evaluating .NET for Apache Spark, and in order to debug my application I have to copy the DLLs to /usr/local/bin on the Databricks cluster.
Today I am able to create an init script that copies the DLLs from DBFS to /usr/local/bin while the cluster starts. Then I start Spark using databricks-connect and debug my .NET application (as if Spark were running locally).
I would like to automate this on a running cluster: when I start debugging my .NET application in Visual Studio, it should delete the DLLs in /usr/local/bin and then copy the DLLs from my local .NET debug folder to /usr/local/bin.
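An init script for this can be as small as the following sketch; the DBFS staging path used here is just a placeholder:

#!/bin/bash
# Copy the dotnet Spark debug DLLs staged on DBFS into /usr/local/bin
# when the cluster starts. DBFS is mounted under /dbfs on cluster nodes;
# adjust the source path to wherever the DLLs were actually uploaded.
cp /dbfs/FileStore/dotnet-debug/*.dll /usr/local/bin/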

Related

I am getting a JAVA_HOME not set error on an Azure Python web app

I am trying to install a JDK on a Python web app so that I can use SparkSession in my Python code. But the JDK gets installed in the /usr/lib directory instead of the /home directory, and anything outside the /home directory does not persist. That is why, even if I use the Java Tool Installer task and install the JDK from the command line in an Azure pipeline, I still get the JAVA_HOME not set error.
You can't modify anything at the OS level, because App Service runs in a sandbox; see https://learn.microsoft.com/en-us/azure/app-service/operating-system-functionality. Every App Service comes preconfigured with various SDKs. If your app is hosted in a Windows App Service, you can run echo %JAVA_HOME% from CMD/PowerShell to see which version is available.
If the available version is lower than what you need, I suggest switching to Web App for Containers, where you can install the JDK you need in your image.
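As a rough sketch of that approach, the container image brings its own JDK; the base image, JDK version, and jar name below are placeholder choices:

# Hypothetical Dockerfile for Web App for Containers: ship your own JDK
# instead of relying on the SDKs preinstalled in the App Service sandbox.
FROM eclipse-temurin:17-jdk
ENV JAVA_HOME=/opt/java/openjdk
COPY app.jar /app/app.jar
# If the app does not listen on port 80, set the WEBSITES_PORT app setting.
EXPOSE 80
ENTRYPOINT ["java", "-jar", "/app/app.jar"]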

How to install dependent binaries on Azure App Service with Linux?

I have a Spring Boot application that I am running on Azure App Service (Linux). My application depends on a binary that needs to be present on the system. How do I install it on my App Service?
I tried the following two options:
SSHed in via Kudu and installed the package ($ apk add package). But the changes are not persisted beyond /home; the dependencies were installed in other folders, and when the App Service was redeployed all of those dependencies were gone.
Used the post-deployment hook to run "$ apk add package" once the deployment finishes. The script does run, as my custom log statements show, but I still do not see the installed package. Even when I use apt-get, it says "unable to lock administration directory".
Using a statically compiled binary is not an option for me since that has its own issues.
Thanks
For Tomcat, Java SE, and WildFly apps on App Service Linux, you can create a file at /home/startup.sh and use it to initialize the container in any way you want (for example, to install the required packages).
App Service Linux checks for the presence of /home/startup.sh at startup and, if it exists, executes it. This gives web app developers an extension point for performing any necessary customization during startup, such as installing packages the container needs.
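A minimal sketch of such a script, assuming an Alpine-based image (the package name is a placeholder):

#!/bin/sh
# Hypothetical /home/startup.sh: executed by App Service Linux at container
# startup. Packages installed outside /home do not survive a restart, so
# they are reinstalled here on every start.
apk add --no-cache libfoo    # replace libfoo with the package you need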
I think this is a common problem with Linux on Azure.
I recommend taking a step back and considering one of the following options:
1. Run your application in a container that has all the dependencies you are looking for.
2. Run your application on a Linux IaaS VM instead of Azure App Service (Linux), which is PaaS.
3. Run your application on the Windows PaaS offering and add an extension for your dependency (you are unlikely to run into this problem on Windows).
While none of these may be acceptable to you, I have not found a solution to this problem under those specific circumstances.

Deploy a package on Windows by Jenkins running on Linux

I have Jenkins installed on a Linux build server, and I need a project to be deployed on a Windows machine. Jenkins builds a simple zip package that contains an executable. The package can be uploaded with FTP.
But how do I deploy that package after uploading, for example by calling a batch script? For Linux servers I just use the "Publish Over SSH" plugin.
Using Jenkins ver. 1.638.
You need to know some PowerShell and use the Jenkins PowerShell plugin.
But I would propose that you Dockerize the project and deploy it that way.
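For the PowerShell route, if the Windows machine is attached as a Jenkins agent, a PowerShell build step could unpack and launch the uploaded package along these lines (a rough sketch, assuming PowerShell 5+ on the agent; all paths are placeholders):

# Unzip the package that was uploaded over FTP and start the executable.
Expand-Archive -Path C:\deploy\package.zip -DestinationPath C:\apps\myapp -Force
Start-Process -FilePath C:\apps\myapp\myapp.exe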

Mobius: How to set CSharpBackendPortNumber for a C# app to talk to a Spark cluster on Linux?

I have this very basic code, run from a Windows machine, connecting to a Spark cluster running on a Linux VirtualBox VM:
string sparkMaster = "spark://192.168.1.193:7077";
string hdfsURI = "hdfs://192.168.1.193:8020";
var sparkContext = new SparkContext(new SparkConf().SetAppName("MobiusWordCount").SetMaster(sparkMaster));
I followed the instructions on the getting-started page (installed Spark on the Windows gateway machine, plus the other prerequisites):
D:\SparkCLR\runtime>scripts\sparkclr-submit.cmd --master spark://192.168.1.193:7077 --total-executor-cores 2 --exe SparkCLR.exe "C:\Users\aaa\Documents\Visual Studio 2015\Projects\SparkCLR\SparkCLR\bin\Debug"
Got this error:
SPARKCLR_JAR=spark-clr_2.10-1.6.100.jar
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.launcher.SparkCLRSubmitArguments.concatCmdOptions(SparkCLRSubmitArguments.scala:389)
at org.apache.spark.launcher.SparkCLRSubmitArguments.buildCmdOptions(SparkCLRSubmitArguments.scala:492)
at org.apache.spark.launcher.SparkCLRSubmitArguments$.main(SparkCLRSubmitArguments.scala:30)
at org.apache.spark.launcher.SparkCLRSubmitArguments.main(SparkCLRSubmitArguments.scala)
D:\SparkCLR\runtime>scripts\sparkclr-submit.cmd --verbose --master spark://192.168.1.193:7077 --total-executor-cores 2 --exe SparkCLR.exe "C:\Users\aaa\Documents\Visual Studio 2015\Projects\SparkCLR\SparkCLR\bin\Debug"
SPARKCLR_JAR=spark-clr_2.10-1.6.100.jar
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.launcher.SparkCLRSubmitArguments.concatCmdOptions(SparkCLRSubmitArguments.scala:389)
at org.apache.spark.launcher.SparkCLRSubmitArguments.buildCmdOptions(SparkCLRSubmitArguments.scala:492)
at org.apache.spark.launcher.SparkCLRSubmitArguments$.main(SparkCLRSubmitArguments.scala:30)
at org.apache.spark.launcher.SparkCLRSubmitArguments.main(SparkCLRSubmitArguments.scala)
Any thoughts?
You do not need to compile Mobius to use it on Linux. You can either get an official Mobius release and use it, or, if you have pre-built Mobius binaries (the jar file and DLLs) from a GitHub repo, use them on Windows or Linux regardless of the platform on which they were built. Mono is required to run Mobius on Linux; if you choose to build Mobius on Linux, you need Mono for that as well.
You need to specify CSharpBackendPortNumber and CSharpWorkerPath in your driver config file only when debugging a Mobius driver app in local mode. This enables debugging of your C# Spark application by connecting your C# driver process in Visual Studio to the JVM process (running CSharpRunner) that is launched separately, either in an IDE (IntelliJ or Eclipse) or via the sparkclr-submit script with the "debug" parameter.
For normal (non-debug) execution of Apache Spark C# applications implemented with Mobius, you just run the sparkclr-submit script without the debug settings (CSharpBackendPortNumber and CSharpWorkerPath) in the driver config file. You can find instructions for running a Mobius application on the getting-started page; they cover running Mobius on standalone and YARN clusters on both Windows and Linux.
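For illustration, those debug-only settings live in the driver's config file and would look roughly like this (the port value is whatever the separately launched backend reports, and the worker path is a placeholder):

<!-- Sketch of the appSettings fragment in the Mobius driver's exe.config -->
<configuration>
  <appSettings>
    <!-- port reported by the separately launched CSharpRunner backend -->
    <add key="CSharpBackendPortNumber" value="5567" />
    <!-- full path to CSharpWorker.exe from your debug build output -->
    <add key="CSharpWorkerPath" value="C:\Mobius\bin\Debug\CSharpWorker.exe" />
  </appSettings>
</configuration>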

Do I need Hadoop on my Windows machine to connect to HBase running on Linux?

Do I need Hadoop on my Windows machine to connect to HBase running on Ubuntu with Hadoop?
My HBase is running fine on my Ubuntu machine, and I am able to connect with Eclipse on the same machine (I am using Kundera to connect to HBase). Now I want to connect to HBase from my Windows 7 Eclipse IDE. Do I need to install Hadoop on Windows to connect to the remote HBase on Ubuntu? When I tried, I got something like this:
Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
All you need are the Hadoop and HBase jars, plus a Configuration object initialized with:
1. hbase.zookeeper.quorum (the cluster's ZooKeeper hosts) and other details
2. hbase.zookeeper.property.clientPort
3. zookeeper.znode.parent
Then get a connection with the above config object, as in the sketch below.
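In code, that amounts to something like this sketch, assuming the HBase 1.x client API (the host name, port, and znode parent are placeholders to adjust for the remote cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RemoteHBaseConnect {
    public static void main(String[] args) throws Exception {
        // Point the client at the remote cluster's ZooKeeper ensemble.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "ubuntu-host");        // placeholder host
        conf.set("hbase.zookeeper.property.clientPort", "2181");  // default ZK port
        conf.set("zookeeper.znode.parent", "/hbase");             // default parent znode
        // Get the connection with the above config object.
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            System.out.println("Connected: " + !connection.isClosed());
        }
    }
}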
This problem usually occurs with Hadoop 2.x.x versions. One option is to build the Windows distribution for your Hadoop version.
Refer to this link:
http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os
But before building, try the zip file given in this link:
http://www.srccodes.com/p/article/39/error-util-shell-failed-locate-winutils-binary-hadoop-binary-path
Extract the zip file and copy the files under hadoop-common-2.2.0/bin to the %HADOOP_HOME%\bin directory.
Note: for me this worked even for the Hadoop 2.5 version.
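Concretely, on Windows that boils down to something like this (the install location C:\hadoop is a placeholder):

:: Place the downloaded binaries (including winutils.exe) into
:: %HADOOP_HOME%\bin and make them visible to the JVM.
set HADOOP_HOME=C:\hadoop
copy hadoop-common-2.2.0\bin\* %HADOOP_HOME%\bin\
set PATH=%PATH%;%HADOOP_HOME%\bin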
